http://newsforge.com/newsforge/02/04/03/2151246.shtml?tid=23

Using the /dev and /proc file systems

Thursday April 04, 2002 - [ 09:47 AM GMT ] - By Matt Butcher

Two Linux file systems continually prove to be confusing stumbling blocks for new Linux users. These two directories, /proc and /dev, have no Windows counterpart and are not easily understandable at first glance. They are, however, powerful tools for understanding and using Linux.

This article is a walk-through of the device (/dev) and process (/proc) file systems. It will explain what they are, how they work, and how they are used in practice.

Devices: In Linux, a device is any piece of equipment (or code that emulates equipment) that provides methods for performing input or output (IO). For example, a keyboard is an input device. A hard disk is an input (read) and output (write) device. In Linux, most devices are represented as files in the file system (network cards are an exception). These special files are stored in a common place, /dev, where they are easily accessible to processes that need to perform IO-related tasks.

Devices fall roughly into two categories: character devices and block devices. Character devices deal with IO on a character-by-character basis. The most obvious example is a keyboard, where every key press generates a character on the device. The mouse is another: every motion or button click sends a character to the /dev/mouse device.

Block devices read data in larger chunks. Data storage devices, such as IDE hard drives (/dev/hd), SCSI hard drives (/dev/sd), and CD-ROMs (/dev/cdrom) are block devices. IO interactions with block devices transact with chunks of data, which allows large quantities of data to be moved back and forth more efficiently.
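The distinction can be checked from the shell. In the output of "ls -l", the first character of the mode field is "c" for a character device and "b" for a block device, and the test operators -c and -b check the same thing. The sketch below is only an illustration; the function name classify_device is made up for this example.

```shell
# classify_device PATH: print "character", "block", or "other".
# (classify_device is a hypothetical helper name, not a standard tool.)
classify_device() {
    if [ -c "$1" ]; then
        echo "character"
    elif [ -b "$1" ]; then
        echo "block"
    else
        echo "other"
    fi
}

classify_device /dev/null   # the null device is a character device
classify_device /           # a directory, so neither kind
```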

Device names: Devices are often named after the equipment they represent. Devices named /dev/fb represent the frame buffers for graphics. /dev/hd devices represent IDE hard disks. In some cases, symbolic links are employed to clarify what a device is. For example, /dev/mouse, the device representing a mouse, may be linked to a serial, USB or PS2 device, depending on the physical hardware. The symbolic link makes it easier for both humans and machines to figure out which device is the mouse.

In some cases, there may be multiple devices of the same type. For instance, a machine may have two ATAPI CD-ROMs. Two device files will be used, one for each drive: /dev/cdrom0 will be the first CD-ROM, and /dev/cdrom1 will be the second.

The naming gets a little more confusing in cases like hard disks. A hard disk device name is composed of the type of disk followed by the position of the disk, and then the disk partition. The first hard disk may be called /dev/hda, with the "hd" part indicating that it is an IDE hard disk, and the "a" part indicating that it is the first hard disk. /dev/hdb would then refer to the second hard disk. Each hard disk is divided into partitions. The first partition on the first hard drive would be /dev/hda1, where the 1 at the end is the number of the partition. Note that where some devices (like /dev/cdrom0) may begin the index from 0, devices with partitions typically begin from 1. So a listing of all of the partitions on the two IDE hard drives on my computer might look like this:

/dev/hda
/dev/hda1
/dev/hda2
/dev/hda3
/dev/hda4
/dev/hdb
/dev/hdb1
/dev/hdb2
/dev/hdb3
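The naming convention above can be taken apart mechanically. The following is a minimal sketch that uses only shell parameter expansion; it assumes names of the form shown here (a two-letter type, one position letter, and an optional partition number), and the function name split_disk_name is hypothetical.

```shell
# split_disk_name NAME: print "type position partition" for names such as
# /dev/hda1. (split_disk_name is a hypothetical helper for illustration.)
split_disk_name() {
    name=${1#/dev/}          # strip the /dev/ prefix:          hda1
    part=${name##*[!0-9]}    # keep only the trailing digits:   1
    disk=${name%"$part"}     # name without the partition:      hda
    kind=${disk%?}           # everything but the last letter:  hd
    pos=${disk#"$kind"}      # the last letter:                 a
    echo "$kind $pos ${part:-whole-disk}"
}

split_disk_name /dev/hda1    # prints: hd a 1
split_disk_name /dev/hdb     # prints: hd b whole-disk
```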

SCSI hard disks use /dev/sd instead of /dev/hd, but otherwise the convention is the same. /dev/sda1 refers to the first partition on the first SCSI hard disk.

Special Devices: There are a few special devices that come in useful every once in a while: /dev/null, /dev/zero, /dev/full, and /dev/random.

The null device, /dev/null, is sort of the "trash" device. Put simply, things that go in never come out. Programs often generate output that nobody needs to see, and shell scripts commonly employ /dev/null to spare the user the unnecessary output generated by the utilities they call. The example below inserts a kernel module and sends the output from modprobe to /dev/null.

$ modprobe cipher-twofish > /dev/null
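Note that ">" redirects only standard output, so error messages would still appear. The sketch below shows how both streams might be discarded; noisy is a made-up stand-in for a chatty utility, not a real command.

```shell
# noisy is a hypothetical stand-in for a chatty utility: it writes one line
# to standard output and one line to standard error.
noisy() {
    echo "progress: 100%"
    echo "warning: something minor" >&2
}

# Discard standard output only; the warning still reaches the terminal.
noisy > /dev/null

# Discard both streams. The order matters: standard output is pointed at
# /dev/null first, then standard error is made a copy of it.
noisy > /dev/null 2>&1

# The exit status of the command is unaffected by the redirection.
if noisy > /dev/null 2>&1; then
    echo "noisy succeeded"
fi
```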

Closely related to /dev/null is /dev/zero. Like /dev/null, it can be used to dump unwanted data, but the two differ when read: reads from /dev/zero return an endless stream of \0 (zero) bytes, while reads from /dev/null return end-of-file immediately. For this reason, /dev/zero is commonly used to create files of a given size filled with zeros.

dd if=/dev/zero of=/my-file bs=1k count=100

The command above will create a file 100k in size, full of null characters.
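The result can be verified from the shell. This sketch repeats the dd command but writes into a temporary directory rather than /, an assumption made here so that it runs without root privileges; the variable names are illustrative.

```shell
# Work in a throwaway directory instead of / (an assumption made here so
# the sketch can run as an ordinary user).
workdir=$(mktemp -d)
zero_file="$workdir/my-file"

# Same dd invocation as above, with only the output path changed.
dd if=/dev/zero of="$zero_file" bs=1k count=100 2>/dev/null

# wc -c counts bytes: 100 blocks of 1024 bytes each gives 102400.
size=$(( $(wc -c < "$zero_file") ))
echo "size in bytes: $size"

# Deleting every zero byte should leave nothing behind.
non_zero=$(( $(tr -d '\0' < "$zero_file" | wc -c) ))
echo "non-zero bytes: $non_zero"

rm -r "$workdir"
```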

/dev/full mimics a full device. Writing to /dev/full will generate an error. The full device is useful when testing how an application will act when it accesses a device that is full.

$ cp test-file /dev/full
cp: writing `/dev/full': No space left on device
$ df -k /dev/full
Filesystem           1k-blocks     Used Available Use% Mounted on
/dev/full                    0        0         0   -
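A script can use the same behavior to exercise its own error handling. The sketch below is one possible approach: it writes a single byte with dd and checks the exit status. The helper name try_write is hypothetical.

```shell
# try_write TARGET: attempt to write one byte to TARGET and report the
# result. (try_write is a hypothetical helper used only for illustration.)
try_write() {
    if dd if=/dev/zero of="$1" bs=1 count=1 2>/dev/null; then
        echo "write to $1 succeeded"
    else
        echo "write to $1 failed"
    fi
}

try_write /dev/null   # the null device accepts anything
try_write /dev/full   # the full device rejects every write
```

An application under test could be pointed at /dev/full in the same way to confirm that it notices the failed write instead of silently losing data.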

The random devices, /dev/random and /dev/urandom, generate "random" data. Though the output of both may appear to be completely random, /dev/random is actually more random than /dev/urandom. /dev/random generates random characters based on "environmental noise" that cannot be predicted. Since there is only a limited supply of this noise, the /dev/random device is slow and may pause in order to collect more. /dev/urandom uses the same pool of noise as /dev/random, but if the pool runs out, it generates pseudo-random data instead of waiting. This makes it faster, but less secure.

Old /dev File System: In the past, the /dev file system was part of the normal file system. It consisted of special files created once (usually when the system was installed) and stored on a hard disk.

On systems with this setup, the /dev file system needed to contain entries for any devices that might be connected to that computer. Consequently, /dev was very large, having entries for multiple hard drives, consoles, floppy drives, etc. Earlier, we saw the list of hard drive partitions on hdb. Under the old /dev file system, entries would exist for /dev/hdb1 through /dev/hdb11. In order to figure out which device files actually mapped to partitions on the disk (remember, I only have three partitions on my hdb drive), some utility had to be used to determine which ones were valid. The command "file -s /dev/hdb?" was one way to figure that sort of thing out, printing something like this:

$ file -s /dev/hdb?
/dev/hdb1: Linux/i386 ext2 filesystem
/dev/hdb2: Linux/i386 ext2 filesystem
/dev/hdb3: Linux/i386 ext2 filesystem
/dev/hdb4: empty
/dev/hdb5: empty
/dev/hdb6: empty
/dev/hdb7: empty
/dev/hdb8: empty
/dev/hdb9: empty

If a given device file wasn't already present, it had to be created with mknod or another program (like MAKEDEV). Though the "old way" worked, it was complex and tedious to manage.

DevFS: In the 2.4 kernel tree, an alternative to the cumbersome disk-based /dev file system was introduced. The alternative, DevFS, incorporated new device handling code into the kernel. In DevFS, the /dev file system is created during each boot-up cycle and stored in RAM instead of on the hard disk. Under this model, there is no need to maintain a list of all possible devices; when a new device is added to the hardware, the kernel just makes an entry for it in /dev. In the occasional cases where devices need special configuration to appear correctly in DevFS, there is a configuration file, usually stored in /etc/devfsd.conf.

/proc: A file system for processes

Processes: At any given time, Linux will have many processes running at once. Some, such as window managers, email clients, and Web browsers, will be visible to the end user. Others, like servers and helper processes, are not immediately visible, but run in the background, handling tasks that do not require the user's interaction. Running "ps -ef" in a shell will print a list of all the currently running processes. It should look something like this:

$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 11:08 ?        00:00:04 init
root         2     1  0 11:08 ?        00:00:00 [keventd]
root         3     0  0 11:08 ?        00:00:00 [ksoftirqd_CPU0]
root         4     0  0 11:08 ?        00:00:00 [kswapd]
root         5     0  0 11:08 ?        00:00:00 [bdflush]
root         6     0  0 11:08 ?        00:00:00 [kupdated]
root         8     1  0 11:08 ?        00:00:00 [kjournald]
root        86     1  0 11:08 ?        00:00:00 /sbin/devfsd /dev
root       165     1  0 11:09 ?        00:00:00 [kjournald]
root       168     1  0 11:09 ?        00:00:00 [khubd]
root       294     1  0 11:09 ?        00:00:00 [kapmd]
root       515     1  0 11:09 ?        00:00:00 metalog [MASTER]
root       521   515  0 11:09 ?        00:00:00 metalog [KERNEL]
root       531     1  0 11:09 ?        00:00:00 /sbin/dhcpcd eth0
/etc/X11/fs/config -droppriv -user xfs
root       572     1  0 11:09 ?        00:00:00 /usr/kde/2/bin/kdm
root       593   572  2 11:09 ?        00:04:27 /usr/X11R6/bin/X -auth /var/lib/kdm/authfiles/A:0-25pIgI
root       644     1  0 11:09 vc/1     00:00:00 /sbin/agetty 38400 tty1 linux
root      1045   572  0 12:16 ?        00:00:00 -:0
mbutcher  1062  1045  0 12:16 ?        00:00:00 /bin/sh /etc/X11/Sessions/kde-2.2.2
mbutcher  1091  1062  0 12:16 ?        00:00:00 /bin/bash --login /usr/kde/2/bin/startkde
mbutcher  1132     1  0 12:16 ?        00:00:00 kdeinit: Running...
mbutcher  1157  1132  0 12:16 ?        00:00:01 kdeinit: kwin
mbutcher  1159     1  0 12:16 ?        00:00:07 kdeinit: kdesktop
mbutcher  1168     1  0 12:16 ?        00:00:00 kdeinit: kwrited
mbutcher  1171  1168  0 12:16 pty/s0   00:00:00 /bin/cat
mbutcher  1173     1  0 12:16 ?        00:00:00 alarmd
mbutcher  1207  1132  0 12:23 ?        00:00:08 kdeinit: konsole -icon konsole -miniicon konsole
mbutcher  1219  1207  0 12:23 pty/s2   00:00:00 /bin/bash
mbutcher  1309  1260  0 13:48 pty/s3   00:00:01 vi dev-and-proc.html
root      1314  1220  0 14:03 pty/s2   00:00:00 ps -ef

Many of the tasks in the output of ps are background processes. Those that are contained in square brackets are kernel processes. Only a few, like the KDE processes and the entries toward the end, are processes that I interact with.

In order to manage the system, the kernel must keep track of every process running, including itself. Many user-level applications, too, must be able to find out what processes are running ("ps" is a good example; "top" is another). The /proc file system is where the kernel stores information about processes.

Like DevFS, /proc is stored in memory, rather than on disk. If you look at the file /proc/mounts (which lists all of the mounted file systems, much like the "mount" command), you should see a line in it that looks like this:

proc /proc proc rw 0 0

/proc is controlled by the kernel and does not have an underlying device. Because it contains mainly state information controlled by the kernel, the most logical place to store the information is in memory controlled by the kernel.
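Each line of /proc/mounts lists, in order, the device, the mount point, the file system type, the mount options, and two numeric fields. A small awk sketch can pick out the type for a given mount point; the function name fs_type_of is hypothetical.

```shell
# fs_type_of MOUNTPOINT: print the file system type mounted at MOUNTPOINT.
# If several lines name the same mount point, the last one wins, since it
# is the one mounted on top. (fs_type_of is a hypothetical helper name.)
fs_type_of() {
    awk -v mp="$1" '
        $2 == mp { fstype = $3 }
        END      { if (fstype != "") print fstype }
    ' /proc/mounts
}

fs_type_of /proc   # prints: proc
```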

Information about running processes: To keep track of processes, the kernel assigns each one a Process ID (PID) number. Running the command "ps -ef", as we did above, will print a list of all running processes ordered by the PID number (which is in the second column). The /proc file system stores information about each PID.

In /proc, many of the directory names are numbers. These directories correspond to PID numbers. Inside the directories are files that provide details about the state and environment of a process. In the output of ps (above), there was a line that read:

mbutcher  1219  1207  0 12:23 pty/s2   00:00:00 /bin/bash

This process is running the bash shell, and has PID 1219. The directory /proc/1219 contains information about this process.

$ ls /proc/1219
cmdline  cpu  cwd  environ  exe  fd  maps  mem  root  stat  statm  status

The file "cmdline" contains the command invoked to start the process. The "environ" file contains the environment variables for the process. "status" has status information on the process, including the user (UID) and group (GID) identification for the user executing the process, the parent process ID (PPID) that instantiated the PID, and the current state of the process, such as "Sleeping" or "Running".

$ cat /proc/1219/status
Name:   bash
State:  S (sleeping)
Tgid:   1219
Pid:    1219
PPid:   1207
TracerPid:      0
Uid:    501     501     501     501
Gid:    501     501     501     501
FDSize: 256
Groups: 501 10 18
VmSize:     2400 kB
VmLck:         0 kB
VmRSS:      1272 kB
VmData:      124 kB
VmStk:        20 kB
VmExe:       544 kB
VmLib:      1604 kB
SigPnd: 0000000000000000
SigBlk: 0000000080010000
SigIgn: 8000000000384004
SigCgt: 000000004b813efb
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
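Because status is a plain "Field: value" listing, individual values are easy to extract in a script. The sketch below reads the entry for the shell that runs it (whose PID is available as $$) instead of PID 1219; the helper name status_field is hypothetical.

```shell
# status_field PID FIELD: print the value of one "Field: value" line from
# /proc/PID/status. (status_field is a hypothetical helper name.)
status_field() {
    awk -F':[ \t]*' -v key="$2" '$1 == key { print $2 }' "/proc/$1/status"
}

# $$ expands to the PID of the shell running this script.
echo "name:   $(status_field $$ Name)"
echo "state:  $(status_field $$ State)"
echo "parent: $(status_field $$ PPid)"

# cmdline separates its arguments with zero bytes; turn them into spaces
# before printing.
tr '\0' ' ' < "/proc/$$/cmdline"
echo
```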

Every process directory also has a few symbolic links. "cwd" is a link to the current working directory for that process, "exe" is a link to the executable program the process is running, and "root" links to the directory the process sees as its root directory (usually "/"). The directory "fd" contains one symbolic link for each file descriptor the process has open.
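These links can be followed with readlink. As a rough sketch, the commands below inspect the shell that runs them; the variable names are illustrative.

```shell
pid=$$   # the shell running this script

# readlink prints the target of a symbolic link.
cwd_target=$(readlink "/proc/$pid/cwd")
exe_target=$(readlink "/proc/$pid/exe")
root_target=$(readlink "/proc/$pid/root")

echo "cwd:  $cwd_target"
echo "exe:  $exe_target"
echo "root: $root_target"

# Each entry in fd is named after a descriptor number (0 is standard input,
# 1 is standard output, 2 is standard error).
ls -l "/proc/$pid/fd"
```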

There are other files in the process directory that provide information about everything from processor and memory usage to the amount of time a process has been running. Most of these files are documented in the kernel source under "Documentation/filesystems/proc.txt" and in a man page, "man proc".

Kernel Information: In addition to storing information about specific processes, the /proc file system contains a great deal of information generated by the kernel itself to describe the state of the system.

The kernel and its modules may generate files in /proc to provide information about their current state. For instance, /proc/fb provides information about the currently available frame buffer devices (frame buffers are most often used to display a boot logo).

$ cat /proc/fb
0 VESA VGA

Note that the 0 refers to the frame buffer's index, and corresponds to the device /dev/fb0. If I had a second frame buffer, the proc entry would also contain a line starting with a 1, corresponding to /dev/fb1. Often, proc data will refer to (and explain) entries in /dev.

Lots of information about hardware is stored in /proc. The file /proc/pci has info about every PCI device detected on the system. Running the command "lspci" ought to generate data that looks similar to this file, since both report the PCI devices the kernel has detected. /proc/bus contains directories for various bus architectures (PCI, PCCard, USB), which in turn contain information about the devices connected via those buses. Various network information and statistics are stored in /proc/net. Information about hard disks is stored in /proc/ide and /proc/scsi, depending on the hard drive type. /proc/devices lists all of the devices (divided into the "block" and "character" categories) available on the system.

$ cat /proc/devices
 Character devices:
   1 mem
   2 pty/m%d
   3 pty/s%d
   4 tts/%d
   5 cua/%d
   7 vcs
  10 misc
  14 sound
  29 fb
 116 alsa
 162 raw
 180 usb
 226 drm
 254 pcmcia

 Block devices:
  1 ramdisk
  2 fd
  3 ide0
 22 ide1
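The number at the start of each line is the major number of the driver, which is also stored in every device file that driver handles. The sketch below ties the two together for /dev/null. It assumes the GNU version of stat (for the -c option), and the function name driver_for_major is hypothetical.

```shell
# driver_for_major KIND MAJOR: print the driver name registered for a major
# number. KIND is "Character" or "Block", matching the section headings in
# /proc/devices. (driver_for_major is a hypothetical helper name.)
driver_for_major() {
    awk -v kind="$1" -v major="$2" '
        /devices:/ { section = $1; next }
        section == kind && $1 == major { print $2; exit }
    ' /proc/devices
}

# GNU stat prints the major number of a device file in hexadecimal (%t),
# so convert it to decimal before looking it up.
major_hex=$(stat -c '%t' /dev/null)
major=$(printf '%d' "0x$major_hex")
echo "/dev/null: major $major, driver $(driver_for_major Character "$major")"
```

On the system shown above this would report major number 1 and the driver "mem", the first entry in the character device list.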

There are many more files in /proc than can be covered here. Each kernel may, in fact, have different entries depending on what was built into the kernel, what hardware and software is present, and what state the computer is currently in. Some of these files are clearly meant for a machine to read, but others offer information that is intuitive. Most of these files are documented in various places in the kernel documentation. A good starting point in the kernel source is Documentation/filesystems/proc.txt.

Interacting with processes via /proc: Some files in /proc are not just for reading. Writing to them may alter the state of the kernel. Looking to see what's in a file in /proc is usually harmless, but writing to files without being sure of the outcome can be dangerous. Nevertheless, sometimes writing to /proc is the only way to communicate with the kernel.

For instance, in recent versions of the kernel, there is the option to include a kernel-level high-performance Web server (khttpd). Because starting a Web server by default can be a security risk, khttpd requires explicit startup through a value written to a file in /proc.

echo 1 > /proc/sys/net/khttpd/start

When the kernel sees the contents of /proc/sys/net/khttpd/start change from 0 (the default) to 1, it starts the khttpd server.

There are dozens of other configurable parameters in /proc -- some for tuning hardware, others for managing the kernel internals. Almost all of them, though, are low level and can cause bad things to happen if set to the wrong values. As a rule of thumb, /proc entries should not be changed unless you know what you are doing.
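One way to follow that rule in a script is to check a /proc entry before touching it and to record the old value. The following is only a sketch of such a guard, not an established tool; the function name set_kernel_flag is hypothetical.

```shell
# set_kernel_flag FILE VALUE: write VALUE to FILE only if FILE already
# exists and is writable, and report what happened. It never creates FILE.
# (set_kernel_flag is a hypothetical helper name.)
set_kernel_flag() {
    if [ ! -e "$1" ]; then
        echo "skipped: $1 does not exist"
        return 1
    fi
    if [ ! -w "$1" ]; then
        echo "skipped: $1 is not writable (are you root?)"
        return 1
    fi
    old=$(cat "$1")
    echo "$2" > "$1" || return 1
    echo "changed $1 from $old to $(cat "$1")"
}

# On a kernel built without khttpd the entry is absent and nothing is written.
set_kernel_flag /proc/sys/net/khttpd/start 1 || echo "nothing was changed"
```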

Conclusion

/proc and /dev provide file-based interfaces to the internals of Linux. They assist in determining the configuration and state of various devices and processes on a system. They provide capabilities necessary to make the operating system easy to upgrade, analyze, debug, and run. Understanding and applying knowledge of these two file systems is key to making the most of your Linux system.