Linux Devfs (Device File System) FAQ
Richard Gooch
20-AUG-2002

-----------------------------------------------------------------------------

NOTE: the master copy of this document is available online at:

http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html
and looks much better than the text version distributed with the
kernel sources. A mirror site is available at:

http://www.ras.ucalgary.ca/~rgooch/linux/docs/devfs.html

There is also an optional daemon that may be used with devfs. You can
find out more about it at:

http://www.atnf.csiro.au/~rgooch/linux/

A mailing list is available which you may subscribe to. Send email
to majordomo@oss.sgi.com with the following line in the
body of the message:
subscribe devfs
To unsubscribe, send the message body:
unsubscribe devfs
instead. The list is archived at

http://oss.sgi.com/projects/devfs/archive/.

-----------------------------------------------------------------------------

Contents


What is it?

Why do it?

Who else does it?

How it works

Operational issues (essential reading)

  Instructions for the impatient
  Permissions persistence across reboots
  Dealing with drivers without devfs support
  All the way with Devfs
  Other Issues
  Kernel Naming Scheme
  Devfsd Naming Scheme
  Old Compatibility Names
  SCSI Host Probing Issues

Device drivers currently ported

Allocation of Device Numbers

Questions and Answers

  Making things work
  Alternatives to devfs
  What I don't like about devfs
  How to report bugs
  Strange kernel messages
  Compilation problems with devfsd

Other resources

Translations of this document


-----------------------------------------------------------------------------


What is it?

Devfs is an alternative to "real" character and block special devices
on your root filesystem. Kernel device drivers can register devices by
name rather than major and minor numbers. These devices will appear in
devfs automatically, with whatever default ownership and
protection the driver specified. A daemon (devfsd) can be used to
override these defaults. Devfs has been in the kernel since 2.3.46.

NOTE that devfs is entirely optional. If you prefer the old
disc-based device nodes, then simply leave CONFIG_DEVFS_FS=n (the
default). In this case, nothing will change. ALSO NOTE that if you do
enable devfs, the defaults are such that full compatibility is
maintained with the old device names.

There are two aspects to devfs: one is the underlying device
namespace, which is a namespace just like any mounted filesystem. The
other aspect is the filesystem code which provides a view of the
device namespace. I make this distinction because devfs
can be mounted many times, with each mount showing the same device
namespace. Changes made are global to all mounted devfs filesystems.
Also, because the devfs namespace exists without any devfs mounts, you
can easily mount the root filesystem by referring to an entry in the
devfs namespace.


The cost of devfs is a small increase in kernel code size and memory
usage: about 7 pages of code (some of that in __init sections) and 72
bytes for each entry in the namespace. A modest system has only a
couple of hundred device entries, so this costs a few more
pages. Compare this with the suggestion to put /dev on a ramdisc.

On a typical machine, the cost is under 0.2 percent. On a modest
system with 64 MBytes of RAM, the cost is under 0.1 percent. The
accusations of "bloatware" levelled at devfs are not justified.
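As a rough check of the figures above, the following sketch adds up the
cost for a small system. The page count and the 72 bytes per entry come
from the text; the 4096-byte page size (as on x86) and the figure of 200
device entries are assumptions chosen only for illustration.

```python
# Rough estimate of the devfs memory cost quoted above.
PAGE_SIZE = 4096          # assumption: x86 page size in bytes
CODE_PAGES = 7            # from the text: about 7 pages of code
BYTES_PER_ENTRY = 72      # from the text: 72 bytes per namespace entry


def devfs_cost_bytes(num_entries: int) -> int:
    """Return the approximate devfs footprint in bytes."""
    return CODE_PAGES * PAGE_SIZE + num_entries * BYTES_PER_ENTRY


def cost_percent(num_entries: int, ram_mbytes: int) -> float:
    """Return the devfs footprint as a percentage of total RAM."""
    ram_bytes = ram_mbytes * 1024 * 1024
    return 100.0 * devfs_cost_bytes(num_entries) / ram_bytes


if __name__ == "__main__":
    entries = 200         # assumption: "a couple of hundred" entries
    print(devfs_cost_bytes(entries))            # 43072 bytes, about 42 kBytes
    print(round(cost_percent(entries, 64), 3))  # 0.064 percent of 64 MBytes
```

Under these assumptions the result is consistent with the "under 0.1
percent" figure quoted for a 64 MByte system.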

-----------------------------------------------------------------------------


Why do it?

There are several problems that devfs addresses. Some of these
problems are more serious than others (depending on your point of
view), and some can be solved without devfs. However, the totality of
these problems really calls out for devfs.

The choice is between a patchwork of inefficient user-space solutions,
which are complex and likely to be fragile, and a simple and efficient
devfs which is robust.

There have been many counter-proposals to devfs, all seeking to
provide some of the benefits without actually implementing devfs. So
far there has been an absence of code and no proposed alternative has
been able to provide all the features that devfs does. Further,
alternative proposals require far more complexity in user-space (and
still deliver less functionality than devfs). Some people have the
mantra of reducing "kernel bloat", but don't consider the effects on
user-space.

A good solution limits the total complexity of kernel-space and
user-space.


Major&minor allocation

The existing scheme requires the allocation of major and minor device
numbers for each and every device. This means that a central
co-ordinating authority is required to issue these device numbers
(unless you're developing a "private" device driver), in order to
preserve uniqueness. Devfs shifts the burden to a namespace. This may
not seem like a huge benefit, but actually it is. Since driver authors
will naturally choose a device name which reflects the functionality
of the device, there is far less potential for namespace conflict.
Solving this requires a kernel change.

/dev management

Because you currently access devices through device nodes, these must
be created by the system administrator.
For standard devices you can
usually find a MAKEDEV programme which creates all these (hundreds!)
of nodes. This means that changes in the kernel must be reflected by
changes in the MAKEDEV programme, or else the system administrator
creates device nodes by hand.

The basic problem is that there are two separate databases of
major and minor numbers. One is in the kernel and one is in /dev (or
in a MAKEDEV programme, if you want to look at it that way). This is
duplication of information, which is not good practice.
Solving this requires a kernel change.

/dev growth

A typical /dev has over 1200 nodes! Most of these devices simply don't
exist because the hardware is not available. A huge /dev increases the
time to access devices (I'm just referring to the dentry lookup times
and the time taken to read inodes off disc: the next subsection shows
some more horrors).

An example of how big /dev can grow is if we consider SCSI devices:

host           6  bits  (say up to 64 hosts on a really big machine)
channel        4  bits  (say up to 16 SCSI buses per host)
id             4  bits
lun            3  bits
partition      6  bits
TOTAL          23 bits


This requires 8 Mega (1 Mega = 1024*1024) inodes if we want to store
all possible device nodes. Even if we scrap everything but id,partition
and assume a single host adapter with a single SCSI bus and only one
logical unit per SCSI target (id), that's still 10 bits or 1024
inodes. Each VFS inode takes around 256 bytes (kernel 2.1.78), so
that's 256 kBytes of inode storage on disc (assuming real inodes take
a similar amount of space as VFS inodes). This is actually not so bad,
because disc is cheap these days. Embedded systems would care about
256 kBytes of /dev inodes, but you could argue that embedded systems
would have hand-tuned /dev directories. I've had to do just that on my
embedded systems, but I would rather just leave it to devfs.
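The arithmetic in the SCSI example can be reproduced with a short
sketch. The bit widths and the 256-byte inode size are taken from the
text; the names FIELD_BITS and node_count are made up for this
illustration.

```python
# Size of a fully populated /dev for SCSI discs, using the bit widths above.
FIELD_BITS = {"host": 6, "channel": 4, "id": 4, "lun": 3, "partition": 6}
INODE_BYTES = 256  # from the text: approximate VFS inode size (kernel 2.1.78)


def node_count(fields):
    """Number of distinct device nodes addressable by the given fields."""
    return 2 ** sum(FIELD_BITS[name] for name in fields)


if __name__ == "__main__":
    all_nodes = node_count(FIELD_BITS)             # all 23 bits
    small_nodes = node_count(["id", "partition"])  # only 10 bits
    print(all_nodes)                          # 8388608, i.e. 8 * 1024 * 1024
    print(small_nodes)                        # 1024
    print(small_nodes * INODE_BYTES // 1024)  # 256 (kBytes of inode storage)
```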

Another issue is the time taken to look up an inode when first
referenced. This costs not only the time to scan through a list in
memory, but also the seek times to read the inodes off disc.
This could be solved in user-space using a clever programme which
scanned the kernel logs, deleted /dev entries which are not
available and created them when they became available. This programme
would need to be run every time a new module was loaded, which would
slow things down a lot.

There is an existing programme called scsidev which will automatically
create device nodes for SCSI devices. It can do this by scanning files
in /proc/scsi. Unfortunately, to extend this idea to other device
nodes would require significant modifications to existing drivers (so
they too would provide information in /proc). This is a non-trivial
change (I should know: devfs has had to do something similar). Once
you go to this much effort, you may as well use devfs itself (which
also provides this information). Furthermore, such a system would
likely be implemented in an ad-hoc fashion, as different drivers will
provide their information in different ways.

Devfs is much cleaner, because it (naturally) has a uniform mechanism
to provide this information: the device nodes themselves!


Node to driver file_operations translation

There is an important difference between the way disc-based character
and block nodes and devfs entries make the connection between an entry
in /dev and the actual device driver.

With the current 8 bit major and minor numbers the connection between
disc-based c&b nodes and per-major drivers is done through a
fixed-length table of 128 entries. The various filesystem types set
the inode operations for c&b nodes to {chr,blk}dev_inode_operations,
so when a device is opened a few quick levels of indirection bring us
to the driver file_operations.

For miscellaneous character devices a second step is required: there
is a scan for the driver entry with the same minor number as the file
that was opened, and the appropriate minor open method is called. This
scanning is done *every time* you open a device node. Potentially, you
may be searching through dozens of misc. entries before you find your
open method. While not an enormous performance overhead, this does
seem pointless.

Linux *must* move beyond the 8 bit major and minor barrier,
somehow. If we simply increase each to 16 bits, then the indexing
scheme used for major driver lookup becomes untenable, because the
major tables (one each for character and block devices) would need to
be 64 k entries long (512 kBytes on x86, 1 MByte for 64 bit
systems). So we would have to use a scheme like that used for
miscellaneous character devices, which means the search time goes up
linearly with the average number of major device drivers on your
system. Not all "devices" are hardware, some are higher-level drivers
like KGI, so you can get more "devices" without adding hardware.
You can improve this by creating an ordered (balanced:-)
binary tree, in which case your search time becomes log(N).
Alternatively, you can use hashing to speed up the search.
But why do that search at all if you don't have to? Once again, it
seems pointless.

Note that devfs doesn't use the major&minor system. For devfs
entries, the connection is done when you look up the /dev entry. When
devfs_register() is called, an entry holding the entry name and the
file_operations is appended to an internal table. If the dentry cache
doesn't have the /dev entry already, this internal table is scanned to
get the file_operations, and an inode is created.
If the dentry cache already
has the entry, there is *no lookup time* (other than the dentry scan
itself, but we can't avoid that anyway, and besides Linux dentries
cream other OS's which don't have them:-). Furthermore, the number of
node entries in a devfs is only the number of available device
entries, not the number of *conceivable* entries. Even if you remove
unnecessary entries in a disc-based /dev, the number of conceivable
entries remains the same: you just limit yourself in order to save
space.

Devfs provides a fast connection between a VFS node and the device
driver, in a scalable way.

/dev as a system administration tool

Right now /dev contains a list of conceivable devices, most of which I
don't have. Devfs only shows those devices available on my
system. This means that listing /dev is a handy way of checking what
devices are available.

Major&minor size

Existing major and minor numbers are limited to 8 bits each. This is
now a limiting factor for some drivers, particularly the SCSI disc
driver, which consumes a single major number. Only 16 discs are
supported, and each disc may have only 15 partitions. Maybe this isn't
a problem for you, but some of us are building huge Linux systems with
disc arrays. With devfs an arbitrary pointer can be associated with
each device entry, which can be used to give an effective 32 bit
device identifier (i.e. that's like having a 32 bit minor
number). Since this is private to the kernel, there are no C library
compatibility issues which you would have with increasing major and
minor number sizes. See the section on "Allocation of Device Numbers"
for details on maintaining compatibility with userspace.

Solving this requires a kernel change.

Since writing this, the kernel has been modified so that the SCSI disc
driver has more major numbers allocated to it and now supports up to
128 discs.
Since these major numbers are non-contiguous (a result of
unplanned expansion), the implementation is a little more cumbersome
than originally.

Just like the changes to IPv4 to fix impending limitations in the
address space, people find ways around the limitations. In the long
run, however, solutions like IPv6 or devfs can't be put off forever.

Read-only root filesystem

Having your device nodes on the root filesystem means that you can't
operate properly with a read-only root filesystem. This is because you
want to change ownerships and protections of tty devices. Existing
practice prevents you using a CD-ROM as your root filesystem for a
*real* system. Sure, you can boot off a CD-ROM, but you can't change
tty ownerships, so it's only good for installing.

Also, you can't use a shared NFS root filesystem for a cluster of
discless Linux machines (having tty ownerships changed on a common
/dev is not good). Nor can you embed your root filesystem in a
ROM-FS.

You can get around this by creating a RAMDISC at boot time, making
an ext2 filesystem in it, mounting it somewhere and copying the
contents of /dev into it, then unmounting it and mounting it over
/dev.

A devfs is a cleaner way of solving this.

Non-Unix root filesystem

Non-Unix filesystems (such as NTFS) can't be used for a root
filesystem because they variously don't support character and block
special files or symbolic links. You can't have a separate disc-based
or RAMDISC-based filesystem mounted on /dev because you need device
nodes before you can mount these. Devfs can be mounted without any
device nodes. Devlinks won't work because symlinks aren't supported.
An alternative solution is to use initrd to mount a RAMDISC initial
root filesystem (which is populated with a minimal set of device
nodes), and then construct a new /dev in another RAMDISC, and finally
switch to your non-Unix root filesystem. This requires clever boot
scripts and a fragile and conceptually complex boot procedure.

Devfs solves this in a robust and conceptually simple way.

PTY security

Current pseudo-tty (pty) devices are owned by root and read-writable
by everyone. The user of a pty-pair cannot change
ownership/protections without being suid-root.

This could be solved with a secure user-space daemon which runs as
root and does the actual creation of pty-pairs. Such a daemon would
require modification to *every* programme that wants to use this new
mechanism. It also slows down creation of pty-pairs.

An alternative is to create a new open_pty() syscall which does much
the same thing as the user-space daemon. Once again, this requires
modifications to pty-handling programmes.

The devfs solution allows a device driver to "tag" certain device
files so that when an unopened device is opened, the ownerships are
changed to the current euid and egid of the opening process, and the
protections are changed to the default registered by the driver. When
the device is closed ownership is set back to root and protections are
set back to read-write for everybody. No programme need be changed.
The devpts filesystem provides this auto-ownership feature for Unix98
ptys. It doesn't support old-style pty devices, nor does it have all
the other features of devfs.

Intelligent device management

Devfs implements a simple yet powerful protocol for communication with
a device management daemon (devfsd) which runs in user space.
It is
possible to send a message (either synchronously or asynchronously) to
devfsd on any event, such as registration/unregistration of device
entries, opening and closing devices, looking up inodes, scanning
directories and more. This has many possibilities. Some of these are
already implemented. See:


http://www.atnf.csiro.au/~rgooch/linux/

Device entry registration events can be used by devfsd to change
permissions of newly-created device nodes. This is one mechanism to
control device permissions.

Device entry registration/unregistration events can be used to run
programmes or scripts. This can be used to provide automatic mounting
of filesystems when a new block device medium is inserted into the
drive.

Asynchronous device open and close events can be used to implement
clever permissions management. For example, the default permissions on
/dev/dsp do not allow everybody to read from the device. This is
sensible, as you don't want some remote user recording what you say at
your console. However, the console user is also prevented from
recording. This behaviour is not desirable. With asynchronous device
open and close events, you can have devfsd run a programme or script
when console devices are opened to change the ownerships for *other*
device nodes (such as /dev/dsp). On closure, you can run a different
script to restore permissions. An advantage of this scheme over
modifying the C library tty handling is that this works even if your
programme crashes (how many times have you seen the utmp database with
lingering entries for non-existent logins?).

Synchronous device open events can be used to perform intelligent
device access protections. Before the device driver open() method is
called, the daemon must first validate the open attempt, by running an
external programme or script.
This is far more flexible than access
control lists, as access can be determined on the basis of other
system conditions instead of just the UID and GID.

Inode lookup events can be used to authenticate module autoload
requests. Instead of using kmod directly, the event is sent to
devfsd which can implement an arbitrary authentication before loading
the module itself.

Inode lookup events can also be used to construct arbitrary
namespaces, without having to resort to populating devfs with symlinks
to devices that don't exist.

Speculative Device Scanning

Consider an application (like cdparanoia) that wants to find all
CD-ROM devices on the system (SCSI, IDE and other types), whether or
not their respective modules are loaded. The application must
speculatively open certain device nodes (such as /dev/sr0 for the SCSI
CD-ROMs) in order to make sure the module is loaded. This requires
that all Linux distributions follow the standard device naming scheme
(last time I looked RedHat did things differently). Devfs solves the
naming problem.

The same application also wants to see which devices are actually
available on the system. With the existing system it needs to read the
/dev directory and speculatively open each /dev/sr* device to
determine if the device exists or not. With a large /dev this is an
inefficient operation, especially if there are many /dev/sr* nodes. A
solution like scsidev could reduce the number of /dev/sr* entries (but
of course that also requires all that inefficient directory scanning).

With devfs, the application can open the /dev/sr directory
(which triggers the module autoloading if required), and proceed to
read /dev/sr. Since only the available devices will have
entries, there are no inefficiencies in directory scanning or device
openings.
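From the application's side, the devfs approach amounts to a single
directory read. The sketch below is only an illustration in user space:
the function name list_available_devices is made up, /dev/sr is the
directory named in the text, and the module autoloading on lookup is
done by the kernel and devfsd, not by this code.

```python
import os


def list_available_devices(directory):
    """Return the sorted entry names found in a device directory.

    Under devfs only devices that actually exist have entries, so one
    directory read replaces speculative opens of every possible node.
    Returns an empty list if the directory does not exist.
    """
    try:
        with os.scandir(directory) as entries:
            return sorted(entry.name for entry in entries)
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    # On a system without devfs, /dev/sr will normally be absent and
    # nothing is printed.
    for name in list_available_devices("/dev/sr"):
        print(os.path.join("/dev/sr", name))
```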

-----------------------------------------------------------------------------

Who else does it?

FreeBSD has a devfs implementation. Solaris and AIX each have a
pseudo-devfs (something akin to scsidev but for all devices, with some
unspecified kernel support). BeOS, Plan9 and QNX also have it. SGI's
IRIX 6.4 and above also have a device filesystem.

While we shouldn't just automatically do something because others do
it, we should not ignore the work of others either. FreeBSD has a lot
of competent people working on it, so their opinion should not be
blithely ignored.

-----------------------------------------------------------------------------


How it works

Registering device entries

For every entry (device node) in a devfs-based /dev a driver must call
devfs_register(). This adds the name of the device entry, the
file_operations structure pointer and a few other things to an
internal table. Device entries may be added and removed at any
time. When a device entry is registered, it automagically appears in
any mounted devfs.

Inode lookup

When a lookup operation on an entry is performed and there is no
driver information for that entry, devfs will attempt to call
devfsd. If still no driver information can be found then a negative
dentry is yielded and the next stage operation will be called by the
VFS (such as create() or mknod() inode methods). If driver information
can be found, an inode is created (if one does not exist already) and
all is well.

Manually creating device nodes

The mknod() method allows you to create an ordinary named pipe in the
devfs, or you can create a character or block special inode if one
does not already exist. You may wish to create a character or block
special inode so that you can set permissions and ownership.
Later, if
a device driver registers an entry with the same name, the
permissions, ownership and times are retained. This is how you can set
the protections on a device even before the driver is loaded. Once you
create an inode it appears in the directory listing.

Unregistering device entries

A device driver calls devfs_unregister() to unregister an entry.

Chroot() gaols

2.2.x kernels

The semantics of inode creation are different when devfs is mounted
with the "explicit" option. Now, when a device entry is registered, it
will not appear until you use mknod() to create the device. It doesn't
matter if you mknod() before or after the device is registered with
devfs_register(). The purpose of this behaviour is to support
chroot(2) gaols, where you want to mount a minimal devfs inside the
gaol. Only the devices you specifically want to be available (through
your mknod() setup) will be accessible.

2.4.x kernels

As of kernel 2.3.99, the VFS has had the ability to rebind parts of
the global filesystem namespace into another part of the namespace.
This now works even at the leaf-node level, which means that
individual files and device nodes may be bound into other parts of the
namespace. This is like making links, but better, because it works
across filesystems (unlike hard links) and works through chroot()
gaols (unlike symbolic links).

Because of these improvements to the VFS, the multi-mount capability
in devfs is no longer needed. The administrator may create a minimal
device tree inside a chroot(2) gaol by using VFS bindings. As this
provides most of the features of the devfs multi-mount capability, I
removed the multi-mount support code (after issuing an RFC). This
yielded code size reductions and simplifications.

If you want to construct a minimal chroot() gaol, the following
command should suffice:

mount --bind /dev/null /gaol/dev/null


Repeat for other device nodes you want to expose. Simple!

-----------------------------------------------------------------------------


Operational issues


Instructions for the impatient

Nobody likes reading documentation. People just want to get in there
and play. So this section tells you quickly the steps you need to take
to run with devfs mounted over /dev. Skip these steps and you will end
up with a nearly unbootable system. Subsequent sections describe the
issues in more detail, and discuss non-essential configuration
options.

Devfsd
OK, if you're reading this, I assume you want to play with
devfs. First you should ensure that /usr/src/linux contains a
recent kernel source tree. Then you need to compile devfsd, the device
management daemon, available at

http://www.atnf.csiro.au/~rgooch/linux/.
Because the kernel has a naming scheme
which is quite different from the old naming scheme, you need to
install devfsd so that software and configuration files that use the
old naming scheme will not break.

Compile and install devfsd. You will be provided with a default
configuration file /etc/devfsd.conf which will provide
compatibility symlinks for the old naming scheme. Don't change this
config file unless you know what you're doing. Even if you think you
do know what you're doing, don't change it until you've followed all
the steps below and booted a devfs-enabled system and verified that it
works.

Now edit your main system boot script so that devfsd is started at the
very beginning (before any filesystem
checks). /etc/rc.d/rc.sysinit is often the main boot script
on systems with SysV-style boot scripts. On systems with BSD-style
boot scripts it is often /etc/rc.
Also check
/sbin/rc.

NOTE that the line you put into the boot
script should be exactly:

/sbin/devfsd /dev

DO NOT use some special daemon-launching
programme, otherwise the boot script may not wait for devfsd to finish
initialising.

System Libraries
There may still be some problems because of broken software making
assumptions about device names. In particular, some software does not
handle devices which are symbolic links. If you are running a libc 5
based system, install libc 5.4.44 (if you have libc 5.4.46, go back to
libc 5.4.44, which is actually correct). If you are running a glibc
based system, make sure you have glibc 2.1.3 or later.

/etc/securetty
PAM (Pluggable Authentication Modules) is supposed to be a flexible
mechanism for providing better user authentication and access to
services. Unfortunately, it's also fragile, complex and undocumented
(check out RedHat 6.1, and probably other distributions as well). PAM
has problems with symbolic links. Append the following lines to your
/etc/securetty file:

vc/1
vc/2
vc/3
vc/4
vc/5
vc/6
vc/7
vc/8

This will not weaken security. If you have a version of util-linux
earlier than 2.10.h, please upgrade to 2.10.h or later. If you
absolutely cannot upgrade, then also append the following lines to
your /etc/securetty file:

1
2
3
4
5
6
7
8

This may potentially weaken security by allowing root logins over the
network (a password is still required, though). However, since there
are problems with dealing with symlinks, I'm suspicious of the level
of security offered in any case.

XFree86
While not essential, it's probably a good idea to upgrade to XFree86
4.0, as patches went in to make it more devfs-friendly.
If you don't,
you'll probably need to apply the following patch to
/etc/security/console.perms so that ordinary users can run
startx. Note that not all distributions have this file (e.g. Debian),
so if it's not present, don't worry about it.

--- /etc/security/console.perms.orig    Sat Apr 17 16:26:47 1999
+++ /etc/security/console.perms Fri Feb 25 23:53:55 2000
@@ -14,7 +14,7 @@
 # man 5 console.perms
 
 # file classes -- these are regular expressions
-<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
+<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
 
 # device classes -- these are shell-style globs
 <floppy>=/dev/fd[0-1]*

If the patch does not apply, then replace the line:

<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9]

with:

<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9]


Disable devpts
I've had a report of devpts mounted on /dev/pts not working
correctly. Since devfs will also manage /dev/pts, there is no
need to mount devpts as well. You should either edit your
/etc/fstab so devpts is not mounted, or disable devpts from
your kernel configuration.

Unsupported drivers
Not all drivers have devfs support. If you depend on one of these
drivers, you will need to create a script or tarfile that you can use
at boot time to create device nodes as appropriate. The section
"Dealing with drivers without devfs support" describes this. The
section "Device drivers currently ported" lists the drivers which have
devfs support.

/dev/mouse

Many distributions configure /dev/mouse to be the mouse device
for XFree86 and GPM. I actually think this is a bad idea, because it
adds another level of indirection. When looking at a config file, if
you see /dev/mouse you're left wondering which mouse
is being referred to. Hence I recommend putting the actual mouse
device (for example /dev/psaux) into your
/etc/X11/XF86Config file (and similarly for the GPM
configuration file).

Alternatively, use the same technique used for unsupported drivers
described above.

The Kernel
Finally, you need to make sure devfs is compiled into your kernel. Set
CONFIG_EXPERIMENTAL=y, CONFIG_DEVFS_FS=y and CONFIG_DEVFS_MOUNT=y
using your favourite configuration tool (e.g. make config or
make xconfig), then run make clean and recompile your kernel and
modules. At boot, devfs will be mounted onto /dev.

If you encounter problems booting (for example if you forgot a
configuration step), you can pass devfs=nomount at the kernel
boot command line. This will prevent the kernel from mounting devfs at
boot time onto /dev.

In general, a kernel built with CONFIG_DEVFS_FS=y but without mounting
devfs onto /dev is completely safe, and requires no
configuration changes. One exception to take note of is when
LABEL= directives are used in /etc/fstab. In this
case you will be unable to boot properly. This is because the
mount(8) programme uses /proc/partitions as part of
the volume label search process, and the device names it finds are not
available, because setting CONFIG_DEVFS_FS=y changes the names in
/proc/partitions, irrespective of whether devfs is mounted.

Now you've finished all the steps required. You're now ready to boot
your shiny new kernel. Enjoy.

Changing the configuration

OK, you've now booted a devfs-enabled system, and everything works.
Now you may feel like changing the configuration (common targets are
/etc/fstab and /etc/devfsd.conf). Since you have a
system that works, if you make any changes and it doesn't work, you
now know that you only have to restore your configuration files to the
default and it will work again.


Permissions persistence across reboots

If you don't use mknod(2) to create a device file, nor use chmod(2) or
chown(2) to change the ownerships/permissions, the inode ctime will
remain at 0 (the epoch, 12 am, 1-JAN-1970, GMT). Anything with a ctime
later than this has had its ownership/permissions changed. Hence, a
simple script or programme may be used to tar up all changed inodes,
prior to shutdown. Although effective, many consider this approach a
kludge.

A much better approach is to use devfsd to save and restore
permissions. It may be configured to record changes in permissions and
will save them in a database (in fact a directory tree), and restore
these upon boot. This is an efficient method and results in immediate
saving of current permissions (unlike the tar approach, which saves
permissions at some unspecified future time).

The default configuration file supplied with devfsd has config entries
which you may uncomment to enable persistence management.

If you decide to use the tar approach anyway, be aware that tar will
first unlink(2) an inode before creating a new device node. The
unlink(2) has the effect of breaking the connection between a devfs
entry and the device driver. If you use the "devfs=only" boot option,
you lose access to the device driver, requiring you to reload the
module. I consider this a bug in tar (there is no real need to
unlink(2) the inode first).

Alternatively, you can use devfsd to provide more sophisticated
management of device permissions. You can use devfsd to store
permissions for whole groups of devices with a single configuration
entry, rather than the conventional single entry per device entry.

Permissions database stored in mounted-over /dev

If you wish to save and restore your device permissions into the
disc-based /dev while still mounting devfs onto /dev
you may do so.
This requires a 2.4.x kernel (in fact, 2.3.99 or
later), which has the VFS binding facility. You need to do the
following to set this up:

make sure the kernel does not mount devfs at boot time

make sure you have a correct /dev/console entry in your
root file-system (where your disc-based /dev lives)

create the /dev-state directory

add the following lines near the very beginning of your boot
scripts:

mount --bind /dev /dev-state
mount -t devfs none /dev
devfsd /dev

add the following lines to your /etc/devfsd.conf file:

REGISTER        ^pt[sy]         IGNORE
CREATE          ^pt[sy]         IGNORE
CHANGE          ^pt[sy]         IGNORE
DELETE          ^pt[sy]         IGNORE
REGISTER        .*              COPY    /dev-state/$devname $devpath
CREATE          .*              COPY    $devpath /dev-state/$devname
CHANGE          .*              COPY    $devpath /dev-state/$devname
DELETE          .*              CFUNCTION GLOBAL unlink /dev-state/$devname
RESTORE         /dev-state

Note that the sample devfsd.conf file contains these lines,
as well as other sample configurations you may find useful. See the
devfsd distribution

reboot.


Permissions database stored in normal directory

If you are using an older kernel which doesn't support VFS binding,
then you won't be able to have the permissions database in a
mounted-over /dev. However, you can still use a regular
directory to store the database. The sample /etc/devfsd.conf
file above may still be used. You will need to create the
/dev-state directory prior to installing devfsd. If you have
old permissions in /dev, then just copy (or move) the device
nodes over to the new directory.

Which method is better?

The best method is to have the permissions database stored in the
mounted-over /dev.
This is because you will not need to copy
device nodes over to /dev-state, and because it allows you to
switch between devfs and non-devfs kernels, without requiring you to
copy permissions between /dev-state (for devfs) and
/dev (for non-devfs).


Dealing with drivers without devfs support

Currently, not all device drivers in the kernel have been modified to
use devfs. Device drivers which do not yet have devfs support will not
automagically appear in devfs. The simplest way to create device nodes
for these drivers is to unpack a tarfile containing the required
device nodes. You can do this in your boot scripts. All your drivers
will now work as before.

Hopefully for most people devfs will have enough support so that they
can mount devfs directly over /dev without losing most functionality
(i.e. losing access to various devices). As of 22-JAN-1998 (devfs
patch version 10) I am now running this way. All the devices I have
are available in devfs, so I don't lose anything.

WARNING: if your configuration requires the old-style device names
(e.g. /dev/hda1 or /dev/sda1), you must install devfsd and configure
it to maintain compatibility entries. It is almost certain that you
will require this. Note that the kernel creates a compatibility entry
for the root device, so you don't need initrd.

Note that you no longer need to mount devpts if you use Unix98 PTYs,
as devfs can manage /dev/pts itself. This saves you some RAM, as you
don't need to compile and install devpts. Note that some versions of
glibc have a bug with Unix98 pty handling on devfs systems. Contact
the glibc maintainers for a fix. Glibc 2.1.3 has the fix.

Note also that apart from editing /etc/fstab, other things will need
to be changed if you *don't* install devfsd. Some software (like the X
server) hard-wires device names in its source.
It really is much
easier to install devfsd so that compatibility entries are created.
You can then slowly migrate your system to using the new device names
(for example, by starting with /etc/fstab), and then limiting the
compatibility entries that devfsd creates.

IF YOU CONFIGURE TO MOUNT DEVFS AT BOOT, MAKE SURE YOU INSTALL DEVFSD
BEFORE YOU BOOT A DEVFS-ENABLED KERNEL!

Now that devfs has gone into the 2.3.46 kernel, I'm getting a lot of
reports back. Many of these are because people are trying to run
without devfsd, and hence some things break. Please just run devfsd if
things break. I want to concentrate on real bugs rather than
misconfiguration problems at the moment. If people are willing to fix
bugs/false assumptions in other code (e.g. glibc, X server) and submit
that to the respective maintainers, that would be great.


All the way with Devfs

The devfs kernel patch creates a rationalised device tree. As stated
above, if you want to keep using the old /dev naming scheme,
you just need to configure devfsd appropriately (see the man
page). People who prefer the old names can ignore this section. For
those of us who like the rationalised names and an uncluttered
/dev, read on.

If you don't run devfsd, or don't enable compatibility entry
management, then you will have to configure your system to use the new
names. For example, you will then need to edit your
/etc/fstab to use the new disc naming scheme. If you want to
be able to boot non-devfs kernels, you will need compatibility
symlinks in the underlying disc-based /dev pointing back to
the old-style names for when you boot a kernel without devfs.

You can selectively decide which devices you want compatibility
entries for. For example, you may only want compatibility entries for
BSD pseudo-terminal devices (otherwise you'll have to patch your C
library or use Unix98 ptys instead).
It's just a matter of putting
the correct regular expression into /etc/devfsd.conf.

There are other choices of naming schemes that you may prefer. For
example, I don't use the kernel-supplied
names, because they are too verbose. A common misconception is
that the kernel-supplied names are meant to be used directly in
configuration files. This is not the case. They are designed to
reflect the layout of the devices attached and to provide easy
classification.

If you like the kernel-supplied names, that's fine. If you don't then
you should be using devfsd to construct a namespace more to your
liking. Devfsd has built-in code to construct a
namespace that is both logical and easy to
manage. In essence, it creates a convenient abbreviation of the
kernel-supplied namespace.

You are of course free to build your own namespace. Devfsd has all the
infrastructure required to make this easy for you. All you need do is
write a script. You can even write some C code and devfsd can load the
shared object as a callable extension.


Other Issues

The init programme
Another thing to take note of is whether your init programme
creates a Unix socket /dev/telinit. Some versions of init
create /dev/telinit so that the telinit programme can
communicate with the init process. If you have such a system you need
to make sure that devfs is mounted over /dev *before* init
starts. In other words, you can't leave the mounting of devfs to
/etc/rc, since this is executed after init. Other
versions of init require a named pipe /dev/initctl
which must exist *before* init starts. Once again, you need to
mount devfs and then create the named pipe *before* init
starts.

The default behaviour now is not to mount devfs onto /dev at
boot time for 2.3.x and later kernels. You can correct this with the
"devfs=mount" boot option.
This solves any problems with init,
and also prevents the dreaded:

Cannot open initial console

message. For 2.2.x kernels where you need to apply the devfs patch,
the default is to mount.

If you have automatic mounting of devfs onto /dev then you
may need to create /dev/initctl in your boot scripts. The
following lines should suffice:

mknod /dev/initctl p
kill -SIGUSR1 1       # tell init that /dev/initctl now exists

Alternatively, if you don't want the kernel to mount devfs onto
/dev then the following procedure is a
guideline for how to get around /dev/initctl problems:

# cd /sbin
# mv init init.real
# cat > init
#! /bin/sh
mount -n -t devfs none /dev
mknod /dev/initctl p
exec /sbin/init.real "$@"
[control-D]
# chmod a+x init

Note that newer versions of init create /dev/initctl
automatically, so you don't have to worry about this.

Module autoloading
You will need to configure devfsd to enable module
autoloading. The following lines should be placed in your
/etc/devfsd.conf file:

LOOKUP  .*              MODLOAD


As of devfsd-v1.3.10, a generic /etc/modules.devfs
configuration file is installed, which is used by the MODLOAD
action. This should be sufficient for most configurations. If you
require further configuration, edit your /etc/modules.conf
file. The way module autoloading works with devfs is:

a process attempts to lookup a device node (e.g.
/dev/fred)

if that device node does not exist, the full pathname is passed to
devfsd as a string

devfsd will pass the string to the modprobe programme (provided the
configuration line shown above is present), and specify that
/etc/modules.devfs is the configuration file

/etc/modules.devfs includes /etc/modules.conf to
access local configurations

modprobe will search its configuration files, looking for an alias
that translates the pathname into a module name

the translated pathname is then used to load the module.


If you wanted a lookup of /dev/fred to load the
mymod module, you would require the following configuration
line in /etc/modules.conf:

alias    /dev/fred    mymod

The /etc/modules.devfs configuration file provides many such
aliases for standard device names. If you look closely at this file,
you will note that some modules require multiple alias configuration
lines. This is required to support module autoloading for old and new
device names.

Mounting root off a devfs device
If you wish to mount root off a devfs device when you pass the
"devfs=only" boot option, then you need to pass in the
"root=<device>" option to the kernel when booting. If you use
LILO, then you must have this in lilo.conf:

append = "root=<device>"

Surprised? Yep, so was I. It turns out if you have (as most people
do):

root = <device>


then LILO will determine the device number of <device> and will
write that device number into a special place in the kernel image
before starting the kernel, and the kernel will use that device number
to mount the root filesystem. So, using the "append" variety ensures
that LILO passes the root filesystem device as a string, which devfs
can then use.

Note that this isn't an issue if you don't pass "devfs=only".
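To see which of the two lilo.conf forms described above a given file
uses, a check along the following lines may do. This is only a sketch:
the function name lilo_root_style and the grep patterns are
assumptions, and the patterns only look for the two forms shown in the
text, not for everything LILO accepts.

```shell
# lilo_root_style FILE
# Hypothetical helper: print how a lilo.conf-style file names the root
# device.
#   string - append = "... root=<device> ...", passed on as text
#   number - root = <device>, which LILO turns into a device number
#   none   - neither form was found
# The append form is tested first, so a file with both prints "string".
lilo_root_style() {
    if grep -Eq '^[[:space:]]*append[[:space:]]*=.*root=' "$1"; then
        echo string
    elif grep -Eq '^[[:space:]]*root[[:space:]]*=' "$1"; then
        echo number
    else
        echo none
    fi
}
```

If you boot with "devfs=only" and this prints "number" for your
lilo.conf, that is the situation the text above warns about.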

TTY issues
The ttyname(3) function in some versions of the C library makes
false assumptions about device entries which are symbolic links. The
tty(1) programme is one that depends on this function. I've
written a patch to libc 5.4.43 which fixes this. This has been
included in libc 5.4.44 and a similar fix is in glibc 2.1.3.


Kernel Naming Scheme

The kernel provides a default naming scheme. This scheme is designed
to make it easy to search for specific devices or device types, and to
view the available devices. Some device types (such as hard discs)
have a directory of entries, making it easy to see what devices of
that class are available. Often, the entries are symbolic links into a
directory tree that reflects the topology of available devices. The
topological tree is useful for finding how your devices are arranged.

Below is a list of the naming schemes for the most common drivers. A
list of reserved device names is
available for reference. Please send email to
rgooch@atnf.csiro.au to obtain an allocation. Please be
patient (the maintainer is busy). An alternative name may be allocated
instead of the requested name, at the discretion of the maintainer.

Disc Devices

All discs, whether SCSI, IDE or whatever, are placed under the
/dev/discs hierarchy:

  /dev/discs/disc0      first disc
  /dev/discs/disc1      second disc


Each of these entries is a symbolic link to the directory for that
device. The device directory contains:

  disc                  for the whole disc
  part*                 for individual partitions


CD-ROM Devices

All CD-ROMs, whether SCSI, IDE or whatever, are placed under the
/dev/cdroms hierarchy:

  /dev/cdroms/cdrom0    first CD-ROM
  /dev/cdroms/cdrom1    second CD-ROM


Each of these entries is a symbolic link to the real device entry for
that device.
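Since /dev/discs and /dev/cdroms are directories of symbolic links, a
short loop is enough to see where each entry points. The following is a
minimal sketch; the function name list_device_links is an assumption,
and it relies on the readlink programme being installed.

```shell
# list_device_links DIR
# Hypothetical helper: print "name -> target" for every symbolic link
# found directly under DIR (for example /dev/discs or /dev/cdroms).
list_device_links() {
    for entry in "$1"/*; do
        # Skip anything that is not a symbolic link. This also skips
        # the unexpanded pattern when DIR is empty or does not exist.
        [ -L "$entry" ] || continue
        printf '%s -> %s\n' "${entry##*/}" "$(readlink "$entry")"
    done
}
```

Running list_device_links /dev/discs on a devfs system should show one
line per disc, with the target being the directory for that device.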

Tape Devices

All tapes, whether SCSI, IDE or whatever, are placed under the
/dev/tapes hierarchy:

  /dev/tapes/tape0      first tape
  /dev/tapes/tape1      second tape


Each of these entries is a symbolic link to the directory for that
device. The device directory contains:

  mt                    for mode 0
  mtl                   for mode 1
  mtm                   for mode 2
  mta                   for mode 3
  mtn                   for mode 0, no rewind
  mtln                  for mode 1, no rewind
  mtmn                  for mode 2, no rewind
  mtan                  for mode 3, no rewind


SCSI Devices

To uniquely identify any SCSI device requires the following
information:

  controller    (host adapter)
  bus           (SCSI channel)
  target        (SCSI ID)
  unit          (Logical Unit Number)


All SCSI devices are placed under /dev/scsi (assuming devfs
is mounted on /dev). Hence, a SCSI device with the following
parameters: c=1,b=2,t=3,u=4 would appear as:

  /dev/scsi/host1/bus2/target3/lun4     device directory


Inside this directory, a number of device entries may be created,
depending on which SCSI device-type drivers were installed.

See the section on the disc naming scheme to see what entries the SCSI
disc driver creates.

See the section on the tape naming scheme to see what entries the SCSI
tape driver creates.

The SCSI CD-ROM driver creates:

  cd


The SCSI generic driver creates:

  generic


IDE Devices

To uniquely identify any IDE device requires the following
information:

  controller
  bus           (aka. primary/secondary)
  target        (aka. master/slave)
  unit


All IDE devices are placed under /dev/ide, and use a similar
naming scheme to the SCSI subsystem.

XT Hard Discs

All XT discs are placed under /dev/xd. The first XT disc has
the directory /dev/xd/disc0.
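As a worked instance of the SCSI naming rule given earlier in this
section, the sketch below builds the device directory from the four
parameters, and the disc or partition entry inside it. The function
names are assumptions made for illustration, and /dev is hard-wired as
the mount point, matching the assumption stated in the text.

```shell
# scsi_devfs_dir CONTROLLER BUS TARGET UNIT
# Hypothetical helper: print the devfs device directory for a SCSI
# device, assuming devfs is mounted on /dev.
scsi_devfs_dir() {
    printf '/dev/scsi/host%s/bus%s/target%s/lun%s\n' "$1" "$2" "$3" "$4"
}

# scsi_disc_entry CONTROLLER BUS TARGET UNIT [PARTITION]
# Print the entry for the whole disc, or for one partition if a
# partition number is given as the fifth argument.
scsi_disc_entry() {
    dir=$(scsi_devfs_dir "$1" "$2" "$3" "$4")
    if [ -n "${5:-}" ]; then
        printf '%s/part%s\n' "$dir" "$5"
    else
        printf '%s/disc\n' "$dir"
    fi
}
```

With the parameters used in the text, scsi_devfs_dir 1 2 3 4 prints
/dev/scsi/host1/bus2/target3/lun4.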

TTY devices

The tty devices now appear as:

  New name                   Old-name                   Device Type
  --------                   --------                   -----------
  /dev/tts/{0,1,...}         /dev/ttyS{0,1,...}         Serial ports
  /dev/cua/{0,1,...}         /dev/cua{0,1,...}          Call out devices
  /dev/vc/0                  /dev/tty                   Current virtual console
  /dev/vc/{1,2,...}          /dev/tty{1...63}           Virtual consoles
  /dev/vcc/{0,1,...}         /dev/vcs{1...63}           Virtual consoles
  /dev/pty/m{0,1,...}        /dev/ptyp??                PTY masters
  /dev/pty/s{0,1,...}        /dev/ttyp??                PTY slaves


RAMDISCS

The RAMDISCS are placed in their own directory, and are named thus:

  /dev/rd/{0,1,2,...}


Meta Devices

The meta devices are placed in their own directory, and are named
thus:

  /dev/md/{0,1,2,...}


Floppy discs

Floppy discs are placed in the /dev/floppy directory.

Loop devices

Loop devices are placed in the /dev/loop directory.

Sound devices

Sound devices are placed in the /dev/sound directory
(audio, sequencer, ...).


Devfsd Naming Scheme

Devfsd provides a naming scheme which is a convenient abbreviation of
the kernel-supplied namespace. In some
cases, the kernel-supplied naming scheme is quite convenient, so
devfsd does not provide another naming scheme. The convenience names
that devfsd creates are in fact the same names as the original devfs
kernel patch created (before Linus mandated the Big Name
Change). These are referred to as "new compatibility entries".

In order to configure devfsd to create these convenience names, the
following lines should be placed in your /etc/devfsd.conf:

REGISTER        .*              MKNEWCOMPAT
UNREGISTER      .*              RMNEWCOMPAT

This will cause devfsd to create (and destroy) symbolic links which
point to the kernel-supplied names.

SCSI Hard Discs

All SCSI discs are placed under /dev/sd (assuming devfs is
mounted on /dev). Hence, a SCSI disc with the following
parameters: c=1,b=2,t=3,u=4 would appear as:

  /dev/sd/c1b2t3u4          for the whole disc
  /dev/sd/c1b2t3u4p5        for the 5th partition
  /dev/sd/c1b2t3u4p5s6      for the 6th slice in the 5th partition


SCSI Tapes

All SCSI tapes are placed under /dev/st. A similar naming
scheme is used as for SCSI discs. A SCSI tape with the
parameters: c=1,b=2,t=3,u=4 would appear as:

  /dev/st/c1b2t3u4m0        for mode 0
  /dev/st/c1b2t3u4m1        for mode 1
  /dev/st/c1b2t3u4m2        for mode 2
  /dev/st/c1b2t3u4m3        for mode 3
  /dev/st/c1b2t3u4m0n       for mode 0, no rewind
  /dev/st/c1b2t3u4m1n       for mode 1, no rewind
  /dev/st/c1b2t3u4m2n       for mode 2, no rewind
  /dev/st/c1b2t3u4m3n       for mode 3, no rewind


SCSI CD-ROMs

All SCSI CD-ROMs are placed under /dev/sr. A similar naming
scheme is used as for SCSI discs. A SCSI CD-ROM with the
parameters: c=1,b=2,t=3,u=4 would appear as:

  /dev/sr/c1b2t3u4


SCSI Generic Devices

The generic (aka. raw) interfaces for all SCSI devices are placed under
/dev/sg. A similar naming scheme is used as for SCSI discs. A
SCSI generic device with the parameters: c=1,b=2,t=3,u=4 would appear
as:

  /dev/sg/c1b2t3u4


IDE Hard Discs

All IDE discs are placed under /dev/ide/hd, using a similar
convention to SCSI discs. The following mappings exist between the new
and the old names:

  /dev/hda      /dev/ide/hd/c0b0t0u0
  /dev/hdb      /dev/ide/hd/c0b0t1u0
  /dev/hdc      /dev/ide/hd/c0b1t0u0
  /dev/hdd      /dev/ide/hd/c0b1t1u0


IDE Tapes

A similar naming scheme is used as for IDE discs. The entries will
appear in the /dev/ide/mt directory.

IDE CD-ROM

A similar naming scheme is used as for IDE discs.
The entries will
appear in the /dev/ide/cd directory.

IDE Floppies

A similar naming scheme is used as for IDE discs. The entries will
appear in the /dev/ide/fd directory.

XT Hard Discs

All XT discs are placed under /dev/xd. The first XT disc
would appear as /dev/xd/c0t0.


Old Compatibility Names

The old compatibility names are the legacy device names, such as
/dev/hda, /dev/sda, /dev/rtc and so on.
Devfsd can be configured to create compatibility symlinks so that you
may continue to use the old names in your configuration files and so
that old applications will continue to function correctly.

In order to configure devfsd to create these legacy names, the
following lines should be placed in your /etc/devfsd.conf:

REGISTER        .*              MKOLDCOMPAT
UNREGISTER      .*              RMOLDCOMPAT

This will cause devfsd to create (and destroy) symbolic links which
point to the kernel-supplied names.


-----------------------------------------------------------------------------


Device drivers currently ported

- All miscellaneous character devices support devfs (this is done
  transparently through misc_register())

- SCSI discs and generic hard discs

- Character memory devices (null, zero, full and so on)
  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>

- Loop devices (/dev/loop?)

- TTY devices (console, serial ports, terminals and pseudo-terminals)
  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>

- SCSI tapes (/dev/scsi and /dev/tapes)

- SCSI CD-ROMs (/dev/scsi and /dev/cdroms)

- SCSI generic devices (/dev/scsi)

- RAMDISCS (/dev/ram?)

- Meta Devices (/dev/md*)

- Floppy discs (/dev/floppy)

- Parallel port printers (/dev/printers)

- Sound devices (/dev/sound)
  Thanks to Eric Dumas <dumas@linux.eu.org> and
  C. Scott Ananian <cananian@alumni.princeton.edu>

- Joysticks (/dev/joysticks)

- Sparc keyboard (/dev/kbd)

- DSP56001 digital signal processor (/dev/dsp56k)

- Apple Desktop Bus (/dev/adb)

- Coda network file system (/dev/cfs*)

- Virtual console capture devices (/dev/vcc)
  Thanks to Dennis Hou <smilax@mindmeld.yi.org>

- Frame buffer devices (/dev/fb)

- Video capture devices (/dev/v4l)


-----------------------------------------------------------------------------


Allocation of Device Numbers

Devfs allows you to write a driver which doesn't need to allocate a
device number (major&minor numbers) for the internal operation of the
kernel. However, there are a number of userspace programmes that use
the device number as a unique handle for a device. An example is the
find programme, which uses device numbers to determine whether
an inode is on a different filesystem than another inode. The device
number used is the one for the block device which a filesystem is
using. To preserve compatibility with userspace programmes, block
devices using devfs need to have unique device numbers allocated to
them. Furthermore, POSIX specifies device numbers, so some kind of
device number needs to be presented to userspace.

The simplest option (especially when porting drivers to devfs) is to
keep using the old major and minor numbers. Devfs will take whatever
values are given for major&minor and pass them onto userspace.

This device number is a 16 bit number, so this leaves plenty of space
for large numbers of discs and partitions.
This scheme can also be
used for character devices, in particular the tty devices, which are
currently limited to 256 pseudo-ttys (this limits the total number of
simultaneous xterms and remote logins). Note that the device number
is limited to the range 36864-61439 (majors 144-239), in order to
avoid any possible conflicts with existing official allocations.

Please note that using dynamically allocated block device numbers may
break the NFS daemons (both user and kernel mode), which expect dev_t
for a given device to be constant over the lifetime of remote mounts.

A final note on this scheme: since it doesn't increase the size of
device numbers, there are no compatibility issues with userspace.

-----------------------------------------------------------------------------


Questions and Answers


Making things work
Alternatives to devfs
What I don't like about devfs
How to report bugs
Strange kernel messages
Compilation problems with devfsd



Making things work

Here are some common questions and answers.



Devfsd doesn't start

- Make sure you have compiled and installed devfsd
- Make sure devfsd is being started from your boot
  scripts
- Make sure you have configured your kernel to enable devfs (see
  below)
- Make sure devfs is mounted (see below)


Devfsd is not managing all my permissions

Make sure you are capturing the appropriate events. For example,
device entries created by the kernel generate REGISTER events,
but those created by devfsd generate CREATE events.


Devfsd is not capturing all REGISTER events

See the previous entry: you may need to capture CREATE events.


X will not start

Make sure you followed the steps
outlined above.


Why don't my network devices appear in devfs?

This is not a bug.
Network devices have their own, completely separate
namespace. They are accessed via socket(2) and
setsockopt(2) calls, and thus require no device nodes. I have
raised the possibility of moving network devices into the device
namespace, but have had no response.


How can I test if I have devfs compiled into my kernel?

All filesystems built-in or currently loaded are listed in
/proc/filesystems. If you see a devfs entry, then
you know that devfs was compiled into your kernel. If you have
correctly configured and rebuilt your kernel, then devfs will be
built-in. If you think you've configured it in, but
/proc/filesystems doesn't show it, you've made a mistake.
Common mistakes include:

- Using a 2.2.x kernel without applying the devfs patch (if you
  don't know how to patch your kernel, use 2.4.x instead, don't bother
  asking me how to patch)
- Forgetting to set CONFIG_EXPERIMENTAL=y
- Forgetting to set CONFIG_DEVFS_FS=y
- Forgetting to set CONFIG_DEVFS_MOUNT=y (if you want devfs
  to be automatically mounted at boot)
- Editing your .config manually, instead of using make
  config or make xconfig
- Forgetting to run make dep; make clean after changing the
  configuration and before compiling
- Forgetting to compile your kernel and modules
- Forgetting to install your kernel
- Forgetting to install your modules

Please check twice that you've done all these steps before sending in
a bug report.



How can I test if devfs is mounted on /dev?

The device filesystem will always create an entry called
".devfsd", which is used to communicate with the daemon. Even
if the daemon is not running, this entry will exist. Testing for the
existence of this entry is the approved method of determining if devfs
is mounted or not. Note that the type of entry (i.e. regular file,
character device, named pipe, etc.)
may change without notice. Only
the existence of the entry should be relied upon.


When I start devfsd, I see the error:
Error opening file: ".devfsd" No such file or directory?

This means that devfs is not mounted. Make sure you have devfs mounted.


How do I mount devfs?

First make sure you have devfs compiled into your kernel (see
above). Then you will need to do one of the following:

- set CONFIG_DEVFS_MOUNT=y in your kernel config
- pass devfs=mount to your boot loader
- mount devfs manually in your boot scripts with:
  mount -t devfs none /dev



Mount by volume LABEL=<label> doesn't work with
devfs

Most probably you are not mounting devfs onto /dev. What
happens is that if your kernel config has CONFIG_DEVFS_FS=y
then the contents of /proc/partitions will have the devfs
names (such as scsi/host0/bus0/target0/lun0/part1). The
contents of /proc/partitions are used by mount(8) when
mounting by volume label. If devfs is not mounted on /dev,
then mount(8) will fail to find devices. The solution is to
make sure that devfs is mounted on /dev. See above for how to
do that.


I have extra or incorrect entries in /dev

You may have stale entries in your dev-state area. Check for a
RESTORE configuration line in your devfsd configuration
(typically /etc/devfsd.conf). If you have this line, check
the contents of the specified directory for stale entries. Remove
any entries which are incorrect, then reboot.


I get "Unable to open initial console" messages at boot

This usually happens when you don't have devfs automounted onto
/dev at boot time, and there is no valid
/dev/console entry on your root file-system. Create a valid
/dev/console device node.
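The test described under "How can I test if devfs is mounted on /dev?"
is easy to put into a boot script. Here is a minimal sketch; the
function name devfs_is_mounted and its optional directory argument are
assumptions made for illustration. It uses -e rather than a test for a
particular file type, since only the existence of the entry may be
relied upon.

```shell
# devfs_is_mounted [DIR]
# Hypothetical helper: succeed if DIR (default /dev) contains the
# ".devfsd" entry that devfs always creates.
devfs_is_mounted() {
    [ -e "${1:-/dev}/.devfsd" ]
}
```

A boot script could then mount devfs only when needed, using the
command given above:

devfs_is_mounted || mount -t devfs none /dev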



Alternatives to devfs

I've attempted to collate all the anti-devfs proposals and explain
their limitations. Under construction.


Why not just pass device create/remove events to a daemon?

Here the suggestion is to develop an API in the kernel so that devices
can register create and remove events, and a daemon listens for those
events. The daemon would then populate/depopulate /dev (which
resides on disc).

This has several limitations:

it only works for modules loaded and unloaded (or devices inserted
and removed) after the kernel has finished booting. Without a database
of events, there is no way the daemon could fully populate
/dev

if you add a database to this scheme, the question is then how to
present that database to user-space. If you make it a list of strings
with embedded event codes which are passed through a pipe to the
daemon, then this is only of use to the daemon. I would argue that the
natural way to present this data is via a filesystem (since many of
the events will be of a hierarchical nature), such as devfs.
Presenting the data as a filesystem makes it easy for the user to see
what is available and also makes it easy to write scripts to scan the
"database"

the tight binding between device nodes and drivers is no longer
possible (requiring the otherwise perfectly avoidable
table lookups)

you cannot catch inode lookup events on /dev which means
that module autoloading requires device nodes to be created.
This is a problem, particularly for drivers where only a few inodes
are created from a potentially large set


this technique can't be used when the root FS is mounted
read-only



Just implement a better scsidev

This suggestion involves taking the scsidev programme and
extending it to scan for all devices, not just SCSI devices. The
scsidev programme works by scanning /proc/scsi.

Problems:


the kernel does not currently provide a list of all devices
available. Not all drivers register entries in /proc or
generate kernel messages


there is no uniform mechanism to register devices other than the
devfs API


implementing such an API is then the same as the
proposal above



Put /dev on a ramdisc

This suggestion involves creating a ramdisc and populating it with
device nodes and then mounting it over /dev.

Problems:


this doesn't help when mounting the root filesystem, since you
still need a device node to do that


if you want to use this technique for the root device node as
well, you need to use initrd. This complicates the booting sequence
and makes it significantly harder to administer and configure. The
initrd is essentially opaque, robbing the system administrator of easy
configuration


insufficient information is available to correctly populate the
ramdisc. So we come back to the
proposal above to "solve" this


a ramdisc-based solution would take more kernel memory, since the
backing store would be (at best) normal VFS inodes and dentries, which
take 284 bytes and 112 bytes, respectively, for each entry. Compare
that to 72 bytes for devfs



Do nothing: there's no problem

Sometimes people can be heard to claim that the existing scheme is
fine.
This is what they're ignoring:


device number size (8 bits each for major and minor) is a real
limitation, and must be fixed somehow. Systems with large numbers of
SCSI devices, for example, will continue to consume the remaining
unallocated major numbers. USB will also need to push beyond the 8 bit
minor limitation


simply increasing the device number size is insufficient. Apart
from causing a lot of pain, it doesn't solve the management issues
of a /dev with thousands or more device nodes


ignoring the problem of a huge /dev will not make it go
away, and dismisses the legitimacy of a large number of people who
want a dynamic /dev


the standard response then becomes: "write a device management
daemon", which brings us back to the
proposal above



What I don't like about devfs

Here are some common complaints about devfs, and some suggestions and
solutions that may make it more palatable for you. I can't please
everybody, but I do try :-)

I hate the naming scheme

First, remember that no naming scheme will please everybody. You hate
the scheme, others love it. Who's to say who's right and who's wrong?
Ultimately, the person who writes the code gets to choose, and what
exists now is a combination of the choices made by the
devfs author and the
kernel maintainer (Linus).

However, not all is lost. If you want to create your own naming
scheme, it is a simple matter to write a standalone script, hack
devfsd, or write a script called by devfsd. You can create whatever
naming scheme you like.

Further, if you want to remove all traces of the devfs naming scheme
from /dev, you can mount devfs elsewhere (say
/devfs) and populate /dev with links into
/devfs. This population can be automated using devfsd if you
wish.
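
As a rough illustration of the approach just described, the fragment
below links every top-level entry of a devfs mounted elsewhere into
another directory. It is a sketch under assumptions: the function name
link_devfs_entries is invented here, /devfs is only the example mount
point used above, and a real setup would more likely rename entries to
suit its own naming scheme or let devfsd do the work.

```shell
#!/bin/sh
# Sketch only; link_devfs_entries is a hypothetical helper, not part of
# devfs or devfsd.

# Link every top-level entry of SRC (e.g. /devfs) into DST (e.g. /dev).
link_devfs_entries() {
    src=$1
    dst=$2
    for entry in "$src"/*; do
        # Skip the unexpanded glob when SRC is empty.
        [ -e "$entry" ] || continue
        name=${entry##*/}
        # Leave names that already exist in DST alone.
        if [ -e "$dst/$name" ] || [ -L "$dst/$name" ]; then
            continue
        fi
        ln -s "$entry" "$dst/$name"
    done
}

# Example (as root), assuming devfs is to be mounted on /devfs:
#   mount -t devfs none /devfs
#   link_devfs_entries /devfs /dev
```

Existing names in the destination are never overwritten, so the helper
can be run repeatedly. Entries whose names begin with a dot are not
matched by the glob and are therefore skipped.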

You can even use the VFS binding facility to make the links, rather
than using symbolic links. This way, you don't even have to see the
"destination" of these symbolic links.

Devfs puts policy into the kernel

There's already policy in the kernel. Device numbers are in fact
policy (why should the kernel dictate what device numbers I use?).
Face it, some policy has to be in the kernel. The real difference
between device names as policy and device numbers as policy is that
no one will use device numbers directly, because device
numbers are devoid of meaning to humans and are ugly. At least with
the devfs device names, (even though you can add your own naming
scheme) some people will use the devfs-supplied names directly. This
offends some people :-)

Devfs is bloatware

This is not even remotely true. As shown above,
both code and data size are quite modest.


How to report bugs

If you have (or think you have) a bug with devfs, please follow the
steps below:


make sure you have enabled debugging output when configuring your
kernel. You will need to set (at least) the following config options:

CONFIG_DEVFS_DEBUG=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y


please make sure you have the latest devfs patches applied. The
latest kernel version might not have the latest devfs patches applied
yet (Linus is very busy)


save a copy of your complete kernel logs (preferably by
using the dmesg programme) for later inclusion in your bug
report. You may need to use the -s switch to increase the
internal buffer size so you can capture all the boot messages.
Don't edit or trim the dmesg output


try booting with devfs=dall passed to the kernel boot
command line (read the documentation on your bootloader on how to do
this), and save the result to a file. This may be quite verbose, and
it may overflow the messages buffer, but try to get as much of it as
you can


send a copy of your devfsd configuration file(s)


send the bug report to me first.
Don't expect that I will see it if you post it to the linux-kernel
mailing list. Include all the information listed above, plus
anything else that you think might be relevant. Put the string
devfs somewhere in the subject line, so my mail filters mark
it as urgent



Here is a general guide on how to ask questions in a way that greatly
improves your chances of getting a reply:

http://www.tuxedo.org/~esr/faqs/smart-questions.html. If you have
a bug to report, you should also read

http://www.chiark.greenend.org.uk/~sgtatham/bugs.html.


Strange kernel messages

You may see devfs-related messages in your kernel logs. Below are some
messages and what they mean (and what you should do about them, if
anything).


devfs_register(fred): could not append to parent, err: -17

You need to check what the error code means, but usually 17 means
EEXIST. This means that a driver attempted to create an entry
fred in a directory, but there already was an entry with that
name. This is often caused by flawed boot scripts which untar a bunch
of inodes into /dev, as a way to restore permissions. This
message is harmless, as the device nodes will still
provide access to the driver (unless you use the devfs=only
boot option, which is only for dedicated souls:-).
If you want to get rid of these annoying messages, upgrade to
devfsd-v1.3.20 and use the
recommended RESTORE directive to restore permissions.


devfs_mk_dir(bill): using old entry in dir: c1808724 ""

This is similar to the message above, except that a driver attempted
to create a directory named bill, and the parent directory
has an entry with the same name. In this case, to ensure that drivers
continue to work properly, the old entry is re-used and given to the
driver. In 2.5 kernels, the driver is given a NULL entry, and thus,
under rare circumstances, may not create the required device nodes.
The solution is the same as above.



Compilation problems with devfsd

Usually, you can compile devfsd just by typing in
make in the source directory, followed by a make
install (as root). Sometimes, you may have problems, particularly
on broken configurations.


error messages relating to DEVFSD_NOTIFY_DELETE

This happens because you have an ancient set of kernel headers
installed in /usr/include/linux or /usr/src/linux.
Install kernel 2.4.10 or later. You may need to pass the
KERNEL_DIR variable to make (if you did not install
the new kernel sources as /usr/src/linux), or you may copy
the devfs_fs.h file in the kernel source tree into
/usr/include/linux.


-----------------------------------------------------------------------------


Other resources


Douglas Gilbert has written a useful document at

http://www.torque.net/sg/devfs_scsi.html which
explores the SCSI subsystem and how it interacts with devfs


Douglas Gilbert has written another useful document at

http://www.torque.net/scsi/SCSI-2.4-HOWTO/ which
discusses the Linux SCSI subsystem in 2.4.


Johannes Erdfelt has started a discussion paper on Linux and
hot-swap devices, describing what the requirements are for a scalable
solution and how and why he's used devfs+devfsd. Note that this is an
early draft only, available in plain text form at:

http://johannes.erdfelt.com/hotswap.txt.
Johannes has promised an HTML version will follow.


I presented an invited paper at the
2nd Annual Storage Management Workshop held in Miami, Florida,
U.S.A. in October 2000.


-----------------------------------------------------------------------------


Translations of this document

This document has been translated into other languages.


The document master (in English) by rgooch@atnf.csiro.au is
available at

http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html


A Korean translation by viatoris@nownuri.net is available at

http://your.destiny.pe.kr/devfs/devfs.html


-----------------------------------------------------------------------------
Most flags courtesy of ITA's
Flags of All Countries
used with permission.