ramfs, rootfs and initramfs
October 17, 2005
Rob Landley <rob@landley.net>
=============================

What is ramfs?
--------------

Ramfs is a very simple filesystem that exports Linux's disk caching
mechanisms (the page cache and dentry cache) as a dynamically resizable
ram-based filesystem.

Normally all files are cached in memory by Linux. Pages of data read from
backing store (usually the block device the filesystem is mounted on) are kept
around in case they're needed again, but marked as clean (freeable) in case the
Virtual Memory system needs the memory for something else. Similarly, data
written to files is marked clean as soon as it has been written to backing
store, but kept around for caching purposes until the VM reallocates the
memory. A similar mechanism (the dentry cache) greatly speeds up access to
directories.

With ramfs, there is no backing store. Files written into ramfs allocate
dentries and page cache as usual, but there's nowhere to write them to.
This means the pages are never marked clean, so they can't be freed by the
VM when it's looking to recycle memory.

The amount of code required to implement ramfs is tiny, because all the
work is done by the existing Linux caching infrastructure. Basically,
you're mounting the disk cache as a filesystem. Because of this, ramfs is not
an optional component removable via menuconfig, since there would be negligible
space savings.

ramfs and ramdisk:
------------------

The older "ram disk" mechanism created a synthetic block device out of
an area of ram and used it as backing store for a filesystem. This block
device was of fixed size, so the filesystem mounted on it was of fixed
size. Using a ram disk also required unnecessarily copying memory from the
fake block device into the page cache (and copying changes back out), as well
as creating and destroying dentries.
Plus it needed a filesystem driver (such as ext2) to format and interpret
this data.

Compared to ramfs, this wastes memory (and memory bus bandwidth), creates
unnecessary work for the CPU, and pollutes the CPU caches. (There are tricks
to avoid this copying by playing with the page tables, but they're unpleasantly
complicated and turn out to be about as expensive as the copying anyway.)
More to the point, all the work ramfs is doing has to happen _anyway_,
since all file access goes through the page and dentry caches. The ram
disk is simply unnecessary; ramfs is internally much simpler.

Another reason ramdisks are semi-obsolete is that the introduction of
loopback devices offered a more flexible and convenient way to create
synthetic block devices, now from files instead of from chunks of memory.
See losetup(8) for details.

ramfs and tmpfs:
----------------

One downside of ramfs is that you can keep writing data into it until you fill
up all memory, and the VM can't free it because the VM thinks that files
should get written to backing store (rather than swap space), but ramfs hasn't
got any backing store. Because of this, only root (or a trusted user) should
be allowed write access to a ramfs mount.

A ramfs derivative called tmpfs was created to add size limits and the ability
to write the data to swap space. Normal users can be allowed write access to
tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information.

What is rootfs?
---------------

Rootfs is a special instance of ramfs, which is always present in 2.6 systems.
(It's used internally as the starting and stopping point for searches of the
kernel's doubly-linked list of mount points.)

Most systems just mount another filesystem over it and ignore it. The
amount of space an empty instance of ramfs takes up is tiny.

What is initramfs?
------------------

All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is
extracted into rootfs when the kernel boots up. After extracting, the kernel
checks to see if rootfs contains a file "init", and if so it executes it as
PID 1. If found, this init process is responsible for bringing the system the
rest of the way up, including locating and mounting the real root device (if
any). If rootfs does not contain an init program after the embedded cpio
archive is extracted into it, the kernel will fall through to the older code
to locate and mount a root partition, then exec some variant of /sbin/init
out of that.

All this differs from the old initrd in several ways:

  - The old initrd was a separate file, while the initramfs archive is linked
    into the linux kernel image. (The directory linux-*/usr is devoted to
    generating this archive during the build.)

  - The old initrd file was a gzipped filesystem image (in some file format,
    such as ext2, that had to be built into the kernel), while the new
    initramfs archive is a gzipped cpio archive (like tar only simpler,
    see cpio(1) and Documentation/early-userspace/buffer-format.txt).

  - The program run by the old initrd (which was called /linuxrc, not /init)
    did some setup and then returned to the kernel, while the init program
    from initramfs is not expected to return to the kernel. (If /init needs
    to hand off control, it can overmount / with a new root device and exec
    another init program. See the switch_root utility, below.)

  - When switching to another root device, initrd would pivot_root and then
    unmount the ramdisk. But initramfs is rootfs: you can neither pivot_root
    rootfs, nor unmount it. Instead delete everything out of rootfs to
    free up the space (find / -xdev -exec rm '{}' ';'), overmount rootfs
    with the new root (cd /newmount; mount --move . /; chroot .), attach
    stdin/stdout/stderr to the new /dev/console, and exec the new init.

    Since this is a remarkably persnickety process (and involves deleting
    commands before you can run them), the klibc package introduced a helper
    program (utils/run_init.c) to do all this for you. Most other packages
    (such as busybox) have named this command "switch_root".

Populating initramfs:
---------------------

The 2.6 kernel build process always creates a gzipped cpio format initramfs
archive and links it into the resulting kernel binary. By default, this
archive is empty (consuming 134 bytes on x86). The config option
CONFIG_INITRAMFS_SOURCE (for some reason buried under devices->block devices
in menuconfig, and living in usr/Kconfig) can be used to specify a source for
the initramfs archive, which will automatically be incorporated into the
resulting binary. This option can point to an existing gzipped cpio archive, a
directory containing files to be archived, or a text file specification such
as the following example:

  dir /dev 755 0 0
  nod /dev/console 644 0 0 c 5 1
  nod /dev/loop0 644 0 0 b 7 0
  dir /bin 755 1000 1000
  slink /bin/sh busybox 777 0 0
  file /bin/busybox initramfs/busybox 755 0 0
  dir /proc 755 0 0
  dir /sys 755 0 0
  dir /mnt 755 0 0
  file /init initramfs/init.sh 755 0 0

One advantage of the text file is that root access is not required to
set permissions or create device nodes in the new archive. (Note that those
two example "file" entries expect to find files named "init.sh" and "busybox" in
a directory called "initramfs", under the linux-2.6.* directory. See
Documentation/early-userspace/README for more details.)
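
As a rough illustration of what happens to a text specification like the one
above, here is a minimal Python sketch (not the kernel's own usr/gen_init_cpio)
that turns dir, nod, slink and file lines into a gzipped "newc" cpio archive.
It assumes the newc layout described in
Documentation/early-userspace/buffer-format.txt: the magic "070701", thirteen
8-digit hex header fields, a NUL-terminated name, padding to a 4-byte boundary
after the name and after the data, and a final "TRAILER!!!" member. The names
build_initramfs, _newc_entry and _read_from_disk, the fixed mtime of 0 and the
starting inode number are hypothetical choices for this example; other entry
types the real tool accepts are not handled, and the output is not meant to
match gen_init_cpio byte for byte.

```python
import gzip
import stat


def _newc_entry(name, mode, uid=0, gid=0, data=b"", rdev=(0, 0), ino=0, nlink=1):
    """Encode one archive member in the "newc" cpio format (magic 070701)."""
    name_bytes = name.encode() + b"\0"  # namesize counts the trailing NUL
    fields = (
        ino, mode, uid, gid, nlink,
        0,                 # mtime: fixed at 0 (an assumption) for reproducibility
        len(data),         # filesize
        0, 0,              # devmajor/devminor of the source filesystem (unused)
        rdev[0], rdev[1],  # rdevmajor/rdevminor, meaningful for device nodes
        len(name_bytes),   # namesize
        0,                 # check: always 0 in the newc variant
    )
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    out += b"\0" * (-len(out) % 4)   # pad header + name to a 4-byte boundary
    out += data
    out += b"\0" * (-len(data) % 4)  # pad file data to a 4-byte boundary
    return out


def _read_from_disk(path):
    with open(path, "rb") as f:
        return f.read()


def build_initramfs(spec, read_file=_read_from_disk):
    """Turn dir/nod/slink/file spec lines into a gzipped newc cpio archive."""
    archive = bytearray()
    ino = 721  # hypothetical first inode number; each member gets a unique one
    for line in spec.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        kind, name = parts[0], parts[1]
        if kind == "dir":
            perm, uid, gid = parts[2:5]
            mode, data, rdev = stat.S_IFDIR | int(perm, 8), b"", (0, 0)
        elif kind == "nod":
            perm, uid, gid, dev_type, major, minor = parts[2:8]
            ftype = {"c": stat.S_IFCHR, "b": stat.S_IFBLK}[dev_type]
            mode, data, rdev = ftype | int(perm, 8), b"", (int(major), int(minor))
        elif kind == "slink":
            target, perm, uid, gid = parts[2:6]
            # A symlink's target is stored as the member's file data.
            mode, data, rdev = stat.S_IFLNK | int(perm, 8), target.encode(), (0, 0)
        elif kind == "file":
            location, perm, uid, gid = parts[2:6]
            mode, data, rdev = stat.S_IFREG | int(perm, 8), read_file(location), (0, 0)
        else:
            raise ValueError(f"entry type not handled by this sketch: {kind}")
        # This sketch stores names relative, without the leading "/".
        archive += _newc_entry(name.lstrip("/"), mode, int(uid), int(gid), data, rdev, ino)
        ino += 1
    archive += _newc_entry("TRAILER!!!", 0)  # end-of-archive marker
    return gzip.compress(bytes(archive))
```

Because modes, owners and device numbers are only written into header fields,
the sketch never calls mknod or chown, which matches the point above that the
text file route needs no root access. A hypothetical invocation from the
linux-2.6.* directory, with the specification saved as initramfs_list, could be
open("initramfs.cpio.gz", "wb").write(build_initramfs(open("initramfs_list").read())).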

If you don't already understand what shared libraries, devices, and paths
you need to get a minimal root filesystem up and running, here are some
references:
http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
http://www.linuxfromscratch.org/lfs/view/stable/

The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
designed to be a tiny C library to statically link early userspace
code against, along with some related utilities. It is BSD licensed.

I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
myself. These are LGPL and GPL, respectively.

In theory you could use glibc, but it's not well suited for small embedded
uses like this. (A "hello world" program statically linked against glibc is
over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do
name lookups, even when otherwise statically linked.)

Future directions:
------------------

Today (2.6.14), initramfs is always compiled in, but not always used. The
kernel falls back to legacy boot code that is reached only if initramfs does
not contain an /init program. The fallback is legacy code, there to ensure a
smooth transition and to allow early boot functionality to move gradually to
"early userspace" (i.e. initramfs).

The move to early userspace is necessary because finding and mounting the real
root device is complex. Root partitions can span multiple devices (RAID or
separate journal). They can be out on the network (requiring DHCP, setting a
specific MAC address, logging into a server, etc.). They can live on removable
media, with dynamically allocated major/minor numbers and persistent naming
issues requiring a full udev implementation to sort out. They can be
compressed, encrypted, copy-on-write, loopback mounted, strangely partitioned,
and so on.

This kind of complexity (which inevitably includes policy) is rightly handled
in userspace. Both klibc and busybox/uClibc are working on simple initramfs
packages to drop into a kernel build, and when standard solutions are ready
and widely deployed, the kernel's legacy early boot code will become obsolete
and a candidate for the feature removal schedule.

But that's a while off yet.