Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

overlay: overlay filesystem documentation

Document the overlay filesystem.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

Authored by Neil Brown and committed by Miklos Szeredi
7c37fbda f45827e8

+205
+198
Documentation/filesystems/overlayfs.txt
@@ -0,0 +1,198 @@
+Written by: Neil Brown <neilb@suse.de>
+
+Overlay Filesystem
+==================
+
+This document describes a prototype for a new approach to providing
+overlay-filesystem functionality in Linux (sometimes referred to as
+union-filesystems). An overlay-filesystem tries to present a
+filesystem which is the result of overlaying one filesystem on top
+of the other.
+
+The result will inevitably fail to look exactly like a normal
+filesystem for various technical reasons. The expectation is that
+many use cases will be able to ignore these differences.
+
+This approach is 'hybrid' because the objects that appear in the
+filesystem do not all appear to belong to that filesystem. In many
+cases an object accessed in the union will be indistinguishable
+from accessing the corresponding object from the original filesystem.
+This is most obvious from the 'st_dev' field returned by stat(2).
+
+While directories will report an st_dev from the overlay-filesystem,
+all non-directory objects will report an st_dev from the lower or
+upper filesystem that is providing the object. Similarly st_ino will
+only be unique when combined with st_dev, and both of these can change
+over the lifetime of a non-directory object. Many applications and
+tools ignore these values and will not be affected.
+
+Upper and Lower
+---------------
+
+An overlay filesystem combines two filesystems - an 'upper' filesystem
+and a 'lower' filesystem. When a name exists in both filesystems, the
+object in the 'upper' filesystem is visible while the object in the
+'lower' filesystem is either hidden or, in the case of directories,
+merged with the 'upper' object.
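The visibility rule above (the upper object hides the lower one, except that two directories are merged) can be modelled in userspace. The Python sketch below is illustrative only: `resolve` and its arguments are hypothetical names, it inspects two ordinary directory trees rather than a mounted overlay, and it does not model whiteouts or opaque directories.

```python
import os


def resolve(upper_root, lower_root, name):
    """Classify how `name` would appear under the rule described above.

    Returns "upper", "lower", "merged", or None if the name is absent.
    This is a userspace model, not the kernel's lookup code.
    """
    upper = os.path.join(upper_root, name)
    lower = os.path.join(lower_root, name)
    in_upper = os.path.lexists(upper)
    in_lower = os.path.lexists(lower)

    def is_real_dir(path):
        # A symlink to a directory counts as a non-directory object here.
        return os.path.isdir(path) and not os.path.islink(path)

    if in_upper and in_lower:
        # Two directories are merged; otherwise the upper object wins
        # and the lower object is hidden.
        if is_real_dir(upper) and is_real_dir(lower):
            return "merged"
        return "upper"
    if in_upper:
        return "upper"
    if in_lower:
        return "lower"
    return None
```

Calling `resolve("/upper", "/lower", "etc")` on two trees that both contain an `etc` directory would report "merged"; the paths are placeholders.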
+
+It would be more correct to refer to an upper and lower 'directory
+tree' rather than 'filesystem' as it is quite possible for both
+directory trees to be in the same filesystem and there is no
+requirement that the root of a filesystem be given for either upper or
+lower.
+
+The lower filesystem can be any filesystem supported by Linux and does
+not need to be writable. The lower filesystem can even be another
+overlayfs. The upper filesystem will normally be writable and if it
+is it must support the creation of trusted.* extended attributes, and
+must provide valid d_type in readdir responses, so NFS is not suitable.
+
+A read-only overlay of two read-only filesystems may use any
+filesystem type.
+
+Directories
+-----------
+
+Overlaying mainly involves directories. If a given name appears in both
+upper and lower filesystems and refers to a non-directory in either,
+then the lower object is hidden - the name refers only to the upper
+object.
+
+Where both upper and lower objects are directories, a merged directory
+is formed.
+
+At mount time, the two directories given as mount options "lowerdir" and
+"upperdir" are combined into a merged directory:
+
+  mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper,\
+workdir=/work /merged
+
+The "workdir" needs to be an empty directory on the same filesystem
+as upperdir.
+
+Then whenever a lookup is requested in such a merged directory, the
+lookup is performed in each actual directory and the combined result
+is cached in the dentry belonging to the overlay filesystem. If both
+actual lookups find directories, both are stored and a merged
+directory is created, otherwise only one is stored: the upper if it
+exists, else the lower.
+
+Only the lists of names from directories are merged.
+Other content such as metadata and extended attributes are reported for
+the upper directory only. These attributes of the lower directory are hidden.
+
+whiteouts and opaque directories
+--------------------------------
+
+In order to support rm and rmdir without changing the lower
+filesystem, an overlay filesystem needs to record in the upper filesystem
+that files have been removed. This is done using whiteouts and opaque
+directories (non-directories are always opaque).
+
+A whiteout is created as a character device with 0/0 device number.
+When a whiteout is found in the upper level of a merged directory, any
+matching name in the lower level is ignored, and the whiteout itself
+is also hidden.
+
+A directory is made opaque by setting the xattr "trusted.overlay.opaque"
+to "y". Where the upper filesystem contains an opaque directory, any
+directory in the lower filesystem with the same name is ignored.
+
+readdir
+-------
+
+When a 'readdir' request is made on a merged directory, the upper and
+lower directories are each read and the name lists merged in the
+obvious way (upper is read first, then lower - entries that already
+exist are not re-added). This merged name list is cached in the
+'struct file' and so remains as long as the file is kept open. If the
+directory is opened and read by two processes at the same time, they
+will each have separate caches. A seekdir to the start of the
+directory (offset 0) followed by a readdir will cause the cache to be
+discarded and rebuilt.
+
+This means that changes to the merged directory do not appear while a
+directory is being read. This is unlikely to be noticed by many
+programs.
+
+seek offsets are assigned sequentially when the directories are read.
+Thus if
+  - read part of a directory
+  - remember an offset, and close the directory
+  - re-open the directory some time later
+  - seek to the remembered offset
+
+there may be little correlation between the old and new locations in
+the list of filenames, particularly if anything has changed in the
+directory.
+
+Readdir on directories that are not merged is simply handled by the
+underlying directory (upper or lower).
+
+
+Non-directories
+---------------
+
+Objects that are not directories (files, symlinks, device-special
+files etc.) are presented either from the upper or lower filesystem as
+appropriate. When a file in the lower filesystem is accessed in a way
+that requires write-access, such as opening for write access, changing
+some metadata etc., the file is first copied from the lower filesystem
+to the upper filesystem (copy_up). Note that creating a hard-link
+also requires copy_up, though of course creation of a symlink does
+not.
+
+The copy_up may turn out to be unnecessary, for example if the file is
+opened for read-write but the data is not modified.
+
+The copy_up process first makes sure that the containing directory
+exists in the upper filesystem - creating it and any parents as
+necessary. It then creates the object with the same metadata (owner,
+mode, mtime, symlink-target etc.) and then if the object is a file, the
+data is copied from the lower to the upper filesystem. Finally any
+extended attributes are copied up.
+
+Once the copy_up is complete, the overlay filesystem simply
+provides direct access to the newly created file in the upper
+filesystem - future operations on the file are barely noticed by the
+overlay filesystem (though an operation on the name of the file such as
+rename or unlink will of course be noticed and handled).
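The copy_up steps described above can be approximated in userspace. The following Python sketch is an assumption-laden illustration, not the kernel implementation: the function name `copy_up` and its parameters are hypothetical, it handles only regular files and symlinks, parent directories are created with default permissions rather than copied metadata, and metadata is applied after the data copy because a userspace write would otherwise reset mtime.

```python
import os
import shutil
import stat


def copy_up(lower_root, upper_root, rel_path):
    """Copy rel_path from the lower tree into the upper tree.

    Hypothetical userspace model of the steps described above.
    """
    src = os.path.join(lower_root, rel_path)
    dst = os.path.join(upper_root, rel_path)
    if os.path.lexists(dst):
        return dst  # already present in the upper tree

    # Step 1: make sure the containing directory and its parents
    # exist in the upper tree (default modes, a simplification).
    parent = os.path.dirname(rel_path)
    if parent:
        os.makedirs(os.path.join(upper_root, parent), exist_ok=True)

    st = os.lstat(src)
    if stat.S_ISLNK(st.st_mode):
        # A symlink is recreated with the same target.
        os.symlink(os.readlink(src), dst)
        return dst
    if not stat.S_ISREG(st.st_mode):
        raise NotImplementedError("sketch covers regular files and symlinks")

    # Steps 2 and 3: create the file and copy its data.
    shutil.copyfile(src, dst)

    # Apply mode, owner and timestamps from the lower object.
    os.chmod(dst, stat.S_IMODE(st.st_mode))
    try:
        os.chown(dst, st.st_uid, st.st_gid)
    except PermissionError:
        pass  # changing ownership usually needs privileges
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))

    # Step 4: copy extended attributes where the platform allows it.
    if hasattr(os, "listxattr"):
        try:
            for name in os.listxattr(src):
                os.setxattr(dst, name, os.getxattr(src, name))
        except OSError:
            pass  # xattrs unsupported or not permitted here
    return dst
```

The lower tree is only read, never written, which matches the requirement that the lower filesystem need not be writable.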
+
+
+Non-standard behavior
+---------------------
+
+The copy_up operation essentially creates a new, identical file and
+moves it over to the old name. The new file may be on a different
+filesystem, so both st_dev and st_ino of the file may change.
+
+Any open files referring to this inode will access the old data and
+metadata. Similarly any file locks obtained before copy_up will not
+apply to the copied up file.
+
+On a file opened with O_RDONLY fchmod(2), fchown(2), futimesat(2) and
+fsetxattr(2) will fail with EROFS.
+
+If a file with multiple hard links is copied up, then this will
+"break" the link. Changes will not be propagated to other names
+referring to the same inode.
+
+Symlinks in /proc/PID/ and /proc/PID/fd which point to a non-directory
+object in overlayfs will not contain valid absolute paths, only
+relative paths leading up to the filesystem's root. This will be
+fixed in the future.
+
+Some operations are not atomic, for example a crash during copy_up or
+rename will leave the filesystem in an inconsistent state. This will
+be addressed in the future.
+
+Changes to underlying filesystems
+---------------------------------
+
+Offline changes, when the overlay is not mounted, are allowed to either
+the upper or the lower trees.
+
+Changes to the underlying filesystems while part of a mounted overlay
+filesystem are not allowed. If the underlying filesystem is changed,
+the behavior of the overlay is undefined, though it will not result in
+a crash or deadlock.
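Two of the side effects listed under "Non-standard behavior" (the file's identity changes, and a file opened earlier keeps seeing the old data) follow from the "create a new file and move it over the old name" pattern itself, so they can be reproduced on any ordinary filesystem. The Python sketch below assumes nothing about overlayfs; `replace_with_copy` and `identity` are hypothetical helper names introduced for illustration.

```python
import os
import tempfile


def replace_with_copy(path, new_data):
    """Write new_data to a fresh file and move it over `path`.

    Mimics "create a new file and move it over the old name";
    this is an illustration, not overlayfs code.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(new_data)
    os.replace(tmp, path)  # the old name now refers to the new file


def identity(path):
    """Return the (st_dev, st_ino) pair that identifies an object."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data")
        with open(path, "wb") as f:
            f.write(b"old")
        before = identity(path)
        with open(path, "rb") as reader:  # opened before the replacement
            replace_with_copy(path, b"new")
            print("identity changed:", identity(path) != before)
            print("earlier open file reads:", reader.read())
```

A program that caches (st_dev, st_ino) pairs, or holds a descriptor open across a copy_up, would need to re-open the path to observe the new object.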
+7
MAINTAINERS
@@ -6832,6 +6832,13 @@
 F: include/scsi/osd_*
 F: fs/exofs/
 
+OVERLAYFS FILESYSTEM
+M: Miklos Szeredi <miklos@szeredi.hu>
+L: linux-fsdevel@vger.kernel.org
+S: Supported
+F: fs/overlayfs/*
+F: Documentation/filesystems/overlayfs.txt
+
 P54 WIRELESS DRIVER
 M: Christian Lamparter <chunkeey@googlemail.com>
 L: linux-wireless@vger.kernel.org