Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ext4: Orphan file documentation

Add documentation about the orphan file feature.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210816095713.16537-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Authored by Jan Kara and committed by Theodore Ts'o
3a6541e9 02f310fc

+89 -6
+1
Documentation/filesystems/ext4/globals.rst
···
 .. include:: bitmaps.rst
 .. include:: mmp.rst
 .. include:: journal.rst
+.. include:: orphan.rst
+5 -5
Documentation/filesystems/ext4/inodes.rst
···
 modification time (mtime), and deletion time (dtime). The four fields
 are 32-bit signed integers that represent seconds since the Unix epoch
 (1970-01-01 00:00:00 GMT), which means that the fields will overflow in
-January 2038. For inodes that are not linked from any directory but are
-still open (orphan inodes), the dtime field is overloaded for use with
-the orphan list. The superblock field ``s_last_orphan`` points to the
-first inode in the orphan list; dtime is then the number of the next
-orphaned inode, or zero if there are no more orphans.
+January 2038. If the filesystem does not have the orphan_file feature, inodes
+that are not linked from any directory but are still open (orphan inodes) have
+the dtime field overloaded for use with the orphan list. The superblock field
+``s_last_orphan`` points to the first inode in the orphan list; dtime is then
+the number of the next orphaned inode, or zero if there are no more orphans.
 
 If the inode structure size ``sb->s_inode_size`` is larger than 128
 bytes and the ``i_inode_extra`` field is large enough to encompass the
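The legacy orphan list described in this hunk can be illustrated with a minimal sketch. It is not kernel code: the function name `walk_orphan_list` and the dictionary `dtime_of` (mapping an inode number to the value stored in its dtime field) are hypothetical stand-ins for reading the on-disk inodes.

```python
def walk_orphan_list(s_last_orphan, dtime_of):
    """Yield orphan inode numbers by following the dtime chain.

    s_last_orphan: inode number stored in the superblock (0 means no orphans).
    dtime_of: hypothetical mapping from inode number to its dtime field,
              which holds the next orphan's inode number, or 0 at the end.
    """
    seen = set()
    ino = s_last_orphan
    while ino != 0:
        if ino in seen:
            # A corrupted list could loop forever; stop instead.
            raise ValueError(f"orphan list loops at inode {ino}")
        seen.add(ino)
        yield ino
        ino = dtime_of[ino]


# Made-up example: superblock points at inode 12, which chains to 40, then 17.
example_dtime = {12: 40, 40: 17, 17: 0}
orphans = list(walk_orphan_list(12, example_dtime))
```

Every insertion or removal in such a list has to touch the superblock or a neighbouring inode, which is the serialization point that the orphan file is meant to avoid.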
+52
Documentation/filesystems/ext4/orphan.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+Orphan file
+-----------
+
+In unix there can be inodes that are unlinked from directory hierarchy but that
+are still alive because they are open. In case of crash the filesystem has to
+clean up these inodes as otherwise they (and the blocks referenced from them)
+would leak. Similarly if we truncate or extend the file, we may not be able
+to perform the operation in a single journalling transaction. In such case we
+track the inode as orphan so that in case of crash extra blocks allocated to
+the file get truncated.
+
+Traditionally ext4 tracks orphan inodes in a form of single linked list where
+superblock contains the inode number of the last orphan inode (s\_last\_orphan
+field) and then each inode contains inode number of the previously orphaned
+inode (we overload i\_dtime inode field for this). However this filesystem
+global single linked list is a scalability bottleneck for workloads that result
+in heavy creation of orphan inodes. When orphan file feature
+(COMPAT\_ORPHAN\_FILE) is enabled, the filesystem has a special inode
+(referenced from the superblock through s\_orphan\_file\_inum) with several
+blocks. Each of these blocks has a structure:
+
+.. list-table::
+   :widths: 8 8 24 40
+   :header-rows: 1
+
+   * - Offset
+     - Type
+     - Name
+     - Description
+   * - 0x0
+     - Array of \_\_le32 entries
+     - Orphan inode entries
+     - Each \_\_le32 entry is either empty (0) or it contains inode number of
+       an orphan inode.
+   * - blocksize - 8
+     - \_\_le32
+     - ob\_magic
+     - Magic value stored in orphan block tail (0x0b10ca04)
+   * - blocksize - 4
+     - \_\_le32
+     - ob\_checksum
+     - Checksum of the orphan block.
+
+When a filesystem with orphan file feature is writeably mounted, we set
+RO\_COMPAT\_ORPHAN\_PRESENT feature in the superblock to indicate there may
+be valid orphan entries. In case we see this feature when mounting the
+filesystem, we read the whole orphan file and process all orphan inodes found
+there as usual. When cleanly unmounting the filesystem we remove the
+RO\_COMPAT\_ORPHAN\_PRESENT feature to avoid unnecessary scanning of the orphan
+file and also make the filesystem fully compatible with older kernels.
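To make the block layout in the table concrete, here is a small sketch that decodes one orphan file block from raw bytes. It assumes the entry array fills everything before the 8-byte tail, as the offsets in the table suggest. The helper names `parse_orphan_block` and `make_orphan_block` are hypothetical, and the checksum is returned without being verified because the patch text does not specify how it is computed.

```python
import struct

ORPHAN_BLOCK_MAGIC = 0x0B10CA04  # ob_magic value from the table above


def parse_orphan_block(block: bytes):
    """Return (orphan inode numbers, stored checksum) for one orphan block."""
    blocksize = len(block)
    if blocksize < 12 or blocksize % 4 != 0:
        raise ValueError("block too small or not a multiple of 4 bytes")
    # The tail occupies the last 8 bytes: ob_magic, then ob_checksum.
    magic, checksum = struct.unpack_from("<II", block, blocksize - 8)
    if magic != ORPHAN_BLOCK_MAGIC:
        raise ValueError(f"bad orphan block magic 0x{magic:08x}")
    # Assumption: all space before the tail is the __le32 entry array.
    n_entries = (blocksize - 8) // 4
    entries = struct.unpack_from(f"<{n_entries}I", block, 0)
    # An entry of 0 means the slot is empty.
    return [ino for ino in entries if ino != 0], checksum


def make_orphan_block(inodes, blocksize=1024, checksum=0):
    """Build a synthetic block for experimentation (not a real checksum)."""
    n_entries = (blocksize - 8) // 4
    slots = list(inodes) + [0] * (n_entries - len(inodes))
    body = struct.pack(f"<{n_entries}I", *slots)
    return body + struct.pack("<II", ORPHAN_BLOCK_MAGIC, checksum)


sample_block = make_orphan_block([15, 0, 42])
sample_orphans, sample_checksum = parse_orphan_block(sample_block)
```

Because each slot is independent, a mount-time scan only needs to read every block of the orphan file and collect the non-zero entries; there is no chain to follow.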
+17
Documentation/filesystems/ext4/special_inodes.rst
···
    * - 11
      - Traditional first non-reserved inode. Usually this is the lost+found directory. See s\_first\_ino in the superblock.
 
+Note that there are also some inodes allocated from non-reserved inode numbers
+for other filesystem features which are not referenced from standard directory
+hierarchy. These are generally referenced from the superblock. They are:
+
+.. list-table::
+   :widths: 20 50
+   :header-rows: 1
+
+   * - Superblock field
+     - Description
+
+   * - s\_lpf\_ino
+     - Inode number of lost+found directory.
+   * - s\_prj\_quota\_inum
+     - Inode number of quota file tracking project quotas
+   * - s\_orphan\_file\_inum
+     - Inode number of file tracking orphan inodes.
+14 -1
Documentation/filesystems/ext4/super.rst
···
      - Filename charset encoding flags.
    * - 0x280
      - \_\_le32
-     - s\_reserved[95]
+     - s\_orphan\_file\_inum
+     - Orphan file inode number.
+   * - 0x284
+     - \_\_le32
+     - s\_reserved[94]
      - Padding to the end of the block.
    * - 0x3FC
      - \_\_le32
···
        the journal, JBD2 incompat feature
        (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) gets
        set (COMPAT\_FAST\_COMMIT).
+   * - 0x1000
+     - Orphan file allocated. This is the special file for more efficient
+       tracking of unlinked but still open inodes. When there may be any
+       entries in the file, we additionally set proper rocompat feature
+       (RO\_COMPAT\_ORPHAN\_PRESENT).
 
 .. _super_incompat:
 
···
      - Filesystem tracks project quotas. (RO\_COMPAT\_PROJECT)
    * - 0x8000
      - Verity inodes may be present on the filesystem. (RO\_COMPAT\_VERITY)
+   * - 0x10000
+     - Indicates orphan file may have valid orphan entries and thus we need
+       to clean them up when mounting the filesystem
+       (RO\_COMPAT\_ORPHAN\_PRESENT).
 
 .. _super_def_hash:
 
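As a rough illustration of how the new superblock field and the two feature bits fit together, the sketch below reads them from a raw 1024-byte superblock image. The 0x280 offset and the flag values 0x1000 and 0x10000 come from this patch. The offsets 0x5C and 0x64 for s\_feature\_compat and s\_feature\_ro\_compat are taken from the general ext4 superblock layout and are not part of this diff. The function name `orphan_file_state` is hypothetical.

```python
import struct

# Offsets within the superblock (little-endian __le32 fields).
S_FEATURE_COMPAT_OFFSET = 0x5C      # assumption: from the ext4 superblock layout
S_FEATURE_RO_COMPAT_OFFSET = 0x64   # assumption: from the ext4 superblock layout
S_ORPHAN_FILE_INUM_OFFSET = 0x280   # added by this patch

COMPAT_ORPHAN_FILE = 0x1000          # orphan file allocated
RO_COMPAT_ORPHAN_PRESENT = 0x10000   # orphan file may hold valid entries


def orphan_file_state(sb: bytes):
    """Summarize orphan-file related state from a raw superblock image."""
    if len(sb) < 1024:
        raise ValueError("superblock image must be at least 1024 bytes")
    (compat,) = struct.unpack_from("<I", sb, S_FEATURE_COMPAT_OFFSET)
    (ro_compat,) = struct.unpack_from("<I", sb, S_FEATURE_RO_COMPAT_OFFSET)
    (inum,) = struct.unpack_from("<I", sb, S_ORPHAN_FILE_INUM_OFFSET)
    has_file = bool(compat & COMPAT_ORPHAN_FILE)
    return {
        "has_orphan_file": has_file,
        # The inode number is only meaningful when the feature is enabled.
        "orphan_file_inum": inum if has_file else None,
        "needs_orphan_scan": bool(ro_compat & RO_COMPAT_ORPHAN_PRESENT),
    }


# Synthetic superblock: feature enabled, orphan file at inode 13, not cleanly unmounted.
fake_sb = bytearray(1024)
struct.pack_into("<I", fake_sb, S_FEATURE_COMPAT_OFFSET, COMPAT_ORPHAN_FILE)
struct.pack_into("<I", fake_sb, S_FEATURE_RO_COMPAT_OFFSET, RO_COMPAT_ORPHAN_PRESENT)
struct.pack_into("<I", fake_sb, S_ORPHAN_FILE_INUM_OFFSET, 13)
state = orphan_file_state(bytes(fake_sb))
```

On a real device the superblock starts 1024 bytes into the filesystem, so a caller would seek there before reading the 1024 bytes passed to this function.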