docs: fs: convert docs without extension to ReST

+173 -84

Documentation/filesystems/Locking Documentation/filesystems/locking.rst

··· 1 - The text below describes the locking rules for VFS-related methods. 1 + ======= 2 + Locking 3 + ======= 4 + 5 + The text below describes the locking rules for VFS-related methods. 2 6 It is (believed to be) up-to-date. *Please*, if you change anything in 3 7 prototypes or locking protocols - update this file. And update the relevant 4 8 instances in the tree, don't leave that to maintainers of filesystems/devices/ 5 9 etc. At the very least, put the list of dubious cases in the end of this file. 6 10 Don't turn it into log - maintainers of out-of-the-tree code are supposed to 7 11 be able to use diff(1). 8 - Thing currently missing here: socket operations. Alexey? 9 12 10 - --------------------------- dentry_operations -------------------------- 11 - prototypes: 13 + Thing currently missing here: socket operations. Alexey? 14 + 15 + dentry_operations 16 + ================= 17 + 18 + prototypes:: 19 + 12 20 int (*d_revalidate)(struct dentry *, unsigned int); 13 21 int (*d_weak_revalidate)(struct dentry *, unsigned int); 14 22 int (*d_hash)(const struct dentry *, struct qstr *); ··· 32 24 struct dentry *(*d_real)(struct dentry *, const struct inode *); 33 25 34 26 locking rules: 35 - rename_lock ->d_lock may block rcu-walk 36 - d_revalidate: no no yes (ref-walk) maybe 37 - d_weak_revalidate:no no yes no 38 - d_hash no no no maybe 39 - d_compare: yes no no maybe 40 - d_delete: no yes no no 41 - d_init: no no yes no 42 - d_release: no no yes no 43 - d_prune: no yes no no 44 - d_iput: no no yes no 45 - d_dname: no no no no 46 - d_automount: no no yes no 47 - d_manage: no no yes (ref-walk) maybe 48 - d_real no no yes no 49 27 50 - --------------------------- inode_operations --------------------------- 51 - prototypes: 28 + ================== =========== ======== ============== ======== 29 + ops rename_lock ->d_lock may block rcu-walk 30 + ================== =========== ======== ============== ======== 31 + d_revalidate: no no yes (ref-walk) maybe 32 + 
d_weak_revalidate: no no yes no 33 + d_hash no no no maybe 34 + d_compare: yes no no maybe 35 + d_delete: no yes no no 36 + d_init: no no yes no 37 + d_release: no no yes no 38 + d_prune: no yes no no 39 + d_iput: no no yes no 40 + d_dname: no no no no 41 + d_automount: no no yes no 42 + d_manage: no no yes (ref-walk) maybe 43 + d_real no no yes no 44 + ================== =========== ======== ============== ======== 45 + 46 + inode_operations 47 + ================ 48 + 49 + prototypes:: 50 + 52 51 int (*create) (struct inode *,struct dentry *,umode_t, bool); 53 52 struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int); 54 53 int (*link) (struct dentry *,struct inode *,struct dentry *); ··· 83 68 84 69 locking rules: 85 70 all may block 86 - i_rwsem(inode) 71 + 72 + ============ ============================================= 73 + ops i_rwsem(inode) 74 + ============ ============================================= 87 75 lookup: shared 88 76 create: exclusive 89 77 link: exclusive (both) ··· 107 89 update_time: no 108 90 atomic_open: exclusive 109 91 tmpfile: no 92 + ============ ============================================= 110 93 111 94 112 95 Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem 113 96 exclusive on victim. 114 97 cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem. 115 98 116 - See Documentation/filesystems/directory-locking for more detailed discussion 99 + See Documentation/filesystems/directory-locking.rst for more detailed discussion 117 100 of the locking scheme for directory operations. 
118 101 119 - ----------------------- xattr_handler operations ----------------------- 120 - prototypes: 102 + xattr_handler operations 103 + ======================== 104 + 105 + prototypes:: 106 + 121 107 bool (*list)(struct dentry *dentry); 122 108 int (*get)(const struct xattr_handler *handler, struct dentry *dentry, 123 109 struct inode *inode, const char *name, void *buffer, ··· 132 110 133 111 locking rules: 134 112 all may block 135 - i_rwsem(inode) 113 + 114 + ===== ============== 115 + ops i_rwsem(inode) 116 + ===== ============== 136 117 list: no 137 118 get: no 138 119 set: exclusive 120 + ===== ============== 139 121 140 - --------------------------- super_operations --------------------------- 141 - prototypes: 122 + super_operations 123 + ================ 124 + 125 + prototypes:: 126 + 142 127 struct inode *(*alloc_inode)(struct super_block *sb); 143 128 void (*free_inode)(struct inode *); 144 129 void (*destroy_inode)(struct inode *); ··· 167 138 168 139 locking rules: 169 140 All may block [not true, see below] 170 - s_umount 141 + 142 + ====================== ============ ======================== 143 + ops s_umount note 144 + ====================== ============ ======================== 171 145 alloc_inode: 172 146 free_inode: called from RCU callback 173 147 destroy_inode: ··· 189 157 quota_read: no (see below) 190 158 quota_write: no (see below) 191 159 bdev_try_to_free_page: no (see below) 160 + ====================== ============ ======================== 192 161 193 162 ->statfs() has s_umount (shared) when called by ustat(2) (native or 194 163 compat), but that's an accident of bad API; s_umount is used to pin ··· 197 164 identify the superblock. Everything else (statfs(), fstatfs(), etc.) 198 165 doesn't hold it when calling ->statfs() - superblock is pinned down 199 166 by resolving the pathname passed to syscall. 
167 + 200 168 ->quota_read() and ->quota_write() functions are both guaranteed to 201 169 be the only ones operating on the quota file by the quota code (via 202 170 dqio_sem) (unless an admin really wants to screw up something and 203 171 writes to quota files with quotas on). For other details about locking 204 172 see also dquot_operations section. 173 + 205 174 ->bdev_try_to_free_page is called from the ->releasepage handler of 206 175 the block device inode. See there for more details. 207 176 208 - --------------------------- file_system_type --------------------------- 209 - prototypes: 177 + file_system_type 178 + ================ 179 + 180 + prototypes:: 181 + 210 182 struct dentry *(*mount) (struct file_system_type *, int, 211 183 const char *, void *); 212 184 void (*kill_sb) (struct super_block *); 185 + 213 186 locking rules: 214 - may block 187 + 188 + ======= ========= 189 + ops may block 190 + ======= ========= 215 191 mount yes 216 192 kill_sb yes 193 + ======= ========= 217 194 218 195 ->mount() returns ERR_PTR or the root dentry; its superblock should be locked 219 196 on return. 197 + 220 198 ->kill_sb() takes a write-locked superblock, does all shutdown work on it, 221 199 unlocks and drops the reference. 
222 200 223 - --------------------------- address_space_operations -------------------------- 224 - prototypes: 201 + address_space_operations 202 + ======================== 203 + prototypes:: 204 + 225 205 int (*writepage)(struct page *page, struct writeback_control *wbc); 226 206 int (*readpage)(struct file *, struct page *); 227 207 int (*writepages)(struct address_space *, struct writeback_control *); ··· 264 218 locking rules: 265 219 All except set_page_dirty and freepage may block 266 220 267 - PageLocked(page) i_rwsem 221 + ====================== ======================== ========= 222 + ops PageLocked(page) i_rwsem 223 + ====================== ======================== ========= 268 224 writepage: yes, unlocks (see below) 269 225 readpage: yes, unlocks 270 226 writepages: 271 227 set_page_dirty no 272 228 readpages: 273 - write_begin: locks the page exclusive 274 - write_end: yes, unlocks exclusive 229 + write_begin: locks the page exclusive 230 + write_end: yes, unlocks exclusive 275 231 bmap: 276 232 invalidatepage: yes 277 233 releasepage: yes ··· 287 239 error_remove_page: yes 288 240 swap_activate: no 289 241 swap_deactivate: no 242 + ====================== ======================== ========= 290 243 291 - ->write_begin(), ->write_end() and ->readpage() may be called from 244 + ->write_begin(), ->write_end() and ->readpage() may be called from 292 245 the request handler (/dev/loop). 293 246 294 - ->readpage() unlocks the page, either synchronously or via I/O 247 + ->readpage() unlocks the page, either synchronously or via I/O 295 248 completion. 296 249 297 - ->readpages() populates the pagecache with the passed pages and starts 250 + ->readpages() populates the pagecache with the passed pages and starts 298 251 I/O against them. They come unlocked upon I/O completion. 299 252 300 - ->writepage() is used for two purposes: for "memory cleansing" and for 253 + ->writepage() is used for two purposes: for "memory cleansing" and for 301 254 "sync". 
These are quite different operations and the behaviour may differ 302 255 depending upon the mode. 303 256 ··· 346 297 radix tree. This incoherency can lead to all sorts of hard-to-debug problems 347 298 in the filesystem like having dirty inodes at umount and losing written data. 348 299 349 - ->writepages() is used for periodic writeback and for syscall-initiated 300 + ->writepages() is used for periodic writeback and for syscall-initiated 350 301 sync operations. The address_space should start I/O against at least 351 - *nr_to_write pages. *nr_to_write must be decremented for each page which is 352 - written. The address_space implementation may write more (or less) pages 353 - than *nr_to_write asks for, but it should try to be reasonably close. If 354 - nr_to_write is NULL, all dirty pages must be written. 302 + ``*nr_to_write`` pages. ``*nr_to_write`` must be decremented for each page 303 + which is written. The address_space implementation may write more (or less) 304 + pages than ``*nr_to_write`` asks for, but it should try to be reasonably close. 305 + If nr_to_write is NULL, all dirty pages must be written. 355 306 356 307 writepages should _only_ write pages which are present on 357 308 mapping->io_pages. 358 309 359 - ->set_page_dirty() is called from various places in the kernel 310 + ->set_page_dirty() is called from various places in the kernel 360 311 when the target page is marked as needing writeback. It may be called 361 312 under spinlock (it cannot block) and is sometimes called with the page 362 313 not locked. 363 314 364 - ->bmap() is currently used by legacy ioctl() (FIBMAP) provided by some 315 + ->bmap() is currently used by legacy ioctl() (FIBMAP) provided by some 365 316 filesystems and by the swapper. The latter will eventually go away. Please, 366 317 keep it that way and don't breed new callers. 
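The ->writepages() accounting rule quoted in this hunk can be modelled in user space. This is a minimal sketch, not kernel code: `struct toy_page` and `toy_writepages()` are hypothetical names, "writing" a page only clears a dirty flag, and the budget is passed as a `long *` purely to mirror the `*nr_to_write` wording of the text. The text allows an implementation to write somewhat more or fewer pages than requested; this sketch stops exactly at the budget for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a page cache page; not the kernel's struct page. */
struct toy_page {
	int dirty;
};

/*
 * Hypothetical model of the ->writepages() contract described in the text:
 *  - decrement *nr_to_write once for each page actually written;
 *  - if nr_to_write is NULL, write every dirty page.
 * Returns the number of pages written by this call.
 */
static long toy_writepages(struct toy_page *pages, size_t npages,
			   long *nr_to_write)
{
	long written = 0;

	assert(pages != NULL);

	for (size_t i = 0; i < npages; i++) {
		/* Budget exhausted: stop (only when a budget was given). */
		if (nr_to_write && *nr_to_write <= 0)
			break;
		if (!pages[i].dirty)
			continue;

		pages[i].dirty = 0;	/* stand-in for starting I/O */
		if (nr_to_write)
			(*nr_to_write)--;
		written++;
	}
	return written;
}
```

Calling it with a budget of 2 on four pages, three of them dirty, writes the first two dirty pages and leaves the third dirty; a second call with a NULL budget writes the remaining one.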
367 318 368 - ->invalidatepage() is called when the filesystem must attempt to drop 319 + ->invalidatepage() is called when the filesystem must attempt to drop 369 320 some or all of the buffers from the page when it is being truncated. It 370 321 returns zero on success. If ->invalidatepage is zero, the kernel uses 371 322 block_invalidatepage() instead. 372 323 373 - ->releasepage() is called when the kernel is about to try to drop the 324 + ->releasepage() is called when the kernel is about to try to drop the 374 325 buffers from the page in preparation for freeing it. It returns zero to 375 326 indicate that the buffers are (or may be) freeable. If ->releasepage is zero, 376 327 the kernel assumes that the fs has no private interest in the buffers. 377 328 378 - ->freepage() is called when the kernel is done dropping the page 329 + ->freepage() is called when the kernel is done dropping the page 379 330 from the page cache. 380 331 381 - ->launder_page() may be called prior to releasing a page if 332 + ->launder_page() may be called prior to releasing a page if 382 333 it is still found to be dirty. It returns zero if the page was successfully 383 334 cleaned, or an error value if not. Note that in order to prevent the page 384 335 getting mapped back in and redirtied, it needs to be kept locked 385 336 across the entire operation. 386 337 387 - ->swap_activate will be called with a non-zero argument on 338 + ->swap_activate will be called with a non-zero argument on 388 339 files backing (non block device backed) swapfiles. A return value 389 340 of zero indicates success, in which case this file can be used for 390 341 backing swapspace. The swapspace operations will be proxied to the 391 342 address space operations. 392 343 393 - ->swap_deactivate() will be called in the sys_swapoff() 344 + ->swap_deactivate() will be called in the sys_swapoff() 394 345 path after ->swap_activate() returned success. 
395 346 396 - ----------------------- file_lock_operations ------------------------------ 397 - prototypes: 347 + file_lock_operations 348 + ==================== 349 + 350 + prototypes:: 351 + 398 352 void (*fl_copy_lock)(struct file_lock *, struct file_lock *); 399 353 void (*fl_release_private)(struct file_lock *); 400 354 401 355 402 356 locking rules: 403 - inode->i_lock may block 357 + 358 + =================== ============= ========= 359 + ops inode->i_lock may block 360 + =================== ============= ========= 404 361 fl_copy_lock: yes no 405 - fl_release_private: maybe maybe[1] 362 + fl_release_private: maybe maybe[1]_ 363 + =================== ============= ========= 406 364 407 - [1]: ->fl_release_private for flock or POSIX locks is currently allowed 408 - to block. Leases however can still be freed while the i_lock is held and 409 - so fl_release_private called on a lease should not block. 365 + .. [1]: 366 + ->fl_release_private for flock or POSIX locks is currently allowed 367 + to block. Leases however can still be freed while the i_lock is held and 368 + so fl_release_private called on a lease should not block. 
410 369 411 - ----------------------- lock_manager_operations --------------------------- 412 - prototypes: 370 + lock_manager_operations 371 + ======================= 372 + 373 + prototypes:: 374 + 413 375 void (*lm_notify)(struct file_lock *); /* unblock callback */ 414 376 int (*lm_grant)(struct file_lock *, struct file_lock *, int); 415 377 void (*lm_break)(struct file_lock *); /* break_lease callback */ ··· 428 368 429 369 locking rules: 430 370 431 - inode->i_lock blocked_lock_lock may block 371 + ========== ============= ================= ========= 372 + ops inode->i_lock blocked_lock_lock may block 373 + ========== ============= ================= ========= 432 374 lm_notify: yes yes no 433 375 lm_grant: no no no 434 376 lm_break: yes no no 435 377 lm_change yes no no 378 + ========== ============= ================= ========= 436 379 437 - --------------------------- buffer_head ----------------------------------- 438 - prototypes: 380 + buffer_head 381 + =========== 382 + 383 + prototypes:: 384 + 439 385 void (*b_end_io)(struct buffer_head *bh, int uptodate); 440 386 441 387 locking rules: 442 - called from interrupts. In other words, extreme care is needed here. 388 + 389 + called from interrupts. In other words, extreme care is needed here. 443 390 bh is locked, but that's all warranties we have here. Currently only RAID1, 444 391 highmem, fs/buffer.c, and fs/ntfs/aops.c are providing these. Block devices 445 392 call this method upon the IO completion. 
446 393 447 - --------------------------- block_device_operations ----------------------- 448 - prototypes: 394 + block_device_operations 395 + ======================= 396 + prototypes:: 397 + 449 398 int (*open) (struct block_device *, fmode_t); 450 399 int (*release) (struct gendisk *, fmode_t); 451 400 int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long); ··· 468 399 void (*swap_slot_free_notify) (struct block_device *, unsigned long); 469 400 470 401 locking rules: 471 - bd_mutex 402 + 403 + ======================= =================== 404 + ops bd_mutex 405 + ======================= =================== 472 406 open: yes 473 407 release: yes 474 408 ioctl: no ··· 482 410 revalidate_disk: no 483 411 getgeo: no 484 412 swap_slot_free_notify: no (see below) 413 + ======================= =================== 485 414 486 415 media_changed, unlock_native_capacity and revalidate_disk are called only from 487 416 check_disk_change(). ··· 491 418 held. 492 419 493 420 494 - --------------------------- file_operations ------------------------------- 495 - prototypes: 421 + file_operations 422 + =============== 423 + 424 + prototypes:: 425 + 496 426 loff_t (*llseek) (struct file *, loff_t, int); 497 427 ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); 498 428 ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); ··· 531 455 size_t, unsigned int); 532 456 int (*setlease)(struct file *, long, struct file_lock **, void **); 533 457 long (*fallocate)(struct file *, int, loff_t, loff_t); 534 - }; 535 458 536 459 locking rules: 537 460 All may block. 
··· 565 490 the lease within the individual filesystem to record the result of the 566 491 operation 567 492 568 - --------------------------- dquot_operations ------------------------------- 569 - prototypes: 493 + dquot_operations 494 + ================ 495 + 496 + prototypes:: 497 + 570 498 int (*write_dquot) (struct dquot *); 571 499 int (*acquire_dquot) (struct dquot *); 572 500 int (*release_dquot) (struct dquot *); ··· 581 503 582 504 What filesystem should expect from the generic quota functions: 583 505 584 - FS recursion Held locks when called 506 + ============== ============ ========================= 507 + ops FS recursion Held locks when called 508 + ============== ============ ========================= 585 509 write_dquot: yes dqonoff_sem or dqptr_sem 586 510 acquire_dquot: yes dqonoff_sem or dqptr_sem 587 511 release_dquot: yes dqonoff_sem or dqptr_sem 588 512 mark_dirty: no - 589 513 write_info: yes dqonoff_sem 514 + ============== ============ ========================= 590 515 591 516 FS recursion means calling ->quota_read() and ->quota_write() from superblock 592 517 operations. 593 518 594 519 More details about quota locking can be found in fs/dquot.c. 
595 520 596 - --------------------------- vm_operations_struct ----------------------------- 597 - prototypes: 521 + vm_operations_struct 522 + ==================== 523 + 524 + prototypes:: 525 + 598 526 void (*open)(struct vm_area_struct*); 599 527 void (*close)(struct vm_area_struct*); 600 528 vm_fault_t (*fault)(struct vm_area_struct*, struct vm_fault *); ··· 609 525 int (*access)(struct vm_area_struct *, unsigned long, void*, int, int); 610 526 611 527 locking rules: 612 - mmap_sem PageLocked(page) 528 + 529 + ============= ======== =========================== 530 + ops mmap_sem PageLocked(page) 531 + ============= ======== =========================== 613 532 open: yes 614 533 close: yes 615 534 fault: yes can return with page locked ··· 620 533 page_mkwrite: yes can return with page locked 621 534 pfn_mkwrite: yes 622 535 access: yes 536 + ============= ======== =========================== 623 537 624 - ->fault() is called when a previously not present pte is about 538 + ->fault() is called when a previously not present pte is about 625 539 to be faulted in. The filesystem must find and return the page associated 626 540 with the passed in "pgoff" in the vm_fault structure. If it is possible that 627 541 the page may be truncated and/or invalidated, then the filesystem must lock ··· 630 542 subsequent truncate), and then return with VM_FAULT_LOCKED, and the page 631 543 locked. The VM will unlock the page. 632 544 633 - ->map_pages() is called when VM asks to map easy accessible pages. 545 + ->map_pages() is called when VM asks to map easy accessible pages. 634 546 Filesystem should find and map pages associated with offsets from "start_pgoff" 635 547 till "end_pgoff". ->map_pages() is called with page table locked and must 636 548 not block. If it's not possible to reach a page without blocking, ··· 639 551 "pte" field in vm_fault structure. Pointers to entries for other offsets 640 552 should be calculated relative to "pte". 
641 553 642 - ->page_mkwrite() is called when a previously read-only pte is 554 + ->page_mkwrite() is called when a previously read-only pte is 643 555 about to become writeable. The filesystem again must ensure that there are 644 556 no truncate/invalidate races, and then return with the page locked. If 645 557 the page has been truncated, the filesystem should not look up a new page 646 558 like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which 647 559 will cause the VM to retry the fault. 648 560 649 - ->pfn_mkwrite() is the same as page_mkwrite but when the pte is 561 + ->pfn_mkwrite() is the same as page_mkwrite but when the pte is 650 562 VM_PFNMAP or VM_MIXEDMAP with a page-less entry. Expected return is 651 563 VM_FAULT_NOPAGE. Or one of the VM_FAULT_ERROR types. The default behavior 652 564 after this call is to make the pte read-write, unless pfn_mkwrite returns 653 565 an error. 654 566 655 - ->access() is called when get_user_pages() fails in 567 + ->access() is called when get_user_pages() fails in 656 568 access_process_vm(), typically used to debug a process through 657 569 /proc/pid/mem or ptrace. This function is needed only for 658 570 VM_IO | VM_PFNMAP VMAs. 659 571 660 - ================================================================================ 572 + -------------------------------------------------------------------------------- 573 + 661 574 Dubious stuff 662 575 663 576 (if you break something or notice that it is broken and do not fix it yourself

+25 -15

Documentation/filesystems/directory-locking Documentation/filesystems/directory-locking.rst

··· 1 - Locking scheme used for directory operations is based on two 1 + ================= 2 + Directory Locking 3 + ================= 4 + 5 + 6 + Locking scheme used for directory operations is based on two 2 7 kinds of locks - per-inode (->i_rwsem) and per-filesystem 3 8 (->s_vfs_rename_mutex). 4 9 5 - When taking the i_rwsem on multiple non-directory objects, we 10 + When taking the i_rwsem on multiple non-directory objects, we 6 11 always acquire the locks in order by increasing address. We'll call 7 12 that "inode pointer" order in the following. 8 13 9 - For our purposes all operations fall in 5 classes: 14 + For our purposes all operations fall in 5 classes: 10 15 11 16 1) read access. Locking rules: caller locks directory we are accessing. 12 17 The lock is taken shared. ··· 32 27 case) shared. 33 28 34 29 5) link creation. Locking rules: 30 + 35 31 * lock parent 36 32 * check that source is not a directory 37 33 * lock source 38 34 * call the method. 35 + 39 36 All locks are exclusive. 40 37 41 38 6) cross-directory rename. The trickiest in the whole bunch. Locking 42 39 rules: 40 + 43 41 * lock the filesystem 44 42 * lock parents in "ancestors first" order. 45 43 * find source and target. 46 44 * if old parent is equal to or is a descendent of target 47 - fail with -ENOTEMPTY 45 + fail with -ENOTEMPTY 48 46 * if new parent is equal to or is a descendent of source 49 - fail with -ELOOP 47 + fail with -ELOOP 50 48 * If it's an exchange, lock both the source and the target. 51 49 * If the target exists, lock it. If the source is a non-directory, 52 50 lock it. If we need to lock both, do so in inode pointer order. 53 51 * call the method. 52 + 54 53 All ->i_rwsem are taken exclusive. Again, we might get away with locking 55 54 the the source (and target in exchange case) shared. 56 55 ··· 63 54 64 55 65 56 If no directory is its own ancestor, the scheme above is deadlock-free. 
57 + 66 58 Proof: 67 59 68 60 First of all, at any moment we have a partial ordering of the 69 - objects - A < B iff A is an ancestor of B. 61 + objects - A < B iff A is an ancestor of B. 70 62 71 63 That ordering can change. However, the following is true: 72 64 ··· 87 77 non-directory object, except renames, which take locks on source and 88 78 target in inode pointer order in the case they are not directories.) 89 79 90 - Now consider the minimal deadlock. Each process is blocked on 80 + Now consider the minimal deadlock. Each process is blocked on 91 81 attempt to acquire some lock and already holds at least one lock. Let's 92 82 consider the set of contended locks. First of all, filesystem lock is 93 83 not contended, since any process blocked on it is not holding any locks. 94 84 Thus all processes are blocked on ->i_rwsem. 95 85 96 - By (3), any process holding a non-directory lock can only be 86 + By (3), any process holding a non-directory lock can only be 97 87 waiting on another non-directory lock with a larger address. Therefore 98 88 the process holding the "largest" such lock can always make progress, and 99 89 non-directory objects are not included in the set of contended locks. 100 90 101 - Thus link creation can't be a part of deadlock - it can't be 91 + Thus link creation can't be a part of deadlock - it can't be 102 92 blocked on source and it means that it doesn't hold any locks. 103 93 104 - Any contended object is either held by cross-directory rename or 94 + Any contended object is either held by cross-directory rename or 105 95 has a child that is also contended. Indeed, suppose that it is held by 106 96 operation other than cross-directory rename. Then the lock this operation 107 97 is blocked on belongs to child of that object due to (1). 108 98 109 - It means that one of the operations is cross-directory rename. 99 + It means that one of the operations is cross-directory rename. 
110 100 Otherwise the set of contended objects would be infinite - each of them 111 101 would have a contended child and we had assumed that no object is its 112 102 own descendent. Moreover, there is exactly one cross-directory rename 113 103 (see above). 114 104 115 - Consider the object blocking the cross-directory rename. One 105 + Consider the object blocking the cross-directory rename. One 116 106 of its descendents is locked by cross-directory rename (otherwise we 117 107 would again have an infinite set of contended objects). But that 118 108 means that cross-directory rename is taking locks out of order. Due ··· 122 112 Contradiction. I.e. deadlock is impossible. Q.E.D. 123 113 124 114 125 - These operations are guaranteed to avoid loop creation. Indeed, 115 + These operations are guaranteed to avoid loop creation. Indeed, 126 116 the only operation that could introduce loops is cross-directory rename. 127 117 Since the only new (parent, child) pair added by rename() is (new parent, 128 118 source), such loop would have to contain these objects and the rest of it ··· 133 123 we had acquired filesystem lock and rename() would fail with -ELOOP in that 134 124 case. 135 125 136 - While this locking scheme works for arbitrary DAGs, it relies on 126 + While this locking scheme works for arbitrary DAGs, it relies on 137 127 ability to check that directory is a descendent of another object. Current 138 128 implementation assumes that directory graph is a tree. This assumption is 139 129 also preserved by all operations (cross-directory rename on a tree that would 140 130 not introduce a cycle will leave it a tree and link() fails for directories). 
141 131 142 - Notice that "directory" in the above == "anything that might have 132 + Notice that "directory" in the above == "anything that might have 143 133 children", so if we are going to introduce hybrid objects we will need 144 134 either to make sure that link(2) doesn't work for them or to make changes 145 135 in is_subdir() that would make it work even in presence of such beasts.
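Two of the rules in this file lend themselves to a small illustration: the ancestry checks performed before a cross-directory rename, and the "inode pointer" order used when two non-directory objects must both be locked. The following is a user-space sketch under stated assumptions, not the kernel implementation: `struct toy_node`, `toy_is_ancestor_or_self()`, `toy_rename_check()` and `toy_lock_two()` are hypothetical names, the directory graph is assumed to be a tree with parent pointers, and a plain flag stands in for ->i_rwsem.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an inode; not the kernel's struct inode. */
struct toy_node {
	struct toy_node *parent;	/* NULL for the root */
	int locked;			/* stand-in for ->i_rwsem held exclusive */
};

/* Return 1 if a is b itself or an ancestor of b, else 0. */
static int toy_is_ancestor_or_self(const struct toy_node *a,
				   const struct toy_node *b)
{
	for (; b; b = b->parent)
		if (b == a)
			return 1;
	return 0;
}

/*
 * Checks from the cross-directory rename rules:
 *  - old parent equal to or a descendent of target -> -ENOTEMPTY
 *  - new parent equal to or a descendent of source -> -ELOOP
 * target may be NULL when nothing exists under the new name.
 */
static int toy_rename_check(const struct toy_node *old_parent,
			    const struct toy_node *new_parent,
			    const struct toy_node *source,
			    const struct toy_node *target)
{
	assert(old_parent && new_parent && source);

	if (target && toy_is_ancestor_or_self(target, old_parent))
		return -ENOTEMPTY;
	if (toy_is_ancestor_or_self(source, new_parent))
		return -ELOOP;
	return 0;
}

/*
 * Take two distinct non-directory objects in "inode pointer" order,
 * i.e. lowest address first, and report the order that was used.
 */
static void toy_lock_two(struct toy_node *x, struct toy_node *y,
			 struct toy_node **first, struct toy_node **second)
{
	if ((uintptr_t)x > (uintptr_t)y) {
		struct toy_node *tmp = x;
		x = y;
		y = tmp;
	}
	x->locked = 1;
	y->locked = 1;
	*first = x;
	*second = y;
}
```

Because every caller sorts by address before locking, two tasks that need the same pair of objects always contend on the lower-addressed one first, which is the property step (3) of the proof relies on.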

+2

Documentation/filesystems/index.rst

···
    path-lookup
    api-summary
    splice
+   locking
+   directory-locking

 Filesystem support layers
 =========================

+18 -13

Documentation/filesystems/nfs/Exporting Documentation/filesystems/nfs/exporting.rst

··· 1 + :orphan: 1 2 2 3 Making Filesystems Exportable 3 4 ============================= ··· 43 42 for the object. This leads to two related but distinct features of 44 43 the dcache that are not needed for normal filesystem access. 45 44 46 - 1/ The dcache must sometimes contain objects that are not part of the 45 + 1. The dcache must sometimes contain objects that are not part of the 47 46 proper prefix. i.e that are not connected to the root. 48 - 2/ The dcache must be prepared for a newly found (via ->lookup) directory 47 + 2. The dcache must be prepared for a newly found (via ->lookup) directory 49 48 to already have a (non-connected) dentry, and must be able to move 50 49 that dentry into place (based on the parent and name in the 51 50 ->lookup). This is particularly needed for directories as ··· 53 52 54 53 To implement these features, the dcache has: 55 54 56 - a/ A dentry flag DCACHE_DISCONNECTED which is set on 55 + a. A dentry flag DCACHE_DISCONNECTED which is set on 57 56 any dentry that might not be part of the proper prefix. 58 57 This is set when anonymous dentries are created, and cleared when a 59 58 dentry is noticed to be a child of a dentry which is in the proper ··· 72 71 dentries. That guarantees that we won't need to hunt them down upon 73 72 umount. 74 73 75 - b/ A primitive for creation of secondary roots - d_obtain_root(inode). 74 + b. A primitive for creation of secondary roots - d_obtain_root(inode). 76 75 Those do _not_ bear DCACHE_DISCONNECTED. They are placed on the 77 76 per-superblock list (->s_roots), so they can be located at umount 78 77 time for eviction purposes. 79 78 80 - c/ Helper routines to allocate anonymous dentries, and to help attach 79 + c. Helper routines to allocate anonymous dentries, and to help attach 81 80 loose directory dentries at lookup time. They are: 81 + 82 82 d_obtain_alias(inode) will return a dentry for the given inode. 83 83 If the inode already has a dentry, one of those is returned. 
84 + 84 85 If it doesn't, a new anonymous (IS_ROOT and 85 - DCACHE_DISCONNECTED) dentry is allocated and attached. 86 + DCACHE_DISCONNECTED) dentry is allocated and attached. 87 + 86 88 In the case of a directory, care is taken that only one dentry 87 89 can ever be attached. 90 + 88 91 d_splice_alias(inode, dentry) will introduce a new dentry into the tree; 89 92 either the passed-in dentry or a preexisting alias for the given inode 90 93 (such as an anonymous one created by d_obtain_alias), if appropriate. 91 94 It returns NULL when the passed-in dentry is used, following the calling 92 95 convention of ->lookup. 93 - 96 + 94 97 Filesystem Issues 95 98 ----------------- 96 99 97 100 For a filesystem to be exportable it must: 98 - 99 - 1/ provide the filehandle fragment routines described below. 100 - 2/ make sure that d_splice_alias is used rather than d_add 101 + 102 + 1. provide the filehandle fragment routines described below. 103 + 2. make sure that d_splice_alias is used rather than d_add 101 104 when ->lookup finds an inode for a given parent and name. 102 105 103 - If inode is NULL, d_splice_alias(inode, dentry) is equivalent to 106 + If inode is NULL, d_splice_alias(inode, dentry) is equivalent to:: 104 107 105 108 d_add(dentry, inode), NULL 106 109 107 110 Similarly, d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err) 108 111 109 - Typically the ->lookup routine will simply end with a: 112 + Typically the ->lookup routine will simply end with a:: 110 113 111 114 return d_splice_alias(inode, dentry); 112 115 } 113 116 114 117 115 118 116 - A file system implementation declares that instances of the filesystem 119 + A file system implementation declares that instances of the filesystem 117 120 are exportable by setting the s_export_op field in the struct 118 121 super_block. This field must point to a "struct export_operations" 119 122 struct which has the following members:

+1 -1

Documentation/filesystems/vfs.rst

···

 VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on
 are called from a process context. Filesystem locking is described in
-the document Documentation/filesystems/Locking.
+the document Documentation/filesystems/locking.rst.


 Directory Entry Cache (dcache)

+1 -1

fs/cifs/export.c

···
  */

 /*
- * See Documentation/filesystems/nfs/Exporting
+ * See Documentation/filesystems/nfs/exporting.rst
  * and examples in fs/exportfs
  *
  * Since cifs is a network file system, an "fsid" must be included for

+1 -1

fs/exportfs/expfs.c

···
  * and for mapping back from file handles to dentries.
  *
  * For details on why we do all the strange and hairy things in here
- * take a look at Documentation/filesystems/nfs/Exporting.
+ * take a look at Documentation/filesystems/nfs/exporting.rst.
  */
 #include <linux/exportfs.h>
 #include <linux/fs.h>

+1 -1

fs/isofs/export.c

···
  *
  * The following files are helpful:
  *
- * Documentation/filesystems/nfs/Exporting
+ * Documentation/filesystems/nfs/exporting.rst
  * fs/exportfs/expfs.c.
  */


+1 -1

fs/orangefs/file.c

···
  * Change the file pointer position for an instance of an open file.
  *
  * \note If .llseek is overriden, we must acquire lock as described in
- * Documentation/filesystems/Locking.
+ * Documentation/filesystems/locking.rst.
  *
  * Future upgrade could support SEEK_DATA and SEEK_HOLE but would
  * require much changes to the FS

+1 -1

include/linux/dcache.h

···

 /*
  * Locking rules for dentry_operations callbacks are to be found in
- * Documentation/filesystems/Locking. Keep it updated!
+ * Documentation/filesystems/locking.rst. Keep it updated!
  *
  * FUrther descriptions are found in Documentation/filesystems/vfs.rst.
  * Keep it updated too!

+1 -1

include/linux/exportfs.h

···
  * @get_parent: find the parent of a given directory
  * @commit_metadata: commit metadata changes to stable storage
  *
- * See Documentation/filesystems/nfs/Exporting for details on how to use
+ * See Documentation/filesystems/nfs/exporting.rst for details on how to use
  * this interface correctly.
  *
  * encode_fh: