Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

vfs: fix dentry RCU to refcounting possibly sleeping dput()

This is the fix that the last two commits indirectly led up to - making
sure that we don't call dput() in a bad context on the dentries we've
looked up in RCU mode after the sequence count validation fails.

This basically expands d_rcu_to_refcount() into the callers, and then
fixes the callers to delay the dput() in the failure case until _after_
we've dropped all locks and are no longer in an RCU-locked region.

The case of 'complete_walk()' was trivial, since its failure case did
the unlock_rcu_walk() directly after the call to d_rcu_to_refcount(),
and as such that is just a pure expansion of the function with a trivial
movement of the resulting dput() to after 'unlock_rcu_walk()'.

In contrast, the unlazy_walk() case was much more complicated, because
it not only converts two different dentries from RCU mode to being
reference-counted, but it also used to not call unlock_rcu_walk() at
all, instead just returning an error and letting the caller clean
everything up in "terminate_walk()".

Happily, one of the dentries in question (called "parent" inside
unlazy_walk()) is the dentry of "nd->path", which terminate_walk()
wants a reference to anyway in the non-RCU case.

So what the new and improved unlazy_walk() does is first turn that
dentry into a refcounted one; once that is set up, the error cases can
continue to use the terminate_walk() helper for cleanup, but now for
the non-RCU case. That makes it possible to drop out of RCU mode if we
actually hit the sequence number failure case.

Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

+49 -53
fs/namei.c
···
 	br_read_unlock(&vfsmount_lock);
 }
 
-/*
- * When we move over from the RCU domain to properly refcounted
- * long-lived dentries, we need to check the sequence numbers
- * we got before lookup very carefully.
- *
- * We cannot blindly increment a dentry refcount - even if it
- * is not locked - if it is zero, because it may have gone
- * through the final d_kill() logic already.
- *
- * So for a zero refcount, we need to get the spinlock (which is
- * safe even for a dead dentry because the de-allocation is
- * RCU-delayed), and check the sequence count under the lock.
- *
- * Once we have checked the sequence count, we know it is live,
- * and since we hold the spinlock it cannot die from under us.
- *
- * In contrast, if the reference count wasn't zero, we can just
- * increment the lockref without having to take the spinlock.
- * Even if the sequence number ends up being stale, we haven't
- * gone through the final dput() and killed the dentry yet.
- */
-static inline int d_rcu_to_refcount(struct dentry *dentry, seqcount_t *validate, unsigned seq)
-{
-	if (likely(lockref_get_not_dead(&dentry->d_lockref))) {
-		if (!read_seqcount_retry(validate, seq))
-			return 0;
-		dput(dentry);
-	}
-	return -ECHILD;
-}
-
 /**
  * unlazy_walk - try to switch to ref-walk mode.
  * @nd: nameidata pathwalk data
···
 {
 	struct fs_struct *fs = current->fs;
 	struct dentry *parent = nd->path.dentry;
-	int want_root = 0;
 
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
-	if (nd->root.mnt && !(nd->flags & LOOKUP_ROOT)) {
-		want_root = 1;
-		spin_lock(&fs->lock);
-		if (nd->root.mnt != fs->root.mnt ||
-		    nd->root.dentry != fs->root.dentry)
-			goto err_root;
-	}
+
+	/*
+	 * Get a reference to the parent first: we're
+	 * going to make "path_put(nd->path)" valid in
+	 * non-RCU context for "terminate_walk()".
+	 *
+	 * If this doesn't work, return immediately with
+	 * RCU walking still active (and then we will do
+	 * the RCU walk cleanup in terminate_walk()).
+	 */
+	if (!lockref_get_not_dead(&parent->d_lockref))
+		return -ECHILD;
+
+	/*
+	 * After the mntget(), we terminate_walk() will do
+	 * the right thing for non-RCU mode, and all our
+	 * subsequent exit cases should unlock_rcu_walk()
+	 * before returning.
+	 */
+	mntget(nd->path.mnt);
+	nd->flags &= ~LOOKUP_RCU;
 
 	/*
 	 * For a negative lookup, the lookup sequence point is the parents
···
 	 * be valid if the child sequence number is still valid.
 	 */
 	if (!dentry) {
-		if (d_rcu_to_refcount(parent, &parent->d_seq, nd->seq) < 0)
-			goto err_root;
+		if (read_seqcount_retry(&parent->d_seq, nd->seq))
+			goto out;
 		BUG_ON(nd->inode != parent->d_inode);
 	} else {
-		if (d_rcu_to_refcount(dentry, &dentry->d_seq, nd->seq) < 0)
-			goto err_root;
-		if (d_rcu_to_refcount(parent, &dentry->d_seq, nd->seq) < 0)
-			goto err_parent;
+		if (!lockref_get_not_dead(&dentry->d_lockref))
+			goto out;
+		if (read_seqcount_retry(&dentry->d_seq, nd->seq))
+			goto drop_dentry;
 	}
-	if (want_root) {
+
+	/*
+	 * Sequence counts matched. Now make sure that the root is
+	 * still valid and get it if required.
+	 */
+	if (nd->root.mnt && !(nd->flags & LOOKUP_ROOT)) {
+		spin_lock(&fs->lock);
+		if (nd->root.mnt != fs->root.mnt || nd->root.dentry != fs->root.dentry)
+			goto unlock_and_drop_dentry;
 		path_get(&nd->root);
 		spin_unlock(&fs->lock);
 	}
-	mntget(nd->path.mnt);
 
 	unlock_rcu_walk();
-	nd->flags &= ~LOOKUP_RCU;
 	return 0;
 
-err_parent:
+unlock_and_drop_dentry:
+	spin_unlock(&fs->lock);
+drop_dentry:
+	unlock_rcu_walk();
 	dput(dentry);
-err_root:
-	if (want_root)
-		spin_unlock(&fs->lock);
+	return -ECHILD;
+out:
+	unlock_rcu_walk();
 	return -ECHILD;
 }
···
 	if (!(nd->flags & LOOKUP_ROOT))
 		nd->root.mnt = NULL;
 
-	if (d_rcu_to_refcount(dentry, &dentry->d_seq, nd->seq) < 0) {
+	if (unlikely(!lockref_get_not_dead(&dentry->d_lockref))) {
 		unlock_rcu_walk();
+		return -ECHILD;
+	}
+	if (read_seqcount_retry(&dentry->d_seq, nd->seq)) {
+		unlock_rcu_walk();
+		dput(dentry);
 		return -ECHILD;
 	}
 	mntget(nd->path.mnt);