Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

[PATCH] namespace.c: fix race in mark_mounts_for_expiry()

This patch fixes a race found by Ram in mark_mounts_for_expiry() in
fs/namespace.c.

The bug can only be triggered with simultaneous exiting of a process having
a private namespace, and expiry of a mount from within that namespace.
It's practically impossible to trigger, and I haven't even tried. But
still, a bug is a bug.

The race happens when put_namespace() is called by another task, while
mark_mounts_for_expiry() is between atomic_read() and get_namespace(). In
that case get_namespace() will be called on an already dead namespace with
unforeseeable results.

The solution was suggested by Al Viro, with his own words:

Instead of screwing with atomic_read() in there, why don't we
simply do the following:
a) atomic_dec_and_lock() in put_namespace()
b) __put_namespace() called without dropping lock
c) the first thing done by __put_namespace would be
struct vfsmount *root = namespace->root;
namespace->root = NULL;
spin_unlock(...);
....
umount_tree(root);
...
d) check in mark_... would be simply namespace && namespace->root.

And we are all set; no screwing around with atomic_read(), no magic
at all. Dying namespace gets NULL ->root.
All changes of ->root happen under spinlock.
If under a spinlock we see non-NULL ->mnt_namespace, it won't be
freed until we drop the lock (we will set ->mnt_namespace to NULL
under that lock before we get to freeing namespace).
If under a spinlock we see non-NULL ->mnt_namespace and
->mnt_namespace->root, we can grab a reference to namespace and be
sure that it won't go away.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

authored by

Miklos Szeredi and committed by
Linus Torvalds
1ce88cf4 202322e6

+7 -3
+5 -2
fs/namespace.c
··· 869 869 /* don't do anything if the namespace is dead - all the 870 870 * vfsmounts from it are going away anyway */ 871 871 namespace = mnt->mnt_namespace; 872 - if (!namespace || atomic_read(&namespace->count) <= 0) 872 + if (!namespace || !namespace->root) 873 873 continue; 874 874 get_namespace(namespace); 875 875 ··· 1450 1450 1451 1451 void __put_namespace(struct namespace *namespace) 1452 1452 { 1453 + struct vfsmount *root = namespace->root; 1454 + namespace->root = NULL; 1455 + spin_unlock(&vfsmount_lock); 1453 1456 down_write(&namespace->sem); 1454 1457 spin_lock(&vfsmount_lock); 1455 - umount_tree(namespace->root); 1458 + umount_tree(root); 1456 1459 spin_unlock(&vfsmount_lock); 1457 1460 up_write(&namespace->sem); 1458 1461 kfree(namespace);
+2 -1
include/linux/namespace.h
··· 17 17 18 18 static inline void put_namespace(struct namespace *namespace) 19 19 { 20 - if (atomic_dec_and_test(&namespace->count)) 20 + if (atomic_dec_and_lock(&namespace->count, &vfsmount_lock)) 21 + /* releases vfsmount_lock */ 21 22 __put_namespace(namespace); 22 23 } 23 24