Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge patch series "fs: allow changing idmappings"

Christian Brauner <brauner@kernel.org> says:

Currently, it isn't possible to change the idmapping of an idmapped
mount. This is becoming an obstacle for various use-cases.

/* idmapped home directories with systemd-homed */

On newer systems, /home can be an idmapped mount such that each file
on disk is owned by 65536 and a subfolder exists for foreign id ranges
such as containers. For example, a home directory might look like this
(using an arbitrary folder as an example):

user1@localhost:~/data/mount-idmapped$ ls -al /data/
total 16
drwxrwxrwx 1 65536 65536 36 Jan 27 12:15 .
drwxrwxr-x 1 root root 184 Jan 27 12:06 ..
-rw-r--r-- 1 65536 65536 0 Jan 27 12:07 aaa
-rw-r--r-- 1 65536 65536 0 Jan 27 12:07 bbb
-rw-r--r-- 1 65536 65536 0 Jan 27 12:07 cc
drwxr-xr-x 1 2147352576 2147352576 0 Jan 27 19:06 containers

When the user logs in, /home is mounted as an idmapped mount with the
following idmappings:

65536:$(id -u):1 // uid mapping
65536:$(id -g):1 // gid mapping
2147352576:2147352576:65536 // uid mapping
2147352576:2147352576:65536 // gid mapping

So, for a user with uid/gid 1000, an idmapped /home would look like this:

user1@localhost:~/data/mount-idmapped$ ls -aln /mnt/
total 16
drwxrwxrwx 1 1000 1000 36 Jan 27 12:15 .
drwxrwxr-x 1 0 0 184 Jan 27 12:06 ..
-rw-r--r-- 1 1000 1000 0 Jan 27 12:07 aaa
-rw-r--r-- 1 1000 1000 0 Jan 27 12:07 bbb
-rw-r--r-- 1 1000 1000 0 Jan 27 12:07 cc
drwxr-xr-x 1 2147352576 2147352576 0 Jan 27 19:06 containers

In other words, 65536 is mapped to the user's uid/gid and the range
2147352576 up to 2147352576 + 65536 is an identity mapping for
containers.
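To make the mapping concrete, here is a small C sketch (not the kernel's implementation) that translates an on-disk id into the id shown through the mount. It reads each mapping as disk-id:mount-id:count, the same column order as the listing above; `struct idmap_extent`, `map_disk_to_mnt()` and `ID_UNMAPPED` are hypothetical names. An id that no extent covers is returned as the sentinel; the kernel typically reports such ids as the overflow id.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: disk ids [disk, disk + count) appear as [mnt, mnt + count). */
struct idmap_extent {
	uint32_t disk;  /* first id as stored on disk */
	uint32_t mnt;   /* first id as seen through the idmapped mount */
	uint32_t count; /* number of consecutive ids covered */
};

/* Sentinel for ids that no extent covers. */
#define ID_UNMAPPED UINT32_MAX

/* Translate an on-disk id into the id visible through the mount. */
static uint32_t map_disk_to_mnt(const struct idmap_extent *map, size_t n,
				uint32_t disk_id)
{
	for (size_t i = 0; i < n; i++) {
		/* Subtract first so the range check cannot overflow. */
		if (disk_id >= map[i].disk && disk_id - map[i].disk < map[i].count)
			return map[i].mnt + (disk_id - map[i].disk);
	}
	return ID_UNMAPPED;
}
```

With the extents {65536, 1000, 1} and {2147352576, 2147352576, 65536}, on-disk 65536 becomes 1000, the container range maps to itself, and on-disk 0 is left unmapped.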

When a container is started, a transient uid/gid range is allocated
outside of both mappings of the idmapped mount. For example, the
container might get the idmapping:

$ cat /proc/1742611/uid_map
0 537985024 65536

This container will be allowed to write to disk within the allocated
foreign id range 2147352576 to 2147352576 + 65536. To do this, an
idmapped mount must be created from an already idmapped mount such that:

- The mappings for the user's uid/gid must be dropped, i.e., the
following mappings are removed:

65536:$(id -u):1 // uid mapping
65536:$(id -g):1 // gid mapping

- A mapping for the transient uid/gid range to the foreign uid/gid range
is added:

2147352576:537985024:65536

In combination, this means that the container will write to disk
within the foreign id range 2147352576 to 2147352576 + 65536.
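The two-step translation can be sketched in C as follows (hypothetical names, not kernel code). It assumes, as in the text, that the container's uid_map line `0 537985024 65536` maps container ids to host ids, and that the new mount's single mapping `2147352576:537985024:65536` is read as disk-id:host-id:count. A file created by container uid 0 therefore lands on disk as 2147352576, while the user's own uid 1000 is no longer mapped through this mount.

```c
#include <stddef.h>
#include <stdint.h>

#define ID_UNMAPPED UINT32_MAX

/* One "first lower count" mapping line. */
struct id_extent {
	uint32_t first; /* userns: container id; mount: on-disk id */
	uint32_t lower; /* userns: host id;      mount: host id    */
	uint32_t count;
};

/* Forward lookup: an id in [first, first + count) becomes lower + offset. */
static uint32_t extent_map(const struct id_extent *map, size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++)
		if (id >= map[i].first && id - map[i].first < map[i].count)
			return map[i].lower + (id - map[i].first);
	return ID_UNMAPPED;
}

/* Reverse lookup: an id in [lower, lower + count) becomes first + offset. */
static uint32_t extent_unmap(const struct id_extent *map, size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++)
		if (id >= map[i].lower && id - map[i].lower < map[i].count)
			return map[i].first + (id - map[i].lower);
	return ID_UNMAPPED;
}

/*
 * Hypothetical helper: on-disk id of a file created by container id @cid.
 * The container's uid_map yields a host id, which the mount's idmapping
 * then translates back into an on-disk id.
 */
static uint32_t container_id_to_disk(const struct id_extent *userns, size_t nu,
				     const struct id_extent *mount, size_t nm,
				     uint32_t cid)
{
	uint32_t host = extent_map(userns, nu, cid);

	if (host == ID_UNMAPPED)
		return ID_UNMAPPED;
	return extent_unmap(mount, nm, host);
}
```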

/* nested containers */

When the outer container makes use of idmapped mounts, it isn't possible
to create an idmapped mount for the inner container with a different
idmapping from the outer container's idmapped mount.

There are other use-cases; the two above just serve as an illustration
of the problem.

This patchset makes it possible to create a new idmapped mount from an
already idmapped mount. It aims to adhere to current performance
constraints and requirements:

- Idmapped mounts aim to have near zero performance implications for
path lookup. That is why no reference counting, locking, or any other
mechanism that would impact performance can be required.

This works by ensuring that a regular mount transitions to an idmapped
mount only once, going from the static nop_mnt_idmap mapping to a
non-static idmapping.

- After that, the idmapping of a mount cannot change for the rest of the
mount's lifetime. This not only avoids UAF issues, it also avoids
pitfalls such as generating non-matching uid/gid values.
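The lookup path can stay free of locks and reference counts because the idmapping pointer is published with release semantics and read with acquire semantics (the diff below pairs smp_store_release() in do_idmap_mount() with smp_load_acquire() in mnt_idmap()). The following userspace sketch models that pattern with C11 atomics; `struct fake_mount`, `struct fake_idmap` and the helpers are hypothetical stand-ins, and the toy "offset" mapping is an assumption made to keep the example short.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for struct mnt_idmap with a toy mapping. */
struct fake_idmap {
	unsigned int uid_offset; /* mounted uid = disk uid + offset */
};

/* Hypothetical stand-in for a mount carrying an idmapping pointer. */
struct fake_mount {
	_Atomic(const struct fake_idmap *) idmap;
};

/* Static no-op idmapping every mount starts with, like nop_mnt_idmap. */
static const struct fake_idmap fake_nop_idmap = { 0 };

/* Reader side: the acquire load pairs with the release in fake_set_idmap(). */
static const struct fake_idmap *fake_mnt_idmap(struct fake_mount *mnt)
{
	return atomic_load_explicit(&mnt->idmap, memory_order_acquire);
}

/* Writer side: @idmap must be fully initialised before it is published. */
static void fake_set_idmap(struct fake_mount *mnt, const struct fake_idmap *idmap)
{
	atomic_store_explicit(&mnt->idmap, idmap, memory_order_release);
}

/* Lookup-style helper: no lock and no reference count, one acquire load. */
static unsigned int fake_map_uid(struct fake_mount *mnt, unsigned int disk_uid)
{
	return disk_uid + fake_mnt_idmap(mnt)->uid_offset;
}
```

The sketch is only safe because nothing frees an idmapping that a concurrent reader may still be using; the one-time transition away from the static nop_mnt_idmap is what provides that guarantee.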

Changing idmappings could be solved by:

- Idmappings could simply be reference counted (on top of the existing
reference count used when sharing them across multiple mounts).

This would require pairing mnt_idmap_get() with mnt_idmap_put() calls,
which would end up being sprinkled throughout the VFS and some
filesystems that access idmappings directly.

Not only would this be quite ugly and introduce new complexity, it would
also have a noticeable performance impact.

- Idmappings could gain RCU protection. This would help the LOOKUP_RCU
case and avoid taking reference counts under RCU.

When not under LOOKUP_RCU, reference counts need to be acquired on each
idmapping. This would require pairing mnt_idmap_get() with
mnt_idmap_put() calls, which would end up being sprinkled throughout the
VFS and some filesystems that access idmappings directly.

This would have the same downsides as mentioned earlier.

- The earlier solutions work by updating the mnt->mnt_idmap pointer with
the new idmapping. Instead, it would be possible to change the
idmapping itself to avoid UAF issues.

To do this, a sequence counter would have to be added to struct mount.
When retrieving the idmapping to generate uid/gid values, the sequence
counter would need to be sampled, and the generation of the uid/gid
would spin until the update of the idmap is finished.

This has problems as well, the biggest being that it can lead to
inconsistent permission checking and inconsistent uid/gid pairs, even
more so than is already possible today. Specifically, during creation
it could happen that:

idmap = mnt_idmap(mnt);
inode_permission(idmap, ...);
may_create(idmap);
// create file with uid/gid based on @idmap

in between the permission checking and the generation of the uid/gid
value, the idmapping could change, leaving the permission check and the
uid/gid value actually used to create the file on disk out of sync.

Similarly, if two values are generated like:

idmap = mnt_idmap(mnt);
vfsgid = make_vfsgid(idmap);
// idmapping gets updated concurrently
vfsuid = make_vfsuid(idmap);

@vfsgid and @vfsuid could be out of sync if the idmapping was changed
in between. The generation of vfsgid/vfsuid could span many lines of
code, so to guard against this, a sequence count would have to be
passed around.

The performance impact of this solution is less clear but very likely
not zero.
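For comparison, the rejected sequence-counter scheme might look roughly like the following userspace sketch, which uses C11 atomics in place of the kernel's seqcount_t. All names are hypothetical, the toy offset mapping is an assumption, and a single writer (or writers serialised by a lock) is assumed. It illustrates that the retry check only protects values generated inside one read section, so uid and gid must be produced together, or the sampled sequence count has to be carried between them.

```c
#include <stdatomic.h>

/* Hypothetical mount with a mutable toy idmapping guarded by a sequence counter. */
struct seq_mount {
	atomic_uint seq;        /* even: stable, odd: update in progress */
	atomic_uint uid_offset; /* mounted uid = disk uid + uid_offset */
	atomic_uint gid_offset; /* mounted gid = disk gid + gid_offset */
};

/* Writer (assumed single): make seq odd, update the mapping, make it even. */
static void seq_mount_update(struct seq_mount *m, unsigned int uid_off,
			     unsigned int gid_off)
{
	unsigned int s = atomic_load_explicit(&m->seq, memory_order_relaxed);

	atomic_store_explicit(&m->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&m->uid_offset, uid_off, memory_order_relaxed);
	atomic_store_explicit(&m->gid_offset, gid_off, memory_order_relaxed);
	atomic_store_explicit(&m->seq, s + 2, memory_order_release);
}

/*
 * Reader: generate uid and gid inside ONE read section so both come from
 * the same idmapping. Generating them in separate sections could mix the
 * old and the new mapping.
 */
static unsigned int seq_mount_map(struct seq_mount *m, unsigned int disk_uid,
				  unsigned int disk_gid, unsigned int *gid_out)
{
	for (;;) {
		unsigned int s = atomic_load_explicit(&m->seq, memory_order_acquire);
		unsigned int uid, gid;

		if (s & 1)
			continue; /* update in progress: spin until it finishes */
		uid = disk_uid + atomic_load_explicit(&m->uid_offset, memory_order_relaxed);
		gid = disk_gid + atomic_load_explicit(&m->gid_offset, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&m->seq, memory_order_relaxed) == s) {
			*gid_out = gid;
			return uid;
		}
	}
}
```

Even this version says nothing about a permission check done before the read section, which is the inconsistency described above.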

- Using SRCU, which can sleep, similar to fanotify. I find that ugly,
and it would also have memory consumption implications.

/* solution */

So, to avoid all of these pitfalls, creating an idmapped mount from an
already idmapped mount will be done atomically, i.e., a new detached
mount is created and a new set of mount properties is applied to it
without it ever having been exposed to userspace at all.

This could be done in two ways. The first would be a new open_tree()
flag, OPEN_TREE_CLEAR_IDMAP, that clears the old idmapping and returns a
mount that isn't idmapped. It would then be possible to set mount
attributes on it again, including creating an idmapped mount.

This has the consequence that a file descriptor must exist in userspace
that doesn't have any idmapping applied, so it will never work in
unprivileged scenarios: a container would be able to remove the
idmapping of the mount it has been given. That should be avoided.

Instead, we add open_tree_attr(), which works just like open_tree() but
takes an optional struct mount_attr parameter. This is useful beyond
idmappings: it fills a gap by making it possible to create a mount that
never exists in userspace without the necessary mount properties applied.

This is particularly useful for mount options such as
MOUNT_ATTR_{RDONLY,NOSUID,NODEV,NOEXEC}.
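As a sketch of that use, the C helper below recursively clones a tree and requests MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOSUID | MOUNT_ATTR_NODEV | MOUNT_ATTR_NOEXEC, so the flags are applied before any file descriptor reaches userspace. It assumes a kernel with this series and the generic syscall number 467 (alpha uses 577, per the syscall tables in the diff), calls syscall(2) directly because a libc wrapper may not exist, and uses a hypothetical local copy of struct mount_attr, `struct mount_attr_v0`, instead of relying on installed headers. `lockdown_attr()` and `open_locked_down_tree()` are hypothetical names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Generic syscall number added by this series (alpha uses 577 instead). */
#ifndef __NR_open_tree_attr
#define __NR_open_tree_attr 467
#endif

/* UAPI constants, guarded in case system headers already define them. */
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif
#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000
#endif
#ifndef MOUNT_ATTR_RDONLY
#define MOUNT_ATTR_RDONLY 0x00000001
#define MOUNT_ATTR_NOSUID 0x00000002
#define MOUNT_ATTR_NODEV  0x00000004
#define MOUNT_ATTR_NOEXEC 0x00000008
#endif

/* Hypothetical local copy of struct mount_attr (MOUNT_ATTR_SIZE_VER0, 32 bytes). */
struct mount_attr_v0 {
	uint64_t attr_set;
	uint64_t attr_clr;
	uint64_t propagation;
	uint64_t userns_fd;
};

/* Request read-only, nosuid, nodev and noexec; leave everything else alone. */
static void lockdown_attr(struct mount_attr_v0 *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->attr_set = MOUNT_ATTR_RDONLY | MOUNT_ATTR_NOSUID |
			 MOUNT_ATTR_NODEV | MOUNT_ATTR_NOEXEC;
}

/*
 * Recursively clone the tree at @path into a detached mount whose flags
 * are applied before the fd is returned. Returns the fd, or -1 with errno
 * set (e.g. ENOSYS without open_tree_attr(), EPERM without privilege).
 */
static int open_locked_down_tree(const char *path)
{
	struct mount_attr_v0 attr;

	lockdown_attr(&attr);
	return (int)syscall(__NR_open_tree_attr, AT_FDCWD, path,
			    OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC | AT_RECURSIVE,
			    &attr, sizeof(attr));
}
```

On success, the descriptor can be attached with move_mount(fd, "", -EBADF, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); creating the detached copy still requires the privilege checked by may_mount().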

To create a new idmapped mount, the following works:

// Create a first idmapped mount
struct mount_attr attr = {
	.attr_set = MOUNT_ATTR_IDMAP,
	.userns_fd = fd_userns,
};

int fd_tree = open_tree_attr(-EBADF, "/", OPEN_TREE_CLONE, &attr, sizeof(attr));
move_mount(fd_tree, "", -EBADF, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

// Create a second idmapped mount from the first idmapped mount
attr.attr_set = MOUNT_ATTR_IDMAP;
attr.userns_fd = fd_userns2;
int fd_tree2 = open_tree_attr(-EBADF, "/mnt", OPEN_TREE_CLONE, &attr, sizeof(attr));

// Create a non-idmapped mount from the first idmapped mount
memset(&attr, 0, sizeof(attr));
attr.attr_clr = MOUNT_ATTR_IDMAP;
int fd_tree3 = open_tree_attr(-EBADF, "/mnt", OPEN_TREE_CLONE, &attr, sizeof(attr));

* patches from https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-0-c25feb0d2eb3@kernel.org:
fs: allow changing idmappings
fs: add kflags member to struct mount_kattr
fs: add open_tree_attr()
fs: add copy_mount_setattr() helper
fs: add vfs_open_tree() helper

Link: https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-0-c25feb0d2eb3@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

+200 -85
+1
arch/alpha/kernel/syscalls/syscall.tbl
···
 574	common	getxattrat		sys_getxattrat
 575	common	listxattrat		sys_listxattrat
 576	common	removexattrat		sys_removexattrat
+577	common	open_tree_attr		sys_open_tree_attr
+1
arch/arm/tools/syscall.tbl
···
 464	common	getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr
+1
arch/arm64/tools/syscall_32.tbl
···
 464	common	getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr
+1
arch/m68k/kernel/syscalls/syscall.tbl
···
 464	common	getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr
+1
arch/microblaze/kernel/syscalls/syscall.tbl
···
 464	common	getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr
+1
arch/mips/kernel/syscalls/syscall_n32.tbl
···
 464	n32	getxattrat		sys_getxattrat
 465	n32	listxattrat		sys_listxattrat
 466	n32	removexattrat		sys_removexattrat
+467	n32	open_tree_attr		sys_open_tree_attr
+1
arch/mips/kernel/syscalls/syscall_n64.tbl
···
 464	n64	getxattrat		sys_getxattrat
 465	n64	listxattrat		sys_listxattrat
 466	n64	removexattrat		sys_removexattrat
+467	n64	open_tree_attr		sys_open_tree_attr
+1
arch/mips/kernel/syscalls/syscall_o32.tbl
···
 464	o32	getxattrat		sys_getxattrat
 465	o32	listxattrat		sys_listxattrat
 466	o32	removexattrat		sys_removexattrat
+467	o32	open_tree_attr		sys_open_tree_attr
+1
arch/parisc/kernel/syscalls/syscall.tbl
···
 464	common	getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr
+1
arch/powerpc/kernel/syscalls/syscall.tbl
···
 464	common	getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr
+1
arch/s390/kernel/syscalls/syscall.tbl
···
 464	common	getxattrat		sys_getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat	sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr	sys_open_tree_attr
+1
arch/sh/kernel/syscalls/syscall.tbl
···
 464	common	getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr
+1
arch/sparc/kernel/syscalls/syscall.tbl
···
 464	common	getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr
+1
arch/x86/entry/syscalls/syscall_32.tbl
···
 464	i386	getxattrat		sys_getxattrat
 465	i386	listxattrat		sys_listxattrat
 466	i386	removexattrat		sys_removexattrat
+467	i386	open_tree_attr		sys_open_tree_attr
+1
arch/x86/entry/syscalls/syscall_64.tbl
···
 464	common	getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
+1
arch/xtensa/kernel/syscalls/syscall.tbl
···
 464	common	getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr
+171 -81
fs/namespace.c
···
 static struct rb_root mnt_ns_tree = RB_ROOT; /* protected by mnt_ns_tree_lock */
 static LIST_HEAD(mnt_ns_list); /* protected by mnt_ns_tree_lock */
 
+enum mount_kattr_flags_t {
+	MOUNT_KATTR_RECURSE		= (1 << 0),
+	MOUNT_KATTR_IDMAP_REPLACE	= (1 << 1),
+};
+
 struct mount_kattr {
 	unsigned int attr_set;
 	unsigned int attr_clr;
 	unsigned int propagation;
 	unsigned int lookup_flags;
-	bool recurse;
+	enum mount_kattr_flags_t kflags;
 	struct user_namespace *mnt_userns;
 	struct mnt_idmap *mnt_idmap;
 };
···
 	return file;
 }
 
-SYSCALL_DEFINE3(open_tree, int, dfd, const char __user *, filename, unsigned, flags)
+static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned int flags)
 {
-	struct file *file;
-	struct path path;
+	int ret;
+	struct path path __free(path_put) = {};
 	int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
 	bool detached = flags & OPEN_TREE_CLONE;
-	int error;
-	int fd;
 
 	BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
 
 	if (flags & ~(AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE |
 		      AT_SYMLINK_NOFOLLOW | OPEN_TREE_CLONE |
 		      OPEN_TREE_CLOEXEC))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	if ((flags & (AT_RECURSIVE | OPEN_TREE_CLONE)) == AT_RECURSIVE)
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	if (flags & AT_NO_AUTOMOUNT)
 		lookup_flags &= ~LOOKUP_AUTOMOUNT;
···
 		lookup_flags |= LOOKUP_EMPTY;
 
 	if (detached && !may_mount())
-		return -EPERM;
+		return ERR_PTR(-EPERM);
+
+	ret = user_path_at(dfd, filename, lookup_flags, &path);
+	if (unlikely(ret))
+		return ERR_PTR(ret);
+
+	if (detached)
+		return open_detached_copy(&path, flags & AT_RECURSIVE);
+
+	return dentry_open(&path, O_PATH, current_cred());
+}
+
+SYSCALL_DEFINE3(open_tree, int, dfd, const char __user *, filename, unsigned, flags)
+{
+	int fd;
+	struct file *file __free(fput) = NULL;
+
+	file = vfs_open_tree(dfd, filename, flags);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
 
 	fd = get_unused_fd_flags(flags & O_CLOEXEC);
 	if (fd < 0)
 		return fd;
 
-	error = user_path_at(dfd, filename, lookup_flags, &path);
-	if (unlikely(error)) {
-		file = ERR_PTR(error);
-	} else {
-		if (detached)
-			file = open_detached_copy(&path, flags & AT_RECURSIVE);
-		else
-			file = dentry_open(&path, O_PATH, current_cred());
-		path_put(&path);
-	}
-	if (IS_ERR(file)) {
-		put_unused_fd(fd);
-		return PTR_ERR(file);
-	}
-	fd_install(fd, file);
+	fd_install(fd, no_free_ptr(file));
 	return fd;
 }
 
···
 		return -EINVAL;
 
 	/*
-	 * Once a mount has been idmapped we don't allow it to change its
-	 * mapping. It makes things simpler and callers can just create
-	 * another bind-mount they can idmap if they want to.
+	 * We only allow an mount to change it's idmapping if it has
+	 * never been accessible to userspace.
 	 */
-	if (is_idmapped_mnt(m))
+	if (!(kattr->kflags & MOUNT_KATTR_IDMAP_REPLACE) && is_idmapped_mnt(m))
 		return -EPERM;
 
 	/* The underlying filesystem doesn't support idmapped mounts yet. */
···
 			break;
 		}
 
-		if (!kattr->recurse)
+		if (!(kattr->kflags & MOUNT_KATTR_RECURSE))
 			return 0;
 	}
 
···
 
 static void do_idmap_mount(const struct mount_kattr *kattr, struct mount *mnt)
 {
+	struct mnt_idmap *old_idmap;
+
 	if (!kattr->mnt_idmap)
 		return;
 
-	/*
-	 * Pairs with smp_load_acquire() in mnt_idmap().
-	 *
-	 * Since we only allow a mount to change the idmapping once and
-	 * verified this in can_idmap_mount() we know that the mount has
-	 * @nop_mnt_idmap attached to it. So there's no need to drop any
-	 * references.
-	 */
+	old_idmap = mnt_idmap(&mnt->mnt);
+
+	/* Pairs with smp_load_acquire() in mnt_idmap(). */
 	smp_store_release(&mnt->mnt.mnt_idmap, mnt_idmap_get(kattr->mnt_idmap));
+	mnt_idmap_put(old_idmap);
 }
 
 static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
···
 
 		if (kattr->propagation)
 			change_mnt_propagation(m, kattr->propagation);
-		if (!kattr->recurse)
+		if (!(kattr->kflags & MOUNT_KATTR_RECURSE))
 			break;
 	}
 	touch_mnt_namespace(mnt->mnt_ns);
···
 	 */
 	namespace_lock();
 	if (kattr->propagation == MS_SHARED) {
-		err = invent_group_ids(mnt, kattr->recurse);
+		err = invent_group_ids(mnt, kattr->kflags & MOUNT_KATTR_RECURSE);
 		if (err) {
 			namespace_unlock();
 			return err;
···
 }
 
 static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
-				struct mount_kattr *kattr, unsigned int flags)
+				struct mount_kattr *kattr)
 {
 	struct ns_common *ns;
 	struct user_namespace *mnt_userns;
···
 	if (!((attr->attr_set | attr->attr_clr) & MOUNT_ATTR_IDMAP))
 		return 0;
 
-	/*
-	 * We currently do not support clearing an idmapped mount. If this ever
-	 * is a use-case we can revisit this but for now let's keep it simple
-	 * and not allow it.
-	 */
-	if (attr->attr_clr & MOUNT_ATTR_IDMAP)
-		return -EINVAL;
+	if (attr->attr_clr & MOUNT_ATTR_IDMAP) {
+		/*
+		 * We can only remove an idmapping if it's never been
+		 * exposed to userspace.
+		 */
+		if (!(kattr->kflags & MOUNT_KATTR_IDMAP_REPLACE))
+			return -EINVAL;
+
+		/*
+		 * Removal of idmappings is equivalent to setting
+		 * nop_mnt_idmap.
+		 */
+		if (!(attr->attr_set & MOUNT_ATTR_IDMAP)) {
+			kattr->mnt_idmap = &nop_mnt_idmap;
+			return 0;
+		}
+	}
 
 	if (attr->userns_fd > INT_MAX)
 		return -EINVAL;
···
 }
 
 static int build_mount_kattr(const struct mount_attr *attr, size_t usize,
-			     struct mount_kattr *kattr, unsigned int flags)
+			     struct mount_kattr *kattr)
 {
-	unsigned int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
-
-	if (flags & AT_NO_AUTOMOUNT)
-		lookup_flags &= ~LOOKUP_AUTOMOUNT;
-	if (flags & AT_SYMLINK_NOFOLLOW)
-		lookup_flags &= ~LOOKUP_FOLLOW;
-	if (flags & AT_EMPTY_PATH)
-		lookup_flags |= LOOKUP_EMPTY;
-
-	*kattr = (struct mount_kattr) {
-		.lookup_flags	= lookup_flags,
-		.recurse	= !!(flags & AT_RECURSIVE),
-	};
-
 	if (attr->propagation & ~MOUNT_SETATTR_PROPAGATION_FLAGS)
 		return -EINVAL;
 	if (hweight32(attr->propagation & MOUNT_SETATTR_PROPAGATION_FLAGS) > 1)
···
 		return -EINVAL;
 	}
 
-	return build_mount_idmapped(attr, usize, kattr, flags);
+	return build_mount_idmapped(attr, usize, kattr);
 }
 
 static void finish_mount_kattr(struct mount_kattr *kattr)
 {
-	put_user_ns(kattr->mnt_userns);
-	kattr->mnt_userns = NULL;
+	if (kattr->mnt_userns) {
+		put_user_ns(kattr->mnt_userns);
+		kattr->mnt_userns = NULL;
+	}
 
 	if (kattr->mnt_idmap)
 		mnt_idmap_put(kattr->mnt_idmap);
 }
 
-SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path,
-		unsigned int, flags, struct mount_attr __user *, uattr,
-		size_t, usize)
+static int copy_mount_setattr(struct mount_attr __user *uattr, size_t usize,
+			      struct mount_kattr *kattr)
 {
-	int err;
-	struct path target;
+	int ret;
 	struct mount_attr attr;
-	struct mount_kattr kattr;
 
 	BUILD_BUG_ON(sizeof(struct mount_attr) != MOUNT_ATTR_SIZE_VER0);
-
-	if (flags & ~(AT_EMPTY_PATH |
-		      AT_RECURSIVE |
-		      AT_SYMLINK_NOFOLLOW |
-		      AT_NO_AUTOMOUNT))
-		return -EINVAL;
 
 	if (unlikely(usize > PAGE_SIZE))
 		return -E2BIG;
···
 	if (!may_mount())
 		return -EPERM;
 
-	err = copy_struct_from_user(&attr, sizeof(attr), uattr, usize);
-	if (err)
-		return err;
+	ret = copy_struct_from_user(&attr, sizeof(attr), uattr, usize);
+	if (ret)
+		return ret;
 
 	/* Don't bother walking through the mounts if this is a nop. */
 	if (attr.attr_set == 0 &&
···
 	    attr.propagation == 0)
 		return 0;
 
-	err = build_mount_kattr(&attr, usize, &kattr, flags);
+	return build_mount_kattr(&attr, usize, kattr);
+}
+
+SYSCALL_DEFINE5(mount_setattr, int, dfd, const char __user *, path,
+		unsigned int, flags, struct mount_attr __user *, uattr,
+		size_t, usize)
+{
+	int err;
+	struct path target;
+	struct mount_kattr kattr;
+	unsigned int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
+
+	if (flags & ~(AT_EMPTY_PATH |
+		      AT_RECURSIVE |
+		      AT_SYMLINK_NOFOLLOW |
+		      AT_NO_AUTOMOUNT))
+		return -EINVAL;
+
+	if (flags & AT_NO_AUTOMOUNT)
+		lookup_flags &= ~LOOKUP_AUTOMOUNT;
+	if (flags & AT_SYMLINK_NOFOLLOW)
+		lookup_flags &= ~LOOKUP_FOLLOW;
+	if (flags & AT_EMPTY_PATH)
+		lookup_flags |= LOOKUP_EMPTY;
+
+	kattr = (struct mount_kattr) {
+		.lookup_flags	= lookup_flags,
+	};
+
+	if (flags & AT_RECURSIVE)
+		kattr.kflags |= MOUNT_KATTR_RECURSE;
+
+	err = copy_mount_setattr(uattr, usize, &kattr);
 	if (err)
 		return err;
 
···
 	}
 	finish_mount_kattr(&kattr);
 	return err;
+}
+
+SYSCALL_DEFINE5(open_tree_attr, int, dfd, const char __user *, filename,
+		unsigned, flags, struct mount_attr __user *, uattr,
+		size_t, usize)
+{
+	struct file __free(fput) *file = NULL;
+	int fd;
+
+	if (!uattr && usize)
+		return -EINVAL;
+
+	file = vfs_open_tree(dfd, filename, flags);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+
+	if (uattr) {
+		int ret;
+		struct mount_kattr kattr = {};
+
+		kattr.kflags = MOUNT_KATTR_IDMAP_REPLACE;
+		if (flags & AT_RECURSIVE)
+			kattr.kflags |= MOUNT_KATTR_RECURSE;
+
+		ret = copy_mount_setattr(uattr, usize, &kattr);
+		if (ret)
+			return ret;
+
+		ret = do_mount_setattr(&file->f_path, &kattr);
+		if (ret)
+			return ret;
+
+		finish_mount_kattr(&kattr);
+	}
+
+	fd = get_unused_fd_flags(flags & O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	fd_install(fd, no_free_ptr(file));
+	return fd;
 }
 
 int show_path(struct seq_file *m, struct dentry *root)
···
 	return 0;
 }
 
+/* This must be updated whenever a new flag is added */
+#define STATMOUNT_SUPPORTED (STATMOUNT_SB_BASIC | \
+			     STATMOUNT_MNT_BASIC | \
+			     STATMOUNT_PROPAGATE_FROM | \
+			     STATMOUNT_MNT_ROOT | \
+			     STATMOUNT_MNT_POINT | \
+			     STATMOUNT_FS_TYPE | \
+			     STATMOUNT_MNT_NS_ID | \
+			     STATMOUNT_MNT_OPTS | \
+			     STATMOUNT_FS_SUBTYPE | \
+			     STATMOUNT_SB_SOURCE | \
+			     STATMOUNT_OPT_ARRAY | \
+			     STATMOUNT_OPT_SEC_ARRAY | \
+			     STATMOUNT_SUPPORTED_MASK)
+
 static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id,
 			struct mnt_namespace *ns)
 {
···
 	if (!err && s->mask & STATMOUNT_MNT_NS_ID)
 		statmount_mnt_ns_id(s, ns);
 
+	if (!err && s->mask & STATMOUNT_SUPPORTED_MASK) {
+		s->sm.mask |= STATMOUNT_SUPPORTED_MASK;
+		s->sm.supported_mask = STATMOUNT_SUPPORTED;
+	}
+
 	if (err)
 		return err;
+
+	/* Are there bits in the return mask not present in STATMOUNT_SUPPORTED? */
+	WARN_ON_ONCE(~STATMOUNT_SUPPORTED & s->sm.mask);
 
 	return 0;
 }
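To summarise how the new flags interact, here is a simplified userspace model of the idmapping checks in build_mount_idmapped() and can_idmap_mount() from the diff above. It mirrors the kernel-internal enum for illustration only; `idmap_request_check()` is hypothetical, and the model omits the other checks these functions perform (for example, filesystem support for idmapped mounts and user namespace validation). Per the diff, mount_setattr() sets at most MOUNT_KATTR_RECURSE, while open_tree_attr() always sets MOUNT_KATTR_IDMAP_REPLACE.

```c
#include <errno.h>
#include <stdbool.h>

/* Mirror of the kernel-internal flags introduced in the diff (illustration only). */
enum mount_kattr_flags_t {
	MOUNT_KATTR_RECURSE       = (1 << 0),
	MOUNT_KATTR_IDMAP_REPLACE = (1 << 1),
};

#ifndef MOUNT_ATTR_IDMAP
#define MOUNT_ATTR_IDMAP 0x00100000 /* value from <linux/mount.h> */
#endif

/*
 * Simplified model of the idmapping checks. Returns 0 if the request would
 * pass them, otherwise the negative errno the kernel code returns.
 * *clears_to_nop is set when the request only drops the idmapping.
 */
static int idmap_request_check(unsigned int kflags, unsigned int attr_set,
			       unsigned int attr_clr, bool mnt_is_idmapped,
			       bool *clears_to_nop)
{
	*clears_to_nop = false;

	/* The request does not touch the idmapping at all. */
	if (!((attr_set | attr_clr) & MOUNT_ATTR_IDMAP))
		return 0;

	if (attr_clr & MOUNT_ATTR_IDMAP) {
		/* Clearing is only allowed for never-exposed mounts. */
		if (!(kflags & MOUNT_KATTR_IDMAP_REPLACE))
			return -EINVAL;
		/* Clear without set: equivalent to installing nop_mnt_idmap. */
		if (!(attr_set & MOUNT_ATTR_IDMAP)) {
			*clears_to_nop = true;
			return 0;
		}
	}

	/* An already idmapped mount may only be re-idmapped on replace. */
	if (!(kflags & MOUNT_KATTR_IDMAP_REPLACE) && mnt_is_idmapped)
		return -EPERM;

	return 0;
}
```

In words: through mount_setattr() an idmapped mount can neither be re-idmapped (-EPERM) nor have its idmapping cleared (-EINVAL), whereas open_tree_attr() can do both because the mount it operates on has not been exposed to userspace yet.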
+4
include/linux/syscalls.h
···
 asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
 			 int flags, uint32_t sig);
 asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags);
+asmlinkage long sys_open_tree_attr(int dfd, const char __user *path,
+				   unsigned flags,
+				   struct mount_attr __user *uattr,
+				   size_t usize);
 asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path,
 			       int to_dfd, const char __user *to_path,
 			       unsigned int ms_flags);
+3 -1
include/uapi/asm-generic/unistd.h
···
 __SYSCALL(__NR_listxattrat, sys_listxattrat)
 #define __NR_removexattrat 466
 __SYSCALL(__NR_removexattrat, sys_removexattrat)
+#define __NR_open_tree_attr 467
+__SYSCALL(__NR_open_tree_attr, sys_open_tree_attr)
 
 #undef __NR_syscalls
-#define __NR_syscalls 467
+#define __NR_syscalls 468
 
 /*
  * 32 bit systems traditionally used different
+5 -3
include/uapi/linux/mount.h
···
 	__u32 opt_array;	/* [str] Array of nul terminated fs options */
 	__u32 opt_sec_num;	/* Number of security options */
 	__u32 opt_sec_array;	/* [str] Array of nul terminated security options */
+	__u64 supported_mask;	/* Mask flags that this kernel supports */
 	__u32 mnt_uidmap_num;	/* Number of uid mappings */
 	__u32 mnt_uidmap;	/* [str] Array of uid mappings (as seen from callers namespace) */
 	__u32 mnt_gidmap_num;	/* Number of gid mappings */
 	__u32 mnt_gidmap;	/* [str] Array of gid mappings (as seen from callers namespace) */
-	__u64 __spare2[44];
+	__u64 __spare2[43];
 	char str[];		/* Variable size part containing strings */
 };
 
···
 #define STATMOUNT_SB_SOURCE		0x00000200U	/* Want/got sb_source */
 #define STATMOUNT_OPT_ARRAY		0x00000400U	/* Want/got opt_... */
 #define STATMOUNT_OPT_SEC_ARRAY		0x00000800U	/* Want/got opt_sec... */
-#define STATMOUNT_MNT_UIDMAP		0x00001000U	/* Want/got uidmap... */
-#define STATMOUNT_MNT_GIDMAP		0x00002000U	/* Want/got gidmap... */
+#define STATMOUNT_SUPPORTED_MASK	0x00001000U	/* Want/got supported mask flags */
+#define STATMOUNT_MNT_UIDMAP		0x00002000U	/* Want/got uidmap... */
+#define STATMOUNT_MNT_GIDMAP		0x00004000U	/* Want/got gidmap... */
 
 /*
  * Special @mnt_id values that can be passed to listmount
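With the new supported_mask field, userspace can ask which statmount() request flags the running kernel understands. A minimal, hypothetical C helper that interprets the returned fields might look like this; it assumes the caller requested STATMOUNT_SUPPORTED_MASK, and it uses the value from the header above, guarded in case the system headers already define it. `statmount_supports()` is not part of any library.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef STATMOUNT_SUPPORTED_MASK
#define STATMOUNT_SUPPORTED_MASK 0x00001000U /* value from the updated uapi header */
#endif

/*
 * Hypothetical helper: given the mask and supported_mask fields returned
 * by statmount(), report whether every bit in @want is supported.
 * Returns false when the kernel did not fill in supported_mask at all,
 * since support is then unknown.
 */
static bool statmount_supports(uint64_t returned_mask, uint64_t supported_mask,
			       uint64_t want)
{
	if (!(returned_mask & STATMOUNT_SUPPORTED_MASK))
		return false;
	return (supported_mask & want) == want;
}
```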
+1
scripts/syscall.tbl
···
 464	common	getxattrat		sys_getxattrat
 465	common	listxattrat		sys_listxattrat
 466	common	removexattrat		sys_removexattrat
+467	common	open_tree_attr		sys_open_tree_attr