Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull LSM updates from Paul Moore:
"Seven patches for the LSM layer and we've got a mix of trivial and
significant patches. Highlights below, starting with the smaller bits
first so they don't get lost in the discussion of the larger items:

- Remove some redundant NULL pointer checks in the common LSM audit
code.

- Ratelimit the lockdown LSM's access denial messages.

With this change there is a chance that the last visible lockdown
message on the console is outdated/old, but it does help preserve
the initial series of lockdown denials that started the denial
message flood and my gut feeling is that these might be the more
valuable messages.

- Open userfaultfds as readonly instead of read/write.

While this code obviously lives outside the LSM, it does have a
noticeable impact on the LSMs with Ondrej explaining the situation
in the commit description. It is worth noting that this patch
languished on the VFS list for over a year without any comments
(objections or otherwise) so I took the liberty of pulling it into
the LSM tree after giving fair notice. It has been in linux-next
since the end of August without any noticeable problems.

- Add a LSM hook for user namespace creation, with implementations
for both the BPF LSM and SELinux.

Even though the changes are fairly small, this is the bulk of the
diffstat as we are also including BPF LSM selftests for the new
hook.

It's also the most contentious of the changes in this pull request
with Eric Biederman NACK'ing the LSM hook multiple times during its
development and discussion upstream. While I've never taken NACK's
lightly, I'm sending these patches to you because it is my belief
that they are of good quality, satisfy a long-standing need of
users and distros, and are in keeping with the existing nature of
the LSM layer and the Linux Kernel as a whole.

The patches implement a LSM hook for user namespace creation
that allows for a granular approach, configurable at runtime, which
enables both monitoring and control of user namespaces. The general
consensus has been that this is far preferable to the other
solutions that have been adopted downstream including outright
removal from the kernel, disabling via system wide sysctls, or
various other out-of-tree mechanisms that users have been forced to
adopt since we haven't been able to provide them an upstream
solution for their requests. Eric has been steadfast in his
objections to this LSM hook, explaining that any restrictions on
the user namespace could have significant impact on userspace.
While there is the possibility of impacting userspace, it is
important to note that this solution only impacts userspace when it
is requested based on the runtime configuration supplied by the
distro/admin/user. Frederick (the patchset author), the LSM/security
community, and myself have tried to work with Eric during
development of this patchset to find a mutually acceptable
solution, but Eric's approach and unwillingness to engage in a
meaningful way have made this impossible. I have CC'd Eric directly
on this pull request so he has a chance to provide his side of the
story; there have been no objections outside of Eric's."

* tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lockdown: ratelimit denial messages
userfaultfd: open userfaultfds with O_RDONLY
selinux: Implement userns_create hook
selftests/bpf: Add tests verifying bpf lsm userns_create hook
bpf-lsm: Make bpf_lsm_userns_create() sleepable
security, lsm: Introduce security_create_user_ns()
lsm: clean up redundant NULL pointer check

+172 -16
+2 -2
fs/userfaultfd.c
···
 	int fd;
 
 	fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, new,
-			O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS), inode);
+			O_RDONLY | (new->flags & UFFD_SHARED_FCNTL_FLAGS), inode);
 	if (fd < 0)
 		return fd;
 
···
 	mmgrab(ctx->mm);
 
 	fd = anon_inode_getfd_secure("[userfaultfd]", &userfaultfd_fops, ctx,
-			O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
+			O_RDONLY | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
 	if (fd < 0) {
 		mmdrop(ctx->mm);
 		kmem_cache_free(userfaultfd_ctx_cachep, ctx);
+1
include/linux/lsm_hook_defs.h
···
 	 unsigned long arg3, unsigned long arg4, unsigned long arg5)
 LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
 	 struct inode *inode)
+LSM_HOOK(int, 0, userns_create, const struct cred *cred)
 LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
 LSM_HOOK(void, LSM_RET_VOID, ipc_getsecid, struct kern_ipc_perm *ipcp,
 	 u32 *secid)
+4
include/linux/lsm_hooks.h
···
  *	security attributes, e.g. for /proc/pid inodes.
  *	@p contains the task_struct for the task.
  *	@inode contains the inode structure for the inode.
+ * @userns_create:
+ *	Check permission prior to creating a new user namespace.
+ *	@cred points to prepared creds.
+ *	Return 0 if successful, otherwise < 0 error code.
  *
  * Security hooks for Netlink messaging.
  *
+6
include/linux/security.h
···
 int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			unsigned long arg4, unsigned long arg5);
 void security_task_to_inode(struct task_struct *p, struct inode *inode);
+int security_create_user_ns(const struct cred *cred);
 int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
 void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid);
 int security_msg_msg_alloc(struct msg_msg *msg);
···
 
 static inline void security_task_to_inode(struct task_struct *p, struct inode *inode)
 { }
+
+static inline int security_create_user_ns(const struct cred *cred)
+{
+	return 0;
+}
 
 static inline int security_ipc_permission(struct kern_ipc_perm *ipcp,
 					  short flag)
+1
kernel/bpf/bpf_lsm.c
···
 BTF_ID(func, bpf_lsm_task_prctl)
 BTF_ID(func, bpf_lsm_task_setscheduler)
 BTF_ID(func, bpf_lsm_task_to_inode)
+BTF_ID(func, bpf_lsm_userns_create)
 BTF_SET_END(sleepable_lsm_hooks)
 
 bool bpf_lsm_is_sleepable_hook(u32 btf_id)
+5
kernel/user_namespace.c
···
 #include <linux/highuid.h>
 #include <linux/cred.h>
 #include <linux/securebits.h>
+#include <linux/security.h>
 #include <linux/keyctl.h>
 #include <linux/key-type.h>
 #include <keys/user-type.h>
···
 	ret = -EPERM;
 	if (!kuid_has_mapping(parent_ns, owner) ||
 	    !kgid_has_mapping(parent_ns, group))
+		goto fail_dec;
+
+	ret = security_create_user_ns(new);
+	if (ret < 0)
 		goto fail_dec;
 
 	ret = -ENOMEM;
+1 -1
security/lockdown/lockdown.c
···
 
 	if (kernel_locked_down >= what) {
 		if (lockdown_reasons[what])
-			pr_notice("Lockdown: %s: %s is restricted; see man kernel_lockdown.7\n",
+			pr_notice_ratelimited("Lockdown: %s: %s is restricted; see man kernel_lockdown.7\n",
 				  current->comm, lockdown_reasons[what]);
 		return -EPERM;
 	}
+1 -13
security/lsm_audit.c
···
 	struct iphdr *ih;
 
 	ih = ip_hdr(skb);
-	if (ih == NULL)
-		return -EINVAL;
-
 	ad->u.net->v4info.saddr = ih->saddr;
 	ad->u.net->v4info.daddr = ih->daddr;
 
···
 	switch (ih->protocol) {
 	case IPPROTO_TCP: {
 		struct tcphdr *th = tcp_hdr(skb);
-		if (th == NULL)
-			break;
 
 		ad->u.net->sport = th->source;
 		ad->u.net->dport = th->dest;
···
 	}
 	case IPPROTO_UDP: {
 		struct udphdr *uh = udp_hdr(skb);
-		if (uh == NULL)
-			break;
 
 		ad->u.net->sport = uh->source;
 		ad->u.net->dport = uh->dest;
···
 	}
 	case IPPROTO_DCCP: {
 		struct dccp_hdr *dh = dccp_hdr(skb);
-		if (dh == NULL)
-			break;
 
 		ad->u.net->sport = dh->dccph_sport;
 		ad->u.net->dport = dh->dccph_dport;
···
 	}
 	case IPPROTO_SCTP: {
 		struct sctphdr *sh = sctp_hdr(skb);
-		if (sh == NULL)
-			break;
+
 		ad->u.net->sport = sh->source;
 		ad->u.net->dport = sh->dest;
 		break;
···
 	__be16 frag_off;
 
 	ip6 = ipv6_hdr(skb);
-	if (ip6 == NULL)
-		return -EINVAL;
 	ad->u.net->v6info.saddr = ip6->saddr;
 	ad->u.net->v6info.daddr = ip6->daddr;
 	/* IPv6 can have several extension header before the Transport header
+5
security/security.c
···
 	call_void_hook(task_to_inode, p, inode);
 }
 
+int security_create_user_ns(const struct cred *cred)
+{
+	return call_int_hook(userns_create, 0, cred);
+}
+
 int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag)
 {
 	return call_int_hook(ipc_permission, 0, ipcp, flag);
+9
security/selinux/hooks.c
···
 	spin_unlock(&isec->lock);
 }
 
+static int selinux_userns_create(const struct cred *cred)
+{
+	u32 sid = current_sid();
+
+	return avc_has_perm(&selinux_state, sid, sid, SECCLASS_USER_NAMESPACE,
+						USER_NAMESPACE__CREATE, NULL);
+}
+
 /* Returns error only if unable to parse addresses */
 static int selinux_parse_skb_ipv4(struct sk_buff *skb,
 			struct common_audit_data *ad, u8 *proto)
···
 	LSM_HOOK_INIT(task_movememory, selinux_task_movememory),
 	LSM_HOOK_INIT(task_kill, selinux_task_kill),
 	LSM_HOOK_INIT(task_to_inode, selinux_task_to_inode),
+	LSM_HOOK_INIT(userns_create, selinux_userns_create),
 
 	LSM_HOOK_INIT(ipc_permission, selinux_ipc_permission),
 	LSM_HOOK_INIT(ipc_getsecid, selinux_ipc_getsecid),
+2
security/selinux/include/classmap.h
···
 	  { COMMON_FILE_PERMS, NULL } },
 	{ "io_uring",
 	  { "override_creds", "sqpoll", "cmd", NULL } },
+	{ "user_namespace",
+	  { "create", NULL } },
 	{ NULL }
   };
 
+102
tools/testing/selftests/bpf/prog_tests/deny_namespace.c
···
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <test_progs.h>
+#include "test_deny_namespace.skel.h"
+#include <sched.h>
+#include "cap_helpers.h"
+#include <stdio.h>
+
+static int wait_for_pid(pid_t pid)
+{
+	int status, ret;
+
+again:
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		if (errno == EINTR)
+			goto again;
+
+		return -1;
+	}
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
+
+/* negative return value -> some internal error
+ * positive return value -> userns creation failed
+ * 0                     -> userns creation succeeded
+ */
+static int create_user_ns(void)
+{
+	pid_t pid;
+
+	pid = fork();
+	if (pid < 0)
+		return -1;
+
+	if (pid == 0) {
+		if (unshare(CLONE_NEWUSER))
+			_exit(EXIT_FAILURE);
+		_exit(EXIT_SUCCESS);
+	}
+
+	return wait_for_pid(pid);
+}
+
+static void test_userns_create_bpf(void)
+{
+	__u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
+	__u64 old_caps = 0;
+
+	cap_enable_effective(cap_mask, &old_caps);
+
+	ASSERT_OK(create_user_ns(), "priv new user ns");
+
+	cap_disable_effective(cap_mask, &old_caps);
+
+	ASSERT_EQ(create_user_ns(), EPERM, "unpriv new user ns");
+
+	if (cap_mask & old_caps)
+		cap_enable_effective(cap_mask, NULL);
+}
+
+static void test_unpriv_userns_create_no_bpf(void)
+{
+	__u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
+	__u64 old_caps = 0;
+
+	cap_disable_effective(cap_mask, &old_caps);
+
+	ASSERT_OK(create_user_ns(), "no-bpf unpriv new user ns");
+
+	if (cap_mask & old_caps)
+		cap_enable_effective(cap_mask, NULL);
+}
+
+void test_deny_namespace(void)
+{
+	struct test_deny_namespace *skel = NULL;
+	int err;
+
+	if (test__start_subtest("unpriv_userns_create_no_bpf"))
+		test_unpriv_userns_create_no_bpf();
+
+	skel = test_deny_namespace__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel load"))
+		goto close_prog;
+
+	err = test_deny_namespace__attach(skel);
+	if (!ASSERT_OK(err, "attach"))
+		goto close_prog;
+
+	if (test__start_subtest("userns_create_bpf"))
+		test_userns_create_bpf();
+
+	test_deny_namespace__detach(skel);
+
+close_prog:
+	test_deny_namespace__destroy(skel);
+}
+33
tools/testing/selftests/bpf/progs/test_deny_namespace.c
···
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <errno.h>
+#include <linux/capability.h>
+
+struct kernel_cap_struct {
+	__u32 cap[_LINUX_CAPABILITY_U32S_3];
+} __attribute__((preserve_access_index));
+
+struct cred {
+	struct kernel_cap_struct cap_effective;
+} __attribute__((preserve_access_index));
+
+char _license[] SEC("license") = "GPL";
+
+SEC("lsm.s/userns_create")
+int BPF_PROG(test_userns_create, const struct cred *cred, int ret)
+{
+	struct kernel_cap_struct caps = cred->cap_effective;
+	int cap_index = CAP_TO_INDEX(CAP_SYS_ADMIN);
+	__u32 cap_mask = CAP_TO_MASK(CAP_SYS_ADMIN);
+
+	if (ret)
+		return 0;
+
+	ret = -EPERM;
+	if (caps.cap[cap_index] & cap_mask)
+		return 0;
+
+	return -EPERM;
+}