Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

seccomp: add a return code to trap to userspace

This patch introduces a means for syscalls matched in seccomp to notify
some other task that a particular filter has been triggered.

The primary motivation for this is use with containers. For example,
if a container does an init_module(), we obviously don't want to load this
untrusted code, which may be compiled for the wrong version of the kernel
anyway. Instead, we could parse the module image, figure out which module
the container is trying to load and load it on the host.

As another example, containers cannot mount() in general since various
filesystems assume a trusted image. However, if an orchestrator knows that
e.g. a particular block device has not been exposed to a container for
writing, it may want to allow the container to mount that block device (that
is, handle the mount for it).

This patch adds functionality that is already possible via at least two
other means that I know about, both of which involve ptrace(): first, one
could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL.
Unfortunately this is slow, so a faster approach is to install a
filter that returns SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP.
Since ptrace allows only one tracer, if the container runtime is that
tracer, users inside the container (or outside) trying to debug it will not
be able to use ptrace, which is annoying. It also means that older
distributions based on Upstart cannot boot inside containers using ptrace,
since Upstart itself uses ptrace to monitor services while starting.

The actual implementation of this is fairly small, although getting the
synchronization right was/is slightly complex.

Finally, it's worth noting that the classic seccomp time-of-check/time-of-use
(TOCTOU) problem of reading memory from the task still applies here, but it
can be avoided with careful design of the userspace handler: if the handler
copies all of the task memory it needs before applying its security policy,
the tracee's subsequent memory edits cannot affect the decision.
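A minimal sketch of that "copy first, then decide" pattern, assuming a 64-bit system and a handler privileged enough to open `/proc/<pid>/mem` of the trapped task. `copy_tracee_string()` is a hypothetical helper, not part of this patch: it takes the address of a NUL-terminated string argument (for example, a pointer taken from the syscall's register arguments) and snapshots it into a local buffer with one `pread()`, so the policy check only ever looks at the copy.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the NUL-terminated string at `addr` in task
 * `pid` into `out`. Policy decisions should then be made on `out` only,
 * never on the tracee's live memory, which it may rewrite at any time.
 * Returns 0 on success, or -1 with errno set (ENAMETOOLONG if no NUL byte
 * was found in the first out_len - 1 bytes).
 */
int copy_tracee_string(pid_t pid, uint64_t addr, char *out, size_t out_len)
{
	char path[64];
	ssize_t n;
	int fd, saved_errno;

	if (out_len == 0) {
		errno = EINVAL;
		return -1;
	}

	snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* A single snapshot; later writes by the tracee do not affect `out`. */
	n = pread(fd, out, out_len - 1, (off_t)addr);
	saved_errno = errno;
	close(fd);
	if (n < 0) {
		errno = saved_errno;
		return -1;
	}

	out[n] = '\0';
	if (strnlen(out, (size_t)n) == (size_t)n) {
		/* No terminator inside the bytes we read. */
		errno = ENAMETOOLONG;
		return -1;
	}
	return 0;
}
```

In a handler built on this patch, the copy would be taken after SECCOMP_IOCTL_NOTIF_RECV; checking the notification id again with SECCOMP_IOCTL_NOTIF_ID_VALID after the read is one way to confirm that the pid still referred to the trapped task while its memory was being read.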

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
CC: Kees Cook <keescook@chromium.org>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: "Serge E. Hallyn" <serge@hallyn.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
CC: Christian Brauner <christian@brauner.io>
CC: Tyler Hicks <tyhicks@canonical.com>
CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>

Authored by Tycho Andersen and committed by Kees Cook.
6a21cc50 a5662e4d

+1017 -10 (all files)
+1
Documentation/ioctl/ioctl-number.txt
···
 0x1b  all    InfiniBand Subsystem    <http://infiniband.sourceforge.net/>
 0x20  all    drivers/cdrom/cm206.h
 0x22  all    scsi/sg.h
+'!'   00-1F  uapi/linux/seccomp.h
 '#'   00-3F  IEEE 1394 Subsystem     Block for the entire subsystem
 '$'   00-0F  linux/perf_counter.h, linux/perf_event.h
 '%'   00-0F  include/uapi/linux/stm.h
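The new `'!'` line reserves ioctl numbers 0x00-0x1F under that magic character for `uapi/linux/seccomp.h`. A small sketch showing how the request codes defined later in this patch carry that magic, assuming a `<linux/seccomp.h>` from Linux 5.0 or later; `decode_ioctl()` and `struct ioctl_fields` are hypothetical names, while the `_IOC_*` accessors are the standard ones from `<linux/ioctl.h>`:

```c
#include <linux/ioctl.h>
#include <linux/seccomp.h>
#include <sys/ioctl.h>

/* Hypothetical container for the fields that _IOC() packs into a request. */
struct ioctl_fields {
	unsigned int dir;   /* combination of _IOC_NONE, _IOC_READ, _IOC_WRITE */
	unsigned int type;  /* magic character registered in ioctl-number.txt */
	unsigned int nr;    /* command number within that magic's range */
	unsigned int size;  /* size of the ioctl argument in bytes */
};

/* Split an ioctl request code into its encoded fields. */
struct ioctl_fields decode_ioctl(unsigned long cmd)
{
	struct ioctl_fields f = {
		.dir = _IOC_DIR(cmd),
		.type = _IOC_TYPE(cmd),
		.nr = _IOC_NR(cmd),
		.size = _IOC_SIZE(cmd),
	};
	return f;
}
```

For SECCOMP_IOCTL_NOTIF_RECV this yields type '!', number 0, direction read|write, and a size equal to sizeof(struct seccomp_notif).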
+84
Documentation/userspace-api/seccomp_filter.rst
···
 	Results in the lower 16-bits of the return value being passed
 	to userland as the errno without executing the system call.

+``SECCOMP_RET_USER_NOTIF``:
+	Results in a ``struct seccomp_notif`` message sent on the userspace
+	notification fd, if it is attached, or ``-ENOSYS`` if it is not. See below
+	on discussion of how to handle user notifications.
+
 ``SECCOMP_RET_TRACE``:
 	When returned, this value will cause the kernel to attempt to
 	notify a ``ptrace()``-based tracer prior to executing the system
···
 The ``samples/seccomp/`` directory contains both an x86-specific example
 and a more generic example of a higher level macro interface for BPF
 program generation.
+
+Userspace Notification
+======================
+
+The ``SECCOMP_RET_USER_NOTIF`` return code lets seccomp filters pass a
+particular syscall to userspace to be handled. This may be useful for
+applications like container managers, which wish to intercept particular
+syscalls (``mount()``, ``finit_module()``, etc.) and change their behavior.
+
+To acquire a notification FD, use the ``SECCOMP_FILTER_FLAG_NEW_LISTENER``
+argument to the ``seccomp()`` syscall:
+
+.. code-block:: c
+
+	fd = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
+
+which (on success) will return a listener fd for the filter, which can then be
+passed around via ``SCM_RIGHTS`` or similar. Note that filter fds correspond to
+a particular filter, and not a particular task. So if this task then forks,
+notifications from both tasks will appear on the same filter fd. Reads and
+writes to/from a filter fd are also synchronized, so a filter fd can safely
+have many readers.
+
+The interface for a seccomp notification fd consists of two structures:
+
+.. code-block:: c
+
+	struct seccomp_notif_sizes {
+		__u16 seccomp_notif;
+		__u16 seccomp_notif_resp;
+		__u16 seccomp_data;
+	};
+
+	struct seccomp_notif {
+		__u64 id;
+		__u32 pid;
+		__u32 flags;
+		struct seccomp_data data;
+	};
+
+	struct seccomp_notif_resp {
+		__u64 id;
+		__s64 val;
+		__s32 error;
+		__u32 flags;
+	};
+
+The ``struct seccomp_notif_sizes`` structure can be used to determine the size
+of the various structures used in seccomp notifications. The size of ``struct
+seccomp_data`` may change in the future, so code should use:
+
+.. code-block:: c
+
+	struct seccomp_notif_sizes sizes;
+	seccomp(SECCOMP_GET_NOTIF_SIZES, 0, &sizes);
+
+to determine the size of the various structures to allocate. See
+samples/seccomp/user-trap.c for an example.
+
+Users can read via ``ioctl(SECCOMP_IOCTL_NOTIF_RECV)``  (or ``poll()``) on a
+seccomp notification fd to receive a ``struct seccomp_notif``, which contains
+five members: the input length of the structure, a unique-per-filter ``id``,
+the ``pid`` of the task which triggered this request (which may be 0 if the
+task is in a pid ns not visible from the listener's pid namespace), a ``flags``
+member which for now only has ``SECCOMP_NOTIF_FLAG_SIGNALED``, representing
+whether or not the notification is a result of a non-fatal signal, and the
+``data`` passed to seccomp. Userspace can then make a decision based on this
+information about what to do, and ``ioctl(SECCOMP_IOCTL_NOTIF_SEND)`` a
+response, indicating what should be returned to userspace. The ``id`` member of
+``struct seccomp_notif_resp`` should be the same ``id`` as in ``struct
+seccomp_notif``.
+
+It is worth noting that ``struct seccomp_data`` contains the values of register
+arguments to the syscall, but does not contain pointers to memory. The task's
+memory is accessible to suitably privileged traces via ``ptrace()`` or
+``/proc/pid/mem``. However, care should be taken to avoid the TOCTOU mentioned
+above in this document: all arguments being read from the tracee's memory
+should be read into the tracer's memory before any policy decisions are made.
+This allows for an atomic decision on syscall arguments.

 Sysctls
 =======
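A minimal sketch of the receive/decide/reply cycle described in this documentation, assuming a `<linux/seccomp.h>` from Linux 5.0 or later and a listener fd obtained with `SECCOMP_FILTER_FLAG_NEW_LISTENER`. `handle_one_notification()` and the example policy `deny_with_eperm()` are hypothetical names. The policy sets `error` to a negative errno because the kernel code in this patch hands `error` and `val` to `syscall_set_return_value()`, which uses `error` as the syscall's result when it is non-zero and `val` otherwise.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/*
 * Hypothetical example policy: refuse every trapped syscall with EPERM.
 * A real supervisor would inspect req->data.nr and req->data.args[] here.
 */
void deny_with_eperm(const struct seccomp_notif *req,
		     struct seccomp_notif_resp *resp)
{
	memset(resp, 0, sizeof(*resp));	/* flags must be 0 */
	resp->id = req->id;		/* echo the request's cookie */
	resp->error = -EPERM;		/* negative errno, as the kernel expects */
	resp->val = 0;
}

/*
 * Receive one notification from `listener`, apply `policy`, and send the
 * reply. Returns 0 on success, or -1 with errno set; ENOENT from the SEND
 * means the trapped task went away (e.g. it was killed) before we replied.
 */
int handle_one_notification(int listener,
			    void (*policy)(const struct seccomp_notif *,
					   struct seccomp_notif_resp *))
{
	struct seccomp_notif req;
	struct seccomp_notif_resp resp;

	/* Newer kernels reject a non-zeroed request buffer with EINVAL. */
	memset(&req, 0, sizeof(req));
	if (ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req) < 0)
		return -1;

	policy(&req, &resp);

	if (ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp) < 0)
		return -1;
	return 0;
}
```

A supervisor would typically call this in a loop, or after poll() reports POLLIN on the listener, as the selftests below do.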
+4 -3
include/linux/seccomp.h
···

 #include <uapi/linux/seccomp.h>

-#define SECCOMP_FILTER_FLAG_MASK	(SECCOMP_FILTER_FLAG_TSYNC | \
-					 SECCOMP_FILTER_FLAG_LOG | \
-					 SECCOMP_FILTER_FLAG_SPEC_ALLOW)
+#define SECCOMP_FILTER_FLAG_MASK	(SECCOMP_FILTER_FLAG_TSYNC | \
+					 SECCOMP_FILTER_FLAG_LOG | \
+					 SECCOMP_FILTER_FLAG_SPEC_ALLOW | \
+					 SECCOMP_FILTER_FLAG_NEW_LISTENER)

 #ifdef CONFIG_SECCOMP

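Because unknown bits are rejected against SECCOMP_FILTER_FLAG_MASK before the filter program is copied from userspace (see seccomp_set_mode_filter() in kernel/seccomp.c), userspace can probe for SECCOMP_FILTER_FLAG_NEW_LISTENER support without installing anything: a known flag with a NULL program fails with EFAULT, an unknown one with EINVAL. The helper below is a hypothetical sketch of that probe, assuming `SYS_seccomp` is available from `<sys/syscall.h>`; it goes through syscall(2) because older C libraries have no seccomp() wrapper.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/seccomp.h>

/*
 * Hypothetical probe: does the running kernel accept `flag` for
 * SECCOMP_SET_MODE_FILTER? Passing a NULL program means no filter can be
 * installed: unknown flags fail the mask check with EINVAL, known flags get
 * as far as copying the program and fail with EFAULT.
 * Returns 1 if supported, 0 if not, -1 if the probe was inconclusive
 * (for example ENOSYS when seccomp() itself is unavailable).
 */
int seccomp_filter_flag_supported(unsigned long flag)
{
	long ret = syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, flag, NULL);

	if (ret != -1)
		return -1;	/* unexpected: a NULL program cannot succeed */
	if (errno == EFAULT)
		return 1;
	if (errno == EINVAL)
		return 0;
	return -1;
}
```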
+37 -3
include/uapi/linux/seccomp.h
···
 #define SECCOMP_SET_MODE_STRICT		0
 #define SECCOMP_SET_MODE_FILTER		1
 #define SECCOMP_GET_ACTION_AVAIL	2
+#define SECCOMP_GET_NOTIF_SIZES		3

 /* Valid flags for SECCOMP_SET_MODE_FILTER */
-#define SECCOMP_FILTER_FLAG_TSYNC	(1UL << 0)
-#define SECCOMP_FILTER_FLAG_LOG		(1UL << 1)
-#define SECCOMP_FILTER_FLAG_SPEC_ALLOW	(1UL << 2)
+#define SECCOMP_FILTER_FLAG_TSYNC		(1UL << 0)
+#define SECCOMP_FILTER_FLAG_LOG			(1UL << 1)
+#define SECCOMP_FILTER_FLAG_SPEC_ALLOW		(1UL << 2)
+#define SECCOMP_FILTER_FLAG_NEW_LISTENER	(1UL << 3)

 /*
  * All BPF programs must return a 32-bit value.
···
 #define SECCOMP_RET_KILL	 SECCOMP_RET_KILL_THREAD
 #define SECCOMP_RET_TRAP	 0x00030000U /* disallow and force a SIGSYS */
 #define SECCOMP_RET_ERRNO	 0x00050000U /* returns an errno */
+#define SECCOMP_RET_USER_NOTIF	 0x7fc00000U /* notifies userspace */
 #define SECCOMP_RET_TRACE	 0x7ff00000U /* pass to a tracer or disallow */
 #define SECCOMP_RET_LOG		 0x7ffc0000U /* allow after logging */
 #define SECCOMP_RET_ALLOW	 0x7fff0000U /* allow */
···
 	__u64 args[6];
 };

+struct seccomp_notif_sizes {
+	__u16 seccomp_notif;
+	__u16 seccomp_notif_resp;
+	__u16 seccomp_data;
+};
+
+struct seccomp_notif {
+	__u64 id;
+	__u32 pid;
+	__u32 flags;
+	struct seccomp_data data;
+};
+
+struct seccomp_notif_resp {
+	__u64 id;
+	__s64 val;
+	__s32 error;
+	__u32 flags;
+};
+
+#define SECCOMP_IOC_MAGIC		'!'
+#define SECCOMP_IO(nr)			_IO(SECCOMP_IOC_MAGIC, nr)
+#define SECCOMP_IOR(nr, type)		_IOR(SECCOMP_IOC_MAGIC, nr, type)
+#define SECCOMP_IOW(nr, type)		_IOW(SECCOMP_IOC_MAGIC, nr, type)
+#define SECCOMP_IOWR(nr, type)		_IOWR(SECCOMP_IOC_MAGIC, nr, type)
+
+/* Flags for seccomp notification fd ioctl. */
+#define SECCOMP_IOCTL_NOTIF_RECV	SECCOMP_IOWR(0, struct seccomp_notif)
+#define SECCOMP_IOCTL_NOTIF_SEND	SECCOMP_IOWR(1,	\
+						struct seccomp_notif_resp)
+#define SECCOMP_IOCTL_NOTIF_ID_VALID	SECCOMP_IOR(2, __u64)
 #endif /* _UAPI_LINUX_SECCOMP_H */
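The documentation in this patch recommends sizing notification buffers from SECCOMP_GET_NOTIF_SIZES rather than from compile-time sizeof. A minimal sketch of that, assuming `SYS_seccomp` and a `<linux/seccomp.h>` from Linux 5.0 or later; `alloc_notif_buffers()` and `notif_buf_size()` are hypothetical helpers that take the larger of the kernel's reported size and the size the program was compiled with.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/seccomp.h>

/* Pick the larger of the kernel's reported size and our compiled-in size. */
size_t notif_buf_size(unsigned short kernel_size, size_t user_size)
{
	return kernel_size > user_size ? (size_t)kernel_size : user_size;
}

/*
 * Hypothetical helper: allocate zeroed request/response buffers sized from
 * SECCOMP_GET_NOTIF_SIZES, so they stay large enough if the kernel's
 * structures grow. Returns 0 on success, or -1 with errno set (both
 * pointers are then NULL).
 */
int alloc_notif_buffers(struct seccomp_notif **req,
			struct seccomp_notif_resp **resp)
{
	struct seccomp_notif_sizes sizes;

	*req = NULL;
	*resp = NULL;
	if (syscall(SYS_seccomp, SECCOMP_GET_NOTIF_SIZES, 0, &sizes) != 0)
		return -1;

	*req = calloc(1, notif_buf_size(sizes.seccomp_notif, sizeof(**req)));
	*resp = calloc(1, notif_buf_size(sizes.seccomp_notif_resp,
					  sizeof(**resp)));
	if (*req == NULL || *resp == NULL) {
		free(*req);
		free(*resp);
		*req = NULL;
		*resp = NULL;
		errno = ENOMEM;
		return -1;
	}
	return 0;
}
```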
+446 -2
kernel/seccomp.c
···
 #endif

 #ifdef CONFIG_SECCOMP_FILTER
+#include <linux/file.h>
 #include <linux/filter.h>
 #include <linux/pid.h>
 #include <linux/ptrace.h>
 #include <linux/security.h>
 #include <linux/tracehook.h>
 #include <linux/uaccess.h>
+#include <linux/anon_inodes.h>
+
+enum notify_state {
+	SECCOMP_NOTIFY_INIT,
+	SECCOMP_NOTIFY_SENT,
+	SECCOMP_NOTIFY_REPLIED,
+};
+
+struct seccomp_knotif {
+	/* The struct pid of the task whose filter triggered the notification */
+	struct task_struct *task;
+
+	/* The "cookie" for this request; this is unique for this filter. */
+	u64 id;
+
+	/*
+	 * The seccomp data. This pointer is valid the entire time this
+	 * notification is active, since it comes from __seccomp_filter which
+	 * eclipses the entire lifecycle here.
+	 */
+	const struct seccomp_data *data;
+
+	/*
+	 * Notification states. When SECCOMP_RET_USER_NOTIF is returned, a
+	 * struct seccomp_knotif is created and starts out in INIT. Once the
+	 * handler reads the notification off of an FD, it transitions to SENT.
+	 * If a signal is received the state transitions back to INIT and
+	 * another message is sent. When the userspace handler replies, state
+	 * transitions to REPLIED.
+	 */
+	enum notify_state state;
+
+	/* The return values, only valid when in SECCOMP_NOTIFY_REPLIED */
+	int error;
+	long val;
+
+	/* Signals when this has entered SECCOMP_NOTIFY_REPLIED */
+	struct completion ready;
+
+	struct list_head list;
+};
+
+/**
+ * struct notification - container for seccomp userspace notifications. Since
+ * most seccomp filters will not have notification listeners attached and this
+ * structure is fairly large, we store the notification-specific stuff in a
+ * separate structure.
+ *
+ * @request: A semaphore that users of this notification can wait on for
+ *           changes. Actual reads and writes are still controlled with
+ *           filter->notify_lock.
+ * @next_id: The id of the next request.
+ * @notifications: A list of struct seccomp_knotif elements.
+ * @wqh: A wait queue for poll.
+ */
+struct notification {
+	struct semaphore request;
+	u64 next_id;
+	struct list_head notifications;
+	wait_queue_head_t wqh;
+};

 /**
  * struct seccomp_filter - container for seccomp BPF programs
···
  * @log: true if all actions except for SECCOMP_RET_ALLOW should be logged
  * @prev: points to a previously installed, or inherited, filter
  * @prog: the BPF program to evaluate
+ * @notif: the struct that holds all notification related information
+ * @notify_lock: A lock for all notification-related accesses.
  *
  * seccomp_filter objects are organized in a tree linked via the @prev
  * pointer. For any task, it appears to be a singly-linked list starting
···
 	bool log;
 	struct seccomp_filter *prev;
 	struct bpf_prog *prog;
+	struct notification *notif;
+	struct mutex notify_lock;
 };

 /* Limit any path through the tree to 256KB worth of instructions. */
···
 	if (!sfilter)
 		return ERR_PTR(-ENOMEM);

+	mutex_init(&sfilter->notify_lock);
 	ret = bpf_prog_create_from_user(&sfilter->prog, fprog,
 					seccomp_check_filter, save_orig);
 	if (ret < 0) {
···

 static void __get_seccomp_filter(struct seccomp_filter *filter)
 {
-	/* Reference count is bounded by the number of total processes. */
 	refcount_inc(&filter->usage);
 }

···
 #define SECCOMP_LOG_TRACE		(1 << 4)
 #define SECCOMP_LOG_LOG			(1 << 5)
 #define SECCOMP_LOG_ALLOW		(1 << 6)
+#define SECCOMP_LOG_USER_NOTIF		(1 << 7)

 static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
 				    SECCOMP_LOG_KILL_THREAD  |
 				    SECCOMP_LOG_TRAP  |
 				    SECCOMP_LOG_ERRNO |
+				    SECCOMP_LOG_USER_NOTIF |
 				    SECCOMP_LOG_TRACE |
 				    SECCOMP_LOG_LOG;

···
 		break;
 	case SECCOMP_RET_TRACE:
 		log = requested && seccomp_actions_logged & SECCOMP_LOG_TRACE;
+		break;
+	case SECCOMP_RET_USER_NOTIF:
+		log = requested && seccomp_actions_logged & SECCOMP_LOG_USER_NOTIF;
 		break;
 	case SECCOMP_RET_LOG:
 		log = seccomp_actions_logged & SECCOMP_LOG_LOG;
···
 #else

 #ifdef CONFIG_SECCOMP_FILTER
+static u64 seccomp_next_notify_id(struct seccomp_filter *filter)
+{
+	/*
+	 * Note: overflow is ok here, the id just needs to be unique per
+	 * filter.
+	 */
+	lockdep_assert_held(&filter->notify_lock);
+	return filter->notif->next_id++;
+}
+
+static void seccomp_do_user_notification(int this_syscall,
+					 struct seccomp_filter *match,
+					 const struct seccomp_data *sd)
+{
+	int err;
+	long ret = 0;
+	struct seccomp_knotif n = {};
+
+	mutex_lock(&match->notify_lock);
+	err = -ENOSYS;
+	if (!match->notif)
+		goto out;
+
+	n.task = current;
+	n.state = SECCOMP_NOTIFY_INIT;
+	n.data = sd;
+	n.id = seccomp_next_notify_id(match);
+	init_completion(&n.ready);
+	list_add(&n.list, &match->notif->notifications);
+
+	up(&match->notif->request);
+	wake_up_poll(&match->notif->wqh, EPOLLIN | EPOLLRDNORM);
+	mutex_unlock(&match->notify_lock);
+
+	/*
+	 * This is where we wait for a reply from userspace.
+	 */
+	err = wait_for_completion_interruptible(&n.ready);
+	mutex_lock(&match->notify_lock);
+	if (err == 0) {
+		ret = n.val;
+		err = n.error;
+	}
+
+	/*
+	 * Note that it's possible the listener died in between the time when
+	 * we were notified of a respons (or a signal) and when we were able to
+	 * re-acquire the lock, so only delete from the list if the
+	 * notification actually exists.
+	 *
+	 * Also note that this test is only valid because there's no way to
+	 * *reattach* to a notifier right now. If one is added, we'll need to
+	 * keep track of the notif itself and make sure they match here.
+	 */
+	if (match->notif)
+		list_del(&n.list);
+out:
+	mutex_unlock(&match->notify_lock);
+	syscall_set_return_value(current, task_pt_regs(current),
+				 err, ret);
+}
+
 static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 			    const bool recheck_after_trace)
 {
···
 			return -1;

 		return 0;
+
+	case SECCOMP_RET_USER_NOTIF:
+		seccomp_do_user_notification(this_syscall, match, sd);
+		goto skip;

 	case SECCOMP_RET_LOG:
 		seccomp_log(this_syscall, 0, action, true);
···
 }

 #ifdef CONFIG_SECCOMP_FILTER
+static int seccomp_notify_release(struct inode *inode, struct file *file)
+{
+	struct seccomp_filter *filter = file->private_data;
+	struct seccomp_knotif *knotif;
+
+	mutex_lock(&filter->notify_lock);
+
+	/*
+	 * If this file is being closed because e.g. the task who owned it
+	 * died, let's wake everyone up who was waiting on us.
+	 */
+	list_for_each_entry(knotif, &filter->notif->notifications, list) {
+		if (knotif->state == SECCOMP_NOTIFY_REPLIED)
+			continue;
+
+		knotif->state = SECCOMP_NOTIFY_REPLIED;
+		knotif->error = -ENOSYS;
+		knotif->val = 0;
+
+		complete(&knotif->ready);
+	}
+
+	kfree(filter->notif);
+	filter->notif = NULL;
+	mutex_unlock(&filter->notify_lock);
+	__put_seccomp_filter(filter);
+	return 0;
+}
+
+static long seccomp_notify_recv(struct seccomp_filter *filter,
+				void __user *buf)
+{
+	struct seccomp_knotif *knotif = NULL, *cur;
+	struct seccomp_notif unotif;
+	ssize_t ret;
+
+	memset(&unotif, 0, sizeof(unotif));
+
+	ret = down_interruptible(&filter->notif->request);
+	if (ret < 0)
+		return ret;
+
+	mutex_lock(&filter->notify_lock);
+	list_for_each_entry(cur, &filter->notif->notifications, list) {
+		if (cur->state == SECCOMP_NOTIFY_INIT) {
+			knotif = cur;
+			break;
+		}
+	}
+
+	/*
+	 * If we didn't find a notification, it could be that the task was
+	 * interrupted by a fatal signal between the time we were woken and
+	 * when we were able to acquire the rw lock.
+	 */
+	if (!knotif) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	unotif.id = knotif->id;
+	unotif.pid = task_pid_vnr(knotif->task);
+	unotif.data = *(knotif->data);
+
+	knotif->state = SECCOMP_NOTIFY_SENT;
+	wake_up_poll(&filter->notif->wqh, EPOLLOUT | EPOLLWRNORM);
+	ret = 0;
+out:
+	mutex_unlock(&filter->notify_lock);
+
+	if (ret == 0 && copy_to_user(buf, &unotif, sizeof(unotif))) {
+		ret = -EFAULT;
+
+		/*
+		 * Userspace screwed up. To make sure that we keep this
+		 * notification alive, let's reset it back to INIT. It
+		 * may have died when we released the lock, so we need to make
+		 * sure it's still around.
+		 */
+		knotif = NULL;
+		mutex_lock(&filter->notify_lock);
+		list_for_each_entry(cur, &filter->notif->notifications, list) {
+			if (cur->id == unotif.id) {
+				knotif = cur;
+				break;
+			}
+		}
+
+		if (knotif) {
+			knotif->state = SECCOMP_NOTIFY_INIT;
+			up(&filter->notif->request);
+		}
+		mutex_unlock(&filter->notify_lock);
+	}
+
+	return ret;
+}
+
+static long seccomp_notify_send(struct seccomp_filter *filter,
+				void __user *buf)
+{
+	struct seccomp_notif_resp resp = {};
+	struct seccomp_knotif *knotif = NULL, *cur;
+	long ret;
+
+	if (copy_from_user(&resp, buf, sizeof(resp)))
+		return -EFAULT;
+
+	if (resp.flags)
+		return -EINVAL;
+
+	ret = mutex_lock_interruptible(&filter->notify_lock);
+	if (ret < 0)
+		return ret;
+
+	list_for_each_entry(cur, &filter->notif->notifications, list) {
+		if (cur->id == resp.id) {
+			knotif = cur;
+			break;
+		}
+	}
+
+	if (!knotif) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	/* Allow exactly one reply. */
+	if (knotif->state != SECCOMP_NOTIFY_SENT) {
+		ret = -EINPROGRESS;
+		goto out;
+	}
+
+	ret = 0;
+	knotif->state = SECCOMP_NOTIFY_REPLIED;
+	knotif->error = resp.error;
+	knotif->val = resp.val;
+	complete(&knotif->ready);
+out:
+	mutex_unlock(&filter->notify_lock);
+	return ret;
+}
+
+static long seccomp_notify_id_valid(struct seccomp_filter *filter,
+				    void __user *buf)
+{
+	struct seccomp_knotif *knotif = NULL;
+	u64 id;
+	long ret;
+
+	if (copy_from_user(&id, buf, sizeof(id)))
+		return -EFAULT;
+
+	ret = mutex_lock_interruptible(&filter->notify_lock);
+	if (ret < 0)
+		return ret;
+
+	ret = -ENOENT;
+	list_for_each_entry(knotif, &filter->notif->notifications, list) {
+		if (knotif->id == id) {
+			if (knotif->state == SECCOMP_NOTIFY_SENT)
+				ret = 0;
+			goto out;
+		}
+	}
+
+out:
+	mutex_unlock(&filter->notify_lock);
+	return ret;
+}
+
+static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
+				 unsigned long arg)
+{
+	struct seccomp_filter *filter = file->private_data;
+	void __user *buf = (void __user *)arg;
+
+	switch (cmd) {
+	case SECCOMP_IOCTL_NOTIF_RECV:
+		return seccomp_notify_recv(filter, buf);
+	case SECCOMP_IOCTL_NOTIF_SEND:
+		return seccomp_notify_send(filter, buf);
+	case SECCOMP_IOCTL_NOTIF_ID_VALID:
+		return seccomp_notify_id_valid(filter, buf);
+	default:
+		return -EINVAL;
+	}
+}
+
+static __poll_t seccomp_notify_poll(struct file *file,
+				    struct poll_table_struct *poll_tab)
+{
+	struct seccomp_filter *filter = file->private_data;
+	__poll_t ret = 0;
+	struct seccomp_knotif *cur;
+
+	poll_wait(file, &filter->notif->wqh, poll_tab);
+
+	ret = mutex_lock_interruptible(&filter->notify_lock);
+	if (ret < 0)
+		return EPOLLERR;
+
+	list_for_each_entry(cur, &filter->notif->notifications, list) {
+		if (cur->state == SECCOMP_NOTIFY_INIT)
+			ret |= EPOLLIN | EPOLLRDNORM;
+		if (cur->state == SECCOMP_NOTIFY_SENT)
+			ret |= EPOLLOUT | EPOLLWRNORM;
+		if ((ret & EPOLLIN) && (ret & EPOLLOUT))
+			break;
+	}
+
+	mutex_unlock(&filter->notify_lock);
+
+	return ret;
+}
+
+static const struct file_operations seccomp_notify_ops = {
+	.poll = seccomp_notify_poll,
+	.release = seccomp_notify_release,
+	.unlocked_ioctl = seccomp_notify_ioctl,
+};
+
+static struct file *init_listener(struct seccomp_filter *filter)
+{
+	struct file *ret = ERR_PTR(-EBUSY);
+	struct seccomp_filter *cur;
+
+	for (cur = current->seccomp.filter; cur; cur = cur->prev) {
+		if (cur->notif)
+			goto out;
+	}
+
+	ret = ERR_PTR(-ENOMEM);
+	filter->notif = kzalloc(sizeof(*(filter->notif)), GFP_KERNEL);
+	if (!filter->notif)
+		goto out;
+
+	sema_init(&filter->notif->request, 0);
+	filter->notif->next_id = get_random_u64();
+	INIT_LIST_HEAD(&filter->notif->notifications);
+	init_waitqueue_head(&filter->notif->wqh);
+
+	ret = anon_inode_getfile("seccomp notify", &seccomp_notify_ops,
+				 filter, O_RDWR);
+	if (IS_ERR(ret))
+		goto out_notif;
+
+	/* The file has a reference to it now */
+	__get_seccomp_filter(filter);
+
+out_notif:
+	if (IS_ERR(ret))
+		kfree(filter->notif);
+out:
+	return ret;
+}
+
 /**
  * seccomp_set_mode_filter: internal function for setting seccomp filter
  * @flags:  flags to change filter behavior
···
 	const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
 	struct seccomp_filter *prepared = NULL;
 	long ret = -EINVAL;
+	int listener = -1;
+	struct file *listener_f = NULL;

 	/* Validate flags. */
 	if (flags & ~SECCOMP_FILTER_FLAG_MASK)
···
 	if (IS_ERR(prepared))
 		return PTR_ERR(prepared);

+	if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
+		listener = get_unused_fd_flags(O_CLOEXEC);
+		if (listener < 0) {
+			ret = listener;
+			goto out_free;
+		}
+
+		listener_f = init_listener(prepared);
+		if (IS_ERR(listener_f)) {
+			put_unused_fd(listener);
+			ret = PTR_ERR(listener_f);
+			goto out_free;
+		}
+	}
+
 	/*
 	 * Make sure we cannot change seccomp or nnp state via TSYNC
 	 * while another thread is in the middle of calling exec.
 	 */
 	if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
 	    mutex_lock_killable(&current->signal->cred_guard_mutex))
-		goto out_free;
+		goto out_put_fd;

 	spin_lock_irq(&current->sighand->siglock);

···
 	spin_unlock_irq(&current->sighand->siglock);
 	if (flags & SECCOMP_FILTER_FLAG_TSYNC)
 		mutex_unlock(&current->signal->cred_guard_mutex);
+out_put_fd:
+	if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
+		if (ret < 0) {
+			fput(listener_f);
+			put_unused_fd(listener);
+		} else {
+			fd_install(listener, listener_f);
+			ret = listener;
+		}
+	}
 out_free:
 	seccomp_filter_free(prepared);
 	return ret;
···
 	case SECCOMP_RET_KILL_THREAD:
 	case SECCOMP_RET_TRAP:
 	case SECCOMP_RET_ERRNO:
+	case SECCOMP_RET_USER_NOTIF:
 	case SECCOMP_RET_TRACE:
 	case SECCOMP_RET_LOG:
 	case SECCOMP_RET_ALLOW:
···
 	default:
 		return -EOPNOTSUPP;
 	}
+
+	return 0;
+}
+
+static long seccomp_get_notif_sizes(void __user *usizes)
+{
+	struct seccomp_notif_sizes sizes = {
+		.seccomp_notif = sizeof(struct seccomp_notif),
+		.seccomp_notif_resp = sizeof(struct seccomp_notif_resp),
+		.seccomp_data = sizeof(struct seccomp_data),
+	};
+
+	if (copy_to_user(usizes, &sizes, sizeof(sizes)))
+		return -EFAULT;

 	return 0;
 }
···
 			return -EINVAL;

 		return seccomp_get_action_avail(uargs);
+	case SECCOMP_GET_NOTIF_SIZES:
+		if (flags != 0)
+			return -EINVAL;
+
+		return seccomp_get_notif_sizes(uargs);
 	default:
 		return -EINVAL;
 	}
···
 #define SECCOMP_RET_KILL_THREAD_NAME	"kill_thread"
 #define SECCOMP_RET_TRAP_NAME		"trap"
 #define SECCOMP_RET_ERRNO_NAME		"errno"
+#define SECCOMP_RET_USER_NOTIF_NAME	"user_notif"
 #define SECCOMP_RET_TRACE_NAME		"trace"
 #define SECCOMP_RET_LOG_NAME		"log"
 #define SECCOMP_RET_ALLOW_NAME		"allow"
···
 				SECCOMP_RET_KILL_THREAD_NAME	" "
 				SECCOMP_RET_TRAP_NAME		" "
 				SECCOMP_RET_ERRNO_NAME		" "
+				SECCOMP_RET_USER_NOTIF_NAME	" "
 				SECCOMP_RET_TRACE_NAME		" "
 				SECCOMP_RET_LOG_NAME		" "
 				SECCOMP_RET_ALLOW_NAME;
···
 	{ SECCOMP_LOG_KILL_THREAD, SECCOMP_RET_KILL_THREAD_NAME },
 	{ SECCOMP_LOG_TRAP, SECCOMP_RET_TRAP_NAME },
 	{ SECCOMP_LOG_ERRNO, SECCOMP_RET_ERRNO_NAME },
+	{ SECCOMP_LOG_USER_NOTIF, SECCOMP_RET_USER_NOTIF_NAME },
 	{ SECCOMP_LOG_TRACE, SECCOMP_RET_TRACE_NAME },
 	{ SECCOMP_LOG_LOG, SECCOMP_RET_LOG_NAME },
 	{ SECCOMP_LOG_ALLOW, SECCOMP_RET_ALLOW_NAME },
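The per-notification states added above (INIT, SENT, REPLIED) are what make "allow exactly one reply" and the -EINPROGRESS/-ENOENT errors work. As a rough, single-threaded userspace model of those transitions (ignoring the mutex, the semaphore and the completion), here is a sketch; every name in it is hypothetical and it is not kernel code, it only mirrors the checks in seccomp_notify_recv(), seccomp_notify_send() and seccomp_notify_release().

```c
#include <errno.h>

/* Hypothetical model of enum notify_state. */
enum model_state { MODEL_INIT, MODEL_SENT, MODEL_REPLIED };

/* Hypothetical model of the fields of struct seccomp_knotif used here. */
struct model_knotif {
	enum model_state state;
	int error;
	long val;
};

/* NOTIF_RECV hands out a notification in INIT and moves it to SENT. */
int model_recv(struct model_knotif *n)
{
	if (n->state != MODEL_INIT)
		return -ENOENT;
	n->state = MODEL_SENT;
	return 0;
}

/* If copy_to_user() fails, the kernel resets the notification to INIT. */
void model_recv_fault(struct model_knotif *n)
{
	if (n->state == MODEL_SENT)
		n->state = MODEL_INIT;
}

/* NOTIF_SEND: exactly one reply, and only after the request was received. */
int model_send(struct model_knotif *n, int error, long val)
{
	if (n->state != MODEL_SENT)
		return -EINPROGRESS;
	n->state = MODEL_REPLIED;
	n->error = error;
	n->val = val;
	return 0;
}

/* Closing the listener completes every unreplied notification with -ENOSYS. */
void model_release(struct model_knotif *n)
{
	if (n->state == MODEL_REPLIED)
		return;
	n->state = MODEL_REPLIED;
	n->error = -ENOSYS;
	n->val = 0;
}
```

In this model a second reply to the same notification is refused and leaves the first reply's values in place, which matches the "Allow exactly one reply" check in seccomp_notify_send().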
+445 -2
tools/testing/selftests/seccomp/seccomp_bpf.c
··· 5 5 * Test code for seccomp bpf. 6 6 */ 7 7 8 + #define _GNU_SOURCE 8 9 #include <sys/types.h> 9 10 10 11 /* ··· 41 40 #include <sys/fcntl.h> 42 41 #include <sys/mman.h> 43 42 #include <sys/times.h> 43 + #include <sys/socket.h> 44 + #include <sys/ioctl.h> 44 45 45 - #define _GNU_SOURCE 46 46 #include <unistd.h> 47 47 #include <sys/syscall.h> 48 + #include <poll.h> 48 49 49 50 #include "../kselftest_harness.h" 50 51 ··· 136 133 #define SECCOMP_GET_ACTION_AVAIL 2 137 134 #endif 138 135 136 + #ifndef SECCOMP_GET_NOTIF_SIZES 137 + #define SECCOMP_GET_NOTIF_SIZES 3 138 + #endif 139 + 139 140 #ifndef SECCOMP_FILTER_FLAG_TSYNC 140 141 #define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0) 141 142 #endif ··· 158 151 struct seccomp_metadata { 159 152 __u64 filter_off; /* Input: which filter */ 160 153 __u64 flags; /* Output: filter's flags */ 154 + }; 155 + #endif 156 + 157 + #ifndef SECCOMP_FILTER_FLAG_NEW_LISTENER 158 + #define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3) 159 + 160 + #define SECCOMP_RET_USER_NOTIF 0x7fc00000U 161 + 162 + #define SECCOMP_IOC_MAGIC '!' 163 + #define SECCOMP_IO(nr) _IO(SECCOMP_IOC_MAGIC, nr) 164 + #define SECCOMP_IOR(nr, type) _IOR(SECCOMP_IOC_MAGIC, nr, type) 165 + #define SECCOMP_IOW(nr, type) _IOW(SECCOMP_IOC_MAGIC, nr, type) 166 + #define SECCOMP_IOWR(nr, type) _IOWR(SECCOMP_IOC_MAGIC, nr, type) 167 + 168 + /* Flags for seccomp notification fd ioctl. 
*/ 169 + #define SECCOMP_IOCTL_NOTIF_RECV SECCOMP_IOWR(0, struct seccomp_notif) 170 + #define SECCOMP_IOCTL_NOTIF_SEND SECCOMP_IOWR(1, \ 171 + struct seccomp_notif_resp) 172 + #define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOR(2, __u64) 173 + 174 + struct seccomp_notif { 175 + __u64 id; 176 + __u32 pid; 177 + __u32 flags; 178 + struct seccomp_data data; 179 + }; 180 + 181 + struct seccomp_notif_resp { 182 + __u64 id; 183 + __s64 val; 184 + __s32 error; 185 + __u32 flags; 186 + }; 187 + 188 + struct seccomp_notif_sizes { 189 + __u16 seccomp_notif; 190 + __u16 seccomp_notif_resp; 191 + __u16 seccomp_data; 161 192 }; 162 193 #endif 163 194 ··· 2122 2077 { 2123 2078 unsigned int flags[] = { SECCOMP_FILTER_FLAG_TSYNC, 2124 2079 SECCOMP_FILTER_FLAG_LOG, 2125 - SECCOMP_FILTER_FLAG_SPEC_ALLOW }; 2080 + SECCOMP_FILTER_FLAG_SPEC_ALLOW, 2081 + SECCOMP_FILTER_FLAG_NEW_LISTENER }; 2126 2082 unsigned int flag, all_flags; 2127 2083 int i; 2128 2084 long ret; ··· 2977 2931 2978 2932 skip: 2979 2933 ASSERT_EQ(0, kill(pid, SIGKILL)); 2934 + } 2935 + 2936 + static int user_trap_syscall(int nr, unsigned int flags) 2937 + { 2938 + struct sock_filter filter[] = { 2939 + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 2940 + offsetof(struct seccomp_data, nr)), 2941 + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, nr, 0, 1), 2942 + BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_USER_NOTIF), 2943 + BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), 2944 + }; 2945 + 2946 + struct sock_fprog prog = { 2947 + .len = (unsigned short)ARRAY_SIZE(filter), 2948 + .filter = filter, 2949 + }; 2950 + 2951 + return seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog); 2952 + } 2953 + 2954 + #define USER_NOTIF_MAGIC 116983961184613L 2955 + TEST(user_notification_basic) 2956 + { 2957 + pid_t pid; 2958 + long ret; 2959 + int status, listener; 2960 + struct seccomp_notif req = {}; 2961 + struct seccomp_notif_resp resp = {}; 2962 + struct pollfd pollfd; 2963 + 2964 + struct sock_filter filter[] = { 2965 + BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW), 2966 + }; 2967 
+ struct sock_fprog prog = { 2968 + .len = (unsigned short)ARRAY_SIZE(filter), 2969 + .filter = filter, 2970 + }; 2971 + 2972 + pid = fork(); 2973 + ASSERT_GE(pid, 0); 2974 + 2975 + /* Check that we get -ENOSYS with no listener attached */ 2976 + if (pid == 0) { 2977 + if (user_trap_syscall(__NR_getpid, 0) < 0) 2978 + exit(1); 2979 + ret = syscall(__NR_getpid); 2980 + exit(ret >= 0 || errno != ENOSYS); 2981 + } 2982 + 2983 + EXPECT_EQ(waitpid(pid, &status, 0), pid); 2984 + EXPECT_EQ(true, WIFEXITED(status)); 2985 + EXPECT_EQ(0, WEXITSTATUS(status)); 2986 + 2987 + /* Add some no-op filters so for grins. */ 2988 + EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0); 2989 + EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0); 2990 + EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0); 2991 + EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0); 2992 + 2993 + /* Check that the basic notification machinery works */ 2994 + listener = user_trap_syscall(__NR_getpid, 2995 + SECCOMP_FILTER_FLAG_NEW_LISTENER); 2996 + EXPECT_GE(listener, 0); 2997 + 2998 + /* Installing a second listener in the chain should EBUSY */ 2999 + EXPECT_EQ(user_trap_syscall(__NR_getpid, 3000 + SECCOMP_FILTER_FLAG_NEW_LISTENER), 3001 + -1); 3002 + EXPECT_EQ(errno, EBUSY); 3003 + 3004 + pid = fork(); 3005 + ASSERT_GE(pid, 0); 3006 + 3007 + if (pid == 0) { 3008 + ret = syscall(__NR_getpid); 3009 + exit(ret != USER_NOTIF_MAGIC); 3010 + } 3011 + 3012 + pollfd.fd = listener; 3013 + pollfd.events = POLLIN | POLLOUT; 3014 + 3015 + EXPECT_GT(poll(&pollfd, 1, -1), 0); 3016 + EXPECT_EQ(pollfd.revents, POLLIN); 3017 + 3018 + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0); 3019 + 3020 + pollfd.fd = listener; 3021 + pollfd.events = POLLIN | POLLOUT; 3022 + 3023 + EXPECT_GT(poll(&pollfd, 1, -1), 0); 3024 + EXPECT_EQ(pollfd.revents, POLLOUT); 3025 + 3026 + EXPECT_EQ(req.data.nr, __NR_getpid); 3027 + 3028 + resp.id = req.id; 3029 + resp.error = 0; 3030 + resp.val = USER_NOTIF_MAGIC; 
+
+	/* check that we make sure flags == 0 */
+	resp.flags = 1;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), -1);
+	EXPECT_EQ(errno, EINVAL);
+
+	resp.flags = 0;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
+TEST(user_notification_kill_in_middle)
+{
+	pid_t pid;
+	long ret;
+	int listener;
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+
+	listener = user_trap_syscall(__NR_getpid,
+				     SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	EXPECT_GE(listener, 0);
+
+	/*
+	 * Check that nothing bad happens when we kill the task in the middle
+	 * of a syscall.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		ret = syscall(__NR_getpid);
+		exit(ret != USER_NOTIF_MAGIC);
+	}
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ID_VALID, &req.id), 0);
+
+	EXPECT_EQ(kill(pid, SIGKILL), 0);
+	EXPECT_EQ(waitpid(pid, NULL, 0), pid);
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ID_VALID, &req.id), -1);
+
+	resp.id = req.id;
+	ret = ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp);
+	EXPECT_EQ(ret, -1);
+	EXPECT_EQ(errno, ENOENT);
+}
+
+static int handled = -1;
+
+static void signal_handler(int signal)
+{
+	if (write(handled, "c", 1) != 1)
+		perror("write from signal");
+}
+
+TEST(user_notification_signal)
+{
+	pid_t pid;
+	long ret;
+	int status, listener, sk_pair[2];
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+	char c;
+
+	ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
+
+	listener = user_trap_syscall(__NR_gettid,
+				     SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	EXPECT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		close(sk_pair[0]);
+		handled = sk_pair[1];
+		if (signal(SIGUSR1, signal_handler) == SIG_ERR) {
+			perror("signal");
+			exit(1);
+		}
+		/*
+		 * ERESTARTSYS behavior is a bit hard to test, because we need
+		 * to rely on a signal that has not yet been handled. Let's at
+		 * least check that the error code gets propagated through, and
+		 * hope that it doesn't break when there is actually a signal :)
+		 */
+		ret = syscall(__NR_gettid);
+		exit(!(ret == -1 && errno == 512));
+	}
+
+	close(sk_pair[1]);
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	EXPECT_EQ(kill(pid, SIGUSR1), 0);
+
+	/*
+	 * Make sure the signal really is delivered, which means we're not
+	 * stuck in the user notification code any more and the notification
+	 * should be dead.
+	 */
+	EXPECT_EQ(read(sk_pair[0], &c, 1), 1);
+
+	resp.id = req.id;
+	resp.error = -EPERM;
+	resp.val = 0;
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), -1);
+	EXPECT_EQ(errno, ENOENT);
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	resp.id = req.id;
+	resp.error = -512; /* -ERESTARTSYS */
+	resp.val = 0;
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
+TEST(user_notification_closed_listener)
+{
+	pid_t pid;
+	long ret;
+	int status, listener;
+
+	listener = user_trap_syscall(__NR_getpid,
+				     SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	EXPECT_GE(listener, 0);
+
+	/*
+	 * Check that we get an ENOSYS when the listener is closed.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		close(listener);
+		ret = syscall(__NR_getpid);
+		exit(ret != -1 || errno != ENOSYS);
+	}
+
+	close(listener);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
+/*
+ * Check that a pid in a child namespace still shows up as valid in ours.
+ */
+TEST(user_notification_child_pid_ns)
+{
+	pid_t pid;
+	int status, listener;
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+
+	ASSERT_EQ(unshare(CLONE_NEWPID), 0);
+
+	listener = user_trap_syscall(__NR_getpid, SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0)
+		exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC);
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+	EXPECT_EQ(req.pid, pid);
+
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+	close(listener);
+}
+
+/*
+ * Check that a pid in a sibling (i.e. unrelated) namespace shows up as 0, i.e.
+ * invalid.
+ */
+TEST(user_notification_sibling_pid_ns)
+{
+	pid_t pid, pid2;
+	int status, listener;
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+
+	listener = user_trap_syscall(__NR_getpid, SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		ASSERT_EQ(unshare(CLONE_NEWPID), 0);
+
+		pid2 = fork();
+		ASSERT_GE(pid2, 0);
+
+		if (pid2 == 0)
+			exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC);
+
+		EXPECT_EQ(waitpid(pid2, &status, 0), pid2);
+		EXPECT_EQ(true, WIFEXITED(status));
+		EXPECT_EQ(0, WEXITSTATUS(status));
+		exit(WEXITSTATUS(status));
+	}
+
+	/* Create the sibling ns, and sibling in it.
*/
+	EXPECT_EQ(unshare(CLONE_NEWPID), 0);
+	EXPECT_EQ(errno, 0);
+
+	pid2 = fork();
+	EXPECT_GE(pid2, 0);
+
+	if (pid2 == 0) {
+		ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+		/*
+		 * The pid should be 0, i.e. the task is in some namespace that
+		 * we can't "see".
+		 */
+		ASSERT_EQ(req.pid, 0);
+
+		resp.id = req.id;
+		resp.error = 0;
+		resp.val = USER_NOTIF_MAGIC;
+
+		ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+		exit(0);
+	}
+
+	close(listener);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	EXPECT_EQ(waitpid(pid2, &status, 0), pid2);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
+TEST(user_notification_fault_recv)
+{
+	pid_t pid;
+	int status, listener;
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+
+	listener = user_trap_syscall(__NR_getpid, SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0)
+		exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC);
+
+	/* Do a bad recv() */
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, NULL), -1);
+	EXPECT_EQ(errno, EFAULT);
+
+	/* We should still be able to receive this notification, though.
*/
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+	EXPECT_EQ(req.pid, pid);
+
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
+TEST(seccomp_get_notif_sizes)
+{
+	struct seccomp_notif_sizes sizes;
+
+	EXPECT_EQ(seccomp(SECCOMP_GET_NOTIF_SIZES, 0, &sizes), 0);
+	EXPECT_EQ(sizes.seccomp_notif, sizeof(struct seccomp_notif));
+	EXPECT_EQ(sizes.seccomp_notif_resp, sizeof(struct seccomp_notif_resp));
 }
 
 /*