Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Pull proc updates from Eric Biederman:
"This has four sets of changes:

- modernize proc to support multiple private instances

- ensure we see the exit of each process tid exactly once

- remove has_group_leader_pid

- use pids not tasks in posix-cpu-timers lookup

Alexey updated proc so each mount of proc uses a new superblock. This
allows people to actually use mount options with proc with no fear of
messing up another mount of proc. Given the kernel's internal mounts
of proc for things like uml this was a real problem, and resulted in
Android's hidepid mount options being ignored and introducing security
issues.

The rest of the changes are small cleanups and fixes that came out of
my work to allow this change to proc. In essence it is swapping the
pids in de_thread during exec which removes a special case the code
had to handle. Then updating the code to stop handling that special
case"

* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: proc_pid_ns takes super_block as an argument
remove the no longer needed pid_alive() check in __task_pid_nr_ns()
posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock
posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type
posix-cpu-timers: Extend rcu_read_lock removing task_struct references
signal: Remove has_group_leader_pid
exec: Remove BUG_ON(has_group_leader_pid)
posix-cpu-timer: Unify the now redundant code in lookup_task
posix-cpu-timer: Tidy up group_leader logic in lookup_task
proc: Ensure we see the exit of each process tid exactly once
rculist: Add hlists_swap_heads_rcu
proc: Use PIDTYPE_TGID in next_tgid
Use proc_pid_ns() to get pid_namespace from the proc superblock
proc: use named enums for better readability
proc: use human-readable values for hidepid
docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior
proc: add option to mount only a pids subset
proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option
proc: allow to mount many instances of proc in one pid namespace
proc: rename struct proc_fs_info to proc_fs_opts

Linus Torvalds 5 years ago 9ff72585 051c3556

+493 -207

25 changed files

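Documentation/filesystems/proc.rst
fs/exec.c
fs/locks.c
fs/proc/array.c
fs/proc/base.c
fs/proc/generic.c
fs/proc/inode.c
fs/proc/root.c
fs/proc/self.c
fs/proc/thread_self.c
fs/proc_namespace.c
include/linux/pid.h
include/linux/pid_namespace.h
include/linux/proc_fs.h
include/linux/rculist.h
include/linux/sched/signal.h
kernel/fork.c
kernel/pid.c
kernel/time/posix-cpu-timers.c
net/ipv6/ip6_flowlabel.c
security/tomoyo/realpath.c
tools/testing/selftests/proc/.gitignore
tools/testing/selftests/proc/Makefile
tools/testing/selftests/proc/proc-fsconfig-hidepid.c
tools/testing/selftests/proc/proc-multiple-procfs.c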

+71 -17

Documentation/filesystems/proc.rst

···
   4	Configuring procfs
   4.1	Mount options

+  5	Filesystem behavior
+
 Preface
 =======

···
 	=========  ========================================================
 	hidepid=   Set /proc/<pid>/ access mode.
 	gid=       Set the group authorized to learn processes information.
+	subset=    Show only the specified subset of procfs.
 	=========  ========================================================

-hidepid=0 means classic mode - everybody may access all /proc/<pid>/ directories
-(default).
+hidepid=off or hidepid=0 means classic mode - everybody may access all
+/proc/<pid>/ directories (default).

-hidepid=1 means users may not access any /proc/<pid>/ directories but their
-own. Sensitive files like cmdline, sched*, status are now protected against
-other users. This makes it impossible to learn whether any user runs
-specific program (given the program doesn't reveal itself by its behaviour).
-As an additional bonus, as /proc/<pid>/cmdline is unaccessible for other users,
-poorly written programs passing sensitive information via program arguments are
-now protected against local eavesdroppers.
+hidepid=noaccess or hidepid=1 means users may not access any /proc/<pid>/
+directories but their own. Sensitive files like cmdline, sched*, status are now
+protected against other users. This makes it impossible to learn whether any
+user runs specific program (given the program doesn't reveal itself by its
+behaviour). As an additional bonus, as /proc/<pid>/cmdline is unaccessible for
+other users, poorly written programs passing sensitive information via program
+arguments are now protected against local eavesdroppers.

-hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be fully invisible to other
-users. It doesn't mean that it hides a fact whether a process with a specific
-pid value exists (it can be learned by other means, e.g. by "kill -0 $PID"),
-but it hides process' uid and gid, which may be learned by stat()'ing
-/proc/<pid>/ otherwise. It greatly complicates an intruder's task of gathering
-information about running processes, whether some daemon runs with elevated
-privileges, whether other user runs some sensitive program, whether other users
-run any program at all, etc.
+hidepid=invisible or hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be
+fully invisible to other users. It doesn't mean that it hides a fact whether a
+process with a specific pid value exists (it can be learned by other means, e.g.
+by "kill -0 $PID"), but it hides process' uid and gid, which may be learned by
+stat()'ing /proc/<pid>/ otherwise. It greatly complicates an intruder's task of
+gathering information about running processes, whether some daemon runs with
+elevated privileges, whether other user runs some sensitive program, whether
+other users run any program at all, etc.
+
+hidepid=ptraceable or hidepid=4 means that procfs should only contain
+/proc/<pid>/ directories that the caller can ptrace.

 gid= defines a group authorized to learn processes information otherwise
 prohibited by hidepid=. If you use some daemon like identd which needs to learn
 information about processes information, just add identd to this group.
+
+subset=pid hides all top level files and directories in the procfs that
+are not related to tasks.
+
+5	Filesystem behavior
+----------------------------
+
+Originally, before the advent of pid namepsace, procfs was a global file
+system. It means that there was only one procfs instance in the system.
+
+When pid namespace was added, a separate procfs instance was mounted in
+each pid namespace. So, procfs mount options are global among all
+mountpoints within the same namespace.
+
+::
+
+	# grep ^proc /proc/mounts
+	proc /proc proc rw,relatime,hidepid=2 0 0
+
+	# strace -e mount mount -o hidepid=1 -t proc proc /tmp/proc
+	mount("proc", "/tmp/proc", "proc", 0, "hidepid=1") = 0
+	+++ exited with 0 +++
+
+	# grep ^proc /proc/mounts
+	proc /proc proc rw,relatime,hidepid=2 0 0
+	proc /tmp/proc proc rw,relatime,hidepid=2 0 0
+
+and only after remounting procfs mount options will change at all
+mountpoints.
+
+	# mount -o remount,hidepid=1 -t proc proc /tmp/proc
+
+	# grep ^proc /proc/mounts
+	proc /proc proc rw,relatime,hidepid=1 0 0
+	proc /tmp/proc proc rw,relatime,hidepid=1 0 0
+
+This behavior is different from the behavior of other filesystems.
+
+The new procfs behavior is more like other filesystems. Each procfs mount
+creates a new procfs instance. Mount options affect own procfs instance.
+It means that it became possible to have several procfs instances
+displaying tasks with different filtering options in one pid namespace.
+
+	# mount -o hidepid=invisible -t proc proc /proc
+	# mount -o hidepid=noaccess -t proc proc /tmp/proc
+	# grep ^proc /proc/mounts
+	proc /proc proc rw,relatime,hidepid=invisible 0 0
+	proc /tmp/proc proc rw,relatime,hidepid=noaccess 0 0

+1 -5

fs/exec.c

···
 		tsk->start_boottime = leader->start_boottime;

 		BUG_ON(!same_thread_group(leader, tsk));
-		BUG_ON(has_group_leader_pid(tsk));
 		/*
 		 * An exec() starts a new thread group with the
 		 * TGID of the previous thread group. Rehash the
···

 		/* Become a process group leader with the old leader's pid.
 		 * The old leader becomes a thread of the this thread group.
-		 * Note: The old leader also uses this pid until release_task
-		 *       is called. Odd but simple and correct.
 		 */
-		tsk->pid = leader->pid;
-		change_pid(tsk, PIDTYPE_PID, task_pid(leader));
+		exchange_tids(tsk, leader);
 		transfer_pid(leader, tsk, PIDTYPE_TGID);
 		transfer_pid(leader, tsk, PIDTYPE_PGID);
 		transfer_pid(leader, tsk, PIDTYPE_SID);

+2 -2

fs/locks.c

···
 {
 	struct inode *inode = NULL;
 	unsigned int fl_pid;
-	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
+	struct pid_namespace *proc_pidns = proc_pid_ns(file_inode(f->file)->i_sb);

 	fl_pid = locks_translate_pid(fl, proc_pidns);
 	/*
···
 {
 	struct locks_iterator *iter = f->private;
 	struct file_lock *fl, *bfl;
-	struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
+	struct pid_namespace *proc_pidns = proc_pid_ns(file_inode(f->file)->i_sb);

 	fl = hlist_entry(v, struct file_lock, fl_link);


+1 -1

fs/proc/array.c

···
 {
 	struct inode *inode = file_inode(seq->file);

-	seq_printf(seq, "%d ", pid_nr_ns(v, proc_pid_ns(inode)));
+	seq_printf(seq, "%d ", pid_nr_ns(v, proc_pid_ns(inode->i_sb)));
 	return 0;
 }


+41 -33

fs/proc/base.c

···
  * May current process learn task's sched/cmdline info (for hide_pid_min=1)
  * or euid/egid (for hide_pid_min=2)?
  */
-static bool has_pid_permissions(struct pid_namespace *pid,
+static bool has_pid_permissions(struct proc_fs_info *fs_info,
 				 struct task_struct *task,
-				 int hide_pid_min)
+				 enum proc_hidepid hide_pid_min)
 {
-	if (pid->hide_pid < hide_pid_min)
+	/*
+	 * If 'hidpid' mount option is set force a ptrace check,
+	 * we indicate that we are using a filesystem syscall
+	 * by passing PTRACE_MODE_READ_FSCREDS
+	 */
+	if (fs_info->hide_pid == HIDEPID_NOT_PTRACEABLE)
+		return ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
+
+	if (fs_info->hide_pid < hide_pid_min)
 		return true;
-	if (in_group_p(pid->pid_gid))
+	if (in_group_p(fs_info->pid_gid))
 		return true;
 	return ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
 }
···

 static int proc_pid_permission(struct inode *inode, int mask)
 {
-	struct pid_namespace *pid = proc_pid_ns(inode);
+	struct proc_fs_info *fs_info = proc_sb_info(inode->i_sb);
 	struct task_struct *task;
 	bool has_perms;

 	task = get_proc_task(inode);
 	if (!task)
 		return -ESRCH;
-	has_perms = has_pid_permissions(pid, task, HIDEPID_NO_ACCESS);
+	has_perms = has_pid_permissions(fs_info, task, HIDEPID_NO_ACCESS);
 	put_task_struct(task);

 	if (!has_perms) {
-		if (pid->hide_pid == HIDEPID_INVISIBLE) {
+		if (fs_info->hide_pid == HIDEPID_INVISIBLE) {
 			/*
 			 * Let's make getdents(), stat(), and open()
 			 * consistent with each other. If a process
···
 static int proc_single_show(struct seq_file *m, void *v)
 {
 	struct inode *inode = m->private;
-	struct pid_namespace *ns = proc_pid_ns(inode);
+	struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
 	struct pid *pid = proc_pid(inode);
 	struct task_struct *task;
 	int ret;
···
 static int sched_show(struct seq_file *m, void *v)
 {
 	struct inode *inode = m->private;
-	struct pid_namespace *ns = proc_pid_ns(inode);
+	struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
 	struct task_struct *p;

 	p = get_proc_task(inode);
···
 		u32 request_mask, unsigned int query_flags)
 {
 	struct inode *inode = d_inode(path->dentry);
-	struct pid_namespace *pid = proc_pid_ns(inode);
+	struct proc_fs_info *fs_info = proc_sb_info(inode->i_sb);
 	struct task_struct *task;

 	generic_fillattr(inode, stat);
···
 	rcu_read_lock();
 	task = pid_task(proc_pid(inode), PIDTYPE_PID);
 	if (task) {
-		if (!has_pid_permissions(pid, task, HIDEPID_INVISIBLE)) {
+		if (!has_pid_permissions(fs_info, task, HIDEPID_INVISIBLE)) {
 			rcu_read_unlock();
 			/*
 			 * This doesn't prevent learning whether PID exists,
···
 		return -ENOMEM;

 	tp->pid = proc_pid(inode);
-	tp->ns = proc_pid_ns(inode);
+	tp->ns = proc_pid_ns(inode->i_sb);
 	return 0;
 }

···
 {
 	struct task_struct *task;
 	unsigned tgid;
+	struct proc_fs_info *fs_info;
 	struct pid_namespace *ns;
 	struct dentry *result = ERR_PTR(-ENOENT);

···
 	if (tgid == ~0U)
 		goto out;

-	ns = dentry->d_sb->s_fs_info;
+	fs_info = proc_sb_info(dentry->d_sb);
+	ns = fs_info->pid_ns;
 	rcu_read_lock();
 	task = find_task_by_pid_ns(tgid, ns);
 	if (task)
···
 	if (!task)
 		goto out;

+	/* Limit procfs to only ptraceable tasks */
+	if (fs_info->hide_pid == HIDEPID_NOT_PTRACEABLE) {
+		if (!has_pid_permissions(fs_info, task, HIDEPID_NO_ACCESS))
+			goto out_put_task;
+	}
+
 	result = proc_pid_instantiate(dentry, task, NULL);
+out_put_task:
 	put_task_struct(task);
 out:
 	return result;
···
 	pid = find_ge_pid(iter.tgid, ns);
 	if (pid) {
 		iter.tgid = pid_nr_ns(pid, ns);
-		iter.task = pid_task(pid, PIDTYPE_PID);
-		/* What we to know is if the pid we have find is the
-		 * pid of a thread_group_leader. Testing for task
-		 * being a thread_group_leader is the obvious thing
-		 * todo but there is a window when it fails, due to
-		 * the pid transfer logic in de_thread.
-		 *
-		 * So we perform the straight forward test of seeing
-		 * if the pid we have found is the pid of a thread
-		 * group leader, and don't worry if the task we have
-		 * found doesn't happen to be a thread group leader.
-		 * As we don't care in the case of readdir.
-		 */
-		if (!iter.task || !has_group_leader_pid(iter.task)) {
+		iter.task = pid_task(pid, PIDTYPE_TGID);
+		if (!iter.task) {
 			iter.tgid += 1;
 			goto retry;
 		}
···
 int proc_pid_readdir(struct file *file, struct dir_context *ctx)
 {
 	struct tgid_iter iter;
-	struct pid_namespace *ns = proc_pid_ns(file_inode(file));
+	struct proc_fs_info *fs_info = proc_sb_info(file_inode(file)->i_sb);
+	struct pid_namespace *ns = proc_pid_ns(file_inode(file)->i_sb);
 	loff_t pos = ctx->pos;

 	if (pos >= PID_MAX_LIMIT + TGID_OFFSET)
 		return 0;

 	if (pos == TGID_OFFSET - 2) {
-		struct inode *inode = d_inode(ns->proc_self);
+		struct inode *inode = d_inode(fs_info->proc_self);
 		if (!dir_emit(ctx, "self", 4, inode->i_ino, DT_LNK))
 			return 0;
 		ctx->pos = pos = pos + 1;
 	}
 	if (pos == TGID_OFFSET - 1) {
-		struct inode *inode = d_inode(ns->proc_thread_self);
+		struct inode *inode = d_inode(fs_info->proc_thread_self);
 		if (!dir_emit(ctx, "thread-self", 11, inode->i_ino, DT_LNK))
 			return 0;
 		ctx->pos = pos = pos + 1;
···
 		unsigned int len;

 		cond_resched();
-		if (!has_pid_permissions(ns, iter.task, HIDEPID_INVISIBLE))
+		if (!has_pid_permissions(fs_info, iter.task, HIDEPID_INVISIBLE))
 			continue;

 		len = snprintf(name, sizeof(name), "%u", iter.tgid);
···
 	struct task_struct *task;
 	struct task_struct *leader = get_proc_task(dir);
 	unsigned tid;
+	struct proc_fs_info *fs_info;
 	struct pid_namespace *ns;
 	struct dentry *result = ERR_PTR(-ENOENT);

···
 	if (tid == ~0U)
 		goto out;

-	ns = dentry->d_sb->s_fs_info;
+	fs_info = proc_sb_info(dentry->d_sb);
+	ns = fs_info->pid_ns;
 	rcu_read_lock();
 	task = find_task_by_pid_ns(tid, ns);
 	if (task)
···
 	/* f_version caches the tgid value that the last readdir call couldn't
 	 * return. lseek aka telldir automagically resets f_version to 0.
 	 */
-	ns = proc_pid_ns(inode);
+	ns = proc_pid_ns(inode->i_sb);
 	tid = (int)file->f_version;
 	file->f_version = 0;
 	for (task = first_tid(proc_pid(inode), tid, ctx->pos - 2, ns);
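
The order of checks in the new has_pid_permissions() can be modelled in user space. The following is only a sketch: `model_has_pid_permissions`, `in_gid_group` and `may_ptrace` are hypothetical names, with the two booleans standing in for in_group_p(fs_info->pid_gid) and ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS). The enum values are copied from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from enum proc_hidepid in include/linux/proc_fs.h. */
enum proc_hidepid {
	HIDEPID_OFF = 0,
	HIDEPID_NO_ACCESS = 1,
	HIDEPID_INVISIBLE = 2,
	HIDEPID_NOT_PTRACEABLE = 4,
};

/*
 * Hypothetical user-space model of has_pid_permissions().
 * in_gid_group replaces in_group_p(fs_info->pid_gid);
 * may_ptrace replaces ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS).
 */
static bool model_has_pid_permissions(enum proc_hidepid hide_pid,
				      enum proc_hidepid hide_pid_min,
				      bool in_gid_group, bool may_ptrace)
{
	/* ptraceable mode: the ptrace check decides, before anything else. */
	if (hide_pid == HIDEPID_NOT_PTRACEABLE)
		return may_ptrace;

	/* The mount is less strict than what this caller asks about. */
	if (hide_pid < hide_pid_min)
		return true;

	/* Members of the gid= group are exempt. */
	if (in_gid_group)
		return true;

	return may_ptrace;
}
```

Under this model, membership of the gid= group grants access for hidepid=noaccess and hidepid=invisible, but not for hidepid=ptraceable, because the ptrace check returns first.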

fs/proc/generic.c

···
 struct dentry *proc_lookup(struct inode *dir, struct dentry *dentry,
 		unsigned int flags)
 {
+	struct proc_fs_info *fs_info = proc_sb_info(dir->i_sb);
+
+	if (fs_info->pidonly == PROC_PIDONLY_ON)
+		return ERR_PTR(-ENOENT);
+
 	return proc_lookup_de(dir, dentry, PDE(dir));
 }

···
 int proc_readdir(struct file *file, struct dir_context *ctx)
 {
 	struct inode *inode = file_inode(file);
+	struct proc_fs_info *fs_info = proc_sb_info(inode->i_sb);
+
+	if (fs_info->pidonly == PROC_PIDONLY_ON)
+		return 1;

 	return proc_readdir_de(file, ctx, PDE(inode));
 }

+24 -6

fs/proc/inode.c

···
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 #include <linux/mount.h>
+#include <linux/bug.h>

 #include <linux/uaccess.h>

···
 		deactivate_super(old_sb);
 }

+static inline const char *hidepid2str(enum proc_hidepid v)
+{
+	switch (v) {
+	case HIDEPID_OFF: return "off";
+	case HIDEPID_NO_ACCESS: return "noaccess";
+	case HIDEPID_INVISIBLE: return "invisible";
+	case HIDEPID_NOT_PTRACEABLE: return "ptraceable";
+	}
+	WARN_ONCE(1, "bad hide_pid value: %d\n", v);
+	return "unknown";
+}
+
 static int proc_show_options(struct seq_file *seq, struct dentry *root)
 {
-	struct super_block *sb = root->d_sb;
-	struct pid_namespace *pid = sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb_info(root->d_sb);

-	if (!gid_eq(pid->pid_gid, GLOBAL_ROOT_GID))
-		seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, pid->pid_gid));
-	if (pid->hide_pid != HIDEPID_OFF)
-		seq_printf(seq, ",hidepid=%u", pid->hide_pid);
+	if (!gid_eq(fs_info->pid_gid, GLOBAL_ROOT_GID))
+		seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, fs_info->pid_gid));
+	if (fs_info->hide_pid != HIDEPID_OFF)
+		seq_printf(seq, ",hidepid=%s", hidepid2str(fs_info->hide_pid));
+	if (fs_info->pidonly != PROC_PIDONLY_OFF)
+		seq_printf(seq, ",subset=pid");

 	return 0;
 }
···

 static int proc_reg_open(struct inode *inode, struct file *file)
 {
+	struct proc_fs_info *fs_info = proc_sb_info(inode->i_sb);
 	struct proc_dir_entry *pde = PDE(inode);
 	int rv = 0;
 	typeof_member(struct proc_ops, proc_open) open;
···
 			rv = open(inode, file);
 		return rv;
 	}
+
+	if (fs_info->pidonly == PROC_PIDONLY_ON)
+		return -ENOENT;

 	/*
 	 * Ensure that

+101 -32

fs/proc/root.c

···
 struct proc_fs_context {
 	struct pid_namespace *pid_ns;
 	unsigned int mask;
-	int hidepid;
+	enum proc_hidepid hidepid;
 	int gid;
+	enum proc_pidonly pidonly;
 };

 enum proc_param {
 	Opt_gid,
 	Opt_hidepid,
+	Opt_subset,
 };

 static const struct fs_parameter_spec proc_fs_parameters[] = {
 	fsparam_u32("gid", Opt_gid),
-	fsparam_u32("hidepid", Opt_hidepid),
+	fsparam_string("hidepid", Opt_hidepid),
+	fsparam_string("subset", Opt_subset),
 	{}
 };
+
+static inline int valid_hidepid(unsigned int value)
+{
+	return (value == HIDEPID_OFF ||
+		value == HIDEPID_NO_ACCESS ||
+		value == HIDEPID_INVISIBLE ||
+		value == HIDEPID_NOT_PTRACEABLE);
+}
+
+static int proc_parse_hidepid_param(struct fs_context *fc, struct fs_parameter *param)
+{
+	struct proc_fs_context *ctx = fc->fs_private;
+	struct fs_parameter_spec hidepid_u32_spec = fsparam_u32("hidepid", Opt_hidepid);
+	struct fs_parse_result result;
+	int base = (unsigned long)hidepid_u32_spec.data;
+
+	if (param->type != fs_value_is_string)
+		return invalf(fc, "proc: unexpected type of hidepid value\n");
+
+	if (!kstrtouint(param->string, base, &result.uint_32)) {
+		if (!valid_hidepid(result.uint_32))
+			return invalf(fc, "proc: unknown value of hidepid - %s\n", param->string);
+		ctx->hidepid = result.uint_32;
+		return 0;
+	}
+
+	if (!strcmp(param->string, "off"))
+		ctx->hidepid = HIDEPID_OFF;
+	else if (!strcmp(param->string, "noaccess"))
+		ctx->hidepid = HIDEPID_NO_ACCESS;
+	else if (!strcmp(param->string, "invisible"))
+		ctx->hidepid = HIDEPID_INVISIBLE;
+	else if (!strcmp(param->string, "ptraceable"))
+		ctx->hidepid = HIDEPID_NOT_PTRACEABLE;
+	else
+		return invalf(fc, "proc: unknown value of hidepid - %s\n", param->string);
+
+	return 0;
+}
+
+static int proc_parse_subset_param(struct fs_context *fc, char *value)
+{
+	struct proc_fs_context *ctx = fc->fs_private;
+
+	while (value) {
+		char *ptr = strchr(value, ',');
+
+		if (ptr != NULL)
+			*ptr++ = '\0';
+
+		if (*value != '\0') {
+			if (!strcmp(value, "pid")) {
+				ctx->pidonly = PROC_PIDONLY_ON;
+			} else {
+				return invalf(fc, "proc: unsupported subset option - %s\n", value);
+			}
+		}
+		value = ptr;
+	}
+
+	return 0;
+}

 static int proc_parse_param(struct fs_context *fc, struct fs_parameter *param)
 {
···
 		break;

 	case Opt_hidepid:
-		ctx->hidepid = result.uint_32;
-		if (ctx->hidepid < HIDEPID_OFF ||
-		    ctx->hidepid > HIDEPID_INVISIBLE)
-			return invalfc(fc, "hidepid value must be between 0 and 2.\n");
+		if (proc_parse_hidepid_param(fc, param))
+			return -EINVAL;
+		break;
+
+	case Opt_subset:
+		if (proc_parse_subset_param(fc, param->string) < 0)
+			return -EINVAL;
 		break;

 	default:
···
 	return 0;
 }

-static void proc_apply_options(struct super_block *s,
+static void proc_apply_options(struct proc_fs_info *fs_info,
 			       struct fs_context *fc,
-			       struct pid_namespace *pid_ns,
 			       struct user_namespace *user_ns)
 {
 	struct proc_fs_context *ctx = fc->fs_private;

 	if (ctx->mask & (1 << Opt_gid))
-		pid_ns->pid_gid = make_kgid(user_ns, ctx->gid);
+		fs_info->pid_gid = make_kgid(user_ns, ctx->gid);
 	if (ctx->mask & (1 << Opt_hidepid))
-		pid_ns->hide_pid = ctx->hidepid;
+		fs_info->hide_pid = ctx->hidepid;
+	if (ctx->mask & (1 << Opt_subset))
+		fs_info->pidonly = ctx->pidonly;
 }

 static int proc_fill_super(struct super_block *s, struct fs_context *fc)
 {
-	struct pid_namespace *pid_ns = get_pid_ns(s->s_fs_info);
+	struct proc_fs_context *ctx = fc->fs_private;
 	struct inode *root_inode;
+	struct proc_fs_info *fs_info;
 	int ret;

-	proc_apply_options(s, fc, pid_ns, current_user_ns());
+	fs_info = kzalloc(sizeof(*fs_info), GFP_KERNEL);
+	if (!fs_info)
+		return -ENOMEM;
+
+	fs_info->pid_ns = get_pid_ns(ctx->pid_ns);
+	proc_apply_options(fs_info, fc, current_user_ns());

 	/* User space would break if executables or devices appear on proc */
 	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
···
 	s->s_magic = PROC_SUPER_MAGIC;
 	s->s_op = &proc_sops;
 	s->s_time_gran = 1;
+	s->s_fs_info = fs_info;

 	/*
 	 * procfs isn't actually a stacking filesystem; however, there is
···
 	 * top of it
 	 */
 	s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
-	
+
 	/* procfs dentries and inodes don't require IO to create */
 	s->s_shrink.seeks = 0;

···
 static int proc_reconfigure(struct fs_context *fc)
 {
 	struct super_block *sb = fc->root->d_sb;
-	struct pid_namespace *pid = sb->s_fs_info;
+	struct proc_fs_info *fs_info = proc_sb_info(sb);

 	sync_filesystem(sb);

-	proc_apply_options(sb, fc, pid, current_user_ns());
+	proc_apply_options(fs_info, fc, current_user_ns());
 	return 0;
 }

 static int proc_get_tree(struct fs_context *fc)
 {
-	struct proc_fs_context *ctx = fc->fs_private;
-
-	return get_tree_keyed(fc, proc_fill_super, ctx->pid_ns);
+	return get_tree_nodev(fc, proc_fill_super);
 }

 static void proc_fs_context_free(struct fs_context *fc)
···

 static void proc_kill_sb(struct super_block *sb)
 {
-	struct pid_namespace *ns;
+	struct proc_fs_info *fs_info = proc_sb_info(sb);

-	ns = (struct pid_namespace *)sb->s_fs_info;
-	if (ns->proc_self)
-		dput(ns->proc_self);
-	if (ns->proc_thread_self)
-		dput(ns->proc_thread_self);
+	if (fs_info->proc_self)
+		dput(fs_info->proc_self);
+
+	if (fs_info->proc_thread_self)
+		dput(fs_info->proc_thread_self);
+
 	kill_anon_super(sb);
-
-	/* Make the pid namespace safe for the next mount of proc */
-	ns->proc_self = NULL;
-	ns->proc_thread_self = NULL;
-	ns->pid_gid = GLOBAL_ROOT_GID;
-	ns->hide_pid = 0;
-
-	put_pid_ns(ns);
+	put_pid_ns(fs_info->pid_ns);
+	kfree(fs_info);
 }

 static struct file_system_type proc_fs_type = {
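
The two accepted spellings of hidepid= (number or name) can be illustrated with a small user-space sketch. This is not the kernel code: `parse_hidepid` is a hypothetical helper, strtoul() with base 10 stands in for kstrtouint(), and only the enum values and the accepted names are taken from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Values copied from enum proc_hidepid in the diff. */
enum proc_hidepid {
	HIDEPID_OFF = 0,
	HIDEPID_NO_ACCESS = 1,
	HIDEPID_INVISIBLE = 2,
	HIDEPID_NOT_PTRACEABLE = 4,
};

/*
 * Hypothetical user-space counterpart of proc_parse_hidepid_param().
 * Returns 0 and stores the mode in *out, or -EINVAL for unknown input.
 */
static int parse_hidepid(const char *s, enum proc_hidepid *out)
{
	char *end;
	unsigned long v;

	assert(s != NULL && out != NULL);

	/* Numeric spelling: only tried when the string starts with a digit. */
	if (s[0] >= '0' && s[0] <= '9') {
		errno = 0;
		v = strtoul(s, &end, 10);
		if (errno != 0 || *end != '\0')
			return -EINVAL;
		/* Same set of values as valid_hidepid(): 3 is rejected. */
		if (v != HIDEPID_OFF && v != HIDEPID_NO_ACCESS &&
		    v != HIDEPID_INVISIBLE && v != HIDEPID_NOT_PTRACEABLE)
			return -EINVAL;
		*out = (enum proc_hidepid)v;
		return 0;
	}

	/* Named spelling, matching the strings printed by hidepid2str(). */
	if (!strcmp(s, "off"))
		*out = HIDEPID_OFF;
	else if (!strcmp(s, "noaccess"))
		*out = HIDEPID_NO_ACCESS;
	else if (!strcmp(s, "invisible"))
		*out = HIDEPID_INVISIBLE;
	else if (!strcmp(s, "ptraceable"))
		*out = HIDEPID_NOT_PTRACEABLE;
	else
		return -EINVAL;

	return 0;
}
```

Because the valid values are 0, 1, 2 and 4, the old range check (between 0 and 2) could not simply be widened; the sketch, like valid_hidepid(), tests membership in the set instead.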

+4 -4

fs/proc/self.c

···
 				      struct inode *inode,
 				      struct delayed_call *done)
 {
-	struct pid_namespace *ns = proc_pid_ns(inode);
+	struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
 	pid_t tgid = task_tgid_nr_ns(current, ns);
 	char *name;

···
 int proc_setup_self(struct super_block *s)
 {
 	struct inode *root_inode = d_inode(s->s_root);
-	struct pid_namespace *ns = proc_pid_ns(root_inode);
+	struct proc_fs_info *fs_info = proc_sb_info(s);
 	struct dentry *self;
 	int ret = -ENOMEM;
-	
+
 	inode_lock(root_inode);
 	self = d_alloc_name(s->s_root, "self");
 	if (self) {
···
 	if (ret)
 		pr_err("proc_fill_super: can't allocate /proc/self\n");
 	else
-		ns->proc_self = self;
+		fs_info->proc_self = self;

 	return ret;
 }

+4 -4

fs/proc/thread_self.c

···
 					     struct inode *inode,
 					     struct delayed_call *done)
 {
-	struct pid_namespace *ns = proc_pid_ns(inode);
+	struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
 	pid_t tgid = task_tgid_nr_ns(current, ns);
 	pid_t pid = task_pid_nr_ns(current, ns);
 	char *name;
···
 int proc_setup_thread_self(struct super_block *s)
 {
 	struct inode *root_inode = d_inode(s->s_root);
-	struct pid_namespace *ns = proc_pid_ns(root_inode);
+	struct proc_fs_info *fs_info = proc_sb_info(s);
 	struct dentry *thread_self;
 	int ret = -ENOMEM;

···
 	inode_unlock(root_inode);

 	if (ret)
-		pr_err("proc_fill_super: can't allocate /proc/thread_self\n");
+		pr_err("proc_fill_super: can't allocate /proc/thread-self\n");
 	else
-		ns->proc_thread_self = thread_self;
+		fs_info->proc_thread_self = thread_self;

 	return ret;
 }

+7 -7

fs/proc_namespace.c

···
 	return res;
 }

-struct proc_fs_info {
+struct proc_fs_opts {
 	int flag;
 	const char *str;
 };

 static int show_sb_opts(struct seq_file *m, struct super_block *sb)
 {
-	static const struct proc_fs_info fs_info[] = {
+	static const struct proc_fs_opts fs_opts[] = {
 		{ SB_SYNCHRONOUS, ",sync" },
 		{ SB_DIRSYNC, ",dirsync" },
 		{ SB_MANDLOCK, ",mand" },
 		{ SB_LAZYTIME, ",lazytime" },
 		{ 0, NULL }
 	};
-	const struct proc_fs_info *fs_infop;
+	const struct proc_fs_opts *fs_infop;

-	for (fs_infop = fs_info; fs_infop->flag; fs_infop++) {
+	for (fs_infop = fs_opts; fs_infop->flag; fs_infop++) {
 		if (sb->s_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
 	}
···

 static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 {
-	static const struct proc_fs_info mnt_info[] = {
+	static const struct proc_fs_opts mnt_opts[] = {
 		{ MNT_NOSUID, ",nosuid" },
 		{ MNT_NODEV, ",nodev" },
 		{ MNT_NOEXEC, ",noexec" },
···
 		{ MNT_RELATIME, ",relatime" },
 		{ 0, NULL }
 	};
-	const struct proc_fs_info *fs_infop;
+	const struct proc_fs_opts *fs_infop;

-	for (fs_infop = mnt_info; fs_infop->flag; fs_infop++) {
+	for (fs_infop = mnt_opts; fs_infop->flag; fs_infop++) {
 		if (mnt->mnt_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
 	}

include/linux/pid.h

···
 extern void detach_pid(struct task_struct *task, enum pid_type);
 extern void change_pid(struct task_struct *task, enum pid_type,
 			struct pid *pid);
+extern void exchange_tids(struct task_struct *task, struct task_struct *old);
 extern void transfer_pid(struct task_struct *old, struct task_struct *new,
 			 enum pid_type);


-12

include/linux/pid_namespace.h

···

 struct fs_pin;

-enum { /* definitions for pid_namespace's hide_pid field */
-	HIDEPID_OFF = 0,
-	HIDEPID_NO_ACCESS = 1,
-	HIDEPID_INVISIBLE = 2,
-};
-
 struct pid_namespace {
 	struct kref kref;
 	struct idr idr;
···
 	struct kmem_cache *pid_cachep;
 	unsigned int level;
 	struct pid_namespace *parent;
-#ifdef CONFIG_PROC_FS
-	struct dentry *proc_self;
-	struct dentry *proc_thread_self;
-#endif
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct fs_pin *bacct;
 #endif
 	struct user_namespace *user_ns;
 	struct ucounts *ucounts;
-	kgid_t pid_gid;
-	int hide_pid;
 	int reboot; /* group exit code if this pidns was rebooted */
 	struct ns_common ns;
 } __randomize_layout;

+30 -2

include/linux/proc_fs.h

···
 	unsigned long (*proc_get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
 } __randomize_layout;

+/* definitions for hide_pid field */
+enum proc_hidepid {
+	HIDEPID_OFF = 0,
+	HIDEPID_NO_ACCESS = 1,
+	HIDEPID_INVISIBLE = 2,
+	HIDEPID_NOT_PTRACEABLE = 4, /* Limit pids to only ptraceable pids */
+};
+
+/* definitions for proc mount option pidonly */
+enum proc_pidonly {
+	PROC_PIDONLY_OFF = 0,
+	PROC_PIDONLY_ON = 1,
+};
+
+struct proc_fs_info {
+	struct pid_namespace *pid_ns;
+	struct dentry *proc_self; /* For /proc/self */
+	struct dentry *proc_thread_self; /* For /proc/thread-self */
+	kgid_t pid_gid;
+	enum proc_hidepid hide_pid;
+	enum proc_pidonly pidonly;
+};
+
+static inline struct proc_fs_info *proc_sb_info(struct super_block *sb)
+{
+	return sb->s_fs_info;
+}
+
 #ifdef CONFIG_PROC_FS

 typedef int (*proc_write_t)(struct file *, char *, size_t);
···
 			struct ns_common *(*get_ns)(struct ns_common *ns));

 /* get the associated pid namespace for a file in procfs */
-static inline struct pid_namespace *proc_pid_ns(const struct inode *inode)
+static inline struct pid_namespace *proc_pid_ns(struct super_block *sb)
 {
-	return inode->i_sb->s_fs_info;
+	return proc_sb_info(sb)->pid_ns;
 }

 bool proc_ns_file(const struct file *file);

+21

include/linux/rculist.h

···
     WRITE_ONCE(old->pprev, LIST_POISON2);
 }

+/**
+ * hlists_swap_heads_rcu - swap the lists the hlist heads point to
+ * @left: The hlist head on the left
+ * @right: The hlist head on the right
+ *
+ * The lists start out as [@left ][node1 ... ] and
+                          [@right ][node2 ... ]
+ * The lists end up as [@left ][node2 ... ]
+ *                     [@right ][node1 ... ]
+ */
+static inline void hlists_swap_heads_rcu(struct hlist_head *left, struct hlist_head *right)
+{
+    struct hlist_node *node1 = left->first;
+    struct hlist_node *node2 = right->first;
+
+    rcu_assign_pointer(left->first, node2);
+    rcu_assign_pointer(right->first, node1);
+    WRITE_ONCE(node2->pprev, &left->first);
+    WRITE_ONCE(node1->pprev, &right->first);
+}
+
 /*
  * return the first or the next element in an RCU protected hlist
  */
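The head swap in the rculist hunk above may be easier to follow outside the kernel. The following is a minimal userspace sketch, not the kernel code: the struct definitions are stand-ins, `hlist_push` and `hlists_swap_heads` are hypothetical names, and plain stores replace `rcu_assign_pointer` and `WRITE_ONCE`, so it is only meaningful without concurrent readers.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's hlist types (assumed layout). */
struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev; /* address of the pointer that points at this node */
};

struct hlist_head {
    struct hlist_node *first;
};

/* Hypothetical helper: push a node onto the front of a list. */
static void hlist_push(struct hlist_head *head, struct hlist_node *node)
{
    node->next = head->first;
    if (head->first)
        head->first->pprev = &node->next;
    head->first = node;
    node->pprev = &head->first;
}

/*
 * Plain-C analogue of the swap: each head takes over the other's first
 * node, and each first node's back pointer is retargeted at its new head.
 * No RCU publication happens here.
 */
static void hlists_swap_heads(struct hlist_head *left, struct hlist_head *right)
{
    struct hlist_node *node1 = left->first;
    struct hlist_node *node2 = right->first;

    /* Like the hunk above, this dereferences both first nodes, so both lists must be non-empty. */
    assert(node1 != NULL && node2 != NULL);

    left->first = node2;
    right->first = node1;
    node2->pprev = &left->first;
    node1->pprev = &right->first;
}
```

Only the two heads and the two first nodes are touched; the rest of each chain is carried along unchanged, which is why the cost does not depend on list length.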

-11

include/linux/sched/signal.h

···
     return p->exit_signal >= 0;
 }

-/* Do to the insanities of de_thread it is possible for a process
- * to have the pid of the thread group leader without actually being
- * the thread group leader. For iteration through the pids in proc
- * all we care about is that we have a task with the appropriate
- * pid, we don't actually care if we have the right task.
- */
-static inline bool has_group_leader_pid(struct task_struct *p)
-{
-    return task_pid(p) == task_tgid(p);
-}
-
 static inline
 bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
 {

+1 -1

kernel/fork.c

···
     pid_t nr = -1;

     if (likely(pid_has_task(pid, PIDTYPE_PID))) {
-        ns = proc_pid_ns(file_inode(m->file));
+        ns = proc_pid_ns(file_inode(m->file)->i_sb);
         nr = pid_nr_ns(pid, ns);
     }


+20 -2

kernel/pid.c

···
     attach_pid(task, type);
 }

+void exchange_tids(struct task_struct *left, struct task_struct *right)
+{
+    struct pid *pid1 = left->thread_pid;
+    struct pid *pid2 = right->thread_pid;
+    struct hlist_head *head1 = &pid1->tasks[PIDTYPE_PID];
+    struct hlist_head *head2 = &pid2->tasks[PIDTYPE_PID];
+
+    /* Swap the single entry tid lists */
+    hlists_swap_heads_rcu(head1, head2);
+
+    /* Swap the per task_struct pid */
+    rcu_assign_pointer(left->thread_pid, pid2);
+    rcu_assign_pointer(right->thread_pid, pid1);
+
+    /* Swap the cached value */
+    WRITE_ONCE(left->pid, pid_nr(pid2));
+    WRITE_ONCE(right->pid, pid_nr(pid1));
+}
+
 /* transfer_pid is an optimization of attach_pid(new), detach_pid(old) */
 void transfer_pid(struct task_struct *old, struct task_struct *new,
                enum pid_type type)
···
     rcu_read_lock();
     if (!ns)
         ns = task_active_pid_ns(current);
-    if (likely(pid_alive(task)))
-        nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
+    nr = pid_nr_ns(rcu_dereference(*task_pid_ptr(task, type)), ns);
     rcu_read_unlock();

     return nr;

+49 -66

kernel/time/posix-cpu-timers.c

···
 /*
  * Functions for validating access to tasks.
  */
-static struct task_struct *lookup_task(const pid_t pid, bool thread,
-                       bool gettime)
+static struct pid *pid_for_clock(const clockid_t clock, bool gettime)
 {
-    struct task_struct *p;
+    const bool thread = !!CPUCLOCK_PERTHREAD(clock);
+    const pid_t upid = CPUCLOCK_PID(clock);
+    struct pid *pid;
+
+    if (CPUCLOCK_WHICH(clock) >= CPUCLOCK_MAX)
+        return NULL;

     /*
      * If the encoded PID is 0, then the timer is targeted at current
      * or the process to which current belongs.
      */
+    if (upid == 0)
+        return thread ? task_pid(current) : task_tgid(current);
+
+    pid = find_vpid(upid);
     if (!pid)
-        return thread ? current : current->group_leader;
+        return NULL;

-    p = find_task_by_vpid(pid);
-    if (!p)
-        return p;
-
-    if (thread)
-        return same_thread_group(p, current) ? p : NULL;
-
-    if (gettime) {
-        /*
-         * For clock_gettime(PROCESS) the task does not need to be
-         * the actual group leader. tsk->sighand gives
-         * access to the group's clock.
-         *
-         * Timers need the group leader because they take a
-         * reference on it and store the task pointer until the
-         * timer is destroyed.
-         */
-        return (p == current || thread_group_leader(p)) ? p : NULL;
+    if (thread) {
+        struct task_struct *tsk = pid_task(pid, PIDTYPE_PID);
+        return (tsk && same_thread_group(tsk, current)) ? pid : NULL;
     }

     /*
-     * For processes require that p is group leader.
+     * For clock_gettime(PROCESS) allow finding the process by
+     * with the pid of the current task. The code needs the tgid
+     * of the process so that pid_task(pid, PIDTYPE_TGID) can be
+     * used to find the process.
      */
-    return has_group_leader_pid(p) ? p : NULL;
-}
+    if (gettime && (pid == task_pid(current)))
+        return task_tgid(current);

-static struct task_struct *__get_task_for_clock(const clockid_t clock,
-                        bool getref, bool gettime)
-{
-    const bool thread = !!CPUCLOCK_PERTHREAD(clock);
-    const pid_t pid = CPUCLOCK_PID(clock);
-    struct task_struct *p;
-
-    if (CPUCLOCK_WHICH(clock) >= CPUCLOCK_MAX)
-        return NULL;
-
-    rcu_read_lock();
-    p = lookup_task(pid, thread, gettime);
-    if (p && getref)
-        get_task_struct(p);
-    rcu_read_unlock();
-    return p;
-}
-
-static inline struct task_struct *get_task_for_clock(const clockid_t clock)
-{
-    return __get_task_for_clock(clock, true, false);
-}
-
-static inline struct task_struct *get_task_for_clock_get(const clockid_t clock)
-{
-    return __get_task_for_clock(clock, true, true);
+    /*
+     * For processes require that pid identifies a process.
+     */
+    return pid_has_task(pid, PIDTYPE_TGID) ? pid : NULL;
 }

 static inline int validate_clock_permissions(const clockid_t clock)
 {
-    return __get_task_for_clock(clock, false, false) ? 0 : -EINVAL;
+    int ret;
+
+    rcu_read_lock();
+    ret = pid_for_clock(clock, false) ? 0 : -EINVAL;
+    rcu_read_unlock();
+
+    return ret;
 }

-static inline enum pid_type cpu_timer_pid_type(struct k_itimer *timer)
+static inline enum pid_type clock_pid_type(const clockid_t clock)
 {
-    return CPUCLOCK_PERTHREAD(timer->it_clock) ? PIDTYPE_PID : PIDTYPE_TGID;
+    return CPUCLOCK_PERTHREAD(clock) ? PIDTYPE_PID : PIDTYPE_TGID;
 }

 static inline struct task_struct *cpu_timer_task_rcu(struct k_itimer *timer)
 {
-    return pid_task(timer->it.cpu.pid, cpu_timer_pid_type(timer));
+    return pid_task(timer->it.cpu.pid, clock_pid_type(timer->it_clock));
 }

 /*
···
     struct task_struct *tsk;
     u64 t;

-    tsk = get_task_for_clock_get(clock);
-    if (!tsk)
+    rcu_read_lock();
+    tsk = pid_task(pid_for_clock(clock, true), clock_pid_type(clock));
+    if (!tsk) {
+        rcu_read_unlock();
         return -EINVAL;
+    }

     if (CPUCLOCK_PERTHREAD(clock))
         t = cpu_clock_sample(clkid, tsk);
     else
         t = cpu_clock_sample_group(clkid, tsk, false);
-    put_task_struct(tsk);
+    rcu_read_unlock();

     *tp = ns_to_timespec64(t);
     return 0;
···
  */
 static int posix_cpu_timer_create(struct k_itimer *new_timer)
 {
-    struct task_struct *p = get_task_for_clock(new_timer->it_clock);
+    struct pid *pid;

-    if (!p)
+    rcu_read_lock();
+    pid = pid_for_clock(new_timer->it_clock, false);
+    if (!pid) {
+        rcu_read_unlock();
         return -EINVAL;
+    }

     new_timer->kclock = &clock_posix_cpu;
     timerqueue_init(&new_timer->it.cpu.node);
-    new_timer->it.cpu.pid = get_task_pid(p, cpu_timer_pid_type(new_timer));
-    /*
-     * get_task_for_clock() took a reference on @p. Drop it as the timer
-     * holds a reference on the pid of @p.
-     */
-    put_task_struct(p);
+    new_timer->it.cpu.pid = get_pid(pid);
+    rcu_read_unlock();
     return 0;
 }


+1 -1

net/ipv6/ip6_flowlabel.c

···
 {
     struct ip6fl_iter_state *state = ip6fl_seq_private(seq);

-    state->pid_ns = proc_pid_ns(file_inode(seq->file));
+    state->pid_ns = proc_pid_ns(file_inode(seq->file)->i_sb);

     rcu_read_lock_bh();
     return *pos ? ip6fl_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;

+3 -1

security/tomoyo/realpath.c

···

 #include "common.h"
 #include <linux/magic.h>
+#include <linux/proc_fs.h>

 /**
  * tomoyo_encode2 - Encode binary string to ascii string.
···
     if (sb->s_magic == PROC_SUPER_MAGIC && *pos == '/') {
         char *ep;
         const pid_t pid = (pid_t) simple_strtoul(pos + 1, &ep, 10);
+        struct pid_namespace *proc_pidns = proc_pid_ns(sb);

         if (*ep == '/' && pid && pid ==
-            task_tgid_nr_ns(current, sb->s_fs_info)) {
+            task_tgid_nr_ns(current, proc_pidns)) {
             pos = ep - 5;
             if (pos < buffer)
                 goto out;

tools/testing/selftests/proc/.gitignore

···
 /fd-001-lookup
 /fd-002-posix-eq
 /fd-003-kthread
+/proc-fsconfig-hidepid
 /proc-loadavg-001
+/proc-multiple-procfs
 /proc-pid-vm
 /proc-self-map-files-001
 /proc-self-map-files-002

tools/testing/selftests/proc/Makefile

···
 TEST_GEN_PROGS += setns-dcache
 TEST_GEN_PROGS += setns-sysvipc
 TEST_GEN_PROGS += thread-self
+TEST_GEN_PROGS += proc-multiple-procfs
+TEST_GEN_PROGS += proc-fsconfig-hidepid

 include ../lib.mk

+50

tools/testing/selftests/proc/proc-fsconfig-hidepid.c

···
+/*
+ * Copyright © 2020 Alexey Gladkov <gladkov.alexey@gmail.com>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+#include <assert.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <linux/mount.h>
+#include <linux/unistd.h>
+
+static inline int fsopen(const char *fsname, unsigned int flags)
+{
+    return syscall(__NR_fsopen, fsname, flags);
+}
+
+static inline int fsconfig(int fd, unsigned int cmd, const char *key, const void *val, int aux)
+{
+    return syscall(__NR_fsconfig, fd, cmd, key, val, aux);
+}
+
+int main(void)
+{
+    int fsfd, ret;
+    int hidepid = 2;
+
+    assert((fsfd = fsopen("proc", 0)) != -1);
+
+    ret = fsconfig(fsfd, FSCONFIG_SET_BINARY, "hidepid", &hidepid, 0);
+    assert(ret == -1);
+    assert(errno == EINVAL);
+
+    assert(!fsconfig(fsfd, FSCONFIG_SET_STRING, "hidepid", "2", 0));
+    assert(!fsconfig(fsfd, FSCONFIG_SET_STRING, "hidepid", "invisible", 0));
+
+    assert(!close(fsfd));
+
+    return 0;
+}
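The selftest above passes hidepid both as the numeric string "2" and as the name "invisible". A rough userspace sketch of how such a string could be mapped onto the enum proc_hidepid values from include/linux/proc_fs.h follows. This is not the kernel's parser: `sketch_parse_hidepid` is a hypothetical name, and every spelling other than "2" and "invisible" (the only two the test confirms) is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Values as defined by enum proc_hidepid in the proc_fs.h hunk. */
enum proc_hidepid {
    HIDEPID_OFF = 0,
    HIDEPID_NO_ACCESS = 1,
    HIDEPID_INVISIBLE = 2,
    HIDEPID_NOT_PTRACEABLE = 4,
};

/*
 * Hypothetical parser: accepts either the decimal value or a readable name.
 * Returns 0 and stores the mode on success, -1 for an unknown string.
 */
static int sketch_parse_hidepid(const char *value, enum proc_hidepid *out)
{
    static const struct {
        const char *name;   /* assumed spelling, except "invisible" */
        const char *number; /* decimal form of the enum value */
        enum proc_hidepid mode;
    } table[] = {
        { "off",        "0", HIDEPID_OFF },
        { "noaccess",   "1", HIDEPID_NO_ACCESS },
        { "invisible",  "2", HIDEPID_INVISIBLE },
        { "ptraceable", "4", HIDEPID_NOT_PTRACEABLE },
    };
    size_t i;

    assert(value != NULL && out != NULL);
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (!strcmp(value, table[i].name) || !strcmp(value, table[i].number)) {
            *out = table[i].mode;
            return 0;
        }
    }
    return -1;
}
```

Matching against a fixed table means a value such as "3", which is not one of the enum constants, is rejected in this sketch rather than being silently accepted.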

+48

tools/testing/selftests/proc/proc-multiple-procfs.c

···
+/*
+ * Copyright © 2020 Alexey Gladkov <gladkov.alexey@gmail.com>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+#include <assert.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <sys/mount.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+int main(void)
+{
+    struct stat proc_st1, proc_st2;
+    char procbuff[] = "/tmp/proc.XXXXXX/meminfo";
+    char procdir1[] = "/tmp/proc.XXXXXX";
+    char procdir2[] = "/tmp/proc.XXXXXX";
+
+    assert(mkdtemp(procdir1) != NULL);
+    assert(mkdtemp(procdir2) != NULL);
+
+    assert(!mount("proc", procdir1, "proc", 0, "hidepid=1"));
+    assert(!mount("proc", procdir2, "proc", 0, "hidepid=2"));
+
+    snprintf(procbuff, sizeof(procbuff), "%s/meminfo", procdir1);
+    assert(!stat(procbuff, &proc_st1));
+
+    snprintf(procbuff, sizeof(procbuff), "%s/meminfo", procdir2);
+    assert(!stat(procbuff, &proc_st2));
+
+    umount(procdir1);
+    umount(procdir2);
+
+    assert(proc_st1.st_dev != proc_st2.st_dev);
+
+    return 0;
+}