Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

proc: Ensure we see the exit of each process tid exactly once

When the thread group leader changes during exec and the old leader's
thread is reaped, proc_flush_pid will flush the dentries for the entire
process because the leader still has its original pid.

Fix this by exchanging the pids in an rcu safe manner,
and wrapping the code to do that up in a helper exchange_tids.
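
As an illustration only, here is a minimal userspace sketch of the difference between the old takeover and a full exchange. The names model_pid, model_task, model_old_takeover and model_exchange_tids are hypothetical stand-ins, not kernel symbols, and the sketch deliberately leaves out the hlist swap and all RCU ordering that the real helper has to provide.

```c
#include <assert.h>

/* Hypothetical stand-in for struct pid: only the numeric id is modelled. */
struct model_pid {
	int nr;
};

/* Hypothetical stand-in for task_struct: the pid pointer and its cached number. */
struct model_task {
	struct model_pid *thread_pid;	/* models task->thread_pid */
	int pid;			/* models the cached task->pid */
};

/*
 * Old behaviour (assumed simplification): the new leader takes over the
 * leader's pid, but the old leader keeps it too, so two tasks report the
 * same tid until the old leader is released.
 */
static void model_old_takeover(struct model_task *tsk, struct model_task *leader)
{
	tsk->pid = leader->pid;
	tsk->thread_pid = leader->thread_pid;
}

/*
 * New behaviour: both the pointer and the cached number are swapped, so
 * every tid belongs to exactly one task at any time.
 */
static void model_exchange_tids(struct model_task *left, struct model_task *right)
{
	struct model_pid *pid1 = left->thread_pid;
	struct model_pid *pid2 = right->thread_pid;

	/* The cached numbers are expected to agree with the pid structs. */
	assert(left->pid == pid1->nr && right->pid == pid2->nr);

	left->thread_pid = pid2;
	right->thread_pid = pid1;

	left->pid = pid2->nr;
	right->pid = pid1->nr;
}
```

With a leader at tid 100 and an exec'ing thread at tid 101, model_old_takeover leaves both tasks at 100, while model_exchange_tids leaves the new leader at 100 and the old leader at 101.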

When I removed switch_exec_pids and introduced this behavior
in d73d65293e3e ("[PATCH] pidhash: kill switch_exec_pids") there
really was nothing that cared as flushing happened with
the cached dentry and de_thread flushed both of them on exec.

This lack of fully exchanging pids became a problem a few months later
when I introduced 48e6484d4902 ("[PATCH] proc: Rewrite the proc dentry
flush on exit optimization"), which overlooked that the de_thread case
was no longer swapping pids and that I was looking up proc dentries
by task->pid.
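
To make the effect of a lookup keyed on task->pid concrete, here is a small userspace sketch. The model_dcache type, MODEL_MAX_PID and model_flush_on_reap are hypothetical and only approximate the idea of "flush whatever is cached under the exiting task's pid number"; the real proc code is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_PID 8	/* hypothetical, tiny pid space for the sketch */

/* cached[n] is true while an entry for pid number n is cached. */
struct model_dcache {
	bool cached[MODEL_MAX_PID];
};

/*
 * Models a flush that is keyed purely on the pid number carried by the
 * task being reaped. Whichever number the task holds is the entry dropped.
 */
static void model_flush_on_reap(struct model_dcache *dc, int reaped_task_pid)
{
	assert(reaped_task_pid >= 0 && reaped_task_pid < MODEL_MAX_PID);
	dc->cached[reaped_task_pid] = false;
}
```

Suppose the process is cached under 3 and the exec'ing thread under 4. If the old leader is reaped while still holding 3, the entry for the live process is dropped. If the tids were exchanged first, the old leader holds 4 and only that entry goes away.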

The current behavior isn't properly a bug, as everything in proc will
continue to work correctly, just a little bit less efficiently. Fix
this so there are no little surprise corner cases waiting to bite
people.

-- Oleg points out this could be an issue in next_tgid in proc where
has_group_leader_pid is called, and reordering some of the assignments
should fix that.

-- Oleg points out this will break the 10 year old hack in __exit_signal:
> 	/*
> 	 * This can only happen if the caller is de_thread().
> 	 * FIXME: this is the temporary hack, we should teach
> 	 * posix-cpu-timers to handle this case correctly.
> 	 */
> 	if (unlikely(has_group_leader_pid(tsk)))
> 		posix_cpu_timers_exit_group(tsk);

The code in next_tgid has been changed to use PIDTYPE_TGID,
and the posix cpu timers code has been fixed so it does not
need the 10 year old hack, so this should be safe to merge
now.

Link: https://lore.kernel.org/lkml/87h7x3ajll.fsf_-_@x220.int.ebiederm.org/
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Fixes: 48e6484d4902 ("[PATCH] proc: Rewrite the proc dentry flush on exit optimization")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

+21 -4
+1 -4
fs/exec.c
@@ -1186,11 +1186,8 @@
 
 	/* Become a process group leader with the old leader's pid.
 	 * The old leader becomes a thread of the this thread group.
-	 * Note: The old leader also uses this pid until release_task
-	 *       is called.  Odd but simple and correct.
 	 */
-	tsk->pid = leader->pid;
-	change_pid(tsk, PIDTYPE_PID, task_pid(leader));
+	exchange_tids(tsk, leader);
 	transfer_pid(leader, tsk, PIDTYPE_TGID);
 	transfer_pid(leader, tsk, PIDTYPE_PGID);
 	transfer_pid(leader, tsk, PIDTYPE_SID);
+1
include/linux/pid.h
@@ -102,6 +102,7 @@
 extern void detach_pid(struct task_struct *task, enum pid_type);
 extern void change_pid(struct task_struct *task, enum pid_type,
 			struct pid *pid);
+extern void exchange_tids(struct task_struct *task, struct task_struct *old);
 extern void transfer_pid(struct task_struct *old, struct task_struct *new,
 			 enum pid_type);
 
+19
kernel/pid.c
@@ -363,6 +363,25 @@
 	attach_pid(task, type);
 }
 
+void exchange_tids(struct task_struct *left, struct task_struct *right)
+{
+	struct pid *pid1 = left->thread_pid;
+	struct pid *pid2 = right->thread_pid;
+	struct hlist_head *head1 = &pid1->tasks[PIDTYPE_PID];
+	struct hlist_head *head2 = &pid2->tasks[PIDTYPE_PID];
+
+	/* Swap the single entry tid lists */
+	hlists_swap_heads_rcu(head1, head2);
+
+	/* Swap the per task_struct pid */
+	rcu_assign_pointer(left->thread_pid, pid2);
+	rcu_assign_pointer(right->thread_pid, pid1);
+
+	/* Swap the cached value */
+	WRITE_ONCE(left->pid, pid_nr(pid2));
+	WRITE_ONCE(right->pid, pid_nr(pid1));
+}
+
 /* transfer_pid is an optimization of attach_pid(new), detach_pid(old) */
 void transfer_pid(struct task_struct *old, struct task_struct *new,
 			   enum pid_type type)