Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

exec: make de_thread() killable

Change de_thread() to use KILLABLE rather than UNINTERRUPTIBLE while
waiting for other threads. The only complication is that we should
clear ->group_exit_task and ->notify_count before we return, and we
should do this under tasklist_lock. -EAGAIN is used to match the
initial signal_group_exit() check/return; the exact value doesn't
really matter.
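
The killable wait described above can be sketched in userspace. This is only a model under stated assumptions, not the kernel code: `model_task`, `model_signal`, `model_wait_for_threads()`, `MODEL_STUCK` and the scripted `events` string are all hypothetical names invented for illustration, and the tasklist_lock locking is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_STUCK 1	/* hypothetical: the waiter would sleep forever */

/* Hypothetical stand-in for task_struct. */
struct model_task {
	bool fatal_signal_pending;	/* models __fatal_signal_pending(tsk) */
};

/* Hypothetical stand-in for the two signal_struct fields involved. */
struct model_signal {
	int notify_count;			/* sub-threads still to exit */
	struct model_task *group_exit_task;	/* the waiting exec'ing thread */
};

/*
 * Each character of `events` is one wakeup of the waiter:
 * 'x' = a sub-thread exited, 'k' = a fatal signal arrived,
 * anything else = a spurious wakeup.
 * Returns 0 when all sub-threads are gone, -EAGAIN if killed first,
 * MODEL_STUCK if the script ends while the waiter is still waiting.
 */
int model_wait_for_threads(struct model_signal *sig, struct model_task *tsk,
			   const char *events)
{
	assert(sig && tsk && events);
	sig->group_exit_task = tsk;

	while (sig->notify_count) {
		char ev = *events;	/* stands in for schedule() returning */

		if (ev == '\0')
			return MODEL_STUCK;	/* nothing will ever wake us */
		events++;
		if (ev == 'x')
			sig->notify_count--;
		else if (ev == 'k')
			tsk->fatal_signal_pending = true;

		/* The new check: a fatal signal ends the wait early. */
		if (tsk->fatal_signal_pending)
			goto killed;
	}
	return 0;	/* later stages of de_thread() are out of scope here */

killed:
	/* Undo the waiter's bookkeeping before bailing out. */
	sig->group_exit_task = NULL;
	sig->notify_count = 0;
	return -EAGAIN;
}
```

In this model, a script of "xx" with two sub-threads returns 0, while "xk" returns -EAGAIN with both fields cleared. An empty script returns MODEL_STUCK, which corresponds to the old UNINTERRUPTIBLE behaviour when nobody ever wakes the waiter.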

This fixes the (unlikely) race with coredump. de_thread() checks
signal_group_exit() before it starts to kill the subthreads, but this
can't help if another CLONE_VM (but non-CLONE_THREAD) task starts
coredumping after de_thread() unlocks ->siglock. In this case the
killed sub-thread can block in exit_mm() waiting for coredump_finish(),
the execing thread waits for that sub-thread, and the coredumping
thread waits for the execing thread. Deadlock.
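
The three-way wait above can be pictured as a cycle in a wait-for graph. The following is a small illustrative sketch, not kernel code: the enum names, the `waits_for` array and `model_deadlocked()` are hypothetical, and the model assumes each task waits on at most one other task.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three tasks in the commit message. */
enum { EXEC_THREAD, SUB_THREAD, DUMPER, NR_TASKS, NOBODY = -1 };

/*
 * waits_for[i] is the task that task i is blocked on, or NOBODY.
 * Returns true if following the chain from `start` leads back to `start`.
 */
bool model_deadlocked(const int waits_for[NR_TASKS], int start)
{
	assert(start >= 0 && start < NR_TASKS);
	int cur = start;

	/* A chain longer than NR_TASKS edges cannot reveal anything new. */
	for (int step = 0; step < NR_TASKS; step++) {
		cur = waits_for[cur];
		if (cur == NOBODY)
			return false;	/* chain ends: someone can make progress */
		if (cur == start)
			return true;	/* back where we started: a cycle */
	}
	return false;
}
```

With `waits_for = { SUB_THREAD, DUMPER, EXEC_THREAD }` the function reports a cycle from any starting task. Setting `waits_for[EXEC_THREAD] = NOBODY`, which models the execing thread giving up its wait once it sees a fatal signal, breaks the cycle.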

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Oleg Nesterov, committed by Linus Torvalds
d5bbd43d b5356a19

+14 -2
fs/exec.c
···
 		sig->notify_count--;

 	while (sig->notify_count) {
-		__set_current_state(TASK_UNINTERRUPTIBLE);
+		__set_current_state(TASK_KILLABLE);
 		spin_unlock_irq(lock);
 		schedule();
+		if (unlikely(__fatal_signal_pending(tsk)))
+			goto killed;
 		spin_lock_irq(lock);
 	}
 	spin_unlock_irq(lock);
···
 			write_lock_irq(&tasklist_lock);
 			if (likely(leader->exit_state))
 				break;
-			__set_current_state(TASK_UNINTERRUPTIBLE);
+			__set_current_state(TASK_KILLABLE);
 			write_unlock_irq(&tasklist_lock);
 			schedule();
+			if (unlikely(__fatal_signal_pending(tsk)))
+				goto killed;
 		}

 		/*
···

 	BUG_ON(!thread_group_leader(tsk));
 	return 0;
+
+killed:
+	/* protects against exit_notify() and __exit_signal() */
+	read_lock(&tasklist_lock);
+	sig->group_exit_task = NULL;
+	sig->notify_count = 0;
+	read_unlock(&tasklist_lock);
+	return -EAGAIN;
 }

 char *get_task_comm(char *buf, struct task_struct *tsk)