sched: fix TASK_WAKEKILL vs SIGKILL race

schedule() has a special "TASK_INTERRUPTIBLE && signal_pending()" case,
which allows us to do

current->state = TASK_INTERRUPTIBLE;
schedule();

without fear of sleeping with a pending signal.

However, the code like

current->state = TASK_KILLABLE;
schedule();

is not right: schedule() does not take TASK_WAKEKILL into account. This means
that mutex_lock_killable(), wait_for_completion_killable(), down_killable(),
and schedule_timeout_killable() can miss SIGKILL (and, incidentally, a second
SIGKILL has no effect).

Introduce a new helper, signal_pending_state(), and change schedule() to
use it. Hopefully it will gain more users; that is why the task's state is
passed in separately.

Note the "__TASK_STOPPED | __TASK_TRACED" check in signal_pending_state().
It is needed to preserve the current behaviour (ptrace_notify). I hope this
check can be removed soon, but that (afaics good) change needs separate
discussion.

The fast path is "(state & (INTERRUPTIBLE | WAKEKILL)) + signal_pending(p)",
basically the same as what schedule() does now. However, this patch does of
course bloat schedule() a little.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit 16882c1e (parent 39b945a3), authored by Oleg Nesterov, committed by Ingo Molnar

 include/linux/sched.h |   13 +++++++++++++
 kernel/sched.c        |    6 ++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2026,6 +2026,19 @@
 	return signal_pending(p) && __fatal_signal_pending(p);
 }
 
+static inline int signal_pending_state(long state, struct task_struct *p)
+{
+	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
+		return 0;
+	if (!signal_pending(p))
+		return 0;
+
+	if (state & (__TASK_STOPPED | __TASK_TRACED))
+		return 0;
+
+	return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
+}
+
 static inline int need_resched(void)
 {
 	return unlikely(test_thread_flag(TIF_NEED_RESCHED));
diff --git a/kernel/sched.c b/kernel/sched.c
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4159,12 +4159,10 @@
 	clear_tsk_need_resched(prev);
 
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
-		if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
-				signal_pending(prev))) {
+		if (unlikely(signal_pending_state(prev->state, prev)))
 			prev->state = TASK_RUNNING;
-		} else {
+		else
 			deactivate_task(rq, prev, 1);
-		}
 		switch_count = &prev->nvcsw;
 	}
 