sched/scs: Reset task stack state in bringup_cpu()

To hot unplug a CPU, the idle task on that CPU calls a few layers of C
code before finally leaving the kernel. When KASAN is in use, poisoned
shadow is left around for each of the active stack frames, and when
shadow call stacks are in use. When shadow call stacks (SCS) are in use
the task's saved SCS SP is left pointing at an arbitrary point within
the task's shadow call stack.

When a CPU is offlined than onlined back into the kernel, this stale
state can adversely affect execution. Stale KASAN shadow can alias new
stackframes and result in bogus KASAN warnings. A stale SCS SP is
effectively a memory leak, and prevents a portion of the shadow call
stack being used. Across a number of hotplug cycles the idle task's
entire shadow call stack can become unusable.

We previously fixed the KASAN issue in commit:

e1b77c92981a5222 ("sched/kasan: remove stale KASAN poison after hotplug")

... by removing any stale KASAN stack poison immediately prior to
onlining a CPU.

Subsequently in commit:

f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")

... the refactoring left the KASAN and SCS cleanup in one-time idle
thread initialization code rather than something invoked prior to each
CPU being onlined, breaking both as above.

We fixed SCS (but not KASAN) in commit:

63acd42c0d4942f7 ("sched/scs: Reset the shadow stack when idle_task_exit")

... but as this runs in the context of the idle task being offlined it's
potentially fragile.

To fix these consistently and more robustly, reset the SCS SP and KASAN
shadow of a CPU's idle task immediately before we online that CPU in
bringup_cpu(). This ensures the idle task always has a consistent state
when it is running, and removes the need to so so when exiting an idle
task.

Whenever any thread is created, dup_task_struct() will give the task a
stack which is free of KASAN shadow, and initialize the task's SCS SP,
so there's no need to specially initialize either for idle thread within
init_idle(), as this was only necessary to handle hotplug cycles.

I've tested this on arm64 with:

* gcc 11.1.0, defconfig +KASAN_INLINE, KASAN_STACK
* clang 12.0.0, defconfig +KASAN_INLINE, KASAN_STACK, SHADOW_CALL_STACK

... offlining and onlining CPUS with:

| while true; do
| for C in /sys/devices/system/cpu/cpu*/online; do
| echo 0 > $C;
| echo 1 > $C;
| done
| done

Fixes: f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/

authored by Mark Rutland and committed by Peter Zijlstra dce1ca05 13605725

+7 -4
+7
kernel/cpu.c
··· 31 31 #include <linux/smpboot.h> 32 32 #include <linux/relay.h> 33 33 #include <linux/slab.h> 34 + #include <linux/scs.h> 34 35 #include <linux/percpu-rwsem.h> 35 36 #include <linux/cpuset.h> 36 37 ··· 587 586 { 588 587 struct task_struct *idle = idle_thread_get(cpu); 589 588 int ret; 589 + 590 + /* 591 + * Reset stale stack state from the last time this CPU was online. 592 + */ 593 + scs_task_reset(idle); 594 + kasan_unpoison_task_stack(idle); 590 595 591 596 /* 592 597 * Some architectures have to walk the irq descriptors to
-4
kernel/sched/core.c
··· 8619 8619 idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY; 8620 8620 kthread_set_per_cpu(idle, cpu); 8621 8621 8622 - scs_task_reset(idle); 8623 - kasan_unpoison_task_stack(idle); 8624 - 8625 8622 #ifdef CONFIG_SMP 8626 8623 /* 8627 8624 * It's possible that init_idle() gets called multiple times on a task, ··· 8774 8777 finish_arch_post_lock_switch(); 8775 8778 } 8776 8779 8777 - scs_task_reset(current); 8778 8780 /* finish_cpu(), as ran on the BP, will clean up the active_mm state */ 8779 8781 } 8780 8782