Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched_ext: Hook up hardlockup detector

A poorly behaving BPF scheduler can trigger a hard lockup. For example, on a
large system with many tasks pinned to different subsets of CPUs, if the BPF
scheduler puts all tasks in a single DSQ and directs all CPUs at it, the DSQ
lock can be contended to the point where the hardlockup detector triggers.
Unfortunately, a hard lockup can be the first sign of such a situation, so
sched_ext needs to handle it.

Hook scx_hardlockup() into the hardlockup detector to try kicking out the
current scheduler in an attempt to recover the system to a good state. This
can delay the watchdog's own action by one polling period; however, given
that the only remediation for a hard lockup is a crash, this is likely an
acceptable trade-off.
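
The "delay by one polling period" behavior can be illustrated with a small
userspace model. This is only a sketch, not kernel code: bpf_sched_enabled,
model_scx_hardlockup(), model_watchdog_poll() and enum wd_action are
hypothetical stand-ins, and it assumes (per the comment added to
kernel/watchdog.c) that the handler returns true at most once while a BPF
scheduler is loaded.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for "a BPF scheduler is currently loaded". */
static bool bpf_sched_enabled = true;

/*
 * Hypothetical model of scx_hardlockup(): if a BPF scheduler is loaded,
 * eject it and report that the lockup was handled. Once the scheduler is
 * gone, later calls return false.
 */
static bool model_scx_hardlockup(int cpu)
{
	if (!bpf_sched_enabled)
		return false;

	bpf_sched_enabled = false;
	fprintf(stderr,
		"sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
		cpu);
	return true;
}

/* What the modelled watchdog does on one poll of a locked-up CPU. */
enum wd_action { WD_DEFERRED, WD_REPORT };

static enum wd_action model_watchdog_poll(int cpu)
{
	/* Give sched_ext one chance to recover before the usual handling. */
	if (model_scx_hardlockup(cpu))
		return WD_DEFERRED;

	return WD_REPORT;
}
```

In this model, the first poll that sees the lockup returns WD_DEFERRED and
ejects the scheduler. If the CPU is still stuck at the next poll, the handler
returns false and the watchdog falls through to its normal reporting path.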

v2: Add missing dummy scx_hardlockup() definition for
!CONFIG_SCHED_CLASS_EXT (kernel test bot).

Reported-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Cc: Emil Tsalapatis <etsal@meta.com>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Tejun Heo 582f700e 7ed8df0d

+29
+2
include/linux/sched/ext.h
···
 void sched_ext_dead(struct task_struct *p);
 void print_scx_info(const char *log_lvl, struct task_struct *p);
 void scx_softlockup(u32 dur_s);
+bool scx_hardlockup(void);
 bool scx_rcu_cpu_stall(void);
 
 #else /* !CONFIG_SCHED_CLASS_EXT */
···
 static inline void sched_ext_dead(struct task_struct *p) {}
 static inline void print_scx_info(const char *log_lvl, struct task_struct *p) {}
 static inline void scx_softlockup(u32 dur_s) {}
+static inline bool scx_hardlockup(void) { return false; }
 static inline bool scx_rcu_cpu_stall(void) { return false; }
 
 #endif /* CONFIG_SCHED_CLASS_EXT */
+18
kernel/sched/ext.c
···
 }
 
 /**
+ * scx_hardlockup - sched_ext hardlockup handler
+ *
+ * A poorly behaving BPF scheduler can trigger hard lockup by e.g. putting
+ * numerous affinitized tasks in a single queue and directing all CPUs at it.
+ * Try kicking out the current scheduler in an attempt to recover the system to
+ * a good state before taking more drastic actions.
+ */
+bool scx_hardlockup(void)
+{
+	if (!handle_lockup("hard lockup - CPU %d", smp_processor_id()))
+		return false;
+
+	printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
+			smp_processor_id());
+	return true;
+}
+
+/**
  * scx_bypass - [Un]bypass scx_ops and guarantee forward progress
  * @bypass: true for bypass, false for unbypass
  *
+9
kernel/watchdog.c
···
 #ifdef CONFIG_SYSFS
 		++hardlockup_count;
 #endif
+		/*
+		 * A poorly behaving BPF scheduler can trigger hard lockup by
+		 * e.g. putting numerous affinitized tasks in a single queue and
+		 * directing all CPUs at it. The following call can return true
+		 * only once when sched_ext is enabled and will immediately
+		 * abort the BPF scheduler and print out a warning message.
+		 */
+		if (scx_hardlockup())
+			return;
 
 		/* Only print hardlockups once. */
 		if (per_cpu(watchdog_hardlockup_warned, cpu))