Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

rcu: Restrict access to RCU CPU stall notifiers

The RCU CPU stall notifiers can be useful for dumping state when
tracking down delicate forward-progress bugs where NUMA effects cause
cache lines to be delivered to a given CPU regularly, but always in a
state that prevents that CPU from making forward progress. These bugs can
be detected by the RCU CPU stall-warning mechanism, but in some cases,
the stall-warning printk()s disrupt the forward-progress bug before
any useful state can be obtained.

Unfortunately, the notifier mechanism added by commit 5b404fdabacf ("rcu:
Add RCU CPU stall notifier") can make matters worse if used at all
carelessly. For example, if the stall warning was caused by a lock not
being released, then any attempt to acquire that lock in the notifier
will hang. This will not only prevent the notifier from producing any
useful output, but will also prevent the stall-warning message from
ever appearing.
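
As a rough userspace illustration of the hazard described above (this is
not kernel code), the sketch below models a notifier that tries the lock
once and falls back to a lockless read rather than blocking. All demo_*
names are hypothetical, and a C11 atomic_flag merely stands in for whatever
lock the stalled CPU failed to release.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock that a stalled CPU may never release. */
struct demo_lock {
	atomic_flag held;
};

/* State that a notifier might want to dump. */
struct demo_state {
	struct demo_lock lock;
	atomic_long counter;
};

enum demo_result {
	DEMO_DUMPED_LOCKED,	/* snapshot taken while holding the lock */
	DEMO_DUMPED_LOCKLESS,	/* lock unavailable, snapshot taken without it */
};

void demo_state_init(struct demo_state *s, long value)
{
	atomic_flag_clear(&s->lock.held);
	atomic_init(&s->counter, value);
}

/* Returns true if the lock was acquired; never waits. */
bool demo_trylock(struct demo_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void demo_unlock(struct demo_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * A careful notifier: it attempts the lock exactly once and otherwise
 * reads the counter locklessly, so it always returns to its caller.
 */
enum demo_result demo_careful_notifier(struct demo_state *s, long *snapshot)
{
	assert(s && snapshot);

	if (demo_trylock(&s->lock)) {
		*snapshot = atomic_load(&s->counter);
		demo_unlock(&s->lock);
		return DEMO_DUMPED_LOCKED;
	}
	/* Lock is held, possibly forever: settle for a possibly stale read. */
	*snapshot = atomic_load_explicit(&s->counter, memory_order_relaxed);
	return DEMO_DUMPED_LOCKLESS;
}
```

A notifier that instead waited unconditionally for the lock would never
return in the second case, which in this model corresponds to the stall
warning never being printed.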

This commit therefore hides this new RCU CPU stall notifier
mechanism under a new RCU_CPU_STALL_NOTIFIER Kconfig option that
depends on both DEBUG_KERNEL and RCU_EXPERT. In addition, the
rcupdate.rcu_cpu_stall_notifiers=1 kernel boot parameter must be
specified. The RCU_CPU_STALL_NOTIFIER Kconfig option's help text
warns of the dangers of careless use and recommends lockless notifier
code. Finally, a WARN() is triggered each time that an attempt is
made to register a stall-warning notifier in kernels built with
CONFIG_RCU_CPU_STALL_NOTIFIER=y.
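
The two-level gate described above can be modeled in userspace roughly as
follows. This is only a sketch: the demo_* names and the
DEMO_CONFIG_RCU_CPU_STALL_NOTIFIER macro are hypothetical stand-ins for the
Kconfig option, the boot parameter, and WARN(), and a successful
registration is assumed to return 0.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for CONFIG_RCU_CPU_STALL_NOTIFIER=y. */
#ifndef DEMO_CONFIG_RCU_CPU_STALL_NOTIFIER
#define DEMO_CONFIG_RCU_CPU_STALL_NOTIFIER 1
#endif

struct demo_gate {
	int boot_param;		/* models rcupdate.rcu_cpu_stall_notifiers */
	int warn_count;		/* models the number of WARN() splats */
	int registered;		/* models entries on the notifier chain */
};

/* Models registration: refuse unless both gates are open. */
int demo_register(struct demo_gate *g)
{
	assert(g);
#if DEMO_CONFIG_RCU_CPU_STALL_NOTIFIER
	g->warn_count++;	/* warn on every attempt, successful or not */
	if (g->boot_param) {
		g->registered++;
		return 0;	/* assumed success value */
	}
	return -EEXIST;		/* option built in, but boot parameter not set */
#else
	return -EEXIST;		/* option not built in: stub behavior */
#endif
}
```

In this model, a caller that sees -EEXIST cannot tell which gate was
closed, and the warning counter increases on every attempt whenever the
option is built in.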

This combination of measures will keep use of this mechanism confined to
debug kernels and away from routine deployments.

[ paulmck: Apply Dan Carpenter feedback. ]

Fixes: 5b404fdabacf ("rcu: Add RCU CPU stall notifier")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>

Authored by Paul E. McKenney and committed by Neeraj Upadhyay (AMD)
4e58aaee 98b1cc82

+62 -12
+6
Documentation/admin-guide/kernel-parameters.txt
···
 			Dump ftrace buffer after reporting RCU CPU
 			stall warning.
 
+	rcupdate.rcu_cpu_stall_notifiers= [KNL]
+			Provide RCU CPU stall notifiers, but see the
+			warnings in the RCU_CPU_STALL_NOTIFIER Kconfig
+			option's help text. TL;DR: You almost certainly
+			do not want rcupdate.rcu_cpu_stall_notifiers.
+
 	rcupdate.rcu_cpu_stall_suppress= [KNL]
 			Suppress RCU CPU stall warning messages.
 
+3 -3
include/linux/rcu_notifier.h
···
 #define RCU_STALL_NOTIFY_NORM	1
 #define RCU_STALL_NOTIFY_EXP	2
 
-#ifdef CONFIG_RCU_STALL_COMMON
+#if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
 
 #include <linux/notifier.h>
 #include <linux/types.h>
···
 int rcu_stall_chain_notifier_register(struct notifier_block *n);
 int rcu_stall_chain_notifier_unregister(struct notifier_block *n);
 
-#else // #ifdef CONFIG_RCU_STALL_COMMON
+#else // #if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
 
 // No RCU CPU stall warnings in Tiny RCU.
 static inline int rcu_stall_chain_notifier_register(struct notifier_block *n) { return -EEXIST; }
 static inline int rcu_stall_chain_notifier_unregister(struct notifier_block *n) { return -ENOENT; }
 
-#endif // #else // #ifdef CONFIG_RCU_STALL_COMMON
+#endif // #else // #if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
 
 #endif /* __LINUX_RCU_NOTIFIER_H */
+25
kernel/rcu/Kconfig.debug
···
 	  The boot option rcupdate.rcu_cpu_stall_cputime has the same function
 	  as this one, but will override this if it exists.
 
+config RCU_CPU_STALL_NOTIFIER
+	bool "Provide RCU CPU-stall notifiers"
+	depends on RCU_STALL_COMMON
+	depends on DEBUG_KERNEL
+	depends on RCU_EXPERT
+	default n
+	help
+	  WARNING: You almost certainly do not want this!!!
+
+	  Enable RCU CPU-stall notifiers, which are invoked just before
+	  printing the RCU CPU stall warning. As such, bugs in notifier
+	  callbacks can prevent stall warnings from being printed.
+	  And the whole reason that a stall warning is being printed is
+	  that something is hung up somewhere. Therefore, the notifier
+	  callbacks must be written extremely carefully, preferably
+	  containing only lockless code. After all, it is quite possible
+	  that the whole reason that the RCU CPU stall is happening in
+	  the first place is that someone forgot to release whatever lock
+	  that you are thinking of acquiring. In which case, having your
+	  notifier callback acquire that lock will hang, preventing the
+	  RCU CPU stall warning from appearing.
+
+	  Say Y here if you want RCU CPU stall notifiers (you don't want them)
+	  Say N if you are unsure.
+
 config RCU_TRACE
 	bool "Enable tracing for RCU"
 	depends on DEBUG_KERNEL
+5 -3
kernel/rcu/rcu.h
···
 	return rcu_cpu_stall_suppress_at_boot && !rcu_inkernel_boot_has_ended();
 }
 
+extern int rcu_cpu_stall_notifiers;
+
 #ifdef CONFIG_RCU_STALL_COMMON
 
 extern int rcu_cpu_stall_ftrace_dump;
···
 bool rcu_cpu_beenfullyonline(int cpu);
 #endif
 
-#ifdef CONFIG_RCU_STALL_COMMON
+#if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
 int rcu_stall_notifier_call_chain(unsigned long val, void *v);
-#else // #ifdef CONFIG_RCU_STALL_COMMON
+#else // #if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
 static inline int rcu_stall_notifier_call_chain(unsigned long val, void *v) { return NOTIFY_DONE; }
-#endif // #else // #ifdef CONFIG_RCU_STALL_COMMON
+#endif // #else // #if defined(CONFIG_RCU_STALL_COMMON) && defined(CONFIG_RCU_CPU_STALL_NOTIFIER)
 
 #endif /* __LINUX_RCU_H */
+7 -5
kernel/rcu/rcutorture.c
···
 	unsigned long stop_at;
 
 	VERBOSE_TOROUT_STRING("rcu_torture_stall task started");
-	ret = rcu_stall_chain_notifier_register(&rcu_torture_stall_block);
-	if (ret)
-		pr_info("%s: rcu_stall_chain_notifier_register() returned %d, %sexpected.\n",
-			__func__, ret, !IS_ENABLED(CONFIG_RCU_STALL_COMMON) ? "un" : "");
+	if (rcu_cpu_stall_notifiers) {
+		ret = rcu_stall_chain_notifier_register(&rcu_torture_stall_block);
+		if (ret)
+			pr_info("%s: rcu_stall_chain_notifier_register() returned %d, %sexpected.\n",
+				__func__, ret, !IS_ENABLED(CONFIG_RCU_STALL_COMMON) ? "un" : "");
+	}
 	if (stall_cpu_holdoff > 0) {
 		VERBOSE_TOROUT_STRING("rcu_torture_stall begin holdoff");
 		schedule_timeout_interruptible(stall_cpu_holdoff * HZ);
···
 		cur_ops->readunlock(idx);
 	}
 	pr_alert("%s end.\n", __func__);
-	if (!ret) {
+	if (rcu_cpu_stall_notifiers && !ret) {
 		ret = rcu_stall_chain_notifier_unregister(&rcu_torture_stall_block);
 		if (ret)
 			pr_info("%s: rcu_stall_chain_notifier_unregister() returned %d.\n", __func__, ret);
+10 -1
kernel/rcu/tree_stall.h
···
 }
 early_initcall(rcu_sysrq_init);
 
+#ifdef CONFIG_RCU_CPU_STALL_NOTIFIER
 
 //////////////////////////////////////////////////////////////////////////////
 //
···
  */
 int rcu_stall_chain_notifier_register(struct notifier_block *n)
 {
-	return atomic_notifier_chain_register(&rcu_cpu_stall_notifier_list, n);
+	int rcsn = rcu_cpu_stall_notifiers;
+
+	WARN(1, "Adding %pS() to RCU stall notifier list (%s).\n", n->notifier_call,
+	     rcsn ? "possibly suppressing RCU CPU stall warnings" : "failed, so all is well");
+	if (rcsn)
+		return atomic_notifier_chain_register(&rcu_cpu_stall_notifier_list, n);
+	return -EEXIST;
 }
 EXPORT_SYMBOL_GPL(rcu_stall_chain_notifier_register);
 
···
 {
 	return atomic_notifier_call_chain(&rcu_cpu_stall_notifier_list, val, v);
 }
+
+#endif // #ifdef CONFIG_RCU_CPU_STALL_NOTIFIER
+6
kernel/rcu/update.c
···
 EXPORT_SYMBOL_GPL(torture_sched_setaffinity);
 #endif
 
+int rcu_cpu_stall_notifiers __read_mostly; // !0 = provide stall notifiers (rarely useful)
+EXPORT_SYMBOL_GPL(rcu_cpu_stall_notifiers);
+
 #ifdef CONFIG_RCU_STALL_COMMON
 int rcu_cpu_stall_ftrace_dump __read_mostly;
 module_param(rcu_cpu_stall_ftrace_dump, int, 0644);
+#ifdef CONFIG_RCU_CPU_STALL_NOTIFIER
+module_param(rcu_cpu_stall_notifiers, int, 0444);
+#endif // #ifdef CONFIG_RCU_CPU_STALL_NOTIFIER
 int rcu_cpu_stall_suppress __read_mostly; // !0 = suppress stall warnings.
 EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress);
 module_param(rcu_cpu_stall_suppress, int, 0644);