Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

hung_task: panic when there are more than N hung tasks at the same time

The hung_task_panic sysctl is currently a blunt instrument: it's all or
nothing.

Panicking on a single hung task can be an overreaction to a transient
glitch. A more reliable indicator of a systemic problem is when
multiple tasks hang simultaneously.

Extend hung_task_panic to accept an integer threshold, allowing the
kernel to panic only when N hung tasks are detected in a single scan.
This provides finer control to distinguish between isolated incidents
and system-wide failures.

The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic on the first hung task (unchanged)
- N > 1: Panic after N hung tasks are detected in a single scan

The original behavior is preserved for values 0 and 1, maintaining full
backward compatibility.
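
The rule for the three cases above can be modeled in user space as a single predicate. This is only a sketch: the function name hung_task_should_panic and its parameters are hypothetical and are not part of the kernel patch.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the threshold rule; the name and
 * parameters are illustrative, not kernel API.
 *
 * threshold:    value of hung_task_panic (0 disables the panic)
 * hung_in_scan: hung tasks found so far in the current scan
 */
static bool hung_task_should_panic(unsigned int threshold,
				   unsigned long hung_in_scan)
{
	/* 0 never panics; otherwise panic once the count reaches threshold. */
	return threshold != 0 && hung_in_scan >= threshold;
}
```

With threshold 1 the predicate is true on the first hung task, which matches the old boolean behavior.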

[lance.yang@linux.dev: new changelog]
Link: https://lkml.kernel.org/r/20251015063615.2632-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Lance Yang <lance.yang@linux.dev>
Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au> [aspeed_g5_defconfig]
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Li RongQing and committed by Andrew Morton
9544f9e6 05d6f1cc

+36 -23
+13 -7
Documentation/admin-guide/kernel-parameters.txt
···
 			the added memory block itself do not be affected.
 
 	hung_task_panic=
-			[KNL] Should the hung task detector generate panics.
-			Format: 0 | 1
+			[KNL] Number of hung tasks to trigger kernel panic.
+			Format: <int>
 
-			A value of 1 instructs the kernel to panic when a
-			hung task is detected. The default value is controlled
-			by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
-			option. The value selected by this boot parameter can
-			be changed later by the kernel.hung_task_panic sysctl.
+			When set to a non-zero value, a kernel panic will be triggered if
+			the number of detected hung tasks reaches this value.
+
+			0: don't panic
+			1: panic immediately on first hung task
+			N: panic after N hung tasks are detected in a single scan
+
+			The default value is controlled by the
+			CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time option. The value
+			selected by this boot parameter can be changed later by the
+			kernel.hung_task_panic sysctl.
 
 	hvc_iucv=	[S390] Number of z/VM IUCV hypervisor console (HVC)
 			terminal devices. Valid values: 0..8
+5 -4
Documentation/admin-guide/sysctl/kernel.rst
···
 hung_task_panic
 ===============
 
-Controls the kernel's behavior when a hung task is detected.
+When set to a non-zero value, a kernel panic will be triggered if the
+number of hung tasks found during a single scan reaches this value.
 This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 
-= =================================================
+= =======================================================
 0 Continue operation. This is the default behavior.
-1 Panic immediately.
-= =================================================
+N Panic when N hung tasks are found during a single scan.
+= =======================================================
 
 
 hung_task_check_count
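
As a rough illustration of setting the new threshold at runtime, the sketch below writes an integer to a sysctl file. The helper name write_hung_task_panic is hypothetical. On a real system the path would be /proc/sys/kernel/hung_task_panic, which requires root and CONFIG_DETECT_HUNG_TASK. The path is a parameter here so the helper can be tried against an ordinary file.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a panic threshold to a sysctl file.
 * For the real knob, pass "/proc/sys/kernel/hung_task_panic".
 * Returns 0 on success, -1 on failure.
 */
static int write_hung_task_panic(const char *path, int threshold)
{
	FILE *f;
	int ok;

	/* The patched sysctl accepts 0..INT_MAX, so reject negatives early. */
	if (threshold < 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", threshold) > 0;

	/* fclose() flushes the buffer, so a failed write can surface here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

For example, write_hung_task_panic("/proc/sys/kernel/hung_task_panic", 3) would ask for a panic once three hung tasks are found in a single scan.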
+1 -1
arch/arm/configs/aspeed_g5_defconfig
···
 CONFIG_PANIC_TIMEOUT=-1
 CONFIG_SOFTLOCKUP_DETECTOR=y
 CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
-CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
+CONFIG_BOOTPARAM_HUNG_TASK_PANIC=1
 CONFIG_WQ_WATCHDOG=y
 # CONFIG_SCHED_DEBUG is not set
 CONFIG_FUNCTION_TRACER=y
+1 -1
kernel/configs/debug.config
···
 #
 # Debug Oops, Lockups and Hangs
 #
-# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
+CONFIG_BOOTPARAM_HUNG_TASK_PANIC=0
 # CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
 CONFIG_DEBUG_ATOMIC_SLEEP=y
 CONFIG_DETECT_HUNG_TASK=y
+10 -5
kernel/hung_task.c
···
  * hung task is detected:
  */
 static unsigned int __read_mostly sysctl_hung_task_panic =
-	IS_ENABLED(CONFIG_BOOTPARAM_HUNG_TASK_PANIC);
+	CONFIG_BOOTPARAM_HUNG_TASK_PANIC;
 
 static int
 hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
···
 }
 #endif
 
-static void check_hung_task(struct task_struct *t, unsigned long timeout)
+static void check_hung_task(struct task_struct *t, unsigned long timeout,
+			    unsigned long prev_detect_count)
 {
+	unsigned long total_hung_task;
+
 	if (!task_is_hung(t, timeout))
 		return;
 
···
 	 */
 	sysctl_hung_task_detect_count++;
 
+	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
 	trace_sched_process_hang(t);
 
-	if (sysctl_hung_task_panic) {
+	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
 		console_verbose();
 		hung_task_show_lock = true;
 		hung_task_call_panic = true;
···
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
+	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
 
 	/*
 	 * If the system crashed already then all bets are off,
···
 			last_break = jiffies;
 		}
 
-		check_hung_task(t, timeout);
+		check_hung_task(t, timeout, prev_detect_count);
 	}
 unlock:
 	rcu_read_unlock();
···
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE,
+		.extra2		= SYSCTL_INT_MAX,
 	},
 	{
 		.procname	= "hung_task_check_count",
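
The per-scan count in this file is derived from a cumulative counter by taking a snapshot before the scan and subtracting it afterwards. The user-space sketch below models that idea under simplifying assumptions: detect_count stands in for sysctl_hung_task_detect_count, and scan_tasks and its bool array of task states are hypothetical stand-ins for the kernel's task walk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cumulative counter, standing in for sysctl_hung_task_detect_count. */
static unsigned long detect_count;

/*
 * Hypothetical model of one scan: hung[i] says whether task i is hung.
 * Returns true if the scan would request a panic.
 */
static bool scan_tasks(const bool *hung, size_t n, unsigned int threshold)
{
	/* Snapshot the since-boot counter before the scan starts. */
	unsigned long prev_detect_count = detect_count;
	bool call_panic = false;

	for (size_t i = 0; i < n; i++) {
		unsigned long total_hung_task;

		if (!hung[i])
			continue;

		detect_count++;

		/* Only hangs found since the snapshot count toward the threshold. */
		total_hung_task = detect_count - prev_detect_count;
		if (threshold && total_hung_task >= threshold)
			call_panic = true;
	}

	return call_panic;
}
```

Because the snapshot is retaken at the start of every scan, hangs counted in earlier scans do not accumulate toward the threshold, even though detect_count itself keeps growing.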
+5 -4
lib/Kconfig.debug
···
 	  Keeping the default should be fine in most cases.
 
 config BOOTPARAM_HUNG_TASK_PANIC
-	bool "Panic (Reboot) On Hung Tasks"
+	int "Number of hung tasks to trigger kernel panic"
 	depends on DETECT_HUNG_TASK
+	default 0
 	help
-	  Say Y here to enable the kernel to panic on "hung tasks",
-	  which are bugs that cause the kernel to leave a task stuck
-	  in uninterruptible "D" state.
+	  When set to a non-zero value, a kernel panic will be triggered
+	  if the number of hung tasks found during a single scan reaches
+	  this value.
 
 	  The panic can be used in combination with panic_timeout,
 	  to cause the system to reboot automatically after a
+1 -1
tools/testing/selftests/wireguard/qemu/kernel.config
···
 CONFIG_DETECT_HUNG_TASK=y
 CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
 CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
-CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y
+CONFIG_BOOTPARAM_HUNG_TASK_PANIC=1
 CONFIG_PANIC_TIMEOUT=-1
 CONFIG_STACKTRACE=y
 CONFIG_EARLY_PRINTK=y