Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

kernel: add panic_on_warn

There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system. Sometimes this is easy to do, other times (such as
in the case of a remote admin) it is not trivial to send new images to
the user.

A much easier method would be a switch to change the WARN() over to a
panic. This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a panic_on_warn kernel parameter and a
/proc/sys/kernel/panic_on_warn sysctl. When either is set, panic() is
called in the warn_slowpath_common() path. The function still prints
the location of the warning.

An example of the panic_on_warn output:

The first line below is printed by WARN_ON() and shows where the
warning fired. The panic() output follows it.

WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
Kernel panic - not syncing: panic_on_warn set ...

CPU: 30 PID: 11698 Comm: insmod Tainted: G W OE 3.17.0+ #57
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
Call Trace:
[<ffffffff81665190>] dump_stack+0x46/0x58
[<ffffffff8165e2ec>] panic+0xd0/0x204
[<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
[<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
[<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
[<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
[<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
[<ffffffff81002144>] do_one_initcall+0xd4/0x210
[<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
[<ffffffff810f8889>] load_module+0x16a9/0x1b30
[<ffffffff810f3d30>] ? store_uevent+0x70/0x70
[<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
[<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
[<ffffffff8166cf29>] system_call_fastpath+0x12/0x17

Successfully tested by me.

hpa said: There is another very valid use for this: many operators would
rather a machine shuts down than being potentially compromised either
functionally or security-wise.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Prarit Bhargava and committed by Linus Torvalds
9e3961a0 f938612d

+61 -14
+7
Documentation/kdump/kdump.txt
···
 
 http://people.redhat.com/~anderson/
 
+Trigger Kdump on WARN()
+=======================
+
+The kernel parameter, panic_on_warn, calls panic() in all WARN() paths. This
+will cause a kdump to occur at the panic() call. In cases where a user wants
+to specify this during runtime, /proc/sys/kernel/panic_on_warn can be set to 1
+to achieve the same behaviour.
 
 Contact
 =======
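The runtime switch described in this documentation hunk amounts to writing 0 or 1 to the sysctl file. A minimal userspace sketch follows. The helper name set_panic_on_warn() is hypothetical and not part of the patch, and the path is passed in as a parameter so the helper can be tried against an ordinary file before pointing it at /proc/sys/kernel/panic_on_warn, which needs root.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): write 0 or 1 to a
 * panic_on_warn-style sysctl file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_panic_on_warn(const char *path, int enable)
{
	FILE *f;
	int ret = 0;

	/* The sysctl only accepts 0 or 1, so reject anything else early. */
	if (enable != 0 && enable != 1)
		return -EINVAL;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%d\n", enable) < 0)
		ret = -EIO;

	/* fclose() flushes the buffered write, so a failed write surfaces here. */
	if (fclose(f) != 0 && ret == 0)
		ret = -errno;

	return ret;
}
```

On a kernel carrying this patch, a call such as set_panic_on_warn("/proc/sys/kernel/panic_on_warn", 1) made as root should have the same effect as the kernel parameter, per the text above.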
+3
Documentation/kernel-parameters.txt
···
 			timeout < 0: reboot immediately
 			Format: <timeout>
 
+	panic_on_warn	panic() instead of WARN(). Useful to cause kdump
+			on a WARN().
+
 	crash_kexec_post_notifiers
 			Run kdump after running panic-notifiers and dumping
 			kmsg. This only for the users who doubt kdump always
+26 -14
Documentation/sysctl/kernel.txt
···
 - overflowuid
 - panic
 - panic_on_oops
-- panic_on_unrecovered_nmi
 - panic_on_stackoverflow
+- panic_on_unrecovered_nmi
+- panic_on_warn
 - pid_max
 - powersave-nap [ PPC only ]
 - printk
···
 
 ==============================================================
 
-panic_on_unrecovered_nmi:
-
-The default Linux behaviour on an NMI of either memory or unknown is
-to continue operation. For many environments such as scientific
-computing it is preferable that the box is taken out and the error
-dealt with than an uncorrected parity/ECC error get propagated.
-
-A small number of systems do generate NMI's for bizarre random reasons
-such as power management so the default is off. That sysctl works like
-the existing panic controls already in that directory.
-
-==============================================================
-
 panic_on_oops:
 
 Controls the kernel's behaviour when an oops or BUG is encountered.
···
 0: try to continue operation.
 
 1: panic immediately.
+
+==============================================================
+
+panic_on_unrecovered_nmi:
+
+The default Linux behaviour on an NMI of either memory or unknown is
+to continue operation. For many environments such as scientific
+computing it is preferable that the box is taken out and the error
+dealt with than an uncorrected parity/ECC error get propagated.
+
+A small number of systems do generate NMI's for bizarre random reasons
+such as power management so the default is off. That sysctl works like
+the existing panic controls already in that directory.
+
+==============================================================
+
+panic_on_warn:
+
+Calls panic() in the WARN() path when set to 1. This is useful to avoid
+a kernel rebuild when attempting to kdump at the location of a WARN().
+
+0: only WARN(), default behaviour.
+
+1: call panic() after printing out WARN() location.
 
 ==============================================================
 
+1
include/linux/kernel.h
···
 extern int panic_on_oops;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
+extern int panic_on_warn;
 extern int sysctl_panic_on_stackoverflow;
 /*
  * Only to be used by arch init code. If the user over-wrote the default
+1
include/uapi/linux/sysctl.h
···
 	KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
 	KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
 	KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
+	KERN_PANIC_ON_WARN=77, /* int: call panic() in WARN() functions */
 };
 
 
+13
kernel/panic.c
···
 static int pause_on_oops_flag;
 static DEFINE_SPINLOCK(pause_on_oops_lock);
 static bool crash_kexec_post_notifiers;
+int panic_on_warn __read_mostly;
 
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
···
 	if (args)
 		vprintk(args->fmt, args->args);
 
+	if (panic_on_warn) {
+		/*
+		 * This thread may hit another WARN() in the panic path.
+		 * Resetting this prevents additional WARN() from panicking the
+		 * system on this thread. Other threads are blocked by the
+		 * panic_mutex in panic().
+		 */
+		panic_on_warn = 0;
+		panic("panic_on_warn set ...\n");
+	}
+
 	print_modules();
 	dump_stack();
 	print_oops_end_marker();
···
 
 core_param(panic, panic_timeout, int, 0644);
 core_param(pause_on_oops, pause_on_oops, int, 0644);
+core_param(panic_on_warn, panic_on_warn, int, 0644);
 
 static int __init setup_crash_kexec_post_notifiers(char *s)
 {
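The comment in the warn path explains why the flag is cleared before panic() is called. The userspace model below is only a sketch of that ordering. model_warn(), model_panic() and the two counters are hypothetical stand-ins, and the stub panic returns (the real one does not) after raising one nested warning, so that the effect of the reset can be observed.

```c
#include <stdio.h>

/* Userspace stand-ins; none of these are the kernel's definitions. */
static int panic_on_warn;	/* models the new global flag */
static int panic_calls;		/* how many times the stub panic ran */
static int warn_calls;		/* how many warnings were reported */

static void model_warn(const char *where);

/*
 * Stub for panic(). It raises one nested warning to mimic a WARN()
 * being hit somewhere inside the panic path, then returns.
 */
static void model_panic(const char *msg)
{
	panic_calls++;
	printf("Kernel panic - not syncing: %s", msg);
	model_warn("nested warning in panic path");
}

/* Same order as the hunk: report the location first, then check the flag. */
static void model_warn(const char *where)
{
	warn_calls++;
	printf("WARNING: at %s\n", where);

	if (panic_on_warn) {
		/* Clear first so the nested warning cannot panic again. */
		panic_on_warn = 0;
		model_panic("panic_on_warn set ...\n");
	}
}
```

With the flag set, a single model_warn() call reports two warnings but enters the stub panic only once. If the reset line were dropped from this model, the nested warning would re-enter model_panic() and recurse without bound.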
+9
kernel/sysctl.c
···
 		.proc_handler	= proc_dointvec,
 	},
 #endif
+	{
+		.procname	= "panic_on_warn",
+		.data		= &panic_on_warn,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 	{ }
 };
 
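The table entry uses proc_dointvec_minmax with extra1 and extra2 pointing at zero and one, which limits the sysctl to the values 0 and 1. The following is a rough userspace model of that bound check, not the kernel's implementation. write_bounded_int() is a hypothetical name, and the input parsing here is simplified on purpose.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical model of a min/max-bounded integer sysctl write.
 * On success stores the parsed value in *data and returns 0.
 * On malformed or out-of-range input returns -EINVAL and leaves
 * *data untouched.
 */
static int write_bounded_int(const char *buf, int min, int max, int *data)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 10);
	if (end == buf || errno == ERANGE)
		return -EINVAL;		/* no digits, or overflow */
	if (*end != '\0' && *end != '\n')
		return -EINVAL;		/* trailing junk after the number */
	if (val < min || val > max)
		return -EINVAL;		/* outside [min, max] */

	*data = (int)val;
	return 0;
}
```

Called with min 0 and max 1, the model accepts "0" and "1" and rejects "2" or "-1" without changing the stored value, which is the behaviour the zero and one bounds are meant to give panic_on_warn.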
+1
kernel/sysctl_binary.c
···
 	{ CTL_INT, KERN_COMPAT_LOG, "compat-log" },
 	{ CTL_INT, KERN_MAX_LOCK_DEPTH, "max_lock_depth" },
 	{ CTL_INT, KERN_PANIC_ON_NMI, "panic_on_unrecovered_nmi" },
+	{ CTL_INT, KERN_PANIC_ON_WARN, "panic_on_warn" },
 	{}
 };
 