Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

reboot: reboot, not shutdown, on hw_protection_reboot timeout

hw_protection_shutdown() kicks off an orderly shutdown, and if that
takes longer than a configurable amount of time, an emergency shutdown
occurs.
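
The timing rule above can be modelled in userspace as a minimal sketch. It assumes a simulated clock in place of the kernel's delayed work, and simulate() and enum outcome are hypothetical names used only for illustration. A non-positive timeout disarms the fallback, mirroring the "delay <= 0, return" guard in kernel/reboot.c.

```c
#include <assert.h>

/* Hypothetical outcome labels for this model; not kernel identifiers. */
enum outcome { ORDERLY_DONE, EMERGENCY_FALLBACK, HUNG_NO_FALLBACK };

/*
 * orderly_ms: time the orderly shutdown would need; -1 models an orderly
 *             path that never completes.
 * timeout_ms: configured grace period; <= 0 means no fallback is armed,
 *             mirroring the "delay <= 0 -> return" guard in kernel/reboot.c.
 */
static enum outcome simulate(int orderly_ms, int timeout_ms)
{
	int fallback_armed = timeout_ms > 0;

	assert(orderly_ms >= -1);

	/* The orderly path wins if it finishes before the fallback fires. */
	if (orderly_ms >= 0 && (!fallback_armed || orderly_ms < timeout_ms))
		return ORDERLY_DONE;

	/* Otherwise the delayed work runs once the grace period expires. */
	if (fallback_armed)
		return EMERGENCY_FALLBACK;

	/* No completion and no fallback: the model just reports a hang. */
	return HUNG_NO_FALLBACK;
}
```

For example, simulate(800, 500) yields EMERGENCY_FALLBACK, while simulate(800, 0) yields ORDERLY_DONE because no fallback was armed.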

Recently, hw_protection_reboot() was added for systems that don't
implement a proper shutdown and are better served by rebooting and
letting the boot firmware handle the critical condition.

If the orderly reboot started by hw_protection_reboot() timed out, the
system would shut down instead of rebooting. This is not a good idea, as
a shutdown was explicitly not requested.

Fix this by always doing an emergency reboot if hw_protection_reboot() is
called and the orderly reboot takes too long.
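
The behavioural change can be sketched as a small userspace model. It assumes the fallback remembers the requested action at the moment it is armed, as the diff does with its static hw_failure_emergency_action variable. The enum values, enum forced_step, arm_fallback() and the fallback_step_*() helpers are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>

/*
 * Local stand-in for the kernel's enum; only the two members used in the
 * diff are modelled, and their numeric values here are arbitrary.
 */
enum hw_protection_action { HWPROT_ACT_SHUTDOWN, HWPROT_ACT_REBOOT };

/* Hypothetical labels for the first forced step the fallback attempts. */
enum forced_step { STEP_POWER_OFF, STEP_RESTART };

/* Action remembered when the fallback is armed (models the static variable). */
static enum hw_protection_action armed_action = HWPROT_ACT_SHUTDOWN;

static void arm_fallback(enum hw_protection_action action)
{
	armed_action = action;
}

/* Before the fix: the timeout path powered off regardless of the request. */
static enum forced_step fallback_step_before(void)
{
	return STEP_POWER_OFF;
}

/* After the fix: the timeout path follows the action that was requested. */
static enum forced_step fallback_step_after(void)
{
	assert(armed_action == HWPROT_ACT_SHUTDOWN ||
	       armed_action == HWPROT_ACT_REBOOT);
	return armed_action == HWPROT_ACT_REBOOT ? STEP_RESTART : STEP_POWER_OFF;
}
```

In both versions the real work function still calls emergency_restart() as a last resort if the forced step returns.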

Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-2-e1c09b090c0c@pengutronix.de
Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>
Cc: Benson Leung <bleung@chromium.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Fabio Estevam <festevam@denx.de>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matteo Croce <teknoraver@meta.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Ahmad Fatoum and committed by Andrew Morton
bbf0ec4f 318f05a0

+49 -21
kernel/reboot.c
···
 }
 EXPORT_SYMBOL_GPL(orderly_reboot);
 
+static const char *hw_protection_action_str(enum hw_protection_action action)
+{
+	switch (action) {
+	case HWPROT_ACT_SHUTDOWN:
+		return "shutdown";
+	case HWPROT_ACT_REBOOT:
+		return "reboot";
+	default:
+		return "undefined";
+	}
+}
+
+static enum hw_protection_action hw_failure_emergency_action;
+
 /**
- * hw_failure_emergency_poweroff_func - emergency poweroff work after a known delay
- * @work: work_struct associated with the emergency poweroff function
+ * hw_failure_emergency_action_func - emergency action work after a known delay
+ * @work: work_struct associated with the emergency action function
  *
  * This function is called in very critical situations to force
- * a kernel poweroff after a configurable timeout value.
+ * a kernel poweroff or reboot after a configurable timeout value.
  */
-static void hw_failure_emergency_poweroff_func(struct work_struct *work)
+static void hw_failure_emergency_action_func(struct work_struct *work)
 {
+	const char *action_str = hw_protection_action_str(hw_failure_emergency_action);
+
+	pr_emerg("Hardware protection timed-out. Trying forced %s\n",
+		 action_str);
+
 	/*
-	 * We have reached here after the emergency shutdown waiting period has
-	 * expired. This means orderly_poweroff has not been able to shut off
-	 * the system for some reason.
+	 * We have reached here after the emergency action waiting period has
+	 * expired. This means orderly_poweroff/reboot has not been able to
+	 * shut off the system for some reason.
 	 *
-	 * Try to shut down the system immediately using kernel_power_off
-	 * if populated
+	 * Try to shut off the system immediately if possible
 	 */
-	pr_emerg("Hardware protection timed-out. Trying forced poweroff\n");
-	kernel_power_off();
+
+	if (hw_failure_emergency_action == HWPROT_ACT_REBOOT)
+		kernel_restart(NULL);
+	else
+		kernel_power_off();
 
 	/*
 	 * Worst of the worst case trigger emergency restart
 	 */
-	pr_emerg("Hardware protection shutdown failed. Trying emergency restart\n");
+	pr_emerg("Hardware protection %s failed. Trying emergency restart\n",
+		 action_str);
 	emergency_restart();
 }
 
-static DECLARE_DELAYED_WORK(hw_failure_emergency_poweroff_work,
-			    hw_failure_emergency_poweroff_func);
+static DECLARE_DELAYED_WORK(hw_failure_emergency_action_work,
+			    hw_failure_emergency_action_func);
 
 /**
- * hw_failure_emergency_poweroff - Trigger an emergency system poweroff
+ * hw_failure_emergency_schedule - Schedule an emergency system shutdown or reboot
+ *
+ * @action: The hardware protection action to be taken
+ * @action_delay_ms: Time in milliseconds to elapse before triggering action
  *
  * This may be called from any critical situation to trigger a system shutdown
- * after a given period of time. If time is negative this is not scheduled.
+ * or reboot after a given period of time.
+ * If time is negative this is not scheduled.
  */
-static void hw_failure_emergency_poweroff(int poweroff_delay_ms)
+static void hw_failure_emergency_schedule(enum hw_protection_action action,
+					  int action_delay_ms)
 {
-	if (poweroff_delay_ms <= 0)
+	if (action_delay_ms <= 0)
 		return;
-	schedule_delayed_work(&hw_failure_emergency_poweroff_work,
-			      msecs_to_jiffies(poweroff_delay_ms));
+	hw_failure_emergency_action = action;
+	schedule_delayed_work(&hw_failure_emergency_action_work,
+			      msecs_to_jiffies(action_delay_ms));
 }
 
 /**
···
 	 * Queue a backup emergency shutdown in the event of
 	 * orderly_poweroff failure
 	 */
-	hw_failure_emergency_poweroff(ms_until_forced);
+	hw_failure_emergency_schedule(action, ms_until_forced);
 	if (action == HWPROT_ACT_REBOOT)
 		orderly_reboot();
 	else
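
The scheduling guard and log-message selection in this diff can be exercised outside the kernel with a minimal userspace sketch. The workqueue is replaced by plain variables (pending_action, work_queued, queued_delay_ms), and model_schedule() and format_timeout_msg() are hypothetical names; action_str() copies the mapping of hw_protection_action_str(), and the enum is a local stand-in with arbitrary values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for the kernel enum; numeric values are arbitrary. */
enum hw_protection_action { HWPROT_ACT_SHUTDOWN, HWPROT_ACT_REBOOT };

/* Same mapping as hw_protection_action_str() in the diff. */
static const char *action_str(enum hw_protection_action action)
{
	switch (action) {
	case HWPROT_ACT_SHUTDOWN:
		return "shutdown";
	case HWPROT_ACT_REBOOT:
		return "reboot";
	default:
		return "undefined";
	}
}

/* Hypothetical stand-ins for the static variable and the delayed work. */
static enum hw_protection_action pending_action = HWPROT_ACT_SHUTDOWN;
static bool work_queued;
static int queued_delay_ms;

/* Model of hw_failure_emergency_schedule(): nothing happens for delay <= 0. */
static void model_schedule(enum hw_protection_action action, int delay_ms)
{
	if (delay_ms <= 0)
		return;
	pending_action = action;	/* recorded before the work is "queued" */
	queued_delay_ms = delay_ms;
	work_queued = true;
}

/* Formats the first message the timed-out work would log. */
static void format_timeout_msg(char *buf, size_t len)
{
	snprintf(buf, len, "Hardware protection timed-out. Trying forced %s\n",
		 action_str(pending_action));
}
```

Storing the action before queueing the work, as hw_failure_emergency_schedule() does, ensures the value is in place by the time the delayed work reads it; with a delay of zero or less, neither the action nor the work is touched.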