
kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count

Patch series "sysfs: add counters for lockups and stalls", v2.

Commits 9db89b411170 ("exit: Expose "oops_count" to sysfs") and
8b05aa263361 ("panic: Expose "warn_count" to sysfs") added counters for
oopses and warnings to sysfs, and these two patches do the same for
hard/soft lockups and RCU stalls.

All of these counters are useful for monitoring tools to detect whether
the machine is healthy. If the kernel has experienced a lockup or a
stall, that is probably due to a kernel bug, and I'd like to detect it
quickly and easily. There is currently no way to do so other than
parsing dmesg, or observing indirect effects such as certain tasks not
responding; but then I would need to observe all tasks, and it may take
a while until those effects become visible/measurable. I'd rather be
able to detect the primary cause more quickly, possibly before
everything falls apart.


This patch (of 2):

There is /proc/sys/kernel/hung_task_detect_count, /sys/kernel/warn_count
and /sys/kernel/oops_count but there is no userspace-accessible counter
for hard/soft lockups. Having this is useful for monitoring tools.

Link: https://lkml.kernel.org/r/20250504180831.4190860-1-max.kellermann@ionos.com
Link: https://lkml.kernel.org/r/20250504180831.4190860-2-max.kellermann@ionos.com
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Cc: Corey Minyard <cminyard@mvista.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Documentation/ABI/testing/sysfs-kernel-hardlockup_count (new file, +7):

+What:		/sys/kernel/hardlockup_count
+Date:		May 2025
+KernelVersion:	6.16
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:
+		Shows how many times the system has detected a hard lockup since last boot.
+		Available only if CONFIG_HARDLOCKUP_DETECTOR is enabled.
Documentation/ABI/testing/sysfs-kernel-softlockup_count (new file, +7):

+What:		/sys/kernel/softlockup_count
+Date:		May 2025
+KernelVersion:	6.16
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:
+		Shows how many times the system has detected a soft lockup since last boot.
+		Available only if CONFIG_SOFTLOCKUP_DETECTOR is enabled.
kernel/watchdog.c (+53):

@@ ... @@
  */
 unsigned int __read_mostly hardlockup_panic =
 	IS_ENABLED(CONFIG_BOOTPARAM_HARDLOCKUP_PANIC);
+
+#ifdef CONFIG_SYSFS
+
+static unsigned int hardlockup_count;
+
+static ssize_t hardlockup_count_show(struct kobject *kobj, struct kobj_attribute *attr,
+				     char *page)
+{
+	return sysfs_emit(page, "%u\n", hardlockup_count);
+}
+
+static struct kobj_attribute hardlockup_count_attr = __ATTR_RO(hardlockup_count);
+
+static __init int kernel_hardlockup_sysfs_init(void)
+{
+	sysfs_add_file_to_group(kernel_kobj, &hardlockup_count_attr.attr, NULL);
+	return 0;
+}
+
+late_initcall(kernel_hardlockup_sysfs_init);
+
+#endif // CONFIG_SYSFS
+
 /*
  * We may not want to enable hard lockup detection by default in all cases,
  * for example when running the kernel as a guest on a hypervisor. In these
@@ ... @@
 	if (is_hardlockup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
 		unsigned long flags;
+
+#ifdef CONFIG_SYSFS
+		++hardlockup_count;
+#endif

 		/* Only print hardlockups once. */
 		if (per_cpu(watchdog_hardlockup_warned, cpu))
@@ ... @@
 static bool softlockup_initialized __read_mostly;
 static u64 __read_mostly sample_period;
+
+#ifdef CONFIG_SYSFS
+
+static unsigned int softlockup_count;
+
+static ssize_t softlockup_count_show(struct kobject *kobj, struct kobj_attribute *attr,
+				     char *page)
+{
+	return sysfs_emit(page, "%u\n", softlockup_count);
+}
+
+static struct kobj_attribute softlockup_count_attr = __ATTR_RO(softlockup_count);
+
+static __init int kernel_softlockup_sysfs_init(void)
+{
+	sysfs_add_file_to_group(kernel_kobj, &softlockup_count_attr.attr, NULL);
+	return 0;
+}
+
+late_initcall(kernel_softlockup_sysfs_init);
+
+#endif // CONFIG_SYSFS

 /* Timestamp taken after the last successful reschedule. */
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
@@ ... @@
 	touch_ts = __this_cpu_read(watchdog_touch_ts);
 	duration = is_softlockup(touch_ts, period_ts, now);
 	if (unlikely(duration)) {
+#ifdef CONFIG_SYSFS
+		++softlockup_count;
+#endif
+
 		/*
 		 * Prevent multiple soft-lockup reports if one cpu is already
 		 * engaged in dumping all cpu back traces.