Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
"A rather large update for timekeeping and timers:

- The final step to get rid of auto-rearming posix-timers

posix-timers are currently auto-rearmed by the kernel when the
timer's signal is ignored, so that the timer signal can be
delivered once the corresponding signal is unignored.

This requires throttling the timer to prevent a DoS through small
intervals, and it pointlessly keeps the system out of low power
states. This is a long-standing, non-trivial problem due to the
lock order of the posix-timer lock and the sighand lock, along
with lifetime issues, as the timer and the sigqueue have different
lifetime rules.

Cure this by:

- Embedding the sigqueue into the timer struct so both have the
same lifetime rules. Aside from that, this also avoids the lookup
of the timer in the signal delivery and rearm path, as it is now
just an always-valid container_of().

- Queuing ignored timer signals onto a separate ignored list.

- Moving queued timer signals onto the ignored list when the
signal is switched to SIG_IGN before it could be delivered.

- Walking the ignored list when SIG_IGN is lifted and requeuing
the signals to the actual signal lists. This allows the signal
delivery code to rearm the timer.

This also required consolidating the signal delivery rules so they
are consistent across all situations. With that, all self-test
scenarios finally succeed.
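The embedding trick above can be sketched in a few lines. This is a hypothetical mock with illustrative field names (not the kernel's actual k_itimer/sigqueue definitions), showing why an embedded member makes the timer lookup a plain container_of():

```c
#include <assert.h>
#include <stddef.h>

/* Simplified version of the kernel's container_of(): recover the
 * enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Illustrative structs only -- not the real kernel definitions. */
struct sigqueue {
	int sig;		/* signal number carried by this entry */
};

struct k_itimer {
	long interval;		/* rearm interval */
	struct sigqueue sq;	/* embedded: shares the timer's lifetime */
};

/* Signal delivery sees only the sigqueue; with the embedding it can
 * reach the owning timer without any lookup or extra locking. */
static struct k_itimer *timer_of_sigqueue(struct sigqueue *q)
{
	return container_of(q, struct k_itimer, sq);
}
```

Because the sigqueue can no longer outlive (or predate) its timer, the lifetime mismatch that made the lock ordering hard goes away.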

- Core infrastructure for VFS multigrain timestamping

This is required to allow the kernel to use coarse-grained
timestamps by default and switch to fine-grained timestamps when
inode attributes are actively observed via getattr().

These changes have been provided to the VFS tree as well, so that
the VFS specific infrastructure could be built on top.
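The core idea can be sketched as follows. This is a hypothetical mock (names, fields, and the global "clock" values are illustrative, not the kernel's actual inode or timekeeping API): updates take the cheap coarse stamp unless the current value has been observed, in which case a fine-grained stamp guarantees the change is visible as strictly newer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the two time sources: coarse is updated per tick,
 * fine goes out to hardware and is more expensive. */
static uint64_t coarse_ns = 1000;
static uint64_t fine_ns   = 1234;

struct mg_inode {
	uint64_t ctime_ns;
	bool     ctime_queried;	/* set when getattr() observed ctime */
};

/* getattr() marks the current stamp as observed. */
static uint64_t mg_getattr_ctime(struct mg_inode *inode)
{
	inode->ctime_queried = true;
	return inode->ctime_ns;
}

/* On modification: coarse suffices if nobody looked; otherwise take a
 * fine-grained stamp so the observer sees a strictly newer value. */
static void mg_update_ctime(struct mg_inode *inode)
{
	inode->ctime_ns = inode->ctime_queried ? fine_ns : coarse_ns;
	inode->ctime_queried = false;
}
```

This keeps the common case cheap while preserving timestamp ordering for anyone who actually compares timestamps.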

- Cleanup and consolidation of the sleep() infrastructure

- Move all sleep and timeout functions into one file

- Rework udelay() and ndelay() into properly documented inline
functions and replace the hardcoded magic numbers with proper
defines.
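The flavor of that change can be sketched like this; the constant names and the loop-conversion factor here are purely illustrative, not the kernel's actual calibration:

```c
#include <assert.h>

/* Named constants instead of a bare magic multiplier. */
#define NSEC_PER_USEC  1000UL
#define LOOPS_PER_NSEC 2UL	/* stand-in for the calibrated estimate */

/* A documented inline helper: convert microseconds into busy-wait
 * loop iterations via named defines rather than "* 2000". */
static inline unsigned long delay_loops_for_us(unsigned long usecs)
{
	return usecs * NSEC_PER_USEC * LOOPS_PER_NSEC;
}
```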

- Rework the fsleep() implementation to take the reality of the
timer wheel granularity on different HZ values into account.
Right now the boundaries are hard-coded time ranges which fail
to provide the requested accuracy on different HZ settings.
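A minimal sketch of such an HZ-aware dispatcher, with illustrative names and thresholds (not the kernel's actual fsleep() logic): the cutover between hrtimer-backed and jiffy-backed sleeping is derived from the configured HZ instead of being a fixed range.

```c
#include <assert.h>

#define HZ 250
#define USEC_PER_SEC   1000000UL
#define USEC_PER_JIFFY (USEC_PER_SEC / HZ)	/* timer wheel granularity */

enum sleep_backend { BACKEND_UDELAY, BACKEND_USLEEP_RANGE, BACKEND_MSLEEP };

static enum sleep_backend fsleep_backend(unsigned long usecs)
{
	if (usecs <= 10)
		return BACKEND_UDELAY;		/* too short to sleep at all */
	if (usecs < USEC_PER_JIFFY)
		return BACKEND_USLEEP_RANGE;	/* below wheel granularity */
	return BACKEND_MSLEEP;			/* jiffy timers are accurate enough */
}
```

With HZ=100 the hrtimer window extends to 10ms; with HZ=1000 it shrinks to 1ms, which is exactly the behavior a hard-coded range cannot express.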

- Update documentation for all sleep/timeout related functions
and fix up stale documentation links all over the place

- Fixup a few usage sites

- Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
clocks

A system can have multiple PTP clocks which participate in
separate and independent PTP clock domains. So far the kernel only
considers the PTP clock based on CLOCK_TAI relevant, as
that is the clock which drives the timekeeping adjustments via the
various user space daemons through adjtimex(2).

The non-TAI-based clock domains are accessible via the file
descriptor based posix clocks, but their usability is very limited.
They cannot be accessed quickly, as they always go all the way out
to the hardware, and they cannot be utilized in the kernel itself.

As Time Sensitive Networking (TSN) gains traction, it becomes
necessary to provide fast user space and kernel space access to
these clocks.

The approach taken is to utilize the timekeeping and adjtimex(2)
infrastructure to provide this access, similar to how the
kernel provides access to clock MONOTONIC, REALTIME etc.

Instead of creating duplicate infrastructure, this rework
converts timekeeping and adjtimex(2) into generic functionality
which operates on pointers to data structures instead of using
static variables.

This makes it possible to provide time accessors and adjtimex(2)
functionality for the independent PTP clocks in a subsequent step.
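The static-to-pointer conversion described above can be sketched as follows. All names and fields are illustrative stand-ins, not the kernel's actual timekeeper structures:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative timekeeper instance -- not the real kernel struct. */
struct timekeeper {
	int64_t  offset_ns;	/* correction applied by adjtimex()-style tuning */
	uint64_t base_ns;
};

/* Generic functionality: operates on whichever instance it is handed,
 * instead of reading a file-scope static. */
static uint64_t tk_get_ns(const struct timekeeper *tk)
{
	return tk->base_ns + (uint64_t)tk->offset_ns;
}

static void tk_adjust(struct timekeeper *tk, int64_t delta_ns)
{
	tk->offset_ns += delta_ns;
}

/* The system timekeeper becomes just one instance among several;
 * a per-domain PTP clock can be another. */
static struct timekeeper tk_core = { .offset_ns = 0, .base_ns = 1000 };
static struct timekeeper tk_ptp  = { .offset_ns = 0, .base_ns = 5000 };
```

Adjusting one instance leaves the others untouched, which is what makes independent per-domain clock steering possible.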

- Consolidate hrtimer initialization

hrtimers are set up by initializing the data structure and then
separately setting the callback function, for historical reasons.

That is an extra, unnecessary step and makes Rust support less
straightforward than it should be.

Provide a new set of hrtimer_setup*() functions and convert the
core code and a few usage sites of the less frequently used
interfaces over.

The bulk of the hrtimer_init() to hrtimer_setup() conversion is
already prepared and scheduled for the next merge window.
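The two patterns can be contrasted with a small mock. This is purely illustrative: the real hrtimer_setup() lives in the kernel and also takes clock-id and mode arguments, which are omitted here:

```c
#include <assert.h>
#include <stddef.h>

struct hrtimer_mock {
	int initialized;
	int (*function)(struct hrtimer_mock *);
};

/* Old pattern: init, then assign the callback as a separate step --
 * the timer briefly exists without a valid callback. */
static void hrtimer_init_mock(struct hrtimer_mock *t)
{
	t->initialized = 1;
	t->function = NULL;	/* caller must remember the second step */
}

/* New pattern: one call fully sets up the timer including its
 * callback, so no half-initialized state is observable. */
static void hrtimer_setup_mock(struct hrtimer_mock *t,
			       int (*fn)(struct hrtimer_mock *))
{
	t->initialized = 1;
	t->function = fn;
}

static int my_callback(struct hrtimer_mock *t)
{
	(void)t;
	return 7;
}
```

Collapsing initialization into one call is also what makes the API easier to express safely from Rust.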

- Drivers:

- Ensure that the global timekeeping clocksource is utilizing the
cluster 0 timer on MIPS multi-cluster systems.

Otherwise CPUs on different clusters use their cluster-specific
clocksource, which is not guaranteed to be synchronized with
other clusters.
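On systems without a 64-bit counter access, the new multi-cluster read path in the diff below uses a classic hi/lo retry loop. Here is a self-contained sketch of that technique with the hardware register reads mocked by a plain variable:

```c
#include <assert.h>
#include <stdint.h>

/* Mocked hardware: a 64-bit counter exposed as two 32-bit halves. */
static uint64_t hw_counter;

static uint32_t read_counter_32l(void)
{
	return (uint32_t)hw_counter;
}

static uint32_t read_counter_32h(void)
{
	return (uint32_t)(hw_counter >> 32);
}

static uint64_t read_counter_64(void)
{
	uint32_t hi, hi2, lo;

	hi = read_counter_32h();
	for (;;) {
		lo = read_counter_32l();

		/* If hi didn't change then lo didn't wrap & we're done */
		hi2 = read_counter_32h();
		if (hi2 == hi)
			break;

		/* Otherwise, repeat with the latest hi value */
		hi = hi2;
	}

	return (((uint64_t)hi) << 32) + lo;
}
```

The retry guards against reading lo just as it wraps and carries into hi, which would otherwise produce a value off by 2^32.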

- Mostly boring cleanups, fixes, improvements and code movement"

* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
posix-timers: Fix spurious warning on double enqueue versus do_exit()
clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
clocksource/drivers/gpx: Remove redundant casts
clocksource/drivers/timer-ti-dm: Fix child node refcount handling
dt-bindings: timer: actions,owl-timer: convert to YAML
clocksource/drivers/ralink: Add Ralink System Tick Counter driver
clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
clocksource/drivers:sp804: Make user selectable
clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
hrtimers: Delete hrtimer_init_on_stack()
alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
io_uring: Switch to use hrtimer_setup_on_stack()
sched/idle: Switch to use hrtimer_setup_on_stack()
hrtimers: Delete hrtimer_init_sleeper_on_stack()
wait: Switch to use hrtimer_setup_sleeper_on_stack()
timers: Switch to use hrtimer_setup_sleeper_on_stack()
net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
futex: Switch to use hrtimer_setup_sleeper_on_stack()
fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
...

+2378 -2180
-2
Documentation/dev-tools/checkpatch.rst
··· 470 470 usleep_range() should be preferred over udelay(). The proper way of 471 471 using usleep_range() is mentioned in the kernel docs. 472 472 473 - See: https://www.kernel.org/doc/html/latest/timers/timers-howto.html#delays-information-on-the-various-kernel-delay-sleep-mechanisms 474 - 475 473 476 474 Comments 477 475 --------
-21
Documentation/devicetree/bindings/timer/actions,owl-timer.txt
··· 1 - Actions Semi Owl Timer 2 - 3 - Required properties: 4 - - compatible : "actions,s500-timer" for S500 5 - "actions,s700-timer" for S700 6 - "actions,s900-timer" for S900 7 - - reg : Offset and length of the register set for the device. 8 - - interrupts : Should contain the interrupts. 9 - - interrupt-names : Valid names are: "2hz0", "2hz1", 10 - "timer0", "timer1", "timer2", "timer3" 11 - See ../resource-names.txt 12 - 13 - Example: 14 - 15 - timer@b0168000 { 16 - compatible = "actions,s500-timer"; 17 - reg = <0xb0168000 0x100>; 18 - interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>, 19 - <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>; 20 - interrupt-names = "timer0", "timer1"; 21 - };
+107
Documentation/devicetree/bindings/timer/actions,owl-timer.yaml
··· 1 + # SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause 2 + %YAML 1.2 3 + --- 4 + $id: http://devicetree.org/schemas/timer/actions,owl-timer.yaml# 5 + $schema: http://devicetree.org/meta-schemas/core.yaml# 6 + 7 + title: Actions Semi Owl timer 8 + 9 + maintainers: 10 + - Andreas Färber <afaerber@suse.de> 11 + 12 + description: 13 + Actions Semi Owl SoCs provide 32bit and 2Hz timers. 14 + The 32bit timers support dynamic irq, as well as one-shot mode. 15 + 16 + properties: 17 + compatible: 18 + enum: 19 + - actions,s500-timer 20 + - actions,s700-timer 21 + - actions,s900-timer 22 + 23 + clocks: 24 + maxItems: 1 25 + 26 + interrupts: 27 + minItems: 1 28 + maxItems: 6 29 + 30 + interrupt-names: 31 + minItems: 1 32 + maxItems: 6 33 + items: 34 + enum: 35 + - 2hz0 36 + - 2hz1 37 + - timer0 38 + - timer1 39 + - timer2 40 + - timer3 41 + 42 + reg: 43 + maxItems: 1 44 + 45 + required: 46 + - compatible 47 + - clocks 48 + - interrupts 49 + - interrupt-names 50 + - reg 51 + 52 + allOf: 53 + - if: 54 + properties: 55 + compatible: 56 + contains: 57 + enum: 58 + - actions,s500-timer 59 + then: 60 + properties: 61 + interrupts: 62 + minItems: 4 63 + maxItems: 4 64 + interrupt-names: 65 + items: 66 + - const: 2hz0 67 + - const: 2hz1 68 + - const: timer0 69 + - const: timer1 70 + 71 + - if: 72 + properties: 73 + compatible: 74 + contains: 75 + enum: 76 + - actions,s700-timer 77 + - actions,s900-timer 78 + then: 79 + properties: 80 + interrupts: 81 + minItems: 1 82 + maxItems: 1 83 + interrupt-names: 84 + items: 85 + - const: timer1 86 + 87 + additionalProperties: false 88 + 89 + examples: 90 + - | 91 + #include <dt-bindings/interrupt-controller/arm-gic.h> 92 + #include <dt-bindings/interrupt-controller/irq.h> 93 + soc { 94 + #address-cells = <1>; 95 + #size-cells = <1>; 96 + timer@b0168000 { 97 + compatible = "actions,s500-timer"; 98 + reg = <0xb0168000 0x100>; 99 + clocks = <&hosc>; 100 + interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>, 101 + <GIC_SPI 9 
IRQ_TYPE_LEVEL_HIGH>, 102 + <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>, 103 + <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>; 104 + interrupt-names = "2hz0", "2hz1", "timer0", "timer1"; 105 + }; 106 + }; 107 + ...
+121
Documentation/timers/delay_sleep_functions.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + Delay and sleep mechanisms 4 + ========================== 5 + 6 + This document seeks to answer the common question: "What is the 7 + RightWay (TM) to insert a delay?" 8 + 9 + This question is most often faced by driver writers who have to 10 + deal with hardware delays and who may not be the most intimately 11 + familiar with the inner workings of the Linux Kernel. 12 + 13 + The following table gives a rough overview about the existing function 14 + 'families' and their limitations. This overview table does not replace the 15 + reading of the function description before usage! 16 + 17 + .. list-table:: 18 + :widths: 20 20 20 20 20 19 + :header-rows: 2 20 + 21 + * - 22 + - `*delay()` 23 + - `usleep_range*()` 24 + - `*sleep()` 25 + - `fsleep()` 26 + * - 27 + - busy-wait loop 28 + - hrtimers based 29 + - timer list timers based 30 + - combines the others 31 + * - Usage in atomic Context 32 + - yes 33 + - no 34 + - no 35 + - no 36 + * - precise on "short intervals" 37 + - yes 38 + - yes 39 + - depends 40 + - yes 41 + * - precise on "long intervals" 42 + - Do not use! 43 + - yes 44 + - max 12.5% slack 45 + - yes 46 + * - interruptible variant 47 + - no 48 + - yes 49 + - yes 50 + - no 51 + 52 + A generic advice for non atomic contexts could be: 53 + 54 + #. Use `fsleep()` whenever unsure (as it combines all the advantages of the 55 + others) 56 + #. Use `*sleep()` whenever possible 57 + #. Use `usleep_range*()` whenever accuracy of `*sleep()` is not sufficient 58 + #. Use `*delay()` for very, very short delays 59 + 60 + Find some more detailed information about the function 'families' in the next 61 + sections. 62 + 63 + `*delay()` family of functions 64 + ------------------------------ 65 + 66 + These functions use the jiffy estimation of clock speed and will busy wait for 67 + enough loop cycles to achieve the desired delay. udelay() is the basic 68 + implementation and ndelay() as well as mdelay() are variants. 
69 + 70 + These functions are mainly used to add a delay in atomic context. Please make 71 + sure to ask yourself before adding a delay in atomic context: Is this really 72 + required? 73 + 74 + .. kernel-doc:: include/asm-generic/delay.h 75 + :identifiers: udelay ndelay 76 + 77 + .. kernel-doc:: include/linux/delay.h 78 + :identifiers: mdelay 79 + 80 + 81 + `usleep_range*()` and `*sleep()` family of functions 82 + ---------------------------------------------------- 83 + 84 + These functions use hrtimers or timer list timers to provide the requested 85 + sleeping duration. In order to decide which function is the right one to use, 86 + take some basic information into account: 87 + 88 + #. hrtimers are more expensive as they are using an rb-tree (instead of hashing) 89 + #. hrtimers are more expensive when the requested sleeping duration is the first 90 + timer which means real hardware has to be programmed 91 + #. timer list timers always provide some sort of slack as they are jiffy based 92 + 93 + The generic advice is repeated here: 94 + 95 + #. Use `fsleep()` whenever unsure (as it combines all the advantages of the 96 + others) 97 + #. Use `*sleep()` whenever possible 98 + #. Use `usleep_range*()` whenever accuracy of `*sleep()` is not sufficient 99 + 100 + First check fsleep() function description and to learn more about accuracy, 101 + please check msleep() function description. 102 + 103 + 104 + `usleep_range*()` 105 + ~~~~~~~~~~~~~~~~~ 106 + 107 + .. kernel-doc:: include/linux/delay.h 108 + :identifiers: usleep_range usleep_range_idle 109 + 110 + .. kernel-doc:: kernel/time/sleep_timeout.c 111 + :identifiers: usleep_range_state 112 + 113 + 114 + `*sleep()` 115 + ~~~~~~~~~~ 116 + 117 + .. kernel-doc:: kernel/time/sleep_timeout.c 118 + :identifiers: msleep msleep_interruptible 119 + 120 + .. kernel-doc:: include/linux/delay.h 121 + :identifiers: ssleep fsleep
+1 -1
Documentation/timers/index.rst
··· 12 12 hrtimers 13 13 no_hz 14 14 timekeeping 15 - timers-howto 15 + delay_sleep_functions 16 16 17 17 .. only:: subproject and html 18 18
-115
Documentation/timers/timers-howto.rst
··· 1 - =================================================================== 2 - delays - Information on the various kernel delay / sleep mechanisms 3 - =================================================================== 4 - 5 - This document seeks to answer the common question: "What is the 6 - RightWay (TM) to insert a delay?" 7 - 8 - This question is most often faced by driver writers who have to 9 - deal with hardware delays and who may not be the most intimately 10 - familiar with the inner workings of the Linux Kernel. 11 - 12 - 13 - Inserting Delays 14 - ---------------- 15 - 16 - The first, and most important, question you need to ask is "Is my 17 - code in an atomic context?" This should be followed closely by "Does 18 - it really need to delay in atomic context?" If so... 19 - 20 - ATOMIC CONTEXT: 21 - You must use the `*delay` family of functions. These 22 - functions use the jiffy estimation of clock speed 23 - and will busy wait for enough loop cycles to achieve 24 - the desired delay: 25 - 26 - ndelay(unsigned long nsecs) 27 - udelay(unsigned long usecs) 28 - mdelay(unsigned long msecs) 29 - 30 - udelay is the generally preferred API; ndelay-level 31 - precision may not actually exist on many non-PC devices. 32 - 33 - mdelay is macro wrapper around udelay, to account for 34 - possible overflow when passing large arguments to udelay. 35 - In general, use of mdelay is discouraged and code should 36 - be refactored to allow for the use of msleep. 37 - 38 - NON-ATOMIC CONTEXT: 39 - You should use the `*sleep[_range]` family of functions. 
40 - There are a few more options here, while any of them may 41 - work correctly, using the "right" sleep function will 42 - help the scheduler, power management, and just make your 43 - driver better :) 44 - 45 - -- Backed by busy-wait loop: 46 - 47 - udelay(unsigned long usecs) 48 - 49 - -- Backed by hrtimers: 50 - 51 - usleep_range(unsigned long min, unsigned long max) 52 - 53 - -- Backed by jiffies / legacy_timers 54 - 55 - msleep(unsigned long msecs) 56 - msleep_interruptible(unsigned long msecs) 57 - 58 - Unlike the `*delay` family, the underlying mechanism 59 - driving each of these calls varies, thus there are 60 - quirks you should be aware of. 61 - 62 - 63 - SLEEPING FOR "A FEW" USECS ( < ~10us? ): 64 - * Use udelay 65 - 66 - - Why not usleep? 67 - On slower systems, (embedded, OR perhaps a speed- 68 - stepped PC!) the overhead of setting up the hrtimers 69 - for usleep *may* not be worth it. Such an evaluation 70 - will obviously depend on your specific situation, but 71 - it is something to be aware of. 72 - 73 - SLEEPING FOR ~USECS OR SMALL MSECS ( 10us - 20ms): 74 - * Use usleep_range 75 - 76 - - Why not msleep for (1ms - 20ms)? 77 - Explained originally here: 78 - https://lore.kernel.org/r/15327.1186166232@lwn.net 79 - 80 - msleep(1~20) may not do what the caller intends, and 81 - will often sleep longer (~20 ms actual sleep for any 82 - value given in the 1~20ms range). In many cases this 83 - is not the desired behavior. 84 - 85 - - Why is there no "usleep" / What is a good range? 86 - Since usleep_range is built on top of hrtimers, the 87 - wakeup will be very precise (ish), thus a simple 88 - usleep function would likely introduce a large number 89 - of undesired interrupts. 90 - 91 - With the introduction of a range, the scheduler is 92 - free to coalesce your wakeup with any other wakeup 93 - that may have happened for other reasons, or at the 94 - worst case, fire an interrupt for your upper bound. 
95 - 96 - The larger a range you supply, the greater a chance 97 - that you will not trigger an interrupt; this should 98 - be balanced with what is an acceptable upper bound on 99 - delay / performance for your specific code path. Exact 100 - tolerances here are very situation specific, thus it 101 - is left to the caller to determine a reasonable range. 102 - 103 - SLEEPING FOR LARGER MSECS ( 10ms+ ) 104 - * Use msleep or possibly msleep_interruptible 105 - 106 - - What's the difference? 107 - msleep sets the current task to TASK_UNINTERRUPTIBLE 108 - whereas msleep_interruptible sets the current task to 109 - TASK_INTERRUPTIBLE before scheduling the sleep. In 110 - short, the difference is whether the sleep can be ended 111 - early by a signal. In general, just use msleep unless 112 - you know you have a need for the interruptible variant. 113 - 114 - FLEXIBLE SLEEPING (any delay, uninterruptible) 115 - * Use fsleep
+3 -1
MAINTAINERS
··· 1998 1998 F: Documentation/devicetree/bindings/net/actions,owl-emac.yaml 1999 1999 F: Documentation/devicetree/bindings/pinctrl/actions,* 2000 2000 F: Documentation/devicetree/bindings/power/actions,owl-sps.txt 2001 - F: Documentation/devicetree/bindings/timer/actions,owl-timer.txt 2001 + F: Documentation/devicetree/bindings/timer/actions,owl-timer.yaml 2002 2002 F: arch/arm/boot/dts/actions/ 2003 2003 F: arch/arm/mach-actions/ 2004 2004 F: arch/arm64/boot/dts/actions/ ··· 10138 10138 T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/core 10139 10139 F: Documentation/timers/ 10140 10140 F: include/linux/clockchips.h 10141 + F: include/linux/delay.h 10141 10142 F: include/linux/hrtimer.h 10142 10143 F: include/linux/timer.h 10143 10144 F: kernel/time/clockevents.c 10144 10145 F: kernel/time/hrtimer.c 10146 + F: kernel/time/sleep_timeout.c 10145 10147 F: kernel/time/timer.c 10146 10148 F: kernel/time/timer_list.c 10147 10149 F: kernel/time/timer_migration.*
-1
arch/arm/kernel/smp_twd.c
··· 93 93 { 94 94 struct clock_event_device *clk = raw_cpu_ptr(twd_evt); 95 95 96 - twd_shutdown(clk); 97 96 disable_percpu_irq(clk->irq); 98 97 } 99 98
-7
arch/mips/ralink/Kconfig
··· 1 1 # SPDX-License-Identifier: GPL-2.0 2 2 if RALINK 3 3 4 - config CLKEVT_RT3352 5 - bool 6 - depends on SOC_RT305X || SOC_MT7620 7 - default y 8 - select TIMER_OF 9 - select CLKSRC_MMIO 10 - 11 4 config RALINK_ILL_ACC 12 5 bool 13 6 depends on SOC_RT305X
-2
arch/mips/ralink/Makefile
··· 10 10 obj-y += clk.o timer.o 11 11 endif 12 12 13 - obj-$(CONFIG_CLKEVT_RT3352) += cevt-rt3352.o 14 - 15 13 obj-$(CONFIG_RALINK_ILL_ACC) += ill_acc.o 16 14 17 15 obj-$(CONFIG_IRQ_INTC) += irq.o
+4 -7
arch/mips/ralink/cevt-rt3352.c drivers/clocksource/timer-ralink.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 1 2 /* 2 - * This file is subject to the terms and conditions of the GNU General Public 3 - * License. See the file "COPYING" in the main directory of this archive 4 - * for more details. 3 + * Ralink System Tick Counter driver present on RT3352 and MT7620 SoCs. 5 4 * 6 5 * Copyright (C) 2013 by John Crispin <john@phrozen.org> 7 6 */ ··· 14 15 #include <linux/of.h> 15 16 #include <linux/of_irq.h> 16 17 #include <linux/of_address.h> 17 - 18 - #include <asm/mach-ralink/ralink_regs.h> 19 18 20 19 #define SYSTICK_FREQ (50 * 1000) 21 20 ··· 37 40 static int systick_shutdown(struct clock_event_device *evt); 38 41 39 42 static int systick_next_event(unsigned long delta, 40 - struct clock_event_device *evt) 43 + struct clock_event_device *evt) 41 44 { 42 45 struct systick_device *sdev; 43 46 u32 count; ··· 57 60 58 61 static irqreturn_t systick_interrupt(int irq, void *dev_id) 59 62 { 60 - struct clock_event_device *dev = (struct clock_event_device *) dev_id; 63 + struct clock_event_device *dev = (struct clock_event_device *)dev_id; 61 64 62 65 dev->event_handler(dev); 63 66
+7 -14
arch/powerpc/kernel/rtas.c
··· 1390 1390 */ 1391 1391 ms = clamp(ms, 1U, 1000U); 1392 1392 /* 1393 - * The delay hint is an order-of-magnitude suggestion, not 1394 - * a minimum. It is fine, possibly even advantageous, for 1395 - * us to pause for less time than hinted. For small values, 1396 - * use usleep_range() to ensure we don't sleep much longer 1397 - * than actually needed. 1398 - * 1399 - * See Documentation/timers/timers-howto.rst for 1400 - * explanation of the threshold used here. In effect we use 1401 - * usleep_range() for 9900 and 9901, msleep() for 1402 - * 9902-9905. 1393 + * The delay hint is an order-of-magnitude suggestion, not a 1394 + * minimum. It is fine, possibly even advantageous, for us to 1395 + * pause for less time than hinted. To make sure pause time will 1396 + * not be way longer than requested independent of HZ 1397 + * configuration, use fsleep(). See fsleep() for details of 1398 + * used sleeping functions. 1403 1399 */ 1404 - if (ms <= 20) 1405 - usleep_range(ms * 100, ms * 1000); 1406 - else 1407 - msleep(ms); 1400 + fsleep(ms * 1000); 1408 1401 break; 1409 1402 case RTAS_BUSY: 1410 1403 ret = true;
-1
arch/riscv/configs/defconfig
··· 302 302 CONFIG_DEBUG_PER_CPU_MAPS=y 303 303 CONFIG_SOFTLOCKUP_DETECTOR=y 304 304 CONFIG_WQ_WATCHDOG=y 305 - CONFIG_DEBUG_TIMEKEEPING=y 306 305 CONFIG_DEBUG_RT_MUTEXES=y 307 306 CONFIG_DEBUG_SPINLOCK=y 308 307 CONFIG_DEBUG_MUTEXES=y
-1
arch/x86/Kconfig
··· 146 146 select ARCH_HAS_PARANOID_L1D_FLUSH 147 147 select BUILDTIME_TABLE_SORT 148 148 select CLKEVT_I8253 149 - select CLOCKSOURCE_VALIDATE_LAST_CYCLE 150 149 select CLOCKSOURCE_WATCHDOG 151 150 # Word-size accesses may read uninitialized data past the trailing \0 152 151 # in strings and cause false KMSAN reports.
-2
arch/x86/include/asm/timer.h
··· 6 6 #include <linux/interrupt.h> 7 7 #include <linux/math64.h> 8 8 9 - #define TICK_SIZE (tick_nsec / 1000) 10 - 11 9 unsigned long long native_sched_clock(void); 12 10 extern void recalibrate_cpu_khz(void); 13 11
+2 -10
arch/x86/kvm/xen.c
··· 263 263 atomic_set(&vcpu->arch.xen.timer_pending, 0); 264 264 } 265 265 266 - static void kvm_xen_init_timer(struct kvm_vcpu *vcpu) 267 - { 268 - hrtimer_init(&vcpu->arch.xen.timer, CLOCK_MONOTONIC, 269 - HRTIMER_MODE_ABS_HARD); 270 - vcpu->arch.xen.timer.function = xen_timer_callback; 271 - } 272 - 273 266 static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic) 274 267 { 275 268 struct kvm_vcpu_xen *vx = &v->arch.xen; ··· 1062 1069 r = -EINVAL; 1063 1070 break; 1064 1071 } 1065 - 1066 - if (!vcpu->arch.xen.timer.function) 1067 - kvm_xen_init_timer(vcpu); 1068 1072 1069 1073 /* Stop the timer (if it's running) before changing the vector */ 1070 1074 kvm_xen_stop_timer(vcpu); ··· 2225 2235 vcpu->arch.xen.poll_evtchn = 0; 2226 2236 2227 2237 timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0); 2238 + hrtimer_init(&vcpu->arch.xen.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD); 2239 + vcpu->arch.xen.timer.function = xen_timer_callback; 2228 2240 2229 2241 kvm_gpc_init(&vcpu->arch.xen.runstate_cache, vcpu->kvm); 2230 2242 kvm_gpc_init(&vcpu->arch.xen.runstate2_cache, vcpu->kvm);
+11 -1
drivers/clocksource/Kconfig
··· 400 400 This affects CPU_FREQ max delta from the initial frequency. 401 401 402 402 config ARM_TIMER_SP804 403 - bool "Support for Dual Timer SP804 module" if COMPILE_TEST 403 + bool "Support for Dual Timer SP804 module" 404 + depends on ARM || ARM64 || COMPILE_TEST 404 405 depends on GENERIC_SCHED_CLOCK && HAVE_CLK 405 406 select CLKSRC_MMIO 406 407 select TIMER_OF if OF ··· 753 752 help 754 753 Enables support for the Cirrus Logic timer block 755 754 EP93XX. 755 + 756 + config RALINK_TIMER 757 + bool "Ralink System Tick Counter" 758 + depends on SOC_RT305X || SOC_MT7620 || COMPILE_TEST 759 + select CLKSRC_MMIO 760 + select TIMER_OF 761 + help 762 + Enables support for system tick counter present on 763 + Ralink SoCs RT3352 and MT7620. 756 764 757 765 endmenu
+1
drivers/clocksource/Makefile
··· 91 91 obj-$(CONFIG_GXP_TIMER) += timer-gxp.o 92 92 obj-$(CONFIG_CLKSRC_LOONGSON1_PWM) += timer-loongson1-pwm.o 93 93 obj-$(CONFIG_EP93XX_TIMER) += timer-ep93xx.o 94 + obj-$(CONFIG_RALINK_TIMER) += timer-ralink.o
+1 -3
drivers/clocksource/arm_arch_timer.c
··· 1179 1179 disable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi]); 1180 1180 if (arch_timer_has_nonsecure_ppi()) 1181 1181 disable_percpu_irq(arch_timer_ppi[ARCH_TIMER_PHYS_NONSECURE_PPI]); 1182 - 1183 - clk->set_state_shutdown(clk); 1184 1182 } 1185 1183 1186 1184 static int arch_timer_dying_cpu(unsigned int cpu) ··· 1428 1430 1429 1431 arch_timers_present |= ARCH_TIMER_TYPE_CP15; 1430 1432 1431 - has_names = of_property_read_bool(np, "interrupt-names"); 1433 + has_names = of_property_present(np, "interrupt-names"); 1432 1434 1433 1435 for (i = ARCH_TIMER_PHYS_SECURE_PPI; i < ARCH_TIMER_MAX_TIMER_PPI; i++) { 1434 1436 if (has_names)
-1
drivers/clocksource/arm_global_timer.c
··· 195 195 { 196 196 struct clock_event_device *clk = this_cpu_ptr(gt_evt); 197 197 198 - gt_clockevent_shutdown(clk); 199 198 disable_percpu_irq(clk->irq); 200 199 return 0; 201 200 }
-39
drivers/clocksource/dw_apb_timer.c
··· 68 68 writel_relaxed(val, timer->base + offs); 69 69 } 70 70 71 - static void apbt_disable_int(struct dw_apb_timer *timer) 72 - { 73 - u32 ctrl = apbt_readl(timer, APBTMR_N_CONTROL); 74 - 75 - ctrl |= APBTMR_CONTROL_INT; 76 - apbt_writel(timer, ctrl, APBTMR_N_CONTROL); 77 - } 78 - 79 - /** 80 - * dw_apb_clockevent_pause() - stop the clock_event_device from running 81 - * 82 - * @dw_ced: The APB clock to stop generating events. 83 - */ 84 - void dw_apb_clockevent_pause(struct dw_apb_clock_event_device *dw_ced) 85 - { 86 - disable_irq(dw_ced->timer.irq); 87 - apbt_disable_int(&dw_ced->timer); 88 - } 89 - 90 71 static void apbt_eoi(struct dw_apb_timer *timer) 91 72 { 92 73 apbt_readl_relaxed(timer, APBTMR_N_EOI); ··· 263 282 } 264 283 265 284 return dw_ced; 266 - } 267 - 268 - /** 269 - * dw_apb_clockevent_resume() - resume a clock that has been paused. 270 - * 271 - * @dw_ced: The APB clock to resume. 272 - */ 273 - void dw_apb_clockevent_resume(struct dw_apb_clock_event_device *dw_ced) 274 - { 275 - enable_irq(dw_ced->timer.irq); 276 - } 277 - 278 - /** 279 - * dw_apb_clockevent_stop() - stop the clock_event_device and release the IRQ. 280 - * 281 - * @dw_ced: The APB clock to stop generating the events. 282 - */ 283 - void dw_apb_clockevent_stop(struct dw_apb_clock_event_device *dw_ced) 284 - { 285 - free_irq(dw_ced->timer.irq, &dw_ced->ced); 286 285 } 287 286 288 287 /**
-1
drivers/clocksource/exynos_mct.c
··· 496 496 per_cpu_ptr(&percpu_mct_tick, cpu); 497 497 struct clock_event_device *evt = &mevt->evt; 498 498 499 - evt->set_state_shutdown(evt); 500 499 if (mct_int_type == MCT_INT_SPI) { 501 500 if (evt->irq != -1) 502 501 disable_irq_nosync(evt->irq);
+38 -1
drivers/clocksource/mips-gic-timer.c
··· 166 166 return gic_read_count(); 167 167 } 168 168 169 + static u64 gic_hpt_read_multicluster(struct clocksource *cs) 170 + { 171 + unsigned int hi, hi2, lo; 172 + u64 count; 173 + 174 + mips_cm_lock_other(0, 0, 0, CM_GCR_Cx_OTHER_BLOCK_GLOBAL); 175 + 176 + if (mips_cm_is64) { 177 + count = read_gic_redir_counter(); 178 + goto out; 179 + } 180 + 181 + hi = read_gic_redir_counter_32h(); 182 + while (true) { 183 + lo = read_gic_redir_counter_32l(); 184 + 185 + /* If hi didn't change then lo didn't wrap & we're done */ 186 + hi2 = read_gic_redir_counter_32h(); 187 + if (hi2 == hi) 188 + break; 189 + 190 + /* Otherwise, repeat with the latest hi value */ 191 + hi = hi2; 192 + } 193 + 194 + count = (((u64)hi) << 32) + lo; 195 + out: 196 + mips_cm_unlock_other(); 197 + return count; 198 + } 199 + 169 200 static struct clocksource gic_clocksource = { 170 201 .name = "GIC", 171 202 .read = gic_hpt_read, ··· 233 202 else 234 203 gic_clocksource.rating = 200; 235 204 gic_clocksource.rating += clamp(gic_frequency / 10000000, 0, 99); 205 + 206 + if (mips_cps_multicluster_cpus()) { 207 + gic_clocksource.read = &gic_hpt_read_multicluster; 208 + gic_clocksource.vdso_clock_mode = VDSO_CLOCKMODE_NONE; 209 + } 236 210 237 211 ret = clocksource_register_hz(&gic_clocksource, gic_frequency); 238 212 if (ret < 0) ··· 297 261 * stable CPU frequency or on the platforms with CM3 and CPU frequency 298 262 * change performed by the CPC core clocks divider. 299 263 */ 300 - if (mips_cm_revision() >= CM_REV_CM3 || !IS_ENABLED(CONFIG_CPU_FREQ)) { 264 + if ((mips_cm_revision() >= CM_REV_CM3 || !IS_ENABLED(CONFIG_CPU_FREQ)) && 265 + !mips_cps_multicluster_cpus()) { 301 266 sched_clock_register(mips_cm_is64 ? 302 267 gic_read_count_64 : gic_read_count_2x32, 303 268 gic_count_width, gic_frequency);
-1
drivers/clocksource/timer-armada-370-xp.c
··· 201 201 { 202 202 struct clock_event_device *evt = per_cpu_ptr(armada_370_xp_evt, cpu); 203 203 204 - evt->set_state_shutdown(evt); 205 204 disable_percpu_irq(evt->irq); 206 205 return 0; 207 206 }
+1 -1
drivers/clocksource/timer-gxp.c
··· 85 85 86 86 clk = of_clk_get(node, 0); 87 87 if (IS_ERR(clk)) { 88 - ret = (int)PTR_ERR(clk); 88 + ret = PTR_ERR(clk); 89 89 pr_err("%pOFn clock not found: %d\n", node, ret); 90 90 goto err_free; 91 91 }
-1
drivers/clocksource/timer-qcom.c
··· 130 130 { 131 131 struct clock_event_device *evt = per_cpu_ptr(msm_evt, cpu); 132 132 133 - evt->set_state_shutdown(evt); 134 133 disable_percpu_irq(evt->irq); 135 134 return 0; 136 135 }
-1
drivers/clocksource/timer-tegra.c
··· 158 158 { 159 159 struct timer_of *to = per_cpu_ptr(&tegra_to, cpu); 160 160 161 - to->clkevt.set_state_shutdown(&to->clkevt); 162 161 disable_irq_nosync(to->clkevt.irq); 163 162 164 163 return 0;
+4 -4
drivers/clocksource/timer-ti-dm-systimer.c
··· 202 202 203 203 /* Secure gptimer12 is always clocked with a fixed source */ 204 204 if (!of_property_read_bool(np, "ti,timer-secure")) { 205 - if (!of_property_read_bool(np, "assigned-clocks")) 205 + if (!of_property_present(np, "assigned-clocks")) 206 206 return false; 207 207 208 - if (!of_property_read_bool(np, "assigned-clock-parents")) 208 + if (!of_property_present(np, "assigned-clock-parents")) 209 209 return false; 210 210 } 211 211 ··· 686 686 687 687 static int __init dmtimer_percpu_quirk_init(struct device_node *np, u32 pa) 688 688 { 689 - struct device_node *arm_timer; 689 + struct device_node *arm_timer __free(device_node) = 690 + of_find_compatible_node(NULL, NULL, "arm,armv7-timer"); 690 691 691 - arm_timer = of_find_compatible_node(NULL, NULL, "arm,armv7-timer"); 692 692 if (of_device_is_available(arm_timer)) { 693 693 pr_warn_once("ARM architected timer wrap issue i940 detected\n"); 694 694 return 0;
+6 -2
drivers/clocksource/timer-ti-dm.c
··· 1104 1104 return -ENOMEM; 1105 1105 1106 1106 timer->irq = platform_get_irq(pdev, 0); 1107 - if (timer->irq < 0) 1108 - return timer->irq; 1107 + if (timer->irq < 0) { 1108 + if (of_property_read_bool(dev->of_node, "ti,timer-pwm")) 1109 + dev_info(dev, "Did not find timer interrupt, timer usable in PWM mode only\n"); 1110 + else 1111 + return timer->irq; 1112 + } 1109 1113 1110 1114 timer->io_base = devm_platform_ioremap_resource(pdev, 0); 1111 1115 if (IS_ERR(timer->io_base))
+9 -8
drivers/gpu/drm/i915/i915_request.c
··· 273 273 return ret; 274 274 } 275 275 276 - static void __rq_init_watchdog(struct i915_request *rq) 277 - { 278 - rq->watchdog.timer.function = NULL; 279 - } 280 - 281 276 static enum hrtimer_restart __rq_watchdog_expired(struct hrtimer *hrtimer) 282 277 { 283 278 struct i915_request *rq = ··· 289 294 return HRTIMER_NORESTART; 290 295 } 291 296 297 + static void __rq_init_watchdog(struct i915_request *rq) 298 + { 299 + struct i915_request_watchdog *wdg = &rq->watchdog; 300 + 301 + hrtimer_init(&wdg->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 302 + wdg->timer.function = __rq_watchdog_expired; 303 + } 304 + 292 305 static void __rq_arm_watchdog(struct i915_request *rq) 293 306 { 294 307 struct i915_request_watchdog *wdg = &rq->watchdog; ··· 307 304 308 305 i915_request_get(rq); 309 306 310 - hrtimer_init(&wdg->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 311 - wdg->timer.function = __rq_watchdog_expired; 312 307 hrtimer_start_range_ns(&wdg->timer, 313 308 ns_to_ktime(ce->watchdog.timeout_us * 314 309 NSEC_PER_USEC), ··· 318 317 { 319 318 struct i915_request_watchdog *wdg = &rq->watchdog; 320 319 321 - if (wdg->timer.function && hrtimer_try_to_cancel(&wdg->timer) > 0) 320 + if (hrtimer_try_to_cancel(&wdg->timer) > 0) 322 321 i915_request_put(rq); 323 322 } 324 323
+4 -13
drivers/media/usb/dvb-usb-v2/anysee.c
··· 46 46 47 47 dev_dbg(&d->udev->dev, "%s: >>> %*ph\n", __func__, slen, state->buf); 48 48 49 - /* We need receive one message more after dvb_usb_generic_rw due 50 - to weird transaction flow, which is 1 x send + 2 x receive. */ 49 + /* 50 + * We need receive one message more after dvb_usbv2_generic_rw_locked() 51 + * due to weird transaction flow, which is 1 x send + 2 x receive. 52 + */ 51 53 ret = dvb_usbv2_generic_rw_locked(d, state->buf, sizeof(state->buf), 52 54 state->buf, sizeof(state->buf)); 53 55 if (ret) 54 56 goto error_unlock; 55 - 56 - /* TODO FIXME: dvb_usb_generic_rw() fails rarely with error code -32 57 - * (EPIPE, Broken pipe). Function supports currently msleep() as a 58 - * parameter but I would not like to use it, since according to 59 - * Documentation/timers/timers-howto.rst it should not be used such 60 - * short, under < 20ms, sleeps. Repeating failed message would be 61 - * better choice as not to add unwanted delays... 62 - * Fixing that correctly is one of those or both; 63 - * 1) use repeat if possible 64 - * 2) add suitable delay 65 - */ 66 57 67 58 /* get answer, retry few times if error returned */ 68 59 for (i = 0; i < 3; i++) {
-2
drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
··· 823 823 824 824 INIT_WORK(&rt2x00dev->rxdone_work, rt2x00usb_work_rxdone); 825 825 INIT_WORK(&rt2x00dev->txdone_work, rt2x00usb_work_txdone); 826 - hrtimer_init(&rt2x00dev->txstatus_timer, CLOCK_MONOTONIC, 827 - HRTIMER_MODE_REL); 828 826 829 827 retval = rt2x00usb_alloc_reg(rt2x00dev); 830 828 if (retval)
+1 -2
drivers/power/supply/charger-manager.c
··· 1412 1412 return dev_get_platdata(&pdev->dev); 1413 1413 } 1414 1414 1415 - static enum alarmtimer_restart cm_timer_func(struct alarm *alarm, ktime_t now) 1415 + static void cm_timer_func(struct alarm *alarm, ktime_t now) 1416 1416 { 1417 1417 cm_timer_set = false; 1418 - return ALARMTIMER_NORESTART; 1419 1418 } 1420 1419 1421 1420 static int charger_manager_probe(struct platform_device *pdev)
+1 -1
fs/aio.c
··· 1335 1335 if (until == 0 || ret < 0 || ret >= min_nr) 1336 1336 return ret; 1337 1337 1338 - hrtimer_init_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 1338 + hrtimer_setup_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 1339 1339 if (until != KTIME_MAX) { 1340 1340 hrtimer_set_expires_range_ns(&t.timer, until, current->timer_slack_ns); 1341 1341 hrtimer_sleeper_start_expires(&t, HRTIMER_MODE_REL);
+2 -2
fs/proc/base.c
··· 2552 2552 2553 2553 seq_printf(m, "ID: %d\n", timer->it_id); 2554 2554 seq_printf(m, "signal: %d/%px\n", 2555 - timer->sigq->info.si_signo, 2556 - timer->sigq->info.si_value.sival_ptr); 2555 + timer->sigq.info.si_signo, 2556 + timer->sigq.info.si_value.sival_ptr); 2557 2557 seq_printf(m, "notify: %s/%s.%d\n", 2558 2558 nstr[notify & ~SIGEV_THREAD_ID], 2559 2559 (notify & SIGEV_THREAD_ID) ? "tid" : "pid",
+1 -3
fs/timerfd.c
··· 79 79 return HRTIMER_NORESTART; 80 80 } 81 81 82 - static enum alarmtimer_restart timerfd_alarmproc(struct alarm *alarm, 83 - ktime_t now) 82 + static void timerfd_alarmproc(struct alarm *alarm, ktime_t now) 84 83 { 85 84 struct timerfd_ctx *ctx = container_of(alarm, struct timerfd_ctx, 86 85 t.alarm); 87 86 timerfd_triggered(ctx); 88 - return ALARMTIMER_NORESTART; 89 87 } 90 88 91 89 /*
+68 -26
include/asm-generic/delay.h
··· 2 2 #ifndef __ASM_GENERIC_DELAY_H 3 3 #define __ASM_GENERIC_DELAY_H 4 4 5 + #include <linux/math.h> 6 + #include <vdso/time64.h> 7 + 5 8 /* Undefined functions to get compile-time errors */ 6 9 extern void __bad_udelay(void); 7 10 extern void __bad_ndelay(void); ··· 15 12 extern void __delay(unsigned long loops); 16 13 17 14 /* 18 - * The weird n/20000 thing suppresses a "comparison is always false due to 19 - * limited range of data type" warning with non-const 8-bit arguments. 15 + * The microseconds/nanosecond delay multiplicators are used to convert a 16 + * constant microseconds/nanoseconds value to a value which can be used by the 17 + * architectures specific implementation to transform it into loops. 20 18 */ 19 + #define UDELAY_CONST_MULT ((unsigned long)DIV_ROUND_UP(1ULL << 32, USEC_PER_SEC)) 20 + #define NDELAY_CONST_MULT ((unsigned long)DIV_ROUND_UP(1ULL << 32, NSEC_PER_SEC)) 21 21 22 - /* 0x10c7 is 2**32 / 1000000 (rounded up) */ 23 - #define udelay(n) \ 24 - ({ \ 25 - if (__builtin_constant_p(n)) { \ 26 - if ((n) / 20000 >= 1) \ 27 - __bad_udelay(); \ 28 - else \ 29 - __const_udelay((n) * 0x10c7ul); \ 30 - } else { \ 31 - __udelay(n); \ 32 - } \ 33 - }) 22 + /* 23 + * The maximum constant udelay/ndelay value picked out of thin air to prevent 24 + * too long constant udelays/ndelays. 25 + */ 26 + #define DELAY_CONST_MAX 20000 34 27 35 - /* 0x5 is 2**32 / 1000000000 (rounded up) */ 36 - #define ndelay(n) \ 37 - ({ \ 38 - if (__builtin_constant_p(n)) { \ 39 - if ((n) / 20000 >= 1) \ 40 - __bad_ndelay(); \ 41 - else \ 42 - __const_udelay((n) * 5ul); \ 43 - } else { \ 44 - __ndelay(n); \ 45 - } \ 46 - }) 28 + /** 29 + * udelay - Inserting a delay based on microseconds with busy waiting 30 + * @usec: requested delay in microseconds 31 + * 32 + * When delaying in an atomic context ndelay(), udelay() and mdelay() are the 33 + * only valid variants of delaying/sleeping to go with. 
34 + * 35 + * When inserting delays in non atomic context which are shorter than the time 36 + * which is required to queue e.g. an hrtimer and to enter then the scheduler, 37 + * it is also valuable to use udelay(). But it is not simple to specify a 38 + * generic threshold for this which will fit for all systems. An approximation 39 + * is a threshold for all delays up to 10 microseconds. 40 + * 41 + * When having a delay which is larger than the architecture specific 42 + * %MAX_UDELAY_MS value, please make sure mdelay() is used. Otherwise an overflow 43 + * risk is given. 44 + * 45 + * Please note that ndelay(), udelay() and mdelay() may return early for several 46 + * reasons (https://lists.openwall.net/linux-kernel/2011/01/09/56): 47 + * 48 + * #. computed loops_per_jiffy too low (due to the time taken to execute the 49 + * timer interrupt.) 50 + * #. cache behaviour affecting the time it takes to execute the loop function. 51 + * #. CPU clock rate changes. 52 + */ 53 + static __always_inline void udelay(unsigned long usec) 54 + { 55 + if (__builtin_constant_p(usec)) { 56 + if (usec >= DELAY_CONST_MAX) 57 + __bad_udelay(); 58 + else 59 + __const_udelay(usec * UDELAY_CONST_MULT); 60 + } else { 61 + __udelay(usec); 62 + } 63 + } 64 + 65 + /** 66 + * ndelay - Inserting a delay based on nanoseconds with busy waiting 67 + * @nsec: requested delay in nanoseconds 68 + * 69 + * See udelay() for basic information about ndelay() and its variants. 70 + */ 71 + static __always_inline void ndelay(unsigned long nsec) 72 + { 73 + if (__builtin_constant_p(nsec)) { 74 + if (nsec >= DELAY_CONST_MAX) 75 + __bad_ndelay(); 76 + else 77 + __const_udelay(nsec * NDELAY_CONST_MULT); 78 + } else { 79 + __ndelay(nsec); 80 + } 81 + } 82 + #define ndelay(x) ndelay(x) 47 83 48 84 #endif /* __ASM_GENERIC_DELAY_H */
+2 -8
include/linux/alarmtimer.h
··· 20 20 ALARM_BOOTTIME_FREEZER, 21 21 }; 22 22 23 - enum alarmtimer_restart { 24 - ALARMTIMER_NORESTART, 25 - ALARMTIMER_RESTART, 26 - }; 27 - 28 - 29 23 #define ALARMTIMER_STATE_INACTIVE 0x00 30 24 #define ALARMTIMER_STATE_ENQUEUED 0x01 31 25 ··· 36 42 struct alarm { 37 43 struct timerqueue_node node; 38 44 struct hrtimer timer; 39 - enum alarmtimer_restart (*function)(struct alarm *, ktime_t now); 45 + void (*function)(struct alarm *, ktime_t now); 40 46 enum alarmtimer_type type; 41 47 int state; 42 48 void *data; 43 49 }; 44 50 45 51 void alarm_init(struct alarm *alarm, enum alarmtimer_type type, 46 - enum alarmtimer_restart (*function)(struct alarm *, ktime_t)); 52 + void (*function)(struct alarm *, ktime_t)); 47 53 void alarm_start(struct alarm *alarm, ktime_t start); 48 54 void alarm_start_relative(struct alarm *alarm, ktime_t start); 49 55 void alarm_restart(struct alarm *alarm);
-1
include/linux/clocksource.h
··· 215 215 216 216 extern int clocksource_unregister(struct clocksource*); 217 217 extern void clocksource_touch_watchdog(void); 218 - extern void clocksource_change_rating(struct clocksource *cs, int rating); 219 218 extern void clocksource_suspend(void); 220 219 extern void clocksource_resume(void); 221 220 extern struct clocksource * __init clocksource_default_clock(void);
+62 -17
include/linux/delay.h
··· 6 6 * Copyright (C) 1993 Linus Torvalds 7 7 * 8 8 * Delay routines, using a pre-computed "loops_per_jiffy" value. 9 - * 10 - * Please note that ndelay(), udelay() and mdelay() may return early for 11 - * several reasons: 12 - * 1. computed loops_per_jiffy too low (due to the time taken to 13 - * execute the timer interrupt.) 14 - * 2. cache behaviour affecting the time it takes to execute the 15 - * loop function. 16 - * 3. CPU clock rate changes. 17 - * 18 - * Please see this thread: 19 - * https://lists.openwall.net/linux-kernel/2011/01/09/56 9 + * Sleep routines using timer list timers or hrtimers. 20 10 */ 21 11 22 12 #include <linux/math.h> 23 13 #include <linux/sched.h> 14 + #include <linux/jiffies.h> 24 15 25 16 extern unsigned long loops_per_jiffy; 26 17 ··· 26 35 * The 2nd mdelay() definition ensures GCC will optimize away the 27 36 * while loop for the common cases where n <= MAX_UDELAY_MS -- Paul G. 28 37 */ 29 - 30 38 #ifndef MAX_UDELAY_MS 31 39 #define MAX_UDELAY_MS 5 32 40 #endif 33 41 34 42 #ifndef mdelay 43 + /** 44 + * mdelay - Inserting a delay based on milliseconds with busy waiting 45 + * @n: requested delay in milliseconds 46 + * 47 + * See udelay() for basic information about mdelay() and it's variants. 48 + * 49 + * Please double check, whether mdelay() is the right way to go or whether a 50 + * refactoring of the code is the better variant to be able to use msleep() 51 + * instead. 52 + */ 35 53 #define mdelay(n) (\ 36 54 (__builtin_constant_p(n) && (n)<=MAX_UDELAY_MS) ? udelay((n)*1000) : \ 37 55 ({unsigned long __ms=(n); while (__ms--) udelay(1000);})) ··· 63 63 void usleep_range_state(unsigned long min, unsigned long max, 64 64 unsigned int state); 65 65 66 + /** 67 + * usleep_range - Sleep for an approximate time 68 + * @min: Minimum time in microseconds to sleep 69 + * @max: Maximum time in microseconds to sleep 70 + * 71 + * For basic information please refere to usleep_range_state(). 
72 + * 73 + * The task will be in the state TASK_UNINTERRUPTIBLE during the sleep. 74 + */ 66 75 static inline void usleep_range(unsigned long min, unsigned long max) 67 76 { 68 77 usleep_range_state(min, max, TASK_UNINTERRUPTIBLE); 69 78 } 70 79 71 - static inline void usleep_idle_range(unsigned long min, unsigned long max) 80 + /** 81 + * usleep_range_idle - Sleep for an approximate time with idle time accounting 82 + * @min: Minimum time in microseconds to sleep 83 + * @max: Maximum time in microseconds to sleep 84 + * 85 + * For basic information please refer to usleep_range_state(). 86 + * 87 + * The sleeping task has the state TASK_IDLE during the sleep to prevent 88 + * contribution to the load average. 89 + */ 90 + static inline void usleep_range_idle(unsigned long min, unsigned long max) 72 91 { 73 92 usleep_range_state(min, max, TASK_IDLE); 74 93 } 75 94 95 + /** 96 + * ssleep - wrapper for seconds around msleep 97 + * @seconds: Requested sleep duration in seconds 98 + * 99 + * Please refer to msleep() for detailed information. 100 + */ 76 101 static inline void ssleep(unsigned int seconds) 77 102 { 78 103 msleep(seconds * 1000); 79 104 } 80 105 81 - /* see Documentation/timers/timers-howto.rst for the thresholds */ 106 + static const unsigned int max_slack_shift = 2; 107 + #define USLEEP_RANGE_UPPER_BOUND ((TICK_NSEC << max_slack_shift) / NSEC_PER_USEC) 108 + 109 + /** 110 + * fsleep - flexible sleep which autoselects the best mechanism 111 + * @usecs: requested sleep duration in microseconds 112 + * 113 + * fsleep() selects the best mechanism that will provide maximum 25% slack 114 + * to the requested sleep duration. Therefore it uses: 115 + * 116 + * * udelay() loop for sleep durations <= 10 microseconds to avoid hrtimer 117 + * overhead for really short sleep durations. 118 + * * usleep_range() for sleep durations which would lead with the usage of 119 + * msleep() to a slack larger than 25%. This depends on the granularity of 120 + * jiffies. 
121 + * * msleep() for all other sleep durations. 122 + * 123 + * Note: When %CONFIG_HIGH_RES_TIMERS is not set, all sleeps are processed with 124 + * the granularity of jiffies and the slack might exceed 25% especially for 125 + * short sleep durations. 126 + */ 82 127 static inline void fsleep(unsigned long usecs) 83 128 { 84 129 if (usecs <= 10) 85 130 udelay(usecs); 86 - else if (usecs <= 20000) 87 - usleep_range(usecs, 2 * usecs); 131 + else if (usecs < USLEEP_RANGE_UPPER_BOUND) 132 + usleep_range(usecs, usecs + (usecs >> max_slack_shift)); 88 133 else 89 - msleep(DIV_ROUND_UP(usecs, 1000)); 134 + msleep(DIV_ROUND_UP(usecs, USEC_PER_MSEC)); 90 135 } 91 136 92 137 #endif /* defined(_LINUX_DELAY_H) */
-3
include/linux/dw_apb_timer.h
··· 34 34 }; 35 35 36 36 void dw_apb_clockevent_register(struct dw_apb_clock_event_device *dw_ced); 37 - void dw_apb_clockevent_pause(struct dw_apb_clock_event_device *dw_ced); 38 - void dw_apb_clockevent_resume(struct dw_apb_clock_event_device *dw_ced); 39 - void dw_apb_clockevent_stop(struct dw_apb_clock_event_device *dw_ced); 40 37 41 38 struct dw_apb_clock_event_device * 42 39 dw_apb_clockevent_init(int cpu, const char *name, unsigned rating,
+29 -22
include/linux/hrtimer.h
··· 228 228 /* Initialize timers: */ 229 229 extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock, 230 230 enum hrtimer_mode mode); 231 - extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, 232 - enum hrtimer_mode mode); 231 + extern void hrtimer_setup(struct hrtimer *timer, enum hrtimer_restart (*function)(struct hrtimer *), 232 + clockid_t clock_id, enum hrtimer_mode mode); 233 + extern void hrtimer_setup_on_stack(struct hrtimer *timer, 234 + enum hrtimer_restart (*function)(struct hrtimer *), 235 + clockid_t clock_id, enum hrtimer_mode mode); 236 + extern void hrtimer_setup_sleeper_on_stack(struct hrtimer_sleeper *sl, clockid_t clock_id, 237 + enum hrtimer_mode mode); 233 238 234 239 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS 235 - extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock, 236 - enum hrtimer_mode mode); 237 - extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, 238 - clockid_t clock_id, 239 - enum hrtimer_mode mode); 240 - 241 240 extern void destroy_hrtimer_on_stack(struct hrtimer *timer); 242 241 #else 243 - static inline void hrtimer_init_on_stack(struct hrtimer *timer, 244 - clockid_t which_clock, 245 - enum hrtimer_mode mode) 246 - { 247 - hrtimer_init(timer, which_clock, mode); 248 - } 249 - 250 - static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, 251 - clockid_t clock_id, 252 - enum hrtimer_mode mode) 253 - { 254 - hrtimer_init_sleeper(sl, clock_id, mode); 255 - } 256 - 257 242 static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { } 258 243 #endif 259 244 ··· 320 335 static inline int hrtimer_callback_running(struct hrtimer *timer) 321 336 { 322 337 return timer->base->running == timer; 338 + } 339 + 340 + /** 341 + * hrtimer_update_function - Update the timer's callback function 342 + * @timer: Timer to update 343 + * @function: New callback function 344 + * 345 + * Only safe to call if the timer is not enqueued. 
Can be called in the callback function if the 346 + * timer is not enqueued at the same time (see the comments above HRTIMER_STATE_ENQUEUED). 347 + */ 348 + static inline void hrtimer_update_function(struct hrtimer *timer, 349 + enum hrtimer_restart (*function)(struct hrtimer *)) 350 + { 351 + guard(raw_spinlock_irqsave)(&timer->base->cpu_base->lock); 352 + 353 + if (WARN_ON_ONCE(hrtimer_is_queued(timer))) 354 + return; 355 + 356 + if (WARN_ON_ONCE(!function)) 357 + return; 358 + 359 + timer->function = function; 323 360 } 324 361 325 362 /* Forward a hrtimer so it expires after now: */
+26 -26
include/linux/iopoll.h
··· 19 19 * @op: accessor function (takes @args as its arguments) 20 20 * @val: Variable to read the value into 21 21 * @cond: Break condition (usually involving @val) 22 - * @sleep_us: Maximum time to sleep between reads in us (0 23 - * tight-loops). Should be less than ~20ms since usleep_range 24 - * is used (see Documentation/timers/timers-howto.rst). 22 + * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please 23 + * read usleep_range() function description for details and 24 + * limitations. 25 25 * @timeout_us: Timeout in us, 0 means never timeout 26 26 * @sleep_before_read: if it is true, sleep @sleep_us before read. 27 27 * @args: arguments for @op poll 28 28 * 29 - * Returns 0 on success and -ETIMEDOUT upon a timeout. In either 30 - * case, the last read value at @args is stored in @val. Must not 31 - * be called from atomic context if sleep_us or timeout_us are used. 32 - * 33 29 * When available, you'll probably want to use one of the specialized 34 30 * macros defined below rather than this macro directly. 31 + * 32 + * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either 33 + * case, the last read value at @args is stored in @val. Must not 34 + * be called from atomic context if sleep_us or timeout_us are used. 35 35 */ 36 36 #define read_poll_timeout(op, val, cond, sleep_us, timeout_us, \ 37 37 sleep_before_read, args...) \ ··· 64 64 * @op: accessor function (takes @args as its arguments) 65 65 * @val: Variable to read the value into 66 66 * @cond: Break condition (usually involving @val) 67 - * @delay_us: Time to udelay between reads in us (0 tight-loops). Should 68 - * be less than ~10us since udelay is used (see 69 - * Documentation/timers/timers-howto.rst). 67 + * @delay_us: Time to udelay between reads in us (0 tight-loops). Please 68 + * read udelay() function description for details and 69 + * limitations. 
70 70 * @timeout_us: Timeout in us, 0 means never timeout 71 71 * @delay_before_read: if it is true, delay @delay_us before read. 72 72 * @args: arguments for @op poll 73 - * 74 - * Returns 0 on success and -ETIMEDOUT upon a timeout. In either 75 - * case, the last read value at @args is stored in @val. 76 73 * 77 74 * This macro does not rely on timekeeping. Hence it is safe to call even when 78 75 * timekeeping is suspended, at the expense of an underestimation of wall clock ··· 77 80 * 78 81 * When available, you'll probably want to use one of the specialized 79 82 * macros defined below rather than this macro directly. 83 + * 84 + * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either 85 + * case, the last read value at @args is stored in @val. 80 86 */ 81 87 #define read_poll_timeout_atomic(op, val, cond, delay_us, timeout_us, \ 82 88 delay_before_read, args...) \ ··· 119 119 * @addr: Address to poll 120 120 * @val: Variable to read the value into 121 121 * @cond: Break condition (usually involving @val) 122 - * @sleep_us: Maximum time to sleep between reads in us (0 123 - * tight-loops). Should be less than ~20ms since usleep_range 124 - * is used (see Documentation/timers/timers-howto.rst). 122 + * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please 123 + * read usleep_range() function description for details and 124 + * limitations. 125 125 * @timeout_us: Timeout in us, 0 means never timeout 126 - * 127 - * Returns 0 on success and -ETIMEDOUT upon a timeout. In either 128 - * case, the last read value at @addr is stored in @val. Must not 129 - * be called from atomic context if sleep_us or timeout_us are used. 130 126 * 131 127 * When available, you'll probably want to use one of the specialized 132 128 * macros defined below rather than this macro directly. 129 + * 130 + * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either 131 + * case, the last read value at @addr is stored in @val. 
Must not 132 + * be called from atomic context if sleep_us or timeout_us are used. 133 133 */ 134 134 #define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us) \ 135 135 read_poll_timeout(op, val, cond, sleep_us, timeout_us, false, addr) ··· 140 140 * @addr: Address to poll 141 141 * @val: Variable to read the value into 142 142 * @cond: Break condition (usually involving @val) 143 - * @delay_us: Time to udelay between reads in us (0 tight-loops). Should 144 - * be less than ~10us since udelay is used (see 145 - * Documentation/timers/timers-howto.rst). 143 + * @delay_us: Time to udelay between reads in us (0 tight-loops). Please 144 + * read udelay() function description for details and 145 + * limitations. 146 146 * @timeout_us: Timeout in us, 0 means never timeout 147 - * 148 - * Returns 0 on success and -ETIMEDOUT upon a timeout. In either 149 - * case, the last read value at @addr is stored in @val. 150 147 * 151 148 * When available, you'll probably want to use one of the specialized 152 149 * macros defined below rather than this macro directly. 150 + * 151 + * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either 152 + * case, the last read value at @addr is stored in @val. 153 153 */ 154 154 #define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \ 155 155 read_poll_timeout_atomic(op, val, cond, delay_us, timeout_us, false, addr)
+14 -1
include/linux/jiffies.h
··· 502 502 * - all other values are converted to jiffies by either multiplying 503 503 * the input value by a factor or dividing it with a factor and 504 504 * handling any 32-bit overflows. 505 - * for the details see __msecs_to_jiffies() 505 + * for the details see _msecs_to_jiffies() 506 506 * 507 507 * msecs_to_jiffies() checks for the passed in value being a constant 508 508 * via __builtin_constant_p() allowing gcc to eliminate most of the ··· 525 525 return __msecs_to_jiffies(m); 526 526 } 527 527 } 528 + 529 + /** 530 + * secs_to_jiffies: - convert seconds to jiffies 531 + * @_secs: time in seconds 532 + * 533 + * Conversion is done by simple multiplication with HZ 534 + * 535 + * secs_to_jiffies() is defined as a macro rather than a static inline 536 + * function so it can be used in static initializers. 537 + * 538 + * Return: jiffies value 539 + */ 540 + #define secs_to_jiffies(_secs) ((_secs) * HZ) 528 541 529 542 extern unsigned long __usecs_to_jiffies(const unsigned int u); 530 543 #if !(USEC_PER_SEC % HZ)
+5 -4
include/linux/phy.h
··· 1378 1378 * @regnum: The register on the MMD to read 1379 1379 * @val: Variable to read the register into 1380 1380 * @cond: Break condition (usually involving @val) 1381 - * @sleep_us: Maximum time to sleep between reads in us (0 1382 - * tight-loops). Should be less than ~20ms since usleep_range 1383 - * is used (see Documentation/timers/timers-howto.rst). 1381 + * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please 1382 + * read usleep_range() function description for details and 1383 + * limitations. 1384 1384 * @timeout_us: Timeout in us, 0 means never timeout 1385 1385 * @sleep_before_read: if it is true, sleep @sleep_us before read. 1386 - * Returns 0 on success and -ETIMEDOUT upon a timeout. In either 1386 + * 1387 + * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either 1387 1388 * case, the last read value at @args is stored in @val. Must not 1388 1389 * be called from atomic context if sleep_us or timeout_us are used. 1389 1390 */
+60 -12
include/linux/posix-timers.h
··· 5 5 #include <linux/alarmtimer.h> 6 6 #include <linux/list.h> 7 7 #include <linux/mutex.h> 8 + #include <linux/pid.h> 8 9 #include <linux/posix-timers_types.h> 10 + #include <linux/rcuref.h> 9 11 #include <linux/spinlock.h> 10 12 #include <linux/timerqueue.h> 11 13 12 14 struct kernel_siginfo; 13 15 struct task_struct; 16 + struct sigqueue; 17 + struct k_itimer; 14 18 15 19 static inline clockid_t make_process_cpuclock(const unsigned int pid, 16 20 const clockid_t clock) ··· 39 35 40 36 #ifdef CONFIG_POSIX_TIMERS 41 37 38 + #include <linux/signal_types.h> 39 + 42 40 /** 43 41 * cpu_timer - Posix CPU timer representation for k_itimer 44 42 * @node: timerqueue node to queue in the task/sig ··· 48 42 * @pid: Pointer to target task PID 49 43 * @elist: List head for the expiry list 50 44 * @firing: Timer is currently firing 45 + * @nanosleep: Timer is used for nanosleep and is not a regular posix-timer 51 46 * @handling: Pointer to the task which handles expiry 52 47 */ 53 48 struct cpu_timer { ··· 56 49 struct timerqueue_head *head; 57 50 struct pid *pid; 58 51 struct list_head elist; 59 - int firing; 52 + bool firing; 53 + bool nanosleep; 60 54 struct task_struct __rcu *handling; 61 55 }; 62 56 ··· 109 101 pct->bases[CPUCLOCK_SCHED].nextevt = runtime; 110 102 } 111 103 104 + void posixtimer_rearm_itimer(struct task_struct *p); 105 + bool posixtimer_init_sigqueue(struct sigqueue *q); 106 + void posixtimer_send_sigqueue(struct k_itimer *tmr); 107 + bool posixtimer_deliver_signal(struct kernel_siginfo *info, struct sigqueue *timer_sigq); 108 + void posixtimer_free_timer(struct k_itimer *timer); 109 + 112 110 /* Init task static initializer */ 113 111 #define INIT_CPU_TIMERBASE(b) { \ 114 112 .nextevt = U64_MAX, \ ··· 136 122 static inline void posix_cputimers_init(struct posix_cputimers *pct) { } 137 123 static inline void posix_cputimers_group_init(struct posix_cputimers *pct, 138 124 u64 cpu_limit) { } 125 + static inline void posixtimer_rearm_itimer(struct 
task_struct *p) { } 126 + static inline bool posixtimer_deliver_signal(struct kernel_siginfo *info, 127 + struct sigqueue *timer_sigq) { return false; } 128 + static inline void posixtimer_free_timer(struct k_itimer *timer) { } 139 129 #endif 140 130 141 131 #ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK ··· 150 132 static inline void posix_cputimers_init_work(void) { } 151 133 #endif 152 134 153 - #define REQUEUE_PENDING 1 154 - 155 135 /** 156 136 * struct k_itimer - POSIX.1b interval timer structure. 157 - * @list: List head for binding the timer to signals->posix_timers 137 + * @list: List node for binding the timer to tsk::signal::posix_timers 138 + * @ignored_list: List node for tracking ignored timers in tsk::signal::ignored_posix_timers 158 139 * @t_hash: Entry in the posix timer hash table 159 140 * @it_lock: Lock protecting the timer 160 141 * @kclock: Pointer to the k_clock struct handling this timer 161 142 * @it_clock: The posix timer clock id 162 143 * @it_id: The posix timer id for identifying the timer 163 - * @it_active: Marker that timer is active 144 + * @it_status: The status of the timer 145 + * @it_sig_periodic: The periodic status at signal delivery 164 146 * @it_overrun: The overrun counter for pending signals 165 147 * @it_overrun_last: The overrun at the time of the last delivered signal 166 - * @it_requeue_pending: Indicator that timer waits for being requeued on 167 - * signal delivery 148 + * @it_signal_seq: Sequence count to control signal delivery 149 + * @it_sigqueue_seq: The sequence count at the point where the signal was queued 168 150 * @it_sigev_notify: The notify word of sigevent struct for signal delivery 169 151 * @it_interval: The interval for periodic timers 170 152 * @it_signal: Pointer to the creators signal struct 171 153 * @it_pid: The pid of the process/task targeted by the signal 172 154 * @it_process: The task to wakeup on clock_nanosleep (CPU timers) 173 - * @sigq: Pointer to preallocated sigqueue 155 + * @rcuref: 
Reference count for life time management 156 + * @sigq: Embedded sigqueue 174 157 * @it: Union representing the various posix timer type 175 158 * internals. 176 159 * @rcu: RCU head for freeing the timer. 177 160 */ 178 161 struct k_itimer { 179 162 struct hlist_node list; 163 + struct hlist_node ignored_list; 180 164 struct hlist_node t_hash; 181 165 spinlock_t it_lock; 182 166 const struct k_clock *kclock; 183 167 clockid_t it_clock; 184 168 timer_t it_id; 185 - int it_active; 169 + int it_status; 170 + bool it_sig_periodic; 186 171 s64 it_overrun; 187 172 s64 it_overrun_last; 188 - int it_requeue_pending; 173 + unsigned int it_signal_seq; 174 + unsigned int it_sigqueue_seq; 189 175 int it_sigev_notify; 176 + enum pid_type it_pid_type; 190 177 ktime_t it_interval; 191 178 struct signal_struct *it_signal; 192 179 union { 193 180 struct pid *it_pid; 194 181 struct task_struct *it_process; 195 182 }; 196 - struct sigqueue *sigq; 183 + struct sigqueue sigq; 184 + rcuref_t rcuref; 197 185 union { 198 186 struct { 199 187 struct hrtimer timer; ··· 220 196 221 197 int update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new); 222 198 223 - void posixtimer_rearm(struct kernel_siginfo *info); 199 + #ifdef CONFIG_POSIX_TIMERS 200 + static inline void posixtimer_putref(struct k_itimer *tmr) 201 + { 202 + if (rcuref_put(&tmr->rcuref)) 203 + posixtimer_free_timer(tmr); 204 + } 205 + 206 + static inline void posixtimer_sigqueue_getref(struct sigqueue *q) 207 + { 208 + struct k_itimer *tmr = container_of(q, struct k_itimer, sigq); 209 + 210 + WARN_ON_ONCE(!rcuref_get(&tmr->rcuref)); 211 + } 212 + 213 + static inline void posixtimer_sigqueue_putref(struct sigqueue *q) 214 + { 215 + struct k_itimer *tmr = container_of(q, struct k_itimer, sigq); 216 + 217 + posixtimer_putref(tmr); 218 + } 219 + #else /* CONFIG_POSIX_TIMERS */ 220 + static inline void posixtimer_sigqueue_getref(struct sigqueue *q) { } 221 + static inline void posixtimer_sigqueue_putref(struct sigqueue 
*q) { } 222 + #endif /* !CONFIG_POSIX_TIMERS */ 223 + 224 224 #endif
+19 -19
include/linux/regmap.h
··· 106 106 * @addr: Address to poll 107 107 * @val: Unsigned integer variable to read the value into 108 108 * @cond: Break condition (usually involving @val) 109 - * @sleep_us: Maximum time to sleep between reads in us (0 110 - * tight-loops). Should be less than ~20ms since usleep_range 111 - * is used (see Documentation/timers/timers-howto.rst). 109 + * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please 110 + * read usleep_range() function description for details and 111 + * limitations. 112 112 * @timeout_us: Timeout in us, 0 means never timeout 113 113 * 114 - * Returns 0 on success and -ETIMEDOUT upon a timeout or the regmap_read 114 + * This is modelled after the readx_poll_timeout macros in linux/iopoll.h. 115 + * 116 + * Returns: 0 on success and -ETIMEDOUT upon a timeout or the regmap_read 115 117 * error return value in case of a error read. In the two former cases, 116 118 * the last read value at @addr is stored in @val. Must not be called 117 119 * from atomic context if sleep_us or timeout_us are used. 118 - * 119 - * This is modelled after the readx_poll_timeout macros in linux/iopoll.h. 120 120 */ 121 121 #define regmap_read_poll_timeout(map, addr, val, cond, sleep_us, timeout_us) \ 122 122 ({ \ ··· 133 133 * @addr: Address to poll 134 134 * @val: Unsigned integer variable to read the value into 135 135 * @cond: Break condition (usually involving @val) 136 - * @delay_us: Time to udelay between reads in us (0 tight-loops). 137 - * Should be less than ~10us since udelay is used 138 - * (see Documentation/timers/timers-howto.rst). 136 + * @delay_us: Time to udelay between reads in us (0 tight-loops). Please 137 + * read udelay() function description for details and 138 + * limitations. 139 139 * @timeout_us: Timeout in us, 0 means never timeout 140 - * 141 - * Returns 0 on success and -ETIMEDOUT upon a timeout or the regmap_read 142 - * error return value in case of a error read. 
In the two former cases, 143 - * the last read value at @addr is stored in @val. 144 140 * 145 141 * This is modelled after the readx_poll_timeout_atomic macros in linux/iopoll.h. 146 142 * 147 143 * Note: In general regmap cannot be used in atomic context. If you want to use 148 144 * this macro then first setup your regmap for atomic use (flat or no cache 149 145 * and MMIO regmap). 146 + * 147 + * Returns: 0 on success and -ETIMEDOUT upon a timeout or the regmap_read 148 + * error return value in case of a error read. In the two former cases, 149 + * the last read value at @addr is stored in @val. 150 150 */ 151 151 #define regmap_read_poll_timeout_atomic(map, addr, val, cond, delay_us, timeout_us) \ 152 152 ({ \ ··· 177 177 * @field: Regmap field to read from 178 178 * @val: Unsigned integer variable to read the value into 179 179 * @cond: Break condition (usually involving @val) 180 - * @sleep_us: Maximum time to sleep between reads in us (0 181 - * tight-loops). Should be less than ~20ms since usleep_range 182 - * is used (see Documentation/timers/timers-howto.rst). 180 + * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please 181 + * read usleep_range() function description for details and 182 + * limitations. 183 183 * @timeout_us: Timeout in us, 0 means never timeout 184 184 * 185 - * Returns 0 on success and -ETIMEDOUT upon a timeout or the regmap_field_read 185 + * This is modelled after the readx_poll_timeout macros in linux/iopoll.h. 186 + * 187 + * Returns: 0 on success and -ETIMEDOUT upon a timeout or the regmap_field_read 186 188 * error return value in case of a error read. In the two former cases, 187 189 * the last read value at @addr is stored in @val. Must not be called 188 190 * from atomic context if sleep_us or timeout_us are used. 189 - * 190 - * This is modelled after the readx_poll_timeout macros in linux/iopoll.h. 
191 191 */ 192 192 #define regmap_field_read_poll_timeout(field, val, cond, sleep_us, timeout_us) \ 193 193 ({ \
+1 -3
include/linux/sched/signal.h
··· 138 138 /* POSIX.1b Interval Timers */ 139 139 unsigned int next_posix_timer_id; 140 140 struct hlist_head posix_timers; 141 + struct hlist_head ignored_posix_timers; 141 142 142 143 /* ITIMER_REAL timer for the process */ 143 144 struct hrtimer real_timer; ··· 339 338 extern void force_exit_sig(int); 340 339 extern int send_sig(int, struct task_struct *, int); 341 340 extern int zap_other_threads(struct task_struct *p); 342 - extern struct sigqueue *sigqueue_alloc(void); 343 - extern void sigqueue_free(struct sigqueue *); 344 - extern int send_sigqueue(struct sigqueue *, struct pid *, enum pid_type); 345 341 extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *); 346 342 347 343 static inline void clear_notify_signal(void)
-2
include/linux/tick.h
··· 20 20 extern void tick_suspend_local(void); 21 21 /* Should be core only, but XEN resume magic and ARM BL switcher require it */ 22 22 extern void tick_resume_local(void); 23 - extern void tick_cleanup_dead_cpu(int cpu); 24 23 #else /* CONFIG_GENERIC_CLOCKEVENTS */ 25 24 static inline void tick_init(void) { } 26 25 static inline void tick_suspend_local(void) { } 27 26 static inline void tick_resume_local(void) { } 28 - static inline void tick_cleanup_dead_cpu(int cpu) { } 29 27 #endif /* !CONFIG_GENERIC_CLOCKEVENTS */ 30 28 31 29 #if defined(CONFIG_GENERIC_CLOCKEVENTS) && defined(CONFIG_HOTPLUG_CPU)
+61 -53
include/linux/timekeeper_internal.h
··· 26 26 * occupies a single 64byte cache line. 27 27 * 28 28 * The struct is separate from struct timekeeper as it is also used 29 - * for a fast NMI safe accessors. 29 + * for the fast NMI safe accessors. 30 30 * 31 31 * @base_real is for the fast NMI safe accessor to allow reading clock 32 32 * realtime from any context. ··· 44 44 45 45 /** 46 46 * struct timekeeper - Structure holding internal timekeeping values. 47 - * @tkr_mono: The readout base structure for CLOCK_MONOTONIC 48 - * @tkr_raw: The readout base structure for CLOCK_MONOTONIC_RAW 49 - * @xtime_sec: Current CLOCK_REALTIME time in seconds 50 - * @ktime_sec: Current CLOCK_MONOTONIC time in seconds 51 - * @wall_to_monotonic: CLOCK_REALTIME to CLOCK_MONOTONIC offset 52 - * @offs_real: Offset clock monotonic -> clock realtime 53 - * @offs_boot: Offset clock monotonic -> clock boottime 54 - * @offs_tai: Offset clock monotonic -> clock tai 55 - * @tai_offset: The current UTC to TAI offset in seconds 56 - * @clock_was_set_seq: The sequence number of clock was set events 57 - * @cs_was_changed_seq: The sequence number of clocksource change events 58 - * @next_leap_ktime: CLOCK_MONOTONIC time value of a pending leap-second 59 - * @raw_sec: CLOCK_MONOTONIC_RAW time in seconds 60 - * @monotonic_to_boot: CLOCK_MONOTONIC to CLOCK_BOOTTIME offset 61 - * @cycle_interval: Number of clock cycles in one NTP interval 62 - * @xtime_interval: Number of clock shifted nano seconds in one NTP 63 - * interval. 64 - * @xtime_remainder: Shifted nano seconds left over when rounding 65 - * @cycle_interval 66 - * @raw_interval: Shifted raw nano seconds accumulated per NTP interval. 67 - * @ntp_error: Difference between accumulated time and NTP time in ntp 68 - * shifted nano seconds. 69 - * @ntp_error_shift: Shift conversion between clock shifted nano seconds and 70 - * ntp shifted nano seconds. 
71 - * @last_warning: Warning ratelimiter (DEBUG_TIMEKEEPING) 72 - * @underflow_seen: Underflow warning flag (DEBUG_TIMEKEEPING) 73 - * @overflow_seen: Overflow warning flag (DEBUG_TIMEKEEPING) 47 + * @tkr_mono: The readout base structure for CLOCK_MONOTONIC 48 + * @xtime_sec: Current CLOCK_REALTIME time in seconds 49 + * @ktime_sec: Current CLOCK_MONOTONIC time in seconds 50 + * @wall_to_monotonic: CLOCK_REALTIME to CLOCK_MONOTONIC offset 51 + * @offs_real: Offset clock monotonic -> clock realtime 52 + * @offs_boot: Offset clock monotonic -> clock boottime 53 + * @offs_tai: Offset clock monotonic -> clock tai 54 + * @tai_offset: The current UTC to TAI offset in seconds 55 + * @tkr_raw: The readout base structure for CLOCK_MONOTONIC_RAW 56 + * @raw_sec: CLOCK_MONOTONIC_RAW time in seconds 57 + * @clock_was_set_seq: The sequence number of clock was set events 58 + * @cs_was_changed_seq: The sequence number of clocksource change events 59 + * @monotonic_to_boot: CLOCK_MONOTONIC to CLOCK_BOOTTIME offset 60 + * @cycle_interval: Number of clock cycles in one NTP interval 61 + * @xtime_interval: Number of clock shifted nano seconds in one NTP 62 + * interval. 63 + * @xtime_remainder: Shifted nano seconds left over when rounding 64 + * @cycle_interval 65 + * @raw_interval: Shifted raw nano seconds accumulated per NTP interval. 66 + * @next_leap_ktime: CLOCK_MONOTONIC time value of a pending leap-second 67 + * @ntp_tick: The ntp_tick_length() value currently being 68 + * used. This cached copy ensures we consistently 69 + * apply the tick length for an entire tick, as 70 + * ntp_tick_length may change mid-tick, and we don't 71 + * want to apply that new value to the tick in 72 + * progress. 73 + * @ntp_error: Difference between accumulated time and NTP time in ntp 74 + * shifted nano seconds. 75 + * @ntp_error_shift: Shift conversion between clock shifted nano seconds and 76 + * ntp shifted nano seconds. 
77 + * @ntp_err_mult: Multiplication factor for scaled math conversion 78 + * @skip_second_overflow: Flag used to avoid updating NTP twice with same second 74 79 * 75 80 * Note: For timespec(64) based interfaces wall_to_monotonic is what 76 81 * we need to add to xtime (or xtime corrected for sub jiffy times) ··· 93 88 * 94 89 * @monotonic_to_boottime is a timespec64 representation of @offs_boot to 95 90 * accelerate the VDSO update for CLOCK_BOOTTIME. 91 + * 92 + * The cacheline ordering of the structure is optimized for in kernel usage of 93 + * the ktime_get() and ktime_get_ts64() family of time accessors. Struct 94 + * timekeeper is prepended in the core timekeeping code with a sequence count, 95 + * which results in the following cacheline layout: 96 + * 97 + * 0: seqcount, tkr_mono 98 + * 1: xtime_sec ... tai_offset 99 + * 2: tkr_raw, raw_sec 100 + * 3,4: Internal variables 101 + * 102 + * Cacheline 0,1 contain the data which is used for accessing 103 + * CLOCK_MONOTONIC/REALTIME/BOOTTIME/TAI, while cacheline 2 contains the 104 + * data for accessing CLOCK_MONOTONIC_RAW. Cacheline 3,4 are internal 105 + * variables which are only accessed during timekeeper updates once per 106 + * tick. 
96 107 */ 97 108 struct timekeeper { 109 + /* Cacheline 0 (together with prepended seqcount of timekeeper core): */ 98 110 struct tk_read_base tkr_mono; 99 - struct tk_read_base tkr_raw; 111 + 112 + /* Cacheline 1: */ 100 113 u64 xtime_sec; 101 114 unsigned long ktime_sec; 102 115 struct timespec64 wall_to_monotonic; ··· 122 99 ktime_t offs_boot; 123 100 ktime_t offs_tai; 124 101 s32 tai_offset; 102 + 103 + /* Cacheline 2: */ 104 + struct tk_read_base tkr_raw; 105 + u64 raw_sec; 106 + 107 + /* Cachline 3 and 4 (timekeeping internal variables): */ 125 108 unsigned int clock_was_set_seq; 126 109 u8 cs_was_changed_seq; 127 - ktime_t next_leap_ktime; 128 - u64 raw_sec; 110 + 129 111 struct timespec64 monotonic_to_boot; 130 112 131 - /* The following members are for timekeeping internal use */ 132 113 u64 cycle_interval; 133 114 u64 xtime_interval; 134 115 s64 xtime_remainder; 135 116 u64 raw_interval; 136 - /* The ntp_tick_length() value currently being used. 137 - * This cached copy ensures we consistently apply the tick 138 - * length for an entire tick, as ntp_tick_length may change 139 - * mid-tick, and we don't want to apply that new value to 140 - * the tick in progress. 141 - */ 117 + 118 + ktime_t next_leap_ktime; 142 119 u64 ntp_tick; 143 - /* Difference between accumulated time and NTP time in ntp 144 - * shifted nano seconds. */ 145 120 s64 ntp_error; 146 121 u32 ntp_error_shift; 147 122 u32 ntp_err_mult; 148 - /* Flag used to avoid updating NTP twice with same second */ 149 123 u32 skip_second_overflow; 150 - #ifdef CONFIG_DEBUG_TIMEKEEPING 151 - long last_warning; 152 - /* 153 - * These simple flag variables are managed 154 - * without locks, which is racy, but they are 155 - * ok since we don't really care about being 156 - * super precise about how many events were 157 - * seen, just that a problem was observed. 158 - */ 159 - int underflow_seen; 160 - int overflow_seen; 161 - #endif 162 124 }; 163 125 164 126 #ifdef CONFIG_GENERIC_TIME_VSYSCALL
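The new layout comment above relies on the timekeeping core prepending a seqcount so that `tkr_mono` shares cacheline 0 with it. Invariants of that kind can be pinned down at build time with `offsetof()` and `_Static_assert`; a toy userspace sketch with illustrative stand-in sizes (not the real `struct timekeeper`):

```c
#include <stddef.h>

/* Toy stand-in for the "seqcount prepended to struct timekeeper"
 * arrangement: a 4-byte sequence counter followed by a 56-byte read
 * base, so the next hot field starts at or past byte 60 (byte 64 on
 * LP64, i.e. the start of cacheline 1). All sizes are illustrative. */
struct sketch_tk {
	unsigned int	seq;		/* prepended seqcount */
	char		tkr_mono[56];	/* stand-in for struct tk_read_base */
	long		xtime_sec;	/* first "cacheline 1" member */
};

/* A build-time check like this catches accidental reordering. */
_Static_assert(offsetof(struct sketch_tk, tkr_mono) == 4,
	       "read base must immediately follow the seqcount");
```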
+2
include/linux/timekeeping.h
··· 280 280 * counter value 281 281 * @cycles: Clocksource counter value to produce the system times 282 282 * @real: Realtime system time 283 + * @boot: Boot time 283 284 * @raw: Monotonic raw system time 284 285 * @cs_id: Clocksource ID 285 286 * @clock_was_set_seq: The sequence number of clock-was-set events ··· 289 288 struct system_time_snapshot { 290 289 u64 cycles; 291 290 ktime_t real; 291 + ktime_t boot; 292 292 ktime_t raw; 293 293 enum clocksource_ids cs_id; 294 294 unsigned int clock_was_set_seq;
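The hunk above extends `struct system_time_snapshot` with a @boot (CLOCK_BOOTTIME) member alongside the realtime and raw values. A userspace sketch that grabs the corresponding Linux clocks back to back — the struct and function names here are illustrative, not the kernel API, and CLOCK_BOOTTIME/CLOCK_MONOTONIC_RAW are Linux-specific clock IDs:

```c
#define _GNU_SOURCE
#include <time.h>

/* Loose userspace mirror of the extended snapshot: capture the
 * three clocks as close together as possible. */
struct snapshot {
	struct timespec real;	/* wall clock */
	struct timespec boot;	/* monotonic incl. time in suspend */
	struct timespec raw;	/* NTP-unadjusted monotonic */
};

static int take_snapshot(struct snapshot *s)
{
	if (clock_gettime(CLOCK_REALTIME, &s->real) ||
	    clock_gettime(CLOCK_BOOTTIME, &s->boot) ||
	    clock_gettime(CLOCK_MONOTONIC_RAW, &s->raw))
		return -1;
	return 0;
}
```

Unlike this sketch, the kernel snapshot reads a single clocksource cycle value under the timekeeper seqcount and derives all times from it, so its members are mutually consistent rather than merely close together.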
-8
include/linux/timex.h
··· 139 139 #define MAXSEC 2048 /* max interval between updates (s) */ 140 140 #define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5) /* beyond max. dispersion */ 141 141 142 - /* 143 - * kernel variables 144 - * Note: maximum error = NTP sync distance = dispersion + delay / 2; 145 - * estimated error = NTP dispersion. 146 - */ 147 - extern unsigned long tick_usec; /* USER_HZ period (usec) */ 148 - extern unsigned long tick_nsec; /* SHIFTED_HZ period (nsec) */ 149 - 150 142 /* Required to safely shift negative values */ 151 143 #define shift_right(x, s) ({ \ 152 144 __typeof__(x) __x = (x); \
+2 -2
include/linux/wait.h
··· 542 542 int __ret = 0; \ 543 543 struct hrtimer_sleeper __t; \ 544 544 \ 545 - hrtimer_init_sleeper_on_stack(&__t, CLOCK_MONOTONIC, \ 546 - HRTIMER_MODE_REL); \ 545 + hrtimer_setup_sleeper_on_stack(&__t, CLOCK_MONOTONIC, \ 546 + HRTIMER_MODE_REL); \ 547 547 if ((timeout) != KTIME_MAX) { \ 548 548 hrtimer_set_expires_range_ns(&__t.timer, timeout, \ 549 549 current->timer_slack_ns); \
+1 -1
include/uapi/asm-generic/siginfo.h
··· 46 46 __kernel_timer_t _tid; /* timer id */ 47 47 int _overrun; /* overrun count */ 48 48 sigval_t _sigval; /* same as below */ 49 - int _sys_private; /* not to be passed to user */ 49 + int _sys_private; /* Not used by the kernel. Historic leftover. Always 0. */ 50 50 } _timer; 51 51 52 52 /* POSIX.1b signals */
+3 -2
init/init_task.c
··· 30 30 .cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex), 31 31 .exec_update_lock = __RWSEM_INITIALIZER(init_signals.exec_update_lock), 32 32 #ifdef CONFIG_POSIX_TIMERS 33 - .posix_timers = HLIST_HEAD_INIT, 34 - .cputimer = { 33 + .posix_timers = HLIST_HEAD_INIT, 34 + .ignored_posix_timers = HLIST_HEAD_INIT, 35 + .cputimer = { 35 36 .cputime_atomic = INIT_CPUTIME_ATOMIC, 36 37 }, 37 38 #endif
+4 -3
io_uring/io_uring.c
··· 2408 2408 { 2409 2409 ktime_t timeout; 2410 2410 2411 - hrtimer_init_on_stack(&iowq->t, clock_id, HRTIMER_MODE_ABS); 2412 2411 if (iowq->min_timeout) { 2413 2412 timeout = ktime_add_ns(iowq->min_timeout, start_time); 2414 - iowq->t.function = io_cqring_min_timer_wakeup; 2413 + hrtimer_setup_on_stack(&iowq->t, io_cqring_min_timer_wakeup, clock_id, 2414 + HRTIMER_MODE_ABS); 2415 2415 } else { 2416 2416 timeout = iowq->timeout; 2417 - iowq->t.function = io_cqring_timer_wakeup; 2417 + hrtimer_setup_on_stack(&iowq->t, io_cqring_timer_wakeup, clock_id, 2418 + HRTIMER_MODE_ABS); 2418 2419 } 2419 2420 2420 2421 hrtimer_set_expires_range_ns(&iowq->t, timeout, 0);
+1 -1
io_uring/rw.c
··· 1176 1176 req->flags |= REQ_F_IOPOLL_STATE; 1177 1177 1178 1178 mode = HRTIMER_MODE_REL; 1179 - hrtimer_init_sleeper_on_stack(&timer, CLOCK_MONOTONIC, mode); 1179 + hrtimer_setup_sleeper_on_stack(&timer, CLOCK_MONOTONIC, mode); 1180 1180 hrtimer_set_expires(&timer.timer, kt); 1181 1181 set_current_state(TASK_INTERRUPTIBLE); 1182 1182 hrtimer_sleeper_start_expires(&timer, mode);
-1
io_uring/timeout.c
··· 76 76 /* re-arm timer */ 77 77 spin_lock_irq(&ctx->timeout_lock); 78 78 list_add(&timeout->list, ctx->timeout_list.prev); 79 - data->timer.function = io_timeout_fn; 80 79 hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), data->mode); 81 80 spin_unlock_irq(&ctx->timeout_lock); 82 81 return;
-1
kernel/cpu.c
··· 1339 1339 cpuhp_bp_sync_dead(cpu); 1340 1340 1341 1341 lockdep_cleanup_dead_cpu(cpu, idle_thread_get(cpu)); 1342 - tick_cleanup_dead_cpu(cpu); 1343 1342 1344 1343 /* 1345 1344 * Callbacks must be re-integrated right away to the RCU state machine.
+1
kernel/fork.c
··· 1862 1862 1863 1863 #ifdef CONFIG_POSIX_TIMERS 1864 1864 INIT_HLIST_HEAD(&sig->posix_timers); 1865 + INIT_HLIST_HEAD(&sig->ignored_posix_timers); 1865 1866 hrtimer_init(&sig->real_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 1866 1867 sig->real_timer.function = it_real_fn; 1867 1868 #endif
+3 -3
kernel/futex/core.c
··· 140 140 if (!time) 141 141 return NULL; 142 142 143 - hrtimer_init_sleeper_on_stack(timeout, (flags & FLAGS_CLOCKRT) ? 144 - CLOCK_REALTIME : CLOCK_MONOTONIC, 145 - HRTIMER_MODE_ABS); 143 + hrtimer_setup_sleeper_on_stack(timeout, 144 + (flags & FLAGS_CLOCKRT) ? CLOCK_REALTIME : CLOCK_MONOTONIC, 145 + HRTIMER_MODE_ABS); 146 146 /* 147 147 * If range_ns is 0, calling hrtimer_set_expires_range_ns() is 148 148 * effectively the same as calling hrtimer_set_expires().
+2 -2
kernel/sched/idle.c
··· 398 398 cpuidle_use_deepest_state(latency_ns); 399 399 400 400 it.done = 0; 401 - hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD); 402 - it.timer.function = idle_inject_timer_fn; 401 + hrtimer_setup_on_stack(&it.timer, idle_inject_timer_fn, CLOCK_MONOTONIC, 402 + HRTIMER_MODE_REL_HARD); 403 403 hrtimer_start(&it.timer, ns_to_ktime(duration_ns), 404 404 HRTIMER_MODE_REL_PINNED_HARD); 405 405
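The io_uring and sched/idle conversions above replace the two-step `hrtimer_init_on_stack()` plus open-coded `->function` assignment with a single `hrtimer_setup_on_stack()` call that takes the callback as an argument, so a timer can no longer be armed with the callback still unset. The idiom in miniature (hypothetical userspace types, not the kernel API):

```c
#include <stddef.h>

/* Minimal sketch of the init-vs-setup idiom. */
struct mini_timer {
	void (*function)(struct mini_timer *);
};

/* Old style: two steps, and nothing forces the second one. */
static void mini_timer_init(struct mini_timer *t)
{
	t->function = NULL;	/* caller must remember to set this */
}

/* New style: the callback is a mandatory setup argument. */
static void mini_timer_setup(struct mini_timer *t,
			     void (*fn)(struct mini_timer *))
{
	t->function = fn;
}

static void cb(struct mini_timer *t) { (void)t; }
```

The same reasoning applies to the `hrtimer_setup_sleeper_on_stack()` conversions in wait.h, io_uring/rw.c and futex: the callback binding moves into the constructor, shrinking the window where a half-initialized timer exists on the stack.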
+311 -209
kernel/signal.c
··· 59 59 #include <asm/cacheflush.h> 60 60 #include <asm/syscall.h> /* for syscall_get_* */ 61 61 62 + #include "time/posix-timers.h" 63 + 62 64 /* 63 65 * SLAB caches for signal bits. 64 66 */ ··· 398 396 task_set_jobctl_pending(task, mask | JOBCTL_STOP_PENDING); 399 397 } 400 398 401 - /* 402 - * allocate a new signal queue record 403 - * - this may be called without locks if and only if t == current, otherwise an 404 - * appropriate lock must be held to stop the target task from exiting 405 - */ 406 - static struct sigqueue * 407 - __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags, 408 - int override_rlimit, const unsigned int sigqueue_flags) 399 + static struct ucounts *sig_get_ucounts(struct task_struct *t, int sig, 400 + int override_rlimit) 409 401 { 410 - struct sigqueue *q = NULL; 411 402 struct ucounts *ucounts; 412 403 long sigpending; 413 404 ··· 420 425 if (!sigpending) 421 426 return NULL; 422 427 423 - if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) { 424 - q = kmem_cache_alloc(sigqueue_cachep, gfp_flags); 425 - } else { 428 + if (unlikely(!override_rlimit && sigpending > task_rlimit(t, RLIMIT_SIGPENDING))) { 429 + dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); 426 430 print_dropped_signal(sig); 431 + return NULL; 427 432 } 428 433 429 - if (unlikely(q == NULL)) { 434 + return ucounts; 435 + } 436 + 437 + static void __sigqueue_init(struct sigqueue *q, struct ucounts *ucounts, 438 + const unsigned int sigqueue_flags) 439 + { 440 + INIT_LIST_HEAD(&q->list); 441 + q->flags = sigqueue_flags; 442 + q->ucounts = ucounts; 443 + } 444 + 445 + /* 446 + * allocate a new signal queue record 447 + * - this may be called without locks if and only if t == current, otherwise an 448 + * appropriate lock must be held to stop the target task from exiting 449 + */ 450 + static struct sigqueue *sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags, 451 + int override_rlimit) 452 + { 453 + struct 
ucounts *ucounts = sig_get_ucounts(t, sig, override_rlimit); 454 + struct sigqueue *q; 455 + 456 + if (!ucounts) 457 + return NULL; 458 + 459 + q = kmem_cache_alloc(sigqueue_cachep, gfp_flags); 460 + if (!q) { 430 461 dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); 431 - } else { 432 - INIT_LIST_HEAD(&q->list); 433 - q->flags = sigqueue_flags; 434 - q->ucounts = ucounts; 462 + return NULL; 435 463 } 464 + 465 + __sigqueue_init(q, ucounts, 0); 436 466 return q; 437 467 } 438 468 439 469 static void __sigqueue_free(struct sigqueue *q) 440 470 { 441 - if (q->flags & SIGQUEUE_PREALLOC) 471 + if (q->flags & SIGQUEUE_PREALLOC) { 472 + posixtimer_sigqueue_putref(q); 442 473 return; 474 + } 443 475 if (q->ucounts) { 444 476 dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING); 445 477 q->ucounts = NULL; ··· 500 478 spin_unlock_irqrestore(&t->sighand->siglock, flags); 501 479 } 502 480 EXPORT_SYMBOL(flush_signals); 503 - 504 - #ifdef CONFIG_POSIX_TIMERS 505 - static void __flush_itimer_signals(struct sigpending *pending) 506 - { 507 - sigset_t signal, retain; 508 - struct sigqueue *q, *n; 509 - 510 - signal = pending->signal; 511 - sigemptyset(&retain); 512 - 513 - list_for_each_entry_safe(q, n, &pending->list, list) { 514 - int sig = q->info.si_signo; 515 - 516 - if (likely(q->info.si_code != SI_TIMER)) { 517 - sigaddset(&retain, sig); 518 - } else { 519 - sigdelset(&signal, sig); 520 - list_del_init(&q->list); 521 - __sigqueue_free(q); 522 - } 523 - } 524 - 525 - sigorsets(&pending->signal, &signal, &retain); 526 - } 527 - 528 - void flush_itimer_signals(void) 529 - { 530 - struct task_struct *tsk = current; 531 - unsigned long flags; 532 - 533 - spin_lock_irqsave(&tsk->sighand->siglock, flags); 534 - __flush_itimer_signals(&tsk->pending); 535 - __flush_itimer_signals(&tsk->signal->shared_pending); 536 - spin_unlock_irqrestore(&tsk->sighand->siglock, flags); 537 - } 538 - #endif 539 481 540 482 void ignore_signals(struct task_struct *t) 541 483 { ··· 
550 564 } 551 565 552 566 static void collect_signal(int sig, struct sigpending *list, kernel_siginfo_t *info, 553 - bool *resched_timer) 567 + struct sigqueue **timer_sigq) 554 568 { 555 569 struct sigqueue *q, *first = NULL; 556 570 ··· 573 587 list_del_init(&first->list); 574 588 copy_siginfo(info, &first->info); 575 589 576 - *resched_timer = 577 - (first->flags & SIGQUEUE_PREALLOC) && 578 - (info->si_code == SI_TIMER) && 579 - (info->si_sys_private); 580 - 581 - __sigqueue_free(first); 590 + /* 591 + * posix-timer signals are preallocated and freed when the last 592 + * reference count is dropped in posixtimer_deliver_signal() or 593 + * immediately on timer deletion when the signal is not pending. 594 + * Spare the extra round through __sigqueue_free() which is 595 + * ignoring preallocated signals. 596 + */ 597 + if (unlikely((first->flags & SIGQUEUE_PREALLOC) && (info->si_code == SI_TIMER))) 598 + *timer_sigq = first; 599 + else 600 + __sigqueue_free(first); 582 601 } else { 583 602 /* 584 603 * Ok, it wasn't in the queue. 
This must be ··· 600 609 } 601 610 602 611 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask, 603 - kernel_siginfo_t *info, bool *resched_timer) 612 + kernel_siginfo_t *info, struct sigqueue **timer_sigq) 604 613 { 605 614 int sig = next_signal(pending, mask); 606 615 607 616 if (sig) 608 - collect_signal(sig, pending, info, resched_timer); 617 + collect_signal(sig, pending, info, timer_sigq); 609 618 return sig; 610 619 } 611 620 ··· 617 626 int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type) 618 627 { 619 628 struct task_struct *tsk = current; 620 - bool resched_timer = false; 629 + struct sigqueue *timer_sigq; 621 630 int signr; 622 631 623 632 lockdep_assert_held(&tsk->sighand->siglock); 624 633 634 + again: 625 635 *type = PIDTYPE_PID; 626 - signr = __dequeue_signal(&tsk->pending, mask, info, &resched_timer); 636 + timer_sigq = NULL; 637 + signr = __dequeue_signal(&tsk->pending, mask, info, &timer_sigq); 627 638 if (!signr) { 628 639 *type = PIDTYPE_TGID; 629 640 signr = __dequeue_signal(&tsk->signal->shared_pending, 630 - mask, info, &resched_timer); 631 - #ifdef CONFIG_POSIX_TIMERS 632 - /* 633 - * itimer signal ? 634 - * 635 - * itimers are process shared and we restart periodic 636 - * itimers in the signal delivery path to prevent DoS 637 - * attacks in the high resolution timer case. This is 638 - * compliant with the old way of self-restarting 639 - * itimers, as the SIGALRM is a legacy signal and only 640 - * queued once. Changing the restart behaviour to 641 - * restart the timer in the signal dequeue path is 642 - * reducing the timer noise on heavy loaded !highres 643 - * systems too. 
644 - */ 645 - if (unlikely(signr == SIGALRM)) { 646 - struct hrtimer *tmr = &tsk->signal->real_timer; 641 + mask, info, &timer_sigq); 647 642 648 - if (!hrtimer_is_queued(tmr) && 649 - tsk->signal->it_real_incr != 0) { 650 - hrtimer_forward(tmr, tmr->base->get_time(), 651 - tsk->signal->it_real_incr); 652 - hrtimer_restart(tmr); 653 - } 654 - } 655 - #endif 643 + if (unlikely(signr == SIGALRM)) 644 + posixtimer_rearm_itimer(tsk); 656 645 } 657 646 658 647 recalc_sigpending(); ··· 654 683 */ 655 684 current->jobctl |= JOBCTL_STOP_DEQUEUED; 656 685 } 657 - #ifdef CONFIG_POSIX_TIMERS 658 - if (resched_timer) { 659 - /* 660 - * Release the siglock to ensure proper locking order 661 - * of timer locks outside of siglocks. Note, we leave 662 - * irqs disabled here, since the posix-timers code is 663 - * about to disable them again anyway. 664 - */ 665 - spin_unlock(&tsk->sighand->siglock); 666 - posixtimer_rearm(info); 667 - spin_lock(&tsk->sighand->siglock); 668 686 669 - /* Don't expose the si_sys_private value to userspace */ 670 - info->si_sys_private = 0; 687 + if (IS_ENABLED(CONFIG_POSIX_TIMERS) && unlikely(timer_sigq)) { 688 + if (!posixtimer_deliver_signal(info, timer_sigq)) 689 + goto again; 671 690 } 672 - #endif 691 + 673 692 return signr; 674 693 } 675 694 EXPORT_SYMBOL_GPL(dequeue_signal); ··· 734 773 kick_process(t); 735 774 } 736 775 737 - /* 738 - * Remove signals in mask from the pending set and queue. 739 - * Returns 1 if any signals were found. 740 - * 741 - * All callers must be holding the siglock. 
742 - */ 743 - static void flush_sigqueue_mask(sigset_t *mask, struct sigpending *s) 776 + static inline void posixtimer_sig_ignore(struct task_struct *tsk, struct sigqueue *q); 777 + 778 + static void sigqueue_free_ignored(struct task_struct *tsk, struct sigqueue *q) 779 + { 780 + if (likely(!(q->flags & SIGQUEUE_PREALLOC) || q->info.si_code != SI_TIMER)) 781 + __sigqueue_free(q); 782 + else 783 + posixtimer_sig_ignore(tsk, q); 784 + } 785 + 786 + /* Remove signals in mask from the pending set and queue. */ 787 + static void flush_sigqueue_mask(struct task_struct *p, sigset_t *mask, struct sigpending *s) 744 788 { 745 789 struct sigqueue *q, *n; 746 790 sigset_t m; 791 + 792 + lockdep_assert_held(&p->sighand->siglock); 747 793 748 794 sigandsets(&m, mask, &s->signal); 749 795 if (sigisemptyset(&m)) ··· 760 792 list_for_each_entry_safe(q, n, &s->list, list) { 761 793 if (sigismember(mask, q->info.si_signo)) { 762 794 list_del_init(&q->list); 763 - __sigqueue_free(q); 795 + sigqueue_free_ignored(p, q); 764 796 } 765 797 } 766 798 } ··· 885 917 * This is a stop signal. Remove SIGCONT from all queues. 886 918 */ 887 919 siginitset(&flush, sigmask(SIGCONT)); 888 - flush_sigqueue_mask(&flush, &signal->shared_pending); 920 + flush_sigqueue_mask(p, &flush, &signal->shared_pending); 889 921 for_each_thread(p, t) 890 - flush_sigqueue_mask(&flush, &t->pending); 922 + flush_sigqueue_mask(p, &flush, &t->pending); 891 923 } else if (sig == SIGCONT) { 892 924 unsigned int why; 893 925 /* 894 926 * Remove all stop signals from all queues, wake all threads. 
895 927 */ 896 928 siginitset(&flush, SIG_KERNEL_STOP_MASK); 897 - flush_sigqueue_mask(&flush, &signal->shared_pending); 929 + flush_sigqueue_mask(p, &flush, &signal->shared_pending); 898 930 for_each_thread(p, t) { 899 - flush_sigqueue_mask(&flush, &t->pending); 931 + flush_sigqueue_mask(p, &flush, &t->pending); 900 932 task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING); 901 933 if (likely(!(t->ptrace & PT_SEIZED))) { 902 934 t->jobctl &= ~JOBCTL_STOPPED; ··· 1083 1115 else 1084 1116 override_rlimit = 0; 1085 1117 1086 - q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit, 0); 1118 + q = sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit); 1087 1119 1088 1120 if (q) { 1089 1121 list_add_tail(&q->list, &pending->list); ··· 1891 1923 } 1892 1924 EXPORT_SYMBOL(kill_pid); 1893 1925 1926 + #ifdef CONFIG_POSIX_TIMERS 1894 1927 /* 1895 - * These functions support sending signals using preallocated sigqueue 1896 - * structures. This is needed "because realtime applications cannot 1897 - * afford to lose notifications of asynchronous events, like timer 1898 - * expirations or I/O completions". In the case of POSIX Timers 1899 - * we allocate the sigqueue structure from the timer_create. If this 1900 - * allocation fails we are able to report the failure to the application 1901 - * with an EAGAIN error. 1928 + * These functions handle POSIX timer signals. POSIX timers use 1929 + * preallocated sigqueue structs for sending signals. 
1902 1930 */ 1903 - struct sigqueue *sigqueue_alloc(void) 1931 + static void __flush_itimer_signals(struct sigpending *pending) 1904 1932 { 1905 - return __sigqueue_alloc(-1, current, GFP_KERNEL, 0, SIGQUEUE_PREALLOC); 1906 - } 1933 + sigset_t signal, retain; 1934 + struct sigqueue *q, *n; 1907 1935 1908 - void sigqueue_free(struct sigqueue *q) 1909 - { 1910 - spinlock_t *lock = &current->sighand->siglock; 1911 - unsigned long flags; 1936 + signal = pending->signal; 1937 + sigemptyset(&retain); 1912 1938 1913 - if (WARN_ON_ONCE(!(q->flags & SIGQUEUE_PREALLOC))) 1914 - return; 1915 - /* 1916 - * We must hold ->siglock while testing q->list 1917 - * to serialize with collect_signal() or with 1918 - * __exit_signal()->flush_sigqueue(). 1919 - */ 1920 - spin_lock_irqsave(lock, flags); 1921 - q->flags &= ~SIGQUEUE_PREALLOC; 1922 - /* 1923 - * If it is queued it will be freed when dequeued, 1924 - * like the "regular" sigqueue. 1925 - */ 1926 - if (!list_empty(&q->list)) 1927 - q = NULL; 1928 - spin_unlock_irqrestore(lock, flags); 1939 + list_for_each_entry_safe(q, n, &pending->list, list) { 1940 + int sig = q->info.si_signo; 1929 1941 1930 - if (q) 1931 - __sigqueue_free(q); 1932 - } 1933 - 1934 - int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) 1935 - { 1936 - int sig = q->info.si_signo; 1937 - struct sigpending *pending; 1938 - struct task_struct *t; 1939 - unsigned long flags; 1940 - int ret, result; 1941 - 1942 - if (WARN_ON_ONCE(!(q->flags & SIGQUEUE_PREALLOC))) 1943 - return 0; 1944 - if (WARN_ON_ONCE(q->info.si_code != SI_TIMER)) 1945 - return 0; 1946 - 1947 - ret = -1; 1948 - rcu_read_lock(); 1949 - 1950 - /* 1951 - * This function is used by POSIX timers to deliver a timer signal. 1952 - * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID 1953 - * set), the signal must be delivered to the specific thread (queues 1954 - * into t->pending). 
1955 - * 1956 - * Where type is not PIDTYPE_PID, signals must be delivered to the 1957 - * process. In this case, prefer to deliver to current if it is in 1958 - * the same thread group as the target process, which avoids 1959 - * unnecessarily waking up a potentially idle task. 1960 - */ 1961 - t = pid_task(pid, type); 1962 - if (!t) 1963 - goto ret; 1964 - if (type != PIDTYPE_PID && same_thread_group(t, current)) 1965 - t = current; 1966 - if (!likely(lock_task_sighand(t, &flags))) 1967 - goto ret; 1968 - 1969 - ret = 1; /* the signal is ignored */ 1970 - result = TRACE_SIGNAL_IGNORED; 1971 - if (!prepare_signal(sig, t, false)) 1972 - goto out; 1973 - 1974 - ret = 0; 1975 - if (unlikely(!list_empty(&q->list))) { 1976 - /* 1977 - * If an SI_TIMER entry is already queue just increment 1978 - * the overrun count. 1979 - */ 1980 - q->info.si_overrun++; 1981 - result = TRACE_SIGNAL_ALREADY_PENDING; 1982 - goto out; 1942 + if (likely(q->info.si_code != SI_TIMER)) { 1943 + sigaddset(&retain, sig); 1944 + } else { 1945 + sigdelset(&signal, sig); 1946 + list_del_init(&q->list); 1947 + __sigqueue_free(q); 1948 + } 1983 1949 } 1984 - q->info.si_overrun = 0; 1950 + 1951 + sigorsets(&pending->signal, &signal, &retain); 1952 + } 1953 + 1954 + void flush_itimer_signals(void) 1955 + { 1956 + struct task_struct *tsk = current; 1957 + 1958 + guard(spinlock_irqsave)(&tsk->sighand->siglock); 1959 + __flush_itimer_signals(&tsk->pending); 1960 + __flush_itimer_signals(&tsk->signal->shared_pending); 1961 + } 1962 + 1963 + bool posixtimer_init_sigqueue(struct sigqueue *q) 1964 + { 1965 + struct ucounts *ucounts = sig_get_ucounts(current, -1, 0); 1966 + 1967 + if (!ucounts) 1968 + return false; 1969 + clear_siginfo(&q->info); 1970 + __sigqueue_init(q, ucounts, SIGQUEUE_PREALLOC); 1971 + return true; 1972 + } 1973 + 1974 + static void posixtimer_queue_sigqueue(struct sigqueue *q, struct task_struct *t, enum pid_type type) 1975 + { 1976 + struct sigpending *pending; 1977 + int sig = 
q->info.si_signo; 1985 1978 1986 1979 signalfd_notify(t, sig); 1987 1980 pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending; 1988 1981 list_add_tail(&q->list, &pending->list); 1989 1982 sigaddset(&pending->signal, sig); 1990 1983 complete_signal(sig, t, type); 1984 + } 1985 + 1986 + /* 1987 + * This function is used by POSIX timers to deliver a timer signal. 1988 + * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID 1989 + * set), the signal must be delivered to the specific thread (queues 1990 + * into t->pending). 1991 + * 1992 + * Where type is not PIDTYPE_PID, signals must be delivered to the 1993 + * process. In this case, prefer to deliver to current if it is in 1994 + * the same thread group as the target process, which avoids 1995 + * unnecessarily waking up a potentially idle task. 1996 + */ 1997 + static inline struct task_struct *posixtimer_get_target(struct k_itimer *tmr) 1998 + { 1999 + struct task_struct *t = pid_task(tmr->it_pid, tmr->it_pid_type); 2000 + 2001 + if (t && tmr->it_pid_type != PIDTYPE_PID && same_thread_group(t, current)) 2002 + t = current; 2003 + return t; 2004 + } 2005 + 2006 + void posixtimer_send_sigqueue(struct k_itimer *tmr) 2007 + { 2008 + struct sigqueue *q = &tmr->sigq; 2009 + int sig = q->info.si_signo; 2010 + struct task_struct *t; 2011 + unsigned long flags; 2012 + int result; 2013 + 2014 + guard(rcu)(); 2015 + 2016 + t = posixtimer_get_target(tmr); 2017 + if (!t) 2018 + return; 2019 + 2020 + if (!likely(lock_task_sighand(t, &flags))) 2021 + return; 2022 + 2023 + /* 2024 + * Update @tmr::sigqueue_seq for posix timer signals with sighand 2025 + * locked to prevent a race against dequeue_signal(). 2026 + */ 2027 + tmr->it_sigqueue_seq = tmr->it_signal_seq; 2028 + 2029 + /* 2030 + * Set the signal delivery status under sighand lock, so that the 2031 + * ignored signal handling can distinguish between a periodic and a 2032 + * non-periodic timer. 
2033 + */ 2034 + tmr->it_sig_periodic = tmr->it_status == POSIX_TIMER_REQUEUE_PENDING; 2035 + 2036 + if (!prepare_signal(sig, t, false)) { 2037 + result = TRACE_SIGNAL_IGNORED; 2038 + 2039 + if (!list_empty(&q->list)) { 2040 + /* 2041 + * If task group is exiting with the signal already pending, 2042 + * wait for __exit_signal() to do its job. Otherwise if 2043 + * ignored, it's not supposed to be queued. Try to survive. 2044 + */ 2045 + WARN_ON_ONCE(!(t->signal->flags & SIGNAL_GROUP_EXIT)); 2046 + goto out; 2047 + } 2048 + 2049 + /* Periodic timers with SIG_IGN are queued on the ignored list */ 2050 + if (tmr->it_sig_periodic) { 2051 + /* 2052 + * Already queued means the timer was rearmed after 2053 + * the previous expiry got it on the ignore list. 2054 + * Nothing to do for that case. 2055 + */ 2056 + if (hlist_unhashed(&tmr->ignored_list)) { 2057 + /* 2058 + * Take a signal reference and queue it on 2059 + * the ignored list. 2060 + */ 2061 + posixtimer_sigqueue_getref(q); 2062 + posixtimer_sig_ignore(t, q); 2063 + } 2064 + } else if (!hlist_unhashed(&tmr->ignored_list)) { 2065 + /* 2066 + * Covers the case where a timer was periodic and 2067 + * then the signal was ignored. Later it was rearmed 2068 + * as oneshot timer. The previous signal is invalid 2069 + * now, and this oneshot signal has to be dropped. 2070 + * Remove it from the ignored list and drop the 2071 + * reference count as the signal is no longer 2072 + * queued. 
2073 + */ 2074 + hlist_del_init(&tmr->ignored_list); 2075 + posixtimer_putref(tmr); 2076 + } 2077 + goto out; 2078 + } 2079 + 2080 + /* This should never happen and leaks a reference count */ 2081 + if (WARN_ON_ONCE(!hlist_unhashed(&tmr->ignored_list))) 2082 + hlist_del_init(&tmr->ignored_list); 2083 + 2084 + if (unlikely(!list_empty(&q->list))) { 2085 + /* This holds a reference count already */ 2086 + result = TRACE_SIGNAL_ALREADY_PENDING; 2087 + goto out; 2088 + } 2089 + 2090 + posixtimer_sigqueue_getref(q); 2091 + posixtimer_queue_sigqueue(q, t, tmr->it_pid_type); 1991 2092 result = TRACE_SIGNAL_DELIVERED; 1992 2093 out: 1993 - trace_signal_generate(sig, &q->info, t, type != PIDTYPE_PID, result); 2094 + trace_signal_generate(sig, &q->info, t, tmr->it_pid_type != PIDTYPE_PID, result); 1994 2095 unlock_task_sighand(t, &flags); 1995 - ret: 1996 - rcu_read_unlock(); 1997 - return ret; 1998 2096 } 2097 + 2098 + static inline void posixtimer_sig_ignore(struct task_struct *tsk, struct sigqueue *q) 2099 + { 2100 + struct k_itimer *tmr = container_of(q, struct k_itimer, sigq); 2101 + 2102 + /* 2103 + * If the timer is marked deleted already or the signal originates 2104 + * from a non-periodic timer, then just drop the reference 2105 + * count. Otherwise queue it on the ignored list. 2106 + */ 2107 + if (tmr->it_signal && tmr->it_sig_periodic) 2108 + hlist_add_head(&tmr->ignored_list, &tsk->signal->ignored_posix_timers); 2109 + else 2110 + posixtimer_putref(tmr); 2111 + } 2112 + 2113 + static void posixtimer_sig_unignore(struct task_struct *tsk, int sig) 2114 + { 2115 + struct hlist_head *head = &tsk->signal->ignored_posix_timers; 2116 + struct hlist_node *tmp; 2117 + struct k_itimer *tmr; 2118 + 2119 + if (likely(hlist_empty(head))) 2120 + return; 2121 + 2122 + /* 2123 + * Rearming a timer with sighand lock held is not possible due to 2124 + * lock ordering vs. tmr::it_lock. 
Just stick the sigqueue back and 2125 + * let the signal delivery path deal with it whether it needs to be 2126 + * rearmed or not. This cannot be decided here w/o dropping sighand 2127 + * lock and creating a loop retry horror show. 2128 + */ 2129 + hlist_for_each_entry_safe(tmr, tmp , head, ignored_list) { 2130 + struct task_struct *target; 2131 + 2132 + /* 2133 + * tmr::sigq.info.si_signo is immutable, so accessing it 2134 + * without holding tmr::it_lock is safe. 2135 + */ 2136 + if (tmr->sigq.info.si_signo != sig) 2137 + continue; 2138 + 2139 + hlist_del_init(&tmr->ignored_list); 2140 + 2141 + /* This should never happen and leaks a reference count */ 2142 + if (WARN_ON_ONCE(!list_empty(&tmr->sigq.list))) 2143 + continue; 2144 + 2145 + /* 2146 + * Get the target for the signal. If target is a thread and 2147 + * has exited by now, drop the reference count. 2148 + */ 2149 + guard(rcu)(); 2150 + target = posixtimer_get_target(tmr); 2151 + if (target) 2152 + posixtimer_queue_sigqueue(&tmr->sigq, target, tmr->it_pid_type); 2153 + else 2154 + posixtimer_putref(tmr); 2155 + } 2156 + } 2157 + #else /* CONFIG_POSIX_TIMERS */ 2158 + static inline void posixtimer_sig_ignore(struct task_struct *tsk, struct sigqueue *q) { } 2159 + static inline void posixtimer_sig_unignore(struct task_struct *tsk, int sig) { } 2160 + #endif /* !CONFIG_POSIX_TIMERS */ 1999 2161 2000 2162 void do_notify_pidfd(struct task_struct *task) 2001 2163 { ··· 4243 4145 sigemptyset(&mask); 4244 4146 sigaddset(&mask, sig); 4245 4147 4246 - flush_sigqueue_mask(&mask, &current->signal->shared_pending); 4247 - flush_sigqueue_mask(&mask, &current->pending); 4148 + flush_sigqueue_mask(current, &mask, &current->signal->shared_pending); 4149 + flush_sigqueue_mask(current, &mask, &current->pending); 4248 4150 recalc_sigpending(); 4249 4151 } 4250 4152 spin_unlock_irq(&current->sighand->siglock); ··· 4294 4196 sigaction_compat_abi(act, oact); 4295 4197 4296 4198 if (act) { 4199 + bool was_ignored = 
k->sa.sa_handler == SIG_IGN; 4200 + 4297 4201 sigdelsetmask(&act->sa.sa_mask, 4298 4202 sigmask(SIGKILL) | sigmask(SIGSTOP)); 4299 4203 *k = *act; ··· 4313 4213 if (sig_handler_ignored(sig_handler(p, sig), sig)) { 4314 4214 sigemptyset(&mask); 4315 4215 sigaddset(&mask, sig); 4316 - flush_sigqueue_mask(&mask, &p->signal->shared_pending); 4216 + flush_sigqueue_mask(p, &mask, &p->signal->shared_pending); 4317 4217 for_each_thread(p, t) 4318 - flush_sigqueue_mask(&mask, &t->pending); 4218 + flush_sigqueue_mask(p, &mask, &t->pending); 4219 + } else if (was_ignored) { 4220 + posixtimer_sig_unignore(p, sig); 4319 4221 } 4320 4222 } 4321 4223
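The signal.c changes above rely on the sigqueue now being embedded in the timer struct, so the delivery and ignore paths (e.g. `posixtimer_sig_ignore()`) can map a queue entry back to its timer with a plain `container_of()` instead of a lookup. A minimal userspace sketch of that idiom, with toy stand-in types (not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-creation of the kernel's container_of() idiom: given a
 * pointer to a member embedded in a struct, recover the enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy stand-ins for struct sigqueue embedded in struct k_itimer. */
struct toy_sigqueue { int si_signo; };

struct toy_k_itimer {
	int id;
	struct toy_sigqueue sigq;	/* embedded, not pointed-to */
};

/* With the sigqueue embedded, the queue entry maps back to its timer
 * without any lookup, and both share one lifetime by construction. */
static struct toy_k_itimer *timer_of(struct toy_sigqueue *q)
{
	return container_of(q, struct toy_k_itimer, sigq);
}
```

Because the member is embedded, the resulting pointer is valid exactly as long as the timer itself, which is the lifetime property the changelog calls "an always valid container_of()".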
-5
kernel/time/Kconfig
··· 17 17 config ARCH_CLOCKSOURCE_INIT 18 18 bool 19 19 20 - # Clocksources require validation of the clocksource against the last 21 - # cycle update - x86/TSC misfeature 22 - config CLOCKSOURCE_VALIDATE_LAST_CYCLE 23 - bool 24 - 25 20 # Timekeeping vsyscall support 26 21 config GENERIC_TIME_VSYSCALL 27 22 bool
+1 -1
kernel/time/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0 2 - obj-y += time.o timer.o hrtimer.o 2 + obj-y += time.o timer.o hrtimer.o sleep_timeout.o 3 3 obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o 4 4 obj-y += timeconv.o timecounter.o alarmtimer.o 5 5
+19 -79
kernel/time/alarmtimer.c
··· 197 197 { 198 198 struct alarm *alarm = container_of(timer, struct alarm, timer); 199 199 struct alarm_base *base = &alarm_bases[alarm->type]; 200 - unsigned long flags; 201 - int ret = HRTIMER_NORESTART; 202 - int restart = ALARMTIMER_NORESTART; 203 200 204 - spin_lock_irqsave(&base->lock, flags); 205 - alarmtimer_dequeue(base, alarm); 206 - spin_unlock_irqrestore(&base->lock, flags); 201 + scoped_guard (spinlock_irqsave, &base->lock) 202 + alarmtimer_dequeue(base, alarm); 207 203 208 204 if (alarm->function) 209 - restart = alarm->function(alarm, base->get_ktime()); 210 - 211 - spin_lock_irqsave(&base->lock, flags); 212 - if (restart != ALARMTIMER_NORESTART) { 213 - hrtimer_set_expires(&alarm->timer, alarm->node.expires); 214 - alarmtimer_enqueue(base, alarm); 215 - ret = HRTIMER_RESTART; 216 - } 217 - spin_unlock_irqrestore(&base->lock, flags); 205 + alarm->function(alarm, base->get_ktime()); 218 206 219 207 trace_alarmtimer_fired(alarm, base->get_ktime()); 220 - return ret; 221 - 208 + return HRTIMER_NORESTART; 222 209 } 223 210 224 211 ktime_t alarm_expires_remaining(const struct alarm *alarm) ··· 321 334 322 335 static void 323 336 __alarm_init(struct alarm *alarm, enum alarmtimer_type type, 324 - enum alarmtimer_restart (*function)(struct alarm *, ktime_t)) 337 + void (*function)(struct alarm *, ktime_t)) 325 338 { 326 339 timerqueue_init(&alarm->node); 327 - alarm->timer.function = alarmtimer_fired; 328 340 alarm->function = function; 329 341 alarm->type = type; 330 342 alarm->state = ALARMTIMER_STATE_INACTIVE; ··· 336 350 * @function: callback that is run when the alarm fires 337 351 */ 338 352 void alarm_init(struct alarm *alarm, enum alarmtimer_type type, 339 - enum alarmtimer_restart (*function)(struct alarm *, ktime_t)) 353 + void (*function)(struct alarm *, ktime_t)) 340 354 { 341 - hrtimer_init(&alarm->timer, alarm_bases[type].base_clockid, 342 - HRTIMER_MODE_ABS); 355 + hrtimer_setup(&alarm->timer, alarmtimer_fired, 
alarm_bases[type].base_clockid, 356 + HRTIMER_MODE_ABS); 343 357 __alarm_init(alarm, type, function); 344 358 } 345 359 EXPORT_SYMBOL_GPL(alarm_init); ··· 466 480 } 467 481 EXPORT_SYMBOL_GPL(alarm_forward); 468 482 469 - static u64 __alarm_forward_now(struct alarm *alarm, ktime_t interval, bool throttle) 470 - { 471 - struct alarm_base *base = &alarm_bases[alarm->type]; 472 - ktime_t now = base->get_ktime(); 473 - 474 - if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS) && throttle) { 475 - /* 476 - * Same issue as with posix_timer_fn(). Timers which are 477 - * periodic but the signal is ignored can starve the system 478 - * with a very small interval. The real fix which was 479 - * promised in the context of posix_timer_fn() never 480 - * materialized, but someone should really work on it. 481 - * 482 - * To prevent DOS fake @now to be 1 jiffy out which keeps 483 - * the overrun accounting correct but creates an 484 - * inconsistency vs. timer_gettime(2). 485 - */ 486 - ktime_t kj = NSEC_PER_SEC / HZ; 487 - 488 - if (interval < kj) 489 - now = ktime_add(now, kj); 490 - } 491 - 492 - return alarm_forward(alarm, now, interval); 493 - } 494 - 495 483 u64 alarm_forward_now(struct alarm *alarm, ktime_t interval) 496 484 { 497 - return __alarm_forward_now(alarm, interval, false); 485 + struct alarm_base *base = &alarm_bases[alarm->type]; 486 + 487 + return alarm_forward(alarm, base->get_ktime(), interval); 498 488 } 499 489 EXPORT_SYMBOL_GPL(alarm_forward_now); 500 490 ··· 529 567 * 530 568 * Return: whether the timer is to be restarted 531 569 */ 532 - static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm, 533 - ktime_t now) 570 + static void alarm_handle_timer(struct alarm *alarm, ktime_t now) 534 571 { 535 - struct k_itimer *ptr = container_of(alarm, struct k_itimer, 536 - it.alarm.alarmtimer); 537 - enum alarmtimer_restart result = ALARMTIMER_NORESTART; 538 - unsigned long flags; 572 + struct k_itimer *ptr = container_of(alarm, struct k_itimer, 
it.alarm.alarmtimer); 539 573 540 - spin_lock_irqsave(&ptr->it_lock, flags); 541 - 542 - if (posix_timer_queue_signal(ptr) && ptr->it_interval) { 543 - /* 544 - * Handle ignored signals and rearm the timer. This will go 545 - * away once we handle ignored signals proper. Ensure that 546 - * small intervals cannot starve the system. 547 - */ 548 - ptr->it_overrun += __alarm_forward_now(alarm, ptr->it_interval, true); 549 - ++ptr->it_requeue_pending; 550 - ptr->it_active = 1; 551 - result = ALARMTIMER_RESTART; 552 - } 553 - spin_unlock_irqrestore(&ptr->it_lock, flags); 554 - 555 - return result; 574 + guard(spinlock_irqsave)(&ptr->it_lock); 575 + posix_timer_queue_signal(ptr); 556 576 } 557 577 558 578 /** ··· 695 751 * @now: time at the timer expiration 696 752 * 697 753 * Wakes up the task that set the alarmtimer 698 - * 699 - * Return: ALARMTIMER_NORESTART 700 754 */ 701 - static enum alarmtimer_restart alarmtimer_nsleep_wakeup(struct alarm *alarm, 702 - ktime_t now) 755 + static void alarmtimer_nsleep_wakeup(struct alarm *alarm, ktime_t now) 703 756 { 704 757 struct task_struct *task = alarm->data; 705 758 706 759 alarm->data = NULL; 707 760 if (task) 708 761 wake_up_process(task); 709 - return ALARMTIMER_NORESTART; 710 762 } 711 763 712 764 /** ··· 754 814 755 815 static void 756 816 alarm_init_on_stack(struct alarm *alarm, enum alarmtimer_type type, 757 - enum alarmtimer_restart (*function)(struct alarm *, ktime_t)) 817 + void (*function)(struct alarm *, ktime_t)) 758 818 { 759 - hrtimer_init_on_stack(&alarm->timer, alarm_bases[type].base_clockid, 760 - HRTIMER_MODE_ABS); 819 + hrtimer_setup_on_stack(&alarm->timer, alarmtimer_fired, alarm_bases[type].base_clockid, 820 + HRTIMER_MODE_ABS); 761 821 __alarm_init(alarm, type, function); 762 822 } 763 823
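The alarmtimer.c conversion replaces manual `spin_lock_irqsave()`/`spin_unlock_irqrestore()` pairs with `guard()`/`scoped_guard()` from `<linux/cleanup.h>`, which release the lock automatically when the scope ends. A userspace sketch of the underlying mechanism, the GNU C `cleanup` attribute, with invented names (this is not the kernel's implementation):

```c
#include <assert.h>

/* Sketch of the guard() pattern: the compiler runs the cleanup handler
 * when the guard variable leaves scope, so an early return cannot leak
 * the lock. All names here are invented for illustration. */
struct toy_lock { int held; };

static int unlock_count;

static void toy_unlock(struct toy_lock **lk)
{
	(*lk)->held = 0;
	unlock_count++;
}

#define toy_guard(lk) \
	struct toy_lock *__guard __attribute__((cleanup(toy_unlock))) = \
		((lk)->held = 1, (lk))

static struct toy_lock demo_lock;

static int protected_increment(int *val)
{
	toy_guard(&demo_lock);

	if (*val < 0)
		return -1;	/* early return still runs toy_unlock() */
	return ++(*val);
}
```

This is why the converted `alarm_handle_timer()` above can simply `return` without an explicit unlock on any path.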
+21 -21
kernel/time/clockevents.c
··· 337 337 } 338 338 339 339 /* 340 - * Called after a notify add to make devices available which were 341 - * released from the notifier call. 340 + * Called after a clockevent has been added which might 341 + * have replaced a current regular or broadcast device. A 342 + * released normal device might be a suitable replacement 343 + * for the current broadcast device. Similarly a released 344 + * broadcast device might be a suitable replacement for a 345 + * normal device. 342 346 */ 343 347 static void clockevents_notify_released(void) 344 348 { 345 349 struct clock_event_device *dev; 346 350 351 + /* 352 + * Keep iterating as long as tick_check_new_device() 353 + * replaces a device. 354 + */ 347 355 while (!list_empty(&clockevents_released)) { 348 356 dev = list_entry(clockevents_released.next, 349 357 struct clock_event_device, list); ··· 618 610 619 611 #ifdef CONFIG_HOTPLUG_CPU 620 612 621 - # ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST 622 613 /** 623 - * tick_offline_cpu - Take CPU out of the broadcast mechanism 614 + * tick_offline_cpu - Shutdown all clock events related 615 + * to this CPU and take it out of the 616 + * broadcast mechanism. 624 617 * @cpu: The outgoing CPU 625 618 * 626 - * Called on the outgoing CPU after it took itself offline. 619 + * Called by the dying CPU during teardown. 
627 620 */ 628 621 void tick_offline_cpu(unsigned int cpu) 629 622 { 630 - raw_spin_lock(&clockevents_lock); 631 - tick_broadcast_offline(cpu); 632 - raw_spin_unlock(&clockevents_lock); 633 - } 634 - # endif 635 - 636 - /** 637 - * tick_cleanup_dead_cpu - Cleanup the tick and clockevents of a dead cpu 638 - * @cpu: The dead CPU 639 - */ 640 - void tick_cleanup_dead_cpu(int cpu) 641 - { 642 623 struct clock_event_device *dev, *tmp; 643 - unsigned long flags; 644 624 645 - raw_spin_lock_irqsave(&clockevents_lock, flags); 625 + raw_spin_lock(&clockevents_lock); 646 626 627 + tick_broadcast_offline(cpu); 647 628 tick_shutdown(cpu); 629 + 648 630 /* 649 631 * Unregister the clock event devices which were 650 - * released from the users in the notify chain. 632 + * released above. 651 633 */ 652 634 list_for_each_entry_safe(dev, tmp, &clockevents_released, list) 653 635 list_del(&dev->list); 636 + 654 637 /* 655 638 * Now check whether the CPU has left unused per cpu devices 656 639 */ ··· 653 654 list_del(&dev->list); 654 655 } 655 656 } 656 - raw_spin_unlock_irqrestore(&clockevents_lock, flags); 657 + 658 + raw_spin_unlock(&clockevents_lock); 657 659 } 658 660 #endif 659 661
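The `clockevents_notify_released()` comment above describes a classic work-list drain: keep popping the head while the list is non-empty, because handling one entry may put other entries back on the list. A minimal userspace re-creation of the `<linux/list.h>` primitives that loop depends on (simplified; not the kernel's code):

```c
#include <assert.h>

/* Minimal circular doubly-linked list in the style of <linux/list.h>.
 * An empty list (or a detached node after list_del_init()) points at
 * itself, which is what makes the "while (!list_empty(...))" drain
 * loop in clockevents_notify_released() terminate cleanly. */
struct list_head { struct list_head *next, *prev; };

#define LIST_HEAD_INIT(name) { &(name), &(name) }

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del_init(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node;	/* node now reads as an empty list */
	node->prev = node;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

The self-pointing invariant after `list_del_init()` is also what lets the signal code above test `list_empty(&q->list)` to ask "is this sigqueue currently queued anywhere?".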
+10 -30
kernel/time/clocksource.c
··· 20 20 #include "tick-internal.h" 21 21 #include "timekeeping_internal.h" 22 22 23 + static void clocksource_enqueue(struct clocksource *cs); 24 + 23 25 static noinline u64 cycles_to_nsec_safe(struct clocksource *cs, u64 start, u64 end) 24 26 { 25 27 u64 delta = clocksource_delta(end, start, cs->mask); ··· 173 171 } 174 172 175 173 static int clocksource_watchdog_kthread(void *data); 176 - static void __clocksource_change_rating(struct clocksource *cs, int rating); 177 174 178 175 static void clocksource_watchdog_work(struct work_struct *work) 179 176 { ··· 190 189 * watchdog_list will find the unstable clock again. 191 190 */ 192 191 kthread_run(clocksource_watchdog_kthread, NULL, "kwatchdog"); 192 + } 193 + 194 + static void clocksource_change_rating(struct clocksource *cs, int rating) 195 + { 196 + list_del(&cs->list); 197 + cs->rating = rating; 198 + clocksource_enqueue(cs); 193 199 } 194 200 195 201 static void __clocksource_unstable(struct clocksource *cs) ··· 705 697 list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) { 706 698 if (cs->flags & CLOCK_SOURCE_UNSTABLE) { 707 699 list_del_init(&cs->wd_list); 708 - __clocksource_change_rating(cs, 0); 700 + clocksource_change_rating(cs, 0); 709 701 select = 1; 710 702 } 711 703 if (cs->flags & CLOCK_SOURCE_RESELECT) { ··· 1262 1254 return 0; 1263 1255 } 1264 1256 EXPORT_SYMBOL_GPL(__clocksource_register_scale); 1265 - 1266 - static void __clocksource_change_rating(struct clocksource *cs, int rating) 1267 - { 1268 - list_del(&cs->list); 1269 - cs->rating = rating; 1270 - clocksource_enqueue(cs); 1271 - } 1272 - 1273 - /** 1274 - * clocksource_change_rating - Change the rating of a registered clocksource 1275 - * @cs: clocksource to be changed 1276 - * @rating: new rating 1277 - */ 1278 - void clocksource_change_rating(struct clocksource *cs, int rating) 1279 - { 1280 - unsigned long flags; 1281 - 1282 - mutex_lock(&clocksource_mutex); 1283 - clocksource_watchdog_lock(&flags); 1284 - 
__clocksource_change_rating(cs, rating); 1285 - clocksource_watchdog_unlock(&flags); 1286 - 1287 - clocksource_select(); 1288 - clocksource_select_watchdog(false); 1289 - clocksource_suspend_select(false); 1290 - mutex_unlock(&clocksource_mutex); 1291 - } 1292 - EXPORT_SYMBOL(clocksource_change_rating); 1293 1257 1294 1258 /* 1295 1259 * Unbind clocksource @cs. Called with clocksource_mutex held
+78 -152
kernel/time/hrtimer.c
··· 417 417 debug_object_init(timer, &hrtimer_debug_descr); 418 418 } 419 419 420 + static inline void debug_hrtimer_init_on_stack(struct hrtimer *timer) 421 + { 422 + debug_object_init_on_stack(timer, &hrtimer_debug_descr); 423 + } 424 + 420 425 static inline void debug_hrtimer_activate(struct hrtimer *timer, 421 426 enum hrtimer_mode mode) 422 427 { ··· 433 428 debug_object_deactivate(timer, &hrtimer_debug_descr); 434 429 } 435 430 436 - static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, 437 - enum hrtimer_mode mode); 438 - 439 - void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t clock_id, 440 - enum hrtimer_mode mode) 441 - { 442 - debug_object_init_on_stack(timer, &hrtimer_debug_descr); 443 - __hrtimer_init(timer, clock_id, mode); 444 - } 445 - EXPORT_SYMBOL_GPL(hrtimer_init_on_stack); 446 - 447 - static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl, 448 - clockid_t clock_id, enum hrtimer_mode mode); 449 - 450 - void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, 451 - clockid_t clock_id, enum hrtimer_mode mode) 452 - { 453 - debug_object_init_on_stack(&sl->timer, &hrtimer_debug_descr); 454 - __hrtimer_init_sleeper(sl, clock_id, mode); 455 - } 456 - EXPORT_SYMBOL_GPL(hrtimer_init_sleeper_on_stack); 457 - 458 431 void destroy_hrtimer_on_stack(struct hrtimer *timer) 459 432 { 460 433 debug_object_free(timer, &hrtimer_debug_descr); ··· 442 459 #else 443 460 444 461 static inline void debug_hrtimer_init(struct hrtimer *timer) { } 462 + static inline void debug_hrtimer_init_on_stack(struct hrtimer *timer) { } 445 463 static inline void debug_hrtimer_activate(struct hrtimer *timer, 446 464 enum hrtimer_mode mode) { } 447 465 static inline void debug_hrtimer_deactivate(struct hrtimer *timer) { } ··· 453 469 enum hrtimer_mode mode) 454 470 { 455 471 debug_hrtimer_init(timer); 472 + trace_hrtimer_init(timer, clockid, mode); 473 + } 474 + 475 + static inline void debug_init_on_stack(struct hrtimer *timer, clockid_t clockid, 
476 + enum hrtimer_mode mode) 477 + { 478 + debug_hrtimer_init_on_stack(timer); 456 479 trace_hrtimer_init(timer, clockid, mode); 457 480 } 458 481 ··· 1535 1544 return HRTIMER_BASE_MONOTONIC; 1536 1545 } 1537 1546 1547 + static enum hrtimer_restart hrtimer_dummy_timeout(struct hrtimer *unused) 1548 + { 1549 + return HRTIMER_NORESTART; 1550 + } 1551 + 1538 1552 static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, 1539 1553 enum hrtimer_mode mode) 1540 1554 { ··· 1576 1580 timerqueue_init(&timer->node); 1577 1581 } 1578 1582 1583 + static void __hrtimer_setup(struct hrtimer *timer, 1584 + enum hrtimer_restart (*function)(struct hrtimer *), 1585 + clockid_t clock_id, enum hrtimer_mode mode) 1586 + { 1587 + __hrtimer_init(timer, clock_id, mode); 1588 + 1589 + if (WARN_ON_ONCE(!function)) 1590 + timer->function = hrtimer_dummy_timeout; 1591 + else 1592 + timer->function = function; 1593 + } 1594 + 1579 1595 /** 1580 1596 * hrtimer_init - initialize a timer to the given clock 1581 1597 * @timer: the timer to be initialized ··· 1607 1599 __hrtimer_init(timer, clock_id, mode); 1608 1600 } 1609 1601 EXPORT_SYMBOL_GPL(hrtimer_init); 1602 + 1603 + /** 1604 + * hrtimer_setup - initialize a timer to the given clock 1605 + * @timer: the timer to be initialized 1606 + * @function: the callback function 1607 + * @clock_id: the clock to be used 1608 + * @mode: The modes which are relevant for initialization: 1609 + * HRTIMER_MODE_ABS, HRTIMER_MODE_REL, HRTIMER_MODE_ABS_SOFT, 1610 + * HRTIMER_MODE_REL_SOFT 1611 + * 1612 + * The PINNED variants of the above can be handed in, 1613 + * but the PINNED bit is ignored as pinning happens 1614 + * when the hrtimer is started 1615 + */ 1616 + void hrtimer_setup(struct hrtimer *timer, enum hrtimer_restart (*function)(struct hrtimer *), 1617 + clockid_t clock_id, enum hrtimer_mode mode) 1618 + { 1619 + debug_init(timer, clock_id, mode); 1620 + __hrtimer_setup(timer, function, clock_id, mode); 1621 + } 1622 + 
EXPORT_SYMBOL_GPL(hrtimer_setup); 1623 + 1624 + /** 1625 + * hrtimer_setup_on_stack - initialize a timer on stack memory 1626 + * @timer: The timer to be initialized 1627 + * @function: the callback function 1628 + * @clock_id: The clock to be used 1629 + * @mode: The timer mode 1630 + * 1631 + * Similar to hrtimer_setup(), except that this one must be used if struct hrtimer is in stack 1632 + * memory. 1633 + */ 1634 + void hrtimer_setup_on_stack(struct hrtimer *timer, 1635 + enum hrtimer_restart (*function)(struct hrtimer *), 1636 + clockid_t clock_id, enum hrtimer_mode mode) 1637 + { 1638 + debug_init_on_stack(timer, clock_id, mode); 1639 + __hrtimer_setup(timer, function, clock_id, mode); 1640 + } 1641 + EXPORT_SYMBOL_GPL(hrtimer_setup_on_stack); 1610 1642 1611 1643 /* 1612 1644 * A timer is active, when it is enqueued into the rbtree or the ··· 1992 1944 * Make the enqueue delivery mode check work on RT. If the sleeper 1993 1945 * was initialized for hard interrupt delivery, force the mode bit. 1994 1946 * This is a special case for hrtimer_sleepers because 1995 - * hrtimer_init_sleeper() determines the delivery mode on RT so the 1947 + * __hrtimer_init_sleeper() determines the delivery mode on RT so the 1996 1948 * fiddling with this decision is avoided at the call sites. 
1997 1949 */ 1998 1950 if (IS_ENABLED(CONFIG_PREEMPT_RT) && sl->timer.is_hard) ··· 2035 1987 } 2036 1988 2037 1989 /** 2038 - * hrtimer_init_sleeper - initialize sleeper to the given clock 1990 + * hrtimer_setup_sleeper_on_stack - initialize a sleeper in stack memory 2039 1991 * @sl: sleeper to be initialized 2040 1992 * @clock_id: the clock to be used 2041 1993 * @mode: timer mode abs/rel 2042 1994 */ 2043 - void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, 2044 - enum hrtimer_mode mode) 1995 + void hrtimer_setup_sleeper_on_stack(struct hrtimer_sleeper *sl, 1996 + clockid_t clock_id, enum hrtimer_mode mode) 2045 1997 { 2046 - debug_init(&sl->timer, clock_id, mode); 1998 + debug_init_on_stack(&sl->timer, clock_id, mode); 2047 1999 __hrtimer_init_sleeper(sl, clock_id, mode); 2048 - 2049 2000 } 2050 - EXPORT_SYMBOL_GPL(hrtimer_init_sleeper); 2001 + EXPORT_SYMBOL_GPL(hrtimer_setup_sleeper_on_stack); 2051 2002 2052 2003 int nanosleep_copyout(struct restart_block *restart, struct timespec64 *ts) 2053 2004 { ··· 2107 2060 struct hrtimer_sleeper t; 2108 2061 int ret; 2109 2062 2110 - hrtimer_init_sleeper_on_stack(&t, restart->nanosleep.clockid, 2111 - HRTIMER_MODE_ABS); 2063 + hrtimer_setup_sleeper_on_stack(&t, restart->nanosleep.clockid, HRTIMER_MODE_ABS); 2112 2064 hrtimer_set_expires_tv64(&t.timer, restart->nanosleep.expires); 2113 2065 ret = do_nanosleep(&t, HRTIMER_MODE_ABS); 2114 2066 destroy_hrtimer_on_stack(&t.timer); ··· 2121 2075 struct hrtimer_sleeper t; 2122 2076 int ret = 0; 2123 2077 2124 - hrtimer_init_sleeper_on_stack(&t, clockid, mode); 2078 + hrtimer_setup_sleeper_on_stack(&t, clockid, mode); 2125 2079 hrtimer_set_expires_range_ns(&t.timer, rqtp, current->timer_slack_ns); 2126 2080 ret = do_nanosleep(&t, mode); 2127 2081 if (ret != -ERESTART_RESTARTBLOCK) ··· 2288 2242 hrtimers_prepare_cpu(smp_processor_id()); 2289 2243 open_softirq(HRTIMER_SOFTIRQ, hrtimer_run_softirq); 2290 2244 } 2291 - 2292 - /** 2293 - * 
schedule_hrtimeout_range_clock - sleep until timeout 2294 - * @expires: timeout value (ktime_t) 2295 - * @delta: slack in expires timeout (ktime_t) 2296 - * @mode: timer mode 2297 - * @clock_id: timer clock to be used 2298 - */ 2299 - int __sched 2300 - schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta, 2301 - const enum hrtimer_mode mode, clockid_t clock_id) 2302 - { 2303 - struct hrtimer_sleeper t; 2304 - 2305 - /* 2306 - * Optimize when a zero timeout value is given. It does not 2307 - * matter whether this is an absolute or a relative time. 2308 - */ 2309 - if (expires && *expires == 0) { 2310 - __set_current_state(TASK_RUNNING); 2311 - return 0; 2312 - } 2313 - 2314 - /* 2315 - * A NULL parameter means "infinite" 2316 - */ 2317 - if (!expires) { 2318 - schedule(); 2319 - return -EINTR; 2320 - } 2321 - 2322 - hrtimer_init_sleeper_on_stack(&t, clock_id, mode); 2323 - hrtimer_set_expires_range_ns(&t.timer, *expires, delta); 2324 - hrtimer_sleeper_start_expires(&t, mode); 2325 - 2326 - if (likely(t.task)) 2327 - schedule(); 2328 - 2329 - hrtimer_cancel(&t.timer); 2330 - destroy_hrtimer_on_stack(&t.timer); 2331 - 2332 - __set_current_state(TASK_RUNNING); 2333 - 2334 - return !t.task ? 0 : -EINTR; 2335 - } 2336 - EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock); 2337 - 2338 - /** 2339 - * schedule_hrtimeout_range - sleep until timeout 2340 - * @expires: timeout value (ktime_t) 2341 - * @delta: slack in expires timeout (ktime_t) 2342 - * @mode: timer mode 2343 - * 2344 - * Make the current task sleep until the given expiry time has 2345 - * elapsed. The routine will return immediately unless 2346 - * the current task state has been set (see set_current_state()). 2347 - * 2348 - * The @delta argument gives the kernel the freedom to schedule the 2349 - * actual wakeup to a time that is both power and performance friendly 2350 - * for regular (non RT/DL) tasks. 
2351 - * The kernel give the normal best effort behavior for "@expires+@delta", 2352 - * but may decide to fire the timer earlier, but no earlier than @expires. 2353 - * 2354 - * You can set the task state as follows - 2355 - * 2356 - * %TASK_UNINTERRUPTIBLE - at least @timeout time is guaranteed to 2357 - * pass before the routine returns unless the current task is explicitly 2358 - * woken up, (e.g. by wake_up_process()). 2359 - * 2360 - * %TASK_INTERRUPTIBLE - the routine may return early if a signal is 2361 - * delivered to the current task or the current task is explicitly woken 2362 - * up. 2363 - * 2364 - * The current task state is guaranteed to be TASK_RUNNING when this 2365 - * routine returns. 2366 - * 2367 - * Returns 0 when the timer has expired. If the task was woken before the 2368 - * timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or 2369 - * by an explicit wakeup, it returns -EINTR. 2370 - */ 2371 - int __sched schedule_hrtimeout_range(ktime_t *expires, u64 delta, 2372 - const enum hrtimer_mode mode) 2373 - { 2374 - return schedule_hrtimeout_range_clock(expires, delta, mode, 2375 - CLOCK_MONOTONIC); 2376 - } 2377 - EXPORT_SYMBOL_GPL(schedule_hrtimeout_range); 2378 - 2379 - /** 2380 - * schedule_hrtimeout - sleep until timeout 2381 - * @expires: timeout value (ktime_t) 2382 - * @mode: timer mode 2383 - * 2384 - * Make the current task sleep until the given expiry time has 2385 - * elapsed. The routine will return immediately unless 2386 - * the current task state has been set (see set_current_state()). 2387 - * 2388 - * You can set the task state as follows - 2389 - * 2390 - * %TASK_UNINTERRUPTIBLE - at least @timeout time is guaranteed to 2391 - * pass before the routine returns unless the current task is explicitly 2392 - * woken up, (e.g. by wake_up_process()). 
2393 - * 2394 - * %TASK_INTERRUPTIBLE - the routine may return early if a signal is 2395 - * delivered to the current task or the current task is explicitly woken 2396 - * up. 2397 - * 2398 - * The current task state is guaranteed to be TASK_RUNNING when this 2399 - * routine returns. 2400 - * 2401 - * Returns 0 when the timer has expired. If the task was woken before the 2402 - * timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or 2403 - * by an explicit wakeup, it returns -EINTR. 2404 - */ 2405 - int __sched schedule_hrtimeout(ktime_t *expires, 2406 - const enum hrtimer_mode mode) 2407 - { 2408 - return schedule_hrtimeout_range(expires, 0, mode); 2409 - } 2410 - EXPORT_SYMBOL_GPL(schedule_hrtimeout);
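The new `hrtimer_setup()` above takes the callback together with the clock and mode, and substitutes `hrtimer_dummy_timeout()` (under a `WARN_ON_ONCE()`) if a NULL function is passed, so a timer can never be enqueued with a NULL callback. A toy sketch of that pattern with invented types, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the hrtimer_setup() idea: couple callback assignment with
 * initialization and fall back to a harmless dummy on NULL, instead of
 * faulting later in the expiry path. Names are invented for illustration. */
enum toy_restart { TOY_NORESTART, TOY_RESTART };

struct toy_timer {
	enum toy_restart (*function)(struct toy_timer *);
	int clock_id;
};

static enum toy_restart toy_dummy_timeout(struct toy_timer *t)
{
	(void)t;
	return TOY_NORESTART;	/* mirrors hrtimer_dummy_timeout() */
}

static void toy_timer_setup(struct toy_timer *t,
			    enum toy_restart (*fn)(struct toy_timer *),
			    int clock_id)
{
	t->clock_id = clock_id;
	/* A NULL callback is caught at setup time, not at expiry time. */
	t->function = fn ? fn : toy_dummy_timeout;
}
```

This is the design reason the separate `hrtimer_init()` + `timer->function = ...` two-step (visible in the removed `alarm_init()` hunks) could be retired.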
+21 -1
kernel/time/itimer.c
··· 151 151 #endif 152 152 153 153 /* 154 - * The timer is automagically restarted, when interval != 0 154 + * Invoked from dequeue_signal() when SIGALRM is delivered. 155 + * 156 + * Restart the ITIMER_REAL timer if it is armed as a periodic timer. Doing 157 + * this in the signal delivery path instead of self-rearming prevents a DoS 158 + * with small increments in the high resolution timer case and reduces timer 159 + * noise in general. 160 + */ 161 + void posixtimer_rearm_itimer(struct task_struct *tsk) 162 + { 163 + struct hrtimer *tmr = &tsk->signal->real_timer; 164 + 165 + if (!hrtimer_is_queued(tmr) && tsk->signal->it_real_incr != 0) { 166 + hrtimer_forward(tmr, tmr->base->get_time(), 167 + tsk->signal->it_real_incr); 168 + hrtimer_restart(tmr); 169 + } 170 + } 171 + 172 + /* 173 + * Interval timers are restarted in the signal delivery path. See 174 + * posixtimer_rearm_itimer(). 155 175 */ 156 176 enum hrtimer_restart it_real_fn(struct hrtimer *timer) 157 177 {
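`posixtimer_rearm_itimer()` above leans on `hrtimer_forward()`, which advances a periodic timer's expiry past "now" in whole intervals and returns the number of periods skipped (the overrun count). A sketch of that arithmetic with plain integers standing in for `ktime_t` (illustrative only, not the kernel's implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the arithmetic behind hrtimer_forward(): advance a periodic
 * timer's expiry by whole intervals until it lies strictly in the
 * future, and return how many periods were skipped. */
static uint64_t toy_forward(uint64_t *expires, uint64_t now, uint64_t interval)
{
	uint64_t overruns;

	if (now < *expires)
		return 0;	/* still in the future: nothing to do */

	/* One division instead of looping once per missed period. */
	overruns = (now - *expires) / interval + 1;
	*expires += overruns * interval;
	return overruns;
}
```

Rearming this way in the delivery path, rather than from the timer callback itself, is what removes the self-rearm DoS the changelog describes: a small interval no longer forces a storm of expiries while the signal sits undelivered.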
+420 -422
kernel/time/ntp.c
···
22 22 #include "ntp_internal.h"
23 23 #include "timekeeping_internal.h"
24 24 
25 - 
26 - /*
27 -  * NTP timekeeping variables:
25 + /**
26 +  * struct ntp_data - Structure holding all NTP related state
27 +  * @tick_usec:		USER_HZ period in microseconds
28 +  * @tick_length:	Adjusted tick length
29 +  * @tick_length_base:	Base value for @tick_length
30 +  * @time_state:	State of the clock synchronization
31 +  * @time_status:	Clock status bits
32 +  * @time_offset:	Time adjustment in nanoseconds
33 +  * @time_constant:	PLL time constant
34 +  * @time_maxerror:	Maximum error in microseconds holding the NTP sync distance
35 +  *			(NTP dispersion + delay / 2)
36 +  * @time_esterror:	Estimated error in microseconds holding NTP dispersion
37 +  * @time_freq:		Frequency offset scaled nsecs/secs
38 +  * @time_reftime:	Time at last adjustment in seconds
39 +  * @time_adjust:	Adjustment value
40 +  * @ntp_tick_adj:	Constant boot-param configurable NTP tick adjustment (upscaled)
41 +  * @ntp_next_leap_sec:	Second value of the next pending leapsecond, or TIME64_MAX if no leap
28 42  *
29 -  * Note: All of the NTP state is protected by the timekeeping locks.
43 +  * @pps_valid:		PPS signal watchdog counter
44 +  * @pps_tf:		PPS phase median filter
45 +  * @pps_jitter:	PPS current jitter in nanoseconds
46 +  * @pps_fbase:		PPS beginning of the last freq interval
47 +  * @pps_shift:		PPS current interval duration in seconds (shift value)
48 +  * @pps_intcnt:	PPS interval counter
49 +  * @pps_freq:		PPS frequency offset in scaled ns/s
50 +  * @pps_stabil:	PPS current stability in scaled ns/s
51 +  * @pps_calcnt:	PPS monitor: calibration intervals
52 +  * @pps_jitcnt:	PPS monitor: jitter limit exceeded
53 +  * @pps_stbcnt:	PPS monitor: stability limit exceeded
54 +  * @pps_errcnt:	PPS monitor: calibration errors
55 +  *
56 +  * Protected by the timekeeping locks.
30 57  */
58 + struct ntp_data {
59 + 	unsigned long		tick_usec;
60 + 	u64			tick_length;
61 + 	u64			tick_length_base;
62 + 	int			time_state;
63 + 	int			time_status;
64 + 	s64			time_offset;
65 + 	long			time_constant;
66 + 	long			time_maxerror;
67 + 	long			time_esterror;
68 + 	s64			time_freq;
69 + 	time64_t		time_reftime;
70 + 	long			time_adjust;
71 + 	s64			ntp_tick_adj;
72 + 	time64_t		ntp_next_leap_sec;
73 + #ifdef CONFIG_NTP_PPS
74 + 	int			pps_valid;
75 + 	long			pps_tf[3];
76 + 	long			pps_jitter;
77 + 	struct timespec64	pps_fbase;
78 + 	int			pps_shift;
79 + 	int			pps_intcnt;
80 + 	s64			pps_freq;
81 + 	long			pps_stabil;
82 + 	long			pps_calcnt;
83 + 	long			pps_jitcnt;
84 + 	long			pps_stbcnt;
85 + 	long			pps_errcnt;
86 + #endif
87 + };
31 88 
32 - 
33 - /* USER_HZ period (usecs): */
34 - unsigned long			tick_usec = USER_TICK_USEC;
35 - 
36 - /* SHIFTED_HZ period (nsecs): */
37 - unsigned long			tick_nsec;
38 - 
39 - static u64			tick_length;
40 - static u64			tick_length_base;
89 + static struct ntp_data tk_ntp_data = {
90 + 	.tick_usec		= USER_TICK_USEC,
91 + 	.time_state		= TIME_OK,
92 + 	.time_status		= STA_UNSYNC,
93 + 	.time_constant		= 2,
94 + 	.time_maxerror		= NTP_PHASE_LIMIT,
95 + 	.time_esterror		= NTP_PHASE_LIMIT,
96 + 	.ntp_next_leap_sec	= TIME64_MAX,
97 + };
41 98 
42 99 #define SECS_PER_DAY		86400
43 100 #define MAX_TICKADJ		500LL		/* usecs */
44 101 #define MAX_TICKADJ_SCALED \
45 102 	(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
46 103 #define MAX_TAI_OFFSET		100000
47 - 
48 - /*
49 -  * phase-lock loop variables
50 -  */
51 - 
52 - /*
53 -  * clock synchronization status
54 -  *
55 -  * (TIME_ERROR prevents overwriting the CMOS clock)
56 -  */
57 - static int			time_state = TIME_OK;
58 - 
59 - /* clock status bits: */
60 - static int			time_status = STA_UNSYNC;
61 - 
62 - /* time adjustment (nsecs): */
63 - static s64			time_offset;
64 - 
65 - /* pll time constant: */
66 - static long			time_constant = 2;
67 - 
68 - /* maximum error (usecs): */
69 - static long			time_maxerror = NTP_PHASE_LIMIT;
70 - 
71 - /* estimated error (usecs): */
72 - static long			time_esterror = NTP_PHASE_LIMIT;
73 - 
74 - /* frequency offset (scaled nsecs/secs): */
75 - static s64			time_freq;
76 - 
77 - /* time at last adjustment (secs): */
78 - static time64_t			time_reftime;
79 - 
80 - static long			time_adjust;
81 - 
82 - /* constant (boot-param configurable) NTP tick adjustment (upscaled) */
83 - static s64			ntp_tick_adj;
84 - 
85 - /* second value of the next pending leapsecond, or TIME64_MAX if no leap */
86 - static time64_t			ntp_next_leap_sec = TIME64_MAX;
87 104 
88 105 #ifdef CONFIG_NTP_PPS
89 106 
···
118 101 				   intervals to decrease it */
119 102 #define PPS_MAXWANDER	100000	/* max PPS freq wander (ns/s) */
120 103 
121 - static int pps_valid;		/* signal watchdog counter */
122 - static long pps_tf[3];		/* phase median filter */
123 - static long pps_jitter;		/* current jitter (ns) */
124 - static struct timespec64 pps_fbase; /* beginning of the last freq interval */
125 - static int pps_shift;		/* current interval duration (s) (shift) */
126 - static int pps_intcnt;		/* interval counter */
127 - static s64 pps_freq;		/* frequency offset (scaled ns/s) */
128 - static long pps_stabil;		/* current stability (scaled ns/s) */
129 - 
130 104 /*
131 -  * PPS signal quality monitors
132 -  */
133 - static long pps_calcnt;		/* calibration intervals */
134 - static long pps_jitcnt;		/* jitter limit exceeded */
135 - static long pps_stbcnt;		/* stability limit exceeded */
136 - static long pps_errcnt;		/* calibration errors */
137 - 
138 - 
139 - /* PPS kernel consumer compensates the whole phase error immediately.
105 +  * PPS kernel consumer compensates the whole phase error immediately.
140 106  * Otherwise, reduce the offset by a fixed factor times the time constant.
141 107 */ 142 - static inline s64 ntp_offset_chunk(s64 offset) 108 + static inline s64 ntp_offset_chunk(struct ntp_data *ntpdata, s64 offset) 143 109 { 144 - if (time_status & STA_PPSTIME && time_status & STA_PPSSIGNAL) 110 + if (ntpdata->time_status & STA_PPSTIME && ntpdata->time_status & STA_PPSSIGNAL) 145 111 return offset; 146 112 else 147 - return shift_right(offset, SHIFT_PLL + time_constant); 113 + return shift_right(offset, SHIFT_PLL + ntpdata->time_constant); 148 114 } 149 115 150 - static inline void pps_reset_freq_interval(void) 116 + static inline void pps_reset_freq_interval(struct ntp_data *ntpdata) 151 117 { 152 - /* the PPS calibration interval may end 153 - surprisingly early */ 154 - pps_shift = PPS_INTMIN; 155 - pps_intcnt = 0; 118 + /* The PPS calibration interval may end surprisingly early */ 119 + ntpdata->pps_shift = PPS_INTMIN; 120 + ntpdata->pps_intcnt = 0; 156 121 } 157 122 158 123 /** 159 124 * pps_clear - Clears the PPS state variables 125 + * @ntpdata: Pointer to ntp data 160 126 */ 161 - static inline void pps_clear(void) 127 + static inline void pps_clear(struct ntp_data *ntpdata) 162 128 { 163 - pps_reset_freq_interval(); 164 - pps_tf[0] = 0; 165 - pps_tf[1] = 0; 166 - pps_tf[2] = 0; 167 - pps_fbase.tv_sec = pps_fbase.tv_nsec = 0; 168 - pps_freq = 0; 129 + pps_reset_freq_interval(ntpdata); 130 + ntpdata->pps_tf[0] = 0; 131 + ntpdata->pps_tf[1] = 0; 132 + ntpdata->pps_tf[2] = 0; 133 + ntpdata->pps_fbase.tv_sec = ntpdata->pps_fbase.tv_nsec = 0; 134 + ntpdata->pps_freq = 0; 169 135 } 170 136 171 - /* Decrease pps_valid to indicate that another second has passed since 172 - * the last PPS signal. When it reaches 0, indicate that PPS signal is 173 - * missing. 137 + /* 138 + * Decrease pps_valid to indicate that another second has passed since the 139 + * last PPS signal. When it reaches 0, indicate that PPS signal is missing. 
174 140 */ 175 - static inline void pps_dec_valid(void) 141 + static inline void pps_dec_valid(struct ntp_data *ntpdata) 176 142 { 177 - if (pps_valid > 0) 178 - pps_valid--; 179 - else { 180 - time_status &= ~(STA_PPSSIGNAL | STA_PPSJITTER | 181 - STA_PPSWANDER | STA_PPSERROR); 182 - pps_clear(); 143 + if (ntpdata->pps_valid > 0) { 144 + ntpdata->pps_valid--; 145 + } else { 146 + ntpdata->time_status &= ~(STA_PPSSIGNAL | STA_PPSJITTER | 147 + STA_PPSWANDER | STA_PPSERROR); 148 + pps_clear(ntpdata); 183 149 } 184 150 } 185 151 186 - static inline void pps_set_freq(s64 freq) 152 + static inline void pps_set_freq(struct ntp_data *ntpdata) 187 153 { 188 - pps_freq = freq; 154 + ntpdata->pps_freq = ntpdata->time_freq; 189 155 } 190 156 191 - static inline int is_error_status(int status) 157 + static inline bool is_error_status(int status) 192 158 { 193 159 return (status & (STA_UNSYNC|STA_CLOCKERR)) 194 - /* PPS signal lost when either PPS time or 195 - * PPS frequency synchronization requested 160 + /* 161 + * PPS signal lost when either PPS time or PPS frequency 162 + * synchronization requested 196 163 */ 197 164 || ((status & (STA_PPSFREQ|STA_PPSTIME)) 198 165 && !(status & STA_PPSSIGNAL)) 199 - /* PPS jitter exceeded when 200 - * PPS time synchronization requested */ 166 + /* 167 + * PPS jitter exceeded when PPS time synchronization 168 + * requested 169 + */ 201 170 || ((status & (STA_PPSTIME|STA_PPSJITTER)) 202 171 == (STA_PPSTIME|STA_PPSJITTER)) 203 - /* PPS wander exceeded or calibration error when 204 - * PPS frequency synchronization requested 172 + /* 173 + * PPS wander exceeded or calibration error when PPS 174 + * frequency synchronization requested 205 175 */ 206 176 || ((status & STA_PPSFREQ) 207 177 && (status & (STA_PPSWANDER|STA_PPSERROR))); 208 178 } 209 179 210 - static inline void pps_fill_timex(struct __kernel_timex *txc) 180 + static inline void pps_fill_timex(struct ntp_data *ntpdata, struct __kernel_timex *txc) 211 181 { 212 - txc->ppsfreq = 
shift_right((pps_freq >> PPM_SCALE_INV_SHIFT) * 182 + txc->ppsfreq = shift_right((ntpdata->pps_freq >> PPM_SCALE_INV_SHIFT) * 213 183 PPM_SCALE_INV, NTP_SCALE_SHIFT); 214 - txc->jitter = pps_jitter; 215 - if (!(time_status & STA_NANO)) 216 - txc->jitter = pps_jitter / NSEC_PER_USEC; 217 - txc->shift = pps_shift; 218 - txc->stabil = pps_stabil; 219 - txc->jitcnt = pps_jitcnt; 220 - txc->calcnt = pps_calcnt; 221 - txc->errcnt = pps_errcnt; 222 - txc->stbcnt = pps_stbcnt; 184 + txc->jitter = ntpdata->pps_jitter; 185 + if (!(ntpdata->time_status & STA_NANO)) 186 + txc->jitter = ntpdata->pps_jitter / NSEC_PER_USEC; 187 + txc->shift = ntpdata->pps_shift; 188 + txc->stabil = ntpdata->pps_stabil; 189 + txc->jitcnt = ntpdata->pps_jitcnt; 190 + txc->calcnt = ntpdata->pps_calcnt; 191 + txc->errcnt = ntpdata->pps_errcnt; 192 + txc->stbcnt = ntpdata->pps_stbcnt; 223 193 } 224 194 225 195 #else /* !CONFIG_NTP_PPS */ 226 196 227 - static inline s64 ntp_offset_chunk(s64 offset) 197 + static inline s64 ntp_offset_chunk(struct ntp_data *ntpdata, s64 offset) 228 198 { 229 - return shift_right(offset, SHIFT_PLL + time_constant); 199 + return shift_right(offset, SHIFT_PLL + ntpdata->time_constant); 230 200 } 231 201 232 - static inline void pps_reset_freq_interval(void) {} 233 - static inline void pps_clear(void) {} 234 - static inline void pps_dec_valid(void) {} 235 - static inline void pps_set_freq(s64 freq) {} 202 + static inline void pps_reset_freq_interval(struct ntp_data *ntpdata) {} 203 + static inline void pps_clear(struct ntp_data *ntpdata) {} 204 + static inline void pps_dec_valid(struct ntp_data *ntpdata) {} 205 + static inline void pps_set_freq(struct ntp_data *ntpdata) {} 236 206 237 - static inline int is_error_status(int status) 207 + static inline bool is_error_status(int status) 238 208 { 239 209 return status & (STA_UNSYNC|STA_CLOCKERR); 240 210 } 241 211 242 - static inline void pps_fill_timex(struct __kernel_timex *txc) 212 + static inline void pps_fill_timex(struct 
ntp_data *ntpdata, struct __kernel_timex *txc) 243 213 { 244 214 /* PPS is not implemented, so these are zero */ 245 215 txc->ppsfreq = 0; ··· 241 237 242 238 #endif /* CONFIG_NTP_PPS */ 243 239 244 - 245 - /** 246 - * ntp_synced - Returns 1 if the NTP status is not UNSYNC 247 - * 248 - */ 249 - static inline int ntp_synced(void) 250 - { 251 - return !(time_status & STA_UNSYNC); 252 - } 253 - 254 - 255 240 /* 256 - * NTP methods: 241 + * Update tick_length and tick_length_base, based on tick_usec, ntp_tick_adj and 242 + * time_freq: 257 243 */ 258 - 259 - /* 260 - * Update (tick_length, tick_length_base, tick_nsec), based 261 - * on (tick_usec, ntp_tick_adj, time_freq): 262 - */ 263 - static void ntp_update_frequency(void) 244 + static void ntp_update_frequency(struct ntp_data *ntpdata) 264 245 { 265 - u64 second_length; 266 - u64 new_base; 246 + u64 second_length, new_base, tick_usec = (u64)ntpdata->tick_usec; 267 247 268 - second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) 269 - << NTP_SCALE_SHIFT; 248 + second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << NTP_SCALE_SHIFT; 270 249 271 - second_length += ntp_tick_adj; 272 - second_length += time_freq; 250 + second_length += ntpdata->ntp_tick_adj; 251 + second_length += ntpdata->time_freq; 273 252 274 - tick_nsec = div_u64(second_length, HZ) >> NTP_SCALE_SHIFT; 275 253 new_base = div_u64(second_length, NTP_INTERVAL_FREQ); 276 254 277 255 /* 278 - * Don't wait for the next second_overflow, apply 279 - * the change to the tick length immediately: 256 + * Don't wait for the next second_overflow, apply the change to the 257 + * tick length immediately: 280 258 */ 281 - tick_length += new_base - tick_length_base; 282 - tick_length_base = new_base; 259 + ntpdata->tick_length += new_base - ntpdata->tick_length_base; 260 + ntpdata->tick_length_base = new_base; 283 261 } 284 262 285 - static inline s64 ntp_update_offset_fll(s64 offset64, long secs) 263 + static inline s64 ntp_update_offset_fll(struct 
ntp_data *ntpdata, s64 offset64, long secs) 286 264 { 287 - time_status &= ~STA_MODE; 265 + ntpdata->time_status &= ~STA_MODE; 288 266 289 267 if (secs < MINSEC) 290 268 return 0; 291 269 292 - if (!(time_status & STA_FLL) && (secs <= MAXSEC)) 270 + if (!(ntpdata->time_status & STA_FLL) && (secs <= MAXSEC)) 293 271 return 0; 294 272 295 - time_status |= STA_MODE; 273 + ntpdata->time_status |= STA_MODE; 296 274 297 275 return div64_long(offset64 << (NTP_SCALE_SHIFT - SHIFT_FLL), secs); 298 276 } 299 277 300 - static void ntp_update_offset(long offset) 278 + static void ntp_update_offset(struct ntp_data *ntpdata, long offset) 301 279 { 302 - s64 freq_adj; 303 - s64 offset64; 304 - long secs; 280 + s64 freq_adj, offset64; 281 + long secs, real_secs; 305 282 306 - if (!(time_status & STA_PLL)) 283 + if (!(ntpdata->time_status & STA_PLL)) 307 284 return; 308 285 309 - if (!(time_status & STA_NANO)) { 286 + if (!(ntpdata->time_status & STA_NANO)) { 310 287 /* Make sure the multiplication below won't overflow */ 311 288 offset = clamp(offset, -USEC_PER_SEC, USEC_PER_SEC); 312 289 offset *= NSEC_PER_USEC; 313 290 } 314 291 315 - /* 316 - * Scale the phase adjustment and 317 - * clamp to the operating range. 318 - */ 292 + /* Scale the phase adjustment and clamp to the operating range. */ 319 293 offset = clamp(offset, -MAXPHASE, MAXPHASE); 320 294 321 295 /* 322 296 * Select how the frequency is to be controlled 323 297 * and in which mode (PLL or FLL). 
324 298 */ 325 - secs = (long)(__ktime_get_real_seconds() - time_reftime); 326 - if (unlikely(time_status & STA_FREQHOLD)) 299 + real_secs = __ktime_get_real_seconds(); 300 + secs = (long)(real_secs - ntpdata->time_reftime); 301 + if (unlikely(ntpdata->time_status & STA_FREQHOLD)) 327 302 secs = 0; 328 303 329 - time_reftime = __ktime_get_real_seconds(); 304 + ntpdata->time_reftime = real_secs; 330 305 331 306 offset64 = offset; 332 - freq_adj = ntp_update_offset_fll(offset64, secs); 307 + freq_adj = ntp_update_offset_fll(ntpdata, offset64, secs); 333 308 334 309 /* 335 310 * Clamp update interval to reduce PLL gain with low 336 311 * sampling rate (e.g. intermittent network connection) 337 312 * to avoid instability. 338 313 */ 339 - if (unlikely(secs > 1 << (SHIFT_PLL + 1 + time_constant))) 340 - secs = 1 << (SHIFT_PLL + 1 + time_constant); 314 + if (unlikely(secs > 1 << (SHIFT_PLL + 1 + ntpdata->time_constant))) 315 + secs = 1 << (SHIFT_PLL + 1 + ntpdata->time_constant); 341 316 342 317 freq_adj += (offset64 * secs) << 343 - (NTP_SCALE_SHIFT - 2 * (SHIFT_PLL + 2 + time_constant)); 318 + (NTP_SCALE_SHIFT - 2 * (SHIFT_PLL + 2 + ntpdata->time_constant)); 344 319 345 - freq_adj = min(freq_adj + time_freq, MAXFREQ_SCALED); 320 + freq_adj = min(freq_adj + ntpdata->time_freq, MAXFREQ_SCALED); 346 321 347 - time_freq = max(freq_adj, -MAXFREQ_SCALED); 322 + ntpdata->time_freq = max(freq_adj, -MAXFREQ_SCALED); 348 323 349 - time_offset = div_s64(offset64 << NTP_SCALE_SHIFT, NTP_INTERVAL_FREQ); 324 + ntpdata->time_offset = div_s64(offset64 << NTP_SCALE_SHIFT, NTP_INTERVAL_FREQ); 325 + } 326 + 327 + static void __ntp_clear(struct ntp_data *ntpdata) 328 + { 329 + /* Stop active adjtime() */ 330 + ntpdata->time_adjust = 0; 331 + ntpdata->time_status |= STA_UNSYNC; 332 + ntpdata->time_maxerror = NTP_PHASE_LIMIT; 333 + ntpdata->time_esterror = NTP_PHASE_LIMIT; 334 + 335 + ntp_update_frequency(ntpdata); 336 + 337 + ntpdata->tick_length = ntpdata->tick_length_base; 338 + 
ntpdata->time_offset = 0; 339 + 340 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 341 + /* Clear PPS state variables */ 342 + pps_clear(ntpdata); 350 343 } 351 344 352 345 /** ··· 351 350 */ 352 351 void ntp_clear(void) 353 352 { 354 - time_adjust = 0; /* stop active adjtime() */ 355 - time_status |= STA_UNSYNC; 356 - time_maxerror = NTP_PHASE_LIMIT; 357 - time_esterror = NTP_PHASE_LIMIT; 358 - 359 - ntp_update_frequency(); 360 - 361 - tick_length = tick_length_base; 362 - time_offset = 0; 363 - 364 - ntp_next_leap_sec = TIME64_MAX; 365 - /* Clear PPS state variables */ 366 - pps_clear(); 353 + __ntp_clear(&tk_ntp_data); 367 354 } 368 355 369 356 370 357 u64 ntp_tick_length(void) 371 358 { 372 - return tick_length; 359 + return tk_ntp_data.tick_length; 373 360 } 374 361 375 362 /** ··· 368 379 */ 369 380 ktime_t ntp_get_next_leap(void) 370 381 { 382 + struct ntp_data *ntpdata = &tk_ntp_data; 371 383 ktime_t ret; 372 384 373 - if ((time_state == TIME_INS) && (time_status & STA_INS)) 374 - return ktime_set(ntp_next_leap_sec, 0); 385 + if ((ntpdata->time_state == TIME_INS) && (ntpdata->time_status & STA_INS)) 386 + return ktime_set(ntpdata->ntp_next_leap_sec, 0); 375 387 ret = KTIME_MAX; 376 388 return ret; 377 389 } 378 390 379 391 /* 380 - * this routine handles the overflow of the microsecond field 392 + * This routine handles the overflow of the microsecond field 381 393 * 382 394 * The tricky bits of code to handle the accurate clock support 383 395 * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame. ··· 389 399 */ 390 400 int second_overflow(time64_t secs) 391 401 { 402 + struct ntp_data *ntpdata = &tk_ntp_data; 392 403 s64 delta; 393 404 int leap = 0; 394 405 s32 rem; ··· 399 408 * day, the system clock is set back one second; if in leap-delete 400 409 * state, the system clock is set ahead one second. 
401 410 */ 402 - switch (time_state) { 411 + switch (ntpdata->time_state) { 403 412 case TIME_OK: 404 - if (time_status & STA_INS) { 405 - time_state = TIME_INS; 413 + if (ntpdata->time_status & STA_INS) { 414 + ntpdata->time_state = TIME_INS; 406 415 div_s64_rem(secs, SECS_PER_DAY, &rem); 407 - ntp_next_leap_sec = secs + SECS_PER_DAY - rem; 408 - } else if (time_status & STA_DEL) { 409 - time_state = TIME_DEL; 416 + ntpdata->ntp_next_leap_sec = secs + SECS_PER_DAY - rem; 417 + } else if (ntpdata->time_status & STA_DEL) { 418 + ntpdata->time_state = TIME_DEL; 410 419 div_s64_rem(secs + 1, SECS_PER_DAY, &rem); 411 - ntp_next_leap_sec = secs + SECS_PER_DAY - rem; 420 + ntpdata->ntp_next_leap_sec = secs + SECS_PER_DAY - rem; 412 421 } 413 422 break; 414 423 case TIME_INS: 415 - if (!(time_status & STA_INS)) { 416 - ntp_next_leap_sec = TIME64_MAX; 417 - time_state = TIME_OK; 418 - } else if (secs == ntp_next_leap_sec) { 424 + if (!(ntpdata->time_status & STA_INS)) { 425 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 426 + ntpdata->time_state = TIME_OK; 427 + } else if (secs == ntpdata->ntp_next_leap_sec) { 419 428 leap = -1; 420 - time_state = TIME_OOP; 421 - printk(KERN_NOTICE 422 - "Clock: inserting leap second 23:59:60 UTC\n"); 429 + ntpdata->time_state = TIME_OOP; 430 + pr_notice("Clock: inserting leap second 23:59:60 UTC\n"); 423 431 } 424 432 break; 425 433 case TIME_DEL: 426 - if (!(time_status & STA_DEL)) { 427 - ntp_next_leap_sec = TIME64_MAX; 428 - time_state = TIME_OK; 429 - } else if (secs == ntp_next_leap_sec) { 434 + if (!(ntpdata->time_status & STA_DEL)) { 435 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 436 + ntpdata->time_state = TIME_OK; 437 + } else if (secs == ntpdata->ntp_next_leap_sec) { 430 438 leap = 1; 431 - ntp_next_leap_sec = TIME64_MAX; 432 - time_state = TIME_WAIT; 433 - printk(KERN_NOTICE 434 - "Clock: deleting leap second 23:59:59 UTC\n"); 439 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 440 + ntpdata->time_state = TIME_WAIT; 441 + 
pr_notice("Clock: deleting leap second 23:59:59 UTC\n"); 435 442 } 436 443 break; 437 444 case TIME_OOP: 438 - ntp_next_leap_sec = TIME64_MAX; 439 - time_state = TIME_WAIT; 445 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 446 + ntpdata->time_state = TIME_WAIT; 440 447 break; 441 448 case TIME_WAIT: 442 - if (!(time_status & (STA_INS | STA_DEL))) 443 - time_state = TIME_OK; 449 + if (!(ntpdata->time_status & (STA_INS | STA_DEL))) 450 + ntpdata->time_state = TIME_OK; 444 451 break; 445 452 } 446 453 447 - 448 454 /* Bump the maxerror field */ 449 - time_maxerror += MAXFREQ / NSEC_PER_USEC; 450 - if (time_maxerror > NTP_PHASE_LIMIT) { 451 - time_maxerror = NTP_PHASE_LIMIT; 452 - time_status |= STA_UNSYNC; 455 + ntpdata->time_maxerror += MAXFREQ / NSEC_PER_USEC; 456 + if (ntpdata->time_maxerror > NTP_PHASE_LIMIT) { 457 + ntpdata->time_maxerror = NTP_PHASE_LIMIT; 458 + ntpdata->time_status |= STA_UNSYNC; 453 459 } 454 460 455 461 /* Compute the phase adjustment for the next second */ 456 - tick_length = tick_length_base; 462 + ntpdata->tick_length = ntpdata->tick_length_base; 457 463 458 - delta = ntp_offset_chunk(time_offset); 459 - time_offset -= delta; 460 - tick_length += delta; 464 + delta = ntp_offset_chunk(ntpdata, ntpdata->time_offset); 465 + ntpdata->time_offset -= delta; 466 + ntpdata->tick_length += delta; 461 467 462 468 /* Check PPS signal */ 463 - pps_dec_valid(); 469 + pps_dec_valid(ntpdata); 464 470 465 - if (!time_adjust) 471 + if (!ntpdata->time_adjust) 466 472 goto out; 467 473 468 - if (time_adjust > MAX_TICKADJ) { 469 - time_adjust -= MAX_TICKADJ; 470 - tick_length += MAX_TICKADJ_SCALED; 471 - goto out; 472 - } 473 - 474 - if (time_adjust < -MAX_TICKADJ) { 475 - time_adjust += MAX_TICKADJ; 476 - tick_length -= MAX_TICKADJ_SCALED; 474 + if (ntpdata->time_adjust > MAX_TICKADJ) { 475 + ntpdata->time_adjust -= MAX_TICKADJ; 476 + ntpdata->tick_length += MAX_TICKADJ_SCALED; 477 477 goto out; 478 478 } 479 479 480 - tick_length += (s64)(time_adjust * 
NSEC_PER_USEC / NTP_INTERVAL_FREQ) 481 - << NTP_SCALE_SHIFT; 482 - time_adjust = 0; 480 + if (ntpdata->time_adjust < -MAX_TICKADJ) { 481 + ntpdata->time_adjust += MAX_TICKADJ; 482 + ntpdata->tick_length -= MAX_TICKADJ_SCALED; 483 + goto out; 484 + } 485 + 486 + ntpdata->tick_length += (s64)(ntpdata->time_adjust * NSEC_PER_USEC / NTP_INTERVAL_FREQ) 487 + << NTP_SCALE_SHIFT; 488 + ntpdata->time_adjust = 0; 483 489 484 490 out: 485 491 return leap; ··· 599 611 } 600 612 #endif 601 613 614 + /** 615 + * ntp_synced - Tells whether the NTP status is not UNSYNC 616 + * Returns: true if not UNSYNC, false otherwise 617 + */ 618 + static inline bool ntp_synced(void) 619 + { 620 + return !(tk_ntp_data.time_status & STA_UNSYNC); 621 + } 622 + 602 623 /* 603 624 * If we have an externally synchronized Linux clock, then update RTC clock 604 625 * accordingly every ~11 minutes. Generally RTCs can only store second ··· 688 691 /* 689 692 * Propagate a new txc->status value into the NTP state: 690 693 */ 691 - static inline void process_adj_status(const struct __kernel_timex *txc) 694 + static inline void process_adj_status(struct ntp_data *ntpdata, const struct __kernel_timex *txc) 692 695 { 693 - if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) { 694 - time_state = TIME_OK; 695 - time_status = STA_UNSYNC; 696 - ntp_next_leap_sec = TIME64_MAX; 697 - /* restart PPS frequency calibration */ 698 - pps_reset_freq_interval(); 696 + if ((ntpdata->time_status & STA_PLL) && !(txc->status & STA_PLL)) { 697 + ntpdata->time_state = TIME_OK; 698 + ntpdata->time_status = STA_UNSYNC; 699 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 700 + /* Restart PPS frequency calibration */ 701 + pps_reset_freq_interval(ntpdata); 699 702 } 700 703 701 704 /* 702 705 * If we turn on PLL adjustments then reset the 703 706 * reference time to current time. 
704 707 */ 705 - if (!(time_status & STA_PLL) && (txc->status & STA_PLL)) 706 - time_reftime = __ktime_get_real_seconds(); 708 + if (!(ntpdata->time_status & STA_PLL) && (txc->status & STA_PLL)) 709 + ntpdata->time_reftime = __ktime_get_real_seconds(); 707 710 708 711 /* only set allowed bits */ 709 - time_status &= STA_RONLY; 710 - time_status |= txc->status & ~STA_RONLY; 712 + ntpdata->time_status &= STA_RONLY; 713 + ntpdata->time_status |= txc->status & ~STA_RONLY; 711 714 } 712 715 713 - 714 - static inline void process_adjtimex_modes(const struct __kernel_timex *txc, 716 + static inline void process_adjtimex_modes(struct ntp_data *ntpdata, const struct __kernel_timex *txc, 715 717 s32 *time_tai) 716 718 { 717 719 if (txc->modes & ADJ_STATUS) 718 - process_adj_status(txc); 720 + process_adj_status(ntpdata, txc); 719 721 720 722 if (txc->modes & ADJ_NANO) 721 - time_status |= STA_NANO; 723 + ntpdata->time_status |= STA_NANO; 722 724 723 725 if (txc->modes & ADJ_MICRO) 724 - time_status &= ~STA_NANO; 726 + ntpdata->time_status &= ~STA_NANO; 725 727 726 728 if (txc->modes & ADJ_FREQUENCY) { 727 - time_freq = txc->freq * PPM_SCALE; 728 - time_freq = min(time_freq, MAXFREQ_SCALED); 729 - time_freq = max(time_freq, -MAXFREQ_SCALED); 730 - /* update pps_freq */ 731 - pps_set_freq(time_freq); 729 + ntpdata->time_freq = txc->freq * PPM_SCALE; 730 + ntpdata->time_freq = min(ntpdata->time_freq, MAXFREQ_SCALED); 731 + ntpdata->time_freq = max(ntpdata->time_freq, -MAXFREQ_SCALED); 732 + /* Update pps_freq */ 733 + pps_set_freq(ntpdata); 732 734 } 733 735 734 736 if (txc->modes & ADJ_MAXERROR) 735 - time_maxerror = clamp(txc->maxerror, 0, NTP_PHASE_LIMIT); 737 + ntpdata->time_maxerror = clamp(txc->maxerror, 0, NTP_PHASE_LIMIT); 736 738 737 739 if (txc->modes & ADJ_ESTERROR) 738 - time_esterror = clamp(txc->esterror, 0, NTP_PHASE_LIMIT); 740 + ntpdata->time_esterror = clamp(txc->esterror, 0, NTP_PHASE_LIMIT); 739 741 740 742 if (txc->modes & ADJ_TIMECONST) { 741 - 
time_constant = clamp(txc->constant, 0, MAXTC); 742 - if (!(time_status & STA_NANO)) 743 - time_constant += 4; 744 - time_constant = clamp(time_constant, 0, MAXTC); 743 + ntpdata->time_constant = clamp(txc->constant, 0, MAXTC); 744 + if (!(ntpdata->time_status & STA_NANO)) 745 + ntpdata->time_constant += 4; 746 + ntpdata->time_constant = clamp(ntpdata->time_constant, 0, MAXTC); 745 747 } 746 748 747 - if (txc->modes & ADJ_TAI && 748 - txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET) 749 + if (txc->modes & ADJ_TAI && txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET) 749 750 *time_tai = txc->constant; 750 751 751 752 if (txc->modes & ADJ_OFFSET) 752 - ntp_update_offset(txc->offset); 753 + ntp_update_offset(ntpdata, txc->offset); 753 754 754 755 if (txc->modes & ADJ_TICK) 755 - tick_usec = txc->tick; 756 + ntpdata->tick_usec = txc->tick; 756 757 757 758 if (txc->modes & (ADJ_TICK|ADJ_FREQUENCY|ADJ_OFFSET)) 758 - ntp_update_frequency(); 759 + ntp_update_frequency(ntpdata); 759 760 } 760 761 761 - 762 762 /* 763 - * adjtimex mainly allows reading (and writing, if superuser) of 763 + * adjtimex() mainly allows reading (and writing, if superuser) of 764 764 * kernel time-keeping variables. used by xntpd. 
765 765 */ 766 766 int __do_adjtimex(struct __kernel_timex *txc, const struct timespec64 *ts, 767 767 s32 *time_tai, struct audit_ntp_data *ad) 768 768 { 769 + struct ntp_data *ntpdata = &tk_ntp_data; 769 770 int result; 770 771 771 772 if (txc->modes & ADJ_ADJTIME) { 772 - long save_adjust = time_adjust; 773 + long save_adjust = ntpdata->time_adjust; 773 774 774 775 if (!(txc->modes & ADJ_OFFSET_READONLY)) { 775 776 /* adjtime() is independent from ntp_adjtime() */ 776 - time_adjust = txc->offset; 777 - ntp_update_frequency(); 777 + ntpdata->time_adjust = txc->offset; 778 + ntp_update_frequency(ntpdata); 778 779 779 780 audit_ntp_set_old(ad, AUDIT_NTP_ADJUST, save_adjust); 780 - audit_ntp_set_new(ad, AUDIT_NTP_ADJUST, time_adjust); 781 + audit_ntp_set_new(ad, AUDIT_NTP_ADJUST, ntpdata->time_adjust); 781 782 } 782 783 txc->offset = save_adjust; 783 784 } else { 784 785 /* If there are input parameters, then process them: */ 785 786 if (txc->modes) { 786 - audit_ntp_set_old(ad, AUDIT_NTP_OFFSET, time_offset); 787 - audit_ntp_set_old(ad, AUDIT_NTP_FREQ, time_freq); 788 - audit_ntp_set_old(ad, AUDIT_NTP_STATUS, time_status); 787 + audit_ntp_set_old(ad, AUDIT_NTP_OFFSET, ntpdata->time_offset); 788 + audit_ntp_set_old(ad, AUDIT_NTP_FREQ, ntpdata->time_freq); 789 + audit_ntp_set_old(ad, AUDIT_NTP_STATUS, ntpdata->time_status); 789 790 audit_ntp_set_old(ad, AUDIT_NTP_TAI, *time_tai); 790 - audit_ntp_set_old(ad, AUDIT_NTP_TICK, tick_usec); 791 + audit_ntp_set_old(ad, AUDIT_NTP_TICK, ntpdata->tick_usec); 791 792 792 - process_adjtimex_modes(txc, time_tai); 793 + process_adjtimex_modes(ntpdata, txc, time_tai); 793 794 794 - audit_ntp_set_new(ad, AUDIT_NTP_OFFSET, time_offset); 795 - audit_ntp_set_new(ad, AUDIT_NTP_FREQ, time_freq); 796 - audit_ntp_set_new(ad, AUDIT_NTP_STATUS, time_status); 795 + audit_ntp_set_new(ad, AUDIT_NTP_OFFSET, ntpdata->time_offset); 796 + audit_ntp_set_new(ad, AUDIT_NTP_FREQ, ntpdata->time_freq); 797 + audit_ntp_set_new(ad, AUDIT_NTP_STATUS, 
ntpdata->time_status); 797 798 audit_ntp_set_new(ad, AUDIT_NTP_TAI, *time_tai); 798 - audit_ntp_set_new(ad, AUDIT_NTP_TICK, tick_usec); 799 + audit_ntp_set_new(ad, AUDIT_NTP_TICK, ntpdata->tick_usec); 799 800 } 800 801 801 - txc->offset = shift_right(time_offset * NTP_INTERVAL_FREQ, 802 - NTP_SCALE_SHIFT); 803 - if (!(time_status & STA_NANO)) 802 + txc->offset = shift_right(ntpdata->time_offset * NTP_INTERVAL_FREQ, NTP_SCALE_SHIFT); 803 + if (!(ntpdata->time_status & STA_NANO)) 804 804 txc->offset = (u32)txc->offset / NSEC_PER_USEC; 805 805 } 806 806 807 - result = time_state; /* mostly `TIME_OK' */ 808 - /* check for errors */ 809 - if (is_error_status(time_status)) 807 + result = ntpdata->time_state; 808 + if (is_error_status(ntpdata->time_status)) 810 809 result = TIME_ERROR; 811 810 812 - txc->freq = shift_right((time_freq >> PPM_SCALE_INV_SHIFT) * 811 + txc->freq = shift_right((ntpdata->time_freq >> PPM_SCALE_INV_SHIFT) * 813 812 PPM_SCALE_INV, NTP_SCALE_SHIFT); 814 - txc->maxerror = time_maxerror; 815 - txc->esterror = time_esterror; 816 - txc->status = time_status; 817 - txc->constant = time_constant; 813 + txc->maxerror = ntpdata->time_maxerror; 814 + txc->esterror = ntpdata->time_esterror; 815 + txc->status = ntpdata->time_status; 816 + txc->constant = ntpdata->time_constant; 818 817 txc->precision = 1; 819 818 txc->tolerance = MAXFREQ_SCALED / PPM_SCALE; 820 - txc->tick = tick_usec; 819 + txc->tick = ntpdata->tick_usec; 821 820 txc->tai = *time_tai; 822 821 823 - /* fill PPS status fields */ 824 - pps_fill_timex(txc); 822 + /* Fill PPS status fields */ 823 + pps_fill_timex(ntpdata, txc); 825 824 826 825 txc->time.tv_sec = ts->tv_sec; 827 826 txc->time.tv_usec = ts->tv_nsec; 828 - if (!(time_status & STA_NANO)) 827 + if (!(ntpdata->time_status & STA_NANO)) 829 828 txc->time.tv_usec = ts->tv_nsec / NSEC_PER_USEC; 830 829 831 830 /* Handle leapsec adjustments */ 832 - if (unlikely(ts->tv_sec >= ntp_next_leap_sec)) { 833 - if ((time_state == TIME_INS) && 
(time_status & STA_INS)) {
+	if (unlikely(ts->tv_sec >= ntpdata->ntp_next_leap_sec)) {
+		if ((ntpdata->time_state == TIME_INS) && (ntpdata->time_status & STA_INS)) {
 			result = TIME_OOP;
 			txc->tai++;
 			txc->time.tv_sec--;
 		}
-		if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
+		if ((ntpdata->time_state == TIME_DEL) && (ntpdata->time_status & STA_DEL)) {
 			result = TIME_WAIT;
 			txc->tai--;
 			txc->time.tv_sec++;
 		}
-		if ((time_state == TIME_OOP) &&
-		    (ts->tv_sec == ntp_next_leap_sec)) {
+		if ((ntpdata->time_state == TIME_OOP) && (ts->tv_sec == ntpdata->ntp_next_leap_sec))
 			result = TIME_WAIT;
-		}
 	}

 	return result;
···

 #ifdef CONFIG_NTP_PPS

-/* actually struct pps_normtime is good old struct timespec, but it is
+/*
+ * struct pps_normtime is basically a struct timespec, but it is
  * semantically different (and it is the reason why it was invented):
  * pps_normtime.nsec has a range of ( -NSEC_PER_SEC / 2, NSEC_PER_SEC / 2 ]
- * while timespec.tv_nsec has a range of [0, NSEC_PER_SEC) */
+ * while timespec.tv_nsec has a range of [0, NSEC_PER_SEC)
+ */
 struct pps_normtime {
 	s64	sec;	/* seconds */
 	long	nsec;	/* nanoseconds */
 };

-/* normalize the timestamp so that nsec is in the
-   ( -NSEC_PER_SEC / 2, NSEC_PER_SEC / 2 ] interval */
+/*
+ * Normalize the timestamp so that nsec is in the
+ * [ -NSEC_PER_SEC / 2, NSEC_PER_SEC / 2 ] interval
+ */
 static inline struct pps_normtime pps_normalize_ts(struct timespec64 ts)
 {
 	struct pps_normtime norm = {
···
 	return norm;
 }

-/* get current phase correction and jitter */
-static inline long pps_phase_filter_get(long *jitter)
+/* Get current phase correction and jitter */
+static inline long pps_phase_filter_get(struct ntp_data *ntpdata, long *jitter)
 {
-	*jitter = pps_tf[0] - pps_tf[1];
+	*jitter = ntpdata->pps_tf[0] - ntpdata->pps_tf[1];
 	if (*jitter < 0)
 		*jitter = -*jitter;

 	/* TODO: test various filters */
-	return pps_tf[0];
+	return ntpdata->pps_tf[0];
 }

-/* add the sample to the phase filter */
-static inline void pps_phase_filter_add(long err)
+/* Add the sample to the phase filter */
+static inline void pps_phase_filter_add(struct ntp_data *ntpdata, long err)
 {
-	pps_tf[2] = pps_tf[1];
-	pps_tf[1] = pps_tf[0];
-	pps_tf[0] = err;
+	ntpdata->pps_tf[2] = ntpdata->pps_tf[1];
+	ntpdata->pps_tf[1] = ntpdata->pps_tf[0];
+	ntpdata->pps_tf[0] = err;
 }

-/* decrease frequency calibration interval length.
- * It is halved after four consecutive unstable intervals.
+/*
+ * Decrease frequency calibration interval length. It is halved after four
+ * consecutive unstable intervals.
  */
-static inline void pps_dec_freq_interval(void)
+static inline void pps_dec_freq_interval(struct ntp_data *ntpdata)
 {
-	if (--pps_intcnt <= -PPS_INTCOUNT) {
-		pps_intcnt = -PPS_INTCOUNT;
-		if (pps_shift > PPS_INTMIN) {
-			pps_shift--;
-			pps_intcnt = 0;
+	if (--ntpdata->pps_intcnt <= -PPS_INTCOUNT) {
+		ntpdata->pps_intcnt = -PPS_INTCOUNT;
+		if (ntpdata->pps_shift > PPS_INTMIN) {
+			ntpdata->pps_shift--;
+			ntpdata->pps_intcnt = 0;
 		}
 	}
 }

-/* increase frequency calibration interval length.
- * It is doubled after four consecutive stable intervals.
+/*
+ * Increase frequency calibration interval length. It is doubled after
+ * four consecutive stable intervals.
  */
-static inline void pps_inc_freq_interval(void)
+static inline void pps_inc_freq_interval(struct ntp_data *ntpdata)
 {
-	if (++pps_intcnt >= PPS_INTCOUNT) {
-		pps_intcnt = PPS_INTCOUNT;
-		if (pps_shift < PPS_INTMAX) {
-			pps_shift++;
-			pps_intcnt = 0;
+	if (++ntpdata->pps_intcnt >= PPS_INTCOUNT) {
+		ntpdata->pps_intcnt = PPS_INTCOUNT;
+		if (ntpdata->pps_shift < PPS_INTMAX) {
+			ntpdata->pps_shift++;
+			ntpdata->pps_intcnt = 0;
 		}
 	}
 }

-/* update clock frequency based on MONOTONIC_RAW clock PPS signal
+/*
+ * Update clock frequency based on MONOTONIC_RAW clock PPS signal
  * timestamps
  *
  * At the end of the calibration interval the difference between the
···
  * too long, the data are discarded.
  * Returns the difference between old and new frequency values.
  */
-static long hardpps_update_freq(struct pps_normtime freq_norm)
+static long hardpps_update_freq(struct ntp_data *ntpdata, struct pps_normtime freq_norm)
 {
 	long delta, delta_mod;
 	s64 ftemp;

-	/* check if the frequency interval was too long */
-	if (freq_norm.sec > (2 << pps_shift)) {
-		time_status |= STA_PPSERROR;
-		pps_errcnt++;
-		pps_dec_freq_interval();
-		printk_deferred(KERN_ERR
-			"hardpps: PPSERROR: interval too long - %lld s\n",
-			freq_norm.sec);
+	/* Check if the frequency interval was too long */
+	if (freq_norm.sec > (2 << ntpdata->pps_shift)) {
+		ntpdata->time_status |= STA_PPSERROR;
+		ntpdata->pps_errcnt++;
+		pps_dec_freq_interval(ntpdata);
+		printk_deferred(KERN_ERR "hardpps: PPSERROR: interval too long - %lld s\n",
+				freq_norm.sec);
 		return 0;
 	}

-	/* here the raw frequency offset and wander (stability) is
-	 * calculated. If the wander is less than the wander threshold
-	 * the interval is increased; otherwise it is decreased.
+	/*
+	 * Here the raw frequency offset and wander (stability) is
+	 * calculated. If the wander is less than the wander threshold the
+	 * interval is increased; otherwise it is decreased.
 	 */
 	ftemp = div_s64(((s64)(-freq_norm.nsec)) << NTP_SCALE_SHIFT,
			freq_norm.sec);
-	delta = shift_right(ftemp - pps_freq, NTP_SCALE_SHIFT);
-	pps_freq = ftemp;
+	delta = shift_right(ftemp - ntpdata->pps_freq, NTP_SCALE_SHIFT);
+	ntpdata->pps_freq = ftemp;
 	if (delta > PPS_MAXWANDER || delta < -PPS_MAXWANDER) {
-		printk_deferred(KERN_WARNING
-			"hardpps: PPSWANDER: change=%ld\n", delta);
-		time_status |= STA_PPSWANDER;
-		pps_stbcnt++;
-		pps_dec_freq_interval();
-	} else {	/* good sample */
-		pps_inc_freq_interval();
+		printk_deferred(KERN_WARNING "hardpps: PPSWANDER: change=%ld\n", delta);
+		ntpdata->time_status |= STA_PPSWANDER;
+		ntpdata->pps_stbcnt++;
+		pps_dec_freq_interval(ntpdata);
+	} else {
+		/* Good sample */
+		pps_inc_freq_interval(ntpdata);
 	}

-	/* the stability metric is calculated as the average of recent
-	 * frequency changes, but is used only for performance
-	 * monitoring
+	/*
+	 * The stability metric is calculated as the average of recent
+	 * frequency changes, but is used only for performance monitoring
 	 */
 	delta_mod = delta;
 	if (delta_mod < 0)
 		delta_mod = -delta_mod;
-	pps_stabil += (div_s64(((s64)delta_mod) <<
-		       (NTP_SCALE_SHIFT - SHIFT_USEC),
-		       NSEC_PER_USEC) - pps_stabil) >> PPS_INTMIN;
+	ntpdata->pps_stabil += (div_s64(((s64)delta_mod) << (NTP_SCALE_SHIFT - SHIFT_USEC),
+				NSEC_PER_USEC) - ntpdata->pps_stabil) >> PPS_INTMIN;

-	/* if enabled, the system clock frequency is updated */
-	if ((time_status & STA_PPSFREQ) != 0 &&
-	    (time_status & STA_FREQHOLD) == 0) {
-		time_freq = pps_freq;
-		ntp_update_frequency();
+	/* If enabled, the system clock frequency is updated */
+	if ((ntpdata->time_status & STA_PPSFREQ) && !(ntpdata->time_status & STA_FREQHOLD)) {
+		ntpdata->time_freq = ntpdata->pps_freq;
+		ntp_update_frequency(ntpdata);
 	}

 	return delta;
 }

-/* correct REALTIME clock phase error against PPS signal */
-static void hardpps_update_phase(long error)
+/* Correct REALTIME clock phase error against PPS signal */
+static void hardpps_update_phase(struct ntp_data *ntpdata, long error)
 {
 	long correction = -error;
 	long jitter;

-	/* add the sample to the median filter */
-	pps_phase_filter_add(correction);
-	correction = pps_phase_filter_get(&jitter);
+	/* Add the sample to the median filter */
+	pps_phase_filter_add(ntpdata, correction);
+	correction = pps_phase_filter_get(ntpdata, &jitter);

-	/* Nominal jitter is due to PPS signal noise. If it exceeds the
+	/*
+	 * Nominal jitter is due to PPS signal noise. If it exceeds the
 	 * threshold, the sample is discarded; otherwise, if so enabled,
 	 * the time offset is updated.
 	 */
-	if (jitter > (pps_jitter << PPS_POPCORN)) {
-		printk_deferred(KERN_WARNING
-			"hardpps: PPSJITTER: jitter=%ld, limit=%ld\n",
-			jitter, (pps_jitter << PPS_POPCORN));
-		time_status |= STA_PPSJITTER;
-		pps_jitcnt++;
-	} else if (time_status & STA_PPSTIME) {
-		/* correct the time using the phase offset */
-		time_offset = div_s64(((s64)correction) << NTP_SCALE_SHIFT,
-				      NTP_INTERVAL_FREQ);
-		/* cancel running adjtime() */
-		time_adjust = 0;
+	if (jitter > (ntpdata->pps_jitter << PPS_POPCORN)) {
+		printk_deferred(KERN_WARNING "hardpps: PPSJITTER: jitter=%ld, limit=%ld\n",
+				jitter, (ntpdata->pps_jitter << PPS_POPCORN));
+		ntpdata->time_status |= STA_PPSJITTER;
+		ntpdata->pps_jitcnt++;
+	} else if (ntpdata->time_status & STA_PPSTIME) {
+		/* Correct the time using the phase offset */
+		ntpdata->time_offset = div_s64(((s64)correction) << NTP_SCALE_SHIFT,
+					       NTP_INTERVAL_FREQ);
+		/* Cancel running adjtime() */
+		ntpdata->time_adjust = 0;
 	}
-	/* update jitter */
-	pps_jitter += (jitter - pps_jitter) >> PPS_INTMIN;
+	/* Update jitter */
+	ntpdata->pps_jitter += (jitter - ntpdata->pps_jitter) >> PPS_INTMIN;
 }

 /*
···
 void __hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts)
 {
 	struct pps_normtime pts_norm, freq_norm;
+	struct ntp_data *ntpdata = &tk_ntp_data;

 	pts_norm = pps_normalize_ts(*phase_ts);

-	/* clear the error bits, they will be set again if needed */
-	time_status &= ~(STA_PPSJITTER | STA_PPSWANDER | STA_PPSERROR);
+	/* Clear the error bits, they will be set again if needed */
+	ntpdata->time_status &= ~(STA_PPSJITTER | STA_PPSWANDER | STA_PPSERROR);

 	/* indicate signal presence */
-	time_status |= STA_PPSSIGNAL;
-	pps_valid = PPS_VALID;
+	ntpdata->time_status |= STA_PPSSIGNAL;
+	ntpdata->pps_valid = PPS_VALID;

-	/* when called for the first time,
-	 * just start the frequency interval */
-	if (unlikely(pps_fbase.tv_sec == 0)) {
-		pps_fbase = *raw_ts;
+	/*
+	 * When called for the first time, just start the frequency
+	 * interval
+	 */
+	if (unlikely(ntpdata->pps_fbase.tv_sec == 0)) {
+		ntpdata->pps_fbase = *raw_ts;
 		return;
 	}

-	/* ok, now we have a base for frequency calculation */
-	freq_norm = pps_normalize_ts(timespec64_sub(*raw_ts, pps_fbase));
+	/* Ok, now we have a base for frequency calculation */
+	freq_norm = pps_normalize_ts(timespec64_sub(*raw_ts, ntpdata->pps_fbase));

-	/* check that the signal is in the range
-	 * [1s - MAXFREQ us, 1s + MAXFREQ us], otherwise reject it */
-	if ((freq_norm.sec == 0) ||
-	    (freq_norm.nsec > MAXFREQ * freq_norm.sec) ||
-	    (freq_norm.nsec < -MAXFREQ * freq_norm.sec)) {
-		time_status |= STA_PPSJITTER;
-		/* restart the frequency calibration interval */
-		pps_fbase = *raw_ts;
+	/*
+	 * Check that the signal is in the range
+	 * [1s - MAXFREQ us, 1s + MAXFREQ us], otherwise reject it
+	 */
+	if ((freq_norm.sec == 0) || (freq_norm.nsec > MAXFREQ * freq_norm.sec) ||
+	    (freq_norm.nsec < -MAXFREQ * freq_norm.sec)) {
+		ntpdata->time_status |= STA_PPSJITTER;
+		/* Restart the frequency calibration interval */
+		ntpdata->pps_fbase = *raw_ts;
 		printk_deferred(KERN_ERR "hardpps: PPSJITTER: bad pulse\n");
 		return;
 	}

-	/* signal is ok */
-
-	/* check if the current frequency interval is finished */
-	if (freq_norm.sec >= (1 << pps_shift)) {
-		pps_calcnt++;
-		/* restart the frequency calibration interval */
-		pps_fbase = *raw_ts;
-		hardpps_update_freq(freq_norm);
+	/* Signal is ok. Check if the current frequency interval is finished */
+	if (freq_norm.sec >= (1 << ntpdata->pps_shift)) {
+		ntpdata->pps_calcnt++;
+		/* Restart the frequency calibration interval */
+		ntpdata->pps_fbase = *raw_ts;
+		hardpps_update_freq(ntpdata, freq_norm);
 	}

-	hardpps_update_phase(pts_norm.nsec);
+	hardpps_update_phase(ntpdata, pts_norm.nsec);

 }
 #endif	/* CONFIG_NTP_PPS */

 static int __init ntp_tick_adj_setup(char *str)
 {
-	int rc = kstrtos64(str, 0, &ntp_tick_adj);
+	int rc = kstrtos64(str, 0, &tk_ntp_data.ntp_tick_adj);
 	if (rc)
 		return rc;

-	ntp_tick_adj <<= NTP_SCALE_SHIFT;
+	tk_ntp_data.ntp_tick_adj <<= NTP_SCALE_SHIFT;
 	return 1;
 }
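Editor's note: the pps_inc_freq_interval()/pps_dec_freq_interval() pair above is a small adaptive controller: the calibration interval doubles after four consecutive stable intervals and halves after four unstable ones, clamped between 2^PPS_INTMIN and 2^PPS_INTMAX seconds. A standalone userspace sketch of just that counter logic (struct name and constant values are illustrative assumptions, not taken verbatim from the kernel):

```c
/* Tuning knobs mirroring the kernel's PPS constants (values assumed here) */
#define PPS_INTCOUNT 4   /* consecutive intervals before adapting */
#define PPS_INTMIN   2   /* shortest interval: 1 << 2 = 4 s */
#define PPS_INTMAX   8   /* longest interval: 1 << 8 = 256 s */

struct pps_state {
	int intcnt;  /* stability counter, models ntpdata->pps_intcnt */
	int shift;   /* interval length is (1 << shift) seconds */
};

/* Halve the calibration interval after four consecutive unstable intervals */
static void dec_freq_interval(struct pps_state *s)
{
	if (--s->intcnt <= -PPS_INTCOUNT) {
		s->intcnt = -PPS_INTCOUNT;
		if (s->shift > PPS_INTMIN) {
			s->shift--;
			s->intcnt = 0;  /* restart counting after adapting */
		}
	}
}

/* Double the calibration interval after four consecutive stable intervals */
static void inc_freq_interval(struct pps_state *s)
{
	if (++s->intcnt >= PPS_INTCOUNT) {
		s->intcnt = PPS_INTCOUNT;
		if (s->shift < PPS_INTMAX) {
			s->shift++;
			s->intcnt = 0;  /* restart counting after adapting */
		}
	}
}
```

Note how the counter saturates at +/-PPS_INTCOUNT once the shift hits its bound, so a long run of unstable samples at the minimum interval does not wind the counter up further.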
+39 -33
kernel/time/posix-cpu-timers.c
···
 	struct cpu_timer *ctmr = &timer->it.cpu;
 	struct posix_cputimer_base *base;

-	timer->it_active = 0;
 	if (!cpu_timer_dequeue(ctmr))
 		return;

···
 		 */
 		WARN_ON_ONCE(ctmr->head || timerqueue_node_queued(&ctmr->node));
 	} else {
-		if (timer->it.cpu.firing)
+		if (timer->it.cpu.firing) {
+			/*
+			 * Prevent signal delivery. The timer cannot be dequeued
+			 * because it is on the firing list which is not protected
+			 * by sighand->lock. The delivery path is waiting for
+			 * the timer lock. So go back, unlock and retry.
+			 */
+			timer->it.cpu.firing = false;
 			ret = TIMER_RETRY;
-		else
+		} else {
 			disarm_timer(timer, p);
-
+		}
 		unlock_task_sighand(p, &flags);
 	}

 out:
 	rcu_read_unlock();
-	if (!ret)
-		put_pid(ctmr->pid);

+	if (!ret) {
+		put_pid(ctmr->pid);
+		timer->it_status = POSIX_TIMER_DISARMED;
+	}
 	return ret;
 }

···
 	struct cpu_timer *ctmr = &timer->it.cpu;
 	u64 newexp = cpu_timer_getexpires(ctmr);

-	timer->it_active = 1;
+	timer->it_status = POSIX_TIMER_ARMED;
 	if (!cpu_timer_enqueue(&base->tqhead, ctmr))
 		return;

···
 {
 	struct cpu_timer *ctmr = &timer->it.cpu;

-	timer->it_active = 0;
-	if (unlikely(timer->sigq == NULL)) {
+	timer->it_status = POSIX_TIMER_DISARMED;
+
+	if (unlikely(ctmr->nanosleep)) {
 		/*
 		 * This a special case for clock_nanosleep,
 		 * not a normal timer from sys_timer_create.
 		 */
 		wake_up_process(timer->it_process);
 		cpu_timer_setexpires(ctmr, 0);
-	} else if (!timer->it_interval) {
-		/*
-		 * One-shot timer. Clear it as soon as it's fired.
-		 */
+	} else {
 		posix_timer_queue_signal(timer);
-		cpu_timer_setexpires(ctmr, 0);
-	} else if (posix_timer_queue_signal(timer)) {
-		/*
-		 * The signal did not get queued because the signal
-		 * was ignored, so we won't get any callback to
-		 * reload the timer. But we need to keep it
-		 * ticking in case the signal is deliverable next time.
-		 */
-		posix_cpu_timer_rearm(timer);
-		++timer->it_requeue_pending;
+		/* Disable oneshot timers */
+		if (!timer->it_interval)
+			cpu_timer_setexpires(ctmr, 0);
 	}
 }

···
 	old_expires = cpu_timer_getexpires(ctmr);

 	if (unlikely(timer->it.cpu.firing)) {
-		timer->it.cpu.firing = -1;
+		/*
+		 * Prevent signal delivery. The timer cannot be dequeued
+		 * because it is on the firing list which is not protected
+		 * by sighand->lock. The delivery path is waiting for
+		 * the timer lock. So go back, unlock and retry.
+		 */
+		timer->it.cpu.firing = false;
 		ret = TIMER_RETRY;
 	} else {
 		cpu_timer_dequeue(ctmr);
-		timer->it_active = 0;
+		timer->it_status = POSIX_TIMER_DISARMED;
 	}

 	/*
···
 	 * - Timers which expired, but the signal has not yet been
 	 *   delivered
 	 */
-	if (iv && ((timer->it_requeue_pending & REQUEUE_PENDING) || sigev_none))
+	if (iv && timer->it_status != POSIX_TIMER_ARMED)
 		expires = bump_cpu_timer(timer, now);
 	else
 		expires = cpu_timer_getexpires(&timer->it.cpu);

···
 	if (++i == MAX_COLLECTED || now < expires)
 		return expires;

-	ctmr->firing = 1;
+	ctmr->firing = true;
 	/* See posix_cpu_timer_wait_running() */
 	rcu_assign_pointer(ctmr->handling, current);
 	cpu_timer_dequeue(ctmr);

···
 	 * timer call will interfere.
 	 */
 	list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
-		int cpu_firing;
+		bool cpu_firing;

 		/*
 		 * spin_lock() is sufficient here even independent of the
···
 		spin_lock(&timer->it_lock);
 		list_del_init(&timer->it.cpu.elist);
 		cpu_firing = timer->it.cpu.firing;
-		timer->it.cpu.firing = 0;
+		timer->it.cpu.firing = false;
 		/*
-		 * The firing flag is -1 if we collided with a reset
-		 * of the timer, which already reported this
-		 * almost-firing as an overrun. So don't generate an event.
+		 * If the firing flag is cleared then this raced with a
+		 * timer rearm/delete operation. So don't generate an
+		 * event.
 		 */
-		if (likely(cpu_firing >= 0))
+		if (likely(cpu_firing))
 			cpu_timer_fire(timer);
 		/* See posix_cpu_timer_wait_running() */
 		rcu_assign_pointer(timer->it.cpu.handling, NULL);

···
 	timer.it_overrun = -1;
 	error = posix_cpu_timer_create(&timer);
 	timer.it_process = current;
+	timer.it.cpu.nanosleep = true;

 	if (!error) {
 		static struct itimerspec64 zero_it;
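Editor's note: the posix-cpu-timers changes above turn the `firing` field into a boolean handshake: a racing rearm/delete clears the flag and returns TIMER_RETRY, and the expiry side fires the event only if the flag survived. A userspace toy model of that protocol (struct and function names are invented for illustration; the locking that makes this safe in the kernel is omitted):

```c
#include <stdbool.h>

#define TIMER_RETRY 1

struct toy_timer {
	bool firing;    /* set when collected onto the firing list */
	bool fired;     /* an event was actually generated */
	bool disarmed;
};

/*
 * Deletion/rearm path: if the timer is on the firing list, clear the
 * flag and ask the caller to retry; the expiry side will then skip
 * the event instead of delivering a signal for a dead timer.
 */
static int toy_try_delete(struct toy_timer *t)
{
	if (t->firing) {
		t->firing = false;
		return TIMER_RETRY;
	}
	t->disarmed = true;
	return 0;
}

/*
 * Expiry side: generate the event only when the flag survived, i.e.
 * no rearm/delete raced with the expiry collection.
 */
static void toy_handle_expiry(struct toy_timer *t)
{
	bool cpu_firing = t->firing;

	t->firing = false;
	if (cpu_firing)
		t->fired = true;
}
```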
+136 -145
kernel/time/posix-timers.c
···
  * The siginfo si_overrun field and the return value of timer_getoverrun(2)
  * are of type int. Clamp the overrun value to INT_MAX
  */
-static inline int timer_overrun_to_int(struct k_itimer *timr, int baseval)
+static inline int timer_overrun_to_int(struct k_itimer *timr)
 {
-	s64 sum = timr->it_overrun_last + (s64)baseval;
+	if (timr->it_overrun_last > (s64)INT_MAX)
+		return INT_MAX;

-	return sum > (s64)INT_MAX ? INT_MAX : (int)sum;
+	return (int)timr->it_overrun_last;
 }

 static void common_hrtimer_rearm(struct k_itimer *timr)
···
 	hrtimer_restart(timer);
 }

-/*
- * This function is called from the signal delivery code if
- * info->si_sys_private is not zero, which indicates that the timer has to
- * be rearmed. Restart the timer and update info::si_overrun.
- */
-void posixtimer_rearm(struct kernel_siginfo *info)
+static bool __posixtimer_deliver_signal(struct kernel_siginfo *info, struct k_itimer *timr)
 {
-	struct k_itimer *timr;
-	unsigned long flags;
-
-	timr = lock_timer(info->si_tid, &flags);
-	if (!timr)
-		return;
-
-	if (timr->it_interval && timr->it_requeue_pending == info->si_sys_private) {
-		timr->kclock->timer_rearm(timr);
-
-		timr->it_active = 1;
-		timr->it_overrun_last = timr->it_overrun;
-		timr->it_overrun = -1LL;
-		++timr->it_requeue_pending;
-
-		info->si_overrun = timer_overrun_to_int(timr, info->si_overrun);
-	}
-
-	unlock_timer(timr, flags);
-}
-
-int posix_timer_queue_signal(struct k_itimer *timr)
-{
-	int ret, si_private = 0;
-	enum pid_type type;
-
-	lockdep_assert_held(&timr->it_lock);
-
-	timr->it_active = 0;
-	if (timr->it_interval)
-		si_private = ++timr->it_requeue_pending;
+	guard(spinlock)(&timr->it_lock);

 	/*
-	 * FIXME: if ->sigq is queued we can race with
-	 * dequeue_signal()->posixtimer_rearm().
-	 *
-	 * If dequeue_signal() sees the "right" value of
-	 * si_sys_private it calls posixtimer_rearm().
-	 * We re-queue ->sigq and drop ->it_lock().
-	 * posixtimer_rearm() locks the timer
-	 * and re-schedules it while ->sigq is pending.
-	 * Not really bad, but not that we want.
+	 * Check if the timer is still alive or whether it got modified
+	 * since the signal was queued. In either case, don't rearm and
+	 * drop the signal.
 	 */
-	timr->sigq->info.si_sys_private = si_private;
+	if (timr->it_signal_seq != timr->it_sigqueue_seq || WARN_ON_ONCE(!timr->it_signal))
+		return false;

-	type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
-	ret = send_sigqueue(timr->sigq, timr->it_pid, type);
-	/* If we failed to send the signal the timer stops. */
-	return ret > 0;
+	if (!timr->it_interval || WARN_ON_ONCE(timr->it_status != POSIX_TIMER_REQUEUE_PENDING))
+		return true;
+
+	timr->kclock->timer_rearm(timr);
+	timr->it_status = POSIX_TIMER_ARMED;
+	timr->it_overrun_last = timr->it_overrun;
+	timr->it_overrun = -1LL;
+	++timr->it_signal_seq;
+	info->si_overrun = timer_overrun_to_int(timr);
+	return true;
+}
+
+/*
+ * This function is called from the signal delivery code. It decides
+ * whether the signal should be dropped and rearms interval timers. The
+ * timer can be unconditionally accessed as there is a reference held on
+ * it.
+ */
+bool posixtimer_deliver_signal(struct kernel_siginfo *info, struct sigqueue *timer_sigq)
+{
+	struct k_itimer *timr = container_of(timer_sigq, struct k_itimer, sigq);
+	bool ret;
+
+	/*
+	 * Release siglock to ensure proper locking order versus
+	 * timr::it_lock. Keep interrupts disabled.
+	 */
+	spin_unlock(&current->sighand->siglock);
+
+	ret = __posixtimer_deliver_signal(info, timr);
+
+	/* Drop the reference which was acquired when the signal was queued */
+	posixtimer_putref(timr);
+
+	spin_lock(&current->sighand->siglock);
+	return ret;
+}
+
+void posix_timer_queue_signal(struct k_itimer *timr)
+{
+	lockdep_assert_held(&timr->it_lock);
+
+	timr->it_status = timr->it_interval ? POSIX_TIMER_REQUEUE_PENDING : POSIX_TIMER_DISARMED;
+	posixtimer_send_sigqueue(timr);
 }

 /*
···
 static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer)
 {
 	struct k_itimer *timr = container_of(timer, struct k_itimer, it.real.timer);
-	enum hrtimer_restart ret = HRTIMER_NORESTART;
-	unsigned long flags;

-	spin_lock_irqsave(&timr->it_lock, flags);
-
-	if (posix_timer_queue_signal(timr)) {
-		/*
-		 * The signal was not queued due to SIG_IGN. As a
-		 * consequence the timer is not going to be rearmed from
-		 * the signal delivery path. But as a real signal handler
-		 * can be installed later the timer must be rearmed here.
-		 */
-		if (timr->it_interval != 0) {
-			ktime_t now = hrtimer_cb_get_time(timer);
-
-			/*
-			 * FIXME: What we really want, is to stop this
-			 * timer completely and restart it in case the
-			 * SIG_IGN is removed. This is a non trivial
-			 * change to the signal handling code.
-			 *
-			 * For now let timers with an interval less than a
-			 * jiffy expire every jiffy and recheck for a
-			 * valid signal handler.
-			 *
-			 * This avoids interrupt starvation in case of a
-			 * very small interval, which would expire the
-			 * timer immediately again.
-			 *
-			 * Moving now ahead of time by one jiffy tricks
-			 * hrtimer_forward() to expire the timer later,
-			 * while it still maintains the overrun accuracy
-			 * for the price of a slight inconsistency in the
-			 * timer_gettime() case. This is at least better
-			 * than a timer storm.
-			 *
-			 * Only required when high resolution timers are
-			 * enabled as the periodic tick based timers are
-			 * automatically aligned to the next tick.
-			 */
-			if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS)) {
-				ktime_t kj = TICK_NSEC;
-
-				if (timr->it_interval < kj)
-					now = ktime_add(now, kj);
-			}
-
-			timr->it_overrun += hrtimer_forward(timer, now, timr->it_interval);
-			ret = HRTIMER_RESTART;
-			++timr->it_requeue_pending;
-			timr->it_active = 1;
-		}
-	}
-
-	unlock_timer(timr, flags);
-	return ret;
+	guard(spinlock_irqsave)(&timr->it_lock);
+	posix_timer_queue_signal(timr);
+	return HRTIMER_NORESTART;
 }

 static struct pid *good_sigevent(sigevent_t * event)
···
 	}
 }

-static struct k_itimer * alloc_posix_timer(void)
+static struct k_itimer *alloc_posix_timer(void)
 {
 	struct k_itimer *tmr = kmem_cache_zalloc(posix_timers_cache, GFP_KERNEL);

 	if (!tmr)
 		return tmr;
-	if (unlikely(!(tmr->sigq = sigqueue_alloc()))) {
+
+	if (unlikely(!posixtimer_init_sigqueue(&tmr->sigq))) {
 		kmem_cache_free(posix_timers_cache, tmr);
 		return NULL;
 	}
-	clear_siginfo(&tmr->sigq->info);
+	rcuref_init(&tmr->rcuref, 1);
 	return tmr;
 }

-static void k_itimer_rcu_free(struct rcu_head *head)
-{
-	struct k_itimer *tmr = container_of(head, struct k_itimer, rcu);
-
-	kmem_cache_free(posix_timers_cache, tmr);
-}
-
-static void posix_timer_free(struct k_itimer *tmr)
+void posixtimer_free_timer(struct k_itimer *tmr)
 {
 	put_pid(tmr->it_pid);
-	sigqueue_free(tmr->sigq);
-	call_rcu(&tmr->rcu, k_itimer_rcu_free);
+	if (tmr->sigq.ucounts)
+		dec_rlimit_put_ucounts(tmr->sigq.ucounts, UCOUNT_RLIMIT_SIGPENDING);
+	kfree_rcu(tmr, rcu);
 }

 static void posix_timer_unhash_and_free(struct k_itimer *tmr)
···
 	spin_lock(&hash_lock);
 	hlist_del_rcu(&tmr->t_hash);
 	spin_unlock(&hash_lock);
-	posix_timer_free(tmr);
+	posixtimer_putref(tmr);
 }

 static int common_timer_create(struct k_itimer *new_timer)
···
 	 */
 	new_timer_id = posix_timer_add(new_timer);
 	if (new_timer_id < 0) {
-		posix_timer_free(new_timer);
+		posixtimer_free_timer(new_timer);
 		return new_timer_id;
 	}
···
 			goto out;
 		}
 		new_timer->it_sigev_notify = event->sigev_notify;
-		new_timer->sigq->info.si_signo = event->sigev_signo;
-		new_timer->sigq->info.si_value = event->sigev_value;
+		new_timer->sigq.info.si_signo = event->sigev_signo;
+		new_timer->sigq.info.si_value = event->sigev_value;
 	} else {
 		new_timer->it_sigev_notify = SIGEV_SIGNAL;
-		new_timer->sigq->info.si_signo = SIGALRM;
-		memset(&new_timer->sigq->info.si_value, 0, sizeof(sigval_t));
-		new_timer->sigq->info.si_value.sival_int = new_timer->it_id;
+		new_timer->sigq.info.si_signo = SIGALRM;
+		memset(&new_timer->sigq.info.si_value, 0, sizeof(sigval_t));
+		new_timer->sigq.info.si_value.sival_int = new_timer->it_id;
 		new_timer->it_pid = get_pid(task_tgid(current));
 	}

-	new_timer->sigq->info.si_tid = new_timer->it_id;
-	new_timer->sigq->info.si_code = SI_TIMER;
+	if (new_timer->it_sigev_notify & SIGEV_THREAD_ID)
+		new_timer->it_pid_type = PIDTYPE_PID;
+	else
+		new_timer->it_pid_type = PIDTYPE_TGID;
+
+	new_timer->sigq.info.si_tid = new_timer->it_id;
+	new_timer->sigq.info.si_code = SI_TIMER;

 	if (copy_to_user(created_timer_id, &new_timer_id, sizeof (new_timer_id))) {
 		error = -EFAULT;
···
  * 1) Set timr::it_signal to NULL with timr::it_lock held
  * 2) Release timr::it_lock
  * 3) Remove from the hash under hash_lock
- * 4) Call RCU for removal after the grace period
+ * 4) Put the reference count.
+ *
+ * The reference count might not drop to zero if timr::sigq is
+ * queued. In that case the signal delivery or flush will put the
+ * last reference count.
+ *
+ * When the reference count reaches zero, the timer is scheduled
+ * for RCU removal after the grace period.
  *
  * Holding rcu_read_lock() accross the lookup ensures that
  * the timer cannot be freed.
···
 	/* interval timer ? */
 	if (iv) {
 		cur_setting->it_interval = ktime_to_timespec64(iv);
-	} else if (!timr->it_active) {
+	} else if (timr->it_status == POSIX_TIMER_DISARMED) {
 		/*
 		 * SIGEV_NONE oneshot timers are never queued and therefore
-		 * timr->it_active is always false. The check below
+		 * timr->it_status is always DISARMED. The check below
 		 * vs. remaining time will handle this case.
 		 *
 		 * For all other timers there is nothing to update here, so
···
 	 * is a SIGEV_NONE timer move the expiry time forward by intervals,
 	 * so expiry is > now.
 	 */
-	if (iv && (timr->it_requeue_pending & REQUEUE_PENDING || sig_none))
+	if (iv && timr->it_status != POSIX_TIMER_ARMED)
 		timr->it_overrun += kc->timer_forward(timr, now);

 	remaining = kc->timer_remaining(timr, now);
···
 	if (!timr)
 		return -EINVAL;

-	overrun = timer_overrun_to_int(timr, 0);
+	overrun = timer_overrun_to_int(timr);
 	unlock_timer(timr, flags);

 	return overrun;
···
 	else
 		timer->it_interval = 0;

-	/* Prevent reloading in case there is a signal pending */
-	timer->it_requeue_pending = (timer->it_requeue_pending + 2) & ~REQUEUE_PENDING;
 	/* Reset overrun accounting */
 	timer->it_overrun_last = 0;
 	timer->it_overrun = -1LL;
···
 	if (old_setting)
 		common_timer_get(timr, old_setting);

-	/* Prevent rearming by clearing the interval */
-	timr->it_interval = 0;
 	/*
 	 * Careful here. On SMP systems the timer expiry function could be
 	 * active and spinning on timr->it_lock.
···
 	if (kc->timer_try_to_cancel(timr) < 0)
 		return TIMER_RETRY;

-	timr->it_active = 0;
+	timr->it_status = POSIX_TIMER_DISARMED;
 	posix_timer_set_common(timr, new_setting);

 	/* Keep timer disarmed when it_value is zero */
···
 	sigev_none = timr->it_sigev_notify == SIGEV_NONE;

 	kc->timer_arm(timr, expires, flags & TIMER_ABSTIME, sigev_none);
-	timr->it_active = !sigev_none;
+	if (!sigev_none)
+		timr->it_status = POSIX_TIMER_ARMED;
 	return 0;
 }
···
 	if (old_spec64)
 		old_spec64->it_interval = ktime_to_timespec64(timr->it_interval);
+
+	/* Prevent signal delivery and rearming. */
+	timr->it_signal_seq++;

 	kc = timr->kclock;
 	if (WARN_ON_ONCE(!kc || !kc->timer_set))
···
 {
 	const struct k_clock *kc = timer->kclock;

-	timer->it_interval = 0;
 	if (kc->timer_try_to_cancel(timer) < 0)
 		return TIMER_RETRY;
-	timer->it_active = 0;
+	timer->it_status = POSIX_TIMER_DISARMED;
 	return 0;
+}
+
+/*
+ * If the deleted timer is on the ignored list, remove it and
+ * drop the associated reference.
+ */
+static inline void posix_timer_cleanup_ignored(struct k_itimer *tmr)
+{
+	if (!hlist_unhashed(&tmr->ignored_list)) {
+		hlist_del_init(&tmr->ignored_list);
+		posixtimer_putref(tmr);
+	}
 }

 static inline int timer_delete_hook(struct k_itimer *timer)
 {
 	const struct k_clock *kc = timer->kclock;
+
+	/* Prevent signal delivery and rearming. */
+	timer->it_signal_seq++;

 	if (WARN_ON_ONCE(!kc || !kc->timer_del))
 		return -EINVAL;
···
 	spin_lock(&current->sighand->siglock);
 	hlist_del(&timer->list);
-	spin_unlock(&current->sighand->siglock);
+	posix_timer_cleanup_ignored(timer);
 	/*
 	 * A concurrent lookup could check timer::it_signal lockless. It
 	 * will reevaluate with timer::it_lock held and observe the NULL.
+	 *
+	 * It must be written with siglock held so that the signal code
+	 * observes timer->it_signal == NULL in do_sigaction(SIG_IGN),
+	 * which prevents it from moving a pending signal of a deleted
+	 * timer to the ignore list.
 	 */
 	WRITE_ONCE(timer->it_signal, NULL);
+	spin_unlock(&current->sighand->siglock);

 	unlock_timer(timer, flags);
 	posix_timer_unhash_and_free(timer);
···
 	}
 	hlist_del(&timer->list);

+	posix_timer_cleanup_ignored(timer);
+
 	/*
 	 * Setting timer::it_signal to NULL is technically not required
 	 * here as nothing can access the timer anymore legitimately via
···
 	/* The timers are not longer accessible via tsk::signal */
 	while (!hlist_empty(&timers))
 		itimer_delete(hlist_entry(timers.first, struct k_itimer, list));

+	/*
+	 * There should be no timers on the ignored list. itimer_delete() has
+	 * mopped them up.
+	 */
+	if (!WARN_ON_ONCE(!hlist_empty(&tsk->signal->ignored_posix_timers)))
+		return;
+
+	hlist_move_list(&tsk->signal->ignored_posix_timers, &timers);
+	while (!hlist_empty(&timers)) {
+		posix_timer_cleanup_ignored(hlist_entry(timers.first, struct k_itimer,
+							ignored_list));
+	}
 }

 SYSCALL_DEFINE2(clock_settime, const clockid_t, which_clock,
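Editor's note: two small mechanisms in the posix-timers.c changes above are easy to model in isolation: the `it_signal_seq`/`it_sigqueue_seq` pair (delivery drops a queued signal if the timer was set or deleted after the signal was queued) and the new `timer_overrun_to_int()` clamp of the 64-bit overrun counter into the int-sized siginfo field. A userspace sketch (struct name `toy_itimer` and these helpers are illustrative, not kernel API):

```c
#include <stdbool.h>
#include <stdint.h>
#include <limits.h>

struct toy_itimer {
	uint64_t it_signal_seq;    /* bumped on every timer_settime()/timer_delete() */
	uint64_t it_sigqueue_seq;  /* snapshot taken when the signal was queued */
	int64_t  it_overrun_last;
};

/*
 * Delivery-time staleness check: the queued signal is only valid if
 * nothing modified the timer after the signal was queued.
 */
static bool toy_signal_still_valid(const struct toy_itimer *t)
{
	return t->it_signal_seq == t->it_sigqueue_seq;
}

/* Clamp the 64-bit overrun counter into the int-sized si_overrun field */
static int toy_overrun_to_int(const struct toy_itimer *t)
{
	if (t->it_overrun_last > (int64_t)INT_MAX)
		return INT_MAX;
	return (int)t->it_overrun_last;
}
```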
+7 -1
kernel/time/posix-timers.h
···
 /* SPDX-License-Identifier: GPL-2.0 */
 #define TIMER_RETRY 1

+enum posix_timer_state {
+	POSIX_TIMER_DISARMED,
+	POSIX_TIMER_ARMED,
+	POSIX_TIMER_REQUEUE_PENDING,
+};
+
 struct k_clock {
 	int	(*clock_getres)(const clockid_t which_clock,
 				struct timespec64 *tp);
···
 extern const struct k_clock clock_thread;
 extern const struct k_clock alarm_clock;

-int posix_timer_queue_signal(struct k_itimer *timr);
+void posix_timer_queue_signal(struct k_itimer *timr);

 void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting);
 int common_timer_set(struct k_itimer *timr, int flags,
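Editor's note: the new enum replaces the old `it_active` flag with a three-state machine. As the diffs above show, expiry moves an interval timer to REQUEUE_PENDING and a oneshot timer to DISARMED, and signal delivery is what rearms a pending interval timer. A minimal standalone model of those transitions (the helper functions are invented for illustration):

```c
#include <stdbool.h>

enum posix_timer_state {
	POSIX_TIMER_DISARMED,
	POSIX_TIMER_ARMED,
	POSIX_TIMER_REQUEUE_PENDING,
};

/*
 * Expiry: interval timers park in REQUEUE_PENDING until the signal is
 * delivered; oneshot timers go straight back to DISARMED.
 */
static enum posix_timer_state on_expiry(bool has_interval)
{
	return has_interval ? POSIX_TIMER_REQUEUE_PENDING : POSIX_TIMER_DISARMED;
}

/* Signal delivery rearms a pending interval timer; other states are untouched */
static enum posix_timer_state on_delivery(enum posix_timer_state s)
{
	return (s == POSIX_TIMER_REQUEUE_PENDING) ? POSIX_TIMER_ARMED : s;
}
```

The gettime paths above key off exactly this distinction: overruns are only accumulated forward while the state is not ARMED.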
+377
kernel/time/sleep_timeout.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Kernel internal schedule timeout and sleeping functions 4 + */ 5 + 6 + #include <linux/delay.h> 7 + #include <linux/jiffies.h> 8 + #include <linux/timer.h> 9 + #include <linux/sched/signal.h> 10 + #include <linux/sched/debug.h> 11 + 12 + #include "tick-internal.h" 13 + 14 + /* 15 + * Since schedule_timeout()'s timer is defined on the stack, it must store 16 + * the target task on the stack as well. 17 + */ 18 + struct process_timer { 19 + struct timer_list timer; 20 + struct task_struct *task; 21 + }; 22 + 23 + static void process_timeout(struct timer_list *t) 24 + { 25 + struct process_timer *timeout = from_timer(timeout, t, timer); 26 + 27 + wake_up_process(timeout->task); 28 + } 29 + 30 + /** 31 + * schedule_timeout - sleep until timeout 32 + * @timeout: timeout value in jiffies 33 + * 34 + * Make the current task sleep until @timeout jiffies have elapsed. 35 + * The function behavior depends on the current task state 36 + * (see also set_current_state() description): 37 + * 38 + * %TASK_RUNNING - the scheduler is called, but the task does not sleep 39 + * at all. That happens because sched_submit_work() does nothing for 40 + * tasks in %TASK_RUNNING state. 41 + * 42 + * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to 43 + * pass before the routine returns unless the current task is explicitly 44 + * woken up, (e.g. by wake_up_process()). 45 + * 46 + * %TASK_INTERRUPTIBLE - the routine may return early if a signal is 47 + * delivered to the current task or the current task is explicitly woken 48 + * up. 49 + * 50 + * The current task state is guaranteed to be %TASK_RUNNING when this 51 + * routine returns. 52 + * 53 + * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule 54 + * the CPU away without a bound on the timeout. In this case the return 55 + * value will be %MAX_SCHEDULE_TIMEOUT. 
56 + * 57 + * Returns: 0 when the timer has expired, otherwise the remaining time in 58 + * jiffies will be returned. In all cases the return value is guaranteed 59 + * to be non-negative. 60 + */ 61 + signed long __sched schedule_timeout(signed long timeout) 62 + { 63 + struct process_timer timer; 64 + unsigned long expire; 65 + 66 + switch (timeout) { 67 + case MAX_SCHEDULE_TIMEOUT: 68 + /* 69 + * These two special cases are useful for the caller's 70 + * convenience. Nothing more. We could take 71 + * MAX_SCHEDULE_TIMEOUT from one of the negative values, 72 + * but I'd like to return a valid offset (>=0) to allow 73 + * the caller to do everything it wants with the retval. 74 + */ 75 + schedule(); 76 + goto out; 77 + default: 78 + /* 79 + * Another bit of paranoia. Note that the retval will be 80 + * 0 since no piece of the kernel is supposed to check 81 + * for a negative retval of schedule_timeout() (since it 82 + * should never happen anyway). You just have the printk() 83 + * that will tell you if something has gone wrong and where. 84 + */ 85 + if (timeout < 0) { 86 + pr_err("%s: wrong timeout value %lx\n", __func__, timeout); 87 + dump_stack(); 88 + __set_current_state(TASK_RUNNING); 89 + goto out; 90 + } 91 + } 92 + 93 + expire = timeout + jiffies; 94 + 95 + timer.task = current; 96 + timer_setup_on_stack(&timer.timer, process_timeout, 0); 97 + timer.timer.expires = expire; 98 + add_timer(&timer.timer); 99 + schedule(); 100 + del_timer_sync(&timer.timer); 101 + 102 + /* Remove the timer from the object tracker */ 103 + destroy_timer_on_stack(&timer.timer); 104 + 105 + timeout = expire - jiffies; 106 + 107 + out: 108 + return timeout < 0 ? 0 : timeout; 109 + } 110 + EXPORT_SYMBOL(schedule_timeout); 111 + 112 + /* 113 + * __set_current_state() can be used in schedule_timeout_*() functions, because 114 + * schedule_timeout() calls schedule() unconditionally. 
115 + */ 116 + 117 + /** 118 + * schedule_timeout_interruptible - sleep until timeout (interruptible) 119 + * @timeout: timeout value in jiffies 120 + * 121 + * See schedule_timeout() for details. 122 + * 123 + * Task state is set to TASK_INTERRUPTIBLE before starting the timeout. 124 + */ 125 + signed long __sched schedule_timeout_interruptible(signed long timeout) 126 + { 127 + __set_current_state(TASK_INTERRUPTIBLE); 128 + return schedule_timeout(timeout); 129 + } 130 + EXPORT_SYMBOL(schedule_timeout_interruptible); 131 + 132 + /** 133 + * schedule_timeout_killable - sleep until timeout (killable) 134 + * @timeout: timeout value in jiffies 135 + * 136 + * See schedule_timeout() for details. 137 + * 138 + * Task state is set to TASK_KILLABLE before starting the timeout. 139 + */ 140 + signed long __sched schedule_timeout_killable(signed long timeout) 141 + { 142 + __set_current_state(TASK_KILLABLE); 143 + return schedule_timeout(timeout); 144 + } 145 + EXPORT_SYMBOL(schedule_timeout_killable); 146 + 147 + /** 148 + * schedule_timeout_uninterruptible - sleep until timeout (uninterruptible) 149 + * @timeout: timeout value in jiffies 150 + * 151 + * See schedule_timeout() for details. 152 + * 153 + * Task state is set to TASK_UNINTERRUPTIBLE before starting the timeout. 154 + */ 155 + signed long __sched schedule_timeout_uninterruptible(signed long timeout) 156 + { 157 + __set_current_state(TASK_UNINTERRUPTIBLE); 158 + return schedule_timeout(timeout); 159 + } 160 + EXPORT_SYMBOL(schedule_timeout_uninterruptible); 161 + 162 + /** 163 + * schedule_timeout_idle - sleep until timeout (idle) 164 + * @timeout: timeout value in jiffies 165 + * 166 + * See schedule_timeout() for details. 167 + * 168 + * Task state is set to TASK_IDLE before starting the timeout. It is similar to 169 + * schedule_timeout_uninterruptible(), except this task will not contribute to 170 + * load average. 
171 + */ 172 + signed long __sched schedule_timeout_idle(signed long timeout) 173 + { 174 + __set_current_state(TASK_IDLE); 175 + return schedule_timeout(timeout); 176 + } 177 + EXPORT_SYMBOL(schedule_timeout_idle); 178 + 179 + /** 180 + * schedule_hrtimeout_range_clock - sleep until timeout 181 + * @expires: timeout value (ktime_t) 182 + * @delta: slack in expires timeout (ktime_t) 183 + * @mode: timer mode 184 + * @clock_id: timer clock to be used 185 + * 186 + * Details are explained in schedule_hrtimeout_range() function description as 187 + * this function is commonly used. 188 + */ 189 + int __sched schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta, 190 + const enum hrtimer_mode mode, clockid_t clock_id) 191 + { 192 + struct hrtimer_sleeper t; 193 + 194 + /* 195 + * Optimize when a zero timeout value is given. It does not 196 + * matter whether this is an absolute or a relative time. 197 + */ 198 + if (expires && *expires == 0) { 199 + __set_current_state(TASK_RUNNING); 200 + return 0; 201 + } 202 + 203 + /* 204 + * A NULL parameter means "infinite" 205 + */ 206 + if (!expires) { 207 + schedule(); 208 + return -EINTR; 209 + } 210 + 211 + hrtimer_setup_sleeper_on_stack(&t, clock_id, mode); 212 + hrtimer_set_expires_range_ns(&t.timer, *expires, delta); 213 + hrtimer_sleeper_start_expires(&t, mode); 214 + 215 + if (likely(t.task)) 216 + schedule(); 217 + 218 + hrtimer_cancel(&t.timer); 219 + destroy_hrtimer_on_stack(&t.timer); 220 + 221 + __set_current_state(TASK_RUNNING); 222 + 223 + return !t.task ? 0 : -EINTR; 224 + } 225 + EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock); 226 + 227 + /** 228 + * schedule_hrtimeout_range - sleep until timeout 229 + * @expires: timeout value (ktime_t) 230 + * @delta: slack in expires timeout (ktime_t) 231 + * @mode: timer mode 232 + * 233 + * Make the current task sleep until the given expiry time has 234 + * elapsed. 
The routine will return immediately unless 235 + * the current task state has been set (see set_current_state()). 236 + * 237 + * The @delta argument gives the kernel the freedom to schedule the 238 + * actual wakeup to a time that is both power and performance friendly 239 + * for regular (non RT/DL) tasks. 240 + * The kernel gives the normal best effort behavior for "@expires+@delta", 241 + * and may decide to fire the timer earlier, but no earlier than @expires. 242 + * 243 + * You can set the task state as follows: 244 + * 245 + * %TASK_UNINTERRUPTIBLE - at least the @expires time is guaranteed to 246 + * pass before the routine returns unless the current task is explicitly 247 + * woken up (e.g. by wake_up_process()). 248 + * 249 + * %TASK_INTERRUPTIBLE - the routine may return early if a signal is 250 + * delivered to the current task or the current task is explicitly woken 251 + * up. 252 + * 253 + * The current task state is guaranteed to be TASK_RUNNING when this 254 + * routine returns. 255 + * 256 + * Returns: 0 when the timer has expired. If the task was woken before the 257 + * timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or 258 + * by an explicit wakeup, it returns -EINTR. 259 + */ 260 + int __sched schedule_hrtimeout_range(ktime_t *expires, u64 delta, 261 + const enum hrtimer_mode mode) 262 + { 263 + return schedule_hrtimeout_range_clock(expires, delta, mode, 264 + CLOCK_MONOTONIC); 265 + } 266 + EXPORT_SYMBOL_GPL(schedule_hrtimeout_range); 267 + 268 + /** 269 + * schedule_hrtimeout - sleep until timeout 270 + * @expires: timeout value (ktime_t) 271 + * @mode: timer mode 272 + * 273 + * See schedule_hrtimeout_range() for details. The @delta argument of 274 + * schedule_hrtimeout_range() is set to 0 and therefore has no impact. 
275 + */ 276 + int __sched schedule_hrtimeout(ktime_t *expires, const enum hrtimer_mode mode) 277 + { 278 + return schedule_hrtimeout_range(expires, 0, mode); 279 + } 280 + EXPORT_SYMBOL_GPL(schedule_hrtimeout); 281 + 282 + /** 283 + * msleep - sleep safely even with waitqueue interruptions 284 + * @msecs: Requested sleep duration in milliseconds 285 + * 286 + * msleep() uses jiffy based timeouts for the sleep duration. Because of the 287 + * design of the timer wheel, the maximum additional percentage delay (slack) is 288 + * 12.5%. This is only valid for timers which will end up in level 1 or a higher 289 + * level of the timer wheel. For explanation of those 12.5% please check the 290 + * detailed description about the basics of the timer wheel. 291 + * 292 + * The slack of timers which will end up in level 0 depends on sleep duration 293 + * (msecs) and HZ configuration and can be calculated in the following way (with 294 + * the timer wheel design restriction that the slack is not less than 12.5%): 295 + * 296 + * ``slack = MSECS_PER_TICK / msecs`` 297 + * 298 + * When the allowed slack of the callsite is known, the calculation could be 299 + * turned around to find the minimal allowed sleep duration to meet the 300 + * constraints. For example: 301 + * 302 + * * ``HZ=1000`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 1 / (1/4) = 4``: 303 + * all sleep durations greater or equal 4ms will meet the constraints. 304 + * * ``HZ=1000`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 1 / (1/8) = 8``: 305 + * all sleep durations greater or equal 8ms will meet the constraints. 306 + * * ``HZ=250`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 4 / (1/4) = 16``: 307 + * all sleep durations greater or equal 16ms will meet the constraints. 308 + * * ``HZ=250`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 4 / (1/8) = 32``: 309 + * all sleep durations greater or equal 32ms will meet the constraints. 
310 + * 311 + * See also the signal-aware variant msleep_interruptible(). 312 + */ 313 + void msleep(unsigned int msecs) 314 + { 315 + unsigned long timeout = msecs_to_jiffies(msecs); 316 + 317 + while (timeout) 318 + timeout = schedule_timeout_uninterruptible(timeout); 319 + } 320 + EXPORT_SYMBOL(msleep); 321 + 322 + /** 323 + * msleep_interruptible - sleep waiting for signals 324 + * @msecs: Requested sleep duration in milliseconds 325 + * 326 + * See msleep() for some basic information. 327 + * 328 + * The difference between msleep() and msleep_interruptible() is that the sleep 329 + * can be interrupted by signal delivery, in which case it returns early. 330 + * 331 + * Returns: The remaining time of the sleep duration transformed to msecs (see 332 + * schedule_timeout() for details). 333 + */ 334 + unsigned long msleep_interruptible(unsigned int msecs) 335 + { 336 + unsigned long timeout = msecs_to_jiffies(msecs); 337 + 338 + while (timeout && !signal_pending(current)) 339 + timeout = schedule_timeout_interruptible(timeout); 340 + return jiffies_to_msecs(timeout); 341 + } 342 + EXPORT_SYMBOL(msleep_interruptible); 343 + 344 + /** 345 + * usleep_range_state - Sleep for an approximate time in a given state 346 + * @min: Minimum time in usecs to sleep 347 + * @max: Maximum time in usecs to sleep 348 + * @state: State the current task will be in while sleeping 349 + * 350 + * usleep_range_state() sleeps at least for the minimum specified time but not 351 + * longer than the maximum specified amount of time. The range might reduce 352 + * power usage by allowing hrtimers to coalesce an already scheduled interrupt 353 + * with this hrtimer. In the worst case, an interrupt is scheduled for the upper 354 + * bound. 355 + * 356 + * The sleeping task is set to the specified state before starting the sleep. 357 + * 358 + * In non-atomic context where the exact wakeup time is flexible, use 359 + * usleep_range() or its variants instead of udelay(). 
The sleep improves 360 + * responsiveness by avoiding the CPU-hogging busy-wait of udelay(). 361 + */ 362 + void __sched usleep_range_state(unsigned long min, unsigned long max, unsigned int state) 363 + { 364 + ktime_t exp = ktime_add_us(ktime_get(), min); 365 + u64 delta = (u64)(max - min) * NSEC_PER_USEC; 366 + 367 + if (WARN_ON_ONCE(max < min)) 368 + delta = 0; 369 + 370 + for (;;) { 371 + __set_current_state(state); 372 + /* Do not return before the requested sleep time has elapsed */ 373 + if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS)) 374 + break; 375 + } 376 + } 377 + EXPORT_SYMBOL(usleep_range_state);
+1 -2
kernel/time/tick-internal.h
··· 25 25 extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast); 26 26 extern void tick_handle_periodic(struct clock_event_device *dev); 27 27 extern void tick_check_new_device(struct clock_event_device *dev); 28 + extern void tick_offline_cpu(unsigned int cpu); 28 29 extern void tick_shutdown(unsigned int cpu); 29 30 extern void tick_suspend(void); 30 31 extern void tick_resume(void); ··· 143 142 #endif /* !(BROADCAST && ONESHOT) */ 144 143 145 144 #if defined(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST) && defined(CONFIG_HOTPLUG_CPU) 146 - extern void tick_offline_cpu(unsigned int cpu); 147 145 extern void tick_broadcast_offline(unsigned int cpu); 148 146 #else 149 - static inline void tick_offline_cpu(unsigned int cpu) { } 150 147 static inline void tick_broadcast_offline(unsigned int cpu) { } 151 148 #endif 152 149
+6 -19
kernel/time/tick-sched.c
··· 311 311 return HRTIMER_RESTART; 312 312 } 313 313 314 - static void tick_sched_timer_cancel(struct tick_sched *ts) 315 - { 316 - if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES)) 317 - hrtimer_cancel(&ts->sched_timer); 318 - else if (tick_sched_flag_test(ts, TS_FLAG_NOHZ)) 319 - tick_program_event(KTIME_MAX, 1); 320 - } 321 - 322 314 #ifdef CONFIG_NO_HZ_FULL 323 315 cpumask_var_t tick_nohz_full_mask; 324 316 EXPORT_SYMBOL_GPL(tick_nohz_full_mask); ··· 1053 1061 * the tick timer. 1054 1062 */ 1055 1063 if (unlikely(expires == KTIME_MAX)) { 1056 - tick_sched_timer_cancel(ts); 1064 + if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES)) 1065 + hrtimer_cancel(&ts->sched_timer); 1066 + else 1067 + tick_program_event(KTIME_MAX, 1); 1057 1068 return; 1058 1069 } 1059 1070 ··· 1605 1610 */ 1606 1611 void tick_sched_timer_dying(int cpu) 1607 1612 { 1608 - struct tick_device *td = &per_cpu(tick_cpu_device, cpu); 1609 1613 struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu); 1610 - struct clock_event_device *dev = td->evtdev; 1611 1614 ktime_t idle_sleeptime, iowait_sleeptime; 1612 1615 unsigned long idle_calls, idle_sleeps; 1613 1616 1614 1617 /* This must happen before hrtimers are migrated! */ 1615 - tick_sched_timer_cancel(ts); 1616 - 1617 - /* 1618 - * If the clockevents doesn't support CLOCK_EVT_STATE_ONESHOT_STOPPED, 1619 - * make sure not to call low-res tick handler. 1620 - */ 1621 - if (tick_sched_flag_test(ts, TS_FLAG_NOHZ)) 1622 - dev->event_handler = clockevents_handle_noop; 1618 + if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES)) 1619 + hrtimer_cancel(&ts->sched_timer); 1623 1620 1624 1621 idle_sleeptime = ts->idle_sleeptime; 1625 1622 iowait_sleeptime = ts->iowait_sleeptime;
+10 -10
kernel/time/time.c
··· 556 556 * - all other values are converted to jiffies by either multiplying 557 557 * the input value by a factor or dividing it with a factor and 558 558 * handling any 32-bit overflows. 559 - * for the details see __msecs_to_jiffies() 559 + * for the details see _msecs_to_jiffies() 560 560 * 561 - * __msecs_to_jiffies() checks for the passed in value being a constant 561 + * msecs_to_jiffies() checks for the passed in value being a constant 562 562 * via __builtin_constant_p() allowing gcc to eliminate most of the 563 563 * code, __msecs_to_jiffies() is called if the value passed does not 564 564 * allow constant folding and the actual conversion must be done at ··· 866 866 * 867 867 * Handles compat or 32-bit modes. 868 868 * 869 - * Return: %0 on success or negative errno on error 869 + * Return: 0 on success or negative errno on error 870 870 */ 871 871 int get_timespec64(struct timespec64 *ts, 872 872 const struct __kernel_timespec __user *uts) ··· 897 897 * @ts: input &struct timespec64 898 898 * @uts: user's &struct __kernel_timespec 899 899 * 900 - * Return: %0 on success or negative errno on error 900 + * Return: 0 on success or negative errno on error 901 901 */ 902 902 int put_timespec64(const struct timespec64 *ts, 903 903 struct __kernel_timespec __user *uts) ··· 944 944 * 945 945 * Handles X86_X32_ABI compatibility conversion. 946 946 * 947 - * Return: %0 on success or negative errno on error 947 + * Return: 0 on success or negative errno on error 948 948 */ 949 949 int get_old_timespec32(struct timespec64 *ts, const void __user *uts) 950 950 { ··· 963 963 * 964 964 * Handles X86_X32_ABI compatibility conversion. 
965 965 * 966 - * Return: %0 on success or negative errno on error 966 + * Return: 0 on success or negative errno on error 967 967 */ 968 968 int put_old_timespec32(const struct timespec64 *ts, void __user *uts) 969 969 { ··· 979 979 * @it: destination &struct itimerspec64 980 980 * @uit: user's &struct __kernel_itimerspec 981 981 * 982 - * Return: %0 on success or negative errno on error 982 + * Return: 0 on success or negative errno on error 983 983 */ 984 984 int get_itimerspec64(struct itimerspec64 *it, 985 985 const struct __kernel_itimerspec __user *uit) ··· 1002 1002 * @it: input &struct itimerspec64 1003 1003 * @uit: user's &struct __kernel_itimerspec 1004 1004 * 1005 - * Return: %0 on success or negative errno on error 1005 + * Return: 0 on success or negative errno on error 1006 1006 */ 1007 1007 int put_itimerspec64(const struct itimerspec64 *it, 1008 1008 struct __kernel_itimerspec __user *uit) ··· 1024 1024 * @its: destination &struct itimerspec64 1025 1025 * @uits: user's &struct old_itimerspec32 1026 1026 * 1027 - * Return: %0 on success or negative errno on error 1027 + * Return: 0 on success or negative errno on error 1028 1028 */ 1029 1029 int get_old_itimerspec32(struct itimerspec64 *its, 1030 1030 const struct old_itimerspec32 __user *uits) ··· 1043 1043 * @its: input &struct itimerspec64 1044 1044 * @uits: user's &struct old_itimerspec32 1045 1045 * 1046 - * Return: %0 on success or negative errno on error 1046 + * Return: 0 on success or negative errno on error 1047 1047 */ 1048 1048 int put_old_itimerspec32(const struct itimerspec64 *its, 1049 1049 struct old_itimerspec32 __user *uits)
+211 -321
kernel/time/timekeeping.c
··· 30 30 #include "timekeeping_internal.h" 31 31 32 32 #define TK_CLEAR_NTP (1 << 0) 33 - #define TK_MIRROR (1 << 1) 34 - #define TK_CLOCK_WAS_SET (1 << 2) 33 + #define TK_CLOCK_WAS_SET (1 << 1) 34 + 35 + #define TK_UPDATE_ALL (TK_CLEAR_NTP | TK_CLOCK_WAS_SET) 35 36 36 37 enum timekeeping_adv_mode { 37 38 /* Update timekeeper when a tick has passed */ ··· 42 41 TK_ADV_FREQ 43 42 }; 44 43 45 - DEFINE_RAW_SPINLOCK(timekeeper_lock); 46 - 47 44 /* 48 45 * The most important data for readout fits into a single 64 byte 49 46 * cache line. 50 47 */ 51 - static struct { 48 + struct tk_data { 52 49 seqcount_raw_spinlock_t seq; 53 50 struct timekeeper timekeeper; 54 - } tk_core ____cacheline_aligned = { 55 - .seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_core.seq, &timekeeper_lock), 56 - }; 51 + struct timekeeper shadow_timekeeper; 52 + raw_spinlock_t lock; 53 + } ____cacheline_aligned; 57 54 58 - static struct timekeeper shadow_timekeeper; 55 + static struct tk_data tk_core; 59 56 60 57 /* flag for if timekeeping is suspended */ 61 58 int __read_mostly timekeeping_suspended; ··· 112 113 .base[0] = FAST_TK_INIT, 113 114 .base[1] = FAST_TK_INIT, 114 115 }; 116 + 117 + unsigned long timekeeper_lock_irqsave(void) 118 + { 119 + unsigned long flags; 120 + 121 + raw_spin_lock_irqsave(&tk_core.lock, flags); 122 + return flags; 123 + } 124 + 125 + void timekeeper_unlock_irqrestore(unsigned long flags) 126 + { 127 + raw_spin_unlock_irqrestore(&tk_core.lock, flags); 128 + } 115 129 116 130 /* 117 131 * Multigrain timestamps require tracking the latest fine-grained timestamp ··· 190 178 WARN_ON_ONCE(tk->offs_real != timespec64_to_ktime(tmp)); 191 179 tk->wall_to_monotonic = wtm; 192 180 set_normalized_timespec64(&tmp, -wtm.tv_sec, -wtm.tv_nsec); 193 - tk->offs_real = timespec64_to_ktime(tmp); 194 - tk->offs_tai = ktime_add(tk->offs_real, ktime_set(tk->tai_offset, 0)); 181 + /* Paired with READ_ONCE() in ktime_mono_to_any() */ 182 + WRITE_ONCE(tk->offs_real, timespec64_to_ktime(tmp)); 183 + 
WRITE_ONCE(tk->offs_tai, ktime_add(tk->offs_real, ktime_set(tk->tai_offset, 0))); 195 184 } 196 185 197 186 static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta) 198 187 { 199 - tk->offs_boot = ktime_add(tk->offs_boot, delta); 188 + /* Paired with READ_ONCE() in ktime_mono_to_any() */ 189 + WRITE_ONCE(tk->offs_boot, ktime_add(tk->offs_boot, delta)); 200 190 /* 201 191 * Timespec representation for VDSO update to avoid 64bit division 202 192 * on every update. ··· 215 201 * the tkr's clocksource may change between the read reference, and the 216 202 * clock reference passed to the read function. This can cause crashes if 217 203 * the wrong clocksource is passed to the wrong read function. 218 - * This isn't necessary to use when holding the timekeeper_lock or doing 204 + * This isn't necessary to use when holding the tk_core.lock or doing 219 205 * a read of the fast-timekeeper tkrs (which is protected by its own locking 220 206 * and update logic). 221 207 */ ··· 225 211 226 212 return clock->read(clock); 227 213 } 228 - 229 - #ifdef CONFIG_DEBUG_TIMEKEEPING 230 - #define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */ 231 - 232 - static void timekeeping_check_update(struct timekeeper *tk, u64 offset) 233 - { 234 - 235 - u64 max_cycles = tk->tkr_mono.clock->max_cycles; 236 - const char *name = tk->tkr_mono.clock->name; 237 - 238 - if (offset > max_cycles) { 239 - printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow danger\n", 240 - offset, name, max_cycles); 241 - printk_deferred(" timekeeping: Your kernel is sick, but tries to cope by capping time updates\n"); 242 - } else { 243 - if (offset > (max_cycles >> 1)) { 244 - printk_deferred("INFO: timekeeping: Cycle offset (%lld) is larger than the '%s' clock's 50%% safety margin (%lld)\n", 245 - offset, name, max_cycles >> 1); 246 - printk_deferred(" timekeeping: Your kernel is still fine, but is feeling 
a bit nervous\n"); 247 - } 248 - } 249 - 250 - if (tk->underflow_seen) { 251 - if (jiffies - tk->last_warning > WARNING_FREQ) { 252 - printk_deferred("WARNING: Underflow in clocksource '%s' observed, time update ignored.\n", name); 253 - printk_deferred(" Please report this, consider using a different clocksource, if possible.\n"); 254 - printk_deferred(" Your kernel is probably still fine.\n"); 255 - tk->last_warning = jiffies; 256 - } 257 - tk->underflow_seen = 0; 258 - } 259 - 260 - if (tk->overflow_seen) { 261 - if (jiffies - tk->last_warning > WARNING_FREQ) { 262 - printk_deferred("WARNING: Overflow in clocksource '%s' observed, time update capped.\n", name); 263 - printk_deferred(" Please report this, consider using a different clocksource, if possible.\n"); 264 - printk_deferred(" Your kernel is probably still fine.\n"); 265 - tk->last_warning = jiffies; 266 - } 267 - tk->overflow_seen = 0; 268 - } 269 - } 270 - 271 - static inline u64 timekeeping_cycles_to_ns(const struct tk_read_base *tkr, u64 cycles); 272 - 273 - static inline u64 timekeeping_debug_get_ns(const struct tk_read_base *tkr) 274 - { 275 - struct timekeeper *tk = &tk_core.timekeeper; 276 - u64 now, last, mask, max, delta; 277 - unsigned int seq; 278 - 279 - /* 280 - * Since we're called holding a seqcount, the data may shift 281 - * under us while we're doing the calculation. This can cause 282 - * false positives, since we'd note a problem but throw the 283 - * results away. So nest another seqcount here to atomically 284 - * grab the points we are checking with. 285 - */ 286 - do { 287 - seq = read_seqcount_begin(&tk_core.seq); 288 - now = tk_clock_read(tkr); 289 - last = tkr->cycle_last; 290 - mask = tkr->mask; 291 - max = tkr->clock->max_cycles; 292 - } while (read_seqcount_retry(&tk_core.seq, seq)); 293 - 294 - delta = clocksource_delta(now, last, mask); 295 - 296 - /* 297 - * Try to catch underflows by checking if we are seeing small 298 - * mask-relative negative values. 
299 - */ 300 - if (unlikely((~delta & mask) < (mask >> 3))) 301 - tk->underflow_seen = 1; 302 - 303 - /* Check for multiplication overflows */ 304 - if (unlikely(delta > max)) 305 - tk->overflow_seen = 1; 306 - 307 - /* timekeeping_cycles_to_ns() handles both under and overflow */ 308 - return timekeeping_cycles_to_ns(tkr, now); 309 - } 310 - #else 311 - static inline void timekeeping_check_update(struct timekeeper *tk, u64 offset) 312 - { 313 - } 314 - static inline u64 timekeeping_debug_get_ns(const struct tk_read_base *tkr) 315 - { 316 - BUG(); 317 - } 318 - #endif 319 214 320 215 /** 321 216 * tk_setup_internals - Set up internals to use clocksource clock. ··· 330 407 return ((delta * tkr->mult) + tkr->xtime_nsec) >> tkr->shift; 331 408 } 332 409 333 - static __always_inline u64 __timekeeping_get_ns(const struct tk_read_base *tkr) 410 + static __always_inline u64 timekeeping_get_ns(const struct tk_read_base *tkr) 334 411 { 335 412 return timekeeping_cycles_to_ns(tkr, tk_clock_read(tkr)); 336 - } 337 - 338 - static inline u64 timekeeping_get_ns(const struct tk_read_base *tkr) 339 - { 340 - if (IS_ENABLED(CONFIG_DEBUG_TIMEKEEPING)) 341 - return timekeeping_debug_get_ns(tkr); 342 - 343 - return __timekeeping_get_ns(tkr); 344 413 } 345 414 346 415 /** ··· 380 465 seq = read_seqcount_latch(&tkf->seq); 381 466 tkr = tkf->base + (seq & 0x01); 382 467 now = ktime_to_ns(tkr->base); 383 - now += __timekeeping_get_ns(tkr); 468 + now += timekeeping_get_ns(tkr); 384 469 } while (read_seqcount_latch_retry(&tkf->seq, seq)); 385 470 386 471 return now; ··· 451 536 * timekeeping_inject_sleeptime64() 452 537 * __timekeeping_inject_sleeptime(tk, delta); 453 538 * timestamp(); 454 - * timekeeping_update(tk, TK_CLEAR_NTP...); 539 + * timekeeping_update_staged(tkd, TK_CLEAR_NTP...); 455 540 * 456 541 * (2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be 457 542 * partially updated. 
Since the tk->offs_boot update is a rare event, this ··· 496 581 tkr = tkf->base + (seq & 0x01); 497 582 basem = ktime_to_ns(tkr->base); 498 583 baser = ktime_to_ns(tkr->base_real); 499 - delta = __timekeeping_get_ns(tkr); 584 + delta = timekeeping_get_ns(tkr); 500 585 } while (raw_read_seqcount_latch_retry(&tkf->seq, seq)); 501 586 502 587 if (mono) ··· 610 695 int pvclock_gtod_register_notifier(struct notifier_block *nb) 611 696 { 612 697 struct timekeeper *tk = &tk_core.timekeeper; 613 - unsigned long flags; 614 698 int ret; 615 699 616 - raw_spin_lock_irqsave(&timekeeper_lock, flags); 700 + guard(raw_spinlock_irqsave)(&tk_core.lock); 617 701 ret = raw_notifier_chain_register(&pvclock_gtod_chain, nb); 618 702 update_pvclock_gtod(tk, true); 619 - raw_spin_unlock_irqrestore(&timekeeper_lock, flags); 620 703 621 704 return ret; 622 705 } ··· 627 714 */ 628 715 int pvclock_gtod_unregister_notifier(struct notifier_block *nb) 629 716 { 630 - unsigned long flags; 631 - int ret; 632 - 633 - raw_spin_lock_irqsave(&timekeeper_lock, flags); 634 - ret = raw_notifier_chain_unregister(&pvclock_gtod_chain, nb); 635 - raw_spin_unlock_irqrestore(&timekeeper_lock, flags); 636 - 637 - return ret; 717 + guard(raw_spinlock_irqsave)(&tk_core.lock); 718 + return raw_notifier_chain_unregister(&pvclock_gtod_chain, nb); 638 719 } 639 720 EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier); 640 721 ··· 641 734 if (tk->next_leap_ktime != KTIME_MAX) 642 735 /* Convert to monotonic time */ 643 736 tk->next_leap_ktime = ktime_sub(tk->next_leap_ktime, tk->offs_real); 737 + } 738 + 739 + /* 740 + * Leap state update for both shadow and the real timekeeper 741 + * Separate to spare a full memcpy() of the timekeeper. 
742 + */ 743 + static void tk_update_leap_state_all(struct tk_data *tkd) 744 + { 745 + write_seqcount_begin(&tkd->seq); 746 + tk_update_leap_state(&tkd->shadow_timekeeper); 747 + tkd->timekeeper.next_leap_ktime = tkd->shadow_timekeeper.next_leap_ktime; 748 + write_seqcount_end(&tkd->seq); 644 749 } 645 750 646 751 /* ··· 688 769 tk->tkr_raw.base = ns_to_ktime(tk->raw_sec * NSEC_PER_SEC); 689 770 } 690 771 691 - /* must hold timekeeper_lock */ 692 - static void timekeeping_update(struct timekeeper *tk, unsigned int action) 772 + /* 773 + * Restore the shadow timekeeper from the real timekeeper. 774 + */ 775 + static void timekeeping_restore_shadow(struct tk_data *tkd) 693 776 { 777 + lockdep_assert_held(&tkd->lock); 778 + memcpy(&tkd->shadow_timekeeper, &tkd->timekeeper, sizeof(tkd->timekeeper)); 779 + } 780 + 781 + static void timekeeping_update_from_shadow(struct tk_data *tkd, unsigned int action) 782 + { 783 + struct timekeeper *tk = &tk_core.shadow_timekeeper; 784 + 785 + lockdep_assert_held(&tkd->lock); 786 + 787 + /* 788 + * Block out readers before running the updates below because that 789 + * updates VDSO and other time related infrastructure. Not blocking 790 + * the readers might let a reader see time going backwards when 791 + * reading from the VDSO after the VDSO update and then reading in 792 + * the kernel from the timekeeper before that got updated. 793 + */ 794 + write_seqcount_begin(&tkd->seq); 795 + 694 796 if (action & TK_CLEAR_NTP) { 695 797 tk->ntp_error = 0; 696 798 ntp_clear(); ··· 729 789 730 790 if (action & TK_CLOCK_WAS_SET) 731 791 tk->clock_was_set_seq++; 792 + 732 793 /* 733 - * The mirroring of the data to the shadow-timekeeper needs 734 - * to happen last here to ensure we don't over-write the 735 - * timekeeper structure on the next update with stale data 794 + * Update the real timekeeper. 
+ *
+ * We could avoid this memcpy() by switching pointers, but that has
+ * the downside that the reader side does not longer benefit from
+ * the cacheline optimized data layout of the timekeeper and requires
+ * another indirection.
  */
-	if (action & TK_MIRROR)
-		memcpy(&shadow_timekeeper, &tk_core.timekeeper,
-		       sizeof(tk_core.timekeeper));
+	memcpy(&tkd->timekeeper, tk, sizeof(*tk));
+	write_seqcount_end(&tkd->seq);
 }
 
 /**
···
 	unsigned int seq;
 	ktime_t tconv;
 
+	if (IS_ENABLED(CONFIG_64BIT)) {
+		/*
+		 * Paired with WRITE_ONCE()s in tk_set_wall_to_mono() and
+		 * tk_update_sleep_time().
+		 */
+		return ktime_add(tmono, READ_ONCE(*offset));
+	}
+
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		tconv = ktime_add(tmono, *offset);
···
 	unsigned int seq;
 	ktime_t base_raw;
 	ktime_t base_real;
+	ktime_t base_boot;
 	u64 nsec_raw;
 	u64 nsec_real;
 	u64 now;
···
 	systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 	base_real = ktime_add(tk->tkr_mono.base,
 			      tk_core.timekeeper.offs_real);
+	base_boot = ktime_add(tk->tkr_mono.base,
+			      tk_core.timekeeper.offs_boot);
 	base_raw = tk->tkr_raw.base;
 	nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
 	nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
···
 
 	systime_snapshot->cycles = now;
 	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
+	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
 	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
 }
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);
···
  */
 int do_settimeofday64(const struct timespec64 *ts)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
 	struct timespec64 ts_delta, xt;
-	unsigned long flags;
-	int ret = 0;
 
 	if (!timespec64_valid_settod(ts))
 		return -EINVAL;
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
+		struct timekeeper *tks = &tk_core.shadow_timekeeper;
 
-	timekeeping_forward_now(tk);
+		timekeeping_forward_now(tks);
 
-	xt = tk_xtime(tk);
-	ts_delta = timespec64_sub(*ts, xt);
+		xt = tk_xtime(tks);
+		ts_delta = timespec64_sub(*ts, xt);
 
-	if (timespec64_compare(&tk->wall_to_monotonic, &ts_delta) > 0) {
-		ret = -EINVAL;
-		goto out;
+		if (timespec64_compare(&tks->wall_to_monotonic, &ts_delta) > 0) {
+			timekeeping_restore_shadow(&tk_core);
+			return -EINVAL;
+		}
+
+		tk_set_wall_to_mono(tks, timespec64_sub(tks->wall_to_monotonic, ts_delta));
+		tk_set_xtime(tks, ts);
+		timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
 	}
-
-	tk_set_wall_to_mono(tk, timespec64_sub(tk->wall_to_monotonic, ts_delta));
-
-	tk_set_xtime(tk, ts);
-out:
-	timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	/* Signal hrtimers about time change */
 	clock_was_set(CLOCK_SET_WALL);
 
-	if (!ret) {
-		audit_tk_injoffset(ts_delta);
-		add_device_randomness(ts, sizeof(*ts));
-	}
-
-	return ret;
+	audit_tk_injoffset(ts_delta);
+	add_device_randomness(ts, sizeof(*ts));
+	return 0;
 }
 EXPORT_SYMBOL(do_settimeofday64);
···
  */
 static int timekeeping_inject_offset(const struct timespec64 *ts)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
-	unsigned long flags;
-	struct timespec64 tmp;
-	int ret = 0;
-
 	if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
 		return -EINVAL;
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
+		struct timekeeper *tks = &tk_core.shadow_timekeeper;
+		struct timespec64 tmp;
 
-	timekeeping_forward_now(tk);
+		timekeeping_forward_now(tks);
 
-	/* Make sure the proposed value is valid */
-	tmp = timespec64_add(tk_xtime(tk), *ts);
-	if (timespec64_compare(&tk->wall_to_monotonic, ts) > 0 ||
-	    !timespec64_valid_settod(&tmp)) {
-		ret = -EINVAL;
-		goto error;
+		/* Make sure the proposed value is valid */
+		tmp = timespec64_add(tk_xtime(tks), *ts);
+		if (timespec64_compare(&tks->wall_to_monotonic, ts) > 0 ||
+		    !timespec64_valid_settod(&tmp)) {
+			timekeeping_restore_shadow(&tk_core);
+			return -EINVAL;
+		}
+
+		tk_xtime_add(tks, ts);
+		tk_set_wall_to_mono(tks, timespec64_sub(tks->wall_to_monotonic, *ts));
+		timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
 	}
-
-	tk_xtime_add(tk, ts);
-	tk_set_wall_to_mono(tk, timespec64_sub(tk->wall_to_monotonic, *ts));
-
-error: /* even if we error out, we forwarded the time, so call update */
-	timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	/* Signal hrtimers about time change */
 	clock_was_set(CLOCK_SET_WALL);
-
-	return ret;
+	return 0;
 }
 
 /*
···
  */
 static int change_clocksource(void *data)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
-	struct clocksource *new, *old = NULL;
-	unsigned long flags;
-	bool change = false;
-
-	new = (struct clocksource *) data;
+	struct clocksource *new = data, *old = NULL;
 
 	/*
-	 * If the cs is in module, get a module reference. Succeeds
-	 * for built-in code (owner == NULL) as well.
+	 * If the clocksource is in a module, get a module reference.
+	 * Succeeds for built-in code (owner == NULL) as well. Abort if the
+	 * reference can't be acquired.
 	 */
-	if (try_module_get(new->owner)) {
-		if (!new->enable || new->enable(new) == 0)
-			change = true;
-		else
-			module_put(new->owner);
+	if (!try_module_get(new->owner))
+		return 0;
+
+	/* Abort if the device can't be enabled */
+	if (new->enable && new->enable(new) != 0) {
+		module_put(new->owner);
+		return 0;
 	}
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
+		struct timekeeper *tks = &tk_core.shadow_timekeeper;
 
-	timekeeping_forward_now(tk);
-
-	if (change) {
-		old = tk->tkr_mono.clock;
-		tk_setup_internals(tk, new);
+		timekeeping_forward_now(tks);
+		old = tks->tkr_mono.clock;
+		tk_setup_internals(tks, new);
+		timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
 	}
-
-	timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	if (old) {
 		if (old->disable)
 			old->disable(old);
-
 		module_put(old->owner);
 	}
 
···
 	*boot_offset = ns_to_timespec64(local_clock());
 }
 
+static __init void tkd_basic_setup(struct tk_data *tkd)
+{
+	raw_spin_lock_init(&tkd->lock);
+	seqcount_raw_spinlock_init(&tkd->seq, &tkd->lock);
+}
+
 /*
  * Flag reflecting whether timekeeping_resume() has injected sleeptime.
  *
···
 void __init timekeeping_init(void)
 {
 	struct timespec64 wall_time, boot_offset, wall_to_mono;
-	struct timekeeper *tk = &tk_core.timekeeper;
+	struct timekeeper *tks = &tk_core.shadow_timekeeper;
 	struct clocksource *clock;
-	unsigned long flags;
+
+	tkd_basic_setup(&tk_core);
 
 	read_persistent_wall_and_boot_offset(&wall_time, &boot_offset);
 	if (timespec64_valid_settod(&wall_time) &&
···
 	 */
 	wall_to_mono = timespec64_sub(boot_offset, wall_time);
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	guard(raw_spinlock_irqsave)(&tk_core.lock);
+
 	ntp_init();
 
 	clock = clocksource_default_clock();
 	if (clock->enable)
 		clock->enable(clock);
-	tk_setup_internals(tk, clock);
+	tk_setup_internals(tks, clock);
 
-	tk_set_xtime(tk, &wall_time);
-	tk->raw_sec = 0;
+	tk_set_xtime(tks, &wall_time);
+	tks->raw_sec = 0;
 
-	tk_set_wall_to_mono(tk, wall_to_mono);
+	tk_set_wall_to_mono(tks, wall_to_mono);
 
-	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET);
 }
 
 /* time in seconds when suspend began for persistent clock */
···
  */
 void timekeeping_inject_sleeptime64(const struct timespec64 *delta)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
-	unsigned long flags;
+	scoped_guard(raw_spinlock_irqsave, &tk_core.lock) {
+		struct timekeeper *tks = &tk_core.shadow_timekeeper;
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
-
-	suspend_timing_needed = false;
-
-	timekeeping_forward_now(tk);
-
-	__timekeeping_inject_sleeptime(tk, delta);
-
-	timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+		suspend_timing_needed = false;
+		timekeeping_forward_now(tks);
+		__timekeeping_inject_sleeptime(tks, delta);
+		timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
+	}
 
 	/* Signal hrtimers about time change */
 	clock_was_set(CLOCK_SET_WALL | CLOCK_SET_BOOT);
···
  */
 void timekeeping_resume(void)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
-	struct clocksource *clock = tk->tkr_mono.clock;
-	unsigned long flags;
+	struct timekeeper *tks = &tk_core.shadow_timekeeper;
+	struct clocksource *clock = tks->tkr_mono.clock;
 	struct timespec64 ts_new, ts_delta;
-	u64 cycle_now, nsec;
 	bool inject_sleeptime = false;
+	u64 cycle_now, nsec;
+	unsigned long flags;
 
 	read_persistent_clock64(&ts_new);
 
 	clockevents_resume();
 	clocksource_resume();
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	raw_spin_lock_irqsave(&tk_core.lock, flags);
 
 	/*
 	 * After system resumes, we need to calculate the suspended time and
···
 	 * The less preferred source will only be tried if there is no better
 	 * usable source. The rtc part is handled separately in rtc core code.
 	 */
-	cycle_now = tk_clock_read(&tk->tkr_mono);
+	cycle_now = tk_clock_read(&tks->tkr_mono);
 	nsec = clocksource_stop_suspend_timing(clock, cycle_now);
 	if (nsec > 0) {
 		ts_delta = ns_to_timespec64(nsec);
···
 
 	if (inject_sleeptime) {
 		suspend_timing_needed = false;
-		__timekeeping_inject_sleeptime(tk, &ts_delta);
+		__timekeeping_inject_sleeptime(tks, &ts_delta);
 	}
 
 	/* Re-base the last cycle value */
-	tk->tkr_mono.cycle_last = cycle_now;
-	tk->tkr_raw.cycle_last = cycle_now;
+	tks->tkr_mono.cycle_last = cycle_now;
+	tks->tkr_raw.cycle_last = cycle_now;
 
-	tk->ntp_error = 0;
+	tks->ntp_error = 0;
 	timekeeping_suspended = 0;
-	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET);
+	raw_spin_unlock_irqrestore(&tk_core.lock, flags);
 
 	touch_softlockup_watchdog();
 
···
 
 int timekeeping_suspend(void)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
-	unsigned long flags;
-	struct timespec64		delta, delta_delta;
-	static struct timespec64	old_delta;
+	struct timekeeper *tks = &tk_core.shadow_timekeeper;
+	struct timespec64 delta, delta_delta;
+	static struct timespec64 old_delta;
 	struct clocksource *curr_clock;
+	unsigned long flags;
 	u64 cycle_now;
 
 	read_persistent_clock64(&timekeeping_suspend_time);
···
 
 	suspend_timing_needed = true;
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
-	timekeeping_forward_now(tk);
+	raw_spin_lock_irqsave(&tk_core.lock, flags);
+	timekeeping_forward_now(tks);
 	timekeeping_suspended = 1;
 
 	/*
···
 	 * just read from the current clocksource. Save this to potentially
 	 * use in suspend timing.
 	 */
-	curr_clock = tk->tkr_mono.clock;
-	cycle_now = tk->tkr_mono.cycle_last;
+	curr_clock = tks->tkr_mono.clock;
+	cycle_now = tks->tkr_mono.cycle_last;
 	clocksource_start_suspend_timing(curr_clock, cycle_now);
 
 	if (persistent_clock_exists) {
···
 		 * try to compensate so the difference in system time
 		 * and persistent_clock time stays close to constant.
 		 */
-		delta = timespec64_sub(tk_xtime(tk), timekeeping_suspend_time);
+		delta = timespec64_sub(tk_xtime(tks), timekeeping_suspend_time);
 		delta_delta = timespec64_sub(delta, old_delta);
 		if (abs(delta_delta.tv_sec) >= 2) {
 			/*
···
 		}
 	}
 
-	timekeeping_update(tk, TK_MIRROR);
-	halt_fast_timekeeper(tk);
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	timekeeping_update_from_shadow(&tk_core, 0);
+	halt_fast_timekeeper(tks);
+	raw_spin_unlock_irqrestore(&tk_core.lock, flags);
 
 	tick_suspend();
 	clocksource_suspend();
···
  */
 static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
+	u64 ntp_tl = ntp_tick_length();
 	u32 mult;
 
 	/*
 	 * Determine the multiplier from the current NTP tick length.
 	 * Avoid expensive division when the tick length doesn't change.
 	 */
-	if (likely(tk->ntp_tick == ntp_tick_length())) {
+	if (likely(tk->ntp_tick == ntp_tl)) {
 		mult = tk->tkr_mono.mult - tk->ntp_err_mult;
 	} else {
-		tk->ntp_tick = ntp_tick_length();
+		tk->ntp_tick = ntp_tl;
 		mult = div64_u64((tk->ntp_tick >> tk->ntp_error_shift) -
 				 tk->xtime_remainder, tk->cycle_interval);
 	}
···
  */
 static bool timekeeping_advance(enum timekeeping_adv_mode mode)
 {
+	struct timekeeper *tk = &tk_core.shadow_timekeeper;
 	struct timekeeper *real_tk = &tk_core.timekeeper;
-	struct timekeeper *tk = &shadow_timekeeper;
-	u64 offset;
-	int shift = 0, maxshift;
 	unsigned int clock_set = 0;
-	unsigned long flags;
+	int shift = 0, maxshift;
+	u64 offset;
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	guard(raw_spinlock_irqsave)(&tk_core.lock);
 
 	/* Make sure we're fully resumed: */
 	if (unlikely(timekeeping_suspended))
-		goto out;
+		return false;
 
 	offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
 				   tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 
 	/* Check if there's really nothing to do */
 	if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
-		goto out;
-
-	/* Do some additional sanity checking */
-	timekeeping_check_update(tk, offset);
+		return false;
 
 	/*
 	 * With NO_HZ we may have to accumulate many cycle_intervals
···
 	maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1;
 	shift = min(shift, maxshift);
 	while (offset >= tk->cycle_interval) {
-		offset = logarithmic_accumulation(tk, offset, shift,
-						  &clock_set);
+		offset = logarithmic_accumulation(tk, offset, shift, &clock_set);
 		if (offset < tk->cycle_interval<<shift)
 			shift--;
 	}
···
 	 */
 	clock_set |= accumulate_nsecs_to_secs(tk);
 
-	write_seqcount_begin(&tk_core.seq);
-	/*
-	 * Update the real timekeeper.
-	 *
-	 * We could avoid this memcpy by switching pointers, but that
-	 * requires changes to all other timekeeper usage sites as
-	 * well, i.e. move the timekeeper pointer getter into the
-	 * spinlocked/seqcount protected sections. And we trade this
-	 * memcpy under the tk_core.seq against one before we start
-	 * updating.
-	 */
-	timekeeping_update(tk, clock_set);
-	memcpy(real_tk, tk, sizeof(*tk));
-	/* The memcpy must come last. Do not put anything here! */
-	write_seqcount_end(&tk_core.seq);
-out:
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	timekeeping_update_from_shadow(&tk_core, clock_set);
 
 	return !!clock_set;
 }
···
  */
 int do_adjtimex(struct __kernel_timex *txc)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
 	struct audit_ntp_data ad;
 	bool offset_set = false;
 	bool clock_set = false;
 	struct timespec64 ts;
-	unsigned long flags;
-	s32 orig_tai, tai;
 	int ret;
 
 	/* Validate the data before disabling interrupts */
···
 
 	if (txc->modes & ADJ_SETOFFSET) {
 		struct timespec64 delta;
+
 		delta.tv_sec  = txc->time.tv_sec;
 		delta.tv_nsec = txc->time.tv_usec;
 		if (!(txc->modes & ADJ_NANO))
···
 	ktime_get_real_ts64(&ts);
 	add_device_randomness(&ts, sizeof(ts));
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
+		struct timekeeper *tks = &tk_core.shadow_timekeeper;
+		s32 orig_tai, tai;
 
-	orig_tai = tai = tk->tai_offset;
-	ret = __do_adjtimex(txc, &ts, &tai, &ad);
+		orig_tai = tai = tks->tai_offset;
+		ret = __do_adjtimex(txc, &ts, &tai, &ad);
 
-	if (tai != orig_tai) {
-		__timekeeping_set_tai_offset(tk, tai);
-		timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
-		clock_set = true;
+		if (tai != orig_tai) {
+			__timekeeping_set_tai_offset(tks, tai);
+			timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET);
+			clock_set = true;
+		} else {
+			tk_update_leap_state_all(&tk_core);
+		}
 	}
-	tk_update_leap_state(tk);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	audit_ntp_log(&ad);
 
···
  */
 void hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts)
 {
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
-
+	guard(raw_spinlock_irqsave)(&tk_core.lock);
 	__hardpps(phase_ts, raw_ts);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 }
 EXPORT_SYMBOL(hardpps);
 #endif /* CONFIG_NTP_PPS */
kernel/time/timekeeping_internal.h  (+2 -8)
···
 
 #endif
 
-#ifdef CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE
 static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
 {
 	u64 ret = (now - last) & mask;
···
 	 */
 	return ret & ~(mask >> 1) ? 0 : ret;
 }
-#else
-static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
-{
-	return (now - last) & mask;
-}
-#endif
 
 /* Semi public for serialization of non timekeeper VDSO updates. */
-extern raw_spinlock_t timekeeper_lock;
+unsigned long timekeeper_lock_irqsave(void);
+void timekeeper_unlock_irqrestore(unsigned long flags);
 
 #endif /* _TIMEKEEPING_INTERNAL_H */
kernel/time/timer.c  (+2 -193)
···
 #include <linux/tick.h>
 #include <linux/kallsyms.h>
 #include <linux/irq_work.h>
-#include <linux/sched/signal.h>
 #include <linux/sched/sysctl.h>
 #include <linux/sched/nohz.h>
 #include <linux/sched/debug.h>
···
 
 static void __run_timer_base(struct timer_base *base)
 {
-	if (time_before(jiffies, base->next_expiry))
+	/* Can race against a remote CPU updating next_expiry under the lock */
+	if (time_before(jiffies, READ_ONCE(base->next_expiry)))
 		return;
 
 	timer_base_lock_expiry(base);
···
 	run_posix_cpu_timers();
 }
 
-/*
- * Since schedule_timeout()'s timer is defined on the stack, it must store
- * the target task on the stack as well.
- */
-struct process_timer {
-	struct timer_list timer;
-	struct task_struct *task;
-};
-
-static void process_timeout(struct timer_list *t)
-{
-	struct process_timer *timeout = from_timer(timeout, t, timer);
-
-	wake_up_process(timeout->task);
-}
-
-/**
- * schedule_timeout - sleep until timeout
- * @timeout: timeout value in jiffies
- *
- * Make the current task sleep until @timeout jiffies have elapsed.
- * The function behavior depends on the current task state
- * (see also set_current_state() description):
- *
- * %TASK_RUNNING - the scheduler is called, but the task does not sleep
- * at all. That happens because sched_submit_work() does nothing for
- * tasks in %TASK_RUNNING state.
- *
- * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
- * pass before the routine returns unless the current task is explicitly
- * woken up, (e.g. by wake_up_process()).
- *
- * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
- * delivered to the current task or the current task is explicitly woken
- * up.
- *
- * The current task state is guaranteed to be %TASK_RUNNING when this
- * routine returns.
- *
- * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
- * the CPU away without a bound on the timeout. In this case the return
- * value will be %MAX_SCHEDULE_TIMEOUT.
- *
- * Returns 0 when the timer has expired otherwise the remaining time in
- * jiffies will be returned. In all cases the return value is guaranteed
- * to be non-negative.
- */
-signed long __sched schedule_timeout(signed long timeout)
-{
-	struct process_timer timer;
-	unsigned long expire;
-
-	switch (timeout)
-	{
-	case MAX_SCHEDULE_TIMEOUT:
-		/*
-		 * These two special cases are useful to be comfortable
-		 * in the caller. Nothing more. We could take
-		 * MAX_SCHEDULE_TIMEOUT from one of the negative value
-		 * but I' d like to return a valid offset (>=0) to allow
-		 * the caller to do everything it want with the retval.
-		 */
-		schedule();
-		goto out;
-	default:
-		/*
-		 * Another bit of PARANOID. Note that the retval will be
-		 * 0 since no piece of kernel is supposed to do a check
-		 * for a negative retval of schedule_timeout() (since it
-		 * should never happens anyway). You just have the printk()
-		 * that will tell you if something is gone wrong and where.
-		 */
-		if (timeout < 0) {
-			printk(KERN_ERR "schedule_timeout: wrong timeout "
-				"value %lx\n", timeout);
-			dump_stack();
-			__set_current_state(TASK_RUNNING);
-			goto out;
-		}
-	}
-
-	expire = timeout + jiffies;
-
-	timer.task = current;
-	timer_setup_on_stack(&timer.timer, process_timeout, 0);
-	__mod_timer(&timer.timer, expire, MOD_TIMER_NOTPENDING);
-	schedule();
-	del_timer_sync(&timer.timer);
-
-	/* Remove the timer from the object tracker */
-	destroy_timer_on_stack(&timer.timer);
-
-	timeout = expire - jiffies;
-
- out:
-	return timeout < 0 ? 0 : timeout;
-}
-EXPORT_SYMBOL(schedule_timeout);
-
-/*
- * We can use __set_current_state() here because schedule_timeout() calls
- * schedule() unconditionally.
- */
-signed long __sched schedule_timeout_interruptible(signed long timeout)
-{
-	__set_current_state(TASK_INTERRUPTIBLE);
-	return schedule_timeout(timeout);
-}
-EXPORT_SYMBOL(schedule_timeout_interruptible);
-
-signed long __sched schedule_timeout_killable(signed long timeout)
-{
-	__set_current_state(TASK_KILLABLE);
-	return schedule_timeout(timeout);
-}
-EXPORT_SYMBOL(schedule_timeout_killable);
-
-signed long __sched schedule_timeout_uninterruptible(signed long timeout)
-{
-	__set_current_state(TASK_UNINTERRUPTIBLE);
-	return schedule_timeout(timeout);
-}
-EXPORT_SYMBOL(schedule_timeout_uninterruptible);
-
-/*
- * Like schedule_timeout_uninterruptible(), except this task will not contribute
- * to load average.
- */
-signed long __sched schedule_timeout_idle(signed long timeout)
-{
-	__set_current_state(TASK_IDLE);
-	return schedule_timeout(timeout);
-}
-EXPORT_SYMBOL(schedule_timeout_idle);
-
 #ifdef CONFIG_HOTPLUG_CPU
 static void migrate_timer_list(struct timer_base *new_base, struct hlist_head *head)
 {
···
 	posix_cputimers_init_work();
 	open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
 }
-
-/**
- * msleep - sleep safely even with waitqueue interruptions
- * @msecs: Time in milliseconds to sleep for
- */
-void msleep(unsigned int msecs)
-{
-	unsigned long timeout = msecs_to_jiffies(msecs);
-
-	while (timeout)
-		timeout = schedule_timeout_uninterruptible(timeout);
-}
-
-EXPORT_SYMBOL(msleep);
-
-/**
- * msleep_interruptible - sleep waiting for signals
- * @msecs: Time in milliseconds to sleep for
- */
-unsigned long msleep_interruptible(unsigned int msecs)
-{
-	unsigned long timeout = msecs_to_jiffies(msecs);
-
-	while (timeout && !signal_pending(current))
-		timeout = schedule_timeout_interruptible(timeout);
-	return jiffies_to_msecs(timeout);
-}
-
-EXPORT_SYMBOL(msleep_interruptible);
-
-/**
- * usleep_range_state - Sleep for an approximate time in a given state
- * @min: Minimum time in usecs to sleep
- * @max: Maximum time in usecs to sleep
- * @state: State of the current task that will be while sleeping
- *
- * In non-atomic context where the exact wakeup time is flexible, use
- * usleep_range_state() instead of udelay(). The sleep improves responsiveness
- * by avoiding the CPU-hogging busy-wait of udelay(), and the range reduces
- * power usage by allowing hrtimers to take advantage of an already-
- * scheduled interrupt instead of scheduling a new one just for this sleep.
- */
-void __sched usleep_range_state(unsigned long min, unsigned long max,
-				unsigned int state)
-{
-	ktime_t exp = ktime_add_us(ktime_get(), min);
-	u64 delta = (u64)(max - min) * NSEC_PER_USEC;
-
-	for (;;) {
-		__set_current_state(state);
-		/* Do not return before the requested sleep time has elapsed */
-		if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS))
-			break;
-	}
-}
-EXPORT_SYMBOL(usleep_range_state);
kernel/time/vsyscall.c  (+2 -3)
···
 unsigned long vdso_update_begin(void)
 {
 	struct vdso_data *vdata = __arch_get_k_vdso_data();
-	unsigned long flags;
+	unsigned long flags = timekeeper_lock_irqsave();
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
 	vdso_write_begin(vdata);
 	return flags;
 }
···
 
 	vdso_write_end(vdata);
 	__arch_sync_vdso_data(vdata);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	timekeeper_unlock_irqrestore(flags);
 }
lib/Kconfig.debug  (-13)
···
 
 endmenu
 
-config DEBUG_TIMEKEEPING
-	bool "Enable extra timekeeping sanity checking"
-	help
-	  This option will enable additional timekeeping sanity checks
-	  which may be helpful when diagnosing issues where timekeeping
-	  problems are suspected.
-
-	  This may include checks in the timekeeping hotpaths, so this
-	  option may have a (very small) performance impact to some
-	  workloads.
-
-	  If unsure, say N.
-
 config DEBUG_PREEMPT
 	bool "Debug preemptible kernel"
 	depends on DEBUG_KERNEL && PREEMPTION && TRACE_IRQFLAGS_SUPPORT
mm/damon/core.c  (+2 -3)
···
 
 static void kdamond_usleep(unsigned long usecs)
 {
-	/* See Documentation/timers/timers-howto.rst for the thresholds */
-	if (usecs > 20 * USEC_PER_MSEC)
+	if (usecs >= USLEEP_RANGE_UPPER_BOUND)
 		schedule_timeout_idle(usecs_to_jiffies(usecs));
 	else
-		usleep_idle_range(usecs, usecs + 1);
+		usleep_range_idle(usecs, usecs + 1);
 }
 
 /* Returns negative error code if it's not activated but should return */
net/bluetooth/hci_event.c  (-2)
···
 #define ZERO_KEY "\x00\x00\x00\x00\x00\x00\x00\x00" \
 		 "\x00\x00\x00\x00\x00\x00\x00\x00"
 
-#define secs_to_jiffies(_secs) msecs_to_jiffies((_secs) * 1000)
-
 /* Handle HCI Event packets */
 
 static void *hci_ev_skb_pull(struct hci_dev *hdev, struct sk_buff *skb,
net/core/pktgen.c  (+1 -1)
···
 	s64 remaining;
 	struct hrtimer_sleeper t;
 
-	hrtimer_init_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	hrtimer_setup_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	hrtimer_set_expires(&t.timer, spin_until);
 
 	remaining = ktime_to_ns(hrtimer_expires_remaining(&t.timer));
net/netfilter/xt_IDLETIMER.c  (+1 -3)
···
 	schedule_work(&timer->work);
 }
 
-static enum alarmtimer_restart idletimer_tg_alarmproc(struct alarm *alarm,
-						      ktime_t now)
+static void idletimer_tg_alarmproc(struct alarm *alarm, ktime_t now)
 {
 	struct idletimer_tg *timer = alarm->data;
 
 	pr_debug("alarm %s expired\n", timer->attr.attr.name);
 	schedule_work(&timer->work);
-	return ALARMTIMER_NORESTART;
 }
 
 static int idletimer_check_sysfs_name(const char *name, unsigned int size)
scripts/checkpatch.pl  (+5 -5)
···
 # ignore udelay's < 10, however
 			if (! ($delay < 10) ) {
 				CHK("USLEEP_RANGE",
-				    "usleep_range is preferred over udelay; see Documentation/timers/timers-howto.rst\n" . $herecurr);
+				    "usleep_range is preferred over udelay; see function description of usleep_range() and udelay().\n" . $herecurr);
 			}
 			if ($delay > 2000) {
 				WARN("LONG_UDELAY",
-				     "long udelay - prefer mdelay; see arch/arm/include/asm/delay.h\n" . $herecurr);
+				     "long udelay - prefer mdelay; see function description of mdelay().\n" . $herecurr);
 			}
 		}
···
 		if ($line =~ /\bmsleep\s*\((\d+)\);/) {
 			if ($1 < 20) {
 				WARN("MSLEEP",
-				     "msleep < 20ms can sleep for up to 20ms; see Documentation/timers/timers-howto.rst\n" . $herecurr);
+				     "msleep < 20ms can sleep for up to 20ms; see function description of msleep().\n" . $herecurr);
 			}
 		}
···
 			my $max = $7;
 			if ($min eq $max) {
 				WARN("USLEEP_RANGE",
-				     "usleep_range should not use min == max args; see Documentation/timers/timers-howto.rst\n" . "$here\n$stat\n");
+				     "usleep_range should not use min == max args; see function description of usleep_range().\n" . "$here\n$stat\n");
 			} elsif ($min =~ /^\d+$/ && $max =~ /^\d+$/ &&
 				 $min > $max) {
 				WARN("USLEEP_RANGE",
-				     "usleep_range args reversed, use min then max; see Documentation/timers/timers-howto.rst\n" . "$here\n$stat\n");
+				     "usleep_range args reversed, use min then max; see function description of usleep_range().\n" . "$here\n$stat\n");
 			}
 		}
 
sound/soc/sof/ops.h  (+4 -4)
···
  * @addr: Address to poll
  * @val: Variable to read the value into
  * @cond: Break condition (usually involving @val)
- * @sleep_us: Maximum time to sleep between reads in us (0
- *            tight-loops). Should be less than ~20ms since usleep_range
- *            is used (see Documentation/timers/timers-howto.rst).
+ * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please
+ *            read usleep_range() function description for details and
+ *            limitations.
  * @timeout_us: Timeout in us, 0 means never timeout
  *
- * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either
  * case, the last read value at @addr is stored in @val. Must not
  * be called from atomic context if sleep_us or timeout_us are used.
  *
tools/testing/selftests/wireguard/qemu/debug.config  (-1)
···
 CONFIG_SCHED_INFO=y
 CONFIG_SCHEDSTATS=y
 CONFIG_SCHED_STACK_END_CHECK=y
-CONFIG_DEBUG_TIMEKEEPING=y
 CONFIG_DEBUG_PREEMPT=y
 CONFIG_DEBUG_RT_MUTEXES=y
 CONFIG_DEBUG_SPINLOCK=y