Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
"A rather large update for timekeeping and timers:

- The final step to get rid of auto-rearming posix-timers

posix-timers are currently auto-rearmed by the kernel when the
timer's signal is ignored, so that the timer signal can be
delivered once the corresponding signal is unignored.

This requires throttling the timer to prevent a DoS through small
intervals, and it pointlessly keeps the system out of low power
states. This is a long-standing, non-trivial problem due to the
lock order of the posix-timer lock and the sighand lock, along
with lifetime issues, as the timer and the sigqueue have different
lifetime rules.

Cure this by:

- Embedding the sigqueue into the timer struct so both have the
same lifetime rules. Aside from that, this also avoids the lookup
of the timer in the signal delivery and rearm path, as it is now
just an always-valid container_of().

- Queuing ignored timer signals onto a separate ignored list.

- Moving queued timer signals onto the ignored list when the
signal is switched to SIG_IGN before it could be delivered.

- Walking the ignored list when SIG_IGN is lifted and requeuing
the signals to the actual signal lists. This allows the signal
delivery code to rearm the timer.

This also required consolidating the signal delivery rules so they
are consistent across all situations. With that, all self-test
scenarios finally succeed.
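The embedding trick above can be sketched in a few lines. This is a hypothetical mock with illustrative field names (not the kernel's actual k_itimer/sigqueue definitions), showing why an embedded member makes the timer lookup a plain container_of():

```c
#include <assert.h>
#include <stddef.h>

/* Simplified version of the kernel's container_of(): recover the
 * enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Illustrative structs only -- not the real kernel definitions. */
struct sigqueue {
	int sig;		/* signal number carried by this entry */
};

struct k_itimer {
	long interval;		/* rearm interval */
	struct sigqueue sq;	/* embedded: shares the timer's lifetime */
};

/* Signal delivery sees only the sigqueue; with the embedding it can
 * reach the owning timer without any lookup or extra locking. */
static struct k_itimer *timer_of_sigqueue(struct sigqueue *q)
{
	return container_of(q, struct k_itimer, sq);
}
```

Because the sigqueue can no longer outlive (or predate) its timer, the lifetime mismatch that made the lock ordering hard goes away.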

- Core infrastructure for VFS multigrain timestamping

This is required to allow the kernel to use coarse-grained
timestamps by default and switch to fine-grained timestamps when
inode attributes are actively observed via getattr().

These changes have been provided to the VFS tree as well, so that
the VFS specific infrastructure could be built on top.
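The core idea can be sketched as follows. This is a hypothetical mock (names, fields, and the global "clock" values are illustrative, not the kernel's actual inode or timekeeping API): updates take the cheap coarse stamp unless the current value has been observed, in which case a fine-grained stamp guarantees the change is visible as strictly newer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the two time sources: coarse is updated per tick,
 * fine goes out to hardware and is more expensive. */
static uint64_t coarse_ns = 1000;
static uint64_t fine_ns   = 1234;

struct mg_inode {
	uint64_t ctime_ns;
	bool     ctime_queried;	/* set when getattr() observed ctime */
};

/* getattr() marks the current stamp as observed. */
static uint64_t mg_getattr_ctime(struct mg_inode *inode)
{
	inode->ctime_queried = true;
	return inode->ctime_ns;
}

/* On modification: coarse suffices if nobody looked; otherwise take a
 * fine-grained stamp so the observer sees a strictly newer value. */
static void mg_update_ctime(struct mg_inode *inode)
{
	inode->ctime_ns = inode->ctime_queried ? fine_ns : coarse_ns;
	inode->ctime_queried = false;
}
```

This keeps the common case cheap while preserving timestamp ordering for anyone who actually compares timestamps.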

- Cleanup and consolidation of the sleep() infrastructure

- Move all sleep and timeout functions into one file

- Rework udelay() and ndelay() into properly documented inline
functions and replace the hardcoded magic numbers with proper
defines.
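The flavor of that change can be sketched like this; the constant names and the loop-conversion factor here are purely illustrative, not the kernel's actual calibration:

```c
#include <assert.h>

/* Named constants instead of a bare magic multiplier. */
#define NSEC_PER_USEC  1000UL
#define LOOPS_PER_NSEC 2UL	/* stand-in for the calibrated estimate */

/* A documented inline helper: convert microseconds into busy-wait
 * loop iterations via named defines rather than "* 2000". */
static inline unsigned long delay_loops_for_us(unsigned long usecs)
{
	return usecs * NSEC_PER_USEC * LOOPS_PER_NSEC;
}
```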

- Rework the fsleep() implementation to take the reality of the
timer wheel granularity on different HZ values into account.
Right now the boundaries are hard-coded time ranges which fail
to provide the requested accuracy on different HZ settings.
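A minimal sketch of such an HZ-aware dispatcher, with illustrative names and thresholds (not the kernel's actual fsleep() logic): the cutover between hrtimer-backed and jiffy-backed sleeping is derived from the configured HZ instead of being a fixed range.

```c
#include <assert.h>

#define HZ 250
#define USEC_PER_SEC   1000000UL
#define USEC_PER_JIFFY (USEC_PER_SEC / HZ)	/* timer wheel granularity */

enum sleep_backend { BACKEND_UDELAY, BACKEND_USLEEP_RANGE, BACKEND_MSLEEP };

static enum sleep_backend fsleep_backend(unsigned long usecs)
{
	if (usecs <= 10)
		return BACKEND_UDELAY;		/* too short to sleep at all */
	if (usecs < USEC_PER_JIFFY)
		return BACKEND_USLEEP_RANGE;	/* below wheel granularity */
	return BACKEND_MSLEEP;			/* jiffy timers are accurate enough */
}
```

With HZ=100 the hrtimer window extends to 10ms; with HZ=1000 it shrinks to 1ms, which is exactly the behavior a hard-coded range cannot express.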

- Update documentation for all sleep/timeout related functions
and fix up stale documentation links all over the place

- Fixup a few usage sites

- Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
clocks

A system can have multiple PTP clocks which participate in
separate and independent PTP clock domains. So far the kernel only
considers the PTP clock based on CLOCK_TAI relevant, as
that is the clock which drives the timekeeping adjustments via the
various user space daemons through adjtimex(2).

The non-TAI-based clock domains are accessible via the file
descriptor based posix clocks, but their usability is very limited.
They cannot be accessed quickly, as they always go all the way out
to the hardware, and they cannot be utilized in the kernel itself.

As Time Sensitive Networking (TSN) gains traction, it becomes
necessary to provide fast user space and kernel space access to
these clocks.

The approach taken is to utilize the timekeeping and adjtimex(2)
infrastructure to provide this access, similar to how the
kernel provides access to clock MONOTONIC, REALTIME etc.

Instead of creating duplicate infrastructure, this rework
converts timekeeping and adjtimex(2) into generic functionality
which operates on pointers to data structures instead of using
static variables.

This makes it possible to provide time accessors and adjtimex(2)
functionality for the independent PTP clocks in a subsequent step.
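The static-to-pointer conversion described above can be sketched as follows. All names and fields are illustrative stand-ins, not the kernel's actual timekeeper structures:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative timekeeper instance -- not the real kernel struct. */
struct timekeeper {
	int64_t  offset_ns;	/* correction applied by adjtimex()-style tuning */
	uint64_t base_ns;
};

/* Generic functionality: operates on whichever instance it is handed,
 * instead of reading a file-scope static. */
static uint64_t tk_get_ns(const struct timekeeper *tk)
{
	return tk->base_ns + (uint64_t)tk->offset_ns;
}

static void tk_adjust(struct timekeeper *tk, int64_t delta_ns)
{
	tk->offset_ns += delta_ns;
}

/* The system timekeeper becomes just one instance among several;
 * a per-domain PTP clock can be another. */
static struct timekeeper tk_core = { .offset_ns = 0, .base_ns = 1000 };
static struct timekeeper tk_ptp  = { .offset_ns = 0, .base_ns = 5000 };
```

Adjusting one instance leaves the others untouched, which is what makes independent per-domain clock steering possible.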

- Consolidate hrtimer initialization

hrtimers are set up by initializing the data structure and then
separately setting the callback function, for historical reasons.

That is an extra, unnecessary step and makes Rust support less
straightforward than it should be.

Provide a new set of hrtimer_setup*() functions and convert the
core code and a few usage sites of the less frequently used
interfaces over.

The bulk of the hrtimer_init() to hrtimer_setup() conversion is
already prepared and scheduled for the next merge window.
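The two patterns can be contrasted with a small mock. This is purely illustrative: the real hrtimer_setup() lives in the kernel and also takes clock-id and mode arguments, which are omitted here:

```c
#include <assert.h>
#include <stddef.h>

struct hrtimer_mock {
	int initialized;
	int (*function)(struct hrtimer_mock *);
};

/* Old pattern: init, then assign the callback as a separate step --
 * the timer briefly exists without a valid callback. */
static void hrtimer_init_mock(struct hrtimer_mock *t)
{
	t->initialized = 1;
	t->function = NULL;	/* caller must remember the second step */
}

/* New pattern: one call fully sets up the timer including its
 * callback, so no half-initialized state is observable. */
static void hrtimer_setup_mock(struct hrtimer_mock *t,
			       int (*fn)(struct hrtimer_mock *))
{
	t->initialized = 1;
	t->function = fn;
}

static int my_callback(struct hrtimer_mock *t)
{
	(void)t;
	return 7;
}
```

Collapsing initialization into one call is also what makes the API easier to express safely from Rust.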

- Drivers:

- Ensure that the global timekeeping clocksource is utilizing the
cluster 0 timer on MIPS multi-cluster systems.

Otherwise CPUs on different clusters use their cluster-specific
clocksource, which is not guaranteed to be synchronized with
other clusters.
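On systems without a 64-bit counter access, the new multi-cluster read path in the diff below uses a classic hi/lo retry loop. Here is a self-contained sketch of that technique with the hardware register reads mocked by a plain variable:

```c
#include <assert.h>
#include <stdint.h>

/* Mocked hardware: a 64-bit counter exposed as two 32-bit halves. */
static uint64_t hw_counter;

static uint32_t read_counter_32l(void)
{
	return (uint32_t)hw_counter;
}

static uint32_t read_counter_32h(void)
{
	return (uint32_t)(hw_counter >> 32);
}

static uint64_t read_counter_64(void)
{
	uint32_t hi, hi2, lo;

	hi = read_counter_32h();
	for (;;) {
		lo = read_counter_32l();

		/* If hi didn't change then lo didn't wrap & we're done */
		hi2 = read_counter_32h();
		if (hi2 == hi)
			break;

		/* Otherwise, repeat with the latest hi value */
		hi = hi2;
	}

	return (((uint64_t)hi) << 32) + lo;
}
```

The retry guards against reading lo just as it wraps and carries into hi, which would otherwise produce a value off by 2^32.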

- Mostly boring cleanups, fixes, improvements and code movement"

* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
posix-timers: Fix spurious warning on double enqueue versus do_exit()
clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
clocksource/drivers/gpx: Remove redundant casts
clocksource/drivers/timer-ti-dm: Fix child node refcount handling
dt-bindings: timer: actions,owl-timer: convert to YAML
clocksource/drivers/ralink: Add Ralink System Tick Counter driver
clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
clocksource/drivers:sp804: Make user selectable
clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
hrtimers: Delete hrtimer_init_on_stack()
alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
io_uring: Switch to use hrtimer_setup_on_stack()
sched/idle: Switch to use hrtimer_setup_on_stack()
hrtimers: Delete hrtimer_init_sleeper_on_stack()
wait: Switch to use hrtimer_setup_sleeper_on_stack()
timers: Switch to use hrtimer_setup_sleeper_on_stack()
net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
futex: Switch to use hrtimer_setup_sleeper_on_stack()
fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
...

+2378 -2180
-2
Documentation/dev-tools/checkpatch.rst
··· 470 470 usleep_range() should be preferred over udelay(). The proper way of 471 471 using usleep_range() is mentioned in the kernel docs. 472 472 473 - See: https://www.kernel.org/doc/html/latest/timers/timers-howto.html#delays-information-on-the-various-kernel-delay-sleep-mechanisms 474 - 475 473 476 474 Comments 477 475 --------
-21
Documentation/devicetree/bindings/timer/actions,owl-timer.txt
··· 1 - Actions Semi Owl Timer 2 - 3 - Required properties: 4 - - compatible : "actions,s500-timer" for S500 5 - "actions,s700-timer" for S700 6 - "actions,s900-timer" for S900 7 - - reg : Offset and length of the register set for the device. 8 - - interrupts : Should contain the interrupts. 9 - - interrupt-names : Valid names are: "2hz0", "2hz1", 10 - "timer0", "timer1", "timer2", "timer3" 11 - See ../resource-names.txt 12 - 13 - Example: 14 - 15 - timer@b0168000 { 16 - compatible = "actions,s500-timer"; 17 - reg = <0xb0168000 0x100>; 18 - interrupts = <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>, 19 - <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>; 20 - interrupt-names = "timer0", "timer1"; 21 - };
+107
Documentation/devicetree/bindings/timer/actions,owl-timer.yaml
··· 1 + # SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause 2 + %YAML 1.2 3 + --- 4 + $id: http://devicetree.org/schemas/timer/actions,owl-timer.yaml# 5 + $schema: http://devicetree.org/meta-schemas/core.yaml# 6 + 7 + title: Actions Semi Owl timer 8 + 9 + maintainers: 10 + - Andreas Färber <afaerber@suse.de> 11 + 12 + description: 13 + Actions Semi Owl SoCs provide 32bit and 2Hz timers. 14 + The 32bit timers support dynamic irq, as well as one-shot mode. 15 + 16 + properties: 17 + compatible: 18 + enum: 19 + - actions,s500-timer 20 + - actions,s700-timer 21 + - actions,s900-timer 22 + 23 + clocks: 24 + maxItems: 1 25 + 26 + interrupts: 27 + minItems: 1 28 + maxItems: 6 29 + 30 + interrupt-names: 31 + minItems: 1 32 + maxItems: 6 33 + items: 34 + enum: 35 + - 2hz0 36 + - 2hz1 37 + - timer0 38 + - timer1 39 + - timer2 40 + - timer3 41 + 42 + reg: 43 + maxItems: 1 44 + 45 + required: 46 + - compatible 47 + - clocks 48 + - interrupts 49 + - interrupt-names 50 + - reg 51 + 52 + allOf: 53 + - if: 54 + properties: 55 + compatible: 56 + contains: 57 + enum: 58 + - actions,s500-timer 59 + then: 60 + properties: 61 + interrupts: 62 + minItems: 4 63 + maxItems: 4 64 + interrupt-names: 65 + items: 66 + - const: 2hz0 67 + - const: 2hz1 68 + - const: timer0 69 + - const: timer1 70 + 71 + - if: 72 + properties: 73 + compatible: 74 + contains: 75 + enum: 76 + - actions,s700-timer 77 + - actions,s900-timer 78 + then: 79 + properties: 80 + interrupts: 81 + minItems: 1 82 + maxItems: 1 83 + interrupt-names: 84 + items: 85 + - const: timer1 86 + 87 + additionalProperties: false 88 + 89 + examples: 90 + - | 91 + #include <dt-bindings/interrupt-controller/arm-gic.h> 92 + #include <dt-bindings/interrupt-controller/irq.h> 93 + soc { 94 + #address-cells = <1>; 95 + #size-cells = <1>; 96 + timer@b0168000 { 97 + compatible = "actions,s500-timer"; 98 + reg = <0xb0168000 0x100>; 99 + clocks = <&hosc>; 100 + interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>, 101 + <GIC_SPI 9 
IRQ_TYPE_LEVEL_HIGH>, 102 + <GIC_SPI 10 IRQ_TYPE_LEVEL_HIGH>, 103 + <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>; 104 + interrupt-names = "2hz0", "2hz1", "timer0", "timer1"; 105 + }; 106 + }; 107 + ...
+121
Documentation/timers/delay_sleep_functions.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + Delay and sleep mechanisms 4 + ========================== 5 + 6 + This document seeks to answer the common question: "What is the 7 + RightWay (TM) to insert a delay?" 8 + 9 + This question is most often faced by driver writers who have to 10 + deal with hardware delays and who may not be the most intimately 11 + familiar with the inner workings of the Linux Kernel. 12 + 13 + The following table gives a rough overview about the existing function 14 + 'families' and their limitations. This overview table does not replace the 15 + reading of the function description before usage! 16 + 17 + .. list-table:: 18 + :widths: 20 20 20 20 20 19 + :header-rows: 2 20 + 21 + * - 22 + - `*delay()` 23 + - `usleep_range*()` 24 + - `*sleep()` 25 + - `fsleep()` 26 + * - 27 + - busy-wait loop 28 + - hrtimers based 29 + - timer list timers based 30 + - combines the others 31 + * - Usage in atomic Context 32 + - yes 33 + - no 34 + - no 35 + - no 36 + * - precise on "short intervals" 37 + - yes 38 + - yes 39 + - depends 40 + - yes 41 + * - precise on "long intervals" 42 + - Do not use! 43 + - yes 44 + - max 12.5% slack 45 + - yes 46 + * - interruptible variant 47 + - no 48 + - yes 49 + - yes 50 + - no 51 + 52 + A generic advice for non atomic contexts could be: 53 + 54 + #. Use `fsleep()` whenever unsure (as it combines all the advantages of the 55 + others) 56 + #. Use `*sleep()` whenever possible 57 + #. Use `usleep_range*()` whenever accuracy of `*sleep()` is not sufficient 58 + #. Use `*delay()` for very, very short delays 59 + 60 + Find some more detailed information about the function 'families' in the next 61 + sections. 62 + 63 + `*delay()` family of functions 64 + ------------------------------ 65 + 66 + These functions use the jiffy estimation of clock speed and will busy wait for 67 + enough loop cycles to achieve the desired delay. udelay() is the basic 68 + implementation and ndelay() as well as mdelay() are variants. 
69 + 70 + These functions are mainly used to add a delay in atomic context. Please make 71 + sure to ask yourself before adding a delay in atomic context: Is this really 72 + required? 73 + 74 + .. kernel-doc:: include/asm-generic/delay.h 75 + :identifiers: udelay ndelay 76 + 77 + .. kernel-doc:: include/linux/delay.h 78 + :identifiers: mdelay 79 + 80 + 81 + `usleep_range*()` and `*sleep()` family of functions 82 + ---------------------------------------------------- 83 + 84 + These functions use hrtimers or timer list timers to provide the requested 85 + sleeping duration. In order to decide which function is the right one to use, 86 + take some basic information into account: 87 + 88 + #. hrtimers are more expensive as they are using an rb-tree (instead of hashing) 89 + #. hrtimers are more expensive when the requested sleeping duration is the first 90 + timer which means real hardware has to be programmed 91 + #. timer list timers always provide some sort of slack as they are jiffy based 92 + 93 + The generic advice is repeated here: 94 + 95 + #. Use `fsleep()` whenever unsure (as it combines all the advantages of the 96 + others) 97 + #. Use `*sleep()` whenever possible 98 + #. Use `usleep_range*()` whenever accuracy of `*sleep()` is not sufficient 99 + 100 + First check fsleep() function description and to learn more about accuracy, 101 + please check msleep() function description. 102 + 103 + 104 + `usleep_range*()` 105 + ~~~~~~~~~~~~~~~~~ 106 + 107 + .. kernel-doc:: include/linux/delay.h 108 + :identifiers: usleep_range usleep_range_idle 109 + 110 + .. kernel-doc:: kernel/time/sleep_timeout.c 111 + :identifiers: usleep_range_state 112 + 113 + 114 + `*sleep()` 115 + ~~~~~~~~~~ 116 + 117 + .. kernel-doc:: kernel/time/sleep_timeout.c 118 + :identifiers: msleep msleep_interruptible 119 + 120 + .. kernel-doc:: include/linux/delay.h 121 + :identifiers: ssleep fsleep
+1 -1
Documentation/timers/index.rst
··· 12 12 hrtimers 13 13 no_hz 14 14 timekeeping 15 - timers-howto 15 + delay_sleep_functions 16 16 17 17 .. only:: subproject and html 18 18
-115
Documentation/timers/timers-howto.rst
··· 1 - =================================================================== 2 - delays - Information on the various kernel delay / sleep mechanisms 3 - =================================================================== 4 - 5 - This document seeks to answer the common question: "What is the 6 - RightWay (TM) to insert a delay?" 7 - 8 - This question is most often faced by driver writers who have to 9 - deal with hardware delays and who may not be the most intimately 10 - familiar with the inner workings of the Linux Kernel. 11 - 12 - 13 - Inserting Delays 14 - ---------------- 15 - 16 - The first, and most important, question you need to ask is "Is my 17 - code in an atomic context?" This should be followed closely by "Does 18 - it really need to delay in atomic context?" If so... 19 - 20 - ATOMIC CONTEXT: 21 - You must use the `*delay` family of functions. These 22 - functions use the jiffy estimation of clock speed 23 - and will busy wait for enough loop cycles to achieve 24 - the desired delay: 25 - 26 - ndelay(unsigned long nsecs) 27 - udelay(unsigned long usecs) 28 - mdelay(unsigned long msecs) 29 - 30 - udelay is the generally preferred API; ndelay-level 31 - precision may not actually exist on many non-PC devices. 32 - 33 - mdelay is macro wrapper around udelay, to account for 34 - possible overflow when passing large arguments to udelay. 35 - In general, use of mdelay is discouraged and code should 36 - be refactored to allow for the use of msleep. 37 - 38 - NON-ATOMIC CONTEXT: 39 - You should use the `*sleep[_range]` family of functions. 
40 - There are a few more options here, while any of them may 41 - work correctly, using the "right" sleep function will 42 - help the scheduler, power management, and just make your 43 - driver better :) 44 - 45 - -- Backed by busy-wait loop: 46 - 47 - udelay(unsigned long usecs) 48 - 49 - -- Backed by hrtimers: 50 - 51 - usleep_range(unsigned long min, unsigned long max) 52 - 53 - -- Backed by jiffies / legacy_timers 54 - 55 - msleep(unsigned long msecs) 56 - msleep_interruptible(unsigned long msecs) 57 - 58 - Unlike the `*delay` family, the underlying mechanism 59 - driving each of these calls varies, thus there are 60 - quirks you should be aware of. 61 - 62 - 63 - SLEEPING FOR "A FEW" USECS ( < ~10us? ): 64 - * Use udelay 65 - 66 - - Why not usleep? 67 - On slower systems, (embedded, OR perhaps a speed- 68 - stepped PC!) the overhead of setting up the hrtimers 69 - for usleep *may* not be worth it. Such an evaluation 70 - will obviously depend on your specific situation, but 71 - it is something to be aware of. 72 - 73 - SLEEPING FOR ~USECS OR SMALL MSECS ( 10us - 20ms): 74 - * Use usleep_range 75 - 76 - - Why not msleep for (1ms - 20ms)? 77 - Explained originally here: 78 - https://lore.kernel.org/r/15327.1186166232@lwn.net 79 - 80 - msleep(1~20) may not do what the caller intends, and 81 - will often sleep longer (~20 ms actual sleep for any 82 - value given in the 1~20ms range). In many cases this 83 - is not the desired behavior. 84 - 85 - - Why is there no "usleep" / What is a good range? 86 - Since usleep_range is built on top of hrtimers, the 87 - wakeup will be very precise (ish), thus a simple 88 - usleep function would likely introduce a large number 89 - of undesired interrupts. 90 - 91 - With the introduction of a range, the scheduler is 92 - free to coalesce your wakeup with any other wakeup 93 - that may have happened for other reasons, or at the 94 - worst case, fire an interrupt for your upper bound. 
95 - 96 - The larger a range you supply, the greater a chance 97 - that you will not trigger an interrupt; this should 98 - be balanced with what is an acceptable upper bound on 99 - delay / performance for your specific code path. Exact 100 - tolerances here are very situation specific, thus it 101 - is left to the caller to determine a reasonable range. 102 - 103 - SLEEPING FOR LARGER MSECS ( 10ms+ ) 104 - * Use msleep or possibly msleep_interruptible 105 - 106 - - What's the difference? 107 - msleep sets the current task to TASK_UNINTERRUPTIBLE 108 - whereas msleep_interruptible sets the current task to 109 - TASK_INTERRUPTIBLE before scheduling the sleep. In 110 - short, the difference is whether the sleep can be ended 111 - early by a signal. In general, just use msleep unless 112 - you know you have a need for the interruptible variant. 113 - 114 - FLEXIBLE SLEEPING (any delay, uninterruptible) 115 - * Use fsleep
+3 -1
MAINTAINERS
··· 1998 1998 F: Documentation/devicetree/bindings/net/actions,owl-emac.yaml 1999 1999 F: Documentation/devicetree/bindings/pinctrl/actions,* 2000 2000 F: Documentation/devicetree/bindings/power/actions,owl-sps.txt 2001 - F: Documentation/devicetree/bindings/timer/actions,owl-timer.txt 2001 + F: Documentation/devicetree/bindings/timer/actions,owl-timer.yaml 2002 2002 F: arch/arm/boot/dts/actions/ 2003 2003 F: arch/arm/mach-actions/ 2004 2004 F: arch/arm64/boot/dts/actions/ ··· 10138 10138 T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/core 10139 10139 F: Documentation/timers/ 10140 10140 F: include/linux/clockchips.h 10141 + F: include/linux/delay.h 10141 10142 F: include/linux/hrtimer.h 10142 10143 F: include/linux/timer.h 10143 10144 F: kernel/time/clockevents.c 10144 10145 F: kernel/time/hrtimer.c 10146 + F: kernel/time/sleep_timeout.c 10145 10147 F: kernel/time/timer.c 10146 10148 F: kernel/time/timer_list.c 10147 10149 F: kernel/time/timer_migration.*
-1
arch/arm/kernel/smp_twd.c
··· 93 93 { 94 94 struct clock_event_device *clk = raw_cpu_ptr(twd_evt); 95 95 96 - twd_shutdown(clk); 97 96 disable_percpu_irq(clk->irq); 98 97 } 99 98
-7
arch/mips/ralink/Kconfig
··· 1 1 # SPDX-License-Identifier: GPL-2.0 2 2 if RALINK 3 3 4 - config CLKEVT_RT3352 5 - bool 6 - depends on SOC_RT305X || SOC_MT7620 7 - default y 8 - select TIMER_OF 9 - select CLKSRC_MMIO 10 - 11 4 config RALINK_ILL_ACC 12 5 bool 13 6 depends on SOC_RT305X
-2
arch/mips/ralink/Makefile
··· 10 10 obj-y += clk.o timer.o 11 11 endif 12 12 13 - obj-$(CONFIG_CLKEVT_RT3352) += cevt-rt3352.o 14 - 15 13 obj-$(CONFIG_RALINK_ILL_ACC) += ill_acc.o 16 14 17 15 obj-$(CONFIG_IRQ_INTC) += irq.o
+4 -7
arch/mips/ralink/cevt-rt3352.c drivers/clocksource/timer-ralink.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 1 2 /* 2 - * This file is subject to the terms and conditions of the GNU General Public 3 - * License. See the file "COPYING" in the main directory of this archive 4 - * for more details. 3 + * Ralink System Tick Counter driver present on RT3352 and MT7620 SoCs. 5 4 * 6 5 * Copyright (C) 2013 by John Crispin <john@phrozen.org> 7 6 */ ··· 14 15 #include <linux/of.h> 15 16 #include <linux/of_irq.h> 16 17 #include <linux/of_address.h> 17 - 18 - #include <asm/mach-ralink/ralink_regs.h> 19 18 20 19 #define SYSTICK_FREQ (50 * 1000) 21 20 ··· 37 40 static int systick_shutdown(struct clock_event_device *evt); 38 41 39 42 static int systick_next_event(unsigned long delta, 40 - struct clock_event_device *evt) 43 + struct clock_event_device *evt) 41 44 { 42 45 struct systick_device *sdev; 43 46 u32 count; ··· 57 60 58 61 static irqreturn_t systick_interrupt(int irq, void *dev_id) 59 62 { 60 - struct clock_event_device *dev = (struct clock_event_device *) dev_id; 63 + struct clock_event_device *dev = (struct clock_event_device *)dev_id; 61 64 62 65 dev->event_handler(dev); 63 66
+7 -14
arch/powerpc/kernel/rtas.c
··· 1390 1390 */ 1391 1391 ms = clamp(ms, 1U, 1000U); 1392 1392 /* 1393 - * The delay hint is an order-of-magnitude suggestion, not 1394 - * a minimum. It is fine, possibly even advantageous, for 1395 - * us to pause for less time than hinted. For small values, 1396 - * use usleep_range() to ensure we don't sleep much longer 1397 - * than actually needed. 1398 - * 1399 - * See Documentation/timers/timers-howto.rst for 1400 - * explanation of the threshold used here. In effect we use 1401 - * usleep_range() for 9900 and 9901, msleep() for 1402 - * 9902-9905. 1393 + * The delay hint is an order-of-magnitude suggestion, not a 1394 + * minimum. It is fine, possibly even advantageous, for us to 1395 + * pause for less time than hinted. To make sure pause time will 1396 + * not be way longer than requested independent of HZ 1397 + * configuration, use fsleep(). See fsleep() for details of 1398 + * used sleeping functions. 1403 1399 */ 1404 - if (ms <= 20) 1405 - usleep_range(ms * 100, ms * 1000); 1406 - else 1407 - msleep(ms); 1400 + fsleep(ms * 1000); 1408 1401 break; 1409 1402 case RTAS_BUSY: 1410 1403 ret = true;
-1
arch/riscv/configs/defconfig
··· 302 302 CONFIG_DEBUG_PER_CPU_MAPS=y 303 303 CONFIG_SOFTLOCKUP_DETECTOR=y 304 304 CONFIG_WQ_WATCHDOG=y 305 - CONFIG_DEBUG_TIMEKEEPING=y 306 305 CONFIG_DEBUG_RT_MUTEXES=y 307 306 CONFIG_DEBUG_SPINLOCK=y 308 307 CONFIG_DEBUG_MUTEXES=y
-1
arch/x86/Kconfig
··· 146 146 select ARCH_HAS_PARANOID_L1D_FLUSH 147 147 select BUILDTIME_TABLE_SORT 148 148 select CLKEVT_I8253 149 - select CLOCKSOURCE_VALIDATE_LAST_CYCLE 150 149 select CLOCKSOURCE_WATCHDOG 151 150 # Word-size accesses may read uninitialized data past the trailing \0 152 151 # in strings and cause false KMSAN reports.
-2
arch/x86/include/asm/timer.h
··· 6 6 #include <linux/interrupt.h> 7 7 #include <linux/math64.h> 8 8 9 - #define TICK_SIZE (tick_nsec / 1000) 10 - 11 9 unsigned long long native_sched_clock(void); 12 10 extern void recalibrate_cpu_khz(void); 13 11
+2 -10
arch/x86/kvm/xen.c
··· 263 263 atomic_set(&vcpu->arch.xen.timer_pending, 0); 264 264 } 265 265 266 - static void kvm_xen_init_timer(struct kvm_vcpu *vcpu) 267 - { 268 - hrtimer_init(&vcpu->arch.xen.timer, CLOCK_MONOTONIC, 269 - HRTIMER_MODE_ABS_HARD); 270 - vcpu->arch.xen.timer.function = xen_timer_callback; 271 - } 272 - 273 266 static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic) 274 267 { 275 268 struct kvm_vcpu_xen *vx = &v->arch.xen; ··· 1062 1069 r = -EINVAL; 1063 1070 break; 1064 1071 } 1065 - 1066 - if (!vcpu->arch.xen.timer.function) 1067 - kvm_xen_init_timer(vcpu); 1068 1072 1069 1073 /* Stop the timer (if it's running) before changing the vector */ 1070 1074 kvm_xen_stop_timer(vcpu); ··· 2225 2235 vcpu->arch.xen.poll_evtchn = 0; 2226 2236 2227 2237 timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0); 2238 + hrtimer_init(&vcpu->arch.xen.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD); 2239 + vcpu->arch.xen.timer.function = xen_timer_callback; 2228 2240 2229 2241 kvm_gpc_init(&vcpu->arch.xen.runstate_cache, vcpu->kvm); 2230 2242 kvm_gpc_init(&vcpu->arch.xen.runstate2_cache, vcpu->kvm);
+11 -1
drivers/clocksource/Kconfig
··· 400 400 This affects CPU_FREQ max delta from the initial frequency. 401 401 402 402 config ARM_TIMER_SP804 403 - bool "Support for Dual Timer SP804 module" if COMPILE_TEST 403 + bool "Support for Dual Timer SP804 module" 404 + depends on ARM || ARM64 || COMPILE_TEST 404 405 depends on GENERIC_SCHED_CLOCK && HAVE_CLK 405 406 select CLKSRC_MMIO 406 407 select TIMER_OF if OF ··· 753 752 help 754 753 Enables support for the Cirrus Logic timer block 755 754 EP93XX. 755 + 756 + config RALINK_TIMER 757 + bool "Ralink System Tick Counter" 758 + depends on SOC_RT305X || SOC_MT7620 || COMPILE_TEST 759 + select CLKSRC_MMIO 760 + select TIMER_OF 761 + help 762 + Enables support for system tick counter present on 763 + Ralink SoCs RT3352 and MT7620. 756 764 757 765 endmenu
+1
drivers/clocksource/Makefile
··· 91 91 obj-$(CONFIG_GXP_TIMER) += timer-gxp.o 92 92 obj-$(CONFIG_CLKSRC_LOONGSON1_PWM) += timer-loongson1-pwm.o 93 93 obj-$(CONFIG_EP93XX_TIMER) += timer-ep93xx.o 94 + obj-$(CONFIG_RALINK_TIMER) += timer-ralink.o
+1 -3
drivers/clocksource/arm_arch_timer.c
··· 1179 1179 disable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi]); 1180 1180 if (arch_timer_has_nonsecure_ppi()) 1181 1181 disable_percpu_irq(arch_timer_ppi[ARCH_TIMER_PHYS_NONSECURE_PPI]); 1182 - 1183 - clk->set_state_shutdown(clk); 1184 1182 } 1185 1183 1186 1184 static int arch_timer_dying_cpu(unsigned int cpu) ··· 1428 1430 1429 1431 arch_timers_present |= ARCH_TIMER_TYPE_CP15; 1430 1432 1431 - has_names = of_property_read_bool(np, "interrupt-names"); 1433 + has_names = of_property_present(np, "interrupt-names"); 1432 1434 1433 1435 for (i = ARCH_TIMER_PHYS_SECURE_PPI; i < ARCH_TIMER_MAX_TIMER_PPI; i++) { 1434 1436 if (has_names)
-1
drivers/clocksource/arm_global_timer.c
··· 195 195 { 196 196 struct clock_event_device *clk = this_cpu_ptr(gt_evt); 197 197 198 - gt_clockevent_shutdown(clk); 199 198 disable_percpu_irq(clk->irq); 200 199 return 0; 201 200 }
-39
drivers/clocksource/dw_apb_timer.c
··· 68 68 writel_relaxed(val, timer->base + offs); 69 69 } 70 70 71 - static void apbt_disable_int(struct dw_apb_timer *timer) 72 - { 73 - u32 ctrl = apbt_readl(timer, APBTMR_N_CONTROL); 74 - 75 - ctrl |= APBTMR_CONTROL_INT; 76 - apbt_writel(timer, ctrl, APBTMR_N_CONTROL); 77 - } 78 - 79 - /** 80 - * dw_apb_clockevent_pause() - stop the clock_event_device from running 81 - * 82 - * @dw_ced: The APB clock to stop generating events. 83 - */ 84 - void dw_apb_clockevent_pause(struct dw_apb_clock_event_device *dw_ced) 85 - { 86 - disable_irq(dw_ced->timer.irq); 87 - apbt_disable_int(&dw_ced->timer); 88 - } 89 - 90 71 static void apbt_eoi(struct dw_apb_timer *timer) 91 72 { 92 73 apbt_readl_relaxed(timer, APBTMR_N_EOI); ··· 263 282 } 264 283 265 284 return dw_ced; 266 - } 267 - 268 - /** 269 - * dw_apb_clockevent_resume() - resume a clock that has been paused. 270 - * 271 - * @dw_ced: The APB clock to resume. 272 - */ 273 - void dw_apb_clockevent_resume(struct dw_apb_clock_event_device *dw_ced) 274 - { 275 - enable_irq(dw_ced->timer.irq); 276 - } 277 - 278 - /** 279 - * dw_apb_clockevent_stop() - stop the clock_event_device and release the IRQ. 280 - * 281 - * @dw_ced: The APB clock to stop generating the events. 282 - */ 283 - void dw_apb_clockevent_stop(struct dw_apb_clock_event_device *dw_ced) 284 - { 285 - free_irq(dw_ced->timer.irq, &dw_ced->ced); 286 285 } 287 286 288 287 /**
-1
drivers/clocksource/exynos_mct.c
··· 496 496 per_cpu_ptr(&percpu_mct_tick, cpu); 497 497 struct clock_event_device *evt = &mevt->evt; 498 498 499 - evt->set_state_shutdown(evt); 500 499 if (mct_int_type == MCT_INT_SPI) { 501 500 if (evt->irq != -1) 502 501 disable_irq_nosync(evt->irq);
+38 -1
drivers/clocksource/mips-gic-timer.c
··· 166 166 return gic_read_count(); 167 167 } 168 168 169 + static u64 gic_hpt_read_multicluster(struct clocksource *cs) 170 + { 171 + unsigned int hi, hi2, lo; 172 + u64 count; 173 + 174 + mips_cm_lock_other(0, 0, 0, CM_GCR_Cx_OTHER_BLOCK_GLOBAL); 175 + 176 + if (mips_cm_is64) { 177 + count = read_gic_redir_counter(); 178 + goto out; 179 + } 180 + 181 + hi = read_gic_redir_counter_32h(); 182 + while (true) { 183 + lo = read_gic_redir_counter_32l(); 184 + 185 + /* If hi didn't change then lo didn't wrap & we're done */ 186 + hi2 = read_gic_redir_counter_32h(); 187 + if (hi2 == hi) 188 + break; 189 + 190 + /* Otherwise, repeat with the latest hi value */ 191 + hi = hi2; 192 + } 193 + 194 + count = (((u64)hi) << 32) + lo; 195 + out: 196 + mips_cm_unlock_other(); 197 + return count; 198 + } 199 + 169 200 static struct clocksource gic_clocksource = { 170 201 .name = "GIC", 171 202 .read = gic_hpt_read, ··· 233 202 else 234 203 gic_clocksource.rating = 200; 235 204 gic_clocksource.rating += clamp(gic_frequency / 10000000, 0, 99); 205 + 206 + if (mips_cps_multicluster_cpus()) { 207 + gic_clocksource.read = &gic_hpt_read_multicluster; 208 + gic_clocksource.vdso_clock_mode = VDSO_CLOCKMODE_NONE; 209 + } 236 210 237 211 ret = clocksource_register_hz(&gic_clocksource, gic_frequency); 238 212 if (ret < 0) ··· 297 261 * stable CPU frequency or on the platforms with CM3 and CPU frequency 298 262 * change performed by the CPC core clocks divider. 299 263 */ 300 - if (mips_cm_revision() >= CM_REV_CM3 || !IS_ENABLED(CONFIG_CPU_FREQ)) { 264 + if ((mips_cm_revision() >= CM_REV_CM3 || !IS_ENABLED(CONFIG_CPU_FREQ)) && 265 + !mips_cps_multicluster_cpus()) { 301 266 sched_clock_register(mips_cm_is64 ? 302 267 gic_read_count_64 : gic_read_count_2x32, 303 268 gic_count_width, gic_frequency);
-1
drivers/clocksource/timer-armada-370-xp.c
··· 201 201 { 202 202 struct clock_event_device *evt = per_cpu_ptr(armada_370_xp_evt, cpu); 203 203 204 - evt->set_state_shutdown(evt); 205 204 disable_percpu_irq(evt->irq); 206 205 return 0; 207 206 }
+1 -1
drivers/clocksource/timer-gxp.c
··· 85 85 86 86 clk = of_clk_get(node, 0); 87 87 if (IS_ERR(clk)) { 88 - ret = (int)PTR_ERR(clk); 88 + ret = PTR_ERR(clk); 89 89 pr_err("%pOFn clock not found: %d\n", node, ret); 90 90 goto err_free; 91 91 }
-1
drivers/clocksource/timer-qcom.c
··· 130 130 { 131 131 struct clock_event_device *evt = per_cpu_ptr(msm_evt, cpu); 132 132 133 - evt->set_state_shutdown(evt); 134 133 disable_percpu_irq(evt->irq); 135 134 return 0; 136 135 }
-1
drivers/clocksource/timer-tegra.c
··· 158 158 { 159 159 struct timer_of *to = per_cpu_ptr(&tegra_to, cpu); 160 160 161 - to->clkevt.set_state_shutdown(&to->clkevt); 162 161 disable_irq_nosync(to->clkevt.irq); 163 162 164 163 return 0;
+4 -4
drivers/clocksource/timer-ti-dm-systimer.c
··· 202 202 203 203 /* Secure gptimer12 is always clocked with a fixed source */ 204 204 if (!of_property_read_bool(np, "ti,timer-secure")) { 205 - if (!of_property_read_bool(np, "assigned-clocks")) 205 + if (!of_property_present(np, "assigned-clocks")) 206 206 return false; 207 207 208 - if (!of_property_read_bool(np, "assigned-clock-parents")) 208 + if (!of_property_present(np, "assigned-clock-parents")) 209 209 return false; 210 210 } 211 211 ··· 686 686 687 687 static int __init dmtimer_percpu_quirk_init(struct device_node *np, u32 pa) 688 688 { 689 - struct device_node *arm_timer; 689 + struct device_node *arm_timer __free(device_node) = 690 + of_find_compatible_node(NULL, NULL, "arm,armv7-timer"); 690 691 691 - arm_timer = of_find_compatible_node(NULL, NULL, "arm,armv7-timer"); 692 692 if (of_device_is_available(arm_timer)) { 693 693 pr_warn_once("ARM architected timer wrap issue i940 detected\n"); 694 694 return 0;
+6 -2
drivers/clocksource/timer-ti-dm.c
··· 1104 1104 return -ENOMEM; 1105 1105 1106 1106 timer->irq = platform_get_irq(pdev, 0); 1107 - if (timer->irq < 0) 1108 - return timer->irq; 1107 + if (timer->irq < 0) { 1108 + if (of_property_read_bool(dev->of_node, "ti,timer-pwm")) 1109 + dev_info(dev, "Did not find timer interrupt, timer usable in PWM mode only\n"); 1110 + else 1111 + return timer->irq; 1112 + } 1109 1113 1110 1114 timer->io_base = devm_platform_ioremap_resource(pdev, 0); 1111 1115 if (IS_ERR(timer->io_base))
+9 -8
drivers/gpu/drm/i915/i915_request.c
··· 273 273 return ret; 274 274 } 275 275 276 - static void __rq_init_watchdog(struct i915_request *rq) 277 - { 278 - rq->watchdog.timer.function = NULL; 279 - } 280 - 281 276 static enum hrtimer_restart __rq_watchdog_expired(struct hrtimer *hrtimer) 282 277 { 283 278 struct i915_request *rq = ··· 289 294 return HRTIMER_NORESTART; 290 295 } 291 296 297 + static void __rq_init_watchdog(struct i915_request *rq) 298 + { 299 + struct i915_request_watchdog *wdg = &rq->watchdog; 300 + 301 + hrtimer_init(&wdg->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 302 + wdg->timer.function = __rq_watchdog_expired; 303 + } 304 + 292 305 static void __rq_arm_watchdog(struct i915_request *rq) 293 306 { 294 307 struct i915_request_watchdog *wdg = &rq->watchdog; ··· 307 304 308 305 i915_request_get(rq); 309 306 310 - hrtimer_init(&wdg->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 311 - wdg->timer.function = __rq_watchdog_expired; 312 307 hrtimer_start_range_ns(&wdg->timer, 313 308 ns_to_ktime(ce->watchdog.timeout_us * 314 309 NSEC_PER_USEC), ··· 318 317 { 319 318 struct i915_request_watchdog *wdg = &rq->watchdog; 320 319 321 - if (wdg->timer.function && hrtimer_try_to_cancel(&wdg->timer) > 0) 320 + if (hrtimer_try_to_cancel(&wdg->timer) > 0) 322 321 i915_request_put(rq); 323 322 } 324 323
+4 -13
drivers/media/usb/dvb-usb-v2/anysee.c
··· 46 46 47 47 dev_dbg(&d->udev->dev, "%s: >>> %*ph\n", __func__, slen, state->buf); 48 48 49 - /* We need receive one message more after dvb_usb_generic_rw due 50 - to weird transaction flow, which is 1 x send + 2 x receive. */ 49 + /* 50 + * We need receive one message more after dvb_usbv2_generic_rw_locked() 51 + * due to weird transaction flow, which is 1 x send + 2 x receive. 52 + */ 51 53 ret = dvb_usbv2_generic_rw_locked(d, state->buf, sizeof(state->buf), 52 54 state->buf, sizeof(state->buf)); 53 55 if (ret) 54 56 goto error_unlock; 55 - 56 - /* TODO FIXME: dvb_usb_generic_rw() fails rarely with error code -32 57 - * (EPIPE, Broken pipe). Function supports currently msleep() as a 58 - * parameter but I would not like to use it, since according to 59 - * Documentation/timers/timers-howto.rst it should not be used such 60 - * short, under < 20ms, sleeps. Repeating failed message would be 61 - * better choice as not to add unwanted delays... 62 - * Fixing that correctly is one of those or both; 63 - * 1) use repeat if possible 64 - * 2) add suitable delay 65 - */ 66 57 67 58 /* get answer, retry few times if error returned */ 68 59 for (i = 0; i < 3; i++) {
-2
drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
··· 823 823 824 824 INIT_WORK(&rt2x00dev->rxdone_work, rt2x00usb_work_rxdone); 825 825 INIT_WORK(&rt2x00dev->txdone_work, rt2x00usb_work_txdone); 826 - hrtimer_init(&rt2x00dev->txstatus_timer, CLOCK_MONOTONIC, 827 - HRTIMER_MODE_REL); 828 826 829 827 retval = rt2x00usb_alloc_reg(rt2x00dev); 830 828 if (retval)
+1 -2
drivers/power/supply/charger-manager.c
··· 1412 1412 return dev_get_platdata(&pdev->dev); 1413 1413 } 1414 1414 1415 - static enum alarmtimer_restart cm_timer_func(struct alarm *alarm, ktime_t now) 1415 + static void cm_timer_func(struct alarm *alarm, ktime_t now) 1416 1416 { 1417 1417 cm_timer_set = false; 1418 - return ALARMTIMER_NORESTART; 1419 1418 } 1420 1419 1421 1420 static int charger_manager_probe(struct platform_device *pdev)
+1 -1
fs/aio.c
··· 1335 1335 if (until == 0 || ret < 0 || ret >= min_nr) 1336 1336 return ret; 1337 1337 1338 - hrtimer_init_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 1338 + hrtimer_setup_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 1339 1339 if (until != KTIME_MAX) { 1340 1340 hrtimer_set_expires_range_ns(&t.timer, until, current->timer_slack_ns); 1341 1341 hrtimer_sleeper_start_expires(&t, HRTIMER_MODE_REL);
+2 -2
fs/proc/base.c
··· 2552 2552 2553 2553 seq_printf(m, "ID: %d\n", timer->it_id); 2554 2554 seq_printf(m, "signal: %d/%px\n", 2555 - timer->sigq->info.si_signo, 2556 - timer->sigq->info.si_value.sival_ptr); 2555 + timer->sigq.info.si_signo, 2556 + timer->sigq.info.si_value.sival_ptr); 2557 2557 seq_printf(m, "notify: %s/%s.%d\n", 2558 2558 nstr[notify & ~SIGEV_THREAD_ID], 2559 2559 (notify & SIGEV_THREAD_ID) ? "tid" : "pid",
+1 -3
fs/timerfd.c
··· 79 79 return HRTIMER_NORESTART; 80 80 } 81 81 82 - static enum alarmtimer_restart timerfd_alarmproc(struct alarm *alarm, 83 - ktime_t now) 82 + static void timerfd_alarmproc(struct alarm *alarm, ktime_t now) 84 83 { 85 84 struct timerfd_ctx *ctx = container_of(alarm, struct timerfd_ctx, 86 85 t.alarm); 87 86 timerfd_triggered(ctx); 88 - return ALARMTIMER_NORESTART; 89 87 } 90 88 91 89 /*
+68 -26
include/asm-generic/delay.h
··· 2 2 #ifndef __ASM_GENERIC_DELAY_H 3 3 #define __ASM_GENERIC_DELAY_H 4 4 5 + #include <linux/math.h> 6 + #include <vdso/time64.h> 7 + 5 8 /* Undefined functions to get compile-time errors */ 6 9 extern void __bad_udelay(void); 7 10 extern void __bad_ndelay(void); ··· 15 12 extern void __delay(unsigned long loops); 16 13 17 14 /* 18 - * The weird n/20000 thing suppresses a "comparison is always false due to 19 - * limited range of data type" warning with non-const 8-bit arguments. 15 + * The microseconds/nanosecond delay multiplicators are used to convert a 16 + * constant microseconds/nanoseconds value to a value which can be used by the 17 + * architectures specific implementation to transform it into loops. 20 18 */ 19 + #define UDELAY_CONST_MULT ((unsigned long)DIV_ROUND_UP(1ULL << 32, USEC_PER_SEC)) 20 + #define NDELAY_CONST_MULT ((unsigned long)DIV_ROUND_UP(1ULL << 32, NSEC_PER_SEC)) 21 21 22 - /* 0x10c7 is 2**32 / 1000000 (rounded up) */ 23 - #define udelay(n) \ 24 - ({ \ 25 - if (__builtin_constant_p(n)) { \ 26 - if ((n) / 20000 >= 1) \ 27 - __bad_udelay(); \ 28 - else \ 29 - __const_udelay((n) * 0x10c7ul); \ 30 - } else { \ 31 - __udelay(n); \ 32 - } \ 33 - }) 22 + /* 23 + * The maximum constant udelay/ndelay value picked out of thin air to prevent 24 + * too long constant udelays/ndelays. 25 + */ 26 + #define DELAY_CONST_MAX 20000 34 27 35 - /* 0x5 is 2**32 / 1000000000 (rounded up) */ 36 - #define ndelay(n) \ 37 - ({ \ 38 - if (__builtin_constant_p(n)) { \ 39 - if ((n) / 20000 >= 1) \ 40 - __bad_ndelay(); \ 41 - else \ 42 - __const_udelay((n) * 5ul); \ 43 - } else { \ 44 - __ndelay(n); \ 45 - } \ 46 - }) 28 + /** 29 + * udelay - Inserting a delay based on microseconds with busy waiting 30 + * @usec: requested delay in microseconds 31 + * 32 + * When delaying in an atomic context ndelay(), udelay() and mdelay() are the 33 + * only valid variants of delaying/sleeping to go with. 
34 + * 35 + * When inserting delays in non atomic context which are shorter than the time 36 + * which is required to queue e.g. an hrtimer and to enter then the scheduler, 37 + * it is also valuable to use udelay(). But it is not simple to specify a 38 + * generic threshold for this which will fit for all systems. An approximation 39 + * is a threshold for all delays up to 10 microseconds. 40 + * 41 + * When having a delay which is larger than the architecture specific 42 + * %MAX_UDELAY_MS value, please make sure mdelay() is used. Otherwise an overflow 43 + * risk is given. 44 + * 45 + * Please note that ndelay(), udelay() and mdelay() may return early for several 46 + * reasons (https://lists.openwall.net/linux-kernel/2011/01/09/56): 47 + * 48 + * #. computed loops_per_jiffy too low (due to the time taken to execute the 49 + * timer interrupt.) 50 + * #. cache behaviour affecting the time it takes to execute the loop function. 51 + * #. CPU clock rate changes. 52 + */ 53 + static __always_inline void udelay(unsigned long usec) 54 + { 55 + if (__builtin_constant_p(usec)) { 56 + if (usec >= DELAY_CONST_MAX) 57 + __bad_udelay(); 58 + else 59 + __const_udelay(usec * UDELAY_CONST_MULT); 60 + } else { 61 + __udelay(usec); 62 + } 63 + } 64 + 65 + /** 66 + * ndelay - Inserting a delay based on nanoseconds with busy waiting 67 + * @nsec: requested delay in nanoseconds 68 + * 69 + * See udelay() for basic information about ndelay() and its variants. 70 + */ 71 + static __always_inline void ndelay(unsigned long nsec) 72 + { 73 + if (__builtin_constant_p(nsec)) { 74 + if (nsec >= DELAY_CONST_MAX) 75 + __bad_ndelay(); 76 + else 77 + __const_udelay(nsec * NDELAY_CONST_MULT); 78 + } else { 79 + __ndelay(nsec); 80 + } 81 + } 82 + #define ndelay(x) ndelay(x) 47 83 48 84 #endif /* __ASM_GENERIC_DELAY_H */
+2 -8
include/linux/alarmtimer.h
··· 20 20 ALARM_BOOTTIME_FREEZER, 21 21 }; 22 22 23 - enum alarmtimer_restart { 24 - ALARMTIMER_NORESTART, 25 - ALARMTIMER_RESTART, 26 - }; 27 - 28 - 29 23 #define ALARMTIMER_STATE_INACTIVE 0x00 30 24 #define ALARMTIMER_STATE_ENQUEUED 0x01 31 25 ··· 36 42 struct alarm { 37 43 struct timerqueue_node node; 38 44 struct hrtimer timer; 39 - enum alarmtimer_restart (*function)(struct alarm *, ktime_t now); 45 + void (*function)(struct alarm *, ktime_t now); 40 46 enum alarmtimer_type type; 41 47 int state; 42 48 void *data; 43 49 }; 44 50 45 51 void alarm_init(struct alarm *alarm, enum alarmtimer_type type, 46 - enum alarmtimer_restart (*function)(struct alarm *, ktime_t)); 52 + void (*function)(struct alarm *, ktime_t)); 47 53 void alarm_start(struct alarm *alarm, ktime_t start); 48 54 void alarm_start_relative(struct alarm *alarm, ktime_t start); 49 55 void alarm_restart(struct alarm *alarm);
-1
include/linux/clocksource.h
··· 215 215 216 216 extern int clocksource_unregister(struct clocksource*); 217 217 extern void clocksource_touch_watchdog(void); 218 - extern void clocksource_change_rating(struct clocksource *cs, int rating); 219 218 extern void clocksource_suspend(void); 220 219 extern void clocksource_resume(void); 221 220 extern struct clocksource * __init clocksource_default_clock(void);
+62 -17
include/linux/delay.h
··· 6 6 * Copyright (C) 1993 Linus Torvalds 7 7 * 8 8 * Delay routines, using a pre-computed "loops_per_jiffy" value. 9 - * 10 - * Please note that ndelay(), udelay() and mdelay() may return early for 11 - * several reasons: 12 - * 1. computed loops_per_jiffy too low (due to the time taken to 13 - * execute the timer interrupt.) 14 - * 2. cache behaviour affecting the time it takes to execute the 15 - * loop function. 16 - * 3. CPU clock rate changes. 17 - * 18 - * Please see this thread: 19 - * https://lists.openwall.net/linux-kernel/2011/01/09/56 9 + * Sleep routines using timer list timers or hrtimers. 20 10 */ 21 11 22 12 #include <linux/math.h> 23 13 #include <linux/sched.h> 14 + #include <linux/jiffies.h> 24 15 25 16 extern unsigned long loops_per_jiffy; 26 17 ··· 26 35 * The 2nd mdelay() definition ensures GCC will optimize away the 27 36 * while loop for the common cases where n <= MAX_UDELAY_MS -- Paul G. 28 37 */ 29 - 30 38 #ifndef MAX_UDELAY_MS 31 39 #define MAX_UDELAY_MS 5 32 40 #endif 33 41 34 42 #ifndef mdelay 43 + /** 44 + * mdelay - Inserting a delay based on milliseconds with busy waiting 45 + * @n: requested delay in milliseconds 46 + * 47 + * See udelay() for basic information about mdelay() and it's variants. 48 + * 49 + * Please double check, whether mdelay() is the right way to go or whether a 50 + * refactoring of the code is the better variant to be able to use msleep() 51 + * instead. 52 + */ 35 53 #define mdelay(n) (\ 36 54 (__builtin_constant_p(n) && (n)<=MAX_UDELAY_MS) ? udelay((n)*1000) : \ 37 55 ({unsigned long __ms=(n); while (__ms--) udelay(1000);})) ··· 63 63 void usleep_range_state(unsigned long min, unsigned long max, 64 64 unsigned int state); 65 65 66 + /** 67 + * usleep_range - Sleep for an approximate time 68 + * @min: Minimum time in microseconds to sleep 69 + * @max: Maximum time in microseconds to sleep 70 + * 71 + * For basic information please refere to usleep_range_state(). 
72 + * 73 + * The task will be in the state TASK_UNINTERRUPTIBLE during the sleep. 74 + */ 66 75 static inline void usleep_range(unsigned long min, unsigned long max) 67 76 { 68 77 usleep_range_state(min, max, TASK_UNINTERRUPTIBLE); 69 78 } 70 79 71 - static inline void usleep_idle_range(unsigned long min, unsigned long max) 80 + /** 81 + * usleep_range_idle - Sleep for an approximate time with idle time accounting 82 + * @min: Minimum time in microseconds to sleep 83 + * @max: Maximum time in microseconds to sleep 84 + * 85 + * For basic information please refer to usleep_range_state(). 86 + * 87 + * The sleeping task has the state TASK_IDLE during the sleep to prevent 88 + * contribution to the load average. 89 + */ 90 + static inline void usleep_range_idle(unsigned long min, unsigned long max) 72 91 { 73 92 usleep_range_state(min, max, TASK_IDLE); 74 93 } 75 94 95 + /** 96 + * ssleep - wrapper for seconds around msleep 97 + * @seconds: Requested sleep duration in seconds 98 + * 99 + * Please refer to msleep() for detailed information. 100 + */ 76 101 static inline void ssleep(unsigned int seconds) 77 102 { 78 103 msleep(seconds * 1000); 79 104 } 80 105 81 - /* see Documentation/timers/timers-howto.rst for the thresholds */ 106 + static const unsigned int max_slack_shift = 2; 107 + #define USLEEP_RANGE_UPPER_BOUND ((TICK_NSEC << max_slack_shift) / NSEC_PER_USEC) 108 + 109 + /** 110 + * fsleep - flexible sleep which autoselects the best mechanism 111 + * @usecs: requested sleep duration in microseconds 112 + * 113 + * fsleep() selects the best mechanism that will provide maximum 25% slack 114 + * to the requested sleep duration. Therefore it uses: 115 + * 116 + * * udelay() loop for sleep durations <= 10 microseconds to avoid hrtimer 117 + * overhead for really short sleep durations. 118 + * * usleep_range() for sleep durations which would lead with the usage of 119 + * msleep() to a slack larger than 25%. This depends on the granularity of 120 + * jiffies. 
121 + * * msleep() for all other sleep durations. 122 + * 123 + * Note: When %CONFIG_HIGH_RES_TIMERS is not set, all sleeps are processed with 124 + * the granularity of jiffies and the slack might exceed 25% especially for 125 + * short sleep durations. 126 + */ 82 127 static inline void fsleep(unsigned long usecs) 83 128 { 84 129 if (usecs <= 10) 85 130 udelay(usecs); 86 - else if (usecs <= 20000) 87 - usleep_range(usecs, 2 * usecs); 131 + else if (usecs < USLEEP_RANGE_UPPER_BOUND) 132 + usleep_range(usecs, usecs + (usecs >> max_slack_shift)); 88 133 else 89 - msleep(DIV_ROUND_UP(usecs, 1000)); 134 + msleep(DIV_ROUND_UP(usecs, USEC_PER_MSEC)); 90 135 } 91 136 92 137 #endif /* defined(_LINUX_DELAY_H) */
-3
include/linux/dw_apb_timer.h
··· 34 34 }; 35 35 36 36 void dw_apb_clockevent_register(struct dw_apb_clock_event_device *dw_ced); 37 - void dw_apb_clockevent_pause(struct dw_apb_clock_event_device *dw_ced); 38 - void dw_apb_clockevent_resume(struct dw_apb_clock_event_device *dw_ced); 39 - void dw_apb_clockevent_stop(struct dw_apb_clock_event_device *dw_ced); 40 37 41 38 struct dw_apb_clock_event_device * 42 39 dw_apb_clockevent_init(int cpu, const char *name, unsigned rating,
+29 -22
include/linux/hrtimer.h
··· 228 228 /* Initialize timers: */ 229 229 extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock, 230 230 enum hrtimer_mode mode); 231 - extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, 232 - enum hrtimer_mode mode); 231 + extern void hrtimer_setup(struct hrtimer *timer, enum hrtimer_restart (*function)(struct hrtimer *), 232 + clockid_t clock_id, enum hrtimer_mode mode); 233 + extern void hrtimer_setup_on_stack(struct hrtimer *timer, 234 + enum hrtimer_restart (*function)(struct hrtimer *), 235 + clockid_t clock_id, enum hrtimer_mode mode); 236 + extern void hrtimer_setup_sleeper_on_stack(struct hrtimer_sleeper *sl, clockid_t clock_id, 237 + enum hrtimer_mode mode); 233 238 234 239 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS 235 - extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock, 236 - enum hrtimer_mode mode); 237 - extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, 238 - clockid_t clock_id, 239 - enum hrtimer_mode mode); 240 - 241 240 extern void destroy_hrtimer_on_stack(struct hrtimer *timer); 242 241 #else 243 - static inline void hrtimer_init_on_stack(struct hrtimer *timer, 244 - clockid_t which_clock, 245 - enum hrtimer_mode mode) 246 - { 247 - hrtimer_init(timer, which_clock, mode); 248 - } 249 - 250 - static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, 251 - clockid_t clock_id, 252 - enum hrtimer_mode mode) 253 - { 254 - hrtimer_init_sleeper(sl, clock_id, mode); 255 - } 256 - 257 242 static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { } 258 243 #endif 259 244 ··· 320 335 static inline int hrtimer_callback_running(struct hrtimer *timer) 321 336 { 322 337 return timer->base->running == timer; 338 + } 339 + 340 + /** 341 + * hrtimer_update_function - Update the timer's callback function 342 + * @timer: Timer to update 343 + * @function: New callback function 344 + * 345 + * Only safe to call if the timer is not enqueued. 
Can be called in the callback function if the 346 + * timer is not enqueued at the same time (see the comments above HRTIMER_STATE_ENQUEUED). 347 + */ 348 + static inline void hrtimer_update_function(struct hrtimer *timer, 349 + enum hrtimer_restart (*function)(struct hrtimer *)) 350 + { 351 + guard(raw_spinlock_irqsave)(&timer->base->cpu_base->lock); 352 + 353 + if (WARN_ON_ONCE(hrtimer_is_queued(timer))) 354 + return; 355 + 356 + if (WARN_ON_ONCE(!function)) 357 + return; 358 + 359 + timer->function = function; 323 360 } 324 361 325 362 /* Forward a hrtimer so it expires after now: */
+26 -26
include/linux/iopoll.h
··· 19 19 * @op: accessor function (takes @args as its arguments) 20 20 * @val: Variable to read the value into 21 21 * @cond: Break condition (usually involving @val) 22 - * @sleep_us: Maximum time to sleep between reads in us (0 23 - * tight-loops). Should be less than ~20ms since usleep_range 24 - * is used (see Documentation/timers/timers-howto.rst). 22 + * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please 23 + * read usleep_range() function description for details and 24 + * limitations. 25 25 * @timeout_us: Timeout in us, 0 means never timeout 26 26 * @sleep_before_read: if it is true, sleep @sleep_us before read. 27 27 * @args: arguments for @op poll 28 28 * 29 - * Returns 0 on success and -ETIMEDOUT upon a timeout. In either 30 - * case, the last read value at @args is stored in @val. Must not 31 - * be called from atomic context if sleep_us or timeout_us are used. 32 - * 33 29 * When available, you'll probably want to use one of the specialized 34 30 * macros defined below rather than this macro directly. 31 + * 32 + * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either 33 + * case, the last read value at @args is stored in @val. Must not 34 + * be called from atomic context if sleep_us or timeout_us are used. 35 35 */ 36 36 #define read_poll_timeout(op, val, cond, sleep_us, timeout_us, \ 37 37 sleep_before_read, args...) \ ··· 64 64 * @op: accessor function (takes @args as its arguments) 65 65 * @val: Variable to read the value into 66 66 * @cond: Break condition (usually involving @val) 67 - * @delay_us: Time to udelay between reads in us (0 tight-loops). Should 68 - * be less than ~10us since udelay is used (see 69 - * Documentation/timers/timers-howto.rst). 67 + * @delay_us: Time to udelay between reads in us (0 tight-loops). Please 68 + * read udelay() function description for details and 69 + * limitations. 
70 70 * @timeout_us: Timeout in us, 0 means never timeout 71 71 * @delay_before_read: if it is true, delay @delay_us before read. 72 72 * @args: arguments for @op poll 73 - * 74 - * Returns 0 on success and -ETIMEDOUT upon a timeout. In either 75 - * case, the last read value at @args is stored in @val. 76 73 * 77 74 * This macro does not rely on timekeeping. Hence it is safe to call even when 78 75 * timekeeping is suspended, at the expense of an underestimation of wall clock ··· 77 80 * 78 81 * When available, you'll probably want to use one of the specialized 79 82 * macros defined below rather than this macro directly. 83 + * 84 + * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either 85 + * case, the last read value at @args is stored in @val. 80 86 */ 81 87 #define read_poll_timeout_atomic(op, val, cond, delay_us, timeout_us, \ 82 88 delay_before_read, args...) \ ··· 119 119 * @addr: Address to poll 120 120 * @val: Variable to read the value into 121 121 * @cond: Break condition (usually involving @val) 122 - * @sleep_us: Maximum time to sleep between reads in us (0 123 - * tight-loops). Should be less than ~20ms since usleep_range 124 - * is used (see Documentation/timers/timers-howto.rst). 122 + * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please 123 + * read usleep_range() function description for details and 124 + * limitations. 125 125 * @timeout_us: Timeout in us, 0 means never timeout 126 - * 127 - * Returns 0 on success and -ETIMEDOUT upon a timeout. In either 128 - * case, the last read value at @addr is stored in @val. Must not 129 - * be called from atomic context if sleep_us or timeout_us are used. 130 126 * 131 127 * When available, you'll probably want to use one of the specialized 132 128 * macros defined below rather than this macro directly. 129 + * 130 + * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either 131 + * case, the last read value at @addr is stored in @val. 
Must not 132 + * be called from atomic context if sleep_us or timeout_us are used. 133 133 */ 134 134 #define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us) \ 135 135 read_poll_timeout(op, val, cond, sleep_us, timeout_us, false, addr) ··· 140 140 * @addr: Address to poll 141 141 * @val: Variable to read the value into 142 142 * @cond: Break condition (usually involving @val) 143 - * @delay_us: Time to udelay between reads in us (0 tight-loops). Should 144 - * be less than ~10us since udelay is used (see 145 - * Documentation/timers/timers-howto.rst). 143 + * @delay_us: Time to udelay between reads in us (0 tight-loops). Please 144 + * read udelay() function description for details and 145 + * limitations. 146 146 * @timeout_us: Timeout in us, 0 means never timeout 147 - * 148 - * Returns 0 on success and -ETIMEDOUT upon a timeout. In either 149 - * case, the last read value at @addr is stored in @val. 150 147 * 151 148 * When available, you'll probably want to use one of the specialized 152 149 * macros defined below rather than this macro directly. 150 + * 151 + * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either 152 + * case, the last read value at @addr is stored in @val. 153 153 */ 154 154 #define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \ 155 155 read_poll_timeout_atomic(op, val, cond, delay_us, timeout_us, false, addr)
+14 -1
include/linux/jiffies.h
··· 502 502 * - all other values are converted to jiffies by either multiplying 503 503 * the input value by a factor or dividing it with a factor and 504 504 * handling any 32-bit overflows. 505 - * for the details see __msecs_to_jiffies() 505 + * for the details see _msecs_to_jiffies() 506 506 * 507 507 * msecs_to_jiffies() checks for the passed in value being a constant 508 508 * via __builtin_constant_p() allowing gcc to eliminate most of the ··· 525 525 return __msecs_to_jiffies(m); 526 526 } 527 527 } 528 + 529 + /** 530 + * secs_to_jiffies: - convert seconds to jiffies 531 + * @_secs: time in seconds 532 + * 533 + * Conversion is done by simple multiplication with HZ 534 + * 535 + * secs_to_jiffies() is defined as a macro rather than a static inline 536 + * function so it can be used in static initializers. 537 + * 538 + * Return: jiffies value 539 + */ 540 + #define secs_to_jiffies(_secs) ((_secs) * HZ) 528 541 529 542 extern unsigned long __usecs_to_jiffies(const unsigned int u); 530 543 #if !(USEC_PER_SEC % HZ)
+5 -4
include/linux/phy.h
··· 1378 1378 * @regnum: The register on the MMD to read 1379 1379 * @val: Variable to read the register into 1380 1380 * @cond: Break condition (usually involving @val) 1381 - * @sleep_us: Maximum time to sleep between reads in us (0 1382 - * tight-loops). Should be less than ~20ms since usleep_range 1383 - * is used (see Documentation/timers/timers-howto.rst). 1381 + * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please 1382 + * read usleep_range() function description for details and 1383 + * limitations. 1384 1384 * @timeout_us: Timeout in us, 0 means never timeout 1385 1385 * @sleep_before_read: if it is true, sleep @sleep_us before read. 1386 - * Returns 0 on success and -ETIMEDOUT upon a timeout. In either 1386 + * 1387 + * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either 1387 1388 * case, the last read value at @args is stored in @val. Must not 1388 1389 * be called from atomic context if sleep_us or timeout_us are used. 1389 1390 */
+60 -12
include/linux/posix-timers.h
··· 5 5 #include <linux/alarmtimer.h> 6 6 #include <linux/list.h> 7 7 #include <linux/mutex.h> 8 + #include <linux/pid.h> 8 9 #include <linux/posix-timers_types.h> 10 + #include <linux/rcuref.h> 9 11 #include <linux/spinlock.h> 10 12 #include <linux/timerqueue.h> 11 13 12 14 struct kernel_siginfo; 13 15 struct task_struct; 16 + struct sigqueue; 17 + struct k_itimer; 14 18 15 19 static inline clockid_t make_process_cpuclock(const unsigned int pid, 16 20 const clockid_t clock) ··· 39 35 40 36 #ifdef CONFIG_POSIX_TIMERS 41 37 38 + #include <linux/signal_types.h> 39 + 42 40 /** 43 41 * cpu_timer - Posix CPU timer representation for k_itimer 44 42 * @node: timerqueue node to queue in the task/sig ··· 48 42 * @pid: Pointer to target task PID 49 43 * @elist: List head for the expiry list 50 44 * @firing: Timer is currently firing 45 + * @nanosleep: Timer is used for nanosleep and is not a regular posix-timer 51 46 * @handling: Pointer to the task which handles expiry 52 47 */ 53 48 struct cpu_timer { ··· 56 49 struct timerqueue_head *head; 57 50 struct pid *pid; 58 51 struct list_head elist; 59 - int firing; 52 + bool firing; 53 + bool nanosleep; 60 54 struct task_struct __rcu *handling; 61 55 }; 62 56 ··· 109 101 pct->bases[CPUCLOCK_SCHED].nextevt = runtime; 110 102 } 111 103 104 + void posixtimer_rearm_itimer(struct task_struct *p); 105 + bool posixtimer_init_sigqueue(struct sigqueue *q); 106 + void posixtimer_send_sigqueue(struct k_itimer *tmr); 107 + bool posixtimer_deliver_signal(struct kernel_siginfo *info, struct sigqueue *timer_sigq); 108 + void posixtimer_free_timer(struct k_itimer *timer); 109 + 112 110 /* Init task static initializer */ 113 111 #define INIT_CPU_TIMERBASE(b) { \ 114 112 .nextevt = U64_MAX, \ ··· 136 122 static inline void posix_cputimers_init(struct posix_cputimers *pct) { } 137 123 static inline void posix_cputimers_group_init(struct posix_cputimers *pct, 138 124 u64 cpu_limit) { } 125 + static inline void posixtimer_rearm_itimer(struct 
task_struct *p) { } 126 + static inline bool posixtimer_deliver_signal(struct kernel_siginfo *info, 127 + struct sigqueue *timer_sigq) { return false; } 128 + static inline void posixtimer_free_timer(struct k_itimer *timer) { } 139 129 #endif 140 130 141 131 #ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK ··· 150 132 static inline void posix_cputimers_init_work(void) { } 151 133 #endif 152 134 153 - #define REQUEUE_PENDING 1 154 - 155 135 /** 156 136 * struct k_itimer - POSIX.1b interval timer structure. 157 - * @list: List head for binding the timer to signals->posix_timers 137 + * @list: List node for binding the timer to tsk::signal::posix_timers 138 + * @ignored_list: List node for tracking ignored timers in tsk::signal::ignored_posix_timers 158 139 * @t_hash: Entry in the posix timer hash table 159 140 * @it_lock: Lock protecting the timer 160 141 * @kclock: Pointer to the k_clock struct handling this timer 161 142 * @it_clock: The posix timer clock id 162 143 * @it_id: The posix timer id for identifying the timer 163 - * @it_active: Marker that timer is active 144 + * @it_status: The status of the timer 145 + * @it_sig_periodic: The periodic status at signal delivery 164 146 * @it_overrun: The overrun counter for pending signals 165 147 * @it_overrun_last: The overrun at the time of the last delivered signal 166 - * @it_requeue_pending: Indicator that timer waits for being requeued on 167 - * signal delivery 148 + * @it_signal_seq: Sequence count to control signal delivery 149 + * @it_sigqueue_seq: The sequence count at the point where the signal was queued 168 150 * @it_sigev_notify: The notify word of sigevent struct for signal delivery 169 151 * @it_interval: The interval for periodic timers 170 152 * @it_signal: Pointer to the creators signal struct 171 153 * @it_pid: The pid of the process/task targeted by the signal 172 154 * @it_process: The task to wakeup on clock_nanosleep (CPU timers) 173 - * @sigq: Pointer to preallocated sigqueue 155 + * @rcuref: 
Reference count for life time management 156 + * @sigq: Embedded sigqueue 174 157 * @it: Union representing the various posix timer type 175 158 * internals. 176 159 * @rcu: RCU head for freeing the timer. 177 160 */ 178 161 struct k_itimer { 179 162 struct hlist_node list; 163 + struct hlist_node ignored_list; 180 164 struct hlist_node t_hash; 181 165 spinlock_t it_lock; 182 166 const struct k_clock *kclock; 183 167 clockid_t it_clock; 184 168 timer_t it_id; 185 - int it_active; 169 + int it_status; 170 + bool it_sig_periodic; 186 171 s64 it_overrun; 187 172 s64 it_overrun_last; 188 - int it_requeue_pending; 173 + unsigned int it_signal_seq; 174 + unsigned int it_sigqueue_seq; 189 175 int it_sigev_notify; 176 + enum pid_type it_pid_type; 190 177 ktime_t it_interval; 191 178 struct signal_struct *it_signal; 192 179 union { 193 180 struct pid *it_pid; 194 181 struct task_struct *it_process; 195 182 }; 196 - struct sigqueue *sigq; 183 + struct sigqueue sigq; 184 + rcuref_t rcuref; 197 185 union { 198 186 struct { 199 187 struct hrtimer timer; ··· 220 196 221 197 int update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new); 222 198 223 - void posixtimer_rearm(struct kernel_siginfo *info); 199 + #ifdef CONFIG_POSIX_TIMERS 200 + static inline void posixtimer_putref(struct k_itimer *tmr) 201 + { 202 + if (rcuref_put(&tmr->rcuref)) 203 + posixtimer_free_timer(tmr); 204 + } 205 + 206 + static inline void posixtimer_sigqueue_getref(struct sigqueue *q) 207 + { 208 + struct k_itimer *tmr = container_of(q, struct k_itimer, sigq); 209 + 210 + WARN_ON_ONCE(!rcuref_get(&tmr->rcuref)); 211 + } 212 + 213 + static inline void posixtimer_sigqueue_putref(struct sigqueue *q) 214 + { 215 + struct k_itimer *tmr = container_of(q, struct k_itimer, sigq); 216 + 217 + posixtimer_putref(tmr); 218 + } 219 + #else /* CONFIG_POSIX_TIMERS */ 220 + static inline void posixtimer_sigqueue_getref(struct sigqueue *q) { } 221 + static inline void posixtimer_sigqueue_putref(struct sigqueue 
*q) { } 222 + #endif /* !CONFIG_POSIX_TIMERS */ 223 + 224 224 #endif
+19 -19
include/linux/regmap.h
··· 106 106 * @addr: Address to poll 107 107 * @val: Unsigned integer variable to read the value into 108 108 * @cond: Break condition (usually involving @val) 109 - * @sleep_us: Maximum time to sleep between reads in us (0 110 - * tight-loops). Should be less than ~20ms since usleep_range 111 - * is used (see Documentation/timers/timers-howto.rst). 109 + * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please 110 + * read usleep_range() function description for details and 111 + * limitations. 112 112 * @timeout_us: Timeout in us, 0 means never timeout 113 113 * 114 - * Returns 0 on success and -ETIMEDOUT upon a timeout or the regmap_read 114 + * This is modelled after the readx_poll_timeout macros in linux/iopoll.h. 115 + * 116 + * Returns: 0 on success and -ETIMEDOUT upon a timeout or the regmap_read 115 117 * error return value in case of a error read. In the two former cases, 116 118 * the last read value at @addr is stored in @val. Must not be called 117 119 * from atomic context if sleep_us or timeout_us are used. 118 - * 119 - * This is modelled after the readx_poll_timeout macros in linux/iopoll.h. 120 120 */ 121 121 #define regmap_read_poll_timeout(map, addr, val, cond, sleep_us, timeout_us) \ 122 122 ({ \ ··· 133 133 * @addr: Address to poll 134 134 * @val: Unsigned integer variable to read the value into 135 135 * @cond: Break condition (usually involving @val) 136 - * @delay_us: Time to udelay between reads in us (0 tight-loops). 137 - * Should be less than ~10us since udelay is used 138 - * (see Documentation/timers/timers-howto.rst). 136 + * @delay_us: Time to udelay between reads in us (0 tight-loops). Please 137 + * read udelay() function description for details and 138 + * limitations. 139 139 * @timeout_us: Timeout in us, 0 means never timeout 140 - * 141 - * Returns 0 on success and -ETIMEDOUT upon a timeout or the regmap_read 142 - * error return value in case of a error read. 
In the two former cases, 143 - * the last read value at @addr is stored in @val. 144 140 * 145 141 * This is modelled after the readx_poll_timeout_atomic macros in linux/iopoll.h. 146 142 * 147 143 * Note: In general regmap cannot be used in atomic context. If you want to use 148 144 * this macro then first setup your regmap for atomic use (flat or no cache 149 145 * and MMIO regmap). 146 + * 147 + * Returns: 0 on success and -ETIMEDOUT upon a timeout or the regmap_read 148 + * error return value in case of a error read. In the two former cases, 149 + * the last read value at @addr is stored in @val. 150 150 */ 151 151 #define regmap_read_poll_timeout_atomic(map, addr, val, cond, delay_us, timeout_us) \ 152 152 ({ \ ··· 177 177 * @field: Regmap field to read from 178 178 * @val: Unsigned integer variable to read the value into 179 179 * @cond: Break condition (usually involving @val) 180 - * @sleep_us: Maximum time to sleep between reads in us (0 181 - * tight-loops). Should be less than ~20ms since usleep_range 182 - * is used (see Documentation/timers/timers-howto.rst). 180 + * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please 181 + * read usleep_range() function description for details and 182 + * limitations. 183 183 * @timeout_us: Timeout in us, 0 means never timeout 184 184 * 185 - * Returns 0 on success and -ETIMEDOUT upon a timeout or the regmap_field_read 185 + * This is modelled after the readx_poll_timeout macros in linux/iopoll.h. 186 + * 187 + * Returns: 0 on success and -ETIMEDOUT upon a timeout or the regmap_field_read 186 188 * error return value in case of a error read. In the two former cases, 187 189 * the last read value at @addr is stored in @val. Must not be called 188 190 * from atomic context if sleep_us or timeout_us are used. 189 - * 190 - * This is modelled after the readx_poll_timeout macros in linux/iopoll.h. 
191 191 */ 192 192 #define regmap_field_read_poll_timeout(field, val, cond, sleep_us, timeout_us) \ 193 193 ({ \
+1 -3
include/linux/sched/signal.h
··· 138 138 /* POSIX.1b Interval Timers */ 139 139 unsigned int next_posix_timer_id; 140 140 struct hlist_head posix_timers; 141 + struct hlist_head ignored_posix_timers; 141 142 142 143 /* ITIMER_REAL timer for the process */ 143 144 struct hrtimer real_timer; ··· 339 338 extern void force_exit_sig(int); 340 339 extern int send_sig(int, struct task_struct *, int); 341 340 extern int zap_other_threads(struct task_struct *p); 342 - extern struct sigqueue *sigqueue_alloc(void); 343 - extern void sigqueue_free(struct sigqueue *); 344 - extern int send_sigqueue(struct sigqueue *, struct pid *, enum pid_type); 345 341 extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *); 346 342 347 343 static inline void clear_notify_signal(void)
-2
include/linux/tick.h
··· 20 20 extern void tick_suspend_local(void); 21 21 /* Should be core only, but XEN resume magic and ARM BL switcher require it */ 22 22 extern void tick_resume_local(void); 23 - extern void tick_cleanup_dead_cpu(int cpu); 24 23 #else /* CONFIG_GENERIC_CLOCKEVENTS */ 25 24 static inline void tick_init(void) { } 26 25 static inline void tick_suspend_local(void) { } 27 26 static inline void tick_resume_local(void) { } 28 - static inline void tick_cleanup_dead_cpu(int cpu) { } 29 27 #endif /* !CONFIG_GENERIC_CLOCKEVENTS */ 30 28 31 29 #if defined(CONFIG_GENERIC_CLOCKEVENTS) && defined(CONFIG_HOTPLUG_CPU)
+61 -53
include/linux/timekeeper_internal.h
··· 26 26 * occupies a single 64byte cache line. 27 27 * 28 28 * The struct is separate from struct timekeeper as it is also used 29 - * for a fast NMI safe accessors. 29 + * for the fast NMI safe accessors. 30 30 * 31 31 * @base_real is for the fast NMI safe accessor to allow reading clock 32 32 * realtime from any context. ··· 44 44 45 45 /** 46 46 * struct timekeeper - Structure holding internal timekeeping values. 47 - * @tkr_mono: The readout base structure for CLOCK_MONOTONIC 48 - * @tkr_raw: The readout base structure for CLOCK_MONOTONIC_RAW 49 - * @xtime_sec: Current CLOCK_REALTIME time in seconds 50 - * @ktime_sec: Current CLOCK_MONOTONIC time in seconds 51 - * @wall_to_monotonic: CLOCK_REALTIME to CLOCK_MONOTONIC offset 52 - * @offs_real: Offset clock monotonic -> clock realtime 53 - * @offs_boot: Offset clock monotonic -> clock boottime 54 - * @offs_tai: Offset clock monotonic -> clock tai 55 - * @tai_offset: The current UTC to TAI offset in seconds 56 - * @clock_was_set_seq: The sequence number of clock was set events 57 - * @cs_was_changed_seq: The sequence number of clocksource change events 58 - * @next_leap_ktime: CLOCK_MONOTONIC time value of a pending leap-second 59 - * @raw_sec: CLOCK_MONOTONIC_RAW time in seconds 60 - * @monotonic_to_boot: CLOCK_MONOTONIC to CLOCK_BOOTTIME offset 61 - * @cycle_interval: Number of clock cycles in one NTP interval 62 - * @xtime_interval: Number of clock shifted nano seconds in one NTP 63 - * interval. 64 - * @xtime_remainder: Shifted nano seconds left over when rounding 65 - * @cycle_interval 66 - * @raw_interval: Shifted raw nano seconds accumulated per NTP interval. 67 - * @ntp_error: Difference between accumulated time and NTP time in ntp 68 - * shifted nano seconds. 69 - * @ntp_error_shift: Shift conversion between clock shifted nano seconds and 70 - * ntp shifted nano seconds. 
71 - * @last_warning: Warning ratelimiter (DEBUG_TIMEKEEPING) 72 - * @underflow_seen: Underflow warning flag (DEBUG_TIMEKEEPING) 73 - * @overflow_seen: Overflow warning flag (DEBUG_TIMEKEEPING) 47 + * @tkr_mono: The readout base structure for CLOCK_MONOTONIC 48 + * @xtime_sec: Current CLOCK_REALTIME time in seconds 49 + * @ktime_sec: Current CLOCK_MONOTONIC time in seconds 50 + * @wall_to_monotonic: CLOCK_REALTIME to CLOCK_MONOTONIC offset 51 + * @offs_real: Offset clock monotonic -> clock realtime 52 + * @offs_boot: Offset clock monotonic -> clock boottime 53 + * @offs_tai: Offset clock monotonic -> clock tai 54 + * @tai_offset: The current UTC to TAI offset in seconds 55 + * @tkr_raw: The readout base structure for CLOCK_MONOTONIC_RAW 56 + * @raw_sec: CLOCK_MONOTONIC_RAW time in seconds 57 + * @clock_was_set_seq: The sequence number of clock was set events 58 + * @cs_was_changed_seq: The sequence number of clocksource change events 59 + * @monotonic_to_boot: CLOCK_MONOTONIC to CLOCK_BOOTTIME offset 60 + * @cycle_interval: Number of clock cycles in one NTP interval 61 + * @xtime_interval: Number of clock shifted nano seconds in one NTP 62 + * interval. 63 + * @xtime_remainder: Shifted nano seconds left over when rounding 64 + * @cycle_interval 65 + * @raw_interval: Shifted raw nano seconds accumulated per NTP interval. 66 + * @next_leap_ktime: CLOCK_MONOTONIC time value of a pending leap-second 67 + * @ntp_tick: The ntp_tick_length() value currently being 68 + * used. This cached copy ensures we consistently 69 + * apply the tick length for an entire tick, as 70 + * ntp_tick_length may change mid-tick, and we don't 71 + * want to apply that new value to the tick in 72 + * progress. 73 + * @ntp_error: Difference between accumulated time and NTP time in ntp 74 + * shifted nano seconds. 75 + * @ntp_error_shift: Shift conversion between clock shifted nano seconds and 76 + * ntp shifted nano seconds. 
77 + * @ntp_err_mult: Multiplication factor for scaled math conversion 78 + * @skip_second_overflow: Flag used to avoid updating NTP twice with same second 74 79 * 75 80 * Note: For timespec(64) based interfaces wall_to_monotonic is what 76 81 * we need to add to xtime (or xtime corrected for sub jiffy times) ··· 93 88 * 94 89 * @monotonic_to_boottime is a timespec64 representation of @offs_boot to 95 90 * accelerate the VDSO update for CLOCK_BOOTTIME. 91 + * 92 + * The cacheline ordering of the structure is optimized for in kernel usage of 93 + * the ktime_get() and ktime_get_ts64() family of time accessors. Struct 94 + * timekeeper is prepended in the core timekeeping code with a sequence count, 95 + * which results in the following cacheline layout: 96 + * 97 + * 0: seqcount, tkr_mono 98 + * 1: xtime_sec ... tai_offset 99 + * 2: tkr_raw, raw_sec 100 + * 3,4: Internal variables 101 + * 102 + * Cacheline 0,1 contain the data which is used for accessing 103 + * CLOCK_MONOTONIC/REALTIME/BOOTTIME/TAI, while cacheline 2 contains the 104 + * data for accessing CLOCK_MONOTONIC_RAW. Cacheline 3,4 are internal 105 + * variables which are only accessed during timekeeper updates once per 106 + * tick. 
96 107 */ 97 108 struct timekeeper { 109 + /* Cacheline 0 (together with prepended seqcount of timekeeper core): */ 98 110 struct tk_read_base tkr_mono; 99 - struct tk_read_base tkr_raw; 111 + 112 + /* Cacheline 1: */ 100 113 u64 xtime_sec; 101 114 unsigned long ktime_sec; 102 115 struct timespec64 wall_to_monotonic; ··· 122 99 ktime_t offs_boot; 123 100 ktime_t offs_tai; 124 101 s32 tai_offset; 102 + 103 + /* Cacheline 2: */ 104 + struct tk_read_base tkr_raw; 105 + u64 raw_sec; 106 + 107 + /* Cachline 3 and 4 (timekeeping internal variables): */ 125 108 unsigned int clock_was_set_seq; 126 109 u8 cs_was_changed_seq; 127 - ktime_t next_leap_ktime; 128 - u64 raw_sec; 110 + 129 111 struct timespec64 monotonic_to_boot; 130 112 131 - /* The following members are for timekeeping internal use */ 132 113 u64 cycle_interval; 133 114 u64 xtime_interval; 134 115 s64 xtime_remainder; 135 116 u64 raw_interval; 136 - /* The ntp_tick_length() value currently being used. 137 - * This cached copy ensures we consistently apply the tick 138 - * length for an entire tick, as ntp_tick_length may change 139 - * mid-tick, and we don't want to apply that new value to 140 - * the tick in progress. 141 - */ 117 + 118 + ktime_t next_leap_ktime; 142 119 u64 ntp_tick; 143 - /* Difference between accumulated time and NTP time in ntp 144 - * shifted nano seconds. */ 145 120 s64 ntp_error; 146 121 u32 ntp_error_shift; 147 122 u32 ntp_err_mult; 148 - /* Flag used to avoid updating NTP twice with same second */ 149 123 u32 skip_second_overflow; 150 - #ifdef CONFIG_DEBUG_TIMEKEEPING 151 - long last_warning; 152 - /* 153 - * These simple flag variables are managed 154 - * without locks, which is racy, but they are 155 - * ok since we don't really care about being 156 - * super precise about how many events were 157 - * seen, just that a problem was observed. 158 - */ 159 - int underflow_seen; 160 - int overflow_seen; 161 - #endif 162 124 }; 163 125 164 126 #ifdef CONFIG_GENERIC_TIME_VSYSCALL
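The new layout comment above relies on the timekeeping core prepending a seqcount so that `tkr_mono` shares cacheline 0 with it. Invariants of that kind can be pinned down at build time with `offsetof()` and `_Static_assert`; a toy userspace sketch with illustrative stand-in sizes (not the real `struct timekeeper`):

```c
#include <stddef.h>

/* Toy stand-in for the "seqcount prepended to struct timekeeper"
 * arrangement: a 4-byte sequence counter followed by a 56-byte read
 * base, so the next hot field starts at or past byte 60 (byte 64 on
 * LP64, i.e. the start of cacheline 1). All sizes are illustrative. */
struct sketch_tk {
	unsigned int	seq;		/* prepended seqcount */
	char		tkr_mono[56];	/* stand-in for struct tk_read_base */
	long		xtime_sec;	/* first "cacheline 1" member */
};

/* A build-time check like this catches accidental reordering. */
_Static_assert(offsetof(struct sketch_tk, tkr_mono) == 4,
	       "read base must immediately follow the seqcount");
```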
+2
include/linux/timekeeping.h
··· 280 280 * counter value 281 281 * @cycles: Clocksource counter value to produce the system times 282 282 * @real: Realtime system time 283 + * @boot: Boot time 283 284 * @raw: Monotonic raw system time 284 285 * @cs_id: Clocksource ID 285 286 * @clock_was_set_seq: The sequence number of clock-was-set events ··· 289 288 struct system_time_snapshot { 290 289 u64 cycles; 291 290 ktime_t real; 291 + ktime_t boot; 292 292 ktime_t raw; 293 293 enum clocksource_ids cs_id; 294 294 unsigned int clock_was_set_seq;
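The hunk above extends `struct system_time_snapshot` with a @boot (CLOCK_BOOTTIME) member alongside the realtime and raw values. A userspace sketch that grabs the corresponding Linux clocks back to back — the struct and function names here are illustrative, not the kernel API, and CLOCK_BOOTTIME/CLOCK_MONOTONIC_RAW are Linux-specific clock IDs:

```c
#define _GNU_SOURCE
#include <time.h>

/* Loose userspace mirror of the extended snapshot: capture the
 * three clocks as close together as possible. */
struct snapshot {
	struct timespec real;	/* wall clock */
	struct timespec boot;	/* monotonic incl. time in suspend */
	struct timespec raw;	/* NTP-unadjusted monotonic */
};

static int take_snapshot(struct snapshot *s)
{
	if (clock_gettime(CLOCK_REALTIME, &s->real) ||
	    clock_gettime(CLOCK_BOOTTIME, &s->boot) ||
	    clock_gettime(CLOCK_MONOTONIC_RAW, &s->raw))
		return -1;
	return 0;
}
```

Unlike this sketch, the kernel snapshot reads a single clocksource cycle value under the timekeeper seqcount and derives all times from it, so its members are mutually consistent rather than merely close together.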
-8
include/linux/timex.h
··· 139 139 #define MAXSEC 2048 /* max interval between updates (s) */ 140 140 #define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5) /* beyond max. dispersion */ 141 141 142 - /* 143 - * kernel variables 144 - * Note: maximum error = NTP sync distance = dispersion + delay / 2; 145 - * estimated error = NTP dispersion. 146 - */ 147 - extern unsigned long tick_usec; /* USER_HZ period (usec) */ 148 - extern unsigned long tick_nsec; /* SHIFTED_HZ period (nsec) */ 149 - 150 142 /* Required to safely shift negative values */ 151 143 #define shift_right(x, s) ({ \ 152 144 __typeof__(x) __x = (x); \
+2 -2
include/linux/wait.h
··· 542 542 int __ret = 0; \ 543 543 struct hrtimer_sleeper __t; \ 544 544 \ 545 - hrtimer_init_sleeper_on_stack(&__t, CLOCK_MONOTONIC, \ 546 - HRTIMER_MODE_REL); \ 545 + hrtimer_setup_sleeper_on_stack(&__t, CLOCK_MONOTONIC, \ 546 + HRTIMER_MODE_REL); \ 547 547 if ((timeout) != KTIME_MAX) { \ 548 548 hrtimer_set_expires_range_ns(&__t.timer, timeout, \ 549 549 current->timer_slack_ns); \
+1 -1
include/uapi/asm-generic/siginfo.h
··· 46 46 __kernel_timer_t _tid; /* timer id */ 47 47 int _overrun; /* overrun count */ 48 48 sigval_t _sigval; /* same as below */ 49 - int _sys_private; /* not to be passed to user */ 49 + int _sys_private; /* Not used by the kernel. Historic leftover. Always 0. */ 50 50 } _timer; 51 51 52 52 /* POSIX.1b signals */
+3 -2
init/init_task.c
··· 30 30 .cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex), 31 31 .exec_update_lock = __RWSEM_INITIALIZER(init_signals.exec_update_lock), 32 32 #ifdef CONFIG_POSIX_TIMERS 33 - .posix_timers = HLIST_HEAD_INIT, 34 - .cputimer = { 33 + .posix_timers = HLIST_HEAD_INIT, 34 + .ignored_posix_timers = HLIST_HEAD_INIT, 35 + .cputimer = { 35 36 .cputime_atomic = INIT_CPUTIME_ATOMIC, 36 37 }, 37 38 #endif
+4 -3
io_uring/io_uring.c
··· 2408 2408 { 2409 2409 ktime_t timeout; 2410 2410 2411 - hrtimer_init_on_stack(&iowq->t, clock_id, HRTIMER_MODE_ABS); 2412 2411 if (iowq->min_timeout) { 2413 2412 timeout = ktime_add_ns(iowq->min_timeout, start_time); 2414 - iowq->t.function = io_cqring_min_timer_wakeup; 2413 + hrtimer_setup_on_stack(&iowq->t, io_cqring_min_timer_wakeup, clock_id, 2414 + HRTIMER_MODE_ABS); 2415 2415 } else { 2416 2416 timeout = iowq->timeout; 2417 - iowq->t.function = io_cqring_timer_wakeup; 2417 + hrtimer_setup_on_stack(&iowq->t, io_cqring_timer_wakeup, clock_id, 2418 + HRTIMER_MODE_ABS); 2418 2419 } 2419 2420 2420 2421 hrtimer_set_expires_range_ns(&iowq->t, timeout, 0);
+1 -1
io_uring/rw.c
··· 1176 1176 req->flags |= REQ_F_IOPOLL_STATE; 1177 1177 1178 1178 mode = HRTIMER_MODE_REL; 1179 - hrtimer_init_sleeper_on_stack(&timer, CLOCK_MONOTONIC, mode); 1179 + hrtimer_setup_sleeper_on_stack(&timer, CLOCK_MONOTONIC, mode); 1180 1180 hrtimer_set_expires(&timer.timer, kt); 1181 1181 set_current_state(TASK_INTERRUPTIBLE); 1182 1182 hrtimer_sleeper_start_expires(&timer, mode);
-1
io_uring/timeout.c
··· 76 76 /* re-arm timer */ 77 77 spin_lock_irq(&ctx->timeout_lock); 78 78 list_add(&timeout->list, ctx->timeout_list.prev); 79 - data->timer.function = io_timeout_fn; 80 79 hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), data->mode); 81 80 spin_unlock_irq(&ctx->timeout_lock); 82 81 return;
-1
kernel/cpu.c
··· 1339 1339 cpuhp_bp_sync_dead(cpu); 1340 1340 1341 1341 lockdep_cleanup_dead_cpu(cpu, idle_thread_get(cpu)); 1342 - tick_cleanup_dead_cpu(cpu); 1343 1342 1344 1343 /* 1345 1344 * Callbacks must be re-integrated right away to the RCU state machine.
+1
kernel/fork.c
··· 1862 1862 1863 1863 #ifdef CONFIG_POSIX_TIMERS 1864 1864 INIT_HLIST_HEAD(&sig->posix_timers); 1865 + INIT_HLIST_HEAD(&sig->ignored_posix_timers); 1865 1866 hrtimer_init(&sig->real_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 1866 1867 sig->real_timer.function = it_real_fn; 1867 1868 #endif
+3 -3
kernel/futex/core.c
··· 140 140 if (!time) 141 141 return NULL; 142 142 143 - hrtimer_init_sleeper_on_stack(timeout, (flags & FLAGS_CLOCKRT) ? 144 - CLOCK_REALTIME : CLOCK_MONOTONIC, 145 - HRTIMER_MODE_ABS); 143 + hrtimer_setup_sleeper_on_stack(timeout, 144 + (flags & FLAGS_CLOCKRT) ? CLOCK_REALTIME : CLOCK_MONOTONIC, 145 + HRTIMER_MODE_ABS); 146 146 /* 147 147 * If range_ns is 0, calling hrtimer_set_expires_range_ns() is 148 148 * effectively the same as calling hrtimer_set_expires().
+2 -2
kernel/sched/idle.c
··· 398 398 cpuidle_use_deepest_state(latency_ns); 399 399 400 400 it.done = 0; 401 - hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD); 402 - it.timer.function = idle_inject_timer_fn; 401 + hrtimer_setup_on_stack(&it.timer, idle_inject_timer_fn, CLOCK_MONOTONIC, 402 + HRTIMER_MODE_REL_HARD); 403 403 hrtimer_start(&it.timer, ns_to_ktime(duration_ns), 404 404 HRTIMER_MODE_REL_PINNED_HARD); 405 405
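The io_uring and sched/idle conversions above replace the two-step `hrtimer_init_on_stack()` plus open-coded `->function` assignment with a single `hrtimer_setup_on_stack()` call that takes the callback as an argument, so a timer can no longer be armed with the callback still unset. The idiom in miniature (hypothetical userspace types, not the kernel API):

```c
#include <stddef.h>

/* Minimal sketch of the init-vs-setup idiom. */
struct mini_timer {
	void (*function)(struct mini_timer *);
};

/* Old style: two steps, and nothing forces the second one. */
static void mini_timer_init(struct mini_timer *t)
{
	t->function = NULL;	/* caller must remember to set this */
}

/* New style: the callback is a mandatory setup argument. */
static void mini_timer_setup(struct mini_timer *t,
			     void (*fn)(struct mini_timer *))
{
	t->function = fn;
}

static void cb(struct mini_timer *t) { (void)t; }
```

The same reasoning applies to the `hrtimer_setup_sleeper_on_stack()` conversions in wait.h, io_uring/rw.c and futex: the callback binding moves into the constructor, shrinking the window where a half-initialized timer exists on the stack.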
+311 -209
kernel/signal.c
··· 59 59 #include <asm/cacheflush.h> 60 60 #include <asm/syscall.h> /* for syscall_get_* */ 61 61 62 + #include "time/posix-timers.h" 63 + 62 64 /* 63 65 * SLAB caches for signal bits. 64 66 */ ··· 398 396 task_set_jobctl_pending(task, mask | JOBCTL_STOP_PENDING); 399 397 } 400 398 401 - /* 402 - * allocate a new signal queue record 403 - * - this may be called without locks if and only if t == current, otherwise an 404 - * appropriate lock must be held to stop the target task from exiting 405 - */ 406 - static struct sigqueue * 407 - __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags, 408 - int override_rlimit, const unsigned int sigqueue_flags) 399 + static struct ucounts *sig_get_ucounts(struct task_struct *t, int sig, 400 + int override_rlimit) 409 401 { 410 - struct sigqueue *q = NULL; 411 402 struct ucounts *ucounts; 412 403 long sigpending; 413 404 ··· 420 425 if (!sigpending) 421 426 return NULL; 422 427 423 - if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) { 424 - q = kmem_cache_alloc(sigqueue_cachep, gfp_flags); 425 - } else { 428 + if (unlikely(!override_rlimit && sigpending > task_rlimit(t, RLIMIT_SIGPENDING))) { 429 + dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); 426 430 print_dropped_signal(sig); 431 + return NULL; 427 432 } 428 433 429 - if (unlikely(q == NULL)) { 434 + return ucounts; 435 + } 436 + 437 + static void __sigqueue_init(struct sigqueue *q, struct ucounts *ucounts, 438 + const unsigned int sigqueue_flags) 439 + { 440 + INIT_LIST_HEAD(&q->list); 441 + q->flags = sigqueue_flags; 442 + q->ucounts = ucounts; 443 + } 444 + 445 + /* 446 + * allocate a new signal queue record 447 + * - this may be called without locks if and only if t == current, otherwise an 448 + * appropriate lock must be held to stop the target task from exiting 449 + */ 450 + static struct sigqueue *sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags, 451 + int override_rlimit) 452 + { 453 + struct 
ucounts *ucounts = sig_get_ucounts(t, sig, override_rlimit); 454 + struct sigqueue *q; 455 + 456 + if (!ucounts) 457 + return NULL; 458 + 459 + q = kmem_cache_alloc(sigqueue_cachep, gfp_flags); 460 + if (!q) { 430 461 dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); 431 - } else { 432 - INIT_LIST_HEAD(&q->list); 433 - q->flags = sigqueue_flags; 434 - q->ucounts = ucounts; 462 + return NULL; 435 463 } 464 + 465 + __sigqueue_init(q, ucounts, 0); 436 466 return q; 437 467 } 438 468 439 469 static void __sigqueue_free(struct sigqueue *q) 440 470 { 441 - if (q->flags & SIGQUEUE_PREALLOC) 471 + if (q->flags & SIGQUEUE_PREALLOC) { 472 + posixtimer_sigqueue_putref(q); 442 473 return; 474 + } 443 475 if (q->ucounts) { 444 476 dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING); 445 477 q->ucounts = NULL; ··· 500 478 spin_unlock_irqrestore(&t->sighand->siglock, flags); 501 479 } 502 480 EXPORT_SYMBOL(flush_signals); 503 - 504 - #ifdef CONFIG_POSIX_TIMERS 505 - static void __flush_itimer_signals(struct sigpending *pending) 506 - { 507 - sigset_t signal, retain; 508 - struct sigqueue *q, *n; 509 - 510 - signal = pending->signal; 511 - sigemptyset(&retain); 512 - 513 - list_for_each_entry_safe(q, n, &pending->list, list) { 514 - int sig = q->info.si_signo; 515 - 516 - if (likely(q->info.si_code != SI_TIMER)) { 517 - sigaddset(&retain, sig); 518 - } else { 519 - sigdelset(&signal, sig); 520 - list_del_init(&q->list); 521 - __sigqueue_free(q); 522 - } 523 - } 524 - 525 - sigorsets(&pending->signal, &signal, &retain); 526 - } 527 - 528 - void flush_itimer_signals(void) 529 - { 530 - struct task_struct *tsk = current; 531 - unsigned long flags; 532 - 533 - spin_lock_irqsave(&tsk->sighand->siglock, flags); 534 - __flush_itimer_signals(&tsk->pending); 535 - __flush_itimer_signals(&tsk->signal->shared_pending); 536 - spin_unlock_irqrestore(&tsk->sighand->siglock, flags); 537 - } 538 - #endif 539 481 540 482 void ignore_signals(struct task_struct *t) 541 483 { ··· 
550 564 } 551 565 552 566 static void collect_signal(int sig, struct sigpending *list, kernel_siginfo_t *info, 553 - bool *resched_timer) 567 + struct sigqueue **timer_sigq) 554 568 { 555 569 struct sigqueue *q, *first = NULL; 556 570 ··· 573 587 list_del_init(&first->list); 574 588 copy_siginfo(info, &first->info); 575 589 576 - *resched_timer = 577 - (first->flags & SIGQUEUE_PREALLOC) && 578 - (info->si_code == SI_TIMER) && 579 - (info->si_sys_private); 580 - 581 - __sigqueue_free(first); 590 + /* 591 + * posix-timer signals are preallocated and freed when the last 592 + * reference count is dropped in posixtimer_deliver_signal() or 593 + * immediately on timer deletion when the signal is not pending. 594 + * Spare the extra round through __sigqueue_free() which is 595 + * ignoring preallocated signals. 596 + */ 597 + if (unlikely((first->flags & SIGQUEUE_PREALLOC) && (info->si_code == SI_TIMER))) 598 + *timer_sigq = first; 599 + else 600 + __sigqueue_free(first); 582 601 } else { 583 602 /* 584 603 * Ok, it wasn't in the queue. 
This must be ··· 600 609 } 601 610 602 611 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask, 603 - kernel_siginfo_t *info, bool *resched_timer) 612 + kernel_siginfo_t *info, struct sigqueue **timer_sigq) 604 613 { 605 614 int sig = next_signal(pending, mask); 606 615 607 616 if (sig) 608 - collect_signal(sig, pending, info, resched_timer); 617 + collect_signal(sig, pending, info, timer_sigq); 609 618 return sig; 610 619 } 611 620 ··· 617 626 int dequeue_signal(sigset_t *mask, kernel_siginfo_t *info, enum pid_type *type) 618 627 { 619 628 struct task_struct *tsk = current; 620 - bool resched_timer = false; 629 + struct sigqueue *timer_sigq; 621 630 int signr; 622 631 623 632 lockdep_assert_held(&tsk->sighand->siglock); 624 633 634 + again: 625 635 *type = PIDTYPE_PID; 626 - signr = __dequeue_signal(&tsk->pending, mask, info, &resched_timer); 636 + timer_sigq = NULL; 637 + signr = __dequeue_signal(&tsk->pending, mask, info, &timer_sigq); 627 638 if (!signr) { 628 639 *type = PIDTYPE_TGID; 629 640 signr = __dequeue_signal(&tsk->signal->shared_pending, 630 - mask, info, &resched_timer); 631 - #ifdef CONFIG_POSIX_TIMERS 632 - /* 633 - * itimer signal ? 634 - * 635 - * itimers are process shared and we restart periodic 636 - * itimers in the signal delivery path to prevent DoS 637 - * attacks in the high resolution timer case. This is 638 - * compliant with the old way of self-restarting 639 - * itimers, as the SIGALRM is a legacy signal and only 640 - * queued once. Changing the restart behaviour to 641 - * restart the timer in the signal dequeue path is 642 - * reducing the timer noise on heavy loaded !highres 643 - * systems too. 
644 - */ 645 - if (unlikely(signr == SIGALRM)) { 646 - struct hrtimer *tmr = &tsk->signal->real_timer; 641 + mask, info, &timer_sigq); 647 642 648 - if (!hrtimer_is_queued(tmr) && 649 - tsk->signal->it_real_incr != 0) { 650 - hrtimer_forward(tmr, tmr->base->get_time(), 651 - tsk->signal->it_real_incr); 652 - hrtimer_restart(tmr); 653 - } 654 - } 655 - #endif 643 + if (unlikely(signr == SIGALRM)) 644 + posixtimer_rearm_itimer(tsk); 656 645 } 657 646 658 647 recalc_sigpending(); ··· 654 683 */ 655 684 current->jobctl |= JOBCTL_STOP_DEQUEUED; 656 685 } 657 - #ifdef CONFIG_POSIX_TIMERS 658 - if (resched_timer) { 659 - /* 660 - * Release the siglock to ensure proper locking order 661 - * of timer locks outside of siglocks. Note, we leave 662 - * irqs disabled here, since the posix-timers code is 663 - * about to disable them again anyway. 664 - */ 665 - spin_unlock(&tsk->sighand->siglock); 666 - posixtimer_rearm(info); 667 - spin_lock(&tsk->sighand->siglock); 668 686 669 - /* Don't expose the si_sys_private value to userspace */ 670 - info->si_sys_private = 0; 687 + if (IS_ENABLED(CONFIG_POSIX_TIMERS) && unlikely(timer_sigq)) { 688 + if (!posixtimer_deliver_signal(info, timer_sigq)) 689 + goto again; 671 690 } 672 - #endif 691 + 673 692 return signr; 674 693 } 675 694 EXPORT_SYMBOL_GPL(dequeue_signal); ··· 734 773 kick_process(t); 735 774 } 736 775 737 - /* 738 - * Remove signals in mask from the pending set and queue. 739 - * Returns 1 if any signals were found. 740 - * 741 - * All callers must be holding the siglock. 
742 - */ 743 - static void flush_sigqueue_mask(sigset_t *mask, struct sigpending *s) 776 + static inline void posixtimer_sig_ignore(struct task_struct *tsk, struct sigqueue *q); 777 + 778 + static void sigqueue_free_ignored(struct task_struct *tsk, struct sigqueue *q) 779 + { 780 + if (likely(!(q->flags & SIGQUEUE_PREALLOC) || q->info.si_code != SI_TIMER)) 781 + __sigqueue_free(q); 782 + else 783 + posixtimer_sig_ignore(tsk, q); 784 + } 785 + 786 + /* Remove signals in mask from the pending set and queue. */ 787 + static void flush_sigqueue_mask(struct task_struct *p, sigset_t *mask, struct sigpending *s) 744 788 { 745 789 struct sigqueue *q, *n; 746 790 sigset_t m; 791 + 792 + lockdep_assert_held(&p->sighand->siglock); 747 793 748 794 sigandsets(&m, mask, &s->signal); 749 795 if (sigisemptyset(&m)) ··· 760 792 list_for_each_entry_safe(q, n, &s->list, list) { 761 793 if (sigismember(mask, q->info.si_signo)) { 762 794 list_del_init(&q->list); 763 - __sigqueue_free(q); 795 + sigqueue_free_ignored(p, q); 764 796 } 765 797 } 766 798 } ··· 885 917 * This is a stop signal. Remove SIGCONT from all queues. 886 918 */ 887 919 siginitset(&flush, sigmask(SIGCONT)); 888 - flush_sigqueue_mask(&flush, &signal->shared_pending); 920 + flush_sigqueue_mask(p, &flush, &signal->shared_pending); 889 921 for_each_thread(p, t) 890 - flush_sigqueue_mask(&flush, &t->pending); 922 + flush_sigqueue_mask(p, &flush, &t->pending); 891 923 } else if (sig == SIGCONT) { 892 924 unsigned int why; 893 925 /* 894 926 * Remove all stop signals from all queues, wake all threads. 
895 927 */ 896 928 siginitset(&flush, SIG_KERNEL_STOP_MASK); 897 - flush_sigqueue_mask(&flush, &signal->shared_pending); 929 + flush_sigqueue_mask(p, &flush, &signal->shared_pending); 898 930 for_each_thread(p, t) { 899 - flush_sigqueue_mask(&flush, &t->pending); 931 + flush_sigqueue_mask(p, &flush, &t->pending); 900 932 task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING); 901 933 if (likely(!(t->ptrace & PT_SEIZED))) { 902 934 t->jobctl &= ~JOBCTL_STOPPED; ··· 1083 1115 else 1084 1116 override_rlimit = 0; 1085 1117 1086 - q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit, 0); 1118 + q = sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit); 1087 1119 1088 1120 if (q) { 1089 1121 list_add_tail(&q->list, &pending->list); ··· 1891 1923 } 1892 1924 EXPORT_SYMBOL(kill_pid); 1893 1925 1926 + #ifdef CONFIG_POSIX_TIMERS 1894 1927 /* 1895 - * These functions support sending signals using preallocated sigqueue 1896 - * structures. This is needed "because realtime applications cannot 1897 - * afford to lose notifications of asynchronous events, like timer 1898 - * expirations or I/O completions". In the case of POSIX Timers 1899 - * we allocate the sigqueue structure from the timer_create. If this 1900 - * allocation fails we are able to report the failure to the application 1901 - * with an EAGAIN error. 1928 + * These functions handle POSIX timer signals. POSIX timers use 1929 + * preallocated sigqueue structs for sending signals. 
1902 1930 */ 1903 - struct sigqueue *sigqueue_alloc(void) 1931 + static void __flush_itimer_signals(struct sigpending *pending) 1904 1932 { 1905 - return __sigqueue_alloc(-1, current, GFP_KERNEL, 0, SIGQUEUE_PREALLOC); 1906 - } 1933 + sigset_t signal, retain; 1934 + struct sigqueue *q, *n; 1907 1935 1908 - void sigqueue_free(struct sigqueue *q) 1909 - { 1910 - spinlock_t *lock = &current->sighand->siglock; 1911 - unsigned long flags; 1936 + signal = pending->signal; 1937 + sigemptyset(&retain); 1912 1938 1913 - if (WARN_ON_ONCE(!(q->flags & SIGQUEUE_PREALLOC))) 1914 - return; 1915 - /* 1916 - * We must hold ->siglock while testing q->list 1917 - * to serialize with collect_signal() or with 1918 - * __exit_signal()->flush_sigqueue(). 1919 - */ 1920 - spin_lock_irqsave(lock, flags); 1921 - q->flags &= ~SIGQUEUE_PREALLOC; 1922 - /* 1923 - * If it is queued it will be freed when dequeued, 1924 - * like the "regular" sigqueue. 1925 - */ 1926 - if (!list_empty(&q->list)) 1927 - q = NULL; 1928 - spin_unlock_irqrestore(lock, flags); 1939 + list_for_each_entry_safe(q, n, &pending->list, list) { 1940 + int sig = q->info.si_signo; 1929 1941 1930 - if (q) 1931 - __sigqueue_free(q); 1932 - } 1933 - 1934 - int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) 1935 - { 1936 - int sig = q->info.si_signo; 1937 - struct sigpending *pending; 1938 - struct task_struct *t; 1939 - unsigned long flags; 1940 - int ret, result; 1941 - 1942 - if (WARN_ON_ONCE(!(q->flags & SIGQUEUE_PREALLOC))) 1943 - return 0; 1944 - if (WARN_ON_ONCE(q->info.si_code != SI_TIMER)) 1945 - return 0; 1946 - 1947 - ret = -1; 1948 - rcu_read_lock(); 1949 - 1950 - /* 1951 - * This function is used by POSIX timers to deliver a timer signal. 1952 - * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID 1953 - * set), the signal must be delivered to the specific thread (queues 1954 - * into t->pending). 
1955 - * 1956 - * Where type is not PIDTYPE_PID, signals must be delivered to the 1957 - * process. In this case, prefer to deliver to current if it is in 1958 - * the same thread group as the target process, which avoids 1959 - * unnecessarily waking up a potentially idle task. 1960 - */ 1961 - t = pid_task(pid, type); 1962 - if (!t) 1963 - goto ret; 1964 - if (type != PIDTYPE_PID && same_thread_group(t, current)) 1965 - t = current; 1966 - if (!likely(lock_task_sighand(t, &flags))) 1967 - goto ret; 1968 - 1969 - ret = 1; /* the signal is ignored */ 1970 - result = TRACE_SIGNAL_IGNORED; 1971 - if (!prepare_signal(sig, t, false)) 1972 - goto out; 1973 - 1974 - ret = 0; 1975 - if (unlikely(!list_empty(&q->list))) { 1976 - /* 1977 - * If an SI_TIMER entry is already queue just increment 1978 - * the overrun count. 1979 - */ 1980 - q->info.si_overrun++; 1981 - result = TRACE_SIGNAL_ALREADY_PENDING; 1982 - goto out; 1942 + if (likely(q->info.si_code != SI_TIMER)) { 1943 + sigaddset(&retain, sig); 1944 + } else { 1945 + sigdelset(&signal, sig); 1946 + list_del_init(&q->list); 1947 + __sigqueue_free(q); 1948 + } 1983 1949 } 1984 - q->info.si_overrun = 0; 1950 + 1951 + sigorsets(&pending->signal, &signal, &retain); 1952 + } 1953 + 1954 + void flush_itimer_signals(void) 1955 + { 1956 + struct task_struct *tsk = current; 1957 + 1958 + guard(spinlock_irqsave)(&tsk->sighand->siglock); 1959 + __flush_itimer_signals(&tsk->pending); 1960 + __flush_itimer_signals(&tsk->signal->shared_pending); 1961 + } 1962 + 1963 + bool posixtimer_init_sigqueue(struct sigqueue *q) 1964 + { 1965 + struct ucounts *ucounts = sig_get_ucounts(current, -1, 0); 1966 + 1967 + if (!ucounts) 1968 + return false; 1969 + clear_siginfo(&q->info); 1970 + __sigqueue_init(q, ucounts, SIGQUEUE_PREALLOC); 1971 + return true; 1972 + } 1973 + 1974 + static void posixtimer_queue_sigqueue(struct sigqueue *q, struct task_struct *t, enum pid_type type) 1975 + { 1976 + struct sigpending *pending; 1977 + int sig = 
q->info.si_signo; 1985 1978 1986 1979 signalfd_notify(t, sig); 1987 1980 pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending; 1988 1981 list_add_tail(&q->list, &pending->list); 1989 1982 sigaddset(&pending->signal, sig); 1990 1983 complete_signal(sig, t, type); 1984 + } 1985 + 1986 + /* 1987 + * This function is used by POSIX timers to deliver a timer signal. 1988 + * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID 1989 + * set), the signal must be delivered to the specific thread (queues 1990 + * into t->pending). 1991 + * 1992 + * Where type is not PIDTYPE_PID, signals must be delivered to the 1993 + * process. In this case, prefer to deliver to current if it is in 1994 + * the same thread group as the target process, which avoids 1995 + * unnecessarily waking up a potentially idle task. 1996 + */ 1997 + static inline struct task_struct *posixtimer_get_target(struct k_itimer *tmr) 1998 + { 1999 + struct task_struct *t = pid_task(tmr->it_pid, tmr->it_pid_type); 2000 + 2001 + if (t && tmr->it_pid_type != PIDTYPE_PID && same_thread_group(t, current)) 2002 + t = current; 2003 + return t; 2004 + } 2005 + 2006 + void posixtimer_send_sigqueue(struct k_itimer *tmr) 2007 + { 2008 + struct sigqueue *q = &tmr->sigq; 2009 + int sig = q->info.si_signo; 2010 + struct task_struct *t; 2011 + unsigned long flags; 2012 + int result; 2013 + 2014 + guard(rcu)(); 2015 + 2016 + t = posixtimer_get_target(tmr); 2017 + if (!t) 2018 + return; 2019 + 2020 + if (!likely(lock_task_sighand(t, &flags))) 2021 + return; 2022 + 2023 + /* 2024 + * Update @tmr::sigqueue_seq for posix timer signals with sighand 2025 + * locked to prevent a race against dequeue_signal(). 2026 + */ 2027 + tmr->it_sigqueue_seq = tmr->it_signal_seq; 2028 + 2029 + /* 2030 + * Set the signal delivery status under sighand lock, so that the 2031 + * ignored signal handling can distinguish between a periodic and a 2032 + * non-periodic timer. 
2033 + */ 2034 + tmr->it_sig_periodic = tmr->it_status == POSIX_TIMER_REQUEUE_PENDING; 2035 + 2036 + if (!prepare_signal(sig, t, false)) { 2037 + result = TRACE_SIGNAL_IGNORED; 2038 + 2039 + if (!list_empty(&q->list)) { 2040 + /* 2041 + * If task group is exiting with the signal already pending, 2042 + * wait for __exit_signal() to do its job. Otherwise if 2043 + * ignored, it's not supposed to be queued. Try to survive. 2044 + */ 2045 + WARN_ON_ONCE(!(t->signal->flags & SIGNAL_GROUP_EXIT)); 2046 + goto out; 2047 + } 2048 + 2049 + /* Periodic timers with SIG_IGN are queued on the ignored list */ 2050 + if (tmr->it_sig_periodic) { 2051 + /* 2052 + * Already queued means the timer was rearmed after 2053 + * the previous expiry got it on the ignore list. 2054 + * Nothing to do for that case. 2055 + */ 2056 + if (hlist_unhashed(&tmr->ignored_list)) { 2057 + /* 2058 + * Take a signal reference and queue it on 2059 + * the ignored list. 2060 + */ 2061 + posixtimer_sigqueue_getref(q); 2062 + posixtimer_sig_ignore(t, q); 2063 + } 2064 + } else if (!hlist_unhashed(&tmr->ignored_list)) { 2065 + /* 2066 + * Covers the case where a timer was periodic and 2067 + * then the signal was ignored. Later it was rearmed 2068 + * as oneshot timer. The previous signal is invalid 2069 + * now, and this oneshot signal has to be dropped. 2070 + * Remove it from the ignored list and drop the 2071 + * reference count as the signal is no longer 2072 + * queued. 
2073 + */ 2074 + hlist_del_init(&tmr->ignored_list); 2075 + posixtimer_putref(tmr); 2076 + } 2077 + goto out; 2078 + } 2079 + 2080 + /* This should never happen and leaks a reference count */ 2081 + if (WARN_ON_ONCE(!hlist_unhashed(&tmr->ignored_list))) 2082 + hlist_del_init(&tmr->ignored_list); 2083 + 2084 + if (unlikely(!list_empty(&q->list))) { 2085 + /* This holds a reference count already */ 2086 + result = TRACE_SIGNAL_ALREADY_PENDING; 2087 + goto out; 2088 + } 2089 + 2090 + posixtimer_sigqueue_getref(q); 2091 + posixtimer_queue_sigqueue(q, t, tmr->it_pid_type); 1991 2092 result = TRACE_SIGNAL_DELIVERED; 1992 2093 out: 1993 - trace_signal_generate(sig, &q->info, t, type != PIDTYPE_PID, result); 2094 + trace_signal_generate(sig, &q->info, t, tmr->it_pid_type != PIDTYPE_PID, result); 1994 2095 unlock_task_sighand(t, &flags); 1995 - ret: 1996 - rcu_read_unlock(); 1997 - return ret; 1998 2096 } 2097 + 2098 + static inline void posixtimer_sig_ignore(struct task_struct *tsk, struct sigqueue *q) 2099 + { 2100 + struct k_itimer *tmr = container_of(q, struct k_itimer, sigq); 2101 + 2102 + /* 2103 + * If the timer is marked deleted already or the signal originates 2104 + * from a non-periodic timer, then just drop the reference 2105 + * count. Otherwise queue it on the ignored list. 2106 + */ 2107 + if (tmr->it_signal && tmr->it_sig_periodic) 2108 + hlist_add_head(&tmr->ignored_list, &tsk->signal->ignored_posix_timers); 2109 + else 2110 + posixtimer_putref(tmr); 2111 + } 2112 + 2113 + static void posixtimer_sig_unignore(struct task_struct *tsk, int sig) 2114 + { 2115 + struct hlist_head *head = &tsk->signal->ignored_posix_timers; 2116 + struct hlist_node *tmp; 2117 + struct k_itimer *tmr; 2118 + 2119 + if (likely(hlist_empty(head))) 2120 + return; 2121 + 2122 + /* 2123 + * Rearming a timer with sighand lock held is not possible due to 2124 + * lock ordering vs. tmr::it_lock. 
Just stick the sigqueue back and 2125 + * let the signal delivery path deal with it whether it needs to be 2126 + * rearmed or not. This cannot be decided here w/o dropping sighand 2127 + * lock and creating a loop retry horror show. 2128 + */ 2129 + hlist_for_each_entry_safe(tmr, tmp , head, ignored_list) { 2130 + struct task_struct *target; 2131 + 2132 + /* 2133 + * tmr::sigq.info.si_signo is immutable, so accessing it 2134 + * without holding tmr::it_lock is safe. 2135 + */ 2136 + if (tmr->sigq.info.si_signo != sig) 2137 + continue; 2138 + 2139 + hlist_del_init(&tmr->ignored_list); 2140 + 2141 + /* This should never happen and leaks a reference count */ 2142 + if (WARN_ON_ONCE(!list_empty(&tmr->sigq.list))) 2143 + continue; 2144 + 2145 + /* 2146 + * Get the target for the signal. If target is a thread and 2147 + * has exited by now, drop the reference count. 2148 + */ 2149 + guard(rcu)(); 2150 + target = posixtimer_get_target(tmr); 2151 + if (target) 2152 + posixtimer_queue_sigqueue(&tmr->sigq, target, tmr->it_pid_type); 2153 + else 2154 + posixtimer_putref(tmr); 2155 + } 2156 + } 2157 + #else /* CONFIG_POSIX_TIMERS */ 2158 + static inline void posixtimer_sig_ignore(struct task_struct *tsk, struct sigqueue *q) { } 2159 + static inline void posixtimer_sig_unignore(struct task_struct *tsk, int sig) { } 2160 + #endif /* !CONFIG_POSIX_TIMERS */ 1999 2161 2000 2162 void do_notify_pidfd(struct task_struct *task) 2001 2163 { ··· 4243 4145 sigemptyset(&mask); 4244 4146 sigaddset(&mask, sig); 4245 4147 4246 - flush_sigqueue_mask(&mask, &current->signal->shared_pending); 4247 - flush_sigqueue_mask(&mask, &current->pending); 4148 + flush_sigqueue_mask(current, &mask, &current->signal->shared_pending); 4149 + flush_sigqueue_mask(current, &mask, &current->pending); 4248 4150 recalc_sigpending(); 4249 4151 } 4250 4152 spin_unlock_irq(&current->sighand->siglock); ··· 4294 4196 sigaction_compat_abi(act, oact); 4295 4197 4296 4198 if (act) { 4199 + bool was_ignored = 
k->sa.sa_handler == SIG_IGN; 4200 + 4297 4201 sigdelsetmask(&act->sa.sa_mask, 4298 4202 sigmask(SIGKILL) | sigmask(SIGSTOP)); 4299 4203 *k = *act; ··· 4313 4213 if (sig_handler_ignored(sig_handler(p, sig), sig)) { 4314 4214 sigemptyset(&mask); 4315 4215 sigaddset(&mask, sig); 4316 - flush_sigqueue_mask(&mask, &p->signal->shared_pending); 4216 + flush_sigqueue_mask(p, &mask, &p->signal->shared_pending); 4317 4217 for_each_thread(p, t) 4318 - flush_sigqueue_mask(&mask, &t->pending); 4218 + flush_sigqueue_mask(p, &mask, &t->pending); 4219 + } else if (was_ignored) { 4220 + posixtimer_sig_unignore(p, sig); 4319 4221 } 4320 4222 } 4321 4223
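The signal.c changes above rely on the sigqueue now being embedded in the timer struct, so the delivery and ignore paths (e.g. `posixtimer_sig_ignore()`) can map a queue entry back to its timer with a plain `container_of()` instead of a lookup. A minimal userspace sketch of that idiom, with toy stand-in types (not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-creation of the kernel's container_of() idiom: given a
 * pointer to a member embedded in a struct, recover the enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy stand-ins for struct sigqueue embedded in struct k_itimer. */
struct toy_sigqueue { int si_signo; };

struct toy_k_itimer {
	int id;
	struct toy_sigqueue sigq;	/* embedded, not pointed-to */
};

/* With the sigqueue embedded, the queue entry maps back to its timer
 * without any lookup, and both share one lifetime by construction. */
static struct toy_k_itimer *timer_of(struct toy_sigqueue *q)
{
	return container_of(q, struct toy_k_itimer, sigq);
}
```

Because the member is embedded, the resulting pointer is valid exactly as long as the timer itself, which is the lifetime property the changelog calls "an always valid container_of()".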
-5
kernel/time/Kconfig
··· 17 17 config ARCH_CLOCKSOURCE_INIT 18 18 bool 19 19 20 - # Clocksources require validation of the clocksource against the last 21 - # cycle update - x86/TSC misfeature 22 - config CLOCKSOURCE_VALIDATE_LAST_CYCLE 23 - bool 24 - 25 20 # Timekeeping vsyscall support 26 21 config GENERIC_TIME_VSYSCALL 27 22 bool
+1 -1
kernel/time/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0 2 - obj-y += time.o timer.o hrtimer.o 2 + obj-y += time.o timer.o hrtimer.o sleep_timeout.o 3 3 obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o 4 4 obj-y += timeconv.o timecounter.o alarmtimer.o 5 5
+19 -79
kernel/time/alarmtimer.c
··· 197 197 { 198 198 struct alarm *alarm = container_of(timer, struct alarm, timer); 199 199 struct alarm_base *base = &alarm_bases[alarm->type]; 200 - unsigned long flags; 201 - int ret = HRTIMER_NORESTART; 202 - int restart = ALARMTIMER_NORESTART; 203 200 204 - spin_lock_irqsave(&base->lock, flags); 205 - alarmtimer_dequeue(base, alarm); 206 - spin_unlock_irqrestore(&base->lock, flags); 201 + scoped_guard (spinlock_irqsave, &base->lock) 202 + alarmtimer_dequeue(base, alarm); 207 203 208 204 if (alarm->function) 209 - restart = alarm->function(alarm, base->get_ktime()); 210 - 211 - spin_lock_irqsave(&base->lock, flags); 212 - if (restart != ALARMTIMER_NORESTART) { 213 - hrtimer_set_expires(&alarm->timer, alarm->node.expires); 214 - alarmtimer_enqueue(base, alarm); 215 - ret = HRTIMER_RESTART; 216 - } 217 - spin_unlock_irqrestore(&base->lock, flags); 205 + alarm->function(alarm, base->get_ktime()); 218 206 219 207 trace_alarmtimer_fired(alarm, base->get_ktime()); 220 - return ret; 221 - 208 + return HRTIMER_NORESTART; 222 209 } 223 210 224 211 ktime_t alarm_expires_remaining(const struct alarm *alarm) ··· 321 334 322 335 static void 323 336 __alarm_init(struct alarm *alarm, enum alarmtimer_type type, 324 - enum alarmtimer_restart (*function)(struct alarm *, ktime_t)) 337 + void (*function)(struct alarm *, ktime_t)) 325 338 { 326 339 timerqueue_init(&alarm->node); 327 - alarm->timer.function = alarmtimer_fired; 328 340 alarm->function = function; 329 341 alarm->type = type; 330 342 alarm->state = ALARMTIMER_STATE_INACTIVE; ··· 336 350 * @function: callback that is run when the alarm fires 337 351 */ 338 352 void alarm_init(struct alarm *alarm, enum alarmtimer_type type, 339 - enum alarmtimer_restart (*function)(struct alarm *, ktime_t)) 353 + void (*function)(struct alarm *, ktime_t)) 340 354 { 341 - hrtimer_init(&alarm->timer, alarm_bases[type].base_clockid, 342 - HRTIMER_MODE_ABS); 355 + hrtimer_setup(&alarm->timer, alarmtimer_fired, 
alarm_bases[type].base_clockid, 356 + HRTIMER_MODE_ABS); 343 357 __alarm_init(alarm, type, function); 344 358 } 345 359 EXPORT_SYMBOL_GPL(alarm_init); ··· 466 480 } 467 481 EXPORT_SYMBOL_GPL(alarm_forward); 468 482 469 - static u64 __alarm_forward_now(struct alarm *alarm, ktime_t interval, bool throttle) 470 - { 471 - struct alarm_base *base = &alarm_bases[alarm->type]; 472 - ktime_t now = base->get_ktime(); 473 - 474 - if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS) && throttle) { 475 - /* 476 - * Same issue as with posix_timer_fn(). Timers which are 477 - * periodic but the signal is ignored can starve the system 478 - * with a very small interval. The real fix which was 479 - * promised in the context of posix_timer_fn() never 480 - * materialized, but someone should really work on it. 481 - * 482 - * To prevent DOS fake @now to be 1 jiffy out which keeps 483 - * the overrun accounting correct but creates an 484 - * inconsistency vs. timer_gettime(2). 485 - */ 486 - ktime_t kj = NSEC_PER_SEC / HZ; 487 - 488 - if (interval < kj) 489 - now = ktime_add(now, kj); 490 - } 491 - 492 - return alarm_forward(alarm, now, interval); 493 - } 494 - 495 483 u64 alarm_forward_now(struct alarm *alarm, ktime_t interval) 496 484 { 497 - return __alarm_forward_now(alarm, interval, false); 485 + struct alarm_base *base = &alarm_bases[alarm->type]; 486 + 487 + return alarm_forward(alarm, base->get_ktime(), interval); 498 488 } 499 489 EXPORT_SYMBOL_GPL(alarm_forward_now); 500 490 ··· 529 567 * 530 568 * Return: whether the timer is to be restarted 531 569 */ 532 - static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm, 533 - ktime_t now) 570 + static void alarm_handle_timer(struct alarm *alarm, ktime_t now) 534 571 { 535 - struct k_itimer *ptr = container_of(alarm, struct k_itimer, 536 - it.alarm.alarmtimer); 537 - enum alarmtimer_restart result = ALARMTIMER_NORESTART; 538 - unsigned long flags; 572 + struct k_itimer *ptr = container_of(alarm, struct k_itimer, 
it.alarm.alarmtimer); 539 573 540 - spin_lock_irqsave(&ptr->it_lock, flags); 541 - 542 - if (posix_timer_queue_signal(ptr) && ptr->it_interval) { 543 - /* 544 - * Handle ignored signals and rearm the timer. This will go 545 - * away once we handle ignored signals proper. Ensure that 546 - * small intervals cannot starve the system. 547 - */ 548 - ptr->it_overrun += __alarm_forward_now(alarm, ptr->it_interval, true); 549 - ++ptr->it_requeue_pending; 550 - ptr->it_active = 1; 551 - result = ALARMTIMER_RESTART; 552 - } 553 - spin_unlock_irqrestore(&ptr->it_lock, flags); 554 - 555 - return result; 574 + guard(spinlock_irqsave)(&ptr->it_lock); 575 + posix_timer_queue_signal(ptr); 556 576 } 557 577 558 578 /** ··· 695 751 * @now: time at the timer expiration 696 752 * 697 753 * Wakes up the task that set the alarmtimer 698 - * 699 - * Return: ALARMTIMER_NORESTART 700 754 */ 701 - static enum alarmtimer_restart alarmtimer_nsleep_wakeup(struct alarm *alarm, 702 - ktime_t now) 755 + static void alarmtimer_nsleep_wakeup(struct alarm *alarm, ktime_t now) 703 756 { 704 757 struct task_struct *task = alarm->data; 705 758 706 759 alarm->data = NULL; 707 760 if (task) 708 761 wake_up_process(task); 709 - return ALARMTIMER_NORESTART; 710 762 } 711 763 712 764 /** ··· 754 814 755 815 static void 756 816 alarm_init_on_stack(struct alarm *alarm, enum alarmtimer_type type, 757 - enum alarmtimer_restart (*function)(struct alarm *, ktime_t)) 817 + void (*function)(struct alarm *, ktime_t)) 758 818 { 759 - hrtimer_init_on_stack(&alarm->timer, alarm_bases[type].base_clockid, 760 - HRTIMER_MODE_ABS); 819 + hrtimer_setup_on_stack(&alarm->timer, alarmtimer_fired, alarm_bases[type].base_clockid, 820 + HRTIMER_MODE_ABS); 761 821 __alarm_init(alarm, type, function); 762 822 } 763 823
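The alarmtimer.c conversion replaces manual `spin_lock_irqsave()`/`spin_unlock_irqrestore()` pairs with `guard()`/`scoped_guard()` from `<linux/cleanup.h>`, which release the lock automatically when the scope ends. A userspace sketch of the underlying mechanism, the GNU C `cleanup` attribute, with invented names (this is not the kernel's implementation):

```c
#include <assert.h>

/* Sketch of the guard() pattern: the compiler runs the cleanup handler
 * when the guard variable leaves scope, so an early return cannot leak
 * the lock. All names here are invented for illustration. */
struct toy_lock { int held; };

static int unlock_count;

static void toy_unlock(struct toy_lock **lk)
{
	(*lk)->held = 0;
	unlock_count++;
}

#define toy_guard(lk) \
	struct toy_lock *__guard __attribute__((cleanup(toy_unlock))) = \
		((lk)->held = 1, (lk))

static struct toy_lock demo_lock;

static int protected_increment(int *val)
{
	toy_guard(&demo_lock);

	if (*val < 0)
		return -1;	/* early return still runs toy_unlock() */
	return ++(*val);
}
```

This is why the converted `alarm_handle_timer()` above can simply `return` without an explicit unlock on any path.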
+21 -21
kernel/time/clockevents.c
··· 337 337 } 338 338 339 339 /* 340 - * Called after a notify add to make devices available which were 341 - * released from the notifier call. 340 + * Called after a clockevent has been added which might 341 + * have replaced a current regular or broadcast device. A 342 + * released normal device might be a suitable replacement 343 + * for the current broadcast device. Similarly a released 344 + * broadcast device might be a suitable replacement for a 345 + * normal device. 342 346 */ 343 347 static void clockevents_notify_released(void) 344 348 { 345 349 struct clock_event_device *dev; 346 350 351 + /* 352 + * Keep iterating as long as tick_check_new_device() 353 + * replaces a device. 354 + */ 347 355 while (!list_empty(&clockevents_released)) { 348 356 dev = list_entry(clockevents_released.next, 349 357 struct clock_event_device, list); ··· 618 610 619 611 #ifdef CONFIG_HOTPLUG_CPU 620 612 621 - # ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST 622 613 /** 623 - * tick_offline_cpu - Take CPU out of the broadcast mechanism 614 + * tick_offline_cpu - Shutdown all clock events related 615 + * to this CPU and take it out of the 616 + * broadcast mechanism. 624 617 * @cpu: The outgoing CPU 625 618 * 626 - * Called on the outgoing CPU after it took itself offline. 619 + * Called by the dying CPU during teardown. 
627 620 */ 628 621 void tick_offline_cpu(unsigned int cpu) 629 622 { 630 - raw_spin_lock(&clockevents_lock); 631 - tick_broadcast_offline(cpu); 632 - raw_spin_unlock(&clockevents_lock); 633 - } 634 - # endif 635 - 636 - /** 637 - * tick_cleanup_dead_cpu - Cleanup the tick and clockevents of a dead cpu 638 - * @cpu: The dead CPU 639 - */ 640 - void tick_cleanup_dead_cpu(int cpu) 641 - { 642 623 struct clock_event_device *dev, *tmp; 643 - unsigned long flags; 644 624 645 - raw_spin_lock_irqsave(&clockevents_lock, flags); 625 + raw_spin_lock(&clockevents_lock); 646 626 627 + tick_broadcast_offline(cpu); 647 628 tick_shutdown(cpu); 629 + 648 630 /* 649 631 * Unregister the clock event devices which were 650 - * released from the users in the notify chain. 632 + * released above. 651 633 */ 652 634 list_for_each_entry_safe(dev, tmp, &clockevents_released, list) 653 635 list_del(&dev->list); 636 + 654 637 /* 655 638 * Now check whether the CPU has left unused per cpu devices 656 639 */ ··· 653 654 list_del(&dev->list); 654 655 } 655 656 } 656 - raw_spin_unlock_irqrestore(&clockevents_lock, flags); 657 + 658 + raw_spin_unlock(&clockevents_lock); 657 659 } 658 660 #endif 659 661
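The `clockevents_notify_released()` comment above describes a classic work-list drain: keep popping the head while the list is non-empty, because handling one entry may put other entries back on the list. A minimal userspace re-creation of the `<linux/list.h>` primitives that loop depends on (simplified; not the kernel's code):

```c
#include <assert.h>

/* Minimal circular doubly-linked list in the style of <linux/list.h>.
 * An empty list (or a detached node after list_del_init()) points at
 * itself, which is what makes the "while (!list_empty(...))" drain
 * loop in clockevents_notify_released() terminate cleanly. */
struct list_head { struct list_head *next, *prev; };

#define LIST_HEAD_INIT(name) { &(name), &(name) }

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del_init(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node;	/* node now reads as an empty list */
	node->prev = node;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

The self-pointing invariant after `list_del_init()` is also what lets the signal code above test `list_empty(&q->list)` to ask "is this sigqueue currently queued anywhere?".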
+10 -30
kernel/time/clocksource.c
··· 20 20 #include "tick-internal.h" 21 21 #include "timekeeping_internal.h" 22 22 23 + static void clocksource_enqueue(struct clocksource *cs); 24 + 23 25 static noinline u64 cycles_to_nsec_safe(struct clocksource *cs, u64 start, u64 end) 24 26 { 25 27 u64 delta = clocksource_delta(end, start, cs->mask); ··· 173 171 } 174 172 175 173 static int clocksource_watchdog_kthread(void *data); 176 - static void __clocksource_change_rating(struct clocksource *cs, int rating); 177 174 178 175 static void clocksource_watchdog_work(struct work_struct *work) 179 176 { ··· 190 189 * watchdog_list will find the unstable clock again. 191 190 */ 192 191 kthread_run(clocksource_watchdog_kthread, NULL, "kwatchdog"); 192 + } 193 + 194 + static void clocksource_change_rating(struct clocksource *cs, int rating) 195 + { 196 + list_del(&cs->list); 197 + cs->rating = rating; 198 + clocksource_enqueue(cs); 193 199 } 194 200 195 201 static void __clocksource_unstable(struct clocksource *cs) ··· 705 697 list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) { 706 698 if (cs->flags & CLOCK_SOURCE_UNSTABLE) { 707 699 list_del_init(&cs->wd_list); 708 - __clocksource_change_rating(cs, 0); 700 + clocksource_change_rating(cs, 0); 709 701 select = 1; 710 702 } 711 703 if (cs->flags & CLOCK_SOURCE_RESELECT) { ··· 1262 1254 return 0; 1263 1255 } 1264 1256 EXPORT_SYMBOL_GPL(__clocksource_register_scale); 1265 - 1266 - static void __clocksource_change_rating(struct clocksource *cs, int rating) 1267 - { 1268 - list_del(&cs->list); 1269 - cs->rating = rating; 1270 - clocksource_enqueue(cs); 1271 - } 1272 - 1273 - /** 1274 - * clocksource_change_rating - Change the rating of a registered clocksource 1275 - * @cs: clocksource to be changed 1276 - * @rating: new rating 1277 - */ 1278 - void clocksource_change_rating(struct clocksource *cs, int rating) 1279 - { 1280 - unsigned long flags; 1281 - 1282 - mutex_lock(&clocksource_mutex); 1283 - clocksource_watchdog_lock(&flags); 1284 - 
__clocksource_change_rating(cs, rating); 1285 - clocksource_watchdog_unlock(&flags); 1286 - 1287 - clocksource_select(); 1288 - clocksource_select_watchdog(false); 1289 - clocksource_suspend_select(false); 1290 - mutex_unlock(&clocksource_mutex); 1291 - } 1292 - EXPORT_SYMBOL(clocksource_change_rating); 1293 1257 1294 1258 /* 1295 1259 * Unbind clocksource @cs. Called with clocksource_mutex held
+78 -152
kernel/time/hrtimer.c
··· 417 417 debug_object_init(timer, &hrtimer_debug_descr); 418 418 } 419 419 420 + static inline void debug_hrtimer_init_on_stack(struct hrtimer *timer) 421 + { 422 + debug_object_init_on_stack(timer, &hrtimer_debug_descr); 423 + } 424 + 420 425 static inline void debug_hrtimer_activate(struct hrtimer *timer, 421 426 enum hrtimer_mode mode) 422 427 { ··· 433 428 debug_object_deactivate(timer, &hrtimer_debug_descr); 434 429 } 435 430 436 - static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, 437 - enum hrtimer_mode mode); 438 - 439 - void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t clock_id, 440 - enum hrtimer_mode mode) 441 - { 442 - debug_object_init_on_stack(timer, &hrtimer_debug_descr); 443 - __hrtimer_init(timer, clock_id, mode); 444 - } 445 - EXPORT_SYMBOL_GPL(hrtimer_init_on_stack); 446 - 447 - static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl, 448 - clockid_t clock_id, enum hrtimer_mode mode); 449 - 450 - void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, 451 - clockid_t clock_id, enum hrtimer_mode mode) 452 - { 453 - debug_object_init_on_stack(&sl->timer, &hrtimer_debug_descr); 454 - __hrtimer_init_sleeper(sl, clock_id, mode); 455 - } 456 - EXPORT_SYMBOL_GPL(hrtimer_init_sleeper_on_stack); 457 - 458 431 void destroy_hrtimer_on_stack(struct hrtimer *timer) 459 432 { 460 433 debug_object_free(timer, &hrtimer_debug_descr); ··· 442 459 #else 443 460 444 461 static inline void debug_hrtimer_init(struct hrtimer *timer) { } 462 + static inline void debug_hrtimer_init_on_stack(struct hrtimer *timer) { } 445 463 static inline void debug_hrtimer_activate(struct hrtimer *timer, 446 464 enum hrtimer_mode mode) { } 447 465 static inline void debug_hrtimer_deactivate(struct hrtimer *timer) { } ··· 453 469 enum hrtimer_mode mode) 454 470 { 455 471 debug_hrtimer_init(timer); 472 + trace_hrtimer_init(timer, clockid, mode); 473 + } 474 + 475 + static inline void debug_init_on_stack(struct hrtimer *timer, clockid_t clockid, 
476 + enum hrtimer_mode mode) 477 + { 478 + debug_hrtimer_init_on_stack(timer); 456 479 trace_hrtimer_init(timer, clockid, mode); 457 480 } 458 481 ··· 1535 1544 return HRTIMER_BASE_MONOTONIC; 1536 1545 } 1537 1546 1547 + static enum hrtimer_restart hrtimer_dummy_timeout(struct hrtimer *unused) 1548 + { 1549 + return HRTIMER_NORESTART; 1550 + } 1551 + 1538 1552 static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, 1539 1553 enum hrtimer_mode mode) 1540 1554 { ··· 1576 1580 timerqueue_init(&timer->node); 1577 1581 } 1578 1582 1583 + static void __hrtimer_setup(struct hrtimer *timer, 1584 + enum hrtimer_restart (*function)(struct hrtimer *), 1585 + clockid_t clock_id, enum hrtimer_mode mode) 1586 + { 1587 + __hrtimer_init(timer, clock_id, mode); 1588 + 1589 + if (WARN_ON_ONCE(!function)) 1590 + timer->function = hrtimer_dummy_timeout; 1591 + else 1592 + timer->function = function; 1593 + } 1594 + 1579 1595 /** 1580 1596 * hrtimer_init - initialize a timer to the given clock 1581 1597 * @timer: the timer to be initialized ··· 1607 1599 __hrtimer_init(timer, clock_id, mode); 1608 1600 } 1609 1601 EXPORT_SYMBOL_GPL(hrtimer_init); 1602 + 1603 + /** 1604 + * hrtimer_setup - initialize a timer to the given clock 1605 + * @timer: the timer to be initialized 1606 + * @function: the callback function 1607 + * @clock_id: the clock to be used 1608 + * @mode: The modes which are relevant for initialization: 1609 + * HRTIMER_MODE_ABS, HRTIMER_MODE_REL, HRTIMER_MODE_ABS_SOFT, 1610 + * HRTIMER_MODE_REL_SOFT 1611 + * 1612 + * The PINNED variants of the above can be handed in, 1613 + * but the PINNED bit is ignored as pinning happens 1614 + * when the hrtimer is started 1615 + */ 1616 + void hrtimer_setup(struct hrtimer *timer, enum hrtimer_restart (*function)(struct hrtimer *), 1617 + clockid_t clock_id, enum hrtimer_mode mode) 1618 + { 1619 + debug_init(timer, clock_id, mode); 1620 + __hrtimer_setup(timer, function, clock_id, mode); 1621 + } 1622 + 
EXPORT_SYMBOL_GPL(hrtimer_setup); 1623 + 1624 + /** 1625 + * hrtimer_setup_on_stack - initialize a timer on stack memory 1626 + * @timer: The timer to be initialized 1627 + * @function: the callback function 1628 + * @clock_id: The clock to be used 1629 + * @mode: The timer mode 1630 + * 1631 + * Similar to hrtimer_setup(), except that this one must be used if struct hrtimer is in stack 1632 + * memory. 1633 + */ 1634 + void hrtimer_setup_on_stack(struct hrtimer *timer, 1635 + enum hrtimer_restart (*function)(struct hrtimer *), 1636 + clockid_t clock_id, enum hrtimer_mode mode) 1637 + { 1638 + debug_init_on_stack(timer, clock_id, mode); 1639 + __hrtimer_setup(timer, function, clock_id, mode); 1640 + } 1641 + EXPORT_SYMBOL_GPL(hrtimer_setup_on_stack); 1610 1642 1611 1643 /* 1612 1644 * A timer is active, when it is enqueued into the rbtree or the ··· 1992 1944 * Make the enqueue delivery mode check work on RT. If the sleeper 1993 1945 * was initialized for hard interrupt delivery, force the mode bit. 1994 1946 * This is a special case for hrtimer_sleepers because 1995 - * hrtimer_init_sleeper() determines the delivery mode on RT so the 1947 + * __hrtimer_init_sleeper() determines the delivery mode on RT so the 1996 1948 * fiddling with this decision is avoided at the call sites. 
1997 1949 */ 1998 1950 if (IS_ENABLED(CONFIG_PREEMPT_RT) && sl->timer.is_hard) ··· 2035 1987 } 2036 1988 2037 1989 /** 2038 - * hrtimer_init_sleeper - initialize sleeper to the given clock 1990 + * hrtimer_setup_sleeper_on_stack - initialize a sleeper in stack memory 2039 1991 * @sl: sleeper to be initialized 2040 1992 * @clock_id: the clock to be used 2041 1993 * @mode: timer mode abs/rel 2042 1994 */ 2043 - void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, 2044 - enum hrtimer_mode mode) 1995 + void hrtimer_setup_sleeper_on_stack(struct hrtimer_sleeper *sl, 1996 + clockid_t clock_id, enum hrtimer_mode mode) 2045 1997 { 2046 - debug_init(&sl->timer, clock_id, mode); 1998 + debug_init_on_stack(&sl->timer, clock_id, mode); 2047 1999 __hrtimer_init_sleeper(sl, clock_id, mode); 2048 - 2049 2000 } 2050 - EXPORT_SYMBOL_GPL(hrtimer_init_sleeper); 2001 + EXPORT_SYMBOL_GPL(hrtimer_setup_sleeper_on_stack); 2051 2002 2052 2003 int nanosleep_copyout(struct restart_block *restart, struct timespec64 *ts) 2053 2004 { ··· 2107 2060 struct hrtimer_sleeper t; 2108 2061 int ret; 2109 2062 2110 - hrtimer_init_sleeper_on_stack(&t, restart->nanosleep.clockid, 2111 - HRTIMER_MODE_ABS); 2063 + hrtimer_setup_sleeper_on_stack(&t, restart->nanosleep.clockid, HRTIMER_MODE_ABS); 2112 2064 hrtimer_set_expires_tv64(&t.timer, restart->nanosleep.expires); 2113 2065 ret = do_nanosleep(&t, HRTIMER_MODE_ABS); 2114 2066 destroy_hrtimer_on_stack(&t.timer); ··· 2121 2075 struct hrtimer_sleeper t; 2122 2076 int ret = 0; 2123 2077 2124 - hrtimer_init_sleeper_on_stack(&t, clockid, mode); 2078 + hrtimer_setup_sleeper_on_stack(&t, clockid, mode); 2125 2079 hrtimer_set_expires_range_ns(&t.timer, rqtp, current->timer_slack_ns); 2126 2080 ret = do_nanosleep(&t, mode); 2127 2081 if (ret != -ERESTART_RESTARTBLOCK) ··· 2288 2242 hrtimers_prepare_cpu(smp_processor_id()); 2289 2243 open_softirq(HRTIMER_SOFTIRQ, hrtimer_run_softirq); 2290 2244 } 2291 - 2292 - /** 2293 - * 
schedule_hrtimeout_range_clock - sleep until timeout 2294 - * @expires: timeout value (ktime_t) 2295 - * @delta: slack in expires timeout (ktime_t) 2296 - * @mode: timer mode 2297 - * @clock_id: timer clock to be used 2298 - */ 2299 - int __sched 2300 - schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta, 2301 - const enum hrtimer_mode mode, clockid_t clock_id) 2302 - { 2303 - struct hrtimer_sleeper t; 2304 - 2305 - /* 2306 - * Optimize when a zero timeout value is given. It does not 2307 - * matter whether this is an absolute or a relative time. 2308 - */ 2309 - if (expires && *expires == 0) { 2310 - __set_current_state(TASK_RUNNING); 2311 - return 0; 2312 - } 2313 - 2314 - /* 2315 - * A NULL parameter means "infinite" 2316 - */ 2317 - if (!expires) { 2318 - schedule(); 2319 - return -EINTR; 2320 - } 2321 - 2322 - hrtimer_init_sleeper_on_stack(&t, clock_id, mode); 2323 - hrtimer_set_expires_range_ns(&t.timer, *expires, delta); 2324 - hrtimer_sleeper_start_expires(&t, mode); 2325 - 2326 - if (likely(t.task)) 2327 - schedule(); 2328 - 2329 - hrtimer_cancel(&t.timer); 2330 - destroy_hrtimer_on_stack(&t.timer); 2331 - 2332 - __set_current_state(TASK_RUNNING); 2333 - 2334 - return !t.task ? 0 : -EINTR; 2335 - } 2336 - EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock); 2337 - 2338 - /** 2339 - * schedule_hrtimeout_range - sleep until timeout 2340 - * @expires: timeout value (ktime_t) 2341 - * @delta: slack in expires timeout (ktime_t) 2342 - * @mode: timer mode 2343 - * 2344 - * Make the current task sleep until the given expiry time has 2345 - * elapsed. The routine will return immediately unless 2346 - * the current task state has been set (see set_current_state()). 2347 - * 2348 - * The @delta argument gives the kernel the freedom to schedule the 2349 - * actual wakeup to a time that is both power and performance friendly 2350 - * for regular (non RT/DL) tasks. 
2351 - * The kernel give the normal best effort behavior for "@expires+@delta", 2352 - * but may decide to fire the timer earlier, but no earlier than @expires. 2353 - * 2354 - * You can set the task state as follows - 2355 - * 2356 - * %TASK_UNINTERRUPTIBLE - at least @timeout time is guaranteed to 2357 - * pass before the routine returns unless the current task is explicitly 2358 - * woken up, (e.g. by wake_up_process()). 2359 - * 2360 - * %TASK_INTERRUPTIBLE - the routine may return early if a signal is 2361 - * delivered to the current task or the current task is explicitly woken 2362 - * up. 2363 - * 2364 - * The current task state is guaranteed to be TASK_RUNNING when this 2365 - * routine returns. 2366 - * 2367 - * Returns 0 when the timer has expired. If the task was woken before the 2368 - * timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or 2369 - * by an explicit wakeup, it returns -EINTR. 2370 - */ 2371 - int __sched schedule_hrtimeout_range(ktime_t *expires, u64 delta, 2372 - const enum hrtimer_mode mode) 2373 - { 2374 - return schedule_hrtimeout_range_clock(expires, delta, mode, 2375 - CLOCK_MONOTONIC); 2376 - } 2377 - EXPORT_SYMBOL_GPL(schedule_hrtimeout_range); 2378 - 2379 - /** 2380 - * schedule_hrtimeout - sleep until timeout 2381 - * @expires: timeout value (ktime_t) 2382 - * @mode: timer mode 2383 - * 2384 - * Make the current task sleep until the given expiry time has 2385 - * elapsed. The routine will return immediately unless 2386 - * the current task state has been set (see set_current_state()). 2387 - * 2388 - * You can set the task state as follows - 2389 - * 2390 - * %TASK_UNINTERRUPTIBLE - at least @timeout time is guaranteed to 2391 - * pass before the routine returns unless the current task is explicitly 2392 - * woken up, (e.g. by wake_up_process()). 
2393 - * 2394 - * %TASK_INTERRUPTIBLE - the routine may return early if a signal is 2395 - * delivered to the current task or the current task is explicitly woken 2396 - * up. 2397 - * 2398 - * The current task state is guaranteed to be TASK_RUNNING when this 2399 - * routine returns. 2400 - * 2401 - * Returns 0 when the timer has expired. If the task was woken before the 2402 - * timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or 2403 - * by an explicit wakeup, it returns -EINTR. 2404 - */ 2405 - int __sched schedule_hrtimeout(ktime_t *expires, 2406 - const enum hrtimer_mode mode) 2407 - { 2408 - return schedule_hrtimeout_range(expires, 0, mode); 2409 - } 2410 - EXPORT_SYMBOL_GPL(schedule_hrtimeout);
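The new `hrtimer_setup()` above takes the callback together with the clock and mode, and substitutes `hrtimer_dummy_timeout()` (under a `WARN_ON_ONCE()`) if a NULL function is passed, so a timer can never be enqueued with a NULL callback. A toy sketch of that pattern with invented types, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the hrtimer_setup() idea: couple callback assignment with
 * initialization and fall back to a harmless dummy on NULL, instead of
 * faulting later in the expiry path. Names are invented for illustration. */
enum toy_restart { TOY_NORESTART, TOY_RESTART };

struct toy_timer {
	enum toy_restart (*function)(struct toy_timer *);
	int clock_id;
};

static enum toy_restart toy_dummy_timeout(struct toy_timer *t)
{
	(void)t;
	return TOY_NORESTART;	/* mirrors hrtimer_dummy_timeout() */
}

static void toy_timer_setup(struct toy_timer *t,
			    enum toy_restart (*fn)(struct toy_timer *),
			    int clock_id)
{
	t->clock_id = clock_id;
	/* A NULL callback is caught at setup time, not at expiry time. */
	t->function = fn ? fn : toy_dummy_timeout;
}
```

This is the design reason the separate `hrtimer_init()` + `timer->function = ...` two-step (visible in the removed `alarm_init()` hunks) could be retired.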
+21 -1
kernel/time/itimer.c
··· 151 151 #endif 152 152 153 153 /* 154 - * The timer is automagically restarted, when interval != 0 154 + * Invoked from dequeue_signal() when SIGALRM is delivered. 155 + * 156 + * Restart the ITIMER_REAL timer if it is armed as a periodic timer. Doing 157 + * this in the signal delivery path instead of self-rearming prevents a DoS 158 + * with small increments in the high resolution timer case and reduces timer 159 + * noise in general. 160 + */ 161 + void posixtimer_rearm_itimer(struct task_struct *tsk) 162 + { 163 + struct hrtimer *tmr = &tsk->signal->real_timer; 164 + 165 + if (!hrtimer_is_queued(tmr) && tsk->signal->it_real_incr != 0) { 166 + hrtimer_forward(tmr, tmr->base->get_time(), 167 + tsk->signal->it_real_incr); 168 + hrtimer_restart(tmr); 169 + } 170 + } 171 + 172 + /* 173 + * Interval timers are restarted in the signal delivery path. See 174 + * posixtimer_rearm_itimer(). 155 175 */ 156 176 enum hrtimer_restart it_real_fn(struct hrtimer *timer) 157 177 {
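`posixtimer_rearm_itimer()` above leans on `hrtimer_forward()`, which advances a periodic timer's expiry past "now" in whole intervals and returns the number of periods skipped (the overrun count). A sketch of that arithmetic with plain integers standing in for `ktime_t` (illustrative only, not the kernel's implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the arithmetic behind hrtimer_forward(): advance a periodic
 * timer's expiry by whole intervals until it lies strictly in the
 * future, and return how many periods were skipped. */
static uint64_t toy_forward(uint64_t *expires, uint64_t now, uint64_t interval)
{
	uint64_t overruns;

	if (now < *expires)
		return 0;	/* still in the future: nothing to do */

	/* One division instead of looping once per missed period. */
	overruns = (now - *expires) / interval + 1;
	*expires += overruns * interval;
	return overruns;
}
```

Rearming this way in the delivery path, rather than from the timer callback itself, is what removes the self-rearm DoS the changelog describes: a small interval no longer forces a storm of expiries while the signal sits undelivered.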
+420 -422
kernel/time/ntp.c
···
22 22 #include "ntp_internal.h"
23 23 #include "timekeeping_internal.h"
24 24 
25 - 
26 - /*
27 -  * NTP timekeeping variables:
25 + /**
26 +  * struct ntp_data - Structure holding all NTP related state
27 +  * @tick_usec:		USER_HZ period in microseconds
28 +  * @tick_length:	Adjusted tick length
29 +  * @tick_length_base:	Base value for @tick_length
30 +  * @time_state:	State of the clock synchronization
31 +  * @time_status:	Clock status bits
32 +  * @time_offset:	Time adjustment in nanoseconds
33 +  * @time_constant:	PLL time constant
34 +  * @time_maxerror:	Maximum error in microseconds holding the NTP sync distance
35 +  *			(NTP dispersion + delay / 2)
36 +  * @time_esterror:	Estimated error in microseconds holding NTP dispersion
37 +  * @time_freq:		Frequency offset scaled nsecs/secs
38 +  * @time_reftime:	Time at last adjustment in seconds
39 +  * @time_adjust:	Adjustment value
40 +  * @ntp_tick_adj:	Constant boot-param configurable NTP tick adjustment (upscaled)
41 +  * @ntp_next_leap_sec:	Second value of the next pending leapsecond, or TIME64_MAX if no leap
28 42  *
29 -  * Note: All of the NTP state is protected by the timekeeping locks.
43 +  * @pps_valid:		PPS signal watchdog counter
44 +  * @pps_tf:		PPS phase median filter
45 +  * @pps_jitter:	PPS current jitter in nanoseconds
46 +  * @pps_fbase:		PPS beginning of the last freq interval
47 +  * @pps_shift:		PPS current interval duration in seconds (shift value)
48 +  * @pps_intcnt:	PPS interval counter
49 +  * @pps_freq:		PPS frequency offset in scaled ns/s
50 +  * @pps_stabil:	PPS current stability in scaled ns/s
51 +  * @pps_calcnt:	PPS monitor: calibration intervals
52 +  * @pps_jitcnt:	PPS monitor: jitter limit exceeded
53 +  * @pps_stbcnt:	PPS monitor: stability limit exceeded
54 +  * @pps_errcnt:	PPS monitor: calibration errors
55 +  *
56 +  * Protected by the timekeeping locks.
30 57  */
58 + struct ntp_data {
59 + 	unsigned long		tick_usec;
60 + 	u64			tick_length;
61 + 	u64			tick_length_base;
62 + 	int			time_state;
63 + 	int			time_status;
64 + 	s64			time_offset;
65 + 	long			time_constant;
66 + 	long			time_maxerror;
67 + 	long			time_esterror;
68 + 	s64			time_freq;
69 + 	time64_t		time_reftime;
70 + 	long			time_adjust;
71 + 	s64			ntp_tick_adj;
72 + 	time64_t		ntp_next_leap_sec;
73 + #ifdef CONFIG_NTP_PPS
74 + 	int			pps_valid;
75 + 	long			pps_tf[3];
76 + 	long			pps_jitter;
77 + 	struct timespec64	pps_fbase;
78 + 	int			pps_shift;
79 + 	int			pps_intcnt;
80 + 	s64			pps_freq;
81 + 	long			pps_stabil;
82 + 	long			pps_calcnt;
83 + 	long			pps_jitcnt;
84 + 	long			pps_stbcnt;
85 + 	long			pps_errcnt;
86 + #endif
87 + };
31 88 
32 - 
33 - /* USER_HZ period (usecs): */
34 - unsigned long			tick_usec = USER_TICK_USEC;
35 - 
36 - /* SHIFTED_HZ period (nsecs): */
37 - unsigned long			tick_nsec;
38 - 
39 - static u64			tick_length;
40 - static u64			tick_length_base;
89 + static struct ntp_data tk_ntp_data = {
90 + 	.tick_usec		= USER_TICK_USEC,
91 + 	.time_state		= TIME_OK,
92 + 	.time_status		= STA_UNSYNC,
93 + 	.time_constant		= 2,
94 + 	.time_maxerror		= NTP_PHASE_LIMIT,
95 + 	.time_esterror		= NTP_PHASE_LIMIT,
96 + 	.ntp_next_leap_sec	= TIME64_MAX,
97 + };
41 98 
42 99 #define SECS_PER_DAY		86400
43 100 #define MAX_TICKADJ		500LL		/* usecs */
44 101 #define MAX_TICKADJ_SCALED \
45 102 	(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
46 103 #define MAX_TAI_OFFSET		100000
47 - 
48 - /*
49 -  * phase-lock loop variables
50 -  */
51 - 
52 - /*
53 -  * clock synchronization status
54 -  *
55 -  * (TIME_ERROR prevents overwriting the CMOS clock)
56 -  */
57 - static int			time_state = TIME_OK;
58 - 
59 - /* clock status bits: */
60 - static int			time_status = STA_UNSYNC;
61 - 
62 - /* time adjustment (nsecs): */
63 - static s64			time_offset;
64 - 
65 - /* pll time constant: */
66 - static long			time_constant = 2;
67 - 
68 - /* maximum error (usecs): */
69 - static long			time_maxerror = NTP_PHASE_LIMIT;
70 - 
71 - /* estimated error (usecs): */
72 - static long			time_esterror = NTP_PHASE_LIMIT;
73 - 
74 - /* frequency offset (scaled nsecs/secs): */
75 - static s64			time_freq;
76 - 
77 - /* time at last adjustment (secs): */
78 - static time64_t			time_reftime;
79 - 
80 - static long			time_adjust;
81 - 
82 - /* constant (boot-param configurable) NTP tick adjustment (upscaled) */
83 - static s64			ntp_tick_adj;
84 - 
85 - /* second value of the next pending leapsecond, or TIME64_MAX if no leap */
86 - static time64_t			ntp_next_leap_sec = TIME64_MAX;
87 104 
88 105 #ifdef CONFIG_NTP_PPS
89 106 
···
118 101 				   intervals to decrease it */
119 102 #define PPS_MAXWANDER	100000	/* max PPS freq wander (ns/s) */
120 103 
121 - static int pps_valid;		/* signal watchdog counter */
122 - static long pps_tf[3];		/* phase median filter */
123 - static long pps_jitter;		/* current jitter (ns) */
124 - static struct timespec64 pps_fbase; /* beginning of the last freq interval */
125 - static int pps_shift;		/* current interval duration (s) (shift) */
126 - static int pps_intcnt;		/* interval counter */
127 - static s64 pps_freq;		/* frequency offset (scaled ns/s) */
128 - static long pps_stabil;		/* current stability (scaled ns/s) */
129 - 
130 104 /*
131 -  * PPS signal quality monitors
132 -  */
133 - static long pps_calcnt;		/* calibration intervals */
134 - static long pps_jitcnt;		/* jitter limit exceeded */
135 - static long pps_stbcnt;		/* stability limit exceeded */
136 - static long pps_errcnt;		/* calibration errors */
137 - 
138 - 
139 - /* PPS kernel consumer compensates the whole phase error immediately.
105 +  * PPS kernel consumer compensates the whole phase error immediately.
140 106  * Otherwise, reduce the offset by a fixed factor times the time constant.
141 107 */ 142 - static inline s64 ntp_offset_chunk(s64 offset) 108 + static inline s64 ntp_offset_chunk(struct ntp_data *ntpdata, s64 offset) 143 109 { 144 - if (time_status & STA_PPSTIME && time_status & STA_PPSSIGNAL) 110 + if (ntpdata->time_status & STA_PPSTIME && ntpdata->time_status & STA_PPSSIGNAL) 145 111 return offset; 146 112 else 147 - return shift_right(offset, SHIFT_PLL + time_constant); 113 + return shift_right(offset, SHIFT_PLL + ntpdata->time_constant); 148 114 } 149 115 150 - static inline void pps_reset_freq_interval(void) 116 + static inline void pps_reset_freq_interval(struct ntp_data *ntpdata) 151 117 { 152 - /* the PPS calibration interval may end 153 - surprisingly early */ 154 - pps_shift = PPS_INTMIN; 155 - pps_intcnt = 0; 118 + /* The PPS calibration interval may end surprisingly early */ 119 + ntpdata->pps_shift = PPS_INTMIN; 120 + ntpdata->pps_intcnt = 0; 156 121 } 157 122 158 123 /** 159 124 * pps_clear - Clears the PPS state variables 125 + * @ntpdata: Pointer to ntp data 160 126 */ 161 - static inline void pps_clear(void) 127 + static inline void pps_clear(struct ntp_data *ntpdata) 162 128 { 163 - pps_reset_freq_interval(); 164 - pps_tf[0] = 0; 165 - pps_tf[1] = 0; 166 - pps_tf[2] = 0; 167 - pps_fbase.tv_sec = pps_fbase.tv_nsec = 0; 168 - pps_freq = 0; 129 + pps_reset_freq_interval(ntpdata); 130 + ntpdata->pps_tf[0] = 0; 131 + ntpdata->pps_tf[1] = 0; 132 + ntpdata->pps_tf[2] = 0; 133 + ntpdata->pps_fbase.tv_sec = ntpdata->pps_fbase.tv_nsec = 0; 134 + ntpdata->pps_freq = 0; 169 135 } 170 136 171 - /* Decrease pps_valid to indicate that another second has passed since 172 - * the last PPS signal. When it reaches 0, indicate that PPS signal is 173 - * missing. 137 + /* 138 + * Decrease pps_valid to indicate that another second has passed since the 139 + * last PPS signal. When it reaches 0, indicate that PPS signal is missing. 
174 140 */ 175 - static inline void pps_dec_valid(void) 141 + static inline void pps_dec_valid(struct ntp_data *ntpdata) 176 142 { 177 - if (pps_valid > 0) 178 - pps_valid--; 179 - else { 180 - time_status &= ~(STA_PPSSIGNAL | STA_PPSJITTER | 181 - STA_PPSWANDER | STA_PPSERROR); 182 - pps_clear(); 143 + if (ntpdata->pps_valid > 0) { 144 + ntpdata->pps_valid--; 145 + } else { 146 + ntpdata->time_status &= ~(STA_PPSSIGNAL | STA_PPSJITTER | 147 + STA_PPSWANDER | STA_PPSERROR); 148 + pps_clear(ntpdata); 183 149 } 184 150 } 185 151 186 - static inline void pps_set_freq(s64 freq) 152 + static inline void pps_set_freq(struct ntp_data *ntpdata) 187 153 { 188 - pps_freq = freq; 154 + ntpdata->pps_freq = ntpdata->time_freq; 189 155 } 190 156 191 - static inline int is_error_status(int status) 157 + static inline bool is_error_status(int status) 192 158 { 193 159 return (status & (STA_UNSYNC|STA_CLOCKERR)) 194 - /* PPS signal lost when either PPS time or 195 - * PPS frequency synchronization requested 160 + /* 161 + * PPS signal lost when either PPS time or PPS frequency 162 + * synchronization requested 196 163 */ 197 164 || ((status & (STA_PPSFREQ|STA_PPSTIME)) 198 165 && !(status & STA_PPSSIGNAL)) 199 - /* PPS jitter exceeded when 200 - * PPS time synchronization requested */ 166 + /* 167 + * PPS jitter exceeded when PPS time synchronization 168 + * requested 169 + */ 201 170 || ((status & (STA_PPSTIME|STA_PPSJITTER)) 202 171 == (STA_PPSTIME|STA_PPSJITTER)) 203 - /* PPS wander exceeded or calibration error when 204 - * PPS frequency synchronization requested 172 + /* 173 + * PPS wander exceeded or calibration error when PPS 174 + * frequency synchronization requested 205 175 */ 206 176 || ((status & STA_PPSFREQ) 207 177 && (status & (STA_PPSWANDER|STA_PPSERROR))); 208 178 } 209 179 210 - static inline void pps_fill_timex(struct __kernel_timex *txc) 180 + static inline void pps_fill_timex(struct ntp_data *ntpdata, struct __kernel_timex *txc) 211 181 { 212 - txc->ppsfreq = 
shift_right((pps_freq >> PPM_SCALE_INV_SHIFT) * 182 + txc->ppsfreq = shift_right((ntpdata->pps_freq >> PPM_SCALE_INV_SHIFT) * 213 183 PPM_SCALE_INV, NTP_SCALE_SHIFT); 214 - txc->jitter = pps_jitter; 215 - if (!(time_status & STA_NANO)) 216 - txc->jitter = pps_jitter / NSEC_PER_USEC; 217 - txc->shift = pps_shift; 218 - txc->stabil = pps_stabil; 219 - txc->jitcnt = pps_jitcnt; 220 - txc->calcnt = pps_calcnt; 221 - txc->errcnt = pps_errcnt; 222 - txc->stbcnt = pps_stbcnt; 184 + txc->jitter = ntpdata->pps_jitter; 185 + if (!(ntpdata->time_status & STA_NANO)) 186 + txc->jitter = ntpdata->pps_jitter / NSEC_PER_USEC; 187 + txc->shift = ntpdata->pps_shift; 188 + txc->stabil = ntpdata->pps_stabil; 189 + txc->jitcnt = ntpdata->pps_jitcnt; 190 + txc->calcnt = ntpdata->pps_calcnt; 191 + txc->errcnt = ntpdata->pps_errcnt; 192 + txc->stbcnt = ntpdata->pps_stbcnt; 223 193 } 224 194 225 195 #else /* !CONFIG_NTP_PPS */ 226 196 227 - static inline s64 ntp_offset_chunk(s64 offset) 197 + static inline s64 ntp_offset_chunk(struct ntp_data *ntpdata, s64 offset) 228 198 { 229 - return shift_right(offset, SHIFT_PLL + time_constant); 199 + return shift_right(offset, SHIFT_PLL + ntpdata->time_constant); 230 200 } 231 201 232 - static inline void pps_reset_freq_interval(void) {} 233 - static inline void pps_clear(void) {} 234 - static inline void pps_dec_valid(void) {} 235 - static inline void pps_set_freq(s64 freq) {} 202 + static inline void pps_reset_freq_interval(struct ntp_data *ntpdata) {} 203 + static inline void pps_clear(struct ntp_data *ntpdata) {} 204 + static inline void pps_dec_valid(struct ntp_data *ntpdata) {} 205 + static inline void pps_set_freq(struct ntp_data *ntpdata) {} 236 206 237 - static inline int is_error_status(int status) 207 + static inline bool is_error_status(int status) 238 208 { 239 209 return status & (STA_UNSYNC|STA_CLOCKERR); 240 210 } 241 211 242 - static inline void pps_fill_timex(struct __kernel_timex *txc) 212 + static inline void pps_fill_timex(struct 
ntp_data *ntpdata, struct __kernel_timex *txc) 243 213 { 244 214 /* PPS is not implemented, so these are zero */ 245 215 txc->ppsfreq = 0; ··· 241 237 242 238 #endif /* CONFIG_NTP_PPS */ 243 239 244 - 245 - /** 246 - * ntp_synced - Returns 1 if the NTP status is not UNSYNC 247 - * 248 - */ 249 - static inline int ntp_synced(void) 250 - { 251 - return !(time_status & STA_UNSYNC); 252 - } 253 - 254 - 255 240 /* 256 - * NTP methods: 241 + * Update tick_length and tick_length_base, based on tick_usec, ntp_tick_adj and 242 + * time_freq: 257 243 */ 258 - 259 - /* 260 - * Update (tick_length, tick_length_base, tick_nsec), based 261 - * on (tick_usec, ntp_tick_adj, time_freq): 262 - */ 263 - static void ntp_update_frequency(void) 244 + static void ntp_update_frequency(struct ntp_data *ntpdata) 264 245 { 265 - u64 second_length; 266 - u64 new_base; 246 + u64 second_length, new_base, tick_usec = (u64)ntpdata->tick_usec; 267 247 268 - second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) 269 - << NTP_SCALE_SHIFT; 248 + second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << NTP_SCALE_SHIFT; 270 249 271 - second_length += ntp_tick_adj; 272 - second_length += time_freq; 250 + second_length += ntpdata->ntp_tick_adj; 251 + second_length += ntpdata->time_freq; 273 252 274 - tick_nsec = div_u64(second_length, HZ) >> NTP_SCALE_SHIFT; 275 253 new_base = div_u64(second_length, NTP_INTERVAL_FREQ); 276 254 277 255 /* 278 - * Don't wait for the next second_overflow, apply 279 - * the change to the tick length immediately: 256 + * Don't wait for the next second_overflow, apply the change to the 257 + * tick length immediately: 280 258 */ 281 - tick_length += new_base - tick_length_base; 282 - tick_length_base = new_base; 259 + ntpdata->tick_length += new_base - ntpdata->tick_length_base; 260 + ntpdata->tick_length_base = new_base; 283 261 } 284 262 285 - static inline s64 ntp_update_offset_fll(s64 offset64, long secs) 263 + static inline s64 ntp_update_offset_fll(struct 
ntp_data *ntpdata, s64 offset64, long secs) 286 264 { 287 - time_status &= ~STA_MODE; 265 + ntpdata->time_status &= ~STA_MODE; 288 266 289 267 if (secs < MINSEC) 290 268 return 0; 291 269 292 - if (!(time_status & STA_FLL) && (secs <= MAXSEC)) 270 + if (!(ntpdata->time_status & STA_FLL) && (secs <= MAXSEC)) 293 271 return 0; 294 272 295 - time_status |= STA_MODE; 273 + ntpdata->time_status |= STA_MODE; 296 274 297 275 return div64_long(offset64 << (NTP_SCALE_SHIFT - SHIFT_FLL), secs); 298 276 } 299 277 300 - static void ntp_update_offset(long offset) 278 + static void ntp_update_offset(struct ntp_data *ntpdata, long offset) 301 279 { 302 - s64 freq_adj; 303 - s64 offset64; 304 - long secs; 280 + s64 freq_adj, offset64; 281 + long secs, real_secs; 305 282 306 - if (!(time_status & STA_PLL)) 283 + if (!(ntpdata->time_status & STA_PLL)) 307 284 return; 308 285 309 - if (!(time_status & STA_NANO)) { 286 + if (!(ntpdata->time_status & STA_NANO)) { 310 287 /* Make sure the multiplication below won't overflow */ 311 288 offset = clamp(offset, -USEC_PER_SEC, USEC_PER_SEC); 312 289 offset *= NSEC_PER_USEC; 313 290 } 314 291 315 - /* 316 - * Scale the phase adjustment and 317 - * clamp to the operating range. 318 - */ 292 + /* Scale the phase adjustment and clamp to the operating range. */ 319 293 offset = clamp(offset, -MAXPHASE, MAXPHASE); 320 294 321 295 /* 322 296 * Select how the frequency is to be controlled 323 297 * and in which mode (PLL or FLL). 
324 298 */ 325 - secs = (long)(__ktime_get_real_seconds() - time_reftime); 326 - if (unlikely(time_status & STA_FREQHOLD)) 299 + real_secs = __ktime_get_real_seconds(); 300 + secs = (long)(real_secs - ntpdata->time_reftime); 301 + if (unlikely(ntpdata->time_status & STA_FREQHOLD)) 327 302 secs = 0; 328 303 329 - time_reftime = __ktime_get_real_seconds(); 304 + ntpdata->time_reftime = real_secs; 330 305 331 306 offset64 = offset; 332 - freq_adj = ntp_update_offset_fll(offset64, secs); 307 + freq_adj = ntp_update_offset_fll(ntpdata, offset64, secs); 333 308 334 309 /* 335 310 * Clamp update interval to reduce PLL gain with low 336 311 * sampling rate (e.g. intermittent network connection) 337 312 * to avoid instability. 338 313 */ 339 - if (unlikely(secs > 1 << (SHIFT_PLL + 1 + time_constant))) 340 - secs = 1 << (SHIFT_PLL + 1 + time_constant); 314 + if (unlikely(secs > 1 << (SHIFT_PLL + 1 + ntpdata->time_constant))) 315 + secs = 1 << (SHIFT_PLL + 1 + ntpdata->time_constant); 341 316 342 317 freq_adj += (offset64 * secs) << 343 - (NTP_SCALE_SHIFT - 2 * (SHIFT_PLL + 2 + time_constant)); 318 + (NTP_SCALE_SHIFT - 2 * (SHIFT_PLL + 2 + ntpdata->time_constant)); 344 319 345 - freq_adj = min(freq_adj + time_freq, MAXFREQ_SCALED); 320 + freq_adj = min(freq_adj + ntpdata->time_freq, MAXFREQ_SCALED); 346 321 347 - time_freq = max(freq_adj, -MAXFREQ_SCALED); 322 + ntpdata->time_freq = max(freq_adj, -MAXFREQ_SCALED); 348 323 349 - time_offset = div_s64(offset64 << NTP_SCALE_SHIFT, NTP_INTERVAL_FREQ); 324 + ntpdata->time_offset = div_s64(offset64 << NTP_SCALE_SHIFT, NTP_INTERVAL_FREQ); 325 + } 326 + 327 + static void __ntp_clear(struct ntp_data *ntpdata) 328 + { 329 + /* Stop active adjtime() */ 330 + ntpdata->time_adjust = 0; 331 + ntpdata->time_status |= STA_UNSYNC; 332 + ntpdata->time_maxerror = NTP_PHASE_LIMIT; 333 + ntpdata->time_esterror = NTP_PHASE_LIMIT; 334 + 335 + ntp_update_frequency(ntpdata); 336 + 337 + ntpdata->tick_length = ntpdata->tick_length_base; 338 + 
ntpdata->time_offset = 0; 339 + 340 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 341 + /* Clear PPS state variables */ 342 + pps_clear(ntpdata); 350 343 } 351 344 352 345 /** ··· 351 350 */ 352 351 void ntp_clear(void) 353 352 { 354 - time_adjust = 0; /* stop active adjtime() */ 355 - time_status |= STA_UNSYNC; 356 - time_maxerror = NTP_PHASE_LIMIT; 357 - time_esterror = NTP_PHASE_LIMIT; 358 - 359 - ntp_update_frequency(); 360 - 361 - tick_length = tick_length_base; 362 - time_offset = 0; 363 - 364 - ntp_next_leap_sec = TIME64_MAX; 365 - /* Clear PPS state variables */ 366 - pps_clear(); 353 + __ntp_clear(&tk_ntp_data); 367 354 } 368 355 369 356 370 357 u64 ntp_tick_length(void) 371 358 { 372 - return tick_length; 359 + return tk_ntp_data.tick_length; 373 360 } 374 361 375 362 /** ··· 368 379 */ 369 380 ktime_t ntp_get_next_leap(void) 370 381 { 382 + struct ntp_data *ntpdata = &tk_ntp_data; 371 383 ktime_t ret; 372 384 373 - if ((time_state == TIME_INS) && (time_status & STA_INS)) 374 - return ktime_set(ntp_next_leap_sec, 0); 385 + if ((ntpdata->time_state == TIME_INS) && (ntpdata->time_status & STA_INS)) 386 + return ktime_set(ntpdata->ntp_next_leap_sec, 0); 375 387 ret = KTIME_MAX; 376 388 return ret; 377 389 } 378 390 379 391 /* 380 - * this routine handles the overflow of the microsecond field 392 + * This routine handles the overflow of the microsecond field 381 393 * 382 394 * The tricky bits of code to handle the accurate clock support 383 395 * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame. ··· 389 399 */ 390 400 int second_overflow(time64_t secs) 391 401 { 402 + struct ntp_data *ntpdata = &tk_ntp_data; 392 403 s64 delta; 393 404 int leap = 0; 394 405 s32 rem; ··· 399 408 * day, the system clock is set back one second; if in leap-delete 400 409 * state, the system clock is set ahead one second. 
401 410 */ 402 - switch (time_state) { 411 + switch (ntpdata->time_state) { 403 412 case TIME_OK: 404 - if (time_status & STA_INS) { 405 - time_state = TIME_INS; 413 + if (ntpdata->time_status & STA_INS) { 414 + ntpdata->time_state = TIME_INS; 406 415 div_s64_rem(secs, SECS_PER_DAY, &rem); 407 - ntp_next_leap_sec = secs + SECS_PER_DAY - rem; 408 - } else if (time_status & STA_DEL) { 409 - time_state = TIME_DEL; 416 + ntpdata->ntp_next_leap_sec = secs + SECS_PER_DAY - rem; 417 + } else if (ntpdata->time_status & STA_DEL) { 418 + ntpdata->time_state = TIME_DEL; 410 419 div_s64_rem(secs + 1, SECS_PER_DAY, &rem); 411 - ntp_next_leap_sec = secs + SECS_PER_DAY - rem; 420 + ntpdata->ntp_next_leap_sec = secs + SECS_PER_DAY - rem; 412 421 } 413 422 break; 414 423 case TIME_INS: 415 - if (!(time_status & STA_INS)) { 416 - ntp_next_leap_sec = TIME64_MAX; 417 - time_state = TIME_OK; 418 - } else if (secs == ntp_next_leap_sec) { 424 + if (!(ntpdata->time_status & STA_INS)) { 425 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 426 + ntpdata->time_state = TIME_OK; 427 + } else if (secs == ntpdata->ntp_next_leap_sec) { 419 428 leap = -1; 420 - time_state = TIME_OOP; 421 - printk(KERN_NOTICE 422 - "Clock: inserting leap second 23:59:60 UTC\n"); 429 + ntpdata->time_state = TIME_OOP; 430 + pr_notice("Clock: inserting leap second 23:59:60 UTC\n"); 423 431 } 424 432 break; 425 433 case TIME_DEL: 426 - if (!(time_status & STA_DEL)) { 427 - ntp_next_leap_sec = TIME64_MAX; 428 - time_state = TIME_OK; 429 - } else if (secs == ntp_next_leap_sec) { 434 + if (!(ntpdata->time_status & STA_DEL)) { 435 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 436 + ntpdata->time_state = TIME_OK; 437 + } else if (secs == ntpdata->ntp_next_leap_sec) { 430 438 leap = 1; 431 - ntp_next_leap_sec = TIME64_MAX; 432 - time_state = TIME_WAIT; 433 - printk(KERN_NOTICE 434 - "Clock: deleting leap second 23:59:59 UTC\n"); 439 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 440 + ntpdata->time_state = TIME_WAIT; 441 + 
pr_notice("Clock: deleting leap second 23:59:59 UTC\n"); 435 442 } 436 443 break; 437 444 case TIME_OOP: 438 - ntp_next_leap_sec = TIME64_MAX; 439 - time_state = TIME_WAIT; 445 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 446 + ntpdata->time_state = TIME_WAIT; 440 447 break; 441 448 case TIME_WAIT: 442 - if (!(time_status & (STA_INS | STA_DEL))) 443 - time_state = TIME_OK; 449 + if (!(ntpdata->time_status & (STA_INS | STA_DEL))) 450 + ntpdata->time_state = TIME_OK; 444 451 break; 445 452 } 446 453 447 - 448 454 /* Bump the maxerror field */ 449 - time_maxerror += MAXFREQ / NSEC_PER_USEC; 450 - if (time_maxerror > NTP_PHASE_LIMIT) { 451 - time_maxerror = NTP_PHASE_LIMIT; 452 - time_status |= STA_UNSYNC; 455 + ntpdata->time_maxerror += MAXFREQ / NSEC_PER_USEC; 456 + if (ntpdata->time_maxerror > NTP_PHASE_LIMIT) { 457 + ntpdata->time_maxerror = NTP_PHASE_LIMIT; 458 + ntpdata->time_status |= STA_UNSYNC; 453 459 } 454 460 455 461 /* Compute the phase adjustment for the next second */ 456 - tick_length = tick_length_base; 462 + ntpdata->tick_length = ntpdata->tick_length_base; 457 463 458 - delta = ntp_offset_chunk(time_offset); 459 - time_offset -= delta; 460 - tick_length += delta; 464 + delta = ntp_offset_chunk(ntpdata, ntpdata->time_offset); 465 + ntpdata->time_offset -= delta; 466 + ntpdata->tick_length += delta; 461 467 462 468 /* Check PPS signal */ 463 - pps_dec_valid(); 469 + pps_dec_valid(ntpdata); 464 470 465 - if (!time_adjust) 471 + if (!ntpdata->time_adjust) 466 472 goto out; 467 473 468 - if (time_adjust > MAX_TICKADJ) { 469 - time_adjust -= MAX_TICKADJ; 470 - tick_length += MAX_TICKADJ_SCALED; 471 - goto out; 472 - } 473 - 474 - if (time_adjust < -MAX_TICKADJ) { 475 - time_adjust += MAX_TICKADJ; 476 - tick_length -= MAX_TICKADJ_SCALED; 474 + if (ntpdata->time_adjust > MAX_TICKADJ) { 475 + ntpdata->time_adjust -= MAX_TICKADJ; 476 + ntpdata->tick_length += MAX_TICKADJ_SCALED; 477 477 goto out; 478 478 } 479 479 480 - tick_length += (s64)(time_adjust * 
NSEC_PER_USEC / NTP_INTERVAL_FREQ) 481 - << NTP_SCALE_SHIFT; 482 - time_adjust = 0; 480 + if (ntpdata->time_adjust < -MAX_TICKADJ) { 481 + ntpdata->time_adjust += MAX_TICKADJ; 482 + ntpdata->tick_length -= MAX_TICKADJ_SCALED; 483 + goto out; 484 + } 485 + 486 + ntpdata->tick_length += (s64)(ntpdata->time_adjust * NSEC_PER_USEC / NTP_INTERVAL_FREQ) 487 + << NTP_SCALE_SHIFT; 488 + ntpdata->time_adjust = 0; 483 489 484 490 out: 485 491 return leap; ··· 599 611 } 600 612 #endif 601 613 614 + /** 615 + * ntp_synced - Tells whether the NTP status is not UNSYNC 616 + * Returns: true if not UNSYNC, false otherwise 617 + */ 618 + static inline bool ntp_synced(void) 619 + { 620 + return !(tk_ntp_data.time_status & STA_UNSYNC); 621 + } 622 + 602 623 /* 603 624 * If we have an externally synchronized Linux clock, then update RTC clock 604 625 * accordingly every ~11 minutes. Generally RTCs can only store second ··· 688 691 /* 689 692 * Propagate a new txc->status value into the NTP state: 690 693 */ 691 - static inline void process_adj_status(const struct __kernel_timex *txc) 694 + static inline void process_adj_status(struct ntp_data *ntpdata, const struct __kernel_timex *txc) 692 695 { 693 - if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) { 694 - time_state = TIME_OK; 695 - time_status = STA_UNSYNC; 696 - ntp_next_leap_sec = TIME64_MAX; 697 - /* restart PPS frequency calibration */ 698 - pps_reset_freq_interval(); 696 + if ((ntpdata->time_status & STA_PLL) && !(txc->status & STA_PLL)) { 697 + ntpdata->time_state = TIME_OK; 698 + ntpdata->time_status = STA_UNSYNC; 699 + ntpdata->ntp_next_leap_sec = TIME64_MAX; 700 + /* Restart PPS frequency calibration */ 701 + pps_reset_freq_interval(ntpdata); 699 702 } 700 703 701 704 /* 702 705 * If we turn on PLL adjustments then reset the 703 706 * reference time to current time. 
704 707 */ 705 - if (!(time_status & STA_PLL) && (txc->status & STA_PLL)) 706 - time_reftime = __ktime_get_real_seconds(); 708 + if (!(ntpdata->time_status & STA_PLL) && (txc->status & STA_PLL)) 709 + ntpdata->time_reftime = __ktime_get_real_seconds(); 707 710 708 711 /* only set allowed bits */ 709 - time_status &= STA_RONLY; 710 - time_status |= txc->status & ~STA_RONLY; 712 + ntpdata->time_status &= STA_RONLY; 713 + ntpdata->time_status |= txc->status & ~STA_RONLY; 711 714 } 712 715 713 - 714 - static inline void process_adjtimex_modes(const struct __kernel_timex *txc, 716 + static inline void process_adjtimex_modes(struct ntp_data *ntpdata, const struct __kernel_timex *txc, 715 717 s32 *time_tai) 716 718 { 717 719 if (txc->modes & ADJ_STATUS) 718 - process_adj_status(txc); 720 + process_adj_status(ntpdata, txc); 719 721 720 722 if (txc->modes & ADJ_NANO) 721 - time_status |= STA_NANO; 723 + ntpdata->time_status |= STA_NANO; 722 724 723 725 if (txc->modes & ADJ_MICRO) 724 - time_status &= ~STA_NANO; 726 + ntpdata->time_status &= ~STA_NANO; 725 727 726 728 if (txc->modes & ADJ_FREQUENCY) { 727 - time_freq = txc->freq * PPM_SCALE; 728 - time_freq = min(time_freq, MAXFREQ_SCALED); 729 - time_freq = max(time_freq, -MAXFREQ_SCALED); 730 - /* update pps_freq */ 731 - pps_set_freq(time_freq); 729 + ntpdata->time_freq = txc->freq * PPM_SCALE; 730 + ntpdata->time_freq = min(ntpdata->time_freq, MAXFREQ_SCALED); 731 + ntpdata->time_freq = max(ntpdata->time_freq, -MAXFREQ_SCALED); 732 + /* Update pps_freq */ 733 + pps_set_freq(ntpdata); 732 734 } 733 735 734 736 if (txc->modes & ADJ_MAXERROR) 735 - time_maxerror = clamp(txc->maxerror, 0, NTP_PHASE_LIMIT); 737 + ntpdata->time_maxerror = clamp(txc->maxerror, 0, NTP_PHASE_LIMIT); 736 738 737 739 if (txc->modes & ADJ_ESTERROR) 738 - time_esterror = clamp(txc->esterror, 0, NTP_PHASE_LIMIT); 740 + ntpdata->time_esterror = clamp(txc->esterror, 0, NTP_PHASE_LIMIT); 739 741 740 742 if (txc->modes & ADJ_TIMECONST) { 741 - 
time_constant = clamp(txc->constant, 0, MAXTC); 742 - if (!(time_status & STA_NANO)) 743 - time_constant += 4; 744 - time_constant = clamp(time_constant, 0, MAXTC); 743 + ntpdata->time_constant = clamp(txc->constant, 0, MAXTC); 744 + if (!(ntpdata->time_status & STA_NANO)) 745 + ntpdata->time_constant += 4; 746 + ntpdata->time_constant = clamp(ntpdata->time_constant, 0, MAXTC); 745 747 } 746 748 747 - if (txc->modes & ADJ_TAI && 748 - txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET) 749 + if (txc->modes & ADJ_TAI && txc->constant >= 0 && txc->constant <= MAX_TAI_OFFSET) 749 750 *time_tai = txc->constant; 750 751 751 752 if (txc->modes & ADJ_OFFSET) 752 - ntp_update_offset(txc->offset); 753 + ntp_update_offset(ntpdata, txc->offset); 753 754 754 755 if (txc->modes & ADJ_TICK) 755 - tick_usec = txc->tick; 756 + ntpdata->tick_usec = txc->tick; 756 757 757 758 if (txc->modes & (ADJ_TICK|ADJ_FREQUENCY|ADJ_OFFSET)) 758 - ntp_update_frequency(); 759 + ntp_update_frequency(ntpdata); 759 760 } 760 761 761 - 762 762 /* 763 - * adjtimex mainly allows reading (and writing, if superuser) of 763 + * adjtimex() mainly allows reading (and writing, if superuser) of 764 764 * kernel time-keeping variables. used by xntpd. 
765 765 */ 766 766 int __do_adjtimex(struct __kernel_timex *txc, const struct timespec64 *ts, 767 767 s32 *time_tai, struct audit_ntp_data *ad) 768 768 { 769 + struct ntp_data *ntpdata = &tk_ntp_data; 769 770 int result; 770 771 771 772 if (txc->modes & ADJ_ADJTIME) { 772 - long save_adjust = time_adjust; 773 + long save_adjust = ntpdata->time_adjust; 773 774 774 775 if (!(txc->modes & ADJ_OFFSET_READONLY)) { 775 776 /* adjtime() is independent from ntp_adjtime() */ 776 - time_adjust = txc->offset; 777 - ntp_update_frequency(); 777 + ntpdata->time_adjust = txc->offset; 778 + ntp_update_frequency(ntpdata); 778 779 779 780 audit_ntp_set_old(ad, AUDIT_NTP_ADJUST, save_adjust); 780 - audit_ntp_set_new(ad, AUDIT_NTP_ADJUST, time_adjust); 781 + audit_ntp_set_new(ad, AUDIT_NTP_ADJUST, ntpdata->time_adjust); 781 782 } 782 783 txc->offset = save_adjust; 783 784 } else { 784 785 /* If there are input parameters, then process them: */ 785 786 if (txc->modes) { 786 - audit_ntp_set_old(ad, AUDIT_NTP_OFFSET, time_offset); 787 - audit_ntp_set_old(ad, AUDIT_NTP_FREQ, time_freq); 788 - audit_ntp_set_old(ad, AUDIT_NTP_STATUS, time_status); 787 + audit_ntp_set_old(ad, AUDIT_NTP_OFFSET, ntpdata->time_offset); 788 + audit_ntp_set_old(ad, AUDIT_NTP_FREQ, ntpdata->time_freq); 789 + audit_ntp_set_old(ad, AUDIT_NTP_STATUS, ntpdata->time_status); 789 790 audit_ntp_set_old(ad, AUDIT_NTP_TAI, *time_tai); 790 - audit_ntp_set_old(ad, AUDIT_NTP_TICK, tick_usec); 791 + audit_ntp_set_old(ad, AUDIT_NTP_TICK, ntpdata->tick_usec); 791 792 792 - process_adjtimex_modes(txc, time_tai); 793 + process_adjtimex_modes(ntpdata, txc, time_tai); 793 794 794 - audit_ntp_set_new(ad, AUDIT_NTP_OFFSET, time_offset); 795 - audit_ntp_set_new(ad, AUDIT_NTP_FREQ, time_freq); 796 - audit_ntp_set_new(ad, AUDIT_NTP_STATUS, time_status); 795 + audit_ntp_set_new(ad, AUDIT_NTP_OFFSET, ntpdata->time_offset); 796 + audit_ntp_set_new(ad, AUDIT_NTP_FREQ, ntpdata->time_freq); 797 + audit_ntp_set_new(ad, AUDIT_NTP_STATUS, 
ntpdata->time_status); 797 798 audit_ntp_set_new(ad, AUDIT_NTP_TAI, *time_tai); 798 - audit_ntp_set_new(ad, AUDIT_NTP_TICK, tick_usec); 799 + audit_ntp_set_new(ad, AUDIT_NTP_TICK, ntpdata->tick_usec); 799 800 } 800 801 801 - txc->offset = shift_right(time_offset * NTP_INTERVAL_FREQ, 802 - NTP_SCALE_SHIFT); 803 - if (!(time_status & STA_NANO)) 802 + txc->offset = shift_right(ntpdata->time_offset * NTP_INTERVAL_FREQ, NTP_SCALE_SHIFT); 803 + if (!(ntpdata->time_status & STA_NANO)) 804 804 txc->offset = (u32)txc->offset / NSEC_PER_USEC; 805 805 } 806 806 807 - result = time_state; /* mostly `TIME_OK' */ 808 - /* check for errors */ 809 - if (is_error_status(time_status)) 807 + result = ntpdata->time_state; 808 + if (is_error_status(ntpdata->time_status)) 810 809 result = TIME_ERROR; 811 810 812 - txc->freq = shift_right((time_freq >> PPM_SCALE_INV_SHIFT) * 811 + txc->freq = shift_right((ntpdata->time_freq >> PPM_SCALE_INV_SHIFT) * 813 812 PPM_SCALE_INV, NTP_SCALE_SHIFT); 814 - txc->maxerror = time_maxerror; 815 - txc->esterror = time_esterror; 816 - txc->status = time_status; 817 - txc->constant = time_constant; 813 + txc->maxerror = ntpdata->time_maxerror; 814 + txc->esterror = ntpdata->time_esterror; 815 + txc->status = ntpdata->time_status; 816 + txc->constant = ntpdata->time_constant; 818 817 txc->precision = 1; 819 818 txc->tolerance = MAXFREQ_SCALED / PPM_SCALE; 820 - txc->tick = tick_usec; 819 + txc->tick = ntpdata->tick_usec; 821 820 txc->tai = *time_tai; 822 821 823 - /* fill PPS status fields */ 824 - pps_fill_timex(txc); 822 + /* Fill PPS status fields */ 823 + pps_fill_timex(ntpdata, txc); 825 824 826 825 txc->time.tv_sec = ts->tv_sec; 827 826 txc->time.tv_usec = ts->tv_nsec; 828 - if (!(time_status & STA_NANO)) 827 + if (!(ntpdata->time_status & STA_NANO)) 829 828 txc->time.tv_usec = ts->tv_nsec / NSEC_PER_USEC; 830 829 831 830 /* Handle leapsec adjustments */ 832 - if (unlikely(ts->tv_sec >= ntp_next_leap_sec)) { 833 - if ((time_state == TIME_INS) && 
(time_status & STA_INS)) {
+	if (unlikely(ts->tv_sec >= ntpdata->ntp_next_leap_sec)) {
+		if ((ntpdata->time_state == TIME_INS) && (ntpdata->time_status & STA_INS)) {
 			result = TIME_OOP;
 			txc->tai++;
 			txc->time.tv_sec--;
 		}
-		if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
+		if ((ntpdata->time_state == TIME_DEL) && (ntpdata->time_status & STA_DEL)) {
 			result = TIME_WAIT;
 			txc->tai--;
 			txc->time.tv_sec++;
 		}
-		if ((time_state == TIME_OOP) &&
-		    (ts->tv_sec == ntp_next_leap_sec)) {
+		if ((ntpdata->time_state == TIME_OOP) && (ts->tv_sec == ntpdata->ntp_next_leap_sec))
 			result = TIME_WAIT;
-		}
 	}

 	return result;
···

 #ifdef CONFIG_NTP_PPS

-/* actually struct pps_normtime is good old struct timespec, but it is
+/*
+ * struct pps_normtime is basically a struct timespec, but it is
  * semantically different (and it is the reason why it was invented):
  * pps_normtime.nsec has a range of ( -NSEC_PER_SEC / 2, NSEC_PER_SEC / 2 ]
- * while timespec.tv_nsec has a range of [0, NSEC_PER_SEC) */
+ * while timespec.tv_nsec has a range of [0, NSEC_PER_SEC)
+ */
 struct pps_normtime {
 	s64	sec;	/* seconds */
 	long	nsec;	/* nanoseconds */
 };

-/* normalize the timestamp so that nsec is in the
-   ( -NSEC_PER_SEC / 2, NSEC_PER_SEC / 2 ] interval */
+/*
+ * Normalize the timestamp so that nsec is in the
+ * [ -NSEC_PER_SEC / 2, NSEC_PER_SEC / 2 ] interval
+ */
 static inline struct pps_normtime pps_normalize_ts(struct timespec64 ts)
 {
 	struct pps_normtime norm = {
···
 	return norm;
 }

-/* get current phase correction and jitter */
-static inline long pps_phase_filter_get(long *jitter)
+/* Get current phase correction and jitter */
+static inline long pps_phase_filter_get(struct ntp_data *ntpdata, long *jitter)
 {
-	*jitter = pps_tf[0] - pps_tf[1];
+	*jitter = ntpdata->pps_tf[0] - ntpdata->pps_tf[1];
 	if (*jitter < 0)
 		*jitter = -*jitter;

 	/* TODO: test various filters */
-	return pps_tf[0];
+	return ntpdata->pps_tf[0];
 }

-/* add the sample to the phase filter */
-static inline void pps_phase_filter_add(long err)
+/* Add the sample to the phase filter */
+static inline void pps_phase_filter_add(struct ntp_data *ntpdata, long err)
 {
-	pps_tf[2] = pps_tf[1];
-	pps_tf[1] = pps_tf[0];
-	pps_tf[0] = err;
+	ntpdata->pps_tf[2] = ntpdata->pps_tf[1];
+	ntpdata->pps_tf[1] = ntpdata->pps_tf[0];
+	ntpdata->pps_tf[0] = err;
 }

-/* decrease frequency calibration interval length.
- * It is halved after four consecutive unstable intervals.
+/*
+ * Decrease frequency calibration interval length. It is halved after four
+ * consecutive unstable intervals.
  */
-static inline void pps_dec_freq_interval(void)
+static inline void pps_dec_freq_interval(struct ntp_data *ntpdata)
 {
-	if (--pps_intcnt <= -PPS_INTCOUNT) {
-		pps_intcnt = -PPS_INTCOUNT;
-		if (pps_shift > PPS_INTMIN) {
-			pps_shift--;
-			pps_intcnt = 0;
+	if (--ntpdata->pps_intcnt <= -PPS_INTCOUNT) {
+		ntpdata->pps_intcnt = -PPS_INTCOUNT;
+		if (ntpdata->pps_shift > PPS_INTMIN) {
+			ntpdata->pps_shift--;
+			ntpdata->pps_intcnt = 0;
 		}
 	}
 }

-/* increase frequency calibration interval length.
- * It is doubled after four consecutive stable intervals.
+/*
+ * Increase frequency calibration interval length. It is doubled after
+ * four consecutive stable intervals.
  */
-static inline void pps_inc_freq_interval(void)
+static inline void pps_inc_freq_interval(struct ntp_data *ntpdata)
 {
-	if (++pps_intcnt >= PPS_INTCOUNT) {
-		pps_intcnt = PPS_INTCOUNT;
-		if (pps_shift < PPS_INTMAX) {
-			pps_shift++;
-			pps_intcnt = 0;
+	if (++ntpdata->pps_intcnt >= PPS_INTCOUNT) {
+		ntpdata->pps_intcnt = PPS_INTCOUNT;
+		if (ntpdata->pps_shift < PPS_INTMAX) {
+			ntpdata->pps_shift++;
+			ntpdata->pps_intcnt = 0;
 		}
 	}
 }

-/* update clock frequency based on MONOTONIC_RAW clock PPS signal
+/*
+ * Update clock frequency based on MONOTONIC_RAW clock PPS signal
  * timestamps
  *
  * At the end of the calibration interval the difference between the
···
  * too long, the data are discarded.
  * Returns the difference between old and new frequency values.
  */
-static long hardpps_update_freq(struct pps_normtime freq_norm)
+static long hardpps_update_freq(struct ntp_data *ntpdata, struct pps_normtime freq_norm)
 {
 	long delta, delta_mod;
 	s64 ftemp;

-	/* check if the frequency interval was too long */
-	if (freq_norm.sec > (2 << pps_shift)) {
-		time_status |= STA_PPSERROR;
-		pps_errcnt++;
-		pps_dec_freq_interval();
-		printk_deferred(KERN_ERR
-			"hardpps: PPSERROR: interval too long - %lld s\n",
-			freq_norm.sec);
+	/* Check if the frequency interval was too long */
+	if (freq_norm.sec > (2 << ntpdata->pps_shift)) {
+		ntpdata->time_status |= STA_PPSERROR;
+		ntpdata->pps_errcnt++;
+		pps_dec_freq_interval(ntpdata);
+		printk_deferred(KERN_ERR "hardpps: PPSERROR: interval too long - %lld s\n",
+				freq_norm.sec);
 		return 0;
 	}

-	/* here the raw frequency offset and wander (stability) is
-	 * calculated. If the wander is less than the wander threshold
-	 * the interval is increased; otherwise it is decreased.
+	/*
+	 * Here the raw frequency offset and wander (stability) is
+	 * calculated. If the wander is less than the wander threshold the
+	 * interval is increased; otherwise it is decreased.
 	 */
 	ftemp = div_s64(((s64)(-freq_norm.nsec)) << NTP_SCALE_SHIFT,
			freq_norm.sec);
-	delta = shift_right(ftemp - pps_freq, NTP_SCALE_SHIFT);
-	pps_freq = ftemp;
+	delta = shift_right(ftemp - ntpdata->pps_freq, NTP_SCALE_SHIFT);
+	ntpdata->pps_freq = ftemp;
 	if (delta > PPS_MAXWANDER || delta < -PPS_MAXWANDER) {
-		printk_deferred(KERN_WARNING
-			"hardpps: PPSWANDER: change=%ld\n", delta);
-		time_status |= STA_PPSWANDER;
-		pps_stbcnt++;
-		pps_dec_freq_interval();
-	} else {	/* good sample */
-		pps_inc_freq_interval();
+		printk_deferred(KERN_WARNING "hardpps: PPSWANDER: change=%ld\n", delta);
+		ntpdata->time_status |= STA_PPSWANDER;
+		ntpdata->pps_stbcnt++;
+		pps_dec_freq_interval(ntpdata);
+	} else {
+		/* Good sample */
+		pps_inc_freq_interval(ntpdata);
 	}

-	/* the stability metric is calculated as the average of recent
-	 * frequency changes, but is used only for performance
-	 * monitoring
+	/*
+	 * The stability metric is calculated as the average of recent
+	 * frequency changes, but is used only for performance monitoring
 	 */
 	delta_mod = delta;
 	if (delta_mod < 0)
 		delta_mod = -delta_mod;
-	pps_stabil += (div_s64(((s64)delta_mod) <<
-		       (NTP_SCALE_SHIFT - SHIFT_USEC),
-		       NSEC_PER_USEC) - pps_stabil) >> PPS_INTMIN;
+	ntpdata->pps_stabil += (div_s64(((s64)delta_mod) << (NTP_SCALE_SHIFT - SHIFT_USEC),
+				NSEC_PER_USEC) - ntpdata->pps_stabil) >> PPS_INTMIN;

-	/* if enabled, the system clock frequency is updated */
-	if ((time_status & STA_PPSFREQ) != 0 &&
-	    (time_status & STA_FREQHOLD) == 0) {
-		time_freq = pps_freq;
-		ntp_update_frequency();
+	/* If enabled, the system clock frequency is updated */
+	if ((ntpdata->time_status & STA_PPSFREQ) && !(ntpdata->time_status & STA_FREQHOLD)) {
+		ntpdata->time_freq = ntpdata->pps_freq;
+		ntp_update_frequency(ntpdata);
 	}

 	return delta;
 }

-/* correct REALTIME clock phase error against PPS signal */
-static void hardpps_update_phase(long error)
+/* Correct REALTIME clock phase error against PPS signal */
+static void hardpps_update_phase(struct ntp_data *ntpdata, long error)
 {
 	long correction = -error;
 	long jitter;

-	/* add the sample to the median filter */
-	pps_phase_filter_add(correction);
-	correction = pps_phase_filter_get(&jitter);
+	/* Add the sample to the median filter */
+	pps_phase_filter_add(ntpdata, correction);
+	correction = pps_phase_filter_get(ntpdata, &jitter);

-	/* Nominal jitter is due to PPS signal noise. If it exceeds the
+	/*
+	 * Nominal jitter is due to PPS signal noise. If it exceeds the
 	 * threshold, the sample is discarded; otherwise, if so enabled,
 	 * the time offset is updated.
 	 */
-	if (jitter > (pps_jitter << PPS_POPCORN)) {
-		printk_deferred(KERN_WARNING
-			"hardpps: PPSJITTER: jitter=%ld, limit=%ld\n",
-			jitter, (pps_jitter << PPS_POPCORN));
-		time_status |= STA_PPSJITTER;
-		pps_jitcnt++;
-	} else if (time_status & STA_PPSTIME) {
-		/* correct the time using the phase offset */
-		time_offset = div_s64(((s64)correction) << NTP_SCALE_SHIFT,
-				      NTP_INTERVAL_FREQ);
-		/* cancel running adjtime() */
-		time_adjust = 0;
+	if (jitter > (ntpdata->pps_jitter << PPS_POPCORN)) {
+		printk_deferred(KERN_WARNING "hardpps: PPSJITTER: jitter=%ld, limit=%ld\n",
+				jitter, (ntpdata->pps_jitter << PPS_POPCORN));
+		ntpdata->time_status |= STA_PPSJITTER;
+		ntpdata->pps_jitcnt++;
+	} else if (ntpdata->time_status & STA_PPSTIME) {
+		/* Correct the time using the phase offset */
+		ntpdata->time_offset = div_s64(((s64)correction) << NTP_SCALE_SHIFT,
+					       NTP_INTERVAL_FREQ);
+		/* Cancel running adjtime() */
+		ntpdata->time_adjust = 0;
 	}
-	/* update jitter */
-	pps_jitter += (jitter - pps_jitter) >> PPS_INTMIN;
+	/* Update jitter */
+	ntpdata->pps_jitter += (jitter - ntpdata->pps_jitter) >> PPS_INTMIN;
 }

 /*
···
 void __hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts)
 {
 	struct pps_normtime pts_norm, freq_norm;
+	struct ntp_data *ntpdata = &tk_ntp_data;

 	pts_norm = pps_normalize_ts(*phase_ts);

-	/* clear the error bits, they will be set again if needed */
-	time_status &= ~(STA_PPSJITTER | STA_PPSWANDER | STA_PPSERROR);
+	/* Clear the error bits, they will be set again if needed */
+	ntpdata->time_status &= ~(STA_PPSJITTER | STA_PPSWANDER | STA_PPSERROR);

 	/* indicate signal presence */
-	time_status |= STA_PPSSIGNAL;
-	pps_valid = PPS_VALID;
+	ntpdata->time_status |= STA_PPSSIGNAL;
+	ntpdata->pps_valid = PPS_VALID;

-	/* when called for the first time,
-	 * just start the frequency interval */
-	if (unlikely(pps_fbase.tv_sec == 0)) {
-		pps_fbase = *raw_ts;
+	/*
+	 * When called for the first time, just start the frequency
+	 * interval
+	 */
+	if (unlikely(ntpdata->pps_fbase.tv_sec == 0)) {
+		ntpdata->pps_fbase = *raw_ts;
 		return;
 	}

-	/* ok, now we have a base for frequency calculation */
-	freq_norm = pps_normalize_ts(timespec64_sub(*raw_ts, pps_fbase));
+	/* Ok, now we have a base for frequency calculation */
+	freq_norm = pps_normalize_ts(timespec64_sub(*raw_ts, ntpdata->pps_fbase));

-	/* check that the signal is in the range
-	 * [1s - MAXFREQ us, 1s + MAXFREQ us], otherwise reject it */
-	if ((freq_norm.sec == 0) ||
-	    (freq_norm.nsec > MAXFREQ * freq_norm.sec) ||
-	    (freq_norm.nsec < -MAXFREQ * freq_norm.sec)) {
-		time_status |= STA_PPSJITTER;
-		/* restart the frequency calibration interval */
-		pps_fbase = *raw_ts;
+	/*
+	 * Check that the signal is in the range
+	 * [1s - MAXFREQ us, 1s + MAXFREQ us], otherwise reject it
+	 */
+	if ((freq_norm.sec == 0) || (freq_norm.nsec > MAXFREQ * freq_norm.sec) ||
+	    (freq_norm.nsec < -MAXFREQ * freq_norm.sec)) {
+		ntpdata->time_status |= STA_PPSJITTER;
+		/* Restart the frequency calibration interval */
+		ntpdata->pps_fbase = *raw_ts;
 		printk_deferred(KERN_ERR "hardpps: PPSJITTER: bad pulse\n");
 		return;
 	}

-	/* signal is ok */
-
-	/* check if the current frequency interval is finished */
-	if (freq_norm.sec >= (1 << pps_shift)) {
-		pps_calcnt++;
-		/* restart the frequency calibration interval */
-		pps_fbase = *raw_ts;
-		hardpps_update_freq(freq_norm);
+	/* Signal is ok. Check if the current frequency interval is finished */
+	if (freq_norm.sec >= (1 << ntpdata->pps_shift)) {
+		ntpdata->pps_calcnt++;
+		/* Restart the frequency calibration interval */
+		ntpdata->pps_fbase = *raw_ts;
+		hardpps_update_freq(ntpdata, freq_norm);
 	}

-	hardpps_update_phase(pts_norm.nsec);
+	hardpps_update_phase(ntpdata, pts_norm.nsec);

 }
 #endif	/* CONFIG_NTP_PPS */

 static int __init ntp_tick_adj_setup(char *str)
 {
-	int rc = kstrtos64(str, 0, &ntp_tick_adj);
+	int rc = kstrtos64(str, 0, &tk_ntp_data.ntp_tick_adj);
 	if (rc)
 		return rc;

-	ntp_tick_adj <<= NTP_SCALE_SHIFT;
+	tk_ntp_data.ntp_tick_adj <<= NTP_SCALE_SHIFT;
 	return 1;
 }
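Editor's note: the pps_inc_freq_interval()/pps_dec_freq_interval() pair above is a small adaptive controller: the calibration interval doubles after four consecutive stable intervals and halves after four unstable ones, clamped between 2^PPS_INTMIN and 2^PPS_INTMAX seconds. A standalone userspace sketch of just that counter logic (struct name and constant values are illustrative assumptions, not taken verbatim from the kernel):

```c
/* Tuning knobs mirroring the kernel's PPS constants (values assumed here) */
#define PPS_INTCOUNT 4   /* consecutive intervals before adapting */
#define PPS_INTMIN   2   /* shortest interval: 1 << 2 = 4 s */
#define PPS_INTMAX   8   /* longest interval: 1 << 8 = 256 s */

struct pps_state {
	int intcnt;  /* stability counter, models ntpdata->pps_intcnt */
	int shift;   /* interval length is (1 << shift) seconds */
};

/* Halve the calibration interval after four consecutive unstable intervals */
static void dec_freq_interval(struct pps_state *s)
{
	if (--s->intcnt <= -PPS_INTCOUNT) {
		s->intcnt = -PPS_INTCOUNT;
		if (s->shift > PPS_INTMIN) {
			s->shift--;
			s->intcnt = 0;  /* restart counting after adapting */
		}
	}
}

/* Double the calibration interval after four consecutive stable intervals */
static void inc_freq_interval(struct pps_state *s)
{
	if (++s->intcnt >= PPS_INTCOUNT) {
		s->intcnt = PPS_INTCOUNT;
		if (s->shift < PPS_INTMAX) {
			s->shift++;
			s->intcnt = 0;  /* restart counting after adapting */
		}
	}
}
```

Note how the counter saturates at +/-PPS_INTCOUNT once the shift hits its bound, so a long run of unstable samples at the minimum interval does not wind the counter up further.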
+39 -33
kernel/time/posix-cpu-timers.c
···
 	struct cpu_timer *ctmr = &timer->it.cpu;
 	struct posix_cputimer_base *base;

-	timer->it_active = 0;
 	if (!cpu_timer_dequeue(ctmr))
 		return;

···
 		 */
 		WARN_ON_ONCE(ctmr->head || timerqueue_node_queued(&ctmr->node));
 	} else {
-		if (timer->it.cpu.firing)
+		if (timer->it.cpu.firing) {
+			/*
+			 * Prevent signal delivery. The timer cannot be dequeued
+			 * because it is on the firing list which is not protected
+			 * by sighand->lock. The delivery path is waiting for
+			 * the timer lock. So go back, unlock and retry.
+			 */
+			timer->it.cpu.firing = false;
 			ret = TIMER_RETRY;
-		else
+		} else {
 			disarm_timer(timer, p);
-
+		}
 		unlock_task_sighand(p, &flags);
 	}

 out:
 	rcu_read_unlock();
-	if (!ret)
-		put_pid(ctmr->pid);

+	if (!ret) {
+		put_pid(ctmr->pid);
+		timer->it_status = POSIX_TIMER_DISARMED;
+	}
 	return ret;
 }

···
 	struct cpu_timer *ctmr = &timer->it.cpu;
 	u64 newexp = cpu_timer_getexpires(ctmr);

-	timer->it_active = 1;
+	timer->it_status = POSIX_TIMER_ARMED;
 	if (!cpu_timer_enqueue(&base->tqhead, ctmr))
 		return;

···
 {
 	struct cpu_timer *ctmr = &timer->it.cpu;

-	timer->it_active = 0;
-	if (unlikely(timer->sigq == NULL)) {
+	timer->it_status = POSIX_TIMER_DISARMED;
+
+	if (unlikely(ctmr->nanosleep)) {
 		/*
 		 * This a special case for clock_nanosleep,
 		 * not a normal timer from sys_timer_create.
 		 */
 		wake_up_process(timer->it_process);
 		cpu_timer_setexpires(ctmr, 0);
-	} else if (!timer->it_interval) {
-		/*
-		 * One-shot timer. Clear it as soon as it's fired.
-		 */
+	} else {
 		posix_timer_queue_signal(timer);
-		cpu_timer_setexpires(ctmr, 0);
-	} else if (posix_timer_queue_signal(timer)) {
-		/*
-		 * The signal did not get queued because the signal
-		 * was ignored, so we won't get any callback to
-		 * reload the timer. But we need to keep it
-		 * ticking in case the signal is deliverable next time.
-		 */
-		posix_cpu_timer_rearm(timer);
-		++timer->it_requeue_pending;
+		/* Disable oneshot timers */
+		if (!timer->it_interval)
+			cpu_timer_setexpires(ctmr, 0);
 	}
 }

···
 	old_expires = cpu_timer_getexpires(ctmr);

 	if (unlikely(timer->it.cpu.firing)) {
-		timer->it.cpu.firing = -1;
+		/*
+		 * Prevent signal delivery. The timer cannot be dequeued
+		 * because it is on the firing list which is not protected
+		 * by sighand->lock. The delivery path is waiting for
+		 * the timer lock. So go back, unlock and retry.
+		 */
+		timer->it.cpu.firing = false;
 		ret = TIMER_RETRY;
 	} else {
 		cpu_timer_dequeue(ctmr);
-		timer->it_active = 0;
+		timer->it_status = POSIX_TIMER_DISARMED;
 	}

 	/*
···
 	 * - Timers which expired, but the signal has not yet been
 	 *   delivered
 	 */
-	if (iv && ((timer->it_requeue_pending & REQUEUE_PENDING) || sigev_none))
+	if (iv && timer->it_status != POSIX_TIMER_ARMED)
 		expires = bump_cpu_timer(timer, now);
 	else
 		expires = cpu_timer_getexpires(&timer->it.cpu);

···
 	if (++i == MAX_COLLECTED || now < expires)
 		return expires;

-	ctmr->firing = 1;
+	ctmr->firing = true;
 	/* See posix_cpu_timer_wait_running() */
 	rcu_assign_pointer(ctmr->handling, current);
 	cpu_timer_dequeue(ctmr);

···
 	 * timer call will interfere.
 	 */
 	list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
-		int cpu_firing;
+		bool cpu_firing;

 		/*
 		 * spin_lock() is sufficient here even independent of the
···
 		spin_lock(&timer->it_lock);
 		list_del_init(&timer->it.cpu.elist);
 		cpu_firing = timer->it.cpu.firing;
-		timer->it.cpu.firing = 0;
+		timer->it.cpu.firing = false;
 		/*
-		 * The firing flag is -1 if we collided with a reset
-		 * of the timer, which already reported this
-		 * almost-firing as an overrun. So don't generate an event.
+		 * If the firing flag is cleared then this raced with a
+		 * timer rearm/delete operation. So don't generate an
+		 * event.
 		 */
-		if (likely(cpu_firing >= 0))
+		if (likely(cpu_firing))
 			cpu_timer_fire(timer);
 		/* See posix_cpu_timer_wait_running() */
 		rcu_assign_pointer(timer->it.cpu.handling, NULL);

···
 	timer.it_overrun = -1;
 	error = posix_cpu_timer_create(&timer);
 	timer.it_process = current;
+	timer.it.cpu.nanosleep = true;

 	if (!error) {
 		static struct itimerspec64 zero_it;
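Editor's note: the posix-cpu-timers changes above turn the `firing` field into a boolean handshake: a racing rearm/delete clears the flag and returns TIMER_RETRY, and the expiry side fires the event only if the flag survived. A userspace toy model of that protocol (struct and function names are invented for illustration; the locking that makes this safe in the kernel is omitted):

```c
#include <stdbool.h>

#define TIMER_RETRY 1

struct toy_timer {
	bool firing;    /* set when collected onto the firing list */
	bool fired;     /* an event was actually generated */
	bool disarmed;
};

/*
 * Deletion/rearm path: if the timer is on the firing list, clear the
 * flag and ask the caller to retry; the expiry side will then skip
 * the event instead of delivering a signal for a dead timer.
 */
static int toy_try_delete(struct toy_timer *t)
{
	if (t->firing) {
		t->firing = false;
		return TIMER_RETRY;
	}
	t->disarmed = true;
	return 0;
}

/*
 * Expiry side: generate the event only when the flag survived, i.e.
 * no rearm/delete raced with the expiry collection.
 */
static void toy_handle_expiry(struct toy_timer *t)
{
	bool cpu_firing = t->firing;

	t->firing = false;
	if (cpu_firing)
		t->fired = true;
}
```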
+136 -145
kernel/time/posix-timers.c
···
  * The siginfo si_overrun field and the return value of timer_getoverrun(2)
  * are of type int. Clamp the overrun value to INT_MAX
  */
-static inline int timer_overrun_to_int(struct k_itimer *timr, int baseval)
+static inline int timer_overrun_to_int(struct k_itimer *timr)
 {
-	s64 sum = timr->it_overrun_last + (s64)baseval;
+	if (timr->it_overrun_last > (s64)INT_MAX)
+		return INT_MAX;

-	return sum > (s64)INT_MAX ? INT_MAX : (int)sum;
+	return (int)timr->it_overrun_last;
 }

 static void common_hrtimer_rearm(struct k_itimer *timr)
···
 	hrtimer_restart(timer);
 }

-/*
- * This function is called from the signal delivery code if
- * info->si_sys_private is not zero, which indicates that the timer has to
- * be rearmed. Restart the timer and update info::si_overrun.
- */
-void posixtimer_rearm(struct kernel_siginfo *info)
+static bool __posixtimer_deliver_signal(struct kernel_siginfo *info, struct k_itimer *timr)
 {
-	struct k_itimer *timr;
-	unsigned long flags;
-
-	timr = lock_timer(info->si_tid, &flags);
-	if (!timr)
-		return;
-
-	if (timr->it_interval && timr->it_requeue_pending == info->si_sys_private) {
-		timr->kclock->timer_rearm(timr);
-
-		timr->it_active = 1;
-		timr->it_overrun_last = timr->it_overrun;
-		timr->it_overrun = -1LL;
-		++timr->it_requeue_pending;
-
-		info->si_overrun = timer_overrun_to_int(timr, info->si_overrun);
-	}
-
-	unlock_timer(timr, flags);
-}
-
-int posix_timer_queue_signal(struct k_itimer *timr)
-{
-	int ret, si_private = 0;
-	enum pid_type type;
-
-	lockdep_assert_held(&timr->it_lock);
-
-	timr->it_active = 0;
-	if (timr->it_interval)
-		si_private = ++timr->it_requeue_pending;
+	guard(spinlock)(&timr->it_lock);

 	/*
-	 * FIXME: if ->sigq is queued we can race with
-	 * dequeue_signal()->posixtimer_rearm().
-	 *
-	 * If dequeue_signal() sees the "right" value of
-	 * si_sys_private it calls posixtimer_rearm().
-	 * We re-queue ->sigq and drop ->it_lock().
-	 * posixtimer_rearm() locks the timer
-	 * and re-schedules it while ->sigq is pending.
-	 * Not really bad, but not that we want.
+	 * Check if the timer is still alive or whether it got modified
+	 * since the signal was queued. In either case, don't rearm and
+	 * drop the signal.
 	 */
-	timr->sigq->info.si_sys_private = si_private;
+	if (timr->it_signal_seq != timr->it_sigqueue_seq || WARN_ON_ONCE(!timr->it_signal))
+		return false;

-	type = !(timr->it_sigev_notify & SIGEV_THREAD_ID) ? PIDTYPE_TGID : PIDTYPE_PID;
-	ret = send_sigqueue(timr->sigq, timr->it_pid, type);
-	/* If we failed to send the signal the timer stops. */
-	return ret > 0;
+	if (!timr->it_interval || WARN_ON_ONCE(timr->it_status != POSIX_TIMER_REQUEUE_PENDING))
+		return true;
+
+	timr->kclock->timer_rearm(timr);
+	timr->it_status = POSIX_TIMER_ARMED;
+	timr->it_overrun_last = timr->it_overrun;
+	timr->it_overrun = -1LL;
+	++timr->it_signal_seq;
+	info->si_overrun = timer_overrun_to_int(timr);
+	return true;
+}
+
+/*
+ * This function is called from the signal delivery code. It decides
+ * whether the signal should be dropped and rearms interval timers. The
+ * timer can be unconditionally accessed as there is a reference held on
+ * it.
+ */
+bool posixtimer_deliver_signal(struct kernel_siginfo *info, struct sigqueue *timer_sigq)
+{
+	struct k_itimer *timr = container_of(timer_sigq, struct k_itimer, sigq);
+	bool ret;
+
+	/*
+	 * Release siglock to ensure proper locking order versus
+	 * timr::it_lock. Keep interrupts disabled.
+	 */
+	spin_unlock(&current->sighand->siglock);
+
+	ret = __posixtimer_deliver_signal(info, timr);
+
+	/* Drop the reference which was acquired when the signal was queued */
+	posixtimer_putref(timr);
+
+	spin_lock(&current->sighand->siglock);
+	return ret;
+}
+
+void posix_timer_queue_signal(struct k_itimer *timr)
+{
+	lockdep_assert_held(&timr->it_lock);
+
+	timr->it_status = timr->it_interval ? POSIX_TIMER_REQUEUE_PENDING : POSIX_TIMER_DISARMED;
+	posixtimer_send_sigqueue(timr);
 }

 /*
···
 static enum hrtimer_restart posix_timer_fn(struct hrtimer *timer)
 {
 	struct k_itimer *timr = container_of(timer, struct k_itimer, it.real.timer);
-	enum hrtimer_restart ret = HRTIMER_NORESTART;
-	unsigned long flags;

-	spin_lock_irqsave(&timr->it_lock, flags);
-
-	if (posix_timer_queue_signal(timr)) {
-		/*
-		 * The signal was not queued due to SIG_IGN. As a
-		 * consequence the timer is not going to be rearmed from
-		 * the signal delivery path. But as a real signal handler
-		 * can be installed later the timer must be rearmed here.
-		 */
-		if (timr->it_interval != 0) {
-			ktime_t now = hrtimer_cb_get_time(timer);
-
-			/*
-			 * FIXME: What we really want, is to stop this
-			 * timer completely and restart it in case the
-			 * SIG_IGN is removed. This is a non trivial
-			 * change to the signal handling code.
-			 *
-			 * For now let timers with an interval less than a
-			 * jiffy expire every jiffy and recheck for a
-			 * valid signal handler.
-			 *
-			 * This avoids interrupt starvation in case of a
-			 * very small interval, which would expire the
-			 * timer immediately again.
-			 *
-			 * Moving now ahead of time by one jiffy tricks
-			 * hrtimer_forward() to expire the timer later,
-			 * while it still maintains the overrun accuracy
-			 * for the price of a slight inconsistency in the
-			 * timer_gettime() case. This is at least better
-			 * than a timer storm.
-			 *
-			 * Only required when high resolution timers are
-			 * enabled as the periodic tick based timers are
-			 * automatically aligned to the next tick.
-			 */
-			if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS)) {
-				ktime_t kj = TICK_NSEC;
-
-				if (timr->it_interval < kj)
-					now = ktime_add(now, kj);
-			}
-
-			timr->it_overrun += hrtimer_forward(timer, now, timr->it_interval);
-			ret = HRTIMER_RESTART;
-			++timr->it_requeue_pending;
-			timr->it_active = 1;
-		}
-	}
-
-	unlock_timer(timr, flags);
-	return ret;
+	guard(spinlock_irqsave)(&timr->it_lock);
+	posix_timer_queue_signal(timr);
+	return HRTIMER_NORESTART;
 }

 static struct pid *good_sigevent(sigevent_t * event)
···
 	}
 }

-static struct k_itimer * alloc_posix_timer(void)
+static struct k_itimer *alloc_posix_timer(void)
 {
 	struct k_itimer *tmr = kmem_cache_zalloc(posix_timers_cache, GFP_KERNEL);

 	if (!tmr)
 		return tmr;
-	if (unlikely(!(tmr->sigq = sigqueue_alloc()))) {
+
+	if (unlikely(!posixtimer_init_sigqueue(&tmr->sigq))) {
 		kmem_cache_free(posix_timers_cache, tmr);
 		return NULL;
 	}
-	clear_siginfo(&tmr->sigq->info);
+	rcuref_init(&tmr->rcuref, 1);
 	return tmr;
 }

-static void k_itimer_rcu_free(struct rcu_head *head)
-{
-	struct k_itimer *tmr = container_of(head, struct k_itimer, rcu);
-
-	kmem_cache_free(posix_timers_cache, tmr);
-}
-
-static void posix_timer_free(struct k_itimer *tmr)
+void posixtimer_free_timer(struct k_itimer *tmr)
 {
 	put_pid(tmr->it_pid);
-	sigqueue_free(tmr->sigq);
-	call_rcu(&tmr->rcu, k_itimer_rcu_free);
+	if (tmr->sigq.ucounts)
+		dec_rlimit_put_ucounts(tmr->sigq.ucounts, UCOUNT_RLIMIT_SIGPENDING);
+	kfree_rcu(tmr, rcu);
 }

 static void posix_timer_unhash_and_free(struct k_itimer *tmr)
···
 	spin_lock(&hash_lock);
 	hlist_del_rcu(&tmr->t_hash);
 	spin_unlock(&hash_lock);
-	posix_timer_free(tmr);
+	posixtimer_putref(tmr);
 }

 static int common_timer_create(struct k_itimer *new_timer)
···
 	 */
 	new_timer_id = posix_timer_add(new_timer);
 	if (new_timer_id < 0) {
-		posix_timer_free(new_timer);
+		posixtimer_free_timer(new_timer);
 		return new_timer_id;
 	}
···
 			goto out;
 		}
 		new_timer->it_sigev_notify = event->sigev_notify;
-		new_timer->sigq->info.si_signo = event->sigev_signo;
-		new_timer->sigq->info.si_value = event->sigev_value;
+		new_timer->sigq.info.si_signo = event->sigev_signo;
+		new_timer->sigq.info.si_value = event->sigev_value;
 	} else {
 		new_timer->it_sigev_notify = SIGEV_SIGNAL;
-		new_timer->sigq->info.si_signo = SIGALRM;
-		memset(&new_timer->sigq->info.si_value, 0, sizeof(sigval_t));
-		new_timer->sigq->info.si_value.sival_int = new_timer->it_id;
+		new_timer->sigq.info.si_signo = SIGALRM;
+		memset(&new_timer->sigq.info.si_value, 0, sizeof(sigval_t));
+		new_timer->sigq.info.si_value.sival_int = new_timer->it_id;
 		new_timer->it_pid = get_pid(task_tgid(current));
 	}

-	new_timer->sigq->info.si_tid = new_timer->it_id;
-	new_timer->sigq->info.si_code = SI_TIMER;
+	if (new_timer->it_sigev_notify & SIGEV_THREAD_ID)
+		new_timer->it_pid_type = PIDTYPE_PID;
+	else
+		new_timer->it_pid_type = PIDTYPE_TGID;
+
+	new_timer->sigq.info.si_tid = new_timer->it_id;
+	new_timer->sigq.info.si_code = SI_TIMER;

 	if (copy_to_user(created_timer_id, &new_timer_id, sizeof (new_timer_id))) {
 		error = -EFAULT;
···
  * 1) Set timr::it_signal to NULL with timr::it_lock held
  * 2) Release timr::it_lock
  * 3) Remove from the hash under hash_lock
- * 4) Call RCU for removal after the grace period
+ * 4) Put the reference count.
+ *
+ * The reference count might not drop to zero if timr::sigq is
+ * queued. In that case the signal delivery or flush will put the
+ * last reference count.
+ *
+ * When the reference count reaches zero, the timer is scheduled
+ * for RCU removal after the grace period.
  *
  * Holding rcu_read_lock() accross the lookup ensures that
  * the timer cannot be freed.
···
 	/* interval timer ? */
 	if (iv) {
 		cur_setting->it_interval = ktime_to_timespec64(iv);
-	} else if (!timr->it_active) {
+	} else if (timr->it_status == POSIX_TIMER_DISARMED) {
 		/*
 		 * SIGEV_NONE oneshot timers are never queued and therefore
-		 * timr->it_active is always false. The check below
+		 * timr->it_status is always DISARMED. The check below
 		 * vs. remaining time will handle this case.
 		 *
 		 * For all other timers there is nothing to update here, so
···
 	 * is a SIGEV_NONE timer move the expiry time forward by intervals,
 	 * so expiry is > now.
 	 */
-	if (iv && (timr->it_requeue_pending & REQUEUE_PENDING || sig_none))
+	if (iv && timr->it_status != POSIX_TIMER_ARMED)
 		timr->it_overrun += kc->timer_forward(timr, now);

 	remaining = kc->timer_remaining(timr, now);
···
 	if (!timr)
 		return -EINVAL;

-	overrun = timer_overrun_to_int(timr, 0);
+	overrun = timer_overrun_to_int(timr);
 	unlock_timer(timr, flags);

 	return overrun;
···
 	else
 		timer->it_interval = 0;

-	/* Prevent reloading in case there is a signal pending */
-	timer->it_requeue_pending = (timer->it_requeue_pending + 2) & ~REQUEUE_PENDING;
 	/* Reset overrun accounting */
 	timer->it_overrun_last = 0;
 	timer->it_overrun = -1LL;
···
 	if (old_setting)
 		common_timer_get(timr, old_setting);

-	/* Prevent rearming by clearing the interval */
-	timr->it_interval = 0;
 	/*
 	 * Careful here. On SMP systems the timer expiry function could be
 	 * active and spinning on timr->it_lock.
···
 	if (kc->timer_try_to_cancel(timr) < 0)
 		return TIMER_RETRY;

-	timr->it_active = 0;
+	timr->it_status = POSIX_TIMER_DISARMED;
 	posix_timer_set_common(timr, new_setting);

 	/* Keep timer disarmed when it_value is zero */
···
 	sigev_none = timr->it_sigev_notify == SIGEV_NONE;

 	kc->timer_arm(timr, expires, flags & TIMER_ABSTIME, sigev_none);
-	timr->it_active = !sigev_none;
+	if (!sigev_none)
+		timr->it_status = POSIX_TIMER_ARMED;
 	return 0;
 }
···
 	if (old_spec64)
 		old_spec64->it_interval = ktime_to_timespec64(timr->it_interval);
+
+	/* Prevent signal delivery and rearming. */
+	timr->it_signal_seq++;

 	kc = timr->kclock;
 	if (WARN_ON_ONCE(!kc || !kc->timer_set))
···
 {
 	const struct k_clock *kc = timer->kclock;

-	timer->it_interval = 0;
 	if (kc->timer_try_to_cancel(timer) < 0)
 		return TIMER_RETRY;
-	timer->it_active = 0;
+	timer->it_status = POSIX_TIMER_DISARMED;
 	return 0;
+}
+
+/*
+ * If the deleted timer is on the ignored list, remove it and
+ * drop the associated reference.
+ */
+static inline void posix_timer_cleanup_ignored(struct k_itimer *tmr)
+{
+	if (!hlist_unhashed(&tmr->ignored_list)) {
+		hlist_del_init(&tmr->ignored_list);
+		posixtimer_putref(tmr);
+	}
 }

 static inline int timer_delete_hook(struct k_itimer *timer)
 {
 	const struct k_clock *kc = timer->kclock;
+
+	/* Prevent signal delivery and rearming. */
+	timer->it_signal_seq++;

 	if (WARN_ON_ONCE(!kc || !kc->timer_del))
 		return -EINVAL;
···
 	spin_lock(&current->sighand->siglock);
 	hlist_del(&timer->list);
-	spin_unlock(&current->sighand->siglock);
+	posix_timer_cleanup_ignored(timer);
 	/*
 	 * A concurrent lookup could check timer::it_signal lockless. It
 	 * will reevaluate with timer::it_lock held and observe the NULL.
+	 *
+	 * It must be written with siglock held so that the signal code
+	 * observes timer->it_signal == NULL in do_sigaction(SIG_IGN),
+	 * which prevents it from moving a pending signal of a deleted
+	 * timer to the ignore list.
 	 */
 	WRITE_ONCE(timer->it_signal, NULL);
+	spin_unlock(&current->sighand->siglock);

 	unlock_timer(timer, flags);
 	posix_timer_unhash_and_free(timer);
···
 	}
 	hlist_del(&timer->list);

+	posix_timer_cleanup_ignored(timer);
+
 	/*
 	 * Setting timer::it_signal to NULL is technically not required
 	 * here as nothing can access the timer anymore legitimately via
···
 	/* The timers are not longer accessible via tsk::signal */
 	while (!hlist_empty(&timers))
 		itimer_delete(hlist_entry(timers.first, struct k_itimer, list));

+	/*
+	 * There should be no timers on the ignored list. itimer_delete() has
+	 * mopped them up.
+	 */
+	if (!WARN_ON_ONCE(!hlist_empty(&tsk->signal->ignored_posix_timers)))
+		return;
+
+	hlist_move_list(&tsk->signal->ignored_posix_timers, &timers);
+	while (!hlist_empty(&timers)) {
+		posix_timer_cleanup_ignored(hlist_entry(timers.first, struct k_itimer,
+							ignored_list));
+	}
 }

 SYSCALL_DEFINE2(clock_settime, const clockid_t, which_clock,
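Editor's note: two small mechanisms in the posix-timers.c changes above are easy to model in isolation: the `it_signal_seq`/`it_sigqueue_seq` pair (delivery drops a queued signal if the timer was set or deleted after the signal was queued) and the new `timer_overrun_to_int()` clamp of the 64-bit overrun counter into the int-sized siginfo field. A userspace sketch (struct name `toy_itimer` and these helpers are illustrative, not kernel API):

```c
#include <stdbool.h>
#include <stdint.h>
#include <limits.h>

struct toy_itimer {
	uint64_t it_signal_seq;    /* bumped on every timer_settime()/timer_delete() */
	uint64_t it_sigqueue_seq;  /* snapshot taken when the signal was queued */
	int64_t  it_overrun_last;
};

/*
 * Delivery-time staleness check: the queued signal is only valid if
 * nothing modified the timer after the signal was queued.
 */
static bool toy_signal_still_valid(const struct toy_itimer *t)
{
	return t->it_signal_seq == t->it_sigqueue_seq;
}

/* Clamp the 64-bit overrun counter into the int-sized si_overrun field */
static int toy_overrun_to_int(const struct toy_itimer *t)
{
	if (t->it_overrun_last > (int64_t)INT_MAX)
		return INT_MAX;
	return (int)t->it_overrun_last;
}
```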
+7 -1
kernel/time/posix-timers.h
···
 /* SPDX-License-Identifier: GPL-2.0 */
 #define TIMER_RETRY 1

+enum posix_timer_state {
+	POSIX_TIMER_DISARMED,
+	POSIX_TIMER_ARMED,
+	POSIX_TIMER_REQUEUE_PENDING,
+};
+
 struct k_clock {
 	int	(*clock_getres)(const clockid_t which_clock,
 				struct timespec64 *tp);
···
 extern const struct k_clock clock_thread;
 extern const struct k_clock alarm_clock;

-int posix_timer_queue_signal(struct k_itimer *timr);
+void posix_timer_queue_signal(struct k_itimer *timr);

 void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting);
 int common_timer_set(struct k_itimer *timr, int flags,
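Editor's note: the new enum replaces the old `it_active` flag with a three-state machine. As the diffs above show, expiry moves an interval timer to REQUEUE_PENDING and a oneshot timer to DISARMED, and signal delivery is what rearms a pending interval timer. A minimal standalone model of those transitions (the helper functions are invented for illustration):

```c
#include <stdbool.h>

enum posix_timer_state {
	POSIX_TIMER_DISARMED,
	POSIX_TIMER_ARMED,
	POSIX_TIMER_REQUEUE_PENDING,
};

/*
 * Expiry: interval timers park in REQUEUE_PENDING until the signal is
 * delivered; oneshot timers go straight back to DISARMED.
 */
static enum posix_timer_state on_expiry(bool has_interval)
{
	return has_interval ? POSIX_TIMER_REQUEUE_PENDING : POSIX_TIMER_DISARMED;
}

/* Signal delivery rearms a pending interval timer; other states are untouched */
static enum posix_timer_state on_delivery(enum posix_timer_state s)
{
	return (s == POSIX_TIMER_REQUEUE_PENDING) ? POSIX_TIMER_ARMED : s;
}
```

The gettime paths above key off exactly this distinction: overruns are only accumulated forward while the state is not ARMED.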
+377
kernel/time/sleep_timeout.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Kernel internal schedule timeout and sleeping functions 4 + */ 5 + 6 + #include <linux/delay.h> 7 + #include <linux/jiffies.h> 8 + #include <linux/timer.h> 9 + #include <linux/sched/signal.h> 10 + #include <linux/sched/debug.h> 11 + 12 + #include "tick-internal.h" 13 + 14 + /* 15 + * Since schedule_timeout()'s timer is defined on the stack, it must store 16 + * the target task on the stack as well. 17 + */ 18 + struct process_timer { 19 + struct timer_list timer; 20 + struct task_struct *task; 21 + }; 22 + 23 + static void process_timeout(struct timer_list *t) 24 + { 25 + struct process_timer *timeout = from_timer(timeout, t, timer); 26 + 27 + wake_up_process(timeout->task); 28 + } 29 + 30 + /** 31 + * schedule_timeout - sleep until timeout 32 + * @timeout: timeout value in jiffies 33 + * 34 + * Make the current task sleep until @timeout jiffies have elapsed. 35 + * The function behavior depends on the current task state 36 + * (see also set_current_state() description): 37 + * 38 + * %TASK_RUNNING - the scheduler is called, but the task does not sleep 39 + * at all. That happens because sched_submit_work() does nothing for 40 + * tasks in %TASK_RUNNING state. 41 + * 42 + * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to 43 + * pass before the routine returns unless the current task is explicitly 44 + * woken up, (e.g. by wake_up_process()). 45 + * 46 + * %TASK_INTERRUPTIBLE - the routine may return early if a signal is 47 + * delivered to the current task or the current task is explicitly woken 48 + * up. 49 + * 50 + * The current task state is guaranteed to be %TASK_RUNNING when this 51 + * routine returns. 52 + * 53 + * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule 54 + * the CPU away without a bound on the timeout. In this case the return 55 + * value will be %MAX_SCHEDULE_TIMEOUT. 
56 + * 57 + * Returns: 0 when the timer has expired, otherwise the remaining time in 58 + * jiffies will be returned. In all cases the return value is guaranteed 59 + * to be non-negative. 60 + */ 61 + signed long __sched schedule_timeout(signed long timeout) 62 + { 63 + struct process_timer timer; 64 + unsigned long expire; 65 + 66 + switch (timeout) { 67 + case MAX_SCHEDULE_TIMEOUT: 68 + /* 69 + * These two special cases are useful for the caller's 70 + * convenience. Nothing more. We could take 71 + * MAX_SCHEDULE_TIMEOUT from one of the negative values, 72 + * but I'd like to return a valid offset (>=0) to allow 73 + * the caller to do everything it wants with the retval. 74 + */ 75 + schedule(); 76 + goto out; 77 + default: 78 + /* 79 + * Another bit of paranoia. Note that the retval will be 80 + * 0 since no piece of the kernel is supposed to check 81 + * for a negative retval of schedule_timeout() (since it 82 + * should never happen anyway). You just have the printk() 83 + * that will tell you if something has gone wrong and where. 84 + */ 85 + if (timeout < 0) { 86 + pr_err("%s: wrong timeout value %lx\n", __func__, timeout); 87 + dump_stack(); 88 + __set_current_state(TASK_RUNNING); 89 + goto out; 90 + } 91 + } 92 + 93 + expire = timeout + jiffies; 94 + 95 + timer.task = current; 96 + timer_setup_on_stack(&timer.timer, process_timeout, 0); 97 + timer.timer.expires = expire; 98 + add_timer(&timer.timer); 99 + schedule(); 100 + del_timer_sync(&timer.timer); 101 + 102 + /* Remove the timer from the object tracker */ 103 + destroy_timer_on_stack(&timer.timer); 104 + 105 + timeout = expire - jiffies; 106 + 107 + out: 108 + return timeout < 0 ? 0 : timeout; 109 + } 110 + EXPORT_SYMBOL(schedule_timeout); 111 + 112 + /* 113 + * __set_current_state() can be used in schedule_timeout_*() functions, because 114 + * schedule_timeout() calls schedule() unconditionally. 
115 + */ 116 + 117 + /** 118 + * schedule_timeout_interruptible - sleep until timeout (interruptible) 119 + * @timeout: timeout value in jiffies 120 + * 121 + * See schedule_timeout() for details. 122 + * 123 + * Task state is set to TASK_INTERRUPTIBLE before starting the timeout. 124 + */ 125 + signed long __sched schedule_timeout_interruptible(signed long timeout) 126 + { 127 + __set_current_state(TASK_INTERRUPTIBLE); 128 + return schedule_timeout(timeout); 129 + } 130 + EXPORT_SYMBOL(schedule_timeout_interruptible); 131 + 132 + /** 133 + * schedule_timeout_killable - sleep until timeout (killable) 134 + * @timeout: timeout value in jiffies 135 + * 136 + * See schedule_timeout() for details. 137 + * 138 + * Task state is set to TASK_KILLABLE before starting the timeout. 139 + */ 140 + signed long __sched schedule_timeout_killable(signed long timeout) 141 + { 142 + __set_current_state(TASK_KILLABLE); 143 + return schedule_timeout(timeout); 144 + } 145 + EXPORT_SYMBOL(schedule_timeout_killable); 146 + 147 + /** 148 + * schedule_timeout_uninterruptible - sleep until timeout (uninterruptible) 149 + * @timeout: timeout value in jiffies 150 + * 151 + * See schedule_timeout() for details. 152 + * 153 + * Task state is set to TASK_UNINTERRUPTIBLE before starting the timeout. 154 + */ 155 + signed long __sched schedule_timeout_uninterruptible(signed long timeout) 156 + { 157 + __set_current_state(TASK_UNINTERRUPTIBLE); 158 + return schedule_timeout(timeout); 159 + } 160 + EXPORT_SYMBOL(schedule_timeout_uninterruptible); 161 + 162 + /** 163 + * schedule_timeout_idle - sleep until timeout (idle) 164 + * @timeout: timeout value in jiffies 165 + * 166 + * See schedule_timeout() for details. 167 + * 168 + * Task state is set to TASK_IDLE before starting the timeout. It is similar to 169 + * schedule_timeout_uninterruptible(), except this task will not contribute to 170 + * load average. 
171 + */ 172 + signed long __sched schedule_timeout_idle(signed long timeout) 173 + { 174 + __set_current_state(TASK_IDLE); 175 + return schedule_timeout(timeout); 176 + } 177 + EXPORT_SYMBOL(schedule_timeout_idle); 178 + 179 + /** 180 + * schedule_hrtimeout_range_clock - sleep until timeout 181 + * @expires: timeout value (ktime_t) 182 + * @delta: slack in expires timeout (ktime_t) 183 + * @mode: timer mode 184 + * @clock_id: timer clock to be used 185 + * 186 + * Details are explained in schedule_hrtimeout_range() function description as 187 + * this function is commonly used. 188 + */ 189 + int __sched schedule_hrtimeout_range_clock(ktime_t *expires, u64 delta, 190 + const enum hrtimer_mode mode, clockid_t clock_id) 191 + { 192 + struct hrtimer_sleeper t; 193 + 194 + /* 195 + * Optimize when a zero timeout value is given. It does not 196 + * matter whether this is an absolute or a relative time. 197 + */ 198 + if (expires && *expires == 0) { 199 + __set_current_state(TASK_RUNNING); 200 + return 0; 201 + } 202 + 203 + /* 204 + * A NULL parameter means "infinite" 205 + */ 206 + if (!expires) { 207 + schedule(); 208 + return -EINTR; 209 + } 210 + 211 + hrtimer_setup_sleeper_on_stack(&t, clock_id, mode); 212 + hrtimer_set_expires_range_ns(&t.timer, *expires, delta); 213 + hrtimer_sleeper_start_expires(&t, mode); 214 + 215 + if (likely(t.task)) 216 + schedule(); 217 + 218 + hrtimer_cancel(&t.timer); 219 + destroy_hrtimer_on_stack(&t.timer); 220 + 221 + __set_current_state(TASK_RUNNING); 222 + 223 + return !t.task ? 0 : -EINTR; 224 + } 225 + EXPORT_SYMBOL_GPL(schedule_hrtimeout_range_clock); 226 + 227 + /** 228 + * schedule_hrtimeout_range - sleep until timeout 229 + * @expires: timeout value (ktime_t) 230 + * @delta: slack in expires timeout (ktime_t) 231 + * @mode: timer mode 232 + * 233 + * Make the current task sleep until the given expiry time has 234 + * elapsed. 
The routine will return immediately unless 235 + * the current task state has been set (see set_current_state()). 236 + * 237 + * The @delta argument gives the kernel the freedom to schedule the 238 + * actual wakeup to a time that is both power and performance friendly 239 + * for regular (non RT/DL) tasks. 240 + * The kernel gives the normal best effort behavior for "@expires+@delta", 241 + * and may decide to fire the timer earlier, but no earlier than @expires. 242 + * 243 + * You can set the task state as follows: 244 + * 245 + * %TASK_UNINTERRUPTIBLE - at least the @expires time is guaranteed to 246 + * pass before the routine returns unless the current task is explicitly 247 + * woken up (e.g. by wake_up_process()). 248 + * 249 + * %TASK_INTERRUPTIBLE - the routine may return early if a signal is 250 + * delivered to the current task or the current task is explicitly woken 251 + * up. 252 + * 253 + * The current task state is guaranteed to be TASK_RUNNING when this 254 + * routine returns. 255 + * 256 + * Returns: 0 when the timer has expired. If the task was woken before the 257 + * timer expired by a signal (only possible in state TASK_INTERRUPTIBLE) or 258 + * by an explicit wakeup, it returns -EINTR. 259 + */ 260 + int __sched schedule_hrtimeout_range(ktime_t *expires, u64 delta, 261 + const enum hrtimer_mode mode) 262 + { 263 + return schedule_hrtimeout_range_clock(expires, delta, mode, 264 + CLOCK_MONOTONIC); 265 + } 266 + EXPORT_SYMBOL_GPL(schedule_hrtimeout_range); 267 + 268 + /** 269 + * schedule_hrtimeout - sleep until timeout 270 + * @expires: timeout value (ktime_t) 271 + * @mode: timer mode 272 + * 273 + * See schedule_hrtimeout_range() for details. The @delta argument of 274 + * schedule_hrtimeout_range() is set to 0 and therefore has no impact. 
275 + */ 276 + int __sched schedule_hrtimeout(ktime_t *expires, const enum hrtimer_mode mode) 277 + { 278 + return schedule_hrtimeout_range(expires, 0, mode); 279 + } 280 + EXPORT_SYMBOL_GPL(schedule_hrtimeout); 281 + 282 + /** 283 + * msleep - sleep safely even with waitqueue interruptions 284 + * @msecs: Requested sleep duration in milliseconds 285 + * 286 + * msleep() uses jiffy based timeouts for the sleep duration. Because of the 287 + * design of the timer wheel, the maximum additional percentage delay (slack) is 288 + * 12.5%. This is only valid for timers which will end up in level 1 or a higher 289 + * level of the timer wheel. For explanation of those 12.5% please check the 290 + * detailed description about the basics of the timer wheel. 291 + * 292 + * The slack of timers which will end up in level 0 depends on sleep duration 293 + * (msecs) and HZ configuration and can be calculated in the following way (with 294 + * the timer wheel design restriction that the slack is not less than 12.5%): 295 + * 296 + * ``slack = MSECS_PER_TICK / msecs`` 297 + * 298 + * When the allowed slack of the callsite is known, the calculation could be 299 + * turned around to find the minimal allowed sleep duration to meet the 300 + * constraints. For example: 301 + * 302 + * * ``HZ=1000`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 1 / (1/4) = 4``: 303 + * all sleep durations greater or equal 4ms will meet the constraints. 304 + * * ``HZ=1000`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 1 / (1/8) = 8``: 305 + * all sleep durations greater or equal 8ms will meet the constraints. 306 + * * ``HZ=250`` with ``slack=25%``: ``MSECS_PER_TICK / slack = 4 / (1/4) = 16``: 307 + * all sleep durations greater or equal 16ms will meet the constraints. 308 + * * ``HZ=250`` with ``slack=12.5%``: ``MSECS_PER_TICK / slack = 4 / (1/8) = 32``: 309 + * all sleep durations greater or equal 32ms will meet the constraints. 
310 + * 311 + * See also the signal-aware variant msleep_interruptible(). 312 + */ 313 + void msleep(unsigned int msecs) 314 + { 315 + unsigned long timeout = msecs_to_jiffies(msecs); 316 + 317 + while (timeout) 318 + timeout = schedule_timeout_uninterruptible(timeout); 319 + } 320 + EXPORT_SYMBOL(msleep); 321 + 322 + /** 323 + * msleep_interruptible - sleep waiting for signals 324 + * @msecs: Requested sleep duration in milliseconds 325 + * 326 + * See msleep() for some basic information. 327 + * 328 + * The difference between msleep() and msleep_interruptible() is that the sleep 329 + * can be interrupted by signal delivery, in which case it returns early. 330 + * 331 + * Returns: The remaining time of the sleep duration transformed to msecs (see 332 + * schedule_timeout() for details). 333 + */ 334 + unsigned long msleep_interruptible(unsigned int msecs) 335 + { 336 + unsigned long timeout = msecs_to_jiffies(msecs); 337 + 338 + while (timeout && !signal_pending(current)) 339 + timeout = schedule_timeout_interruptible(timeout); 340 + return jiffies_to_msecs(timeout); 341 + } 342 + EXPORT_SYMBOL(msleep_interruptible); 343 + 344 + /** 345 + * usleep_range_state - Sleep for an approximate time in a given state 346 + * @min: Minimum time in usecs to sleep 347 + * @max: Maximum time in usecs to sleep 348 + * @state: State the current task will be in while sleeping 349 + * 350 + * usleep_range_state() sleeps at least for the minimum specified time but not 351 + * longer than the maximum specified amount of time. The range might reduce 352 + * power usage by allowing hrtimers to coalesce an already scheduled interrupt 353 + * with this hrtimer. In the worst case, an interrupt is scheduled for the upper 354 + * bound. 355 + * 356 + * The sleeping task is set to the specified state before starting the sleep. 357 + * 358 + * In non-atomic context where the exact wakeup time is flexible, use 359 + * usleep_range() or its variants instead of udelay(). 
The sleep improves 360 + * responsiveness by avoiding the CPU-hogging busy-wait of udelay(). 361 + */ 362 + void __sched usleep_range_state(unsigned long min, unsigned long max, unsigned int state) 363 + { 364 + ktime_t exp = ktime_add_us(ktime_get(), min); 365 + u64 delta = (u64)(max - min) * NSEC_PER_USEC; 366 + 367 + if (WARN_ON_ONCE(max < min)) 368 + delta = 0; 369 + 370 + for (;;) { 371 + __set_current_state(state); 372 + /* Do not return before the requested sleep time has elapsed */ 373 + if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS)) 374 + break; 375 + } 376 + } 377 + EXPORT_SYMBOL(usleep_range_state);
+1 -2
kernel/time/tick-internal.h
··· 25 25 extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast); 26 26 extern void tick_handle_periodic(struct clock_event_device *dev); 27 27 extern void tick_check_new_device(struct clock_event_device *dev); 28 + extern void tick_offline_cpu(unsigned int cpu); 28 29 extern void tick_shutdown(unsigned int cpu); 29 30 extern void tick_suspend(void); 30 31 extern void tick_resume(void); ··· 143 142 #endif /* !(BROADCAST && ONESHOT) */ 144 143 145 144 #if defined(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST) && defined(CONFIG_HOTPLUG_CPU) 146 - extern void tick_offline_cpu(unsigned int cpu); 147 145 extern void tick_broadcast_offline(unsigned int cpu); 148 146 #else 149 - static inline void tick_offline_cpu(unsigned int cpu) { } 150 147 static inline void tick_broadcast_offline(unsigned int cpu) { } 151 148 #endif 152 149
+6 -19
kernel/time/tick-sched.c
··· 311 311 return HRTIMER_RESTART; 312 312 } 313 313 314 - static void tick_sched_timer_cancel(struct tick_sched *ts) 315 - { 316 - if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES)) 317 - hrtimer_cancel(&ts->sched_timer); 318 - else if (tick_sched_flag_test(ts, TS_FLAG_NOHZ)) 319 - tick_program_event(KTIME_MAX, 1); 320 - } 321 - 322 314 #ifdef CONFIG_NO_HZ_FULL 323 315 cpumask_var_t tick_nohz_full_mask; 324 316 EXPORT_SYMBOL_GPL(tick_nohz_full_mask); ··· 1053 1061 * the tick timer. 1054 1062 */ 1055 1063 if (unlikely(expires == KTIME_MAX)) { 1056 - tick_sched_timer_cancel(ts); 1064 + if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES)) 1065 + hrtimer_cancel(&ts->sched_timer); 1066 + else 1067 + tick_program_event(KTIME_MAX, 1); 1057 1068 return; 1058 1069 } 1059 1070 ··· 1605 1610 */ 1606 1611 void tick_sched_timer_dying(int cpu) 1607 1612 { 1608 - struct tick_device *td = &per_cpu(tick_cpu_device, cpu); 1609 1613 struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu); 1610 - struct clock_event_device *dev = td->evtdev; 1611 1614 ktime_t idle_sleeptime, iowait_sleeptime; 1612 1615 unsigned long idle_calls, idle_sleeps; 1613 1616 1614 1617 /* This must happen before hrtimers are migrated! */ 1615 - tick_sched_timer_cancel(ts); 1616 - 1617 - /* 1618 - * If the clockevents doesn't support CLOCK_EVT_STATE_ONESHOT_STOPPED, 1619 - * make sure not to call low-res tick handler. 1620 - */ 1621 - if (tick_sched_flag_test(ts, TS_FLAG_NOHZ)) 1622 - dev->event_handler = clockevents_handle_noop; 1618 + if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES)) 1619 + hrtimer_cancel(&ts->sched_timer); 1623 1620 1624 1621 idle_sleeptime = ts->idle_sleeptime; 1625 1622 iowait_sleeptime = ts->iowait_sleeptime;
+10 -10
kernel/time/time.c
··· 556 556 * - all other values are converted to jiffies by either multiplying 557 557 * the input value by a factor or dividing it with a factor and 558 558 * handling any 32-bit overflows. 559 - * for the details see __msecs_to_jiffies() 559 + * for the details see _msecs_to_jiffies() 560 560 * 561 - * __msecs_to_jiffies() checks for the passed in value being a constant 561 + * msecs_to_jiffies() checks for the passed in value being a constant 562 562 * via __builtin_constant_p() allowing gcc to eliminate most of the 563 563 * code, __msecs_to_jiffies() is called if the value passed does not 564 564 * allow constant folding and the actual conversion must be done at ··· 866 866 * 867 867 * Handles compat or 32-bit modes. 868 868 * 869 - * Return: %0 on success or negative errno on error 869 + * Return: 0 on success or negative errno on error 870 870 */ 871 871 int get_timespec64(struct timespec64 *ts, 872 872 const struct __kernel_timespec __user *uts) ··· 897 897 * @ts: input &struct timespec64 898 898 * @uts: user's &struct __kernel_timespec 899 899 * 900 - * Return: %0 on success or negative errno on error 900 + * Return: 0 on success or negative errno on error 901 901 */ 902 902 int put_timespec64(const struct timespec64 *ts, 903 903 struct __kernel_timespec __user *uts) ··· 944 944 * 945 945 * Handles X86_X32_ABI compatibility conversion. 946 946 * 947 - * Return: %0 on success or negative errno on error 947 + * Return: 0 on success or negative errno on error 948 948 */ 949 949 int get_old_timespec32(struct timespec64 *ts, const void __user *uts) 950 950 { ··· 963 963 * 964 964 * Handles X86_X32_ABI compatibility conversion. 
965 965 * 966 - * Return: %0 on success or negative errno on error 966 + * Return: 0 on success or negative errno on error 967 967 */ 968 968 int put_old_timespec32(const struct timespec64 *ts, void __user *uts) 969 969 { ··· 979 979 * @it: destination &struct itimerspec64 980 980 * @uit: user's &struct __kernel_itimerspec 981 981 * 982 - * Return: %0 on success or negative errno on error 982 + * Return: 0 on success or negative errno on error 983 983 */ 984 984 int get_itimerspec64(struct itimerspec64 *it, 985 985 const struct __kernel_itimerspec __user *uit) ··· 1002 1002 * @it: input &struct itimerspec64 1003 1003 * @uit: user's &struct __kernel_itimerspec 1004 1004 * 1005 - * Return: %0 on success or negative errno on error 1005 + * Return: 0 on success or negative errno on error 1006 1006 */ 1007 1007 int put_itimerspec64(const struct itimerspec64 *it, 1008 1008 struct __kernel_itimerspec __user *uit) ··· 1024 1024 * @its: destination &struct itimerspec64 1025 1025 * @uits: user's &struct old_itimerspec32 1026 1026 * 1027 - * Return: %0 on success or negative errno on error 1027 + * Return: 0 on success or negative errno on error 1028 1028 */ 1029 1029 int get_old_itimerspec32(struct itimerspec64 *its, 1030 1030 const struct old_itimerspec32 __user *uits) ··· 1043 1043 * @its: input &struct itimerspec64 1044 1044 * @uits: user's &struct old_itimerspec32 1045 1045 * 1046 - * Return: %0 on success or negative errno on error 1046 + * Return: 0 on success or negative errno on error 1047 1047 */ 1048 1048 int put_old_itimerspec32(const struct itimerspec64 *its, 1049 1049 struct old_itimerspec32 __user *uits)
+211 -321
kernel/time/timekeeping.c
··· 30 30 #include "timekeeping_internal.h" 31 31 32 32 #define TK_CLEAR_NTP (1 << 0) 33 - #define TK_MIRROR (1 << 1) 34 - #define TK_CLOCK_WAS_SET (1 << 2) 33 + #define TK_CLOCK_WAS_SET (1 << 1) 34 + 35 + #define TK_UPDATE_ALL (TK_CLEAR_NTP | TK_CLOCK_WAS_SET) 35 36 36 37 enum timekeeping_adv_mode { 37 38 /* Update timekeeper when a tick has passed */ ··· 42 41 TK_ADV_FREQ 43 42 }; 44 43 45 - DEFINE_RAW_SPINLOCK(timekeeper_lock); 46 - 47 44 /* 48 45 * The most important data for readout fits into a single 64 byte 49 46 * cache line. 50 47 */ 51 - static struct { 48 + struct tk_data { 52 49 seqcount_raw_spinlock_t seq; 53 50 struct timekeeper timekeeper; 54 - } tk_core ____cacheline_aligned = { 55 - .seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_core.seq, &timekeeper_lock), 56 - }; 51 + struct timekeeper shadow_timekeeper; 52 + raw_spinlock_t lock; 53 + } ____cacheline_aligned; 57 54 58 - static struct timekeeper shadow_timekeeper; 55 + static struct tk_data tk_core; 59 56 60 57 /* flag for if timekeeping is suspended */ 61 58 int __read_mostly timekeeping_suspended; ··· 112 113 .base[0] = FAST_TK_INIT, 113 114 .base[1] = FAST_TK_INIT, 114 115 }; 116 + 117 + unsigned long timekeeper_lock_irqsave(void) 118 + { 119 + unsigned long flags; 120 + 121 + raw_spin_lock_irqsave(&tk_core.lock, flags); 122 + return flags; 123 + } 124 + 125 + void timekeeper_unlock_irqrestore(unsigned long flags) 126 + { 127 + raw_spin_unlock_irqrestore(&tk_core.lock, flags); 128 + } 115 129 116 130 /* 117 131 * Multigrain timestamps require tracking the latest fine-grained timestamp ··· 190 178 WARN_ON_ONCE(tk->offs_real != timespec64_to_ktime(tmp)); 191 179 tk->wall_to_monotonic = wtm; 192 180 set_normalized_timespec64(&tmp, -wtm.tv_sec, -wtm.tv_nsec); 193 - tk->offs_real = timespec64_to_ktime(tmp); 194 - tk->offs_tai = ktime_add(tk->offs_real, ktime_set(tk->tai_offset, 0)); 181 + /* Paired with READ_ONCE() in ktime_mono_to_any() */ 182 + WRITE_ONCE(tk->offs_real, timespec64_to_ktime(tmp)); 183 + 
WRITE_ONCE(tk->offs_tai, ktime_add(tk->offs_real, ktime_set(tk->tai_offset, 0))); 195 184 } 196 185 197 186 static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta) 198 187 { 199 - tk->offs_boot = ktime_add(tk->offs_boot, delta); 188 + /* Paired with READ_ONCE() in ktime_mono_to_any() */ 189 + WRITE_ONCE(tk->offs_boot, ktime_add(tk->offs_boot, delta)); 200 190 /* 201 191 * Timespec representation for VDSO update to avoid 64bit division 202 192 * on every update. ··· 215 201 * the tkr's clocksource may change between the read reference, and the 216 202 * clock reference passed to the read function. This can cause crashes if 217 203 * the wrong clocksource is passed to the wrong read function. 218 - * This isn't necessary to use when holding the timekeeper_lock or doing 204 + * This isn't necessary to use when holding the tk_core.lock or doing 219 205 * a read of the fast-timekeeper tkrs (which is protected by its own locking 220 206 * and update logic). 221 207 */ ··· 225 211 226 212 return clock->read(clock); 227 213 } 228 - 229 - #ifdef CONFIG_DEBUG_TIMEKEEPING 230 - #define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */ 231 - 232 - static void timekeeping_check_update(struct timekeeper *tk, u64 offset) 233 - { 234 - 235 - u64 max_cycles = tk->tkr_mono.clock->max_cycles; 236 - const char *name = tk->tkr_mono.clock->name; 237 - 238 - if (offset > max_cycles) { 239 - printk_deferred("WARNING: timekeeping: Cycle offset (%lld) is larger than allowed by the '%s' clock's max_cycles value (%lld): time overflow danger\n", 240 - offset, name, max_cycles); 241 - printk_deferred(" timekeeping: Your kernel is sick, but tries to cope by capping time updates\n"); 242 - } else { 243 - if (offset > (max_cycles >> 1)) { 244 - printk_deferred("INFO: timekeeping: Cycle offset (%lld) is larger than the '%s' clock's 50%% safety margin (%lld)\n", 245 - offset, name, max_cycles >> 1); 246 - printk_deferred(" timekeeping: Your kernel is still fine, but is feeling 
a bit nervous\n"); 247 - } 248 - } 249 - 250 - if (tk->underflow_seen) { 251 - if (jiffies - tk->last_warning > WARNING_FREQ) { 252 - printk_deferred("WARNING: Underflow in clocksource '%s' observed, time update ignored.\n", name); 253 - printk_deferred(" Please report this, consider using a different clocksource, if possible.\n"); 254 - printk_deferred(" Your kernel is probably still fine.\n"); 255 - tk->last_warning = jiffies; 256 - } 257 - tk->underflow_seen = 0; 258 - } 259 - 260 - if (tk->overflow_seen) { 261 - if (jiffies - tk->last_warning > WARNING_FREQ) { 262 - printk_deferred("WARNING: Overflow in clocksource '%s' observed, time update capped.\n", name); 263 - printk_deferred(" Please report this, consider using a different clocksource, if possible.\n"); 264 - printk_deferred(" Your kernel is probably still fine.\n"); 265 - tk->last_warning = jiffies; 266 - } 267 - tk->overflow_seen = 0; 268 - } 269 - } 270 - 271 - static inline u64 timekeeping_cycles_to_ns(const struct tk_read_base *tkr, u64 cycles); 272 - 273 - static inline u64 timekeeping_debug_get_ns(const struct tk_read_base *tkr) 274 - { 275 - struct timekeeper *tk = &tk_core.timekeeper; 276 - u64 now, last, mask, max, delta; 277 - unsigned int seq; 278 - 279 - /* 280 - * Since we're called holding a seqcount, the data may shift 281 - * under us while we're doing the calculation. This can cause 282 - * false positives, since we'd note a problem but throw the 283 - * results away. So nest another seqcount here to atomically 284 - * grab the points we are checking with. 285 - */ 286 - do { 287 - seq = read_seqcount_begin(&tk_core.seq); 288 - now = tk_clock_read(tkr); 289 - last = tkr->cycle_last; 290 - mask = tkr->mask; 291 - max = tkr->clock->max_cycles; 292 - } while (read_seqcount_retry(&tk_core.seq, seq)); 293 - 294 - delta = clocksource_delta(now, last, mask); 295 - 296 - /* 297 - * Try to catch underflows by checking if we are seeing small 298 - * mask-relative negative values. 
299 - */ 300 - if (unlikely((~delta & mask) < (mask >> 3))) 301 - tk->underflow_seen = 1; 302 - 303 - /* Check for multiplication overflows */ 304 - if (unlikely(delta > max)) 305 - tk->overflow_seen = 1; 306 - 307 - /* timekeeping_cycles_to_ns() handles both under and overflow */ 308 - return timekeeping_cycles_to_ns(tkr, now); 309 - } 310 - #else 311 - static inline void timekeeping_check_update(struct timekeeper *tk, u64 offset) 312 - { 313 - } 314 - static inline u64 timekeeping_debug_get_ns(const struct tk_read_base *tkr) 315 - { 316 - BUG(); 317 - } 318 - #endif 319 214 320 215 /** 321 216 * tk_setup_internals - Set up internals to use clocksource clock. ··· 330 407 return ((delta * tkr->mult) + tkr->xtime_nsec) >> tkr->shift; 331 408 } 332 409 333 - static __always_inline u64 __timekeeping_get_ns(const struct tk_read_base *tkr) 410 + static __always_inline u64 timekeeping_get_ns(const struct tk_read_base *tkr) 334 411 { 335 412 return timekeeping_cycles_to_ns(tkr, tk_clock_read(tkr)); 336 - } 337 - 338 - static inline u64 timekeeping_get_ns(const struct tk_read_base *tkr) 339 - { 340 - if (IS_ENABLED(CONFIG_DEBUG_TIMEKEEPING)) 341 - return timekeeping_debug_get_ns(tkr); 342 - 343 - return __timekeeping_get_ns(tkr); 344 413 } 345 414 346 415 /** ··· 380 465 seq = read_seqcount_latch(&tkf->seq); 381 466 tkr = tkf->base + (seq & 0x01); 382 467 now = ktime_to_ns(tkr->base); 383 - now += __timekeeping_get_ns(tkr); 468 + now += timekeeping_get_ns(tkr); 384 469 } while (read_seqcount_latch_retry(&tkf->seq, seq)); 385 470 386 471 return now; ··· 451 536 * timekeeping_inject_sleeptime64() 452 537 * __timekeeping_inject_sleeptime(tk, delta); 453 538 * timestamp(); 454 - * timekeeping_update(tk, TK_CLEAR_NTP...); 539 + * timekeeping_update_staged(tkd, TK_CLEAR_NTP...); 455 540 * 456 541 * (2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be 457 542 * partially updated. 
Since the tk->offs_boot update is a rare event, this ··· 496 581 tkr = tkf->base + (seq & 0x01); 497 582 basem = ktime_to_ns(tkr->base); 498 583 baser = ktime_to_ns(tkr->base_real); 499 - delta = __timekeeping_get_ns(tkr); 584 + delta = timekeeping_get_ns(tkr); 500 585 } while (raw_read_seqcount_latch_retry(&tkf->seq, seq)); 501 586 502 587 if (mono) ··· 610 695 int pvclock_gtod_register_notifier(struct notifier_block *nb) 611 696 { 612 697 struct timekeeper *tk = &tk_core.timekeeper; 613 - unsigned long flags; 614 698 int ret; 615 699 616 - raw_spin_lock_irqsave(&timekeeper_lock, flags); 700 + guard(raw_spinlock_irqsave)(&tk_core.lock); 617 701 ret = raw_notifier_chain_register(&pvclock_gtod_chain, nb); 618 702 update_pvclock_gtod(tk, true); 619 - raw_spin_unlock_irqrestore(&timekeeper_lock, flags); 620 703 621 704 return ret; 622 705 } ··· 627 714 */ 628 715 int pvclock_gtod_unregister_notifier(struct notifier_block *nb) 629 716 { 630 - unsigned long flags; 631 - int ret; 632 - 633 - raw_spin_lock_irqsave(&timekeeper_lock, flags); 634 - ret = raw_notifier_chain_unregister(&pvclock_gtod_chain, nb); 635 - raw_spin_unlock_irqrestore(&timekeeper_lock, flags); 636 - 637 - return ret; 717 + guard(raw_spinlock_irqsave)(&tk_core.lock); 718 + return raw_notifier_chain_unregister(&pvclock_gtod_chain, nb); 638 719 } 639 720 EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier); 640 721 ··· 641 734 if (tk->next_leap_ktime != KTIME_MAX) 642 735 /* Convert to monotonic time */ 643 736 tk->next_leap_ktime = ktime_sub(tk->next_leap_ktime, tk->offs_real); 737 + } 738 + 739 + /* 740 + * Leap state update for both shadow and the real timekeeper 741 + * Separate to spare a full memcpy() of the timekeeper. 
742 + */ 743 + static void tk_update_leap_state_all(struct tk_data *tkd) 744 + { 745 + write_seqcount_begin(&tkd->seq); 746 + tk_update_leap_state(&tkd->shadow_timekeeper); 747 + tkd->timekeeper.next_leap_ktime = tkd->shadow_timekeeper.next_leap_ktime; 748 + write_seqcount_end(&tkd->seq); 644 749 } 645 750 646 751 /* ··· 688 769 tk->tkr_raw.base = ns_to_ktime(tk->raw_sec * NSEC_PER_SEC); 689 770 } 690 771 691 - /* must hold timekeeper_lock */ 692 - static void timekeeping_update(struct timekeeper *tk, unsigned int action) 772 + /* 773 + * Restore the shadow timekeeper from the real timekeeper. 774 + */ 775 + static void timekeeping_restore_shadow(struct tk_data *tkd) 693 776 { 777 + lockdep_assert_held(&tkd->lock); 778 + memcpy(&tkd->shadow_timekeeper, &tkd->timekeeper, sizeof(tkd->timekeeper)); 779 + } 780 + 781 + static void timekeeping_update_from_shadow(struct tk_data *tkd, unsigned int action) 782 + { 783 + struct timekeeper *tk = &tk_core.shadow_timekeeper; 784 + 785 + lockdep_assert_held(&tkd->lock); 786 + 787 + /* 788 + * Block out readers before running the updates below because that 789 + * updates VDSO and other time related infrastructure. Not blocking 790 + * the readers might let a reader see time going backwards when 791 + * reading from the VDSO after the VDSO update and then reading in 792 + * the kernel from the timekeeper before that got updated. 793 + */ 794 + write_seqcount_begin(&tkd->seq); 795 + 694 796 if (action & TK_CLEAR_NTP) { 695 797 tk->ntp_error = 0; 696 798 ntp_clear(); ··· 729 789 730 790 if (action & TK_CLOCK_WAS_SET) 731 791 tk->clock_was_set_seq++; 792 + 732 793 /* 733 - * The mirroring of the data to the shadow-timekeeper needs 734 - * to happen last here to ensure we don't over-write the 735 - * timekeeper structure on the next update with stale data 794 + * Update the real timekeeper. 
+ *
+ * We could avoid this memcpy() by switching pointers, but that has
+ * the downside that the reader side does not longer benefit from
+ * the cacheline optimized data layout of the timekeeper and requires
+ * another indirection.
  */
-	if (action & TK_MIRROR)
-		memcpy(&shadow_timekeeper, &tk_core.timekeeper,
-		       sizeof(tk_core.timekeeper));
+	memcpy(&tkd->timekeeper, tk, sizeof(*tk));
+	write_seqcount_end(&tkd->seq);
 }
 
 /**
···
 	unsigned int seq;
 	ktime_t tconv;
 
+	if (IS_ENABLED(CONFIG_64BIT)) {
+		/*
+		 * Paired with WRITE_ONCE()s in tk_set_wall_to_mono() and
+		 * tk_update_sleep_time().
+		 */
+		return ktime_add(tmono, READ_ONCE(*offset));
+	}
+
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		tconv = ktime_add(tmono, *offset);
···
 	unsigned int seq;
 	ktime_t base_raw;
 	ktime_t base_real;
+	ktime_t base_boot;
 	u64 nsec_raw;
 	u64 nsec_real;
 	u64 now;
···
 	systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 	base_real = ktime_add(tk->tkr_mono.base,
 			      tk_core.timekeeper.offs_real);
+	base_boot = ktime_add(tk->tkr_mono.base,
+			      tk_core.timekeeper.offs_boot);
 	base_raw = tk->tkr_raw.base;
 	nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
 	nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
···
 
 	systime_snapshot->cycles = now;
 	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
+	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
 	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
 }
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);
···
  */
 int do_settimeofday64(const struct timespec64 *ts)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
 	struct timespec64 ts_delta, xt;
-	unsigned long flags;
-	int ret = 0;
 
 	if (!timespec64_valid_settod(ts))
 		return -EINVAL;
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
+		struct timekeeper *tks = &tk_core.shadow_timekeeper;
 
-	timekeeping_forward_now(tk);
+		timekeeping_forward_now(tks);
 
-	xt = tk_xtime(tk);
-	ts_delta = timespec64_sub(*ts, xt);
+		xt = tk_xtime(tks);
+		ts_delta = timespec64_sub(*ts, xt);
 
-	if (timespec64_compare(&tk->wall_to_monotonic, &ts_delta) > 0) {
-		ret = -EINVAL;
-		goto out;
+		if (timespec64_compare(&tks->wall_to_monotonic, &ts_delta) > 0) {
+			timekeeping_restore_shadow(&tk_core);
+			return -EINVAL;
+		}
+
+		tk_set_wall_to_mono(tks, timespec64_sub(tks->wall_to_monotonic, ts_delta));
+		tk_set_xtime(tks, ts);
+		timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
 	}
-
-	tk_set_wall_to_mono(tk, timespec64_sub(tk->wall_to_monotonic, ts_delta));
-
-	tk_set_xtime(tk, ts);
-out:
-	timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	/* Signal hrtimers about time change */
 	clock_was_set(CLOCK_SET_WALL);
 
-	if (!ret) {
-		audit_tk_injoffset(ts_delta);
-		add_device_randomness(ts, sizeof(*ts));
-	}
-
-	return ret;
+	audit_tk_injoffset(ts_delta);
+	add_device_randomness(ts, sizeof(*ts));
+	return 0;
 }
 EXPORT_SYMBOL(do_settimeofday64);
···
  */
 static int timekeeping_inject_offset(const struct timespec64 *ts)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
-	unsigned long flags;
-	struct timespec64 tmp;
-	int ret = 0;
-
 	if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
 		return -EINVAL;
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
+		struct timekeeper *tks = &tk_core.shadow_timekeeper;
+		struct timespec64 tmp;
 
-	timekeeping_forward_now(tk);
+		timekeeping_forward_now(tks);
 
-	/* Make sure the proposed value is valid */
-	tmp = timespec64_add(tk_xtime(tk), *ts);
-	if (timespec64_compare(&tk->wall_to_monotonic, ts) > 0 ||
-	    !timespec64_valid_settod(&tmp)) {
-		ret = -EINVAL;
-		goto error;
+		/* Make sure the proposed value is valid */
+		tmp = timespec64_add(tk_xtime(tks), *ts);
+		if (timespec64_compare(&tks->wall_to_monotonic, ts) > 0 ||
+		    !timespec64_valid_settod(&tmp)) {
+			timekeeping_restore_shadow(&tk_core);
+			return -EINVAL;
+		}
+
+		tk_xtime_add(tks, ts);
+		tk_set_wall_to_mono(tks, timespec64_sub(tks->wall_to_monotonic, *ts));
+		timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
 	}
-
-	tk_xtime_add(tk, ts);
-	tk_set_wall_to_mono(tk, timespec64_sub(tk->wall_to_monotonic, *ts));
-
-error: /* even if we error out, we forwarded the time, so call update */
-	timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	/* Signal hrtimers about time change */
 	clock_was_set(CLOCK_SET_WALL);
-
-	return ret;
+	return 0;
 }
 
 /*
···
  */
 static int change_clocksource(void *data)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
-	struct clocksource *new, *old = NULL;
-	unsigned long flags;
-	bool change = false;
-
-	new = (struct clocksource *) data;
+	struct clocksource *new = data, *old = NULL;
 
 	/*
-	 * If the cs is in module, get a module reference. Succeeds
-	 * for built-in code (owner == NULL) as well.
+	 * If the clocksource is in a module, get a module reference.
+	 * Succeeds for built-in code (owner == NULL) as well. Abort if the
+	 * reference can't be acquired.
 	 */
-	if (try_module_get(new->owner)) {
-		if (!new->enable || new->enable(new) == 0)
-			change = true;
-		else
-			module_put(new->owner);
+	if (!try_module_get(new->owner))
+		return 0;
+
+	/* Abort if the device can't be enabled */
+	if (new->enable && new->enable(new) != 0) {
+		module_put(new->owner);
+		return 0;
 	}
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
+		struct timekeeper *tks = &tk_core.shadow_timekeeper;
 
-	timekeeping_forward_now(tk);
-
-	if (change) {
-		old = tk->tkr_mono.clock;
-		tk_setup_internals(tk, new);
+		timekeeping_forward_now(tks);
+		old = tks->tkr_mono.clock;
+		tk_setup_internals(tks, new);
+		timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
 	}
-
-	timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	if (old) {
 		if (old->disable)
 			old->disable(old);
-
 		module_put(old->owner);
 	}
 
···
 	*boot_offset = ns_to_timespec64(local_clock());
 }
 
+static __init void tkd_basic_setup(struct tk_data *tkd)
+{
+	raw_spin_lock_init(&tkd->lock);
+	seqcount_raw_spinlock_init(&tkd->seq, &tkd->lock);
+}
+
 /*
  * Flag reflecting whether timekeeping_resume() has injected sleeptime.
  *
···
 void __init timekeeping_init(void)
 {
 	struct timespec64 wall_time, boot_offset, wall_to_mono;
-	struct timekeeper *tk = &tk_core.timekeeper;
+	struct timekeeper *tks = &tk_core.shadow_timekeeper;
 	struct clocksource *clock;
-	unsigned long flags;
+
+	tkd_basic_setup(&tk_core);
 
 	read_persistent_wall_and_boot_offset(&wall_time, &boot_offset);
 	if (timespec64_valid_settod(&wall_time) &&
···
 	 */
 	wall_to_mono = timespec64_sub(boot_offset, wall_time);
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	guard(raw_spinlock_irqsave)(&tk_core.lock);
+
 	ntp_init();
 
 	clock = clocksource_default_clock();
 	if (clock->enable)
 		clock->enable(clock);
-	tk_setup_internals(tk, clock);
+	tk_setup_internals(tks, clock);
 
-	tk_set_xtime(tk, &wall_time);
-	tk->raw_sec = 0;
+	tk_set_xtime(tks, &wall_time);
+	tks->raw_sec = 0;
 
-	tk_set_wall_to_mono(tk, wall_to_mono);
+	tk_set_wall_to_mono(tks, wall_to_mono);
 
-	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET);
 }
 
 /* time in seconds when suspend began for persistent clock */
···
  */
 void timekeeping_inject_sleeptime64(const struct timespec64 *delta)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
-	unsigned long flags;
+	scoped_guard(raw_spinlock_irqsave, &tk_core.lock) {
+		struct timekeeper *tks = &tk_core.shadow_timekeeper;
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
-
-	suspend_timing_needed = false;
-
-	timekeeping_forward_now(tk);
-
-	__timekeeping_inject_sleeptime(tk, delta);
-
-	timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+		suspend_timing_needed = false;
+		timekeeping_forward_now(tks);
+		__timekeeping_inject_sleeptime(tks, delta);
+		timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
+	}
 
 	/* Signal hrtimers about time change */
 	clock_was_set(CLOCK_SET_WALL | CLOCK_SET_BOOT);
···
  */
 void timekeeping_resume(void)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
-	struct clocksource *clock = tk->tkr_mono.clock;
-	unsigned long flags;
+	struct timekeeper *tks = &tk_core.shadow_timekeeper;
+	struct clocksource *clock = tks->tkr_mono.clock;
 	struct timespec64 ts_new, ts_delta;
-	u64 cycle_now, nsec;
 	bool inject_sleeptime = false;
+	u64 cycle_now, nsec;
+	unsigned long flags;
 
 	read_persistent_clock64(&ts_new);
 
 	clockevents_resume();
 	clocksource_resume();
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	raw_spin_lock_irqsave(&tk_core.lock, flags);
 
 	/*
 	 * After system resumes, we need to calculate the suspended time and
···
 	 * The less preferred source will only be tried if there is no better
 	 * usable source. The rtc part is handled separately in rtc core code.
 	 */
-	cycle_now = tk_clock_read(&tk->tkr_mono);
+	cycle_now = tk_clock_read(&tks->tkr_mono);
 	nsec = clocksource_stop_suspend_timing(clock, cycle_now);
 	if (nsec > 0) {
 		ts_delta = ns_to_timespec64(nsec);
···
 
 	if (inject_sleeptime) {
 		suspend_timing_needed = false;
-		__timekeeping_inject_sleeptime(tk, &ts_delta);
+		__timekeeping_inject_sleeptime(tks, &ts_delta);
 	}
 
 	/* Re-base the last cycle value */
-	tk->tkr_mono.cycle_last = cycle_now;
-	tk->tkr_raw.cycle_last = cycle_now;
+	tks->tkr_mono.cycle_last = cycle_now;
+	tks->tkr_raw.cycle_last = cycle_now;
 
-	tk->ntp_error = 0;
+	tks->ntp_error = 0;
 	timekeeping_suspended = 0;
-	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET);
+	raw_spin_unlock_irqrestore(&tk_core.lock, flags);
 
 	touch_softlockup_watchdog();
 
···
 
 int timekeeping_suspend(void)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
-	unsigned long flags;
-	struct timespec64		delta, delta_delta;
-	static struct timespec64	old_delta;
+	struct timekeeper *tks = &tk_core.shadow_timekeeper;
+	struct timespec64 delta, delta_delta;
+	static struct timespec64 old_delta;
 	struct clocksource *curr_clock;
+	unsigned long flags;
 	u64 cycle_now;
 
 	read_persistent_clock64(&timekeeping_suspend_time);
···
 
 	suspend_timing_needed = true;
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
-	timekeeping_forward_now(tk);
+	raw_spin_lock_irqsave(&tk_core.lock, flags);
+	timekeeping_forward_now(tks);
 	timekeeping_suspended = 1;
 
 	/*
···
 	 * just read from the current clocksource. Save this to potentially
 	 * use in suspend timing.
 	 */
-	curr_clock = tk->tkr_mono.clock;
-	cycle_now = tk->tkr_mono.cycle_last;
+	curr_clock = tks->tkr_mono.clock;
+	cycle_now = tks->tkr_mono.cycle_last;
 	clocksource_start_suspend_timing(curr_clock, cycle_now);
 
 	if (persistent_clock_exists) {
···
 		 * try to compensate so the difference in system time
 		 * and persistent_clock time stays close to constant.
 		 */
-		delta = timespec64_sub(tk_xtime(tk), timekeeping_suspend_time);
+		delta = timespec64_sub(tk_xtime(tks), timekeeping_suspend_time);
 		delta_delta = timespec64_sub(delta, old_delta);
 		if (abs(delta_delta.tv_sec) >= 2) {
 			/*
···
 		}
 	}
 
-	timekeeping_update(tk, TK_MIRROR);
-	halt_fast_timekeeper(tk);
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	timekeeping_update_from_shadow(&tk_core, 0);
+	halt_fast_timekeeper(tks);
+	raw_spin_unlock_irqrestore(&tk_core.lock, flags);
 
 	tick_suspend();
 	clocksource_suspend();
···
  */
 static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
+	u64 ntp_tl = ntp_tick_length();
 	u32 mult;
 
 	/*
 	 * Determine the multiplier from the current NTP tick length.
 	 * Avoid expensive division when the tick length doesn't change.
 	 */
-	if (likely(tk->ntp_tick == ntp_tick_length())) {
+	if (likely(tk->ntp_tick == ntp_tl)) {
 		mult = tk->tkr_mono.mult - tk->ntp_err_mult;
 	} else {
-		tk->ntp_tick = ntp_tick_length();
+		tk->ntp_tick = ntp_tl;
 		mult = div64_u64((tk->ntp_tick >> tk->ntp_error_shift) -
 				 tk->xtime_remainder, tk->cycle_interval);
 	}
···
  */
 static bool timekeeping_advance(enum timekeeping_adv_mode mode)
 {
+	struct timekeeper *tk = &tk_core.shadow_timekeeper;
 	struct timekeeper *real_tk = &tk_core.timekeeper;
-	struct timekeeper *tk = &shadow_timekeeper;
-	u64 offset;
-	int shift = 0, maxshift;
 	unsigned int clock_set = 0;
-	unsigned long flags;
+	int shift = 0, maxshift;
+	u64 offset;
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	guard(raw_spinlock_irqsave)(&tk_core.lock);
 
 	/* Make sure we're fully resumed: */
 	if (unlikely(timekeeping_suspended))
-		goto out;
+		return false;
 
 	offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
 				   tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 
 	/* Check if there's really nothing to do */
 	if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
-		goto out;
-
-	/* Do some additional sanity checking */
-	timekeeping_check_update(tk, offset);
+		return false;
 
 	/*
 	 * With NO_HZ we may have to accumulate many cycle_intervals
···
 	maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1;
 	shift = min(shift, maxshift);
 	while (offset >= tk->cycle_interval) {
-		offset = logarithmic_accumulation(tk, offset, shift,
-						  &clock_set);
+		offset = logarithmic_accumulation(tk, offset, shift, &clock_set);
 		if (offset < tk->cycle_interval<<shift)
 			shift--;
 	}
···
 	 */
 	clock_set |= accumulate_nsecs_to_secs(tk);
 
-	write_seqcount_begin(&tk_core.seq);
-	/*
-	 * Update the real timekeeper.
-	 *
-	 * We could avoid this memcpy by switching pointers, but that
-	 * requires changes to all other timekeeper usage sites as
-	 * well, i.e. move the timekeeper pointer getter into the
-	 * spinlocked/seqcount protected sections. And we trade this
-	 * memcpy under the tk_core.seq against one before we start
-	 * updating.
-	 */
-	timekeeping_update(tk, clock_set);
-	memcpy(real_tk, tk, sizeof(*tk));
-	/* The memcpy must come last. Do not put anything here! */
-	write_seqcount_end(&tk_core.seq);
-out:
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	timekeeping_update_from_shadow(&tk_core, clock_set);
 
 	return !!clock_set;
 }
···
  */
 int do_adjtimex(struct __kernel_timex *txc)
 {
-	struct timekeeper *tk = &tk_core.timekeeper;
 	struct audit_ntp_data ad;
 	bool offset_set = false;
 	bool clock_set = false;
 	struct timespec64 ts;
-	unsigned long flags;
-	s32 orig_tai, tai;
 	int ret;
 
 	/* Validate the data before disabling interrupts */
···
 
 	if (txc->modes & ADJ_SETOFFSET) {
 		struct timespec64 delta;
+
 		delta.tv_sec  = txc->time.tv_sec;
 		delta.tv_nsec = txc->time.tv_usec;
 		if (!(txc->modes & ADJ_NANO))
···
 	ktime_get_real_ts64(&ts);
 	add_device_randomness(&ts, sizeof(ts));
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
+	scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
+		struct timekeeper *tks = &tk_core.shadow_timekeeper;
+		s32 orig_tai, tai;
 
-	orig_tai = tai = tk->tai_offset;
-	ret = __do_adjtimex(txc, &ts, &tai, &ad);
+		orig_tai = tai = tks->tai_offset;
+		ret = __do_adjtimex(txc, &ts, &tai, &ad);
 
-	if (tai != orig_tai) {
-		__timekeeping_set_tai_offset(tk, tai);
-		timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
-		clock_set = true;
+		if (tai != orig_tai) {
+			__timekeeping_set_tai_offset(tks, tai);
+			timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET);
+			clock_set = true;
+		} else {
+			tk_update_leap_state_all(&tk_core);
+		}
 	}
-	tk_update_leap_state(tk);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	audit_ntp_log(&ad);
 
···
  */
 void hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts)
 {
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&tk_core.seq);
-
+	guard(raw_spinlock_irqsave)(&tk_core.lock);
 	__hardpps(phase_ts, raw_ts);
-
-	write_seqcount_end(&tk_core.seq);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 }
 EXPORT_SYMBOL(hardpps);
 #endif /* CONFIG_NTP_PPS */
kernel/time/timekeeping_internal.h  (+2 -8)
···
 
 #endif
 
-#ifdef CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE
 static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
 {
 	u64 ret = (now - last) & mask;
···
 	 */
 	return ret & ~(mask >> 1) ? 0 : ret;
 }
-#else
-static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
-{
-	return (now - last) & mask;
-}
-#endif
 
 /* Semi public for serialization of non timekeeper VDSO updates. */
-extern raw_spinlock_t timekeeper_lock;
+unsigned long timekeeper_lock_irqsave(void);
+void timekeeper_unlock_irqrestore(unsigned long flags);
 
 #endif /* _TIMEKEEPING_INTERNAL_H */
kernel/time/timer.c  (+2 -193)
···
 #include <linux/tick.h>
 #include <linux/kallsyms.h>
 #include <linux/irq_work.h>
-#include <linux/sched/signal.h>
 #include <linux/sched/sysctl.h>
 #include <linux/sched/nohz.h>
 #include <linux/sched/debug.h>
···
 
 static void __run_timer_base(struct timer_base *base)
 {
-	if (time_before(jiffies, base->next_expiry))
+	/* Can race against a remote CPU updating next_expiry under the lock */
+	if (time_before(jiffies, READ_ONCE(base->next_expiry)))
 		return;
 
 	timer_base_lock_expiry(base);
···
 	run_posix_cpu_timers();
 }
 
-/*
- * Since schedule_timeout()'s timer is defined on the stack, it must store
- * the target task on the stack as well.
- */
-struct process_timer {
-	struct timer_list timer;
-	struct task_struct *task;
-};
-
-static void process_timeout(struct timer_list *t)
-{
-	struct process_timer *timeout = from_timer(timeout, t, timer);
-
-	wake_up_process(timeout->task);
-}
-
-/**
- * schedule_timeout - sleep until timeout
- * @timeout: timeout value in jiffies
- *
- * Make the current task sleep until @timeout jiffies have elapsed.
- * The function behavior depends on the current task state
- * (see also set_current_state() description):
- *
- * %TASK_RUNNING - the scheduler is called, but the task does not sleep
- * at all. That happens because sched_submit_work() does nothing for
- * tasks in %TASK_RUNNING state.
- *
- * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
- * pass before the routine returns unless the current task is explicitly
- * woken up, (e.g. by wake_up_process()).
- *
- * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
- * delivered to the current task or the current task is explicitly woken
- * up.
- *
- * The current task state is guaranteed to be %TASK_RUNNING when this
- * routine returns.
- *
- * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
- * the CPU away without a bound on the timeout. In this case the return
- * value will be %MAX_SCHEDULE_TIMEOUT.
- *
- * Returns 0 when the timer has expired otherwise the remaining time in
- * jiffies will be returned. In all cases the return value is guaranteed
- * to be non-negative.
- */
-signed long __sched schedule_timeout(signed long timeout)
-{
-	struct process_timer timer;
-	unsigned long expire;
-
-	switch (timeout)
-	{
-	case MAX_SCHEDULE_TIMEOUT:
-		/*
-		 * These two special cases are useful to be comfortable
-		 * in the caller. Nothing more. We could take
-		 * MAX_SCHEDULE_TIMEOUT from one of the negative value
-		 * but I' d like to return a valid offset (>=0) to allow
-		 * the caller to do everything it want with the retval.
-		 */
-		schedule();
-		goto out;
-	default:
-		/*
-		 * Another bit of PARANOID. Note that the retval will be
-		 * 0 since no piece of kernel is supposed to do a check
-		 * for a negative retval of schedule_timeout() (since it
-		 * should never happens anyway). You just have the printk()
-		 * that will tell you if something is gone wrong and where.
-		 */
-		if (timeout < 0) {
-			printk(KERN_ERR "schedule_timeout: wrong timeout "
-				"value %lx\n", timeout);
-			dump_stack();
-			__set_current_state(TASK_RUNNING);
-			goto out;
-		}
-	}
-
-	expire = timeout + jiffies;
-
-	timer.task = current;
-	timer_setup_on_stack(&timer.timer, process_timeout, 0);
-	__mod_timer(&timer.timer, expire, MOD_TIMER_NOTPENDING);
-	schedule();
-	del_timer_sync(&timer.timer);
-
-	/* Remove the timer from the object tracker */
-	destroy_timer_on_stack(&timer.timer);
-
-	timeout = expire - jiffies;
-
- out:
-	return timeout < 0 ? 0 : timeout;
-}
-EXPORT_SYMBOL(schedule_timeout);
-
-/*
- * We can use __set_current_state() here because schedule_timeout() calls
- * schedule() unconditionally.
- */
-signed long __sched schedule_timeout_interruptible(signed long timeout)
-{
-	__set_current_state(TASK_INTERRUPTIBLE);
-	return schedule_timeout(timeout);
-}
-EXPORT_SYMBOL(schedule_timeout_interruptible);
-
-signed long __sched schedule_timeout_killable(signed long timeout)
-{
-	__set_current_state(TASK_KILLABLE);
-	return schedule_timeout(timeout);
-}
-EXPORT_SYMBOL(schedule_timeout_killable);
-
-signed long __sched schedule_timeout_uninterruptible(signed long timeout)
-{
-	__set_current_state(TASK_UNINTERRUPTIBLE);
-	return schedule_timeout(timeout);
-}
-EXPORT_SYMBOL(schedule_timeout_uninterruptible);
-
-/*
- * Like schedule_timeout_uninterruptible(), except this task will not contribute
- * to load average.
- */
-signed long __sched schedule_timeout_idle(signed long timeout)
-{
-	__set_current_state(TASK_IDLE);
-	return schedule_timeout(timeout);
-}
-EXPORT_SYMBOL(schedule_timeout_idle);
-
 #ifdef CONFIG_HOTPLUG_CPU
 static void migrate_timer_list(struct timer_base *new_base, struct hlist_head *head)
 {
···
 	posix_cputimers_init_work();
 	open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
 }
-
-/**
- * msleep - sleep safely even with waitqueue interruptions
- * @msecs: Time in milliseconds to sleep for
- */
-void msleep(unsigned int msecs)
-{
-	unsigned long timeout = msecs_to_jiffies(msecs);
-
-	while (timeout)
-		timeout = schedule_timeout_uninterruptible(timeout);
-}
-
-EXPORT_SYMBOL(msleep);
-
-/**
- * msleep_interruptible - sleep waiting for signals
- * @msecs: Time in milliseconds to sleep for
- */
-unsigned long msleep_interruptible(unsigned int msecs)
-{
-	unsigned long timeout = msecs_to_jiffies(msecs);
-
-	while (timeout && !signal_pending(current))
-		timeout = schedule_timeout_interruptible(timeout);
-	return jiffies_to_msecs(timeout);
-}
-
-EXPORT_SYMBOL(msleep_interruptible);
-
-/**
- * usleep_range_state - Sleep for an approximate time in a given state
- * @min: Minimum time in usecs to sleep
- * @max: Maximum time in usecs to sleep
- * @state: State of the current task that will be while sleeping
- *
- * In non-atomic context where the exact wakeup time is flexible, use
- * usleep_range_state() instead of udelay(). The sleep improves responsiveness
- * by avoiding the CPU-hogging busy-wait of udelay(), and the range reduces
- * power usage by allowing hrtimers to take advantage of an already-
- * scheduled interrupt instead of scheduling a new one just for this sleep.
- */
-void __sched usleep_range_state(unsigned long min, unsigned long max,
-				unsigned int state)
-{
-	ktime_t exp = ktime_add_us(ktime_get(), min);
-	u64 delta = (u64)(max - min) * NSEC_PER_USEC;
-
-	for (;;) {
-		__set_current_state(state);
-		/* Do not return before the requested sleep time has elapsed */
-		if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS))
-			break;
-	}
-}
-EXPORT_SYMBOL(usleep_range_state);
kernel/time/vsyscall.c  (+2 -3)
···
 unsigned long vdso_update_begin(void)
 {
 	struct vdso_data *vdata = __arch_get_k_vdso_data();
-	unsigned long flags;
+	unsigned long flags = timekeeper_lock_irqsave();
 
-	raw_spin_lock_irqsave(&timekeeper_lock, flags);
 	vdso_write_begin(vdata);
 	return flags;
 }
···
 
 	vdso_write_end(vdata);
 	__arch_sync_vdso_data(vdata);
-	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	timekeeper_unlock_irqrestore(flags);
 }
lib/Kconfig.debug  (-13)
···
 
 endmenu
 
-config DEBUG_TIMEKEEPING
-	bool "Enable extra timekeeping sanity checking"
-	help
-	  This option will enable additional timekeeping sanity checks
-	  which may be helpful when diagnosing issues where timekeeping
-	  problems are suspected.
-
-	  This may include checks in the timekeeping hotpaths, so this
-	  option may have a (very small) performance impact to some
-	  workloads.
-
-	  If unsure, say N.
-
 config DEBUG_PREEMPT
 	bool "Debug preemptible kernel"
 	depends on DEBUG_KERNEL && PREEMPTION && TRACE_IRQFLAGS_SUPPORT
mm/damon/core.c  (+2 -3)
···
 
 static void kdamond_usleep(unsigned long usecs)
 {
-	/* See Documentation/timers/timers-howto.rst for the thresholds */
-	if (usecs > 20 * USEC_PER_MSEC)
+	if (usecs >= USLEEP_RANGE_UPPER_BOUND)
 		schedule_timeout_idle(usecs_to_jiffies(usecs));
 	else
-		usleep_idle_range(usecs, usecs + 1);
+		usleep_range_idle(usecs, usecs + 1);
 }
 
 /* Returns negative error code if it's not activated but should return */
net/bluetooth/hci_event.c  (-2)
···
 #define ZERO_KEY "\x00\x00\x00\x00\x00\x00\x00\x00" \
 		 "\x00\x00\x00\x00\x00\x00\x00\x00"
 
-#define secs_to_jiffies(_secs) msecs_to_jiffies((_secs) * 1000)
-
 /* Handle HCI Event packets */
 
 static void *hci_ev_skb_pull(struct hci_dev *hdev, struct sk_buff *skb,
net/core/pktgen.c  (+1 -1)
···
 	s64 remaining;
 	struct hrtimer_sleeper t;
 
-	hrtimer_init_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	hrtimer_setup_sleeper_on_stack(&t, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	hrtimer_set_expires(&t.timer, spin_until);
 
 	remaining = ktime_to_ns(hrtimer_expires_remaining(&t.timer));
net/netfilter/xt_IDLETIMER.c  (+1 -3)
···
 	schedule_work(&timer->work);
 }
 
-static enum alarmtimer_restart idletimer_tg_alarmproc(struct alarm *alarm,
-						      ktime_t now)
+static void idletimer_tg_alarmproc(struct alarm *alarm, ktime_t now)
 {
 	struct idletimer_tg *timer = alarm->data;
 
 	pr_debug("alarm %s expired\n", timer->attr.attr.name);
 	schedule_work(&timer->work);
-	return ALARMTIMER_NORESTART;
 }
 
 static int idletimer_check_sysfs_name(const char *name, unsigned int size)
scripts/checkpatch.pl  (+5 -5)
···
 # ignore udelay's < 10, however
 			if (! ($delay < 10) ) {
 				CHK("USLEEP_RANGE",
-				    "usleep_range is preferred over udelay; see Documentation/timers/timers-howto.rst\n" . $herecurr);
+				    "usleep_range is preferred over udelay; see function description of usleep_range() and udelay().\n" . $herecurr);
 			}
 			if ($delay > 2000) {
 				WARN("LONG_UDELAY",
-				     "long udelay - prefer mdelay; see arch/arm/include/asm/delay.h\n" . $herecurr);
+				     "long udelay - prefer mdelay; see function description of mdelay().\n" . $herecurr);
 			}
 		}
···
 		if ($line =~ /\bmsleep\s*\((\d+)\);/) {
 			if ($1 < 20) {
 				WARN("MSLEEP",
-				     "msleep < 20ms can sleep for up to 20ms; see Documentation/timers/timers-howto.rst\n" . $herecurr);
+				     "msleep < 20ms can sleep for up to 20ms; see function description of msleep().\n" . $herecurr);
 			}
 		}
···
 			my $max = $7;
 			if ($min eq $max) {
 				WARN("USLEEP_RANGE",
-				     "usleep_range should not use min == max args; see Documentation/timers/timers-howto.rst\n" . "$here\n$stat\n");
+				     "usleep_range should not use min == max args; see function description of usleep_range().\n" . "$here\n$stat\n");
 			} elsif ($min =~ /^\d+$/ && $max =~ /^\d+$/ &&
 				 $min > $max) {
 				WARN("USLEEP_RANGE",
-				     "usleep_range args reversed, use min then max; see Documentation/timers/timers-howto.rst\n" . "$here\n$stat\n");
+				     "usleep_range args reversed, use min then max; see function description of usleep_range().\n" . "$here\n$stat\n");
 			}
 		}
 
sound/soc/sof/ops.h  (+4 -4)
···
  * @addr: Address to poll
  * @val: Variable to read the value into
  * @cond: Break condition (usually involving @val)
- * @sleep_us: Maximum time to sleep between reads in us (0
- *            tight-loops). Should be less than ~20ms since usleep_range
- *            is used (see Documentation/timers/timers-howto.rst).
+ * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops). Please
+ *            read usleep_range() function description for details and
+ *            limitations.
  * @timeout_us: Timeout in us, 0 means never timeout
  *
- * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * Returns: 0 on success and -ETIMEDOUT upon a timeout. In either
  * case, the last read value at @addr is stored in @val. Must not
  * be called from atomic context if sleep_us or timeout_us are used.
  *
tools/testing/selftests/wireguard/qemu/debug.config  (-1)
···
 CONFIG_SCHED_INFO=y
 CONFIG_SCHEDSTATS=y
 CONFIG_SCHED_STACK_END_CHECK=y
-CONFIG_DEBUG_TIMEKEEPING=y
 CONFIG_DEBUG_PREEMPT=y
 CONFIG_DEBUG_RT_MUTEXES=y
 CONFIG_DEBUG_SPINLOCK=y