Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge

Currently, leapsecond adjustments are done at tick time. As a result,
the leapsecond is applied at the first timer tick *after* the
leapsecond (~1-10ms late depending on HZ), rather than exactly on the
second edge.

This is partly historical, from back when the kernel was always tick
based, but fixing it has since been avoided because doing so adds
extra conditional checks to the gettime fastpath, which carries
performance overhead.

However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
timers set for right after the leapsecond could fire a second early,
since some timers may be expired before we trigger the timekeeping
timer, which then applies the leapsecond.

This isn't quite as bad as it sounds, since it is behaviorally similar
to what can happen when ntpd makes leapsecond adjustments without
using the kernel discipline: due to latencies, timers may fire just
prior to the settimeofday() call. (Also, note that all applications
using CLOCK_REALTIME timers should be careful anyway, since they are
prone to quirks from settimeofday() disturbances.)

However, the purpose of having the kernel do the leap adjustment is to
avoid such latencies, so I think this is worth fixing.

So in order to properly keep those timers from firing a second early,
this patch modifies the ntp and timekeeping logic to keep enough state
that the update_base_offsets_now accessor, which provides the hrtimer
core with the current time, can check for and apply the leapsecond
adjustment on the second edge. This prevents the hrtimer core from
expiring timers too early.

This patch does not modify any other time read path, so no additional
overhead is incurred. However, this also means that the leap-second
continues to be applied at tick time for all other read-paths.

Apologies to Richard Cochran, who pushed for similar changes years
ago, which I resisted due to the concerns about the performance
overhead.

While I suspect this isn't extremely critical, folks who care about
strict leap-second correctness will likely want to watch
this. Potentially a -stable candidate eventually.

Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Authored by John Stultz, committed by Thomas Gleixner
(833f32d7, 90bf361c)

5 files changed, 61 insertions(+), 8 deletions(-)
include/linux/time64.h (+1)

--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -29,6 +29,7 @@
 #define FSEC_PER_SEC		1000000000000000LL
 
 /* Located here for timespec[64]_valid_strict */
+#define TIME64_MAX		((s64)~((u64)1 << 63))
 #define KTIME_MAX		((s64)~((u64)1 << 63))
 #define KTIME_SEC_MAX		(KTIME_MAX / NSEC_PER_SEC)
 
include/linux/timekeeper_internal.h (+2)

--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -50,6 +50,7 @@
  * @offs_tai:		Offset clock monotonic -> clock tai
  * @tai_offset:		The current UTC to TAI offset in seconds
  * @clock_was_set_seq:	The sequence number of clock was set events
+ * @next_leap_ktime:	CLOCK_MONOTONIC time value of a pending leap-second
  * @raw_time:		Monotonic raw base time in timespec64 format
  * @cycle_interval:	Number of clock cycles in one NTP interval
  * @xtime_interval:	Number of clock shifted nano seconds in one NTP
@@ -90,6 +91,7 @@
 	ktime_t			offs_tai;
 	s32			tai_offset;
 	unsigned int		clock_was_set_seq;
+	ktime_t			next_leap_ktime;
 	struct timespec64	raw_time;
 
 	/* The following members are for timekeeping internal use */
kernel/time/ntp.c (+35 -7)

--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -77,6 +77,9 @@
 /* constant (boot-param configurable) NTP tick adjustment (upscaled) */
 static s64 ntp_tick_adj;
 
+/* second value of the next pending leapsecond, or TIME64_MAX if no leap */
+static time64_t ntp_next_leap_sec = TIME64_MAX;
+
 #ifdef CONFIG_NTP_PPS
 
 /*
@@ -350,6 +353,7 @@
 	tick_length	= tick_length_base;
 	time_offset	= 0;
 
+	ntp_next_leap_sec = TIME64_MAX;
 	/* Clear PPS state variables */
 	pps_clear();
 }
@@ -360,6 +364,21 @@
 	return tick_length;
 }
 
+/**
+ * ntp_get_next_leap - Returns the next leapsecond in CLOCK_REALTIME ktime_t
+ *
+ * Provides the time of the next leapsecond against CLOCK_REALTIME in
+ * a ktime_t format. Returns KTIME_MAX if no leapsecond is pending.
+ */
+ktime_t ntp_get_next_leap(void)
+{
+	ktime_t ret;
+
+	if ((time_state == TIME_INS) && (time_status & STA_INS))
+		return ktime_set(ntp_next_leap_sec, 0);
+	ret.tv64 = KTIME_MAX;
+	return ret;
+}
 
 /*
  * this routine handles the overflow of the microsecond field
@@ -383,35 +402,43 @@
 	 */
 	switch (time_state) {
 	case TIME_OK:
-		if (time_status & STA_INS)
+		if (time_status & STA_INS) {
 			time_state = TIME_INS;
-		else if (time_status & STA_DEL)
+			ntp_next_leap_sec = secs + SECS_PER_DAY -
+						(secs % SECS_PER_DAY);
+		} else if (time_status & STA_DEL) {
 			time_state = TIME_DEL;
+			ntp_next_leap_sec = secs + SECS_PER_DAY -
+						((secs+1) % SECS_PER_DAY);
+		}
 		break;
 	case TIME_INS:
-		if (!(time_status & STA_INS))
+		if (!(time_status & STA_INS)) {
+			ntp_next_leap_sec = TIME64_MAX;
 			time_state = TIME_OK;
-		else if (secs % SECS_PER_DAY == 0) {
+		} else if (secs % SECS_PER_DAY == 0) {
 			leap = -1;
 			time_state = TIME_OOP;
 			printk(KERN_NOTICE
 				"Clock: inserting leap second 23:59:60 UTC\n");
 		}
 		break;
 	case TIME_DEL:
-		if (!(time_status & STA_DEL))
+		if (!(time_status & STA_DEL)) {
+			ntp_next_leap_sec = TIME64_MAX;
 			time_state = TIME_OK;
-		else if ((secs + 1) % SECS_PER_DAY == 0) {
+		} else if ((secs + 1) % SECS_PER_DAY == 0) {
 			leap = 1;
+			ntp_next_leap_sec = TIME64_MAX;
 			time_state = TIME_WAIT;
 			printk(KERN_NOTICE
 				"Clock: deleting leap second 23:59:59 UTC\n");
 		}
 		break;
 	case TIME_OOP:
+		ntp_next_leap_sec = TIME64_MAX;
 		time_state = TIME_WAIT;
 		break;
-
 	case TIME_WAIT:
 		if (!(time_status & (STA_INS | STA_DEL)))
 			time_state = TIME_OK;
@@ -548,6 +575,7 @@
 	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
 		time_state = TIME_OK;
 		time_status = STA_UNSYNC;
+		ntp_next_leap_sec = TIME64_MAX;
 		/* restart PPS frequency calibration */
 		pps_reset_freq_interval();
 	}
kernel/time/ntp_internal.h (+1)

--- a/kernel/time/ntp_internal.h
+++ b/kernel/time/ntp_internal.h
@@ -5,6 +5,7 @@
 extern void ntp_clear(void);
 /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
 extern u64 ntp_tick_length(void);
+extern ktime_t ntp_get_next_leap(void);
 extern int second_overflow(unsigned long secs);
 extern int ntp_validate_timex(struct timex *);
 extern int __do_adjtimex(struct timex *, struct timespec64 *, s32 *);
kernel/time/timekeeping.c (+22 -1)

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -540,6 +540,17 @@
 EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier);
 
 /*
+ * tk_update_leap_state - helper to update the next_leap_ktime
+ */
+static inline void tk_update_leap_state(struct timekeeper *tk)
+{
+	tk->next_leap_ktime = ntp_get_next_leap();
+	if (tk->next_leap_ktime.tv64 != KTIME_MAX)
+		/* Convert to monotonic time */
+		tk->next_leap_ktime = ktime_sub(tk->next_leap_ktime, tk->offs_real);
+}
+
+/*
  * Update the ktime_t based scalar nsec members of the timekeeper
  */
 static inline void tk_update_ktime_data(struct timekeeper *tk)
@@ -580,6 +591,7 @@
 		ntp_clear();
 	}
 
+	tk_update_leap_state(tk);
 	tk_update_ktime_data(tk);
 
 	update_vsyscall(tk);
@@ -1956,15 +1968,22 @@
 
 		base = tk->tkr_mono.base;
 		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		base = ktime_add_ns(base, nsecs);
+
 		if (*cwsseq != tk->clock_was_set_seq) {
 			*cwsseq = tk->clock_was_set_seq;
 			*offs_real = tk->offs_real;
 			*offs_boot = tk->offs_boot;
 			*offs_tai = tk->offs_tai;
 		}
+
+		/* Handle leapsecond insertion adjustments */
+		if (unlikely(base.tv64 >= tk->next_leap_ktime.tv64))
+			*offs_real = ktime_sub(tk->offs_real, ktime_set(1, 0));
+
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	return ktime_add_ns(base, nsecs);
+	return base;
 }
 
 /**
@@ -2006,6 +2025,8 @@
 		__timekeeping_set_tai_offset(tk, tai);
 		timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
 	}
+	tk_update_leap_state(tk);
+
 	write_seqcount_end(&tk_core.seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 