Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

Merge branch 'tcp-fix-handling-of-stale-syncookies-timestamps'

Guillaume Nault says:

====================
tcp: fix handling of stale syncookies timestamps

The synflood timestamps (->ts_recent_stamp and ->synq_overflow_ts) are
only refreshed when the syncookie protection triggers. Therefore, their
value can become very far apart from jiffies if no synflood happens for
a long time.

If jiffies grows too much and wraps while the synflood timestamp isn't
refreshed, then time_after32() might consider the later to be in the
future. This can trick tcp_synq_no_recent_overflow() into returning
erroneous values and rejecting valid ACKs.

Patch 1 handles the case of ACKs using legitimate syncookies.
Patch 2 handles the case of stray ACKs.
Patch 3 annotates lockless timestamp operations with READ_ONCE() and
WRITE_ONCE().

Changes from v3:
- Fix description of time_between32() (found by Eric Dumazet).
- Use more accurate Fixes tag in patch 3 (suggested by Eric Dumazet).

Changes from v2:
- Define and use time_between32() instead of a pair of
time_before32/time_after32 (suggested by Eric Dumazet).
- Use 'last_overflow - HZ' as lower bound in
tcp_synq_no_recent_overflow(), to accommodate for concurrent
timestamp updates (found by Eric Dumazet).
- Add a third patch to annotate lockless accesses to .ts_recent_stamp.

Changes from v1:
- Initialising timestamps at socket creation time is not enough
because jiffies wraps in 24 days with HZ=1000 (Eric Dumazet).
Handle stale timestamps in tcp_synq_overflow() and
tcp_synq_no_recent_overflow() instead.
- Rework commit description.
- Add a second patch to handle the case of stray ACKs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+32 -8
+13
include/linux/time.h
··· 97 97 */ 98 98 #define time_after32(a, b) ((s32)((u32)(b) - (u32)(a)) < 0) 99 99 #define time_before32(b, a) time_after32(a, b) 100 + 101 + /** 102 + * time_between32 - check if a 32-bit timestamp is within a given time range 103 + * @t: the time which may be within [l,h] 104 + * @l: the lower bound of the range 105 + * @h: the higher bound of the range 106 + * 107 + * time_before32(t, l, h) returns true if @l <= @t <= @h. All operands are 108 + * treated as 32-bit integers. 109 + * 110 + * Equivalent to !(time_before32(@t, @l) || time_after32(@t, @h)). 111 + */ 112 + #define time_between32(t, l, h) ((u32)(h) - (u32)(l) >= (u32)(t) - (u32)(l)) 100 113 #endif
+19 -8
include/net/tcp.h
··· 494 494 reuse = rcu_dereference(sk->sk_reuseport_cb); 495 495 if (likely(reuse)) { 496 496 last_overflow = READ_ONCE(reuse->synq_overflow_ts); 497 - if (time_after32(now, last_overflow + HZ)) 497 + if (!time_between32(now, last_overflow, 498 + last_overflow + HZ)) 498 499 WRITE_ONCE(reuse->synq_overflow_ts, now); 499 500 return; 500 501 } 501 502 } 502 503 503 - last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; 504 - if (time_after32(now, last_overflow + HZ)) 505 - tcp_sk(sk)->rx_opt.ts_recent_stamp = now; 504 + last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp); 505 + if (!time_between32(now, last_overflow, last_overflow + HZ)) 506 + WRITE_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp, now); 506 507 } 507 508 508 509 /* syncookies: no recent synqueue overflow on this listening socket? */ ··· 518 517 reuse = rcu_dereference(sk->sk_reuseport_cb); 519 518 if (likely(reuse)) { 520 519 last_overflow = READ_ONCE(reuse->synq_overflow_ts); 521 - return time_after32(now, last_overflow + 522 - TCP_SYNCOOKIE_VALID); 520 + return !time_between32(now, last_overflow - HZ, 521 + last_overflow + 522 + TCP_SYNCOOKIE_VALID); 523 523 } 524 524 } 525 525 526 - last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp; 527 - return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID); 526 + last_overflow = READ_ONCE(tcp_sk(sk)->rx_opt.ts_recent_stamp); 527 + 528 + /* If last_overflow <= jiffies <= last_overflow + TCP_SYNCOOKIE_VALID, 529 + * then we're under synflood. However, we have to use 530 + * 'last_overflow - HZ' as lower bound. That's because a concurrent 531 + * tcp_synq_overflow() could update .ts_recent_stamp after we read 532 + * jiffies but before we store .ts_recent_stamp into last_overflow, 533 + * which could lead to rejecting a valid syncookie. 534 + */ 535 + return !time_between32(now, last_overflow - HZ, 536 + last_overflow + TCP_SYNCOOKIE_VALID); 528 537 } 529 538 530 539 static inline u32 tcp_cookie_time(void)