Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic

tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
ordering of incoming connections and teardowns without the need to
hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
the last measured RTO. To do so, we keep the last seen timestamp in a
per-host indexed data structure and verify if the incoming timestamp
in a connection request is strictly greater than the saved one during
last connection teardown. Thus we can verify later on that no old data
packets will be accepted by the new connection.
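
The ordering check described above can be sketched in isolation. This is
an illustration only, not the kernel code: struct host_stamp,
syn_ts_is_stale() and SKETCH_PAWS_WINDOW are hypothetical names, and the
window value of 1 is assumed to match the kernel's TCP_PAWS_WINDOW. The
comparison follows the signed-difference form used in the diff, so it
stays correct when the 32-bit timestamp wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's TCP_PAWS_WINDOW; illustrative here. */
#define SKETCH_PAWS_WINDOW 1

/* Hypothetical per-host record: timestamp saved at the last teardown. */
struct host_stamp {
	uint32_t last_ts;
};

/*
 * Returns true when the timestamp carried by a new connection request
 * lags the saved one by more than the window. The subtraction is done
 * in unsigned arithmetic and then interpreted as signed, so a wrapped
 * counter still compares as "newer".
 */
static bool syn_ts_is_stale(const struct host_stamp *hs, uint32_t syn_ts)
{
	assert(hs != NULL);
	return (int32_t)(hs->last_ts - syn_ts) > SKETCH_PAWS_WINDOW;
}
```

A request whose timestamp is older than the saved value by more than the
window is treated as stale; one that is newer, including across a wrap
of the counter, is not.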

When moving a socket to time-wait state we already verify whether timestamps
were seen on a connection. Only if that was the case we let the
time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN
will be used. But we don't verify this on incoming SYN packets. If a
connection teardown was less than TCP_PAWS_MSL seconds in the past we
cannot guarantee to not accept data packets from an old connection if
no timestamps are present. We should drop this SYN packet. This patch
closes this loophole.
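
The decision this patch arrives at can be sketched as a standalone
predicate. This is a simplified model under stated assumptions, not the
kernel function: struct peer_record, syn_passes_recycle_check() and the
SKETCH_* constants are hypothetical, and the values 60 and 1 are assumed
to match TCP_PAWS_MSL and TCP_PAWS_WINDOW.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: mirror TCP_PAWS_MSL (seconds) and TCP_PAWS_WINDOW. */
#define SKETCH_PAWS_MSL    60
#define SKETCH_PAWS_WINDOW 1

/* Hypothetical per-host record kept after a connection teardown. */
struct peer_record {
	bool     known;     /* false if nothing is stored for this host */
	uint32_t ts;        /* last timestamp value seen from the host */
	uint32_t ts_stamp;  /* local time, in seconds, when ts was saved */
};

/*
 * Returns true if the SYN may proceed. A SYN is refused when a record
 * exists, the teardown is less than SKETCH_PAWS_MSL seconds old, and
 * the SYN either carries an old timestamp or carries none at all. The
 * "none at all" case is the one this patch adds.
 */
static bool syn_passes_recycle_check(const struct peer_record *pr,
				     uint32_t now, bool saw_tstamp,
				     uint32_t syn_ts)
{
	if (!pr->known)
		return true;
	if (now - pr->ts_stamp >= SKETCH_PAWS_MSL)
		return true;
	if (!saw_tstamp)
		return false;
	return (int32_t)(pr->ts - syn_ts) <= SKETCH_PAWS_WINDOW;
}
```

In this model a SYN without timestamps arriving 10 seconds after a
teardown is refused, while the same SYN arriving 100 seconds later is
let through, because the record is then older than the MSL bound.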

Please note, this patch does not make tcp_tw_recycle in any way more
usable but only adds another safety check:
Sporadic drops of SYN packets because of reordering in the network or
in the socket backlog queues can happen. Users behind NAT trying to
connect to a tcp_tw_recycle enabled server can get caught in blackholes
and their connection requests may regularly get dropped because hosts
behind an address translator don't have synchronized tcp timestamp clocks.
tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.
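
The NAT problem can be illustrated with a small simulation. It is a
deliberately simplified model with hypothetical names (struct nat_syn,
count_dropped_syns()): every SYN appears to come from the same address,
each accepted connection is assumed to be torn down immediately with its
timestamp saved, and all events are assumed to fall within TCP_PAWS_MSL
of each other.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's TCP_PAWS_WINDOW. */
#define SKETCH_PAWS_WINDOW 1

/* One SYN as seen by the server; all clients share the NAT's address. */
struct nat_syn {
	uint32_t tsval; /* timestamp value from the sending host's clock */
};

/*
 * Counts the SYNs a per-address timestamp check would drop. An accepted
 * SYN updates the saved timestamp; a dropped one leaves it unchanged.
 */
static size_t count_dropped_syns(const struct nat_syn *syns, size_t n)
{
	bool have_saved = false;
	uint32_t saved_ts = 0;
	size_t dropped = 0;

	for (size_t i = 0; i < n; i++) {
		if (have_saved &&
		    (int32_t)(saved_ts - syns[i].tsval) > SKETCH_PAWS_WINDOW) {
			dropped++;
			continue;
		}
		saved_ts = syns[i].tsval;
		have_saved = true;
	}
	return dropped;
}
```

With two hosts behind the translator whose clocks are far apart, the
host with the lower timestamp values keeps losing its SYNs, whereas a
single host with a monotonic clock loses none.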

In general, use of tcp_tw_recycle is discouraged.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Hannes Frederic Sowa and committed by David S. Miller
a26552af 4fab9071

+11 -6
+1 -1
include/net/tcp.h
···
 void tcp_init_metrics(struct sock *sk);
 void tcp_metrics_init(void);
 bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst,
-			bool paws_check);
+			bool paws_check, bool timestamps);
 bool tcp_remember_stamp(struct sock *sk);
 bool tcp_tw_remember_stamp(struct inet_timewait_sock *tw);
 void tcp_fetch_timewait_stamp(struct sock *sk, struct dst_entry *dst);
+6 -3
net/ipv4/tcp_input.c
···
 		 * timewait bucket, so that all the necessary checks
 		 * are made in the function processing timewait state.
 		 */
-		if (tmp_opt.saw_tstamp && tcp_death_row.sysctl_tw_recycle) {
+		if (tcp_death_row.sysctl_tw_recycle) {
 			bool strict;
 
 			dst = af_ops->route_req(sk, &fl, req, &strict);
+
 			if (dst && strict &&
-			    !tcp_peer_is_proven(req, dst, true)) {
+			    !tcp_peer_is_proven(req, dst, true,
+						tmp_opt.saw_tstamp)) {
 				NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
 				goto drop_and_release;
 			}
···
 		else if (!sysctl_tcp_syncookies &&
 			 (sysctl_max_syn_backlog - inet_csk_reqsk_queue_len(sk) <
 			  (sysctl_max_syn_backlog >> 2)) &&
-			 !tcp_peer_is_proven(req, dst, false)) {
+			 !tcp_peer_is_proven(req, dst, false,
+					     tmp_opt.saw_tstamp)) {
 			/* Without syncookies last quarter of
 			 * backlog is filled with destinations,
 			 * proven to be alive.
+4 -2
net/ipv4/tcp_metrics.c
···
 	tp->snd_cwnd_stamp = tcp_time_stamp;
 }
 
-bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst, bool paws_check)
+bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst,
+			bool paws_check, bool timestamps)
 {
 	struct tcp_metrics_block *tm;
 	bool ret;
···
 	if (paws_check) {
 		if (tm &&
 		    (u32)get_seconds() - tm->tcpm_ts_stamp < TCP_PAWS_MSL &&
-		    (s32)(tm->tcpm_ts - req->ts_recent) > TCP_PAWS_WINDOW)
+		    ((s32)(tm->tcpm_ts - req->ts_recent) > TCP_PAWS_WINDOW ||
+		     !timestamps))
 			ret = false;
 		else
 			ret = true;