Repository: git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (Linux kernel mirror, for testing)

net/tcp_fastopen: Disable active side TFO in certain scenarios

Middlebox firewall issues can potentially cause a server's data to be
blackholed after a successful 3WHS using TFO. The following are the
related reports from Apple:
https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
Slide 31 identifies an issue where the client's ACK to the server's data
sent during a TFO'd handshake is dropped:
C ---> syn-data ---> S
C <--- syn/ack ----- S
C (accept & write)
C <---- data ------- S
C ----- ACK -> X S
[retry and timeout]

https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
Slide 5 shows a similar situation, in which the server's data gets
dropped after the 3WHS:
C ---- syn-data ---> S
C <--- syn/ack ----- S
C ---- ack --------> S
S (accept & write)
C? X <- data ------ S
[retry and timeout]

This is the worst failure mode because the client cannot detect such
behavior and mitigate the situation (for example, by disabling TFO).
Failing to make progress, the application (e.g., an SSL library) may
simply time out and retry with TFO again, and the process repeats
indefinitely.

The proposed solution is to disable active TFO globally under the
following circumstances:
1. a client-side TFO socket receives an out-of-order FIN
2. a client-side TFO socket receives an out-of-order RST

We disable active-side TFO globally for 1h at first. Then, if it
happens again, we disable it for 2h, then 4h, 8h, and so on.
We reset the timeout back to 1h once a client-side TFO socket not
opened on loopback has successfully received data segments from the
server. This condition is examined during close().

The rationale behind this is that when such a firewall issue occurs,
the application running on the client should eventually close the
socket, as it is unable to get the data it is expecting; alternatively,
the application running on the server should close the socket, as it is
unable to receive any response from the client.
In either case, an out-of-order FIN or RST will be received on the
client, since the firewall will not block those frames given that they
carry no data.
We disable active TFO globally because doing so helps most when the
middlebox is close to the client, where most connections are likely to
fail.

Also, add a debug sysctl, tcp_fastopen_blackhole_timeout_sec:
the initial timeout to use when a firewall blackhole issue happens.
It can be both set and read.
Setting it to 0 disables the active-disable logic entirely.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Wei Wang and committed by David S. Miller
Commit cf1ef3f0, parent bc95cd8e
+160 -4
Documentation/networking/ip-sysctl.txt (+8)
···
 Note that that additional client or server features are only
 effective if the basic support (0x1 and 0x2) are enabled respectively.

+tcp_fastopen_blackhole_timeout_sec - INTEGER
+	Initial time period in second to disable Fastopen on active TCP sockets
+	when a TFO firewall blackhole issue happens.
+	This time period will grow exponentially when more blackhole issues
+	get detected right after Fastopen is re-enabled and will reset to
+	initial value when the blackhole issue goes away.
+	By default, it is set to 1hr.
+
 tcp_syn_retries - INTEGER
 	Number of times initial SYNs for an active TCP connection attempt
 	will be retransmitted. Should not be higher than 127. Default value
include/linux/tcp.h (+1)
···
 	u8	syn_data:1,	/* SYN includes data */
 		syn_fastopen:1,	/* SYN includes Fast Open option */
 		syn_fastopen_exp:1,/* SYN includes Fast Open exp. option */
+		syn_fastopen_ch:1, /* Active TFO re-enabling probe */
 		syn_data_acked:1,/* data in SYN is acked by SYN-ACK */
 		save_syn:1,	/* Save headers of SYN packet */
 		is_cwnd_limited:1;/* forward progress limited by snd_cwnd? */
include/net/tcp.h (+6)
···
 	struct rcu_head	rcu;
 };

+extern unsigned int sysctl_tcp_fastopen_blackhole_timeout;
+void tcp_fastopen_active_disable(void);
+bool tcp_fastopen_active_should_disable(struct sock *sk);
+void tcp_fastopen_active_disable_ofo_check(struct sock *sk);
+void tcp_fastopen_active_timeout_reset(void);
+
 /* Latencies incurred by various limits for a sender. They are
  * chronograph-like stats that are mutually exclusive.
  */
net/ipv4/sysctl_net_ipv4.c (+21)
···
 	return ret;
 }

+static int proc_tfo_blackhole_detect_timeout(struct ctl_table *table,
+					     int write,
+					     void __user *buffer,
+					     size_t *lenp, loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (write && ret == 0)
+		tcp_fastopen_active_timeout_reset();
+	return ret;
+}
+
 static struct ctl_table ipv4_table[] = {
 	{
 		.procname	= "tcp_timestamps",
···
 		.mode		= 0600,
 		.maxlen		= ((TCP_FASTOPEN_KEY_LENGTH * 2) + 10),
 		.proc_handler	= proc_tcp_fastopen_key,
+	},
+	{
+		.procname	= "tcp_fastopen_blackhole_timeout_sec",
+		.data		= &sysctl_tcp_fastopen_blackhole_timeout,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_tfo_blackhole_detect_timeout,
+		.extra1		= &zero,
 	},
 	{
 		.procname	= "tcp_abort_on_overflow",
net/ipv4/tcp.c (+1)
···
 	tcp_clear_xmit_timers(sk);
 	__skb_queue_purge(&sk->sk_receive_queue);
 	tcp_write_queue_purge(sk);
+	tcp_fastopen_active_disable_ofo_check(sk);
 	skb_rbtree_purge(&tp->out_of_order_queue);

 	inet->inet_dport = 0;
net/ipv4/tcp_fastopen.c (+101)
···
 		cookie->len = -1;
 		return false;
 	}
+
+	/* Firewall blackhole issue check */
+	if (tcp_fastopen_active_should_disable(sk)) {
+		cookie->len = -1;
+		return false;
+	}
+
 	if (sysctl_tcp_fastopen & TFO_CLIENT_NO_COOKIE) {
 		cookie->len = -1;
 		return true;
···
 	return false;
 }
 EXPORT_SYMBOL(tcp_fastopen_defer_connect);
+
+/*
+ * The following code block is to deal with middle box issues with TFO:
+ * Middlebox firewall issues can potentially cause server's data being
+ * blackholed after a successful 3WHS using TFO.
+ * The proposed solution is to disable active TFO globally under the
+ * following circumstances:
+ *   1. client side TFO socket receives out of order FIN
+ *   2. client side TFO socket receives out of order RST
+ * We disable active side TFO globally for 1hr at first. Then if it
+ * happens again, we disable it for 2h, then 4h, 8h, ...
+ * And we reset the timeout back to 1hr when we see a successful active
+ * TFO connection with data exchanges.
+ */
+
+/* Default to 1hr */
+unsigned int sysctl_tcp_fastopen_blackhole_timeout __read_mostly = 60 * 60;
+
+static atomic_t tfo_active_disable_times __read_mostly = ATOMIC_INIT(0);
+static unsigned long tfo_active_disable_stamp __read_mostly;
+
+/* Disable active TFO and record current jiffies and
+ * tfo_active_disable_times
+ */
+void tcp_fastopen_active_disable(void)
+{
+	atomic_inc(&tfo_active_disable_times);
+	tfo_active_disable_stamp = jiffies;
+}
+
+/* Reset tfo_active_disable_times to 0 */
+void tcp_fastopen_active_timeout_reset(void)
+{
+	atomic_set(&tfo_active_disable_times, 0);
+}
+
+/* Calculate timeout for tfo active disable
+ * Return true if we are still in the active TFO disable period
+ * Return false if timeout already expired and we should use active TFO
+ */
+bool tcp_fastopen_active_should_disable(struct sock *sk)
+{
+	int tfo_da_times = atomic_read(&tfo_active_disable_times);
+	int multiplier;
+	unsigned long timeout;
+
+	if (!tfo_da_times)
+		return false;
+
+	/* Limit timeout to max: 2^6 * initial timeout */
+	multiplier = 1 << min(tfo_da_times - 1, 6);
+	timeout = multiplier * sysctl_tcp_fastopen_blackhole_timeout * HZ;
+	if (time_before(jiffies, tfo_active_disable_stamp + timeout))
+		return true;
+
+	/* Mark check bit so we can check for successful active TFO
+	 * condition and reset tfo_active_disable_times
+	 */
+	tcp_sk(sk)->syn_fastopen_ch = 1;
+	return false;
+}
+
+/* Disable active TFO if FIN is the only packet in the ofo queue
+ * and no data is received.
+ * Also check if we can reset tfo_active_disable_times if data is
+ * received successfully on a marked active TFO sockets opened on
+ * a non-loopback interface.
+ */
+void tcp_fastopen_active_disable_ofo_check(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct rb_node *p;
+	struct sk_buff *skb;
+	struct dst_entry *dst;
+
+	if (!tp->syn_fastopen)
+		return;
+
+	if (!tp->data_segs_in) {
+		p = rb_first(&tp->out_of_order_queue);
+		if (p && !rb_next(p)) {
+			skb = rb_entry(p, struct sk_buff, rbnode);
+			if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
+				tcp_fastopen_active_disable();
+				return;
+			}
+		}
+	} else if (tp->syn_fastopen_ch &&
+		   atomic_read(&tfo_active_disable_times)) {
+		dst = sk_dst_get(sk);
+		if (!(dst && dst->dev && (dst->dev->flags & IFF_LOOPBACK)))
+			tcp_fastopen_active_timeout_reset();
+		dst_release(dst);
+	}
+}
net/ipv4/tcp_input.c (+19 -4)
···
 	if (rst_seq_match)
 		tcp_reset(sk);
-	else
+	else {
+		/* Disable TFO if RST is out-of-order
+		 * and no data has been received
+		 * for current active TFO socket
+		 */
+		if (tp->syn_fastopen && !tp->data_segs_in &&
+		    sk->sk_state == TCP_ESTABLISHED)
+			tcp_fastopen_active_disable();
 		tcp_send_challenge_ack(sk, skb);
+	}
 	goto discard;
 }
···
 		break;
 	}

-	if (tp->linger2 < 0 ||
-	    (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
-	     after(TCP_SKB_CB(skb)->end_seq - th->fin, tp->rcv_nxt))) {
+	if (tp->linger2 < 0) {
+		tcp_done(sk);
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONDATA);
+		return 1;
+	}
+	if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
+	    after(TCP_SKB_CB(skb)->end_seq - th->fin, tp->rcv_nxt)) {
+		/* Receive out of order FIN after close() */
+		if (tp->syn_fastopen && th->fin)
+			tcp_fastopen_active_disable();
 		tcp_done(sk);
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONDATA);
 		return 1;
net/ipv4/tcp_ipv4.c (+3)
···
 	/* Cleanup up the write buffer. */
 	tcp_write_queue_purge(sk);

+	/* Check if we want to disable active TFO */
+	tcp_fastopen_active_disable_ofo_check(sk);
+
 	/* Cleans up our, hopefully empty, out_of_order_queue. */
 	skb_rbtree_purge(&tp->out_of_order_queue);