Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tcp: add new tcp_mtu_probe_floor sysctl

The current implementation of TCP MTU probing can considerably
underestimate the MTU on lossy connections, allowing the MSS to get down
to 48. We have found that in almost all of these cases on our networks the
paths can handle much larger MTUs, meaning the connections are being
artificially limited. Even though TCP MTU probing can raise the MSS back
up, we have seen that it often does not, leaving connections "stuck" with
an MSS of 48 when heavy loss is present.

Prior to pushing out this change we could not keep TCP MTU probing enabled
because of the above reasons. Now, with a reasonable floor set, we've had
it enabled for the past 6 months.

The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
administrators the ability to control the floor of MSS probing.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Josh Hunt and committed by David S. Miller
c04b79b6 3a5e5234

+18 -1
+6
Documentation/networking/ip-sysctl.txt
 	Path MTU discovery (MTU probing). If MTU probing is enabled,
 	this is the initial MSS used by the connection.
 
+tcp_mtu_probe_floor - INTEGER
+	If MTU probing is enabled this caps the minimum MSS used for search_low
+	for the connection.
+
+	Default : 48
+
 tcp_min_snd_mss - INTEGER
 	TCP SYN and SYNACK messages usually advertise an ADVMSS option,
 	as described in RFC 1122 and RFC 6691.
+1
include/net/netns/ipv4.h
 	int sysctl_tcp_l3mdev_accept;
 #endif
 	int sysctl_tcp_mtu_probing;
+	int sysctl_tcp_mtu_probe_floor;
 	int sysctl_tcp_base_mss;
 	int sysctl_tcp_min_snd_mss;
 	int sysctl_tcp_probe_threshold;
+9
net/ipv4/sysctl_net_ipv4.c
 		.extra2		= &tcp_min_snd_mss_max,
 	},
 	{
+		.procname	= "tcp_mtu_probe_floor",
+		.data		= &init_net.ipv4.sysctl_tcp_mtu_probe_floor,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &tcp_min_snd_mss_min,
+		.extra2		= &tcp_min_snd_mss_max,
+	},
+	{
 		.procname	= "tcp_probe_threshold",
 		.data		= &init_net.ipv4.sysctl_tcp_probe_threshold,
 		.maxlen		= sizeof(int),
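
The new table entry reuses the bounds of tcp_min_snd_mss (extra1/extra2) with proc_dointvec_minmax, which refuses writes outside that range. A rough user-space sketch of that check follows. It is only an illustration: set_probe_floor is a hypothetical helper, and the upper bound of 65535 is an assumption about the value of tcp_min_snd_mss_max, which this diff does not show.

```c
#include <errno.h>

#define FLOOR_MIN 48     /* TCP_MIN_SND_MSS, per the commit message */
#define FLOOR_MAX 65535  /* ASSUMED value of tcp_min_snd_mss_max */

/*
 * Hypothetical user-space model of a range-checked sysctl write.
 * An out-of-range value is rejected and the stored value is left
 * untouched; an in-range value replaces it.
 */
static int set_probe_floor(int *floor, int val)
{
	if (val < FLOOR_MIN || val > FLOOR_MAX)
		return -EINVAL;
	*floor = val;
	return 0;
}
```

Under these assumptions, writing 512 succeeds, while writing 40 fails and keeps the previous floor in place.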
+1
net/ipv4/tcp_ipv4.c
 	net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS;
 	net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD;
 	net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL;
+	net->ipv4.sysctl_tcp_mtu_probe_floor = TCP_MIN_SND_MSS;
 
 	net->ipv4.sysctl_tcp_keepalive_time = TCP_KEEPALIVE_TIME;
 	net->ipv4.sysctl_tcp_keepalive_probes = TCP_KEEPALIVE_PROBES;
+1 -1
net/ipv4/tcp_timer.c
 	} else {
 		mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
 		mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
-		mss = max(mss, 68 - tcp_sk(sk)->tcp_header_len);
+		mss = max(mss, net->ipv4.sysctl_tcp_mtu_probe_floor);
 		mss = max(mss, net->ipv4.sysctl_tcp_min_snd_mss);
 		icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
 	}
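
To see how the floor changes the outcome, the clamp sequence in this hunk can be modeled in user space. The sketch below is a simplification, not kernel code: next_search_low_mss, imin and imax are hypothetical names, all values are treated as MSS in bytes, and the MTU/MSS conversions done by tcp_mtu_to_mss() and tcp_mss_to_mtu() are left out.

```c
/* Small stand-ins for the kernel's min()/max() macros. */
static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/*
 * Hypothetical model of one search_low update after the patch:
 * halve the current MSS, cap it at base_mss, then raise it to the
 * probe floor and to min_snd_mss.
 */
static int next_search_low_mss(int cur_mss, int base_mss,
			       int probe_floor, int min_snd_mss)
{
	int mss = cur_mss >> 1;

	mss = imin(base_mss, mss);
	mss = imax(mss, probe_floor);
	mss = imax(mss, min_snd_mss);
	return mss;
}
```

In this model, with the default floor of 48, repeated updates starting from an MSS of 1460 step down through 730, 365, 182 and 91 before settling at 48. With a floor of 512 the sequence stops at 512 after the second update, which is the behavior the commit message relies on.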