Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tcp: allow for bigger reordering level

While testing Yaogong's upcoming patch (converting the out-of-order queue
into an RB tree), I hit the maximum reordering level of the Linux TCP stack.

The reordering level was capped at 127 for no good reason, and some
network setups [1] can easily reach this limit and suffer reduced
throughput.

Raise the maximum to 300, and add a sysctl (tcp_max_reordering) so that
admins can allow bigger (or lower) values if needed.

[1] Aggregation of links, per-packet load balancing, fabrics not doing
deep packet inspection, alternative TCP congestion modules...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Eric Dumazet and committed by David S. Miller
dca145ff 7aef06db

6 files changed, 23 insertions(+), 12 deletions(-)
Documentation/networking/bonding.txt (+2 -5)
@@ -2230,11 +2230,8 @@
 
 	It is possible to adjust TCP/IP's congestion limits by
 	altering the net.ipv4.tcp_reordering sysctl parameter. The
-	usual default value is 3, and the maximum useful value is 127.
-	For a four interface balance-rr bond, expect that a single
-	TCP/IP stream will utilize no more than approximately 2.3
-	interface's worth of throughput, even after adjusting
-	tcp_reordering.
+	usual default value is 3. But keep in mind TCP stack is able
+	to automatically increase this when it detects reorders.
 
 	Note that the fraction of packets that will be delivered out of
 	order is highly variable, and is unlikely to be zero. The level
Documentation/networking/ip-sysctl.txt (+9 -1)
@@ -376,8 +376,16 @@
 	may consume significant resources. Cf. tcp_max_orphans.
 
 tcp_reordering - INTEGER
-	Maximal reordering of packets in a TCP stream.
+	Initial reordering level of packets in a TCP stream.
+	TCP stack can then dynamically adjust flow reordering level
+	between this initial value and tcp_max_reordering
 	Default: 3
+
+tcp_max_reordering - INTEGER
+	Maximal reordering level of packets in a TCP stream.
+	300 is a fairly conservative value, but you might increase it
+	if paths are using per packet load balancing (like bonding rr mode)
+	Default: 300
 
 tcp_retrans_collapse - BOOLEAN
 	Bug-to-bug compatibility with some broken printers.
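Once this patch is in the running kernel, the new limit is exposed next to tcp_reordering under /proc/sys/net/ipv4/. The sketch below reads it from userspace. It assumes procfs is mounted at /proc; the helpers read_int_sysctl and get_tcp_max_reordering are hypothetical names chosen for illustration, and on a kernel without this patch the file is absent, so the helper simply returns -1.

```c
#include <assert.h>
#include <stdio.h>

/* procfs path of the sysctl added by this patch (net.ipv4.tcp_max_reordering). */
#define TCP_MAX_REORDERING_PATH "/proc/sys/net/ipv4/tcp_max_reordering"

/* Hypothetical helper: parse one integer sysctl value from an open stream.
 * Returns 0 on success, -1 if no integer could be parsed. */
static int read_int_sysctl(FILE *f, int *out)
{
	assert(f != NULL && out != NULL);
	return fscanf(f, "%d", out) == 1 ? 0 : -1;
}

/* Hypothetical helper: read net.ipv4.tcp_max_reordering.
 * Returns -1 if the file is missing (e.g. an older kernel) or unparsable. */
static int get_tcp_max_reordering(int *out)
{
	FILE *f = fopen(TCP_MAX_REORDERING_PATH, "r");
	int ret;

	if (!f)
		return -1;
	ret = read_int_sysctl(f, out);
	fclose(f);
	return ret;
}
```

From a shell, `sysctl net.ipv4.tcp_max_reordering` reads the same value and `sysctl -w net.ipv4.tcp_max_reordering=N` (as root) changes it. Because the table entry added in sysctl_net_ipv4.c uses plain proc_dointvec, the kernel does not range-check the value that is written.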
include/linux/tcp.h (+2 -2)
@@ -204,10 +204,10 @@
 
 	u16 urg_data; /* Saved octet of OOB data and control flags */
 	u8 ecn_flags; /* ECN status bits. */
-	u8 reordering; /* Packet reordering metric. */
+	u8 keepalive_probes; /* num of allowed keep alive probes */
+	u32 reordering; /* Packet reordering metric. */
 	u32 snd_up; /* Urgent pointer */
 
-	u8 keepalive_probes; /* num of allowed keep alive probes */
 /*
  * Options received (usually on last packet, some only on SYN packets).
  */
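The reordering field has to grow from u8 to u32 because an 8-bit field tops out at 255, below the new default cap of 300. Moving keepalive_probes next to ecn_flags appears to fill the byte left free, so the wider field does not add padding. The userspace sketch below shows what would go wrong with the old width; the u8/u32 typedefs are assumed stand-ins for the kernel types of the same size, and store_reordering_u8/store_reordering_u32 are hypothetical helpers.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types (assumption). */
typedef uint8_t u8;
typedef uint32_t u32;

/* The new default cap does not fit in an 8-bit field. */
static_assert(UINT8_MAX < 300, "u8 cannot represent a reordering level of 300");

/* Old layout: converting an int metric to u8 reduces it modulo 256. */
static u8 store_reordering_u8(int metric)
{
	return (u8)metric;
}

/* New layout: a u32 holds any non-negative int metric unchanged. */
static u32 store_reordering_u32(int metric)
{
	return (u32)metric;
}
```

With the old u8 field, a level of 300 would silently become 44 (300 mod 256), while 127, the former cap, still fit.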
include/net/tcp.h (+1 -3)
@@ -70,9 +70,6 @@
 /* After receiving this amount of duplicate ACKs fast retransmit starts. */
 #define TCP_FASTRETRANS_THRESH 3
 
-/* Maximal reordering. */
-#define TCP_MAX_REORDERING 127
-
 /* Maximal number of ACKs sent quickly to accelerate slow-start. */
 #define TCP_MAX_QUICKACKS 16U
 
@@ -249,6 +252,7 @@
 extern int sysctl_tcp_max_orphans;
 extern int sysctl_tcp_fack;
 extern int sysctl_tcp_reordering;
+extern int sysctl_tcp_max_reordering;
 extern int sysctl_tcp_dsack;
 extern long sysctl_tcp_mem[3];
 extern int sysctl_tcp_wmem[3];
net/ipv4/sysctl_net_ipv4.c (+7)
@@ -496,6 +496,13 @@
 		.proc_handler = proc_dointvec
 	},
 	{
+		.procname = "tcp_max_reordering",
+		.data = &sysctl_tcp_max_reordering,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec
+	},
+	{
 		.procname = "tcp_dsack",
 		.data = &sysctl_tcp_dsack,
 		.maxlen = sizeof(int),
net/ipv4/tcp_input.c (+2 -1)
@@ -81,6 +81,7 @@
 int sysctl_tcp_sack __read_mostly = 1;
 int sysctl_tcp_fack __read_mostly = 1;
 int sysctl_tcp_reordering __read_mostly = TCP_FASTRETRANS_THRESH;
+int sysctl_tcp_max_reordering __read_mostly = 300;
 EXPORT_SYMBOL(sysctl_tcp_reordering);
 int sysctl_tcp_dsack __read_mostly = 1;
 int sysctl_tcp_app_win __read_mostly = 31;
@@ -834,7 +833,7 @@
 	if (metric > tp->reordering) {
 		int mib_idx;
 
-		tp->reordering = min(TCP_MAX_REORDERING, metric);
+		tp->reordering = min(sysctl_tcp_max_reordering, metric);
 
 		/* This exciting event is worth to be remembered. 8) */
 		if (ts)
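The behavioural change in tcp_input.c is that the clamp on a flow's reordering level now comes from a run-time sysctl rather than a compile-time constant. The sketch below models only that comparison and clamp in userspace. It is a simplification under stated assumptions: max_reordering is a hypothetical stand-in for sysctl_tcp_max_reordering, update_reordering is a hypothetical helper, and the MIB counter selection and other bookkeeping done by the real kernel function are omitted.

```c
#include <assert.h>

/* Hypothetical stand-in for sysctl_tcp_max_reordering; 300 is the default
 * this patch assigns. Before the patch the cap was the constant
 * TCP_MAX_REORDERING (127). */
static int max_reordering = 300;

/* Simplified model of the hunk above: the level is only updated when the
 * observed metric exceeds it, and the new value is min(max, metric). */
static unsigned int update_reordering(unsigned int reordering, int metric)
{
	/* This model assumes metrics are never negative. */
	assert(metric >= 0);

	if ((unsigned int)metric > reordering)
		reordering = (unsigned int)(metric < max_reordering ?
					    metric : max_reordering);
	return reordering;
}
```

Under the old constant, an observed metric of 500 would have been clamped to 127; with the default sysctl it is clamped to 300, and an admin can move that bound up or down at run time without rebuilding the kernel.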