Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tcp: gso: do not generate out of order packets

The GSO TCP handler has the following issues:

1) ooo_okay from the original GSO packet is duplicated to all segments.
2) All segments but the last one are orphaned, so the transmit path
   cannot get the transmit queue number from the socket. This happens,
   for example, if GSO segmentation is done before a stacked device.

As a result, packets from a given TCP flow can be sent to different TX
queues (when using multiqueue NICs). This causes out-of-order (OOO)
delivery and spurious SACKs and retransmits.

Fix this by keeping the socket pointer set for all segments.

This means that every segment must also have a destructor, and the
original gso skb truesize must be split across all segments, to keep
sk->sk_wmem_alloc accounting exact.
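
The truesize split described above can be sketched in plain userspace C. This is only an illustration, not kernel code: `struct seg_acct`, `split_truesize()` and `sum_truesize()` are hypothetical names, and the sketch assumes the original truesize is at least `mss` times the number of non-final segments. Every segment except the last is charged `mss` bytes and the last one takes the remainder, so the per-segment values sum exactly to the original.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff fields involved; not the kernel struct. */
struct seg_acct {
	unsigned int truesize;
	struct seg_acct *next;
};

/*
 * Charge mss bytes of "true size" to every segment except the last,
 * and give the last segment whatever remains of gso_truesize.
 * Assumption: gso_truesize >= mss * (number of segments - 1).
 */
static void split_truesize(struct seg_acct *segs, unsigned int gso_truesize,
			   unsigned int mss)
{
	struct seg_acct *s = segs;

	if (!s)
		return;

	while (s->next) {
		s->truesize = mss;
		gso_truesize -= mss;
		s = s->next;
	}
	/* Last segment: takes the remainder so the sum stays exact. */
	s->truesize = gso_truesize;
}

/* Sum of truesize over the list; should equal the original gso truesize. */
static unsigned int sum_truesize(const struct seg_acct *segs)
{
	unsigned int sum = 0;

	for (; segs; segs = segs->next)
		sum += segs->truesize;
	return sum;
}
```

Because each segment now carries its own destructor, each one releases only its own share when it is freed, so the total released matches what was originally charged to the socket.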

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Eric Dumazet and committed by David S. Miller
6ff50cd5 5c4b2749

+21 -1
net/ipv4/tcp.c
···
 	unsigned int mss;
 	struct sk_buff *gso_skb = skb;
 	__sum16 newcheck;
+	bool ooo_okay, copy_destructor;
 
 	if (!pskb_may_pull(skb, sizeof(*th)))
 		goto out;
···
 		goto out;
 	}
 
+	copy_destructor = gso_skb->destructor == tcp_wfree;
+	ooo_okay = gso_skb->ooo_okay;
+	/* All segments but the first should have ooo_okay cleared */
+	skb->ooo_okay = 0;
+
 	segs = skb_segment(skb, features);
 	if (IS_ERR(segs))
 		goto out;
+
+	/* Only first segment might have ooo_okay set */
+	segs->ooo_okay = ooo_okay;
 
 	delta = htonl(oldlen + (thlen + mss));
 
···
 						thlen, skb->csum));
 
 		seq += mss;
+		if (copy_destructor) {
+			skb->destructor = gso_skb->destructor;
+			skb->sk = gso_skb->sk;
+			/* {tcp|sock}_wfree() use exact truesize accounting :
+			 * sum(skb->truesize) MUST be exactly be gso_skb->truesize
+			 * So we account mss bytes of 'true size' for each segment.
+			 * The last segment will contain the remaining.
+			 */
+			skb->truesize = mss;
+			gso_skb->truesize -= mss;
+		}
 		skb = skb->next;
 		th = tcp_hdr(skb);
 
···
 	 * is freed at TX completion, and not right now when gso_skb
 	 * is freed by GSO engine
 	 */
-	if (gso_skb->destructor == tcp_wfree) {
+	if (copy_destructor) {
 		swap(gso_skb->sk, skb->sk);
 		swap(gso_skb->destructor, skb->destructor);
 		swap(gso_skb->truesize, skb->truesize);
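
The ooo_okay handling in this change can be pictured with a small userspace sketch. It is an illustration under assumptions, not the kernel implementation: `struct seg_flag` and `propagate_ooo_okay()` are hypothetical names. The patch clears the flag on the GSO skb before segmentation so the copies inherit a cleared flag, then restores the saved value on the first segment; the sketch reaches the same end state by clearing every segment explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one sk_buff flag involved; not the kernel struct. */
struct seg_flag {
	bool ooo_okay;
	struct seg_flag *next;
};

/*
 * Leave ooo_okay set on at most the first segment of the list.
 * gso_ooo_okay is the value saved from the original GSO packet.
 */
static void propagate_ooo_okay(struct seg_flag *segs, bool gso_ooo_okay)
{
	struct seg_flag *s;

	/* No segment after the first may allow a queue change. */
	for (s = segs; s; s = s->next)
		s->ooo_okay = false;

	/* Only the first segment inherits the original flag. */
	if (segs)
		segs->ooo_okay = gso_ooo_okay;
}
```

With the flag limited to the first segment, a transmit queue change can only be decided at the start of the burst, and the following segments of the flow stay on the same queue.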