Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ip: take care of last fragment in ip_append_data

While investigating, I found that the ip_fragment() slow path was taken
because ip_append_data() produces the following layout for a send(MTU +
N*(MTU - 20)) syscall:

- one skb with 1500 (mtu) bytes
- N fragments of 1480 (mtu-20) bytes (before adding IP header)

The last fragment gets 17 bytes of trailer space because of the following bit:

	if (datalen == length + fraggap)
		alloclen += rt->dst.trailer_len;

Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm...
another bug?)

In ip_fragment(), we notice that the last fragment is too big
((1496 + 20) > mtu), so we take the slow path, building another skb chain.
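
The arithmetic behind that observation can be sketched in plain userspace C. This is only an illustration, assuming the numbers quoted above (1480 payload bytes, 16 bytes added by esp4, a 20 byte IP header, mtu 1500); the helper names last_frag_wire_len() and exceeds_mtu() are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: size of the last fragment once the transform
 * has appended its trailer and the IP header has been added. */
static unsigned int last_frag_wire_len(unsigned int payload,
				       unsigned int trailer,
				       unsigned int ip_hdr_len)
{
	return payload + trailer + ip_hdr_len;
}

/* Hypothetical helper: true when the fragment no longer fits in the
 * mtu, the condition the message above ties to the slow path. */
static bool exceeds_mtu(unsigned int wire_len, unsigned int mtu)
{
	assert(mtu > 0);
	return wire_len > mtu;
}
```

With the values from the message, last_frag_wire_len(1480, 16, 20) gives 1516, and exceeds_mtu(1516, 1500) is true, which matches the (1496 + 20) > mtu check described above.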

To avoid taking the slow path, we should correct ip_append_data()
so that the last fragment has real trailer space while staying under the mtu.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Eric Dumazet and committed by David S. Miller
59104f06 a02cec21

+6 -3
net/ipv4/ip_output.c
@@ -926,16 +926,19 @@
 			    !(rt->dst.dev->features&NETIF_F_SG))
 				alloclen = mtu;
 			else
-				alloclen = datalen + fragheaderlen;
+				alloclen = fraglen;
 
 			/* The last fragment gets additional space at tail.
 			 * Note, with MSG_MORE we overallocate on fragments,
 			 * because we have no idea what fragment will be
 			 * the last.
 			 */
-			if (datalen == length + fraggap)
+			if (datalen == length + fraggap) {
 				alloclen += rt->dst.trailer_len;
-
+				/* make sure mtu is not reached */
+				if (datalen > mtu - fragheaderlen - rt->dst.trailer_len)
+					datalen -= ALIGN(rt->dst.trailer_len, 8);
+			}
 			if (transhdrlen) {
 				skb = sock_alloc_send_skb(sk,
 						alloclen + hh_len + 15,
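
To see what the added check does with the numbers from the commit message, here is a minimal userspace sketch. It is not the kernel code: ALIGN_UP() is a local stand-in assumed to behave like the kernel's ALIGN() for power-of-two alignments, and trim_last_datalen() is a hypothetical helper that only mirrors the new condition.

```c
#include <assert.h>

/* Local stand-in for the kernel's ALIGN(): round x up to a multiple
 * of a, where a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical helper mirroring the check added by the patch: if the
 * last fragment's data would not leave trailer_len bytes free under
 * the mtu, shrink it by trailer_len rounded up to a multiple of 8. */
static unsigned int trim_last_datalen(unsigned int datalen,
				      unsigned int mtu,
				      unsigned int fragheaderlen,
				      unsigned int trailer_len)
{
	/* Sketch only: assumes the mtu can hold header plus trailer. */
	assert(mtu >= fragheaderlen + trailer_len);

	if (datalen > mtu - fragheaderlen - trailer_len)
		datalen -= ALIGN_UP(trailer_len, 8u);
	return datalen;
}
```

For mtu 1500, fragheaderlen 20 and trailer_len 17, a 1480 byte datalen exceeds 1463 and is reduced by ALIGN_UP(17, 8) = 24 to 1456. Adding the 20 byte header and the 16 bytes esp4 appends gives 1492, which stays under the mtu. Rounding to a multiple of 8 presumably keeps the data that moves to the next fragment on an 8 byte boundary, since IP fragment offsets are expressed in 8 byte units.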