Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ipv4: allow local fragmentation in ip_finish_output_gso()

Some configurations (e.g. geneve interface with default
MTU of 1500 over an ethernet interface with 1500 MTU) result
in the transmission of packets that exceed the configured MTU.
While this should be considered to be a "bad" configuration,
it is still allowed and should not result in the sending
of packets that exceed the configured MTU.

Fix by dropping the assumption in ip_finish_output_gso() that
locally originated gso packets will never need fragmentation.
Basic testing using iperf (observing CPU usage and bandwidth)
has shown no measurable performance impact for traffic not
requiring fragmentation.

Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack")
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Lance Richardson and committed by David S. Miller
9ee6c5dc da96786e

+5 -19
+1 -2
include/net/ip.h
···
 #define IPSKB_REROUTED BIT(4)
 #define IPSKB_DOREDIRECT BIT(5)
 #define IPSKB_FRAG_PMTU BIT(6)
-#define IPSKB_FRAG_SEGS BIT(7)
-#define IPSKB_L3SLAVE BIT(8)
+#define IPSKB_L3SLAVE BIT(7)
 
 	u16 frag_max_size;
 };
+1 -1
net/ipv4/ip_forward.c
···
 	if (opt->is_strictroute && rt->rt_uses_gateway)
 		goto sr_failed;
 
-	IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
+	IPCB(skb)->flags |= IPSKB_FORWARDED;
 	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
 	if (ip_exceeds_mtu(skb, mtu)) {
 		IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
+2 -4
net/ipv4/ip_output.c
···
 	struct sk_buff *segs;
 	int ret = 0;
 
-	/* common case: fragmentation of segments is not allowed,
-	 * or seglen is <= mtu
+	/* common case: seglen is <= mtu
 	 */
-	if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
-	      skb_gso_validate_mtu(skb, mtu))
+	if (skb_gso_validate_mtu(skb, mtu))
 		return ip_finish_output2(net, sk, skb);
 
 	/* Slowpath - GSO segment length is exceeding the dst MTU.
-11
net/ipv4/ip_tunnel_core.c
···
 	int pkt_len = skb->len - skb_inner_network_offset(skb);
 	struct net *net = dev_net(rt->dst.dev);
 	struct net_device *dev = skb->dev;
-	int skb_iif = skb->skb_iif;
 	struct iphdr *iph;
 	int err;
 
···
 	skb_clear_hash_if_not_l4(skb);
 	skb_dst_set(skb, &rt->dst);
 	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
-
-	if (skb_iif && !(df & htons(IP_DF))) {
-		/* Arrived from an ingress interface, got encapsulated, with
-		 * fragmentation of encapulating frames allowed.
-		 * If skb is gso, the resulting encapsulated network segments
-		 * may exceed dst mtu.
-		 * Allow IP Fragmentation of segments.
-		 */
-		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
-	}
 
 	/* Push down and install the IP header. */
 	skb_push(skb, sizeof(struct iphdr));
+1 -1
net/ipv4/ipmr.c
···
 		vif->dev->stats.tx_bytes += skb->len;
 	}
 
-	IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
+	IPCB(skb)->flags |= IPSKB_FORWARDED;
 
 	/* RFC1584 teaches, that DVMRP/PIM router must deliver packets locally
 	 * not only before forwarding, but after forwarding on all output