Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

udp: Allow GSO transmit from devices with no checksum offload

Today sending a UDP GSO packet from a TUN device results in an EIO error:

import fcntl, os, struct
from socket import *

TUNSETIFF = 0x400454CA
IFF_TUN = 0x0001
IFF_NO_PI = 0x1000
UDP_SEGMENT = 103

tun_fd = os.open("/dev/net/tun", os.O_RDWR)
ifr = struct.pack("16sH", b"tun0", IFF_TUN | IFF_NO_PI)
fcntl.ioctl(tun_fd, TUNSETIFF, ifr)

os.system("ip addr add 192.0.2.1/24 dev tun0")
os.system("ip link set dev tun0 up")

s = socket(AF_INET, SOCK_DGRAM)
s.setsockopt(SOL_UDP, UDP_SEGMENT, 1200)
s.sendto(b"x" * 3000, ("192.0.2.2", 9)) # EIO

This is due to a check in the UDP stack for whether the egress device offers
checksum offload. TUN/TAP devices, by default, don't advertise this
capability because it requires support from the TUN/TAP reader.

However, the GSO stack has a software fallback for checksum calculation,
which we can use. This way we don't force UDP_SEGMENT users to handle the
EIO error and implement a segmentation fallback.
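
For context, the user-space fallback that this change makes unnecessary might look roughly like the sketch below. It is an illustration only, not part of the patch: the helper names split_segments and send_segmented are hypothetical, and the error handling assumes the pre-patch behavior where sendto() fails with EIO.

```python
import errno
import socket

UDP_SEGMENT = 103  # same value as in the reproducer above


def split_segments(payload: bytes, gso_size: int) -> list[bytes]:
    """Split payload into gso_size-byte chunks; the last one may be shorter."""
    if gso_size <= 0:
        raise ValueError("gso_size must be positive")
    return [payload[i:i + gso_size] for i in range(0, len(payload), gso_size)]


def send_segmented(sock: socket.socket, payload: bytes, addr, gso_size: int) -> int:
    """Try one UDP_SEGMENT send; on EIO, send each segment separately.

    Returns the number of sendto() calls that were made.
    """
    try:
        # IPPROTO_UDP has the same value as SOL_UDP on Linux.
        sock.setsockopt(socket.IPPROTO_UDP, UDP_SEGMENT, gso_size)
        sock.sendto(payload, addr)
        return 1
    except OSError as exc:
        if exc.errno != errno.EIO:
            raise
    # Fallback: turn GSO off on the socket and segment in user space.
    sock.setsockopt(socket.IPPROTO_UDP, UDP_SEGMENT, 0)
    segments = split_segments(payload, gso_size)
    for seg in segments:
        sock.sendto(seg, addr)
    return len(segments)
```

With the 3000-byte payload and 1200-byte segment size from the reproducer, split_segments yields chunks of 1200, 1200 and 600 bytes.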

Lift the restriction so that UDP_SEGMENT can be used with any egress
device. We also need to adjust the UDP GSO code to match the GSO stack's
expectation about the ip_summed field, as set in commit 8d63bee643f1 ("net:
avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit
the bad offload check.

Users should, however, expect a potential performance impact when
batch-sending packets with UDP_SEGMENT without checksum offload on the
egress device. In that case the packet payload is read twice: first during
the sendmsg syscall when copying data from user memory, and then in the GSO
stack for checksum computation. This double memory read can be less
efficient than a regular sendmsg where the checksum is calculated during
the initial data copy from user memory.
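
The difference between the two paths can be modeled in user space. The sketch below is not kernel code: the function names are hypothetical, and it uses the RFC 1071 ones' complement sum only to show that summing while copying gives the same result as copying first and re-reading the buffer afterwards.

```python
def ones_complement_sum(data, acc: int = 0) -> int:
    """Add the 16-bit big-endian words of data to acc, RFC 1071 style."""
    if len(data) % 2:
        # Pad odd-length input; only valid for the final chunk of a message.
        data = bytes(data) + b"\x00"
    for i in range(0, len(data), 2):
        acc += (data[i] << 8) | data[i + 1]
    # Fold carries back into the low 16 bits.
    while acc >> 16:
        acc = (acc & 0xFFFF) + (acc >> 16)
    return acc


def copy_then_checksum(src: bytes) -> tuple[bytearray, int]:
    """Two passes over the payload: copy it, then read the copy to sum it."""
    dst = bytearray(src)                  # pass 1: copy
    return dst, ones_complement_sum(dst)  # pass 2: re-read for the checksum


def copy_and_checksum(src: bytes, chunk: int = 64) -> tuple[bytearray, int]:
    """One pass over the payload: sum each chunk as it is copied."""
    if chunk <= 0 or chunk % 2:
        raise ValueError("chunk must be a positive even number")
    dst = bytearray()
    acc = 0
    for i in range(0, len(src), chunk):
        piece = src[i:i + chunk]
        dst += piece
        acc = ones_complement_sum(piece, acc)
    return dst, acc
```

Both functions return the same buffer and the same sum; the second one touches each payload byte once rather than twice.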

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626-linux-udpgso-v2-1-422dfcbd6b48@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Authored by Jakub Sitnicki and committed by Jakub Kicinski
10154dbd 748e3bbf

+10 -4

net/ipv4/udp.c (+1 -2)

@@ -938,8 +938,7 @@
 			kfree_skb(skb);
 			return -EINVAL;
 		}
-		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
-		    dst_xfrm(skb_dst(skb))) {
+		if (is_udplite || dst_xfrm(skb_dst(skb))) {
 			kfree_skb(skb);
 			return -EIO;
 		}

net/ipv4/udp_offload.c (+8)

@@ -357,6 +357,14 @@
 	else
 		uh->check = gso_make_checksum(seg, ~check) ? : CSUM_MANGLED_0;
 
+	/* On the TX path, CHECKSUM_NONE and CHECKSUM_UNNECESSARY have the same
+	 * meaning. However, check for bad offloads in the GSO stack expects the
+	 * latter, if the checksum was calculated in software. To vouch for the
+	 * segment skbs we actually need to set it on the gso_skb.
+	 */
+	if (gso_skb->ip_summed == CHECKSUM_NONE)
+		gso_skb->ip_summed = CHECKSUM_UNNECESSARY;
+
 	/* update refcount for the packet */
 	if (copy_dtor) {
 		int delta = sum_truesize - gso_skb->truesize;

net/ipv6/udp.c (+1 -2)

@@ -1257,8 +1257,7 @@
 			kfree_skb(skb);
 			return -EINVAL;
 		}
-		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
-		    dst_xfrm(skb_dst(skb))) {
+		if (is_udplite || dst_xfrm(skb_dst(skb))) {
 			kfree_skb(skb);
 			return -EIO;
 		}