Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tcp: fix zerocopy and notsent_lowat issues

My recent patch had at least three problems:

1) TX zerocopy wants a notification when an skb is acknowledged,
thus we need to call skb_zcopy_clear() if the skb is
cached into sk->sk_tx_skb_cache.

2) Some applications might expect precise EPOLLOUT
notifications, so we need to update sk->sk_wmem_queued
and call sk_mem_uncharge() from sk_wmem_free_skb()
in all cases. The SOCK_QUEUE_SHRUNK flag must also be set.

3) Reuse of saved skb should have used skb_cloned() instead
of simply checking if the fast clone has been freed.

Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Eric Dumazet and committed by David S. Miller
4f661542 4d5ec89f

+8 -14
+5 -4
include/net/sock.h
···
 static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 {
-	if (!sk->sk_tx_skb_cache) {
-		sk->sk_tx_skb_cache = skb;
-		return;
-	}
 	sock_set_flag(sk, SOCK_QUEUE_SHRUNK);
 	sk->sk_wmem_queued -= skb->truesize;
 	sk_mem_uncharge(sk, skb->truesize);
+	if (!sk->sk_tx_skb_cache) {
+		skb_zcopy_clear(skb, true);
+		sk->sk_tx_skb_cache = skb;
+		return;
+	}
 	__kfree_skb(skb);
 }
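The reordering in sk_wmem_free_skb() can be illustrated with a small userspace model. This is only a sketch: `toy_skb`, `toy_sock`, `toy_zcopy_clear()` and `toy_wmem_free_skb()` are hypothetical stand-ins, not kernel types or functions, and sk_mem_uncharge() is folded into the single `wmem_queued` counter for brevity. It shows that the accounting and the shrunk flag are now updated whether or not the skb ends up in the cache, and that a pending zerocopy completion is delivered before the skb is cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct toy_skb {
	int truesize;	/* bytes charged to the socket for this buffer */
	bool zcopy;	/* a zerocopy completion is still pending */
	bool freed;	/* set instead of actually releasing memory */
};

/* Hypothetical, heavily simplified stand-in for struct sock. */
struct toy_sock {
	int wmem_queued;		/* models sk->sk_wmem_queued */
	bool queue_shrunk;		/* models the SOCK_QUEUE_SHRUNK flag */
	int zcopy_notifications;	/* completions delivered so far */
	struct toy_skb *tx_skb_cache;	/* models sk->sk_tx_skb_cache */
};

/* Models skb_zcopy_clear(skb, true): deliver a pending completion once. */
static void toy_zcopy_clear(struct toy_sock *sk, struct toy_skb *skb)
{
	if (skb->zcopy) {
		skb->zcopy = false;
		sk->zcopy_notifications++;
	}
}

/* Models the patched sk_wmem_free_skb(): account first, then cache or free. */
void toy_wmem_free_skb(struct toy_sock *sk, struct toy_skb *skb)
{
	sk->queue_shrunk = true;
	sk->wmem_queued -= skb->truesize;	/* also stands in for sk_mem_uncharge() */
	if (!sk->tx_skb_cache) {
		toy_zcopy_clear(sk, skb);
		sk->tx_skb_cache = skb;
		return;
	}
	skb->freed = true;			/* stands in for __kfree_skb() */
}
```

In this model, freeing a first skb uncharges it and parks it in the cache; freeing a second one uncharges it as well and releases it, so `wmem_queued` always reflects only the data still queued.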
+3 -10
net/ipv4/tcp.c
···
 {
 	struct sk_buff *skb;
 
-	skb = sk->sk_tx_skb_cache;
-	if (skb && !size) {
-		const struct sk_buff_fclones *fclones;
-
-		fclones = container_of(skb, struct sk_buff_fclones, skb1);
-		if (refcount_read(&fclones->fclone_ref) == 1) {
-			sk->sk_wmem_queued -= skb->truesize;
-			sk_mem_uncharge(sk, skb->truesize);
+	if (likely(!size)) {
+		skb = sk->sk_tx_skb_cache;
+		if (skb && !skb_cloned(skb)) {
 			skb->truesize -= skb->data_len;
 			sk->sk_tx_skb_cache = NULL;
 			pskb_trim(skb, 0);
···
 	tcp_rtx_queue_purge(sk);
 	skb = sk->sk_tx_skb_cache;
 	if (skb) {
-		sk->sk_wmem_queued -= skb->truesize;
-		sk_mem_uncharge(sk, skb->truesize);
 		__kfree_skb(skb);
 		sk->sk_tx_skb_cache = NULL;
 	}
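The difference between the old and new reuse checks can be sketched in a userspace model. All names below (`toy_skb`, `toy_sock`, `toy_skb_cloned()`, `toy_old_can_reuse()`, `toy_take_cached()`) are hypothetical. The model assumes that skb_cloned() reports true when the buffer is marked cloned and its data reference count is not 1, and it leaves out the truesize adjustment from the real code. The state used in the usage note is an assumed example, not one taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct toy_skb {
	bool cloned;	/* models skb->cloned */
	int dataref;	/* models the reference count on the shared data */
	int fclone_ref;	/* models fclones->fclone_ref */
	int len;	/* payload length */
};

/* Hypothetical stand-in for the one field of struct sock used here. */
struct toy_sock {
	struct toy_skb *tx_skb_cache;	/* models sk->sk_tx_skb_cache */
};

/* Models skb_cloned(): true while someone else still references the data. */
bool toy_skb_cloned(const struct toy_skb *skb)
{
	return skb->cloned && skb->dataref != 1;
}

/* Models the removed check: it only asks whether the fast clone is gone. */
bool toy_old_can_reuse(const struct toy_skb *skb)
{
	return skb->fclone_ref == 1;
}

/*
 * Models the patched cache lookup: only a zero-size request may take the
 * cached skb, and only if its data is not shared. NULL means the caller
 * has to allocate a fresh buffer.
 */
struct toy_skb *toy_take_cached(struct toy_sock *sk, int size)
{
	struct toy_skb *skb;

	if (size != 0)
		return NULL;
	skb = sk->tx_skb_cache;
	if (skb && !toy_skb_cloned(skb)) {
		sk->tx_skb_cache = NULL;
		skb->len = 0;	/* stands in for pskb_trim(skb, 0) */
		return skb;
	}
	return NULL;
}
```

With a cached buffer in the assumed state `cloned = true, dataref = 2, fclone_ref = 1`, `toy_old_can_reuse()` would accept it, while `toy_take_cached()` leaves it in the cache until the data reference count drops back to 1.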