Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tcp: avoid premature drops in tcp_add_backlog()

While testing TCP performance with latest trees,
I saw suspect SOCKET_BACKLOG drops.

tcp_add_backlog() computes its limit with:

    limit = (u32)READ_ONCE(sk->sk_rcvbuf) +
            (u32)(READ_ONCE(sk->sk_sndbuf) >> 1);
    limit += 64 * 1024;

This does not take into account that sk->sk_backlog.len
is reset only at the very end of __release_sock().

Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach
sk_rcvbuf in normal conditions.
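
As a rough illustration of why this matters, the sketch below models the queue-full test as a comparison of the sum of the two counters against the limit, which is what the title of the Fixes: commit suggests. It is a simplified user-space model, not the kernel's code; struct rx_accounting and rx_queues_full() are hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters the check looks at. */
struct rx_accounting {
	uint32_t backlog_len;	/* models sk->sk_backlog.len */
	uint32_t rmem_alloc;	/* models sk->sk_rmem_alloc */
};

/* Returns true when one more packet should be refused (dropped). */
static bool rx_queues_full(const struct rx_accounting *acc, uint32_t limit)
{
	uint32_t qsize = acc->backlog_len + acc->rmem_alloc;

	return qsize > limit;
}
```

If both counters sit at a 128 KiB sk_rcvbuf, their sum is 262144 bytes, so any limit that counts sk_rcvbuf only once (plus modest headroom) can be exceeded even though neither counter is misbehaving.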

We should double the sk->sk_rcvbuf contribution in the formula
to absorb bubbles in the backlog, which happen more often
for very fast flows.

This change maintains decent protection against abuses.
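
To make the effect of the change concrete, here is a small user-space sketch comparing the old and new limit computations. The names backlog_limit_old() and backlog_limit_new() are made up for illustration, READ_ONCE() is omitted, and min_t() is replaced by a plain comparison; the kernel code in the diff remains the authoritative version.

```c
#include <stdint.h>

/* Old formula: 32-bit arithmetic, sk_rcvbuf counted once. */
static uint32_t backlog_limit_old(int rcvbuf, int sndbuf)
{
	uint32_t limit = (uint32_t)rcvbuf + (uint32_t)(sndbuf >> 1);

	limit += 64 * 1024;
	return limit;
}

/* New formula: sk_rcvbuf counted twice, computed in 64 bits so the
 * sum cannot wrap, then clamped to what a 32-bit limit can hold.
 * Assumes non-negative buffer sizes.
 */
static uint32_t backlog_limit_new(int rcvbuf, int sndbuf)
{
	uint64_t limit = ((uint64_t)rcvbuf) << 1;

	limit += ((uint32_t)sndbuf) >> 1;
	limit += 64 * 1024;
	if (limit > UINT32_MAX)
		limit = UINT32_MAX;
	return (uint32_t)limit;
}
```

With rcvbuf = 131072 and sndbuf = 16384, the old formula yields 204800 bytes and the new one 335872 bytes. The wider type is needed because doubling a large sk_rcvbuf and adding the other terms can exceed 32 bits.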

Fixes: c377411f2494 ("net: sk_add_backlog() take rmem_alloc into account")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240423125620.3309458-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Authored by Eric Dumazet and committed by Jakub Kicinski
ec00ed47 e6be197f

+11 -2
net/ipv4/tcp_ipv4.c
···
 bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb,
 		     enum skb_drop_reason *reason)
 {
-	u32 limit, tail_gso_size, tail_gso_segs;
+	u32 tail_gso_size, tail_gso_segs;
 	struct skb_shared_info *shinfo;
 	const struct tcphdr *th;
 	struct tcphdr *thtail;
···
 	bool fragstolen;
 	u32 gso_segs;
 	u32 gso_size;
+	u64 limit;
 	int delta;

 	/* In case all data was pulled from skb frags (in __pskb_pull_tail()),
···
 	__skb_push(skb, hdrlen);

 no_coalesce:
-	limit = (u32)READ_ONCE(sk->sk_rcvbuf) + (u32)(READ_ONCE(sk->sk_sndbuf) >> 1);
+	/* sk->sk_backlog.len is reset only at the end of __release_sock().
+	 * Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach
+	 * sk_rcvbuf in normal conditions.
+	 */
+	limit = ((u64)READ_ONCE(sk->sk_rcvbuf)) << 1;
+
+	limit += ((u32)READ_ONCE(sk->sk_sndbuf)) >> 1;

 	/* Only socket owner can try to collapse/prune rx queues
 	 * to reduce memory overhead, so add a little headroom here.
 	 * Few sockets backlog are possibly concurrently non empty.
 	 */
 	limit += 64 * 1024;
+
+	limit = min_t(u64, limit, UINT_MAX);

 	if (unlikely(sk_add_backlog(sk, skb, limit))) {
 		bh_unlock_sock(sk);