Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bpf: sockmap update/simplify memory accounting scheme

Instead of tracking sk_wmem_queued and sk_mem_charge by incrementing
in the verdict SK_REDIRECT path and decrementing in the tx work
path, use the skb_set_owner_w and sock_writeable helpers. This solves
a few issues with the current code. First, in the SK_REDIRECT case
the increments on sk_wmem_queued and sk_mem_charge were being done
without the peer's sock lock held. Under stress this can result in
accounting errors when tx work and/or multiple verdict decisions are
operating on the same peer psock.

Additionally, this cleans up the code because we can rely on the
default destructor to decrement the memory accounting on kfree_skb().
It also triggers sk_write_space() when space becomes available on
kfree_skb(), which wasn't happening before, and prevents __sk_free()
from being called until all in-flight packets have completed.

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

authored by John Fastabend and committed by David S. Miller
90a9631c 250b0f78

+7 -11
kernel/bpf/sockmap.c
@@ -111,7 +111,7 @@

 static void smap_do_verdict(struct smap_psock *psock, struct sk_buff *skb)
 {
-	struct sock *sock;
+	struct sock *sk;
 	int rc;

 	/* Because we use per cpu values to feed input from sock redirect
@@ -123,16 +123,16 @@
 	rc = smap_verdict_func(psock, skb);
 	switch (rc) {
 	case SK_REDIRECT:
-		sock = do_sk_redirect_map();
+		sk = do_sk_redirect_map();
 		preempt_enable();
-		if (likely(sock)) {
-			struct smap_psock *peer = smap_psock_sk(sock);
+		if (likely(sk)) {
+			struct smap_psock *peer = smap_psock_sk(sk);

 			if (likely(peer &&
 				   test_bit(SMAP_TX_RUNNING, &peer->state) &&
-				   sk_stream_memory_free(peer->sock))) {
-				peer->sock->sk_wmem_queued += skb->truesize;
-				sk_mem_charge(peer->sock, skb->truesize);
+				   !sock_flag(sk, SOCK_DEAD) &&
+				   sock_writeable(sk))) {
+				skb_set_owner_w(skb, sk);
 				skb_queue_tail(&peer->rxqueue, skb);
 				schedule_work(&peer->tx_work);
 				break;
@@ -282,16 +282,12 @@
 			/* Hard errors break pipe and stop xmit */
 			smap_report_sk_error(psock, n ? -n : EPIPE);
 			clear_bit(SMAP_TX_RUNNING, &psock->state);
-			sk_mem_uncharge(psock->sock, skb->truesize);
-			psock->sock->sk_wmem_queued -= skb->truesize;
 			kfree_skb(skb);
 			goto out;
 		}
 		rem -= n;
 		off += n;
 	} while (rem);
-	sk_mem_uncharge(psock->sock, skb->truesize);
-	psock->sock->sk_wmem_queued -= skb->truesize;
 	kfree_skb(skb);
 }
 out: