Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bpf: don't select potentially stale ri->map from buggy xdp progs

We can potentially run into a couple of issues with the XDP
bpf_redirect_map() helper. The ri->map in the per CPU storage
can become stale in several ways, mostly due to misuse, where
we can then trigger a use after free on the map:

i) prog A calls bpf_redirect_map(), returns XDP_REDIRECT, and runs
on a driver not supporting XDP_REDIRECT yet. The ri->map on that
CPU becomes stale when that XDP program is unloaded from the driver
and a prog B is loaded on a different driver which does support the
XDP_REDIRECT return code. prog B would merely have to return
XDP_REDIRECT without calling bpf_redirect_map(), and would then
access the freed map in xdp_do_redirect() since it was not cleared
for that CPU.

ii) prog A calls bpf_redirect_map(), returning a code other than
XDP_REDIRECT. prog A is then detached, which triggers release of
the map. prog B is attached which, similarly to i), would just
return XDP_REDIRECT without having called bpf_redirect_map() and
thus access the freed map in xdp_do_redirect() since it was not
cleared for that CPU.

iii) prog A is attached to generic XDP, calls the bpf_redirect_map()
helper, and returns XDP_REDIRECT. xdp_do_generic_redirect()
currently does not handle ri->map (this will be fixed by Jesper),
so it is not being reset. Later, loading e.g. a native prog B which
would, say, call bpf_xdp_redirect() and then return XDP_REDIRECT,
xdp_do_redirect() would find that a map was set and use it, causing
a use after free on map access.

The fix thus needs to avoid accessing stale ri->map pointers. A
naive way would be to call a BPF function from drivers that just
resets it to NULL for all XDP return codes but XDP_REDIRECT, and
including XDP_REDIRECT for drivers not supporting it yet (letting
ri->map be handled in xdp_do_generic_redirect()). There is a less
intrusive way that does not require drivers to call a reset for
each BPF run.

The verifier knows we're calling into the bpf_xdp_redirect_map()
helper, so it can do a small insn rewrite, transparent to the prog
itself, in the sense that it fills R4 with a pointer to the prog's
own bpf_prog. We have that pointer at verification time anyway, and
R4 is allowed to be used: as per calling convention we scratch
R0 to R5 anyway, so they become inaccessible and the program cannot
read them prior to a write. The helper then stores the prog pointer
in the current CPU's struct redirect_info. Later in
xdp_do_*_redirect() we check whether the redirect_info's prog
pointer is the same as the passed xdp_prog pointer; if that's the
case then all is good, since the prog holds a ref on the map anyway,
so it is always valid at that point in time and must have a
reference count of at least 1. If in the unlikely case they are not
equal, it means we got a stale pointer, so we clear it and bail out
right there. We also reset both the map and the owning prog in
bpf_xdp_redirect(), so that bpf_xdp_redirect_map() and
bpf_xdp_redirect() won't get mixed up; only the last call should
take precedence. A tc bpf_redirect() doesn't use the map anywhere
yet, so there is no need to clear it there since it is never
accessed in that layer.

Note that in case the prog is released, and thus the map as well,
we're still under the RCU read critical section at that time and
have preemption disabled as well. Once we commit with
__dev_map_insert_ctx() from xdp_do_redirect_map() and set the map
in ri->map_to_flush, we still wait for xdp_do_flush_map() to
finish at devmap dismantle time once the flush_needed bit is set,
so that is fine.

Fixes: 97f91a7cf04f ("bpf: add bpf_redirect_map helper routine")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Daniel Borkmann, committed by David S. Miller
109980b8 9a486c9d

2 files changed, 35 insertions(+), 2 deletions(-)

kernel/bpf/verifier.c (+16)
···
 			continue;
 		}
 
+		if (insn->imm == BPF_FUNC_redirect_map) {
+			u64 addr = (unsigned long)prog;
+			struct bpf_insn r4_ld[] = {
+				BPF_LD_IMM64(BPF_REG_4, addr),
+				*insn,
+			};
+			cnt = ARRAY_SIZE(r4_ld);
+
+			new_prog = bpf_patch_insn_data(env, i + delta, r4_ld, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta += cnt - 1;
+			env->prog = prog = new_prog;
+			insn = new_prog->insnsi + i + delta;
+		}
patch_call_imm:
 		fn = prog->aux->ops->get_func_proto(insn->imm);
 		/* all functions that have prototype and verifier allowed
net/core/filter.c (+19 -2)
···
 	u32 flags;
 	struct bpf_map *map;
 	struct bpf_map *map_to_flush;
+	const struct bpf_prog *map_owner;
 };
 
 static DEFINE_PER_CPU(struct redirect_info, redirect_info);
···
 
 	ri->ifindex = ifindex;
 	ri->flags = flags;
-	ri->map = NULL;
 
 	return TC_ACT_REDIRECT;
 }
···
 			    struct bpf_prog *xdp_prog)
 {
 	struct redirect_info *ri = this_cpu_ptr(&redirect_info);
+	const struct bpf_prog *map_owner = ri->map_owner;
 	struct bpf_map *map = ri->map;
 	u32 index = ri->ifindex;
 	struct net_device *fwd;
···
 
 	ri->ifindex = 0;
 	ri->map = NULL;
+	ri->map_owner = NULL;
+
+	/* This is really only caused by a deliberately crappy
+	 * BPF program, normally we would never hit that case,
+	 * so no need to inform someone via tracepoints either,
+	 * just bail out.
+	 */
+	if (unlikely(map_owner != xdp_prog))
+		return -EINVAL;
 
 	fwd = __dev_map_lookup_elem(map, index);
 	if (!fwd) {
···
 
 	ri->ifindex = ifindex;
 	ri->flags = flags;
+	ri->map = NULL;
+	ri->map_owner = NULL;
 
 	return XDP_REDIRECT;
 }
···
 	.arg2_type	= ARG_ANYTHING,
 };
 
-BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, u64, flags)
+BPF_CALL_4(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, u64, flags,
+	   const struct bpf_prog *, map_owner)
 {
 	struct redirect_info *ri = this_cpu_ptr(&redirect_info);
···
 	ri->ifindex = ifindex;
 	ri->flags = flags;
 	ri->map = map;
+	ri->map_owner = map_owner;
 
 	return XDP_REDIRECT;
 }
 
+/* Note, arg4 is hidden from users and populated by the verifier
+ * with the right pointer.
+ */
 static const struct bpf_func_proto bpf_xdp_redirect_map_proto = {
 	.func		= bpf_xdp_redirect_map,
 	.gpl_only	= false,