Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bpf: Add bpf_sock_destroy kfunc

The socket destroy kfunc is used to forcefully terminate sockets from
certain BPF contexts. We plan to use this capability in Cilium
load-balancing to terminate client sockets that continue to connect to
deleted backends. The other use case is on-the-fly policy enforcement,
where existing connections forbidden by new policies need to be
forcefully terminated. The kfunc also allows terminating sockets that may
or may not be actively sending traffic.

The kfunc can currently be called only from BPF TCP and UDP iterators,
where users can filter and terminate selected sockets. More
specifically, it can only be called from BPF contexts that ensure
socket locking, which allows synchronous execution of the protocol-specific
`diag_destroy` handlers. The previous commit, which batches UDP
sockets during iteration, facilitated a synchronous invocation of the UDP
destroy callback from BPF context by skipping socket locks in
`udp_abort`. The TCP iterator already supported batching of the sockets
being iterated. To that end, a `tracing_iter_filter` callback is added so
that the verifier can restrict the kfunc to programs with the `BPF_TRACE_ITER`
attach type and reject other programs.

The kfunc takes a `sock_common` type argument, even though it expects, and
casts it to, a `sock` pointer. This enables the verifier to allow the
sock_destroy kfunc to be called for TCP with `sock_common` and for UDP with
`sock` structs. Furthermore, as `sock_common` only has a subset of the
fields of `sock`, casting a pointer to the latter type may not always be
safe for certain sockets, such as request sockets, but these get special
handling in the `diag_destroy` handlers.

Additionally, the kfunc is defined with the `KF_TRUSTED_ARGS` flag to avoid
cases where a `PTR_TO_BTF_ID` sk is obtained by following another pointer,
e.g. getting a sk pointer (possibly even NULL) by following another sk
pointer. The socket pointer argument passed in the TCP and UDP iterators is
tagged as `PTR_TRUSTED` in {tcp,udp}_reg_info. The TRUSTED arg changes
were contributed by Martin KaFai Lau <martin.lau@kernel.org>.

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-8-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

Authored by Aditi Ghag; committed by Martin KaFai Lau
4ddbcb88 e924e80e

+75 -7
net/core/filter.c  +63

···
 	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
 }
 late_initcall(bpf_kfunc_init);
+
+/* Disables missing prototype warnings */
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in vmlinux BTF");
+
+/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
+ *
+ * The function expects a non-NULL pointer to a socket, and invokes the
+ * protocol specific socket destroy handlers.
+ *
+ * The helper can only be called from BPF contexts that have acquired the socket
+ * locks.
+ *
+ * Parameters:
+ * @sock: Pointer to socket to be destroyed
+ *
+ * Return:
+ * On error, may return EPROTONOSUPPORT, EINVAL.
+ * EPROTONOSUPPORT if protocol specific destroy handler is not supported.
+ * 0 otherwise
+ */
+__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
+{
+	struct sock *sk = (struct sock *)sock;
+
+	/* The locking semantics that allow for synchronous execution of the
+	 * destroy handlers are only supported for TCP and UDP.
+	 * Supporting protocols will need to acquire sock lock in the BPF context
+	 * prior to invoking this kfunc.
+	 */
+	if (!sk->sk_prot->diag_destroy || (sk->sk_protocol != IPPROTO_TCP &&
+					   sk->sk_protocol != IPPROTO_UDP))
+		return -EOPNOTSUPP;
+
+	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
+}
+
+__diag_pop()
+
+BTF_SET8_START(bpf_sk_iter_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_sock_destroy, KF_TRUSTED_ARGS)
+BTF_SET8_END(bpf_sk_iter_kfunc_ids)
+
+static int tracing_iter_filter(const struct bpf_prog *prog, u32 kfunc_id)
+{
+	if (btf_id_set8_contains(&bpf_sk_iter_kfunc_ids, kfunc_id) &&
+	    prog->expected_attach_type != BPF_TRACE_ITER)
+		return -EACCES;
+	return 0;
+}
+
+static const struct btf_kfunc_id_set bpf_sk_iter_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_sk_iter_kfunc_ids,
+	.filter = tracing_iter_filter,
+};
+
+static int init_subsystem(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sk_iter_kfunc_set);
+}
+late_initcall(init_subsystem);
net/ipv4/tcp.c  +6 -3

···
 		return 0;
 	}
 
-	/* Don't race with userspace socket closes such as tcp_close. */
-	lock_sock(sk);
+	/* BPF context ensures sock locking. */
+	if (!has_current_bpf_ctx())
+		/* Don't race with userspace socket closes such as tcp_close. */
+		lock_sock(sk);
 
 	if (sk->sk_state == TCP_LISTEN) {
 		tcp_set_state(sk, TCP_CLOSE);
···
 	bh_unlock_sock(sk);
 	local_bh_enable();
 	tcp_write_queue_purge(sk);
-	release_sock(sk);
+	if (!has_current_bpf_ctx())
+		release_sock(sk);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(tcp_abort);
net/ipv4/tcp_ipv4.c  +1 -1

···
 	.ctx_arg_info_size	= 1,
 	.ctx_arg_info		= {
 		{ offsetof(struct bpf_iter__tcp, sk_common),
-		  PTR_TO_BTF_ID_OR_NULL },
+		  PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
 	},
 	.get_func_proto		= bpf_iter_tcp_get_func_proto,
 	.seq_info		= &tcp_seq_info,
net/ipv4/udp.c  +5 -3

···
 int udp_abort(struct sock *sk, int err)
 {
-	lock_sock(sk);
+	if (!has_current_bpf_ctx())
+		lock_sock(sk);
 
 	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
 	 * with close()
···
 		__udp_disconnect(sk, 0);
 
 out:
-	release_sock(sk);
+	if (!has_current_bpf_ctx())
+		release_sock(sk);
 
 	return 0;
 }
···
 	.ctx_arg_info_size	= 1,
 	.ctx_arg_info		= {
 		{ offsetof(struct bpf_iter__udp, udp_sk),
-		  PTR_TO_BTF_ID_OR_NULL },
+		  PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
 	},
 	.seq_info		= &udp_seq_info,
··· 2930 2930 2931 2931 int udp_abort(struct sock *sk, int err) 2932 2932 { 2933 - lock_sock(sk); 2933 + if (!has_current_bpf_ctx()) 2934 + lock_sock(sk); 2934 2935 2935 2936 /* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing 2936 2937 * with close() ··· 2944 2943 __udp_disconnect(sk, 0); 2945 2944 2946 2945 out: 2947 - release_sock(sk); 2946 + if (!has_current_bpf_ctx()) 2947 + release_sock(sk); 2948 2948 2949 2949 return 0; 2950 2950 } ··· 3648 3646 .ctx_arg_info_size = 1, 3649 3647 .ctx_arg_info = { 3650 3648 { offsetof(struct bpf_iter__udp, udp_sk), 3651 - PTR_TO_BTF_ID_OR_NULL }, 3649 + PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED }, 3652 3650 }, 3653 3651 .seq_info = &udp_seq_info, 3654 3652 };