
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-06-05

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add a new BPF hook for sendmsg similar to existing hooks for bind and
connect: "This allows overriding the source IP (including the case when it's
set via cmsg(3)) and the destination IP:port for unconnected UDP (slow path).
TCP and connected UDP (fast path) are not affected. This makes UDP support
complete, that is, connected UDP is handled by connect hooks, unconnected
by sendmsg ones.", from Andrey.

2) Rework of the AF_XDP API to allow extending it in the future for a
type-writer model if necessary. In this mode a memory window is passed
to hardware and multiple frames might be filled into that window,
instead of just one as in the current fixed frame-size model. With the
new changes this can be supported without having to add a new descriptor
format. Also, core bits for the zero-copy support for AF_XDP have been
merged as agreed upon, where i40e bits will be routed via Jeff later on.
Various improvements to documentation and sample programs are included
as well, all from Björn and Magnus.

3) Given BPF's flexibility, a new program type has been added to implement
infrared decoders. Quote: "The kernel IR decoders support the most
widely used IR protocols, but there are many protocols which are not
supported. [...] There is a 'long tail' of unsupported IR protocols,
for which lircd is needed to decode the IR. IR encoding is done in such
a way that some simple circuit can decode it; therefore, BPF is ideal.
[...] user-space can define a decoder in BPF, attach it to the rc
device through the lirc chardev.", from Sean.

4) Several improvements and fixes to the BPF core. Among others: dumping
map and prog IDs into fdinfo, which is a straightforward way to correlate
BPF objects used by applications; removing an indirect call, and therefore
a retpoline, in all map lookup/update/delete calls by invoking the callback
directly on 64 bit archs; and adding a new bpf_skb_cgroup_id() BPF helper
for tc BPF programs to have an efficient way of looking up the cgroup v2 id
for policy or other use cases. Also fixes to make sure we zero tunnel/xfrm
state that hasn't been filled, to allow context access wrt pt_regs on
32 bit archs for tracing, and last but not least various test cases
for fixes that landed in bpf earlier, from Daniel.

5) Get rid of the ndo_xdp_flush API and extend ndo_xdp_xmit() with an
XDP_XMIT_FLUSH flag instead, which avoids one indirect call as flushing
is now merged directly into ndo_xdp_xmit(), from Jesper.

6) Add a new bpf_get_current_cgroup_id() helper that can be used in
tracing to retrieve the cgroup id from the current process in order
to allow for e.g. aggregation of container-level events, from Yonghong.

7) Two follow-up fixes for BTF to reject invalid input values and,
related to that, two test cases for BPF kselftests, from Martin.

8) Various API improvements to the bpf_fib_lookup() helper, that is,
dropping MPLS bits which are not fully hashed out yet, rejecting
invalid helper flags, returning an error for unsupported address
families, as well as renaming flowlabel to flowinfo, from David.

9) Various fixes and improvements to sockmap BPF kselftests in particular
in proper error detection and data verification, from Prashant.

10) Two arm32 BPF JIT improvements. One fixes the imm range check with
regard to whether an immediate fits into 24 bits, the other is a naming
cleanup making the functions related to rsh handling consistent with
those handling lsh, from Wang.

11) Two compile warning fixes in BPF, one for BTF and one silencing a
gcc false positive in stack_map_get_build_id_offset(), from Arnd.

12) Add the missing seg6.h header to the tools include infrastructure in
order to fix compilation of the BPF kselftests, from Mathieu.

13) Several formatting cleanups in the BPF UAPI helper description that
also fix an error during rst2man compilation, from Quentin.

14) Hide an unused variable in sk_msg_convert_ctx_access() when IPv6 is
not built into the kernel, from Yue.

15) Remove a useless double assignment in dev_map_enqueue(), from Colin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+3860 -749
+58 -43
Documentation/networking/af_xdp.rst
··· 12 12 13 13 This document assumes that the reader is familiar with BPF and XDP. If 14 14 not, the Cilium project has an excellent reference guide at 15 - http://cilium.readthedocs.io/en/doc-1.0/bpf/. 15 + http://cilium.readthedocs.io/en/latest/bpf/. 16 16 17 17 Using the XDP_REDIRECT action from an XDP program, the program can 18 18 redirect ingress frames to other XDP enabled netdevs, using the ··· 33 33 to that packet can be changed to point to another and reused right 34 34 away. This again avoids copying data. 35 35 36 - The UMEM consists of a number of equally size frames and each frame 37 - has a unique frame id. A descriptor in one of the rings references a 38 - frame by referencing its frame id. The user space allocates memory for 39 - this UMEM using whatever means it feels is most appropriate (malloc, 40 - mmap, huge pages, etc). This memory area is then registered with the 41 - kernel using the new setsockopt XDP_UMEM_REG. The UMEM also has two 42 - rings: the FILL ring and the COMPLETION ring. The fill ring is used by 43 - the application to send down frame ids for the kernel to fill in with 44 - RX packet data. References to these frames will then appear in the RX 45 - ring once each packet has been received. The completion ring, on the 46 - other hand, contains frame ids that the kernel has transmitted 47 - completely and can now be used again by user space, for either TX or 48 - RX. Thus, the frame ids appearing in the completion ring are ids that 49 - were previously transmitted using the TX ring. In summary, the RX and 50 - FILL rings are used for the RX path and the TX and COMPLETION rings 51 - are used for the TX path. 36 + The UMEM consists of a number of equally sized chunks. A descriptor in 37 + one of the rings references a frame by referencing its addr. The addr 38 + is simply an offset within the entire UMEM region. 
The user space 39 + allocates memory for this UMEM using whatever means it feels is most 40 + appropriate (malloc, mmap, huge pages, etc). This memory area is then 41 + registered with the kernel using the new setsockopt XDP_UMEM_REG. The 42 + UMEM also has two rings: the FILL ring and the COMPLETION ring. The 43 + fill ring is used by the application to send down addr for the kernel 44 + to fill in with RX packet data. References to these frames will then 45 + appear in the RX ring once each packet has been received. The 46 + completion ring, on the other hand, contains frame addr that the 47 + kernel has transmitted completely and can now be used again by user 48 + space, for either TX or RX. Thus, the frame addrs appearing in the 49 + completion ring are addrs that were previously transmitted using the 50 + TX ring. In summary, the RX and FILL rings are used for the RX path 51 + and the TX and COMPLETION rings are used for the TX path. 52 52 53 53 The socket is then finally bound with a bind() call to a device and a 54 54 specific queue id on that device, and it is not until bind is ··· 59 59 corresponding two rings, sets the XDP_SHARED_UMEM flag in the bind 60 60 call and submits the XSK of the process it would like to share UMEM 61 61 with as well as its own newly created XSK socket. The new process will 62 - then receive frame id references in its own RX ring that point to this 63 - shared UMEM. Note that since the ring structures are single-consumer / 64 - single-producer (for performance reasons), the new process has to 65 - create its own socket with associated RX and TX rings, since it cannot 66 - share this with the other process. This is also the reason that there 67 - is only one set of FILL and COMPLETION rings per UMEM. It is the 68 - responsibility of a single process to handle the UMEM. 62 + then receive frame addr references in its own RX ring that point to 63 + this shared UMEM. 
Note that since the ring structures are 64 + single-consumer / single-producer (for performance reasons), the new 65 + process has to create its own socket with associated RX and TX rings, 66 + since it cannot share this with the other process. This is also the 67 + reason that there is only one set of FILL and COMPLETION rings per 68 + UMEM. It is the responsibility of a single process to handle the UMEM. 69 69 70 70 How is then packets distributed from an XDP program to the XSKs? There 71 71 is a BPF map called XSKMAP (or BPF_MAP_TYPE_XSKMAP in full). The ··· 102 102 103 103 UMEM is a region of virtual contiguous memory, divided into 104 104 equal-sized frames. An UMEM is associated to a netdev and a specific 105 - queue id of that netdev. It is created and configured (frame size, 106 - frame headroom, start address and size) by using the XDP_UMEM_REG 107 - setsockopt system call. A UMEM is bound to a netdev and queue id, via 108 - the bind() system call. 105 + queue id of that netdev. It is created and configured (chunk size, 106 + headroom, start address and size) by using the XDP_UMEM_REG setsockopt 107 + system call. A UMEM is bound to a netdev and queue id, via the bind() 108 + system call. 109 109 110 110 An AF_XDP is socket linked to a single UMEM, but one UMEM can have 111 111 multiple AF_XDP sockets. To share an UMEM created via one socket A, ··· 147 147 ~~~~~~~~~~~~~~ 148 148 149 149 The Fill ring is used to transfer ownership of UMEM frames from 150 - user-space to kernel-space. The UMEM indicies are passed in the 151 - ring. As an example, if the UMEM is 64k and each frame is 4k, then the 152 - UMEM has 16 frames and can pass indicies between 0 and 15. 150 + user-space to kernel-space. The UMEM addrs are passed in the ring. As 151 + an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has 152 + 16 chunks and can pass addrs between 0 and 64k. 153 153 154 154 Frames passed to the kernel are used for the ingress path (RX rings). 
155 155 156 - The user application produces UMEM indicies to this ring. 156 + The user application produces UMEM addrs to this ring. Note that the 157 + kernel will mask the incoming addr. E.g. for a chunk size of 2k, the 158 + log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050 159 + and 3000 refers to the same chunk. 160 + 157 161 158 162 UMEM Completetion Ring 159 163 ~~~~~~~~~~~~~~~~~~~~~~ ··· 169 165 Frames passed from the kernel to user-space are frames that has been 170 166 sent (TX ring) and can be used by user-space again. 171 167 172 - The user application consumes UMEM indicies from this ring. 168 + The user application consumes UMEM addrs from this ring. 173 169 174 170 175 171 RX Ring 176 172 ~~~~~~~ 177 173 178 174 The RX ring is the receiving side of a socket. Each entry in the ring 179 - is a struct xdp_desc descriptor. The descriptor contains UMEM index 180 - (idx), the length of the data (len), the offset into the frame 181 - (offset). 175 + is a struct xdp_desc descriptor. The descriptor contains UMEM offset 176 + (addr) and the length of the data (len). 182 177 183 178 If no frames have been passed to kernel via the Fill ring, no 184 179 descriptors will (or can) appear on the RX ring. 
··· 224 221 225 222 Naive ring dequeue and enqueue could look like this:: 226 223 224 + // struct xdp_rxtx_ring { 225 + // __u32 *producer; 226 + // __u32 *consumer; 227 + // struct xdp_desc *desc; 228 + // }; 229 + 230 + // struct xdp_umem_ring { 231 + // __u32 *producer; 232 + // __u32 *consumer; 233 + // __u64 *desc; 234 + // }; 235 + 227 236 // typedef struct xdp_rxtx_ring RING; 228 237 // typedef struct xdp_umem_ring RING; 229 238 230 239 // typedef struct xdp_desc RING_TYPE; 231 - // typedef __u32 RING_TYPE; 240 + // typedef __u64 RING_TYPE; 232 241 233 242 int dequeue_one(RING *ring, RING_TYPE *item) 234 243 { 235 - __u32 entries = ring->ptrs.producer - ring->ptrs.consumer; 244 + __u32 entries = *ring->producer - *ring->consumer; 236 245 237 246 if (entries == 0) 238 247 return -1; 239 248 240 249 // read-barrier! 241 250 242 - *item = ring->desc[ring->ptrs.consumer & (RING_SIZE - 1)]; 243 - ring->ptrs.consumer++; 251 + *item = ring->desc[*ring->consumer & (RING_SIZE - 1)]; 252 + (*ring->consumer)++; 244 253 return 0; 245 254 } 246 255 247 256 int enqueue_one(RING *ring, const RING_TYPE *item) 248 257 { 249 - u32 free_entries = RING_SIZE - (ring->ptrs.producer - ring->ptrs.consumer); 258 + u32 free_entries = RING_SIZE - (*ring->producer - *ring->consumer); 250 259 251 260 if (free_entries == 0) 252 261 return -1; 253 262 254 - ring->desc[ring->ptrs.producer & (RING_SIZE - 1)] = *item; 263 + ring->desc[*ring->producer & (RING_SIZE - 1)] = *item; 255 264 256 265 // write-barrier! 257 266 258 - ring->ptrs.producer++; 267 + (*ring->producer)++; 259 268 return 0; 260 269 } 261 270
+2
MAINTAINERS
··· 2722 2722 L: linux-kernel@vger.kernel.org 2723 2723 T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 2724 2724 T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 2725 + Q: https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147 2725 2726 S: Supported 2726 2727 F: arch/x86/net/bpf_jit* 2727 2728 F: Documentation/networking/filter.txt ··· 2741 2740 F: net/sched/cls_bpf.c 2742 2741 F: samples/bpf/ 2743 2742 F: tools/bpf/ 2743 + F: tools/lib/bpf/ 2744 2744 F: tools/testing/selftests/bpf/ 2745 2745 2746 2746 BROADCOM B44 10/100 ETHERNET DRIVER
+8 -8
arch/arm/net/bpf_jit_32.c
··· 84 84 * 85 85 * 1. First argument is passed using the arm 32bit registers and rest of the 86 86 * arguments are passed on stack scratch space. 87 - * 2. First callee-saved arugument is mapped to arm 32 bit registers and rest 87 + * 2. First callee-saved argument is mapped to arm 32 bit registers and rest 88 88 * arguments are mapped to scratch space on stack. 89 89 * 3. We need two 64 bit temp registers to do complex operations on eBPF 90 90 * registers. ··· 701 701 } 702 702 703 703 /* dst = dst >> src */ 704 - static inline void emit_a32_lsr_r64(const u8 dst[], const u8 src[], bool dstk, 704 + static inline void emit_a32_rsh_r64(const u8 dst[], const u8 src[], bool dstk, 705 705 bool sstk, struct jit_ctx *ctx) { 706 706 const u8 *tmp = bpf2a32[TMP_REG_1]; 707 707 const u8 *tmp2 = bpf2a32[TMP_REG_2]; ··· 717 717 emit(ARM_LDR_I(rm, ARM_SP, STACK_VAR(dst_hi)), ctx); 718 718 } 719 719 720 - /* Do LSH operation */ 720 + /* Do RSH operation */ 721 721 emit(ARM_RSB_I(ARM_IP, rt, 32), ctx); 722 722 emit(ARM_SUBS_I(tmp2[0], rt, 32), ctx); 723 723 emit(ARM_MOV_SR(ARM_LR, rd, SRTYPE_LSR, rt), ctx); ··· 767 767 } 768 768 769 769 /* dst = dst >> val */ 770 - static inline void emit_a32_lsr_i64(const u8 dst[], bool dstk, 770 + static inline void emit_a32_rsh_i64(const u8 dst[], bool dstk, 771 771 const u32 val, struct jit_ctx *ctx) { 772 772 const u8 *tmp = bpf2a32[TMP_REG_1]; 773 773 const u8 *tmp2 = bpf2a32[TMP_REG_2]; ··· 1192 1192 s32 jmp_offset; 1193 1193 1194 1194 #define check_imm(bits, imm) do { \ 1195 - if ((((imm) > 0) && ((imm) >> (bits))) || \ 1196 - (((imm) < 0) && (~(imm) >> (bits)))) { \ 1195 + if ((imm) >= (1 << ((bits) - 1)) || \ 1196 + (imm) < -(1 << ((bits) - 1))) { \ 1197 1197 pr_info("[%2d] imm=%d(0x%x) out of range\n", \ 1198 1198 i, imm, imm); \ 1199 1199 return -EINVAL; \ ··· 1323 1323 case BPF_ALU64 | BPF_RSH | BPF_K: 1324 1324 if (unlikely(imm > 63)) 1325 1325 return -EINVAL; 1326 - emit_a32_lsr_i64(dst, dstk, imm, ctx); 1326 + 
emit_a32_rsh_i64(dst, dstk, imm, ctx); 1327 1327 break; 1328 1328 /* dst = dst << src */ 1329 1329 case BPF_ALU64 | BPF_LSH | BPF_X: ··· 1331 1331 break; 1332 1332 /* dst = dst >> src */ 1333 1333 case BPF_ALU64 | BPF_RSH | BPF_X: 1334 - emit_a32_lsr_r64(dst, src, dstk, sstk, ctx); 1334 + emit_a32_rsh_r64(dst, src, dstk, sstk, ctx); 1335 1335 break; 1336 1336 /* dst = dst >> src (signed) */ 1337 1337 case BPF_ALU64 | BPF_ARSH | BPF_X:
+13
drivers/media/rc/Kconfig
··· 25 25 passes raw IR to and from userspace, which is needed for 26 26 IR transmitting (aka "blasting") and for the lirc daemon. 27 27 28 + config BPF_LIRC_MODE2 29 + bool "Support for eBPF programs attached to lirc devices" 30 + depends on BPF_SYSCALL 31 + depends on RC_CORE=y 32 + depends on LIRC 33 + help 34 + Allow attaching eBPF programs to a lirc device using the bpf(2) 35 + syscall command BPF_PROG_ATTACH. This is supported for raw IR 36 + receivers. 37 + 38 + These eBPF programs can be used to decode IR into scancodes, for 39 + IR protocols not supported by the kernel decoders. 40 + 28 41 menuconfig RC_DECODERS 29 42 bool "Remote controller decoders" 30 43 depends on RC_CORE
+1
drivers/media/rc/Makefile
··· 5 5 obj-$(CONFIG_RC_CORE) += rc-core.o 6 6 rc-core-y := rc-main.o rc-ir-raw.o 7 7 rc-core-$(CONFIG_LIRC) += lirc_dev.o 8 + rc-core-$(CONFIG_BPF_LIRC_MODE2) += bpf-lirc.o 8 9 obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o 9 10 obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o 10 11 obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
+313
drivers/media/rc/bpf-lirc.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // bpf-lirc.c - handles bpf 3 + // 4 + // Copyright (C) 2018 Sean Young <sean@mess.org> 5 + 6 + #include <linux/bpf.h> 7 + #include <linux/filter.h> 8 + #include <linux/bpf_lirc.h> 9 + #include "rc-core-priv.h" 10 + 11 + /* 12 + * BPF interface for raw IR 13 + */ 14 + const struct bpf_prog_ops lirc_mode2_prog_ops = { 15 + }; 16 + 17 + BPF_CALL_1(bpf_rc_repeat, u32*, sample) 18 + { 19 + struct ir_raw_event_ctrl *ctrl; 20 + 21 + ctrl = container_of(sample, struct ir_raw_event_ctrl, bpf_sample); 22 + 23 + rc_repeat(ctrl->dev); 24 + 25 + return 0; 26 + } 27 + 28 + static const struct bpf_func_proto rc_repeat_proto = { 29 + .func = bpf_rc_repeat, 30 + .gpl_only = true, /* rc_repeat is EXPORT_SYMBOL_GPL */ 31 + .ret_type = RET_INTEGER, 32 + .arg1_type = ARG_PTR_TO_CTX, 33 + }; 34 + 35 + /* 36 + * Currently rc-core does not support 64-bit scancodes, but there are many 37 + * known protocols with more than 32 bits. So, define the interface as u64 38 + * as a future-proof. 
39 + */ 40 + BPF_CALL_4(bpf_rc_keydown, u32*, sample, u32, protocol, u64, scancode, 41 + u32, toggle) 42 + { 43 + struct ir_raw_event_ctrl *ctrl; 44 + 45 + ctrl = container_of(sample, struct ir_raw_event_ctrl, bpf_sample); 46 + 47 + rc_keydown(ctrl->dev, protocol, scancode, toggle != 0); 48 + 49 + return 0; 50 + } 51 + 52 + static const struct bpf_func_proto rc_keydown_proto = { 53 + .func = bpf_rc_keydown, 54 + .gpl_only = true, /* rc_keydown is EXPORT_SYMBOL_GPL */ 55 + .ret_type = RET_INTEGER, 56 + .arg1_type = ARG_PTR_TO_CTX, 57 + .arg2_type = ARG_ANYTHING, 58 + .arg3_type = ARG_ANYTHING, 59 + .arg4_type = ARG_ANYTHING, 60 + }; 61 + 62 + static const struct bpf_func_proto * 63 + lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) 64 + { 65 + switch (func_id) { 66 + case BPF_FUNC_rc_repeat: 67 + return &rc_repeat_proto; 68 + case BPF_FUNC_rc_keydown: 69 + return &rc_keydown_proto; 70 + case BPF_FUNC_map_lookup_elem: 71 + return &bpf_map_lookup_elem_proto; 72 + case BPF_FUNC_map_update_elem: 73 + return &bpf_map_update_elem_proto; 74 + case BPF_FUNC_map_delete_elem: 75 + return &bpf_map_delete_elem_proto; 76 + case BPF_FUNC_ktime_get_ns: 77 + return &bpf_ktime_get_ns_proto; 78 + case BPF_FUNC_tail_call: 79 + return &bpf_tail_call_proto; 80 + case BPF_FUNC_get_prandom_u32: 81 + return &bpf_get_prandom_u32_proto; 82 + case BPF_FUNC_trace_printk: 83 + if (capable(CAP_SYS_ADMIN)) 84 + return bpf_get_trace_printk_proto(); 85 + /* fall through */ 86 + default: 87 + return NULL; 88 + } 89 + } 90 + 91 + static bool lirc_mode2_is_valid_access(int off, int size, 92 + enum bpf_access_type type, 93 + const struct bpf_prog *prog, 94 + struct bpf_insn_access_aux *info) 95 + { 96 + /* We have one field of u32 */ 97 + return type == BPF_READ && off == 0 && size == sizeof(u32); 98 + } 99 + 100 + const struct bpf_verifier_ops lirc_mode2_verifier_ops = { 101 + .get_func_proto = lirc_mode2_func_proto, 102 + .is_valid_access = lirc_mode2_is_valid_access 103 + 
}; 104 + 105 + #define BPF_MAX_PROGS 64 106 + 107 + static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog) 108 + { 109 + struct bpf_prog_array __rcu *old_array; 110 + struct bpf_prog_array *new_array; 111 + struct ir_raw_event_ctrl *raw; 112 + int ret; 113 + 114 + if (rcdev->driver_type != RC_DRIVER_IR_RAW) 115 + return -EINVAL; 116 + 117 + ret = mutex_lock_interruptible(&ir_raw_handler_lock); 118 + if (ret) 119 + return ret; 120 + 121 + raw = rcdev->raw; 122 + if (!raw) { 123 + ret = -ENODEV; 124 + goto unlock; 125 + } 126 + 127 + if (raw->progs && bpf_prog_array_length(raw->progs) >= BPF_MAX_PROGS) { 128 + ret = -E2BIG; 129 + goto unlock; 130 + } 131 + 132 + old_array = raw->progs; 133 + ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array); 134 + if (ret < 0) 135 + goto unlock; 136 + 137 + rcu_assign_pointer(raw->progs, new_array); 138 + bpf_prog_array_free(old_array); 139 + 140 + unlock: 141 + mutex_unlock(&ir_raw_handler_lock); 142 + return ret; 143 + } 144 + 145 + static int lirc_bpf_detach(struct rc_dev *rcdev, struct bpf_prog *prog) 146 + { 147 + struct bpf_prog_array __rcu *old_array; 148 + struct bpf_prog_array *new_array; 149 + struct ir_raw_event_ctrl *raw; 150 + int ret; 151 + 152 + if (rcdev->driver_type != RC_DRIVER_IR_RAW) 153 + return -EINVAL; 154 + 155 + ret = mutex_lock_interruptible(&ir_raw_handler_lock); 156 + if (ret) 157 + return ret; 158 + 159 + raw = rcdev->raw; 160 + if (!raw) { 161 + ret = -ENODEV; 162 + goto unlock; 163 + } 164 + 165 + old_array = raw->progs; 166 + ret = bpf_prog_array_copy(old_array, prog, NULL, &new_array); 167 + /* 168 + * Do not use bpf_prog_array_delete_safe() as we would end up 169 + * with a dummy entry in the array, and the we would free the 170 + * dummy in lirc_bpf_free() 171 + */ 172 + if (ret) 173 + goto unlock; 174 + 175 + rcu_assign_pointer(raw->progs, new_array); 176 + bpf_prog_array_free(old_array); 177 + unlock: 178 + mutex_unlock(&ir_raw_handler_lock); 179 + return ret; 180 + } 
181 + 182 + void lirc_bpf_run(struct rc_dev *rcdev, u32 sample) 183 + { 184 + struct ir_raw_event_ctrl *raw = rcdev->raw; 185 + 186 + raw->bpf_sample = sample; 187 + 188 + if (raw->progs) 189 + BPF_PROG_RUN_ARRAY(raw->progs, &raw->bpf_sample, BPF_PROG_RUN); 190 + } 191 + 192 + /* 193 + * This should be called once the rc thread has been stopped, so there can be 194 + * no concurrent bpf execution. 195 + */ 196 + void lirc_bpf_free(struct rc_dev *rcdev) 197 + { 198 + struct bpf_prog **progs; 199 + 200 + if (!rcdev->raw->progs) 201 + return; 202 + 203 + progs = rcu_dereference(rcdev->raw->progs)->progs; 204 + while (*progs) 205 + bpf_prog_put(*progs++); 206 + 207 + bpf_prog_array_free(rcdev->raw->progs); 208 + } 209 + 210 + int lirc_prog_attach(const union bpf_attr *attr) 211 + { 212 + struct bpf_prog *prog; 213 + struct rc_dev *rcdev; 214 + int ret; 215 + 216 + if (attr->attach_flags) 217 + return -EINVAL; 218 + 219 + prog = bpf_prog_get_type(attr->attach_bpf_fd, 220 + BPF_PROG_TYPE_LIRC_MODE2); 221 + if (IS_ERR(prog)) 222 + return PTR_ERR(prog); 223 + 224 + rcdev = rc_dev_get_from_fd(attr->target_fd); 225 + if (IS_ERR(rcdev)) { 226 + bpf_prog_put(prog); 227 + return PTR_ERR(rcdev); 228 + } 229 + 230 + ret = lirc_bpf_attach(rcdev, prog); 231 + if (ret) 232 + bpf_prog_put(prog); 233 + 234 + put_device(&rcdev->dev); 235 + 236 + return ret; 237 + } 238 + 239 + int lirc_prog_detach(const union bpf_attr *attr) 240 + { 241 + struct bpf_prog *prog; 242 + struct rc_dev *rcdev; 243 + int ret; 244 + 245 + if (attr->attach_flags) 246 + return -EINVAL; 247 + 248 + prog = bpf_prog_get_type(attr->attach_bpf_fd, 249 + BPF_PROG_TYPE_LIRC_MODE2); 250 + if (IS_ERR(prog)) 251 + return PTR_ERR(prog); 252 + 253 + rcdev = rc_dev_get_from_fd(attr->target_fd); 254 + if (IS_ERR(rcdev)) { 255 + bpf_prog_put(prog); 256 + return PTR_ERR(rcdev); 257 + } 258 + 259 + ret = lirc_bpf_detach(rcdev, prog); 260 + 261 + bpf_prog_put(prog); 262 + put_device(&rcdev->dev); 263 + 264 + return ret; 265 + } 
266 + 267 + int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) 268 + { 269 + __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids); 270 + struct bpf_prog_array __rcu *progs; 271 + struct rc_dev *rcdev; 272 + u32 cnt, flags = 0; 273 + int ret; 274 + 275 + if (attr->query.query_flags) 276 + return -EINVAL; 277 + 278 + rcdev = rc_dev_get_from_fd(attr->query.target_fd); 279 + if (IS_ERR(rcdev)) 280 + return PTR_ERR(rcdev); 281 + 282 + if (rcdev->driver_type != RC_DRIVER_IR_RAW) { 283 + ret = -EINVAL; 284 + goto put; 285 + } 286 + 287 + ret = mutex_lock_interruptible(&ir_raw_handler_lock); 288 + if (ret) 289 + goto put; 290 + 291 + progs = rcdev->raw->progs; 292 + cnt = progs ? bpf_prog_array_length(progs) : 0; 293 + 294 + if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt))) { 295 + ret = -EFAULT; 296 + goto unlock; 297 + } 298 + 299 + if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) { 300 + ret = -EFAULT; 301 + goto unlock; 302 + } 303 + 304 + if (attr->query.prog_cnt != 0 && prog_ids && cnt) 305 + ret = bpf_prog_array_copy_to_user(progs, prog_ids, cnt); 306 + 307 + unlock: 308 + mutex_unlock(&ir_raw_handler_lock); 309 + put: 310 + put_device(&rcdev->dev); 311 + 312 + return ret; 313 + }
+30
drivers/media/rc/lirc_dev.c
··· 20 20 #include <linux/module.h> 21 21 #include <linux/mutex.h> 22 22 #include <linux/device.h> 23 + #include <linux/file.h> 23 24 #include <linux/idr.h> 24 25 #include <linux/poll.h> 25 26 #include <linux/sched.h> ··· 104 103 dev_dbg(&dev->dev, "delivering %uus %s to lirc_dev\n", 105 104 TO_US(ev.duration), TO_STR(ev.pulse)); 106 105 } 106 + 107 + /* 108 + * bpf does not care about the gap generated above; that exists 109 + * for backwards compatibility 110 + */ 111 + lirc_bpf_run(dev, sample); 107 112 108 113 spin_lock_irqsave(&dev->lirc_fh_lock, flags); 109 114 list_for_each_entry(fh, &dev->lirc_fh, list) { ··· 821 814 { 822 815 class_destroy(lirc_class); 823 816 unregister_chrdev_region(lirc_base_dev, RC_DEV_MAX); 817 + } 818 + 819 + struct rc_dev *rc_dev_get_from_fd(int fd) 820 + { 821 + struct fd f = fdget(fd); 822 + struct lirc_fh *fh; 823 + struct rc_dev *dev; 824 + 825 + if (!f.file) 826 + return ERR_PTR(-EBADF); 827 + 828 + if (f.file->f_op != &lirc_fops) { 829 + fdput(f); 830 + return ERR_PTR(-EINVAL); 831 + } 832 + 833 + fh = f.file->private_data; 834 + dev = fh->rc; 835 + 836 + get_device(&dev->dev); 837 + fdput(f); 838 + 839 + return dev; 824 840 } 825 841 826 842 MODULE_ALIAS("lirc_dev");
+21
drivers/media/rc/rc-core-priv.h
··· 13 13 #define MAX_IR_EVENT_SIZE 512 14 14 15 15 #include <linux/slab.h> 16 + #include <uapi/linux/bpf.h> 16 17 #include <media/rc-core.h> 17 18 18 19 /** ··· 58 57 /* raw decoder state follows */ 59 58 struct ir_raw_event prev_ev; 60 59 struct ir_raw_event this_ev; 60 + 61 + #ifdef CONFIG_BPF_LIRC_MODE2 62 + u32 bpf_sample; 63 + struct bpf_prog_array __rcu *progs; 64 + #endif 61 65 struct nec_dec { 62 66 int state; 63 67 unsigned count; ··· 131 125 unsigned int bits; 132 126 } imon; 133 127 }; 128 + 129 + /* Mutex for locking raw IR processing and handler change */ 130 + extern struct mutex ir_raw_handler_lock; 134 131 135 132 /* macros for IR decoders */ 136 133 static inline bool geq_margin(unsigned d1, unsigned d2, unsigned margin) ··· 297 288 void ir_lirc_scancode_event(struct rc_dev *dev, struct lirc_scancode *lsc); 298 289 int ir_lirc_register(struct rc_dev *dev); 299 290 void ir_lirc_unregister(struct rc_dev *dev); 291 + struct rc_dev *rc_dev_get_from_fd(int fd); 300 292 #else 301 293 static inline int lirc_dev_init(void) { return 0; } 302 294 static inline void lirc_dev_exit(void) {} ··· 307 297 struct lirc_scancode *lsc) { } 308 298 static inline int ir_lirc_register(struct rc_dev *dev) { return 0; } 309 299 static inline void ir_lirc_unregister(struct rc_dev *dev) { } 300 + #endif 301 + 302 + /* 303 + * bpf interface 304 + */ 305 + #ifdef CONFIG_BPF_LIRC_MODE2 306 + void lirc_bpf_free(struct rc_dev *dev); 307 + void lirc_bpf_run(struct rc_dev *dev, u32 sample); 308 + #else 309 + static inline void lirc_bpf_free(struct rc_dev *dev) { } 310 + static inline void lirc_bpf_run(struct rc_dev *dev, u32 sample) { } 310 311 #endif 311 312 312 313 #endif /* _RC_CORE_PRIV */
+10 -2
drivers/media/rc/rc-ir-raw.c
··· 14 14 static LIST_HEAD(ir_raw_client_list); 15 15 16 16 /* Used to handle IR raw handler extensions */ 17 - static DEFINE_MUTEX(ir_raw_handler_lock); 17 + DEFINE_MUTEX(ir_raw_handler_lock); 18 18 static LIST_HEAD(ir_raw_handler_list); 19 19 static atomic64_t available_protocols = ATOMIC64_INIT(0); 20 20 ··· 621 621 list_for_each_entry(handler, &ir_raw_handler_list, list) 622 622 if (handler->raw_unregister) 623 623 handler->raw_unregister(dev); 624 - mutex_unlock(&ir_raw_handler_lock); 624 + 625 + lirc_bpf_free(dev); 625 626 626 627 ir_raw_event_free(dev); 628 + 629 + /* 630 + * A user can be calling bpf(BPF_PROG_{QUERY|ATTACH|DETACH}), so 631 + * ensure that the raw member is null on unlock; this is how 632 + * "device gone" is checked. 633 + */ 634 + mutex_unlock(&ir_raw_handler_lock); 627 635 } 628 636 629 637 /*
-1
drivers/net/ethernet/intel/i40e/i40e_main.c
··· 11883 11883 .ndo_bridge_setlink = i40e_ndo_bridge_setlink, 11884 11884 .ndo_bpf = i40e_xdp, 11885 11885 .ndo_xdp_xmit = i40e_xdp_xmit, 11886 - .ndo_xdp_flush = i40e_xdp_flush, 11887 11886 }; 11888 11887 11889 11888 /**
+12 -21
drivers/net/ethernet/intel/i40e/i40e_txrx.c
··· 3693 3693 * For error cases, a negative errno code is returned and no-frames 3694 3694 * are transmitted (caller must handle freeing frames). 3695 3695 **/ 3696 - int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames) 3696 + int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, 3697 + u32 flags) 3697 3698 { 3698 3699 struct i40e_netdev_priv *np = netdev_priv(dev); 3699 3700 unsigned int queue_index = smp_processor_id(); 3700 3701 struct i40e_vsi *vsi = np->vsi; 3702 + struct i40e_ring *xdp_ring; 3701 3703 int drops = 0; 3702 3704 int i; 3703 3705 ··· 3709 3707 if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs) 3710 3708 return -ENXIO; 3711 3709 3710 + if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) 3711 + return -EINVAL; 3712 + 3713 + xdp_ring = vsi->xdp_rings[queue_index]; 3714 + 3712 3715 for (i = 0; i < n; i++) { 3713 3716 struct xdp_frame *xdpf = frames[i]; 3714 3717 int err; 3715 3718 3716 - err = i40e_xmit_xdp_ring(xdpf, vsi->xdp_rings[queue_index]); 3719 + err = i40e_xmit_xdp_ring(xdpf, xdp_ring); 3717 3720 if (err != I40E_XDP_TX) { 3718 3721 xdp_return_frame_rx_napi(xdpf); 3719 3722 drops++; 3720 3723 } 3721 3724 } 3722 3725 3726 + if (unlikely(flags & XDP_XMIT_FLUSH)) 3727 + i40e_xdp_ring_update_tail(xdp_ring); 3728 + 3723 3729 return n - drops; 3724 - } 3725 - 3726 - /** 3727 - * i40e_xdp_flush - Implements ndo_xdp_flush 3728 - * @dev: netdev 3729 - **/ 3730 - void i40e_xdp_flush(struct net_device *dev) 3731 - { 3732 - struct i40e_netdev_priv *np = netdev_priv(dev); 3733 - unsigned int queue_index = smp_processor_id(); 3734 - struct i40e_vsi *vsi = np->vsi; 3735 - 3736 - if (test_bit(__I40E_VSI_DOWN, vsi->state)) 3737 - return; 3738 - 3739 - if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs) 3740 - return; 3741 - 3742 - i40e_xdp_ring_update_tail(vsi->xdp_rings[queue_index]); 3743 3730 }
+2 -2
drivers/net/ethernet/intel/i40e/i40e_txrx.h
··· 487 487 void i40e_detect_recover_hung(struct i40e_vsi *vsi); 488 488 int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size); 489 489 bool __i40e_chk_linearize(struct sk_buff *skb); 490 - int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames); 491 - void i40e_xdp_flush(struct net_device *dev); 490 + int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, 491 + u32 flags); 492 492 493 493 /** 494 494 * i40e_get_head - Retrieve head from head writeback
+16 -26
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
···
 	}
 }

+static void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring)
+{
+	/* Force memory writes to complete before letting h/w know there
+	 * are new descriptors to fetch.
+	 */
+	wmb();
+	writel(ring->next_to_use, ring->tail);
+}
+
 static int ixgbe_xdp_xmit(struct net_device *dev, int n,
-			  struct xdp_frame **frames)
+			  struct xdp_frame **frames, u32 flags)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_ring *ring;
···
 	if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state)))
 		return -ENETDOWN;
+
+	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
+		return -EINVAL;

 	/* During program transitions it's possible adapter->xdp_prog is assigned
 	 * but ring has not been configured yet. In this case simply abort xmit.
···
 		}
 	}

+	if (unlikely(flags & XDP_XMIT_FLUSH))
+		ixgbe_xdp_ring_update_tail(ring);
+
 	return n - drops;
-}
-
-static void ixgbe_xdp_flush(struct net_device *dev)
-{
-	struct ixgbe_adapter *adapter = netdev_priv(dev);
-	struct ixgbe_ring *ring;
-
-	/* Its possible the device went down between xdp xmit and flush so
-	 * we need to ensure device is still up.
-	 */
-	if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state)))
-		return;
-
-	ring = adapter->xdp_prog ? adapter->xdp_ring[smp_processor_id()] : NULL;
-	if (unlikely(!ring))
-		return;
-
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.
-	 */
-	wmb();
-	writel(ring->next_to_use, ring->tail);
-
-	return;
 }

 static const struct net_device_ops ixgbe_netdev_ops = {
···
 	.ndo_features_check	= ixgbe_features_check,
 	.ndo_bpf		= ixgbe_xdp,
 	.ndo_xdp_xmit		= ixgbe_xdp_xmit,
-	.ndo_xdp_flush		= ixgbe_xdp_flush,
 };

 /**
+17 -27
drivers/net/tun.c
···
 	.ndo_get_stats64	= tun_net_get_stats64,
 };

-static int tun_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames)
+static void __tun_xdp_flush_tfile(struct tun_file *tfile)
+{
+	/* Notify and wake up reader process */
+	if (tfile->flags & TUN_FASYNC)
+		kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
+	tfile->socket.sk->sk_data_ready(tfile->socket.sk);
+}
+
+static int tun_xdp_xmit(struct net_device *dev, int n,
+			struct xdp_frame **frames, u32 flags)
 {
 	struct tun_struct *tun = netdev_priv(dev);
 	struct tun_file *tfile;
···
 	int drops = 0;
 	int cnt = n;
 	int i;
+
+	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
+		return -EINVAL;

 	rcu_read_lock();
···
 	}
 	spin_unlock(&tfile->tx_ring.producer_lock);

+	if (flags & XDP_XMIT_FLUSH)
+		__tun_xdp_flush_tfile(tfile);
+
 	rcu_read_unlock();
 	return cnt - drops;
 }
···
 	if (unlikely(!frame))
 		return -EOVERFLOW;

-	return tun_xdp_xmit(dev, 1, &frame);
-}
-
-static void tun_xdp_flush(struct net_device *dev)
-{
-	struct tun_struct *tun = netdev_priv(dev);
-	struct tun_file *tfile;
-	u32 numqueues;
-
-	rcu_read_lock();
-
-	numqueues = READ_ONCE(tun->numqueues);
-	if (!numqueues)
-		goto out;
-
-	tfile = rcu_dereference(tun->tfiles[smp_processor_id() %
-					    numqueues]);
-	/* Notify and wake up reader process */
-	if (tfile->flags & TUN_FASYNC)
-		kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
-	tfile->socket.sk->sk_data_ready(tfile->socket.sk);
-
-out:
-	rcu_read_unlock();
+	return tun_xdp_xmit(dev, 1, &frame, XDP_XMIT_FLUSH);
 }

 static const struct net_device_ops tap_netdev_ops = {
···
 	.ndo_get_stats64	= tun_net_get_stats64,
 	.ndo_bpf		= tun_xdp,
 	.ndo_xdp_xmit		= tun_xdp_xmit,
-	.ndo_xdp_flush		= tun_xdp_flush,
 };

 static void tun_flow_init(struct tun_struct *tun)
···
 	alloc_frag->offset += buflen;
 	if (tun_xdp_tx(tun->dev, &xdp))
 		goto err_redirect;
-	tun_xdp_flush(tun->dev);
 	rcu_read_unlock();
 	local_bh_enable();
 	return NULL;
+8 -14
drivers/net/virtio_net.c
···
 	return skb;
 }

-static void virtnet_xdp_flush(struct net_device *dev)
-{
-	struct virtnet_info *vi = netdev_priv(dev);
-	struct send_queue *sq;
-	unsigned int qp;
-
-	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
-	sq = &vi->sq[qp];
-
-	virtqueue_kick(sq->vq);
-}
-
 static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 				  struct send_queue *sq,
 				  struct xdp_frame *xdpf)
···
 }

 static int virtnet_xdp_xmit(struct net_device *dev,
-			    int n, struct xdp_frame **frames)
+			    int n, struct xdp_frame **frames, u32 flags)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct receive_queue *rq = vi->rq;
···
 	int drops = 0;
 	int err;
 	int i;
+
+	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
+		return -EINVAL;

 	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
 	sq = &vi->sq[qp];
···
 			drops++;
 		}
 	}
+
+	if (flags & XDP_XMIT_FLUSH)
+		virtqueue_kick(sq->vq);
+
 	return n - drops;
 }
···
 #endif
 	.ndo_bpf		= virtnet_xdp,
 	.ndo_xdp_xmit		= virtnet_xdp_xmit,
-	.ndo_xdp_flush		= virtnet_xdp_flush,
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_phys_port_name	= virtnet_get_phys_port_name,
 };
+18 -6
include/linux/bpf-cgroup.h
···

 int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 				      struct sockaddr *uaddr,
-				      enum bpf_attach_type type);
+				      enum bpf_attach_type type,
+				      void *t_ctx);

 int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 				     struct bpf_sock_ops_kern *sock_ops,
···
 ({									\
 	int __ret = 0;							\
 	if (cgroup_bpf_enabled)						\
-		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type);	\
+		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,	\
+							  NULL);	\
 	__ret;								\
 })

-#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type)			\
+#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type, t_ctx)		\
 ({									\
 	int __ret = 0;							\
 	if (cgroup_bpf_enabled)	{					\
 		lock_sock(sk);						\
-		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type);	\
+		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type,	\
+							  t_ctx);	\
 		release_sock(sk);					\
 	}								\
 	__ret;								\
···
 	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, BPF_CGROUP_INET6_CONNECT)

 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr)		\
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_CONNECT)
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_CONNECT, NULL)

 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr)		\
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_CONNECT)
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_CONNECT, NULL)
+
+#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx)		\
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP4_SENDMSG, t_ctx)
+
+#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx)		\
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP6_SENDMSG, t_ctx)

 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops)				\
 ({									\
···
 static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
 static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }

+#define cgroup_bpf_enabled (0)
 #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0)
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
···
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
+1
include/linux/bpf.h
···
 extern const struct bpf_func_proto bpf_get_stack_proto;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
 extern const struct bpf_func_proto bpf_sock_hash_update_proto;
+extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;

 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
+29
include/linux/bpf_lirc.h
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _BPF_LIRC_H
+#define _BPF_LIRC_H
+
+#include <uapi/linux/bpf.h>
+
+#ifdef CONFIG_BPF_LIRC_MODE2
+int lirc_prog_attach(const union bpf_attr *attr);
+int lirc_prog_detach(const union bpf_attr *attr);
+int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr);
+#else
+static inline int lirc_prog_attach(const union bpf_attr *attr)
+{
+	return -EINVAL;
+}
+
+static inline int lirc_prog_detach(const union bpf_attr *attr)
+{
+	return -EINVAL;
+}
+
+static inline int lirc_prog_query(const union bpf_attr *attr,
+				  union bpf_attr __user *uattr)
+{
+	return -EINVAL;
+}
+#endif
+
+#endif /* _BPF_LIRC_H */
+3
include/linux/bpf_types.h
···
 #ifdef CONFIG_CGROUP_BPF
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
 #endif
+#ifdef CONFIG_BPF_LIRC_MODE2
+BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2)
+#endif

 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
+38 -6
include/linux/filter.h
···
 		.off   = OFF,					\
 		.imm   = 0 })

+/* Relative call */
+
+#define BPF_CALL_REL(TGT)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_JMP | BPF_CALL,			\
+		.dst_reg = 0,					\
+		.src_reg = BPF_PSEUDO_CALL,			\
+		.off   = 0,					\
+		.imm   = TGT })
+
 /* Function call */
+
+#define BPF_CAST_CALL(x)					\
+		((u64 (*)(u64, u64, u64, u64, u64))(x))

 #define BPF_EMIT_CALL(FUNC)					\
 	((struct bpf_insn) {					\
···
 	return prog->type == BPF_PROG_TYPE_UNSPEC;
 }

-static inline bool
-bpf_ctx_narrow_access_ok(u32 off, u32 size, const u32 size_default)
+static inline u32 bpf_ctx_off_adjust_machine(u32 size)
 {
-	bool off_ok;
+	const u32 size_machine = sizeof(unsigned long);
+
+	if (size > size_machine && size % size_machine == 0)
+		size = size_machine;
+
+	return size;
+}
+
+static inline bool bpf_ctx_narrow_align_ok(u32 off, u32 size_access,
+					   u32 size_default)
+{
+	size_default = bpf_ctx_off_adjust_machine(size_default);
+	size_access = bpf_ctx_off_adjust_machine(size_access);
+
 #ifdef __LITTLE_ENDIAN
-	off_ok = (off & (size_default - 1)) == 0;
+	return (off & (size_default - 1)) == 0;
 #else
-	off_ok = (off & (size_default - 1)) + size == size_default;
+	return (off & (size_default - 1)) + size_access == size_default;
 #endif
-	return off_ok && size <= size_default && (size & (size - 1)) == 0;
+}
+
+static inline bool
+bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
+{
+	return bpf_ctx_narrow_align_ok(off, size, size_default) &&
+	       size <= size_default && (size & (size - 1)) == 0;
 }

 #define bpf_classic_proglen(fprog)	(fprog->len * sizeof(fprog->filter[0]))
···
 	 * only two (src and dst) are available at convert_ctx_access time
 	 */
 	u64 tmp_reg;
+	void *t_ctx;	/* Attach type specific context. */
 };

 struct bpf_sock_ops_kern {
+14 -7
include/linux/netdevice.h
···
 	BPF_OFFLOAD_DESTROY,
 	BPF_OFFLOAD_MAP_ALLOC,
 	BPF_OFFLOAD_MAP_FREE,
+	XDP_QUERY_XSK_UMEM,
+	XDP_SETUP_XSK_UMEM,
 };

 struct bpf_prog_offload_ops;
 struct netlink_ext_ack;
+struct xdp_umem;

 struct netdev_bpf {
 	enum bpf_netdev_command command;
···
 		struct {
 			struct bpf_offloaded_map *offmap;
 		};
+		/* XDP_SETUP_XSK_UMEM */
+		struct {
+			struct xdp_umem *umem;
+			u16 queue_id;
+		} xsk;
 	};
 };
···
 *	This function is used to set or query state related to XDP on the
 *	netdevice and manage BPF offload. See definition of
 *	enum bpf_netdev_command for details.
- * int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame **xdp);
+ * int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame **xdp,
+ *			u32 flags);
 *	This function is used to submit @n XDP packets for transmit on a
 *	netdevice. Returns number of frames successfully transmitted, frames
 *	that got dropped are freed/returned via xdp_return_frame().
 *	Returns negative number, means general error invoking ndo, meaning
 *	no frames were xmit'ed and core-caller will free all frames.
- *	TODO: Consider add flag to allow sending flush operation.
- * void (*ndo_xdp_flush)(struct net_device *dev);
- *	This function is used to inform the driver to flush a particular
- *	xdp tx queue. Must be called on same CPU as xdp_xmit.
 */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
···
 	int			(*ndo_bpf)(struct net_device *dev,
 					   struct netdev_bpf *bpf);
 	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
-						struct xdp_frame **xdp);
-	void			(*ndo_xdp_flush)(struct net_device *dev);
+						struct xdp_frame **xdp,
+						u32 flags);
+	int			(*ndo_xsk_async_xmit)(struct net_device *dev,
+						      u32 queue_id);
 };

 /**
+14
include/net/xdp.h
···
 	MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
 	MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
 	MEM_TYPE_PAGE_POOL,
+	MEM_TYPE_ZERO_COPY,
 	MEM_TYPE_MAX,
 };
+
+/* XDP flags for ndo_xdp_xmit */
+#define XDP_XMIT_FLUSH		(1U << 0)	/* doorbell signal consumer */
+#define XDP_XMIT_FLAGS_MASK	XDP_XMIT_FLUSH

 struct xdp_mem_info {
 	u32 type; /* enum xdp_mem_type, but known size type */
···
 };

 struct page_pool;
+
+struct zero_copy_allocator {
+	void (*free)(struct zero_copy_allocator *zca, unsigned long handle);
+};

 struct xdp_rxq_info {
 	struct net_device *dev;
···
 	void *data_end;
 	void *data_meta;
 	void *data_hard_start;
+	unsigned long handle;
 	struct xdp_rxq_info *rxq;
 };
···
 	struct xdp_frame *xdp_frame;
 	int metasize;
 	int headroom;
+
+	/* TODO: implement clone, copy, use "native" MEM_TYPE */
+	if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY)
+		return NULL;

 	/* Assure headroom is available for storing info */
 	headroom = xdp->data - xdp->data_hard_start;
+43 -1
include/net/xdp_sock.h
···
 #ifndef _LINUX_XDP_SOCK_H
 #define _LINUX_XDP_SOCK_H

+#include <linux/workqueue.h>
+#include <linux/if_xdp.h>
 #include <linux/mutex.h>
+#include <linux/spinlock.h>
+#include <linux/mm.h>
 #include <net/sock.h>

 struct net_device;
 struct xsk_queue;
-struct xdp_umem;
+
+struct xdp_umem_props {
+	u64 chunk_mask;
+	u64 size;
+};
+
+struct xdp_umem_page {
+	void *addr;
+	dma_addr_t dma;
+};
+
+struct xdp_umem {
+	struct xsk_queue *fq;
+	struct xsk_queue *cq;
+	struct xdp_umem_page *pages;
+	struct xdp_umem_props props;
+	u32 headroom;
+	u32 chunk_size_nohr;
+	struct user_struct *user;
+	struct pid *pid;
+	unsigned long address;
+	refcount_t users;
+	struct work_struct work;
+	struct page **pgs;
+	u32 npgs;
+	struct net_device *dev;
+	u16 queue_id;
+	bool zc;
+	spinlock_t xsk_list_lock;
+	struct list_head xsk_list;
+};

 struct xdp_sock {
 	/* struct sock must be the first member of struct xdp_sock */
···
 	struct list_head flush_node;
 	u16 queue_id;
 	struct xsk_queue *tx ____cacheline_aligned_in_smp;
+	struct list_head list;
+	bool zc;
 	/* Protects multiple processes in the control path */
 	struct mutex mutex;
 	u64 rx_dropped;
···
 int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
 void xsk_flush(struct xdp_sock *xs);
 bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs);
+/* Used from netdev driver */
+u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr);
+void xsk_umem_discard_addr(struct xdp_umem *umem);
+void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
+bool xsk_umem_consume_tx(struct xdp_umem *umem, dma_addr_t *dma, u32 *len);
+void xsk_umem_consume_tx_done(struct xdp_umem *umem);
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
+109 -27
include/uapi/linux/bpf.h
···
 	BPF_PROG_TYPE_RAW_TRACEPOINT,
 	BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 	BPF_PROG_TYPE_LWT_SEG6LOCAL,
+	BPF_PROG_TYPE_LIRC_MODE2,
 };

 enum bpf_attach_type {
···
 	BPF_CGROUP_INET6_CONNECT,
 	BPF_CGROUP_INET4_POST_BIND,
 	BPF_CGROUP_INET6_POST_BIND,
+	BPF_CGROUP_UDP4_SENDMSG,
+	BPF_CGROUP_UDP6_SENDMSG,
+	BPF_LIRC_MODE2,
 	__MAX_BPF_ATTACH_TYPE
 };
···
 *		::
 *
 *			# sysctl kernel.perf_event_max_stack=<new value>
- *
 *	Return
 *		The positive or null stack id on success, or a negative error
 *		in case of failure.
···
 *		::
 *
 *			# sysctl kernel.perf_event_max_stack=<new value>
- *
 *	Return
- *		a non-negative value equal to or less than size on success, or
- *		a negative error in case of failure.
+ *		A non-negative value equal to or less than *size* on success,
+ *		or a negative error in case of failure.
 *
 * int skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void *to, u32 len, u32 start_header)
 *	Description
···
 *		in socket filters where *skb*\ **->data** does not always point
 *		to the start of the mac header and where "direct packet access"
 *		is not available.
- *
 *	Return
 *		0 on success, or a negative error in case of failure.
 *
···
 *		If lookup is successful and result shows packet is to be
 *		forwarded, the neighbor tables are searched for the nexthop.
 *		If successful (ie., FIB lookup shows forwarding and nexthop
- *		is resolved), the nexthop address is returned in ipv4_dst,
- *		ipv6_dst or mpls_out based on family, smac is set to mac
- *		address of egress device, dmac is set to nexthop mac address,
- *		rt_metric is set to metric from route.
+ *		is resolved), the nexthop address is returned in ipv4_dst
+ *		or ipv6_dst based on family, smac is set to mac address of
+ *		egress device, dmac is set to nexthop mac address, rt_metric
+ *		is set to metric from route (IPv4/IPv6 only).
 *
 *		*plen* argument is the size of the passed in struct.
- *		*flags* argument can be one or more BPF_FIB_LOOKUP_ flags:
+ *		*flags* argument can be a combination of one or more of the
+ *		following values:
 *
- *		**BPF_FIB_LOOKUP_DIRECT** means do a direct table lookup vs
- *		full lookup using FIB rules
- *		**BPF_FIB_LOOKUP_OUTPUT** means do lookup from an egress
- *		perspective (default is ingress)
+ *		**BPF_FIB_LOOKUP_DIRECT**
+ *			Do a direct table lookup vs full lookup using FIB
+ *			rules.
+ *		**BPF_FIB_LOOKUP_OUTPUT**
+ *			Perform lookup from an egress perspective (default is
+ *			ingress).
 *
 *		*ctx* is either **struct xdp_md** for XDP programs or
 *		**struct sk_buff** tc cls_act programs.
- *
 *	Return
 *		Egress device index on success, 0 if packet needs to continue
 *		up the stack for further processing or a negative error in case
···
 *		direct packet access.
 *	Return
 *		0 on success, or a negative error in case of failure.
+ *
+ * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
+ *	Description
+ *		This helper is used in programs implementing IR decoding, to
+ *		report a successfully decoded key press with *scancode*,
+ *		*toggle* value in the given *protocol*. The scancode will be
+ *		translated to a keycode using the rc keymap, and reported as
+ *		an input key down event. After a period a key up event is
+ *		generated. This period can be extended by calling either
+ *		**bpf_rc_keydown**\ () again with the same values, or calling
+ *		**bpf_rc_repeat**\ ().
+ *
+ *		Some protocols include a toggle bit, in case the button was
+ *		released and pressed again between consecutive scancodes.
+ *
+ *		The *ctx* should point to the lirc sample as passed into
+ *		the program.
+ *
+ *		The *protocol* is the decoded protocol number (see
+ *		**enum rc_proto** for some predefined values).
+ *
+ *		This helper is only available if the kernel was compiled with
+ *		the **CONFIG_BPF_LIRC_MODE2** configuration option set to
+ *		"**y**".
+ *	Return
+ *		0
+ *
+ * int bpf_rc_repeat(void *ctx)
+ *	Description
+ *		This helper is used in programs implementing IR decoding, to
+ *		report a successfully decoded repeat key message. This delays
+ *		the generation of a key up event for previously generated
+ *		key down event.
+ *
+ *		Some IR protocols like NEC have a special IR message for
+ *		repeating last button, for when a button is held down.
+ *
+ *		The *ctx* should point to the lirc sample as passed into
+ *		the program.
+ *
+ *		This helper is only available if the kernel was compiled with
+ *		the **CONFIG_BPF_LIRC_MODE2** configuration option set to
+ *		"**y**".
+ *	Return
+ *		0
+ *
+ * uint64_t bpf_skb_cgroup_id(struct sk_buff *skb)
+ *	Description
+ *		Return the cgroup v2 id of the socket associated with the *skb*.
+ *		This is roughly similar to the **bpf_get_cgroup_classid**\ ()
+ *		helper for cgroup v1 by providing a tag resp. identifier that
+ *		can be matched on or used for map lookups e.g. to implement
+ *		policy. The cgroup v2 id of a given path in the hierarchy is
+ *		exposed in user space through the f_handle API in order to get
+ *		to the same 64-bit id.
+ *
+ *		This helper can be used on TC egress path, but not on ingress,
+ *		and is available only if the kernel was compiled with the
+ *		**CONFIG_SOCK_CGROUP_DATA** configuration option.
+ *	Return
+ *		The id is returned or 0 in case the id could not be retrieved.
+ *
+ * u64 bpf_get_current_cgroup_id(void)
+ *	Return
+ *		A 64-bit integer containing the current cgroup id based
+ *		on the cgroup within which the current task is running.
 */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
···
 	FN(lwt_push_encap),		\
 	FN(lwt_seg6_store_bytes),	\
 	FN(lwt_seg6_adjust_srh),	\
-	FN(lwt_seg6_action),
+	FN(lwt_seg6_action),		\
+	FN(rc_repeat),			\
+	FN(rc_keydown),			\
+	FN(skb_cgroup_id),		\
+	FN(get_current_cgroup_id),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
 * function eBPF program intends to call
···
 	};
 	__u8	tunnel_tos;
 	__u8	tunnel_ttl;
-	__u16	tunnel_ext;
+	__u16	tunnel_ext;	/* Padding, future use. */
 	__u32	tunnel_label;
 };
···
 	__u32	reqid;
 	__u32	spi;	/* Stored in network byte order */
 	__u16	family;
+	__u16	ext;	/* Padding, future use. */
 	union {
 		__u32	remote_ipv4;	/* Stored in network byte order */
 		__u32	remote_ipv6[4];	/* Stored in network byte order */
···
 	__u32 family;	/* Allows 4-byte read, but no write */
 	__u32 type;	/* Allows 4-byte read, but no write */
 	__u32 protocol;	/* Allows 4-byte read, but no write */
+	__u32 msg_src_ip4;	/* Allows 1,2,4-byte read and 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 msg_src_ip6[4];	/* Allows 1,2,4-byte read and 4-byte write.
+				 * Stored in network byte order.
+				 */
 };

 /* User bpf_sock_ops struct to access socket values and specify request ops
···
 #define BPF_FIB_LOOKUP_OUTPUT		BIT(1)

 struct bpf_fib_lookup {
-	/* input */
-	__u8	family;   /* network family, AF_INET, AF_INET6, AF_MPLS */
+	/* input:  network family for lookup (AF_INET, AF_INET6)
+	 * output: network family of egress nexthop
+	 */
+	__u8	family;

 	/* set if lookup is to consider L4 data - e.g., FIB rules */
 	__u8	l4_protocol;
···
 	union {
 		/* inputs to lookup */
 		__u8	tos;		/* AF_INET */
-		__be32	flowlabel;	/* AF_INET6 */
+		__be32	flowinfo;	/* AF_INET6, flow_label + priority */

-		/* output: metric of fib result */
-		__u32	rt_metric;
+		/* output: metric of fib result (IPv4/IPv6 only) */
+		__u32	rt_metric;
 	};

 	union {
-		__be32	mpls_in;
 		__be32	ipv4_src;
 		__u32	ipv6_src[4];	/* in6_addr; network order */
 	};

-	/* input to bpf_fib_lookup, *dst is destination address.
-	 * output: bpf_fib_lookup sets to gateway address
+	/* input to bpf_fib_lookup, ipv{4,6}_dst is destination address in
+	 * network header. output: bpf_fib_lookup sets to gateway address
+	 * if FIB lookup returns gateway route
 	 */
 	union {
-		/* return for MPLS lookups */
-		__be32	mpls_out[4];	/* support up to 4 labels */
 		__be32	ipv4_dst;
 		__u32	ipv6_dst[4];	/* in6_addr; network order */
 	};
+8 -8
include/uapi/linux/if_xdp.h
···
 #include <linux/types.h>

 /* Options for the sxdp_flags field */
-#define XDP_SHARED_UMEM 1
+#define XDP_SHARED_UMEM	(1 << 0)
+#define XDP_COPY	(1 << 1) /* Force copy-mode */
+#define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */

 struct sockaddr_xdp {
 	__u16 sxdp_family;
···
 struct xdp_umem_reg {
 	__u64 addr; /* Start of packet data area */
 	__u64 len; /* Length of packet data area */
-	__u32 frame_size; /* Frame size */
-	__u32 frame_headroom; /* Frame head room */
+	__u32 chunk_size;
+	__u32 headroom;
 };

 struct xdp_statistics {
···

 /* Rx/Tx descriptor */
 struct xdp_desc {
-	__u32 idx;
+	__u64 addr;
 	__u32 len;
-	__u16 offset;
-	__u8 flags;
-	__u8 padding[5];
+	__u32 options;
 };

-/* UMEM descriptor is __u32 */
+/* UMEM descriptor is __u64 */

 #endif /* _LINUX_IF_XDP_H */
+26 -2
kernel/bpf/btf.c
···
 		       !btf_type_is_array(next_type) &&
 		       !btf_type_is_struct(next_type);
 	default:
-		BUG_ON(1);
+		BUG();
 	}
 }
···
 	.seq_show = btf_ptr_seq_show,
 };

+static s32 btf_fwd_check_meta(struct btf_verifier_env *env,
+			      const struct btf_type *t,
+			      u32 meta_left)
+{
+	if (btf_type_vlen(t)) {
+		btf_verifier_log_type(env, t, "vlen != 0");
+		return -EINVAL;
+	}
+
+	if (t->type) {
+		btf_verifier_log_type(env, t, "type != 0");
+		return -EINVAL;
+	}
+
+	btf_verifier_log_type(env, t, NULL);
+
+	return 0;
+}
+
 static struct btf_kind_operations fwd_ops = {
-	.check_meta = btf_ref_type_check_meta,
+	.check_meta = btf_fwd_check_meta,
 	.resolve = btf_df_resolve,
 	.check_member = btf_df_check_member,
 	.log_details = btf_ref_type_log,
···

 	if (btf_type_vlen(t)) {
 		btf_verifier_log_type(env, t, "vlen != 0");
+		return -EINVAL;
+	}
+
+	if (t->size) {
+		btf_verifier_log_type(env, t, "size != 0");
 		return -EINVAL;
 	}

+10 -1
kernel/bpf/cgroup.c
···
 * @sk: sock struct that will use sockaddr
 * @uaddr: sockaddr struct provided by user
 * @type: The type of program to be executed
+ * @t_ctx: Pointer to attach type specific context
 *
 * socket is expected to be of type INET or INET6.
 *
···
 */
 int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 				      struct sockaddr *uaddr,
-				      enum bpf_attach_type type)
+				      enum bpf_attach_type type,
+				      void *t_ctx)
 {
 	struct bpf_sock_addr_kern ctx = {
 		.sk = sk,
 		.uaddr = uaddr,
+		.t_ctx = t_ctx,
 	};
+	struct sockaddr_storage unspec;
 	struct cgroup *cgrp;
 	int ret;
···
 	 */
 	if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6)
 		return 0;
+
+	if (!ctx.uaddr) {
+		memset(&unspec, 0, sizeof(unspec));
+		ctx.uaddr = (struct sockaddr *)&unspec;
+	}

 	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], &ctx, BPF_PROG_RUN);
+10 -2
kernel/bpf/core.c
···
 	int new_prog_cnt, carry_prog_cnt = 0;
 	struct bpf_prog **existing_prog;
 	struct bpf_prog_array *array;
+	bool found_exclude = false;
 	int new_prog_idx = 0;

 	/* Figure out how many existing progs we need to carry over to
···
 	if (old_array) {
 		existing_prog = old_array->progs;
 		for (; *existing_prog; existing_prog++) {
-			if (*existing_prog != exclude_prog &&
-			    *existing_prog != &dummy_bpf_prog.prog)
+			if (*existing_prog == exclude_prog) {
+				found_exclude = true;
+				continue;
+			}
+			if (*existing_prog != &dummy_bpf_prog.prog)
 				carry_prog_cnt++;
 			if (*existing_prog == include_prog)
 				return -EEXIST;
 		}
 	}
+
+	if (exclude_prog && !found_exclude)
+		return -ENOENT;

 	/* How many progs (not NULL) will be in the new array? */
 	new_prog_cnt = carry_prog_cnt;
···
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
 const struct bpf_func_proto bpf_sock_map_update_proto __weak;
 const struct bpf_func_proto bpf_sock_hash_update_proto __weak;
+const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;

 const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {
+7 -14
kernel/bpf/devmap.c
··· 217 217 } 218 218 219 219 static int bq_xmit_all(struct bpf_dtab_netdev *obj, 220 - struct xdp_bulk_queue *bq) 220 + struct xdp_bulk_queue *bq, u32 flags) 221 221 { 222 222 struct net_device *dev = obj->dev; 223 223 int sent = 0, drops = 0, err = 0; ··· 232 232 prefetch(xdpf); 233 233 } 234 234 235 - sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q); 235 + sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags); 236 236 if (sent < 0) { 237 237 err = sent; 238 238 sent = 0; ··· 276 276 for_each_set_bit(bit, bitmap, map->max_entries) { 277 277 struct bpf_dtab_netdev *dev = READ_ONCE(dtab->netdev_map[bit]); 278 278 struct xdp_bulk_queue *bq; 279 - struct net_device *netdev; 280 279 281 280 /* This is possible if the dev entry is removed by user space 282 281 * between xdp redirect and flush op. ··· 286 287 __clear_bit(bit, bitmap); 287 288 288 289 bq = this_cpu_ptr(dev->bulkq); 289 - bq_xmit_all(dev, bq); 290 - netdev = dev->dev; 291 - if (likely(netdev->netdev_ops->ndo_xdp_flush)) 292 - netdev->netdev_ops->ndo_xdp_flush(netdev); 290 + bq_xmit_all(dev, bq, XDP_XMIT_FLUSH); 293 291 } 294 292 } 295 293 ··· 316 320 struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq); 317 321 318 322 if (unlikely(bq->count == DEV_MAP_BULK_SIZE)) 319 - bq_xmit_all(obj, bq); 323 + bq_xmit_all(obj, bq, 0); 320 324 321 325 /* Ingress dev_rx will be the same for all xdp_frame's in 322 326 * bulk_queue, because bq stored per-CPU and must be flushed ··· 348 352 static void *dev_map_lookup_elem(struct bpf_map *map, void *key) 349 353 { 350 354 struct bpf_dtab_netdev *obj = __dev_map_lookup_elem(map, *(u32 *)key); 351 - struct net_device *dev = dev = obj ? obj->dev : NULL; 355 + struct net_device *dev = obj ? obj->dev : NULL; 352 356 353 357 return dev ? 
&dev->ifindex : NULL; 354 358 } 355 359 356 360 static void dev_map_flush_old(struct bpf_dtab_netdev *dev) 357 361 { 358 - if (dev->dev->netdev_ops->ndo_xdp_flush) { 359 - struct net_device *fl = dev->dev; 362 + if (dev->dev->netdev_ops->ndo_xdp_xmit) { 360 363 struct xdp_bulk_queue *bq; 361 364 unsigned long *bitmap; 362 365 ··· 366 371 __clear_bit(dev->bit, bitmap); 367 372 368 373 bq = per_cpu_ptr(dev->bulkq, cpu); 369 - bq_xmit_all(dev, bq); 370 - 371 - fl->netdev_ops->ndo_xdp_flush(dev->dev); 374 + bq_xmit_all(dev, bq, XDP_XMIT_FLUSH); 372 375 } 373 376 } 374 377 }
+9 -3
kernel/bpf/hashtab.c
··· 503 503 struct bpf_insn *insn = insn_buf; 504 504 const int ret = BPF_REG_0; 505 505 506 - *insn++ = BPF_EMIT_CALL((u64 (*)(u64, u64, u64, u64, u64))__htab_map_lookup_elem); 506 + BUILD_BUG_ON(!__same_type(&__htab_map_lookup_elem, 507 + (void *(*)(struct bpf_map *map, void *key))NULL)); 508 + *insn++ = BPF_EMIT_CALL(BPF_CAST_CALL(__htab_map_lookup_elem)); 507 509 *insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 1); 508 510 *insn++ = BPF_ALU64_IMM(BPF_ADD, ret, 509 511 offsetof(struct htab_elem, key) + ··· 532 530 const int ret = BPF_REG_0; 533 531 const int ref_reg = BPF_REG_1; 534 532 535 - *insn++ = BPF_EMIT_CALL((u64 (*)(u64, u64, u64, u64, u64))__htab_map_lookup_elem); 533 + BUILD_BUG_ON(!__same_type(&__htab_map_lookup_elem, 534 + (void *(*)(struct bpf_map *map, void *key))NULL)); 535 + *insn++ = BPF_EMIT_CALL(BPF_CAST_CALL(__htab_map_lookup_elem)); 536 536 *insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 4); 537 537 *insn++ = BPF_LDX_MEM(BPF_B, ref_reg, ret, 538 538 offsetof(struct htab_elem, lru_node) + ··· 1373 1369 struct bpf_insn *insn = insn_buf; 1374 1370 const int ret = BPF_REG_0; 1375 1371 1376 - *insn++ = BPF_EMIT_CALL((u64 (*)(u64, u64, u64, u64, u64))__htab_map_lookup_elem); 1372 + BUILD_BUG_ON(!__same_type(&__htab_map_lookup_elem, 1373 + (void *(*)(struct bpf_map *map, void *key))NULL)); 1374 + *insn++ = BPF_EMIT_CALL(BPF_CAST_CALL(__htab_map_lookup_elem)); 1377 1375 *insn++ = BPF_JMP_IMM(BPF_JEQ, ret, 0, 2); 1378 1376 *insn++ = BPF_ALU64_IMM(BPF_ADD, ret, 1379 1377 offsetof(struct htab_elem, key) +
+15
kernel/bpf/helpers.c
··· 179 179 .arg1_type = ARG_PTR_TO_UNINIT_MEM, 180 180 .arg2_type = ARG_CONST_SIZE, 181 181 }; 182 + 183 + #ifdef CONFIG_CGROUPS 184 + BPF_CALL_0(bpf_get_current_cgroup_id) 185 + { 186 + struct cgroup *cgrp = task_dfl_cgroup(current); 187 + 188 + return cgrp->kn->id.id; 189 + } 190 + 191 + const struct bpf_func_proto bpf_get_current_cgroup_id_proto = { 192 + .func = bpf_get_current_cgroup_id, 193 + .gpl_only = false, 194 + .ret_type = RET_INTEGER, 195 + }; 196 + #endif
+3 -4
kernel/bpf/stackmap.c
··· 285 285 { 286 286 int i; 287 287 struct vm_area_struct *vma; 288 - bool in_nmi_ctx = in_nmi(); 289 288 bool irq_work_busy = false; 290 - struct stack_map_irq_work *work; 289 + struct stack_map_irq_work *work = NULL; 291 290 292 - if (in_nmi_ctx) { 291 + if (in_nmi()) { 293 292 work = this_cpu_ptr(&up_read_work); 294 293 if (work->irq_work.flags & IRQ_WORK_BUSY) 295 294 /* cannot queue more up_read, fallback */ ··· 327 328 id_offs[i].status = BPF_STACK_BUILD_ID_VALID; 328 329 } 329 330 330 - if (!in_nmi_ctx) { 331 + if (!work) { 331 332 up_read(&current->mm->mmap_sem); 332 333 } else { 333 334 work->sem = &current->mm->mmap_sem;
+23 -4
kernel/bpf/syscall.c
··· 11 11 */ 12 12 #include <linux/bpf.h> 13 13 #include <linux/bpf_trace.h> 14 + #include <linux/bpf_lirc.h> 14 15 #include <linux/btf.h> 15 16 #include <linux/syscalls.h> 16 17 #include <linux/slab.h> ··· 327 326 "value_size:\t%u\n" 328 327 "max_entries:\t%u\n" 329 328 "map_flags:\t%#x\n" 330 - "memlock:\t%llu\n", 329 + "memlock:\t%llu\n" 330 + "map_id:\t%u\n", 331 331 map->map_type, 332 332 map->key_size, 333 333 map->value_size, 334 334 map->max_entries, 335 335 map->map_flags, 336 - map->pages * 1ULL << PAGE_SHIFT); 336 + map->pages * 1ULL << PAGE_SHIFT, 337 + map->id); 337 338 338 339 if (owner_prog_type) { 339 340 seq_printf(m, "owner_prog_type:\t%u\n", ··· 1072 1069 "prog_type:\t%u\n" 1073 1070 "prog_jited:\t%u\n" 1074 1071 "prog_tag:\t%s\n" 1075 - "memlock:\t%llu\n", 1072 + "memlock:\t%llu\n" 1073 + "prog_id:\t%u\n", 1076 1074 prog->type, 1077 1075 prog->jited, 1078 1076 prog_tag, 1079 - prog->pages * 1ULL << PAGE_SHIFT); 1077 + prog->pages * 1ULL << PAGE_SHIFT, 1078 + prog->aux->id); 1080 1079 } 1081 1080 #endif 1082 1081 ··· 1254 1249 case BPF_CGROUP_INET6_BIND: 1255 1250 case BPF_CGROUP_INET4_CONNECT: 1256 1251 case BPF_CGROUP_INET6_CONNECT: 1252 + case BPF_CGROUP_UDP4_SENDMSG: 1253 + case BPF_CGROUP_UDP6_SENDMSG: 1257 1254 return 0; 1258 1255 default: 1259 1256 return -EINVAL; ··· 1572 1565 case BPF_CGROUP_INET6_BIND: 1573 1566 case BPF_CGROUP_INET4_CONNECT: 1574 1567 case BPF_CGROUP_INET6_CONNECT: 1568 + case BPF_CGROUP_UDP4_SENDMSG: 1569 + case BPF_CGROUP_UDP6_SENDMSG: 1575 1570 ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR; 1576 1571 break; 1577 1572 case BPF_CGROUP_SOCK_OPS: ··· 1587 1578 case BPF_SK_SKB_STREAM_PARSER: 1588 1579 case BPF_SK_SKB_STREAM_VERDICT: 1589 1580 return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, true); 1581 + case BPF_LIRC_MODE2: 1582 + return lirc_prog_attach(attr); 1590 1583 default: 1591 1584 return -EINVAL; 1592 1585 } ··· 1646 1635 case BPF_CGROUP_INET6_BIND: 1647 1636 case BPF_CGROUP_INET4_CONNECT: 1648 1637 case 
BPF_CGROUP_INET6_CONNECT: 1638 + case BPF_CGROUP_UDP4_SENDMSG: 1639 + case BPF_CGROUP_UDP6_SENDMSG: 1649 1640 ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR; 1650 1641 break; 1651 1642 case BPF_CGROUP_SOCK_OPS: ··· 1661 1648 case BPF_SK_SKB_STREAM_PARSER: 1662 1649 case BPF_SK_SKB_STREAM_VERDICT: 1663 1650 return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, false); 1651 + case BPF_LIRC_MODE2: 1652 + return lirc_prog_detach(attr); 1664 1653 default: 1665 1654 return -EINVAL; 1666 1655 } ··· 1707 1692 case BPF_CGROUP_INET6_POST_BIND: 1708 1693 case BPF_CGROUP_INET4_CONNECT: 1709 1694 case BPF_CGROUP_INET6_CONNECT: 1695 + case BPF_CGROUP_UDP4_SENDMSG: 1696 + case BPF_CGROUP_UDP6_SENDMSG: 1710 1697 case BPF_CGROUP_SOCK_OPS: 1711 1698 case BPF_CGROUP_DEVICE: 1712 1699 break; 1700 + case BPF_LIRC_MODE2: 1701 + return lirc_prog_query(attr, uattr); 1713 1702 default: 1714 1703 return -EINVAL; 1715 1704 }
+52 -21
kernel/bpf/verifier.c
··· 2421 2421 struct bpf_insn_aux_data *aux = &env->insn_aux_data[insn_idx]; 2422 2422 2423 2423 if (func_id != BPF_FUNC_tail_call && 2424 - func_id != BPF_FUNC_map_lookup_elem) 2424 + func_id != BPF_FUNC_map_lookup_elem && 2425 + func_id != BPF_FUNC_map_update_elem && 2426 + func_id != BPF_FUNC_map_delete_elem) 2425 2427 return 0; 2428 + 2426 2429 if (meta->map_ptr == NULL) { 2427 2430 verbose(env, "kernel subsystem misconfigured verifier\n"); 2428 2431 return -EINVAL; ··· 2465 2462 2466 2463 /* eBPF programs must be GPL compatible to use GPL-ed functions */ 2467 2464 if (!env->prog->gpl_compatible && fn->gpl_only) { 2468 - verbose(env, "cannot call GPL only function from proprietary program\n"); 2465 + verbose(env, "cannot call GPL-restricted function from non-GPL compatible program\n"); 2469 2466 return -EINVAL; 2470 2467 } 2471 2468 ··· 5349 5346 */ 5350 5347 is_narrower_load = size < ctx_field_size; 5351 5348 if (is_narrower_load) { 5349 + u32 size_default = bpf_ctx_off_adjust_machine(ctx_field_size); 5352 5350 u32 off = insn->off; 5353 5351 u8 size_code; 5354 5352 ··· 5364 5360 else if (ctx_field_size == 8) 5365 5361 size_code = BPF_DW; 5366 5362 5367 - insn->off = off & ~(ctx_field_size - 1); 5363 + insn->off = off & ~(size_default - 1); 5368 5364 insn->code = BPF_LDX | BPF_MEM | size_code; 5369 5365 } 5370 5366 ··· 5590 5586 struct bpf_insn *insn = prog->insnsi; 5591 5587 const struct bpf_func_proto *fn; 5592 5588 const int insn_cnt = prog->len; 5589 + const struct bpf_map_ops *ops; 5593 5590 struct bpf_insn_aux_data *aux; 5594 5591 struct bpf_insn insn_buf[16]; 5595 5592 struct bpf_prog *new_prog; ··· 5720 5715 } 5721 5716 5722 5717 /* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup 5723 - * handlers are currently limited to 64 bit only. 5718 + * and other inlining handlers are currently limited to 64 bit 5719 + * only. 
5724 5720 */ 5725 5721 if (prog->jit_requested && BITS_PER_LONG == 64 && 5726 - insn->imm == BPF_FUNC_map_lookup_elem) { 5722 + (insn->imm == BPF_FUNC_map_lookup_elem || 5723 + insn->imm == BPF_FUNC_map_update_elem || 5724 + insn->imm == BPF_FUNC_map_delete_elem)) { 5727 5725 aux = &env->insn_aux_data[i + delta]; 5728 5726 if (bpf_map_ptr_poisoned(aux)) 5729 5727 goto patch_call_imm; 5730 5728 5731 5729 map_ptr = BPF_MAP_PTR(aux->map_state); 5732 - if (!map_ptr->ops->map_gen_lookup) 5733 - goto patch_call_imm; 5730 + ops = map_ptr->ops; 5731 + if (insn->imm == BPF_FUNC_map_lookup_elem && 5732 + ops->map_gen_lookup) { 5733 + cnt = ops->map_gen_lookup(map_ptr, insn_buf); 5734 + if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf)) { 5735 + verbose(env, "bpf verifier is misconfigured\n"); 5736 + return -EINVAL; 5737 + } 5734 5738 5735 - cnt = map_ptr->ops->map_gen_lookup(map_ptr, insn_buf); 5736 - if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf)) { 5737 - verbose(env, "bpf verifier is misconfigured\n"); 5738 - return -EINVAL; 5739 + new_prog = bpf_patch_insn_data(env, i + delta, 5740 + insn_buf, cnt); 5741 + if (!new_prog) 5742 + return -ENOMEM; 5743 + 5744 + delta += cnt - 1; 5745 + env->prog = prog = new_prog; 5746 + insn = new_prog->insnsi + i + delta; 5747 + continue; 5739 5748 } 5740 5749 5741 - new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 5742 - cnt); 5743 - if (!new_prog) 5744 - return -ENOMEM; 5750 + BUILD_BUG_ON(!__same_type(ops->map_lookup_elem, 5751 + (void *(*)(struct bpf_map *map, void *key))NULL)); 5752 + BUILD_BUG_ON(!__same_type(ops->map_delete_elem, 5753 + (int (*)(struct bpf_map *map, void *key))NULL)); 5754 + BUILD_BUG_ON(!__same_type(ops->map_update_elem, 5755 + (int (*)(struct bpf_map *map, void *key, void *value, 5756 + u64 flags))NULL)); 5757 + switch (insn->imm) { 5758 + case BPF_FUNC_map_lookup_elem: 5759 + insn->imm = BPF_CAST_CALL(ops->map_lookup_elem) - 5760 + __bpf_call_base; 5761 + continue; 5762 + case BPF_FUNC_map_update_elem: 5763 + 
insn->imm = BPF_CAST_CALL(ops->map_update_elem) - 5764 + __bpf_call_base; 5765 + continue; 5766 + case BPF_FUNC_map_delete_elem: 5767 + insn->imm = BPF_CAST_CALL(ops->map_delete_elem) - 5768 + __bpf_call_base; 5769 + continue; 5770 + } 5745 5771 5746 - delta += cnt - 1; 5747 - 5748 - /* keep walking new program and skip insns we just inserted */ 5749 - env->prog = prog = new_prog; 5750 - insn = new_prog->insnsi + i + delta; 5751 - continue; 5772 + goto patch_call_imm; 5752 5773 } 5753 5774 5754 5775 if (insn->imm == BPF_FUNC_redirect_map) {
+14 -2
kernel/trace/bpf_trace.c
··· 564 564 return &bpf_get_prandom_u32_proto; 565 565 case BPF_FUNC_probe_read_str: 566 566 return &bpf_probe_read_str_proto; 567 + #ifdef CONFIG_CGROUPS 568 + case BPF_FUNC_get_current_cgroup_id: 569 + return &bpf_get_current_cgroup_id_proto; 570 + #endif 567 571 default: 568 572 return NULL; 569 573 } ··· 884 880 return false; 885 881 if (type != BPF_READ) 886 882 return false; 887 - if (off % size != 0) 888 - return false; 883 + if (off % size != 0) { 884 + if (sizeof(unsigned long) != 4) 885 + return false; 886 + if (size != 8) 887 + return false; 888 + if (off % size != 4) 889 + return false; 890 + } 889 891 890 892 switch (off) { 891 893 case bpf_ctx_range(struct bpf_perf_event_data, sample_period): ··· 1016 1006 1017 1007 old_array = event->tp_event->prog_array; 1018 1008 ret = bpf_prog_array_copy(old_array, event->prog, NULL, &new_array); 1009 + if (ret == -ENOENT) 1010 + goto unlock; 1019 1011 if (ret < 0) { 1020 1012 bpf_prog_array_delete_safe(old_array, event->prog); 1021 1013 } else {
+63
lib/test_bpf.c
··· 356 356 return __bpf_fill_ja(self, BPF_MAXINSNS, 68); 357 357 } 358 358 359 + static int bpf_fill_maxinsns12(struct bpf_test *self) 360 + { 361 + unsigned int len = BPF_MAXINSNS; 362 + struct sock_filter *insn; 363 + int i = 0; 364 + 365 + insn = kmalloc_array(len, sizeof(*insn), GFP_KERNEL); 366 + if (!insn) 367 + return -ENOMEM; 368 + 369 + insn[0] = __BPF_JUMP(BPF_JMP | BPF_JA, len - 2, 0, 0); 370 + 371 + for (i = 1; i < len - 1; i++) 372 + insn[i] = __BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0); 373 + 374 + insn[len - 1] = __BPF_STMT(BPF_RET | BPF_K, 0xabababab); 375 + 376 + self->u.ptr.insns = insn; 377 + self->u.ptr.len = len; 378 + 379 + return 0; 380 + } 381 + 382 + static int bpf_fill_maxinsns13(struct bpf_test *self) 383 + { 384 + unsigned int len = BPF_MAXINSNS; 385 + struct sock_filter *insn; 386 + int i = 0; 387 + 388 + insn = kmalloc_array(len, sizeof(*insn), GFP_KERNEL); 389 + if (!insn) 390 + return -ENOMEM; 391 + 392 + for (i = 0; i < len - 3; i++) 393 + insn[i] = __BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0); 394 + 395 + insn[len - 3] = __BPF_STMT(BPF_LD | BPF_IMM, 0xabababab); 396 + insn[len - 2] = __BPF_STMT(BPF_ALU | BPF_XOR | BPF_X, 0); 397 + insn[len - 1] = __BPF_STMT(BPF_RET | BPF_A, 0); 398 + 399 + self->u.ptr.insns = insn; 400 + self->u.ptr.len = len; 401 + 402 + return 0; 403 + } 404 + 359 405 static int bpf_fill_ja(struct bpf_test *self) 360 406 { 361 407 /* Hits exactly 11 passes on x86_64 JIT. 
*/ ··· 5334 5288 { { 0, 0xababcbac } }, 5335 5289 .fill_helper = bpf_fill_maxinsns11, 5336 5290 .expected_errcode = -ENOTSUPP, 5291 + }, 5292 + { 5293 + "BPF_MAXINSNS: jump over MSH", 5294 + { }, 5295 + CLASSIC | FLAG_EXPECTED_FAIL, 5296 + { 0xfa, 0xfb, 0xfc, 0xfd, }, 5297 + { { 4, 0xabababab } }, 5298 + .fill_helper = bpf_fill_maxinsns12, 5299 + .expected_errcode = -EINVAL, 5300 + }, 5301 + { 5302 + "BPF_MAXINSNS: exec all MSH", 5303 + { }, 5304 + CLASSIC, 5305 + { 0xfa, 0xfb, 0xfc, 0xfd, }, 5306 + { { 4, 0xababab83 } }, 5307 + .fill_helper = bpf_fill_maxinsns13, 5337 5308 }, 5338 5309 { 5339 5310 "BPF_MAXINSNS: ld_abs+get_processor_id",
+84 -7
net/core/filter.c
··· 3056 3056 if (unlikely(!xdpf)) 3057 3057 return -EOVERFLOW; 3058 3058 3059 - sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf); 3059 + sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf, XDP_XMIT_FLUSH); 3060 3060 if (sent <= 0) 3061 3061 return sent; 3062 - dev->netdev_ops->ndo_xdp_flush(dev); 3063 3062 return 0; 3064 3063 } 3065 3064 ··· 3444 3445 to->tunnel_id = be64_to_cpu(info->key.tun_id); 3445 3446 to->tunnel_tos = info->key.tos; 3446 3447 to->tunnel_ttl = info->key.ttl; 3448 + to->tunnel_ext = 0; 3447 3449 3448 3450 if (flags & BPF_F_TUNINFO_IPV6) { 3449 3451 memcpy(to->remote_ipv6, &info->key.u.ipv6.src, ··· 3452 3452 to->tunnel_label = be32_to_cpu(info->key.label); 3453 3453 } else { 3454 3454 to->remote_ipv4 = be32_to_cpu(info->key.u.ipv4.src); 3455 + memset(&to->remote_ipv6[1], 0, sizeof(__u32) * 3); 3456 + to->tunnel_label = 0; 3455 3457 } 3456 3458 3457 3459 if (unlikely(size != sizeof(struct bpf_tunnel_key))) ··· 3662 3660 .arg2_type = ARG_CONST_MAP_PTR, 3663 3661 .arg3_type = ARG_ANYTHING, 3664 3662 }; 3663 + 3664 + #ifdef CONFIG_SOCK_CGROUP_DATA 3665 + BPF_CALL_1(bpf_skb_cgroup_id, const struct sk_buff *, skb) 3666 + { 3667 + struct sock *sk = skb_to_full_sk(skb); 3668 + struct cgroup *cgrp; 3669 + 3670 + if (!sk || !sk_fullsock(sk)) 3671 + return 0; 3672 + 3673 + cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); 3674 + return cgrp->kn->id.id; 3675 + } 3676 + 3677 + static const struct bpf_func_proto bpf_skb_cgroup_id_proto = { 3678 + .func = bpf_skb_cgroup_id, 3679 + .gpl_only = false, 3680 + .ret_type = RET_INTEGER, 3681 + .arg1_type = ARG_PTR_TO_CTX, 3682 + }; 3683 + #endif 3665 3684 3666 3685 static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff, 3667 3686 unsigned long off, unsigned long len) ··· 4049 4026 to->reqid = x->props.reqid; 4050 4027 to->spi = x->id.spi; 4051 4028 to->family = x->props.family; 4029 + to->ext = 0; 4030 + 4052 4031 if (to->family == AF_INET6) { 4053 4032 memcpy(to->remote_ipv6, x->props.saddr.a6, 4054 
4033 sizeof(to->remote_ipv6)); 4055 4034 } else { 4056 4035 to->remote_ipv4 = x->props.saddr.a4; 4036 + memset(&to->remote_ipv6[1], 0, sizeof(__u32) * 3); 4057 4037 } 4058 4038 4059 4039 return 0; ··· 4221 4195 fl6.flowi6_oif = 0; 4222 4196 strict = RT6_LOOKUP_F_HAS_SADDR; 4223 4197 } 4224 - fl6.flowlabel = params->flowlabel; 4198 + fl6.flowlabel = params->flowinfo; 4225 4199 fl6.flowi6_scope = 0; 4226 4200 fl6.flowi6_flags = 0; 4227 4201 fl6.mp_hash = 0; ··· 4296 4270 if (plen < sizeof(*params)) 4297 4271 return -EINVAL; 4298 4272 4273 + if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) 4274 + return -EINVAL; 4275 + 4299 4276 switch (params->family) { 4300 4277 #if IS_ENABLED(CONFIG_INET) 4301 4278 case AF_INET: ··· 4311 4282 flags, true); 4312 4283 #endif 4313 4284 } 4314 - return 0; 4285 + return -EAFNOSUPPORT; 4315 4286 } 4316 4287 4317 4288 static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = { ··· 4328 4299 struct bpf_fib_lookup *, params, int, plen, u32, flags) 4329 4300 { 4330 4301 struct net *net = dev_net(skb->dev); 4331 - int index = 0; 4302 + int index = -EAFNOSUPPORT; 4332 4303 4333 4304 if (plen < sizeof(*params)) 4305 + return -EINVAL; 4306 + 4307 + if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT)) 4334 4308 return -EINVAL; 4335 4309 4336 4310 switch (params->family) { ··· 4773 4741 return &bpf_get_socket_cookie_proto; 4774 4742 case BPF_FUNC_get_socket_uid: 4775 4743 return &bpf_get_socket_uid_proto; 4744 + case BPF_FUNC_fib_lookup: 4745 + return &bpf_skb_fib_lookup_proto; 4776 4746 #ifdef CONFIG_XFRM 4777 4747 case BPF_FUNC_skb_get_xfrm_state: 4778 4748 return &bpf_skb_get_xfrm_state_proto; 4779 4749 #endif 4780 - case BPF_FUNC_fib_lookup: 4781 - return &bpf_skb_fib_lookup_proto; 4750 + #ifdef CONFIG_SOCK_CGROUP_DATA 4751 + case BPF_FUNC_skb_cgroup_id: 4752 + return &bpf_skb_cgroup_id_proto; 4753 + #endif 4782 4754 default: 4783 4755 return bpf_base_func_proto(func_id); 4784 4756 } ··· 5335 5299 switch 
(prog->expected_attach_type) { 5336 5300 case BPF_CGROUP_INET4_BIND: 5337 5301 case BPF_CGROUP_INET4_CONNECT: 5302 + case BPF_CGROUP_UDP4_SENDMSG: 5338 5303 break; 5339 5304 default: 5340 5305 return false; ··· 5345 5308 switch (prog->expected_attach_type) { 5346 5309 case BPF_CGROUP_INET6_BIND: 5347 5310 case BPF_CGROUP_INET6_CONNECT: 5311 + case BPF_CGROUP_UDP6_SENDMSG: 5312 + break; 5313 + default: 5314 + return false; 5315 + } 5316 + break; 5317 + case bpf_ctx_range(struct bpf_sock_addr, msg_src_ip4): 5318 + switch (prog->expected_attach_type) { 5319 + case BPF_CGROUP_UDP4_SENDMSG: 5320 + break; 5321 + default: 5322 + return false; 5323 + } 5324 + break; 5325 + case bpf_ctx_range_till(struct bpf_sock_addr, msg_src_ip6[0], 5326 + msg_src_ip6[3]): 5327 + switch (prog->expected_attach_type) { 5328 + case BPF_CGROUP_UDP6_SENDMSG: 5348 5329 break; 5349 5330 default: 5350 5331 return false; ··· 5373 5318 switch (off) { 5374 5319 case bpf_ctx_range(struct bpf_sock_addr, user_ip4): 5375 5320 case bpf_ctx_range_till(struct bpf_sock_addr, user_ip6[0], user_ip6[3]): 5321 + case bpf_ctx_range(struct bpf_sock_addr, msg_src_ip4): 5322 + case bpf_ctx_range_till(struct bpf_sock_addr, msg_src_ip6[0], 5323 + msg_src_ip6[3]): 5376 5324 /* Only narrow read access allowed for now. */ 5377 5325 if (type == BPF_READ) { 5378 5326 bpf_ctx_record_field_size(info, size_default); ··· 6130 6072 *insn++ = BPF_ALU32_IMM(BPF_RSH, si->dst_reg, 6131 6073 SK_FL_PROTO_SHIFT); 6132 6074 break; 6075 + 6076 + case offsetof(struct bpf_sock_addr, msg_src_ip4): 6077 + /* Treat t_ctx as struct in_addr for msg_src_ip4. 
*/ 6078 + SOCK_ADDR_LOAD_OR_STORE_NESTED_FIELD_SIZE_OFF( 6079 + struct bpf_sock_addr_kern, struct in_addr, t_ctx, 6080 + s_addr, BPF_SIZE(si->code), 0, tmp_reg); 6081 + break; 6082 + 6083 + case bpf_ctx_range_till(struct bpf_sock_addr, msg_src_ip6[0], 6084 + msg_src_ip6[3]): 6085 + off = si->off; 6086 + off -= offsetof(struct bpf_sock_addr, msg_src_ip6[0]); 6087 + /* Treat t_ctx as struct in6_addr for msg_src_ip6. */ 6088 + SOCK_ADDR_LOAD_OR_STORE_NESTED_FIELD_SIZE_OFF( 6089 + struct bpf_sock_addr_kern, struct in6_addr, t_ctx, 6090 + s6_addr32[0], BPF_SIZE(si->code), off, tmp_reg); 6091 + break; 6133 6092 } 6134 6093 6135 6094 return insn - insn_buf; ··· 6522 6447 struct bpf_prog *prog, u32 *target_size) 6523 6448 { 6524 6449 struct bpf_insn *insn = insn_buf; 6450 + #if IS_ENABLED(CONFIG_IPV6) 6525 6451 int off; 6452 + #endif 6526 6453 6527 6454 switch (si->off) { 6528 6455 case offsetof(struct sk_msg_md, data):
+14 -5
net/core/xdp.c
··· 31 31 union { 32 32 void *allocator; 33 33 struct page_pool *page_pool; 34 + struct zero_copy_allocator *zc_alloc; 34 35 }; 35 36 struct rhash_head node; 36 37 struct rcu_head rcu; ··· 262 261 xdp_rxq->mem.type = type; 263 262 264 263 if (!allocator) { 265 - if (type == MEM_TYPE_PAGE_POOL) 264 + if (type == MEM_TYPE_PAGE_POOL || type == MEM_TYPE_ZERO_COPY) 266 265 return -EINVAL; /* Setup time check page_pool req */ 267 266 return 0; 268 267 } ··· 315 314 * is used for those calls sites. Thus, allowing for faster recycling 316 315 * of xdp_frames/pages in those cases. 317 316 */ 318 - static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct) 317 + static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct, 318 + unsigned long handle) 319 319 { 320 320 struct xdp_mem_allocator *xa; 321 321 struct page *page; ··· 340 338 page = virt_to_page(data); /* Assumes order0 page*/ 341 339 put_page(page); 342 340 break; 341 + case MEM_TYPE_ZERO_COPY: 342 + /* NB! Only valid from an xdp_buff! 
*/ 343 + rcu_read_lock(); 344 + /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */ 345 + xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params); 346 + xa->zc_alloc->free(xa->zc_alloc, handle); 347 + rcu_read_unlock(); 343 348 default: 344 349 /* Not possible, checked in xdp_rxq_info_reg_mem_model() */ 345 350 break; ··· 355 346 356 347 void xdp_return_frame(struct xdp_frame *xdpf) 357 348 { 358 - __xdp_return(xdpf->data, &xdpf->mem, false); 349 + __xdp_return(xdpf->data, &xdpf->mem, false, 0); 359 350 } 360 351 EXPORT_SYMBOL_GPL(xdp_return_frame); 361 352 362 353 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf) 363 354 { 364 - __xdp_return(xdpf->data, &xdpf->mem, true); 355 + __xdp_return(xdpf->data, &xdpf->mem, true, 0); 365 356 } 366 357 EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi); 367 358 368 359 void xdp_return_buff(struct xdp_buff *xdp) 369 360 { 370 - __xdp_return(xdp->data, &xdp->rxq->mem, true); 361 + __xdp_return(xdp->data, &xdp->rxq->mem, true, xdp->handle); 371 362 } 372 363 EXPORT_SYMBOL_GPL(xdp_return_buff);
+18 -2
net/ipv4/udp.c
··· 899 899 { 900 900 struct inet_sock *inet = inet_sk(sk); 901 901 struct udp_sock *up = udp_sk(sk); 902 + DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name); 902 903 struct flowi4 fl4_stack; 903 904 struct flowi4 *fl4; 904 905 int ulen = len; ··· 954 953 /* 955 954 * Get and verify the address. 956 955 */ 957 - if (msg->msg_name) { 958 - DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name); 956 + if (usin) { 959 957 if (msg->msg_namelen < sizeof(*usin)) 960 958 return -EINVAL; 961 959 if (usin->sin_family != AF_INET) { ··· 1006 1006 ipc.opt = &opt_copy.opt; 1007 1007 } 1008 1008 rcu_read_unlock(); 1009 + } 1010 + 1011 + if (cgroup_bpf_enabled && !connected) { 1012 + err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, 1013 + (struct sockaddr *)usin, &ipc.addr); 1014 + if (err) 1015 + goto out_free; 1016 + if (usin) { 1017 + if (usin->sin_port == 0) { 1018 + /* BPF program set invalid port. Reject it. */ 1019 + err = -EINVAL; 1020 + goto out_free; 1021 + } 1022 + daddr = usin->sin_addr.s_addr; 1023 + dport = usin->sin_port; 1024 + } 1009 1025 } 1010 1026 1011 1027 saddr = ipc.addr;
+24
net/ipv6/udp.c
··· 1314 1314 fl6.saddr = np->saddr; 1315 1315 fl6.fl6_sport = inet->inet_sport; 1316 1316 1317 + if (cgroup_bpf_enabled && !connected) { 1318 + err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, 1319 + (struct sockaddr *)sin6, &fl6.saddr); 1320 + if (err) 1321 + goto out_no_dst; 1322 + if (sin6) { 1323 + if (ipv6_addr_v4mapped(&sin6->sin6_addr)) { 1324 + /* BPF program rewrote IPv6-only by IPv4-mapped 1325 + * IPv6. It's currently unsupported. 1326 + */ 1327 + err = -ENOTSUPP; 1328 + goto out_no_dst; 1329 + } 1330 + if (sin6->sin6_port == 0) { 1331 + /* BPF program set invalid port. Reject it. */ 1332 + err = -EINVAL; 1333 + goto out_no_dst; 1334 + } 1335 + fl6.fl6_dport = sin6->sin6_port; 1336 + fl6.daddr = sin6->sin6_addr; 1337 + } 1338 + } 1339 + 1317 1340 final_p = fl6_update_dst(&fl6, opt, &final); 1318 1341 if (final_p) 1319 1342 connected = false; ··· 1416 1393 1417 1394 out: 1418 1395 dst_release(dst); 1396 + out_no_dst: 1419 1397 fl6_sock_release(flowlabel); 1420 1398 txopt_put(opt_to_free); 1421 1399 if (!err)
+132 -19
net/xdp/xdp_umem.c
··· 13 13 #include <linux/mm.h> 14 14 15 15 #include "xdp_umem.h" 16 + #include "xsk_queue.h" 16 17 17 - #define XDP_UMEM_MIN_FRAME_SIZE 2048 18 + #define XDP_UMEM_MIN_CHUNK_SIZE 2048 19 + 20 + void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) 21 + { 22 + unsigned long flags; 23 + 24 + spin_lock_irqsave(&umem->xsk_list_lock, flags); 25 + list_add_rcu(&xs->list, &umem->xsk_list); 26 + spin_unlock_irqrestore(&umem->xsk_list_lock, flags); 27 + } 28 + 29 + void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) 30 + { 31 + unsigned long flags; 32 + 33 + if (xs->dev) { 34 + spin_lock_irqsave(&umem->xsk_list_lock, flags); 35 + list_del_rcu(&xs->list); 36 + spin_unlock_irqrestore(&umem->xsk_list_lock, flags); 37 + 38 + if (umem->zc) 39 + synchronize_net(); 40 + } 41 + } 42 + 43 + int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, 44 + u32 queue_id, u16 flags) 45 + { 46 + bool force_zc, force_copy; 47 + struct netdev_bpf bpf; 48 + int err; 49 + 50 + force_zc = flags & XDP_ZEROCOPY; 51 + force_copy = flags & XDP_COPY; 52 + 53 + if (force_zc && force_copy) 54 + return -EINVAL; 55 + 56 + if (force_copy) 57 + return 0; 58 + 59 + dev_hold(dev); 60 + 61 + if (dev->netdev_ops->ndo_bpf && dev->netdev_ops->ndo_xsk_async_xmit) { 62 + bpf.command = XDP_QUERY_XSK_UMEM; 63 + 64 + rtnl_lock(); 65 + err = dev->netdev_ops->ndo_bpf(dev, &bpf); 66 + rtnl_unlock(); 67 + 68 + if (err) { 69 + dev_put(dev); 70 + return force_zc ? -ENOTSUPP : 0; 71 + } 72 + 73 + bpf.command = XDP_SETUP_XSK_UMEM; 74 + bpf.xsk.umem = umem; 75 + bpf.xsk.queue_id = queue_id; 76 + 77 + rtnl_lock(); 78 + err = dev->netdev_ops->ndo_bpf(dev, &bpf); 79 + rtnl_unlock(); 80 + 81 + if (err) { 82 + dev_put(dev); 83 + return force_zc ? err : 0; /* fail or fallback */ 84 + } 85 + 86 + umem->dev = dev; 87 + umem->queue_id = queue_id; 88 + umem->zc = true; 89 + return 0; 90 + } 91 + 92 + dev_put(dev); 93 + return force_zc ? 
-ENOTSUPP : 0; /* fail or fallback */ 94 + } 95 + 96 + static void xdp_umem_clear_dev(struct xdp_umem *umem) 97 + { 98 + struct netdev_bpf bpf; 99 + int err; 100 + 101 + if (umem->dev) { 102 + bpf.command = XDP_SETUP_XSK_UMEM; 103 + bpf.xsk.umem = NULL; 104 + bpf.xsk.queue_id = umem->queue_id; 105 + 106 + rtnl_lock(); 107 + err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf); 108 + rtnl_unlock(); 109 + 110 + if (err) 111 + WARN(1, "failed to disable umem!\n"); 112 + 113 + dev_put(umem->dev); 114 + umem->dev = NULL; 115 + } 116 + } 18 117 19 118 static void xdp_umem_unpin_pages(struct xdp_umem *umem) 20 119 { ··· 141 42 struct task_struct *task; 142 43 struct mm_struct *mm; 143 44 45 + xdp_umem_clear_dev(umem); 46 + 144 47 if (umem->fq) { 145 48 xskq_destroy(umem->fq); 146 49 umem->fq = NULL; ··· 165 64 goto out; 166 65 167 66 mmput(mm); 67 + kfree(umem->pages); 68 + umem->pages = NULL; 69 + 168 70 xdp_umem_unaccount_pages(umem); 169 71 out: 170 72 kfree(umem); ··· 255 151 256 152 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) 257 153 { 258 - u32 frame_size = mr->frame_size, frame_headroom = mr->frame_headroom; 154 + u32 chunk_size = mr->chunk_size, headroom = mr->headroom; 155 + unsigned int chunks, chunks_per_page; 259 156 u64 addr = mr->addr, size = mr->len; 260 - unsigned int nframes, nfpp; 261 - int size_chk, err; 157 + int size_chk, err, i; 262 158 263 - if (frame_size < XDP_UMEM_MIN_FRAME_SIZE || frame_size > PAGE_SIZE) { 159 + if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) { 264 160 /* Strictly speaking we could support this, if: 265 161 * - huge pages, or* 266 162 * - using an IOMMU, or ··· 270 166 return -EINVAL; 271 167 } 272 168 273 - if (!is_power_of_2(frame_size)) 169 + if (!is_power_of_2(chunk_size)) 274 170 return -EINVAL; 275 171 276 172 if (!PAGE_ALIGNED(addr)) { ··· 283 179 if ((addr + size) < addr) 284 180 return -EINVAL; 285 181 286 - nframes = (unsigned int)div_u64(size, frame_size); 287 - if 
(nframes == 0 || nframes > UINT_MAX) 182 + chunks = (unsigned int)div_u64(size, chunk_size); 183 + if (chunks == 0) 288 184 return -EINVAL; 289 185 290 - nfpp = PAGE_SIZE / frame_size; 291 - if (nframes < nfpp || nframes % nfpp) 186 + chunks_per_page = PAGE_SIZE / chunk_size; 187 + if (chunks < chunks_per_page || chunks % chunks_per_page) 292 188 return -EINVAL; 293 189 294 - frame_headroom = ALIGN(frame_headroom, 64); 190 + headroom = ALIGN(headroom, 64); 295 191 296 - size_chk = frame_size - frame_headroom - XDP_PACKET_HEADROOM; 192 + size_chk = chunk_size - headroom - XDP_PACKET_HEADROOM; 297 193 if (size_chk < 0) 298 194 return -EINVAL; 299 195 300 196 umem->pid = get_task_pid(current, PIDTYPE_PID); 301 - umem->size = (size_t)size; 302 197 umem->address = (unsigned long)addr; 303 - umem->props.frame_size = frame_size; 304 - umem->props.nframes = nframes; 305 - umem->frame_headroom = frame_headroom; 198 + umem->props.chunk_mask = ~((u64)chunk_size - 1); 199 + umem->props.size = size; 200 + umem->headroom = headroom; 201 + umem->chunk_size_nohr = chunk_size - headroom; 306 202 umem->npgs = size / PAGE_SIZE; 307 203 umem->pgs = NULL; 308 204 umem->user = NULL; 205 + INIT_LIST_HEAD(&umem->xsk_list); 206 + spin_lock_init(&umem->xsk_list_lock); 309 207 310 - umem->frame_size_log2 = ilog2(frame_size); 311 - umem->nfpp_mask = nfpp - 1; 312 - umem->nfpplog2 = ilog2(nfpp); 313 208 refcount_set(&umem->users, 1); 314 209 315 210 err = xdp_umem_account_pages(umem); ··· 318 215 err = xdp_umem_pin_pages(umem); 319 216 if (err) 320 217 goto out_account; 218 + 219 + umem->pages = kcalloc(umem->npgs, sizeof(*umem->pages), GFP_KERNEL); 220 + if (!umem->pages) { 221 + err = -ENOMEM; 222 + goto out_account; 223 + } 224 + 225 + for (i = 0; i < umem->npgs; i++) 226 + umem->pages[i].addr = page_address(umem->pgs[i]); 227 + 321 228 return 0; 322 229 323 230 out_account:
net/xdp/xdp_umem.h | +9 -36

 #ifndef XDP_UMEM_H_
 #define XDP_UMEM_H_
 
-#include <linux/mm.h>
-#include <linux/if_xdp.h>
-#include <linux/workqueue.h>
+#include <net/xdp_sock.h>
 
-#include "xsk_queue.h"
-#include "xdp_umem_props.h"
-
-struct xdp_umem {
-	struct xsk_queue *fq;
-	struct xsk_queue *cq;
-	struct page **pgs;
-	struct xdp_umem_props props;
-	u32 npgs;
-	u32 frame_headroom;
-	u32 nfpp_mask;
-	u32 nfpplog2;
-	u32 frame_size_log2;
-	struct user_struct *user;
-	struct pid *pid;
-	unsigned long address;
-	size_t size;
-	refcount_t users;
-	struct work_struct work;
-};
-
-static inline char *xdp_umem_get_data(struct xdp_umem *umem, u32 idx)
+static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
-	u64 pg, off;
-	char *data;
-
-	pg = idx >> umem->nfpplog2;
-	off = (idx & umem->nfpp_mask) << umem->frame_size_log2;
-
-	data = page_address(umem->pgs[pg]);
-	return data + off;
+	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 1));
 }
 
-static inline char *xdp_umem_get_data_with_headroom(struct xdp_umem *umem,
-						    u32 idx)
+static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
 {
-	return xdp_umem_get_data(umem, idx) + umem->frame_headroom;
+	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 1));
 }
 
+int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
+			u32 queue_id, u16 flags);
 bool xdp_umem_validate_queues(struct xdp_umem *umem);
 void xdp_get_umem(struct xdp_umem *umem);
 void xdp_put_umem(struct xdp_umem *umem);
+void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs);
+void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs);
 struct xdp_umem *xdp_umem_create(struct xdp_umem_reg *mr);
 
 #endif /* XDP_UMEM_H_ */
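The new xdp_umem_get_data() replaces the frame-index/log2 bookkeeping with a plain byte address: the high bits select a page, the low bits are the offset inside it. A small user-space sketch of that split (helper names are illustrative):

```c
#include <stdint.h>

#define PAGE_SHIFT 12			/* assumes 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* High bits of a umem byte address select the backing page. */
static inline uint64_t umem_page_index(uint64_t addr)
{
	return addr >> PAGE_SHIFT;
}

/* Low bits are the offset within that page. */
static inline uint64_t umem_page_offset(uint64_t addr)
{
	return addr & (PAGE_SIZE - 1);
}
```

With a per-page lookup table (as the kernel's umem->pages[] array now provides), data access becomes a single index plus offset, with no dependency on the chunk size.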
net/xdp/xdp_umem_props.h | +2 -2

 #define XDP_UMEM_PROPS_H_
 
 struct xdp_umem_props {
-	u32 frame_size;
-	u32 nframes;
+	u64 chunk_mask;
+	u64 size;
 };
 
 #endif /* XDP_UMEM_PROPS_H_ */
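The chunk_mask stored here is computed in xdp_umem_reg() as `~((u64)chunk_size - 1)`, which only works because chunk_size is enforced to be a power of two: masking any address with it yields the start of the chunk containing that address. A minimal sketch of the mask and the descriptor check built on it (function names are illustrative; the boundary behavior mirrors the new xskq_is_valid_desc()):

```c
#include <stdbool.h>
#include <stdint.h>

/* Valid only for power-of-two chunk sizes, as xdp_umem_reg() enforces. */
static inline uint64_t chunk_mask(uint64_t chunk_size)
{
	return ~(chunk_size - 1);
}

/* A Tx/Rx descriptor is valid only if its first and last byte fall in
 * the same chunk, i.e. the masked start addresses match.
 */
static inline bool desc_in_one_chunk(uint64_t addr, uint32_t len,
				     uint64_t mask)
{
	return ((addr + len) & mask) == (addr & mask);
}
```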
net/xdp/xsk.c | +159 -44

 #include <linux/uaccess.h>
 #include <linux/net.h>
 #include <linux/netdevice.h>
+#include <linux/rculist.h>
 #include <net/xdp_sock.h>
 #include <net/xdp.h>
 
···
 
 bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
 {
-	return !!xs->rx;
+	return READ_ONCE(xs->rx) && READ_ONCE(xs->umem) &&
+		READ_ONCE(xs->umem->fq);
 }
 
-static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
+u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	u32 *id, len = xdp->data_end - xdp->data;
+	return xskq_peek_addr(umem->fq, addr);
+}
+EXPORT_SYMBOL(xsk_umem_peek_addr);
+
+void xsk_umem_discard_addr(struct xdp_umem *umem)
+{
+	xskq_discard_addr(umem->fq);
+}
+EXPORT_SYMBOL(xsk_umem_discard_addr);
+
+static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
+{
 	void *buffer;
-	int err = 0;
+	u64 addr;
+	int err;
 
-	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
-		return -EINVAL;
-
-	id = xskq_peek_id(xs->umem->fq);
-	if (!id)
+	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	    len > xs->umem->chunk_size_nohr) {
+		xs->rx_dropped++;
 		return -ENOSPC;
+	}
 
-	buffer = xdp_umem_get_data_with_headroom(xs->umem, *id);
+	addr += xs->umem->headroom;
+
+	buffer = xdp_umem_get_data(xs->umem, addr);
 	memcpy(buffer, xdp->data, len);
-	err = xskq_produce_batch_desc(xs->rx, *id, len,
-				      xs->umem->frame_headroom);
-	if (!err)
-		xskq_discard_id(xs->umem->fq);
+	err = xskq_produce_batch_desc(xs->rx, addr, len);
+	if (!err) {
+		xskq_discard_addr(xs->umem->fq);
+		xdp_return_buff(xdp);
+		return 0;
+	}
+
+	xs->rx_dropped++;
+	return err;
+}
+
+static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
+{
+	int err = xskq_produce_batch_desc(xs->rx, (u64)xdp->handle, len);
+
+	if (err) {
+		xdp_return_buff(xdp);
+		xs->rx_dropped++;
+	}
 
 	return err;
 }
 
 int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
-	int err;
+	u32 len;
 
-	err = __xsk_rcv(xs, xdp);
-	if (likely(!err))
-		xdp_return_buff(xdp);
-	else
-		xs->rx_dropped++;
+	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
+		return -EINVAL;
 
-	return err;
+	len = xdp->data_end - xdp->data;
+
+	return (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY) ?
+		__xsk_rcv_zc(xs, xdp, len) : __xsk_rcv(xs, xdp, len);
 }
 
 void xsk_flush(struct xdp_sock *xs)
···
 
 int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
+	u32 len = xdp->data_end - xdp->data;
+	void *buffer;
+	u64 addr;
 	int err;
 
-	err = __xsk_rcv(xs, xdp);
-	if (!err)
-		xsk_flush(xs);
-	else
+	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+	    len > xs->umem->chunk_size_nohr) {
 		xs->rx_dropped++;
+		return -ENOSPC;
+	}
 
+	addr += xs->umem->headroom;
+
+	buffer = xdp_umem_get_data(xs->umem, addr);
+	memcpy(buffer, xdp->data, len);
+	err = xskq_produce_batch_desc(xs->rx, addr, len);
+	if (!err) {
+		xskq_discard_addr(xs->umem->fq);
+		xsk_flush(xs);
+		return 0;
+	}
+
+	xs->rx_dropped++;
 	return err;
+}
+
+void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries)
+{
+	xskq_produce_flush_addr_n(umem->cq, nb_entries);
+}
+EXPORT_SYMBOL(xsk_umem_complete_tx);
+
+void xsk_umem_consume_tx_done(struct xdp_umem *umem)
+{
+	struct xdp_sock *xs;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
+		xs->sk.sk_write_space(&xs->sk);
+	}
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(xsk_umem_consume_tx_done);
+
+bool xsk_umem_consume_tx(struct xdp_umem *umem, dma_addr_t *dma, u32 *len)
+{
+	struct xdp_desc desc;
+	struct xdp_sock *xs;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
+		if (!xskq_peek_desc(xs->tx, &desc))
+			continue;
+
+		if (xskq_produce_addr_lazy(umem->cq, desc.addr))
+			goto out;
+
+		*dma = xdp_umem_get_dma(umem, desc.addr);
+		*len = desc.len;
+
+		xskq_discard_desc(xs->tx);
+		rcu_read_unlock();
+		return true;
+	}
+
+out:
+	rcu_read_unlock();
+	return false;
+}
+EXPORT_SYMBOL(xsk_umem_consume_tx);
+
+static int xsk_zc_xmit(struct sock *sk)
+{
+	struct xdp_sock *xs = xdp_sk(sk);
+	struct net_device *dev = xs->dev;
+
+	return dev->netdev_ops->ndo_xsk_async_xmit(dev, xs->queue_id);
 }
 
 static void xsk_destruct_skb(struct sk_buff *skb)
 {
-	u32 id = (u32)(long)skb_shinfo(skb)->destructor_arg;
+	u64 addr = (u64)(long)skb_shinfo(skb)->destructor_arg;
 	struct xdp_sock *xs = xdp_sk(skb->sk);
 
-	WARN_ON_ONCE(xskq_produce_id(xs->umem->cq, id));
+	WARN_ON_ONCE(xskq_produce_addr(xs->umem->cq, addr));
 
 	sock_wfree(skb);
 }
···
 static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 			    size_t total_len)
 {
-	bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
 	u32 max_batch = TX_BATCH_SIZE;
 	struct xdp_sock *xs = xdp_sk(sk);
 	bool sent_frame = false;
···
 
 	if (unlikely(!xs->tx))
 		return -ENOBUFS;
-	if (need_wait)
-		return -EOPNOTSUPP;
 
 	mutex_lock(&xs->mutex);
 
 	while (xskq_peek_desc(xs->tx, &desc)) {
 		char *buffer;
-		u32 id, len;
+		u64 addr;
+		u32 len;
 
 		if (max_batch-- == 0) {
 			err = -EAGAIN;
 			goto out;
 		}
 
-		if (xskq_reserve_id(xs->umem->cq)) {
+		if (xskq_reserve_addr(xs->umem->cq)) {
 			err = -EAGAIN;
 			goto out;
 		}
···
 			goto out;
 		}
 
-		skb = sock_alloc_send_skb(sk, len, !need_wait, &err);
+		skb = sock_alloc_send_skb(sk, len, 1, &err);
 		if (unlikely(!skb)) {
 			err = -EAGAIN;
 			goto out;
 		}
 
 		skb_put(skb, len);
-		id = desc.idx;
-		buffer = xdp_umem_get_data(xs->umem, id) + desc.offset;
+		addr = desc.addr;
+		buffer = xdp_umem_get_data(xs->umem, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
 		if (unlikely(err)) {
 			kfree_skb(skb);
···
 		skb->dev = xs->dev;
 		skb->priority = sk->sk_priority;
 		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)id;
+		skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
 		skb->destructor = xsk_destruct_skb;
 
 		err = dev_direct_xmit(skb, xs->queue_id);
···
 
 static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 {
+	bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
 
···
 		return -ENXIO;
 	if (unlikely(!(xs->dev->flags & IFF_UP)))
 		return -ENETDOWN;
+	if (need_wait)
+		return -EOPNOTSUPP;
 
-	return xsk_generic_xmit(sk, m, total_len);
+	return (xs->zc) ? xsk_zc_xmit(sk) : xsk_generic_xmit(sk, m, total_len);
 }
 
 static unsigned int xsk_poll(struct file *file, struct socket *sock,
···
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
 	struct net_device *dev;
+	u32 flags, qid;
 	int err = 0;
 
 	if (addr_len < sizeof(struct sockaddr_xdp))
···
 		goto out_unlock;
 	}
 
-	if ((xs->rx && sxdp->sxdp_queue_id >= dev->real_num_rx_queues) ||
-	    (xs->tx && sxdp->sxdp_queue_id >= dev->real_num_tx_queues)) {
+	qid = sxdp->sxdp_queue_id;
+
+	if ((xs->rx && qid >= dev->real_num_rx_queues) ||
+	    (xs->tx && qid >= dev->real_num_tx_queues)) {
 		err = -EINVAL;
 		goto out_unlock;
 	}
 
-	if (sxdp->sxdp_flags & XDP_SHARED_UMEM) {
+	flags = sxdp->sxdp_flags;
+
+	if (flags & XDP_SHARED_UMEM) {
 		struct xdp_sock *umem_xs;
 		struct socket *sock;
+
+		if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY)) {
+			/* Cannot specify flags for shared sockets. */
+			err = -EINVAL;
+			goto out_unlock;
+		}
 
 		if (xs->umem) {
 			/* We have already our own. */
···
 			err = -EBADF;
 			sockfd_put(sock);
 			goto out_unlock;
-		} else if (umem_xs->dev != dev ||
-			   umem_xs->queue_id != sxdp->sxdp_queue_id) {
+		} else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
 			err = -EINVAL;
 			sockfd_put(sock);
 			goto out_unlock;
···
 		/* This xsk has its own umem. */
 		xskq_set_umem(xs->umem->fq, &xs->umem->props);
 		xskq_set_umem(xs->umem->cq, &xs->umem->props);
+
+		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
+		if (err)
+			goto out_unlock;
 	}
 
 	xs->dev = dev;
-	xs->queue_id = sxdp->sxdp_queue_id;
-
+	xs->zc = xs->umem->zc;
+	xs->queue_id = qid;
 	xskq_set_umem(xs->rx, &xs->umem->props);
 	xskq_set_umem(xs->tx, &xs->umem->props);
+	xdp_add_sk_umem(xs->umem, xs);
 
 out_unlock:
 	if (err)
···
 
 	xskq_destroy(xs->rx);
 	xskq_destroy(xs->tx);
+	xdp_del_sk_umem(xs->umem, xs);
 	xdp_put_umem(xs->umem);
 
 	sk_refcnt_debug_dec(sk);
net/xdp/xsk_queue.c | +1 -1

 static u32 xskq_umem_get_ring_size(struct xsk_queue *q)
 {
-	return sizeof(struct xdp_umem_ring) + q->nentries * sizeof(u32);
+	return sizeof(struct xdp_umem_ring) + q->nentries * sizeof(u64);
 }
 
 static u32 xskq_rxtx_get_ring_size(struct xsk_queue *q)
net/xdp/xsk_queue.h | +54 -44

 #include <linux/types.h>
 #include <linux/if_xdp.h>
-
-#include "xdp_umem_props.h"
+#include <net/xdp_sock.h>
 
 #define RX_BATCH_SIZE 16
+#define LAZY_UPDATE_THRESHOLD 128
 
 struct xdp_ring {
 	u32 producer ____cacheline_aligned_in_smp;
···
 /* Used for the fill and completion queues for buffers */
 struct xdp_umem_ring {
 	struct xdp_ring ptrs;
-	u32 desc[0] ____cacheline_aligned_in_smp;
+	u64 desc[0] ____cacheline_aligned_in_smp;
 };
 
 struct xsk_queue {
···
 	return (entries > dcnt) ? dcnt : entries;
 }
 
+static inline u32 xskq_nb_free_lazy(struct xsk_queue *q, u32 producer)
+{
+	return q->nentries - (producer - q->cons_tail);
+}
+
 static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
 {
-	u32 free_entries = q->nentries - (producer - q->cons_tail);
+	u32 free_entries = xskq_nb_free_lazy(q, producer);
 
 	if (free_entries >= dcnt)
 		return free_entries;
···
 
 /* UMEM queue */
 
-static inline bool xskq_is_valid_id(struct xsk_queue *q, u32 idx)
+static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
-	if (unlikely(idx >= q->umem_props.nframes)) {
+	if (addr >= q->umem_props.size) {
 		q->invalid_descs++;
 		return false;
 	}
+
 	return true;
 }
 
-static inline u32 *xskq_validate_id(struct xsk_queue *q)
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
 {
 	while (q->cons_tail != q->cons_head) {
 		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 		unsigned int idx = q->cons_tail & q->ring_mask;
 
-		if (xskq_is_valid_id(q, ring->desc[idx]))
-			return &ring->desc[idx];
+		*addr = READ_ONCE(ring->desc[idx]) & q->umem_props.chunk_mask;
+		if (xskq_is_valid_addr(q, *addr))
+			return addr;
 
 		q->cons_tail++;
 	}
···
 	return NULL;
 }
 
-static inline u32 *xskq_peek_id(struct xsk_queue *q)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 {
-	struct xdp_umem_ring *ring;
-
 	if (q->cons_tail == q->cons_head) {
 		WRITE_ONCE(q->ring->consumer, q->cons_tail);
 		q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
 
 		/* Order consumer and data */
 		smp_rmb();
-
-		return xskq_validate_id(q);
 	}
 
-	ring = (struct xdp_umem_ring *)q->ring;
-	return &ring->desc[q->cons_tail & q->ring_mask];
+	return xskq_validate_addr(q, addr);
 }
 
-static inline void xskq_discard_id(struct xsk_queue *q)
+static inline void xskq_discard_addr(struct xsk_queue *q)
 {
 	q->cons_tail++;
-	(void)xskq_validate_id(q);
 }
 
-static inline int xskq_produce_id(struct xsk_queue *q, u32 id)
+static inline int xskq_produce_addr(struct xsk_queue *q, u64 addr)
 {
 	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 
-	ring->desc[q->prod_tail++ & q->ring_mask] = id;
+	if (xskq_nb_free(q, q->prod_tail, LAZY_UPDATE_THRESHOLD) == 0)
+		return -ENOSPC;
+
+	ring->desc[q->prod_tail++ & q->ring_mask] = addr;
 
 	/* Order producer and data */
 	smp_wmb();
···
 	return 0;
 }
 
-static inline int xskq_reserve_id(struct xsk_queue *q)
+static inline int xskq_produce_addr_lazy(struct xsk_queue *q, u64 addr)
+{
+	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+
+	if (xskq_nb_free(q, q->prod_head, LAZY_UPDATE_THRESHOLD) == 0)
+		return -ENOSPC;
+
+	ring->desc[q->prod_head++ & q->ring_mask] = addr;
+	return 0;
+}
+
+static inline void xskq_produce_flush_addr_n(struct xsk_queue *q,
+					     u32 nb_entries)
+{
+	/* Order producer and data */
+	smp_wmb();
+
+	q->prod_tail += nb_entries;
+	WRITE_ONCE(q->ring->producer, q->prod_tail);
+}
+
+static inline int xskq_reserve_addr(struct xsk_queue *q)
 {
 	if (xskq_nb_free(q, q->prod_head, 1) == 0)
 		return -ENOSPC;
···
 
 static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 {
-	u32 buff_len;
-
-	if (unlikely(d->idx >= q->umem_props.nframes)) {
-		q->invalid_descs++;
+	if (!xskq_is_valid_addr(q, d->addr))
 		return false;
-	}
 
-	buff_len = q->umem_props.frame_size;
-	if (unlikely(d->len > buff_len || d->len == 0 ||
-		     d->offset > buff_len || d->offset + d->len > buff_len)) {
+	if (((d->addr + d->len) & q->umem_props.chunk_mask) !=
+	    (d->addr & q->umem_props.chunk_mask)) {
 		q->invalid_descs++;
 		return false;
 	}
···
 	struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 	unsigned int idx = q->cons_tail & q->ring_mask;
 
-	if (xskq_is_valid_desc(q, &ring->desc[idx])) {
-		if (desc)
-			*desc = ring->desc[idx];
+	*desc = READ_ONCE(ring->desc[idx]);
+	if (xskq_is_valid_desc(q, desc))
 		return desc;
-	}
 
 	q->cons_tail++;
 }
···
 static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
					      struct xdp_desc *desc)
 {
-	struct xdp_rxtx_ring *ring;
-
 	if (q->cons_tail == q->cons_head) {
 		WRITE_ONCE(q->ring->consumer, q->cons_tail);
 		q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
 
 		/* Order consumer and data */
 		smp_rmb();
-
-		return xskq_validate_desc(q, desc);
 	}
 
-	ring = (struct xdp_rxtx_ring *)q->ring;
-	*desc = ring->desc[q->cons_tail & q->ring_mask];
-	return desc;
+	return xskq_validate_desc(q, desc);
 }
 
 static inline void xskq_discard_desc(struct xsk_queue *q)
 {
 	q->cons_tail++;
-	(void)xskq_validate_desc(q, NULL);
 }
 
 static inline int xskq_produce_batch_desc(struct xsk_queue *q,
-					  u32 id, u32 len, u16 offset)
+					  u64 addr, u32 len)
 {
 	struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 	unsigned int idx;
···
 		return -ENOSPC;
 
 	idx = (q->prod_head++) & q->ring_mask;
-	ring->desc[idx].idx = id;
+	ring->desc[idx].addr = addr;
 	ring->desc[idx].len = len;
-	ring->desc[idx].offset = offset;
 
 	return 0;
 }
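The free-entry arithmetic in the new xskq_nb_free_lazy() relies on the producer and consumer indices being free-running u32 counters, so `producer - cons_tail` stays correct across unsigned wraparound. A standalone sketch of just that calculation (function name illustrative):

```c
#include <stdint.h>

/* Same math as xskq_nb_free_lazy(): nentries minus the number of
 * outstanding entries. Because both counters are free-running u32
 * values, the subtraction is wraparound-safe without any masking.
 */
static inline uint32_t ring_nb_free(uint32_t nentries, uint32_t producer,
				    uint32_t cons_tail)
{
	return nentries - (producer - cons_tail);
}
```

The wraparound case is worth checking explicitly: with `producer = 2` and `cons_tail = 0xFFFFFFFE`, the difference is 4 outstanding entries, not a huge negative number.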
samples/bpf/xdp_fwd_kern.c | +1 -1

 		return XDP_PASS;
 
 	fib_params.family = AF_INET6;
-	fib_params.flowlabel = *(__be32 *)ip6h & IPV6_FLOWINFO_MASK;
+	fib_params.flowinfo = *(__be32 *)ip6h & IPV6_FLOWINFO_MASK;
 	fib_params.l4_protocol = ip6h->nexthdr;
 	fib_params.sport = 0;
 	fib_params.dport = 0;
samples/bpf/xdpsock_user.c | +46 -51

 #define NUM_FRAMES 131072
 #define FRAME_HEADROOM 0
+#define FRAME_SHIFT 11
 #define FRAME_SIZE 2048
 #define NUM_DESCS 1024
 #define BATCH_SIZE 16
···
 
 #define DEBUG_HEXDUMP 0
 
+typedef __u64 u64;
 typedef __u32 u32;
 
 static unsigned long prev_time;
···
 static int opt_poll;
 static int opt_shared_packet_buffer;
 static int opt_interval = 1;
+static u32 opt_xdp_bind_flags;
 
 struct xdp_umem_uqueue {
 	u32 cached_prod;
···
 	u32 size;
 	u32 *producer;
 	u32 *consumer;
-	u32 *ring;
+	u64 *ring;
 	void *map;
 };
 
 struct xdp_umem {
-	char (*frames)[FRAME_SIZE];
+	char *frames;
 	struct xdp_umem_uqueue fq;
 	struct xdp_umem_uqueue cq;
 	int fd;
···
 
 static inline u32 umem_nb_free(struct xdp_umem_uqueue *q, u32 nb)
 {
-	u32 free_entries = q->size - (q->cached_prod - q->cached_cons);
+	u32 free_entries = q->cached_cons - q->cached_prod;
 
 	if (free_entries >= nb)
 		return free_entries;
 
 	/* Refresh the local tail pointer */
-	q->cached_cons = *q->consumer;
+	q->cached_cons = *q->consumer + q->size;
 
-	return q->size - (q->cached_prod - q->cached_cons);
+	return q->cached_cons - q->cached_prod;
 }
 
 static inline u32 xq_nb_free(struct xdp_uqueue *q, u32 ndescs)
···
 	for (i = 0; i < nb; i++) {
 		u32 idx = fq->cached_prod++ & fq->mask;
 
-		fq->ring[idx] = d[i].idx;
+		fq->ring[idx] = d[i].addr;
 	}
 
 	u_smp_wmb();
···
 	return 0;
 }
 
-static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, u32 *d,
+static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, u64 *d,
 				      size_t nb)
 {
 	u32 i;
···
 }
 
 static inline size_t umem_complete_from_kernel(struct xdp_umem_uqueue *cq,
-					       u32 *d, size_t nb)
+					       u64 *d, size_t nb)
 {
 	u32 idx, i, entries = umem_nb_avail(cq, nb);
 
···
 	return entries;
 }
 
-static inline void *xq_get_data(struct xdpsock *xsk, __u32 idx, __u32 off)
+static inline void *xq_get_data(struct xdpsock *xsk, u64 addr)
 {
-	lassert(idx < NUM_FRAMES);
-	return &xsk->umem->frames[idx][off];
+	return &xsk->umem->frames[addr];
 }
 
 static inline int xq_enq(struct xdp_uqueue *uq,
···
 	for (i = 0; i < ndescs; i++) {
 		u32 idx = uq->cached_prod++ & uq->mask;
 
-		r[idx].idx = descs[i].idx;
+		r[idx].addr = descs[i].addr;
 		r[idx].len = descs[i].len;
-		r[idx].offset = descs[i].offset;
 	}
 
 	u_smp_wmb();
···
 }
 
 static inline int xq_enq_tx_only(struct xdp_uqueue *uq,
-				 __u32 idx, unsigned int ndescs)
+				 unsigned int id, unsigned int ndescs)
 {
 	struct xdp_desc *r = uq->ring;
 	unsigned int i;
···
 	for (i = 0; i < ndescs; i++) {
 		u32 idx = uq->cached_prod++ & uq->mask;
 
-		r[idx].idx = idx + i;
+		r[idx].addr = (id + i) << FRAME_SHIFT;
 		r[idx].len = sizeof(pkt_data) - 1;
-		r[idx].offset = 0;
 	}
 
 	u_smp_wmb();
···
 	*dst_addr = tmp;
 }
 
-#if DEBUG_HEXDUMP
-static void hex_dump(void *pkt, size_t length, const char *prefix)
+static void hex_dump(void *pkt, size_t length, u64 addr)
 {
-	int i = 0;
 	const unsigned char *address = (unsigned char *)pkt;
 	const unsigned char *line = address;
 	size_t line_size = 32;
 	unsigned char c;
+	char buf[32];
+	int i = 0;
 
+	if (!DEBUG_HEXDUMP)
+		return;
+
+	sprintf(buf, "addr=%llu", addr);
 	printf("length = %zu\n", length);
-	printf("%s | ", prefix);
+	printf("%s | ", buf);
 	while (length-- > 0) {
 		printf("%02X ", *address++);
 		if (!(++i % line_size) || (length == 0 && i % line_size)) {
···
 		}
 		printf("\n");
 		if (length > 0)
-			printf("%s | ", prefix);
+			printf("%s | ", buf);
 	}
 	printf("\n");
 }
-#endif
 
 static size_t gen_eth_frame(char *frame)
 {
···
 
 	mr.addr = (__u64)bufs;
 	mr.len = NUM_FRAMES * FRAME_SIZE;
-	mr.frame_size = FRAME_SIZE;
-	mr.frame_headroom = FRAME_HEADROOM;
+	mr.chunk_size = FRAME_SIZE;
+	mr.headroom = FRAME_HEADROOM;
 
 	lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)) == 0);
 	lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_FILL_RING, &fq_size,
···
 			   &optlen) == 0);
 
 	umem->fq.map = mmap(0, off.fr.desc +
-			    FQ_NUM_DESCS * sizeof(u32),
+			    FQ_NUM_DESCS * sizeof(u64),
 			    PROT_READ | PROT_WRITE,
 			    MAP_SHARED | MAP_POPULATE, sfd,
 			    XDP_UMEM_PGOFF_FILL_RING);
···
 	umem->fq.producer = umem->fq.map + off.fr.producer;
 	umem->fq.consumer = umem->fq.map + off.fr.consumer;
 	umem->fq.ring = umem->fq.map + off.fr.desc;
+	umem->fq.cached_cons = FQ_NUM_DESCS;
 
 	umem->cq.map = mmap(0, off.cr.desc +
-			    CQ_NUM_DESCS * sizeof(u32),
+			    CQ_NUM_DESCS * sizeof(u64),
 			    PROT_READ | PROT_WRITE,
 			    MAP_SHARED | MAP_POPULATE, sfd,
 			    XDP_UMEM_PGOFF_COMPLETION_RING);
···
 	umem->cq.consumer = umem->cq.map + off.cr.consumer;
 	umem->cq.ring = umem->cq.map + off.cr.desc;
 
-	umem->frames = (char (*)[FRAME_SIZE])bufs;
+	umem->frames = bufs;
 	umem->fd = sfd;
 
 	if (opt_bench == BENCH_TXONLY) {
 		int i;
 
-		for (i = 0; i < NUM_FRAMES; i++)
-			(void)gen_eth_frame(&umem->frames[i][0]);
+		for (i = 0; i < NUM_FRAMES * FRAME_SIZE; i += FRAME_SIZE)
+			(void)gen_eth_frame(&umem->frames[i]);
 	}
 
 	return umem;
···
 	struct xdpsock *xsk;
 	bool shared = true;
 	socklen_t optlen;
-	u32 i;
+	u64 i;
 
 	sfd = socket(PF_XDP, SOCK_RAW, 0);
 	lassert(sfd >= 0);
···
 	lassert(xsk->rx.map != MAP_FAILED);
 
 	if (!shared) {
-		for (i = 0; i < NUM_DESCS / 2; i++)
+		for (i = 0; i < NUM_DESCS * FRAME_SIZE; i += FRAME_SIZE)
 			lassert(umem_fill_to_kernel(&xsk->umem->fq, &i, 1)
 				== 0);
 	}
···
 	xsk->tx.producer = xsk->tx.map + off.tx.producer;
 	xsk->tx.consumer = xsk->tx.map + off.tx.consumer;
 	xsk->tx.ring = xsk->tx.map + off.tx.desc;
+	xsk->tx.cached_cons = NUM_DESCS;
 
 	sxdp.sxdp_family = PF_XDP;
 	sxdp.sxdp_ifindex = opt_ifindex;
 	sxdp.sxdp_queue_id = opt_queue;
+
 	if (shared) {
 		sxdp.sxdp_flags = XDP_SHARED_UMEM;
 		sxdp.sxdp_shared_umem_fd = umem->fd;
+	} else {
+		sxdp.sxdp_flags = opt_xdp_bind_flags;
 	}
 
 	lassert(bind(sfd, (struct sockaddr *)&sxdp, sizeof(sxdp)) == 0);
···
 		break;
 	case 'S':
 		opt_xdp_flags |= XDP_FLAGS_SKB_MODE;
+		opt_xdp_bind_flags |= XDP_COPY;
 		break;
 	case 'N':
 		opt_xdp_flags |= XDP_FLAGS_DRV_MODE;
···
 
 static inline void complete_tx_l2fwd(struct xdpsock *xsk)
 {
-	u32 descs[BATCH_SIZE];
+	u64 descs[BATCH_SIZE];
 	unsigned int rcvd;
 	size_t ndescs;
 
···
 
 static inline void complete_tx_only(struct xdpsock *xsk)
 {
-	u32 descs[BATCH_SIZE];
+	u64 descs[BATCH_SIZE];
 	unsigned int rcvd;
 
 	if (!xsk->outstanding_tx)
···
 		return;
 
 	for (i = 0; i < rcvd; i++) {
-		u32 idx = descs[i].idx;
+		char *pkt = xq_get_data(xsk, descs[i].addr);
 
-		lassert(idx < NUM_FRAMES);
-#if DEBUG_HEXDUMP
-		char *pkt;
-		char buf[32];
-
-		pkt = xq_get_data(xsk, idx, descs[i].offset);
-		sprintf(buf, "idx=%d", idx);
-		hex_dump(pkt, descs[i].len, buf);
-#endif
+		hex_dump(pkt, descs[i].len, descs[i].addr);
 	}
 
 	xsk->rx_npkts += rcvd;
···
 	}
 
 	for (i = 0; i < rcvd; i++) {
-		char *pkt = xq_get_data(xsk, descs[i].idx,
-					descs[i].offset);
+		char *pkt = xq_get_data(xsk, descs[i].addr);
 
 		swap_mac_addresses(pkt);
-#if DEBUG_HEXDUMP
-		char buf[32];
-		u32 idx = descs[i].idx;
 
-		sprintf(buf, "idx=%d", idx);
-		hex_dump(pkt, descs[i].len, buf);
-#endif
+		hex_dump(pkt, descs[i].len, descs[i].addr);
 	}
 
 	xsk->rx_npkts += rcvd;
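The reworked umem_nb_free() in the sample uses a small trick: instead of caching the kernel's raw consumer index, it caches `consumer + size`, so the number of free slots on the fast path is a single subtraction with no extra arithmetic. A self-contained sketch of the same idea (the struct is a simplified stand-in for the sample's xdp_umem_uqueue):

```c
#include <stdint.h>

/* Simplified stand-in for the sample's producer/consumer queue. */
struct uqueue {
	uint32_t cached_prod;
	uint32_t cached_cons;	/* caches kernel consumer + size */
	uint32_t size;
	volatile uint32_t *consumer;	/* shared with the kernel */
};

static inline uint32_t uq_nb_free(struct uqueue *q, uint32_t nb)
{
	uint32_t free_entries = q->cached_cons - q->cached_prod;

	if (free_entries >= nb)
		return free_entries;

	/* Slow path: refresh the cached consumer pointer, offset by size
	 * so the fast path stays a single subtraction.
	 */
	q->cached_cons = *q->consumer + q->size;

	return q->cached_cons - q->cached_prod;
}
```

This is also why the sample now initializes `cached_cons` to the ring size (FQ_NUM_DESCS / NUM_DESCS) right after mapping the rings: an empty ring has all entries free.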
tools/bpf/bpf_exp.l | +1 -1

 			yylval.number = strtol(yytext, NULL, 10);
 			return number;
 		}
-([0][0-9]+)	{
+([0][0-7]+)	{
 			yylval.number = strtol(yytext + 1, NULL, 8);
 			return number;
 		}
tools/bpf/bpftool/Documentation/bpftool-cgroup.rst | +7 -2

 |
 |	*PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
 |	*ATTACH_TYPE* := { **ingress** | **egress** | **sock_create** | **sock_ops** | **device** |
-|		**bind4** | **bind6** | **post_bind4** | **post_bind6** | **connect4** | **connect6** }
+|		**bind4** | **bind6** | **post_bind4** | **post_bind6** | **connect4** | **connect6** |
+|		**sendmsg4** | **sendmsg6** }
 |	*ATTACH_FLAGS* := { **multi** | **override** }
 
 DESCRIPTION
···
 		  **post_bind4** return from bind(2) for an inet4 socket (since 4.17);
 		  **post_bind6** return from bind(2) for an inet6 socket (since 4.17);
 		  **connect4** call to connect(2) for an inet4 socket (since 4.17);
-		  **connect6** call to connect(2) for an inet6 socket (since 4.17).
+		  **connect6** call to connect(2) for an inet6 socket (since 4.17);
+		  **sendmsg4** call to sendto(2), sendmsg(2), sendmmsg(2) for an
+		  unconnected udp4 socket (since 4.18);
+		  **sendmsg6** call to sendto(2), sendmsg(2), sendmmsg(2) for an
+		  unconnected udp6 socket (since 4.18).
 
 	**bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
 		  Detach *PROG* from the cgroup *CGROUP* and attach type
tools/bpf/bpftool/bash-completion/bpftool | +3 -2

 		attach|detach)
 			local ATTACH_TYPES='ingress egress sock_create sock_ops \
 				device bind4 bind6 post_bind4 post_bind6 connect4 \
-				connect6'
+				connect6 sendmsg4 sendmsg6'
 			local ATTACH_FLAGS='multi override'
 			local PROG_TYPE='id pinned tag'
 			case $prev in
···
 					return 0
 					;;
 				ingress|egress|sock_create|sock_ops|device|bind4|bind6|\
-				post_bind4|post_bind6|connect4|connect6)
+				post_bind4|post_bind6|connect4|connect6|sendmsg4|\
+				sendmsg6)
 					COMPREPLY=( $( compgen -W "$PROG_TYPE" -- \
 						"$cur" ) )
 					return 0
tools/bpf/bpftool/cgroup.c | +3 -1

 	"       ATTACH_TYPE := { ingress | egress | sock_create |\n"	\
 	"                        sock_ops | device | bind4 | bind6 |\n"	\
 	"                        post_bind4 | post_bind6 | connect4 |\n" \
-	"                        connect6 }"
+	"                        connect6 | sendmsg4 | sendmsg6 }"
 
 static const char * const attach_type_strings[] = {
 	[BPF_CGROUP_INET_INGRESS] = "ingress",
···
 	[BPF_CGROUP_INET6_CONNECT] = "connect6",
 	[BPF_CGROUP_INET4_POST_BIND] = "post_bind4",
 	[BPF_CGROUP_INET6_POST_BIND] = "post_bind6",
+	[BPF_CGROUP_UDP4_SENDMSG] = "sendmsg4",
+	[BPF_CGROUP_UDP6_SENDMSG] = "sendmsg6",
 	[__MAX_BPF_ATTACH_TYPE] = NULL,
 };
tools/bpf/bpftool/prog.c | +1

 	[BPF_PROG_TYPE_SK_MSG]		= "sk_msg",
 	[BPF_PROG_TYPE_RAW_TRACEPOINT]	= "raw_tracepoint",
 	[BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
+	[BPF_PROG_TYPE_LIRC_MODE2]	= "lirc_mode2",
 };
 
 static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
tools/include/linux/filter.h | +10

 #define BPF_LD_MAP_FD(DST, MAP_FD)				\
 	BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
 
+/* Relative call */
+
+#define BPF_CALL_REL(TGT)					\
+	((struct bpf_insn) {					\
+		.code  = BPF_JMP | BPF_CALL,			\
+		.dst_reg = 0,					\
+		.src_reg = BPF_PSEUDO_CALL,			\
+		.off   = 0,					\
+		.imm   = TGT })
+
 /* Program exit */
 
 #define BPF_EXIT_INSN()						\
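The new BPF_CALL_REL macro builds a BPF-to-BPF call whose target is an instruction-relative offset carried in the imm field, with src_reg set to BPF_PSEUDO_CALL to distinguish it from a helper call. A standalone sketch showing the resulting instruction encoding (the struct here is a local mirror of the kernel's struct bpf_insn layout, defined only so the example compiles on its own):

```c
#include <stdint.h>

/* Local mirror of struct bpf_insn, for illustration only. */
struct bpf_insn {
	uint8_t	code;		/* opcode */
	uint8_t	dst_reg:4;	/* dest register */
	uint8_t	src_reg:4;	/* source register */
	int16_t	off;		/* signed offset */
	int32_t	imm;		/* signed immediate constant */
};

#define BPF_JMP		0x05
#define BPF_CALL	0x80
#define BPF_PSEUDO_CALL	0x01

/* Same shape as the macro added to tools/include/linux/filter.h:
 * a call to an instruction TGT insns away, not to a helper ID.
 */
#define BPF_CALL_REL(TGT)					\
	((struct bpf_insn) {					\
		.code  = BPF_JMP | BPF_CALL,			\
		.dst_reg = 0,					\
		.src_reg = BPF_PSEUDO_CALL,			\
		.off   = 0,					\
		.imm   = TGT })
```

For example, `BPF_CALL_REL(2)` emits opcode 0x85 with imm = 2, telling the verifier the callee starts two instructions past the call.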
+108 -26
tools/include/uapi/linux/bpf.h
··· 143 143 BPF_PROG_TYPE_RAW_TRACEPOINT, 144 144 BPF_PROG_TYPE_CGROUP_SOCK_ADDR, 145 145 BPF_PROG_TYPE_LWT_SEG6LOCAL, 146 + BPF_PROG_TYPE_LIRC_MODE2, 146 147 }; 147 148 148 149 enum bpf_attach_type { ··· 161 160 BPF_CGROUP_INET6_CONNECT, 162 161 BPF_CGROUP_INET4_POST_BIND, 163 162 BPF_CGROUP_INET6_POST_BIND, 163 + BPF_CGROUP_UDP4_SENDMSG, 164 + BPF_CGROUP_UDP6_SENDMSG, 165 + BPF_LIRC_MODE2, 164 166 __MAX_BPF_ATTACH_TYPE 165 167 }; 166 168 ··· 1012 1008 * :: 1013 1009 * 1014 1010 * # sysctl kernel.perf_event_max_stack=<new value> 1015 - * 1016 1011 * Return 1017 1012 * The positive or null stack id on success, or a negative error 1018 1013 * in case of failure. ··· 1822 1819 * :: 1823 1820 * 1824 1821 * # sysctl kernel.perf_event_max_stack=<new value> 1825 - * 1826 1822 * Return 1827 - * a non-negative value equal to or less than size on success, or 1828 - * a negative error in case of failure. 1823 + * A non-negative value equal to or less than *size* on success, 1824 + * or a negative error in case of failure. 1829 1825 * 1830 1826 * int skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void *to, u32 len, u32 start_header) 1831 1827 * Description ··· 1845 1843 * in socket filters where *skb*\ **->data** does not always point 1846 1844 * to the start of the mac header and where "direct packet access" 1847 1845 * is not available. 1848 - * 1849 1846 * Return 1850 1847 * 0 on success, or a negative error in case of failure. 1851 1848 * ··· 1854 1853 * If lookup is successful and result shows packet is to be 1855 1854 * forwarded, the neighbor tables are searched for the nexthop. 1856 1855 * If successful (ie., FIB lookup shows forwarding and nexthop 1857 - * is resolved), the nexthop address is returned in ipv4_dst, 1858 - * ipv6_dst or mpls_out based on family, smac is set to mac 1859 - * address of egress device, dmac is set to nexthop mac address, 1860 - * rt_metric is set to metric from route. 
1856 + * is resolved), the nexthop address is returned in ipv4_dst 1857 + * or ipv6_dst based on family, smac is set to mac address of 1858 + * egress device, dmac is set to nexthop mac address, rt_metric 1859 + * is set to metric from route (IPv4/IPv6 only). 1861 1860 * 1862 1861 * *plen* argument is the size of the passed in struct. 1863 - * *flags* argument can be one or more BPF_FIB_LOOKUP_ flags: 1862 + * *flags* argument can be a combination of one or more of the 1863 + * following values: 1864 1864 * 1865 - * **BPF_FIB_LOOKUP_DIRECT** means do a direct table lookup vs 1866 - * full lookup using FIB rules 1867 - * **BPF_FIB_LOOKUP_OUTPUT** means do lookup from an egress 1868 - * perspective (default is ingress) 1865 + * **BPF_FIB_LOOKUP_DIRECT** 1866 + * Do a direct table lookup vs full lookup using FIB 1867 + * rules. 1868 + * **BPF_FIB_LOOKUP_OUTPUT** 1869 + * Perform lookup from an egress perspective (default is 1870 + * ingress). 1869 1871 * 1870 1872 * *ctx* is either **struct xdp_md** for XDP programs or 1871 1873 * **struct sk_buff** for tc cls_act programs. 1872 - * 1873 1874 * Return 1874 1875 * Egress device index on success, 0 if packet needs to continue 1875 1876 * up the stack for further processing or a negative error in case ··· 2007 2004 * direct packet access. 2008 2005 * Return 2009 2006 * 0 on success, or a negative error in case of failure. 2007 + * 2008 + * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle) 2009 + * Description 2010 + * This helper is used in programs implementing IR decoding, to 2011 + * report a successfully decoded key press with *scancode*, 2012 + * *toggle* value in the given *protocol*. The scancode will be 2013 + * translated to a keycode using the rc keymap, and reported as 2014 + * an input key down event. After a period a key up event is 2015 + * generated.
This period can be extended by calling either 2016 + * **bpf_rc_keydown** () again with the same values, or calling 2017 + * **bpf_rc_repeat** (). 2018 + * 2019 + * Some protocols include a toggle bit, in case the button was 2020 + * released and pressed again between consecutive scancodes. 2021 + * 2022 + * The *ctx* should point to the lirc sample as passed into 2023 + * the program. 2024 + * 2025 + * The *protocol* is the decoded protocol number (see 2026 + * **enum rc_proto** for some predefined values). 2027 + * 2028 + * This helper is only available if the kernel was compiled with 2029 + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to 2030 + * "**y**". 2031 + * 2032 + * Return 2033 + * 0 2034 + * 2035 + * int bpf_rc_repeat(void *ctx) 2036 + * Description 2037 + * This helper is used in programs implementing IR decoding, to 2038 + * report a successfully decoded repeat key message. This delays 2039 + * the generation of a key up event for the previously generated 2040 + * key down event. 2041 + * 2042 + * Some IR protocols like NEC have a special IR message for 2043 + * repeating the last button, for when a button is held down. 2044 + * 2045 + * The *ctx* should point to the lirc sample as passed into 2046 + * the program. 2047 + * 2048 + * This helper is only available if the kernel was compiled with 2049 + * the **CONFIG_BPF_LIRC_MODE2** configuration option set to 2050 + * "**y**". 2051 + * 2052 + * Return 2053 + * 0 2054 + * 2055 + * uint64_t bpf_skb_cgroup_id(struct sk_buff *skb) 2056 + * Description 2057 + * Return the cgroup v2 id of the socket associated with the *skb*. 2058 + * This is roughly similar to the **bpf_get_cgroup_classid**\ () 2059 + * helper for cgroup v1 by providing a tag or identifier that 2060 + * can be matched on or used for map lookups e.g. to implement 2061 + * policy. The cgroup v2 id of a given path in the hierarchy is 2062 + * exposed in user space through the f_handle API in order to get 2063 + * to the same 64-bit id.
2064 + * 2065 + * This helper can be used on TC egress path, but not on ingress, 2066 + * and is available only if the kernel was compiled with the 2067 + * **CONFIG_SOCK_CGROUP_DATA** configuration option. 2068 + * Return 2069 + * The id is returned or 0 in case the id could not be retrieved. 2070 + * 2071 + * u64 bpf_get_current_cgroup_id(void) 2072 + * Return 2073 + * A 64-bit integer containing the current cgroup id based 2074 + * on the cgroup within which the current task is running. 2010 2075 */ 2011 2076 #define __BPF_FUNC_MAPPER(FN) \ 2012 2077 FN(unspec), \ ··· 2153 2082 FN(lwt_push_encap), \ 2154 2083 FN(lwt_seg6_store_bytes), \ 2155 2084 FN(lwt_seg6_adjust_srh), \ 2156 - FN(lwt_seg6_action), 2085 + FN(lwt_seg6_action), \ 2086 + FN(rc_repeat), \ 2087 + FN(rc_keydown), \ 2088 + FN(skb_cgroup_id), \ 2089 + FN(get_current_cgroup_id), 2157 2090 2158 2091 /* integer value in 'imm' field of BPF_CALL instruction selects which helper 2159 2092 * function eBPF program intends to call ··· 2274 2199 }; 2275 2200 __u8 tunnel_tos; 2276 2201 __u8 tunnel_ttl; 2277 - __u16 tunnel_ext; 2202 + __u16 tunnel_ext; /* Padding, future use. */ 2278 2203 __u32 tunnel_label; 2279 2204 }; 2280 2205 ··· 2285 2210 __u32 reqid; 2286 2211 __u32 spi; /* Stored in network byte order */ 2287 2212 __u16 family; 2213 + __u16 ext; /* Padding, future use. */ 2288 2214 union { 2289 2215 __u32 remote_ipv4; /* Stored in network byte order */ 2290 2216 __u32 remote_ipv6[4]; /* Stored in network byte order */ ··· 2440 2364 __u32 family; /* Allows 4-byte read, but no write */ 2441 2365 __u32 type; /* Allows 4-byte read, but no write */ 2442 2366 __u32 protocol; /* Allows 4-byte read, but no write */ 2367 + __u32 msg_src_ip4; /* Allows 1,2,4-byte read and 4-byte write. 2368 + * Stored in network byte order. 2369 + */ 2370 + __u32 msg_src_ip6[4]; /* Allows 1,2,4-byte read and 4-byte write. 2371 + * Stored in network byte order. 
2372 + */ 2443 2373 }; 2444 2374 2445 2375 /* User bpf_sock_ops struct to access socket values and specify request ops ··· 2613 2531 #define BPF_FIB_LOOKUP_OUTPUT BIT(1) 2614 2532 2615 2533 struct bpf_fib_lookup { 2616 - /* input */ 2617 - __u8 family; /* network family, AF_INET, AF_INET6, AF_MPLS */ 2534 + /* input: network family for lookup (AF_INET, AF_INET6) 2535 + * output: network family of egress nexthop 2536 + */ 2537 + __u8 family; 2618 2538 2619 2539 /* set if lookup is to consider L4 data - e.g., FIB rules */ 2620 2540 __u8 l4_protocol; ··· 2632 2548 __u8 tos; /* AF_INET */ 2633 2549 __be32 flowlabel; /* AF_INET6 */ 2634 2550 2635 - /* output: metric of fib result */ 2636 - __u32 rt_metric; 2551 + /* output: metric of fib result (IPv4/IPv6 only) */ 2552 + __u32 rt_metric; 2637 2553 }; 2638 2554 2639 2555 union { 2640 - __be32 mpls_in; 2641 2556 __be32 ipv4_src; 2642 2557 __u32 ipv6_src[4]; /* in6_addr; network order */ 2643 2558 }; 2644 2559 2645 - /* input to bpf_fib_lookup, *dst is destination address. 2646 - * output: bpf_fib_lookup sets to gateway address 2560 + /* input to bpf_fib_lookup, ipv{4,6}_dst is destination address in 2561 + * network header. output: bpf_fib_lookup sets to gateway address 2562 + * if FIB lookup returns gateway route 2647 2563 */ 2648 2564 union { 2649 - /* return for MPLS lookups */ 2650 - __be32 mpls_out[4]; /* support up to 4 labels */ 2651 2565 __be32 ipv4_dst; 2652 2566 __u32 ipv6_dst[4]; /* in6_addr; network order */ 2653 2567 };
+217
tools/include/uapi/linux/lirc.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ 2 + /* 3 + * lirc.h - linux infrared remote control header file 4 + * last modified 2010/07/13 by Jarod Wilson 5 + */ 6 + 7 + #ifndef _LINUX_LIRC_H 8 + #define _LINUX_LIRC_H 9 + 10 + #include <linux/types.h> 11 + #include <linux/ioctl.h> 12 + 13 + #define PULSE_BIT 0x01000000 14 + #define PULSE_MASK 0x00FFFFFF 15 + 16 + #define LIRC_MODE2_SPACE 0x00000000 17 + #define LIRC_MODE2_PULSE 0x01000000 18 + #define LIRC_MODE2_FREQUENCY 0x02000000 19 + #define LIRC_MODE2_TIMEOUT 0x03000000 20 + 21 + #define LIRC_VALUE_MASK 0x00FFFFFF 22 + #define LIRC_MODE2_MASK 0xFF000000 23 + 24 + #define LIRC_SPACE(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_SPACE) 25 + #define LIRC_PULSE(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_PULSE) 26 + #define LIRC_FREQUENCY(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_FREQUENCY) 27 + #define LIRC_TIMEOUT(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_TIMEOUT) 28 + 29 + #define LIRC_VALUE(val) ((val)&LIRC_VALUE_MASK) 30 + #define LIRC_MODE2(val) ((val)&LIRC_MODE2_MASK) 31 + 32 + #define LIRC_IS_SPACE(val) (LIRC_MODE2(val) == LIRC_MODE2_SPACE) 33 + #define LIRC_IS_PULSE(val) (LIRC_MODE2(val) == LIRC_MODE2_PULSE) 34 + #define LIRC_IS_FREQUENCY(val) (LIRC_MODE2(val) == LIRC_MODE2_FREQUENCY) 35 + #define LIRC_IS_TIMEOUT(val) (LIRC_MODE2(val) == LIRC_MODE2_TIMEOUT) 36 + 37 + /* used heavily by lirc userspace */ 38 + #define lirc_t int 39 + 40 + /*** lirc compatible hardware features ***/ 41 + 42 + #define LIRC_MODE2SEND(x) (x) 43 + #define LIRC_SEND2MODE(x) (x) 44 + #define LIRC_MODE2REC(x) ((x) << 16) 45 + #define LIRC_REC2MODE(x) ((x) >> 16) 46 + 47 + #define LIRC_MODE_RAW 0x00000001 48 + #define LIRC_MODE_PULSE 0x00000002 49 + #define LIRC_MODE_MODE2 0x00000004 50 + #define LIRC_MODE_SCANCODE 0x00000008 51 + #define LIRC_MODE_LIRCCODE 0x00000010 52 + 53 + 54 + #define LIRC_CAN_SEND_RAW LIRC_MODE2SEND(LIRC_MODE_RAW) 55 + #define LIRC_CAN_SEND_PULSE LIRC_MODE2SEND(LIRC_MODE_PULSE) 56 + 
#define LIRC_CAN_SEND_MODE2 LIRC_MODE2SEND(LIRC_MODE_MODE2) 57 + #define LIRC_CAN_SEND_LIRCCODE LIRC_MODE2SEND(LIRC_MODE_LIRCCODE) 58 + 59 + #define LIRC_CAN_SEND_MASK 0x0000003f 60 + 61 + #define LIRC_CAN_SET_SEND_CARRIER 0x00000100 62 + #define LIRC_CAN_SET_SEND_DUTY_CYCLE 0x00000200 63 + #define LIRC_CAN_SET_TRANSMITTER_MASK 0x00000400 64 + 65 + #define LIRC_CAN_REC_RAW LIRC_MODE2REC(LIRC_MODE_RAW) 66 + #define LIRC_CAN_REC_PULSE LIRC_MODE2REC(LIRC_MODE_PULSE) 67 + #define LIRC_CAN_REC_MODE2 LIRC_MODE2REC(LIRC_MODE_MODE2) 68 + #define LIRC_CAN_REC_SCANCODE LIRC_MODE2REC(LIRC_MODE_SCANCODE) 69 + #define LIRC_CAN_REC_LIRCCODE LIRC_MODE2REC(LIRC_MODE_LIRCCODE) 70 + 71 + #define LIRC_CAN_REC_MASK LIRC_MODE2REC(LIRC_CAN_SEND_MASK) 72 + 73 + #define LIRC_CAN_SET_REC_CARRIER (LIRC_CAN_SET_SEND_CARRIER << 16) 74 + #define LIRC_CAN_SET_REC_DUTY_CYCLE (LIRC_CAN_SET_SEND_DUTY_CYCLE << 16) 75 + 76 + #define LIRC_CAN_SET_REC_DUTY_CYCLE_RANGE 0x40000000 77 + #define LIRC_CAN_SET_REC_CARRIER_RANGE 0x80000000 78 + #define LIRC_CAN_GET_REC_RESOLUTION 0x20000000 79 + #define LIRC_CAN_SET_REC_TIMEOUT 0x10000000 80 + #define LIRC_CAN_SET_REC_FILTER 0x08000000 81 + 82 + #define LIRC_CAN_MEASURE_CARRIER 0x02000000 83 + #define LIRC_CAN_USE_WIDEBAND_RECEIVER 0x04000000 84 + 85 + #define LIRC_CAN_SEND(x) ((x)&LIRC_CAN_SEND_MASK) 86 + #define LIRC_CAN_REC(x) ((x)&LIRC_CAN_REC_MASK) 87 + 88 + #define LIRC_CAN_NOTIFY_DECODE 0x01000000 89 + 90 + /*** IOCTL commands for lirc driver ***/ 91 + 92 + #define LIRC_GET_FEATURES _IOR('i', 0x00000000, __u32) 93 + 94 + #define LIRC_GET_SEND_MODE _IOR('i', 0x00000001, __u32) 95 + #define LIRC_GET_REC_MODE _IOR('i', 0x00000002, __u32) 96 + #define LIRC_GET_REC_RESOLUTION _IOR('i', 0x00000007, __u32) 97 + 98 + #define LIRC_GET_MIN_TIMEOUT _IOR('i', 0x00000008, __u32) 99 + #define LIRC_GET_MAX_TIMEOUT _IOR('i', 0x00000009, __u32) 100 + 101 + /* code length in bits, currently only for LIRC_MODE_LIRCCODE */ 102 + #define LIRC_GET_LENGTH _IOR('i', 
0x0000000f, __u32) 103 + 104 + #define LIRC_SET_SEND_MODE _IOW('i', 0x00000011, __u32) 105 + #define LIRC_SET_REC_MODE _IOW('i', 0x00000012, __u32) 106 + /* Note: these can reset the according pulse_width */ 107 + #define LIRC_SET_SEND_CARRIER _IOW('i', 0x00000013, __u32) 108 + #define LIRC_SET_REC_CARRIER _IOW('i', 0x00000014, __u32) 109 + #define LIRC_SET_SEND_DUTY_CYCLE _IOW('i', 0x00000015, __u32) 110 + #define LIRC_SET_TRANSMITTER_MASK _IOW('i', 0x00000017, __u32) 111 + 112 + /* 113 + * when a timeout != 0 is set the driver will send a 114 + * LIRC_MODE2_TIMEOUT data packet, otherwise LIRC_MODE2_TIMEOUT is 115 + * never sent, timeout is disabled by default 116 + */ 117 + #define LIRC_SET_REC_TIMEOUT _IOW('i', 0x00000018, __u32) 118 + 119 + /* 1 enables, 0 disables timeout reports in MODE2 */ 120 + #define LIRC_SET_REC_TIMEOUT_REPORTS _IOW('i', 0x00000019, __u32) 121 + 122 + /* 123 + * if enabled from the next key press on the driver will send 124 + * LIRC_MODE2_FREQUENCY packets 125 + */ 126 + #define LIRC_SET_MEASURE_CARRIER_MODE _IOW('i', 0x0000001d, __u32) 127 + 128 + /* 129 + * to set a range use LIRC_SET_REC_CARRIER_RANGE with the 130 + * lower bound first and later LIRC_SET_REC_CARRIER with the upper bound 131 + */ 132 + #define LIRC_SET_REC_CARRIER_RANGE _IOW('i', 0x0000001f, __u32) 133 + 134 + #define LIRC_SET_WIDEBAND_RECEIVER _IOW('i', 0x00000023, __u32) 135 + 136 + /* 137 + * struct lirc_scancode - decoded scancode with protocol for use with 138 + * LIRC_MODE_SCANCODE 139 + * 140 + * @timestamp: Timestamp in nanoseconds using CLOCK_MONOTONIC when IR 141 + * was decoded. 142 + * @flags: should be 0 for transmit. When receiving scancodes, 143 + * LIRC_SCANCODE_FLAG_TOGGLE or LIRC_SCANCODE_FLAG_REPEAT can be set 144 + * depending on the protocol 145 + * @rc_proto: see enum rc_proto 146 + * @keycode: the translated keycode. Set to 0 for transmit. 
147 + * @scancode: the scancode received or to be sent 148 + */ 149 + struct lirc_scancode { 150 + __u64 timestamp; 151 + __u16 flags; 152 + __u16 rc_proto; 153 + __u32 keycode; 154 + __u64 scancode; 155 + }; 156 + 157 + /* Set if the toggle bit of rc-5 or rc-6 is enabled */ 158 + #define LIRC_SCANCODE_FLAG_TOGGLE 1 159 + /* Set if this is a nec or sanyo repeat */ 160 + #define LIRC_SCANCODE_FLAG_REPEAT 2 161 + 162 + /** 163 + * enum rc_proto - the Remote Controller protocol 164 + * 165 + * @RC_PROTO_UNKNOWN: Protocol not known 166 + * @RC_PROTO_OTHER: Protocol known but proprietary 167 + * @RC_PROTO_RC5: Philips RC5 protocol 168 + * @RC_PROTO_RC5X_20: Philips RC5x 20 bit protocol 169 + * @RC_PROTO_RC5_SZ: StreamZap variant of RC5 170 + * @RC_PROTO_JVC: JVC protocol 171 + * @RC_PROTO_SONY12: Sony 12 bit protocol 172 + * @RC_PROTO_SONY15: Sony 15 bit protocol 173 + * @RC_PROTO_SONY20: Sony 20 bit protocol 174 + * @RC_PROTO_NEC: NEC protocol 175 + * @RC_PROTO_NECX: Extended NEC protocol 176 + * @RC_PROTO_NEC32: NEC 32 bit protocol 177 + * @RC_PROTO_SANYO: Sanyo protocol 178 + * @RC_PROTO_MCIR2_KBD: RC6-ish MCE keyboard 179 + * @RC_PROTO_MCIR2_MSE: RC6-ish MCE mouse 180 + * @RC_PROTO_RC6_0: Philips RC6-0-16 protocol 181 + * @RC_PROTO_RC6_6A_20: Philips RC6-6A-20 protocol 182 + * @RC_PROTO_RC6_6A_24: Philips RC6-6A-24 protocol 183 + * @RC_PROTO_RC6_6A_32: Philips RC6-6A-32 protocol 184 + * @RC_PROTO_RC6_MCE: MCE (Philips RC6-6A-32 subtype) protocol 185 + * @RC_PROTO_SHARP: Sharp protocol 186 + * @RC_PROTO_XMP: XMP protocol 187 + * @RC_PROTO_CEC: CEC protocol 188 + * @RC_PROTO_IMON: iMon Pad protocol 189 + */ 190 + enum rc_proto { 191 + RC_PROTO_UNKNOWN = 0, 192 + RC_PROTO_OTHER = 1, 193 + RC_PROTO_RC5 = 2, 194 + RC_PROTO_RC5X_20 = 3, 195 + RC_PROTO_RC5_SZ = 4, 196 + RC_PROTO_JVC = 5, 197 + RC_PROTO_SONY12 = 6, 198 + RC_PROTO_SONY15 = 7, 199 + RC_PROTO_SONY20 = 8, 200 + RC_PROTO_NEC = 9, 201 + RC_PROTO_NECX = 10, 202 + RC_PROTO_NEC32 = 11, 203 + RC_PROTO_SANYO = 12, 204 
+ RC_PROTO_MCIR2_KBD = 13, 205 + RC_PROTO_MCIR2_MSE = 14, 206 + RC_PROTO_RC6_0 = 15, 207 + RC_PROTO_RC6_6A_20 = 16, 208 + RC_PROTO_RC6_6A_24 = 17, 209 + RC_PROTO_RC6_6A_32 = 18, 210 + RC_PROTO_RC6_MCE = 19, 211 + RC_PROTO_SHARP = 20, 212 + RC_PROTO_XMP = 21, 213 + RC_PROTO_CEC = 22, 214 + RC_PROTO_IMON = 23, 215 + }; 216 + 217 + #endif
+55
tools/include/uapi/linux/seg6.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ 2 + /* 3 + * SR-IPv6 implementation 4 + * 5 + * Author: 6 + * David Lebrun <david.lebrun@uclouvain.be> 7 + * 8 + * 9 + * This program is free software; you can redistribute it and/or 10 + * modify it under the terms of the GNU General Public License 11 + * as published by the Free Software Foundation; either version 12 + * 2 of the License, or (at your option) any later version. 13 + */ 14 + 15 + #ifndef _UAPI_LINUX_SEG6_H 16 + #define _UAPI_LINUX_SEG6_H 17 + 18 + #include <linux/types.h> 19 + #include <linux/in6.h> /* For struct in6_addr. */ 20 + 21 + /* 22 + * SRH 23 + */ 24 + struct ipv6_sr_hdr { 25 + __u8 nexthdr; 26 + __u8 hdrlen; 27 + __u8 type; 28 + __u8 segments_left; 29 + __u8 first_segment; /* Represents the last_entry field of SRH */ 30 + __u8 flags; 31 + __u16 tag; 32 + 33 + struct in6_addr segments[0]; 34 + }; 35 + 36 + #define SR6_FLAG1_PROTECTED (1 << 6) 37 + #define SR6_FLAG1_OAM (1 << 5) 38 + #define SR6_FLAG1_ALERT (1 << 4) 39 + #define SR6_FLAG1_HMAC (1 << 3) 40 + 41 + #define SR6_TLV_INGRESS 1 42 + #define SR6_TLV_EGRESS 2 43 + #define SR6_TLV_OPAQUE 3 44 + #define SR6_TLV_PADDING 4 45 + #define SR6_TLV_HMAC 5 46 + 47 + #define sr_has_hmac(srh) ((srh)->flags & SR6_FLAG1_HMAC) 48 + 49 + struct sr6_tlv { 50 + __u8 type; 51 + __u8 len; 52 + __u8 data[0]; 53 + }; 54 + 55 + #endif
+80
tools/include/uapi/linux/seg6_local.h
··· 1 + /* 2 + * SR-IPv6 implementation 3 + * 4 + * Author: 5 + * David Lebrun <david.lebrun@uclouvain.be> 6 + * 7 + * 8 + * This program is free software; you can redistribute it and/or 9 + * modify it under the terms of the GNU General Public License 10 + * as published by the Free Software Foundation; either version 11 + * 2 of the License, or (at your option) any later version. 12 + */ 13 + 14 + #ifndef _UAPI_LINUX_SEG6_LOCAL_H 15 + #define _UAPI_LINUX_SEG6_LOCAL_H 16 + 17 + #include <linux/seg6.h> 18 + 19 + enum { 20 + SEG6_LOCAL_UNSPEC, 21 + SEG6_LOCAL_ACTION, 22 + SEG6_LOCAL_SRH, 23 + SEG6_LOCAL_TABLE, 24 + SEG6_LOCAL_NH4, 25 + SEG6_LOCAL_NH6, 26 + SEG6_LOCAL_IIF, 27 + SEG6_LOCAL_OIF, 28 + SEG6_LOCAL_BPF, 29 + __SEG6_LOCAL_MAX, 30 + }; 31 + #define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1) 32 + 33 + enum { 34 + SEG6_LOCAL_ACTION_UNSPEC = 0, 35 + /* node segment */ 36 + SEG6_LOCAL_ACTION_END = 1, 37 + /* adjacency segment (IPv6 cross-connect) */ 38 + SEG6_LOCAL_ACTION_END_X = 2, 39 + /* lookup of next seg NH in table */ 40 + SEG6_LOCAL_ACTION_END_T = 3, 41 + /* decap and L2 cross-connect */ 42 + SEG6_LOCAL_ACTION_END_DX2 = 4, 43 + /* decap and IPv6 cross-connect */ 44 + SEG6_LOCAL_ACTION_END_DX6 = 5, 45 + /* decap and IPv4 cross-connect */ 46 + SEG6_LOCAL_ACTION_END_DX4 = 6, 47 + /* decap and lookup of DA in v6 table */ 48 + SEG6_LOCAL_ACTION_END_DT6 = 7, 49 + /* decap and lookup of DA in v4 table */ 50 + SEG6_LOCAL_ACTION_END_DT4 = 8, 51 + /* binding segment with insertion */ 52 + SEG6_LOCAL_ACTION_END_B6 = 9, 53 + /* binding segment with encapsulation */ 54 + SEG6_LOCAL_ACTION_END_B6_ENCAP = 10, 55 + /* binding segment with MPLS encap */ 56 + SEG6_LOCAL_ACTION_END_BM = 11, 57 + /* lookup last seg in table */ 58 + SEG6_LOCAL_ACTION_END_S = 12, 59 + /* forward to SR-unaware VNF with static proxy */ 60 + SEG6_LOCAL_ACTION_END_AS = 13, 61 + /* forward to SR-unaware VNF with masquerading */ 62 + SEG6_LOCAL_ACTION_END_AM = 14, 63 + /* custom BPF action */ 64 + 
SEG6_LOCAL_ACTION_END_BPF = 15, 65 + 66 + __SEG6_LOCAL_ACTION_MAX, 67 + }; 68 + 69 + #define SEG6_LOCAL_ACTION_MAX (__SEG6_LOCAL_ACTION_MAX - 1) 70 + 71 + enum { 72 + SEG6_LOCAL_BPF_PROG_UNSPEC, 73 + SEG6_LOCAL_BPF_PROG, 74 + SEG6_LOCAL_BPF_PROG_NAME, 75 + __SEG6_LOCAL_BPF_PROG_MAX, 76 + }; 77 + 78 + #define SEG6_LOCAL_BPF_PROG_MAX (__SEG6_LOCAL_BPF_PROG_MAX - 1) 79 + 80 + #endif
+1
tools/lib/bpf/Makefile
··· 189 189 $(call QUIET_INSTALL, headers) \ 190 190 $(call do_install,bpf.h,$(prefix)/include/bpf,644); \ 191 191 $(call do_install,libbpf.h,$(prefix)/include/bpf,644); 192 + $(call do_install,btf.h,$(prefix)/include/bpf,644); 192 193 193 194 install: install_lib 194 195
+3
tools/lib/bpf/libbpf.c
··· 1462 1462 case BPF_PROG_TYPE_CGROUP_DEVICE: 1463 1463 case BPF_PROG_TYPE_SK_MSG: 1464 1464 case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: 1465 + case BPF_PROG_TYPE_LIRC_MODE2: 1465 1466 return false; 1466 1467 case BPF_PROG_TYPE_UNSPEC: 1467 1468 case BPF_PROG_TYPE_KPROBE: ··· 2044 2043 BPF_SA_PROG_SEC("cgroup/bind6", BPF_CGROUP_INET6_BIND), 2045 2044 BPF_SA_PROG_SEC("cgroup/connect4", BPF_CGROUP_INET4_CONNECT), 2046 2045 BPF_SA_PROG_SEC("cgroup/connect6", BPF_CGROUP_INET6_CONNECT), 2046 + BPF_SA_PROG_SEC("cgroup/sendmsg4", BPF_CGROUP_UDP4_SENDMSG), 2047 + BPF_SA_PROG_SEC("cgroup/sendmsg6", BPF_CGROUP_UDP6_SENDMSG), 2047 2048 BPF_S_PROG_SEC("cgroup/post_bind4", BPF_CGROUP_INET4_POST_BIND), 2048 2049 BPF_S_PROG_SEC("cgroup/post_bind6", BPF_CGROUP_INET6_POST_BIND), 2049 2050 };
+2
tools/testing/selftests/bpf/.gitignore
··· 17 17 urandom_read 18 18 test_btf 19 19 test_sockmap 20 + test_lirc_mode2_user 21 + get_cgroup_id_user
+6 -3
tools/testing/selftests/bpf/Makefile
··· 24 24 # Order correspond to 'make run_tests' order 25 25 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \ 26 26 test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \ 27 - test_sock test_btf test_sockmap 27 + test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user 28 28 29 29 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \ 30 30 test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \ ··· 34 34 sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \ 35 35 test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \ 36 36 test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \ 37 - test_lwt_seg6local.o 37 + test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \ 38 + get_cgroup_id_kern.o 38 39 39 40 # Order correspond to 'make run_tests' order 40 41 TEST_PROGS := test_kmod.sh \ ··· 45 44 test_offload.py \ 46 45 test_sock_addr.sh \ 47 46 test_tunnel.sh \ 48 - test_lwt_seg6local.sh 47 + test_lwt_seg6local.sh \ 48 + test_lirc_mode2.sh 49 49 50 50 # Compile but not part of 'make run_tests' 51 51 TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr ··· 64 62 $(OUTPUT)/test_sock_addr: cgroup_helpers.c 65 63 $(OUTPUT)/test_sockmap: cgroup_helpers.c 66 64 $(OUTPUT)/test_progs: trace_helpers.c 65 + $(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c 67 66 68 67 .PHONY: force 69 68
+7
tools/testing/selftests/bpf/bpf_helpers.h
··· 126 126 static int (*bpf_lwt_seg6_adjust_srh)(void *ctx, unsigned int offset, 127 127 unsigned int len) = 128 128 (void *) BPF_FUNC_lwt_seg6_adjust_srh; 129 + static int (*bpf_rc_repeat)(void *ctx) = 130 + (void *) BPF_FUNC_rc_repeat; 131 + static int (*bpf_rc_keydown)(void *ctx, unsigned int protocol, 132 + unsigned long long scancode, unsigned int toggle) = 133 + (void *) BPF_FUNC_rc_keydown; 134 + static unsigned long long (*bpf_get_current_cgroup_id)(void) = 135 + (void *) BPF_FUNC_get_current_cgroup_id; 129 136 130 137 /* llvm builtin functions that eBPF C program may use to 131 138 * emit BPF_LD_ABS and BPF_LD_IND instructions
+57
tools/testing/selftests/bpf/cgroup_helpers.c
··· 6 6 #include <sys/types.h> 7 7 #include <linux/limits.h> 8 8 #include <stdio.h> 9 + #include <stdlib.h> 9 10 #include <linux/sched.h> 10 11 #include <fcntl.h> 11 12 #include <unistd.h> ··· 176 175 } 177 176 178 177 return fd; 178 + } 179 + 180 + /** 181 + * get_cgroup_id() - Get cgroup id for a particular cgroup path 182 + * @path: The cgroup path, relative to the workdir, to join 183 + * 184 + * On success, it returns the cgroup id. On failure it returns 0, 185 + * which is an invalid cgroup id. 186 + * If there is a failure, it prints the error to stderr. 187 + */ 188 + unsigned long long get_cgroup_id(char *path) 189 + { 190 + int dirfd, err, flags, mount_id, fhsize; 191 + union { 192 + unsigned long long cgid; 193 + unsigned char raw_bytes[8]; 194 + } id; 195 + char cgroup_workdir[PATH_MAX + 1]; 196 + struct file_handle *fhp, *fhp2; 197 + unsigned long long ret = 0; 198 + 199 + format_cgroup_path(cgroup_workdir, path); 200 + 201 + dirfd = AT_FDCWD; 202 + flags = 0; 203 + fhsize = sizeof(*fhp); 204 + fhp = calloc(1, fhsize); 205 + if (!fhp) { 206 + log_err("calloc"); 207 + return 0; 208 + } 209 + err = name_to_handle_at(dirfd, cgroup_workdir, fhp, &mount_id, flags); 210 + if (err >= 0 || fhp->handle_bytes != 8) { 211 + log_err("name_to_handle_at"); 212 + goto free_mem; 213 + } 214 + 215 + fhsize = sizeof(struct file_handle) + fhp->handle_bytes; 216 + fhp2 = realloc(fhp, fhsize); 217 + if (!fhp2) { 218 + log_err("realloc"); 219 + goto free_mem; 220 + } 221 + err = name_to_handle_at(dirfd, cgroup_workdir, fhp2, &mount_id, flags); 222 + fhp = fhp2; 223 + if (err < 0) { 224 + log_err("name_to_handle_at"); 225 + goto free_mem; 226 + } 227 + 228 + memcpy(id.raw_bytes, fhp->f_handle, 8); 229 + ret = id.cgid; 230 + 231 + free_mem: 232 + free(fhp); 233 + return ret; 179 234 }
+1
tools/testing/selftests/bpf/cgroup_helpers.h
··· 13 13 int join_cgroup(char *path); 14 14 int setup_cgroup_environment(void); 15 15 void cleanup_cgroup_environment(void); 16 + unsigned long long get_cgroup_id(char *path); 16 17 17 18 #endif
+28
tools/testing/selftests/bpf/get_cgroup_id_kern.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2018 Facebook 3 + 4 + #include <linux/bpf.h> 5 + #include "bpf_helpers.h" 6 + 7 + struct bpf_map_def SEC("maps") cg_ids = { 8 + .type = BPF_MAP_TYPE_ARRAY, 9 + .key_size = sizeof(__u32), 10 + .value_size = sizeof(__u64), 11 + .max_entries = 1, 12 + }; 13 + 14 + SEC("tracepoint/syscalls/sys_enter_nanosleep") 15 + int trace(void *ctx) 16 + { 17 + __u32 key = 0; 18 + __u64 *val; 19 + 20 + val = bpf_map_lookup_elem(&cg_ids, &key); 21 + if (val) 22 + *val = bpf_get_current_cgroup_id(); 23 + 24 + return 0; 25 + } 26 + 27 + char _license[] SEC("license") = "GPL"; 28 + __u32 _version SEC("version") = 1; /* ignored by tracepoints, required by libbpf.a */
+141
tools/testing/selftests/bpf/get_cgroup_id_user.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2018 Facebook 3 + 4 + #include <stdio.h> 5 + #include <stdlib.h> 6 + #include <string.h> 7 + #include <errno.h> 8 + #include <fcntl.h> 9 + #include <syscall.h> 10 + #include <unistd.h> 11 + #include <linux/perf_event.h> 12 + #include <sys/ioctl.h> 13 + #include <sys/time.h> 14 + #include <sys/types.h> 15 + #include <sys/stat.h> 16 + 17 + #include <linux/bpf.h> 18 + #include <bpf/bpf.h> 19 + #include <bpf/libbpf.h> 20 + 21 + #include "cgroup_helpers.h" 22 + #include "bpf_rlimit.h" 23 + 24 + #define CHECK(condition, tag, format...) ({ \ 25 + int __ret = !!(condition); \ 26 + if (__ret) { \ 27 + printf("%s:FAIL:%s ", __func__, tag); \ 28 + printf(format); \ 29 + } else { \ 30 + printf("%s:PASS:%s\n", __func__, tag); \ 31 + } \ 32 + __ret; \ 33 + }) 34 + 35 + static int bpf_find_map(const char *test, struct bpf_object *obj, 36 + const char *name) 37 + { 38 + struct bpf_map *map; 39 + 40 + map = bpf_object__find_map_by_name(obj, name); 41 + if (!map) 42 + return -1; 43 + return bpf_map__fd(map); 44 + } 45 + 46 + #define TEST_CGROUP "/test-bpf-get-cgroup-id/" 47 + 48 + int main(int argc, char **argv) 49 + { 50 + const char *probe_name = "syscalls/sys_enter_nanosleep"; 51 + const char *file = "get_cgroup_id_kern.o"; 52 + int err, bytes, efd, prog_fd, pmu_fd; 53 + struct perf_event_attr attr = {}; 54 + int cgroup_fd, cgidmap_fd; 55 + struct bpf_object *obj; 56 + __u64 kcgid = 0, ucgid; 57 + int exit_code = 1; 58 + char buf[256]; 59 + __u32 key = 0; 60 + 61 + err = setup_cgroup_environment(); 62 + if (CHECK(err, "setup_cgroup_environment", "err %d errno %d\n", err, 63 + errno)) 64 + return 1; 65 + 66 + cgroup_fd = create_and_get_cgroup(TEST_CGROUP); 67 + if (CHECK(cgroup_fd < 0, "create_and_get_cgroup", "err %d errno %d\n", 68 + cgroup_fd, errno)) 69 + goto cleanup_cgroup_env; 70 + 71 + err = join_cgroup(TEST_CGROUP); 72 + if (CHECK(err, "join_cgroup", "err %d errno %d\n", err, errno)) 73 + goto 
cleanup_cgroup_env; 74 + 75 + err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd); 76 + if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno)) 77 + goto cleanup_cgroup_env; 78 + 79 + cgidmap_fd = bpf_find_map(__func__, obj, "cg_ids"); 80 + if (CHECK(cgidmap_fd < 0, "bpf_find_map", "err %d errno %d\n", 81 + cgidmap_fd, errno)) 82 + goto close_prog; 83 + 84 + snprintf(buf, sizeof(buf), 85 + "/sys/kernel/debug/tracing/events/%s/id", probe_name); 86 + efd = open(buf, O_RDONLY, 0); 87 + if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno)) 88 + goto close_prog; 89 + bytes = read(efd, buf, sizeof(buf)); 90 + close(efd); 91 + if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read", 92 + "bytes %d errno %d\n", bytes, errno)) 93 + goto close_prog; 94 + 95 + attr.config = strtol(buf, NULL, 0); 96 + attr.type = PERF_TYPE_TRACEPOINT; 97 + attr.sample_type = PERF_SAMPLE_RAW; 98 + attr.sample_period = 1; 99 + attr.wakeup_events = 1; 100 + 101 + /* attach to this pid so all bpf invocations will be in the 102 + * cgroup associated with this pid. 
103 + */ 104 + pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0); 105 + if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd, 106 + errno)) 107 + goto close_prog; 108 + 109 + err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0); 110 + if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err, 111 + errno)) 112 + goto close_pmu; 113 + 114 + err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); 115 + if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err, 116 + errno)) 117 + goto close_pmu; 118 + 119 + /* trigger some syscalls */ 120 + sleep(1); 121 + 122 + err = bpf_map_lookup_elem(cgidmap_fd, &key, &kcgid); 123 + if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno)) 124 + goto close_pmu; 125 + 126 + ucgid = get_cgroup_id(TEST_CGROUP); 127 + if (CHECK(kcgid != ucgid, "compare_cgroup_id", 128 + "kern cgid %llx user cgid %llx", kcgid, ucgid)) 129 + goto close_pmu; 130 + 131 + exit_code = 0; 132 + printf("%s:PASS\n", argv[0]); 133 + 134 + close_pmu: 135 + close(pmu_fd); 136 + close_prog: 137 + bpf_object__close(obj); 138 + cleanup_cgroup_env: 139 + cleanup_cgroup_environment(); 140 + return exit_code; 141 + }
+49
tools/testing/selftests/bpf/sendmsg4_prog.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2018 Facebook 3 + 4 + #include <linux/stddef.h> 5 + #include <linux/bpf.h> 6 + #include <sys/socket.h> 7 + 8 + #include "bpf_helpers.h" 9 + #include "bpf_endian.h" 10 + 11 + #define SRC1_IP4 0xAC100001U /* 172.16.0.1 */ 12 + #define SRC2_IP4 0x00000000U 13 + #define SRC_REWRITE_IP4 0x7f000004U 14 + #define DST_IP4 0xC0A801FEU /* 192.168.1.254 */ 15 + #define DST_REWRITE_IP4 0x7f000001U 16 + #define DST_PORT 4040 17 + #define DST_REWRITE_PORT4 4444 18 + 19 + int _version SEC("version") = 1; 20 + 21 + SEC("cgroup/sendmsg4") 22 + int sendmsg_v4_prog(struct bpf_sock_addr *ctx) 23 + { 24 + if (ctx->type != SOCK_DGRAM) 25 + return 0; 26 + 27 + /* Rewrite source. */ 28 + if (ctx->msg_src_ip4 == bpf_htonl(SRC1_IP4) || 29 + ctx->msg_src_ip4 == bpf_htonl(SRC2_IP4)) { 30 + ctx->msg_src_ip4 = bpf_htonl(SRC_REWRITE_IP4); 31 + } else { 32 + /* Unexpected source. Reject sendmsg. */ 33 + return 0; 34 + } 35 + 36 + /* Rewrite destination. */ 37 + if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) && 38 + ctx->user_port == bpf_htons(DST_PORT)) { 39 + ctx->user_ip4 = bpf_htonl(DST_REWRITE_IP4); 40 + ctx->user_port = bpf_htons(DST_REWRITE_PORT4); 41 + } else { 42 + /* Unexpected destination. Reject sendmsg. */ 43 + return 0; 44 + } 45 + 46 + return 1; 47 + } 48 + 49 + char _license[] SEC("license") = "GPL";
+60
tools/testing/selftests/bpf/sendmsg6_prog.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2018 Facebook 3 + 4 + #include <linux/stddef.h> 5 + #include <linux/bpf.h> 6 + #include <sys/socket.h> 7 + 8 + #include "bpf_helpers.h" 9 + #include "bpf_endian.h" 10 + 11 + #define SRC_REWRITE_IP6_0 0 12 + #define SRC_REWRITE_IP6_1 0 13 + #define SRC_REWRITE_IP6_2 0 14 + #define SRC_REWRITE_IP6_3 6 15 + 16 + #define DST_REWRITE_IP6_0 0 17 + #define DST_REWRITE_IP6_1 0 18 + #define DST_REWRITE_IP6_2 0 19 + #define DST_REWRITE_IP6_3 1 20 + 21 + #define DST_REWRITE_PORT6 6666 22 + 23 + int _version SEC("version") = 1; 24 + 25 + SEC("cgroup/sendmsg6") 26 + int sendmsg_v6_prog(struct bpf_sock_addr *ctx) 27 + { 28 + if (ctx->type != SOCK_DGRAM) 29 + return 0; 30 + 31 + /* Rewrite source. */ 32 + if (ctx->msg_src_ip6[3] == bpf_htonl(1) || 33 + ctx->msg_src_ip6[3] == bpf_htonl(0)) { 34 + ctx->msg_src_ip6[0] = bpf_htonl(SRC_REWRITE_IP6_0); 35 + ctx->msg_src_ip6[1] = bpf_htonl(SRC_REWRITE_IP6_1); 36 + ctx->msg_src_ip6[2] = bpf_htonl(SRC_REWRITE_IP6_2); 37 + ctx->msg_src_ip6[3] = bpf_htonl(SRC_REWRITE_IP6_3); 38 + } else { 39 + /* Unexpected source. Reject sendmsg. */ 40 + return 0; 41 + } 42 + 43 + /* Rewrite destination. */ 44 + if ((ctx->user_ip6[0] & 0xFFFF) == bpf_htons(0xFACE) && 45 + ctx->user_ip6[0] >> 16 == bpf_htons(0xB00C)) { 46 + ctx->user_ip6[0] = bpf_htonl(DST_REWRITE_IP6_0); 47 + ctx->user_ip6[1] = bpf_htonl(DST_REWRITE_IP6_1); 48 + ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2); 49 + ctx->user_ip6[3] = bpf_htonl(DST_REWRITE_IP6_3); 50 + 51 + ctx->user_port = bpf_htons(DST_REWRITE_PORT6); 52 + } else { 53 + /* Unexpected destination. Reject sendmsg. */ 54 + return 0; 55 + } 56 + 57 + return 1; 58 + } 59 + 60 + char _license[] SEC("license") = "GPL";
+45
tools/testing/selftests/bpf/test_btf.c
···
 	},
 
 	{
+		.descr = "array test. t->size != 0\"",
+		.raw_types = {
+			/* int */				/* [1] */
+			BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+			/* int[16] */				/* [2] */
+			BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_ARRAY, 0, 0), 1),
+			BTF_ARRAY_ENC(1, 1, 16),
+			BTF_END_RAW,
+		},
+		.str_sec = "",
+		.str_sec_size = sizeof(""),
+		.map_type = BPF_MAP_TYPE_ARRAY,
+		.map_name = "array_test_map",
+		.key_size = sizeof(int),
+		.value_size = sizeof(int),
+		.key_type_id = 1,
+		.value_type_id = 1,
+		.max_entries = 4,
+		.btf_load_err = true,
+		.err_str = "size != 0",
+	},
+
+	{
 		.descr = "int test. invalid int_data",
 		.raw_types = {
 			BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_INT, 0, 0), 4),
···
 		.max_entries = 4,
 		.btf_load_err = true,
 		.err_str = "Invalid btf_info",
+	},
+
+	{
+		.descr = "fwd test. t->type != 0\"",
+		.raw_types = {
+			/* int */				/* [1] */
+			BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),
+			/* fwd type */				/* [2] */
+			BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FWD, 0, 0), 1),
+			BTF_END_RAW,
+		},
+		.str_sec = "",
+		.str_sec_size = sizeof(""),
+		.map_type = BPF_MAP_TYPE_ARRAY,
+		.map_name = "fwd_test_map",
+		.key_size = sizeof(int),
+		.value_size = sizeof(int),
+		.key_type_id = 1,
+		.value_type_id = 1,
+		.max_entries = 4,
+		.btf_load_err = true,
+		.err_str = "type != 0",
 	},
 
 }; /* struct btf_raw_test raw_tests[] */
+28
tools/testing/selftests/bpf/test_lirc_mode2.sh
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0

GREEN='\033[0;92m'
RED='\033[0;31m'
NC='\033[0m' # No Color

modprobe rc-loopback

for i in /sys/class/rc/rc*
do
	if grep -q DRV_NAME=rc-loopback $i/uevent
	then
		LIRCDEV=$(grep DEVNAME= $i/lirc*/uevent | sed sQDEVNAME=Q/dev/Q)
	fi
done

if [ -n "$LIRCDEV" ];
then
	TYPE=lirc_mode2
	./test_lirc_mode2_user $LIRCDEV
	ret=$?
	if [ $ret -ne 0 ]; then
		echo -e ${RED}"FAIL: $TYPE"${NC}
	else
		echo -e ${GREEN}"PASS: $TYPE"${NC}
	fi
fi
+23
tools/testing/selftests/bpf/test_lirc_mode2_kern.c
// SPDX-License-Identifier: GPL-2.0
// test ir decoder
//
// Copyright (C) 2018 Sean Young <sean@mess.org>

#include <linux/bpf.h>
#include <linux/lirc.h>
#include "bpf_helpers.h"

SEC("lirc_mode2")
int bpf_decoder(unsigned int *sample)
{
	if (LIRC_IS_PULSE(*sample)) {
		unsigned int duration = LIRC_VALUE(*sample);

		if (duration & 0x10000)
			bpf_rc_keydown(sample, 0x40, duration & 0xffff, 0);
	}

	return 0;
}

char _license[] SEC("license") = "GPL";
+149
tools/testing/selftests/bpf/test_lirc_mode2_user.c
// SPDX-License-Identifier: GPL-2.0
// test ir decoder
//
// Copyright (C) 2018 Sean Young <sean@mess.org>

// A lirc chardev is a device representing a consumer IR (cir) device which
// can receive infrared signals from remote control and/or transmit IR.
//
// IR is sent as a series of pulses and space somewhat like morse code. The
// BPF program can decode this into scancodes so that rc-core can translate
// this into input key codes using the rc keymap.
//
// This test works by sending IR over rc-loopback, so the IR is processed by
// BPF and then decoded into scancodes. The lirc chardev must be the one
// associated with rc-loopback, see the output of ir-keytable(1).
//
// The following CONFIG options must be enabled for the test to succeed:
// CONFIG_RC_CORE=y
// CONFIG_BPF_RAWIR_EVENT=y
// CONFIG_RC_LOOPBACK=y

// Steps:
// 1. Open the /dev/lircN device for rc-loopback (given on command line)
// 2. Attach bpf_lirc_mode2 program which decodes some IR.
// 3. Send some IR to the same IR device; since it is loopback, this will
//    end up in the bpf program
// 4. bpf program should decode IR and report keycode
// 5. We can read keycode from same /dev/lirc device

#include <linux/bpf.h>
#include <linux/lirc.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <fcntl.h>

#include "bpf_util.h"
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

int main(int argc, char **argv)
{
	struct bpf_object *obj;
	int ret, lircfd, progfd, mode;
	int testir = 0x1dead;
	u32 prog_ids[10], prog_flags[10], prog_cnt;

	if (argc != 2) {
		printf("Usage: %s /dev/lircN\n", argv[0]);
		return 2;
	}

	ret = bpf_prog_load("test_lirc_mode2_kern.o",
			    BPF_PROG_TYPE_LIRC_MODE2, &obj, &progfd);
	if (ret) {
		printf("Failed to load bpf program\n");
		return 1;
	}

	lircfd = open(argv[1], O_RDWR | O_NONBLOCK);
	if (lircfd == -1) {
		printf("failed to open lirc device %s: %m\n", argv[1]);
		return 1;
	}

	/* Let's try detach it before it was ever attached */
	ret = bpf_prog_detach2(progfd, lircfd, BPF_LIRC_MODE2);
	if (ret != -1 || errno != ENOENT) {
		printf("bpf_prog_detach2 not attached should fail: %m\n");
		return 1;
	}

	mode = LIRC_MODE_SCANCODE;
	if (ioctl(lircfd, LIRC_SET_REC_MODE, &mode)) {
		printf("failed to set rec mode: %m\n");
		return 1;
	}

	prog_cnt = 10;
	ret = bpf_prog_query(lircfd, BPF_LIRC_MODE2, 0, prog_flags, prog_ids,
			     &prog_cnt);
	if (ret) {
		printf("Failed to query bpf programs on lirc device: %m\n");
		return 1;
	}

	if (prog_cnt != 0) {
		printf("Expected nothing to be attached\n");
		return 1;
	}

	ret = bpf_prog_attach(progfd, lircfd, BPF_LIRC_MODE2, 0);
	if (ret) {
		printf("Failed to attach bpf to lirc device: %m\n");
		return 1;
	}

	/* Write raw IR */
	ret = write(lircfd, &testir, sizeof(testir));
	if (ret != sizeof(testir)) {
		printf("Failed to send test IR message: %m\n");
		return 1;
	}

	struct pollfd pfd = { .fd = lircfd, .events = POLLIN };
	struct lirc_scancode lsc;

	poll(&pfd, 1, 100);

	/* Read decoded IR */
	ret = read(lircfd, &lsc, sizeof(lsc));
	if (ret != sizeof(lsc)) {
		printf("Failed to read decoded IR: %m\n");
		return 1;
	}

	if (lsc.scancode != 0xdead || lsc.rc_proto != 64) {
		printf("Incorrect scancode decoded\n");
		return 1;
	}

	prog_cnt = 10;
	ret = bpf_prog_query(lircfd, BPF_LIRC_MODE2, 0, prog_flags, prog_ids,
			     &prog_cnt);
	if (ret) {
		printf("Failed to query bpf programs on lirc device: %m\n");
		return 1;
	}

	if (prog_cnt != 1) {
		printf("Expected one program to be attached\n");
		return 1;
	}

	/* Let's try detaching it now it is actually attached */
	ret = bpf_prog_detach2(progfd, lircfd, BPF_LIRC_MODE2);
	if (ret) {
		printf("bpf_prog_detach2: returned %m\n");
		return 1;
	}

	return 0;
}
+986 -203
tools/testing/selftests/bpf/test_sock_addr.c
···
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2018 Facebook
 
+#define _GNU_SOURCE
+
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 
 #include <arpa/inet.h>
+#include <netinet/in.h>
 #include <sys/types.h>
+#include <sys/select.h>
 #include <sys/socket.h>
 
 #include <linux/filter.h>
···
 #include "cgroup_helpers.h"
 #include "bpf_rlimit.h"
 
+#ifndef ENOTSUPP
+# define ENOTSUPP 524
+#endif
+
+#ifndef ARRAY_SIZE
+# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
 #define CG_PATH	"/foo"
 #define CONNECT4_PROG_PATH	"./connect4_prog.o"
 #define CONNECT6_PROG_PATH	"./connect6_prog.o"
+#define SENDMSG4_PROG_PATH	"./sendmsg4_prog.o"
+#define SENDMSG6_PROG_PATH	"./sendmsg6_prog.o"
 
 #define SERV4_IP		"192.168.1.254"
 #define SERV4_REWRITE_IP	"127.0.0.1"
+#define SRC4_IP			"172.16.0.1"
+#define SRC4_REWRITE_IP		"127.0.0.4"
 #define SERV4_PORT		4040
 #define SERV4_REWRITE_PORT	4444
 
 #define SERV6_IP		"face:b00c:1234:5678::abcd"
 #define SERV6_REWRITE_IP	"::1"
+#define SERV6_V4MAPPED_IP	"::ffff:192.168.0.4"
+#define SRC6_IP			"::1"
+#define SRC6_REWRITE_IP		"::6"
 #define SERV6_PORT		6060
 #define SERV6_REWRITE_PORT	6666
 
 #define INET_NTOP_BUF	40
 
-typedef int (*load_fn)(enum bpf_attach_type, const char *comment);
+struct sock_addr_test;
+
+typedef int (*load_fn)(const struct sock_addr_test *test);
 typedef int (*info_fn)(int, struct sockaddr *, socklen_t *);
 
-struct program {
-	enum bpf_attach_type type;
-	load_fn	loadfn;
-	int fd;
-	const char *name;
-	enum bpf_attach_type invalid_type;
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
+struct sock_addr_test {
+	const char *descr;
+	/* BPF prog properties */
+	load_fn loadfn;
+	enum bpf_attach_type expected_attach_type;
+	enum bpf_attach_type attach_type;
+	/* Socket properties */
+	int domain;
+	int type;
+	/* IP:port pairs for BPF prog to override */
+	const char *requested_ip;
+	unsigned short requested_port;
+	const char *expected_ip;
+	unsigned short expected_port;
+	const char *expected_src_ip;
+	/* Expected test result */
+	enum {
+		LOAD_REJECT,
+		ATTACH_REJECT,
+		SYSCALL_EPERM,
+		SYSCALL_ENOTSUPP,
+		SUCCESS,
+	} expected_result;
 };
 
-char bpf_log_buf[BPF_LOG_BUF_SIZE];
+static int bind4_prog_load(const struct sock_addr_test *test);
+static int bind6_prog_load(const struct sock_addr_test *test);
+static int connect4_prog_load(const struct sock_addr_test *test);
+static int connect6_prog_load(const struct sock_addr_test *test);
+static int sendmsg_deny_prog_load(const struct sock_addr_test *test);
+static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test);
+static int sendmsg4_rw_c_prog_load(const struct sock_addr_test *test);
+static int sendmsg6_rw_asm_prog_load(const struct sock_addr_test *test);
+static int sendmsg6_rw_c_prog_load(const struct sock_addr_test *test);
+static int sendmsg6_rw_v4mapped_prog_load(const struct sock_addr_test *test);
+
+static struct sock_addr_test tests[] = {
+	/* bind */
+	{
+		"bind4: load prog with wrong expected attach type",
+		bind4_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"bind4: attach prog with wrong attach type",
+		bind4_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"bind4: rewrite IP & TCP port in",
+		bind4_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+	{
+		"bind4: rewrite IP & UDP port in",
+		bind4_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+	{
+		"bind6: load prog with wrong expected attach type",
+		bind6_prog_load,
+		BPF_CGROUP_INET4_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET6,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"bind6: attach prog with wrong attach type",
+		bind6_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET4_BIND,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"bind6: rewrite IP & TCP port in",
+		bind6_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET6,
+		SOCK_STREAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+	{
+		"bind6: rewrite IP & UDP port in",
+		bind6_prog_load,
+		BPF_CGROUP_INET6_BIND,
+		BPF_CGROUP_INET6_BIND,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		NULL,
+		SUCCESS,
+	},
+
+	/* connect */
+	{
+		"connect4: load prog with wrong expected attach type",
+		connect4_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"connect4: attach prog with wrong attach type",
+		connect4_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"connect4: rewrite IP & TCP port",
+		connect4_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"connect4: rewrite IP & UDP port",
+		connect4_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"connect6: load prog with wrong expected attach type",
+		connect6_prog_load,
+		BPF_CGROUP_INET4_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET6,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"connect6: attach prog with wrong attach type",
+		connect6_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET4_CONNECT,
+		AF_INET,
+		SOCK_STREAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"connect6: rewrite IP & TCP port",
+		connect6_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET6,
+		SOCK_STREAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"connect6: rewrite IP & UDP port",
+		connect6_prog_load,
+		BPF_CGROUP_INET6_CONNECT,
+		BPF_CGROUP_INET6_CONNECT,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+
+	/* sendmsg */
+	{
+		"sendmsg4: load prog with wrong expected attach type",
+		sendmsg4_rw_asm_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"sendmsg4: attach prog with wrong attach type",
+		sendmsg4_rw_asm_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"sendmsg4: rewrite IP & port (asm)",
+		sendmsg4_rw_asm_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg4: rewrite IP & port (C)",
+		sendmsg4_rw_c_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg4: deny call",
+		sendmsg_deny_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET,
+		SOCK_DGRAM,
+		SERV4_IP,
+		SERV4_PORT,
+		SERV4_REWRITE_IP,
+		SERV4_REWRITE_PORT,
+		SRC4_REWRITE_IP,
+		SYSCALL_EPERM,
+	},
+	{
+		"sendmsg6: load prog with wrong expected attach type",
+		sendmsg6_rw_asm_prog_load,
+		BPF_CGROUP_UDP4_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		LOAD_REJECT,
+	},
+	{
+		"sendmsg6: attach prog with wrong attach type",
+		sendmsg6_rw_asm_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP4_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		NULL,
+		0,
+		NULL,
+		0,
+		NULL,
+		ATTACH_REJECT,
+	},
+	{
+		"sendmsg6: rewrite IP & port (asm)",
+		sendmsg6_rw_asm_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg6: rewrite IP & port (C)",
+		sendmsg6_rw_c_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SUCCESS,
+	},
+	{
+		"sendmsg6: IPv4-mapped IPv6",
+		sendmsg6_rw_v4mapped_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SYSCALL_ENOTSUPP,
+	},
+	{
+		"sendmsg6: deny call",
+		sendmsg_deny_prog_load,
+		BPF_CGROUP_UDP6_SENDMSG,
+		BPF_CGROUP_UDP6_SENDMSG,
+		AF_INET6,
+		SOCK_DGRAM,
+		SERV6_IP,
+		SERV6_PORT,
+		SERV6_REWRITE_IP,
+		SERV6_REWRITE_PORT,
+		SRC6_REWRITE_IP,
+		SYSCALL_EPERM,
+	},
+};
 
 static int mk_sockaddr(int domain, const char *ip, unsigned short port,
 		       struct sockaddr *addr, socklen_t addr_len)
···
 	return 0;
 }
 
-static int load_insns(enum bpf_attach_type attach_type,
-		      const struct bpf_insn *insns, size_t insns_cnt,
-		      const char *comment)
+static int load_insns(const struct sock_addr_test *test,
+		      const struct bpf_insn *insns, size_t insns_cnt)
 {
 	struct bpf_load_program_attr load_attr;
 	int ret;
 
 	memset(&load_attr, 0, sizeof(struct bpf_load_program_attr));
 	load_attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
-	load_attr.expected_attach_type = attach_type;
+	load_attr.expected_attach_type = test->expected_attach_type;
 	load_attr.insns = insns;
 	load_attr.insns_cnt = insns_cnt;
 	load_attr.license = "GPL";
 
 	ret = bpf_load_program_xattr(&load_attr, bpf_log_buf, BPF_LOG_BUF_SIZE);
-	if (ret < 0 && comment) {
-		log_err(">>> Loading %s program error.\n"
-			">>> Output from verifier:\n%s\n-------\n",
-			comment, bpf_log_buf);
+	if (ret < 0 && test->expected_result != LOAD_REJECT) {
+		log_err(">>> Loading program error.\n"
+			">>> Verifier output:\n%s\n-------\n", bpf_log_buf);
 	}
 
 	return ret;
···
  * to count jumps properly.
  */
 
-static int bind4_prog_load(enum bpf_attach_type attach_type,
-			   const char *comment)
+static int bind4_prog_load(const struct sock_addr_test *test)
 {
 	union {
 		uint8_t u4_addr8[4];
···
 		BPF_EXIT_INSN(),
 	};
 
-	return load_insns(attach_type, insns,
-			  sizeof(insns) / sizeof(struct bpf_insn), comment);
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
 }
 
-static int bind6_prog_load(enum bpf_attach_type attach_type,
-			   const char *comment)
+static int bind6_prog_load(const struct sock_addr_test *test)
 {
 	struct sockaddr_in6 addr6_rw;
 	struct in6_addr ip6;
···
 		BPF_EXIT_INSN(),
 	};
 
-	return load_insns(attach_type, insns,
-			  sizeof(insns) / sizeof(struct bpf_insn), comment);
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
 }
 
-static int connect_prog_load_path(const char *path,
-				  enum bpf_attach_type attach_type,
-				  const char *comment)
+static int load_path(const struct sock_addr_test *test, const char *path)
 {
 	struct bpf_prog_load_attr attr;
 	struct bpf_object *obj;
···
 	memset(&attr, 0, sizeof(struct bpf_prog_load_attr));
 	attr.file = path;
 	attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
-	attr.expected_attach_type = attach_type;
+	attr.expected_attach_type = test->expected_attach_type;
 
 	if (bpf_prog_load_xattr(&attr, &obj, &prog_fd)) {
-		if (comment)
-			log_err(">>> Loading %s program at %s error.\n",
-				comment, path);
+		if (test->expected_result != LOAD_REJECT)
+			log_err(">>> Loading program (%s) error.\n", path);
 		return -1;
 	}
 
 	return prog_fd;
 }
 
-static int connect4_prog_load(enum bpf_attach_type attach_type,
-			      const char *comment)
+static int connect4_prog_load(const struct sock_addr_test *test)
 {
-	return connect_prog_load_path(CONNECT4_PROG_PATH, attach_type, comment);
+	return load_path(test, CONNECT4_PROG_PATH);
 }
 
-static int connect6_prog_load(enum bpf_attach_type attach_type,
-			      const char *comment)
+static int connect6_prog_load(const struct sock_addr_test *test)
 {
-	return connect_prog_load_path(CONNECT6_PROG_PATH, attach_type, comment);
+	return load_path(test, CONNECT6_PROG_PATH);
 }
 
-static void print_ip_port(int sockfd, info_fn fn, const char *fmt)
+static int sendmsg_deny_prog_load(const struct sock_addr_test *test)
 {
-	char addr_buf[INET_NTOP_BUF];
-	struct sockaddr_storage addr;
-	struct sockaddr_in6 *addr6;
-	struct sockaddr_in *addr4;
-	socklen_t addr_len;
-	unsigned short port;
-	void *nip;
+	struct bpf_insn insns[] = {
+		/* return 0 */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
+}
 
-	addr_len = sizeof(struct sockaddr_storage);
-	memset(&addr, 0, addr_len);
+static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test)
+{
+	struct sockaddr_in dst4_rw_addr;
+	struct in_addr src4_rw_ip;
 
-	if (fn(sockfd, (struct sockaddr *)&addr, (socklen_t *)&addr_len) == 0) {
-		if (addr.ss_family == AF_INET) {
-			addr4 = (struct sockaddr_in *)&addr;
-			nip = (void *)&addr4->sin_addr;
-			port = ntohs(addr4->sin_port);
-		} else if (addr.ss_family == AF_INET6) {
-			addr6 = (struct sockaddr_in6 *)&addr;
-			nip = (void *)&addr6->sin6_addr;
-			port = ntohs(addr6->sin6_port);
-		} else {
-			return;
-		}
-		const char *addr_str =
-			inet_ntop(addr.ss_family, nip, addr_buf, INET_NTOP_BUF);
-		printf(fmt, addr_str ? addr_str : "??", port);
+	if (inet_pton(AF_INET, SRC4_REWRITE_IP, (void *)&src4_rw_ip) != 1) {
+		log_err("Invalid IPv4: %s", SRC4_REWRITE_IP);
+		return -1;
 	}
+
+	if (mk_sockaddr(AF_INET, SERV4_REWRITE_IP, SERV4_REWRITE_PORT,
+			(struct sockaddr *)&dst4_rw_addr,
+			sizeof(dst4_rw_addr)) == -1)
+		return -1;
+
+	struct bpf_insn insns[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+		/* if (sk.family == AF_INET && */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, family)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET, 8),
+
+		/*     sk.type == SOCK_DGRAM) { */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, type)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, SOCK_DGRAM, 6),
+
+		/*	msg_src_ip4 = src4_rw_ip */
+		BPF_MOV32_IMM(BPF_REG_7, src4_rw_ip.s_addr),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, msg_src_ip4)),
+
+		/*	user_ip4 = dst4_rw_addr.sin_addr */
+		BPF_MOV32_IMM(BPF_REG_7, dst4_rw_addr.sin_addr.s_addr),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_ip4)),
+
+		/*	user_port = dst4_rw_addr.sin_port */
+		BPF_MOV32_IMM(BPF_REG_7, dst4_rw_addr.sin_port),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_port)),
+		/* } */
+
+		/* return 1 */
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	};
+
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
 }
 
-static void print_local_ip_port(int sockfd, const char *fmt)
+static int sendmsg4_rw_c_prog_load(const struct sock_addr_test *test)
 {
-	print_ip_port(sockfd, getsockname, fmt);
+	return load_path(test, SENDMSG4_PROG_PATH);
 }
 
-static void print_remote_ip_port(int sockfd, const char *fmt)
+static int sendmsg6_rw_dst_asm_prog_load(const struct sock_addr_test *test,
+					 const char *rw_dst_ip)
 {
-	print_ip_port(sockfd, getpeername, fmt);
+	struct sockaddr_in6 dst6_rw_addr;
+	struct in6_addr src6_rw_ip;
+
+	if (inet_pton(AF_INET6, SRC6_REWRITE_IP, (void *)&src6_rw_ip) != 1) {
+		log_err("Invalid IPv6: %s", SRC6_REWRITE_IP);
+		return -1;
+	}
+
+	if (mk_sockaddr(AF_INET6, rw_dst_ip, SERV6_REWRITE_PORT,
+			(struct sockaddr *)&dst6_rw_addr,
+			sizeof(dst6_rw_addr)) == -1)
+		return -1;
+
+	struct bpf_insn insns[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+		/* if (sk.family == AF_INET6) { */
+		BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+			    offsetof(struct bpf_sock_addr, family)),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_7, AF_INET6, 18),
+
+#define STORE_IPV6_WORD_N(DST, SRC, N)					\
+		BPF_MOV32_IMM(BPF_REG_7, SRC[N]),			\
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,		\
+			    offsetof(struct bpf_sock_addr, DST[N]))
+
+#define STORE_IPV6(DST, SRC)						\
+		STORE_IPV6_WORD_N(DST, SRC, 0),				\
+		STORE_IPV6_WORD_N(DST, SRC, 1),				\
+		STORE_IPV6_WORD_N(DST, SRC, 2),				\
+		STORE_IPV6_WORD_N(DST, SRC, 3)
+
+		STORE_IPV6(msg_src_ip6, src6_rw_ip.s6_addr32),
+		STORE_IPV6(user_ip6, dst6_rw_addr.sin6_addr.s6_addr32),
+
+		/*	user_port = dst6_rw_addr.sin6_port */
+		BPF_MOV32_IMM(BPF_REG_7, dst6_rw_addr.sin6_port),
+		BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_7,
+			    offsetof(struct bpf_sock_addr, user_port)),
+
+		/* } */
+
+		/* return 1 */
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	};
+
+	return load_insns(test, insns, sizeof(insns) / sizeof(struct bpf_insn));
+}
+
+static int sendmsg6_rw_asm_prog_load(const struct sock_addr_test *test)
+{
+	return sendmsg6_rw_dst_asm_prog_load(test, SERV6_REWRITE_IP);
+}
+
+static int sendmsg6_rw_v4mapped_prog_load(const struct sock_addr_test *test)
+{
+	return sendmsg6_rw_dst_asm_prog_load(test, SERV6_V4MAPPED_IP);
+}
+
+static int sendmsg6_rw_c_prog_load(const struct sock_addr_test *test)
+{
+	return load_path(test, SENDMSG6_PROG_PATH);
+}
+
+static int cmp_addr(const struct sockaddr_storage *addr1,
+		    const struct sockaddr_storage *addr2, int cmp_port)
+{
+	const struct sockaddr_in *four1, *four2;
+	const struct sockaddr_in6 *six1, *six2;
+
+	if (addr1->ss_family != addr2->ss_family)
+		return -1;
+
+	if (addr1->ss_family == AF_INET) {
+		four1 = (const struct sockaddr_in *)addr1;
+		four2 = (const struct sockaddr_in *)addr2;
+		return !((four1->sin_port == four2->sin_port || !cmp_port) &&
+			 four1->sin_addr.s_addr == four2->sin_addr.s_addr);
+	} else if (addr1->ss_family == AF_INET6) {
+		six1 = (const struct sockaddr_in6 *)addr1;
+		six2 = (const struct sockaddr_in6 *)addr2;
+		return !((six1->sin6_port == six2->sin6_port || !cmp_port) &&
+			 !memcmp(&six1->sin6_addr, &six2->sin6_addr,
+				 sizeof(struct in6_addr)));
+	}
+
+	return -1;
+}
+
+static int cmp_sock_addr(info_fn fn, int sock1,
+			 const struct sockaddr_storage *addr2, int cmp_port)
+{
+	struct sockaddr_storage addr1;
+	socklen_t len1 = sizeof(addr1);
+
+	memset(&addr1, 0, len1);
+	if (fn(sock1, (struct sockaddr *)&addr1, (socklen_t *)&len1) != 0)
+		return -1;
+
+	return cmp_addr(&addr1, addr2, cmp_port);
+}
+
+static int cmp_local_ip(int sock1, const struct sockaddr_storage *addr2)
+{
+	return cmp_sock_addr(getsockname, sock1, addr2, /*cmp_port*/ 0);
+}
+
+static int cmp_local_addr(int sock1, const struct sockaddr_storage *addr2)
+{
+	return cmp_sock_addr(getsockname, sock1, addr2, /*cmp_port*/ 1);
+}
+
+static int cmp_peer_addr(int sock1, const struct sockaddr_storage *addr2)
+{
+	return cmp_sock_addr(getpeername, sock1, addr2, /*cmp_port*/ 1);
 }
 
 static int start_server(int type, const struct sockaddr_storage *addr,
 			socklen_t addr_len)
 {
-
 	int fd;
 
 	fd = socket(addr->ss_family, type, 0);
···
 	}
 	}
 
-	print_local_ip_port(fd, "\t   Actual: bind(%s, %d)\n");
-
 	goto out;
close_out:
 	close(fd);
···
 			socklen_t addr_len)
 {
 	int domain;
-	int fd;
+	int fd = -1;
 
 	domain = addr->ss_family;
 
 	if (domain != AF_INET && domain != AF_INET6) {
 		log_err("Unsupported address family");
-		return -1;
+		goto err;
 	}
 
 	fd = socket(domain, type, 0);
 	if (fd == -1) {
-		log_err("Failed to creating client socket");
-		return -1;
+		log_err("Failed to create client socket");
+		goto err;
 	}
 
 	if (connect(fd, (const struct sockaddr *)addr, addr_len) == -1) {
···
 		goto err;
 	}
 
-	print_remote_ip_port(fd, "\t   Actual: connect(%s, %d)");
-	print_local_ip_port(fd, " from (%s, %d)\n");
+	goto out;
+err:
+	close(fd);
+	fd = -1;
+out:
+	return fd;
+}
+
+int init_pktinfo(int domain, struct cmsghdr *cmsg)
+{
+	struct in6_pktinfo *pktinfo6;
+	struct in_pktinfo *pktinfo4;
+
+	if (domain == AF_INET) {
+		cmsg->cmsg_level = SOL_IP;
+		cmsg->cmsg_type = IP_PKTINFO;
+		cmsg->cmsg_len = CMSG_LEN(sizeof(struct in_pktinfo));
+		pktinfo4 = (struct in_pktinfo *)CMSG_DATA(cmsg);
+		memset(pktinfo4, 0, sizeof(struct in_pktinfo));
+		if (inet_pton(domain, SRC4_IP,
+			      (void *)&pktinfo4->ipi_spec_dst) != 1)
+			return -1;
+	} else if (domain == AF_INET6) {
+		cmsg->cmsg_level = SOL_IPV6;
+		cmsg->cmsg_type = IPV6_PKTINFO;
+		cmsg->cmsg_len = CMSG_LEN(sizeof(struct in6_pktinfo));
+		pktinfo6 = (struct in6_pktinfo *)CMSG_DATA(cmsg);
+		memset(pktinfo6, 0, sizeof(struct in6_pktinfo));
+		if (inet_pton(domain, SRC6_IP,
+			      (void *)&pktinfo6->ipi6_addr) != 1)
+			return -1;
+	} else {
+		return -1;
+	}
+
+	return 0;
+}
+
+static int sendmsg_to_server(const struct sockaddr_storage *addr,
+			     socklen_t addr_len, int set_cmsg, int *syscall_err)
+{
+	union {
+		char buf[CMSG_SPACE(sizeof(struct in6_pktinfo))];
+		struct cmsghdr align;
+	} control6;
+	union {
+		char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
+		struct cmsghdr align;
+	} control4;
+	struct msghdr hdr;
+	struct iovec iov;
+	char data = 'a';
+	int domain;
+	int fd = -1;
+
+	domain = addr->ss_family;
+
+	if (domain != AF_INET && domain != AF_INET6) {
+		log_err("Unsupported address family");
+		goto err;
+	}
+
+	fd = socket(domain, SOCK_DGRAM, 0);
+	if (fd == -1) {
+		log_err("Failed to create client socket");
+		goto err;
+	}
+
+	memset(&iov, 0, sizeof(iov));
+	iov.iov_base = &data;
+	iov.iov_len = sizeof(data);
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_name = (void *)addr;
+	hdr.msg_namelen = addr_len;
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+
+	if (set_cmsg) {
+		if (domain == AF_INET) {
+			hdr.msg_control = &control4;
+			hdr.msg_controllen = sizeof(control4.buf);
+		} else if (domain == AF_INET6) {
+			hdr.msg_control = &control6;
+			hdr.msg_controllen = sizeof(control6.buf);
+		}
+		if (init_pktinfo(domain, CMSG_FIRSTHDR(&hdr))) {
+			log_err("Fail to init pktinfo");
+			goto err;
+		}
+	}
+
+	if (sendmsg(fd, &hdr, 0) != sizeof(data)) {
+		log_err("Fail to send message to server");
+		*syscall_err = errno;
+		goto err;
+	}
+
+	goto out;
+err:
+	close(fd);
+	fd = -1;
+out:
+	return fd;
+}
+
+static int recvmsg_from_client(int sockfd, struct sockaddr_storage *src_addr)
+{
+	struct timeval tv;
+	struct msghdr hdr;
+	struct iovec iov;
+	char data[64];
+	fd_set rfds;
+
+	FD_ZERO(&rfds);
+	FD_SET(sockfd, &rfds);
+
+	tv.tv_sec = 2;
+	tv.tv_usec = 0;
+
+	if (select(sockfd + 1, &rfds, NULL, NULL, &tv) <= 0 ||
+	    !FD_ISSET(sockfd, &rfds))
+		return -1;
+
+	memset(&iov, 0, sizeof(iov));
+	iov.iov_base = data;
+	iov.iov_len = sizeof(data);
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_name = src_addr;
+	hdr.msg_namelen = sizeof(struct sockaddr_storage);
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+
+	return recvmsg(sockfd, &hdr, 0);
+}
+
+static int init_addrs(const struct sock_addr_test *test,
+		      struct sockaddr_storage *requested_addr,
+		      struct sockaddr_storage *expected_addr,
+		      struct sockaddr_storage *expected_src_addr)
+{
+	socklen_t addr_len = sizeof(struct sockaddr_storage);
+
+	if (mk_sockaddr(test->domain, test->expected_ip, test->expected_port,
+			(struct sockaddr *)expected_addr, addr_len) == -1)
+		goto err;
+
+	if (mk_sockaddr(test->domain, test->requested_ip, test->requested_port,
+			(struct sockaddr *)requested_addr, addr_len) == -1)
+		goto err;
+
+	if (test->expected_src_ip &&
+	    mk_sockaddr(test->domain, test->expected_src_ip, 0,
+			(struct sockaddr *)expected_src_addr, addr_len) == -1)
+		goto err;
 
 	return 0;
err:
-	close(fd);
 	return -1;
 }
 
-static
void print_test_case_num(int domain, int type) 557 + static int run_bind_test_case(const struct sock_addr_test *test) 973 558 { 974 - static int test_num; 975 - 976 - printf("Test case #%d (%s/%s):\n", ++test_num, 977 - (domain == AF_INET ? "IPv4" : 978 - domain == AF_INET6 ? "IPv6" : 979 - "unknown_domain"), 980 - (type == SOCK_STREAM ? "TCP" : 981 - type == SOCK_DGRAM ? "UDP" : 982 - "unknown_type")); 983 - } 984 - 985 - static int run_test_case(int domain, int type, const char *ip, 986 - unsigned short port) 987 - { 988 - struct sockaddr_storage addr; 989 - socklen_t addr_len = sizeof(addr); 559 + socklen_t addr_len = sizeof(struct sockaddr_storage); 560 + struct sockaddr_storage requested_addr; 561 + struct sockaddr_storage expected_addr; 562 + int clientfd = -1; 990 563 int servfd = -1; 991 564 int err = 0; 992 565 993 - print_test_case_num(domain, type); 566 + if (init_addrs(test, &requested_addr, &expected_addr, NULL)) 567 + goto err; 994 568 995 - if (mk_sockaddr(domain, ip, port, (struct sockaddr *)&addr, 996 - addr_len) == -1) 997 - return -1; 998 - 999 - printf("\tRequested: bind(%s, %d) ..\n", ip, port); 1000 - servfd = start_server(type, &addr, addr_len); 569 + servfd = start_server(test->type, &requested_addr, addr_len); 1001 570 if (servfd == -1) 1002 571 goto err; 1003 572 1004 - printf("\tRequested: connect(%s, %d) from (*, *) ..\n", ip, port); 1005 - if (connect_to_server(type, &addr, addr_len)) 573 + if (cmp_local_addr(servfd, &expected_addr)) 574 + goto err; 575 + 576 + /* Try to connect to server just in case */ 577 + clientfd = connect_to_server(test->type, &expected_addr, addr_len); 578 + if (clientfd == -1) 1006 579 goto err; 1007 580 1008 581 goto out; 1009 582 err: 1010 583 err = -1; 1011 584 out: 585 + close(clientfd); 1012 586 close(servfd); 1013 587 return err; 1014 588 } 1015 589 1016 - static void close_progs_fds(struct program *progs, size_t prog_cnt) 590 + static int run_connect_test_case(const struct sock_addr_test *test) 1017 591 
{ 1018 - size_t i; 1019 - 1020 - for (i = 0; i < prog_cnt; ++i) { 1021 - close(progs[i].fd); 1022 - progs[i].fd = -1; 1023 - } 1024 - } 1025 - 1026 - static int load_and_attach_progs(int cgfd, struct program *progs, 1027 - size_t prog_cnt) 1028 - { 1029 - size_t i; 1030 - 1031 - for (i = 0; i < prog_cnt; ++i) { 1032 - printf("Load %s with invalid type (can pollute stderr) ", 1033 - progs[i].name); 1034 - fflush(stdout); 1035 - progs[i].fd = progs[i].loadfn(progs[i].invalid_type, NULL); 1036 - if (progs[i].fd != -1) { 1037 - log_err("Load with invalid type accepted for %s", 1038 - progs[i].name); 1039 - goto err; 1040 - } 1041 - printf("... REJECTED\n"); 1042 - 1043 - printf("Load %s with valid type", progs[i].name); 1044 - progs[i].fd = progs[i].loadfn(progs[i].type, progs[i].name); 1045 - if (progs[i].fd == -1) { 1046 - log_err("Failed to load program %s", progs[i].name); 1047 - goto err; 1048 - } 1049 - printf(" ... OK\n"); 1050 - 1051 - printf("Attach %s with invalid type", progs[i].name); 1052 - if (bpf_prog_attach(progs[i].fd, cgfd, progs[i].invalid_type, 1053 - BPF_F_ALLOW_OVERRIDE) != -1) { 1054 - log_err("Attach with invalid type accepted for %s", 1055 - progs[i].name); 1056 - goto err; 1057 - } 1058 - printf(" ... REJECTED\n"); 1059 - 1060 - printf("Attach %s with valid type", progs[i].name); 1061 - if (bpf_prog_attach(progs[i].fd, cgfd, progs[i].type, 1062 - BPF_F_ALLOW_OVERRIDE) == -1) { 1063 - log_err("Failed to attach program %s", progs[i].name); 1064 - goto err; 1065 - } 1066 - printf(" ... 
OK\n"); 1067 - } 1068 - 1069 - return 0; 1070 - err: 1071 - close_progs_fds(progs, prog_cnt); 1072 - return -1; 1073 - } 1074 - 1075 - static int run_domain_test(int domain, int cgfd, struct program *progs, 1076 - size_t prog_cnt, const char *ip, unsigned short port) 1077 - { 592 + socklen_t addr_len = sizeof(struct sockaddr_storage); 593 + struct sockaddr_storage expected_src_addr; 594 + struct sockaddr_storage requested_addr; 595 + struct sockaddr_storage expected_addr; 596 + int clientfd = -1; 597 + int servfd = -1; 1078 598 int err = 0; 1079 599 1080 - if (load_and_attach_progs(cgfd, progs, prog_cnt) == -1) 600 + if (init_addrs(test, &requested_addr, &expected_addr, 601 + &expected_src_addr)) 1081 602 goto err; 1082 603 1083 - if (run_test_case(domain, SOCK_STREAM, ip, port) == -1) 604 + /* Prepare server to connect to */ 605 + servfd = start_server(test->type, &expected_addr, addr_len); 606 + if (servfd == -1) 1084 607 goto err; 1085 608 1086 - if (run_test_case(domain, SOCK_DGRAM, ip, port) == -1) 609 + clientfd = connect_to_server(test->type, &requested_addr, addr_len); 610 + if (clientfd == -1) 611 + goto err; 612 + 613 + /* Make sure src and dst addrs were overridden properly */ 614 + if (cmp_peer_addr(clientfd, &expected_addr)) 615 + goto err; 616 + 617 + if (cmp_local_ip(clientfd, &expected_src_addr)) 1087 618 goto err; 1088 619 1089 620 goto out; 1090 621 err: 1091 622 err = -1; 1092 623 out: 1093 - close_progs_fds(progs, prog_cnt); 624 + close(clientfd); 625 + close(servfd); 1094 626 return err; 1095 627 } 1096 628 1097 - static int run_test(void) 629 + static int run_sendmsg_test_case(const struct sock_addr_test *test) 1098 630 { 1099 - size_t inet6_prog_cnt; 1100 - size_t inet_prog_cnt; 631 + socklen_t addr_len = sizeof(struct sockaddr_storage); 632 + struct sockaddr_storage expected_src_addr; 633 + struct sockaddr_storage requested_addr; 634 + struct sockaddr_storage expected_addr; 635 + struct sockaddr_storage real_src_addr; 636 + int clientfd = 
-1; 637 + int servfd = -1; 638 + int set_cmsg; 639 + int err = 0; 640 + 641 + if (test->type != SOCK_DGRAM) 642 + goto err; 643 + 644 + if (init_addrs(test, &requested_addr, &expected_addr, 645 + &expected_src_addr)) 646 + goto err; 647 + 648 + /* Prepare server to sendmsg to */ 649 + servfd = start_server(test->type, &expected_addr, addr_len); 650 + if (servfd == -1) 651 + goto err; 652 + 653 + for (set_cmsg = 0; set_cmsg <= 1; ++set_cmsg) { 654 + if (clientfd >= 0) 655 + close(clientfd); 656 + 657 + clientfd = sendmsg_to_server(&requested_addr, addr_len, 658 + set_cmsg, &err); 659 + if (err) 660 + goto out; 661 + else if (clientfd == -1) 662 + goto err; 663 + 664 + /* Try to receive message on server instead of using 665 + * getpeername(2) on client socket, to check that client's 666 + * destination address was rewritten properly, since 667 + * getpeername(2) doesn't work with unconnected datagram 668 + * sockets. 669 + * 670 + * Get source address from recvmsg(2) as well to make sure 671 + * source was rewritten properly: getsockname(2) can't be used 672 + * since socket is unconnected and source defined for one 673 + * specific packet may differ from the one used by default and 674 + * returned by getsockname(2). 675 + */ 676 + if (recvmsg_from_client(servfd, &real_src_addr) == -1) 677 + goto err; 678 + 679 + if (cmp_addr(&real_src_addr, &expected_src_addr, /*cmp_port*/0)) 680 + goto err; 681 + } 682 + 683 + goto out; 684 + err: 685 + err = -1; 686 + out: 687 + close(clientfd); 688 + close(servfd); 689 + return err; 690 + } 691 + 692 + static int run_test_case(int cgfd, const struct sock_addr_test *test) 693 + { 694 + int progfd = -1; 695 + int err = 0; 696 + 697 + printf("Test case: %s .. 
", test->descr); 698 + 699 + progfd = test->loadfn(test); 700 + if (test->expected_result == LOAD_REJECT && progfd < 0) 701 + goto out; 702 + else if (test->expected_result == LOAD_REJECT || progfd < 0) 703 + goto err; 704 + 705 + err = bpf_prog_attach(progfd, cgfd, test->attach_type, 706 + BPF_F_ALLOW_OVERRIDE); 707 + if (test->expected_result == ATTACH_REJECT && err) { 708 + err = 0; /* error was expected, reset it */ 709 + goto out; 710 + } else if (test->expected_result == ATTACH_REJECT || err) { 711 + goto err; 712 + } 713 + 714 + switch (test->attach_type) { 715 + case BPF_CGROUP_INET4_BIND: 716 + case BPF_CGROUP_INET6_BIND: 717 + err = run_bind_test_case(test); 718 + break; 719 + case BPF_CGROUP_INET4_CONNECT: 720 + case BPF_CGROUP_INET6_CONNECT: 721 + err = run_connect_test_case(test); 722 + break; 723 + case BPF_CGROUP_UDP4_SENDMSG: 724 + case BPF_CGROUP_UDP6_SENDMSG: 725 + err = run_sendmsg_test_case(test); 726 + break; 727 + default: 728 + goto err; 729 + } 730 + 731 + if (test->expected_result == SYSCALL_EPERM && err == EPERM) { 732 + err = 0; /* error was expected, reset it */ 733 + goto out; 734 + } 735 + 736 + if (test->expected_result == SYSCALL_ENOTSUPP && err == ENOTSUPP) { 737 + err = 0; /* error was expected, reset it */ 738 + goto out; 739 + } 740 + 741 + if (err || test->expected_result != SUCCESS) 742 + goto err; 743 + 744 + goto out; 745 + err: 746 + err = -1; 747 + out: 748 + /* Detaching w/o checking return code: best effort attempt. */ 749 + if (progfd != -1) 750 + bpf_prog_detach(cgfd, test->attach_type); 751 + close(progfd); 752 + printf("[%s]\n", err ? "FAIL" : "PASS"); 753 + return err; 754 + } 755 + 756 + static int run_tests(int cgfd) 757 + { 758 + int passes = 0; 759 + int fails = 0; 760 + int i; 761 + 762 + for (i = 0; i < ARRAY_SIZE(tests); ++i) { 763 + if (run_test_case(cgfd, &tests[i])) 764 + ++fails; 765 + else 766 + ++passes; 767 + } 768 + printf("Summary: %d PASSED, %d FAILED\n", passes, fails); 769 + return fails ? 
-1 : 0; 770 + } 771 + 772 + int main(int argc, char **argv) 773 + { 1101 774 int cgfd = -1; 1102 775 int err = 0; 1103 776 1104 - struct program inet6_progs[] = { 1105 - {BPF_CGROUP_INET6_BIND, bind6_prog_load, -1, "bind6", 1106 - BPF_CGROUP_INET4_BIND}, 1107 - {BPF_CGROUP_INET6_CONNECT, connect6_prog_load, -1, "connect6", 1108 - BPF_CGROUP_INET4_CONNECT}, 1109 - }; 1110 - inet6_prog_cnt = sizeof(inet6_progs) / sizeof(struct program); 1111 - 1112 - struct program inet_progs[] = { 1113 - {BPF_CGROUP_INET4_BIND, bind4_prog_load, -1, "bind4", 1114 - BPF_CGROUP_INET6_BIND}, 1115 - {BPF_CGROUP_INET4_CONNECT, connect4_prog_load, -1, "connect4", 1116 - BPF_CGROUP_INET6_CONNECT}, 1117 - }; 1118 - inet_prog_cnt = sizeof(inet_progs) / sizeof(struct program); 777 + if (argc < 2) { 778 + fprintf(stderr, 779 + "%s has to be run via %s.sh. Skip direct run.\n", 780 + argv[0], argv[0]); 781 + exit(err); 782 + } 1119 783 1120 784 if (setup_cgroup_environment()) 1121 785 goto err; ··· 1359 559 if (join_cgroup(CG_PATH)) 1360 560 goto err; 1361 561 1362 - if (run_domain_test(AF_INET, cgfd, inet_progs, inet_prog_cnt, SERV4_IP, 1363 - SERV4_PORT) == -1) 1364 - goto err; 1365 - 1366 - if (run_domain_test(AF_INET6, cgfd, inet6_progs, inet6_prog_cnt, 1367 - SERV6_IP, SERV6_PORT) == -1) 562 + if (run_tests(cgfd)) 1368 563 goto err; 1369 564 1370 565 goto out; ··· 1368 573 out: 1369 574 close(cgfd); 1370 575 cleanup_cgroup_environment(); 1371 - printf(err ? "### FAIL\n" : "### SUCCESS\n"); 1372 576 return err; 1373 - } 1374 - 1375 - int main(int argc, char **argv) 1376 - { 1377 - if (argc < 2) { 1378 - fprintf(stderr, 1379 - "%s has to be run via %s.sh. Skip direct run.\n", 1380 - argv[0], argv[0]); 1381 - exit(0); 1382 - } 1383 - return run_test(); 1384 577 }
+67 -20
tools/testing/selftests/bpf/test_sockmap.c
···
 	int fd_flags = O_NONBLOCK;
 	struct timeval timeout;
 	float total_bytes;
+	int bytes_cnt = 0;
+	int chunk_sz;
 	fd_set w;
+
+	if (opt->sendpage)
+		chunk_sz = iov_length * cnt;
+	else
+		chunk_sz = iov_length * iov_count;
 
 	fcntl(fd, fd_flags);
 	total_bytes = (float)iov_count * (float)iov_length * (float)cnt;
···
 	if (err < 0)
 		perror("recv start time: ");
 	while (s->bytes_recvd < total_bytes) {
-		timeout.tv_sec = 0;
-		timeout.tv_usec = 10;
+		if (txmsg_cork) {
+			timeout.tv_sec = 0;
+			timeout.tv_usec = 1000;
+		} else {
+			timeout.tv_sec = 1;
+			timeout.tv_usec = 0;
+		}
 
 		/* FD sets */
 		FD_ZERO(&w);
···
 				errno = -EIO;
 				fprintf(stderr,
 					"detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
-					i, j, d[j], k - 1, d[j+1], k + 1);
+					i, j, d[j], k - 1, d[j+1], k);
 				goto out_errno;
+			}
+			bytes_cnt++;
+			if (bytes_cnt == chunk_sz) {
+				k = 0;
+				bytes_cnt = 0;
 			}
 			recv--;
 		}
···
 	struct msg_stats s = {0};
 	int iov_count = opt->iov_count;
 	int iov_buf = opt->iov_length;
+	int rx_status, tx_status;
 	int cnt = opt->rate;
-	int status;
 
 	errno = 0;
···
 	rxpid = fork();
 	if (rxpid == 0) {
 		if (opt->drop_expected)
-			exit(1);
+			exit(0);
 
 		if (opt->sendpage)
 			iov_count = 1;
···
 			"rx_sendmsg: TX: %zuB %fB/s %fGB/s RX: %zuB %fB/s %fGB/s\n",
 			s.bytes_sent, sent_Bps, sent_Bps/giga,
 			s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-		exit(1);
+		if (err && txmsg_cork)
+			err = 0;
+		exit(err ? 1 : 0);
 	} else if (rxpid == -1) {
 		perror("msg_loop_rx: ");
 		return errno;
···
 			"tx_sendmsg: TX: %zuB %fB/s %f GB/s RX: %zuB %fB/s %fGB/s\n",
 			s.bytes_sent, sent_Bps, sent_Bps/giga,
 			s.bytes_recvd, recvd_Bps, recvd_Bps/giga);
-		exit(1);
+		exit(err ? 1 : 0);
 	} else if (txpid == -1) {
 		perror("msg_loop_tx: ");
 		return errno;
 	}
 
-	assert(waitpid(rxpid, &status, 0) == rxpid);
-	assert(waitpid(txpid, &status, 0) == txpid);
+	assert(waitpid(rxpid, &rx_status, 0) == rxpid);
+	assert(waitpid(txpid, &tx_status, 0) == txpid);
+	if (WIFEXITED(rx_status)) {
+		err = WEXITSTATUS(rx_status);
+		if (err) {
+			fprintf(stderr, "rx thread exited with err %d. ", err);
+			goto out;
+		}
+	}
+	if (WIFEXITED(tx_status)) {
+		err = WEXITSTATUS(tx_status);
+		if (err)
+			fprintf(stderr, "tx thread exited with err %d. ", err);
+	}
+out:
 	return err;
 }
···
 #define OPTSTRING 60
 static void test_options(char *options)
 {
+	char tstr[OPTSTRING];
+
 	memset(options, 0, OPTSTRING);
 
 	if (txmsg_pass)
···
 		strncat(options, "redir_noisy,", OPTSTRING);
 	if (txmsg_drop)
 		strncat(options, "drop,", OPTSTRING);
-	if (txmsg_apply)
-		strncat(options, "apply,", OPTSTRING);
-	if (txmsg_cork)
-		strncat(options, "cork,", OPTSTRING);
-	if (txmsg_start)
-		strncat(options, "start,", OPTSTRING);
-	if (txmsg_end)
-		strncat(options, "end,", OPTSTRING);
+	if (txmsg_apply) {
+		snprintf(tstr, OPTSTRING, "apply %d,", txmsg_apply);
+		strncat(options, tstr, OPTSTRING);
+	}
+	if (txmsg_cork) {
+		snprintf(tstr, OPTSTRING, "cork %d,", txmsg_cork);
+		strncat(options, tstr, OPTSTRING);
+	}
+	if (txmsg_start) {
+		snprintf(tstr, OPTSTRING, "start %d,", txmsg_start);
+		strncat(options, tstr, OPTSTRING);
+	}
+	if (txmsg_end) {
+		snprintf(tstr, OPTSTRING, "end %d,", txmsg_end);
+		strncat(options, tstr, OPTSTRING);
+	}
 	if (txmsg_ingress)
 		strncat(options, "ingress,", OPTSTRING);
 	if (txmsg_skb)
···
 
 static int __test_exec(int cgrp, int test, struct sockmap_options *opt)
 {
-	char *options = calloc(60, sizeof(char));
+	char *options = calloc(OPTSTRING, sizeof(char));
 	int err;
 
 	if (test == SENDPAGE)
···
 
 	opt->iov_length = 1;
 	opt->iov_count = 1;
-	opt->rate = 1024;
+	opt->rate = 512;
 	err = test_exec(cgrp, opt);
 	if (err)
 		goto out;
 
 	opt->iov_length = 256;
 	opt->iov_count = 1024;
-	opt->rate = 10;
+	opt->rate = 2;
 	err = test_exec(cgrp, opt);
 	if (err)
 		goto out;
···
 			"ERROR: (%i) open cg path failed: %s\n",
 			cg_fd, optarg);
 		return cg_fd;
+	}
+
+	if (join_cgroup(CG_PATH)) {
+		fprintf(stderr, "ERROR: failed to join cgroup\n");
+		return -EINVAL;
 	}
 
 	/* Tests basic commands and APIs with range of iov values */
+158 -27
tools/testing/selftests/bpf/test_verifier.c
···
 
 #define MAX_INSNS	BPF_MAXINSNS
 #define MAX_FIXUPS	8
-#define MAX_NR_MAPS	4
+#define MAX_NR_MAPS	7
 #define POINTER_VALUE	0xcafe4all
 #define TEST_DATA_LEN	64
···
 	int fixup_map1[MAX_FIXUPS];
 	int fixup_map2[MAX_FIXUPS];
 	int fixup_map3[MAX_FIXUPS];
-	int fixup_prog[MAX_FIXUPS];
+	int fixup_map4[MAX_FIXUPS];
+	int fixup_prog1[MAX_FIXUPS];
+	int fixup_prog2[MAX_FIXUPS];
 	int fixup_map_in_map[MAX_FIXUPS];
 	const char *errstr;
 	const char *errstr_unpriv;
···
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_prog = { 1 },
+	.fixup_prog1 = { 1 },
 	.errstr_unpriv = "R3 leaks addr into helper",
 	.result_unpriv = REJECT,
 	.result = ACCEPT,
···
 	BPF_MOV64_IMM(BPF_REG_0, 1),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_prog = { 1 },
+	.fixup_prog1 = { 1 },
 	.result = ACCEPT,
 	.retval = 42,
 },
···
 	BPF_MOV64_IMM(BPF_REG_0, 1),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_prog = { 1 },
+	.fixup_prog1 = { 1 },
 	.result = ACCEPT,
 	.retval = 41,
 },
···
 	BPF_MOV64_IMM(BPF_REG_0, 1),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_prog = { 1 },
+	.fixup_prog1 = { 1 },
 	.result = ACCEPT,
 	.retval = 1,
 },
···
 	BPF_MOV64_IMM(BPF_REG_0, 2),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_prog = { 1 },
+	.fixup_prog1 = { 1 },
 	.result = ACCEPT,
 	.retval = 2,
 },
···
 	BPF_MOV64_IMM(BPF_REG_0, 2),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_prog = { 1 },
+	.fixup_prog1 = { 1 },
 	.result = ACCEPT,
 	.retval = 2,
 },
···
 	BPF_MOV64_IMM(BPF_REG_0, 2),
 	BPF_EXIT_INSN(),
 	},
-	.fixup_prog = { 2 },
+	.fixup_prog1 = { 2 },
 	.result = ACCEPT,
 	.retval = 42,
 },
···
 	.prog_type = BPF_PROG_TYPE_XDP,
 },
 {
+	"calls: two calls returning different map pointers for lookup (hash, array)",
+	.insns = {
+	/* main prog */
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+	BPF_CALL_REL(11),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+	BPF_CALL_REL(12),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0,
+		   offsetof(struct test_val, foo)),
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_EXIT_INSN(),
+	/* subprog 1 */
+	BPF_LD_MAP_FD(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	/* subprog 2 */
+	BPF_LD_MAP_FD(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map2 = { 13 },
+	.fixup_map4 = { 16 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"calls: two calls returning different map pointers for lookup (hash, map in map)",
+	.insns = {
+	/* main prog */
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+	BPF_CALL_REL(11),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+	BPF_CALL_REL(12),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0,
+		   offsetof(struct test_val, foo)),
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_EXIT_INSN(),
+	/* subprog 1 */
+	BPF_LD_MAP_FD(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	/* subprog 2 */
+	BPF_LD_MAP_FD(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_in_map = { 16 },
+	.fixup_map4 = { 13 },
+	.result = REJECT,
+	.errstr = "R0 invalid mem access 'map_ptr'",
+},
+{
+	"cond: two branches returning different map pointers for lookup (tail, tail)",
+	.insns = {
+	BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+		    offsetof(struct __sk_buff, mark)),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_6, 0, 3),
+	BPF_LD_MAP_FD(BPF_REG_2, 0),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 2),
+	BPF_LD_MAP_FD(BPF_REG_2, 0),
+	BPF_MOV64_IMM(BPF_REG_3, 7),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_tail_call),
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_prog1 = { 5 },
+	.fixup_prog2 = { 2 },
+	.result_unpriv = REJECT,
+	.errstr_unpriv = "tail_call abusing map_ptr",
+	.result = ACCEPT,
+	.retval = 42,
+},
+{
+	"cond: two branches returning same map pointers for lookup (tail, tail)",
+	.insns = {
+	BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+		    offsetof(struct __sk_buff, mark)),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 3),
+	BPF_LD_MAP_FD(BPF_REG_2, 0),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 2),
+	BPF_LD_MAP_FD(BPF_REG_2, 0),
+	BPF_MOV64_IMM(BPF_REG_3, 7),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_tail_call),
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_prog2 = { 2, 5 },
+	.result_unpriv = ACCEPT,
+	.result = ACCEPT,
+	.retval = 42,
+},
+{
 	"search pruning: all branches should be verified (nop operation)",
 	.insns = {
 	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
···
 	return len + 1;
 }
 
-static int create_map(uint32_t size_value, uint32_t max_elem)
+static int create_map(uint32_t type, uint32_t size_key,
+		      uint32_t size_value, uint32_t max_elem)
 {
 	int fd;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(long long),
-			    size_value, max_elem, BPF_F_NO_PREALLOC);
+	fd = bpf_create_map(type, size_key, size_value, max_elem,
+			    type == BPF_MAP_TYPE_HASH ? BPF_F_NO_PREALLOC : 0);
 	if (fd < 0)
 		printf("Failed to create hash map '%s'!\n", strerror(errno));
···
 			ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
 }
 
-static int create_prog_array(void)
+static int create_prog_array(uint32_t max_elem, int p1key)
 {
-	int p1key = 0, p2key = 1;
+	int p2key = 1;
 	int mfd, p1fd, p2fd;
 
 	mfd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, sizeof(int),
-			     sizeof(int), 4, 0);
+			     sizeof(int), max_elem, 0);
 	if (mfd < 0) {
 		printf("Failed to create prog array '%s'!\n", strerror(errno));
 		return -1;
···
 	int *fixup_map1 = test->fixup_map1;
 	int *fixup_map2 = test->fixup_map2;
 	int *fixup_map3 = test->fixup_map3;
-	int *fixup_prog = test->fixup_prog;
+	int *fixup_map4 = test->fixup_map4;
+	int *fixup_prog1 = test->fixup_prog1;
+	int *fixup_prog2 = test->fixup_prog2;
 	int *fixup_map_in_map = test->fixup_map_in_map;
 
 	if (test->fill_helper)
···
 	 * that really matters is value size in this case.
 	 */
 	if (*fixup_map1) {
-		map_fds[0] = create_map(sizeof(long long), 1);
+		map_fds[0] = create_map(BPF_MAP_TYPE_HASH, sizeof(long long),
+					sizeof(long long), 1);
 		do {
 			prog[*fixup_map1].imm = map_fds[0];
 			fixup_map1++;
 		} while (*fixup_map1);
 	}
 
 	if (*fixup_map2) {
-		map_fds[1] = create_map(sizeof(struct test_val), 1);
+		map_fds[1] = create_map(BPF_MAP_TYPE_HASH, sizeof(long long),
+					sizeof(struct test_val), 1);
 		do {
 			prog[*fixup_map2].imm = map_fds[1];
 			fixup_map2++;
 		} while (*fixup_map2);
 	}
 
 	if (*fixup_map3) {
-		map_fds[1] = create_map(sizeof(struct other_val), 1);
+		map_fds[2] = create_map(BPF_MAP_TYPE_HASH, sizeof(long long),
+					sizeof(struct other_val), 1);
 		do {
-			prog[*fixup_map3].imm = map_fds[1];
+			prog[*fixup_map3].imm = map_fds[2];
 			fixup_map3++;
 		} while (*fixup_map3);
 	}
 
-	if (*fixup_prog) {
-		map_fds[2] = create_prog_array();
+	if (*fixup_map4) {
+		map_fds[3] = create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
+					sizeof(struct test_val), 1);
 		do {
-			prog[*fixup_prog].imm = map_fds[2];
-			fixup_prog++;
-		} while (*fixup_prog);
+			prog[*fixup_map4].imm = map_fds[3];
+			fixup_map4++;
+		} while (*fixup_map4);
+	}
+
+	if (*fixup_prog1) {
+		map_fds[4] = create_prog_array(4, 0);
+		do {
+			prog[*fixup_prog1].imm = map_fds[4];
+			fixup_prog1++;
+		} while (*fixup_prog1);
+	}
+
+	if (*fixup_prog2) {
+		map_fds[5] = create_prog_array(8, 7);
+		do {
+			prog[*fixup_prog2].imm = map_fds[5];
+			fixup_prog2++;
+		} while (*fixup_prog2);
 	}
 
 	if (*fixup_map_in_map) {
-		map_fds[3] = create_map_in_map();
+		map_fds[6] = create_map_in_map();
 		do {
-			prog[*fixup_map_in_map].imm = map_fds[3];
+			prog[*fixup_map_in_map].imm = map_fds[6];
 			fixup_map_in_map++;
 		} while (*fixup_map_in_map);
 	}