
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2022-07-22

We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).

The main changes are:

1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.

2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
syscalls through kprobe mechanism, from Andrii Nakryiko.

3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
function, from Song Liu & Jiri Olsa.

4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi.

5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
with kernel symbols, from Alan Maguire.

6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.

7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.

8) libbpf support for writing custom perf event readers, from Jon Doron.

9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.

10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.

11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.

12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.

13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.

14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.

15) Make non-prealloced BPF map allocations low priority to play better with
memcg limits, from Yafang Shao.

16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.

17) Various smaller cleanups and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
bpf: Simplify bpf_prog_pack_[size|mask]
bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
bpf, x64: Allow to use caller address from stack
ftrace: Allow IPMODIFY and DIRECT ops on the same function
ftrace: Add modify_ftrace_direct_multi_nolock
bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
selftests/bpf: Fix test_verifier failed test in unprivileged mode
selftests/bpf: Add negative tests for new nf_conntrack kfuncs
selftests/bpf: Add tests for new nf_conntrack kfuncs
selftests/bpf: Add verifier tests for trusted kfunc args
net: netfilter: Add kfuncs to set and change CT status
net: netfilter: Add kfuncs to set and change CT timeout
net: netfilter: Add kfuncs to allocate and insert CT
net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
bpf: Add documentation for kfuncs
bpf: Add support for forcing kfunc args to be trusted
bpf: Switch to new kfunc flags infrastructure
tools/resolve_btfids: Add support for 8-byte BTF sets
bpf: Introduce 8-byte BTF set
...
====================

Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+3461 -863
+5 -1
Documentation/bpf/btf.rst
··· 369 369 * ``name_off``: offset to a valid C identifier 370 370 * ``info.kind_flag``: 0 371 371 * ``info.kind``: BTF_KIND_FUNC 372 - * ``info.vlen``: 0 372 + * ``info.vlen``: linkage information (BTF_FUNC_STATIC, BTF_FUNC_GLOBAL 373 + or BTF_FUNC_EXTERN) 373 374 * ``type``: a BTF_KIND_FUNC_PROTO type 374 375 375 376 No additional type data follow ``btf_type``. ··· 380 379 type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the 381 380 :ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load` 382 381 (ABI). 382 + 383 + Currently, only linkage values of BTF_FUNC_STATIC and BTF_FUNC_GLOBAL are 384 + supported in the kernel. 383 385 384 386 2.2.13 BTF_KIND_FUNC_PROTO 385 387 ~~~~~~~~~~~~~~~~~~~~~~~~~~
+1
Documentation/bpf/index.rst
··· 19 19 faq 20 20 syscall_api 21 21 helpers 22 + kfuncs 22 23 programs 23 24 maps 24 25 bpf_prog_run
+170
Documentation/bpf/kfuncs.rst
··· 1 + ============================= 2 + BPF Kernel Functions (kfuncs) 3 + ============================= 4 + 5 + 1. Introduction 6 + =============== 7 + 8 + BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux 9 + kernel which are exposed for use by BPF programs. Unlike normal BPF helpers, 10 + kfuncs do not have a stable interface and can change from one kernel release to 11 + another. Hence, BPF programs need to be updated in response to changes in the 12 + kernel. 13 + 14 + 2. Defining a kfunc 15 + =================== 16 + 17 + There are two ways to expose a kernel function to BPF programs: either make an 18 + existing function in the kernel visible, or add a new wrapper for BPF. In both 19 + cases, care must be taken that a BPF program can only call such a function in a 20 + valid context. To enforce this, visibility of a kfunc can be per program type. 21 + 22 + If you are not creating a BPF wrapper for an existing kernel function, skip ahead 23 + to :ref:`BPF_kfunc_nodef`. 24 + 25 + 2.1 Creating a wrapper kfunc 26 + ---------------------------- 27 + 28 + When defining a wrapper kfunc, the wrapper function should have extern linkage. 29 + This prevents the compiler from optimizing away dead code, as this wrapper kfunc 30 + is not invoked anywhere in the kernel itself. It is not necessary to provide a 31 + prototype in a header for the wrapper kfunc. 32 + 33 + An example is given below:: 34 + 35 + /* Disables missing prototype warnings */ 36 + __diag_push(); 37 + __diag_ignore_all("-Wmissing-prototypes", 38 + "Global kfuncs as their definitions will be in BTF"); 39 + 40 + struct task_struct *bpf_find_get_task_by_vpid(pid_t nr) 41 + { 42 + return find_get_task_by_vpid(nr); 43 + } 44 + 45 + __diag_pop(); 46 + 47 + A wrapper kfunc is often needed when we need to annotate parameters of the 48 + kfunc. Otherwise one may directly make the kfunc visible to the BPF program by 49 + registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`. 
50 + 51 + 2.2 Annotating kfunc parameters 52 + ------------------------------- 53 + 54 + Similar to BPF helpers, there is sometimes a need for additional context required 55 + by the verifier to make the usage of kernel functions safer and more useful. 56 + Hence, we can annotate a parameter by suffixing the name of the argument of the 57 + kfunc with a __tag, where tag may be one of the supported annotations. 58 + 59 + 2.2.1 __sz Annotation 60 + --------------------- 61 + 62 + This annotation is used to indicate a memory and size pair in the argument list. 63 + An example is given below:: 64 + 65 + void bpf_memzero(void *mem, int mem__sz) 66 + { 67 + ... 68 + } 69 + 70 + Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second 71 + argument as its size. By default, without __sz annotation, the size of the type 72 + of the pointer is used. Without __sz annotation, a kfunc cannot accept a void 73 + pointer. 74 + 75 + .. _BPF_kfunc_nodef: 76 + 77 + 2.3 Using an existing kernel function 78 + ------------------------------------- 79 + 80 + When an existing function in the kernel is fit for consumption by BPF programs, 81 + it can be directly registered with the BPF subsystem. However, care must still 82 + be taken to review the context in which it will be invoked by the BPF program 83 + and whether it is safe to do so. 84 + 85 + 2.4 Annotating kfuncs 86 + --------------------- 87 + 88 + In addition to kfuncs' arguments, the verifier may need more information about the 89 + type of kfunc(s) being registered with the BPF subsystem. To do so, we define 90 + flags on a set of kfuncs as follows:: 91 + 92 + BTF_SET8_START(bpf_task_set) 93 + BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL) 94 + BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE) 95 + BTF_SET8_END(bpf_task_set) 96 + 97 + This set encodes the BTF ID of each kfunc listed above, and encodes the flags 98 + along with it. Of course, it is also allowed to specify no flags. 
99 + 100 + 2.4.1 KF_ACQUIRE flag 101 + --------------------- 102 + 103 + The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a 104 + refcounted object. The verifier will then ensure that the pointer to the object 105 + is eventually released using a release kfunc, or transferred to a map using a 106 + referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier fails the 107 + loading of the BPF program until no lingering references remain in all possible 108 + explored states of the program. 109 + 110 + 2.4.2 KF_RET_NULL flag 111 + ---------------------- 112 + 113 + The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc 114 + may be NULL. Hence, it forces the user to do a NULL check on the pointer 115 + returned from the kfunc before making use of it (dereferencing or passing to 116 + another helper). This flag is often used in pairing with KF_ACQUIRE flag, but 117 + both are orthogonal to each other. 118 + 119 + 2.4.3 KF_RELEASE flag 120 + --------------------- 121 + 122 + The KF_RELEASE flag is used to indicate that the kfunc releases the pointer 123 + passed in to it. There can be only one referenced pointer that can be passed in. 124 + All copies of the pointer being released are invalidated as a result of invoking 125 + kfunc with this flag. 126 + 127 + 2.4.4 KF_KPTR_GET flag 128 + ---------------------- 129 + 130 + The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument 131 + as a pointer to kptr, safely increments the refcount of the object it points to, 132 + and returns a reference to the user. The rest of the arguments may be normal 133 + arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with 134 + KF_ACQUIRE and KF_RET_NULL flags. 135 + 136 + 2.4.5 KF_TRUSTED_ARGS flag 137 + -------------------------- 138 + 139 + The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. 
It 140 + indicates that all pointer arguments will always be refcounted, and have 141 + their offset set to 0. It can be used to enforce that a pointer to a refcounted 142 + object acquired from a kfunc or BPF helper is passed as an argument to this 143 + kfunc without any modifications (e.g. pointer arithmetic) such that it is 144 + trusted and points to the original object. This flag is often used for kfuncs 145 + that operate (change some property, perform some operation) on an object that 146 + was obtained using an acquire kfunc. Such kfuncs need an unchanged pointer to 147 + ensure the integrity of the operation being performed on the expected object. 148 + 149 + 2.5 Registering the kfuncs 150 + -------------------------- 151 + 152 + Once the kfunc is prepared for use, the final step to making it visible is 153 + registering it with the BPF subsystem. Registration is done per BPF program 154 + type. An example is shown below:: 155 + 156 + BTF_SET8_START(bpf_task_set) 157 + BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL) 158 + BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE) 159 + BTF_SET8_END(bpf_task_set) 160 + 161 + static const struct btf_kfunc_id_set bpf_task_kfunc_set = { 162 + .owner = THIS_MODULE, 163 + .set = &bpf_task_set, 164 + }; 165 + 166 + static int init_subsystem(void) 167 + { 168 + return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set); 169 + } 170 + late_initcall(init_subsystem);
+185
Documentation/bpf/map_hash.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0-only 2 + .. Copyright (C) 2022 Red Hat, Inc. 3 + 4 + =============================================== 5 + BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants 6 + =============================================== 7 + 8 + .. note:: 9 + - ``BPF_MAP_TYPE_HASH`` was introduced in kernel version 3.19 10 + - ``BPF_MAP_TYPE_PERCPU_HASH`` was introduced in version 4.6 11 + - Both ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 12 + were introduced in version 4.10 13 + 14 + ``BPF_MAP_TYPE_HASH`` and ``BPF_MAP_TYPE_PERCPU_HASH`` provide general 15 + purpose hash map storage. Both the key and the value can be structs, 16 + allowing for composite keys and values. 17 + 18 + The kernel is responsible for allocating and freeing key/value pairs, up 19 + to the max_entries limit that you specify. Hash maps use pre-allocation 20 + of hash table elements by default. The ``BPF_F_NO_PREALLOC`` flag can be 21 + used to disable pre-allocation when it is too memory expensive. 22 + 23 + ``BPF_MAP_TYPE_PERCPU_HASH`` provides a separate value slot per 24 + CPU. The per-cpu values are stored internally in an array. 25 + 26 + The ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 27 + variants add LRU semantics to their respective hash tables. An LRU hash 28 + will automatically evict the least recently used entries when the hash 29 + table reaches capacity. An LRU hash maintains an internal LRU list that 30 + is used to select elements for eviction. This internal LRU list is 31 + shared across CPUs but it is possible to request a per CPU LRU list with 32 + the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``. 33 + 34 + Usage 35 + ===== 36 + 37 + .. c:function:: 38 + long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags) 39 + 40 + Hash entries can be added or updated using the ``bpf_map_update_elem()`` 41 + helper. This helper replaces existing elements atomically. 
The ``flags`` 42 + parameter can be used to control the update behaviour: 43 + 44 + - ``BPF_ANY`` will create a new element or update an existing element 45 + - ``BPF_NOEXIST`` will create a new element only if one did not already 46 + exist 47 + - ``BPF_EXIST`` will update an existing element 48 + 49 + ``bpf_map_update_elem()`` returns 0 on success, or negative error in 50 + case of failure. 51 + 52 + .. c:function:: 53 + void *bpf_map_lookup_elem(struct bpf_map *map, const void *key) 54 + 55 + Hash entries can be retrieved using the ``bpf_map_lookup_elem()`` 56 + helper. This helper returns a pointer to the value associated with 57 + ``key``, or ``NULL`` if no entry was found. 58 + 59 + .. c:function:: 60 + long bpf_map_delete_elem(struct bpf_map *map, const void *key) 61 + 62 + Hash entries can be deleted using the ``bpf_map_delete_elem()`` 63 + helper. This helper will return 0 on success, or negative error in case 64 + of failure. 65 + 66 + Per CPU Hashes 67 + -------------- 68 + 69 + For ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 70 + the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers 71 + automatically access the hash slot for the current CPU. 72 + 73 + .. c:function:: 74 + void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu) 75 + 76 + The ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the 77 + value in the hash slot for a specific CPU. Returns value associated with 78 + ``key`` on ``cpu`` , or ``NULL`` if no entry was found or ``cpu`` is 79 + invalid. 80 + 81 + Concurrency 82 + ----------- 83 + 84 + Values stored in ``BPF_MAP_TYPE_HASH`` can be accessed concurrently by 85 + programs running on different CPUs. Since Kernel version 5.1, the BPF 86 + infrastructure provides ``struct bpf_spin_lock`` to synchronise access. 87 + See ``tools/testing/selftests/bpf/progs/test_spin_lock.c``. 88 + 89 + Userspace 90 + --------- 91 + 92 + .. 
c:function:: 93 + int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key) 94 + 95 + In userspace, it is possible to iterate through the keys of a hash using 96 + libbpf's ``bpf_map_get_next_key()`` function. The first key can be fetched by 97 + calling ``bpf_map_get_next_key()`` with ``cur_key`` set to 98 + ``NULL``. Subsequent calls will fetch the next key that follows the 99 + current key. ``bpf_map_get_next_key()`` returns 0 on success, -ENOENT if 100 + cur_key is the last key in the hash, or negative error in case of 101 + failure. 102 + 103 + Note that if ``cur_key`` gets deleted then ``bpf_map_get_next_key()`` 104 + will instead return the *first* key in the hash table which is 105 + undesirable. It is recommended to use batched lookup if there is going 106 + to be key deletion intermixed with ``bpf_map_get_next_key()``. 107 + 108 + Examples 109 + ======== 110 + 111 + Please see the ``tools/testing/selftests/bpf`` directory for functional 112 + examples. The code snippets below demonstrates API usage. 113 + 114 + This example shows how to declare an LRU Hash with a struct key and a 115 + struct value. 116 + 117 + .. code-block:: c 118 + 119 + #include <linux/bpf.h> 120 + #include <bpf/bpf_helpers.h> 121 + 122 + struct key { 123 + __u32 srcip; 124 + }; 125 + 126 + struct value { 127 + __u64 packets; 128 + __u64 bytes; 129 + }; 130 + 131 + struct { 132 + __uint(type, BPF_MAP_TYPE_LRU_HASH); 133 + __uint(max_entries, 32); 134 + __type(key, struct key); 135 + __type(value, struct value); 136 + } packet_stats SEC(".maps"); 137 + 138 + This example shows how to create or update hash values using atomic 139 + instructions: 140 + 141 + .. 
code-block:: c 142 + 143 + static void update_stats(__u32 srcip, int bytes) 144 + { 145 + struct key key = { 146 + .srcip = srcip, 147 + }; 148 + struct value *value = bpf_map_lookup_elem(&packet_stats, &key); 149 + 150 + if (value) { 151 + __sync_fetch_and_add(&value->packets, 1); 152 + __sync_fetch_and_add(&value->bytes, bytes); 153 + } else { 154 + struct value newval = { 1, bytes }; 155 + 156 + bpf_map_update_elem(&packet_stats, &key, &newval, BPF_NOEXIST); 157 + } 158 + } 159 + 160 + Userspace walking the map elements from the map declared above: 161 + 162 + .. code-block:: c 163 + 164 + #include <bpf/libbpf.h> 165 + #include <bpf/bpf.h> 166 + 167 + static void walk_hash_elements(int map_fd) 168 + { 169 + struct key *cur_key = NULL; 170 + struct key next_key; 171 + struct value value; 172 + int err; 173 + 174 + for (;;) { 175 + err = bpf_map_get_next_key(map_fd, cur_key, &next_key); 176 + if (err) 177 + break; 178 + 179 + bpf_map_lookup_elem(map_fd, &next_key, &value); 180 + 181 + // Use key and value here 182 + 183 + cur_key = &next_key; 184 + } 185 + }
+3
arch/arm64/include/asm/insn.h
··· 510 510 unsigned int imm, 511 511 enum aarch64_insn_size_type size, 512 512 enum aarch64_insn_ldst_type type); 513 + u32 aarch64_insn_gen_load_literal(unsigned long pc, unsigned long addr, 514 + enum aarch64_insn_register reg, 515 + bool is64bit); 513 516 u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1, 514 517 enum aarch64_insn_register reg2, 515 518 enum aarch64_insn_register base,
+26 -4
arch/arm64/lib/insn.c
··· 323 323 return insn; 324 324 } 325 325 326 - static inline long branch_imm_common(unsigned long pc, unsigned long addr, 326 + static inline long label_imm_common(unsigned long pc, unsigned long addr, 327 327 long range) 328 328 { 329 329 long offset; ··· 354 354 * ARM64 virtual address arrangement guarantees all kernel and module 355 355 * texts are within +/-128M. 356 356 */ 357 - offset = branch_imm_common(pc, addr, SZ_128M); 357 + offset = label_imm_common(pc, addr, SZ_128M); 358 358 if (offset >= SZ_128M) 359 359 return AARCH64_BREAK_FAULT; 360 360 ··· 382 382 u32 insn; 383 383 long offset; 384 384 385 - offset = branch_imm_common(pc, addr, SZ_1M); 385 + offset = label_imm_common(pc, addr, SZ_1M); 386 386 if (offset >= SZ_1M) 387 387 return AARCH64_BREAK_FAULT; 388 388 ··· 421 421 u32 insn; 422 422 long offset; 423 423 424 - offset = branch_imm_common(pc, addr, SZ_1M); 424 + offset = label_imm_common(pc, addr, SZ_1M); 425 425 426 426 insn = aarch64_insn_get_bcond_value(); 427 427 ··· 541 541 base); 542 542 543 543 return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_12, insn, imm); 544 + } 545 + 546 + u32 aarch64_insn_gen_load_literal(unsigned long pc, unsigned long addr, 547 + enum aarch64_insn_register reg, 548 + bool is64bit) 549 + { 550 + u32 insn; 551 + long offset; 552 + 553 + offset = label_imm_common(pc, addr, SZ_1M); 554 + if (offset >= SZ_1M) 555 + return AARCH64_BREAK_FAULT; 556 + 557 + insn = aarch64_insn_get_ldr_lit_value(); 558 + 559 + if (is64bit) 560 + insn |= BIT(30); 561 + 562 + insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn, reg); 563 + 564 + return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_19, insn, 565 + offset >> 2); 544 566 } 545 567 546 568 u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
+7
arch/arm64/net/bpf_jit.h
··· 80 80 #define A64_STR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, STORE) 81 81 #define A64_LDR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, LOAD) 82 82 83 + /* LDR (literal) */ 84 + #define A64_LDR32LIT(Wt, offset) \ 85 + aarch64_insn_gen_load_literal(0, offset, Wt, false) 86 + #define A64_LDR64LIT(Xt, offset) \ 87 + aarch64_insn_gen_load_literal(0, offset, Xt, true) 88 + 83 89 /* Load/store register pair */ 84 90 #define A64_LS_PAIR(Rt, Rt2, Rn, offset, ls, type) \ 85 91 aarch64_insn_gen_load_store_pair(Rt, Rt2, Rn, offset, \ ··· 276 270 #define A64_BTI_C A64_HINT(AARCH64_INSN_HINT_BTIC) 277 271 #define A64_BTI_J A64_HINT(AARCH64_INSN_HINT_BTIJ) 278 272 #define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC) 273 + #define A64_NOP A64_HINT(AARCH64_INSN_HINT_NOP) 279 274 280 275 /* DMB */ 281 276 #define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH)
+698 -17
arch/arm64/net/bpf_jit_comp.c
··· 10 10 #include <linux/bitfield.h> 11 11 #include <linux/bpf.h> 12 12 #include <linux/filter.h> 13 + #include <linux/memory.h> 13 14 #include <linux/printk.h> 14 15 #include <linux/slab.h> 15 16 ··· 19 18 #include <asm/cacheflush.h> 20 19 #include <asm/debug-monitors.h> 21 20 #include <asm/insn.h> 21 + #include <asm/patching.h> 22 22 #include <asm/set_memory.h> 23 23 24 24 #include "bpf_jit.h" ··· 79 77 u32 stack_size; 80 78 int fpb_offset; 81 79 }; 80 + 81 + struct bpf_plt { 82 + u32 insn_ldr; /* load target */ 83 + u32 insn_br; /* branch to target */ 84 + u64 target; /* target value */ 85 + }; 86 + 87 + #define PLT_TARGET_SIZE sizeof_field(struct bpf_plt, target) 88 + #define PLT_TARGET_OFFSET offsetof(struct bpf_plt, target) 82 89 83 90 static inline void emit(const u32 insn, struct jit_ctx *ctx) 84 91 { ··· 151 140 } 152 141 } 153 142 143 + static inline void emit_bti(u32 insn, struct jit_ctx *ctx) 144 + { 145 + if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) 146 + emit(insn, ctx); 147 + } 148 + 154 149 /* 155 150 * Kernel addresses in the vmalloc space use at most 48 bits, and the 156 151 * remaining bits are guaranteed to be 0x1. So we can compose the address ··· 174 157 shift += 16; 175 158 emit(A64_MOVK(1, reg, tmp & 0xffff, shift), ctx); 176 159 } 160 + } 161 + 162 + static inline void emit_call(u64 target, struct jit_ctx *ctx) 163 + { 164 + u8 tmp = bpf2a64[TMP_REG_1]; 165 + 166 + emit_addr_mov_i64(tmp, target, ctx); 167 + emit(A64_BLR(tmp), ctx); 177 168 } 178 169 179 170 static inline int bpf2a64_offset(int bpf_insn, int off, ··· 260 235 return true; 261 236 } 262 237 238 + /* generated prologue: 239 + * bti c // if CONFIG_ARM64_BTI_KERNEL 240 + * mov x9, lr 241 + * nop // POKE_OFFSET 242 + * paciasp // if CONFIG_ARM64_PTR_AUTH_KERNEL 243 + * stp x29, lr, [sp, #-16]! 244 + * mov x29, sp 245 + * stp x19, x20, [sp, #-16]! 246 + * stp x21, x22, [sp, #-16]! 247 + * stp x25, x26, [sp, #-16]! 248 + * stp x27, x28, [sp, #-16]! 
249 + * mov x25, sp 250 + * mov tcc, #0 251 + * // PROLOGUE_OFFSET 252 + */ 253 + 254 + #define BTI_INSNS (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) ? 1 : 0) 255 + #define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0) 256 + 257 + /* Offset of nop instruction in bpf prog entry to be poked */ 258 + #define POKE_OFFSET (BTI_INSNS + 1) 259 + 263 260 /* Tail call offset to jump into */ 264 - #if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) || \ 265 - IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) 266 - #define PROLOGUE_OFFSET 9 267 - #else 268 - #define PROLOGUE_OFFSET 8 269 - #endif 261 + #define PROLOGUE_OFFSET (BTI_INSNS + 2 + PAC_INSNS + 8) 270 262 271 263 static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf) 272 264 { ··· 322 280 * 323 281 */ 324 282 283 + emit_bti(A64_BTI_C, ctx); 284 + 285 + emit(A64_MOV(1, A64_R(9), A64_LR), ctx); 286 + emit(A64_NOP, ctx); 287 + 325 288 /* Sign lr */ 326 289 if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)) 327 290 emit(A64_PACIASP, ctx); 328 - /* BTI landing pad */ 329 - else if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) 330 - emit(A64_BTI_C, ctx); 331 291 332 292 /* Save FP and LR registers to stay align with ARM64 AAPCS */ 333 293 emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx); ··· 356 312 } 357 313 358 314 /* BTI landing pad for the tail call, done with a BR */ 359 - if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) 360 - emit(A64_BTI_J, ctx); 315 + emit_bti(A64_BTI_J, ctx); 361 316 } 362 317 363 318 emit(A64_SUB_I(1, fpb, fp, ctx->fpb_offset), ctx); ··· 598 555 } 599 556 600 557 return 0; 558 + } 559 + 560 + void dummy_tramp(void); 561 + 562 + asm ( 563 + " .pushsection .text, \"ax\", @progbits\n" 564 + " .global dummy_tramp\n" 565 + " .type dummy_tramp, %function\n" 566 + "dummy_tramp:" 567 + #if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) 568 + " bti j\n" /* dummy_tramp is called via "br x10" */ 569 + #endif 570 + " mov x10, x30\n" 571 + " mov x30, x9\n" 572 + " ret x10\n" 573 + " .size dummy_tramp, .-dummy_tramp\n" 574 + " .popsection\n" 575 
+ ); 576 + 577 + /* build a plt initialized like this: 578 + * 579 + * plt: 580 + * ldr tmp, target 581 + * br tmp 582 + * target: 583 + * .quad dummy_tramp 584 + * 585 + * when a long jump trampoline is attached, target is filled with the 586 + * trampoline address, and when the trampoline is removed, target is 587 + * restored to dummy_tramp address. 588 + */ 589 + static void build_plt(struct jit_ctx *ctx) 590 + { 591 + const u8 tmp = bpf2a64[TMP_REG_1]; 592 + struct bpf_plt *plt = NULL; 593 + 594 + /* make sure target is 64-bit aligned */ 595 + if ((ctx->idx + PLT_TARGET_OFFSET / AARCH64_INSN_SIZE) % 2) 596 + emit(A64_NOP, ctx); 597 + 598 + plt = (struct bpf_plt *)(ctx->image + ctx->idx); 599 + /* plt is called via bl, no BTI needed here */ 600 + emit(A64_LDR64LIT(tmp, 2 * AARCH64_INSN_SIZE), ctx); 601 + emit(A64_BR(tmp), ctx); 602 + 603 + if (ctx->image) 604 + plt->target = (u64)&dummy_tramp; 601 605 } 602 606 603 607 static void build_epilogue(struct jit_ctx *ctx) ··· 1081 991 &func_addr, &func_addr_fixed); 1082 992 if (ret < 0) 1083 993 return ret; 1084 - emit_addr_mov_i64(tmp, func_addr, ctx); 1085 - emit(A64_BLR(tmp), ctx); 994 + emit_call(func_addr, ctx); 1086 995 emit(A64_MOV(1, r0, A64_R(0)), ctx); 1087 996 break; 1088 997 } ··· 1425 1336 if (a64_insn == AARCH64_BREAK_FAULT) 1426 1337 return -1; 1427 1338 } 1339 + return 0; 1340 + } 1341 + 1342 + static int validate_ctx(struct jit_ctx *ctx) 1343 + { 1344 + if (validate_code(ctx)) 1345 + return -1; 1428 1346 1429 1347 if (WARN_ON_ONCE(ctx->exentry_idx != ctx->prog->aux->num_exentries)) 1430 1348 return -1; ··· 1452 1356 1453 1357 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) 1454 1358 { 1455 - int image_size, prog_size, extable_size; 1359 + int image_size, prog_size, extable_size, extable_align, extable_offset; 1456 1360 struct bpf_prog *tmp, *orig_prog = prog; 1457 1361 struct bpf_binary_header *header; 1458 1362 struct arm64_jit_data *jit_data; ··· 1522 1426 1523 1427 ctx.epilogue_offset 
= ctx.idx; 1524 1428 build_epilogue(&ctx); 1429 + build_plt(&ctx); 1525 1430 1431 + extable_align = __alignof__(struct exception_table_entry); 1526 1432 extable_size = prog->aux->num_exentries * 1527 1433 sizeof(struct exception_table_entry); 1528 1434 1529 1435 /* Now we know the actual image size. */ 1530 1436 prog_size = sizeof(u32) * ctx.idx; 1531 - image_size = prog_size + extable_size; 1437 + /* also allocate space for plt target */ 1438 + extable_offset = round_up(prog_size + PLT_TARGET_SIZE, extable_align); 1439 + image_size = extable_offset + extable_size; 1532 1440 header = bpf_jit_binary_alloc(image_size, &image_ptr, 1533 1441 sizeof(u32), jit_fill_hole); 1534 1442 if (header == NULL) { ··· 1544 1444 1545 1445 ctx.image = (__le32 *)image_ptr; 1546 1446 if (extable_size) 1547 - prog->aux->extable = (void *)image_ptr + prog_size; 1447 + prog->aux->extable = (void *)image_ptr + extable_offset; 1548 1448 skip_init_ctx: 1549 1449 ctx.idx = 0; 1550 1450 ctx.exentry_idx = 0; ··· 1558 1458 } 1559 1459 1560 1460 build_epilogue(&ctx); 1461 + build_plt(&ctx); 1561 1462 1562 1463 /* 3. Extra pass to validate JITed code. 
*/ 1563 - if (validate_code(&ctx)) { 1464 + if (validate_ctx(&ctx)) { 1564 1465 bpf_jit_binary_free(header); 1565 1466 prog = orig_prog; 1566 1467 goto out_off; ··· 1637 1536 bool bpf_jit_supports_subprog_tailcalls(void) 1638 1537 { 1639 1538 return true; 1539 + } 1540 + 1541 + static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l, 1542 + int args_off, int retval_off, int run_ctx_off, 1543 + bool save_ret) 1544 + { 1545 + u32 *branch; 1546 + u64 enter_prog; 1547 + u64 exit_prog; 1548 + struct bpf_prog *p = l->link.prog; 1549 + int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie); 1550 + 1551 + if (p->aux->sleepable) { 1552 + enter_prog = (u64)__bpf_prog_enter_sleepable; 1553 + exit_prog = (u64)__bpf_prog_exit_sleepable; 1554 + } else { 1555 + enter_prog = (u64)__bpf_prog_enter; 1556 + exit_prog = (u64)__bpf_prog_exit; 1557 + } 1558 + 1559 + if (l->cookie == 0) { 1560 + /* if cookie is zero, one instruction is enough to store it */ 1561 + emit(A64_STR64I(A64_ZR, A64_SP, run_ctx_off + cookie_off), ctx); 1562 + } else { 1563 + emit_a64_mov_i64(A64_R(10), l->cookie, ctx); 1564 + emit(A64_STR64I(A64_R(10), A64_SP, run_ctx_off + cookie_off), 1565 + ctx); 1566 + } 1567 + 1568 + /* save p to callee saved register x19 to avoid loading p with mov_i64 1569 + * each time. 
+  */
+ 	emit_addr_mov_i64(A64_R(19), (const u64)p, ctx);
+
+ 	/* arg1: prog */
+ 	emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
+ 	/* arg2: &run_ctx */
+ 	emit(A64_ADD_I(1, A64_R(1), A64_SP, run_ctx_off), ctx);
+
+ 	emit_call(enter_prog, ctx);
+
+ 	/* if (__bpf_prog_enter(prog) == 0)
+ 	 *         goto skip_exec_of_prog;
+ 	 */
+ 	branch = ctx->image + ctx->idx;
+ 	emit(A64_NOP, ctx);
+
+ 	/* save return value to callee saved register x20 */
+ 	emit(A64_MOV(1, A64_R(20), A64_R(0)), ctx);
+
+ 	emit(A64_ADD_I(1, A64_R(0), A64_SP, args_off), ctx);
+ 	if (!p->jited)
+ 		emit_addr_mov_i64(A64_R(1), (const u64)p->insnsi, ctx);
+
+ 	emit_call((const u64)p->bpf_func, ctx);
+
+ 	if (save_ret)
+ 		emit(A64_STR64I(A64_R(0), A64_SP, retval_off), ctx);
+
+ 	if (ctx->image) {
+ 		int offset = &ctx->image[ctx->idx] - branch;
+ 		*branch = A64_CBZ(1, A64_R(0), offset);
+ 	}
+
+ 	/* arg1: prog */
+ 	emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
+ 	/* arg2: start time */
+ 	emit(A64_MOV(1, A64_R(1), A64_R(20)), ctx);
+ 	/* arg3: &run_ctx */
+ 	emit(A64_ADD_I(1, A64_R(2), A64_SP, run_ctx_off), ctx);
+
+ 	emit_call(exit_prog, ctx);
+ }
+
+ static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
+ 			       int args_off, int retval_off, int run_ctx_off,
+ 			       u32 **branches)
+ {
+ 	int i;
+
+ 	/* The first fmod_ret program will receive a garbage return value.
+ 	 * Set this to 0 to avoid confusing the program.
+ 	 */
+ 	emit(A64_STR64I(A64_ZR, A64_SP, retval_off), ctx);
+ 	for (i = 0; i < tl->nr_links; i++) {
+ 		invoke_bpf_prog(ctx, tl->links[i], args_off, retval_off,
+ 				run_ctx_off, true);
+ 		/* if (*(u64 *)(sp + retval_off) != 0)
+ 		 *         goto do_fexit;
+ 		 */
+ 		emit(A64_LDR64I(A64_R(10), A64_SP, retval_off), ctx);
+ 		/* Save the location of branch, and generate a nop.
+ 		 * This nop will be replaced with a cbnz later.
+ 		 */
+ 		branches[i] = ctx->image + ctx->idx;
+ 		emit(A64_NOP, ctx);
+ 	}
+ }
+
+ static void save_args(struct jit_ctx *ctx, int args_off, int nargs)
+ {
+ 	int i;
+
+ 	for (i = 0; i < nargs; i++) {
+ 		emit(A64_STR64I(i, A64_SP, args_off), ctx);
+ 		args_off += 8;
+ 	}
+ }
+
+ static void restore_args(struct jit_ctx *ctx, int args_off, int nargs)
+ {
+ 	int i;
+
+ 	for (i = 0; i < nargs; i++) {
+ 		emit(A64_LDR64I(i, A64_SP, args_off), ctx);
+ 		args_off += 8;
+ 	}
+ }
+
+ /* Based on the x86's implementation of arch_prepare_bpf_trampoline().
+  *
+  * bpf prog and function entry before bpf trampoline hooked:
+  *   mov x9, lr
+  *   nop
+  *
+  * bpf prog and function entry after bpf trampoline hooked:
+  *   mov x9, lr
+  *   bl  <bpf_trampoline or plt>
+  */
+ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
+ 			      struct bpf_tramp_links *tlinks, void *orig_call,
+ 			      int nargs, u32 flags)
+ {
+ 	int i;
+ 	int stack_size;
+ 	int retaddr_off;
+ 	int regs_off;
+ 	int retval_off;
+ 	int args_off;
+ 	int nargs_off;
+ 	int ip_off;
+ 	int run_ctx_off;
+ 	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
+ 	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
+ 	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+ 	bool save_ret;
+ 	u32 **branches = NULL;
+
+ 	/* trampoline stack layout:
+ 	 *                     [ parent ip ]
+ 	 *                     [ FP ]
+ 	 * SP + retaddr_off    [ self ip ]
+ 	 *                     [ FP ]
+ 	 *
+ 	 *                     [ padding ] align SP to multiples of 16
+ 	 *
+ 	 *                     [ x20 ] callee saved reg x20
+ 	 * SP + regs_off       [ x19 ] callee saved reg x19
+ 	 *
+ 	 * SP + retval_off     [ return value ] BPF_TRAMP_F_CALL_ORIG or
+ 	 *                                      BPF_TRAMP_F_RET_FENTRY_RET
+ 	 *
+ 	 *                     [ argN ]
+ 	 *                     [ ... ]
+ 	 * SP + args_off       [ arg1 ]
+ 	 *
+ 	 * SP + nargs_off      [ args count ]
+ 	 *
+ 	 * SP + ip_off         [ traced function ] BPF_TRAMP_F_IP_ARG flag
+ 	 *
+ 	 * SP + run_ctx_off    [ bpf_tramp_run_ctx ]
+ 	 */
+
+ 	stack_size = 0;
+ 	run_ctx_off = stack_size;
+ 	/* room for bpf_tramp_run_ctx */
+ 	stack_size += round_up(sizeof(struct bpf_tramp_run_ctx), 8);
+
+ 	ip_off = stack_size;
+ 	/* room for IP address argument */
+ 	if (flags & BPF_TRAMP_F_IP_ARG)
+ 		stack_size += 8;
+
+ 	nargs_off = stack_size;
+ 	/* room for args count */
+ 	stack_size += 8;
+
+ 	args_off = stack_size;
+ 	/* room for args */
+ 	stack_size += nargs * 8;
+
+ 	/* room for return value */
+ 	retval_off = stack_size;
+ 	save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
+ 	if (save_ret)
+ 		stack_size += 8;
+
+ 	/* room for callee saved registers, currently x19 and x20 are used */
+ 	regs_off = stack_size;
+ 	stack_size += 16;
+
+ 	/* round up to multiples of 16 to avoid SPAlignmentFault */
+ 	stack_size = round_up(stack_size, 16);
+
+ 	/* return address is located above FP */
+ 	retaddr_off = stack_size + 8;
+
+ 	/* bpf trampoline may be invoked by 3 instruction types:
+ 	 * 1. bl, attached to bpf prog or kernel function via short jump
+ 	 * 2. br, attached to bpf prog or kernel function via long jump
+ 	 * 3. blr, working as a function pointer, used by struct_ops.
+ 	 * So BTI_JC should be used here to support both br and blr.
+ 	 */
+ 	emit_bti(A64_BTI_JC, ctx);
+
+ 	/* frame for parent function */
+ 	emit(A64_PUSH(A64_FP, A64_R(9), A64_SP), ctx);
+ 	emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
+ 	/* frame for patched function */
+ 	emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
+ 	emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
+ 	/* allocate stack space */
+ 	emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
+
+ 	if (flags & BPF_TRAMP_F_IP_ARG) {
+ 		/* save ip address of the traced function */
+ 		emit_addr_mov_i64(A64_R(10), (const u64)orig_call, ctx);
+ 		emit(A64_STR64I(A64_R(10), A64_SP, ip_off), ctx);
+ 	}
+
+ 	/* save args count */
+ 	emit(A64_MOVZ(1, A64_R(10), nargs, 0), ctx);
+ 	emit(A64_STR64I(A64_R(10), A64_SP, nargs_off), ctx);
+
+ 	/* save args */
+ 	save_args(ctx, args_off, nargs);
+
+ 	/* save callee saved registers */
+ 	emit(A64_STR64I(A64_R(19), A64_SP, regs_off), ctx);
+ 	emit(A64_STR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
+
+ 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+ 		emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+ 		emit_call((const u64)__bpf_tramp_enter, ctx);
+ 	}
+
+ 	for (i = 0; i < fentry->nr_links; i++)
+ 		invoke_bpf_prog(ctx, fentry->links[i], args_off,
+ 				retval_off, run_ctx_off,
+ 				flags & BPF_TRAMP_F_RET_FENTRY_RET);
+
+ 	if (fmod_ret->nr_links) {
+ 		branches = kcalloc(fmod_ret->nr_links, sizeof(u32 *),
+ 				   GFP_KERNEL);
+ 		if (!branches)
+ 			return -ENOMEM;
+
+ 		invoke_bpf_mod_ret(ctx, fmod_ret, args_off, retval_off,
+ 				   run_ctx_off, branches);
+ 	}
+
+ 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+ 		restore_args(ctx, args_off, nargs);
+ 		/* call original func */
+ 		emit(A64_LDR64I(A64_R(10), A64_SP, retaddr_off), ctx);
+ 		emit(A64_BLR(A64_R(10)), ctx);
+ 		/* store return value */
+ 		emit(A64_STR64I(A64_R(0), A64_SP, retval_off), ctx);
+ 		/* reserve a nop for bpf_tramp_image_put */
+ 		im->ip_after_call = ctx->image + ctx->idx;
+ 		emit(A64_NOP, ctx);
+ 	}
+
+ 	/* update the branches saved in invoke_bpf_mod_ret with cbnz */
+ 	for (i = 0; i < fmod_ret->nr_links && ctx->image != NULL; i++) {
+ 		int offset = &ctx->image[ctx->idx] - branches[i];
+ 		*branches[i] = A64_CBNZ(1, A64_R(10), offset);
+ 	}
+
+ 	for (i = 0; i < fexit->nr_links; i++)
+ 		invoke_bpf_prog(ctx, fexit->links[i], args_off, retval_off,
+ 				run_ctx_off, false);
+
+ 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+ 		im->ip_epilogue = ctx->image + ctx->idx;
+ 		emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+ 		emit_call((const u64)__bpf_tramp_exit, ctx);
+ 	}
+
+ 	if (flags & BPF_TRAMP_F_RESTORE_REGS)
+ 		restore_args(ctx, args_off, nargs);
+
+ 	/* restore callee saved registers x19 and x20 */
+ 	emit(A64_LDR64I(A64_R(19), A64_SP, regs_off), ctx);
+ 	emit(A64_LDR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
+
+ 	if (save_ret)
+ 		emit(A64_LDR64I(A64_R(0), A64_SP, retval_off), ctx);
+
+ 	/* reset SP */
+ 	emit(A64_MOV(1, A64_SP, A64_FP), ctx);
+
+ 	/* pop frames */
+ 	emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
+ 	emit(A64_POP(A64_FP, A64_R(9), A64_SP), ctx);
+
+ 	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
+ 		/* skip patched function, return to parent */
+ 		emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
+ 		emit(A64_RET(A64_R(9)), ctx);
+ 	} else {
+ 		/* return to patched function */
+ 		emit(A64_MOV(1, A64_R(10), A64_LR), ctx);
+ 		emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
+ 		emit(A64_RET(A64_R(10)), ctx);
+ 	}
+
+ 	if (ctx->image)
+ 		bpf_flush_icache(ctx->image, ctx->image + ctx->idx);
+
+ 	kfree(branches);
+
+ 	return ctx->idx;
+ }
+
+ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
+ 				void *image_end, const struct btf_func_model *m,
+ 				u32 flags, struct bpf_tramp_links *tlinks,
+ 				void *orig_call)
+ {
+ 	int ret;
+ 	int nargs = m->nr_args;
+ 	int max_insns = ((long)image_end - (long)image) / AARCH64_INSN_SIZE;
+ 	struct jit_ctx ctx = {
+ 		.image = NULL,
+ 		.idx = 0,
+ 	};
+
+ 	/* the first 8 arguments are passed by registers */
+ 	if (nargs > 8)
+ 		return -ENOTSUPP;
+
+ 	ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
+ 	if (ret < 0)
+ 		return ret;
+
+ 	if (ret > max_insns)
+ 		return -EFBIG;
+
+ 	ctx.image = image;
+ 	ctx.idx = 0;
+
+ 	jit_fill_hole(image, (unsigned int)(image_end - image));
+ 	ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
+
+ 	if (ret > 0 && validate_code(&ctx) < 0)
+ 		ret = -EINVAL;
+
+ 	if (ret > 0)
+ 		ret *= AARCH64_INSN_SIZE;
+
+ 	return ret;
+ }
+
+ static bool is_long_jump(void *ip, void *target)
+ {
+ 	long offset;
+
+ 	/* NULL target means this is a NOP */
+ 	if (!target)
+ 		return false;
+
+ 	offset = (long)target - (long)ip;
+ 	return offset < -SZ_128M || offset >= SZ_128M;
+ }
+
+ static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
+ 			     void *addr, void *plt, u32 *insn)
+ {
+ 	void *target;
+
+ 	if (!addr) {
+ 		*insn = aarch64_insn_gen_nop();
+ 		return 0;
+ 	}
+
+ 	if (is_long_jump(ip, addr))
+ 		target = plt;
+ 	else
+ 		target = addr;
+
+ 	*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
+ 					    (unsigned long)target,
+ 					    type);
+
+ 	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
+ }
+
+ /* Replace the branch instruction from @ip to @old_addr in a bpf prog or a bpf
+  * trampoline with the branch instruction from @ip to @new_addr. If @old_addr
+  * or @new_addr is NULL, the old or new instruction is NOP.
+  *
+  * When @ip is the bpf prog entry, a bpf trampoline is being attached or
+  * detached. Since bpf trampoline and bpf prog are allocated separately with
+  * vmalloc, the address distance may exceed 128MB, the maximum branch range.
+  * So long jumps need to be handled.
+  *
+  * When a bpf prog is constructed, a plt pointing to empty trampoline
+  * dummy_tramp is placed at the end:
+  *
+  * bpf_prog:
+  *         mov x9, lr
+  *         nop // patchsite
+  *         ...
+  *         ret
+  *
+  * plt:
+  *         ldr x10, target
+  *         br  x10
+  * target:
+  *         .quad dummy_tramp // plt target
+  *
+  * This is also the state when no trampoline is attached.
+  *
+  * When a short-jump bpf trampoline is attached, the patchsite is patched
+  * to a bl instruction to the trampoline directly:
+  *
+  * bpf_prog:
+  *         mov x9, lr
+  *         bl  <short-jump bpf trampoline address> // patchsite
+  *         ...
+  *         ret
+  *
+  * plt:
+  *         ldr x10, target
+  *         br  x10
+  * target:
+  *         .quad dummy_tramp // plt target
+  *
+  * When a long-jump bpf trampoline is attached, the plt target is filled with
+  * the trampoline address and the patchsite is patched to a bl instruction to
+  * the plt:
+  *
+  * bpf_prog:
+  *         mov x9, lr
+  *         bl  plt // patchsite
+  *         ...
+  *         ret
+  *
+  * plt:
+  *         ldr x10, target
+  *         br  x10
+  * target:
+  *         .quad <long-jump bpf trampoline address> // plt target
+  *
+  * The dummy_tramp is used to prevent another CPU from jumping to unknown
+  * locations during the patching process, making the patching process easier.
+  */
+ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
+ 		       void *old_addr, void *new_addr)
+ {
+ 	int ret;
+ 	u32 old_insn;
+ 	u32 new_insn;
+ 	u32 replaced;
+ 	struct bpf_plt *plt = NULL;
+ 	unsigned long size = 0UL;
+ 	unsigned long offset = ~0UL;
+ 	enum aarch64_insn_branch_type branch_type;
+ 	char namebuf[KSYM_NAME_LEN];
+ 	void *image = NULL;
+ 	u64 plt_target = 0ULL;
+ 	bool poking_bpf_entry;
+
+ 	if (!__bpf_address_lookup((unsigned long)ip, &size, &offset, namebuf))
+ 		/* Only poking bpf text is supported. Since kernel function
+ 		 * entry is set up by ftrace, we rely on ftrace to poke kernel
+ 		 * functions.
+ 		 */
+ 		return -ENOTSUPP;
+
+ 	image = ip - offset;
+ 	/* zero offset means we're poking bpf prog entry */
+ 	poking_bpf_entry = (offset == 0UL);
+
+ 	/* bpf prog entry, find plt and the real patchsite */
+ 	if (poking_bpf_entry) {
+ 		/* plt locates at the end of bpf prog */
+ 		plt = image + size - PLT_TARGET_OFFSET;
+
+ 		/* skip to the nop instruction in bpf prog entry:
+ 		 * bti c // if BTI enabled
+ 		 * mov x9, x30
+ 		 * nop
+ 		 */
+ 		ip = image + POKE_OFFSET * AARCH64_INSN_SIZE;
+ 	}
+
+ 	/* long jump is only possible at bpf prog entry */
+ 	if (WARN_ON((is_long_jump(ip, new_addr) || is_long_jump(ip, old_addr)) &&
+ 		    !poking_bpf_entry))
+ 		return -EINVAL;
+
+ 	if (poke_type == BPF_MOD_CALL)
+ 		branch_type = AARCH64_INSN_BRANCH_LINK;
+ 	else
+ 		branch_type = AARCH64_INSN_BRANCH_NOLINK;
+
+ 	if (gen_branch_or_nop(branch_type, ip, old_addr, plt, &old_insn) < 0)
+ 		return -EFAULT;
+
+ 	if (gen_branch_or_nop(branch_type, ip, new_addr, plt, &new_insn) < 0)
+ 		return -EFAULT;
+
+ 	if (is_long_jump(ip, new_addr))
+ 		plt_target = (u64)new_addr;
+ 	else if (is_long_jump(ip, old_addr))
+ 		/* if the old target is a long jump and the new target is not,
+ 		 * restore the plt target to dummy_tramp, so there is always a
+ 		 * legal and harmless address stored in plt target, and we'll
+ 		 * never jump from plt to an unknown place.
+ 		 */
+ 		plt_target = (u64)&dummy_tramp;
+
+ 	if (plt_target) {
+ 		/* non-zero plt_target indicates we're patching a bpf prog,
+ 		 * which is read only.
+ 		 */
+ 		if (set_memory_rw(PAGE_MASK & ((uintptr_t)&plt->target), 1))
+ 			return -EFAULT;
+ 		WRITE_ONCE(plt->target, plt_target);
+ 		set_memory_ro(PAGE_MASK & ((uintptr_t)&plt->target), 1);
+ 		/* since plt target points to either the new trampoline
+ 		 * or dummy_tramp, even if another CPU reads the old plt
+ 		 * target value before fetching the bl instruction to plt,
+ 		 * it will be brought back by dummy_tramp, so no barrier is
+ 		 * required here.
+ 		 */
+ 	}
+
+ 	/* if the old target and the new target are both long jumps, no
+ 	 * patching is required
+ 	 */
+ 	if (old_insn == new_insn)
+ 		return 0;
+
+ 	mutex_lock(&text_mutex);
+ 	if (aarch64_insn_read(ip, &replaced)) {
+ 		ret = -EFAULT;
+ 		goto out;
+ 	}
+
+ 	if (replaced != old_insn) {
+ 		ret = -EFAULT;
+ 		goto out;
+ 	}
+
+ 	/* We call aarch64_insn_patch_text_nosync() to replace the instruction
+ 	 * atomically, so no other CPUs will fetch a half-new and half-old
+ 	 * instruction. But there is a chance that another CPU executes the
+ 	 * old instruction after the patching operation finishes (e.g.,
+ 	 * pipeline not flushed, or icache not synchronized yet).
+ 	 *
+ 	 * 1. when a new trampoline is attached, it is not a problem for
+ 	 *    different CPUs to jump to different trampolines temporarily.
+ 	 *
+ 	 * 2. when an old trampoline is freed, we should wait for all other
+ 	 *    CPUs to exit the trampoline and make sure the trampoline is no
+ 	 *    longer reachable. Since bpf_tramp_image_put() already uses
+ 	 *    percpu_ref and task-based rcu to do the sync, there is no need
+ 	 *    to call the sync version here; see bpf_tramp_image_put() for
+ 	 *    details.
+ 	 */
+ 	ret = aarch64_insn_patch_text_nosync(ip, new_insn);
+ out:
+ 	mutex_unlock(&text_mutex);
+
+ 	return ret;
  }
+34 -24
arch/x86/net/bpf_jit_comp.c
···
  	return 0;
  }

- static bool is_valid_bpf_tramp_flags(unsigned int flags)
- {
- 	if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
- 	    (flags & BPF_TRAMP_F_SKIP_FRAME))
- 		return false;
-
- 	/*
- 	 * BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
- 	 * and it must be used alone.
- 	 */
- 	if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
- 	    (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
- 		return false;
-
- 	return true;
- }
-
  /* Example:
   * __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
   * its 'struct btf_func_model' will be nr_args=2
···
  	/* x86-64 supports up to 6 arguments. 7+ can be added in the future */
  	if (nr_args > 6)
  		return -ENOTSUPP;
-
- 	if (!is_valid_bpf_tramp_flags(flags))
- 		return -EINVAL;

  	/* Generated trampoline stack layout:
  	 *
···
  	if (flags & BPF_TRAMP_F_CALL_ORIG) {
  		restore_regs(m, &prog, nr_args, regs_off);

- 		/* call original function */
- 		if (emit_call(&prog, orig_call, prog)) {
- 			ret = -EINVAL;
- 			goto cleanup;
+ 		if (flags & BPF_TRAMP_F_ORIG_STACK) {
+ 			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
+ 			EMIT2(0xff, 0xd0); /* call *rax */
+ 		} else {
+ 			/* call original function */
+ 			if (emit_call(&prog, orig_call, prog)) {
+ 				ret = -EINVAL;
+ 				goto cleanup;
+ 			}
  		}
  		/* remember return value in a stack for bpf prog to access */
  		emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
···
  bool bpf_jit_supports_subprog_tailcalls(void)
  {
  	return true;
+ }
+
+ void bpf_jit_free(struct bpf_prog *prog)
+ {
+ 	if (prog->jited) {
+ 		struct x64_jit_data *jit_data = prog->aux->jit_data;
+ 		struct bpf_binary_header *hdr;
+
+ 		/*
+ 		 * If we fail the final pass of JIT (from jit_subprogs),
+ 		 * the program may not be finalized yet. Call finalize here
+ 		 * before freeing it.
+ 		 */
+ 		if (jit_data) {
+ 			bpf_jit_binary_pack_finalize(prog, jit_data->header,
+ 						     jit_data->rw_header);
+ 			kvfree(jit_data->addrs);
+ 			kfree(jit_data);
+ 		}
+ 		hdr = bpf_jit_binary_pack_hdr(prog);
+ 		bpf_jit_binary_pack_free(hdr, NULL);
+ 		WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
+ 	}
+
+ 	bpf_prog_unlock_free(prog);
  }
+23 -6
include/linux/bpf.h
···
  struct mem_cgroup;
  struct module;
  struct bpf_func_state;
+ struct ftrace_ops;

  extern struct idr btf_idr;
  extern spinlock_t btf_idr_lock;
···
  	u32 btf_vmlinux_value_type_id;
  	struct btf *btf;
  #ifdef CONFIG_MEMCG_KMEM
- 	struct mem_cgroup *memcg;
+ 	struct obj_cgroup *objcg;
  #endif
  	char name[BPF_OBJ_NAME_LEN];
  	struct bpf_map_off_arr *off_arr;
···
  /* Return the return value of fentry prog. Only used by bpf_struct_ops. */
  #define BPF_TRAMP_F_RET_FENTRY_RET	BIT(4)

+ /* Get original function from stack instead of from provided direct address.
+  * Makes sense for trampolines with fexit or fmod_ret programs.
+  */
+ #define BPF_TRAMP_F_ORIG_STACK		BIT(5)
+
+ /* This trampoline is on a function with another ftrace_ops with IPMODIFY,
+  * e.g., a live patch. This flag is set and cleared by ftrace callbacks.
+  */
+ #define BPF_TRAMP_F_SHARE_IPMODIFY	BIT(6)
+
  /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
   * bytes on x86.
   */
···
  struct bpf_trampoline {
  	/* hlist for trampoline_table */
  	struct hlist_node hlist;
+ 	struct ftrace_ops *fops;
  	/* serializes access to fields of this trampoline */
  	struct mutex mutex;
  	refcount_t refcnt;
+ 	u32 flags;
  	u64 key;
  	struct {
  		struct btf_func_model model;
···
  	bool sleepable;
  	bool tail_call_reachable;
  	bool xdp_has_frags;
- 	bool use_bpf_prog_pack;
  	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
  	const struct btf_type *attach_func_proto;
  	/* function name for valid attach_btf_id */
···
  int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
  			    union bpf_attr __user *uattr);
  #endif
- int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
- 				    int cgroup_atype);
- void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog);
  #else
  static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id)
  {
···
  {
  	return -EINVAL;
  }
+ #endif
+
+ #if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM)
+ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
+ 				    int cgroup_atype);
+ void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog);
+ #else
  static inline int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
  						  int cgroup_atype)
  {
···
  			 struct bpf_reg_state *regs);
  int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
  			      const struct btf *btf, u32 func_id,
- 			      struct bpf_reg_state *regs);
+ 			      struct bpf_reg_state *regs,
+ 			      u32 kfunc_flags);
  int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
  			  struct bpf_reg_state *reg);
  int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
+4 -4
include/linux/bpf_verifier.h
···
  };

  struct bpf_loop_inline_state {
- 	int initialized:1; /* set to true upon first entry */
- 	int fit_for_inline:1; /* true if callback function is the same
- 			       * at each call and flags are always zero
- 			       */
+ 	unsigned int initialized:1; /* set to true upon first entry */
+ 	unsigned int fit_for_inline:1; /* true if callback function is the same
+ 					* at each call and flags are always zero
+ 					*/
  	u32 callback_subprogno; /* valid when fit_for_inline is true */
  };
+42 -23
include/linux/btf.h
···
  #define BTF_TYPE_EMIT(type) ((void)(type *)0)
  #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)

- enum btf_kfunc_type {
- 	BTF_KFUNC_TYPE_CHECK,
- 	BTF_KFUNC_TYPE_ACQUIRE,
- 	BTF_KFUNC_TYPE_RELEASE,
- 	BTF_KFUNC_TYPE_RET_NULL,
- 	BTF_KFUNC_TYPE_KPTR_ACQUIRE,
- 	BTF_KFUNC_TYPE_MAX,
- };
+ /* These need to be macros, as the expressions are used in assembler input */
+ #define KF_ACQUIRE	(1 << 0) /* kfunc is an acquire function */
+ #define KF_RELEASE	(1 << 1) /* kfunc is a release function */
+ #define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
+ #define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
+ /* Trusted arguments are those which are meant to be referenced arguments with
+  * unchanged offset. It is used to enforce that pointers obtained from acquire
+  * kfuncs remain unmodified when being passed to helpers taking trusted args.
+  *
+  * Consider
+  *	struct foo {
+  *		int data;
+  *		struct foo *next;
+  *	};
+  *
+  *	struct bar {
+  *		int data;
+  *		struct foo f;
+  *	};
+  *
+  *	struct foo *f = alloc_foo(); // Acquire kfunc
+  *	struct bar *b = alloc_bar(); // Acquire kfunc
+  *
+  * If a kfunc set_foo_data() wants to operate only on the allocated object, it
+  * will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like:
+  *
+  *	set_foo_data(f, 42);	   // Allowed
+  *	set_foo_data(f->next, 42); // Rejected, non-referenced pointer
+  *	set_foo_data(&f->next, 42);// Rejected, referenced, but wrong type
+  *	set_foo_data(&b->f, 42);   // Rejected, referenced, but bad offset
+  *
+  * In the final case, the type would normally be deduced by looking at the
+  * type of the member at the offset, but because a trusted argument is
+  * required, that deduction is strict and is not done for this case.
+  */
+ #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */

  struct btf;
  struct btf_member;
···
  struct btf_kfunc_id_set {
  	struct module *owner;
- 	union {
- 		struct {
- 			struct btf_id_set *check_set;
- 			struct btf_id_set *acquire_set;
- 			struct btf_id_set *release_set;
- 			struct btf_id_set *ret_null_set;
- 			struct btf_id_set *kptr_acquire_set;
- 		};
- 		struct btf_id_set *sets[BTF_KFUNC_TYPE_MAX];
- 	};
+ 	struct btf_id_set8 *set;
  };

  struct btf_id_dtor_kfunc {
···
  const char *btf_name_by_offset(const struct btf *btf, u32 offset);
  struct btf *btf_parse_vmlinux(void);
  struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
- bool btf_kfunc_id_set_contains(const struct btf *btf,
- 			       enum bpf_prog_type prog_type,
- 			       enum btf_kfunc_type type, u32 kfunc_btf_id);
+ u32 *btf_kfunc_id_set_contains(const struct btf *btf,
+ 			       enum bpf_prog_type prog_type,
+ 			       u32 kfunc_btf_id);
  int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
  			      const struct btf_kfunc_id_set *s);
  s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
···
  {
  	return NULL;
  }
- static inline bool btf_kfunc_id_set_contains(const struct btf *btf,
- 					     enum bpf_prog_type prog_type,
- 					     enum btf_kfunc_type type,
- 					     u32 kfunc_btf_id)
- {
- 	return false;
- }
+ static inline u32 *btf_kfunc_id_set_contains(const struct btf *btf,
+ 					     enum bpf_prog_type prog_type,
+ 					     u32 kfunc_btf_id)
+ {
+ 	return NULL;
+ }
  static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
  					    const struct btf_kfunc_id_set *s)
+64 -4
include/linux/btf_ids.h
···
  	u32 ids[];
  };

+ struct btf_id_set8 {
+ 	u32 cnt;
+ 	u32 flags;
+ 	struct {
+ 		u32 id;
+ 		u32 flags;
+ 	} pairs[];
+ };
+
  #ifdef CONFIG_DEBUG_INFO_BTF

  #include <linux/compiler.h> /* for __PASTE */
···
  #define BTF_IDS_SECTION ".BTF_ids"

- #define ____BTF_ID(symbol)				\
+ #define ____BTF_ID(symbol, word)			\
  asm(							\
  ".pushsection " BTF_IDS_SECTION ",\"a\";	\n"	\
  ".local " #symbol " ;				\n"	\
···
  ".size " #symbol ", 4;				\n"	\
  #symbol ":					\n"	\
  ".zero 4					\n"	\
+ word						\
  ".popsection;					\n");

- #define __BTF_ID(symbol)				\
- 	____BTF_ID(symbol)
+ #define __BTF_ID(symbol, word)			\
+ 	____BTF_ID(symbol, word)

  #define __ID(prefix)					\
  	__PASTE(prefix, __COUNTER__)
···
   * to 4 zero bytes.
   */
  #define BTF_ID(prefix, name)				\
- 	__BTF_ID(__ID(__BTF_ID__##prefix##__##name##__))
+ 	__BTF_ID(__ID(__BTF_ID__##prefix##__##name##__), "")
+
+ #define ____BTF_ID_FLAGS(prefix, name, flags)		\
+ 	__BTF_ID(__ID(__BTF_ID__##prefix##__##name##__), ".long " #flags "\n")
+ #define __BTF_ID_FLAGS(prefix, name, flags, ...)	\
+ 	____BTF_ID_FLAGS(prefix, name, flags)
+ #define BTF_ID_FLAGS(prefix, name, ...)			\
+ 	__BTF_ID_FLAGS(prefix, name, ##__VA_ARGS__, 0)

  /*
   * The BTF_ID_LIST macro defines pure (unsorted) list
···
  ".popsection;					\n");	\
  extern struct btf_id_set name;

+ /*
+  * The BTF_SET8_START/END macro pair defines a sorted list of
+  * BTF IDs and their flags, plus its member count, with the
+  * following layout:
+  *
+  * BTF_SET8_START(list)
+  * BTF_ID_FLAGS(type1, name1, flags)
+  * BTF_ID_FLAGS(type2, name2, flags)
+  * BTF_SET8_END(list)
+  *
+  * __BTF_ID__set8__list:
+  * .zero 8
+  * list:
+  * __BTF_ID__type1__name1__3:
+  * .zero 4
+  * .word (1 << 0) | (1 << 2)
+  * __BTF_ID__type2__name2__5:
+  * .zero 4
+  * .word (1 << 3) | (1 << 1) | (1 << 2)
+  */
+ #define __BTF_SET8_START(name, scope)			\
+ asm(							\
+ ".pushsection " BTF_IDS_SECTION ",\"a\";	\n"	\
+ "." #scope " __BTF_ID__set8__" #name ";	\n"	\
+ "__BTF_ID__set8__" #name ":;			\n"	\
+ ".zero 8					\n"	\
+ ".popsection;					\n");
+
+ #define BTF_SET8_START(name)				\
+ __BTF_ID_LIST(name, local)				\
+ __BTF_SET8_START(name, local)
+
+ #define BTF_SET8_END(name)				\
+ asm(							\
+ ".pushsection " BTF_IDS_SECTION ",\"a\";	\n"	\
+ ".size __BTF_ID__set8__" #name ", .-" #name "	\n"	\
+ ".popsection;					\n");	\
+ extern struct btf_id_set8 name;
+
  #else

  #define BTF_ID_LIST(name) static u32 __maybe_unused name[5];
  #define BTF_ID(prefix, name)
+ #define BTF_ID_FLAGS(prefix, name, ...)
  #define BTF_ID_UNUSED
  #define BTF_ID_LIST_GLOBAL(name, n) u32 __maybe_unused name[n];
  #define BTF_ID_LIST_SINGLE(name, prefix, typename) static u32 __maybe_unused name[1];
···
  #define BTF_SET_START(name) static struct btf_id_set __maybe_unused name = { 0 };
  #define BTF_SET_START_GLOBAL(name) static struct btf_id_set __maybe_unused name = { 0 };
  #define BTF_SET_END(name)
+ #define BTF_SET8_START(name) static struct btf_id_set8 __maybe_unused name = { 0 };
+ #define BTF_SET8_END(name)

  #endif /* CONFIG_DEBUG_INFO_BTF */
+8
include/linux/filter.h
···
  void *bpf_jit_alloc_exec(unsigned long size);
  void bpf_jit_free_exec(void *addr);
  void bpf_jit_free(struct bpf_prog *fp);
+ struct bpf_binary_header *
+ bpf_jit_binary_pack_hdr(const struct bpf_prog *fp);
+
+ static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
+ {
+ 	return list_empty(&fp->aux->ksym.lnode) ||
+ 	       fp->aux->ksym.lnode.prev == LIST_POISON2;
+ }

  struct bpf_binary_header *
  bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **ro_image,
+43
include/linux/ftrace.h
···
  	FTRACE_OPS_FL_DIRECT		= BIT(17),
  };

+ /*
+  * FTRACE_OPS_CMD_* commands allow the ftrace core logic to request changes
+  * to a ftrace_ops. Note, the requests may fail.
+  *
+  * ENABLE_SHARE_IPMODIFY_SELF - enable a DIRECT ops to work on the same
+  *                              function as an ops with IPMODIFY. Called
+  *                              when the DIRECT ops is being registered.
+  *                              This is called with both direct_mutex and
+  *                              ftrace_lock locked.
+  *
+  * ENABLE_SHARE_IPMODIFY_PEER - enable a DIRECT ops to work on the same
+  *                              function as an ops with IPMODIFY. Called
+  *                              when the other ops (the one with IPMODIFY)
+  *                              is being registered.
+  *                              This is called with direct_mutex locked.
+  *
+  * DISABLE_SHARE_IPMODIFY_PEER - disable a DIRECT ops from working on the
+  *                               same function as an ops with IPMODIFY.
+  *                               Called when the other ops (the one with
+  *                               IPMODIFY) is being unregistered.
+  *                               This is called with direct_mutex locked.
+  */
+ enum ftrace_ops_cmd {
+ 	FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF,
+ 	FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER,
+ 	FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER,
+ };
+
+ /*
+  * For most ftrace_ops_cmd commands:
+  * Returns:
+  *        0 - Success.
+  *        Negative on failure; the exact value depends on the callback.
+  */
+ typedef int (*ftrace_ops_func_t)(struct ftrace_ops *op, enum ftrace_ops_cmd cmd);
+
  #ifdef CONFIG_DYNAMIC_FTRACE
  /* The hash used to know what functions callbacks trace */
  struct ftrace_ops_hash {
···
  	unsigned long trampoline;
  	unsigned long trampoline_size;
  	struct list_head list;
+ 	ftrace_ops_func_t ops_func;
  #endif
  };
···
  int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
  int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
  int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
+ int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr);

  #else
  struct ftrace_ops;
···
  	return -ENODEV;
  }
  static inline int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
+ {
+ 	return -ENODEV;
+ }
+ static inline int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr)
  {
  	return -ENODEV;
  }
+8
include/linux/skbuff.h
···

  #endif /* NET_SKBUFF_DATA_USES_OFFSET */

+ static inline void skb_assert_len(struct sk_buff *skb)
+ {
+ #ifdef CONFIG_DEBUG_NET
+ 	if (WARN_ONCE(!skb->len, "%s\n", __func__))
+ 		DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
+ #endif /* CONFIG_DEBUG_NET */
+ }
+
  /*
   *	Add data to an sk_buff
   */
+19
include/net/netfilter/nf_conntrack_core.h
··· 84 84 85 85 extern spinlock_t nf_conntrack_expect_lock; 86 86 87 + /* ctnetlink code shared by both ctnetlink and nf_conntrack_bpf */ 88 + 89 + #if (IS_BUILTIN(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \ 90 + (IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES) || \ 91 + IS_ENABLED(CONFIG_NF_CT_NETLINK)) 92 + 93 + static inline void __nf_ct_set_timeout(struct nf_conn *ct, u64 timeout) 94 + { 95 + if (timeout > INT_MAX) 96 + timeout = INT_MAX; 97 + WRITE_ONCE(ct->timeout, nfct_time_stamp + (u32)timeout); 98 + } 99 + 100 + int __nf_ct_change_timeout(struct nf_conn *ct, u64 cta_timeout); 101 + void __nf_ct_change_status(struct nf_conn *ct, unsigned long on, unsigned long off); 102 + int nf_ct_change_status_common(struct nf_conn *ct, unsigned int status); 103 + 104 + #endif 105 + 87 106 #endif /* _NF_CONNTRACK_CORE_H */
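`__nf_ct_set_timeout()` above clamps a caller-supplied 64-bit timeout to INT_MAX before folding it into the 32-bit expiry. The clamp is easy to check in isolation; in this userspace sketch `nfct_time_stamp` is replaced by a plain `now` argument:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Mirror of the clamp in __nf_ct_set_timeout(): a u64 timeout from
 * BPF/ctnetlink is capped at INT_MAX so the u32 addition cannot push
 * the expiry absurdly far relative to the current time stamp. */
uint32_t ct_timeout_model(uint32_t now, uint64_t timeout)
{
	if (timeout > INT_MAX)
		timeout = INT_MAX;
	return now + (uint32_t)timeout;
}
```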
+14
include/net/xdp_sock_drv.h
··· 44 44 xp_set_rxq_info(pool, rxq); 45 45 } 46 46 47 + static inline unsigned int xsk_pool_get_napi_id(struct xsk_buff_pool *pool) 48 + { 49 + #ifdef CONFIG_NET_RX_BUSY_POLL 50 + return pool->heads[0].xdp.rxq->napi_id; 51 + #else 52 + return 0; 53 + #endif 54 + } 55 + 47 56 static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool, 48 57 unsigned long attrs) 49 58 { ··· 205 196 static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool, 206 197 struct xdp_rxq_info *rxq) 207 198 { 199 + } 200 + 201 + static inline unsigned int xsk_pool_get_napi_id(struct xsk_buff_pool *pool) 202 + { 203 + return 0; 208 204 } 209 205 210 206 static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
+2 -1
include/uapi/linux/bpf.h
··· 2361 2361 * Pull in non-linear data in case the *skb* is non-linear and not 2362 2362 * all of *len* are part of the linear section. Make *len* bytes 2363 2363 * from *skb* readable and writable. If a zero value is passed for 2364 - * *len*, then the whole length of the *skb* is pulled. 2364 + * *len*, then all bytes in the linear part of *skb* will be made 2365 + * readable and writable. 2365 2366 * 2366 2367 * This helper is only needed for reading and writing with direct 2367 2368 * packet access.
+22 -18
kernel/bpf/arraymap.c
··· 70 70 attr->map_flags & BPF_F_PRESERVE_ELEMS) 71 71 return -EINVAL; 72 72 73 - if (attr->value_size > KMALLOC_MAX_SIZE) 74 - /* if value_size is bigger, the user space won't be able to 75 - * access the elements. 76 - */ 73 + /* avoid overflow on round_up(map->value_size) */ 74 + if (attr->value_size > INT_MAX) 77 75 return -E2BIG; 78 76 79 77 return 0; ··· 154 156 return &array->map; 155 157 } 156 158 159 + static void *array_map_elem_ptr(struct bpf_array* array, u32 index) 160 + { 161 + return array->value + (u64)array->elem_size * index; 162 + } 163 + 157 164 /* Called from syscall or from eBPF program */ 158 165 static void *array_map_lookup_elem(struct bpf_map *map, void *key) 159 166 { ··· 168 165 if (unlikely(index >= array->map.max_entries)) 169 166 return NULL; 170 167 171 - return array->value + array->elem_size * (index & array->index_mask); 168 + return array->value + (u64)array->elem_size * (index & array->index_mask); 172 169 } 173 170 174 171 static int array_map_direct_value_addr(const struct bpf_map *map, u64 *imm, ··· 206 203 { 207 204 struct bpf_array *array = container_of(map, struct bpf_array, map); 208 205 struct bpf_insn *insn = insn_buf; 209 - u32 elem_size = round_up(map->value_size, 8); 206 + u32 elem_size = array->elem_size; 210 207 const int ret = BPF_REG_0; 211 208 const int map_ptr = BPF_REG_1; 212 209 const int index = BPF_REG_2; ··· 275 272 * access 'value_size' of them, so copying rounded areas 276 273 * will not leak any kernel data 277 274 */ 278 - size = round_up(map->value_size, 8); 275 + size = array->elem_size; 279 276 rcu_read_lock(); 280 277 pptr = array->pptrs[index & array->index_mask]; 281 278 for_each_possible_cpu(cpu) { ··· 342 339 value, map->value_size); 343 340 } else { 344 341 val = array->value + 345 - array->elem_size * (index & array->index_mask); 342 + (u64)array->elem_size * (index & array->index_mask); 346 343 if (map_flags & BPF_F_LOCK) 347 344 copy_map_value_locked(map, val, value, false); 348 345 else 
··· 379 376 * returned or zeros which were zero-filled by percpu_alloc, 380 377 * so no kernel data leaks possible 381 378 */ 382 - size = round_up(map->value_size, 8); 379 + size = array->elem_size; 383 380 rcu_read_lock(); 384 381 pptr = array->pptrs[index & array->index_mask]; 385 382 for_each_possible_cpu(cpu) { ··· 411 408 return; 412 409 413 410 for (i = 0; i < array->map.max_entries; i++) 414 - bpf_timer_cancel_and_free(array->value + array->elem_size * i + 415 - map->timer_off); 411 + bpf_timer_cancel_and_free(array_map_elem_ptr(array, i) + map->timer_off); 416 412 } 417 413 418 414 /* Called when map->refcnt goes to zero, either from workqueue or from syscall */ ··· 422 420 423 421 if (map_value_has_kptrs(map)) { 424 422 for (i = 0; i < array->map.max_entries; i++) 425 - bpf_map_free_kptrs(map, array->value + array->elem_size * i); 423 + bpf_map_free_kptrs(map, array_map_elem_ptr(array, i)); 426 424 bpf_map_free_kptr_off_tab(map); 427 425 } 428 426 ··· 558 556 index = info->index & array->index_mask; 559 557 if (info->percpu_value_buf) 560 558 return array->pptrs[index]; 561 - return array->value + array->elem_size * index; 559 + return array_map_elem_ptr(array, index); 562 560 } 563 561 564 562 static void *bpf_array_map_seq_next(struct seq_file *seq, void *v, loff_t *pos) ··· 577 575 index = info->index & array->index_mask; 578 576 if (info->percpu_value_buf) 579 577 return array->pptrs[index]; 580 - return array->value + array->elem_size * index; 578 + return array_map_elem_ptr(array, index); 581 579 } 582 580 583 581 static int __bpf_array_map_seq_show(struct seq_file *seq, void *v) ··· 585 583 struct bpf_iter_seq_array_map_info *info = seq->private; 586 584 struct bpf_iter__bpf_map_elem ctx = {}; 587 585 struct bpf_map *map = info->map; 586 + struct bpf_array *array = container_of(map, struct bpf_array, map); 588 587 struct bpf_iter_meta meta; 589 588 struct bpf_prog *prog; 590 589 int off = 0, cpu = 0; ··· 606 603 ctx.value = v; 607 604 } else { 608 
605 pptr = v; 609 - size = round_up(map->value_size, 8); 606 + size = array->elem_size; 610 607 for_each_possible_cpu(cpu) { 611 608 bpf_long_memcpy(info->percpu_value_buf + off, 612 609 per_cpu_ptr(pptr, cpu), ··· 636 633 { 637 634 struct bpf_iter_seq_array_map_info *seq_info = priv_data; 638 635 struct bpf_map *map = aux->map; 636 + struct bpf_array *array = container_of(map, struct bpf_array, map); 639 637 void *value_buf; 640 638 u32 buf_size; 641 639 642 640 if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { 643 - buf_size = round_up(map->value_size, 8) * num_possible_cpus(); 641 + buf_size = array->elem_size * num_possible_cpus(); 644 642 value_buf = kmalloc(buf_size, GFP_USER | __GFP_NOWARN); 645 643 if (!value_buf) 646 644 return -ENOMEM; ··· 694 690 if (is_percpu) 695 691 val = this_cpu_ptr(array->pptrs[i]); 696 692 else 697 - val = array->value + array->elem_size * i; 693 + val = array_map_elem_ptr(array, i); 698 694 num_elems++; 699 695 key = i; 700 696 ret = callback_fn((u64)(long)map, (u64)(long)&key, ··· 1326 1322 struct bpf_insn *insn_buf) 1327 1323 { 1328 1324 struct bpf_array *array = container_of(map, struct bpf_array, map); 1329 - u32 elem_size = round_up(map->value_size, 8); 1325 + u32 elem_size = array->elem_size; 1330 1326 struct bpf_insn *insn = insn_buf; 1331 1327 const int ret = BPF_REG_0; 1332 1328 const int map_ptr = BPF_REG_1;
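The recurring change in arraymap.c is the `(u64)` cast on `elem_size * index`: now that `value_size` may be as large as INT_MAX, the old u32 multiply could wrap. A quick userspace check of the two expressions (the sizes below are arbitrary, chosen only to trigger the wrap):

```c
#include <assert.h>
#include <stdint.h>

/* Offset math as arraymap.c now does it: widen before multiplying. */
uint64_t elem_off_fixed(uint32_t elem_size, uint32_t index)
{
	return (uint64_t)elem_size * index;
}

/* The pre-patch arithmetic: a u32 product that silently wraps at 2^32
 * and is only widened after the damage is done. */
uint64_t elem_off_wrapping(uint32_t elem_size, uint32_t index)
{
	return elem_size * index;
}
```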
+6 -2
kernel/bpf/bpf_lsm.c
··· 63 63 BTF_ID(func, bpf_lsm_socket_socketpair) 64 64 BTF_SET_END(bpf_lsm_unlocked_sockopt_hooks) 65 65 66 + #ifdef CONFIG_CGROUP_BPF 66 67 void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, 67 68 bpf_func_t *bpf_func) 68 69 { 69 - const struct btf_param *args; 70 + const struct btf_param *args __maybe_unused; 70 71 71 72 if (btf_type_vlen(prog->aux->attach_func_proto) < 1 || 72 73 btf_id_set_contains(&bpf_lsm_current_hooks, ··· 76 75 return; 77 76 } 78 77 78 + #ifdef CONFIG_NET 79 79 args = btf_params(prog->aux->attach_func_proto); 80 80 81 - #ifdef CONFIG_NET 82 81 if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCKET]) 83 82 *bpf_func = __cgroup_bpf_run_lsm_socket; 84 83 else if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCK]) ··· 87 86 #endif 88 87 *bpf_func = __cgroup_bpf_run_lsm_current; 89 88 } 89 + #endif 90 90 91 91 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, 92 92 const struct bpf_prog *prog) ··· 221 219 case BPF_FUNC_get_retval: 222 220 return prog->expected_attach_type == BPF_LSM_CGROUP ? 223 221 &bpf_get_retval_proto : NULL; 222 + #ifdef CONFIG_NET 224 223 case BPF_FUNC_setsockopt: 225 224 if (prog->expected_attach_type != BPF_LSM_CGROUP) 226 225 return NULL; ··· 242 239 prog->aux->attach_btf_id)) 243 240 return &bpf_unlocked_sk_getsockopt_proto; 244 241 return NULL; 242 + #endif 245 243 default: 246 244 return tracing_prog_func_proto(func_id, prog); 247 245 }
+3
kernel/bpf/bpf_struct_ops.c
··· 341 341 342 342 tlinks[BPF_TRAMP_FENTRY].links[0] = link; 343 343 tlinks[BPF_TRAMP_FENTRY].nr_links = 1; 344 + /* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops, 345 + * and it must be used alone. 346 + */ 344 347 flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0; 345 348 return arch_prepare_bpf_trampoline(NULL, image, image_end, 346 349 model, flags, tlinks, NULL);
+64 -62
kernel/bpf/btf.c
··· 213 213 }; 214 214 215 215 struct btf_kfunc_set_tab { 216 - struct btf_id_set *sets[BTF_KFUNC_HOOK_MAX][BTF_KFUNC_TYPE_MAX]; 216 + struct btf_id_set8 *sets[BTF_KFUNC_HOOK_MAX]; 217 217 }; 218 218 219 219 struct btf_id_dtor_kfunc_tab { ··· 1116 1116 */ 1117 1117 #define btf_show_type_value(show, fmt, value) \ 1118 1118 do { \ 1119 - if ((value) != 0 || (show->flags & BTF_SHOW_ZERO) || \ 1119 + if ((value) != (__typeof__(value))0 || \ 1120 + (show->flags & BTF_SHOW_ZERO) || \ 1120 1121 show->state.depth == 0) { \ 1121 1122 btf_show(show, "%s%s" fmt "%s%s", \ 1122 1123 btf_show_indent(show), \ ··· 1616 1615 static void btf_free_kfunc_set_tab(struct btf *btf) 1617 1616 { 1618 1617 struct btf_kfunc_set_tab *tab = btf->kfunc_set_tab; 1619 - int hook, type; 1618 + int hook; 1620 1619 1621 1620 if (!tab) 1622 1621 return; ··· 1625 1624 */ 1626 1625 if (btf_is_module(btf)) 1627 1626 goto free_tab; 1628 - for (hook = 0; hook < ARRAY_SIZE(tab->sets); hook++) { 1629 - for (type = 0; type < ARRAY_SIZE(tab->sets[0]); type++) 1630 - kfree(tab->sets[hook][type]); 1631 - } 1627 + for (hook = 0; hook < ARRAY_SIZE(tab->sets); hook++) 1628 + kfree(tab->sets[hook]); 1632 1629 free_tab: 1633 1630 kfree(tab); 1634 1631 btf->kfunc_set_tab = NULL; ··· 6170 6171 static int btf_check_func_arg_match(struct bpf_verifier_env *env, 6171 6172 const struct btf *btf, u32 func_id, 6172 6173 struct bpf_reg_state *regs, 6173 - bool ptr_to_mem_ok) 6174 + bool ptr_to_mem_ok, 6175 + u32 kfunc_flags) 6174 6176 { 6175 6177 enum bpf_prog_type prog_type = resolve_prog_type(env->prog); 6178 + bool rel = false, kptr_get = false, trusted_arg = false; 6176 6179 struct bpf_verifier_log *log = &env->log; 6177 6180 u32 i, nargs, ref_id, ref_obj_id = 0; 6178 6181 bool is_kfunc = btf_is_kernel(btf); 6179 - bool rel = false, kptr_get = false; 6180 6182 const char *func_name, *ref_tname; 6181 6183 const struct btf_type *t, *ref_t; 6182 6184 const struct btf_param *args; ··· 6209 6209 6210 6210 if (is_kfunc) { 6211 
6211 /* Only kfunc can be release func */ 6212 - rel = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog), 6213 - BTF_KFUNC_TYPE_RELEASE, func_id); 6214 - kptr_get = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog), 6215 - BTF_KFUNC_TYPE_KPTR_ACQUIRE, func_id); 6212 + rel = kfunc_flags & KF_RELEASE; 6213 + kptr_get = kfunc_flags & KF_KPTR_GET; 6214 + trusted_arg = kfunc_flags & KF_TRUSTED_ARGS; 6216 6215 } 6217 6216 6218 6217 /* check that BTF function arguments match actual types that the ··· 6236 6237 return -EINVAL; 6237 6238 } 6238 6239 6240 + /* Check if argument must be a referenced pointer, args + i has 6241 + * been verified to be a pointer (after skipping modifiers). 6242 + */ 6243 + if (is_kfunc && trusted_arg && !reg->ref_obj_id) { 6244 + bpf_log(log, "R%d must be referenced\n", regno); 6245 + return -EINVAL; 6246 + } 6247 + 6239 6248 ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id); 6240 6249 ref_tname = btf_name_by_offset(btf, ref_t->name_off); 6241 6250 6242 - if (rel && reg->ref_obj_id) 6251 + /* Trusted args have the same offset checks as release arguments */ 6252 + if (trusted_arg || (rel && reg->ref_obj_id)) 6243 6253 arg_type |= OBJ_RELEASE; 6244 6254 ret = check_func_arg_reg_off(env, reg, regno, arg_type); 6245 6255 if (ret < 0) ··· 6346 6338 reg_ref_tname = btf_name_by_offset(reg_btf, 6347 6339 reg_ref_t->name_off); 6348 6340 if (!btf_struct_ids_match(log, reg_btf, reg_ref_id, 6349 - reg->off, btf, ref_id, rel && reg->ref_obj_id)) { 6341 + reg->off, btf, ref_id, 6342 + trusted_arg || (rel && reg->ref_obj_id))) { 6350 6343 bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n", 6351 6344 func_name, i, 6352 6345 btf_type_str(ref_t), ref_tname, ··· 6450 6441 return -EINVAL; 6451 6442 6452 6443 is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL; 6453 - err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global); 6444 + err = 
btf_check_func_arg_match(env, btf, btf_id, regs, is_global, 0); 6454 6445 6455 6446 /* Compiler optimizations can remove arguments from static functions 6456 6447 * or mismatched type can be passed into a global function. ··· 6463 6454 6464 6455 int btf_check_kfunc_arg_match(struct bpf_verifier_env *env, 6465 6456 const struct btf *btf, u32 func_id, 6466 - struct bpf_reg_state *regs) 6457 + struct bpf_reg_state *regs, 6458 + u32 kfunc_flags) 6467 6459 { 6468 - return btf_check_func_arg_match(env, btf, func_id, regs, true); 6460 + return btf_check_func_arg_match(env, btf, func_id, regs, true, kfunc_flags); 6469 6461 } 6470 6462 6471 6463 /* Convert BTF of a function into bpf_reg_state if possible ··· 6863 6853 return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL; 6864 6854 } 6865 6855 6856 + static void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id) 6857 + { 6858 + return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func); 6859 + } 6860 + 6866 6861 enum { 6867 6862 BTF_MODULE_F_LIVE = (1 << 0), 6868 6863 }; ··· 7116 7101 7117 7102 /* Kernel Function (kfunc) BTF ID set registration API */ 7118 7103 7119 - static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, 7120 - enum btf_kfunc_type type, 7121 - struct btf_id_set *add_set, bool vmlinux_set) 7104 + static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, 7105 + struct btf_id_set8 *add_set) 7122 7106 { 7107 + bool vmlinux_set = !btf_is_module(btf); 7123 7108 struct btf_kfunc_set_tab *tab; 7124 - struct btf_id_set *set; 7109 + struct btf_id_set8 *set; 7125 7110 u32 set_cnt; 7126 7111 int ret; 7127 7112 7128 - if (hook >= BTF_KFUNC_HOOK_MAX || type >= BTF_KFUNC_TYPE_MAX) { 7113 + if (hook >= BTF_KFUNC_HOOK_MAX) { 7129 7114 ret = -EINVAL; 7130 7115 goto end; 7131 7116 } ··· 7141 7126 btf->kfunc_set_tab = tab; 7142 7127 } 7143 7128 7144 - set = tab->sets[hook][type]; 7129 + set = tab->sets[hook]; 7145 7130 /* Warn 
when register_btf_kfunc_id_set is called twice for the same hook 7146 7131 * for module sets. 7147 7132 */ ··· 7155 7140 * pointer and return. 7156 7141 */ 7157 7142 if (!vmlinux_set) { 7158 - tab->sets[hook][type] = add_set; 7143 + tab->sets[hook] = add_set; 7159 7144 return 0; 7160 7145 } 7161 7146 ··· 7164 7149 * and concatenate all individual sets being registered. While each set 7165 7150 * is individually sorted, they may become unsorted when concatenated, 7166 7151 * hence re-sorting the final set again is required to make binary 7167 - * searching the set using btf_id_set_contains function work. 7152 + * searching the set using btf_id_set8_contains function work. 7168 7153 */ 7169 7154 set_cnt = set ? set->cnt : 0; 7170 7155 ··· 7179 7164 } 7180 7165 7181 7166 /* Grow set */ 7182 - set = krealloc(tab->sets[hook][type], 7183 - offsetof(struct btf_id_set, ids[set_cnt + add_set->cnt]), 7167 + set = krealloc(tab->sets[hook], 7168 + offsetof(struct btf_id_set8, pairs[set_cnt + add_set->cnt]), 7184 7169 GFP_KERNEL | __GFP_NOWARN); 7185 7170 if (!set) { 7186 7171 ret = -ENOMEM; ··· 7188 7173 } 7189 7174 7190 7175 /* For newly allocated set, initialize set->cnt to 0 */ 7191 - if (!tab->sets[hook][type]) 7176 + if (!tab->sets[hook]) 7192 7177 set->cnt = 0; 7193 - tab->sets[hook][type] = set; 7178 + tab->sets[hook] = set; 7194 7179 7195 7180 /* Concatenate the two sets */ 7196 - memcpy(set->ids + set->cnt, add_set->ids, add_set->cnt * sizeof(set->ids[0])); 7181 + memcpy(set->pairs + set->cnt, add_set->pairs, add_set->cnt * sizeof(set->pairs[0])); 7197 7182 set->cnt += add_set->cnt; 7198 7183 7199 - sort(set->ids, set->cnt, sizeof(set->ids[0]), btf_id_cmp_func, NULL); 7184 + sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL); 7200 7185 7201 7186 return 0; 7202 7187 end: ··· 7204 7189 return ret; 7205 7190 } 7206 7191 7207 - static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, 7208 - const struct btf_kfunc_id_set *kset) 
7209 - { 7210 - bool vmlinux_set = !btf_is_module(btf); 7211 - int type, ret = 0; 7212 - 7213 - for (type = 0; type < ARRAY_SIZE(kset->sets); type++) { 7214 - if (!kset->sets[type]) 7215 - continue; 7216 - 7217 - ret = __btf_populate_kfunc_set(btf, hook, type, kset->sets[type], vmlinux_set); 7218 - if (ret) 7219 - break; 7220 - } 7221 - return ret; 7222 - } 7223 - 7224 - static bool __btf_kfunc_id_set_contains(const struct btf *btf, 7192 + static u32 *__btf_kfunc_id_set_contains(const struct btf *btf, 7225 7193 enum btf_kfunc_hook hook, 7226 - enum btf_kfunc_type type, 7227 7194 u32 kfunc_btf_id) 7228 7195 { 7229 - struct btf_id_set *set; 7196 + struct btf_id_set8 *set; 7197 + u32 *id; 7230 7198 7231 - if (hook >= BTF_KFUNC_HOOK_MAX || type >= BTF_KFUNC_TYPE_MAX) 7232 - return false; 7199 + if (hook >= BTF_KFUNC_HOOK_MAX) 7200 + return NULL; 7233 7201 if (!btf->kfunc_set_tab) 7234 - return false; 7235 - set = btf->kfunc_set_tab->sets[hook][type]; 7202 + return NULL; 7203 + set = btf->kfunc_set_tab->sets[hook]; 7236 7204 if (!set) 7237 - return false; 7238 - return btf_id_set_contains(set, kfunc_btf_id); 7205 + return NULL; 7206 + id = btf_id_set8_contains(set, kfunc_btf_id); 7207 + if (!id) 7208 + return NULL; 7209 + /* The flags for BTF ID are located next to it */ 7210 + return id + 1; 7239 7211 } 7240 7212 7241 7213 static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type) ··· 7250 7248 * keeping the reference for the duration of the call provides the necessary 7251 7249 * protection for looking up a well-formed btf->kfunc_set_tab. 
7252 7250 */ 7253 - bool btf_kfunc_id_set_contains(const struct btf *btf, 7251 + u32 *btf_kfunc_id_set_contains(const struct btf *btf, 7254 7252 enum bpf_prog_type prog_type, 7255 - enum btf_kfunc_type type, u32 kfunc_btf_id) 7253 + u32 kfunc_btf_id) 7256 7254 { 7257 7255 enum btf_kfunc_hook hook; 7258 7256 7259 7257 hook = bpf_prog_type_to_kfunc_hook(prog_type); 7260 - return __btf_kfunc_id_set_contains(btf, hook, type, kfunc_btf_id); 7258 + return __btf_kfunc_id_set_contains(btf, hook, kfunc_btf_id); 7261 7259 } 7262 7260 7263 7261 /* This function must be invoked only from initcalls/module init functions */ ··· 7284 7282 return PTR_ERR(btf); 7285 7283 7286 7284 hook = bpf_prog_type_to_kfunc_hook(prog_type); 7287 - ret = btf_populate_kfunc_set(btf, hook, kset); 7285 + ret = btf_populate_kfunc_set(btf, hook, kset->set); 7288 7286 btf_put(btf); 7289 7287 return ret; 7290 7288 }
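The btf.c rework above replaces the per-type set matrix with a single sorted array of (BTF ID, flags) pairs: lookups bsearch on the ID alone and, on a hit, read the flags word sitting right after it (the "`id + 1`" trick). A standalone sketch of that layout — the `KF_RELEASE` value here is illustrative, not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One (BTF id, kfunc flags) pair, two adjacent u32s, so a pointer to
 * the id can be bumped by one element to reach the flags. */
struct id_flags_pair { uint32_t id; uint32_t flags; };

#define KF_RELEASE (1u << 1)   /* illustrative flag value */

/* Like the kernel's btf_id_cmp_func: compare only the leading id. */
static int id_cmp(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
	return x < y ? -1 : x > y;
}

/* Returns a pointer to the flags of @id, or NULL if absent --
 * the same contract as the new __btf_kfunc_id_set_contains(). */
uint32_t *set8_contains(struct id_flags_pair *pairs, size_t cnt, uint32_t id)
{
	uint32_t *found = bsearch(&id, pairs, cnt, sizeof(*pairs), id_cmp);
	return found ? found + 1 : NULL;
}
```

Because the element stride is the full pair but the comparator reads only the first u32, one sorted array answers both "is this a kfunc for this hook?" and "with which flags?" in a single search.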
+29 -71
kernel/bpf/core.c
··· 652 652 return fp->jited && !bpf_prog_was_classic(fp); 653 653 } 654 654 655 - static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp) 656 - { 657 - return list_empty(&fp->aux->ksym.lnode) || 658 - fp->aux->ksym.lnode.prev == LIST_POISON2; 659 - } 660 - 661 655 void bpf_prog_kallsyms_add(struct bpf_prog *fp) 662 656 { 663 657 if (!bpf_prog_kallsyms_candidate(fp) || ··· 827 833 828 834 #define BPF_PROG_SIZE_TO_NBITS(size) (round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE) 829 835 830 - static size_t bpf_prog_pack_size = -1; 831 - static size_t bpf_prog_pack_mask = -1; 832 - 833 - static int bpf_prog_chunk_count(void) 834 - { 835 - WARN_ON_ONCE(bpf_prog_pack_size == -1); 836 - return bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE; 837 - } 838 - 839 836 static DEFINE_MUTEX(pack_mutex); 840 837 static LIST_HEAD(pack_list); 841 838 ··· 834 849 * CONFIG_MMU=n. Use PAGE_SIZE in these cases. 835 850 */ 836 851 #ifdef PMD_SIZE 837 - #define BPF_HPAGE_SIZE PMD_SIZE 838 - #define BPF_HPAGE_MASK PMD_MASK 852 + #define BPF_PROG_PACK_SIZE (PMD_SIZE * num_possible_nodes()) 839 853 #else 840 - #define BPF_HPAGE_SIZE PAGE_SIZE 841 - #define BPF_HPAGE_MASK PAGE_MASK 854 + #define BPF_PROG_PACK_SIZE PAGE_SIZE 842 855 #endif 843 856 844 - static size_t select_bpf_prog_pack_size(void) 845 - { 846 - size_t size; 847 - void *ptr; 848 - 849 - size = BPF_HPAGE_SIZE * num_online_nodes(); 850 - ptr = module_alloc(size); 851 - 852 - /* Test whether we can get huge pages. If not just use PAGE_SIZE 853 - * packs. 
854 - */ 855 - if (!ptr || !is_vm_area_hugepages(ptr)) { 856 - size = PAGE_SIZE; 857 - bpf_prog_pack_mask = PAGE_MASK; 858 - } else { 859 - bpf_prog_pack_mask = BPF_HPAGE_MASK; 860 - } 861 - 862 - vfree(ptr); 863 - return size; 864 - } 857 + #define BPF_PROG_CHUNK_COUNT (BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE) 865 858 866 859 static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_insns) 867 860 { 868 861 struct bpf_prog_pack *pack; 869 862 870 - pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(bpf_prog_chunk_count())), 863 + pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(BPF_PROG_CHUNK_COUNT)), 871 864 GFP_KERNEL); 872 865 if (!pack) 873 866 return NULL; 874 - pack->ptr = module_alloc(bpf_prog_pack_size); 867 + pack->ptr = module_alloc(BPF_PROG_PACK_SIZE); 875 868 if (!pack->ptr) { 876 869 kfree(pack); 877 870 return NULL; 878 871 } 879 - bpf_fill_ill_insns(pack->ptr, bpf_prog_pack_size); 880 - bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE); 872 + bpf_fill_ill_insns(pack->ptr, BPF_PROG_PACK_SIZE); 873 + bitmap_zero(pack->bitmap, BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE); 881 874 list_add_tail(&pack->list, &pack_list); 882 875 883 876 set_vm_flush_reset_perms(pack->ptr); 884 - set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); 885 - set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); 877 + set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE); 878 + set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE); 886 879 return pack; 887 880 } 888 881 ··· 872 909 void *ptr = NULL; 873 910 874 911 mutex_lock(&pack_mutex); 875 - if (bpf_prog_pack_size == -1) 876 - bpf_prog_pack_size = select_bpf_prog_pack_size(); 877 - 878 - if (size > bpf_prog_pack_size) { 912 + if (size > BPF_PROG_PACK_SIZE) { 879 913 size = round_up(size, PAGE_SIZE); 880 914 ptr = module_alloc(size); 881 915 if (ptr) { ··· 884 924 goto out; 885 925 } 886 926 
list_for_each_entry(pack, &pack_list, list) { 887 - pos = bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0, 927 + pos = bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0, 888 928 nbits, 0); 889 - if (pos < bpf_prog_chunk_count()) 929 + if (pos < BPF_PROG_CHUNK_COUNT) 890 930 goto found_free_area; 891 931 } 892 932 ··· 910 950 struct bpf_prog_pack *pack = NULL, *tmp; 911 951 unsigned int nbits; 912 952 unsigned long pos; 913 - void *pack_ptr; 914 953 915 954 mutex_lock(&pack_mutex); 916 - if (hdr->size > bpf_prog_pack_size) { 955 + if (hdr->size > BPF_PROG_PACK_SIZE) { 917 956 module_memfree(hdr); 918 957 goto out; 919 958 } 920 959 921 - pack_ptr = (void *)((unsigned long)hdr & bpf_prog_pack_mask); 922 - 923 960 list_for_each_entry(tmp, &pack_list, list) { 924 - if (tmp->ptr == pack_ptr) { 961 + if ((void *)hdr >= tmp->ptr && (tmp->ptr + BPF_PROG_PACK_SIZE) > (void *)hdr) { 925 962 pack = tmp; 926 963 break; 927 964 } ··· 928 971 goto out; 929 972 930 973 nbits = BPF_PROG_SIZE_TO_NBITS(hdr->size); 931 - pos = ((unsigned long)hdr - (unsigned long)pack_ptr) >> BPF_PROG_CHUNK_SHIFT; 974 + pos = ((unsigned long)hdr - (unsigned long)pack->ptr) >> BPF_PROG_CHUNK_SHIFT; 932 975 933 976 WARN_ONCE(bpf_arch_text_invalidate(hdr, hdr->size), 934 977 "bpf_prog_pack bug: missing bpf_arch_text_invalidate?\n"); 935 978 936 979 bitmap_clear(pack->bitmap, pos, nbits); 937 - if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0, 938 - bpf_prog_chunk_count(), 0) == 0) { 980 + if (bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0, 981 + BPF_PROG_CHUNK_COUNT, 0) == 0) { 939 982 list_del(&pack->list); 940 983 module_memfree(pack->ptr); 941 984 kfree(pack); ··· 1112 1155 bpf_prog_pack_free(ro_header); 1113 1156 return PTR_ERR(ptr); 1114 1157 } 1115 - prog->aux->use_bpf_prog_pack = true; 1116 1158 return 0; 1117 1159 } 1118 1160 ··· 1135 1179 bpf_jit_uncharge_modmem(size); 1136 1180 } 1137 1181 1182 + struct bpf_binary_header 
* 1183 + bpf_jit_binary_pack_hdr(const struct bpf_prog *fp) 1184 + { 1185 + unsigned long real_start = (unsigned long)fp->bpf_func; 1186 + unsigned long addr; 1187 + 1188 + addr = real_start & BPF_PROG_CHUNK_MASK; 1189 + return (void *)addr; 1190 + } 1191 + 1138 1192 static inline struct bpf_binary_header * 1139 1193 bpf_jit_binary_hdr(const struct bpf_prog *fp) 1140 1194 { 1141 1195 unsigned long real_start = (unsigned long)fp->bpf_func; 1142 1196 unsigned long addr; 1143 1197 1144 - if (fp->aux->use_bpf_prog_pack) 1145 - addr = real_start & BPF_PROG_CHUNK_MASK; 1146 - else 1147 - addr = real_start & PAGE_MASK; 1148 - 1198 + addr = real_start & PAGE_MASK; 1149 1199 return (void *)addr; 1150 1200 } 1151 1201 ··· 1164 1202 if (fp->jited) { 1165 1203 struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp); 1166 1204 1167 - if (fp->aux->use_bpf_prog_pack) 1168 - bpf_jit_binary_pack_free(hdr, NULL /* rw_buffer */); 1169 - else 1170 - bpf_jit_binary_free(hdr); 1171 - 1205 + bpf_jit_binary_free(hdr); 1172 1206 WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp)); 1173 1207 } 1174 1208
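With the pack size now a compile-time constant, `bpf_prog_pack_free()` above no longer derives the owning pack by address masking; it walks `pack_list` and tests containment. Both the size-to-chunks rounding and the containment test are simple enough to verify standalone; the constants below are illustrative stand-ins, not the kernel's values:

```c
#include <assert.h>

#define CHUNK_SIZE 64u               /* stand-in for BPF_PROG_CHUNK_SIZE */
#define PACK_SIZE  (CHUNK_SIZE * 8)  /* stand-in for BPF_PROG_PACK_SIZE */

/* round_up(size, CHUNK_SIZE) / CHUNK_SIZE, as BPF_PROG_SIZE_TO_NBITS does:
 * how many bitmap bits a program of @size occupies. */
unsigned int size_to_nbits(unsigned int size)
{
	return (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/* The new containment test from bpf_prog_pack_free():
 * hdr belongs to a pack iff it lies inside [ptr, ptr + PACK_SIZE). */
int pack_contains(const char *pack_ptr, const char *hdr)
{
	return hdr >= pack_ptr && pack_ptr + PACK_SIZE > hdr;
}
```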
+1 -1
kernel/bpf/devmap.c
··· 845 845 struct bpf_dtab_netdev *dev; 846 846 847 847 dev = bpf_map_kmalloc_node(&dtab->map, sizeof(*dev), 848 - GFP_ATOMIC | __GFP_NOWARN, 848 + GFP_NOWAIT | __GFP_NOWARN, 849 849 dtab->map.numa_node); 850 850 if (!dev) 851 851 return ERR_PTR(-ENOMEM);
+3 -3
kernel/bpf/hashtab.c
··· 61 61 * 62 62 * As regular device interrupt handlers and soft interrupts are forced into 63 63 * thread context, the existing code which does 64 - * spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*(); 64 + * spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*(); 65 65 * just works. 66 66 * 67 67 * In theory the BPF locks could be converted to regular spinlocks as well, ··· 978 978 goto dec_count; 979 979 } 980 980 l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size, 981 - GFP_ATOMIC | __GFP_NOWARN, 981 + GFP_NOWAIT | __GFP_NOWARN, 982 982 htab->map.numa_node); 983 983 if (!l_new) { 984 984 l_new = ERR_PTR(-ENOMEM); ··· 996 996 } else { 997 997 /* alloc_percpu zero-fills */ 998 998 pptr = bpf_map_alloc_percpu(&htab->map, size, 8, 999 - GFP_ATOMIC | __GFP_NOWARN); 999 + GFP_NOWAIT | __GFP_NOWARN); 1000 1000 if (!pptr) { 1001 1001 kfree(l_new); 1002 1002 l_new = ERR_PTR(-ENOMEM);
+1 -1
kernel/bpf/local_storage.c
··· 165 165 } 166 166 167 167 new = bpf_map_kmalloc_node(map, struct_size(new, data, map->value_size), 168 - __GFP_ZERO | GFP_ATOMIC | __GFP_NOWARN, 168 + __GFP_ZERO | GFP_NOWAIT | __GFP_NOWARN, 169 169 map->numa_node); 170 170 if (!new) 171 171 return -ENOMEM;
+1 -1
kernel/bpf/lpm_trie.c
··· 285 285 if (value) 286 286 size += trie->map.value_size; 287 287 288 - node = bpf_map_kmalloc_node(&trie->map, size, GFP_ATOMIC | __GFP_NOWARN, 288 + node = bpf_map_kmalloc_node(&trie->map, size, GFP_NOWAIT | __GFP_NOWARN, 289 289 trie->map.numa_node); 290 290 if (!node) 291 291 return NULL;
+3 -7
kernel/bpf/preload/iterators/Makefile
··· 9 9 TOOLS_PATH := $(abspath ../../../../tools) 10 10 BPFTOOL_SRC := $(TOOLS_PATH)/bpf/bpftool 11 11 BPFTOOL_OUTPUT := $(abs_out)/bpftool 12 - DEFAULT_BPFTOOL := $(OUTPUT)/sbin/bpftool 12 + DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)/bootstrap/bpftool 13 13 BPFTOOL ?= $(DEFAULT_BPFTOOL) 14 14 15 15 LIBBPF_SRC := $(TOOLS_PATH)/lib/bpf ··· 61 61 OUTPUT=$(abspath $(dir $@))/ prefix= \ 62 62 DESTDIR=$(LIBBPF_DESTDIR) $(abspath $@) install_headers 63 63 64 - $(DEFAULT_BPFTOOL): $(BPFOBJ) | $(BPFTOOL_OUTPUT) 65 - $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOL_SRC) \ 66 - OUTPUT=$(BPFTOOL_OUTPUT)/ \ 67 - LIBBPF_OUTPUT=$(LIBBPF_OUTPUT)/ \ 68 - LIBBPF_DESTDIR=$(LIBBPF_DESTDIR)/ \ 69 - prefix= DESTDIR=$(abs_out)/ install-bin 64 + $(DEFAULT_BPFTOOL): | $(BPFTOOL_OUTPUT) 65 + $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOL_SRC) OUTPUT=$(BPFTOOL_OUTPUT)/ bootstrap
+28 -8
kernel/bpf/syscall.c
··· 419 419 #ifdef CONFIG_MEMCG_KMEM 420 420 static void bpf_map_save_memcg(struct bpf_map *map) 421 421 { 422 - map->memcg = get_mem_cgroup_from_mm(current->mm); 422 + /* Currently if a map is created by a process belonging to the root 423 + * memory cgroup, get_obj_cgroup_from_current() will return NULL. 424 + * So we have to check map->objcg for being NULL each time it's 425 + * being used. 426 + */ 427 + map->objcg = get_obj_cgroup_from_current(); 423 428 } 424 429 425 430 static void bpf_map_release_memcg(struct bpf_map *map) 426 431 { 427 - mem_cgroup_put(map->memcg); 432 + if (map->objcg) 433 + obj_cgroup_put(map->objcg); 434 + } 435 + 436 + static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map) 437 + { 438 + if (map->objcg) 439 + return get_mem_cgroup_from_objcg(map->objcg); 440 + 441 + return root_mem_cgroup; 428 442 } 429 443 430 444 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags, 431 445 int node) 432 446 { 433 - struct mem_cgroup *old_memcg; 447 + struct mem_cgroup *memcg, *old_memcg; 434 448 void *ptr; 435 449 436 - old_memcg = set_active_memcg(map->memcg); 450 + memcg = bpf_map_get_memcg(map); 451 + old_memcg = set_active_memcg(memcg); 437 452 ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node); 438 453 set_active_memcg(old_memcg); 454 + mem_cgroup_put(memcg); 439 455 440 456 return ptr; 441 457 } 442 458 443 459 void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags) 444 460 { 445 - struct mem_cgroup *old_memcg; 461 + struct mem_cgroup *memcg, *old_memcg; 446 462 void *ptr; 447 463 448 - old_memcg = set_active_memcg(map->memcg); 464 + memcg = bpf_map_get_memcg(map); 465 + old_memcg = set_active_memcg(memcg); 449 466 ptr = kzalloc(size, flags | __GFP_ACCOUNT); 450 467 set_active_memcg(old_memcg); 468 + mem_cgroup_put(memcg); 451 469 452 470 return ptr; 453 471 } ··· 473 455 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, 474 456 size_t align, gfp_t flags) 
475 457 { 476 - struct mem_cgroup *old_memcg; 458 + struct mem_cgroup *memcg, *old_memcg; 477 459 void __percpu *ptr; 478 460 479 - old_memcg = set_active_memcg(map->memcg); 461 + memcg = bpf_map_get_memcg(map); 462 + old_memcg = set_active_memcg(memcg); 480 463 ptr = __alloc_percpu_gfp(size, align, flags | __GFP_ACCOUNT); 481 464 set_active_memcg(old_memcg); 465 + mem_cgroup_put(memcg); 482 466 483 467 return ptr; 484 468 }
+145 -18
kernel/bpf/trampoline.c
···
 #include <linux/static_call.h>
 #include <linux/bpf_verifier.h>
 #include <linux/bpf_lsm.h>
+#include <linux/delay.h>
 
 /* dummy _ops. The verifier will operate on target program's ops. */
 const struct bpf_verifier_ops bpf_extension_verifier_ops = {
···
 
 /* serializes access to trampoline_table */
 static DEFINE_MUTEX(trampoline_mutex);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex);
+
+static int bpf_tramp_ftrace_ops_func(struct ftrace_ops *ops, enum ftrace_ops_cmd cmd)
+{
+	struct bpf_trampoline *tr = ops->private;
+	int ret = 0;
+
+	if (cmd == FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF) {
+		/* This is called inside register_ftrace_direct_multi(), so
+		 * tr->mutex is already locked.
+		 */
+		lockdep_assert_held_once(&tr->mutex);
+
+		/* Instead of updating the trampoline here, we propagate
+		 * -EAGAIN to register_ftrace_direct_multi(). Then we can
+		 * retry register_ftrace_direct_multi() after updating the
+		 * trampoline.
+		 */
+		if ((tr->flags & BPF_TRAMP_F_CALL_ORIG) &&
+		    !(tr->flags & BPF_TRAMP_F_ORIG_STACK)) {
+			if (WARN_ON_ONCE(tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY))
+				return -EBUSY;
+
+			tr->flags |= BPF_TRAMP_F_SHARE_IPMODIFY;
+			return -EAGAIN;
+		}
+
+		return 0;
+	}
+
+	/* The normal locking order is
+	 *    tr->mutex => direct_mutex (ftrace.c) => ftrace_lock (ftrace.c)
+	 *
+	 * The following two commands are called from
+	 *
+	 *   prepare_direct_functions_for_ipmodify
+	 *   cleanup_direct_functions_after_ipmodify
+	 *
+	 * In both cases, direct_mutex is already locked. Use
+	 * mutex_trylock(&tr->mutex) to avoid deadlock in race condition
+	 * (something else is making changes to this same trampoline).
+	 */
+	if (!mutex_trylock(&tr->mutex)) {
+		/* sleep 1 ms to make sure whatever holding tr->mutex makes
+		 * some progress.
+		 */
+		msleep(1);
+		return -EAGAIN;
+	}
+
+	switch (cmd) {
+	case FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER:
+		tr->flags |= BPF_TRAMP_F_SHARE_IPMODIFY;
+
+		if ((tr->flags & BPF_TRAMP_F_CALL_ORIG) &&
+		    !(tr->flags & BPF_TRAMP_F_ORIG_STACK))
+			ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
+		break;
+	case FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER:
+		tr->flags &= ~BPF_TRAMP_F_SHARE_IPMODIFY;
+
+		if (tr->flags & BPF_TRAMP_F_ORIG_STACK)
+			ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	};
+
+	mutex_unlock(&tr->mutex);
+	return ret;
+}
+#endif
 
 bool bpf_prog_has_trampoline(const struct bpf_prog *prog)
 {
···
 	tr = kzalloc(sizeof(*tr), GFP_KERNEL);
 	if (!tr)
 		goto out;
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	tr->fops = kzalloc(sizeof(struct ftrace_ops), GFP_KERNEL);
+	if (!tr->fops) {
+		kfree(tr);
+		tr = NULL;
+		goto out;
+	}
+	tr->fops->private = tr;
+	tr->fops->ops_func = bpf_tramp_ftrace_ops_func;
+#endif
 
 	tr->key = key;
 	INIT_HLIST_NODE(&tr->hlist);
···
 	int ret;
 
 	if (tr->func.ftrace_managed)
-		ret = unregister_ftrace_direct((long)ip, (long)old_addr);
+		ret = unregister_ftrace_direct_multi(tr->fops, (long)old_addr);
 	else
 		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, NULL);
 
···
 	return ret;
 }
 
-static int modify_fentry(struct bpf_trampoline *tr, void *old_addr, void *new_addr)
+static int modify_fentry(struct bpf_trampoline *tr, void *old_addr, void *new_addr,
+			 bool lock_direct_mutex)
 {
 	void *ip = tr->func.addr;
 	int ret;
 
-	if (tr->func.ftrace_managed)
-		ret = modify_ftrace_direct((long)ip, (long)old_addr, (long)new_addr);
-	else
+	if (tr->func.ftrace_managed) {
+		if (lock_direct_mutex)
+			ret = modify_ftrace_direct_multi(tr->fops, (long)new_addr);
+		else
+			ret = modify_ftrace_direct_multi_nolock(tr->fops, (long)new_addr);
+	} else {
 		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, new_addr);
+	}
 	return ret;
 }
···
 	if (bpf_trampoline_module_get(tr))
 		return -ENOENT;
 
-	if (tr->func.ftrace_managed)
-		ret = register_ftrace_direct((long)ip, (long)new_addr);
-	else
+	if (tr->func.ftrace_managed) {
+		ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 0);
+		ret = register_ftrace_direct_multi(tr->fops, (long)new_addr);
+	} else {
 		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
+	}
 
 	if (ret)
 		bpf_trampoline_module_put(tr);
···
 	return ERR_PTR(err);
 }
 
-static int bpf_trampoline_update(struct bpf_trampoline *tr)
+static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
 {
 	struct bpf_tramp_image *im;
 	struct bpf_tramp_links *tlinks;
-	u32 flags = BPF_TRAMP_F_RESTORE_REGS;
+	u32 orig_flags = tr->flags;
 	bool ip_arg = false;
 	int err, total;
···
 		goto out;
 	}
 
+	/* clear all bits except SHARE_IPMODIFY */
+	tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
+
 	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
-	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links)
-		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
+	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
+		/* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
+		 * should not be set together.
+		 */
+		tr->flags |= BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
+	} else {
+		tr->flags |= BPF_TRAMP_F_RESTORE_REGS;
+	}
 
 	if (ip_arg)
-		flags |= BPF_TRAMP_F_IP_ARG;
+		tr->flags |= BPF_TRAMP_F_IP_ARG;
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+again:
+	if ((tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY) &&
+	    (tr->flags & BPF_TRAMP_F_CALL_ORIG))
+		tr->flags |= BPF_TRAMP_F_ORIG_STACK;
+#endif
 
 	err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
-					  &tr->func.model, flags, tlinks,
+					  &tr->func.model, tr->flags, tlinks,
 					  tr->func.addr);
 	if (err < 0)
 		goto out;
···
 	WARN_ON(!tr->cur_image && tr->selector);
 	if (tr->cur_image)
 		/* progs already running at this address */
-		err = modify_fentry(tr, tr->cur_image->image, im->image);
+		err = modify_fentry(tr, tr->cur_image->image, im->image, lock_direct_mutex);
 	else
 		/* first time registering */
 		err = register_fentry(tr, im->image);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	if (err == -EAGAIN) {
+		/* -EAGAIN from bpf_tramp_ftrace_ops_func. Now
+		 * BPF_TRAMP_F_SHARE_IPMODIFY is set, we can generate the
+		 * trampoline again, and retry register.
+		 */
+		/* reset fops->func and fops->trampoline for re-register */
+		tr->fops->func = NULL;
+		tr->fops->trampoline = 0;
+		goto again;
+	}
+#endif
 	if (err)
 		goto out;
+
 	if (tr->cur_image)
 		bpf_tramp_image_put(tr->cur_image);
 	tr->cur_image = im;
 	tr->selector++;
 out:
+	/* If any error happens, restore previous flags */
+	if (err)
+		tr->flags = orig_flags;
 	kfree(tlinks);
 	return err;
 }
···
 
 	hlist_add_head(&link->tramp_hlist, &tr->progs_hlist[kind]);
 	tr->progs_cnt[kind]++;
-	err = bpf_trampoline_update(tr);
+	err = bpf_trampoline_update(tr, true /* lock_direct_mutex */);
 	if (err) {
 		hlist_del_init(&link->tramp_hlist);
 		tr->progs_cnt[kind]--;
···
 	}
 	hlist_del_init(&link->tramp_hlist);
 	tr->progs_cnt[kind]--;
-	return bpf_trampoline_update(tr);
+	return bpf_trampoline_update(tr, true /* lock_direct_mutex */);
 }
 
 /* bpf_trampoline_unlink_prog() should never fail. */
···
 	return err;
 }
 
-#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM)
 static void bpf_shim_tramp_link_release(struct bpf_link *link)
 {
 	struct bpf_shim_tramp_link *shim_link =
···
 	 * multiple rcu callbacks.
 	 */
 	hlist_del(&tr->hlist);
+	kfree(tr->fops);
 	kfree(tr);
 out:
 	mutex_unlock(&trampoline_mutex);
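With this change bpf_trampoline_update() recomputes tr->flags from scratch on every update: everything except SHARE_IPMODIFY is cleared, CALL_ORIG|SKIP_FRAME or RESTORE_REGS is chosen from the attached link kinds, and ORIG_STACK is forced whenever the trampoline shares its function with an IPMODIFY (livepatch) user. A sketch of that flag logic as a pure function — the bit values below are illustrative, not the kernel's real BPF_TRAMP_F_* encoding:

```c
#include <assert.h>

/* Hypothetical flag values; the real BPF_TRAMP_F_* constants live in
 * include/linux/bpf.h. */
#define F_RESTORE_REGS    (1U << 0)
#define F_CALL_ORIG       (1U << 1)
#define F_SKIP_FRAME      (1U << 2)
#define F_IP_ARG          (1U << 3)
#define F_SHARE_IPMODIFY  (1U << 4)
#define F_ORIG_STACK      (1U << 5)

/* Mirrors the recompute in bpf_trampoline_update(). */
static unsigned int recompute_tramp_flags(unsigned int flags,
                                          int have_fexit_or_modify_ret,
                                          int ip_arg)
{
        /* clear all bits except SHARE_IPMODIFY */
        flags &= F_SHARE_IPMODIFY;

        if (have_fexit_or_modify_ret)
                flags |= F_CALL_ORIG | F_SKIP_FRAME; /* never with RESTORE_REGS */
        else
                flags |= F_RESTORE_REGS;

        if (ip_arg)
                flags |= F_IP_ARG;

        /* sharing with an IPMODIFY user forces the slower ORIG_STACK exit */
        if ((flags & F_SHARE_IPMODIFY) && (flags & F_CALL_ORIG))
                flags |= F_ORIG_STACK;

        return flags;
}
```

Recomputing (rather than accumulating) is what makes the `goto again` retry safe: after the ftrace callback sets SHARE_IPMODIFY, one more pass yields a consistent flag set, and `orig_flags` restores the old state on error.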
+49 -40
kernel/bpf/verifier.c
···
 	       type == ARG_CONST_SIZE_OR_ZERO;
 }
 
-static bool arg_type_is_alloc_size(enum bpf_arg_type type)
-{
-	return type == ARG_CONST_ALLOC_SIZE_OR_ZERO;
-}
-
-static bool arg_type_is_int_ptr(enum bpf_arg_type type)
-{
-	return type == ARG_PTR_TO_INT ||
-	       type == ARG_PTR_TO_LONG;
-}
-
 static bool arg_type_is_release(enum bpf_arg_type type)
 {
 	return type & OBJ_RELEASE;
···
 		meta->ref_obj_id = reg->ref_obj_id;
 	}
 
-	if (arg_type == ARG_CONST_MAP_PTR) {
+	switch (base_type(arg_type)) {
+	case ARG_CONST_MAP_PTR:
 		/* bpf_map_xxx(map_ptr) call: remember that map_ptr */
 		if (meta->map_ptr) {
 			/* Use map_uid (which is unique id of inner map) to reject:
···
 		}
 		meta->map_ptr = reg->map_ptr;
 		meta->map_uid = reg->map_uid;
-	} else if (arg_type == ARG_PTR_TO_MAP_KEY) {
+		break;
+	case ARG_PTR_TO_MAP_KEY:
 		/* bpf_map_xxx(..., map_ptr, ..., key) call:
 		 * check that [key, key + map->key_size) are within
 		 * stack limits and initialized
···
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->key_size, false,
 					      NULL);
-	} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
+		break;
+	case ARG_PTR_TO_MAP_VALUE:
 		if (type_may_be_null(arg_type) && register_is_null(reg))
 			return 0;
 
···
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->value_size, false,
 					      meta);
-	} else if (arg_type == ARG_PTR_TO_PERCPU_BTF_ID) {
+		break;
+	case ARG_PTR_TO_PERCPU_BTF_ID:
 		if (!reg->btf_id) {
 			verbose(env, "Helper has invalid btf_id in R%d\n", regno);
 			return -EACCES;
 		}
 		meta->ret_btf = reg->btf;
 		meta->ret_btf_id = reg->btf_id;
-	} else if (arg_type == ARG_PTR_TO_SPIN_LOCK) {
+		break;
+	case ARG_PTR_TO_SPIN_LOCK:
 		if (meta->func_id == BPF_FUNC_spin_lock) {
 			if (process_spin_lock(env, regno, true))
 				return -EACCES;
···
 			verbose(env, "verifier internal error\n");
 			return -EFAULT;
 		}
-	} else if (arg_type == ARG_PTR_TO_TIMER) {
+		break;
+	case ARG_PTR_TO_TIMER:
 		if (process_timer_func(env, regno, meta))
 			return -EACCES;
-	} else if (arg_type == ARG_PTR_TO_FUNC) {
+		break;
+	case ARG_PTR_TO_FUNC:
 		meta->subprogno = reg->subprogno;
-	} else if (base_type(arg_type) == ARG_PTR_TO_MEM) {
+		break;
+	case ARG_PTR_TO_MEM:
 		/* The access to this pointer is only checked when we hit the
 		 * next is_mem_size argument below.
 		 */
···
 						      fn->arg_size[arg], false,
 						      meta);
 		}
-	} else if (arg_type_is_mem_size(arg_type)) {
-		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
-
-		err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
-	} else if (arg_type_is_dynptr(arg_type)) {
+		break;
+	case ARG_CONST_SIZE:
+		err = check_mem_size_reg(env, reg, regno, false, meta);
+		break;
+	case ARG_CONST_SIZE_OR_ZERO:
+		err = check_mem_size_reg(env, reg, regno, true, meta);
+		break;
+	case ARG_PTR_TO_DYNPTR:
 		if (arg_type & MEM_UNINIT) {
 			if (!is_dynptr_reg_valid_uninit(env, reg)) {
 				verbose(env, "Dynptr has to be an uninitialized dynptr\n");
···
 				err_extra, arg + 1);
 			return -EINVAL;
 		}
-	} else if (arg_type_is_alloc_size(arg_type)) {
+		break;
+	case ARG_CONST_ALLOC_SIZE_OR_ZERO:
 		if (!tnum_is_const(reg->var_off)) {
 			verbose(env, "R%d is not a known constant'\n",
 				regno);
 			return -EACCES;
 		}
 		meta->mem_size = reg->var_off.value;
-	} else if (arg_type_is_int_ptr(arg_type)) {
+		break;
+	case ARG_PTR_TO_INT:
+	case ARG_PTR_TO_LONG:
+	{
 		int size = int_ptr_type_to_size(arg_type);
 
 		err = check_helper_mem_access(env, regno, size, false, meta);
 		if (err)
 			return err;
 		err = check_ptr_alignment(env, reg, 0, size, true);
-	} else if (arg_type == ARG_PTR_TO_CONST_STR) {
+		break;
+	}
+	case ARG_PTR_TO_CONST_STR:
+	{
 		struct bpf_map *map = reg->map_ptr;
 		int map_off;
 		u64 map_addr;
···
 			verbose(env, "string is not zero-terminated\n");
 			return -EINVAL;
 		}
-	} else if (arg_type == ARG_PTR_TO_KPTR) {
+		break;
+	}
+	case ARG_PTR_TO_KPTR:
 		if (process_kptr_func(env, regno, meta))
 			return -EACCES;
+		break;
 	}
 
 	return err;
···
 static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx_p)
 {
+	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
 	const struct bpf_func_proto *fn = NULL;
 	enum bpf_return_type ret_type;
 	enum bpf_type_flag ret_flag;
···
 		}
 		break;
 	case BPF_FUNC_set_retval:
-		if (env->prog->expected_attach_type == BPF_LSM_CGROUP) {
+		if (prog_type == BPF_PROG_TYPE_LSM &&
+		    env->prog->expected_attach_type == BPF_LSM_CGROUP) {
 			if (!env->prog->aux->attach_func_proto->type) {
 				/* Make sure programs that attach to void
 				 * hooks don't try to modify return value.
···
 	int err, insn_idx = *insn_idx_p;
 	const struct btf_param *args;
 	struct btf *desc_btf;
+	u32 *kfunc_flags;
 	bool acq;
 
 	/* skip for now, but return error when we find this in fixup_kfunc_call */
···
 	func_name = btf_name_by_offset(desc_btf, func->name_off);
 	func_proto = btf_type_by_id(desc_btf, func->type);
 
-	if (!btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog),
-				       BTF_KFUNC_TYPE_CHECK, func_id)) {
+	kfunc_flags = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog), func_id);
+	if (!kfunc_flags) {
 		verbose(env, "calling kernel function %s is not allowed\n",
 			func_name);
 		return -EACCES;
 	}
-
-	acq = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog),
-					BTF_KFUNC_TYPE_ACQUIRE, func_id);
+	acq = *kfunc_flags & KF_ACQUIRE;
 
 	/* Check the arguments */
-	err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs);
+	err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, *kfunc_flags);
 	if (err < 0)
 		return err;
 	/* In case of release function, we get register number of refcounted
···
 		regs[BPF_REG_0].btf = desc_btf;
 		regs[BPF_REG_0].type = PTR_TO_BTF_ID;
 		regs[BPF_REG_0].btf_id = ptr_type_id;
-		if (btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog),
-					      BTF_KFUNC_TYPE_RET_NULL, func_id)) {
+		if (*kfunc_flags & KF_RET_NULL) {
 			regs[BPF_REG_0].type |= PTR_MAYBE_NULL;
 			/* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */
 			regs[BPF_REG_0].id = ++env->id_gen;
···
 	case BPF_PROG_TYPE_TRACEPOINT:
 	case BPF_PROG_TYPE_PERF_EVENT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
+	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
 		return true;
 	default:
 		return false;
···
 		/* Below members will be freed only at prog->aux */
 		func[i]->aux->btf = prog->aux->btf;
 		func[i]->aux->func_info = prog->aux->func_info;
+		func[i]->aux->func_info_cnt = prog->aux->func_info_cnt;
 		func[i]->aux->poke_tab = prog->aux->poke_tab;
 		func[i]->aux->size_poke_tab = prog->aux->size_poke_tab;
 
···
 			poke->aux = func[i]->aux;
 		}
 
-		/* Use bpf_prog_F_tag to indicate functions in stack traces.
-		 * Long term would need debug info to populate names
-		 */
 		func[i]->aux->name[0] = 'F';
 		func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
 		func[i]->jit_requested = 1;
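The check_func_arg() cleanup above replaces a long if/else-if ladder of predicate helpers with a single `switch (base_type(arg_type))`: an argument type's low bits select the case while its upper bits carry modifier flags (MEM_UNINIT, nullability, etc.). A tiny sketch of that dispatch pattern — the names and bit layout below are illustrative, not the verifier's actual encoding:

```c
#include <assert.h>

/* Illustrative argument types and bit layout (not the kernel's). */
enum { ARG_CONST_SIZE = 1, ARG_CONST_SIZE_OR_ZERO = 2, ARG_PTR_TO_MEM = 3 };
#define BASE_TYPE_MASK 0x00ffu   /* low bits: base type */
                                 /* high bits: modifier flags, ignored here */

static unsigned int base_type(unsigned int arg_type)
{
        return arg_type & BASE_TYPE_MASK;
}

/* Was: arg_type_is_mem_size() + a bool computed inside the branch.
 * Now: one case per base type, so the zero-size decision is explicit.
 * Returns 1/0 for size arguments, -1 for anything else. */
static int zero_size_allowed(unsigned int arg_type)
{
        switch (base_type(arg_type)) {
        case ARG_CONST_SIZE:
                return 0;        /* check_mem_size_reg(..., false, ...) */
        case ARG_CONST_SIZE_OR_ZERO:
                return 1;        /* check_mem_size_reg(..., true, ...) */
        default:
                return -1;       /* not a size argument */
        }
}
```

Besides readability, the switch lets the compiler warn about unhandled enum values, which the predicate-helper ladder could not.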
+91
kernel/kallsyms.c
···
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/bsearch.h>
+#include <linux/btf_ids.h>
 
 /*
  * These will be re-linked against their real values
···
 	.stop = s_stop,
 	.show = s_show
 };
+
+#ifdef CONFIG_BPF_SYSCALL
+
+struct bpf_iter__ksym {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct kallsym_iter *, ksym);
+};
+
+static int ksym_prog_seq_show(struct seq_file *m, bool in_stop)
+{
+	struct bpf_iter__ksym ctx;
+	struct bpf_iter_meta meta;
+	struct bpf_prog *prog;
+
+	meta.seq = m;
+	prog = bpf_iter_get_info(&meta, in_stop);
+	if (!prog)
+		return 0;
+
+	ctx.meta = &meta;
+	ctx.ksym = m ? m->private : NULL;
+	return bpf_iter_run_prog(prog, &ctx);
+}
+
+static int bpf_iter_ksym_seq_show(struct seq_file *m, void *p)
+{
+	return ksym_prog_seq_show(m, false);
+}
+
+static void bpf_iter_ksym_seq_stop(struct seq_file *m, void *p)
+{
+	if (!p)
+		(void) ksym_prog_seq_show(m, true);
+	else
+		s_stop(m, p);
+}
+
+static const struct seq_operations bpf_iter_ksym_ops = {
+	.start = s_start,
+	.next = s_next,
+	.stop = bpf_iter_ksym_seq_stop,
+	.show = bpf_iter_ksym_seq_show,
+};
+
+static int bpf_iter_ksym_init(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+	struct kallsym_iter *iter = priv_data;
+
+	reset_iter(iter, 0);
+
+	/* cache here as in kallsyms_open() case; use current process
+	 * credentials to tell BPF iterators if values should be shown.
+	 */
+	iter->show_value = kallsyms_show_value(current_cred());
+
+	return 0;
+}
+
+DEFINE_BPF_ITER_FUNC(ksym, struct bpf_iter_meta *meta, struct kallsym_iter *ksym)
+
+static const struct bpf_iter_seq_info ksym_iter_seq_info = {
+	.seq_ops		= &bpf_iter_ksym_ops,
+	.init_seq_private	= bpf_iter_ksym_init,
+	.fini_seq_private	= NULL,
+	.seq_priv_size		= sizeof(struct kallsym_iter),
+};
+
+static struct bpf_iter_reg ksym_iter_reg_info = {
+	.target			= "ksym",
+	.feature		= BPF_ITER_RESCHED,
+	.ctx_arg_info_size	= 1,
+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_iter__ksym, ksym),
+		  PTR_TO_BTF_ID_OR_NULL },
+	},
+	.seq_info		= &ksym_iter_seq_info,
+};
+
+BTF_ID_LIST(btf_ksym_iter_id)
+BTF_ID(struct, kallsym_iter)
+
+static int __init bpf_ksym_iter_register(void)
+{
+	ksym_iter_reg_info.ctx_arg_info[0].btf_id = *btf_ksym_iter_id;
+	return bpf_iter_reg_target(&ksym_iter_reg_info);
+}
+
+late_initcall(bpf_ksym_iter_register);
+
+#endif /* CONFIG_BPF_SYSCALL */
 
 static inline int kallsyms_for_perf(void)
 {
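A BPF program attached to this "ksym" iterator receives one `struct kallsym_iter` per symbol and typically bpf_seq_printf()s it in the familiar /proc/kallsyms layout (`<value> <type> <name> [module]`). Since a BPF program can't run here, a user-space sketch of parsing one such line as the iterator's seq file would emit it (field names are illustrative):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One record in /proc/kallsyms-style output, e.g.
 *   "ffffffff81000000 T _stext"
 * Field names are illustrative, not the kernel's kallsym_iter layout. */
struct ksym {
        unsigned long long value;   /* symbol address (0 if hidden) */
        char type;                  /* nm-style type character, e.g. 'T', 'd' */
        char name[128];             /* symbol name */
};

/* Parse a single line; returns 0 on success, -1 on malformed input. */
static int parse_ksym_line(const char *line, struct ksym *out)
{
        if (sscanf(line, "%llx %c %127s", &out->value, &out->type, out->name) != 3)
                return -1;
        return 0;
}
```

Note that, as in kallsyms_open(), the iterator caches `kallsyms_show_value(current_cred())` at attach time, so an unprivileged reader sees zeroed addresses rather than real ones.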
+278 -50
kernel/trace/ftrace.c
··· 1861 1861 ftrace_hash_rec_update_modify(ops, filter_hash, 1); 1862 1862 } 1863 1863 1864 + static bool ops_references_ip(struct ftrace_ops *ops, unsigned long ip); 1865 + 1864 1866 /* 1865 1867 * Try to update IPMODIFY flag on each ftrace_rec. Return 0 if it is OK 1866 1868 * or no-needed to update, -EBUSY if it detects a conflict of the flag ··· 1871 1869 * - If the hash is NULL, it hits all recs (if IPMODIFY is set, this is rejected) 1872 1870 * - If the hash is EMPTY_HASH, it hits nothing 1873 1871 * - Anything else hits the recs which match the hash entries. 1872 + * 1873 + * DIRECT ops does not have IPMODIFY flag, but we still need to check it 1874 + * against functions with FTRACE_FL_IPMODIFY. If there is any overlap, call 1875 + * ops_func(SHARE_IPMODIFY_SELF) to make sure current ops can share with 1876 + * IPMODIFY. If ops_func(SHARE_IPMODIFY_SELF) returns non-zero, propagate 1877 + * the return value to the caller and eventually to the owner of the DIRECT 1878 + * ops. 1874 1879 */ 1875 1880 static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops, 1876 1881 struct ftrace_hash *old_hash, ··· 1886 1877 struct ftrace_page *pg; 1887 1878 struct dyn_ftrace *rec, *end = NULL; 1888 1879 int in_old, in_new; 1880 + bool is_ipmodify, is_direct; 1889 1881 1890 1882 /* Only update if the ops has been registered */ 1891 1883 if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) 1892 1884 return 0; 1893 1885 1894 - if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY)) 1886 + is_ipmodify = ops->flags & FTRACE_OPS_FL_IPMODIFY; 1887 + is_direct = ops->flags & FTRACE_OPS_FL_DIRECT; 1888 + 1889 + /* neither IPMODIFY nor DIRECT, skip */ 1890 + if (!is_ipmodify && !is_direct) 1891 + return 0; 1892 + 1893 + if (WARN_ON_ONCE(is_ipmodify && is_direct)) 1895 1894 return 0; 1896 1895 1897 1896 /* 1898 - * Since the IPMODIFY is a very address sensitive action, we do not 1899 - * allow ftrace_ops to set all functions to new hash. 
1897 + * Since the IPMODIFY and DIRECT are very address sensitive 1898 + * actions, we do not allow ftrace_ops to set all functions to new 1899 + * hash. 1900 1900 */ 1901 1901 if (!new_hash || !old_hash) 1902 1902 return -EINVAL; ··· 1923 1905 continue; 1924 1906 1925 1907 if (in_new) { 1926 - /* New entries must ensure no others are using it */ 1927 - if (rec->flags & FTRACE_FL_IPMODIFY) 1928 - goto rollback; 1929 - rec->flags |= FTRACE_FL_IPMODIFY; 1930 - } else /* Removed entry */ 1908 + if (rec->flags & FTRACE_FL_IPMODIFY) { 1909 + int ret; 1910 + 1911 + /* Cannot have two ipmodify on same rec */ 1912 + if (is_ipmodify) 1913 + goto rollback; 1914 + 1915 + FTRACE_WARN_ON(rec->flags & FTRACE_FL_DIRECT); 1916 + 1917 + /* 1918 + * Another ops with IPMODIFY is already 1919 + * attached. We are now attaching a direct 1920 + * ops. Run SHARE_IPMODIFY_SELF, to check 1921 + * whether sharing is supported. 1922 + */ 1923 + if (!ops->ops_func) 1924 + return -EBUSY; 1925 + ret = ops->ops_func(ops, FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF); 1926 + if (ret) 1927 + return ret; 1928 + } else if (is_ipmodify) { 1929 + rec->flags |= FTRACE_FL_IPMODIFY; 1930 + } 1931 + } else if (is_ipmodify) { 1931 1932 rec->flags &= ~FTRACE_FL_IPMODIFY; 1933 + } 1932 1934 } while_for_each_ftrace_rec(); 1933 1935 1934 1936 return 0; ··· 2492 2454 2493 2455 struct ftrace_ops direct_ops = { 2494 2456 .func = call_direct_funcs, 2495 - .flags = FTRACE_OPS_FL_IPMODIFY 2496 - | FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS 2457 + .flags = FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS 2497 2458 | FTRACE_OPS_FL_PERMANENT, 2498 2459 /* 2499 2460 * By declaring the main trampoline as this trampoline ··· 3109 3072 } 3110 3073 3111 3074 /* 3112 - * Check if the current ops references the record. 3075 + * Check if the current ops references the given ip. 3113 3076 * 3114 3077 * If the ops traces all functions, then it was already accounted for. 
3115 3078 * If the ops does not trace the current record function, skip it. 3116 3079 * If the ops ignores the function via notrace filter, skip it. 3117 3080 */ 3118 - static inline bool 3119 - ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec) 3081 + static bool 3082 + ops_references_ip(struct ftrace_ops *ops, unsigned long ip) 3120 3083 { 3121 3084 /* If ops isn't enabled, ignore it */ 3122 3085 if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) ··· 3128 3091 3129 3092 /* The function must be in the filter */ 3130 3093 if (!ftrace_hash_empty(ops->func_hash->filter_hash) && 3131 - !__ftrace_lookup_ip(ops->func_hash->filter_hash, rec->ip)) 3094 + !__ftrace_lookup_ip(ops->func_hash->filter_hash, ip)) 3132 3095 return false; 3133 3096 3134 3097 /* If in notrace hash, we ignore it too */ 3135 - if (ftrace_lookup_ip(ops->func_hash->notrace_hash, rec->ip)) 3098 + if (ftrace_lookup_ip(ops->func_hash->notrace_hash, ip)) 3136 3099 return false; 3137 3100 3138 3101 return true; 3102 + } 3103 + 3104 + /* 3105 + * Check if the current ops references the record. 3106 + * 3107 + * If the ops traces all functions, then it was already accounted for. 3108 + * If the ops does not trace the current record function, skip it. 3109 + * If the ops ignores the function via notrace filter, skip it. 
3110 + */ 3111 + static bool 3112 + ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec) 3113 + { 3114 + return ops_references_ip(ops, rec->ip); 3139 3115 } 3140 3116 3141 3117 static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs) ··· 5265 5215 return direct; 5266 5216 } 5267 5217 5218 + static int register_ftrace_function_nolock(struct ftrace_ops *ops); 5219 + 5268 5220 /** 5269 5221 * register_ftrace_direct - Call a custom trampoline directly 5270 5222 * @ip: The address of the nop at the beginning of a function ··· 5338 5286 ret = ftrace_set_filter_ip(&direct_ops, ip, 0, 0); 5339 5287 5340 5288 if (!ret && !(direct_ops.flags & FTRACE_OPS_FL_ENABLED)) { 5341 - ret = register_ftrace_function(&direct_ops); 5289 + ret = register_ftrace_function_nolock(&direct_ops); 5342 5290 if (ret) 5343 5291 ftrace_set_filter_ip(&direct_ops, ip, 1, 0); 5344 5292 } ··· 5597 5545 } 5598 5546 EXPORT_SYMBOL_GPL(modify_ftrace_direct); 5599 5547 5600 - #define MULTI_FLAGS (FTRACE_OPS_FL_IPMODIFY | FTRACE_OPS_FL_DIRECT | \ 5601 - FTRACE_OPS_FL_SAVE_REGS) 5548 + #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS) 5602 5549 5603 5550 static int check_direct_multi(struct ftrace_ops *ops) 5604 5551 { ··· 5690 5639 ops->flags = MULTI_FLAGS; 5691 5640 ops->trampoline = FTRACE_REGS_ADDR; 5692 5641 5693 - err = register_ftrace_function(ops); 5642 + err = register_ftrace_function_nolock(ops); 5694 5643 5695 5644 out_remove: 5696 5645 if (err) ··· 5742 5691 } 5743 5692 EXPORT_SYMBOL_GPL(unregister_ftrace_direct_multi); 5744 5693 5745 - /** 5746 - * modify_ftrace_direct_multi - Modify an existing direct 'multi' call 5747 - * to call something else 5748 - * @ops: The address of the struct ftrace_ops object 5749 - * @addr: The address of the new trampoline to call at @ops functions 5750 - * 5751 - * This is used to unregister currently registered direct caller and 5752 - * register new one @addr on functions registered in @ops object. 
5753 - * 5754 - * Note there's window between ftrace_shutdown and ftrace_startup calls 5755 - * where there will be no callbacks called. 5756 - * 5757 - * Returns: zero on success. Non zero on error, which includes: 5758 - * -EINVAL - The @ops object was not properly registered. 5759 - */ 5760 - int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) 5694 + static int 5695 + __modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) 5761 5696 { 5762 5697 struct ftrace_hash *hash; 5763 5698 struct ftrace_func_entry *entry, *iter; ··· 5754 5717 int i, size; 5755 5718 int err; 5756 5719 5757 - if (check_direct_multi(ops)) 5758 - return -EINVAL; 5759 - if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) 5760 - return -EINVAL; 5761 - 5762 - mutex_lock(&direct_mutex); 5720 + lockdep_assert_held_once(&direct_mutex); 5763 5721 5764 5722 /* Enable the tmp_ops to have the same functions as the direct ops */ 5765 5723 ftrace_ops_init(&tmp_ops); 5766 5724 tmp_ops.func_hash = ops->func_hash; 5767 5725 5768 - err = register_ftrace_function(&tmp_ops); 5726 + err = register_ftrace_function_nolock(&tmp_ops); 5769 5727 if (err) 5770 - goto out_direct; 5728 + return err; 5771 5729 5772 5730 /* 5773 5731 * Now the ftrace_ops_list_func() is called to do the direct callers. ··· 5786 5754 /* Removing the tmp_ops will add the updated direct callers to the functions */ 5787 5755 unregister_ftrace_function(&tmp_ops); 5788 5756 5789 - out_direct: 5757 + return err; 5758 + } 5759 + 5760 + /** 5761 + * modify_ftrace_direct_multi_nolock - Modify an existing direct 'multi' call 5762 + * to call something else 5763 + * @ops: The address of the struct ftrace_ops object 5764 + * @addr: The address of the new trampoline to call at @ops functions 5765 + * 5766 + * This is used to unregister currently registered direct caller and 5767 + * register new one @addr on functions registered in @ops object. 
5768 + * 5769 + * Note there's window between ftrace_shutdown and ftrace_startup calls 5770 + * where there will be no callbacks called. 5771 + * 5772 + * Caller should already have direct_mutex locked, so we don't lock 5773 + * direct_mutex here. 5774 + * 5775 + * Returns: zero on success. Non zero on error, which includes: 5776 + * -EINVAL - The @ops object was not properly registered. 5777 + */ 5778 + int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr) 5779 + { 5780 + if (check_direct_multi(ops)) 5781 + return -EINVAL; 5782 + if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) 5783 + return -EINVAL; 5784 + 5785 + return __modify_ftrace_direct_multi(ops, addr); 5786 + } 5787 + EXPORT_SYMBOL_GPL(modify_ftrace_direct_multi_nolock); 5788 + 5789 + /** 5790 + * modify_ftrace_direct_multi - Modify an existing direct 'multi' call 5791 + * to call something else 5792 + * @ops: The address of the struct ftrace_ops object 5793 + * @addr: The address of the new trampoline to call at @ops functions 5794 + * 5795 + * This is used to unregister currently registered direct caller and 5796 + * register new one @addr on functions registered in @ops object. 5797 + * 5798 + * Note there's window between ftrace_shutdown and ftrace_startup calls 5799 + * where there will be no callbacks called. 5800 + * 5801 + * Returns: zero on success. Non zero on error, which includes: 5802 + * -EINVAL - The @ops object was not properly registered. 
5803 + */ 5804 + int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) 5805 + { 5806 + int err; 5807 + 5808 + if (check_direct_multi(ops)) 5809 + return -EINVAL; 5810 + if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) 5811 + return -EINVAL; 5812 + 5813 + mutex_lock(&direct_mutex); 5814 + err = __modify_ftrace_direct_multi(ops, addr); 5790 5815 mutex_unlock(&direct_mutex); 5791 5816 return err; 5792 5817 } ··· 8054 7965 return ftrace_disabled; 8055 7966 } 8056 7967 7968 + #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS 7969 + /* 7970 + * When registering ftrace_ops with IPMODIFY, it is necessary to make sure 7971 + * it doesn't conflict with any direct ftrace_ops. If there is existing 7972 + * direct ftrace_ops on a kernel function being patched, call 7973 + * FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER on it to enable sharing. 7974 + * 7975 + * @ops: ftrace_ops being registered. 7976 + * 7977 + * Returns: 7978 + * 0 on success; 7979 + * Negative on failure. 7980 + */ 7981 + static int prepare_direct_functions_for_ipmodify(struct ftrace_ops *ops) 7982 + { 7983 + struct ftrace_func_entry *entry; 7984 + struct ftrace_hash *hash; 7985 + struct ftrace_ops *op; 7986 + int size, i, ret; 7987 + 7988 + lockdep_assert_held_once(&direct_mutex); 7989 + 7990 + if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY)) 7991 + return 0; 7992 + 7993 + hash = ops->func_hash->filter_hash; 7994 + size = 1 << hash->size_bits; 7995 + for (i = 0; i < size; i++) { 7996 + hlist_for_each_entry(entry, &hash->buckets[i], hlist) { 7997 + unsigned long ip = entry->ip; 7998 + bool found_op = false; 7999 + 8000 + mutex_lock(&ftrace_lock); 8001 + do_for_each_ftrace_op(op, ftrace_ops_list) { 8002 + if (!(op->flags & FTRACE_OPS_FL_DIRECT)) 8003 + continue; 8004 + if (ops_references_ip(op, ip)) { 8005 + found_op = true; 8006 + break; 8007 + } 8008 + } while_for_each_ftrace_op(op); 8009 + mutex_unlock(&ftrace_lock); 8010 + 8011 + if (found_op) { 8012 + if (!op->ops_func) 8013 + return -EBUSY; 8014 + 
8015 + ret = op->ops_func(op, FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER); 8016 + if (ret) 8017 + return ret; 8018 + } 8019 + } 8020 + } 8021 + 8022 + return 0; 8023 + } 8024 + 8025 + /* 8026 + * Similar to prepare_direct_functions_for_ipmodify, clean up after ops 8027 + * with IPMODIFY is unregistered. The cleanup is optional for most DIRECT 8028 + * ops. 8029 + */ 8030 + static void cleanup_direct_functions_after_ipmodify(struct ftrace_ops *ops) 8031 + { 8032 + struct ftrace_func_entry *entry; 8033 + struct ftrace_hash *hash; 8034 + struct ftrace_ops *op; 8035 + int size, i; 8036 + 8037 + if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY)) 8038 + return; 8039 + 8040 + mutex_lock(&direct_mutex); 8041 + 8042 + hash = ops->func_hash->filter_hash; 8043 + size = 1 << hash->size_bits; 8044 + for (i = 0; i < size; i++) { 8045 + hlist_for_each_entry(entry, &hash->buckets[i], hlist) { 8046 + unsigned long ip = entry->ip; 8047 + bool found_op = false; 8048 + 8049 + mutex_lock(&ftrace_lock); 8050 + do_for_each_ftrace_op(op, ftrace_ops_list) { 8051 + if (!(op->flags & FTRACE_OPS_FL_DIRECT)) 8052 + continue; 8053 + if (ops_references_ip(op, ip)) { 8054 + found_op = true; 8055 + break; 8056 + } 8057 + } while_for_each_ftrace_op(op); 8058 + mutex_unlock(&ftrace_lock); 8059 + 8060 + /* The cleanup is optional, ignore any errors */ 8061 + if (found_op && op->ops_func) 8062 + op->ops_func(op, FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER); 8063 + } 8064 + } 8065 + mutex_unlock(&direct_mutex); 8066 + } 8067 + 8068 + #define lock_direct_mutex() mutex_lock(&direct_mutex) 8069 + #define unlock_direct_mutex() mutex_unlock(&direct_mutex) 8070 + 8071 + #else /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */ 8072 + 8073 + static int prepare_direct_functions_for_ipmodify(struct ftrace_ops *ops) 8074 + { 8075 + return 0; 8076 + } 8077 + 8078 + static void cleanup_direct_functions_after_ipmodify(struct ftrace_ops *ops) 8079 + { 8080 + } 8081 + 8082 + #define lock_direct_mutex() do { } while (0) 8083 + 
#define unlock_direct_mutex() do { } while (0) 8084 + 8085 + #endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */ 8086 + 8087 + /* 8088 + * Similar to register_ftrace_function, except we don't lock direct_mutex. 8089 + */ 8090 + static int register_ftrace_function_nolock(struct ftrace_ops *ops) 8091 + { 8092 + int ret; 8093 + 8094 + ftrace_ops_init(ops); 8095 + 8096 + mutex_lock(&ftrace_lock); 8097 + 8098 + ret = ftrace_startup(ops, 0); 8099 + 8100 + mutex_unlock(&ftrace_lock); 8101 + 8102 + return ret; 8103 + } 8104 + 8057 8105 /** 8058 8106 * register_ftrace_function - register a function for profiling 8059 8107 * @ops: ops structure that holds the function for profiling. ··· 8206 7980 { 8207 7981 int ret; 8208 7982 8209 - ftrace_ops_init(ops); 7983 + lock_direct_mutex(); 7984 + ret = prepare_direct_functions_for_ipmodify(ops); 7985 + if (ret < 0) 7986 + goto out_unlock; 8210 7987 8211 - mutex_lock(&ftrace_lock); 7988 + ret = register_ftrace_function_nolock(ops); 8212 7989 8213 - ret = ftrace_startup(ops, 0); 8214 - 8215 - mutex_unlock(&ftrace_lock); 8216 - 7990 + out_unlock: 7991 + unlock_direct_mutex(); 8217 7992 return ret; 8218 7993 } 8219 7994 EXPORT_SYMBOL_GPL(register_ftrace_function); ··· 8233 8006 ret = ftrace_shutdown(ops, 0); 8234 8007 mutex_unlock(&ftrace_lock); 8235 8008 8009 + cleanup_direct_functions_after_ipmodify(ops); 8236 8010 return ret; 8237 8011 } 8238 8012 EXPORT_SYMBOL_GPL(unregister_ftrace_function);
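The scan in prepare_direct_functions_for_ipmodify() above can be sketched in userspace. This is a simulation under simplified, hypothetical types (`sim_ops`, the `FL_*` flags) rather than the kernel API: an ops with IPMODIFY may only be registered if every DIRECT ops covering the same address exposes an `ops_func` callback that agrees to share.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace simulation of the IPMODIFY vs. DIRECT conflict scan.
 * All types and flag values are hypothetical stand-ins. */
#define FL_DIRECT   0x1
#define FL_IPMODIFY 0x2

enum cmd { CMD_ENABLE_SHARE_IPMODIFY_PEER };

struct sim_ops {
	unsigned int flags;
	unsigned long ips[4];	/* stands in for the filter hash */
	int nips;
	int (*ops_func)(struct sim_ops *op, enum cmd c);
};

static struct sim_ops *ops_list[8];
static int nops;

static int references_ip(const struct sim_ops *op, unsigned long ip)
{
	for (int i = 0; i < op->nips; i++)
		if (op->ips[i] == ip)
			return 1;
	return 0;
}

/* 0 on success; -16 (-EBUSY) if a DIRECT ops on the same ip cannot share. */
int prepare_for_ipmodify(struct sim_ops *ops)
{
	if (!(ops->flags & FL_IPMODIFY))
		return 0;
	for (int i = 0; i < ops->nips; i++) {
		for (int j = 0; j < nops; j++) {
			struct sim_ops *op = ops_list[j];

			if (!(op->flags & FL_DIRECT) ||
			    !references_ip(op, ops->ips[i]))
				continue;
			if (!op->ops_func)
				return -16;
			int ret = op->ops_func(op, CMD_ENABLE_SHARE_IPMODIFY_PEER);
			if (ret)
				return ret;
		}
	}
	return 0;
}

static int share_ok(struct sim_ops *op, enum cmd c)
{
	(void)op; (void)c;
	return 0;	/* this trampoline agrees to share with an IPMODIFY peer */
}

static struct sim_ops direct_sharable = { FL_DIRECT, { 0x1000 }, 1, share_ok };
static struct sim_ops direct_rigid    = { FL_DIRECT, { 0x2000 }, 1, NULL };
static struct sim_ops klp_a = { FL_IPMODIFY, { 0x1000 }, 1, NULL };
static struct sim_ops klp_b = { FL_IPMODIFY, { 0x2000 }, 1, NULL };

void sim_setup(void)
{
	ops_list[0] = &direct_sharable;
	ops_list[1] = &direct_rigid;
	nops = 2;
}
```

The second half of the real patch (cleanup_direct_functions_after_ipmodify) is the mirror image: the same walk, issuing the DISABLE command and ignoring errors.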
+30 -48
net/bpf/test_run.c
··· 691 691 { 692 692 } 693 693 694 + noinline void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p) 695 + { 696 + } 697 + 694 698 __diag_pop(); 695 699 696 700 ALLOW_ERROR_INJECTION(bpf_modify_return_test, ERRNO); 697 701 698 - BTF_SET_START(test_sk_check_kfunc_ids) 699 - BTF_ID(func, bpf_kfunc_call_test1) 700 - BTF_ID(func, bpf_kfunc_call_test2) 701 - BTF_ID(func, bpf_kfunc_call_test3) 702 - BTF_ID(func, bpf_kfunc_call_test_acquire) 703 - BTF_ID(func, bpf_kfunc_call_memb_acquire) 704 - BTF_ID(func, bpf_kfunc_call_test_release) 705 - BTF_ID(func, bpf_kfunc_call_memb_release) 706 - BTF_ID(func, bpf_kfunc_call_memb1_release) 707 - BTF_ID(func, bpf_kfunc_call_test_kptr_get) 708 - BTF_ID(func, bpf_kfunc_call_test_pass_ctx) 709 - BTF_ID(func, bpf_kfunc_call_test_pass1) 710 - BTF_ID(func, bpf_kfunc_call_test_pass2) 711 - BTF_ID(func, bpf_kfunc_call_test_fail1) 712 - BTF_ID(func, bpf_kfunc_call_test_fail2) 713 - BTF_ID(func, bpf_kfunc_call_test_fail3) 714 - BTF_ID(func, bpf_kfunc_call_test_mem_len_pass1) 715 - BTF_ID(func, bpf_kfunc_call_test_mem_len_fail1) 716 - BTF_ID(func, bpf_kfunc_call_test_mem_len_fail2) 717 - BTF_SET_END(test_sk_check_kfunc_ids) 718 - 719 - BTF_SET_START(test_sk_acquire_kfunc_ids) 720 - BTF_ID(func, bpf_kfunc_call_test_acquire) 721 - BTF_ID(func, bpf_kfunc_call_memb_acquire) 722 - BTF_ID(func, bpf_kfunc_call_test_kptr_get) 723 - BTF_SET_END(test_sk_acquire_kfunc_ids) 724 - 725 - BTF_SET_START(test_sk_release_kfunc_ids) 726 - BTF_ID(func, bpf_kfunc_call_test_release) 727 - BTF_ID(func, bpf_kfunc_call_memb_release) 728 - BTF_ID(func, bpf_kfunc_call_memb1_release) 729 - BTF_SET_END(test_sk_release_kfunc_ids) 730 - 731 - BTF_SET_START(test_sk_ret_null_kfunc_ids) 732 - BTF_ID(func, bpf_kfunc_call_test_acquire) 733 - BTF_ID(func, bpf_kfunc_call_memb_acquire) 734 - BTF_ID(func, bpf_kfunc_call_test_kptr_get) 735 - BTF_SET_END(test_sk_ret_null_kfunc_ids) 736 - 737 - BTF_SET_START(test_sk_kptr_acquire_kfunc_ids) 738 - BTF_ID(func, 
bpf_kfunc_call_test_kptr_get) 739 - BTF_SET_END(test_sk_kptr_acquire_kfunc_ids) 702 + BTF_SET8_START(test_sk_check_kfunc_ids) 703 + BTF_ID_FLAGS(func, bpf_kfunc_call_test1) 704 + BTF_ID_FLAGS(func, bpf_kfunc_call_test2) 705 + BTF_ID_FLAGS(func, bpf_kfunc_call_test3) 706 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_acquire, KF_ACQUIRE | KF_RET_NULL) 707 + BTF_ID_FLAGS(func, bpf_kfunc_call_memb_acquire, KF_ACQUIRE | KF_RET_NULL) 708 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_release, KF_RELEASE) 709 + BTF_ID_FLAGS(func, bpf_kfunc_call_memb_release, KF_RELEASE) 710 + BTF_ID_FLAGS(func, bpf_kfunc_call_memb1_release, KF_RELEASE) 711 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_kptr_get, KF_ACQUIRE | KF_RET_NULL | KF_KPTR_GET) 712 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass_ctx) 713 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass1) 714 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass2) 715 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail1) 716 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail2) 717 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail3) 718 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1) 719 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1) 720 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2) 721 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS) 722 + BTF_SET8_END(test_sk_check_kfunc_ids) 740 723 741 724 static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size, 742 725 u32 size, u32 headroom, u32 tailroom) ··· 937 954 static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb) 938 955 { 939 956 struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb; 957 + 958 + if (!skb->len) 959 + return -EINVAL; 940 960 941 961 if (!__skb) 942 962 return 0; ··· 1603 1617 } 1604 1618 1605 1619 static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = { 1606 - .owner = THIS_MODULE, 1607 - .check_set = &test_sk_check_kfunc_ids, 1608 - .acquire_set = &test_sk_acquire_kfunc_ids, 1609 - .release_set = &test_sk_release_kfunc_ids, 
1610 - .ret_null_set = &test_sk_ret_null_kfunc_ids, 1611 - .kptr_acquire_set = &test_sk_kptr_acquire_kfunc_ids 1620 + .owner = THIS_MODULE, 1621 + .set = &test_sk_check_kfunc_ids, 1612 1622 }; 1613 1623 1614 1624 BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids)
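The conversion in this file (and the bpf_tcp_ca/tcp_bbr/tcp_cubic/tcp_dctcp files below) replaces four parallel BTF ID sets with one set of (id, flags) pairs. A plain-C sketch of that data-model change, with made-up ids and illustrative flag values rather than the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* One table of (BTF id, flags) pairs replaces the old parallel
 * check/acquire/release/ret_null sets. Ids and flag bits are invented. */
#define KF_ACQUIRE  (1 << 0)
#define KF_RELEASE  (1 << 1)
#define KF_RET_NULL (1 << 2)

struct id8_pair { uint32_t id; uint32_t flags; };

static const struct id8_pair test_set[] = {
	{ 101, 0 },				/* plain kfunc */
	{ 102, KF_ACQUIRE | KF_RET_NULL },	/* e.g. a *_lookup kfunc */
	{ 103, KF_RELEASE },			/* e.g. a *_release kfunc */
};

/* Return the flags for a kfunc id, or -1 if the id is not in the set. */
long kfunc_flags(uint32_t id)
{
	for (unsigned int i = 0; i < sizeof(test_set) / sizeof(test_set[0]); i++)
		if (test_set[i].id == id)
			return test_set[i].flags;
	return -1;
}
```

One lookup now answers both "is this kfunc allowed?" and "what are its semantics?", which is why the btf_kfunc_id_set shrinks to a single `.set` member.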
+1
net/core/dev.c
···
4168 4168 	bool again = false;
4169 4169 
4170 4170 	skb_reset_mac_header(skb);
4171      +	skb_assert_len(skb);
4171 4172 
4172 4173 	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
4173 4174 		__skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
+2 -2
net/core/filter.c
···
237 237 BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *,
238 238 	   data, int, headlen, int, offset)
239 239 {
240     -	u16 tmp, *ptr;
240     +	__be16 tmp, *ptr;
241 241 	const int len = sizeof(tmp);
242 242 
243 243 	if (offset >= 0) {
···
264 264 BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *,
265 265 	   data, int, headlen, int, offset)
266 266 {
267     -	u32 tmp, *ptr;
267     +	__be32 tmp, *ptr;
268 268 	const int len = sizeof(tmp);
269 269 
270 270 	if (likely(offset >= 0)) {
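The filter.c change above is a sparse-annotation fix: bytes copied straight out of a packet are network (big) endian, so the temporaries should be `__be16`/`__be32` and pass through ntohs()/ntohl() exactly once. A plain-C sketch of the same access pattern (the helper name and buffer here are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Load a 16-bit big-endian field from a packet buffer and return it in
 * host byte order. The temporary plays the role of a __be16. */
uint16_t load_be16(const unsigned char *pkt, int offset)
{
	uint16_t tmp;			/* network byte order until converted */

	memcpy(&tmp, pkt + offset, sizeof(tmp));
	return ntohs(tmp);		/* convert to host order once */
}

static const unsigned char sample_pkt[4] = { 0x12, 0x34, 0xab, 0xcd };
```

The result is the same on little- and big-endian hosts, which is the point: typing the temporary correctly lets sparse flag any path where the conversion is missing or doubled.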
+2 -2
net/core/skmsg.c
···
462 462 
463 463 		if (copied == len)
464 464 			break;
465     -	} while (i != msg_rx->sg.end);
465     +	} while (!sg_is_last(sge));
466 466 
467 467 	if (unlikely(peek)) {
468 468 		msg_rx = sk_psock_next_msg(psock, msg_rx);
···
472 472 	}
473 473 
474 474 	msg_rx->sg.start = i;
475     -	if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) {
475     +	if (!sge->length && sg_is_last(sge)) {
476 476 		msg_rx = sk_psock_dequeue_msg(psock);
477 477 		kfree_sk_msg(msg_rx);
478 478 	}
```
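The skmsg.c fix switches the loop exit from comparing a ring index against sg.end to asking the entry itself whether it is the last one. A minimal sketch of that termination style, under simplified types (`sim_sge` is a stand-in, not the kernel's struct scatterlist):

```c
#include <assert.h>

/* Each entry can carry an "end" marker, so walkers check the entry
 * instead of an external index. */
struct sim_sge { int length; int is_last; };

/* Sum the bytes in a chain, stopping at the marked entry. */
int chain_bytes(const struct sim_sge *sge)
{
	int total = 0;

	for (;;) {
		total += sge->length;
		if (sge->is_last)	/* stands in for sg_is_last(sge) */
			break;
		sge++;
	}
	return total;
}

/* The trailing {999, 0} entry is garbage past the marked end and must
 * never be visited. */
static const struct sim_sge chain[] = {
	{ 100, 0 }, { 50, 0 }, { 25, 1 }, { 999, 0 },
};
```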
+9 -9
net/ipv4/bpf_tcp_ca.c
··· 197 197 } 198 198 } 199 199 200 - BTF_SET_START(bpf_tcp_ca_check_kfunc_ids) 201 - BTF_ID(func, tcp_reno_ssthresh) 202 - BTF_ID(func, tcp_reno_cong_avoid) 203 - BTF_ID(func, tcp_reno_undo_cwnd) 204 - BTF_ID(func, tcp_slow_start) 205 - BTF_ID(func, tcp_cong_avoid_ai) 206 - BTF_SET_END(bpf_tcp_ca_check_kfunc_ids) 200 + BTF_SET8_START(bpf_tcp_ca_check_kfunc_ids) 201 + BTF_ID_FLAGS(func, tcp_reno_ssthresh) 202 + BTF_ID_FLAGS(func, tcp_reno_cong_avoid) 203 + BTF_ID_FLAGS(func, tcp_reno_undo_cwnd) 204 + BTF_ID_FLAGS(func, tcp_slow_start) 205 + BTF_ID_FLAGS(func, tcp_cong_avoid_ai) 206 + BTF_SET8_END(bpf_tcp_ca_check_kfunc_ids) 207 207 208 208 static const struct btf_kfunc_id_set bpf_tcp_ca_kfunc_set = { 209 - .owner = THIS_MODULE, 210 - .check_set = &bpf_tcp_ca_check_kfunc_ids, 209 + .owner = THIS_MODULE, 210 + .set = &bpf_tcp_ca_check_kfunc_ids, 211 211 }; 212 212 213 213 static const struct bpf_verifier_ops bpf_tcp_ca_verifier_ops = {
+12 -12
net/ipv4/tcp_bbr.c
··· 1154 1154 .set_state = bbr_set_state, 1155 1155 }; 1156 1156 1157 - BTF_SET_START(tcp_bbr_check_kfunc_ids) 1157 + BTF_SET8_START(tcp_bbr_check_kfunc_ids) 1158 1158 #ifdef CONFIG_X86 1159 1159 #ifdef CONFIG_DYNAMIC_FTRACE 1160 - BTF_ID(func, bbr_init) 1161 - BTF_ID(func, bbr_main) 1162 - BTF_ID(func, bbr_sndbuf_expand) 1163 - BTF_ID(func, bbr_undo_cwnd) 1164 - BTF_ID(func, bbr_cwnd_event) 1165 - BTF_ID(func, bbr_ssthresh) 1166 - BTF_ID(func, bbr_min_tso_segs) 1167 - BTF_ID(func, bbr_set_state) 1160 + BTF_ID_FLAGS(func, bbr_init) 1161 + BTF_ID_FLAGS(func, bbr_main) 1162 + BTF_ID_FLAGS(func, bbr_sndbuf_expand) 1163 + BTF_ID_FLAGS(func, bbr_undo_cwnd) 1164 + BTF_ID_FLAGS(func, bbr_cwnd_event) 1165 + BTF_ID_FLAGS(func, bbr_ssthresh) 1166 + BTF_ID_FLAGS(func, bbr_min_tso_segs) 1167 + BTF_ID_FLAGS(func, bbr_set_state) 1168 1168 #endif 1169 1169 #endif 1170 - BTF_SET_END(tcp_bbr_check_kfunc_ids) 1170 + BTF_SET8_END(tcp_bbr_check_kfunc_ids) 1171 1171 1172 1172 static const struct btf_kfunc_id_set tcp_bbr_kfunc_set = { 1173 - .owner = THIS_MODULE, 1174 - .check_set = &tcp_bbr_check_kfunc_ids, 1173 + .owner = THIS_MODULE, 1174 + .set = &tcp_bbr_check_kfunc_ids, 1175 1175 }; 1176 1176 1177 1177 static int __init bbr_register(void)
+10 -10
net/ipv4/tcp_cubic.c
··· 485 485 .name = "cubic", 486 486 }; 487 487 488 - BTF_SET_START(tcp_cubic_check_kfunc_ids) 488 + BTF_SET8_START(tcp_cubic_check_kfunc_ids) 489 489 #ifdef CONFIG_X86 490 490 #ifdef CONFIG_DYNAMIC_FTRACE 491 - BTF_ID(func, cubictcp_init) 492 - BTF_ID(func, cubictcp_recalc_ssthresh) 493 - BTF_ID(func, cubictcp_cong_avoid) 494 - BTF_ID(func, cubictcp_state) 495 - BTF_ID(func, cubictcp_cwnd_event) 496 - BTF_ID(func, cubictcp_acked) 491 + BTF_ID_FLAGS(func, cubictcp_init) 492 + BTF_ID_FLAGS(func, cubictcp_recalc_ssthresh) 493 + BTF_ID_FLAGS(func, cubictcp_cong_avoid) 494 + BTF_ID_FLAGS(func, cubictcp_state) 495 + BTF_ID_FLAGS(func, cubictcp_cwnd_event) 496 + BTF_ID_FLAGS(func, cubictcp_acked) 497 497 #endif 498 498 #endif 499 - BTF_SET_END(tcp_cubic_check_kfunc_ids) 499 + BTF_SET8_END(tcp_cubic_check_kfunc_ids) 500 500 501 501 static const struct btf_kfunc_id_set tcp_cubic_kfunc_set = { 502 - .owner = THIS_MODULE, 503 - .check_set = &tcp_cubic_check_kfunc_ids, 502 + .owner = THIS_MODULE, 503 + .set = &tcp_cubic_check_kfunc_ids, 504 504 }; 505 505 506 506 static int __init cubictcp_register(void)
+10 -10
net/ipv4/tcp_dctcp.c
··· 239 239 .name = "dctcp-reno", 240 240 }; 241 241 242 - BTF_SET_START(tcp_dctcp_check_kfunc_ids) 242 + BTF_SET8_START(tcp_dctcp_check_kfunc_ids) 243 243 #ifdef CONFIG_X86 244 244 #ifdef CONFIG_DYNAMIC_FTRACE 245 - BTF_ID(func, dctcp_init) 246 - BTF_ID(func, dctcp_update_alpha) 247 - BTF_ID(func, dctcp_cwnd_event) 248 - BTF_ID(func, dctcp_ssthresh) 249 - BTF_ID(func, dctcp_cwnd_undo) 250 - BTF_ID(func, dctcp_state) 245 + BTF_ID_FLAGS(func, dctcp_init) 246 + BTF_ID_FLAGS(func, dctcp_update_alpha) 247 + BTF_ID_FLAGS(func, dctcp_cwnd_event) 248 + BTF_ID_FLAGS(func, dctcp_ssthresh) 249 + BTF_ID_FLAGS(func, dctcp_cwnd_undo) 250 + BTF_ID_FLAGS(func, dctcp_state) 251 251 #endif 252 252 #endif 253 - BTF_SET_END(tcp_dctcp_check_kfunc_ids) 253 + BTF_SET8_END(tcp_dctcp_check_kfunc_ids) 254 254 255 255 static const struct btf_kfunc_id_set tcp_dctcp_kfunc_set = { 256 - .owner = THIS_MODULE, 257 - .check_set = &tcp_dctcp_check_kfunc_ids, 256 + .owner = THIS_MODULE, 257 + .set = &tcp_dctcp_check_kfunc_ids, 258 258 }; 259 259 260 260 static int __init dctcp_register(void)
+278 -91
net/netfilter/nf_conntrack_bpf.c
··· 55 55 NF_BPF_CT_OPTS_SZ = 12, 56 56 }; 57 57 58 + static int bpf_nf_ct_tuple_parse(struct bpf_sock_tuple *bpf_tuple, 59 + u32 tuple_len, u8 protonum, u8 dir, 60 + struct nf_conntrack_tuple *tuple) 61 + { 62 + union nf_inet_addr *src = dir ? &tuple->dst.u3 : &tuple->src.u3; 63 + union nf_inet_addr *dst = dir ? &tuple->src.u3 : &tuple->dst.u3; 64 + union nf_conntrack_man_proto *sport = dir ? (void *)&tuple->dst.u 65 + : &tuple->src.u; 66 + union nf_conntrack_man_proto *dport = dir ? &tuple->src.u 67 + : (void *)&tuple->dst.u; 68 + 69 + if (unlikely(protonum != IPPROTO_TCP && protonum != IPPROTO_UDP)) 70 + return -EPROTO; 71 + 72 + memset(tuple, 0, sizeof(*tuple)); 73 + 74 + switch (tuple_len) { 75 + case sizeof(bpf_tuple->ipv4): 76 + tuple->src.l3num = AF_INET; 77 + src->ip = bpf_tuple->ipv4.saddr; 78 + sport->tcp.port = bpf_tuple->ipv4.sport; 79 + dst->ip = bpf_tuple->ipv4.daddr; 80 + dport->tcp.port = bpf_tuple->ipv4.dport; 81 + break; 82 + case sizeof(bpf_tuple->ipv6): 83 + tuple->src.l3num = AF_INET6; 84 + memcpy(src->ip6, bpf_tuple->ipv6.saddr, sizeof(bpf_tuple->ipv6.saddr)); 85 + sport->tcp.port = bpf_tuple->ipv6.sport; 86 + memcpy(dst->ip6, bpf_tuple->ipv6.daddr, sizeof(bpf_tuple->ipv6.daddr)); 87 + dport->tcp.port = bpf_tuple->ipv6.dport; 88 + break; 89 + default: 90 + return -EAFNOSUPPORT; 91 + } 92 + tuple->dst.protonum = protonum; 93 + tuple->dst.dir = dir; 94 + 95 + return 0; 96 + } 97 + 98 + static struct nf_conn * 99 + __bpf_nf_ct_alloc_entry(struct net *net, struct bpf_sock_tuple *bpf_tuple, 100 + u32 tuple_len, struct bpf_ct_opts *opts, u32 opts_len, 101 + u32 timeout) 102 + { 103 + struct nf_conntrack_tuple otuple, rtuple; 104 + struct nf_conn *ct; 105 + int err; 106 + 107 + if (!opts || !bpf_tuple || opts->reserved[0] || opts->reserved[1] || 108 + opts_len != NF_BPF_CT_OPTS_SZ) 109 + return ERR_PTR(-EINVAL); 110 + 111 + if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS)) 112 + return ERR_PTR(-EINVAL); 113 + 114 + err = 
bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto, 115 + IP_CT_DIR_ORIGINAL, &otuple); 116 + if (err < 0) 117 + return ERR_PTR(err); 118 + 119 + err = bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto, 120 + IP_CT_DIR_REPLY, &rtuple); 121 + if (err < 0) 122 + return ERR_PTR(err); 123 + 124 + if (opts->netns_id >= 0) { 125 + net = get_net_ns_by_id(net, opts->netns_id); 126 + if (unlikely(!net)) 127 + return ERR_PTR(-ENONET); 128 + } 129 + 130 + ct = nf_conntrack_alloc(net, &nf_ct_zone_dflt, &otuple, &rtuple, 131 + GFP_ATOMIC); 132 + if (IS_ERR(ct)) 133 + goto out; 134 + 135 + memset(&ct->proto, 0, sizeof(ct->proto)); 136 + __nf_ct_set_timeout(ct, timeout * HZ); 137 + ct->status |= IPS_CONFIRMED; 138 + 139 + out: 140 + if (opts->netns_id >= 0) 141 + put_net(net); 142 + 143 + return ct; 144 + } 145 + 58 146 static struct nf_conn *__bpf_nf_ct_lookup(struct net *net, 59 147 struct bpf_sock_tuple *bpf_tuple, 60 - u32 tuple_len, u8 protonum, 61 - s32 netns_id, u8 *dir) 148 + u32 tuple_len, struct bpf_ct_opts *opts, 149 + u32 opts_len) 62 150 { 63 151 struct nf_conntrack_tuple_hash *hash; 64 152 struct nf_conntrack_tuple tuple; 65 153 struct nf_conn *ct; 154 + int err; 66 155 67 - if (unlikely(protonum != IPPROTO_TCP && protonum != IPPROTO_UDP)) 156 + if (!opts || !bpf_tuple || opts->reserved[0] || opts->reserved[1] || 157 + opts_len != NF_BPF_CT_OPTS_SZ) 158 + return ERR_PTR(-EINVAL); 159 + if (unlikely(opts->l4proto != IPPROTO_TCP && opts->l4proto != IPPROTO_UDP)) 68 160 return ERR_PTR(-EPROTO); 69 - if (unlikely(netns_id < BPF_F_CURRENT_NETNS)) 161 + if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS)) 70 162 return ERR_PTR(-EINVAL); 71 163 72 - memset(&tuple, 0, sizeof(tuple)); 73 - switch (tuple_len) { 74 - case sizeof(bpf_tuple->ipv4): 75 - tuple.src.l3num = AF_INET; 76 - tuple.src.u3.ip = bpf_tuple->ipv4.saddr; 77 - tuple.src.u.tcp.port = bpf_tuple->ipv4.sport; 78 - tuple.dst.u3.ip = bpf_tuple->ipv4.daddr; 79 - tuple.dst.u.tcp.port = 
bpf_tuple->ipv4.dport; 80 - break; 81 - case sizeof(bpf_tuple->ipv6): 82 - tuple.src.l3num = AF_INET6; 83 - memcpy(tuple.src.u3.ip6, bpf_tuple->ipv6.saddr, sizeof(bpf_tuple->ipv6.saddr)); 84 - tuple.src.u.tcp.port = bpf_tuple->ipv6.sport; 85 - memcpy(tuple.dst.u3.ip6, bpf_tuple->ipv6.daddr, sizeof(bpf_tuple->ipv6.daddr)); 86 - tuple.dst.u.tcp.port = bpf_tuple->ipv6.dport; 87 - break; 88 - default: 89 - return ERR_PTR(-EAFNOSUPPORT); 90 - } 164 + err = bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto, 165 + IP_CT_DIR_ORIGINAL, &tuple); 166 + if (err < 0) 167 + return ERR_PTR(err); 91 168 92 - tuple.dst.protonum = protonum; 93 - 94 - if (netns_id >= 0) { 95 - net = get_net_ns_by_id(net, netns_id); 169 + if (opts->netns_id >= 0) { 170 + net = get_net_ns_by_id(net, opts->netns_id); 96 171 if (unlikely(!net)) 97 172 return ERR_PTR(-ENONET); 98 173 } 99 174 100 175 hash = nf_conntrack_find_get(net, &nf_ct_zone_dflt, &tuple); 101 - if (netns_id >= 0) 176 + if (opts->netns_id >= 0) 102 177 put_net(net); 103 178 if (!hash) 104 179 return ERR_PTR(-ENOENT); 105 180 106 181 ct = nf_ct_tuplehash_to_ctrack(hash); 107 - if (dir) 108 - *dir = NF_CT_DIRECTION(hash); 182 + opts->dir = NF_CT_DIRECTION(hash); 109 183 110 184 return ct; 111 185 } ··· 187 113 __diag_push(); 188 114 __diag_ignore_all("-Wmissing-prototypes", 189 115 "Global functions as their definitions will be in nf_conntrack BTF"); 116 + 117 + struct nf_conn___init { 118 + struct nf_conn ct; 119 + }; 120 + 121 + /* bpf_xdp_ct_alloc - Allocate a new CT entry 122 + * 123 + * Parameters: 124 + * @xdp_ctx - Pointer to ctx (xdp_md) in XDP program 125 + * Cannot be NULL 126 + * @bpf_tuple - Pointer to memory representing the tuple to look up 127 + * Cannot be NULL 128 + * @tuple__sz - Length of the tuple structure 129 + * Must be one of sizeof(bpf_tuple->ipv4) or 130 + * sizeof(bpf_tuple->ipv6) 131 + * @opts - Additional options for allocation (documented above) 132 + * Cannot be NULL 133 + * @opts__sz - Length of 
the bpf_ct_opts structure 134 + * Must be NF_BPF_CT_OPTS_SZ (12) 135 + */ 136 + struct nf_conn___init * 137 + bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple, 138 + u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz) 139 + { 140 + struct xdp_buff *ctx = (struct xdp_buff *)xdp_ctx; 141 + struct nf_conn *nfct; 142 + 143 + nfct = __bpf_nf_ct_alloc_entry(dev_net(ctx->rxq->dev), bpf_tuple, tuple__sz, 144 + opts, opts__sz, 10); 145 + if (IS_ERR(nfct)) { 146 + if (opts) 147 + opts->error = PTR_ERR(nfct); 148 + return NULL; 149 + } 150 + 151 + return (struct nf_conn___init *)nfct; 152 + } 190 153 191 154 /* bpf_xdp_ct_lookup - Lookup CT entry for the given tuple, and acquire a 192 155 * reference to it ··· 249 138 struct net *caller_net; 250 139 struct nf_conn *nfct; 251 140 252 - BUILD_BUG_ON(sizeof(struct bpf_ct_opts) != NF_BPF_CT_OPTS_SZ); 253 - 254 - if (!opts) 255 - return NULL; 256 - if (!bpf_tuple || opts->reserved[0] || opts->reserved[1] || 257 - opts__sz != NF_BPF_CT_OPTS_SZ) { 258 - opts->error = -EINVAL; 259 - return NULL; 260 - } 261 141 caller_net = dev_net(ctx->rxq->dev); 262 - nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts->l4proto, 263 - opts->netns_id, &opts->dir); 142 + nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz); 264 143 if (IS_ERR(nfct)) { 265 - opts->error = PTR_ERR(nfct); 144 + if (opts) 145 + opts->error = PTR_ERR(nfct); 266 146 return NULL; 267 147 } 268 148 return nfct; 149 + } 150 + 151 + /* bpf_skb_ct_alloc - Allocate a new CT entry 152 + * 153 + * Parameters: 154 + * @skb_ctx - Pointer to ctx (__sk_buff) in TC program 155 + * Cannot be NULL 156 + * @bpf_tuple - Pointer to memory representing the tuple to look up 157 + * Cannot be NULL 158 + * @tuple__sz - Length of the tuple structure 159 + * Must be one of sizeof(bpf_tuple->ipv4) or 160 + * sizeof(bpf_tuple->ipv6) 161 + * @opts - Additional options for allocation (documented above) 162 + * Cannot be NULL 163 + * 
@opts__sz - Length of the bpf_ct_opts structure 164 + * Must be NF_BPF_CT_OPTS_SZ (12) 165 + */ 166 + struct nf_conn___init * 167 + bpf_skb_ct_alloc(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple, 168 + u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz) 169 + { 170 + struct sk_buff *skb = (struct sk_buff *)skb_ctx; 171 + struct nf_conn *nfct; 172 + struct net *net; 173 + 174 + net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); 175 + nfct = __bpf_nf_ct_alloc_entry(net, bpf_tuple, tuple__sz, opts, opts__sz, 10); 176 + if (IS_ERR(nfct)) { 177 + if (opts) 178 + opts->error = PTR_ERR(nfct); 179 + return NULL; 180 + } 181 + 182 + return (struct nf_conn___init *)nfct; 269 183 } 270 184 271 185 /* bpf_skb_ct_lookup - Lookup CT entry for the given tuple, and acquire a ··· 317 181 struct net *caller_net; 318 182 struct nf_conn *nfct; 319 183 320 - BUILD_BUG_ON(sizeof(struct bpf_ct_opts) != NF_BPF_CT_OPTS_SZ); 321 - 322 - if (!opts) 323 - return NULL; 324 - if (!bpf_tuple || opts->reserved[0] || opts->reserved[1] || 325 - opts__sz != NF_BPF_CT_OPTS_SZ) { 326 - opts->error = -EINVAL; 184 + caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); 185 + nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz); 186 + if (IS_ERR(nfct)) { 187 + if (opts) 188 + opts->error = PTR_ERR(nfct); 327 189 return NULL; 328 190 } 329 - caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); 330 - nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts->l4proto, 331 - opts->netns_id, &opts->dir); 332 - if (IS_ERR(nfct)) { 333 - opts->error = PTR_ERR(nfct); 191 + return nfct; 192 + } 193 + 194 + /* bpf_ct_insert_entry - Add the provided entry into a CT map 195 + * 196 + * This must be invoked for referenced PTR_TO_BTF_ID. 197 + * 198 + * @nfct - Pointer to referenced nf_conn___init object, obtained 199 + * using bpf_xdp_ct_alloc or bpf_skb_ct_alloc. 
200 + */ 201 + struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i) 202 + { 203 + struct nf_conn *nfct = (struct nf_conn *)nfct_i; 204 + int err; 205 + 206 + err = nf_conntrack_hash_check_insert(nfct); 207 + if (err < 0) { 208 + nf_conntrack_free(nfct); 334 209 return NULL; 335 210 } 336 211 return nfct; ··· 364 217 nf_ct_put(nfct); 365 218 } 366 219 220 + /* bpf_ct_set_timeout - Set timeout of allocated nf_conn 221 + * 222 + * Sets the default timeout of newly allocated nf_conn before insertion. 223 + * This helper must be invoked for refcounted pointer to nf_conn___init. 224 + * 225 + * Parameters: 226 + * @nfct - Pointer to referenced nf_conn object, obtained using 227 + * bpf_xdp_ct_alloc or bpf_skb_ct_alloc. 228 + * @timeout - Timeout in msecs. 229 + */ 230 + void bpf_ct_set_timeout(struct nf_conn___init *nfct, u32 timeout) 231 + { 232 + __nf_ct_set_timeout((struct nf_conn *)nfct, msecs_to_jiffies(timeout)); 233 + } 234 + 235 + /* bpf_ct_change_timeout - Change timeout of inserted nf_conn 236 + * 237 + * Change timeout associated of the inserted or looked up nf_conn. 238 + * This helper must be invoked for refcounted pointer to nf_conn. 239 + * 240 + * Parameters: 241 + * @nfct - Pointer to referenced nf_conn object, obtained using 242 + * bpf_ct_insert_entry, bpf_xdp_ct_lookup, or bpf_skb_ct_lookup. 243 + * @timeout - New timeout in msecs. 244 + */ 245 + int bpf_ct_change_timeout(struct nf_conn *nfct, u32 timeout) 246 + { 247 + return __nf_ct_change_timeout(nfct, msecs_to_jiffies(timeout)); 248 + } 249 + 250 + /* bpf_ct_set_status - Set status field of allocated nf_conn 251 + * 252 + * Set the status field of the newly allocated nf_conn before insertion. 253 + * This must be invoked for referenced PTR_TO_BTF_ID to nf_conn___init. 254 + * 255 + * Parameters: 256 + * @nfct - Pointer to referenced nf_conn object, obtained using 257 + * bpf_xdp_ct_alloc or bpf_skb_ct_alloc. 258 + * @status - New status value. 
259 + */ 260 + int bpf_ct_set_status(const struct nf_conn___init *nfct, u32 status) 261 + { 262 + return nf_ct_change_status_common((struct nf_conn *)nfct, status); 263 + } 264 + 265 + /* bpf_ct_change_status - Change status of inserted nf_conn 266 + * 267 + * Change the status field of the provided connection tracking entry. 268 + * This must be invoked for referenced PTR_TO_BTF_ID to nf_conn. 269 + * 270 + * Parameters: 271 + * @nfct - Pointer to referenced nf_conn object, obtained using 272 + * bpf_ct_insert_entry, bpf_xdp_ct_lookup or bpf_skb_ct_lookup. 273 + * @status - New status value. 274 + */ 275 + int bpf_ct_change_status(struct nf_conn *nfct, u32 status) 276 + { 277 + return nf_ct_change_status_common(nfct, status); 278 + } 279 + 367 280 __diag_pop() 368 281 369 - BTF_SET_START(nf_ct_xdp_check_kfunc_ids) 370 - BTF_ID(func, bpf_xdp_ct_lookup) 371 - BTF_ID(func, bpf_ct_release) 372 - BTF_SET_END(nf_ct_xdp_check_kfunc_ids) 282 + BTF_SET8_START(nf_ct_kfunc_set) 283 + BTF_ID_FLAGS(func, bpf_xdp_ct_alloc, KF_ACQUIRE | KF_RET_NULL) 284 + BTF_ID_FLAGS(func, bpf_xdp_ct_lookup, KF_ACQUIRE | KF_RET_NULL) 285 + BTF_ID_FLAGS(func, bpf_skb_ct_alloc, KF_ACQUIRE | KF_RET_NULL) 286 + BTF_ID_FLAGS(func, bpf_skb_ct_lookup, KF_ACQUIRE | KF_RET_NULL) 287 + BTF_ID_FLAGS(func, bpf_ct_insert_entry, KF_ACQUIRE | KF_RET_NULL | KF_RELEASE) 288 + BTF_ID_FLAGS(func, bpf_ct_release, KF_RELEASE) 289 + BTF_ID_FLAGS(func, bpf_ct_set_timeout, KF_TRUSTED_ARGS) 290 + BTF_ID_FLAGS(func, bpf_ct_change_timeout, KF_TRUSTED_ARGS) 291 + BTF_ID_FLAGS(func, bpf_ct_set_status, KF_TRUSTED_ARGS) 292 + BTF_ID_FLAGS(func, bpf_ct_change_status, KF_TRUSTED_ARGS) 293 + BTF_SET8_END(nf_ct_kfunc_set) 373 294 374 - BTF_SET_START(nf_ct_tc_check_kfunc_ids) 375 - BTF_ID(func, bpf_skb_ct_lookup) 376 - BTF_ID(func, bpf_ct_release) 377 - BTF_SET_END(nf_ct_tc_check_kfunc_ids) 378 - 379 - BTF_SET_START(nf_ct_acquire_kfunc_ids) 380 - BTF_ID(func, bpf_xdp_ct_lookup) 381 - BTF_ID(func, bpf_skb_ct_lookup) 382 - 
BTF_SET_END(nf_ct_acquire_kfunc_ids) 383 - 384 - BTF_SET_START(nf_ct_release_kfunc_ids) 385 - BTF_ID(func, bpf_ct_release) 386 - BTF_SET_END(nf_ct_release_kfunc_ids) 387 - 388 - /* Both sets are identical */ 389 - #define nf_ct_ret_null_kfunc_ids nf_ct_acquire_kfunc_ids 390 - 391 - static const struct btf_kfunc_id_set nf_conntrack_xdp_kfunc_set = { 392 - .owner = THIS_MODULE, 393 - .check_set = &nf_ct_xdp_check_kfunc_ids, 394 - .acquire_set = &nf_ct_acquire_kfunc_ids, 395 - .release_set = &nf_ct_release_kfunc_ids, 396 - .ret_null_set = &nf_ct_ret_null_kfunc_ids, 397 - }; 398 - 399 - static const struct btf_kfunc_id_set nf_conntrack_tc_kfunc_set = { 400 - .owner = THIS_MODULE, 401 - .check_set = &nf_ct_tc_check_kfunc_ids, 402 - .acquire_set = &nf_ct_acquire_kfunc_ids, 403 - .release_set = &nf_ct_release_kfunc_ids, 404 - .ret_null_set = &nf_ct_ret_null_kfunc_ids, 295 + static const struct btf_kfunc_id_set nf_conntrack_kfunc_set = { 296 + .owner = THIS_MODULE, 297 + .set = &nf_ct_kfunc_set, 405 298 }; 406 299 407 300 int register_nf_conntrack_bpf(void) 408 301 { 409 302 int ret; 410 303 411 - ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_xdp_kfunc_set); 412 - return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_tc_kfunc_set); 304 + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_kfunc_set); 305 + return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_kfunc_set); 413 306 }
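The tuple handling factored into bpf_nf_ct_tuple_parse() above takes a direction argument: the reply tuple that __bpf_nf_ct_alloc_entry() builds is the original tuple with source and destination swapped. A sketch of that swap under simplified IPv4-only structs (these are stand-ins, not the kernel's bpf_sock_tuple/nf_conntrack_tuple):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DIR_ORIGINAL 0
#define DIR_REPLY    1

struct sim_sock_tuple { uint32_t saddr, daddr; uint16_t sport, dport; };
struct sim_ct_tuple   { uint32_t src_ip, dst_ip; uint16_t src_port, dst_port; };

/* Fill a CT tuple from a socket tuple; DIR_REPLY mirrors src and dst. */
void tuple_parse(const struct sim_sock_tuple *in, int dir,
		 struct sim_ct_tuple *out)
{
	memset(out, 0, sizeof(*out));
	if (dir == DIR_ORIGINAL) {
		out->src_ip = in->saddr;  out->src_port = in->sport;
		out->dst_ip = in->daddr;  out->dst_port = in->dport;
	} else {			/* DIR_REPLY: mirror the tuple */
		out->src_ip = in->daddr;  out->src_port = in->dport;
		out->dst_ip = in->saddr;  out->dst_port = in->sport;
	}
}

/* 1 if parsing the same tuple in both directions yields mirror images. */
int reply_is_mirror(const struct sim_sock_tuple *in)
{
	struct sim_ct_tuple o, r;

	tuple_parse(in, DIR_ORIGINAL, &o);
	tuple_parse(in, DIR_REPLY, &r);
	return o.src_ip == r.dst_ip && o.dst_ip == r.src_ip &&
	       o.src_port == r.dst_port && o.dst_port == r.src_port;
}

static const struct sim_sock_tuple sample = { 0x0a000001, 0x0a000002, 1234, 80 };
```

In the real code the swap is done with pointer aliasing (`dir ? &tuple->dst.u3 : &tuple->src.u3`) so one body serves both directions; the effect is the same.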
+62
net/netfilter/nf_conntrack_core.c
··· 2806 2806 free_percpu(net->ct.stat); 2807 2807 return ret; 2808 2808 } 2809 + 2810 + #if (IS_BUILTIN(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \ 2811 + (IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES) || \ 2812 + IS_ENABLED(CONFIG_NF_CT_NETLINK)) 2813 + 2814 + /* ctnetlink code shared by both ctnetlink and nf_conntrack_bpf */ 2815 + 2816 + int __nf_ct_change_timeout(struct nf_conn *ct, u64 timeout) 2817 + { 2818 + if (test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status)) 2819 + return -EPERM; 2820 + 2821 + __nf_ct_set_timeout(ct, timeout); 2822 + 2823 + if (test_bit(IPS_DYING_BIT, &ct->status)) 2824 + return -ETIME; 2825 + 2826 + return 0; 2827 + } 2828 + EXPORT_SYMBOL_GPL(__nf_ct_change_timeout); 2829 + 2830 + void __nf_ct_change_status(struct nf_conn *ct, unsigned long on, unsigned long off) 2831 + { 2832 + unsigned int bit; 2833 + 2834 + /* Ignore these unchangable bits */ 2835 + on &= ~IPS_UNCHANGEABLE_MASK; 2836 + off &= ~IPS_UNCHANGEABLE_MASK; 2837 + 2838 + for (bit = 0; bit < __IPS_MAX_BIT; bit++) { 2839 + if (on & (1 << bit)) 2840 + set_bit(bit, &ct->status); 2841 + else if (off & (1 << bit)) 2842 + clear_bit(bit, &ct->status); 2843 + } 2844 + } 2845 + EXPORT_SYMBOL_GPL(__nf_ct_change_status); 2846 + 2847 + int nf_ct_change_status_common(struct nf_conn *ct, unsigned int status) 2848 + { 2849 + unsigned long d; 2850 + 2851 + d = ct->status ^ status; 2852 + 2853 + if (d & (IPS_EXPECTED|IPS_CONFIRMED|IPS_DYING)) 2854 + /* unchangeable */ 2855 + return -EBUSY; 2856 + 2857 + if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY)) 2858 + /* SEEN_REPLY bit can only be set */ 2859 + return -EBUSY; 2860 + 2861 + if (d & IPS_ASSURED && !(status & IPS_ASSURED)) 2862 + /* ASSURED bit can only be set */ 2863 + return -EBUSY; 2864 + 2865 + __nf_ct_change_status(ct, status, 0); 2866 + return 0; 2867 + } 2868 + EXPORT_SYMBOL_GPL(nf_ct_change_status_common); 2869 + 2870 + #endif
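The status helpers hoisted into nf_conntrack_core.c above enforce two rules: bits in the unchangeable mask are silently ignored, and a few bits are one-way (they may be set but never cleared). A userspace sketch of that logic, with illustrative bit values rather than the kernel's IPS_* layout:

```c
#include <assert.h>

#define ST_EXPECTED   (1UL << 0)
#define ST_SEEN_REPLY (1UL << 1)
#define ST_ASSURED    (1UL << 2)
#define ST_CONFIRMED  (1UL << 3)
#define ST_DYING      (1UL << 4)

#define UNCHANGEABLE_MASK (ST_EXPECTED | ST_CONFIRMED | ST_DYING)

/* Apply on/off masks after filtering out the unchangeable bits,
 * mirroring __nf_ct_change_status(). */
static void change_status(unsigned long *status, unsigned long on,
			  unsigned long off)
{
	on &= ~UNCHANGEABLE_MASK;
	off &= ~UNCHANGEABLE_MASK;
	*status = (*status | on) & ~off;
}

/* 0 on success; -16 (-EBUSY) when the request would flip an
 * unchangeable or one-way bit, as in nf_ct_change_status_common(). */
int change_status_common(unsigned long *cur, unsigned long status)
{
	unsigned long d = *cur ^ status;

	if (d & (ST_EXPECTED | ST_CONFIRMED | ST_DYING))
		return -16;
	if ((d & ST_SEEN_REPLY) && !(status & ST_SEEN_REPLY))
		return -16;		/* SEEN_REPLY can only be set */
	if ((d & ST_ASSURED) && !(status & ST_ASSURED))
		return -16;		/* ASSURED can only be set */

	change_status(cur, status, 0);
	return 0;
}

int status_demo(void)
{
	unsigned long st = ST_CONFIRMED;

	/* setting SEEN_REPLY on a confirmed entry is allowed */
	if (change_status_common(&st, ST_CONFIRMED | ST_SEEN_REPLY))
		return -1;
	if (!(st & ST_SEEN_REPLY))
		return -2;
	/* clearing SEEN_REPLY again must be rejected */
	if (change_status_common(&st, ST_CONFIRMED) != -16)
		return -3;
	return 0;
}
```

Sharing this one implementation is what lets ctnetlink and the new bpf_ct_set_status/bpf_ct_change_status kfuncs agree on which transitions are legal.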
+4 -50
net/netfilter/nf_conntrack_netlink.c
··· 1891 1891 } 1892 1892 #endif 1893 1893 1894 - static void 1895 - __ctnetlink_change_status(struct nf_conn *ct, unsigned long on, 1896 - unsigned long off) 1897 - { 1898 - unsigned int bit; 1899 - 1900 - /* Ignore these unchangable bits */ 1901 - on &= ~IPS_UNCHANGEABLE_MASK; 1902 - off &= ~IPS_UNCHANGEABLE_MASK; 1903 - 1904 - for (bit = 0; bit < __IPS_MAX_BIT; bit++) { 1905 - if (on & (1 << bit)) 1906 - set_bit(bit, &ct->status); 1907 - else if (off & (1 << bit)) 1908 - clear_bit(bit, &ct->status); 1909 - } 1910 - } 1911 - 1912 1894 static int 1913 1895 ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[]) 1914 1896 { 1915 - unsigned long d; 1916 - unsigned int status = ntohl(nla_get_be32(cda[CTA_STATUS])); 1917 - d = ct->status ^ status; 1918 - 1919 - if (d & (IPS_EXPECTED|IPS_CONFIRMED|IPS_DYING)) 1920 - /* unchangeable */ 1921 - return -EBUSY; 1922 - 1923 - if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY)) 1924 - /* SEEN_REPLY bit can only be set */ 1925 - return -EBUSY; 1926 - 1927 - if (d & IPS_ASSURED && !(status & IPS_ASSURED)) 1928 - /* ASSURED bit can only be set */ 1929 - return -EBUSY; 1930 - 1931 - __ctnetlink_change_status(ct, status, 0); 1932 - return 0; 1897 + return nf_ct_change_status_common(ct, ntohl(nla_get_be32(cda[CTA_STATUS]))); 1933 1898 } 1934 1899 1935 1900 static int ··· 1989 2024 static int ctnetlink_change_timeout(struct nf_conn *ct, 1990 2025 const struct nlattr * const cda[]) 1991 2026 { 1992 - u64 timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ; 1993 - 1994 - if (timeout > INT_MAX) 1995 - timeout = INT_MAX; 1996 - WRITE_ONCE(ct->timeout, nfct_time_stamp + (u32)timeout); 1997 - 1998 - if (test_bit(IPS_DYING_BIT, &ct->status)) 1999 - return -ETIME; 2000 - 2001 - return 0; 2027 + return __nf_ct_change_timeout(ct, (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ); 2002 2028 } 2003 2029 2004 2030 #if defined(CONFIG_NF_CONNTRACK_MARK) ··· 2249 2293 goto err1; 2250 2294 2251 2295 timeout = 
(u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ; 2252 - if (timeout > INT_MAX) 2253 - timeout = INT_MAX; 2254 - ct->timeout = (u32)timeout + nfct_time_stamp; 2296 + __nf_ct_set_timeout(ct, timeout); 2255 2297 2256 2298 rcu_read_lock(); 2257 2299 if (cda[CTA_HELP]) { ··· 2791 2837 * unchangeable bits but do not error out. Also user programs 2792 2838 * are allowed to clear the bits that they are allowed to change. 2793 2839 */ 2794 - __ctnetlink_change_status(ct, status, ~status); 2840 + __nf_ct_change_status(ct, status, ~status); 2795 2841 return 0; 2796 2842 } 2797 2843
+4 -1
net/xdp/xsk.c
···
639 639 	if (unlikely(need_wait))
640 640 		return -EOPNOTSUPP;
641 641 
642     -	if (sk_can_busy_loop(sk))
642     +	if (sk_can_busy_loop(sk)) {
643     +		if (xs->zc)
644     +			__sk_mark_napi_id_once(sk, xsk_pool_get_napi_id(xs->pool));
643 645 		sk_busy_loop(sk, 1); /* only support non-blocking sockets */
646     +	}
644 647 
645 648 	if (xs->zc && xsk_no_wakeup(sk))
646 649 		return 0;
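The xsk fix records the NAPI id the first time through the busy-poll path, so send-only zero-copy sockets actually spin on the right queue; later calls leave the id alone. The "set once" idiom it relies on can be sketched like this (simplified types, no kernel API):

```c
#include <assert.h>

struct sim_sock { unsigned int napi_id; };

/* Record the NAPI id only if none has been recorded yet, mirroring the
 * intent of __sk_mark_napi_id_once(). */
void mark_napi_id_once(struct sim_sock *sk, unsigned int napi_id)
{
	if (!sk->napi_id)		/* only the first caller wins */
		sk->napi_id = napi_id;
}

int once_demo(void)
{
	struct sim_sock sk = { 0 };

	mark_napi_id_once(&sk, 7);
	mark_napi_id_once(&sk, 9);	/* ignored: already set */
	return sk.napi_id;
}
```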
+4 -6
samples/bpf/Makefile
··· 282 282 283 283 BPFTOOLDIR := $(TOOLS_PATH)/bpf/bpftool 284 284 BPFTOOL_OUTPUT := $(abspath $(BPF_SAMPLES_PATH))/bpftool 285 - BPFTOOL := $(BPFTOOL_OUTPUT)/bpftool 286 - $(BPFTOOL): $(LIBBPF) $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) | $(BPFTOOL_OUTPUT) 287 - $(MAKE) -C $(BPFTOOLDIR) srctree=$(BPF_SAMPLES_PATH)/../../ \ 288 - OUTPUT=$(BPFTOOL_OUTPUT)/ \ 289 - LIBBPF_OUTPUT=$(LIBBPF_OUTPUT)/ \ 290 - LIBBPF_DESTDIR=$(LIBBPF_DESTDIR)/ 285 + BPFTOOL := $(BPFTOOL_OUTPUT)/bootstrap/bpftool 286 + $(BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) | $(BPFTOOL_OUTPUT) 287 + $(MAKE) -C $(BPFTOOLDIR) srctree=$(BPF_SAMPLES_PATH)/../../ \ 288 + OUTPUT=$(BPFTOOL_OUTPUT)/ bootstrap 291 289 292 290 $(LIBBPF_OUTPUT) $(BPFTOOL_OUTPUT): 293 291 $(call msg,MKDIR,$@)
+2 -1
samples/bpf/fds_example.c
···
 #include <bpf/libbpf.h>
 #include "bpf_insn.h"
 #include "sock_example.h"
+#include "bpf_util.h"

 #define BPF_F_PIN	(1 << 0)
 #define BPF_F_GET	(1 << 1)
···
 		BPF_MOV64_IMM(BPF_REG_0, 1),
 		BPF_EXIT_INSN(),
 	};
-	size_t insns_cnt = sizeof(insns) / sizeof(struct bpf_insn);
+	size_t insns_cnt = ARRAY_SIZE(insns);
 	struct bpf_object *obj;
 	int err;
+2 -1
samples/bpf/sock_example.c
···
 #include <bpf/bpf.h>
 #include "bpf_insn.h"
 #include "sock_example.h"
+#include "bpf_util.h"

 char bpf_log_buf[BPF_LOG_BUF_SIZE];
···
 		BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
 		BPF_EXIT_INSN(),
 	};
-	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
+	size_t insns_cnt = ARRAY_SIZE(prog);
 	LIBBPF_OPTS(bpf_prog_load_opts, opts,
 		.log_buf = bpf_log_buf,
 		.log_size = BPF_LOG_BUF_SIZE,
+2 -1
samples/bpf/test_cgrp2_attach.c
···
 #include <bpf/bpf.h>

 #include "bpf_insn.h"
+#include "bpf_util.h"

 enum {
 	MAP_KEY_PACKETS,
···
 		BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
 		BPF_EXIT_INSN(),
 	};
-	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
+	size_t insns_cnt = ARRAY_SIZE(prog);
 	LIBBPF_OPTS(bpf_prog_load_opts, opts,
 		.log_buf = bpf_log_buf,
 		.log_size = BPF_LOG_BUF_SIZE,
+1 -1
samples/bpf/test_lru_dist.c
···
 		return -1;
 	}

-	for (f = 0; f < sizeof(map_flags) / sizeof(*map_flags); f++) {
+	for (f = 0; f < ARRAY_SIZE(map_flags); f++) {
 		test_lru_loss0(BPF_MAP_TYPE_LRU_HASH, map_flags[f]);
 		test_lru_loss1(BPF_MAP_TYPE_LRU_HASH, map_flags[f]);
 		test_parallel_lru_loss(BPF_MAP_TYPE_LRU_HASH, map_flags[f],
+3 -1
samples/bpf/test_map_in_map_user.c
···
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>

+#include "bpf_util.h"
+
 static int map_fd[7];

 #define PORT_A		(map_fd[0])
···
 	"Hash of Hash",
 };

-#define NR_TESTS (sizeof(test_names) / sizeof(*test_names))
+#define NR_TESTS ARRAY_SIZE(test_names)

 static void check_map_id(int inner_map_fd, int map_in_map_fd, uint32_t key)
 {
+2 -1
samples/bpf/tracex5_user.c
···
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
 #include "trace_helpers.h"
+#include "bpf_util.h"

 #ifdef __mips__
 #define MAX_ENTRIES 6000 /* MIPS n64 syscalls start at 5000 */
···
 		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
 	};
 	struct sock_fprog prog = {
-		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
+		.len = (unsigned short)ARRAY_SIZE(filter),
 		.filter = filter,
 	};
 	if (prctl(PR_SET_SECCOMP, 2, &prog))
+4 -2
samples/bpf/xdp_redirect_map.bpf.c
···
 } tx_port_native SEC(".maps");

 /* store egress interface mac address */
-const volatile char tx_mac_addr[ETH_ALEN];
+const volatile __u8 tx_mac_addr[ETH_ALEN];

 static __always_inline int xdp_redirect_map(struct xdp_md *ctx, void *redirect_map)
 {
···
 {
 	void *data_end = (void *)(long)ctx->data_end;
 	void *data = (void *)(long)ctx->data;
+	u8 *mac_addr = (u8 *) tx_mac_addr;
 	struct ethhdr *eth = data;
 	u64 nh_off;

···
 	if (data + nh_off > data_end)
 		return XDP_DROP;

-	__builtin_memcpy(eth->h_source, (const char *)tx_mac_addr, ETH_ALEN);
+	barrier_var(mac_addr); /* prevent optimizing out memcpy */
+	__builtin_memcpy(eth->h_source, mac_addr, ETH_ALEN);

 	return XDP_PASS;
 }
+9
samples/bpf/xdp_redirect_map_user.c
···
 	{}
 };

+static int verbose = 0;
+
 int main(int argc, char **argv)
 {
 	struct bpf_devmap_val devmap_val = {};
···
 			break;
 		case 'v':
 			sample_switch_mode();
+			verbose = 1;
 			break;
 		case 's':
 			mask |= SAMPLE_REDIRECT_MAP_CNT;
···
 			ret = EXIT_FAIL;
 			goto end_destroy;
 		}
+		if (verbose)
+			printf("Egress ifindex:%d using src MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
+			       ifindex_out,
+			       skel->rodata->tx_mac_addr[0], skel->rodata->tx_mac_addr[1],
+			       skel->rodata->tx_mac_addr[2], skel->rodata->tx_mac_addr[3],
+			       skel->rodata->tx_mac_addr[4], skel->rodata->tx_mac_addr[5]);
 	}

 	skel->rodata->from_match[0] = ifindex_in;
+1 -21
scripts/bpf_doc.py
···
 .. Copyright (C) All BPF authors and contributors from 2014 to present.
 .. See git log include/uapi/linux/bpf.h in kernel tree for details.
 ..
-.. %%%LICENSE_START(VERBATIM)
-.. Permission is granted to make and distribute verbatim copies of this
-.. manual provided the copyright notice and this permission notice are
-.. preserved on all copies.
-..
-.. Permission is granted to copy and distribute modified versions of this
-.. manual under the conditions for verbatim copying, provided that the
-.. entire resulting derived work is distributed under the terms of a
-.. permission notice identical to this one.
-..
-.. Since the Linux kernel and libraries are constantly changing, this
-.. manual page may be incorrect or out-of-date. The author(s) assume no
-.. responsibility for errors or omissions, or for damages resulting from
-.. the use of the information contained herein. The author(s) may not
-.. have taken the same level of care in the production of this manual,
-.. which is licensed free of charge, as they might when working
-.. professionally.
-..
-.. Formatted or processed versions of this manual, if unaccompanied by
-.. the source, must acknowledge the copyright and authors of this work.
-.. %%%LICENSE_END
+.. SPDX-License-Identifier: Linux-man-pages-copyleft
 ..
 .. Please do not edit this file. It was generated from the documentation
 .. located in file include/uapi/linux/bpf.h of the Linux kernel sources
+34 -6
tools/bpf/resolve_btfids/main.c
··· 45 45 * .zero 4 46 46 * __BTF_ID__func__vfs_fallocate__4: 47 47 * .zero 4 48 + * 49 + * set8 - store symbol size into first 4 bytes and sort following 50 + * ID list 51 + * 52 + * __BTF_ID__set8__list: 53 + * .zero 8 54 + * list: 55 + * __BTF_ID__func__vfs_getattr__3: 56 + * .zero 4 57 + * .word (1 << 0) | (1 << 2) 58 + * __BTF_ID__func__vfs_fallocate__5: 59 + * .zero 4 60 + * .word (1 << 3) | (1 << 1) | (1 << 2) 48 61 */ 49 62 50 63 #define _GNU_SOURCE ··· 85 72 #define BTF_TYPEDEF "typedef" 86 73 #define BTF_FUNC "func" 87 74 #define BTF_SET "set" 75 + #define BTF_SET8 "set8" 88 76 89 77 #define ADDR_CNT 100 90 78 ··· 98 84 }; 99 85 int addr_cnt; 100 86 bool is_set; 87 + bool is_set8; 101 88 Elf64_Addr addr[ADDR_CNT]; 102 89 }; 103 90 ··· 246 231 return id; 247 232 } 248 233 249 - static struct btf_id *add_set(struct object *obj, char *name) 234 + static struct btf_id *add_set(struct object *obj, char *name, bool is_set8) 250 235 { 251 236 /* 252 237 * __BTF_ID__set__name 253 238 * name = ^ 254 239 * id = ^ 255 240 */ 256 - char *id = name + sizeof(BTF_SET "__") - 1; 241 + char *id = name + (is_set8 ? sizeof(BTF_SET8 "__") : sizeof(BTF_SET "__")) - 1; 257 242 int len = strlen(name); 258 243 259 244 if (id >= name + len) { ··· 459 444 } else if (!strncmp(prefix, BTF_FUNC, sizeof(BTF_FUNC) - 1)) { 460 445 obj->nr_funcs++; 461 446 id = add_symbol(&obj->funcs, prefix, sizeof(BTF_FUNC) - 1); 447 + /* set8 */ 448 + } else if (!strncmp(prefix, BTF_SET8, sizeof(BTF_SET8) - 1)) { 449 + id = add_set(obj, prefix, true); 450 + /* 451 + * SET8 objects store list's count, which is encoded 452 + * in symbol's size, together with 'cnt' field hence 453 + * that - 1. 
454 + */ 455 + if (id) { 456 + id->cnt = sym.st_size / sizeof(uint64_t) - 1; 457 + id->is_set8 = true; 458 + } 462 459 /* set */ 463 460 } else if (!strncmp(prefix, BTF_SET, sizeof(BTF_SET) - 1)) { 464 - id = add_set(obj, prefix); 461 + id = add_set(obj, prefix, false); 465 462 /* 466 463 * SET objects store list's count, which is encoded 467 464 * in symbol's size, together with 'cnt' field hence ··· 598 571 int *ptr = data->d_buf; 599 572 int i; 600 573 601 - if (!id->id && !id->is_set) 574 + /* For set, set8, id->id may be 0 */ 575 + if (!id->id && !id->is_set && !id->is_set8) 602 576 pr_err("WARN: resolve_btfids: unresolved symbol %s\n", id->name); 603 577 604 578 for (i = 0; i < id->addr_cnt; i++) { ··· 671 643 } 672 644 673 645 idx = idx / sizeof(int); 674 - base = &ptr[idx] + 1; 646 + base = &ptr[idx] + (id->is_set8 ? 2 : 1); 675 647 cnt = ptr[idx]; 676 648 677 649 pr_debug("sorting addr %5lu: cnt %6d [%s]\n", 678 650 (idx + 1) * sizeof(int), cnt, id->name); 679 651 680 - qsort(base, cnt, sizeof(int), cmp_id); 652 + qsort(base, cnt, id->is_set8 ? sizeof(uint64_t) : sizeof(int), cmp_id); 681 653 682 654 next = rb_next(next); 683 655 }
+3 -4
tools/bpf/runqslower/Makefile
···
 OUTPUT ?= $(abspath .output)/

 BPFTOOL_OUTPUT := $(OUTPUT)bpftool/
-DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bpftool
+DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bootstrap/bpftool
 BPFTOOL ?= $(DEFAULT_BPFTOOL)
 LIBBPF_SRC := $(abspath ../../lib/bpf)
 BPFOBJ_OUTPUT := $(OUTPUT)libbpf/
···
 	$(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) OUTPUT=$(BPFOBJ_OUTPUT) \
 		    DESTDIR=$(BPFOBJ_OUTPUT) prefix= $(abspath $@) install_headers

-$(DEFAULT_BPFTOOL): $(BPFOBJ) | $(BPFTOOL_OUTPUT)
-	$(Q)$(MAKE) $(submake_extras) -C ../bpftool OUTPUT=$(BPFTOOL_OUTPUT) \
-		    ARCH= CROSS_COMPILE= CC=$(HOSTCC) LD=$(HOSTLD)
+$(DEFAULT_BPFTOOL): | $(BPFTOOL_OUTPUT)
+	$(Q)$(MAKE) $(submake_extras) -C ../bpftool OUTPUT=$(BPFTOOL_OUTPUT) bootstrap
+2 -1
tools/include/uapi/linux/bpf.h
···
 *		Pull in non-linear data in case the *skb* is non-linear and not
 *		all of *len* are part of the linear section. Make *len* bytes
 *		from *skb* readable and writable. If a zero value is passed for
-*		*len*, then the whole length of the *skb* is pulled.
+*		*len*, then all bytes in the linear part of *skb* will be made
+*		readable and writable.
 *
 *		This helper is only needed for reading and writing with direct
 *		packet access.
+38 -13
tools/lib/bpf/bpf_tracing.h
··· 2 2 #ifndef __BPF_TRACING_H__ 3 3 #define __BPF_TRACING_H__ 4 4 5 + #include <bpf/bpf_helpers.h> 6 + 5 7 /* Scan the ARCH passed in from ARCH env variable (see Makefile) */ 6 8 #if defined(__TARGET_ARCH_x86) 7 9 #define bpf_target_x86 ··· 142 140 #define __PT_RC_REG gprs[2] 143 141 #define __PT_SP_REG gprs[15] 144 142 #define __PT_IP_REG psw.addr 145 - #define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error \"use PT_REGS_PARM1_CORE_SYSCALL() instead\""); 0l; }) 143 + #define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x) 146 144 #define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___s390 *)(x), orig_gpr2) 147 145 148 146 #elif defined(bpf_target_arm) ··· 176 174 #define __PT_RC_REG regs[0] 177 175 #define __PT_SP_REG sp 178 176 #define __PT_IP_REG pc 179 - #define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error \"use PT_REGS_PARM1_CORE_SYSCALL() instead\""); 0l; }) 177 + #define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x) 180 178 #define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___arm64 *)(x), orig_x0) 181 179 182 180 #elif defined(bpf_target_mips) ··· 495 493 } \ 496 494 static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args) 497 495 496 + /* If kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER, read pt_regs directly */ 498 497 #define ___bpf_syscall_args0() ctx 499 - #define ___bpf_syscall_args1(x) ___bpf_syscall_args0(), (void *)PT_REGS_PARM1_CORE_SYSCALL(regs) 500 - #define ___bpf_syscall_args2(x, args...) ___bpf_syscall_args1(args), (void *)PT_REGS_PARM2_CORE_SYSCALL(regs) 501 - #define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs) 502 - #define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs) 503 - #define ___bpf_syscall_args5(x, args...) 
___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs) 498 + #define ___bpf_syscall_args1(x) ___bpf_syscall_args0(), (void *)PT_REGS_PARM1_SYSCALL(regs) 499 + #define ___bpf_syscall_args2(x, args...) ___bpf_syscall_args1(args), (void *)PT_REGS_PARM2_SYSCALL(regs) 500 + #define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_SYSCALL(regs) 501 + #define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_SYSCALL(regs) 502 + #define ___bpf_syscall_args5(x, args...) ___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_SYSCALL(regs) 504 503 #define ___bpf_syscall_args(args...) ___bpf_apply(___bpf_syscall_args, ___bpf_narg(args))(args) 505 504 505 + /* If kernel doesn't have CONFIG_ARCH_HAS_SYSCALL_WRAPPER, we have to BPF_CORE_READ from pt_regs */ 506 + #define ___bpf_syswrap_args0() ctx 507 + #define ___bpf_syswrap_args1(x) ___bpf_syswrap_args0(), (void *)PT_REGS_PARM1_CORE_SYSCALL(regs) 508 + #define ___bpf_syswrap_args2(x, args...) ___bpf_syswrap_args1(args), (void *)PT_REGS_PARM2_CORE_SYSCALL(regs) 509 + #define ___bpf_syswrap_args3(x, args...) ___bpf_syswrap_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs) 510 + #define ___bpf_syswrap_args4(x, args...) ___bpf_syswrap_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs) 511 + #define ___bpf_syswrap_args5(x, args...) ___bpf_syswrap_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs) 512 + #define ___bpf_syswrap_args(args...) ___bpf_apply(___bpf_syswrap_args, ___bpf_narg(args))(args) 513 + 506 514 /* 507 - * BPF_KPROBE_SYSCALL is a variant of BPF_KPROBE, which is intended for 515 + * BPF_KSYSCALL is a variant of BPF_KPROBE, which is intended for 508 516 * tracing syscall functions, like __x64_sys_close. 
It hides the underlying 509 517 * platform-specific low-level way of getting syscall input arguments from 510 518 * struct pt_regs, and provides a familiar typed and named function arguments 511 519 * syntax and semantics of accessing syscall input parameters. 512 520 * 513 - * Original struct pt_regs* context is preserved as 'ctx' argument. This might 521 + * Original struct pt_regs * context is preserved as 'ctx' argument. This might 514 522 * be necessary when using BPF helpers like bpf_perf_event_output(). 515 523 * 516 - * This macro relies on BPF CO-RE support. 524 + * At the moment BPF_KSYSCALL does not handle all the calling convention 525 + * quirks for mmap(), clone() and compat syscalls transparrently. This may or 526 + * may not change in the future. User needs to take extra measures to handle 527 + * such quirks explicitly, if necessary. 528 + * 529 + * This macro relies on BPF CO-RE support and virtual __kconfig externs. 517 530 */ 518 - #define BPF_KPROBE_SYSCALL(name, args...) \ 531 + #define BPF_KSYSCALL(name, args...) \ 519 532 name(struct pt_regs *ctx); \ 533 + extern _Bool LINUX_HAS_SYSCALL_WRAPPER __kconfig; \ 520 534 static __attribute__((always_inline)) typeof(name(0)) \ 521 535 ____##name(struct pt_regs *ctx, ##args); \ 522 536 typeof(name(0)) name(struct pt_regs *ctx) \ 523 537 { \ 524 - struct pt_regs *regs = PT_REGS_SYSCALL_REGS(ctx); \ 538 + struct pt_regs *regs = LINUX_HAS_SYSCALL_WRAPPER \ 539 + ? 
(struct pt_regs *)PT_REGS_PARM1(ctx) \ 540 + : ctx; \ 525 541 _Pragma("GCC diagnostic push") \ 526 542 _Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \ 527 - return ____##name(___bpf_syscall_args(args)); \ 543 + if (LINUX_HAS_SYSCALL_WRAPPER) \ 544 + return ____##name(___bpf_syswrap_args(args)); \ 545 + else \ 546 + return ____##name(___bpf_syscall_args(args)); \ 528 547 _Pragma("GCC diagnostic pop") \ 529 548 } \ 530 549 static __attribute__((always_inline)) typeof(name(0)) \ 531 550 ____##name(struct pt_regs *ctx, ##args) 551 + 552 + #define BPF_KPROBE_SYSCALL BPF_KSYSCALL 532 553 533 554 #endif
+1 -1
tools/lib/bpf/btf_dump.c
···
 		*value = *(__s64 *)data;
 		return 0;
 	case 4:
-		*value = is_signed ? *(__s32 *)data : *(__u32 *)data;
+		*value = is_signed ? (__s64)*(__s32 *)data : *(__u32 *)data;
 		return 0;
 	case 2:
 		*value = is_signed ? *(__s16 *)data : *(__u16 *)data;
+1 -1
tools/lib/bpf/gen_loader.c
···
 	gen->attach_kind = kind;
 	ret = snprintf(gen->attach_target, sizeof(gen->attach_target), "%s%s",
 		       prefix, attach_name);
-	if (ret == sizeof(gen->attach_target))
+	if (ret >= sizeof(gen->attach_target))
 		gen->error = -ENOSPC;
 }
+287 -105
tools/lib/bpf/libbpf.c
··· 1694 1694 switch (ext->kcfg.type) { 1695 1695 case KCFG_BOOL: 1696 1696 if (value == 'm') { 1697 - pr_warn("extern (kcfg) %s=%c should be tristate or char\n", 1697 + pr_warn("extern (kcfg) '%s': value '%c' implies tristate or char type\n", 1698 1698 ext->name, value); 1699 1699 return -EINVAL; 1700 1700 } ··· 1715 1715 case KCFG_INT: 1716 1716 case KCFG_CHAR_ARR: 1717 1717 default: 1718 - pr_warn("extern (kcfg) %s=%c should be bool, tristate, or char\n", 1718 + pr_warn("extern (kcfg) '%s': value '%c' implies bool, tristate, or char type\n", 1719 1719 ext->name, value); 1720 1720 return -EINVAL; 1721 1721 } ··· 1729 1729 size_t len; 1730 1730 1731 1731 if (ext->kcfg.type != KCFG_CHAR_ARR) { 1732 - pr_warn("extern (kcfg) %s=%s should be char array\n", ext->name, value); 1732 + pr_warn("extern (kcfg) '%s': value '%s' implies char array type\n", 1733 + ext->name, value); 1733 1734 return -EINVAL; 1734 1735 } 1735 1736 ··· 1744 1743 /* strip quotes */ 1745 1744 len -= 2; 1746 1745 if (len >= ext->kcfg.sz) { 1747 - pr_warn("extern (kcfg) '%s': long string config %s of (%zu bytes) truncated to %d bytes\n", 1746 + pr_warn("extern (kcfg) '%s': long string '%s' of (%zu bytes) truncated to %d bytes\n", 1748 1747 ext->name, value, len, ext->kcfg.sz - 1); 1749 1748 len = ext->kcfg.sz - 1; 1750 1749 } ··· 1801 1800 static int set_kcfg_value_num(struct extern_desc *ext, void *ext_val, 1802 1801 __u64 value) 1803 1802 { 1804 - if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR) { 1805 - pr_warn("extern (kcfg) %s=%llu should be integer\n", 1803 + if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR && 1804 + ext->kcfg.type != KCFG_BOOL) { 1805 + pr_warn("extern (kcfg) '%s': value '%llu' implies integer, char, or boolean type\n", 1806 1806 ext->name, (unsigned long long)value); 1807 1807 return -EINVAL; 1808 1808 } 1809 + if (ext->kcfg.type == KCFG_BOOL && value > 1) { 1810 + pr_warn("extern (kcfg) '%s': value '%llu' isn't boolean compatible\n", 1811 + 
ext->name, (unsigned long long)value); 1812 + return -EINVAL; 1813 + 1814 + } 1809 1815 if (!is_kcfg_value_in_range(ext, value)) { 1810 - pr_warn("extern (kcfg) %s=%llu value doesn't fit in %d bytes\n", 1816 + pr_warn("extern (kcfg) '%s': value '%llu' doesn't fit in %d bytes\n", 1811 1817 ext->name, (unsigned long long)value, ext->kcfg.sz); 1812 1818 return -ERANGE; 1813 1819 } ··· 1878 1870 /* assume integer */ 1879 1871 err = parse_u64(value, &num); 1880 1872 if (err) { 1881 - pr_warn("extern (kcfg) %s=%s should be integer\n", 1882 - ext->name, value); 1873 + pr_warn("extern (kcfg) '%s': value '%s' isn't a valid integer\n", ext->name, value); 1883 1874 return err; 1875 + } 1876 + if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR) { 1877 + pr_warn("extern (kcfg) '%s': value '%s' implies integer type\n", ext->name, value); 1878 + return -EINVAL; 1884 1879 } 1885 1880 err = set_kcfg_value_num(ext, ext_val, num); 1886 1881 break; 1887 1882 } 1888 1883 if (err) 1889 1884 return err; 1890 - pr_debug("extern (kcfg) %s=%s\n", ext->name, value); 1885 + pr_debug("extern (kcfg) '%s': set to %s\n", ext->name, value); 1891 1886 return 0; 1892 1887 } 1893 1888 ··· 2331 2320 return 0; 2332 2321 } 2333 2322 2323 + static size_t adjust_ringbuf_sz(size_t sz) 2324 + { 2325 + __u32 page_sz = sysconf(_SC_PAGE_SIZE); 2326 + __u32 mul; 2327 + 2328 + /* if user forgot to set any size, make sure they see error */ 2329 + if (sz == 0) 2330 + return 0; 2331 + /* Kernel expects BPF_MAP_TYPE_RINGBUF's max_entries to be 2332 + * a power-of-2 multiple of kernel's page size. If user diligently 2333 + * satisified these conditions, pass the size through. 2334 + */ 2335 + if ((sz % page_sz) == 0 && is_pow_of_2(sz / page_sz)) 2336 + return sz; 2337 + 2338 + /* Otherwise find closest (page_sz * power_of_2) product bigger than 2339 + * user-set size to satisfy both user size request and kernel 2340 + * requirements and substitute correct max_entries for map creation. 
2341 + */ 2342 + for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) { 2343 + if (mul * page_sz > sz) 2344 + return mul * page_sz; 2345 + } 2346 + 2347 + /* if it's impossible to satisfy the conditions (i.e., user size is 2348 + * very close to UINT_MAX but is not a power-of-2 multiple of 2349 + * page_size) then just return original size and let kernel reject it 2350 + */ 2351 + return sz; 2352 + } 2353 + 2334 2354 static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def) 2335 2355 { 2336 2356 map->def.type = def->map_type; ··· 2374 2332 map->numa_node = def->numa_node; 2375 2333 map->btf_key_type_id = def->key_type_id; 2376 2334 map->btf_value_type_id = def->value_type_id; 2335 + 2336 + /* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */ 2337 + if (map->def.type == BPF_MAP_TYPE_RINGBUF) 2338 + map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries); 2377 2339 2378 2340 if (def->parts & MAP_DEF_MAP_TYPE) 2379 2341 pr_debug("map '%s': found type = %u.\n", map->name, def->map_type); ··· 3733 3687 ext->kcfg.type = find_kcfg_type(obj->btf, t->type, 3734 3688 &ext->kcfg.is_signed); 3735 3689 if (ext->kcfg.type == KCFG_UNKNOWN) { 3736 - pr_warn("extern (kcfg) '%s' type is unsupported\n", ext_name); 3690 + pr_warn("extern (kcfg) '%s': type is unsupported\n", ext_name); 3737 3691 return -ENOTSUP; 3738 3692 } 3739 3693 } else if (strcmp(sec_name, KSYMS_SEC) == 0) { ··· 4278 4232 int bpf_map__reuse_fd(struct bpf_map *map, int fd) 4279 4233 { 4280 4234 struct bpf_map_info info = {}; 4281 - __u32 len = sizeof(info); 4235 + __u32 len = sizeof(info), name_len; 4282 4236 int new_fd, err; 4283 4237 char *new_name; 4284 4238 ··· 4288 4242 if (err) 4289 4243 return libbpf_err(err); 4290 4244 4291 - new_name = strdup(info.name); 4245 + name_len = strlen(info.name); 4246 + if (name_len == BPF_OBJ_NAME_LEN - 1 && strncmp(map->name, info.name, name_len) == 0) 4247 + new_name = strdup(map->name); 4248 + else 4249 + new_name = 
strdup(info.name); 4250 + 4292 4251 if (!new_name) 4293 4252 return libbpf_err(-errno); 4294 4253 ··· 4352 4301 4353 4302 int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries) 4354 4303 { 4355 - if (map->fd >= 0) 4304 + if (map->obj->loaded) 4356 4305 return libbpf_err(-EBUSY); 4306 + 4357 4307 map->def.max_entries = max_entries; 4308 + 4309 + /* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */ 4310 + if (map->def.type == BPF_MAP_TYPE_RINGBUF) 4311 + map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries); 4312 + 4358 4313 return 0; 4359 4314 } 4360 4315 ··· 4711 4654 strs, sizeof(strs))); 4712 4655 } 4713 4656 4657 + static int probe_kern_syscall_wrapper(void); 4658 + 4714 4659 enum kern_feature_result { 4715 4660 FEAT_UNKNOWN = 0, 4716 4661 FEAT_SUPPORTED = 1, ··· 4780 4721 }, 4781 4722 [FEAT_BTF_ENUM64] = { 4782 4723 "BTF_KIND_ENUM64 support", probe_kern_btf_enum64, 4724 + }, 4725 + [FEAT_SYSCALL_WRAPPER] = { 4726 + "Kernel using syscall wrapper", probe_kern_syscall_wrapper, 4783 4727 }, 4784 4728 }; 4785 4729 ··· 4916 4854 4917 4855 static void bpf_map__destroy(struct bpf_map *map); 4918 4856 4919 - static size_t adjust_ringbuf_sz(size_t sz) 4920 - { 4921 - __u32 page_sz = sysconf(_SC_PAGE_SIZE); 4922 - __u32 mul; 4923 - 4924 - /* if user forgot to set any size, make sure they see error */ 4925 - if (sz == 0) 4926 - return 0; 4927 - /* Kernel expects BPF_MAP_TYPE_RINGBUF's max_entries to be 4928 - * a power-of-2 multiple of kernel's page size. If user diligently 4929 - * satisified these conditions, pass the size through. 4930 - */ 4931 - if ((sz % page_sz) == 0 && is_pow_of_2(sz / page_sz)) 4932 - return sz; 4933 - 4934 - /* Otherwise find closest (page_sz * power_of_2) product bigger than 4935 - * user-set size to satisfy both user size request and kernel 4936 - * requirements and substitute correct max_entries for map creation. 
4937 - */ 4938 - for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) { 4939 - if (mul * page_sz > sz) 4940 - return mul * page_sz; 4941 - } 4942 - 4943 - /* if it's impossible to satisfy the conditions (i.e., user size is 4944 - * very close to UINT_MAX but is not a power-of-2 multiple of 4945 - * page_size) then just return original size and let kernel reject it 4946 - */ 4947 - return sz; 4948 - } 4949 - 4950 4857 static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, bool is_inner) 4951 4858 { 4952 4859 LIBBPF_OPTS(bpf_map_create_opts, create_attr); ··· 4954 4923 } 4955 4924 4956 4925 switch (def->type) { 4957 - case BPF_MAP_TYPE_RINGBUF: 4958 - map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries); 4959 - /* fallthrough */ 4960 4926 case BPF_MAP_TYPE_PERF_EVENT_ARRAY: 4961 4927 case BPF_MAP_TYPE_CGROUP_ARRAY: 4962 4928 case BPF_MAP_TYPE_STACK_TRACE: ··· 7310 7282 return 0; 7311 7283 7312 7284 if (ext->is_set && ext->ksym.addr != sym_addr) { 7313 - pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n", 7285 + pr_warn("extern (ksym) '%s': resolution is ambiguous: 0x%llx or 0x%llx\n", 7314 7286 sym_name, ext->ksym.addr, sym_addr); 7315 7287 return -EINVAL; 7316 7288 } 7317 7289 if (!ext->is_set) { 7318 7290 ext->is_set = true; 7319 7291 ext->ksym.addr = sym_addr; 7320 - pr_debug("extern (ksym) %s=0x%llx\n", sym_name, sym_addr); 7292 + pr_debug("extern (ksym) '%s': set to 0x%llx\n", sym_name, sym_addr); 7321 7293 } 7322 7294 return 0; 7323 7295 } ··· 7521 7493 for (i = 0; i < obj->nr_extern; i++) { 7522 7494 ext = &obj->externs[i]; 7523 7495 7524 - if (ext->type == EXT_KCFG && 7525 - strcmp(ext->name, "LINUX_KERNEL_VERSION") == 0) { 7526 - void *ext_val = kcfg_data + ext->kcfg.data_off; 7527 - __u32 kver = get_kernel_version(); 7528 - 7529 - if (!kver) { 7530 - pr_warn("failed to get kernel version\n"); 7531 - return -EINVAL; 7532 - } 7533 - err = set_kcfg_value_num(ext, ext_val, kver); 7534 - if (err) 7535 - 
return err; 7536 - pr_debug("extern (kcfg) %s=0x%x\n", ext->name, kver); 7537 - } else if (ext->type == EXT_KCFG && str_has_pfx(ext->name, "CONFIG_")) { 7538 - need_config = true; 7539 - } else if (ext->type == EXT_KSYM) { 7496 + if (ext->type == EXT_KSYM) { 7540 7497 if (ext->ksym.type_id) 7541 7498 need_vmlinux_btf = true; 7542 7499 else 7543 7500 need_kallsyms = true; 7501 + continue; 7502 + } else if (ext->type == EXT_KCFG) { 7503 + void *ext_ptr = kcfg_data + ext->kcfg.data_off; 7504 + __u64 value = 0; 7505 + 7506 + /* Kconfig externs need actual /proc/config.gz */ 7507 + if (str_has_pfx(ext->name, "CONFIG_")) { 7508 + need_config = true; 7509 + continue; 7510 + } 7511 + 7512 + /* Virtual kcfg externs are customly handled by libbpf */ 7513 + if (strcmp(ext->name, "LINUX_KERNEL_VERSION") == 0) { 7514 + value = get_kernel_version(); 7515 + if (!value) { 7516 + pr_warn("extern (kcfg) '%s': failed to get kernel version\n", ext->name); 7517 + return -EINVAL; 7518 + } 7519 + } else if (strcmp(ext->name, "LINUX_HAS_BPF_COOKIE") == 0) { 7520 + value = kernel_supports(obj, FEAT_BPF_COOKIE); 7521 + } else if (strcmp(ext->name, "LINUX_HAS_SYSCALL_WRAPPER") == 0) { 7522 + value = kernel_supports(obj, FEAT_SYSCALL_WRAPPER); 7523 + } else if (!str_has_pfx(ext->name, "LINUX_") || !ext->is_weak) { 7524 + /* Currently libbpf supports only CONFIG_ and LINUX_ prefixed 7525 + * __kconfig externs, where LINUX_ ones are virtual and filled out 7526 + * customly by libbpf (their values don't come from Kconfig). 7527 + * If LINUX_xxx variable is not recognized by libbpf, but is marked 7528 + * __weak, it defaults to zero value, just like for CONFIG_xxx 7529 + * externs. 
7530 + */ 7531 + pr_warn("extern (kcfg) '%s': unrecognized virtual extern\n", ext->name); 7532 + return -EINVAL; 7533 + } 7534 + 7535 + err = set_kcfg_value_num(ext, ext_ptr, value); 7536 + if (err) 7537 + return err; 7538 + pr_debug("extern (kcfg) '%s': set to 0x%llx\n", 7539 + ext->name, (long long)value); 7544 7540 } else { 7545 - pr_warn("unrecognized extern '%s'\n", ext->name); 7541 + pr_warn("extern '%s': unrecognized extern kind\n", ext->name); 7546 7542 return -EINVAL; 7547 7543 } 7548 7544 } ··· 7602 7550 ext = &obj->externs[i]; 7603 7551 7604 7552 if (!ext->is_set && !ext->is_weak) { 7605 - pr_warn("extern %s (strong) not resolved\n", ext->name); 7553 + pr_warn("extern '%s' (strong): not resolved\n", ext->name); 7606 7554 return -ESRCH; 7607 7555 } else if (!ext->is_set) { 7608 - pr_debug("extern %s (weak) not resolved, defaulting to zero\n", 7556 + pr_debug("extern '%s' (weak): not resolved, defaulting to zero\n", 7609 7557 ext->name); 7610 7558 } 7611 7559 } ··· 8433 8381 8434 8382 static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link); 8435 8383 static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link); 8384 + static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link); 8436 8385 static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link); 8437 8386 static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link); 8438 8387 static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link); ··· 8454 8401 SEC_DEF("uretprobe.s+", KPROBE, 0, SEC_SLEEPABLE, attach_uprobe), 8455 8402 SEC_DEF("kprobe.multi+", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi), 8456 8403 SEC_DEF("kretprobe.multi+", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi), 8404 + SEC_DEF("ksyscall+", KPROBE, 0, SEC_NONE, attach_ksyscall), 8405 + 
SEC_DEF("kretsyscall+", KPROBE, 0, SEC_NONE, attach_ksyscall), 8457 8406 SEC_DEF("usdt+", KPROBE, 0, SEC_NONE, attach_usdt), 8458 8407 SEC_DEF("tc", SCHED_CLS, 0, SEC_NONE), 8459 8408 SEC_DEF("classifier", SCHED_CLS, 0, SEC_NONE), ··· 9812 9757 { 9813 9758 struct perf_event_attr attr = {}; 9814 9759 char errmsg[STRERR_BUFSIZE]; 9815 - int type, pfd, err; 9760 + int type, pfd; 9816 9761 9817 9762 if (ref_ctr_off >= (1ULL << PERF_UPROBE_REF_CTR_OFFSET_BITS)) 9818 9763 return -EINVAL; ··· 9848 9793 pid < 0 ? -1 : pid /* pid */, 9849 9794 pid == -1 ? 0 : -1 /* cpu */, 9850 9795 -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC); 9851 - if (pfd < 0) { 9852 - err = -errno; 9853 - pr_warn("%s perf_event_open() failed: %s\n", 9854 - uprobe ? "uprobe" : "kprobe", 9855 - libbpf_strerror_r(err, errmsg, sizeof(errmsg))); 9856 - return err; 9857 - } 9858 - return pfd; 9796 + return pfd >= 0 ? pfd : -errno; 9859 9797 } 9860 9798 9861 9799 static int append_to_file(const char *file, const char *fmt, ...) ··· 9871 9823 return err; 9872 9824 } 9873 9825 9826 + #define DEBUGFS "/sys/kernel/debug/tracing" 9827 + #define TRACEFS "/sys/kernel/tracing" 9828 + 9829 + static bool use_debugfs(void) 9830 + { 9831 + static int has_debugfs = -1; 9832 + 9833 + if (has_debugfs < 0) 9834 + has_debugfs = access(DEBUGFS, F_OK) == 0; 9835 + 9836 + return has_debugfs == 1; 9837 + } 9838 + 9839 + static const char *tracefs_path(void) 9840 + { 9841 + return use_debugfs() ? DEBUGFS : TRACEFS; 9842 + } 9843 + 9844 + static const char *tracefs_kprobe_events(void) 9845 + { 9846 + return use_debugfs() ? DEBUGFS"/kprobe_events" : TRACEFS"/kprobe_events"; 9847 + } 9848 + 9849 + static const char *tracefs_uprobe_events(void) 9850 + { 9851 + return use_debugfs() ? 
DEBUGFS"/uprobe_events" : TRACEFS"/uprobe_events"; 9852 + } 9853 + 9874 9854 static void gen_kprobe_legacy_event_name(char *buf, size_t buf_sz, 9875 9855 const char *kfunc_name, size_t offset) 9876 9856 { ··· 9911 9835 static int add_kprobe_event_legacy(const char *probe_name, bool retprobe, 9912 9836 const char *kfunc_name, size_t offset) 9913 9837 { 9914 - const char *file = "/sys/kernel/debug/tracing/kprobe_events"; 9915 - 9916 - return append_to_file(file, "%c:%s/%s %s+0x%zx", 9838 + return append_to_file(tracefs_kprobe_events(), "%c:%s/%s %s+0x%zx", 9917 9839 retprobe ? 'r' : 'p', 9918 9840 retprobe ? "kretprobes" : "kprobes", 9919 9841 probe_name, kfunc_name, offset); ··· 9919 9845 9920 9846 static int remove_kprobe_event_legacy(const char *probe_name, bool retprobe) 9921 9847 { 9922 - const char *file = "/sys/kernel/debug/tracing/kprobe_events"; 9923 - 9924 - return append_to_file(file, "-:%s/%s", retprobe ? "kretprobes" : "kprobes", probe_name); 9848 + return append_to_file(tracefs_kprobe_events(), "-:%s/%s", 9849 + retprobe ? "kretprobes" : "kprobes", probe_name); 9925 9850 } 9926 9851 9927 9852 static int determine_kprobe_perf_type_legacy(const char *probe_name, bool retprobe) 9928 9853 { 9929 9854 char file[256]; 9930 9855 9931 - snprintf(file, sizeof(file), 9932 - "/sys/kernel/debug/tracing/events/%s/%s/id", 9933 - retprobe ? "kretprobes" : "kprobes", probe_name); 9856 + snprintf(file, sizeof(file), "%s/events/%s/%s/id", 9857 + tracefs_path(), retprobe ? 
"kretprobes" : "kprobes", probe_name); 9934 9858 9935 9859 return parse_uint_from_file(file, "%d\n"); 9936 9860 } ··· 9975 9903 /* Clear the newly added legacy kprobe_event */ 9976 9904 remove_kprobe_event_legacy(probe_name, retprobe); 9977 9905 return err; 9906 + } 9907 + 9908 + static const char *arch_specific_syscall_pfx(void) 9909 + { 9910 + #if defined(__x86_64__) 9911 + return "x64"; 9912 + #elif defined(__i386__) 9913 + return "ia32"; 9914 + #elif defined(__s390x__) 9915 + return "s390x"; 9916 + #elif defined(__s390__) 9917 + return "s390"; 9918 + #elif defined(__arm__) 9919 + return "arm"; 9920 + #elif defined(__aarch64__) 9921 + return "arm64"; 9922 + #elif defined(__mips__) 9923 + return "mips"; 9924 + #elif defined(__riscv) 9925 + return "riscv"; 9926 + #else 9927 + return NULL; 9928 + #endif 9929 + } 9930 + 9931 + static int probe_kern_syscall_wrapper(void) 9932 + { 9933 + char syscall_name[64]; 9934 + const char *ksys_pfx; 9935 + 9936 + ksys_pfx = arch_specific_syscall_pfx(); 9937 + if (!ksys_pfx) 9938 + return 0; 9939 + 9940 + snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx); 9941 + 9942 + if (determine_kprobe_perf_type() >= 0) { 9943 + int pfd; 9944 + 9945 + pfd = perf_event_open_probe(false, false, syscall_name, 0, getpid(), 0); 9946 + if (pfd >= 0) 9947 + close(pfd); 9948 + 9949 + return pfd >= 0 ? 
1 : 0; 9950 + } else { /* legacy mode */ 9951 + char probe_name[128]; 9952 + 9953 + gen_kprobe_legacy_event_name(probe_name, sizeof(probe_name), syscall_name, 0); 9954 + if (add_kprobe_event_legacy(probe_name, false, syscall_name, 0) < 0) 9955 + return 0; 9956 + 9957 + (void)remove_kprobe_event_legacy(probe_name, false); 9958 + return 1; 9959 + } 9978 9960 } 9979 9961 9980 9962 struct bpf_link * ··· 10114 9988 ); 10115 9989 10116 9990 return bpf_program__attach_kprobe_opts(prog, func_name, &opts); 9991 + } 9992 + 9993 + struct bpf_link *bpf_program__attach_ksyscall(const struct bpf_program *prog, 9994 + const char *syscall_name, 9995 + const struct bpf_ksyscall_opts *opts) 9996 + { 9997 + LIBBPF_OPTS(bpf_kprobe_opts, kprobe_opts); 9998 + char func_name[128]; 9999 + 10000 + if (!OPTS_VALID(opts, bpf_ksyscall_opts)) 10001 + return libbpf_err_ptr(-EINVAL); 10002 + 10003 + if (kernel_supports(prog->obj, FEAT_SYSCALL_WRAPPER)) { 10004 + snprintf(func_name, sizeof(func_name), "__%s_sys_%s", 10005 + arch_specific_syscall_pfx(), syscall_name); 10006 + } else { 10007 + snprintf(func_name, sizeof(func_name), "__se_sys_%s", syscall_name); 10008 + } 10009 + 10010 + kprobe_opts.retprobe = OPTS_GET(opts, retprobe, false); 10011 + kprobe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0); 10012 + 10013 + return bpf_program__attach_kprobe_opts(prog, func_name, &kprobe_opts); 10117 10014 } 10118 10015 10119 10016 /* Adapted from perf/util/string.c */ ··· 10309 10160 return libbpf_get_error(*link); 10310 10161 } 10311 10162 10163 + static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link) 10164 + { 10165 + LIBBPF_OPTS(bpf_ksyscall_opts, opts); 10166 + const char *syscall_name; 10167 + 10168 + *link = NULL; 10169 + 10170 + /* no auto-attach for SEC("ksyscall") and SEC("kretsyscall") */ 10171 + if (strcmp(prog->sec_name, "ksyscall") == 0 || strcmp(prog->sec_name, "kretsyscall") == 0) 10172 + return 0; 10173 + 10174 + opts.retprobe = 
str_has_pfx(prog->sec_name, "kretsyscall/"); 10175 + if (opts.retprobe) 10176 + syscall_name = prog->sec_name + sizeof("kretsyscall/") - 1; 10177 + else 10178 + syscall_name = prog->sec_name + sizeof("ksyscall/") - 1; 10179 + 10180 + *link = bpf_program__attach_ksyscall(prog, syscall_name, &opts); 10181 + return *link ? 0 : -errno; 10182 + } 10183 + 10312 10184 static int attach_kprobe_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link) 10313 10185 { 10314 10186 LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); ··· 10378 10208 static inline int add_uprobe_event_legacy(const char *probe_name, bool retprobe, 10379 10209 const char *binary_path, size_t offset) 10380 10210 { 10381 - const char *file = "/sys/kernel/debug/tracing/uprobe_events"; 10382 - 10383 - return append_to_file(file, "%c:%s/%s %s:0x%zx", 10211 + return append_to_file(tracefs_uprobe_events(), "%c:%s/%s %s:0x%zx", 10384 10212 retprobe ? 'r' : 'p', 10385 10213 retprobe ? "uretprobes" : "uprobes", 10386 10214 probe_name, binary_path, offset); ··· 10386 10218 10387 10219 static inline int remove_uprobe_event_legacy(const char *probe_name, bool retprobe) 10388 10220 { 10389 - const char *file = "/sys/kernel/debug/tracing/uprobe_events"; 10390 - 10391 - return append_to_file(file, "-:%s/%s", retprobe ? "uretprobes" : "uprobes", probe_name); 10221 + return append_to_file(tracefs_uprobe_events(), "-:%s/%s", 10222 + retprobe ? "uretprobes" : "uprobes", probe_name); 10392 10223 } 10393 10224 10394 10225 static int determine_uprobe_perf_type_legacy(const char *probe_name, bool retprobe) 10395 10226 { 10396 10227 char file[512]; 10397 10228 10398 - snprintf(file, sizeof(file), 10399 - "/sys/kernel/debug/tracing/events/%s/%s/id", 10400 - retprobe ? "uretprobes" : "uprobes", probe_name); 10229 + snprintf(file, sizeof(file), "%s/events/%s/%s/id", 10230 + tracefs_path(), retprobe ? 
"uretprobes" : "uprobes", probe_name); 10401 10231 10402 10232 return parse_uint_from_file(file, "%d\n"); 10403 10233 } ··· 10711 10545 ref_ctr_off = OPTS_GET(opts, ref_ctr_offset, 0); 10712 10546 pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0); 10713 10547 10714 - if (binary_path && !strchr(binary_path, '/')) { 10548 + if (!binary_path) 10549 + return libbpf_err_ptr(-EINVAL); 10550 + 10551 + if (!strchr(binary_path, '/')) { 10715 10552 err = resolve_full_path(binary_path, full_binary_path, 10716 10553 sizeof(full_binary_path)); 10717 10554 if (err) { ··· 10728 10559 if (func_name) { 10729 10560 long sym_off; 10730 10561 10731 - if (!binary_path) { 10732 - pr_warn("prog '%s': name-based attach requires binary_path\n", 10733 - prog->name); 10734 - return libbpf_err_ptr(-EINVAL); 10735 - } 10736 10562 sym_off = elf_find_func_offset(binary_path, func_name); 10737 10563 if (sym_off < 0) 10738 10564 return libbpf_err_ptr(sym_off); ··· 10875 10711 return libbpf_err_ptr(-EINVAL); 10876 10712 } 10877 10713 10714 + if (!binary_path) 10715 + return libbpf_err_ptr(-EINVAL); 10716 + 10878 10717 if (!strchr(binary_path, '/')) { 10879 10718 err = resolve_full_path(binary_path, resolved_path, sizeof(resolved_path)); 10880 10719 if (err) { ··· 10943 10776 char file[PATH_MAX]; 10944 10777 int ret; 10945 10778 10946 - ret = snprintf(file, sizeof(file), 10947 - "/sys/kernel/debug/tracing/events/%s/%s/id", 10948 - tp_category, tp_name); 10779 + ret = snprintf(file, sizeof(file), "%s/events/%s/%s/id", 10780 + tracefs_path(), tp_category, tp_name); 10949 10781 if (ret < 0) 10950 10782 return -errno; 10951 10783 if (ret >= sizeof(file)) { ··· 11892 11726 return libbpf_err(-ENOENT); 11893 11727 11894 11728 return cpu_buf->fd; 11729 + } 11730 + 11731 + int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf, size_t *buf_size) 11732 + { 11733 + struct perf_cpu_buf *cpu_buf; 11734 + 11735 + if (buf_idx >= pb->cpu_cnt) 11736 + return libbpf_err(-EINVAL); 11737 + 11738 + 
cpu_buf = pb->cpu_bufs[buf_idx]; 11739 + if (!cpu_buf) 11740 + return libbpf_err(-ENOENT); 11741 + 11742 + *buf = cpu_buf->base; 11743 + *buf_size = pb->mmap_size; 11744 + return 0; 11895 11745 } 11896 11746 11897 11747 /*
+62
tools/lib/bpf/libbpf.h
··· 457 457 const char *pattern, 458 458 const struct bpf_kprobe_multi_opts *opts); 459 459 460 + struct bpf_ksyscall_opts { 461 + /* size of this struct, for forward/backward compatibility */ 462 + size_t sz; 463 + /* custom user-provided value fetchable through bpf_get_attach_cookie() */ 464 + __u64 bpf_cookie; 465 + /* attach as return probe? */ 466 + bool retprobe; 467 + size_t :0; 468 + }; 469 + #define bpf_ksyscall_opts__last_field retprobe 470 + 471 + /** 472 + * @brief **bpf_program__attach_ksyscall()** attaches a BPF program 473 + * to the kernel syscall handler of a specified syscall. Optionally, it's 474 + * possible to request installing a retprobe that is triggered at syscall 475 + * exit. It's also possible to associate a BPF cookie (through options). 476 + * 477 + * Libbpf will automatically determine the correct full kernel function name, 478 + * which, depending on system architecture and kernel version/configuration, 479 + * could be of the form __<arch>_sys_<syscall> or __se_sys_<syscall>, and will 480 + * attach the specified program using the kprobe/kretprobe mechanism. 481 + * 482 + * **bpf_program__attach_ksyscall()** is an API counterpart of the declarative 483 + * **SEC("ksyscall/<syscall>")** annotation of BPF programs. 484 + * 485 + * At the moment **SEC("ksyscall")** and **bpf_program__attach_ksyscall()** do 486 + * not handle all the calling convention quirks for mmap(), clone() and compat 487 + * syscalls. They also only attach to "native" syscall interfaces. If the host 488 + * system supports compat syscalls or defines 32-bit syscalls in a 64-bit 489 + * kernel, such syscall interfaces won't be attached to by libbpf. 490 + * 491 + * These limitations may or may not change in the future. Therefore it is 492 + * recommended to use SEC("kprobe") for these syscalls, or if working with 493 + * compat and 32-bit interfaces is required. 
494 + * 495 + * @param prog BPF program to attach 496 + * @param syscall_name Symbolic name of the syscall (e.g., "bpf") 497 + * @param opts Additional options (see **struct bpf_ksyscall_opts**) 498 + * @return Reference to the newly created BPF link; or NULL is returned on 499 + * error, with the error code stored in errno 500 + */ 501 + LIBBPF_API struct bpf_link * 502 + bpf_program__attach_ksyscall(const struct bpf_program *prog, 503 + const char *syscall_name, 504 + const struct bpf_ksyscall_opts *opts); 505 + 460 506 struct bpf_uprobe_opts { 461 507 /* size of this struct, for forward/backward compatibility */ 462 508 size_t sz; ··· 1099 1053 LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx); 1100 1054 LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb); 1101 1055 LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx); 1056 + /** 1057 + * @brief **perf_buffer__buffer()** returns the per-cpu raw mmap()'ed underlying 1058 + * memory region of the ring buffer. 1059 + * This ring buffer can be used to implement a custom events consumer. 1060 + * The ring buffer starts with the *struct perf_event_mmap_page*, which 1061 + * holds the ring buffer management fields; when accessing the header 1062 + * structure it's important to be SMP aware. 1063 + * You can refer to *perf_event_read_simple* for a simple example. 1064 + * @param pb the perf buffer structure 1065 + * @param buf_idx the buffer index to retrieve 1066 + * @param buf (out) gets the base pointer of the mmap()'ed memory 1067 + * @param buf_size (out) gets the size of the mmap()'ed region 1068 + * @return 0 on success, negative error code for failure 1069 + */ 1070 + LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf, 1071 + size_t *buf_size); 1102 1072 1103 1073 struct bpf_prog_linfo; 1104 1074 struct bpf_prog_info;
+2
tools/lib/bpf/libbpf.map
··· 356 356 LIBBPF_1.0.0 { 357 357 global: 358 358 bpf_prog_query_opts; 359 + bpf_program__attach_ksyscall; 359 360 btf__add_enum64; 360 361 btf__add_enum64_value; 361 362 libbpf_bpf_attach_type_str; 362 363 libbpf_bpf_link_type_str; 363 364 libbpf_bpf_map_type_str; 364 365 libbpf_bpf_prog_type_str; 366 + perf_buffer__buffer; 365 367 };
+5 -3
tools/lib/bpf/libbpf_internal.h
··· 108 108 size_t str_len = strlen(str); 109 109 size_t sfx_len = strlen(sfx); 110 110 111 - if (sfx_len <= str_len) 112 - return strcmp(str + str_len - sfx_len, sfx); 113 - return false; 111 + if (sfx_len > str_len) 112 + return false; 113 + return strcmp(str + str_len - sfx_len, sfx) == 0; 114 114 } 115 115 116 116 /* Symbol versioning is different between static and shared library. ··· 352 352 FEAT_BPF_COOKIE, 353 353 /* BTF_KIND_ENUM64 support and BTF_KIND_ENUM kflag support */ 354 354 FEAT_BTF_ENUM64, 355 + /* Kernel uses syscall wrapper (CONFIG_ARCH_HAS_SYSCALL_WRAPPER) */ 356 + FEAT_SYSCALL_WRAPPER, 355 357 __FEAT_CNT, 356 358 }; 357 359
+2 -14
tools/lib/bpf/usdt.bpf.h
··· 6 6 #include <linux/errno.h> 7 7 #include <bpf/bpf_helpers.h> 8 8 #include <bpf/bpf_tracing.h> 9 - #include <bpf/bpf_core_read.h> 10 9 11 10 /* Below types and maps are internal implementation details of libbpf's USDT 12 11 * support and are subjects to change. Also, bpf_usdt_xxx() API helpers should ··· 28 29 */ 29 30 #ifndef BPF_USDT_MAX_IP_CNT 30 31 #define BPF_USDT_MAX_IP_CNT (4 * BPF_USDT_MAX_SPEC_CNT) 31 - #endif 32 - /* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is 33 - * the only dependency on CO-RE, so if it's undesirable, user can override 34 - * BPF_USDT_HAS_BPF_COOKIE to specify whether to BPF cookie is supported or not. 35 - */ 36 - #ifndef BPF_USDT_HAS_BPF_COOKIE 37 - #define BPF_USDT_HAS_BPF_COOKIE \ 38 - bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt) 39 32 #endif 40 33 41 34 enum __bpf_usdt_arg_type { ··· 74 83 __type(value, __u32); 75 84 } __bpf_usdt_ip_to_spec_id SEC(".maps") __weak; 76 85 77 - /* don't rely on user's BPF code to have latest definition of bpf_func_id */ 78 - enum bpf_func_id___usdt { 79 - BPF_FUNC_get_attach_cookie___usdt = 0xBAD, /* value doesn't matter */ 80 - }; 86 + extern const _Bool LINUX_HAS_BPF_COOKIE __kconfig; 81 87 82 88 static __always_inline 83 89 int __bpf_usdt_spec_id(struct pt_regs *ctx) 84 90 { 85 - if (!BPF_USDT_HAS_BPF_COOKIE) { 91 + if (!LINUX_HAS_BPF_COOKIE) { 86 92 long ip = PT_REGS_IP(ctx); 87 93 int *spec_id_ptr; 88 94
+5 -5
tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
··· 148 148 .write = bpf_testmod_test_write, 149 149 }; 150 150 151 - BTF_SET_START(bpf_testmod_check_kfunc_ids) 152 - BTF_ID(func, bpf_testmod_test_mod_kfunc) 153 - BTF_SET_END(bpf_testmod_check_kfunc_ids) 151 + BTF_SET8_START(bpf_testmod_check_kfunc_ids) 152 + BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc) 153 + BTF_SET8_END(bpf_testmod_check_kfunc_ids) 154 154 155 155 static const struct btf_kfunc_id_set bpf_testmod_kfunc_set = { 156 - .owner = THIS_MODULE, 157 - .check_set = &bpf_testmod_check_kfunc_ids, 156 + .owner = THIS_MODULE, 157 + .set = &bpf_testmod_check_kfunc_ids, 158 158 }; 159 159 160 160 extern int bpf_fentry_test1(int a);
+16
tools/testing/selftests/bpf/prog_tests/bpf_iter.c
··· 27 27 #include "bpf_iter_test_kern5.skel.h" 28 28 #include "bpf_iter_test_kern6.skel.h" 29 29 #include "bpf_iter_bpf_link.skel.h" 30 + #include "bpf_iter_ksym.skel.h" 30 31 31 32 static int duration; 32 33 ··· 1121 1120 bpf_iter_bpf_link__destroy(skel); 1122 1121 } 1123 1122 1123 + static void test_ksym_iter(void) 1124 + { 1125 + struct bpf_iter_ksym *skel; 1126 + 1127 + skel = bpf_iter_ksym__open_and_load(); 1128 + if (!ASSERT_OK_PTR(skel, "bpf_iter_ksym__open_and_load")) 1129 + return; 1130 + 1131 + do_dummy_read(skel->progs.dump_ksym); 1132 + 1133 + bpf_iter_ksym__destroy(skel); 1134 + } 1135 + 1124 1136 #define CMP_BUFFER_SIZE 1024 1125 1137 static char task_vma_output[CMP_BUFFER_SIZE]; 1126 1138 static char proc_maps_output[CMP_BUFFER_SIZE]; ··· 1281 1267 test_buf_neg_offset(); 1282 1268 if (test__start_subtest("link-iter")) 1283 1269 test_link_iter(); 1270 + if (test__start_subtest("ksym")) 1271 + test_ksym_iter(); 1284 1272 }
+63 -1
tools/testing/selftests/bpf/prog_tests/bpf_nf.c
··· 2 2 #include <test_progs.h> 3 3 #include <network_helpers.h> 4 4 #include "test_bpf_nf.skel.h" 5 + #include "test_bpf_nf_fail.skel.h" 6 + 7 + static char log_buf[1024 * 1024]; 8 + 9 + struct { 10 + const char *prog_name; 11 + const char *err_msg; 12 + } test_bpf_nf_fail_tests[] = { 13 + { "alloc_release", "kernel function bpf_ct_release args#0 expected pointer to STRUCT nf_conn but" }, 14 + { "insert_insert", "kernel function bpf_ct_insert_entry args#0 expected pointer to STRUCT nf_conn___init but" }, 15 + { "lookup_insert", "kernel function bpf_ct_insert_entry args#0 expected pointer to STRUCT nf_conn___init but" }, 16 + { "set_timeout_after_insert", "kernel function bpf_ct_set_timeout args#0 expected pointer to STRUCT nf_conn___init but" }, 17 + { "set_status_after_insert", "kernel function bpf_ct_set_status args#0 expected pointer to STRUCT nf_conn___init but" }, 18 + { "change_timeout_after_alloc", "kernel function bpf_ct_change_timeout args#0 expected pointer to STRUCT nf_conn but" }, 19 + { "change_status_after_alloc", "kernel function bpf_ct_change_status args#0 expected pointer to STRUCT nf_conn but" }, 20 + }; 5 21 6 22 enum { 7 23 TEST_XDP, 8 24 TEST_TC_BPF, 9 25 }; 10 26 11 - void test_bpf_nf_ct(int mode) 27 + static void test_bpf_nf_ct(int mode) 12 28 { 13 29 struct test_bpf_nf *skel; 14 30 int prog_fd, err; ··· 55 39 ASSERT_EQ(skel->bss->test_enonet_netns_id, -ENONET, "Test ENONET for bad but valid netns_id"); 56 40 ASSERT_EQ(skel->bss->test_enoent_lookup, -ENOENT, "Test ENOENT for failed lookup"); 57 41 ASSERT_EQ(skel->bss->test_eafnosupport, -EAFNOSUPPORT, "Test EAFNOSUPPORT for invalid len__tuple"); 42 + ASSERT_EQ(skel->data->test_alloc_entry, 0, "Test for alloc new entry"); 43 + ASSERT_EQ(skel->data->test_insert_entry, 0, "Test for insert new entry"); 44 + ASSERT_EQ(skel->data->test_succ_lookup, 0, "Test for successful lookup"); 45 + /* allow some tolerance for test_delta_timeout value to avoid races. 
*/ 46 + ASSERT_GT(skel->bss->test_delta_timeout, 8, "Test for min ct timeout update"); 47 + ASSERT_LE(skel->bss->test_delta_timeout, 10, "Test for max ct timeout update"); 48 + /* expected status is IPS_SEEN_REPLY */ 49 + ASSERT_EQ(skel->bss->test_status, 2, "Test for ct status update "); 58 50 end: 59 51 test_bpf_nf__destroy(skel); 60 52 } 61 53 54 + static void test_bpf_nf_ct_fail(const char *prog_name, const char *err_msg) 55 + { 56 + LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_buf = log_buf, 57 + .kernel_log_size = sizeof(log_buf), 58 + .kernel_log_level = 1); 59 + struct test_bpf_nf_fail *skel; 60 + struct bpf_program *prog; 61 + int ret; 62 + 63 + skel = test_bpf_nf_fail__open_opts(&opts); 64 + if (!ASSERT_OK_PTR(skel, "test_bpf_nf_fail__open")) 65 + return; 66 + 67 + prog = bpf_object__find_program_by_name(skel->obj, prog_name); 68 + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 69 + goto end; 70 + 71 + bpf_program__set_autoload(prog, true); 72 + 73 + ret = test_bpf_nf_fail__load(skel); 74 + if (!ASSERT_ERR(ret, "test_bpf_nf_fail__load must fail")) 75 + goto end; 76 + 77 + if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) { 78 + fprintf(stderr, "Expected: %s\n", err_msg); 79 + fprintf(stderr, "Verifier: %s\n", log_buf); 80 + } 81 + 82 + end: 83 + test_bpf_nf_fail__destroy(skel); 84 + } 85 + 62 86 void test_bpf_nf(void) 63 87 { 88 + int i; 64 89 if (test__start_subtest("xdp-ct")) 65 90 test_bpf_nf_ct(TEST_XDP); 66 91 if (test__start_subtest("tc-bpf-ct")) 67 92 test_bpf_nf_ct(TEST_TC_BPF); 93 + for (i = 0; i < ARRAY_SIZE(test_bpf_nf_fail_tests); i++) { 94 + if (test__start_subtest(test_bpf_nf_fail_tests[i].prog_name)) 95 + test_bpf_nf_ct_fail(test_bpf_nf_fail_tests[i].prog_name, 96 + test_bpf_nf_fail_tests[i].err_msg); 97 + } 68 98 }
+1 -1
tools/testing/selftests/bpf/prog_tests/btf.c
··· 5338 5338 ret = snprintf(pin_path, sizeof(pin_path), "%s/%s", 5339 5339 "/sys/fs/bpf", test->map_name); 5340 5340 5341 - if (CHECK(ret == sizeof(pin_path), "pin_path %s/%s is too long", 5341 + if (CHECK(ret >= sizeof(pin_path), "pin_path %s/%s is too long", 5342 5342 "/sys/fs/bpf", test->map_name)) { 5343 5343 err = -1; 5344 5344 goto done;
+7 -10
tools/testing/selftests/bpf/prog_tests/core_extern.c
··· 39 39 "CONFIG_STR=\"abracad\"\n" 40 40 "CONFIG_MISSING=0", 41 41 .data = { 42 + .unkn_virt_val = 0, 42 43 .bpf_syscall = false, 43 44 .tristate_val = TRI_MODULE, 44 45 .bool_val = true, ··· 122 121 void test_core_extern(void) 123 122 { 124 123 const uint32_t kern_ver = get_kernel_version(); 125 - int err, duration = 0, i, j; 124 + int err, i, j; 126 125 struct test_core_extern *skel = NULL; 127 126 uint64_t *got, *exp; 128 127 int n = sizeof(*skel->data) / sizeof(uint64_t); ··· 137 136 continue; 138 137 139 138 skel = test_core_extern__open_opts(&opts); 140 - if (CHECK(!skel, "skel_open", "skeleton open failed\n")) 139 + if (!ASSERT_OK_PTR(skel, "skel_open")) 141 140 goto cleanup; 142 141 err = test_core_extern__load(skel); 143 142 if (t->fails) { 144 - CHECK(!err, "skel_load", 145 - "shouldn't succeed open/load of skeleton\n"); 143 + ASSERT_ERR(err, "skel_load_should_fail"); 146 144 goto cleanup; 147 - } else if (CHECK(err, "skel_load", 148 - "failed to open/load skeleton\n")) { 145 + } else if (!ASSERT_OK(err, "skel_load")) { 149 146 goto cleanup; 150 147 } 151 148 err = test_core_extern__attach(skel); 152 - if (CHECK(err, "attach_raw_tp", "failed attach: %d\n", err)) 149 + if (!ASSERT_OK(err, "attach_raw_tp")) 153 150 goto cleanup; 154 151 155 152 usleep(1); ··· 157 158 got = (uint64_t *)skel->data; 158 159 exp = (uint64_t *)&t->data; 159 160 for (j = 0; j < n; j++) { 160 - CHECK(got[j] != exp[j], "check_res", 161 - "result #%d: expected %llx, but got %llx\n", 162 - j, (__u64)exp[j], (__u64)got[j]); 161 + ASSERT_EQ(got[j], exp[j], "result"); 163 162 } 164 163 cleanup: 165 164 test_core_extern__destroy(skel);
+2
tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
··· 364 364 continue; 365 365 if (!strncmp(name, "rcu_", 4)) 366 366 continue; 367 + if (!strcmp(name, "bpf_dispatcher_xdp_func")) 368 + continue; 367 369 if (!strncmp(name, "__ftrace_invalid_address__", 368 370 sizeof("__ftrace_invalid_address__") - 1)) 369 371 continue;
+11
tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c
··· 50 50 if (CHECK(!skel, "skel_open", "skeleton open failed\n")) 51 51 return; 52 52 53 + /* validate ringbuf size adjustment logic */ 54 + ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), page_size, "rb1_size_before"); 55 + ASSERT_OK(bpf_map__set_max_entries(skel->maps.ringbuf1, page_size + 1), "rb1_resize"); 56 + ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), 2 * page_size, "rb1_size_after"); 57 + ASSERT_OK(bpf_map__set_max_entries(skel->maps.ringbuf1, page_size), "rb1_reset"); 58 + ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), page_size, "rb1_size_final"); 59 + 53 60 proto_fd = bpf_map_create(BPF_MAP_TYPE_RINGBUF, NULL, 0, 0, page_size, NULL); 54 61 if (CHECK(proto_fd < 0, "bpf_map_create", "bpf_map_create failed\n")) 55 62 goto cleanup; ··· 71 64 72 65 close(proto_fd); 73 66 proto_fd = -1; 67 + 68 + /* make sure we can't resize ringbuf after object load */ 69 + if (!ASSERT_ERR(bpf_map__set_max_entries(skel->maps.ringbuf1, 3 * page_size), "rb1_resize_after_load")) 70 + goto cleanup; 74 71 75 72 /* only trigger BPF program for current process */ 76 73 skel->bss->pid = getpid();
+2
tools/testing/selftests/bpf/prog_tests/skeleton.c
··· 122 122 123 123 ASSERT_EQ(skel->bss->out_mostly_var, 123, "out_mostly_var"); 124 124 125 + ASSERT_EQ(bss->huge_arr[ARRAY_SIZE(bss->huge_arr) - 1], 123, "huge_arr"); 126 + 125 127 elf_bytes = test_skeleton__elf_bytes(&elf_bytes_sz); 126 128 ASSERT_OK_PTR(elf_bytes, "elf_bytes"); 127 129 ASSERT_GE(elf_bytes_sz, 0, "elf_bytes_sz");
+7
tools/testing/selftests/bpf/progs/bpf_iter.h
··· 22 22 #define BTF_F_NONAME BTF_F_NONAME___not_used 23 23 #define BTF_F_PTR_RAW BTF_F_PTR_RAW___not_used 24 24 #define BTF_F_ZERO BTF_F_ZERO___not_used 25 + #define bpf_iter__ksym bpf_iter__ksym___not_used 25 26 #include "vmlinux.h" 26 27 #undef bpf_iter_meta 27 28 #undef bpf_iter__bpf_map ··· 45 44 #undef BTF_F_NONAME 46 45 #undef BTF_F_PTR_RAW 47 46 #undef BTF_F_ZERO 47 + #undef bpf_iter__ksym 48 48 49 49 struct bpf_iter_meta { 50 50 struct seq_file *seq; ··· 152 150 BTF_F_NONAME = (1ULL << 1), 153 151 BTF_F_PTR_RAW = (1ULL << 2), 154 152 BTF_F_ZERO = (1ULL << 3), 153 + }; 154 + 155 + struct bpf_iter__ksym { 156 + struct bpf_iter_meta *meta; 157 + struct kallsym_iter *ksym; 155 158 };
+74
tools/testing/selftests/bpf/progs/bpf_iter_ksym.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022, Oracle and/or its affiliates. */ 3 + #include "bpf_iter.h" 4 + #include <bpf/bpf_helpers.h> 5 + 6 + char _license[] SEC("license") = "GPL"; 7 + 8 + unsigned long last_sym_value = 0; 9 + 10 + static inline char tolower(char c) 11 + { 12 + if (c >= 'A' && c <= 'Z') 13 + c += ('a' - 'A'); 14 + return c; 15 + } 16 + 17 + static inline char toupper(char c) 18 + { 19 + if (c >= 'a' && c <= 'z') 20 + c -= ('a' - 'A'); 21 + return c; 22 + } 23 + 24 + /* Dump symbols with max size; the latter is calculated by caching symbol N value 25 + * and when iterating on symbol N+1, we can print max size of symbol N via 26 + * address of N+1 - address of N. 27 + */ 28 + SEC("iter/ksym") 29 + int dump_ksym(struct bpf_iter__ksym *ctx) 30 + { 31 + struct seq_file *seq = ctx->meta->seq; 32 + struct kallsym_iter *iter = ctx->ksym; 33 + __u32 seq_num = ctx->meta->seq_num; 34 + unsigned long value; 35 + char type; 36 + int ret; 37 + 38 + if (!iter) 39 + return 0; 40 + 41 + if (seq_num == 0) { 42 + BPF_SEQ_PRINTF(seq, "ADDR TYPE NAME MODULE_NAME KIND MAX_SIZE\n"); 43 + return 0; 44 + } 45 + if (last_sym_value) 46 + BPF_SEQ_PRINTF(seq, "0x%x\n", iter->value - last_sym_value); 47 + else 48 + BPF_SEQ_PRINTF(seq, "\n"); 49 + 50 + value = iter->show_value ? iter->value : 0; 51 + 52 + last_sym_value = value; 53 + 54 + type = iter->type; 55 + 56 + if (iter->module_name[0]) { 57 + type = iter->exported ? 
toupper(type) : tolower(type); 58 + BPF_SEQ_PRINTF(seq, "0x%llx %c %s [ %s ] ", 59 + value, type, iter->name, iter->module_name); 60 + } else { 61 + BPF_SEQ_PRINTF(seq, "0x%llx %c %s ", value, type, iter->name); 62 + } 63 + if (!iter->pos_arch_end || iter->pos_arch_end > iter->pos) 64 + BPF_SEQ_PRINTF(seq, "CORE "); 65 + else if (!iter->pos_mod_end || iter->pos_mod_end > iter->pos) 66 + BPF_SEQ_PRINTF(seq, "MOD "); 67 + else if (!iter->pos_ftrace_mod_end || iter->pos_ftrace_mod_end > iter->pos) 68 + BPF_SEQ_PRINTF(seq, "FTRACE_MOD "); 69 + else if (!iter->pos_bpf_end || iter->pos_bpf_end > iter->pos) 70 + BPF_SEQ_PRINTF(seq, "BPF "); 71 + else 72 + BPF_SEQ_PRINTF(seq, "KPROBE "); 73 + return 0; 74 + }
+3 -3
tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
··· 64 64 return 0; 65 65 } 66 66 67 - SEC("kprobe/" SYS_PREFIX "sys_prctl") 68 - int BPF_KPROBE_SYSCALL(prctl_enter, int option, unsigned long arg2, 69 - unsigned long arg3, unsigned long arg4, unsigned long arg5) 67 + SEC("ksyscall/prctl") 68 + int BPF_KSYSCALL(prctl_enter, int option, unsigned long arg2, 69 + unsigned long arg3, unsigned long arg4, unsigned long arg5) 70 70 { 71 71 pid_t pid = bpf_get_current_pid_tgid() >> 32; 72 72
+7 -8
tools/testing/selftests/bpf/progs/test_attach_probe.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 // Copyright (c) 2017 Facebook 3 3 4 - #include <linux/ptrace.h> 5 - #include <linux/bpf.h> 4 + #include "vmlinux.h" 6 5 #include <bpf/bpf_helpers.h> 7 6 #include <bpf/bpf_tracing.h> 8 - #include <stdbool.h> 7 + #include <bpf/bpf_core_read.h> 9 8 #include "bpf_misc.h" 10 9 11 10 int kprobe_res = 0; ··· 30 31 return 0; 31 32 } 32 33 33 - SEC("kprobe/" SYS_PREFIX "sys_nanosleep") 34 - int BPF_KPROBE(handle_kprobe_auto) 34 + SEC("ksyscall/nanosleep") 35 + int BPF_KSYSCALL(handle_kprobe_auto, struct __kernel_timespec *req, struct __kernel_timespec *rem) 35 36 { 36 37 kprobe2_res = 11; 37 38 return 0; ··· 55 56 return 0; 56 57 } 57 58 58 - SEC("kretprobe/" SYS_PREFIX "sys_nanosleep") 59 - int BPF_KRETPROBE(handle_kretprobe_auto) 59 + SEC("kretsyscall/nanosleep") 60 + int BPF_KRETPROBE(handle_kretprobe_auto, int ret) 60 61 { 61 62 kretprobe2_res = 22; 62 - return 0; 63 + return ret; 63 64 } 64 65 65 66 SEC("uprobe")
+73 -12
tools/testing/selftests/bpf/progs/test_bpf_nf.c
··· 8 8 #define EINVAL 22 9 9 #define ENOENT 2 10 10 11 + extern unsigned long CONFIG_HZ __kconfig; 12 + 11 13 int test_einval_bpf_tuple = 0; 12 14 int test_einval_reserved = 0; 13 15 int test_einval_netns_id = 0; ··· 18 16 int test_enonet_netns_id = 0; 19 17 int test_enoent_lookup = 0; 20 18 int test_eafnosupport = 0; 19 + int test_alloc_entry = -EINVAL; 20 + int test_insert_entry = -EAFNOSUPPORT; 21 + int test_succ_lookup = -ENOENT; 22 + u32 test_delta_timeout = 0; 23 + u32 test_status = 0; 21 24 22 25 struct nf_conn; 23 26 ··· 33 26 u8 reserved[3]; 34 27 } __attribute__((preserve_access_index)); 35 28 29 + struct nf_conn *bpf_xdp_ct_alloc(struct xdp_md *, struct bpf_sock_tuple *, u32, 30 + struct bpf_ct_opts___local *, u32) __ksym; 36 31 struct nf_conn *bpf_xdp_ct_lookup(struct xdp_md *, struct bpf_sock_tuple *, u32, 37 32 struct bpf_ct_opts___local *, u32) __ksym; 33 + struct nf_conn *bpf_skb_ct_alloc(struct __sk_buff *, struct bpf_sock_tuple *, u32, 34 + struct bpf_ct_opts___local *, u32) __ksym; 38 35 struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *, struct bpf_sock_tuple *, u32, 39 36 struct bpf_ct_opts___local *, u32) __ksym; 37 + struct nf_conn *bpf_ct_insert_entry(struct nf_conn *) __ksym; 40 38 void bpf_ct_release(struct nf_conn *) __ksym; 39 + void bpf_ct_set_timeout(struct nf_conn *, u32) __ksym; 40 + int bpf_ct_change_timeout(struct nf_conn *, u32) __ksym; 41 + int bpf_ct_set_status(struct nf_conn *, u32) __ksym; 42 + int bpf_ct_change_status(struct nf_conn *, u32) __ksym; 41 43 42 44 static __always_inline void 43 - nf_ct_test(struct nf_conn *(*func)(void *, struct bpf_sock_tuple *, u32, 44 - struct bpf_ct_opts___local *, u32), 45 + nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32, 46 + struct bpf_ct_opts___local *, u32), 47 + struct nf_conn *(*alloc_fn)(void *, struct bpf_sock_tuple *, u32, 48 + struct bpf_ct_opts___local *, u32), 45 49 void *ctx) 46 50 { 47 51 struct bpf_ct_opts___local opts_def = { .l4proto = 
 	struct bpf_sock_tuple bpf_tuple;
 	struct nf_conn *ct;
+	int err;
 
 	__builtin_memset(&bpf_tuple, 0, sizeof(bpf_tuple.ipv4));
 
-	ct = func(ctx, NULL, 0, &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, NULL, 0, &opts_def, sizeof(opts_def));
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_einval_bpf_tuple = opts_def.error;
 
 	opts_def.reserved[0] = 1;
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def));
 	opts_def.reserved[0] = 0;
 	opts_def.l4proto = IPPROTO_TCP;
 	if (ct)
···
 		test_einval_reserved = opts_def.error;
 
 	opts_def.netns_id = -2;
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def));
 	opts_def.netns_id = -1;
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_einval_netns_id = opts_def.error;
 
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def) - 1);
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def) - 1);
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_einval_len_opts = opts_def.error;
 
 	opts_def.l4proto = IPPROTO_ICMP;
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def));
 	opts_def.l4proto = IPPROTO_TCP;
 	if (ct)
 		bpf_ct_release(ct);
···
 		test_eproto_l4proto = opts_def.error;
 
 	opts_def.netns_id = 0xf00f;
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def));
 	opts_def.netns_id = -1;
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_enonet_netns_id = opts_def.error;
 
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def));
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_enoent_lookup = opts_def.error;
 
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4) - 1, &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4) - 1, &opts_def,
+		       sizeof(opts_def));
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_eafnosupport = opts_def.error;
+
+	bpf_tuple.ipv4.saddr = bpf_get_prandom_u32(); /* src IP */
+	bpf_tuple.ipv4.daddr = bpf_get_prandom_u32(); /* dst IP */
+	bpf_tuple.ipv4.sport = bpf_get_prandom_u32(); /* src port */
+	bpf_tuple.ipv4.dport = bpf_get_prandom_u32(); /* dst port */
+
+	ct = alloc_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		      sizeof(opts_def));
+	if (ct) {
+		struct nf_conn *ct_ins;
+
+		bpf_ct_set_timeout(ct, 10000);
+		bpf_ct_set_status(ct, IPS_CONFIRMED);
+
+		ct_ins = bpf_ct_insert_entry(ct);
+		if (ct_ins) {
+			struct nf_conn *ct_lk;
+
+			ct_lk = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4),
+					  &opts_def, sizeof(opts_def));
+			if (ct_lk) {
+				/* update ct entry timeout */
+				bpf_ct_change_timeout(ct_lk, 10000);
+				test_delta_timeout = ct_lk->timeout - bpf_jiffies64();
+				test_delta_timeout /= CONFIG_HZ;
+				test_status = IPS_SEEN_REPLY;
+				bpf_ct_change_status(ct_lk, IPS_SEEN_REPLY);
+				bpf_ct_release(ct_lk);
+				test_succ_lookup = 0;
+			}
+			bpf_ct_release(ct_ins);
+			test_insert_entry = 0;
+		}
+		test_alloc_entry = 0;
+	}
 }
 
 SEC("xdp")
 int nf_xdp_ct_test(struct xdp_md *ctx)
 {
-	nf_ct_test((void *)bpf_xdp_ct_lookup, ctx);
+	nf_ct_test((void *)bpf_xdp_ct_lookup, (void *)bpf_xdp_ct_alloc, ctx);
 	return 0;
 }
 
 SEC("tc")
 int nf_skb_ct_test(struct __sk_buff *ctx)
 {
-	nf_ct_test((void *)bpf_skb_ct_lookup, ctx);
+	nf_ct_test((void *)bpf_skb_ct_lookup, (void *)bpf_skb_ct_alloc, ctx);
 	return 0;
 }
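The new alloc path above follows a strict ordering: bpf_xdp_ct_alloc/bpf_skb_ct_alloc return an unconfirmed entry, bpf_ct_set_timeout()/bpf_ct_set_status() may only touch it before insertion, bpf_ct_insert_entry() consumes it and returns a confirmed reference, and only then do the bpf_ct_change_*() kfuncs apply. A minimal userspace sketch of that lifecycle, using made-up fake_ct_* names (this is a toy model, not kernel code):

```c
#include <assert.h>

enum ct_state { CT_ALLOCATED, CT_CONFIRMED, CT_RELEASED };

struct fake_ct {
	enum ct_state state;
	unsigned int timeout;
};

static struct fake_ct *fake_ct_alloc(struct fake_ct *slot)
{
	slot->state = CT_ALLOCATED;
	slot->timeout = 0;
	return slot;
}

/* set_* helpers are only valid on an allocated, not-yet-inserted entry */
static int fake_ct_set_timeout(struct fake_ct *ct, unsigned int ms)
{
	if (ct->state != CT_ALLOCATED)
		return -1;
	ct->timeout = ms;
	return 0;
}

/* insertion consumes the allocated entry, yielding a confirmed reference */
static struct fake_ct *fake_ct_insert_entry(struct fake_ct *ct)
{
	if (ct->state != CT_ALLOCATED)
		return 0;
	ct->state = CT_CONFIRMED;
	return ct;
}

/* change_* helpers require an entry that has already been inserted */
static int fake_ct_change_timeout(struct fake_ct *ct, unsigned int ms)
{
	if (ct->state != CT_CONFIRMED)
		return -1;
	ct->timeout = ms;
	return 0;
}

static void fake_ct_release(struct fake_ct *ct)
{
	ct->state = CT_RELEASED;
}
```

In the real API these ordering rules are enforced at load time by the verifier rather than at run time.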
tools/testing/selftests/bpf/progs/test_bpf_nf_fail.c (new file, +134)
// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

struct nf_conn;

struct bpf_ct_opts___local {
	s32 netns_id;
	s32 error;
	u8 l4proto;
	u8 reserved[3];
} __attribute__((preserve_access_index));

struct nf_conn *bpf_skb_ct_alloc(struct __sk_buff *, struct bpf_sock_tuple *, u32,
				 struct bpf_ct_opts___local *, u32) __ksym;
struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *, struct bpf_sock_tuple *, u32,
				  struct bpf_ct_opts___local *, u32) __ksym;
struct nf_conn *bpf_ct_insert_entry(struct nf_conn *) __ksym;
void bpf_ct_release(struct nf_conn *) __ksym;
void bpf_ct_set_timeout(struct nf_conn *, u32) __ksym;
int bpf_ct_change_timeout(struct nf_conn *, u32) __ksym;
int bpf_ct_set_status(struct nf_conn *, u32) __ksym;
int bpf_ct_change_status(struct nf_conn *, u32) __ksym;

SEC("?tc")
int alloc_release(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	bpf_ct_release(ct);
	return 0;
}

SEC("?tc")
int insert_insert(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	ct = bpf_ct_insert_entry(ct);
	if (!ct)
		return 0;
	ct = bpf_ct_insert_entry(ct);
	return 0;
}

SEC("?tc")
int lookup_insert(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_lookup(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	bpf_ct_insert_entry(ct);
	return 0;
}

SEC("?tc")
int set_timeout_after_insert(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	ct = bpf_ct_insert_entry(ct);
	if (!ct)
		return 0;
	bpf_ct_set_timeout(ct, 0);
	return 0;
}

SEC("?tc")
int set_status_after_insert(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	ct = bpf_ct_insert_entry(ct);
	if (!ct)
		return 0;
	bpf_ct_set_status(ct, 0);
	return 0;
}

SEC("?tc")
int change_timeout_after_alloc(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	bpf_ct_change_timeout(ct, 0);
	return 0;
}

SEC("?tc")
int change_status_after_alloc(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	bpf_ct_change_status(ct, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";
tools/testing/selftests/bpf/progs/test_core_extern.c (+3)
···
 static int (*bpf_missing_helper)(const void *arg1, int arg2) = (void *) 999;
 
 extern int LINUX_KERNEL_VERSION __kconfig;
+extern int LINUX_UNKNOWN_VIRTUAL_EXTERN __kconfig __weak;
 extern bool CONFIG_BPF_SYSCALL __kconfig; /* strong */
 extern enum libbpf_tristate CONFIG_TRISTATE __kconfig __weak;
 extern bool CONFIG_BOOL __kconfig __weak;
···
 extern uint64_t CONFIG_MISSING __kconfig __weak;
 
 uint64_t kern_ver = -1;
+uint64_t unkn_virt_val = -1;
 uint64_t bpf_syscall = -1;
 uint64_t tristate_val = -1;
 uint64_t bool_val = -1;
···
 	int i;
 
 	kern_ver = LINUX_KERNEL_VERSION;
+	unkn_virt_val = LINUX_UNKNOWN_VIRTUAL_EXTERN;
 	bpf_syscall = CONFIG_BPF_SYSCALL;
 	tristate_val = CONFIG_TRISTATE;
 	bool_val = CONFIG_BOOL;
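A `__kconfig __weak` extern lets the program load even when libbpf cannot resolve the value; the weak symbol then reads as zero. The closest userspace analogue, sketched below, is an ELF weak extern whose address resolves to NULL when no object defines it (the symbol name here is invented for illustration; assumes a Linux/ELF toolchain):

```c
#include <assert.h>
#include <stdint.h>

/* Weakly declared and never defined anywhere: on ELF the linker
 * resolves its address to NULL instead of failing the link. */
extern int MISSING_CONFIG_VALUE __attribute__((weak));

static int64_t read_weak_config(void)
{
	if (&MISSING_CONFIG_VALUE)	/* non-NULL only if some object defines it */
		return MISSING_CONFIG_VALUE;
	return -1;			/* sentinel: symbol not present */
}
```

libbpf's mechanism differs in detail (it patches the instruction stream at load time), but the "missing resolves to a known default" behavior is the same idea.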
tools/testing/selftests/bpf/progs/test_probe_user.c (+6, -21)
···
 // SPDX-License-Identifier: GPL-2.0
-
-#include <linux/ptrace.h>
-#include <linux/bpf.h>
-
-#include <netinet/in.h>
-
+#include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
 #include "bpf_misc.h"
 
 static struct sockaddr_in old;
 
-SEC("kprobe/" SYS_PREFIX "sys_connect")
-int BPF_KPROBE(handle_sys_connect)
+SEC("ksyscall/connect")
+int BPF_KSYSCALL(handle_sys_connect, int fd, struct sockaddr_in *uservaddr, int addrlen)
 {
-#if SYSCALL_WRAPPER == 1
-	struct pt_regs *real_regs;
-#endif
 	struct sockaddr_in new;
-	void *ptr;
 
-#if SYSCALL_WRAPPER == 0
-	ptr = (void *)PT_REGS_PARM2(ctx);
-#else
-	real_regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
-	bpf_probe_read_kernel(&ptr, sizeof(ptr), &PT_REGS_PARM2(real_regs));
-#endif
-
-	bpf_probe_read_user(&old, sizeof(old), ptr);
+	bpf_probe_read_user(&old, sizeof(old), uservaddr);
 	__builtin_memset(&new, 0xab, sizeof(new));
-	bpf_probe_write_user(ptr, &new, sizeof(new));
+	bpf_probe_write_user(uservaddr, &new, sizeof(new));
 
 	return 0;
 }
tools/testing/selftests/bpf/progs/test_skeleton.c (+4)
···
 int read_mostly_var __read_mostly;
 int out_mostly_var;
 
+char huge_arr[16 * 1024 * 1024];
+
 SEC("raw_tp/sys_enter")
 int handler(const void *ctx)
 {
···
 		out_dynarr[i] = in_dynarr[i];
 
 	out_mostly_var = read_mostly_var;
+
+	huge_arr[sizeof(huge_arr) - 1] = 123;
 
 	return 0;
 }
tools/testing/selftests/bpf/progs/test_xdp_noinline.c (+15, -15)
···
 	udp = data + off;
 
 	if (udp + 1 > data_end)
-		return 0;
+		return false;
 	if (!is_icmp) {
 		pckt->flow.port16[0] = udp->source;
 		pckt->flow.port16[1] = udp->dest;
···
 		pckt->flow.port16[0] = udp->dest;
 		pckt->flow.port16[1] = udp->source;
 	}
-	return 1;
+	return true;
 }
 
 static __attribute__ ((noinline))
···
 	tcp = data + off;
 	if (tcp + 1 > data_end)
-		return 0;
+		return false;
 	if (tcp->syn)
 		pckt->flags |= (1 << 1);
 	if (!is_icmp) {
···
 		pckt->flow.port16[0] = tcp->dest;
 		pckt->flow.port16[1] = tcp->source;
 	}
-	return 1;
+	return true;
 }
 
 static __attribute__ ((noinline))
···
 	void *data;
 
 	if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct ipv6hdr)))
-		return 0;
+		return false;
 	data = (void *)(long)xdp->data;
 	data_end = (void *)(long)xdp->data_end;
 	new_eth = data;
···
 	old_eth = data + sizeof(struct ipv6hdr);
 	if (new_eth + 1 > data_end ||
 	    old_eth + 1 > data_end || ip6h + 1 > data_end)
-		return 0;
+		return false;
 	memcpy(new_eth->eth_dest, cval->mac, 6);
 	memcpy(new_eth->eth_source, old_eth->eth_dest, 6);
 	new_eth->eth_proto = 56710;
···
 	ip6h->saddr.in6_u.u6_addr32[2] = 3;
 	ip6h->saddr.in6_u.u6_addr32[3] = ip_suffix;
 	memcpy(ip6h->daddr.in6_u.u6_addr32, dst->dstv6, 16);
-	return 1;
+	return true;
 }
 
 static __attribute__ ((noinline))
···
 	ip_suffix <<= 15;
 	ip_suffix ^= pckt->flow.src;
 	if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct iphdr)))
-		return 0;
+		return false;
 	data = (void *)(long)xdp->data;
 	data_end = (void *)(long)xdp->data_end;
 	new_eth = data;
···
 	old_eth = data + sizeof(struct iphdr);
 	if (new_eth + 1 > data_end ||
 	    old_eth + 1 > data_end || iph + 1 > data_end)
-		return 0;
+		return false;
 	memcpy(new_eth->eth_dest, cval->mac, 6);
 	memcpy(new_eth->eth_source, old_eth->eth_dest, 6);
 	new_eth->eth_proto = 8;
···
 		csum += *next_iph_u16++;
 	iph->check = ~((csum & 0xffff) + (csum >> 16));
 	if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr)))
-		return 0;
-	return 1;
+		return false;
+	return true;
 }
 
 static __attribute__ ((noinline))
···
 	else
 		new_eth->eth_proto = 56710;
 	if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct ipv6hdr)))
-		return 0;
+		return false;
 	*data = (void *)(long)xdp->data;
 	*data_end = (void *)(long)xdp->data_end;
-	return 1;
+	return true;
 }
 
 static __attribute__ ((noinline))
···
 	memcpy(new_eth->eth_dest, old_eth->eth_dest, 6);
 	new_eth->eth_proto = 8;
 	if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr)))
-		return 0;
+		return false;
 	*data = (void *)(long)xdp->data;
 	*data_end = (void *)(long)xdp->data_end;
-	return 1;
+	return true;
 }
 
 static __attribute__ ((noinline))
tools/testing/selftests/bpf/test_xdp_veth.sh (+3, -3)
···
 	bpftool map update pinned $BPF_DIR/maps/tx_port key 0 0 0 0 value 122 0 0 0
 	bpftool map update pinned $BPF_DIR/maps/tx_port key 1 0 0 0 value 133 0 0 0
 	bpftool map update pinned $BPF_DIR/maps/tx_port key 2 0 0 0 value 111 0 0 0
-	ip link set dev veth1 xdp pinned $BPF_DIR/progs/redirect_map_0
-	ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1
-	ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2
+	ip link set dev veth1 xdp pinned $BPF_DIR/progs/xdp_redirect_map_0
+	ip link set dev veth2 xdp pinned $BPF_DIR/progs/xdp_redirect_map_1
+	ip link set dev veth3 xdp pinned $BPF_DIR/progs/xdp_redirect_map_2
 
 	ip -n ${NS1} link set dev veth11 xdp obj xdp_dummy.o sec xdp
 	ip -n ${NS2} link set dev veth22 xdp obj xdp_tx.o sec xdp
tools/testing/selftests/bpf/verifier/bpf_loop_inline.c (+1)
···
 	.expected_insns = { PSEUDO_CALL_INSN() },
 	.unexpected_insns = { HELPER_CALL_INSN() },
 	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	.func_info = { { 0, MAIN_TYPE }, { 16, CALLBACK_TYPE } },
 	.func_info_cnt = 2,
 	BTF_TYPES
tools/testing/selftests/bpf/verifier/calls.c (+53)
···
 	.errstr = "variable ptr_ access var_off=(0x0; 0x7) disallowed",
 },
 {
+	"calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 16),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_kfunc_btf_id = {
+		{ "bpf_kfunc_call_test_acquire", 3 },
+		{ "bpf_kfunc_call_test_ref", 8 },
+		{ "bpf_kfunc_call_test_ref", 10 },
+	},
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "R1 must be referenced",
+},
+{
+	"calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_kfunc_btf_id = {
+		{ "bpf_kfunc_call_test_acquire", 3 },
+		{ "bpf_kfunc_call_test_ref", 8 },
+		{ "bpf_kfunc_call_test_release", 10 },
+	},
+	.result_unpriv = REJECT,
+	.result = ACCEPT,
+},
+{
 	"calls: basic sanity",
 	.insns = {
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
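The pair of verifier tests above exercise one rule: an acquire kfunc returns a pointer carrying a reference id, and a kfunc that takes a referenced argument only accepts a register still holding that id; a pointer merely loaded out of the object (the BPF_LDX_MEM in the invalid case) carries no id, so the load is rejected with "R1 must be referenced". A toy userspace model of that bookkeeping, assuming nothing about the real verifier's data structures:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REFS 8

/* Held reference ids; 0 marks a free slot. */
static int acquired[MAX_REFS];
static int next_id = 1;

/* Models an acquire kfunc: hands out a fresh reference id. */
static int ref_acquire(void)
{
	for (int i = 0; i < MAX_REFS; i++) {
		if (!acquired[i]) {
			acquired[i] = next_id;
			return next_id++;
		}
	}
	return 0;
}

/* Models a release kfunc: only accepts an id that is still held.
 * Passing 0 models a derived pointer with no reference id, which
 * the verifier rejects ("R1 must be referenced"). */
static bool ref_release(int ref_id)
{
	if (!ref_id)
		return false;
	for (int i = 0; i < MAX_REFS; i++) {
		if (acquired[i] == ref_id) {
			acquired[i] = 0;
			return true;
		}
	}
	return false;
}
```

The real verifier tracks these ids statically per register state in check_func_arg()/release_reference(); this sketch only mirrors the accept/reject outcomes.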