
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2022-07-22

We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).

The main changes are:

1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.

2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
syscalls through kprobe mechanism, from Andrii Nakryiko.

3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
function, from Song Liu & Jiri Olsa.

4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi.

5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
with kernel symbols, from Alan Maguire.

6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.

7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.

8) libbpf support for writing custom perf event readers, from Jon Doron.

9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.

10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.

11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.

12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.

13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.

14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.

15) Make non-prealloced BPF map allocations low priority to play better with
memcg limits, from Yafang Shao.

16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.

17) Various smaller cleanups and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
bpf: Simplify bpf_prog_pack_[size|mask]
bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
bpf, x64: Allow to use caller address from stack
ftrace: Allow IPMODIFY and DIRECT ops on the same function
ftrace: Add modify_ftrace_direct_multi_nolock
bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
selftests/bpf: Fix test_verifier failed test in unprivileged mode
selftests/bpf: Add negative tests for new nf_conntrack kfuncs
selftests/bpf: Add tests for new nf_conntrack kfuncs
selftests/bpf: Add verifier tests for trusted kfunc args
net: netfilter: Add kfuncs to set and change CT status
net: netfilter: Add kfuncs to set and change CT timeout
net: netfilter: Add kfuncs to allocate and insert CT
net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
bpf: Add documentation for kfuncs
bpf: Add support for forcing kfunc args to be trusted
bpf: Switch to new kfunc flags infrastructure
tools/resolve_btfids: Add support for 8-byte BTF sets
bpf: Introduce 8-byte BTF set
...
====================

Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+3461 -863
+5 -1
Documentation/bpf/btf.rst
··· 369 369 * ``name_off``: offset to a valid C identifier 370 370 * ``info.kind_flag``: 0 371 371 * ``info.kind``: BTF_KIND_FUNC 372 - * ``info.vlen``: 0 372 + * ``info.vlen``: linkage information (BTF_FUNC_STATIC, BTF_FUNC_GLOBAL 373 + or BTF_FUNC_EXTERN) 373 374 * ``type``: a BTF_KIND_FUNC_PROTO type 374 375 375 376 No additional type data follow ``btf_type``. ··· 380 379 type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the 381 380 :ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load` 382 381 (ABI). 382 + 383 + Currently, only linkage values of BTF_FUNC_STATIC and BTF_FUNC_GLOBAL are 384 + supported in the kernel. 383 385 384 386 2.2.13 BTF_KIND_FUNC_PROTO 385 387 ~~~~~~~~~~~~~~~~~~~~~~~~~~
+1
Documentation/bpf/index.rst
··· 19 19 faq 20 20 syscall_api 21 21 helpers 22 + kfuncs 22 23 programs 23 24 maps 24 25 bpf_prog_run
+170
Documentation/bpf/kfuncs.rst
··· 1 + ============================= 2 + BPF Kernel Functions (kfuncs) 3 + ============================= 4 + 5 + 1. Introduction 6 + =============== 7 + 8 + BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux 9 + kernel which are exposed for use by BPF programs. Unlike normal BPF helpers, 10 + kfuncs do not have a stable interface and can change from one kernel release to 11 + another. Hence, BPF programs need to be updated in response to changes in the 12 + kernel. 13 + 14 + 2. Defining a kfunc 15 + =================== 16 + 17 + There are two ways to expose a kernel function to BPF programs: either make an 18 + existing function in the kernel visible, or add a new wrapper for BPF. In both 19 + cases, care must be taken that a BPF program can only call such a function in a 20 + valid context. To enforce this, visibility of a kfunc can be per program type. 21 + 22 + If you are not creating a BPF wrapper for an existing kernel function, skip ahead 23 + to :ref:`BPF_kfunc_nodef`. 24 + 25 + 2.1 Creating a wrapper kfunc 26 + ---------------------------- 27 + 28 + When defining a wrapper kfunc, the wrapper function should have extern linkage. 29 + This prevents the compiler from optimizing away dead code, as this wrapper kfunc 30 + is not invoked anywhere in the kernel itself. It is not necessary to provide a 31 + prototype in a header for the wrapper kfunc. 32 + 33 + An example is given below:: 34 + 35 + /* Disables missing prototype warnings */ 36 + __diag_push(); 37 + __diag_ignore_all("-Wmissing-prototypes", 38 + "Global kfuncs as their definitions will be in BTF"); 39 + 40 + struct task_struct *bpf_find_get_task_by_vpid(pid_t nr) 41 + { 42 + return find_get_task_by_vpid(nr); 43 + } 44 + 45 + __diag_pop(); 46 + 47 + A wrapper kfunc is often needed when we need to annotate parameters of the 48 + kfunc. Otherwise one may directly make the kfunc visible to the BPF program by 49 + registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`. 
50 + 51 + 2.2 Annotating kfunc parameters 52 + ------------------------------- 53 + 54 + Similar to BPF helpers, there is sometimes a need for additional context required 55 + by the verifier to make the usage of kernel functions safer and more useful. 56 + Hence, we can annotate a parameter by suffixing the name of the argument of the 57 + kfunc with a __tag, where tag may be one of the supported annotations. 58 + 59 + 2.2.1 __sz Annotation 60 + --------------------- 61 + 62 + This annotation is used to indicate a memory and size pair in the argument list. 63 + An example is given below:: 64 + 65 + void bpf_memzero(void *mem, int mem__sz) 66 + { 67 + ... 68 + } 69 + 70 + Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second 71 + argument as its size. By default, without __sz annotation, the size of the type 72 + of the pointer is used. Without __sz annotation, a kfunc cannot accept a void 73 + pointer. 74 + 75 + .. _BPF_kfunc_nodef: 76 + 77 + 2.3 Using an existing kernel function 78 + ------------------------------------- 79 + 80 + When an existing function in the kernel is fit for consumption by BPF programs, 81 + it can be directly registered with the BPF subsystem. However, care must still 82 + be taken to review the context in which it will be invoked by the BPF program 83 + and whether it is safe to do so. 84 + 85 + 2.4 Annotating kfuncs 86 + --------------------- 87 + 88 + In addition to kfuncs' arguments, the verifier may need more information about the 89 + type of kfunc(s) being registered with the BPF subsystem. To do so, we define 90 + flags on a set of kfuncs as follows:: 91 + 92 + BTF_SET8_START(bpf_task_set) 93 + BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL) 94 + BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE) 95 + BTF_SET8_END(bpf_task_set) 96 + 97 + This set encodes the BTF ID of each kfunc listed above, and encodes the flags 98 + along with it. Of course, it is also allowed to specify no flags. 
99 + 100 + 2.4.1 KF_ACQUIRE flag 101 + --------------------- 102 + 103 + The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a 104 + refcounted object. The verifier will then ensure that the pointer to the object 105 + is eventually released using a release kfunc, or transferred to a map using a 106 + referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier fails the 107 + loading of the BPF program until no lingering references remain in all possible 108 + explored states of the program. 109 + 110 + 2.4.2 KF_RET_NULL flag 111 + ---------------------- 112 + 113 + The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc 114 + may be NULL. Hence, it forces the user to do a NULL check on the pointer 115 + returned from the kfunc before making use of it (dereferencing or passing to 116 + another helper). This flag is often used in pairing with KF_ACQUIRE flag, but 117 + both are orthogonal to each other. 118 + 119 + 2.4.3 KF_RELEASE flag 120 + --------------------- 121 + 122 + The KF_RELEASE flag is used to indicate that the kfunc releases the pointer 123 + passed in to it. There can be only one referenced pointer that can be passed in. 124 + All copies of the pointer being released are invalidated as a result of invoking 125 + kfunc with this flag. 126 + 127 + 2.4.4 KF_KPTR_GET flag 128 + ---------------------- 129 + 130 + The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument 131 + as a pointer to kptr, safely increments the refcount of the object it points to, 132 + and returns a reference to the user. The rest of the arguments may be normal 133 + arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with 134 + KF_ACQUIRE and KF_RET_NULL flags. 135 + 136 + 2.4.5 KF_TRUSTED_ARGS flag 137 + -------------------------- 138 + 139 + The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. 
It 140 + indicates that all pointer arguments will always be refcounted, and have 141 + their offset set to 0. It can be used to enforce that a pointer to a refcounted 142 + object acquired from a kfunc or BPF helper is passed as an argument to this 143 + kfunc without any modifications (e.g. pointer arithmetic) such that it is 144 + trusted and points to the original object. This flag is often used for kfuncs 145 + that operate (change some property, perform some operation) on an object that 146 + was obtained using an acquire kfunc. Such kfuncs need an unchanged pointer to 147 + ensure the integrity of the operation being performed on the expected object. 148 + 149 + 2.5 Registering the kfuncs 150 + -------------------------- 151 + 152 + Once the kfunc is prepared for use, the final step to making it visible is 153 + registering it with the BPF subsystem. Registration is done per BPF program 154 + type. An example is shown below:: 155 + 156 + BTF_SET8_START(bpf_task_set) 157 + BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL) 158 + BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE) 159 + BTF_SET8_END(bpf_task_set) 160 + 161 + static const struct btf_kfunc_id_set bpf_task_kfunc_set = { 162 + .owner = THIS_MODULE, 163 + .set = &bpf_task_set, 164 + }; 165 + 166 + static int init_subsystem(void) 167 + { 168 + return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set); 169 + } 170 + late_initcall(init_subsystem);
+185
Documentation/bpf/map_hash.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0-only 2 + .. Copyright (C) 2022 Red Hat, Inc. 3 + 4 + =============================================== 5 + BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants 6 + =============================================== 7 + 8 + .. note:: 9 + - ``BPF_MAP_TYPE_HASH`` was introduced in kernel version 3.19 10 + - ``BPF_MAP_TYPE_PERCPU_HASH`` was introduced in version 4.6 11 + - Both ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 12 + were introduced in version 4.10 13 + 14 + ``BPF_MAP_TYPE_HASH`` and ``BPF_MAP_TYPE_PERCPU_HASH`` provide general 15 + purpose hash map storage. Both the key and the value can be structs, 16 + allowing for composite keys and values. 17 + 18 + The kernel is responsible for allocating and freeing key/value pairs, up 19 + to the max_entries limit that you specify. Hash maps use pre-allocation 20 + of hash table elements by default. The ``BPF_F_NO_PREALLOC`` flag can be 21 + used to disable pre-allocation when it is too memory expensive. 22 + 23 + ``BPF_MAP_TYPE_PERCPU_HASH`` provides a separate value slot per 24 + CPU. The per-cpu values are stored internally in an array. 25 + 26 + The ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 27 + variants add LRU semantics to their respective hash tables. An LRU hash 28 + will automatically evict the least recently used entries when the hash 29 + table reaches capacity. An LRU hash maintains an internal LRU list that 30 + is used to select elements for eviction. This internal LRU list is 31 + shared across CPUs but it is possible to request a per CPU LRU list with 32 + the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``. 33 + 34 + Usage 35 + ===== 36 + 37 + .. c:function:: 38 + long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags) 39 + 40 + Hash entries can be added or updated using the ``bpf_map_update_elem()`` 41 + helper. This helper replaces existing elements atomically. 
The ``flags`` 42 + parameter can be used to control the update behaviour: 43 + 44 + - ``BPF_ANY`` will create a new element or update an existing element 45 + - ``BPF_NOEXIST`` will create a new element only if one did not already 46 + exist 47 + - ``BPF_EXIST`` will update an existing element 48 + 49 + ``bpf_map_update_elem()`` returns 0 on success, or negative error in 50 + case of failure. 51 + 52 + .. c:function:: 53 + void *bpf_map_lookup_elem(struct bpf_map *map, const void *key) 54 + 55 + Hash entries can be retrieved using the ``bpf_map_lookup_elem()`` 56 + helper. This helper returns a pointer to the value associated with 57 + ``key``, or ``NULL`` if no entry was found. 58 + 59 + .. c:function:: 60 + long bpf_map_delete_elem(struct bpf_map *map, const void *key) 61 + 62 + Hash entries can be deleted using the ``bpf_map_delete_elem()`` 63 + helper. This helper will return 0 on success, or negative error in case 64 + of failure. 65 + 66 + Per CPU Hashes 67 + -------------- 68 + 69 + For ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` 70 + the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers 71 + automatically access the hash slot for the current CPU. 72 + 73 + .. c:function:: 74 + void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu) 75 + 76 + The ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the 77 + value in the hash slot for a specific CPU. Returns value associated with 78 + ``key`` on ``cpu`` , or ``NULL`` if no entry was found or ``cpu`` is 79 + invalid. 80 + 81 + Concurrency 82 + ----------- 83 + 84 + Values stored in ``BPF_MAP_TYPE_HASH`` can be accessed concurrently by 85 + programs running on different CPUs. Since Kernel version 5.1, the BPF 86 + infrastructure provides ``struct bpf_spin_lock`` to synchronise access. 87 + See ``tools/testing/selftests/bpf/progs/test_spin_lock.c``. 88 + 89 + Userspace 90 + --------- 91 + 92 + .. 
c:function:: 93 + int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key) 94 + 95 + In userspace, it is possible to iterate through the keys of a hash using 96 + libbpf's ``bpf_map_get_next_key()`` function. The first key can be fetched by 97 + calling ``bpf_map_get_next_key()`` with ``cur_key`` set to 98 + ``NULL``. Subsequent calls will fetch the next key that follows the 99 + current key. ``bpf_map_get_next_key()`` returns 0 on success, -ENOENT if 100 + cur_key is the last key in the hash, or negative error in case of 101 + failure. 102 + 103 + Note that if ``cur_key`` gets deleted then ``bpf_map_get_next_key()`` 104 + will instead return the *first* key in the hash table which is 105 + undesirable. It is recommended to use batched lookup if there is going 106 + to be key deletion intermixed with ``bpf_map_get_next_key()``. 107 + 108 + Examples 109 + ======== 110 + 111 + Please see the ``tools/testing/selftests/bpf`` directory for functional 112 + examples. The code snippets below demonstrates API usage. 113 + 114 + This example shows how to declare an LRU Hash with a struct key and a 115 + struct value. 116 + 117 + .. code-block:: c 118 + 119 + #include <linux/bpf.h> 120 + #include <bpf/bpf_helpers.h> 121 + 122 + struct key { 123 + __u32 srcip; 124 + }; 125 + 126 + struct value { 127 + __u64 packets; 128 + __u64 bytes; 129 + }; 130 + 131 + struct { 132 + __uint(type, BPF_MAP_TYPE_LRU_HASH); 133 + __uint(max_entries, 32); 134 + __type(key, struct key); 135 + __type(value, struct value); 136 + } packet_stats SEC(".maps"); 137 + 138 + This example shows how to create or update hash values using atomic 139 + instructions: 140 + 141 + .. 
code-block:: c 142 + 143 + static void update_stats(__u32 srcip, int bytes) 144 + { 145 + struct key key = { 146 + .srcip = srcip, 147 + }; 148 + struct value *value = bpf_map_lookup_elem(&packet_stats, &key); 149 + 150 + if (value) { 151 + __sync_fetch_and_add(&value->packets, 1); 152 + __sync_fetch_and_add(&value->bytes, bytes); 153 + } else { 154 + struct value newval = { 1, bytes }; 155 + 156 + bpf_map_update_elem(&packet_stats, &key, &newval, BPF_NOEXIST); 157 + } 158 + } 159 + 160 + Userspace walking the map elements from the map declared above: 161 + 162 + .. code-block:: c 163 + 164 + #include <bpf/libbpf.h> 165 + #include <bpf/bpf.h> 166 + 167 + static void walk_hash_elements(int map_fd) 168 + { 169 + struct key *cur_key = NULL; 170 + struct key next_key; 171 + struct value value; 172 + int err; 173 + 174 + for (;;) { 175 + err = bpf_map_get_next_key(map_fd, cur_key, &next_key); 176 + if (err) 177 + break; 178 + 179 + bpf_map_lookup_elem(map_fd, &next_key, &value); 180 + 181 + // Use key and value here 182 + 183 + cur_key = &next_key; 184 + } 185 + }
+3
arch/arm64/include/asm/insn.h
··· 510 510 unsigned int imm, 511 511 enum aarch64_insn_size_type size, 512 512 enum aarch64_insn_ldst_type type); 513 + u32 aarch64_insn_gen_load_literal(unsigned long pc, unsigned long addr, 514 + enum aarch64_insn_register reg, 515 + bool is64bit); 513 516 u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1, 514 517 enum aarch64_insn_register reg2, 515 518 enum aarch64_insn_register base,
+26 -4
arch/arm64/lib/insn.c
··· 323 323 return insn; 324 324 } 325 325 326 - static inline long branch_imm_common(unsigned long pc, unsigned long addr, 326 + static inline long label_imm_common(unsigned long pc, unsigned long addr, 327 327 long range) 328 328 { 329 329 long offset; ··· 354 354 * ARM64 virtual address arrangement guarantees all kernel and module 355 355 * texts are within +/-128M. 356 356 */ 357 - offset = branch_imm_common(pc, addr, SZ_128M); 357 + offset = label_imm_common(pc, addr, SZ_128M); 358 358 if (offset >= SZ_128M) 359 359 return AARCH64_BREAK_FAULT; 360 360 ··· 382 382 u32 insn; 383 383 long offset; 384 384 385 - offset = branch_imm_common(pc, addr, SZ_1M); 385 + offset = label_imm_common(pc, addr, SZ_1M); 386 386 if (offset >= SZ_1M) 387 387 return AARCH64_BREAK_FAULT; 388 388 ··· 421 421 u32 insn; 422 422 long offset; 423 423 424 - offset = branch_imm_common(pc, addr, SZ_1M); 424 + offset = label_imm_common(pc, addr, SZ_1M); 425 425 426 426 insn = aarch64_insn_get_bcond_value(); 427 427 ··· 541 541 base); 542 542 543 543 return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_12, insn, imm); 544 + } 545 + 546 + u32 aarch64_insn_gen_load_literal(unsigned long pc, unsigned long addr, 547 + enum aarch64_insn_register reg, 548 + bool is64bit) 549 + { 550 + u32 insn; 551 + long offset; 552 + 553 + offset = label_imm_common(pc, addr, SZ_1M); 554 + if (offset >= SZ_1M) 555 + return AARCH64_BREAK_FAULT; 556 + 557 + insn = aarch64_insn_get_ldr_lit_value(); 558 + 559 + if (is64bit) 560 + insn |= BIT(30); 561 + 562 + insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn, reg); 563 + 564 + return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_19, insn, 565 + offset >> 2); 544 566 } 545 567 546 568 u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
+7
arch/arm64/net/bpf_jit.h
··· 80 80 #define A64_STR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, STORE) 81 81 #define A64_LDR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, LOAD) 82 82 83 + /* LDR (literal) */ 84 + #define A64_LDR32LIT(Wt, offset) \ 85 + aarch64_insn_gen_load_literal(0, offset, Wt, false) 86 + #define A64_LDR64LIT(Xt, offset) \ 87 + aarch64_insn_gen_load_literal(0, offset, Xt, true) 88 + 83 89 /* Load/store register pair */ 84 90 #define A64_LS_PAIR(Rt, Rt2, Rn, offset, ls, type) \ 85 91 aarch64_insn_gen_load_store_pair(Rt, Rt2, Rn, offset, \ ··· 276 270 #define A64_BTI_C A64_HINT(AARCH64_INSN_HINT_BTIC) 277 271 #define A64_BTI_J A64_HINT(AARCH64_INSN_HINT_BTIJ) 278 272 #define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC) 273 + #define A64_NOP A64_HINT(AARCH64_INSN_HINT_NOP) 279 274 280 275 /* DMB */ 281 276 #define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH)
+698 -17
arch/arm64/net/bpf_jit_comp.c
··· 10 10 #include <linux/bitfield.h> 11 11 #include <linux/bpf.h> 12 12 #include <linux/filter.h> 13 + #include <linux/memory.h> 13 14 #include <linux/printk.h> 14 15 #include <linux/slab.h> 15 16 ··· 19 18 #include <asm/cacheflush.h> 20 19 #include <asm/debug-monitors.h> 21 20 #include <asm/insn.h> 21 + #include <asm/patching.h> 22 22 #include <asm/set_memory.h> 23 23 24 24 #include "bpf_jit.h" ··· 79 77 u32 stack_size; 80 78 int fpb_offset; 81 79 }; 80 + 81 + struct bpf_plt { 82 + u32 insn_ldr; /* load target */ 83 + u32 insn_br; /* branch to target */ 84 + u64 target; /* target value */ 85 + }; 86 + 87 + #define PLT_TARGET_SIZE sizeof_field(struct bpf_plt, target) 88 + #define PLT_TARGET_OFFSET offsetof(struct bpf_plt, target) 82 89 83 90 static inline void emit(const u32 insn, struct jit_ctx *ctx) 84 91 { ··· 151 140 } 152 141 } 153 142 143 + static inline void emit_bti(u32 insn, struct jit_ctx *ctx) 144 + { 145 + if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) 146 + emit(insn, ctx); 147 + } 148 + 154 149 /* 155 150 * Kernel addresses in the vmalloc space use at most 48 bits, and the 156 151 * remaining bits are guaranteed to be 0x1. So we can compose the address ··· 174 157 shift += 16; 175 158 emit(A64_MOVK(1, reg, tmp & 0xffff, shift), ctx); 176 159 } 160 + } 161 + 162 + static inline void emit_call(u64 target, struct jit_ctx *ctx) 163 + { 164 + u8 tmp = bpf2a64[TMP_REG_1]; 165 + 166 + emit_addr_mov_i64(tmp, target, ctx); 167 + emit(A64_BLR(tmp), ctx); 177 168 } 178 169 179 170 static inline int bpf2a64_offset(int bpf_insn, int off, ··· 260 235 return true; 261 236 } 262 237 238 + /* generated prologue: 239 + * bti c // if CONFIG_ARM64_BTI_KERNEL 240 + * mov x9, lr 241 + * nop // POKE_OFFSET 242 + * paciasp // if CONFIG_ARM64_PTR_AUTH_KERNEL 243 + * stp x29, lr, [sp, #-16]! 244 + * mov x29, sp 245 + * stp x19, x20, [sp, #-16]! 246 + * stp x21, x22, [sp, #-16]! 247 + * stp x25, x26, [sp, #-16]! 248 + * stp x27, x28, [sp, #-16]! 
249 + * mov x25, sp 250 + * mov tcc, #0 251 + * // PROLOGUE_OFFSET 252 + */ 253 + 254 + #define BTI_INSNS (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) ? 1 : 0) 255 + #define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0) 256 + 257 + /* Offset of nop instruction in bpf prog entry to be poked */ 258 + #define POKE_OFFSET (BTI_INSNS + 1) 259 + 263 260 /* Tail call offset to jump into */ 264 - #if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) || \ 265 - IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) 266 - #define PROLOGUE_OFFSET 9 267 - #else 268 - #define PROLOGUE_OFFSET 8 269 - #endif 261 + #define PROLOGUE_OFFSET (BTI_INSNS + 2 + PAC_INSNS + 8) 270 262 271 263 static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf) 272 264 { ··· 322 280 * 323 281 */ 324 282 283 + emit_bti(A64_BTI_C, ctx); 284 + 285 + emit(A64_MOV(1, A64_R(9), A64_LR), ctx); 286 + emit(A64_NOP, ctx); 287 + 325 288 /* Sign lr */ 326 289 if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)) 327 290 emit(A64_PACIASP, ctx); 328 - /* BTI landing pad */ 329 - else if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) 330 - emit(A64_BTI_C, ctx); 331 291 332 292 /* Save FP and LR registers to stay align with ARM64 AAPCS */ 333 293 emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx); ··· 356 312 } 357 313 358 314 /* BTI landing pad for the tail call, done with a BR */ 359 - if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) 360 - emit(A64_BTI_J, ctx); 315 + emit_bti(A64_BTI_J, ctx); 361 316 } 362 317 363 318 emit(A64_SUB_I(1, fpb, fp, ctx->fpb_offset), ctx); ··· 598 555 } 599 556 600 557 return 0; 558 + } 559 + 560 + void dummy_tramp(void); 561 + 562 + asm ( 563 + " .pushsection .text, \"ax\", @progbits\n" 564 + " .global dummy_tramp\n" 565 + " .type dummy_tramp, %function\n" 566 + "dummy_tramp:" 567 + #if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) 568 + " bti j\n" /* dummy_tramp is called via "br x10" */ 569 + #endif 570 + " mov x10, x30\n" 571 + " mov x30, x9\n" 572 + " ret x10\n" 573 + " .size dummy_tramp, .-dummy_tramp\n" 574 + " .popsection\n" 575 
+ ); 576 + 577 + /* build a plt initialized like this: 578 + * 579 + * plt: 580 + * ldr tmp, target 581 + * br tmp 582 + * target: 583 + * .quad dummy_tramp 584 + * 585 + * when a long jump trampoline is attached, target is filled with the 586 + * trampoline address, and when the trampoline is removed, target is 587 + * restored to dummy_tramp address. 588 + */ 589 + static void build_plt(struct jit_ctx *ctx) 590 + { 591 + const u8 tmp = bpf2a64[TMP_REG_1]; 592 + struct bpf_plt *plt = NULL; 593 + 594 + /* make sure target is 64-bit aligned */ 595 + if ((ctx->idx + PLT_TARGET_OFFSET / AARCH64_INSN_SIZE) % 2) 596 + emit(A64_NOP, ctx); 597 + 598 + plt = (struct bpf_plt *)(ctx->image + ctx->idx); 599 + /* plt is called via bl, no BTI needed here */ 600 + emit(A64_LDR64LIT(tmp, 2 * AARCH64_INSN_SIZE), ctx); 601 + emit(A64_BR(tmp), ctx); 602 + 603 + if (ctx->image) 604 + plt->target = (u64)&dummy_tramp; 601 605 } 602 606 603 607 static void build_epilogue(struct jit_ctx *ctx) ··· 1081 991 &func_addr, &func_addr_fixed); 1082 992 if (ret < 0) 1083 993 return ret; 1084 - emit_addr_mov_i64(tmp, func_addr, ctx); 1085 - emit(A64_BLR(tmp), ctx); 994 + emit_call(func_addr, ctx); 1086 995 emit(A64_MOV(1, r0, A64_R(0)), ctx); 1087 996 break; 1088 997 } ··· 1425 1336 if (a64_insn == AARCH64_BREAK_FAULT) 1426 1337 return -1; 1427 1338 } 1339 + return 0; 1340 + } 1341 + 1342 + static int validate_ctx(struct jit_ctx *ctx) 1343 + { 1344 + if (validate_code(ctx)) 1345 + return -1; 1428 1346 1429 1347 if (WARN_ON_ONCE(ctx->exentry_idx != ctx->prog->aux->num_exentries)) 1430 1348 return -1; ··· 1452 1356 1453 1357 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) 1454 1358 { 1455 - int image_size, prog_size, extable_size; 1359 + int image_size, prog_size, extable_size, extable_align, extable_offset; 1456 1360 struct bpf_prog *tmp, *orig_prog = prog; 1457 1361 struct bpf_binary_header *header; 1458 1362 struct arm64_jit_data *jit_data; ··· 1522 1426 1523 1427 ctx.epilogue_offset 
= ctx.idx; 1524 1428 build_epilogue(&ctx); 1429 + build_plt(&ctx); 1525 1430 1431 + extable_align = __alignof__(struct exception_table_entry); 1526 1432 extable_size = prog->aux->num_exentries * 1527 1433 sizeof(struct exception_table_entry); 1528 1434 1529 1435 /* Now we know the actual image size. */ 1530 1436 prog_size = sizeof(u32) * ctx.idx; 1531 - image_size = prog_size + extable_size; 1437 + /* also allocate space for plt target */ 1438 + extable_offset = round_up(prog_size + PLT_TARGET_SIZE, extable_align); 1439 + image_size = extable_offset + extable_size; 1532 1440 header = bpf_jit_binary_alloc(image_size, &image_ptr, 1533 1441 sizeof(u32), jit_fill_hole); 1534 1442 if (header == NULL) { ··· 1544 1444 1545 1445 ctx.image = (__le32 *)image_ptr; 1546 1446 if (extable_size) 1547 - prog->aux->extable = (void *)image_ptr + prog_size; 1447 + prog->aux->extable = (void *)image_ptr + extable_offset; 1548 1448 skip_init_ctx: 1549 1449 ctx.idx = 0; 1550 1450 ctx.exentry_idx = 0; ··· 1558 1458 } 1559 1459 1560 1460 build_epilogue(&ctx); 1461 + build_plt(&ctx); 1561 1462 1562 1463 /* 3. Extra pass to validate JITed code. 
*/ 1563 - if (validate_code(&ctx)) { 1464 + if (validate_ctx(&ctx)) { 1564 1465 bpf_jit_binary_free(header); 1565 1466 prog = orig_prog; 1566 1467 goto out_off; ··· 1637 1536 bool bpf_jit_supports_subprog_tailcalls(void) 1638 1537 { 1639 1538 return true; 1539 + } 1540 + 1541 + static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l, 1542 + int args_off, int retval_off, int run_ctx_off, 1543 + bool save_ret) 1544 + { 1545 + u32 *branch; 1546 + u64 enter_prog; 1547 + u64 exit_prog; 1548 + struct bpf_prog *p = l->link.prog; 1549 + int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie); 1550 + 1551 + if (p->aux->sleepable) { 1552 + enter_prog = (u64)__bpf_prog_enter_sleepable; 1553 + exit_prog = (u64)__bpf_prog_exit_sleepable; 1554 + } else { 1555 + enter_prog = (u64)__bpf_prog_enter; 1556 + exit_prog = (u64)__bpf_prog_exit; 1557 + } 1558 + 1559 + if (l->cookie == 0) { 1560 + /* if cookie is zero, one instruction is enough to store it */ 1561 + emit(A64_STR64I(A64_ZR, A64_SP, run_ctx_off + cookie_off), ctx); 1562 + } else { 1563 + emit_a64_mov_i64(A64_R(10), l->cookie, ctx); 1564 + emit(A64_STR64I(A64_R(10), A64_SP, run_ctx_off + cookie_off), 1565 + ctx); 1566 + } 1567 + 1568 + /* save p to callee saved register x19 to avoid loading p with mov_i64 1569 + * each time. 
+  */
+ 	emit_addr_mov_i64(A64_R(19), (const u64)p, ctx);
+
+ 	/* arg1: prog */
+ 	emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
+ 	/* arg2: &run_ctx */
+ 	emit(A64_ADD_I(1, A64_R(1), A64_SP, run_ctx_off), ctx);
+
+ 	emit_call(enter_prog, ctx);
+
+ 	/* if (__bpf_prog_enter(prog) == 0)
+ 	 *         goto skip_exec_of_prog;
+ 	 */
+ 	branch = ctx->image + ctx->idx;
+ 	emit(A64_NOP, ctx);
+
+ 	/* save return value to callee saved register x20 */
+ 	emit(A64_MOV(1, A64_R(20), A64_R(0)), ctx);
+
+ 	emit(A64_ADD_I(1, A64_R(0), A64_SP, args_off), ctx);
+ 	if (!p->jited)
+ 		emit_addr_mov_i64(A64_R(1), (const u64)p->insnsi, ctx);
+
+ 	emit_call((const u64)p->bpf_func, ctx);
+
+ 	if (save_ret)
+ 		emit(A64_STR64I(A64_R(0), A64_SP, retval_off), ctx);
+
+ 	if (ctx->image) {
+ 		int offset = &ctx->image[ctx->idx] - branch;
+ 		*branch = A64_CBZ(1, A64_R(0), offset);
+ 	}
+
+ 	/* arg1: prog */
+ 	emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
+ 	/* arg2: start time */
+ 	emit(A64_MOV(1, A64_R(1), A64_R(20)), ctx);
+ 	/* arg3: &run_ctx */
+ 	emit(A64_ADD_I(1, A64_R(2), A64_SP, run_ctx_off), ctx);
+
+ 	emit_call(exit_prog, ctx);
+ }
+
+ static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
+ 			       int args_off, int retval_off, int run_ctx_off,
+ 			       u32 **branches)
+ {
+ 	int i;
+
+ 	/* The first fmod_ret program will receive a garbage return value.
+ 	 * Set this to 0 to avoid confusing the program.
+ 	 */
+ 	emit(A64_STR64I(A64_ZR, A64_SP, retval_off), ctx);
+ 	for (i = 0; i < tl->nr_links; i++) {
+ 		invoke_bpf_prog(ctx, tl->links[i], args_off, retval_off,
+ 				run_ctx_off, true);
+ 		/* if (*(u64 *)(sp + retval_off) != 0)
+ 		 *         goto do_fexit;
+ 		 */
+ 		emit(A64_LDR64I(A64_R(10), A64_SP, retval_off), ctx);
+ 		/* Save the location of branch, and generate a nop.
+ 		 * This nop will be replaced with a cbnz later.
+ 		 */
+ 		branches[i] = ctx->image + ctx->idx;
+ 		emit(A64_NOP, ctx);
+ 	}
+ }
+
+ static void save_args(struct jit_ctx *ctx, int args_off, int nargs)
+ {
+ 	int i;
+
+ 	for (i = 0; i < nargs; i++) {
+ 		emit(A64_STR64I(i, A64_SP, args_off), ctx);
+ 		args_off += 8;
+ 	}
+ }
+
+ static void restore_args(struct jit_ctx *ctx, int args_off, int nargs)
+ {
+ 	int i;
+
+ 	for (i = 0; i < nargs; i++) {
+ 		emit(A64_LDR64I(i, A64_SP, args_off), ctx);
+ 		args_off += 8;
+ 	}
+ }
+
+ /* Based on the x86's implementation of arch_prepare_bpf_trampoline().
+  *
+  * bpf prog and function entry before bpf trampoline hooked:
+  *   mov x9, lr
+  *   nop
+  *
+  * bpf prog and function entry after bpf trampoline hooked:
+  *   mov x9, lr
+  *   bl  <bpf_trampoline or plt>
+  */
+ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
+ 			      struct bpf_tramp_links *tlinks, void *orig_call,
+ 			      int nargs, u32 flags)
+ {
+ 	int i;
+ 	int stack_size;
+ 	int retaddr_off;
+ 	int regs_off;
+ 	int retval_off;
+ 	int args_off;
+ 	int nargs_off;
+ 	int ip_off;
+ 	int run_ctx_off;
+ 	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
+ 	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
+ 	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+ 	bool save_ret;
+ 	u32 **branches = NULL;
+
+ 	/* trampoline stack layout:
+ 	 *                     [ parent ip ]
+ 	 *                     [ FP ]
+ 	 * SP + retaddr_off    [ self ip ]
+ 	 *                     [ FP ]
+ 	 *
+ 	 *                     [ padding ] align SP to multiples of 16
+ 	 *
+ 	 *                     [ x20 ] callee saved reg x20
+ 	 * SP + regs_off       [ x19 ] callee saved reg x19
+ 	 *
+ 	 * SP + retval_off     [ return value ] BPF_TRAMP_F_CALL_ORIG or
+ 	 *                                      BPF_TRAMP_F_RET_FENTRY_RET
+ 	 *
+ 	 *                     [ argN ]
+ 	 *                     [ ... ]
+ 	 * SP + args_off       [ arg1 ]
+ 	 *
+ 	 * SP + nargs_off      [ args count ]
+ 	 *
+ 	 * SP + ip_off         [ traced function ] BPF_TRAMP_F_IP_ARG flag
+ 	 *
+ 	 * SP + run_ctx_off    [ bpf_tramp_run_ctx ]
+ 	 */
+
+ 	stack_size = 0;
+ 	run_ctx_off = stack_size;
+ 	/* room for bpf_tramp_run_ctx */
+ 	stack_size += round_up(sizeof(struct bpf_tramp_run_ctx), 8);
+
+ 	ip_off = stack_size;
+ 	/* room for IP address argument */
+ 	if (flags & BPF_TRAMP_F_IP_ARG)
+ 		stack_size += 8;
+
+ 	nargs_off = stack_size;
+ 	/* room for args count */
+ 	stack_size += 8;
+
+ 	args_off = stack_size;
+ 	/* room for args */
+ 	stack_size += nargs * 8;
+
+ 	/* room for return value */
+ 	retval_off = stack_size;
+ 	save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
+ 	if (save_ret)
+ 		stack_size += 8;
+
+ 	/* room for callee saved registers, currently x19 and x20 are used */
+ 	regs_off = stack_size;
+ 	stack_size += 16;
+
+ 	/* round up to multiples of 16 to avoid SPAlignmentFault */
+ 	stack_size = round_up(stack_size, 16);
+
+ 	/* return address is located above FP */
+ 	retaddr_off = stack_size + 8;
+
+ 	/* bpf trampoline may be invoked by 3 instruction types:
+ 	 * 1. bl, attached to bpf prog or kernel function via short jump
+ 	 * 2. br, attached to bpf prog or kernel function via long jump
+ 	 * 3. blr, working as a function pointer, used by struct_ops.
+ 	 * So BTI_JC should be used here to support both br and blr.
+ 	 */
+ 	emit_bti(A64_BTI_JC, ctx);
+
+ 	/* frame for parent function */
+ 	emit(A64_PUSH(A64_FP, A64_R(9), A64_SP), ctx);
+ 	emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
+ 	/* frame for patched function */
+ 	emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
+ 	emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
+ 	/* allocate stack space */
+ 	emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
+
+ 	if (flags & BPF_TRAMP_F_IP_ARG) {
+ 		/* save ip address of the traced function */
+ 		emit_addr_mov_i64(A64_R(10), (const u64)orig_call, ctx);
+ 		emit(A64_STR64I(A64_R(10), A64_SP, ip_off), ctx);
+ 	}
+
+ 	/* save args count */
+ 	emit(A64_MOVZ(1, A64_R(10), nargs, 0), ctx);
+ 	emit(A64_STR64I(A64_R(10), A64_SP, nargs_off), ctx);
+
+ 	/* save args */
+ 	save_args(ctx, args_off, nargs);
+
+ 	/* save callee saved registers */
+ 	emit(A64_STR64I(A64_R(19), A64_SP, regs_off), ctx);
+ 	emit(A64_STR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
+
+ 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+ 		emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+ 		emit_call((const u64)__bpf_tramp_enter, ctx);
+ 	}
+
+ 	for (i = 0; i < fentry->nr_links; i++)
+ 		invoke_bpf_prog(ctx, fentry->links[i], args_off,
+ 				retval_off, run_ctx_off,
+ 				flags & BPF_TRAMP_F_RET_FENTRY_RET);
+
+ 	if (fmod_ret->nr_links) {
+ 		branches = kcalloc(fmod_ret->nr_links, sizeof(u32 *),
+ 				   GFP_KERNEL);
+ 		if (!branches)
+ 			return -ENOMEM;
+
+ 		invoke_bpf_mod_ret(ctx, fmod_ret, args_off, retval_off,
+ 				   run_ctx_off, branches);
+ 	}
+
+ 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+ 		restore_args(ctx, args_off, nargs);
+ 		/* call original func */
+ 		emit(A64_LDR64I(A64_R(10), A64_SP, retaddr_off), ctx);
+ 		emit(A64_BLR(A64_R(10)), ctx);
+ 		/* store return value */
+ 		emit(A64_STR64I(A64_R(0), A64_SP, retval_off), ctx);
+ 		/* reserve a nop for bpf_tramp_image_put */
+ 		im->ip_after_call = ctx->image + ctx->idx;
+ 		emit(A64_NOP, ctx);
+ 	}
+
+ 	/* update the branches saved in invoke_bpf_mod_ret with cbnz */
+ 	for (i = 0; i < fmod_ret->nr_links && ctx->image != NULL; i++) {
+ 		int offset = &ctx->image[ctx->idx] - branches[i];
+ 		*branches[i] = A64_CBNZ(1, A64_R(10), offset);
+ 	}
+
+ 	for (i = 0; i < fexit->nr_links; i++)
+ 		invoke_bpf_prog(ctx, fexit->links[i], args_off, retval_off,
+ 				run_ctx_off, false);
+
+ 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+ 		im->ip_epilogue = ctx->image + ctx->idx;
+ 		emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+ 		emit_call((const u64)__bpf_tramp_exit, ctx);
+ 	}
+
+ 	if (flags & BPF_TRAMP_F_RESTORE_REGS)
+ 		restore_args(ctx, args_off, nargs);
+
+ 	/* restore callee saved registers x19 and x20 */
+ 	emit(A64_LDR64I(A64_R(19), A64_SP, regs_off), ctx);
+ 	emit(A64_LDR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
+
+ 	if (save_ret)
+ 		emit(A64_LDR64I(A64_R(0), A64_SP, retval_off), ctx);
+
+ 	/* reset SP */
+ 	emit(A64_MOV(1, A64_SP, A64_FP), ctx);
+
+ 	/* pop frames */
+ 	emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
+ 	emit(A64_POP(A64_FP, A64_R(9), A64_SP), ctx);
+
+ 	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
+ 		/* skip patched function, return to parent */
+ 		emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
+ 		emit(A64_RET(A64_R(9)), ctx);
+ 	} else {
+ 		/* return to patched function */
+ 		emit(A64_MOV(1, A64_R(10), A64_LR), ctx);
+ 		emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
+ 		emit(A64_RET(A64_R(10)), ctx);
+ 	}
+
+ 	if (ctx->image)
+ 		bpf_flush_icache(ctx->image, ctx->image + ctx->idx);
+
+ 	kfree(branches);
+
+ 	return ctx->idx;
+ }
+
+ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
+ 				void *image_end, const struct btf_func_model *m,
+ 				u32 flags, struct bpf_tramp_links *tlinks,
+ 				void *orig_call)
+ {
+ 	int ret;
+ 	int nargs = m->nr_args;
+ 	int max_insns = ((long)image_end - (long)image) / AARCH64_INSN_SIZE;
+ 	struct jit_ctx ctx = {
+ 		.image = NULL,
+ 		.idx = 0,
+ 	};
+
+ 	/* the first 8 arguments are passed by registers */
+ 	if (nargs > 8)
+ 		return -ENOTSUPP;
+
+ 	ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
+ 	if (ret < 0)
+ 		return ret;
+
+ 	if (ret > max_insns)
+ 		return -EFBIG;
+
+ 	ctx.image = image;
+ 	ctx.idx = 0;
+
+ 	jit_fill_hole(image, (unsigned int)(image_end - image));
+ 	ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
+
+ 	if (ret > 0 && validate_code(&ctx) < 0)
+ 		ret = -EINVAL;
+
+ 	if (ret > 0)
+ 		ret *= AARCH64_INSN_SIZE;
+
+ 	return ret;
+ }
+
+ static bool is_long_jump(void *ip, void *target)
+ {
+ 	long offset;
+
+ 	/* NULL target means this is a NOP */
+ 	if (!target)
+ 		return false;
+
+ 	offset = (long)target - (long)ip;
+ 	return offset < -SZ_128M || offset >= SZ_128M;
+ }
+
+ static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
+ 			     void *addr, void *plt, u32 *insn)
+ {
+ 	void *target;
+
+ 	if (!addr) {
+ 		*insn = aarch64_insn_gen_nop();
+ 		return 0;
+ 	}
+
+ 	if (is_long_jump(ip, addr))
+ 		target = plt;
+ 	else
+ 		target = addr;
+
+ 	*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
+ 					    (unsigned long)target,
+ 					    type);
+
+ 	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
+ }
+
+ /* Replace the branch instruction from @ip to @old_addr in a bpf prog or a bpf
+  * trampoline with the branch instruction from @ip to @new_addr. If @old_addr
+  * or @new_addr is NULL, the old or new instruction is NOP.
+  *
+  * When @ip is the bpf prog entry, a bpf trampoline is being attached or
+  * detached. Since bpf trampoline and bpf prog are allocated separately with
+  * vmalloc, the address distance may exceed 128MB, the maximum branch range.
+  * So long jumps need to be handled.
+  *
+  * When a bpf prog is constructed, a plt pointing to empty trampoline
+  * dummy_tramp is placed at the end:
+  *
+  * bpf_prog:
+  *         mov x9, lr
+  *         nop // patchsite
+  *         ...
+  *         ret
+  *
+  * plt:
+  *         ldr x10, target
+  *         br  x10
+  * target:
+  *         .quad dummy_tramp // plt target
+  *
+  * This is also the state when no trampoline is attached.
+  *
+  * When a short-jump bpf trampoline is attached, the patchsite is patched
+  * to a bl instruction to the trampoline directly:
+  *
+  * bpf_prog:
+  *         mov x9, lr
+  *         bl  <short-jump bpf trampoline address> // patchsite
+  *         ...
+  *         ret
+  *
+  * plt:
+  *         ldr x10, target
+  *         br  x10
+  * target:
+  *         .quad dummy_tramp // plt target
+  *
+  * When a long-jump bpf trampoline is attached, the plt target is filled with
+  * the trampoline address and the patchsite is patched to a bl instruction to
+  * the plt:
+  *
+  * bpf_prog:
+  *         mov x9, lr
+  *         bl  plt // patchsite
+  *         ...
+  *         ret
+  *
+  * plt:
+  *         ldr x10, target
+  *         br  x10
+  * target:
+  *         .quad <long-jump bpf trampoline address> // plt target
+  *
+  * The dummy_tramp is used to prevent another CPU from jumping to unknown
+  * locations during the patching process, making the patching process easier.
+  */
+ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
+ 		       void *old_addr, void *new_addr)
+ {
+ 	int ret;
+ 	u32 old_insn;
+ 	u32 new_insn;
+ 	u32 replaced;
+ 	struct bpf_plt *plt = NULL;
+ 	unsigned long size = 0UL;
+ 	unsigned long offset = ~0UL;
+ 	enum aarch64_insn_branch_type branch_type;
+ 	char namebuf[KSYM_NAME_LEN];
+ 	void *image = NULL;
+ 	u64 plt_target = 0ULL;
+ 	bool poking_bpf_entry;
+
+ 	if (!__bpf_address_lookup((unsigned long)ip, &size, &offset, namebuf))
+ 		/* Only poking bpf text is supported. Since kernel function
+ 		 * entry is set up by ftrace, we rely on ftrace to poke kernel
+ 		 * functions.
+ 		 */
+ 		return -ENOTSUPP;
+
+ 	image = ip - offset;
+ 	/* zero offset means we're poking bpf prog entry */
+ 	poking_bpf_entry = (offset == 0UL);
+
+ 	/* bpf prog entry, find plt and the real patchsite */
+ 	if (poking_bpf_entry) {
+ 		/* plt locates at the end of bpf prog */
+ 		plt = image + size - PLT_TARGET_OFFSET;
+
+ 		/* skip to the nop instruction in bpf prog entry:
+ 		 * bti c // if BTI enabled
+ 		 * mov x9, x30
+ 		 * nop
+ 		 */
+ 		ip = image + POKE_OFFSET * AARCH64_INSN_SIZE;
+ 	}
+
+ 	/* long jump is only possible at bpf prog entry */
+ 	if (WARN_ON((is_long_jump(ip, new_addr) || is_long_jump(ip, old_addr)) &&
+ 		    !poking_bpf_entry))
+ 		return -EINVAL;
+
+ 	if (poke_type == BPF_MOD_CALL)
+ 		branch_type = AARCH64_INSN_BRANCH_LINK;
+ 	else
+ 		branch_type = AARCH64_INSN_BRANCH_NOLINK;
+
+ 	if (gen_branch_or_nop(branch_type, ip, old_addr, plt, &old_insn) < 0)
+ 		return -EFAULT;
+
+ 	if (gen_branch_or_nop(branch_type, ip, new_addr, plt, &new_insn) < 0)
+ 		return -EFAULT;
+
+ 	if (is_long_jump(ip, new_addr))
+ 		plt_target = (u64)new_addr;
+ 	else if (is_long_jump(ip, old_addr))
+ 		/* if the old target is a long jump and the new target is not,
+ 		 * restore the plt target to dummy_tramp, so there is always a
+ 		 * legal and harmless address stored in plt target, and we'll
+ 		 * never jump from plt to an unknown place.
+ 		 */
+ 		plt_target = (u64)&dummy_tramp;
+
+ 	if (plt_target) {
+ 		/* non-zero plt_target indicates we're patching a bpf prog,
+ 		 * which is read only.
+ 		 */
+ 		if (set_memory_rw(PAGE_MASK & ((uintptr_t)&plt->target), 1))
+ 			return -EFAULT;
+ 		WRITE_ONCE(plt->target, plt_target);
+ 		set_memory_ro(PAGE_MASK & ((uintptr_t)&plt->target), 1);
+ 		/* since plt target points to either the new trampoline
+ 		 * or dummy_tramp, even if another CPU reads the old plt
+ 		 * target value before fetching the bl instruction to plt,
+ 		 * it will be brought back by dummy_tramp, so no barrier is
+ 		 * required here.
+ 		 */
+ 	}
+
+ 	/* if the old target and the new target are both long jumps, no
+ 	 * patching is required
+ 	 */
+ 	if (old_insn == new_insn)
+ 		return 0;
+
+ 	mutex_lock(&text_mutex);
+ 	if (aarch64_insn_read(ip, &replaced)) {
+ 		ret = -EFAULT;
+ 		goto out;
+ 	}
+
+ 	if (replaced != old_insn) {
+ 		ret = -EFAULT;
+ 		goto out;
+ 	}
+
+ 	/* We call aarch64_insn_patch_text_nosync() to replace the instruction
+ 	 * atomically, so no other CPUs will fetch a half-new and half-old
+ 	 * instruction. But there is a chance that another CPU executes the
+ 	 * old instruction after the patching operation finishes (e.g.,
+ 	 * pipeline not flushed, or icache not synchronized yet).
+ 	 *
+ 	 * 1. when a new trampoline is attached, it is not a problem for
+ 	 *    different CPUs to jump to different trampolines temporarily.
+ 	 *
+ 	 * 2. when an old trampoline is freed, we should wait for all other
+ 	 *    CPUs to exit the trampoline and make sure the trampoline is no
+ 	 *    longer reachable. Since bpf_tramp_image_put() already uses
+ 	 *    percpu_ref and task-based rcu to do the sync, there is no need
+ 	 *    to call the sync version here; see bpf_tramp_image_put() for
+ 	 *    details.
+ 	 */
+ 	ret = aarch64_insn_patch_text_nosync(ip, new_insn);
+ out:
+ 	mutex_unlock(&text_mutex);
+
+ 	return ret;
  }
+34 -24
arch/x86/net/bpf_jit_comp.c
···
  	return 0;
  }

- static bool is_valid_bpf_tramp_flags(unsigned int flags)
- {
- 	if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
- 	    (flags & BPF_TRAMP_F_SKIP_FRAME))
- 		return false;
-
- 	/*
- 	 * BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
- 	 * and it must be used alone.
- 	 */
- 	if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
- 	    (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
- 		return false;
-
- 	return true;
- }
-
  /* Example:
   * __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
   * its 'struct btf_func_model' will be nr_args=2
···
  	/* x86-64 supports up to 6 arguments. 7+ can be added in the future */
  	if (nr_args > 6)
  		return -ENOTSUPP;
-
- 	if (!is_valid_bpf_tramp_flags(flags))
- 		return -EINVAL;

  	/* Generated trampoline stack layout:
  	 *
···
  	if (flags & BPF_TRAMP_F_CALL_ORIG) {
  		restore_regs(m, &prog, nr_args, regs_off);

- 		/* call original function */
- 		if (emit_call(&prog, orig_call, prog)) {
- 			ret = -EINVAL;
- 			goto cleanup;
+ 		if (flags & BPF_TRAMP_F_ORIG_STACK) {
+ 			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
+ 			EMIT2(0xff, 0xd0); /* call *rax */
+ 		} else {
+ 			/* call original function */
+ 			if (emit_call(&prog, orig_call, prog)) {
+ 				ret = -EINVAL;
+ 				goto cleanup;
+ 			}
  		}
  		/* remember return value in a stack for bpf prog to access */
  		emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
···
  bool bpf_jit_supports_subprog_tailcalls(void)
  {
  	return true;
+ }
+
+ void bpf_jit_free(struct bpf_prog *prog)
+ {
+ 	if (prog->jited) {
+ 		struct x64_jit_data *jit_data = prog->aux->jit_data;
+ 		struct bpf_binary_header *hdr;
+
+ 		/*
+ 		 * If we fail the final pass of JIT (from jit_subprogs),
+ 		 * the program may not be finalized yet. Call finalize here
+ 		 * before freeing it.
+ 		 */
+ 		if (jit_data) {
+ 			bpf_jit_binary_pack_finalize(prog, jit_data->header,
+ 						     jit_data->rw_header);
+ 			kvfree(jit_data->addrs);
+ 			kfree(jit_data);
+ 		}
+ 		hdr = bpf_jit_binary_pack_hdr(prog);
+ 		bpf_jit_binary_pack_free(hdr, NULL);
+ 		WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
+ 	}
+
+ 	bpf_prog_unlock_free(prog);
  }
+23 -6
include/linux/bpf.h
···
  struct mem_cgroup;
  struct module;
  struct bpf_func_state;
+ struct ftrace_ops;

  extern struct idr btf_idr;
  extern spinlock_t btf_idr_lock;
···
  	u32 btf_vmlinux_value_type_id;
  	struct btf *btf;
  #ifdef CONFIG_MEMCG_KMEM
- 	struct mem_cgroup *memcg;
+ 	struct obj_cgroup *objcg;
  #endif
  	char name[BPF_OBJ_NAME_LEN];
  	struct bpf_map_off_arr *off_arr;
···
  /* Return the return value of fentry prog. Only used by bpf_struct_ops. */
  #define BPF_TRAMP_F_RET_FENTRY_RET	BIT(4)

+ /* Get original function from stack instead of from provided direct address.
+  * Makes sense for trampolines with fexit or fmod_ret programs.
+  */
+ #define BPF_TRAMP_F_ORIG_STACK		BIT(5)
+
+ /* This trampoline is on a function with another ftrace_ops with IPMODIFY,
+  * e.g., a live patch. This flag is set and cleared by ftrace callbacks.
+  */
+ #define BPF_TRAMP_F_SHARE_IPMODIFY	BIT(6)
+
  /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
   * bytes on x86.
   */
···
  struct bpf_trampoline {
  	/* hlist for trampoline_table */
  	struct hlist_node hlist;
+ 	struct ftrace_ops *fops;
  	/* serializes access to fields of this trampoline */
  	struct mutex mutex;
  	refcount_t refcnt;
+ 	u32 flags;
  	u64 key;
  	struct {
  		struct btf_func_model model;
···
  	bool sleepable;
  	bool tail_call_reachable;
  	bool xdp_has_frags;
- 	bool use_bpf_prog_pack;
  	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
  	const struct btf_type *attach_func_proto;
  	/* function name for valid attach_btf_id */
···
  int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
  			    union bpf_attr __user *uattr);
  #endif
- int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
- 				    int cgroup_atype);
- void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog);
  #else
  static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id)
  {
···
  {
  	return -EINVAL;
  }
+ #endif
+
+ #if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM)
+ int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
+ 				    int cgroup_atype);
+ void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog);
+ #else
  static inline int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog,
  						  int cgroup_atype)
  {
···
  			 struct bpf_reg_state *regs);
  int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
  			      const struct btf *btf, u32 func_id,
- 			      struct bpf_reg_state *regs);
+ 			      struct bpf_reg_state *regs,
+ 			      u32 kfunc_flags);
  int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
  			  struct bpf_reg_state *reg);
  int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
+4 -4
include/linux/bpf_verifier.h
···
  };

  struct bpf_loop_inline_state {
- 	int initialized:1; /* set to true upon first entry */
- 	int fit_for_inline:1; /* true if callback function is the same
- 			       * at each call and flags are always zero
- 			       */
+ 	unsigned int initialized:1; /* set to true upon first entry */
+ 	unsigned int fit_for_inline:1; /* true if callback function is the same
+ 					* at each call and flags are always zero
+ 					*/
  	u32 callback_subprogno; /* valid when fit_for_inline is true */
  };
+42 -23
include/linux/btf.h
···
  #define BTF_TYPE_EMIT(type) ((void)(type *)0)
  #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val)

- enum btf_kfunc_type {
- 	BTF_KFUNC_TYPE_CHECK,
- 	BTF_KFUNC_TYPE_ACQUIRE,
- 	BTF_KFUNC_TYPE_RELEASE,
- 	BTF_KFUNC_TYPE_RET_NULL,
- 	BTF_KFUNC_TYPE_KPTR_ACQUIRE,
- 	BTF_KFUNC_TYPE_MAX,
- };
+ /* These need to be macros, as the expressions are used in assembler input */
+ #define KF_ACQUIRE	(1 << 0) /* kfunc is an acquire function */
+ #define KF_RELEASE	(1 << 1) /* kfunc is a release function */
+ #define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
+ #define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
+ /* Trusted arguments are those which are meant to be referenced arguments with
+  * unchanged offset. It is used to enforce that pointers obtained from acquire
+  * kfuncs remain unmodified when being passed to helpers taking trusted args.
+  *
+  * Consider
+  *	struct foo {
+  *		int data;
+  *		struct foo *next;
+  *	};
+  *
+  *	struct bar {
+  *		int data;
+  *		struct foo f;
+  *	};
+  *
+  *	struct foo *f = alloc_foo(); // Acquire kfunc
+  *	struct bar *b = alloc_bar(); // Acquire kfunc
+  *
+  * If a kfunc set_foo_data() wants to operate only on the allocated object, it
+  * will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like:
+  *
+  *	set_foo_data(f, 42);	   // Allowed
+  *	set_foo_data(f->next, 42); // Rejected, non-referenced pointer
+  *	set_foo_data(&f->next, 42);// Rejected, referenced, but wrong type
+  *	set_foo_data(&b->f, 42);   // Rejected, referenced, but bad offset
+  *
+  * In the final case, the type would normally be deduced by looking at the
+  * type of the member at the offset, but because a trusted argument is
+  * required, that deduction is strict and is not done for this case.
+  */
+ #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */

  struct btf;
  struct btf_member;
···
  struct btf_kfunc_id_set {
  	struct module *owner;
- 	union {
- 		struct {
- 			struct btf_id_set *check_set;
- 			struct btf_id_set *acquire_set;
- 			struct btf_id_set *release_set;
- 			struct btf_id_set *ret_null_set;
- 			struct btf_id_set *kptr_acquire_set;
- 		};
- 		struct btf_id_set *sets[BTF_KFUNC_TYPE_MAX];
- 	};
+ 	struct btf_id_set8 *set;
  };

  struct btf_id_dtor_kfunc {
···
  const char *btf_name_by_offset(const struct btf *btf, u32 offset);
  struct btf *btf_parse_vmlinux(void);
  struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
- bool btf_kfunc_id_set_contains(const struct btf *btf,
- 			       enum bpf_prog_type prog_type,
- 			       enum btf_kfunc_type type, u32 kfunc_btf_id);
+ u32 *btf_kfunc_id_set_contains(const struct btf *btf,
+ 			       enum bpf_prog_type prog_type,
+ 			       u32 kfunc_btf_id);
  int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
  			      const struct btf_kfunc_id_set *s);
  s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
···
  {
  	return NULL;
  }
- static inline bool btf_kfunc_id_set_contains(const struct btf *btf,
- 					     enum bpf_prog_type prog_type,
- 					     enum btf_kfunc_type type,
- 					     u32 kfunc_btf_id)
- {
- 	return false;
- }
+ static inline u32 *btf_kfunc_id_set_contains(const struct btf *btf,
+ 					     enum bpf_prog_type prog_type,
+ 					     u32 kfunc_btf_id)
+ {
+ 	return NULL;
+ }
  static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
  					    const struct btf_kfunc_id_set *s)
+64 -4
include/linux/btf_ids.h
···
  	u32 ids[];
  };

+ struct btf_id_set8 {
+ 	u32 cnt;
+ 	u32 flags;
+ 	struct {
+ 		u32 id;
+ 		u32 flags;
+ 	} pairs[];
+ };
+
  #ifdef CONFIG_DEBUG_INFO_BTF

  #include <linux/compiler.h> /* for __PASTE */
···
  #define BTF_IDS_SECTION ".BTF_ids"

- #define ____BTF_ID(symbol)				\
+ #define ____BTF_ID(symbol, word)			\
  asm(							\
  ".pushsection " BTF_IDS_SECTION ",\"a\";	\n"	\
  ".local " #symbol " ;				\n"	\
···
  ".size " #symbol ", 4;				\n"	\
  #symbol ":					\n"	\
  ".zero 4					\n"	\
+ word						\
  ".popsection;					\n");

- #define __BTF_ID(symbol)				\
- 	____BTF_ID(symbol)
+ #define __BTF_ID(symbol, word)			\
+ 	____BTF_ID(symbol, word)

  #define __ID(prefix)					\
  	__PASTE(prefix, __COUNTER__)
···
   * to 4 zero bytes.
   */
  #define BTF_ID(prefix, name)				\
- 	__BTF_ID(__ID(__BTF_ID__##prefix##__##name##__))
+ 	__BTF_ID(__ID(__BTF_ID__##prefix##__##name##__), "")
+
+ #define ____BTF_ID_FLAGS(prefix, name, flags)		\
+ 	__BTF_ID(__ID(__BTF_ID__##prefix##__##name##__), ".long " #flags "\n")
+ #define __BTF_ID_FLAGS(prefix, name, flags, ...)	\
+ 	____BTF_ID_FLAGS(prefix, name, flags)
+ #define BTF_ID_FLAGS(prefix, name, ...)			\
+ 	__BTF_ID_FLAGS(prefix, name, ##__VA_ARGS__, 0)

  /*
   * The BTF_ID_LIST macro defines pure (unsorted) list
···
  ".popsection;					\n");	\
  extern struct btf_id_set name;

+ /*
+  * The BTF_SET8_START/END macro pair defines a sorted list of
+  * BTF IDs and their flags, plus its member count, with the
+  * following layout:
+  *
+  * BTF_SET8_START(list)
+  * BTF_ID_FLAGS(type1, name1, flags)
+  * BTF_ID_FLAGS(type2, name2, flags)
+  * BTF_SET8_END(list)
+  *
+  * __BTF_ID__set8__list:
+  * .zero 8
+  * list:
+  * __BTF_ID__type1__name1__3:
+  * .zero 4
+  * .word (1 << 0) | (1 << 2)
+  * __BTF_ID__type2__name2__5:
+  * .zero 4
+  * .word (1 << 3) | (1 << 1) | (1 << 2)
+  */
+ #define __BTF_SET8_START(name, scope)			\
+ asm(							\
+ ".pushsection " BTF_IDS_SECTION ",\"a\";	\n"	\
+ "." #scope " __BTF_ID__set8__" #name ";	\n"	\
+ "__BTF_ID__set8__" #name ":;			\n"	\
+ ".zero 8					\n"	\
+ ".popsection;					\n");
+
+ #define BTF_SET8_START(name)				\
+ __BTF_ID_LIST(name, local)				\
+ __BTF_SET8_START(name, local)
+
+ #define BTF_SET8_END(name)				\
+ asm(							\
+ ".pushsection " BTF_IDS_SECTION ",\"a\";	\n"	\
+ ".size __BTF_ID__set8__" #name ", .-" #name "	\n"	\
+ ".popsection;					\n");	\
+ extern struct btf_id_set8 name;
+
  #else

  #define BTF_ID_LIST(name) static u32 __maybe_unused name[5];
  #define BTF_ID(prefix, name)
+ #define BTF_ID_FLAGS(prefix, name, ...)
  #define BTF_ID_UNUSED
  #define BTF_ID_LIST_GLOBAL(name, n) u32 __maybe_unused name[n];
  #define BTF_ID_LIST_SINGLE(name, prefix, typename) static u32 __maybe_unused name[1];
···
  #define BTF_SET_START(name) static struct btf_id_set __maybe_unused name = { 0 };
  #define BTF_SET_START_GLOBAL(name) static struct btf_id_set __maybe_unused name = { 0 };
  #define BTF_SET_END(name)
+ #define BTF_SET8_START(name) static struct btf_id_set8 __maybe_unused name = { 0 };
+ #define BTF_SET8_END(name)

  #endif /* CONFIG_DEBUG_INFO_BTF */
+8
include/linux/filter.h
···
  void *bpf_jit_alloc_exec(unsigned long size);
  void bpf_jit_free_exec(void *addr);
  void bpf_jit_free(struct bpf_prog *fp);
+ struct bpf_binary_header *
+ bpf_jit_binary_pack_hdr(const struct bpf_prog *fp);
+
+ static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
+ {
+ 	return list_empty(&fp->aux->ksym.lnode) ||
+ 	       fp->aux->ksym.lnode.prev == LIST_POISON2;
+ }

  struct bpf_binary_header *
  bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **ro_image,
+43
include/linux/ftrace.h
···
  	FTRACE_OPS_FL_DIRECT		= BIT(17),
  };

+ /*
+  * FTRACE_OPS_CMD_* commands allow the ftrace core logic to request changes
+  * to a ftrace_ops. Note, the requests may fail.
+  *
+  * ENABLE_SHARE_IPMODIFY_SELF - enable a DIRECT ops to work on the same
+  *                              function as an ops with IPMODIFY. Called
+  *                              when the DIRECT ops is being registered.
+  *                              This is called with both direct_mutex and
+  *                              ftrace_lock locked.
+  *
+  * ENABLE_SHARE_IPMODIFY_PEER - enable a DIRECT ops to work on the same
+  *                              function as an ops with IPMODIFY. Called
+  *                              when the other ops (the one with IPMODIFY)
+  *                              is being registered.
+  *                              This is called with direct_mutex locked.
+  *
+  * DISABLE_SHARE_IPMODIFY_PEER - disable a DIRECT ops from working on the
+  *                               same function as an ops with IPMODIFY.
+  *                               Called when the other ops (the one with
+  *                               IPMODIFY) is being unregistered.
+  *                               This is called with direct_mutex locked.
+  */
+ enum ftrace_ops_cmd {
+ 	FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF,
+ 	FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER,
+ 	FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER,
+ };
+
+ /*
+  * For most ftrace_ops_cmd commands:
+  * Returns:
+  *        0 - Success.
+  *        Negative on failure; the exact value depends on the callback.
+  */
+ typedef int (*ftrace_ops_func_t)(struct ftrace_ops *op, enum ftrace_ops_cmd cmd);
+
  #ifdef CONFIG_DYNAMIC_FTRACE
  /* The hash used to know what functions callbacks trace */
  struct ftrace_ops_hash {
···
  	unsigned long trampoline;
  	unsigned long trampoline_size;
  	struct list_head list;
+ 	ftrace_ops_func_t ops_func;
  #endif
  };
···
  int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
  int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
  int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
+ int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr);

  #else
  struct ftrace_ops;
···
  	return -ENODEV;
  }
  static inline int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
+ {
+ 	return -ENODEV;
+ }
+ static inline int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr)
  {
  	return -ENODEV;
  }
+8
include/linux/skbuff.h
···

  #endif /* NET_SKBUFF_DATA_USES_OFFSET */

+ static inline void skb_assert_len(struct sk_buff *skb)
+ {
+ #ifdef CONFIG_DEBUG_NET
+ 	if (WARN_ONCE(!skb->len, "%s\n", __func__))
+ 		DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
+ #endif /* CONFIG_DEBUG_NET */
+ }
+
  /*
   *	Add data to an sk_buff
   */
+19
include/net/netfilter/nf_conntrack_core.h
··· 84 84 85 85 extern spinlock_t nf_conntrack_expect_lock; 86 86 87 + /* ctnetlink code shared by both ctnetlink and nf_conntrack_bpf */ 88 + 89 + #if (IS_BUILTIN(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \ 90 + (IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES) || \ 91 + IS_ENABLED(CONFIG_NF_CT_NETLINK)) 92 + 93 + static inline void __nf_ct_set_timeout(struct nf_conn *ct, u64 timeout) 94 + { 95 + if (timeout > INT_MAX) 96 + timeout = INT_MAX; 97 + WRITE_ONCE(ct->timeout, nfct_time_stamp + (u32)timeout); 98 + } 99 + 100 + int __nf_ct_change_timeout(struct nf_conn *ct, u64 cta_timeout); 101 + void __nf_ct_change_status(struct nf_conn *ct, unsigned long on, unsigned long off); 102 + int nf_ct_change_status_common(struct nf_conn *ct, unsigned int status); 103 + 104 + #endif 105 + 87 106 #endif /* _NF_CONNTRACK_CORE_H */
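`__nf_ct_set_timeout()` above clamps a caller-supplied 64-bit timeout to INT_MAX before folding it into the 32-bit expiry. The clamp is easy to check in isolation; in this userspace sketch `nfct_time_stamp` is replaced by a plain `now` argument:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Mirror of the clamp in __nf_ct_set_timeout(): a u64 timeout from
 * BPF/ctnetlink is capped at INT_MAX so the u32 addition cannot push
 * the expiry absurdly far relative to the current time stamp. */
uint32_t ct_timeout_model(uint32_t now, uint64_t timeout)
{
	if (timeout > INT_MAX)
		timeout = INT_MAX;
	return now + (uint32_t)timeout;
}
```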
+14
include/net/xdp_sock_drv.h
··· 44 44 xp_set_rxq_info(pool, rxq); 45 45 } 46 46 47 + static inline unsigned int xsk_pool_get_napi_id(struct xsk_buff_pool *pool) 48 + { 49 + #ifdef CONFIG_NET_RX_BUSY_POLL 50 + return pool->heads[0].xdp.rxq->napi_id; 51 + #else 52 + return 0; 53 + #endif 54 + } 55 + 47 56 static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool, 48 57 unsigned long attrs) 49 58 { ··· 205 196 static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool, 206 197 struct xdp_rxq_info *rxq) 207 198 { 199 + } 200 + 201 + static inline unsigned int xsk_pool_get_napi_id(struct xsk_buff_pool *pool) 202 + { 203 + return 0; 208 204 } 209 205 210 206 static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool,
+2 -1
include/uapi/linux/bpf.h
··· 2361 2361 * Pull in non-linear data in case the *skb* is non-linear and not 2362 2362 * all of *len* are part of the linear section. Make *len* bytes 2363 2363 * from *skb* readable and writable. If a zero value is passed for 2364 - * *len*, then the whole length of the *skb* is pulled. 2364 + * *len*, then all bytes in the linear part of *skb* will be made 2365 + * readable and writable. 2365 2366 * 2366 2367 * This helper is only needed for reading and writing with direct 2367 2368 * packet access.
+22 -18
kernel/bpf/arraymap.c
··· 70 70 attr->map_flags & BPF_F_PRESERVE_ELEMS) 71 71 return -EINVAL; 72 72 73 - if (attr->value_size > KMALLOC_MAX_SIZE) 74 - /* if value_size is bigger, the user space won't be able to 75 - * access the elements. 76 - */ 73 + /* avoid overflow on round_up(map->value_size) */ 74 + if (attr->value_size > INT_MAX) 77 75 return -E2BIG; 78 76 79 77 return 0; ··· 154 156 return &array->map; 155 157 } 156 158 159 + static void *array_map_elem_ptr(struct bpf_array* array, u32 index) 160 + { 161 + return array->value + (u64)array->elem_size * index; 162 + } 163 + 157 164 /* Called from syscall or from eBPF program */ 158 165 static void *array_map_lookup_elem(struct bpf_map *map, void *key) 159 166 { ··· 168 165 if (unlikely(index >= array->map.max_entries)) 169 166 return NULL; 170 167 171 - return array->value + array->elem_size * (index & array->index_mask); 168 + return array->value + (u64)array->elem_size * (index & array->index_mask); 172 169 } 173 170 174 171 static int array_map_direct_value_addr(const struct bpf_map *map, u64 *imm, ··· 206 203 { 207 204 struct bpf_array *array = container_of(map, struct bpf_array, map); 208 205 struct bpf_insn *insn = insn_buf; 209 - u32 elem_size = round_up(map->value_size, 8); 206 + u32 elem_size = array->elem_size; 210 207 const int ret = BPF_REG_0; 211 208 const int map_ptr = BPF_REG_1; 212 209 const int index = BPF_REG_2; ··· 275 272 * access 'value_size' of them, so copying rounded areas 276 273 * will not leak any kernel data 277 274 */ 278 - size = round_up(map->value_size, 8); 275 + size = array->elem_size; 279 276 rcu_read_lock(); 280 277 pptr = array->pptrs[index & array->index_mask]; 281 278 for_each_possible_cpu(cpu) { ··· 342 339 value, map->value_size); 343 340 } else { 344 341 val = array->value + 345 - array->elem_size * (index & array->index_mask); 342 + (u64)array->elem_size * (index & array->index_mask); 346 343 if (map_flags & BPF_F_LOCK) 347 344 copy_map_value_locked(map, val, value, false); 348 345 else 
··· 379 376 * returned or zeros which were zero-filled by percpu_alloc, 380 377 * so no kernel data leaks possible 381 378 */ 382 - size = round_up(map->value_size, 8); 379 + size = array->elem_size; 383 380 rcu_read_lock(); 384 381 pptr = array->pptrs[index & array->index_mask]; 385 382 for_each_possible_cpu(cpu) { ··· 411 408 return; 412 409 413 410 for (i = 0; i < array->map.max_entries; i++) 414 - bpf_timer_cancel_and_free(array->value + array->elem_size * i + 415 - map->timer_off); 411 + bpf_timer_cancel_and_free(array_map_elem_ptr(array, i) + map->timer_off); 416 412 } 417 413 418 414 /* Called when map->refcnt goes to zero, either from workqueue or from syscall */ ··· 422 420 423 421 if (map_value_has_kptrs(map)) { 424 422 for (i = 0; i < array->map.max_entries; i++) 425 - bpf_map_free_kptrs(map, array->value + array->elem_size * i); 423 + bpf_map_free_kptrs(map, array_map_elem_ptr(array, i)); 426 424 bpf_map_free_kptr_off_tab(map); 427 425 } 428 426 ··· 558 556 index = info->index & array->index_mask; 559 557 if (info->percpu_value_buf) 560 558 return array->pptrs[index]; 561 - return array->value + array->elem_size * index; 559 + return array_map_elem_ptr(array, index); 562 560 } 563 561 564 562 static void *bpf_array_map_seq_next(struct seq_file *seq, void *v, loff_t *pos) ··· 577 575 index = info->index & array->index_mask; 578 576 if (info->percpu_value_buf) 579 577 return array->pptrs[index]; 580 - return array->value + array->elem_size * index; 578 + return array_map_elem_ptr(array, index); 581 579 } 582 580 583 581 static int __bpf_array_map_seq_show(struct seq_file *seq, void *v) ··· 585 583 struct bpf_iter_seq_array_map_info *info = seq->private; 586 584 struct bpf_iter__bpf_map_elem ctx = {}; 587 585 struct bpf_map *map = info->map; 586 + struct bpf_array *array = container_of(map, struct bpf_array, map); 588 587 struct bpf_iter_meta meta; 589 588 struct bpf_prog *prog; 590 589 int off = 0, cpu = 0; ··· 606 603 ctx.value = v; 607 604 } else { 608 
605 pptr = v; 609 - size = round_up(map->value_size, 8); 606 + size = array->elem_size; 610 607 for_each_possible_cpu(cpu) { 611 608 bpf_long_memcpy(info->percpu_value_buf + off, 612 609 per_cpu_ptr(pptr, cpu), ··· 636 633 { 637 634 struct bpf_iter_seq_array_map_info *seq_info = priv_data; 638 635 struct bpf_map *map = aux->map; 636 + struct bpf_array *array = container_of(map, struct bpf_array, map); 639 637 void *value_buf; 640 638 u32 buf_size; 641 639 642 640 if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { 643 - buf_size = round_up(map->value_size, 8) * num_possible_cpus(); 641 + buf_size = array->elem_size * num_possible_cpus(); 644 642 value_buf = kmalloc(buf_size, GFP_USER | __GFP_NOWARN); 645 643 if (!value_buf) 646 644 return -ENOMEM; ··· 694 690 if (is_percpu) 695 691 val = this_cpu_ptr(array->pptrs[i]); 696 692 else 697 - val = array->value + array->elem_size * i; 693 + val = array_map_elem_ptr(array, i); 698 694 num_elems++; 699 695 key = i; 700 696 ret = callback_fn((u64)(long)map, (u64)(long)&key, ··· 1326 1322 struct bpf_insn *insn_buf) 1327 1323 { 1328 1324 struct bpf_array *array = container_of(map, struct bpf_array, map); 1329 - u32 elem_size = round_up(map->value_size, 8); 1325 + u32 elem_size = array->elem_size; 1330 1326 struct bpf_insn *insn = insn_buf; 1331 1327 const int ret = BPF_REG_0; 1332 1328 const int map_ptr = BPF_REG_1;
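The recurring change in arraymap.c is the `(u64)` cast on `elem_size * index`: now that `value_size` may be as large as INT_MAX, the old u32 multiply could wrap. A quick userspace check of the two expressions (the sizes below are arbitrary, chosen only to trigger the wrap):

```c
#include <assert.h>
#include <stdint.h>

/* Offset math as arraymap.c now does it: widen before multiplying. */
uint64_t elem_off_fixed(uint32_t elem_size, uint32_t index)
{
	return (uint64_t)elem_size * index;
}

/* The pre-patch arithmetic: a u32 product that silently wraps at 2^32
 * and is only widened after the damage is done. */
uint64_t elem_off_wrapping(uint32_t elem_size, uint32_t index)
{
	return elem_size * index;
}
```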
+6 -2
kernel/bpf/bpf_lsm.c
··· 63 63 BTF_ID(func, bpf_lsm_socket_socketpair) 64 64 BTF_SET_END(bpf_lsm_unlocked_sockopt_hooks) 65 65 66 + #ifdef CONFIG_CGROUP_BPF 66 67 void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, 67 68 bpf_func_t *bpf_func) 68 69 { 69 - const struct btf_param *args; 70 + const struct btf_param *args __maybe_unused; 70 71 71 72 if (btf_type_vlen(prog->aux->attach_func_proto) < 1 || 72 73 btf_id_set_contains(&bpf_lsm_current_hooks, ··· 76 75 return; 77 76 } 78 77 78 + #ifdef CONFIG_NET 79 79 args = btf_params(prog->aux->attach_func_proto); 80 80 81 - #ifdef CONFIG_NET 82 81 if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCKET]) 83 82 *bpf_func = __cgroup_bpf_run_lsm_socket; 84 83 else if (args[0].type == btf_sock_ids[BTF_SOCK_TYPE_SOCK]) ··· 87 86 #endif 88 87 *bpf_func = __cgroup_bpf_run_lsm_current; 89 88 } 89 + #endif 90 90 91 91 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, 92 92 const struct bpf_prog *prog) ··· 221 219 case BPF_FUNC_get_retval: 222 220 return prog->expected_attach_type == BPF_LSM_CGROUP ? 223 221 &bpf_get_retval_proto : NULL; 222 + #ifdef CONFIG_NET 224 223 case BPF_FUNC_setsockopt: 225 224 if (prog->expected_attach_type != BPF_LSM_CGROUP) 226 225 return NULL; ··· 242 239 prog->aux->attach_btf_id)) 243 240 return &bpf_unlocked_sk_getsockopt_proto; 244 241 return NULL; 242 + #endif 245 243 default: 246 244 return tracing_prog_func_proto(func_id, prog); 247 245 }
+3
kernel/bpf/bpf_struct_ops.c
··· 341 341 342 342 tlinks[BPF_TRAMP_FENTRY].links[0] = link; 343 343 tlinks[BPF_TRAMP_FENTRY].nr_links = 1; 344 + /* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops, 345 + * and it must be used alone. 346 + */ 344 347 flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0; 345 348 return arch_prepare_bpf_trampoline(NULL, image, image_end, 346 349 model, flags, tlinks, NULL);
+64 -62
kernel/bpf/btf.c
··· 213 213 }; 214 214 215 215 struct btf_kfunc_set_tab { 216 - struct btf_id_set *sets[BTF_KFUNC_HOOK_MAX][BTF_KFUNC_TYPE_MAX]; 216 + struct btf_id_set8 *sets[BTF_KFUNC_HOOK_MAX]; 217 217 }; 218 218 219 219 struct btf_id_dtor_kfunc_tab { ··· 1116 1116 */ 1117 1117 #define btf_show_type_value(show, fmt, value) \ 1118 1118 do { \ 1119 - if ((value) != 0 || (show->flags & BTF_SHOW_ZERO) || \ 1119 + if ((value) != (__typeof__(value))0 || \ 1120 + (show->flags & BTF_SHOW_ZERO) || \ 1120 1121 show->state.depth == 0) { \ 1121 1122 btf_show(show, "%s%s" fmt "%s%s", \ 1122 1123 btf_show_indent(show), \ ··· 1616 1615 static void btf_free_kfunc_set_tab(struct btf *btf) 1617 1616 { 1618 1617 struct btf_kfunc_set_tab *tab = btf->kfunc_set_tab; 1619 - int hook, type; 1618 + int hook; 1620 1619 1621 1620 if (!tab) 1622 1621 return; ··· 1625 1624 */ 1626 1625 if (btf_is_module(btf)) 1627 1626 goto free_tab; 1628 - for (hook = 0; hook < ARRAY_SIZE(tab->sets); hook++) { 1629 - for (type = 0; type < ARRAY_SIZE(tab->sets[0]); type++) 1630 - kfree(tab->sets[hook][type]); 1631 - } 1627 + for (hook = 0; hook < ARRAY_SIZE(tab->sets); hook++) 1628 + kfree(tab->sets[hook]); 1632 1629 free_tab: 1633 1630 kfree(tab); 1634 1631 btf->kfunc_set_tab = NULL; ··· 6170 6171 static int btf_check_func_arg_match(struct bpf_verifier_env *env, 6171 6172 const struct btf *btf, u32 func_id, 6172 6173 struct bpf_reg_state *regs, 6173 - bool ptr_to_mem_ok) 6174 + bool ptr_to_mem_ok, 6175 + u32 kfunc_flags) 6174 6176 { 6175 6177 enum bpf_prog_type prog_type = resolve_prog_type(env->prog); 6178 + bool rel = false, kptr_get = false, trusted_arg = false; 6176 6179 struct bpf_verifier_log *log = &env->log; 6177 6180 u32 i, nargs, ref_id, ref_obj_id = 0; 6178 6181 bool is_kfunc = btf_is_kernel(btf); 6179 - bool rel = false, kptr_get = false; 6180 6182 const char *func_name, *ref_tname; 6181 6183 const struct btf_type *t, *ref_t; 6182 6184 const struct btf_param *args; ··· 6209 6209 6210 6210 if (is_kfunc) { 6211 
6211 /* Only kfunc can be release func */ 6212 - rel = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog), 6213 - BTF_KFUNC_TYPE_RELEASE, func_id); 6214 - kptr_get = btf_kfunc_id_set_contains(btf, resolve_prog_type(env->prog), 6215 - BTF_KFUNC_TYPE_KPTR_ACQUIRE, func_id); 6212 + rel = kfunc_flags & KF_RELEASE; 6213 + kptr_get = kfunc_flags & KF_KPTR_GET; 6214 + trusted_arg = kfunc_flags & KF_TRUSTED_ARGS; 6216 6215 } 6217 6216 6218 6217 /* check that BTF function arguments match actual types that the ··· 6236 6237 return -EINVAL; 6237 6238 } 6238 6239 6240 + /* Check if argument must be a referenced pointer, args + i has 6241 + * been verified to be a pointer (after skipping modifiers). 6242 + */ 6243 + if (is_kfunc && trusted_arg && !reg->ref_obj_id) { 6244 + bpf_log(log, "R%d must be referenced\n", regno); 6245 + return -EINVAL; 6246 + } 6247 + 6239 6248 ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id); 6240 6249 ref_tname = btf_name_by_offset(btf, ref_t->name_off); 6241 6250 6242 - if (rel && reg->ref_obj_id) 6251 + /* Trusted args have the same offset checks as release arguments */ 6252 + if (trusted_arg || (rel && reg->ref_obj_id)) 6243 6253 arg_type |= OBJ_RELEASE; 6244 6254 ret = check_func_arg_reg_off(env, reg, regno, arg_type); 6245 6255 if (ret < 0) ··· 6346 6338 reg_ref_tname = btf_name_by_offset(reg_btf, 6347 6339 reg_ref_t->name_off); 6348 6340 if (!btf_struct_ids_match(log, reg_btf, reg_ref_id, 6349 - reg->off, btf, ref_id, rel && reg->ref_obj_id)) { 6341 + reg->off, btf, ref_id, 6342 + trusted_arg || (rel && reg->ref_obj_id))) { 6350 6343 bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n", 6351 6344 func_name, i, 6352 6345 btf_type_str(ref_t), ref_tname, ··· 6450 6441 return -EINVAL; 6451 6442 6452 6443 is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL; 6453 - err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global); 6444 + err = 
btf_check_func_arg_match(env, btf, btf_id, regs, is_global, 0); 6454 6445 6455 6446 /* Compiler optimizations can remove arguments from static functions 6456 6447 * or mismatched type can be passed into a global function. ··· 6463 6454 6464 6455 int btf_check_kfunc_arg_match(struct bpf_verifier_env *env, 6465 6456 const struct btf *btf, u32 func_id, 6466 - struct bpf_reg_state *regs) 6457 + struct bpf_reg_state *regs, 6458 + u32 kfunc_flags) 6467 6459 { 6468 - return btf_check_func_arg_match(env, btf, func_id, regs, true); 6460 + return btf_check_func_arg_match(env, btf, func_id, regs, true, kfunc_flags); 6469 6461 } 6470 6462 6471 6463 /* Convert BTF of a function into bpf_reg_state if possible ··· 6863 6853 return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL; 6864 6854 } 6865 6855 6856 + static void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id) 6857 + { 6858 + return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func); 6859 + } 6860 + 6866 6861 enum { 6867 6862 BTF_MODULE_F_LIVE = (1 << 0), 6868 6863 }; ··· 7116 7101 7117 7102 /* Kernel Function (kfunc) BTF ID set registration API */ 7118 7103 7119 - static int __btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, 7120 - enum btf_kfunc_type type, 7121 - struct btf_id_set *add_set, bool vmlinux_set) 7104 + static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, 7105 + struct btf_id_set8 *add_set) 7122 7106 { 7107 + bool vmlinux_set = !btf_is_module(btf); 7123 7108 struct btf_kfunc_set_tab *tab; 7124 - struct btf_id_set *set; 7109 + struct btf_id_set8 *set; 7125 7110 u32 set_cnt; 7126 7111 int ret; 7127 7112 7128 - if (hook >= BTF_KFUNC_HOOK_MAX || type >= BTF_KFUNC_TYPE_MAX) { 7113 + if (hook >= BTF_KFUNC_HOOK_MAX) { 7129 7114 ret = -EINVAL; 7130 7115 goto end; 7131 7116 } ··· 7141 7126 btf->kfunc_set_tab = tab; 7142 7127 } 7143 7128 7144 - set = tab->sets[hook][type]; 7129 + set = tab->sets[hook]; 7145 7130 /* Warn 
when register_btf_kfunc_id_set is called twice for the same hook 7146 7131 * for module sets. 7147 7132 */ ··· 7155 7140 * pointer and return. 7156 7141 */ 7157 7142 if (!vmlinux_set) { 7158 - tab->sets[hook][type] = add_set; 7143 + tab->sets[hook] = add_set; 7159 7144 return 0; 7160 7145 } 7161 7146 ··· 7164 7149 * and concatenate all individual sets being registered. While each set 7165 7150 * is individually sorted, they may become unsorted when concatenated, 7166 7151 * hence re-sorting the final set again is required to make binary 7167 - * searching the set using btf_id_set_contains function work. 7152 + * searching the set using btf_id_set8_contains function work. 7168 7153 */ 7169 7154 set_cnt = set ? set->cnt : 0; 7170 7155 ··· 7179 7164 } 7180 7165 7181 7166 /* Grow set */ 7182 - set = krealloc(tab->sets[hook][type], 7183 - offsetof(struct btf_id_set, ids[set_cnt + add_set->cnt]), 7167 + set = krealloc(tab->sets[hook], 7168 + offsetof(struct btf_id_set8, pairs[set_cnt + add_set->cnt]), 7184 7169 GFP_KERNEL | __GFP_NOWARN); 7185 7170 if (!set) { 7186 7171 ret = -ENOMEM; ··· 7188 7173 } 7189 7174 7190 7175 /* For newly allocated set, initialize set->cnt to 0 */ 7191 - if (!tab->sets[hook][type]) 7176 + if (!tab->sets[hook]) 7192 7177 set->cnt = 0; 7193 - tab->sets[hook][type] = set; 7178 + tab->sets[hook] = set; 7194 7179 7195 7180 /* Concatenate the two sets */ 7196 - memcpy(set->ids + set->cnt, add_set->ids, add_set->cnt * sizeof(set->ids[0])); 7181 + memcpy(set->pairs + set->cnt, add_set->pairs, add_set->cnt * sizeof(set->pairs[0])); 7197 7182 set->cnt += add_set->cnt; 7198 7183 7199 - sort(set->ids, set->cnt, sizeof(set->ids[0]), btf_id_cmp_func, NULL); 7184 + sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL); 7200 7185 7201 7186 return 0; 7202 7187 end: ··· 7204 7189 return ret; 7205 7190 } 7206 7191 7207 - static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, 7208 - const struct btf_kfunc_id_set *kset) 
7209 - { 7210 - bool vmlinux_set = !btf_is_module(btf); 7211 - int type, ret = 0; 7212 - 7213 - for (type = 0; type < ARRAY_SIZE(kset->sets); type++) { 7214 - if (!kset->sets[type]) 7215 - continue; 7216 - 7217 - ret = __btf_populate_kfunc_set(btf, hook, type, kset->sets[type], vmlinux_set); 7218 - if (ret) 7219 - break; 7220 - } 7221 - return ret; 7222 - } 7223 - 7224 - static bool __btf_kfunc_id_set_contains(const struct btf *btf, 7192 + static u32 *__btf_kfunc_id_set_contains(const struct btf *btf, 7225 7193 enum btf_kfunc_hook hook, 7226 - enum btf_kfunc_type type, 7227 7194 u32 kfunc_btf_id) 7228 7195 { 7229 - struct btf_id_set *set; 7196 + struct btf_id_set8 *set; 7197 + u32 *id; 7230 7198 7231 - if (hook >= BTF_KFUNC_HOOK_MAX || type >= BTF_KFUNC_TYPE_MAX) 7232 - return false; 7199 + if (hook >= BTF_KFUNC_HOOK_MAX) 7200 + return NULL; 7233 7201 if (!btf->kfunc_set_tab) 7234 - return false; 7235 - set = btf->kfunc_set_tab->sets[hook][type]; 7202 + return NULL; 7203 + set = btf->kfunc_set_tab->sets[hook]; 7236 7204 if (!set) 7237 - return false; 7238 - return btf_id_set_contains(set, kfunc_btf_id); 7205 + return NULL; 7206 + id = btf_id_set8_contains(set, kfunc_btf_id); 7207 + if (!id) 7208 + return NULL; 7209 + /* The flags for BTF ID are located next to it */ 7210 + return id + 1; 7239 7211 } 7240 7212 7241 7213 static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type) ··· 7250 7248 * keeping the reference for the duration of the call provides the necessary 7251 7249 * protection for looking up a well-formed btf->kfunc_set_tab. 
7252 7250 */ 7253 - bool btf_kfunc_id_set_contains(const struct btf *btf, 7251 + u32 *btf_kfunc_id_set_contains(const struct btf *btf, 7254 7252 enum bpf_prog_type prog_type, 7255 - enum btf_kfunc_type type, u32 kfunc_btf_id) 7253 + u32 kfunc_btf_id) 7256 7254 { 7257 7255 enum btf_kfunc_hook hook; 7258 7256 7259 7257 hook = bpf_prog_type_to_kfunc_hook(prog_type); 7260 - return __btf_kfunc_id_set_contains(btf, hook, type, kfunc_btf_id); 7258 + return __btf_kfunc_id_set_contains(btf, hook, kfunc_btf_id); 7261 7259 } 7262 7260 7263 7261 /* This function must be invoked only from initcalls/module init functions */ ··· 7284 7282 return PTR_ERR(btf); 7285 7283 7286 7284 hook = bpf_prog_type_to_kfunc_hook(prog_type); 7287 - ret = btf_populate_kfunc_set(btf, hook, kset); 7285 + ret = btf_populate_kfunc_set(btf, hook, kset->set); 7288 7286 btf_put(btf); 7289 7287 return ret; 7290 7288 }
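The btf.c rework above replaces the per-type set matrix with a single sorted array of (BTF ID, flags) pairs: lookups bsearch on the ID alone and, on a hit, read the flags word sitting right after it (the "`id + 1`" trick). A standalone sketch of that layout — the `KF_RELEASE` value here is illustrative, not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One (BTF id, kfunc flags) pair, two adjacent u32s, so a pointer to
 * the id can be bumped by one element to reach the flags. */
struct id_flags_pair { uint32_t id; uint32_t flags; };

#define KF_RELEASE (1u << 1)   /* illustrative flag value */

/* Like the kernel's btf_id_cmp_func: compare only the leading id. */
static int id_cmp(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
	return x < y ? -1 : x > y;
}

/* Returns a pointer to the flags of @id, or NULL if absent --
 * the same contract as the new __btf_kfunc_id_set_contains(). */
uint32_t *set8_contains(struct id_flags_pair *pairs, size_t cnt, uint32_t id)
{
	uint32_t *found = bsearch(&id, pairs, cnt, sizeof(*pairs), id_cmp);
	return found ? found + 1 : NULL;
}
```

Because the element stride is the full pair but the comparator reads only the first u32, one sorted array answers both "is this a kfunc for this hook?" and "with which flags?" in a single search.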
+29 -71
kernel/bpf/core.c
··· 652 652 return fp->jited && !bpf_prog_was_classic(fp); 653 653 } 654 654 655 - static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp) 656 - { 657 - return list_empty(&fp->aux->ksym.lnode) || 658 - fp->aux->ksym.lnode.prev == LIST_POISON2; 659 - } 660 - 661 655 void bpf_prog_kallsyms_add(struct bpf_prog *fp) 662 656 { 663 657 if (!bpf_prog_kallsyms_candidate(fp) || ··· 827 833 828 834 #define BPF_PROG_SIZE_TO_NBITS(size) (round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE) 829 835 830 - static size_t bpf_prog_pack_size = -1; 831 - static size_t bpf_prog_pack_mask = -1; 832 - 833 - static int bpf_prog_chunk_count(void) 834 - { 835 - WARN_ON_ONCE(bpf_prog_pack_size == -1); 836 - return bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE; 837 - } 838 - 839 836 static DEFINE_MUTEX(pack_mutex); 840 837 static LIST_HEAD(pack_list); 841 838 ··· 834 849 * CONFIG_MMU=n. Use PAGE_SIZE in these cases. 835 850 */ 836 851 #ifdef PMD_SIZE 837 - #define BPF_HPAGE_SIZE PMD_SIZE 838 - #define BPF_HPAGE_MASK PMD_MASK 852 + #define BPF_PROG_PACK_SIZE (PMD_SIZE * num_possible_nodes()) 839 853 #else 840 - #define BPF_HPAGE_SIZE PAGE_SIZE 841 - #define BPF_HPAGE_MASK PAGE_MASK 854 + #define BPF_PROG_PACK_SIZE PAGE_SIZE 842 855 #endif 843 856 844 - static size_t select_bpf_prog_pack_size(void) 845 - { 846 - size_t size; 847 - void *ptr; 848 - 849 - size = BPF_HPAGE_SIZE * num_online_nodes(); 850 - ptr = module_alloc(size); 851 - 852 - /* Test whether we can get huge pages. If not just use PAGE_SIZE 853 - * packs. 
854 - */ 855 - if (!ptr || !is_vm_area_hugepages(ptr)) { 856 - size = PAGE_SIZE; 857 - bpf_prog_pack_mask = PAGE_MASK; 858 - } else { 859 - bpf_prog_pack_mask = BPF_HPAGE_MASK; 860 - } 861 - 862 - vfree(ptr); 863 - return size; 864 - } 857 + #define BPF_PROG_CHUNK_COUNT (BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE) 865 858 866 859 static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_insns) 867 860 { 868 861 struct bpf_prog_pack *pack; 869 862 870 - pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(bpf_prog_chunk_count())), 863 + pack = kzalloc(struct_size(pack, bitmap, BITS_TO_LONGS(BPF_PROG_CHUNK_COUNT)), 871 864 GFP_KERNEL); 872 865 if (!pack) 873 866 return NULL; 874 - pack->ptr = module_alloc(bpf_prog_pack_size); 867 + pack->ptr = module_alloc(BPF_PROG_PACK_SIZE); 875 868 if (!pack->ptr) { 876 869 kfree(pack); 877 870 return NULL; 878 871 } 879 - bpf_fill_ill_insns(pack->ptr, bpf_prog_pack_size); 880 - bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE); 872 + bpf_fill_ill_insns(pack->ptr, BPF_PROG_PACK_SIZE); 873 + bitmap_zero(pack->bitmap, BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE); 881 874 list_add_tail(&pack->list, &pack_list); 882 875 883 876 set_vm_flush_reset_perms(pack->ptr); 884 - set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); 885 - set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); 877 + set_memory_ro((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE); 878 + set_memory_x((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE); 886 879 return pack; 887 880 } 888 881 ··· 872 909 void *ptr = NULL; 873 910 874 911 mutex_lock(&pack_mutex); 875 - if (bpf_prog_pack_size == -1) 876 - bpf_prog_pack_size = select_bpf_prog_pack_size(); 877 - 878 - if (size > bpf_prog_pack_size) { 912 + if (size > BPF_PROG_PACK_SIZE) { 879 913 size = round_up(size, PAGE_SIZE); 880 914 ptr = module_alloc(size); 881 915 if (ptr) { ··· 884 924 goto out; 885 925 } 886 926 
list_for_each_entry(pack, &pack_list, list) { 887 - pos = bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0, 927 + pos = bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0, 888 928 nbits, 0); 889 - if (pos < bpf_prog_chunk_count()) 929 + if (pos < BPF_PROG_CHUNK_COUNT) 890 930 goto found_free_area; 891 931 } 892 932 ··· 910 950 struct bpf_prog_pack *pack = NULL, *tmp; 911 951 unsigned int nbits; 912 952 unsigned long pos; 913 - void *pack_ptr; 914 953 915 954 mutex_lock(&pack_mutex); 916 - if (hdr->size > bpf_prog_pack_size) { 955 + if (hdr->size > BPF_PROG_PACK_SIZE) { 917 956 module_memfree(hdr); 918 957 goto out; 919 958 } 920 959 921 - pack_ptr = (void *)((unsigned long)hdr & bpf_prog_pack_mask); 922 - 923 960 list_for_each_entry(tmp, &pack_list, list) { 924 - if (tmp->ptr == pack_ptr) { 961 + if ((void *)hdr >= tmp->ptr && (tmp->ptr + BPF_PROG_PACK_SIZE) > (void *)hdr) { 925 962 pack = tmp; 926 963 break; 927 964 } ··· 928 971 goto out; 929 972 930 973 nbits = BPF_PROG_SIZE_TO_NBITS(hdr->size); 931 - pos = ((unsigned long)hdr - (unsigned long)pack_ptr) >> BPF_PROG_CHUNK_SHIFT; 974 + pos = ((unsigned long)hdr - (unsigned long)pack->ptr) >> BPF_PROG_CHUNK_SHIFT; 932 975 933 976 WARN_ONCE(bpf_arch_text_invalidate(hdr, hdr->size), 934 977 "bpf_prog_pack bug: missing bpf_arch_text_invalidate?\n"); 935 978 936 979 bitmap_clear(pack->bitmap, pos, nbits); 937 - if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0, 938 - bpf_prog_chunk_count(), 0) == 0) { 980 + if (bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0, 981 + BPF_PROG_CHUNK_COUNT, 0) == 0) { 939 982 list_del(&pack->list); 940 983 module_memfree(pack->ptr); 941 984 kfree(pack); ··· 1112 1155 bpf_prog_pack_free(ro_header); 1113 1156 return PTR_ERR(ptr); 1114 1157 } 1115 - prog->aux->use_bpf_prog_pack = true; 1116 1158 return 0; 1117 1159 } 1118 1160 ··· 1135 1179 bpf_jit_uncharge_modmem(size); 1136 1180 } 1137 1181 1182 + struct bpf_binary_header 
* 1183 + bpf_jit_binary_pack_hdr(const struct bpf_prog *fp) 1184 + { 1185 + unsigned long real_start = (unsigned long)fp->bpf_func; 1186 + unsigned long addr; 1187 + 1188 + addr = real_start & BPF_PROG_CHUNK_MASK; 1189 + return (void *)addr; 1190 + } 1191 + 1138 1192 static inline struct bpf_binary_header * 1139 1193 bpf_jit_binary_hdr(const struct bpf_prog *fp) 1140 1194 { 1141 1195 unsigned long real_start = (unsigned long)fp->bpf_func; 1142 1196 unsigned long addr; 1143 1197 1144 - if (fp->aux->use_bpf_prog_pack) 1145 - addr = real_start & BPF_PROG_CHUNK_MASK; 1146 - else 1147 - addr = real_start & PAGE_MASK; 1148 - 1198 + addr = real_start & PAGE_MASK; 1149 1199 return (void *)addr; 1150 1200 } 1151 1201 ··· 1164 1202 if (fp->jited) { 1165 1203 struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp); 1166 1204 1167 - if (fp->aux->use_bpf_prog_pack) 1168 - bpf_jit_binary_pack_free(hdr, NULL /* rw_buffer */); 1169 - else 1170 - bpf_jit_binary_free(hdr); 1171 - 1205 + bpf_jit_binary_free(hdr); 1172 1206 WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp)); 1173 1207 } 1174 1208
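With the pack size now a compile-time constant, `bpf_prog_pack_free()` above no longer derives the owning pack by address masking; it walks `pack_list` and tests containment. Both the size-to-chunks rounding and the containment test are simple enough to verify standalone; the constants below are illustrative stand-ins, not the kernel's values:

```c
#include <assert.h>

#define CHUNK_SIZE 64u               /* stand-in for BPF_PROG_CHUNK_SIZE */
#define PACK_SIZE  (CHUNK_SIZE * 8)  /* stand-in for BPF_PROG_PACK_SIZE */

/* round_up(size, CHUNK_SIZE) / CHUNK_SIZE, as BPF_PROG_SIZE_TO_NBITS does:
 * how many bitmap bits a program of @size occupies. */
unsigned int size_to_nbits(unsigned int size)
{
	return (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/* The new containment test from bpf_prog_pack_free():
 * hdr belongs to a pack iff it lies inside [ptr, ptr + PACK_SIZE). */
int pack_contains(const char *pack_ptr, const char *hdr)
{
	return hdr >= pack_ptr && pack_ptr + PACK_SIZE > hdr;
}
```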
+1 -1
kernel/bpf/devmap.c
··· 845 845 struct bpf_dtab_netdev *dev; 846 846 847 847 dev = bpf_map_kmalloc_node(&dtab->map, sizeof(*dev), 848 - GFP_ATOMIC | __GFP_NOWARN, 848 + GFP_NOWAIT | __GFP_NOWARN, 849 849 dtab->map.numa_node); 850 850 if (!dev) 851 851 return ERR_PTR(-ENOMEM);
+3 -3
kernel/bpf/hashtab.c
··· 61 61 * 62 62 * As regular device interrupt handlers and soft interrupts are forced into 63 63 * thread context, the existing code which does 64 - * spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*(); 64 + * spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*(); 65 65 * just works. 66 66 * 67 67 * In theory the BPF locks could be converted to regular spinlocks as well, ··· 978 978 goto dec_count; 979 979 } 980 980 l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size, 981 - GFP_ATOMIC | __GFP_NOWARN, 981 + GFP_NOWAIT | __GFP_NOWARN, 982 982 htab->map.numa_node); 983 983 if (!l_new) { 984 984 l_new = ERR_PTR(-ENOMEM); ··· 996 996 } else { 997 997 /* alloc_percpu zero-fills */ 998 998 pptr = bpf_map_alloc_percpu(&htab->map, size, 8, 999 - GFP_ATOMIC | __GFP_NOWARN); 999 + GFP_NOWAIT | __GFP_NOWARN); 1000 1000 if (!pptr) { 1001 1001 kfree(l_new); 1002 1002 l_new = ERR_PTR(-ENOMEM);
+1 -1
kernel/bpf/local_storage.c
··· 165 165 } 166 166 167 167 new = bpf_map_kmalloc_node(map, struct_size(new, data, map->value_size), 168 - __GFP_ZERO | GFP_ATOMIC | __GFP_NOWARN, 168 + __GFP_ZERO | GFP_NOWAIT | __GFP_NOWARN, 169 169 map->numa_node); 170 170 if (!new) 171 171 return -ENOMEM;
+1 -1
kernel/bpf/lpm_trie.c
··· 285 285 if (value) 286 286 size += trie->map.value_size; 287 287 288 - node = bpf_map_kmalloc_node(&trie->map, size, GFP_ATOMIC | __GFP_NOWARN, 288 + node = bpf_map_kmalloc_node(&trie->map, size, GFP_NOWAIT | __GFP_NOWARN, 289 289 trie->map.numa_node); 290 290 if (!node) 291 291 return NULL;
+3 -7
kernel/bpf/preload/iterators/Makefile
··· 9 9 TOOLS_PATH := $(abspath ../../../../tools) 10 10 BPFTOOL_SRC := $(TOOLS_PATH)/bpf/bpftool 11 11 BPFTOOL_OUTPUT := $(abs_out)/bpftool 12 - DEFAULT_BPFTOOL := $(OUTPUT)/sbin/bpftool 12 + DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)/bootstrap/bpftool 13 13 BPFTOOL ?= $(DEFAULT_BPFTOOL) 14 14 15 15 LIBBPF_SRC := $(TOOLS_PATH)/lib/bpf ··· 61 61 OUTPUT=$(abspath $(dir $@))/ prefix= \ 62 62 DESTDIR=$(LIBBPF_DESTDIR) $(abspath $@) install_headers 63 63 64 - $(DEFAULT_BPFTOOL): $(BPFOBJ) | $(BPFTOOL_OUTPUT) 65 - $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOL_SRC) \ 66 - OUTPUT=$(BPFTOOL_OUTPUT)/ \ 67 - LIBBPF_OUTPUT=$(LIBBPF_OUTPUT)/ \ 68 - LIBBPF_DESTDIR=$(LIBBPF_DESTDIR)/ \ 69 - prefix= DESTDIR=$(abs_out)/ install-bin 64 + $(DEFAULT_BPFTOOL): | $(BPFTOOL_OUTPUT) 65 + $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOL_SRC) OUTPUT=$(BPFTOOL_OUTPUT)/ bootstrap
+28 -8
kernel/bpf/syscall.c
··· 419 419 #ifdef CONFIG_MEMCG_KMEM 420 420 static void bpf_map_save_memcg(struct bpf_map *map) 421 421 { 422 - map->memcg = get_mem_cgroup_from_mm(current->mm); 422 + /* Currently if a map is created by a process belonging to the root 423 + * memory cgroup, get_obj_cgroup_from_current() will return NULL. 424 + * So we have to check map->objcg for being NULL each time it's 425 + * being used. 426 + */ 427 + map->objcg = get_obj_cgroup_from_current(); 423 428 } 424 429 425 430 static void bpf_map_release_memcg(struct bpf_map *map) 426 431 { 427 - mem_cgroup_put(map->memcg); 432 + if (map->objcg) 433 + obj_cgroup_put(map->objcg); 434 + } 435 + 436 + static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map) 437 + { 438 + if (map->objcg) 439 + return get_mem_cgroup_from_objcg(map->objcg); 440 + 441 + return root_mem_cgroup; 428 442 } 429 443 430 444 void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags, 431 445 int node) 432 446 { 433 - struct mem_cgroup *old_memcg; 447 + struct mem_cgroup *memcg, *old_memcg; 434 448 void *ptr; 435 449 436 - old_memcg = set_active_memcg(map->memcg); 450 + memcg = bpf_map_get_memcg(map); 451 + old_memcg = set_active_memcg(memcg); 437 452 ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node); 438 453 set_active_memcg(old_memcg); 454 + mem_cgroup_put(memcg); 439 455 440 456 return ptr; 441 457 } 442 458 443 459 void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags) 444 460 { 445 - struct mem_cgroup *old_memcg; 461 + struct mem_cgroup *memcg, *old_memcg; 446 462 void *ptr; 447 463 448 - old_memcg = set_active_memcg(map->memcg); 464 + memcg = bpf_map_get_memcg(map); 465 + old_memcg = set_active_memcg(memcg); 449 466 ptr = kzalloc(size, flags | __GFP_ACCOUNT); 450 467 set_active_memcg(old_memcg); 468 + mem_cgroup_put(memcg); 451 469 452 470 return ptr; 453 471 } ··· 473 455 void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, 474 456 size_t align, gfp_t flags) 
475 457 { 476 - struct mem_cgroup *old_memcg; 458 + struct mem_cgroup *memcg, *old_memcg; 477 459 void __percpu *ptr; 478 460 479 - old_memcg = set_active_memcg(map->memcg); 461 + memcg = bpf_map_get_memcg(map); 462 + old_memcg = set_active_memcg(memcg); 480 463 ptr = __alloc_percpu_gfp(size, align, flags | __GFP_ACCOUNT); 481 464 set_active_memcg(old_memcg); 465 + mem_cgroup_put(memcg); 482 466 483 467 return ptr; 484 468 }
+145 -18
kernel/bpf/trampoline.c
···
 #include <linux/static_call.h>
 #include <linux/bpf_verifier.h>
 #include <linux/bpf_lsm.h>
+#include <linux/delay.h>
 
 /* dummy _ops. The verifier will operate on target program's ops. */
 const struct bpf_verifier_ops bpf_extension_verifier_ops = {
···
 
 /* serializes access to trampoline_table */
 static DEFINE_MUTEX(trampoline_mutex);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex);
+
+static int bpf_tramp_ftrace_ops_func(struct ftrace_ops *ops, enum ftrace_ops_cmd cmd)
+{
+	struct bpf_trampoline *tr = ops->private;
+	int ret = 0;
+
+	if (cmd == FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF) {
+		/* This is called inside register_ftrace_direct_multi(), so
+		 * tr->mutex is already locked.
+		 */
+		lockdep_assert_held_once(&tr->mutex);
+
+		/* Instead of updating the trampoline here, we propagate
+		 * -EAGAIN to register_ftrace_direct_multi(). Then we can
+		 * retry register_ftrace_direct_multi() after updating the
+		 * trampoline.
+		 */
+		if ((tr->flags & BPF_TRAMP_F_CALL_ORIG) &&
+		    !(tr->flags & BPF_TRAMP_F_ORIG_STACK)) {
+			if (WARN_ON_ONCE(tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY))
+				return -EBUSY;
+
+			tr->flags |= BPF_TRAMP_F_SHARE_IPMODIFY;
+			return -EAGAIN;
+		}
+
+		return 0;
+	}
+
+	/* The normal locking order is
+	 *    tr->mutex => direct_mutex (ftrace.c) => ftrace_lock (ftrace.c)
+	 *
+	 * The following two commands are called from
+	 *
+	 *   prepare_direct_functions_for_ipmodify
+	 *   cleanup_direct_functions_after_ipmodify
+	 *
+	 * In both cases, direct_mutex is already locked. Use
+	 * mutex_trylock(&tr->mutex) to avoid deadlock in race condition
+	 * (something else is making changes to this same trampoline).
+	 */
+	if (!mutex_trylock(&tr->mutex)) {
+		/* sleep 1 ms to make sure whatever holding tr->mutex makes
+		 * some progress.
+		 */
+		msleep(1);
+		return -EAGAIN;
+	}
+
+	switch (cmd) {
+	case FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER:
+		tr->flags |= BPF_TRAMP_F_SHARE_IPMODIFY;
+
+		if ((tr->flags & BPF_TRAMP_F_CALL_ORIG) &&
+		    !(tr->flags & BPF_TRAMP_F_ORIG_STACK))
+			ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
+		break;
+	case FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER:
+		tr->flags &= ~BPF_TRAMP_F_SHARE_IPMODIFY;
+
+		if (tr->flags & BPF_TRAMP_F_ORIG_STACK)
+			ret = bpf_trampoline_update(tr, false /* lock_direct_mutex */);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	};
+
+	mutex_unlock(&tr->mutex);
+	return ret;
+}
+#endif
 
 bool bpf_prog_has_trampoline(const struct bpf_prog *prog)
 {
···
 	tr = kzalloc(sizeof(*tr), GFP_KERNEL);
 	if (!tr)
 		goto out;
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	tr->fops = kzalloc(sizeof(struct ftrace_ops), GFP_KERNEL);
+	if (!tr->fops) {
+		kfree(tr);
+		tr = NULL;
+		goto out;
+	}
+	tr->fops->private = tr;
+	tr->fops->ops_func = bpf_tramp_ftrace_ops_func;
+#endif
 
 	tr->key = key;
 	INIT_HLIST_NODE(&tr->hlist);
···
 	int ret;
 
 	if (tr->func.ftrace_managed)
-		ret = unregister_ftrace_direct((long)ip, (long)old_addr);
+		ret = unregister_ftrace_direct_multi(tr->fops, (long)old_addr);
 	else
 		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, NULL);
 
···
 	return ret;
 }
 
-static int modify_fentry(struct bpf_trampoline *tr, void *old_addr, void *new_addr)
+static int modify_fentry(struct bpf_trampoline *tr, void *old_addr, void *new_addr,
+			 bool lock_direct_mutex)
 {
 	void *ip = tr->func.addr;
 	int ret;
 
-	if (tr->func.ftrace_managed)
-		ret = modify_ftrace_direct((long)ip, (long)old_addr, (long)new_addr);
-	else
+	if (tr->func.ftrace_managed) {
+		if (lock_direct_mutex)
+			ret = modify_ftrace_direct_multi(tr->fops, (long)new_addr);
+		else
+			ret = modify_ftrace_direct_multi_nolock(tr->fops, (long)new_addr);
+	} else {
 		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, new_addr);
+	}
 	return ret;
 }
···
 	if (bpf_trampoline_module_get(tr))
 		return -ENOENT;
 
-	if (tr->func.ftrace_managed)
-		ret = register_ftrace_direct((long)ip, (long)new_addr);
-	else
+	if (tr->func.ftrace_managed) {
+		ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 0);
+		ret = register_ftrace_direct_multi(tr->fops, (long)new_addr);
+	} else {
 		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
+	}
 
 	if (ret)
 		bpf_trampoline_module_put(tr);
···
 	return ERR_PTR(err);
 }
 
-static int bpf_trampoline_update(struct bpf_trampoline *tr)
+static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
 {
 	struct bpf_tramp_image *im;
 	struct bpf_tramp_links *tlinks;
-	u32 flags = BPF_TRAMP_F_RESTORE_REGS;
+	u32 orig_flags = tr->flags;
 	bool ip_arg = false;
 	int err, total;
···
 		goto out;
 	}
 
+	/* clear all bits except SHARE_IPMODIFY */
+	tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
+
 	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
-	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links)
-		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
+	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
+		/* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
+		 * should not be set together.
+		 */
+		tr->flags |= BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
+	} else {
+		tr->flags |= BPF_TRAMP_F_RESTORE_REGS;
+	}
 
 	if (ip_arg)
-		flags |= BPF_TRAMP_F_IP_ARG;
+		tr->flags |= BPF_TRAMP_F_IP_ARG;
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+again:
+	if ((tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY) &&
+	    (tr->flags & BPF_TRAMP_F_CALL_ORIG))
+		tr->flags |= BPF_TRAMP_F_ORIG_STACK;
+#endif
 
 	err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
-					  &tr->func.model, flags, tlinks,
+					  &tr->func.model, tr->flags, tlinks,
 					  tr->func.addr);
 	if (err < 0)
 		goto out;
···
 	WARN_ON(!tr->cur_image && tr->selector);
 	if (tr->cur_image)
 		/* progs already running at this address */
-		err = modify_fentry(tr, tr->cur_image->image, im->image);
+		err = modify_fentry(tr, tr->cur_image->image, im->image, lock_direct_mutex);
 	else
 		/* first time registering */
 		err = register_fentry(tr, im->image);
+
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	if (err == -EAGAIN) {
+		/* -EAGAIN from bpf_tramp_ftrace_ops_func. Now
+		 * BPF_TRAMP_F_SHARE_IPMODIFY is set, we can generate the
+		 * trampoline again, and retry register.
+		 */
+		/* reset fops->func and fops->trampoline for re-register */
+		tr->fops->func = NULL;
+		tr->fops->trampoline = 0;
+		goto again;
+	}
+#endif
 	if (err)
 		goto out;
+
 	if (tr->cur_image)
 		bpf_tramp_image_put(tr->cur_image);
 	tr->cur_image = im;
 	tr->selector++;
 out:
+	/* If any error happens, restore previous flags */
+	if (err)
+		tr->flags = orig_flags;
 	kfree(tlinks);
 	return err;
 }
···
 
 	hlist_add_head(&link->tramp_hlist, &tr->progs_hlist[kind]);
 	tr->progs_cnt[kind]++;
-	err = bpf_trampoline_update(tr);
+	err = bpf_trampoline_update(tr, true /* lock_direct_mutex */);
 	if (err) {
 		hlist_del_init(&link->tramp_hlist);
 		tr->progs_cnt[kind]--;
···
 	}
 	hlist_del_init(&link->tramp_hlist);
 	tr->progs_cnt[kind]--;
-	return bpf_trampoline_update(tr);
+	return bpf_trampoline_update(tr, true /* lock_direct_mutex */);
 }
 
 /* bpf_trampoline_unlink_prog() should never fail. */
···
 	return err;
 }
 
-#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
+#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM)
 static void bpf_shim_tramp_link_release(struct bpf_link *link)
 {
 	struct bpf_shim_tramp_link *shim_link =
···
 	 * multiple rcu callbacks.
 	 */
 	hlist_del(&tr->hlist);
+	kfree(tr->fops);
 	kfree(tr);
 out:
 	mutex_unlock(&trampoline_mutex);
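With this change bpf_trampoline_update() recomputes tr->flags from scratch on every update: everything except SHARE_IPMODIFY is cleared, CALL_ORIG|SKIP_FRAME or RESTORE_REGS is chosen from the attached link kinds, and ORIG_STACK is forced whenever the trampoline shares its function with an IPMODIFY (livepatch) user. A sketch of that flag logic as a pure function — the bit values below are illustrative, not the kernel's real BPF_TRAMP_F_* encoding:

```c
#include <assert.h>

/* Hypothetical flag values; the real BPF_TRAMP_F_* constants live in
 * include/linux/bpf.h. */
#define F_RESTORE_REGS    (1U << 0)
#define F_CALL_ORIG       (1U << 1)
#define F_SKIP_FRAME      (1U << 2)
#define F_IP_ARG          (1U << 3)
#define F_SHARE_IPMODIFY  (1U << 4)
#define F_ORIG_STACK      (1U << 5)

/* Mirrors the recompute in bpf_trampoline_update(). */
static unsigned int recompute_tramp_flags(unsigned int flags,
                                          int have_fexit_or_modify_ret,
                                          int ip_arg)
{
        /* clear all bits except SHARE_IPMODIFY */
        flags &= F_SHARE_IPMODIFY;

        if (have_fexit_or_modify_ret)
                flags |= F_CALL_ORIG | F_SKIP_FRAME; /* never with RESTORE_REGS */
        else
                flags |= F_RESTORE_REGS;

        if (ip_arg)
                flags |= F_IP_ARG;

        /* sharing with an IPMODIFY user forces the slower ORIG_STACK exit */
        if ((flags & F_SHARE_IPMODIFY) && (flags & F_CALL_ORIG))
                flags |= F_ORIG_STACK;

        return flags;
}
```

Recomputing (rather than accumulating) is what makes the `goto again` retry safe: after the ftrace callback sets SHARE_IPMODIFY, one more pass yields a consistent flag set, and `orig_flags` restores the old state on error.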
+49 -40
kernel/bpf/verifier.c
···
 	       type == ARG_CONST_SIZE_OR_ZERO;
 }
 
-static bool arg_type_is_alloc_size(enum bpf_arg_type type)
-{
-	return type == ARG_CONST_ALLOC_SIZE_OR_ZERO;
-}
-
-static bool arg_type_is_int_ptr(enum bpf_arg_type type)
-{
-	return type == ARG_PTR_TO_INT ||
-	       type == ARG_PTR_TO_LONG;
-}
-
 static bool arg_type_is_release(enum bpf_arg_type type)
 {
 	return type & OBJ_RELEASE;
···
 		meta->ref_obj_id = reg->ref_obj_id;
 	}
 
-	if (arg_type == ARG_CONST_MAP_PTR) {
+	switch (base_type(arg_type)) {
+	case ARG_CONST_MAP_PTR:
 		/* bpf_map_xxx(map_ptr) call: remember that map_ptr */
 		if (meta->map_ptr) {
 			/* Use map_uid (which is unique id of inner map) to reject:
···
 		}
 		meta->map_ptr = reg->map_ptr;
 		meta->map_uid = reg->map_uid;
-	} else if (arg_type == ARG_PTR_TO_MAP_KEY) {
+		break;
+	case ARG_PTR_TO_MAP_KEY:
 		/* bpf_map_xxx(..., map_ptr, ..., key) call:
 		 * check that [key, key + map->key_size) are within
 		 * stack limits and initialized
···
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->key_size, false,
 					      NULL);
-	} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
+		break;
+	case ARG_PTR_TO_MAP_VALUE:
 		if (type_may_be_null(arg_type) && register_is_null(reg))
 			return 0;
 
···
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->value_size, false,
 					      meta);
-	} else if (arg_type == ARG_PTR_TO_PERCPU_BTF_ID) {
+		break;
+	case ARG_PTR_TO_PERCPU_BTF_ID:
 		if (!reg->btf_id) {
 			verbose(env, "Helper has invalid btf_id in R%d\n", regno);
 			return -EACCES;
 		}
 		meta->ret_btf = reg->btf;
 		meta->ret_btf_id = reg->btf_id;
-	} else if (arg_type == ARG_PTR_TO_SPIN_LOCK) {
+		break;
+	case ARG_PTR_TO_SPIN_LOCK:
 		if (meta->func_id == BPF_FUNC_spin_lock) {
 			if (process_spin_lock(env, regno, true))
 				return -EACCES;
···
 			verbose(env, "verifier internal error\n");
 			return -EFAULT;
 		}
-	} else if (arg_type == ARG_PTR_TO_TIMER) {
+		break;
+	case ARG_PTR_TO_TIMER:
 		if (process_timer_func(env, regno, meta))
 			return -EACCES;
-	} else if (arg_type == ARG_PTR_TO_FUNC) {
+		break;
+	case ARG_PTR_TO_FUNC:
 		meta->subprogno = reg->subprogno;
-	} else if (base_type(arg_type) == ARG_PTR_TO_MEM) {
+		break;
+	case ARG_PTR_TO_MEM:
 		/* The access to this pointer is only checked when we hit the
 		 * next is_mem_size argument below.
 		 */
···
 						      fn->arg_size[arg], false,
 						      meta);
 		}
-	} else if (arg_type_is_mem_size(arg_type)) {
-		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
-
-		err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
-	} else if (arg_type_is_dynptr(arg_type)) {
+		break;
+	case ARG_CONST_SIZE:
+		err = check_mem_size_reg(env, reg, regno, false, meta);
+		break;
+	case ARG_CONST_SIZE_OR_ZERO:
+		err = check_mem_size_reg(env, reg, regno, true, meta);
+		break;
+	case ARG_PTR_TO_DYNPTR:
 		if (arg_type & MEM_UNINIT) {
 			if (!is_dynptr_reg_valid_uninit(env, reg)) {
 				verbose(env, "Dynptr has to be an uninitialized dynptr\n");
···
 				err_extra, arg + 1);
 			return -EINVAL;
 		}
-	} else if (arg_type_is_alloc_size(arg_type)) {
+		break;
+	case ARG_CONST_ALLOC_SIZE_OR_ZERO:
 		if (!tnum_is_const(reg->var_off)) {
 			verbose(env, "R%d is not a known constant'\n",
 				regno);
 			return -EACCES;
 		}
 		meta->mem_size = reg->var_off.value;
-	} else if (arg_type_is_int_ptr(arg_type)) {
+		break;
+	case ARG_PTR_TO_INT:
+	case ARG_PTR_TO_LONG:
+	{
 		int size = int_ptr_type_to_size(arg_type);
 
 		err = check_helper_mem_access(env, regno, size, false, meta);
 		if (err)
 			return err;
 		err = check_ptr_alignment(env, reg, 0, size, true);
-	} else if (arg_type == ARG_PTR_TO_CONST_STR) {
+		break;
+	}
+	case ARG_PTR_TO_CONST_STR:
+	{
 		struct bpf_map *map = reg->map_ptr;
 		int map_off;
 		u64 map_addr;
···
 			verbose(env, "string is not zero-terminated\n");
 			return -EINVAL;
 		}
-	} else if (arg_type == ARG_PTR_TO_KPTR) {
+		break;
+	}
+	case ARG_PTR_TO_KPTR:
 		if (process_kptr_func(env, regno, meta))
 			return -EACCES;
+		break;
 	}
 
 	return err;
···
 static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			     int *insn_idx_p)
 {
+	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
 	const struct bpf_func_proto *fn = NULL;
 	enum bpf_return_type ret_type;
 	enum bpf_type_flag ret_flag;
···
 		}
 		break;
 	case BPF_FUNC_set_retval:
-		if (env->prog->expected_attach_type == BPF_LSM_CGROUP) {
+		if (prog_type == BPF_PROG_TYPE_LSM &&
+		    env->prog->expected_attach_type == BPF_LSM_CGROUP) {
 			if (!env->prog->aux->attach_func_proto->type) {
 				/* Make sure programs that attach to void
 				 * hooks don't try to modify return value.
···
 	int err, insn_idx = *insn_idx_p;
 	const struct btf_param *args;
 	struct btf *desc_btf;
+	u32 *kfunc_flags;
 	bool acq;
 
 	/* skip for now, but return error when we find this in fixup_kfunc_call */
···
 	func_name = btf_name_by_offset(desc_btf, func->name_off);
 	func_proto = btf_type_by_id(desc_btf, func->type);
 
-	if (!btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog),
-				       BTF_KFUNC_TYPE_CHECK, func_id)) {
+	kfunc_flags = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog), func_id);
+	if (!kfunc_flags) {
 		verbose(env, "calling kernel function %s is not allowed\n",
 			func_name);
 		return -EACCES;
 	}
-
-	acq = btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog),
-					BTF_KFUNC_TYPE_ACQUIRE, func_id);
+	acq = *kfunc_flags & KF_ACQUIRE;
 
 	/* Check the arguments */
-	err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs);
+	err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, *kfunc_flags);
 	if (err < 0)
 		return err;
 	/* In case of release function, we get register number of refcounted
···
 		regs[BPF_REG_0].btf = desc_btf;
 		regs[BPF_REG_0].type = PTR_TO_BTF_ID;
 		regs[BPF_REG_0].btf_id = ptr_type_id;
-		if (btf_kfunc_id_set_contains(desc_btf, resolve_prog_type(env->prog),
-					      BTF_KFUNC_TYPE_RET_NULL, func_id)) {
+		if (*kfunc_flags & KF_RET_NULL) {
 			regs[BPF_REG_0].type |= PTR_MAYBE_NULL;
 			/* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */
 			regs[BPF_REG_0].id = ++env->id_gen;
···
 	case BPF_PROG_TYPE_TRACEPOINT:
 	case BPF_PROG_TYPE_PERF_EVENT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
+	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
 		return true;
 	default:
 		return false;
···
 		/* Below members will be freed only at prog->aux */
 		func[i]->aux->btf = prog->aux->btf;
 		func[i]->aux->func_info = prog->aux->func_info;
+		func[i]->aux->func_info_cnt = prog->aux->func_info_cnt;
 		func[i]->aux->poke_tab = prog->aux->poke_tab;
 		func[i]->aux->size_poke_tab = prog->aux->size_poke_tab;
 
···
 			poke->aux = func[i]->aux;
 		}
 
-		/* Use bpf_prog_F_tag to indicate functions in stack traces.
-		 * Long term would need debug info to populate names
-		 */
 		func[i]->aux->name[0] = 'F';
 		func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
 		func[i]->jit_requested = 1;
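The check_func_arg() cleanup above replaces a long if/else-if ladder of predicate helpers with a single `switch (base_type(arg_type))`: an argument type's low bits select the case while its upper bits carry modifier flags (MEM_UNINIT, nullability, etc.). A tiny sketch of that dispatch pattern — the names and bit layout below are illustrative, not the verifier's actual encoding:

```c
#include <assert.h>

/* Illustrative argument types and bit layout (not the kernel's). */
enum { ARG_CONST_SIZE = 1, ARG_CONST_SIZE_OR_ZERO = 2, ARG_PTR_TO_MEM = 3 };
#define BASE_TYPE_MASK 0x00ffu   /* low bits: base type */
                                 /* high bits: modifier flags, ignored here */

static unsigned int base_type(unsigned int arg_type)
{
        return arg_type & BASE_TYPE_MASK;
}

/* Was: arg_type_is_mem_size() + a bool computed inside the branch.
 * Now: one case per base type, so the zero-size decision is explicit.
 * Returns 1/0 for size arguments, -1 for anything else. */
static int zero_size_allowed(unsigned int arg_type)
{
        switch (base_type(arg_type)) {
        case ARG_CONST_SIZE:
                return 0;        /* check_mem_size_reg(..., false, ...) */
        case ARG_CONST_SIZE_OR_ZERO:
                return 1;        /* check_mem_size_reg(..., true, ...) */
        default:
                return -1;       /* not a size argument */
        }
}
```

Besides readability, the switch lets the compiler warn about unhandled enum values, which the predicate-helper ladder could not.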
+91
kernel/kallsyms.c
···
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/bsearch.h>
+#include <linux/btf_ids.h>
 
 /*
  * These will be re-linked against their real values
···
 	.stop = s_stop,
 	.show = s_show
 };
+
+#ifdef CONFIG_BPF_SYSCALL
+
+struct bpf_iter__ksym {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct kallsym_iter *, ksym);
+};
+
+static int ksym_prog_seq_show(struct seq_file *m, bool in_stop)
+{
+	struct bpf_iter__ksym ctx;
+	struct bpf_iter_meta meta;
+	struct bpf_prog *prog;
+
+	meta.seq = m;
+	prog = bpf_iter_get_info(&meta, in_stop);
+	if (!prog)
+		return 0;
+
+	ctx.meta = &meta;
+	ctx.ksym = m ? m->private : NULL;
+	return bpf_iter_run_prog(prog, &ctx);
+}
+
+static int bpf_iter_ksym_seq_show(struct seq_file *m, void *p)
+{
+	return ksym_prog_seq_show(m, false);
+}
+
+static void bpf_iter_ksym_seq_stop(struct seq_file *m, void *p)
+{
+	if (!p)
+		(void) ksym_prog_seq_show(m, true);
+	else
+		s_stop(m, p);
+}
+
+static const struct seq_operations bpf_iter_ksym_ops = {
+	.start = s_start,
+	.next = s_next,
+	.stop = bpf_iter_ksym_seq_stop,
+	.show = bpf_iter_ksym_seq_show,
+};
+
+static int bpf_iter_ksym_init(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+	struct kallsym_iter *iter = priv_data;
+
+	reset_iter(iter, 0);
+
+	/* cache here as in kallsyms_open() case; use current process
+	 * credentials to tell BPF iterators if values should be shown.
+	 */
+	iter->show_value = kallsyms_show_value(current_cred());
+
+	return 0;
+}
+
+DEFINE_BPF_ITER_FUNC(ksym, struct bpf_iter_meta *meta, struct kallsym_iter *ksym)
+
+static const struct bpf_iter_seq_info ksym_iter_seq_info = {
+	.seq_ops		= &bpf_iter_ksym_ops,
+	.init_seq_private	= bpf_iter_ksym_init,
+	.fini_seq_private	= NULL,
+	.seq_priv_size		= sizeof(struct kallsym_iter),
+};
+
+static struct bpf_iter_reg ksym_iter_reg_info = {
+	.target			= "ksym",
+	.feature		= BPF_ITER_RESCHED,
+	.ctx_arg_info_size	= 1,
+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_iter__ksym, ksym),
+		  PTR_TO_BTF_ID_OR_NULL },
+	},
+	.seq_info		= &ksym_iter_seq_info,
+};
+
+BTF_ID_LIST(btf_ksym_iter_id)
+BTF_ID(struct, kallsym_iter)
+
+static int __init bpf_ksym_iter_register(void)
+{
+	ksym_iter_reg_info.ctx_arg_info[0].btf_id = *btf_ksym_iter_id;
+	return bpf_iter_reg_target(&ksym_iter_reg_info);
+}
+
+late_initcall(bpf_ksym_iter_register);
+
+#endif /* CONFIG_BPF_SYSCALL */
 
 static inline int kallsyms_for_perf(void)
 {
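A BPF program attached to this "ksym" iterator receives one `struct kallsym_iter` per symbol and typically bpf_seq_printf()s it in the familiar /proc/kallsyms layout (`<value> <type> <name> [module]`). Since a BPF program can't run here, a user-space sketch of parsing one such line as the iterator's seq file would emit it (field names are illustrative):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One record in /proc/kallsyms-style output, e.g.
 *   "ffffffff81000000 T _stext"
 * Field names are illustrative, not the kernel's kallsym_iter layout. */
struct ksym {
        unsigned long long value;   /* symbol address (0 if hidden) */
        char type;                  /* nm-style type character, e.g. 'T', 'd' */
        char name[128];             /* symbol name */
};

/* Parse a single line; returns 0 on success, -1 on malformed input. */
static int parse_ksym_line(const char *line, struct ksym *out)
{
        if (sscanf(line, "%llx %c %127s", &out->value, &out->type, out->name) != 3)
                return -1;
        return 0;
}
```

Note that, as in kallsyms_open(), the iterator caches `kallsyms_show_value(current_cred())` at attach time, so an unprivileged reader sees zeroed addresses rather than real ones.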
+278 -50
kernel/trace/ftrace.c
··· 1861 1861 ftrace_hash_rec_update_modify(ops, filter_hash, 1); 1862 1862 } 1863 1863 1864 + static bool ops_references_ip(struct ftrace_ops *ops, unsigned long ip); 1865 + 1864 1866 /* 1865 1867 * Try to update IPMODIFY flag on each ftrace_rec. Return 0 if it is OK 1866 1868 * or no-needed to update, -EBUSY if it detects a conflict of the flag ··· 1871 1869 * - If the hash is NULL, it hits all recs (if IPMODIFY is set, this is rejected) 1872 1870 * - If the hash is EMPTY_HASH, it hits nothing 1873 1871 * - Anything else hits the recs which match the hash entries. 1872 + * 1873 + * DIRECT ops does not have IPMODIFY flag, but we still need to check it 1874 + * against functions with FTRACE_FL_IPMODIFY. If there is any overlap, call 1875 + * ops_func(SHARE_IPMODIFY_SELF) to make sure current ops can share with 1876 + * IPMODIFY. If ops_func(SHARE_IPMODIFY_SELF) returns non-zero, propagate 1877 + * the return value to the caller and eventually to the owner of the DIRECT 1878 + * ops. 1874 1879 */ 1875 1880 static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops, 1876 1881 struct ftrace_hash *old_hash, ··· 1886 1877 struct ftrace_page *pg; 1887 1878 struct dyn_ftrace *rec, *end = NULL; 1888 1879 int in_old, in_new; 1880 + bool is_ipmodify, is_direct; 1889 1881 1890 1882 /* Only update if the ops has been registered */ 1891 1883 if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) 1892 1884 return 0; 1893 1885 1894 - if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY)) 1886 + is_ipmodify = ops->flags & FTRACE_OPS_FL_IPMODIFY; 1887 + is_direct = ops->flags & FTRACE_OPS_FL_DIRECT; 1888 + 1889 + /* neither IPMODIFY nor DIRECT, skip */ 1890 + if (!is_ipmodify && !is_direct) 1891 + return 0; 1892 + 1893 + if (WARN_ON_ONCE(is_ipmodify && is_direct)) 1895 1894 return 0; 1896 1895 1897 1896 /* 1898 - * Since the IPMODIFY is a very address sensitive action, we do not 1899 - * allow ftrace_ops to set all functions to new hash. 
1897 + * Since the IPMODIFY and DIRECT are very address sensitive 1898 + * actions, we do not allow ftrace_ops to set all functions to new 1899 + * hash. 1900 1900 */ 1901 1901 if (!new_hash || !old_hash) 1902 1902 return -EINVAL; ··· 1923 1905 continue; 1924 1906 1925 1907 if (in_new) { 1926 - /* New entries must ensure no others are using it */ 1927 - if (rec->flags & FTRACE_FL_IPMODIFY) 1928 - goto rollback; 1929 - rec->flags |= FTRACE_FL_IPMODIFY; 1930 - } else /* Removed entry */ 1908 + if (rec->flags & FTRACE_FL_IPMODIFY) { 1909 + int ret; 1910 + 1911 + /* Cannot have two ipmodify on same rec */ 1912 + if (is_ipmodify) 1913 + goto rollback; 1914 + 1915 + FTRACE_WARN_ON(rec->flags & FTRACE_FL_DIRECT); 1916 + 1917 + /* 1918 + * Another ops with IPMODIFY is already 1919 + * attached. We are now attaching a direct 1920 + * ops. Run SHARE_IPMODIFY_SELF, to check 1921 + * whether sharing is supported. 1922 + */ 1923 + if (!ops->ops_func) 1924 + return -EBUSY; 1925 + ret = ops->ops_func(ops, FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF); 1926 + if (ret) 1927 + return ret; 1928 + } else if (is_ipmodify) { 1929 + rec->flags |= FTRACE_FL_IPMODIFY; 1930 + } 1931 + } else if (is_ipmodify) { 1931 1932 rec->flags &= ~FTRACE_FL_IPMODIFY; 1933 + } 1932 1934 } while_for_each_ftrace_rec(); 1933 1935 1934 1936 return 0; ··· 2492 2454 2493 2455 struct ftrace_ops direct_ops = { 2494 2456 .func = call_direct_funcs, 2495 - .flags = FTRACE_OPS_FL_IPMODIFY 2496 - | FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS 2457 + .flags = FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS 2497 2458 | FTRACE_OPS_FL_PERMANENT, 2498 2459 /* 2499 2460 * By declaring the main trampoline as this trampoline ··· 3109 3072 } 3110 3073 3111 3074 /* 3112 - * Check if the current ops references the record. 3075 + * Check if the current ops references the given ip. 3113 3076 * 3114 3077 * If the ops traces all functions, then it was already accounted for. 
3115 3078 * If the ops does not trace the current record function, skip it. 3116 3079 * If the ops ignores the function via notrace filter, skip it. 3117 3080 */ 3118 - static inline bool 3119 - ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec) 3081 + static bool 3082 + ops_references_ip(struct ftrace_ops *ops, unsigned long ip) 3120 3083 { 3121 3084 /* If ops isn't enabled, ignore it */ 3122 3085 if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) ··· 3128 3091 3129 3092 /* The function must be in the filter */ 3130 3093 if (!ftrace_hash_empty(ops->func_hash->filter_hash) && 3131 - !__ftrace_lookup_ip(ops->func_hash->filter_hash, rec->ip)) 3094 + !__ftrace_lookup_ip(ops->func_hash->filter_hash, ip)) 3132 3095 return false; 3133 3096 3134 3097 /* If in notrace hash, we ignore it too */ 3135 - if (ftrace_lookup_ip(ops->func_hash->notrace_hash, rec->ip)) 3098 + if (ftrace_lookup_ip(ops->func_hash->notrace_hash, ip)) 3136 3099 return false; 3137 3100 3138 3101 return true; 3102 + } 3103 + 3104 + /* 3105 + * Check if the current ops references the record. 3106 + * 3107 + * If the ops traces all functions, then it was already accounted for. 3108 + * If the ops does not trace the current record function, skip it. 3109 + * If the ops ignores the function via notrace filter, skip it. 
3110 + */ 3111 + static bool 3112 + ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec) 3113 + { 3114 + return ops_references_ip(ops, rec->ip); 3139 3115 } 3140 3116 3141 3117 static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs) ··· 5265 5215 return direct; 5266 5216 } 5267 5217 5218 + static int register_ftrace_function_nolock(struct ftrace_ops *ops); 5219 + 5268 5220 /** 5269 5221 * register_ftrace_direct - Call a custom trampoline directly 5270 5222 * @ip: The address of the nop at the beginning of a function ··· 5338 5286 ret = ftrace_set_filter_ip(&direct_ops, ip, 0, 0); 5339 5287 5340 5288 if (!ret && !(direct_ops.flags & FTRACE_OPS_FL_ENABLED)) { 5341 - ret = register_ftrace_function(&direct_ops); 5289 + ret = register_ftrace_function_nolock(&direct_ops); 5342 5290 if (ret) 5343 5291 ftrace_set_filter_ip(&direct_ops, ip, 1, 0); 5344 5292 } ··· 5597 5545 } 5598 5546 EXPORT_SYMBOL_GPL(modify_ftrace_direct); 5599 5547 5600 - #define MULTI_FLAGS (FTRACE_OPS_FL_IPMODIFY | FTRACE_OPS_FL_DIRECT | \ 5601 - FTRACE_OPS_FL_SAVE_REGS) 5548 + #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS) 5602 5549 5603 5550 static int check_direct_multi(struct ftrace_ops *ops) 5604 5551 { ··· 5690 5639 ops->flags = MULTI_FLAGS; 5691 5640 ops->trampoline = FTRACE_REGS_ADDR; 5692 5641 5693 - err = register_ftrace_function(ops); 5642 + err = register_ftrace_function_nolock(ops); 5694 5643 5695 5644 out_remove: 5696 5645 if (err) ··· 5742 5691 } 5743 5692 EXPORT_SYMBOL_GPL(unregister_ftrace_direct_multi); 5744 5693 5745 - /** 5746 - * modify_ftrace_direct_multi - Modify an existing direct 'multi' call 5747 - * to call something else 5748 - * @ops: The address of the struct ftrace_ops object 5749 - * @addr: The address of the new trampoline to call at @ops functions 5750 - * 5751 - * This is used to unregister currently registered direct caller and 5752 - * register new one @addr on functions registered in @ops object. 
5753 - * 5754 - * Note there's window between ftrace_shutdown and ftrace_startup calls 5755 - * where there will be no callbacks called. 5756 - * 5757 - * Returns: zero on success. Non zero on error, which includes: 5758 - * -EINVAL - The @ops object was not properly registered. 5759 - */ 5760 - int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) 5694 + static int 5695 + __modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) 5761 5696 { 5762 5697 struct ftrace_hash *hash; 5763 5698 struct ftrace_func_entry *entry, *iter; ··· 5754 5717 int i, size; 5755 5718 int err; 5756 5719 5757 - if (check_direct_multi(ops)) 5758 - return -EINVAL; 5759 - if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) 5760 - return -EINVAL; 5761 - 5762 - mutex_lock(&direct_mutex); 5720 + lockdep_assert_held_once(&direct_mutex); 5763 5721 5764 5722 /* Enable the tmp_ops to have the same functions as the direct ops */ 5765 5723 ftrace_ops_init(&tmp_ops); 5766 5724 tmp_ops.func_hash = ops->func_hash; 5767 5725 5768 - err = register_ftrace_function(&tmp_ops); 5726 + err = register_ftrace_function_nolock(&tmp_ops); 5769 5727 if (err) 5770 - goto out_direct; 5728 + return err; 5771 5729 5772 5730 /* 5773 5731 * Now the ftrace_ops_list_func() is called to do the direct callers. ··· 5786 5754 /* Removing the tmp_ops will add the updated direct callers to the functions */ 5787 5755 unregister_ftrace_function(&tmp_ops); 5788 5756 5789 - out_direct: 5757 + return err; 5758 + } 5759 + 5760 + /** 5761 + * modify_ftrace_direct_multi_nolock - Modify an existing direct 'multi' call 5762 + * to call something else 5763 + * @ops: The address of the struct ftrace_ops object 5764 + * @addr: The address of the new trampoline to call at @ops functions 5765 + * 5766 + * This is used to unregister currently registered direct caller and 5767 + * register new one @addr on functions registered in @ops object. 
5768 + * 5769 + * Note there's window between ftrace_shutdown and ftrace_startup calls 5770 + * where there will be no callbacks called. 5771 + * 5772 + * Caller should already have direct_mutex locked, so we don't lock 5773 + * direct_mutex here. 5774 + * 5775 + * Returns: zero on success. Non zero on error, which includes: 5776 + * -EINVAL - The @ops object was not properly registered. 5777 + */ 5778 + int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr) 5779 + { 5780 + if (check_direct_multi(ops)) 5781 + return -EINVAL; 5782 + if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) 5783 + return -EINVAL; 5784 + 5785 + return __modify_ftrace_direct_multi(ops, addr); 5786 + } 5787 + EXPORT_SYMBOL_GPL(modify_ftrace_direct_multi_nolock); 5788 + 5789 + /** 5790 + * modify_ftrace_direct_multi - Modify an existing direct 'multi' call 5791 + * to call something else 5792 + * @ops: The address of the struct ftrace_ops object 5793 + * @addr: The address of the new trampoline to call at @ops functions 5794 + * 5795 + * This is used to unregister currently registered direct caller and 5796 + * register new one @addr on functions registered in @ops object. 5797 + * 5798 + * Note there's window between ftrace_shutdown and ftrace_startup calls 5799 + * where there will be no callbacks called. 5800 + * 5801 + * Returns: zero on success. Non zero on error, which includes: 5802 + * -EINVAL - The @ops object was not properly registered. 
5803 + */ 5804 + int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) 5805 + { 5806 + int err; 5807 + 5808 + if (check_direct_multi(ops)) 5809 + return -EINVAL; 5810 + if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) 5811 + return -EINVAL; 5812 + 5813 + mutex_lock(&direct_mutex); 5814 + err = __modify_ftrace_direct_multi(ops, addr); 5790 5815 mutex_unlock(&direct_mutex); 5791 5816 return err; 5792 5817 } ··· 8054 7965 return ftrace_disabled; 8055 7966 } 8056 7967 7968 + #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS 7969 + /* 7970 + * When registering ftrace_ops with IPMODIFY, it is necessary to make sure 7971 + * it doesn't conflict with any direct ftrace_ops. If there is existing 7972 + * direct ftrace_ops on a kernel function being patched, call 7973 + * FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER on it to enable sharing. 7974 + * 7975 + * @ops: ftrace_ops being registered. 7976 + * 7977 + * Returns: 7978 + * 0 on success; 7979 + * Negative on failure. 7980 + */ 7981 + static int prepare_direct_functions_for_ipmodify(struct ftrace_ops *ops) 7982 + { 7983 + struct ftrace_func_entry *entry; 7984 + struct ftrace_hash *hash; 7985 + struct ftrace_ops *op; 7986 + int size, i, ret; 7987 + 7988 + lockdep_assert_held_once(&direct_mutex); 7989 + 7990 + if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY)) 7991 + return 0; 7992 + 7993 + hash = ops->func_hash->filter_hash; 7994 + size = 1 << hash->size_bits; 7995 + for (i = 0; i < size; i++) { 7996 + hlist_for_each_entry(entry, &hash->buckets[i], hlist) { 7997 + unsigned long ip = entry->ip; 7998 + bool found_op = false; 7999 + 8000 + mutex_lock(&ftrace_lock); 8001 + do_for_each_ftrace_op(op, ftrace_ops_list) { 8002 + if (!(op->flags & FTRACE_OPS_FL_DIRECT)) 8003 + continue; 8004 + if (ops_references_ip(op, ip)) { 8005 + found_op = true; 8006 + break; 8007 + } 8008 + } while_for_each_ftrace_op(op); 8009 + mutex_unlock(&ftrace_lock); 8010 + 8011 + if (found_op) { 8012 + if (!op->ops_func) 8013 + return -EBUSY; 8014 + 
8015 + ret = op->ops_func(op, FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER); 8016 + if (ret) 8017 + return ret; 8018 + } 8019 + } 8020 + } 8021 + 8022 + return 0; 8023 + } 8024 + 8025 + /* 8026 + * Similar to prepare_direct_functions_for_ipmodify, clean up after ops 8027 + * with IPMODIFY is unregistered. The cleanup is optional for most DIRECT 8028 + * ops. 8029 + */ 8030 + static void cleanup_direct_functions_after_ipmodify(struct ftrace_ops *ops) 8031 + { 8032 + struct ftrace_func_entry *entry; 8033 + struct ftrace_hash *hash; 8034 + struct ftrace_ops *op; 8035 + int size, i; 8036 + 8037 + if (!(ops->flags & FTRACE_OPS_FL_IPMODIFY)) 8038 + return; 8039 + 8040 + mutex_lock(&direct_mutex); 8041 + 8042 + hash = ops->func_hash->filter_hash; 8043 + size = 1 << hash->size_bits; 8044 + for (i = 0; i < size; i++) { 8045 + hlist_for_each_entry(entry, &hash->buckets[i], hlist) { 8046 + unsigned long ip = entry->ip; 8047 + bool found_op = false; 8048 + 8049 + mutex_lock(&ftrace_lock); 8050 + do_for_each_ftrace_op(op, ftrace_ops_list) { 8051 + if (!(op->flags & FTRACE_OPS_FL_DIRECT)) 8052 + continue; 8053 + if (ops_references_ip(op, ip)) { 8054 + found_op = true; 8055 + break; 8056 + } 8057 + } while_for_each_ftrace_op(op); 8058 + mutex_unlock(&ftrace_lock); 8059 + 8060 + /* The cleanup is optional, ignore any errors */ 8061 + if (found_op && op->ops_func) 8062 + op->ops_func(op, FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER); 8063 + } 8064 + } 8065 + mutex_unlock(&direct_mutex); 8066 + } 8067 + 8068 + #define lock_direct_mutex() mutex_lock(&direct_mutex) 8069 + #define unlock_direct_mutex() mutex_unlock(&direct_mutex) 8070 + 8071 + #else /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */ 8072 + 8073 + static int prepare_direct_functions_for_ipmodify(struct ftrace_ops *ops) 8074 + { 8075 + return 0; 8076 + } 8077 + 8078 + static void cleanup_direct_functions_after_ipmodify(struct ftrace_ops *ops) 8079 + { 8080 + } 8081 + 8082 + #define lock_direct_mutex() do { } while (0) 8083 + 
#define unlock_direct_mutex() do { } while (0) 8084 + 8085 + #endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */ 8086 + 8087 + /* 8088 + * Similar to register_ftrace_function, except we don't lock direct_mutex. 8089 + */ 8090 + static int register_ftrace_function_nolock(struct ftrace_ops *ops) 8091 + { 8092 + int ret; 8093 + 8094 + ftrace_ops_init(ops); 8095 + 8096 + mutex_lock(&ftrace_lock); 8097 + 8098 + ret = ftrace_startup(ops, 0); 8099 + 8100 + mutex_unlock(&ftrace_lock); 8101 + 8102 + return ret; 8103 + } 8104 + 8057 8105 /** 8058 8106 * register_ftrace_function - register a function for profiling 8059 8107 * @ops: ops structure that holds the function for profiling. ··· 8206 7980 { 8207 7981 int ret; 8208 7982 8209 - ftrace_ops_init(ops); 7983 + lock_direct_mutex(); 7984 + ret = prepare_direct_functions_for_ipmodify(ops); 7985 + if (ret < 0) 7986 + goto out_unlock; 8210 7987 8211 - mutex_lock(&ftrace_lock); 7988 + ret = register_ftrace_function_nolock(ops); 8212 7989 8213 - ret = ftrace_startup(ops, 0); 8214 - 8215 - mutex_unlock(&ftrace_lock); 8216 - 7990 + out_unlock: 7991 + unlock_direct_mutex(); 8217 7992 return ret; 8218 7993 } 8219 7994 EXPORT_SYMBOL_GPL(register_ftrace_function); ··· 8233 8006 ret = ftrace_shutdown(ops, 0); 8234 8007 mutex_unlock(&ftrace_lock); 8235 8008 8009 + cleanup_direct_functions_after_ipmodify(ops); 8236 8010 return ret; 8237 8011 } 8238 8012 EXPORT_SYMBOL_GPL(unregister_ftrace_function);
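The scan in prepare_direct_functions_for_ipmodify() above can be sketched in userspace. This is a simulation under simplified, hypothetical types (`sim_ops`, the `FL_*` flags) rather than the kernel API: an ops with IPMODIFY may only be registered if every DIRECT ops covering the same address exposes an `ops_func` callback that agrees to share.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace simulation of the IPMODIFY vs. DIRECT conflict scan.
 * All types and flag values are hypothetical stand-ins. */
#define FL_DIRECT   0x1
#define FL_IPMODIFY 0x2

enum cmd { CMD_ENABLE_SHARE_IPMODIFY_PEER };

struct sim_ops {
	unsigned int flags;
	unsigned long ips[4];	/* stands in for the filter hash */
	int nips;
	int (*ops_func)(struct sim_ops *op, enum cmd c);
};

static struct sim_ops *ops_list[8];
static int nops;

static int references_ip(const struct sim_ops *op, unsigned long ip)
{
	for (int i = 0; i < op->nips; i++)
		if (op->ips[i] == ip)
			return 1;
	return 0;
}

/* 0 on success; -16 (-EBUSY) if a DIRECT ops on the same ip cannot share. */
int prepare_for_ipmodify(struct sim_ops *ops)
{
	if (!(ops->flags & FL_IPMODIFY))
		return 0;
	for (int i = 0; i < ops->nips; i++) {
		for (int j = 0; j < nops; j++) {
			struct sim_ops *op = ops_list[j];

			if (!(op->flags & FL_DIRECT) ||
			    !references_ip(op, ops->ips[i]))
				continue;
			if (!op->ops_func)
				return -16;
			int ret = op->ops_func(op, CMD_ENABLE_SHARE_IPMODIFY_PEER);
			if (ret)
				return ret;
		}
	}
	return 0;
}

static int share_ok(struct sim_ops *op, enum cmd c)
{
	(void)op; (void)c;
	return 0;	/* this trampoline agrees to share with an IPMODIFY peer */
}

static struct sim_ops direct_sharable = { FL_DIRECT, { 0x1000 }, 1, share_ok };
static struct sim_ops direct_rigid    = { FL_DIRECT, { 0x2000 }, 1, NULL };
static struct sim_ops klp_a = { FL_IPMODIFY, { 0x1000 }, 1, NULL };
static struct sim_ops klp_b = { FL_IPMODIFY, { 0x2000 }, 1, NULL };

void sim_setup(void)
{
	ops_list[0] = &direct_sharable;
	ops_list[1] = &direct_rigid;
	nops = 2;
}
```

The second half of the real patch (cleanup_direct_functions_after_ipmodify) is the mirror image: the same walk, issuing the DISABLE command and ignoring errors.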
+30 -48
net/bpf/test_run.c
··· 691 691 { 692 692 } 693 693 694 + noinline void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p) 695 + { 696 + } 697 + 694 698 __diag_pop(); 695 699 696 700 ALLOW_ERROR_INJECTION(bpf_modify_return_test, ERRNO); 697 701 698 - BTF_SET_START(test_sk_check_kfunc_ids) 699 - BTF_ID(func, bpf_kfunc_call_test1) 700 - BTF_ID(func, bpf_kfunc_call_test2) 701 - BTF_ID(func, bpf_kfunc_call_test3) 702 - BTF_ID(func, bpf_kfunc_call_test_acquire) 703 - BTF_ID(func, bpf_kfunc_call_memb_acquire) 704 - BTF_ID(func, bpf_kfunc_call_test_release) 705 - BTF_ID(func, bpf_kfunc_call_memb_release) 706 - BTF_ID(func, bpf_kfunc_call_memb1_release) 707 - BTF_ID(func, bpf_kfunc_call_test_kptr_get) 708 - BTF_ID(func, bpf_kfunc_call_test_pass_ctx) 709 - BTF_ID(func, bpf_kfunc_call_test_pass1) 710 - BTF_ID(func, bpf_kfunc_call_test_pass2) 711 - BTF_ID(func, bpf_kfunc_call_test_fail1) 712 - BTF_ID(func, bpf_kfunc_call_test_fail2) 713 - BTF_ID(func, bpf_kfunc_call_test_fail3) 714 - BTF_ID(func, bpf_kfunc_call_test_mem_len_pass1) 715 - BTF_ID(func, bpf_kfunc_call_test_mem_len_fail1) 716 - BTF_ID(func, bpf_kfunc_call_test_mem_len_fail2) 717 - BTF_SET_END(test_sk_check_kfunc_ids) 718 - 719 - BTF_SET_START(test_sk_acquire_kfunc_ids) 720 - BTF_ID(func, bpf_kfunc_call_test_acquire) 721 - BTF_ID(func, bpf_kfunc_call_memb_acquire) 722 - BTF_ID(func, bpf_kfunc_call_test_kptr_get) 723 - BTF_SET_END(test_sk_acquire_kfunc_ids) 724 - 725 - BTF_SET_START(test_sk_release_kfunc_ids) 726 - BTF_ID(func, bpf_kfunc_call_test_release) 727 - BTF_ID(func, bpf_kfunc_call_memb_release) 728 - BTF_ID(func, bpf_kfunc_call_memb1_release) 729 - BTF_SET_END(test_sk_release_kfunc_ids) 730 - 731 - BTF_SET_START(test_sk_ret_null_kfunc_ids) 732 - BTF_ID(func, bpf_kfunc_call_test_acquire) 733 - BTF_ID(func, bpf_kfunc_call_memb_acquire) 734 - BTF_ID(func, bpf_kfunc_call_test_kptr_get) 735 - BTF_SET_END(test_sk_ret_null_kfunc_ids) 736 - 737 - BTF_SET_START(test_sk_kptr_acquire_kfunc_ids) 738 - BTF_ID(func, 
bpf_kfunc_call_test_kptr_get) 739 - BTF_SET_END(test_sk_kptr_acquire_kfunc_ids) 702 + BTF_SET8_START(test_sk_check_kfunc_ids) 703 + BTF_ID_FLAGS(func, bpf_kfunc_call_test1) 704 + BTF_ID_FLAGS(func, bpf_kfunc_call_test2) 705 + BTF_ID_FLAGS(func, bpf_kfunc_call_test3) 706 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_acquire, KF_ACQUIRE | KF_RET_NULL) 707 + BTF_ID_FLAGS(func, bpf_kfunc_call_memb_acquire, KF_ACQUIRE | KF_RET_NULL) 708 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_release, KF_RELEASE) 709 + BTF_ID_FLAGS(func, bpf_kfunc_call_memb_release, KF_RELEASE) 710 + BTF_ID_FLAGS(func, bpf_kfunc_call_memb1_release, KF_RELEASE) 711 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_kptr_get, KF_ACQUIRE | KF_RET_NULL | KF_KPTR_GET) 712 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass_ctx) 713 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass1) 714 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass2) 715 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail1) 716 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail2) 717 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail3) 718 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1) 719 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1) 720 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2) 721 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS) 722 + BTF_SET8_END(test_sk_check_kfunc_ids) 740 723 741 724 static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size, 742 725 u32 size, u32 headroom, u32 tailroom) ··· 937 954 static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb) 938 955 { 939 956 struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb; 957 + 958 + if (!skb->len) 959 + return -EINVAL; 940 960 941 961 if (!__skb) 942 962 return 0; ··· 1603 1617 } 1604 1618 1605 1619 static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = { 1606 - .owner = THIS_MODULE, 1607 - .check_set = &test_sk_check_kfunc_ids, 1608 - .acquire_set = &test_sk_acquire_kfunc_ids, 1609 - .release_set = &test_sk_release_kfunc_ids, 
1610 - .ret_null_set = &test_sk_ret_null_kfunc_ids, 1611 - .kptr_acquire_set = &test_sk_kptr_acquire_kfunc_ids 1620 + .owner = THIS_MODULE, 1621 + .set = &test_sk_check_kfunc_ids, 1612 1622 }; 1613 1623 1614 1624 BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids)
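The conversion in this file (and the bpf_tcp_ca/tcp_bbr/tcp_cubic/tcp_dctcp files below) replaces four parallel BTF ID sets with one set of (id, flags) pairs. A plain-C sketch of that data-model change, with made-up ids and illustrative flag values rather than the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* One table of (BTF id, flags) pairs replaces the old parallel
 * check/acquire/release/ret_null sets. Ids and flag bits are invented. */
#define KF_ACQUIRE  (1 << 0)
#define KF_RELEASE  (1 << 1)
#define KF_RET_NULL (1 << 2)

struct id8_pair { uint32_t id; uint32_t flags; };

static const struct id8_pair test_set[] = {
	{ 101, 0 },				/* plain kfunc */
	{ 102, KF_ACQUIRE | KF_RET_NULL },	/* e.g. a *_lookup kfunc */
	{ 103, KF_RELEASE },			/* e.g. a *_release kfunc */
};

/* Return the flags for a kfunc id, or -1 if the id is not in the set. */
long kfunc_flags(uint32_t id)
{
	for (unsigned int i = 0; i < sizeof(test_set) / sizeof(test_set[0]); i++)
		if (test_set[i].id == id)
			return test_set[i].flags;
	return -1;
}
```

One lookup now answers both "is this kfunc allowed?" and "what are its semantics?", which is why the btf_kfunc_id_set shrinks to a single `.set` member.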
+1
net/core/dev.c
···
4168 4168 	bool again = false;
4169 4169 
4170 4170 	skb_reset_mac_header(skb);
4171      +	skb_assert_len(skb);
4171 4172 
4172 4173 	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
4173 4174 		__skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
+2 -2
net/core/filter.c
···
237 237 BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *,
238 238 	   data, int, headlen, int, offset)
239 239 {
240     -	u16 tmp, *ptr;
240     +	__be16 tmp, *ptr;
241 241 	const int len = sizeof(tmp);
242 242 
243 243 	if (offset >= 0) {
···
264 264 BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *,
265 265 	   data, int, headlen, int, offset)
266 266 {
267     -	u32 tmp, *ptr;
267     +	__be32 tmp, *ptr;
268 268 	const int len = sizeof(tmp);
269 269 
270 270 	if (likely(offset >= 0)) {
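The filter.c change above is a sparse-annotation fix: bytes copied straight out of a packet are network (big) endian, so the temporaries should be `__be16`/`__be32` and pass through ntohs()/ntohl() exactly once. A plain-C sketch of the same access pattern (the helper name and buffer here are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Load a 16-bit big-endian field from a packet buffer and return it in
 * host byte order. The temporary plays the role of a __be16. */
uint16_t load_be16(const unsigned char *pkt, int offset)
{
	uint16_t tmp;			/* network byte order until converted */

	memcpy(&tmp, pkt + offset, sizeof(tmp));
	return ntohs(tmp);		/* convert to host order once */
}

static const unsigned char sample_pkt[4] = { 0x12, 0x34, 0xab, 0xcd };
```

The result is the same on little- and big-endian hosts, which is the point: typing the temporary correctly lets sparse flag any path where the conversion is missing or doubled.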
+2 -2
net/core/skmsg.c
···
462 462 
463 463 		if (copied == len)
464 464 			break;
465     -	} while (i != msg_rx->sg.end);
465     +	} while (!sg_is_last(sge));
466 466 
467 467 	if (unlikely(peek)) {
468 468 		msg_rx = sk_psock_next_msg(psock, msg_rx);
···
472 472 	}
473 473 
474 474 	msg_rx->sg.start = i;
475     -	if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) {
475     +	if (!sge->length && sg_is_last(sge)) {
476 476 		msg_rx = sk_psock_dequeue_msg(psock);
477 477 		kfree_sk_msg(msg_rx);
478 478 	}
```
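The skmsg.c fix switches the loop exit from comparing a ring index against sg.end to asking the entry itself whether it is the last one. A minimal sketch of that termination style, under simplified types (`sim_sge` is a stand-in, not the kernel's struct scatterlist):

```c
#include <assert.h>

/* Each entry can carry an "end" marker, so walkers check the entry
 * instead of an external index. */
struct sim_sge { int length; int is_last; };

/* Sum the bytes in a chain, stopping at the marked entry. */
int chain_bytes(const struct sim_sge *sge)
{
	int total = 0;

	for (;;) {
		total += sge->length;
		if (sge->is_last)	/* stands in for sg_is_last(sge) */
			break;
		sge++;
	}
	return total;
}

/* The trailing {999, 0} entry is garbage past the marked end and must
 * never be visited. */
static const struct sim_sge chain[] = {
	{ 100, 0 }, { 50, 0 }, { 25, 1 }, { 999, 0 },
};
```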
+9 -9
net/ipv4/bpf_tcp_ca.c
··· 197 197 } 198 198 } 199 199 200 - BTF_SET_START(bpf_tcp_ca_check_kfunc_ids) 201 - BTF_ID(func, tcp_reno_ssthresh) 202 - BTF_ID(func, tcp_reno_cong_avoid) 203 - BTF_ID(func, tcp_reno_undo_cwnd) 204 - BTF_ID(func, tcp_slow_start) 205 - BTF_ID(func, tcp_cong_avoid_ai) 206 - BTF_SET_END(bpf_tcp_ca_check_kfunc_ids) 200 + BTF_SET8_START(bpf_tcp_ca_check_kfunc_ids) 201 + BTF_ID_FLAGS(func, tcp_reno_ssthresh) 202 + BTF_ID_FLAGS(func, tcp_reno_cong_avoid) 203 + BTF_ID_FLAGS(func, tcp_reno_undo_cwnd) 204 + BTF_ID_FLAGS(func, tcp_slow_start) 205 + BTF_ID_FLAGS(func, tcp_cong_avoid_ai) 206 + BTF_SET8_END(bpf_tcp_ca_check_kfunc_ids) 207 207 208 208 static const struct btf_kfunc_id_set bpf_tcp_ca_kfunc_set = { 209 - .owner = THIS_MODULE, 210 - .check_set = &bpf_tcp_ca_check_kfunc_ids, 209 + .owner = THIS_MODULE, 210 + .set = &bpf_tcp_ca_check_kfunc_ids, 211 211 }; 212 212 213 213 static const struct bpf_verifier_ops bpf_tcp_ca_verifier_ops = {
+12 -12
net/ipv4/tcp_bbr.c
··· 1154 1154 .set_state = bbr_set_state, 1155 1155 }; 1156 1156 1157 - BTF_SET_START(tcp_bbr_check_kfunc_ids) 1157 + BTF_SET8_START(tcp_bbr_check_kfunc_ids) 1158 1158 #ifdef CONFIG_X86 1159 1159 #ifdef CONFIG_DYNAMIC_FTRACE 1160 - BTF_ID(func, bbr_init) 1161 - BTF_ID(func, bbr_main) 1162 - BTF_ID(func, bbr_sndbuf_expand) 1163 - BTF_ID(func, bbr_undo_cwnd) 1164 - BTF_ID(func, bbr_cwnd_event) 1165 - BTF_ID(func, bbr_ssthresh) 1166 - BTF_ID(func, bbr_min_tso_segs) 1167 - BTF_ID(func, bbr_set_state) 1160 + BTF_ID_FLAGS(func, bbr_init) 1161 + BTF_ID_FLAGS(func, bbr_main) 1162 + BTF_ID_FLAGS(func, bbr_sndbuf_expand) 1163 + BTF_ID_FLAGS(func, bbr_undo_cwnd) 1164 + BTF_ID_FLAGS(func, bbr_cwnd_event) 1165 + BTF_ID_FLAGS(func, bbr_ssthresh) 1166 + BTF_ID_FLAGS(func, bbr_min_tso_segs) 1167 + BTF_ID_FLAGS(func, bbr_set_state) 1168 1168 #endif 1169 1169 #endif 1170 - BTF_SET_END(tcp_bbr_check_kfunc_ids) 1170 + BTF_SET8_END(tcp_bbr_check_kfunc_ids) 1171 1171 1172 1172 static const struct btf_kfunc_id_set tcp_bbr_kfunc_set = { 1173 - .owner = THIS_MODULE, 1174 - .check_set = &tcp_bbr_check_kfunc_ids, 1173 + .owner = THIS_MODULE, 1174 + .set = &tcp_bbr_check_kfunc_ids, 1175 1175 }; 1176 1176 1177 1177 static int __init bbr_register(void)
+10 -10
net/ipv4/tcp_cubic.c
··· 485 485 .name = "cubic", 486 486 }; 487 487 488 - BTF_SET_START(tcp_cubic_check_kfunc_ids) 488 + BTF_SET8_START(tcp_cubic_check_kfunc_ids) 489 489 #ifdef CONFIG_X86 490 490 #ifdef CONFIG_DYNAMIC_FTRACE 491 - BTF_ID(func, cubictcp_init) 492 - BTF_ID(func, cubictcp_recalc_ssthresh) 493 - BTF_ID(func, cubictcp_cong_avoid) 494 - BTF_ID(func, cubictcp_state) 495 - BTF_ID(func, cubictcp_cwnd_event) 496 - BTF_ID(func, cubictcp_acked) 491 + BTF_ID_FLAGS(func, cubictcp_init) 492 + BTF_ID_FLAGS(func, cubictcp_recalc_ssthresh) 493 + BTF_ID_FLAGS(func, cubictcp_cong_avoid) 494 + BTF_ID_FLAGS(func, cubictcp_state) 495 + BTF_ID_FLAGS(func, cubictcp_cwnd_event) 496 + BTF_ID_FLAGS(func, cubictcp_acked) 497 497 #endif 498 498 #endif 499 - BTF_SET_END(tcp_cubic_check_kfunc_ids) 499 + BTF_SET8_END(tcp_cubic_check_kfunc_ids) 500 500 501 501 static const struct btf_kfunc_id_set tcp_cubic_kfunc_set = { 502 - .owner = THIS_MODULE, 503 - .check_set = &tcp_cubic_check_kfunc_ids, 502 + .owner = THIS_MODULE, 503 + .set = &tcp_cubic_check_kfunc_ids, 504 504 }; 505 505 506 506 static int __init cubictcp_register(void)
+10 -10
net/ipv4/tcp_dctcp.c
··· 239 239 .name = "dctcp-reno", 240 240 }; 241 241 242 - BTF_SET_START(tcp_dctcp_check_kfunc_ids) 242 + BTF_SET8_START(tcp_dctcp_check_kfunc_ids) 243 243 #ifdef CONFIG_X86 244 244 #ifdef CONFIG_DYNAMIC_FTRACE 245 - BTF_ID(func, dctcp_init) 246 - BTF_ID(func, dctcp_update_alpha) 247 - BTF_ID(func, dctcp_cwnd_event) 248 - BTF_ID(func, dctcp_ssthresh) 249 - BTF_ID(func, dctcp_cwnd_undo) 250 - BTF_ID(func, dctcp_state) 245 + BTF_ID_FLAGS(func, dctcp_init) 246 + BTF_ID_FLAGS(func, dctcp_update_alpha) 247 + BTF_ID_FLAGS(func, dctcp_cwnd_event) 248 + BTF_ID_FLAGS(func, dctcp_ssthresh) 249 + BTF_ID_FLAGS(func, dctcp_cwnd_undo) 250 + BTF_ID_FLAGS(func, dctcp_state) 251 251 #endif 252 252 #endif 253 - BTF_SET_END(tcp_dctcp_check_kfunc_ids) 253 + BTF_SET8_END(tcp_dctcp_check_kfunc_ids) 254 254 255 255 static const struct btf_kfunc_id_set tcp_dctcp_kfunc_set = { 256 - .owner = THIS_MODULE, 257 - .check_set = &tcp_dctcp_check_kfunc_ids, 256 + .owner = THIS_MODULE, 257 + .set = &tcp_dctcp_check_kfunc_ids, 258 258 }; 259 259 260 260 static int __init dctcp_register(void)
+278 -91
net/netfilter/nf_conntrack_bpf.c
··· 55 55 NF_BPF_CT_OPTS_SZ = 12, 56 56 }; 57 57 58 + static int bpf_nf_ct_tuple_parse(struct bpf_sock_tuple *bpf_tuple, 59 + u32 tuple_len, u8 protonum, u8 dir, 60 + struct nf_conntrack_tuple *tuple) 61 + { 62 + union nf_inet_addr *src = dir ? &tuple->dst.u3 : &tuple->src.u3; 63 + union nf_inet_addr *dst = dir ? &tuple->src.u3 : &tuple->dst.u3; 64 + union nf_conntrack_man_proto *sport = dir ? (void *)&tuple->dst.u 65 + : &tuple->src.u; 66 + union nf_conntrack_man_proto *dport = dir ? &tuple->src.u 67 + : (void *)&tuple->dst.u; 68 + 69 + if (unlikely(protonum != IPPROTO_TCP && protonum != IPPROTO_UDP)) 70 + return -EPROTO; 71 + 72 + memset(tuple, 0, sizeof(*tuple)); 73 + 74 + switch (tuple_len) { 75 + case sizeof(bpf_tuple->ipv4): 76 + tuple->src.l3num = AF_INET; 77 + src->ip = bpf_tuple->ipv4.saddr; 78 + sport->tcp.port = bpf_tuple->ipv4.sport; 79 + dst->ip = bpf_tuple->ipv4.daddr; 80 + dport->tcp.port = bpf_tuple->ipv4.dport; 81 + break; 82 + case sizeof(bpf_tuple->ipv6): 83 + tuple->src.l3num = AF_INET6; 84 + memcpy(src->ip6, bpf_tuple->ipv6.saddr, sizeof(bpf_tuple->ipv6.saddr)); 85 + sport->tcp.port = bpf_tuple->ipv6.sport; 86 + memcpy(dst->ip6, bpf_tuple->ipv6.daddr, sizeof(bpf_tuple->ipv6.daddr)); 87 + dport->tcp.port = bpf_tuple->ipv6.dport; 88 + break; 89 + default: 90 + return -EAFNOSUPPORT; 91 + } 92 + tuple->dst.protonum = protonum; 93 + tuple->dst.dir = dir; 94 + 95 + return 0; 96 + } 97 + 98 + static struct nf_conn * 99 + __bpf_nf_ct_alloc_entry(struct net *net, struct bpf_sock_tuple *bpf_tuple, 100 + u32 tuple_len, struct bpf_ct_opts *opts, u32 opts_len, 101 + u32 timeout) 102 + { 103 + struct nf_conntrack_tuple otuple, rtuple; 104 + struct nf_conn *ct; 105 + int err; 106 + 107 + if (!opts || !bpf_tuple || opts->reserved[0] || opts->reserved[1] || 108 + opts_len != NF_BPF_CT_OPTS_SZ) 109 + return ERR_PTR(-EINVAL); 110 + 111 + if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS)) 112 + return ERR_PTR(-EINVAL); 113 + 114 + err = 
bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto, 115 + IP_CT_DIR_ORIGINAL, &otuple); 116 + if (err < 0) 117 + return ERR_PTR(err); 118 + 119 + err = bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto, 120 + IP_CT_DIR_REPLY, &rtuple); 121 + if (err < 0) 122 + return ERR_PTR(err); 123 + 124 + if (opts->netns_id >= 0) { 125 + net = get_net_ns_by_id(net, opts->netns_id); 126 + if (unlikely(!net)) 127 + return ERR_PTR(-ENONET); 128 + } 129 + 130 + ct = nf_conntrack_alloc(net, &nf_ct_zone_dflt, &otuple, &rtuple, 131 + GFP_ATOMIC); 132 + if (IS_ERR(ct)) 133 + goto out; 134 + 135 + memset(&ct->proto, 0, sizeof(ct->proto)); 136 + __nf_ct_set_timeout(ct, timeout * HZ); 137 + ct->status |= IPS_CONFIRMED; 138 + 139 + out: 140 + if (opts->netns_id >= 0) 141 + put_net(net); 142 + 143 + return ct; 144 + } 145 + 58 146 static struct nf_conn *__bpf_nf_ct_lookup(struct net *net, 59 147 struct bpf_sock_tuple *bpf_tuple, 60 - u32 tuple_len, u8 protonum, 61 - s32 netns_id, u8 *dir) 148 + u32 tuple_len, struct bpf_ct_opts *opts, 149 + u32 opts_len) 62 150 { 63 151 struct nf_conntrack_tuple_hash *hash; 64 152 struct nf_conntrack_tuple tuple; 65 153 struct nf_conn *ct; 154 + int err; 66 155 67 - if (unlikely(protonum != IPPROTO_TCP && protonum != IPPROTO_UDP)) 156 + if (!opts || !bpf_tuple || opts->reserved[0] || opts->reserved[1] || 157 + opts_len != NF_BPF_CT_OPTS_SZ) 158 + return ERR_PTR(-EINVAL); 159 + if (unlikely(opts->l4proto != IPPROTO_TCP && opts->l4proto != IPPROTO_UDP)) 68 160 return ERR_PTR(-EPROTO); 69 - if (unlikely(netns_id < BPF_F_CURRENT_NETNS)) 161 + if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS)) 70 162 return ERR_PTR(-EINVAL); 71 163 72 - memset(&tuple, 0, sizeof(tuple)); 73 - switch (tuple_len) { 74 - case sizeof(bpf_tuple->ipv4): 75 - tuple.src.l3num = AF_INET; 76 - tuple.src.u3.ip = bpf_tuple->ipv4.saddr; 77 - tuple.src.u.tcp.port = bpf_tuple->ipv4.sport; 78 - tuple.dst.u3.ip = bpf_tuple->ipv4.daddr; 79 - tuple.dst.u.tcp.port = 
bpf_tuple->ipv4.dport; 80 - break; 81 - case sizeof(bpf_tuple->ipv6): 82 - tuple.src.l3num = AF_INET6; 83 - memcpy(tuple.src.u3.ip6, bpf_tuple->ipv6.saddr, sizeof(bpf_tuple->ipv6.saddr)); 84 - tuple.src.u.tcp.port = bpf_tuple->ipv6.sport; 85 - memcpy(tuple.dst.u3.ip6, bpf_tuple->ipv6.daddr, sizeof(bpf_tuple->ipv6.daddr)); 86 - tuple.dst.u.tcp.port = bpf_tuple->ipv6.dport; 87 - break; 88 - default: 89 - return ERR_PTR(-EAFNOSUPPORT); 90 - } 164 + err = bpf_nf_ct_tuple_parse(bpf_tuple, tuple_len, opts->l4proto, 165 + IP_CT_DIR_ORIGINAL, &tuple); 166 + if (err < 0) 167 + return ERR_PTR(err); 91 168 92 - tuple.dst.protonum = protonum; 93 - 94 - if (netns_id >= 0) { 95 - net = get_net_ns_by_id(net, netns_id); 169 + if (opts->netns_id >= 0) { 170 + net = get_net_ns_by_id(net, opts->netns_id); 96 171 if (unlikely(!net)) 97 172 return ERR_PTR(-ENONET); 98 173 } 99 174 100 175 hash = nf_conntrack_find_get(net, &nf_ct_zone_dflt, &tuple); 101 - if (netns_id >= 0) 176 + if (opts->netns_id >= 0) 102 177 put_net(net); 103 178 if (!hash) 104 179 return ERR_PTR(-ENOENT); 105 180 106 181 ct = nf_ct_tuplehash_to_ctrack(hash); 107 - if (dir) 108 - *dir = NF_CT_DIRECTION(hash); 182 + opts->dir = NF_CT_DIRECTION(hash); 109 183 110 184 return ct; 111 185 } ··· 187 113 __diag_push(); 188 114 __diag_ignore_all("-Wmissing-prototypes", 189 115 "Global functions as their definitions will be in nf_conntrack BTF"); 116 + 117 + struct nf_conn___init { 118 + struct nf_conn ct; 119 + }; 120 + 121 + /* bpf_xdp_ct_alloc - Allocate a new CT entry 122 + * 123 + * Parameters: 124 + * @xdp_ctx - Pointer to ctx (xdp_md) in XDP program 125 + * Cannot be NULL 126 + * @bpf_tuple - Pointer to memory representing the tuple to look up 127 + * Cannot be NULL 128 + * @tuple__sz - Length of the tuple structure 129 + * Must be one of sizeof(bpf_tuple->ipv4) or 130 + * sizeof(bpf_tuple->ipv6) 131 + * @opts - Additional options for allocation (documented above) 132 + * Cannot be NULL 133 + * @opts__sz - Length of 
the bpf_ct_opts structure 134 + * Must be NF_BPF_CT_OPTS_SZ (12) 135 + */ 136 + struct nf_conn___init * 137 + bpf_xdp_ct_alloc(struct xdp_md *xdp_ctx, struct bpf_sock_tuple *bpf_tuple, 138 + u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz) 139 + { 140 + struct xdp_buff *ctx = (struct xdp_buff *)xdp_ctx; 141 + struct nf_conn *nfct; 142 + 143 + nfct = __bpf_nf_ct_alloc_entry(dev_net(ctx->rxq->dev), bpf_tuple, tuple__sz, 144 + opts, opts__sz, 10); 145 + if (IS_ERR(nfct)) { 146 + if (opts) 147 + opts->error = PTR_ERR(nfct); 148 + return NULL; 149 + } 150 + 151 + return (struct nf_conn___init *)nfct; 152 + } 190 153 191 154 /* bpf_xdp_ct_lookup - Lookup CT entry for the given tuple, and acquire a 192 155 * reference to it ··· 249 138 struct net *caller_net; 250 139 struct nf_conn *nfct; 251 140 252 - BUILD_BUG_ON(sizeof(struct bpf_ct_opts) != NF_BPF_CT_OPTS_SZ); 253 - 254 - if (!opts) 255 - return NULL; 256 - if (!bpf_tuple || opts->reserved[0] || opts->reserved[1] || 257 - opts__sz != NF_BPF_CT_OPTS_SZ) { 258 - opts->error = -EINVAL; 259 - return NULL; 260 - } 261 141 caller_net = dev_net(ctx->rxq->dev); 262 - nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts->l4proto, 263 - opts->netns_id, &opts->dir); 142 + nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz); 264 143 if (IS_ERR(nfct)) { 265 - opts->error = PTR_ERR(nfct); 144 + if (opts) 145 + opts->error = PTR_ERR(nfct); 266 146 return NULL; 267 147 } 268 148 return nfct; 149 + } 150 + 151 + /* bpf_skb_ct_alloc - Allocate a new CT entry 152 + * 153 + * Parameters: 154 + * @skb_ctx - Pointer to ctx (__sk_buff) in TC program 155 + * Cannot be NULL 156 + * @bpf_tuple - Pointer to memory representing the tuple to look up 157 + * Cannot be NULL 158 + * @tuple__sz - Length of the tuple structure 159 + * Must be one of sizeof(bpf_tuple->ipv4) or 160 + * sizeof(bpf_tuple->ipv6) 161 + * @opts - Additional options for allocation (documented above) 162 + * Cannot be NULL 163 + * 
@opts__sz - Length of the bpf_ct_opts structure 164 + * Must be NF_BPF_CT_OPTS_SZ (12) 165 + */ 166 + struct nf_conn___init * 167 + bpf_skb_ct_alloc(struct __sk_buff *skb_ctx, struct bpf_sock_tuple *bpf_tuple, 168 + u32 tuple__sz, struct bpf_ct_opts *opts, u32 opts__sz) 169 + { 170 + struct sk_buff *skb = (struct sk_buff *)skb_ctx; 171 + struct nf_conn *nfct; 172 + struct net *net; 173 + 174 + net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); 175 + nfct = __bpf_nf_ct_alloc_entry(net, bpf_tuple, tuple__sz, opts, opts__sz, 10); 176 + if (IS_ERR(nfct)) { 177 + if (opts) 178 + opts->error = PTR_ERR(nfct); 179 + return NULL; 180 + } 181 + 182 + return (struct nf_conn___init *)nfct; 269 183 } 270 184 271 185 /* bpf_skb_ct_lookup - Lookup CT entry for the given tuple, and acquire a ··· 317 181 struct net *caller_net; 318 182 struct nf_conn *nfct; 319 183 320 - BUILD_BUG_ON(sizeof(struct bpf_ct_opts) != NF_BPF_CT_OPTS_SZ); 321 - 322 - if (!opts) 323 - return NULL; 324 - if (!bpf_tuple || opts->reserved[0] || opts->reserved[1] || 325 - opts__sz != NF_BPF_CT_OPTS_SZ) { 326 - opts->error = -EINVAL; 184 + caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); 185 + nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts, opts__sz); 186 + if (IS_ERR(nfct)) { 187 + if (opts) 188 + opts->error = PTR_ERR(nfct); 327 189 return NULL; 328 190 } 329 - caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk); 330 - nfct = __bpf_nf_ct_lookup(caller_net, bpf_tuple, tuple__sz, opts->l4proto, 331 - opts->netns_id, &opts->dir); 332 - if (IS_ERR(nfct)) { 333 - opts->error = PTR_ERR(nfct); 191 + return nfct; 192 + } 193 + 194 + /* bpf_ct_insert_entry - Add the provided entry into a CT map 195 + * 196 + * This must be invoked for referenced PTR_TO_BTF_ID. 197 + * 198 + * @nfct - Pointer to referenced nf_conn___init object, obtained 199 + * using bpf_xdp_ct_alloc or bpf_skb_ct_alloc. 
200 + */ 201 + struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i) 202 + { 203 + struct nf_conn *nfct = (struct nf_conn *)nfct_i; 204 + int err; 205 + 206 + err = nf_conntrack_hash_check_insert(nfct); 207 + if (err < 0) { 208 + nf_conntrack_free(nfct); 334 209 return NULL; 335 210 } 336 211 return nfct; ··· 364 217 nf_ct_put(nfct); 365 218 } 366 219 220 + /* bpf_ct_set_timeout - Set timeout of allocated nf_conn 221 + * 222 + * Sets the default timeout of newly allocated nf_conn before insertion. 223 + * This helper must be invoked for refcounted pointer to nf_conn___init. 224 + * 225 + * Parameters: 226 + * @nfct - Pointer to referenced nf_conn object, obtained using 227 + * bpf_xdp_ct_alloc or bpf_skb_ct_alloc. 228 + * @timeout - Timeout in msecs. 229 + */ 230 + void bpf_ct_set_timeout(struct nf_conn___init *nfct, u32 timeout) 231 + { 232 + __nf_ct_set_timeout((struct nf_conn *)nfct, msecs_to_jiffies(timeout)); 233 + } 234 + 235 + /* bpf_ct_change_timeout - Change timeout of inserted nf_conn 236 + * 237 + * Change timeout associated of the inserted or looked up nf_conn. 238 + * This helper must be invoked for refcounted pointer to nf_conn. 239 + * 240 + * Parameters: 241 + * @nfct - Pointer to referenced nf_conn object, obtained using 242 + * bpf_ct_insert_entry, bpf_xdp_ct_lookup, or bpf_skb_ct_lookup. 243 + * @timeout - New timeout in msecs. 244 + */ 245 + int bpf_ct_change_timeout(struct nf_conn *nfct, u32 timeout) 246 + { 247 + return __nf_ct_change_timeout(nfct, msecs_to_jiffies(timeout)); 248 + } 249 + 250 + /* bpf_ct_set_status - Set status field of allocated nf_conn 251 + * 252 + * Set the status field of the newly allocated nf_conn before insertion. 253 + * This must be invoked for referenced PTR_TO_BTF_ID to nf_conn___init. 254 + * 255 + * Parameters: 256 + * @nfct - Pointer to referenced nf_conn object, obtained using 257 + * bpf_xdp_ct_alloc or bpf_skb_ct_alloc. 258 + * @status - New status value. 
259 + */ 260 + int bpf_ct_set_status(const struct nf_conn___init *nfct, u32 status) 261 + { 262 + return nf_ct_change_status_common((struct nf_conn *)nfct, status); 263 + } 264 + 265 + /* bpf_ct_change_status - Change status of inserted nf_conn 266 + * 267 + * Change the status field of the provided connection tracking entry. 268 + * This must be invoked for referenced PTR_TO_BTF_ID to nf_conn. 269 + * 270 + * Parameters: 271 + * @nfct - Pointer to referenced nf_conn object, obtained using 272 + * bpf_ct_insert_entry, bpf_xdp_ct_lookup or bpf_skb_ct_lookup. 273 + * @status - New status value. 274 + */ 275 + int bpf_ct_change_status(struct nf_conn *nfct, u32 status) 276 + { 277 + return nf_ct_change_status_common(nfct, status); 278 + } 279 + 367 280 __diag_pop() 368 281 369 - BTF_SET_START(nf_ct_xdp_check_kfunc_ids) 370 - BTF_ID(func, bpf_xdp_ct_lookup) 371 - BTF_ID(func, bpf_ct_release) 372 - BTF_SET_END(nf_ct_xdp_check_kfunc_ids) 282 + BTF_SET8_START(nf_ct_kfunc_set) 283 + BTF_ID_FLAGS(func, bpf_xdp_ct_alloc, KF_ACQUIRE | KF_RET_NULL) 284 + BTF_ID_FLAGS(func, bpf_xdp_ct_lookup, KF_ACQUIRE | KF_RET_NULL) 285 + BTF_ID_FLAGS(func, bpf_skb_ct_alloc, KF_ACQUIRE | KF_RET_NULL) 286 + BTF_ID_FLAGS(func, bpf_skb_ct_lookup, KF_ACQUIRE | KF_RET_NULL) 287 + BTF_ID_FLAGS(func, bpf_ct_insert_entry, KF_ACQUIRE | KF_RET_NULL | KF_RELEASE) 288 + BTF_ID_FLAGS(func, bpf_ct_release, KF_RELEASE) 289 + BTF_ID_FLAGS(func, bpf_ct_set_timeout, KF_TRUSTED_ARGS) 290 + BTF_ID_FLAGS(func, bpf_ct_change_timeout, KF_TRUSTED_ARGS) 291 + BTF_ID_FLAGS(func, bpf_ct_set_status, KF_TRUSTED_ARGS) 292 + BTF_ID_FLAGS(func, bpf_ct_change_status, KF_TRUSTED_ARGS) 293 + BTF_SET8_END(nf_ct_kfunc_set) 373 294 374 - BTF_SET_START(nf_ct_tc_check_kfunc_ids) 375 - BTF_ID(func, bpf_skb_ct_lookup) 376 - BTF_ID(func, bpf_ct_release) 377 - BTF_SET_END(nf_ct_tc_check_kfunc_ids) 378 - 379 - BTF_SET_START(nf_ct_acquire_kfunc_ids) 380 - BTF_ID(func, bpf_xdp_ct_lookup) 381 - BTF_ID(func, bpf_skb_ct_lookup) 382 - 
BTF_SET_END(nf_ct_acquire_kfunc_ids) 383 - 384 - BTF_SET_START(nf_ct_release_kfunc_ids) 385 - BTF_ID(func, bpf_ct_release) 386 - BTF_SET_END(nf_ct_release_kfunc_ids) 387 - 388 - /* Both sets are identical */ 389 - #define nf_ct_ret_null_kfunc_ids nf_ct_acquire_kfunc_ids 390 - 391 - static const struct btf_kfunc_id_set nf_conntrack_xdp_kfunc_set = { 392 - .owner = THIS_MODULE, 393 - .check_set = &nf_ct_xdp_check_kfunc_ids, 394 - .acquire_set = &nf_ct_acquire_kfunc_ids, 395 - .release_set = &nf_ct_release_kfunc_ids, 396 - .ret_null_set = &nf_ct_ret_null_kfunc_ids, 397 - }; 398 - 399 - static const struct btf_kfunc_id_set nf_conntrack_tc_kfunc_set = { 400 - .owner = THIS_MODULE, 401 - .check_set = &nf_ct_tc_check_kfunc_ids, 402 - .acquire_set = &nf_ct_acquire_kfunc_ids, 403 - .release_set = &nf_ct_release_kfunc_ids, 404 - .ret_null_set = &nf_ct_ret_null_kfunc_ids, 295 + static const struct btf_kfunc_id_set nf_conntrack_kfunc_set = { 296 + .owner = THIS_MODULE, 297 + .set = &nf_ct_kfunc_set, 405 298 }; 406 299 407 300 int register_nf_conntrack_bpf(void) 408 301 { 409 302 int ret; 410 303 411 - ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_xdp_kfunc_set); 412 - return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_tc_kfunc_set); 304 + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_kfunc_set); 305 + return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_kfunc_set); 413 306 }
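The tuple handling factored into bpf_nf_ct_tuple_parse() above takes a direction argument: the reply tuple that __bpf_nf_ct_alloc_entry() builds is the original tuple with source and destination swapped. A sketch of that swap under simplified IPv4-only structs (these are stand-ins, not the kernel's bpf_sock_tuple/nf_conntrack_tuple):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DIR_ORIGINAL 0
#define DIR_REPLY    1

struct sim_sock_tuple { uint32_t saddr, daddr; uint16_t sport, dport; };
struct sim_ct_tuple   { uint32_t src_ip, dst_ip; uint16_t src_port, dst_port; };

/* Fill a CT tuple from a socket tuple; DIR_REPLY mirrors src and dst. */
void tuple_parse(const struct sim_sock_tuple *in, int dir,
		 struct sim_ct_tuple *out)
{
	memset(out, 0, sizeof(*out));
	if (dir == DIR_ORIGINAL) {
		out->src_ip = in->saddr;  out->src_port = in->sport;
		out->dst_ip = in->daddr;  out->dst_port = in->dport;
	} else {			/* DIR_REPLY: mirror the tuple */
		out->src_ip = in->daddr;  out->src_port = in->dport;
		out->dst_ip = in->saddr;  out->dst_port = in->sport;
	}
}

/* 1 if parsing the same tuple in both directions yields mirror images. */
int reply_is_mirror(const struct sim_sock_tuple *in)
{
	struct sim_ct_tuple o, r;

	tuple_parse(in, DIR_ORIGINAL, &o);
	tuple_parse(in, DIR_REPLY, &r);
	return o.src_ip == r.dst_ip && o.dst_ip == r.src_ip &&
	       o.src_port == r.dst_port && o.dst_port == r.src_port;
}

static const struct sim_sock_tuple sample = { 0x0a000001, 0x0a000002, 1234, 80 };
```

In the real code the swap is done with pointer aliasing (`dir ? &tuple->dst.u3 : &tuple->src.u3`) so one body serves both directions; the effect is the same.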
+62
net/netfilter/nf_conntrack_core.c
··· 2806 2806 free_percpu(net->ct.stat); 2807 2807 return ret; 2808 2808 } 2809 + 2810 + #if (IS_BUILTIN(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \ 2811 + (IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES) || \ 2812 + IS_ENABLED(CONFIG_NF_CT_NETLINK)) 2813 + 2814 + /* ctnetlink code shared by both ctnetlink and nf_conntrack_bpf */ 2815 + 2816 + int __nf_ct_change_timeout(struct nf_conn *ct, u64 timeout) 2817 + { 2818 + if (test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status)) 2819 + return -EPERM; 2820 + 2821 + __nf_ct_set_timeout(ct, timeout); 2822 + 2823 + if (test_bit(IPS_DYING_BIT, &ct->status)) 2824 + return -ETIME; 2825 + 2826 + return 0; 2827 + } 2828 + EXPORT_SYMBOL_GPL(__nf_ct_change_timeout); 2829 + 2830 + void __nf_ct_change_status(struct nf_conn *ct, unsigned long on, unsigned long off) 2831 + { 2832 + unsigned int bit; 2833 + 2834 + /* Ignore these unchangable bits */ 2835 + on &= ~IPS_UNCHANGEABLE_MASK; 2836 + off &= ~IPS_UNCHANGEABLE_MASK; 2837 + 2838 + for (bit = 0; bit < __IPS_MAX_BIT; bit++) { 2839 + if (on & (1 << bit)) 2840 + set_bit(bit, &ct->status); 2841 + else if (off & (1 << bit)) 2842 + clear_bit(bit, &ct->status); 2843 + } 2844 + } 2845 + EXPORT_SYMBOL_GPL(__nf_ct_change_status); 2846 + 2847 + int nf_ct_change_status_common(struct nf_conn *ct, unsigned int status) 2848 + { 2849 + unsigned long d; 2850 + 2851 + d = ct->status ^ status; 2852 + 2853 + if (d & (IPS_EXPECTED|IPS_CONFIRMED|IPS_DYING)) 2854 + /* unchangeable */ 2855 + return -EBUSY; 2856 + 2857 + if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY)) 2858 + /* SEEN_REPLY bit can only be set */ 2859 + return -EBUSY; 2860 + 2861 + if (d & IPS_ASSURED && !(status & IPS_ASSURED)) 2862 + /* ASSURED bit can only be set */ 2863 + return -EBUSY; 2864 + 2865 + __nf_ct_change_status(ct, status, 0); 2866 + return 0; 2867 + } 2868 + EXPORT_SYMBOL_GPL(nf_ct_change_status_common); 2869 + 2870 + #endif
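The status helpers hoisted into nf_conntrack_core.c above enforce two rules: bits in the unchangeable mask are silently ignored, and a few bits are one-way (they may be set but never cleared). A userspace sketch of that logic, with illustrative bit values rather than the kernel's IPS_* layout:

```c
#include <assert.h>

#define ST_EXPECTED   (1UL << 0)
#define ST_SEEN_REPLY (1UL << 1)
#define ST_ASSURED    (1UL << 2)
#define ST_CONFIRMED  (1UL << 3)
#define ST_DYING      (1UL << 4)

#define UNCHANGEABLE_MASK (ST_EXPECTED | ST_CONFIRMED | ST_DYING)

/* Apply on/off masks after filtering out the unchangeable bits,
 * mirroring __nf_ct_change_status(). */
static void change_status(unsigned long *status, unsigned long on,
			  unsigned long off)
{
	on &= ~UNCHANGEABLE_MASK;
	off &= ~UNCHANGEABLE_MASK;
	*status = (*status | on) & ~off;
}

/* 0 on success; -16 (-EBUSY) when the request would flip an
 * unchangeable or one-way bit, as in nf_ct_change_status_common(). */
int change_status_common(unsigned long *cur, unsigned long status)
{
	unsigned long d = *cur ^ status;

	if (d & (ST_EXPECTED | ST_CONFIRMED | ST_DYING))
		return -16;
	if ((d & ST_SEEN_REPLY) && !(status & ST_SEEN_REPLY))
		return -16;		/* SEEN_REPLY can only be set */
	if ((d & ST_ASSURED) && !(status & ST_ASSURED))
		return -16;		/* ASSURED can only be set */

	change_status(cur, status, 0);
	return 0;
}

int status_demo(void)
{
	unsigned long st = ST_CONFIRMED;

	/* setting SEEN_REPLY on a confirmed entry is allowed */
	if (change_status_common(&st, ST_CONFIRMED | ST_SEEN_REPLY))
		return -1;
	if (!(st & ST_SEEN_REPLY))
		return -2;
	/* clearing SEEN_REPLY again must be rejected */
	if (change_status_common(&st, ST_CONFIRMED) != -16)
		return -3;
	return 0;
}
```

Sharing this one implementation is what lets ctnetlink and the new bpf_ct_set_status/bpf_ct_change_status kfuncs agree on which transitions are legal.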
+4 -50
net/netfilter/nf_conntrack_netlink.c
··· 1891 1891 } 1892 1892 #endif 1893 1893 1894 - static void 1895 - __ctnetlink_change_status(struct nf_conn *ct, unsigned long on, 1896 - unsigned long off) 1897 - { 1898 - unsigned int bit; 1899 - 1900 - /* Ignore these unchangable bits */ 1901 - on &= ~IPS_UNCHANGEABLE_MASK; 1902 - off &= ~IPS_UNCHANGEABLE_MASK; 1903 - 1904 - for (bit = 0; bit < __IPS_MAX_BIT; bit++) { 1905 - if (on & (1 << bit)) 1906 - set_bit(bit, &ct->status); 1907 - else if (off & (1 << bit)) 1908 - clear_bit(bit, &ct->status); 1909 - } 1910 - } 1911 - 1912 1894 static int 1913 1895 ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[]) 1914 1896 { 1915 - unsigned long d; 1916 - unsigned int status = ntohl(nla_get_be32(cda[CTA_STATUS])); 1917 - d = ct->status ^ status; 1918 - 1919 - if (d & (IPS_EXPECTED|IPS_CONFIRMED|IPS_DYING)) 1920 - /* unchangeable */ 1921 - return -EBUSY; 1922 - 1923 - if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY)) 1924 - /* SEEN_REPLY bit can only be set */ 1925 - return -EBUSY; 1926 - 1927 - if (d & IPS_ASSURED && !(status & IPS_ASSURED)) 1928 - /* ASSURED bit can only be set */ 1929 - return -EBUSY; 1930 - 1931 - __ctnetlink_change_status(ct, status, 0); 1932 - return 0; 1897 + return nf_ct_change_status_common(ct, ntohl(nla_get_be32(cda[CTA_STATUS]))); 1933 1898 } 1934 1899 1935 1900 static int ··· 1989 2024 static int ctnetlink_change_timeout(struct nf_conn *ct, 1990 2025 const struct nlattr * const cda[]) 1991 2026 { 1992 - u64 timeout = (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ; 1993 - 1994 - if (timeout > INT_MAX) 1995 - timeout = INT_MAX; 1996 - WRITE_ONCE(ct->timeout, nfct_time_stamp + (u32)timeout); 1997 - 1998 - if (test_bit(IPS_DYING_BIT, &ct->status)) 1999 - return -ETIME; 2000 - 2001 - return 0; 2027 + return __nf_ct_change_timeout(ct, (u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ); 2002 2028 } 2003 2029 2004 2030 #if defined(CONFIG_NF_CONNTRACK_MARK) ··· 2249 2293 goto err1; 2250 2294 2251 2295 timeout = 
(u64)ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ; 2252 - if (timeout > INT_MAX) 2253 - timeout = INT_MAX; 2254 - ct->timeout = (u32)timeout + nfct_time_stamp; 2296 + __nf_ct_set_timeout(ct, timeout); 2255 2297 2256 2298 rcu_read_lock(); 2257 2299 if (cda[CTA_HELP]) { ··· 2791 2837 * unchangeable bits but do not error out. Also user programs 2792 2838 * are allowed to clear the bits that they are allowed to change. 2793 2839 */ 2794 - __ctnetlink_change_status(ct, status, ~status); 2840 + __nf_ct_change_status(ct, status, ~status); 2795 2841 return 0; 2796 2842 } 2797 2843
+4 -1
net/xdp/xsk.c
···
639 639 	if (unlikely(need_wait))
640 640 		return -EOPNOTSUPP;
641 641 
642     -	if (sk_can_busy_loop(sk))
642     +	if (sk_can_busy_loop(sk)) {
643     +		if (xs->zc)
644     +			__sk_mark_napi_id_once(sk, xsk_pool_get_napi_id(xs->pool));
643 645 		sk_busy_loop(sk, 1); /* only support non-blocking sockets */
646     +	}
644 647 
645 648 	if (xs->zc && xsk_no_wakeup(sk))
646 649 		return 0;
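The xsk fix records the NAPI id the first time through the busy-poll path, so send-only zero-copy sockets actually spin on the right queue; later calls leave the id alone. The "set once" idiom it relies on can be sketched like this (simplified types, no kernel API):

```c
#include <assert.h>

struct sim_sock { unsigned int napi_id; };

/* Record the NAPI id only if none has been recorded yet, mirroring the
 * intent of __sk_mark_napi_id_once(). */
void mark_napi_id_once(struct sim_sock *sk, unsigned int napi_id)
{
	if (!sk->napi_id)		/* only the first caller wins */
		sk->napi_id = napi_id;
}

int once_demo(void)
{
	struct sim_sock sk = { 0 };

	mark_napi_id_once(&sk, 7);
	mark_napi_id_once(&sk, 9);	/* ignored: already set */
	return sk.napi_id;
}
```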
+4 -6
samples/bpf/Makefile
··· 282 282 283 283 BPFTOOLDIR := $(TOOLS_PATH)/bpf/bpftool 284 284 BPFTOOL_OUTPUT := $(abspath $(BPF_SAMPLES_PATH))/bpftool 285 - BPFTOOL := $(BPFTOOL_OUTPUT)/bpftool 286 - $(BPFTOOL): $(LIBBPF) $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) | $(BPFTOOL_OUTPUT) 287 - $(MAKE) -C $(BPFTOOLDIR) srctree=$(BPF_SAMPLES_PATH)/../../ \ 288 - OUTPUT=$(BPFTOOL_OUTPUT)/ \ 289 - LIBBPF_OUTPUT=$(LIBBPF_OUTPUT)/ \ 290 - LIBBPF_DESTDIR=$(LIBBPF_DESTDIR)/ 285 + BPFTOOL := $(BPFTOOL_OUTPUT)/bootstrap/bpftool 286 + $(BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) | $(BPFTOOL_OUTPUT) 287 + $(MAKE) -C $(BPFTOOLDIR) srctree=$(BPF_SAMPLES_PATH)/../../ \ 288 + OUTPUT=$(BPFTOOL_OUTPUT)/ bootstrap 291 289 292 290 $(LIBBPF_OUTPUT) $(BPFTOOL_OUTPUT): 293 291 $(call msg,MKDIR,$@)
+2 -1
samples/bpf/fds_example.c
···
 #include <bpf/libbpf.h>
 #include "bpf_insn.h"
 #include "sock_example.h"
+#include "bpf_util.h"

 #define BPF_F_PIN	(1 << 0)
 #define BPF_F_GET	(1 << 1)
···
 		BPF_MOV64_IMM(BPF_REG_0, 1),
 		BPF_EXIT_INSN(),
 	};
-	size_t insns_cnt = sizeof(insns) / sizeof(struct bpf_insn);
+	size_t insns_cnt = ARRAY_SIZE(insns);
 	struct bpf_object *obj;
 	int err;
+2 -1
samples/bpf/sock_example.c
···
 #include <bpf/bpf.h>
 #include "bpf_insn.h"
 #include "sock_example.h"
+#include "bpf_util.h"

 char bpf_log_buf[BPF_LOG_BUF_SIZE];
···
 		BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
 		BPF_EXIT_INSN(),
 	};
-	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
+	size_t insns_cnt = ARRAY_SIZE(prog);
 	LIBBPF_OPTS(bpf_prog_load_opts, opts,
 		.log_buf = bpf_log_buf,
 		.log_size = BPF_LOG_BUF_SIZE,
+2 -1
samples/bpf/test_cgrp2_attach.c
···
 #include <bpf/bpf.h>

 #include "bpf_insn.h"
+#include "bpf_util.h"

 enum {
 	MAP_KEY_PACKETS,
···
 		BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
 		BPF_EXIT_INSN(),
 	};
-	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
+	size_t insns_cnt = ARRAY_SIZE(prog);
 	LIBBPF_OPTS(bpf_prog_load_opts, opts,
 		.log_buf = bpf_log_buf,
 		.log_size = BPF_LOG_BUF_SIZE,
+1 -1
samples/bpf/test_lru_dist.c
···
 		return -1;
 	}

-	for (f = 0; f < sizeof(map_flags) / sizeof(*map_flags); f++) {
+	for (f = 0; f < ARRAY_SIZE(map_flags); f++) {
 		test_lru_loss0(BPF_MAP_TYPE_LRU_HASH, map_flags[f]);
 		test_lru_loss1(BPF_MAP_TYPE_LRU_HASH, map_flags[f]);
 		test_parallel_lru_loss(BPF_MAP_TYPE_LRU_HASH, map_flags[f],
+3 -1
samples/bpf/test_map_in_map_user.c
···
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>

+#include "bpf_util.h"
+
 static int map_fd[7];

 #define PORT_A		(map_fd[0])
···
 	"Hash of Hash",
 };

-#define NR_TESTS (sizeof(test_names) / sizeof(*test_names))
+#define NR_TESTS ARRAY_SIZE(test_names)

 static void check_map_id(int inner_map_fd, int map_in_map_fd, uint32_t key)
 {
+2 -1
samples/bpf/tracex5_user.c
···
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
 #include "trace_helpers.h"
+#include "bpf_util.h"

 #ifdef __mips__
 #define MAX_ENTRIES 6000 /* MIPS n64 syscalls start at 5000 */
···
 		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
 	};
 	struct sock_fprog prog = {
-		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
+		.len = (unsigned short)ARRAY_SIZE(filter),
 		.filter = filter,
 	};
 	if (prctl(PR_SET_SECCOMP, 2, &prog))
+4 -2
samples/bpf/xdp_redirect_map.bpf.c
···
 } tx_port_native SEC(".maps");

 /* store egress interface mac address */
-const volatile char tx_mac_addr[ETH_ALEN];
+const volatile __u8 tx_mac_addr[ETH_ALEN];

 static __always_inline int xdp_redirect_map(struct xdp_md *ctx, void *redirect_map)
 {
···
 {
 	void *data_end = (void *)(long)ctx->data_end;
 	void *data = (void *)(long)ctx->data;
+	u8 *mac_addr = (u8 *) tx_mac_addr;
 	struct ethhdr *eth = data;
 	u64 nh_off;

···
 	if (data + nh_off > data_end)
 		return XDP_DROP;

-	__builtin_memcpy(eth->h_source, (const char *)tx_mac_addr, ETH_ALEN);
+	barrier_var(mac_addr); /* prevent optimizing out memcpy */
+	__builtin_memcpy(eth->h_source, mac_addr, ETH_ALEN);

 	return XDP_PASS;
 }
+9
samples/bpf/xdp_redirect_map_user.c
···
 	{}
 };

+static int verbose = 0;
+
 int main(int argc, char **argv)
 {
 	struct bpf_devmap_val devmap_val = {};
···
 			break;
 		case 'v':
 			sample_switch_mode();
+			verbose = 1;
 			break;
 		case 's':
 			mask |= SAMPLE_REDIRECT_MAP_CNT;
···
 			ret = EXIT_FAIL;
 			goto end_destroy;
 		}
+		if (verbose)
+			printf("Egress ifindex:%d using src MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
+			       ifindex_out,
+			       skel->rodata->tx_mac_addr[0], skel->rodata->tx_mac_addr[1],
+			       skel->rodata->tx_mac_addr[2], skel->rodata->tx_mac_addr[3],
+			       skel->rodata->tx_mac_addr[4], skel->rodata->tx_mac_addr[5]);
 	}

 	skel->rodata->from_match[0] = ifindex_in;
+1 -21
scripts/bpf_doc.py
···
 .. Copyright (C) All BPF authors and contributors from 2014 to present.
 .. See git log include/uapi/linux/bpf.h in kernel tree for details.
 ..
-.. %%%LICENSE_START(VERBATIM)
-.. Permission is granted to make and distribute verbatim copies of this
-.. manual provided the copyright notice and this permission notice are
-.. preserved on all copies.
-..
-.. Permission is granted to copy and distribute modified versions of this
-.. manual under the conditions for verbatim copying, provided that the
-.. entire resulting derived work is distributed under the terms of a
-.. permission notice identical to this one.
-..
-.. Since the Linux kernel and libraries are constantly changing, this
-.. manual page may be incorrect or out-of-date. The author(s) assume no
-.. responsibility for errors or omissions, or for damages resulting from
-.. the use of the information contained herein. The author(s) may not
-.. have taken the same level of care in the production of this manual,
-.. which is licensed free of charge, as they might when working
-.. professionally.
-..
-.. Formatted or processed versions of this manual, if unaccompanied by
-.. the source, must acknowledge the copyright and authors of this work.
-.. %%%LICENSE_END
+.. SPDX-License-Identifier: Linux-man-pages-copyleft
 ..
 .. Please do not edit this file. It was generated from the documentation
 .. located in file include/uapi/linux/bpf.h of the Linux kernel sources
+34 -6
tools/bpf/resolve_btfids/main.c
··· 45 45 * .zero 4 46 46 * __BTF_ID__func__vfs_fallocate__4: 47 47 * .zero 4 48 + * 49 + * set8 - store symbol size into first 4 bytes and sort following 50 + * ID list 51 + * 52 + * __BTF_ID__set8__list: 53 + * .zero 8 54 + * list: 55 + * __BTF_ID__func__vfs_getattr__3: 56 + * .zero 4 57 + * .word (1 << 0) | (1 << 2) 58 + * __BTF_ID__func__vfs_fallocate__5: 59 + * .zero 4 60 + * .word (1 << 3) | (1 << 1) | (1 << 2) 48 61 */ 49 62 50 63 #define _GNU_SOURCE ··· 85 72 #define BTF_TYPEDEF "typedef" 86 73 #define BTF_FUNC "func" 87 74 #define BTF_SET "set" 75 + #define BTF_SET8 "set8" 88 76 89 77 #define ADDR_CNT 100 90 78 ··· 98 84 }; 99 85 int addr_cnt; 100 86 bool is_set; 87 + bool is_set8; 101 88 Elf64_Addr addr[ADDR_CNT]; 102 89 }; 103 90 ··· 246 231 return id; 247 232 } 248 233 249 - static struct btf_id *add_set(struct object *obj, char *name) 234 + static struct btf_id *add_set(struct object *obj, char *name, bool is_set8) 250 235 { 251 236 /* 252 237 * __BTF_ID__set__name 253 238 * name = ^ 254 239 * id = ^ 255 240 */ 256 - char *id = name + sizeof(BTF_SET "__") - 1; 241 + char *id = name + (is_set8 ? sizeof(BTF_SET8 "__") : sizeof(BTF_SET "__")) - 1; 257 242 int len = strlen(name); 258 243 259 244 if (id >= name + len) { ··· 459 444 } else if (!strncmp(prefix, BTF_FUNC, sizeof(BTF_FUNC) - 1)) { 460 445 obj->nr_funcs++; 461 446 id = add_symbol(&obj->funcs, prefix, sizeof(BTF_FUNC) - 1); 447 + /* set8 */ 448 + } else if (!strncmp(prefix, BTF_SET8, sizeof(BTF_SET8) - 1)) { 449 + id = add_set(obj, prefix, true); 450 + /* 451 + * SET8 objects store list's count, which is encoded 452 + * in symbol's size, together with 'cnt' field hence 453 + * that - 1. 
454 + */ 455 + if (id) { 456 + id->cnt = sym.st_size / sizeof(uint64_t) - 1; 457 + id->is_set8 = true; 458 + } 462 459 /* set */ 463 460 } else if (!strncmp(prefix, BTF_SET, sizeof(BTF_SET) - 1)) { 464 - id = add_set(obj, prefix); 461 + id = add_set(obj, prefix, false); 465 462 /* 466 463 * SET objects store list's count, which is encoded 467 464 * in symbol's size, together with 'cnt' field hence ··· 598 571 int *ptr = data->d_buf; 599 572 int i; 600 573 601 - if (!id->id && !id->is_set) 574 + /* For set, set8, id->id may be 0 */ 575 + if (!id->id && !id->is_set && !id->is_set8) 602 576 pr_err("WARN: resolve_btfids: unresolved symbol %s\n", id->name); 603 577 604 578 for (i = 0; i < id->addr_cnt; i++) { ··· 671 643 } 672 644 673 645 idx = idx / sizeof(int); 674 - base = &ptr[idx] + 1; 646 + base = &ptr[idx] + (id->is_set8 ? 2 : 1); 675 647 cnt = ptr[idx]; 676 648 677 649 pr_debug("sorting addr %5lu: cnt %6d [%s]\n", 678 650 (idx + 1) * sizeof(int), cnt, id->name); 679 651 680 - qsort(base, cnt, sizeof(int), cmp_id); 652 + qsort(base, cnt, id->is_set8 ? sizeof(uint64_t) : sizeof(int), cmp_id); 681 653 682 654 next = rb_next(next); 683 655 }
+3 -4
tools/bpf/runqslower/Makefile
···
 OUTPUT ?= $(abspath .output)/

 BPFTOOL_OUTPUT := $(OUTPUT)bpftool/
-DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bpftool
+DEFAULT_BPFTOOL := $(BPFTOOL_OUTPUT)bootstrap/bpftool
 BPFTOOL ?= $(DEFAULT_BPFTOOL)
 LIBBPF_SRC := $(abspath ../../lib/bpf)
 BPFOBJ_OUTPUT := $(OUTPUT)libbpf/
···
 	$(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) OUTPUT=$(BPFOBJ_OUTPUT) \
 		    DESTDIR=$(BPFOBJ_OUTPUT) prefix= $(abspath $@) install_headers

-$(DEFAULT_BPFTOOL): $(BPFOBJ) | $(BPFTOOL_OUTPUT)
-	$(Q)$(MAKE) $(submake_extras) -C ../bpftool OUTPUT=$(BPFTOOL_OUTPUT) \
-		    ARCH= CROSS_COMPILE= CC=$(HOSTCC) LD=$(HOSTLD)
+$(DEFAULT_BPFTOOL): | $(BPFTOOL_OUTPUT)
+	$(Q)$(MAKE) $(submake_extras) -C ../bpftool OUTPUT=$(BPFTOOL_OUTPUT) bootstrap
+2 -1
tools/include/uapi/linux/bpf.h
···
 *		Pull in non-linear data in case the *skb* is non-linear and not
 *		all of *len* are part of the linear section. Make *len* bytes
 *		from *skb* readable and writable. If a zero value is passed for
-*		*len*, then the whole length of the *skb* is pulled.
+*		*len*, then all bytes in the linear part of *skb* will be made
+*		readable and writable.
 *
 *		This helper is only needed for reading and writing with direct
 *		packet access.
+38 -13
tools/lib/bpf/bpf_tracing.h
··· 2 2 #ifndef __BPF_TRACING_H__ 3 3 #define __BPF_TRACING_H__ 4 4 5 + #include <bpf/bpf_helpers.h> 6 + 5 7 /* Scan the ARCH passed in from ARCH env variable (see Makefile) */ 6 8 #if defined(__TARGET_ARCH_x86) 7 9 #define bpf_target_x86 ··· 142 140 #define __PT_RC_REG gprs[2] 143 141 #define __PT_SP_REG gprs[15] 144 142 #define __PT_IP_REG psw.addr 145 - #define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error \"use PT_REGS_PARM1_CORE_SYSCALL() instead\""); 0l; }) 143 + #define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x) 146 144 #define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___s390 *)(x), orig_gpr2) 147 145 148 146 #elif defined(bpf_target_arm) ··· 176 174 #define __PT_RC_REG regs[0] 177 175 #define __PT_SP_REG sp 178 176 #define __PT_IP_REG pc 179 - #define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error \"use PT_REGS_PARM1_CORE_SYSCALL() instead\""); 0l; }) 177 + #define PT_REGS_PARM1_SYSCALL(x) PT_REGS_PARM1_CORE_SYSCALL(x) 180 178 #define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___arm64 *)(x), orig_x0) 181 179 182 180 #elif defined(bpf_target_mips) ··· 495 493 } \ 496 494 static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args) 497 495 496 + /* If kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER, read pt_regs directly */ 498 497 #define ___bpf_syscall_args0() ctx 499 - #define ___bpf_syscall_args1(x) ___bpf_syscall_args0(), (void *)PT_REGS_PARM1_CORE_SYSCALL(regs) 500 - #define ___bpf_syscall_args2(x, args...) ___bpf_syscall_args1(args), (void *)PT_REGS_PARM2_CORE_SYSCALL(regs) 501 - #define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs) 502 - #define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs) 503 - #define ___bpf_syscall_args5(x, args...) 
___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs) 498 + #define ___bpf_syscall_args1(x) ___bpf_syscall_args0(), (void *)PT_REGS_PARM1_SYSCALL(regs) 499 + #define ___bpf_syscall_args2(x, args...) ___bpf_syscall_args1(args), (void *)PT_REGS_PARM2_SYSCALL(regs) 500 + #define ___bpf_syscall_args3(x, args...) ___bpf_syscall_args2(args), (void *)PT_REGS_PARM3_SYSCALL(regs) 501 + #define ___bpf_syscall_args4(x, args...) ___bpf_syscall_args3(args), (void *)PT_REGS_PARM4_SYSCALL(regs) 502 + #define ___bpf_syscall_args5(x, args...) ___bpf_syscall_args4(args), (void *)PT_REGS_PARM5_SYSCALL(regs) 504 503 #define ___bpf_syscall_args(args...) ___bpf_apply(___bpf_syscall_args, ___bpf_narg(args))(args) 505 504 505 + /* If kernel doesn't have CONFIG_ARCH_HAS_SYSCALL_WRAPPER, we have to BPF_CORE_READ from pt_regs */ 506 + #define ___bpf_syswrap_args0() ctx 507 + #define ___bpf_syswrap_args1(x) ___bpf_syswrap_args0(), (void *)PT_REGS_PARM1_CORE_SYSCALL(regs) 508 + #define ___bpf_syswrap_args2(x, args...) ___bpf_syswrap_args1(args), (void *)PT_REGS_PARM2_CORE_SYSCALL(regs) 509 + #define ___bpf_syswrap_args3(x, args...) ___bpf_syswrap_args2(args), (void *)PT_REGS_PARM3_CORE_SYSCALL(regs) 510 + #define ___bpf_syswrap_args4(x, args...) ___bpf_syswrap_args3(args), (void *)PT_REGS_PARM4_CORE_SYSCALL(regs) 511 + #define ___bpf_syswrap_args5(x, args...) ___bpf_syswrap_args4(args), (void *)PT_REGS_PARM5_CORE_SYSCALL(regs) 512 + #define ___bpf_syswrap_args(args...) ___bpf_apply(___bpf_syswrap_args, ___bpf_narg(args))(args) 513 + 506 514 /* 507 - * BPF_KPROBE_SYSCALL is a variant of BPF_KPROBE, which is intended for 515 + * BPF_KSYSCALL is a variant of BPF_KPROBE, which is intended for 508 516 * tracing syscall functions, like __x64_sys_close. 
It hides the underlying 509 517 * platform-specific low-level way of getting syscall input arguments from 510 518 * struct pt_regs, and provides a familiar typed and named function arguments 511 519 * syntax and semantics of accessing syscall input parameters. 512 520 * 513 - * Original struct pt_regs* context is preserved as 'ctx' argument. This might 521 + * Original struct pt_regs * context is preserved as 'ctx' argument. This might 514 522 * be necessary when using BPF helpers like bpf_perf_event_output(). 515 523 * 516 - * This macro relies on BPF CO-RE support. 524 + * At the moment BPF_KSYSCALL does not handle all the calling convention 525 + * quirks for mmap(), clone() and compat syscalls transparrently. This may or 526 + * may not change in the future. User needs to take extra measures to handle 527 + * such quirks explicitly, if necessary. 528 + * 529 + * This macro relies on BPF CO-RE support and virtual __kconfig externs. 517 530 */ 518 - #define BPF_KPROBE_SYSCALL(name, args...) \ 531 + #define BPF_KSYSCALL(name, args...) \ 519 532 name(struct pt_regs *ctx); \ 533 + extern _Bool LINUX_HAS_SYSCALL_WRAPPER __kconfig; \ 520 534 static __attribute__((always_inline)) typeof(name(0)) \ 521 535 ____##name(struct pt_regs *ctx, ##args); \ 522 536 typeof(name(0)) name(struct pt_regs *ctx) \ 523 537 { \ 524 - struct pt_regs *regs = PT_REGS_SYSCALL_REGS(ctx); \ 538 + struct pt_regs *regs = LINUX_HAS_SYSCALL_WRAPPER \ 539 + ? 
(struct pt_regs *)PT_REGS_PARM1(ctx) \ 540 + : ctx; \ 525 541 _Pragma("GCC diagnostic push") \ 526 542 _Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \ 527 - return ____##name(___bpf_syscall_args(args)); \ 543 + if (LINUX_HAS_SYSCALL_WRAPPER) \ 544 + return ____##name(___bpf_syswrap_args(args)); \ 545 + else \ 546 + return ____##name(___bpf_syscall_args(args)); \ 528 547 _Pragma("GCC diagnostic pop") \ 529 548 } \ 530 549 static __attribute__((always_inline)) typeof(name(0)) \ 531 550 ____##name(struct pt_regs *ctx, ##args) 551 + 552 + #define BPF_KPROBE_SYSCALL BPF_KSYSCALL 532 553 533 554 #endif
+1 -1
tools/lib/bpf/btf_dump.c
···
 		*value = *(__s64 *)data;
 		return 0;
 	case 4:
-		*value = is_signed ? *(__s32 *)data : *(__u32 *)data;
+		*value = is_signed ? (__s64)*(__s32 *)data : *(__u32 *)data;
 		return 0;
 	case 2:
 		*value = is_signed ? *(__s16 *)data : *(__u16 *)data;
+1 -1
tools/lib/bpf/gen_loader.c
···
 	gen->attach_kind = kind;
 	ret = snprintf(gen->attach_target, sizeof(gen->attach_target), "%s%s",
 		       prefix, attach_name);
-	if (ret == sizeof(gen->attach_target))
+	if (ret >= sizeof(gen->attach_target))
 		gen->error = -ENOSPC;
 }
+287 -105
tools/lib/bpf/libbpf.c
··· 1694 1694 switch (ext->kcfg.type) { 1695 1695 case KCFG_BOOL: 1696 1696 if (value == 'm') { 1697 - pr_warn("extern (kcfg) %s=%c should be tristate or char\n", 1697 + pr_warn("extern (kcfg) '%s': value '%c' implies tristate or char type\n", 1698 1698 ext->name, value); 1699 1699 return -EINVAL; 1700 1700 } ··· 1715 1715 case KCFG_INT: 1716 1716 case KCFG_CHAR_ARR: 1717 1717 default: 1718 - pr_warn("extern (kcfg) %s=%c should be bool, tristate, or char\n", 1718 + pr_warn("extern (kcfg) '%s': value '%c' implies bool, tristate, or char type\n", 1719 1719 ext->name, value); 1720 1720 return -EINVAL; 1721 1721 } ··· 1729 1729 size_t len; 1730 1730 1731 1731 if (ext->kcfg.type != KCFG_CHAR_ARR) { 1732 - pr_warn("extern (kcfg) %s=%s should be char array\n", ext->name, value); 1732 + pr_warn("extern (kcfg) '%s': value '%s' implies char array type\n", 1733 + ext->name, value); 1733 1734 return -EINVAL; 1734 1735 } 1735 1736 ··· 1744 1743 /* strip quotes */ 1745 1744 len -= 2; 1746 1745 if (len >= ext->kcfg.sz) { 1747 - pr_warn("extern (kcfg) '%s': long string config %s of (%zu bytes) truncated to %d bytes\n", 1746 + pr_warn("extern (kcfg) '%s': long string '%s' of (%zu bytes) truncated to %d bytes\n", 1748 1747 ext->name, value, len, ext->kcfg.sz - 1); 1749 1748 len = ext->kcfg.sz - 1; 1750 1749 } ··· 1801 1800 static int set_kcfg_value_num(struct extern_desc *ext, void *ext_val, 1802 1801 __u64 value) 1803 1802 { 1804 - if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR) { 1805 - pr_warn("extern (kcfg) %s=%llu should be integer\n", 1803 + if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR && 1804 + ext->kcfg.type != KCFG_BOOL) { 1805 + pr_warn("extern (kcfg) '%s': value '%llu' implies integer, char, or boolean type\n", 1806 1806 ext->name, (unsigned long long)value); 1807 1807 return -EINVAL; 1808 1808 } 1809 + if (ext->kcfg.type == KCFG_BOOL && value > 1) { 1810 + pr_warn("extern (kcfg) '%s': value '%llu' isn't boolean compatible\n", 1811 + 
ext->name, (unsigned long long)value); 1812 + return -EINVAL; 1813 + 1814 + } 1809 1815 if (!is_kcfg_value_in_range(ext, value)) { 1810 - pr_warn("extern (kcfg) %s=%llu value doesn't fit in %d bytes\n", 1816 + pr_warn("extern (kcfg) '%s': value '%llu' doesn't fit in %d bytes\n", 1811 1817 ext->name, (unsigned long long)value, ext->kcfg.sz); 1812 1818 return -ERANGE; 1813 1819 } ··· 1878 1870 /* assume integer */ 1879 1871 err = parse_u64(value, &num); 1880 1872 if (err) { 1881 - pr_warn("extern (kcfg) %s=%s should be integer\n", 1882 - ext->name, value); 1873 + pr_warn("extern (kcfg) '%s': value '%s' isn't a valid integer\n", ext->name, value); 1883 1874 return err; 1875 + } 1876 + if (ext->kcfg.type != KCFG_INT && ext->kcfg.type != KCFG_CHAR) { 1877 + pr_warn("extern (kcfg) '%s': value '%s' implies integer type\n", ext->name, value); 1878 + return -EINVAL; 1884 1879 } 1885 1880 err = set_kcfg_value_num(ext, ext_val, num); 1886 1881 break; 1887 1882 } 1888 1883 if (err) 1889 1884 return err; 1890 - pr_debug("extern (kcfg) %s=%s\n", ext->name, value); 1885 + pr_debug("extern (kcfg) '%s': set to %s\n", ext->name, value); 1891 1886 return 0; 1892 1887 } 1893 1888 ··· 2331 2320 return 0; 2332 2321 } 2333 2322 2323 + static size_t adjust_ringbuf_sz(size_t sz) 2324 + { 2325 + __u32 page_sz = sysconf(_SC_PAGE_SIZE); 2326 + __u32 mul; 2327 + 2328 + /* if user forgot to set any size, make sure they see error */ 2329 + if (sz == 0) 2330 + return 0; 2331 + /* Kernel expects BPF_MAP_TYPE_RINGBUF's max_entries to be 2332 + * a power-of-2 multiple of kernel's page size. If user diligently 2333 + * satisified these conditions, pass the size through. 2334 + */ 2335 + if ((sz % page_sz) == 0 && is_pow_of_2(sz / page_sz)) 2336 + return sz; 2337 + 2338 + /* Otherwise find closest (page_sz * power_of_2) product bigger than 2339 + * user-set size to satisfy both user size request and kernel 2340 + * requirements and substitute correct max_entries for map creation. 
2341 + */ 2342 + for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) { 2343 + if (mul * page_sz > sz) 2344 + return mul * page_sz; 2345 + } 2346 + 2347 + /* if it's impossible to satisfy the conditions (i.e., user size is 2348 + * very close to UINT_MAX but is not a power-of-2 multiple of 2349 + * page_size) then just return original size and let kernel reject it 2350 + */ 2351 + return sz; 2352 + } 2353 + 2334 2354 static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def) 2335 2355 { 2336 2356 map->def.type = def->map_type; ··· 2374 2332 map->numa_node = def->numa_node; 2375 2333 map->btf_key_type_id = def->key_type_id; 2376 2334 map->btf_value_type_id = def->value_type_id; 2335 + 2336 + /* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */ 2337 + if (map->def.type == BPF_MAP_TYPE_RINGBUF) 2338 + map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries); 2377 2339 2378 2340 if (def->parts & MAP_DEF_MAP_TYPE) 2379 2341 pr_debug("map '%s': found type = %u.\n", map->name, def->map_type); ··· 3733 3687 ext->kcfg.type = find_kcfg_type(obj->btf, t->type, 3734 3688 &ext->kcfg.is_signed); 3735 3689 if (ext->kcfg.type == KCFG_UNKNOWN) { 3736 - pr_warn("extern (kcfg) '%s' type is unsupported\n", ext_name); 3690 + pr_warn("extern (kcfg) '%s': type is unsupported\n", ext_name); 3737 3691 return -ENOTSUP; 3738 3692 } 3739 3693 } else if (strcmp(sec_name, KSYMS_SEC) == 0) { ··· 4278 4232 int bpf_map__reuse_fd(struct bpf_map *map, int fd) 4279 4233 { 4280 4234 struct bpf_map_info info = {}; 4281 - __u32 len = sizeof(info); 4235 + __u32 len = sizeof(info), name_len; 4282 4236 int new_fd, err; 4283 4237 char *new_name; 4284 4238 ··· 4288 4242 if (err) 4289 4243 return libbpf_err(err); 4290 4244 4291 - new_name = strdup(info.name); 4245 + name_len = strlen(info.name); 4246 + if (name_len == BPF_OBJ_NAME_LEN - 1 && strncmp(map->name, info.name, name_len) == 0) 4247 + new_name = strdup(map->name); 4248 + else 4249 + new_name = 
strdup(info.name); 4250 + 4292 4251 if (!new_name) 4293 4252 return libbpf_err(-errno); 4294 4253 ··· 4352 4301 4353 4302 int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries) 4354 4303 { 4355 - if (map->fd >= 0) 4304 + if (map->obj->loaded) 4356 4305 return libbpf_err(-EBUSY); 4306 + 4357 4307 map->def.max_entries = max_entries; 4308 + 4309 + /* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */ 4310 + if (map->def.type == BPF_MAP_TYPE_RINGBUF) 4311 + map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries); 4312 + 4358 4313 return 0; 4359 4314 } 4360 4315 ··· 4711 4654 strs, sizeof(strs))); 4712 4655 } 4713 4656 4657 + static int probe_kern_syscall_wrapper(void); 4658 + 4714 4659 enum kern_feature_result { 4715 4660 FEAT_UNKNOWN = 0, 4716 4661 FEAT_SUPPORTED = 1, ··· 4780 4721 }, 4781 4722 [FEAT_BTF_ENUM64] = { 4782 4723 "BTF_KIND_ENUM64 support", probe_kern_btf_enum64, 4724 + }, 4725 + [FEAT_SYSCALL_WRAPPER] = { 4726 + "Kernel using syscall wrapper", probe_kern_syscall_wrapper, 4783 4727 }, 4784 4728 }; 4785 4729 ··· 4916 4854 4917 4855 static void bpf_map__destroy(struct bpf_map *map); 4918 4856 4919 - static size_t adjust_ringbuf_sz(size_t sz) 4920 - { 4921 - __u32 page_sz = sysconf(_SC_PAGE_SIZE); 4922 - __u32 mul; 4923 - 4924 - /* if user forgot to set any size, make sure they see error */ 4925 - if (sz == 0) 4926 - return 0; 4927 - /* Kernel expects BPF_MAP_TYPE_RINGBUF's max_entries to be 4928 - * a power-of-2 multiple of kernel's page size. If user diligently 4929 - * satisified these conditions, pass the size through. 4930 - */ 4931 - if ((sz % page_sz) == 0 && is_pow_of_2(sz / page_sz)) 4932 - return sz; 4933 - 4934 - /* Otherwise find closest (page_sz * power_of_2) product bigger than 4935 - * user-set size to satisfy both user size request and kernel 4936 - * requirements and substitute correct max_entries for map creation. 
4937 - */ 4938 - for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) { 4939 - if (mul * page_sz > sz) 4940 - return mul * page_sz; 4941 - } 4942 - 4943 - /* if it's impossible to satisfy the conditions (i.e., user size is 4944 - * very close to UINT_MAX but is not a power-of-2 multiple of 4945 - * page_size) then just return original size and let kernel reject it 4946 - */ 4947 - return sz; 4948 - } 4949 - 4950 4857 static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, bool is_inner) 4951 4858 { 4952 4859 LIBBPF_OPTS(bpf_map_create_opts, create_attr); ··· 4954 4923 } 4955 4924 4956 4925 switch (def->type) { 4957 - case BPF_MAP_TYPE_RINGBUF: 4958 - map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries); 4959 - /* fallthrough */ 4960 4926 case BPF_MAP_TYPE_PERF_EVENT_ARRAY: 4961 4927 case BPF_MAP_TYPE_CGROUP_ARRAY: 4962 4928 case BPF_MAP_TYPE_STACK_TRACE: ··· 7310 7282 return 0; 7311 7283 7312 7284 if (ext->is_set && ext->ksym.addr != sym_addr) { 7313 - pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n", 7285 + pr_warn("extern (ksym) '%s': resolution is ambiguous: 0x%llx or 0x%llx\n", 7314 7286 sym_name, ext->ksym.addr, sym_addr); 7315 7287 return -EINVAL; 7316 7288 } 7317 7289 if (!ext->is_set) { 7318 7290 ext->is_set = true; 7319 7291 ext->ksym.addr = sym_addr; 7320 - pr_debug("extern (ksym) %s=0x%llx\n", sym_name, sym_addr); 7292 + pr_debug("extern (ksym) '%s': set to 0x%llx\n", sym_name, sym_addr); 7321 7293 } 7322 7294 return 0; 7323 7295 } ··· 7521 7493 for (i = 0; i < obj->nr_extern; i++) { 7522 7494 ext = &obj->externs[i]; 7523 7495 7524 - if (ext->type == EXT_KCFG && 7525 - strcmp(ext->name, "LINUX_KERNEL_VERSION") == 0) { 7526 - void *ext_val = kcfg_data + ext->kcfg.data_off; 7527 - __u32 kver = get_kernel_version(); 7528 - 7529 - if (!kver) { 7530 - pr_warn("failed to get kernel version\n"); 7531 - return -EINVAL; 7532 - } 7533 - err = set_kcfg_value_num(ext, ext_val, kver); 7534 - if (err) 7535 - 
return err; 7536 - pr_debug("extern (kcfg) %s=0x%x\n", ext->name, kver); 7537 - } else if (ext->type == EXT_KCFG && str_has_pfx(ext->name, "CONFIG_")) { 7538 - need_config = true; 7539 - } else if (ext->type == EXT_KSYM) { 7496 + if (ext->type == EXT_KSYM) { 7540 7497 if (ext->ksym.type_id) 7541 7498 need_vmlinux_btf = true; 7542 7499 else 7543 7500 need_kallsyms = true; 7501 + continue; 7502 + } else if (ext->type == EXT_KCFG) { 7503 + void *ext_ptr = kcfg_data + ext->kcfg.data_off; 7504 + __u64 value = 0; 7505 + 7506 + /* Kconfig externs need actual /proc/config.gz */ 7507 + if (str_has_pfx(ext->name, "CONFIG_")) { 7508 + need_config = true; 7509 + continue; 7510 + } 7511 + 7512 + /* Virtual kcfg externs are customly handled by libbpf */ 7513 + if (strcmp(ext->name, "LINUX_KERNEL_VERSION") == 0) { 7514 + value = get_kernel_version(); 7515 + if (!value) { 7516 + pr_warn("extern (kcfg) '%s': failed to get kernel version\n", ext->name); 7517 + return -EINVAL; 7518 + } 7519 + } else if (strcmp(ext->name, "LINUX_HAS_BPF_COOKIE") == 0) { 7520 + value = kernel_supports(obj, FEAT_BPF_COOKIE); 7521 + } else if (strcmp(ext->name, "LINUX_HAS_SYSCALL_WRAPPER") == 0) { 7522 + value = kernel_supports(obj, FEAT_SYSCALL_WRAPPER); 7523 + } else if (!str_has_pfx(ext->name, "LINUX_") || !ext->is_weak) { 7524 + /* Currently libbpf supports only CONFIG_ and LINUX_ prefixed 7525 + * __kconfig externs, where LINUX_ ones are virtual and filled out 7526 + * customly by libbpf (their values don't come from Kconfig). 7527 + * If LINUX_xxx variable is not recognized by libbpf, but is marked 7528 + * __weak, it defaults to zero value, just like for CONFIG_xxx 7529 + * externs. 
7530 + */ 7531 + pr_warn("extern (kcfg) '%s': unrecognized virtual extern\n", ext->name); 7532 + return -EINVAL; 7533 + } 7534 + 7535 + err = set_kcfg_value_num(ext, ext_ptr, value); 7536 + if (err) 7537 + return err; 7538 + pr_debug("extern (kcfg) '%s': set to 0x%llx\n", 7539 + ext->name, (long long)value); 7544 7540 } else { 7545 - pr_warn("unrecognized extern '%s'\n", ext->name); 7541 + pr_warn("extern '%s': unrecognized extern kind\n", ext->name); 7546 7542 return -EINVAL; 7547 7543 } 7548 7544 } ··· 7602 7550 ext = &obj->externs[i]; 7603 7551 7604 7552 if (!ext->is_set && !ext->is_weak) { 7605 - pr_warn("extern %s (strong) not resolved\n", ext->name); 7553 + pr_warn("extern '%s' (strong): not resolved\n", ext->name); 7606 7554 return -ESRCH; 7607 7555 } else if (!ext->is_set) { 7608 - pr_debug("extern %s (weak) not resolved, defaulting to zero\n", 7556 + pr_debug("extern '%s' (weak): not resolved, defaulting to zero\n", 7609 7557 ext->name); 7610 7558 } 7611 7559 } ··· 8433 8381 8434 8382 static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link); 8435 8383 static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link); 8384 + static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link); 8436 8385 static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_link **link); 8437 8386 static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link); 8438 8387 static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link); ··· 8454 8401 SEC_DEF("uretprobe.s+", KPROBE, 0, SEC_SLEEPABLE, attach_uprobe), 8455 8402 SEC_DEF("kprobe.multi+", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi), 8456 8403 SEC_DEF("kretprobe.multi+", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi), 8404 + SEC_DEF("ksyscall+", KPROBE, 0, SEC_NONE, attach_ksyscall), 8405 + 
SEC_DEF("kretsyscall+", KPROBE, 0, SEC_NONE, attach_ksyscall), 8457 8406 SEC_DEF("usdt+", KPROBE, 0, SEC_NONE, attach_usdt), 8458 8407 SEC_DEF("tc", SCHED_CLS, 0, SEC_NONE), 8459 8408 SEC_DEF("classifier", SCHED_CLS, 0, SEC_NONE), ··· 9812 9757 { 9813 9758 struct perf_event_attr attr = {}; 9814 9759 char errmsg[STRERR_BUFSIZE]; 9815 - int type, pfd, err; 9760 + int type, pfd; 9816 9761 9817 9762 if (ref_ctr_off >= (1ULL << PERF_UPROBE_REF_CTR_OFFSET_BITS)) 9818 9763 return -EINVAL; ··· 9848 9793 pid < 0 ? -1 : pid /* pid */, 9849 9794 pid == -1 ? 0 : -1 /* cpu */, 9850 9795 -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC); 9851 - if (pfd < 0) { 9852 - err = -errno; 9853 - pr_warn("%s perf_event_open() failed: %s\n", 9854 - uprobe ? "uprobe" : "kprobe", 9855 - libbpf_strerror_r(err, errmsg, sizeof(errmsg))); 9856 - return err; 9857 - } 9858 - return pfd; 9796 + return pfd >= 0 ? pfd : -errno; 9859 9797 } 9860 9798 9861 9799 static int append_to_file(const char *file, const char *fmt, ...) ··· 9871 9823 return err; 9872 9824 } 9873 9825 9826 + #define DEBUGFS "/sys/kernel/debug/tracing" 9827 + #define TRACEFS "/sys/kernel/tracing" 9828 + 9829 + static bool use_debugfs(void) 9830 + { 9831 + static int has_debugfs = -1; 9832 + 9833 + if (has_debugfs < 0) 9834 + has_debugfs = access(DEBUGFS, F_OK) == 0; 9835 + 9836 + return has_debugfs == 1; 9837 + } 9838 + 9839 + static const char *tracefs_path(void) 9840 + { 9841 + return use_debugfs() ? DEBUGFS : TRACEFS; 9842 + } 9843 + 9844 + static const char *tracefs_kprobe_events(void) 9845 + { 9846 + return use_debugfs() ? DEBUGFS"/kprobe_events" : TRACEFS"/kprobe_events"; 9847 + } 9848 + 9849 + static const char *tracefs_uprobe_events(void) 9850 + { 9851 + return use_debugfs() ? 
DEBUGFS"/uprobe_events" : TRACEFS"/uprobe_events"; 9852 + } 9853 + 9874 9854 static void gen_kprobe_legacy_event_name(char *buf, size_t buf_sz, 9875 9855 const char *kfunc_name, size_t offset) 9876 9856 { ··· 9911 9835 static int add_kprobe_event_legacy(const char *probe_name, bool retprobe, 9912 9836 const char *kfunc_name, size_t offset) 9913 9837 { 9914 - const char *file = "/sys/kernel/debug/tracing/kprobe_events"; 9915 - 9916 - return append_to_file(file, "%c:%s/%s %s+0x%zx", 9838 + return append_to_file(tracefs_kprobe_events(), "%c:%s/%s %s+0x%zx", 9917 9839 retprobe ? 'r' : 'p', 9918 9840 retprobe ? "kretprobes" : "kprobes", 9919 9841 probe_name, kfunc_name, offset); ··· 9919 9845 9920 9846 static int remove_kprobe_event_legacy(const char *probe_name, bool retprobe) 9921 9847 { 9922 - const char *file = "/sys/kernel/debug/tracing/kprobe_events"; 9923 - 9924 - return append_to_file(file, "-:%s/%s", retprobe ? "kretprobes" : "kprobes", probe_name); 9848 + return append_to_file(tracefs_kprobe_events(), "-:%s/%s", 9849 + retprobe ? "kretprobes" : "kprobes", probe_name); 9925 9850 } 9926 9851 9927 9852 static int determine_kprobe_perf_type_legacy(const char *probe_name, bool retprobe) 9928 9853 { 9929 9854 char file[256]; 9930 9855 9931 - snprintf(file, sizeof(file), 9932 - "/sys/kernel/debug/tracing/events/%s/%s/id", 9933 - retprobe ? "kretprobes" : "kprobes", probe_name); 9856 + snprintf(file, sizeof(file), "%s/events/%s/%s/id", 9857 + tracefs_path(), retprobe ? 
"kretprobes" : "kprobes", probe_name); 9934 9858 9935 9859 return parse_uint_from_file(file, "%d\n"); 9936 9860 } ··· 9975 9903 /* Clear the newly added legacy kprobe_event */ 9976 9904 remove_kprobe_event_legacy(probe_name, retprobe); 9977 9905 return err; 9906 + } 9907 + 9908 + static const char *arch_specific_syscall_pfx(void) 9909 + { 9910 + #if defined(__x86_64__) 9911 + return "x64"; 9912 + #elif defined(__i386__) 9913 + return "ia32"; 9914 + #elif defined(__s390x__) 9915 + return "s390x"; 9916 + #elif defined(__s390__) 9917 + return "s390"; 9918 + #elif defined(__arm__) 9919 + return "arm"; 9920 + #elif defined(__aarch64__) 9921 + return "arm64"; 9922 + #elif defined(__mips__) 9923 + return "mips"; 9924 + #elif defined(__riscv) 9925 + return "riscv"; 9926 + #else 9927 + return NULL; 9928 + #endif 9929 + } 9930 + 9931 + static int probe_kern_syscall_wrapper(void) 9932 + { 9933 + char syscall_name[64]; 9934 + const char *ksys_pfx; 9935 + 9936 + ksys_pfx = arch_specific_syscall_pfx(); 9937 + if (!ksys_pfx) 9938 + return 0; 9939 + 9940 + snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx); 9941 + 9942 + if (determine_kprobe_perf_type() >= 0) { 9943 + int pfd; 9944 + 9945 + pfd = perf_event_open_probe(false, false, syscall_name, 0, getpid(), 0); 9946 + if (pfd >= 0) 9947 + close(pfd); 9948 + 9949 + return pfd >= 0 ? 
1 : 0; 9950 + } else { /* legacy mode */ 9951 + char probe_name[128]; 9952 + 9953 + gen_kprobe_legacy_event_name(probe_name, sizeof(probe_name), syscall_name, 0); 9954 + if (add_kprobe_event_legacy(probe_name, false, syscall_name, 0) < 0) 9955 + return 0; 9956 + 9957 + (void)remove_kprobe_event_legacy(probe_name, false); 9958 + return 1; 9959 + } 9978 9960 } 9979 9961 9980 9962 struct bpf_link * ··· 10114 9988 ); 10115 9989 10116 9990 return bpf_program__attach_kprobe_opts(prog, func_name, &opts); 9991 + } 9992 + 9993 + struct bpf_link *bpf_program__attach_ksyscall(const struct bpf_program *prog, 9994 + const char *syscall_name, 9995 + const struct bpf_ksyscall_opts *opts) 9996 + { 9997 + LIBBPF_OPTS(bpf_kprobe_opts, kprobe_opts); 9998 + char func_name[128]; 9999 + 10000 + if (!OPTS_VALID(opts, bpf_ksyscall_opts)) 10001 + return libbpf_err_ptr(-EINVAL); 10002 + 10003 + if (kernel_supports(prog->obj, FEAT_SYSCALL_WRAPPER)) { 10004 + snprintf(func_name, sizeof(func_name), "__%s_sys_%s", 10005 + arch_specific_syscall_pfx(), syscall_name); 10006 + } else { 10007 + snprintf(func_name, sizeof(func_name), "__se_sys_%s", syscall_name); 10008 + } 10009 + 10010 + kprobe_opts.retprobe = OPTS_GET(opts, retprobe, false); 10011 + kprobe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0); 10012 + 10013 + return bpf_program__attach_kprobe_opts(prog, func_name, &kprobe_opts); 10117 10014 } 10118 10015 10119 10016 /* Adapted from perf/util/string.c */ ··· 10309 10160 return libbpf_get_error(*link); 10310 10161 } 10311 10162 10163 + static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link) 10164 + { 10165 + LIBBPF_OPTS(bpf_ksyscall_opts, opts); 10166 + const char *syscall_name; 10167 + 10168 + *link = NULL; 10169 + 10170 + /* no auto-attach for SEC("ksyscall") and SEC("kretsyscall") */ 10171 + if (strcmp(prog->sec_name, "ksyscall") == 0 || strcmp(prog->sec_name, "kretsyscall") == 0) 10172 + return 0; 10173 + 10174 + opts.retprobe = 
str_has_pfx(prog->sec_name, "kretsyscall/"); 10175 + if (opts.retprobe) 10176 + syscall_name = prog->sec_name + sizeof("kretsyscall/") - 1; 10177 + else 10178 + syscall_name = prog->sec_name + sizeof("ksyscall/") - 1; 10179 + 10180 + *link = bpf_program__attach_ksyscall(prog, syscall_name, &opts); 10181 + return *link ? 0 : -errno; 10182 + } 10183 + 10312 10184 static int attach_kprobe_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link) 10313 10185 { 10314 10186 LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); ··· 10378 10208 static inline int add_uprobe_event_legacy(const char *probe_name, bool retprobe, 10379 10209 const char *binary_path, size_t offset) 10380 10210 { 10381 - const char *file = "/sys/kernel/debug/tracing/uprobe_events"; 10382 - 10383 - return append_to_file(file, "%c:%s/%s %s:0x%zx", 10211 + return append_to_file(tracefs_uprobe_events(), "%c:%s/%s %s:0x%zx", 10384 10212 retprobe ? 'r' : 'p', 10385 10213 retprobe ? "uretprobes" : "uprobes", 10386 10214 probe_name, binary_path, offset); ··· 10386 10218 10387 10219 static inline int remove_uprobe_event_legacy(const char *probe_name, bool retprobe) 10388 10220 { 10389 - const char *file = "/sys/kernel/debug/tracing/uprobe_events"; 10390 - 10391 - return append_to_file(file, "-:%s/%s", retprobe ? "uretprobes" : "uprobes", probe_name); 10221 + return append_to_file(tracefs_uprobe_events(), "-:%s/%s", 10222 + retprobe ? "uretprobes" : "uprobes", probe_name); 10392 10223 } 10393 10224 10394 10225 static int determine_uprobe_perf_type_legacy(const char *probe_name, bool retprobe) 10395 10226 { 10396 10227 char file[512]; 10397 10228 10398 - snprintf(file, sizeof(file), 10399 - "/sys/kernel/debug/tracing/events/%s/%s/id", 10400 - retprobe ? "uretprobes" : "uprobes", probe_name); 10229 + snprintf(file, sizeof(file), "%s/events/%s/%s/id", 10230 + tracefs_path(), retprobe ? 
"uretprobes" : "uprobes", probe_name); 10401 10231 10402 10232 return parse_uint_from_file(file, "%d\n"); 10403 10233 } ··· 10711 10545 ref_ctr_off = OPTS_GET(opts, ref_ctr_offset, 0); 10712 10546 pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0); 10713 10547 10714 - if (binary_path && !strchr(binary_path, '/')) { 10548 + if (!binary_path) 10549 + return libbpf_err_ptr(-EINVAL); 10550 + 10551 + if (!strchr(binary_path, '/')) { 10715 10552 err = resolve_full_path(binary_path, full_binary_path, 10716 10553 sizeof(full_binary_path)); 10717 10554 if (err) { ··· 10728 10559 if (func_name) { 10729 10560 long sym_off; 10730 10561 10731 - if (!binary_path) { 10732 - pr_warn("prog '%s': name-based attach requires binary_path\n", 10733 - prog->name); 10734 - return libbpf_err_ptr(-EINVAL); 10735 - } 10736 10562 sym_off = elf_find_func_offset(binary_path, func_name); 10737 10563 if (sym_off < 0) 10738 10564 return libbpf_err_ptr(sym_off); ··· 10875 10711 return libbpf_err_ptr(-EINVAL); 10876 10712 } 10877 10713 10714 + if (!binary_path) 10715 + return libbpf_err_ptr(-EINVAL); 10716 + 10878 10717 if (!strchr(binary_path, '/')) { 10879 10718 err = resolve_full_path(binary_path, resolved_path, sizeof(resolved_path)); 10880 10719 if (err) { ··· 10943 10776 char file[PATH_MAX]; 10944 10777 int ret; 10945 10778 10946 - ret = snprintf(file, sizeof(file), 10947 - "/sys/kernel/debug/tracing/events/%s/%s/id", 10948 - tp_category, tp_name); 10779 + ret = snprintf(file, sizeof(file), "%s/events/%s/%s/id", 10780 + tracefs_path(), tp_category, tp_name); 10949 10781 if (ret < 0) 10950 10782 return -errno; 10951 10783 if (ret >= sizeof(file)) { ··· 11892 11726 return libbpf_err(-ENOENT); 11893 11727 11894 11728 return cpu_buf->fd; 11729 + } 11730 + 11731 + int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf, size_t *buf_size) 11732 + { 11733 + struct perf_cpu_buf *cpu_buf; 11734 + 11735 + if (buf_idx >= pb->cpu_cnt) 11736 + return libbpf_err(-EINVAL); 11737 + 11738 + 
cpu_buf = pb->cpu_bufs[buf_idx]; 11739 + if (!cpu_buf) 11740 + return libbpf_err(-ENOENT); 11741 + 11742 + *buf = cpu_buf->base; 11743 + *buf_size = pb->mmap_size; 11744 + return 0; 11895 11745 } 11896 11746 11897 11747 /*
+62
tools/lib/bpf/libbpf.h
··· 457 457 const char *pattern, 458 458 const struct bpf_kprobe_multi_opts *opts); 459 459 460 + struct bpf_ksyscall_opts { 461 + /* size of this struct, for forward/backward compatibility */ 462 + size_t sz; 463 + /* custom user-provided value fetchable through bpf_get_attach_cookie() */ 464 + __u64 bpf_cookie; 465 + /* attach as return probe? */ 466 + bool retprobe; 467 + size_t :0; 468 + }; 469 + #define bpf_ksyscall_opts__last_field retprobe 470 + 471 + /** 472 + * @brief **bpf_program__attach_ksyscall()** attaches a BPF program 473 + * to the kernel syscall handler of a specified syscall. Optionally, it's 474 + * possible to request installing a retprobe that is triggered at syscall 475 + * exit. It's also possible to associate a BPF cookie (through options). 476 + * 477 + * Libbpf will automatically determine the correct full kernel function name, 478 + * which, depending on system architecture and kernel version/configuration, 479 + * could be of the form __<arch>_sys_<syscall> or __se_sys_<syscall>, and will 480 + * attach the specified program using the kprobe/kretprobe mechanism. 481 + * 482 + * **bpf_program__attach_ksyscall()** is an API counterpart of the declarative 483 + * **SEC("ksyscall/<syscall>")** annotation of BPF programs. 484 + * 485 + * At the moment **SEC("ksyscall")** and **bpf_program__attach_ksyscall()** do 486 + * not handle all the calling convention quirks for mmap(), clone() and compat 487 + * syscalls. They also only attach to "native" syscall interfaces. If the host 488 + * system supports compat syscalls or defines 32-bit syscalls in a 64-bit 489 + * kernel, such syscall interfaces won't be attached to by libbpf. 490 + * 491 + * These limitations may or may not change in the future. Therefore it is 492 + * recommended to use SEC("kprobe") for these syscalls, or if working with 493 + * compat and 32-bit interfaces is required. 
494 + * 495 + * @param prog BPF program to attach 496 + * @param syscall_name Symbolic name of the syscall (e.g., "bpf") 497 + * @param opts Additional options (see **struct bpf_ksyscall_opts**) 498 + * @return Reference to the newly created BPF link; or NULL is returned on 499 + * error, with the error code stored in errno 500 + */ 501 + LIBBPF_API struct bpf_link * 502 + bpf_program__attach_ksyscall(const struct bpf_program *prog, 503 + const char *syscall_name, 504 + const struct bpf_ksyscall_opts *opts); 505 + 460 506 struct bpf_uprobe_opts { 461 507 /* size of this struct, for forward/backward compatibility */ 462 508 size_t sz; ··· 1099 1053 LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx); 1100 1054 LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb); 1101 1055 LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx); 1056 + /** 1057 + * @brief **perf_buffer__buffer()** returns the per-cpu raw mmap()'ed underlying 1058 + * memory region of the ring buffer. 1059 + * This ring buffer can be used to implement a custom events consumer. 1060 + * The ring buffer starts with the *struct perf_event_mmap_page*, which 1061 + * holds the ring buffer management fields; when accessing the header 1062 + * structure it's important to be SMP aware. 1063 + * You can refer to *perf_event_read_simple* for a simple example. 1064 + * @param pb the perf buffer structure 1065 + * @param buf_idx the buffer index to retrieve 1066 + * @param buf (out) gets the base pointer of the mmap()'ed memory 1067 + * @param buf_size (out) gets the size of the mmap()'ed region 1068 + * @return 0 on success, negative error code for failure 1069 + */ 1070 + LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf, 1071 + size_t *buf_size); 1102 1072 1103 1073 struct bpf_prog_linfo; 1104 1074 struct bpf_prog_info;
+2
tools/lib/bpf/libbpf.map
··· 356 356 LIBBPF_1.0.0 { 357 357 global: 358 358 bpf_prog_query_opts; 359 + bpf_program__attach_ksyscall; 359 360 btf__add_enum64; 360 361 btf__add_enum64_value; 361 362 libbpf_bpf_attach_type_str; 362 363 libbpf_bpf_link_type_str; 363 364 libbpf_bpf_map_type_str; 364 365 libbpf_bpf_prog_type_str; 366 + perf_buffer__buffer; 365 367 };
+5 -3
tools/lib/bpf/libbpf_internal.h
··· 108 108 size_t str_len = strlen(str); 109 109 size_t sfx_len = strlen(sfx); 110 110 111 - if (sfx_len <= str_len) 112 - return strcmp(str + str_len - sfx_len, sfx); 113 - return false; 111 + if (sfx_len > str_len) 112 + return false; 113 + return strcmp(str + str_len - sfx_len, sfx) == 0; 114 114 } 115 115 116 116 /* Symbol versioning is different between static and shared library. ··· 352 352 FEAT_BPF_COOKIE, 353 353 /* BTF_KIND_ENUM64 support and BTF_KIND_ENUM kflag support */ 354 354 FEAT_BTF_ENUM64, 355 + /* Kernel uses syscall wrapper (CONFIG_ARCH_HAS_SYSCALL_WRAPPER) */ 356 + FEAT_SYSCALL_WRAPPER, 355 357 __FEAT_CNT, 356 358 }; 357 359
+2 -14
tools/lib/bpf/usdt.bpf.h
··· 6 6 #include <linux/errno.h> 7 7 #include <bpf/bpf_helpers.h> 8 8 #include <bpf/bpf_tracing.h> 9 - #include <bpf/bpf_core_read.h> 10 9 11 10 /* Below types and maps are internal implementation details of libbpf's USDT 12 11 * support and are subjects to change. Also, bpf_usdt_xxx() API helpers should ··· 28 29 */ 29 30 #ifndef BPF_USDT_MAX_IP_CNT 30 31 #define BPF_USDT_MAX_IP_CNT (4 * BPF_USDT_MAX_SPEC_CNT) 31 - #endif 32 - /* We use BPF CO-RE to detect support for BPF cookie from BPF side. This is 33 - * the only dependency on CO-RE, so if it's undesirable, user can override 34 - * BPF_USDT_HAS_BPF_COOKIE to specify whether to BPF cookie is supported or not. 35 - */ 36 - #ifndef BPF_USDT_HAS_BPF_COOKIE 37 - #define BPF_USDT_HAS_BPF_COOKIE \ 38 - bpf_core_enum_value_exists(enum bpf_func_id___usdt, BPF_FUNC_get_attach_cookie___usdt) 39 32 #endif 40 33 41 34 enum __bpf_usdt_arg_type { ··· 74 83 __type(value, __u32); 75 84 } __bpf_usdt_ip_to_spec_id SEC(".maps") __weak; 76 85 77 - /* don't rely on user's BPF code to have latest definition of bpf_func_id */ 78 - enum bpf_func_id___usdt { 79 - BPF_FUNC_get_attach_cookie___usdt = 0xBAD, /* value doesn't matter */ 80 - }; 86 + extern const _Bool LINUX_HAS_BPF_COOKIE __kconfig; 81 87 82 88 static __always_inline 83 89 int __bpf_usdt_spec_id(struct pt_regs *ctx) 84 90 { 85 - if (!BPF_USDT_HAS_BPF_COOKIE) { 91 + if (!LINUX_HAS_BPF_COOKIE) { 86 92 long ip = PT_REGS_IP(ctx); 87 93 int *spec_id_ptr; 88 94
+5 -5
tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
··· 148 148 .write = bpf_testmod_test_write, 149 149 }; 150 150 151 - BTF_SET_START(bpf_testmod_check_kfunc_ids) 152 - BTF_ID(func, bpf_testmod_test_mod_kfunc) 153 - BTF_SET_END(bpf_testmod_check_kfunc_ids) 151 + BTF_SET8_START(bpf_testmod_check_kfunc_ids) 152 + BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc) 153 + BTF_SET8_END(bpf_testmod_check_kfunc_ids) 154 154 155 155 static const struct btf_kfunc_id_set bpf_testmod_kfunc_set = { 156 - .owner = THIS_MODULE, 157 - .check_set = &bpf_testmod_check_kfunc_ids, 156 + .owner = THIS_MODULE, 157 + .set = &bpf_testmod_check_kfunc_ids, 158 158 }; 159 159 160 160 extern int bpf_fentry_test1(int a);
+16
tools/testing/selftests/bpf/prog_tests/bpf_iter.c
··· 27 27 #include "bpf_iter_test_kern5.skel.h" 28 28 #include "bpf_iter_test_kern6.skel.h" 29 29 #include "bpf_iter_bpf_link.skel.h" 30 + #include "bpf_iter_ksym.skel.h" 30 31 31 32 static int duration; 32 33 ··· 1121 1120 bpf_iter_bpf_link__destroy(skel); 1122 1121 } 1123 1122 1123 + static void test_ksym_iter(void) 1124 + { 1125 + struct bpf_iter_ksym *skel; 1126 + 1127 + skel = bpf_iter_ksym__open_and_load(); 1128 + if (!ASSERT_OK_PTR(skel, "bpf_iter_ksym__open_and_load")) 1129 + return; 1130 + 1131 + do_dummy_read(skel->progs.dump_ksym); 1132 + 1133 + bpf_iter_ksym__destroy(skel); 1134 + } 1135 + 1124 1136 #define CMP_BUFFER_SIZE 1024 1125 1137 static char task_vma_output[CMP_BUFFER_SIZE]; 1126 1138 static char proc_maps_output[CMP_BUFFER_SIZE]; ··· 1281 1267 test_buf_neg_offset(); 1282 1268 if (test__start_subtest("link-iter")) 1283 1269 test_link_iter(); 1270 + if (test__start_subtest("ksym")) 1271 + test_ksym_iter(); 1284 1272 }
+63 -1
tools/testing/selftests/bpf/prog_tests/bpf_nf.c
··· 2 2 #include <test_progs.h> 3 3 #include <network_helpers.h> 4 4 #include "test_bpf_nf.skel.h" 5 + #include "test_bpf_nf_fail.skel.h" 6 + 7 + static char log_buf[1024 * 1024]; 8 + 9 + struct { 10 + const char *prog_name; 11 + const char *err_msg; 12 + } test_bpf_nf_fail_tests[] = { 13 + { "alloc_release", "kernel function bpf_ct_release args#0 expected pointer to STRUCT nf_conn but" }, 14 + { "insert_insert", "kernel function bpf_ct_insert_entry args#0 expected pointer to STRUCT nf_conn___init but" }, 15 + { "lookup_insert", "kernel function bpf_ct_insert_entry args#0 expected pointer to STRUCT nf_conn___init but" }, 16 + { "set_timeout_after_insert", "kernel function bpf_ct_set_timeout args#0 expected pointer to STRUCT nf_conn___init but" }, 17 + { "set_status_after_insert", "kernel function bpf_ct_set_status args#0 expected pointer to STRUCT nf_conn___init but" }, 18 + { "change_timeout_after_alloc", "kernel function bpf_ct_change_timeout args#0 expected pointer to STRUCT nf_conn but" }, 19 + { "change_status_after_alloc", "kernel function bpf_ct_change_status args#0 expected pointer to STRUCT nf_conn but" }, 20 + }; 5 21 6 22 enum { 7 23 TEST_XDP, 8 24 TEST_TC_BPF, 9 25 }; 10 26 11 - void test_bpf_nf_ct(int mode) 27 + static void test_bpf_nf_ct(int mode) 12 28 { 13 29 struct test_bpf_nf *skel; 14 30 int prog_fd, err; ··· 55 39 ASSERT_EQ(skel->bss->test_enonet_netns_id, -ENONET, "Test ENONET for bad but valid netns_id"); 56 40 ASSERT_EQ(skel->bss->test_enoent_lookup, -ENOENT, "Test ENOENT for failed lookup"); 57 41 ASSERT_EQ(skel->bss->test_eafnosupport, -EAFNOSUPPORT, "Test EAFNOSUPPORT for invalid len__tuple"); 42 + ASSERT_EQ(skel->data->test_alloc_entry, 0, "Test for alloc new entry"); 43 + ASSERT_EQ(skel->data->test_insert_entry, 0, "Test for insert new entry"); 44 + ASSERT_EQ(skel->data->test_succ_lookup, 0, "Test for successful lookup"); 45 + /* allow some tolerance for test_delta_timeout value to avoid races. 
*/ 46 + ASSERT_GT(skel->bss->test_delta_timeout, 8, "Test for min ct timeout update"); 47 + ASSERT_LE(skel->bss->test_delta_timeout, 10, "Test for max ct timeout update"); 48 + /* expected status is IPS_SEEN_REPLY */ 49 + ASSERT_EQ(skel->bss->test_status, 2, "Test for ct status update "); 58 50 end: 59 51 test_bpf_nf__destroy(skel); 60 52 } 61 53 54 + static void test_bpf_nf_ct_fail(const char *prog_name, const char *err_msg) 55 + { 56 + LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_buf = log_buf, 57 + .kernel_log_size = sizeof(log_buf), 58 + .kernel_log_level = 1); 59 + struct test_bpf_nf_fail *skel; 60 + struct bpf_program *prog; 61 + int ret; 62 + 63 + skel = test_bpf_nf_fail__open_opts(&opts); 64 + if (!ASSERT_OK_PTR(skel, "test_bpf_nf_fail__open")) 65 + return; 66 + 67 + prog = bpf_object__find_program_by_name(skel->obj, prog_name); 68 + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 69 + goto end; 70 + 71 + bpf_program__set_autoload(prog, true); 72 + 73 + ret = test_bpf_nf_fail__load(skel); 74 + if (!ASSERT_ERR(ret, "test_bpf_nf_fail__load must fail")) 75 + goto end; 76 + 77 + if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) { 78 + fprintf(stderr, "Expected: %s\n", err_msg); 79 + fprintf(stderr, "Verifier: %s\n", log_buf); 80 + } 81 + 82 + end: 83 + test_bpf_nf_fail__destroy(skel); 84 + } 85 + 62 86 void test_bpf_nf(void) 63 87 { 88 + int i; 64 89 if (test__start_subtest("xdp-ct")) 65 90 test_bpf_nf_ct(TEST_XDP); 66 91 if (test__start_subtest("tc-bpf-ct")) 67 92 test_bpf_nf_ct(TEST_TC_BPF); 93 + for (i = 0; i < ARRAY_SIZE(test_bpf_nf_fail_tests); i++) { 94 + if (test__start_subtest(test_bpf_nf_fail_tests[i].prog_name)) 95 + test_bpf_nf_ct_fail(test_bpf_nf_fail_tests[i].prog_name, 96 + test_bpf_nf_fail_tests[i].err_msg); 97 + } 68 98 }
+1 -1
tools/testing/selftests/bpf/prog_tests/btf.c
··· 5338 5338 ret = snprintf(pin_path, sizeof(pin_path), "%s/%s", 5339 5339 "/sys/fs/bpf", test->map_name); 5340 5340 5341 - if (CHECK(ret == sizeof(pin_path), "pin_path %s/%s is too long", 5341 + if (CHECK(ret >= sizeof(pin_path), "pin_path %s/%s is too long", 5342 5342 "/sys/fs/bpf", test->map_name)) { 5343 5343 err = -1; 5344 5344 goto done;
+7 -10
tools/testing/selftests/bpf/prog_tests/core_extern.c
··· 39 39 "CONFIG_STR=\"abracad\"\n" 40 40 "CONFIG_MISSING=0", 41 41 .data = { 42 + .unkn_virt_val = 0, 42 43 .bpf_syscall = false, 43 44 .tristate_val = TRI_MODULE, 44 45 .bool_val = true, ··· 122 121 void test_core_extern(void) 123 122 { 124 123 const uint32_t kern_ver = get_kernel_version(); 125 - int err, duration = 0, i, j; 124 + int err, i, j; 126 125 struct test_core_extern *skel = NULL; 127 126 uint64_t *got, *exp; 128 127 int n = sizeof(*skel->data) / sizeof(uint64_t); ··· 137 136 continue; 138 137 139 138 skel = test_core_extern__open_opts(&opts); 140 - if (CHECK(!skel, "skel_open", "skeleton open failed\n")) 139 + if (!ASSERT_OK_PTR(skel, "skel_open")) 141 140 goto cleanup; 142 141 err = test_core_extern__load(skel); 143 142 if (t->fails) { 144 - CHECK(!err, "skel_load", 145 - "shouldn't succeed open/load of skeleton\n"); 143 + ASSERT_ERR(err, "skel_load_should_fail"); 146 144 goto cleanup; 147 - } else if (CHECK(err, "skel_load", 148 - "failed to open/load skeleton\n")) { 145 + } else if (!ASSERT_OK(err, "skel_load")) { 149 146 goto cleanup; 150 147 } 151 148 err = test_core_extern__attach(skel); 152 - if (CHECK(err, "attach_raw_tp", "failed attach: %d\n", err)) 149 + if (!ASSERT_OK(err, "attach_raw_tp")) 153 150 goto cleanup; 154 151 155 152 usleep(1); ··· 157 158 got = (uint64_t *)skel->data; 158 159 exp = (uint64_t *)&t->data; 159 160 for (j = 0; j < n; j++) { 160 - CHECK(got[j] != exp[j], "check_res", 161 - "result #%d: expected %llx, but got %llx\n", 162 - j, (__u64)exp[j], (__u64)got[j]); 161 + ASSERT_EQ(got[j], exp[j], "result"); 163 162 } 164 163 cleanup: 165 164 test_core_extern__destroy(skel);
+2
tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
··· 364 364 continue; 365 365 if (!strncmp(name, "rcu_", 4)) 366 366 continue; 367 + if (!strcmp(name, "bpf_dispatcher_xdp_func")) 368 + continue; 367 369 if (!strncmp(name, "__ftrace_invalid_address__", 368 370 sizeof("__ftrace_invalid_address__") - 1)) 369 371 continue;
+11
tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c
··· 50 50 if (CHECK(!skel, "skel_open", "skeleton open failed\n")) 51 51 return; 52 52 53 + /* validate ringbuf size adjustment logic */ 54 + ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), page_size, "rb1_size_before"); 55 + ASSERT_OK(bpf_map__set_max_entries(skel->maps.ringbuf1, page_size + 1), "rb1_resize"); 56 + ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), 2 * page_size, "rb1_size_after"); 57 + ASSERT_OK(bpf_map__set_max_entries(skel->maps.ringbuf1, page_size), "rb1_reset"); 58 + ASSERT_EQ(bpf_map__max_entries(skel->maps.ringbuf1), page_size, "rb1_size_final"); 59 + 53 60 proto_fd = bpf_map_create(BPF_MAP_TYPE_RINGBUF, NULL, 0, 0, page_size, NULL); 54 61 if (CHECK(proto_fd < 0, "bpf_map_create", "bpf_map_create failed\n")) 55 62 goto cleanup; ··· 71 64 72 65 close(proto_fd); 73 66 proto_fd = -1; 67 + 68 + /* make sure we can't resize ringbuf after object load */ 69 + if (!ASSERT_ERR(bpf_map__set_max_entries(skel->maps.ringbuf1, 3 * page_size), "rb1_resize_after_load")) 70 + goto cleanup; 74 71 75 72 /* only trigger BPF program for current process */ 76 73 skel->bss->pid = getpid();
+2
tools/testing/selftests/bpf/prog_tests/skeleton.c
··· 122 122 123 123 ASSERT_EQ(skel->bss->out_mostly_var, 123, "out_mostly_var"); 124 124 125 + ASSERT_EQ(bss->huge_arr[ARRAY_SIZE(bss->huge_arr) - 1], 123, "huge_arr"); 126 + 125 127 elf_bytes = test_skeleton__elf_bytes(&elf_bytes_sz); 126 128 ASSERT_OK_PTR(elf_bytes, "elf_bytes"); 127 129 ASSERT_GE(elf_bytes_sz, 0, "elf_bytes_sz");
+7
tools/testing/selftests/bpf/progs/bpf_iter.h
··· 22 22 #define BTF_F_NONAME BTF_F_NONAME___not_used 23 23 #define BTF_F_PTR_RAW BTF_F_PTR_RAW___not_used 24 24 #define BTF_F_ZERO BTF_F_ZERO___not_used 25 + #define bpf_iter__ksym bpf_iter__ksym___not_used 25 26 #include "vmlinux.h" 26 27 #undef bpf_iter_meta 27 28 #undef bpf_iter__bpf_map ··· 45 44 #undef BTF_F_NONAME 46 45 #undef BTF_F_PTR_RAW 47 46 #undef BTF_F_ZERO 47 + #undef bpf_iter__ksym 48 48 49 49 struct bpf_iter_meta { 50 50 struct seq_file *seq; ··· 152 150 BTF_F_NONAME = (1ULL << 1), 153 151 BTF_F_PTR_RAW = (1ULL << 2), 154 152 BTF_F_ZERO = (1ULL << 3), 153 + }; 154 + 155 + struct bpf_iter__ksym { 156 + struct bpf_iter_meta *meta; 157 + struct kallsym_iter *ksym; 155 158 };
+74
tools/testing/selftests/bpf/progs/bpf_iter_ksym.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022, Oracle and/or its affiliates. */ 3 + #include "bpf_iter.h" 4 + #include <bpf/bpf_helpers.h> 5 + 6 + char _license[] SEC("license") = "GPL"; 7 + 8 + unsigned long last_sym_value = 0; 9 + 10 + static inline char tolower(char c) 11 + { 12 + if (c >= 'A' && c <= 'Z') 13 + c += ('a' - 'A'); 14 + return c; 15 + } 16 + 17 + static inline char toupper(char c) 18 + { 19 + if (c >= 'a' && c <= 'z') 20 + c -= ('a' - 'A'); 21 + return c; 22 + } 23 + 24 + /* Dump symbols with max size; the latter is calculated by caching symbol N value 25 + * and when iterating on symbol N+1, we can print max size of symbol N via 26 + * address of N+1 - address of N. 27 + */ 28 + SEC("iter/ksym") 29 + int dump_ksym(struct bpf_iter__ksym *ctx) 30 + { 31 + struct seq_file *seq = ctx->meta->seq; 32 + struct kallsym_iter *iter = ctx->ksym; 33 + __u32 seq_num = ctx->meta->seq_num; 34 + unsigned long value; 35 + char type; 36 + int ret; 37 + 38 + if (!iter) 39 + return 0; 40 + 41 + if (seq_num == 0) { 42 + BPF_SEQ_PRINTF(seq, "ADDR TYPE NAME MODULE_NAME KIND MAX_SIZE\n"); 43 + return 0; 44 + } 45 + if (last_sym_value) 46 + BPF_SEQ_PRINTF(seq, "0x%x\n", iter->value - last_sym_value); 47 + else 48 + BPF_SEQ_PRINTF(seq, "\n"); 49 + 50 + value = iter->show_value ? iter->value : 0; 51 + 52 + last_sym_value = value; 53 + 54 + type = iter->type; 55 + 56 + if (iter->module_name[0]) { 57 + type = iter->exported ? 
toupper(type) : tolower(type); 58 + BPF_SEQ_PRINTF(seq, "0x%llx %c %s [ %s ] ", 59 + value, type, iter->name, iter->module_name); 60 + } else { 61 + BPF_SEQ_PRINTF(seq, "0x%llx %c %s ", value, type, iter->name); 62 + } 63 + if (!iter->pos_arch_end || iter->pos_arch_end > iter->pos) 64 + BPF_SEQ_PRINTF(seq, "CORE "); 65 + else if (!iter->pos_mod_end || iter->pos_mod_end > iter->pos) 66 + BPF_SEQ_PRINTF(seq, "MOD "); 67 + else if (!iter->pos_ftrace_mod_end || iter->pos_ftrace_mod_end > iter->pos) 68 + BPF_SEQ_PRINTF(seq, "FTRACE_MOD "); 69 + else if (!iter->pos_bpf_end || iter->pos_bpf_end > iter->pos) 70 + BPF_SEQ_PRINTF(seq, "BPF "); 71 + else 72 + BPF_SEQ_PRINTF(seq, "KPROBE "); 73 + return 0; 74 + }
+3 -3
tools/testing/selftests/bpf/progs/bpf_syscall_macro.c
··· 64 64 return 0; 65 65 } 66 66 67 - SEC("kprobe/" SYS_PREFIX "sys_prctl") 68 - int BPF_KPROBE_SYSCALL(prctl_enter, int option, unsigned long arg2, 69 - unsigned long arg3, unsigned long arg4, unsigned long arg5) 67 + SEC("ksyscall/prctl") 68 + int BPF_KSYSCALL(prctl_enter, int option, unsigned long arg2, 69 + unsigned long arg3, unsigned long arg4, unsigned long arg5) 70 70 { 71 71 pid_t pid = bpf_get_current_pid_tgid() >> 32; 72 72
+7 -8
tools/testing/selftests/bpf/progs/test_attach_probe.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 // Copyright (c) 2017 Facebook 3 3 4 - #include <linux/ptrace.h> 5 - #include <linux/bpf.h> 4 + #include "vmlinux.h" 6 5 #include <bpf/bpf_helpers.h> 7 6 #include <bpf/bpf_tracing.h> 8 - #include <stdbool.h> 7 + #include <bpf/bpf_core_read.h> 9 8 #include "bpf_misc.h" 10 9 11 10 int kprobe_res = 0; ··· 30 31 return 0; 31 32 } 32 33 33 - SEC("kprobe/" SYS_PREFIX "sys_nanosleep") 34 - int BPF_KPROBE(handle_kprobe_auto) 34 + SEC("ksyscall/nanosleep") 35 + int BPF_KSYSCALL(handle_kprobe_auto, struct __kernel_timespec *req, struct __kernel_timespec *rem) 35 36 { 36 37 kprobe2_res = 11; 37 38 return 0; ··· 55 56 return 0; 56 57 } 57 58 58 - SEC("kretprobe/" SYS_PREFIX "sys_nanosleep") 59 - int BPF_KRETPROBE(handle_kretprobe_auto) 59 + SEC("kretsyscall/nanosleep") 60 + int BPF_KRETPROBE(handle_kretprobe_auto, int ret) 60 61 { 61 62 kretprobe2_res = 22; 62 - return 0; 63 + return ret; 63 64 } 64 65 65 66 SEC("uprobe")
+73 -12
tools/testing/selftests/bpf/progs/test_bpf_nf.c
··· 8 8 #define EINVAL 22 9 9 #define ENOENT 2 10 10 11 + extern unsigned long CONFIG_HZ __kconfig; 12 + 11 13 int test_einval_bpf_tuple = 0; 12 14 int test_einval_reserved = 0; 13 15 int test_einval_netns_id = 0; ··· 18 16 int test_enonet_netns_id = 0; 19 17 int test_enoent_lookup = 0; 20 18 int test_eafnosupport = 0; 19 + int test_alloc_entry = -EINVAL; 20 + int test_insert_entry = -EAFNOSUPPORT; 21 + int test_succ_lookup = -ENOENT; 22 + u32 test_delta_timeout = 0; 23 + u32 test_status = 0; 21 24 22 25 struct nf_conn; 23 26 ··· 33 26 u8 reserved[3]; 34 27 } __attribute__((preserve_access_index)); 35 28 29 + struct nf_conn *bpf_xdp_ct_alloc(struct xdp_md *, struct bpf_sock_tuple *, u32, 30 + struct bpf_ct_opts___local *, u32) __ksym; 36 31 struct nf_conn *bpf_xdp_ct_lookup(struct xdp_md *, struct bpf_sock_tuple *, u32, 37 32 struct bpf_ct_opts___local *, u32) __ksym; 33 + struct nf_conn *bpf_skb_ct_alloc(struct __sk_buff *, struct bpf_sock_tuple *, u32, 34 + struct bpf_ct_opts___local *, u32) __ksym; 38 35 struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *, struct bpf_sock_tuple *, u32, 39 36 struct bpf_ct_opts___local *, u32) __ksym; 37 + struct nf_conn *bpf_ct_insert_entry(struct nf_conn *) __ksym; 40 38 void bpf_ct_release(struct nf_conn *) __ksym; 39 + void bpf_ct_set_timeout(struct nf_conn *, u32) __ksym; 40 + int bpf_ct_change_timeout(struct nf_conn *, u32) __ksym; 41 + int bpf_ct_set_status(struct nf_conn *, u32) __ksym; 42 + int bpf_ct_change_status(struct nf_conn *, u32) __ksym; 41 43 42 44 static __always_inline void 43 - nf_ct_test(struct nf_conn *(*func)(void *, struct bpf_sock_tuple *, u32, 44 - struct bpf_ct_opts___local *, u32), 45 + nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32, 46 + struct bpf_ct_opts___local *, u32), 47 + struct nf_conn *(*alloc_fn)(void *, struct bpf_sock_tuple *, u32, 48 + struct bpf_ct_opts___local *, u32), 45 49 void *ctx) 46 50 { 47 51 struct bpf_ct_opts___local opts_def = { .l4proto = 
 	struct bpf_sock_tuple bpf_tuple;
 	struct nf_conn *ct;
+	int err;
 
 	__builtin_memset(&bpf_tuple, 0, sizeof(bpf_tuple.ipv4));
 
-	ct = func(ctx, NULL, 0, &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, NULL, 0, &opts_def, sizeof(opts_def));
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_einval_bpf_tuple = opts_def.error;
 
 	opts_def.reserved[0] = 1;
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def));
 	opts_def.reserved[0] = 0;
 	opts_def.l4proto = IPPROTO_TCP;
 	if (ct)
···
 		test_einval_reserved = opts_def.error;
 
 	opts_def.netns_id = -2;
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def));
 	opts_def.netns_id = -1;
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_einval_netns_id = opts_def.error;
 
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def) - 1);
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def) - 1);
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_einval_len_opts = opts_def.error;
 
 	opts_def.l4proto = IPPROTO_ICMP;
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def));
 	opts_def.l4proto = IPPROTO_TCP;
 	if (ct)
 		bpf_ct_release(ct);
···
 		test_eproto_l4proto = opts_def.error;
 
 	opts_def.netns_id = 0xf00f;
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def));
 	opts_def.netns_id = -1;
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_enonet_netns_id = opts_def.error;
 
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		       sizeof(opts_def));
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_enoent_lookup = opts_def.error;
 
-	ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4) - 1, &opts_def, sizeof(opts_def));
+	ct = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4) - 1, &opts_def,
+		       sizeof(opts_def));
 	if (ct)
 		bpf_ct_release(ct);
 	else
 		test_eafnosupport = opts_def.error;
+
+	bpf_tuple.ipv4.saddr = bpf_get_prandom_u32(); /* src IP */
+	bpf_tuple.ipv4.daddr = bpf_get_prandom_u32(); /* dst IP */
+	bpf_tuple.ipv4.sport = bpf_get_prandom_u32(); /* src port */
+	bpf_tuple.ipv4.dport = bpf_get_prandom_u32(); /* dst port */
+
+	ct = alloc_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
+		      sizeof(opts_def));
+	if (ct) {
+		struct nf_conn *ct_ins;
+
+		bpf_ct_set_timeout(ct, 10000);
+		bpf_ct_set_status(ct, IPS_CONFIRMED);
+
+		ct_ins = bpf_ct_insert_entry(ct);
+		if (ct_ins) {
+			struct nf_conn *ct_lk;
+
+			ct_lk = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4),
+					  &opts_def, sizeof(opts_def));
+			if (ct_lk) {
+				/* update ct entry timeout */
+				bpf_ct_change_timeout(ct_lk, 10000);
+				test_delta_timeout = ct_lk->timeout - bpf_jiffies64();
+				test_delta_timeout /= CONFIG_HZ;
+				test_status = IPS_SEEN_REPLY;
+				bpf_ct_change_status(ct_lk, IPS_SEEN_REPLY);
+				bpf_ct_release(ct_lk);
+				test_succ_lookup = 0;
+			}
+			bpf_ct_release(ct_ins);
+			test_insert_entry = 0;
+		}
+		test_alloc_entry = 0;
+	}
 }
 
 SEC("xdp")
 int nf_xdp_ct_test(struct xdp_md *ctx)
 {
-	nf_ct_test((void *)bpf_xdp_ct_lookup, ctx);
+	nf_ct_test((void *)bpf_xdp_ct_lookup, (void *)bpf_xdp_ct_alloc, ctx);
 	return 0;
 }
 
 SEC("tc")
 int nf_skb_ct_test(struct __sk_buff *ctx)
 {
-	nf_ct_test((void *)bpf_skb_ct_lookup, ctx);
+	nf_ct_test((void *)bpf_skb_ct_lookup, (void *)bpf_skb_ct_alloc, ctx);
 	return 0;
 }
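The new alloc path above follows a strict ordering: bpf_xdp_ct_alloc/bpf_skb_ct_alloc return an unconfirmed entry, bpf_ct_set_timeout()/bpf_ct_set_status() may only touch it before insertion, bpf_ct_insert_entry() consumes it and returns a confirmed reference, and only then do the bpf_ct_change_*() kfuncs apply. A minimal userspace sketch of that lifecycle, using made-up fake_ct_* names (this is a toy model, not kernel code):

```c
#include <assert.h>

enum ct_state { CT_ALLOCATED, CT_CONFIRMED, CT_RELEASED };

struct fake_ct {
	enum ct_state state;
	unsigned int timeout;
};

static struct fake_ct *fake_ct_alloc(struct fake_ct *slot)
{
	slot->state = CT_ALLOCATED;
	slot->timeout = 0;
	return slot;
}

/* set_* helpers are only valid on an allocated, not-yet-inserted entry */
static int fake_ct_set_timeout(struct fake_ct *ct, unsigned int ms)
{
	if (ct->state != CT_ALLOCATED)
		return -1;
	ct->timeout = ms;
	return 0;
}

/* insertion consumes the allocated entry, yielding a confirmed reference */
static struct fake_ct *fake_ct_insert_entry(struct fake_ct *ct)
{
	if (ct->state != CT_ALLOCATED)
		return 0;
	ct->state = CT_CONFIRMED;
	return ct;
}

/* change_* helpers require an entry that has already been inserted */
static int fake_ct_change_timeout(struct fake_ct *ct, unsigned int ms)
{
	if (ct->state != CT_CONFIRMED)
		return -1;
	ct->timeout = ms;
	return 0;
}

static void fake_ct_release(struct fake_ct *ct)
{
	ct->state = CT_RELEASED;
}
```

In the real API these ordering rules are enforced at load time by the verifier rather than at run time.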
tools/testing/selftests/bpf/progs/test_bpf_nf_fail.c (new file, +134)
// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

struct nf_conn;

struct bpf_ct_opts___local {
	s32 netns_id;
	s32 error;
	u8 l4proto;
	u8 reserved[3];
} __attribute__((preserve_access_index));

struct nf_conn *bpf_skb_ct_alloc(struct __sk_buff *, struct bpf_sock_tuple *, u32,
				 struct bpf_ct_opts___local *, u32) __ksym;
struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *, struct bpf_sock_tuple *, u32,
				  struct bpf_ct_opts___local *, u32) __ksym;
struct nf_conn *bpf_ct_insert_entry(struct nf_conn *) __ksym;
void bpf_ct_release(struct nf_conn *) __ksym;
void bpf_ct_set_timeout(struct nf_conn *, u32) __ksym;
int bpf_ct_change_timeout(struct nf_conn *, u32) __ksym;
int bpf_ct_set_status(struct nf_conn *, u32) __ksym;
int bpf_ct_change_status(struct nf_conn *, u32) __ksym;

SEC("?tc")
int alloc_release(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	bpf_ct_release(ct);
	return 0;
}

SEC("?tc")
int insert_insert(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	ct = bpf_ct_insert_entry(ct);
	if (!ct)
		return 0;
	ct = bpf_ct_insert_entry(ct);
	return 0;
}

SEC("?tc")
int lookup_insert(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_lookup(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	bpf_ct_insert_entry(ct);
	return 0;
}

SEC("?tc")
int set_timeout_after_insert(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	ct = bpf_ct_insert_entry(ct);
	if (!ct)
		return 0;
	bpf_ct_set_timeout(ct, 0);
	return 0;
}

SEC("?tc")
int set_status_after_insert(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	ct = bpf_ct_insert_entry(ct);
	if (!ct)
		return 0;
	bpf_ct_set_status(ct, 0);
	return 0;
}

SEC("?tc")
int change_timeout_after_alloc(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	bpf_ct_change_timeout(ct, 0);
	return 0;
}

SEC("?tc")
int change_status_after_alloc(struct __sk_buff *ctx)
{
	struct bpf_ct_opts___local opts = {};
	struct bpf_sock_tuple tup = {};
	struct nf_conn *ct;

	ct = bpf_skb_ct_alloc(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
	if (!ct)
		return 0;
	bpf_ct_change_status(ct, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";
tools/testing/selftests/bpf/progs/test_core_extern.c (+3)
···
 static int (*bpf_missing_helper)(const void *arg1, int arg2) = (void *) 999;
 
 extern int LINUX_KERNEL_VERSION __kconfig;
+extern int LINUX_UNKNOWN_VIRTUAL_EXTERN __kconfig __weak;
 extern bool CONFIG_BPF_SYSCALL __kconfig; /* strong */
 extern enum libbpf_tristate CONFIG_TRISTATE __kconfig __weak;
 extern bool CONFIG_BOOL __kconfig __weak;
···
 extern uint64_t CONFIG_MISSING __kconfig __weak;
 
 uint64_t kern_ver = -1;
+uint64_t unkn_virt_val = -1;
 uint64_t bpf_syscall = -1;
 uint64_t tristate_val = -1;
 uint64_t bool_val = -1;
···
 	int i;
 
 	kern_ver = LINUX_KERNEL_VERSION;
+	unkn_virt_val = LINUX_UNKNOWN_VIRTUAL_EXTERN;
 	bpf_syscall = CONFIG_BPF_SYSCALL;
 	tristate_val = CONFIG_TRISTATE;
 	bool_val = CONFIG_BOOL;
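A `__kconfig __weak` extern lets the program load even when libbpf cannot resolve the value; the weak symbol then reads as zero. The closest userspace analogue, sketched below, is an ELF weak extern whose address resolves to NULL when no object defines it (the symbol name here is invented for illustration; assumes a Linux/ELF toolchain):

```c
#include <assert.h>
#include <stdint.h>

/* Weakly declared and never defined anywhere: on ELF the linker
 * resolves its address to NULL instead of failing the link. */
extern int MISSING_CONFIG_VALUE __attribute__((weak));

static int64_t read_weak_config(void)
{
	if (&MISSING_CONFIG_VALUE)	/* non-NULL only if some object defines it */
		return MISSING_CONFIG_VALUE;
	return -1;			/* sentinel: symbol not present */
}
```

libbpf's mechanism differs in detail (it patches the instruction stream at load time), but the "missing resolves to a known default" behavior is the same idea.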
tools/testing/selftests/bpf/progs/test_probe_user.c (+6, -21)
···
 // SPDX-License-Identifier: GPL-2.0
-
-#include <linux/ptrace.h>
-#include <linux/bpf.h>
-
-#include <netinet/in.h>
-
+#include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
 #include "bpf_misc.h"
 
 static struct sockaddr_in old;
 
-SEC("kprobe/" SYS_PREFIX "sys_connect")
-int BPF_KPROBE(handle_sys_connect)
+SEC("ksyscall/connect")
+int BPF_KSYSCALL(handle_sys_connect, int fd, struct sockaddr_in *uservaddr, int addrlen)
 {
-#if SYSCALL_WRAPPER == 1
-	struct pt_regs *real_regs;
-#endif
 	struct sockaddr_in new;
-	void *ptr;
 
-#if SYSCALL_WRAPPER == 0
-	ptr = (void *)PT_REGS_PARM2(ctx);
-#else
-	real_regs = (struct pt_regs *)PT_REGS_PARM1(ctx);
-	bpf_probe_read_kernel(&ptr, sizeof(ptr), &PT_REGS_PARM2(real_regs));
-#endif
-
-	bpf_probe_read_user(&old, sizeof(old), ptr);
+	bpf_probe_read_user(&old, sizeof(old), uservaddr);
 	__builtin_memset(&new, 0xab, sizeof(new));
-	bpf_probe_write_user(ptr, &new, sizeof(new));
+	bpf_probe_write_user(uservaddr, &new, sizeof(new));
 
 	return 0;
 }
tools/testing/selftests/bpf/progs/test_skeleton.c (+4)
···
 int read_mostly_var __read_mostly;
 int out_mostly_var;
 
+char huge_arr[16 * 1024 * 1024];
+
 SEC("raw_tp/sys_enter")
 int handler(const void *ctx)
 {
···
 		out_dynarr[i] = in_dynarr[i];
 
 	out_mostly_var = read_mostly_var;
+
+	huge_arr[sizeof(huge_arr) - 1] = 123;
 
 	return 0;
 }
tools/testing/selftests/bpf/progs/test_xdp_noinline.c (+15, -15)
···
 	udp = data + off;
 
 	if (udp + 1 > data_end)
-		return 0;
+		return false;
 	if (!is_icmp) {
 		pckt->flow.port16[0] = udp->source;
 		pckt->flow.port16[1] = udp->dest;
···
 		pckt->flow.port16[0] = udp->dest;
 		pckt->flow.port16[1] = udp->source;
 	}
-	return 1;
+	return true;
 }
 
 static __attribute__ ((noinline))
···
 	tcp = data + off;
 	if (tcp + 1 > data_end)
-		return 0;
+		return false;
 	if (tcp->syn)
 		pckt->flags |= (1 << 1);
 	if (!is_icmp) {
···
 		pckt->flow.port16[0] = tcp->dest;
 		pckt->flow.port16[1] = tcp->source;
 	}
-	return 1;
+	return true;
 }
 
 static __attribute__ ((noinline))
···
 	void *data;
 
 	if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct ipv6hdr)))
-		return 0;
+		return false;
 	data = (void *)(long)xdp->data;
 	data_end = (void *)(long)xdp->data_end;
 	new_eth = data;
···
 	old_eth = data + sizeof(struct ipv6hdr);
 	if (new_eth + 1 > data_end ||
 	    old_eth + 1 > data_end || ip6h + 1 > data_end)
-		return 0;
+		return false;
 	memcpy(new_eth->eth_dest, cval->mac, 6);
 	memcpy(new_eth->eth_source, old_eth->eth_dest, 6);
 	new_eth->eth_proto = 56710;
···
 	ip6h->saddr.in6_u.u6_addr32[2] = 3;
 	ip6h->saddr.in6_u.u6_addr32[3] = ip_suffix;
 	memcpy(ip6h->daddr.in6_u.u6_addr32, dst->dstv6, 16);
-	return 1;
+	return true;
 }
 
 static __attribute__ ((noinline))
···
 	ip_suffix <<= 15;
 	ip_suffix ^= pckt->flow.src;
 	if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct iphdr)))
-		return 0;
+		return false;
 	data = (void *)(long)xdp->data;
 	data_end = (void *)(long)xdp->data_end;
 	new_eth = data;
···
 	old_eth = data + sizeof(struct iphdr);
 	if (new_eth + 1 > data_end ||
 	    old_eth + 1 > data_end || iph + 1 > data_end)
-		return 0;
+		return false;
 	memcpy(new_eth->eth_dest, cval->mac, 6);
 	memcpy(new_eth->eth_source, old_eth->eth_dest, 6);
 	new_eth->eth_proto = 8;
···
 		csum += *next_iph_u16++;
 	iph->check = ~((csum & 0xffff) + (csum >> 16));
 	if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr)))
-		return 0;
-	return 1;
+		return false;
+	return true;
 }
 
 static __attribute__ ((noinline))
···
 	else
 		new_eth->eth_proto = 56710;
 	if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct ipv6hdr)))
-		return 0;
+		return false;
 	*data = (void *)(long)xdp->data;
 	*data_end = (void *)(long)xdp->data_end;
-	return 1;
+	return true;
 }
 
 static __attribute__ ((noinline))
···
 	memcpy(new_eth->eth_dest, old_eth->eth_dest, 6);
 	new_eth->eth_proto = 8;
 	if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr)))
-		return 0;
+		return false;
 	*data = (void *)(long)xdp->data;
 	*data_end = (void *)(long)xdp->data_end;
-	return 1;
+	return true;
 }
 
 static __attribute__ ((noinline))
tools/testing/selftests/bpf/test_xdp_veth.sh (+3, -3)
···
 	bpftool map update pinned $BPF_DIR/maps/tx_port key 0 0 0 0 value 122 0 0 0
 	bpftool map update pinned $BPF_DIR/maps/tx_port key 1 0 0 0 value 133 0 0 0
 	bpftool map update pinned $BPF_DIR/maps/tx_port key 2 0 0 0 value 111 0 0 0
-	ip link set dev veth1 xdp pinned $BPF_DIR/progs/redirect_map_0
-	ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1
-	ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2
+	ip link set dev veth1 xdp pinned $BPF_DIR/progs/xdp_redirect_map_0
+	ip link set dev veth2 xdp pinned $BPF_DIR/progs/xdp_redirect_map_1
+	ip link set dev veth3 xdp pinned $BPF_DIR/progs/xdp_redirect_map_2
 
 	ip -n ${NS1} link set dev veth11 xdp obj xdp_dummy.o sec xdp
 	ip -n ${NS2} link set dev veth22 xdp obj xdp_tx.o sec xdp
tools/testing/selftests/bpf/verifier/bpf_loop_inline.c (+1)
···
 	.expected_insns = { PSEUDO_CALL_INSN() },
 	.unexpected_insns = { HELPER_CALL_INSN() },
 	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	.func_info = { { 0, MAIN_TYPE }, { 16, CALLBACK_TYPE } },
 	.func_info_cnt = 2,
 	BTF_TYPES
tools/testing/selftests/bpf/verifier/calls.c (+53)
···
 	.errstr = "variable ptr_ access var_off=(0x0; 0x7) disallowed",
 },
 {
+	"calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 16),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_kfunc_btf_id = {
+		{ "bpf_kfunc_call_test_acquire", 3 },
+		{ "bpf_kfunc_call_test_ref", 8 },
+		{ "bpf_kfunc_call_test_ref", 10 },
+	},
+	.result_unpriv = REJECT,
+	.result = REJECT,
+	.errstr = "R1 must be referenced",
+},
+{
+	"calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_kfunc_btf_id = {
+		{ "bpf_kfunc_call_test_acquire", 3 },
+		{ "bpf_kfunc_call_test_ref", 8 },
+		{ "bpf_kfunc_call_test_release", 10 },
+	},
+	.result_unpriv = REJECT,
+	.result = ACCEPT,
+},
+{
 	"calls: basic sanity",
 	.insns = {
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
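The pair of verifier tests above exercise one rule: an acquire kfunc returns a pointer carrying a reference id, and a kfunc that takes a referenced argument only accepts a register still holding that id; a pointer merely loaded out of the object (the BPF_LDX_MEM in the invalid case) carries no id, so the load is rejected with "R1 must be referenced". A toy userspace model of that bookkeeping, assuming nothing about the real verifier's data structures:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REFS 8

/* Held reference ids; 0 marks a free slot. */
static int acquired[MAX_REFS];
static int next_id = 1;

/* Models an acquire kfunc: hands out a fresh reference id. */
static int ref_acquire(void)
{
	for (int i = 0; i < MAX_REFS; i++) {
		if (!acquired[i]) {
			acquired[i] = next_id;
			return next_id++;
		}
	}
	return 0;
}

/* Models a release kfunc: only accepts an id that is still held.
 * Passing 0 models a derived pointer with no reference id, which
 * the verifier rejects ("R1 must be referenced"). */
static bool ref_release(int ref_id)
{
	if (!ref_id)
		return false;
	for (int i = 0; i < MAX_REFS; i++) {
		if (acquired[i] == ref_id) {
			acquired[i] = 0;
			return true;
		}
	}
	return false;
}
```

The real verifier tracks these ids statically per register state in check_func_arg()/release_reference(); this sketch only mirrors the accept/reject outcomes.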