Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Daniel Borkmann says:

====================
bpf-next 2022-11-25

We've added 101 non-merge commits during the last 11 day(s) which contain
a total of 109 files changed, 8827 insertions(+), 1129 deletions(-).

The main changes are:

1) Support for user defined BPF objects: the use case is to let programs
   allocate their own objects, build their own object hierarchies, and use
   these building blocks to flexibly construct their own data structures,
   for example, linked lists in BPF, from Kumar Kartikeya Dwivedi.

2) Add bpf_rcu_read_{,un}lock() support for sleepable programs,
from Yonghong Song.

3) Add support for storing struct task_struct objects as kptrs in maps,
from David Vernet.

4) Batch of BPF map documentation improvements, from Maryam Tahhan
and Donald Hunter.

5) Improve BPF verifier to propagate nullness information for branches
of register to register comparisons, from Eduard Zingerman.

6) Fix cgroup BPF iter infra to hold reference on the start cgroup,
from Hou Tao.

7) Fix BPF verifier to not mark fentry/fexit program arguments as trusted,
   since that is not actually the case for them, from Alexei Starovoitov.

8) Improve BPF verifier's realloc handling to better play along with dynamic
runtime analysis tools like KASAN and friends, from Kees Cook.

9) Remove legacy libbpf mode support from bpftool,
from Sahid Orentino Ferdjaoui.

10) Rework zero-len skb redirection checks to avoid potentially breaking
existing BPF test infra users, from Stanislav Fomichev.

11) Two small refactorings which are independent and have been split out
of the XDP queueing RFC series, from Toke Høiland-Jørgensen.

12) Fix a memory leak in LSM cgroup BPF selftest, from Wang Yufen.

13) Documentation on how to run BPF CI without patch submission,
from Daniel Müller.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================

Link: https://lore.kernel.org/r/20221125012450.441-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+8695 -997
+6 -5
Documentation/bpf/bpf_design_QA.rst
···
332 332 In other words, no backwards compatibility is guaranteed if one using a type
333 333 in BTF with 'bpf\_' prefix.
334 334
335 - Q: What is the compatibility story for special BPF types in local kptrs?
336 - ------------------------------------------------------------------------
337 - Q: Same as above, but for local kptrs (i.e. pointers to objects allocated using
338 - bpf_obj_new for user defined structures). Will the kernel preserve backwards
335 + Q: What is the compatibility story for special BPF types in allocated objects?
336 + ------------------------------------------------------------------------------
337 + Q: Same as above, but for allocated objects (i.e. objects allocated using
338 + bpf_obj_new for user defined types). Will the kernel preserve backwards
339 339 compatibility for these features?
340 340
341 341 A: NO.
342 342
343 343 Unlike map value types, there are no stability guarantees for this case. The
344 - whole local kptr API itself is unstable (since it is exposed through kfuncs).
344 + whole API to work with allocated objects and any support for special fields
345 + inside them is unstable (since it is exposed through kfuncs).
+27
Documentation/bpf/bpf_devel_QA.rst
···
44 44 Submitting patches
45 45 ==================
46 46
47 + Q: How do I run BPF CI on my changes before sending them out for review?
48 + ------------------------------------------------------------------------
49 + A: BPF CI is GitHub based and hosted at https://github.com/kernel-patches/bpf.
50 + While GitHub also provides a CLI that can be used to accomplish the same
51 + results, here we focus on the UI based workflow.
52 +
53 + The following steps lay out how to start a CI run for your patches:
54 +
55 + - Create a fork of the aforementioned repository in your own account (one time
56 + action)
57 +
58 + - Clone the fork locally, check out a new branch tracking either the bpf-next
59 + or bpf branch, and apply your to-be-tested patches on top of it
60 +
61 + - Push the local branch to your fork and create a pull request against
62 + kernel-patches/bpf's bpf-next_base or bpf_base branch, respectively
63 +
64 + Shortly after the pull request has been created, the CI workflow will run. Note
65 + that capacity is shared with patches submitted upstream being checked and so
66 + depending on utilization the run can take a while to finish.
67 +
68 + Note furthermore that both base branches (bpf-next_base and bpf_base) will be
69 + updated as patches are pushed to the respective upstream branches they track. As
70 + such, your patch set will automatically (be attempted to) be rebased as well.
71 + This behavior can result in a CI run being aborted and restarted with the new
72 + base line.
73 +
47 74 Q: To which mailing list do I need to submit my BPF patches?
48 75 ------------------------------------------------------------
49 76 A: Please submit your BPF patches to the bpf kernel mailing list:
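The command-line half of the CI steps above might look roughly as follows. This is a workflow sketch, not part of the documentation being added: the account name, branch name, and patch path are placeholders, and the final pull request itself is opened in the GitHub UI as the text describes.

```shell
# One-time: fork https://github.com/kernel-patches/bpf into your own
# account via the GitHub UI, then clone your fork (placeholder account).
git clone https://github.com/<your-account>/bpf.git
cd bpf

# New branch tracking bpf-next (use the bpf branch for fixes instead).
git checkout -b my-ci-run origin/bpf-next

# Apply the to-be-tested patches on top of it.
git am /path/to/patches/*.patch

# Push the branch to the fork; then open a pull request against
# kernel-patches/bpf's bpf-next_base (or bpf_base) branch in the UI.
git push -u origin my-ci-run
```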
+6 -1
Documentation/bpf/btf.rst
···
1062 1062 7. Testing
1063 1063 ==========
1064 1064
1065 - Kernel bpf selftest `test_btf.c` provides extensive set of BTF-related tests.
1065 + The kernel BPF selftest `tools/testing/selftests/bpf/prog_tests/btf.c`_
1066 + provides an extensive set of BTF-related tests.
1067 +
1068 + .. Links
1069 + .. _tools/testing/selftests/bpf/prog_tests/btf.c:
1070 + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/prog_tests/btf.c
+1
Documentation/bpf/index.rst
···
29 29 clang-notes
30 30 linux-notes
31 31 other
32 + redirect
32 33
33 34 .. only:: subproject and html
34 35
+35 -13
Documentation/bpf/kfuncs.rst
···
72 72 of the pointer is used. Without __sz annotation, a kfunc cannot accept a void
73 73 pointer.
74 74
75 + 2.2.2 __k Annotation
76 + --------------------
77 +
78 + This annotation is only understood for scalar arguments, where it indicates that
79 + the verifier must check the scalar argument to be a known constant, which does
80 + not indicate a size parameter, and the value of the constant is relevant to the
81 + safety of the program.
82 +
83 + An example is given below::
84 +
85 + void *bpf_obj_new(u32 local_type_id__k, ...)
86 + {
87 + ...
88 + }
89 +
90 + Here, bpf_obj_new uses local_type_id argument to find out the size of that type
91 + ID in program's BTF and return a sized pointer to it. Each type ID will have a
92 + distinct size, hence it is crucial to treat each such call as distinct when
93 + values don't match during verifier state pruning checks.
94 +
95 + Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
96 + size parameter, and the value of the constant matters for program safety, __k
97 + suffix should be used.
98 +
75 99 .. _BPF_kfunc_nodef:
76 100
77 101 2.3 Using an existing kernel function
···
161 137 --------------------------
162 138
163 139 The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
164 - indicates that the all pointer arguments will always have a guaranteed lifetime,
165 - and pointers to kernel objects are always passed to helpers in their unmodified
166 - form (as obtained from acquire kfuncs).
140 + indicates that all pointer arguments are valid, and that all pointers to
141 + BTF objects have been passed in their unmodified form (that is, at a zero
142 + offset, and without having been obtained from walking another pointer).
167 143
168 - It can be used to enforce that a pointer to a refcounted object acquired from a
169 - kfunc or BPF helper is passed as an argument to this kfunc without any
170 - modifications (e.g. pointer arithmetic) such that it is trusted and points to
171 - the original object.
144 + There are two types of pointers to kernel objects which are considered "valid":
172 145
173 - Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
174 - but those can have a non-zero offset.
146 + 1. Pointers which are passed as tracepoint or struct_ops callback arguments.
147 + 2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.
175 148
176 - This flag is often used for kfuncs that operate (change some property, perform
177 - some operation) on an object that was obtained using an acquire kfunc. Such
178 - kfuncs need an unchanged pointer to ensure the integrity of the operation being
179 - performed on the expected object.
149 + Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
150 + KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
151 +
152 + The definition of "valid" pointers is subject to change at any time, and has
153 + absolutely no ABI stability guarantees.
180 154
181 155 2.4.6 KF_SLEEPABLE flag
182 156 -----------------------
+3
Documentation/bpf/libbpf/index.rst
···
1 1 .. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
2 2
3 + .. _libbpf:
4 +
3 5 libbpf
4 6 ======
5 7
···
9 7 :maxdepth: 1
10 8
11 9 API Documentation <https://libbpf.readthedocs.io/en/latest/api.html>
10 + program_types
12 11 libbpf_naming_convention
13 12 libbpf_build
14 13
+203
Documentation/bpf/libbpf/program_types.rst
···
1 + .. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
2 +
3 + .. _program_types_and_elf:
4 +
5 + Program Types and ELF Sections
6 + ==============================
7 +
8 + The table below lists the program types, their attach types where relevant and the ELF section
9 + names supported by libbpf for them. The ELF section names follow these rules:
10 +
11 + - ``type`` is an exact match, e.g. ``SEC("socket")``
12 + - ``type+`` means it can be either exact ``SEC("type")`` or well-formed ``SEC("type/extras")``
13 + with a '``/``' separator between ``type`` and ``extras``.
14 +
15 + When ``extras`` are specified, they provide details of how to auto-attach the BPF program. The
16 + format of ``extras`` depends on the program type, e.g. ``SEC("tracepoint/<category>/<name>")``
17 + for tracepoints or ``SEC("usdt/<path>:<provider>:<name>")`` for USDT probes. The extras are
18 + described in more detail in the footnotes.
19 +
20 +
21 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
22 + | Program Type | Attach Type | ELF Section Name | Sleepable |
23 + +===========================================+========================================+==================================+===========+
24 + | ``BPF_PROG_TYPE_CGROUP_DEVICE`` | ``BPF_CGROUP_DEVICE`` | ``cgroup/dev`` | |
25 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
26 + | ``BPF_PROG_TYPE_CGROUP_SKB`` | | ``cgroup/skb`` | |
27 + + +----------------------------------------+----------------------------------+-----------+
28 + | | ``BPF_CGROUP_INET_EGRESS`` | ``cgroup_skb/egress`` | |
29 + + +----------------------------------------+----------------------------------+-----------+
30 + | | ``BPF_CGROUP_INET_INGRESS`` | ``cgroup_skb/ingress`` | |
31 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
32 + | ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` | ``BPF_CGROUP_GETSOCKOPT`` | ``cgroup/getsockopt`` | |
33 + + +----------------------------------------+----------------------------------+-----------+
34 + | | ``BPF_CGROUP_SETSOCKOPT`` | ``cgroup/setsockopt`` | |
35 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
36 + | ``BPF_PROG_TYPE_CGROUP_SOCK_ADDR`` | ``BPF_CGROUP_INET4_BIND`` | ``cgroup/bind4`` | |
37 + + +----------------------------------------+----------------------------------+-----------+
38 + | | ``BPF_CGROUP_INET4_CONNECT`` | ``cgroup/connect4`` | |
39 + + +----------------------------------------+----------------------------------+-----------+
40 + | | ``BPF_CGROUP_INET4_GETPEERNAME`` | ``cgroup/getpeername4`` | |
41 + + +----------------------------------------+----------------------------------+-----------+
42 + | | ``BPF_CGROUP_INET4_GETSOCKNAME`` | ``cgroup/getsockname4`` | |
43 + + +----------------------------------------+----------------------------------+-----------+
44 + | | ``BPF_CGROUP_INET6_BIND`` | ``cgroup/bind6`` | |
45 + + +----------------------------------------+----------------------------------+-----------+
46 + | | ``BPF_CGROUP_INET6_CONNECT`` | ``cgroup/connect6`` | |
47 + + +----------------------------------------+----------------------------------+-----------+
48 + | | ``BPF_CGROUP_INET6_GETPEERNAME`` | ``cgroup/getpeername6`` | |
49 + + +----------------------------------------+----------------------------------+-----------+
50 + | | ``BPF_CGROUP_INET6_GETSOCKNAME`` | ``cgroup/getsockname6`` | |
51 + + +----------------------------------------+----------------------------------+-----------+
52 + | | ``BPF_CGROUP_UDP4_RECVMSG`` | ``cgroup/recvmsg4`` | |
53 + + +----------------------------------------+----------------------------------+-----------+
54 + | | ``BPF_CGROUP_UDP4_SENDMSG`` | ``cgroup/sendmsg4`` | |
55 + + +----------------------------------------+----------------------------------+-----------+
56 + | | ``BPF_CGROUP_UDP6_RECVMSG`` | ``cgroup/recvmsg6`` | |
57 + + +----------------------------------------+----------------------------------+-----------+
58 + | | ``BPF_CGROUP_UDP6_SENDMSG`` | ``cgroup/sendmsg6`` | |
59 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
60 + | ``BPF_PROG_TYPE_CGROUP_SOCK`` | ``BPF_CGROUP_INET4_POST_BIND`` | ``cgroup/post_bind4`` | |
61 + + +----------------------------------------+----------------------------------+-----------+
62 + | | ``BPF_CGROUP_INET6_POST_BIND`` | ``cgroup/post_bind6`` | |
63 + + +----------------------------------------+----------------------------------+-----------+
64 + | | ``BPF_CGROUP_INET_SOCK_CREATE`` | ``cgroup/sock_create`` | |
65 + + + +----------------------------------+-----------+
66 + | | | ``cgroup/sock`` | |
67 + + +----------------------------------------+----------------------------------+-----------+
68 + | | ``BPF_CGROUP_INET_SOCK_RELEASE`` | ``cgroup/sock_release`` | |
69 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
70 + | ``BPF_PROG_TYPE_CGROUP_SYSCTL`` | ``BPF_CGROUP_SYSCTL`` | ``cgroup/sysctl`` | |
71 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
72 + | ``BPF_PROG_TYPE_EXT`` | | ``freplace+`` [#fentry]_ | |
73 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
74 + | ``BPF_PROG_TYPE_FLOW_DISSECTOR`` | ``BPF_FLOW_DISSECTOR`` | ``flow_dissector`` | |
75 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
76 + | ``BPF_PROG_TYPE_KPROBE`` | | ``kprobe+`` [#kprobe]_ | |
77 + + + +----------------------------------+-----------+
78 + | | | ``kretprobe+`` [#kprobe]_ | |
79 + + + +----------------------------------+-----------+
80 + | | | ``ksyscall+`` [#ksyscall]_ | |
81 + + + +----------------------------------+-----------+
82 + | | | ``kretsyscall+`` [#ksyscall]_ | |
83 + + + +----------------------------------+-----------+
84 + | | | ``uprobe+`` [#uprobe]_ | |
85 + + + +----------------------------------+-----------+
86 + | | | ``uprobe.s+`` [#uprobe]_ | Yes |
87 + + + +----------------------------------+-----------+
88 + | | | ``uretprobe+`` [#uprobe]_ | |
89 + + + +----------------------------------+-----------+
90 + | | | ``uretprobe.s+`` [#uprobe]_ | Yes |
91 + + + +----------------------------------+-----------+
92 + | | | ``usdt+`` [#usdt]_ | |
93 + + +----------------------------------------+----------------------------------+-----------+
94 + | | ``BPF_TRACE_KPROBE_MULTI`` | ``kprobe.multi+`` [#kpmulti]_ | |
95 + + + +----------------------------------+-----------+
96 + | | | ``kretprobe.multi+`` [#kpmulti]_ | |
97 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
98 + | ``BPF_PROG_TYPE_LIRC_MODE2`` | ``BPF_LIRC_MODE2`` | ``lirc_mode2`` | |
99 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
100 + | ``BPF_PROG_TYPE_LSM`` | ``BPF_LSM_CGROUP`` | ``lsm_cgroup+`` | |
101 + + +----------------------------------------+----------------------------------+-----------+
102 + | | ``BPF_LSM_MAC`` | ``lsm+`` [#lsm]_ | |
103 + + + +----------------------------------+-----------+
104 + | | | ``lsm.s+`` [#lsm]_ | Yes |
105 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
106 + | ``BPF_PROG_TYPE_LWT_IN`` | | ``lwt_in`` | |
107 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
108 + | ``BPF_PROG_TYPE_LWT_OUT`` | | ``lwt_out`` | |
109 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
110 + | ``BPF_PROG_TYPE_LWT_SEG6LOCAL`` | | ``lwt_seg6local`` | |
111 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
112 + | ``BPF_PROG_TYPE_LWT_XMIT`` | | ``lwt_xmit`` | |
113 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
114 + | ``BPF_PROG_TYPE_PERF_EVENT`` | | ``perf_event`` | |
115 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
116 + | ``BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE`` | | ``raw_tp.w+`` [#rawtp]_ | |
117 + + + +----------------------------------+-----------+
118 + | | | ``raw_tracepoint.w+`` | |
119 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
120 + | ``BPF_PROG_TYPE_RAW_TRACEPOINT`` | | ``raw_tp+`` [#rawtp]_ | |
121 + + + +----------------------------------+-----------+
122 + | | | ``raw_tracepoint+`` | |
123 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
124 + | ``BPF_PROG_TYPE_SCHED_ACT`` | | ``action`` | |
125 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
126 + | ``BPF_PROG_TYPE_SCHED_CLS`` | | ``classifier`` | |
127 + + + +----------------------------------+-----------+
128 + | | | ``tc`` | |
129 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
130 + | ``BPF_PROG_TYPE_SK_LOOKUP`` | ``BPF_SK_LOOKUP`` | ``sk_lookup`` | |
131 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
132 + | ``BPF_PROG_TYPE_SK_MSG`` | ``BPF_SK_MSG_VERDICT`` | ``sk_msg`` | |
133 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
134 + | ``BPF_PROG_TYPE_SK_REUSEPORT`` | ``BPF_SK_REUSEPORT_SELECT_OR_MIGRATE`` | ``sk_reuseport/migrate`` | |
135 + + +----------------------------------------+----------------------------------+-----------+
136 + | | ``BPF_SK_REUSEPORT_SELECT`` | ``sk_reuseport`` | |
137 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
138 + | ``BPF_PROG_TYPE_SK_SKB`` | | ``sk_skb`` | |
139 + + +----------------------------------------+----------------------------------+-----------+
140 + | | ``BPF_SK_SKB_STREAM_PARSER`` | ``sk_skb/stream_parser`` | |
141 + + +----------------------------------------+----------------------------------+-----------+
142 + | | ``BPF_SK_SKB_STREAM_VERDICT`` | ``sk_skb/stream_verdict`` | |
143 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
144 + | ``BPF_PROG_TYPE_SOCKET_FILTER`` | | ``socket`` | |
145 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
146 + | ``BPF_PROG_TYPE_SOCK_OPS`` | ``BPF_CGROUP_SOCK_OPS`` | ``sockops`` | |
147 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
148 + | ``BPF_PROG_TYPE_STRUCT_OPS`` | | ``struct_ops+`` | |
149 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
150 + | ``BPF_PROG_TYPE_SYSCALL`` | | ``syscall`` | Yes |
151 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
152 + | ``BPF_PROG_TYPE_TRACEPOINT`` | | ``tp+`` [#tp]_ | |
153 + + + +----------------------------------+-----------+
154 + | | | ``tracepoint+`` [#tp]_ | |
155 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
156 + | ``BPF_PROG_TYPE_TRACING`` | ``BPF_MODIFY_RETURN`` | ``fmod_ret+`` [#fentry]_ | |
157 + + + +----------------------------------+-----------+
158 + | | | ``fmod_ret.s+`` [#fentry]_ | Yes |
159 + + +----------------------------------------+----------------------------------+-----------+
160 + | | ``BPF_TRACE_FENTRY`` | ``fentry+`` [#fentry]_ | |
161 + + + +----------------------------------+-----------+
162 + | | | ``fentry.s+`` [#fentry]_ | Yes |
163 + + +----------------------------------------+----------------------------------+-----------+
164 + | | ``BPF_TRACE_FEXIT`` | ``fexit+`` [#fentry]_ | |
165 + + + +----------------------------------+-----------+
166 + | | | ``fexit.s+`` [#fentry]_ | Yes |
167 + + +----------------------------------------+----------------------------------+-----------+
168 + | | ``BPF_TRACE_ITER`` | ``iter+`` [#iter]_ | |
169 + + + +----------------------------------+-----------+
170 + | | | ``iter.s+`` [#iter]_ | Yes |
171 + + +----------------------------------------+----------------------------------+-----------+
172 + | | ``BPF_TRACE_RAW_TP`` | ``tp_btf+`` [#fentry]_ | |
173 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
174 + | ``BPF_PROG_TYPE_XDP`` | ``BPF_XDP_CPUMAP`` | ``xdp.frags/cpumap`` | |
175 + + + +----------------------------------+-----------+
176 + | | | ``xdp/cpumap`` | |
177 + + +----------------------------------------+----------------------------------+-----------+
178 + | | ``BPF_XDP_DEVMAP`` | ``xdp.frags/devmap`` | |
179 + + + +----------------------------------+-----------+
180 + | | | ``xdp/devmap`` | |
181 + + +----------------------------------------+----------------------------------+-----------+
182 + | | ``BPF_XDP`` | ``xdp.frags`` | |
183 + + + +----------------------------------+-----------+
184 + | | | ``xdp`` | |
185 + +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
186 +
187 +
188 + .. rubric:: Footnotes
189 +
190 + .. [#fentry] The ``fentry`` attach format is ``fentry[.s]/<function>``.
191 + .. [#kprobe] The ``kprobe`` attach format is ``kprobe/<function>[+<offset>]``. Valid
192 + characters for ``function`` are ``a-zA-Z0-9_.`` and ``offset`` must be a valid
193 + non-negative integer.
194 + .. [#ksyscall] The ``ksyscall`` attach format is ``ksyscall/<syscall>``.
195 + .. [#uprobe] The ``uprobe`` attach format is ``uprobe[.s]/<path>:<function>[+<offset>]``.
196 + .. [#usdt] The ``usdt`` attach format is ``usdt/<path>:<provider>:<name>``.
197 + .. [#kpmulti] The ``kprobe.multi`` attach format is ``kprobe.multi/<pattern>`` where ``pattern``
198 + supports ``*`` and ``?`` wildcards. Valid characters for pattern are
199 + ``a-zA-Z0-9_.*?``.
200 + .. [#lsm] The ``lsm`` attachment format is ``lsm[.s]/<hook>``.
201 + .. [#rawtp] The ``raw_tp`` attach format is ``raw_tracepoint[.w]/<tracepoint>``.
202 + .. [#tp] The ``tracepoint`` attach format is ``tracepoint/<category>/<name>``.
203 + .. [#iter] The ``iter`` attach format is ``iter[.s]/<struct-name>``.
+17 -5
Documentation/bpf/map_array.rst
···
32 32 Kernel BPF
33 33 ----------
34 34
35 - .. c:function::
35 + bpf_map_lookup_elem()
36 + ~~~~~~~~~~~~~~~~~~~~~
37 +
38 + .. code-block:: c
39 +
36 40 void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
37 41
38 42 Array elements can be retrieved using the ``bpf_map_lookup_elem()`` helper.
···
44 40 with userspace reading the value, the user must use primitives like
45 41 ``__sync_fetch_and_add()`` when updating the value in-place.
46 42
47 - .. c:function::
43 + bpf_map_update_elem()
44 + ~~~~~~~~~~~~~~~~~~~~~
45 +
46 + .. code-block:: c
47 +
48 48 long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
49 49
50 50 Array elements can be updated using the ``bpf_map_update_elem()`` helper.
···
61 53 zero value to that index.
62 54
63 55 Per CPU Array
64 - ~~~~~~~~~~~~~
56 + -------------
65 57
66 58 Values stored in ``BPF_MAP_TYPE_ARRAY`` can be accessed by multiple programs
67 59 across different CPUs. To restrict storage to a single CPU, you may use a
···
71 63 ``bpf_map_lookup_elem()`` helpers automatically access the slot for the current
72 64 CPU.
73 65
74 - .. c:function::
66 + bpf_map_lookup_percpu_elem()
67 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
68 +
69 + .. code-block:: c
70 +
75 71 void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
76 72
77 73 The ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the array
···
131 119 index = ip.protocol;
132 120 value = bpf_map_lookup_elem(&my_map, &index);
133 121 if (value)
134 - __sync_fetch_and_add(&value, skb->len);
122 + __sync_fetch_and_add(value, skb->len);
135 123
136 124 return 0;
137 125 }
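The map_array text above (and the corrected example in the hunk) relies on ``__sync_fetch_and_add()`` for in-place updates that race with concurrent readers and writers. A plain userspace pthreads sketch shows what the primitive buys you; the thread counts and function names here are invented for the illustration, and this is ordinary C, not BPF program code:

```c
#include <pthread.h>

#define NTHREADS 4
#define ITERS    100000

static long counter; /* stands in for an array map value */

/* Each thread bumps the shared counter with an atomic
 * read-modify-write, as the documentation advises. */
static void *bump(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERS; i++)
		__sync_fetch_and_add(&counter, 1);
	return NULL;
}

/* Run NTHREADS concurrent writers and return the final count. */
long run_writers(void)
{
	pthread_t tids[NTHREADS];

	counter = 0;
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL, bump, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tids[i], NULL);
	return counter;
}
```

With a plain ``counter++`` the final value can fall short of ``NTHREADS * ITERS`` because increments are lost between the read and the write; the atomic add guarantees the full total.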
+174
Documentation/bpf/map_bloom_filter.rst
···
1 + .. SPDX-License-Identifier: GPL-2.0-only
2 + .. Copyright (C) 2022 Red Hat, Inc.
3 +
4 + =========================
5 + BPF_MAP_TYPE_BLOOM_FILTER
6 + =========================
7 +
8 + .. note::
9 + - ``BPF_MAP_TYPE_BLOOM_FILTER`` was introduced in kernel version 5.16
10 +
11 + ``BPF_MAP_TYPE_BLOOM_FILTER`` provides a BPF bloom filter map. Bloom
12 + filters are a space-efficient probabilistic data structure used to
13 + quickly test whether an element exists in a set. In a bloom filter,
14 + false positives are possible whereas false negatives are not.
15 +
16 + The bloom filter map does not have keys, only values. When the bloom
17 + filter map is created, it must be created with a ``key_size`` of 0. The
18 + bloom filter map supports two operations:
19 +
20 + - push: adding an element to the map
21 + - peek: determining whether an element is present in the map
22 +
23 + BPF programs must use ``bpf_map_push_elem`` to add an element to the
24 + bloom filter map and ``bpf_map_peek_elem`` to query the map. These
25 + operations are exposed to userspace applications using the existing
26 + ``bpf`` syscall in the following way:
27 +
28 + - ``BPF_MAP_UPDATE_ELEM`` -> push
29 + - ``BPF_MAP_LOOKUP_ELEM`` -> peek
30 +
31 + The ``max_entries`` size that is specified at map creation time is used
32 + to approximate a reasonable bitmap size for the bloom filter, and is not
33 + otherwise strictly enforced. If the user wishes to insert more entries
34 + into the bloom filter than ``max_entries``, this may lead to a higher
35 + false positive rate.
36 +
37 + The number of hashes to use for the bloom filter is configurable using
38 + the lower 4 bits of ``map_extra`` in ``union bpf_attr`` at map creation
39 + time. If no number is specified, the default used will be 5 hash
40 + functions. In general, using more hashes decreases both the false
41 + positive rate and the speed of a lookup.
42 +
43 + It is not possible to delete elements from a bloom filter map. A bloom
44 + filter map may be used as an inner map. The user is responsible for
45 + synchronising concurrent updates and lookups to ensure no false negative
46 + lookups occur.
47 +
48 + Usage
49 + =====
50 +
51 + Kernel BPF
52 + ----------
53 +
54 + bpf_map_push_elem()
55 + ~~~~~~~~~~~~~~~~~~~
56 +
57 + .. code-block:: c
58 +
59 + long bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)
60 +
61 + A ``value`` can be added to a bloom filter using the
62 + ``bpf_map_push_elem()`` helper. The ``flags`` parameter must be set to
63 + ``BPF_ANY`` when adding an entry to the bloom filter. This helper
64 + returns ``0`` on success, or negative error in case of failure.
65 +
66 + bpf_map_peek_elem()
67 + ~~~~~~~~~~~~~~~~~~~
68 +
69 + .. code-block:: c
70 +
71 + long bpf_map_peek_elem(struct bpf_map *map, void *value)
72 +
73 + The ``bpf_map_peek_elem()`` helper is used to determine whether
74 + ``value`` is present in the bloom filter map. This helper returns ``0``
75 + if ``value`` is probably present in the map, or ``-ENOENT`` if ``value``
76 + is definitely not present in the map.
77 +
78 + Userspace
79 + ---------
80 +
81 + bpf_map_update_elem()
82 + ~~~~~~~~~~~~~~~~~~~~~
83 +
84 + .. code-block:: c
85 +
86 + int bpf_map_update_elem (int fd, const void *key, const void *value, __u64 flags)
87 +
88 + A userspace program can add a ``value`` to a bloom filter using libbpf's
89 + ``bpf_map_update_elem`` function. The ``key`` parameter must be set to
90 + ``NULL`` and ``flags`` must be set to ``BPF_ANY``. Returns ``0`` on
91 + success, or negative error in case of failure.
92 +
93 + bpf_map_lookup_elem()
94 + ~~~~~~~~~~~~~~~~~~~~~
95 +
96 + .. code-block:: c
97 +
98 + int bpf_map_lookup_elem (int fd, const void *key, void *value)
99 +
100 + A userspace program can determine the presence of ``value`` in a bloom
101 + filter using libbpf's ``bpf_map_lookup_elem`` function. The ``key``
102 + parameter must be set to ``NULL``. Returns ``0`` if ``value`` is
103 + probably present in the map, or ``-ENOENT`` if ``value`` is definitely
104 + not present in the map.
105 +
106 + Examples
107 + ========
108 +
109 + Kernel BPF
110 + ----------
111 +
112 + This snippet shows how to declare a bloom filter in a BPF program:
113 +
114 + .. code-block:: c
115 +
116 + struct {
117 + __uint(type, BPF_MAP_TYPE_BLOOM_FILTER);
118 + __type(value, __u32);
119 + __uint(max_entries, 1000);
120 + __uint(map_extra, 3);
121 + } bloom_filter SEC(".maps");
122 +
123 + This snippet shows how to determine presence of a value in a bloom
124 + filter in a BPF program:
125 +
126 + .. code-block:: c
127 +
128 + void *lookup(__u32 key)
129 + {
130 + if (bpf_map_peek_elem(&bloom_filter, &key) == 0) {
131 + /* Verify not a false positive and fetch an associated
132 + * value using a secondary lookup, e.g. in a hash table
133 + */
134 + return bpf_map_lookup_elem(&hash_table, &key);
135 + }
136 + return 0;
137 + }
138 +
139 + Userspace
140 + ---------
141 +
142 + This snippet shows how to use libbpf to create a bloom filter map from
143 + userspace:
144 +
145 + .. code-block:: c
146 +
147 + int create_bloom()
148 + {
149 + LIBBPF_OPTS(bpf_map_create_opts, opts,
150 + .map_extra = 3); /* number of hashes */
151 +
152 + return bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER,
153 + "ipv6_bloom", /* name */
154 + 0, /* key size, must be zero */
155 + sizeof(ipv6_addr), /* value size */
156 + 10000, /* max entries */
157 + &opts); /* create options */
158 + }
159 +
160 + This snippet shows how to add an element to a bloom filter from
161 + userspace:
162 +
163 + .. code-block:: c
164 +
165 + int add_element(struct bpf_map *bloom_map, __u32 value)
166 + {
167 + int bloom_fd = bpf_map__fd(bloom_map);
168 + return bpf_map_update_elem(bloom_fd, NULL, &value, BPF_ANY);
169 + }
170 +
171 + References
172 + ==========
173 +
174 + https://lwn.net/ml/bpf/20210831225005.2762202-1-joannekoong@fb.com/
+35 -24
Documentation/bpf/map_cpumap.rst
···
 Kernel BPF
 ----------
-.. c:function::
+bpf_redirect_map()
+^^^^^^^^^^^^^^^^^^
+.. code-block:: c
+
    long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)

- Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
- For ``BPF_MAP_TYPE_CPUMAP`` this map contains references to CPUs.
+Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
+For ``BPF_MAP_TYPE_CPUMAP`` this map contains references to CPUs.

- The lower two bits of ``flags`` are used as the return code if the map lookup
- fails. This is so that the return value can be one of the XDP program return
- codes up to ``XDP_TX``, as chosen by the caller.
+The lower two bits of ``flags`` are used as the return code if the map lookup
+fails. This is so that the return value can be one of the XDP program return
+codes up to ``XDP_TX``, as chosen by the caller.

-Userspace
----------
+User space
+----------
 .. note::
    CPUMAP entries can only be updated/looked up/deleted from user space and not
    from an eBPF program. Trying to call these functions from a kernel eBPF
    program will result in the program failing to load and a verifier warning.

-.. c:function::
-   int bpf_map_update_elem(int fd, const void *key, const void *value,
-   __u64 flags);
+bpf_map_update_elem()
+^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: c

- CPU entries can be added or updated using the ``bpf_map_update_elem()``
- helper. This helper replaces existing elements atomically. The ``value`` parameter
- can be ``struct bpf_cpumap_val``.
+   int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);
+
+CPU entries can be added or updated using the ``bpf_map_update_elem()``
+helper. This helper replaces existing elements atomically. The ``value`` parameter
+can be ``struct bpf_cpumap_val``.

 .. code-block:: c
···
    } bpf_prog;
 };

-The flags argument can be one of the following:
+The flags argument can be one of the following:
  - BPF_ANY: Create a new element or update an existing element.
  - BPF_NOEXIST: Create a new element only if it did not exist.
  - BPF_EXIST: Update an existing element.

-.. c:function::
+bpf_map_lookup_elem()
+^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: c
+
    int bpf_map_lookup_elem(int fd, const void *key, void *value);

- CPU entries can be retrieved using the ``bpf_map_lookup_elem()``
- helper.
+CPU entries can be retrieved using the ``bpf_map_lookup_elem()``
+helper.

-.. c:function::
+bpf_map_delete_elem()
+^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: c
+
    int bpf_map_delete_elem(int fd, const void *key);

- CPU entries can be deleted using the ``bpf_map_delete_elem()``
- helper. This helper will return 0 on success, or negative error in case of
- failure.
+CPU entries can be deleted using the ``bpf_map_delete_elem()``
+helper. This helper will return 0 on success, or negative error in case of
+failure.

 Examples
 ========
···
        return bpf_redirect_map(&cpu_map, cpu_dest, 0);
 }

-Userspace
----------
+User space
+----------

 The following code snippet shows how to dynamically set the max_entries for a
 CPUMAP to the max number of cpus available on the system.
+238
Documentation/bpf/map_devmap.rst
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.

=================================================
BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH
=================================================

.. note::
   - ``BPF_MAP_TYPE_DEVMAP`` was introduced in kernel version 4.14
   - ``BPF_MAP_TYPE_DEVMAP_HASH`` was introduced in kernel version 5.4

``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` are BPF maps primarily
used as backend maps for the XDP BPF helper call ``bpf_redirect_map()``.
``BPF_MAP_TYPE_DEVMAP`` is backed by an array that uses the key as
the index to look up a reference to a net device, while ``BPF_MAP_TYPE_DEVMAP_HASH``
is backed by a hash table that uses a key to look up a reference to a net device.
The user provides either <``key``/ ``ifindex``> or <``key``/ ``struct bpf_devmap_val``>
pairs to update the maps with new net devices.

.. note::
   - The key to a hash map doesn't have to be an ``ifindex``.
   - While ``BPF_MAP_TYPE_DEVMAP_HASH`` allows for densely packing the net devices,
     it comes at the cost of a hash of the key when performing a lookup.

The setup and packet enqueue/send code is shared between the two types of
devmap; only the lookup and insertion is different.

Usage
=====
Kernel BPF
----------
bpf_redirect_map()
^^^^^^^^^^^^^^^^^^
.. code-block:: c

   long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)

Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
For ``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` this map contains
references to net devices (for forwarding packets through other ports).

The lower two bits of ``flags`` are used as the return code if the map lookup
fails. This is so that the return value can be one of the XDP program return
codes up to ``XDP_TX``, as chosen by the caller. The higher bits of ``flags``
can be set to ``BPF_F_BROADCAST`` or ``BPF_F_EXCLUDE_INGRESS`` as defined
below.

With ``BPF_F_BROADCAST`` the packet will be broadcast to all the interfaces
in the map, with ``BPF_F_EXCLUDE_INGRESS`` the ingress interface will be excluded
from the broadcast.

.. note::
   - The key is ignored if BPF_F_BROADCAST is set.
   - The broadcast feature can also be used to implement multicast forwarding:
     simply create multiple DEVMAPs, each one corresponding to a single multicast group.

This helper will return ``XDP_REDIRECT`` on success, or the value of the two
lower bits of the ``flags`` argument if the map lookup fails.

More information about redirection can be found in :doc:`redirect`.

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

   void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.

User space
----------
.. note::
   DEVMAP entries can only be updated/deleted from user space and not
   from an eBPF program. Trying to call these functions from a kernel eBPF
   program will result in the program failing to load and a verifier warning.

bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

   int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);

Net device entries can be added or updated using the ``bpf_map_update_elem()``
helper. This helper replaces existing elements atomically. The ``value`` parameter
can be ``struct bpf_devmap_val`` or a simple ``int ifindex`` for backwards
compatibility.

.. code-block:: c

   struct bpf_devmap_val {
       __u32 ifindex;   /* device index */
       union {
           int   fd;    /* prog fd on map write */
           __u32 id;    /* prog id on map read */
       } bpf_prog;
   };

The ``flags`` argument can be one of the following:
  - ``BPF_ANY``: Create a new element or update an existing element.
  - ``BPF_NOEXIST``: Create a new element only if it did not exist.
  - ``BPF_EXIST``: Update an existing element.

DEVMAPs can associate a program with a device entry by adding a ``bpf_prog.fd``
to ``struct bpf_devmap_val``. Programs are run after ``XDP_REDIRECT`` and have
access to both Rx device and Tx device. The program associated with the ``fd``
must have type XDP with expected attach type ``xdp_devmap``.
When a program is associated with a device index, the program is run on an
``XDP_REDIRECT`` and before the buffer is added to the per-cpu queue. Examples
of how to attach/use xdp_devmap progs can be found in the kernel selftests:

- ``tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c``
- ``tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c``

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

   int bpf_map_lookup_elem(int fd, const void *key, void *value);

Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
helper.

bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

   int bpf_map_delete_elem(int fd, const void *key);

Net device entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or negative error in case of
failure.

Examples
========

Kernel BPF
----------

The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP``
called tx_port.

.. code-block:: c

    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP);
        __type(key, __u32);
        __type(value, __u32);
        __uint(max_entries, 256);
    } tx_port SEC(".maps");

The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP_HASH``
called forward_map.

.. code-block:: c

    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
        __type(key, __u32);
        __type(value, struct bpf_devmap_val);
        __uint(max_entries, 32);
    } forward_map SEC(".maps");

.. note::

    The value type in the DEVMAP above is a ``struct bpf_devmap_val``

The following code snippet shows a simple xdp_redirect_map program. This program
would work with a user space program that populates the devmap ``forward_map`` based
on ingress ifindexes. The BPF program (below) is redirecting packets using the
ingress ``ifindex`` as the ``key``.

.. code-block:: c

    SEC("xdp")
    int xdp_redirect_map_func(struct xdp_md *ctx)
    {
        int index = ctx->ingress_ifindex;

        return bpf_redirect_map(&forward_map, index, 0);
    }

The following code snippet shows a BPF program that is broadcasting packets to
all the interfaces in the ``tx_port`` devmap.

.. code-block:: c

    SEC("xdp")
    int xdp_redirect_map_func(struct xdp_md *ctx)
    {
        return bpf_redirect_map(&tx_port, 0, BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
    }

User space
----------

The following code snippet shows how to update a devmap called ``tx_port``.

.. code-block:: c

    int update_devmap(int ifindex, int redirect_ifindex)
    {
        int ret;

        ret = bpf_map_update_elem(bpf_map__fd(tx_port), &ifindex, &redirect_ifindex, 0);
        if (ret < 0) {
            fprintf(stderr, "Failed to update devmap value: %s\n",
                    strerror(errno));
        }

        return ret;
    }

The following code snippet shows how to update a hash_devmap called ``forward_map``.

.. code-block:: c

    int update_devmap(int ifindex, int redirect_ifindex)
    {
        struct bpf_devmap_val devmap_val = { .ifindex = redirect_ifindex };
        int ret;

        ret = bpf_map_update_elem(bpf_map__fd(forward_map), &ifindex, &devmap_val, 0);
        if (ret < 0) {
            fprintf(stderr, "Failed to update devmap value: %s\n",
                    strerror(errno));
        }
        return ret;
    }

References
==========

- https://lwn.net/Articles/728146/
- https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6f9d451ab1a33728adb72d7ff66a7b374d665176
- https://elixir.bootlin.com/linux/latest/source/net/core/filter.c#L4106
+28 -5
Documentation/bpf/map_hash.rst
···
 Usage
 =====

-.. c:function::
+Kernel BPF
+----------
+
+bpf_map_update_elem()
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)

 Hash entries can be added or updated using the ``bpf_map_update_elem()``
···
 ``bpf_map_update_elem()`` returns 0 on success, or negative error in
 case of failure.

-.. c:function::
+bpf_map_lookup_elem()
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

 Hash entries can be retrieved using the ``bpf_map_lookup_elem()``
 helper. This helper returns a pointer to the value associated with
 ``key``, or ``NULL`` if no entry was found.

-.. c:function::
+bpf_map_delete_elem()
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    long bpf_map_delete_elem(struct bpf_map *map, const void *key)

 Hash entries can be deleted using the ``bpf_map_delete_elem()``
···
 the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers
 automatically access the hash slot for the current CPU.

-.. c:function::
+bpf_map_lookup_percpu_elem()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)

 The ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the
···
 Userspace
 ---------

-.. c:function::
+bpf_map_get_next_key()
+~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)

 In userspace, it is possible to iterate through the keys of a hash using
+20 -4
Documentation/bpf/map_lpm_trie.rst
···
 Kernel BPF
 ----------

-.. c:function::
+bpf_map_lookup_elem()
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

 The longest prefix entry for a given data value can be found using the
···
 longest prefix match for an IPv4 address, ``prefixlen`` should be set to
 ``32``.

-.. c:function::
+bpf_map_update_elem()
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)

 Prefix entries can be added or updated using the ``bpf_map_update_elem()``
···
 The flags parameter must be one of BPF_ANY, BPF_NOEXIST or BPF_EXIST,
 but the value is ignored, giving BPF_ANY semantics.

-.. c:function::
+bpf_map_delete_elem()
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    long bpf_map_delete_elem(struct bpf_map *map, const void *key)

 Prefix entries can be deleted using the ``bpf_map_delete_elem()``
···
 Access from userspace uses libbpf APIs with the same names as above, with
 the map identified by ``fd``.

-.. c:function::
+bpf_map_get_next_key()
+~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    int bpf_map_get_next_key (int fd, const void *cur_key, void *next_key)

 A userspace program can iterate through the entries in an LPM trie using
+5 -1
Documentation/bpf/map_of_maps.rst
···
 Kernel BPF Helper
 -----------------

-.. c:function::
+bpf_map_lookup_elem()
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

 Inner maps can be retrieved using the ``bpf_map_lookup_elem()`` helper. This
+30 -6
Documentation/bpf/map_queue_stack.rst
···
 Kernel BPF
 ----------

-.. c:function::
+bpf_map_push_elem()
+~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    long bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)

 An element ``value`` can be added to a queue or stack using the
···
 make room for ``value`` to be added. Returns ``0`` on success, or
 negative error in case of failure.

-.. c:function::
+bpf_map_peek_elem()
+~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    long bpf_map_peek_elem(struct bpf_map *map, void *value)

 This helper fetches an element ``value`` from a queue or stack without
 removing it. Returns ``0`` on success, or negative error in case of
 failure.

-.. c:function::
+bpf_map_pop_elem()
+~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    long bpf_map_pop_elem(struct bpf_map *map, void *value)

 This helper removes an element into ``value`` from a queue or
···
 Userspace
 ---------

-.. c:function::
+bpf_map_update_elem()
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    int bpf_map_update_elem (int fd, const void *key, const void *value, __u64 flags)

 A userspace program can push ``value`` onto a queue or stack using libbpf's
···
 same semantics as the ``bpf_map_push_elem`` kernel helper. Returns ``0`` on
 success, or negative error in case of failure.

-.. c:function::
+bpf_map_lookup_elem()
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    int bpf_map_lookup_elem (int fd, const void *key, void *value)

 A userspace program can peek at the ``value`` at the head of a queue or stack
···
 set to ``NULL``. Returns ``0`` on success, or negative error in case of
 failure.

-.. c:function::
+bpf_map_lookup_and_delete_elem()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
    int bpf_map_lookup_and_delete_elem (int fd, const void *key, void *value)

 A userspace program can pop a ``value`` from the head of a queue or stack using
+192
Documentation/bpf/map_xskmap.rst
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.

===================
BPF_MAP_TYPE_XSKMAP
===================

.. note::
   - ``BPF_MAP_TYPE_XSKMAP`` was introduced in kernel version 4.18

The ``BPF_MAP_TYPE_XSKMAP`` is used as a backend map for XDP BPF helper
call ``bpf_redirect_map()`` and ``XDP_REDIRECT`` action, like 'devmap' and 'cpumap'.
This map type redirects raw XDP frames to `AF_XDP`_ sockets (XSKs), a new type of
address family in the kernel that allows redirection of frames from a driver to
user space without having to traverse the full network stack. An AF_XDP socket
binds to a single netdev queue. A mapping of XSKs to queues is shown below:

.. code-block:: none

    +---------------------------------------------------+
    |     xsk A      |     xsk B      |      xsk C      |<---+ User space
    =========================================================|==========
    |    Queue 0     |     Queue 1    |     Queue 2     |    | Kernel
    +---------------------------------------------------+    |
    |                  Netdev eth0                      |    |
    +---------------------------------------------------+    |
    |            +=============+                        |    |
    |            | key |  xsk  |                        |    |
    |  +---------+ +=============+                      |    |
    |  |         | | 0   | xsk A |                      |    |
    |  |         | +-------------+                      |    |
    |  |         | | 1   | xsk B |                      |    |
    |  |   BPF   |-- redirect -->+-------------+-------------+
    |  |  prog   | | 2   | xsk C |                      |
    |  |         | +-------------+                      |
    |  |         |                                      |
    |  +---------+                                      |
    |                                                   |
    +---------------------------------------------------+

.. note::
    An AF_XDP socket that is bound to a certain <netdev/queue_id> will *only*
    accept XDP frames from that <netdev/queue_id>. If an XDP program tries to redirect
    from a <netdev/queue_id> other than what the socket is bound to, the frame will
    not be received on the socket.

Typically an XSKMAP is created per netdev. This map contains an array of XSK File
Descriptors (FDs). The number of array elements is typically set or adjusted using
the ``max_entries`` map parameter. For AF_XDP ``max_entries`` is equal to the number
of queues supported by the netdev.

.. note::
    Both the map key and map value size must be 4 bytes.

Usage
=====

Kernel BPF
----------
bpf_redirect_map()
^^^^^^^^^^^^^^^^^^
.. code-block:: c

   long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)

Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
For ``BPF_MAP_TYPE_XSKMAP`` this map contains references to XSK FDs
for sockets attached to a netdev's queues.

.. note::
    If the map is empty at an index, the packet is dropped. This means that it is
    necessary to have an XDP program loaded with at least one XSK in the
    XSKMAP to be able to get any traffic to user space through the socket.

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

   void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

XSK entry references of type ``struct xdp_sock *`` can be retrieved using the
``bpf_map_lookup_elem()`` helper.

User space
----------
.. note::
    XSK entries can only be updated/deleted from user space and not from
    a BPF program. Trying to call these functions from a kernel BPF program will
    result in the program failing to load and a verifier warning.

bpf_map_update_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

   int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)

XSK entries can be added or updated using the ``bpf_map_update_elem()``
helper. The ``key`` parameter is equal to the queue_id of the queue the XSK
is attaching to, and the ``value`` parameter is the FD value of that socket.

Under the hood, the XSKMAP update function uses the XSK FD value to retrieve the
associated ``struct xdp_sock`` instance.

The flags argument can be one of the following:

- BPF_ANY: Create a new element or update an existing element.
- BPF_NOEXIST: Create a new element only if it did not exist.
- BPF_EXIST: Update an existing element.

bpf_map_lookup_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

   int bpf_map_lookup_elem(int fd, const void *key, void *value)

Returns ``struct xdp_sock *`` or negative error in case of failure.

bpf_map_delete_elem()
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: c

   int bpf_map_delete_elem(int fd, const void *key)

XSK entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or negative error in case of
failure.

.. note::
    When `libxdp`_ deletes an XSK it also removes the associated socket
    entry from the XSKMAP.

Examples
========
Kernel
------

The following code snippet shows how to declare a ``BPF_MAP_TYPE_XSKMAP`` called
``xsks_map`` and how to redirect packets to an XSK.

.. code-block:: c

   struct {
        __uint(type, BPF_MAP_TYPE_XSKMAP);
        __type(key, __u32);
        __type(value, __u32);
        __uint(max_entries, 64);
   } xsks_map SEC(".maps");


   SEC("xdp")
   int xsk_redir_prog(struct xdp_md *ctx)
   {
        __u32 index = ctx->rx_queue_index;

        if (bpf_map_lookup_elem(&xsks_map, &index))
            return bpf_redirect_map(&xsks_map, index, 0);
        return XDP_PASS;
   }

User space
----------

The following code snippet shows how to update an XSKMAP with an XSK entry.

.. code-block:: c

   int update_xsks_map(struct bpf_map *xsks_map, int queue_id, int xsk_fd)
   {
        int ret;

        ret = bpf_map_update_elem(bpf_map__fd(xsks_map), &queue_id, &xsk_fd, 0);
        if (ret < 0)
            fprintf(stderr, "Failed to update xsks_map: %s\n", strerror(errno));

        return ret;
   }

For an example of how to create AF_XDP sockets, please see the AF_XDP-example and
AF_XDP-forwarding programs in the `bpf-examples`_ directory in the `libxdp`_ repository.
For a detailed explanation of the AF_XDP interface please see:

- `libxdp-readme`_.
- `AF_XDP`_ kernel documentation.

.. note::
    The most comprehensive resource for using XSKMAPs and AF_XDP is `libxdp`_.

.. _libxdp: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp
.. _AF_XDP: https://www.kernel.org/doc/html/latest/networking/af_xdp.html
.. _bpf-examples: https://github.com/xdp-project/bpf-examples
.. _libxdp-readme: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp#using-af_xdp-sockets
+3
Documentation/bpf/programs.rst
···
    :glob:

    prog_*
+
+For a list of all program types, see :ref:`program_types_and_elf` in
+the :ref:`libbpf` documentation.
+81
Documentation/bpf/redirect.rst
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.

========
Redirect
========
XDP_REDIRECT
############
Supported maps
--------------

XDP_REDIRECT works with the following map types:

- ``BPF_MAP_TYPE_DEVMAP``
- ``BPF_MAP_TYPE_DEVMAP_HASH``
- ``BPF_MAP_TYPE_CPUMAP``
- ``BPF_MAP_TYPE_XSKMAP``

For more information on these maps, please see the specific map documentation.

Process
-------

.. kernel-doc:: net/core/filter.c
   :doc: xdp redirect

.. note::
    Not all drivers support transmitting frames after a redirect, and for
    those that do, not all of them support non-linear frames. Non-linear xdp
    bufs/frames are bufs/frames that contain more than one fragment.

Debugging packet drops
----------------------
Silent packet drops for XDP_REDIRECT can be debugged using:

- bpf_trace
- perf_record

bpf_trace
^^^^^^^^^
The following bpftrace command can be used to capture and count all XDP tracepoints:

.. code-block:: none

    sudo bpftrace -e 'tracepoint:xdp:* { @cnt[probe] = count(); }'
    Attaching 12 probes...
    ^C

    @cnt[tracepoint:xdp:mem_connect]: 18
    @cnt[tracepoint:xdp:mem_disconnect]: 18
    @cnt[tracepoint:xdp:xdp_exception]: 19605
    @cnt[tracepoint:xdp:xdp_devmap_xmit]: 1393604
    @cnt[tracepoint:xdp:xdp_redirect]: 22292200

.. note::
    The various xdp tracepoints can be found in ``source/include/trace/events/xdp.h``

The following bpftrace command can be used to extract the ``ERRNO`` being returned as
part of the err parameter:

.. code-block:: none

    sudo bpftrace -e \
    'tracepoint:xdp:xdp_redirect*_err {@redir_errno[-args->err] = count();}
    tracepoint:xdp:xdp_devmap_xmit {@devmap_errno[-args->err] = count();}'

perf record
^^^^^^^^^^^
The perf tool also supports recording tracepoints:

.. code-block:: none

    perf record -a -e xdp:xdp_redirect_err \
        -e xdp:xdp_redirect_map_err \
        -e xdp:xdp_exception \
        -e xdp:xdp_devmap_xmit

References
==========

- https://github.com/xdp-project/xdp-tutorial/tree/master/tracing02-xdp-monitor
+107 -47
include/linux/bpf.h
···
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
 extern struct kobject *btf_kobj;
+extern struct bpf_mem_alloc bpf_global_ma;
+extern bool bpf_global_ma_set;

 typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
 typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
···
        int (*map_lookup_and_delete_batch)(struct bpf_map *map,
                                           const union bpf_attr *attr,
                                           union bpf_attr __user *uattr);
-       int (*map_update_batch)(struct bpf_map *map, const union bpf_attr *attr,
+       int (*map_update_batch)(struct bpf_map *map, struct file *map_file,
+                               const union bpf_attr *attr,
                                union bpf_attr __user *uattr);
        int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr,
                                union bpf_attr __user *uattr);
···
        struct bpf_local_storage __rcu ** (*map_owner_storage_ptr)(void *owner);

        /* Misc helpers.*/
-       int (*map_redirect)(struct bpf_map *map, u32 ifindex, u64 flags);
+       int (*map_redirect)(struct bpf_map *map, u64 key, u64 flags);

        /* map_meta_equal must be implemented for maps that can be
         * used as an inner map.  It is a runtime check to ensure
···
 };

 enum {
-       /* Support at most 8 pointers in a BTF type */
-       BTF_FIELDS_MAX = 10,
-       BPF_MAP_OFF_ARR_MAX = BTF_FIELDS_MAX,
+       /* Support at most 10 fields in a BTF type */
+       BTF_FIELDS_MAX = 10,
 };

 enum btf_field_type {
···
        BPF_KPTR_UNREF = (1 << 2),
        BPF_KPTR_REF   = (1 << 3),
        BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
+       BPF_LIST_HEAD  = (1 << 4),
+       BPF_LIST_NODE  = (1 << 5),
 };

 struct btf_field_kptr {
···
        u32 btf_id;
 };

+struct btf_field_list_head {
+       struct btf *btf;
+       u32 value_btf_id;
+       u32 node_offset;
+       struct btf_record *value_rec;
+};
+
 struct btf_field {
        u32 offset;
        enum btf_field_type type;
        union {
                struct btf_field_kptr kptr;
+               struct btf_field_list_head list_head;
        };
 };
···
 struct btf_field_offs {
        u32 cnt;
-       u32 field_off[BPF_MAP_OFF_ARR_MAX];
-       u8 field_sz[BPF_MAP_OFF_ARR_MAX];
+       u32 field_off[BTF_FIELDS_MAX];
+       u8 field_sz[BTF_FIELDS_MAX];
 };

 struct bpf_map {
···
        case BPF_KPTR_UNREF:
        case BPF_KPTR_REF:
                return "kptr";
+       case BPF_LIST_HEAD:
+               return "bpf_list_head";
+       case BPF_LIST_NODE:
+               return "bpf_list_node";
        default:
                WARN_ON_ONCE(1);
                return "unknown";
···
        case BPF_KPTR_UNREF:
        case BPF_KPTR_REF:
                return sizeof(u64);
+       case BPF_LIST_HEAD:
+               return sizeof(struct bpf_list_head);
+       case BPF_LIST_NODE:
+               return sizeof(struct bpf_list_node);
        default:
                WARN_ON_ONCE(1);
                return 0;
···
        case BPF_KPTR_UNREF:
        case BPF_KPTR_REF:
                return __alignof__(u64);
+       case BPF_LIST_HEAD:
+               return __alignof__(struct bpf_list_head);
+       case BPF_LIST_NODE:
+               return __alignof__(struct bpf_list_node);
        default:
                WARN_ON_ONCE(1);
                return 0;
···
        return rec->field_mask & type;
 }

+static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
+{
+       int i;
+
+       if (!foffs)
+               return;
+       for (i = 0; i < foffs->cnt; i++)
+               memset(obj + foffs->field_off[i], 0, foffs->field_sz[i]);
+}
+
 static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 {
-       if (!IS_ERR_OR_NULL(map->record)) {
-               struct btf_field *fields = map->record->fields;
-               u32 cnt = map->record->cnt;
-               int i;
-
-               for (i = 0; i < cnt; i++)
-                       memset(dst + fields[i].offset, 0, btf_field_type_size(fields[i].type));
-       }
+       bpf_obj_init(map->field_offs, dst);
 }

 /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
···
                u32 sz = next_off - curr_off;

                memcpy(dst + curr_off, src + curr_off, sz);
-               curr_off = next_off + foffs->field_sz[i];
+               curr_off += foffs->field_sz[i] + sz;
        }
        memcpy(dst + curr_off, src + curr_off, size - curr_off);
 }
···
                u32 sz = next_off - curr_off;

                memset(dst + curr_off, 0, sz);
-               curr_off = next_off + foffs->field_sz[i];
+               curr_off += foffs->field_sz[i] + sz;
        }
        memset(dst + curr_off, 0, size - curr_off);
 }
···
 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
                           bool lock_src);
 void bpf_timer_cancel_and_free(void *timer);
+void bpf_list_head_free(const struct btf_field *field, void *list_head,
+                       struct bpf_spin_lock *spin_lock);
+
 int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size);

 struct bpf_offload_dev;
···
         */
        MEM_RDONLY              = BIT(1 + BPF_BASE_TYPE_BITS),

-       /* MEM was "allocated" from a different helper, and cannot be mixed
-        * with regular non-MEM_ALLOC'ed MEM types.
-        */
-       MEM_ALLOC               = BIT(2 + BPF_BASE_TYPE_BITS),
+       /* MEM points to BPF ring buffer reservation. */
+       MEM_RINGBUF             = BIT(2 + BPF_BASE_TYPE_BITS),

        /* MEM is in user address space. */
        MEM_USER                = BIT(3 + BPF_BASE_TYPE_BITS),
···

        /* Size is known at compile time. */
        MEM_FIXED_SIZE          = BIT(10 + BPF_BASE_TYPE_BITS),
+
+       /* MEM is of an allocated object of type in program BTF. This is used to
+        * tag PTR_TO_BTF_ID allocated using bpf_obj_new.
+        */
+       MEM_ALLOC               = BIT(11 + BPF_BASE_TYPE_BITS),
+
+       /* PTR was passed from the kernel in a trusted context, and may be
+        * passed to KF_TRUSTED_ARGS kfuncs or BPF helper functions.
+        * Confusingly, this is _not_ the opposite of PTR_UNTRUSTED above.
+        * PTR_UNTRUSTED refers to a kptr that was read directly from a map
+        * without invoking bpf_kptr_xchg(). What we really need to know is
+        * whether a pointer is safe to pass to a kfunc or BPF helper function.
+        * While PTR_UNTRUSTED pointers are unsafe to pass to kfuncs and BPF
+        * helpers, they do not cover all possible instances of unsafe
+        * pointers. For example, a pointer that was obtained from walking a
+        * struct will _not_ get the PTR_UNTRUSTED type modifier, despite the
+        * fact that it may be NULL, invalid, etc. This is due to backwards
+        * compatibility requirements, as this was the behavior that was first
+        * introduced when kptrs were added. The behavior is now considered
+        * deprecated, and PTR_UNTRUSTED will eventually be removed.
+        *
+        * PTR_TRUSTED, on the other hand, is a pointer that the kernel
+        * guarantees to be valid and safe to pass to kfuncs and BPF helpers.
+        * For example, pointers passed to tracepoint arguments are considered
+        * PTR_TRUSTED, as are pointers that are passed to struct_ops
+        * callbacks. As alluded to above, pointers that are obtained from
+        * walking PTR_TRUSTED pointers are _not_ trusted. For example, if a
+        * struct task_struct *task is PTR_TRUSTED, then accessing
+        * task->last_wakee will lose the PTR_TRUSTED modifier when it's stored
+        * in a BPF register. Similarly, pointers passed to certain programs
+        * types such as kretprobes are not guaranteed to be valid, as they may
+        * for example contain an object that was recently freed.
+        */
+       PTR_TRUSTED             = BIT(12 + BPF_BASE_TYPE_BITS),
+
+       /* MEM is tagged with rcu and memory access needs rcu_read_lock protection. */
+       MEM_RCU                 = BIT(13 + BPF_BASE_TYPE_BITS),

        __BPF_TYPE_FLAG_MAX,
        __BPF_TYPE_LAST_FLAG    = __BPF_TYPE_FLAG_MAX - 1,
···
        ARG_PTR_TO_LONG,        /* pointer to long */
        ARG_PTR_TO_SOCKET,      /* pointer to bpf_sock (fullsock) */
        ARG_PTR_TO_BTF_ID,      /* pointer to in-kernel struct */
-       ARG_PTR_TO_ALLOC_MEM,   /* pointer to dynamically allocated memory */
+       ARG_PTR_TO_RINGBUF_MEM, /* pointer to dynamically reserved ringbuf memory */
        ARG_CONST_ALLOC_SIZE_OR_ZERO,   /* number of allocated bytes requested */
        ARG_PTR_TO_BTF_ID_SOCK_COMMON,  /* pointer to in-kernel sock_common or bpf-mirrored bpf_sock */
        ARG_PTR_TO_PERCPU_BTF_ID,       /* pointer to in-kernel percpu type */
···
        ARG_PTR_TO_MEM_OR_NULL          = PTR_MAYBE_NULL | ARG_PTR_TO_MEM,
        ARG_PTR_TO_CTX_OR_NULL          = PTR_MAYBE_NULL | ARG_PTR_TO_CTX,
        ARG_PTR_TO_SOCKET_OR_NULL       = PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
-       ARG_PTR_TO_ALLOC_MEM_OR_NULL    = PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
        ARG_PTR_TO_STACK_OR_NULL        = PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
        ARG_PTR_TO_BTF_ID_OR_NULL       = PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID,
        /* pointer to memory does not need to be initialized, helper function must fill
···
        RET_PTR_TO_SOCKET,      /* returns a pointer to a socket */
        RET_PTR_TO_TCP_SOCK,    /* returns a pointer to a
tcp_sock */ 657 593 RET_PTR_TO_SOCK_COMMON, /* returns a pointer to a sock_common */ 658 - RET_PTR_TO_ALLOC_MEM, /* returns a pointer to dynamically allocated memory */ 594 + RET_PTR_TO_MEM, /* returns a pointer to memory */ 659 595 RET_PTR_TO_MEM_OR_BTF_ID, /* returns a pointer to a valid memory or a btf_id */ 660 596 RET_PTR_TO_BTF_ID, /* returns a pointer to a btf_id */ 661 597 __BPF_RET_TYPE_MAX, ··· 665 601 RET_PTR_TO_SOCKET_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_SOCKET, 666 602 RET_PTR_TO_TCP_SOCK_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_TCP_SOCK, 667 603 RET_PTR_TO_SOCK_COMMON_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_SOCK_COMMON, 668 - RET_PTR_TO_ALLOC_MEM_OR_NULL = PTR_MAYBE_NULL | MEM_ALLOC | RET_PTR_TO_ALLOC_MEM, 669 - RET_PTR_TO_DYNPTR_MEM_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_ALLOC_MEM, 604 + RET_PTR_TO_RINGBUF_MEM_OR_NULL = PTR_MAYBE_NULL | MEM_RINGBUF | RET_PTR_TO_MEM, 605 + RET_PTR_TO_DYNPTR_MEM_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_MEM, 670 606 RET_PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID, 607 + RET_PTR_TO_BTF_ID_TRUSTED = PTR_TRUSTED | RET_PTR_TO_BTF_ID, 671 608 672 609 /* This must be the last entry. Its purpose is to ensure the enum is 673 610 * wide enough to hold the higher bits reserved for bpf_type_flag. 
··· 685 620 u64 (*func)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); 686 621 bool gpl_only; 687 622 bool pkt_access; 623 + bool might_sleep; 688 624 enum bpf_return_type ret_type; 689 625 union { 690 626 struct { ··· 824 758 union bpf_attr __user *uattr); 825 759 }; 826 760 761 + struct bpf_reg_state; 827 762 struct bpf_verifier_ops { 828 763 /* return eBPF function prototype for verification */ 829 764 const struct bpf_func_proto * ··· 846 779 struct bpf_insn *dst, 847 780 struct bpf_prog *prog, u32 *target_size); 848 781 int (*btf_struct_access)(struct bpf_verifier_log *log, 849 - const struct btf *btf, 850 - const struct btf_type *t, int off, int size, 851 - enum bpf_access_type atype, 782 + const struct bpf_reg_state *reg, 783 + int off, int size, enum bpf_access_type atype, 852 784 u32 *next_btf_id, enum bpf_type_flag *flag); 853 785 }; 854 786 ··· 1860 1794 int generic_map_lookup_batch(struct bpf_map *map, 1861 1795 const union bpf_attr *attr, 1862 1796 union bpf_attr __user *uattr); 1863 - int generic_map_update_batch(struct bpf_map *map, 1797 + int generic_map_update_batch(struct bpf_map *map, struct file *map_file, 1864 1798 const union bpf_attr *attr, 1865 1799 union bpf_attr __user *uattr); 1866 1800 int generic_map_delete_batch(struct bpf_map *map, ··· 2151 2085 return btf_ctx_access(off, size, type, prog, info); 2152 2086 } 2153 2087 2154 - int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf, 2155 - const struct btf_type *t, int off, int size, 2156 - enum bpf_access_type atype, 2088 + int btf_struct_access(struct bpf_verifier_log *log, 2089 + const struct bpf_reg_state *reg, 2090 + int off, int size, enum bpf_access_type atype, 2157 2091 u32 *next_btf_id, enum bpf_type_flag *flag); 2158 2092 bool btf_struct_ids_match(struct bpf_verifier_log *log, 2159 2093 const struct btf *btf, u32 id, int off, ··· 2166 2100 const char *func_name, 2167 2101 struct btf_func_model *m); 2168 2102 2169 - struct bpf_kfunc_arg_meta { 2170 - u64 r0_size; 
2171 - bool r0_rdonly; 2172 - int ref_obj_id; 2173 - u32 flags; 2174 - }; 2175 - 2176 2103 struct bpf_reg_state; 2177 2104 int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog, 2178 2105 struct bpf_reg_state *regs); 2179 2106 int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog, 2180 2107 struct bpf_reg_state *regs); 2181 - int btf_check_kfunc_arg_match(struct bpf_verifier_env *env, 2182 - const struct btf *btf, u32 func_id, 2183 - struct bpf_reg_state *regs, 2184 - struct bpf_kfunc_arg_meta *meta); 2185 2108 int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog, 2186 2109 struct bpf_reg_state *reg); 2187 2110 int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog, ··· 2393 2338 } 2394 2339 2395 2340 static inline int btf_struct_access(struct bpf_verifier_log *log, 2396 - const struct btf *btf, 2397 - const struct btf_type *t, int off, int size, 2398 - enum bpf_access_type atype, 2341 + const struct bpf_reg_state *reg, 2342 + int off, int size, enum bpf_access_type atype, 2399 2343 u32 *next_btf_id, enum bpf_type_flag *flag) 2400 2344 { 2401 2345 return -EACCES; ··· 2851 2797 bool has_ref; 2852 2798 }; 2853 2799 #endif /* CONFIG_KEYS */ 2800 + 2801 + static inline bool type_is_alloc(u32 type) 2802 + { 2803 + return type & MEM_ALLOC; 2804 + } 2805 + 2854 2806 #endif /* _LINUX_BPF_H */
+33 -4
include/linux/bpf_verifier.h
··· 19 19 */ 20 20 #define BPF_MAX_VAR_SIZ (1 << 29) 21 21 /* size of type_str_buf in bpf_verifier. */ 22 - #define TYPE_STR_BUF_LEN 64 22 + #define TYPE_STR_BUF_LEN 128 23 23 24 24 /* Liveness marks, used for registers and spilled-regs (in stack slots). 25 25 * Read marks propagate upwards until they find a write mark; they record that ··· 223 223 * exiting a callback function. 224 224 */ 225 225 int callback_ref; 226 + /* Mark the reference state to release the registers sharing the same id 227 + * on bpf_spin_unlock (for nodes that we will lose ownership to but are 228 + * safe to access inside the critical section). 229 + */ 230 + bool release_on_unlock; 226 231 }; 227 232 228 233 /* state of the program: ··· 328 323 u32 branches; 329 324 u32 insn_idx; 330 325 u32 curframe; 331 - u32 active_spin_lock; 326 + /* For every reg representing a map value or allocated object pointer, 327 + * we consider the tuple of (ptr, id) for them to be unique in verifier 328 + * context and consider them to not alias each other for the purposes of 329 + * tracking lock state. 330 + */ 331 + struct { 332 + /* This can either be reg->map_ptr or reg->btf. If ptr is NULL, 333 + * there's no active lock held, and other fields have no 334 + * meaning. If non-NULL, it indicates that a lock is held and 335 + * id member has the reg->id of the register which can be >= 0.
336 + */ 337 + void *ptr; 338 + /* This will be reg->id */ 339 + u32 id; 340 + } active_lock; 332 341 bool speculative; 342 + bool active_rcu_lock; 333 343 334 344 /* first and last insn idx of this verifier state */ 335 345 u32 first_insn_idx; ··· 439 419 */ 440 420 struct bpf_loop_inline_state loop_inline_state; 441 421 }; 422 + u64 obj_new_size; /* remember the size of type passed to bpf_obj_new to rewrite R1 */ 423 + struct btf_struct_meta *kptr_struct_meta; 442 424 u64 map_key_state; /* constant (32 bit) key tracking for maps */ 443 425 int ctx_field_size; /* the ctx field size for load insn, maybe 0 */ 444 426 u32 seen; /* this insn was processed by the verifier at env->pass_cnt */ 445 427 bool sanitize_stack_spill; /* subject to Spectre v4 sanitation */ 446 428 bool zext_dst; /* this insn zero extends dst reg */ 429 + bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */ 447 430 u8 alu_state; /* used in combination with alu_limit */ 448 431 449 432 /* below fields are initialized once */ ··· 536 513 bool bypass_spec_v1; 537 514 bool bypass_spec_v4; 538 515 bool seen_direct_write; 516 + bool rcu_tag_supported; 539 517 struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */ 540 518 const struct bpf_line_info *prev_linfo; 541 519 struct bpf_verifier_log log; ··· 613 589 int check_func_arg_reg_off(struct bpf_verifier_env *env, 614 590 const struct bpf_reg_state *reg, int regno, 615 591 enum bpf_arg_type arg_type); 616 - int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, 617 - u32 regno); 618 592 int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, 619 593 u32 regno, u32 mem_size); 620 594 bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, ··· 681 659 default: 682 660 return true; 683 661 } 662 + } 663 + 664 + #define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | MEM_RCU | PTR_TRUSTED) 665 + 666 + static inline bool bpf_type_has_unsafe_modifiers(u32 type) 667 + { 
668 + return type_flag(type) & ~BPF_REG_TRUSTED_MODIFIERS; 684 669 } 685 670 686 671 #endif /* _LINUX_BPF_VERIFIER_H */
+113 -26
include/linux/btf.h
··· 6 6 7 7 #include <linux/types.h> 8 8 #include <linux/bpfptr.h> 9 + #include <linux/bsearch.h> 10 + #include <linux/btf_ids.h> 9 11 #include <uapi/linux/btf.h> 10 12 #include <uapi/linux/bpf.h> 11 13 ··· 19 17 #define KF_RELEASE (1 << 1) /* kfunc is a release function */ 20 18 #define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ 21 19 #define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ 22 - /* Trusted arguments are those which are meant to be referenced arguments with 23 - * unchanged offset. It is used to enforce that pointers obtained from acquire 24 - * kfuncs remain unmodified when being passed to helpers taking trusted args. 20 + /* Trusted arguments are those which are guaranteed to be valid when passed to 21 + * the kfunc. It is used to enforce that pointers obtained from either acquire 22 + * kfuncs, or from the main kernel on a tracepoint or struct_ops callback 23 + * invocation, remain unmodified when being passed to helpers taking trusted 24 + * args. 25 25 * 26 - * Consider 27 - * struct foo { 28 - * int data; 29 - * struct foo *next; 30 - * }; 26 + * Consider, for example, the following new task tracepoint: 31 27 * 32 - * struct bar { 33 - * int data; 34 - * struct foo f; 35 - * }; 28 + * SEC("tp_btf/task_newtask") 29 + * int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags) 30 + * { 31 + * ... 
32 + * } 36 33 * 37 - * struct foo *f = alloc_foo(); // Acquire kfunc 38 - * struct bar *b = alloc_bar(); // Acquire kfunc 34 + * And the following kfunc: 39 35 * 40 - * If a kfunc set_foo_data() wants to operate only on the allocated object, it 41 - * will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like: 36 + * BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) 42 37 * 43 - * set_foo_data(f, 42); // Allowed 44 - * set_foo_data(f->next, 42); // Rejected, non-referenced pointer 45 - * set_foo_data(&f->next, 42);// Rejected, referenced, but wrong type 46 - * set_foo_data(&b->f, 42); // Rejected, referenced, but bad offset 38 + * All invocations to the kfunc must pass the unmodified, unwalked task: 47 39 * 48 - * In the final case, usually for the purposes of type matching, it is deduced 49 - * by looking at the type of the member at the offset, but due to the 50 - * requirement of trusted argument, this deduction will be strict and not done 51 - * for this case. 40 + * bpf_task_acquire(task); // Allowed 41 + * bpf_task_acquire(task->last_wakee); // Rejected, walked task 42 + * 43 + * Programs may also pass referenced tasks directly to the kfunc: 44 + * 45 + * struct task_struct *acquired; 46 + * 47 + * acquired = bpf_task_acquire(task); // Allowed, same as above 48 + * bpf_task_acquire(acquired); // Allowed 49 + * bpf_task_acquire(task); // Allowed 50 + * bpf_task_acquire(acquired->last_wakee); // Rejected, walked task 51 + * 52 + * Programs may _not_, however, pass a task from an arbitrary fentry/fexit, or 53 + * kprobe/kretprobe to the kfunc, as BPF cannot guarantee that all of these 54 + * pointers are guaranteed to be safe. 
For example, the following BPF program 55 + * would be rejected: 56 + * 57 + * SEC("kretprobe/free_task") 58 + * int BPF_PROG(free_task_probe, struct task_struct *tsk) 59 + * { 60 + * struct task_struct *acquired; 61 + * 62 + * acquired = bpf_task_acquire(acquired); // Rejected, not a trusted pointer 63 + * bpf_task_release(acquired); 64 + * 65 + * return 0; 66 + * } 52 67 */ 53 68 #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ 54 69 #define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ ··· 95 76 struct btf_id_dtor_kfunc { 96 77 u32 btf_id; 97 78 u32 kfunc_btf_id; 79 + }; 80 + 81 + struct btf_struct_meta { 82 + u32 btf_id; 83 + struct btf_record *record; 84 + struct btf_field_offs *field_offs; 85 + }; 86 + 87 + struct btf_struct_metas { 88 + u32 cnt; 89 + struct btf_struct_meta types[]; 98 90 }; 99 91 100 92 typedef void (*btf_dtor_kfunc_t)(void *); ··· 195 165 int btf_find_timer(const struct btf *btf, const struct btf_type *t); 196 166 struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t, 197 167 u32 field_mask, u32 value_size); 168 + int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec); 198 169 struct btf_field_offs *btf_parse_field_offs(struct btf_record *rec); 199 170 bool btf_type_is_void(const struct btf_type *t); 200 171 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind); ··· 355 324 return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION; 356 325 } 357 326 327 + static inline bool __btf_type_is_struct(const struct btf_type *t) 328 + { 329 + return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT; 330 + } 331 + 332 + static inline bool btf_type_is_array(const struct btf_type *t) 333 + { 334 + return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY; 335 + } 336 + 358 337 static inline u16 btf_type_vlen(const struct btf_type *t) 359 338 { 360 339 return BTF_INFO_VLEN(t->info); ··· 449 408 return (struct btf_param *)(t + 1); 450 409 } 451 410 452 - #ifdef 
CONFIG_BPF_SYSCALL 453 - struct bpf_prog; 411 + static inline int btf_id_cmp_func(const void *a, const void *b) 412 + { 413 + const int *pa = a, *pb = b; 454 414 415 + return *pa - *pb; 416 + } 417 + 418 + static inline bool btf_id_set_contains(const struct btf_id_set *set, u32 id) 419 + { 420 + return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL; 421 + } 422 + 423 + static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id) 424 + { 425 + return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func); 426 + } 427 + 428 + struct bpf_prog; 429 + struct bpf_verifier_log; 430 + 431 + #ifdef CONFIG_BPF_SYSCALL 455 432 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id); 456 433 const char *btf_name_by_offset(const struct btf *btf, u32 offset); 457 434 struct btf *btf_parse_vmlinux(void); ··· 482 423 s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id); 483 424 int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt, 484 425 struct module *owner); 426 + struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id); 427 + const struct btf_member * 428 + btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf, 429 + const struct btf_type *t, enum bpf_prog_type prog_type, 430 + int arg); 431 + int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_type); 432 + bool btf_types_are_same(const struct btf *btf1, u32 id1, 433 + const struct btf *btf2, u32 id2); 485 434 #else 486 435 static inline const struct btf_type *btf_type_by_id(const struct btf *btf, 487 436 u32 type_id) ··· 520 453 u32 add_cnt, struct module *owner) 521 454 { 522 455 return 0; 456 + } 457 + static inline struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id) 458 + { 459 + return NULL; 460 + } 461 + static inline const struct btf_member * 462 + btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf 
*btf, 463 + const struct btf_type *t, enum bpf_prog_type prog_type, 464 + int arg) 465 + { 466 + return NULL; 467 + } 468 + static inline int get_kern_ctx_btf_id(struct bpf_verifier_log *log, 469 + enum bpf_prog_type prog_type) { 470 + return -EINVAL; 471 + } 472 + static inline bool btf_types_are_same(const struct btf *btf1, u32 id1, 473 + const struct btf *btf2, u32 id2) 474 + { 475 + return false; 523 476 } 524 477 #endif 525 478
+1 -1
include/linux/btf_ids.h
··· 204 204 205 205 #else 206 206 207 - #define BTF_ID_LIST(name) static u32 __maybe_unused name[5]; 207 + #define BTF_ID_LIST(name) static u32 __maybe_unused name[16]; 208 208 #define BTF_ID(prefix, name) 209 209 #define BTF_ID_FLAGS(prefix, name, ...) 210 210 #define BTF_ID_UNUSED
+2 -1
include/linux/compiler_types.h
··· 49 49 # endif 50 50 # define __iomem 51 51 # define __percpu BTF_TYPE_TAG(percpu) 52 - # define __rcu 52 + # define __rcu BTF_TYPE_TAG(rcu) 53 + 53 54 # define __chk_user_ptr(x) (void)0 54 55 # define __chk_io_ptr(x) (void)0 55 56 /* context/locking */
+10 -10
include/linux/filter.h
··· 568 568 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key); 569 569 570 570 extern struct mutex nf_conn_btf_access_lock; 571 - extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf, 572 - const struct btf_type *t, int off, int size, 573 - enum bpf_access_type atype, u32 *next_btf_id, 574 - enum bpf_type_flag *flag); 571 + extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, 572 + const struct bpf_reg_state *reg, 573 + int off, int size, enum bpf_access_type atype, 574 + u32 *next_btf_id, enum bpf_type_flag *flag); 575 575 576 576 typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx, 577 577 const struct bpf_insn *insnsi, ··· 643 643 }; 644 644 645 645 struct bpf_redirect_info { 646 - u32 flags; 647 - u32 tgt_index; 646 + u64 tgt_index; 648 647 void *tgt_value; 649 648 struct bpf_map *map; 649 + u32 flags; 650 + u32 kern_flags; 650 651 u32 map_id; 651 652 enum bpf_map_type map_type; 652 - u32 kern_flags; 653 653 struct bpf_nh_params nh; 654 654 }; 655 655 ··· 1504 1504 } 1505 1505 #endif /* IS_ENABLED(CONFIG_IPV6) */ 1506 1506 1507 - static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifindex, 1507 + static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u64 index, 1508 1508 u64 flags, const u64 flag_mask, 1509 1509 void *lookup_elem(struct bpf_map *map, u32 key)) 1510 1510 { ··· 1515 1515 if (unlikely(flags & ~(action_mask | flag_mask))) 1516 1516 return XDP_ABORTED; 1517 1517 1518 - ri->tgt_value = lookup_elem(map, ifindex); 1518 + ri->tgt_value = lookup_elem(map, index); 1519 1519 if (unlikely(!ri->tgt_value) && !(flags & BPF_F_BROADCAST)) { 1520 1520 /* If the lookup fails we want to clear out the state in the 1521 1521 * redirect_info struct completely, so that if an eBPF program ··· 1527 1527 return flags & action_mask; 1528 1528 } 1529 1529 1530 - ri->tgt_index = ifindex; 1530 + ri->tgt_index = index; 1531 1531 ri->map_id = map->id; 1532 1532 ri->map_type = 
map->map_type; 1533 1533
+1 -1
include/linux/netdevice.h
··· 3135 3135 /* stats */ 3136 3136 unsigned int processed; 3137 3137 unsigned int time_squeeze; 3138 - unsigned int received_rps; 3139 3138 #ifdef CONFIG_RPS 3140 3139 struct softnet_data *rps_ipi_list; 3141 3140 #endif ··· 3167 3168 unsigned int cpu; 3168 3169 unsigned int input_queue_tail; 3169 3170 #endif 3171 + unsigned int received_rps; 3170 3172 unsigned int dropped; 3171 3173 struct sk_buff_head input_pkt_queue; 3172 3174 struct napi_struct backlog;
+23 -10
include/uapi/linux/bpf.h
··· 2584 2584 * * **SOL_SOCKET**, which supports the following *optname*\ s: 2585 2585 * **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**, 2586 2586 * **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**, 2587 - * **SO_BINDTODEVICE**, **SO_KEEPALIVE**. 2587 + * **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**, 2588 + * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**. 2588 2589 * * **IPPROTO_TCP**, which supports the following *optname*\ s: 2589 2590 * **TCP_CONGESTION**, **TCP_BPF_IW**, 2590 2591 * **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**, 2591 2592 * **TCP_KEEPIDLE**, **TCP_KEEPINTVL**, **TCP_KEEPCNT**, 2592 - * **TCP_SYNCNT**, **TCP_USER_TIMEOUT**, **TCP_NOTSENT_LOWAT**. 2593 + * **TCP_SYNCNT**, **TCP_USER_TIMEOUT**, **TCP_NOTSENT_LOWAT**, 2594 + * **TCP_NODELAY**, **TCP_MAXSEG**, **TCP_WINDOW_CLAMP**, 2595 + * **TCP_THIN_LINEAR_TIMEOUTS**, **TCP_BPF_DELACK_MAX**, 2596 + * **TCP_BPF_RTO_MIN**. 2593 2597 * * **IPPROTO_IP**, which supports *optname* **IP_TOS**. 2594 - * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**. 2598 + * * **IPPROTO_IPV6**, which supports the following *optname*\ s: 2599 + * **IPV6_TCLASS**, **IPV6_AUTOFLOWLABEL**. 2595 2600 * Return 2596 2601 * 0 on success, or a negative error in case of failure. 2597 2602 * ··· 2652 2647 * Return 2653 2648 * 0 on success, or a negative error in case of failure. 2654 2649 * 2655 - * long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags) 2650 + * long bpf_redirect_map(struct bpf_map *map, u64 key, u64 flags) 2656 2651 * Description 2657 2652 * Redirect the packet to the endpoint referenced by *map* at 2658 2653 * index *key*. Depending on its type, this *map* can contain ··· 2813 2808 * and **BPF_CGROUP_INET6_CONNECT**. 2814 2809 * 2815 2810 * This helper actually implements a subset of **getsockopt()**. 2816 - * It supports the following *level*\ s: 2817 - * 2818 - * * **IPPROTO_TCP**, which supports *optname* 2819 - * **TCP_CONGESTION**. 
2820 - * * **IPPROTO_IP**, which supports *optname* **IP_TOS**. 2821 - * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**. 2811 + * It supports the same set of *optname*\ s that is supported by 2812 + * the **bpf_setsockopt**\ () helper. The exceptions are 2813 + * **TCP_BPF_*** is **bpf_setsockopt**\ () only and 2814 + * **TCP_SAVED_SYN** is **bpf_getsockopt**\ () only. 2822 2815 * Return 2823 2816 * 0 on success, or a negative error in case of failure. 2824 2817 * ··· 6887 6884 } __attribute__((aligned(8))); 6888 6885 6889 6886 struct bpf_dynptr { 6887 + __u64 :64; 6888 + __u64 :64; 6889 + } __attribute__((aligned(8))); 6890 + 6891 + struct bpf_list_head { 6892 + __u64 :64; 6893 + __u64 :64; 6894 + } __attribute__((aligned(8))); 6895 + 6896 + struct bpf_list_node { 6890 6897 __u64 :64; 6891 6898 __u64 :64; 6892 6899 } __attribute__((aligned(8)));
-1
kernel/bpf/arraymap.c
··· 430 430 for (i = 0; i < array->map.max_entries; i++) 431 431 bpf_obj_free_fields(map->record, array_map_elem_ptr(array, i)); 432 432 } 433 - bpf_map_free_record(map); 434 433 } 435 434 436 435 if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
+4 -2
kernel/bpf/bpf_lsm.c
··· 151 151 static const struct bpf_func_proto bpf_ima_inode_hash_proto = { 152 152 .func = bpf_ima_inode_hash, 153 153 .gpl_only = false, 154 + .might_sleep = true, 154 155 .ret_type = RET_INTEGER, 155 156 .arg1_type = ARG_PTR_TO_BTF_ID, 156 157 .arg1_btf_id = &bpf_ima_inode_hash_btf_ids[0], ··· 170 169 static const struct bpf_func_proto bpf_ima_file_hash_proto = { 171 170 .func = bpf_ima_file_hash, 172 171 .gpl_only = false, 172 + .might_sleep = true, 173 173 .ret_type = RET_INTEGER, 174 174 .arg1_type = ARG_PTR_TO_BTF_ID, 175 175 .arg1_btf_id = &bpf_ima_file_hash_btf_ids[0], ··· 223 221 case BPF_FUNC_bprm_opts_set: 224 222 return &bpf_bprm_opts_set_proto; 225 223 case BPF_FUNC_ima_inode_hash: 226 - return prog->aux->sleepable ? &bpf_ima_inode_hash_proto : NULL; 224 + return &bpf_ima_inode_hash_proto; 227 225 case BPF_FUNC_ima_file_hash: 228 - return prog->aux->sleepable ? &bpf_ima_file_hash_proto : NULL; 226 + return &bpf_ima_file_hash_proto; 229 227 case BPF_FUNC_get_attach_cookie: 230 228 return bpf_prog_has_trampoline(prog) ? &bpf_get_attach_cookie_proto : NULL; 231 229 #ifdef CONFIG_NET
+478 -404
kernel/bpf/btf.c
··· 199 199 DEFINE_SPINLOCK(btf_idr_lock); 200 200 201 201 enum btf_kfunc_hook { 202 + BTF_KFUNC_HOOK_COMMON, 202 203 BTF_KFUNC_HOOK_XDP, 203 204 BTF_KFUNC_HOOK_TC, 204 205 BTF_KFUNC_HOOK_STRUCT_OPS, ··· 238 237 struct rcu_head rcu; 239 238 struct btf_kfunc_set_tab *kfunc_set_tab; 240 239 struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab; 240 + struct btf_struct_metas *struct_meta_tab; 241 241 242 242 /* split BTF support */ 243 243 struct btf *base_btf; ··· 477 475 static bool btf_type_nosize_or_null(const struct btf_type *t) 478 476 { 479 477 return !t || btf_type_nosize(t); 480 - } 481 - 482 - static bool __btf_type_is_struct(const struct btf_type *t) 483 - { 484 - return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT; 485 - } 486 - 487 - static bool btf_type_is_array(const struct btf_type *t) 488 - { 489 - return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY; 490 478 } 491 479 492 480 static bool btf_type_is_datasec(const struct btf_type *t) ··· 1634 1642 btf->dtor_kfunc_tab = NULL; 1635 1643 } 1636 1644 1645 + static void btf_struct_metas_free(struct btf_struct_metas *tab) 1646 + { 1647 + int i; 1648 + 1649 + if (!tab) 1650 + return; 1651 + for (i = 0; i < tab->cnt; i++) { 1652 + btf_record_free(tab->types[i].record); 1653 + kfree(tab->types[i].field_offs); 1654 + } 1655 + kfree(tab); 1656 + } 1657 + 1658 + static void btf_free_struct_meta_tab(struct btf *btf) 1659 + { 1660 + struct btf_struct_metas *tab = btf->struct_meta_tab; 1661 + 1662 + btf_struct_metas_free(tab); 1663 + btf->struct_meta_tab = NULL; 1664 + } 1665 + 1637 1666 static void btf_free(struct btf *btf) 1638 1667 { 1668 + btf_free_struct_meta_tab(btf); 1639 1669 btf_free_dtor_kfunc_tab(btf); 1640 1670 btf_free_kfunc_set_tab(btf); 1641 1671 kvfree(btf->types); ··· 3219 3205 struct btf_field_info { 3220 3206 enum btf_field_type type; 3221 3207 u32 off; 3222 - struct { 3223 - u32 type_id; 3224 - } kptr; 3208 + union { 3209 + struct { 3210 + u32 type_id; 3211 + } kptr; 3212 + struct { 3213 + const char *node_name; 
3214 + u32 value_btf_id; 3215 + } list_head; 3216 + }; 3225 3217 }; 3226 3218 3227 3219 static int btf_find_struct(const struct btf *btf, const struct btf_type *t, ··· 3281 3261 return BTF_FIELD_FOUND; 3282 3262 } 3283 3263 3264 + static const char *btf_find_decl_tag_value(const struct btf *btf, 3265 + const struct btf_type *pt, 3266 + int comp_idx, const char *tag_key) 3267 + { 3268 + int i; 3269 + 3270 + for (i = 1; i < btf_nr_types(btf); i++) { 3271 + const struct btf_type *t = btf_type_by_id(btf, i); 3272 + int len = strlen(tag_key); 3273 + 3274 + if (!btf_type_is_decl_tag(t)) 3275 + continue; 3276 + if (pt != btf_type_by_id(btf, t->type) || 3277 + btf_type_decl_tag(t)->component_idx != comp_idx) 3278 + continue; 3279 + if (strncmp(__btf_name_by_offset(btf, t->name_off), tag_key, len)) 3280 + continue; 3281 + return __btf_name_by_offset(btf, t->name_off) + len; 3282 + } 3283 + return NULL; 3284 + } 3285 + 3286 + static int btf_find_list_head(const struct btf *btf, const struct btf_type *pt, 3287 + const struct btf_type *t, int comp_idx, 3288 + u32 off, int sz, struct btf_field_info *info) 3289 + { 3290 + const char *value_type; 3291 + const char *list_node; 3292 + s32 id; 3293 + 3294 + if (!__btf_type_is_struct(t)) 3295 + return BTF_FIELD_IGNORE; 3296 + if (t->size != sz) 3297 + return BTF_FIELD_IGNORE; 3298 + value_type = btf_find_decl_tag_value(btf, pt, comp_idx, "contains:"); 3299 + if (!value_type) 3300 + return -EINVAL; 3301 + list_node = strstr(value_type, ":"); 3302 + if (!list_node) 3303 + return -EINVAL; 3304 + value_type = kstrndup(value_type, list_node - value_type, GFP_KERNEL | __GFP_NOWARN); 3305 + if (!value_type) 3306 + return -ENOMEM; 3307 + id = btf_find_by_name_kind(btf, value_type, BTF_KIND_STRUCT); 3308 + kfree(value_type); 3309 + if (id < 0) 3310 + return id; 3311 + list_node++; 3312 + if (str_is_empty(list_node)) 3313 + return -EINVAL; 3314 + info->type = BPF_LIST_HEAD; 3315 + info->off = off; 3316 + info->list_head.value_btf_id = id; 3317 
+ info->list_head.node_name = list_node; 3318 + return BTF_FIELD_FOUND; 3319 + } 3320 + 3284 3321 static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask, 3285 3322 int *align, int *sz) 3286 3323 { ··· 3358 3281 return -E2BIG; 3359 3282 *seen_mask |= BPF_TIMER; 3360 3283 type = BPF_TIMER; 3284 + goto end; 3285 + } 3286 + } 3287 + if (field_mask & BPF_LIST_HEAD) { 3288 + if (!strcmp(name, "bpf_list_head")) { 3289 + type = BPF_LIST_HEAD; 3290 + goto end; 3291 + } 3292 + } 3293 + if (field_mask & BPF_LIST_NODE) { 3294 + if (!strcmp(name, "bpf_list_node")) { 3295 + type = BPF_LIST_NODE; 3361 3296 goto end; 3362 3297 } 3363 3298 } ··· 3416 3327 switch (field_type) { 3417 3328 case BPF_SPIN_LOCK: 3418 3329 case BPF_TIMER: 3330 + case BPF_LIST_NODE: 3419 3331 ret = btf_find_struct(btf, member_type, off, sz, field_type, 3420 3332 idx < info_cnt ? &info[idx] : &tmp); 3421 3333 if (ret < 0) ··· 3426 3336 case BPF_KPTR_REF: 3427 3337 ret = btf_find_kptr(btf, member_type, off, sz, 3428 3338 idx < info_cnt ? &info[idx] : &tmp); 3339 + if (ret < 0) 3340 + return ret; 3341 + break; 3342 + case BPF_LIST_HEAD: 3343 + ret = btf_find_list_head(btf, t, member_type, i, off, sz, 3344 + idx < info_cnt ? &info[idx] : &tmp); 3429 3345 if (ret < 0) 3430 3346 return ret; 3431 3347 break; ··· 3477 3381 switch (field_type) { 3478 3382 case BPF_SPIN_LOCK: 3479 3383 case BPF_TIMER: 3384 + case BPF_LIST_NODE: 3480 3385 ret = btf_find_struct(btf, var_type, off, sz, field_type, 3481 3386 idx < info_cnt ? &info[idx] : &tmp); 3482 3387 if (ret < 0) ··· 3487 3390 case BPF_KPTR_REF: 3488 3391 ret = btf_find_kptr(btf, var_type, off, sz, 3489 3392 idx < info_cnt ? &info[idx] : &tmp); 3393 + if (ret < 0) 3394 + return ret; 3395 + break; 3396 + case BPF_LIST_HEAD: 3397 + ret = btf_find_list_head(btf, var, var_type, -1, off, sz, 3398 + idx < info_cnt ? 
&info[idx] : &tmp); 3490 3399 if (ret < 0) 3491 3400 return ret; 3492 3401 break; ··· 3594 3491 return ret; 3595 3492 } 3596 3493 3494 + static int btf_parse_list_head(const struct btf *btf, struct btf_field *field, 3495 + struct btf_field_info *info) 3496 + { 3497 + const struct btf_type *t, *n = NULL; 3498 + const struct btf_member *member; 3499 + u32 offset; 3500 + int i; 3501 + 3502 + t = btf_type_by_id(btf, info->list_head.value_btf_id); 3503 + /* We've already checked that value_btf_id is a struct type. We 3504 + * just need to figure out the offset of the list_node, and 3505 + * verify its type. 3506 + */ 3507 + for_each_member(i, t, member) { 3508 + if (strcmp(info->list_head.node_name, __btf_name_by_offset(btf, member->name_off))) 3509 + continue; 3510 + /* Invalid BTF, two members with same name */ 3511 + if (n) 3512 + return -EINVAL; 3513 + n = btf_type_by_id(btf, member->type); 3514 + if (!__btf_type_is_struct(n)) 3515 + return -EINVAL; 3516 + if (strcmp("bpf_list_node", __btf_name_by_offset(btf, n->name_off))) 3517 + return -EINVAL; 3518 + offset = __btf_member_bit_offset(n, member); 3519 + if (offset % 8) 3520 + return -EINVAL; 3521 + offset /= 8; 3522 + if (offset % __alignof__(struct bpf_list_node)) 3523 + return -EINVAL; 3524 + 3525 + field->list_head.btf = (struct btf *)btf; 3526 + field->list_head.value_btf_id = info->list_head.value_btf_id; 3527 + field->list_head.node_offset = offset; 3528 + } 3529 + if (!n) 3530 + return -ENOENT; 3531 + return 0; 3532 + } 3533 + 3597 3534 struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t, 3598 3535 u32 field_mask, u32 value_size) 3599 3536 { 3600 3537 struct btf_field_info info_arr[BTF_FIELDS_MAX]; 3601 3538 struct btf_record *rec; 3539 + u32 next_off = 0; 3602 3540 int ret, i, cnt; 3603 3541 3604 3542 ret = btf_find_field(btf, t, field_mask, info_arr, ARRAY_SIZE(info_arr)); ··· 3649 3505 return NULL; 3650 3506 3651 3507 cnt = ret; 3508 + /* This needs to be kzalloc to zero 
out padding and unused fields, see 3509 + * comment in btf_record_equal. 3510 + */ 3652 3511 rec = kzalloc(offsetof(struct btf_record, fields[cnt]), GFP_KERNEL | __GFP_NOWARN); 3653 3512 if (!rec) 3654 3513 return ERR_PTR(-ENOMEM); ··· 3664 3517 ret = -EFAULT; 3665 3518 goto end; 3666 3519 } 3520 + if (info_arr[i].off < next_off) { 3521 + ret = -EEXIST; 3522 + goto end; 3523 + } 3524 + next_off = info_arr[i].off + btf_field_type_size(info_arr[i].type); 3667 3525 3668 3526 rec->field_mask |= info_arr[i].type; 3669 3527 rec->fields[i].offset = info_arr[i].off; ··· 3691 3539 if (ret < 0) 3692 3540 goto end; 3693 3541 break; 3542 + case BPF_LIST_HEAD: 3543 + ret = btf_parse_list_head(btf, &rec->fields[i], &info_arr[i]); 3544 + if (ret < 0) 3545 + goto end; 3546 + break; 3547 + case BPF_LIST_NODE: 3548 + break; 3694 3549 default: 3695 3550 ret = -EFAULT; 3696 3551 goto end; 3697 3552 } 3698 3553 rec->cnt++; 3699 3554 } 3555 + 3556 + /* bpf_list_head requires bpf_spin_lock */ 3557 + if (btf_record_has_field(rec, BPF_LIST_HEAD) && rec->spin_lock_off < 0) { 3558 + ret = -EINVAL; 3559 + goto end; 3560 + } 3561 + 3700 3562 return rec; 3701 3563 end: 3702 3564 btf_record_free(rec); 3703 3565 return ERR_PTR(ret); 3566 + } 3567 + 3568 + int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec) 3569 + { 3570 + int i; 3571 + 3572 + /* There are two owning types, kptr_ref and bpf_list_head. The former 3573 + * only supports storing kernel types, which can never store references 3574 + * to program allocated local types, atleast not yet. Hence we only need 3575 + * to ensure that bpf_list_head ownership does not form cycles. 
3576 + */ 3577 + if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & BPF_LIST_HEAD)) 3578 + return 0; 3579 + for (i = 0; i < rec->cnt; i++) { 3580 + struct btf_struct_meta *meta; 3581 + u32 btf_id; 3582 + 3583 + if (!(rec->fields[i].type & BPF_LIST_HEAD)) 3584 + continue; 3585 + btf_id = rec->fields[i].list_head.value_btf_id; 3586 + meta = btf_find_struct_meta(btf, btf_id); 3587 + if (!meta) 3588 + return -EFAULT; 3589 + rec->fields[i].list_head.value_rec = meta->record; 3590 + 3591 + if (!(rec->field_mask & BPF_LIST_NODE)) 3592 + continue; 3593 + 3594 + /* We need to ensure ownership acyclicity among all types. The 3595 + * proper way to do it would be to topologically sort all BTF 3596 + * IDs based on the ownership edges, since there can be multiple 3597 + * bpf_list_head in a type. Instead, we use the following 3598 + * reasoning: 3599 + * 3600 + * - A type can only be owned by another type in user BTF if it 3601 + * has a bpf_list_node. 3602 + * - A type can only _own_ another type in user BTF if it has a 3603 + * bpf_list_head. 3604 + * 3605 + * We ensure that if a type has both bpf_list_head and 3606 + * bpf_list_node, its element types cannot be owning types. 3607 + * 3608 + * To ensure acyclicity: 3609 + * 3610 + * When A only has bpf_list_head, ownership chain can be: 3611 + * A -> B -> C 3612 + * Where: 3613 + * - B has both bpf_list_head and bpf_list_node. 3614 + * - C only has bpf_list_node. 3615 + * 3616 + * When A has both bpf_list_head and bpf_list_node, some other 3617 + * type already owns it in the BTF domain, hence it can not own 3618 + * another owning type through any of the bpf_list_head edges. 3619 + * A -> B 3620 + * Where: 3621 + * - B only has bpf_list_node. 
3622 + */ 3623 + if (meta->record->field_mask & BPF_LIST_HEAD) 3624 + return -ELOOP; 3625 + } 3626 + return 0; 3704 3627 } 3705 3628 3706 3629 static int btf_field_offs_cmp(const void *_a, const void *_b, const void *priv) ··· 3811 3584 u8 *sz; 3812 3585 3813 3586 BUILD_BUG_ON(ARRAY_SIZE(foffs->field_off) != ARRAY_SIZE(foffs->field_sz)); 3814 - if (IS_ERR_OR_NULL(rec) || WARN_ON_ONCE(rec->cnt > sizeof(foffs->field_off))) 3587 + if (IS_ERR_OR_NULL(rec)) 3815 3588 return NULL; 3816 3589 3817 3590 foffs = kzalloc(sizeof(*foffs), GFP_KERNEL | __GFP_NOWARN); ··· 4779 4552 nr_args--; 4780 4553 } 4781 4554 4782 - err = 0; 4783 4555 for (i = 0; i < nr_args; i++) { 4784 4556 const struct btf_type *arg_type; 4785 4557 u32 arg_type_id; ··· 4787 4561 arg_type = btf_type_by_id(btf, arg_type_id); 4788 4562 if (!arg_type) { 4789 4563 btf_verifier_log_type(env, t, "Invalid arg#%u", i + 1); 4790 - err = -EINVAL; 4791 - break; 4564 + return -EINVAL; 4565 + } 4566 + 4567 + if (btf_type_is_resolve_source_only(arg_type)) { 4568 + btf_verifier_log_type(env, t, "Invalid arg#%u", i + 1); 4569 + return -EINVAL; 4792 4570 } 4793 4571 4794 4572 if (args[i].name_off && ··· 4800 4570 !btf_name_valid_identifier(btf, args[i].name_off))) { 4801 4571 btf_verifier_log_type(env, t, 4802 4572 "Invalid arg#%u", i + 1); 4803 - err = -EINVAL; 4804 - break; 4573 + return -EINVAL; 4805 4574 } 4806 4575 4807 4576 if (btf_type_needs_resolve(arg_type) && 4808 4577 !env_type_is_resolved(env, arg_type_id)) { 4809 4578 err = btf_resolve(env, arg_type, arg_type_id); 4810 4579 if (err) 4811 - break; 4580 + return err; 4812 4581 } 4813 4582 4814 4583 if (!btf_type_id_size(btf, &arg_type_id, NULL)) { 4815 4584 btf_verifier_log_type(env, t, "Invalid arg#%u", i + 1); 4816 - err = -EINVAL; 4817 - break; 4585 + return -EINVAL; 4818 4586 } 4819 4587 } 4820 4588 4821 - return err; 4589 + return 0; 4822 4590 } 4823 4591 4824 4592 static int btf_func_check(struct btf_verifier_env *env, ··· 5230 5002 return 
btf_check_sec_info(env, btf_data_size); 5231 5003 } 5232 5004 5005 + static const char *alloc_obj_fields[] = { 5006 + "bpf_spin_lock", 5007 + "bpf_list_head", 5008 + "bpf_list_node", 5009 + }; 5010 + 5011 + static struct btf_struct_metas * 5012 + btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf) 5013 + { 5014 + union { 5015 + struct btf_id_set set; 5016 + struct { 5017 + u32 _cnt; 5018 + u32 _ids[ARRAY_SIZE(alloc_obj_fields)]; 5019 + } _arr; 5020 + } aof; 5021 + struct btf_struct_metas *tab = NULL; 5022 + int i, n, id, ret; 5023 + 5024 + BUILD_BUG_ON(offsetof(struct btf_id_set, cnt) != 0); 5025 + BUILD_BUG_ON(sizeof(struct btf_id_set) != sizeof(u32)); 5026 + 5027 + memset(&aof, 0, sizeof(aof)); 5028 + for (i = 0; i < ARRAY_SIZE(alloc_obj_fields); i++) { 5029 + /* Try to find whether this special type exists in user BTF, and 5030 + * if so remember its ID so we can easily find it among members 5031 + * of structs that we iterate in the next loop. 5032 + */ 5033 + id = btf_find_by_name_kind(btf, alloc_obj_fields[i], BTF_KIND_STRUCT); 5034 + if (id < 0) 5035 + continue; 5036 + aof.set.ids[aof.set.cnt++] = id; 5037 + } 5038 + 5039 + if (!aof.set.cnt) 5040 + return NULL; 5041 + sort(&aof.set.ids, aof.set.cnt, sizeof(aof.set.ids[0]), btf_id_cmp_func, NULL); 5042 + 5043 + n = btf_nr_types(btf); 5044 + for (i = 1; i < n; i++) { 5045 + struct btf_struct_metas *new_tab; 5046 + const struct btf_member *member; 5047 + struct btf_field_offs *foffs; 5048 + struct btf_struct_meta *type; 5049 + struct btf_record *record; 5050 + const struct btf_type *t; 5051 + int j, tab_cnt; 5052 + 5053 + t = btf_type_by_id(btf, i); 5054 + if (!t) { 5055 + ret = -EINVAL; 5056 + goto free; 5057 + } 5058 + if (!__btf_type_is_struct(t)) 5059 + continue; 5060 + 5061 + cond_resched(); 5062 + 5063 + for_each_member(j, t, member) { 5064 + if (btf_id_set_contains(&aof.set, member->type)) 5065 + goto parse; 5066 + } 5067 + continue; 5068 + parse: 5069 + tab_cnt = tab ? 
tab->cnt : 0; 5070 + new_tab = krealloc(tab, offsetof(struct btf_struct_metas, types[tab_cnt + 1]), 5071 + GFP_KERNEL | __GFP_NOWARN); 5072 + if (!new_tab) { 5073 + ret = -ENOMEM; 5074 + goto free; 5075 + } 5076 + if (!tab) 5077 + new_tab->cnt = 0; 5078 + tab = new_tab; 5079 + 5080 + type = &tab->types[tab->cnt]; 5081 + type->btf_id = i; 5082 + record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE, t->size); 5083 + /* The record cannot be unset, treat it as an error if so */ 5084 + if (IS_ERR_OR_NULL(record)) { 5085 + ret = PTR_ERR_OR_ZERO(record) ?: -EFAULT; 5086 + goto free; 5087 + } 5088 + foffs = btf_parse_field_offs(record); 5089 + /* We need the field_offs to be valid for a valid record, 5090 + * either both should be set or both should be unset. 5091 + */ 5092 + if (IS_ERR_OR_NULL(foffs)) { 5093 + btf_record_free(record); 5094 + ret = -EFAULT; 5095 + goto free; 5096 + } 5097 + type->record = record; 5098 + type->field_offs = foffs; 5099 + tab->cnt++; 5100 + } 5101 + return tab; 5102 + free: 5103 + btf_struct_metas_free(tab); 5104 + return ERR_PTR(ret); 5105 + } 5106 + 5107 + struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id) 5108 + { 5109 + struct btf_struct_metas *tab; 5110 + 5111 + BUILD_BUG_ON(offsetof(struct btf_struct_meta, btf_id) != 0); 5112 + tab = btf->struct_meta_tab; 5113 + if (!tab) 5114 + return NULL; 5115 + return bsearch(&btf_id, tab->types, tab->cnt, sizeof(tab->types[0]), btf_id_cmp_func); 5116 + } 5117 + 5233 5118 static int btf_check_type_tags(struct btf_verifier_env *env, 5234 5119 struct btf *btf, int start_id) 5235 5120 { ··· 5393 5052 static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size, 5394 5053 u32 log_level, char __user *log_ubuf, u32 log_size) 5395 5054 { 5055 + struct btf_struct_metas *struct_meta_tab; 5396 5056 struct btf_verifier_env *env = NULL; 5397 5057 struct bpf_verifier_log *log; 5398 5058 struct btf *btf = NULL; ··· 5462 5120 if (err) 5463 5121 goto 
errout; 5464 5122 5123 + struct_meta_tab = btf_parse_struct_metas(log, btf); 5124 + if (IS_ERR(struct_meta_tab)) { 5125 + err = PTR_ERR(struct_meta_tab); 5126 + goto errout; 5127 + } 5128 + btf->struct_meta_tab = struct_meta_tab; 5129 + 5130 + if (struct_meta_tab) { 5131 + int i; 5132 + 5133 + for (i = 0; i < struct_meta_tab->cnt; i++) { 5134 + err = btf_check_and_fixup_fields(btf, struct_meta_tab->types[i].record); 5135 + if (err < 0) 5136 + goto errout_meta; 5137 + } 5138 + } 5139 + 5465 5140 if (log->level && bpf_verifier_log_full(log)) { 5466 5141 err = -ENOSPC; 5467 - goto errout; 5142 + goto errout_meta; 5468 5143 } 5469 5144 5470 5145 btf_verifier_env_free(env); 5471 5146 refcount_set(&btf->refcnt, 1); 5472 5147 return btf; 5473 5148 5149 + errout_meta: 5150 + btf_free_struct_meta_tab(btf); 5474 5151 errout: 5475 5152 btf_verifier_env_free(env); 5476 5153 if (btf) ··· 5531 5170 #undef BPF_MAP_TYPE 5532 5171 #undef BPF_LINK_TYPE 5533 5172 5534 - static const struct btf_member * 5173 + const struct btf_member * 5535 5174 btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf, 5536 5175 const struct btf_type *t, enum bpf_prog_type prog_type, 5537 5176 int arg) ··· 5602 5241 return -ENOENT; 5603 5242 kern_ctx_type = prog_ctx_type + 1; 5604 5243 return kern_ctx_type->type; 5244 + } 5245 + 5246 + int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_type) 5247 + { 5248 + const struct btf_member *kctx_member; 5249 + const struct btf_type *conv_struct; 5250 + const struct btf_type *kctx_type; 5251 + u32 kctx_type_id; 5252 + 5253 + conv_struct = bpf_ctx_convert.t; 5254 + /* get member for kernel ctx type */ 5255 + kctx_member = btf_type_member(conv_struct) + bpf_ctx_convert_map[prog_type] * 2 + 1; 5256 + kctx_type_id = kctx_member->type; 5257 + kctx_type = btf_type_by_id(btf_vmlinux, kctx_type_id); 5258 + if (!btf_type_is_struct(kctx_type)) { 5259 + bpf_log(log, "kern ctx type id %u is not a struct\n", kctx_type_id); 5260 + 
return -EINVAL; 5261 + } 5262 + 5263 + return kctx_type_id; 5605 5264 } 5606 5265 5607 5266 BTF_ID_LIST(bpf_ctx_convert_btf_id) ··· 5821 5440 return nr_args + 1; 5822 5441 } 5823 5442 5443 + static bool prog_args_trusted(const struct bpf_prog *prog) 5444 + { 5445 + enum bpf_attach_type atype = prog->expected_attach_type; 5446 + 5447 + switch (prog->type) { 5448 + case BPF_PROG_TYPE_TRACING: 5449 + return atype == BPF_TRACE_RAW_TP || atype == BPF_TRACE_ITER; 5450 + case BPF_PROG_TYPE_LSM: 5451 + case BPF_PROG_TYPE_STRUCT_OPS: 5452 + return true; 5453 + default: 5454 + return false; 5455 + } 5456 + } 5457 + 5824 5458 bool btf_ctx_access(int off, int size, enum bpf_access_type type, 5825 5459 const struct bpf_prog *prog, 5826 5460 struct bpf_insn_access_aux *info) ··· 5979 5583 } 5980 5584 5981 5585 info->reg_type = PTR_TO_BTF_ID; 5586 + if (prog_args_trusted(prog)) 5587 + info->reg_type |= PTR_TRUSTED; 5588 + 5982 5589 if (tgt_prog) { 5983 5590 enum bpf_prog_type tgt_type; 5984 5591 ··· 6248 5849 /* check __percpu tag */ 6249 5850 if (strcmp(tag_value, "percpu") == 0) 6250 5851 tmp_flag = MEM_PERCPU; 5852 + /* check __rcu tag */ 5853 + if (strcmp(tag_value, "rcu") == 0) 5854 + tmp_flag = MEM_RCU; 6251 5855 } 6252 5856 6253 5857 stype = btf_type_skip_modifiers(btf, mtype->type, &id); ··· 6280 5878 return -EINVAL; 6281 5879 } 6282 5880 6283 - int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf, 6284 - const struct btf_type *t, int off, int size, 6285 - enum bpf_access_type atype __maybe_unused, 5881 + int btf_struct_access(struct bpf_verifier_log *log, 5882 + const struct bpf_reg_state *reg, 5883 + int off, int size, enum bpf_access_type atype __maybe_unused, 6286 5884 u32 *next_btf_id, enum bpf_type_flag *flag) 6287 5885 { 5886 + const struct btf *btf = reg->btf; 6288 5887 enum bpf_type_flag tmp_flag = 0; 5888 + const struct btf_type *t; 5889 + u32 id = reg->btf_id; 6289 5890 int err; 6290 - u32 id; 6291 5891 5892 + while 
(type_is_alloc(reg->type)) { 5893 + struct btf_struct_meta *meta; 5894 + struct btf_record *rec; 5895 + int i; 5896 + 5897 + meta = btf_find_struct_meta(btf, id); 5898 + if (!meta) 5899 + break; 5900 + rec = meta->record; 5901 + for (i = 0; i < rec->cnt; i++) { 5902 + struct btf_field *field = &rec->fields[i]; 5903 + u32 offset = field->offset; 5904 + if (off < offset + btf_field_type_size(field->type) && offset < off + size) { 5905 + bpf_log(log, 5906 + "direct access to %s is disallowed\n", 5907 + btf_field_type_name(field->type)); 5908 + return -EACCES; 5909 + } 5910 + } 5911 + break; 5912 + } 5913 + 5914 + t = btf_type_by_id(btf, id); 6292 5915 do { 6293 5916 err = btf_struct_walk(log, btf, t, off, size, &id, &tmp_flag); 6294 5917 6295 5918 switch (err) { 6296 5919 case WALK_PTR: 5920 + /* For local types, the destination register cannot 5921 + * become a pointer again. 5922 + */ 5923 + if (type_is_alloc(reg->type)) 5924 + return SCALAR_VALUE; 6297 5925 /* If we found the pointer or scalar on t+off, 6298 5926 * we're done. 6299 5927 */ ··· 6358 5926 * end up with two different module BTFs, but IDs point to the common type in 6359 5927 * vmlinux BTF. 
6360 5928 */ 6361 - static bool btf_types_are_same(const struct btf *btf1, u32 id1, 6362 - const struct btf *btf2, u32 id2) 5929 + bool btf_types_are_same(const struct btf *btf1, u32 id1, 5930 + const struct btf *btf2, u32 id2) 6363 5931 { 6364 5932 if (id1 != id2) 6365 5933 return false; ··· 6641 6209 return btf_check_func_type_match(log, btf1, t1, btf2, t2); 6642 6210 } 6643 6211 6644 - static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = { 6645 - #ifdef CONFIG_NET 6646 - [PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK], 6647 - [PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON], 6648 - [PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP], 6649 - #endif 6650 - }; 6651 - 6652 - /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */ 6653 - static bool __btf_type_is_scalar_struct(struct bpf_verifier_log *log, 6654 - const struct btf *btf, 6655 - const struct btf_type *t, int rec) 6656 - { 6657 - const struct btf_type *member_type; 6658 - const struct btf_member *member; 6659 - u32 i; 6660 - 6661 - if (!btf_type_is_struct(t)) 6662 - return false; 6663 - 6664 - for_each_member(i, t, member) { 6665 - const struct btf_array *array; 6666 - 6667 - member_type = btf_type_skip_modifiers(btf, member->type, NULL); 6668 - if (btf_type_is_struct(member_type)) { 6669 - if (rec >= 3) { 6670 - bpf_log(log, "max struct nesting depth exceeded\n"); 6671 - return false; 6672 - } 6673 - if (!__btf_type_is_scalar_struct(log, btf, member_type, rec + 1)) 6674 - return false; 6675 - continue; 6676 - } 6677 - if (btf_type_is_array(member_type)) { 6678 - array = btf_type_array(member_type); 6679 - if (!array->nelems) 6680 - return false; 6681 - member_type = btf_type_skip_modifiers(btf, array->type, NULL); 6682 - if (!btf_type_is_scalar(member_type)) 6683 - return false; 6684 - continue; 6685 - } 6686 - if (!btf_type_is_scalar(member_type)) 6687 - return false; 6688 - } 6689 - return true; 6690 - } 6691 - 6692 - static bool is_kfunc_arg_mem_size(const struct 
btf *btf, 6693 - const struct btf_param *arg, 6694 - const struct bpf_reg_state *reg) 6695 - { 6696 - int len, sfx_len = sizeof("__sz") - 1; 6697 - const struct btf_type *t; 6698 - const char *param_name; 6699 - 6700 - t = btf_type_skip_modifiers(btf, arg->type, NULL); 6701 - if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE) 6702 - return false; 6703 - 6704 - /* In the future, this can be ported to use BTF tagging */ 6705 - param_name = btf_name_by_offset(btf, arg->name_off); 6706 - if (str_is_empty(param_name)) 6707 - return false; 6708 - len = strlen(param_name); 6709 - if (len < sfx_len) 6710 - return false; 6711 - param_name += len - sfx_len; 6712 - if (strncmp(param_name, "__sz", sfx_len)) 6713 - return false; 6714 - 6715 - return true; 6716 - } 6717 - 6718 - static bool btf_is_kfunc_arg_mem_size(const struct btf *btf, 6719 - const struct btf_param *arg, 6720 - const struct bpf_reg_state *reg, 6721 - const char *name) 6722 - { 6723 - int len, target_len = strlen(name); 6724 - const struct btf_type *t; 6725 - const char *param_name; 6726 - 6727 - t = btf_type_skip_modifiers(btf, arg->type, NULL); 6728 - if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE) 6729 - return false; 6730 - 6731 - param_name = btf_name_by_offset(btf, arg->name_off); 6732 - if (str_is_empty(param_name)) 6733 - return false; 6734 - len = strlen(param_name); 6735 - if (len != target_len) 6736 - return false; 6737 - if (strcmp(param_name, name)) 6738 - return false; 6739 - 6740 - return true; 6741 - } 6742 - 6743 6212 static int btf_check_func_arg_match(struct bpf_verifier_env *env, 6744 6213 const struct btf *btf, u32 func_id, 6745 6214 struct bpf_reg_state *regs, 6746 6215 bool ptr_to_mem_ok, 6747 - struct bpf_kfunc_arg_meta *kfunc_meta, 6748 6216 bool processing_call) 6749 6217 { 6750 6218 enum bpf_prog_type prog_type = resolve_prog_type(env->prog); 6751 - bool rel = false, kptr_get = false, trusted_args = false; 6752 - bool sleepable = false; 6753 6219 struct 
bpf_verifier_log *log = &env->log; 6754 - u32 i, nargs, ref_id, ref_obj_id = 0; 6755 - bool is_kfunc = btf_is_kernel(btf); 6756 6220 const char *func_name, *ref_tname; 6757 6221 const struct btf_type *t, *ref_t; 6758 6222 const struct btf_param *args; 6759 - int ref_regno = 0, ret; 6223 + u32 i, nargs, ref_id; 6224 + int ret; 6760 6225 6761 6226 t = btf_type_by_id(btf, func_id); 6762 6227 if (!t || !btf_type_is_func(t)) { ··· 6679 6350 return -EINVAL; 6680 6351 } 6681 6352 6682 - if (is_kfunc && kfunc_meta) { 6683 - /* Only kfunc can be release func */ 6684 - rel = kfunc_meta->flags & KF_RELEASE; 6685 - kptr_get = kfunc_meta->flags & KF_KPTR_GET; 6686 - trusted_args = kfunc_meta->flags & KF_TRUSTED_ARGS; 6687 - sleepable = kfunc_meta->flags & KF_SLEEPABLE; 6688 - } 6689 - 6690 6353 /* check that BTF function arguments match actual types that the 6691 6354 * verifier sees. 6692 6355 */ ··· 6686 6365 enum bpf_arg_type arg_type = ARG_DONTCARE; 6687 6366 u32 regno = i + 1; 6688 6367 struct bpf_reg_state *reg = &regs[regno]; 6689 - bool obj_ptr = false; 6690 6368 6691 6369 t = btf_type_skip_modifiers(btf, args[i].type, NULL); 6692 6370 if (btf_type_is_scalar(t)) { 6693 - if (is_kfunc && kfunc_meta) { 6694 - bool is_buf_size = false; 6695 - 6696 - /* check for any const scalar parameter of name "rdonly_buf_size" 6697 - * or "rdwr_buf_size" 6698 - */ 6699 - if (btf_is_kfunc_arg_mem_size(btf, &args[i], reg, 6700 - "rdonly_buf_size")) { 6701 - kfunc_meta->r0_rdonly = true; 6702 - is_buf_size = true; 6703 - } else if (btf_is_kfunc_arg_mem_size(btf, &args[i], reg, 6704 - "rdwr_buf_size")) 6705 - is_buf_size = true; 6706 - 6707 - if (is_buf_size) { 6708 - if (kfunc_meta->r0_size) { 6709 - bpf_log(log, "2 or more rdonly/rdwr_buf_size parameters for kfunc"); 6710 - return -EINVAL; 6711 - } 6712 - 6713 - if (!tnum_is_const(reg->var_off)) { 6714 - bpf_log(log, "R%d is not a const\n", regno); 6715 - return -EINVAL; 6716 - } 6717 - 6718 - kfunc_meta->r0_size = reg->var_off.value; 
6719 - ret = mark_chain_precision(env, regno); 6720 - if (ret) 6721 - return ret; 6722 - } 6723 - } 6724 - 6725 6371 if (reg->type == SCALAR_VALUE) 6726 6372 continue; 6727 6373 bpf_log(log, "R%d is not a scalar\n", regno); ··· 6701 6413 return -EINVAL; 6702 6414 } 6703 6415 6704 - /* These register types have special constraints wrt ref_obj_id 6705 - * and offset checks. The rest of trusted args don't. 6706 - */ 6707 - obj_ptr = reg->type == PTR_TO_CTX || reg->type == PTR_TO_BTF_ID || 6708 - reg2btf_ids[base_type(reg->type)]; 6709 - 6710 - /* Check if argument must be a referenced pointer, args + i has 6711 - * been verified to be a pointer (after skipping modifiers). 6712 - * PTR_TO_CTX is ok without having non-zero ref_obj_id. 6713 - */ 6714 - if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) { 6715 - bpf_log(log, "R%d must be referenced\n", regno); 6716 - return -EINVAL; 6717 - } 6718 - 6719 6416 ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id); 6720 6417 ref_tname = btf_name_by_offset(btf, ref_t->name_off); 6721 6418 6722 - /* Trusted args have the same offset checks as release arguments */ 6723 - if ((trusted_args && obj_ptr) || (rel && reg->ref_obj_id)) 6724 - arg_type |= OBJ_RELEASE; 6725 6419 ret = check_func_arg_reg_off(env, reg, regno, arg_type); 6726 6420 if (ret < 0) 6727 6421 return ret; 6728 6422 6729 - if (is_kfunc && reg->ref_obj_id) { 6730 - /* Ensure only one argument is referenced PTR_TO_BTF_ID */ 6731 - if (ref_obj_id) { 6732 - bpf_log(log, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n", 6733 - regno, reg->ref_obj_id, ref_obj_id); 6734 - return -EFAULT; 6735 - } 6736 - ref_regno = regno; 6737 - ref_obj_id = reg->ref_obj_id; 6738 - } 6739 - 6740 - /* kptr_get is only true for kfunc */ 6741 - if (i == 0 && kptr_get) { 6742 - struct btf_field *kptr_field; 6743 - 6744 - if (reg->type != PTR_TO_MAP_VALUE) { 6745 - bpf_log(log, "arg#0 expected pointer to map value\n"); 6746 - 
return -EINVAL; 6747 - } 6748 - 6749 - /* check_func_arg_reg_off allows var_off for 6750 - * PTR_TO_MAP_VALUE, but we need fixed offset to find 6751 - * off_desc. 6752 - */ 6753 - if (!tnum_is_const(reg->var_off)) { 6754 - bpf_log(log, "arg#0 must have constant offset\n"); 6755 - return -EINVAL; 6756 - } 6757 - 6758 - kptr_field = btf_record_find(reg->map_ptr->record, reg->off + reg->var_off.value, BPF_KPTR); 6759 - if (!kptr_field || kptr_field->type != BPF_KPTR_REF) { 6760 - bpf_log(log, "arg#0 no referenced kptr at map value offset=%llu\n", 6761 - reg->off + reg->var_off.value); 6762 - return -EINVAL; 6763 - } 6764 - 6765 - if (!btf_type_is_ptr(ref_t)) { 6766 - bpf_log(log, "arg#0 BTF type must be a double pointer\n"); 6767 - return -EINVAL; 6768 - } 6769 - 6770 - ref_t = btf_type_skip_modifiers(btf, ref_t->type, &ref_id); 6771 - ref_tname = btf_name_by_offset(btf, ref_t->name_off); 6772 - 6773 - if (!btf_type_is_struct(ref_t)) { 6774 - bpf_log(log, "kernel function %s args#%d pointer type %s %s is not supported\n", 6775 - func_name, i, btf_type_str(ref_t), ref_tname); 6776 - return -EINVAL; 6777 - } 6778 - if (!btf_struct_ids_match(log, btf, ref_id, 0, kptr_field->kptr.btf, 6779 - kptr_field->kptr.btf_id, true)) { 6780 - bpf_log(log, "kernel function %s args#%d expected pointer to %s %s\n", 6781 - func_name, i, btf_type_str(ref_t), ref_tname); 6782 - return -EINVAL; 6783 - } 6784 - /* rest of the arguments can be anything, like normal kfunc */ 6785 - } else if (btf_get_prog_ctx_type(log, btf, t, prog_type, i)) { 6423 + if (btf_get_prog_ctx_type(log, btf, t, prog_type, i)) { 6786 6424 /* If function expects ctx type in BTF check that caller 6787 6425 * is passing PTR_TO_CTX. 
6788 6426 */ ··· 6718 6504 i, btf_type_str(t)); 6719 6505 return -EINVAL; 6720 6506 } 6721 - } else if (is_kfunc && (reg->type == PTR_TO_BTF_ID || 6722 - (reg2btf_ids[base_type(reg->type)] && !type_flag(reg->type)))) { 6723 - const struct btf_type *reg_ref_t; 6724 - const struct btf *reg_btf; 6725 - const char *reg_ref_tname; 6726 - u32 reg_ref_id; 6727 - 6728 - if (!btf_type_is_struct(ref_t)) { 6729 - bpf_log(log, "kernel function %s args#%d pointer type %s %s is not supported\n", 6730 - func_name, i, btf_type_str(ref_t), 6731 - ref_tname); 6732 - return -EINVAL; 6733 - } 6734 - 6735 - if (reg->type == PTR_TO_BTF_ID) { 6736 - reg_btf = reg->btf; 6737 - reg_ref_id = reg->btf_id; 6738 - } else { 6739 - reg_btf = btf_vmlinux; 6740 - reg_ref_id = *reg2btf_ids[base_type(reg->type)]; 6741 - } 6742 - 6743 - reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id, 6744 - &reg_ref_id); 6745 - reg_ref_tname = btf_name_by_offset(reg_btf, 6746 - reg_ref_t->name_off); 6747 - if (!btf_struct_ids_match(log, reg_btf, reg_ref_id, 6748 - reg->off, btf, ref_id, 6749 - trusted_args || (rel && reg->ref_obj_id))) { 6750 - bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n", 6751 - func_name, i, 6752 - btf_type_str(ref_t), ref_tname, 6753 - regno, btf_type_str(reg_ref_t), 6754 - reg_ref_tname); 6755 - return -EINVAL; 6756 - } 6757 6507 } else if (ptr_to_mem_ok && processing_call) { 6758 6508 const struct btf_type *resolve_ret; 6759 6509 u32 type_size; 6760 - 6761 - if (is_kfunc) { 6762 - bool arg_mem_size = i + 1 < nargs && is_kfunc_arg_mem_size(btf, &args[i + 1], &regs[regno + 1]); 6763 - bool arg_dynptr = btf_type_is_struct(ref_t) && 6764 - !strcmp(ref_tname, 6765 - stringify_struct(bpf_dynptr_kern)); 6766 - 6767 - /* Permit pointer to mem, but only when argument 6768 - * type is pointer to scalar, or struct composed 6769 - * (recursively) of scalars. 6770 - * When arg_mem_size is true, the pointer can be 6771 - * void *. 
6772 - * Also permit initialized local dynamic pointers. 6773 - */ 6774 - if (!btf_type_is_scalar(ref_t) && 6775 - !__btf_type_is_scalar_struct(log, btf, ref_t, 0) && 6776 - !arg_dynptr && 6777 - (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) { 6778 - bpf_log(log, 6779 - "arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n", 6780 - i, btf_type_str(ref_t), ref_tname, arg_mem_size ? "void, " : ""); 6781 - return -EINVAL; 6782 - } 6783 - 6784 - if (arg_dynptr) { 6785 - if (reg->type != PTR_TO_STACK) { 6786 - bpf_log(log, "arg#%d pointer type %s %s not to stack\n", 6787 - i, btf_type_str(ref_t), 6788 - ref_tname); 6789 - return -EINVAL; 6790 - } 6791 - 6792 - if (!is_dynptr_reg_valid_init(env, reg)) { 6793 - bpf_log(log, 6794 - "arg#%d pointer type %s %s must be valid and initialized\n", 6795 - i, btf_type_str(ref_t), 6796 - ref_tname); 6797 - return -EINVAL; 6798 - } 6799 - 6800 - if (!is_dynptr_type_expected(env, reg, 6801 - ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL)) { 6802 - bpf_log(log, 6803 - "arg#%d pointer type %s %s points to unsupported dynamic pointer type\n", 6804 - i, btf_type_str(ref_t), 6805 - ref_tname); 6806 - return -EINVAL; 6807 - } 6808 - 6809 - continue; 6810 - } 6811 - 6812 - /* Check for mem, len pair */ 6813 - if (arg_mem_size) { 6814 - if (check_kfunc_mem_size_reg(env, &regs[regno + 1], regno + 1)) { 6815 - bpf_log(log, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", 6816 - i, i + 1); 6817 - return -EINVAL; 6818 - } 6819 - i++; 6820 - continue; 6821 - } 6822 - } 6823 6510 6824 6511 resolve_ret = btf_resolve_size(btf, ref_t, &type_size); 6825 6512 if (IS_ERR(resolve_ret)) { ··· 6734 6619 if (check_mem_reg(env, reg, regno, type_size)) 6735 6620 return -EINVAL; 6736 6621 } else { 6737 - bpf_log(log, "reg type unsupported for arg#%d %sfunction %s#%d\n", i, 6738 - is_kfunc ? 
"kernel " : "", func_name, func_id); 6622 + bpf_log(log, "reg type unsupported for arg#%d function %s#%d\n", i, 6623 + func_name, func_id); 6739 6624 return -EINVAL; 6740 6625 } 6741 6626 } 6742 6627 6743 - /* Either both are set, or neither */ 6744 - WARN_ON_ONCE((ref_obj_id && !ref_regno) || (!ref_obj_id && ref_regno)); 6745 - /* We already made sure ref_obj_id is set only for one argument. We do 6746 - * allow (!rel && ref_obj_id), so that passing such referenced 6747 - * PTR_TO_BTF_ID to other kfuncs works. Note that rel is only true when 6748 - * is_kfunc is true. 6749 - */ 6750 - if (rel && !ref_obj_id) { 6751 - bpf_log(log, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n", 6752 - func_name); 6753 - return -EINVAL; 6754 - } 6755 - 6756 - if (sleepable && !env->prog->aux->sleepable) { 6757 - bpf_log(log, "kernel function %s is sleepable but the program is not\n", 6758 - func_name); 6759 - return -EINVAL; 6760 - } 6761 - 6762 - if (kfunc_meta && ref_obj_id) 6763 - kfunc_meta->ref_obj_id = ref_obj_id; 6764 - 6765 - /* returns argument register number > 0 in case of reference release kfunc */ 6766 - return rel ? ref_regno : 0; 6628 + return 0; 6767 6629 } 6768 6630 6769 6631 /* Compare BTF of a function declaration with given bpf_reg_state. ··· 6770 6678 return -EINVAL; 6771 6679 6772 6680 is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL; 6773 - err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, NULL, false); 6681 + err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, false); 6774 6682 6775 6683 /* Compiler optimizations can remove arguments from static functions 6776 6684 * or mismatched type can be passed into a global function. 
··· 6813 6721 return -EINVAL; 6814 6722 6815 6723 is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL; 6816 - err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, NULL, true); 6724 + err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, true); 6817 6725 6818 6726 /* Compiler optimizations can remove arguments from static functions 6819 6727 * or mismatched type can be passed into a global function. ··· 6822 6730 if (err) 6823 6731 prog->aux->func_info_aux[subprog].unreliable = true; 6824 6732 return err; 6825 - } 6826 - 6827 - int btf_check_kfunc_arg_match(struct bpf_verifier_env *env, 6828 - const struct btf *btf, u32 func_id, 6829 - struct bpf_reg_state *regs, 6830 - struct bpf_kfunc_arg_meta *meta) 6831 - { 6832 - return btf_check_func_arg_match(env, btf, func_id, regs, true, meta, true); 6833 6733 } 6834 6734 6835 6735 /* Convert BTF of a function into bpf_reg_state if possible ··· 7206 7122 return btf->kernel_btf && strcmp(btf->name, "vmlinux") != 0; 7207 7123 } 7208 7124 7209 - static int btf_id_cmp_func(const void *a, const void *b) 7210 - { 7211 - const int *pa = a, *pb = b; 7212 - 7213 - return *pa - *pb; 7214 - } 7215 - 7216 - bool btf_id_set_contains(const struct btf_id_set *set, u32 id) 7217 - { 7218 - return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL; 7219 - } 7220 - 7221 - static void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id) 7222 - { 7223 - return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func); 7224 - } 7225 - 7226 7125 enum { 7227 7126 BTF_MODULE_F_LIVE = (1 << 0), 7228 7127 }; ··· 7566 7499 static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type) 7567 7500 { 7568 7501 switch (prog_type) { 7502 + case BPF_PROG_TYPE_UNSPEC: 7503 + return BTF_KFUNC_HOOK_COMMON; 7569 7504 case BPF_PROG_TYPE_XDP: 7570 7505 return BTF_KFUNC_HOOK_XDP; 7571 7506 case BPF_PROG_TYPE_SCHED_CLS: ··· 7596 7527 u32 kfunc_btf_id) 7597 7528 { 7598 
7529 enum btf_kfunc_hook hook; 7530 + u32 *kfunc_flags; 7531 + 7532 + kfunc_flags = __btf_kfunc_id_set_contains(btf, BTF_KFUNC_HOOK_COMMON, kfunc_btf_id); 7533 + if (kfunc_flags) 7534 + return kfunc_flags; 7599 7535 7600 7536 hook = bpf_prog_type_to_kfunc_hook(prog_type); 7601 7537 return __btf_kfunc_id_set_contains(btf, hook, kfunc_btf_id);
+14
kernel/bpf/cgroup_iter.c
··· 164 164 struct cgroup_iter_priv *p = (struct cgroup_iter_priv *)priv; 165 165 struct cgroup *cgrp = aux->cgroup.start; 166 166 167 + /* bpf_iter_attach_cgroup() has already acquired an extra reference 168 + * for the start cgroup, but the reference may be released after 169 + * cgroup_iter_seq_init(), so acquire another reference for the 170 + * start cgroup. 171 + */ 167 172 p->start_css = &cgrp->self; 173 + css_get(p->start_css); 168 174 p->terminate = false; 169 175 p->visited_all = false; 170 176 p->order = aux->cgroup.order; 171 177 return 0; 172 178 } 173 179 180 + static void cgroup_iter_seq_fini(void *priv) 181 + { 182 + struct cgroup_iter_priv *p = (struct cgroup_iter_priv *)priv; 183 + 184 + css_put(p->start_css); 185 + } 186 + 174 187 static const struct bpf_iter_seq_info cgroup_iter_seq_info = { 175 188 .seq_ops = &cgroup_iter_seq_ops, 176 189 .init_seq_private = cgroup_iter_seq_init, 190 + .fini_seq_private = cgroup_iter_seq_fini, 177 191 .seq_priv_size = sizeof(struct cgroup_iter_priv), 178 192 }; 179 193
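The init/fini pairing added above follows a common lifetime pattern: the iterator takes its own reference in `init_seq_private` so it no longer depends on the attach-time reference, and drops it in `fini_seq_private`. A minimal userspace sketch of that pattern, with a plain C11 atomic counter standing in for `css_get()`/`css_put()` (all names here are illustrative, not kernel API):

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative stand-in for a refcounted object such as a css. */
struct obj { atomic_int refs; };

static void obj_get(struct obj *o) { atomic_fetch_add(&o->refs, 1); }
static int  obj_put(struct obj *o) { return atomic_fetch_sub(&o->refs, 1) - 1; }

struct iter_priv { struct obj *start; };

/* seq_init: take our own reference, so the attach-time reference may be
 * released before the iterator is done with the object. */
static void iter_seq_init(struct iter_priv *p, struct obj *start)
{
	p->start = start;
	obj_get(p->start);
}

/* seq_fini: drop the reference taken in init. */
static void iter_seq_fini(struct iter_priv *p)
{
	obj_put(p->start);
}
```

The point is that the two references have independent owners: dropping the attach-time reference early leaves the iterator's copy intact.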
+16
kernel/bpf/core.c
··· 34 34 #include <linux/log2.h> 35 35 #include <linux/bpf_verifier.h> 36 36 #include <linux/nodemask.h> 37 + #include <linux/bpf_mem_alloc.h> 37 38 38 39 #include <asm/barrier.h> 39 40 #include <asm/unaligned.h> ··· 60 59 #define ARG1 regs[BPF_REG_ARG1] 61 60 #define CTX regs[BPF_REG_CTX] 62 61 #define IMM insn->imm 62 + 63 + struct bpf_mem_alloc bpf_global_ma; 64 + bool bpf_global_ma_set; 63 65 64 66 /* No hurry in this branch 65 67 * ··· 2749 2745 { 2750 2746 return -ENOTSUPP; 2751 2747 } 2748 + 2749 + #ifdef CONFIG_BPF_SYSCALL 2750 + static int __init bpf_global_ma_init(void) 2751 + { 2752 + int ret; 2753 + 2754 + ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false); 2755 + bpf_global_ma_set = !ret; 2756 + return ret; 2757 + } 2758 + late_initcall(bpf_global_ma_init); 2759 + #endif 2752 2760 2753 2761 DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key); 2754 2762 EXPORT_SYMBOL(bpf_stats_enabled_key);
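The `bpf_global_ma_init()` initcall above records whether setup succeeded in `bpf_global_ma_set`, so later users can test a flag rather than probe the allocator. A userspace sketch of that init-once-with-flag shape (names are illustrative stand-ins, not the kernel symbols):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-ins for bpf_global_ma / bpf_global_ma_set. */
static void *global_pool;
static bool global_pool_set;

/* Mirrors the shape of bpf_global_ma_init(): remember success in a flag
 * so callers can cheaply check availability later. */
static int global_pool_init(size_t size)
{
	global_pool = malloc(size);
	global_pool_set = (global_pool != NULL);
	return global_pool_set ? 0 : -1;
}

static void *global_pool_base(void)
{
	return global_pool_set ? global_pool : NULL;
}
```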
+2 -2
kernel/bpf/cpumap.c
··· 667 667 return 0; 668 668 } 669 669 670 - static int cpu_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags) 670 + static int cpu_map_redirect(struct bpf_map *map, u64 index, u64 flags) 671 671 { 672 - return __bpf_xdp_redirect_map(map, ifindex, flags, 0, 672 + return __bpf_xdp_redirect_map(map, index, flags, 0, 673 673 __cpu_map_lookup_elem); 674 674 } 675 675
+2 -2
kernel/bpf/devmap.c
··· 992 992 map, key, value, map_flags); 993 993 } 994 994 995 - static int dev_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags) 995 + static int dev_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags) 996 996 { 997 997 return __bpf_xdp_redirect_map(map, ifindex, flags, 998 998 BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS, 999 999 __dev_map_lookup_elem); 1000 1000 } 1001 1001 1002 - static int dev_hash_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags) 1002 + static int dev_hash_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags) 1003 1003 { 1004 1004 return __bpf_xdp_redirect_map(map, ifindex, flags, 1005 1005 BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS,
-1
kernel/bpf/hashtab.c
··· 1511 1511 prealloc_destroy(htab); 1512 1512 } 1513 1513 1514 - bpf_map_free_record(map); 1515 1514 free_percpu(htab->extra_elems); 1516 1515 bpf_map_area_free(htab->buckets); 1517 1516 bpf_mem_alloc_destroy(&htab->pcpu_ma);
+358 -5
kernel/bpf/helpers.c
··· 4 4 #include <linux/bpf.h> 5 5 #include <linux/btf.h> 6 6 #include <linux/bpf-cgroup.h> 7 + #include <linux/cgroup.h> 7 8 #include <linux/rcupdate.h> 8 9 #include <linux/random.h> 9 10 #include <linux/smp.h> ··· 20 19 #include <linux/proc_ns.h> 21 20 #include <linux/security.h> 22 21 #include <linux/btf_ids.h> 22 + #include <linux/bpf_mem_alloc.h> 23 23 24 24 #include "../../lib/kstrtox.h" 25 25 ··· 338 336 .gpl_only = false, 339 337 .ret_type = RET_VOID, 340 338 .arg1_type = ARG_PTR_TO_SPIN_LOCK, 339 + .arg1_btf_id = BPF_PTR_POISON, 341 340 }; 342 341 343 342 static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock) ··· 361 358 .gpl_only = false, 362 359 .ret_type = RET_VOID, 363 360 .arg1_type = ARG_PTR_TO_SPIN_LOCK, 361 + .arg1_btf_id = BPF_PTR_POISON, 364 362 }; 365 363 366 364 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src, ··· 661 657 const struct bpf_func_proto bpf_copy_from_user_proto = { 662 658 .func = bpf_copy_from_user, 663 659 .gpl_only = false, 660 + .might_sleep = true, 664 661 .ret_type = RET_INTEGER, 665 662 .arg1_type = ARG_PTR_TO_UNINIT_MEM, 666 663 .arg2_type = ARG_CONST_SIZE_OR_ZERO, ··· 692 687 const struct bpf_func_proto bpf_copy_from_user_task_proto = { 693 688 .func = bpf_copy_from_user_task, 694 689 .gpl_only = true, 690 + .might_sleep = true, 695 691 .ret_type = RET_INTEGER, 696 692 .arg1_type = ARG_PTR_TO_UNINIT_MEM, 697 693 .arg2_type = ARG_CONST_SIZE_OR_ZERO, ··· 1712 1706 } 1713 1707 } 1714 1708 1715 - BTF_SET8_START(tracing_btf_ids) 1709 + void bpf_list_head_free(const struct btf_field *field, void *list_head, 1710 + struct bpf_spin_lock *spin_lock) 1711 + { 1712 + struct list_head *head = list_head, *orig_head = list_head; 1713 + 1714 + BUILD_BUG_ON(sizeof(struct list_head) > sizeof(struct bpf_list_head)); 1715 + BUILD_BUG_ON(__alignof__(struct list_head) > __alignof__(struct bpf_list_head)); 1716 + 1717 + /* Do the actual list draining outside the lock to not hold the lock for 1718 + * 
too long, and also prevent deadlocks if tracing programs end up 1719 + * executing on entry/exit of functions called inside the critical 1720 + * section, and end up doing map ops that call bpf_list_head_free for 1721 + * the same map value again. 1722 + */ 1723 + __bpf_spin_lock_irqsave(spin_lock); 1724 + if (!head->next || list_empty(head)) 1725 + goto unlock; 1726 + head = head->next; 1727 + unlock: 1728 + INIT_LIST_HEAD(orig_head); 1729 + __bpf_spin_unlock_irqrestore(spin_lock); 1730 + 1731 + while (head != orig_head) { 1732 + void *obj = head; 1733 + 1734 + obj -= field->list_head.node_offset; 1735 + head = head->next; 1736 + /* The contained type can also have resources, including a 1737 + * bpf_list_head which needs to be freed. 1738 + */ 1739 + bpf_obj_free_fields(field->list_head.value_rec, obj); 1740 + /* bpf_mem_free requires migrate_disable(), since we can be 1741 + * called from map free path as well apart from BPF program (as 1742 + * part of map ops doing bpf_obj_free_fields). 
1743 + */ 1744 + migrate_disable(); 1745 + bpf_mem_free(&bpf_global_ma, obj); 1746 + migrate_enable(); 1747 + } 1748 + } 1749 + 1750 + __diag_push(); 1751 + __diag_ignore_all("-Wmissing-prototypes", 1752 + "Global functions as their definitions will be in vmlinux BTF"); 1753 + 1754 + void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign) 1755 + { 1756 + struct btf_struct_meta *meta = meta__ign; 1757 + u64 size = local_type_id__k; 1758 + void *p; 1759 + 1760 + p = bpf_mem_alloc(&bpf_global_ma, size); 1761 + if (!p) 1762 + return NULL; 1763 + if (meta) 1764 + bpf_obj_init(meta->field_offs, p); 1765 + return p; 1766 + } 1767 + 1768 + void bpf_obj_drop_impl(void *p__alloc, void *meta__ign) 1769 + { 1770 + struct btf_struct_meta *meta = meta__ign; 1771 + void *p = p__alloc; 1772 + 1773 + if (meta) 1774 + bpf_obj_free_fields(meta->record, p); 1775 + bpf_mem_free(&bpf_global_ma, p); 1776 + } 1777 + 1778 + static void __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head, bool tail) 1779 + { 1780 + struct list_head *n = (void *)node, *h = (void *)head; 1781 + 1782 + if (unlikely(!h->next)) 1783 + INIT_LIST_HEAD(h); 1784 + if (unlikely(!n->next)) 1785 + INIT_LIST_HEAD(n); 1786 + tail ? list_add_tail(n, h) : list_add(n, h); 1787 + } 1788 + 1789 + void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node) 1790 + { 1791 + return __bpf_list_add(node, head, false); 1792 + } 1793 + 1794 + void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node) 1795 + { 1796 + return __bpf_list_add(node, head, true); 1797 + } 1798 + 1799 + static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tail) 1800 + { 1801 + struct list_head *n, *h = (void *)head; 1802 + 1803 + if (unlikely(!h->next)) 1804 + INIT_LIST_HEAD(h); 1805 + if (list_empty(h)) 1806 + return NULL; 1807 + n = tail ? 
h->prev : h->next; 1808 + list_del_init(n); 1809 + return (struct bpf_list_node *)n; 1810 + } 1811 + 1812 + struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) 1813 + { 1814 + return __bpf_list_del(head, false); 1815 + } 1816 + 1817 + struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) 1818 + { 1819 + return __bpf_list_del(head, true); 1820 + } 1821 + 1822 + /** 1823 + * bpf_task_acquire - Acquire a reference to a task. A task acquired by this 1824 + * kfunc which is not stored in a map as a kptr, must be released by calling 1825 + * bpf_task_release(). 1826 + * @p: The task on which a reference is being acquired. 1827 + */ 1828 + struct task_struct *bpf_task_acquire(struct task_struct *p) 1829 + { 1830 + refcount_inc(&p->rcu_users); 1831 + return p; 1832 + } 1833 + 1834 + /** 1835 + * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task 1836 + * kptr acquired by this kfunc which is not subsequently stored in a map, must 1837 + * be released by calling bpf_task_release(). 1838 + * @pp: A pointer to a task kptr on which a reference is being acquired. 1839 + */ 1840 + struct task_struct *bpf_task_kptr_get(struct task_struct **pp) 1841 + { 1842 + struct task_struct *p; 1843 + 1844 + rcu_read_lock(); 1845 + p = READ_ONCE(*pp); 1846 + 1847 + /* Another context could remove the task from the map and release it at 1848 + * any time, including after we've done the lookup above. This is safe 1849 + * because we're in an RCU read region, so the task is guaranteed to 1850 + * remain valid until at least the rcu_read_unlock() below. 1851 + */ 1852 + if (p && !refcount_inc_not_zero(&p->rcu_users)) 1853 + /* If the task had been removed from the map and freed as 1854 + * described above, refcount_inc_not_zero() will return false. 1855 + * The task will be freed at some point after the current RCU 1856 + * gp has ended, so just return NULL to the user. 
1857 + */ 1858 + p = NULL; 1859 + rcu_read_unlock(); 1860 + 1861 + return p; 1862 + } 1863 + 1864 + /** 1865 + * bpf_task_release - Release the reference acquired on a struct task_struct *. 1866 + * If this kfunc is invoked in an RCU read region, the task_struct is 1867 + * guaranteed to not be freed until the current grace period has ended, even if 1868 + * its refcount drops to 0. 1869 + * @p: The task on which a reference is being released. 1870 + */ 1871 + void bpf_task_release(struct task_struct *p) 1872 + { 1873 + if (!p) 1874 + return; 1875 + 1876 + put_task_struct_rcu_user(p); 1877 + } 1878 + 1879 + #ifdef CONFIG_CGROUPS 1880 + /** 1881 + * bpf_cgroup_acquire - Acquire a reference to a cgroup. A cgroup acquired by 1882 + * this kfunc which is not stored in a map as a kptr, must be released by 1883 + * calling bpf_cgroup_release(). 1884 + * @cgrp: The cgroup on which a reference is being acquired. 1885 + */ 1886 + struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp) 1887 + { 1888 + cgroup_get(cgrp); 1889 + return cgrp; 1890 + } 1891 + 1892 + /** 1893 + * bpf_cgroup_kptr_get - Acquire a reference on a struct cgroup kptr. A cgroup 1894 + * kptr acquired by this kfunc which is not subsequently stored in a map, must 1895 + * be released by calling bpf_cgroup_release(). 1896 + * @cgrpp: A pointer to a cgroup kptr on which a reference is being acquired. 1897 + */ 1898 + struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp) 1899 + { 1900 + struct cgroup *cgrp; 1901 + 1902 + rcu_read_lock(); 1903 + /* Another context could remove the cgroup from the map and release it 1904 + * at any time, including after we've done the lookup above. This is 1905 + * safe because we're in an RCU read region, so the cgroup is 1906 + * guaranteed to remain valid until at least the rcu_read_unlock() 1907 + * below. 
1908 + */ 1909 + cgrp = READ_ONCE(*cgrpp); 1910 + 1911 + if (cgrp && !cgroup_tryget(cgrp)) 1912 + /* If the cgroup had been removed from the map and freed as 1913 + * described above, cgroup_tryget() will return false. The 1914 + * cgroup will be freed at some point after the current RCU gp 1915 + * has ended, so just return NULL to the user. 1916 + */ 1917 + cgrp = NULL; 1918 + rcu_read_unlock(); 1919 + 1920 + return cgrp; 1921 + } 1922 + 1923 + /** 1924 + * bpf_cgroup_release - Release the reference acquired on a struct cgroup *. 1925 + * If this kfunc is invoked in an RCU read region, the cgroup is guaranteed to 1926 + * not be freed until the current grace period has ended, even if its refcount 1927 + * drops to 0. 1928 + * @cgrp: The cgroup on which a reference is being released. 1929 + */ 1930 + void bpf_cgroup_release(struct cgroup *cgrp) 1931 + { 1932 + if (!cgrp) 1933 + return; 1934 + 1935 + cgroup_put(cgrp); 1936 + } 1937 + 1938 + /** 1939 + * bpf_cgroup_ancestor - Perform a lookup on an entry in a cgroup's ancestor 1940 + * array. A cgroup returned by this kfunc which is not subsequently stored in a 1941 + * map, must be released by calling bpf_cgroup_release(). 1942 + * @cgrp: The cgroup for which we're performing a lookup. 1943 + * @level: The level of ancestor to look up. 1944 + */ 1945 + struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level) 1946 + { 1947 + struct cgroup *ancestor; 1948 + 1949 + if (level > cgrp->level || level < 0) 1950 + return NULL; 1951 + 1952 + ancestor = cgrp->ancestors[level]; 1953 + cgroup_get(ancestor); 1954 + return ancestor; 1955 + } 1956 + #endif /* CONFIG_CGROUPS */ 1957 + 1958 + /** 1959 + * bpf_task_from_pid - Find a struct task_struct from its pid by looking it up 1960 + * in the root pid namespace idr. If a task is returned, it must either be 1961 + * stored in a map, or released with bpf_task_release(). 1962 + * @pid: The pid of the task being looked up. 
1963 + */ 1964 + struct task_struct *bpf_task_from_pid(s32 pid) 1965 + { 1966 + struct task_struct *p; 1967 + 1968 + rcu_read_lock(); 1969 + p = find_task_by_pid_ns(pid, &init_pid_ns); 1970 + if (p) 1971 + bpf_task_acquire(p); 1972 + rcu_read_unlock(); 1973 + 1974 + return p; 1975 + } 1976 + 1977 + void *bpf_cast_to_kern_ctx(void *obj) 1978 + { 1979 + return obj; 1980 + } 1981 + 1982 + void *bpf_rdonly_cast(void *obj__ign, u32 btf_id__k) 1983 + { 1984 + return obj__ign; 1985 + } 1986 + 1987 + void bpf_rcu_read_lock(void) 1988 + { 1989 + rcu_read_lock(); 1990 + } 1991 + 1992 + void bpf_rcu_read_unlock(void) 1993 + { 1994 + rcu_read_unlock(); 1995 + } 1996 + 1997 + __diag_pop(); 1998 + 1999 + BTF_SET8_START(generic_btf_ids) 1716 2000 #ifdef CONFIG_KEXEC_CORE 1717 2001 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE) 1718 2002 #endif 1719 - BTF_SET8_END(tracing_btf_ids) 2003 + BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL) 2004 + BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE) 2005 + BTF_ID_FLAGS(func, bpf_list_push_front) 2006 + BTF_ID_FLAGS(func, bpf_list_push_back) 2007 + BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL) 2008 + BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL) 2009 + BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) 2010 + BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL) 2011 + BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE) 2012 + #ifdef CONFIG_CGROUPS 2013 + BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) 2014 + BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL) 2015 + BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE) 2016 + BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL) 2017 + #endif 2018 + BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL) 2019 + BTF_SET8_END(generic_btf_ids) 1720 2020 1721 - static const struct btf_kfunc_id_set tracing_kfunc_set = { 
2021 + static const struct btf_kfunc_id_set generic_kfunc_set = { 1722 2022 .owner = THIS_MODULE, 1723 - .set = &tracing_btf_ids, 2023 + .set = &generic_btf_ids, 2024 + }; 2025 + 2026 + 2027 + BTF_ID_LIST(generic_dtor_ids) 2028 + BTF_ID(struct, task_struct) 2029 + BTF_ID(func, bpf_task_release) 2030 + #ifdef CONFIG_CGROUPS 2031 + BTF_ID(struct, cgroup) 2032 + BTF_ID(func, bpf_cgroup_release) 2033 + #endif 2034 + 2035 + BTF_SET8_START(common_btf_ids) 2036 + BTF_ID_FLAGS(func, bpf_cast_to_kern_ctx) 2037 + BTF_ID_FLAGS(func, bpf_rdonly_cast) 2038 + BTF_ID_FLAGS(func, bpf_rcu_read_lock) 2039 + BTF_ID_FLAGS(func, bpf_rcu_read_unlock) 2040 + BTF_SET8_END(common_btf_ids) 2041 + 2042 + static const struct btf_kfunc_id_set common_kfunc_set = { 2043 + .owner = THIS_MODULE, 2044 + .set = &common_btf_ids, 1724 2045 }; 1725 2046 1726 2047 static int __init kfunc_init(void) 1727 2048 { 1728 - return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &tracing_kfunc_set); 2049 + int ret; 2050 + const struct btf_id_dtor_kfunc generic_dtors[] = { 2051 + { 2052 + .btf_id = generic_dtor_ids[0], 2053 + .kfunc_btf_id = generic_dtor_ids[1] 2054 + }, 2055 + #ifdef CONFIG_CGROUPS 2056 + { 2057 + .btf_id = generic_dtor_ids[2], 2058 + .kfunc_btf_id = generic_dtor_ids[3] 2059 + }, 2060 + #endif 2061 + }; 2062 + 2063 + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set); 2064 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set); 2065 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &generic_kfunc_set); 2066 + ret = ret ?: register_btf_id_dtor_kfuncs(generic_dtors, 2067 + ARRAY_SIZE(generic_dtors), 2068 + THIS_MODULE); 2069 + return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &common_kfunc_set); 1729 2070 } 1730 2071 1731 2072 late_initcall(kfunc_init);
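The `__bpf_list_add()`/`__bpf_list_del()` helpers above build a circular doubly-linked list out of zeroed map-value memory: a head whose `next` is still NULL is lazily initialized on first use, and pops use `list_del_init()` semantics so a removed node is safe to reinsert. A self-contained userspace sketch of the same shape (node-side lazy init omitted for brevity; all names illustrative, not kernel API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal intrusive circular list, mirroring the lazy-init checks like
 * `if (unlikely(!h->next)) INIT_LIST_HEAD(h)` above. */
struct node { struct node *next, *prev; };

static void list_init(struct node *h) { h->next = h->prev = h; }

static void list_push(struct node *h, struct node *n, bool tail)
{
	struct node *at;

	if (!h->next)
		list_init(h);        /* lazy init: map values start zeroed */
	at = tail ? h->prev : h;     /* insert before head == push_back */
	n->next = at->next;
	n->prev = at;
	at->next->prev = n;
	at->next = n;
}

static struct node *list_pop(struct node *h, bool tail)
{
	struct node *n;

	if (!h->next)
		list_init(h);
	if (h->next == h)
		return NULL;                 /* empty */
	n = tail ? h->prev : h->next;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);                        /* list_del_init() semantics */
	return n;
}
```

With this layout, push-front/push-back and pop-front/pop-back differ only in whether the anchor is the head or its `prev`.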
+35 -13
kernel/bpf/map_in_map.c
··· 12 12 struct bpf_map *inner_map, *inner_map_meta; 13 13 u32 inner_map_meta_size; 14 14 struct fd f; 15 + int ret; 15 16 16 17 f = fdget(inner_map_ufd); 17 18 inner_map = __bpf_map_get(f); ··· 21 20 22 21 /* Does not support >1 level map-in-map */ 23 22 if (inner_map->inner_map_meta) { 24 - fdput(f); 25 - return ERR_PTR(-EINVAL); 23 + ret = -EINVAL; 24 + goto put; 26 25 } 27 26 28 27 if (!inner_map->ops->map_meta_equal) { 29 - fdput(f); 30 - return ERR_PTR(-ENOTSUPP); 31 - } 32 - 33 - if (btf_record_has_field(inner_map->record, BPF_SPIN_LOCK)) { 34 - fdput(f); 35 - return ERR_PTR(-ENOTSUPP); 28 + ret = -ENOTSUPP; 29 + goto put; 36 30 } 37 31 38 32 inner_map_meta_size = sizeof(*inner_map_meta); ··· 37 41 38 42 inner_map_meta = kzalloc(inner_map_meta_size, GFP_USER); 39 43 if (!inner_map_meta) { 40 - fdput(f); 41 - return ERR_PTR(-ENOMEM); 44 + ret = -ENOMEM; 45 + goto put; 42 46 } 43 47 44 48 inner_map_meta->map_type = inner_map->map_type; ··· 46 50 inner_map_meta->value_size = inner_map->value_size; 47 51 inner_map_meta->map_flags = inner_map->map_flags; 48 52 inner_map_meta->max_entries = inner_map->max_entries; 53 + 49 54 inner_map_meta->record = btf_record_dup(inner_map->record); 50 55 if (IS_ERR(inner_map_meta->record)) { 51 56 /* btf_record_dup returns NULL or valid pointer in case of 52 57 * invalid/empty/valid, but ERR_PTR in case of errors. During 53 58 * equality NULL or IS_ERR is equivalent. 54 59 */ 55 - fdput(f); 56 - return ERR_CAST(inner_map_meta->record); 60 + ret = PTR_ERR(inner_map_meta->record); 61 + goto free; 57 62 } 63 + if (inner_map_meta->record) { 64 + struct btf_field_offs *field_offs; 65 + /* If btf_record is !IS_ERR_OR_NULL, then field_offs is always 66 + * valid. 
67 + */ 68 + field_offs = kmemdup(inner_map->field_offs, sizeof(*inner_map->field_offs), GFP_KERNEL | __GFP_NOWARN); 69 + if (!field_offs) { 70 + ret = -ENOMEM; 71 + goto free_rec; 72 + } 73 + inner_map_meta->field_offs = field_offs; 74 + } 75 + /* Note: We must use the same BTF, as we also used btf_record_dup above 76 + * which relies on BTF being same for both maps, as some members like 77 + * record->fields.list_head have pointers like value_rec pointing into 78 + * inner_map->btf. 79 + */ 58 80 if (inner_map->btf) { 59 81 btf_get(inner_map->btf); 60 82 inner_map_meta->btf = inner_map->btf; ··· 88 74 89 75 fdput(f); 90 76 return inner_map_meta; 77 + free_rec: 78 + btf_record_free(inner_map_meta->record); 79 + free: 80 + kfree(inner_map_meta); 81 + put: 82 + fdput(f); 83 + return ERR_PTR(ret); 91 84 } 92 85 93 86 void bpf_map_meta_free(struct bpf_map *map_meta) 94 87 { 88 + kfree(map_meta->field_offs); 95 89 bpf_map_free_record(map_meta); 96 90 btf_put(map_meta->btf); 97 91 kfree(map_meta);
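The `bpf_map_meta_alloc()` rewrite above replaces repeated `fdput(f); return ERR_PTR(...)` exits with a single unwind tail of `put`/`free`/`free_rec` labels, so each failure frees exactly what has been allocated so far. A userspace sketch of that centralized-unwind shape (simulated failures via `fail_at`; all names illustrative):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct meta { void *record; void *field_offs; };

/* Each failure jumps to the label that unwinds everything allocated up
 * to that point, mirroring the goto tail in bpf_map_meta_alloc(). */
static int meta_alloc(int fail_at, struct meta **out)
{
	struct meta *m;
	int ret;

	m = calloc(1, sizeof(*m));
	if (!m)
		return -ENOMEM;

	m->record = (fail_at == 1) ? NULL : malloc(16);
	if (!m->record) {
		ret = -ENOMEM;
		goto free_meta;
	}

	m->field_offs = (fail_at == 2) ? NULL : malloc(16);
	if (!m->field_offs) {
		ret = -ENOMEM;
		goto free_rec;
	}

	*out = m;
	return 0;

free_rec:
	free(m->record);
free_meta:
	free(m);
	return ret;          /* the kernel version returns ERR_PTR(ret) */
}
```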
+3 -3
kernel/bpf/ringbuf.c
··· 447 447 448 448 const struct bpf_func_proto bpf_ringbuf_reserve_proto = { 449 449 .func = bpf_ringbuf_reserve, 450 - .ret_type = RET_PTR_TO_ALLOC_MEM_OR_NULL, 450 + .ret_type = RET_PTR_TO_RINGBUF_MEM_OR_NULL, 451 451 .arg1_type = ARG_CONST_MAP_PTR, 452 452 .arg2_type = ARG_CONST_ALLOC_SIZE_OR_ZERO, 453 453 .arg3_type = ARG_ANYTHING, ··· 490 490 const struct bpf_func_proto bpf_ringbuf_submit_proto = { 491 491 .func = bpf_ringbuf_submit, 492 492 .ret_type = RET_VOID, 493 - .arg1_type = ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE, 493 + .arg1_type = ARG_PTR_TO_RINGBUF_MEM | OBJ_RELEASE, 494 494 .arg2_type = ARG_ANYTHING, 495 495 }; 496 496 ··· 503 503 const struct bpf_func_proto bpf_ringbuf_discard_proto = { 504 504 .func = bpf_ringbuf_discard, 505 505 .ret_type = RET_VOID, 506 - .arg1_type = ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE, 506 + .arg1_type = ARG_PTR_TO_RINGBUF_MEM | OBJ_RELEASE, 507 507 .arg2_type = ARG_ANYTHING, 508 508 }; 509 509
+72 -24
kernel/bpf/syscall.c
··· 175 175 synchronize_rcu(); 176 176 } 177 177 178 - static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key, 179 - void *value, __u64 flags) 178 + static int bpf_map_update_value(struct bpf_map *map, struct file *map_file, 179 + void *key, void *value, __u64 flags) 180 180 { 181 181 int err; 182 182 ··· 190 190 map->map_type == BPF_MAP_TYPE_SOCKMAP) { 191 191 return sock_map_update_elem_sys(map, key, value, flags); 192 192 } else if (IS_FD_PROG_ARRAY(map)) { 193 - return bpf_fd_array_map_update_elem(map, f.file, key, value, 193 + return bpf_fd_array_map_update_elem(map, map_file, key, value, 194 194 flags); 195 195 } 196 196 ··· 205 205 flags); 206 206 } else if (IS_FD_ARRAY(map)) { 207 207 rcu_read_lock(); 208 - err = bpf_fd_array_map_update_elem(map, f.file, key, value, 208 + err = bpf_fd_array_map_update_elem(map, map_file, key, value, 209 209 flags); 210 210 rcu_read_unlock(); 211 211 } else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) { 212 212 rcu_read_lock(); 213 - err = bpf_fd_htab_map_update_elem(map, f.file, key, value, 213 + err = bpf_fd_htab_map_update_elem(map, map_file, key, value, 214 214 flags); 215 215 rcu_read_unlock(); 216 216 } else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) { ··· 536 536 module_put(rec->fields[i].kptr.module); 537 537 btf_put(rec->fields[i].kptr.btf); 538 538 break; 539 + case BPF_LIST_HEAD: 540 + case BPF_LIST_NODE: 541 + /* Nothing to release for bpf_list_head */ 542 + break; 539 543 default: 540 544 WARN_ON_ONCE(1); 541 545 continue; ··· 582 578 goto free; 583 579 } 584 580 break; 581 + case BPF_LIST_HEAD: 582 + case BPF_LIST_NODE: 583 + /* Nothing to acquire for bpf_list_head */ 584 + break; 585 585 default: 586 586 ret = -EFAULT; 587 587 WARN_ON_ONCE(1); ··· 611 603 if (rec_a->cnt != rec_b->cnt) 612 604 return false; 613 605 size = offsetof(struct btf_record, fields[rec_a->cnt]); 606 + /* btf_parse_fields uses kzalloc to allocate a btf_record, so unused 607 + * members are zeroed out. 
So memcmp is safe to do without worrying 608 + * about padding/unused fields. 609 + * 610 + * While spin_lock, timer, and kptr have no relation to map BTF, 611 + * list_head metadata is specific to map BTF, the btf and value_rec 612 + * members in particular. btf is the map BTF, while value_rec points to 613 + * btf_record in that map BTF. 614 + * 615 + * So while by default, we don't rely on the map BTF (which the records 616 + * were parsed from) matching for both records, which is not backwards 617 + * compatible, in case list_head is part of it, we implicitly rely on 618 + * that by way of depending on memcmp succeeding for it. 619 + */ 614 620 return !memcmp(rec_a, rec_b, size); 615 621 } 616 622 ··· 659 637 case BPF_KPTR_REF: 660 638 field->kptr.dtor((void *)xchg((unsigned long *)field_ptr, 0)); 661 639 break; 640 + case BPF_LIST_HEAD: 641 + if (WARN_ON_ONCE(rec->spin_lock_off < 0)) 642 + continue; 643 + bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off); 644 + break; 645 + case BPF_LIST_NODE: 646 + break; 662 647 default: 663 648 WARN_ON_ONCE(1); 664 649 continue; ··· 677 648 static void bpf_map_free_deferred(struct work_struct *work) 678 649 { 679 650 struct bpf_map *map = container_of(work, struct bpf_map, work); 651 + struct btf_field_offs *foffs = map->field_offs; 652 + struct btf_record *rec = map->record; 680 653 681 654 security_bpf_map_free(map); 682 - kfree(map->field_offs); 683 655 bpf_map_release_memcg(map); 684 - /* implementation dependent freeing, map_free callback also does 685 - * bpf_map_free_record, if needed. 686 - */ 656 + /* implementation dependent freeing */ 687 657 map->ops->map_free(map); 658 + /* Delay freeing of field_offs and btf_record for maps, as map_free 659 + * callback usually needs access to them. It is better to do it here 660 + * than require each callback to do the free itself manually. 
661 + * 662 + * Note that the btf_record stashed in map->inner_map_meta->record was 663 + * already freed using the map_free callback for map in map case which 664 + * eventually calls bpf_map_free_meta, since inner_map_meta is only a 665 + * template bpf_map struct used during verification. 666 + */ 667 + kfree(foffs); 668 + btf_record_free(rec); 688 669 } 689 670 690 671 static void bpf_map_put_uref(struct bpf_map *map) ··· 1004 965 if (!value_type || value_size != map->value_size) 1005 966 return -EINVAL; 1006 967 1007 - map->record = btf_parse_fields(btf, value_type, BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR, 968 + map->record = btf_parse_fields(btf, value_type, 969 + BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD, 1008 970 map->value_size); 1009 971 if (!IS_ERR_OR_NULL(map->record)) { 1010 972 int i; ··· 1038 998 if (map->map_type != BPF_MAP_TYPE_HASH && 1039 999 map->map_type != BPF_MAP_TYPE_LRU_HASH && 1040 1000 map->map_type != BPF_MAP_TYPE_ARRAY) { 1041 - return -EOPNOTSUPP; 1001 + ret = -EOPNOTSUPP; 1042 1002 goto free_map_tab; 1043 1003 } 1044 1004 break; ··· 1052 1012 goto free_map_tab; 1053 1013 } 1054 1014 break; 1015 + case BPF_LIST_HEAD: 1016 + if (map->map_type != BPF_MAP_TYPE_HASH && 1017 + map->map_type != BPF_MAP_TYPE_LRU_HASH && 1018 + map->map_type != BPF_MAP_TYPE_ARRAY) { 1019 + ret = -EOPNOTSUPP; 1020 + goto free_map_tab; 1021 + } 1022 + break; 1055 1023 default: 1056 1024 /* Fail if map_type checks are missing for a field type */ 1057 1025 ret = -EOPNOTSUPP; ··· 1067 1019 } 1068 1020 } 1069 1021 } 1022 + 1023 + ret = btf_check_and_fixup_fields(btf, map->record); 1024 + if (ret < 0) 1025 + goto free_map_tab; 1070 1026 1071 1027 if (map->ops->map_check_btf) { 1072 1028 ret = map->ops->map_check_btf(map, btf, key_type, value_type); ··· 1442 1390 goto free_key; 1443 1391 } 1444 1392 1445 - err = bpf_map_update_value(map, f, key, value, attr->flags); 1393 + err = bpf_map_update_value(map, f.file, key, value, attr->flags); 1446 1394 1447 1395 
kvfree(value); 1448 1396 free_key: ··· 1628 1576 return err; 1629 1577 } 1630 1578 1631 - int generic_map_update_batch(struct bpf_map *map, 1579 + int generic_map_update_batch(struct bpf_map *map, struct file *map_file, 1632 1580 const union bpf_attr *attr, 1633 1581 union bpf_attr __user *uattr) 1634 1582 { 1635 1583 void __user *values = u64_to_user_ptr(attr->batch.values); 1636 1584 void __user *keys = u64_to_user_ptr(attr->batch.keys); 1637 1585 u32 value_size, cp, max_count; 1638 - int ufd = attr->batch.map_fd; 1639 1586 void *key, *value; 1640 - struct fd f; 1641 1587 int err = 0; 1642 1588 1643 1589 if (attr->batch.elem_flags & ~BPF_F_LOCK) ··· 1662 1612 return -ENOMEM; 1663 1613 } 1664 1614 1665 - f = fdget(ufd); /* bpf_map_do_batch() guarantees ufd is valid */ 1666 1615 for (cp = 0; cp < max_count; cp++) { 1667 1616 err = -EFAULT; 1668 1617 if (copy_from_user(key, keys + cp * map->key_size, ··· 1669 1620 copy_from_user(value, values + cp * value_size, value_size)) 1670 1621 break; 1671 1622 1672 - err = bpf_map_update_value(map, f, key, value, 1623 + err = bpf_map_update_value(map, map_file, key, value, 1673 1624 attr->batch.elem_flags); 1674 1625 1675 1626 if (err) ··· 1682 1633 1683 1634 kvfree(value); 1684 1635 kvfree(key); 1685 - fdput(f); 1686 1636 return err; 1687 1637 } 1688 1638 ··· 4474 4426 4475 4427 #define BPF_MAP_BATCH_LAST_FIELD batch.flags 4476 4428 4477 - #define BPF_DO_BATCH(fn) \ 4429 + #define BPF_DO_BATCH(fn, ...) 
\ 4478 4430 do { \ 4479 4431 if (!fn) { \ 4480 4432 err = -ENOTSUPP; \ 4481 4433 goto err_put; \ 4482 4434 } \ 4483 - err = fn(map, attr, uattr); \ 4435 + err = fn(__VA_ARGS__); \ 4484 4436 } while (0) 4485 4437 4486 4438 static int bpf_map_do_batch(const union bpf_attr *attr, ··· 4514 4466 } 4515 4467 4516 4468 if (cmd == BPF_MAP_LOOKUP_BATCH) 4517 - BPF_DO_BATCH(map->ops->map_lookup_batch); 4469 + BPF_DO_BATCH(map->ops->map_lookup_batch, map, attr, uattr); 4518 4470 else if (cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH) 4519 - BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch); 4471 + BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch, map, attr, uattr); 4520 4472 else if (cmd == BPF_MAP_UPDATE_BATCH) 4521 - BPF_DO_BATCH(map->ops->map_update_batch); 4473 + BPF_DO_BATCH(map->ops->map_update_batch, map, f.file, attr, uattr); 4522 4474 else 4523 - BPF_DO_BATCH(map->ops->map_delete_batch); 4475 + BPF_DO_BATCH(map->ops->map_delete_batch, map, attr, uattr); 4524 4476 err_put: 4525 4477 if (has_write) 4526 4478 bpf_map_write_active_dec(map);
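The `btf_record_equal()` comment above notes that `memcmp()` is only safe because records come from kzalloc, so padding and unused members compare as zero. A small userspace illustration of that point, with `calloc()` standing in for kzalloc (types are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct rec { char type; long off; };  /* padding after 'type' on most ABIs */

/* Byte-wise equality is sound only when both records were allocated
 * zeroed (calloc here, kzalloc in the kernel), so padding bytes and
 * unused members cannot differ spuriously. */
static bool rec_equal(const struct rec *a, const struct rec *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

static struct rec *rec_new(char type, long off)
{
	struct rec *r = calloc(1, sizeof(*r));  /* zeroed, padding included */

	if (r) {
		r->type = type;
		r->off = off;
	}
	return r;
}
```

Had the records come from plain `malloc()`, the padding bytes would be indeterminate and two logically equal records could compare unequal.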
+1414 -117
kernel/bpf/verifier.c
··· 451 451 type == PTR_TO_SOCK_COMMON; 452 452 } 453 453 454 + static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg) 455 + { 456 + struct btf_record *rec = NULL; 457 + struct btf_struct_meta *meta; 458 + 459 + if (reg->type == PTR_TO_MAP_VALUE) { 460 + rec = reg->map_ptr->record; 461 + } else if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC)) { 462 + meta = btf_find_struct_meta(reg->btf, reg->btf_id); 463 + if (meta) 464 + rec = meta->record; 465 + } 466 + return rec; 467 + } 468 + 454 469 static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) 455 470 { 456 - return reg->type == PTR_TO_MAP_VALUE && 457 - btf_record_has_field(reg->map_ptr->record, BPF_SPIN_LOCK); 471 + return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK); 458 472 } 459 473 460 474 static bool type_is_rdonly_mem(u32 type) ··· 527 513 func_id == BPF_FUNC_user_ringbuf_drain; 528 514 } 529 515 516 + static bool is_storage_get_function(enum bpf_func_id func_id) 517 + { 518 + return func_id == BPF_FUNC_sk_storage_get || 519 + func_id == BPF_FUNC_inode_storage_get || 520 + func_id == BPF_FUNC_task_storage_get || 521 + func_id == BPF_FUNC_cgrp_storage_get; 522 + } 523 + 530 524 static bool helper_multiple_ref_obj_use(enum bpf_func_id func_id, 531 525 const struct bpf_map *map) 532 526 { ··· 565 543 static const char *reg_type_str(struct bpf_verifier_env *env, 566 544 enum bpf_reg_type type) 567 545 { 568 - char postfix[16] = {0}, prefix[32] = {0}; 546 + char postfix[16] = {0}, prefix[64] = {0}; 569 547 static const char * const str[] = { 570 548 [NOT_INIT] = "?", 571 549 [SCALAR_VALUE] = "scalar", ··· 597 575 strncpy(postfix, "_or_null", 16); 598 576 } 599 577 600 - if (type & MEM_RDONLY) 601 - strncpy(prefix, "rdonly_", 32); 602 - if (type & MEM_ALLOC) 603 - strncpy(prefix, "alloc_", 32); 604 - if (type & MEM_USER) 605 - strncpy(prefix, "user_", 32); 606 - if (type & MEM_PERCPU) 607 - strncpy(prefix, "percpu_", 32); 608 - if (type & PTR_UNTRUSTED) 609 - 
strncpy(prefix, "untrusted_", 32); 578 + snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s%s", 579 + type & MEM_RDONLY ? "rdonly_" : "", 580 + type & MEM_RINGBUF ? "ringbuf_" : "", 581 + type & MEM_USER ? "user_" : "", 582 + type & MEM_PERCPU ? "percpu_" : "", 583 + type & MEM_RCU ? "rcu_" : "", 584 + type & PTR_UNTRUSTED ? "untrusted_" : "", 585 + type & PTR_TRUSTED ? "trusted_" : "" 586 + ); 610 587 611 588 snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s", 612 589 prefix, str[base_type(type)], postfix); ··· 1031 1010 if (unlikely(check_mul_overflow(n, size, &bytes))) 1032 1011 return NULL; 1033 1012 1034 - if (ksize(dst) < bytes) { 1013 + if (ksize(dst) < ksize(src)) { 1035 1014 kfree(dst); 1036 - dst = kmalloc_track_caller(bytes, flags); 1015 + dst = kmalloc_track_caller(kmalloc_size_roundup(bytes), flags); 1037 1016 if (!dst) 1038 1017 return NULL; 1039 1018 } ··· 1050 1029 */ 1051 1030 static void *realloc_array(void *arr, size_t old_n, size_t new_n, size_t size) 1052 1031 { 1032 + size_t alloc_size; 1053 1033 void *new_arr; 1054 1034 1055 1035 if (!new_n || old_n == new_n) 1056 1036 goto out; 1057 1037 1058 - new_arr = krealloc_array(arr, new_n, size, GFP_KERNEL); 1038 + alloc_size = kmalloc_size_roundup(size_mul(new_n, size)); 1039 + new_arr = krealloc(arr, alloc_size, GFP_KERNEL); 1059 1040 if (!new_arr) { 1060 1041 kfree(arr); 1061 1042 return NULL; ··· 1229 1206 dst_state->frame[i] = NULL; 1230 1207 } 1231 1208 dst_state->speculative = src->speculative; 1209 + dst_state->active_rcu_lock = src->active_rcu_lock; 1232 1210 dst_state->curframe = src->curframe; 1233 - dst_state->active_spin_lock = src->active_spin_lock; 1211 + dst_state->active_lock.ptr = src->active_lock.ptr; 1212 + dst_state->active_lock.id = src->active_lock.id; 1234 1213 dst_state->branches = src->branches; 1235 1214 dst_state->parent = src->parent; 1236 1215 dst_state->first_insn_idx = src->first_insn_idx; ··· 2531 2506 { 2532 2507 u32 cnt = cur->jmp_history_cnt; 2533 2508 struct 
bpf_idx_pair *p; 2509 + size_t alloc_size; 2534 2510 2535 2511 cnt++; 2536 - p = krealloc(cur->jmp_history, cnt * sizeof(*p), GFP_USER); 2512 + alloc_size = kmalloc_size_roundup(size_mul(cnt, sizeof(*p))); 2513 + p = krealloc(cur->jmp_history, alloc_size, GFP_USER); 2537 2514 if (!p) 2538 2515 return -ENOMEM; 2539 2516 p[cnt - 1].idx = env->insn_idx; ··· 3871 3844 struct bpf_reg_state *reg, u32 regno) 3872 3845 { 3873 3846 const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id); 3874 - int perm_flags = PTR_MAYBE_NULL; 3847 + int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED; 3875 3848 const char *reg_name = ""; 3876 3849 3877 3850 /* Only unreferenced case accepts untrusted pointers */ ··· 4266 4239 4267 4240 /* Separate to is_ctx_reg() since we still want to allow BPF_ST here. */ 4268 4241 return reg->type == PTR_TO_FLOW_KEYS; 4242 + } 4243 + 4244 + static bool is_trusted_reg(const struct bpf_reg_state *reg) 4245 + { 4246 + /* A referenced register is always trusted. */ 4247 + if (reg->ref_obj_id) 4248 + return true; 4249 + 4250 + /* If a register is not referenced, it is trusted if it has the 4251 + * MEM_ALLOC, MEM_RCU or PTR_TRUSTED type modifiers, and no others. Some of the 4252 + * other type modifiers may be safe, but we elect to take an opt-in 4253 + * approach here as some (e.g. PTR_UNTRUSTED and PTR_MAYBE_NULL) are 4254 + * not. 4255 + * 4256 + * Eventually, we should make PTR_TRUSTED the single source of truth 4257 + * for whether a register is trusted. 
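The reg_type_str() hunk earlier in this series replaces a chain of strncpy() calls, where each modifier clobbered the previous prefix, with a single snprintf() that concatenates every applicable modifier; that is also why the prefix buffer grows from 32 to 64 bytes. A standalone sketch with illustrative flag values (not the kernel's actual bit assignments):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative modifier bits; the kernel defines these in bpf.h. */
#define MEM_RDONLY    (1 << 0)
#define MEM_RINGBUF   (1 << 1)
#define MEM_USER      (1 << 2)
#define MEM_PERCPU    (1 << 3)
#define MEM_RCU       (1 << 4)
#define PTR_UNTRUSTED (1 << 5)
#define PTR_TRUSTED   (1 << 6)

/* One snprintf() emits every set modifier in a fixed order, so combined
 * flags such as MEM_RDONLY | PTR_TRUSTED render as "rdonly_trusted_". */
static void type_prefix(unsigned type, char *prefix, size_t n)
{
	snprintf(prefix, n, "%s%s%s%s%s%s%s",
		 type & MEM_RDONLY    ? "rdonly_"    : "",
		 type & MEM_RINGBUF   ? "ringbuf_"   : "",
		 type & MEM_USER      ? "user_"      : "",
		 type & MEM_PERCPU    ? "percpu_"    : "",
		 type & MEM_RCU       ? "rcu_"       : "",
		 type & PTR_UNTRUSTED ? "untrusted_" : "",
		 type & PTR_TRUSTED   ? "trusted_"   : "");
}
```

Under the old strncpy() scheme only the last matching modifier survived; the snprintf() form makes every modifier visible in verifier logs.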
4258 + */ 4259 + return type_flag(reg->type) & BPF_REG_TRUSTED_MODIFIERS && 4260 + !bpf_type_has_unsafe_modifiers(reg->type); 4269 4261 } 4270 4262 4271 4263 static int check_pkt_ptr_alignment(struct bpf_verifier_env *env, ··· 4733 4687 return -EACCES; 4734 4688 } 4735 4689 4736 - if (env->ops->btf_struct_access) { 4737 - ret = env->ops->btf_struct_access(&env->log, reg->btf, t, 4738 - off, size, atype, &btf_id, &flag); 4690 + if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) { 4691 + if (!btf_is_kernel(reg->btf)) { 4692 + verbose(env, "verifier internal error: reg->btf must be kernel btf\n"); 4693 + return -EFAULT; 4694 + } 4695 + ret = env->ops->btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag); 4739 4696 } else { 4740 - if (atype != BPF_READ) { 4697 + /* Writes are permitted with default btf_struct_access for 4698 + * program allocated objects (which always have ref_obj_id > 0), 4699 + * but not for untrusted PTR_TO_BTF_ID | MEM_ALLOC. 4700 + */ 4701 + if (atype != BPF_READ && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { 4741 4702 verbose(env, "only read is supported\n"); 4742 4703 return -EACCES; 4743 4704 } 4744 4705 4745 - ret = btf_struct_access(&env->log, reg->btf, t, off, size, 4746 - atype, &btf_id, &flag); 4706 + if (type_is_alloc(reg->type) && !reg->ref_obj_id) { 4707 + verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n"); 4708 + return -EFAULT; 4709 + } 4710 + 4711 + ret = btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag); 4747 4712 } 4748 4713 4749 4714 if (ret < 0) ··· 4765 4708 */ 4766 4709 if (type_flag(reg->type) & PTR_UNTRUSTED) 4767 4710 flag |= PTR_UNTRUSTED; 4711 + 4712 + /* By default any pointer obtained from walking a trusted pointer is 4713 + * no longer trusted except the rcu case below. 
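The trust policy spelled out in the is_trusted_reg() comment reduces to a pair of bitmask checks. A userspace model, with illustrative flag values standing in for BPF_REG_TRUSTED_MODIFIERS and bpf_type_has_unsafe_modifiers():

```c
#include <assert.h>

/* Illustrative flag bits, not the kernel's. */
#define MEM_ALLOC      (1 << 0)
#define MEM_RCU        (1 << 1)
#define PTR_TRUSTED    (1 << 2)
#define PTR_UNTRUSTED  (1 << 3)
#define PTR_MAYBE_NULL (1 << 4)

#define TRUSTED_MODIFIERS (MEM_ALLOC | MEM_RCU | PTR_TRUSTED)
#define UNSAFE_MODIFIERS  (PTR_UNTRUSTED | PTR_MAYBE_NULL)

/* Opt-in policy: a referenced register is always trusted; otherwise it
 * must carry at least one trusted modifier and no unsafe one. */
static int is_trusted(unsigned ref_obj_id, unsigned type_flags)
{
	if (ref_obj_id)
		return 1;
	return (type_flags & TRUSTED_MODIFIERS) &&
	       !(type_flags & UNSAFE_MODIFIERS);
}
```

Note the opt-in shape: a flag the model does not know about leaves the register untrusted, which is the conservative default the comment argues for.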
4714 + */ 4715 + flag &= ~PTR_TRUSTED; 4716 + 4717 + if (flag & MEM_RCU) { 4718 + /* Mark value register as MEM_RCU only if it is protected by 4719 + * bpf_rcu_read_lock() and the ptr reg is trusted. MEM_RCU 4720 + * itself can already indicate trustedness inside the rcu 4721 + * read lock region. Also mark it as PTR_TRUSTED. 4722 + */ 4723 + if (!env->cur_state->active_rcu_lock || !is_trusted_reg(reg)) 4724 + flag &= ~MEM_RCU; 4725 + else 4726 + flag |= PTR_TRUSTED; 4727 + } else if (reg->type & MEM_RCU) { 4728 + /* ptr (reg) is marked as MEM_RCU, but the struct field is not tagged 4729 + * with __rcu. Mark the flag as PTR_UNTRUSTED conservatively. 4730 + */ 4731 + flag |= PTR_UNTRUSTED; 4732 + } 4768 4733 4769 4734 if (atype == BPF_READ && value_regno >= 0) 4770 4735 mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag); ··· 4802 4723 { 4803 4724 struct bpf_reg_state *reg = regs + regno; 4804 4725 struct bpf_map *map = reg->map_ptr; 4726 + struct bpf_reg_state map_reg; 4805 4727 enum bpf_type_flag flag = 0; 4806 4728 const struct btf_type *t; 4807 4729 const char *tname; ··· 4841 4761 return -EACCES; 4842 4762 } 4843 4763 4844 - ret = btf_struct_access(&env->log, btf_vmlinux, t, off, size, atype, &btf_id, &flag); 4764 + /* Simulate access to a PTR_TO_BTF_ID */ 4765 + memset(&map_reg, 0, sizeof(map_reg)); 4766 + mark_btf_ld_reg(env, &map_reg, 0, PTR_TO_BTF_ID, btf_vmlinux, *map->ops->map_btf_id, 0); 4767 + ret = btf_struct_access(&env->log, &map_reg, off, size, atype, &btf_id, &flag); 4845 4768 if (ret < 0) 4846 4769 return ret; 4847 4770 ··· 5603 5520 return err; 5604 5521 } 5605 5522 5606 - int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, 5607 - u32 regno) 5523 + static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, 5524 + u32 regno) 5608 5525 { 5609 5526 struct bpf_reg_state *mem_reg = &cur_regs(env)[regno - 1]; 5610 5527 bool may_be_null = 
type_may_be_null(mem_reg->type); ··· 5632 5549 } 5633 5550 5634 5551 /* Implementation details: 5635 - * bpf_map_lookup returns PTR_TO_MAP_VALUE_OR_NULL 5552 + * bpf_map_lookup returns PTR_TO_MAP_VALUE_OR_NULL. 5553 + * bpf_obj_new returns PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL. 5636 5554 * Two bpf_map_lookups (even with the same key) will have different reg->id. 5637 - * For traditional PTR_TO_MAP_VALUE the verifier clears reg->id after 5638 - * value_or_null->value transition, since the verifier only cares about 5639 - * the range of access to valid map value pointer and doesn't care about actual 5640 - * address of the map element. 5555 + * Two separate bpf_obj_new will also have different reg->id. 5556 + * For traditional PTR_TO_MAP_VALUE or PTR_TO_BTF_ID | MEM_ALLOC, the verifier 5557 + * clears reg->id after value_or_null->value transition, since the verifier only 5558 + * cares about the range of access to valid map value pointer and doesn't care 5559 + * about actual address of the map element. 5641 5560 * For maps with 'struct bpf_spin_lock' inside map value the verifier keeps 5642 5561 * reg->id > 0 after value_or_null->value transition. By doing so 5643 5562 * two bpf_map_lookups will be considered two different pointers that 5644 - * point to different bpf_spin_locks. 5563 + * point to different bpf_spin_locks. Likewise for pointers to allocated objects 5564 + * returned from bpf_obj_new. 5645 5565 * The verifier allows taking only one bpf_spin_lock at a time to avoid 5646 5566 * dead-locks. 5647 5567 * Since only one bpf_spin_lock is allowed the checks are simpler than 5648 5568 * reg_is_refcounted() logic. The verifier needs to remember only 5649 5569 * one spin_lock instead of array of acquired_refs. 5650 - * cur_state->active_spin_lock remembers which map value element got locked 5651 - * and clears it after bpf_spin_unlock. 
5570 + * cur_state->active_lock remembers which map value element or allocated 5571 + * object got locked and clears it after bpf_spin_unlock. 5652 5572 */ 5653 5573 static int process_spin_lock(struct bpf_verifier_env *env, int regno, 5654 5574 bool is_lock) ··· 5659 5573 struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno]; 5660 5574 struct bpf_verifier_state *cur = env->cur_state; 5661 5575 bool is_const = tnum_is_const(reg->var_off); 5662 - struct bpf_map *map = reg->map_ptr; 5663 5576 u64 val = reg->var_off.value; 5577 + struct bpf_map *map = NULL; 5578 + struct btf *btf = NULL; 5579 + struct btf_record *rec; 5664 5580 5665 5581 if (!is_const) { 5666 5582 verbose(env, ··· 5670 5582 regno); 5671 5583 return -EINVAL; 5672 5584 } 5673 - if (!map->btf) { 5674 - verbose(env, 5675 - "map '%s' has to have BTF in order to use bpf_spin_lock\n", 5676 - map->name); 5585 + if (reg->type == PTR_TO_MAP_VALUE) { 5586 + map = reg->map_ptr; 5587 + if (!map->btf) { 5588 + verbose(env, 5589 + "map '%s' has to have BTF in order to use bpf_spin_lock\n", 5590 + map->name); 5591 + return -EINVAL; 5592 + } 5593 + } else { 5594 + btf = reg->btf; 5595 + } 5596 + 5597 + rec = reg_btf_record(reg); 5598 + if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) { 5599 + verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local", 5600 + map ? 
map->name : "kptr"); 5677 5601 return -EINVAL; 5678 5602 } 5679 - if (!btf_record_has_field(map->record, BPF_SPIN_LOCK)) { 5680 - verbose(env, "map '%s' has no valid bpf_spin_lock\n", map->name); 5681 - return -EINVAL; 5682 - } 5683 - if (map->record->spin_lock_off != val + reg->off) { 5603 + if (rec->spin_lock_off != val + reg->off) { 5684 5604 verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n", 5685 - val + reg->off, map->record->spin_lock_off); 5605 + val + reg->off, rec->spin_lock_off); 5686 5606 return -EINVAL; 5687 5607 } 5688 5608 if (is_lock) { 5689 - if (cur->active_spin_lock) { 5609 + if (cur->active_lock.ptr) { 5690 5610 verbose(env, 5691 5611 "Locking two bpf_spin_locks are not allowed\n"); 5692 5612 return -EINVAL; 5693 5613 } 5694 - cur->active_spin_lock = reg->id; 5614 + if (map) 5615 + cur->active_lock.ptr = map; 5616 + else 5617 + cur->active_lock.ptr = btf; 5618 + cur->active_lock.id = reg->id; 5695 5619 } else { 5696 - if (!cur->active_spin_lock) { 5620 + struct bpf_func_state *fstate = cur_func(env); 5621 + void *ptr; 5622 + int i; 5623 + 5624 + if (map) 5625 + ptr = map; 5626 + else 5627 + ptr = btf; 5628 + 5629 + if (!cur->active_lock.ptr) { 5697 5630 verbose(env, "bpf_spin_unlock without taking a lock\n"); 5698 5631 return -EINVAL; 5699 5632 } 5700 - if (cur->active_spin_lock != reg->id) { 5633 + if (cur->active_lock.ptr != ptr || 5634 + cur->active_lock.id != reg->id) { 5701 5635 verbose(env, "bpf_spin_unlock of different lock\n"); 5702 5636 return -EINVAL; 5703 5637 } 5704 - cur->active_spin_lock = 0; 5638 + cur->active_lock.ptr = NULL; 5639 + cur->active_lock.id = 0; 5640 + 5641 + for (i = 0; i < fstate->acquired_refs; i++) { 5642 + int err; 5643 + 5644 + /* Complain on error because this reference state cannot 5645 + * be freed before this point, as bpf_spin_lock critical 5646 + * section does not allow functions that release the 5647 + * allocated object immediately. 
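Because a map value and an allocated object could, in principle, produce the same reg->id, the lock state is now a (ptr, id) pair rather than the old scalar active_spin_lock. A minimal userspace model of the matching done in process_spin_lock():

```c
#include <assert.h>
#include <stddef.h>

/* ptr is the base object (map or btf), id the register's unique id. */
struct active_lock { void *ptr; unsigned id; };

static int do_lock(struct active_lock *cur, void *ptr, unsigned id)
{
	if (cur->ptr)
		return -1;	/* only one bpf_spin_lock at a time */
	cur->ptr = ptr;
	cur->id = id;
	return 0;
}

static int do_unlock(struct active_lock *cur, void *ptr, unsigned id)
{
	if (!cur->ptr)
		return -1;	/* unlock without taking a lock */
	if (cur->ptr != ptr || cur->id != id)
		return -1;	/* unlock of a different lock */
	cur->ptr = NULL;
	cur->id = 0;
	return 0;
}
```

Both components must match at unlock time: the same id on a different base object, or a different id on the same object, is rejected.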
5648 + */ 5649 + if (!fstate->refs[i].release_on_unlock) 5650 + continue; 5651 + err = release_reference(env, fstate->refs[i].id); 5652 + if (err) { 5653 + verbose(env, "failed to release release_on_unlock reference"); 5654 + return err; 5655 + } 5656 + } 5705 5657 } 5706 5658 return 0; 5707 5659 } ··· 5900 5772 PTR_TO_TCP_SOCK, 5901 5773 PTR_TO_XDP_SOCK, 5902 5774 PTR_TO_BTF_ID, 5775 + PTR_TO_BTF_ID | PTR_TRUSTED, 5903 5776 }, 5904 5777 .btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON], 5905 5778 }; ··· 5914 5785 PTR_TO_MAP_KEY, 5915 5786 PTR_TO_MAP_VALUE, 5916 5787 PTR_TO_MEM, 5917 - PTR_TO_MEM | MEM_ALLOC, 5788 + PTR_TO_MEM | MEM_RINGBUF, 5918 5789 PTR_TO_BUF, 5919 5790 }, 5920 5791 }; ··· 5929 5800 }, 5930 5801 }; 5931 5802 5803 + static const struct bpf_reg_types spin_lock_types = { 5804 + .types = { 5805 + PTR_TO_MAP_VALUE, 5806 + PTR_TO_BTF_ID | MEM_ALLOC, 5807 + } 5808 + }; 5809 + 5932 5810 static const struct bpf_reg_types fullsock_types = { .types = { PTR_TO_SOCKET } }; 5933 5811 static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } }; 5934 5812 static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } }; 5935 - static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } }; 5813 + static const struct bpf_reg_types ringbuf_mem_types = { .types = { PTR_TO_MEM | MEM_RINGBUF } }; 5936 5814 static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } }; 5937 - static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } }; 5938 - static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } }; 5939 - static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } }; 5815 + static const struct bpf_reg_types btf_ptr_types = { 5816 + .types = { 5817 + PTR_TO_BTF_ID, 5818 + PTR_TO_BTF_ID | PTR_TRUSTED, 5819 + PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED, 5820 + }, 5821 + }; 5822 + static const struct 
bpf_reg_types percpu_btf_ptr_types = { 5823 + .types = { 5824 + PTR_TO_BTF_ID | MEM_PERCPU, 5825 + PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED, 5826 + } 5827 + }; 5940 5828 static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } }; 5941 5829 static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } }; 5942 5830 static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } }; ··· 5982 5836 [ARG_PTR_TO_BTF_ID] = &btf_ptr_types, 5983 5837 [ARG_PTR_TO_SPIN_LOCK] = &spin_lock_types, 5984 5838 [ARG_PTR_TO_MEM] = &mem_types, 5985 - [ARG_PTR_TO_ALLOC_MEM] = &alloc_mem_types, 5839 + [ARG_PTR_TO_RINGBUF_MEM] = &ringbuf_mem_types, 5986 5840 [ARG_PTR_TO_INT] = &int_ptr_types, 5987 5841 [ARG_PTR_TO_LONG] = &int_ptr_types, 5988 5842 [ARG_PTR_TO_PERCPU_BTF_ID] = &percpu_btf_ptr_types, ··· 6041 5895 return -EACCES; 6042 5896 6043 5897 found: 6044 - if (reg->type == PTR_TO_BTF_ID) { 5898 + if (reg->type == PTR_TO_BTF_ID || reg->type & PTR_TRUSTED) { 6045 5899 /* For bpf_sk_release, it needs to match against first member 6046 5900 * 'struct sock_common', hence make an exception for it. This 6047 5901 * allows bpf_sk_release to work for multiple socket types. ··· 6077 5931 return -EACCES; 6078 5932 } 6079 5933 } 5934 + } else if (type_is_alloc(reg->type)) { 5935 + if (meta->func_id != BPF_FUNC_spin_lock && meta->func_id != BPF_FUNC_spin_unlock) { 5936 + verbose(env, "verifier internal error: unimplemented handling of MEM_ALLOC\n"); 5937 + return -EFAULT; 5938 + } 6080 5939 } 6081 5940 6082 5941 return 0; ··· 6108 5957 case PTR_TO_MAP_VALUE: 6109 5958 case PTR_TO_MEM: 6110 5959 case PTR_TO_MEM | MEM_RDONLY: 6111 - case PTR_TO_MEM | MEM_ALLOC: 5960 + case PTR_TO_MEM | MEM_RINGBUF: 6112 5961 case PTR_TO_BUF: 6113 5962 case PTR_TO_BUF | MEM_RDONLY: 6114 5963 case SCALAR_VALUE: 6115 5964 /* Some of the argument types nevertheless require a 6116 5965 * zero register offset. 
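check_reg_type() resolves an argument type to one of the bpf_reg_types tables above and then requires the register's full type, base type plus modifier flags, to appear in the set, which is why the tables grow PTR_TRUSTED variants here. A simplified sketch of that membership test (flag values illustrative, and the real check does some extra normalization):

```c
#include <assert.h>

/* Illustrative type encodings. */
#define PTR_TO_BTF_ID 1
#define MEM_PERCPU    (1 << 8)
#define PTR_TRUSTED   (1 << 9)

/* A register type is acceptable only if it matches an entry verbatim,
 * modifiers included: adding PTR_TRUSTED to a register therefore needs
 * a corresponding PTR_TRUSTED entry in the candidate table. */
static int type_in_set(unsigned reg_type, const unsigned *set, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (set[i] == reg_type)
			return 1;
	return 0;
}
```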
6117 5966 */ 6118 - if (base_type(arg_type) != ARG_PTR_TO_ALLOC_MEM) 5967 + if (base_type(arg_type) != ARG_PTR_TO_RINGBUF_MEM) 6119 5968 return 0; 6120 5969 break; 6121 5970 /* All the rest must be rejected, except PTR_TO_BTF_ID which allows 6122 5971 * fixed offset. 6123 5972 */ 6124 5973 case PTR_TO_BTF_ID: 5974 + case PTR_TO_BTF_ID | MEM_ALLOC: 5975 + case PTR_TO_BTF_ID | PTR_TRUSTED: 5976 + case PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED: 5977 + case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED: 6125 5978 /* When referenced PTR_TO_BTF_ID is passed to release function, 6126 5979 * it's fixed offset must be 0. In the other cases, fixed offset 6127 5980 * can be non-zero. ··· 6201 6046 goto skip_type_check; 6202 6047 6203 6048 /* arg_btf_id and arg_size are in a union. */ 6204 - if (base_type(arg_type) == ARG_PTR_TO_BTF_ID) 6049 + if (base_type(arg_type) == ARG_PTR_TO_BTF_ID || 6050 + base_type(arg_type) == ARG_PTR_TO_SPIN_LOCK) 6205 6051 arg_btf_id = fn->arg_btf_id[arg]; 6206 6052 6207 6053 err = check_reg_type(env, regno, arg_type, arg_btf_id, meta); ··· 6820 6664 int i; 6821 6665 6822 6666 for (i = 0; i < ARRAY_SIZE(fn->arg_type); i++) { 6823 - if (base_type(fn->arg_type[i]) == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i]) 6824 - return false; 6825 - 6667 + if (base_type(fn->arg_type[i]) == ARG_PTR_TO_BTF_ID) 6668 + return !!fn->arg_btf_id[i]; 6669 + if (base_type(fn->arg_type[i]) == ARG_PTR_TO_SPIN_LOCK) 6670 + return fn->arg_btf_id[i] == BPF_PTR_POISON; 6826 6671 if (base_type(fn->arg_type[i]) != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i] && 6827 6672 /* arg_btf_id and arg_size are in a union. */ 6828 6673 (base_type(fn->arg_type[i]) != ARG_PTR_TO_MEM || ··· 7570 7413 return -EINVAL; 7571 7414 } 7572 7415 7416 + if (!env->prog->aux->sleepable && fn->might_sleep) { 7417 + verbose(env, "helper call might sleep in a non-sleepable prog\n"); 7418 + return -EINVAL; 7419 + } 7420 + 7573 7421 /* With LD_ABS/IND some JITs save/restore skb from r1. 
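The fixed-offset rules in check_func_arg_reg_off() above can be summarized as: variable offsets are rejected for the PTR_TO_BTF_ID flavors, and a referenced pointer handed to a release function must point at the start of the object. A rough userspace model (simplified; the real check covers more register classes and arg types):

```c
#include <assert.h>

/* Hypothetical condensed rule for PTR_TO_BTF_ID-flavored arguments. */
static int btf_id_arg_off_ok(int is_release, unsigned ref_obj_id,
			     int has_var_off, long fixed_off)
{
	if (has_var_off)
		return 0;	/* only fixed offsets for these types */
	if (is_release && ref_obj_id)
		return fixed_off == 0;	/* release takes the whole object */
	return 1;	/* otherwise any fixed offset is tolerated */
}
```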
*/ 7574 7422 changes_data = bpf_helper_changes_pkt_data(fn->func); 7575 7423 if (changes_data && fn->arg1_type != ARG_PTR_TO_CTX) { ··· 7591 7429 verbose(env, "kernel subsystem misconfigured func %s#%d\n", 7592 7430 func_id_name(func_id), func_id); 7593 7431 return err; 7432 + } 7433 + 7434 + if (env->cur_state->active_rcu_lock) { 7435 + if (fn->might_sleep) { 7436 + verbose(env, "sleepable helper %s#%d in rcu_read_lock region\n", 7437 + func_id_name(func_id), func_id); 7438 + return -EINVAL; 7439 + } 7440 + 7441 + if (env->prog->aux->sleepable && is_storage_get_function(func_id)) 7442 + env->insn_aux_data[insn_idx].storage_get_func_atomic = true; 7594 7443 } 7595 7444 7596 7445 meta.func_id = func_id; ··· 7807 7634 mark_reg_known_zero(env, regs, BPF_REG_0); 7808 7635 regs[BPF_REG_0].type = PTR_TO_TCP_SOCK | ret_flag; 7809 7636 break; 7810 - case RET_PTR_TO_ALLOC_MEM: 7637 + case RET_PTR_TO_MEM: 7811 7638 mark_reg_known_zero(env, regs, BPF_REG_0); 7812 7639 regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag; 7813 7640 regs[BPF_REG_0].mem_size = meta.mem_size; ··· 7970 7797 } 7971 7798 } 7972 7799 7800 + struct bpf_kfunc_call_arg_meta { 7801 + /* In parameters */ 7802 + struct btf *btf; 7803 + u32 func_id; 7804 + u32 kfunc_flags; 7805 + const struct btf_type *func_proto; 7806 + const char *func_name; 7807 + /* Out parameters */ 7808 + u32 ref_obj_id; 7809 + u8 release_regno; 7810 + bool r0_rdonly; 7811 + u32 ret_btf_id; 7812 + u64 r0_size; 7813 + struct { 7814 + u64 value; 7815 + bool found; 7816 + } arg_constant; 7817 + struct { 7818 + struct btf *btf; 7819 + u32 btf_id; 7820 + } arg_obj_drop; 7821 + struct { 7822 + struct btf_field *field; 7823 + } arg_list_head; 7824 + }; 7825 + 7826 + static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta) 7827 + { 7828 + return meta->kfunc_flags & KF_ACQUIRE; 7829 + } 7830 + 7831 + static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) 7832 + { 7833 + return meta->kfunc_flags & KF_RET_NULL; 7834 + } 7835 + 
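The per-argument kfunc annotations used below (__sz, __k, __ign, __alloc) are encoded purely in the BTF parameter name and detected by __kfunc_param_match_suffix(). The core string check is easy to reproduce in userspace:

```c
#include <assert.h>
#include <string.h>

/* Does the parameter name end with the given suffix? Mirrors the
 * convention where e.g. "rdonly_buf_size__sz" marks a size argument
 * and "local_type_id__k" a constant one. */
static int param_match_suffix(const char *param_name, const char *suffix)
{
	size_t suffix_len = strlen(suffix), len;

	if (!param_name || !*param_name)
		return 0;
	len = strlen(param_name);
	if (len < suffix_len)
		return 0;
	return !strncmp(param_name + len - suffix_len, suffix, suffix_len);
}
```

Encoding the annotation in the name keeps the kfunc signature plain C; the comment in the hunk notes this could eventually move to BTF tagging.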
7836 + static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta) 7837 + { 7838 + return meta->kfunc_flags & KF_RELEASE; 7839 + } 7840 + 7841 + static bool is_kfunc_trusted_args(struct bpf_kfunc_call_arg_meta *meta) 7842 + { 7843 + return meta->kfunc_flags & KF_TRUSTED_ARGS; 7844 + } 7845 + 7846 + static bool is_kfunc_sleepable(struct bpf_kfunc_call_arg_meta *meta) 7847 + { 7848 + return meta->kfunc_flags & KF_SLEEPABLE; 7849 + } 7850 + 7851 + static bool is_kfunc_destructive(struct bpf_kfunc_call_arg_meta *meta) 7852 + { 7853 + return meta->kfunc_flags & KF_DESTRUCTIVE; 7854 + } 7855 + 7856 + static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int arg) 7857 + { 7858 + return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET); 7859 + } 7860 + 7861 + static bool __kfunc_param_match_suffix(const struct btf *btf, 7862 + const struct btf_param *arg, 7863 + const char *suffix) 7864 + { 7865 + int suffix_len = strlen(suffix), len; 7866 + const char *param_name; 7867 + 7868 + /* In the future, this can be ported to use BTF tagging */ 7869 + param_name = btf_name_by_offset(btf, arg->name_off); 7870 + if (str_is_empty(param_name)) 7871 + return false; 7872 + len = strlen(param_name); 7873 + if (len < suffix_len) 7874 + return false; 7875 + param_name += len - suffix_len; 7876 + return !strncmp(param_name, suffix, suffix_len); 7877 + } 7878 + 7879 + static bool is_kfunc_arg_mem_size(const struct btf *btf, 7880 + const struct btf_param *arg, 7881 + const struct bpf_reg_state *reg) 7882 + { 7883 + const struct btf_type *t; 7884 + 7885 + t = btf_type_skip_modifiers(btf, arg->type, NULL); 7886 + if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE) 7887 + return false; 7888 + 7889 + return __kfunc_param_match_suffix(btf, arg, "__sz"); 7890 + } 7891 + 7892 + static bool is_kfunc_arg_constant(const struct btf *btf, const struct btf_param *arg) 7893 + { 7894 + return __kfunc_param_match_suffix(btf, arg, "__k"); 7895 + } 7896 + 7897 + static bool 
is_kfunc_arg_ignore(const struct btf *btf, const struct btf_param *arg) 7898 + { 7899 + return __kfunc_param_match_suffix(btf, arg, "__ign"); 7900 + } 7901 + 7902 + static bool is_kfunc_arg_alloc_obj(const struct btf *btf, const struct btf_param *arg) 7903 + { 7904 + return __kfunc_param_match_suffix(btf, arg, "__alloc"); 7905 + } 7906 + 7907 + static bool is_kfunc_arg_scalar_with_name(const struct btf *btf, 7908 + const struct btf_param *arg, 7909 + const char *name) 7910 + { 7911 + int len, target_len = strlen(name); 7912 + const char *param_name; 7913 + 7914 + param_name = btf_name_by_offset(btf, arg->name_off); 7915 + if (str_is_empty(param_name)) 7916 + return false; 7917 + len = strlen(param_name); 7918 + if (len != target_len) 7919 + return false; 7920 + if (strcmp(param_name, name)) 7921 + return false; 7922 + 7923 + return true; 7924 + } 7925 + 7926 + enum { 7927 + KF_ARG_DYNPTR_ID, 7928 + KF_ARG_LIST_HEAD_ID, 7929 + KF_ARG_LIST_NODE_ID, 7930 + }; 7931 + 7932 + BTF_ID_LIST(kf_arg_btf_ids) 7933 + BTF_ID(struct, bpf_dynptr_kern) 7934 + BTF_ID(struct, bpf_list_head) 7935 + BTF_ID(struct, bpf_list_node) 7936 + 7937 + static bool __is_kfunc_ptr_arg_type(const struct btf *btf, 7938 + const struct btf_param *arg, int type) 7939 + { 7940 + const struct btf_type *t; 7941 + u32 res_id; 7942 + 7943 + t = btf_type_skip_modifiers(btf, arg->type, NULL); 7944 + if (!t) 7945 + return false; 7946 + if (!btf_type_is_ptr(t)) 7947 + return false; 7948 + t = btf_type_skip_modifiers(btf, t->type, &res_id); 7949 + if (!t) 7950 + return false; 7951 + return btf_types_are_same(btf, res_id, btf_vmlinux, kf_arg_btf_ids[type]); 7952 + } 7953 + 7954 + static bool is_kfunc_arg_dynptr(const struct btf *btf, const struct btf_param *arg) 7955 + { 7956 + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_DYNPTR_ID); 7957 + } 7958 + 7959 + static bool is_kfunc_arg_list_head(const struct btf *btf, const struct btf_param *arg) 7960 + { 7961 + return __is_kfunc_ptr_arg_type(btf, arg, 
KF_ARG_LIST_HEAD_ID); 7962 + } 7963 + 7964 + static bool is_kfunc_arg_list_node(const struct btf *btf, const struct btf_param *arg) 7965 + { 7966 + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_LIST_NODE_ID); 7967 + } 7968 + 7969 + /* Returns true if struct is composed of scalars, 4 levels of nesting allowed */ 7970 + static bool __btf_type_is_scalar_struct(struct bpf_verifier_env *env, 7971 + const struct btf *btf, 7972 + const struct btf_type *t, int rec) 7973 + { 7974 + const struct btf_type *member_type; 7975 + const struct btf_member *member; 7976 + u32 i; 7977 + 7978 + if (!btf_type_is_struct(t)) 7979 + return false; 7980 + 7981 + for_each_member(i, t, member) { 7982 + const struct btf_array *array; 7983 + 7984 + member_type = btf_type_skip_modifiers(btf, member->type, NULL); 7985 + if (btf_type_is_struct(member_type)) { 7986 + if (rec >= 3) { 7987 + verbose(env, "max struct nesting depth exceeded\n"); 7988 + return false; 7989 + } 7990 + if (!__btf_type_is_scalar_struct(env, btf, member_type, rec + 1)) 7991 + return false; 7992 + continue; 7993 + } 7994 + if (btf_type_is_array(member_type)) { 7995 + array = btf_array(member_type); 7996 + if (!array->nelems) 7997 + return false; 7998 + member_type = btf_type_skip_modifiers(btf, array->type, NULL); 7999 + if (!btf_type_is_scalar(member_type)) 8000 + return false; 8001 + continue; 8002 + } 8003 + if (!btf_type_is_scalar(member_type)) 8004 + return false; 8005 + } 8006 + return true; 8007 + } 8008 + 8009 + 8010 + static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = { 8011 + #ifdef CONFIG_NET 8012 + [PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK], 8013 + [PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON], 8014 + [PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP], 8015 + #endif 8016 + }; 8017 + 8018 + enum kfunc_ptr_arg_type { 8019 + KF_ARG_PTR_TO_CTX, 8020 + KF_ARG_PTR_TO_ALLOC_BTF_ID, /* Allocated object */ 8021 + KF_ARG_PTR_TO_KPTR, /* PTR_TO_KPTR but type specific */ 8022 + KF_ARG_PTR_TO_DYNPTR, 
8023 + KF_ARG_PTR_TO_LIST_HEAD, 8024 + KF_ARG_PTR_TO_LIST_NODE, 8025 + KF_ARG_PTR_TO_BTF_ID, /* Also covers reg2btf_ids conversions */ 8026 + KF_ARG_PTR_TO_MEM, 8027 + KF_ARG_PTR_TO_MEM_SIZE, /* Size derived from next argument, skip it */ 8028 + }; 8029 + 8030 + enum special_kfunc_type { 8031 + KF_bpf_obj_new_impl, 8032 + KF_bpf_obj_drop_impl, 8033 + KF_bpf_list_push_front, 8034 + KF_bpf_list_push_back, 8035 + KF_bpf_list_pop_front, 8036 + KF_bpf_list_pop_back, 8037 + KF_bpf_cast_to_kern_ctx, 8038 + KF_bpf_rdonly_cast, 8039 + KF_bpf_rcu_read_lock, 8040 + KF_bpf_rcu_read_unlock, 8041 + }; 8042 + 8043 + BTF_SET_START(special_kfunc_set) 8044 + BTF_ID(func, bpf_obj_new_impl) 8045 + BTF_ID(func, bpf_obj_drop_impl) 8046 + BTF_ID(func, bpf_list_push_front) 8047 + BTF_ID(func, bpf_list_push_back) 8048 + BTF_ID(func, bpf_list_pop_front) 8049 + BTF_ID(func, bpf_list_pop_back) 8050 + BTF_ID(func, bpf_cast_to_kern_ctx) 8051 + BTF_ID(func, bpf_rdonly_cast) 8052 + BTF_SET_END(special_kfunc_set) 8053 + 8054 + BTF_ID_LIST(special_kfunc_list) 8055 + BTF_ID(func, bpf_obj_new_impl) 8056 + BTF_ID(func, bpf_obj_drop_impl) 8057 + BTF_ID(func, bpf_list_push_front) 8058 + BTF_ID(func, bpf_list_push_back) 8059 + BTF_ID(func, bpf_list_pop_front) 8060 + BTF_ID(func, bpf_list_pop_back) 8061 + BTF_ID(func, bpf_cast_to_kern_ctx) 8062 + BTF_ID(func, bpf_rdonly_cast) 8063 + BTF_ID(func, bpf_rcu_read_lock) 8064 + BTF_ID(func, bpf_rcu_read_unlock) 8065 + 8066 + static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta) 8067 + { 8068 + return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_lock]; 8069 + } 8070 + 8071 + static bool is_kfunc_bpf_rcu_read_unlock(struct bpf_kfunc_call_arg_meta *meta) 8072 + { 8073 + return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_unlock]; 8074 + } 8075 + 8076 + static enum kfunc_ptr_arg_type 8077 + get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, 8078 + struct bpf_kfunc_call_arg_meta *meta, 8079 + const struct btf_type *t, const 
struct btf_type *ref_t, 8080 + const char *ref_tname, const struct btf_param *args, 8081 + int argno, int nargs) 8082 + { 8083 + u32 regno = argno + 1; 8084 + struct bpf_reg_state *regs = cur_regs(env); 8085 + struct bpf_reg_state *reg = &regs[regno]; 8086 + bool arg_mem_size = false; 8087 + 8088 + if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) 8089 + return KF_ARG_PTR_TO_CTX; 8090 + 8091 + /* In this function, we verify the kfunc's BTF as per the argument type, 8092 + * leaving the rest of the verification with respect to the register 8093 + * type to our caller. When a set of conditions hold in the BTF type of 8094 + * arguments, we resolve it to a known kfunc_ptr_arg_type. 8095 + */ 8096 + if (btf_get_prog_ctx_type(&env->log, meta->btf, t, resolve_prog_type(env->prog), argno)) 8097 + return KF_ARG_PTR_TO_CTX; 8098 + 8099 + if (is_kfunc_arg_alloc_obj(meta->btf, &args[argno])) 8100 + return KF_ARG_PTR_TO_ALLOC_BTF_ID; 8101 + 8102 + if (is_kfunc_arg_kptr_get(meta, argno)) { 8103 + if (!btf_type_is_ptr(ref_t)) { 8104 + verbose(env, "arg#0 BTF type must be a double pointer for kptr_get kfunc\n"); 8105 + return -EINVAL; 8106 + } 8107 + ref_t = btf_type_by_id(meta->btf, ref_t->type); 8108 + ref_tname = btf_name_by_offset(meta->btf, ref_t->name_off); 8109 + if (!btf_type_is_struct(ref_t)) { 8110 + verbose(env, "kernel function %s args#0 pointer type %s %s is not supported\n", 8111 + meta->func_name, btf_type_str(ref_t), ref_tname); 8112 + return -EINVAL; 8113 + } 8114 + return KF_ARG_PTR_TO_KPTR; 8115 + } 8116 + 8117 + if (is_kfunc_arg_dynptr(meta->btf, &args[argno])) 8118 + return KF_ARG_PTR_TO_DYNPTR; 8119 + 8120 + if (is_kfunc_arg_list_head(meta->btf, &args[argno])) 8121 + return KF_ARG_PTR_TO_LIST_HEAD; 8122 + 8123 + if (is_kfunc_arg_list_node(meta->btf, &args[argno])) 8124 + return KF_ARG_PTR_TO_LIST_NODE; 8125 + 8126 + if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) { 8127 + if (!btf_type_is_struct(ref_t)) { 
8128 + verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n", 8129 + meta->func_name, argno, btf_type_str(ref_t), ref_tname); 8130 + return -EINVAL; 8131 + } 8132 + return KF_ARG_PTR_TO_BTF_ID; 8133 + } 8134 + 8135 + if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1])) 8136 + arg_mem_size = true; 8137 + 8138 + /* This is the catch all argument type of register types supported by 8139 + * check_helper_mem_access. However, we only allow when argument type is 8140 + * pointer to scalar, or struct composed (recursively) of scalars. When 8141 + * arg_mem_size is true, the pointer can be void *. 8142 + */ 8143 + if (!btf_type_is_scalar(ref_t) && !__btf_type_is_scalar_struct(env, meta->btf, ref_t, 0) && 8144 + (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) { 8145 + verbose(env, "arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n", 8146 + argno, btf_type_str(ref_t), ref_tname, arg_mem_size ? "void, " : ""); 8147 + return -EINVAL; 8148 + } 8149 + return arg_mem_size ? 
KF_ARG_PTR_TO_MEM_SIZE : KF_ARG_PTR_TO_MEM; 8150 + } 8151 + 8152 + static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env, 8153 + struct bpf_reg_state *reg, 8154 + const struct btf_type *ref_t, 8155 + const char *ref_tname, u32 ref_id, 8156 + struct bpf_kfunc_call_arg_meta *meta, 8157 + int argno) 8158 + { 8159 + const struct btf_type *reg_ref_t; 8160 + bool strict_type_match = false; 8161 + const struct btf *reg_btf; 8162 + const char *reg_ref_tname; 8163 + u32 reg_ref_id; 8164 + 8165 + if (base_type(reg->type) == PTR_TO_BTF_ID) { 8166 + reg_btf = reg->btf; 8167 + reg_ref_id = reg->btf_id; 8168 + } else { 8169 + reg_btf = btf_vmlinux; 8170 + reg_ref_id = *reg2btf_ids[base_type(reg->type)]; 8171 + } 8172 + 8173 + if (is_kfunc_trusted_args(meta) || (is_kfunc_release(meta) && reg->ref_obj_id)) 8174 + strict_type_match = true; 8175 + 8176 + reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id, &reg_ref_id); 8177 + reg_ref_tname = btf_name_by_offset(reg_btf, reg_ref_t->name_off); 8178 + if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match)) { 8179 + verbose(env, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n", 8180 + meta->func_name, argno, btf_type_str(ref_t), ref_tname, argno + 1, 8181 + btf_type_str(reg_ref_t), reg_ref_tname); 8182 + return -EINVAL; 8183 + } 8184 + return 0; 8185 + } 8186 + 8187 + static int process_kf_arg_ptr_to_kptr(struct bpf_verifier_env *env, 8188 + struct bpf_reg_state *reg, 8189 + const struct btf_type *ref_t, 8190 + const char *ref_tname, 8191 + struct bpf_kfunc_call_arg_meta *meta, 8192 + int argno) 8193 + { 8194 + struct btf_field *kptr_field; 8195 + 8196 + /* check_func_arg_reg_off allows var_off for 8197 + * PTR_TO_MAP_VALUE, but we need fixed offset to find 8198 + * off_desc. 
8199 + */ 8200 + if (!tnum_is_const(reg->var_off)) { 8201 + verbose(env, "arg#0 must have constant offset\n"); 8202 + return -EINVAL; 8203 + } 8204 + 8205 + kptr_field = btf_record_find(reg->map_ptr->record, reg->off + reg->var_off.value, BPF_KPTR); 8206 + if (!kptr_field || kptr_field->type != BPF_KPTR_REF) { 8207 + verbose(env, "arg#0 no referenced kptr at map value offset=%llu\n", 8208 + reg->off + reg->var_off.value); 8209 + return -EINVAL; 8210 + } 8211 + 8212 + if (!btf_struct_ids_match(&env->log, meta->btf, ref_t->type, 0, kptr_field->kptr.btf, 8213 + kptr_field->kptr.btf_id, true)) { 8214 + verbose(env, "kernel function %s args#%d expected pointer to %s %s\n", 8215 + meta->func_name, argno, btf_type_str(ref_t), ref_tname); 8216 + return -EINVAL; 8217 + } 8218 + return 0; 8219 + } 8220 + 8221 + static int ref_set_release_on_unlock(struct bpf_verifier_env *env, u32 ref_obj_id) 8222 + { 8223 + struct bpf_func_state *state = cur_func(env); 8224 + struct bpf_reg_state *reg; 8225 + int i; 8226 + 8227 + /* bpf_spin_lock only allows calling list_push and list_pop, no BPF 8228 + * subprogs, no global functions. This means that the references would 8229 + * not be released inside the critical section but they may be added to 8230 + * the reference state, and the acquired_refs are never copied out for a 8231 + * different frame as BPF to BPF calls don't work in bpf_spin_lock 8232 + * critical sections. 
8233 + */ 8234 + if (!ref_obj_id) { 8235 + verbose(env, "verifier internal error: ref_obj_id is zero for release_on_unlock\n"); 8236 + return -EFAULT; 8237 + } 8238 + for (i = 0; i < state->acquired_refs; i++) { 8239 + if (state->refs[i].id == ref_obj_id) { 8240 + if (state->refs[i].release_on_unlock) { 8241 + verbose(env, "verifier internal error: expected false release_on_unlock"); 8242 + return -EFAULT; 8243 + } 8244 + state->refs[i].release_on_unlock = true; 8245 + /* Now mark everyone sharing same ref_obj_id as untrusted */ 8246 + bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ 8247 + if (reg->ref_obj_id == ref_obj_id) 8248 + reg->type |= PTR_UNTRUSTED; 8249 + })); 8250 + return 0; 8251 + } 8252 + } 8253 + verbose(env, "verifier internal error: ref state missing for ref_obj_id\n"); 8254 + return -EFAULT; 8255 + } 8256 + 8257 + /* Implementation details: 8258 + * 8259 + * Each register points to some region of memory, which we define as an 8260 + * allocation. Each allocation may embed a bpf_spin_lock which protects any 8261 + * special BPF objects (bpf_list_head, bpf_rb_root, etc.) part of the same 8262 + * allocation. The lock and the data it protects are colocated in the same 8263 + * memory region. 8264 + * 8265 + * Hence, everytime a register holds a pointer value pointing to such 8266 + * allocation, the verifier preserves a unique reg->id for it. 8267 + * 8268 + * The verifier remembers the lock 'ptr' and the lock 'id' whenever 8269 + * bpf_spin_lock is called. 8270 + * 8271 + * To enable this, lock state in the verifier captures two values: 8272 + * active_lock.ptr = Register's type specific pointer 8273 + * active_lock.id = A unique ID for each register pointer value 8274 + * 8275 + * Currently, PTR_TO_MAP_VALUE and PTR_TO_BTF_ID | MEM_ALLOC are the two 8276 + * supported register types. 8277 + * 8278 + * The active_lock.ptr in case of map values is the reg->map_ptr, and in case of 8279 + * allocated objects is the reg->btf pointer. 
8280 + * 8281 + * The active_lock.id is non-unique for maps supporting direct_value_addr, as we 8282 + * can establish the provenance of the map value statically for each distinct 8283 + * lookup into such maps. They always contain a single map value hence unique 8284 + * IDs for each pseudo load pessimizes the algorithm and rejects valid programs. 8285 + * 8286 + * So, in case of global variables, they use array maps with max_entries = 1, 8287 + * hence their active_lock.ptr becomes map_ptr and id = 0 (since they all point 8288 + * into the same map value as max_entries is 1, as described above). 8289 + * 8290 + * In case of inner map lookups, the inner map pointer has same map_ptr as the 8291 + * outer map pointer (in verifier context), but each lookup into an inner map 8292 + * assigns a fresh reg->id to the lookup, so while lookups into distinct inner 8293 + * maps from the same outer map share the same map_ptr as active_lock.ptr, they 8294 + * will get different reg->id assigned to each lookup, hence different 8295 + * active_lock.id. 8296 + * 8297 + * In case of allocated objects, active_lock.ptr is the reg->btf, and the 8298 + * reg->id is a unique ID preserved after the NULL pointer check on the pointer 8299 + * returned from bpf_obj_new. Each allocation receives a new reg->id. 
8300 + */ 8301 + static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_reg_state *reg) 8302 + { 8303 + void *ptr; 8304 + u32 id; 8305 + 8306 + switch ((int)reg->type) { 8307 + case PTR_TO_MAP_VALUE: 8308 + ptr = reg->map_ptr; 8309 + break; 8310 + case PTR_TO_BTF_ID | MEM_ALLOC: 8311 + case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED: 8312 + ptr = reg->btf; 8313 + break; 8314 + default: 8315 + verbose(env, "verifier internal error: unknown reg type for lock check\n"); 8316 + return -EFAULT; 8317 + } 8318 + id = reg->id; 8319 + 8320 + if (!env->cur_state->active_lock.ptr) 8321 + return -EINVAL; 8322 + if (env->cur_state->active_lock.ptr != ptr || 8323 + env->cur_state->active_lock.id != id) { 8324 + verbose(env, "held lock and object are not in the same allocation\n"); 8325 + return -EINVAL; 8326 + } 8327 + return 0; 8328 + } 8329 + 8330 + static bool is_bpf_list_api_kfunc(u32 btf_id) 8331 + { 8332 + return btf_id == special_kfunc_list[KF_bpf_list_push_front] || 8333 + btf_id == special_kfunc_list[KF_bpf_list_push_back] || 8334 + btf_id == special_kfunc_list[KF_bpf_list_pop_front] || 8335 + btf_id == special_kfunc_list[KF_bpf_list_pop_back]; 8336 + } 8337 + 8338 + static int process_kf_arg_ptr_to_list_head(struct bpf_verifier_env *env, 8339 + struct bpf_reg_state *reg, u32 regno, 8340 + struct bpf_kfunc_call_arg_meta *meta) 8341 + { 8342 + struct btf_field *field; 8343 + struct btf_record *rec; 8344 + u32 list_head_off; 8345 + 8346 + if (meta->btf != btf_vmlinux || !is_bpf_list_api_kfunc(meta->func_id)) { 8347 + verbose(env, "verifier internal error: bpf_list_head argument for unknown kfunc\n"); 8348 + return -EFAULT; 8349 + } 8350 + 8351 + if (!tnum_is_const(reg->var_off)) { 8352 + verbose(env, 8353 + "R%d doesn't have constant offset. 
bpf_list_head has to be at the constant offset\n", 8354 + regno); 8355 + return -EINVAL; 8356 + } 8357 + 8358 + rec = reg_btf_record(reg); 8359 + list_head_off = reg->off + reg->var_off.value; 8360 + field = btf_record_find(rec, list_head_off, BPF_LIST_HEAD); 8361 + if (!field) { 8362 + verbose(env, "bpf_list_head not found at offset=%u\n", list_head_off); 8363 + return -EINVAL; 8364 + } 8365 + 8366 + /* All functions require bpf_list_head to be protected using a bpf_spin_lock */ 8367 + if (check_reg_allocation_locked(env, reg)) { 8368 + verbose(env, "bpf_spin_lock at off=%d must be held for bpf_list_head\n", 8369 + rec->spin_lock_off); 8370 + return -EINVAL; 8371 + } 8372 + 8373 + if (meta->arg_list_head.field) { 8374 + verbose(env, "verifier internal error: repeating bpf_list_head arg\n"); 8375 + return -EFAULT; 8376 + } 8377 + meta->arg_list_head.field = field; 8378 + return 0; 8379 + } 8380 + 8381 + static int process_kf_arg_ptr_to_list_node(struct bpf_verifier_env *env, 8382 + struct bpf_reg_state *reg, u32 regno, 8383 + struct bpf_kfunc_call_arg_meta *meta) 8384 + { 8385 + const struct btf_type *et, *t; 8386 + struct btf_field *field; 8387 + struct btf_record *rec; 8388 + u32 list_node_off; 8389 + 8390 + if (meta->btf != btf_vmlinux || 8391 + (meta->func_id != special_kfunc_list[KF_bpf_list_push_front] && 8392 + meta->func_id != special_kfunc_list[KF_bpf_list_push_back])) { 8393 + verbose(env, "verifier internal error: bpf_list_node argument for unknown kfunc\n"); 8394 + return -EFAULT; 8395 + } 8396 + 8397 + if (!tnum_is_const(reg->var_off)) { 8398 + verbose(env, 8399 + "R%d doesn't have constant offset. 
bpf_list_node has to be at the constant offset\n", 8400 + regno); 8401 + return -EINVAL; 8402 + } 8403 + 8404 + rec = reg_btf_record(reg); 8405 + list_node_off = reg->off + reg->var_off.value; 8406 + field = btf_record_find(rec, list_node_off, BPF_LIST_NODE); 8407 + if (!field || field->offset != list_node_off) { 8408 + verbose(env, "bpf_list_node not found at offset=%u\n", list_node_off); 8409 + return -EINVAL; 8410 + } 8411 + 8412 + field = meta->arg_list_head.field; 8413 + 8414 + et = btf_type_by_id(field->list_head.btf, field->list_head.value_btf_id); 8415 + t = btf_type_by_id(reg->btf, reg->btf_id); 8416 + if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, 0, field->list_head.btf, 8417 + field->list_head.value_btf_id, true)) { 8418 + verbose(env, "operation on bpf_list_head expects arg#1 bpf_list_node at offset=%d " 8419 + "in struct %s, but arg is at offset=%d in struct %s\n", 8420 + field->list_head.node_offset, btf_name_by_offset(field->list_head.btf, et->name_off), 8421 + list_node_off, btf_name_by_offset(reg->btf, t->name_off)); 8422 + return -EINVAL; 8423 + } 8424 + 8425 + if (list_node_off != field->list_head.node_offset) { 8426 + verbose(env, "arg#1 offset=%d, but expected bpf_list_node at offset=%d in struct %s\n", 8427 + list_node_off, field->list_head.node_offset, 8428 + btf_name_by_offset(field->list_head.btf, et->name_off)); 8429 + return -EINVAL; 8430 + } 8431 + /* Set arg#1 for expiration after unlock */ 8432 + return ref_set_release_on_unlock(env, reg->ref_obj_id); 8433 + } 8434 + 8435 + static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta) 8436 + { 8437 + const char *func_name = meta->func_name, *ref_tname; 8438 + const struct btf *btf = meta->btf; 8439 + const struct btf_param *args; 8440 + u32 i, nargs; 8441 + int ret; 8442 + 8443 + args = (const struct btf_param *)(meta->func_proto + 1); 8444 + nargs = btf_type_vlen(meta->func_proto); 8445 + if (nargs > MAX_BPF_FUNC_REG_ARGS) { 8446 + 
verbose(env, "Function %s has %d > %d args\n", func_name, nargs, 8447 + MAX_BPF_FUNC_REG_ARGS); 8448 + return -EINVAL; 8449 + } 8450 + 8451 + /* Check that BTF function arguments match actual types that the 8452 + * verifier sees. 8453 + */ 8454 + for (i = 0; i < nargs; i++) { 8455 + struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1]; 8456 + const struct btf_type *t, *ref_t, *resolve_ret; 8457 + enum bpf_arg_type arg_type = ARG_DONTCARE; 8458 + u32 regno = i + 1, ref_id, type_size; 8459 + bool is_ret_buf_sz = false; 8460 + int kf_arg_type; 8461 + 8462 + t = btf_type_skip_modifiers(btf, args[i].type, NULL); 8463 + 8464 + if (is_kfunc_arg_ignore(btf, &args[i])) 8465 + continue; 8466 + 8467 + if (btf_type_is_scalar(t)) { 8468 + if (reg->type != SCALAR_VALUE) { 8469 + verbose(env, "R%d is not a scalar\n", regno); 8470 + return -EINVAL; 8471 + } 8472 + 8473 + if (is_kfunc_arg_constant(meta->btf, &args[i])) { 8474 + if (meta->arg_constant.found) { 8475 + verbose(env, "verifier internal error: only one constant argument permitted\n"); 8476 + return -EFAULT; 8477 + } 8478 + if (!tnum_is_const(reg->var_off)) { 8479 + verbose(env, "R%d must be a known constant\n", regno); 8480 + return -EINVAL; 8481 + } 8482 + ret = mark_chain_precision(env, regno); 8483 + if (ret < 0) 8484 + return ret; 8485 + meta->arg_constant.found = true; 8486 + meta->arg_constant.value = reg->var_off.value; 8487 + } else if (is_kfunc_arg_scalar_with_name(btf, &args[i], "rdonly_buf_size")) { 8488 + meta->r0_rdonly = true; 8489 + is_ret_buf_sz = true; 8490 + } else if (is_kfunc_arg_scalar_with_name(btf, &args[i], "rdwr_buf_size")) { 8491 + is_ret_buf_sz = true; 8492 + } 8493 + 8494 + if (is_ret_buf_sz) { 8495 + if (meta->r0_size) { 8496 + verbose(env, "2 or more rdonly/rdwr_buf_size parameters for kfunc"); 8497 + return -EINVAL; 8498 + } 8499 + 8500 + if (!tnum_is_const(reg->var_off)) { 8501 + verbose(env, "R%d is not a const\n", regno); 8502 + return -EINVAL; 8503 + } 8504 + 8505 + 
meta->r0_size = reg->var_off.value; 8506 + ret = mark_chain_precision(env, regno); 8507 + if (ret) 8508 + return ret; 8509 + } 8510 + continue; 8511 + } 8512 + 8513 + if (!btf_type_is_ptr(t)) { 8514 + verbose(env, "Unrecognized arg#%d type %s\n", i, btf_type_str(t)); 8515 + return -EINVAL; 8516 + } 8517 + 8518 + if (reg->ref_obj_id) { 8519 + if (is_kfunc_release(meta) && meta->ref_obj_id) { 8520 + verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n", 8521 + regno, reg->ref_obj_id, 8522 + meta->ref_obj_id); 8523 + return -EFAULT; 8524 + } 8525 + meta->ref_obj_id = reg->ref_obj_id; 8526 + if (is_kfunc_release(meta)) 8527 + meta->release_regno = regno; 8528 + } 8529 + 8530 + ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id); 8531 + ref_tname = btf_name_by_offset(btf, ref_t->name_off); 8532 + 8533 + kf_arg_type = get_kfunc_ptr_arg_type(env, meta, t, ref_t, ref_tname, args, i, nargs); 8534 + if (kf_arg_type < 0) 8535 + return kf_arg_type; 8536 + 8537 + switch (kf_arg_type) { 8538 + case KF_ARG_PTR_TO_ALLOC_BTF_ID: 8539 + case KF_ARG_PTR_TO_BTF_ID: 8540 + if (!is_kfunc_trusted_args(meta)) 8541 + break; 8542 + 8543 + if (!is_trusted_reg(reg)) { 8544 + verbose(env, "R%d must be referenced or trusted\n", regno); 8545 + return -EINVAL; 8546 + } 8547 + fallthrough; 8548 + case KF_ARG_PTR_TO_CTX: 8549 + /* Trusted arguments have the same offset checks as release arguments */ 8550 + arg_type |= OBJ_RELEASE; 8551 + break; 8552 + case KF_ARG_PTR_TO_KPTR: 8553 + case KF_ARG_PTR_TO_DYNPTR: 8554 + case KF_ARG_PTR_TO_LIST_HEAD: 8555 + case KF_ARG_PTR_TO_LIST_NODE: 8556 + case KF_ARG_PTR_TO_MEM: 8557 + case KF_ARG_PTR_TO_MEM_SIZE: 8558 + /* Trusted by default */ 8559 + break; 8560 + default: 8561 + WARN_ON_ONCE(1); 8562 + return -EFAULT; 8563 + } 8564 + 8565 + if (is_kfunc_release(meta) && reg->ref_obj_id) 8566 + arg_type |= OBJ_RELEASE; 8567 + ret = check_func_arg_reg_off(env, reg, regno, arg_type); 8568 + if (ret < 0) 8569 + return ret; 8570 + 
8571 + switch (kf_arg_type) { 8572 + case KF_ARG_PTR_TO_CTX: 8573 + if (reg->type != PTR_TO_CTX) { 8574 + verbose(env, "arg#%d expected pointer to ctx, but got %s\n", i, btf_type_str(t)); 8575 + return -EINVAL; 8576 + } 8577 + 8578 + if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) { 8579 + ret = get_kern_ctx_btf_id(&env->log, resolve_prog_type(env->prog)); 8580 + if (ret < 0) 8581 + return -EINVAL; 8582 + meta->ret_btf_id = ret; 8583 + } 8584 + break; 8585 + case KF_ARG_PTR_TO_ALLOC_BTF_ID: 8586 + if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { 8587 + verbose(env, "arg#%d expected pointer to allocated object\n", i); 8588 + return -EINVAL; 8589 + } 8590 + if (!reg->ref_obj_id) { 8591 + verbose(env, "allocated object must be referenced\n"); 8592 + return -EINVAL; 8593 + } 8594 + if (meta->btf == btf_vmlinux && 8595 + meta->func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) { 8596 + meta->arg_obj_drop.btf = reg->btf; 8597 + meta->arg_obj_drop.btf_id = reg->btf_id; 8598 + } 8599 + break; 8600 + case KF_ARG_PTR_TO_KPTR: 8601 + if (reg->type != PTR_TO_MAP_VALUE) { 8602 + verbose(env, "arg#0 expected pointer to map value\n"); 8603 + return -EINVAL; 8604 + } 8605 + ret = process_kf_arg_ptr_to_kptr(env, reg, ref_t, ref_tname, meta, i); 8606 + if (ret < 0) 8607 + return ret; 8608 + break; 8609 + case KF_ARG_PTR_TO_DYNPTR: 8610 + if (reg->type != PTR_TO_STACK) { 8611 + verbose(env, "arg#%d expected pointer to stack\n", i); 8612 + return -EINVAL; 8613 + } 8614 + 8615 + if (!is_dynptr_reg_valid_init(env, reg)) { 8616 + verbose(env, "arg#%d pointer type %s %s must be valid and initialized\n", 8617 + i, btf_type_str(ref_t), ref_tname); 8618 + return -EINVAL; 8619 + } 8620 + 8621 + if (!is_dynptr_type_expected(env, reg, ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL)) { 8622 + verbose(env, "arg#%d pointer type %s %s points to unsupported dynamic pointer type\n", 8623 + i, btf_type_str(ref_t), ref_tname); 8624 + return -EINVAL; 8625 + } 8626 + break; 8627 + case 
KF_ARG_PTR_TO_LIST_HEAD: 8628 + if (reg->type != PTR_TO_MAP_VALUE && 8629 + reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { 8630 + verbose(env, "arg#%d expected pointer to map value or allocated object\n", i); 8631 + return -EINVAL; 8632 + } 8633 + if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC) && !reg->ref_obj_id) { 8634 + verbose(env, "allocated object must be referenced\n"); 8635 + return -EINVAL; 8636 + } 8637 + ret = process_kf_arg_ptr_to_list_head(env, reg, regno, meta); 8638 + if (ret < 0) 8639 + return ret; 8640 + break; 8641 + case KF_ARG_PTR_TO_LIST_NODE: 8642 + if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { 8643 + verbose(env, "arg#%d expected pointer to allocated object\n", i); 8644 + return -EINVAL; 8645 + } 8646 + if (!reg->ref_obj_id) { 8647 + verbose(env, "allocated object must be referenced\n"); 8648 + return -EINVAL; 8649 + } 8650 + ret = process_kf_arg_ptr_to_list_node(env, reg, regno, meta); 8651 + if (ret < 0) 8652 + return ret; 8653 + break; 8654 + case KF_ARG_PTR_TO_BTF_ID: 8655 + /* Only base_type is checked, further checks are done here */ 8656 + if ((base_type(reg->type) != PTR_TO_BTF_ID || 8657 + bpf_type_has_unsafe_modifiers(reg->type)) && 8658 + !reg2btf_ids[base_type(reg->type)]) { 8659 + verbose(env, "arg#%d is %s ", i, reg_type_str(env, reg->type)); 8660 + verbose(env, "expected %s or socket\n", 8661 + reg_type_str(env, base_type(reg->type) | 8662 + (type_flag(reg->type) & BPF_REG_TRUSTED_MODIFIERS))); 8663 + return -EINVAL; 8664 + } 8665 + ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i); 8666 + if (ret < 0) 8667 + return ret; 8668 + break; 8669 + case KF_ARG_PTR_TO_MEM: 8670 + resolve_ret = btf_resolve_size(btf, ref_t, &type_size); 8671 + if (IS_ERR(resolve_ret)) { 8672 + verbose(env, "arg#%d reference type('%s %s') size cannot be determined: %ld\n", 8673 + i, btf_type_str(ref_t), ref_tname, PTR_ERR(resolve_ret)); 8674 + return -EINVAL; 8675 + } 8676 + ret = check_mem_reg(env, reg, regno, type_size); 8677 
+ if (ret < 0) 8678 + return ret; 8679 + break; 8680 + case KF_ARG_PTR_TO_MEM_SIZE: 8681 + ret = check_kfunc_mem_size_reg(env, &regs[regno + 1], regno + 1); 8682 + if (ret < 0) { 8683 + verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1); 8684 + return ret; 8685 + } 8686 + /* Skip next '__sz' argument */ 8687 + i++; 8688 + break; 8689 + } 8690 + } 8691 + 8692 + if (is_kfunc_release(meta) && !meta->release_regno) { 8693 + verbose(env, "release kernel function %s expects refcounted PTR_TO_BTF_ID\n", 8694 + func_name); 8695 + return -EINVAL; 8696 + } 8697 + 8698 + return 0; 8699 + } 8700 + 7973 8701 static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, 7974 8702 int *insn_idx_p) 7975 8703 { 7976 8704 const struct btf_type *t, *func, *func_proto, *ptr_type; 7977 8705 struct bpf_reg_state *regs = cur_regs(env); 7978 - struct bpf_kfunc_arg_meta meta = { 0 }; 7979 8706 const char *func_name, *ptr_type_name; 8707 + bool sleepable, rcu_lock, rcu_unlock; 8708 + struct bpf_kfunc_call_arg_meta meta; 7980 8709 u32 i, nargs, func_id, ptr_type_id; 7981 8710 int err, insn_idx = *insn_idx_p; 7982 8711 const struct btf_param *args; 8712 + const struct btf_type *ret_t; 7983 8713 struct btf *desc_btf; 7984 8714 u32 *kfunc_flags; 7985 - bool acq; 7986 8715 7987 8716 /* skip for now, but return error when we find this in fixup_kfunc_call */ 7988 8717 if (!insn->imm) ··· 8905 7830 func_name); 8906 7831 return -EACCES; 8907 7832 } 8908 - if (*kfunc_flags & KF_DESTRUCTIVE && !capable(CAP_SYS_BOOT)) { 8909 - verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capabilities\n"); 7833 + 7834 + /* Prepare kfunc call metadata */ 7835 + memset(&meta, 0, sizeof(meta)); 7836 + meta.btf = desc_btf; 7837 + meta.func_id = func_id; 7838 + meta.kfunc_flags = *kfunc_flags; 7839 + meta.func_proto = func_proto; 7840 + meta.func_name = func_name; 7841 + 7842 + if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) { 7843 + 
verbose(env, "destructive kfunc calls require CAP_SYS_BOOT capability\n"); 8910 7844 return -EACCES; 8911 7845 } 8912 7846 8913 - acq = *kfunc_flags & KF_ACQUIRE; 7847 + sleepable = is_kfunc_sleepable(&meta); 7848 + if (sleepable && !env->prog->aux->sleepable) { 7849 + verbose(env, "program must be sleepable to call sleepable kfunc %s\n", func_name); 7850 + return -EACCES; 7851 + } 8914 7852 8915 - meta.flags = *kfunc_flags; 7853 + rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta); 7854 + rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta); 7855 + if ((rcu_lock || rcu_unlock) && !env->rcu_tag_supported) { 7856 + verbose(env, "no vmlinux btf rcu tag support for kfunc %s\n", func_name); 7857 + return -EACCES; 7858 + } 7859 + 7860 + if (env->cur_state->active_rcu_lock) { 7861 + struct bpf_func_state *state; 7862 + struct bpf_reg_state *reg; 7863 + 7864 + if (rcu_lock) { 7865 + verbose(env, "nested rcu read lock (kernel function %s)\n", func_name); 7866 + return -EINVAL; 7867 + } else if (rcu_unlock) { 7868 + bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ 7869 + if (reg->type & MEM_RCU) { 7870 + reg->type &= ~(MEM_RCU | PTR_TRUSTED); 7871 + reg->type |= PTR_UNTRUSTED; 7872 + } 7873 + })); 7874 + env->cur_state->active_rcu_lock = false; 7875 + } else if (sleepable) { 7876 + verbose(env, "kernel func %s is sleepable within rcu_read_lock region\n", func_name); 7877 + return -EACCES; 7878 + } 7879 + } else if (rcu_lock) { 7880 + env->cur_state->active_rcu_lock = true; 7881 + } else if (rcu_unlock) { 7882 + verbose(env, "unmatched rcu read unlock (kernel function %s)\n", func_name); 7883 + return -EINVAL; 7884 + } 8916 7885 8917 7886 /* Check the arguments */ 8918 - err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, &meta); 7887 + err = check_kfunc_args(env, &meta); 8919 7888 if (err < 0) 8920 7889 return err; 8921 7890 /* In case of release function, we get register number of refcounted 8922 - * PTR_TO_BTF_ID back from btf_check_kfunc_arg_match, do the 
release now 7891 + * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now. 8923 7892 */ 8924 - if (err) { 8925 - err = release_reference(env, regs[err].ref_obj_id); 7893 + if (meta.release_regno) { 7894 + err = release_reference(env, regs[meta.release_regno].ref_obj_id); 8926 7895 if (err) { 8927 7896 verbose(env, "kfunc %s#%d reference has not been acquired before\n", 8928 7897 func_name, func_id); ··· 8980 7861 /* Check return type */ 8981 7862 t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL); 8982 7863 8983 - if (acq && !btf_type_is_struct_ptr(desc_btf, t)) { 8984 - verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n"); 8985 - return -EINVAL; 7864 + if (is_kfunc_acquire(&meta) && !btf_type_is_struct_ptr(meta.btf, t)) { 7865 + /* Only exception is bpf_obj_new_impl */ 7866 + if (meta.btf != btf_vmlinux || meta.func_id != special_kfunc_list[KF_bpf_obj_new_impl]) { 7867 + verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n"); 7868 + return -EINVAL; 7869 + } 8986 7870 } 8987 7871 8988 7872 if (btf_type_is_scalar(t)) { 8989 7873 mark_reg_unknown(env, regs, BPF_REG_0); 8990 7874 mark_btf_func_reg_size(env, BPF_REG_0, t->size); 8991 7875 } else if (btf_type_is_ptr(t)) { 8992 - ptr_type = btf_type_skip_modifiers(desc_btf, t->type, 8993 - &ptr_type_id); 8994 - if (!btf_type_is_struct(ptr_type)) { 7876 + ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id); 7877 + 7878 + if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) { 7879 + if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl]) { 7880 + struct btf *ret_btf; 7881 + u32 ret_btf_id; 7882 + 7883 + if (unlikely(!bpf_global_ma_set)) 7884 + return -ENOMEM; 7885 + 7886 + if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) { 7887 + verbose(env, "local type ID argument must be in range [0, U32_MAX]\n"); 7888 + return -EINVAL; 7889 + } 7890 + 7891 + ret_btf = env->prog->aux->btf; 7892 + ret_btf_id = 
meta.arg_constant.value; 7893 + 7894 + /* This may be NULL due to user not supplying a BTF */ 7895 + if (!ret_btf) { 7896 + verbose(env, "bpf_obj_new requires prog BTF\n"); 7897 + return -EINVAL; 7898 + } 7899 + 7900 + ret_t = btf_type_by_id(ret_btf, ret_btf_id); 7901 + if (!ret_t || !__btf_type_is_struct(ret_t)) { 7902 + verbose(env, "bpf_obj_new type ID argument must be of a struct\n"); 7903 + return -EINVAL; 7904 + } 7905 + 7906 + mark_reg_known_zero(env, regs, BPF_REG_0); 7907 + regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC; 7908 + regs[BPF_REG_0].btf = ret_btf; 7909 + regs[BPF_REG_0].btf_id = ret_btf_id; 7910 + 7911 + env->insn_aux_data[insn_idx].obj_new_size = ret_t->size; 7912 + env->insn_aux_data[insn_idx].kptr_struct_meta = 7913 + btf_find_struct_meta(ret_btf, ret_btf_id); 7914 + } else if (meta.func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) { 7915 + env->insn_aux_data[insn_idx].kptr_struct_meta = 7916 + btf_find_struct_meta(meta.arg_obj_drop.btf, 7917 + meta.arg_obj_drop.btf_id); 7918 + } else if (meta.func_id == special_kfunc_list[KF_bpf_list_pop_front] || 7919 + meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) { 7920 + struct btf_field *field = meta.arg_list_head.field; 7921 + 7922 + mark_reg_known_zero(env, regs, BPF_REG_0); 7923 + regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC; 7924 + regs[BPF_REG_0].btf = field->list_head.btf; 7925 + regs[BPF_REG_0].btf_id = field->list_head.value_btf_id; 7926 + regs[BPF_REG_0].off = field->list_head.node_offset; 7927 + } else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) { 7928 + mark_reg_known_zero(env, regs, BPF_REG_0); 7929 + regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED; 7930 + regs[BPF_REG_0].btf = desc_btf; 7931 + regs[BPF_REG_0].btf_id = meta.ret_btf_id; 7932 + } else if (meta.func_id == special_kfunc_list[KF_bpf_rdonly_cast]) { 7933 + ret_t = btf_type_by_id(desc_btf, meta.arg_constant.value); 7934 + if (!ret_t || !btf_type_is_struct(ret_t)) { 7935 + verbose(env, 
7936 + "kfunc bpf_rdonly_cast type ID argument must be of a struct\n"); 7937 + return -EINVAL; 7938 + } 7939 + 7940 + mark_reg_known_zero(env, regs, BPF_REG_0); 7941 + regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_UNTRUSTED; 7942 + regs[BPF_REG_0].btf = desc_btf; 7943 + regs[BPF_REG_0].btf_id = meta.arg_constant.value; 7944 + } else { 7945 + verbose(env, "kernel function %s unhandled dynamic return type\n", 7946 + meta.func_name); 7947 + return -EFAULT; 7948 + } 7949 + } else if (!__btf_type_is_struct(ptr_type)) { 8995 7950 if (!meta.r0_size) { 8996 7951 ptr_type_name = btf_name_by_offset(desc_btf, 8997 7952 ptr_type->name_off); ··· 9093 7900 regs[BPF_REG_0].type = PTR_TO_BTF_ID; 9094 7901 regs[BPF_REG_0].btf_id = ptr_type_id; 9095 7902 } 9096 - if (*kfunc_flags & KF_RET_NULL) { 7903 + 7904 + if (is_kfunc_ret_null(&meta)) { 9097 7905 regs[BPF_REG_0].type |= PTR_MAYBE_NULL; 9098 7906 /* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */ 9099 7907 regs[BPF_REG_0].id = ++env->id_gen; 9100 7908 } 9101 7909 mark_btf_func_reg_size(env, BPF_REG_0, sizeof(void *)); 9102 - if (acq) { 7910 + if (is_kfunc_acquire(&meta)) { 9103 7911 int id = acquire_reference_state(env, insn_idx); 9104 7912 9105 7913 if (id < 0) 9106 7914 return id; 9107 - regs[BPF_REG_0].id = id; 7915 + if (is_kfunc_ret_null(&meta)) 7916 + regs[BPF_REG_0].id = id; 9108 7917 regs[BPF_REG_0].ref_obj_id = id; 9109 7918 } 7919 + if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id) 7920 + regs[BPF_REG_0].id = ++env->id_gen; 9110 7921 } /* else { add_kfunc_call() ensures it is btf_type_is_void(t) } */ 9111 7922 9112 7923 nargs = btf_type_vlen(func_proto); ··· 11283 10086 { 11284 10087 if (type_may_be_null(reg->type) && reg->id == id && 11285 10088 !WARN_ON_ONCE(!reg->id)) { 11286 - if (WARN_ON_ONCE(reg->smin_value || reg->smax_value || 11287 - !tnum_equals_const(reg->var_off, 0) || 11288 - reg->off)) { 11289 - /* Old offset (both fixed and variable parts) should 11290 - * have been known-zero, because 
we don't allow pointer 11291 - * arithmetic on pointers that might be NULL. If we 11292 - * see this happening, don't convert the register. 11293 - */ 10089 + /* Old offset (both fixed and variable parts) should have been 10090 + * known-zero, because we don't allow pointer arithmetic on 10091 + * pointers that might be NULL. If we see this happening, don't 10092 + * convert the register. 10093 + * 10094 + * But in some cases, some helpers that return local kptrs 10095 + * advance offset for the returned pointer. In those cases, it 10096 + * is fine to expect to see reg->off. 10097 + */ 10098 + if (WARN_ON_ONCE(reg->smin_value || reg->smax_value || !tnum_equals_const(reg->var_off, 0))) 11294 10099 return; 11295 - } 10100 + if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL) && WARN_ON_ONCE(reg->off)) 10101 + return; 11296 10102 if (is_null) { 11297 10103 reg->type = SCALAR_VALUE; 11298 10104 /* We don't need id and ref_obj_id from this point ··· 11469 10269 struct bpf_verifier_state *other_branch; 11470 10270 struct bpf_reg_state *regs = this_branch->frame[this_branch->curframe]->regs; 11471 10271 struct bpf_reg_state *dst_reg, *other_branch_regs, *src_reg = NULL; 10272 + struct bpf_reg_state *eq_branch_regs; 11472 10273 u8 opcode = BPF_OP(insn->code); 11473 10274 bool is_jmp32; 11474 10275 int pred = -1; ··· 11579 10378 /* detect if we are comparing against a constant value so we can adjust 11580 10379 * our min/max values for our dst register. 11581 10380 * this is only legit if both are scalars (or pointers to the same 11582 - * object, I suppose, but we don't support that right now), because 11583 - * otherwise the different base pointers mean the offsets aren't 10381 + * object, I suppose, see the PTR_MAYBE_NULL related if block below), 10382 + * because otherwise the different base pointers mean the offsets aren't 11584 10383 * comparable. 
11585 10384 */ 11586 10385 if (BPF_SRC(insn->code) == BPF_X) { ··· 11627 10426 !WARN_ON_ONCE(dst_reg->id != other_branch_regs[insn->dst_reg].id)) { 11628 10427 find_equal_scalars(this_branch, dst_reg); 11629 10428 find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]); 10429 + } 10430 + 10431 + /* if one pointer register is compared to another pointer 10432 + * register check if PTR_MAYBE_NULL could be lifted. 10433 + * E.g. register A - maybe null 10434 + * register B - not null 10435 + * for JNE A, B, ... - A is not null in the false branch; 10436 + * for JEQ A, B, ... - A is not null in the true branch. 10437 + */ 10438 + if (!is_jmp32 && BPF_SRC(insn->code) == BPF_X && 10439 + __is_pointer_value(false, src_reg) && __is_pointer_value(false, dst_reg) && 10440 + type_may_be_null(src_reg->type) != type_may_be_null(dst_reg->type)) { 10441 + eq_branch_regs = NULL; 10442 + switch (opcode) { 10443 + case BPF_JEQ: 10444 + eq_branch_regs = other_branch_regs; 10445 + break; 10446 + case BPF_JNE: 10447 + eq_branch_regs = regs; 10448 + break; 10449 + default: 10450 + /* do nothing */ 10451 + break; 10452 + } 10453 + if (eq_branch_regs) { 10454 + if (type_may_be_null(src_reg->type)) 10455 + mark_ptr_not_null_reg(&eq_branch_regs[insn->src_reg]); 10456 + else 10457 + mark_ptr_not_null_reg(&eq_branch_regs[insn->dst_reg]); 10458 + } 11630 10459 } 11631 10460 11632 10461 /* detect if R == 0 where R is returned from bpf_map_lookup_elem(). 
··· 11765 10534 insn->src_reg == BPF_PSEUDO_MAP_IDX_VALUE) { 11766 10535 dst_reg->type = PTR_TO_MAP_VALUE; 11767 10536 dst_reg->off = aux->map_off; 11768 - if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) 11769 - dst_reg->id = ++env->id_gen; 10537 + WARN_ON_ONCE(map->max_entries != 1); 10538 + /* We want reg->id to be same (0) as map_value is not distinct */ 11770 10539 } else if (insn->src_reg == BPF_PSEUDO_MAP_FD || 11771 10540 insn->src_reg == BPF_PSEUDO_MAP_IDX) { 11772 10541 dst_reg->type = CONST_PTR_TO_MAP; ··· 11844 10613 return err; 11845 10614 } 11846 10615 11847 - if (env->cur_state->active_spin_lock) { 10616 + if (env->cur_state->active_lock.ptr) { 11848 10617 verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_spin_lock-ed region\n"); 10618 + return -EINVAL; 10619 + } 10620 + 10621 + if (env->cur_state->active_rcu_lock) { 10622 + verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_rcu_read_lock-ed region\n"); 11849 10623 return -EINVAL; 11850 10624 } 11851 10625 ··· 13115 11879 if (old->speculative && !cur->speculative) 13116 11880 return false; 13117 11881 13118 - if (old->active_spin_lock != cur->active_spin_lock) 11882 + if (old->active_lock.ptr != cur->active_lock.ptr || 11883 + old->active_lock.id != cur->active_lock.id) 11884 + return false; 11885 + 11886 + if (old->active_rcu_lock != cur->active_rcu_lock) 13119 11887 return false; 13120 11888 13121 11889 /* for states to be equal callsites have to be the same ··· 13764 12524 return -EINVAL; 13765 12525 } 13766 12526 13767 - if (env->cur_state->active_spin_lock && 13768 - (insn->src_reg == BPF_PSEUDO_CALL || 13769 - insn->imm != BPF_FUNC_spin_unlock)) { 13770 - verbose(env, "function calls are not allowed while holding a lock\n"); 13771 - return -EINVAL; 12527 + if (env->cur_state->active_lock.ptr) { 12528 + if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) || 12529 + (insn->src_reg == BPF_PSEUDO_CALL) || 12530 + (insn->src_reg == BPF_PSEUDO_KFUNC_CALL && 12531 + 
(insn->off != 0 || !is_bpf_list_api_kfunc(insn->imm)))) { 12532 + verbose(env, "function calls are not allowed while holding a lock\n"); 12533 + return -EINVAL; 12534 + } 13772 12535 } 13773 12536 if (insn->src_reg == BPF_PSEUDO_CALL) 13774 12537 err = check_func_call(env, insn, &env->insn_idx); ··· 13804 12561 return -EINVAL; 13805 12562 } 13806 12563 13807 - if (env->cur_state->active_spin_lock) { 12564 + if (env->cur_state->active_lock.ptr) { 13808 12565 verbose(env, "bpf_spin_unlock is missing\n"); 12566 + return -EINVAL; 12567 + } 12568 + 12569 + if (env->cur_state->active_rcu_lock) { 12570 + verbose(env, "bpf_rcu_read_unlock is missing\n"); 13809 12571 return -EINVAL; 13810 12572 } 13811 12573 ··· 14065 12817 14066 12818 { 14067 12819 enum bpf_prog_type prog_type = resolve_prog_type(prog); 12820 + 12821 + if (btf_record_has_field(map->record, BPF_LIST_HEAD)) { 12822 + if (is_tracing_prog_type(prog_type)) { 12823 + verbose(env, "tracing progs cannot use bpf_list_head yet\n"); 12824 + return -EINVAL; 12825 + } 12826 + } 14068 12827 14069 12828 if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) { 14070 12829 if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) { ··· 14909 13654 break; 14910 13655 case PTR_TO_BTF_ID: 14911 13656 case PTR_TO_BTF_ID | PTR_UNTRUSTED: 13657 + /* PTR_TO_BTF_ID | MEM_ALLOC always has a valid lifetime, unlike 13658 + * PTR_TO_BTF_ID, and an active ref_obj_id, but the same cannot 13659 + * be said once it is marked PTR_UNTRUSTED, hence we must handle 13660 + * any faults for loads into such types. BPF_WRITE is disallowed 13661 + * for this case. 
13662 + */ 13663 + case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED: 14912 13664 if (type == BPF_READ) { 14913 13665 insn->code = BPF_LDX | BPF_PROBE_MEM | 14914 13666 BPF_SIZE((insn)->code); ··· 15281 14019 return err; 15282 14020 } 15283 14021 15284 - static int fixup_kfunc_call(struct bpf_verifier_env *env, 15285 - struct bpf_insn *insn) 14022 + static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, 14023 + struct bpf_insn *insn_buf, int insn_idx, int *cnt) 15286 14024 { 15287 14025 const struct bpf_kfunc_desc *desc; 15288 14026 ··· 15301 14039 return -EFAULT; 15302 14040 } 15303 14041 14042 + *cnt = 0; 15304 14043 insn->imm = desc->imm; 14044 + if (insn->off) 14045 + return 0; 14046 + if (desc->func_id == special_kfunc_list[KF_bpf_obj_new_impl]) { 14047 + struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta; 14048 + struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) }; 14049 + u64 obj_new_size = env->insn_aux_data[insn_idx].obj_new_size; 15305 14050 14051 + insn_buf[0] = BPF_MOV64_IMM(BPF_REG_1, obj_new_size); 14052 + insn_buf[1] = addr[0]; 14053 + insn_buf[2] = addr[1]; 14054 + insn_buf[3] = *insn; 14055 + *cnt = 4; 14056 + } else if (desc->func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) { 14057 + struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta; 14058 + struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) }; 14059 + 14060 + insn_buf[0] = addr[0]; 14061 + insn_buf[1] = addr[1]; 14062 + insn_buf[2] = *insn; 14063 + *cnt = 3; 14064 + } else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] || 14065 + desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) { 14066 + insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1); 14067 + *cnt = 1; 14068 + } 15306 14069 return 0; 15307 14070 } 15308 14071 ··· 15469 14182 if (insn->src_reg == BPF_PSEUDO_CALL) 15470 14183 continue; 15471 14184 if (insn->src_reg == 
BPF_PSEUDO_KFUNC_CALL) { 15472 - ret = fixup_kfunc_call(env, insn); 14185 + ret = fixup_kfunc_call(env, insn, insn_buf, i + delta, &cnt); 15473 14186 if (ret) 15474 14187 return ret; 14188 + if (cnt == 0) 14189 + continue; 14190 + 14191 + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt); 14192 + if (!new_prog) 14193 + return -ENOMEM; 14194 + 14195 + delta += cnt - 1; 14196 + env->prog = prog = new_prog; 14197 + insn = new_prog->insnsi + i + delta; 15475 14198 continue; 15476 14199 } 15477 14200 ··· 15599 14302 goto patch_call_imm; 15600 14303 } 15601 14304 15602 - if (insn->imm == BPF_FUNC_task_storage_get || 15603 - insn->imm == BPF_FUNC_sk_storage_get || 15604 - insn->imm == BPF_FUNC_inode_storage_get || 15605 - insn->imm == BPF_FUNC_cgrp_storage_get) { 15606 - if (env->prog->aux->sleepable) 15607 - insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL); 15608 - else 14305 + if (is_storage_get_function(insn->imm)) { 14306 + if (!env->prog->aux->sleepable || 14307 + env->insn_aux_data[i + delta].storage_get_func_atomic) 15609 14308 insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC); 14309 + else 14310 + insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL); 15610 14311 insn_buf[1] = *insn; 15611 14312 cnt = 2; 15612 14313 ··· 15674 14379 BUILD_BUG_ON(!__same_type(ops->map_peek_elem, 15675 14380 (int (*)(struct bpf_map *map, void *value))NULL)); 15676 14381 BUILD_BUG_ON(!__same_type(ops->map_redirect, 15677 - (int (*)(struct bpf_map *map, u32 ifindex, u64 flags))NULL)); 14382 + (int (*)(struct bpf_map *map, u64 index, u64 flags))NULL)); 15678 14383 BUILD_BUG_ON(!__same_type(ops->map_for_each_callback, 15679 14384 (int (*)(struct bpf_map *map, 15680 14385 bpf_callback_t callback_fn, ··· 16683 15388 env->bypass_spec_v1 = bpf_bypass_spec_v1(); 16684 15389 env->bypass_spec_v4 = bpf_bypass_spec_v4(); 16685 15390 env->bpf_capable = bpf_capable(); 15391 + env->rcu_tag_supported = btf_vmlinux && 15392 + 
btf_find_by_name_kind(btf_vmlinux, "rcu", BTF_KIND_TYPE_TAG) > 0; 16686 15393 16687 15394 if (is_priv) 16688 15395 env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
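The verifier hunks above replace the single `active_spin_lock` value with an `active_lock` `(ptr, id)` pair and add a separate `active_rcu_lock` flag, and `states_equal()` must now compare all three. A minimal userspace sketch of that equivalence rule (the struct here is illustrative, not the kernel's `bpf_verifier_state`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the verifier state fields touched by this series:
 * the old boolean-ish active_spin_lock becomes a (ptr, id) pair so the
 * verifier can pair each unlock with the exact lock that was taken, and
 * active_rcu_lock tracks bpf_rcu_read_lock() sections separately.
 * Illustrative layout only, not the kernel's actual struct. */
struct verifier_state {
	struct {
		void *ptr;		/* map value / allocated object holding the lock */
		unsigned int id;	/* reg->id of the register the lock was taken on */
	} active_lock;
	bool active_rcu_lock;
};

/* Mirrors the new states_equal() checks: states are only equivalent if
 * they hold the same lock (same ptr AND same id) and agree on whether
 * an RCU read-side section is currently open. */
static bool lock_state_equal(const struct verifier_state *old,
			     const struct verifier_state *cur)
{
	if (old->active_lock.ptr != cur->active_lock.ptr ||
	    old->active_lock.id != cur->active_lock.id)
		return false;
	if (old->active_rcu_lock != cur->active_rcu_lock)
		return false;
	return true;
}
```

Tracking the id alongside the pointer is what lets the verifier match a `bpf_spin_unlock()` against the specific object whose lock was taken, instead of treating "some lock is held" as a single global bit.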
+3 -3
kernel/trace/bpf_trace.c
··· 774 774 const struct bpf_func_proto bpf_get_current_task_btf_proto = { 775 775 .func = bpf_get_current_task_btf, 776 776 .gpl_only = true, 777 - .ret_type = RET_PTR_TO_BTF_ID, 777 + .ret_type = RET_PTR_TO_BTF_ID_TRUSTED, 778 778 .ret_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK], 779 779 }; 780 780 ··· 1485 1485 case BPF_FUNC_get_task_stack: 1486 1486 return &bpf_get_task_stack_proto; 1487 1487 case BPF_FUNC_copy_from_user: 1488 - return prog->aux->sleepable ? &bpf_copy_from_user_proto : NULL; 1488 + return &bpf_copy_from_user_proto; 1489 1489 case BPF_FUNC_copy_from_user_task: 1490 - return prog->aux->sleepable ? &bpf_copy_from_user_task_proto : NULL; 1490 + return &bpf_copy_from_user_task_proto; 1491 1491 case BPF_FUNC_snprintf_btf: 1492 1492 return &bpf_snprintf_btf_proto; 1493 1493 case BPF_FUNC_per_cpu_ptr:
+7 -7
net/bpf/bpf_dummy_struct_ops.c
··· 156 156 } 157 157 158 158 static int bpf_dummy_ops_btf_struct_access(struct bpf_verifier_log *log, 159 - const struct btf *btf, 160 - const struct btf_type *t, int off, 161 - int size, enum bpf_access_type atype, 159 + const struct bpf_reg_state *reg, 160 + int off, int size, enum bpf_access_type atype, 162 161 u32 *next_btf_id, 163 162 enum bpf_type_flag *flag) 164 163 { 165 164 const struct btf_type *state; 165 + const struct btf_type *t; 166 166 s32 type_id; 167 167 int err; 168 168 169 - type_id = btf_find_by_name_kind(btf, "bpf_dummy_ops_state", 169 + type_id = btf_find_by_name_kind(reg->btf, "bpf_dummy_ops_state", 170 170 BTF_KIND_STRUCT); 171 171 if (type_id < 0) 172 172 return -EINVAL; 173 173 174 - state = btf_type_by_id(btf, type_id); 174 + t = btf_type_by_id(reg->btf, reg->btf_id); 175 + state = btf_type_by_id(reg->btf, type_id); 175 176 if (t != state) { 176 177 bpf_log(log, "only access to bpf_dummy_ops_state is supported\n"); 177 178 return -EACCES; 178 179 } 179 180 180 - err = btf_struct_access(log, btf, t, off, size, atype, next_btf_id, 181 - flag); 181 + err = btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); 182 182 if (err < 0) 183 183 return err; 184 184
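The `btf_struct_access()` refactor above drops the separate `(btf, t)` parameters in favour of the register state, since `reg->btf` and `reg->btf_id` already identify the type being accessed and the callee can look it up itself. A toy model of that calling convention (all names here are hypothetical, standing in for the kernel's BTF machinery):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of the signature change: callers used to pass
 * a BTF object and a resolved type; now they pass the register state
 * and the callback derives the type from reg->btf / reg->btf_id. */
struct mini_btf {
	const char **type_names;	/* indexed by type id */
};

struct mini_reg {
	const struct mini_btf *btf;	/* which BTF the id refers to */
	int btf_id;			/* type id within that BTF */
};

/* What each converted callback now does internally instead of
 * receiving the type as an argument. */
static const char *reg_type_name(const struct mini_reg *reg)
{
	return reg->btf->type_names[reg->btf_id];
}
```

Passing the register instead of the pre-resolved type means callbacks can also consult other register metadata (offsets, flags) without further signature churn.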
-3
net/bpf/test_run.c
··· 980 980 { 981 981 struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb; 982 982 983 - if (!skb->len) 984 - return -EINVAL; 985 - 986 983 if (!__skb) 987 984 return 0; 988 985
+28 -29
net/core/filter.c
··· 2124 2124 { 2125 2125 unsigned int mlen = skb_network_offset(skb); 2126 2126 2127 + if (unlikely(skb->len <= mlen)) { 2128 + kfree_skb(skb); 2129 + return -ERANGE; 2130 + } 2131 + 2127 2132 if (mlen) { 2128 2133 __skb_pull(skb, mlen); 2129 - if (unlikely(!skb->len)) { 2130 - kfree_skb(skb); 2131 - return -ERANGE; 2132 - } 2133 2134 2134 2135 /* At ingress, the mac header has already been pulled once. 2135 2136 * At egress, skb_pospull_rcsum has to be done in case that ··· 2150 2149 u32 flags) 2151 2150 { 2152 2151 /* Verify that a link layer header is carried */ 2153 - if (unlikely(skb->mac_header >= skb->network_header)) { 2152 + if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) { 2154 2153 kfree_skb(skb); 2155 2154 return -ERANGE; 2156 2155 } ··· 4109 4108 .arg2_type = ARG_ANYTHING, 4110 4109 }; 4111 4110 4112 - /* XDP_REDIRECT works by a three-step process, implemented in the functions 4111 + /** 4112 + * DOC: xdp redirect 4113 + * 4114 + * XDP_REDIRECT works by a three-step process, implemented in the functions 4113 4115 * below: 4114 4116 * 4115 4117 * 1. The bpf_redirect() and bpf_redirect_map() helpers will lookup the target ··· 4127 4123 * 3. Before exiting its NAPI poll loop, the driver will call xdp_do_flush(), 4128 4124 * which will flush all the different bulk queues, thus completing the 4129 4125 * redirect. 4130 - * 4126 + */ 4127 + /* 4131 4128 * Pointers to the map entries will be kept around for this whole sequence of 4132 4129 * steps, protected by RCU. 
However, there is no top-level rcu_read_lock() in 4133 4130 * the core code; instead, the RCU protection relies on everything happening ··· 4419 4414 .arg2_type = ARG_ANYTHING, 4420 4415 }; 4421 4416 4422 - BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, 4417 + BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u64, key, 4423 4418 u64, flags) 4424 4419 { 4425 - return map->ops->map_redirect(map, ifindex, flags); 4420 + return map->ops->map_redirect(map, key, flags); 4426 4421 } 4427 4422 4428 4423 static const struct bpf_func_proto bpf_xdp_redirect_map_proto = { ··· 8656 8651 DEFINE_MUTEX(nf_conn_btf_access_lock); 8657 8652 EXPORT_SYMBOL_GPL(nf_conn_btf_access_lock); 8658 8653 8659 - int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf, 8660 - const struct btf_type *t, int off, int size, 8661 - enum bpf_access_type atype, u32 *next_btf_id, 8662 - enum bpf_type_flag *flag); 8654 + int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, 8655 + const struct bpf_reg_state *reg, 8656 + int off, int size, enum bpf_access_type atype, 8657 + u32 *next_btf_id, enum bpf_type_flag *flag); 8663 8658 EXPORT_SYMBOL_GPL(nfct_btf_struct_access); 8664 8659 8665 8660 static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log, 8666 - const struct btf *btf, 8667 - const struct btf_type *t, int off, 8668 - int size, enum bpf_access_type atype, 8669 - u32 *next_btf_id, 8670 - enum bpf_type_flag *flag) 8661 + const struct bpf_reg_state *reg, 8662 + int off, int size, enum bpf_access_type atype, 8663 + u32 *next_btf_id, enum bpf_type_flag *flag) 8671 8664 { 8672 8665 int ret = -EACCES; 8673 8666 8674 8667 if (atype == BPF_READ) 8675 - return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, 8676 - flag); 8668 + return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); 8677 8669 8678 8670 mutex_lock(&nf_conn_btf_access_lock); 8679 8671 if (nfct_btf_struct_access) 8680 - ret = 
nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag); 8672 + ret = nfct_btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); 8681 8673 mutex_unlock(&nf_conn_btf_access_lock); 8682 8674 8683 8675 return ret; ··· 8740 8738 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action); 8741 8739 8742 8740 static int xdp_btf_struct_access(struct bpf_verifier_log *log, 8743 - const struct btf *btf, 8744 - const struct btf_type *t, int off, 8745 - int size, enum bpf_access_type atype, 8746 - u32 *next_btf_id, 8747 - enum bpf_type_flag *flag) 8741 + const struct bpf_reg_state *reg, 8742 + int off, int size, enum bpf_access_type atype, 8743 + u32 *next_btf_id, enum bpf_type_flag *flag) 8748 8744 { 8749 8745 int ret = -EACCES; 8750 8746 8751 8747 if (atype == BPF_READ) 8752 - return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, 8753 - flag); 8748 + return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); 8754 8749 8755 8750 mutex_lock(&nf_conn_btf_access_lock); 8756 8751 if (nfct_btf_struct_access) 8757 - ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag); 8752 + ret = nfct_btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); 8758 8753 mutex_unlock(&nf_conn_btf_access_lock); 8759 8754 8760 8755 return ret;
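The filter.c rework above rejects a too-short skb up front, before pulling the mac header, instead of pulling first and only then noticing the skb became empty (the companion test_run.c hunk drops its now-redundant zero-length check). Reduced to plain integers, the new bound looks like this (a sketch, not the kernel code):

```c
#include <assert.h>

/* Sketch of the reordered length check in the skb redirect path: any
 * skb whose total length does not exceed the bytes about to be pulled
 * is rejected before the pull, so we never end up holding an empty skb. */
#define SKETCH_ERANGE 34

static int check_redirect_len(unsigned int skb_len, unsigned int mac_len)
{
	if (skb_len <= mac_len)
		return -SKETCH_ERANGE;	/* would leave a zero-length skb */
	return 0;
}
```

Doing the comparison against the full length before mutating the skb is what lets existing BPF_PROG_TEST_RUN users keep passing zero-length inputs through paths that never reach the redirect.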
+9 -8
net/ipv4/bpf_tcp_ca.c
··· 61 61 if (!bpf_tracing_btf_ctx_access(off, size, type, prog, info)) 62 62 return false; 63 63 64 - if (info->reg_type == PTR_TO_BTF_ID && info->btf_id == sock_id) 64 + if (base_type(info->reg_type) == PTR_TO_BTF_ID && 65 + !bpf_type_has_unsafe_modifiers(info->reg_type) && 66 + info->btf_id == sock_id) 65 67 /* promote it to tcp_sock */ 66 68 info->btf_id = tcp_sock_id; 67 69 ··· 71 69 } 72 70 73 71 static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log, 74 - const struct btf *btf, 75 - const struct btf_type *t, int off, 76 - int size, enum bpf_access_type atype, 77 - u32 *next_btf_id, 78 - enum bpf_type_flag *flag) 72 + const struct bpf_reg_state *reg, 73 + int off, int size, enum bpf_access_type atype, 74 + u32 *next_btf_id, enum bpf_type_flag *flag) 79 75 { 76 + const struct btf_type *t; 80 77 size_t end; 81 78 82 79 if (atype == BPF_READ) 83 - return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, 84 - flag); 80 + return btf_struct_access(log, reg, off, size, atype, next_btf_id, flag); 85 81 82 + t = btf_type_by_id(reg->btf, reg->btf_id); 86 83 if (t != tcp_sock_type) { 87 84 bpf_log(log, "only read is supported\n"); 88 85 return -EACCES;
+7 -10
net/netfilter/nf_conntrack_bpf.c
··· 191 191 192 192 /* Check writes into `struct nf_conn` */ 193 193 static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log, 194 - const struct btf *btf, 195 - const struct btf_type *t, int off, 196 - int size, enum bpf_access_type atype, 197 - u32 *next_btf_id, 198 - enum bpf_type_flag *flag) 194 + const struct bpf_reg_state *reg, 195 + int off, int size, enum bpf_access_type atype, 196 + u32 *next_btf_id, enum bpf_type_flag *flag) 199 197 { 200 - const struct btf_type *ncit; 201 - const struct btf_type *nct; 198 + const struct btf_type *ncit, *nct, *t; 202 199 size_t end; 203 200 204 - ncit = btf_type_by_id(btf, btf_nf_conn_ids[1]); 205 - nct = btf_type_by_id(btf, btf_nf_conn_ids[0]); 206 - 201 + ncit = btf_type_by_id(reg->btf, btf_nf_conn_ids[1]); 202 + nct = btf_type_by_id(reg->btf, btf_nf_conn_ids[0]); 203 + t = btf_type_by_id(reg->btf, reg->btf_id); 207 204 if (t != nct && t != ncit) { 208 205 bpf_log(log, "only read is supported\n"); 209 206 return -EACCES;
+2 -2
net/xdp/xskmap.c
··· 231 231 return 0; 232 232 } 233 233 234 - static int xsk_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags) 234 + static int xsk_map_redirect(struct bpf_map *map, u64 index, u64 flags) 235 235 { 236 - return __bpf_xdp_redirect_map(map, ifindex, flags, 0, 236 + return __bpf_xdp_redirect_map(map, index, flags, 0, 237 237 __xsk_map_lookup_elem); 238 238 } 239 239
+1 -1
samples/bpf/test_cgrp2_tc.sh
··· 115 115 if [ "$DEBUG" == "yes" ] && [ "$MODE" != 'cleanuponly' ] 116 116 then 117 117 echo "------ DEBUG ------" 118 - echo "mount: "; mount | egrep '(cgroup2|bpf)'; echo 118 + echo "mount: "; mount | grep -E '(cgroup2|bpf)'; echo 119 119 echo "$CGRP2_TC_LEAF: "; ls -l $CGRP2_TC_LEAF; echo 120 120 if [ -d "$BPF_FS_TC_SHARE" ] 121 121 then
+1 -1
samples/bpf/xdp_router_ipv4_user.c
··· 162 162 __be32 gw; 163 163 } *prefix_value; 164 164 165 - prefix_key = alloca(sizeof(*prefix_key) + 3); 165 + prefix_key = alloca(sizeof(*prefix_key) + 4); 166 166 prefix_value = alloca(sizeof(*prefix_value)); 167 167 168 168 prefix_key->prefixlen = 32;
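The one-byte fix above (`+ 3` becoming `+ 4`) accounts for the LPM trie key's data portion being a full 4-byte IPv4 address, not 3 bytes. Spelling the key out as a struct makes the required allocation size explicit (illustrative layout, assuming the usual 4-byte `prefixlen` header with no padding):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the IPv4 LPM trie key the sample allocates with alloca():
 * a 4-byte prefix length followed by the address bytes themselves.
 * For IPv4 the data portion is 4 bytes, so the total key is 8 bytes -
 * allocating sizeof(prefixlen) + 3 was one byte short. */
struct lpm_v4_key {
	uint32_t prefixlen;	/* up to 32 for IPv4 */
	uint8_t  data[4];	/* the full address: 4 bytes, not 3 */
};
```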
-9
tools/bpf/bpftool/Documentation/common_options.rst
··· 23 23 Print all logs available, even debug-level information. This includes 24 24 logs from libbpf as well as from the verifier, when attempting to 25 25 load programs. 26 - 27 - -l, --legacy 28 - Use legacy libbpf mode which has more relaxed BPF program 29 - requirements. By default, bpftool has more strict requirements 30 - about section names, changes pinning logic and doesn't support 31 - some of the older non-BTF map declarations. 32 - 33 - See https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0 34 - for details.
+1 -1
tools/bpf/bpftool/Documentation/substitutions.rst
··· 1 1 .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 2 3 - .. |COMMON_OPTIONS| replace:: { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } | { **-l** | **--legacy** } 3 + .. |COMMON_OPTIONS| replace:: { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** }
+1 -1
tools/bpf/bpftool/bash-completion/bpftool
··· 261 261 # Deal with options 262 262 if [[ ${words[cword]} == -* ]]; then 263 263 local c='--version --json --pretty --bpffs --mapcompat --debug \ 264 - --use-loader --base-btf --legacy' 264 + --use-loader --base-btf' 265 265 COMPREPLY=( $( compgen -W "$c" -- "$cur" ) ) 266 266 return 0 267 267 fi
+8 -11
tools/bpf/bpftool/btf.c
··· 467 467 int err = 0, i; 468 468 469 469 d = btf_dump__new(btf, btf_dump_printf, NULL, NULL); 470 - err = libbpf_get_error(d); 471 - if (err) 472 - return err; 470 + if (!d) 471 + return -errno; 473 472 474 473 printf("#ifndef __VMLINUX_H__\n"); 475 474 printf("#define __VMLINUX_H__\n"); ··· 511 512 struct btf *base; 512 513 513 514 base = btf__parse(sysfs_vmlinux, NULL); 514 - if (libbpf_get_error(base)) { 515 - p_err("failed to parse vmlinux BTF at '%s': %ld\n", 516 - sysfs_vmlinux, libbpf_get_error(base)); 517 - base = NULL; 518 - } 515 + if (!base) 516 + p_err("failed to parse vmlinux BTF at '%s': %d\n", 517 + sysfs_vmlinux, -errno); 519 518 520 519 return base; 521 520 } ··· 556 559 __u32 btf_id = -1; 557 560 const char *src; 558 561 int fd = -1; 559 - int err; 562 + int err = 0; 560 563 561 564 if (!REQ_ARGS(2)) { 562 565 usage(); ··· 631 634 base = get_vmlinux_btf_from_sysfs(); 632 635 633 636 btf = btf__parse_split(*argv, base ?: base_btf); 634 - err = libbpf_get_error(btf); 635 637 if (!btf) { 638 + err = -errno; 636 639 p_err("failed to load BTF from %s: %s", 637 640 *argv, strerror(errno)); 638 641 goto done; ··· 678 681 } 679 682 680 683 btf = btf__load_from_kernel_by_id_split(btf_id, base_btf); 681 - err = libbpf_get_error(btf); 682 684 if (!btf) { 685 + err = -errno; 683 686 p_err("get btf by id (%u): %s", btf_id, strerror(errno)); 684 687 goto done; 685 688 }
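The bpftool conversions above (and the similar hunks in gen.c, iter.c, map.c, prog.c and struct_ops.c) follow the libbpf 1.0 error convention: constructors return NULL and set errno on failure rather than encoding the error into the returned pointer, so `libbpf_get_error()` calls become plain NULL checks. A hypothetical constructor showing the caller-side pattern (`alloc_thing` stands in for any libbpf constructor such as `btf_dump__new()`):

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for a libbpf 1.0-style constructor: on failure it returns
 * NULL with errno set; it never returns an encoded error pointer. */
static void *alloc_thing(int should_fail)
{
	if (should_fail) {
		errno = ENOMEM;
		return NULL;
	}
	return (void *)0x1;	/* stand-in for a real heap object */
}

/* Caller-side pattern after the conversion: check the pointer, then
 * capture -errno immediately, before anything can clobber it. */
static int open_thing(int should_fail)
{
	void *d = alloc_thing(should_fail);

	if (!d)
		return -errno;
	return 0;
}
```

Capturing `-errno` right at the failed call is the important part; the removed `libbpf_get_error()` pattern allowed the check and the error retrieval to drift apart.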
+1 -1
tools/bpf/bpftool/btf_dumper.c
··· 75 75 goto print; 76 76 77 77 prog_btf = btf__load_from_kernel_by_id(info.btf_id); 78 - if (libbpf_get_error(prog_btf)) 78 + if (!prog_btf) 79 79 goto print; 80 80 func_type = btf__type_by_id(prog_btf, finfo.type_id); 81 81 if (!func_type || !btf_is_func(func_type))
+4 -6
tools/bpf/bpftool/gen.c
··· 252 252 int err = 0; 253 253 254 254 d = btf_dump__new(btf, codegen_btf_dump_printf, NULL, NULL); 255 - err = libbpf_get_error(d); 256 - if (err) 257 - return err; 255 + if (!d) 256 + return -errno; 258 257 259 258 bpf_object__for_each_map(map, obj) { 260 259 /* only generate definitions for memory-mapped internal maps */ ··· 975 976 /* log_level1 + log_level2 + stats, but not stable UAPI */ 976 977 opts.kernel_log_level = 1 + 2 + 4; 977 978 obj = bpf_object__open_mem(obj_data, file_sz, &opts); 978 - err = libbpf_get_error(obj); 979 - if (err) { 979 + if (!obj) { 980 980 char err_buf[256]; 981 981 982 + err = -errno; 982 983 libbpf_strerror(err, err_buf, sizeof(err_buf)); 983 984 p_err("failed to open BPF object file: %s", err_buf); 984 - obj = NULL; 985 985 goto out; 986 986 } 987 987
+6 -4
tools/bpf/bpftool/iter.c
··· 4 4 #ifndef _GNU_SOURCE 5 5 #define _GNU_SOURCE 6 6 #endif 7 + #include <errno.h> 7 8 #include <unistd.h> 8 9 #include <linux/err.h> 9 10 #include <bpf/libbpf.h> ··· 49 48 } 50 49 51 50 obj = bpf_object__open(objfile); 52 - err = libbpf_get_error(obj); 53 - if (err) { 51 + if (!obj) { 52 + err = -errno; 54 53 p_err("can't open objfile %s", objfile); 55 54 goto close_map_fd; 56 55 } ··· 63 62 64 63 prog = bpf_object__next_program(obj, NULL); 65 64 if (!prog) { 65 + err = -errno; 66 66 p_err("can't find bpf program in objfile %s", objfile); 67 67 goto close_obj; 68 68 } 69 69 70 70 link = bpf_program__attach_iter(prog, &iter_opts); 71 - err = libbpf_get_error(link); 72 - if (err) { 71 + if (!link) { 72 + err = -errno; 73 73 p_err("attach_iter failed for program %s", 74 74 bpf_program__name(prog)); 75 75 goto close_obj;
+6 -22
tools/bpf/bpftool/main.c
··· 31 31 bool verifier_logs; 32 32 bool relaxed_maps; 33 33 bool use_loader; 34 - bool legacy_libbpf; 35 34 struct btf *base_btf; 36 35 struct hashmap *refs_table; 37 36 ··· 159 160 jsonw_start_object(json_wtr); /* features */ 160 161 jsonw_bool_field(json_wtr, "libbfd", has_libbfd); 161 162 jsonw_bool_field(json_wtr, "llvm", has_llvm); 162 - jsonw_bool_field(json_wtr, "libbpf_strict", !legacy_libbpf); 163 163 jsonw_bool_field(json_wtr, "skeletons", has_skeletons); 164 164 jsonw_bool_field(json_wtr, "bootstrap", bootstrap); 165 165 jsonw_end_object(json_wtr); /* features */ ··· 177 179 printf("features:"); 178 180 print_feature("libbfd", has_libbfd, &nb_features); 179 181 print_feature("llvm", has_llvm, &nb_features); 180 - print_feature("libbpf_strict", !legacy_libbpf, &nb_features); 181 182 print_feature("skeletons", has_skeletons, &nb_features); 182 183 print_feature("bootstrap", bootstrap, &nb_features); 183 184 printf("\n"); ··· 334 337 if (argc < 2) { 335 338 p_err("too few parameters for batch"); 336 339 return -1; 337 - } else if (!is_prefix(*argv, "file")) { 338 - p_err("expected 'file', got: %s", *argv); 339 - return -1; 340 340 } else if (argc > 2) { 341 341 p_err("too many parameters for batch"); 342 + return -1; 343 + } else if (!is_prefix(*argv, "file")) { 344 + p_err("expected 'file', got: %s", *argv); 342 345 return -1; 343 346 } 344 347 NEXT_ARG(); ··· 448 451 { "debug", no_argument, NULL, 'd' }, 449 452 { "use-loader", no_argument, NULL, 'L' }, 450 453 { "base-btf", required_argument, NULL, 'B' }, 451 - { "legacy", no_argument, NULL, 'l' }, 452 454 { 0 } 453 455 }; 454 456 bool version_requested = false; ··· 510 514 break; 511 515 case 'B': 512 516 base_btf = btf__parse(optarg, NULL); 513 - if (libbpf_get_error(base_btf)) { 514 - p_err("failed to parse base BTF at '%s': %ld\n", 515 - optarg, libbpf_get_error(base_btf)); 516 - base_btf = NULL; 517 + if (!base_btf) { 518 + p_err("failed to parse base BTF at '%s': %d\n", 519 + optarg, -errno); 517 
520 return -1; 518 521 } 519 522 break; 520 523 case 'L': 521 524 use_loader = true; 522 - break; 523 - case 'l': 524 - legacy_libbpf = true; 525 525 break; 526 526 default: 527 527 p_err("unrecognized option '%s'", argv[optind - 1]); ··· 526 534 else 527 535 usage(); 528 536 } 529 - } 530 - 531 - if (!legacy_libbpf) { 532 - /* Allow legacy map definitions for skeleton generation. 533 - * It will still be rejected if users use LIBBPF_STRICT_ALL 534 - * mode for loading generated skeleton. 535 - */ 536 - libbpf_set_strict_mode(LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS); 537 537 } 538 538 539 539 argc -= optind;
+1 -2
tools/bpf/bpftool/main.h
··· 57 57 #define HELP_SPEC_PROGRAM \ 58 58 "PROG := { id PROG_ID | pinned FILE | tag PROG_TAG | name PROG_NAME }" 59 59 #define HELP_SPEC_OPTIONS \ 60 - "OPTIONS := { {-j|--json} [{-p|--pretty}] | {-d|--debug} | {-l|--legacy}" 60 + "OPTIONS := { {-j|--json} [{-p|--pretty}] | {-d|--debug}" 61 61 #define HELP_SPEC_MAP \ 62 62 "MAP := { id MAP_ID | pinned FILE | name MAP_NAME }" 63 63 #define HELP_SPEC_LINK \ ··· 82 82 extern bool verifier_logs; 83 83 extern bool relaxed_maps; 84 84 extern bool use_loader; 85 - extern bool legacy_libbpf; 86 85 extern struct btf *base_btf; 87 86 extern struct hashmap *refs_table; 88 87
+7 -13
tools/bpf/bpftool/map.c
··· 786 786 if (info->btf_vmlinux_value_type_id) { 787 787 if (!btf_vmlinux) { 788 788 btf_vmlinux = libbpf_find_kernel_btf(); 789 - err = libbpf_get_error(btf_vmlinux); 790 - if (err) { 789 + if (!btf_vmlinux) { 791 790 p_err("failed to get kernel btf"); 792 - return err; 791 + return -errno; 793 792 } 794 793 } 795 794 *btf = btf_vmlinux; 796 795 } else if (info->btf_value_type_id) { 797 796 *btf = btf__load_from_kernel_by_id(info->btf_id); 798 - err = libbpf_get_error(*btf); 799 - if (err) 797 + if (!*btf) { 798 + err = -errno; 800 799 p_err("failed to get btf"); 800 + } 801 801 } else { 802 802 *btf = NULL; 803 803 } ··· 807 807 808 808 static void free_map_kv_btf(struct btf *btf) 809 809 { 810 - if (!libbpf_get_error(btf) && btf != btf_vmlinux) 810 + if (btf != btf_vmlinux) 811 811 btf__free(btf); 812 - } 813 - 814 - static void free_btf_vmlinux(void) 815 - { 816 - if (!libbpf_get_error(btf_vmlinux)) 817 - btf__free(btf_vmlinux); 818 812 } 819 813 820 814 static int ··· 947 953 close(fds[i]); 948 954 exit_free: 949 955 free(fds); 950 - free_btf_vmlinux(); 956 + btf__free(btf_vmlinux); 951 957 return err; 952 958 } 953 959
+5 -10
tools/bpf/bpftool/prog.c
··· 322 322 return; 323 323 324 324 btf = btf__load_from_kernel_by_id(map_info.btf_id); 325 - if (libbpf_get_error(btf)) 325 + if (!btf) 326 326 goto out_free; 327 327 328 328 t_datasec = btf__type_by_id(btf, map_info.btf_value_type_id); ··· 726 726 727 727 if (info->btf_id) { 728 728 btf = btf__load_from_kernel_by_id(info->btf_id); 729 - if (libbpf_get_error(btf)) { 729 + if (!btf) { 730 730 p_err("failed to get btf"); 731 731 return -1; 732 732 } ··· 1663 1663 open_opts.kernel_log_level = 1 + 2 + 4; 1664 1664 1665 1665 obj = bpf_object__open_file(file, &open_opts); 1666 - if (libbpf_get_error(obj)) { 1666 + if (!obj) { 1667 1667 p_err("failed to open object file"); 1668 1668 goto err_free_reuse_maps; 1669 1669 } ··· 1802 1802 else 1803 1803 bpf_object__unpin_programs(obj, pinfile); 1804 1804 err_close_obj: 1805 - if (!legacy_libbpf) { 1806 - p_info("Warning: bpftool is now running in libbpf strict mode and has more stringent requirements about BPF programs.\n" 1807 - "If it used to work for this object file but now doesn't, see --legacy option for more details.\n"); 1808 - } 1809 - 1810 1805 bpf_object__close(obj); 1811 1806 err_free_reuse_maps: 1812 1807 for (i = 0; i < old_map_fds; i++) ··· 1882 1887 open_opts.kernel_log_level = 1 + 2 + 4; 1883 1888 1884 1889 obj = bpf_object__open_file(file, &open_opts); 1885 - if (libbpf_get_error(obj)) { 1890 + if (!obj) { 1886 1891 p_err("failed to open object file"); 1887 1892 goto err_close_obj; 1888 1893 } ··· 2199 2204 } 2200 2205 2201 2206 btf = btf__load_from_kernel_by_id(info.btf_id); 2202 - if (libbpf_get_error(btf)) { 2207 + if (!btf) { 2203 2208 p_err("failed to load btf for prog FD %d", tgt_fd); 2204 2209 goto out; 2205 2210 }
+9 -13
tools/bpf/bpftool/struct_ops.c
··· 32 32 return btf_vmlinux; 33 33 34 34 btf_vmlinux = libbpf_find_kernel_btf(); 35 - if (libbpf_get_error(btf_vmlinux)) 35 + if (!btf_vmlinux) 36 36 p_err("struct_ops requires kernel CONFIG_DEBUG_INFO_BTF=y"); 37 37 38 38 return btf_vmlinux; ··· 45 45 const char *st_ops_name; 46 46 47 47 kern_btf = get_btf_vmlinux(); 48 - if (libbpf_get_error(kern_btf)) 48 + if (!kern_btf) 49 49 return "<btf_vmlinux_not_found>"; 50 50 51 51 t = btf__type_by_id(kern_btf, info->btf_vmlinux_value_type_id); ··· 63 63 return map_info_type_id; 64 64 65 65 kern_btf = get_btf_vmlinux(); 66 - if (libbpf_get_error(kern_btf)) { 67 - map_info_type_id = PTR_ERR(kern_btf); 68 - return map_info_type_id; 69 - } 66 + if (!kern_btf) 67 + return 0; 70 68 71 69 map_info_type_id = btf__find_by_name_kind(kern_btf, "bpf_map_info", 72 70 BTF_KIND_STRUCT); ··· 413 415 } 414 416 415 417 kern_btf = get_btf_vmlinux(); 416 - if (libbpf_get_error(kern_btf)) 418 + if (!kern_btf) 417 419 return -1; 418 420 419 421 if (!json_output) { ··· 496 498 open_opts.kernel_log_level = 1 + 2 + 4; 497 499 498 500 obj = bpf_object__open_file(file, &open_opts); 499 - if (libbpf_get_error(obj)) 501 + if (!obj) 500 502 return -1; 501 503 502 504 set_max_rlimit(); ··· 511 513 continue; 512 514 513 515 link = bpf_map__attach_struct_ops(map); 514 - if (libbpf_get_error(link)) { 516 + if (!link) { 515 517 p_err("can't register struct_ops %s: %s", 516 - bpf_map__name(map), 517 - strerror(-PTR_ERR(link))); 518 + bpf_map__name(map), strerror(errno)); 518 519 nr_errs++; 519 520 continue; 520 521 } ··· 590 593 591 594 err = cmd_select(cmds, argc, argv, do_help); 592 595 593 - if (!libbpf_get_error(btf_vmlinux)) 594 - btf__free(btf_vmlinux); 596 + btf__free(btf_vmlinux); 595 597 596 598 return err; 597 599 }
+23 -10
tools/include/uapi/linux/bpf.h
··· 2584 2584 * * **SOL_SOCKET**, which supports the following *optname*\ s: 2585 2585 * **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**, 2586 2586 * **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**, 2587 - * **SO_BINDTODEVICE**, **SO_KEEPALIVE**. 2587 + * **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**, 2588 + * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**. 2588 2589 * * **IPPROTO_TCP**, which supports the following *optname*\ s: 2589 2590 * **TCP_CONGESTION**, **TCP_BPF_IW**, 2590 2591 * **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**, 2591 2592 * **TCP_KEEPIDLE**, **TCP_KEEPINTVL**, **TCP_KEEPCNT**, 2592 - * **TCP_SYNCNT**, **TCP_USER_TIMEOUT**, **TCP_NOTSENT_LOWAT**. 2593 + * **TCP_SYNCNT**, **TCP_USER_TIMEOUT**, **TCP_NOTSENT_LOWAT**, 2594 + * **TCP_NODELAY**, **TCP_MAXSEG**, **TCP_WINDOW_CLAMP**, 2595 + * **TCP_THIN_LINEAR_TIMEOUTS**, **TCP_BPF_DELACK_MAX**, 2596 + * **TCP_BPF_RTO_MIN**. 2593 2597 * * **IPPROTO_IP**, which supports *optname* **IP_TOS**. 2594 - * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**. 2598 + * * **IPPROTO_IPV6**, which supports the following *optname*\ s: 2599 + * **IPV6_TCLASS**, **IPV6_AUTOFLOWLABEL**. 2595 2600 * Return 2596 2601 * 0 on success, or a negative error in case of failure. 2597 2602 * ··· 2652 2647 * Return 2653 2648 * 0 on success, or a negative error in case of failure. 2654 2649 * 2655 - * long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags) 2650 + * long bpf_redirect_map(struct bpf_map *map, u64 key, u64 flags) 2656 2651 * Description 2657 2652 * Redirect the packet to the endpoint referenced by *map* at 2658 2653 * index *key*. Depending on its type, this *map* can contain ··· 2813 2808 * and **BPF_CGROUP_INET6_CONNECT**. 2814 2809 * 2815 2810 * This helper actually implements a subset of **getsockopt()**. 2816 - * It supports the following *level*\ s: 2817 - * 2818 - * * **IPPROTO_TCP**, which supports *optname* 2819 - * **TCP_CONGESTION**. 
2820 - * * **IPPROTO_IP**, which supports *optname* **IP_TOS**. 2821 - * * **IPPROTO_IPV6**, which supports *optname* **IPV6_TCLASS**. 2811 + * It supports the same set of *optname*\ s that is supported by 2812 + * the **bpf_setsockopt**\ () helper. The exceptions are 2813 + * **TCP_BPF_*** is **bpf_setsockopt**\ () only and 2814 + * **TCP_SAVED_SYN** is **bpf_getsockopt**\ () only. 2822 2815 * Return 2823 2816 * 0 on success, or a negative error in case of failure. 2824 2817 * ··· 6887 6884 } __attribute__((aligned(8))); 6888 6885 6889 6886 struct bpf_dynptr { 6887 + __u64 :64; 6888 + __u64 :64; 6889 + } __attribute__((aligned(8))); 6890 + 6891 + struct bpf_list_head { 6892 + __u64 :64; 6893 + __u64 :64; 6894 + } __attribute__((aligned(8))); 6895 + 6896 + struct bpf_list_node { 6890 6897 __u64 :64; 6891 6898 __u64 :64; 6892 6899 } __attribute__((aligned(8)));
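The new `bpf_list_head` and `bpf_list_node` UAPI types above are opaque placeholders: two reserved 64-bit words, 8-byte aligned, sized to match the in-kernel structures so BPF map values can embed them without seeing their internals. A self-contained re-declaration (named members substituted for the header's unnamed bit-fields, purely so this sketch compiles on its own) lets that size/alignment contract be checked directly:

```c
#include <assert.h>

/* Local re-declarations mirroring the new UAPI types' contract:
 * 16 bytes of opaque storage, 8-byte aligned. The real definitions in
 * tools/include/uapi/linux/bpf.h use two unnamed __u64 bit-fields so
 * the fields cannot be touched from BPF programs by name. */
struct sketch_bpf_list_head {
	unsigned long long r0, r1;
} __attribute__((aligned(8)));

struct sketch_bpf_list_node {
	unsigned long long r0, r1;
} __attribute__((aligned(8)));
```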
+3 -2
tools/lib/bpf/btf.c
··· 1724 1724 memset(btf->strs_data + old_strs_len, 0, btf->hdr->str_len - old_strs_len); 1725 1725 1726 1726 /* and now restore original strings section size; types data size 1727 - * wasn't modified, so doesn't need restoring, see big comment above */ 1727 + * wasn't modified, so doesn't need restoring, see big comment above 1728 + */ 1728 1729 btf->hdr->str_len = old_strs_len; 1729 1730 1730 1731 hashmap__free(p.str_off_map); ··· 2330 2329 */ 2331 2330 int btf__add_type_tag(struct btf *btf, const char *value, int ref_type_id) 2332 2331 { 2333 - if (!value|| !value[0]) 2332 + if (!value || !value[0]) 2334 2333 return libbpf_err(-EINVAL); 2335 2334 2336 2335 return btf_add_ref_kind(btf, BTF_KIND_TYPE_TAG, value, ref_type_id);
+2 -2
tools/lib/bpf/btf_dump.c
··· 1543 1543 if (!new_name) 1544 1544 return 1; 1545 1545 1546 - hashmap__find(name_map, orig_name, &dup_cnt); 1546 + (void)hashmap__find(name_map, orig_name, &dup_cnt); 1547 1547 dup_cnt++; 1548 1548 1549 1549 err = hashmap__set(name_map, new_name, dup_cnt, &old_name, NULL); ··· 1989 1989 { 1990 1990 const struct btf_member *m = btf_members(t); 1991 1991 __u16 n = btf_vlen(t); 1992 - int i, err; 1992 + int i, err = 0; 1993 1993 1994 1994 /* note that we increment depth before calling btf_dump_print() below; 1995 1995 * this is intentional. btf_dump_data_newline() will not print a
+30 -18
tools/lib/bpf/libbpf.c
··· 347 347 SEC_ATTACHABLE = 2, 348 348 SEC_ATTACHABLE_OPT = SEC_ATTACHABLE | SEC_EXP_ATTACH_OPT, 349 349 /* attachment target is specified through BTF ID in either kernel or 350 - * other BPF program's BTF object */ 350 + * other BPF program's BTF object 351 + */ 351 352 SEC_ATTACH_BTF = 4, 352 353 /* BPF program type allows sleeping/blocking in kernel */ 353 354 SEC_SLEEPABLE = 8, ··· 489 488 char *name; 490 489 /* real_name is defined for special internal maps (.rodata*, 491 490 * .data*, .bss, .kconfig) and preserves their original ELF section 492 - * name. This is important to be be able to find corresponding BTF 491 + * name. This is important to be able to find corresponding BTF 493 492 * DATASEC information. 494 493 */ 495 494 char *real_name; ··· 1864 1863 return -ERANGE; 1865 1864 } 1866 1865 switch (ext->kcfg.sz) { 1867 - case 1: *(__u8 *)ext_val = value; break; 1868 - case 2: *(__u16 *)ext_val = value; break; 1869 - case 4: *(__u32 *)ext_val = value; break; 1870 - case 8: *(__u64 *)ext_val = value; break; 1871 - default: 1872 - return -EINVAL; 1866 + case 1: 1867 + *(__u8 *)ext_val = value; 1868 + break; 1869 + case 2: 1870 + *(__u16 *)ext_val = value; 1871 + break; 1872 + case 4: 1873 + *(__u32 *)ext_val = value; 1874 + break; 1875 + case 8: 1876 + *(__u64 *)ext_val = value; 1877 + break; 1878 + default: 1879 + return -EINVAL; 1873 1880 } 1874 1881 ext->is_set = true; 1875 1882 return 0; ··· 2779 2770 m->type = enum64_placeholder_id; 2780 2771 m->offset = 0; 2781 2772 } 2782 - } 2773 + } 2783 2774 } 2784 2775 2785 2776 return 0; ··· 3511 3502 sec_desc->sec_type = SEC_RELO; 3512 3503 sec_desc->shdr = sh; 3513 3504 sec_desc->data = data; 3514 - } else if (sh->sh_type == SHT_NOBITS && strcmp(name, BSS_SEC) == 0) { 3505 + } else if (sh->sh_type == SHT_NOBITS && (strcmp(name, BSS_SEC) == 0 || 3506 + str_has_pfx(name, BSS_SEC "."))) { 3515 3507 sec_desc->sec_type = SEC_BSS; 3516 3508 sec_desc->shdr = sh; 3517 3509 sec_desc->data = data; ··· 3528 3518 } 3529 
3519 3530 3520 /* sort BPF programs by section name and in-section instruction offset 3531 - * for faster search */ 3521 + * for faster search 3522 + */ 3532 3523 if (obj->nr_programs) 3533 3524 qsort(obj->programs, obj->nr_programs, sizeof(*obj->programs), cmp_progs); 3534 3525 ··· 3828 3817 return -EINVAL; 3829 3818 } 3830 3819 ext->kcfg.type = find_kcfg_type(obj->btf, t->type, 3831 - &ext->kcfg.is_signed); 3820 + &ext->kcfg.is_signed); 3832 3821 if (ext->kcfg.type == KCFG_UNKNOWN) { 3833 3822 pr_warn("extern (kcfg) '%s': type is unsupported\n", ext_name); 3834 3823 return -ENOTSUP; ··· 4976 4965 4977 4966 err = bpf_map__reuse_fd(map, pin_fd); 4978 4967 close(pin_fd); 4979 - if (err) { 4968 + if (err) 4980 4969 return err; 4981 - } 4970 + 4982 4971 map->pinned = true; 4983 4972 pr_debug("reused pinned map at '%s'\n", map->pin_path); 4984 4973 ··· 5496 5485 } 5497 5486 5498 5487 err = libbpf_ensure_mem((void **)&obj->btf_modules, &obj->btf_module_cap, 5499 - sizeof(*obj->btf_modules), obj->btf_module_cnt + 1); 5488 + sizeof(*obj->btf_modules), obj->btf_module_cnt + 1); 5500 5489 if (err) 5501 5490 goto err_out; 5502 5491 ··· 6248 6237 * prog; each main prog can have a different set of 6249 6238 * subprograms appended (potentially in different order as 6250 6239 * well), so position of any subprog can be different for 6251 - * different main programs */ 6240 + * different main programs 6241 + */ 6252 6242 insn->imm = subprog->sub_insn_off - (prog->sub_insn_off + insn_idx) - 1; 6253 6243 6254 6244 pr_debug("prog '%s': insn #%zu relocated, imm %d points to subprog '%s' (now at %zu offset)\n", ··· 11007 10995 11008 10996 usdt_cookie = OPTS_GET(opts, usdt_cookie, 0); 11009 10997 link = usdt_manager_attach_usdt(obj->usdt_man, prog, pid, binary_path, 11010 - usdt_provider, usdt_name, usdt_cookie); 10998 + usdt_provider, usdt_name, usdt_cookie); 11011 10999 err = libbpf_get_error(link); 11012 11000 if (err) 11013 11001 return libbpf_err_ptr(err); ··· 12316 12304 btf = 
bpf_object__btf(s->obj); 12317 12305 if (!btf) { 12318 12306 pr_warn("subskeletons require BTF at runtime (object %s)\n", 12319 - bpf_object__name(s->obj)); 12307 + bpf_object__name(s->obj)); 12320 12308 return libbpf_err(-errno); 12321 12309 } 12322 12310
+2 -2
tools/lib/bpf/ringbuf.c
··· 128 128 /* Map read-only producer page and data pages. We map twice as big 129 129 * data size to allow simple reading of samples that wrap around the 130 130 * end of a ring buffer. See kernel implementation for details. 131 - * */ 131 + */ 132 132 tmp = mmap(NULL, rb->page_size + 2 * info.max_entries, PROT_READ, 133 133 MAP_SHARED, map_fd, rb->page_size); 134 134 if (tmp == MAP_FAILED) { ··· 220 220 return (len + 7) / 8 * 8; 221 221 } 222 222 223 - static int64_t ringbuf_process_ring(struct ring* r) 223 + static int64_t ringbuf_process_ring(struct ring *r) 224 224 { 225 225 int *len_ptr, len, err; 226 226 /* 64-bit to avoid overflow in case of extreme application behavior */
+2
tools/testing/selftests/bpf/DENYLIST.aarch64
··· 38 38 ksyms_module/libbpf # 'bpf_testmod_ksym_percpu': not found in kernel BTF 39 39 ksyms_module/lskel # test_ksyms_module_lskel__open_and_load unexpected error: -2 40 40 libbpf_get_fd_by_id_opts # test_libbpf_get_fd_by_id_opts__attach unexpected error: -524 (errno 524) 41 + linked_list 41 42 lookup_key # test_lookup_key__attach unexpected error: -524 (errno 524) 42 43 lru_bug # lru_bug__attach unexpected error: -524 (errno 524) 43 44 modify_return # modify_return__attach failed unexpected error: -524 (errno 524) 44 45 module_attach # skel_attach skeleton attach failed: -524 45 46 mptcp/base # run_test mptcp unexpected error: -524 (errno 524) 46 47 netcnt # packets unexpected packets: actual 10001 != expected 10000 48 + rcu_read_lock # failed to attach: ERROR: strerror_r(-524)=22 47 49 recursion # skel_attach unexpected error: -524 (errno 524) 48 50 ringbuf # skel_attach skeleton attachment failed: -1 49 51 setget_sockopt # attach_cgroup unexpected error: -524
+5
tools/testing/selftests/bpf/DENYLIST.s390x
··· 10 10 bpf_tcp_ca # JIT does not support calling kernel function (kfunc) 11 11 cb_refs # expected error message unexpected error: -524 (trampoline) 12 12 cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc) 13 + cgrp_kfunc # JIT does not support calling kernel function 13 14 cgrp_local_storage # prog_attach unexpected error: -524 (trampoline) 14 15 core_read_macros # unknown func bpf_probe_read#4 (overlapping) 15 16 d_path # failed to auto-attach program 'prog_stat': -524 (trampoline) ··· 34 33 ksyms_module_libbpf # JIT does not support calling kernel function (kfunc) 35 34 ksyms_module_lskel # test_ksyms_module_lskel__open_and_load unexpected error: -9 (?) 36 35 libbpf_get_fd_by_id_opts # failed to attach: ERROR: strerror_r(-524)=22 (trampoline) 36 + linked_list # JIT does not support calling kernel function (kfunc) 37 37 lookup_key # JIT does not support calling kernel function (kfunc) 38 38 lru_bug # prog 'printk': failed to auto-attach: -524 39 39 map_kptr # failed to open_and_load program: -524 (trampoline) ··· 43 41 mptcp 44 42 netcnt # failed to load BPF skeleton 'netcnt_prog': -7 (?) 45 43 probe_user # check_kprobe_res wrong kprobe res from probe read (?) 44 + rcu_read_lock # failed to find kernel BTF type ID of '__x64_sys_getpgid': -3 (?) 46 45 recursion # skel_attach unexpected error: -524 (trampoline) 47 46 ringbuf # skel_load skeleton load failed (?) 48 47 select_reuseport # intermittently fails on new s390x setup ··· 56 53 socket_cookie # prog_attach unexpected error: -524 (trampoline) 57 54 stacktrace_build_id # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2 (?) 58 55 tailcalls # tail_calls are not allowed in non-JITed programs with bpf-to-bpf calls (?) 
56 + task_kfunc # JIT does not support calling kernel function 59 57 task_local_storage # failed to auto-attach program 'trace_exit_creds': -524 (trampoline) 60 58 test_bpffs # bpffs test failed 255 (iterator) 61 59 test_bprm_opts # failed to auto-attach program 'secure_exec': -524 (trampoline) ··· 73 69 trace_vprintk # trace_vprintk__open_and_load unexpected error: -9 (?) 74 70 tracing_struct # failed to auto-attach: -524 (trampoline) 75 71 trampoline_count # prog 'prog1': failed to attach: ERROR: strerror_r(-524)=22 (trampoline) 72 + type_cast # JIT does not support calling kernel function 76 73 unpriv_bpf_disabled # fentry 77 74 user_ringbuf # failed to find kernel BTF type ID of '__s390x_sys_prctl': -3 (?) 78 75 verif_stats # trace_vprintk__open_and_load unexpected error: -9 (?)
+9 -5
tools/testing/selftests/bpf/Makefile
··· 201 201 $(OUTPUT)/bpf_testmod.ko: $(VMLINUX_BTF) $(wildcard bpf_testmod/Makefile bpf_testmod/*.[ch]) 202 202 $(call msg,MOD,,$@) 203 203 $(Q)$(RM) bpf_testmod/bpf_testmod.ko # force re-compilation 204 - $(Q)$(MAKE) $(submake_extras) -C bpf_testmod 204 + $(Q)$(MAKE) $(submake_extras) RESOLVE_BTFIDS=$(RESOLVE_BTFIDS) -C bpf_testmod 205 205 $(Q)cp bpf_testmod/bpf_testmod.ko $@ 206 206 207 207 DEFAULT_BPFTOOL := $(HOST_SCRATCH_DIR)/sbin/bpftool ··· 310 310 # Use '-idirafter': Don't interfere with include mechanics except where the 311 311 # build would have failed anyways. 312 312 define get_sys_includes 313 - $(shell $(1) -v -E - </dev/null 2>&1 \ 313 + $(shell $(1) $(2) -v -E - </dev/null 2>&1 \ 314 314 | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }') \ 315 - $(shell $(1) -dM -E - </dev/null | grep '__riscv_xlen ' | awk '{printf("-D__riscv_xlen=%d -D__BITS_PER_LONG=%d", $$3, $$3)}') 315 + $(shell $(1) $(2) -dM -E - </dev/null | grep '__riscv_xlen ' | awk '{printf("-D__riscv_xlen=%d -D__BITS_PER_LONG=%d", $$3, $$3)}') 316 316 endef 317 317 318 318 # Determine target endianness. ··· 320 320 grep 'define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__') 321 321 MENDIAN=$(if $(IS_LITTLE_ENDIAN),-mlittle-endian,-mbig-endian) 322 322 323 - CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG)) 323 + ifneq ($(CROSS_COMPILE),) 324 + CLANG_TARGET_ARCH = --target=$(notdir $(CROSS_COMPILE:%-=%)) 325 + endif 326 + 327 + CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG),$(CLANG_TARGET_ARCH)) 324 328 BPF_CFLAGS = -g -Werror -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) \ 325 329 -I$(INCLUDE_DIR) -I$(CURDIR) -I$(APIDIR) \ 326 330 -I$(abspath $(OUTPUT)/../usr/include) ··· 546 542 # Define test_progs BPF-GCC-flavored test runner. 
547 543 ifneq ($(BPF_GCC),) 548 544 TRUNNER_BPF_BUILD_RULE := GCC_BPF_BUILD_RULE 549 - TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(call get_sys_includes,gcc) 545 + TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(call get_sys_includes,gcc,) 550 546 $(eval $(call DEFINE_TEST_RUNNER,test_progs,bpf_gcc)) 551 547 endif 552 548
+68
tools/testing/selftests/bpf/bpf_experimental.h
··· 1 + #ifndef __BPF_EXPERIMENTAL__ 2 + #define __BPF_EXPERIMENTAL__ 3 + 4 + #include <vmlinux.h> 5 + #include <bpf/bpf_tracing.h> 6 + #include <bpf/bpf_helpers.h> 7 + #include <bpf/bpf_core_read.h> 8 + 9 + #define __contains(name, node) __attribute__((btf_decl_tag("contains:" #name ":" #node))) 10 + 11 + /* Description 12 + * Allocates an object of the type represented by 'local_type_id' in 13 + * program BTF. User may use the bpf_core_type_id_local macro to pass the 14 + * type ID of a struct in program BTF. 15 + * 16 + * The 'local_type_id' parameter must be a known constant. 17 + * The 'meta' parameter is a hidden argument that is ignored. 18 + * Returns 19 + * A pointer to an object of the type corresponding to the passed in 20 + * 'local_type_id', or NULL on failure. 21 + */ 22 + extern void *bpf_obj_new_impl(__u64 local_type_id, void *meta) __ksym; 23 + 24 + /* Convenience macro to wrap over bpf_obj_new_impl */ 25 + #define bpf_obj_new(type) ((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL)) 26 + 27 + /* Description 28 + * Free an allocated object. All fields of the object that require 29 + * destruction will be destructed before the storage is freed. 30 + * 31 + * The 'meta' parameter is a hidden argument that is ignored. 32 + * Returns 33 + * Void. 34 + */ 35 + extern void bpf_obj_drop_impl(void *kptr, void *meta) __ksym; 36 + 37 + /* Convenience macro to wrap over bpf_obj_drop_impl */ 38 + #define bpf_obj_drop(kptr) bpf_obj_drop_impl(kptr, NULL) 39 + 40 + /* Description 41 + * Add a new entry to the beginning of the BPF linked list. 42 + * Returns 43 + * Void. 44 + */ 45 + extern void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node) __ksym; 46 + 47 + /* Description 48 + * Add a new entry to the end of the BPF linked list. 49 + * Returns 50 + * Void. 
51 + */ 52 + extern void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node) __ksym; 53 + 54 + /* Description 55 + * Remove the entry at the beginning of the BPF linked list. 56 + * Returns 57 + * Pointer to bpf_list_node of deleted entry, or NULL if list is empty. 58 + */ 59 + extern struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head) __ksym; 60 + 61 + /* Description 62 + * Remove the entry at the end of the BPF linked list. 63 + * Returns 64 + * Pointer to bpf_list_node of deleted entry, or NULL if list is empty. 65 + */ 66 + extern struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) __ksym; 67 + 68 + #endif
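Putting the bpf_experimental.h pieces together, a BPF-side user of the new list kfuncs looks roughly like the sketch below. This is not compiled here: `struct elem`, the field layout, and the `private()` placement of the lock are illustrative assumptions borrowed from the selftest style, though the `__contains` tag, the lock-held requirement, and the push/pop/drop flow match the API documented above.

```c
/* Sketch only -- BPF program fragment; assumes vmlinux.h, bpf_helpers.h,
 * and bpf_experimental.h are included, as in the selftests. */
struct elem {
	int value;
	struct bpf_list_node node;
};

/* List head and its protecting spin lock in the same allocation; the
 * __contains tag tells the verifier which node field links elements. */
private(LOCK) struct bpf_spin_lock lock;
private(LOCK) struct bpf_list_head head __contains(elem, node);

SEC("tc")
int list_demo(struct __sk_buff *ctx)
{
	struct bpf_list_node *n;
	struct elem *e;

	e = bpf_obj_new(typeof(*e));
	if (!e)
		return 0;
	e->value = 42;

	bpf_spin_lock(&lock);
	bpf_list_push_front(&head, &e->node);
	n = bpf_list_pop_back(&head);	/* single element: pops what we pushed */
	bpf_spin_unlock(&lock);

	if (n) {
		e = container_of(n, struct elem, node);
		bpf_obj_drop(e);	/* release the allocation */
	}
	return 0;
}
```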
+19
tools/testing/selftests/bpf/cgroup_helpers.c
··· 333 333 return fd; 334 334 } 335 335 336 + /* 337 + * remove_cgroup() - Remove a cgroup 338 + * @relative_path: The cgroup path, relative to the workdir, to remove 339 + * 340 + * This function expects a cgroup to already be created, relative to the cgroup 341 + * work dir. It also expects the cgroup doesn't have any children or live 342 + * processes and it removes the cgroup. 343 + * 344 + * On failure, it will print an error to stderr. 345 + */ 346 + void remove_cgroup(const char *relative_path) 347 + { 348 + char cgroup_path[PATH_MAX + 1]; 349 + 350 + format_cgroup_path(cgroup_path, relative_path); 351 + if (rmdir(cgroup_path)) 352 + log_err("rmdiring cgroup %s .. %s", relative_path, cgroup_path); 353 + } 354 + 336 355 /** 337 356 * create_and_get_cgroup() - Create a cgroup, relative to workdir, and get the FD 338 357 * @relative_path: The cgroup path, relative to the workdir, to join
+1
tools/testing/selftests/bpf/cgroup_helpers.h
··· 18 18 int cgroup_setup_and_join(const char *relative_path); 19 19 int get_root_cgroup(void); 20 20 int create_and_get_cgroup(const char *relative_path); 21 + void remove_cgroup(const char *relative_path); 21 22 unsigned long long get_cgroup_id(const char *relative_path); 22 23 23 24 int join_cgroup(const char *relative_path);
+1
tools/testing/selftests/bpf/config
··· 8 8 CONFIG_BPF_LSM=y 9 9 CONFIG_BPF_STREAM_PARSER=y 10 10 CONFIG_BPF_SYSCALL=y 11 + CONFIG_BPF_UNPRIV_DEFAULT_OFF=n 11 12 CONFIG_CGROUP_BPF=y 12 13 CONFIG_CRYPTO_HMAC=y 13 14 CONFIG_CRYPTO_SHA256=y
+4
tools/testing/selftests/bpf/network_helpers.c
··· 426 426 if (!ASSERT_OK(err, "mount /sys/fs/bpf")) 427 427 return err; 428 428 429 + err = mount("debugfs", "/sys/kernel/debug", "debugfs", 0, NULL); 430 + if (!ASSERT_OK(err, "mount /sys/kernel/debug")) 431 + return err; 432 + 429 433 return 0; 430 434 } 431 435
+14
tools/testing/selftests/bpf/prog_tests/btf.c
··· 3949 3949 .err_str = "Invalid return type", 3950 3950 }, 3951 3951 { 3952 + .descr = "decl_tag test #17, func proto, argument", 3953 + .raw_types = { 3954 + BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_DECL_TAG, 0, 0), 4), (-1), /* [1] */ 3955 + BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 0), /* [2] */ 3956 + BTF_FUNC_PROTO_ENC(0, 1), /* [3] */ 3957 + BTF_FUNC_PROTO_ARG_ENC(NAME_TBD, 1), 3958 + BTF_VAR_ENC(NAME_TBD, 2, 0), /* [4] */ 3959 + BTF_END_RAW, 3960 + }, 3961 + BTF_STR_SEC("\0local\0tag1\0var"), 3962 + .btf_load_err = true, 3963 + .err_str = "Invalid arg#1", 3964 + }, 3965 + { 3952 3966 .descr = "type_tag test #1", 3953 3967 .raw_types = { 3954 3968 BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
+76
tools/testing/selftests/bpf/prog_tests/cgroup_iter.c
··· 189 189 BPF_CGROUP_ITER_SELF_ONLY, "self_only"); 190 190 } 191 191 192 + static void test_walk_dead_self_only(struct cgroup_iter *skel) 193 + { 194 + DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts); 195 + char expected_output[128], buf[128]; 196 + const char *cgrp_name = "/dead"; 197 + union bpf_iter_link_info linfo; 198 + int len, cgrp_fd, iter_fd; 199 + struct bpf_link *link; 200 + size_t left; 201 + char *p; 202 + 203 + cgrp_fd = create_and_get_cgroup(cgrp_name); 204 + if (!ASSERT_GE(cgrp_fd, 0, "create cgrp")) 205 + return; 206 + 207 + /* The cgroup will be dead during read() iteration, so it only has 208 + * epilogue in the output 209 + */ 210 + snprintf(expected_output, sizeof(expected_output), EPILOGUE); 211 + 212 + memset(&linfo, 0, sizeof(linfo)); 213 + linfo.cgroup.cgroup_fd = cgrp_fd; 214 + linfo.cgroup.order = BPF_CGROUP_ITER_SELF_ONLY; 215 + opts.link_info = &linfo; 216 + opts.link_info_len = sizeof(linfo); 217 + 218 + link = bpf_program__attach_iter(skel->progs.cgroup_id_printer, &opts); 219 + if (!ASSERT_OK_PTR(link, "attach_iter")) 220 + goto close_cgrp; 221 + 222 + iter_fd = bpf_iter_create(bpf_link__fd(link)); 223 + if (!ASSERT_GE(iter_fd, 0, "iter_create")) 224 + goto free_link; 225 + 226 + /* Close link fd and cgroup fd */ 227 + bpf_link__destroy(link); 228 + close(cgrp_fd); 229 + 230 + /* Remove cgroup to mark it as dead */ 231 + remove_cgroup(cgrp_name); 232 + 233 + /* Two kern_sync_rcu() and usleep() pairs are used to wait for the 234 + * releases of cgroup css, and the last kern_sync_rcu() and usleep() 235 + * pair is used to wait for the free of cgroup itself. 
236 + */ 237 + kern_sync_rcu(); 238 + usleep(8000); 239 + kern_sync_rcu(); 240 + usleep(8000); 241 + kern_sync_rcu(); 242 + usleep(1000); 243 + 244 + memset(buf, 0, sizeof(buf)); 245 + left = ARRAY_SIZE(buf); 246 + p = buf; 247 + while ((len = read(iter_fd, p, left)) > 0) { 248 + p += len; 249 + left -= len; 250 + } 251 + 252 + ASSERT_STREQ(buf, expected_output, "dead cgroup output"); 253 + 254 + /* read() after iter finishes should be ok. */ 255 + if (len == 0) 256 + ASSERT_OK(read(iter_fd, buf, sizeof(buf)), "second_read"); 257 + 258 + close(iter_fd); 259 + return; 260 + free_link: 261 + bpf_link__destroy(link); 262 + close_cgrp: 263 + close(cgrp_fd); 264 + } 265 + 192 266 void test_cgroup_iter(void) 193 267 { 194 268 struct cgroup_iter *skel = NULL; ··· 291 217 test_early_termination(skel); 292 218 if (test__start_subtest("cgroup_iter__self_only")) 293 219 test_walk_self_only(skel); 220 + if (test__start_subtest("cgroup_iter__dead_self_only")) 221 + test_walk_dead_self_only(skel); 294 222 out: 295 223 cgroup_iter__destroy(skel); 296 224 cleanup_cgroups();
+175
tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #define _GNU_SOURCE 5 + #include <cgroup_helpers.h> 6 + #include <test_progs.h> 7 + 8 + #include "cgrp_kfunc_failure.skel.h" 9 + #include "cgrp_kfunc_success.skel.h" 10 + 11 + static size_t log_buf_sz = 1 << 20; /* 1 MB */ 12 + static char obj_log_buf[1048576]; 13 + 14 + static struct cgrp_kfunc_success *open_load_cgrp_kfunc_skel(void) 15 + { 16 + struct cgrp_kfunc_success *skel; 17 + int err; 18 + 19 + skel = cgrp_kfunc_success__open(); 20 + if (!ASSERT_OK_PTR(skel, "skel_open")) 21 + return NULL; 22 + 23 + skel->bss->pid = getpid(); 24 + 25 + err = cgrp_kfunc_success__load(skel); 26 + if (!ASSERT_OK(err, "skel_load")) 27 + goto cleanup; 28 + 29 + return skel; 30 + 31 + cleanup: 32 + cgrp_kfunc_success__destroy(skel); 33 + return NULL; 34 + } 35 + 36 + static int mkdir_rm_test_dir(void) 37 + { 38 + int fd; 39 + const char *cgrp_path = "cgrp_kfunc"; 40 + 41 + fd = create_and_get_cgroup(cgrp_path); 42 + if (!ASSERT_GT(fd, 0, "mkdir_cgrp_fd")) 43 + return -1; 44 + 45 + close(fd); 46 + remove_cgroup(cgrp_path); 47 + 48 + return 0; 49 + } 50 + 51 + static void run_success_test(const char *prog_name) 52 + { 53 + struct cgrp_kfunc_success *skel; 54 + struct bpf_program *prog; 55 + struct bpf_link *link = NULL; 56 + 57 + skel = open_load_cgrp_kfunc_skel(); 58 + if (!ASSERT_OK_PTR(skel, "open_load_skel")) 59 + return; 60 + 61 + if (!ASSERT_OK(skel->bss->err, "pre_mkdir_err")) 62 + goto cleanup; 63 + 64 + prog = bpf_object__find_program_by_name(skel->obj, prog_name); 65 + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 66 + goto cleanup; 67 + 68 + link = bpf_program__attach(prog); 69 + if (!ASSERT_OK_PTR(link, "attached_link")) 70 + goto cleanup; 71 + 72 + ASSERT_EQ(skel->bss->invocations, 0, "pre_rmdir_count"); 73 + if (!ASSERT_OK(mkdir_rm_test_dir(), "cgrp_mkdir")) 74 + goto cleanup; 75 + 76 + ASSERT_EQ(skel->bss->invocations, 1, 
"post_rmdir_count"); 77 + ASSERT_OK(skel->bss->err, "post_rmdir_err"); 78 + 79 + cleanup: 80 + bpf_link__destroy(link); 81 + cgrp_kfunc_success__destroy(skel); 82 + } 83 + 84 + static const char * const success_tests[] = { 85 + "test_cgrp_acquire_release_argument", 86 + "test_cgrp_acquire_leave_in_map", 87 + "test_cgrp_xchg_release", 88 + "test_cgrp_get_release", 89 + "test_cgrp_get_ancestors", 90 + }; 91 + 92 + static struct { 93 + const char *prog_name; 94 + const char *expected_err_msg; 95 + } failure_tests[] = { 96 + {"cgrp_kfunc_acquire_untrusted", "R1 must be referenced or trusted"}, 97 + {"cgrp_kfunc_acquire_fp", "arg#0 pointer type STRUCT cgroup must point"}, 98 + {"cgrp_kfunc_acquire_unsafe_kretprobe", "reg type unsupported for arg#0 function"}, 99 + {"cgrp_kfunc_acquire_trusted_walked", "R1 must be referenced or trusted"}, 100 + {"cgrp_kfunc_acquire_null", "arg#0 pointer type STRUCT cgroup must point"}, 101 + {"cgrp_kfunc_acquire_unreleased", "Unreleased reference"}, 102 + {"cgrp_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"}, 103 + {"cgrp_kfunc_get_non_kptr_acquired", "arg#0 expected pointer to map value"}, 104 + {"cgrp_kfunc_get_null", "arg#0 expected pointer to map value"}, 105 + {"cgrp_kfunc_xchg_unreleased", "Unreleased reference"}, 106 + {"cgrp_kfunc_get_unreleased", "Unreleased reference"}, 107 + {"cgrp_kfunc_release_untrusted", "arg#0 is untrusted_ptr_or_null_ expected ptr_ or socket"}, 108 + {"cgrp_kfunc_release_fp", "arg#0 pointer type STRUCT cgroup must point"}, 109 + {"cgrp_kfunc_release_null", "arg#0 is ptr_or_null_ expected ptr_ or socket"}, 110 + {"cgrp_kfunc_release_unacquired", "release kernel function bpf_cgroup_release expects"}, 111 + }; 112 + 113 + static void verify_fail(const char *prog_name, const char *expected_err_msg) 114 + { 115 + LIBBPF_OPTS(bpf_object_open_opts, opts); 116 + struct cgrp_kfunc_failure *skel; 117 + int err, i; 118 + 119 + opts.kernel_log_buf = obj_log_buf; 120 + opts.kernel_log_size = 
log_buf_sz; 121 + opts.kernel_log_level = 1; 122 + 123 + skel = cgrp_kfunc_failure__open_opts(&opts); 124 + if (!ASSERT_OK_PTR(skel, "cgrp_kfunc_failure__open_opts")) 125 + goto cleanup; 126 + 127 + for (i = 0; i < ARRAY_SIZE(failure_tests); i++) { 128 + struct bpf_program *prog; 129 + const char *curr_name = failure_tests[i].prog_name; 130 + 131 + prog = bpf_object__find_program_by_name(skel->obj, curr_name); 132 + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 133 + goto cleanup; 134 + 135 + bpf_program__set_autoload(prog, !strcmp(curr_name, prog_name)); 136 + } 137 + 138 + err = cgrp_kfunc_failure__load(skel); 139 + if (!ASSERT_ERR(err, "unexpected load success")) 140 + goto cleanup; 141 + 142 + if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) { 143 + fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg); 144 + fprintf(stderr, "Verifier output: %s\n", obj_log_buf); 145 + } 146 + 147 + cleanup: 148 + cgrp_kfunc_failure__destroy(skel); 149 + } 150 + 151 + void test_cgrp_kfunc(void) 152 + { 153 + int i, err; 154 + 155 + err = setup_cgroup_environment(); 156 + if (!ASSERT_OK(err, "cgrp_env_setup")) 157 + goto cleanup; 158 + 159 + for (i = 0; i < ARRAY_SIZE(success_tests); i++) { 160 + if (!test__start_subtest(success_tests[i])) 161 + continue; 162 + 163 + run_success_test(success_tests[i]); 164 + } 165 + 166 + for (i = 0; i < ARRAY_SIZE(failure_tests); i++) { 167 + if (!test__start_subtest(failure_tests[i].prog_name)) 168 + continue; 169 + 170 + verify_fail(failure_tests[i].prog_name, failure_tests[i].expected_err_msg); 171 + } 172 + 173 + cleanup: 174 + cleanup_cgroup_environment(); 175 + }
+1 -1
tools/testing/selftests/bpf/prog_tests/dynptr.c
··· 17 17 {"ringbuf_missing_release2", "Unreleased reference id=2"}, 18 18 {"ringbuf_missing_release_callback", "Unreleased reference id"}, 19 19 {"use_after_invalid", "Expected an initialized dynptr as arg #3"}, 20 - {"ringbuf_invalid_api", "type=mem expected=alloc_mem"}, 20 + {"ringbuf_invalid_api", "type=mem expected=ringbuf_mem"}, 21 21 {"add_dynptr_to_map1", "invalid indirect read from stack"}, 22 22 {"add_dynptr_to_map2", "invalid indirect read from stack"}, 23 23 {"data_slice_out_of_bounds_ringbuf", "value is outside of the allowed memory range"},
+146
tools/testing/selftests/bpf/prog_tests/empty_skb.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + #include <test_progs.h> 3 + #include <network_helpers.h> 4 + #include <net/if.h> 5 + #include "empty_skb.skel.h" 6 + 7 + #define SYS(cmd) ({ \ 8 + if (!ASSERT_OK(system(cmd), (cmd))) \ 9 + goto out; \ 10 + }) 11 + 12 + void serial_test_empty_skb(void) 13 + { 14 + LIBBPF_OPTS(bpf_test_run_opts, tattr); 15 + struct empty_skb *bpf_obj = NULL; 16 + struct nstoken *tok = NULL; 17 + struct bpf_program *prog; 18 + char eth_hlen_pp[15]; 19 + char eth_hlen[14]; 20 + int veth_ifindex; 21 + int ipip_ifindex; 22 + int err; 23 + int i; 24 + 25 + struct { 26 + const char *msg; 27 + const void *data_in; 28 + __u32 data_size_in; 29 + int *ifindex; 30 + int err; 31 + int ret; 32 + bool success_on_tc; 33 + } tests[] = { 34 + /* Empty packets are always rejected. */ 35 + 36 + { 37 + /* BPF_PROG_RUN ETH_HLEN size check */ 38 + .msg = "veth empty ingress packet", 39 + .data_in = NULL, 40 + .data_size_in = 0, 41 + .ifindex = &veth_ifindex, 42 + .err = -EINVAL, 43 + }, 44 + { 45 + /* BPF_PROG_RUN ETH_HLEN size check */ 46 + .msg = "ipip empty ingress packet", 47 + .data_in = NULL, 48 + .data_size_in = 0, 49 + .ifindex = &ipip_ifindex, 50 + .err = -EINVAL, 51 + }, 52 + 53 + /* ETH_HLEN-sized packets: 54 + * - can not be redirected at LWT_XMIT 55 + * - can be redirected at TC to non-tunneling dest 56 + */ 57 + 58 + { 59 + /* __bpf_redirect_common */ 60 + .msg = "veth ETH_HLEN packet ingress", 61 + .data_in = eth_hlen, 62 + .data_size_in = sizeof(eth_hlen), 63 + .ifindex = &veth_ifindex, 64 + .ret = -ERANGE, 65 + .success_on_tc = true, 66 + }, 67 + { 68 + /* __bpf_redirect_no_mac 69 + * 70 + * lwt: skb->len=0 <= skb_network_offset=0 71 + * tc: skb->len=14 <= skb_network_offset=14 72 + */ 73 + .msg = "ipip ETH_HLEN packet ingress", 74 + .data_in = eth_hlen, 75 + .data_size_in = sizeof(eth_hlen), 76 + .ifindex = &ipip_ifindex, 77 + .ret = -ERANGE, 78 + }, 79 + 80 + /* ETH_HLEN+1-sized packet should be redirected. 
*/ 81 + 82 + { 83 + .msg = "veth ETH_HLEN+1 packet ingress", 84 + .data_in = eth_hlen_pp, 85 + .data_size_in = sizeof(eth_hlen_pp), 86 + .ifindex = &veth_ifindex, 87 + }, 88 + { 89 + .msg = "ipip ETH_HLEN+1 packet ingress", 90 + .data_in = eth_hlen_pp, 91 + .data_size_in = sizeof(eth_hlen_pp), 92 + .ifindex = &ipip_ifindex, 93 + }, 94 + }; 95 + 96 + SYS("ip netns add empty_skb"); 97 + tok = open_netns("empty_skb"); 98 + SYS("ip link add veth0 type veth peer veth1"); 99 + SYS("ip link set dev veth0 up"); 100 + SYS("ip link set dev veth1 up"); 101 + SYS("ip addr add 10.0.0.1/8 dev veth0"); 102 + SYS("ip addr add 10.0.0.2/8 dev veth1"); 103 + veth_ifindex = if_nametoindex("veth0"); 104 + 105 + SYS("ip link add ipip0 type ipip local 10.0.0.1 remote 10.0.0.2"); 106 + SYS("ip link set ipip0 up"); 107 + SYS("ip addr add 192.168.1.1/16 dev ipip0"); 108 + ipip_ifindex = if_nametoindex("ipip0"); 109 + 110 + bpf_obj = empty_skb__open_and_load(); 111 + if (!ASSERT_OK_PTR(bpf_obj, "open skeleton")) 112 + goto out; 113 + 114 + for (i = 0; i < ARRAY_SIZE(tests); i++) { 115 + bpf_object__for_each_program(prog, bpf_obj->obj) { 116 + char buf[128]; 117 + bool at_tc = !strncmp(bpf_program__section_name(prog), "tc", 2); 118 + 119 + tattr.data_in = tests[i].data_in; 120 + tattr.data_size_in = tests[i].data_size_in; 121 + 122 + tattr.data_size_out = 0; 123 + bpf_obj->bss->ifindex = *tests[i].ifindex; 124 + bpf_obj->bss->ret = 0; 125 + err = bpf_prog_test_run_opts(bpf_program__fd(prog), &tattr); 126 + sprintf(buf, "err: %s [%s]", tests[i].msg, bpf_program__name(prog)); 127 + 128 + if (at_tc && tests[i].success_on_tc) 129 + ASSERT_GE(err, 0, buf); 130 + else 131 + ASSERT_EQ(err, tests[i].err, buf); 132 + sprintf(buf, "ret: %s [%s]", tests[i].msg, bpf_program__name(prog)); 133 + if (at_tc && tests[i].success_on_tc) 134 + ASSERT_GE(bpf_obj->bss->ret, 0, buf); 135 + else 136 + ASSERT_EQ(bpf_obj->bss->ret, tests[i].ret, buf); 137 + } 138 + } 139 + 140 + out: 141 + if (bpf_obj) 142 + 
empty_skb__destroy(bpf_obj); 143 + if (tok) 144 + close_netns(tok); 145 + system("ip netns del empty_skb"); 146 + }
+1 -1
tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c
··· 22 22 "arg#0 pointer type STRUCT bpf_dynptr_kern points to unsupported dynamic pointer type", 0}, 23 23 {"not_valid_dynptr", 24 24 "arg#0 pointer type STRUCT bpf_dynptr_kern must be valid and initialized", 0}, 25 - {"not_ptr_to_stack", "arg#0 pointer type STRUCT bpf_dynptr_kern not to stack", 0}, 25 + {"not_ptr_to_stack", "arg#0 expected pointer to stack", 0}, 26 26 {"dynptr_data_null", NULL, -EBADMSG}, 27 27 }; 28 28
+740
tools/testing/selftests/bpf/prog_tests/linked_list.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + #include <bpf/btf.h> 3 + #include <test_btf.h> 4 + #include <linux/btf.h> 5 + #include <test_progs.h> 6 + #include <network_helpers.h> 7 + 8 + #include "linked_list.skel.h" 9 + #include "linked_list_fail.skel.h" 10 + 11 + static char log_buf[1024 * 1024]; 12 + 13 + static struct { 14 + const char *prog_name; 15 + const char *err_msg; 16 + } linked_list_fail_tests[] = { 17 + #define TEST(test, off) \ 18 + { #test "_missing_lock_push_front", \ 19 + "bpf_spin_lock at off=" #off " must be held for bpf_list_head" }, \ 20 + { #test "_missing_lock_push_back", \ 21 + "bpf_spin_lock at off=" #off " must be held for bpf_list_head" }, \ 22 + { #test "_missing_lock_pop_front", \ 23 + "bpf_spin_lock at off=" #off " must be held for bpf_list_head" }, \ 24 + { #test "_missing_lock_pop_back", \ 25 + "bpf_spin_lock at off=" #off " must be held for bpf_list_head" }, 26 + TEST(kptr, 32) 27 + TEST(global, 16) 28 + TEST(map, 0) 29 + TEST(inner_map, 0) 30 + #undef TEST 31 + #define TEST(test, op) \ 32 + { #test "_kptr_incorrect_lock_" #op, \ 33 + "held lock and object are not in the same allocation\n" \ 34 + "bpf_spin_lock at off=32 must be held for bpf_list_head" }, \ 35 + { #test "_global_incorrect_lock_" #op, \ 36 + "held lock and object are not in the same allocation\n" \ 37 + "bpf_spin_lock at off=16 must be held for bpf_list_head" }, \ 38 + { #test "_map_incorrect_lock_" #op, \ 39 + "held lock and object are not in the same allocation\n" \ 40 + "bpf_spin_lock at off=0 must be held for bpf_list_head" }, \ 41 + { #test "_inner_map_incorrect_lock_" #op, \ 42 + "held lock and object are not in the same allocation\n" \ 43 + "bpf_spin_lock at off=0 must be held for bpf_list_head" }, 44 + TEST(kptr, push_front) 45 + TEST(kptr, push_back) 46 + TEST(kptr, pop_front) 47 + TEST(kptr, pop_back) 48 + TEST(global, push_front) 49 + TEST(global, push_back) 50 + TEST(global, pop_front) 51 + TEST(global, pop_back) 52 + TEST(map, push_front) 53 + 
TEST(map, push_back) 54 + TEST(map, pop_front) 55 + TEST(map, pop_back) 56 + TEST(inner_map, push_front) 57 + TEST(inner_map, push_back) 58 + TEST(inner_map, pop_front) 59 + TEST(inner_map, pop_back) 60 + #undef TEST 61 + { "map_compat_kprobe", "tracing progs cannot use bpf_list_head yet" }, 62 + { "map_compat_kretprobe", "tracing progs cannot use bpf_list_head yet" }, 63 + { "map_compat_tp", "tracing progs cannot use bpf_list_head yet" }, 64 + { "map_compat_perf", "tracing progs cannot use bpf_list_head yet" }, 65 + { "map_compat_raw_tp", "tracing progs cannot use bpf_list_head yet" }, 66 + { "map_compat_raw_tp_w", "tracing progs cannot use bpf_list_head yet" }, 67 + { "obj_type_id_oor", "local type ID argument must be in range [0, U32_MAX]" }, 68 + { "obj_new_no_composite", "bpf_obj_new type ID argument must be of a struct" }, 69 + { "obj_new_no_struct", "bpf_obj_new type ID argument must be of a struct" }, 70 + { "obj_drop_non_zero_off", "R1 must have zero offset when passed to release func" }, 71 + { "new_null_ret", "R0 invalid mem access 'ptr_or_null_'" }, 72 + { "obj_new_acq", "Unreleased reference id=" }, 73 + { "use_after_drop", "invalid mem access 'scalar'" }, 74 + { "ptr_walk_scalar", "type=scalar expected=percpu_ptr_" }, 75 + { "direct_read_lock", "direct access to bpf_spin_lock is disallowed" }, 76 + { "direct_write_lock", "direct access to bpf_spin_lock is disallowed" }, 77 + { "direct_read_head", "direct access to bpf_list_head is disallowed" }, 78 + { "direct_write_head", "direct access to bpf_list_head is disallowed" }, 79 + { "direct_read_node", "direct access to bpf_list_node is disallowed" }, 80 + { "direct_write_node", "direct access to bpf_list_node is disallowed" }, 81 + { "write_after_push_front", "only read is supported" }, 82 + { "write_after_push_back", "only read is supported" }, 83 + { "use_after_unlock_push_front", "invalid mem access 'scalar'" }, 84 + { "use_after_unlock_push_back", "invalid mem access 'scalar'" }, 85 + { 
"double_push_front", "arg#1 expected pointer to allocated object" }, 86 + { "double_push_back", "arg#1 expected pointer to allocated object" }, 87 + { "no_node_value_type", "bpf_list_node not found at offset=0" }, 88 + { "incorrect_value_type", 89 + "operation on bpf_list_head expects arg#1 bpf_list_node at offset=0 in struct foo, " 90 + "but arg is at offset=0 in struct bar" }, 91 + { "incorrect_node_var_off", "variable ptr_ access var_off=(0x0; 0xffffffff) disallowed" }, 92 + { "incorrect_node_off1", "bpf_list_node not found at offset=1" }, 93 + { "incorrect_node_off2", "arg#1 offset=40, but expected bpf_list_node at offset=0 in struct foo" }, 94 + { "no_head_type", "bpf_list_head not found at offset=0" }, 95 + { "incorrect_head_var_off1", "R1 doesn't have constant offset" }, 96 + { "incorrect_head_var_off2", "variable ptr_ access var_off=(0x0; 0xffffffff) disallowed" }, 97 + { "incorrect_head_off1", "bpf_list_head not found at offset=17" }, 98 + { "incorrect_head_off2", "bpf_list_head not found at offset=1" }, 99 + { "pop_front_off", 100 + "15: (bf) r1 = r6 ; R1_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) " 101 + "R6_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) refs=2,4\n" 102 + "16: (85) call bpf_this_cpu_ptr#154\nR1 type=ptr_or_null_ expected=percpu_ptr_" }, 103 + { "pop_back_off", 104 + "15: (bf) r1 = r6 ; R1_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) " 105 + "R6_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) refs=2,4\n" 106 + "16: (85) call bpf_this_cpu_ptr#154\nR1 type=ptr_or_null_ expected=percpu_ptr_" }, 107 + }; 108 + 109 + static void test_linked_list_fail_prog(const char *prog_name, const char *err_msg) 110 + { 111 + LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_buf = log_buf, 112 + .kernel_log_size = sizeof(log_buf), 113 + .kernel_log_level = 1); 114 + struct linked_list_fail *skel; 115 + struct bpf_program *prog; 116 + int ret; 117 + 118 + skel = linked_list_fail__open_opts(&opts); 119 + if (!ASSERT_OK_PTR(skel, 
"linked_list_fail__open_opts")) 120 + return; 121 + 122 + prog = bpf_object__find_program_by_name(skel->obj, prog_name); 123 + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 124 + goto end; 125 + 126 + bpf_program__set_autoload(prog, true); 127 + 128 + ret = linked_list_fail__load(skel); 129 + if (!ASSERT_ERR(ret, "linked_list_fail__load must fail")) 130 + goto end; 131 + 132 + if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) { 133 + fprintf(stderr, "Expected: %s\n", err_msg); 134 + fprintf(stderr, "Verifier: %s\n", log_buf); 135 + } 136 + 137 + end: 138 + linked_list_fail__destroy(skel); 139 + } 140 + 141 + static void clear_fields(struct bpf_map *map) 142 + { 143 + char buf[24]; 144 + int key = 0; 145 + 146 + memset(buf, 0xff, sizeof(buf)); 147 + ASSERT_OK(bpf_map__update_elem(map, &key, sizeof(key), buf, sizeof(buf), 0), "check_and_free_fields"); 148 + } 149 + 150 + enum { 151 + TEST_ALL, 152 + PUSH_POP, 153 + PUSH_POP_MULT, 154 + LIST_IN_LIST, 155 + }; 156 + 157 + static void test_linked_list_success(int mode, bool leave_in_map) 158 + { 159 + LIBBPF_OPTS(bpf_test_run_opts, opts, 160 + .data_in = &pkt_v4, 161 + .data_size_in = sizeof(pkt_v4), 162 + .repeat = 1, 163 + ); 164 + struct linked_list *skel; 165 + int ret; 166 + 167 + skel = linked_list__open_and_load(); 168 + if (!ASSERT_OK_PTR(skel, "linked_list__open_and_load")) 169 + return; 170 + 171 + if (mode == LIST_IN_LIST) 172 + goto lil; 173 + if (mode == PUSH_POP_MULT) 174 + goto ppm; 175 + 176 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_push_pop), &opts); 177 + ASSERT_OK(ret, "map_list_push_pop"); 178 + ASSERT_OK(opts.retval, "map_list_push_pop retval"); 179 + if (!leave_in_map) 180 + clear_fields(skel->maps.array_map); 181 + 182 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_push_pop), &opts); 183 + ASSERT_OK(ret, "inner_map_list_push_pop"); 184 + ASSERT_OK(opts.retval, "inner_map_list_push_pop retval"); 185 + if 
(!leave_in_map) 186 + clear_fields(skel->maps.inner_map); 187 + 188 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop), &opts); 189 + ASSERT_OK(ret, "global_list_push_pop"); 190 + ASSERT_OK(opts.retval, "global_list_push_pop retval"); 191 + if (!leave_in_map) 192 + clear_fields(skel->maps.bss_A); 193 + 194 + if (mode == PUSH_POP) 195 + goto end; 196 + 197 + ppm: 198 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_push_pop_multiple), &opts); 199 + ASSERT_OK(ret, "map_list_push_pop_multiple"); 200 + ASSERT_OK(opts.retval, "map_list_push_pop_multiple retval"); 201 + if (!leave_in_map) 202 + clear_fields(skel->maps.array_map); 203 + 204 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_push_pop_multiple), &opts); 205 + ASSERT_OK(ret, "inner_map_list_push_pop_multiple"); 206 + ASSERT_OK(opts.retval, "inner_map_list_push_pop_multiple retval"); 207 + if (!leave_in_map) 208 + clear_fields(skel->maps.inner_map); 209 + 210 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop_multiple), &opts); 211 + ASSERT_OK(ret, "global_list_push_pop_multiple"); 212 + ASSERT_OK(opts.retval, "global_list_push_pop_multiple retval"); 213 + if (!leave_in_map) 214 + clear_fields(skel->maps.bss_A); 215 + 216 + if (mode == PUSH_POP_MULT) 217 + goto end; 218 + 219 + lil: 220 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.map_list_in_list), &opts); 221 + ASSERT_OK(ret, "map_list_in_list"); 222 + ASSERT_OK(opts.retval, "map_list_in_list retval"); 223 + if (!leave_in_map) 224 + clear_fields(skel->maps.array_map); 225 + 226 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_in_list), &opts); 227 + ASSERT_OK(ret, "inner_map_list_in_list"); 228 + ASSERT_OK(opts.retval, "inner_map_list_in_list retval"); 229 + if (!leave_in_map) 230 + clear_fields(skel->maps.inner_map); 231 + 232 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_in_list), &opts); 233 
+ ASSERT_OK(ret, "global_list_in_list"); 234 + ASSERT_OK(opts.retval, "global_list_in_list retval"); 235 + if (!leave_in_map) 236 + clear_fields(skel->maps.bss_A); 237 + end: 238 + linked_list__destroy(skel); 239 + } 240 + 241 + #define SPIN_LOCK 2 242 + #define LIST_HEAD 3 243 + #define LIST_NODE 4 244 + 245 + static struct btf *init_btf(void) 246 + { 247 + int id, lid, hid, nid; 248 + struct btf *btf; 249 + 250 + btf = btf__new_empty(); 251 + if (!ASSERT_OK_PTR(btf, "btf__new_empty")) 252 + return NULL; 253 + id = btf__add_int(btf, "int", 4, BTF_INT_SIGNED); 254 + if (!ASSERT_EQ(id, 1, "btf__add_int")) 255 + goto end; 256 + lid = btf__add_struct(btf, "bpf_spin_lock", 4); 257 + if (!ASSERT_EQ(lid, SPIN_LOCK, "btf__add_struct bpf_spin_lock")) 258 + goto end; 259 + hid = btf__add_struct(btf, "bpf_list_head", 16); 260 + if (!ASSERT_EQ(hid, LIST_HEAD, "btf__add_struct bpf_list_head")) 261 + goto end; 262 + nid = btf__add_struct(btf, "bpf_list_node", 16); 263 + if (!ASSERT_EQ(nid, LIST_NODE, "btf__add_struct bpf_list_node")) 264 + goto end; 265 + return btf; 266 + end: 267 + btf__free(btf); 268 + return NULL; 269 + } 270 + 271 + static void test_btf(void) 272 + { 273 + struct btf *btf = NULL; 274 + int id, err; 275 + 276 + while (test__start_subtest("btf: too many locks")) { 277 + btf = init_btf(); 278 + if (!ASSERT_OK_PTR(btf, "init_btf")) 279 + break; 280 + id = btf__add_struct(btf, "foo", 24); 281 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 282 + break; 283 + err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0); 284 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 285 + break; 286 + err = btf__add_field(btf, "b", SPIN_LOCK, 32, 0); 287 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 288 + break; 289 + err = btf__add_field(btf, "c", LIST_HEAD, 64, 0); 290 + if (!ASSERT_OK(err, "btf__add_field foo::c")) 291 + break; 292 + 293 + err = btf__load_into_kernel(btf); 294 + ASSERT_EQ(err, -E2BIG, "check btf"); 295 + btf__free(btf); 296 + break; 297 + } 298 + 299 + while
(test__start_subtest("btf: missing lock")) { 300 + btf = init_btf(); 301 + if (!ASSERT_OK_PTR(btf, "init_btf")) 302 + break; 303 + id = btf__add_struct(btf, "foo", 16); 304 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 305 + break; 306 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 307 + if (!ASSERT_OK(err, "btf__add_struct foo::a")) 308 + break; 309 + id = btf__add_decl_tag(btf, "contains:baz:a", 5, 0); 310 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:baz:a")) 311 + break; 312 + id = btf__add_struct(btf, "baz", 16); 313 + if (!ASSERT_EQ(id, 7, "btf__add_struct baz")) 314 + break; 315 + err = btf__add_field(btf, "a", LIST_NODE, 0, 0); 316 + if (!ASSERT_OK(err, "btf__add_field baz::a")) 317 + break; 318 + 319 + err = btf__load_into_kernel(btf); 320 + ASSERT_EQ(err, -EINVAL, "check btf"); 321 + btf__free(btf); 322 + break; 323 + } 324 + 325 + while (test__start_subtest("btf: bad offset")) { 326 + btf = init_btf(); 327 + if (!ASSERT_OK_PTR(btf, "init_btf")) 328 + break; 329 + id = btf__add_struct(btf, "foo", 36); 330 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 331 + break; 332 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 333 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 334 + break; 335 + err = btf__add_field(btf, "b", LIST_NODE, 0, 0); 336 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 337 + break; 338 + err = btf__add_field(btf, "c", SPIN_LOCK, 0, 0); 339 + if (!ASSERT_OK(err, "btf__add_field foo::c")) 340 + break; 341 + id = btf__add_decl_tag(btf, "contains:foo:b", 5, 0); 342 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:b")) 343 + break; 344 + 345 + err = btf__load_into_kernel(btf); 346 + ASSERT_EQ(err, -EEXIST, "check btf"); 347 + btf__free(btf); 348 + break; 349 + } 350 + 351 + while (test__start_subtest("btf: missing contains:")) { 352 + btf = init_btf(); 353 + if (!ASSERT_OK_PTR(btf, "init_btf")) 354 + break; 355 + id = btf__add_struct(btf, "foo", 24); 356 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 357 + break; 358 + 
err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0); 359 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 360 + break; 361 + err = btf__add_field(btf, "b", LIST_HEAD, 64, 0); 362 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 363 + break; 364 + 365 + err = btf__load_into_kernel(btf); 366 + ASSERT_EQ(err, -EINVAL, "check btf"); 367 + btf__free(btf); 368 + break; 369 + } 370 + 371 + while (test__start_subtest("btf: missing struct")) { 372 + btf = init_btf(); 373 + if (!ASSERT_OK_PTR(btf, "init_btf")) 374 + break; 375 + id = btf__add_struct(btf, "foo", 24); 376 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 377 + break; 378 + err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0); 379 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 380 + break; 381 + err = btf__add_field(btf, "b", LIST_HEAD, 64, 0); 382 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 383 + break; 384 + id = btf__add_decl_tag(btf, "contains:bar:bar", 5, 1); 385 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:bar")) 386 + break; 387 + 388 + err = btf__load_into_kernel(btf); 389 + ASSERT_EQ(err, -ENOENT, "check btf"); 390 + btf__free(btf); 391 + break; 392 + } 393 + 394 + while (test__start_subtest("btf: missing node")) { 395 + btf = init_btf(); 396 + if (!ASSERT_OK_PTR(btf, "init_btf")) 397 + break; 398 + id = btf__add_struct(btf, "foo", 24); 399 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 400 + break; 401 + err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0); 402 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 403 + break; 404 + err = btf__add_field(btf, "b", LIST_HEAD, 64, 0); 405 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 406 + break; 407 + id = btf__add_decl_tag(btf, "contains:foo:c", 5, 1); 408 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:c")) 409 + break; 410 + 411 + err = btf__load_into_kernel(btf); 412 + btf__free(btf); 413 + ASSERT_EQ(err, -ENOENT, "check btf"); 414 + break; 415 + } 416 + 417 + while (test__start_subtest("btf: node incorrect type")) { 418 + btf = init_btf(); 419 + 
if (!ASSERT_OK_PTR(btf, "init_btf")) 420 + break; 421 + id = btf__add_struct(btf, "foo", 20); 422 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 423 + break; 424 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 425 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 426 + break; 427 + err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0); 428 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 429 + break; 430 + id = btf__add_decl_tag(btf, "contains:bar:a", 5, 0); 431 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:a")) 432 + break; 433 + id = btf__add_struct(btf, "bar", 4); 434 + if (!ASSERT_EQ(id, 7, "btf__add_struct bar")) 435 + break; 436 + err = btf__add_field(btf, "a", SPIN_LOCK, 0, 0); 437 + if (!ASSERT_OK(err, "btf__add_field bar::a")) 438 + break; 439 + 440 + err = btf__load_into_kernel(btf); 441 + ASSERT_EQ(err, -EINVAL, "check btf"); 442 + btf__free(btf); 443 + break; 444 + } 445 + 446 + while (test__start_subtest("btf: multiple bpf_list_node with name b")) { 447 + btf = init_btf(); 448 + if (!ASSERT_OK_PTR(btf, "init_btf")) 449 + break; 450 + id = btf__add_struct(btf, "foo", 52); 451 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 452 + break; 453 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 454 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 455 + break; 456 + err = btf__add_field(btf, "b", LIST_NODE, 128, 0); 457 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 458 + break; 459 + err = btf__add_field(btf, "b", LIST_NODE, 256, 0); 460 + if (!ASSERT_OK(err, "btf__add_field foo::c")) 461 + break; 462 + err = btf__add_field(btf, "d", SPIN_LOCK, 384, 0); 463 + if (!ASSERT_OK(err, "btf__add_field foo::d")) 464 + break; 465 + id = btf__add_decl_tag(btf, "contains:foo:b", 5, 0); 466 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:b")) 467 + break; 468 + 469 + err = btf__load_into_kernel(btf); 470 + ASSERT_EQ(err, -EINVAL, "check btf"); 471 + btf__free(btf); 472 + break; 473 + } 474 + 475 + while (test__start_subtest("btf: owning | owned AA 
cycle")) { 476 + btf = init_btf(); 477 + if (!ASSERT_OK_PTR(btf, "init_btf")) 478 + break; 479 + id = btf__add_struct(btf, "foo", 36); 480 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 481 + break; 482 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 483 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 484 + break; 485 + err = btf__add_field(btf, "b", LIST_NODE, 128, 0); 486 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 487 + break; 488 + err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0); 489 + if (!ASSERT_OK(err, "btf__add_field foo::c")) 490 + break; 491 + id = btf__add_decl_tag(btf, "contains:foo:b", 5, 0); 492 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:foo:b")) 493 + break; 494 + 495 + err = btf__load_into_kernel(btf); 496 + ASSERT_EQ(err, -ELOOP, "check btf"); 497 + btf__free(btf); 498 + break; 499 + } 500 + 501 + while (test__start_subtest("btf: owning | owned ABA cycle")) { 502 + btf = init_btf(); 503 + if (!ASSERT_OK_PTR(btf, "init_btf")) 504 + break; 505 + id = btf__add_struct(btf, "foo", 36); 506 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 507 + break; 508 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 509 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 510 + break; 511 + err = btf__add_field(btf, "b", LIST_NODE, 128, 0); 512 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 513 + break; 514 + err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0); 515 + if (!ASSERT_OK(err, "btf__add_field foo::c")) 516 + break; 517 + id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0); 518 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b")) 519 + break; 520 + id = btf__add_struct(btf, "bar", 36); 521 + if (!ASSERT_EQ(id, 7, "btf__add_struct bar")) 522 + break; 523 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 524 + if (!ASSERT_OK(err, "btf__add_field bar::a")) 525 + break; 526 + err = btf__add_field(btf, "b", LIST_NODE, 128, 0); 527 + if (!ASSERT_OK(err, "btf__add_field bar::b")) 528 + break; 529 + err = btf__add_field(btf, "c", SPIN_LOCK, 
256, 0); 530 + if (!ASSERT_OK(err, "btf__add_field bar::c")) 531 + break; 532 + id = btf__add_decl_tag(btf, "contains:foo:b", 7, 0); 533 + if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:foo:b")) 534 + break; 535 + 536 + err = btf__load_into_kernel(btf); 537 + ASSERT_EQ(err, -ELOOP, "check btf"); 538 + btf__free(btf); 539 + break; 540 + } 541 + 542 + while (test__start_subtest("btf: owning -> owned")) { 543 + btf = init_btf(); 544 + if (!ASSERT_OK_PTR(btf, "init_btf")) 545 + break; 546 + id = btf__add_struct(btf, "foo", 20); 547 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 548 + break; 549 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 550 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 551 + break; 552 + err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0); 553 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 554 + break; 555 + id = btf__add_decl_tag(btf, "contains:bar:a", 5, 0); 556 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:a")) 557 + break; 558 + id = btf__add_struct(btf, "bar", 16); 559 + if (!ASSERT_EQ(id, 7, "btf__add_struct bar")) 560 + break; 561 + err = btf__add_field(btf, "a", LIST_NODE, 0, 0); 562 + if (!ASSERT_OK(err, "btf__add_field bar::a")) 563 + break; 564 + 565 + err = btf__load_into_kernel(btf); 566 + ASSERT_EQ(err, 0, "check btf"); 567 + btf__free(btf); 568 + break; 569 + } 570 + 571 + while (test__start_subtest("btf: owning -> owning | owned -> owned")) { 572 + btf = init_btf(); 573 + if (!ASSERT_OK_PTR(btf, "init_btf")) 574 + break; 575 + id = btf__add_struct(btf, "foo", 20); 576 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 577 + break; 578 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 579 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 580 + break; 581 + err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0); 582 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 583 + break; 584 + id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0); 585 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b")) 586 + break; 587 + id = 
btf__add_struct(btf, "bar", 36); 588 + if (!ASSERT_EQ(id, 7, "btf__add_struct bar")) 589 + break; 590 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 591 + if (!ASSERT_OK(err, "btf__add_field bar::a")) 592 + break; 593 + err = btf__add_field(btf, "b", LIST_NODE, 128, 0); 594 + if (!ASSERT_OK(err, "btf__add_field bar::b")) 595 + break; 596 + err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0); 597 + if (!ASSERT_OK(err, "btf__add_field bar::c")) 598 + break; 599 + id = btf__add_decl_tag(btf, "contains:baz:a", 7, 0); 600 + if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:baz:a")) 601 + break; 602 + id = btf__add_struct(btf, "baz", 16); 603 + if (!ASSERT_EQ(id, 9, "btf__add_struct baz")) 604 + break; 605 + err = btf__add_field(btf, "a", LIST_NODE, 0, 0); 606 + if (!ASSERT_OK(err, "btf__add_field baz:a")) 607 + break; 608 + 609 + err = btf__load_into_kernel(btf); 610 + ASSERT_EQ(err, 0, "check btf"); 611 + btf__free(btf); 612 + break; 613 + } 614 + 615 + while (test__start_subtest("btf: owning | owned -> owning | owned -> owned")) { 616 + btf = init_btf(); 617 + if (!ASSERT_OK_PTR(btf, "init_btf")) 618 + break; 619 + id = btf__add_struct(btf, "foo", 36); 620 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 621 + break; 622 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 623 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 624 + break; 625 + err = btf__add_field(btf, "b", LIST_NODE, 128, 0); 626 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 627 + break; 628 + err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0); 629 + if (!ASSERT_OK(err, "btf__add_field foo::c")) 630 + break; 631 + id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0); 632 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b")) 633 + break; 634 + id = btf__add_struct(btf, "bar", 36); 635 + if (!ASSERT_EQ(id, 7, "btf__add_struct bar")) 636 + break; 637 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 638 + if (!ASSERT_OK(err, "btf__add_field bar:a")) 639 + break; 640 + err = btf__add_field(btf, 
"b", LIST_NODE, 128, 0); 641 + if (!ASSERT_OK(err, "btf__add_field bar:b")) 642 + break; 643 + err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0); 644 + if (!ASSERT_OK(err, "btf__add_field bar:c")) 645 + break; 646 + id = btf__add_decl_tag(btf, "contains:baz:a", 7, 0); 647 + if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:baz:a")) 648 + break; 649 + id = btf__add_struct(btf, "baz", 16); 650 + if (!ASSERT_EQ(id, 9, "btf__add_struct baz")) 651 + break; 652 + err = btf__add_field(btf, "a", LIST_NODE, 0, 0); 653 + if (!ASSERT_OK(err, "btf__add_field baz:a")) 654 + break; 655 + 656 + err = btf__load_into_kernel(btf); 657 + ASSERT_EQ(err, -ELOOP, "check btf"); 658 + btf__free(btf); 659 + break; 660 + } 661 + 662 + while (test__start_subtest("btf: owning -> owning | owned -> owning | owned -> owned")) { 663 + btf = init_btf(); 664 + if (!ASSERT_OK_PTR(btf, "init_btf")) 665 + break; 666 + id = btf__add_struct(btf, "foo", 20); 667 + if (!ASSERT_EQ(id, 5, "btf__add_struct foo")) 668 + break; 669 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 670 + if (!ASSERT_OK(err, "btf__add_field foo::a")) 671 + break; 672 + err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0); 673 + if (!ASSERT_OK(err, "btf__add_field foo::b")) 674 + break; 675 + id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0); 676 + if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b")) 677 + break; 678 + id = btf__add_struct(btf, "bar", 36); 679 + if (!ASSERT_EQ(id, 7, "btf__add_struct bar")) 680 + break; 681 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 682 + if (!ASSERT_OK(err, "btf__add_field bar::a")) 683 + break; 684 + err = btf__add_field(btf, "b", LIST_NODE, 128, 0); 685 + if (!ASSERT_OK(err, "btf__add_field bar::b")) 686 + break; 687 + err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0); 688 + if (!ASSERT_OK(err, "btf__add_field bar::c")) 689 + break; 690 + id = btf__add_decl_tag(btf, "contains:baz:b", 7, 0); 691 + if (!ASSERT_EQ(id, 8, "btf__add_decl_tag")) 692 + break; 693 + id = 
btf__add_struct(btf, "baz", 36); 694 + if (!ASSERT_EQ(id, 9, "btf__add_struct baz")) 695 + break; 696 + err = btf__add_field(btf, "a", LIST_HEAD, 0, 0); 697 + if (!ASSERT_OK(err, "btf__add_field baz::a")) 698 + break; 699 + err = btf__add_field(btf, "b", LIST_NODE, 128, 0); 700 + if (!ASSERT_OK(err, "btf__add_field baz::b")) 701 + break; 702 + err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0); 703 + if (!ASSERT_OK(err, "btf__add_field baz::c")) 704 + break; 705 + id = btf__add_decl_tag(btf, "contains:bam:a", 9, 0); 706 + if (!ASSERT_EQ(id, 10, "btf__add_decl_tag contains:bam:a")) 707 + break; 708 + id = btf__add_struct(btf, "bam", 16); 709 + if (!ASSERT_EQ(id, 11, "btf__add_struct bam")) 710 + break; 711 + err = btf__add_field(btf, "a", LIST_NODE, 0, 0); 712 + if (!ASSERT_OK(err, "btf__add_field bam::a")) 713 + break; 714 + 715 + err = btf__load_into_kernel(btf); 716 + ASSERT_EQ(err, -ELOOP, "check btf"); 717 + btf__free(btf); 718 + break; 719 + } 720 + } 721 + 722 + void test_linked_list(void) 723 + { 724 + int i; 725 + 726 + for (i = 0; i < ARRAY_SIZE(linked_list_fail_tests); i++) { 727 + if (!test__start_subtest(linked_list_fail_tests[i].prog_name)) 728 + continue; 729 + test_linked_list_fail_prog(linked_list_fail_tests[i].prog_name, 730 + linked_list_fail_tests[i].err_msg); 731 + } 732 + test_btf(); 733 + test_linked_list_success(PUSH_POP, false); 734 + test_linked_list_success(PUSH_POP, true); 735 + test_linked_list_success(PUSH_POP_MULT, false); 736 + test_linked_list_success(PUSH_POP_MULT, true); 737 + test_linked_list_success(LIST_IN_LIST, false); 738 + test_linked_list_success(LIST_IN_LIST, true); 739 + test_linked_list_success(TEST_ALL, false); 740 + }
+13 -4
tools/testing/selftests/bpf/prog_tests/lsm_cgroup.c
··· 173 173 ASSERT_EQ(query_prog_cnt(cgroup_fd, NULL), 4, "total prog count"); 174 174 ASSERT_EQ(query_prog_cnt(cgroup_fd2, NULL), 1, "total prog count"); 175 175 176 - /* AF_UNIX is prohibited. */ 177 - 178 176 fd = socket(AF_UNIX, SOCK_STREAM, 0); 179 - ASSERT_LT(fd, 0, "socket(AF_UNIX)"); 177 + if (!(skel->kconfig->CONFIG_SECURITY_APPARMOR 178 + || skel->kconfig->CONFIG_SECURITY_SELINUX 179 + || skel->kconfig->CONFIG_SECURITY_SMACK)) 180 + /* AF_UNIX is prohibited. */ 181 + ASSERT_LT(fd, 0, "socket(AF_UNIX)"); 180 182 close(fd); 181 183 182 184 /* AF_INET6 gets default policy (sk_priority). */ ··· 235 233 236 234 /* AF_INET6+SOCK_STREAM 237 235 * AF_PACKET+SOCK_RAW 236 + * AF_UNIX+SOCK_RAW if already have non-bpf lsms installed 238 237 * listen_fd 239 238 * client_fd 240 239 * accepted_fd 241 240 */ 242 - ASSERT_EQ(skel->bss->called_socket_post_create2, 5, "called_create2"); 241 + if (skel->kconfig->CONFIG_SECURITY_APPARMOR 242 + || skel->kconfig->CONFIG_SECURITY_SELINUX 243 + || skel->kconfig->CONFIG_SECURITY_SMACK) 244 + /* AF_UNIX+SOCK_RAW if already have non-bpf lsms installed */ 245 + ASSERT_EQ(skel->bss->called_socket_post_create2, 6, "called_create2"); 246 + else 247 + ASSERT_EQ(skel->bss->called_socket_post_create2, 5, "called_create2"); 243 248 244 249 /* start_server 245 250 * bind(ETH_P_ALL)
+158
tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates.*/ 3 + 4 + #define _GNU_SOURCE 5 + #include <unistd.h> 6 + #include <sys/syscall.h> 7 + #include <sys/types.h> 8 + #include <test_progs.h> 9 + #include <bpf/btf.h> 10 + #include "rcu_read_lock.skel.h" 11 + #include "cgroup_helpers.h" 12 + 13 + static unsigned long long cgroup_id; 14 + 15 + static void test_success(void) 16 + { 17 + struct rcu_read_lock *skel; 18 + int err; 19 + 20 + skel = rcu_read_lock__open(); 21 + if (!ASSERT_OK_PTR(skel, "skel_open")) 22 + return; 23 + 24 + skel->bss->target_pid = syscall(SYS_gettid); 25 + 26 + bpf_program__set_autoload(skel->progs.get_cgroup_id, true); 27 + bpf_program__set_autoload(skel->progs.task_succ, true); 28 + bpf_program__set_autoload(skel->progs.no_lock, true); 29 + bpf_program__set_autoload(skel->progs.two_regions, true); 30 + bpf_program__set_autoload(skel->progs.non_sleepable_1, true); 31 + bpf_program__set_autoload(skel->progs.non_sleepable_2, true); 32 + err = rcu_read_lock__load(skel); 33 + if (!ASSERT_OK(err, "skel_load")) 34 + goto out; 35 + 36 + err = rcu_read_lock__attach(skel); 37 + if (!ASSERT_OK(err, "skel_attach")) 38 + goto out; 39 + 40 + syscall(SYS_getpgid); 41 + 42 + ASSERT_EQ(skel->bss->task_storage_val, 2, "task_storage_val"); 43 + ASSERT_EQ(skel->bss->cgroup_id, cgroup_id, "cgroup_id"); 44 + out: 45 + rcu_read_lock__destroy(skel); 46 + } 47 + 48 + static void test_rcuptr_acquire(void) 49 + { 50 + struct rcu_read_lock *skel; 51 + int err; 52 + 53 + skel = rcu_read_lock__open(); 54 + if (!ASSERT_OK_PTR(skel, "skel_open")) 55 + return; 56 + 57 + skel->bss->target_pid = syscall(SYS_gettid); 58 + 59 + bpf_program__set_autoload(skel->progs.task_acquire, true); 60 + err = rcu_read_lock__load(skel); 61 + if (!ASSERT_OK(err, "skel_load")) 62 + goto out; 63 + 64 + err = rcu_read_lock__attach(skel); 65 + ASSERT_OK(err, "skel_attach"); 66 + out: 67 + rcu_read_lock__destroy(skel); 68 + } 69 + 70 + static 
const char * const inproper_region_tests[] = { 71 + "miss_lock", 72 + "miss_unlock", 73 + "non_sleepable_rcu_mismatch", 74 + "inproper_sleepable_helper", 75 + "inproper_sleepable_kfunc", 76 + "nested_rcu_region", 77 + }; 78 + 79 + static void test_inproper_region(void) 80 + { 81 + struct rcu_read_lock *skel; 82 + struct bpf_program *prog; 83 + int i, err; 84 + 85 + for (i = 0; i < ARRAY_SIZE(inproper_region_tests); i++) { 86 + skel = rcu_read_lock__open(); 87 + if (!ASSERT_OK_PTR(skel, "skel_open")) 88 + return; 89 + 90 + prog = bpf_object__find_program_by_name(skel->obj, inproper_region_tests[i]); 91 + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 92 + goto out; 93 + bpf_program__set_autoload(prog, true); 94 + err = rcu_read_lock__load(skel); 95 + ASSERT_ERR(err, "skel_load"); 96 + out: 97 + rcu_read_lock__destroy(skel); 98 + } 99 + } 100 + 101 + static const char * const rcuptr_misuse_tests[] = { 102 + "task_untrusted_non_rcuptr", 103 + "task_untrusted_rcuptr", 104 + "cross_rcu_region", 105 + }; 106 + 107 + static void test_rcuptr_misuse(void) 108 + { 109 + struct rcu_read_lock *skel; 110 + struct bpf_program *prog; 111 + int i, err; 112 + 113 + for (i = 0; i < ARRAY_SIZE(rcuptr_misuse_tests); i++) { 114 + skel = rcu_read_lock__open(); 115 + if (!ASSERT_OK_PTR(skel, "skel_open")) 116 + return; 117 + 118 + prog = bpf_object__find_program_by_name(skel->obj, rcuptr_misuse_tests[i]); 119 + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 120 + goto out; 121 + bpf_program__set_autoload(prog, true); 122 + err = rcu_read_lock__load(skel); 123 + ASSERT_ERR(err, "skel_load"); 124 + out: 125 + rcu_read_lock__destroy(skel); 126 + } 127 + } 128 + 129 + void test_rcu_read_lock(void) 130 + { 131 + struct btf *vmlinux_btf; 132 + int cgroup_fd; 133 + 134 + vmlinux_btf = btf__load_vmlinux_btf(); 135 + if (!ASSERT_OK_PTR(vmlinux_btf, "could not load vmlinux BTF")) 136 + return; 137 + if (btf__find_by_name_kind(vmlinux_btf, "rcu", BTF_KIND_TYPE_TAG) < 
0) { 138 + test__skip(); 139 + goto out; 140 + } 141 + 142 + cgroup_fd = test__join_cgroup("/rcu_read_lock"); 143 + if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /rcu_read_lock")) 144 + goto out; 145 + 146 + cgroup_id = get_cgroup_id("/rcu_read_lock"); 147 + if (test__start_subtest("success")) 148 + test_success(); 149 + if (test__start_subtest("rcuptr_acquire")) 150 + test_rcuptr_acquire(); 151 + if (test__start_subtest("negative_tests_inproper_region")) 152 + test_inproper_region(); 153 + if (test__start_subtest("negative_tests_rcuptr_misuse")) 154 + test_rcuptr_misuse(); 155 + close(cgroup_fd); 156 + out: 157 + btf__free(vmlinux_btf); 158 + }
+142
tools/testing/selftests/bpf/prog_tests/spin_lock.c
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include <network_helpers.h>

#include "test_spin_lock.skel.h"
#include "test_spin_lock_fail.skel.h"

static char log_buf[1024 * 1024];

static struct {
	const char *prog_name;
	const char *err_msg;
} spin_lock_fail_tests[] = {
	{ "lock_id_kptr_preserve",
	  "5: (bf) r1 = r0 ; R0_w=ptr_foo(id=2,ref_obj_id=2,off=0,imm=0) "
	  "R1_w=ptr_foo(id=2,ref_obj_id=2,off=0,imm=0) refs=2\n6: (85) call bpf_this_cpu_ptr#154\n"
	  "R1 type=ptr_ expected=percpu_ptr_" },
	{ "lock_id_global_zero",
	  "; R1_w=map_value(off=0,ks=4,vs=4,imm=0)\n2: (85) call bpf_this_cpu_ptr#154\n"
	  "R1 type=map_value expected=percpu_ptr_" },
	{ "lock_id_mapval_preserve",
	  "8: (bf) r1 = r0 ; R0_w=map_value(id=1,off=0,ks=4,vs=8,imm=0) "
	  "R1_w=map_value(id=1,off=0,ks=4,vs=8,imm=0)\n9: (85) call bpf_this_cpu_ptr#154\n"
	  "R1 type=map_value expected=percpu_ptr_" },
	{ "lock_id_innermapval_preserve",
	  "13: (bf) r1 = r0 ; R0=map_value(id=2,off=0,ks=4,vs=8,imm=0) "
	  "R1_w=map_value(id=2,off=0,ks=4,vs=8,imm=0)\n14: (85) call bpf_this_cpu_ptr#154\n"
	  "R1 type=map_value expected=percpu_ptr_" },
	{ "lock_id_mismatch_kptr_kptr", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_kptr_global", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_kptr_mapval", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_kptr_innermapval", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_global_global", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_global_kptr", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_global_mapval", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_global_innermapval", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_mapval_mapval", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_mapval_kptr", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_mapval_global", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_mapval_innermapval", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_innermapval_innermapval1", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_innermapval_innermapval2", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_innermapval_kptr", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_innermapval_global", "bpf_spin_unlock of different lock" },
	{ "lock_id_mismatch_innermapval_mapval", "bpf_spin_unlock of different lock" },
};

static void test_spin_lock_fail_prog(const char *prog_name, const char *err_msg)
{
	LIBBPF_OPTS(bpf_object_open_opts, opts, .kernel_log_buf = log_buf,
						.kernel_log_size = sizeof(log_buf),
						.kernel_log_level = 1);
	struct test_spin_lock_fail *skel;
	struct bpf_program *prog;
	int ret;

	skel = test_spin_lock_fail__open_opts(&opts);
	if (!ASSERT_OK_PTR(skel, "test_spin_lock_fail__open_opts"))
		return;

	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
		goto end;

	bpf_program__set_autoload(prog, true);

	ret = test_spin_lock_fail__load(skel);
	if (!ASSERT_ERR(ret, "test_spin_lock_fail__load must fail"))
		goto end;

	/* Skip check if JIT does not support kfuncs */
	if (strstr(log_buf, "JIT does not support calling kernel function")) {
		test__skip();
		goto end;
	}

	if (!ASSERT_OK_PTR(strstr(log_buf, err_msg), "expected error message")) {
		fprintf(stderr, "Expected: %s\n", err_msg);
		fprintf(stderr, "Verifier: %s\n", log_buf);
	}

end:
	test_spin_lock_fail__destroy(skel);
}

static void *spin_lock_thread(void *arg)
{
	int err, prog_fd = *(u32 *) arg;
	LIBBPF_OPTS(bpf_test_run_opts, topts,
		.data_in = &pkt_v4,
		.data_size_in = sizeof(pkt_v4),
		.repeat = 10000,
	);

	err = bpf_prog_test_run_opts(prog_fd, &topts);
	ASSERT_OK(err, "test_run");
	ASSERT_OK(topts.retval, "test_run retval");
	pthread_exit(arg);
}

void test_spin_lock_success(void)
{
	struct test_spin_lock *skel;
	pthread_t thread_id[4];
	int prog_fd, i;
	void *ret;

	skel = test_spin_lock__open_and_load();
	if (!ASSERT_OK_PTR(skel, "test_spin_lock__open_and_load"))
		return;
	prog_fd = bpf_program__fd(skel->progs.bpf_spin_lock_test);
	for (i = 0; i < 4; i++) {
		int err;

		err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd);
		if (!ASSERT_OK(err, "pthread_create"))
			goto end;
	}

	for (i = 0; i < 4; i++) {
		if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join"))
			goto end;
		if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd"))
			goto end;
	}
end:
	test_spin_lock__destroy(skel);
}

void test_spin_lock(void)
{
	int i;

	test_spin_lock_success();

	for (i = 0; i < ARRAY_SIZE(spin_lock_fail_tests); i++) {
		if (!test__start_subtest(spin_lock_fail_tests[i].prog_name))
			continue;
		test_spin_lock_fail_prog(spin_lock_fail_tests[i].prog_name,
					 spin_lock_fail_tests[i].err_msg);
	}
}
-45
tools/testing/selftests/bpf/prog_tests/spinlock.c
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include <network_helpers.h>

static void *spin_lock_thread(void *arg)
{
	int err, prog_fd = *(u32 *) arg;
	LIBBPF_OPTS(bpf_test_run_opts, topts,
		.data_in = &pkt_v4,
		.data_size_in = sizeof(pkt_v4),
		.repeat = 10000,
	);

	err = bpf_prog_test_run_opts(prog_fd, &topts);
	ASSERT_OK(err, "test_run");
	ASSERT_OK(topts.retval, "test_run retval");
	pthread_exit(arg);
}

void test_spinlock(void)
{
	const char *file = "./test_spin_lock.bpf.o";
	pthread_t thread_id[4];
	struct bpf_object *obj = NULL;
	int prog_fd;
	int err = 0, i;
	void *ret;

	err = bpf_prog_test_load(file, BPF_PROG_TYPE_CGROUP_SKB, &obj, &prog_fd);
	if (CHECK_FAIL(err)) {
		printf("test_spin_lock:bpf_prog_test_load errno %d\n", errno);
		goto close_prog;
	}
	for (i = 0; i < 4; i++)
		if (CHECK_FAIL(pthread_create(&thread_id[i], NULL,
					      &spin_lock_thread, &prog_fd)))
			goto close_prog;

	for (i = 0; i < 4; i++)
		if (CHECK_FAIL(pthread_join(thread_id[i], &ret) ||
			       ret != (void *)&prog_fd))
			goto close_prog;
close_prog:
	bpf_object__close(obj);
}
+163
tools/testing/selftests/bpf/prog_tests/task_kfunc.c
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */

#define _GNU_SOURCE
#include <sys/wait.h>
#include <test_progs.h>
#include <unistd.h>

#include "task_kfunc_failure.skel.h"
#include "task_kfunc_success.skel.h"

static size_t log_buf_sz = 1 << 20; /* 1 MB */
static char obj_log_buf[1048576];

static struct task_kfunc_success *open_load_task_kfunc_skel(void)
{
	struct task_kfunc_success *skel;
	int err;

	skel = task_kfunc_success__open();
	if (!ASSERT_OK_PTR(skel, "skel_open"))
		return NULL;

	skel->bss->pid = getpid();

	err = task_kfunc_success__load(skel);
	if (!ASSERT_OK(err, "skel_load"))
		goto cleanup;

	return skel;

cleanup:
	task_kfunc_success__destroy(skel);
	return NULL;
}

static void run_success_test(const char *prog_name)
{
	struct task_kfunc_success *skel;
	int status;
	pid_t child_pid;
	struct bpf_program *prog;
	struct bpf_link *link = NULL;

	skel = open_load_task_kfunc_skel();
	if (!ASSERT_OK_PTR(skel, "open_load_skel"))
		return;

	if (!ASSERT_OK(skel->bss->err, "pre_spawn_err"))
		goto cleanup;

	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
		goto cleanup;

	link = bpf_program__attach(prog);
	if (!ASSERT_OK_PTR(link, "attached_link"))
		goto cleanup;

	child_pid = fork();
	if (!ASSERT_GT(child_pid, -1, "child_pid"))
		goto cleanup;
	if (child_pid == 0)
		_exit(0);
	waitpid(child_pid, &status, 0);

	ASSERT_OK(skel->bss->err, "post_wait_err");

cleanup:
	bpf_link__destroy(link);
	task_kfunc_success__destroy(skel);
}

static const char * const success_tests[] = {
	"test_task_acquire_release_argument",
	"test_task_acquire_release_current",
	"test_task_acquire_leave_in_map",
	"test_task_xchg_release",
	"test_task_get_release",
	"test_task_current_acquire_release",
	"test_task_from_pid_arg",
	"test_task_from_pid_current",
	"test_task_from_pid_invalid",
};

static struct {
	const char *prog_name;
	const char *expected_err_msg;
} failure_tests[] = {
	{"task_kfunc_acquire_untrusted", "R1 must be referenced or trusted"},
	{"task_kfunc_acquire_fp", "arg#0 pointer type STRUCT task_struct must point"},
	{"task_kfunc_acquire_unsafe_kretprobe", "reg type unsupported for arg#0 function"},
	{"task_kfunc_acquire_trusted_walked", "R1 must be referenced or trusted"},
	{"task_kfunc_acquire_null", "arg#0 pointer type STRUCT task_struct must point"},
	{"task_kfunc_acquire_unreleased", "Unreleased reference"},
	{"task_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"},
	{"task_kfunc_get_non_kptr_acquired", "arg#0 expected pointer to map value"},
	{"task_kfunc_get_null", "arg#0 expected pointer to map value"},
	{"task_kfunc_xchg_unreleased", "Unreleased reference"},
	{"task_kfunc_get_unreleased", "Unreleased reference"},
	{"task_kfunc_release_untrusted", "arg#0 is untrusted_ptr_or_null_ expected ptr_ or socket"},
	{"task_kfunc_release_fp", "arg#0 pointer type STRUCT task_struct must point"},
	{"task_kfunc_release_null", "arg#0 is ptr_or_null_ expected ptr_ or socket"},
	{"task_kfunc_release_unacquired", "release kernel function bpf_task_release expects"},
	{"task_kfunc_from_pid_no_null_check", "arg#0 is ptr_or_null_ expected ptr_ or socket"},
};

static void verify_fail(const char *prog_name, const char *expected_err_msg)
{
	LIBBPF_OPTS(bpf_object_open_opts, opts);
	struct task_kfunc_failure *skel;
	int err, i;

	opts.kernel_log_buf = obj_log_buf;
	opts.kernel_log_size = log_buf_sz;
	opts.kernel_log_level = 1;

	skel = task_kfunc_failure__open_opts(&opts);
	if (!ASSERT_OK_PTR(skel, "task_kfunc_failure__open_opts"))
		goto cleanup;

	for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
		struct bpf_program *prog;
		const char *curr_name = failure_tests[i].prog_name;

		prog = bpf_object__find_program_by_name(skel->obj, curr_name);
		if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
			goto cleanup;

		bpf_program__set_autoload(prog, !strcmp(curr_name, prog_name));
	}

	err = task_kfunc_failure__load(skel);
	if (!ASSERT_ERR(err, "unexpected load success"))
		goto cleanup;

	if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
		fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
		fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
	}

cleanup:
	task_kfunc_failure__destroy(skel);
}

void test_task_kfunc(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(success_tests); i++) {
		if (!test__start_subtest(success_tests[i]))
			continue;

		run_success_test(success_tests[i]);
	}

	for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
		if (!test__start_subtest(failure_tests[i].prog_name))
			continue;

		verify_fail(failure_tests[i].prog_name, failure_tests[i].expected_err_msg);
	}
}
+114
tools/testing/selftests/bpf/prog_tests/type_cast.c
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#include <test_progs.h>
#include <network_helpers.h>
#include "type_cast.skel.h"

static void test_xdp(void)
{
	struct type_cast *skel;
	int err, prog_fd;
	char buf[128];

	LIBBPF_OPTS(bpf_test_run_opts, topts,
		.data_in = &pkt_v4,
		.data_size_in = sizeof(pkt_v4),
		.data_out = buf,
		.data_size_out = sizeof(buf),
		.repeat = 1,
	);

	skel = type_cast__open();
	if (!ASSERT_OK_PTR(skel, "skel_open"))
		return;

	bpf_program__set_autoload(skel->progs.md_xdp, true);
	err = type_cast__load(skel);
	if (!ASSERT_OK(err, "skel_load"))
		goto out;

	prog_fd = bpf_program__fd(skel->progs.md_xdp);
	err = bpf_prog_test_run_opts(prog_fd, &topts);
	ASSERT_OK(err, "test_run");
	ASSERT_EQ(topts.retval, XDP_PASS, "xdp test_run retval");

	ASSERT_EQ(skel->bss->ifindex, 1, "xdp_md ifindex");
	ASSERT_EQ(skel->bss->ifindex, skel->bss->ingress_ifindex, "xdp_md ingress_ifindex");
	ASSERT_STREQ(skel->bss->name, "lo", "xdp_md name");
	ASSERT_NEQ(skel->bss->inum, 0, "xdp_md inum");

out:
	type_cast__destroy(skel);
}

static void test_tc(void)
{
	struct type_cast *skel;
	int err, prog_fd;

	LIBBPF_OPTS(bpf_test_run_opts, topts,
		.data_in = &pkt_v4,
		.data_size_in = sizeof(pkt_v4),
		.repeat = 1,
	);

	skel = type_cast__open();
	if (!ASSERT_OK_PTR(skel, "skel_open"))
		return;

	bpf_program__set_autoload(skel->progs.md_skb, true);
	err = type_cast__load(skel);
	if (!ASSERT_OK(err, "skel_load"))
		goto out;

	prog_fd = bpf_program__fd(skel->progs.md_skb);
	err = bpf_prog_test_run_opts(prog_fd, &topts);
	ASSERT_OK(err, "test_run");
	ASSERT_EQ(topts.retval, 0, "tc test_run retval");

	ASSERT_EQ(skel->bss->meta_len, 0, "skb meta_len");
	ASSERT_EQ(skel->bss->frag0_len, 0, "skb frag0_len");
	ASSERT_NEQ(skel->bss->kskb_len, 0, "skb len");
	ASSERT_NEQ(skel->bss->kskb2_len, 0, "skb2 len");
	ASSERT_EQ(skel->bss->kskb_len, skel->bss->kskb2_len, "skb len compare");

out:
	type_cast__destroy(skel);
}

static const char * const negative_tests[] = {
	"untrusted_ptr",
	"kctx_u64",
};

static void test_negative(void)
{
	struct bpf_program *prog;
	struct type_cast *skel;
	int i, err;

	for (i = 0; i < ARRAY_SIZE(negative_tests); i++) {
		skel = type_cast__open();
		if (!ASSERT_OK_PTR(skel, "skel_open"))
			return;

		prog = bpf_object__find_program_by_name(skel->obj, negative_tests[i]);
		if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
			goto out;
		bpf_program__set_autoload(prog, true);
		err = type_cast__load(skel);
		ASSERT_ERR(err, "skel_load");
out:
		type_cast__destroy(skel);
	}
}

void test_type_cast(void)
{
	if (test__start_subtest("xdp"))
		test_xdp();
	if (test__start_subtest("tc"))
		test_tc();
	if (test__start_subtest("negative"))
		test_negative();
}
+1 -1
tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
 }

 #define NUM_PKTS 10000
-void test_xdp_do_redirect(void)
+void serial_test_xdp_do_redirect(void)
 {
 	int err, xdp_prog_fd, tc_prog_fd, ifindex_src, ifindex_dst;
 	char data[sizeof(pkt_udp) + sizeof(__u32)];
+1 -1
tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c
 	system("ip netns del synproxy");
 }

-void test_xdp_synproxy(void)
+void serial_test_xdp_synproxy(void)
 {
 	if (test__start_subtest("xdp"))
 		test_synproxy(true);
+72
tools/testing/selftests/bpf/progs/cgrp_kfunc_common.h
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */

#ifndef _CGRP_KFUNC_COMMON_H
#define _CGRP_KFUNC_COMMON_H

#include <errno.h>
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct __cgrps_kfunc_map_value {
	struct cgroup __kptr_ref * cgrp;
};

struct hash_map {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, int);
	__type(value, struct __cgrps_kfunc_map_value);
	__uint(max_entries, 1);
} __cgrps_kfunc_map SEC(".maps");

struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym;
struct cgroup *bpf_cgroup_kptr_get(struct cgroup **pp) __ksym;
void bpf_cgroup_release(struct cgroup *p) __ksym;
struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level) __ksym;

static inline struct __cgrps_kfunc_map_value *cgrps_kfunc_map_value_lookup(struct cgroup *cgrp)
{
	s32 id;
	long status;

	status = bpf_probe_read_kernel(&id, sizeof(id), &cgrp->self.id);
	if (status)
		return NULL;

	return bpf_map_lookup_elem(&__cgrps_kfunc_map, &id);
}

static inline int cgrps_kfunc_map_insert(struct cgroup *cgrp)
{
	struct __cgrps_kfunc_map_value local, *v;
	long status;
	struct cgroup *acquired, *old;
	s32 id;

	status = bpf_probe_read_kernel(&id, sizeof(id), &cgrp->self.id);
	if (status)
		return status;

	local.cgrp = NULL;
	status = bpf_map_update_elem(&__cgrps_kfunc_map, &id, &local, BPF_NOEXIST);
	if (status)
		return status;

	v = bpf_map_lookup_elem(&__cgrps_kfunc_map, &id);
	if (!v) {
		bpf_map_delete_elem(&__cgrps_kfunc_map, &id);
		return -ENOENT;
	}

	acquired = bpf_cgroup_acquire(cgrp);
	old = bpf_kptr_xchg(&v->cgrp, acquired);
	if (old) {
		bpf_cgroup_release(old);
		return -EEXIST;
	}

	return 0;
}

#endif /* _CGRP_KFUNC_COMMON_H */
+260
tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */

#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>

#include "cgrp_kfunc_common.h"

char _license[] SEC("license") = "GPL";

/* Prototype for all of the program trace events below:
 *
 * TRACE_EVENT(cgroup_mkdir,
 *         TP_PROTO(struct cgroup *cgrp, const char *path),
 *         TP_ARGS(cgrp, path)
 */

static struct __cgrps_kfunc_map_value *insert_lookup_cgrp(struct cgroup *cgrp)
{
	int status;

	status = cgrps_kfunc_map_insert(cgrp);
	if (status)
		return NULL;

	return cgrps_kfunc_map_value_lookup(cgrp);
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_acquire_untrusted, struct cgroup *cgrp, const char *path)
{
	struct cgroup *acquired;
	struct __cgrps_kfunc_map_value *v;

	v = insert_lookup_cgrp(cgrp);
	if (!v)
		return 0;

	/* Can't invoke bpf_cgroup_acquire() on an untrusted pointer. */
	acquired = bpf_cgroup_acquire(v->cgrp);
	bpf_cgroup_release(acquired);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_acquire_fp, struct cgroup *cgrp, const char *path)
{
	struct cgroup *acquired, *stack_cgrp = (struct cgroup *)&path;

	/* Can't invoke bpf_cgroup_acquire() on a random frame pointer. */
	acquired = bpf_cgroup_acquire((struct cgroup *)&stack_cgrp);
	bpf_cgroup_release(acquired);

	return 0;
}

SEC("kretprobe/cgroup_destroy_locked")
int BPF_PROG(cgrp_kfunc_acquire_unsafe_kretprobe, struct cgroup *cgrp)
{
	struct cgroup *acquired;

	/* Can't acquire an untrusted struct cgroup * pointer. */
	acquired = bpf_cgroup_acquire(cgrp);
	bpf_cgroup_release(acquired);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_acquire_trusted_walked, struct cgroup *cgrp, const char *path)
{
	struct cgroup *acquired;

	/* Can't invoke bpf_cgroup_acquire() on a pointer obtained from walking a trusted cgroup. */
	acquired = bpf_cgroup_acquire(cgrp->old_dom_cgrp);
	bpf_cgroup_release(acquired);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_acquire_null, struct cgroup *cgrp, const char *path)
{
	struct cgroup *acquired;

	/* Can't invoke bpf_cgroup_acquire() on a NULL pointer. */
	acquired = bpf_cgroup_acquire(NULL);
	if (!acquired)
		return 0;
	bpf_cgroup_release(acquired);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_acquire_unreleased, struct cgroup *cgrp, const char *path)
{
	struct cgroup *acquired;

	acquired = bpf_cgroup_acquire(cgrp);

	/* Acquired cgroup is never released. */

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_get_non_kptr_param, struct cgroup *cgrp, const char *path)
{
	struct cgroup *kptr;

	/* Cannot use bpf_cgroup_kptr_get() on a non-kptr, even on a valid cgroup. */
	kptr = bpf_cgroup_kptr_get(&cgrp);
	if (!kptr)
		return 0;

	bpf_cgroup_release(kptr);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_get_non_kptr_acquired, struct cgroup *cgrp, const char *path)
{
	struct cgroup *kptr, *acquired;

	acquired = bpf_cgroup_acquire(cgrp);

	/* Cannot use bpf_cgroup_kptr_get() on a non-map-value, even if the kptr was acquired. */
	kptr = bpf_cgroup_kptr_get(&acquired);
	bpf_cgroup_release(acquired);
	if (!kptr)
		return 0;

	bpf_cgroup_release(kptr);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_get_null, struct cgroup *cgrp, const char *path)
{
	struct cgroup *kptr;

	/* Cannot use bpf_cgroup_kptr_get() on a NULL pointer. */
	kptr = bpf_cgroup_kptr_get(NULL);
	if (!kptr)
		return 0;

	bpf_cgroup_release(kptr);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_xchg_unreleased, struct cgroup *cgrp, const char *path)
{
	struct cgroup *kptr;
	struct __cgrps_kfunc_map_value *v;

	v = insert_lookup_cgrp(cgrp);
	if (!v)
		return 0;

	kptr = bpf_kptr_xchg(&v->cgrp, NULL);
	if (!kptr)
		return 0;

	/* Kptr retrieved from map is never released. */

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_get_unreleased, struct cgroup *cgrp, const char *path)
{
	struct cgroup *kptr;
	struct __cgrps_kfunc_map_value *v;

	v = insert_lookup_cgrp(cgrp);
	if (!v)
		return 0;

	kptr = bpf_cgroup_kptr_get(&v->cgrp);
	if (!kptr)
		return 0;

	/* Kptr acquired above is never released. */

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_release_untrusted, struct cgroup *cgrp, const char *path)
{
	struct __cgrps_kfunc_map_value *v;

	v = insert_lookup_cgrp(cgrp);
	if (!v)
		return 0;

	/* Can't invoke bpf_cgroup_release() on an untrusted pointer. */
	bpf_cgroup_release(v->cgrp);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_release_fp, struct cgroup *cgrp, const char *path)
{
	struct cgroup *acquired = (struct cgroup *)&path;

	/* Cannot release random frame pointer. */
	bpf_cgroup_release(acquired);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_release_null, struct cgroup *cgrp, const char *path)
{
	struct __cgrps_kfunc_map_value local, *v;
	long status;
	struct cgroup *acquired, *old;
	s32 id;

	status = bpf_probe_read_kernel(&id, sizeof(id), &cgrp->self.id);
	if (status)
		return 0;

	local.cgrp = NULL;
	status = bpf_map_update_elem(&__cgrps_kfunc_map, &id, &local, BPF_NOEXIST);
	if (status)
		return status;

	v = bpf_map_lookup_elem(&__cgrps_kfunc_map, &id);
	if (!v)
		return -ENOENT;

	acquired = bpf_cgroup_acquire(cgrp);

	old = bpf_kptr_xchg(&v->cgrp, acquired);

	/* old cannot be passed to bpf_cgroup_release() without a NULL check. */
	bpf_cgroup_release(old);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_kfunc_release_unacquired, struct cgroup *cgrp, const char *path)
{
	/* Cannot release trusted cgroup pointer which was not acquired. */
	bpf_cgroup_release(cgrp);

	return 0;
}
+170
tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */

#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>

#include "cgrp_kfunc_common.h"

char _license[] SEC("license") = "GPL";

int err, pid, invocations;

/* Prototype for all of the program trace events below:
 *
 * TRACE_EVENT(cgroup_mkdir,
 *         TP_PROTO(struct cgroup *cgrp, const char *path),
 *         TP_ARGS(cgrp, path)
 */

static bool is_test_kfunc_task(void)
{
	int cur_pid = bpf_get_current_pid_tgid() >> 32;
	bool same = pid == cur_pid;

	if (same)
		__sync_fetch_and_add(&invocations, 1);

	return same;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_acquire_release_argument, struct cgroup *cgrp, const char *path)
{
	struct cgroup *acquired;

	if (!is_test_kfunc_task())
		return 0;

	acquired = bpf_cgroup_acquire(cgrp);
	bpf_cgroup_release(acquired);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_acquire_leave_in_map, struct cgroup *cgrp, const char *path)
{
	long status;

	if (!is_test_kfunc_task())
		return 0;

	status = cgrps_kfunc_map_insert(cgrp);
	if (status)
		err = 1;

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_xchg_release, struct cgroup *cgrp, const char *path)
{
	struct cgroup *kptr;
	struct __cgrps_kfunc_map_value *v;
	long status;

	if (!is_test_kfunc_task())
		return 0;

	status = cgrps_kfunc_map_insert(cgrp);
	if (status) {
		err = 1;
		return 0;
	}

	v = cgrps_kfunc_map_value_lookup(cgrp);
	if (!v) {
		err = 2;
		return 0;
	}

	kptr = bpf_kptr_xchg(&v->cgrp, NULL);
	if (!kptr) {
		err = 3;
		return 0;
	}

	bpf_cgroup_release(kptr);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_get_release, struct cgroup *cgrp, const char *path)
{
	struct cgroup *kptr;
	struct __cgrps_kfunc_map_value *v;
	long status;

	if (!is_test_kfunc_task())
		return 0;

	status = cgrps_kfunc_map_insert(cgrp);
	if (status) {
		err = 1;
		return 0;
	}

	v = cgrps_kfunc_map_value_lookup(cgrp);
	if (!v) {
		err = 2;
		return 0;
	}

	kptr = bpf_cgroup_kptr_get(&v->cgrp);
	if (!kptr) {
		err = 3;
		return 0;
	}

	bpf_cgroup_release(kptr);

	return 0;
}

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp, const char *path)
{
	struct cgroup *self, *ancestor1, *invalid;

	if (!is_test_kfunc_task())
		return 0;

	self = bpf_cgroup_ancestor(cgrp, cgrp->level);
	if (!self) {
		err = 1;
		return 0;
	}

	if (self->self.id != cgrp->self.id) {
		bpf_cgroup_release(self);
		err = 2;
		return 0;
	}
	bpf_cgroup_release(self);

	ancestor1 = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
	if (!ancestor1) {
		err = 3;
		return 0;
	}
	bpf_cgroup_release(ancestor1);

	invalid = bpf_cgroup_ancestor(cgrp, 10000);
	if (invalid) {
		bpf_cgroup_release(invalid);
		err = 4;
		return 0;
	}

	invalid = bpf_cgroup_ancestor(cgrp, -1);
	if (invalid) {
		bpf_cgroup_release(invalid);
		err = 5;
		return 0;
	}

	return 0;
}
+37
tools/testing/selftests/bpf/progs/empty_skb.c
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

char _license[] SEC("license") = "GPL";

int ifindex;
int ret;

SEC("lwt_xmit")
int redirect_ingress(struct __sk_buff *skb)
{
	ret = bpf_clone_redirect(skb, ifindex, BPF_F_INGRESS);
	return 0;
}

SEC("lwt_xmit")
int redirect_egress(struct __sk_buff *skb)
{
	ret = bpf_clone_redirect(skb, ifindex, 0);
	return 0;
}

SEC("tc")
int tc_redirect_ingress(struct __sk_buff *skb)
{
	ret = bpf_clone_redirect(skb, ifindex, BPF_F_INGRESS);
	return 0;
}

SEC("tc")
int tc_redirect_egress(struct __sk_buff *skb)
{
	ret = bpf_clone_redirect(skb, ifindex, 0);
	return 0;
}
+370
tools/testing/selftests/bpf/progs/linked_list.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + #include <vmlinux.h> 3 + #include <bpf/bpf_tracing.h> 4 + #include <bpf/bpf_helpers.h> 5 + #include <bpf/bpf_core_read.h> 6 + #include "bpf_experimental.h" 7 + 8 + #ifndef ARRAY_SIZE 9 + #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) 10 + #endif 11 + 12 + #include "linked_list.h" 13 + 14 + static __always_inline 15 + int list_push_pop(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map) 16 + { 17 + struct bpf_list_node *n; 18 + struct foo *f; 19 + 20 + f = bpf_obj_new(typeof(*f)); 21 + if (!f) 22 + return 2; 23 + 24 + bpf_spin_lock(lock); 25 + n = bpf_list_pop_front(head); 26 + bpf_spin_unlock(lock); 27 + if (n) { 28 + bpf_obj_drop(container_of(n, struct foo, node)); 29 + bpf_obj_drop(f); 30 + return 3; 31 + } 32 + 33 + bpf_spin_lock(lock); 34 + n = bpf_list_pop_back(head); 35 + bpf_spin_unlock(lock); 36 + if (n) { 37 + bpf_obj_drop(container_of(n, struct foo, node)); 38 + bpf_obj_drop(f); 39 + return 4; 40 + } 41 + 42 + 43 + bpf_spin_lock(lock); 44 + f->data = 42; 45 + bpf_list_push_front(head, &f->node); 46 + bpf_spin_unlock(lock); 47 + if (leave_in_map) 48 + return 0; 49 + bpf_spin_lock(lock); 50 + n = bpf_list_pop_back(head); 51 + bpf_spin_unlock(lock); 52 + if (!n) 53 + return 5; 54 + f = container_of(n, struct foo, node); 55 + if (f->data != 42) { 56 + bpf_obj_drop(f); 57 + return 6; 58 + } 59 + 60 + bpf_spin_lock(lock); 61 + f->data = 13; 62 + bpf_list_push_front(head, &f->node); 63 + bpf_spin_unlock(lock); 64 + bpf_spin_lock(lock); 65 + n = bpf_list_pop_front(head); 66 + bpf_spin_unlock(lock); 67 + if (!n) 68 + return 7; 69 + f = container_of(n, struct foo, node); 70 + if (f->data != 13) { 71 + bpf_obj_drop(f); 72 + return 8; 73 + } 74 + bpf_obj_drop(f); 75 + 76 + bpf_spin_lock(lock); 77 + n = bpf_list_pop_front(head); 78 + bpf_spin_unlock(lock); 79 + if (n) { 80 + bpf_obj_drop(container_of(n, struct foo, node)); 81 + return 9; 82 + } 83 + 84 + bpf_spin_lock(lock); 85 + n = 
	bpf_list_pop_back(head);
	bpf_spin_unlock(lock);
	if (n) {
		bpf_obj_drop(container_of(n, struct foo, node));
		return 10;
	}
	return 0;
}

static __always_inline
int list_push_pop_multiple(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map)
{
	struct bpf_list_node *n;
	struct foo *f[8], *pf;
	int i;

	for (i = 0; i < ARRAY_SIZE(f); i++) {
		f[i] = bpf_obj_new(typeof(**f));
		if (!f[i])
			return 2;
		f[i]->data = i;
		bpf_spin_lock(lock);
		bpf_list_push_front(head, &f[i]->node);
		bpf_spin_unlock(lock);
	}

	for (i = 0; i < ARRAY_SIZE(f); i++) {
		bpf_spin_lock(lock);
		n = bpf_list_pop_front(head);
		bpf_spin_unlock(lock);
		if (!n)
			return 3;
		pf = container_of(n, struct foo, node);
		if (pf->data != (ARRAY_SIZE(f) - i - 1)) {
			bpf_obj_drop(pf);
			return 4;
		}
		bpf_spin_lock(lock);
		bpf_list_push_back(head, &pf->node);
		bpf_spin_unlock(lock);
	}

	if (leave_in_map)
		return 0;

	for (i = 0; i < ARRAY_SIZE(f); i++) {
		bpf_spin_lock(lock);
		n = bpf_list_pop_back(head);
		bpf_spin_unlock(lock);
		if (!n)
			return 5;
		pf = container_of(n, struct foo, node);
		if (pf->data != i) {
			bpf_obj_drop(pf);
			return 6;
		}
		bpf_obj_drop(pf);
	}
	bpf_spin_lock(lock);
	n = bpf_list_pop_back(head);
	bpf_spin_unlock(lock);
	if (n) {
		bpf_obj_drop(container_of(n, struct foo, node));
		return 7;
	}

	bpf_spin_lock(lock);
	n = bpf_list_pop_front(head);
	bpf_spin_unlock(lock);
	if (n) {
		bpf_obj_drop(container_of(n, struct foo, node));
		return 8;
	}
	return 0;
}

static __always_inline
int list_in_list(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map)
{
	struct bpf_list_node *n;
	struct bar *ba[8], *b;
	struct foo *f;
	int i;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 2;
	for (i = 0; i < ARRAY_SIZE(ba); i++) {
		b = bpf_obj_new(typeof(*b));
		if (!b) {
			bpf_obj_drop(f);
			return 3;
		}
		b->data = i;
		bpf_spin_lock(&f->lock);
		bpf_list_push_back(&f->head, &b->node);
		bpf_spin_unlock(&f->lock);
	}

	bpf_spin_lock(lock);
	f->data = 42;
	bpf_list_push_front(head, &f->node);
	bpf_spin_unlock(lock);

	if (leave_in_map)
		return 0;

	bpf_spin_lock(lock);
	n = bpf_list_pop_front(head);
	bpf_spin_unlock(lock);
	if (!n)
		return 4;
	f = container_of(n, struct foo, node);
	if (f->data != 42) {
		bpf_obj_drop(f);
		return 5;
	}

	for (i = 0; i < ARRAY_SIZE(ba); i++) {
		bpf_spin_lock(&f->lock);
		n = bpf_list_pop_front(&f->head);
		bpf_spin_unlock(&f->lock);
		if (!n) {
			bpf_obj_drop(f);
			return 6;
		}
		b = container_of(n, struct bar, node);
		if (b->data != i) {
			bpf_obj_drop(f);
			bpf_obj_drop(b);
			return 7;
		}
		bpf_obj_drop(b);
	}
	bpf_spin_lock(&f->lock);
	n = bpf_list_pop_front(&f->head);
	bpf_spin_unlock(&f->lock);
	if (n) {
		bpf_obj_drop(f);
		bpf_obj_drop(container_of(n, struct bar, node));
		return 8;
	}
	bpf_obj_drop(f);
	return 0;
}

static __always_inline
int test_list_push_pop(struct bpf_spin_lock *lock, struct bpf_list_head *head)
{
	int ret;

	ret = list_push_pop(lock, head, false);
	if (ret)
		return ret;
	return list_push_pop(lock, head, true);
}

static __always_inline
int test_list_push_pop_multiple(struct bpf_spin_lock *lock, struct bpf_list_head *head)
{
	int ret;

	ret = list_push_pop_multiple(lock, head, false);
	if (ret)
		return ret;
	return list_push_pop_multiple(lock, head, true);
}

static __always_inline
int test_list_in_list(struct bpf_spin_lock *lock, struct bpf_list_head *head)
{
	int ret;

	ret = list_in_list(lock, head, false);
	if (ret)
		return ret;
	return list_in_list(lock, head, true);
}

SEC("tc")
int map_list_push_pop(void *ctx)
{
	struct map_value *v;

	v = bpf_map_lookup_elem(&array_map, &(int){0});
	if (!v)
		return 1;
	return test_list_push_pop(&v->lock, &v->head);
}

SEC("tc")
int inner_map_list_push_pop(void *ctx)
{
	struct map_value *v;
	void *map;

	map = bpf_map_lookup_elem(&map_of_maps, &(int){0});
	if (!map)
		return 1;
	v = bpf_map_lookup_elem(map, &(int){0});
	if (!v)
		return 1;
	return test_list_push_pop(&v->lock, &v->head);
}

SEC("tc")
int global_list_push_pop(void *ctx)
{
	return test_list_push_pop(&glock, &ghead);
}

SEC("tc")
int map_list_push_pop_multiple(void *ctx)
{
	struct map_value *v;

	v = bpf_map_lookup_elem(&array_map, &(int){0});
	if (!v)
		return 1;
	return test_list_push_pop_multiple(&v->lock, &v->head);
}

SEC("tc")
int inner_map_list_push_pop_multiple(void *ctx)
{
	struct map_value *v;
	void *map;

	map = bpf_map_lookup_elem(&map_of_maps, &(int){0});
	if (!map)
		return 1;
	v = bpf_map_lookup_elem(map, &(int){0});
	if (!v)
		return 1;
	return test_list_push_pop_multiple(&v->lock, &v->head);
}

SEC("tc")
int global_list_push_pop_multiple(void *ctx)
{
	int ret;

	ret = list_push_pop_multiple(&glock, &ghead, false);
	if (ret)
		return ret;
	return list_push_pop_multiple(&glock, &ghead, true);
}

SEC("tc")
int map_list_in_list(void *ctx)
{
	struct map_value *v;

	v = bpf_map_lookup_elem(&array_map, &(int){0});
	if (!v)
		return 1;
	return test_list_in_list(&v->lock, &v->head);
}

SEC("tc")
int inner_map_list_in_list(void *ctx)
{
	struct map_value *v;
	void *map;

	map = bpf_map_lookup_elem(&map_of_maps, &(int){0});
	if (!map)
		return 1;
	v = bpf_map_lookup_elem(map, &(int){0});
	if (!v)
		return 1;
	return test_list_in_list(&v->lock, &v->head);
}

SEC("tc")
int global_list_in_list(void *ctx)
{
	return test_list_in_list(&glock, &ghead);
}

char _license[] SEC("license") = "GPL";
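The push/pop ordering that list_push_pop_multiple() above verifies can be reproduced in plain user-space C, for readers without a BPF toolchain. This is an illustrative sketch only: the struct and function names below are hypothetical, and the real kfuncs operate on bpf_list_head objects under the verifier's lock checking, not on a heap list like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for bpf_list_push_front/back
 * and bpf_list_pop_front/back on a doubly-linked list.
 */
struct node {
	int data;
	struct node *prev, *next;
};

struct list {
	struct node *first, *last;
};

static void push_front(struct list *l, struct node *n)
{
	n->prev = NULL;
	n->next = l->first;
	if (l->first)
		l->first->prev = n;
	else
		l->last = n;
	l->first = n;
}

static void push_back(struct list *l, struct node *n)
{
	n->next = NULL;
	n->prev = l->last;
	if (l->last)
		l->last->next = n;
	else
		l->first = n;
	l->last = n;
}

static struct node *pop_front(struct list *l)
{
	struct node *n = l->first;

	if (!n)
		return NULL;
	l->first = n->next;
	if (l->first)
		l->first->prev = NULL;
	else
		l->last = NULL;
	return n;
}

static struct node *pop_back(struct list *l)
{
	struct node *n = l->last;

	if (!n)
		return NULL;
	l->last = n->prev;
	if (l->last)
		l->last->next = NULL;
	else
		l->first = NULL;
	return n;
}

/* Mirror of list_push_pop_multiple(): push 0..7 at the front, so
 * pop_front() sees 7..0; each popped node is re-pushed at the back,
 * after which pop_back() drains them as 0..7, matching the
 * selftest's expectations. Returns 0 on success.
 */
static int check_ordering(void)
{
	struct list l = { NULL, NULL };
	struct node *n;
	int i;

	for (i = 0; i < 8; i++) {
		n = malloc(sizeof(*n));
		if (!n)
			return 2;
		n->data = i;
		push_front(&l, n);
	}
	for (i = 0; i < 8; i++) {
		n = pop_front(&l);
		if (!n || n->data != 8 - i - 1)
			return 3;
		push_back(&l, n);
	}
	for (i = 0; i < 8; i++) {
		n = pop_back(&l);
		if (!n || n->data != i)
			return 4;
		free(n);
	}
	if (pop_back(&l) || pop_front(&l))
		return 5;
	return 0;
}
```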
tools/testing/selftests/bpf/progs/linked_list.h (new file, +56 lines)
// SPDX-License-Identifier: GPL-2.0
#ifndef LINKED_LIST_H
#define LINKED_LIST_H

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

struct bar {
	struct bpf_list_node node;
	int data;
};

struct foo {
	struct bpf_list_node node;
	struct bpf_list_head head __contains(bar, node);
	struct bpf_spin_lock lock;
	int data;
	struct bpf_list_node node2;
};

struct map_value {
	struct bpf_spin_lock lock;
	int data;
	struct bpf_list_head head __contains(foo, node);
};

struct array_map {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, int);
	__type(value, struct map_value);
	__uint(max_entries, 1);
};

struct array_map array_map SEC(".maps");
struct array_map inner_map SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, int);
	__array(values, struct array_map);
} map_of_maps SEC(".maps") = {
	.values = {
		[0] = &inner_map,
	},
};

#define private(name) SEC(".bss." #name) __hidden __attribute__((aligned(8)))

private(A) struct bpf_spin_lock glock;
private(A) struct bpf_list_head ghead __contains(foo, node);
private(B) struct bpf_spin_lock glock2;

#endif
tools/testing/selftests/bpf/progs/linked_list_fail.c (new file, +581 lines)
// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include "bpf_experimental.h"

#include "linked_list.h"

#define INIT \
	struct map_value *v, *v2, *iv, *iv2; \
	struct foo *f, *f1, *f2; \
	struct bar *b; \
	void *map; \
	\
	map = bpf_map_lookup_elem(&map_of_maps, &(int){ 0 }); \
	if (!map) \
		return 0; \
	v = bpf_map_lookup_elem(&array_map, &(int){ 0 }); \
	if (!v) \
		return 0; \
	v2 = bpf_map_lookup_elem(&array_map, &(int){ 0 }); \
	if (!v2) \
		return 0; \
	iv = bpf_map_lookup_elem(map, &(int){ 0 }); \
	if (!iv) \
		return 0; \
	iv2 = bpf_map_lookup_elem(map, &(int){ 0 }); \
	if (!iv2) \
		return 0; \
	f = bpf_obj_new(typeof(*f)); \
	if (!f) \
		return 0; \
	f1 = f; \
	f2 = bpf_obj_new(typeof(*f2)); \
	if (!f2) { \
		bpf_obj_drop(f1); \
		return 0; \
	} \
	b = bpf_obj_new(typeof(*b)); \
	if (!b) { \
		bpf_obj_drop(f2); \
		bpf_obj_drop(f1); \
		return 0; \
	}

#define CHECK(test, op, hexpr) \
	SEC("?tc") \
	int test##_missing_lock_##op(void *ctx) \
	{ \
		INIT; \
		void (*p)(void *) = (void *)&bpf_list_##op; \
		p(hexpr); \
		return 0; \
	}

CHECK(kptr, push_front, &f->head);
CHECK(kptr, push_back, &f->head);
CHECK(kptr, pop_front, &f->head);
CHECK(kptr, pop_back, &f->head);

CHECK(global, push_front, &ghead);
CHECK(global, push_back, &ghead);
CHECK(global, pop_front, &ghead);
CHECK(global, pop_back, &ghead);

CHECK(map, push_front, &v->head);
CHECK(map, push_back, &v->head);
CHECK(map, pop_front, &v->head);
CHECK(map, pop_back, &v->head);

CHECK(inner_map, push_front, &iv->head);
CHECK(inner_map, push_back, &iv->head);
CHECK(inner_map, pop_front, &iv->head);
CHECK(inner_map, pop_back, &iv->head);

#undef CHECK

#define CHECK(test, op, lexpr, hexpr) \
	SEC("?tc") \
	int test##_incorrect_lock_##op(void *ctx) \
	{ \
		INIT; \
		void (*p)(void *) = (void *)&bpf_list_##op; \
		bpf_spin_lock(lexpr); \
		p(hexpr); \
		return 0; \
	}

#define CHECK_OP(op) \
	CHECK(kptr_kptr, op, &f1->lock, &f2->head); \
	CHECK(kptr_global, op, &f1->lock, &ghead); \
	CHECK(kptr_map, op, &f1->lock, &v->head); \
	CHECK(kptr_inner_map, op, &f1->lock, &iv->head); \
	\
	CHECK(global_global, op, &glock2, &ghead); \
	CHECK(global_kptr, op, &glock, &f1->head); \
	CHECK(global_map, op, &glock, &v->head); \
	CHECK(global_inner_map, op, &glock, &iv->head); \
	\
	CHECK(map_map, op, &v->lock, &v2->head); \
	CHECK(map_kptr, op, &v->lock, &f2->head); \
	CHECK(map_global, op, &v->lock, &ghead); \
	CHECK(map_inner_map, op, &v->lock, &iv->head); \
	\
	CHECK(inner_map_inner_map, op, &iv->lock, &iv2->head); \
	CHECK(inner_map_kptr, op, &iv->lock, &f2->head); \
	CHECK(inner_map_global, op, &iv->lock, &ghead); \
	CHECK(inner_map_map, op, &iv->lock, &v->head);

CHECK_OP(push_front);
CHECK_OP(push_back);
CHECK_OP(pop_front);
CHECK_OP(pop_back);

#undef CHECK
#undef CHECK_OP
#undef INIT

SEC("?kprobe/xyz")
int map_compat_kprobe(void *ctx)
{
	bpf_list_push_front(&ghead, NULL);
	return 0;
}

SEC("?kretprobe/xyz")
int map_compat_kretprobe(void *ctx)
{
	bpf_list_push_front(&ghead, NULL);
	return 0;
}

SEC("?tracepoint/xyz")
int map_compat_tp(void *ctx)
{
	bpf_list_push_front(&ghead, NULL);
	return 0;
}

SEC("?perf_event")
int map_compat_perf(void *ctx)
{
	bpf_list_push_front(&ghead, NULL);
	return 0;
}

SEC("?raw_tp/xyz")
int map_compat_raw_tp(void *ctx)
{
	bpf_list_push_front(&ghead, NULL);
	return 0;
}

SEC("?raw_tp.w/xyz")
int map_compat_raw_tp_w(void *ctx)
{
	bpf_list_push_front(&ghead, NULL);
	return 0;
}

SEC("?tc")
int obj_type_id_oor(void *ctx)
{
	bpf_obj_new_impl(~0UL, NULL);
	return 0;
}

SEC("?tc")
int obj_new_no_composite(void *ctx)
{
	bpf_obj_new_impl(bpf_core_type_id_local(int), (void *)42);
	return 0;
}

SEC("?tc")
int obj_new_no_struct(void *ctx)
{
	bpf_obj_new(union { int data; unsigned udata; });
	return 0;
}

SEC("?tc")
int obj_drop_non_zero_off(void *ctx)
{
	void *f;

	f = bpf_obj_new(struct foo);
	if (!f)
		return 0;
	bpf_obj_drop(f + 1);
	return 0;
}

SEC("?tc")
int new_null_ret(void *ctx)
{
	return bpf_obj_new(struct foo)->data;
}

SEC("?tc")
int obj_new_acq(void *ctx)
{
	bpf_obj_new(struct foo);
	return 0;
}

SEC("?tc")
int use_after_drop(void *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	bpf_obj_drop(f);
	return f->data;
}

SEC("?tc")
int ptr_walk_scalar(void *ctx)
{
	struct test1 {
		struct test2 {
			struct test2 *next;
		} *ptr;
	} *p;

	p = bpf_obj_new(typeof(*p));
	if (!p)
		return 0;
	bpf_this_cpu_ptr(p->ptr);
	return 0;
}

SEC("?tc")
int direct_read_lock(void *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	return *(int *)&f->lock;
}

SEC("?tc")
int direct_write_lock(void *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	*(int *)&f->lock = 0;
	return 0;
}

SEC("?tc")
int direct_read_head(void *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	return *(int *)&f->head;
}

SEC("?tc")
int direct_write_head(void *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	*(int *)&f->head = 0;
	return 0;
}

SEC("?tc")
int direct_read_node(void *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	return *(int *)&f->node;
}

SEC("?tc")
int direct_write_node(void *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	*(int *)&f->node = 0;
	return 0;
}

static __always_inline
int write_after_op(void (*push_op)(void *head, void *node))
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	bpf_spin_lock(&glock);
	push_op(&ghead, &f->node);
	f->data = 42;
	bpf_spin_unlock(&glock);

	return 0;
}

SEC("?tc")
int write_after_push_front(void *ctx)
{
	return write_after_op((void *)bpf_list_push_front);
}

SEC("?tc")
int write_after_push_back(void *ctx)
{
	return write_after_op((void *)bpf_list_push_back);
}

static __always_inline
int use_after_unlock(void (*op)(void *head, void *node))
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	bpf_spin_lock(&glock);
	f->data = 42;
	op(&ghead, &f->node);
	bpf_spin_unlock(&glock);

	return f->data;
}

SEC("?tc")
int use_after_unlock_push_front(void *ctx)
{
	return use_after_unlock((void *)bpf_list_push_front);
}

SEC("?tc")
int use_after_unlock_push_back(void *ctx)
{
	return use_after_unlock((void *)bpf_list_push_back);
}

static __always_inline
int list_double_add(void (*op)(void *head, void *node))
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	bpf_spin_lock(&glock);
	op(&ghead, &f->node);
	op(&ghead, &f->node);
	bpf_spin_unlock(&glock);

	return 0;
}

SEC("?tc")
int double_push_front(void *ctx)
{
	return list_double_add((void *)bpf_list_push_front);
}

SEC("?tc")
int double_push_back(void *ctx)
{
	return list_double_add((void *)bpf_list_push_back);
}

SEC("?tc")
int no_node_value_type(void *ctx)
{
	void *p;

	p = bpf_obj_new(struct { int data; });
	if (!p)
		return 0;
	bpf_spin_lock(&glock);
	bpf_list_push_front(&ghead, p);
	bpf_spin_unlock(&glock);

	return 0;
}

SEC("?tc")
int incorrect_value_type(void *ctx)
{
	struct bar *b;

	b = bpf_obj_new(typeof(*b));
	if (!b)
		return 0;
	bpf_spin_lock(&glock);
	bpf_list_push_front(&ghead, &b->node);
	bpf_spin_unlock(&glock);

	return 0;
}

SEC("?tc")
int incorrect_node_var_off(struct __sk_buff *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	bpf_spin_lock(&glock);
	bpf_list_push_front(&ghead, (void *)&f->node + ctx->protocol);
	bpf_spin_unlock(&glock);

	return 0;
}

SEC("?tc")
int incorrect_node_off1(void *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	bpf_spin_lock(&glock);
	bpf_list_push_front(&ghead, (void *)&f->node + 1);
	bpf_spin_unlock(&glock);

	return 0;
}

SEC("?tc")
int incorrect_node_off2(void *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	bpf_spin_lock(&glock);
	bpf_list_push_front(&ghead, &f->node2);
	bpf_spin_unlock(&glock);

	return 0;
}

SEC("?tc")
int no_head_type(void *ctx)
{
	void *p;

	p = bpf_obj_new(typeof(struct { int data; }));
	if (!p)
		return 0;
	bpf_spin_lock(&glock);
	bpf_list_push_front(p, NULL);
	bpf_spin_lock(&glock);

	return 0;
}

SEC("?tc")
int incorrect_head_var_off1(struct __sk_buff *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	bpf_spin_lock(&glock);
	bpf_list_push_front((void *)&ghead + ctx->protocol, &f->node);
	bpf_spin_unlock(&glock);

	return 0;
}

SEC("?tc")
int incorrect_head_var_off2(struct __sk_buff *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	bpf_spin_lock(&glock);
	bpf_list_push_front((void *)&f->head + ctx->protocol, &f->node);
	bpf_spin_unlock(&glock);

	return 0;
}

SEC("?tc")
int incorrect_head_off1(void *ctx)
{
	struct foo *f;
	struct bar *b;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	b = bpf_obj_new(typeof(*b));
	if (!b) {
		bpf_obj_drop(f);
		return 0;
	}

	bpf_spin_lock(&f->lock);
	bpf_list_push_front((void *)&f->head + 1, &b->node);
	bpf_spin_unlock(&f->lock);

	return 0;
}

SEC("?tc")
int incorrect_head_off2(void *ctx)
{
	struct foo *f;
	struct bar *b;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;

	bpf_spin_lock(&glock);
	bpf_list_push_front((void *)&ghead + 1, &f->node);
	bpf_spin_unlock(&glock);

	return 0;
}

static __always_inline
int pop_ptr_off(void *(*op)(void *head))
{
	struct {
		struct bpf_list_head head __contains(foo, node2);
		struct bpf_spin_lock lock;
	} *p;
	struct bpf_list_node *n;

	p = bpf_obj_new(typeof(*p));
	if (!p)
		return 0;
	bpf_spin_lock(&p->lock);
	n = op(&p->head);
	bpf_spin_unlock(&p->lock);

	bpf_this_cpu_ptr(n);
	return 0;
}

SEC("?tc")
int pop_front_off(void *ctx)
{
	return pop_ptr_off((void *)bpf_list_pop_front);
}

SEC("?tc")
int pop_back_off(void *ctx)
{
	return pop_ptr_off((void *)bpf_list_pop_back);
}

char _license[] SEC("license") = "GPL";
tools/testing/selftests/bpf/progs/lsm_cgroup.c (+8 lines)
 char _license[] SEC("license") = "GPL";

+extern bool CONFIG_SECURITY_SELINUX __kconfig __weak;
+extern bool CONFIG_SECURITY_SMACK __kconfig __weak;
+extern bool CONFIG_SECURITY_APPARMOR __kconfig __weak;
+
 #ifndef AF_PACKET
 #define AF_PACKET 17
 #endif
···
 int BPF_PROG(socket_alloc, struct sock *sk, int family, gfp_t priority)
 {
 	called_socket_alloc++;
+	/* if non-BPF LSMs are already installed, returning EPERM here would leak memory in those LSMs */
+	if (CONFIG_SECURITY_SELINUX || CONFIG_SECURITY_SMACK || CONFIG_SECURITY_APPARMOR)
+		return 1;
+
 	if (family == AF_UNIX)
 		return 0; /* EPERM */
tools/testing/selftests/bpf/progs/rcu_read_lock.c (new file, +290 lines)
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "bpf_tracing_net.h"
#include "bpf_misc.h"

char _license[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, long);
} map_a SEC(".maps");

__u32 user_data, key_serial, target_pid;
__u64 flags, task_storage_val, cgroup_id;

struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
void bpf_key_put(struct bpf_key *key) __ksym;
void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;
struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int get_cgroup_id(void *ctx)
{
	struct task_struct *task;

	task = bpf_get_current_task_btf();
	if (task->pid != target_pid)
		return 0;

	/* simulate bpf_get_current_cgroup_id() helper */
	bpf_rcu_read_lock();
	cgroup_id = task->cgroups->dfl_cgrp->kn->id;
	bpf_rcu_read_unlock();
	return 0;
}

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int task_succ(void *ctx)
{
	struct task_struct *task, *real_parent;
	long init_val = 2;
	long *ptr;

	task = bpf_get_current_task_btf();
	if (task->pid != target_pid)
		return 0;

	bpf_rcu_read_lock();
	/* region including helper using rcu ptr real_parent */
	real_parent = task->real_parent;
	ptr = bpf_task_storage_get(&map_a, real_parent, &init_val,
				   BPF_LOCAL_STORAGE_GET_F_CREATE);
	if (!ptr)
		goto out;
	ptr = bpf_task_storage_get(&map_a, real_parent, 0, 0);
	if (!ptr)
		goto out;
	task_storage_val = *ptr;
out:
	bpf_rcu_read_unlock();
	return 0;
}

SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
int no_lock(void *ctx)
{
	struct task_struct *task, *real_parent;

	/* no bpf_rcu_read_lock(), old code still works */
	task = bpf_get_current_task_btf();
	real_parent = task->real_parent;
	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
	return 0;
}

SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
int two_regions(void *ctx)
{
	struct task_struct *task, *real_parent;

	/* two regions */
	task = bpf_get_current_task_btf();
	bpf_rcu_read_lock();
	bpf_rcu_read_unlock();
	bpf_rcu_read_lock();
	real_parent = task->real_parent;
	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
	bpf_rcu_read_unlock();
	return 0;
}

SEC("?fentry/" SYS_PREFIX "sys_getpgid")
int non_sleepable_1(void *ctx)
{
	struct task_struct *task, *real_parent;

	task = bpf_get_current_task_btf();
	bpf_rcu_read_lock();
	real_parent = task->real_parent;
	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
	bpf_rcu_read_unlock();
	return 0;
}

SEC("?fentry/" SYS_PREFIX "sys_getpgid")
int non_sleepable_2(void *ctx)
{
	struct task_struct *task, *real_parent;

	bpf_rcu_read_lock();
	task = bpf_get_current_task_btf();
	bpf_rcu_read_unlock();

	bpf_rcu_read_lock();
	real_parent = task->real_parent;
	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
	bpf_rcu_read_unlock();
	return 0;
}

SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
int task_acquire(void *ctx)
{
	struct task_struct *task, *real_parent;

	task = bpf_get_current_task_btf();
	bpf_rcu_read_lock();
	real_parent = task->real_parent;
	/* acquire a reference which can be used outside rcu read lock region */
	real_parent = bpf_task_acquire(real_parent);
	bpf_rcu_read_unlock();
	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
	bpf_task_release(real_parent);
	return 0;
}

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int miss_lock(void *ctx)
{
	struct task_struct *task;
	struct css_set *cgroups;
	struct cgroup *dfl_cgrp;

	/* missing bpf_rcu_read_lock() */
	task = bpf_get_current_task_btf();
	bpf_rcu_read_lock();
	(void)bpf_task_storage_get(&map_a, task, 0, 0);
	bpf_rcu_read_unlock();
	bpf_rcu_read_unlock();
	return 0;
}

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int miss_unlock(void *ctx)
{
	struct task_struct *task;
	struct css_set *cgroups;
	struct cgroup *dfl_cgrp;

	/* missing bpf_rcu_read_unlock() */
	task = bpf_get_current_task_btf();
	bpf_rcu_read_lock();
	(void)bpf_task_storage_get(&map_a, task, 0, 0);
	return 0;
}

SEC("?fentry/" SYS_PREFIX "sys_getpgid")
int non_sleepable_rcu_mismatch(void *ctx)
{
	struct task_struct *task, *real_parent;

	task = bpf_get_current_task_btf();
	/* non-sleepable: missing bpf_rcu_read_unlock() in one path */
	bpf_rcu_read_lock();
	real_parent = task->real_parent;
	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
	if (real_parent)
		bpf_rcu_read_unlock();
	return 0;
}

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int inproper_sleepable_helper(void *ctx)
{
	struct task_struct *task, *real_parent;
	struct pt_regs *regs;
	__u32 value = 0;
	void *ptr;

	task = bpf_get_current_task_btf();
	/* sleepable helper in rcu read lock region */
	bpf_rcu_read_lock();
	real_parent = task->real_parent;
	regs = (struct pt_regs *)bpf_task_pt_regs(real_parent);
	if (!regs) {
		bpf_rcu_read_unlock();
		return 0;
	}

	ptr = (void *)PT_REGS_IP(regs);
	(void)bpf_copy_from_user_task(&value, sizeof(uint32_t), ptr, task, 0);
	user_data = value;
	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
	bpf_rcu_read_unlock();
	return 0;
}

SEC("?lsm.s/bpf")
int BPF_PROG(inproper_sleepable_kfunc, int cmd, union bpf_attr *attr, unsigned int size)
{
	struct bpf_key *bkey;

	/* sleepable kfunc in rcu read lock region */
	bpf_rcu_read_lock();
	bkey = bpf_lookup_user_key(key_serial, flags);
	bpf_rcu_read_unlock();
	if (!bkey)
		return -1;
	bpf_key_put(bkey);

	return 0;
}

SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
int nested_rcu_region(void *ctx)
{
	struct task_struct *task, *real_parent;

	/* nested rcu read lock regions */
	task = bpf_get_current_task_btf();
	bpf_rcu_read_lock();
	bpf_rcu_read_lock();
	real_parent = task->real_parent;
	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
	bpf_rcu_read_unlock();
	bpf_rcu_read_unlock();
	return 0;
}

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int task_untrusted_non_rcuptr(void *ctx)
{
	struct task_struct *task, *last_wakee;

	task = bpf_get_current_task_btf();
	bpf_rcu_read_lock();
	/* the pointer last_wakee is marked as untrusted */
	last_wakee = task->real_parent->last_wakee;
	(void)bpf_task_storage_get(&map_a, last_wakee, 0, 0);
	bpf_rcu_read_unlock();
	return 0;
}

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int task_untrusted_rcuptr(void *ctx)
{
	struct task_struct *task, *real_parent;

	task = bpf_get_current_task_btf();
	bpf_rcu_read_lock();
	real_parent = task->real_parent;
	bpf_rcu_read_unlock();
	/* helper use of rcu ptr outside the rcu read lock region */
	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
	return 0;
}

SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
int cross_rcu_region(void *ctx)
{
	struct task_struct *task, *real_parent;

	/* rcu ptr define/use in different regions */
	task = bpf_get_current_task_btf();
	bpf_rcu_read_lock();
	real_parent = task->real_parent;
	bpf_rcu_read_unlock();
	bpf_rcu_read_lock();
	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
	bpf_rcu_read_unlock();
	return 0;
}
tools/testing/selftests/bpf/progs/task_kfunc_common.h (new file, +72 lines)
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */

#ifndef _TASK_KFUNC_COMMON_H
#define _TASK_KFUNC_COMMON_H

#include <errno.h>
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct __tasks_kfunc_map_value {
	struct task_struct __kptr_ref *task;
};

struct hash_map {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, int);
	__type(value, struct __tasks_kfunc_map_value);
	__uint(max_entries, 1);
} __tasks_kfunc_map SEC(".maps");

struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
struct task_struct *bpf_task_kptr_get(struct task_struct **pp) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;
struct task_struct *bpf_task_from_pid(s32 pid) __ksym;

static inline struct __tasks_kfunc_map_value *tasks_kfunc_map_value_lookup(struct task_struct *p)
{
	s32 pid;
	long status;

	status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
	if (status)
		return NULL;

	return bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
}

static inline int tasks_kfunc_map_insert(struct task_struct *p)
{
	struct __tasks_kfunc_map_value local, *v;
	long status;
	struct task_struct *acquired, *old;
	s32 pid;

	status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
	if (status)
		return status;

	local.task = NULL;
	status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
	if (status)
		return status;

	v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
	if (!v) {
		bpf_map_delete_elem(&__tasks_kfunc_map, &pid);
		return -ENOENT;
	}

	acquired = bpf_task_acquire(p);
	old = bpf_kptr_xchg(&v->task, acquired);
	if (old) {
		bpf_task_release(old);
		return -EEXIST;
	}

	return 0;
}

#endif /* _TASK_KFUNC_COMMON_H */
tools/testing/selftests/bpf/progs/task_kfunc_failure.c (new file, +273 lines)
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */

#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>

#include "task_kfunc_common.h"

char _license[] SEC("license") = "GPL";

/* Prototype for all of the program trace events below:
 *
 * TRACE_EVENT(task_newtask,
 *	TP_PROTO(struct task_struct *p, u64 clone_flags)
 */

static struct __tasks_kfunc_map_value *insert_lookup_task(struct task_struct *task)
{
	int status;

	status = tasks_kfunc_map_insert(task);
	if (status)
		return NULL;

	return tasks_kfunc_map_value_lookup(task);
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_acquire_untrusted, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired;
	struct __tasks_kfunc_map_value *v;

	v = insert_lookup_task(task);
	if (!v)
		return 0;

	/* Can't invoke bpf_task_acquire() on an untrusted pointer. */
	acquired = bpf_task_acquire(v->task);
	bpf_task_release(acquired);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_acquire_fp, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired, *stack_task = (struct task_struct *)&clone_flags;

	/* Can't invoke bpf_task_acquire() on a random frame pointer. */
	acquired = bpf_task_acquire((struct task_struct *)&stack_task);
	bpf_task_release(acquired);

	return 0;
}

SEC("kretprobe/free_task")
int BPF_PROG(task_kfunc_acquire_unsafe_kretprobe, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired;

	acquired = bpf_task_acquire(task);
	/* Can't release a bpf_task_acquire()'d task without a NULL check. */
	bpf_task_release(acquired);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_acquire_trusted_walked, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired;

	/* Can't invoke bpf_task_acquire() on a trusted pointer obtained from walking a struct. */
	acquired = bpf_task_acquire(task->last_wakee);
	bpf_task_release(acquired);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_acquire_null, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired;

	/* Can't invoke bpf_task_acquire() on a NULL pointer. */
	acquired = bpf_task_acquire(NULL);
	if (!acquired)
		return 0;
	bpf_task_release(acquired);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_acquire_unreleased, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired;

	acquired = bpf_task_acquire(task);

	/* Acquired task is never released. */

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_get_non_kptr_param, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *kptr;

	/* Cannot use bpf_task_kptr_get() on a non-kptr, even on a valid task. */
	kptr = bpf_task_kptr_get(&task);
	if (!kptr)
		return 0;

	bpf_task_release(kptr);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_get_non_kptr_acquired, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *kptr, *acquired;

	acquired = bpf_task_acquire(task);

	/* Cannot use bpf_task_kptr_get() on a non-kptr, even if it was acquired. */
	kptr = bpf_task_kptr_get(&acquired);
	bpf_task_release(acquired);
	if (!kptr)
		return 0;

	bpf_task_release(kptr);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_get_null, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *kptr;

	/* Cannot use bpf_task_kptr_get() on a NULL pointer. */
	kptr = bpf_task_kptr_get(NULL);
	if (!kptr)
		return 0;

	bpf_task_release(kptr);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_xchg_unreleased, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *kptr;
	struct __tasks_kfunc_map_value *v;

	v = insert_lookup_task(task);
	if (!v)
		return 0;

	kptr = bpf_kptr_xchg(&v->task, NULL);
	if (!kptr)
		return 0;

	/* Kptr retrieved from map is never released. */

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_get_unreleased, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *kptr;
	struct __tasks_kfunc_map_value *v;

	v = insert_lookup_task(task);
	if (!v)
		return 0;

	kptr = bpf_task_kptr_get(&v->task);
	if (!kptr)
		return 0;

	/* Kptr acquired above is never released. */

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_release_untrusted, struct task_struct *task, u64 clone_flags)
{
	struct __tasks_kfunc_map_value *v;

	v = insert_lookup_task(task);
	if (!v)
		return 0;

	/* Can't invoke bpf_task_release() on an untrusted pointer. */
	bpf_task_release(v->task);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_release_fp, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired = (struct task_struct *)&clone_flags;

	/* Cannot release random frame pointer. */
	bpf_task_release(acquired);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_release_null, struct task_struct *task, u64 clone_flags)
{
	struct __tasks_kfunc_map_value local, *v;
	long status;
	struct task_struct *acquired, *old;
	s32 pid;

	status = bpf_probe_read_kernel(&pid, sizeof(pid), &task->pid);
	if (status)
		return 0;

	local.task = NULL;
	status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
	if (status)
		return status;

	v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
	if (!v)
		return -ENOENT;

	acquired = bpf_task_acquire(task);

	old = bpf_kptr_xchg(&v->task, acquired);

	/* old cannot be passed to bpf_task_release() without a NULL check. */
	bpf_task_release(old);
	bpf_task_release(old);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_release_unacquired, struct task_struct *task, u64 clone_flags)
{
	/* Cannot release trusted task pointer which was not acquired. */
	bpf_task_release(task);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(task_kfunc_from_pid_no_null_check, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired;

	acquired = bpf_task_from_pid(task->pid);

	/* Releasing bpf_task_from_pid() lookup without a NULL check. */
	bpf_task_release(acquired);

	return 0;
}
+222
tools/testing/selftests/bpf/progs/task_kfunc_success.c
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */

#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>

#include "task_kfunc_common.h"

char _license[] SEC("license") = "GPL";

int err, pid;

/* Prototype for all of the program trace events below:
 *
 * TRACE_EVENT(task_newtask,
 *	TP_PROTO(struct task_struct *p, u64 clone_flags)
 */

static bool is_test_kfunc_task(void)
{
	int cur_pid = bpf_get_current_pid_tgid() >> 32;

	return pid == cur_pid;
}

static int test_acquire_release(struct task_struct *task)
{
	struct task_struct *acquired;

	acquired = bpf_task_acquire(task);
	bpf_task_release(acquired);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_acquire_release_argument, struct task_struct *task, u64 clone_flags)
{
	if (!is_test_kfunc_task())
		return 0;

	return test_acquire_release(task);
}

SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_acquire_release_current, struct task_struct *task, u64 clone_flags)
{
	if (!is_test_kfunc_task())
		return 0;

	return test_acquire_release(bpf_get_current_task_btf());
}

SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_acquire_leave_in_map, struct task_struct *task, u64 clone_flags)
{
	long status;

	if (!is_test_kfunc_task())
		return 0;

	status = tasks_kfunc_map_insert(task);
	if (status)
		err = 1;

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_xchg_release, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *kptr;
	struct __tasks_kfunc_map_value *v;
	long status;

	if (!is_test_kfunc_task())
		return 0;

	status = tasks_kfunc_map_insert(task);
	if (status) {
		err = 1;
		return 0;
	}

	v = tasks_kfunc_map_value_lookup(task);
	if (!v) {
		err = 2;
		return 0;
	}

	kptr = bpf_kptr_xchg(&v->task, NULL);
	if (!kptr) {
		err = 3;
		return 0;
	}

	bpf_task_release(kptr);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_get_release, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *kptr;
	struct __tasks_kfunc_map_value *v;
	long status;

	if (!is_test_kfunc_task())
		return 0;

	status = tasks_kfunc_map_insert(task);
	if (status) {
		err = 1;
		return 0;
	}

	v = tasks_kfunc_map_value_lookup(task);
	if (!v) {
		err = 2;
		return 0;
	}

	kptr = bpf_task_kptr_get(&v->task);
	if (!kptr) {
		err = 3;
		return 0;
	}

	bpf_task_release(kptr);

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_current_acquire_release, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *current, *acquired;

	if (!is_test_kfunc_task())
		return 0;

	current = bpf_get_current_task_btf();
	acquired = bpf_task_acquire(current);
	bpf_task_release(acquired);

	return 0;
}

static void lookup_compare_pid(const struct task_struct *p)
{
	struct task_struct *acquired;

	acquired = bpf_task_from_pid(p->pid);
	if (!acquired) {
		err = 1;
		return;
	}

	if (acquired->pid != p->pid)
		err = 2;
	bpf_task_release(acquired);
}

SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_from_pid_arg, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired;

	if (!is_test_kfunc_task())
		return 0;

	lookup_compare_pid(task);
	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_from_pid_current, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *current, *acquired;

	if (!is_test_kfunc_task())
		return 0;

	lookup_compare_pid(bpf_get_current_task_btf());
	return 0;
}

static int is_pid_lookup_valid(s32 pid)
{
	struct task_struct *acquired;

	acquired = bpf_task_from_pid(pid);
	if (acquired) {
		bpf_task_release(acquired);
		return 1;
	}

	return 0;
}

SEC("tp_btf/task_newtask")
int BPF_PROG(test_task_from_pid_invalid, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired;

	if (!is_test_kfunc_task())
		return 0;

	if (is_pid_lookup_valid(-1)) {
		err = 1;
		return 0;
	}

	if (is_pid_lookup_valid(0xcafef00d)) {
		err = 2;
		return 0;
	}

	return 0;
}
+2 -2
tools/testing/selftests/bpf/progs/test_spin_lock.c
···

 #define CREDIT_PER_NS(delta, rate) (((delta) * rate) >> 20)

-SEC("tc")
-int bpf_sping_lock_test(struct __sk_buff *skb)
+SEC("cgroup_skb/ingress")
+int bpf_spin_lock_test(struct __sk_buff *skb)
 {
 	volatile int credit = 0, max_credit = 100, pkt_len = 64;
 	struct hmap_elem zero = {}, *val;
+204
tools/testing/selftests/bpf/progs/test_spin_lock_fail.c
// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

struct foo {
	struct bpf_spin_lock lock;
	int data;
};

struct array_map {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, int);
	__type(value, struct foo);
	__uint(max_entries, 1);
} array_map SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, int);
	__array(values, struct array_map);
} map_of_maps SEC(".maps") = {
	.values = {
		[0] = &array_map,
	},
};

SEC(".data.A") struct bpf_spin_lock lockA;
SEC(".data.B") struct bpf_spin_lock lockB;

SEC("?tc")
int lock_id_kptr_preserve(void *ctx)
{
	struct foo *f;

	f = bpf_obj_new(typeof(*f));
	if (!f)
		return 0;
	bpf_this_cpu_ptr(f);
	return 0;
}

SEC("?tc")
int lock_id_global_zero(void *ctx)
{
	bpf_this_cpu_ptr(&lockA);
	return 0;
}

SEC("?tc")
int lock_id_mapval_preserve(void *ctx)
{
	struct foo *f;
	int key = 0;

	f = bpf_map_lookup_elem(&array_map, &key);
	if (!f)
		return 0;
	bpf_this_cpu_ptr(f);
	return 0;
}

SEC("?tc")
int lock_id_innermapval_preserve(void *ctx)
{
	struct foo *f;
	int key = 0;
	void *map;

	map = bpf_map_lookup_elem(&map_of_maps, &key);
	if (!map)
		return 0;
	f = bpf_map_lookup_elem(map, &key);
	if (!f)
		return 0;
	bpf_this_cpu_ptr(f);
	return 0;
}

#define CHECK(test, A, B)					\
	SEC("?tc")						\
	int lock_id_mismatch_##test(void *ctx)			\
	{							\
		struct foo *f1, *f2, *v, *iv;			\
		int key = 0;					\
		void *map;					\
								\
		map = bpf_map_lookup_elem(&map_of_maps, &key);	\
		if (!map)					\
			return 0;				\
		iv = bpf_map_lookup_elem(map, &key);		\
		if (!iv)					\
			return 0;				\
		v = bpf_map_lookup_elem(&array_map, &key);	\
		if (!v)						\
			return 0;				\
		f1 = bpf_obj_new(typeof(*f1));			\
		if (!f1)					\
			return 0;				\
		f2 = bpf_obj_new(typeof(*f2));			\
		if (!f2) {					\
			bpf_obj_drop(f1);			\
			return 0;				\
		}						\
		bpf_spin_lock(A);				\
		bpf_spin_unlock(B);				\
		return 0;					\
	}

CHECK(kptr_kptr, &f1->lock, &f2->lock);
CHECK(kptr_global, &f1->lock, &lockA);
CHECK(kptr_mapval, &f1->lock, &v->lock);
CHECK(kptr_innermapval, &f1->lock, &iv->lock);

CHECK(global_global, &lockA, &lockB);
CHECK(global_kptr, &lockA, &f1->lock);
CHECK(global_mapval, &lockA, &v->lock);
CHECK(global_innermapval, &lockA, &iv->lock);

SEC("?tc")
int lock_id_mismatch_mapval_mapval(void *ctx)
{
	struct foo *f1, *f2;
	int key = 0;

	f1 = bpf_map_lookup_elem(&array_map, &key);
	if (!f1)
		return 0;
	f2 = bpf_map_lookup_elem(&array_map, &key);
	if (!f2)
		return 0;

	bpf_spin_lock(&f1->lock);
	f1->data = 42;
	bpf_spin_unlock(&f2->lock);

	return 0;
}

CHECK(mapval_kptr, &v->lock, &f1->lock);
CHECK(mapval_global, &v->lock, &lockB);
CHECK(mapval_innermapval, &v->lock, &iv->lock);

SEC("?tc")
int lock_id_mismatch_innermapval_innermapval1(void *ctx)
{
	struct foo *f1, *f2;
	int key = 0;
	void *map;

	map = bpf_map_lookup_elem(&map_of_maps, &key);
	if (!map)
		return 0;
	f1 = bpf_map_lookup_elem(map, &key);
	if (!f1)
		return 0;
	f2 = bpf_map_lookup_elem(map, &key);
	if (!f2)
		return 0;

	bpf_spin_lock(&f1->lock);
	f1->data = 42;
	bpf_spin_unlock(&f2->lock);

	return 0;
}

SEC("?tc")
int lock_id_mismatch_innermapval_innermapval2(void *ctx)
{
	struct foo *f1, *f2;
	int key = 0;
	void *map;

	map = bpf_map_lookup_elem(&map_of_maps, &key);
	if (!map)
		return 0;
	f1 = bpf_map_lookup_elem(map, &key);
	if (!f1)
		return 0;
	map = bpf_map_lookup_elem(&map_of_maps, &key);
	if (!map)
		return 0;
	f2 = bpf_map_lookup_elem(map, &key);
	if (!f2)
		return 0;

	bpf_spin_lock(&f1->lock);
	f1->data = 42;
	bpf_spin_unlock(&f2->lock);

	return 0;
}

CHECK(innermapval_kptr, &iv->lock, &f1->lock);
CHECK(innermapval_global, &iv->lock, &lockA);
CHECK(innermapval_mapval, &iv->lock, &v->lock);

#undef CHECK

char _license[] SEC("license") = "GPL";
+83
tools/testing/selftests/bpf/progs/type_cast.c
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, long);
} enter_id SEC(".maps");

#define IFNAMSIZ 16

int ifindex, ingress_ifindex;
char name[IFNAMSIZ];
unsigned int inum;
unsigned int meta_len, frag0_len, kskb_len, kskb2_len;

void *bpf_cast_to_kern_ctx(void *) __ksym;
void *bpf_rdonly_cast(void *, __u32) __ksym;

SEC("?xdp")
int md_xdp(struct xdp_md *ctx)
{
	struct xdp_buff *kctx = bpf_cast_to_kern_ctx(ctx);
	struct net_device *dev;

	dev = kctx->rxq->dev;
	ifindex = dev->ifindex;
	inum = dev->nd_net.net->ns.inum;
	__builtin_memcpy(name, dev->name, IFNAMSIZ);
	ingress_ifindex = ctx->ingress_ifindex;
	return XDP_PASS;
}

SEC("?tc")
int md_skb(struct __sk_buff *skb)
{
	struct sk_buff *kskb = bpf_cast_to_kern_ctx(skb);
	struct skb_shared_info *shared_info;
	struct sk_buff *kskb2;

	kskb_len = kskb->len;

	/* Simulate the following kernel macro:
	 * #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB)))
	 */
	shared_info = bpf_rdonly_cast(kskb->head + kskb->end,
				      bpf_core_type_id_kernel(struct skb_shared_info));
	meta_len = shared_info->meta_len;
	frag0_len = shared_info->frag_list->len;

	/* kskb2 should be equal to kskb */
	kskb2 = bpf_rdonly_cast(kskb, bpf_core_type_id_kernel(struct sk_buff));
	kskb2_len = kskb2->len;
	return 0;
}

SEC("?tp_btf/sys_enter")
int BPF_PROG(untrusted_ptr, struct pt_regs *regs, long id)
{
	struct task_struct *task, *task_dup;
	long *ptr;

	task = bpf_get_current_task_btf();
	task_dup = bpf_rdonly_cast(task, bpf_core_type_id_kernel(struct task_struct));
	(void)bpf_task_storage_get(&enter_id, task_dup, 0, 0);
	return 0;
}

SEC("?tracepoint/syscalls/sys_enter_nanosleep")
int kctx_u64(void *ctx)
{
	u64 *kctx = bpf_rdonly_cast(ctx, bpf_core_type_id_kernel(u64));

	(void)kctx;
	return 0;
}

char _license[] SEC("license") = "GPL";
+3 -3
tools/testing/selftests/bpf/test_bpftool_synctypes.py
···
 commands), which looks to the lists of options in other source files
 but has different start and end markers:

-"OPTIONS := { {-j|--json} [{-p|--pretty}] | {-d|--debug} | {-l|--legacy}"
+"OPTIONS := { {-j|--json} [{-p|--pretty}] | {-d|--debug}"

 Return a set containing all options, such as:

-{'-p', '-d', '--legacy', '--pretty', '--debug', '--json', '-l', '-j'}
+{'-p', '-d', '--pretty', '--debug', '--json', '-j'}

 start_marker = re.compile(f'"OPTIONS :=')
 pattern = re.compile('([\w-]+) ?(?:\||}[ }\]"])')
···

 Return a set containing all options, such as:

-{'-p', '-d', '--legacy', '--pretty', '--debug', '--json', '-l', '-j'}
+{'-p', '-d', '--pretty', '--debug', '--json', '-j'}

 start_marker = re.compile('\|COMMON_OPTIONS\| replace:: {')
 pattern = re.compile('\*\*([\w/-]+)\*\*')
+1 -1
tools/testing/selftests/bpf/verifier/calls.c
···
 },
 .prog_type = BPF_PROG_TYPE_SCHED_CLS,
 .result = REJECT,
-.errstr = "arg#0 pointer type STRUCT prog_test_ref_kfunc must point",
+.errstr = "arg#0 is ptr_or_null_ expected ptr_ or socket",
 .fixup_kfunc_btf_id = {
 	{ "bpf_kfunc_call_test_acquire", 3 },
 	{ "bpf_kfunc_call_test_release", 5 },
+174
tools/testing/selftests/bpf/verifier/jeq_infer_not_null.c
{
	/* This is equivalent to the following program:
	 *
	 *   r6 = skb->sk;
	 *   r7 = sk_fullsock(r6);
	 *   r0 = sk_fullsock(r6);
	 *   if (r0 == 0) return 0;    (a)
	 *   if (r0 != r7) return 0;   (b)
	 *   *r7->type;                (c)
	 *   return 0;
	 *
	 * It is safe to dereference r7 at point (c), because of (a) and (b).
	 * The test verifies that relation r0 == r7 is propagated from (b) to (c).
	 */
	"jne/jeq infer not null, PTR_TO_SOCKET_OR_NULL -> PTR_TO_SOCKET for JNE false branch",
	.insns = {
	/* r6 = skb->sk; */
	BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, offsetof(struct __sk_buff, sk)),
	/* if (r6 == 0) return 0; */
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 8),
	/* r7 = sk_fullsock(skb); */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
	/* r0 = sk_fullsock(skb); */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
	/* if (r0 == null) return 0; */
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
	/* if (r0 == r7) r0 = *(r7->type); */
	BPF_JMP_REG(BPF_JNE, BPF_REG_0, BPF_REG_7, 1), /* Use ! JNE ! */
	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_7, offsetof(struct bpf_sock, type)),
	/* return 0 */
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),
	},
	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
	.result = ACCEPT,
	.result_unpriv = REJECT,
	.errstr_unpriv = "R7 pointer comparison",
},
{
	/* Same as above, but verify that the other branch of JNE still
	 * prohibits access to PTR_MAYBE_NULL.
	 */
	"jne/jeq infer not null, PTR_TO_SOCKET_OR_NULL unchanged for JNE true branch",
	.insns = {
	/* r6 = skb->sk */
	BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, offsetof(struct __sk_buff, sk)),
	/* if (r6 == 0) return 0; */
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 9),
	/* r7 = sk_fullsock(skb); */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
	/* r0 = sk_fullsock(skb); */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
	/* if (r0 == null) return 0; */
	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3),
	/* if (r0 == r7) return 0; */
	BPF_JMP_REG(BPF_JNE, BPF_REG_0, BPF_REG_7, 1), /* Use ! JNE ! */
	BPF_JMP_IMM(BPF_JA, 0, 0, 1),
	/* r0 = *(r7->type); */
	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_7, offsetof(struct bpf_sock, type)),
	/* return 0 */
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),
	},
	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
	.result = REJECT,
	.errstr = "R7 invalid mem access 'sock_or_null'",
	.result_unpriv = REJECT,
	.errstr_unpriv = "R7 pointer comparison",
},
{
	/* Same as the first test, but not-null should be inferred for the JEQ branch */
	"jne/jeq infer not null, PTR_TO_SOCKET_OR_NULL -> PTR_TO_SOCKET for JEQ true branch",
	.insns = {
	/* r6 = skb->sk; */
	BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, offsetof(struct __sk_buff, sk)),
	/* if (r6 == null) return 0; */
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 9),
	/* r7 = sk_fullsock(skb); */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
	/* r0 = sk_fullsock(skb); */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
	/* if (r0 == null) return 0; */
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
	/* if (r0 != r7) return 0; */
	BPF_JMP_REG(BPF_JEQ, BPF_REG_0, BPF_REG_7, 1), /* Use ! JEQ ! */
	BPF_JMP_IMM(BPF_JA, 0, 0, 1),
	/* r0 = *(r7->type); */
	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_7, offsetof(struct bpf_sock, type)),
	/* return 0; */
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),
	},
	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
	.result = ACCEPT,
	.result_unpriv = REJECT,
	.errstr_unpriv = "R7 pointer comparison",
},
{
	/* Same as above, but verify that the other branch of JEQ still
	 * prohibits access to PTR_MAYBE_NULL.
	 */
	"jne/jeq infer not null, PTR_TO_SOCKET_OR_NULL unchanged for JEQ false branch",
	.insns = {
	/* r6 = skb->sk; */
	BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, offsetof(struct __sk_buff, sk)),
	/* if (r6 == null) return 0; */
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 8),
	/* r7 = sk_fullsock(skb); */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
	/* r0 = sk_fullsock(skb); */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
	/* if (r0 == null) return 0; */
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
	/* if (r0 != r7) r0 = *(r7->type); */
	BPF_JMP_REG(BPF_JEQ, BPF_REG_0, BPF_REG_7, 1), /* Use ! JEQ ! */
	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_7, offsetof(struct bpf_sock, type)),
	/* return 0; */
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),
	},
	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
	.result = REJECT,
	.errstr = "R7 invalid mem access 'sock_or_null'",
	.result_unpriv = REJECT,
	.errstr_unpriv = "R7 pointer comparison",
},
{
	/* Maps are treated in a different branch of `mark_ptr_not_null_reg`,
	 * so separate test for maps case.
	 */
	"jne/jeq infer not null, PTR_TO_MAP_VALUE_OR_NULL -> PTR_TO_MAP_VALUE",
	.insns = {
	/* r9 = &some stack to use as key */
	BPF_ST_MEM(BPF_W, BPF_REG_10, -8, 0),
	BPF_MOV64_REG(BPF_REG_9, BPF_REG_10),
	BPF_ALU64_IMM(BPF_ADD, BPF_REG_9, -8),
	/* r8 = process local map */
	BPF_LD_MAP_FD(BPF_REG_8, 0),
	/* r6 = map_lookup_elem(r8, r9); */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_8),
	BPF_MOV64_REG(BPF_REG_2, BPF_REG_9),
	BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
	BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
	/* r7 = map_lookup_elem(r8, r9); */
	BPF_MOV64_REG(BPF_REG_1, BPF_REG_8),
	BPF_MOV64_REG(BPF_REG_2, BPF_REG_9),
	BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
	BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
	/* if (r6 == 0) return 0; */
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 2),
	/* if (r6 != r7) return 0; */
	BPF_JMP_REG(BPF_JNE, BPF_REG_6, BPF_REG_7, 1),
	/* read *r7; */
	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_7, offsetof(struct bpf_xdp_sock, queue_id)),
	/* return 0; */
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_EXIT_INSN(),
	},
	.fixup_map_xskmap = { 3 },
	.prog_type = BPF_PROG_TYPE_XDP,
	.result = ACCEPT,
},
+2 -2
tools/testing/selftests/bpf/verifier/ref_tracking.c
···
 .kfunc = "bpf",
 .expected_attach_type = BPF_LSM_MAC,
 .flags = BPF_F_SLEEPABLE,
-.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
+.errstr = "arg#0 is ptr_or_null_ expected ptr_ or socket",
 .fixup_kfunc_btf_id = {
 	{ "bpf_lookup_user_key", 2 },
 	{ "bpf_key_put", 4 },
···
 .kfunc = "bpf",
 .expected_attach_type = BPF_LSM_MAC,
 .flags = BPF_F_SLEEPABLE,
-.errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar",
+.errstr = "arg#0 is ptr_or_null_ expected ptr_ or socket",
 .fixup_kfunc_btf_id = {
 	{ "bpf_lookup_system_key", 1 },
 	{ "bpf_key_put", 3 },
+1 -1
tools/testing/selftests/bpf/verifier/ringbuf.c
···
 },
 .fixup_map_ringbuf = { 1 },
 .result = REJECT,
-.errstr = "dereference of modified alloc_mem ptr R1",
+.errstr = "dereference of modified ringbuf_mem ptr R1",
 },
 {
 "ringbuf: invalid reservation offset 2",
+1 -1
tools/testing/selftests/bpf/verifier/spill_fill.c
···
 },
 .fixup_map_ringbuf = { 1 },
 .result = REJECT,
-.errstr = "R0 pointer arithmetic on alloc_mem_or_null prohibited",
+.errstr = "R0 pointer arithmetic on ringbuf_mem_or_null prohibited",
 },
 {
 "check corrupted spill/fill",