Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Signed-off-by: David S. Miller <davem@davemloft.net>

+6098 -1740
+142
Documentation/bpf/bpf_lsm.rst
+.. SPDX-License-Identifier: GPL-2.0+
+.. Copyright (C) 2020 Google LLC.
+
+================
+LSM BPF Programs
+================
+
+These BPF programs allow runtime instrumentation of the LSM hooks by privileged
+users to implement system-wide MAC (Mandatory Access Control) and Audit
+policies using eBPF.
+
+Structure
+---------
+
+The example shows an eBPF program that can be attached to the ``file_mprotect``
+LSM hook:
+
+.. c:function:: int file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, unsigned long prot);
+
+Other LSM hooks which can be instrumented can be found in
+``include/linux/lsm_hooks.h``.
+
+eBPF programs that use :doc:`/bpf/btf` do not need to include kernel headers
+for accessing information from the attached eBPF program's context. They can
+simply declare the structures in the eBPF program and only specify the fields
+that need to be accessed.
+
+.. code-block:: c
+
+        struct mm_struct {
+                unsigned long start_brk, brk, start_stack;
+        } __attribute__((preserve_access_index));
+
+        struct vm_area_struct {
+                unsigned long vm_start, vm_end;
+                struct mm_struct *vm_mm;
+        } __attribute__((preserve_access_index));
+
+
+.. note:: The order of the fields is irrelevant.
+
+This can be further simplified (if one has access to the BTF information at
+build time) by generating the ``vmlinux.h`` with:
+
+.. code-block:: console
+
+        # bpftool btf dump file <path-to-btf-vmlinux> format c > vmlinux.h
+
+.. note:: ``path-to-btf-vmlinux`` can be ``/sys/kernel/btf/vmlinux`` if the
+          build environment matches the environment the BPF programs are
+          deployed in.
+
+The ``vmlinux.h`` can then simply be included in the BPF programs without
+requiring the definition of the types.
+
+The eBPF programs can be declared using the ``BPF_PROG``
+macro defined in `tools/lib/bpf/bpf_tracing.h`_. In this
+example:
+
+* ``"lsm/file_mprotect"`` indicates the LSM hook that the program must
+  be attached to
+* ``mprotect_audit`` is the name of the eBPF program
+
+.. code-block:: c
+
+        SEC("lsm/file_mprotect")
+        int BPF_PROG(mprotect_audit, struct vm_area_struct *vma,
+                     unsigned long reqprot, unsigned long prot, int ret)
+        {
+                /* ret is the return value from the previous BPF program
+                 * or 0 if it's the first hook.
+                 */
+                if (ret != 0)
+                        return ret;
+
+                int is_heap;
+
+                is_heap = (vma->vm_start >= vma->vm_mm->start_brk &&
+                           vma->vm_end <= vma->vm_mm->brk);
+
+                /* Return -EPERM or write information to the perf events buffer
+                 * for auditing
+                 */
+                if (is_heap)
+                        return -EPERM;
+                return 0;
+        }
+
+The ``__attribute__((preserve_access_index))`` is a clang feature that allows
+the BPF verifier to update the offsets for the access at runtime using the
+:doc:`/bpf/btf` information. Since the BPF verifier is aware of the types, it
+also validates all the accesses made to the various types in the eBPF program.
+
+Loading
+-------
+
+eBPF programs can be loaded with the :manpage:`bpf(2)` syscall's
+``BPF_PROG_LOAD`` operation:
+
+.. code-block:: c
+
+        struct bpf_object *obj;
+
+        obj = bpf_object__open("./my_prog.o");
+        bpf_object__load(obj);
+
+This can be simplified by using a skeleton header generated by ``bpftool``:
+
+.. code-block:: console
+
+        # bpftool gen skeleton my_prog.o > my_prog.skel.h
+
+and the program can be loaded by including ``my_prog.skel.h`` and using
+the generated helper, ``my_prog__open_and_load``.
+
+Attachment to LSM Hooks
+-----------------------
+
+The LSM allows attachment of eBPF programs as LSM hooks using the
+:manpage:`bpf(2)` syscall's ``BPF_RAW_TRACEPOINT_OPEN`` operation or, more
+simply, the libbpf helper ``bpf_program__attach_lsm``.
+
+The program can be detached from the LSM hook by *destroying* the ``link``
+returned by ``bpf_program__attach_lsm`` using ``bpf_link__destroy``.
+
+One can also use the helpers generated in ``my_prog.skel.h``, i.e.
+``my_prog__attach`` for attachment and ``my_prog__destroy`` for cleaning up.
+
+Examples
+--------
+
+An example eBPF program can be found in
+`tools/testing/selftests/bpf/progs/lsm.c`_ and the corresponding
+userspace code in `tools/testing/selftests/bpf/prog_tests/test_lsm.c`_
+
+.. Links
+.. _tools/lib/bpf/bpf_tracing.h:
+   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/lib/bpf/bpf_tracing.h
+.. _tools/testing/selftests/bpf/progs/lsm.c:
+   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/progs/lsm.c
+.. _tools/testing/selftests/bpf/prog_tests/test_lsm.c:
+   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/prog_tests/test_lsm.c
+213
Documentation/bpf/drgn.rst
+.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+==============
+BPF drgn tools
+==============
+
+drgn scripts are a convenient and easy-to-use mechanism to retrieve arbitrary
+kernel data structures. drgn does not rely on the kernel UAPI to read the
+data. Instead it reads directly from ``/proc/kcore`` or vmcore and pretty
+prints the data based on DWARF debug information from vmlinux.
+
+This document describes BPF related drgn tools.
+
+See `drgn/tools`_ for all tools available at the moment and `drgn/doc`_ for
+more details on drgn itself.
+
+bpf_inspect.py
+--------------
+
+Description
+===========
+
+`bpf_inspect.py`_ is a tool intended to inspect BPF programs and maps. It can
+iterate over all programs and maps in the system and print basic information
+about these objects, including id, type and name.
+
+The main use-case `bpf_inspect.py`_ covers is to show BPF programs of types
+``BPF_PROG_TYPE_EXT`` and ``BPF_PROG_TYPE_TRACING`` attached to other BPF
+programs via ``freplace``/``fentry``/``fexit`` mechanisms, since there is no
+user-space API to get this information.
+
+Getting started
+===============
+
+List BPF programs (full names are obtained from BTF)::
+
+  % sudo bpf_inspect.py prog
+  27: BPF_PROG_TYPE_TRACEPOINT tracepoint__tcp__tcp_send_reset
+  4632: BPF_PROG_TYPE_CGROUP_SOCK_ADDR tw_ipt_bind
+  49464: BPF_PROG_TYPE_RAW_TRACEPOINT raw_tracepoint__sched_process_exit
+
+List BPF maps::
+
+  % sudo bpf_inspect.py map
+  2577: BPF_MAP_TYPE_HASH tw_ipt_vips
+  4050: BPF_MAP_TYPE_STACK_TRACE stack_traces
+  4069: BPF_MAP_TYPE_PERCPU_ARRAY ned_dctcp_cntr
+
+Find BPF programs attached to BPF program ``test_pkt_access``::
+
+  % sudo bpf_inspect.py p | grep test_pkt_access
+  650: BPF_PROG_TYPE_SCHED_CLS test_pkt_access
+  654: BPF_PROG_TYPE_TRACING test_main linked:[650->25: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access()]
+  655: BPF_PROG_TYPE_TRACING test_subprog1 linked:[650->29: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog1()]
+  656: BPF_PROG_TYPE_TRACING test_subprog2 linked:[650->31: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog2()]
+  657: BPF_PROG_TYPE_TRACING test_subprog3 linked:[650->21: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog3()]
+  658: BPF_PROG_TYPE_EXT new_get_skb_len linked:[650->16: BPF_TRAMP_REPLACE test_pkt_access->get_skb_len()]
+  659: BPF_PROG_TYPE_EXT new_get_skb_ifindex linked:[650->23: BPF_TRAMP_REPLACE test_pkt_access->get_skb_ifindex()]
+  660: BPF_PROG_TYPE_EXT new_get_constant linked:[650->19: BPF_TRAMP_REPLACE test_pkt_access->get_constant()]
+
+There is a program ``test_pkt_access`` with id 650, and multiple other
+tracing and ext programs attached to functions in ``test_pkt_access``.
+
+For example the line::
+
+  658: BPF_PROG_TYPE_EXT new_get_skb_len linked:[650->16: BPF_TRAMP_REPLACE test_pkt_access->get_skb_len()]
+
+means that BPF program id 658, type ``BPF_PROG_TYPE_EXT``, name
+``new_get_skb_len`` replaces (``BPF_TRAMP_REPLACE``) the function
+``get_skb_len()`` that has BTF id 16 in BPF program id 650, name
+``test_pkt_access``.
+
+Getting help:
+
+.. code-block:: none
+
+  % sudo bpf_inspect.py
+  usage: bpf_inspect.py [-h] {prog,p,map,m} ...
+
+  drgn script to list BPF programs or maps and their properties
+  unavailable via kernel API.
+
+  See https://github.com/osandov/drgn/ for more details on drgn.
+
+  optional arguments:
+    -h, --help      show this help message and exit
+
+  subcommands:
+    {prog,p,map,m}
+      prog (p)      list BPF programs
+      map (m)       list BPF maps
+
+Customization
+=============
+
+The script is intended to be customized by developers to print relevant
+information about BPF programs, maps and other objects.
+
+For example, to print ``struct bpf_prog_aux`` for BPF program id 53077:
+
+.. code-block:: none
+
+  % git diff
+  diff --git a/tools/bpf_inspect.py b/tools/bpf_inspect.py
+  index 650e228..aea2357 100755
+  --- a/tools/bpf_inspect.py
+  +++ b/tools/bpf_inspect.py
+  @@ -112,7 +112,9 @@ def list_bpf_progs(args):
+           if linked:
+               linked = f" linked:[{linked}]"
+
+  -        print(f"{id_:>6}: {type_:32} {name:32} {linked}")
+  +        if id_ == 53077:
+  +            print(f"{id_:>6}: {type_:32} {name:32}")
+  +            print(f"{bpf_prog.aux}")
+
+
+   def list_bpf_maps(args):
+
+It produces the output::
+
+  % sudo bpf_inspect.py p
+  53077: BPF_PROG_TYPE_XDP tw_xdp_policer
+  *(struct bpf_prog_aux *)0xffff8893fad4b400 = {
+      .refcnt = (atomic64_t){
+          .counter = (long)58,
+      },
+      .used_map_cnt = (u32)1,
+      .max_ctx_offset = (u32)8,
+      .max_pkt_offset = (u32)15,
+      .max_tp_access = (u32)0,
+      .stack_depth = (u32)8,
+      .id = (u32)53077,
+      .func_cnt = (u32)0,
+      .func_idx = (u32)0,
+      .attach_btf_id = (u32)0,
+      .linked_prog = (struct bpf_prog *)0x0,
+      .verifier_zext = (bool)0,
+      .offload_requested = (bool)0,
+      .attach_btf_trace = (bool)0,
+      .func_proto_unreliable = (bool)0,
+      .trampoline_prog_type = (enum bpf_tramp_prog_type)BPF_TRAMP_FENTRY,
+      .trampoline = (struct bpf_trampoline *)0x0,
+      .tramp_hlist = (struct hlist_node){
+          .next = (struct hlist_node *)0x0,
+          .pprev = (struct hlist_node **)0x0,
+      },
+      .attach_func_proto = (const struct btf_type *)0x0,
+      .attach_func_name = (const char *)0x0,
+      .func = (struct bpf_prog **)0x0,
+      .jit_data = (void *)0x0,
+      .poke_tab = (struct bpf_jit_poke_descriptor *)0x0,
+      .size_poke_tab = (u32)0,
+      .ksym_tnode = (struct latch_tree_node){
+          .node = (struct rb_node [2]){
+              {
+                  .__rb_parent_color = (unsigned long)18446612956263126665,
+                  .rb_right = (struct rb_node *)0x0,
+                  .rb_left = (struct rb_node *)0xffff88a0be3d0088,
+              },
+              {
+                  .__rb_parent_color = (unsigned long)18446612956263126689,
+                  .rb_right = (struct rb_node *)0x0,
+                  .rb_left = (struct rb_node *)0xffff88a0be3d00a0,
+              },
+          },
+      },
+      .ksym_lnode = (struct list_head){
+          .next = (struct list_head *)0xffff88bf481830b8,
+          .prev = (struct list_head *)0xffff888309f536b8,
+      },
+      .ops = (const struct bpf_prog_ops *)xdp_prog_ops+0x0 = 0xffffffff820fa350,
+      .used_maps = (struct bpf_map **)0xffff889ff795de98,
+      .prog = (struct bpf_prog *)0xffffc9000cf2d000,
+      .user = (struct user_struct *)root_user+0x0 = 0xffffffff82444820,
+      .load_time = (u64)2408348759285319,
+      .cgroup_storage = (struct bpf_map *[2]){},
+      .name = (char [16])"tw_xdp_policer",
+      .security = (void *)0xffff889ff795d548,
+      .offload = (struct bpf_prog_offload *)0x0,
+      .btf = (struct btf *)0xffff8890ce6d0580,
+      .func_info = (struct bpf_func_info *)0xffff889ff795d240,
+      .func_info_aux = (struct bpf_func_info_aux *)0xffff889ff795de20,
+      .linfo = (struct bpf_line_info *)0xffff888a707afc00,
+      .jited_linfo = (void **)0xffff8893fad48600,
+      .func_info_cnt = (u32)1,
+      .nr_linfo = (u32)37,
+      .linfo_idx = (u32)0,
+      .num_exentries = (u32)0,
+      .extable = (struct exception_table_entry *)0xffffffffa032d950,
+      .stats = (struct bpf_prog_stats *)0x603fe3a1f6d0,
+      .work = (struct work_struct){
+          .data = (atomic_long_t){
+              .counter = (long)0,
+          },
+          .entry = (struct list_head){
+              .next = (struct list_head *)0x0,
+              .prev = (struct list_head *)0x0,
+          },
+          .func = (work_func_t)0x0,
+      },
+      .rcu = (struct callback_head){
+          .next = (struct callback_head *)0x0,
+          .func = (void (*)(struct callback_head *))0x0,
+      },
+  }
+
+
+.. Links
+.. _drgn/doc: https://drgn.readthedocs.io/en/latest/
+.. _drgn/tools: https://github.com/osandov/drgn/tree/master/tools
+.. _bpf_inspect.py:
+   https://github.com/osandov/drgn/blob/master/tools/bpf_inspect.py
+4 -2
Documentation/bpf/index.rst
···
    prog_cgroup_sockopt
    prog_cgroup_sysctl
    prog_flow_dissector
+   bpf_lsm

-Testing BPF
-===========
+Testing and debugging BPF
+=========================

 .. toctree::
    :maxdepth: 1

+   drgn
    s390

+2
MAINTAINERS
···
 R: Song Liu <songliubraving@fb.com>
 R: Yonghong Song <yhs@fb.com>
 R: Andrii Nakryiko <andriin@fb.com>
+R: John Fastabend <john.fastabend@gmail.com>
+R: KP Singh <kpsingh@chromium.org>
 L: netdev@vger.kernel.org
 L: bpf@vger.kernel.org
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
-6
arch/powerpc/kernel/vmlinux.lds.S
···
 	*(.branch_lt)
 }

-#ifdef CONFIG_DEBUG_INFO_BTF
-	.BTF : AT(ADDR(.BTF) - LOAD_OFFSET) {
-		*(.BTF)
-	}
-#endif
-
 .opd : AT(ADDR(.opd) - LOAD_OFFSET) {
 	__start_opd = .;
 	KEEP(*(.opd))
+15
include/asm-generic/vmlinux.lds.h
···
 	\
 	RO_EXCEPTION_TABLE					\
 	NOTES							\
+	BTF							\
 	\
 	. = ALIGN((align));					\
 	__end_rodata = .;
···
 		KEEP(*(__ex_table))				\
 		__stop___ex_table = .;				\
 	}
+
+/*
+ * .BTF
+ */
+#ifdef CONFIG_DEBUG_INFO_BTF
+#define BTF							\
+	.BTF : AT(ADDR(.BTF) - LOAD_OFFSET) {			\
+		__start_BTF = .;				\
+		*(.BTF)						\
+		__stop_BTF = .;					\
+	}
+#else
+#define BTF
+#endif

 /*
  * Init task
+36 -5
include/linux/bpf-cgroup.h
···
 	struct rcu_head rcu;
 };

+struct bpf_cgroup_link {
+	struct bpf_link link;
+	struct cgroup *cgroup;
+	enum bpf_attach_type type;
+};
+
+extern const struct bpf_link_ops bpf_cgroup_link_lops;
+
 struct bpf_prog_list {
 	struct list_head node;
 	struct bpf_prog *prog;
+	struct bpf_cgroup_link *link;
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE];
 };
···
 int cgroup_bpf_inherit(struct cgroup *cgrp);
 void cgroup_bpf_offline(struct cgroup *cgrp);

-int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
-			struct bpf_prog *replace_prog,
+int __cgroup_bpf_attach(struct cgroup *cgrp,
+			struct bpf_prog *prog, struct bpf_prog *replace_prog,
+			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type, u32 flags);
 int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
+			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type);
+int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
+			 struct bpf_prog *new_prog);
 int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		       union bpf_attr __user *uattr);

 /* Wrapper for __cgroup_bpf_*() protected by cgroup_mutex */
-int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
-		      struct bpf_prog *replace_prog, enum bpf_attach_type type,
+int cgroup_bpf_attach(struct cgroup *cgrp,
+		      struct bpf_prog *prog, struct bpf_prog *replace_prog,
+		      struct bpf_cgroup_link *link, enum bpf_attach_type type,
 		      u32 flags);
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
-		      enum bpf_attach_type type, u32 flags);
+		      enum bpf_attach_type type);
+int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *old_prog,
+		       struct bpf_prog *new_prog);
 int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		     union bpf_attr __user *uattr);
···
 			   enum bpf_prog_type ptype, struct bpf_prog *prog);
 int cgroup_bpf_prog_detach(const union bpf_attr *attr,
 			   enum bpf_prog_type ptype);
+int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 int cgroup_bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr);
 #else

 struct bpf_prog;
+struct bpf_link;
 struct cgroup_bpf {};
 static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
 static inline void cgroup_bpf_offline(struct cgroup *cgrp) {}
···
 static inline int cgroup_bpf_prog_detach(const union bpf_attr *attr,
 					 enum bpf_prog_type ptype)
+{
+	return -EINVAL;
+}
+
+static inline int cgroup_bpf_link_attach(const union bpf_attr *attr,
+					 struct bpf_prog *prog)
+{
+	return -EINVAL;
+}
+
+static inline int cgroup_bpf_replace(struct bpf_link *link,
+				     struct bpf_prog *old_prog,
+				     struct bpf_prog *new_prog)
 {
 	return -EINVAL;
 }
+14 -1
include/linux/bpf.h
···
 	ARG_CONST_SIZE_OR_ZERO,	/* number of bytes accessed from memory or 0 */

 	ARG_PTR_TO_CTX,		/* pointer to context */
+	ARG_PTR_TO_CTX_OR_NULL,	/* pointer to context or NULL */
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
 	ARG_PTR_TO_SPIN_LOCK,	/* pointer to bpf_spin_lock */
 	ARG_PTR_TO_SOCK_COMMON,	/* pointer to sock_common */
···
 int bpf_map_new_fd(struct bpf_map *map, int flags);
 int bpf_prog_new_fd(struct bpf_prog *prog);

-struct bpf_link;
+struct bpf_link {
+	atomic64_t refcnt;
+	const struct bpf_link_ops *ops;
+	struct bpf_prog *prog;
+	struct work_struct work;
+};

 struct bpf_link_ops {
 	void (*release)(struct bpf_link *link);
 	void (*dealloc)(struct bpf_link *link);
+
 };

 void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
 		   struct bpf_prog *prog);
+void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
+		      int link_fd);
 void bpf_link_inc(struct bpf_link *link);
 void bpf_link_put(struct bpf_link *link);
 int bpf_link_new_fd(struct bpf_link *link);
···
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
 extern const struct bpf_func_proto bpf_sock_hash_update_proto;
 extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;
+extern const struct bpf_func_proto bpf_get_current_ancestor_cgroup_id_proto;
 extern const struct bpf_func_proto bpf_msg_redirect_hash_proto;
 extern const struct bpf_func_proto bpf_msg_redirect_map_proto;
 extern const struct bpf_func_proto bpf_sk_redirect_hash_proto;
···
 extern const struct bpf_func_proto bpf_tcp_sock_proto;
 extern const struct bpf_func_proto bpf_jiffies64_proto;
 extern const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto;
+
+const struct bpf_func_proto *bpf_tracing_func_proto(
+	enum bpf_func_id func_id, const struct bpf_prog *prog);

 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
+33
include/linux/bpf_lsm.h
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Copyright (C) 2020 Google LLC.
+ */
+
+#ifndef _LINUX_BPF_LSM_H
+#define _LINUX_BPF_LSM_H
+
+#include <linux/bpf.h>
+#include <linux/lsm_hooks.h>
+
+#ifdef CONFIG_BPF_LSM
+
+#define LSM_HOOK(RET, DEFAULT, NAME, ...) \
+	RET bpf_lsm_##NAME(__VA_ARGS__);
+#include <linux/lsm_hook_defs.h>
+#undef LSM_HOOK
+
+int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
+			const struct bpf_prog *prog);
+
+#else /* !CONFIG_BPF_LSM */
+
+static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
+				      const struct bpf_prog *prog)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif /* CONFIG_BPF_LSM */
+
+#endif /* _LINUX_BPF_LSM_H */
+4
include/linux/bpf_types.h
···
 	      void *, void *)
 BPF_PROG_TYPE(BPF_PROG_TYPE_EXT, bpf_extension,
 	      void *, void *)
+#ifdef CONFIG_BPF_LSM
+BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm,
+	       void *, void *)
+#endif /* CONFIG_BPF_LSM */
 #endif

 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
+4
include/linux/bpf_verifier.h
···
 	s64 smax_value; /* maximum possible (s64)value */
 	u64 umin_value; /* minimum possible (u64)value */
 	u64 umax_value; /* maximum possible (u64)value */
+	s32 s32_min_value; /* minimum possible (s32)value */
+	s32 s32_max_value; /* maximum possible (s32)value */
+	u32 u32_min_value; /* minimum possible (u32)value */
+	u32 u32_max_value; /* maximum possible (u32)value */
 	/* parentage chain for liveness checking */
 	struct bpf_reg_state *parent;
 	/* Inside the callee two registers can be both PTR_TO_STACK like
+1
include/linux/limits.h
···
 #define S16_MAX		((s16)(U16_MAX >> 1))
 #define S16_MIN		((s16)(-S16_MAX - 1))
 #define U32_MAX		((u32)~0U)
+#define U32_MIN		((u32)0)
 #define S32_MAX		((s32)(U32_MAX >> 1))
 #define S32_MIN		((s32)(-S32_MAX - 1))
 #define U64_MAX		((u64)~0ULL)
+381
include/linux/lsm_hook_defs.h
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Linux Security Module Hook declarations.
+ *
+ * Copyright (C) 2001 WireX Communications, Inc <chris@wirex.com>
+ * Copyright (C) 2001 Greg Kroah-Hartman <greg@kroah.com>
+ * Copyright (C) 2001 Networks Associates Technology, Inc <ssmalley@nai.com>
+ * Copyright (C) 2001 James Morris <jmorris@intercode.com.au>
+ * Copyright (C) 2001 Silicon Graphics, Inc. (Trust Technology Group)
+ * Copyright (C) 2015 Intel Corporation.
+ * Copyright (C) 2015 Casey Schaufler <casey@schaufler-ca.com>
+ * Copyright (C) 2016 Mellanox Technologies
+ * Copyright (C) 2020 Google LLC.
+ */
+
+/*
+ * The macro LSM_HOOK is used to define the data structures required by
+ * the LSM framework using the pattern:
+ *
+ *	LSM_HOOK(<return_type>, <default_value>, <hook_name>, args...)
+ *
+ * struct security_hook_heads {
+ *   #define LSM_HOOK(RET, DEFAULT, NAME, ...) struct hlist_head NAME;
+ *   #include <linux/lsm_hook_defs.h>
+ *   #undef LSM_HOOK
+ * };
+ */
+LSM_HOOK(int, 0, binder_set_context_mgr, struct task_struct *mgr)
+LSM_HOOK(int, 0, binder_transaction, struct task_struct *from,
+	 struct task_struct *to)
+LSM_HOOK(int, 0, binder_transfer_binder, struct task_struct *from,
+	 struct task_struct *to)
+LSM_HOOK(int, 0, binder_transfer_file, struct task_struct *from,
+	 struct task_struct *to, struct file *file)
+LSM_HOOK(int, 0, ptrace_access_check, struct task_struct *child,
+	 unsigned int mode)
+LSM_HOOK(int, 0, ptrace_traceme, struct task_struct *parent)
+LSM_HOOK(int, 0, capget, struct task_struct *target, kernel_cap_t *effective,
+	 kernel_cap_t *inheritable, kernel_cap_t *permitted)
+LSM_HOOK(int, 0, capset, struct cred *new, const struct cred *old,
+	 const kernel_cap_t *effective, const kernel_cap_t *inheritable,
+	 const kernel_cap_t *permitted)
+LSM_HOOK(int, 0, capable, const struct cred *cred, struct user_namespace *ns,
+	 int cap, unsigned int opts)
+LSM_HOOK(int, 0, quotactl, int cmds, int type, int id, struct super_block *sb)
+LSM_HOOK(int, 0, quota_on, struct dentry *dentry)
+LSM_HOOK(int, 0, syslog, int type)
+LSM_HOOK(int, 0, settime, const struct timespec64 *ts,
+	 const struct timezone *tz)
+LSM_HOOK(int, 0, vm_enough_memory, struct mm_struct *mm, long pages)
+LSM_HOOK(int, 0, bprm_set_creds, struct linux_binprm *bprm)
+LSM_HOOK(int, 0, bprm_check_security, struct linux_binprm *bprm)
+LSM_HOOK(void, LSM_RET_VOID, bprm_committing_creds, struct linux_binprm *bprm)
+LSM_HOOK(void, LSM_RET_VOID, bprm_committed_creds, struct linux_binprm *bprm)
+LSM_HOOK(int, 0, fs_context_dup, struct fs_context *fc,
+	 struct fs_context *src_sc)
+LSM_HOOK(int, 0, fs_context_parse_param, struct fs_context *fc,
+	 struct fs_parameter *param)
+LSM_HOOK(int, 0, sb_alloc_security, struct super_block *sb)
+LSM_HOOK(void, LSM_RET_VOID, sb_free_security, struct super_block *sb)
+LSM_HOOK(void, LSM_RET_VOID, sb_free_mnt_opts, void *mnt_opts)
+LSM_HOOK(int, 0, sb_eat_lsm_opts, char *orig, void **mnt_opts)
+LSM_HOOK(int, 0, sb_remount, struct super_block *sb, void *mnt_opts)
+LSM_HOOK(int, 0, sb_kern_mount, struct super_block *sb)
+LSM_HOOK(int, 0, sb_show_options, struct seq_file *m, struct super_block *sb)
+LSM_HOOK(int, 0, sb_statfs, struct dentry *dentry)
+LSM_HOOK(int, 0, sb_mount, const char *dev_name, const struct path *path,
+	 const char *type, unsigned long flags, void *data)
+LSM_HOOK(int, 0, sb_umount, struct vfsmount *mnt, int flags)
+LSM_HOOK(int, 0, sb_pivotroot, const struct path *old_path,
+	 const struct path *new_path)
+LSM_HOOK(int, 0, sb_set_mnt_opts, struct super_block *sb, void *mnt_opts,
+	 unsigned long kern_flags, unsigned long *set_kern_flags)
+LSM_HOOK(int, 0, sb_clone_mnt_opts, const struct super_block *oldsb,
+	 struct super_block *newsb, unsigned long kern_flags,
+	 unsigned long *set_kern_flags)
+LSM_HOOK(int, 0, sb_add_mnt_opt, const char *option, const char *val,
+	 int len, void **mnt_opts)
+LSM_HOOK(int, 0, move_mount, const struct path *from_path,
+	 const struct path *to_path)
+LSM_HOOK(int, 0, dentry_init_security, struct dentry *dentry,
+	 int mode, const struct qstr *name, void **ctx, u32 *ctxlen)
+LSM_HOOK(int, 0, dentry_create_files_as, struct dentry *dentry, int mode,
+	 struct qstr *name, const struct cred *old, struct cred *new)
+
+#ifdef CONFIG_SECURITY_PATH
+LSM_HOOK(int, 0, path_unlink, const struct path *dir, struct dentry *dentry)
+LSM_HOOK(int, 0, path_mkdir, const struct path *dir, struct dentry *dentry,
+	 umode_t mode)
+LSM_HOOK(int, 0, path_rmdir, const struct path *dir, struct dentry *dentry)
+LSM_HOOK(int, 0, path_mknod, const struct path *dir, struct dentry *dentry,
+	 umode_t mode, unsigned int dev)
+LSM_HOOK(int, 0, path_truncate, const struct path *path)
+LSM_HOOK(int, 0, path_symlink, const struct path *dir, struct dentry *dentry,
+	 const char *old_name)
+LSM_HOOK(int, 0, path_link, struct dentry *old_dentry,
+	 const struct path *new_dir, struct dentry *new_dentry)
+LSM_HOOK(int, 0, path_rename, const struct path *old_dir,
+	 struct dentry *old_dentry, const struct path *new_dir,
+	 struct dentry *new_dentry)
+LSM_HOOK(int, 0, path_chmod, const struct path *path, umode_t mode)
+LSM_HOOK(int, 0, path_chown, const struct path *path, kuid_t uid, kgid_t gid)
+LSM_HOOK(int, 0, path_chroot, const struct path *path)
+#endif /* CONFIG_SECURITY_PATH */
+
+/* Needed for inode based security check */
+LSM_HOOK(int, 0, path_notify, const struct path *path, u64 mask,
+	 unsigned int obj_type)
+LSM_HOOK(int, 0, inode_alloc_security, struct inode *inode)
+LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
+LSM_HOOK(int, 0, inode_init_security, struct inode *inode,
+	 struct inode *dir, const struct qstr *qstr, const char **name,
+	 void **value, size_t *len)
+LSM_HOOK(int, 0, inode_create, struct inode *dir, struct dentry *dentry,
+	 umode_t mode)
+LSM_HOOK(int, 0, inode_link, struct dentry *old_dentry, struct inode *dir,
+	 struct dentry *new_dentry)
+LSM_HOOK(int, 0, inode_unlink, struct inode *dir, struct dentry *dentry)
+LSM_HOOK(int, 0, inode_symlink, struct inode *dir, struct dentry *dentry,
+	 const char *old_name)
+LSM_HOOK(int, 0, inode_mkdir, struct inode *dir, struct dentry *dentry,
+	 umode_t mode)
+LSM_HOOK(int, 0, inode_rmdir, struct inode *dir, struct dentry *dentry)
+LSM_HOOK(int, 0, inode_mknod, struct inode *dir, struct dentry *dentry,
+	 umode_t mode, dev_t dev)
+LSM_HOOK(int, 0, inode_rename, struct inode *old_dir, struct dentry *old_dentry,
+	 struct inode *new_dir, struct dentry *new_dentry)
+LSM_HOOK(int, 0, inode_readlink, struct dentry *dentry)
+LSM_HOOK(int, 0, inode_follow_link, struct dentry *dentry, struct inode *inode,
+	 bool rcu)
+LSM_HOOK(int, 0, inode_permission, struct inode *inode, int mask)
+LSM_HOOK(int, 0, inode_setattr, struct dentry *dentry, struct iattr *attr)
+LSM_HOOK(int, 0, inode_getattr, const struct path *path)
+LSM_HOOK(int, 0, inode_setxattr, struct dentry *dentry, const char *name,
+	 const void *value, size_t size, int flags)
+LSM_HOOK(void, LSM_RET_VOID, inode_post_setxattr, struct dentry *dentry,
+	 const char *name, const void *value, size_t size, int flags)
+LSM_HOOK(int, 0, inode_getxattr, struct dentry *dentry, const char *name)
+LSM_HOOK(int, 0, inode_listxattr, struct dentry *dentry)
+LSM_HOOK(int, 0, inode_removexattr, struct dentry *dentry, const char *name)
+LSM_HOOK(int, 0, inode_need_killpriv, struct dentry *dentry)
+LSM_HOOK(int, 0, inode_killpriv, struct dentry *dentry)
+LSM_HOOK(int, -EOPNOTSUPP, inode_getsecurity, struct inode *inode,
+	 const char *name, void **buffer, bool alloc)
+LSM_HOOK(int, -EOPNOTSUPP, inode_setsecurity, struct inode *inode,
+	 const char *name, const void *value, size_t size, int flags)
+LSM_HOOK(int, 0, inode_listsecurity, struct inode *inode, char *buffer,
+	 size_t buffer_size)
+LSM_HOOK(void, LSM_RET_VOID, inode_getsecid, struct inode *inode, u32 *secid)
+LSM_HOOK(int, 0, inode_copy_up, struct dentry *src, struct cred **new)
+LSM_HOOK(int, 0, inode_copy_up_xattr, const char *name)
+LSM_HOOK(int, 0, kernfs_init_security, struct kernfs_node *kn_dir,
+	 struct kernfs_node *kn)
+LSM_HOOK(int, 0, file_permission, struct file *file, int mask)
+LSM_HOOK(int, 0, file_alloc_security, struct file *file)
+LSM_HOOK(void, LSM_RET_VOID, file_free_security, struct file *file)
+LSM_HOOK(int, 0, file_ioctl, struct file *file, unsigned int cmd,
+	 unsigned long arg)
+LSM_HOOK(int, 0, mmap_addr, unsigned long addr)
+LSM_HOOK(int, 0, mmap_file, struct file *file, unsigned long reqprot,
+	 unsigned long prot, unsigned long flags)
+LSM_HOOK(int, 0, file_mprotect, struct vm_area_struct *vma,
+	 unsigned long reqprot, unsigned long prot)
+LSM_HOOK(int, 0, file_lock, struct file *file, unsigned int cmd)
+LSM_HOOK(int, 0, file_fcntl, struct file *file, unsigned int cmd,
+	 unsigned long arg)
+LSM_HOOK(void, LSM_RET_VOID, file_set_fowner, struct file *file)
+LSM_HOOK(int, 0, file_send_sigiotask, struct task_struct *tsk,
+	 struct fown_struct *fown, int sig)
+LSM_HOOK(int, 0, file_receive, struct file *file)
+LSM_HOOK(int, 0, file_open, struct file *file)
+LSM_HOOK(int, 0, task_alloc, struct task_struct *task,
+	 unsigned long clone_flags)
+LSM_HOOK(void, LSM_RET_VOID, task_free, struct task_struct *task)
+LSM_HOOK(int, 0, cred_alloc_blank, struct cred *cred, gfp_t gfp)
+LSM_HOOK(void, LSM_RET_VOID, cred_free, struct cred *cred)
+LSM_HOOK(int, 0, cred_prepare, struct cred *new, const struct cred *old,
+	 gfp_t gfp)
+LSM_HOOK(void, LSM_RET_VOID, cred_transfer, struct cred *new,
+	 const struct cred *old)
+LSM_HOOK(void, LSM_RET_VOID, cred_getsecid, const struct cred *c, u32 *secid)
+LSM_HOOK(int, 0, kernel_act_as, struct cred *new, u32 secid)
+LSM_HOOK(int, 0, kernel_create_files_as, struct cred *new, struct inode *inode)
+LSM_HOOK(int, 0, kernel_module_request, char *kmod_name)
+LSM_HOOK(int, 0, kernel_load_data, enum kernel_load_data_id id)
+LSM_HOOK(int, 0, kernel_read_file, struct file *file,
+	 enum kernel_read_file_id id)
+LSM_HOOK(int, 0, kernel_post_read_file, struct file *file, char *buf,
+	 loff_t size, enum kernel_read_file_id id)
+LSM_HOOK(int, 0, task_fix_setuid, struct cred *new, const struct cred *old,
+	 int flags)
+LSM_HOOK(int, 0, task_setpgid, struct task_struct *p, pid_t pgid)
+LSM_HOOK(int, 0, task_getpgid, struct task_struct *p)
+LSM_HOOK(int, 0, task_getsid, struct task_struct *p)
+LSM_HOOK(void, LSM_RET_VOID, task_getsecid, struct task_struct *p, u32 *secid)
+LSM_HOOK(int, 0, task_setnice, struct task_struct *p, int nice)
+LSM_HOOK(int, 0, task_setioprio, struct task_struct *p, int ioprio)
+LSM_HOOK(int, 0, task_getioprio, struct task_struct *p)
+LSM_HOOK(int, 0, task_prlimit, const struct cred *cred,
+	 const struct cred *tcred, unsigned int flags)
+LSM_HOOK(int, 0, task_setrlimit, struct task_struct *p, unsigned int resource,
+	 struct rlimit *new_rlim)
+LSM_HOOK(int, 0, task_setscheduler, struct task_struct *p)
+LSM_HOOK(int, 0, task_getscheduler, struct task_struct *p)
+LSM_HOOK(int, 0, task_movememory, struct task_struct *p)
+LSM_HOOK(int, 0, task_kill, struct task_struct *p, struct kernel_siginfo *info,
+	 int sig, const struct cred *cred)
+LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
+	 unsigned long arg3, unsigned long arg4, unsigned long arg5)
+LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
+	 struct inode *inode)
+LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
+LSM_HOOK(void, LSM_RET_VOID, ipc_getsecid, struct kern_ipc_perm *ipcp,
+	 u32 *secid)
+LSM_HOOK(int, 0, msg_msg_alloc_security, struct msg_msg *msg)
+LSM_HOOK(void, LSM_RET_VOID, msg_msg_free_security, struct msg_msg *msg)
+LSM_HOOK(int, 0, msg_queue_alloc_security, struct kern_ipc_perm *perm)
+LSM_HOOK(void, LSM_RET_VOID, msg_queue_free_security,
+	 struct kern_ipc_perm *perm)
+LSM_HOOK(int, 0, msg_queue_associate, struct kern_ipc_perm *perm, int msqflg)
+LSM_HOOK(int, 0, msg_queue_msgctl, struct kern_ipc_perm *perm, int cmd)
+LSM_HOOK(int, 0, msg_queue_msgsnd, struct kern_ipc_perm *perm,
+	 struct msg_msg *msg, int msqflg)
+LSM_HOOK(int, 0, msg_queue_msgrcv, struct kern_ipc_perm *perm,
+	 struct msg_msg *msg, struct task_struct *target, long type, int mode)
+LSM_HOOK(int, 0, shm_alloc_security, struct kern_ipc_perm *perm)
+LSM_HOOK(void, LSM_RET_VOID, shm_free_security, struct kern_ipc_perm *perm)
+LSM_HOOK(int, 0, shm_associate, struct kern_ipc_perm *perm, int shmflg)
+LSM_HOOK(int, 0, shm_shmctl, struct kern_ipc_perm *perm, int cmd)
+LSM_HOOK(int, 0, shm_shmat, struct kern_ipc_perm *perm, char __user *shmaddr,
+	 int shmflg)
+LSM_HOOK(int, 0, sem_alloc_security, struct kern_ipc_perm *perm)
+LSM_HOOK(void, LSM_RET_VOID, sem_free_security, struct kern_ipc_perm *perm)
+LSM_HOOK(int, 0, sem_associate, struct kern_ipc_perm *perm, int semflg)
+LSM_HOOK(int, 0, sem_semctl, struct kern_ipc_perm *perm, int cmd)
+LSM_HOOK(int, 0, sem_semop, struct kern_ipc_perm *perm, struct sembuf *sops,
+	 unsigned nsops, int alter)
+LSM_HOOK(int, 0, netlink_send, struct sock *sk, struct sk_buff
*skb) 240 + LSM_HOOK(void, LSM_RET_VOID, d_instantiate, struct dentry *dentry, 241 + struct inode *inode) 242 + LSM_HOOK(int, -EINVAL, getprocattr, struct task_struct *p, char *name, 243 + char **value) 244 + LSM_HOOK(int, -EINVAL, setprocattr, const char *name, void *value, size_t size) 245 + LSM_HOOK(int, 0, ismaclabel, const char *name) 246 + LSM_HOOK(int, 0, secid_to_secctx, u32 secid, char **secdata, 247 + u32 *seclen) 248 + LSM_HOOK(int, 0, secctx_to_secid, const char *secdata, u32 seclen, u32 *secid) 249 + LSM_HOOK(void, LSM_RET_VOID, release_secctx, char *secdata, u32 seclen) 250 + LSM_HOOK(void, LSM_RET_VOID, inode_invalidate_secctx, struct inode *inode) 251 + LSM_HOOK(int, 0, inode_notifysecctx, struct inode *inode, void *ctx, u32 ctxlen) 252 + LSM_HOOK(int, 0, inode_setsecctx, struct dentry *dentry, void *ctx, u32 ctxlen) 253 + LSM_HOOK(int, 0, inode_getsecctx, struct inode *inode, void **ctx, 254 + u32 *ctxlen) 255 + 256 + #ifdef CONFIG_SECURITY_NETWORK 257 + LSM_HOOK(int, 0, unix_stream_connect, struct sock *sock, struct sock *other, 258 + struct sock *newsk) 259 + LSM_HOOK(int, 0, unix_may_send, struct socket *sock, struct socket *other) 260 + LSM_HOOK(int, 0, socket_create, int family, int type, int protocol, int kern) 261 + LSM_HOOK(int, 0, socket_post_create, struct socket *sock, int family, int type, 262 + int protocol, int kern) 263 + LSM_HOOK(int, 0, socket_socketpair, struct socket *socka, struct socket *sockb) 264 + LSM_HOOK(int, 0, socket_bind, struct socket *sock, struct sockaddr *address, 265 + int addrlen) 266 + LSM_HOOK(int, 0, socket_connect, struct socket *sock, struct sockaddr *address, 267 + int addrlen) 268 + LSM_HOOK(int, 0, socket_listen, struct socket *sock, int backlog) 269 + LSM_HOOK(int, 0, socket_accept, struct socket *sock, struct socket *newsock) 270 + LSM_HOOK(int, 0, socket_sendmsg, struct socket *sock, struct msghdr *msg, 271 + int size) 272 + LSM_HOOK(int, 0, socket_recvmsg, struct socket *sock, struct msghdr *msg, 273 + 
int size, int flags) 274 + LSM_HOOK(int, 0, socket_getsockname, struct socket *sock) 275 + LSM_HOOK(int, 0, socket_getpeername, struct socket *sock) 276 + LSM_HOOK(int, 0, socket_getsockopt, struct socket *sock, int level, int optname) 277 + LSM_HOOK(int, 0, socket_setsockopt, struct socket *sock, int level, int optname) 278 + LSM_HOOK(int, 0, socket_shutdown, struct socket *sock, int how) 279 + LSM_HOOK(int, 0, socket_sock_rcv_skb, struct sock *sk, struct sk_buff *skb) 280 + LSM_HOOK(int, 0, socket_getpeersec_stream, struct socket *sock, 281 + char __user *optval, int __user *optlen, unsigned len) 282 + LSM_HOOK(int, 0, socket_getpeersec_dgram, struct socket *sock, 283 + struct sk_buff *skb, u32 *secid) 284 + LSM_HOOK(int, 0, sk_alloc_security, struct sock *sk, int family, gfp_t priority) 285 + LSM_HOOK(void, LSM_RET_VOID, sk_free_security, struct sock *sk) 286 + LSM_HOOK(void, LSM_RET_VOID, sk_clone_security, const struct sock *sk, 287 + struct sock *newsk) 288 + LSM_HOOK(void, LSM_RET_VOID, sk_getsecid, struct sock *sk, u32 *secid) 289 + LSM_HOOK(void, LSM_RET_VOID, sock_graft, struct sock *sk, struct socket *parent) 290 + LSM_HOOK(int, 0, inet_conn_request, struct sock *sk, struct sk_buff *skb, 291 + struct request_sock *req) 292 + LSM_HOOK(void, LSM_RET_VOID, inet_csk_clone, struct sock *newsk, 293 + const struct request_sock *req) 294 + LSM_HOOK(void, LSM_RET_VOID, inet_conn_established, struct sock *sk, 295 + struct sk_buff *skb) 296 + LSM_HOOK(int, 0, secmark_relabel_packet, u32 secid) 297 + LSM_HOOK(void, LSM_RET_VOID, secmark_refcount_inc, void) 298 + LSM_HOOK(void, LSM_RET_VOID, secmark_refcount_dec, void) 299 + LSM_HOOK(void, LSM_RET_VOID, req_classify_flow, const struct request_sock *req, 300 + struct flowi *fl) 301 + LSM_HOOK(int, 0, tun_dev_alloc_security, void **security) 302 + LSM_HOOK(void, LSM_RET_VOID, tun_dev_free_security, void *security) 303 + LSM_HOOK(int, 0, tun_dev_create, void) 304 + LSM_HOOK(int, 0, tun_dev_attach_queue, void *security) 
305 + LSM_HOOK(int, 0, tun_dev_attach, struct sock *sk, void *security) 306 + LSM_HOOK(int, 0, tun_dev_open, void *security) 307 + LSM_HOOK(int, 0, sctp_assoc_request, struct sctp_endpoint *ep, 308 + struct sk_buff *skb) 309 + LSM_HOOK(int, 0, sctp_bind_connect, struct sock *sk, int optname, 310 + struct sockaddr *address, int addrlen) 311 + LSM_HOOK(void, LSM_RET_VOID, sctp_sk_clone, struct sctp_endpoint *ep, 312 + struct sock *sk, struct sock *newsk) 313 + #endif /* CONFIG_SECURITY_NETWORK */ 314 + 315 + #ifdef CONFIG_SECURITY_INFINIBAND 316 + LSM_HOOK(int, 0, ib_pkey_access, void *sec, u64 subnet_prefix, u16 pkey) 317 + LSM_HOOK(int, 0, ib_endport_manage_subnet, void *sec, const char *dev_name, 318 + u8 port_num) 319 + LSM_HOOK(int, 0, ib_alloc_security, void **sec) 320 + LSM_HOOK(void, LSM_RET_VOID, ib_free_security, void *sec) 321 + #endif /* CONFIG_SECURITY_INFINIBAND */ 322 + 323 + #ifdef CONFIG_SECURITY_NETWORK_XFRM 324 + LSM_HOOK(int, 0, xfrm_policy_alloc_security, struct xfrm_sec_ctx **ctxp, 325 + struct xfrm_user_sec_ctx *sec_ctx, gfp_t gfp) 326 + LSM_HOOK(int, 0, xfrm_policy_clone_security, struct xfrm_sec_ctx *old_ctx, 327 + struct xfrm_sec_ctx **new_ctx) 328 + LSM_HOOK(void, LSM_RET_VOID, xfrm_policy_free_security, 329 + struct xfrm_sec_ctx *ctx) 330 + LSM_HOOK(int, 0, xfrm_policy_delete_security, struct xfrm_sec_ctx *ctx) 331 + LSM_HOOK(int, 0, xfrm_state_alloc, struct xfrm_state *x, 332 + struct xfrm_user_sec_ctx *sec_ctx) 333 + LSM_HOOK(int, 0, xfrm_state_alloc_acquire, struct xfrm_state *x, 334 + struct xfrm_sec_ctx *polsec, u32 secid) 335 + LSM_HOOK(void, LSM_RET_VOID, xfrm_state_free_security, struct xfrm_state *x) 336 + LSM_HOOK(int, 0, xfrm_state_delete_security, struct xfrm_state *x) 337 + LSM_HOOK(int, 0, xfrm_policy_lookup, struct xfrm_sec_ctx *ctx, u32 fl_secid, 338 + u8 dir) 339 + LSM_HOOK(int, 1, xfrm_state_pol_flow_match, struct xfrm_state *x, 340 + struct xfrm_policy *xp, const struct flowi *fl) 341 + LSM_HOOK(int, 0, 
xfrm_decode_session, struct sk_buff *skb, u32 *secid, 342 + int ckall) 343 + #endif /* CONFIG_SECURITY_NETWORK_XFRM */ 344 + 345 + /* key management security hooks */ 346 + #ifdef CONFIG_KEYS 347 + LSM_HOOK(int, 0, key_alloc, struct key *key, const struct cred *cred, 348 + unsigned long flags) 349 + LSM_HOOK(void, LSM_RET_VOID, key_free, struct key *key) 350 + LSM_HOOK(int, 0, key_permission, key_ref_t key_ref, const struct cred *cred, 351 + unsigned perm) 352 + LSM_HOOK(int, 0, key_getsecurity, struct key *key, char **_buffer) 353 + #endif /* CONFIG_KEYS */ 354 + 355 + #ifdef CONFIG_AUDIT 356 + LSM_HOOK(int, 0, audit_rule_init, u32 field, u32 op, char *rulestr, 357 + void **lsmrule) 358 + LSM_HOOK(int, 0, audit_rule_known, struct audit_krule *krule) 359 + LSM_HOOK(int, 0, audit_rule_match, u32 secid, u32 field, u32 op, void *lsmrule) 360 + LSM_HOOK(void, LSM_RET_VOID, audit_rule_free, void *lsmrule) 361 + #endif /* CONFIG_AUDIT */ 362 + 363 + #ifdef CONFIG_BPF_SYSCALL 364 + LSM_HOOK(int, 0, bpf, int cmd, union bpf_attr *attr, unsigned int size) 365 + LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode) 366 + LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog) 367 + LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map) 368 + LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map) 369 + LSM_HOOK(int, 0, bpf_prog_alloc_security, struct bpf_prog_aux *aux) 370 + LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free_security, struct bpf_prog_aux *aux) 371 + #endif /* CONFIG_BPF_SYSCALL */ 372 + 373 + LSM_HOOK(int, 0, locked_down, enum lockdown_reason what) 374 + 375 + #ifdef CONFIG_PERF_EVENTS 376 + LSM_HOOK(int, 0, perf_event_open, struct perf_event_attr *attr, int type) 377 + LSM_HOOK(int, 0, perf_event_alloc, struct perf_event *event) 378 + LSM_HOOK(void, LSM_RET_VOID, perf_event_free, struct perf_event *event) 379 + LSM_HOOK(int, 0, perf_event_read, struct perf_event *event) 380 + LSM_HOOK(int, 0, perf_event_write, struct perf_event *event) 381 
+ #endif /* CONFIG_PERF_EVENTS */
+12 -616
include/linux/lsm_hooks.h
···
1456 1456 	 * @what: kernel feature being accessed
1457 1457 	 */
1458 1458 union security_list_options {
1459 - 	int (*binder_set_context_mgr)(struct task_struct *mgr);
1460 - 	int (*binder_transaction)(struct task_struct *from,
1461 - 				struct task_struct *to);
1462 - 	int (*binder_transfer_binder)(struct task_struct *from,
1463 - 				struct task_struct *to);
1464 - 	int (*binder_transfer_file)(struct task_struct *from,
1465 - 				struct task_struct *to,
1466 - 				struct file *file);
1467 - 
1468 - 	int (*ptrace_access_check)(struct task_struct *child,
1469 - 				unsigned int mode);
1470 - 	int (*ptrace_traceme)(struct task_struct *parent);
1471 - 	int (*capget)(struct task_struct *target, kernel_cap_t *effective,
1472 - 			kernel_cap_t *inheritable, kernel_cap_t *permitted);
1473 - 	int (*capset)(struct cred *new, const struct cred *old,
1474 - 			const kernel_cap_t *effective,
1475 - 			const kernel_cap_t *inheritable,
1476 - 			const kernel_cap_t *permitted);
1477 - 	int (*capable)(const struct cred *cred,
1478 - 			struct user_namespace *ns,
1479 - 			int cap,
1480 - 			unsigned int opts);
1481 - 	int (*quotactl)(int cmds, int type, int id, struct super_block *sb);
1482 - 	int (*quota_on)(struct dentry *dentry);
1483 - 	int (*syslog)(int type);
1484 - 	int (*settime)(const struct timespec64 *ts, const struct timezone *tz);
1485 - 	int (*vm_enough_memory)(struct mm_struct *mm, long pages);
1486 - 
1487 - 	int (*bprm_set_creds)(struct linux_binprm *bprm);
1488 - 	int (*bprm_check_security)(struct linux_binprm *bprm);
1489 - 	void (*bprm_committing_creds)(struct linux_binprm *bprm);
1490 - 	void (*bprm_committed_creds)(struct linux_binprm *bprm);
1491 - 
1492 - 	int (*fs_context_dup)(struct fs_context *fc, struct fs_context *src_sc);
1493 - 	int (*fs_context_parse_param)(struct fs_context *fc, struct fs_parameter *param);
1494 - 
1495 - 	int (*sb_alloc_security)(struct super_block *sb);
1496 - 	void (*sb_free_security)(struct super_block *sb);
1497 - 	void (*sb_free_mnt_opts)(void *mnt_opts);
1498 - 	int (*sb_eat_lsm_opts)(char *orig, void **mnt_opts);
1499 - 	int (*sb_remount)(struct super_block *sb, void *mnt_opts);
1500 - 	int (*sb_kern_mount)(struct super_block *sb);
1501 - 	int (*sb_show_options)(struct seq_file *m, struct super_block *sb);
1502 - 	int (*sb_statfs)(struct dentry *dentry);
1503 - 	int (*sb_mount)(const char *dev_name, const struct path *path,
1504 - 			const char *type, unsigned long flags, void *data);
1505 - 	int (*sb_umount)(struct vfsmount *mnt, int flags);
1506 - 	int (*sb_pivotroot)(const struct path *old_path, const struct path *new_path);
1507 - 	int (*sb_set_mnt_opts)(struct super_block *sb,
1508 - 				void *mnt_opts,
1509 - 				unsigned long kern_flags,
1510 - 				unsigned long *set_kern_flags);
1511 - 	int (*sb_clone_mnt_opts)(const struct super_block *oldsb,
1512 - 				struct super_block *newsb,
1513 - 				unsigned long kern_flags,
1514 - 				unsigned long *set_kern_flags);
1515 - 	int (*sb_add_mnt_opt)(const char *option, const char *val, int len,
1516 - 				void **mnt_opts);
1517 - 	int (*move_mount)(const struct path *from_path, const struct path *to_path);
1518 - 	int (*dentry_init_security)(struct dentry *dentry, int mode,
1519 - 				const struct qstr *name, void **ctx,
1520 - 				u32 *ctxlen);
1521 - 	int (*dentry_create_files_as)(struct dentry *dentry, int mode,
1522 - 				struct qstr *name,
1523 - 				const struct cred *old,
1524 - 				struct cred *new);
1525 - 
1526 - 
1527 - #ifdef CONFIG_SECURITY_PATH
1528 - 	int (*path_unlink)(const struct path *dir, struct dentry *dentry);
1529 - 	int (*path_mkdir)(const struct path *dir, struct dentry *dentry,
1530 - 			umode_t mode);
1531 - 	int (*path_rmdir)(const struct path *dir, struct dentry *dentry);
1532 - 	int (*path_mknod)(const struct path *dir, struct dentry *dentry,
1533 - 			umode_t mode, unsigned int dev);
1534 - 	int (*path_truncate)(const struct path *path);
1535 - 	int (*path_symlink)(const struct path *dir, struct dentry *dentry,
1536 - 			const char *old_name);
1537 - 	int (*path_link)(struct dentry *old_dentry, const struct path *new_dir,
1538 - 			struct dentry *new_dentry);
1539 - 	int (*path_rename)(const struct path *old_dir, struct dentry *old_dentry,
1540 - 			const struct path *new_dir,
1541 - 			struct dentry *new_dentry);
1542 - 	int (*path_chmod)(const struct path *path, umode_t mode);
1543 - 	int (*path_chown)(const struct path *path, kuid_t uid, kgid_t gid);
1544 - 	int (*path_chroot)(const struct path *path);
1545 - #endif
1546 - 	/* Needed for inode based security check */
1547 - 	int (*path_notify)(const struct path *path, u64 mask,
1548 - 			unsigned int obj_type);
1549 - 	int (*inode_alloc_security)(struct inode *inode);
1550 - 	void (*inode_free_security)(struct inode *inode);
1551 - 	int (*inode_init_security)(struct inode *inode, struct inode *dir,
1552 - 				const struct qstr *qstr,
1553 - 				const char **name, void **value,
1554 - 				size_t *len);
1555 - 	int (*inode_create)(struct inode *dir, struct dentry *dentry,
1556 - 			umode_t mode);
1557 - 	int (*inode_link)(struct dentry *old_dentry, struct inode *dir,
1558 - 			struct dentry *new_dentry);
1559 - 	int (*inode_unlink)(struct inode *dir, struct dentry *dentry);
1560 - 	int (*inode_symlink)(struct inode *dir, struct dentry *dentry,
1561 - 			const char *old_name);
1562 - 	int (*inode_mkdir)(struct inode *dir, struct dentry *dentry,
1563 - 			umode_t mode);
1564 - 	int (*inode_rmdir)(struct inode *dir, struct dentry *dentry);
1565 - 	int (*inode_mknod)(struct inode *dir, struct dentry *dentry,
1566 - 			umode_t mode, dev_t dev);
1567 - 	int (*inode_rename)(struct inode *old_dir, struct dentry *old_dentry,
1568 - 			struct inode *new_dir,
1569 - 			struct dentry *new_dentry);
1570 - 	int (*inode_readlink)(struct dentry *dentry);
1571 - 	int (*inode_follow_link)(struct dentry *dentry, struct inode *inode,
1572 - 				bool rcu);
1573 - 	int (*inode_permission)(struct inode *inode, int mask);
1574 - 	int (*inode_setattr)(struct dentry *dentry, struct iattr *attr);
1575 - 	int (*inode_getattr)(const struct path *path);
1576 - 	int (*inode_setxattr)(struct dentry *dentry, const char *name,
1577 - 			const void *value, size_t size, int flags);
1578 - 	void (*inode_post_setxattr)(struct dentry *dentry, const char *name,
1579 - 				const void *value, size_t size,
1580 - 				int flags);
1581 - 	int (*inode_getxattr)(struct dentry *dentry, const char *name);
1582 - 	int (*inode_listxattr)(struct dentry *dentry);
1583 - 	int (*inode_removexattr)(struct dentry *dentry, const char *name);
1584 - 	int (*inode_need_killpriv)(struct dentry *dentry);
1585 - 	int (*inode_killpriv)(struct dentry *dentry);
1586 - 	int (*inode_getsecurity)(struct inode *inode, const char *name,
1587 - 				void **buffer, bool alloc);
1588 - 	int (*inode_setsecurity)(struct inode *inode, const char *name,
1589 - 				const void *value, size_t size,
1590 - 				int flags);
1591 - 	int (*inode_listsecurity)(struct inode *inode, char *buffer,
1592 - 				size_t buffer_size);
1593 - 	void (*inode_getsecid)(struct inode *inode, u32 *secid);
1594 - 	int (*inode_copy_up)(struct dentry *src, struct cred **new);
1595 - 	int (*inode_copy_up_xattr)(const char *name);
1596 - 
1597 - 	int (*kernfs_init_security)(struct kernfs_node *kn_dir,
1598 - 				struct kernfs_node *kn);
1599 - 
1600 - 	int (*file_permission)(struct file *file, int mask);
1601 - 	int (*file_alloc_security)(struct file *file);
1602 - 	void (*file_free_security)(struct file *file);
1603 - 	int (*file_ioctl)(struct file *file, unsigned int cmd,
1604 - 			unsigned long arg);
1605 - 	int (*mmap_addr)(unsigned long addr);
1606 - 	int (*mmap_file)(struct file *file, unsigned long reqprot,
1607 - 			unsigned long prot, unsigned long flags);
1608 - 	int (*file_mprotect)(struct vm_area_struct *vma, unsigned long reqprot,
1609 - 			unsigned long prot);
1610 - 	int (*file_lock)(struct file *file, unsigned int cmd);
1611 - 	int (*file_fcntl)(struct file *file, unsigned int cmd,
1612 - 			unsigned long arg);
1613 - 	void (*file_set_fowner)(struct file *file);
1614 - 	int (*file_send_sigiotask)(struct task_struct *tsk,
1615 - 				struct fown_struct *fown, int sig);
1616 - 	int (*file_receive)(struct file *file);
1617 - 	int (*file_open)(struct file *file);
1618 - 
1619 - 	int (*task_alloc)(struct task_struct *task, unsigned long clone_flags);
1620 - 	void (*task_free)(struct task_struct *task);
1621 - 	int (*cred_alloc_blank)(struct cred *cred, gfp_t gfp);
1622 - 	void (*cred_free)(struct cred *cred);
1623 - 	int (*cred_prepare)(struct cred *new, const struct cred *old,
1624 - 			gfp_t gfp);
1625 - 	void (*cred_transfer)(struct cred *new, const struct cred *old);
1626 - 	void (*cred_getsecid)(const struct cred *c, u32 *secid);
1627 - 	int (*kernel_act_as)(struct cred *new, u32 secid);
1628 - 	int (*kernel_create_files_as)(struct cred *new, struct inode *inode);
1629 - 	int (*kernel_module_request)(char *kmod_name);
1630 - 	int (*kernel_load_data)(enum kernel_load_data_id id);
1631 - 	int (*kernel_read_file)(struct file *file, enum kernel_read_file_id id);
1632 - 	int (*kernel_post_read_file)(struct file *file, char *buf, loff_t size,
1633 - 				enum kernel_read_file_id id);
1634 - 	int (*task_fix_setuid)(struct cred *new, const struct cred *old,
1635 - 			int flags);
1636 - 	int (*task_setpgid)(struct task_struct *p, pid_t pgid);
1637 - 	int (*task_getpgid)(struct task_struct *p);
1638 - 	int (*task_getsid)(struct task_struct *p);
1639 - 	void (*task_getsecid)(struct task_struct *p, u32 *secid);
1640 - 	int (*task_setnice)(struct task_struct *p, int nice);
1641 - 	int (*task_setioprio)(struct task_struct *p, int ioprio);
1642 - 	int (*task_getioprio)(struct task_struct *p);
1643 - 	int (*task_prlimit)(const struct cred *cred, const struct cred *tcred,
1644 - 			unsigned int flags);
1645 - 	int (*task_setrlimit)(struct task_struct *p, unsigned int resource,
1646 - 			struct rlimit *new_rlim);
1647 - 	int (*task_setscheduler)(struct task_struct *p);
1648 - 	int (*task_getscheduler)(struct task_struct *p);
1649 - 	int (*task_movememory)(struct task_struct *p);
1650 - 	int (*task_kill)(struct task_struct *p, struct kernel_siginfo *info,
1651 - 			int sig, const struct cred *cred);
1652 - 	int (*task_prctl)(int option, unsigned long arg2, unsigned long arg3,
1653 - 			unsigned long arg4, unsigned long arg5);
1654 - 	void (*task_to_inode)(struct task_struct *p, struct inode *inode);
1655 - 
1656 - 	int (*ipc_permission)(struct kern_ipc_perm *ipcp, short flag);
1657 - 	void (*ipc_getsecid)(struct kern_ipc_perm *ipcp, u32 *secid);
1658 - 
1659 - 	int (*msg_msg_alloc_security)(struct msg_msg *msg);
1660 - 	void (*msg_msg_free_security)(struct msg_msg *msg);
1661 - 
1662 - 	int (*msg_queue_alloc_security)(struct kern_ipc_perm *perm);
1663 - 	void (*msg_queue_free_security)(struct kern_ipc_perm *perm);
1664 - 	int (*msg_queue_associate)(struct kern_ipc_perm *perm, int msqflg);
1665 - 	int (*msg_queue_msgctl)(struct kern_ipc_perm *perm, int cmd);
1666 - 	int (*msg_queue_msgsnd)(struct kern_ipc_perm *perm, struct msg_msg *msg,
1667 - 				int msqflg);
1668 - 	int (*msg_queue_msgrcv)(struct kern_ipc_perm *perm, struct msg_msg *msg,
1669 - 				struct task_struct *target, long type,
1670 - 				int mode);
1671 - 
1672 - 	int (*shm_alloc_security)(struct kern_ipc_perm *perm);
1673 - 	void (*shm_free_security)(struct kern_ipc_perm *perm);
1674 - 	int (*shm_associate)(struct kern_ipc_perm *perm, int shmflg);
1675 - 	int (*shm_shmctl)(struct kern_ipc_perm *perm, int cmd);
1676 - 	int (*shm_shmat)(struct kern_ipc_perm *perm, char __user *shmaddr,
1677 - 			int shmflg);
1678 - 
1679 - 	int (*sem_alloc_security)(struct kern_ipc_perm *perm);
1680 - 	void (*sem_free_security)(struct kern_ipc_perm *perm);
1681 - 	int (*sem_associate)(struct kern_ipc_perm *perm, int semflg);
1682 - 	int (*sem_semctl)(struct kern_ipc_perm *perm, int cmd);
1683 - 	int (*sem_semop)(struct kern_ipc_perm *perm, struct sembuf *sops,
1684 - 			unsigned nsops, int alter);
1685 - 
1686 - 	int (*netlink_send)(struct sock *sk, struct sk_buff *skb);
1687 - 
1688 - 	void (*d_instantiate)(struct dentry *dentry, struct inode *inode);
1689 - 
1690 - 	int (*getprocattr)(struct task_struct *p, char *name, char **value);
1691 - 	int (*setprocattr)(const char *name, void *value, size_t size);
1692 - 	int (*ismaclabel)(const char *name);
1693 - 	int (*secid_to_secctx)(u32 secid, char **secdata, u32 *seclen);
1694 - 	int (*secctx_to_secid)(const char *secdata, u32 seclen, u32 *secid);
1695 - 	void (*release_secctx)(char *secdata, u32 seclen);
1696 - 
1697 - 	void (*inode_invalidate_secctx)(struct inode *inode);
1698 - 	int (*inode_notifysecctx)(struct inode *inode, void *ctx, u32 ctxlen);
1699 - 	int (*inode_setsecctx)(struct dentry *dentry, void *ctx, u32 ctxlen);
1700 - 	int (*inode_getsecctx)(struct inode *inode, void **ctx, u32 *ctxlen);
1701 - 
1702 - #ifdef CONFIG_SECURITY_NETWORK
1703 - 	int (*unix_stream_connect)(struct sock *sock, struct sock *other,
1704 - 				struct sock *newsk);
1705 - 	int (*unix_may_send)(struct socket *sock, struct socket *other);
1706 - 
1707 - 	int (*socket_create)(int family, int type, int protocol, int kern);
1708 - 	int (*socket_post_create)(struct socket *sock, int family, int type,
1709 - 				int protocol, int kern);
1710 - 	int (*socket_socketpair)(struct socket *socka, struct socket *sockb);
1711 - 	int (*socket_bind)(struct socket *sock, struct sockaddr *address,
1712 - 			int addrlen);
1713 - 	int (*socket_connect)(struct socket *sock, struct sockaddr *address,
1714 - 			int addrlen);
1715 - 	int (*socket_listen)(struct socket *sock, int backlog);
1716 - 	int (*socket_accept)(struct socket *sock, struct socket *newsock);
1717 - 	int (*socket_sendmsg)(struct socket *sock, struct msghdr *msg,
1718 - 			int size);
1719 - 	int (*socket_recvmsg)(struct socket *sock, struct msghdr *msg,
1720 - 			int size, int flags);
1721 - 	int (*socket_getsockname)(struct socket *sock);
1722 - 	int (*socket_getpeername)(struct socket *sock);
1723 - 	int (*socket_getsockopt)(struct socket *sock, int level, int optname);
1724 - 	int (*socket_setsockopt)(struct socket *sock, int level, int optname);
1725 - 	int (*socket_shutdown)(struct socket *sock, int how);
1726 - 	int (*socket_sock_rcv_skb)(struct sock *sk, struct sk_buff *skb);
1727 - 	int (*socket_getpeersec_stream)(struct socket *sock,
1728 - 				char __user *optval,
1729 - 				int __user *optlen, unsigned len);
1730 - 	int (*socket_getpeersec_dgram)(struct socket *sock,
1731 - 				struct sk_buff *skb, u32 *secid);
1732 - 	int (*sk_alloc_security)(struct sock *sk, int family, gfp_t priority);
1733 - 	void (*sk_free_security)(struct sock *sk);
1734 - 	void (*sk_clone_security)(const struct sock *sk, struct sock *newsk);
1735 - 	void (*sk_getsecid)(struct sock *sk, u32 *secid);
1736 - 	void (*sock_graft)(struct sock *sk, struct socket *parent);
1737 - 	int (*inet_conn_request)(struct sock *sk, struct sk_buff *skb,
1738 - 				struct request_sock *req);
1739 - 	void (*inet_csk_clone)(struct sock *newsk,
1740 - 				const struct request_sock *req);
1741 - 	void (*inet_conn_established)(struct sock *sk, struct sk_buff *skb);
1742 - 	int (*secmark_relabel_packet)(u32 secid);
1743 - 	void (*secmark_refcount_inc)(void);
1744 - 	void (*secmark_refcount_dec)(void);
1745 - 	void (*req_classify_flow)(const struct request_sock *req,
1746 - 				struct flowi *fl);
1747 - 	int (*tun_dev_alloc_security)(void **security);
1748 - 	void (*tun_dev_free_security)(void *security);
1749 - 	int (*tun_dev_create)(void);
1750 - 	int (*tun_dev_attach_queue)(void *security);
1751 - 	int (*tun_dev_attach)(struct sock *sk, void *security);
1752 - 	int (*tun_dev_open)(void *security);
1753 - 	int (*sctp_assoc_request)(struct sctp_endpoint *ep,
1754 - 				struct sk_buff *skb);
1755 - 	int (*sctp_bind_connect)(struct sock *sk, int optname,
1756 - 				struct sockaddr *address, int addrlen);
1757 - 	void (*sctp_sk_clone)(struct sctp_endpoint *ep, struct sock *sk,
1758 - 			struct sock *newsk);
1759 - #endif	/* CONFIG_SECURITY_NETWORK */
1760 - 
1761 - #ifdef CONFIG_SECURITY_INFINIBAND
1762 - 	int (*ib_pkey_access)(void *sec, u64 subnet_prefix, u16 pkey);
1763 - 	int (*ib_endport_manage_subnet)(void *sec, const char *dev_name,
1764 - 				u8 port_num);
1765 - 	int (*ib_alloc_security)(void **sec);
1766 - 	void (*ib_free_security)(void *sec);
1767 - #endif	/* CONFIG_SECURITY_INFINIBAND */
1768 - 
1769 - #ifdef CONFIG_SECURITY_NETWORK_XFRM
1770 - 	int (*xfrm_policy_alloc_security)(struct xfrm_sec_ctx **ctxp,
1771 - 				struct xfrm_user_sec_ctx *sec_ctx,
1772 - 				gfp_t gfp);
1773 - 	int (*xfrm_policy_clone_security)(struct xfrm_sec_ctx *old_ctx,
1774 - 				struct xfrm_sec_ctx **new_ctx);
1775 - 	void (*xfrm_policy_free_security)(struct xfrm_sec_ctx *ctx);
1776 - 	int (*xfrm_policy_delete_security)(struct xfrm_sec_ctx *ctx);
1777 - 	int (*xfrm_state_alloc)(struct xfrm_state *x,
1778 - 				struct xfrm_user_sec_ctx *sec_ctx);
1779 - 	int (*xfrm_state_alloc_acquire)(struct xfrm_state *x,
1780 - 				struct xfrm_sec_ctx *polsec,
1781 - 				u32 secid);
1782 - 	void (*xfrm_state_free_security)(struct xfrm_state *x);
1783 - 	int (*xfrm_state_delete_security)(struct xfrm_state *x);
1784 - 	int (*xfrm_policy_lookup)(struct xfrm_sec_ctx *ctx, u32 fl_secid,
1785 - 			u8 dir);
1786 - 	int (*xfrm_state_pol_flow_match)(struct xfrm_state *x,
1787 - 				struct xfrm_policy *xp,
1788 - 				const struct flowi *fl);
1789 - 	int (*xfrm_decode_session)(struct sk_buff *skb, u32 *secid, int ckall);
1790 - #endif	/* CONFIG_SECURITY_NETWORK_XFRM */
1791 - 
1792 - 	/* key management security hooks */
1793 - #ifdef CONFIG_KEYS
1794 - 	int (*key_alloc)(struct key *key, const struct cred *cred,
1795 - 			unsigned long flags);
1796 - 	void (*key_free)(struct key *key);
1797 - 	int (*key_permission)(key_ref_t key_ref, const struct cred *cred,
1798 - 			unsigned perm);
1799 - 	int (*key_getsecurity)(struct key *key, char **_buffer);
1800 - #endif	/* CONFIG_KEYS */
1801 - 
1802 - #ifdef CONFIG_AUDIT
1803 - 	int (*audit_rule_init)(u32 field, u32 op, char *rulestr,
1804 - 			void **lsmrule);
1805 - 	int (*audit_rule_known)(struct audit_krule *krule);
1806 - 	int (*audit_rule_match)(u32 secid, u32 field, u32 op, void *lsmrule);
1807 - 	void (*audit_rule_free)(void *lsmrule);
1808 - #endif	/* CONFIG_AUDIT */
1809 - 
1810 - #ifdef CONFIG_BPF_SYSCALL
1811 - 	int (*bpf)(int cmd, union bpf_attr *attr,
1812 - 			unsigned int size);
1813 - 	int (*bpf_map)(struct bpf_map *map, fmode_t fmode);
1814 - 	int (*bpf_prog)(struct bpf_prog *prog);
1815 - 	int (*bpf_map_alloc_security)(struct bpf_map *map);
1816 - 	void (*bpf_map_free_security)(struct bpf_map *map);
1817 - 	int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
1818 - 	void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
1819 - #endif /* CONFIG_BPF_SYSCALL */
1820 - 	int (*locked_down)(enum lockdown_reason what);
1821 - #ifdef CONFIG_PERF_EVENTS
1822 - 	int (*perf_event_open)(struct perf_event_attr *attr, int type);
1823 - 	int (*perf_event_alloc)(struct perf_event *event);
1824 - 	void (*perf_event_free)(struct perf_event *event);
1825 - 	int (*perf_event_read)(struct perf_event *event);
1826 - 	int (*perf_event_write)(struct perf_event *event);
1827 - 
1828 - #endif
1459 + 	#define LSM_HOOK(RET, DEFAULT, NAME, ...) RET (*NAME)(__VA_ARGS__);
1460 + 	#include "lsm_hook_defs.h"
1461 + 	#undef LSM_HOOK
1829 1462 };
1830 1463 
1831 1464 struct security_hook_heads {
1832 - 	struct hlist_head binder_set_context_mgr;
1833 - 	struct hlist_head binder_transaction;
1834 - 	struct hlist_head binder_transfer_binder;
1835 - 	struct hlist_head binder_transfer_file;
1836 - 	struct hlist_head ptrace_access_check;
1837 - 	struct hlist_head ptrace_traceme;
1838 - 	struct hlist_head capget;
1839 - 	struct hlist_head capset;
1840 - 	struct hlist_head capable;
1841 - 	struct hlist_head quotactl;
1842 - 	struct hlist_head quota_on;
1843 - 	struct hlist_head syslog;
1844 - 	struct hlist_head settime;
1845 - 	struct hlist_head vm_enough_memory;
1846 - 	struct hlist_head bprm_set_creds;
1847 - 	struct hlist_head bprm_check_security;
1848 - 	struct hlist_head bprm_committing_creds;
1849 - 	struct hlist_head bprm_committed_creds;
1850 - 	struct hlist_head fs_context_dup;
1851 - 	struct hlist_head fs_context_parse_param;
1852 - 	struct hlist_head sb_alloc_security;
1853 - 	struct hlist_head sb_free_security;
1854 - 	struct hlist_head sb_free_mnt_opts;
1855 - 	struct hlist_head sb_eat_lsm_opts;
1856 - 	struct hlist_head sb_remount;
1857 - 	struct hlist_head sb_kern_mount;
1858 - 	struct hlist_head sb_show_options;
1859 - 	struct hlist_head sb_statfs;
1860 - 	struct hlist_head sb_mount;
1861 - 	struct hlist_head sb_umount;
1862 - 	struct hlist_head sb_pivotroot;
1863 - 	struct hlist_head sb_set_mnt_opts;
1864 - 	struct hlist_head sb_clone_mnt_opts;
1865 - 	struct hlist_head sb_add_mnt_opt;
1866 - 	struct hlist_head move_mount;
1867 - 	struct hlist_head dentry_init_security;
1868 - 	struct hlist_head dentry_create_files_as;
1869 - #ifdef CONFIG_SECURITY_PATH
1870 - 	struct hlist_head path_unlink;
1871 - 	struct hlist_head path_mkdir;
1872 - 	struct hlist_head path_rmdir;
1873 - 	struct hlist_head path_mknod;
1874 - 	struct hlist_head path_truncate;
1875 - 	struct hlist_head path_symlink;
1876 - 	struct hlist_head path_link;
1877 - 	struct hlist_head path_rename;
1878 - 	struct hlist_head path_chmod;
1879 - 	struct hlist_head path_chown;
1880 - 	struct hlist_head path_chroot;
1881 - #endif
1882 - 	/* Needed for inode based modules as well */
1883 - 	struct hlist_head path_notify;
1884 - 	struct hlist_head inode_alloc_security;
1885 - 	struct hlist_head inode_free_security;
1886 - 	struct hlist_head inode_init_security;
1887 - 	struct hlist_head inode_create;
1888 - 	struct hlist_head inode_link;
1889 - 	struct hlist_head inode_unlink;
1890 - 	struct hlist_head inode_symlink;
1891 - 	struct hlist_head inode_mkdir;
1892 - 	struct hlist_head inode_rmdir;
1893 - 	struct hlist_head inode_mknod;
1894 - 	struct hlist_head inode_rename;
1895 - 	struct hlist_head inode_readlink;
1896 - 	struct hlist_head inode_follow_link;
1897 - 	struct hlist_head inode_permission;
1898 - 	struct hlist_head inode_setattr;
1899 - 	struct hlist_head inode_getattr;
1900 - 	struct hlist_head inode_setxattr;
1901 - 	struct hlist_head inode_post_setxattr;
1902 - 	struct hlist_head inode_getxattr;
1903 - 	struct hlist_head inode_listxattr;
1904 - 	struct hlist_head inode_removexattr;
1905 - 	struct hlist_head inode_need_killpriv;
1906 - 	struct hlist_head inode_killpriv;
1907 - 	struct hlist_head inode_getsecurity;
1908 - 	struct hlist_head inode_setsecurity;
1909 - 	struct hlist_head inode_listsecurity;
1910 - 	struct hlist_head inode_getsecid;
1911 - 	struct hlist_head inode_copy_up;
1912 - 	struct hlist_head inode_copy_up_xattr;
1913 - 	struct hlist_head kernfs_init_security;
1914 - 	struct hlist_head file_permission;
1915 - 	struct hlist_head file_alloc_security;
1916 - 	struct hlist_head file_free_security;
1917 - 	struct hlist_head file_ioctl;
1918 - 	struct hlist_head mmap_addr;
1919 - 	struct hlist_head mmap_file;
1920 - 	struct hlist_head file_mprotect;
1921 - 	struct hlist_head file_lock;
1922 - 	struct hlist_head file_fcntl;
1923 - 	struct hlist_head file_set_fowner;
1924 - 	struct hlist_head file_send_sigiotask;
1925 - 	struct hlist_head file_receive;
1926 - 	struct hlist_head file_open;
1927 - 	struct hlist_head task_alloc;
1928 - 	struct hlist_head task_free;
1929 - 	struct hlist_head cred_alloc_blank;
1930 - 	struct hlist_head cred_free;
1931 - 	struct hlist_head cred_prepare;
1932 - 	struct hlist_head cred_transfer;
1933 - 	struct hlist_head cred_getsecid;
1934 - 	struct hlist_head kernel_act_as;
1935 - 	struct hlist_head kernel_create_files_as;
1936 - 	struct hlist_head kernel_load_data;
1937 - 	struct hlist_head kernel_read_file;
1938 - 	struct hlist_head kernel_post_read_file;
1939 - 	struct hlist_head kernel_module_request;
1940 - 	struct hlist_head task_fix_setuid;
1941 - 	struct hlist_head task_setpgid;
1942 - 	struct hlist_head task_getpgid;
1943 - 	struct hlist_head task_getsid;
1944 - 	struct hlist_head task_getsecid;
1945 - 	struct hlist_head task_setnice;
1946 - 	struct hlist_head task_setioprio;
1947 - 	struct hlist_head task_getioprio;
1948 - 	struct hlist_head task_prlimit;
1949 - 	struct hlist_head task_setrlimit;
1950 - 	struct hlist_head task_setscheduler;
1951 - 	struct hlist_head task_getscheduler;
1952 - 	struct hlist_head task_movememory;
1953 - 	struct hlist_head task_kill;
1954 - 	struct hlist_head task_prctl;
1955 - 	struct hlist_head
task_to_inode; 1956 - struct hlist_head ipc_permission; 1957 - struct hlist_head ipc_getsecid; 1958 - struct hlist_head msg_msg_alloc_security; 1959 - struct hlist_head msg_msg_free_security; 1960 - struct hlist_head msg_queue_alloc_security; 1961 - struct hlist_head msg_queue_free_security; 1962 - struct hlist_head msg_queue_associate; 1963 - struct hlist_head msg_queue_msgctl; 1964 - struct hlist_head msg_queue_msgsnd; 1965 - struct hlist_head msg_queue_msgrcv; 1966 - struct hlist_head shm_alloc_security; 1967 - struct hlist_head shm_free_security; 1968 - struct hlist_head shm_associate; 1969 - struct hlist_head shm_shmctl; 1970 - struct hlist_head shm_shmat; 1971 - struct hlist_head sem_alloc_security; 1972 - struct hlist_head sem_free_security; 1973 - struct hlist_head sem_associate; 1974 - struct hlist_head sem_semctl; 1975 - struct hlist_head sem_semop; 1976 - struct hlist_head netlink_send; 1977 - struct hlist_head d_instantiate; 1978 - struct hlist_head getprocattr; 1979 - struct hlist_head setprocattr; 1980 - struct hlist_head ismaclabel; 1981 - struct hlist_head secid_to_secctx; 1982 - struct hlist_head secctx_to_secid; 1983 - struct hlist_head release_secctx; 1984 - struct hlist_head inode_invalidate_secctx; 1985 - struct hlist_head inode_notifysecctx; 1986 - struct hlist_head inode_setsecctx; 1987 - struct hlist_head inode_getsecctx; 1988 - #ifdef CONFIG_SECURITY_NETWORK 1989 - struct hlist_head unix_stream_connect; 1990 - struct hlist_head unix_may_send; 1991 - struct hlist_head socket_create; 1992 - struct hlist_head socket_post_create; 1993 - struct hlist_head socket_socketpair; 1994 - struct hlist_head socket_bind; 1995 - struct hlist_head socket_connect; 1996 - struct hlist_head socket_listen; 1997 - struct hlist_head socket_accept; 1998 - struct hlist_head socket_sendmsg; 1999 - struct hlist_head socket_recvmsg; 2000 - struct hlist_head socket_getsockname; 2001 - struct hlist_head socket_getpeername; 2002 - struct hlist_head socket_getsockopt; 
2003 - struct hlist_head socket_setsockopt; 2004 - struct hlist_head socket_shutdown; 2005 - struct hlist_head socket_sock_rcv_skb; 2006 - struct hlist_head socket_getpeersec_stream; 2007 - struct hlist_head socket_getpeersec_dgram; 2008 - struct hlist_head sk_alloc_security; 2009 - struct hlist_head sk_free_security; 2010 - struct hlist_head sk_clone_security; 2011 - struct hlist_head sk_getsecid; 2012 - struct hlist_head sock_graft; 2013 - struct hlist_head inet_conn_request; 2014 - struct hlist_head inet_csk_clone; 2015 - struct hlist_head inet_conn_established; 2016 - struct hlist_head secmark_relabel_packet; 2017 - struct hlist_head secmark_refcount_inc; 2018 - struct hlist_head secmark_refcount_dec; 2019 - struct hlist_head req_classify_flow; 2020 - struct hlist_head tun_dev_alloc_security; 2021 - struct hlist_head tun_dev_free_security; 2022 - struct hlist_head tun_dev_create; 2023 - struct hlist_head tun_dev_attach_queue; 2024 - struct hlist_head tun_dev_attach; 2025 - struct hlist_head tun_dev_open; 2026 - struct hlist_head sctp_assoc_request; 2027 - struct hlist_head sctp_bind_connect; 2028 - struct hlist_head sctp_sk_clone; 2029 - #endif /* CONFIG_SECURITY_NETWORK */ 2030 - #ifdef CONFIG_SECURITY_INFINIBAND 2031 - struct hlist_head ib_pkey_access; 2032 - struct hlist_head ib_endport_manage_subnet; 2033 - struct hlist_head ib_alloc_security; 2034 - struct hlist_head ib_free_security; 2035 - #endif /* CONFIG_SECURITY_INFINIBAND */ 2036 - #ifdef CONFIG_SECURITY_NETWORK_XFRM 2037 - struct hlist_head xfrm_policy_alloc_security; 2038 - struct hlist_head xfrm_policy_clone_security; 2039 - struct hlist_head xfrm_policy_free_security; 2040 - struct hlist_head xfrm_policy_delete_security; 2041 - struct hlist_head xfrm_state_alloc; 2042 - struct hlist_head xfrm_state_alloc_acquire; 2043 - struct hlist_head xfrm_state_free_security; 2044 - struct hlist_head xfrm_state_delete_security; 2045 - struct hlist_head xfrm_policy_lookup; 2046 - struct hlist_head 
xfrm_state_pol_flow_match; 2047 - struct hlist_head xfrm_decode_session; 2048 - #endif /* CONFIG_SECURITY_NETWORK_XFRM */ 2049 - #ifdef CONFIG_KEYS 2050 - struct hlist_head key_alloc; 2051 - struct hlist_head key_free; 2052 - struct hlist_head key_permission; 2053 - struct hlist_head key_getsecurity; 2054 - #endif /* CONFIG_KEYS */ 2055 - #ifdef CONFIG_AUDIT 2056 - struct hlist_head audit_rule_init; 2057 - struct hlist_head audit_rule_known; 2058 - struct hlist_head audit_rule_match; 2059 - struct hlist_head audit_rule_free; 2060 - #endif /* CONFIG_AUDIT */ 2061 - #ifdef CONFIG_BPF_SYSCALL 2062 - struct hlist_head bpf; 2063 - struct hlist_head bpf_map; 2064 - struct hlist_head bpf_prog; 2065 - struct hlist_head bpf_map_alloc_security; 2066 - struct hlist_head bpf_map_free_security; 2067 - struct hlist_head bpf_prog_alloc_security; 2068 - struct hlist_head bpf_prog_free_security; 2069 - #endif /* CONFIG_BPF_SYSCALL */ 2070 - struct hlist_head locked_down; 2071 - #ifdef CONFIG_PERF_EVENTS 2072 - struct hlist_head perf_event_open; 2073 - struct hlist_head perf_event_alloc; 2074 - struct hlist_head perf_event_free; 2075 - struct hlist_head perf_event_read; 2076 - struct hlist_head perf_event_write; 2077 - #endif 1465 + #define LSM_HOOK(RET, DEFAULT, NAME, ...) struct hlist_head NAME; 1466 + #include "lsm_hook_defs.h" 1467 + #undef LSM_HOOK 2078 1468 } __randomize_layout; 2079 1469 2080 1470 /* ··· 1489 2099 int lbs_msg_msg; 1490 2100 int lbs_task; 1491 2101 }; 2102 + 2103 + /* 2104 + * LSM_RET_VOID is used as the default value in LSM_HOOK definitions for void 2105 + * LSM hooks (in include/linux/lsm_hook_defs.h). 2106 + */ 2107 + #define LSM_RET_VOID ((void) 0) 1492 2108 1493 2109 /* 1494 2110 * Initializing a security_hook_list structure takes
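The lsm_hooks.h hunk above collapses hundreds of hand-maintained members into a single X-macro expansion: `lsm_hook_defs.h` lists every hook once as `LSM_HOOK(RET, DEFAULT, NAME, ...)`, and each consumer defines `LSM_HOOK` to stamp out whatever shape it needs (function-pointer members here, `hlist_head`s for `security_hook_heads`, nop stubs in bpf_lsm.c). A minimal standalone sketch of the same technique — the two-hook list and `mini_*` names below are made up for illustration, not taken from `lsm_hook_defs.h`:

```c
/* Stand-in for lsm_hook_defs.h: one LSM_HOOK() entry per hook.
 * (Hypothetical mini list; the real header has ~200 entries.) */
#define MINI_HOOK_DEFS \
	LSM_HOOK(int, 0, file_open, int fd) \
	LSM_HOOK(int, -1, task_kill, int pid, int sig)

/* Expansion 1: function-pointer members, as in security_list_options. */
union mini_list_options {
#define LSM_HOOK(RET, DEFAULT, NAME, ...) RET (*NAME)(__VA_ARGS__);
	MINI_HOOK_DEFS
#undef LSM_HOOK
};

/* Expansion 2: default no-op stubs returning DEFAULT, the same shape
 * kernel/bpf/bpf_lsm.c generates for BPF attachment points. */
#define LSM_HOOK(RET, DEFAULT, NAME, ...) \
	static RET mini_lsm_##NAME(__VA_ARGS__) { return DEFAULT; }
MINI_HOOK_DEFS
#undef LSM_HOOK
```

Adding a hook now means one new line in the defs header instead of edits to every expansion site.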
+1 -1
include/linux/netdevice.h
··· 3777 3777 3778 3778 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); 3779 3779 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, 3780 - int fd, u32 flags); 3780 + int fd, int expected_fd, u32 flags); 3781 3781 u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op, 3782 3782 enum bpf_netdev_command cmd); 3783 3783 int xdp_umem_query(struct net_device *dev, u16 queue_id);
+12
include/linux/tnum.h
··· 86 86 /* Format a tnum as tristate binary expansion */ 87 87 int tnum_sbin(char *str, size_t size, struct tnum a); 88 88 89 + /* Returns the 32-bit subreg */ 90 + struct tnum tnum_subreg(struct tnum a); 91 + /* Returns the tnum with the lower 32-bit subreg cleared */ 92 + struct tnum tnum_clear_subreg(struct tnum a); 93 + /* Returns the tnum with the lower 32-bit subreg set to value */ 94 + struct tnum tnum_const_subreg(struct tnum a, u32 value); 95 + /* Returns true if 32-bit subreg @a is a known constant*/ 96 + static inline bool tnum_subreg_is_const(struct tnum a) 97 + { 98 + return !(tnum_subreg(a)).mask; 99 + } 100 + 89 101 #endif /* _LINUX_TNUM_H */
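The new `tnum_subreg()` family supports the verifier's 32-bit (subregister) bound tracking. A tnum is a tristate number: a bit is known iff the corresponding `mask` bit is clear, in which case `value` supplies it. A hedged userspace sketch of the subreg operations — simplified from the real `kernel/bpf/tnum.c`, which builds them on `tnum_cast()`:

```c
#include <stdbool.h>
#include <stdint.h>

struct tnum { uint64_t value; uint64_t mask; };

/* Keep only the low 32 bits: the subregister view of the tnum. */
static struct tnum tnum_subreg(struct tnum a)
{
	return (struct tnum){ .value = (uint32_t)a.value,
			      .mask  = (uint32_t)a.mask };
}

/* Clear the low 32 bits, keeping only upper-half knowledge. */
static struct tnum tnum_clear_subreg(struct tnum a)
{
	return (struct tnum){ .value = a.value & ~0xffffffffULL,
			      .mask  = a.mask  & ~0xffffffffULL };
}

/* Set the low 32 bits to a known constant. */
static struct tnum tnum_const_subreg(struct tnum a, uint32_t value)
{
	a = tnum_clear_subreg(a);
	a.value |= value;
	return a;
}

/* A subreg is constant when none of its low 32 bits are unknown. */
static bool tnum_subreg_is_const(struct tnum a)
{
	return !tnum_subreg(a).mask;
}
```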
+6 -1
include/net/cls_cgroup.h
··· 45 45 sock_cgroup_set_classid(skcd, classid); 46 46 } 47 47 48 + static inline u32 __task_get_classid(struct task_struct *task) 49 + { 50 + return task_cls_state(task)->classid; 51 + } 52 + 48 53 static inline u32 task_get_classid(const struct sk_buff *skb) 49 54 { 50 - u32 classid = task_cls_state(current)->classid; 55 + u32 classid = __task_get_classid(current); 51 56 52 57 /* Due to the nature of the classifier it is required to ignore all 53 58 * packets originating from softirq context as accessing `current'
+1 -2
include/net/inet6_hashtables.h
··· 85 85 int iif, int sdif, 86 86 bool *refcounted) 87 87 { 88 - struct sock *sk = skb_steal_sock(skb); 88 + struct sock *sk = skb_steal_sock(skb, refcounted); 89 89 90 - *refcounted = true; 91 90 if (sk) 92 91 return sk; 93 92
+1 -2
include/net/inet_hashtables.h
··· 379 379 const int sdif, 380 380 bool *refcounted) 381 381 { 382 - struct sock *sk = skb_steal_sock(skb); 382 + struct sock *sk = skb_steal_sock(skb, refcounted); 383 383 const struct iphdr *iph = ip_hdr(skb); 384 384 385 - *refcounted = true; 386 385 if (sk) 387 386 return sk; 388 387
+5
include/net/net_namespace.h
··· 168 168 #ifdef CONFIG_XFRM 169 169 struct netns_xfrm xfrm; 170 170 #endif 171 + 172 + atomic64_t net_cookie; /* written once */ 173 + 171 174 #if IS_ENABLED(CONFIG_IP_VS) 172 175 struct netns_ipvs *ipvs; 173 176 #endif ··· 227 224 228 225 struct net *get_net_ns_by_pid(pid_t pid); 229 226 struct net *get_net_ns_by_fd(int fd); 227 + 228 + u64 net_gen_cookie(struct net *net); 230 229 231 230 #ifdef CONFIG_SYSCTL 232 231 void ipx_register_sysctl(void);
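`net_gen_cookie()` backs the new `bpf_get_netns_cookie()` helper: the `net_cookie` field is written once per namespace, so concurrent first callers must race safely and all observe the same non-zero id. A sketch of that write-once pattern, assuming C11 atomics in place of the kernel's `atomic64_t`/`cmpxchg`:

```c
#include <stdatomic.h>
#include <stdint.h>

static atomic_uint_fast64_t cookie_gen;	/* global id source */

struct netns { atomic_uint_fast64_t net_cookie; };	/* written once */

/* Return the namespace cookie, allocating a fresh non-zero id on first
 * use; losers of the install race adopt the winner's value. */
static uint64_t net_gen_cookie(struct netns *net)
{
	uint64_t res = atomic_load(&net->net_cookie);
	uint64_t expected = 0;

	if (res)
		return res;
	res = atomic_fetch_add(&cookie_gen, 1) + 1;
	if (atomic_compare_exchange_strong(&net->net_cookie, &expected, res))
		return res;
	return expected;	/* CAS failed: expected now holds the winner */
}
```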
+37 -9
include/net/sock.h
··· 1659 1659 void sock_efree(struct sk_buff *skb); 1660 1660 #ifdef CONFIG_INET 1661 1661 void sock_edemux(struct sk_buff *skb); 1662 + void sock_pfree(struct sk_buff *skb); 1662 1663 #else 1663 1664 #define sock_edemux sock_efree 1664 1665 #endif ··· 2527 2526 write_pnet(&sk->sk_net, net); 2528 2527 } 2529 2528 2530 - static inline struct sock *skb_steal_sock(struct sk_buff *skb) 2529 + static inline bool 2530 + skb_sk_is_prefetched(struct sk_buff *skb) 2531 2531 { 2532 - if (skb->sk) { 2533 - struct sock *sk = skb->sk; 2534 - 2535 - skb->destructor = NULL; 2536 - skb->sk = NULL; 2537 - return sk; 2538 - } 2539 - return NULL; 2532 + #ifdef CONFIG_INET 2533 + return skb->destructor == sock_pfree; 2534 + #else 2535 + return false; 2536 + #endif /* CONFIG_INET */ 2540 2537 } 2541 2538 2542 2539 /* This helper checks if a socket is a full socket, ··· 2543 2544 static inline bool sk_fullsock(const struct sock *sk) 2544 2545 { 2545 2546 return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV); 2547 + } 2548 + 2549 + static inline bool 2550 + sk_is_refcounted(struct sock *sk) 2551 + { 2552 + /* Only full sockets have sk->sk_flags. */ 2553 + return !sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE); 2554 + } 2555 + 2556 + /** 2557 + * skb_steal_sock 2558 + * @skb to steal the socket from 2559 + * @refcounted is set to true if the socket is reference-counted 2560 + */ 2561 + static inline struct sock * 2562 + skb_steal_sock(struct sk_buff *skb, bool *refcounted) 2563 + { 2564 + if (skb->sk) { 2565 + struct sock *sk = skb->sk; 2566 + 2567 + *refcounted = true; 2568 + if (skb_sk_is_prefetched(skb)) 2569 + *refcounted = sk_is_refcounted(sk); 2570 + skb->destructor = NULL; 2571 + skb->sk = NULL; 2572 + return sk; 2573 + } 2574 + *refcounted = false; 2575 + return NULL; 2546 2576 } 2547 2577 2548 2578 /* Checks if this SKB belongs to an HW offloaded socket
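The reworked `skb_steal_sock()` now reports through `*refcounted` whether the caller owns a reference: a socket prefetched by the new `bpf_sk_assign()` helper (recognized by its `sock_pfree` destructor) is only counted when `sk_is_refcounted()` says so. A toy model of that decision logic, with simplified stand-in structs rather than the kernel definitions:

```c
#include <stdbool.h>
#include <stddef.h>

struct sock { bool fullsock; bool rcu_free; };
struct sk_buff { struct sock *sk; bool prefetched; };

/* Only full sockets carry sk_flags; an RCU-freed full socket is not
 * reference-counted by the stealing caller. */
static bool sk_is_refcounted(const struct sock *sk)
{
	return !sk->fullsock || !sk->rcu_free;
}

static struct sock *skb_steal_sock(struct sk_buff *skb, bool *refcounted)
{
	if (skb->sk) {
		struct sock *sk = skb->sk;

		*refcounted = true;
		if (skb->prefetched)	/* kernel: destructor == sock_pfree */
			*refcounted = sk_is_refcounted(sk);
		skb->sk = NULL;		/* kernel also clears destructor */
		return sk;
	}
	*refcounted = false;
	return NULL;
}
```

This removes the old callers' unconditional `*refcounted = true` seen in the inet_hashtables.h hunks above.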
-2
include/net/tcp.h
··· 2207 2207 #ifdef CONFIG_NET_SOCK_MSG 2208 2208 int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes, 2209 2209 int flags); 2210 - int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, 2211 - int nonblock, int flags, int *addr_len); 2212 2210 int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, 2213 2211 struct msghdr *msg, int len, int flags); 2214 2212 #endif /* CONFIG_NET_SOCK_MSG */
+80 -2
include/uapi/linux/bpf.h
··· 111 111 BPF_MAP_LOOKUP_AND_DELETE_BATCH, 112 112 BPF_MAP_UPDATE_BATCH, 113 113 BPF_MAP_DELETE_BATCH, 114 + BPF_LINK_CREATE, 115 + BPF_LINK_UPDATE, 114 116 }; 115 117 116 118 enum bpf_map_type { ··· 183 181 BPF_PROG_TYPE_TRACING, 184 182 BPF_PROG_TYPE_STRUCT_OPS, 185 183 BPF_PROG_TYPE_EXT, 184 + BPF_PROG_TYPE_LSM, 186 185 }; 187 186 188 187 enum bpf_attach_type { ··· 214 211 BPF_TRACE_FENTRY, 215 212 BPF_TRACE_FEXIT, 216 213 BPF_MODIFY_RETURN, 214 + BPF_LSM_MAC, 217 215 __MAX_BPF_ATTACH_TYPE 218 216 }; 219 217 ··· 543 539 __u32 prog_cnt; 544 540 } query; 545 541 546 - struct { 542 + struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */ 547 543 __u64 name; 548 544 __u32 prog_fd; 549 545 } raw_tracepoint; ··· 571 567 __u64 probe_offset; /* output: probe_offset */ 572 568 __u64 probe_addr; /* output: probe_addr */ 573 569 } task_fd_query; 570 + 571 + struct { /* struct used by BPF_LINK_CREATE command */ 572 + __u32 prog_fd; /* eBPF program to attach */ 573 + __u32 target_fd; /* object to attach to */ 574 + __u32 attach_type; /* attach type */ 575 + __u32 flags; /* extra flags */ 576 + } link_create; 577 + 578 + struct { /* struct used by BPF_LINK_UPDATE command */ 579 + __u32 link_fd; /* link fd */ 580 + /* new program fd to update link with */ 581 + __u32 new_prog_fd; 582 + __u32 flags; /* extra flags */ 583 + /* expected link's program fd; is specified only if 584 + * BPF_F_REPLACE flag is set in flags */ 585 + __u32 old_prog_fd; 586 + } link_update; 587 + 574 588 } __attribute__((aligned(8))); 575 589 576 590 /* The description below is an attempt at providing documentation to eBPF ··· 2972 2950 * restricted to raw_tracepoint bpf programs. 2973 2951 * Return 2974 2952 * 0 on success, or a negative error in case of failure. 2953 + * 2954 + * u64 bpf_get_netns_cookie(void *ctx) 2955 + * Description 2956 + * Retrieve the cookie (generated by the kernel) of the network 2957 + * namespace the input *ctx* is associated with. 
The network 2958 + * namespace cookie remains stable for its lifetime and provides 2959 + * a global identifier that can be assumed unique. If *ctx* is 2960 + * NULL, then the helper returns the cookie for the initial 2961 + * network namespace. The cookie itself is very similar to that 2962 + * of bpf_get_socket_cookie() helper, but for network namespaces 2963 + * instead of sockets. 2964 + * Return 2965 + * A 8-byte long opaque number. 2966 + * 2967 + * u64 bpf_get_current_ancestor_cgroup_id(int ancestor_level) 2968 + * Description 2969 + * Return id of cgroup v2 that is ancestor of the cgroup associated 2970 + * with the current task at the *ancestor_level*. The root cgroup 2971 + * is at *ancestor_level* zero and each step down the hierarchy 2972 + * increments the level. If *ancestor_level* == level of cgroup 2973 + * associated with the current task, then return value will be the 2974 + * same as that of **bpf_get_current_cgroup_id**\ (). 2975 + * 2976 + * The helper is useful to implement policies based on cgroups 2977 + * that are upper in hierarchy than immediate cgroup associated 2978 + * with the current task. 2979 + * 2980 + * The format of returned id and helper limitations are same as in 2981 + * **bpf_get_current_cgroup_id**\ (). 2982 + * Return 2983 + * The id is returned or 0 in case the id could not be retrieved. 2984 + * 2985 + * int bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags) 2986 + * Description 2987 + * Assign the *sk* to the *skb*. When combined with appropriate 2988 + * routing configuration to receive the packet towards the socket, 2989 + * will cause *skb* to be delivered to the specified socket. 2990 + * Subsequent redirection of *skb* via **bpf_redirect**\ (), 2991 + * **bpf_clone_redirect**\ () or other methods outside of BPF may 2992 + * interfere with successful delivery to the socket. 2993 + * 2994 + * This operation is only valid from TC ingress path. 2995 + * 2996 + * The *flags* argument must be zero. 
2997 + * Return 2998 + * 0 on success, or a negative errno in case of failure. 2999 + * 3000 + * * **-EINVAL** Unsupported flags specified. 3001 + * * **-ENOENT** Socket is unavailable for assignment. 3002 + * * **-ENETUNREACH** Socket is unreachable (wrong netns). 3003 + * * **-EOPNOTSUPP** Unsupported operation, for example a 3004 + * call from outside of TC ingress. 3005 + * * **-ESOCKTNOSUPPORT** Socket type not supported (reuseport). 2975 3006 */ 2976 3007 #define __BPF_FUNC_MAPPER(FN) \ 2977 3008 FN(unspec), \ ··· 3148 3073 FN(jiffies64), \ 3149 3074 FN(read_branch_records), \ 3150 3075 FN(get_ns_current_pid_tgid), \ 3151 - FN(xdp_output), 3076 + FN(xdp_output), \ 3077 + FN(get_netns_cookie), \ 3078 + FN(get_current_ancestor_cgroup_id), \ 3079 + FN(sk_assign), 3152 3080 3153 3081 /* integer value in 'imm' field of BPF_CALL instruction selects which helper 3154 3082 * function eBPF program intends to call
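New helpers such as `bpf_sk_assign` only ever append to `__BPF_FUNC_MAPPER`, because each position in the X-macro list becomes the helper's stable id in the `imm` field of the `BPF_CALL` instruction. A miniature version of how the uapi header turns that list into the `BPF_FUNC_*` enum — the three-entry list and `MINI_*` names are illustrative only:

```c
/* Tiny stand-in for __BPF_FUNC_MAPPER: list order defines helper ids. */
#define MINI_FUNC_MAPPER(FN)	\
	FN(unspec),		\
	FN(map_lookup_elem),	\
	FN(sk_assign),

/* Expand the list into enumerators, as uapi bpf.h does with
 * __BPF_ENUM_FN to produce BPF_FUNC_*. */
#define __MINI_ENUM_FN(x) MINI_FUNC_##x
enum mini_func_id {
	MINI_FUNC_MAPPER(__MINI_ENUM_FN)
	__MINI_FUNC_MAX_ID,
};
#undef __MINI_ENUM_FN
```

Reordering or deleting entries would silently renumber every later helper, which is why the diff's `FN(sk_assign)` lands at the tail of the list.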
+3 -1
include/uapi/linux/if_link.h
··· 974 974 #define XDP_FLAGS_SKB_MODE (1U << 1) 975 975 #define XDP_FLAGS_DRV_MODE (1U << 2) 976 976 #define XDP_FLAGS_HW_MODE (1U << 3) 977 + #define XDP_FLAGS_REPLACE (1U << 4) 977 978 #define XDP_FLAGS_MODES (XDP_FLAGS_SKB_MODE | \ 978 979 XDP_FLAGS_DRV_MODE | \ 979 980 XDP_FLAGS_HW_MODE) 980 981 #define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | \ 981 - XDP_FLAGS_MODES) 982 + XDP_FLAGS_MODES | XDP_FLAGS_REPLACE) 982 983 983 984 /* These are stored into IFLA_XDP_ATTACHED on dump. */ 984 985 enum { ··· 999 998 IFLA_XDP_DRV_PROG_ID, 1000 999 IFLA_XDP_SKB_PROG_ID, 1001 1000 IFLA_XDP_HW_PROG_ID, 1001 + IFLA_XDP_EXPECTED_FD, 1002 1002 __IFLA_XDP_MAX, 1003 1003 }; 1004 1004
+13
init/Kconfig
··· 1615 1615 # end of the "standard kernel features (expert users)" menu 1616 1616 1617 1617 # syscall, maps, verifier 1618 + 1619 + config BPF_LSM 1620 + bool "LSM Instrumentation with BPF" 1621 + depends on BPF_EVENTS 1622 + depends on BPF_SYSCALL 1623 + depends on SECURITY 1624 + depends on BPF_JIT 1625 + help 1626 + Enables instrumentation of the security hooks with eBPF programs for 1627 + implementing dynamic MAC and Audit Policies. 1628 + 1629 + If you are unsure how to answer this question, answer N. 1630 + 1618 1631 config BPF_SYSCALL 1619 1632 bool "Enable bpf() system call" 1620 1633 select BPF
+1
kernel/bpf/Makefile
··· 29 29 endif 30 30 ifeq ($(CONFIG_BPF_JIT),y) 31 31 obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o 32 + obj-${CONFIG_BPF_LSM} += bpf_lsm.o 32 33 endif
+54
kernel/bpf/bpf_lsm.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + /* 4 + * Copyright (C) 2020 Google LLC. 5 + */ 6 + 7 + #include <linux/filter.h> 8 + #include <linux/bpf.h> 9 + #include <linux/btf.h> 10 + #include <linux/lsm_hooks.h> 11 + #include <linux/bpf_lsm.h> 12 + #include <linux/kallsyms.h> 13 + #include <linux/bpf_verifier.h> 14 + 15 + /* For every LSM hook that allows attachment of BPF programs, declare a nop 16 + * function where a BPF program can be attached. 17 + */ 18 + #define LSM_HOOK(RET, DEFAULT, NAME, ...) \ 19 + noinline RET bpf_lsm_##NAME(__VA_ARGS__) \ 20 + { \ 21 + return DEFAULT; \ 22 + } 23 + 24 + #include <linux/lsm_hook_defs.h> 25 + #undef LSM_HOOK 26 + 27 + #define BPF_LSM_SYM_PREFX "bpf_lsm_" 28 + 29 + int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, 30 + const struct bpf_prog *prog) 31 + { 32 + if (!prog->gpl_compatible) { 33 + bpf_log(vlog, 34 + "LSM programs must have a GPL compatible license\n"); 35 + return -EINVAL; 36 + } 37 + 38 + if (strncmp(BPF_LSM_SYM_PREFX, prog->aux->attach_func_name, 39 + sizeof(BPF_LSM_SYM_PREFX) - 1)) { 40 + bpf_log(vlog, "attach_btf_id %u points to wrong type name %s\n", 41 + prog->aux->attach_btf_id, prog->aux->attach_func_name); 42 + return -EINVAL; 43 + } 44 + 45 + return 0; 46 + } 47 + 48 + const struct bpf_prog_ops lsm_prog_ops = { 49 + }; 50 + 51 + const struct bpf_verifier_ops lsm_verifier_ops = { 52 + .get_func_proto = bpf_tracing_func_proto, 53 + .is_valid_access = btf_ctx_access, 54 + };
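`bpf_lsm_verify_prog()` gates attachment on the target symbol's name: the `strncmp(..., sizeof(PREFIX) - 1)` idiom compares only the prefix, with `sizeof - 1` excluding the string literal's terminating NUL (note the diff spells the macro `BPF_LSM_SYM_PREFX`, as merged). A standalone sketch of that check — `has_lsm_prefix` is a hypothetical name, not a kernel function:

```c
#include <stdbool.h>
#include <string.h>

#define BPF_LSM_SYM_PREFIX "bpf_lsm_"

/* True when name starts with "bpf_lsm_", i.e. it names one of the
 * generated nop hooks a BPF_PROG_TYPE_LSM program may attach to. */
static bool has_lsm_prefix(const char *name)
{
	return !strncmp(BPF_LSM_SYM_PREFIX, name,
			sizeof(BPF_LSM_SYM_PREFIX) - 1);
}
```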
+34 -11
kernel/bpf/btf.c
··· 3477 3477 return ERR_PTR(err); 3478 3478 } 3479 3479 3480 - extern char __weak _binary__btf_vmlinux_bin_start[]; 3481 - extern char __weak _binary__btf_vmlinux_bin_end[]; 3480 + extern char __weak __start_BTF[]; 3481 + extern char __weak __stop_BTF[]; 3482 3482 extern struct btf *btf_vmlinux; 3483 3483 3484 3484 #define BPF_MAP_TYPE(_id, _ops) ··· 3605 3605 } 3606 3606 env->btf = btf; 3607 3607 3608 - btf->data = _binary__btf_vmlinux_bin_start; 3609 - btf->data_size = _binary__btf_vmlinux_bin_end - 3610 - _binary__btf_vmlinux_bin_start; 3608 + btf->data = __start_BTF; 3609 + btf->data_size = __stop_BTF - __start_BTF; 3611 3610 3612 3611 err = btf_parse_hdr(env); 3613 3612 if (err) ··· 3709 3710 nr_args--; 3710 3711 } 3711 3712 3713 + if (arg > nr_args) { 3714 + bpf_log(log, "func '%s' doesn't have %d-th argument\n", 3715 + tname, arg + 1); 3716 + return false; 3717 + } 3718 + 3712 3719 if (arg == nr_args) { 3713 - if (prog->expected_attach_type == BPF_TRACE_FEXIT) { 3720 + switch (prog->expected_attach_type) { 3721 + case BPF_LSM_MAC: 3722 + case BPF_TRACE_FEXIT: 3723 + /* When LSM programs are attached to void LSM hooks 3724 + * they use FEXIT trampolines and when attached to 3725 + * int LSM hooks, they use MODIFY_RETURN trampolines. 3726 + * 3727 + * While the LSM programs are BPF_MODIFY_RETURN-like 3728 + * the check: 3729 + * 3730 + * if (ret_type != 'int') 3731 + * return -EINVAL; 3732 + * 3733 + * is _not_ done here. This is still safe as LSM hooks 3734 + * have only void and int return types. 3735 + */ 3714 3736 if (!t) 3715 3737 return true; 3716 3738 t = btf_type_by_id(btf, t->type); 3717 - } else if (prog->expected_attach_type == BPF_MODIFY_RETURN) { 3739 + break; 3740 + case BPF_MODIFY_RETURN: 3718 3741 /* For now the BPF_MODIFY_RETURN can only be attached to 3719 3742 * functions that return an int. 
3720 3743 */ ··· 3750 3729 btf_kind_str[BTF_INFO_KIND(t->info)]); 3751 3730 return false; 3752 3731 } 3732 + break; 3733 + default: 3734 + bpf_log(log, "func '%s' doesn't have %d-th argument\n", 3735 + tname, arg + 1); 3736 + return false; 3753 3737 } 3754 - } else if (arg >= nr_args) { 3755 - bpf_log(log, "func '%s' doesn't have %d-th argument\n", 3756 - tname, arg + 1); 3757 - return false; 3758 3738 } else { 3759 3739 if (!t) 3760 3740 /* Default prog with 5 args */ 3761 3741 return true; 3762 3742 t = btf_type_by_id(btf, args[arg].type); 3763 3743 } 3744 + 3764 3745 /* skip modifiers */ 3765 3746 while (btf_type_is_modifier(t)) 3766 3747 t = btf_type_by_id(btf, t->type);
+391 -122
kernel/bpf/cgroup.c
··· 28 28 percpu_ref_kill(&cgrp->bpf.refcnt); 29 29 } 30 30 31 + static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[]) 32 + { 33 + enum bpf_cgroup_storage_type stype; 34 + 35 + for_each_cgroup_storage_type(stype) 36 + bpf_cgroup_storage_free(storages[stype]); 37 + } 38 + 39 + static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[], 40 + struct bpf_prog *prog) 41 + { 42 + enum bpf_cgroup_storage_type stype; 43 + 44 + for_each_cgroup_storage_type(stype) { 45 + storages[stype] = bpf_cgroup_storage_alloc(prog, stype); 46 + if (IS_ERR(storages[stype])) { 47 + storages[stype] = NULL; 48 + bpf_cgroup_storages_free(storages); 49 + return -ENOMEM; 50 + } 51 + } 52 + 53 + return 0; 54 + } 55 + 56 + static void bpf_cgroup_storages_assign(struct bpf_cgroup_storage *dst[], 57 + struct bpf_cgroup_storage *src[]) 58 + { 59 + enum bpf_cgroup_storage_type stype; 60 + 61 + for_each_cgroup_storage_type(stype) 62 + dst[stype] = src[stype]; 63 + } 64 + 65 + static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[], 66 + struct cgroup* cgrp, 67 + enum bpf_attach_type attach_type) 68 + { 69 + enum bpf_cgroup_storage_type stype; 70 + 71 + for_each_cgroup_storage_type(stype) 72 + bpf_cgroup_storage_link(storages[stype], cgrp, attach_type); 73 + } 74 + 75 + static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[]) 76 + { 77 + enum bpf_cgroup_storage_type stype; 78 + 79 + for_each_cgroup_storage_type(stype) 80 + bpf_cgroup_storage_unlink(storages[stype]); 81 + } 82 + 83 + /* Called when bpf_cgroup_link is auto-detached from dying cgroup. 84 + * It drops cgroup and bpf_prog refcounts, and marks bpf_link as defunct. It 85 + * doesn't free link memory, which will eventually be done by bpf_link's 86 + * release() callback, when its last FD is closed. 
87 + */ 88 + static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link) 89 + { 90 + cgroup_put(link->cgroup); 91 + link->cgroup = NULL; 92 + } 93 + 31 94 /** 32 95 * cgroup_bpf_release() - put references of all bpf programs and 33 96 * release all cgroup bpf data ··· 100 37 { 101 38 struct cgroup *p, *cgrp = container_of(work, struct cgroup, 102 39 bpf.release_work); 103 - enum bpf_cgroup_storage_type stype; 104 40 struct bpf_prog_array *old_array; 105 41 unsigned int type; 106 42 ··· 111 49 112 50 list_for_each_entry_safe(pl, tmp, progs, node) { 113 51 list_del(&pl->node); 114 - bpf_prog_put(pl->prog); 115 - for_each_cgroup_storage_type(stype) { 116 - bpf_cgroup_storage_unlink(pl->storage[stype]); 117 - bpf_cgroup_storage_free(pl->storage[stype]); 118 - } 52 + if (pl->prog) 53 + bpf_prog_put(pl->prog); 54 + if (pl->link) 55 + bpf_cgroup_link_auto_detach(pl->link); 56 + bpf_cgroup_storages_unlink(pl->storage); 57 + bpf_cgroup_storages_free(pl->storage); 119 58 kfree(pl); 120 59 static_branch_dec(&cgroup_bpf_enabled_key); 121 60 } ··· 148 85 queue_work(system_wq, &cgrp->bpf.release_work); 149 86 } 150 87 88 + /* Get underlying bpf_prog of bpf_prog_list entry, regardless if it's through 89 + * link or direct prog. 90 + */ 91 + static struct bpf_prog *prog_list_prog(struct bpf_prog_list *pl) 92 + { 93 + if (pl->prog) 94 + return pl->prog; 95 + if (pl->link) 96 + return pl->link->link.prog; 97 + return NULL; 98 + } 99 + 151 100 /* count number of elements in the list. 
152 101 * it's slow but the list cannot be long 153 102 */ ··· 169 94 u32 cnt = 0; 170 95 171 96 list_for_each_entry(pl, head, node) { 172 - if (!pl->prog) 97 + if (!prog_list_prog(pl)) 173 98 continue; 174 99 cnt++; 175 100 } ··· 213 138 enum bpf_attach_type type, 214 139 struct bpf_prog_array **array) 215 140 { 216 - enum bpf_cgroup_storage_type stype; 141 + struct bpf_prog_array_item *item; 217 142 struct bpf_prog_array *progs; 218 143 struct bpf_prog_list *pl; 219 144 struct cgroup *p = cgrp; ··· 238 163 continue; 239 164 240 165 list_for_each_entry(pl, &p->bpf.progs[type], node) { 241 - if (!pl->prog) 166 + if (!prog_list_prog(pl)) 242 167 continue; 243 168 244 - progs->items[cnt].prog = pl->prog; 245 - for_each_cgroup_storage_type(stype) 246 - progs->items[cnt].cgroup_storage[stype] = 247 - pl->storage[stype]; 169 + item = &progs->items[cnt]; 170 + item->prog = prog_list_prog(pl); 171 + bpf_cgroup_storages_assign(item->cgroup_storage, 172 + pl->storage); 248 173 cnt++; 249 174 } 250 175 } while ((p = cgroup_parent(p))); ··· 362 287 363 288 #define BPF_CGROUP_MAX_PROGS 64 364 289 290 + static struct bpf_prog_list *find_attach_entry(struct list_head *progs, 291 + struct bpf_prog *prog, 292 + struct bpf_cgroup_link *link, 293 + struct bpf_prog *replace_prog, 294 + bool allow_multi) 295 + { 296 + struct bpf_prog_list *pl; 297 + 298 + /* single-attach case */ 299 + if (!allow_multi) { 300 + if (list_empty(progs)) 301 + return NULL; 302 + return list_first_entry(progs, typeof(*pl), node); 303 + } 304 + 305 + list_for_each_entry(pl, progs, node) { 306 + if (prog && pl->prog == prog) 307 + /* disallow attaching the same prog twice */ 308 + return ERR_PTR(-EINVAL); 309 + if (link && pl->link == link) 310 + /* disallow attaching the same link twice */ 311 + return ERR_PTR(-EINVAL); 312 + } 313 + 314 + /* direct prog multi-attach w/ replacement case */ 315 + if (replace_prog) { 316 + list_for_each_entry(pl, progs, node) { 317 + if (pl->prog == replace_prog) 318 + /* a 
match found */ 319 + return pl; 320 + } 321 + /* prog to replace not found for cgroup */ 322 + return ERR_PTR(-ENOENT); 323 + } 324 + 325 + return NULL; 326 + } 327 + 365 328 /** 366 - * __cgroup_bpf_attach() - Attach the program to a cgroup, and 329 + * __cgroup_bpf_attach() - Attach the program or the link to a cgroup, and 367 330 * propagate the change to descendants 368 331 * @cgrp: The cgroup which descendants to traverse 369 332 * @prog: A program to attach 333 + * @link: A link to attach 370 334 * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set 371 335 * @type: Type of attach operation 372 336 * @flags: Option flags 373 337 * 338 + * Exactly one of @prog or @link can be non-null. 374 339 * Must be called with cgroup_mutex held. 375 340 */ 376 - int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, 377 - struct bpf_prog *replace_prog, 341 + int __cgroup_bpf_attach(struct cgroup *cgrp, 342 + struct bpf_prog *prog, struct bpf_prog *replace_prog, 343 + struct bpf_cgroup_link *link, 378 344 enum bpf_attach_type type, u32 flags) 379 345 { 380 346 u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI)); ··· 423 307 struct bpf_prog *old_prog = NULL; 424 308 struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {}; 425 309 struct bpf_cgroup_storage *old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {}; 426 - struct bpf_prog_list *pl, *replace_pl = NULL; 427 - enum bpf_cgroup_storage_type stype; 310 + struct bpf_prog_list *pl; 428 311 int err; 429 312 430 313 if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) || 431 314 ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI))) 432 315 /* invalid combination */ 316 + return -EINVAL; 317 + if (link && (prog || replace_prog)) 318 + /* only either link or prog/replace_prog can be specified */ 319 + return -EINVAL; 320 + if (!!replace_prog != !!(flags & BPF_F_REPLACE)) 321 + /* replace_prog implies BPF_F_REPLACE, and vice versa */ 433 322 
return -EINVAL; 434 323 435 324 if (!hierarchy_allows_attach(cgrp, type)) ··· 450 329 if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS) 451 330 return -E2BIG; 452 331 453 - if (flags & BPF_F_ALLOW_MULTI) { 454 - list_for_each_entry(pl, progs, node) { 455 - if (pl->prog == prog) 456 - /* disallow attaching the same prog twice */ 457 - return -EINVAL; 458 - if (pl->prog == replace_prog) 459 - replace_pl = pl; 460 - } 461 - if ((flags & BPF_F_REPLACE) && !replace_pl) 462 - /* prog to replace not found for cgroup */ 463 - return -ENOENT; 464 - } else if (!list_empty(progs)) { 465 - replace_pl = list_first_entry(progs, typeof(*pl), node); 466 - } 332 + pl = find_attach_entry(progs, prog, link, replace_prog, 333 + flags & BPF_F_ALLOW_MULTI); 334 + if (IS_ERR(pl)) 335 + return PTR_ERR(pl); 467 336 468 - for_each_cgroup_storage_type(stype) { 469 - storage[stype] = bpf_cgroup_storage_alloc(prog, stype); 470 - if (IS_ERR(storage[stype])) { 471 - storage[stype] = NULL; 472 - for_each_cgroup_storage_type(stype) 473 - bpf_cgroup_storage_free(storage[stype]); 474 - return -ENOMEM; 475 - } 476 - } 337 + if (bpf_cgroup_storages_alloc(storage, prog ? 
: link->link.prog)) 338 + return -ENOMEM; 477 339 478 - if (replace_pl) { 479 - pl = replace_pl; 340 + if (pl) { 480 341 old_prog = pl->prog; 481 - for_each_cgroup_storage_type(stype) { 482 - old_storage[stype] = pl->storage[stype]; 483 - bpf_cgroup_storage_unlink(old_storage[stype]); 484 - } 342 + bpf_cgroup_storages_unlink(pl->storage); 343 + bpf_cgroup_storages_assign(old_storage, pl->storage); 485 344 } else { 486 345 pl = kmalloc(sizeof(*pl), GFP_KERNEL); 487 346 if (!pl) { 488 - for_each_cgroup_storage_type(stype) 489 - bpf_cgroup_storage_free(storage[stype]); 347 + bpf_cgroup_storages_free(storage); 490 348 return -ENOMEM; 491 349 } 492 350 list_add_tail(&pl->node, progs); 493 351 } 494 352 495 353 pl->prog = prog; 496 - for_each_cgroup_storage_type(stype) 497 - pl->storage[stype] = storage[stype]; 498 - 354 + pl->link = link; 355 + bpf_cgroup_storages_assign(pl->storage, storage); 499 356 cgrp->bpf.flags[type] = saved_flags; 500 357 501 358 err = update_effective_progs(cgrp, type); 502 359 if (err) 503 360 goto cleanup; 504 361 505 - static_branch_inc(&cgroup_bpf_enabled_key); 506 - for_each_cgroup_storage_type(stype) { 507 - if (!old_storage[stype]) 508 - continue; 509 - bpf_cgroup_storage_free(old_storage[stype]); 510 - } 511 - if (old_prog) { 362 + bpf_cgroup_storages_free(old_storage); 363 + if (old_prog) 512 364 bpf_prog_put(old_prog); 513 - static_branch_dec(&cgroup_bpf_enabled_key); 514 - } 515 - for_each_cgroup_storage_type(stype) 516 - bpf_cgroup_storage_link(storage[stype], cgrp, type); 365 + else 366 + static_branch_inc(&cgroup_bpf_enabled_key); 367 + bpf_cgroup_storages_link(pl->storage, cgrp, type); 517 368 return 0; 518 369 519 370 cleanup: 520 - /* and cleanup the prog list */ 521 - pl->prog = old_prog; 522 - for_each_cgroup_storage_type(stype) { 523 - bpf_cgroup_storage_free(pl->storage[stype]); 524 - pl->storage[stype] = old_storage[stype]; 525 - bpf_cgroup_storage_link(old_storage[stype], cgrp, type); 371 + if (old_prog) { 372 + pl->prog = 
old_prog; 373 + pl->link = NULL; 526 374 } 527 - if (!replace_pl) { 375 + bpf_cgroup_storages_free(pl->storage); 376 + bpf_cgroup_storages_assign(pl->storage, old_storage); 377 + bpf_cgroup_storages_link(pl->storage, cgrp, type); 378 + if (!old_prog) { 528 379 list_del(&pl->node); 529 380 kfree(pl); 530 381 } 531 382 return err; 532 383 } 533 384 385 + /* Swap updated BPF program for given link in effective program arrays across 386 + * all descendant cgroups. This function is guaranteed to succeed. 387 + */ 388 + static void replace_effective_prog(struct cgroup *cgrp, 389 + enum bpf_attach_type type, 390 + struct bpf_cgroup_link *link) 391 + { 392 + struct bpf_prog_array_item *item; 393 + struct cgroup_subsys_state *css; 394 + struct bpf_prog_array *progs; 395 + struct bpf_prog_list *pl; 396 + struct list_head *head; 397 + struct cgroup *cg; 398 + int pos; 399 + 400 + css_for_each_descendant_pre(css, &cgrp->self) { 401 + struct cgroup *desc = container_of(css, struct cgroup, self); 402 + 403 + if (percpu_ref_is_zero(&desc->bpf.refcnt)) 404 + continue; 405 + 406 + /* find position of link in effective progs array */ 407 + for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) { 408 + if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI)) 409 + continue; 410 + 411 + head = &cg->bpf.progs[type]; 412 + list_for_each_entry(pl, head, node) { 413 + if (!prog_list_prog(pl)) 414 + continue; 415 + if (pl->link == link) 416 + goto found; 417 + pos++; 418 + } 419 + } 420 + found: 421 + BUG_ON(!cg); 422 + progs = rcu_dereference_protected( 423 + desc->bpf.effective[type], 424 + lockdep_is_held(&cgroup_mutex)); 425 + item = &progs->items[pos]; 426 + WRITE_ONCE(item->prog, link->link.prog); 427 + } 428 + } 429 + 534 430 /** 535 - * __cgroup_bpf_detach() - Detach the program from a cgroup, and 536 - * propagate the change to descendants 431 + * __cgroup_bpf_replace() - Replace link's program and propagate the change 432 + * to descendants 537 433 * @cgrp: The cgroup which 
descendants to traverse 538 - * @prog: A program to detach or NULL 539 - * @type: Type of detach operation 434 + * @link: A link for which to replace BPF program 435 + * @new_prog: Updated BPF program for the link 540 436 * 541 437 * Must be called with cgroup_mutex held. 542 438 */ 543 - int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, 544 - enum bpf_attach_type type) 439 + int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link, 440 + struct bpf_prog *new_prog) 545 441 { 546 - struct list_head *progs = &cgrp->bpf.progs[type]; 547 - enum bpf_cgroup_storage_type stype; 548 - u32 flags = cgrp->bpf.flags[type]; 549 - struct bpf_prog *old_prog = NULL; 442 + struct list_head *progs = &cgrp->bpf.progs[link->type]; 443 + struct bpf_prog *old_prog; 550 444 struct bpf_prog_list *pl; 551 - int err; 445 + bool found = false; 552 446 553 - if (flags & BPF_F_ALLOW_MULTI) { 554 - if (!prog) 555 - /* to detach MULTI prog the user has to specify valid FD 556 - * of the program to be detached 557 - */ 558 - return -EINVAL; 559 - } else { 560 - if (list_empty(progs)) 561 - /* report error when trying to detach and nothing is attached */ 562 - return -ENOENT; 563 - } 447 + if (link->link.prog->type != new_prog->type) 448 + return -EINVAL; 564 449 565 - if (flags & BPF_F_ALLOW_MULTI) { 566 - /* find the prog and detach it */ 567 - list_for_each_entry(pl, progs, node) { 568 - if (pl->prog != prog) 569 - continue; 570 - old_prog = prog; 571 - /* mark it deleted, so it's ignored while 572 - * recomputing effective 573 - */ 574 - pl->prog = NULL; 450 + list_for_each_entry(pl, progs, node) { 451 + if (pl->link == link) { 452 + found = true; 575 453 break; 576 454 } 577 - if (!old_prog) 578 - return -ENOENT; 579 - } else { 580 - /* to maintain backward compatibility NONE and OVERRIDE cgroups 581 - * allow detaching with invalid FD (prog==NULL) 582 - */ 583 - pl = list_first_entry(progs, typeof(*pl), node); 584 - old_prog = pl->prog; 585 - pl->prog = NULL; 586 455 
} 456 + if (!found) 457 + return -ENOENT; 458 + 459 + old_prog = xchg(&link->link.prog, new_prog); 460 + replace_effective_prog(cgrp, link->type, link); 461 + bpf_prog_put(old_prog); 462 + return 0; 463 + } 464 + 465 + static struct bpf_prog_list *find_detach_entry(struct list_head *progs, 466 + struct bpf_prog *prog, 467 + struct bpf_cgroup_link *link, 468 + bool allow_multi) 469 + { 470 + struct bpf_prog_list *pl; 471 + 472 + if (!allow_multi) { 473 + if (list_empty(progs)) 474 + /* report error when trying to detach and nothing is attached */ 475 + return ERR_PTR(-ENOENT); 476 + 477 + /* to maintain backward compatibility NONE and OVERRIDE cgroups 478 + * allow detaching with invalid FD (prog==NULL) in legacy mode 479 + */ 480 + return list_first_entry(progs, typeof(*pl), node); 481 + } 482 + 483 + if (!prog && !link) 484 + /* to detach MULTI prog the user has to specify valid FD 485 + * of the program or link to be detached 486 + */ 487 + return ERR_PTR(-EINVAL); 488 + 489 + /* find the prog or link and detach it */ 490 + list_for_each_entry(pl, progs, node) { 491 + if (pl->prog == prog && pl->link == link) 492 + return pl; 493 + } 494 + return ERR_PTR(-ENOENT); 495 + } 496 + 497 + /** 498 + * __cgroup_bpf_detach() - Detach the program or link from a cgroup, and 499 + * propagate the change to descendants 500 + * @cgrp: The cgroup which descendants to traverse 501 + * @prog: A program to detach or NULL 502 + * @link: A link to detach or NULL 503 + * @type: Type of detach operation 504 + * 505 + * At most one of @prog or @link can be non-NULL. 506 + * Must be called with cgroup_mutex held. 
507 + */ 508 + int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, 509 + struct bpf_cgroup_link *link, enum bpf_attach_type type) 510 + { 511 + struct list_head *progs = &cgrp->bpf.progs[type]; 512 + u32 flags = cgrp->bpf.flags[type]; 513 + struct bpf_prog_list *pl; 514 + struct bpf_prog *old_prog; 515 + int err; 516 + 517 + if (prog && link) 518 + /* only one of prog or link can be specified */ 519 + return -EINVAL; 520 + 521 + pl = find_detach_entry(progs, prog, link, flags & BPF_F_ALLOW_MULTI); 522 + if (IS_ERR(pl)) 523 + return PTR_ERR(pl); 524 + 525 + /* mark it deleted, so it's ignored while recomputing effective */ 526 + old_prog = pl->prog; 527 + pl->prog = NULL; 528 + pl->link = NULL; 587 529 588 530 err = update_effective_progs(cgrp, type); 589 531 if (err) ··· 654 470 655 471 /* now can actually delete it from this cgroup list */ 656 472 list_del(&pl->node); 657 - for_each_cgroup_storage_type(stype) { 658 - bpf_cgroup_storage_unlink(pl->storage[stype]); 659 - bpf_cgroup_storage_free(pl->storage[stype]); 660 - } 473 + bpf_cgroup_storages_unlink(pl->storage); 474 + bpf_cgroup_storages_free(pl->storage); 661 475 kfree(pl); 662 476 if (list_empty(progs)) 663 477 /* last program was detached, reset flags to zero */ 664 478 cgrp->bpf.flags[type] = 0; 665 - 666 - bpf_prog_put(old_prog); 479 + if (old_prog) 480 + bpf_prog_put(old_prog); 667 481 static_branch_dec(&cgroup_bpf_enabled_key); 668 482 return 0; 669 483 670 484 cleanup: 671 - /* and restore back old_prog */ 485 + /* restore back prog or link */ 672 486 pl->prog = old_prog; 487 + pl->link = link; 673 488 return err; 674 489 } 675 490 ··· 681 498 struct list_head *progs = &cgrp->bpf.progs[type]; 682 499 u32 flags = cgrp->bpf.flags[type]; 683 500 struct bpf_prog_array *effective; 501 + struct bpf_prog *prog; 684 502 int cnt, ret = 0, i; 685 503 686 504 effective = rcu_dereference_protected(cgrp->bpf.effective[type], ··· 712 528 713 529 i = 0; 714 530 list_for_each_entry(pl, progs, node) { 
715 - id = pl->prog->aux->id; 531 + prog = prog_list_prog(pl); 532 + id = prog->aux->id; 716 533 if (copy_to_user(prog_ids + i, &id, sizeof(id))) 717 534 return -EFAULT; 718 535 if (++i == cnt) ··· 743 558 } 744 559 } 745 560 746 - ret = cgroup_bpf_attach(cgrp, prog, replace_prog, attr->attach_type, 747 - attr->attach_flags); 561 + ret = cgroup_bpf_attach(cgrp, prog, replace_prog, NULL, 562 + attr->attach_type, attr->attach_flags); 748 563 749 564 if (replace_prog) 750 565 bpf_prog_put(replace_prog); ··· 766 581 if (IS_ERR(prog)) 767 582 prog = NULL; 768 583 769 - ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type, 0); 584 + ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type); 770 585 if (prog) 771 586 bpf_prog_put(prog); 772 587 773 588 cgroup_put(cgrp); 774 589 return ret; 590 + } 591 + 592 + static void bpf_cgroup_link_release(struct bpf_link *link) 593 + { 594 + struct bpf_cgroup_link *cg_link = 595 + container_of(link, struct bpf_cgroup_link, link); 596 + 597 + /* link might have been auto-detached by dying cgroup already, 598 + * in that case our work is done here 599 + */ 600 + if (!cg_link->cgroup) 601 + return; 602 + 603 + mutex_lock(&cgroup_mutex); 604 + 605 + /* re-check cgroup under lock again */ 606 + if (!cg_link->cgroup) { 607 + mutex_unlock(&cgroup_mutex); 608 + return; 609 + } 610 + 611 + WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link, 612 + cg_link->type)); 613 + 614 + mutex_unlock(&cgroup_mutex); 615 + cgroup_put(cg_link->cgroup); 616 + } 617 + 618 + static void bpf_cgroup_link_dealloc(struct bpf_link *link) 619 + { 620 + struct bpf_cgroup_link *cg_link = 621 + container_of(link, struct bpf_cgroup_link, link); 622 + 623 + kfree(cg_link); 624 + } 625 + 626 + const struct bpf_link_ops bpf_cgroup_link_lops = { 627 + .release = bpf_cgroup_link_release, 628 + .dealloc = bpf_cgroup_link_dealloc, 629 + }; 630 + 631 + int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) 632 + { 633 + struct bpf_cgroup_link 
*link; 634 + struct file *link_file; 635 + struct cgroup *cgrp; 636 + int err, link_fd; 637 + 638 + if (attr->link_create.flags) 639 + return -EINVAL; 640 + 641 + cgrp = cgroup_get_from_fd(attr->link_create.target_fd); 642 + if (IS_ERR(cgrp)) 643 + return PTR_ERR(cgrp); 644 + 645 + link = kzalloc(sizeof(*link), GFP_USER); 646 + if (!link) { 647 + err = -ENOMEM; 648 + goto out_put_cgroup; 649 + } 650 + bpf_link_init(&link->link, &bpf_cgroup_link_lops, prog); 651 + link->cgroup = cgrp; 652 + link->type = attr->link_create.attach_type; 653 + 654 + link_file = bpf_link_new_file(&link->link, &link_fd); 655 + if (IS_ERR(link_file)) { 656 + kfree(link); 657 + err = PTR_ERR(link_file); 658 + goto out_put_cgroup; 659 + } 660 + 661 + err = cgroup_bpf_attach(cgrp, NULL, NULL, link, link->type, 662 + BPF_F_ALLOW_MULTI); 663 + if (err) { 664 + bpf_link_cleanup(&link->link, link_file, link_fd); 665 + goto out_put_cgroup; 666 + } 667 + 668 + fd_install(link_fd, link_file); 669 + return link_fd; 670 + 671 + out_put_cgroup: 672 + cgroup_put(cgrp); 673 + return err; 775 674 } 776 675 777 676 int cgroup_bpf_prog_query(const union bpf_attr *attr,
+1
kernel/bpf/core.c
··· 2156 2156 const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak; 2157 2157 const struct bpf_func_proto bpf_get_current_comm_proto __weak; 2158 2158 const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak; 2159 + const struct bpf_func_proto bpf_get_current_ancestor_cgroup_id_proto __weak; 2159 2160 const struct bpf_func_proto bpf_get_local_storage_proto __weak; 2160 2161 const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto __weak; 2161 2162
+18
kernel/bpf/helpers.c
··· 340 340 .ret_type = RET_INTEGER, 341 341 }; 342 342 343 + BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level) 344 + { 345 + struct cgroup *cgrp = task_dfl_cgroup(current); 346 + struct cgroup *ancestor; 347 + 348 + ancestor = cgroup_ancestor(cgrp, ancestor_level); 349 + if (!ancestor) 350 + return 0; 351 + return cgroup_id(ancestor); 352 + } 353 + 354 + const struct bpf_func_proto bpf_get_current_ancestor_cgroup_id_proto = { 355 + .func = bpf_get_current_ancestor_cgroup_id, 356 + .gpl_only = false, 357 + .ret_type = RET_INTEGER, 358 + .arg1_type = ARG_ANYTHING, 359 + }; 360 + 343 361 #ifdef CONFIG_CGROUP_BPF 344 362 DECLARE_PER_CPU(struct bpf_cgroup_storage*, 345 363 bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
+221 -121
kernel/bpf/syscall.c
··· 25 25 #include <linux/nospec.h> 26 26 #include <linux/audit.h> 27 27 #include <uapi/linux/btf.h> 28 + #include <linux/bpf_lsm.h> 28 29 29 30 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \ 30 31 (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \ ··· 1943 1942 1944 1943 switch (prog_type) { 1945 1944 case BPF_PROG_TYPE_TRACING: 1945 + case BPF_PROG_TYPE_LSM: 1946 1946 case BPF_PROG_TYPE_STRUCT_OPS: 1947 1947 case BPF_PROG_TYPE_EXT: 1948 1948 break; ··· 2183 2181 attr->file_flags); 2184 2182 } 2185 2183 2186 - struct bpf_link { 2187 - atomic64_t refcnt; 2188 - const struct bpf_link_ops *ops; 2189 - struct bpf_prog *prog; 2190 - struct work_struct work; 2191 - }; 2192 - 2193 2184 void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops, 2194 2185 struct bpf_prog *prog) 2195 2186 { ··· 2196 2201 * anon_inode's release() call. This helper manages marking bpf_link as 2197 2202 * defunct, releases anon_inode file and puts reserved FD. 2198 2203 */ 2199 - static void bpf_link_cleanup(struct bpf_link *link, struct file *link_file, 2200 - int link_fd) 2204 + void bpf_link_cleanup(struct bpf_link *link, struct file *link_file, 2205 + int link_fd) 2201 2206 { 2202 2207 link->prog = NULL; 2203 2208 fput(link_file); ··· 2255 2260 #ifdef CONFIG_PROC_FS 2256 2261 static const struct bpf_link_ops bpf_raw_tp_lops; 2257 2262 static const struct bpf_link_ops bpf_tracing_link_lops; 2258 - static const struct bpf_link_ops bpf_xdp_link_lops; 2259 2263 2260 2264 static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp) 2261 2265 { ··· 2267 2273 link_type = "raw_tracepoint"; 2268 2274 else if (link->ops == &bpf_tracing_link_lops) 2269 2275 link_type = "tracing"; 2276 + #ifdef CONFIG_CGROUP_BPF 2277 + else if (link->ops == &bpf_cgroup_link_lops) 2278 + link_type = "cgroup"; 2279 + #endif 2270 2280 else 2271 2281 link_type = "unknown"; 2272 2282 ··· 2373 2375 struct file *link_file; 2374 2376 int link_fd, err; 2375 2377 2376 - 
if (prog->expected_attach_type != BPF_TRACE_FENTRY && 2377 - prog->expected_attach_type != BPF_TRACE_FEXIT && 2378 - prog->expected_attach_type != BPF_MODIFY_RETURN && 2379 - prog->type != BPF_PROG_TYPE_EXT) { 2378 + switch (prog->type) { 2379 + case BPF_PROG_TYPE_TRACING: 2380 + if (prog->expected_attach_type != BPF_TRACE_FENTRY && 2381 + prog->expected_attach_type != BPF_TRACE_FEXIT && 2382 + prog->expected_attach_type != BPF_MODIFY_RETURN) { 2383 + err = -EINVAL; 2384 + goto out_put_prog; 2385 + } 2386 + break; 2387 + case BPF_PROG_TYPE_EXT: 2388 + if (prog->expected_attach_type != 0) { 2389 + err = -EINVAL; 2390 + goto out_put_prog; 2391 + } 2392 + break; 2393 + case BPF_PROG_TYPE_LSM: 2394 + if (prog->expected_attach_type != BPF_LSM_MAC) { 2395 + err = -EINVAL; 2396 + goto out_put_prog; 2397 + } 2398 + break; 2399 + default: 2380 2400 err = -EINVAL; 2381 2401 goto out_put_prog; 2382 2402 } ··· 2473 2457 if (IS_ERR(prog)) 2474 2458 return PTR_ERR(prog); 2475 2459 2476 - if (prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT && 2477 - prog->type != BPF_PROG_TYPE_TRACING && 2478 - prog->type != BPF_PROG_TYPE_EXT && 2479 - prog->type != BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE) { 2480 - err = -EINVAL; 2481 - goto out_put_prog; 2482 - } 2483 - 2484 - if (prog->type == BPF_PROG_TYPE_TRACING || 2485 - prog->type == BPF_PROG_TYPE_EXT) { 2460 + switch (prog->type) { 2461 + case BPF_PROG_TYPE_TRACING: 2462 + case BPF_PROG_TYPE_EXT: 2463 + case BPF_PROG_TYPE_LSM: 2486 2464 if (attr->raw_tracepoint.name) { 2487 2465 /* The attach point for this category of programs 2488 2466 * should be specified via btf_id during program load. 
··· 2484 2474 err = -EINVAL; 2485 2475 goto out_put_prog; 2486 2476 } 2487 - if (prog->expected_attach_type == BPF_TRACE_RAW_TP) 2477 + if (prog->type == BPF_PROG_TYPE_TRACING && 2478 + prog->expected_attach_type == BPF_TRACE_RAW_TP) { 2488 2479 tp_name = prog->aux->attach_func_name; 2489 - else 2490 - return bpf_tracing_prog_attach(prog); 2491 - } else { 2480 + break; 2481 + } 2482 + return bpf_tracing_prog_attach(prog); 2483 + case BPF_PROG_TYPE_RAW_TRACEPOINT: 2484 + case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE: 2492 2485 if (strncpy_from_user(buf, 2493 2486 u64_to_user_ptr(attr->raw_tracepoint.name), 2494 2487 sizeof(buf) - 1) < 0) { ··· 2500 2487 } 2501 2488 buf[sizeof(buf) - 1] = 0; 2502 2489 tp_name = buf; 2490 + break; 2491 + default: 2492 + err = -EINVAL; 2493 + goto out_put_prog; 2503 2494 } 2504 2495 2505 2496 btp = bpf_get_raw_tracepoint(tp_name); ··· 2560 2543 } 2561 2544 } 2562 2545 2546 + static enum bpf_prog_type 2547 + attach_type_to_prog_type(enum bpf_attach_type attach_type) 2548 + { 2549 + switch (attach_type) { 2550 + case BPF_CGROUP_INET_INGRESS: 2551 + case BPF_CGROUP_INET_EGRESS: 2552 + return BPF_PROG_TYPE_CGROUP_SKB; 2553 + break; 2554 + case BPF_CGROUP_INET_SOCK_CREATE: 2555 + case BPF_CGROUP_INET4_POST_BIND: 2556 + case BPF_CGROUP_INET6_POST_BIND: 2557 + return BPF_PROG_TYPE_CGROUP_SOCK; 2558 + case BPF_CGROUP_INET4_BIND: 2559 + case BPF_CGROUP_INET6_BIND: 2560 + case BPF_CGROUP_INET4_CONNECT: 2561 + case BPF_CGROUP_INET6_CONNECT: 2562 + case BPF_CGROUP_UDP4_SENDMSG: 2563 + case BPF_CGROUP_UDP6_SENDMSG: 2564 + case BPF_CGROUP_UDP4_RECVMSG: 2565 + case BPF_CGROUP_UDP6_RECVMSG: 2566 + return BPF_PROG_TYPE_CGROUP_SOCK_ADDR; 2567 + case BPF_CGROUP_SOCK_OPS: 2568 + return BPF_PROG_TYPE_SOCK_OPS; 2569 + case BPF_CGROUP_DEVICE: 2570 + return BPF_PROG_TYPE_CGROUP_DEVICE; 2571 + case BPF_SK_MSG_VERDICT: 2572 + return BPF_PROG_TYPE_SK_MSG; 2573 + case BPF_SK_SKB_STREAM_PARSER: 2574 + case BPF_SK_SKB_STREAM_VERDICT: 2575 + return 
BPF_PROG_TYPE_SK_SKB; 2576 + case BPF_LIRC_MODE2: 2577 + return BPF_PROG_TYPE_LIRC_MODE2; 2578 + case BPF_FLOW_DISSECTOR: 2579 + return BPF_PROG_TYPE_FLOW_DISSECTOR; 2580 + case BPF_CGROUP_SYSCTL: 2581 + return BPF_PROG_TYPE_CGROUP_SYSCTL; 2582 + case BPF_CGROUP_GETSOCKOPT: 2583 + case BPF_CGROUP_SETSOCKOPT: 2584 + return BPF_PROG_TYPE_CGROUP_SOCKOPT; 2585 + default: 2586 + return BPF_PROG_TYPE_UNSPEC; 2587 + } 2588 + } 2589 + 2563 2590 #define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd 2564 2591 2565 2592 #define BPF_F_ATTACH_MASK \ ··· 2624 2563 if (attr->attach_flags & ~BPF_F_ATTACH_MASK) 2625 2564 return -EINVAL; 2626 2565 2627 - switch (attr->attach_type) { 2628 - case BPF_CGROUP_INET_INGRESS: 2629 - case BPF_CGROUP_INET_EGRESS: 2630 - ptype = BPF_PROG_TYPE_CGROUP_SKB; 2631 - break; 2632 - case BPF_CGROUP_INET_SOCK_CREATE: 2633 - case BPF_CGROUP_INET4_POST_BIND: 2634 - case BPF_CGROUP_INET6_POST_BIND: 2635 - ptype = BPF_PROG_TYPE_CGROUP_SOCK; 2636 - break; 2637 - case BPF_CGROUP_INET4_BIND: 2638 - case BPF_CGROUP_INET6_BIND: 2639 - case BPF_CGROUP_INET4_CONNECT: 2640 - case BPF_CGROUP_INET6_CONNECT: 2641 - case BPF_CGROUP_UDP4_SENDMSG: 2642 - case BPF_CGROUP_UDP6_SENDMSG: 2643 - case BPF_CGROUP_UDP4_RECVMSG: 2644 - case BPF_CGROUP_UDP6_RECVMSG: 2645 - ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR; 2646 - break; 2647 - case BPF_CGROUP_SOCK_OPS: 2648 - ptype = BPF_PROG_TYPE_SOCK_OPS; 2649 - break; 2650 - case BPF_CGROUP_DEVICE: 2651 - ptype = BPF_PROG_TYPE_CGROUP_DEVICE; 2652 - break; 2653 - case BPF_SK_MSG_VERDICT: 2654 - ptype = BPF_PROG_TYPE_SK_MSG; 2655 - break; 2656 - case BPF_SK_SKB_STREAM_PARSER: 2657 - case BPF_SK_SKB_STREAM_VERDICT: 2658 - ptype = BPF_PROG_TYPE_SK_SKB; 2659 - break; 2660 - case BPF_LIRC_MODE2: 2661 - ptype = BPF_PROG_TYPE_LIRC_MODE2; 2662 - break; 2663 - case BPF_FLOW_DISSECTOR: 2664 - ptype = BPF_PROG_TYPE_FLOW_DISSECTOR; 2665 - break; 2666 - case BPF_CGROUP_SYSCTL: 2667 - ptype = BPF_PROG_TYPE_CGROUP_SYSCTL; 2668 - break; 2669 - case 
BPF_CGROUP_GETSOCKOPT: 2670 - case BPF_CGROUP_SETSOCKOPT: 2671 - ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT; 2672 - break; 2673 - default: 2566 + ptype = attach_type_to_prog_type(attr->attach_type); 2567 + if (ptype == BPF_PROG_TYPE_UNSPEC) 2674 2568 return -EINVAL; 2675 - } 2676 2569 2677 2570 prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype); 2678 2571 if (IS_ERR(prog)) ··· 2648 2633 case BPF_PROG_TYPE_FLOW_DISSECTOR: 2649 2634 ret = skb_flow_dissector_bpf_prog_attach(attr, prog); 2650 2635 break; 2651 - default: 2636 + case BPF_PROG_TYPE_CGROUP_DEVICE: 2637 + case BPF_PROG_TYPE_CGROUP_SKB: 2638 + case BPF_PROG_TYPE_CGROUP_SOCK: 2639 + case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: 2640 + case BPF_PROG_TYPE_CGROUP_SOCKOPT: 2641 + case BPF_PROG_TYPE_CGROUP_SYSCTL: 2642 + case BPF_PROG_TYPE_SOCK_OPS: 2652 2643 ret = cgroup_bpf_prog_attach(attr, ptype, prog); 2644 + break; 2645 + default: 2646 + ret = -EINVAL; 2653 2647 } 2654 2648 2655 2649 if (ret) ··· 2678 2654 if (CHECK_ATTR(BPF_PROG_DETACH)) 2679 2655 return -EINVAL; 2680 2656 2681 - switch (attr->attach_type) { 2682 - case BPF_CGROUP_INET_INGRESS: 2683 - case BPF_CGROUP_INET_EGRESS: 2684 - ptype = BPF_PROG_TYPE_CGROUP_SKB; 2685 - break; 2686 - case BPF_CGROUP_INET_SOCK_CREATE: 2687 - case BPF_CGROUP_INET4_POST_BIND: 2688 - case BPF_CGROUP_INET6_POST_BIND: 2689 - ptype = BPF_PROG_TYPE_CGROUP_SOCK; 2690 - break; 2691 - case BPF_CGROUP_INET4_BIND: 2692 - case BPF_CGROUP_INET6_BIND: 2693 - case BPF_CGROUP_INET4_CONNECT: 2694 - case BPF_CGROUP_INET6_CONNECT: 2695 - case BPF_CGROUP_UDP4_SENDMSG: 2696 - case BPF_CGROUP_UDP6_SENDMSG: 2697 - case BPF_CGROUP_UDP4_RECVMSG: 2698 - case BPF_CGROUP_UDP6_RECVMSG: 2699 - ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR; 2700 - break; 2701 - case BPF_CGROUP_SOCK_OPS: 2702 - ptype = BPF_PROG_TYPE_SOCK_OPS; 2703 - break; 2704 - case BPF_CGROUP_DEVICE: 2705 - ptype = BPF_PROG_TYPE_CGROUP_DEVICE; 2706 - break; 2707 - case BPF_SK_MSG_VERDICT: 2657 + ptype = attach_type_to_prog_type(attr->attach_type); 
2658 + 2659 + switch (ptype) { 2660 + case BPF_PROG_TYPE_SK_MSG: 2661 + case BPF_PROG_TYPE_SK_SKB: 2708 2662 return sock_map_get_from_fd(attr, NULL); 2709 - case BPF_SK_SKB_STREAM_PARSER: 2710 - case BPF_SK_SKB_STREAM_VERDICT: 2711 - return sock_map_get_from_fd(attr, NULL); 2712 - case BPF_LIRC_MODE2: 2663 + case BPF_PROG_TYPE_LIRC_MODE2: 2713 2664 return lirc_prog_detach(attr); 2714 - case BPF_FLOW_DISSECTOR: 2665 + case BPF_PROG_TYPE_FLOW_DISSECTOR: 2715 2666 return skb_flow_dissector_bpf_prog_detach(attr); 2716 - case BPF_CGROUP_SYSCTL: 2717 - ptype = BPF_PROG_TYPE_CGROUP_SYSCTL; 2718 - break; 2719 - case BPF_CGROUP_GETSOCKOPT: 2720 - case BPF_CGROUP_SETSOCKOPT: 2721 - ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT; 2722 - break; 2667 + case BPF_PROG_TYPE_CGROUP_DEVICE: 2668 + case BPF_PROG_TYPE_CGROUP_SKB: 2669 + case BPF_PROG_TYPE_CGROUP_SOCK: 2670 + case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: 2671 + case BPF_PROG_TYPE_CGROUP_SOCKOPT: 2672 + case BPF_PROG_TYPE_CGROUP_SYSCTL: 2673 + case BPF_PROG_TYPE_SOCK_OPS: 2674 + return cgroup_bpf_prog_detach(attr, ptype); 2723 2675 default: 2724 2676 return -EINVAL; 2725 2677 } 2726 - 2727 - return cgroup_bpf_prog_detach(attr, ptype); 2728 2678 } 2729 2679 2730 2680 #define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt ··· 2732 2734 case BPF_CGROUP_SYSCTL: 2733 2735 case BPF_CGROUP_GETSOCKOPT: 2734 2736 case BPF_CGROUP_SETSOCKOPT: 2735 - break; 2737 + return cgroup_bpf_prog_query(attr, uattr); 2736 2738 case BPF_LIRC_MODE2: 2737 2739 return lirc_prog_query(attr, uattr); 2738 2740 case BPF_FLOW_DISSECTOR: ··· 2740 2742 default: 2741 2743 return -EINVAL; 2742 2744 } 2743 - 2744 - return cgroup_bpf_prog_query(attr, uattr); 2745 2745 } 2746 2746 2747 2747 #define BPF_PROG_TEST_RUN_LAST_FIELD test.ctx_out ··· 3560 3564 return err; 3561 3565 } 3562 3566 3567 + #define BPF_LINK_CREATE_LAST_FIELD link_create.flags 3568 + static int link_create(union bpf_attr *attr) 3569 + { 3570 + enum bpf_prog_type ptype; 3571 + struct bpf_prog *prog; 3572 + int ret; 
3573 + 3574 + if (!capable(CAP_NET_ADMIN)) 3575 + return -EPERM; 3576 + 3577 + if (CHECK_ATTR(BPF_LINK_CREATE)) 3578 + return -EINVAL; 3579 + 3580 + ptype = attach_type_to_prog_type(attr->link_create.attach_type); 3581 + if (ptype == BPF_PROG_TYPE_UNSPEC) 3582 + return -EINVAL; 3583 + 3584 + prog = bpf_prog_get_type(attr->link_create.prog_fd, ptype); 3585 + if (IS_ERR(prog)) 3586 + return PTR_ERR(prog); 3587 + 3588 + ret = bpf_prog_attach_check_attach_type(prog, 3589 + attr->link_create.attach_type); 3590 + if (ret) 3591 + goto err_out; 3592 + 3593 + switch (ptype) { 3594 + case BPF_PROG_TYPE_CGROUP_SKB: 3595 + case BPF_PROG_TYPE_CGROUP_SOCK: 3596 + case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: 3597 + case BPF_PROG_TYPE_SOCK_OPS: 3598 + case BPF_PROG_TYPE_CGROUP_DEVICE: 3599 + case BPF_PROG_TYPE_CGROUP_SYSCTL: 3600 + case BPF_PROG_TYPE_CGROUP_SOCKOPT: 3601 + ret = cgroup_bpf_link_attach(attr, prog); 3602 + break; 3603 + default: 3604 + ret = -EINVAL; 3605 + } 3606 + 3607 + err_out: 3608 + if (ret < 0) 3609 + bpf_prog_put(prog); 3610 + return ret; 3611 + } 3612 + 3613 + #define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd 3614 + 3615 + static int link_update(union bpf_attr *attr) 3616 + { 3617 + struct bpf_prog *old_prog = NULL, *new_prog; 3618 + struct bpf_link *link; 3619 + u32 flags; 3620 + int ret; 3621 + 3622 + if (!capable(CAP_NET_ADMIN)) 3623 + return -EPERM; 3624 + 3625 + if (CHECK_ATTR(BPF_LINK_UPDATE)) 3626 + return -EINVAL; 3627 + 3628 + flags = attr->link_update.flags; 3629 + if (flags & ~BPF_F_REPLACE) 3630 + return -EINVAL; 3631 + 3632 + link = bpf_link_get_from_fd(attr->link_update.link_fd); 3633 + if (IS_ERR(link)) 3634 + return PTR_ERR(link); 3635 + 3636 + new_prog = bpf_prog_get(attr->link_update.new_prog_fd); 3637 + if (IS_ERR(new_prog)) 3638 + return PTR_ERR(new_prog); 3639 + 3640 + if (flags & BPF_F_REPLACE) { 3641 + old_prog = bpf_prog_get(attr->link_update.old_prog_fd); 3642 + if (IS_ERR(old_prog)) { 3643 + ret = PTR_ERR(old_prog); 3644 + 
old_prog = NULL; 3645 + goto out_put_progs; 3646 + } 3647 + } 3648 + 3649 + #ifdef CONFIG_CGROUP_BPF 3650 + if (link->ops == &bpf_cgroup_link_lops) { 3651 + ret = cgroup_bpf_replace(link, old_prog, new_prog); 3652 + goto out_put_progs; 3653 + } 3654 + #endif 3655 + ret = -EINVAL; 3656 + 3657 + out_put_progs: 3658 + if (old_prog) 3659 + bpf_prog_put(old_prog); 3660 + if (ret) 3661 + bpf_prog_put(new_prog); 3662 + return ret; 3663 + } 3664 + 3563 3665 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) 3564 3666 { 3565 3667 union bpf_attr attr; ··· 3768 3674 break; 3769 3675 case BPF_MAP_DELETE_BATCH: 3770 3676 err = bpf_map_do_batch(&attr, uattr, BPF_MAP_DELETE_BATCH); 3677 + break; 3678 + case BPF_LINK_CREATE: 3679 + err = link_create(&attr); 3680 + break; 3681 + case BPF_LINK_UPDATE: 3682 + err = link_update(&attr); 3771 3683 break; 3772 3684 default: 3773 3685 err = -EINVAL;
+5 -6
kernel/bpf/sysfs_btf.c
··· 9 9 #include <linux/sysfs.h> 10 10 11 11 /* See scripts/link-vmlinux.sh, gen_btf() func for details */ 12 - extern char __weak _binary__btf_vmlinux_bin_start[]; 13 - extern char __weak _binary__btf_vmlinux_bin_end[]; 12 + extern char __weak __start_BTF[]; 13 + extern char __weak __stop_BTF[]; 14 14 15 15 static ssize_t 16 16 btf_vmlinux_read(struct file *file, struct kobject *kobj, 17 17 struct bin_attribute *bin_attr, 18 18 char *buf, loff_t off, size_t len) 19 19 { 20 - memcpy(buf, _binary__btf_vmlinux_bin_start + off, len); 20 + memcpy(buf, __start_BTF + off, len); 21 21 return len; 22 22 } 23 23 ··· 30 30 31 31 static int __init btf_vmlinux_init(void) 32 32 { 33 - if (!_binary__btf_vmlinux_bin_start) 33 + if (!__start_BTF) 34 34 return 0; 35 35 36 36 btf_kobj = kobject_create_and_add("btf", kernel_kobj); 37 37 if (!btf_kobj) 38 38 return -ENOMEM; 39 39 40 - bin_attr_btf_vmlinux.size = _binary__btf_vmlinux_bin_end - 41 - _binary__btf_vmlinux_bin_start; 40 + bin_attr_btf_vmlinux.size = __stop_BTF - __start_BTF; 42 41 43 42 return sysfs_create_bin_file(btf_kobj, &bin_attr_btf_vmlinux); 44 43 }
+15
kernel/bpf/tnum.c
··· 194 194 str[min(size - 1, (size_t)64)] = 0; 195 195 return 64; 196 196 } 197 + 198 + struct tnum tnum_subreg(struct tnum a) 199 + { 200 + return tnum_cast(a, 4); 201 + } 202 + 203 + struct tnum tnum_clear_subreg(struct tnum a) 204 + { 205 + return tnum_lshift(tnum_rshift(a, 32), 32); 206 + } 207 + 208 + struct tnum tnum_const_subreg(struct tnum a, u32 value) 209 + { 210 + return tnum_or(tnum_clear_subreg(a), tnum_const(value)); 211 + }
+13 -4
kernel/bpf/trampoline.c
··· 6 6 #include <linux/ftrace.h> 7 7 #include <linux/rbtree_latch.h> 8 8 #include <linux/perf_event.h> 9 + #include <linux/btf.h> 9 10 10 11 /* dummy _ops. The verifier will operate on target program's ops. */ 11 12 const struct bpf_verifier_ops bpf_extension_verifier_ops = { ··· 234 233 return err; 235 234 } 236 235 237 - static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(enum bpf_attach_type t) 236 + static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog) 238 237 { 239 - switch (t) { 238 + switch (prog->expected_attach_type) { 240 239 case BPF_TRACE_FENTRY: 241 240 return BPF_TRAMP_FENTRY; 242 241 case BPF_MODIFY_RETURN: 243 242 return BPF_TRAMP_MODIFY_RETURN; 244 243 case BPF_TRACE_FEXIT: 245 244 return BPF_TRAMP_FEXIT; 245 + case BPF_LSM_MAC: 246 + if (!prog->aux->attach_func_proto->type) 247 + /* The function returns void, we cannot modify its 248 + * return value. 249 + */ 250 + return BPF_TRAMP_FEXIT; 251 + else 252 + return BPF_TRAMP_MODIFY_RETURN; 246 253 default: 247 254 return BPF_TRAMP_REPLACE; 248 255 } ··· 264 255 int cnt; 265 256 266 257 tr = prog->aux->trampoline; 267 - kind = bpf_attach_type_to_tramp(prog->expected_attach_type); 258 + kind = bpf_attach_type_to_tramp(prog); 268 259 mutex_lock(&tr->mutex); 269 260 if (tr->extension_prog) { 270 261 /* cannot attach fentry/fexit if extension prog is attached. ··· 314 305 int err; 315 306 316 307 tr = prog->aux->trampoline; 317 - kind = bpf_attach_type_to_tramp(prog->expected_attach_type); 308 + kind = bpf_attach_type_to_tramp(prog); 318 309 mutex_lock(&tr->mutex); 319 310 if (kind == BPF_TRAMP_REPLACE) { 320 311 WARN_ON_ONCE(!tr->extension_prog);
+1089 -485
kernel/bpf/verifier.c
··· 20 20 #include <linux/perf_event.h> 21 21 #include <linux/ctype.h> 22 22 #include <linux/error-injection.h> 23 + #include <linux/bpf_lsm.h> 23 24 24 25 #include "disasm.h" 25 26 ··· 229 228 bool pkt_access; 230 229 int regno; 231 230 int access_size; 232 - s64 msize_smax_value; 233 - u64 msize_umax_value; 231 + u64 msize_max_value; 234 232 int ref_obj_id; 235 233 int func_id; 236 234 u32 btf_id; ··· 550 550 tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off); 551 551 verbose(env, ",var_off=%s", tn_buf); 552 552 } 553 + if (reg->s32_min_value != reg->smin_value && 554 + reg->s32_min_value != S32_MIN) 555 + verbose(env, ",s32_min_value=%d", 556 + (int)(reg->s32_min_value)); 557 + if (reg->s32_max_value != reg->smax_value && 558 + reg->s32_max_value != S32_MAX) 559 + verbose(env, ",s32_max_value=%d", 560 + (int)(reg->s32_max_value)); 561 + if (reg->u32_min_value != reg->umin_value && 562 + reg->u32_min_value != U32_MIN) 563 + verbose(env, ",u32_min_value=%d", 564 + (int)(reg->u32_min_value)); 565 + if (reg->u32_max_value != reg->umax_value && 566 + reg->u32_max_value != U32_MAX) 567 + verbose(env, ",u32_max_value=%d", 568 + (int)(reg->u32_max_value)); 553 569 } 554 570 verbose(env, ")"); 555 571 } ··· 940 924 reg->smax_value = (s64)imm; 941 925 reg->umin_value = imm; 942 926 reg->umax_value = imm; 927 + 928 + reg->s32_min_value = (s32)imm; 929 + reg->s32_max_value = (s32)imm; 930 + reg->u32_min_value = (u32)imm; 931 + reg->u32_max_value = (u32)imm; 932 + } 933 + 934 + static void __mark_reg32_known(struct bpf_reg_state *reg, u64 imm) 935 + { 936 + reg->var_off = tnum_const_subreg(reg->var_off, imm); 937 + reg->s32_min_value = (s32)imm; 938 + reg->s32_max_value = (s32)imm; 939 + reg->u32_min_value = (u32)imm; 940 + reg->u32_max_value = (u32)imm; 943 941 } 944 942 945 943 /* Mark the 'variable offset' part of a register as zero. 
This should be ··· 1008 978 tnum_equals_const(reg->var_off, 0); 1009 979 } 1010 980 1011 - /* Attempts to improve min/max values based on var_off information */ 1012 - static void __update_reg_bounds(struct bpf_reg_state *reg) 981 + /* Reset the min/max bounds of a register */ 982 + static void __mark_reg_unbounded(struct bpf_reg_state *reg) 983 + { 984 + reg->smin_value = S64_MIN; 985 + reg->smax_value = S64_MAX; 986 + reg->umin_value = 0; 987 + reg->umax_value = U64_MAX; 988 + 989 + reg->s32_min_value = S32_MIN; 990 + reg->s32_max_value = S32_MAX; 991 + reg->u32_min_value = 0; 992 + reg->u32_max_value = U32_MAX; 993 + } 994 + 995 + static void __mark_reg64_unbounded(struct bpf_reg_state *reg) 996 + { 997 + reg->smin_value = S64_MIN; 998 + reg->smax_value = S64_MAX; 999 + reg->umin_value = 0; 1000 + reg->umax_value = U64_MAX; 1001 + } 1002 + 1003 + static void __mark_reg32_unbounded(struct bpf_reg_state *reg) 1004 + { 1005 + reg->s32_min_value = S32_MIN; 1006 + reg->s32_max_value = S32_MAX; 1007 + reg->u32_min_value = 0; 1008 + reg->u32_max_value = U32_MAX; 1009 + } 1010 + 1011 + static void __update_reg32_bounds(struct bpf_reg_state *reg) 1012 + { 1013 + struct tnum var32_off = tnum_subreg(reg->var_off); 1014 + 1015 + /* min signed is max(sign bit) | min(other bits) */ 1016 + reg->s32_min_value = max_t(s32, reg->s32_min_value, 1017 + var32_off.value | (var32_off.mask & S32_MIN)); 1018 + /* max signed is min(sign bit) | max(other bits) */ 1019 + reg->s32_max_value = min_t(s32, reg->s32_max_value, 1020 + var32_off.value | (var32_off.mask & S32_MAX)); 1021 + reg->u32_min_value = max_t(u32, reg->u32_min_value, (u32)var32_off.value); 1022 + reg->u32_max_value = min(reg->u32_max_value, 1023 + (u32)(var32_off.value | var32_off.mask)); 1024 + } 1025 + 1026 + static void __update_reg64_bounds(struct bpf_reg_state *reg) 1013 1027 { 1014 1028 /* min signed is max(sign bit) | min(other bits) */ 1015 1029 reg->smin_value = max_t(s64, reg->smin_value, ··· 1066 992 
reg->var_off.value | reg->var_off.mask); 1067 993 } 1068 994 995 + static void __update_reg_bounds(struct bpf_reg_state *reg) 996 + { 997 + __update_reg32_bounds(reg); 998 + __update_reg64_bounds(reg); 999 + } 1000 + 1069 1001 /* Uses signed min/max values to inform unsigned, and vice-versa */ 1070 - static void __reg_deduce_bounds(struct bpf_reg_state *reg) 1002 + static void __reg32_deduce_bounds(struct bpf_reg_state *reg) 1003 + { 1004 + /* Learn sign from signed bounds. 1005 + * If we cannot cross the sign boundary, then signed and unsigned bounds 1006 + * are the same, so combine. This works even in the negative case, e.g. 1007 + * -3 s<= x s<= -1 implies 0xf...fd u<= x u<= 0xf...ff. 1008 + */ 1009 + if (reg->s32_min_value >= 0 || reg->s32_max_value < 0) { 1010 + reg->s32_min_value = reg->u32_min_value = 1011 + max_t(u32, reg->s32_min_value, reg->u32_min_value); 1012 + reg->s32_max_value = reg->u32_max_value = 1013 + min_t(u32, reg->s32_max_value, reg->u32_max_value); 1014 + return; 1015 + } 1016 + /* Learn sign from unsigned bounds. Signed bounds cross the sign 1017 + * boundary, so we must be careful. 1018 + */ 1019 + if ((s32)reg->u32_max_value >= 0) { 1020 + /* Positive. We can't learn anything from the smin, but smax 1021 + * is positive, hence safe. 1022 + */ 1023 + reg->s32_min_value = reg->u32_min_value; 1024 + reg->s32_max_value = reg->u32_max_value = 1025 + min_t(u32, reg->s32_max_value, reg->u32_max_value); 1026 + } else if ((s32)reg->u32_min_value < 0) { 1027 + /* Negative. We can't learn anything from the smax, but smin 1028 + * is negative, hence safe. 1029 + */ 1030 + reg->s32_min_value = reg->u32_min_value = 1031 + max_t(u32, reg->s32_min_value, reg->u32_min_value); 1032 + reg->s32_max_value = reg->u32_max_value; 1033 + } 1034 + } 1035 + 1036 + static void __reg64_deduce_bounds(struct bpf_reg_state *reg) 1071 1037 { 1072 1038 /* Learn sign from signed bounds. 
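The example in the `__reg32_deduce_bounds()` comment can be checked concretely: when the signed 32-bit bounds do not cross zero, the same bit patterns bound the value as unsigned numbers. A hypothetical helper, purely for illustration:

```c
#include <stdint.h>

/* Reinterpret a signed 32-bit bound as its unsigned bit pattern. */
static uint32_t s32_as_u32(int32_t s)
{
	return (uint32_t)s;
}
```

So `-3 s<= x s<= -1` really does imply `0xfffffffd u<= x u<= 0xffffffff`, which is why signed and unsigned bounds can be combined in that case.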
1073 1039 * If we cannot cross the sign boundary, then signed and unsigned bounds ··· 1141 1027 } 1142 1028 } 1143 1029 1030 + static void __reg_deduce_bounds(struct bpf_reg_state *reg) 1031 + { 1032 + __reg32_deduce_bounds(reg); 1033 + __reg64_deduce_bounds(reg); 1034 + } 1035 + 1144 1036 /* Attempts to improve var_off based on unsigned min/max information */ 1145 1037 static void __reg_bound_offset(struct bpf_reg_state *reg) 1146 1038 { 1147 - reg->var_off = tnum_intersect(reg->var_off, 1148 - tnum_range(reg->umin_value, 1149 - reg->umax_value)); 1039 + struct tnum var64_off = tnum_intersect(reg->var_off, 1040 + tnum_range(reg->umin_value, 1041 + reg->umax_value)); 1042 + struct tnum var32_off = tnum_intersect(tnum_subreg(reg->var_off), 1043 + tnum_range(reg->u32_min_value, 1044 + reg->u32_max_value)); 1045 + 1046 + reg->var_off = tnum_or(tnum_clear_subreg(var64_off), var32_off); 1150 1047 } 1151 1048 1152 - static void __reg_bound_offset32(struct bpf_reg_state *reg) 1049 + static void __reg_assign_32_into_64(struct bpf_reg_state *reg) 1153 1050 { 1154 - u64 mask = 0xffffFFFF; 1155 - struct tnum range = tnum_range(reg->umin_value & mask, 1156 - reg->umax_value & mask); 1157 - struct tnum lo32 = tnum_cast(reg->var_off, 4); 1158 - struct tnum hi32 = tnum_lshift(tnum_rshift(reg->var_off, 32), 32); 1159 - 1160 - reg->var_off = tnum_or(hi32, tnum_intersect(lo32, range)); 1051 + reg->umin_value = reg->u32_min_value; 1052 + reg->umax_value = reg->u32_max_value; 1053 + /* Attempt to pull 32-bit signed bounds into 64-bit bounds 1054 + * but must be positive otherwise set to worse case bounds 1055 + * and refine later from tnum. 
1056 + */ 1057 + if (reg->s32_min_value > 0) 1058 + reg->smin_value = reg->s32_min_value; 1059 + else 1060 + reg->smin_value = 0; 1061 + if (reg->s32_max_value > 0) 1062 + reg->smax_value = reg->s32_max_value; 1063 + else 1064 + reg->smax_value = U32_MAX; 1161 1065 } 1162 1066 1163 - /* Reset the min/max bounds of a register */ 1164 - static void __mark_reg_unbounded(struct bpf_reg_state *reg) 1067 + static void __reg_combine_32_into_64(struct bpf_reg_state *reg) 1165 1068 { 1166 - reg->smin_value = S64_MIN; 1167 - reg->smax_value = S64_MAX; 1168 - reg->umin_value = 0; 1169 - reg->umax_value = U64_MAX; 1069 + /* special case when 64-bit register has upper 32-bit register 1070 + * zeroed. Typically happens after zext or <<32, >>32 sequence 1071 + * allowing us to use 32-bit bounds directly, 1072 + */ 1073 + if (tnum_equals_const(tnum_clear_subreg(reg->var_off), 0)) { 1074 + __reg_assign_32_into_64(reg); 1075 + } else { 1076 + /* Otherwise the best we can do is push lower 32bit known and 1077 + * unknown bits into register (var_off set from jmp logic) 1078 + * then learn as much as possible from the 64-bit tnum 1079 + * known and unknown bits. The previous smin/smax bounds are 1080 + * invalid here because of jmp32 compare so mark them unknown 1081 + * so they do not impact tnum bounds calculation. 1082 + */ 1083 + __mark_reg64_unbounded(reg); 1084 + __update_reg_bounds(reg); 1085 + } 1086 + 1087 + /* Intersecting with the old var_off might have improved our bounds 1088 + * slightly. e.g. if umax was 0x7f...f and var_off was (0; 0xf...fc), 1089 + * then new var_off is (0; 0x7f...fc) which improves our umax. 
1090 + */ 1091 + __reg_deduce_bounds(reg); 1092 + __reg_bound_offset(reg); 1093 + __update_reg_bounds(reg); 1094 + } 1095 + 1096 + static bool __reg64_bound_s32(s64 a) 1097 + { 1098 + if (a > S32_MIN && a < S32_MAX) 1099 + return true; 1100 + return false; 1101 + } 1102 + 1103 + static bool __reg64_bound_u32(u64 a) 1104 + { 1105 + if (a > U32_MIN && a < U32_MAX) 1106 + return true; 1107 + return false; 1108 + } 1109 + 1110 + static void __reg_combine_64_into_32(struct bpf_reg_state *reg) 1111 + { 1112 + __mark_reg32_unbounded(reg); 1113 + 1114 + if (__reg64_bound_s32(reg->smin_value)) 1115 + reg->s32_min_value = (s32)reg->smin_value; 1116 + if (__reg64_bound_s32(reg->smax_value)) 1117 + reg->s32_max_value = (s32)reg->smax_value; 1118 + if (__reg64_bound_u32(reg->umin_value)) 1119 + reg->u32_min_value = (u32)reg->umin_value; 1120 + if (__reg64_bound_u32(reg->umax_value)) 1121 + reg->u32_max_value = (u32)reg->umax_value; 1122 + 1123 + /* Intersecting with the old var_off might have improved our bounds 1124 + * slightly. e.g. if umax was 0x7f...f and var_off was (0; 0xf...fc), 1125 + * then new var_off is (0; 0x7f...fc) which improves our umax. 1126 + */ 1127 + __reg_deduce_bounds(reg); 1128 + __reg_bound_offset(reg); 1129 + __update_reg_bounds(reg); 1170 1130 } 1171 1131 1172 1132 /* Mark a register as having a completely unknown (scalar) value. 
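`__reg_combine_64_into_32()` only reuses a 64-bit bound for the subregister when it strictly fits in 32 bits. A standalone restatement of the two range checks (mirroring `__reg64_bound_s32()`/`__reg64_bound_u32()` from the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Note the strict comparisons: a bound exactly at S32_MIN/S32_MAX (or at
 * 0/U32_MAX) is treated as not fitting, which is conservative but safe. */
static bool bound_fits_s32(int64_t a)
{
	return a > INT32_MIN && a < INT32_MAX;
}

static bool bound_fits_u32(uint64_t a)
{
	return a > 0 && a < UINT32_MAX;
}
```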
*/ ··· 2973 2785 return 0; 2974 2786 } 2975 2787 2788 + /* BPF architecture zero extends alu32 ops into 64-bit registers */ 2789 + static void zext_32_to_64(struct bpf_reg_state *reg) 2790 + { 2791 + reg->var_off = tnum_subreg(reg->var_off); 2792 + __reg_assign_32_into_64(reg); 2793 + } 2976 2794 2977 2795 /* truncate register to smaller size (in bytes) 2978 2796 * must be called with size < BPF_REG_SIZE ··· 3001 2807 } 3002 2808 reg->smin_value = reg->umin_value; 3003 2809 reg->smax_value = reg->umax_value; 2810 + 2811 + /* If size is smaller than 32bit register the 32bit register 2812 + * values are also truncated so we push 64-bit bounds into 2813 + * 32-bit bounds. Above were truncated < 32-bits already. 2814 + */ 2815 + if (size >= 4) 2816 + return; 2817 + __reg_combine_64_into_32(reg); 3004 2818 } 3005 2819 3006 2820 static bool bpf_map_is_rdonly(const struct bpf_map *map) ··· 3663 3461 expected_type = CONST_PTR_TO_MAP; 3664 3462 if (type != expected_type) 3665 3463 goto err_type; 3666 - } else if (arg_type == ARG_PTR_TO_CTX) { 3464 + } else if (arg_type == ARG_PTR_TO_CTX || 3465 + arg_type == ARG_PTR_TO_CTX_OR_NULL) { 3667 3466 expected_type = PTR_TO_CTX; 3668 - if (type != expected_type) 3669 - goto err_type; 3670 - err = check_ctx_reg(env, reg, regno); 3671 - if (err < 0) 3672 - return err; 3467 + if (!(register_is_null(reg) && 3468 + arg_type == ARG_PTR_TO_CTX_OR_NULL)) { 3469 + if (type != expected_type) 3470 + goto err_type; 3471 + err = check_ctx_reg(env, reg, regno); 3472 + if (err < 0) 3473 + return err; 3474 + } 3673 3475 } else if (arg_type == ARG_PTR_TO_SOCK_COMMON) { 3674 3476 expected_type = PTR_TO_SOCK_COMMON; 3675 3477 /* Any sk pointer can be ARG_PTR_TO_SOCK_COMMON */ ··· 3783 3577 } else if (arg_type_is_mem_size(arg_type)) { 3784 3578 bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO); 3785 3579 3786 - /* remember the mem_size which may be used later 3787 - * to refine return values. 
3580 + /* This is used to refine r0 return value bounds for helpers 3581 + * that enforce this value as an upper bound on return values. 3582 + * See do_refine_retval_range() for helpers that can refine 3583 + * the return value. C type of helper is u32 so we pull register 3584 + * bound from umax_value however, if negative verifier errors 3585 + * out. Only upper bounds can be learned because retval is an 3586 + * int type and negative retvals are allowed. 3788 3587 */ 3789 - meta->msize_smax_value = reg->smax_value; 3790 - meta->msize_umax_value = reg->umax_value; 3588 + meta->msize_max_value = reg->umax_value; 3791 3589 3792 3590 /* The register is SCALAR_VALUE; the access check 3793 3591 * happens using its boundaries. ··· 4334 4124 func_id != BPF_FUNC_probe_read_str)) 4335 4125 return; 4336 4126 4337 - ret_reg->smax_value = meta->msize_smax_value; 4338 - ret_reg->umax_value = meta->msize_umax_value; 4127 + ret_reg->smax_value = meta->msize_max_value; 4128 + ret_reg->s32_max_value = meta->msize_max_value; 4339 4129 __reg_deduce_bounds(ret_reg); 4340 4130 __reg_bound_offset(ret_reg); 4131 + __update_reg_bounds(ret_reg); 4341 4132 } 4342 4133 4343 4134 static int ··· 4645 4434 return res < a; 4646 4435 } 4647 4436 4648 - static bool signed_sub_overflows(s64 a, s64 b) 4437 + static bool signed_add32_overflows(s32 a, s32 b) 4438 + { 4439 + /* Do the add in u32, where overflow is well-defined */ 4440 + s32 res = (s32)((u32)a + (u32)b); 4441 + 4442 + if (b < 0) 4443 + return res > a; 4444 + return res < a; 4445 + } 4446 + 4447 + static bool signed_sub_overflows(s64 a, s64 b) 4649 4448 { 4650 4449 /* Do the sub in u64, where overflow is well-defined */ 4651 4450 s64 res = (s64)((u64)a - (u64)b); 4451 + 4452 + if (b < 0) 4453 + return res < a; 4454 + return res > a; 4455 + } 4456 + 4457 + static bool signed_sub32_overflows(s32 a, s32 b) 4458 + { 4459 + /* Do the sub in u32, where overflow is well-defined */ 4460 + s32 res = (s32)((u32)a - (u32)b); 4652 4461
if (b < 0) 4654 4463 return res < a; ··· 4911 4680 !check_reg_sane_offset(env, ptr_reg, ptr_reg->type)) 4912 4681 return -EINVAL; 4913 4682 4683 + /* pointer types do not carry 32-bit bounds at the moment. */ 4684 + __mark_reg32_unbounded(dst_reg); 4685 + 4914 4686 switch (opcode) { 4915 4687 case BPF_ADD: 4916 4688 ret = sanitize_ptr_alu(env, insn, ptr_reg, dst_reg, smin_val < 0); ··· 5077 4843 return 0; 5078 4844 } 5079 4845 4846 + static void scalar32_min_max_add(struct bpf_reg_state *dst_reg, 4847 + struct bpf_reg_state *src_reg) 4848 + { 4849 + s32 smin_val = src_reg->s32_min_value; 4850 + s32 smax_val = src_reg->s32_max_value; 4851 + u32 umin_val = src_reg->u32_min_value; 4852 + u32 umax_val = src_reg->u32_max_value; 4853 + 4854 + if (signed_add32_overflows(dst_reg->s32_min_value, smin_val) || 4855 + signed_add32_overflows(dst_reg->s32_max_value, smax_val)) { 4856 + dst_reg->s32_min_value = S32_MIN; 4857 + dst_reg->s32_max_value = S32_MAX; 4858 + } else { 4859 + dst_reg->s32_min_value += smin_val; 4860 + dst_reg->s32_max_value += smax_val; 4861 + } 4862 + if (dst_reg->u32_min_value + umin_val < umin_val || 4863 + dst_reg->u32_max_value + umax_val < umax_val) { 4864 + dst_reg->u32_min_value = 0; 4865 + dst_reg->u32_max_value = U32_MAX; 4866 + } else { 4867 + dst_reg->u32_min_value += umin_val; 4868 + dst_reg->u32_max_value += umax_val; 4869 + } 4870 + } 4871 + 4872 + static void scalar_min_max_add(struct bpf_reg_state *dst_reg, 4873 + struct bpf_reg_state *src_reg) 4874 + { 4875 + s64 smin_val = src_reg->smin_value; 4876 + s64 smax_val = src_reg->smax_value; 4877 + u64 umin_val = src_reg->umin_value; 4878 + u64 umax_val = src_reg->umax_value; 4879 + 4880 + if (signed_add_overflows(dst_reg->smin_value, smin_val) || 4881 + signed_add_overflows(dst_reg->smax_value, smax_val)) { 4882 + dst_reg->smin_value = S64_MIN; 4883 + dst_reg->smax_value = S64_MAX; 4884 + } else { 4885 + dst_reg->smin_value += smin_val; 4886 + dst_reg->smax_value += smax_val; 4887 + } 4888 + 
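The new 32-bit overflow predicates use the standard trick of doing the arithmetic in unsigned types (where wraparound is well defined) and then checking which direction the result moved. A self-contained version of the add case, shaped like `signed_add32_overflows()`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the sum with well-defined u32 wraparound; a signed add
 * overflowed iff the result moved the "wrong way" relative to b. */
static bool add32_overflows(int32_t a, int32_t b)
{
	int32_t res = (int32_t)((uint32_t)a + (uint32_t)b);

	if (b < 0)
		return res > a;
	return res < a;
}
```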
if (dst_reg->umin_value + umin_val < umin_val || 4889 + dst_reg->umax_value + umax_val < umax_val) { 4890 + dst_reg->umin_value = 0; 4891 + dst_reg->umax_value = U64_MAX; 4892 + } else { 4893 + dst_reg->umin_value += umin_val; 4894 + dst_reg->umax_value += umax_val; 4895 + } 4896 + } 4897 + 4898 + static void scalar32_min_max_sub(struct bpf_reg_state *dst_reg, 4899 + struct bpf_reg_state *src_reg) 4900 + { 4901 + s32 smin_val = src_reg->s32_min_value; 4902 + s32 smax_val = src_reg->s32_max_value; 4903 + u32 umin_val = src_reg->u32_min_value; 4904 + u32 umax_val = src_reg->u32_max_value; 4905 + 4906 + if (signed_sub32_overflows(dst_reg->s32_min_value, smax_val) || 4907 + signed_sub32_overflows(dst_reg->s32_max_value, smin_val)) { 4908 + /* Overflow possible, we know nothing */ 4909 + dst_reg->s32_min_value = S32_MIN; 4910 + dst_reg->s32_max_value = S32_MAX; 4911 + } else { 4912 + dst_reg->s32_min_value -= smax_val; 4913 + dst_reg->s32_max_value -= smin_val; 4914 + } 4915 + if (dst_reg->u32_min_value < umax_val) { 4916 + /* Overflow possible, we know nothing */ 4917 + dst_reg->u32_min_value = 0; 4918 + dst_reg->u32_max_value = U32_MAX; 4919 + } else { 4920 + /* Cannot overflow (as long as bounds are consistent) */ 4921 + dst_reg->u32_min_value -= umax_val; 4922 + dst_reg->u32_max_value -= umin_val; 4923 + } 4924 + } 4925 + 4926 + static void scalar_min_max_sub(struct bpf_reg_state *dst_reg, 4927 + struct bpf_reg_state *src_reg) 4928 + { 4929 + s64 smin_val = src_reg->smin_value; 4930 + s64 smax_val = src_reg->smax_value; 4931 + u64 umin_val = src_reg->umin_value; 4932 + u64 umax_val = src_reg->umax_value; 4933 + 4934 + if (signed_sub_overflows(dst_reg->smin_value, smax_val) || 4935 + signed_sub_overflows(dst_reg->smax_value, smin_val)) { 4936 + /* Overflow possible, we know nothing */ 4937 + dst_reg->smin_value = S64_MIN; 4938 + dst_reg->smax_value = S64_MAX; 4939 + } else { 4940 + dst_reg->smin_value -= smax_val; 4941 + dst_reg->smax_value -= smin_val; 4942 + } 4943 
+ if (dst_reg->umin_value < umax_val) { 4944 + /* Overflow possible, we know nothing */ 4945 + dst_reg->umin_value = 0; 4946 + dst_reg->umax_value = U64_MAX; 4947 + } else { 4948 + /* Cannot overflow (as long as bounds are consistent) */ 4949 + dst_reg->umin_value -= umax_val; 4950 + dst_reg->umax_value -= umin_val; 4951 + } 4952 + } 4953 + 4954 + static void scalar32_min_max_mul(struct bpf_reg_state *dst_reg, 4955 + struct bpf_reg_state *src_reg) 4956 + { 4957 + s32 smin_val = src_reg->s32_min_value; 4958 + u32 umin_val = src_reg->u32_min_value; 4959 + u32 umax_val = src_reg->u32_max_value; 4960 + 4961 + if (smin_val < 0 || dst_reg->s32_min_value < 0) { 4962 + /* Ain't nobody got time to multiply that sign */ 4963 + __mark_reg32_unbounded(dst_reg); 4964 + return; 4965 + } 4966 + /* Both values are positive, so we can work with unsigned and 4967 + * copy the result to signed (unless it exceeds S32_MAX). 4968 + */ 4969 + if (umax_val > U16_MAX || dst_reg->u32_max_value > U16_MAX) { 4970 + /* Potential overflow, we know nothing */ 4971 + __mark_reg32_unbounded(dst_reg); 4972 + return; 4973 + } 4974 + dst_reg->u32_min_value *= umin_val; 4975 + dst_reg->u32_max_value *= umax_val; 4976 + if (dst_reg->u32_max_value > S32_MAX) { 4977 + /* Overflow possible, we know nothing */ 4978 + dst_reg->s32_min_value = S32_MIN; 4979 + dst_reg->s32_max_value = S32_MAX; 4980 + } else { 4981 + dst_reg->s32_min_value = dst_reg->u32_min_value; 4982 + dst_reg->s32_max_value = dst_reg->u32_max_value; 4983 + } 4984 + } 4985 + 4986 + static void scalar_min_max_mul(struct bpf_reg_state *dst_reg, 4987 + struct bpf_reg_state *src_reg) 4988 + { 4989 + s64 smin_val = src_reg->smin_value; 4990 + u64 umin_val = src_reg->umin_value; 4991 + u64 umax_val = src_reg->umax_value; 4992 + 4993 + if (smin_val < 0 || dst_reg->smin_value < 0) { 4994 + /* Ain't nobody got time to multiply that sign */ 4995 + __mark_reg64_unbounded(dst_reg); 4996 + return; 4997 + } 4998 + /* Both values are positive, so we can 
work with unsigned and 4999 + * copy the result to signed (unless it exceeds S64_MAX). 5000 + */ 5001 + if (umax_val > U32_MAX || dst_reg->umax_value > U32_MAX) { 5002 + /* Potential overflow, we know nothing */ 5003 + __mark_reg64_unbounded(dst_reg); 5004 + return; 5005 + } 5006 + dst_reg->umin_value *= umin_val; 5007 + dst_reg->umax_value *= umax_val; 5008 + if (dst_reg->umax_value > S64_MAX) { 5009 + /* Overflow possible, we know nothing */ 5010 + dst_reg->smin_value = S64_MIN; 5011 + dst_reg->smax_value = S64_MAX; 5012 + } else { 5013 + dst_reg->smin_value = dst_reg->umin_value; 5014 + dst_reg->smax_value = dst_reg->umax_value; 5015 + } 5016 + } 5017 + 5018 + static void scalar32_min_max_and(struct bpf_reg_state *dst_reg, 5019 + struct bpf_reg_state *src_reg) 5020 + { 5021 + bool src_known = tnum_subreg_is_const(src_reg->var_off); 5022 + bool dst_known = tnum_subreg_is_const(dst_reg->var_off); 5023 + struct tnum var32_off = tnum_subreg(dst_reg->var_off); 5024 + s32 smin_val = src_reg->s32_min_value; 5025 + u32 umax_val = src_reg->u32_max_value; 5026 + 5027 + /* Assuming scalar64_min_max_and will be called so its safe 5028 + * to skip updating register for known 32-bit case. 5029 + */ 5030 + if (src_known && dst_known) 5031 + return; 5032 + 5033 + /* We get our minimum from the var_off, since that's inherently 5034 + * bitwise. Our maximum is the minimum of the operands' maxima. 5035 + */ 5036 + dst_reg->u32_min_value = var32_off.value; 5037 + dst_reg->u32_max_value = min(dst_reg->u32_max_value, umax_val); 5038 + if (dst_reg->s32_min_value < 0 || smin_val < 0) { 5039 + /* Lose signed bounds when ANDing negative numbers, 5040 + * ain't nobody got time for that. 5041 + */ 5042 + dst_reg->s32_min_value = S32_MIN; 5043 + dst_reg->s32_max_value = S32_MAX; 5044 + } else { 5045 + /* ANDing two positives gives a positive, so safe to 5046 + * cast result into s64. 
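The 32-bit multiply path gives up on bounds tracking once either operand's unsigned maximum can exceed `U16_MAX`. The guard works because the largest permitted product still fits in 32 bits; a quick restatement of that arithmetic (helper name is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the guard in scalar32_min_max_mul(): bounds are only kept
 * when both unsigned maxima fit in 16 bits, so umax * umax <= U32_MAX
 * and the bound multiplication itself cannot wrap. */
static bool mul32_bounds_trackable(uint32_t a_umax, uint32_t b_umax)
{
	return a_umax <= UINT16_MAX && b_umax <= UINT16_MAX;
}
```

The 64-bit variant applies the same idea one size up, capping operands at `U32_MAX`.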
5047 + */ 5048 + dst_reg->s32_min_value = dst_reg->u32_min_value; 5049 + dst_reg->s32_max_value = dst_reg->u32_max_value; 5050 + } 5051 + 5052 + } 5053 + 5054 + static void scalar_min_max_and(struct bpf_reg_state *dst_reg, 5055 + struct bpf_reg_state *src_reg) 5056 + { 5057 + bool src_known = tnum_is_const(src_reg->var_off); 5058 + bool dst_known = tnum_is_const(dst_reg->var_off); 5059 + s64 smin_val = src_reg->smin_value; 5060 + u64 umax_val = src_reg->umax_value; 5061 + 5062 + if (src_known && dst_known) { 5063 + __mark_reg_known(dst_reg, dst_reg->var_off.value & 5064 + src_reg->var_off.value); 5065 + return; 5066 + } 5067 + 5068 + /* We get our minimum from the var_off, since that's inherently 5069 + * bitwise. Our maximum is the minimum of the operands' maxima. 5070 + */ 5071 + dst_reg->umin_value = dst_reg->var_off.value; 5072 + dst_reg->umax_value = min(dst_reg->umax_value, umax_val); 5073 + if (dst_reg->smin_value < 0 || smin_val < 0) { 5074 + /* Lose signed bounds when ANDing negative numbers, 5075 + * ain't nobody got time for that. 5076 + */ 5077 + dst_reg->smin_value = S64_MIN; 5078 + dst_reg->smax_value = S64_MAX; 5079 + } else { 5080 + /* ANDing two positives gives a positive, so safe to 5081 + * cast result into s64. 5082 + */ 5083 + dst_reg->smin_value = dst_reg->umin_value; 5084 + dst_reg->smax_value = dst_reg->umax_value; 5085 + } 5086 + /* We may learn something more from the var_off */ 5087 + __update_reg_bounds(dst_reg); 5088 + } 5089 + 5090 + static void scalar32_min_max_or(struct bpf_reg_state *dst_reg, 5091 + struct bpf_reg_state *src_reg) 5092 + { 5093 + bool src_known = tnum_subreg_is_const(src_reg->var_off); 5094 + bool dst_known = tnum_subreg_is_const(dst_reg->var_off); 5095 + struct tnum var32_off = tnum_subreg(dst_reg->var_off); 5096 + s32 smin_val = src_reg->s32_min_value; 5097 + u32 umin_val = src_reg->u32_min_value; 5098 + 5099 + /* Assuming scalar64_min_max_or will be called so it is safe 5100 + * to skip updating register for known case. 
5101 + */ 5102 + if (src_known && dst_known) 5103 + return; 5104 + 5105 + /* We get our maximum from the var_off, and our minimum is the 5106 + * maximum of the operands' minima 5107 + */ 5108 + dst_reg->u32_min_value = max(dst_reg->u32_min_value, umin_val); 5109 + dst_reg->u32_max_value = var32_off.value | var32_off.mask; 5110 + if (dst_reg->s32_min_value < 0 || smin_val < 0) { 5111 + /* Lose signed bounds when ORing negative numbers, 5112 + * ain't nobody got time for that. 5113 + */ 5114 + dst_reg->s32_min_value = S32_MIN; 5115 + dst_reg->s32_max_value = S32_MAX; 5116 + } else { 5117 + /* ORing two positives gives a positive, so safe to 5118 + * cast result into s32. 5119 + */ 5120 + dst_reg->s32_min_value = dst_reg->u32_min_value; 5121 + dst_reg->s32_max_value = dst_reg->u32_max_value; 5122 + } 5123 + } 5124 + 5125 + static void scalar_min_max_or(struct bpf_reg_state *dst_reg, 5126 + struct bpf_reg_state *src_reg) 5127 + { 5128 + bool src_known = tnum_is_const(src_reg->var_off); 5129 + bool dst_known = tnum_is_const(dst_reg->var_off); 5130 + s64 smin_val = src_reg->smin_value; 5131 + u64 umin_val = src_reg->umin_value; 5132 + 5133 + if (src_known && dst_known) { 5134 + __mark_reg_known(dst_reg, dst_reg->var_off.value | 5135 + src_reg->var_off.value); 5136 + return; 5137 + } 5138 + 5139 + /* We get our maximum from the var_off, and our minimum is the 5140 + * maximum of the operands' minima 5141 + */ 5142 + dst_reg->umin_value = max(dst_reg->umin_value, umin_val); 5143 + dst_reg->umax_value = dst_reg->var_off.value | dst_reg->var_off.mask; 5144 + if (dst_reg->smin_value < 0 || smin_val < 0) { 5145 + /* Lose signed bounds when ORing negative numbers, 5146 + * ain't nobody got time for that. 5147 + */ 5148 + dst_reg->smin_value = S64_MIN; 5149 + dst_reg->smax_value = S64_MAX; 5150 + } else { 5151 + /* ORing two positives gives a positive, so safe to 5152 + * cast result into s64. 
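For OR, the unsigned bounds fall straight out of the tnum: every guaranteed-one bit is a floor, and every guaranteed-one bit plus every unknown bit is a ceiling. A sketch of those two bounds (standalone struct, illustrative helper names):

```c
#include <stdint.h>

struct tnum { uint64_t value; uint64_t mask; };

/* After an OR, value holds the guaranteed-one bits and mask the unknown
 * ones, so value is a lower bound and value | mask an upper bound. */
static uint64_t tnum_or_umin(struct tnum t) { return t.value; }
static uint64_t tnum_or_umax(struct tnum t) { return t.value | t.mask; }
```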
5153 + */ 5154 + dst_reg->smin_value = dst_reg->umin_value; 5155 + dst_reg->smax_value = dst_reg->umax_value; 5156 + } 5157 + /* We may learn something more from the var_off */ 5158 + __update_reg_bounds(dst_reg); 5159 + } 5160 + 5161 + static void __scalar32_min_max_lsh(struct bpf_reg_state *dst_reg, 5162 + u64 umin_val, u64 umax_val) 5163 + { 5164 + /* We lose all sign bit information (except what we can pick 5165 + * up from var_off) 5166 + */ 5167 + dst_reg->s32_min_value = S32_MIN; 5168 + dst_reg->s32_max_value = S32_MAX; 5169 + /* If we might shift our top bit out, then we know nothing */ 5170 + if (umax_val > 31 || dst_reg->u32_max_value > 1ULL << (31 - umax_val)) { 5171 + dst_reg->u32_min_value = 0; 5172 + dst_reg->u32_max_value = U32_MAX; 5173 + } else { 5174 + dst_reg->u32_min_value <<= umin_val; 5175 + dst_reg->u32_max_value <<= umax_val; 5176 + } 5177 + } 5178 + 5179 + static void scalar32_min_max_lsh(struct bpf_reg_state *dst_reg, 5180 + struct bpf_reg_state *src_reg) 5181 + { 5182 + u32 umax_val = src_reg->u32_max_value; 5183 + u32 umin_val = src_reg->u32_min_value; 5184 + /* u32 alu operation will zext upper bits */ 5185 + struct tnum subreg = tnum_subreg(dst_reg->var_off); 5186 + 5187 + __scalar32_min_max_lsh(dst_reg, umin_val, umax_val); 5188 + dst_reg->var_off = tnum_subreg(tnum_lshift(subreg, umin_val)); 5189 + /* Not required but being careful mark reg64 bounds as unknown so 5190 + * that we are forced to pick them up from tnum and zext later and 5191 + * if some path skips this step we are still safe. 5192 + */ 5193 + __mark_reg64_unbounded(dst_reg); 5194 + __update_reg32_bounds(dst_reg); 5195 + } 5196 + 5197 + static void __scalar64_min_max_lsh(struct bpf_reg_state *dst_reg, 5198 + u64 umin_val, u64 umax_val) 5199 + { 5200 + /* Special case <<32 because it is a common compiler pattern to sign 5201 + * extend subreg by doing <<32 s>>32. 
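The `<<32 s>>32` pattern that the shift code special-cases really is a 32-to-64 sign extension, which is why positive 32-bit bounds survive it. Shown directly (the left shift is done unsigned to avoid C undefined behaviour; an arithmetic right shift is assumed, as on the compilers and targets relevant here):

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of x into 64 bits via <<32 then s>>32.
 * Right-shifting a negative value is implementation-defined in C,
 * but arithmetic on gcc/clang. */
static int64_t sext_low32(uint64_t x)
{
	return (int64_t)(x << 32) >> 32;
}
```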
In this case if 32bit bounds are 5202 + * positive we know this shift will also be positive so we can track 5203 + * bounds correctly. Otherwise we lose all sign bit information except 5204 + * what we can pick up from var_off. Perhaps we can generalize this 5205 + * later to shifts of any length. 5206 + */ 5207 + if (umin_val == 32 && umax_val == 32 && dst_reg->s32_max_value >= 0) 5208 + dst_reg->smax_value = (s64)dst_reg->s32_max_value << 32; 5209 + else 5210 + dst_reg->smax_value = S64_MAX; 5211 + 5212 + if (umin_val == 32 && umax_val == 32 && dst_reg->s32_min_value >= 0) 5213 + dst_reg->smin_value = (s64)dst_reg->s32_min_value << 32; 5214 + else 5215 + dst_reg->smin_value = S64_MIN; 5216 + 5217 + /* If we might shift our top bit out, then we know nothing */ 5218 + if (dst_reg->umax_value > 1ULL << (63 - umax_val)) { 5219 + dst_reg->umin_value = 0; 5220 + dst_reg->umax_value = U64_MAX; 5221 + } else { 5222 + dst_reg->umin_value <<= umin_val; 5223 + dst_reg->umax_value <<= umax_val; 5224 + } 5225 + } 5226 + 5227 + static void scalar_min_max_lsh(struct bpf_reg_state *dst_reg, 5228 + struct bpf_reg_state *src_reg) 5229 + { 5230 + u64 umax_val = src_reg->umax_value; 5231 + u64 umin_val = src_reg->umin_value; 5232 + 5233 + /* scalar64 calc uses 32bit unshifted bounds so must be called first */ 5234 + __scalar64_min_max_lsh(dst_reg, umin_val, umax_val); 5235 + __scalar32_min_max_lsh(dst_reg, umin_val, umax_val); 5236 + 5237 + dst_reg->var_off = tnum_lshift(dst_reg->var_off, umin_val); 5238 + /* We may learn something more from the var_off */ 5239 + __update_reg_bounds(dst_reg); 5240 + } 5241 + 5242 + static void scalar32_min_max_rsh(struct bpf_reg_state *dst_reg, 5243 + struct bpf_reg_state *src_reg) 5244 + { 5245 + struct tnum subreg = tnum_subreg(dst_reg->var_off); 5246 + u32 umax_val = src_reg->u32_max_value; 5247 + u32 umin_val = src_reg->u32_min_value; 5248 + 5249 + /* BPF_RSH is an unsigned shift. 
If the value in dst_reg might 5250 + * be negative, then either: 5251 + * 1) src_reg might be zero, so the sign bit of the result is 5252 + * unknown, so we lose our signed bounds 5253 + * 2) it's known negative, thus the unsigned bounds capture the 5254 + * signed bounds 5255 + * 3) the signed bounds cross zero, so they tell us nothing 5256 + * about the result 5257 + * If the value in dst_reg is known nonnegative, then again the 5258 + * unsigned bounds capture the signed bounds. 5259 + * Thus, in all cases it suffices to blow away our signed bounds 5260 + * and rely on inferring new ones from the unsigned bounds and 5261 + * var_off of the result. 5262 + */ 5263 + dst_reg->s32_min_value = S32_MIN; 5264 + dst_reg->s32_max_value = S32_MAX; 5265 + 5266 + dst_reg->var_off = tnum_rshift(subreg, umin_val); 5267 + dst_reg->u32_min_value >>= umax_val; 5268 + dst_reg->u32_max_value >>= umin_val; 5269 + 5270 + __mark_reg64_unbounded(dst_reg); 5271 + __update_reg32_bounds(dst_reg); 5272 + } 5273 + 5274 + static void scalar_min_max_rsh(struct bpf_reg_state *dst_reg, 5275 + struct bpf_reg_state *src_reg) 5276 + { 5277 + u64 umax_val = src_reg->umax_value; 5278 + u64 umin_val = src_reg->umin_value; 5279 + 5280 + /* BPF_RSH is an unsigned shift. If the value in dst_reg might 5281 + * be negative, then either: 5282 + * 1) src_reg might be zero, so the sign bit of the result is 5283 + * unknown, so we lose our signed bounds 5284 + * 2) it's known negative, thus the unsigned bounds capture the 5285 + * signed bounds 5286 + * 3) the signed bounds cross zero, so they tell us nothing 5287 + * about the result 5288 + * If the value in dst_reg is known nonnegative, then again the 5289 + * unsigned bounds capture the signed bounds. 5290 + * Thus, in all cases it suffices to blow away our signed bounds 5291 + * and rely on inferring new ones from the unsigned bounds and 5292 + * var_off of the result. 
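Why `BPF_RSH` blows away the signed bounds: it is a logical shift, so a negative bit pattern turns into a large positive one and the old signed bounds say nothing about the result. Concretely:

```c
#include <stdint.h>

/* BPF_RSH semantics on a 32-bit subregister: logical (unsigned) shift. */
static uint32_t rsh32(uint32_t x, unsigned int shift)
{
	return x >> shift;
}
```

For example, the pattern for -2 shifted right by one becomes `0x7fffffff`, the largest positive s32.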
5293 + */ 5294 + dst_reg->smin_value = S64_MIN; 5295 + dst_reg->smax_value = S64_MAX; 5296 + dst_reg->var_off = tnum_rshift(dst_reg->var_off, umin_val); 5297 + dst_reg->umin_value >>= umax_val; 5298 + dst_reg->umax_value >>= umin_val; 5299 + 5300 + /* Its not easy to operate on alu32 bounds here because it depends 5301 + * on bits being shifted in. Take easy way out and mark unbounded 5302 + * so we can recalculate later from tnum. 5303 + */ 5304 + __mark_reg32_unbounded(dst_reg); 5305 + __update_reg_bounds(dst_reg); 5306 + } 5307 + 5308 + static void scalar32_min_max_arsh(struct bpf_reg_state *dst_reg, 5309 + struct bpf_reg_state *src_reg) 5310 + { 5311 + u64 umin_val = src_reg->u32_min_value; 5312 + 5313 + /* Upon reaching here, src_known is true and 5314 + * umax_val is equal to umin_val. 5315 + */ 5316 + dst_reg->s32_min_value = (u32)(((s32)dst_reg->s32_min_value) >> umin_val); 5317 + dst_reg->s32_max_value = (u32)(((s32)dst_reg->s32_max_value) >> umin_val); 5318 + 5319 + dst_reg->var_off = tnum_arshift(tnum_subreg(dst_reg->var_off), umin_val, 32); 5320 + 5321 + /* blow away the dst_reg umin_value/umax_value and rely on 5322 + * dst_reg var_off to refine the result. 5323 + */ 5324 + dst_reg->u32_min_value = 0; 5325 + dst_reg->u32_max_value = U32_MAX; 5326 + 5327 + __mark_reg64_unbounded(dst_reg); 5328 + __update_reg32_bounds(dst_reg); 5329 + } 5330 + 5331 + static void scalar_min_max_arsh(struct bpf_reg_state *dst_reg, 5332 + struct bpf_reg_state *src_reg) 5333 + { 5334 + u64 umin_val = src_reg->umin_value; 5335 + 5336 + /* Upon reaching here, src_known is true and umax_val is equal 5337 + * to umin_val. 5338 + */ 5339 + dst_reg->smin_value >>= umin_val; 5340 + dst_reg->smax_value >>= umin_val; 5341 + 5342 + dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val, 64); 5343 + 5344 + /* blow away the dst_reg umin_value/umax_value and rely on 5345 + * dst_reg var_off to refine the result. 
5346 + */ 5347 + dst_reg->umin_value = 0; 5348 + dst_reg->umax_value = U64_MAX; 5349 + 5350 + /* It's not easy to operate on alu32 bounds here because it depends 5351 + * on bits being shifted in from upper 32-bits. Take easy way out 5352 + * and mark unbounded so we can recalculate later from tnum. 5353 + */ 5354 + __mark_reg32_unbounded(dst_reg); 5355 + __update_reg_bounds(dst_reg); 5356 + } 5357 + 5080 5358 /* WARNING: This function does calculations on 64-bit values, but the actual 5081 5359 * execution may occur on 32-bit values. Therefore, things like bitshifts 5082 5360 * need extra checks in the 32-bit case. ··· 5603 4857 bool src_known, dst_known; 5604 4858 s64 smin_val, smax_val; 5605 4859 u64 umin_val, umax_val; 4860 + s32 s32_min_val, s32_max_val; 4861 + u32 u32_min_val, u32_max_val; 5606 4862 u64 insn_bitness = (BPF_CLASS(insn->code) == BPF_ALU64) ? 64 : 32; 5607 4863 u32 dst = insn->dst_reg; 5608 4864 int ret; 5609 - 5610 - if (insn_bitness == 32) { 5611 - /* Relevant for 32-bit RSH: Information can propagate towards 5612 - * LSB, so it isn't sufficient to only truncate the output to 5613 - * 32 bits. 5614 - */ 5615 - coerce_reg_to_size(dst_reg, 4); 5616 - coerce_reg_to_size(&src_reg, 4); 5617 - } 4865 + bool alu32 = (BPF_CLASS(insn->code) != BPF_ALU64); 5618 4866 5619 4867 smin_val = src_reg.smin_value; 5620 4868 smax_val = src_reg.smax_value; 5621 4869 umin_val = src_reg.umin_value; 5622 4870 umax_val = src_reg.umax_value; 5623 - src_known = tnum_is_const(src_reg.var_off); 5624 - dst_known = tnum_is_const(dst_reg->var_off); 5625 4871 5626 - if ((src_known && (smin_val != smax_val || umin_val != umax_val)) || 5627 - smin_val > smax_val || umin_val > umax_val) { 5628 - /* Taint dst register if offset had invalid bounds derived from 5629 - * e.g. dead branches. 
5630 - */ 5631 - __mark_reg_unknown(env, dst_reg); 5632 - return 0; 4872 + s32_min_val = src_reg.s32_min_value; 4873 + s32_max_val = src_reg.s32_max_value; 4874 + u32_min_val = src_reg.u32_min_value; 4875 + u32_max_val = src_reg.u32_max_value; 4876 + 4877 + if (alu32) { 4878 + src_known = tnum_subreg_is_const(src_reg.var_off); 4879 + dst_known = tnum_subreg_is_const(dst_reg->var_off); 4880 + if ((src_known && 4881 + (s32_min_val != s32_max_val || u32_min_val != u32_max_val)) || 4882 + s32_min_val > s32_max_val || u32_min_val > u32_max_val) { 4883 + /* Taint dst register if offset had invalid bounds 4884 + * derived from e.g. dead branches. 4885 + */ 4886 + __mark_reg_unknown(env, dst_reg); 4887 + return 0; 4888 + } 4889 + } else { 4890 + src_known = tnum_is_const(src_reg.var_off); 4891 + dst_known = tnum_is_const(dst_reg->var_off); 4892 + if ((src_known && 4893 + (smin_val != smax_val || umin_val != umax_val)) || 4894 + smin_val > smax_val || umin_val > umax_val) { 4895 + /* Taint dst register if offset had invalid bounds 4896 + * derived from e.g. dead branches. 4897 + */ 4898 + __mark_reg_unknown(env, dst_reg); 4899 + return 0; 4900 + } 5633 4901 } 5634 4902 5635 4903 if (!src_known && ··· 5652 4892 return 0; 5653 4893 } 5654 4894 4895 + /* Calculate sign/unsigned bounds and tnum for alu32 and alu64 bit ops. 4896 + * There are two classes of instructions: For the first class we track 4897 + * both alu32 and alu64 sign/unsigned bounds independently; this provides 4898 + * the greatest amount of precision when alu operations are mixed with 4899 + * jmp32 operations. These operations are BPF_ADD, BPF_SUB, BPF_MUL, 4900 + * BPF_AND, and BPF_OR. This is possible because these ops have fairly 4901 + * easy to understand and calculate behavior in both 32-bit and 64-bit 4902 + * alu ops. See alu32 verifier tests for examples. 
The second class of 4903 + * operations, BPF_LSH, BPF_RSH, and BPF_ARSH, however are not so easy 4904 + * with regards to tracking sign/unsigned bounds because the bits may 4905 + * cross subreg boundaries in the alu64 case. When this happens we mark 4906 + * the reg unbounded in the subreg bound space and use the resulting 4907 + * tnum to calculate an approximation of the sign/unsigned bounds. 4908 + */ 5655 4909 switch (opcode) { 5656 4910 case BPF_ADD: 5657 4911 ret = sanitize_val_alu(env, insn); ··· 5673 4899 verbose(env, "R%d tried to add from different pointers or scalars\n", dst); 5674 4900 return ret; 5675 4901 } 5676 - if (signed_add_overflows(dst_reg->smin_value, smin_val) || 5677 - signed_add_overflows(dst_reg->smax_value, smax_val)) { 5678 - dst_reg->smin_value = S64_MIN; 5679 - dst_reg->smax_value = S64_MAX; 5680 - } else { 5681 - dst_reg->smin_value += smin_val; 5682 - dst_reg->smax_value += smax_val; 5683 - } 5684 - if (dst_reg->umin_value + umin_val < umin_val || 5685 - dst_reg->umax_value + umax_val < umax_val) { 5686 - dst_reg->umin_value = 0; 5687 - dst_reg->umax_value = U64_MAX; 5688 - } else { 5689 - dst_reg->umin_value += umin_val; 5690 - dst_reg->umax_value += umax_val; 5691 - } 4902 + scalar32_min_max_add(dst_reg, &src_reg); 4903 + scalar_min_max_add(dst_reg, &src_reg); 5692 4904 dst_reg->var_off = tnum_add(dst_reg->var_off, src_reg.var_off); 5693 4905 break; 5694 4906 case BPF_SUB: ··· 5683 4923 verbose(env, "R%d tried to sub from different pointers or scalars\n", dst); 5684 4924 return ret; 5685 4925 } 5686 - if (signed_sub_overflows(dst_reg->smin_value, smax_val) || 5687 - signed_sub_overflows(dst_reg->smax_value, smin_val)) { 5688 - /* Overflow possible, we know nothing */ 5689 - dst_reg->smin_value = S64_MIN; 5690 - dst_reg->smax_value = S64_MAX; 5691 - } else { 5692 - dst_reg->smin_value -= smax_val; 5693 - dst_reg->smax_value -= smin_val; 5694 - } 5695 - if (dst_reg->umin_value < umax_val) { 5696 - /* Overflow possible, we know 
nothing */ 5697 - dst_reg->umin_value = 0; 5698 - dst_reg->umax_value = U64_MAX; 5699 - } else { 5700 - /* Cannot overflow (as long as bounds are consistent) */ 5701 - dst_reg->umin_value -= umax_val; 5702 - dst_reg->umax_value -= umin_val; 5703 - } 4926 + scalar32_min_max_sub(dst_reg, &src_reg); 4927 + scalar_min_max_sub(dst_reg, &src_reg); 5704 4928 dst_reg->var_off = tnum_sub(dst_reg->var_off, src_reg.var_off); 5705 4929 break; 5706 4930 case BPF_MUL: 5707 4931 dst_reg->var_off = tnum_mul(dst_reg->var_off, src_reg.var_off); 5708 - if (smin_val < 0 || dst_reg->smin_value < 0) { 5709 - /* Ain't nobody got time to multiply that sign */ 5710 - __mark_reg_unbounded(dst_reg); 5711 - __update_reg_bounds(dst_reg); 5712 - break; 5713 - } 5714 - /* Both values are positive, so we can work with unsigned and 5715 - * copy the result to signed (unless it exceeds S64_MAX). 5716 - */ 5717 - if (umax_val > U32_MAX || dst_reg->umax_value > U32_MAX) { 5718 - /* Potential overflow, we know nothing */ 5719 - __mark_reg_unbounded(dst_reg); 5720 - /* (except what we can learn from the var_off) */ 5721 - __update_reg_bounds(dst_reg); 5722 - break; 5723 - } 5724 - dst_reg->umin_value *= umin_val; 5725 - dst_reg->umax_value *= umax_val; 5726 - if (dst_reg->umax_value > S64_MAX) { 5727 - /* Overflow possible, we know nothing */ 5728 - dst_reg->smin_value = S64_MIN; 5729 - dst_reg->smax_value = S64_MAX; 5730 - } else { 5731 - dst_reg->smin_value = dst_reg->umin_value; 5732 - dst_reg->smax_value = dst_reg->umax_value; 5733 - } 4932 + scalar32_min_max_mul(dst_reg, &src_reg); 4933 + scalar_min_max_mul(dst_reg, &src_reg); 5734 4934 break; 5735 4935 case BPF_AND: 5736 - if (src_known && dst_known) { 5737 - __mark_reg_known(dst_reg, dst_reg->var_off.value & 5738 - src_reg.var_off.value); 5739 - break; 5740 - } 5741 - /* We get our minimum from the var_off, since that's inherently 5742 - * bitwise. Our maximum is the minimum of the operands' maxima. 
5743 - */ 5744 4936 dst_reg->var_off = tnum_and(dst_reg->var_off, src_reg.var_off); 5745 - dst_reg->umin_value = dst_reg->var_off.value; 5746 - dst_reg->umax_value = min(dst_reg->umax_value, umax_val); 5747 - if (dst_reg->smin_value < 0 || smin_val < 0) { 5748 - /* Lose signed bounds when ANDing negative numbers, 5749 - * ain't nobody got time for that. 5750 - */ 5751 - dst_reg->smin_value = S64_MIN; 5752 - dst_reg->smax_value = S64_MAX; 5753 - } else { 5754 - /* ANDing two positives gives a positive, so safe to 5755 - * cast result into s64. 5756 - */ 5757 - dst_reg->smin_value = dst_reg->umin_value; 5758 - dst_reg->smax_value = dst_reg->umax_value; 5759 - } 5760 - /* We may learn something more from the var_off */ 5761 - __update_reg_bounds(dst_reg); 4937 + scalar32_min_max_and(dst_reg, &src_reg); 4938 + scalar_min_max_and(dst_reg, &src_reg); 5762 4939 break; 5763 4940 case BPF_OR: 5764 - if (src_known && dst_known) { 5765 - __mark_reg_known(dst_reg, dst_reg->var_off.value | 5766 - src_reg.var_off.value); 5767 - break; 5768 - } 5769 - /* We get our maximum from the var_off, and our minimum is the 5770 - * maximum of the operands' minima 5771 - */ 5772 4941 dst_reg->var_off = tnum_or(dst_reg->var_off, src_reg.var_off); 5773 - dst_reg->umin_value = max(dst_reg->umin_value, umin_val); 5774 - dst_reg->umax_value = dst_reg->var_off.value | 5775 - dst_reg->var_off.mask; 5776 - if (dst_reg->smin_value < 0 || smin_val < 0) { 5777 - /* Lose signed bounds when ORing negative numbers, 5778 - * ain't nobody got time for that. 5779 - */ 5780 - dst_reg->smin_value = S64_MIN; 5781 - dst_reg->smax_value = S64_MAX; 5782 - } else { 5783 - /* ORing two positives gives a positive, so safe to 5784 - * cast result into s64. 
5785 - */ 5786 - dst_reg->smin_value = dst_reg->umin_value; 5787 - dst_reg->smax_value = dst_reg->umax_value; 5788 - } 5789 - /* We may learn something more from the var_off */ 5790 - __update_reg_bounds(dst_reg); 4942 + scalar32_min_max_or(dst_reg, &src_reg); 4943 + scalar_min_max_or(dst_reg, &src_reg); 5791 4944 break; 5792 4945 case BPF_LSH: 5793 4946 if (umax_val >= insn_bitness) { ··· 5710 5037 mark_reg_unknown(env, regs, insn->dst_reg); 5711 5038 break; 5712 5039 } 5713 - /* We lose all sign bit information (except what we can pick 5714 - * up from var_off) 5715 - */ 5716 - dst_reg->smin_value = S64_MIN; 5717 - dst_reg->smax_value = S64_MAX; 5718 - /* If we might shift our top bit out, then we know nothing */ 5719 - if (dst_reg->umax_value > 1ULL << (63 - umax_val)) { 5720 - dst_reg->umin_value = 0; 5721 - dst_reg->umax_value = U64_MAX; 5722 - } else { 5723 - dst_reg->umin_value <<= umin_val; 5724 - dst_reg->umax_value <<= umax_val; 5725 - } 5726 - dst_reg->var_off = tnum_lshift(dst_reg->var_off, umin_val); 5727 - /* We may learn something more from the var_off */ 5728 - __update_reg_bounds(dst_reg); 5040 + if (alu32) 5041 + scalar32_min_max_lsh(dst_reg, &src_reg); 5042 + else 5043 + scalar_min_max_lsh(dst_reg, &src_reg); 5729 5044 break; 5730 5045 case BPF_RSH: 5731 5046 if (umax_val >= insn_bitness) { ··· 5723 5062 mark_reg_unknown(env, regs, insn->dst_reg); 5724 5063 break; 5725 5064 } 5726 - /* BPF_RSH is an unsigned shift. If the value in dst_reg might 5727 - * be negative, then either: 5728 - * 1) src_reg might be zero, so the sign bit of the result is 5729 - * unknown, so we lose our signed bounds 5730 - * 2) it's known negative, thus the unsigned bounds capture the 5731 - * signed bounds 5732 - * 3) the signed bounds cross zero, so they tell us nothing 5733 - * about the result 5734 - * If the value in dst_reg is known nonnegative, then again the 5735 - * unsigned bounts capture the signed bounds. 
5736 - * Thus, in all cases it suffices to blow away our signed bounds 5737 - * and rely on inferring new ones from the unsigned bounds and 5738 - * var_off of the result. 5739 - */ 5740 - dst_reg->smin_value = S64_MIN; 5741 - dst_reg->smax_value = S64_MAX; 5742 - dst_reg->var_off = tnum_rshift(dst_reg->var_off, umin_val); 5743 - dst_reg->umin_value >>= umax_val; 5744 - dst_reg->umax_value >>= umin_val; 5745 - /* We may learn something more from the var_off */ 5746 - __update_reg_bounds(dst_reg); 5065 + if (alu32) 5066 + scalar32_min_max_rsh(dst_reg, &src_reg); 5067 + else 5068 + scalar_min_max_rsh(dst_reg, &src_reg); 5747 5069 break; 5748 5070 case BPF_ARSH: 5749 5071 if (umax_val >= insn_bitness) { ··· 5736 5092 mark_reg_unknown(env, regs, insn->dst_reg); 5737 5093 break; 5738 5094 } 5739 - 5740 - /* Upon reaching here, src_known is true and 5741 - * umax_val is equal to umin_val. 5742 - */ 5743 - if (insn_bitness == 32) { 5744 - dst_reg->smin_value = (u32)(((s32)dst_reg->smin_value) >> umin_val); 5745 - dst_reg->smax_value = (u32)(((s32)dst_reg->smax_value) >> umin_val); 5746 - } else { 5747 - dst_reg->smin_value >>= umin_val; 5748 - dst_reg->smax_value >>= umin_val; 5749 - } 5750 - 5751 - dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val, 5752 - insn_bitness); 5753 - 5754 - /* blow away the dst_reg umin_value/umax_value and rely on 5755 - * dst_reg var_off to refine the result. 
5756 - */ 5757 - dst_reg->umin_value = 0; 5758 - dst_reg->umax_value = U64_MAX; 5759 - __update_reg_bounds(dst_reg); 5095 + if (alu32) 5096 + scalar32_min_max_arsh(dst_reg, &src_reg); 5097 + else 5098 + scalar_min_max_arsh(dst_reg, &src_reg); 5760 5099 break; 5761 5100 default: 5762 5101 mark_reg_unknown(env, regs, insn->dst_reg); 5763 5102 break; 5764 5103 } 5765 5104 5766 - if (BPF_CLASS(insn->code) != BPF_ALU64) { 5767 - /* 32-bit ALU ops are (32,32)->32 */ 5768 - coerce_reg_to_size(dst_reg, 4); 5769 - } 5105 + /* ALU32 ops are zero extended into 64bit register */ 5106 + if (alu32) 5107 + zext_32_to_64(dst_reg); 5770 5108 5109 + __update_reg_bounds(dst_reg); 5771 5110 __reg_deduce_bounds(dst_reg); 5772 5111 __reg_bound_offset(dst_reg); 5773 5112 return 0; ··· 5924 5297 mark_reg_unknown(env, regs, 5925 5298 insn->dst_reg); 5926 5299 } 5927 - coerce_reg_to_size(dst_reg, 4); 5300 + zext_32_to_64(dst_reg); 5928 5301 } 5929 5302 } else { 5930 5303 /* case: R = imm ··· 6094 5467 new_range); 6095 5468 } 6096 5469 6097 - /* compute branch direction of the expression "if (reg opcode val) goto target;" 6098 - * and return: 6099 - * 1 - branch will be taken and "goto target" will be executed 6100 - * 0 - branch will not be taken and fall-through to next insn 6101 - * -1 - unknown. 
Example: "if (reg < 5)" is unknown when register value range [0,10] 6102 - */ 6103 - static int is_branch_taken(struct bpf_reg_state *reg, u64 val, u8 opcode, 6104 - bool is_jmp32) 5470 + static int is_branch32_taken(struct bpf_reg_state *reg, u32 val, u8 opcode) 6105 5471 { 6106 - struct bpf_reg_state reg_lo; 6107 - s64 sval; 5472 + struct tnum subreg = tnum_subreg(reg->var_off); 5473 + s32 sval = (s32)val; 6108 5474 6109 - if (__is_pointer_value(false, reg)) 6110 - return -1; 6111 - 6112 - if (is_jmp32) { 6113 - reg_lo = *reg; 6114 - reg = &reg_lo; 6115 - /* For JMP32, only low 32 bits are compared, coerce_reg_to_size 6116 - * could truncate high bits and update umin/umax according to 6117 - * information of low bits. 6118 - */ 6119 - coerce_reg_to_size(reg, 4); 6120 - /* smin/smax need special handling. For example, after coerce, 6121 - * if smin_value is 0x00000000ffffffffLL, the value is -1 when 6122 - * used as operand to JMP32. It is a negative number from s32's 6123 - * point of view, while it is a positive number when seen as 6124 - * s64. The smin/smax are kept as s64, therefore, when used with 6125 - * JMP32, they need to be transformed into s32, then sign 6126 - * extended back to s64. 6127 - * 6128 - * Also, smin/smax were copied from umin/umax. If umin/umax has 6129 - * different sign bit, then min/max relationship doesn't 6130 - * maintain after casting into s32, for this case, set smin/smax 6131 - * to safest range. 
6132 - */ 6133 - if ((reg->umax_value ^ reg->umin_value) & 6134 - (1ULL << 31)) { 6135 - reg->smin_value = S32_MIN; 6136 - reg->smax_value = S32_MAX; 6137 - } 6138 - reg->smin_value = (s64)(s32)reg->smin_value; 6139 - reg->smax_value = (s64)(s32)reg->smax_value; 6140 - 6141 - val = (u32)val; 6142 - sval = (s64)(s32)val; 6143 - } else { 6144 - sval = (s64)val; 5475 + switch (opcode) { 5476 + case BPF_JEQ: 5477 + if (tnum_is_const(subreg)) 5478 + return !!tnum_equals_const(subreg, val); 5479 + break; 5480 + case BPF_JNE: 5481 + if (tnum_is_const(subreg)) 5482 + return !tnum_equals_const(subreg, val); 5483 + break; 5484 + case BPF_JSET: 5485 + if ((~subreg.mask & subreg.value) & val) 5486 + return 1; 5487 + if (!((subreg.mask | subreg.value) & val)) 5488 + return 0; 5489 + break; 5490 + case BPF_JGT: 5491 + if (reg->u32_min_value > val) 5492 + return 1; 5493 + else if (reg->u32_max_value <= val) 5494 + return 0; 5495 + break; 5496 + case BPF_JSGT: 5497 + if (reg->s32_min_value > sval) 5498 + return 1; 5499 + else if (reg->s32_max_value < sval) 5500 + return 0; 5501 + break; 5502 + case BPF_JLT: 5503 + if (reg->u32_max_value < val) 5504 + return 1; 5505 + else if (reg->u32_min_value >= val) 5506 + return 0; 5507 + break; 5508 + case BPF_JSLT: 5509 + if (reg->s32_max_value < sval) 5510 + return 1; 5511 + else if (reg->s32_min_value >= sval) 5512 + return 0; 5513 + break; 5514 + case BPF_JGE: 5515 + if (reg->u32_min_value >= val) 5516 + return 1; 5517 + else if (reg->u32_max_value < val) 5518 + return 0; 5519 + break; 5520 + case BPF_JSGE: 5521 + if (reg->s32_min_value >= sval) 5522 + return 1; 5523 + else if (reg->s32_max_value < sval) 5524 + return 0; 5525 + break; 5526 + case BPF_JLE: 5527 + if (reg->u32_max_value <= val) 5528 + return 1; 5529 + else if (reg->u32_min_value > val) 5530 + return 0; 5531 + break; 5532 + case BPF_JSLE: 5533 + if (reg->s32_max_value <= sval) 5534 + return 1; 5535 + else if (reg->s32_min_value > sval) 5536 + return 0; 5537 + break; 6145 
5538 } 5539 + 5540 + return -1; 5541 + } 5542 + 5543 + 5544 + static int is_branch64_taken(struct bpf_reg_state *reg, u64 val, u8 opcode) 5545 + { 5546 + s64 sval = (s64)val; 6146 5547 6147 5548 switch (opcode) { 6148 5549 case BPF_JEQ: ··· 6240 5585 return -1; 6241 5586 } 6242 5587 6243 - /* Generate min value of the high 32-bit from TNUM info. */ 6244 - static u64 gen_hi_min(struct tnum var) 6245 - { 6246 - return var.value & ~0xffffffffULL; 6247 - } 6248 - 6249 - /* Generate max value of the high 32-bit from TNUM info. */ 6250 - static u64 gen_hi_max(struct tnum var) 6251 - { 6252 - return (var.value | var.mask) & ~0xffffffffULL; 6253 - } 6254 - 6255 - /* Return true if VAL is compared with a s64 sign extended from s32, and they 6256 - * are with the same signedness. 5588 + /* compute branch direction of the expression "if (reg opcode val) goto target;" 5589 + * and return: 5590 + * 1 - branch will be taken and "goto target" will be executed 5591 + * 0 - branch will not be taken and fall-through to next insn 5592 + * -1 - unknown. Example: "if (reg < 5)" is unknown when register value 5593 + * range [0,10] 6257 5594 */ 6258 - static bool cmp_val_with_extended_s64(s64 sval, struct bpf_reg_state *reg) 5595 + static int is_branch_taken(struct bpf_reg_state *reg, u64 val, u8 opcode, 5596 + bool is_jmp32) 6259 5597 { 6260 - return ((s32)sval >= 0 && 6261 - reg->smin_value >= 0 && reg->smax_value <= S32_MAX) || 6262 - ((s32)sval < 0 && 6263 - reg->smax_value <= 0 && reg->smin_value >= S32_MIN); 5598 + if (__is_pointer_value(false, reg)) 5599 + return -1; 5600 + 5601 + if (is_jmp32) 5602 + return is_branch32_taken(reg, val, opcode); 5603 + return is_branch64_taken(reg, val, opcode); 6264 5604 } 6265 5605 6266 5606 /* Adjusts the register min/max values in the case that the dst_reg is the ··· 6264 5614 * In JEQ/JNE cases we also adjust the var_off values. 
6265 5615 */ 6266 5616 static void reg_set_min_max(struct bpf_reg_state *true_reg, 6267 - struct bpf_reg_state *false_reg, u64 val, 5617 + struct bpf_reg_state *false_reg, 5618 + u64 val, u32 val32, 6268 5619 u8 opcode, bool is_jmp32) 6269 5620 { 6270 - s64 sval; 5621 + struct tnum false_32off = tnum_subreg(false_reg->var_off); 5622 + struct tnum false_64off = false_reg->var_off; 5623 + struct tnum true_32off = tnum_subreg(true_reg->var_off); 5624 + struct tnum true_64off = true_reg->var_off; 5625 + s64 sval = (s64)val; 5626 + s32 sval32 = (s32)val32; 6271 5627 6272 5628 /* If the dst_reg is a pointer, we can't learn anything about its 6273 5629 * variable offset from the compare (unless src_reg were a pointer into ··· 6283 5627 */ 6284 5628 if (__is_pointer_value(false, false_reg)) 6285 5629 return; 6286 - 6287 - val = is_jmp32 ? (u32)val : val; 6288 - sval = is_jmp32 ? (s64)(s32)val : (s64)val; 6289 5630 6290 5631 switch (opcode) { 6291 5632 case BPF_JEQ: ··· 6295 5642 * if it is true we know the value for sure. Likewise for 6296 5643 * BPF_JNE. 
6297 5644 */ 6298 - if (is_jmp32) { 6299 - u64 old_v = reg->var_off.value; 6300 - u64 hi_mask = ~0xffffffffULL; 6301 - 6302 - reg->var_off.value = (old_v & hi_mask) | val; 6303 - reg->var_off.mask &= hi_mask; 6304 - } else { 5645 + if (is_jmp32) 5646 + __mark_reg32_known(reg, val32); 5647 + else 6305 5648 __mark_reg_known(reg, val); 6306 - } 6307 5649 break; 6308 5650 } 6309 5651 case BPF_JSET: 6310 - false_reg->var_off = tnum_and(false_reg->var_off, 6311 - tnum_const(~val)); 6312 - if (is_power_of_2(val)) 6313 - true_reg->var_off = tnum_or(true_reg->var_off, 6314 - tnum_const(val)); 5652 + if (is_jmp32) { 5653 + false_32off = tnum_and(false_32off, tnum_const(~val32)); 5654 + if (is_power_of_2(val32)) 5655 + true_32off = tnum_or(true_32off, 5656 + tnum_const(val32)); 5657 + } else { 5658 + false_64off = tnum_and(false_64off, tnum_const(~val)); 5659 + if (is_power_of_2(val)) 5660 + true_64off = tnum_or(true_64off, 5661 + tnum_const(val)); 5662 + } 6315 5663 break; 6316 5664 case BPF_JGE: 6317 5665 case BPF_JGT: 6318 5666 { 6319 - u64 false_umax = opcode == BPF_JGT ? val : val - 1; 6320 - u64 true_umin = opcode == BPF_JGT ? val + 1 : val; 6321 - 6322 5667 if (is_jmp32) { 6323 - false_umax += gen_hi_max(false_reg->var_off); 6324 - true_umin += gen_hi_min(true_reg->var_off); 5668 + u32 false_umax = opcode == BPF_JGT ? val32 : val32 - 1; 5669 + u32 true_umin = opcode == BPF_JGT ? val32 + 1 : val32; 5670 + 5671 + false_reg->u32_max_value = min(false_reg->u32_max_value, 5672 + false_umax); 5673 + true_reg->u32_min_value = max(true_reg->u32_min_value, 5674 + true_umin); 5675 + } else { 5676 + u64 false_umax = opcode == BPF_JGT ? val : val - 1; 5677 + u64 true_umin = opcode == BPF_JGT ? 
val + 1 : val; 5678 + 5679 + false_reg->umax_value = min(false_reg->umax_value, false_umax); 5680 + true_reg->umin_value = max(true_reg->umin_value, true_umin); 6325 5681 } 6326 - false_reg->umax_value = min(false_reg->umax_value, false_umax); 6327 - true_reg->umin_value = max(true_reg->umin_value, true_umin); 6328 5682 break; 6329 5683 } 6330 5684 case BPF_JSGE: 6331 5685 case BPF_JSGT: 6332 5686 { 6333 - s64 false_smax = opcode == BPF_JSGT ? sval : sval - 1; 6334 - s64 true_smin = opcode == BPF_JSGT ? sval + 1 : sval; 5687 + if (is_jmp32) { 5688 + s32 false_smax = opcode == BPF_JSGT ? sval32 : sval32 - 1; 5689 + s32 true_smin = opcode == BPF_JSGT ? sval32 + 1 : sval32; 6335 5690 6336 - /* If the full s64 was not sign-extended from s32 then don't 6337 - * deduct further info. 6338 - */ 6339 - if (is_jmp32 && !cmp_val_with_extended_s64(sval, false_reg)) 6340 - break; 6341 - false_reg->smax_value = min(false_reg->smax_value, false_smax); 6342 - true_reg->smin_value = max(true_reg->smin_value, true_smin); 5691 + false_reg->s32_max_value = min(false_reg->s32_max_value, false_smax); 5692 + true_reg->s32_min_value = max(true_reg->s32_min_value, true_smin); 5693 + } else { 5694 + s64 false_smax = opcode == BPF_JSGT ? sval : sval - 1; 5695 + s64 true_smin = opcode == BPF_JSGT ? sval + 1 : sval; 5696 + 5697 + false_reg->smax_value = min(false_reg->smax_value, false_smax); 5698 + true_reg->smin_value = max(true_reg->smin_value, true_smin); 5699 + } 6343 5700 break; 6344 5701 } 6345 5702 case BPF_JLE: 6346 5703 case BPF_JLT: 6347 5704 { 6348 - u64 false_umin = opcode == BPF_JLT ? val : val + 1; 6349 - u64 true_umax = opcode == BPF_JLT ? val - 1 : val; 6350 - 6351 5705 if (is_jmp32) { 6352 - false_umin += gen_hi_min(false_reg->var_off); 6353 - true_umax += gen_hi_max(true_reg->var_off); 5706 + u32 false_umin = opcode == BPF_JLT ? val32 : val32 + 1; 5707 + u32 true_umax = opcode == BPF_JLT ? 
val32 - 1 : val32; 5708 + 5709 + false_reg->u32_min_value = max(false_reg->u32_min_value, 5710 + false_umin); 5711 + true_reg->u32_max_value = min(true_reg->u32_max_value, 5712 + true_umax); 5713 + } else { 5714 + u64 false_umin = opcode == BPF_JLT ? val : val + 1; 5715 + u64 true_umax = opcode == BPF_JLT ? val - 1 : val; 5716 + 5717 + false_reg->umin_value = max(false_reg->umin_value, false_umin); 5718 + true_reg->umax_value = min(true_reg->umax_value, true_umax); 6354 5719 } 6355 - false_reg->umin_value = max(false_reg->umin_value, false_umin); 6356 - true_reg->umax_value = min(true_reg->umax_value, true_umax); 6357 5720 break; 6358 5721 } 6359 5722 case BPF_JSLE: 6360 5723 case BPF_JSLT: 6361 5724 { 6362 - s64 false_smin = opcode == BPF_JSLT ? sval : sval + 1; 6363 - s64 true_smax = opcode == BPF_JSLT ? sval - 1 : sval; 5725 + if (is_jmp32) { 5726 + s32 false_smin = opcode == BPF_JSLT ? sval32 : sval32 + 1; 5727 + s32 true_smax = opcode == BPF_JSLT ? sval32 - 1 : sval32; 6364 5728 6365 - if (is_jmp32 && !cmp_val_with_extended_s64(sval, false_reg)) 6366 - break; 6367 - false_reg->smin_value = max(false_reg->smin_value, false_smin); 6368 - true_reg->smax_value = min(true_reg->smax_value, true_smax); 5729 + false_reg->s32_min_value = max(false_reg->s32_min_value, false_smin); 5730 + true_reg->s32_max_value = min(true_reg->s32_max_value, true_smax); 5731 + } else { 5732 + s64 false_smin = opcode == BPF_JSLT ? sval : sval + 1; 5733 + s64 true_smax = opcode == BPF_JSLT ? sval - 1 : sval; 5734 + 5735 + false_reg->smin_value = max(false_reg->smin_value, false_smin); 5736 + true_reg->smax_value = min(true_reg->smax_value, true_smax); 5737 + } 6369 5738 break; 6370 5739 } 6371 5740 default: 6372 - break; 5741 + return; 6373 5742 } 6374 5743 6375 - __reg_deduce_bounds(false_reg); 6376 - __reg_deduce_bounds(true_reg); 6377 - /* We might have learned some bits from the bounds. 
*/ 6378 - __reg_bound_offset(false_reg); 6379 - __reg_bound_offset(true_reg); 6380 5744 if (is_jmp32) { 6381 - __reg_bound_offset32(false_reg); 6382 - __reg_bound_offset32(true_reg); 5745 + false_reg->var_off = tnum_or(tnum_clear_subreg(false_64off), 5746 + tnum_subreg(false_32off)); 5747 + true_reg->var_off = tnum_or(tnum_clear_subreg(true_64off), 5748 + tnum_subreg(true_32off)); 5749 + __reg_combine_32_into_64(false_reg); 5750 + __reg_combine_32_into_64(true_reg); 5751 + } else { 5752 + false_reg->var_off = false_64off; 5753 + true_reg->var_off = true_64off; 5754 + __reg_combine_64_into_32(false_reg); 5755 + __reg_combine_64_into_32(true_reg); 6383 5756 } 6384 - /* Intersecting with the old var_off might have improved our bounds 6385 - * slightly. e.g. if umax was 0x7f...f and var_off was (0; 0xf...fc), 6386 - * then new var_off is (0; 0x7f...fc) which improves our umax. 6387 - */ 6388 - __update_reg_bounds(false_reg); 6389 - __update_reg_bounds(true_reg); 6390 5757 } 6391 5758 6392 5759 /* Same as above, but for the case that dst_reg holds a constant and src_reg is 6393 5760 * the variable reg. 6394 5761 */ 6395 5762 static void reg_set_min_max_inv(struct bpf_reg_state *true_reg, 6396 - struct bpf_reg_state *false_reg, u64 val, 5763 + struct bpf_reg_state *false_reg, 5764 + u64 val, u32 val32, 6397 5765 u8 opcode, bool is_jmp32) 6398 5766 { 6399 - s64 sval; 6400 - 6401 - if (__is_pointer_value(false, false_reg)) 6402 - return; 6403 - 6404 - val = is_jmp32 ? (u32)val : val; 6405 - sval = is_jmp32 ? (s64)(s32)val : (s64)val; 6406 - 6407 - switch (opcode) { 6408 - case BPF_JEQ: 6409 - case BPF_JNE: 6410 - { 6411 - struct bpf_reg_state *reg = 6412 - opcode == BPF_JEQ ? 
true_reg : false_reg; 6413 - 6414 - if (is_jmp32) { 6415 - u64 old_v = reg->var_off.value; 6416 - u64 hi_mask = ~0xffffffffULL; 6417 - 6418 - reg->var_off.value = (old_v & hi_mask) | val; 6419 - reg->var_off.mask &= hi_mask; 6420 - } else { 6421 - __mark_reg_known(reg, val); 6422 - } 6423 - break; 6424 - } 6425 - case BPF_JSET: 6426 - false_reg->var_off = tnum_and(false_reg->var_off, 6427 - tnum_const(~val)); 6428 - if (is_power_of_2(val)) 6429 - true_reg->var_off = tnum_or(true_reg->var_off, 6430 - tnum_const(val)); 6431 - break; 6432 - case BPF_JGE: 6433 - case BPF_JGT: 6434 - { 6435 - u64 false_umin = opcode == BPF_JGT ? val : val + 1; 6436 - u64 true_umax = opcode == BPF_JGT ? val - 1 : val; 6437 - 6438 - if (is_jmp32) { 6439 - false_umin += gen_hi_min(false_reg->var_off); 6440 - true_umax += gen_hi_max(true_reg->var_off); 6441 - } 6442 - false_reg->umin_value = max(false_reg->umin_value, false_umin); 6443 - true_reg->umax_value = min(true_reg->umax_value, true_umax); 6444 - break; 6445 - } 6446 - case BPF_JSGE: 6447 - case BPF_JSGT: 6448 - { 6449 - s64 false_smin = opcode == BPF_JSGT ? sval : sval + 1; 6450 - s64 true_smax = opcode == BPF_JSGT ? sval - 1 : sval; 6451 - 6452 - if (is_jmp32 && !cmp_val_with_extended_s64(sval, false_reg)) 6453 - break; 6454 - false_reg->smin_value = max(false_reg->smin_value, false_smin); 6455 - true_reg->smax_value = min(true_reg->smax_value, true_smax); 6456 - break; 6457 - } 6458 - case BPF_JLE: 6459 - case BPF_JLT: 6460 - { 6461 - u64 false_umax = opcode == BPF_JLT ? val : val - 1; 6462 - u64 true_umin = opcode == BPF_JLT ? 
val + 1 : val; 6463 - 6464 - if (is_jmp32) { 6465 - false_umax += gen_hi_max(false_reg->var_off); 6466 - true_umin += gen_hi_min(true_reg->var_off); 6467 - } 6468 - false_reg->umax_value = min(false_reg->umax_value, false_umax); 6469 - true_reg->umin_value = max(true_reg->umin_value, true_umin); 6470 - break; 6471 - } 6472 - case BPF_JSLE: 6473 - case BPF_JSLT: 6474 - { 6475 - s64 false_smax = opcode == BPF_JSLT ? sval : sval - 1; 6476 - s64 true_smin = opcode == BPF_JSLT ? sval + 1 : sval; 6477 - 6478 - if (is_jmp32 && !cmp_val_with_extended_s64(sval, false_reg)) 6479 - break; 6480 - false_reg->smax_value = min(false_reg->smax_value, false_smax); 6481 - true_reg->smin_value = max(true_reg->smin_value, true_smin); 6482 - break; 6483 - } 6484 - default: 6485 - break; 6486 - } 6487 - 6488 - __reg_deduce_bounds(false_reg); 6489 - __reg_deduce_bounds(true_reg); 6490 - /* We might have learned some bits from the bounds. */ 6491 - __reg_bound_offset(false_reg); 6492 - __reg_bound_offset(true_reg); 6493 - if (is_jmp32) { 6494 - __reg_bound_offset32(false_reg); 6495 - __reg_bound_offset32(true_reg); 6496 - } 6497 - /* Intersecting with the old var_off might have improved our bounds 6498 - * slightly. e.g. if umax was 0x7f...f and var_off was (0; 0xf...fc), 6499 - * then new var_off is (0; 0x7f...fc) which improves our umax. 5767 + /* How can we transform "a <op> b" into "b <op> a"? 
*/ 5768 + static const u8 opcode_flip[16] = { 5769 + /* these stay the same */ 5770 + [BPF_JEQ >> 4] = BPF_JEQ, 5771 + [BPF_JNE >> 4] = BPF_JNE, 5772 + [BPF_JSET >> 4] = BPF_JSET, 5773 + /* these swap "lesser" and "greater" (L and G in the opcodes) */ 5774 + [BPF_JGE >> 4] = BPF_JLE, 5775 + [BPF_JGT >> 4] = BPF_JLT, 5776 + [BPF_JLE >> 4] = BPF_JGE, 5777 + [BPF_JLT >> 4] = BPF_JGT, 5778 + [BPF_JSGE >> 4] = BPF_JSLE, 5779 + [BPF_JSGT >> 4] = BPF_JSLT, 5780 + [BPF_JSLE >> 4] = BPF_JSGE, 5781 + [BPF_JSLT >> 4] = BPF_JSGT 5782 + }; 5783 + opcode = opcode_flip[opcode >> 4]; 5784 + /* This uses zero as "not present in table"; luckily the zero opcode, 5785 + * BPF_JA, can't get here. 6500 5786 */ 6501 - __update_reg_bounds(false_reg); 6502 - __update_reg_bounds(true_reg); 5787 + if (opcode) 5788 + reg_set_min_max(true_reg, false_reg, val, val32, opcode, is_jmp32); 6503 5789 } 6504 5790 6505 5791 /* Regs are known to be equal, so intersect their min/max/var_off */ ··· 6727 6135 dst_reg = &regs[insn->dst_reg]; 6728 6136 is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32; 6729 6137 6730 - if (BPF_SRC(insn->code) == BPF_K) 6731 - pred = is_branch_taken(dst_reg, insn->imm, 6732 - opcode, is_jmp32); 6733 - else if (src_reg->type == SCALAR_VALUE && 6734 - tnum_is_const(src_reg->var_off)) 6735 - pred = is_branch_taken(dst_reg, src_reg->var_off.value, 6736 - opcode, is_jmp32); 6138 + if (BPF_SRC(insn->code) == BPF_K) { 6139 + pred = is_branch_taken(dst_reg, insn->imm, opcode, is_jmp32); 6140 + } else if (src_reg->type == SCALAR_VALUE && 6141 + is_jmp32 && tnum_is_const(tnum_subreg(src_reg->var_off))) { 6142 + pred = is_branch_taken(dst_reg, 6143 + tnum_subreg(src_reg->var_off).value, 6144 + opcode, 6145 + is_jmp32); 6146 + } else if (src_reg->type == SCALAR_VALUE && 6147 + !is_jmp32 && tnum_is_const(src_reg->var_off)) { 6148 + pred = is_branch_taken(dst_reg, 6149 + src_reg->var_off.value, 6150 + opcode, 6151 + is_jmp32); 6152 + } 6153 + 6737 6154 if (pred >= 0) { 6738 6155 err = 
mark_chain_precision(env, insn->dst_reg); 6739 6156 if (BPF_SRC(insn->code) == BPF_X && !err) ··· 6776 6175 */ 6777 6176 if (BPF_SRC(insn->code) == BPF_X) { 6778 6177 struct bpf_reg_state *src_reg = &regs[insn->src_reg]; 6779 - struct bpf_reg_state lo_reg0 = *dst_reg; 6780 - struct bpf_reg_state lo_reg1 = *src_reg; 6781 - struct bpf_reg_state *src_lo, *dst_lo; 6782 - 6783 - dst_lo = &lo_reg0; 6784 - src_lo = &lo_reg1; 6785 - coerce_reg_to_size(dst_lo, 4); 6786 - coerce_reg_to_size(src_lo, 4); 6787 6178 6788 6179 if (dst_reg->type == SCALAR_VALUE && 6789 6180 src_reg->type == SCALAR_VALUE) { 6790 6181 if (tnum_is_const(src_reg->var_off) || 6791 - (is_jmp32 && tnum_is_const(src_lo->var_off))) 6182 + (is_jmp32 && 6183 + tnum_is_const(tnum_subreg(src_reg->var_off)))) 6792 6184 reg_set_min_max(&other_branch_regs[insn->dst_reg], 6793 6185 dst_reg, 6794 - is_jmp32 6795 - ? src_lo->var_off.value 6796 - : src_reg->var_off.value, 6186 + src_reg->var_off.value, 6187 + tnum_subreg(src_reg->var_off).value, 6797 6188 opcode, is_jmp32); 6798 6189 else if (tnum_is_const(dst_reg->var_off) || 6799 - (is_jmp32 && tnum_is_const(dst_lo->var_off))) 6190 + (is_jmp32 && 6191 + tnum_is_const(tnum_subreg(dst_reg->var_off)))) 6800 6192 reg_set_min_max_inv(&other_branch_regs[insn->src_reg], 6801 6193 src_reg, 6802 - is_jmp32 6803 - ? dst_lo->var_off.value 6804 - : dst_reg->var_off.value, 6194 + dst_reg->var_off.value, 6195 + tnum_subreg(dst_reg->var_off).value, 6805 6196 opcode, is_jmp32); 6806 6197 else if (!is_jmp32 && 6807 6198 (opcode == BPF_JEQ || opcode == BPF_JNE)) ··· 6804 6211 } 6805 6212 } else if (dst_reg->type == SCALAR_VALUE) { 6806 6213 reg_set_min_max(&other_branch_regs[insn->dst_reg], 6807 - dst_reg, insn->imm, opcode, is_jmp32); 6214 + dst_reg, insn->imm, (u32)insn->imm, 6215 + opcode, is_jmp32); 6808 6216 } 6809 6217 6810 6218 /* detect if R == 0 where R is returned from bpf_map_lookup_elem(). 
··· 7006 6412 struct tnum range = tnum_range(0, 1); 7007 6413 int err; 7008 6414 7009 - /* The struct_ops func-ptr's return type could be "void" */ 7010 - if (env->prog->type == BPF_PROG_TYPE_STRUCT_OPS && 6415 + /* LSM and struct_ops func-ptr's return type could be "void" */ 6416 + if ((env->prog->type == BPF_PROG_TYPE_STRUCT_OPS || 6417 + env->prog->type == BPF_PROG_TYPE_LSM) && 7011 6418 !prog->aux->attach_func_proto->type) 7012 6419 return 0; 7013 6420 ··· 10438 9843 if (prog->type == BPF_PROG_TYPE_STRUCT_OPS) 10439 9844 return check_struct_ops_btf_id(env); 10440 9845 10441 - if (prog->type != BPF_PROG_TYPE_TRACING && !prog_extension) 9846 + if (prog->type != BPF_PROG_TYPE_TRACING && 9847 + prog->type != BPF_PROG_TYPE_LSM && 9848 + !prog_extension) 10442 9849 return 0; 10443 9850 10444 9851 if (!btf_id) { ··· 10571 9974 return -EINVAL; 10572 9975 /* fallthrough */ 10573 9976 case BPF_MODIFY_RETURN: 9977 + case BPF_LSM_MAC: 10574 9978 case BPF_TRACE_FENTRY: 10575 9979 case BPF_TRACE_FEXIT: 9980 + prog->aux->attach_func_name = tname; 9981 + if (prog->type == BPF_PROG_TYPE_LSM) { 9982 + ret = bpf_lsm_verify_prog(&env->log, prog); 9983 + if (ret < 0) 9984 + return ret; 9985 + } 9986 + 10576 9987 if (!btf_type_is_func(t)) { 10577 9988 verbose(env, "attach_btf_id %u is not a function\n", 10578 9989 btf_id); ··· 10595 9990 tr = bpf_trampoline_lookup(key); 10596 9991 if (!tr) 10597 9992 return -ENOMEM; 10598 - prog->aux->attach_func_name = tname; 10599 9993 /* t is either vmlinux type or another program's type */ 10600 9994 prog->aux->attach_func_proto = t; 10601 9995 mutex_lock(&tr->mutex);
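A note for readers following the verifier hunks above: the jmp32 handling stops coercing copied registers to 32 bits and instead uses `tnum_subreg()` to take a low-32-bit view of the tracked (value, mask) pair. The sketch below is a standalone user-space reimplementation of that idiom — the struct layout and helper names mirror the kernel's `struct tnum`, `tnum_is_const()` and `tnum_subreg()` (see `include/linux/tnum.h`), but this is illustrative code, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the kernel's tnum: value holds the known bits, mask the unknown. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* A tracked value is a known constant when no bits are unknown. */
bool tnum_is_const(struct tnum a)
{
	return !a.mask;
}

/* Low-32-bit subregister view, as consulted for BPF_JMP32 comparisons. */
struct tnum tnum_subreg(struct tnum a)
{
	return (struct tnum){ a.value & 0xffffffffULL,
			      a.mask  & 0xffffffffULL };
}
```

This is why the updated `is_branch_taken()` callers pass `tnum_subreg(src_reg->var_off).value` for jmp32 branches and the full `var_off.value` otherwise: a register can be constant in its 32-bit view while its upper half is still unknown.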
+36 -5
kernel/cgroup/cgroup.c
··· 6303 6303 #endif /* CONFIG_SOCK_CGROUP_DATA */ 6304 6304 6305 6305 #ifdef CONFIG_CGROUP_BPF 6306 - int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog, 6307 - struct bpf_prog *replace_prog, enum bpf_attach_type type, 6306 + int cgroup_bpf_attach(struct cgroup *cgrp, 6307 + struct bpf_prog *prog, struct bpf_prog *replace_prog, 6308 + struct bpf_cgroup_link *link, 6309 + enum bpf_attach_type type, 6308 6310 u32 flags) 6309 6311 { 6310 6312 int ret; 6311 6313 6312 6314 mutex_lock(&cgroup_mutex); 6313 - ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, type, flags); 6315 + ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, link, type, flags); 6314 6316 mutex_unlock(&cgroup_mutex); 6315 6317 return ret; 6316 6318 } 6319 + 6320 + int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *old_prog, 6321 + struct bpf_prog *new_prog) 6322 + { 6323 + struct bpf_cgroup_link *cg_link; 6324 + int ret; 6325 + 6326 + if (link->ops != &bpf_cgroup_link_lops) 6327 + return -EINVAL; 6328 + 6329 + cg_link = container_of(link, struct bpf_cgroup_link, link); 6330 + 6331 + mutex_lock(&cgroup_mutex); 6332 + /* link might have been auto-released by dying cgroup, so fail */ 6333 + if (!cg_link->cgroup) { 6334 + ret = -EINVAL; 6335 + goto out_unlock; 6336 + } 6337 + if (old_prog && link->prog != old_prog) { 6338 + ret = -EPERM; 6339 + goto out_unlock; 6340 + } 6341 + ret = __cgroup_bpf_replace(cg_link->cgroup, cg_link, new_prog); 6342 + out_unlock: 6343 + mutex_unlock(&cgroup_mutex); 6344 + return ret; 6345 + } 6346 + 6317 6347 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, 6318 - enum bpf_attach_type type, u32 flags) 6348 + enum bpf_attach_type type) 6319 6349 { 6320 6350 int ret; 6321 6351 6322 6352 mutex_lock(&cgroup_mutex); 6323 - ret = __cgroup_bpf_detach(cgrp, prog, type); 6353 + ret = __cgroup_bpf_detach(cgrp, prog, NULL, type); 6324 6354 mutex_unlock(&cgroup_mutex); 6325 6355 return ret; 6326 6356 } 6357 + 6327 6358 int 
cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, 6328 6359 union bpf_attr __user *uattr) 6329 6360 {
+6 -6
kernel/trace/bpf_trace.c
··· 779 779 .arg1_type = ARG_ANYTHING, 780 780 }; 781 781 782 - static const struct bpf_func_proto * 783 - tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) 782 + const struct bpf_func_proto * 783 + bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) 784 784 { 785 785 switch (func_id) { 786 786 case BPF_FUNC_map_lookup_elem: ··· 865 865 return &bpf_override_return_proto; 866 866 #endif 867 867 default: 868 - return tracing_func_proto(func_id, prog); 868 + return bpf_tracing_func_proto(func_id, prog); 869 869 } 870 870 } 871 871 ··· 975 975 case BPF_FUNC_get_stack: 976 976 return &bpf_get_stack_proto_tp; 977 977 default: 978 - return tracing_func_proto(func_id, prog); 978 + return bpf_tracing_func_proto(func_id, prog); 979 979 } 980 980 } 981 981 ··· 1082 1082 case BPF_FUNC_read_branch_records: 1083 1083 return &bpf_read_branch_records_proto; 1084 1084 default: 1085 - return tracing_func_proto(func_id, prog); 1085 + return bpf_tracing_func_proto(func_id, prog); 1086 1086 } 1087 1087 } 1088 1088 ··· 1210 1210 case BPF_FUNC_get_stack: 1211 1211 return &bpf_get_stack_proto_raw_tp; 1212 1212 default: 1213 - return tracing_func_proto(func_id, prog); 1213 + return bpf_tracing_func_proto(func_id, prog); 1214 1214 } 1215 1215 } 1216 1216
+4
net/bpf/test_run.c
··· 114 114 * architecture dependent calling conventions. 7+ can be supported in the 115 115 * future. 116 116 */ 117 + __diag_push(); 118 + __diag_ignore(GCC, 8, "-Wmissing-prototypes", 119 + "Global functions as their definitions will be in vmlinux BTF"); 117 120 int noinline bpf_fentry_test1(int a) 118 121 { 119 122 return a + 1; ··· 152 149 *b += 1; 153 150 return a + *b; 154 151 } 152 + __diag_pop(); 155 153 156 154 ALLOW_ERROR_INJECTION(bpf_modify_return_test, ERRNO); 157 155
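The `__diag_push()`/`__diag_ignore(GCC, 8, ...)` wrappers added above expand (for GCC 8 and later) to plain diagnostic pragmas that silence `-Wmissing-prototypes` for global functions whose "prototypes" exist only in the vmlinux BTF. A minimal sketch of the underlying pragma mechanism, outside the kernel's macro layer:

```c
/* Raw GCC form of what __diag_push()/__diag_ignore() wrap: suppress the
 * missing-prototype warning for a deliberately prototype-less global. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmissing-prototypes"

/* Would normally warn under -Wmissing-prototypes: external linkage,
 * no prior declaration. */
int fentry_test_sketch(int a)
{
	return a + 1;
}

#pragma GCC diagnostic pop
```

The push/pop pairing keeps the suppression scoped to just these definitions, so the warning stays active for the rest of the translation unit.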
+21 -5
net/core/dev.c
··· 8655 8655 * @dev: device 8656 8656 * @extack: netlink extended ack 8657 8657 * @fd: new program fd or negative value to clear 8658 + * @expected_fd: old program fd that userspace expects to replace or clear 8658 8659 * @flags: xdp-related flags 8659 8660 * 8660 8661 * Set or clear a bpf program for a device 8661 8662 */ 8662 8663 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, 8663 - int fd, u32 flags) 8664 + int fd, int expected_fd, u32 flags) 8664 8665 { 8665 8666 const struct net_device_ops *ops = dev->netdev_ops; 8666 8667 enum bpf_netdev_command query; 8668 + u32 prog_id, expected_id = 0; 8667 8669 struct bpf_prog *prog = NULL; 8668 8670 bpf_op_t bpf_op, bpf_chk; 8669 8671 bool offload; ··· 8686 8684 if (bpf_op == bpf_chk) 8687 8685 bpf_chk = generic_xdp_install; 8688 8686 8689 - if (fd >= 0) { 8690 - u32 prog_id; 8687 + prog_id = __dev_xdp_query(dev, bpf_op, query); 8688 + if (flags & XDP_FLAGS_REPLACE) { 8689 + if (expected_fd >= 0) { 8690 + prog = bpf_prog_get_type_dev(expected_fd, 8691 + BPF_PROG_TYPE_XDP, 8692 + bpf_op == ops->ndo_bpf); 8693 + if (IS_ERR(prog)) 8694 + return PTR_ERR(prog); 8695 + expected_id = prog->aux->id; 8696 + bpf_prog_put(prog); 8697 + } 8691 8698 8699 + if (prog_id != expected_id) { 8700 + NL_SET_ERR_MSG(extack, "Active program does not match expected"); 8701 + return -EEXIST; 8702 + } 8703 + } 8704 + if (fd >= 0) { 8692 8705 if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) { 8693 8706 NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time"); 8694 8707 return -EEXIST; 8695 8708 } 8696 8709 8697 - prog_id = __dev_xdp_query(dev, bpf_op, query); 8698 8710 if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) && prog_id) { 8699 8711 NL_SET_ERR_MSG(extack, "XDP program already attached"); 8700 8712 return -EBUSY; ··· 8731 8715 return 0; 8732 8716 } 8733 8717 } else { 8734 - if (!__dev_xdp_query(dev, bpf_op, query)) 8718 + if (!prog_id) 8735 8719 return 0; 8736 8720 } 8737 8721
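The `XDP_FLAGS_REPLACE` path added to `dev_change_xdp_fd()` above gives userspace compare-and-replace semantics: the update only proceeds if the currently attached program's id matches the one the caller expects, otherwise `-EEXIST`. A hypothetical miniature of that check (the struct, flag value, and function here are invented for illustration; only the control flow mirrors the patch):

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_FLAGS_REPLACE (1U << 0)	/* illustrative flag bit */

struct sketch_dev {
	uint32_t attached_prog_id;	/* 0 means nothing attached */
};

/* Attach new_id, but with REPLACE set, only if expected_id is what is
 * currently attached -- the caller's view must match reality. */
int xdp_replace_sketch(struct sketch_dev *dev, uint32_t new_id,
		       uint32_t expected_id, unsigned int flags)
{
	if ((flags & SKETCH_FLAGS_REPLACE) &&
	    dev->attached_prog_id != expected_id)
		return -EEXIST;		/* active program is not the expected one */
	dev->attached_prog_id = new_id;
	return 0;
}
```

This closes the race where two管理 agents both try to swap the program: the loser's stale `expected_fd` no longer matches and its update is rejected instead of silently clobbering the winner's.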
+134 -7
net/core/filter.c
··· 2642 2642 .arg4_type = ARG_ANYTHING, 2643 2643 }; 2644 2644 2645 + #ifdef CONFIG_CGROUP_NET_CLASSID 2646 + BPF_CALL_0(bpf_get_cgroup_classid_curr) 2647 + { 2648 + return __task_get_classid(current); 2649 + } 2650 + 2651 + static const struct bpf_func_proto bpf_get_cgroup_classid_curr_proto = { 2652 + .func = bpf_get_cgroup_classid_curr, 2653 + .gpl_only = false, 2654 + .ret_type = RET_INTEGER, 2655 + }; 2656 + #endif 2657 + 2645 2658 BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb) 2646 2659 { 2647 2660 return task_get_classid(skb); ··· 4130 4117 .arg1_type = ARG_PTR_TO_CTX, 4131 4118 }; 4132 4119 4120 + BPF_CALL_1(bpf_get_socket_cookie_sock, struct sock *, ctx) 4121 + { 4122 + return sock_gen_cookie(ctx); 4123 + } 4124 + 4125 + static const struct bpf_func_proto bpf_get_socket_cookie_sock_proto = { 4126 + .func = bpf_get_socket_cookie_sock, 4127 + .gpl_only = false, 4128 + .ret_type = RET_INTEGER, 4129 + .arg1_type = ARG_PTR_TO_CTX, 4130 + }; 4131 + 4133 4132 BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx) 4134 4133 { 4135 4134 return sock_gen_cookie(ctx->sk); ··· 4152 4127 .gpl_only = false, 4153 4128 .ret_type = RET_INTEGER, 4154 4129 .arg1_type = ARG_PTR_TO_CTX, 4130 + }; 4131 + 4132 + static u64 __bpf_get_netns_cookie(struct sock *sk) 4133 + { 4134 + #ifdef CONFIG_NET_NS 4135 + return net_gen_cookie(sk ? sk->sk_net.net : &init_net); 4136 + #else 4137 + return 0; 4138 + #endif 4139 + } 4140 + 4141 + BPF_CALL_1(bpf_get_netns_cookie_sock, struct sock *, ctx) 4142 + { 4143 + return __bpf_get_netns_cookie(ctx); 4144 + } 4145 + 4146 + static const struct bpf_func_proto bpf_get_netns_cookie_sock_proto = { 4147 + .func = bpf_get_netns_cookie_sock, 4148 + .gpl_only = false, 4149 + .ret_type = RET_INTEGER, 4150 + .arg1_type = ARG_PTR_TO_CTX_OR_NULL, 4151 + }; 4152 + 4153 + BPF_CALL_1(bpf_get_netns_cookie_sock_addr, struct bpf_sock_addr_kern *, ctx) 4154 + { 4155 + return __bpf_get_netns_cookie(ctx ? 
ctx->sk : NULL); 4156 + } 4157 + 4158 + static const struct bpf_func_proto bpf_get_netns_cookie_sock_addr_proto = { 4159 + .func = bpf_get_netns_cookie_sock_addr, 4160 + .gpl_only = false, 4161 + .ret_type = RET_INTEGER, 4162 + .arg1_type = ARG_PTR_TO_CTX_OR_NULL, 4155 4163 }; 4156 4164 4157 4165 BPF_CALL_1(bpf_get_socket_uid, struct sk_buff *, skb) ··· 4205 4147 .arg1_type = ARG_PTR_TO_CTX, 4206 4148 }; 4207 4149 4208 - BPF_CALL_5(bpf_sockopt_event_output, struct bpf_sock_ops_kern *, bpf_sock, 4209 - struct bpf_map *, map, u64, flags, void *, data, u64, size) 4150 + BPF_CALL_5(bpf_event_output_data, void *, ctx, struct bpf_map *, map, u64, flags, 4151 + void *, data, u64, size) 4210 4152 { 4211 4153 if (unlikely(flags & ~(BPF_F_INDEX_MASK))) 4212 4154 return -EINVAL; ··· 4214 4156 return bpf_event_output(map, flags, data, size, NULL, 0, NULL); 4215 4157 } 4216 4158 4217 - static const struct bpf_func_proto bpf_sockopt_event_output_proto = { 4218 - .func = bpf_sockopt_event_output, 4159 + static const struct bpf_func_proto bpf_event_output_data_proto = { 4160 + .func = bpf_event_output_data, 4219 4161 .gpl_only = true, 4220 4162 .ret_type = RET_INTEGER, 4221 4163 .arg1_type = ARG_PTR_TO_CTX, ··· 5401 5343 5402 5344 BPF_CALL_1(bpf_sk_release, struct sock *, sk) 5403 5345 { 5404 - /* Only full sockets have sk->sk_flags. 
*/ 5405 - if (!sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE)) 5346 + if (sk_is_refcounted(sk)) 5406 5347 sock_gen_put(sk); 5407 5348 return 0; 5408 5349 } ··· 5917 5860 .arg5_type = ARG_CONST_SIZE, 5918 5861 }; 5919 5862 5863 + BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags) 5864 + { 5865 + if (flags != 0) 5866 + return -EINVAL; 5867 + if (!skb_at_tc_ingress(skb)) 5868 + return -EOPNOTSUPP; 5869 + if (unlikely(dev_net(skb->dev) != sock_net(sk))) 5870 + return -ENETUNREACH; 5871 + if (unlikely(sk->sk_reuseport)) 5872 + return -ESOCKTNOSUPPORT; 5873 + if (sk_is_refcounted(sk) && 5874 + unlikely(!refcount_inc_not_zero(&sk->sk_refcnt))) 5875 + return -ENOENT; 5876 + 5877 + skb_orphan(skb); 5878 + skb->sk = sk; 5879 + skb->destructor = sock_pfree; 5880 + 5881 + return 0; 5882 + } 5883 + 5884 + static const struct bpf_func_proto bpf_sk_assign_proto = { 5885 + .func = bpf_sk_assign, 5886 + .gpl_only = false, 5887 + .ret_type = RET_INTEGER, 5888 + .arg1_type = ARG_PTR_TO_CTX, 5889 + .arg2_type = ARG_PTR_TO_SOCK_COMMON, 5890 + .arg3_type = ARG_ANYTHING, 5891 + }; 5892 + 5920 5893 #endif /* CONFIG_INET */ 5921 5894 5922 5895 bool bpf_helper_changes_pkt_data(void *func) ··· 6041 5954 return &bpf_get_current_uid_gid_proto; 6042 5955 case BPF_FUNC_get_local_storage: 6043 5956 return &bpf_get_local_storage_proto; 5957 + case BPF_FUNC_get_socket_cookie: 5958 + return &bpf_get_socket_cookie_sock_proto; 5959 + case BPF_FUNC_get_netns_cookie: 5960 + return &bpf_get_netns_cookie_sock_proto; 5961 + case BPF_FUNC_perf_event_output: 5962 + return &bpf_event_output_data_proto; 5963 + case BPF_FUNC_get_current_pid_tgid: 5964 + return &bpf_get_current_pid_tgid_proto; 5965 + case BPF_FUNC_get_current_comm: 5966 + return &bpf_get_current_comm_proto; 5967 + #ifdef CONFIG_CGROUPS 5968 + case BPF_FUNC_get_current_cgroup_id: 5969 + return &bpf_get_current_cgroup_id_proto; 5970 + case BPF_FUNC_get_current_ancestor_cgroup_id: 5971 + return 
&bpf_get_current_ancestor_cgroup_id_proto; 5972 + #endif 5973 + #ifdef CONFIG_CGROUP_NET_CLASSID 5974 + case BPF_FUNC_get_cgroup_classid: 5975 + return &bpf_get_cgroup_classid_curr_proto; 5976 + #endif 6044 5977 default: 6045 5978 return bpf_base_func_proto(func_id); 6046 5979 } ··· 6085 5978 } 6086 5979 case BPF_FUNC_get_socket_cookie: 6087 5980 return &bpf_get_socket_cookie_sock_addr_proto; 5981 + case BPF_FUNC_get_netns_cookie: 5982 + return &bpf_get_netns_cookie_sock_addr_proto; 6088 5983 case BPF_FUNC_get_local_storage: 6089 5984 return &bpf_get_local_storage_proto; 5985 + case BPF_FUNC_perf_event_output: 5986 + return &bpf_event_output_data_proto; 5987 + case BPF_FUNC_get_current_pid_tgid: 5988 + return &bpf_get_current_pid_tgid_proto; 5989 + case BPF_FUNC_get_current_comm: 5990 + return &bpf_get_current_comm_proto; 5991 + #ifdef CONFIG_CGROUPS 5992 + case BPF_FUNC_get_current_cgroup_id: 5993 + return &bpf_get_current_cgroup_id_proto; 5994 + case BPF_FUNC_get_current_ancestor_cgroup_id: 5995 + return &bpf_get_current_ancestor_cgroup_id_proto; 5996 + #endif 5997 + #ifdef CONFIG_CGROUP_NET_CLASSID 5998 + case BPF_FUNC_get_cgroup_classid: 5999 + return &bpf_get_cgroup_classid_curr_proto; 6000 + #endif 6090 6001 #ifdef CONFIG_INET 6091 6002 case BPF_FUNC_sk_lookup_tcp: 6092 6003 return &bpf_sock_addr_sk_lookup_tcp_proto; ··· 6278 6153 return &bpf_skb_ecn_set_ce_proto; 6279 6154 case BPF_FUNC_tcp_gen_syncookie: 6280 6155 return &bpf_tcp_gen_syncookie_proto; 6156 + case BPF_FUNC_sk_assign: 6157 + return &bpf_sk_assign_proto; 6281 6158 #endif 6282 6159 default: 6283 6160 return bpf_base_func_proto(func_id); ··· 6349 6222 case BPF_FUNC_get_local_storage: 6350 6223 return &bpf_get_local_storage_proto; 6351 6224 case BPF_FUNC_perf_event_output: 6352 - return &bpf_sockopt_event_output_proto; 6225 + return &bpf_event_output_data_proto; 6353 6226 case BPF_FUNC_sk_storage_get: 6354 6227 return &bpf_sk_storage_get_proto; 6355 6228 case BPF_FUNC_sk_storage_delete:
+15
net/core/net_namespace.c
··· 69 69 70 70 static unsigned int max_gen_ptrs = INITIAL_NET_GEN_PTRS; 71 71 72 + static atomic64_t cookie_gen; 73 + 74 + u64 net_gen_cookie(struct net *net) 75 + { 76 + while (1) { 77 + u64 res = atomic64_read(&net->net_cookie); 78 + 79 + if (res) 80 + return res; 81 + res = atomic64_inc_return(&cookie_gen); 82 + atomic64_cmpxchg(&net->net_cookie, 0, res); 83 + } 84 + } 85 + 72 86 static struct net_generic *net_alloc_generic(void) 73 87 { 74 88 struct net_generic *ng; ··· 1101 1087 panic("Could not allocate generic netns"); 1102 1088 1103 1089 rcu_assign_pointer(init_net.gen, ng); 1090 + net_gen_cookie(&init_net); 1104 1091 1105 1092 down_write(&pernet_ops_rwsem); 1106 1093 if (setup_net(&init_net, &init_user_ns))
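`net_gen_cookie()` above lazily assigns a unique nonzero cookie on first read: allocate from a global counter, then cmpxchg it in, and if another CPU won the race, loop and adopt the winner's value rather than overwrite it. The same pattern can be sketched in user space with C11 atomics (names here are ours, not the kernel's):

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t cookie_gen;	/* global allocator, starts at 0 */

/* Lazily assign *cookie a unique nonzero id; racing losers re-read the
 * winner's value on the next loop iteration instead of clobbering it. */
uint64_t gen_cookie(_Atomic uint64_t *cookie)
{
	for (;;) {
		uint64_t res = atomic_load(cookie);
		uint64_t expected = 0;

		if (res)
			return res;	/* already assigned, stable forever */
		res = atomic_fetch_add(&cookie_gen, 1) + 1;
		atomic_compare_exchange_strong(cookie, &expected, res);
		/* on CAS failure, another thread installed a cookie first */
	}
}
```

The design choice is worth noting: allocating only on first use keeps namespace creation cheap, while the CAS loop guarantees every reader — including the losers of the race — observes the same cookie value.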
+14
net/core/rtnetlink.c
··· 1872 1872 }; 1873 1873 1874 1874 static const struct nla_policy ifla_xdp_policy[IFLA_XDP_MAX + 1] = { 1875 + [IFLA_XDP_UNSPEC] = { .strict_start_type = IFLA_XDP_EXPECTED_FD }, 1875 1876 [IFLA_XDP_FD] = { .type = NLA_S32 }, 1877 + [IFLA_XDP_EXPECTED_FD] = { .type = NLA_S32 }, 1876 1878 [IFLA_XDP_ATTACHED] = { .type = NLA_U8 }, 1877 1879 [IFLA_XDP_FLAGS] = { .type = NLA_U32 }, 1878 1880 [IFLA_XDP_PROG_ID] = { .type = NLA_U32 }, ··· 2801 2799 } 2802 2800 2803 2801 if (xdp[IFLA_XDP_FD]) { 2802 + int expected_fd = -1; 2803 + 2804 + if (xdp_flags & XDP_FLAGS_REPLACE) { 2805 + if (!xdp[IFLA_XDP_EXPECTED_FD]) { 2806 + err = -EINVAL; 2807 + goto errout; 2808 + } 2809 + expected_fd = 2810 + nla_get_s32(xdp[IFLA_XDP_EXPECTED_FD]); 2811 + } 2812 + 2804 2813 err = dev_change_xdp_fd(dev, extack, 2805 2814 nla_get_s32(xdp[IFLA_XDP_FD]), 2815 + expected_fd, 2806 2816 xdp_flags); 2807 2817 if (err) 2808 2818 goto errout;
+12
net/core/sock.c
··· 2071 2071 } 2072 2072 EXPORT_SYMBOL(sock_efree); 2073 2073 2074 + /* Buffer destructor for prefetch/receive path where reference count may 2075 + * not be held, e.g. for listen sockets. 2076 + */ 2077 + #ifdef CONFIG_INET 2078 + void sock_pfree(struct sk_buff *skb) 2079 + { 2080 + if (sk_is_refcounted(skb->sk)) 2081 + sock_gen_put(skb->sk); 2082 + } 2083 + EXPORT_SYMBOL(sock_pfree); 2084 + #endif /* CONFIG_INET */ 2085 + 2074 2086 kuid_t sock_i_uid(struct sock *sk) 2075 2087 { 2076 2088 kuid_t uid;
+33
net/ipv4/bpf_tcp_ca.c
··· 7 7 #include <linux/btf.h> 8 8 #include <linux/filter.h> 9 9 #include <net/tcp.h> 10 + #include <net/bpf_sk_storage.h> 10 11 11 12 static u32 optional_ops[] = { 12 13 offsetof(struct tcp_congestion_ops, init), ··· 28 27 static const struct btf_type *tcp_sock_type; 29 28 static u32 tcp_sock_id, sock_id; 30 29 30 + static int btf_sk_storage_get_ids[5]; 31 + static struct bpf_func_proto btf_sk_storage_get_proto __read_mostly; 32 + 33 + static int btf_sk_storage_delete_ids[5]; 34 + static struct bpf_func_proto btf_sk_storage_delete_proto __read_mostly; 35 + 36 + static void convert_sk_func_proto(struct bpf_func_proto *to, int *to_btf_ids, 37 + const struct bpf_func_proto *from) 38 + { 39 + int i; 40 + 41 + *to = *from; 42 + to->btf_id = to_btf_ids; 43 + for (i = 0; i < ARRAY_SIZE(to->arg_type); i++) { 44 + if (to->arg_type[i] == ARG_PTR_TO_SOCKET) { 45 + to->arg_type[i] = ARG_PTR_TO_BTF_ID; 46 + to->btf_id[i] = tcp_sock_id; 47 + } 48 + } 49 + } 50 + 31 51 static int bpf_tcp_ca_init(struct btf *btf) 32 52 { 33 53 s32 type_id; ··· 63 41 return -EINVAL; 64 42 tcp_sock_id = type_id; 65 43 tcp_sock_type = btf_type_by_id(btf, tcp_sock_id); 44 + 45 + convert_sk_func_proto(&btf_sk_storage_get_proto, 46 + btf_sk_storage_get_ids, 47 + &bpf_sk_storage_get_proto); 48 + convert_sk_func_proto(&btf_sk_storage_delete_proto, 49 + btf_sk_storage_delete_ids, 50 + &bpf_sk_storage_delete_proto); 66 51 67 52 return 0; 68 53 } ··· 196 167 switch (func_id) { 197 168 case BPF_FUNC_tcp_send_ack: 198 169 return &bpf_tcp_send_ack_proto; 170 + case BPF_FUNC_sk_storage_get: 171 + return &btf_sk_storage_get_proto; 172 + case BPF_FUNC_sk_storage_delete: 173 + return &btf_sk_storage_delete_proto; 199 174 default: 200 175 return bpf_base_func_proto(func_id); 201 176 }
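`convert_sk_func_proto()` above clones a helper prototype and rewrites every `ARG_PTR_TO_SOCKET` argument into `ARG_PTR_TO_BTF_ID` bound to `tcp_sock`'s BTF id, so struct_ops programs can pass their BTF-typed `tcp_sock` pointer to the sk_storage helpers. A toy model of that rewrite (enum values, sizes, and the id are illustrative stand-ins, not the kernel's definitions):

```c
#include <stddef.h>

enum sketch_arg_type { ARG_ANYTHING_S, ARG_PTR_TO_SOCKET_S, ARG_PTR_TO_BTF_ID_S };

#define SKETCH_MAX_ARGS 5

struct sketch_proto {
	enum sketch_arg_type arg_type[SKETCH_MAX_ARGS];
	int btf_id[SKETCH_MAX_ARGS];
};

/* Copy 'from' into 'to', converting plain socket-pointer arguments into
 * BTF-typed pointer arguments carrying the given type id. */
void convert_sk_proto(struct sketch_proto *to, const struct sketch_proto *from,
		      int tcp_sock_btf_id)
{
	size_t i;

	*to = *from;
	for (i = 0; i < SKETCH_MAX_ARGS; i++) {
		if (to->arg_type[i] == ARG_PTR_TO_SOCKET_S) {
			to->arg_type[i] = ARG_PTR_TO_BTF_ID_S;
			to->btf_id[i] = tcp_sock_btf_id;
		}
	}
}
```

Cloning rather than mutating the shared `bpf_sk_storage_get_proto` matters: other program types keep using the original socket-argument prototype untouched.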
+2 -1
net/ipv4/ip_input.c
··· 509 509 IPCB(skb)->iif = skb->skb_iif; 510 510 511 511 /* Must drop socket now because of tproxy. */ 512 - skb_orphan(skb); 512 + if (!skb_sk_is_prefetched(skb)) 513 + skb_orphan(skb); 513 514 514 515 return skb; 515 516
+76 -76
net/ipv4/tcp_bpf.c
··· 10 10 #include <net/inet_common.h> 11 11 #include <net/tls.h> 12 12 13 - static bool tcp_bpf_stream_read(const struct sock *sk) 14 - { 15 - struct sk_psock *psock; 16 - bool empty = true; 17 - 18 - rcu_read_lock(); 19 - psock = sk_psock(sk); 20 - if (likely(psock)) 21 - empty = list_empty(&psock->ingress_msg); 22 - rcu_read_unlock(); 23 - return !empty; 24 - } 25 - 26 - static int tcp_bpf_wait_data(struct sock *sk, struct sk_psock *psock, 27 - int flags, long timeo, int *err) 28 - { 29 - DEFINE_WAIT_FUNC(wait, woken_wake_function); 30 - int ret = 0; 31 - 32 - if (!timeo) 33 - return ret; 34 - 35 - add_wait_queue(sk_sleep(sk), &wait); 36 - sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); 37 - ret = sk_wait_event(sk, &timeo, 38 - !list_empty(&psock->ingress_msg) || 39 - !skb_queue_empty(&sk->sk_receive_queue), &wait); 40 - sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); 41 - remove_wait_queue(sk_sleep(sk), &wait); 42 - return ret; 43 - } 44 - 45 13 int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, 46 14 struct msghdr *msg, int len, int flags) 47 15 { ··· 82 114 return copied; 83 115 } 84 116 EXPORT_SYMBOL_GPL(__tcp_bpf_recvmsg); 85 - 86 - int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, 87 - int nonblock, int flags, int *addr_len) 88 - { 89 - struct sk_psock *psock; 90 - int copied, ret; 91 - 92 - psock = sk_psock_get(sk); 93 - if (unlikely(!psock)) 94 - return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len); 95 - if (unlikely(flags & MSG_ERRQUEUE)) 96 - return inet_recv_error(sk, msg, len, addr_len); 97 - if (!skb_queue_empty(&sk->sk_receive_queue) && 98 - sk_psock_queue_empty(psock)) 99 - return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len); 100 - lock_sock(sk); 101 - msg_bytes_ready: 102 - copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags); 103 - if (!copied) { 104 - int data, err = 0; 105 - long timeo; 106 - 107 - timeo = sock_rcvtimeo(sk, nonblock); 108 - data = tcp_bpf_wait_data(sk, psock, flags, timeo, &err); 109 - if (data) 
{ 110 - if (!sk_psock_queue_empty(psock)) 111 - goto msg_bytes_ready; 112 - release_sock(sk); 113 - sk_psock_put(sk, psock); 114 - return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len); 115 - } 116 - if (err) { 117 - ret = err; 118 - goto out; 119 - } 120 - copied = -EAGAIN; 121 - } 122 - ret = copied; 123 - out: 124 - release_sock(sk); 125 - sk_psock_put(sk, psock); 126 - return ret; 127 - } 128 117 129 118 static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock, 130 119 struct sk_msg *msg, u32 apply_bytes, int flags) ··· 222 297 return ret; 223 298 } 224 299 EXPORT_SYMBOL_GPL(tcp_bpf_sendmsg_redir); 300 + 301 + #ifdef CONFIG_BPF_STREAM_PARSER 302 + static bool tcp_bpf_stream_read(const struct sock *sk) 303 + { 304 + struct sk_psock *psock; 305 + bool empty = true; 306 + 307 + rcu_read_lock(); 308 + psock = sk_psock(sk); 309 + if (likely(psock)) 310 + empty = list_empty(&psock->ingress_msg); 311 + rcu_read_unlock(); 312 + return !empty; 313 + } 314 + 315 + static int tcp_bpf_wait_data(struct sock *sk, struct sk_psock *psock, 316 + int flags, long timeo, int *err) 317 + { 318 + DEFINE_WAIT_FUNC(wait, woken_wake_function); 319 + int ret = 0; 320 + 321 + if (!timeo) 322 + return ret; 323 + 324 + add_wait_queue(sk_sleep(sk), &wait); 325 + sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); 326 + ret = sk_wait_event(sk, &timeo, 327 + !list_empty(&psock->ingress_msg) || 328 + !skb_queue_empty(&sk->sk_receive_queue), &wait); 329 + sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); 330 + remove_wait_queue(sk_sleep(sk), &wait); 331 + return ret; 332 + } 333 + 334 + static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, 335 + int nonblock, int flags, int *addr_len) 336 + { 337 + struct sk_psock *psock; 338 + int copied, ret; 339 + 340 + psock = sk_psock_get(sk); 341 + if (unlikely(!psock)) 342 + return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len); 343 + if (unlikely(flags & MSG_ERRQUEUE)) 344 + return inet_recv_error(sk, msg, len, addr_len); 345 + if 
(!skb_queue_empty(&sk->sk_receive_queue) && 346 + sk_psock_queue_empty(psock)) 347 + return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len); 348 + lock_sock(sk); 349 + msg_bytes_ready: 350 + copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags); 351 + if (!copied) { 352 + int data, err = 0; 353 + long timeo; 354 + 355 + timeo = sock_rcvtimeo(sk, nonblock); 356 + data = tcp_bpf_wait_data(sk, psock, flags, timeo, &err); 357 + if (data) { 358 + if (!sk_psock_queue_empty(psock)) 359 + goto msg_bytes_ready; 360 + release_sock(sk); 361 + sk_psock_put(sk, psock); 362 + return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len); 363 + } 364 + if (err) { 365 + ret = err; 366 + goto out; 367 + } 368 + copied = -EAGAIN; 369 + } 370 + ret = copied; 371 + out: 372 + release_sock(sk); 373 + sk_psock_put(sk, psock); 374 + return ret; 375 + } 225 376 226 377 static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock, 227 378 struct sk_msg *msg, int *copied, int flags) ··· 529 528 return copied ? copied : err; 530 529 } 531 530 532 - #ifdef CONFIG_BPF_STREAM_PARSER 533 531 enum { 534 532 TCP_BPF_IPV4, 535 533 TCP_BPF_IPV6,
+4 -2
net/ipv4/udp.c
··· 2288 2288 struct rtable *rt = skb_rtable(skb); 2289 2289 __be32 saddr, daddr; 2290 2290 struct net *net = dev_net(skb->dev); 2291 + bool refcounted; 2291 2292 2292 2293 /* 2293 2294 * Validate the packet. ··· 2314 2313 if (udp4_csum_init(skb, uh, proto)) 2315 2314 goto csum_error; 2316 2315 2317 - sk = skb_steal_sock(skb); 2316 + sk = skb_steal_sock(skb, &refcounted); 2318 2317 if (sk) { 2319 2318 struct dst_entry *dst = skb_dst(skb); 2320 2319 int ret; ··· 2323 2322 udp_sk_rx_dst_set(sk, dst); 2324 2323 2325 2324 ret = udp_unicast_rcv_skb(sk, skb, uh); 2326 - sock_put(sk); 2325 + if (refcounted) 2326 + sock_put(sk); 2327 2327 return ret; 2328 2328 } 2329 2329
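The `refcounted` out-parameter threaded through `skb_steal_sock()` in the UDP hunks above encodes a contract: some prefetched sockets (RCU-freed listeners, for instance) carry no reference, so the caller must pair `sock_put()` only with the flag. A toy model of that contract, with invented types standing in for `struct sock` and friends:

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_sock {
	bool rcu_freed;	/* e.g. a listener: freed via RCU, not refcounted */
	int  refcnt;
};

/* Hand the prefetched socket to the caller and report whether a
 * reference is actually held on it. */
struct toy_sock *steal_sock(struct toy_sock *sk, bool *refcounted)
{
	*refcounted = sk && !sk->rcu_freed;
	return sk;
}

/* Drop a reference only if steal_sock() said we hold one. */
void put_sock(struct toy_sock *sk, bool refcounted)
{
	if (refcounted)
		sk->refcnt--;
}
```

This is the shape of the fix in both `udp.c` hunks: the unconditional `sock_put(sk)` calls become conditional on `refcounted`, which is what makes `bpf_sk_assign()`-prefetched listen sockets safe to receive on.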
+2 -1
net/ipv6/ip6_input.c
··· 285 285 rcu_read_unlock(); 286 286 287 287 /* Must drop socket now because of tproxy. */ 288 - skb_orphan(skb); 288 + if (!skb_sk_is_prefetched(skb)) 289 + skb_orphan(skb); 289 290 290 291 return skb; 291 292 err:
+6 -3
net/ipv6/udp.c
··· 843 843 struct net *net = dev_net(skb->dev); 844 844 struct udphdr *uh; 845 845 struct sock *sk; 846 + bool refcounted; 846 847 u32 ulen = 0; 847 848 848 849 if (!pskb_may_pull(skb, sizeof(struct udphdr))) ··· 880 879 goto csum_error; 881 880 882 881 /* Check if the socket is already available, e.g. due to early demux */ 883 - sk = skb_steal_sock(skb); 882 + sk = skb_steal_sock(skb, &refcounted); 884 883 if (sk) { 885 884 struct dst_entry *dst = skb_dst(skb); 886 885 int ret; ··· 889 888 udp6_sk_rx_dst_set(sk, dst); 890 889 891 890 if (!uh->check && !udp_sk(sk)->no_check6_rx) { 892 - sock_put(sk); 891 + if (refcounted) 892 + sock_put(sk); 893 893 goto report_csum_error; 894 894 } 895 895 896 896 ret = udp6_unicast_rcv_skb(sk, skb, uh); 897 - sock_put(sk); 897 + if (refcounted) 898 + sock_put(sk); 898 899 return ret; 899 900 } 900 901
+3
net/sched/act_bpf.c
··· 12 12 #include <linux/bpf.h> 13 13 14 14 #include <net/netlink.h> 15 + #include <net/sock.h> 15 16 #include <net/pkt_sched.h> 16 17 #include <net/pkt_cls.h> 17 18 ··· 54 53 bpf_compute_data_pointers(skb); 55 54 filter_res = BPF_PROG_RUN(filter, skb); 56 55 } 56 + if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK) 57 + skb_orphan(skb); 57 58 rcu_read_unlock(); 58 59 59 60 /* A BPF program may overwrite the default action opcode.
+4 -4
samples/bpf/Makefile
··· 64 64 sockex1-objs := sockex1_user.o 65 65 sockex2-objs := sockex2_user.o 66 66 sockex3-objs := bpf_load.o sockex3_user.o 67 - tracex1-objs := bpf_load.o tracex1_user.o 67 + tracex1-objs := bpf_load.o tracex1_user.o $(TRACE_HELPERS) 68 68 tracex2-objs := bpf_load.o tracex2_user.o 69 69 tracex3-objs := bpf_load.o tracex3_user.o 70 70 tracex4-objs := bpf_load.o tracex4_user.o 71 - tracex5-objs := bpf_load.o tracex5_user.o 71 + tracex5-objs := bpf_load.o tracex5_user.o $(TRACE_HELPERS) 72 72 tracex6-objs := bpf_load.o tracex6_user.o 73 73 tracex7-objs := bpf_load.o tracex7_user.o 74 74 test_probe_write_user-objs := bpf_load.o test_probe_write_user_user.o ··· 88 88 xdp_router_ipv4-objs := xdp_router_ipv4_user.o 89 89 test_current_task_under_cgroup-objs := bpf_load.o $(CGROUP_HELPERS) \ 90 90 test_current_task_under_cgroup_user.o 91 - trace_event-objs := bpf_load.o trace_event_user.o $(TRACE_HELPERS) 92 - sampleip-objs := bpf_load.o sampleip_user.o $(TRACE_HELPERS) 91 + trace_event-objs := trace_event_user.o $(TRACE_HELPERS) 92 + sampleip-objs := sampleip_user.o $(TRACE_HELPERS) 93 93 tc_l2_redirect-objs := bpf_load.o tc_l2_redirect_user.o 94 94 lwt_len_hist-objs := bpf_load.o lwt_len_hist_user.o 95 95 xdp_tx_iptunnel-objs := xdp_tx_iptunnel_user.o
-20
samples/bpf/bpf_load.c
··· 665 665 { 666 666 return do_load_bpf_file(path, fixup_map); 667 667 } 668 - 669 - void read_trace_pipe(void) 670 - { 671 - int trace_fd; 672 - 673 - trace_fd = open(DEBUGFS "trace_pipe", O_RDONLY, 0); 674 - if (trace_fd < 0) 675 - return; 676 - 677 - while (1) { 678 - static char buf[4096]; 679 - ssize_t sz; 680 - 681 - sz = read(trace_fd, buf, sizeof(buf) - 1); 682 - if (sz > 0) { 683 - buf[sz] = 0; 684 - puts(buf); 685 - } 686 - } 687 - }
-1
samples/bpf/bpf_load.h
··· 53 53 int load_bpf_file(char *path); 54 54 int load_bpf_file_fixup_map(const char *path, fixup_map_cb fixup_map); 55 55 56 - void read_trace_pipe(void); 57 56 int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags); 58 57 #endif
+64 -34
samples/bpf/sampleip_user.c
··· 10 10 #include <errno.h> 11 11 #include <signal.h> 12 12 #include <string.h> 13 - #include <assert.h> 14 13 #include <linux/perf_event.h> 15 14 #include <linux/ptrace.h> 16 15 #include <linux/bpf.h> 17 - #include <sys/ioctl.h> 16 + #include <bpf/bpf.h> 18 17 #include <bpf/libbpf.h> 19 - #include "bpf_load.h" 20 18 #include "perf-sys.h" 21 19 #include "trace_helpers.h" 20 + 21 + #define __must_check 22 + #include <linux/err.h> 22 23 23 24 #define DEFAULT_FREQ 99 24 25 #define DEFAULT_SECS 5 25 26 #define MAX_IPS 8192 26 27 #define PAGE_OFFSET 0xffff880000000000 27 28 29 + static int map_fd; 28 30 static int nr_cpus; 29 31 30 32 static void usage(void) ··· 36 34 printf(" duration # sampling duration (seconds), default 5\n"); 37 35 } 38 36 39 - static int sampling_start(int *pmu_fd, int freq) 37 + static int sampling_start(int freq, struct bpf_program *prog, 38 + struct bpf_link *links[]) 40 39 { 41 - int i; 40 + int i, pmu_fd; 42 41 43 42 struct perf_event_attr pe_sample_attr = { 44 43 .type = PERF_TYPE_SOFTWARE, ··· 50 47 }; 51 48 52 49 for (i = 0; i < nr_cpus; i++) { 53 - pmu_fd[i] = sys_perf_event_open(&pe_sample_attr, -1 /* pid */, i, 50 + pmu_fd = sys_perf_event_open(&pe_sample_attr, -1 /* pid */, i, 54 51 -1 /* group_fd */, 0 /* flags */); 55 - if (pmu_fd[i] < 0) { 52 + if (pmu_fd < 0) { 56 53 fprintf(stderr, "ERROR: Initializing perf sampling\n"); 57 54 return 1; 58 55 } 59 - assert(ioctl(pmu_fd[i], PERF_EVENT_IOC_SET_BPF, 60 - prog_fd[0]) == 0); 61 - assert(ioctl(pmu_fd[i], PERF_EVENT_IOC_ENABLE, 0) == 0); 56 + links[i] = bpf_program__attach_perf_event(prog, pmu_fd); 57 + if (IS_ERR(links[i])) { 58 + fprintf(stderr, "ERROR: Attach perf event\n"); 59 + links[i] = NULL; 60 + close(pmu_fd); 61 + return 1; 62 + } 62 63 } 63 64 64 65 return 0; 65 66 } 66 67 67 - static void sampling_end(int *pmu_fd) 68 + static void sampling_end(struct bpf_link *links[]) 68 69 { 69 70 int i; 70 71 71 72 for (i = 0; i < nr_cpus; i++) 72 - close(pmu_fd[i]); 73 + 
bpf_link__destroy(links[i]); 73 74 } 74 75 75 76 struct ipcount { ··· 135 128 static void int_exit(int sig) 136 129 { 137 130 printf("\n"); 138 - print_ip_map(map_fd[0]); 131 + print_ip_map(map_fd); 139 132 exit(0); 140 133 } 141 134 142 135 int main(int argc, char **argv) 143 136 { 137 + int opt, freq = DEFAULT_FREQ, secs = DEFAULT_SECS, error = 1; 138 + struct bpf_object *obj = NULL; 139 + struct bpf_program *prog; 140 + struct bpf_link **links; 144 141 char filename[256]; 145 - int *pmu_fd, opt, freq = DEFAULT_FREQ, secs = DEFAULT_SECS; 146 142 147 143 /* process arguments */ 148 144 while ((opt = getopt(argc, argv, "F:h")) != -1) { ··· 173 163 } 174 164 175 165 /* create perf FDs for each CPU */ 176 - nr_cpus = sysconf(_SC_NPROCESSORS_CONF); 177 - pmu_fd = malloc(nr_cpus * sizeof(int)); 178 - if (pmu_fd == NULL) { 179 - fprintf(stderr, "ERROR: malloc of pmu_fd\n"); 180 - return 1; 166 + nr_cpus = sysconf(_SC_NPROCESSORS_ONLN); 167 + links = calloc(nr_cpus, sizeof(struct bpf_link *)); 168 + if (!links) { 169 + fprintf(stderr, "ERROR: malloc of links\n"); 170 + goto cleanup; 171 + } 172 + 173 + snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); 174 + obj = bpf_object__open_file(filename, NULL); 175 + if (IS_ERR(obj)) { 176 + fprintf(stderr, "ERROR: opening BPF object file failed\n"); 177 + obj = NULL; 178 + goto cleanup; 179 + } 180 + 181 + prog = bpf_object__find_program_by_name(obj, "do_sample"); 182 + if (!prog) { 183 + fprintf(stderr, "ERROR: finding a prog in obj file failed\n"); 184 + goto cleanup; 181 185 } 182 186 183 187 /* load BPF program */ 184 - snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); 185 - if (load_bpf_file(filename)) { 186 - fprintf(stderr, "ERROR: loading BPF program (errno %d):\n", 187 - errno); 188 - if (strcmp(bpf_log_buf, "") == 0) 189 - fprintf(stderr, "Try: ulimit -l unlimited\n"); 190 - else 191 - fprintf(stderr, "%s", bpf_log_buf); 192 - return 1; 188 + if (bpf_object__load(obj)) { 189 + fprintf(stderr, "ERROR: 
loading BPF object file failed\n"); 190 + goto cleanup; 193 191 } 192 + 193 + map_fd = bpf_object__find_map_fd_by_name(obj, "ip_map"); 194 + if (map_fd < 0) { 195 + fprintf(stderr, "ERROR: finding a map in obj file failed\n"); 196 + goto cleanup; 197 + } 198 + 194 199 signal(SIGINT, int_exit); 195 200 signal(SIGTERM, int_exit); 196 201 197 202 /* do sampling */ 198 203 printf("Sampling at %d Hertz for %d seconds. Ctrl-C also ends.\n", 199 204 freq, secs); 200 - if (sampling_start(pmu_fd, freq) != 0) 201 - return 1; 205 + if (sampling_start(freq, prog, links) != 0) 206 + goto cleanup; 207 + 202 208 sleep(secs); 203 - sampling_end(pmu_fd); 204 - free(pmu_fd); 209 + error = 0; 205 210 211 + cleanup: 212 + sampling_end(links); 206 213 /* output sample counts */ 207 - print_ip_map(map_fd[0]); 214 + if (!error) 215 + print_ip_map(map_fd); 208 216 209 - return 0; 217 + free(links); 218 + bpf_object__close(obj); 219 + return error; 210 220 }
+94 -47
samples/bpf/trace_event_user.c
··· 6 6 #include <stdlib.h> 7 7 #include <stdbool.h> 8 8 #include <string.h> 9 - #include <fcntl.h> 10 - #include <poll.h> 11 - #include <sys/ioctl.h> 12 9 #include <linux/perf_event.h> 13 10 #include <linux/bpf.h> 14 11 #include <signal.h> 15 - #include <assert.h> 16 12 #include <errno.h> 17 13 #include <sys/resource.h> 14 + #include <bpf/bpf.h> 18 15 #include <bpf/libbpf.h> 19 - #include "bpf_load.h" 20 16 #include "perf-sys.h" 21 17 #include "trace_helpers.h" 22 18 19 + #define __must_check 20 + #include <linux/err.h> 21 + 23 22 #define SAMPLE_FREQ 50 24 23 24 + static int pid; 25 + /* counts, stackmap */ 26 + static int map_fd[2]; 27 + struct bpf_program *prog; 25 28 static bool sys_read_seen, sys_write_seen; 26 29 27 30 static void print_ksym(__u64 addr) ··· 94 91 } 95 92 } 96 93 97 - static void int_exit(int sig) 94 + static void err_exit(int err) 98 95 { 99 - kill(0, SIGKILL); 100 - exit(0); 96 + kill(pid, SIGKILL); 97 + exit(err); 101 98 } 102 99 103 100 static void print_stacks(void) ··· 105 102 struct key_t key = {}, next_key; 106 103 __u64 value; 107 104 __u32 stackid = 0, next_id; 108 - int fd = map_fd[0], stack_map = map_fd[1]; 105 + int error = 1, fd = map_fd[0], stack_map = map_fd[1]; 109 106 110 107 sys_read_seen = sys_write_seen = false; 111 108 while (bpf_map_get_next_key(fd, &key, &next_key) == 0) { ··· 117 114 printf("\n"); 118 115 if (!sys_read_seen || !sys_write_seen) { 119 116 printf("BUG kernel stack doesn't contain sys_read() and sys_write()\n"); 120 - int_exit(0); 117 + err_exit(error); 121 118 } 122 119 123 120 /* clear stack map */ ··· 139 136 140 137 static void test_perf_event_all_cpu(struct perf_event_attr *attr) 141 138 { 142 - int nr_cpus = sysconf(_SC_NPROCESSORS_CONF); 143 - int *pmu_fd = malloc(nr_cpus * sizeof(int)); 144 - int i, error = 0; 139 + int nr_cpus = sysconf(_SC_NPROCESSORS_ONLN); 140 + struct bpf_link **links = calloc(nr_cpus, sizeof(struct bpf_link *)); 141 + int i, pmu_fd, error = 1; 142 + 143 + if (!links) { 144 + 
printf("malloc of links failed\n"); 145 + goto err; 146 + } 145 147 146 148 /* system wide perf event, no need to inherit */ 147 149 attr->inherit = 0; 148 150 149 151 /* open perf_event on all cpus */ 150 152 for (i = 0; i < nr_cpus; i++) { 151 - pmu_fd[i] = sys_perf_event_open(attr, -1, i, -1, 0); 152 - if (pmu_fd[i] < 0) { 153 + pmu_fd = sys_perf_event_open(attr, -1, i, -1, 0); 154 + if (pmu_fd < 0) { 153 155 printf("sys_perf_event_open failed\n"); 154 - error = 1; 155 156 goto all_cpu_err; 156 157 } 157 - assert(ioctl(pmu_fd[i], PERF_EVENT_IOC_SET_BPF, prog_fd[0]) == 0); 158 - assert(ioctl(pmu_fd[i], PERF_EVENT_IOC_ENABLE) == 0); 158 + links[i] = bpf_program__attach_perf_event(prog, pmu_fd); 159 + if (IS_ERR(links[i])) { 160 + printf("bpf_program__attach_perf_event failed\n"); 161 + links[i] = NULL; 162 + close(pmu_fd); 163 + goto all_cpu_err; 164 + } 159 165 } 160 166 161 - if (generate_load() < 0) { 162 - error = 1; 167 + if (generate_load() < 0) 163 168 goto all_cpu_err; 164 - } 169 + 165 170 print_stacks(); 171 + error = 0; 166 172 all_cpu_err: 167 - for (i--; i >= 0; i--) { 168 - ioctl(pmu_fd[i], PERF_EVENT_IOC_DISABLE); 169 - close(pmu_fd[i]); 170 - } 171 - free(pmu_fd); 173 + for (i--; i >= 0; i--) 174 + bpf_link__destroy(links[i]); 175 + err: 176 + free(links); 172 177 if (error) 173 - int_exit(0); 178 + err_exit(error); 174 179 } 175 180 176 181 static void test_perf_event_task(struct perf_event_attr *attr) 177 182 { 178 - int pmu_fd, error = 0; 183 + struct bpf_link *link = NULL; 184 + int pmu_fd, error = 1; 179 185 180 186 /* per task perf event, enable inherit so the "dd ..." command can be traced properly. 181 187 * Enabling inherit will cause bpf_perf_prog_read_time helper failure. 
··· 195 183 pmu_fd = sys_perf_event_open(attr, 0, -1, -1, 0); 196 184 if (pmu_fd < 0) { 197 185 printf("sys_perf_event_open failed\n"); 198 - int_exit(0); 199 - } 200 - assert(ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd[0]) == 0); 201 - assert(ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE) == 0); 202 - 203 - if (generate_load() < 0) { 204 - error = 1; 205 186 goto err; 206 187 } 188 + link = bpf_program__attach_perf_event(prog, pmu_fd); 189 + if (IS_ERR(link)) { 190 + printf("bpf_program__attach_perf_event failed\n"); 191 + link = NULL; 192 + close(pmu_fd); 193 + goto err; 194 + } 195 + 196 + if (generate_load() < 0) 197 + goto err; 198 + 207 199 print_stacks(); 200 + error = 0; 208 201 err: 209 - ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE); 210 - close(pmu_fd); 202 + bpf_link__destroy(link); 211 203 if (error) 212 - int_exit(0); 204 + err_exit(error); 213 205 } 214 206 215 207 static void test_bpf_perf_event(void) ··· 298 282 int main(int argc, char **argv) 299 283 { 300 284 struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; 285 + struct bpf_object *obj = NULL; 301 286 char filename[256]; 287 + int error = 1; 302 288 303 289 snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); 304 290 setrlimit(RLIMIT_MEMLOCK, &r); 305 291 306 - signal(SIGINT, int_exit); 307 - signal(SIGTERM, int_exit); 292 + signal(SIGINT, err_exit); 293 + signal(SIGTERM, err_exit); 308 294 309 295 if (load_kallsyms()) { 310 296 printf("failed to process /proc/kallsyms\n"); 311 - return 1; 297 + goto cleanup; 312 298 } 313 299 314 - if (load_bpf_file(filename)) { 315 - printf("%s", bpf_log_buf); 316 - return 2; 300 + obj = bpf_object__open_file(filename, NULL); 301 + if (IS_ERR(obj)) { 302 + printf("opening BPF object file failed\n"); 303 + obj = NULL; 304 + goto cleanup; 317 305 } 318 306 319 - if (fork() == 0) { 307 + prog = bpf_object__find_program_by_name(obj, "bpf_prog1"); 308 + if (!prog) { 309 + printf("finding a prog in obj file failed\n"); 310 + goto cleanup; 311 + } 312 + 313 + /* load BPF 
program */ 314 + if (bpf_object__load(obj)) { 315 + printf("loading BPF object file failed\n"); 316 + goto cleanup; 317 + } 318 + 319 + map_fd[0] = bpf_object__find_map_fd_by_name(obj, "counts"); 320 + map_fd[1] = bpf_object__find_map_fd_by_name(obj, "stackmap"); 321 + if (map_fd[0] < 0 || map_fd[1] < 0) { 322 + printf("finding a counts/stackmap map in obj file failed\n"); 323 + goto cleanup; 324 + } 325 + 326 + pid = fork(); 327 + if (pid == 0) { 320 328 read_trace_pipe(); 321 329 return 0; 330 + } else if (pid == -1) { 331 + printf("couldn't spawn process\n"); 332 + goto cleanup; 322 333 } 334 + 323 335 test_bpf_perf_event(); 324 - int_exit(0); 325 - return 0; 336 + error = 0; 337 + 338 + cleanup: 339 + bpf_object__close(obj); 340 + err_exit(error); 326 341 }
+1
samples/bpf/tracex1_user.c
··· 4 4 #include <unistd.h> 5 5 #include <bpf/bpf.h> 6 6 #include "bpf_load.h" 7 + #include "trace_helpers.h" 7 8 8 9 int main(int ac, char **argv) 9 10 {
+1
samples/bpf/tracex5_user.c
··· 8 8 #include <bpf/bpf.h> 9 9 #include "bpf_load.h" 10 10 #include <sys/resource.h> 11 + #include "trace_helpers.h" 11 12 12 13 /* install fake seccomp program to enable seccomp code path inside the kernel, 13 14 * so that our kprobe attached to seccomp_phase1() can be triggered
+5 -5
security/Kconfig
··· 277 277 278 278 config LSM 279 279 string "Ordered list of enabled LSMs" 280 - default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor" if DEFAULT_SECURITY_SMACK 281 - default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo" if DEFAULT_SECURITY_APPARMOR 282 - default "lockdown,yama,loadpin,safesetid,integrity,tomoyo" if DEFAULT_SECURITY_TOMOYO 283 - default "lockdown,yama,loadpin,safesetid,integrity" if DEFAULT_SECURITY_DAC 284 - default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor" 280 + default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK 281 + default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR 282 + default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO 283 + default "lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC 284 + default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" 285 285 help 286 286 A comma-separated list of LSMs, in initialization order. 287 287 Any LSMs left off this list will be ignored. This can be
+2
security/Makefile
··· 12 12 subdir-$(CONFIG_SECURITY_LOADPIN) += loadpin 13 13 subdir-$(CONFIG_SECURITY_SAFESETID) += safesetid 14 14 subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown 15 + subdir-$(CONFIG_BPF_LSM) += bpf 15 16 16 17 # always enable default capabilities 17 18 obj-y += commoncap.o ··· 31 30 obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/ 32 31 obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown/ 33 32 obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o 33 + obj-$(CONFIG_BPF_LSM) += bpf/ 34 34 35 35 # Object integrity file lists 36 36 subdir-$(CONFIG_INTEGRITY) += integrity
+5
security/bpf/Makefile
··· 1 + # SPDX-License-Identifier: GPL-2.0 2 + # 3 + # Copyright (C) 2020 Google LLC. 4 + 5 + obj-$(CONFIG_BPF_LSM) := hooks.o
+26
security/bpf/hooks.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + /* 4 + * Copyright (C) 2020 Google LLC. 5 + */ 6 + #include <linux/lsm_hooks.h> 7 + #include <linux/bpf_lsm.h> 8 + 9 + static struct security_hook_list bpf_lsm_hooks[] __lsm_ro_after_init = { 10 + #define LSM_HOOK(RET, DEFAULT, NAME, ...) \ 11 + LSM_HOOK_INIT(NAME, bpf_lsm_##NAME), 12 + #include <linux/lsm_hook_defs.h> 13 + #undef LSM_HOOK 14 + }; 15 + 16 + static int __init bpf_lsm_init(void) 17 + { 18 + security_add_hooks(bpf_lsm_hooks, ARRAY_SIZE(bpf_lsm_hooks), "bpf"); 19 + pr_info("LSM support for eBPF active\n"); 20 + return 0; 21 + } 22 + 23 + DEFINE_LSM(bpf) = { 24 + .name = "bpf", 25 + .init = bpf_lsm_init, 26 + };
+30 -11
security/security.c
··· 669 669 } 670 670 671 671 /* 672 + * The default value of the LSM hook is defined in linux/lsm_hook_defs.h and 673 + * can be accessed with: 674 + * 675 + * LSM_RET_DEFAULT(<hook_name>) 676 + * 677 + * The macros below define static constants for the default value of each 678 + * LSM hook. 679 + */ 680 + #define LSM_RET_DEFAULT(NAME) (NAME##_default) 681 + #define DECLARE_LSM_RET_DEFAULT_void(DEFAULT, NAME) 682 + #define DECLARE_LSM_RET_DEFAULT_int(DEFAULT, NAME) \ 683 + static const int LSM_RET_DEFAULT(NAME) = (DEFAULT); 684 + #define LSM_HOOK(RET, DEFAULT, NAME, ...) \ 685 + DECLARE_LSM_RET_DEFAULT_##RET(DEFAULT, NAME) 686 + 687 + #include <linux/lsm_hook_defs.h> 688 + #undef LSM_HOOK 689 + 690 + /* 672 691 * Hook list operation macros. 673 692 * 674 693 * call_void_hook: ··· 1357 1338 int rc; 1358 1339 1359 1340 if (unlikely(IS_PRIVATE(inode))) 1360 - return -EOPNOTSUPP; 1341 + return LSM_RET_DEFAULT(inode_getsecurity); 1361 1342 /* 1362 1343 * Only one module will provide an attribute with a given name. 1363 1344 */ 1364 1345 hlist_for_each_entry(hp, &security_hook_heads.inode_getsecurity, list) { 1365 1346 rc = hp->hook.inode_getsecurity(inode, name, buffer, alloc); 1366 - if (rc != -EOPNOTSUPP) 1347 + if (rc != LSM_RET_DEFAULT(inode_getsecurity)) 1367 1348 return rc; 1368 1349 } 1369 - return -EOPNOTSUPP; 1350 + return LSM_RET_DEFAULT(inode_getsecurity); 1370 1351 } 1371 1352 1372 1353 int security_inode_setsecurity(struct inode *inode, const char *name, const void *value, size_t size, int flags) ··· 1375 1356 int rc; 1376 1357 1377 1358 if (unlikely(IS_PRIVATE(inode))) 1378 - return -EOPNOTSUPP; 1359 + return LSM_RET_DEFAULT(inode_setsecurity); 1379 1360 /* 1380 1361 * Only one module will provide an attribute with a given name. 
1381 1362 */ 1382 1363 hlist_for_each_entry(hp, &security_hook_heads.inode_setsecurity, list) { 1383 1364 rc = hp->hook.inode_setsecurity(inode, name, value, size, 1384 1365 flags); 1385 - if (rc != -EOPNOTSUPP) 1366 + if (rc != LSM_RET_DEFAULT(inode_setsecurity)) 1386 1367 return rc; 1387 1368 } 1388 - return -EOPNOTSUPP; 1369 + return LSM_RET_DEFAULT(inode_setsecurity); 1389 1370 } 1390 1371 1391 1372 int security_inode_listsecurity(struct inode *inode, char *buffer, size_t buffer_size) ··· 1759 1740 unsigned long arg4, unsigned long arg5) 1760 1741 { 1761 1742 int thisrc; 1762 - int rc = -ENOSYS; 1743 + int rc = LSM_RET_DEFAULT(task_prctl); 1763 1744 struct security_hook_list *hp; 1764 1745 1765 1746 hlist_for_each_entry(hp, &security_hook_heads.task_prctl, list) { 1766 1747 thisrc = hp->hook.task_prctl(option, arg2, arg3, arg4, arg5); 1767 - if (thisrc != -ENOSYS) { 1748 + if (thisrc != LSM_RET_DEFAULT(task_prctl)) { 1768 1749 rc = thisrc; 1769 1750 if (thisrc != 0) 1770 1751 break; ··· 1936 1917 continue; 1937 1918 return hp->hook.getprocattr(p, name, value); 1938 1919 } 1939 - return -EINVAL; 1920 + return LSM_RET_DEFAULT(getprocattr); 1940 1921 } 1941 1922 1942 1923 int security_setprocattr(const char *lsm, const char *name, void *value, ··· 1949 1930 continue; 1950 1931 return hp->hook.setprocattr(name, value, size); 1951 1932 } 1952 - return -EINVAL; 1933 + return LSM_RET_DEFAULT(setprocattr); 1953 1934 } 1954 1935 1955 1936 int security_netlink_send(struct sock *sk, struct sk_buff *skb) ··· 2334 2315 const struct flowi *fl) 2335 2316 { 2336 2317 struct security_hook_list *hp; 2337 - int rc = 1; 2318 + int rc = LSM_RET_DEFAULT(xfrm_state_pol_flow_match); 2338 2319 2339 2320 /* 2340 2321 * Since this function is expected to return 0 or 1, the judgment
+116
tools/bpf/bpftool/Documentation/bpftool-struct_ops.rst
··· 1 + ================== 2 + bpftool-struct_ops 3 + ================== 4 + ------------------------------------------------------------------------------- 5 + tool to register/unregister/introspect BPF struct_ops 6 + ------------------------------------------------------------------------------- 7 + 8 + :Manual section: 8 9 + 10 + SYNOPSIS 11 + ======== 12 + 13 + **bpftool** [*OPTIONS*] **struct_ops** *COMMAND* 14 + 15 + *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] } 16 + 17 + *COMMANDS* := 18 + { **show** | **list** | **dump** | **register** | **unregister** | **help** } 19 + 20 + STRUCT_OPS COMMANDS 21 + =================== 22 + 23 + | **bpftool** **struct_ops { show | list }** [*STRUCT_OPS_MAP*] 24 + | **bpftool** **struct_ops dump** [*STRUCT_OPS_MAP*] 25 + | **bpftool** **struct_ops register** *OBJ* 26 + | **bpftool** **struct_ops unregister** *STRUCT_OPS_MAP* 27 + | **bpftool** **struct_ops help** 28 + | 29 + | *STRUCT_OPS_MAP* := { **id** *STRUCT_OPS_MAP_ID* | **name** *STRUCT_OPS_MAP_NAME* } 30 + | *OBJ* := /a/file/of/bpf_struct_ops.o 31 + 32 + 33 + DESCRIPTION 34 + =========== 35 + **bpftool struct_ops { show | list }** [*STRUCT_OPS_MAP*] 36 + Show brief information about the struct_ops in the system. 37 + If *STRUCT_OPS_MAP* is specified, it shows information only 38 + for the given struct_ops. Otherwise, it lists all struct_ops 39 + currently existing in the system. 40 + 41 + Output will start with struct_ops map ID, followed by its map 42 + name and its struct_ops's kernel type. 43 + 44 + **bpftool struct_ops dump** [*STRUCT_OPS_MAP*] 45 + Dump detailed information about the struct_ops in the system. 46 + If *STRUCT_OPS_MAP* is specified, it dumps information only 47 + for the given struct_ops. Otherwise, it dumps all struct_ops 48 + currently existing in the system. 49 + 50 + **bpftool struct_ops register** *OBJ* 51 + Register bpf struct_ops from *OBJ*. 
All struct_ops under 52 + the ELF section ".struct_ops" will be registered to 53 + its kernel subsystem. 54 + 55 + **bpftool struct_ops unregister** *STRUCT_OPS_MAP* 56 + Unregister the *STRUCT_OPS_MAP* from the kernel subsystem. 57 + 58 + **bpftool struct_ops help** 59 + Print short help message. 60 + 61 + OPTIONS 62 + ======= 63 + -h, --help 64 + Print short generic help message (similar to **bpftool help**). 65 + 66 + -V, --version 67 + Print version number (similar to **bpftool version**). 68 + 69 + -j, --json 70 + Generate JSON output. For commands that cannot produce JSON, this 71 + option has no effect. 72 + 73 + -p, --pretty 74 + Generate human-readable JSON output. Implies **-j**. 75 + 76 + -d, --debug 77 + Print all logs available, even debug-level information. This 78 + includes logs from libbpf as well as from the verifier, when 79 + attempting to load programs. 80 + 81 + EXAMPLES 82 + ======== 83 + **# bpftool struct_ops show** 84 + 85 + :: 86 + 87 + 100: dctcp tcp_congestion_ops 88 + 105: cubic tcp_congestion_ops 89 + 90 + **# bpftool struct_ops unregister id 105** 91 + 92 + :: 93 + 94 + Unregistered tcp_congestion_ops cubic id 105 95 + 96 + **# bpftool struct_ops register bpf_cubic.o** 97 + 98 + :: 99 + 100 + Registered tcp_congestion_ops cubic id 110 101 + 102 + 103 + SEE ALSO 104 + ======== 105 + **bpf**\ (2), 106 + **bpf-helpers**\ (7), 107 + **bpftool**\ (8), 108 + **bpftool-prog**\ (8), 109 + **bpftool-map**\ (8), 110 + **bpftool-cgroup**\ (8), 111 + **bpftool-feature**\ (8), 112 + **bpftool-net**\ (8), 113 + **bpftool-perf**\ (8), 114 + **bpftool-btf**\ (8) 115 + **bpftool-gen**\ (8) 116 +
+28
tools/bpf/bpftool/bash-completion/bpftool
··· 576 576 ;; 577 577 esac 578 578 ;; 579 + struct_ops) 580 + local STRUCT_OPS_TYPE='id name' 581 + case $command in 582 + show|list|dump|unregister) 583 + case $prev in 584 + $command) 585 + COMPREPLY=( $( compgen -W "$STRUCT_OPS_TYPE" -- "$cur" ) ) 586 + ;; 587 + id) 588 + _bpftool_get_map_ids_for_type struct_ops 589 + ;; 590 + name) 591 + _bpftool_get_map_names_for_type struct_ops 592 + ;; 593 + esac 594 + return 0 595 + ;; 596 + register) 597 + _filedir 598 + return 0 599 + ;; 600 + *) 601 + [[ $prev == $object ]] && \ 602 + COMPREPLY=( $( compgen -W 'register unregister show list dump help' \ 603 + -- "$cur" ) ) 604 + ;; 605 + esac 606 + ;; 579 607 map) 580 608 local MAP_TYPE='id pinned name' 581 609 case $command in
+183 -16
tools/bpf/bpftool/btf_dumper.c
··· 4 4 #include <ctype.h> 5 5 #include <stdio.h> /* for (FILE *) used by json_writer */ 6 6 #include <string.h> 7 + #include <unistd.h> 7 8 #include <asm/byteorder.h> 8 9 #include <linux/bitops.h> 9 10 #include <linux/btf.h> 10 11 #include <linux/err.h> 11 12 #include <bpf/btf.h> 13 + #include <bpf/bpf.h> 12 14 13 15 #include "json_writer.h" 14 16 #include "main.h" ··· 24 22 static int btf_dumper_do_type(const struct btf_dumper *d, __u32 type_id, 25 23 __u8 bit_offset, const void *data); 26 24 27 - static void btf_dumper_ptr(const void *data, json_writer_t *jw, 28 - bool is_plain_text) 25 + static int btf_dump_func(const struct btf *btf, char *func_sig, 26 + const struct btf_type *func_proto, 27 + const struct btf_type *func, int pos, int size); 28 + 29 + static int dump_prog_id_as_func_ptr(const struct btf_dumper *d, 30 + const struct btf_type *func_proto, 31 + __u32 prog_id) 29 32 { 30 - if (is_plain_text) 31 - jsonw_printf(jw, "%p", *(void **)data); 33 + struct bpf_prog_info_linear *prog_info = NULL; 34 + const struct btf_type *func_type; 35 + const char *prog_name = NULL; 36 + struct bpf_func_info *finfo; 37 + struct btf *prog_btf = NULL; 38 + struct bpf_prog_info *info; 39 + int prog_fd, func_sig_len; 40 + char prog_str[1024]; 41 + 42 + /* Get the ptr's func_proto */ 43 + func_sig_len = btf_dump_func(d->btf, prog_str, func_proto, NULL, 0, 44 + sizeof(prog_str)); 45 + if (func_sig_len == -1) 46 + return -1; 47 + 48 + if (!prog_id) 49 + goto print; 50 + 51 + /* Get the bpf_prog's name. Obtain from func_info. 
*/ 52 + prog_fd = bpf_prog_get_fd_by_id(prog_id); 53 + if (prog_fd == -1) 54 + goto print; 55 + 56 + prog_info = bpf_program__get_prog_info_linear(prog_fd, 57 + 1UL << BPF_PROG_INFO_FUNC_INFO); 58 + close(prog_fd); 59 + if (IS_ERR(prog_info)) { 60 + prog_info = NULL; 61 + goto print; 62 + } 63 + info = &prog_info->info; 64 + 65 + if (!info->btf_id || !info->nr_func_info || 66 + btf__get_from_id(info->btf_id, &prog_btf)) 67 + goto print; 68 + finfo = (struct bpf_func_info *)info->func_info; 69 + func_type = btf__type_by_id(prog_btf, finfo->type_id); 70 + if (!func_type || !btf_is_func(func_type)) 71 + goto print; 72 + 73 + prog_name = btf__name_by_offset(prog_btf, func_type->name_off); 74 + 75 + print: 76 + if (!prog_id) 77 + snprintf(&prog_str[func_sig_len], 78 + sizeof(prog_str) - func_sig_len, " 0"); 79 + else if (prog_name) 80 + snprintf(&prog_str[func_sig_len], 81 + sizeof(prog_str) - func_sig_len, 82 + " %s/prog_id:%u", prog_name, prog_id); 32 83 else 33 - jsonw_printf(jw, "%lu", *(unsigned long *)data); 84 + snprintf(&prog_str[func_sig_len], 85 + sizeof(prog_str) - func_sig_len, 86 + " <unknown_prog_name>/prog_id:%u", prog_id); 87 + 88 + prog_str[sizeof(prog_str) - 1] = '\0'; 89 + jsonw_string(d->jw, prog_str); 90 + btf__free(prog_btf); 91 + free(prog_info); 92 + return 0; 93 + } 94 + 95 + static void btf_dumper_ptr(const struct btf_dumper *d, 96 + const struct btf_type *t, 97 + const void *data) 98 + { 99 + unsigned long value = *(unsigned long *)data; 100 + const struct btf_type *ptr_type; 101 + __s32 ptr_type_id; 102 + 103 + if (!d->prog_id_as_func_ptr || value > UINT32_MAX) 104 + goto print_ptr_value; 105 + 106 + ptr_type_id = btf__resolve_type(d->btf, t->type); 107 + if (ptr_type_id < 0) 108 + goto print_ptr_value; 109 + ptr_type = btf__type_by_id(d->btf, ptr_type_id); 110 + if (!ptr_type || !btf_is_func_proto(ptr_type)) 111 + goto print_ptr_value; 112 + 113 + if (!dump_prog_id_as_func_ptr(d, ptr_type, value)) 114 + return; 115 + 116 + print_ptr_value: 
117 + if (d->is_plain_text) 118 + jsonw_printf(d->jw, "%p", (void *)value); 119 + else 120 + jsonw_printf(d->jw, "%lu", value); 34 121 } 35 122 36 123 static int btf_dumper_modifier(const struct btf_dumper *d, __u32 type_id, ··· 134 43 return btf_dumper_do_type(d, actual_type_id, bit_offset, data); 135 44 } 136 45 137 - static void btf_dumper_enum(const void *data, json_writer_t *jw) 46 + static int btf_dumper_enum(const struct btf_dumper *d, 47 + const struct btf_type *t, 48 + const void *data) 138 49 { 139 - jsonw_printf(jw, "%d", *(int *)data); 50 + const struct btf_enum *enums = btf_enum(t); 51 + __s64 value; 52 + __u16 i; 53 + 54 + switch (t->size) { 55 + case 8: 56 + value = *(__s64 *)data; 57 + break; 58 + case 4: 59 + value = *(__s32 *)data; 60 + break; 61 + case 2: 62 + value = *(__s16 *)data; 63 + break; 64 + case 1: 65 + value = *(__s8 *)data; 66 + break; 67 + default: 68 + return -EINVAL; 69 + } 70 + 71 + for (i = 0; i < btf_vlen(t); i++) { 72 + if (value == enums[i].val) { 73 + jsonw_string(d->jw, 74 + btf__name_by_offset(d->btf, 75 + enums[i].name_off)); 76 + return 0; 77 + } 78 + } 79 + 80 + jsonw_int(d->jw, value); 81 + return 0; 82 + } 83 + 84 + static bool is_str_array(const struct btf *btf, const struct btf_array *arr, 85 + const char *s) 86 + { 87 + const struct btf_type *elem_type; 88 + const char *end_s; 89 + 90 + if (!arr->nelems) 91 + return false; 92 + 93 + elem_type = btf__type_by_id(btf, arr->type); 94 + /* Not skipping typedef. typedef to char does not count as 95 + * a string now. 
96 + */ 97 + while (elem_type && btf_is_mod(elem_type)) 98 + elem_type = btf__type_by_id(btf, elem_type->type); 99 + 100 + if (!elem_type || !btf_is_int(elem_type) || elem_type->size != 1) 101 + return false; 102 + 103 + if (btf_int_encoding(elem_type) != BTF_INT_CHAR && 104 + strcmp("char", btf__name_by_offset(btf, elem_type->name_off))) 105 + return false; 106 + 107 + end_s = s + arr->nelems; 108 + while (s < end_s) { 109 + if (!*s) 110 + return true; 111 + if (*s <= 0x1f || *s >= 0x7f) 112 + return false; 113 + s++; 114 + } 115 + 116 + /* '\0' is not found */ 117 + return false; 140 118 } 141 119 142 120 static int btf_dumper_array(const struct btf_dumper *d, __u32 type_id, ··· 216 56 long long elem_size; 217 57 int ret = 0; 218 58 __u32 i; 59 + 60 + if (is_str_array(d->btf, arr, data)) { 61 + jsonw_string(d->jw, data); 62 + return 0; 63 + } 219 64 220 65 elem_size = btf__resolve_size(d->btf, arr->type); 221 66 if (elem_size < 0) ··· 531 366 case BTF_KIND_ARRAY: 532 367 return btf_dumper_array(d, type_id, data); 533 368 case BTF_KIND_ENUM: 534 - btf_dumper_enum(data, d->jw); 535 - return 0; 369 + return btf_dumper_enum(d, t, data); 536 370 case BTF_KIND_PTR: 537 - btf_dumper_ptr(data, d->jw, d->is_plain_text); 371 + btf_dumper_ptr(d, t, data); 538 372 return 0; 539 373 case BTF_KIND_UNKN: 540 374 jsonw_printf(d->jw, "(unknown)"); ··· 577 413 if (pos == -1) \ 578 414 return -1; \ 579 415 } while (0) 580 - 581 - static int btf_dump_func(const struct btf *btf, char *func_sig, 582 - const struct btf_type *func_proto, 583 - const struct btf_type *func, int pos, int size); 584 416 585 417 static int __btf_dumper_type_only(const struct btf *btf, __u32 type_id, 586 418 char *func_sig, int pos, int size) ··· 686 526 BTF_PRINT_ARG(", "); 687 527 if (arg->type) { 688 528 BTF_PRINT_TYPE(arg->type); 689 - BTF_PRINT_ARG("%s", 690 - btf__name_by_offset(btf, arg->name_off)); 529 + if (arg->name_off) 530 + BTF_PRINT_ARG("%s", 531 + btf__name_by_offset(btf, arg->name_off)); 532 + 
else if (pos && func_sig[pos - 1] == ' ') 533 + /* Remove unnecessary space for 534 + * FUNC_PROTO that does not have 535 + * arg->name_off 536 + */ 537 + func_sig[--pos] = '\0'; 691 538 } else { 692 539 BTF_PRINT_ARG("..."); 693 540 }
+2 -1
tools/bpf/bpftool/main.c
··· 58 58 " %s batch file FILE\n" 59 59 " %s version\n" 60 60 "\n" 61 - " OBJECT := { prog | map | cgroup | perf | net | feature | btf | gen }\n" 61 + " OBJECT := { prog | map | cgroup | perf | net | feature | btf | gen | struct_ops }\n" 62 62 " " HELP_SPEC_OPTIONS "\n" 63 63 "", 64 64 bin_name, bin_name, bin_name); ··· 221 221 { "feature", do_feature }, 222 222 { "btf", do_btf }, 223 223 { "gen", do_gen }, 224 + { "struct_ops", do_struct_ops }, 224 225 { "version", do_version }, 225 226 { 0 } 226 227 };
+2
tools/bpf/bpftool/main.h
··· 161 161 int do_feature(int argc, char **argv); 162 162 int do_btf(int argc, char **argv); 163 163 int do_gen(int argc, char **argv); 164 + int do_struct_ops(int argc, char **argv); 164 165 165 166 int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what); 166 167 int prog_parse_fd(int *argc, char ***argv); ··· 206 205 const struct btf *btf; 207 206 json_writer_t *jw; 208 207 bool is_plain_text; 208 + bool prog_id_as_func_ptr; 209 209 }; 210 210 211 211 /* btf_dumper_type - print data along with type information
+596
tools/bpf/bpftool/struct_ops.c
··· 1 + // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + /* Copyright (C) 2020 Facebook */ 3 + 4 + #include <errno.h> 5 + #include <stdio.h> 6 + #include <unistd.h> 7 + 8 + #include <linux/err.h> 9 + 10 + #include <bpf/bpf.h> 11 + #include <bpf/btf.h> 12 + #include <bpf/libbpf.h> 13 + 14 + #include "json_writer.h" 15 + #include "main.h" 16 + 17 + #define STRUCT_OPS_VALUE_PREFIX "bpf_struct_ops_" 18 + 19 + static const struct btf_type *map_info_type; 20 + static __u32 map_info_alloc_len; 21 + static struct btf *btf_vmlinux; 22 + static __s32 map_info_type_id; 23 + 24 + struct res { 25 + unsigned int nr_maps; 26 + unsigned int nr_errs; 27 + }; 28 + 29 + static const struct btf *get_btf_vmlinux(void) 30 + { 31 + if (btf_vmlinux) 32 + return btf_vmlinux; 33 + 34 + btf_vmlinux = libbpf_find_kernel_btf(); 35 + if (IS_ERR(btf_vmlinux)) 36 + p_err("struct_ops requires kernel CONFIG_DEBUG_INFO_BTF=y"); 37 + 38 + return btf_vmlinux; 39 + } 40 + 41 + static const char *get_kern_struct_ops_name(const struct bpf_map_info *info) 42 + { 43 + const struct btf *kern_btf; 44 + const struct btf_type *t; 45 + const char *st_ops_name; 46 + 47 + kern_btf = get_btf_vmlinux(); 48 + if (IS_ERR(kern_btf)) 49 + return "<btf_vmlinux_not_found>"; 50 + 51 + t = btf__type_by_id(kern_btf, info->btf_vmlinux_value_type_id); 52 + st_ops_name = btf__name_by_offset(kern_btf, t->name_off); 53 + st_ops_name += strlen(STRUCT_OPS_VALUE_PREFIX); 54 + 55 + return st_ops_name; 56 + } 57 + 58 + static __s32 get_map_info_type_id(void) 59 + { 60 + const struct btf *kern_btf; 61 + 62 + if (map_info_type_id) 63 + return map_info_type_id; 64 + 65 + kern_btf = get_btf_vmlinux(); 66 + if (IS_ERR(kern_btf)) { 67 + map_info_type_id = PTR_ERR(kern_btf); 68 + return map_info_type_id; 69 + } 70 + 71 + map_info_type_id = btf__find_by_name_kind(kern_btf, "bpf_map_info", 72 + BTF_KIND_STRUCT); 73 + if (map_info_type_id < 0) { 74 + p_err("can't find bpf_map_info from btf_vmlinux"); 75 + return map_info_type_id; 
76 + } 77 + map_info_type = btf__type_by_id(kern_btf, map_info_type_id); 78 + 79 + /* Ensure map_info_alloc() has at least what the bpftool needs */ 80 + map_info_alloc_len = map_info_type->size; 81 + if (map_info_alloc_len < sizeof(struct bpf_map_info)) 82 + map_info_alloc_len = sizeof(struct bpf_map_info); 83 + 84 + return map_info_type_id; 85 + } 86 + 87 + /* If the subcmd needs to print out the bpf_map_info, 88 + * it should always call map_info_alloc to allocate 89 + * a bpf_map_info object instead of allocating it 90 + * on the stack. 91 + * 92 + * map_info_alloc() will take the running kernel's btf 93 + * into account. i.e. it will consider the 94 + * sizeof(struct bpf_map_info) of the running kernel. 95 + * 96 + * It will enable the "struct_ops" cmd to print the latest 97 + * "struct bpf_map_info". 98 + * 99 + * [ Recall that "struct_ops" requires the kernel's btf to 100 + * be available ] 101 + */ 102 + static struct bpf_map_info *map_info_alloc(__u32 *alloc_len) 103 + { 104 + struct bpf_map_info *info; 105 + 106 + if (get_map_info_type_id() < 0) 107 + return NULL; 108 + 109 + info = calloc(1, map_info_alloc_len); 110 + if (!info) 111 + p_err("mem alloc failed"); 112 + else 113 + *alloc_len = map_info_alloc_len; 114 + 115 + return info; 116 + } 117 + 118 + /* It iterates all struct_ops maps of the system. 119 + * It returns the fd in "*res_fd" and map_info in "*info". 120 + * In the very first iteration, info->id should be 0. 121 + * An optional map "*name" filter can be specified. 122 + * The filter can be made more flexible in the future. 123 + * e.g. filter by kernel-struct-ops-name, regex-name, glob-name, ...etc. 124 + * 125 + * Return value: 126 + * 1: A struct_ops map found. It is returned in "*res_fd" and "*info". 127 + * The caller can continue to call get_next in the future. 128 + * 0: No struct_ops map is returned. 129 + * All struct_ops map has been found. 130 + * -1: Error and the caller should abort the iteration. 
131 + */ 132 + static int get_next_struct_ops_map(const char *name, int *res_fd, 133 + struct bpf_map_info *info, __u32 info_len) 134 + { 135 + __u32 id = info->id; 136 + int err, fd; 137 + 138 + while (true) { 139 + err = bpf_map_get_next_id(id, &id); 140 + if (err) { 141 + if (errno == ENOENT) 142 + return 0; 143 + p_err("can't get next map: %s", strerror(errno)); 144 + return -1; 145 + } 146 + 147 + fd = bpf_map_get_fd_by_id(id); 148 + if (fd < 0) { 149 + if (errno == ENOENT) 150 + continue; 151 + p_err("can't get map by id (%u): %s", 152 + id, strerror(errno)); 153 + return -1; 154 + } 155 + 156 + err = bpf_obj_get_info_by_fd(fd, info, &info_len); 157 + if (err) { 158 + p_err("can't get map info: %s", strerror(errno)); 159 + close(fd); 160 + return -1; 161 + } 162 + 163 + if (info->type == BPF_MAP_TYPE_STRUCT_OPS && 164 + (!name || !strcmp(name, info->name))) { 165 + *res_fd = fd; 166 + return 1; 167 + } 168 + close(fd); 169 + } 170 + } 171 + 172 + static int cmd_retval(const struct res *res, bool must_have_one_map) 173 + { 174 + if (res->nr_errs || (!res->nr_maps && must_have_one_map)) 175 + return -1; 176 + 177 + return 0; 178 + } 179 + 180 + /* "data" is the work_func private storage */ 181 + typedef int (*work_func)(int fd, const struct bpf_map_info *info, void *data, 182 + struct json_writer *wtr); 183 + 184 + /* Find all struct_ops map in the system. 185 + * Filter out by "name" (if specified). 186 + * Then call "func(fd, info, data, wtr)" on each struct_ops map found. 
187 + */ 188 + static struct res do_search(const char *name, work_func func, void *data, 189 + struct json_writer *wtr) 190 + { 191 + struct bpf_map_info *info; 192 + struct res res = {}; 193 + __u32 info_len; 194 + int fd, err; 195 + 196 + info = map_info_alloc(&info_len); 197 + if (!info) { 198 + res.nr_errs++; 199 + return res; 200 + } 201 + 202 + if (wtr) 203 + jsonw_start_array(wtr); 204 + while ((err = get_next_struct_ops_map(name, &fd, info, info_len)) == 1) { 205 + res.nr_maps++; 206 + err = func(fd, info, data, wtr); 207 + if (err) 208 + res.nr_errs++; 209 + close(fd); 210 + } 211 + if (wtr) 212 + jsonw_end_array(wtr); 213 + 214 + if (err) 215 + res.nr_errs++; 216 + 217 + if (!wtr && name && !res.nr_errs && !res.nr_maps) 218 + /* It is not printing empty []. 219 + * Thus, needs to specifically say nothing found 220 + * for "name" here. 221 + */ 222 + p_err("no struct_ops found for %s", name); 223 + else if (!wtr && json_output && !res.nr_errs) 224 + /* The "func()" above is not writing any json (i.e. !wtr 225 + * test here). 226 + * 227 + * However, "-j" is enabled and there is no errs here, 228 + * so call json_null() as the current convention of 229 + * other cmds. 
230 + */ 231 + jsonw_null(json_wtr); 232 + 233 + free(info); 234 + return res; 235 + } 236 + 237 + static struct res do_one_id(const char *id_str, work_func func, void *data, 238 + struct json_writer *wtr) 239 + { 240 + struct bpf_map_info *info; 241 + struct res res = {}; 242 + unsigned long id; 243 + __u32 info_len; 244 + char *endptr; 245 + int fd; 246 + 247 + id = strtoul(id_str, &endptr, 0); 248 + if (*endptr || !id || id > UINT32_MAX) { 249 + p_err("invalid id %s", id_str); 250 + res.nr_errs++; 251 + return res; 252 + } 253 + 254 + fd = bpf_map_get_fd_by_id(id); 255 + if (fd == -1) { 256 + p_err("can't get map by id (%lu): %s", id, strerror(errno)); 257 + res.nr_errs++; 258 + return res; 259 + } 260 + 261 + info = map_info_alloc(&info_len); 262 + if (!info) { 263 + res.nr_errs++; 264 + goto done; 265 + } 266 + 267 + if (bpf_obj_get_info_by_fd(fd, info, &info_len)) { 268 + p_err("can't get map info: %s", strerror(errno)); 269 + res.nr_errs++; 270 + goto done; 271 + } 272 + 273 + if (info->type != BPF_MAP_TYPE_STRUCT_OPS) { 274 + p_err("%s id %u is not a struct_ops map", info->name, info->id); 275 + res.nr_errs++; 276 + goto done; 277 + } 278 + 279 + res.nr_maps++; 280 + 281 + if (func(fd, info, data, wtr)) 282 + res.nr_errs++; 283 + else if (!wtr && json_output) 284 + /* The "func()" above is not writing any json (i.e. !wtr 285 + * test here). 286 + * 287 + * However, "-j" is enabled and there is no errs here, 288 + * so call json_null() as the current convention of 289 + * other cmds. 
290 + */ 291 + jsonw_null(json_wtr); 292 + 293 + done: 294 + free(info); 295 + close(fd); 296 + 297 + return res; 298 + } 299 + 300 + static struct res do_work_on_struct_ops(const char *search_type, 301 + const char *search_term, 302 + work_func func, void *data, 303 + struct json_writer *wtr) 304 + { 305 + if (search_type) { 306 + if (is_prefix(search_type, "id")) 307 + return do_one_id(search_term, func, data, wtr); 308 + else if (!is_prefix(search_type, "name")) 309 + usage(); 310 + } 311 + 312 + return do_search(search_term, func, data, wtr); 313 + } 314 + 315 + static int __do_show(int fd, const struct bpf_map_info *info, void *data, 316 + struct json_writer *wtr) 317 + { 318 + if (wtr) { 319 + jsonw_start_object(wtr); 320 + jsonw_uint_field(wtr, "id", info->id); 321 + jsonw_string_field(wtr, "name", info->name); 322 + jsonw_string_field(wtr, "kernel_struct_ops", 323 + get_kern_struct_ops_name(info)); 324 + jsonw_end_object(wtr); 325 + } else { 326 + printf("%u: %-15s %-32s\n", info->id, info->name, 327 + get_kern_struct_ops_name(info)); 328 + } 329 + 330 + return 0; 331 + } 332 + 333 + static int do_show(int argc, char **argv) 334 + { 335 + const char *search_type = NULL, *search_term = NULL; 336 + struct res res; 337 + 338 + if (argc && argc != 2) 339 + usage(); 340 + 341 + if (argc == 2) { 342 + search_type = GET_ARG(); 343 + search_term = GET_ARG(); 344 + } 345 + 346 + res = do_work_on_struct_ops(search_type, search_term, __do_show, 347 + NULL, json_wtr); 348 + 349 + return cmd_retval(&res, !!search_term); 350 + } 351 + 352 + static int __do_dump(int fd, const struct bpf_map_info *info, void *data, 353 + struct json_writer *wtr) 354 + { 355 + struct btf_dumper *d = (struct btf_dumper *)data; 356 + const struct btf_type *struct_ops_type; 357 + const struct btf *kern_btf = d->btf; 358 + const char *struct_ops_name; 359 + int zero = 0; 360 + void *value; 361 + 362 + /* note: d->jw == wtr */ 363 + 364 + kern_btf = d->btf; 365 + 366 + /* The kernel supporting 
BPF_MAP_TYPE_STRUCT_OPS must have 367 + * btf_vmlinux_value_type_id. 368 + */ 369 + struct_ops_type = btf__type_by_id(kern_btf, 370 + info->btf_vmlinux_value_type_id); 371 + struct_ops_name = btf__name_by_offset(kern_btf, 372 + struct_ops_type->name_off); 373 + value = calloc(1, info->value_size); 374 + if (!value) { 375 + p_err("mem alloc failed"); 376 + return -1; 377 + } 378 + 379 + if (bpf_map_lookup_elem(fd, &zero, value)) { 380 + p_err("can't lookup struct_ops map %s id %u", 381 + info->name, info->id); 382 + free(value); 383 + return -1; 384 + } 385 + 386 + jsonw_start_object(wtr); 387 + jsonw_name(wtr, "bpf_map_info"); 388 + btf_dumper_type(d, map_info_type_id, (void *)info); 389 + jsonw_end_object(wtr); 390 + 391 + jsonw_start_object(wtr); 392 + jsonw_name(wtr, struct_ops_name); 393 + btf_dumper_type(d, info->btf_vmlinux_value_type_id, value); 394 + jsonw_end_object(wtr); 395 + 396 + free(value); 397 + 398 + return 0; 399 + } 400 + 401 + static int do_dump(int argc, char **argv) 402 + { 403 + const char *search_type = NULL, *search_term = NULL; 404 + json_writer_t *wtr = json_wtr; 405 + const struct btf *kern_btf; 406 + struct btf_dumper d = {}; 407 + struct res res; 408 + 409 + if (argc && argc != 2) 410 + usage(); 411 + 412 + if (argc == 2) { 413 + search_type = GET_ARG(); 414 + search_term = GET_ARG(); 415 + } 416 + 417 + kern_btf = get_btf_vmlinux(); 418 + if (IS_ERR(kern_btf)) 419 + return -1; 420 + 421 + if (!json_output) { 422 + wtr = jsonw_new(stdout); 423 + if (!wtr) { 424 + p_err("can't create json writer"); 425 + return -1; 426 + } 427 + jsonw_pretty(wtr, true); 428 + } 429 + 430 + d.btf = kern_btf; 431 + d.jw = wtr; 432 + d.is_plain_text = !json_output; 433 + d.prog_id_as_func_ptr = true; 434 + 435 + res = do_work_on_struct_ops(search_type, search_term, __do_dump, &d, 436 + wtr); 437 + 438 + if (!json_output) 439 + jsonw_destroy(&wtr); 440 + 441 + return cmd_retval(&res, !!search_term); 442 + } 443 + 444 + static int __do_unregister(int fd, 
const struct bpf_map_info *info, void *data, 445 + struct json_writer *wtr) 446 + { 447 + int zero = 0; 448 + 449 + if (bpf_map_delete_elem(fd, &zero)) { 450 + p_err("can't unload %s %s id %u: %s", 451 + get_kern_struct_ops_name(info), info->name, 452 + info->id, strerror(errno)); 453 + return -1; 454 + } 455 + 456 + p_info("Unregistered %s %s id %u", 457 + get_kern_struct_ops_name(info), info->name, 458 + info->id); 459 + 460 + return 0; 461 + } 462 + 463 + static int do_unregister(int argc, char **argv) 464 + { 465 + const char *search_type, *search_term; 466 + struct res res; 467 + 468 + if (argc != 2) 469 + usage(); 470 + 471 + search_type = GET_ARG(); 472 + search_term = GET_ARG(); 473 + 474 + res = do_work_on_struct_ops(search_type, search_term, 475 + __do_unregister, NULL, NULL); 476 + 477 + return cmd_retval(&res, true); 478 + } 479 + 480 + static int do_register(int argc, char **argv) 481 + { 482 + const struct bpf_map_def *def; 483 + struct bpf_map_info info = {}; 484 + __u32 info_len = sizeof(info); 485 + int nr_errs = 0, nr_maps = 0; 486 + struct bpf_object *obj; 487 + struct bpf_link *link; 488 + struct bpf_map *map; 489 + const char *file; 490 + 491 + if (argc != 1) 492 + usage(); 493 + 494 + file = GET_ARG(); 495 + 496 + obj = bpf_object__open(file); 497 + if (IS_ERR_OR_NULL(obj)) 498 + return -1; 499 + 500 + set_max_rlimit(); 501 + 502 + if (bpf_object__load(obj)) { 503 + bpf_object__close(obj); 504 + return -1; 505 + } 506 + 507 + bpf_object__for_each_map(map, obj) { 508 + def = bpf_map__def(map); 509 + if (def->type != BPF_MAP_TYPE_STRUCT_OPS) 510 + continue; 511 + 512 + link = bpf_map__attach_struct_ops(map); 513 + if (IS_ERR(link)) { 514 + p_err("can't register struct_ops %s: %s", 515 + bpf_map__name(map), 516 + strerror(-PTR_ERR(link))); 517 + nr_errs++; 518 + continue; 519 + } 520 + nr_maps++; 521 + 522 + bpf_link__disconnect(link); 523 + bpf_link__destroy(link); 524 + 525 + if (!bpf_obj_get_info_by_fd(bpf_map__fd(map), &info, 526 + 
&info_len)) 527 + p_info("Registered %s %s id %u", 528 + get_kern_struct_ops_name(&info), 529 + bpf_map__name(map), 530 + info.id); 531 + else 532 + /* Not p_err. The struct_ops was attached 533 + * successfully. 534 + */ 535 + p_info("Registered %s but can't find id: %s", 536 + bpf_map__name(map), strerror(errno)); 537 + } 538 + 539 + bpf_object__close(obj); 540 + 541 + if (nr_errs) 542 + return -1; 543 + 544 + if (!nr_maps) { 545 + p_err("no struct_ops found in %s", file); 546 + return -1; 547 + } 548 + 549 + if (json_output) 550 + jsonw_null(json_wtr); 551 + 552 + return 0; 553 + } 554 + 555 + static int do_help(int argc, char **argv) 556 + { 557 + if (json_output) { 558 + jsonw_null(json_wtr); 559 + return 0; 560 + } 561 + 562 + fprintf(stderr, 563 + "Usage: %s %s { show | list } [STRUCT_OPS_MAP]\n" 564 + " %s %s dump [STRUCT_OPS_MAP]\n" 565 + " %s %s register OBJ\n" 566 + " %s %s unregister STRUCT_OPS_MAP\n" 567 + " %s %s help\n" 568 + "\n" 569 + " OPTIONS := { {-j|--json} [{-p|--pretty}] }\n" 570 + " STRUCT_OPS_MAP := [ id STRUCT_OPS_MAP_ID | name STRUCT_OPS_MAP_NAME ]\n", 571 + bin_name, argv[-2], bin_name, argv[-2], 572 + bin_name, argv[-2], bin_name, argv[-2], 573 + bin_name, argv[-2]); 574 + 575 + return 0; 576 + } 577 + 578 + static const struct cmd cmds[] = { 579 + { "show", do_show }, 580 + { "list", do_show }, 581 + { "register", do_register }, 582 + { "unregister", do_unregister }, 583 + { "dump", do_dump }, 584 + { "help", do_help }, 585 + { 0 } 586 + }; 587 + 588 + int do_struct_ops(int argc, char **argv) 589 + { 590 + int err; 591 + 592 + err = cmd_select(cmds, argc, argv, do_help); 593 + 594 + btf__free(btf_vmlinux); 595 + return err; 596 + }
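The iteration pattern in get_next_struct_ops_map() above — walk every map ID with bpf_map_get_next_id(), tolerate an ENOENT race when a map disappears between the ID walk and the fd lookup, and filter by map type — can be sketched against an in-memory stand-in. The mock table and helper names below are illustrative only; the real code issues bpf(2) syscalls instead:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-ins for the kernel's map-id table; the
 * real code calls bpf_map_get_next_id()/bpf_map_get_fd_by_id(). */
static const unsigned int map_ids[] = { 3, 7, 42 };
static const char *map_types[] = { "hash", "struct_ops", "struct_ops" };

/* Like bpf_map_get_next_id(): next id strictly greater than "id",
 * or -1 with errno = ENOENT when the walk is done. */
static int mock_get_next_id(unsigned int id, unsigned int *next)
{
	for (size_t i = 0; i < 3; i++) {
		if (map_ids[i] > id) {
			*next = map_ids[i];
			return 0;
		}
	}
	errno = ENOENT;
	return -1;
}

/* Mirrors get_next_struct_ops_map(): advance until a struct_ops map
 * is seen. Returns 1 when one is found (id updated), 0 when done. */
static int next_struct_ops(unsigned int *id)
{
	unsigned int cur = *id;

	while (!mock_get_next_id(cur, &cur)) {
		size_t idx = 0;

		for (size_t i = 0; i < 3; i++)
			if (map_ids[i] == cur)
				idx = i;
		if (!strcmp(map_types[idx], "struct_ops")) {
			*id = cur;
			return 1;
		}
	}
	return 0;
}

/* Start the walk at id 0, as the subcmds above do via info->id = 0. */
static int count_struct_ops(void)
{
	unsigned int id = 0;
	int n = 0;

	while (next_struct_ops(&id))
		n++;
	return n;
}
```

Only the two struct_ops entries survive the filter; the hash map is skipped exactly as non-struct_ops maps are skipped in the loop above.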
+80 -2
tools/include/uapi/linux/bpf.h
··· 111 111 BPF_MAP_LOOKUP_AND_DELETE_BATCH, 112 112 BPF_MAP_UPDATE_BATCH, 113 113 BPF_MAP_DELETE_BATCH, 114 + BPF_LINK_CREATE, 115 + BPF_LINK_UPDATE, 114 116 }; 115 117 116 118 enum bpf_map_type { ··· 183 181 BPF_PROG_TYPE_TRACING, 184 182 BPF_PROG_TYPE_STRUCT_OPS, 185 183 BPF_PROG_TYPE_EXT, 184 + BPF_PROG_TYPE_LSM, 186 185 }; 187 186 188 187 enum bpf_attach_type { ··· 214 211 BPF_TRACE_FENTRY, 215 212 BPF_TRACE_FEXIT, 216 213 BPF_MODIFY_RETURN, 214 + BPF_LSM_MAC, 217 215 __MAX_BPF_ATTACH_TYPE 218 216 }; 219 217 ··· 543 539 __u32 prog_cnt; 544 540 } query; 545 541 546 - struct { 542 + struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */ 547 543 __u64 name; 548 544 __u32 prog_fd; 549 545 } raw_tracepoint; ··· 571 567 __u64 probe_offset; /* output: probe_offset */ 572 568 __u64 probe_addr; /* output: probe_addr */ 573 569 } task_fd_query; 570 + 571 + struct { /* struct used by BPF_LINK_CREATE command */ 572 + __u32 prog_fd; /* eBPF program to attach */ 573 + __u32 target_fd; /* object to attach to */ 574 + __u32 attach_type; /* attach type */ 575 + __u32 flags; /* extra flags */ 576 + } link_create; 577 + 578 + struct { /* struct used by BPF_LINK_UPDATE command */ 579 + __u32 link_fd; /* link fd */ 580 + /* new program fd to update link with */ 581 + __u32 new_prog_fd; 582 + __u32 flags; /* extra flags */ 583 + /* expected link's program fd; is specified only if 584 + * BPF_F_REPLACE flag is set in flags */ 585 + __u32 old_prog_fd; 586 + } link_update; 587 + 574 588 } __attribute__((aligned(8))); 575 589 576 590 /* The description below is an attempt at providing documentation to eBPF ··· 2972 2950 * restricted to raw_tracepoint bpf programs. 2973 2951 * Return 2974 2952 * 0 on success, or a negative error in case of failure. 2953 + * 2954 + * u64 bpf_get_netns_cookie(void *ctx) 2955 + * Description 2956 + * Retrieve the cookie (generated by the kernel) of the network 2957 + * namespace the input *ctx* is associated with. 
The network 2958 + * namespace cookie remains stable for its lifetime and provides 2959 + * a global identifier that can be assumed unique. If *ctx* is 2960 + * NULL, then the helper returns the cookie for the initial 2961 + * network namespace. The cookie itself is very similar to that 2962 + * of bpf_get_socket_cookie() helper, but for network namespaces 2963 + * instead of sockets. 2964 + * Return 2965 + * A 8-byte long opaque number. 2966 + * 2967 + * u64 bpf_get_current_ancestor_cgroup_id(int ancestor_level) 2968 + * Description 2969 + * Return id of cgroup v2 that is ancestor of the cgroup associated 2970 + * with the current task at the *ancestor_level*. The root cgroup 2971 + * is at *ancestor_level* zero and each step down the hierarchy 2972 + * increments the level. If *ancestor_level* == level of cgroup 2973 + * associated with the current task, then return value will be the 2974 + * same as that of **bpf_get_current_cgroup_id**\ (). 2975 + * 2976 + * The helper is useful to implement policies based on cgroups 2977 + * that are upper in hierarchy than immediate cgroup associated 2978 + * with the current task. 2979 + * 2980 + * The format of returned id and helper limitations are same as in 2981 + * **bpf_get_current_cgroup_id**\ (). 2982 + * Return 2983 + * The id is returned or 0 in case the id could not be retrieved. 2984 + * 2985 + * int bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags) 2986 + * Description 2987 + * Assign the *sk* to the *skb*. When combined with appropriate 2988 + * routing configuration to receive the packet towards the socket, 2989 + * will cause *skb* to be delivered to the specified socket. 2990 + * Subsequent redirection of *skb* via **bpf_redirect**\ (), 2991 + * **bpf_clone_redirect**\ () or other methods outside of BPF may 2992 + * interfere with successful delivery to the socket. 2993 + * 2994 + * This operation is only valid from TC ingress path. 2995 + * 2996 + * The *flags* argument must be zero. 
2997 + * Return 2998 + * 0 on success, or a negative errno in case of failure. 2999 + * 3000 + * * **-EINVAL** Unsupported flags specified. 3001 + * * **-ENOENT** Socket is unavailable for assignment. 3002 + * * **-ENETUNREACH** Socket is unreachable (wrong netns). 3003 + * * **-EOPNOTSUPP** Unsupported operation, for example a 3004 + * call from outside of TC ingress. 3005 + * * **-ESOCKTNOSUPPORT** Socket type not supported (reuseport). 2975 3006 */ 2976 3007 #define __BPF_FUNC_MAPPER(FN) \ 2977 3008 FN(unspec), \ ··· 3148 3073 FN(jiffies64), \ 3149 3074 FN(read_branch_records), \ 3150 3075 FN(get_ns_current_pid_tgid), \ 3151 - FN(xdp_output), 3076 + FN(xdp_output), \ 3077 + FN(get_netns_cookie), \ 3078 + FN(get_current_ancestor_cgroup_id), \ 3079 + FN(sk_assign), 3152 3080 3153 3081 /* integer value in 'imm' field of BPF_CALL instruction selects which helper 3154 3082 * function eBPF program intends to call
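The __BPF_FUNC_MAPPER X-macro touched above is the single source of truth for helper IDs: the one FN() list expands both into the enum of helper numbers and into name tables, which is why bpf_get_netns_cookie(), bpf_get_current_ancestor_cgroup_id() and bpf_sk_assign() are appended at the tail rather than inserted — existing IDs must stay stable. A toy reconstruction of the pattern (the DEMO_* names and the three-entry list are invented for this sketch, not the real mapper):

```c
#include <assert.h>
#include <string.h>

/* One X-macro list, expanded twice below. Appending a new entry at
 * the end gives it the next free ID without renumbering the rest. */
#define DEMO_FUNC_MAPPER(FN) \
	FN(unspec),          \
	FN(map_lookup_elem), \
	FN(sk_assign),

/* Expansion 1: an enum of consecutive helper IDs. */
#define DEMO_ENUM(x) DEMO_FUNC_##x
enum demo_func_id { DEMO_FUNC_MAPPER(DEMO_ENUM) __DEMO_FUNC_MAX };

/* Expansion 2: a parallel table of helper names. */
#define DEMO_NAME(x) #x
static const char *demo_func_names[] = { DEMO_FUNC_MAPPER(DEMO_NAME) };
```

Here the newest entry lands at ID 2 and __DEMO_FUNC_MAX tracks the count, mirroring how the real list assigns BPF_FUNC_* values in the 'imm' field of BPF_CALL.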
+3 -1
tools/include/uapi/linux/if_link.h
··· 962 962 #define XDP_FLAGS_SKB_MODE (1U << 1) 963 963 #define XDP_FLAGS_DRV_MODE (1U << 2) 964 964 #define XDP_FLAGS_HW_MODE (1U << 3) 965 + #define XDP_FLAGS_REPLACE (1U << 4) 965 966 #define XDP_FLAGS_MODES (XDP_FLAGS_SKB_MODE | \ 966 967 XDP_FLAGS_DRV_MODE | \ 967 968 XDP_FLAGS_HW_MODE) 968 969 #define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | \ 969 - XDP_FLAGS_MODES) 970 + XDP_FLAGS_MODES | XDP_FLAGS_REPLACE) 970 971 971 972 /* These are stored into IFLA_XDP_ATTACHED on dump. */ 972 973 enum { ··· 987 986 IFLA_XDP_DRV_PROG_ID, 988 987 IFLA_XDP_SKB_PROG_ID, 989 988 IFLA_XDP_HW_PROG_ID, 989 + IFLA_XDP_EXPECTED_FD, 990 990 __IFLA_XDP_MAX, 991 991 }; 992 992
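XDP_FLAGS_REPLACE only takes effect because it is also OR-ed into XDP_FLAGS_MASK: any netlink request carrying bits outside the mask is rejected before the expected-fd comparison ever runs. A quick sanity check of the mask arithmetic, with the flag values copied from the header above:

```c
#include <assert.h>

/* Same flag values as tools/include/uapi/linux/if_link.h */
#define XDP_FLAGS_UPDATE_IF_NOEXIST	(1U << 0)
#define XDP_FLAGS_SKB_MODE		(1U << 1)
#define XDP_FLAGS_DRV_MODE		(1U << 2)
#define XDP_FLAGS_HW_MODE		(1U << 3)
#define XDP_FLAGS_REPLACE		(1U << 4)
#define XDP_FLAGS_MODES		(XDP_FLAGS_SKB_MODE | \
					 XDP_FLAGS_DRV_MODE | \
					 XDP_FLAGS_HW_MODE)
#define XDP_FLAGS_MASK		(XDP_FLAGS_UPDATE_IF_NOEXIST | \
					 XDP_FLAGS_MODES | XDP_FLAGS_REPLACE)

/* The kernel rejects any request with bits outside XDP_FLAGS_MASK. */
static int xdp_flags_valid(unsigned int flags)
{
	return (flags & ~XDP_FLAGS_MASK) == 0;
}
```

Before this change, XDP_FLAGS_REPLACE (bit 4) would have fallen outside the mask and been refused; with the mask extended, combining it with a mode flag validates cleanly.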
+36 -1
tools/lib/bpf/bpf.c
··· 235 235 memset(&attr, 0, sizeof(attr)); 236 236 attr.prog_type = load_attr->prog_type; 237 237 attr.expected_attach_type = load_attr->expected_attach_type; 238 - if (attr.prog_type == BPF_PROG_TYPE_STRUCT_OPS) { 238 + if (attr.prog_type == BPF_PROG_TYPE_STRUCT_OPS || 239 + attr.prog_type == BPF_PROG_TYPE_LSM) { 239 240 attr.attach_btf_id = load_attr->attach_btf_id; 240 241 } else if (attr.prog_type == BPF_PROG_TYPE_TRACING || 241 242 attr.prog_type == BPF_PROG_TYPE_EXT) { ··· 583 582 attr.attach_type = type; 584 583 585 584 return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr)); 585 + } 586 + 587 + int bpf_link_create(int prog_fd, int target_fd, 588 + enum bpf_attach_type attach_type, 589 + const struct bpf_link_create_opts *opts) 590 + { 591 + union bpf_attr attr; 592 + 593 + if (!OPTS_VALID(opts, bpf_link_create_opts)) 594 + return -EINVAL; 595 + 596 + memset(&attr, 0, sizeof(attr)); 597 + attr.link_create.prog_fd = prog_fd; 598 + attr.link_create.target_fd = target_fd; 599 + attr.link_create.attach_type = attach_type; 600 + 601 + return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr)); 602 + } 603 + 604 + int bpf_link_update(int link_fd, int new_prog_fd, 605 + const struct bpf_link_update_opts *opts) 606 + { 607 + union bpf_attr attr; 608 + 609 + if (!OPTS_VALID(opts, bpf_link_update_opts)) 610 + return -EINVAL; 611 + 612 + memset(&attr, 0, sizeof(attr)); 613 + attr.link_update.link_fd = link_fd; 614 + attr.link_update.new_prog_fd = new_prog_fd; 615 + attr.link_update.flags = OPTS_GET(opts, flags, 0); 616 + attr.link_update.old_prog_fd = OPTS_GET(opts, old_prog_fd, 0); 617 + 618 + return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr)); 586 619 } 587 620 588 621 int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
+19
tools/lib/bpf/bpf.h
··· 168 168 LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd, 169 169 enum bpf_attach_type type); 170 170 171 + struct bpf_link_create_opts { 172 + size_t sz; /* size of this struct for forward/backward compatibility */ 173 + }; 174 + #define bpf_link_create_opts__last_field sz 175 + 176 + LIBBPF_API int bpf_link_create(int prog_fd, int target_fd, 177 + enum bpf_attach_type attach_type, 178 + const struct bpf_link_create_opts *opts); 179 + 180 + struct bpf_link_update_opts { 181 + size_t sz; /* size of this struct for forward/backward compatibility */ 182 + __u32 flags; /* extra flags */ 183 + __u32 old_prog_fd; /* expected old program FD */ 184 + }; 185 + #define bpf_link_update_opts__last_field old_prog_fd 186 + 187 + LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd, 188 + const struct bpf_link_update_opts *opts); 189 + 171 190 struct bpf_prog_test_run_attr { 172 191 int prog_fd; 173 192 int repeat;
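Both new wrappers follow libbpf's "opts" convention: sz is the caller-set struct size, the __last_field macro records the current tail, a NULL opts pointer means "all defaults", and fields beyond the caller's sz read as zero. The following is a simplified stand-in for the OPTS_VALID()/OPTS_GET() machinery (the demo_* names are not libbpf's), showing how size-gated access keeps old callers compatible with newer, larger structs:

```c
#include <assert.h>
#include <stddef.h>

/* Shape mirrors struct bpf_link_update_opts from the patch. */
struct demo_link_update_opts {
	size_t sz;		/* size of this struct, set by the caller */
	unsigned int flags;
	unsigned int old_prog_fd;
};

/* Simplified OPTS_VALID(): NULL is fine; otherwise sz must at least
 * cover the mandatory leading sz field itself. */
static int demo_opts_valid(const struct demo_link_update_opts *opts)
{
	return !opts || opts->sz >= sizeof(opts->sz);
}

/* Simplified OPTS_GET(): only read a field the caller's sz covers,
 * otherwise fall back to the default (0 here). */
static unsigned int demo_opts_get_flags(const struct demo_link_update_opts *opts)
{
	size_t need = offsetof(struct demo_link_update_opts, flags) +
		      sizeof(opts->flags);

	if (!opts || opts->sz < need)
		return 0;
	return opts->flags;
}

/* Caller-side pattern: zero-init, set sz = sizeof, fill fields. */
static unsigned int demo_roundtrip(unsigned int flags)
{
	struct demo_link_update_opts o = { .sz = sizeof(o), .flags = flags };

	return demo_opts_get_flags(&o);
}
```

This is why bpf_link_update() above can pass NULL opts today and still gain BPF_F_REPLACE-style fields later without an ABI break.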
+1 -1
tools/lib/bpf/bpf_tracing.h
··· 390 390 391 391 #define ___bpf_kretprobe_args0() ctx 392 392 #define ___bpf_kretprobe_args1(x) \ 393 - ___bpf_kretprobe_args0(), (void *)PT_REGS_RET(ctx) 393 + ___bpf_kretprobe_args0(), (void *)PT_REGS_RC(ctx) 394 394 #define ___bpf_kretprobe_args(args...) \ 395 395 ___bpf_apply(___bpf_kretprobe_args, ___bpf_narg(args))(args) 396 396
+15 -5
tools/lib/bpf/btf.c
··· 657 657 658 658 int btf__load(struct btf *btf) 659 659 { 660 - __u32 log_buf_size = BPF_LOG_BUF_SIZE; 660 + __u32 log_buf_size = 0; 661 661 char *log_buf = NULL; 662 662 int err = 0; 663 663 664 664 if (btf->fd >= 0) 665 665 return -EEXIST; 666 666 667 - log_buf = malloc(log_buf_size); 668 - if (!log_buf) 669 - return -ENOMEM; 667 + retry_load: 668 + if (log_buf_size) { 669 + log_buf = malloc(log_buf_size); 670 + if (!log_buf) 671 + return -ENOMEM; 670 672 671 - *log_buf = 0; 673 + *log_buf = 0; 674 + } 672 675 673 676 btf->fd = bpf_load_btf(btf->data, btf->data_size, 674 677 log_buf, log_buf_size, false); 675 678 if (btf->fd < 0) { 679 + if (!log_buf || errno == ENOSPC) { 680 + log_buf_size = max((__u32)BPF_LOG_BUF_SIZE, 681 + log_buf_size << 1); 682 + free(log_buf); 683 + goto retry_load; 684 + } 685 + 676 686 err = -errno; 677 687 pr_warn("Error loading BTF: %s(%d)\n", strerror(errno), errno); 678 688 if (*log_buf)
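The retry sizing above starts with no log buffer at all, so the common success path never allocates; a buffer is only grown after a failure, doubling on ENOSPC but never dropping below BPF_LOG_BUF_SIZE. The same change is applied to program loading in libbpf.c below. The size progression can be checked in isolation (DEMO_LOG_BUF_SIZE merely stands in for BPF_LOG_BUF_SIZE; the real constant differs):

```c
#include <assert.h>

/* Stand-in for BPF_LOG_BUF_SIZE, for the demo only. */
#define DEMO_LOG_BUF_SIZE (1U << 24)

/* Mirrors the retry sizing in btf__load(): given the size that just
 * failed (0 on the first, buffer-less attempt), pick the next size as
 * max(DEMO_LOG_BUF_SIZE, cur << 1). */
static unsigned int next_log_buf_size(unsigned int cur)
{
	unsigned int doubled = cur << 1;

	return doubled > DEMO_LOG_BUF_SIZE ? doubled : DEMO_LOG_BUF_SIZE;
}
```

So attempt sizes run 0, DEMO_LOG_BUF_SIZE, 2x, 4x, ... — one allocation-free fast path, then geometric growth until the verifier log fits.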
+112 -22
tools/lib/bpf/libbpf.c
··· 1845 1845 * type definition, while using only sizeof(void *) space in ELF data section. 1846 1846 */ 1847 1847 static bool get_map_field_int(const char *map_name, const struct btf *btf, 1848 - const struct btf_type *def, 1849 1848 const struct btf_member *m, __u32 *res) 1850 1849 { 1851 1850 const struct btf_type *t = skip_mods_and_typedefs(btf, m->type, NULL); ··· 1971 1972 return -EINVAL; 1972 1973 } 1973 1974 if (strcmp(name, "type") == 0) { 1974 - if (!get_map_field_int(map_name, obj->btf, def, m, 1975 + if (!get_map_field_int(map_name, obj->btf, m, 1975 1976 &map->def.type)) 1976 1977 return -EINVAL; 1977 1978 pr_debug("map '%s': found type = %u.\n", 1978 1979 map_name, map->def.type); 1979 1980 } else if (strcmp(name, "max_entries") == 0) { 1980 - if (!get_map_field_int(map_name, obj->btf, def, m, 1981 + if (!get_map_field_int(map_name, obj->btf, m, 1981 1982 &map->def.max_entries)) 1982 1983 return -EINVAL; 1983 1984 pr_debug("map '%s': found max_entries = %u.\n", 1984 1985 map_name, map->def.max_entries); 1985 1986 } else if (strcmp(name, "map_flags") == 0) { 1986 - if (!get_map_field_int(map_name, obj->btf, def, m, 1987 + if (!get_map_field_int(map_name, obj->btf, m, 1987 1988 &map->def.map_flags)) 1988 1989 return -EINVAL; 1989 1990 pr_debug("map '%s': found map_flags = %u.\n", ··· 1991 1992 } else if (strcmp(name, "key_size") == 0) { 1992 1993 __u32 sz; 1993 1994 1994 - if (!get_map_field_int(map_name, obj->btf, def, m, 1995 - &sz)) 1995 + if (!get_map_field_int(map_name, obj->btf, m, &sz)) 1996 1996 return -EINVAL; 1997 1997 pr_debug("map '%s': found key_size = %u.\n", 1998 1998 map_name, sz); ··· 2033 2035 } else if (strcmp(name, "value_size") == 0) { 2034 2036 __u32 sz; 2035 2037 2036 - if (!get_map_field_int(map_name, obj->btf, def, m, 2037 - &sz)) 2038 + if (!get_map_field_int(map_name, obj->btf, m, &sz)) 2038 2039 return -EINVAL; 2039 2040 pr_debug("map '%s': found value_size = %u.\n", 2040 2041 map_name, sz); ··· 2076 2079 __u32 val; 2077 2080 
int err; 2078 2081 2079 - if (!get_map_field_int(map_name, obj->btf, def, m, 2080 - &val)) 2082 + if (!get_map_field_int(map_name, obj->btf, m, &val)) 2081 2083 return -EINVAL; 2082 2084 pr_debug("map '%s': found pinning = %u.\n", 2083 2085 map_name, val); ··· 2358 2362 2359 2363 static inline bool libbpf_prog_needs_vmlinux_btf(struct bpf_program *prog) 2360 2364 { 2361 - if (prog->type == BPF_PROG_TYPE_STRUCT_OPS) 2365 + if (prog->type == BPF_PROG_TYPE_STRUCT_OPS || 2366 + prog->type == BPF_PROG_TYPE_LSM) 2362 2367 return true; 2363 2368 2364 2369 /* BPF_PROG_TYPE_TRACING programs which do not attach to other programs ··· 4852 4855 { 4853 4856 struct bpf_load_program_attr load_attr; 4854 4857 char *cp, errmsg[STRERR_BUFSIZE]; 4855 - int log_buf_size = BPF_LOG_BUF_SIZE; 4856 - char *log_buf; 4858 + size_t log_buf_size = 0; 4859 + char *log_buf = NULL; 4857 4860 int btf_fd, ret; 4858 4861 4859 4862 if (!insns || !insns_cnt) ··· 4867 4870 load_attr.insns = insns; 4868 4871 load_attr.insns_cnt = insns_cnt; 4869 4872 load_attr.license = license; 4870 - if (prog->type == BPF_PROG_TYPE_STRUCT_OPS) { 4873 + if (prog->type == BPF_PROG_TYPE_STRUCT_OPS || 4874 + prog->type == BPF_PROG_TYPE_LSM) { 4871 4875 load_attr.attach_btf_id = prog->attach_btf_id; 4872 4876 } else if (prog->type == BPF_PROG_TYPE_TRACING || 4873 4877 prog->type == BPF_PROG_TYPE_EXT) { ··· 4894 4896 load_attr.prog_flags = prog->prog_flags; 4895 4897 4896 4898 retry_load: 4897 - log_buf = malloc(log_buf_size); 4898 - if (!log_buf) 4899 - pr_warn("Alloc log buffer for bpf loader error, continue without log\n"); 4899 + if (log_buf_size) { 4900 + log_buf = malloc(log_buf_size); 4901 + if (!log_buf) 4902 + return -ENOMEM; 4903 + 4904 + *log_buf = 0; 4905 + } 4900 4906 4901 4907 ret = bpf_load_program_xattr(&load_attr, log_buf, log_buf_size); 4902 4908 4903 4909 if (ret >= 0) { 4904 - if (load_attr.log_level) 4910 + if (log_buf && load_attr.log_level) 4905 4911 pr_debug("verifier log:\n%s", log_buf); 4906 4912 
*pfd = ret; 4907 4913 ret = 0; 4908 4914 goto out; 4909 4915 } 4910 4916 4911 - if (errno == ENOSPC) { 4912 - log_buf_size <<= 1; 4917 + if (!log_buf || errno == ENOSPC) { 4918 + log_buf_size = max((size_t)BPF_LOG_BUF_SIZE, 4919 + log_buf_size << 1); 4920 + 4913 4921 free(log_buf); 4914 4922 goto retry_load; 4915 4923 } ··· 4959 4955 int err = 0, fd, i, btf_id; 4960 4956 4961 4957 if ((prog->type == BPF_PROG_TYPE_TRACING || 4958 + prog->type == BPF_PROG_TYPE_LSM || 4962 4959 prog->type == BPF_PROG_TYPE_EXT) && !prog->attach_btf_id) { 4963 4960 btf_id = libbpf_find_attach_btf_id(prog); 4964 4961 if (btf_id <= 0) ··· 6199 6194 } \ 6200 6195 6201 6196 BPF_PROG_TYPE_FNS(socket_filter, BPF_PROG_TYPE_SOCKET_FILTER); 6197 + BPF_PROG_TYPE_FNS(lsm, BPF_PROG_TYPE_LSM); 6202 6198 BPF_PROG_TYPE_FNS(kprobe, BPF_PROG_TYPE_KPROBE); 6203 6199 BPF_PROG_TYPE_FNS(sched_cls, BPF_PROG_TYPE_SCHED_CLS); 6204 6200 BPF_PROG_TYPE_FNS(sched_act, BPF_PROG_TYPE_SCHED_ACT); ··· 6266 6260 struct bpf_program *prog); 6267 6261 static struct bpf_link *attach_trace(const struct bpf_sec_def *sec, 6268 6262 struct bpf_program *prog); 6263 + static struct bpf_link *attach_lsm(const struct bpf_sec_def *sec, 6264 + struct bpf_program *prog); 6269 6265 6270 6266 struct bpf_sec_def { 6271 6267 const char *sec; ··· 6318 6310 SEC_DEF("freplace/", EXT, 6319 6311 .is_attach_btf = true, 6320 6312 .attach_fn = attach_trace), 6313 + SEC_DEF("lsm/", LSM, 6314 + .is_attach_btf = true, 6315 + .expected_attach_type = BPF_LSM_MAC, 6316 + .attach_fn = attach_lsm), 6321 6317 BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP), 6322 6318 BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT), 6323 6319 BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN), ··· 6584 6572 } 6585 6573 6586 6574 #define BTF_TRACE_PREFIX "btf_trace_" 6575 + #define BTF_LSM_PREFIX "bpf_lsm_" 6587 6576 #define BTF_MAX_NAME_SIZE 128 6588 6577 6589 6578 static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix, ··· 6612 6599 if (attach_type == 
BPF_TRACE_RAW_TP) 6613 6600 err = find_btf_by_prefix_kind(btf, BTF_TRACE_PREFIX, name, 6614 6601 BTF_KIND_TYPEDEF); 6602 + else if (attach_type == BPF_LSM_MAC) 6603 + err = find_btf_by_prefix_kind(btf, BTF_LSM_PREFIX, name, 6604 + BTF_KIND_FUNC); 6615 6605 else 6616 6606 err = btf__find_by_name_kind(btf, name, BTF_KIND_FUNC); 6617 6607 ··· 6770 6754 void *bpf_map__priv(const struct bpf_map *map) 6771 6755 { 6772 6756 return map ? map->priv : ERR_PTR(-EINVAL); 6757 + } 6758 + 6759 + int bpf_map__set_initial_value(struct bpf_map *map, 6760 + const void *data, size_t size) 6761 + { 6762 + if (!map->mmaped || map->libbpf_type == LIBBPF_MAP_KCONFIG || 6763 + size != map->def.value_size || map->fd >= 0) 6764 + return -EINVAL; 6765 + 6766 + memcpy(map->mmaped, data, size); 6767 + return 0; 6773 6768 } 6774 6769 6775 6770 bool bpf_map__is_offload_neutral(const struct bpf_map *map) ··· 6977 6950 int fd; /* hook FD, -1 if not applicable */ 6978 6951 bool disconnected; 6979 6952 }; 6953 + 6954 + /* Replace link's underlying BPF program with the new one */ 6955 + int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog) 6956 + { 6957 + return bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog), NULL); 6958 + } 6980 6959 6981 6960 /* Release "ownership" of underlying BPF resource (typically, BPF program 6982 6961 * attached to some BPF hook, e.g., tracepoint, kprobe, etc). 
Disconnected ··· 7485 7452 return bpf_program__attach_raw_tracepoint(prog, tp_name); 7486 7453 } 7487 7454 7488 - struct bpf_link *bpf_program__attach_trace(struct bpf_program *prog) 7455 + /* Common logic for all BPF program types that attach to a btf_id */ 7456 + static struct bpf_link *bpf_program__attach_btf_id(struct bpf_program *prog) 7489 7457 { 7490 7458 char errmsg[STRERR_BUFSIZE]; 7491 7459 struct bpf_link *link; ··· 7508 7474 if (pfd < 0) { 7509 7475 pfd = -errno; 7510 7476 free(link); 7511 - pr_warn("program '%s': failed to attach to trace: %s\n", 7477 + pr_warn("program '%s': failed to attach: %s\n", 7512 7478 bpf_program__title(prog, false), 7513 7479 libbpf_strerror_r(pfd, errmsg, sizeof(errmsg))); 7514 7480 return ERR_PTR(pfd); ··· 7517 7483 return (struct bpf_link *)link; 7518 7484 } 7519 7485 7486 + struct bpf_link *bpf_program__attach_trace(struct bpf_program *prog) 7487 + { 7488 + return bpf_program__attach_btf_id(prog); 7489 + } 7490 + 7491 + struct bpf_link *bpf_program__attach_lsm(struct bpf_program *prog) 7492 + { 7493 + return bpf_program__attach_btf_id(prog); 7494 + } 7495 + 7520 7496 static struct bpf_link *attach_trace(const struct bpf_sec_def *sec, 7521 7497 struct bpf_program *prog) 7522 7498 { 7523 7499 return bpf_program__attach_trace(prog); 7500 + } 7501 + 7502 + static struct bpf_link *attach_lsm(const struct bpf_sec_def *sec, 7503 + struct bpf_program *prog) 7504 + { 7505 + return bpf_program__attach_lsm(prog); 7506 + } 7507 + 7508 + struct bpf_link * 7509 + bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd) 7510 + { 7511 + const struct bpf_sec_def *sec_def; 7512 + enum bpf_attach_type attach_type; 7513 + char errmsg[STRERR_BUFSIZE]; 7514 + struct bpf_link *link; 7515 + int prog_fd, link_fd; 7516 + 7517 + prog_fd = bpf_program__fd(prog); 7518 + if (prog_fd < 0) { 7519 + pr_warn("program '%s': can't attach before loaded\n", 7520 + bpf_program__title(prog, false)); 7521 + return ERR_PTR(-EINVAL); 7522 + } 7523 + 
7524 + link = calloc(1, sizeof(*link)); 7525 + if (!link) 7526 + return ERR_PTR(-ENOMEM); 7527 + link->detach = &bpf_link__detach_fd; 7528 + 7529 + attach_type = bpf_program__get_expected_attach_type(prog); 7530 + if (!attach_type) { 7531 + sec_def = find_sec_def(bpf_program__title(prog, false)); 7532 + if (sec_def) 7533 + attach_type = sec_def->attach_type; 7534 + } 7535 + link_fd = bpf_link_create(prog_fd, cgroup_fd, attach_type, NULL); 7536 + if (link_fd < 0) { 7537 + link_fd = -errno; 7538 + free(link); 7539 + pr_warn("program '%s': failed to attach to cgroup: %s\n", 7540 + bpf_program__title(prog, false), 7541 + libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg))); 7542 + return ERR_PTR(link_fd); 7543 + } 7544 + link->fd = link_fd; 7545 + return link; 7524 7546 } 7525 7547 7526 7548 struct bpf_link *bpf_program__attach(struct bpf_program *prog)
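For BPF_LSM_MAC programs, attach-btf_id resolution takes the hook name from the SEC("lsm/<hook>") title, prepends BTF_LSM_PREFIX and looks the result up as a BTF_KIND_FUNC in vmlinux BTF (find_btf_by_prefix_kind() builds the name the same way). The string construction can be sketched on its own; lsm_btf_name() and demo_check() are hypothetical helpers for this sketch, not libbpf APIs:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BTF_LSM_PREFIX "bpf_lsm_"
#define BTF_MAX_NAME_SIZE 128

/* Build the BTF function name the lookup targets,
 * e.g. SEC("lsm/file_mprotect") -> "bpf_lsm_file_mprotect". */
static int lsm_btf_name(char *buf, size_t len, const char *sec_title)
{
	const char *hook = sec_title + strlen("lsm/");
	int ret = snprintf(buf, len, "%s%s", BTF_LSM_PREFIX, hook);

	if (ret < 0 || (size_t)ret >= len)
		return -1;	/* mirrors the name-too-long rejection */
	return 0;
}

static int demo_check(void)
{
	char buf[BTF_MAX_NAME_SIZE];

	if (lsm_btf_name(buf, sizeof(buf), "lsm/file_mprotect"))
		return 0;
	return strcmp(buf, "bpf_lsm_file_mprotect") == 0;
}
```

The kernel-side bpf_lsm_* stubs carry BTF, so this prefixed name resolves directly to the hook's function type — the same convention BTF_TRACE_PREFIX already uses for raw tracepoints.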
+21 -1
tools/lib/bpf/libbpf.h
··· 224 224 LIBBPF_API const char *bpf_link__pin_path(const struct bpf_link *link); 225 225 LIBBPF_API int bpf_link__pin(struct bpf_link *link, const char *path); 226 226 LIBBPF_API int bpf_link__unpin(struct bpf_link *link); 227 + LIBBPF_API int bpf_link__update_program(struct bpf_link *link, 228 + struct bpf_program *prog); 227 229 LIBBPF_API void bpf_link__disconnect(struct bpf_link *link); 228 230 LIBBPF_API int bpf_link__destroy(struct bpf_link *link); 229 231 ··· 247 245 LIBBPF_API struct bpf_link * 248 246 bpf_program__attach_raw_tracepoint(struct bpf_program *prog, 249 247 const char *tp_name); 250 - 251 248 LIBBPF_API struct bpf_link * 252 249 bpf_program__attach_trace(struct bpf_program *prog); 250 + LIBBPF_API struct bpf_link * 251 + bpf_program__attach_lsm(struct bpf_program *prog); 252 + LIBBPF_API struct bpf_link * 253 + bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd); 254 + 253 255 struct bpf_map; 256 + 254 257 LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(struct bpf_map *map); 258 + 255 259 struct bpf_insn; 256 260 257 261 /* ··· 329 321 LIBBPF_API int bpf_program__set_tracepoint(struct bpf_program *prog); 330 322 LIBBPF_API int bpf_program__set_raw_tracepoint(struct bpf_program *prog); 331 323 LIBBPF_API int bpf_program__set_kprobe(struct bpf_program *prog); 324 + LIBBPF_API int bpf_program__set_lsm(struct bpf_program *prog); 332 325 LIBBPF_API int bpf_program__set_sched_cls(struct bpf_program *prog); 333 326 LIBBPF_API int bpf_program__set_sched_act(struct bpf_program *prog); 334 327 LIBBPF_API int bpf_program__set_xdp(struct bpf_program *prog); ··· 356 347 LIBBPF_API bool bpf_program__is_tracepoint(const struct bpf_program *prog); 357 348 LIBBPF_API bool bpf_program__is_raw_tracepoint(const struct bpf_program *prog); 358 349 LIBBPF_API bool bpf_program__is_kprobe(const struct bpf_program *prog); 350 + LIBBPF_API bool bpf_program__is_lsm(const struct bpf_program *prog); 359 351 LIBBPF_API bool 
bpf_program__is_sched_cls(const struct bpf_program *prog); 360 352 LIBBPF_API bool bpf_program__is_sched_act(const struct bpf_program *prog); 361 353 LIBBPF_API bool bpf_program__is_xdp(const struct bpf_program *prog); ··· 417 407 LIBBPF_API int bpf_map__set_priv(struct bpf_map *map, void *priv, 418 408 bpf_map_clear_priv_t clear_priv); 419 409 LIBBPF_API void *bpf_map__priv(const struct bpf_map *map); 410 + LIBBPF_API int bpf_map__set_initial_value(struct bpf_map *map, 411 + const void *data, size_t size); 420 412 LIBBPF_API int bpf_map__reuse_fd(struct bpf_map *map, int fd); 421 413 LIBBPF_API int bpf_map__resize(struct bpf_map *map, __u32 max_entries); 422 414 LIBBPF_API bool bpf_map__is_offload_neutral(const struct bpf_map *map); ··· 456 444 __u8 attach_mode; 457 445 }; 458 446 447 + struct bpf_xdp_set_link_opts { 448 + size_t sz; 449 + __u32 old_fd; 450 + }; 451 + #define bpf_xdp_set_link_opts__last_field old_fd 452 + 459 453 LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags); 454 + LIBBPF_API int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags, 455 + const struct bpf_xdp_set_link_opts *opts); 460 456 LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags); 461 457 LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info, 462 458 size_t info_size, __u32 flags);
+9
tools/lib/bpf/libbpf.map
··· 243 243 bpf_link__pin; 244 244 bpf_link__pin_path; 245 245 bpf_link__unpin; 246 + bpf_link__update_program; 247 + bpf_link_create; 248 + bpf_link_update; 249 + bpf_map__set_initial_value; 250 + bpf_program__attach_cgroup; 251 + bpf_program__attach_lsm; 252 + bpf_program__is_lsm; 246 253 bpf_program__set_attach_target; 254 + bpf_program__set_lsm; 255 + bpf_set_link_xdp_fd_opts; 247 256 } LIBBPF_0.0.7;
+1
tools/lib/bpf/libbpf_probes.c
··· 108 108 case BPF_PROG_TYPE_TRACING: 109 109 case BPF_PROG_TYPE_STRUCT_OPS: 110 110 case BPF_PROG_TYPE_EXT: 111 + case BPF_PROG_TYPE_LSM: 111 112 default: 112 113 break; 113 114 }
+33 -1
tools/lib/bpf/netlink.c
··· 132 132 return ret; 133 133 } 134 134 135 - int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags) 135 + static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd, 136 + __u32 flags) 136 137 { 137 138 int sock, seq = 0, ret; 138 139 struct nlattr *nla, *nla_xdp; ··· 179 178 nla->nla_len += nla_xdp->nla_len; 180 179 } 181 180 181 + if (flags & XDP_FLAGS_REPLACE) { 182 + nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len); 183 + nla_xdp->nla_type = IFLA_XDP_EXPECTED_FD; 184 + nla_xdp->nla_len = NLA_HDRLEN + sizeof(old_fd); 185 + memcpy((char *)nla_xdp + NLA_HDRLEN, &old_fd, sizeof(old_fd)); 186 + nla->nla_len += nla_xdp->nla_len; 187 + } 188 + 182 189 req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len); 183 190 184 191 if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) { ··· 198 189 cleanup: 199 190 close(sock); 200 191 return ret; 192 + } 193 + 194 + int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags, 195 + const struct bpf_xdp_set_link_opts *opts) 196 + { 197 + int old_fd = -1; 198 + 199 + if (!OPTS_VALID(opts, bpf_xdp_set_link_opts)) 200 + return -EINVAL; 201 + 202 + if (OPTS_HAS(opts, old_fd)) { 203 + old_fd = OPTS_GET(opts, old_fd, -1); 204 + flags |= XDP_FLAGS_REPLACE; 205 + } 206 + 207 + return __bpf_set_link_xdp_fd_replace(ifindex, fd, 208 + old_fd, 209 + flags); 210 + } 211 + 212 + int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags) 213 + { 214 + return __bpf_set_link_xdp_fd_replace(ifindex, fd, 0, flags); 201 215 } 202 216 203 217 static int __dump_link_nlmsg(struct nlmsghdr *nlh,
+14 -2
tools/lib/bpf/xsk.c
··· 280 280 fill->consumer = map + off.fr.consumer; 281 281 fill->flags = map + off.fr.flags; 282 282 fill->ring = map + off.fr.desc; 283 - fill->cached_cons = umem->config.fill_size; 283 + fill->cached_prod = *fill->producer; 284 + /* cached_cons is "size" bigger than the real consumer pointer 285 + * See xsk_prod_nb_free 286 + */ 287 + fill->cached_cons = *fill->consumer + umem->config.fill_size; 284 288 285 289 map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64), 286 290 PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd, ··· 301 297 comp->consumer = map + off.cr.consumer; 302 298 comp->flags = map + off.cr.flags; 303 299 comp->ring = map + off.cr.desc; 300 + comp->cached_prod = *comp->producer; 301 + comp->cached_cons = *comp->consumer; 304 302 305 303 *umem_ptr = umem; 306 304 return 0; ··· 678 672 rx->consumer = rx_map + off.rx.consumer; 679 673 rx->flags = rx_map + off.rx.flags; 680 674 rx->ring = rx_map + off.rx.desc; 675 + rx->cached_prod = *rx->producer; 676 + rx->cached_cons = *rx->consumer; 681 677 } 682 678 xsk->rx = rx; 683 679 ··· 699 691 tx->consumer = tx_map + off.tx.consumer; 700 692 tx->flags = tx_map + off.tx.flags; 701 693 tx->ring = tx_map + off.tx.desc; 702 - tx->cached_cons = xsk->config.tx_size; 694 + tx->cached_prod = *tx->producer; 695 + /* cached_cons is r->size bigger than the real consumer pointer 696 + * See xsk_prod_nb_free 697 + */ 698 + tx->cached_cons = *tx->consumer + xsk->config.tx_size; 703 699 } 704 700 xsk->tx = tx; 705 701
+2
tools/testing/selftests/bpf/config
··· 35 35 CONFIG_MPLS_IPTUNNEL=m 36 36 CONFIG_IPV6_SIT=m 37 37 CONFIG_BPF_JIT=y 38 + CONFIG_BPF_LSM=y 39 + CONFIG_SECURITY=y
+31 -8
tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
··· 11 11 static const unsigned int total_bytes = 10 * 1024 * 1024; 12 12 static const struct timeval timeo_sec = { .tv_sec = 10 }; 13 13 static const size_t timeo_optlen = sizeof(timeo_sec); 14 + static int expected_stg = 0xeB9F; 14 15 static int stop, duration; 15 16 16 17 static int settimeo(int fd) ··· 89 88 return NULL; 90 89 } 91 90 92 - static void do_test(const char *tcp_ca) 91 + static void do_test(const char *tcp_ca, const struct bpf_map *sk_stg_map) 93 92 { 94 93 struct sockaddr_in6 sa6 = {}; 95 94 ssize_t nr_recv = 0, bytes = 0; ··· 127 126 err = listen(lfd, 1); 128 127 if (CHECK(err == -1, "listen", "errno:%d\n", errno)) 129 128 goto done; 130 - err = pthread_create(&srv_thread, NULL, server, (void *)(long)lfd); 131 - if (CHECK(err != 0, "pthread_create", "err:%d\n", err)) 132 - goto done; 129 + 130 + if (sk_stg_map) { 131 + err = bpf_map_update_elem(bpf_map__fd(sk_stg_map), &fd, 132 + &expected_stg, BPF_NOEXIST); 133 + if (CHECK(err, "bpf_map_update_elem(sk_stg_map)", 134 + "err:%d errno:%d\n", err, errno)) 135 + goto done; 136 + } 133 137 134 138 /* connect to server */ 135 139 err = connect(fd, (struct sockaddr *)&sa6, addrlen); 136 140 if (CHECK(err == -1, "connect", "errno:%d\n", errno)) 137 - goto wait_thread; 141 + goto done; 142 + 143 + if (sk_stg_map) { 144 + int tmp_stg; 145 + 146 + err = bpf_map_lookup_elem(bpf_map__fd(sk_stg_map), &fd, 147 + &tmp_stg); 148 + if (CHECK(!err || errno != ENOENT, 149 + "bpf_map_lookup_elem(sk_stg_map)", 150 + "err:%d errno:%d\n", err, errno)) 151 + goto done; 152 + } 153 + 154 + err = pthread_create(&srv_thread, NULL, server, (void *)(long)lfd); 155 + if (CHECK(err != 0, "pthread_create", "err:%d errno:%d\n", err, errno)) 156 + goto done; 138 157 139 158 /* recv total_bytes */ 140 159 while (bytes < total_bytes && !READ_ONCE(stop)) { ··· 170 149 CHECK(bytes != total_bytes, "recv", "%zd != %u nr_recv:%zd errno:%d\n", 171 150 bytes, total_bytes, nr_recv, errno); 172 151 173 - wait_thread: 174 152 WRITE_ONCE(stop, 
1); 175 153 pthread_join(srv_thread, &thread_ret); 176 154 CHECK(IS_ERR(thread_ret), "pthread_join", "thread_ret:%ld", ··· 195 175 return; 196 176 } 197 177 198 - do_test("bpf_cubic"); 178 + do_test("bpf_cubic", NULL); 199 179 200 180 bpf_link__destroy(link); 201 181 bpf_cubic__destroy(cubic_skel); ··· 217 197 return; 218 198 } 219 199 220 - do_test("bpf_dctcp"); 200 + do_test("bpf_dctcp", dctcp_skel->maps.sk_stg_map); 201 + CHECK(dctcp_skel->bss->stg_result != expected_stg, 202 + "Unexpected stg_result", "stg_result (%x) != expected_stg (%x)\n", 203 + dctcp_skel->bss->stg_result, expected_stg); 221 204 222 205 bpf_link__destroy(link); 223 206 bpf_dctcp__destroy(dctcp_skel);
+1 -1
tools/testing/selftests/bpf/prog_tests/btf_dump.c
··· 125 125 if (!test__start_subtest(t->name)) 126 126 continue; 127 127 128 - test_btf_dump_case(i, &btf_dump_test_cases[i]); 128 + test_btf_dump_case(i, &btf_dump_test_cases[i]); 129 129 } 130 130 }
+244
tools/testing/selftests/bpf/prog_tests/cgroup_link.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #include <test_progs.h> 4 + #include "cgroup_helpers.h" 5 + #include "test_cgroup_link.skel.h" 6 + 7 + static __u32 duration = 0; 8 + #define PING_CMD "ping -q -c1 -w1 127.0.0.1 > /dev/null" 9 + 10 + static struct test_cgroup_link *skel = NULL; 11 + 12 + int ping_and_check(int exp_calls, int exp_alt_calls) 13 + { 14 + skel->bss->calls = 0; 15 + skel->bss->alt_calls = 0; 16 + CHECK_FAIL(system(PING_CMD)); 17 + if (CHECK(skel->bss->calls != exp_calls, "call_cnt", 18 + "exp %d, got %d\n", exp_calls, skel->bss->calls)) 19 + return -EINVAL; 20 + if (CHECK(skel->bss->alt_calls != exp_alt_calls, "alt_call_cnt", 21 + "exp %d, got %d\n", exp_alt_calls, skel->bss->alt_calls)) 22 + return -EINVAL; 23 + return 0; 24 + } 25 + 26 + void test_cgroup_link(void) 27 + { 28 + struct { 29 + const char *path; 30 + int fd; 31 + } cgs[] = { 32 + { "/cg1" }, 33 + { "/cg1/cg2" }, 34 + { "/cg1/cg2/cg3" }, 35 + { "/cg1/cg2/cg3/cg4" }, 36 + }; 37 + int last_cg = ARRAY_SIZE(cgs) - 1, cg_nr = ARRAY_SIZE(cgs); 38 + DECLARE_LIBBPF_OPTS(bpf_link_update_opts, link_upd_opts); 39 + struct bpf_link *links[ARRAY_SIZE(cgs)] = {}, *tmp_link; 40 + __u32 prog_ids[ARRAY_SIZE(cgs)], prog_cnt = 0, attach_flags; 41 + int i = 0, err, prog_fd; 42 + bool detach_legacy = false; 43 + 44 + skel = test_cgroup_link__open_and_load(); 45 + if (CHECK(!skel, "skel_open_load", "failed to open/load skeleton\n")) 46 + return; 47 + prog_fd = bpf_program__fd(skel->progs.egress); 48 + 49 + err = setup_cgroup_environment(); 50 + if (CHECK(err, "cg_init", "failed: %d\n", err)) 51 + goto cleanup; 52 + 53 + for (i = 0; i < cg_nr; i++) { 54 + cgs[i].fd = create_and_get_cgroup(cgs[i].path); 55 + if (CHECK(cgs[i].fd < 0, "cg_create", "fail: %d\n", cgs[i].fd)) 56 + goto cleanup; 57 + } 58 + 59 + err = join_cgroup(cgs[last_cg].path); 60 + if (CHECK(err, "cg_join", "fail: %d\n", err)) 61 + goto cleanup; 62 + 63 + for (i = 0; i < cg_nr; i++) { 64 + links[i] = 
bpf_program__attach_cgroup(skel->progs.egress, 65 + cgs[i].fd); 66 + if (CHECK(IS_ERR(links[i]), "cg_attach", "i: %d, err: %ld\n", 67 + i, PTR_ERR(links[i]))) 68 + goto cleanup; 69 + } 70 + 71 + ping_and_check(cg_nr, 0); 72 + 73 + /* query the number of effective progs and attach flags in root cg */ 74 + err = bpf_prog_query(cgs[0].fd, BPF_CGROUP_INET_EGRESS, 75 + BPF_F_QUERY_EFFECTIVE, &attach_flags, NULL, 76 + &prog_cnt); 77 + CHECK_FAIL(err); 78 + CHECK_FAIL(attach_flags != BPF_F_ALLOW_MULTI); 79 + if (CHECK(prog_cnt != 1, "effect_cnt", "exp %d, got %d\n", 1, prog_cnt)) 80 + goto cleanup; 81 + 82 + /* query the number of effective progs in last cg */ 83 + err = bpf_prog_query(cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS, 84 + BPF_F_QUERY_EFFECTIVE, NULL, NULL, 85 + &prog_cnt); 86 + CHECK_FAIL(err); 87 + CHECK_FAIL(attach_flags != BPF_F_ALLOW_MULTI); 88 + if (CHECK(prog_cnt != cg_nr, "effect_cnt", "exp %d, got %d\n", 89 + cg_nr, prog_cnt)) 90 + goto cleanup; 91 + 92 + /* query the effective prog IDs in last cg */ 93 + err = bpf_prog_query(cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS, 94 + BPF_F_QUERY_EFFECTIVE, &attach_flags, 95 + prog_ids, &prog_cnt); 96 + CHECK_FAIL(err); 97 + CHECK_FAIL(attach_flags != BPF_F_ALLOW_MULTI); 98 + if (CHECK(prog_cnt != cg_nr, "effect_cnt", "exp %d, got %d\n", 99 + cg_nr, prog_cnt)) 100 + goto cleanup; 101 + for (i = 1; i < prog_cnt; i++) { 102 + CHECK(prog_ids[i - 1] != prog_ids[i], "prog_id_check", 103 + "idx %d, prev id %d, cur id %d\n", 104 + i, prog_ids[i - 1], prog_ids[i]); 105 + } 106 + 107 + /* detach bottom program and ping again */ 108 + bpf_link__destroy(links[last_cg]); 109 + links[last_cg] = NULL; 110 + 111 + ping_and_check(cg_nr - 1, 0); 112 + 113 + /* mix in with non link-based multi-attachments */ 114 + err = bpf_prog_attach(prog_fd, cgs[last_cg].fd, 115 + BPF_CGROUP_INET_EGRESS, BPF_F_ALLOW_MULTI); 116 + if (CHECK(err, "cg_attach_legacy", "errno=%d\n", errno)) 117 + goto cleanup; 118 + detach_legacy = true; 119 + 120 + 
links[last_cg] = bpf_program__attach_cgroup(skel->progs.egress, 121 + cgs[last_cg].fd); 122 + if (CHECK(IS_ERR(links[last_cg]), "cg_attach", "err: %ld\n", 123 + PTR_ERR(links[last_cg]))) 124 + goto cleanup; 125 + 126 + ping_and_check(cg_nr + 1, 0); 127 + 128 + /* detach link */ 129 + bpf_link__destroy(links[last_cg]); 130 + links[last_cg] = NULL; 131 + 132 + /* detach legacy */ 133 + err = bpf_prog_detach2(prog_fd, cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS); 134 + if (CHECK(err, "cg_detach_legacy", "errno=%d\n", errno)) 135 + goto cleanup; 136 + detach_legacy = false; 137 + 138 + /* attach legacy exclusive prog attachment */ 139 + err = bpf_prog_attach(prog_fd, cgs[last_cg].fd, 140 + BPF_CGROUP_INET_EGRESS, 0); 141 + if (CHECK(err, "cg_attach_exclusive", "errno=%d\n", errno)) 142 + goto cleanup; 143 + detach_legacy = true; 144 + 145 + /* attempt to mix in with multi-attach bpf_link */ 146 + tmp_link = bpf_program__attach_cgroup(skel->progs.egress, 147 + cgs[last_cg].fd); 148 + if (CHECK(!IS_ERR(tmp_link), "cg_attach_fail", "unexpected success!\n")) { 149 + bpf_link__destroy(tmp_link); 150 + goto cleanup; 151 + } 152 + 153 + ping_and_check(cg_nr, 0); 154 + 155 + /* detach */ 156 + err = bpf_prog_detach2(prog_fd, cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS); 157 + if (CHECK(err, "cg_detach_legacy", "errno=%d\n", errno)) 158 + goto cleanup; 159 + detach_legacy = false; 160 + 161 + ping_and_check(cg_nr - 1, 0); 162 + 163 + /* attach back link-based one */ 164 + links[last_cg] = bpf_program__attach_cgroup(skel->progs.egress, 165 + cgs[last_cg].fd); 166 + if (CHECK(IS_ERR(links[last_cg]), "cg_attach", "err: %ld\n", 167 + PTR_ERR(links[last_cg]))) 168 + goto cleanup; 169 + 170 + ping_and_check(cg_nr, 0); 171 + 172 + /* check legacy exclusive prog can't be attached */ 173 + err = bpf_prog_attach(prog_fd, cgs[last_cg].fd, 174 + BPF_CGROUP_INET_EGRESS, 0); 175 + if (CHECK(!err, "cg_attach_exclusive", "unexpected success")) { 176 + bpf_prog_detach2(prog_fd, cgs[last_cg].fd, 
BPF_CGROUP_INET_EGRESS); 177 + goto cleanup; 178 + } 179 + 180 + /* replace BPF programs inside their links for all but first link */ 181 + for (i = 1; i < cg_nr; i++) { 182 + err = bpf_link__update_program(links[i], skel->progs.egress_alt); 183 + if (CHECK(err, "prog_upd", "link #%d\n", i)) 184 + goto cleanup; 185 + } 186 + 187 + ping_and_check(1, cg_nr - 1); 188 + 189 + /* Attempt program update with wrong expected BPF program */ 190 + link_upd_opts.old_prog_fd = bpf_program__fd(skel->progs.egress_alt); 191 + link_upd_opts.flags = BPF_F_REPLACE; 192 + err = bpf_link_update(bpf_link__fd(links[0]), 193 + bpf_program__fd(skel->progs.egress_alt), 194 + &link_upd_opts); 195 + if (CHECK(err == 0 || errno != EPERM, "prog_cmpxchg1", 196 + "unexpectedly succeeded, err %d, errno %d\n", err, -errno)) 197 + goto cleanup; 198 + 199 + /* Compare-exchange single link program from egress to egress_alt */ 200 + link_upd_opts.old_prog_fd = bpf_program__fd(skel->progs.egress); 201 + link_upd_opts.flags = BPF_F_REPLACE; 202 + err = bpf_link_update(bpf_link__fd(links[0]), 203 + bpf_program__fd(skel->progs.egress_alt), 204 + &link_upd_opts); 205 + if (CHECK(err, "prog_cmpxchg2", "errno %d\n", -errno)) 206 + goto cleanup; 207 + 208 + /* ping */ 209 + ping_and_check(0, cg_nr); 210 + 211 + /* close cgroup FDs before detaching links */ 212 + for (i = 0; i < cg_nr; i++) { 213 + if (cgs[i].fd > 0) { 214 + close(cgs[i].fd); 215 + cgs[i].fd = -1; 216 + } 217 + } 218 + 219 + /* BPF programs should still get called */ 220 + ping_and_check(0, cg_nr); 221 + 222 + /* leave cgroup and remove them, don't detach programs */ 223 + cleanup_cgroup_environment(); 224 + 225 + /* BPF programs should have been auto-detached */ 226 + ping_and_check(0, 0); 227 + 228 + cleanup: 229 + if (detach_legacy) 230 + bpf_prog_detach2(prog_fd, cgs[last_cg].fd, 231 + BPF_CGROUP_INET_EGRESS); 232 + 233 + for (i = 0; i < cg_nr; i++) { 234 + if (!IS_ERR(links[i])) 235 + bpf_link__destroy(links[i]); 236 + } 237 + 
test_cgroup_link__destroy(skel); 238 + 239 + for (i = 0; i < cg_nr; i++) { 240 + if (cgs[i].fd > 0) 241 + close(cgs[i].fd); 242 + } 243 + cleanup_cgroup_environment(); 244 + }
+5
tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
··· 82 82 void test_get_stack_raw_tp(void) 83 83 { 84 84 const char *file = "./test_get_stack_rawtp.o"; 85 + const char *file_err = "./test_get_stack_rawtp_err.o"; 85 86 const char *prog_name = "raw_tracepoint/sys_enter"; 86 87 int i, err, prog_fd, exp_cnt = MAX_CNT_RAWTP; 87 88 struct perf_buffer_opts pb_opts = {}; ··· 93 92 struct bpf_object *obj; 94 93 struct bpf_map *map; 95 94 cpu_set_t cpu_set; 95 + 96 + err = bpf_prog_load(file_err, BPF_PROG_TYPE_RAW_TRACEPOINT, &obj, &prog_fd); 97 + if (CHECK(err >= 0, "prog_load raw tp", "err %d errno %d\n", err, errno)) 98 + return; 96 99 97 100 err = bpf_prog_load(file, BPF_PROG_TYPE_RAW_TRACEPOINT, &obj, &prog_fd); 98 101 if (CHECK(err, "prog_load raw tp", "err %d errno %d\n", err, errno))
+61
tools/testing/selftests/bpf/prog_tests/global_data_init.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + #include <test_progs.h> 3 + 4 + void test_global_data_init(void) 5 + { 6 + const char *file = "./test_global_data.o"; 7 + int err = -ENOMEM, map_fd, zero = 0; 8 + __u8 *buff = NULL, *newval = NULL; 9 + struct bpf_object *obj; 10 + struct bpf_map *map; 11 + __u32 duration = 0; 12 + size_t sz; 13 + 14 + obj = bpf_object__open_file(file, NULL); 15 + if (CHECK_FAIL(!obj)) 16 + return; 17 + 18 + map = bpf_object__find_map_by_name(obj, "test_glo.rodata"); 19 + if (CHECK_FAIL(!map || !bpf_map__is_internal(map))) 20 + goto out; 21 + 22 + sz = bpf_map__def(map)->value_size; 23 + newval = malloc(sz); 24 + if (CHECK_FAIL(!newval)) 25 + goto out; 26 + 27 + memset(newval, 0, sz); 28 + /* wrong size, should fail */ 29 + err = bpf_map__set_initial_value(map, newval, sz - 1); 30 + if (CHECK(!err, "reject set initial value wrong size", "err %d\n", err)) 31 + goto out; 32 + 33 + err = bpf_map__set_initial_value(map, newval, sz); 34 + if (CHECK(err, "set initial value", "err %d\n", err)) 35 + goto out; 36 + 37 + err = bpf_object__load(obj); 38 + if (CHECK_FAIL(err)) 39 + goto out; 40 + 41 + map_fd = bpf_map__fd(map); 42 + if (CHECK_FAIL(map_fd < 0)) 43 + goto out; 44 + 45 + buff = malloc(sz); 46 + if (buff) 47 + err = bpf_map_lookup_elem(map_fd, &zero, buff); 48 + if (CHECK(!buff || err || memcmp(buff, newval, sz), 49 + "compare .rodata map data override", 50 + "err %d errno %d\n", err, errno)) 51 + goto out; 52 + 53 + memset(newval, 1, sz); 54 + /* object loaded - should fail */ 55 + err = bpf_map__set_initial_value(map, newval, sz); 56 + CHECK(!err, "reject set initial value after load", "err %d\n", err); 57 + out: 58 + free(buff); 59 + free(newval); 60 + bpf_object__close(obj); 61 + }
+309
tools/testing/selftests/bpf/prog_tests/sk_assign.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2018 Facebook 3 + // Copyright (c) 2019 Cloudflare 4 + // Copyright (c) 2020 Isovalent, Inc. 5 + /* 6 + * Test that the socket assign program is able to redirect traffic towards a 7 + * socket, regardless of whether the port or address destination of the traffic 8 + * matches the port. 9 + */ 10 + 11 + #define _GNU_SOURCE 12 + #include <fcntl.h> 13 + #include <signal.h> 14 + #include <stdlib.h> 15 + #include <unistd.h> 16 + 17 + #include "test_progs.h" 18 + 19 + #define BIND_PORT 1234 20 + #define CONNECT_PORT 4321 21 + #define TEST_DADDR (0xC0A80203) 22 + #define NS_SELF "/proc/self/ns/net" 23 + 24 + static const struct timeval timeo_sec = { .tv_sec = 3 }; 25 + static const size_t timeo_optlen = sizeof(timeo_sec); 26 + static int stop, duration; 27 + 28 + static bool 29 + configure_stack(void) 30 + { 31 + char tc_cmd[BUFSIZ]; 32 + 33 + /* Move to a new networking namespace */ 34 + if (CHECK_FAIL(unshare(CLONE_NEWNET))) 35 + return false; 36 + 37 + /* Configure necessary links, routes */ 38 + if (CHECK_FAIL(system("ip link set dev lo up"))) 39 + return false; 40 + if (CHECK_FAIL(system("ip route add local default dev lo"))) 41 + return false; 42 + if (CHECK_FAIL(system("ip -6 route add local default dev lo"))) 43 + return false; 44 + 45 + /* Load qdisc, BPF program */ 46 + if (CHECK_FAIL(system("tc qdisc add dev lo clsact"))) 47 + return false; 48 + sprintf(tc_cmd, "%s %s %s %s", "tc filter add dev lo ingress bpf", 49 + "direct-action object-file ./test_sk_assign.o", 50 + "section classifier/sk_assign_test", 51 + (env.verbosity < VERBOSE_VERY) ? 
" 2>/dev/null" : ""); 52 + if (CHECK(system(tc_cmd), "BPF load failed;", 53 + "run with -vv for more info\n")) 54 + return false; 55 + 56 + return true; 57 + } 58 + 59 + static int 60 + start_server(const struct sockaddr *addr, socklen_t len, int type) 61 + { 62 + int fd; 63 + 64 + fd = socket(addr->sa_family, type, 0); 65 + if (CHECK_FAIL(fd == -1)) 66 + goto out; 67 + if (CHECK_FAIL(setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeo_sec, 68 + timeo_optlen))) 69 + goto close_out; 70 + if (CHECK_FAIL(bind(fd, addr, len) == -1)) 71 + goto close_out; 72 + if (type == SOCK_STREAM && CHECK_FAIL(listen(fd, 128) == -1)) 73 + goto close_out; 74 + 75 + goto out; 76 + close_out: 77 + close(fd); 78 + fd = -1; 79 + out: 80 + return fd; 81 + } 82 + 83 + static int 84 + connect_to_server(const struct sockaddr *addr, socklen_t len, int type) 85 + { 86 + int fd = -1; 87 + 88 + fd = socket(addr->sa_family, type, 0); 89 + if (CHECK_FAIL(fd == -1)) 90 + goto out; 91 + if (CHECK_FAIL(setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &timeo_sec, 92 + timeo_optlen))) 93 + goto close_out; 94 + if (CHECK_FAIL(connect(fd, addr, len))) 95 + goto close_out; 96 + 97 + goto out; 98 + close_out: 99 + close(fd); 100 + fd = -1; 101 + out: 102 + return fd; 103 + } 104 + 105 + static in_port_t 106 + get_port(int fd) 107 + { 108 + struct sockaddr_storage ss; 109 + socklen_t slen = sizeof(ss); 110 + in_port_t port = 0; 111 + 112 + if (CHECK_FAIL(getsockname(fd, (struct sockaddr *)&ss, &slen))) 113 + return port; 114 + 115 + switch (ss.ss_family) { 116 + case AF_INET: 117 + port = ((struct sockaddr_in *)&ss)->sin_port; 118 + break; 119 + case AF_INET6: 120 + port = ((struct sockaddr_in6 *)&ss)->sin6_port; 121 + break; 122 + default: 123 + CHECK(1, "Invalid address family", "%d\n", ss.ss_family); 124 + } 125 + return port; 126 + } 127 + 128 + static ssize_t 129 + rcv_msg(int srv_client, int type) 130 + { 131 + struct sockaddr_storage ss; 132 + char buf[BUFSIZ]; 133 + socklen_t slen; 134 + 135 + if (type == 
SOCK_STREAM) 136 + return read(srv_client, &buf, sizeof(buf)); 137 + else 138 + return recvfrom(srv_client, &buf, sizeof(buf), 0, 139 + (struct sockaddr *)&ss, &slen); 140 + } 141 + 142 + static int 143 + run_test(int server_fd, const struct sockaddr *addr, socklen_t len, int type) 144 + { 145 + int client = -1, srv_client = -1; 146 + char buf[] = "testing"; 147 + in_port_t port; 148 + int ret = 1; 149 + 150 + client = connect_to_server(addr, len, type); 151 + if (client == -1) { 152 + perror("Cannot connect to server"); 153 + goto out; 154 + } 155 + 156 + if (type == SOCK_STREAM) { 157 + srv_client = accept(server_fd, NULL, NULL); 158 + if (CHECK_FAIL(srv_client == -1)) { 159 + perror("Can't accept connection"); 160 + goto out; 161 + } 162 + } else { 163 + srv_client = server_fd; 164 + } 165 + if (CHECK_FAIL(write(client, buf, sizeof(buf)) != sizeof(buf))) { 166 + perror("Can't write on client"); 167 + goto out; 168 + } 169 + if (CHECK_FAIL(rcv_msg(srv_client, type) != sizeof(buf))) { 170 + perror("Can't read on server"); 171 + goto out; 172 + } 173 + 174 + port = get_port(srv_client); 175 + if (CHECK_FAIL(!port)) 176 + goto out; 177 + /* SOCK_STREAM is connected via accept(), so the server's local address 178 + * will be the CONNECT_PORT rather than the BIND port that corresponds 179 + * to the listen socket. SOCK_DGRAM on the other hand is connectionless 180 + * so we can't really do the same check there; the server doesn't ever 181 + * create a socket with CONNECT_PORT. 
182 + */ 183 + if (type == SOCK_STREAM && 184 + CHECK(port != htons(CONNECT_PORT), "Expected", "port %u but got %u", 185 + CONNECT_PORT, ntohs(port))) 186 + goto out; 187 + else if (type == SOCK_DGRAM && 188 + CHECK(port != htons(BIND_PORT), "Expected", 189 + "port %u but got %u", BIND_PORT, ntohs(port))) 190 + goto out; 191 + 192 + ret = 0; 193 + out: 194 + close(client); 195 + if (srv_client != server_fd) 196 + close(srv_client); 197 + if (ret) 198 + WRITE_ONCE(stop, 1); 199 + return ret; 200 + } 201 + 202 + static void 203 + prepare_addr(struct sockaddr *addr, int family, __u16 port, bool rewrite_addr) 204 + { 205 + struct sockaddr_in *addr4; 206 + struct sockaddr_in6 *addr6; 207 + 208 + switch (family) { 209 + case AF_INET: 210 + addr4 = (struct sockaddr_in *)addr; 211 + memset(addr4, 0, sizeof(*addr4)); 212 + addr4->sin_family = family; 213 + addr4->sin_port = htons(port); 214 + if (rewrite_addr) 215 + addr4->sin_addr.s_addr = htonl(TEST_DADDR); 216 + else 217 + addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK); 218 + break; 219 + case AF_INET6: 220 + addr6 = (struct sockaddr_in6 *)addr; 221 + memset(addr6, 0, sizeof(*addr6)); 222 + addr6->sin6_family = family; 223 + addr6->sin6_port = htons(port); 224 + addr6->sin6_addr = in6addr_loopback; 225 + if (rewrite_addr) 226 + addr6->sin6_addr.s6_addr32[3] = htonl(TEST_DADDR); 227 + break; 228 + default: 229 + fprintf(stderr, "Invalid family %d", family); 230 + } 231 + } 232 + 233 + struct test_sk_cfg { 234 + const char *name; 235 + int family; 236 + struct sockaddr *addr; 237 + socklen_t len; 238 + int type; 239 + bool rewrite_addr; 240 + }; 241 + 242 + #define TEST(NAME, FAMILY, TYPE, REWRITE) \ 243 + { \ 244 + .name = NAME, \ 245 + .family = FAMILY, \ 246 + .addr = (FAMILY == AF_INET) ? (struct sockaddr *)&addr4 \ 247 + : (struct sockaddr *)&addr6, \ 248 + .len = (FAMILY == AF_INET) ? 
sizeof(addr4) : sizeof(addr6), \ 249 + .type = TYPE, \ 250 + .rewrite_addr = REWRITE, \ 251 + } 252 + 253 + void test_sk_assign(void) 254 + { 255 + struct sockaddr_in addr4; 256 + struct sockaddr_in6 addr6; 257 + struct test_sk_cfg tests[] = { 258 + TEST("ipv4 tcp port redir", AF_INET, SOCK_STREAM, false), 259 + TEST("ipv4 tcp addr redir", AF_INET, SOCK_STREAM, true), 260 + TEST("ipv6 tcp port redir", AF_INET6, SOCK_STREAM, false), 261 + TEST("ipv6 tcp addr redir", AF_INET6, SOCK_STREAM, true), 262 + TEST("ipv4 udp port redir", AF_INET, SOCK_DGRAM, false), 263 + TEST("ipv4 udp addr redir", AF_INET, SOCK_DGRAM, true), 264 + TEST("ipv6 udp port redir", AF_INET6, SOCK_DGRAM, false), 265 + TEST("ipv6 udp addr redir", AF_INET6, SOCK_DGRAM, true), 266 + }; 267 + int server = -1; 268 + int self_net; 269 + 270 + self_net = open(NS_SELF, O_RDONLY); 271 + if (CHECK_FAIL(self_net < 0)) { 272 + perror("Unable to open "NS_SELF); 273 + return; 274 + } 275 + 276 + if (!configure_stack()) { 277 + perror("configure_stack"); 278 + goto cleanup; 279 + } 280 + 281 + for (int i = 0; i < ARRAY_SIZE(tests) && !READ_ONCE(stop); i++) { 282 + struct test_sk_cfg *test = &tests[i]; 283 + const struct sockaddr *addr; 284 + 285 + if (!test__start_subtest(test->name)) 286 + continue; 287 + prepare_addr(test->addr, test->family, BIND_PORT, false); 288 + addr = (const struct sockaddr *)test->addr; 289 + server = start_server(addr, test->len, test->type); 290 + if (server == -1) 291 + goto cleanup; 292 + 293 + /* connect to unbound ports */ 294 + prepare_addr(test->addr, test->family, CONNECT_PORT, 295 + test->rewrite_addr); 296 + if (run_test(server, addr, test->len, test->type)) 297 + goto close; 298 + 299 + close(server); 300 + server = -1; 301 + } 302 + 303 + close: 304 + close(server); 305 + cleanup: 306 + if (CHECK_FAIL(setns(self_net, CLONE_NEWNET))) 307 + perror("Failed to setns("NS_SELF")"); 308 + close(self_net); 309 + }
+2 -2
tools/testing/selftests/bpf/prog_tests/tcp_rtt.c
··· 226 226 return ERR_PTR(err); 227 227 } 228 228 229 - while (!server_done) { 229 + while (true) { 230 230 client_fd = accept(fd, (struct sockaddr *)&addr, &len); 231 231 if (client_fd == -1 && errno == EAGAIN) { 232 232 usleep(50); ··· 272 272 CHECK_FAIL(run_test(cgroup_fd, server_fd)); 273 273 274 274 server_done = true; 275 - pthread_join(tid, &server_res); 275 + CHECK_FAIL(pthread_join(tid, &server_res)); 276 276 CHECK_FAIL(IS_ERR(server_res)); 277 277 278 278 close_server_fd:
+86
tools/testing/selftests/bpf/prog_tests/test_lsm.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + /* 4 + * Copyright (C) 2020 Google LLC. 5 + */ 6 + 7 + #include <test_progs.h> 8 + #include <sys/mman.h> 9 + #include <sys/wait.h> 10 + #include <unistd.h> 11 + #include <malloc.h> 12 + #include <stdlib.h> 13 + 14 + #include "lsm.skel.h" 15 + 16 + char *CMD_ARGS[] = {"true", NULL}; 17 + 18 + int heap_mprotect(void) 19 + { 20 + void *buf; 21 + long sz; 22 + int ret; 23 + 24 + sz = sysconf(_SC_PAGESIZE); 25 + if (sz < 0) 26 + return sz; 27 + 28 + buf = memalign(sz, 2 * sz); 29 + if (buf == NULL) 30 + return -ENOMEM; 31 + 32 + ret = mprotect(buf, sz, PROT_READ | PROT_WRITE | PROT_EXEC); 33 + free(buf); 34 + return ret; 35 + } 36 + 37 + int exec_cmd(int *monitored_pid) 38 + { 39 + int child_pid, child_status; 40 + 41 + child_pid = fork(); 42 + if (child_pid == 0) { 43 + *monitored_pid = getpid(); 44 + execvp(CMD_ARGS[0], CMD_ARGS); 45 + return -EINVAL; 46 + } else if (child_pid > 0) { 47 + waitpid(child_pid, &child_status, 0); 48 + return child_status; 49 + } 50 + 51 + return -EINVAL; 52 + } 53 + 54 + void test_test_lsm(void) 55 + { 56 + struct lsm *skel = NULL; 57 + int err, duration = 0; 58 + 59 + skel = lsm__open_and_load(); 60 + if (CHECK(!skel, "skel_load", "lsm skeleton failed\n")) 61 + goto close_prog; 62 + 63 + err = lsm__attach(skel); 64 + if (CHECK(err, "attach", "lsm attach failed: %d\n", err)) 65 + goto close_prog; 66 + 67 + err = exec_cmd(&skel->bss->monitored_pid); 68 + if (CHECK(err < 0, "exec_cmd", "err %d errno %d\n", err, errno)) 69 + goto close_prog; 70 + 71 + CHECK(skel->bss->bprm_count != 1, "bprm_count", "bprm_count = %d\n", 72 + skel->bss->bprm_count); 73 + 74 + skel->bss->monitored_pid = getpid(); 75 + 76 + err = heap_mprotect(); 77 + if (CHECK(errno != EPERM, "heap_mprotect", "want errno=EPERM, got %d\n", 78 + errno)) 79 + goto close_prog; 80 + 81 + CHECK(skel->bss->mprotect_count != 1, "mprotect_count", 82 + "mprotect_count = %d\n", skel->bss->mprotect_count); 83 + 84 + close_prog: 85 + 
lsm__destroy(skel); 86 + }
+1 -1
tools/testing/selftests/bpf/prog_tests/vmlinux.c
··· 11 11 { 12 12 struct timespec ts = { .tv_nsec = MY_TV_NSEC }; 13 13 14 - (void)nanosleep(&ts, NULL); 14 + (void)syscall(__NR_nanosleep, &ts, NULL); 15 15 } 16 16 17 17 void test_vmlinux(void)
+62
tools/testing/selftests/bpf/prog_tests/xdp_attach.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + #include <test_progs.h> 3 + 4 + #define IFINDEX_LO 1 5 + #define XDP_FLAGS_REPLACE (1U << 4) 6 + 7 + void test_xdp_attach(void) 8 + { 9 + struct bpf_object *obj1, *obj2, *obj3; 10 + const char *file = "./test_xdp.o"; 11 + int err, fd1, fd2, fd3; 12 + __u32 duration = 0; 13 + DECLARE_LIBBPF_OPTS(bpf_xdp_set_link_opts, opts, 14 + .old_fd = -1); 15 + 16 + err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj1, &fd1); 17 + if (CHECK_FAIL(err)) 18 + return; 19 + err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj2, &fd2); 20 + if (CHECK_FAIL(err)) 21 + goto out_1; 22 + err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj3, &fd3); 23 + if (CHECK_FAIL(err)) 24 + goto out_2; 25 + 26 + err = bpf_set_link_xdp_fd_opts(IFINDEX_LO, fd1, XDP_FLAGS_REPLACE, 27 + &opts); 28 + if (CHECK(err, "load_ok", "initial load failed")) 29 + goto out_close; 30 + 31 + err = bpf_set_link_xdp_fd_opts(IFINDEX_LO, fd2, XDP_FLAGS_REPLACE, 32 + &opts); 33 + if (CHECK(!err, "load_fail", "load with expected id didn't fail")) 34 + goto out; 35 + 36 + opts.old_fd = fd1; 37 + err = bpf_set_link_xdp_fd_opts(IFINDEX_LO, fd2, 0, &opts); 38 + if (CHECK(err, "replace_ok", "replace valid old_fd failed")) 39 + goto out; 40 + 41 + err = bpf_set_link_xdp_fd_opts(IFINDEX_LO, fd3, 0, &opts); 42 + if (CHECK(!err, "replace_fail", "replace invalid old_fd didn't fail")) 43 + goto out; 44 + 45 + err = bpf_set_link_xdp_fd_opts(IFINDEX_LO, -1, 0, &opts); 46 + if (CHECK(!err, "remove_fail", "remove invalid old_fd didn't fail")) 47 + goto out; 48 + 49 + opts.old_fd = fd2; 50 + err = bpf_set_link_xdp_fd_opts(IFINDEX_LO, -1, 0, &opts); 51 + if (CHECK(err, "remove_ok", "remove valid old_fd failed")) 52 + goto out; 53 + 54 + out: 55 + bpf_set_link_xdp_fd(IFINDEX_LO, -1, 0); 56 + out_close: 57 + bpf_object__close(obj3); 58 + out_2: 59 + bpf_object__close(obj2); 60 + out_1: 61 + bpf_object__close(obj1); 62 + }
+16
tools/testing/selftests/bpf/progs/bpf_dctcp.c
··· 6 6 * the kernel BPF logic. 7 7 */ 8 8 9 + #include <stddef.h> 9 10 #include <linux/bpf.h> 10 11 #include <linux/types.h> 11 12 #include <bpf/bpf_helpers.h> ··· 14 13 #include "bpf_tcp_helpers.h" 15 14 16 15 char _license[] SEC("license") = "GPL"; 16 + 17 + int stg_result = 0; 18 + 19 + struct { 20 + __uint(type, BPF_MAP_TYPE_SK_STORAGE); 21 + __uint(map_flags, BPF_F_NO_PREALLOC); 22 + __type(key, int); 23 + __type(value, int); 24 + } sk_stg_map SEC(".maps"); 17 25 18 26 #define DCTCP_MAX_ALPHA 1024U 19 27 ··· 53 43 { 54 44 const struct tcp_sock *tp = tcp_sk(sk); 55 45 struct dctcp *ca = inet_csk_ca(sk); 46 + int *stg; 56 47 57 48 ca->prior_rcv_nxt = tp->rcv_nxt; 58 49 ca->dctcp_alpha = min(dctcp_alpha_on_init, DCTCP_MAX_ALPHA); 59 50 ca->loss_cwnd = 0; 60 51 ca->ce_state = 0; 61 52 53 + stg = bpf_sk_storage_get(&sk_stg_map, (void *)tp, NULL, 0); 54 + if (stg) { 55 + stg_result = *stg; 56 + bpf_sk_storage_delete(&sk_stg_map, (void *)tp); 57 + } 62 58 dctcp_reset(tp, ca); 63 59 } 64 60
+48
tools/testing/selftests/bpf/progs/lsm.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + /* 4 + * Copyright 2020 Google LLC. 5 + */ 6 + 7 + #include "vmlinux.h" 8 + #include <bpf/bpf_helpers.h> 9 + #include <bpf/bpf_tracing.h> 10 + #include <errno.h> 11 + 12 + char _license[] SEC("license") = "GPL"; 13 + 14 + int monitored_pid = 0; 15 + int mprotect_count = 0; 16 + int bprm_count = 0; 17 + 18 + SEC("lsm/file_mprotect") 19 + int BPF_PROG(test_int_hook, struct vm_area_struct *vma, 20 + unsigned long reqprot, unsigned long prot, int ret) 21 + { 22 + if (ret != 0) 23 + return ret; 24 + 25 + __u32 pid = bpf_get_current_pid_tgid() >> 32; 26 + int is_heap = 0; 27 + 28 + is_heap = (vma->vm_start >= vma->vm_mm->start_brk && 29 + vma->vm_end <= vma->vm_mm->brk); 30 + 31 + if (is_heap && monitored_pid == pid) { 32 + mprotect_count++; 33 + ret = -EPERM; 34 + } 35 + 36 + return ret; 37 + } 38 + 39 + SEC("lsm/bprm_committed_creds") 40 + int BPF_PROG(test_void_hook, struct linux_binprm *bprm) 41 + { 42 + __u32 pid = bpf_get_current_pid_tgid() >> 32; 43 + 44 + if (monitored_pid == pid) 45 + bprm_count++; 46 + 47 + return 0; 48 + }
-1
tools/testing/selftests/bpf/progs/sockmap_parse_prog.c
··· 12 12 __u32 lport = skb->local_port; 13 13 __u32 rport = skb->remote_port; 14 14 __u8 *d = data; 15 - __u32 len = (__u32) data_end - (__u32) data; 16 15 int err; 17 16 18 17 if (data + 10 > data_end) {
+24
tools/testing/selftests/bpf/progs/test_cgroup_link.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2020 Facebook 3 + #include <linux/bpf.h> 4 + #include <bpf/bpf_helpers.h> 5 + 6 + int calls = 0; 7 + int alt_calls = 0; 8 + 9 + SEC("cgroup_skb/egress1") 10 + int egress(struct __sk_buff *skb) 11 + { 12 + __sync_fetch_and_add(&calls, 1); 13 + return 1; 14 + } 15 + 16 + SEC("cgroup_skb/egress2") 17 + int egress_alt(struct __sk_buff *skb) 18 + { 19 + __sync_fetch_and_add(&alt_calls, 1); 20 + return 1; 21 + } 22 + 23 + char _license[] SEC("license") = "GPL"; 24 +
+26
tools/testing/selftests/bpf/progs/test_get_stack_rawtp_err.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #include <linux/bpf.h> 4 + #include <bpf/bpf_helpers.h> 5 + 6 + #define MAX_STACK_RAWTP 10 7 + 8 + SEC("raw_tracepoint/sys_enter") 9 + int bpf_prog2(void *ctx) 10 + { 11 + __u64 stack[MAX_STACK_RAWTP]; 12 + int error; 13 + 14 + /* set all the flags which should return -EINVAL */ 15 + error = bpf_get_stack(ctx, stack, 0, -1); 16 + if (error < 0) 17 + goto loop; 18 + 19 + return error; 20 + loop: 21 + while (1) { 22 + error++; 23 + } 24 + } 25 + 26 + char _license[] SEC("license") = "GPL";
+1 -1
tools/testing/selftests/bpf/progs/test_global_data.c
··· 68 68 bpf_map_update_elem(&result_##map, &key, var, 0); \ 69 69 } while (0) 70 70 71 - SEC("static_data_load") 71 + SEC("classifier/static_data_load") 72 72 int load_static_data(struct __sk_buff *skb) 73 73 { 74 74 static const __u64 bar = ~0;
+204
tools/testing/selftests/bpf/progs/test_sk_assign.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2019 Cloudflare Ltd. 3 + // Copyright (c) 2020 Isovalent, Inc. 4 + 5 + #include <stddef.h> 6 + #include <stdbool.h> 7 + #include <string.h> 8 + #include <linux/bpf.h> 9 + #include <linux/if_ether.h> 10 + #include <linux/in.h> 11 + #include <linux/ip.h> 12 + #include <linux/ipv6.h> 13 + #include <linux/pkt_cls.h> 14 + #include <linux/tcp.h> 15 + #include <sys/socket.h> 16 + #include <bpf/bpf_helpers.h> 17 + #include <bpf/bpf_endian.h> 18 + 19 + int _version SEC("version") = 1; 20 + char _license[] SEC("license") = "GPL"; 21 + 22 + /* Fill 'tuple' with L3 info, and attempt to find L4. On fail, return NULL. */ 23 + static inline struct bpf_sock_tuple * 24 + get_tuple(struct __sk_buff *skb, bool *ipv4, bool *tcp) 25 + { 26 + void *data_end = (void *)(long)skb->data_end; 27 + void *data = (void *)(long)skb->data; 28 + struct bpf_sock_tuple *result; 29 + struct ethhdr *eth; 30 + __u64 tuple_len; 31 + __u8 proto = 0; 32 + __u64 ihl_len; 33 + 34 + eth = (struct ethhdr *)(data); 35 + if (eth + 1 > data_end) 36 + return NULL; 37 + 38 + if (eth->h_proto == bpf_htons(ETH_P_IP)) { 39 + struct iphdr *iph = (struct iphdr *)(data + sizeof(*eth)); 40 + 41 + if (iph + 1 > data_end) 42 + return NULL; 43 + if (iph->ihl != 5) 44 + /* Options are not supported */ 45 + return NULL; 46 + ihl_len = iph->ihl * 4; 47 + proto = iph->protocol; 48 + *ipv4 = true; 49 + result = (struct bpf_sock_tuple *)&iph->saddr; 50 + } else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) { 51 + struct ipv6hdr *ip6h = (struct ipv6hdr *)(data + sizeof(*eth)); 52 + 53 + if (ip6h + 1 > data_end) 54 + return NULL; 55 + ihl_len = sizeof(*ip6h); 56 + proto = ip6h->nexthdr; 57 + *ipv4 = false; 58 + result = (struct bpf_sock_tuple *)&ip6h->saddr; 59 + } else { 60 + return (struct bpf_sock_tuple *)data; 61 + } 62 + 63 + if (proto != IPPROTO_TCP && proto != IPPROTO_UDP) 64 + return NULL; 65 + 66 + *tcp = (proto == IPPROTO_TCP); 67 + return result; 68 + } 69 + 
70 + static inline int 71 + handle_udp(struct __sk_buff *skb, struct bpf_sock_tuple *tuple, bool ipv4) 72 + { 73 + struct bpf_sock_tuple ln = {0}; 74 + struct bpf_sock *sk; 75 + size_t tuple_len; 76 + int ret; 77 + 78 + tuple_len = ipv4 ? sizeof(tuple->ipv4) : sizeof(tuple->ipv6); 79 + if ((void *)tuple + tuple_len > (void *)(long)skb->data_end) 80 + return TC_ACT_SHOT; 81 + 82 + sk = bpf_sk_lookup_udp(skb, tuple, tuple_len, BPF_F_CURRENT_NETNS, 0); 83 + if (sk) 84 + goto assign; 85 + 86 + if (ipv4) { 87 + if (tuple->ipv4.dport != bpf_htons(4321)) 88 + return TC_ACT_OK; 89 + 90 + ln.ipv4.daddr = bpf_htonl(0x7f000001); 91 + ln.ipv4.dport = bpf_htons(1234); 92 + 93 + sk = bpf_sk_lookup_udp(skb, &ln, sizeof(ln.ipv4), 94 + BPF_F_CURRENT_NETNS, 0); 95 + } else { 96 + if (tuple->ipv6.dport != bpf_htons(4321)) 97 + return TC_ACT_OK; 98 + 99 + /* Upper parts of daddr are already zero. */ 100 + ln.ipv6.daddr[3] = bpf_htonl(0x1); 101 + ln.ipv6.dport = bpf_htons(1234); 102 + 103 + sk = bpf_sk_lookup_udp(skb, &ln, sizeof(ln.ipv6), 104 + BPF_F_CURRENT_NETNS, 0); 105 + } 106 + 107 + /* workaround: We can't do a single socket lookup here, because then 108 + * the compiler will likely spill tuple_len to the stack. This makes it 109 + * lose all bounds information in the verifier, which then rejects the 110 + * call as unsafe. 111 + */ 112 + if (!sk) 113 + return TC_ACT_SHOT; 114 + 115 + assign: 116 + ret = bpf_sk_assign(skb, sk, 0); 117 + bpf_sk_release(sk); 118 + return ret; 119 + } 120 + 121 + static inline int 122 + handle_tcp(struct __sk_buff *skb, struct bpf_sock_tuple *tuple, bool ipv4) 123 + { 124 + struct bpf_sock_tuple ln = {0}; 125 + struct bpf_sock *sk; 126 + size_t tuple_len; 127 + int ret; 128 + 129 + tuple_len = ipv4 ? 
sizeof(tuple->ipv4) : sizeof(tuple->ipv6); 130 + if ((void *)tuple + tuple_len > (void *)(long)skb->data_end) 131 + return TC_ACT_SHOT; 132 + 133 + sk = bpf_skc_lookup_tcp(skb, tuple, tuple_len, BPF_F_CURRENT_NETNS, 0); 134 + if (sk) { 135 + if (sk->state != BPF_TCP_LISTEN) 136 + goto assign; 137 + bpf_sk_release(sk); 138 + } 139 + 140 + if (ipv4) { 141 + if (tuple->ipv4.dport != bpf_htons(4321)) 142 + return TC_ACT_OK; 143 + 144 + ln.ipv4.daddr = bpf_htonl(0x7f000001); 145 + ln.ipv4.dport = bpf_htons(1234); 146 + 147 + sk = bpf_skc_lookup_tcp(skb, &ln, sizeof(ln.ipv4), 148 + BPF_F_CURRENT_NETNS, 0); 149 + } else { 150 + if (tuple->ipv6.dport != bpf_htons(4321)) 151 + return TC_ACT_OK; 152 + 153 + /* Upper parts of daddr are already zero. */ 154 + ln.ipv6.daddr[3] = bpf_htonl(0x1); 155 + ln.ipv6.dport = bpf_htons(1234); 156 + 157 + sk = bpf_skc_lookup_tcp(skb, &ln, sizeof(ln.ipv6), 158 + BPF_F_CURRENT_NETNS, 0); 159 + } 160 + 161 + /* workaround: We can't do a single socket lookup here, because then 162 + * the compiler will likely spill tuple_len to the stack. This makes it 163 + * lose all bounds information in the verifier, which then rejects the 164 + * call as unsafe. 
165 + */ 166 + if (!sk) 167 + return TC_ACT_SHOT; 168 + 169 + if (sk->state != BPF_TCP_LISTEN) { 170 + bpf_sk_release(sk); 171 + return TC_ACT_SHOT; 172 + } 173 + 174 + assign: 175 + ret = bpf_sk_assign(skb, sk, 0); 176 + bpf_sk_release(sk); 177 + return ret; 178 + } 179 + 180 + SEC("classifier/sk_assign_test") 181 + int bpf_sk_assign_test(struct __sk_buff *skb) 182 + { 183 + struct bpf_sock_tuple *tuple, ln = {0}; 184 + bool ipv4 = false; 185 + bool tcp = false; 186 + int tuple_len; 187 + int ret = 0; 188 + 189 + tuple = get_tuple(skb, &ipv4, &tcp); 190 + if (!tuple) 191 + return TC_ACT_SHOT; 192 + 193 + /* Note that the verifier socket return type for bpf_skc_lookup_tcp() 194 + * differs from bpf_sk_lookup_udp(), so even though the C-level type is 195 + * the same here, if we try to share the implementations they will 196 + * fail to verify because we're crossing pointer types. 197 + */ 198 + if (tcp) 199 + ret = handle_tcp(skb, tuple, ipv4); 200 + else 201 + ret = handle_udp(skb, tuple, ipv4); 202 + 203 + return ret == 0 ? TC_ACT_OK : TC_ACT_SHOT; 204 + }
+53 -16
tools/testing/selftests/bpf/test_progs.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0-only 2 2 /* Copyright (c) 2017 Facebook 3 3 */ 4 + #define _GNU_SOURCE 4 5 #include "test_progs.h" 5 6 #include "cgroup_helpers.h" 6 7 #include "bpf_rlimit.h" 7 8 #include <argp.h> 8 - #include <string.h> 9 + #include <pthread.h> 10 + #include <sched.h> 9 11 #include <signal.h> 12 + #include <string.h> 10 13 #include <execinfo.h> /* backtrace */ 11 14 12 15 /* defined in test_progs.h */ ··· 38 35 */ 39 36 int usleep(useconds_t usec) 40 37 { 41 - struct timespec ts; 38 + struct timespec ts = { 39 + .tv_sec = usec / 1000000, 40 + .tv_nsec = (usec % 1000000) * 1000, 41 + }; 42 42 43 - if (usec > 999999) { 44 - ts.tv_sec = usec / 1000000; 45 - ts.tv_nsec = usec % 1000000; 46 - } else { 47 - ts.tv_sec = 0; 48 - ts.tv_nsec = usec; 49 - } 50 - return nanosleep(&ts, NULL); 43 + return syscall(__NR_nanosleep, &ts, NULL); 51 44 } 52 45 53 46 static bool should_run(struct test_selector *sel, int num, const char *name) ··· 93 94 } 94 95 } 95 96 97 + static void stdio_restore(void); 98 + 99 + /* A bunch of tests set custom affinity per-thread and/or per-process. Reset 100 + * it after each test/sub-test. 
101 + */ 102 + static void reset_affinity() { 103 + 104 + cpu_set_t cpuset; 105 + int i, err; 106 + 107 + CPU_ZERO(&cpuset); 108 + for (i = 0; i < env.nr_cpus; i++) 109 + CPU_SET(i, &cpuset); 110 + 111 + err = sched_setaffinity(0, sizeof(cpuset), &cpuset); 112 + if (err < 0) { 113 + stdio_restore(); 114 + fprintf(stderr, "Failed to reset process affinity: %d!\n", err); 115 + exit(-1); 116 + } 117 + err = pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset); 118 + if (err < 0) { 119 + stdio_restore(); 120 + fprintf(stderr, "Failed to reset thread affinity: %d!\n", err); 121 + exit(-1); 122 + } 123 + } 124 + 96 125 void test__end_subtest() 97 126 { 98 127 struct prog_test_def *test = env.test; ··· 137 110 fprintf(env.stdout, "#%d/%d %s:%s\n", 138 111 test->test_num, test->subtest_num, 139 112 test->subtest_name, sub_error_cnt ? "FAIL" : "OK"); 113 + 114 + reset_affinity(); 140 115 141 116 free(test->subtest_name); 142 117 test->subtest_name = NULL; ··· 457 428 458 429 int parse_num_list(const char *s, struct test_selector *sel) 459 430 { 460 - int i, set_len = 0, num, start = 0, end = -1; 431 + int i, set_len = 0, new_len, num, start = 0, end = -1; 461 432 bool *set = NULL, *tmp, parsing_end = false; 462 433 char *next; 463 434 ··· 492 463 return -EINVAL; 493 464 494 465 if (end + 1 > set_len) { 495 - set_len = end + 1; 496 - tmp = realloc(set, set_len); 466 + new_len = end + 1; 467 + tmp = realloc(set, new_len); 497 468 if (!tmp) { 498 469 free(set); 499 470 return -ENOMEM; 500 471 } 472 + for (i = set_len; i < start; i++) 473 + tmp[i] = false; 501 474 set = tmp; 475 + set_len = new_len; 502 476 } 503 - for (i = start; i <= end; i++) { 477 + for (i = start; i <= end; i++) 504 478 set[i] = true; 505 - } 506 - 507 479 } 508 480 509 481 if (!set) ··· 712 682 srand(time(NULL)); 713 683 714 684 env.jit_enabled = is_jit_enabled(); 685 + env.nr_cpus = libbpf_num_possible_cpus(); 686 + if (env.nr_cpus < 0) { 687 + fprintf(stderr, "Failed to get number of CPUs: 
%d!\n", 688 + env.nr_cpus); 689 + return -1; 690 + } 715 691 716 692 stdio_hijack(); 717 693 for (i = 0; i < prog_test_cnt; i++) { ··· 748 712 test->test_num, test->test_name, 749 713 test->error_cnt ? "FAIL" : "OK"); 750 714 715 + reset_affinity(); 751 716 if (test->need_cgroup_cleanup) 752 717 cleanup_cgroup_environment(); 753 718 }
+1
tools/testing/selftests/bpf/test_progs.h
··· 71 71 FILE *stderr; 72 72 char *log_buf; 73 73 size_t log_cnt; 74 + int nr_cpus; 74 75 75 76 int succ_cnt; /* successful tests */ 76 77 int sub_succ_cnt; /* successful sub-tests */
+23
tools/testing/selftests/bpf/trace_helpers.c
··· 4 4 #include <string.h> 5 5 #include <assert.h> 6 6 #include <errno.h> 7 + #include <fcntl.h> 7 8 #include <poll.h> 8 9 #include <unistd.h> 9 10 #include <linux/perf_event.h> 10 11 #include <sys/mman.h> 11 12 #include "trace_helpers.h" 13 + 14 + #define DEBUGFS "/sys/kernel/debug/tracing/" 12 15 13 16 #define MAX_SYMS 300000 14 17 static struct ksym syms[MAX_SYMS]; ··· 88 85 } 89 86 90 87 return 0; 88 + } 89 + 90 + void read_trace_pipe(void) 91 + { 92 + int trace_fd; 93 + 94 + trace_fd = open(DEBUGFS "trace_pipe", O_RDONLY, 0); 95 + if (trace_fd < 0) 96 + return; 97 + 98 + while (1) { 99 + static char buf[4096]; 100 + ssize_t sz; 101 + 102 + sz = read(trace_fd, buf, sizeof(buf) - 1); 103 + if (sz > 0) { 104 + buf[sz] = 0; 105 + puts(buf); 106 + } 107 + } 91 108 }
+1
tools/testing/selftests/bpf/trace_helpers.h
··· 12 12 int load_kallsyms(void); 13 13 struct ksym *ksym_search(long key); 14 14 long ksym_get_addr(const char *name); 15 + void read_trace_pipe(void); 15 16 16 17 #endif
+45 -12
tools/testing/selftests/bpf/verifier/bounds.c
··· 257 257 * [0x00ff'ffff'ff00'0000, 0x00ff'ffff'ffff'ffff] 258 258 */ 259 259 BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8), 260 - /* no-op or OOB pointer computation */ 260 + /* error on OOB pointer computation */ 261 261 BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), 262 - /* potentially OOB access */ 263 - BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), 264 262 /* exit */ 265 263 BPF_MOV64_IMM(BPF_REG_0, 0), 266 264 BPF_EXIT_INSN(), 267 265 }, 268 266 .fixup_map_hash_8b = { 3 }, 269 267 /* not actually fully unbounded, but the bound is very high */ 270 - .errstr = "R0 unbounded memory access", 268 + .errstr = "value 72057594021150720 makes map_value pointer be out of bounds", 271 269 .result = REJECT 272 270 }, 273 271 { ··· 297 299 * [0x00ff'ffff'ff00'0000, 0x00ff'ffff'ffff'ffff] 298 300 */ 299 301 BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8), 300 - /* no-op or OOB pointer computation */ 302 + /* error on OOB pointer computation */ 301 303 BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), 302 - /* potentially OOB access */ 303 - BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), 304 304 /* exit */ 305 305 BPF_MOV64_IMM(BPF_REG_0, 0), 306 306 BPF_EXIT_INSN(), 307 307 }, 308 308 .fixup_map_hash_8b = { 3 }, 309 309 /* not actually fully unbounded, but the bound is very high */ 310 - .errstr = "R0 unbounded memory access", 310 + .errstr = "value 72057594021150720 makes map_value pointer be out of bounds", 311 311 .result = REJECT 312 312 }, 313 313 { ··· 407 411 BPF_ALU32_IMM(BPF_RSH, BPF_REG_1, 31), 408 412 /* r1 = 0xffff'fffe (NOT 0!) 
*/ 409 413 BPF_ALU32_IMM(BPF_SUB, BPF_REG_1, 2), 410 - /* computes OOB pointer */ 414 + /* error on computing OOB pointer */ 411 415 BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), 412 - /* OOB access */ 413 - BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), 414 416 /* exit */ 415 417 BPF_MOV64_IMM(BPF_REG_0, 0), 416 418 BPF_EXIT_INSN(), 417 419 }, 418 420 .fixup_map_hash_8b = { 3 }, 419 - .errstr = "R0 invalid mem access", 421 + .errstr = "math between map_value pointer and 4294967294 is not allowed", 420 422 .result = REJECT, 421 423 }, 422 424 { ··· 499 505 .fixup_map_hash_8b = { 3 }, 500 506 .errstr = "map_value pointer and 1000000000000", 501 507 .result = REJECT 508 + }, 509 + { 510 + "bounds check mixed 32bit and 64bit arithmatic. test1", 511 + .insns = { 512 + BPF_MOV64_IMM(BPF_REG_0, 0), 513 + BPF_MOV64_IMM(BPF_REG_1, -1), 514 + BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32), 515 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 1), 516 + /* r1 = 0xffffFFFF00000001 */ 517 + BPF_JMP32_IMM(BPF_JGT, BPF_REG_1, 1, 3), 518 + /* check ALU64 op keeps 32bit bounds */ 519 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 1), 520 + BPF_JMP32_IMM(BPF_JGT, BPF_REG_1, 2, 1), 521 + BPF_JMP_A(1), 522 + /* invalid ldx if bounds are lost above */ 523 + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, -1), 524 + BPF_EXIT_INSN(), 525 + }, 526 + .result = ACCEPT 527 + }, 528 + { 529 + "bounds check mixed 32bit and 64bit arithmatic. 
test2", 530 + .insns = { 531 + BPF_MOV64_IMM(BPF_REG_0, 0), 532 + BPF_MOV64_IMM(BPF_REG_1, -1), 533 + BPF_ALU64_IMM(BPF_LSH, BPF_REG_1, 32), 534 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 1), 535 + /* r1 = 0xffffFFFF00000001 */ 536 + BPF_MOV64_IMM(BPF_REG_2, 3), 537 + /* r1 = 0x2 */ 538 + BPF_ALU32_IMM(BPF_ADD, BPF_REG_1, 1), 539 + /* check ALU32 op zero extends 64bit bounds */ 540 + BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_2, 1), 541 + BPF_JMP_A(1), 542 + /* invalid ldx if bounds are lost above */ 543 + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, -1), 544 + BPF_EXIT_INSN(), 545 + }, 546 + .result = ACCEPT 502 547 },
+4 -4
tools/testing/selftests/bpf/verifier/bpf_get_stack.c
··· 9 9 BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), 10 10 BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 28), 11 11 BPF_MOV64_REG(BPF_REG_7, BPF_REG_0), 12 - BPF_MOV64_IMM(BPF_REG_9, sizeof(struct test_val)), 12 + BPF_MOV64_IMM(BPF_REG_9, sizeof(struct test_val)/2), 13 13 BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), 14 14 BPF_MOV64_REG(BPF_REG_2, BPF_REG_7), 15 - BPF_MOV64_IMM(BPF_REG_3, sizeof(struct test_val)), 15 + BPF_MOV64_IMM(BPF_REG_3, sizeof(struct test_val)/2), 16 16 BPF_MOV64_IMM(BPF_REG_4, 256), 17 17 BPF_EMIT_CALL(BPF_FUNC_get_stack), 18 18 BPF_MOV64_IMM(BPF_REG_1, 0), 19 19 BPF_MOV64_REG(BPF_REG_8, BPF_REG_0), 20 20 BPF_ALU64_IMM(BPF_LSH, BPF_REG_8, 32), 21 21 BPF_ALU64_IMM(BPF_ARSH, BPF_REG_8, 32), 22 - BPF_JMP_REG(BPF_JSLT, BPF_REG_1, BPF_REG_8, 16), 22 + BPF_JMP_REG(BPF_JSGT, BPF_REG_1, BPF_REG_8, 16), 23 23 BPF_ALU64_REG(BPF_SUB, BPF_REG_9, BPF_REG_8), 24 24 BPF_MOV64_REG(BPF_REG_2, BPF_REG_7), 25 25 BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_8), ··· 29 29 BPF_MOV64_REG(BPF_REG_3, BPF_REG_2), 30 30 BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_1), 31 31 BPF_MOV64_REG(BPF_REG_1, BPF_REG_7), 32 - BPF_MOV64_IMM(BPF_REG_5, sizeof(struct test_val)), 32 + BPF_MOV64_IMM(BPF_REG_5, sizeof(struct test_val)/2), 33 33 BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_5), 34 34 BPF_JMP_REG(BPF_JGE, BPF_REG_3, BPF_REG_1, 4), 35 35 BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+105
tools/testing/selftests/bpf/verifier/ctx.c
··· 91 91 .result = REJECT, 92 92 .errstr = "variable ctx access var_off=(0x0; 0x4)", 93 93 }, 94 + { 95 + "pass ctx or null check, 1: ctx", 96 + .insns = { 97 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 98 + BPF_FUNC_get_netns_cookie), 99 + BPF_MOV64_IMM(BPF_REG_0, 0), 100 + BPF_EXIT_INSN(), 101 + }, 102 + .prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR, 103 + .expected_attach_type = BPF_CGROUP_UDP6_SENDMSG, 104 + .result = ACCEPT, 105 + }, 106 + { 107 + "pass ctx or null check, 2: null", 108 + .insns = { 109 + BPF_MOV64_IMM(BPF_REG_1, 0), 110 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 111 + BPF_FUNC_get_netns_cookie), 112 + BPF_MOV64_IMM(BPF_REG_0, 0), 113 + BPF_EXIT_INSN(), 114 + }, 115 + .prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR, 116 + .expected_attach_type = BPF_CGROUP_UDP6_SENDMSG, 117 + .result = ACCEPT, 118 + }, 119 + { 120 + "pass ctx or null check, 3: 1", 121 + .insns = { 122 + BPF_MOV64_IMM(BPF_REG_1, 1), 123 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 124 + BPF_FUNC_get_netns_cookie), 125 + BPF_MOV64_IMM(BPF_REG_0, 0), 126 + BPF_EXIT_INSN(), 127 + }, 128 + .prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR, 129 + .expected_attach_type = BPF_CGROUP_UDP6_SENDMSG, 130 + .result = REJECT, 131 + .errstr = "R1 type=inv expected=ctx", 132 + }, 133 + { 134 + "pass ctx or null check, 4: ctx - const", 135 + .insns = { 136 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -612), 137 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 138 + BPF_FUNC_get_netns_cookie), 139 + BPF_MOV64_IMM(BPF_REG_0, 0), 140 + BPF_EXIT_INSN(), 141 + }, 142 + .prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR, 143 + .expected_attach_type = BPF_CGROUP_UDP6_SENDMSG, 144 + .result = REJECT, 145 + .errstr = "dereference of modified ctx ptr", 146 + }, 147 + { 148 + "pass ctx or null check, 5: null (connect)", 149 + .insns = { 150 + BPF_MOV64_IMM(BPF_REG_1, 0), 151 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 152 + BPF_FUNC_get_netns_cookie), 153 + BPF_MOV64_IMM(BPF_REG_0, 0), 154 + BPF_EXIT_INSN(), 155 + }, 156 + .prog_type = 
BPF_PROG_TYPE_CGROUP_SOCK_ADDR, 157 + .expected_attach_type = BPF_CGROUP_INET4_CONNECT, 158 + .result = ACCEPT, 159 + }, 160 + { 161 + "pass ctx or null check, 6: null (bind)", 162 + .insns = { 163 + BPF_MOV64_IMM(BPF_REG_1, 0), 164 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 165 + BPF_FUNC_get_netns_cookie), 166 + BPF_MOV64_IMM(BPF_REG_0, 0), 167 + BPF_EXIT_INSN(), 168 + }, 169 + .prog_type = BPF_PROG_TYPE_CGROUP_SOCK, 170 + .expected_attach_type = BPF_CGROUP_INET4_POST_BIND, 171 + .result = ACCEPT, 172 + }, 173 + { 174 + "pass ctx or null check, 7: ctx (bind)", 175 + .insns = { 176 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 177 + BPF_FUNC_get_socket_cookie), 178 + BPF_MOV64_IMM(BPF_REG_0, 0), 179 + BPF_EXIT_INSN(), 180 + }, 181 + .prog_type = BPF_PROG_TYPE_CGROUP_SOCK, 182 + .expected_attach_type = BPF_CGROUP_INET4_POST_BIND, 183 + .result = ACCEPT, 184 + }, 185 + { 186 + "pass ctx or null check, 8: null (bind)", 187 + .insns = { 188 + BPF_MOV64_IMM(BPF_REG_1, 0), 189 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 190 + BPF_FUNC_get_socket_cookie), 191 + BPF_MOV64_IMM(BPF_REG_0, 0), 192 + BPF_EXIT_INSN(), 193 + }, 194 + .prog_type = BPF_PROG_TYPE_CGROUP_SOCK, 195 + .expected_attach_type = BPF_CGROUP_INET4_POST_BIND, 196 + .result = REJECT, 197 + .errstr = "R1 type=inv expected=ctx", 198 + },