Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Andrii Nakryiko says:

====================
bpf-next 2021-12-10 v2

We've added 115 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5747 insertions(+), 2564 deletions(-).

The main changes are:

1) Various samples fixes, from Alexander Lobakin.

2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.

3) A batch of new unified APIs for libbpf, logging improvements, version
querying, etc. Also a batch of deprecations of old APIs and various bug
fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.

4) BPF documentation reorganization and improvements, from Christoph Hellwig
and Dave Tucker.

5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
libbpf, from Hengqi Chen.

6) Verifier log fixes, from Hou Tao.

7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.

8) Extend branch record capturing to all platforms that support it,
from Kajol Jain.

9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.

10) bpftool doc-generating script improvements, from Quentin Monnet.

11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.

12) Deprecation warning fix for perf/bpf_counter, from Song Liu.

13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
from Tiezhu Yang.

14) BTF_KIND_TYPE_TAG follow-up fixes, from Yonghong Song.

15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
and others.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits)
libbpf: Add "bool skipped" to struct bpf_map
libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition
bpftool: Switch bpf_object__load_xattr() to bpf_object__load()
selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
selftests/bpf: Add test for libbpf's custom log_buf behavior
selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
libbpf: Deprecate bpf_object__load_xattr()
libbpf: Add per-program log buffer setter and getter
libbpf: Preserve kernel error code and remove kprobe prog type guessing
libbpf: Improve logging around BPF program loading
libbpf: Allow passing user log setting through bpf_object_open_opts
libbpf: Allow passing preallocated log_buf when loading BTF into kernel
libbpf: Add OPTS-based bpf_btf_load() API
libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
samples/bpf: Remove unneeded variable
bpf: Remove redundant assignment to pointer t
selftests/bpf: Fix a compilation warning
perf/bpf_counter: Use bpf_map_create instead of bpf_create_map
samples: bpf: Fix 'unknown warning group' build warning on Clang
samples: bpf: Fix xdp_sample_user.o linking with Clang
...
====================

Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+5750 -2565
Documentation/bpf/bpf_lsm.rst Documentation/bpf/prog_lsm.rst
+22 -22
Documentation/bpf/btf.rst
··· 3 3 ===================== 4 4 5 5 1. Introduction 6 - *************** 6 + =============== 7 7 8 8 BTF (BPF Type Format) is the metadata format which encodes the debug info 9 9 related to BPF program/map. The name BTF was used initially to describe data ··· 30 30 .. _BTF_Type_String: 31 31 32 32 2. BTF Type and String Encoding 33 - ******************************* 33 + =============================== 34 34 35 35 The file ``include/uapi/linux/btf.h`` provides high-level definition of how 36 36 types/strings are encoded. ··· 57 57 generated. 58 58 59 59 2.1 String Encoding 60 - =================== 60 + ------------------- 61 61 62 62 The first string in the string section must be a null string. The rest of 63 63 string table is a concatenation of other null-terminated strings. 64 64 65 65 2.2 Type Encoding 66 - ================= 66 + ----------------- 67 67 68 68 The type id ``0`` is reserved for ``void`` type. The type section is parsed 69 69 sequentially and type id is assigned to each recognized type starting from id ··· 504 504 * ``type``: the type with ``btf_type_tag`` attribute 505 505 506 506 3. BTF Kernel API 507 - ***************** 507 + ================= 508 508 509 509 The following bpf syscall command involves BTF: 510 510 * BPF_BTF_LOAD: load a blob of BTF data into kernel ··· 547 547 548 548 549 549 3.1 BPF_BTF_LOAD 550 - ================ 550 + ---------------- 551 551 552 552 Load a blob of BTF data into kernel. A blob of data, described in 553 553 :ref:`BTF_Type_String`, can be directly loaded into the kernel. A ``btf_fd`` 554 554 is returned to a userspace. 555 555 556 556 3.2 BPF_MAP_CREATE 557 - ================== 557 + ------------------ 558 558 559 559 A map can be created with ``btf_fd`` and specified key/value type id.:: 560 560 ··· 581 581 .. 
_BPF_Prog_Load: 582 582 583 583 3.3 BPF_PROG_LOAD 584 - ================= 584 + ----------------- 585 585 586 586 During prog_load, func_info and line_info can be passed to kernel with proper 587 587 values for the following attributes: ··· 631 631 #define BPF_LINE_INFO_LINE_COL(line_col) ((line_col) & 0x3ff) 632 632 633 633 3.4 BPF_{PROG,MAP}_GET_NEXT_ID 634 - ============================== 634 + ------------------------------ 635 635 636 636 In kernel, every loaded program, map or btf has a unique id. The id won't 637 637 change during the lifetime of a program, map, or btf. ··· 641 641 inspection tool can inspect all programs and maps. 642 642 643 643 3.5 BPF_{PROG,MAP}_GET_FD_BY_ID 644 - =============================== 644 + ------------------------------- 645 645 646 646 An introspection tool cannot use id to get details about program or maps. 647 647 A file descriptor needs to be obtained first for reference-counting purpose. 648 648 649 649 3.6 BPF_OBJ_GET_INFO_BY_FD 650 - ========================== 650 + -------------------------- 651 651 652 652 Once a program/map fd is acquired, an introspection tool can get the detailed 653 653 information from kernel about this fd, some of which are BTF-related. For ··· 656 656 bpf byte codes, and jited_line_info. 657 657 658 658 3.7 BPF_BTF_GET_FD_BY_ID 659 - ======================== 659 + ------------------------ 660 660 661 661 With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``, bpf 662 662 syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. Then, with ··· 668 668 func signatures and line info, along with byte/jit codes. 669 669 670 670 4. ELF File Format Interface 671 - **************************** 671 + ============================ 672 672 673 673 4.1 .BTF section 674 - ================ 674 + ---------------- 675 675 676 676 The .BTF section contains type and string data. The format of this section is 677 677 same as the one describe in :ref:`BTF_Type_String`. ··· 679 679 .. 
_BTF_Ext_Section: 680 680 681 681 4.2 .BTF.ext section 682 - ==================== 682 + -------------------- 683 683 684 684 The .BTF.ext section encodes func_info and line_info which needs loader 685 685 manipulation before loading into the kernel. ··· 743 743 beginning of section (``btf_ext_info_sec->sec_name_off``). 744 744 745 745 4.2 .BTF_ids section 746 - ==================== 746 + -------------------- 747 747 748 748 The .BTF_ids section encodes BTF ID values that are used within the kernel. 749 749 ··· 804 804 resolved during the linking phase of kernel build by ``resolve_btfids`` tool. 805 805 806 806 5. Using BTF 807 - ************ 807 + ============ 808 808 809 809 5.1 bpftool map pretty print 810 - ============================ 810 + ---------------------------- 811 811 812 812 With BTF, the map key/value can be printed based on fields rather than simply 813 813 raw bytes. This is especially valuable for large structure or if your data ··· 849 849 ] 850 850 851 851 5.2 bpftool prog dump 852 - ===================== 852 + --------------------- 853 853 854 854 The following is an example showing how func_info and line_info can help prog 855 855 dump with better kernel symbol names, function prototypes and line ··· 883 883 [...] 884 884 885 885 5.3 Verifier Log 886 - ================ 886 + ---------------- 887 887 888 888 The following is an example of how line_info can help debugging verification 889 889 failure.:: ··· 909 909 R2 offset is outside of the packet 910 910 911 911 6. BTF Generation 912 - ***************** 912 + ================= 913 913 914 914 You need latest pahole 915 915 ··· 1016 1016 .long 8206 # Line 8 Col 14 1017 1017 1018 1018 7. Testing 1019 - ********** 1019 + ========== 1020 1020 1021 1021 Kernel bpf selftest `test_btf.c` provides extensive set of BTF-related tests.
+11
Documentation/bpf/faq.rst
··· 1 + ================================ 2 + Frequently asked questions (FAQ) 3 + ================================ 4 + 5 + Two sets of Questions and Answers (Q&A) are maintained. 6 + 7 + .. toctree:: 8 + :maxdepth: 1 9 + 10 + bpf_design_QA 11 + bpf_devel_QA
+7
Documentation/bpf/helpers.rst
··· 1 + Helper functions 2 + ================ 3 + 4 + * `bpf-helpers(7)`_ maintains a list of helpers available to eBPF programs. 5 + 6 + .. Links 7 + .. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
+15 -87
Documentation/bpf/index.rst
··· 5 5 This directory contains documentation for the BPF (Berkeley Packet 6 6 Filter) facility, with a focus on the extended BPF version (eBPF). 7 7 8 - This kernel side documentation is still work in progress. The main 9 - textual documentation is (for historical reasons) described in 10 - :ref:`networking-filter`, which describe both classical and extended 11 - BPF instruction-set. 8 + This kernel side documentation is still work in progress. 12 9 The Cilium project also maintains a `BPF and XDP Reference Guide`_ 13 10 that goes into great technical depth about the BPF Architecture. 14 11 15 - libbpf 16 - ====== 17 - 18 - Documentation/bpf/libbpf/index.rst is a userspace library for loading and interacting with bpf programs. 19 - 20 - BPF Type Format (BTF) 21 - ===================== 22 - 23 12 .. toctree:: 24 13 :maxdepth: 1 25 14 15 + instruction-set 16 + verifier 17 + libbpf/index 26 18 btf 27 - 28 - 29 - Frequently asked questions (FAQ) 30 - ================================ 31 - 32 - Two sets of Questions and Answers (Q&A) are maintained. 33 - 34 - .. toctree:: 35 - :maxdepth: 1 36 - 37 - bpf_design_QA 38 - bpf_devel_QA 39 - 40 - Syscall API 41 - =========== 42 - 43 - The primary info for the bpf syscall is available in the `man-pages`_ 44 - for `bpf(2)`_. For more information about the userspace API, see 45 - Documentation/userspace-api/ebpf/index.rst. 46 - 47 - Helper functions 48 - ================ 49 - 50 - * `bpf-helpers(7)`_ maintains a list of helpers available to eBPF programs. 51 - 52 - 53 - Program types 54 - ============= 55 - 56 - .. toctree:: 57 - :maxdepth: 1 58 - 59 - prog_cgroup_sockopt 60 - prog_cgroup_sysctl 61 - prog_flow_dissector 62 - bpf_lsm 63 - prog_sk_lookup 64 - 65 - 66 - Map types 67 - ========= 68 - 69 - .. toctree:: 70 - :maxdepth: 1 71 - 72 - map_cgroup_storage 73 - 74 - 75 - Testing and debugging BPF 76 - ========================= 77 - 78 - .. 
toctree:: 79 - :maxdepth: 1 80 - 81 - drgn 82 - s390 83 - 84 - 85 - Licensing 86 - ========= 87 - 88 - .. toctree:: 89 - :maxdepth: 1 90 - 19 + faq 20 + syscall_api 21 + helpers 22 + programs 23 + maps 91 24 bpf_licensing 25 + test_debug 26 + other 92 27 28 + .. only:: subproject and html 93 29 94 - Other 95 - ===== 30 + Indices 31 + ======= 96 32 97 - .. toctree:: 98 - :maxdepth: 1 99 - 100 - ringbuf 101 - llvm_reloc 33 + * :ref:`genindex` 102 34 103 35 .. Links: 104 - .. _networking-filter: ../networking/filter.rst 105 - .. _man-pages: https://www.kernel.org/doc/man-pages/ 106 - .. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html 107 - .. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html 108 36 .. _BPF and XDP Reference Guide: https://docs.cilium.io/en/latest/bpf/
+467
Documentation/bpf/instruction-set.rst
··· 1 + 2 + ==================== 3 + eBPF Instruction Set 4 + ==================== 5 + 6 + eBPF is designed to be JITed with one to one mapping, which can also open up 7 + the possibility for GCC/LLVM compilers to generate optimized eBPF code through 8 + an eBPF backend that performs almost as fast as natively compiled code. 9 + 10 + Some core changes of the eBPF format from classic BPF: 11 + 12 + - Number of registers increase from 2 to 10: 13 + 14 + The old format had two registers A and X, and a hidden frame pointer. The 15 + new layout extends this to be 10 internal registers and a read-only frame 16 + pointer. Since 64-bit CPUs are passing arguments to functions via registers 17 + the number of args from eBPF program to in-kernel function is restricted 18 + to 5 and one register is used to accept return value from an in-kernel 19 + function. Natively, x86_64 passes first 6 arguments in registers, aarch64/ 20 + sparcv9/mips64 have 7 - 8 registers for arguments; x86_64 has 6 callee saved 21 + registers, and aarch64/sparcv9/mips64 have 11 or more callee saved registers. 22 + 23 + Therefore, eBPF calling convention is defined as: 24 + 25 + * R0 - return value from in-kernel function, and exit value for eBPF program 26 + * R1 - R5 - arguments from eBPF program to in-kernel function 27 + * R6 - R9 - callee saved registers that in-kernel function will preserve 28 + * R10 - read-only frame pointer to access stack 29 + 30 + Thus, all eBPF registers map one to one to HW registers on x86_64, aarch64, 31 + etc, and eBPF calling convention maps directly to ABIs used by the kernel on 32 + 64-bit architectures. 33 + 34 + On 32-bit architectures JIT may map programs that use only 32-bit arithmetic 35 + and may let more complex programs to be interpreted. 36 + 37 + R0 - R5 are scratch registers and eBPF program needs spill/fill them if 38 + necessary across calls. 
Note that there is only one eBPF program (== one 39 + eBPF main routine) and it cannot call other eBPF functions, it can only 40 + call predefined in-kernel functions, though. 41 + 42 + - Register width increases from 32-bit to 64-bit: 43 + 44 + Still, the semantics of the original 32-bit ALU operations are preserved 45 + via 32-bit subregisters. All eBPF registers are 64-bit with 32-bit lower 46 + subregisters that zero-extend into 64-bit if they are being written to. 47 + That behavior maps directly to x86_64 and arm64 subregister definition, but 48 + makes other JITs more difficult. 49 + 50 + 32-bit architectures run 64-bit eBPF programs via interpreter. 51 + Their JITs may convert BPF programs that only use 32-bit subregisters into 52 + native instruction set and let the rest being interpreted. 53 + 54 + Operation is 64-bit, because on 64-bit architectures, pointers are also 55 + 64-bit wide, and we want to pass 64-bit values in/out of kernel functions, 56 + so 32-bit eBPF registers would otherwise require to define register-pair 57 + ABI, thus, there won't be able to use a direct eBPF register to HW register 58 + mapping and JIT would need to do combine/split/move operations for every 59 + register in and out of the function, which is complex, bug prone and slow. 60 + Another reason is the use of atomic 64-bit counters. 61 + 62 + - Conditional jt/jf targets replaced with jt/fall-through: 63 + 64 + While the original design has constructs such as ``if (cond) jump_true; 65 + else jump_false;``, they are being replaced into alternative constructs like 66 + ``if (cond) jump_true; /* else fall-through */``. 
67 + 68 + - Introduces bpf_call insn and register passing convention for zero overhead 69 + calls from/to other kernel functions: 70 + 71 + Before an in-kernel function call, the eBPF program needs to 72 + place function arguments into R1 to R5 registers to satisfy calling 73 + convention, then the interpreter will take them from registers and pass 74 + to in-kernel function. If R1 - R5 registers are mapped to CPU registers 75 + that are used for argument passing on given architecture, the JIT compiler 76 + doesn't need to emit extra moves. Function arguments will be in the correct 77 + registers and BPF_CALL instruction will be JITed as single 'call' HW 78 + instruction. This calling convention was picked to cover common call 79 + situations without performance penalty. 80 + 81 + After an in-kernel function call, R1 - R5 are reset to unreadable and R0 has 82 + a return value of the function. Since R6 - R9 are callee saved, their state 83 + is preserved across the call. 84 + 85 + For example, consider three C functions:: 86 + 87 + u64 f1() { return (*_f2)(1); } 88 + u64 f2(u64 a) { return f3(a + 1, a); } 89 + u64 f3(u64 a, u64 b) { return a - b; } 90 + 91 + GCC can compile f1, f3 into x86_64:: 92 + 93 + f1: 94 + movl $1, %edi 95 + movq _f2(%rip), %rax 96 + jmp *%rax 97 + f3: 98 + movq %rdi, %rax 99 + subq %rsi, %rax 100 + ret 101 + 102 + Function f2 in eBPF may look like:: 103 + 104 + f2: 105 + bpf_mov R2, R1 106 + bpf_add R1, 1 107 + bpf_call f3 108 + bpf_exit 109 + 110 + If f2 is JITed and the pointer stored to ``_f2``. The calls f1 -> f2 -> f3 and 111 + returns will be seamless. Without JIT, __bpf_prog_run() interpreter needs to 112 + be used to call into f2. 113 + 114 + For practical reasons all eBPF programs have only one argument 'ctx' which is 115 + already placed into R1 (e.g. on __bpf_prog_run() startup) and the programs 116 + can call kernel functions with up to 5 arguments. 
Calls with 6 or more arguments 117 + are currently not supported, but these restrictions can be lifted if necessary 118 + in the future. 119 + 120 + On 64-bit architectures all register map to HW registers one to one. For 121 + example, x86_64 JIT compiler can map them as ... 122 + 123 + :: 124 + 125 + R0 - rax 126 + R1 - rdi 127 + R2 - rsi 128 + R3 - rdx 129 + R4 - rcx 130 + R5 - r8 131 + R6 - rbx 132 + R7 - r13 133 + R8 - r14 134 + R9 - r15 135 + R10 - rbp 136 + 137 + ... since x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing 138 + and rbx, r12 - r15 are callee saved. 139 + 140 + Then the following eBPF pseudo-program:: 141 + 142 + bpf_mov R6, R1 /* save ctx */ 143 + bpf_mov R2, 2 144 + bpf_mov R3, 3 145 + bpf_mov R4, 4 146 + bpf_mov R5, 5 147 + bpf_call foo 148 + bpf_mov R7, R0 /* save foo() return value */ 149 + bpf_mov R1, R6 /* restore ctx for next call */ 150 + bpf_mov R2, 6 151 + bpf_mov R3, 7 152 + bpf_mov R4, 8 153 + bpf_mov R5, 9 154 + bpf_call bar 155 + bpf_add R0, R7 156 + bpf_exit 157 + 158 + After JIT to x86_64 may look like:: 159 + 160 + push %rbp 161 + mov %rsp,%rbp 162 + sub $0x228,%rsp 163 + mov %rbx,-0x228(%rbp) 164 + mov %r13,-0x220(%rbp) 165 + mov %rdi,%rbx 166 + mov $0x2,%esi 167 + mov $0x3,%edx 168 + mov $0x4,%ecx 169 + mov $0x5,%r8d 170 + callq foo 171 + mov %rax,%r13 172 + mov %rbx,%rdi 173 + mov $0x6,%esi 174 + mov $0x7,%edx 175 + mov $0x8,%ecx 176 + mov $0x9,%r8d 177 + callq bar 178 + add %r13,%rax 179 + mov -0x228(%rbp),%rbx 180 + mov -0x220(%rbp),%r13 181 + leaveq 182 + retq 183 + 184 + Which is in this example equivalent in C to:: 185 + 186 + u64 bpf_filter(u64 ctx) 187 + { 188 + return foo(ctx, 2, 3, 4, 5) + bar(ctx, 6, 7, 8, 9); 189 + } 190 + 191 + In-kernel functions foo() and bar() with prototype: u64 (*)(u64 arg1, u64 192 + arg2, u64 arg3, u64 arg4, u64 arg5); will receive arguments in proper 193 + registers and place their return value into ``%rax`` which is R0 in eBPF. 
194 + Prologue and epilogue are emitted by JIT and are implicit in the 195 + interpreter. R0-R5 are scratch registers, so eBPF program needs to preserve 196 + them across the calls as defined by calling convention. 197 + 198 + For example the following program is invalid:: 199 + 200 + bpf_mov R1, 1 201 + bpf_call foo 202 + bpf_mov R0, R1 203 + bpf_exit 204 + 205 + After the call the registers R1-R5 contain junk values and cannot be read. 206 + An in-kernel `eBPF verifier`_ is used to validate eBPF programs. 207 + 208 + Also in the new design, eBPF is limited to 4096 insns, which means that any 209 + program will terminate quickly and will only call a fixed number of kernel 210 + functions. Original BPF and eBPF are two operand instructions, 211 + which helps to do one-to-one mapping between eBPF insn and x86 insn during JIT. 212 + 213 + The input context pointer for invoking the interpreter function is generic, 214 + its content is defined by a specific use case. For seccomp register R1 points 215 + to seccomp_data, for converted BPF filters R1 points to a skb. 216 + 217 + A program, that is translated internally consists of the following elements:: 218 + 219 + op:16, jt:8, jf:8, k:32 ==> op:8, dst_reg:4, src_reg:4, off:16, imm:32 220 + 221 + So far 87 eBPF instructions were implemented. 8-bit 'op' opcode field 222 + has room for new instructions. Some of them may use 16/24/32 byte encoding. New 223 + instructions must be multiple of 8 bytes to preserve backward compatibility. 224 + 225 + eBPF is a general purpose RISC instruction set. Not every register and 226 + every instruction are used during translation from original BPF to eBPF. 227 + For example, socket filters are not using ``exclusive add`` instruction, but 228 + tracing filters may do to maintain counters of events, for example. Register R9 229 + is not used by socket filters either, but more complex filters may be running 230 + out of registers and would have to resort to spill/fill to stack. 
231 + 232 + eBPF can be used as a generic assembler for last step performance 233 + optimizations, socket filters and seccomp are using it as assembler. Tracing 234 + filters may use it as assembler to generate code from kernel. In kernel usage 235 + may not be bounded by security considerations, since generated eBPF code 236 + may be optimizing internal code path and not being exposed to the user space. 237 + Safety of eBPF can come from the `eBPF verifier`_. In such use cases as 238 + described, it may be used as safe instruction set. 239 + 240 + Just like the original BPF, eBPF runs within a controlled environment, 241 + is deterministic and the kernel can easily prove that. The safety of the program 242 + can be determined in two steps: first step does depth-first-search to disallow 243 + loops and other CFG validation; second step starts from the first insn and 244 + descends all possible paths. It simulates execution of every insn and observes 245 + the state change of registers and stack. 246 + 247 + eBPF opcode encoding 248 + ==================== 249 + 250 + eBPF is reusing most of the opcode encoding from classic to simplify conversion 251 + of classic BPF to eBPF. 
For arithmetic and jump instructions the 8-bit 'code' 252 + field is divided into three parts:: 253 + 254 + +----------------+--------+--------------------+ 255 + | 4 bits | 1 bit | 3 bits | 256 + | operation code | source | instruction class | 257 + +----------------+--------+--------------------+ 258 + (MSB) (LSB) 259 + 260 + Three LSB bits store instruction class which is one of: 261 + 262 + =================== =============== 263 + Classic BPF classes eBPF classes 264 + =================== =============== 265 + BPF_LD 0x00 BPF_LD 0x00 266 + BPF_LDX 0x01 BPF_LDX 0x01 267 + BPF_ST 0x02 BPF_ST 0x02 268 + BPF_STX 0x03 BPF_STX 0x03 269 + BPF_ALU 0x04 BPF_ALU 0x04 270 + BPF_JMP 0x05 BPF_JMP 0x05 271 + BPF_RET 0x06 BPF_JMP32 0x06 272 + BPF_MISC 0x07 BPF_ALU64 0x07 273 + =================== =============== 274 + 275 + When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ... 276 + 277 + :: 278 + 279 + BPF_K 0x00 280 + BPF_X 0x08 281 + 282 + * in classic BPF, this means:: 283 + 284 + BPF_SRC(code) == BPF_X - use register X as source operand 285 + BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand 286 + 287 + * in eBPF, this means:: 288 + 289 + BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand 290 + BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand 291 + 292 + ... and four MSB bits store operation code. 
293 + 294 + If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:: 295 + 296 + BPF_ADD 0x00 297 + BPF_SUB 0x10 298 + BPF_MUL 0x20 299 + BPF_DIV 0x30 300 + BPF_OR 0x40 301 + BPF_AND 0x50 302 + BPF_LSH 0x60 303 + BPF_RSH 0x70 304 + BPF_NEG 0x80 305 + BPF_MOD 0x90 306 + BPF_XOR 0xa0 307 + BPF_MOV 0xb0 /* eBPF only: mov reg to reg */ 308 + BPF_ARSH 0xc0 /* eBPF only: sign extending shift right */ 309 + BPF_END 0xd0 /* eBPF only: endianness conversion */ 310 + 311 + If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of:: 312 + 313 + BPF_JA 0x00 /* BPF_JMP only */ 314 + BPF_JEQ 0x10 315 + BPF_JGT 0x20 316 + BPF_JGE 0x30 317 + BPF_JSET 0x40 318 + BPF_JNE 0x50 /* eBPF only: jump != */ 319 + BPF_JSGT 0x60 /* eBPF only: signed '>' */ 320 + BPF_JSGE 0x70 /* eBPF only: signed '>=' */ 321 + BPF_CALL 0x80 /* eBPF BPF_JMP only: function call */ 322 + BPF_EXIT 0x90 /* eBPF BPF_JMP only: function return */ 323 + BPF_JLT 0xa0 /* eBPF only: unsigned '<' */ 324 + BPF_JLE 0xb0 /* eBPF only: unsigned '<=' */ 325 + BPF_JSLT 0xc0 /* eBPF only: signed '<' */ 326 + BPF_JSLE 0xd0 /* eBPF only: signed '<=' */ 327 + 328 + So BPF_ADD | BPF_X | BPF_ALU means 32-bit addition in both classic BPF 329 + and eBPF. There are only two registers in classic BPF, so it means A += X. 330 + In eBPF it means dst_reg = (u32) dst_reg + (u32) src_reg; similarly, 331 + BPF_XOR | BPF_K | BPF_ALU means A ^= imm32 in classic BPF and analogous 332 + src_reg = (u32) src_reg ^ (u32) imm32 in eBPF. 333 + 334 + Classic BPF is using BPF_MISC class to represent A = X and X = A moves. 335 + eBPF is using BPF_MOV | BPF_X | BPF_ALU code instead. Since there are no 336 + BPF_MISC operations in eBPF, the class 7 is used as BPF_ALU64 to mean 337 + exactly the same operations as BPF_ALU, but with 64-bit wide operands 338 + instead. 
So BPF_ADD | BPF_X | BPF_ALU64 means 64-bit addition, i.e.: 339 + dst_reg = dst_reg + src_reg 340 + 341 + Classic BPF wastes the whole BPF_RET class to represent a single ``ret`` 342 + operation. Classic BPF_RET | BPF_K means copy imm32 into return register 343 + and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT 344 + in eBPF means function exit only. The eBPF program needs to store return 345 + value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used as 346 + BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit wide 347 + operands for the comparisons instead. 348 + 349 + For load and store instructions the 8-bit 'code' field is divided as:: 350 + 351 + +--------+--------+-------------------+ 352 + | 3 bits | 2 bits | 3 bits | 353 + | mode | size | instruction class | 354 + +--------+--------+-------------------+ 355 + (MSB) (LSB) 356 + 357 + Size modifier is one of ... 358 + 359 + :: 360 + 361 + BPF_W 0x00 /* word */ 362 + BPF_H 0x08 /* half word */ 363 + BPF_B 0x10 /* byte */ 364 + BPF_DW 0x18 /* eBPF only, double word */ 365 + 366 + ... which encodes size of load/store operation:: 367 + 368 + B - 1 byte 369 + H - 2 byte 370 + W - 4 byte 371 + DW - 8 byte (eBPF only) 372 + 373 + Mode modifier is one of:: 374 + 375 + BPF_IMM 0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */ 376 + BPF_ABS 0x20 377 + BPF_IND 0x40 378 + BPF_MEM 0x60 379 + BPF_LEN 0x80 /* classic BPF only, reserved in eBPF */ 380 + BPF_MSH 0xa0 /* classic BPF only, reserved in eBPF */ 381 + BPF_ATOMIC 0xc0 /* eBPF only, atomic operations */ 382 + 383 + eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and 384 + (BPF_IND | <size> | BPF_LD) which are used to access packet data. 385 + 386 + They had to be carried over from classic to have strong performance of 387 + socket filters running in eBPF interpreter. 
These instructions can only 388 + be used when interpreter context is a pointer to ``struct sk_buff`` and 389 + have seven implicit operands. Register R6 is an implicit input that must 390 + contain pointer to sk_buff. Register R0 is an implicit output which contains 391 + the data fetched from the packet. Registers R1-R5 are scratch registers 392 + and must not be used to store the data across BPF_ABS | BPF_LD or 393 + BPF_IND | BPF_LD instructions. 394 + 395 + These instructions have implicit program exit condition as well. When 396 + eBPF program is trying to access the data beyond the packet boundary, 397 + the interpreter will abort the execution of the program. JIT compilers 398 + therefore must preserve this property. src_reg and imm32 fields are 399 + explicit inputs to these instructions. 400 + 401 + For example:: 402 + 403 + BPF_IND | BPF_W | BPF_LD means: 404 + 405 + R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32)) 406 + and R1 - R5 were scratched. 407 + 408 + Unlike classic BPF instruction set, eBPF has generic load/store operations:: 409 + 410 + BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg 411 + BPF_MEM | <size> | BPF_ST: *(size *) (dst_reg + off) = imm32 412 + BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off) 413 + 414 + Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. 
415 + 416 + It also includes atomic operations, which use the immediate field for extra 417 + encoding:: 418 + 419 + .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg 420 + .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg 421 + 422 + The basic atomic operations supported are:: 423 + 424 + BPF_ADD 425 + BPF_AND 426 + BPF_OR 427 + BPF_XOR 428 + 429 + Each having equivalent semantics with the ``BPF_ADD`` example, that is: the 430 + memory location addresed by ``dst_reg + off`` is atomically modified, with 431 + ``src_reg`` as the other operand. If the ``BPF_FETCH`` flag is set in the 432 + immediate, then these operations also overwrite ``src_reg`` with the 433 + value that was in memory before it was modified. 434 + 435 + The more special operations are:: 436 + 437 + BPF_XCHG 438 + 439 + This atomically exchanges ``src_reg`` with the value addressed by ``dst_reg + 440 + off``. :: 441 + 442 + BPF_CMPXCHG 443 + 444 + This atomically compares the value addressed by ``dst_reg + off`` with 445 + ``R0``. If they match it is replaced with ``src_reg``. In either case, the 446 + value that was there before is zero-extended and loaded back to ``R0``. 447 + 448 + Note that 1 and 2 byte atomic operations are not supported. 449 + 450 + Clang can generate atomic instructions by default when ``-mcpu=v3`` is 451 + enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction 452 + Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable 453 + the atomics features, while keeping a lower ``-mcpu`` version, you can use 454 + ``-Xclang -target-feature -Xclang +alu32``. 455 + 456 + You may encounter ``BPF_XADD`` - this is a legacy name for ``BPF_ATOMIC``, 457 + referring to the exclusive-add operation encoded when the immediate field is 458 + zero. 
459 + 460 + eBPF has one 16-byte instruction: ``BPF_LD | BPF_DW | BPF_IMM`` which consists 461 + of two consecutive ``struct bpf_insn`` 8-byte blocks and interpreted as single 462 + instruction that loads 64-bit immediate value into a dst_reg. 463 + Classic BPF has similar instruction: ``BPF_LD | BPF_W | BPF_IMM`` which loads 464 + 32-bit immediate value into a register. 465 + 466 + .. Links: 467 + .. _eBPF verifier: verifiers.rst
+2 -2
Documentation/bpf/libbpf/index.rst
··· 3 3 libbpf 4 4 ====== 5 5 6 - For API documentation see the `versioned API documentation site <https://libbpf.readthedocs.io/en/latest/api.html>`_. 7 - 8 6 .. toctree:: 9 7 :maxdepth: 1 10 8 ··· 11 13 12 14 This is documentation for libbpf, a userspace library for loading and 13 15 interacting with bpf programs. 16 + 17 + For API documentation see the `versioned API documentation site <https://libbpf.readthedocs.io/en/latest/api.html>`_. 14 18 15 19 All general BPF questions, including kernel functionality, libbpf APIs and 16 20 their application, should be sent to bpf@vger.kernel.org mailing list.
+52
Documentation/bpf/maps.rst
···
+
+ =========
+ eBPF maps
+ =========
+
+ Maps are generic storage of different types, used for sharing data between
+ the kernel and user space.
+
+ The maps are accessed from user space via the BPF syscall, which has commands:
+
+ - create a map with given type and attributes
+   ``map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)``
+   using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
+   returns a process-local file descriptor or a negative error
+
+ - lookup a key in a given map
+   ``err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)``
+   using attr->map_fd, attr->key, attr->value
+   returns zero and stores the found elem into value, or a negative error
+
+ - create or update a key/value pair in a given map
+   ``err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)``
+   using attr->map_fd, attr->key, attr->value
+   returns zero or a negative error
+
+ - find and delete an element by key in a given map
+   ``err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)``
+   using attr->map_fd, attr->key
+
+ - to delete a map: close(fd)
+   An exiting process will delete its maps automatically
+
+ Userspace programs use this syscall to create/access maps that eBPF programs
+ are concurrently updating.
+
+ Maps can have different types: hash, array, bloom filter, radix-tree, etc.
+
+ A map is defined by:
+
+ - type
+ - max number of elements
+ - key size in bytes
+ - value size in bytes
+
+ Map Types
+ =========
+
+ .. toctree::
+    :maxdepth: 1
+    :glob:
+
+    map_*
+9
Documentation/bpf/other.rst
··· 1 + ===== 2 + Other 3 + ===== 4 + 5 + .. toctree:: 6 + :maxdepth: 1 7 + 8 + ringbuf 9 + llvm_reloc
+9
Documentation/bpf/programs.rst
··· 1 + ============= 2 + Program Types 3 + ============= 4 + 5 + .. toctree:: 6 + :maxdepth: 1 7 + :glob: 8 + 9 + prog_*
+11
Documentation/bpf/syscall_api.rst
··· 1 + =========== 2 + Syscall API 3 + =========== 4 + 5 + The primary info for the bpf syscall is available in the `man-pages`_ 6 + for `bpf(2)`_. For more information about the userspace API, see 7 + Documentation/userspace-api/ebpf/index.rst. 8 + 9 + .. Links: 10 + .. _man-pages: https://www.kernel.org/doc/man-pages/ 11 + .. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html
+9
Documentation/bpf/test_debug.rst
··· 1 + ========================= 2 + Testing and debugging BPF 3 + ========================= 4 + 5 + .. toctree:: 6 + :maxdepth: 1 7 + 8 + drgn 9 + s390
+529
Documentation/bpf/verifier.rst
···
+
+ =============
+ eBPF verifier
+ =============
+
+ The safety of the eBPF program is determined in two steps.
+
+ The first step does a DAG check to disallow loops, along with other CFG
+ validation. In particular, it will detect programs that have unreachable
+ instructions (though the classic BPF checker allows them).
+
+ The second step starts from the first insn and descends all possible paths.
+ It simulates execution of every insn and observes the state change of
+ registers and stack.
+
+ At the start of the program the register R1 contains a pointer to context
+ and has type PTR_TO_CTX.
+ If the verifier sees an insn that does R2=R1, then R2 now has type
+ PTR_TO_CTX as well and can be used on the right hand side of an expression.
+ If R1=PTR_TO_CTX and the insn is R2=R1+R1, then R2=SCALAR_VALUE,
+ since the addition of two valid pointers makes an invalid pointer.
+ (In 'secure' mode the verifier will reject any type of pointer arithmetic to
+ make sure that kernel addresses don't leak to unprivileged users.)
+
+ If a register was never written to, it is not readable::
+
+   bpf_mov R0 = R2
+   bpf_exit
+
+ will be rejected, since R2 is unreadable at the start of the program.
+
+ After a kernel function call, R1-R5 are reset to unreadable and
+ R0 has the return type of the function.
+
+ Since R6-R9 are callee saved, their state is preserved across the call.
+
+ ::
+
+   bpf_mov R6 = 1
+   bpf_call foo
+   bpf_mov R0 = R6
+   bpf_exit
+
+ is a correct program. If there was R1 instead of R6, it would have
+ been rejected.
+
+ Load/store instructions are allowed only with registers of valid types, which
+ are PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment
+ checked.
+ For example::
+
+   bpf_mov R1 = 1
+   bpf_mov R2 = 2
+   bpf_xadd *(u32 *)(R1 + 3) += R2
+   bpf_exit
+
+ will be rejected, since R1 doesn't have a valid pointer type at the time of
+ execution of the instruction bpf_xadd.
+
+ At the start, R1's type is PTR_TO_CTX (a pointer to the generic
+ ``struct bpf_context``). A callback is used to customize the verifier to
+ restrict eBPF program access to only certain fields within the ctx structure,
+ with specified size and alignment.
+
+ For example, the following insn::
+
+   bpf_ld R0 = *(u32 *)(R6 + 8)
+
+ intends to load a word from address R6 + 8 and store it into R0.
+ If R6=PTR_TO_CTX, then via the is_valid_access() callback the verifier will
+ know that offset 8 of size 4 bytes can be accessed for reading, otherwise
+ the verifier will reject the program.
+ If R6=PTR_TO_STACK, then the access should be aligned and within
+ stack bounds, which are [-MAX_BPF_STACK, 0). In this example the offset is 8,
+ so it will fail verification, since it's out of bounds.
+
+ The verifier will allow an eBPF program to read data from the stack only
+ after it has written to it.
+
+ The classic BPF verifier does a similar check with the M[0-15] memory slots.
+ For example::
+
+   bpf_ld R0 = *(u32 *)(R10 - 4)
+   bpf_exit
+
+ is an invalid program.
+ Though R10 is a correct read-only register and has type PTR_TO_STACK
+ and R10 - 4 is within stack bounds, there were no stores into that location.
+
+ Pointer register spill/fill is tracked as well, since four (R6-R9)
+ callee saved registers may not be enough for some programs.
+
+ Allowed function calls are customized with bpf_verifier_ops->get_func_proto().
+ The eBPF verifier will check that registers match argument constraints.
+ After the call, register R0 will be set to the return type of the function.
+
+ Function calls are the main mechanism for extending the functionality of eBPF
+ programs.
+ Socket filters may let programs call one set of functions, whereas tracing
+ filters may allow a completely different set.
+
+ If a function is made accessible to eBPF programs, it needs to be thought
+ through from a safety point of view. The verifier will guarantee that the
+ function is called with valid arguments.
+
+ Seccomp and socket filters have different security restrictions for classic
+ BPF. Seccomp solves this with a two-stage verifier: the classic BPF verifier
+ is followed by the seccomp verifier. In the case of eBPF, one configurable
+ verifier is shared for all use cases.
+
+ See the details of the eBPF verifier in kernel/bpf/verifier.c
+
+ Register value tracking
+ =======================
+
+ In order to determine the safety of an eBPF program, the verifier must track
+ the range of possible values in each register and also in each stack slot.
+ This is done with ``struct bpf_reg_state``, defined in
+ include/linux/bpf_verifier.h, which unifies tracking of scalar and pointer
+ values. Each register state has a type, which is either NOT_INIT (the
+ register has not been written to), SCALAR_VALUE (some value which is not
+ usable as a pointer), or a pointer type. The types of pointers describe
+ their base, as follows:
+
+ PTR_TO_CTX
+     Pointer to bpf_context.
+ CONST_PTR_TO_MAP
+     Pointer to struct bpf_map. "Const" because arithmetic
+     on these pointers is forbidden.
+ PTR_TO_MAP_VALUE
+     Pointer to the value stored in a map element.
+ PTR_TO_MAP_VALUE_OR_NULL
+     Either a pointer to a map value, or NULL; map accesses
+     (see maps.rst) return this type, which becomes a
+     PTR_TO_MAP_VALUE when checked != NULL. Arithmetic on
+     these pointers is forbidden.
+ PTR_TO_STACK
+     Frame pointer.
+ PTR_TO_PACKET
+     skb->data.
+ PTR_TO_PACKET_END
+     skb->data + headlen; arithmetic forbidden.
140 + PTR_TO_SOCKET 141 + Pointer to struct bpf_sock_ops, implicitly refcounted. 142 + PTR_TO_SOCKET_OR_NULL 143 + Either a pointer to a socket, or NULL; socket lookup 144 + returns this type, which becomes a PTR_TO_SOCKET when 145 + checked != NULL. PTR_TO_SOCKET is reference-counted, 146 + so programs must release the reference through the 147 + socket release function before the end of the program. 148 + Arithmetic on these pointers is forbidden. 149 + 150 + However, a pointer may be offset from this base (as a result of pointer 151 + arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable 152 + offset'. The former is used when an exactly-known value (e.g. an immediate 153 + operand) is added to a pointer, while the latter is used for values which are 154 + not exactly known. The variable offset is also used in SCALAR_VALUEs, to track 155 + the range of possible values in the register. 156 + 157 + The verifier's knowledge about the variable offset consists of: 158 + 159 + * minimum and maximum values as unsigned 160 + * minimum and maximum values as signed 161 + 162 + * knowledge of the values of individual bits, in the form of a 'tnum': a u64 163 + 'mask' and a u64 'value'. 1s in the mask represent bits whose value is unknown; 164 + 1s in the value represent bits known to be 1. Bits known to be 0 have 0 in both 165 + mask and value; no bit should ever be 1 in both. For example, if a byte is read 166 + into a register from memory, the register's top 56 bits are known zero, while 167 + the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we 168 + then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0; 169 + 0x1ff), because of potential carries. 170 + 171 + Besides arithmetic, the register state can also be updated by conditional 172 + branches. 
+ For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
+ it will have a umin_value (unsigned minimum value) of 9, whereas in the 'false'
+ branch it will have a umax_value of 8. A signed compare (with BPF_JSGT or
+ BPF_JSGE) would instead update the signed minimum/maximum values. Information
+ from the signed and unsigned bounds can be combined; for instance, if a value is
+ first tested < 8 and then tested s> 4, the verifier will conclude that the value
+ is also > 4 and s< 8, since the bounds prevent crossing the sign boundary.
+
+ PTR_TO_PACKETs with a variable offset part have an 'id', which is common to all
+ pointers sharing that same variable offset. This is important for packet range
+ checks: after adding a variable to a packet pointer register A, if you then copy
+ it to another register B and then add a constant 4 to A, both registers will
+ share the same 'id' but A will have a fixed offset of +4. Then if A is
+ bounds-checked and found to be less than a PTR_TO_PACKET_END, register B is
+ now known to have a safe range of at least 4 bytes. See 'Direct packet access',
+ below, for more on PTR_TO_PACKET ranges.
+
+ The 'id' field is also used on PTR_TO_MAP_VALUE_OR_NULL, common to all copies of
+ the pointer returned from a map lookup. This means that when one copy is
+ checked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.
+
+ As well as range-checking, the tracked information is also used for enforcing
+ alignment of pointer accesses. For instance, on most systems the packet pointer
+ is 2 bytes after a 4-byte alignment.
+ If a program adds 14 bytes to that to jump
+ over the Ethernet header, then reads IHL and adds (IHL * 4), the resulting
+ pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
+ bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
+ that pointer are safe.
+
+ The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
+ to all copies of the pointer returned from a socket lookup. This has similar
+ behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
+ it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
+ represents a reference to the corresponding ``struct sock``. To ensure that the
+ reference is not leaked, it is imperative to NULL-check the reference and, in
+ the non-NULL case, pass the valid reference to the socket release function.
+
+ Direct packet access
+ ====================
+
+ In cls_bpf and act_bpf programs the verifier allows direct access to the packet
+ data via the skb->data and skb->data_end pointers.
+ For example::
+
+   1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */
+   2: r3 = *(u32 *)(r1 +76) /* load skb->data */
+   3: r5 = r3
+   4: r5 += 14
+   5: if r5 > r4 goto pc+16
+      R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
+   6: r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */
+
+ This 2-byte load from the packet is safe to do, since the program author
+ did check ``if (skb->data + 14 > skb->data_end) goto err`` at insn #5, which
+ means that in the fall-through case the register R3 (which points to skb->data)
+ has at least 14 directly accessible bytes. The verifier marks it
+ as R3=pkt(id=0,off=0,r=14).
+ id=0 means that no additional variables were added to the register.
+ off=0 means that no additional constants were added.
229 + r=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok. 230 + Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points 231 + to the packet data, but constant 14 was added to the register, so 232 + it now points to ``skb->data + 14`` and accessible range is [R5, R5 + 14 - 14) 233 + which is zero bytes. 234 + 235 + More complex packet access may look like:: 236 + 237 + 238 + R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp 239 + 6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */ 240 + 7: r4 = *(u8 *)(r3 +12) 241 + 8: r4 *= 14 242 + 9: r3 = *(u32 *)(r1 +76) /* load skb->data */ 243 + 10: r3 += r4 244 + 11: r2 = r1 245 + 12: r2 <<= 48 246 + 13: r2 >>= 48 247 + 14: r3 += r2 248 + 15: r2 = r3 249 + 16: r2 += 8 250 + 17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */ 251 + 18: if r2 > r1 goto pc+2 252 + R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp 253 + 19: r1 = *(u8 *)(r3 +4) 254 + 255 + The state of the register R3 is R3=pkt(id=2,off=0,r=8) 256 + id=2 means that two ``r3 += rX`` instructions were seen, so r3 points to some 257 + offset within a packet and since the program author did 258 + ``if (r3 + 8 > r1) goto err`` at insn #18, the safe range is [R3, R3 + 8). 259 + The verifier only allows 'add'/'sub' operations on packet registers. Any other 260 + operation will set the register state to 'SCALAR_VALUE' and it won't be 261 + available for direct packet access. 262 + 263 + Operation ``r3 += rX`` may overflow and become less than original skb->data, 264 + therefore the verifier has to prevent that. 
+ So when it sees an ``r3 += rX``
+ instruction and rX is more than a 16-bit value, any subsequent bounds-check of
+ r3 against skb->data_end will not give us 'range' information, so attempts to
+ read through the pointer will give an "invalid access to packet" error.
+
+ For example, after insn ``r4 = *(u8 *)(r3 +12)`` (insn #7 above) the state of
+ r4 is R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)), which means that the
+ upper 56 bits of the register are guaranteed to be zero, and nothing is known
+ about the lower 8 bits. After insn ``r4 *= 14`` the state becomes
+ R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit
+ value by constant 14 will keep the upper 52 bits as zero, and the least
+ significant bit will also be zero, as 14 is even. Similarly ``r2 >>= 48`` will
+ make R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is
+ not sign extending. This logic is implemented in the
+ adjust_reg_min_max_vals() function, which calls adjust_ptr_min_max_vals() for
+ adding a pointer to a scalar (or vice versa) and adjust_scalar_min_max_vals()
+ for operations on two scalars.
+
+ The end result is that a bpf program author can access the packet directly
+ using normal C code, as in::
+
+   void *data = (void *)(long)skb->data;
+   void *data_end = (void *)(long)skb->data_end;
+   struct eth_hdr *eth = data;
+   struct iphdr *iph = data + sizeof(*eth);
+   struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);
+
+   if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
+           return 0;
+   if (eth->h_proto != htons(ETH_P_IP))
+           return 0;
+   if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
+           return 0;
+   if (udp->dest == 53 || udp->source == 9)
+           ...;
+
+ which makes such programs easier to write compared to the LD_ABS insn, and
+ significantly faster.
301 + 302 + Pruning 303 + ======= 304 + 305 + The verifier does not actually walk all possible paths through the program. For 306 + each new branch to analyse, the verifier looks at all the states it's previously 307 + been in when at this instruction. If any of them contain the current state as a 308 + subset, the branch is 'pruned' - that is, the fact that the previous state was 309 + accepted implies the current state would be as well. For instance, if in the 310 + previous state, r1 held a packet-pointer, and in the current state, r1 holds a 311 + packet-pointer with a range as long or longer and at least as strict an 312 + alignment, then r1 is safe. Similarly, if r2 was NOT_INIT before then it can't 313 + have been used by any path from that point, so any value in r2 (including 314 + another NOT_INIT) is safe. The implementation is in the function regsafe(). 315 + Pruning considers not only the registers but also the stack (and any spilled 316 + registers it may hold). They must all be safe for the branch to be pruned. 317 + This is implemented in states_equal(). 
+
+ Understanding eBPF verifier messages
+ ====================================
+
+ The following are a few examples of invalid eBPF programs and the verifier
+ error messages as seen in the log:
+
+ Program with unreachable instructions::
+
+   static struct bpf_insn prog[] = {
+     BPF_EXIT_INSN(),
+     BPF_EXIT_INSN(),
+   };
+
+ Error::
+
+   unreachable insn 1
+
+ Program that reads an uninitialized register::
+
+   BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+   BPF_EXIT_INSN(),
+
+ Error::
+
+   0: (bf) r0 = r2
+   R2 !read_ok
+
+ Program that doesn't initialize R0 before exiting::
+
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
+   BPF_EXIT_INSN(),
+
+ Error::
+
+   0: (bf) r2 = r1
+   1: (95) exit
+   R0 !read_ok
+
+ Program that accesses stack out of bounds::
+
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
+   BPF_EXIT_INSN(),
+
+ Error::
+
+   0: (7a) *(u64 *)(r10 +8) = 0
+   invalid stack off=8 size=8
+
+ Program that doesn't initialize the stack before passing its address into a
+ function::
+
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+
+ Error::
+
+   0: (bf) r2 = r10
+   1: (07) r2 += -8
+   2: (b7) r1 = 0x0
+   3: (85) call 1
+   invalid indirect read from stack off -8+0 size 8
+
+ Program that uses invalid map_fd=0 while calling the map_lookup_elem()
+ function::
+
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+
+ Error::
+
+   0: (7a) *(u64 *)(r10 -8) = 0
+   1: (bf) r2 = r10
396 + 2: (07) r2 += -8 397 + 3: (b7) r1 = 0x0 398 + 4: (85) call 1 399 + fd 0 is not pointing to valid bpf_map 400 + 401 + Program that doesn't check return value of map_lookup_elem() before accessing 402 + map element:: 403 + 404 + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), 405 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 406 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 407 + BPF_LD_MAP_FD(BPF_REG_1, 0), 408 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), 409 + BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0), 410 + BPF_EXIT_INSN(), 411 + 412 + Error:: 413 + 414 + 0: (7a) *(u64 *)(r10 -8) = 0 415 + 1: (bf) r2 = r10 416 + 2: (07) r2 += -8 417 + 3: (b7) r1 = 0x0 418 + 4: (85) call 1 419 + 5: (7a) *(u64 *)(r0 +0) = 0 420 + R0 invalid mem access 'map_value_or_null' 421 + 422 + Program that correctly checks map_lookup_elem() returned value for NULL, but 423 + accesses the memory with incorrect alignment:: 424 + 425 + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), 426 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 427 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 428 + BPF_LD_MAP_FD(BPF_REG_1, 0), 429 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), 430 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), 431 + BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0), 432 + BPF_EXIT_INSN(), 433 + 434 + Error:: 435 + 436 + 0: (7a) *(u64 *)(r10 -8) = 0 437 + 1: (bf) r2 = r10 438 + 2: (07) r2 += -8 439 + 3: (b7) r1 = 1 440 + 4: (85) call 1 441 + 5: (15) if r0 == 0x0 goto pc+1 442 + R0=map_ptr R10=fp 443 + 6: (7a) *(u64 *)(r0 +4) = 0 444 + misaligned access off 4 size 8 445 + 446 + Program that correctly checks map_lookup_elem() returned value for NULL and 447 + accesses memory with correct alignment in one side of 'if' branch, but fails 448 + to do so in the other side of 'if' branch:: 449 + 450 + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), 451 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 452 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 453 + BPF_LD_MAP_FD(BPF_REG_1, 0), 454 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 
BPF_FUNC_map_lookup_elem), 455 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), 456 + BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0), 457 + BPF_EXIT_INSN(), 458 + BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1), 459 + BPF_EXIT_INSN(), 460 + 461 + Error:: 462 + 463 + 0: (7a) *(u64 *)(r10 -8) = 0 464 + 1: (bf) r2 = r10 465 + 2: (07) r2 += -8 466 + 3: (b7) r1 = 1 467 + 4: (85) call 1 468 + 5: (15) if r0 == 0x0 goto pc+2 469 + R0=map_ptr R10=fp 470 + 6: (7a) *(u64 *)(r0 +0) = 0 471 + 7: (95) exit 472 + 473 + from 5 to 8: R0=imm0 R10=fp 474 + 8: (7a) *(u64 *)(r0 +0) = 1 475 + R0 invalid mem access 'imm' 476 + 477 + Program that performs a socket lookup then sets the pointer to NULL without 478 + checking it:: 479 + 480 + BPF_MOV64_IMM(BPF_REG_2, 0), 481 + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), 482 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 483 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 484 + BPF_MOV64_IMM(BPF_REG_3, 4), 485 + BPF_MOV64_IMM(BPF_REG_4, 0), 486 + BPF_MOV64_IMM(BPF_REG_5, 0), 487 + BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), 488 + BPF_MOV64_IMM(BPF_REG_0, 0), 489 + BPF_EXIT_INSN(), 490 + 491 + Error:: 492 + 493 + 0: (b7) r2 = 0 494 + 1: (63) *(u32 *)(r10 -8) = r2 495 + 2: (bf) r2 = r10 496 + 3: (07) r2 += -8 497 + 4: (b7) r3 = 4 498 + 5: (b7) r4 = 0 499 + 6: (b7) r5 = 0 500 + 7: (85) call bpf_sk_lookup_tcp#65 501 + 8: (b7) r0 = 0 502 + 9: (95) exit 503 + Unreleased reference id=1, alloc_insn=7 504 + 505 + Program that performs a socket lookup but does not NULL-check the returned 506 + value:: 507 + 508 + BPF_MOV64_IMM(BPF_REG_2, 0), 509 + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), 510 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 511 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 512 + BPF_MOV64_IMM(BPF_REG_3, 4), 513 + BPF_MOV64_IMM(BPF_REG_4, 0), 514 + BPF_MOV64_IMM(BPF_REG_5, 0), 515 + BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), 516 + BPF_EXIT_INSN(), 517 + 518 + Error:: 519 + 520 + 0: (b7) r2 = 0 521 + 1: (63) *(u32 *)(r10 -8) = r2 522 + 2: (bf) r2 = r10 523 + 3: (07) r2 += -8 524 + 4: (b7) r3 = 4 
525 + 5: (b7) r4 = 0 526 + 6: (b7) r5 = 0 527 + 7: (85) call bpf_sk_lookup_tcp#65 528 + 8: (95) exit 529 + Unreleased reference id=1, alloc_insn=7
+9 -1027
Documentation/networking/filter.rst
···
  Linux Socket Filtering aka Berkeley Packet Filter (BPF)
  =======================================================

+ Notice
+ ------
+
+ This file used to document the eBPF format and mechanisms even when not
+ related to socket filtering. ../bpf/index.rst has more details
+ on eBPF.
+
  Introduction
  ------------
···
  paragraphs is being used. However, the instruction set format is modelled
  closer to the underlying architecture to mimic native instruction sets, so
  that a better performance can be achieved (more details later). This new
- ISA is called 'eBPF' or 'internal BPF' interchangeably. (Note: eBPF which
+ ISA is called eBPF. See ../bpf/index.rst for details. (Note: eBPF which
  originates from [e]xtended BPF is not the same as BPF extensions! While
  eBPF is an ISA, BPF extensions date back to classic BPF's 'overloading'
  of BPF_LD | BPF_{B,H,W} | BPF_ABS instruction.)
-
- It is designed to be JITed with one to one mapping, which can also open up
- the possibility for GCC/LLVM compilers to generate optimized eBPF code through
- an eBPF backend that performs almost as fast as natively compiled code.

  The new instruction set was originally designed with the possible goal in
  mind to write programs in "restricted C" and compile into eBPF with a optional
···
  sparc64, arm32, riscv64, riscv32 perform JIT compilation from eBPF
  instruction set.

- Some core changes of the new internal format:
-
- - Number of registers increase from 2 to 10:
-
-   The old format had two registers A and X, and a hidden frame pointer. The
-   new layout extends this to be 10 internal registers and a read-only frame
-   pointer.
Since 64-bit CPUs are passing arguments to functions via registers 663 - the number of args from eBPF program to in-kernel function is restricted 664 - to 5 and one register is used to accept return value from an in-kernel 665 - function. Natively, x86_64 passes first 6 arguments in registers, aarch64/ 666 - sparcv9/mips64 have 7 - 8 registers for arguments; x86_64 has 6 callee saved 667 - registers, and aarch64/sparcv9/mips64 have 11 or more callee saved registers. 668 - 669 - Therefore, eBPF calling convention is defined as: 670 - 671 - * R0 - return value from in-kernel function, and exit value for eBPF program 672 - * R1 - R5 - arguments from eBPF program to in-kernel function 673 - * R6 - R9 - callee saved registers that in-kernel function will preserve 674 - * R10 - read-only frame pointer to access stack 675 - 676 - Thus, all eBPF registers map one to one to HW registers on x86_64, aarch64, 677 - etc, and eBPF calling convention maps directly to ABIs used by the kernel on 678 - 64-bit architectures. 679 - 680 - On 32-bit architectures JIT may map programs that use only 32-bit arithmetic 681 - and may let more complex programs to be interpreted. 682 - 683 - R0 - R5 are scratch registers and eBPF program needs spill/fill them if 684 - necessary across calls. Note that there is only one eBPF program (== one 685 - eBPF main routine) and it cannot call other eBPF functions, it can only 686 - call predefined in-kernel functions, though. 687 - 688 - - Register width increases from 32-bit to 64-bit: 689 - 690 - Still, the semantics of the original 32-bit ALU operations are preserved 691 - via 32-bit subregisters. All eBPF registers are 64-bit with 32-bit lower 692 - subregisters that zero-extend into 64-bit if they are being written to. 693 - That behavior maps directly to x86_64 and arm64 subregister definition, but 694 - makes other JITs more difficult. 695 - 696 - 32-bit architectures run 64-bit internal BPF programs via interpreter. 
697 - Their JITs may convert BPF programs that only use 32-bit subregisters into 698 - native instruction set and let the rest being interpreted. 699 - 700 - Operation is 64-bit, because on 64-bit architectures, pointers are also 701 - 64-bit wide, and we want to pass 64-bit values in/out of kernel functions, 702 - so 32-bit eBPF registers would otherwise require to define register-pair 703 - ABI, thus, there won't be able to use a direct eBPF register to HW register 704 - mapping and JIT would need to do combine/split/move operations for every 705 - register in and out of the function, which is complex, bug prone and slow. 706 - Another reason is the use of atomic 64-bit counters. 707 - 708 - - Conditional jt/jf targets replaced with jt/fall-through: 709 - 710 - While the original design has constructs such as ``if (cond) jump_true; 711 - else jump_false;``, they are being replaced into alternative constructs like 712 - ``if (cond) jump_true; /* else fall-through */``. 713 - 714 - - Introduces bpf_call insn and register passing convention for zero overhead 715 - calls from/to other kernel functions: 716 - 717 - Before an in-kernel function call, the internal BPF program needs to 718 - place function arguments into R1 to R5 registers to satisfy calling 719 - convention, then the interpreter will take them from registers and pass 720 - to in-kernel function. If R1 - R5 registers are mapped to CPU registers 721 - that are used for argument passing on given architecture, the JIT compiler 722 - doesn't need to emit extra moves. Function arguments will be in the correct 723 - registers and BPF_CALL instruction will be JITed as single 'call' HW 724 - instruction. This calling convention was picked to cover common call 725 - situations without performance penalty. 726 - 727 - After an in-kernel function call, R1 - R5 are reset to unreadable and R0 has 728 - a return value of the function. Since R6 - R9 are callee saved, their state 729 - is preserved across the call. 
730 - 731 - For example, consider three C functions:: 732 - 733 - u64 f1() { return (*_f2)(1); } 734 - u64 f2(u64 a) { return f3(a + 1, a); } 735 - u64 f3(u64 a, u64 b) { return a - b; } 736 - 737 - GCC can compile f1, f3 into x86_64:: 738 - 739 - f1: 740 - movl $1, %edi 741 - movq _f2(%rip), %rax 742 - jmp *%rax 743 - f3: 744 - movq %rdi, %rax 745 - subq %rsi, %rax 746 - ret 747 - 748 - Function f2 in eBPF may look like:: 749 - 750 - f2: 751 - bpf_mov R2, R1 752 - bpf_add R1, 1 753 - bpf_call f3 754 - bpf_exit 755 - 756 - If f2 is JITed and the pointer stored to ``_f2``. The calls f1 -> f2 -> f3 and 757 - returns will be seamless. Without JIT, __bpf_prog_run() interpreter needs to 758 - be used to call into f2. 759 - 760 - For practical reasons all eBPF programs have only one argument 'ctx' which is 761 - already placed into R1 (e.g. on __bpf_prog_run() startup) and the programs 762 - can call kernel functions with up to 5 arguments. Calls with 6 or more arguments 763 - are currently not supported, but these restrictions can be lifted if necessary 764 - in the future. 765 - 766 - On 64-bit architectures all register map to HW registers one to one. For 767 - example, x86_64 JIT compiler can map them as ... 768 - 769 - :: 770 - 771 - R0 - rax 772 - R1 - rdi 773 - R2 - rsi 774 - R3 - rdx 775 - R4 - rcx 776 - R5 - r8 777 - R6 - rbx 778 - R7 - r13 779 - R8 - r14 780 - R9 - r15 781 - R10 - rbp 782 - 783 - ... since x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing 784 - and rbx, r12 - r15 are callee saved. 
  Then the following internal BPF pseudo-program::

    bpf_mov R6, R1 /* save ctx */
    bpf_mov R2, 2
    bpf_mov R3, 3
    bpf_mov R4, 4
    bpf_mov R5, 5
    bpf_call foo
    bpf_mov R7, R0 /* save foo() return value */
    bpf_mov R1, R6 /* restore ctx for next call */
    bpf_mov R2, 6
    bpf_mov R3, 7
    bpf_mov R4, 8
    bpf_mov R5, 9
    bpf_call bar
    bpf_add R0, R7
    bpf_exit

  after JIT to x86_64 may look like::

    push %rbp
    mov %rsp,%rbp
    sub $0x228,%rsp
    mov %rbx,-0x228(%rbp)
    mov %r13,-0x220(%rbp)
    mov %rdi,%rbx
    mov $0x2,%esi
    mov $0x3,%edx
    mov $0x4,%ecx
    mov $0x5,%r8d
    callq foo
    mov %rax,%r13
    mov %rbx,%rdi
    mov $0x6,%esi
    mov $0x7,%edx
    mov $0x8,%ecx
    mov $0x9,%r8d
    callq bar
    add %r13,%rax
    mov -0x228(%rbp),%rbx
    mov -0x220(%rbp),%r13
    leaveq
    retq

  which is in this example equivalent in C to::

    u64 bpf_filter(u64 ctx)
    {
	return foo(ctx, 2, 3, 4, 5) + bar(ctx, 6, 7, 8, 9);
    }

  In-kernel functions foo() and bar() with prototype: u64 (*)(u64 arg1, u64
  arg2, u64 arg3, u64 arg4, u64 arg5); will receive arguments in the proper
  registers and place their return value into ``%rax``, which is R0 in eBPF.
  Prologue and epilogue are emitted by the JIT and are implicit in the
  interpreter. R0-R5 are scratch registers, so the eBPF program needs to
  preserve them across calls as defined by the calling convention.

  For example the following program is invalid::

    bpf_mov R1, 1
    bpf_call foo
    bpf_mov R0, R1
    bpf_exit

  After the call the registers R1-R5 contain junk values and cannot be read.
  An in-kernel eBPF verifier is used to validate internal BPF programs.
Also in the new design, eBPF is limited to 4096 insns, which means that any
program will terminate quickly and will only call a fixed number of kernel
functions. Original BPF and the new format are two-operand instruction sets,
which helps to do a one-to-one mapping between an eBPF insn and an x86 insn
during JIT.

The input context pointer for invoking the interpreter function is generic;
its content is defined by a specific use case. For seccomp, register R1 points
to seccomp_data; for converted BPF filters, R1 points to a skb.

A program that is translated internally consists of the following elements::

  op:16, jt:8, jf:8, k:32    ==>    op:8, dst_reg:4, src_reg:4, off:16, imm:32

So far 87 internal BPF instructions were implemented. The 8-bit 'op' opcode
field has room for new instructions. Some of them may use 16/24/32 byte
encoding. New instructions must be a multiple of 8 bytes to preserve backward
compatibility.

Internal BPF is a general purpose RISC instruction set. Not every register and
every instruction are used during translation from original BPF to the new
format. For example, socket filters are not using the ``exclusive add``
instruction, but tracing filters may use it to maintain counters of events.
Register R9 is not used by socket filters either, but more complex filters may
be running out of registers and would have to resort to spill/fill to stack.

Internal BPF can be used as a generic assembler for last step performance
optimizations; socket filters and seccomp are using it as an assembler.
Tracing filters may use it as an assembler to generate code from the kernel.
In-kernel usage may not be bounded by security considerations, since generated
internal BPF code may be optimizing an internal code path and not be exposed
to user space. Safety of internal BPF can come from a verifier (TBD).
In such use cases as described, it may be used as a safe instruction set.

Just like the original BPF, the new format runs within a controlled
environment, is deterministic and the kernel can easily prove that. The safety
of the program can be determined in two steps: the first step does a
depth-first-search to disallow loops and other CFG validation; the second step
starts from the first insn and descends all possible paths. It simulates
execution of every insn and observes the state change of registers and stack.

eBPF opcode encoding
--------------------

eBPF reuses most of the opcode encoding from classic BPF to simplify the
conversion of classic BPF to eBPF. For arithmetic and jump instructions the
8-bit 'code' field is divided into three parts::

  +----------------+--------+--------------------+
  |   4 bits       |  1 bit |   3 bits           |
  | operation code | source | instruction class  |
  +----------------+--------+--------------------+
  (MSB)                                      (LSB)

Three LSB bits store the instruction class, which is one of:

  ===================     ===============
  Classic BPF classes     eBPF classes
  ===================     ===============
  BPF_LD    0x00          BPF_LD    0x00
  BPF_LDX   0x01          BPF_LDX   0x01
  BPF_ST    0x02          BPF_ST    0x02
  BPF_STX   0x03          BPF_STX   0x03
  BPF_ALU   0x04          BPF_ALU   0x04
  BPF_JMP   0x05          BPF_JMP   0x05
  BPF_RET   0x06          BPF_JMP32 0x06
  BPF_MISC  0x07          BPF_ALU64 0x07
  ===================     ===============

When BPF_CLASS(code) == BPF_ALU or BPF_JMP, the 4th bit encodes the source
operand ...
::

    BPF_K     0x00
    BPF_X     0x08

 * in classic BPF, this means::

	BPF_SRC(code) == BPF_X - use register X as source operand
	BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand

 * in eBPF, this means::

	BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand
	BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand

... and four MSB bits store the operation code.

If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one
of::

  BPF_ADD   0x00
  BPF_SUB   0x10
  BPF_MUL   0x20
  BPF_DIV   0x30
  BPF_OR    0x40
  BPF_AND   0x50
  BPF_LSH   0x60
  BPF_RSH   0x70
  BPF_NEG   0x80
  BPF_MOD   0x90
  BPF_XOR   0xa0
  BPF_MOV   0xb0  /* eBPF only: mov reg to reg */
  BPF_ARSH  0xc0  /* eBPF only: sign extending shift right */
  BPF_END   0xd0  /* eBPF only: endianness conversion */

If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one
of::

  BPF_JA    0x00  /* BPF_JMP only */
  BPF_JEQ   0x10
  BPF_JGT   0x20
  BPF_JGE   0x30
  BPF_JSET  0x40
  BPF_JNE   0x50  /* eBPF only: jump != */
  BPF_JSGT  0x60  /* eBPF only: signed '>' */
  BPF_JSGE  0x70  /* eBPF only: signed '>=' */
  BPF_CALL  0x80  /* eBPF BPF_JMP only: function call */
  BPF_EXIT  0x90  /* eBPF BPF_JMP only: function return */
  BPF_JLT   0xa0  /* eBPF only: unsigned '<' */
  BPF_JLE   0xb0  /* eBPF only: unsigned '<=' */
  BPF_JSLT  0xc0  /* eBPF only: signed '<' */
  BPF_JSLE  0xd0  /* eBPF only: signed '<=' */

So BPF_ADD | BPF_X | BPF_ALU means 32-bit addition in both classic BPF
and eBPF. There are only two registers in classic BPF, so it means A += X.
In eBPF it means dst_reg = (u32) dst_reg + (u32) src_reg; similarly,
BPF_XOR | BPF_K | BPF_ALU means A ^= imm32 in classic BPF and the analogous
dst_reg = (u32) dst_reg ^ (u32) imm32 in eBPF.
Classic BPF is using the BPF_MISC class to represent A = X and X = A moves.
eBPF is using the BPF_MOV | BPF_X | BPF_ALU code instead. Since there are no
BPF_MISC operations in eBPF, class 7 is used as BPF_ALU64 to mean
exactly the same operations as BPF_ALU, but with 64-bit wide operands
instead. So BPF_ADD | BPF_X | BPF_ALU64 means 64-bit addition, i.e.:
dst_reg = dst_reg + src_reg

Classic BPF wastes the whole BPF_RET class to represent a single ``ret``
operation. Classic BPF_RET | BPF_K means copy imm32 into the return register
and perform function exit. eBPF is modeled to match the CPU, so BPF_JMP |
BPF_EXIT in eBPF means function exit only. The eBPF program needs to store the
return value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used
as BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit
wide operands for the comparisons instead.

For load and store instructions the 8-bit 'code' field is divided as::

  +--------+--------+-------------------+
  | 3 bits | 2 bits |   3 bits          |
  |  mode  |  size  | instruction class |
  +--------+--------+-------------------+
  (MSB)                             (LSB)

Size modifier is one of ...

::

  BPF_W   0x00    /* word */
  BPF_H   0x08    /* half word */
  BPF_B   0x10    /* byte */
  BPF_DW  0x18    /* eBPF only, double word */

... which encodes the size of the load/store operation::

 B  - 1 byte
 H  - 2 byte
 W  - 4 byte
 DW - 8 byte (eBPF only)

Mode modifier is one of::

  BPF_IMM     0x00  /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
  BPF_ABS     0x20
  BPF_IND     0x40
  BPF_MEM     0x60
  BPF_LEN     0x80  /* classic BPF only, reserved in eBPF */
  BPF_MSH     0xa0  /* classic BPF only, reserved in eBPF */
  BPF_ATOMIC  0xc0  /* eBPF only, atomic operations */

eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
(BPF_IND | <size> | BPF_LD) which are used to access packet data.

They had to be carried over from classic BPF to preserve the strong
performance of socket filters running in the eBPF interpreter. These
instructions can only be used when the interpreter context is a pointer to
``struct sk_buff`` and have seven implicit operands. Register R6 is an
implicit input that must contain a pointer to the sk_buff. Register R0 is an
implicit output which contains the data fetched from the packet. Registers
R1-R5 are scratch registers and must not be used to store data across
BPF_ABS | BPF_LD or BPF_IND | BPF_LD instructions.

These instructions have an implicit program exit condition as well. When an
eBPF program is trying to access data beyond the packet boundary, the
interpreter will abort the execution of the program. JIT compilers therefore
must preserve this property. The src_reg and imm32 fields are explicit inputs
to these instructions.

For example::

  BPF_IND | BPF_W | BPF_LD means:

    R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
    and R1 - R5 were scratched.
Unlike the classic BPF instruction set, eBPF has generic load/store
operations::

    BPF_MEM | <size> | BPF_STX:  *(size *) (dst_reg + off) = src_reg
    BPF_MEM | <size> | BPF_ST:   *(size *) (dst_reg + off) = imm32
    BPF_MEM | <size> | BPF_LDX:  dst_reg = *(size *) (src_reg + off)

Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW.

It also includes atomic operations, which use the immediate field for extra
encoding::

   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W  | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg

The basic atomic operations supported are::

    BPF_ADD
    BPF_AND
    BPF_OR
    BPF_XOR

Each has semantics equivalent to the ``BPF_ADD`` example, that is: the
memory location addressed by ``dst_reg + off`` is atomically modified, with
``src_reg`` as the other operand. If the ``BPF_FETCH`` flag is set in the
immediate, then these operations also overwrite ``src_reg`` with the
value that was in memory before it was modified.

The more special operations are::

    BPF_XCHG

This atomically exchanges ``src_reg`` with the value addressed by ``dst_reg +
off``. ::

    BPF_CMPXCHG

This atomically compares the value addressed by ``dst_reg + off`` with
``R0``. If they match, it is replaced with ``src_reg``. In either case, the
value that was there before is zero-extended and loaded back to ``R0``.

Note that 1 and 2 byte atomic operations are not supported.

Clang can generate atomic instructions by default when ``-mcpu=v3`` is
enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``.
If you need to enable
the atomics features, while keeping a lower ``-mcpu`` version, you can use
``-Xclang -target-feature -Xclang +alu32``.

You may encounter ``BPF_XADD`` - this is a legacy name for ``BPF_ATOMIC``,
referring to the exclusive-add operation encoded when the immediate field is
zero.

eBPF has one 16-byte instruction: ``BPF_LD | BPF_DW | BPF_IMM`` which consists
of two consecutive ``struct bpf_insn`` 8-byte blocks and is interpreted as a
single instruction that loads a 64-bit immediate value into a dst_reg.
Classic BPF has a similar instruction: ``BPF_LD | BPF_W | BPF_IMM`` which
loads a 32-bit immediate value into a register.

eBPF verifier
-------------
The safety of the eBPF program is determined in two steps.

The first step does a DAG check to disallow loops and other CFG validation.
In particular it will detect programs that have unreachable instructions
(though the classic BPF checker allows them).

The second step starts from the first insn and descends all possible paths.
It simulates execution of every insn and observes the state change of
registers and stack.

At the start of the program the register R1 contains a pointer to context
and has type PTR_TO_CTX.
If the verifier sees an insn that does R2=R1, then R2 now has type
PTR_TO_CTX as well and can be used on the right hand side of an expression.
If R1=PTR_TO_CTX and the insn is R2=R1+R1, then R2=SCALAR_VALUE,
since the addition of two valid pointers makes an invalid pointer.
(In 'secure' mode the verifier will reject any type of pointer arithmetic to
make sure that kernel addresses don't leak to unprivileged users.)

If a register was never written to, it's not readable::

  bpf_mov R0 = R2
  bpf_exit

will be rejected, since R2 is unreadable at the start of the program.
After a kernel function call, R1-R5 are reset to unreadable and
R0 has the return type of the function.

Since R6-R9 are callee saved, their state is preserved across the call.

::

  bpf_mov R6 = 1
  bpf_call foo
  bpf_mov R0 = R6
  bpf_exit

is a correct program. If there was R1 instead of R6, it would have
been rejected.

load/store instructions are allowed only with registers of valid types, which
are PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment
checked. For example::

  bpf_mov R1 = 1
  bpf_mov R2 = 2
  bpf_xadd *(u32 *)(R1 + 3) += R2
  bpf_exit

will be rejected, since R1 doesn't have a valid pointer type at the time of
execution of the instruction bpf_xadd.

At the start the R1 type is PTR_TO_CTX (a pointer to the generic ``struct
bpf_context``). A callback is used to customize the verifier to restrict eBPF
program access to only certain fields within the ctx structure with specified
size and alignment.

For example, the following insn::

  bpf_ld R0 = *(u32 *)(R6 + 8)

intends to load a word from address R6 + 8 and store it into R0.
If R6=PTR_TO_CTX, via the is_valid_access() callback the verifier will know
that offset 8 of size 4 bytes can be accessed for reading, otherwise
the verifier will reject the program.
If R6=PTR_TO_STACK, then the access should be aligned and be within
stack bounds, which are [-MAX_BPF_STACK, 0). In this example the offset is 8,
so it will fail verification, since it's out of bounds.

The verifier will allow an eBPF program to read data from the stack only
after it wrote into it.

The classic BPF verifier does a similar check with the M[0-15] memory slots.
For example::

  bpf_ld R0 = *(u32 *)(R10 - 4)
  bpf_exit

is an invalid program.
Though R10 is a correct read-only register and has type PTR_TO_STACK
and R10 - 4 is within stack bounds, there were no stores into that location.

Pointer register spill/fill is tracked as well, since four (R6-R9)
callee saved registers may not be enough for some programs.

Allowed function calls are customized with bpf_verifier_ops->get_func_proto().
The eBPF verifier will check that registers match argument constraints.
After the call, register R0 will be set to the return type of the function.

Function calls are the main mechanism to extend the functionality of eBPF
programs. Socket filters may let programs call one set of functions, whereas
tracing filters may allow a completely different set.

If a function is made accessible to eBPF programs, it needs to be thought
through from a safety point of view. The verifier will guarantee that the
function is called with valid arguments.

seccomp and socket filters have different security restrictions for classic
BPF. Seccomp solves this with a two stage verifier: the classic BPF verifier
is followed by the seccomp verifier. In the case of eBPF, one configurable
verifier is shared for all use cases.

See details of the eBPF verifier in kernel/bpf/verifier.c

Register value tracking
-----------------------
In order to determine the safety of an eBPF program, the verifier must track
the range of possible values in each register and also in each stack slot.
This is done with ``struct bpf_reg_state``, defined in include/linux/
bpf_verifier.h, which unifies tracking of scalar and pointer values.
Each
register state has a type, which is either NOT_INIT (the register has not been
written to), SCALAR_VALUE (some value which is not usable as a pointer), or a
pointer type. The types of pointers describe their base, as follows:

    PTR_TO_CTX
        Pointer to bpf_context.
    CONST_PTR_TO_MAP
        Pointer to struct bpf_map. "Const" because arithmetic
        on these pointers is forbidden.
    PTR_TO_MAP_VALUE
        Pointer to the value stored in a map element.
    PTR_TO_MAP_VALUE_OR_NULL
        Either a pointer to a map value, or NULL; map accesses
        (see section 'eBPF maps', below) return this type,
        which becomes a PTR_TO_MAP_VALUE when checked != NULL.
        Arithmetic on these pointers is forbidden.
    PTR_TO_STACK
        Frame pointer.
    PTR_TO_PACKET
        skb->data.
    PTR_TO_PACKET_END
        skb->data + headlen; arithmetic forbidden.
    PTR_TO_SOCKET
        Pointer to struct bpf_sock_ops, implicitly refcounted.
    PTR_TO_SOCKET_OR_NULL
        Either a pointer to a socket, or NULL; socket lookup
        returns this type, which becomes a PTR_TO_SOCKET when
        checked != NULL. PTR_TO_SOCKET is reference-counted,
        so programs must release the reference through the
        socket release function before the end of the program.
        Arithmetic on these pointers is forbidden.

However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and
'variable offset'. The former is used when an exactly-known value (e.g. an
immediate operand) is added to a pointer, while the latter is used for values
which are not exactly known. The variable offset is also used in
SCALAR_VALUEs, to track the range of possible values in the register.
The verifier's knowledge about the variable offset consists of:

* minimum and maximum values as unsigned
* minimum and maximum values as signed

* knowledge of the values of individual bits, in the form of a 'tnum': a u64
  'mask' and a u64 'value'. 1s in the mask represent bits whose value is
  unknown; 1s in the value represent bits known to be 1. Bits known to be 0
  have 0 in both mask and value; no bit should ever be 1 in both. For example,
  if a byte is read into a register from memory, the register's top 56 bits
  are known zero, while the low 8 are unknown - which is represented as the
  tnum (0x0; 0xff). If we then OR this with 0x40, we get (0x40; 0xbf), then
  if we add 1 we get (0x0; 0x1ff), because of potential carries.

Besides arithmetic, the register state can also be updated by conditional
branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true'
branch it will have a umin_value (unsigned minimum value) of 9, whereas in the
'false' branch it will have a umax_value of 8. A signed compare (with BPF_JSGT
or BPF_JSGE) would instead update the signed minimum/maximum values.
Information from the signed and unsigned bounds can be combined; for instance
if a value is first tested < 8 and then tested s> 4, the verifier will
conclude that the value is also > 4 and s< 8, since the bounds prevent
crossing the sign boundary.

PTR_TO_PACKETs with a variable offset part have an 'id', which is common to
all pointers sharing that same variable offset. This is important for packet
range checks: after adding a variable to a packet pointer register A, if you
then copy it to another register B and then add a constant 4 to A, both
registers will share the same 'id' but A will have a fixed offset of +4.
Then if A is
bounds-checked and found to be less than a PTR_TO_PACKET_END, register B is
now known to have a safe range of at least 4 bytes. See 'Direct packet
access', below, for more on PTR_TO_PACKET ranges.

The 'id' field is also used on PTR_TO_MAP_VALUE_OR_NULL, common to all copies
of the pointer returned from a map lookup. This means that when one copy is
checked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.

As well as range-checking, the tracked information is also used for enforcing
alignment of pointer accesses. For instance, on most systems the packet
pointer is 2 bytes after a 4-byte alignment. If a program adds 14 bytes to
that to jump over the Ethernet header, then reads IHL and adds (IHL * 4), the
resulting pointer will have a variable offset known to be 4n+2 for some n, so
adding the 2 bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized
accesses through that pointer are safe.

The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL,
common to all copies of the pointer returned from a socket lookup. This has
similar behaviour to the handling for
PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but it also handles reference
tracking for the pointer. PTR_TO_SOCKET implicitly represents a reference to
the corresponding ``struct sock``. To ensure that the reference is not
leaked, it is imperative to NULL-check the reference and, in the non-NULL
case, pass the valid reference to the socket release function.

Direct packet access
--------------------
In cls_bpf and act_bpf programs the verifier allows direct access to the
packet data via the skb->data and skb->data_end pointers.
Ex::

  1:  r4 = *(u32 *)(r1 +80)  /* load skb->data_end */
  2:  r3 = *(u32 *)(r1 +76)  /* load skb->data */
  3:  r5 = r3
  4:  r5 += 14
  5:  if r5 > r4 goto pc+16
  R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
  6:  r0 = *(u16 *)(r3 +12)  /* access 12 and 13 bytes of the packet */

This 2-byte load from the packet is safe to do, since the program author
did check ``if (skb->data + 14 > skb->data_end) goto err`` at insn #5, which
means that in the fall-through case the register R3 (which points to
skb->data) has at least 14 directly accessible bytes. The verifier marks it
as R3=pkt(id=0,off=0,r=14).
id=0 means that no additional variables were added to the register.
off=0 means that no additional constants were added.
r=14 is the range of safe access, which means that bytes [R3, R3 + 14) are ok.
Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
to the packet data, but constant 14 was added to the register, so
it now points to ``skb->data + 14`` and the accessible range is
[R5, R5 + 14 - 14) which is zero bytes.
More complex packet access may look like::

  R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
  6:  r0 = *(u8 *)(r3 +7)   /* load 7th byte from the packet */
  7:  r4 = *(u8 *)(r3 +12)
  8:  r4 *= 14
  9:  r3 = *(u32 *)(r1 +76) /* load skb->data */
  10:  r3 += r4
  11:  r2 = r1
  12:  r2 <<= 48
  13:  r2 >>= 48
  14:  r3 += r2
  15:  r2 = r3
  16:  r2 += 8
  17:  r1 = *(u32 *)(r1 +80) /* load skb->data_end */
  18:  if r2 > r1 goto pc+2
  R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
  19:  r1 = *(u8 *)(r3 +4)

The state of register R3 is R3=pkt(id=2,off=0,r=8).
id=2 means that two ``r3 += rX`` instructions were seen, so r3 points to some
offset within a packet, and since the program author did
``if (r3 + 8 > r1) goto err`` at insn #18, the safe range is [R3, R3 + 8).
The verifier only allows 'add'/'sub' operations on packet registers. Any other
operation will set the register state to 'SCALAR_VALUE' and it won't be
available for direct packet access.

The operation ``r3 += rX`` may overflow and become less than the original
skb->data, therefore the verifier has to prevent that. So when it sees a
``r3 += rX`` instruction and rX is more than a 16-bit value, any subsequent
bounds-check of r3 against skb->data_end will not give us 'range' information,
so attempts to read through the pointer will give an "invalid access to
packet" error.

Ex. after insn ``r4 = *(u8 *)(r3 +12)`` (insn #7 above) the state of r4 is
R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) which means that the upper 56
bits of the register are guaranteed to be zero, and nothing is known about the
lower 8 bits.
After insn ``r4 *= 14`` the state becomes
R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an
8-bit value by constant 14 will keep the upper 52 bits as zero, and the least
significant bit will also be zero as 14 is even. Similarly ``r2 >>= 48`` will
make R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is
not sign extending. This logic is implemented in the
adjust_reg_min_max_vals() function, which calls adjust_ptr_min_max_vals() for
adding a pointer to a scalar (or vice versa) and
adjust_scalar_min_max_vals() for operations on two scalars.

The end result is that the bpf program author can access the packet directly
using normal C code as::

  void *data = (void *)(long)skb->data;
  void *data_end = (void *)(long)skb->data_end;
  struct eth_hdr *eth = data;
  struct iphdr *iph = data + sizeof(*eth);
  struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);

  if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
	  return 0;
  if (eth->h_proto != htons(ETH_P_IP))
	  return 0;
  if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
	  return 0;
  if (udp->dest == 53 || udp->source == 9)
	  ...;

which makes such programs easier to write compared to the LD_ABS insn
and significantly faster.

eBPF maps
---------
'maps' is a generic storage of different types for sharing data between
kernel and userspace.
The maps are accessed from user space via the BPF syscall, which has commands:

- create a map with given type and attributes
  ``map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)``
  using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
  returns process-local file descriptor or negative error

- lookup key in a given map
  ``err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)``
  using attr->map_fd, attr->key, attr->value
  returns zero and stores found elem into value or negative error

- create or update key/value pair in a given map
  ``err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)``
  using attr->map_fd, attr->key, attr->value
  returns zero or negative error

- find and delete element by key in a given map
  ``err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)``
  using attr->map_fd, attr->key

- to delete map: close(fd)
  Exiting process will delete maps automatically

userspace programs use this syscall to create/access maps that eBPF programs
are concurrently updating.

maps can have different types: hash, array, bloom filter, radix-tree, etc.

The map is defined by:

- type
- max number of elements
- key size in bytes
- value size in bytes

Pruning
-------
The verifier does not actually walk all possible paths through the program.
For each new branch to analyse, the verifier looks at all the states it's
previously been in when at this instruction. If any of them contain the
current state as a subset, the branch is 'pruned' - that is, the fact that
the previous state was accepted implies the current state would be as well.
For instance, if in the
previous state, r1 held a packet-pointer, and in the current state, r1 holds
a packet-pointer with a range as long or longer and at least as strict an
alignment, then r1 is safe. Similarly, if r2 was NOT_INIT before, then it
can't have been used by any path from that point, so any value in r2
(including another NOT_INIT) is safe. The implementation is in the function
regsafe(). Pruning considers not only the registers but also the stack (and
any spilled registers it may hold). They must all be safe for the branch to
be pruned. This is implemented in states_equal().

Understanding eBPF verifier messages
------------------------------------

The following are a few examples of invalid eBPF programs and the verifier
error messages as seen in the log:

Program with unreachable instructions::

  static struct bpf_insn prog[] = {
  BPF_EXIT_INSN(),
  BPF_EXIT_INSN(),
  };

Error::

  unreachable insn 1

Program that reads an uninitialized register::

  BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
  BPF_EXIT_INSN(),

Error::

  0: (bf) r0 = r2
  R2 !read_ok

Program that doesn't initialize R0 before exiting::

  BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
  BPF_EXIT_INSN(),

Error::

  0: (bf) r2 = r1
  1: (95) exit
  R0 !read_ok

Program that accesses stack out of bounds::

  BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
  BPF_EXIT_INSN(),

Error::

  0: (7a) *(u64 *)(r10 +8) = 0
  invalid stack off=8 size=8

Program that doesn't initialize stack before passing its address into
function::

  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), 1519 - BPF_EXIT_INSN(), 1520 - 1521 - Error:: 1522 - 1523 - 0: (bf) r2 = r10 1524 - 1: (07) r2 += -8 1525 - 2: (b7) r1 = 0x0 1526 - 3: (85) call 1 1527 - invalid indirect read from stack off -8+0 size 8 1528 - 1529 - Program that uses invalid map_fd=0 while calling to map_lookup_elem() function:: 1530 - 1531 - BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), 1532 - BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 1533 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 1534 - BPF_LD_MAP_FD(BPF_REG_1, 0), 1535 - BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), 1536 - BPF_EXIT_INSN(), 1537 - 1538 - Error:: 1539 - 1540 - 0: (7a) *(u64 *)(r10 -8) = 0 1541 - 1: (bf) r2 = r10 1542 - 2: (07) r2 += -8 1543 - 3: (b7) r1 = 0x0 1544 - 4: (85) call 1 1545 - fd 0 is not pointing to valid bpf_map 1546 - 1547 - Program that doesn't check return value of map_lookup_elem() before accessing 1548 - map element:: 1549 - 1550 - BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), 1551 - BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 1552 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 1553 - BPF_LD_MAP_FD(BPF_REG_1, 0), 1554 - BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), 1555 - BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0), 1556 - BPF_EXIT_INSN(), 1557 - 1558 - Error:: 1559 - 1560 - 0: (7a) *(u64 *)(r10 -8) = 0 1561 - 1: (bf) r2 = r10 1562 - 2: (07) r2 += -8 1563 - 3: (b7) r1 = 0x0 1564 - 4: (85) call 1 1565 - 5: (7a) *(u64 *)(r0 +0) = 0 1566 - R0 invalid mem access 'map_value_or_null' 1567 - 1568 - Program that correctly checks map_lookup_elem() returned value for NULL, but 1569 - accesses the memory with incorrect alignment:: 1570 - 1571 - BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), 1572 - BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 1573 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 1574 - BPF_LD_MAP_FD(BPF_REG_1, 0), 1575 - BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), 1576 - BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), 1577 - 
BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0), 1578 - BPF_EXIT_INSN(), 1579 - 1580 - Error:: 1581 - 1582 - 0: (7a) *(u64 *)(r10 -8) = 0 1583 - 1: (bf) r2 = r10 1584 - 2: (07) r2 += -8 1585 - 3: (b7) r1 = 1 1586 - 4: (85) call 1 1587 - 5: (15) if r0 == 0x0 goto pc+1 1588 - R0=map_ptr R10=fp 1589 - 6: (7a) *(u64 *)(r0 +4) = 0 1590 - misaligned access off 4 size 8 1591 - 1592 - Program that correctly checks map_lookup_elem() returned value for NULL and 1593 - accesses memory with correct alignment in one side of 'if' branch, but fails 1594 - to do so in the other side of 'if' branch:: 1595 - 1596 - BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), 1597 - BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 1598 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 1599 - BPF_LD_MAP_FD(BPF_REG_1, 0), 1600 - BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), 1601 - BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), 1602 - BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0), 1603 - BPF_EXIT_INSN(), 1604 - BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1), 1605 - BPF_EXIT_INSN(), 1606 - 1607 - Error:: 1608 - 1609 - 0: (7a) *(u64 *)(r10 -8) = 0 1610 - 1: (bf) r2 = r10 1611 - 2: (07) r2 += -8 1612 - 3: (b7) r1 = 1 1613 - 4: (85) call 1 1614 - 5: (15) if r0 == 0x0 goto pc+2 1615 - R0=map_ptr R10=fp 1616 - 6: (7a) *(u64 *)(r0 +0) = 0 1617 - 7: (95) exit 1618 - 1619 - from 5 to 8: R0=imm0 R10=fp 1620 - 8: (7a) *(u64 *)(r0 +0) = 1 1621 - R0 invalid mem access 'imm' 1622 - 1623 - Program that performs a socket lookup then sets the pointer to NULL without 1624 - checking it:: 1625 - 1626 - BPF_MOV64_IMM(BPF_REG_2, 0), 1627 - BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), 1628 - BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 1629 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 1630 - BPF_MOV64_IMM(BPF_REG_3, 4), 1631 - BPF_MOV64_IMM(BPF_REG_4, 0), 1632 - BPF_MOV64_IMM(BPF_REG_5, 0), 1633 - BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), 1634 - BPF_MOV64_IMM(BPF_REG_0, 0), 1635 - BPF_EXIT_INSN(), 1636 - 1637 - Error:: 1638 - 1639 - 0: (b7) r2 = 0 1640 - 1: (63) *(u32 *)(r10 -8) 
= r2 1641 - 2: (bf) r2 = r10 1642 - 3: (07) r2 += -8 1643 - 4: (b7) r3 = 4 1644 - 5: (b7) r4 = 0 1645 - 6: (b7) r5 = 0 1646 - 7: (85) call bpf_sk_lookup_tcp#65 1647 - 8: (b7) r0 = 0 1648 - 9: (95) exit 1649 - Unreleased reference id=1, alloc_insn=7 1650 - 1651 - Program that performs a socket lookup but does not NULL-check the returned 1652 - value:: 1653 - 1654 - BPF_MOV64_IMM(BPF_REG_2, 0), 1655 - BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), 1656 - BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 1657 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 1658 - BPF_MOV64_IMM(BPF_REG_3, 4), 1659 - BPF_MOV64_IMM(BPF_REG_4, 0), 1660 - BPF_MOV64_IMM(BPF_REG_5, 0), 1661 - BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp), 1662 - BPF_EXIT_INSN(), 1663 - 1664 - Error:: 1665 - 1666 - 0: (b7) r2 = 0 1667 - 1: (63) *(u32 *)(r10 -8) = r2 1668 - 2: (bf) r2 = r10 1669 - 3: (07) r2 += -8 1670 - 4: (b7) r3 = 4 1671 - 5: (b7) r4 = 0 1672 - 6: (b7) r5 = 0 1673 - 7: (85) call bpf_sk_lookup_tcp#65 1674 - 8: (95) exit 1675 - Unreleased reference id=1, alloc_insn=7 1676 - 1677 653 Testing 1678 654 ------- 1679 655 1680 656 Next to the BPF toolchain, the kernel also ships a test module that contains 1681 - various test cases for classic and internal BPF that can be executed against 657 + various test cases for classic and eBPF that can be executed against 1682 658 the BPF interpreter and JIT compiler. It can be found in lib/test_bpf.c and 1683 659 enabled via Kconfig:: 1684 660
+1 -1
MAINTAINERS
···
 R:	Brendan Jackman <jackmanb@chromium.org>
 L:	bpf@vger.kernel.org
 S:	Maintained
-F:	Documentation/bpf/bpf_lsm.rst
+F:	Documentation/bpf/prog_lsm.rst
 F:	include/linux/bpf_lsm.h
 F:	kernel/bpf/bpf_lsm.c
 F:	security/bpf/
+4 -3
arch/arm/net/bpf_jit_32.c
···
 	[BPF_REG_9] = {STACK_OFFSET(BPF_R9_HI), STACK_OFFSET(BPF_R9_LO)},
 	/* Read only Frame Pointer to access Stack */
 	[BPF_REG_FP] = {STACK_OFFSET(BPF_FP_HI), STACK_OFFSET(BPF_FP_LO)},
-	/* Temporary Register for internal BPF JIT, can be used
+	/* Temporary Register for BPF JIT, can be used
 	 * for constant blindings and others.
 	 */
 	[TMP_REG_1] = {ARM_R7, ARM_R6},
···
 
 	/* tmp2[0] = array, tmp2[1] = index */
 
-	/* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	/*
+	 * if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
 	 *	goto out;
 	 * tail_call_cnt++;
 	 */
 	tc = arm_bpf_get_reg64(tcc, tmp, ctx);
 	emit(ARM_CMP_I(tc[0], hi), ctx);
 	_emit(ARM_COND_EQ, ARM_CMP_I(tc[1], lo), ctx);
-	_emit(ARM_COND_HI, ARM_B(jmp_offset), ctx);
+	_emit(ARM_COND_CS, ARM_B(jmp_offset), ctx);
 	emit(ARM_ADDS_I(tc[1], tc[1], 1), ctx);
 	emit(ARM_ADC_I(tc[0], tc[0], 0), ctx);
 	arm_bpf_put_reg64(tcc, tmp, ctx);
+4 -3
arch/arm64/net/bpf_jit_comp.c
···
 	[BPF_REG_9] = A64_R(22),
 	/* read-only frame pointer to access stack */
 	[BPF_REG_FP] = A64_R(25),
-	/* temporary registers for internal BPF JIT */
+	/* temporary registers for BPF JIT */
 	[TMP_REG_1] = A64_R(10),
 	[TMP_REG_2] = A64_R(11),
 	[TMP_REG_3] = A64_R(12),
···
 	emit(A64_CMP(0, r3, tmp), ctx);
 	emit(A64_B_(A64_COND_CS, jmp_offset), ctx);
 
-	/* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	/*
+	 * if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
 	 *     goto out;
 	 * tail_call_cnt++;
 	 */
 	emit_a64_mov_i64(tmp, MAX_TAIL_CALL_CNT, ctx);
 	emit(A64_CMP(1, tcc, tmp), ctx);
-	emit(A64_B_(A64_COND_HI, jmp_offset), ctx);
+	emit(A64_B_(A64_COND_CS, jmp_offset), ctx);
 	emit(A64_ADD_I(1, tcc, tcc, 1), ctx);
 
 	/* prog = array->ptrs[index];
+1 -2
arch/mips/net/bpf_jit_comp32.c
···
 	 * 16-byte area in the parent's stack frame. On a tail call, the
 	 * calling function jumps into the prologue after these instructions.
 	 */
-	emit(ctx, ori, MIPS_R_T9, MIPS_R_ZERO,
-	     min(MAX_TAIL_CALL_CNT + 1, 0xffff));
+	emit(ctx, ori, MIPS_R_T9, MIPS_R_ZERO, min(MAX_TAIL_CALL_CNT, 0xffff));
 	emit(ctx, sw, MIPS_R_T9, 0, MIPS_R_SP);
 
 	/*
+1 -1
arch/mips/net/bpf_jit_comp64.c
···
 	 * On a tail call, the calling function jumps into the prologue
 	 * after this instruction.
 	 */
-	emit(ctx, addiu, tc, MIPS_R_ZERO, min(MAX_TAIL_CALL_CNT + 1, 0xffff));
+	emit(ctx, ori, tc, MIPS_R_ZERO, min(MAX_TAIL_CALL_CNT, 0xffff));
 
 	/* === Entry-point for tail calls === */
 
+2 -2
arch/powerpc/net/bpf_jit_comp32.c
···
 	PPC_BCC(COND_GE, out);
 
 	/*
-	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	 * if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
 	 *   goto out;
 	 */
 	EMIT(PPC_RAW_CMPLWI(_R0, MAX_TAIL_CALL_CNT));
 	/* tail_call_cnt++; */
 	EMIT(PPC_RAW_ADDIC(_R0, _R0, 1));
-	PPC_BCC(COND_GT, out);
+	PPC_BCC(COND_GE, out);
 
 	/* prog = array->ptrs[index]; */
 	EMIT(PPC_RAW_RLWINM(_R3, b2p_index, 2, 0, 29));
+2 -2
arch/powerpc/net/bpf_jit_comp64.c
···
 	PPC_BCC(COND_GE, out);
 
 	/*
-	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	 * if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
 	 *   goto out;
 	 */
 	PPC_BPF_LL(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx));
 	EMIT(PPC_RAW_CMPLWI(b2p[TMP_REG_1], MAX_TAIL_CALL_CNT));
-	PPC_BCC(COND_GT, out);
+	PPC_BCC(COND_GE, out);
 
 	/*
 	 * tail_call_cnt++;
+2 -4
arch/riscv/net/bpf_jit_comp32.c
···
 	emit_bcc(BPF_JGE, lo(idx_reg), RV_REG_T1, off, ctx);
 
 	/*
-	 * temp_tcc = tcc - 1;
-	 * if (tcc < 0)
+	 * if (--tcc < 0)
 	 *   goto out;
 	 */
-	emit(rv_addi(RV_REG_T1, RV_REG_TCC, -1), ctx);
+	emit(rv_addi(RV_REG_TCC, RV_REG_TCC, -1), ctx);
 	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_bcc(BPF_JSLT, RV_REG_TCC, RV_REG_ZERO, off, ctx);
 
···
 	if (is_12b_check(off, insn))
 		return -1;
 	emit(rv_lw(RV_REG_T0, off, RV_REG_T0), ctx);
-	emit(rv_addi(RV_REG_TCC, RV_REG_T1, 0), ctx);
 	/* Epilogue jumps to *(t0 + 4). */
 	__build_epilogue(true, ctx);
 	return 0;
+3 -4
arch/riscv/net/bpf_jit_comp64.c
···
 	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_branch(BPF_JGE, RV_REG_A2, RV_REG_T1, off, ctx);
 
-	/* if (TCC-- < 0)
+	/* if (--TCC < 0)
 	 *     goto out;
 	 */
-	emit_addi(RV_REG_T1, tcc, -1, ctx);
+	emit_addi(RV_REG_TCC, tcc, -1, ctx);
 	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
-	emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);
+	emit_branch(BPF_JSLT, RV_REG_TCC, RV_REG_ZERO, off, ctx);
 
 	/* prog = array->ptrs[index];
 	 * if (!prog)
···
 	if (is_12b_check(off, insn))
 		return -1;
 	emit_ld(RV_REG_T3, off, RV_REG_T2, ctx);
-	emit_mv(RV_REG_TCC, RV_REG_T1, ctx);
 	__build_epilogue(true, ctx);
 	return 0;
 }
+3 -3
arch/s390/net/bpf_jit_comp.c
···
 		 jit->prg);
 
 	/*
-	 * if (tail_call_cnt++ > MAX_TAIL_CALL_CNT)
+	 * if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
 	 *         goto out;
 	 */
 
···
 	EMIT4_IMM(0xa7080000, REG_W0, 1);
 	/* laal %w1,%w0,off(%r15) */
 	EMIT6_DISP_LH(0xeb000000, 0x00fa, REG_W1, REG_W0, REG_15, off);
-	/* clij %w1,MAX_TAIL_CALL_CNT,0x2,out */
+	/* clij %w1,MAX_TAIL_CALL_CNT-1,0x2,out */
 	patch_2_clij = jit->prg;
-	EMIT6_PCREL_RIEC(0xec000000, 0x007f, REG_W1, MAX_TAIL_CALL_CNT,
+	EMIT6_PCREL_RIEC(0xec000000, 0x007f, REG_W1, MAX_TAIL_CALL_CNT - 1,
 			 2, jit->prg);
 
 	/*
+2 -2
arch/sparc/net/bpf_jit_comp_64.c
···
 
 	[BPF_REG_AX] = G7,
 
-	/* temporary register for internal BPF JIT */
+	/* temporary register for BPF JIT */
 	[TMP_REG_1] = G1,
 	[TMP_REG_2] = G2,
 	[TMP_REG_3] = G3,
···
 	emit(LD32 | IMMED | RS1(SP) | S13(off) | RD(tmp), ctx);
 	emit_cmpi(tmp, MAX_TAIL_CALL_CNT, ctx);
 #define OFFSET2 13
-	emit_branch(BGU, ctx->idx, ctx->idx + OFFSET2, ctx);
+	emit_branch(BGEU, ctx->idx, ctx->idx + OFFSET2, ctx);
 	emit_nop(ctx);
 
 	emit_alu_K(ADD, tmp, 1, ctx);
+7 -7
arch/x86/net/bpf_jit_comp.c
···
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * bpf_jit_comp.c: BPF JIT compiler
+ * BPF JIT compiler
  *
  * Copyright (C) 2011-2013 Eric Dumazet (eric.dumazet@gmail.com)
- * Internal BPF Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ * Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
  */
 #include <linux/netdevice.h>
 #include <linux/filter.h>
···
  * ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
  *	if (index >= array->map.max_entries)
  *		goto out;
- *	if (++tail_call_cnt > MAX_TAIL_CALL_CNT)
+ *	if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
  *		goto out;
  *	prog = array->ptrs[index];
  *	if (prog == NULL)
···
 	EMIT2(X86_JBE, offset);                   /* jbe out */
 
 	/*
-	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	 * if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
 	 *	goto out;
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
 
 	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
-	EMIT2(X86_JA, offset);                    /* ja out */
+	EMIT2(X86_JAE, offset);                   /* jae out */
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
 
···
 	int offset;
 
 	/*
-	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	 * if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
 	 *	goto out;
 	 */
 	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
 	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
 
 	offset = ctx->tail_call_direct_label - (prog + 2 - start);
-	EMIT2(X86_JA, offset);                    /* ja out */
+	EMIT2(X86_JAE, offset);                   /* jae out */
 	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
 	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
 
+2 -2
arch/x86/net/bpf_jit_comp32.c
···
 	EMIT2(IA32_JBE, jmp_label(jmp_label1, 2));
 
 	/*
-	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	 * if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
 	 *     goto out;
 	 */
 	lo = (u32)MAX_TAIL_CALL_CNT;
···
 	/* cmp ecx,lo */
 	EMIT3(0x83, add_1reg(0xF8, IA32_ECX), lo);
 
-	/* ja out */
+	/* jae out */
 	EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
 
 	/* add eax,0x1 */
+10 -1
include/linux/bpf.h
···
 };
 
 #define BPF_COMPLEXITY_LIMIT_INSNS	1000000 /* yes. 1M insns */
-#define MAX_TAIL_CALL_CNT 32
+#define MAX_TAIL_CALL_CNT 33
 
 #define BPF_F_ACCESS_MASK	(BPF_F_RDONLY |		\
 				 BPF_F_RDONLY_PROG |	\
···
 const struct btf_func_model *
 bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
 			 const struct bpf_insn *insn);
+struct bpf_core_ctx {
+	struct bpf_verifier_log *log;
+	const struct btf *btf;
+};
+
+int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
+		   int relo_idx, void *insn);
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
···
 extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
 extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
 extern const struct bpf_func_proto bpf_find_vma_proto;
+extern const struct bpf_func_proto bpf_loop_proto;
 
 const struct bpf_func_proto *tracing_prog_func_proto(
 	enum bpf_func_id func_id, const struct bpf_prog *prog);
+7
include/linux/bpf_verifier.h
···
 	       log->level == BPF_LOG_KERNEL);
 }
 
+static inline bool
+bpf_verifier_log_attr_valid(const struct bpf_verifier_log *log)
+{
+	return log->len_total >= 128 && log->len_total <= UINT_MAX >> 2 &&
+	       log->level && log->ubuf && !(log->level & ~BPF_LOG_MASK);
+}
+
 #define BPF_MAX_SUBPROGS 256
 
 struct bpf_subprog_info {
+85 -4
include/linux/btf.h
···
 	return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM;
 }
 
+static inline bool str_is_empty(const char *s)
+{
+	return !s || !s[0];
+}
+
+static inline u16 btf_kind(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info);
+}
+
+static inline bool btf_is_enum(const struct btf_type *t)
+{
+	return btf_kind(t) == BTF_KIND_ENUM;
+}
+
+static inline bool btf_is_composite(const struct btf_type *t)
+{
+	u16 kind = btf_kind(t);
+
+	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
+}
+
+static inline bool btf_is_array(const struct btf_type *t)
+{
+	return btf_kind(t) == BTF_KIND_ARRAY;
+}
+
+static inline bool btf_is_int(const struct btf_type *t)
+{
+	return btf_kind(t) == BTF_KIND_INT;
+}
+
+static inline bool btf_is_ptr(const struct btf_type *t)
+{
+	return btf_kind(t) == BTF_KIND_PTR;
+}
+
+static inline u8 btf_int_offset(const struct btf_type *t)
+{
+	return BTF_INT_OFFSET(*(u32 *)(t + 1));
+}
+
+static inline u8 btf_int_encoding(const struct btf_type *t)
+{
+	return BTF_INT_ENCODING(*(u32 *)(t + 1));
+}
+
 static inline bool btf_type_is_scalar(const struct btf_type *t)
 {
 	return btf_type_is_int(t) || btf_type_is_enum(t);
···
 	return BTF_INFO_VLEN(t->info);
 }
 
+static inline u16 btf_vlen(const struct btf_type *t)
+{
+	return btf_type_vlen(t);
+}
+
 static inline u16 btf_func_linkage(const struct btf_type *t)
 {
 	return BTF_INFO_VLEN(t->info);
···
 	return BTF_INFO_KFLAG(t->info);
 }
 
-static inline u32 btf_member_bit_offset(const struct btf_type *struct_type,
-					const struct btf_member *member)
+static inline u32 __btf_member_bit_offset(const struct btf_type *struct_type,
+					  const struct btf_member *member)
 {
 	return btf_type_kflag(struct_type) ? BTF_MEMBER_BIT_OFFSET(member->offset)
 					   : member->offset;
 }
 
-static inline u32 btf_member_bitfield_size(const struct btf_type *struct_type,
-					   const struct btf_member *member)
+static inline u32 __btf_member_bitfield_size(const struct btf_type *struct_type,
+					     const struct btf_member *member)
 {
 	return btf_type_kflag(struct_type) ? BTF_MEMBER_BITFIELD_SIZE(member->offset)
 					   : 0;
 }
 
+static inline struct btf_member *btf_members(const struct btf_type *t)
+{
+	return (struct btf_member *)(t + 1);
+}
+
+static inline u32 btf_member_bit_offset(const struct btf_type *t, u32 member_idx)
+{
+	const struct btf_member *m = btf_members(t) + member_idx;
+
+	return __btf_member_bit_offset(t, m);
+}
+
+static inline u32 btf_member_bitfield_size(const struct btf_type *t, u32 member_idx)
+{
+	const struct btf_member *m = btf_members(t) + member_idx;
+
+	return __btf_member_bitfield_size(t, m);
+}
+
 static inline const struct btf_member *btf_type_member(const struct btf_type *t)
 {
 	return (const struct btf_member *)(t + 1);
+}
+
+static inline struct btf_array *btf_array(const struct btf_type *t)
+{
+	return (struct btf_array *)(t + 1);
+}
+
+static inline struct btf_enum *btf_enum(const struct btf_type *t)
+{
+	return (struct btf_enum *)(t + 1);
 }
 
 static inline const struct btf_var_secinfo *btf_type_var_secinfo(
+103 -2
include/uapi/linux/bpf.h
···
 		/* or valid module BTF object fd or 0 to attach to vmlinux */
 		__u32		attach_btf_obj_fd;
 	};
-	__u32		:32;		/* pad */
+	__u32		core_relo_cnt;	/* number of bpf_core_relo */
 	__aligned_u64	fd_array;	/* array of FDs */
+	__aligned_u64	core_relos;
+	__u32		core_relo_rec_size; /* sizeof(struct bpf_core_relo) */
 };
 
 struct { /* anonymous struct used by BPF_OBJ_* commands */
···
  *		if the maximum number of tail calls has been reached for this
  *		chain of programs. This limit is defined in the kernel by the
  *		macro **MAX_TAIL_CALL_CNT** (not accessible to user space),
- *		which is currently set to 32.
+ *		which is currently set to 33.
  *	Return
  *		0 on success, or a negative error in case of failure.
 *
···
  *		**-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
  *		**-EBUSY** if failed to try lock mmap_lock.
  *		**-EINVAL** for invalid **flags**.
+ *
+ * long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx, u64 flags)
+ *	Description
+ *		For **nr_loops**, call **callback_fn** function
+ *		with **callback_ctx** as the context parameter.
+ *		The **callback_fn** should be a static function and
+ *		the **callback_ctx** should be a pointer to the stack.
+ *		The **flags** is used to control certain aspects of the helper.
+ *		Currently, the **flags** must be 0. Currently, nr_loops is
+ *		limited to 1 << 23 (~8 million) loops.
+ *
+ *		long (\*callback_fn)(u32 index, void \*ctx);
+ *
+ *		where **index** is the current index in the loop. The index
+ *		is zero-indexed.
+ *
+ *		If **callback_fn** returns 0, the helper will continue to the next
+ *		loop. If return value is 1, the helper will skip the rest of
+ *		the loops and return. Other return values are not used now,
+ *		and will be rejected by the verifier.
+ *
+ *	Return
+ *		The number of loops performed, **-EINVAL** for invalid **flags**,
+ *		**-E2BIG** if **nr_loops** exceeds the maximum number of loops.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
···
 	FN(skc_to_unix_sock),		\
 	FN(kallsyms_lookup_name),	\
 	FN(find_vma),			\
+	FN(loop),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
···
 	BTF_F_NONAME	=	(1ULL << 1),
 	BTF_F_PTR_RAW	=	(1ULL << 2),
 	BTF_F_ZERO	=	(1ULL << 3),
+};
+
+/* bpf_core_relo_kind encodes which aspect of captured field/type/enum value
+ * has to be adjusted by relocations. It is emitted by llvm and passed to
+ * libbpf and later to the kernel.
+ */
+enum bpf_core_relo_kind {
+	BPF_CORE_FIELD_BYTE_OFFSET = 0,      /* field byte offset */
+	BPF_CORE_FIELD_BYTE_SIZE = 1,        /* field size in bytes */
+	BPF_CORE_FIELD_EXISTS = 2,           /* field existence in target kernel */
+	BPF_CORE_FIELD_SIGNED = 3,           /* field signedness (0 - unsigned, 1 - signed) */
+	BPF_CORE_FIELD_LSHIFT_U64 = 4,       /* bitfield-specific left bitshift */
+	BPF_CORE_FIELD_RSHIFT_U64 = 5,       /* bitfield-specific right bitshift */
+	BPF_CORE_TYPE_ID_LOCAL = 6,          /* type ID in local BPF object */
+	BPF_CORE_TYPE_ID_TARGET = 7,         /* type ID in target kernel */
+	BPF_CORE_TYPE_EXISTS = 8,            /* type existence in target kernel */
+	BPF_CORE_TYPE_SIZE = 9,              /* type size in bytes */
+	BPF_CORE_ENUMVAL_EXISTS = 10,        /* enum value existence in target kernel */
+	BPF_CORE_ENUMVAL_VALUE = 11,         /* enum value integer value */
+};
+
+/*
+ * "struct bpf_core_relo" is used to pass relocation data form LLVM to libbpf
+ * and from libbpf to the kernel.
+ *
+ * CO-RE relocation captures the following data:
+ * - insn_off - instruction offset (in bytes) within a BPF program that needs
+ *   its insn->imm field to be relocated with actual field info;
+ * - type_id - BTF type ID of the "root" (containing) entity of a relocatable
+ *   type or field;
+ * - access_str_off - offset into corresponding .BTF string section. String
+ *   interpretation depends on specific relocation kind:
+ *     - for field-based relocations, string encodes an accessed field using
+ *       a sequence of field and array indices, separated by colon (:). It's
+ *       conceptually very close to LLVM's getelementptr ([0]) instruction's
+ *       arguments for identifying offset to a field.
+ *     - for type-based relocations, strings is expected to be just "0";
+ *     - for enum value-based relocations, string contains an index of enum
+ *       value within its enum type;
+ * - kind - one of enum bpf_core_relo_kind;
+ *
+ * Example:
+ *   struct sample {
+ *       int a;
+ *       struct {
+ *           int b[10];
+ *       };
+ *   };
+ *
+ *   struct sample *s = ...;
+ *   int *x = &s->a;     // encoded as "0:0" (a is field #0)
+ *   int *y = &s->b[5];  // encoded as "0:1:0:5" (anon struct is field #1,
+ *                       // b is field #0 inside anon struct, accessing elem #5)
+ *   int *z = &s[10]->b; // encoded as "10:1" (ptr is used as an array)
+ *
+ * type_id for all relocs in this example will capture BTF type id of
+ * `struct sample`.
+ *
+ * Such relocation is emitted when using __builtin_preserve_access_index()
+ * Clang built-in, passing expression that captures field address, e.g.:
+ *
+ * bpf_probe_read(&dst, sizeof(dst),
+ *		  __builtin_preserve_access_index(&src->a.b.c));
+ *
+ * In this case Clang will emit field relocation recording necessary data to
+ * be able to find offset of embedded `a.b.c` field within `src` struct.
+ *
+ *   [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
+ */
+struct bpf_core_relo {
+	__u32 insn_off;
+	__u32 type_id;
+	__u32 access_str_off;
+	enum bpf_core_relo_kind kind;
 };
 
 #endif /* _UAPI__LINUX_BPF_H__ */
+4
kernel/bpf/Makefile
···
 obj-${CONFIG_BPF_LSM} += bpf_lsm.o
 endif
 obj-$(CONFIG_BPF_PRELOAD) += preload/
+
+obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
+$(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
+	$(call if_changed_rule,cc_o_c)
+35
kernel/bpf/bpf_iter.c
···
 	.arg3_type	= ARG_PTR_TO_STACK_OR_NULL,
 	.arg4_type	= ARG_ANYTHING,
 };
+
+/* maximum number of loops */
+#define MAX_LOOPS	BIT(23)
+
+BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
+	   u64, flags)
+{
+	bpf_callback_t callback = (bpf_callback_t)callback_fn;
+	u64 ret;
+	u32 i;
+
+	if (flags)
+		return -EINVAL;
+	if (nr_loops > MAX_LOOPS)
+		return -E2BIG;
+
+	for (i = 0; i < nr_loops; i++) {
+		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
+		/* return value: 0 - continue, 1 - stop and return */
+		if (ret)
+			return i + 1;
+	}
+
+	return i;
+}
+
+const struct bpf_func_proto bpf_loop_proto = {
+	.func		= bpf_loop,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_PTR_TO_FUNC,
+	.arg3_type	= ARG_PTR_TO_STACK_OR_NULL,
+	.arg4_type	= ARG_ANYTHING,
+};
+3 -3
kernel/bpf/bpf_struct_ops.c
···
 			break;
 		}
 
-		if (btf_member_bitfield_size(t, member)) {
+		if (__btf_member_bitfield_size(t, member)) {
 			pr_warn("bit field member %s in struct %s is not supported\n",
 				mname, st_ops->name);
 			break;
···
 	const struct btf_type *mtype;
 
 	for_each_member(i, t, member) {
-		moff = btf_member_bit_offset(t, member) / 8;
+		moff = __btf_member_bit_offset(t, member) / 8;
 		if (moff > prev_mend &&
 		    memchr_inv(data + prev_mend, 0, moff - prev_mend))
 			return -EINVAL;
···
 		struct bpf_prog *prog;
 		u32 moff;
 
-		moff = btf_member_bit_offset(t, member) / 8;
+		moff = __btf_member_bit_offset(t, member) / 8;
 		ptype = btf_type_resolve_ptr(btf_vmlinux, member->type, NULL);
 		if (ptype == module_type) {
 			if (*(void **)(udata + moff))
+398 -12
kernel/bpf/btf.c
···
 #include <linux/kobject.h>
 #include <linux/sysfs.h>
 #include <net/sock.h>
+#include "../tools/lib/bpf/relo_core.h"
 
 /* BTF (BPF Type Format) is the meta data format which describes
  * the data types of BPF program/map. Hence, it basically focus
···
 	const char *ptr_suffix = &ptr_suffixes[strlen(ptr_suffixes)];
 	const char *name = NULL, *prefix = "", *parens = "";
 	const struct btf_member *m = show->state.member;
-	const struct btf_type *t = show->state.type;
+	const struct btf_type *t;
 	const struct btf_array *array;
 	u32 id = show->state.type_id;
 	const char *member = NULL;
···
 		return -EINVAL;
 	}
 
-	offset = btf_member_bit_offset(t, member);
+	offset = __btf_member_bit_offset(t, member);
 	if (is_union && offset) {
 		btf_verifier_log_member(env, t, member,
 					"Invalid member bits_offset");
···
 		if (off != -ENOENT)
 			/* only one such field is allowed */
 			return -E2BIG;
-		off = btf_member_bit_offset(t, member);
+		off = __btf_member_bit_offset(t, member);
 		if (off % 8)
 			/* valid C code cannot generate such BTF */
 			return -EINVAL;
···
 
 		btf_show_start_member(show, member);
 
-		member_offset = btf_member_bit_offset(t, member);
-		bitfield_size = btf_member_bitfield_size(t, member);
+		member_offset = __btf_member_bit_offset(t, member);
+		bitfield_size = __btf_member_bitfield_size(t, member);
 		bytes_offset = BITS_ROUNDDOWN_BYTES(member_offset);
 		bits8_offset = BITS_PER_BYTE_MASKED(member_offset);
 		if (bitfield_size) {
···
 	log->len_total = log_size;
 
 	/* log attributes have to be sane */
-	if (log->len_total < 128 || log->len_total > UINT_MAX >> 8 ||
-	    !log->level || !log->ubuf) {
+	if (!bpf_verifier_log_attr_valid(log)) {
 		err = -EINVAL;
 		goto errout;
 	}
···
 		if (array_elem->nelems != 0)
 			goto error;
 
-		moff = btf_member_bit_offset(t, member) / 8;
+		moff = __btf_member_bit_offset(t, member) / 8;
 		if (off < moff)
 			goto error;
···
 
 	for_each_member(i, t, member) {
 		/* offset of the field in bytes */
-		moff = btf_member_bit_offset(t, member) / 8;
+		moff = __btf_member_bit_offset(t, member) / 8;
 		if (off + size <= moff)
 			/* won't find anything, field is already too far */
 			break;
 
-		if (btf_member_bitfield_size(t, member)) {
-			u32 end_bit = btf_member_bit_offset(t, member) +
-				btf_member_bitfield_size(t, member);
+		if (__btf_member_bitfield_size(t, member)) {
+			u32 end_bit = __btf_member_bit_offset(t, member) +
+				__btf_member_bitfield_size(t, member);
 
 			/* off <= moff instead of off == moff because clang
 			 * does not generate a BTF member for anonymous
···
 	return len;
 }
 
+static void purge_cand_cache(struct btf *btf);
+
 static int btf_module_notify(struct notifier_block *nb, unsigned long op,
 			     void *module)
 {
···
 		goto out;
 	}
 
+	purge_cand_cache(NULL);
 	mutex_lock(&btf_module_mutex);
 	btf_mod->module = module;
 	btf_mod->btf = btf;
···
 		list_del(&btf_mod->list);
 		if (btf_mod->sysfs_attr)
 			sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
+		purge_cand_cache(btf_mod->btf);
 		btf_put(btf_mod->btf);
 		kfree(btf_mod->sysfs_attr);
 		kfree(btf_mod);
···
 DEFINE_KFUNC_BTF_ID_LIST(prog_test_kfunc_list);
 
 #endif
+
+int bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id,
+			      const struct btf *targ_btf, __u32 targ_id)
+{
+	return -EOPNOTSUPP;
+}
+
+static bool bpf_core_is_flavor_sep(const char *s)
+{
+	/* check X___Y name pattern, where X and Y are not underscores */
+	return s[0] != '_' &&				      /* X */
+	       s[1] == '_' && s[2] == '_' && s[3] == '_' &&   /* ___ */
+	       s[4] != '_';				      /* Y */
+}
+
+size_t bpf_core_essential_name_len(const char *name)
+{
+	size_t n = strlen(name);
+	int i;
+
+	for (i = n - 5; i >= 0; i--) {
+		if (bpf_core_is_flavor_sep(name + i))
+			return i + 1;
+	}
+	return n;
+}
+
+struct bpf_cand_cache {
+	const char *name;
+	u32 name_len;
+	u16 kind;
+	u16 cnt;
+	struct {
+		const struct btf *btf;
+		u32 id;
+	} cands[];
+};
+
+static void bpf_free_cands(struct bpf_cand_cache *cands)
+{
+	if (!cands->cnt)
+		/* empty candidate array was allocated on stack */
+		return;
+	kfree(cands);
+}
+
+static void bpf_free_cands_from_cache(struct bpf_cand_cache *cands)
+{
+	kfree(cands->name);
+	kfree(cands);
+}
+
+#define VMLINUX_CAND_CACHE_SIZE 31
+static struct bpf_cand_cache *vmlinux_cand_cache[VMLINUX_CAND_CACHE_SIZE];
+
+#define MODULE_CAND_CACHE_SIZE 31
+static struct bpf_cand_cache *module_cand_cache[MODULE_CAND_CACHE_SIZE];
+
+static DEFINE_MUTEX(cand_cache_mutex);
+
+static void __print_cand_cache(struct bpf_verifier_log *log,
+			       struct bpf_cand_cache **cache,
+			       int cache_size)
+{
+	struct bpf_cand_cache *cc;
+	int i, j;
+
+	for (i = 0; i < cache_size; i++) {
+		cc = cache[i];
+		if (!cc)
+			continue;
+		bpf_log(log, "[%d]%s(", i, cc->name);
+		for (j = 0; j < cc->cnt; j++) {
+			bpf_log(log, "%d", cc->cands[j].id);
+			if (j < cc->cnt - 1)
+				bpf_log(log, " ");
+		}
+		bpf_log(log, "), ");
+	}
+}
+
+static void print_cand_cache(struct bpf_verifier_log *log)
+{
+	mutex_lock(&cand_cache_mutex);
+	bpf_log(log, "vmlinux_cand_cache:");
+	__print_cand_cache(log, vmlinux_cand_cache, VMLINUX_CAND_CACHE_SIZE);
+	bpf_log(log, "\nmodule_cand_cache:");
+	__print_cand_cache(log, module_cand_cache, MODULE_CAND_CACHE_SIZE);
+	bpf_log(log, "\n");
+	mutex_unlock(&cand_cache_mutex);
+}
+
+static u32 hash_cands(struct bpf_cand_cache *cands)
+{
+	return jhash(cands->name, cands->name_len, 0);
+}
+
+static struct bpf_cand_cache *check_cand_cache(struct bpf_cand_cache *cands,
+					       struct bpf_cand_cache **cache,
+					       int cache_size)
+{
+	struct bpf_cand_cache *cc = cache[hash_cands(cands) % cache_size];
+
+	if (cc && cc->name_len == cands->name_len &&
+	    !strncmp(cc->name, cands->name, cands->name_len))
+		return cc;
+	return NULL;
+}
+
+static size_t sizeof_cands(int cnt)
+{
+	return offsetof(struct bpf_cand_cache, cands[cnt]);
+}
+
+static struct bpf_cand_cache *populate_cand_cache(struct bpf_cand_cache *cands,
+						  struct bpf_cand_cache **cache,
+						  int cache_size)
+{
+	struct bpf_cand_cache **cc = &cache[hash_cands(cands) % cache_size], *new_cands;
+
+	if (*cc) {
+		bpf_free_cands_from_cache(*cc);
+		*cc = NULL;
+	}
+	new_cands = kmalloc(sizeof_cands(cands->cnt), GFP_KERNEL);
+	if (!new_cands) {
+		bpf_free_cands(cands);
+		return ERR_PTR(-ENOMEM);
+	}
+	memcpy(new_cands, cands, sizeof_cands(cands->cnt));
+	/* strdup the name, since it will stay in cache.
+	 * the cands->name points to strings in prog's BTF and the prog can be unloaded.
6541 + */ 6542 + new_cands->name = kmemdup_nul(cands->name, cands->name_len, GFP_KERNEL); 6543 + bpf_free_cands(cands); 6544 + if (!new_cands->name) { 6545 + kfree(new_cands); 6546 + return ERR_PTR(-ENOMEM); 6547 + } 6548 + *cc = new_cands; 6549 + return new_cands; 6550 + } 6551 + 6552 + #ifdef CONFIG_DEBUG_INFO_BTF_MODULES 6553 + static void __purge_cand_cache(struct btf *btf, struct bpf_cand_cache **cache, 6554 + int cache_size) 6555 + { 6556 + struct bpf_cand_cache *cc; 6557 + int i, j; 6558 + 6559 + for (i = 0; i < cache_size; i++) { 6560 + cc = cache[i]; 6561 + if (!cc) 6562 + continue; 6563 + if (!btf) { 6564 + /* when new module is loaded purge all of module_cand_cache, 6565 + * since new module might have candidates with the name 6566 + * that matches cached cands. 6567 + */ 6568 + bpf_free_cands_from_cache(cc); 6569 + cache[i] = NULL; 6570 + continue; 6571 + } 6572 + /* when module is unloaded purge cache entries 6573 + * that match module's btf 6574 + */ 6575 + for (j = 0; j < cc->cnt; j++) 6576 + if (cc->cands[j].btf == btf) { 6577 + bpf_free_cands_from_cache(cc); 6578 + cache[i] = NULL; 6579 + break; 6580 + } 6581 + } 6582 + 6583 + } 6584 + 6585 + static void purge_cand_cache(struct btf *btf) 6586 + { 6587 + mutex_lock(&cand_cache_mutex); 6588 + __purge_cand_cache(btf, module_cand_cache, MODULE_CAND_CACHE_SIZE); 6589 + mutex_unlock(&cand_cache_mutex); 6590 + } 6591 + #endif 6592 + 6593 + static struct bpf_cand_cache * 6594 + bpf_core_add_cands(struct bpf_cand_cache *cands, const struct btf *targ_btf, 6595 + int targ_start_id) 6596 + { 6597 + struct bpf_cand_cache *new_cands; 6598 + const struct btf_type *t; 6599 + const char *targ_name; 6600 + size_t targ_essent_len; 6601 + int n, i; 6602 + 6603 + n = btf_nr_types(targ_btf); 6604 + for (i = targ_start_id; i < n; i++) { 6605 + t = btf_type_by_id(targ_btf, i); 6606 + if (btf_kind(t) != cands->kind) 6607 + continue; 6608 + 6609 + targ_name = btf_name_by_offset(targ_btf, t->name_off); 6610 + if (!targ_name) 
6611 + continue; 6612 + 6613 + /* the resched point is before strncmp to make sure that search 6614 + * for non-existing name will have a chance to schedule(). 6615 + */ 6616 + cond_resched(); 6617 + 6618 + if (strncmp(cands->name, targ_name, cands->name_len) != 0) 6619 + continue; 6620 + 6621 + targ_essent_len = bpf_core_essential_name_len(targ_name); 6622 + if (targ_essent_len != cands->name_len) 6623 + continue; 6624 + 6625 + /* most of the time there is only one candidate for a given kind+name pair */ 6626 + new_cands = kmalloc(sizeof_cands(cands->cnt + 1), GFP_KERNEL); 6627 + if (!new_cands) { 6628 + bpf_free_cands(cands); 6629 + return ERR_PTR(-ENOMEM); 6630 + } 6631 + 6632 + memcpy(new_cands, cands, sizeof_cands(cands->cnt)); 6633 + bpf_free_cands(cands); 6634 + cands = new_cands; 6635 + cands->cands[cands->cnt].btf = targ_btf; 6636 + cands->cands[cands->cnt].id = i; 6637 + cands->cnt++; 6638 + } 6639 + return cands; 6640 + } 6641 + 6642 + static struct bpf_cand_cache * 6643 + bpf_core_find_cands(struct bpf_core_ctx *ctx, u32 local_type_id) 6644 + { 6645 + struct bpf_cand_cache *cands, *cc, local_cand = {}; 6646 + const struct btf *local_btf = ctx->btf; 6647 + const struct btf_type *local_type; 6648 + const struct btf *main_btf; 6649 + size_t local_essent_len; 6650 + struct btf *mod_btf; 6651 + const char *name; 6652 + int id; 6653 + 6654 + main_btf = bpf_get_btf_vmlinux(); 6655 + if (IS_ERR(main_btf)) 6656 + return (void *)main_btf; 6657 + 6658 + local_type = btf_type_by_id(local_btf, local_type_id); 6659 + if (!local_type) 6660 + return ERR_PTR(-EINVAL); 6661 + 6662 + name = btf_name_by_offset(local_btf, local_type->name_off); 6663 + if (str_is_empty(name)) 6664 + return ERR_PTR(-EINVAL); 6665 + local_essent_len = bpf_core_essential_name_len(name); 6666 + 6667 + cands = &local_cand; 6668 + cands->name = name; 6669 + cands->kind = btf_kind(local_type); 6670 + cands->name_len = local_essent_len; 6671 + 6672 + cc = check_cand_cache(cands, vmlinux_cand_cache, 
VMLINUX_CAND_CACHE_SIZE); 6673 + /* cands is a pointer to stack here */ 6674 + if (cc) { 6675 + if (cc->cnt) 6676 + return cc; 6677 + goto check_modules; 6678 + } 6679 + 6680 + /* Attempt to find target candidates in vmlinux BTF first */ 6681 + cands = bpf_core_add_cands(cands, main_btf, 1); 6682 + if (IS_ERR(cands)) 6683 + return cands; 6684 + 6685 + /* cands is a pointer to kmalloced memory here if cands->cnt > 0 */ 6686 + 6687 + /* populate cache even when cands->cnt == 0 */ 6688 + cc = populate_cand_cache(cands, vmlinux_cand_cache, VMLINUX_CAND_CACHE_SIZE); 6689 + if (IS_ERR(cc)) 6690 + return cc; 6691 + 6692 + /* if vmlinux BTF has any candidate, don't go for module BTFs */ 6693 + if (cc->cnt) 6694 + return cc; 6695 + 6696 + check_modules: 6697 + /* cands is a pointer to stack here and cands->cnt == 0 */ 6698 + cc = check_cand_cache(cands, module_cand_cache, MODULE_CAND_CACHE_SIZE); 6699 + if (cc) 6700 + /* if cache has it return it even if cc->cnt == 0 */ 6701 + return cc; 6702 + 6703 + /* If candidate is not found in vmlinux's BTF then search in module's BTFs */ 6704 + spin_lock_bh(&btf_idr_lock); 6705 + idr_for_each_entry(&btf_idr, mod_btf, id) { 6706 + if (!btf_is_module(mod_btf)) 6707 + continue; 6708 + /* linear search could be slow hence unlock/lock 6709 + * the IDR to avoid holding it for too long 6710 + */ 6711 + btf_get(mod_btf); 6712 + spin_unlock_bh(&btf_idr_lock); 6713 + cands = bpf_core_add_cands(cands, mod_btf, btf_nr_types(main_btf)); 6714 + if (IS_ERR(cands)) { 6715 + btf_put(mod_btf); 6716 + return cands; 6717 + } 6718 + spin_lock_bh(&btf_idr_lock); 6719 + btf_put(mod_btf); 6720 + } 6721 + spin_unlock_bh(&btf_idr_lock); 6722 + /* cands is a pointer to kmalloced memory here if cands->cnt > 0 6723 + * or pointer to stack if cands->cnt == 0. 6724 + * Copy it into the cache even when cands->cnt == 0 and 6725 + * return the result.
6726 + */ 6727 + return populate_cand_cache(cands, module_cand_cache, MODULE_CAND_CACHE_SIZE); 6728 + } 6729 + 6730 + int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo, 6731 + int relo_idx, void *insn) 6732 + { 6733 + bool need_cands = relo->kind != BPF_CORE_TYPE_ID_LOCAL; 6734 + struct bpf_core_cand_list cands = {}; 6735 + struct bpf_core_spec *specs; 6736 + int err; 6737 + 6738 + /* ~4k of temp memory necessary to convert LLVM spec like "0:1:0:5" 6739 + * into arrays of btf_ids of struct fields and array indices. 6740 + */ 6741 + specs = kcalloc(3, sizeof(*specs), GFP_KERNEL); 6742 + if (!specs) 6743 + return -ENOMEM; 6744 + 6745 + if (need_cands) { 6746 + struct bpf_cand_cache *cc; 6747 + int i; 6748 + 6749 + mutex_lock(&cand_cache_mutex); 6750 + cc = bpf_core_find_cands(ctx, relo->type_id); 6751 + if (IS_ERR(cc)) { 6752 + bpf_log(ctx->log, "target candidate search failed for %d\n", 6753 + relo->type_id); 6754 + err = PTR_ERR(cc); 6755 + goto out; 6756 + } 6757 + if (cc->cnt) { 6758 + cands.cands = kcalloc(cc->cnt, sizeof(*cands.cands), GFP_KERNEL); 6759 + if (!cands.cands) { 6760 + err = -ENOMEM; 6761 + goto out; 6762 + } 6763 + } 6764 + for (i = 0; i < cc->cnt; i++) { 6765 + bpf_log(ctx->log, 6766 + "CO-RE relocating %s %s: found target candidate [%d]\n", 6767 + btf_kind_str[cc->kind], cc->name, cc->cands[i].id); 6768 + cands.cands[i].btf = cc->cands[i].btf; 6769 + cands.cands[i].id = cc->cands[i].id; 6770 + } 6771 + cands.len = cc->cnt; 6772 + /* cand_cache_mutex needs to span the cache lookup and 6773 + * copy of btf pointer into bpf_core_cand_list, 6774 + * since module can be unloaded while bpf_core_apply_relo_insn 6775 + * is working with module's btf. 
6776 + */ 6777 + } 6778 + 6779 + err = bpf_core_apply_relo_insn((void *)ctx->log, insn, relo->insn_off / 8, 6780 + relo, relo_idx, ctx->btf, &cands, specs); 6781 + out: 6782 + kfree(specs); 6783 + if (need_cands) { 6784 + kfree(cands.cands); 6785 + mutex_unlock(&cand_cache_mutex); 6786 + if (ctx->log->level & BPF_LOG_LEVEL2) 6787 + print_cand_cache(ctx->log); 6788 + } 6789 + return err; 6790 + }
+3 -3
kernel/bpf/core.c
··· 1574 1574 1575 1575 if (unlikely(index >= array->map.max_entries)) 1576 1576 goto out; 1577 - if (unlikely(tail_call_cnt > MAX_TAIL_CALL_CNT)) 1577 + 1578 + if (unlikely(tail_call_cnt >= MAX_TAIL_CALL_CNT)) 1578 1579 goto out; 1579 1580 1580 1581 tail_call_cnt++; ··· 1892 1891 1893 1892 /** 1894 1893 * bpf_prog_select_runtime - select exec runtime for BPF program 1895 - * @fp: bpf_prog populated with internal BPF program 1894 + * @fp: bpf_prog populated with BPF program 1896 1895 * @err: pointer to error variable 1897 1896 * 1898 1897 * Try to JIT eBPF program, if JIT is not available, use interpreter. ··· 2301 2300 } 2302 2301 } 2303 2302 2304 - /* Free internal BPF program */ 2305 2303 void bpf_prog_free(struct bpf_prog *fp) 2306 2304 { 2307 2305 struct bpf_prog_aux *aux = fp->aux;
+2
kernel/bpf/helpers.c
··· 1376 1376 return &bpf_ringbuf_query_proto; 1377 1377 case BPF_FUNC_for_each_map_elem: 1378 1378 return &bpf_for_each_map_elem_proto; 1379 + case BPF_FUNC_loop: 1380 + return &bpf_loop_proto; 1379 1381 default: 1380 1382 break; 1381 1383 }
+2 -2
kernel/bpf/syscall.c
··· 2198 2198 } 2199 2199 2200 2200 /* last field in 'union bpf_attr' used by this command */ 2201 - #define BPF_PROG_LOAD_LAST_FIELD fd_array 2201 + #define BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size 2202 2202 2203 2203 static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr) 2204 2204 { ··· 4819 4819 .gpl_only = false, 4820 4820 .ret_type = RET_INTEGER, 4821 4821 .arg1_type = ARG_PTR_TO_MEM, 4822 - .arg2_type = ARG_CONST_SIZE, 4822 + .arg2_type = ARG_CONST_SIZE_OR_ZERO, 4823 4823 .arg3_type = ARG_ANYTHING, 4824 4824 .arg4_type = ARG_PTR_TO_LONG, 4825 4825 };
+139 -41
kernel/bpf/verifier.c
··· 293 293 WARN_ONCE(n >= BPF_VERIFIER_TMP_LOG_SIZE - 1, 294 294 "verifier log line truncated - local buffer too short\n"); 295 295 296 - n = min(log->len_total - log->len_used - 1, n); 297 - log->kbuf[n] = '\0'; 298 - 299 296 if (log->level == BPF_LOG_KERNEL) { 300 - pr_err("BPF:%s\n", log->kbuf); 297 + bool newline = n > 0 && log->kbuf[n - 1] == '\n'; 298 + 299 + pr_err("BPF: %s%s", log->kbuf, newline ? "" : "\n"); 301 300 return; 302 301 } 302 + 303 + n = min(log->len_total - log->len_used - 1, n); 304 + log->kbuf[n] = '\0'; 303 305 if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1)) 304 306 log->len_used += n; 305 307 else ··· 6103 6101 return 0; 6104 6102 } 6105 6103 6104 + static int set_loop_callback_state(struct bpf_verifier_env *env, 6105 + struct bpf_func_state *caller, 6106 + struct bpf_func_state *callee, 6107 + int insn_idx) 6108 + { 6109 + /* bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx, 6110 + * u64 flags); 6111 + * callback_fn(u32 index, void *callback_ctx); 6112 + */ 6113 + callee->regs[BPF_REG_1].type = SCALAR_VALUE; 6114 + callee->regs[BPF_REG_2] = caller->regs[BPF_REG_3]; 6115 + 6116 + /* unused */ 6117 + __mark_reg_not_init(env, &callee->regs[BPF_REG_3]); 6118 + __mark_reg_not_init(env, &callee->regs[BPF_REG_4]); 6119 + __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); 6120 + 6121 + callee->in_callback_fn = true; 6122 + return 0; 6123 + } 6124 + 6106 6125 static int set_timer_callback_state(struct bpf_verifier_env *env, 6107 6126 struct bpf_func_state *caller, 6108 6127 struct bpf_func_state *callee, ··· 6497 6474 return err; 6498 6475 } 6499 6476 6500 - if (func_id == BPF_FUNC_tail_call) { 6501 - err = check_reference_leak(env); 6502 - if (err) { 6503 - verbose(env, "tail_call would lead to reference leak\n"); 6504 - return err; 6505 - } 6506 - } else if (is_release_function(func_id)) { 6477 + if (is_release_function(func_id)) { 6507 6478 err = release_reference(env, meta.ref_obj_id); 6508 6479 if (err) { 6509 
6480 verbose(env, "func %s#%d reference has not been acquired before\n", ··· 6508 6491 6509 6492 regs = cur_regs(env); 6510 6493 6511 - /* check that flags argument in get_local_storage(map, flags) is 0, 6512 - * this is required because get_local_storage() can't return an error. 6513 - */ 6514 - if (func_id == BPF_FUNC_get_local_storage && 6515 - !register_is_null(&regs[BPF_REG_2])) { 6516 - verbose(env, "get_local_storage() doesn't support non-zero flags\n"); 6517 - return -EINVAL; 6518 - } 6519 - 6520 - if (func_id == BPF_FUNC_for_each_map_elem) { 6494 + switch (func_id) { 6495 + case BPF_FUNC_tail_call: 6496 + err = check_reference_leak(env); 6497 + if (err) { 6498 + verbose(env, "tail_call would lead to reference leak\n"); 6499 + return err; 6500 + } 6501 + break; 6502 + case BPF_FUNC_get_local_storage: 6503 + /* check that flags argument in get_local_storage(map, flags) is 0, 6504 + * this is required because get_local_storage() can't return an error. 6505 + */ 6506 + if (!register_is_null(&regs[BPF_REG_2])) { 6507 + verbose(env, "get_local_storage() doesn't support non-zero flags\n"); 6508 + return -EINVAL; 6509 + } 6510 + break; 6511 + case BPF_FUNC_for_each_map_elem: 6521 6512 err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, 6522 6513 set_map_elem_callback_state); 6523 - if (err < 0) 6524 - return -EINVAL; 6525 - } 6526 - 6527 - if (func_id == BPF_FUNC_timer_set_callback) { 6514 + break; 6515 + case BPF_FUNC_timer_set_callback: 6528 6516 err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, 6529 6517 set_timer_callback_state); 6530 - if (err < 0) 6531 - return -EINVAL; 6532 - } 6533 - 6534 - if (func_id == BPF_FUNC_find_vma) { 6518 + break; 6519 + case BPF_FUNC_find_vma: 6535 6520 err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, 6536 6521 set_find_vma_callback_state); 6537 - if (err < 0) 6538 - return -EINVAL; 6522 + break; 6523 + case BPF_FUNC_snprintf: 6524 + err = check_bpf_snprintf_call(env, regs); 6525 + break; 
6526 + case BPF_FUNC_loop: 6527 + err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, 6528 + set_loop_callback_state); 6529 + break; 6539 6530 } 6540 6531 6541 - if (func_id == BPF_FUNC_snprintf) { 6542 - err = check_bpf_snprintf_call(env, regs); 6543 - if (err < 0) 6544 - return err; 6545 - } 6532 + if (err) 6533 + return err; 6546 6534 6547 6535 /* reset caller saved regs */ 6548 6536 for (i = 0; i < CALLER_SAVED_REGS; i++) { ··· 10289 10267 return err; 10290 10268 } 10291 10269 10270 + #define MIN_CORE_RELO_SIZE sizeof(struct bpf_core_relo) 10271 + #define MAX_CORE_RELO_SIZE MAX_FUNCINFO_REC_SIZE 10272 + 10273 + static int check_core_relo(struct bpf_verifier_env *env, 10274 + const union bpf_attr *attr, 10275 + bpfptr_t uattr) 10276 + { 10277 + u32 i, nr_core_relo, ncopy, expected_size, rec_size; 10278 + struct bpf_core_relo core_relo = {}; 10279 + struct bpf_prog *prog = env->prog; 10280 + const struct btf *btf = prog->aux->btf; 10281 + struct bpf_core_ctx ctx = { 10282 + .log = &env->log, 10283 + .btf = btf, 10284 + }; 10285 + bpfptr_t u_core_relo; 10286 + int err; 10287 + 10288 + nr_core_relo = attr->core_relo_cnt; 10289 + if (!nr_core_relo) 10290 + return 0; 10291 + if (nr_core_relo > INT_MAX / sizeof(struct bpf_core_relo)) 10292 + return -EINVAL; 10293 + 10294 + rec_size = attr->core_relo_rec_size; 10295 + if (rec_size < MIN_CORE_RELO_SIZE || 10296 + rec_size > MAX_CORE_RELO_SIZE || 10297 + rec_size % sizeof(u32)) 10298 + return -EINVAL; 10299 + 10300 + u_core_relo = make_bpfptr(attr->core_relos, uattr.is_kernel); 10301 + expected_size = sizeof(struct bpf_core_relo); 10302 + ncopy = min_t(u32, expected_size, rec_size); 10303 + 10304 + /* Unlike func_info and line_info, copy and apply each CO-RE 10305 + * relocation record one at a time. 
10306 + */ 10307 + for (i = 0; i < nr_core_relo; i++) { 10308 + /* future proofing when sizeof(bpf_core_relo) changes */ 10309 + err = bpf_check_uarg_tail_zero(u_core_relo, expected_size, rec_size); 10310 + if (err) { 10311 + if (err == -E2BIG) { 10312 + verbose(env, "nonzero tailing record in core_relo"); 10313 + if (copy_to_bpfptr_offset(uattr, 10314 + offsetof(union bpf_attr, core_relo_rec_size), 10315 + &expected_size, sizeof(expected_size))) 10316 + err = -EFAULT; 10317 + } 10318 + break; 10319 + } 10320 + 10321 + if (copy_from_bpfptr(&core_relo, u_core_relo, ncopy)) { 10322 + err = -EFAULT; 10323 + break; 10324 + } 10325 + 10326 + if (core_relo.insn_off % 8 || core_relo.insn_off / 8 >= prog->len) { 10327 + verbose(env, "Invalid core_relo[%u].insn_off:%u prog->len:%u\n", 10328 + i, core_relo.insn_off, prog->len); 10329 + err = -EINVAL; 10330 + break; 10331 + } 10332 + 10333 + err = bpf_core_apply(&ctx, &core_relo, i, 10334 + &prog->insnsi[core_relo.insn_off / 8]); 10335 + if (err) 10336 + break; 10337 + bpfptr_add(&u_core_relo, rec_size); 10338 + } 10339 + return err; 10340 + } 10341 + 10292 10342 static int check_btf_info(struct bpf_verifier_env *env, 10293 10343 const union bpf_attr *attr, 10294 10344 bpfptr_t uattr) ··· 10388 10294 return err; 10389 10295 10390 10296 err = check_btf_line(env, attr, uattr); 10297 + if (err) 10298 + return err; 10299 + 10300 + err = check_core_relo(env, attr, uattr); 10391 10301 if (err) 10392 10302 return err; 10393 10303 ··· 14073 13975 log->ubuf = (char __user *) (unsigned long) attr->log_buf; 14074 13976 log->len_total = attr->log_size; 14075 13977 14076 - ret = -EINVAL; 14077 13978 /* log attributes have to be sane */ 14078 - if (log->len_total < 128 || log->len_total > UINT_MAX >> 2 || 14079 - !log->level || !log->ubuf || log->level & ~BPF_LOG_MASK) 13979 + if (!bpf_verifier_log_attr_valid(log)) { 13980 + ret = -EINVAL; 14080 13981 goto err_unlock; 13982 + } 14081 13983 } 14082 13984 14083 13985 if (IS_ERR(btf_vmlinux)) 
{
+1 -5
kernel/trace/bpf_trace.c
··· 1402 1402 BPF_CALL_4(bpf_read_branch_records, struct bpf_perf_event_data_kern *, ctx, 1403 1403 void *, buf, u32, size, u64, flags) 1404 1404 { 1405 - #ifndef CONFIG_X86 1406 - return -ENOENT; 1407 - #else 1408 1405 static const u32 br_entry_size = sizeof(struct perf_branch_entry); 1409 1406 struct perf_branch_stack *br_stack = ctx->data->br_stack; 1410 1407 u32 to_copy; ··· 1410 1413 return -EINVAL; 1411 1414 1412 1415 if (unlikely(!br_stack)) 1413 - return -EINVAL; 1416 + return -ENOENT; 1414 1417 1415 1418 if (flags & BPF_F_GET_BRANCH_RECORDS_SIZE) 1416 1419 return br_stack->nr * br_entry_size; ··· 1422 1425 memcpy(buf, br_stack->entries, to_copy); 1423 1426 1424 1427 return to_copy; 1425 - #endif 1426 1428 } 1427 1429 1428 1430 static const struct bpf_func_proto bpf_read_branch_records_proto = {
+2 -2
lib/test_bpf.c
··· 14683 14683 BPF_EXIT_INSN(), 14684 14684 }, 14685 14685 .flags = FLAG_NEED_STATE | FLAG_RESULT_IN_STATE, 14686 - .result = (MAX_TAIL_CALL_CNT + 1 + 1) * MAX_TESTRUNS, 14686 + .result = (MAX_TAIL_CALL_CNT + 1) * MAX_TESTRUNS, 14687 14687 }, 14688 14688 { 14689 14689 "Tail call count preserved across function calls", ··· 14705 14705 }, 14706 14706 .stack_depth = 8, 14707 14707 .flags = FLAG_NEED_STATE | FLAG_RESULT_IN_STATE, 14708 - .result = (MAX_TAIL_CALL_CNT + 1 + 1) * MAX_TESTRUNS, 14708 + .result = (MAX_TAIL_CALL_CNT + 1) * MAX_TESTRUNS, 14709 14709 }, 14710 14710 { 14711 14711 "Tail call error path, NULL target",
+5 -6
net/core/filter.c
··· 1242 1242 int err, new_len, old_len = fp->len; 1243 1243 bool seen_ld_abs = false; 1244 1244 1245 - /* We are free to overwrite insns et al right here as it 1246 - * won't be used at this point in time anymore internally 1247 - * after the migration to the internal BPF instruction 1248 - * representation. 1245 + /* We are free to overwrite insns et al right here as it won't be used at 1246 + * this point in time anymore internally after the migration to the eBPF 1247 + * instruction representation. 1249 1248 */ 1250 1249 BUILD_BUG_ON(sizeof(struct sock_filter) != 1251 1250 sizeof(struct bpf_insn)); ··· 1335 1336 */ 1336 1337 bpf_jit_compile(fp); 1337 1338 1338 - /* JIT compiler couldn't process this filter, so do the 1339 - * internal BPF translation for the optimized interpreter. 1339 + /* JIT compiler couldn't process this filter, so do the eBPF translation 1340 + * for the optimized interpreter. 1340 1341 */ 1341 1342 if (!fp->jited) 1342 1343 fp = bpf_migrate_filter(fp);
+3 -3
net/ipv4/bpf_tcp_ca.c
··· 169 169 t = bpf_tcp_congestion_ops.type; 170 170 m = &btf_type_member(t)[midx]; 171 171 172 - return btf_member_bit_offset(t, m) / 8; 172 + return __btf_member_bit_offset(t, m) / 8; 173 173 } 174 174 175 175 static const struct bpf_func_proto * ··· 246 246 utcp_ca = (const struct tcp_congestion_ops *)udata; 247 247 tcp_ca = (struct tcp_congestion_ops *)kdata; 248 248 249 - moff = btf_member_bit_offset(t, member) / 8; 249 + moff = __btf_member_bit_offset(t, member) / 8; 250 250 switch (moff) { 251 251 case offsetof(struct tcp_congestion_ops, flags): 252 252 if (utcp_ca->flags & ~TCP_CONG_MASK) ··· 276 276 static int bpf_tcp_ca_check_member(const struct btf_type *t, 277 277 const struct btf_member *member) 278 278 { 279 - if (is_unsupported(btf_member_bit_offset(t, member) / 8)) 279 + if (is_unsupported(__btf_member_bit_offset(t, member) / 8)) 280 280 return -ENOTSUPP; 281 281 return 0; 282 282 }
+17 -1
samples/bpf/Makefile
··· 215 215 endif 216 216 217 217 TPROGS_LDLIBS += $(LIBBPF) -lelf -lz 218 + TPROGLDLIBS_xdp_monitor += -lm 219 + TPROGLDLIBS_xdp_redirect += -lm 220 + TPROGLDLIBS_xdp_redirect_cpu += -lm 221 + TPROGLDLIBS_xdp_redirect_map += -lm 222 + TPROGLDLIBS_xdp_redirect_map_multi += -lm 218 223 TPROGLDLIBS_tracex4 += -lrt 219 224 TPROGLDLIBS_trace_output += -lrt 220 225 TPROGLDLIBS_map_perf_test += -lrt ··· 333 328 $(src)/*.c: verify_target_bpf $(LIBBPF) 334 329 335 330 libbpf_hdrs: $(LIBBPF) 336 - $(obj)/$(TRACE_HELPERS): | libbpf_hdrs 331 + $(obj)/$(TRACE_HELPERS) $(obj)/$(CGROUP_HELPERS) $(obj)/$(XDP_SAMPLE): | libbpf_hdrs 337 332 338 333 .PHONY: libbpf_hdrs 339 334 ··· 347 342 $(obj)/hbm_out_kern.o: $(src)/hbm.h $(src)/hbm_kern.h 348 343 $(obj)/hbm.o: $(src)/hbm.h 349 344 $(obj)/hbm_edt_kern.o: $(src)/hbm.h $(src)/hbm_kern.h 345 + 346 + # Override includes for xdp_sample_user.o because $(srctree)/usr/include in 347 + # TPROGS_CFLAGS causes conflicts 348 + XDP_SAMPLE_CFLAGS += -Wall -O2 \ 349 + -I$(src)/../../tools/include \ 350 + -I$(src)/../../tools/include/uapi \ 351 + -I$(LIBBPF_INCLUDE) \ 352 + -I$(src)/../../tools/testing/selftests/bpf 353 + 354 + $(obj)/$(XDP_SAMPLE): TPROGS_CFLAGS = $(XDP_SAMPLE_CFLAGS) 355 + $(obj)/$(XDP_SAMPLE): $(src)/xdp_sample_user.h $(src)/xdp_sample_shared.h 350 356 351 357 -include $(BPF_SAMPLES_PATH)/Makefile.target 352 358
-11
samples/bpf/Makefile.target
··· 73 73 cmd_tprog-cobjs = $(CC) $(tprogc_flags) -c -o $@ $< 74 74 $(tprog-cobjs): $(obj)/%.o: $(src)/%.c FORCE 75 75 $(call if_changed_dep,tprog-cobjs) 76 - 77 - # Override includes for xdp_sample_user.o because $(srctree)/usr/include in 78 - # TPROGS_CFLAGS causes conflicts 79 - XDP_SAMPLE_CFLAGS += -Wall -O2 -lm \ 80 - -I./tools/include \ 81 - -I./tools/include/uapi \ 82 - -I./tools/lib \ 83 - -I./tools/testing/selftests/bpf 84 - $(obj)/xdp_sample_user.o: $(src)/xdp_sample_user.c \ 85 - $(src)/xdp_sample_user.h $(src)/xdp_sample_shared.h 86 - $(CC) $(XDP_SAMPLE_CFLAGS) -c -o $@ $<
+9 -5
samples/bpf/cookie_uid_helper_example.c
··· 67 67 68 68 static void maps_create(void) 69 69 { 70 - map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(uint32_t), 71 - sizeof(struct stats), 100, 0); 70 + map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(uint32_t), 71 + sizeof(struct stats), 100, NULL); 72 72 if (map_fd < 0) 73 73 error(1, errno, "map create failed!\n"); 74 74 } ··· 157 157 offsetof(struct __sk_buff, len)), 158 158 BPF_EXIT_INSN(), 159 159 }; 160 - prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, 161 - ARRAY_SIZE(prog), "GPL", 0, 162 - log_buf, sizeof(log_buf)); 160 + LIBBPF_OPTS(bpf_prog_load_opts, opts, 161 + .log_buf = log_buf, 162 + .log_size = sizeof(log_buf), 163 + ); 164 + 165 + prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", 166 + prog, ARRAY_SIZE(prog), &opts); 163 167 if (prog_fd < 0) 164 168 error(1, errno, "failed to load prog\n%s\n", log_buf); 165 169 }
+15 -14
samples/bpf/fds_example.c
··· 46 46 printf(" -h Display this help.\n"); 47 47 } 48 48 49 - static int bpf_map_create(void) 50 - { 51 - return bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(uint32_t), 52 - sizeof(uint32_t), 1024, 0); 53 - } 54 - 55 49 static int bpf_prog_create(const char *object) 56 50 { 57 51 static struct bpf_insn insns[] = { ··· 54 60 }; 55 61 size_t insns_cnt = sizeof(insns) / sizeof(struct bpf_insn); 56 62 struct bpf_object *obj; 57 - int prog_fd; 63 + int err; 58 64 59 65 if (object) { 60 - assert(!bpf_prog_load(object, BPF_PROG_TYPE_UNSPEC, 61 - &obj, &prog_fd)); 62 - return prog_fd; 66 + obj = bpf_object__open_file(object, NULL); 67 + assert(!libbpf_get_error(obj)); 68 + err = bpf_object__load(obj); 69 + assert(!err); 70 + return bpf_program__fd(bpf_object__next_program(obj, NULL)); 63 71 } else { 64 - return bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, 65 - insns, insns_cnt, "GPL", 0, 66 - bpf_log_buf, BPF_LOG_BUF_SIZE); 72 + LIBBPF_OPTS(bpf_prog_load_opts, opts, 73 + .log_buf = bpf_log_buf, 74 + .log_size = BPF_LOG_BUF_SIZE, 75 + ); 76 + 77 + return bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", 78 + insns, insns_cnt, &opts); 67 79 } 68 80 } 69 81 ··· 79 79 int fd, ret; 80 80 81 81 if (flags & BPF_F_PIN) { 82 - fd = bpf_map_create(); 82 + fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(uint32_t), 83 + sizeof(uint32_t), 1024, NULL); 83 84 printf("bpf: map fd:%d (%s)\n", fd, strerror(errno)); 84 85 assert(fd > 0); 85 86
-7
samples/bpf/lwt_len_hist_kern.c
··· 16 16 #include <uapi/linux/in.h> 17 17 #include <bpf/bpf_helpers.h> 18 18 19 - # define printk(fmt, ...) \ 20 - ({ \ 21 - char ____fmt[] = fmt; \ 22 - bpf_trace_printk(____fmt, sizeof(____fmt), \ 23 - ##__VA_ARGS__); \ 24 - }) 25 - 26 19 struct bpf_elf_map { 27 20 __u32 type; 28 21 __u32 size_key;
+9 -6
samples/bpf/map_perf_test_user.c
··· 134 134 */ 135 135 int outer_fd = map_fd[array_of_lru_hashs_idx]; 136 136 unsigned int mycpu, mynode; 137 + LIBBPF_OPTS(bpf_map_create_opts, opts, 138 + .map_flags = BPF_F_NUMA_NODE, 139 + ); 137 140 138 141 assert(cpu < MAX_NR_CPUS); 139 142 140 143 ret = syscall(__NR_getcpu, &mycpu, &mynode, NULL); 141 144 assert(!ret); 142 145 146 + opts.numa_node = mynode; 143 147 inner_lru_map_fds[cpu] = 144 - bpf_create_map_node(BPF_MAP_TYPE_LRU_HASH, 145 - test_map_names[INNER_LRU_HASH_PREALLOC], 146 - sizeof(uint32_t), 147 - sizeof(long), 148 - inner_lru_hash_size, 0, 149 - mynode); 148 + bpf_map_create(BPF_MAP_TYPE_LRU_HASH, 149 + test_map_names[INNER_LRU_HASH_PREALLOC], 150 + sizeof(uint32_t), 151 + sizeof(long), 152 + inner_lru_hash_size, &opts); 150 153 if (inner_lru_map_fds[cpu] == -1) { 151 154 printf("cannot create BPF_MAP_TYPE_LRU_HASH %s(%d)\n", 152 155 strerror(errno), errno);
+8 -4
samples/bpf/sock_example.c
··· 37 37 int sock = -1, map_fd, prog_fd, i, key; 38 38 long long value = 0, tcp_cnt, udp_cnt, icmp_cnt; 39 39 40 - map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key), sizeof(value), 41 - 256, 0); 40 + map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(key), sizeof(value), 41 + 256, NULL); 42 42 if (map_fd < 0) { 43 43 printf("failed to create map '%s'\n", strerror(errno)); 44 44 goto cleanup; ··· 59 59 BPF_EXIT_INSN(), 60 60 }; 61 61 size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn); 62 + LIBBPF_OPTS(bpf_prog_load_opts, opts, 63 + .log_buf = bpf_log_buf, 64 + .log_size = BPF_LOG_BUF_SIZE, 65 + ); 62 66 63 - prog_fd = bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog, insns_cnt, 64 - "GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE); 67 + prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", 68 + prog, insns_cnt, &opts); 65 69 if (prog_fd < 0) { 66 70 printf("failed to load prog '%s'\n", strerror(errno)); 67 71 goto cleanup;
+12 -3
samples/bpf/sockex1_user.c
··· 11 11 int main(int ac, char **argv) 12 12 { 13 13 struct bpf_object *obj; 14 + struct bpf_program *prog; 14 15 int map_fd, prog_fd; 15 16 char filename[256]; 16 - int i, sock; 17 + int i, sock, err; 17 18 FILE *f; 18 19 19 20 snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); 20 21 21 - if (bpf_prog_load(filename, BPF_PROG_TYPE_SOCKET_FILTER, 22 - &obj, &prog_fd)) 22 + obj = bpf_object__open_file(filename, NULL); 23 + if (libbpf_get_error(obj)) 23 24 return 1; 24 25 26 + prog = bpf_object__next_program(obj, NULL); 27 + bpf_program__set_type(prog, BPF_PROG_TYPE_SOCKET_FILTER); 28 + 29 + err = bpf_object__load(obj); 30 + if (err) 31 + return 1; 32 + 33 + prog_fd = bpf_program__fd(prog); 25 34 map_fd = bpf_object__find_map_fd_by_name(obj, "my_map"); 26 35 27 36 sock = open_raw_sock("lo");
+12 -4
samples/bpf/sockex2_user.c
··· 16 16 17 17 int main(int ac, char **argv) 18 18 { 19 + struct bpf_program *prog; 19 20 struct bpf_object *obj; 20 21 int map_fd, prog_fd; 21 22 char filename[256]; 22 - int i, sock; 23 + int i, sock, err; 23 24 FILE *f; 24 25 25 26 snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); 26 - 27 - if (bpf_prog_load(filename, BPF_PROG_TYPE_SOCKET_FILTER, 28 - &obj, &prog_fd)) 27 + obj = bpf_object__open_file(filename, NULL); 28 + if (libbpf_get_error(obj)) 29 29 return 1; 30 30 31 + prog = bpf_object__next_program(obj, NULL); 32 + bpf_program__set_type(prog, BPF_PROG_TYPE_SOCKET_FILTER); 33 + 34 + err = bpf_object__load(obj); 35 + if (err) 36 + return 1; 37 + 38 + prog_fd = bpf_program__fd(prog); 31 39 map_fd = bpf_object__find_map_fd_by_name(obj, "hash_map"); 32 40 33 41 sock = open_raw_sock("lo");
+2 -2
samples/bpf/test_cgrp2_array_pin.c
··· 64 64 } 65 65 66 66 if (create_array) { 67 - array_fd = bpf_create_map(BPF_MAP_TYPE_CGROUP_ARRAY, 67 + array_fd = bpf_map_create(BPF_MAP_TYPE_CGROUP_ARRAY, NULL, 68 68 sizeof(uint32_t), sizeof(uint32_t), 69 - 1, 0); 69 + 1, NULL); 70 70 if (array_fd < 0) { 71 71 fprintf(stderr, 72 72 "bpf_create_map(BPF_MAP_TYPE_CGROUP_ARRAY,...): %s(%d)\n",
+8 -5
samples/bpf/test_cgrp2_attach.c
··· 71 71 BPF_EXIT_INSN(), 72 72 }; 73 73 size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn); 74 + LIBBPF_OPTS(bpf_prog_load_opts, opts, 75 + .log_buf = bpf_log_buf, 76 + .log_size = BPF_LOG_BUF_SIZE, 77 + ); 74 78 75 - return bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB, 76 - prog, insns_cnt, "GPL", 0, 77 - bpf_log_buf, BPF_LOG_BUF_SIZE); 79 + return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SKB, NULL, "GPL", 80 + prog, insns_cnt, &opts); 78 81 } 79 82 80 83 static int usage(const char *argv0) ··· 93 90 int prog_fd, map_fd, ret, key; 94 91 long long pkt_cnt, byte_cnt; 95 92 96 - map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 93 + map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, 97 94 sizeof(key), sizeof(byte_cnt), 98 - 256, 0); 95 + 256, NULL); 99 96 if (map_fd < 0) { 100 97 printf("Failed to create map: '%s'\n", strerror(errno)); 101 98 return EXIT_FAILURE;
+6 -2
samples/bpf/test_cgrp2_sock.c
··· 70 70 BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, priority)), 71 71 BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, priority)), 72 72 }; 73 + LIBBPF_OPTS(bpf_prog_load_opts, opts, 74 + .log_buf = bpf_log_buf, 75 + .log_size = BPF_LOG_BUF_SIZE, 76 + ); 73 77 74 78 struct bpf_insn *prog; 75 79 size_t insns_cnt; ··· 119 115 120 116 insns_cnt /= sizeof(struct bpf_insn); 121 117 122 - ret = bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, insns_cnt, 123 - "GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE); 118 + ret = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL", 119 + prog, insns_cnt, &opts); 124 120 125 121 free(prog); 126 122
+7 -4
samples/bpf/test_lru_dist.c
··· 105 105 static void pfect_lru_init(struct pfect_lru *lru, unsigned int lru_size, 106 106 unsigned int nr_possible_elems) 107 107 { 108 - lru->map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, 108 + lru->map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, 109 109 sizeof(unsigned long long), 110 110 sizeof(struct pfect_lru_node *), 111 - nr_possible_elems, 0); 111 + nr_possible_elems, NULL); 112 112 assert(lru->map_fd != -1); 113 113 114 114 lru->free_nodes = malloc(lru_size * sizeof(struct pfect_lru_node)); ··· 207 207 208 208 static int create_map(int map_type, int map_flags, unsigned int size) 209 209 { 210 + LIBBPF_OPTS(bpf_map_create_opts, opts, 211 + .map_flags = map_flags, 212 + ); 210 213 int map_fd; 211 214 212 - map_fd = bpf_create_map(map_type, sizeof(unsigned long long), 213 - sizeof(unsigned long long), size, map_flags); 215 + map_fd = bpf_map_create(map_type, NULL, sizeof(unsigned long long), 216 + sizeof(unsigned long long), size, &opts); 214 217 215 218 if (map_fd == -1) 216 219 perror("bpf_create_map");
+1 -3
samples/bpf/trace_output_user.c
··· 43 43 44 44 int main(int argc, char **argv) 45 45 { 46 - struct perf_buffer_opts pb_opts = {}; 47 46 struct bpf_link *link = NULL; 48 47 struct bpf_program *prog; 49 48 struct perf_buffer *pb; ··· 83 84 goto cleanup; 84 85 } 85 86 86 - pb_opts.sample_cb = print_bpf_output; 87 - pb = perf_buffer__new(map_fd, 8, &pb_opts); 87 + pb = perf_buffer__new(map_fd, 8, print_bpf_output, NULL, NULL, NULL); 88 88 ret = libbpf_get_error(pb); 89 89 if (ret) { 90 90 printf("failed to setup perf_buffer: %d\n", ret);
+1 -3
samples/bpf/xdp_redirect_cpu.bpf.c
··· 100 100 void *data = (void *)(long)ctx->data; 101 101 struct iphdr *iph = data + nh_off; 102 102 struct udphdr *udph; 103 - u16 dport; 104 103 105 104 if (iph + 1 > data_end) 106 105 return 0; ··· 110 111 if (udph + 1 > data_end) 111 112 return 0; 112 113 113 - dport = bpf_ntohs(udph->dest); 114 - return dport; 114 + return bpf_ntohs(udph->dest); 115 115 } 116 116 117 117 static __always_inline
+11 -11
samples/bpf/xdp_sample_pkts_user.c
··· 110 110 111 111 int main(int argc, char **argv) 112 112 { 113 - struct bpf_prog_load_attr prog_load_attr = { 114 - .prog_type = BPF_PROG_TYPE_XDP, 115 - }; 116 - struct perf_buffer_opts pb_opts = {}; 117 113 const char *optstr = "FS"; 118 114 int prog_fd, map_fd, opt; 115 + struct bpf_program *prog; 119 116 struct bpf_object *obj; 120 117 struct bpf_map *map; 121 118 char filename[256]; ··· 141 144 } 142 145 143 146 snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); 144 - prog_load_attr.file = filename; 145 147 146 - if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd)) 148 + obj = bpf_object__open_file(filename, NULL); 149 + if (libbpf_get_error(obj)) 147 150 return 1; 148 151 149 - if (!prog_fd) { 150 - printf("bpf_prog_load_xattr: %s\n", strerror(errno)); 152 + prog = bpf_object__next_program(obj, NULL); 153 + bpf_program__set_type(prog, BPF_PROG_TYPE_XDP); 154 + 155 + err = bpf_object__load(obj); 156 + if (err) 151 157 return 1; 152 - } 158 + 159 + prog_fd = bpf_program__fd(prog); 153 160 154 161 map = bpf_object__next_map(obj, NULL); 155 162 if (!map) { ··· 182 181 return 1; 183 182 } 184 183 185 - pb_opts.sample_cb = print_bpf_output; 186 - pb = perf_buffer__new(map_fd, 8, &pb_opts); 184 + pb = perf_buffer__new(map_fd, 8, print_bpf_output, NULL, NULL, NULL); 187 185 err = libbpf_get_error(pb); 188 186 if (err) { 189 187 perror("perf_buffer setup failed");
+2
samples/bpf/xdp_sample_user.h
··· 45 45 int get_mac_addr(int ifindex, void *mac_addr); 46 46 47 47 #pragma GCC diagnostic push 48 + #ifndef __clang__ 48 49 #pragma GCC diagnostic ignored "-Wstringop-truncation" 50 + #endif 49 51 __attribute__((unused)) 50 52 static inline char *safe_strncpy(char *dst, const char *src, size_t size) 51 53 {
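The `#ifndef __clang__` guard is needed because -Wstringop-truncation is a GCC-only warning group; clang would emit an unknown-warning diagnostic for the pragma itself. For reference, a wrapper with the semantics safe_strncpy is named for — a truncating copy that always NUL-terminates — can be written like this (a sketch, not the header's exact body):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most size-1 bytes and always NUL-terminate. Plain strncpy()
 * leaves dst unterminated when src does not fit, which is exactly the
 * case GCC's -Wstringop-truncation warns about. */
static char *demo_safe_strncpy(char *dst, const char *src, size_t size)
{
	if (!size)
		return dst;
	strncpy(dst, src, size - 1);
	dst[size - 1] = '\0';
	return dst;
}
```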
+3
samples/bpf/xdpsock_ctrl_proc.c
··· 15 15 #include <bpf/xsk.h> 16 16 #include "xdpsock.h" 17 17 18 + /* libbpf APIs for AF_XDP are deprecated starting from v0.7 */ 19 + #pragma GCC diagnostic ignored "-Wdeprecated-declarations" 20 + 18 21 static const char *opt_if = ""; 19 22 20 23 static struct option long_options[] = {
+3
samples/bpf/xdpsock_user.c
··· 36 36 #include <bpf/bpf.h> 37 37 #include "xdpsock.h" 38 38 39 + /* libbpf APIs for AF_XDP are deprecated starting from v0.7 */ 40 + #pragma GCC diagnostic ignored "-Wdeprecated-declarations" 41 + 39 42 #ifndef SOL_XDP 40 43 #define SOL_XDP 283 41 44 #endif
+3
samples/bpf/xsk_fwd.c
··· 27 27 #include <bpf/xsk.h> 28 28 #include <bpf/bpf.h> 29 29 30 + /* libbpf APIs for AF_XDP are deprecated starting from v0.7 */ 31 + #pragma GCC diagnostic ignored "-Wdeprecated-declarations" 32 + 30 33 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) 31 34 32 35 typedef __u64 u64;
+1 -1
tools/bpf/bpftool/Documentation/Makefile
··· 24 24 man8: $(DOC_MAN8) 25 25 26 26 RST2MAN_DEP := $(shell command -v rst2man 2>/dev/null) 27 - RST2MAN_OPTS += --verbose 27 + RST2MAN_OPTS += --verbose --strip-comments 28 28 29 29 list_pages = $(sort $(basename $(filter-out $(1),$(MAN8_RST)))) 30 30 see_also = $(subst " ",, \
+5 -2
tools/bpf/bpftool/Documentation/bpftool-btf.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 ================ 2 4 bpftool-btf 3 5 ================ ··· 9 7 10 8 :Manual section: 8 11 9 10 + .. include:: substitutions.rst 11 + 12 12 SYNOPSIS 13 13 ======== 14 14 15 15 **bpftool** [*OPTIONS*] **btf** *COMMAND* 16 16 17 - *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | {**-d** | **--debug** } | 18 - { **-B** | **--base-btf** } } 17 + *OPTIONS* := { |COMMON_OPTIONS| | { **-B** | **--base-btf** } } 19 18 20 19 *COMMANDS* := { **dump** | **help** } 21 20
+5 -2
tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 ================ 2 4 bpftool-cgroup 3 5 ================ ··· 9 7 10 8 :Manual section: 8 11 9 10 + .. include:: substitutions.rst 11 + 12 12 SYNOPSIS 13 13 ======== 14 14 15 15 **bpftool** [*OPTIONS*] **cgroup** *COMMAND* 16 16 17 - *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } | 18 - { **-f** | **--bpffs** } } 17 + *OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } } 19 18 20 19 *COMMANDS* := 21 20 { **show** | **list** | **tree** | **attach** | **detach** | **help** }
+5 -1
tools/bpf/bpftool/Documentation/bpftool-feature.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 =============== 2 4 bpftool-feature 3 5 =============== ··· 9 7 10 8 :Manual section: 8 11 9 10 + .. include:: substitutions.rst 11 + 12 12 SYNOPSIS 13 13 ======== 14 14 15 15 **bpftool** [*OPTIONS*] **feature** *COMMAND* 16 16 17 - *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } } 17 + *OPTIONS* := { |COMMON_OPTIONS| } 18 18 19 19 *COMMANDS* := { **probe** | **help** } 20 20
+5 -2
tools/bpf/bpftool/Documentation/bpftool-gen.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 ================ 2 4 bpftool-gen 3 5 ================ ··· 9 7 10 8 :Manual section: 8 11 9 10 + .. include:: substitutions.rst 11 + 12 12 SYNOPSIS 13 13 ======== 14 14 15 15 **bpftool** [*OPTIONS*] **gen** *COMMAND* 16 16 17 - *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } | 18 - { **-L** | **--use-loader** } } 17 + *OPTIONS* := { |COMMON_OPTIONS| | { **-L** | **--use-loader** } } 19 18 20 19 *COMMAND* := { **object** | **skeleton** | **help** } 21 20
+5 -1
tools/bpf/bpftool/Documentation/bpftool-iter.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 ============ 2 4 bpftool-iter 3 5 ============ ··· 9 7 10 8 :Manual section: 8 11 9 10 + .. include:: substitutions.rst 11 + 12 12 SYNOPSIS 13 13 ======== 14 14 15 15 **bpftool** [*OPTIONS*] **iter** *COMMAND* 16 16 17 - *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } } 17 + *OPTIONS* := { |COMMON_OPTIONS| } 18 18 19 19 *COMMANDS* := { **pin** | **help** } 20 20
+5 -2
tools/bpf/bpftool/Documentation/bpftool-link.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 ================ 2 4 bpftool-link 3 5 ================ ··· 9 7 10 8 :Manual section: 8 11 9 10 + .. include:: substitutions.rst 11 + 12 12 SYNOPSIS 13 13 ======== 14 14 15 15 **bpftool** [*OPTIONS*] **link** *COMMAND* 16 16 17 - *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } | 18 - { **-f** | **--bpffs** } | { **-n** | **--nomount** } } 17 + *OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } | { **-n** | **--nomount** } } 19 18 20 19 *COMMANDS* := { **show** | **list** | **pin** | **help** } 21 20
+5 -2
tools/bpf/bpftool/Documentation/bpftool-map.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 ================ 2 4 bpftool-map 3 5 ================ ··· 9 7 10 8 :Manual section: 8 11 9 10 + .. include:: substitutions.rst 11 + 12 12 SYNOPSIS 13 13 ======== 14 14 15 15 **bpftool** [*OPTIONS*] **map** *COMMAND* 16 16 17 - *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } | 18 - { **-f** | **--bpffs** } | { **-n** | **--nomount** } } 17 + *OPTIONS* := { |COMMON_OPTIONS| | { **-f** | **--bpffs** } | { **-n** | **--nomount** } } 19 18 20 19 *COMMANDS* := 21 20 { **show** | **list** | **create** | **dump** | **update** | **lookup** | **getnext** |
+5 -1
tools/bpf/bpftool/Documentation/bpftool-net.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 ================ 2 4 bpftool-net 3 5 ================ ··· 9 7 10 8 :Manual section: 8 11 9 10 + .. include:: substitutions.rst 11 + 12 12 SYNOPSIS 13 13 ======== 14 14 15 15 **bpftool** [*OPTIONS*] **net** *COMMAND* 16 16 17 - *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } } 17 + *OPTIONS* := { |COMMON_OPTIONS| } 18 18 19 19 *COMMANDS* := 20 20 { **show** | **list** | **attach** | **detach** | **help** }
+5 -1
tools/bpf/bpftool/Documentation/bpftool-perf.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 ================ 2 4 bpftool-perf 3 5 ================ ··· 9 7 10 8 :Manual section: 8 11 9 10 + .. include:: substitutions.rst 11 + 12 12 SYNOPSIS 13 13 ======== 14 14 15 15 **bpftool** [*OPTIONS*] **perf** *COMMAND* 16 16 17 - *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } } 17 + *OPTIONS* := { |COMMON_OPTIONS| } 18 18 19 19 *COMMANDS* := 20 20 { **show** | **list** | **help** }
+5 -1
tools/bpf/bpftool/Documentation/bpftool-prog.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 ================ 2 4 bpftool-prog 3 5 ================ ··· 9 7 10 8 :Manual section: 8 11 9 10 + .. include:: substitutions.rst 11 + 12 12 SYNOPSIS 13 13 ======== 14 14 15 15 **bpftool** [*OPTIONS*] **prog** *COMMAND* 16 16 17 - *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } | 17 + *OPTIONS* := { |COMMON_OPTIONS| | 18 18 { **-f** | **--bpffs** } | { **-m** | **--mapcompat** } | { **-n** | **--nomount** } | 19 19 { **-L** | **--use-loader** } } 20 20
+5 -1
tools/bpf/bpftool/Documentation/bpftool-struct_ops.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 ================== 2 4 bpftool-struct_ops 3 5 ================== ··· 9 7 10 8 :Manual section: 8 11 9 10 + .. include:: substitutions.rst 11 + 12 12 SYNOPSIS 13 13 ======== 14 14 15 15 **bpftool** [*OPTIONS*] **struct_ops** *COMMAND* 16 16 17 - *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } } 17 + *OPTIONS* := { |COMMON_OPTIONS| } 18 18 19 19 *COMMANDS* := 20 20 { **show** | **list** | **dump** | **register** | **unregister** | **help** }
+5 -2
tools/bpf/bpftool/Documentation/bpftool.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 ================ 2 4 BPFTOOL 3 5 ================ ··· 8 6 ------------------------------------------------------------------------------- 9 7 10 8 :Manual section: 8 9 + 10 + .. include:: substitutions.rst 11 11 12 12 SYNOPSIS 13 13 ======== ··· 22 18 23 19 *OBJECT* := { **map** | **program** | **cgroup** | **perf** | **net** | **feature** } 24 20 25 - *OPTIONS* := { { **-V** | **--version** } | 26 - { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } } 21 + *OPTIONS* := { { **-V** | **--version** } | |COMMON_OPTIONS| } 27 22 28 23 *MAP-COMMANDS* := 29 24 { **show** | **list** | **create** | **dump** | **update** | **lookup** | **getnext** |
+2
tools/bpf/bpftool/Documentation/common_options.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 1 3 -h, --help 2 4 Print short help message (similar to **bpftool help**). 3 5
+3
tools/bpf/bpftool/Documentation/substitutions.rst
··· 1 + .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 + 3 + .. |COMMON_OPTIONS| replace:: { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-d** | **--debug** } | { **-l** | **--legacy** }
+4 -7
tools/bpf/bpftool/gen.c
··· 486 486 487 487 static int gen_trace(struct bpf_object *obj, const char *obj_name, const char *header_guard) 488 488 { 489 - struct bpf_object_load_attr load_attr = {}; 490 489 DECLARE_LIBBPF_OPTS(gen_loader_opts, opts); 491 490 struct bpf_map *map; 492 491 char ident[256]; ··· 495 496 if (err) 496 497 return err; 497 498 498 - load_attr.obj = obj; 499 - if (verifier_logs) 500 - /* log_level1 + log_level2 + stats, but not stable UAPI */ 501 - load_attr.log_level = 1 + 2 + 4; 502 - 503 - err = bpf_object__load_xattr(&load_attr); 499 + err = bpf_object__load(obj); 504 500 if (err) { 505 501 p_err("failed to load object file"); 506 502 goto out; ··· 713 719 if (obj_name[0] == '\0') 714 720 get_obj_name(obj_name, file); 715 721 opts.object_name = obj_name; 722 + if (verifier_logs) 723 + /* log_level1 + log_level2 + stats, but not stable UAPI */ 724 + opts.kernel_log_level = 1 + 2 + 4; 716 725 obj = bpf_object__open_mem(obj_data, file_sz, &opts); 717 726 err = libbpf_get_error(obj); 718 727 if (err) {
+11 -1
tools/bpf/bpftool/main.c
··· 93 93 jsonw_name(json_wtr, "features"); 94 94 jsonw_start_object(json_wtr); /* features */ 95 95 jsonw_bool_field(json_wtr, "libbfd", has_libbfd); 96 + jsonw_bool_field(json_wtr, "libbpf_strict", !legacy_libbpf); 96 97 jsonw_bool_field(json_wtr, "skeletons", has_skeletons); 97 98 jsonw_end_object(json_wtr); /* features */ 98 99 ··· 105 104 printf("features:"); 106 105 if (has_libbfd) { 107 106 printf(" libbfd"); 107 + nb_features++; 108 + } 109 + if (!legacy_libbpf) { 110 + printf("%s libbpf_strict", nb_features++ ? "," : ""); 108 111 nb_features++; 109 112 } 110 113 if (has_skeletons) ··· 405 400 { "legacy", no_argument, NULL, 'l' }, 406 401 { 0 } 407 402 }; 403 + bool version_requested = false; 408 404 int opt, ret; 409 405 410 406 last_do_help = do_help; ··· 420 414 options, NULL)) >= 0) { 421 415 switch (opt) { 422 416 case 'V': 423 - return do_version(argc, argv); 417 + version_requested = true; 418 + break; 424 419 case 'h': 425 420 return do_help(argc, argv); 426 421 case 'p': ··· 485 478 argv += optind; 486 479 if (argc < 0) 487 480 usage(); 481 + 482 + if (version_requested) 483 + return do_version(argc, argv); 488 484 489 485 ret = cmd_select(cmds, argc, argv, do_help); 490 486
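The main.c change stops acting on -V/--version the moment it is seen and instead records it, so options parsed later on the command line (such as -l/--legacy, which also surfaces as the libbpf_strict feature flag above) can still influence the version output. The ordering idea in isolation, with hypothetical demo_* names rather than bpftool's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Scan every option first, only recording flags; act on the version
 * request once all options (e.g. a later "--legacy") are known. */
static int demo_parse(int argc, char **argv, bool *legacy)
{
	bool version_requested = false;
	int i;

	for (i = 1; i < argc; i++) {
		if (!strcmp(argv[i], "-V") || !strcmp(argv[i], "--version"))
			version_requested = true;   /* defer, don't return yet */
		else if (!strcmp(argv[i], "-l") || !strcmp(argv[i], "--legacy"))
			*legacy = true;
	}

	if (version_requested)
		return *legacy ? 1 : 2; /* 1: legacy-mode output, 2: strict */
	return 0;
}
```

With an early `return do_version(...)` inside the option loop, a trailing `-l` would never be seen; deferring the call fixes that.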
+13 -10
tools/bpf/bpftool/map.c
··· 1261 1261 1262 1262 static int do_create(int argc, char **argv) 1263 1263 { 1264 - struct bpf_create_map_attr attr = { NULL, }; 1264 + LIBBPF_OPTS(bpf_map_create_opts, attr); 1265 + enum bpf_map_type map_type = BPF_MAP_TYPE_UNSPEC; 1266 + __u32 key_size = 0, value_size = 0, max_entries = 0; 1267 + const char *map_name = NULL; 1265 1268 const char *pinfile; 1266 1269 int err = -1, fd; 1267 1270 ··· 1279 1276 if (is_prefix(*argv, "type")) { 1280 1277 NEXT_ARG(); 1281 1278 1282 - if (attr.map_type) { 1279 + if (map_type) { 1283 1280 p_err("map type already specified"); 1284 1281 goto exit; 1285 1282 } 1286 1283 1287 - attr.map_type = map_type_from_str(*argv); 1288 - if ((int)attr.map_type < 0) { 1284 + map_type = map_type_from_str(*argv); 1285 + if ((int)map_type < 0) { 1289 1286 p_err("unrecognized map type: %s", *argv); 1290 1287 goto exit; 1291 1288 } 1292 1289 NEXT_ARG(); 1293 1290 } else if (is_prefix(*argv, "name")) { 1294 1291 NEXT_ARG(); 1295 - attr.name = GET_ARG(); 1292 + map_name = GET_ARG(); 1296 1293 } else if (is_prefix(*argv, "key")) { 1297 - if (parse_u32_arg(&argc, &argv, &attr.key_size, 1294 + if (parse_u32_arg(&argc, &argv, &key_size, 1298 1295 "key size")) 1299 1296 goto exit; 1300 1297 } else if (is_prefix(*argv, "value")) { 1301 - if (parse_u32_arg(&argc, &argv, &attr.value_size, 1298 + if (parse_u32_arg(&argc, &argv, &value_size, 1302 1299 "value size")) 1303 1300 goto exit; 1304 1301 } else if (is_prefix(*argv, "entries")) { 1305 - if (parse_u32_arg(&argc, &argv, &attr.max_entries, 1302 + if (parse_u32_arg(&argc, &argv, &max_entries, 1306 1303 "max entries")) 1307 1304 goto exit; 1308 1305 } else if (is_prefix(*argv, "flags")) { ··· 1343 1340 } 1344 1341 } 1345 1342 1346 - if (!attr.name) { 1343 + if (!map_name) { 1347 1344 p_err("map name not specified"); 1348 1345 goto exit; 1349 1346 } 1350 1347 1351 1348 set_max_rlimit(); 1352 1349 1353 - fd = bpf_create_map_xattr(&attr); 1350 + fd = bpf_map_create(map_type, map_name, key_size, 
value_size, max_entries, &attr); 1354 1351 if (fd < 0) { 1355 1352 p_err("map create failed: %s", strerror(errno)); 1356 1353 goto exit;
+21 -23
tools/bpf/bpftool/prog.c
··· 1464 1464 DECLARE_LIBBPF_OPTS(bpf_object_open_opts, open_opts, 1465 1465 .relaxed_maps = relaxed_maps, 1466 1466 ); 1467 - struct bpf_object_load_attr load_attr = { 0 }; 1468 1467 enum bpf_attach_type expected_attach_type; 1469 1468 struct map_replace *map_replace = NULL; 1470 1469 struct bpf_program *prog = NULL, *pos; ··· 1597 1598 1598 1599 set_max_rlimit(); 1599 1600 1601 + if (verifier_logs) 1602 + /* log_level1 + log_level2 + stats, but not stable UAPI */ 1603 + open_opts.kernel_log_level = 1 + 2 + 4; 1604 + 1600 1605 obj = bpf_object__open_file(file, &open_opts); 1601 1606 if (libbpf_get_error(obj)) { 1602 1607 p_err("failed to open object file"); ··· 1680 1677 goto err_close_obj; 1681 1678 } 1682 1679 1683 - load_attr.obj = obj; 1684 - if (verifier_logs) 1685 - /* log_level1 + log_level2 + stats, but not stable UAPI */ 1686 - load_attr.log_level = 1 + 2 + 4; 1687 - 1688 - err = bpf_object__load_xattr(&load_attr); 1680 + err = bpf_object__load(obj); 1689 1681 if (err) { 1690 1682 p_err("failed to load object file"); 1691 1683 goto err_close_obj; ··· 1772 1774 sizeof(struct bpf_prog_desc)); 1773 1775 int log_buf_sz = (1u << 24) - 1; 1774 1776 int err, fds_before, fd_delta; 1775 - char *log_buf; 1777 + char *log_buf = NULL; 1776 1778 1777 1779 ctx = alloca(ctx_sz); 1778 1780 memset(ctx, 0, ctx_sz); 1779 1781 ctx->sz = ctx_sz; 1780 - ctx->log_level = 1; 1781 - ctx->log_size = log_buf_sz; 1782 - log_buf = malloc(log_buf_sz); 1783 - if (!log_buf) 1784 - return -ENOMEM; 1785 - ctx->log_buf = (long) log_buf; 1782 + if (verifier_logs) { 1783 + ctx->log_level = 1 + 2 + 4; 1784 + ctx->log_size = log_buf_sz; 1785 + log_buf = malloc(log_buf_sz); 1786 + if (!log_buf) 1787 + return -ENOMEM; 1788 + ctx->log_buf = (long) log_buf; 1789 + } 1786 1790 opts.ctx = ctx; 1787 1791 opts.data = gen->data; 1788 1792 opts.data_sz = gen->data_sz; ··· 1793 1793 fds_before = count_open_fds(); 1794 1794 err = bpf_load_and_run(&opts); 1795 1795 fd_delta = count_open_fds() - fds_before; 
1796 - if (err < 0) { 1796 + if (err < 0 || verifier_logs) { 1797 1797 fprintf(stderr, "err %d\n%s\n%s", err, opts.errstr, log_buf); 1798 - if (fd_delta) 1798 + if (fd_delta && err < 0) 1799 1799 fprintf(stderr, "loader prog leaked %d FDs\n", 1800 1800 fd_delta); 1801 1801 } ··· 1807 1807 { 1808 1808 DECLARE_LIBBPF_OPTS(bpf_object_open_opts, open_opts); 1809 1809 DECLARE_LIBBPF_OPTS(gen_loader_opts, gen); 1810 - struct bpf_object_load_attr load_attr = {}; 1811 1810 struct bpf_object *obj; 1812 1811 const char *file; 1813 1812 int err = 0; ··· 1814 1815 if (!REQ_ARGS(1)) 1815 1816 return -1; 1816 1817 file = GET_ARG(); 1818 + 1819 + if (verifier_logs) 1820 + /* log_level1 + log_level2 + stats, but not stable UAPI */ 1821 + open_opts.kernel_log_level = 1 + 2 + 4; 1817 1822 1818 1823 obj = bpf_object__open_file(file, &open_opts); 1819 1824 if (libbpf_get_error(obj)) { ··· 1829 1826 if (err) 1830 1827 goto err_close_obj; 1831 1828 1832 - load_attr.obj = obj; 1833 - if (verifier_logs) 1834 - /* log_level1 + log_level2 + stats, but not stable UAPI */ 1835 - load_attr.log_level = 1 + 2 + 4; 1836 - 1837 - err = bpf_object__load_xattr(&load_attr); 1829 + err = bpf_object__load(obj); 1838 1830 if (err) { 1839 1831 p_err("failed to load object file"); 1840 1832 goto err_close_obj;
+7 -8
tools/bpf/bpftool/struct_ops.c
··· 479 479 480 480 static int do_register(int argc, char **argv) 481 481 { 482 - struct bpf_object_load_attr load_attr = {}; 482 + LIBBPF_OPTS(bpf_object_open_opts, open_opts); 483 483 const struct bpf_map_def *def; 484 484 struct bpf_map_info info = {}; 485 485 __u32 info_len = sizeof(info); ··· 494 494 495 495 file = GET_ARG(); 496 496 497 - obj = bpf_object__open(file); 497 + if (verifier_logs) 498 + /* log_level1 + log_level2 + stats, but not stable UAPI */ 499 + open_opts.kernel_log_level = 1 + 2 + 4; 500 + 501 + obj = bpf_object__open_file(file, &open_opts); 498 502 if (libbpf_get_error(obj)) 499 503 return -1; 500 504 501 505 set_max_rlimit(); 502 506 503 - load_attr.obj = obj; 504 - if (verifier_logs) 505 - /* log_level1 + log_level2 + stats, but not stable UAPI */ 506 - load_attr.log_level = 1 + 2 + 4; 507 - 508 - if (bpf_object__load_xattr(&load_attr)) { 507 + if (bpf_object__load(obj)) { 509 508 bpf_object__close(obj); 510 509 return -1; 511 510 }
+3 -2
tools/bpf/resolve_btfids/main.c
··· 168 168 return NULL; 169 169 } 170 170 171 - static struct btf_id* 171 + static struct btf_id * 172 172 btf_id__add(struct rb_root *root, char *name, bool unique) 173 173 { 174 174 struct rb_node **p = &root->rb_node; ··· 732 732 if (obj.efile.idlist_shndx == -1 || 733 733 obj.efile.symbols_shndx == -1) { 734 734 pr_debug("Cannot find .BTF_ids or symbols sections, nothing to do\n"); 735 - return 0; 735 + err = 0; 736 + goto out; 736 737 } 737 738 738 739 if (symbols_collect(&obj))
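The resolve_btfids fix replaces a bare `return 0` with `err = 0; goto out;` so the early "nothing to do" exit still runs the function's teardown path. The idiom, sketched with a hypothetical resource:

```c
#include <assert.h>
#include <stdlib.h>

/* Single-exit pattern: every path after the resource is acquired
 * funnels through "out", so cleanup runs even on early success. */
static int demo_process(int nothing_to_do, int *freed)
{
	int err = -1;
	char *buf = malloc(64);

	if (!buf)
		return -1;    /* nothing acquired yet, plain return is fine */

	if (nothing_to_do) {
		err = 0;      /* early success must still clean up */
		goto out;
	}

	buf[0] = 'x';         /* pretend to do real work */
	err = 0;
out:
	free(buf);
	*freed = 1;
	return err;
}
```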
+6
tools/build/feature/test-bpf.c
··· 14 14 # define __NR_bpf 349 15 15 # elif defined(__s390__) 16 16 # define __NR_bpf 351 17 + # elif defined(__mips__) && defined(_ABIO32) 18 + # define __NR_bpf 4355 19 + # elif defined(__mips__) && defined(_ABIN32) 20 + # define __NR_bpf 6319 21 + # elif defined(__mips__) && defined(_ABI64) 22 + # define __NR_bpf 5315 17 23 # else 18 24 # error __NR_bpf not defined. libbpf does not support your arch. 19 25 # endif
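The new MIPS cases encode the fact that each MIPS ABI has its own syscall number space (o32 numbers start at 4000, n64 at 5000, n32 at 6000). Restated as data rather than an #ifdef ladder, using the same numbers as the hunk (an illustrative table, not anything in the tree):

```c
#include <assert.h>
#include <string.h>

struct demo_abi_nr {
	const char *abi;
	long nr_bpf;
};

/* __NR_bpf per MIPS ABI: base offset of the ABI plus bpf's slot. */
static const struct demo_abi_nr demo_mips_bpf_nr[] = {
	{ "o32", 4355 },   /* _ABIO32: base 4000 + 355 */
	{ "n32", 6319 },   /* _ABIN32: base 6000 + 319 */
	{ "n64", 5315 },   /* _ABI64:  base 5000 + 315 */
};

static long demo_nr_bpf(const char *abi)
{
	size_t i, n = sizeof(demo_mips_bpf_nr) / sizeof(demo_mips_bpf_nr[0]);

	for (i = 0; i < n; i++)
		if (!strcmp(demo_mips_bpf_nr[i].abi, abi))
			return demo_mips_bpf_nr[i].nr_bpf;
	return -1;         /* unknown ABI: mirrors the #error branch */
}
```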
+103 -2
tools/include/uapi/linux/bpf.h
··· 1342 1342 /* or valid module BTF object fd or 0 to attach to vmlinux */ 1343 1343 __u32 attach_btf_obj_fd; 1344 1344 }; 1345 - __u32 :32; /* pad */ 1345 + __u32 core_relo_cnt; /* number of bpf_core_relo */ 1346 1346 __aligned_u64 fd_array; /* array of FDs */ 1347 + __aligned_u64 core_relos; 1348 + __u32 core_relo_rec_size; /* sizeof(struct bpf_core_relo) */ 1347 1349 }; 1348 1350 1349 1351 struct { /* anonymous struct used by BPF_OBJ_* commands */ ··· 1746 1744 * if the maximum number of tail calls has been reached for this 1747 1745 * chain of programs. This limit is defined in the kernel by the 1748 1746 * macro **MAX_TAIL_CALL_CNT** (not accessible to user space), 1749 - * which is currently set to 32. 1747 + * which is currently set to 33. 1750 1748 * Return 1751 1749 * 0 on success, or a negative error in case of failure. 1752 1750 * ··· 4959 4957 * **-ENOENT** if *task->mm* is NULL, or no vma contains *addr*. 4960 4958 * **-EBUSY** if failed to try lock mmap_lock. 4961 4959 * **-EINVAL** for invalid **flags**. 4960 + * 4961 + * long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx, u64 flags) 4962 + * Description 4963 + * For **nr_loops**, call **callback_fn** function 4964 + * with **callback_ctx** as the context parameter. 4965 + * The **callback_fn** should be a static function and 4966 + * the **callback_ctx** should be a pointer to the stack. 4967 + * The **flags** is used to control certain aspects of the helper. 4968 + * Currently, the **flags** must be 0. Currently, nr_loops is 4969 + * limited to 1 << 23 (~8 million) loops. 4970 + * 4971 + * long (\*callback_fn)(u32 index, void \*ctx); 4972 + * 4973 + * where **index** is the current index in the loop. The index 4974 + * is zero-indexed. 4975 + * 4976 + * If **callback_fn** returns 0, the helper will continue to the next 4977 + * loop. If return value is 1, the helper will skip the rest of 4978 + * the loops and return. 
Other return values are not used now, 4979 + * and will be rejected by the verifier. 4980 + * 4981 + * Return 4982 + * The number of loops performed, **-EINVAL** for invalid **flags**, 4983 + * **-E2BIG** if **nr_loops** exceeds the maximum number of loops. 4962 4984 */ 4963 4985 #define __BPF_FUNC_MAPPER(FN) \ 4964 4986 FN(unspec), \ ··· 5166 5140 FN(skc_to_unix_sock), \ 5167 5141 FN(kallsyms_lookup_name), \ 5168 5142 FN(find_vma), \ 5143 + FN(loop), \ 5169 5144 /* */ 5170 5145 5171 5146 /* integer value in 'imm' field of BPF_CALL instruction selects which helper ··· 6374 6347 BTF_F_NONAME = (1ULL << 1), 6375 6348 BTF_F_PTR_RAW = (1ULL << 2), 6376 6349 BTF_F_ZERO = (1ULL << 3), 6350 + }; 6351 + 6352 + /* bpf_core_relo_kind encodes which aspect of captured field/type/enum value 6353 + * has to be adjusted by relocations. It is emitted by llvm and passed to 6354 + * libbpf and later to the kernel. 6355 + */ 6356 + enum bpf_core_relo_kind { 6357 + BPF_CORE_FIELD_BYTE_OFFSET = 0, /* field byte offset */ 6358 + BPF_CORE_FIELD_BYTE_SIZE = 1, /* field size in bytes */ 6359 + BPF_CORE_FIELD_EXISTS = 2, /* field existence in target kernel */ 6360 + BPF_CORE_FIELD_SIGNED = 3, /* field signedness (0 - unsigned, 1 - signed) */ 6361 + BPF_CORE_FIELD_LSHIFT_U64 = 4, /* bitfield-specific left bitshift */ 6362 + BPF_CORE_FIELD_RSHIFT_U64 = 5, /* bitfield-specific right bitshift */ 6363 + BPF_CORE_TYPE_ID_LOCAL = 6, /* type ID in local BPF object */ 6364 + BPF_CORE_TYPE_ID_TARGET = 7, /* type ID in target kernel */ 6365 + BPF_CORE_TYPE_EXISTS = 8, /* type existence in target kernel */ 6366 + BPF_CORE_TYPE_SIZE = 9, /* type size in bytes */ 6367 + BPF_CORE_ENUMVAL_EXISTS = 10, /* enum value existence in target kernel */ 6368 + BPF_CORE_ENUMVAL_VALUE = 11, /* enum value integer value */ 6369 + }; 6370 + 6371 + /* 6372 + * "struct bpf_core_relo" is used to pass relocation data form LLVM to libbpf 6373 + * and from libbpf to the kernel. 
6374 + * 6375 + * CO-RE relocation captures the following data: 6376 + * - insn_off - instruction offset (in bytes) within a BPF program that needs 6377 + * its insn->imm field to be relocated with actual field info; 6378 + * - type_id - BTF type ID of the "root" (containing) entity of a relocatable 6379 + * type or field; 6380 + * - access_str_off - offset into corresponding .BTF string section. String 6381 + * interpretation depends on specific relocation kind: 6382 + * - for field-based relocations, string encodes an accessed field using 6383 + * a sequence of field and array indices, separated by colon (:). It's 6384 + * conceptually very close to LLVM's getelementptr ([0]) instruction's 6385 + * arguments for identifying offset to a field. 6386 + * - for type-based relocations, strings is expected to be just "0"; 6387 + * - for enum value-based relocations, string contains an index of enum 6388 + * value within its enum type; 6389 + * - kind - one of enum bpf_core_relo_kind; 6390 + * 6391 + * Example: 6392 + * struct sample { 6393 + * int a; 6394 + * struct { 6395 + * int b[10]; 6396 + * }; 6397 + * }; 6398 + * 6399 + * struct sample *s = ...; 6400 + * int *x = &s->a; // encoded as "0:0" (a is field #0) 6401 + * int *y = &s->b[5]; // encoded as "0:1:0:5" (anon struct is field #1, 6402 + * // b is field #0 inside anon struct, accessing elem #5) 6403 + * int *z = &s[10]->b; // encoded as "10:1" (ptr is used as an array) 6404 + * 6405 + * type_id for all relocs in this example will capture BTF type id of 6406 + * `struct sample`. 
6407 + * 6408 + * Such relocation is emitted when using __builtin_preserve_access_index() 6409 + * Clang built-in, passing expression that captures field address, e.g.: 6410 + * 6411 + * bpf_probe_read(&dst, sizeof(dst), 6412 + * __builtin_preserve_access_index(&src->a.b.c)); 6413 + * 6414 + * In this case Clang will emit field relocation recording necessary data to 6415 + * be able to find offset of embedded `a.b.c` field within `src` struct. 6416 + * 6417 + * [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction 6418 + */ 6419 + struct bpf_core_relo { 6420 + __u32 insn_off; 6421 + __u32 type_id; 6422 + __u32 access_str_off; 6423 + enum bpf_core_relo_kind kind; 6377 6424 }; 6378 6425 6379 6426 #endif /* _UAPI__LINUX_BPF_H__ */
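The bpf_loop() helper documented in the bpf.h hunk above calls a callback up to nr_loops times, stops early when the callback returns 1, and reports how many iterations actually ran. That contract can be modeled in plain userspace C (an illustration of the documented semantics, not the in-kernel implementation):

```c
#include <assert.h>

typedef long (*demo_loop_fn)(unsigned int index, void *ctx);

/* Model of bpf_loop(): run cb for index 0..nr_loops-1; a return of 0
 * continues, 1 stops early; the result is the number of loops done. */
static long demo_bpf_loop(unsigned int nr_loops, demo_loop_fn cb, void *ctx)
{
	unsigned int i;

	for (i = 0; i < nr_loops; i++)
		if (cb(i, ctx) == 1)
			return i + 1;
	return nr_loops;
}

/* Example callback: accumulate indices, stop after processing index 5. */
static long demo_sum_until_five(unsigned int index, void *ctx)
{
	*(int *)ctx += (int)index;
	return index == 5 ? 1 : 0;
}
```

In the real helper the callback must be a static BPF function and ctx a stack pointer; those constraints disappear in this userspace model.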
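The struct bpf_core_relo comment above encodes field accesses as colon-separated index strings such as "0:1:0:5". A small parser makes the encoding concrete (a hedged sketch of the string format only, not libbpf's internal spec parser):

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a CO-RE access string like "0:1:0:5" into its index sequence;
 * returns the number of indices parsed, or -1 on clearly bad input. */
static int demo_parse_access_str(const char *s, int *out, int max)
{
	int n = 0;
	char *end;

	while (*s) {
		long v = strtol(s, &end, 10);

		if (end == s || v < 0 || n >= max)
			return -1;
		out[n++] = (int)v;
		if (*end == ':')
			end++;          /* separator: another index follows */
		else if (*end != '\0')
			return -1;      /* anything but ':' or end is bad */
		s = end;
	}
	return n;
}
```

For the hunk's example, "0:1:0:5" decodes to field #0 (`a`'s struct), field #1 (the anonymous struct), field #0 inside it (`b`), element #5.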
+128 -102
tools/lib/bpf/bpf.c
···
 # define __NR_bpf 351
 # elif defined(__arc__)
 # define __NR_bpf 280
+# elif defined(__mips__) && defined(_ABIO32)
+# define __NR_bpf 4355
+# elif defined(__mips__) && defined(_ABIN32)
+# define __NR_bpf 6319
+# elif defined(__mips__) && defined(_ABI64)
+# define __NR_bpf 5315
 # else
 # error __NR_bpf not defined. libbpf does not support your arch.
 # endif
···
 	return fd;
 }
 
-int libbpf__bpf_create_map_xattr(const struct bpf_create_map_params *create_attr)
+int bpf_map_create(enum bpf_map_type map_type,
+		   const char *map_name,
+		   __u32 key_size,
+		   __u32 value_size,
+		   __u32 max_entries,
+		   const struct bpf_map_create_opts *opts)
 {
+	const size_t attr_sz = offsetofend(union bpf_attr, map_extra);
 	union bpf_attr attr;
 	int fd;
 
-	memset(&attr, '\0', sizeof(attr));
+	memset(&attr, 0, attr_sz);
 
-	attr.map_type = create_attr->map_type;
-	attr.key_size = create_attr->key_size;
-	attr.value_size = create_attr->value_size;
-	attr.max_entries = create_attr->max_entries;
-	attr.map_flags = create_attr->map_flags;
-	if (create_attr->name)
-		memcpy(attr.map_name, create_attr->name,
-		       min(strlen(create_attr->name), BPF_OBJ_NAME_LEN - 1));
-	attr.numa_node = create_attr->numa_node;
-	attr.btf_fd = create_attr->btf_fd;
-	attr.btf_key_type_id = create_attr->btf_key_type_id;
-	attr.btf_value_type_id = create_attr->btf_value_type_id;
-	attr.map_ifindex = create_attr->map_ifindex;
-	if (attr.map_type == BPF_MAP_TYPE_STRUCT_OPS)
-		attr.btf_vmlinux_value_type_id =
-			create_attr->btf_vmlinux_value_type_id;
-	else
-		attr.inner_map_fd = create_attr->inner_map_fd;
-	attr.map_extra = create_attr->map_extra;
+	if (!OPTS_VALID(opts, bpf_map_create_opts))
+		return libbpf_err(-EINVAL);
 
-	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, sizeof(attr));
+	attr.map_type = map_type;
+	if (map_name)
+		strncat(attr.map_name, map_name, sizeof(attr.map_name) - 1);
+	attr.key_size = key_size;
+	attr.value_size = value_size;
+	attr.max_entries = max_entries;
+
+	attr.btf_fd = OPTS_GET(opts, btf_fd, 0);
+	attr.btf_key_type_id = OPTS_GET(opts, btf_key_type_id, 0);
+	attr.btf_value_type_id = OPTS_GET(opts, btf_value_type_id, 0);
+	attr.btf_vmlinux_value_type_id = OPTS_GET(opts, btf_vmlinux_value_type_id, 0);
+
+	attr.inner_map_fd = OPTS_GET(opts, inner_map_fd, 0);
+	attr.map_flags = OPTS_GET(opts, map_flags, 0);
+	attr.map_extra = OPTS_GET(opts, map_extra, 0);
+	attr.numa_node = OPTS_GET(opts, numa_node, 0);
+	attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
+
+	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
 	return libbpf_err_errno(fd);
 }
 
 int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
 {
-	struct bpf_create_map_params p = {};
+	LIBBPF_OPTS(bpf_map_create_opts, p);
 
-	p.map_type = create_attr->map_type;
-	p.key_size = create_attr->key_size;
-	p.value_size = create_attr->value_size;
-	p.max_entries = create_attr->max_entries;
 	p.map_flags = create_attr->map_flags;
-	p.name = create_attr->name;
 	p.numa_node = create_attr->numa_node;
 	p.btf_fd = create_attr->btf_fd;
 	p.btf_key_type_id = create_attr->btf_key_type_id;
 	p.btf_value_type_id = create_attr->btf_value_type_id;
 	p.map_ifindex = create_attr->map_ifindex;
-	if (p.map_type == BPF_MAP_TYPE_STRUCT_OPS)
-		p.btf_vmlinux_value_type_id =
-			create_attr->btf_vmlinux_value_type_id;
+	if (create_attr->map_type == BPF_MAP_TYPE_STRUCT_OPS)
+		p.btf_vmlinux_value_type_id = create_attr->btf_vmlinux_value_type_id;
 	else
 		p.inner_map_fd = create_attr->inner_map_fd;
 
-	return libbpf__bpf_create_map_xattr(&p);
+	return bpf_map_create(create_attr->map_type, create_attr->name,
+			      create_attr->key_size, create_attr->value_size,
+			      create_attr->max_entries, &p);
 }
 
 int bpf_create_map_node(enum bpf_map_type map_type, const char *name,
 			int key_size, int value_size, int max_entries,
 			__u32 map_flags, int node)
 {
-	struct bpf_create_map_attr map_attr = {};
+	LIBBPF_OPTS(bpf_map_create_opts, opts);
 
-	map_attr.name = name;
-	map_attr.map_type = map_type;
-	map_attr.map_flags = map_flags;
-	map_attr.key_size = key_size;
-	map_attr.value_size = value_size;
-	map_attr.max_entries = max_entries;
+	opts.map_flags = map_flags;
 	if (node >= 0) {
-		map_attr.numa_node = node;
-		map_attr.map_flags |= BPF_F_NUMA_NODE;
+		opts.numa_node = node;
+		opts.map_flags |= BPF_F_NUMA_NODE;
 	}
 
-	return bpf_create_map_xattr(&map_attr);
+	return bpf_map_create(map_type, name, key_size, value_size, max_entries, &opts);
 }
 
 int bpf_create_map(enum bpf_map_type map_type, int key_size,
 		   int value_size, int max_entries, __u32 map_flags)
 {
-	struct bpf_create_map_attr map_attr = {};
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = map_flags);
 
-	map_attr.map_type = map_type;
-	map_attr.map_flags = map_flags;
-	map_attr.key_size = key_size;
-	map_attr.value_size = value_size;
-	map_attr.max_entries = max_entries;
-
-	return bpf_create_map_xattr(&map_attr);
+	return bpf_map_create(map_type, NULL, key_size, value_size, max_entries, &opts);
 }
 
 int bpf_create_map_name(enum bpf_map_type map_type, const char *name,
 			int key_size, int value_size, int max_entries,
 			__u32 map_flags)
 {
-	struct bpf_create_map_attr map_attr = {};
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = map_flags);
 
-	map_attr.name = name;
-	map_attr.map_type = map_type;
-	map_attr.map_flags = map_flags;
-	map_attr.key_size = key_size;
-	map_attr.value_size = value_size;
-	map_attr.max_entries = max_entries;
-
-	return bpf_create_map_xattr(&map_attr);
+	return bpf_map_create(map_type, name, key_size, value_size, max_entries, &opts);
 }
 
 int bpf_create_map_in_map_node(enum bpf_map_type map_type, const char *name,
 			       int key_size, int inner_map_fd, int max_entries,
 			       __u32 map_flags, int node)
 {
-	union bpf_attr attr;
-	int fd;
+	LIBBPF_OPTS(bpf_map_create_opts, opts);
 
-	memset(&attr, '\0', sizeof(attr));
-
-	attr.map_type = map_type;
-	attr.key_size = key_size;
-	attr.value_size = 4;
-	attr.inner_map_fd = inner_map_fd;
-	attr.max_entries = max_entries;
-	attr.map_flags = map_flags;
-	if (name)
-		memcpy(attr.map_name, name,
-		       min(strlen(name), BPF_OBJ_NAME_LEN - 1));
-
+	opts.inner_map_fd = inner_map_fd;
+	opts.map_flags = map_flags;
 	if (node >= 0) {
-		attr.map_flags |= BPF_F_NUMA_NODE;
-		attr.numa_node = node;
+		opts.map_flags |= BPF_F_NUMA_NODE;
+		opts.numa_node = node;
 	}
 
-	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, sizeof(attr));
-	return libbpf_err_errno(fd);
+	return bpf_map_create(map_type, name, key_size, 4, max_entries, &opts);
 }
 
 int bpf_create_map_in_map(enum bpf_map_type map_type, const char *name,
 			  int key_size, int inner_map_fd, int max_entries,
 			  __u32 map_flags)
 {
-	return bpf_create_map_in_map_node(map_type, name, key_size,
-					  inner_map_fd, max_entries, map_flags,
-					  -1);
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		.inner_map_fd = inner_map_fd,
+		.map_flags = map_flags,
+	);
+
+	return bpf_map_create(map_type, name, key_size, 4, max_entries, &opts);
 }
 
 static void *
···
 	if (log_level && !log_buf)
 		return libbpf_err(-EINVAL);
 
-	attr.log_level = log_level;
-	attr.log_buf = ptr_to_u64(log_buf);
-	attr.log_size = log_size;
-
 	func_info_rec_size = OPTS_GET(opts, func_info_rec_size, 0);
 	func_info = OPTS_GET(opts, func_info, NULL);
 	attr.func_info_rec_size = func_info_rec_size;
···
 	attr.line_info_cnt = OPTS_GET(opts, line_info_cnt, 0);
 
 	attr.fd_array = ptr_to_u64(OPTS_GET(opts, fd_array, NULL));
+
+	if (log_level) {
+		attr.log_buf = ptr_to_u64(log_buf);
+		attr.log_size = log_size;
+		attr.log_level = log_level;
+	}
 
 	fd = sys_bpf_prog_load(&attr, sizeof(attr), attempts);
 	if (fd >= 0)
···
 		goto done;
 	}
 
-	if (log_level || !log_buf)
-		goto done;
+	if (log_level == 0 && log_buf) {
+		/* log_level == 0 with non-NULL log_buf requires retrying on error
+		 * with log_level == 1 and log_buf/log_buf_size set, to get details of
+		 * failure
+		 */
+		attr.log_buf = ptr_to_u64(log_buf);
+		attr.log_size = log_size;
+		attr.log_level = 1;
 
-	/* Try again with log */
-	log_buf[0] = 0;
-	attr.log_buf = ptr_to_u64(log_buf);
-	attr.log_size = log_size;
-	attr.log_level = 1;
-
-	fd = sys_bpf_prog_load(&attr, sizeof(attr), attempts);
+		fd = sys_bpf_prog_load(&attr, sizeof(attr), attempts);
+	}
 done:
 	/* free() doesn't affect errno, so we don't need to restore it */
 	free(finfo);
···
 	return libbpf_err_errno(fd);
 }
 
-int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
-		 bool do_log)
+int bpf_btf_load(const void *btf_data, size_t btf_size, const struct bpf_btf_load_opts *opts)
 {
-	union bpf_attr attr = {};
+	const size_t attr_sz = offsetofend(union bpf_attr, btf_log_level);
+	union bpf_attr attr;
+	char *log_buf;
+	size_t log_size;
+	__u32 log_level;
 	int fd;
 
-	attr.btf = ptr_to_u64(btf);
+	memset(&attr, 0, attr_sz);
+
+	if (!OPTS_VALID(opts, bpf_btf_load_opts))
+		return libbpf_err(-EINVAL);
+
+	log_buf = OPTS_GET(opts, log_buf, NULL);
+	log_size = OPTS_GET(opts, log_size, 0);
+	log_level = OPTS_GET(opts, log_level, 0);
+
+	if (log_size > UINT_MAX)
+		return libbpf_err(-EINVAL);
+	if (log_size && !log_buf)
+		return libbpf_err(-EINVAL);
+
+	attr.btf = ptr_to_u64(btf_data);
 	attr.btf_size = btf_size;
+	/* log_level == 0 and log_buf != NULL means "try loading without
+	 * log_buf, but retry with log_buf and log_level=1 on error", which is
+	 * consistent across low-level and high-level BTF and program loading
+	 * APIs within libbpf and provides a sensible behavior in practice
+	 */
+	if (log_level) {
+		attr.btf_log_buf = ptr_to_u64(log_buf);
+		attr.btf_log_size = (__u32)log_size;
+		attr.btf_log_level = log_level;
+	}
+
+	fd = sys_bpf_fd(BPF_BTF_LOAD, &attr, attr_sz);
+	if (fd < 0 && log_buf && log_level == 0) {
+		attr.btf_log_buf = ptr_to_u64(log_buf);
+		attr.btf_log_size = (__u32)log_size;
+		attr.btf_log_level = 1;
+		fd = sys_bpf_fd(BPF_BTF_LOAD, &attr, attr_sz);
+	}
+	return libbpf_err_errno(fd);
+}
+
+int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size, bool do_log)
+{
+	LIBBPF_OPTS(bpf_btf_load_opts, opts);
+	int fd;
 
 retry:
 	if (do_log && log_buf && log_buf_size) {
-		attr.btf_log_level = 1;
-		attr.btf_log_size = log_buf_size;
-		attr.btf_log_buf = ptr_to_u64(log_buf);
+		opts.log_buf = log_buf;
+		opts.log_size = log_buf_size;
+		opts.log_level = 1;
 	}
 
-	fd = sys_bpf_fd(BPF_BTF_LOAD, &attr, sizeof(attr));
-
+	fd = bpf_btf_load(btf, btf_size, &opts);
 	if (fd < 0 && !do_log && log_buf && log_buf_size) {
 		do_log = true;
 		goto retry;
+50 -5
tools/lib/bpf/bpf.h
···
 extern "C" {
 #endif
 
+struct bpf_map_create_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+
+	__u32 btf_fd;
+	__u32 btf_key_type_id;
+	__u32 btf_value_type_id;
+	__u32 btf_vmlinux_value_type_id;
+
+	__u32 inner_map_fd;
+	__u32 map_flags;
+	__u64 map_extra;
+
+	__u32 numa_node;
+	__u32 map_ifindex;
+};
+#define bpf_map_create_opts__last_field map_ifindex
+
+LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
+			      const char *map_name,
+			      __u32 key_size,
+			      __u32 value_size,
+			      __u32 max_entries,
+			      const struct bpf_map_create_opts *opts);
+
 struct bpf_create_map_attr {
 	const char *name;
 	enum bpf_map_type map_type;
···
 	};
 };
 
-LIBBPF_API int
-bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr);
+LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
+LIBBPF_API int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr);
+LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
 LIBBPF_API int bpf_create_map_node(enum bpf_map_type map_type, const char *name,
 				   int key_size, int value_size,
 				   int max_entries, __u32 map_flags, int node);
+LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
 LIBBPF_API int bpf_create_map_name(enum bpf_map_type map_type, const char *name,
 				   int key_size, int value_size,
 				   int max_entries, __u32 map_flags);
+LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
 LIBBPF_API int bpf_create_map(enum bpf_map_type map_type, int key_size,
 			      int value_size, int max_entries, __u32 map_flags);
+LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
 LIBBPF_API int bpf_create_map_in_map_node(enum bpf_map_type map_type,
 					  const char *name, int key_size,
 					  int inner_map_fd, int max_entries,
 					  __u32 map_flags, int node);
+LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_map_create() instead")
 LIBBPF_API int bpf_create_map_in_map(enum bpf_map_type map_type,
 				     const char *name, int key_size,
 				     int inner_map_fd, int max_entries,
···
 /* Flags to direct loading requirements */
 #define MAPS_RELAX_COMPAT	0x01
 
-/* Recommend log buffer size */
+/* Recommended log buffer size */
 #define BPF_LOG_BUF_SIZE (UINT32_MAX >> 8) /* verifier maximum in kernels <= 5.1 */
+
 LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_prog_load() instead")
 LIBBPF_API int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 				      char *log_buf, size_t log_buf_sz);
···
 				   const char *license, __u32 kern_version,
 				   char *log_buf, size_t log_buf_sz,
 				   int log_level);
+
+struct bpf_btf_load_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+
+	/* kernel log options */
+	char *log_buf;
+	__u32 log_level;
+	__u32 log_size;
+};
+#define bpf_btf_load_opts__last_field log_size
+
+LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
+			    const struct bpf_btf_load_opts *opts);
+
+LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_btf_load() instead")
+LIBBPF_API int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf,
+			    __u32 log_buf_size, bool do_log);
 
 LIBBPF_API int bpf_map_update_elem(int fd, const void *key, const void *value,
 				   __u64 flags);
···
 				   __u32 query_flags, __u32 *attach_flags,
 				   __u32 *prog_ids, __u32 *prog_cnt);
 LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
-LIBBPF_API int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf,
-			    __u32 log_buf_size, bool do_log);
 LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
 				 __u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
 				 __u64 *probe_offset, __u64 *probe_addr);
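The new `bpf_map_create_opts`/`bpf_btf_load_opts` structs above both lead with a `size_t sz` field and a `*__last_field` marker: callers stamp the struct size they compiled against, and the library only reads fields that fit inside it. As a simplified, self-contained sketch of that idiom (the `demo_*` names and macros below are invented for illustration and are not libbpf's actual `LIBBPF_OPTS`/`OPTS_GET` implementation):

```c
#include <stddef.h>

/* mirror of the kernel/libbpf offsetofend() helper */
#define demo_offsetofend(TYPE, FIELD) \
	(offsetof(TYPE, FIELD) + sizeof(((TYPE *)0)->FIELD))

/* a demo opts struct: sz records the size the caller compiled against */
struct demo_map_create_opts {
	size_t sz;
	unsigned int map_flags;
	unsigned int numa_node;
};

/* declare opts with sz pre-filled, like LIBBPF_OPTS() does */
#define DEMO_OPTS(type, name, ...) \
	struct type name = { .sz = sizeof(struct type), ##__VA_ARGS__ }

/* read a field only if the caller's struct is large enough to hold it */
#define DEMO_OPTS_GET(opts, field, fallback)				\
	((opts) && (opts)->sz >=					\
		demo_offsetofend(struct demo_map_create_opts, field) ?	\
	 (opts)->field : (fallback))

static unsigned int demo_get_flags(const struct demo_map_create_opts *opts)
{
	return DEMO_OPTS_GET(opts, map_flags, 0);
}
```

The payoff is forward/backward compatibility: a binary built against an older, shorter opts struct keeps working against a newer library, because fields beyond the recorded `sz` simply fall back to their defaults.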
+8 -1
tools/lib/bpf/bpf_gen_internal.h
···
 	int error;
 	struct ksym_relo_desc *relos;
 	int relo_cnt;
+	struct bpf_core_relo *core_relos;
+	int core_relo_cnt;
 	char attach_target[128];
 	int attach_kind;
 	struct ksym_desc *ksyms;
···
 int bpf_gen__finish(struct bpf_gen *gen, int nr_progs, int nr_maps);
 void bpf_gen__free(struct bpf_gen *gen);
 void bpf_gen__load_btf(struct bpf_gen *gen, const void *raw_data, __u32 raw_size);
-void bpf_gen__map_create(struct bpf_gen *gen, struct bpf_create_map_params *map_attr, int map_idx);
+void bpf_gen__map_create(struct bpf_gen *gen,
+			 enum bpf_map_type map_type, const char *map_name,
+			 __u32 key_size, __u32 value_size, __u32 max_entries,
+			 struct bpf_map_create_opts *map_attr, int map_idx);
 void bpf_gen__prog_load(struct bpf_gen *gen,
 			enum bpf_prog_type prog_type, const char *prog_name,
 			const char *license, struct bpf_insn *insns, size_t insn_cnt,
···
 void bpf_gen__record_attach_target(struct bpf_gen *gen, const char *name, enum bpf_attach_type type);
 void bpf_gen__record_extern(struct bpf_gen *gen, const char *name, bool is_weak,
 			    bool is_typeless, int kind, int insn_idx);
+void bpf_gen__record_relo_core(struct bpf_gen *gen, const struct bpf_core_relo *core_relo);
+void bpf_gen__populate_outer_map(struct bpf_gen *gen, int outer_map_idx, int key, int inner_map_idx);
 
 #endif
+107 -36
tools/lib/bpf/btf.c
···
 }
 
 /* internal helper returning non-const pointer to a type */
-struct btf_type *btf_type_by_id(struct btf *btf, __u32 type_id)
+struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id)
 {
 	if (type_id == 0)
 		return &btf_void;
···
 	case BTF_KIND_RESTRICT:
 	case BTF_KIND_VAR:
 	case BTF_KIND_DECL_TAG:
+	case BTF_KIND_TYPE_TAG:
 		type_id = t->type;
 		break;
 	case BTF_KIND_ARRAY:
···
 
 static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endian);
 
-int btf__load_into_kernel(struct btf *btf)
+int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level)
 {
-	__u32 log_buf_size = 0, raw_size;
-	char *log_buf = NULL;
+	LIBBPF_OPTS(bpf_btf_load_opts, opts);
+	__u32 buf_sz = 0, raw_size;
+	char *buf = NULL, *tmp;
 	void *raw_data;
 	int err = 0;
 
 	if (btf->fd >= 0)
 		return libbpf_err(-EEXIST);
+	if (log_sz && !log_buf)
+		return libbpf_err(-EINVAL);
 
-retry_load:
-	if (log_buf_size) {
-		log_buf = malloc(log_buf_size);
-		if (!log_buf)
-			return libbpf_err(-ENOMEM);
-
-		*log_buf = 0;
-	}
-
+	/* cache native raw data representation */
 	raw_data = btf_get_raw_data(btf, &raw_size, false);
 	if (!raw_data) {
 		err = -ENOMEM;
 		goto done;
 	}
-	/* cache native raw data representation */
 	btf->raw_size = raw_size;
 	btf->raw_data = raw_data;
 
-	btf->fd = bpf_load_btf(raw_data, raw_size, log_buf, log_buf_size, false);
-	if (btf->fd < 0) {
-		if (!log_buf || errno == ENOSPC) {
-			log_buf_size = max((__u32)BPF_LOG_BUF_SIZE,
-					   log_buf_size << 1);
-			free(log_buf);
-			goto retry_load;
+retry_load:
+	/* if log_level is 0, we won't provide log_buf/log_size to the kernel,
+	 * initially. Only if BTF loading fails, we bump log_level to 1 and
+	 * retry, using either auto-allocated or custom log_buf. This way
+	 * non-NULL custom log_buf provides a buffer just in case, but hopes
+	 * for successful load and no need for log_buf.
+	 */
+	if (log_level) {
+		/* if caller didn't provide custom log_buf, we'll keep
+		 * allocating our own progressively bigger buffers for BTF
+		 * verification log
+		 */
+		if (!log_buf) {
+			buf_sz = max((__u32)BPF_LOG_BUF_SIZE, buf_sz * 2);
+			tmp = realloc(buf, buf_sz);
+			if (!tmp) {
+				err = -ENOMEM;
+				goto done;
+			}
+			buf = tmp;
+			buf[0] = '\0';
 		}
 
+		opts.log_buf = log_buf ? log_buf : buf;
+		opts.log_size = log_buf ? log_sz : buf_sz;
+		opts.log_level = log_level;
+	}
+
+	btf->fd = bpf_btf_load(raw_data, raw_size, &opts);
+	if (btf->fd < 0) {
+		/* time to turn on verbose mode and try again */
+		if (log_level == 0) {
+			log_level = 1;
+			goto retry_load;
+		}
+		/* only retry if caller didn't provide custom log_buf, but
+		 * make sure we can never overflow buf_sz
+		 */
+		if (!log_buf && errno == ENOSPC && buf_sz <= UINT_MAX / 2)
+			goto retry_load;
+
 		err = -errno;
-		pr_warn("Error loading BTF: %s(%d)\n", strerror(errno), errno);
-		if (*log_buf)
-			pr_warn("%s\n", log_buf);
-		goto done;
+		pr_warn("BTF loading error: %d\n", err);
+		/* don't print out contents of custom log_buf */
+		if (!log_buf && buf[0])
+			pr_warn("-- BEGIN BTF LOAD LOG ---\n%s\n-- END BTF LOAD LOG --\n", buf);
 	}
 
 done:
-	free(log_buf);
+	free(buf);
 	return libbpf_err(err);
 }
+
+int btf__load_into_kernel(struct btf *btf)
+{
+	return btf_load_into_kernel(btf, NULL, 0, 0);
+}
+
 int btf__load(struct btf *) __attribute__((alias("btf__load_into_kernel")));
 
 int btf__fd(const struct btf *btf)
···
 	free(btf_ext);
 }
 
-struct btf_ext *btf_ext__new(__u8 *data, __u32 size)
+struct btf_ext *btf_ext__new(const __u8 *data, __u32 size)
 {
 	struct btf_ext *btf_ext;
 	int err;
-
-	err = btf_ext_parse_hdr(data, size);
-	if (err)
-		return libbpf_err_ptr(err);
 
 	btf_ext = calloc(1, sizeof(struct btf_ext));
 	if (!btf_ext)
···
 		goto done;
 	}
 	memcpy(btf_ext->data, data, size);
+
+	err = btf_ext_parse_hdr(btf_ext->data, size);
+	if (err)
+		goto done;
 
 	if (btf_ext->hdr->hdr_len < offsetofend(struct btf_ext_header, line_info_len)) {
 		err = -EINVAL;
···
 	return libbpf_err(err);
 }
 
-COMPAT_VERSION(bpf__dedup_deprecated, btf__dedup, LIBBPF_0.0.2)
+COMPAT_VERSION(btf__dedup_deprecated, btf__dedup, LIBBPF_0.0.2)
 int btf__dedup_deprecated(struct btf *btf, struct btf_ext *btf_ext, const void *unused_opts)
 {
 	LIBBPF_OPTS(btf_dedup_opts, opts, .btf_ext = btf_ext);
···
 }
 
 /*
- * Check structural compatibility of two FUNC_PROTOs, ignoring referenced type
- * IDs. This check is performed during type graph equivalence check and
+ * Check structural compatibility of two STRUCTs/UNIONs, ignoring referenced
+ * type IDs. This check is performed during type graph equivalence check and
  * referenced types equivalence is checked separately.
  */
 static bool btf_shallow_equal_struct(struct btf_type *t1, struct btf_type *t2)
···
 	return btf_equal_array(t1, t2);
 }
 
+/* Check if given two types are identical STRUCT/UNION definitions */
+static bool btf_dedup_identical_structs(struct btf_dedup *d, __u32 id1, __u32 id2)
+{
+	const struct btf_member *m1, *m2;
+	struct btf_type *t1, *t2;
+	int n, i;
+
+	t1 = btf_type_by_id(d->btf, id1);
+	t2 = btf_type_by_id(d->btf, id2);
+
+	if (!btf_is_composite(t1) || btf_kind(t1) != btf_kind(t2))
+		return false;
+
+	if (!btf_shallow_equal_struct(t1, t2))
+		return false;
+
+	m1 = btf_members(t1);
+	m2 = btf_members(t2);
+	for (i = 0, n = btf_vlen(t1); i < n; i++, m1++, m2++) {
+		if (m1->type != m2->type)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Check equivalence of BTF type graph formed by candidate struct/union (we'll
  * call it "candidate graph" in this description for brevity) to a type graph
···
 
 	hypot_type_id = d->hypot_map[canon_id];
 	if (hypot_type_id <= BTF_MAX_NR_TYPES) {
+		if (hypot_type_id == cand_id)
+			return 1;
 		/* In some cases compiler will generate different DWARF types
 		 * for *identical* array type definitions and use them for
 		 * different fields within the *same* struct. This breaks type
···
 		 * types within a single CU. So work around that by explicitly
 		 * allowing identical array types here.
 		 */
-		return hypot_type_id == cand_id ||
-		       btf_dedup_identical_arrays(d, hypot_type_id, cand_id);
+		if (btf_dedup_identical_arrays(d, hypot_type_id, cand_id))
+			return 1;
+		/* It turns out that similar situation can happen with
+		 * struct/union sometimes, sigh... Handle the case where
+		 * structs/unions are exactly the same, down to the referenced
+		 * type IDs. Anything more complicated (e.g., if referenced
+		 * types are different, but equivalent) is *way more*
+		 * complicated and requires a many-to-many equivalence mapping.
+		 */
+		if (btf_dedup_identical_structs(d, hypot_type_id, cand_id))
+			return 1;
+		return 0;
 	}
 
 	if (btf_dedup_hypot_map_add(d, canon_id, cand_id))
···
 	case BTF_KIND_PTR:
 	case BTF_KIND_TYPEDEF:
 	case BTF_KIND_FUNC:
+	case BTF_KIND_TYPE_TAG:
 		if (cand_type->info != canon_type->info)
 			return 0;
 		return btf_dedup_is_equiv(d, cand_type->type, canon_type->type);
+1 -1
tools/lib/bpf/btf.h
···
 				    __u32 expected_value_size,
 				    __u32 *key_type_id, __u32 *value_type_id);
 
-LIBBPF_API struct btf_ext *btf_ext__new(__u8 *data, __u32 size);
+LIBBPF_API struct btf_ext *btf_ext__new(const __u8 *data, __u32 size);
 LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext);
 LIBBPF_API const void *btf_ext__get_raw_data(const struct btf_ext *btf_ext,
 					     __u32 *size);
+1 -1
tools/lib/bpf/btf_dump.c
···
 				   __u8 bits_offset,
 				   __u8 bit_sz)
 {
-	int size, err;
+	int size, err = 0;
 
 	size = btf_dump_type_data_check_overflow(d, t, id, data, bits_offset);
 	if (size < 0)
+107 -53
tools/lib/bpf/gen_loader.c
···
 }
 
 void bpf_gen__map_create(struct bpf_gen *gen,
-			 struct bpf_create_map_params *map_attr, int map_idx)
+			 enum bpf_map_type map_type,
+			 const char *map_name,
+			 __u32 key_size, __u32 value_size, __u32 max_entries,
+			 struct bpf_map_create_opts *map_attr, int map_idx)
 {
-	int attr_size = offsetofend(union bpf_attr, btf_vmlinux_value_type_id);
+	int attr_size = offsetofend(union bpf_attr, map_extra);
 	bool close_inner_map_fd = false;
 	int map_create_attr, idx;
 	union bpf_attr attr;
 
 	memset(&attr, 0, attr_size);
-	attr.map_type = map_attr->map_type;
-	attr.key_size = map_attr->key_size;
-	attr.value_size = map_attr->value_size;
+	attr.map_type = map_type;
+	attr.key_size = key_size;
+	attr.value_size = value_size;
 	attr.map_flags = map_attr->map_flags;
 	attr.map_extra = map_attr->map_extra;
-	memcpy(attr.map_name, map_attr->name,
-	       min((unsigned)strlen(map_attr->name), BPF_OBJ_NAME_LEN - 1));
+	if (map_name)
+		memcpy(attr.map_name, map_name,
+		       min((unsigned)strlen(map_name), BPF_OBJ_NAME_LEN - 1));
 	attr.numa_node = map_attr->numa_node;
 	attr.map_ifindex = map_attr->map_ifindex;
-	attr.max_entries = map_attr->max_entries;
-	switch (attr.map_type) {
-	case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
-	case BPF_MAP_TYPE_CGROUP_ARRAY:
-	case BPF_MAP_TYPE_STACK_TRACE:
-	case BPF_MAP_TYPE_ARRAY_OF_MAPS:
-	case BPF_MAP_TYPE_HASH_OF_MAPS:
-	case BPF_MAP_TYPE_DEVMAP:
-	case BPF_MAP_TYPE_DEVMAP_HASH:
-	case BPF_MAP_TYPE_CPUMAP:
-	case BPF_MAP_TYPE_XSKMAP:
-	case BPF_MAP_TYPE_SOCKMAP:
-	case BPF_MAP_TYPE_SOCKHASH:
-	case BPF_MAP_TYPE_QUEUE:
-	case BPF_MAP_TYPE_STACK:
-	case BPF_MAP_TYPE_RINGBUF:
-		break;
-	default:
-		attr.btf_key_type_id = map_attr->btf_key_type_id;
-		attr.btf_value_type_id = map_attr->btf_value_type_id;
-	}
+	attr.max_entries = max_entries;
+	attr.btf_key_type_id = map_attr->btf_key_type_id;
+	attr.btf_value_type_id = map_attr->btf_value_type_id;
 
 	pr_debug("gen: map_create: %s idx %d type %d value_type_id %d\n",
-		 attr.map_name, map_idx, map_attr->map_type, attr.btf_value_type_id);
+		 attr.map_name, map_idx, map_type, attr.btf_value_type_id);
 
 	map_create_attr = add_data(gen, &attr, attr_size);
 	if (attr.btf_value_type_id)
···
 	/* emit MAP_CREATE command */
 	emit_sys_bpf(gen, BPF_MAP_CREATE, map_create_attr, attr_size);
 	debug_ret(gen, "map_create %s idx %d type %d value_size %d value_btf_id %d",
-		  attr.map_name, map_idx, map_attr->map_type, attr.value_size,
+		  attr.map_name, map_idx, map_type, value_size,
 		  attr.btf_value_type_id);
 	emit_check_err(gen);
 	/* remember map_fd in the stack, if successful */
···
 		return;
 	}
 	kdesc->off = btf_fd_idx;
-	/* set a default value for imm */
+	/* jump to success case */
+	emit(gen, BPF_JMP_IMM(BPF_JSGE, BPF_REG_7, 0, 3));
+	/* set value for imm, off as 0 */
 	emit(gen, BPF_ST_MEM(BPF_W, BPF_REG_8, offsetof(struct bpf_insn, imm), 0));
-	/* skip success case store if ret < 0 */
-	emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 1));
+	emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), 0));
+	/* skip success case for ret < 0 */
+	emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 10));
 	/* store btf_id into insn[insn_idx].imm */
 	emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_7, offsetof(struct bpf_insn, imm)));
+	/* obtain fd in BPF_REG_9 */
+	emit(gen, BPF_MOV64_REG(BPF_REG_9, BPF_REG_7));
+	emit(gen, BPF_ALU64_IMM(BPF_RSH, BPF_REG_9, 32));
+	/* jump to fd_array store if fd denotes module BTF */
+	emit(gen, BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 0, 2));
+	/* set the default value for off */
+	emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), 0));
+	/* skip BTF fd store for vmlinux BTF */
+	emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 4));
 	/* load fd_array slot pointer */
 	emit2(gen, BPF_LD_IMM64_RAW_FULL(BPF_REG_0, BPF_PSEUDO_MAP_IDX_VALUE,
 					 0, 0, 0, blob_fd_array_off(gen, btf_fd_idx)));
-	/* skip store of BTF fd if ret < 0 */
-	emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 3));
 	/* store BTF fd in slot */
-	emit(gen, BPF_MOV64_REG(BPF_REG_9, BPF_REG_7));
-	emit(gen, BPF_ALU64_IMM(BPF_RSH, BPF_REG_9, 32));
 	emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_0, BPF_REG_9, 0));
-	/* set a default value for off */
-	emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), 0));
-	/* skip insn->off store if ret < 0 */
-	emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 2));
-	/* skip if vmlinux BTF */
-	emit(gen, BPF_JMP_IMM(BPF_JEQ, BPF_REG_9, 0, 1));
 	/* store index into insn[insn_idx].off */
 	emit(gen, BPF_ST_MEM(BPF_H, BPF_REG_8, offsetof(struct bpf_insn, off), btf_fd_idx));
 log:
···
 			kdesc->insn + offsetof(struct bpf_insn, imm));
 		move_blob2blob(gen, insn + sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm), 4,
 			       kdesc->insn + sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm));
-		emit(gen, BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_8, offsetof(struct bpf_insn, imm)));
-		/* jump over src_reg adjustment if imm is not 0 */
-		emit(gen, BPF_JMP_IMM(BPF_JNE, BPF_REG_9, 0, 3));
+		/* jump over src_reg adjustment if imm is not 0, reuse BPF_REG_0 from move_blob2blob */
+		emit(gen, BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3));
 		goto clear_src_reg;
 	}
 	/* remember insn offset, so we can copy BTF ID and FD later */
···
 	emit_bpf_find_by_name_kind(gen, relo);
 	if (!relo->is_weak)
 		emit_check_err(gen);
-	/* set default values as 0 */
+	/* jump to success case */
+	emit(gen, BPF_JMP_IMM(BPF_JSGE, BPF_REG_7, 0, 3));
+	/* set values for insn[insn_idx].imm, insn[insn_idx + 1].imm as 0 */
 	emit(gen, BPF_ST_MEM(BPF_W, BPF_REG_8, offsetof(struct bpf_insn, imm), 0));
 	emit(gen, BPF_ST_MEM(BPF_W, BPF_REG_8, sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm), 0));
-	/* skip success case stores if ret < 0 */
-	emit(gen, BPF_JMP_IMM(BPF_JSLT, BPF_REG_7, 0, 4));
+	/* skip success case for ret < 0 */
+	emit(gen, BPF_JMP_IMM(BPF_JA, 0, 0, 4));
 	/* store btf_id into insn[insn_idx].imm */
 	emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_7, offsetof(struct bpf_insn, imm)));
 	/* store btf_obj_fd into insn[insn_idx + 1].imm */
 	emit(gen, BPF_ALU64_IMM(BPF_RSH, BPF_REG_7, 32));
 	emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_8, BPF_REG_7,
 			      sizeof(struct bpf_insn) + offsetof(struct bpf_insn, imm)));
+	/* skip src_reg adjustment */
 	emit(gen, BPF_JMP_IMM(BPF_JSGE, BPF_REG_7, 0, 3));
 clear_src_reg:
 	/* clear bpf_object__relocate_data's src_reg assignment, otherwise we get a verifier failure */
···
 	emit(gen, BPF_STX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, offsetofend(struct bpf_insn, code)));
 
 	emit_ksym_relo_log(gen, relo, kdesc->ref);
+}
+
+void bpf_gen__record_relo_core(struct bpf_gen *gen,
+			       const struct bpf_core_relo *core_relo)
+{
+	struct bpf_core_relo *relos;
+
+	relos = libbpf_reallocarray(gen->core_relos, gen->core_relo_cnt + 1, sizeof(*relos));
+	if (!relos) {
+		gen->error = -ENOMEM;
+		return;
+	}
+	gen->core_relos = relos;
+	relos += gen->core_relo_cnt;
+	memcpy(relos, core_relo, sizeof(*relos));
+	gen->core_relo_cnt++;
 }
 
 static void emit_relo(struct bpf_gen *gen, struct ksym_relo_desc *relo, int insns)
···
 		emit_relo(gen, gen->relos + i, insns);
 }
 
+static void cleanup_core_relo(struct bpf_gen *gen)
+{
+	if (!gen->core_relo_cnt)
+ return; 891 + free(gen->core_relos); 892 + gen->core_relo_cnt = 0; 893 + gen->core_relos = NULL; 894 + } 895 + 893 896 static void cleanup_relos(struct bpf_gen *gen, int insns) 894 897 { 895 898 int i, insn; ··· 926 911 gen->relo_cnt = 0; 927 912 gen->relos = NULL; 928 913 } 914 + cleanup_core_relo(gen); 929 915 } 930 916 931 917 void bpf_gen__prog_load(struct bpf_gen *gen, ··· 934 918 const char *license, struct bpf_insn *insns, size_t insn_cnt, 935 919 struct bpf_prog_load_opts *load_attr, int prog_idx) 936 920 { 937 - int attr_size = offsetofend(union bpf_attr, fd_array); 938 - int prog_load_attr, license_off, insns_off, func_info, line_info; 921 + int prog_load_attr, license_off, insns_off, func_info, line_info, core_relos; 922 + int attr_size = offsetofend(union bpf_attr, core_relo_rec_size); 939 923 union bpf_attr attr; 940 924 941 925 memset(&attr, 0, attr_size); 942 - pr_debug("gen: prog_load: type %d insns_cnt %zd\n", prog_type, insn_cnt); 926 + pr_debug("gen: prog_load: type %d insns_cnt %zd progi_idx %d\n", 927 + prog_type, insn_cnt, prog_idx); 943 928 /* add license string to blob of bytes */ 944 929 license_off = add_data(gen, license, strlen(license) + 1); 945 930 /* add insns to blob of bytes */ ··· 964 947 line_info = add_data(gen, load_attr->line_info, 965 948 attr.line_info_cnt * attr.line_info_rec_size); 966 949 950 + attr.core_relo_rec_size = sizeof(struct bpf_core_relo); 951 + attr.core_relo_cnt = gen->core_relo_cnt; 952 + core_relos = add_data(gen, gen->core_relos, 953 + attr.core_relo_cnt * attr.core_relo_rec_size); 954 + 967 955 memcpy(attr.prog_name, prog_name, 968 956 min((unsigned)strlen(prog_name), BPF_OBJ_NAME_LEN - 1)); 969 957 prog_load_attr = add_data(gen, &attr, attr_size); ··· 984 962 985 963 /* populate union bpf_attr with a pointer to line_info */ 986 964 emit_rel_store(gen, attr_field(prog_load_attr, line_info), line_info); 965 + 966 + /* populate union bpf_attr with a pointer to core_relos */ 967 + emit_rel_store(gen, 
attr_field(prog_load_attr, core_relos), core_relos); 987 968 988 969 /* populate union bpf_attr fd_array with a pointer to data where map_fds are saved */ 989 970 emit_rel_store(gen, attr_field(prog_load_attr, fd_array), gen->fd_array); ··· 1018 993 debug_ret(gen, "prog_load %s insn_cnt %d", attr.prog_name, attr.insn_cnt); 1019 994 /* successful or not, close btf module FDs used in extern ksyms and attach_btf_obj_fd */ 1020 995 cleanup_relos(gen, insns_off); 1021 - if (gen->attach_kind) 996 + if (gen->attach_kind) { 1022 997 emit_sys_close_blob(gen, 1023 998 attr_field(prog_load_attr, attach_btf_obj_fd)); 999 + gen->attach_kind = 0; 1000 + } 1024 1001 emit_check_err(gen); 1025 1002 /* remember prog_fd in the stack, if successful */ 1026 1003 emit(gen, BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_7, ··· 1065 1038 /* emit MAP_UPDATE_ELEM command */ 1066 1039 emit_sys_bpf(gen, BPF_MAP_UPDATE_ELEM, map_update_attr, attr_size); 1067 1040 debug_ret(gen, "update_elem idx %d value_size %d", map_idx, value_size); 1041 + emit_check_err(gen); 1042 + } 1043 + 1044 + void bpf_gen__populate_outer_map(struct bpf_gen *gen, int outer_map_idx, int slot, 1045 + int inner_map_idx) 1046 + { 1047 + int attr_size = offsetofend(union bpf_attr, flags); 1048 + int map_update_attr, key; 1049 + union bpf_attr attr; 1050 + 1051 + memset(&attr, 0, attr_size); 1052 + pr_debug("gen: populate_outer_map: outer %d key %d inner %d\n", 1053 + outer_map_idx, slot, inner_map_idx); 1054 + 1055 + key = add_data(gen, &slot, sizeof(slot)); 1056 + 1057 + map_update_attr = add_data(gen, &attr, attr_size); 1058 + move_blob2blob(gen, attr_field(map_update_attr, map_fd), 4, 1059 + blob_fd_array_off(gen, outer_map_idx)); 1060 + emit_rel_store(gen, attr_field(map_update_attr, key), key); 1061 + emit_rel_store(gen, attr_field(map_update_attr, value), 1062 + blob_fd_array_off(gen, inner_map_idx)); 1063 + 1064 + /* emit MAP_UPDATE_ELEM command */ 1065 + emit_sys_bpf(gen, BPF_MAP_UPDATE_ELEM, map_update_attr, attr_size); 
1066 + debug_ret(gen, "populate_outer_map outer %d key %d inner %d", 1067 + outer_map_idx, slot, inner_map_idx); 1068 1068 emit_check_err(gen); 1069 1069 } 1070 1070
tools/lib/bpf/libbpf.c | +451 -198
···
 		return 0;
 	}
 
+__u32 libbpf_major_version(void)
+{
+	return LIBBPF_MAJOR_VERSION;
+}
+
+__u32 libbpf_minor_version(void)
+{
+	return LIBBPF_MINOR_VERSION;
+}
+
+const char *libbpf_version_string(void)
+{
+#define __S(X) #X
+#define _S(X) __S(X)
+	return "v" _S(LIBBPF_MAJOR_VERSION) "." _S(LIBBPF_MINOR_VERSION);
+#undef _S
+#undef __S
+}
+
 enum kern_feature_id {
 	/* v4.14: kernel support for program & map names. */
 	FEAT_PROG_NAME,
···
 	RELO_EXTERN_VAR,
 	RELO_EXTERN_FUNC,
 	RELO_SUBPROG_ADDR,
+	RELO_CORE,
 };
 
 struct reloc_desc {
 	enum reloc_type type;
 	int insn_idx;
-	int map_idx;
-	int sym_off;
+	union {
+		const struct bpf_core_relo *core_relo; /* used when type == RELO_CORE */
+		struct {
+			int map_idx;
+			int sym_off;
+		};
+	};
 };
 
 struct bpf_sec_def;
···
 
 	struct reloc_desc *reloc_desc;
 	int nr_reloc;
-	int log_level;
+
+	/* BPF verifier log settings */
+	char *log_buf;
+	size_t log_size;
+	__u32 log_level;
 
 	struct {
 		int nr;
···
 	char *pin_path;
 	bool pinned;
 	bool reused;
+	bool skipped;
 	__u64 map_extra;
 };
 
···
 	size_t btf_module_cnt;
 	size_t btf_module_cap;
 
+	/* optional log settings passed to BPF_BTF_LOAD and BPF_PROG_LOAD commands */
+	char *log_buf;
+	size_t log_size;
+	__u32 log_level;
+
 	void *priv;
 	bpf_object_clear_priv_t clear_priv;
 
···
 
 	prog->instances.fds = NULL;
 	prog->instances.nr = -1;
+
+	/* inherit object's log_level */
+	prog->log_level = obj->log_level;
 
 	prog->sec_name = strdup(sec_name);
 	if (!prog->sec_name)
···
 		map_def->parts |= MAP_DEF_VALUE_SIZE | MAP_DEF_VALUE_TYPE;
 	}
 	else if (strcmp(name, "values") == 0) {
+		bool is_map_in_map = bpf_map_type__is_map_in_map(map_def->map_type);
+		bool is_prog_array = map_def->map_type == BPF_MAP_TYPE_PROG_ARRAY;
+		const char *desc = is_map_in_map ? "map-in-map inner" : "prog-array value";
 		char inner_map_name[128];
 		int err;
 
···
 				map_name, name);
 			return -EINVAL;
 		}
-		if (!bpf_map_type__is_map_in_map(map_def->map_type)) {
-			pr_warn("map '%s': should be map-in-map.\n",
+		if (!is_map_in_map && !is_prog_array) {
+			pr_warn("map '%s': should be map-in-map or prog-array.\n",
 				map_name);
 			return -ENOTSUP;
 		}
···
 		map_def->value_size = 4;
 		t = btf__type_by_id(btf, m->type);
 		if (!t) {
-			pr_warn("map '%s': map-in-map inner type [%d] not found.\n",
-				map_name, m->type);
+			pr_warn("map '%s': %s type [%d] not found.\n",
+				map_name, desc, m->type);
 			return -EINVAL;
 		}
 		if (!btf_is_array(t) || btf_array(t)->nelems) {
-			pr_warn("map '%s': map-in-map inner spec is not a zero-sized array.\n",
-				map_name);
+			pr_warn("map '%s': %s spec is not a zero-sized array.\n",
+				map_name, desc);
 			return -EINVAL;
 		}
 		t = skip_mods_and_typedefs(btf, btf_array(t)->type, NULL);
 		if (!btf_is_ptr(t)) {
-			pr_warn("map '%s': map-in-map inner def is of unexpected kind %s.\n",
-				map_name, btf_kind_str(t));
+			pr_warn("map '%s': %s def is of unexpected kind %s.\n",
+				map_name, desc, btf_kind_str(t));
 			return -EINVAL;
 		}
 		t = skip_mods_and_typedefs(btf, t->type, NULL);
+		if (is_prog_array) {
+			if (!btf_is_func_proto(t)) {
+				pr_warn("map '%s': prog-array value def is of unexpected kind %s.\n",
+					map_name, btf_kind_str(t));
+				return -EINVAL;
+			}
+			continue;
+		}
 		if (!btf_is_struct(t)) {
 			pr_warn("map '%s': map-in-map inner def is of unexpected kind %s.\n",
 				map_name, btf_kind_str(t));
···
 		 */
 		btf__set_fd(kern_btf, 0);
 	} else {
-		err = btf__load_into_kernel(kern_btf);
+		/* currently BPF_BTF_LOAD only supports log_level 1 */
+		err = btf_load_into_kernel(kern_btf, obj->log_buf, obj->log_size,
+					   obj->log_level ? 1 : 0);
 	}
 	if (sanitize) {
 		if (!err) {
···
 
 	/* sort BPF programs by section name and in-section instruction offset
 	 * for faster search */
-	qsort(obj->programs, obj->nr_programs, sizeof(*obj->programs), cmp_progs);
+	if (obj->nr_programs)
+		qsort(obj->programs, obj->nr_programs, sizeof(*obj->programs), cmp_progs);
 
 	return bpf_object__init_btf(obj, btf_data, btf_ext_data);
 }
···
 
 static int probe_kern_global_data(void)
 {
-	struct bpf_create_map_attr map_attr;
 	char *cp, errmsg[STRERR_BUFSIZE];
 	struct bpf_insn insns[] = {
 		BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
···
 	};
 	int ret, map, insn_cnt = ARRAY_SIZE(insns);
 
-	memset(&map_attr, 0, sizeof(map_attr));
-	map_attr.map_type = BPF_MAP_TYPE_ARRAY;
-	map_attr.key_size = sizeof(int);
-	map_attr.value_size = 32;
-	map_attr.max_entries = 1;
-
-	map = bpf_create_map_xattr(&map_attr);
+	map = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(int), 32, 1, NULL);
 	if (map < 0) {
 		ret = -errno;
 		cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
···
 
 static int probe_kern_array_mmap(void)
 {
-	struct bpf_create_map_attr attr = {
-		.map_type = BPF_MAP_TYPE_ARRAY,
-		.map_flags = BPF_F_MMAPABLE,
-		.key_size = sizeof(int),
-		.value_size = sizeof(int),
-		.max_entries = 1,
-	};
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_MMAPABLE);
+	int fd;
 
-	return probe_fd(bpf_create_map_xattr(&attr));
+	fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(int), sizeof(int), 1, &opts);
+	return probe_fd(fd);
 }
 
 static int probe_kern_exp_attach_type(void)
···
 
 static int probe_prog_bind_map(void)
 {
-	struct bpf_create_map_attr map_attr;
 	char *cp, errmsg[STRERR_BUFSIZE];
 	struct bpf_insn insns[] = {
 		BPF_MOV64_IMM(BPF_REG_0, 0),
···
 	};
 	int ret, map, prog, insn_cnt = ARRAY_SIZE(insns);
 
-	memset(&map_attr, 0, sizeof(map_attr));
-	map_attr.map_type = BPF_MAP_TYPE_ARRAY;
-	map_attr.key_size = sizeof(int);
-	map_attr.value_size = 32;
-	map_attr.max_entries = 1;
-
-	map = bpf_create_map_xattr(&map_attr);
+	map = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(int), 32, 1, NULL);
 	if (map < 0) {
 		ret = -errno;
 		cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
···
 
 static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, bool is_inner)
 {
-	struct bpf_create_map_params create_attr;
+	LIBBPF_OPTS(bpf_map_create_opts, create_attr);
 	struct bpf_map_def *def = &map->def;
+	const char *map_name = NULL;
+	__u32 max_entries;
 	int err = 0;
 
-	memset(&create_attr, 0, sizeof(create_attr));
-
 	if (kernel_supports(obj, FEAT_PROG_NAME))
-		create_attr.name = map->name;
+		map_name = map->name;
 	create_attr.map_ifindex = map->map_ifindex;
-	create_attr.map_type = def->type;
 	create_attr.map_flags = def->map_flags;
-	create_attr.key_size = def->key_size;
-	create_attr.value_size = def->value_size;
 	create_attr.numa_node = map->numa_node;
 	create_attr.map_extra = map->map_extra;
 
···
 			return nr_cpus;
 		}
 		pr_debug("map '%s': setting size to %d\n", map->name, nr_cpus);
-		create_attr.max_entries = nr_cpus;
+		max_entries = nr_cpus;
 	} else {
-		create_attr.max_entries = def->max_entries;
+		max_entries = def->max_entries;
 	}
 
 	if (bpf_map__is_struct_ops(map))
-		create_attr.btf_vmlinux_value_type_id =
-			map->btf_vmlinux_value_type_id;
+		create_attr.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id;
 
-	create_attr.btf_fd = 0;
-	create_attr.btf_key_type_id = 0;
-	create_attr.btf_value_type_id = 0;
 	if (obj->btf && btf__fd(obj->btf) >= 0 && !bpf_map_find_btf_info(obj, map)) {
 		create_attr.btf_fd = btf__fd(obj->btf);
 		create_attr.btf_key_type_id = map->btf_key_type_id;
···
 	}
 
 	if (obj->gen_loader) {
-		bpf_gen__map_create(obj->gen_loader, &create_attr, is_inner ? -1 : map - obj->maps);
+		bpf_gen__map_create(obj->gen_loader, def->type, map_name,
+				    def->key_size, def->value_size, max_entries,
+				    &create_attr, is_inner ? -1 : map - obj->maps);
 		/* Pretend to have valid FD to pass various fd >= 0 checks.
 		 * This fd == 0 will not be used with any syscall and will be reset to -1 eventually.
 		 */
 		map->fd = 0;
 	} else {
-		map->fd = libbpf__bpf_create_map_xattr(&create_attr);
+		map->fd = bpf_map_create(def->type, map_name,
+					 def->key_size, def->value_size,
+					 max_entries, &create_attr);
 	}
 	if (map->fd < 0 && (create_attr.btf_key_type_id ||
 			    create_attr.btf_value_type_id)) {
···
 		create_attr.btf_value_type_id = 0;
 		map->btf_key_type_id = 0;
 		map->btf_value_type_id = 0;
-		map->fd = libbpf__bpf_create_map_xattr(&create_attr);
+		map->fd = bpf_map_create(def->type, map_name,
+					 def->key_size, def->value_size,
+					 max_entries, &create_attr);
 	}
 
 	err = map->fd < 0 ? -errno : 0;
···
 	return err;
 }
 
-static int init_map_slots(struct bpf_object *obj, struct bpf_map *map)
+static int init_map_in_map_slots(struct bpf_object *obj, struct bpf_map *map)
 {
 	const struct bpf_map *targ_map;
 	unsigned int i;
···
 
 		targ_map = map->init_slots[i];
 		fd = bpf_map__fd(targ_map);
+
 		if (obj->gen_loader) {
-			pr_warn("// TODO map_update_elem: idx %td key %d value==map_idx %td\n",
-				map - obj->maps, i, targ_map - obj->maps);
-			return -ENOTSUP;
+			bpf_gen__populate_outer_map(obj->gen_loader,
+						    map - obj->maps, i,
+						    targ_map - obj->maps);
 		} else {
 			err = bpf_map_update_elem(map->fd, &i, &fd, 0);
 		}
 		if (err) {
 			err = -errno;
 			pr_warn("map '%s': failed to initialize slot [%d] to map '%s' fd=%d: %d\n",
-				map->name, i, targ_map->name,
-				fd, err);
+				map->name, i, targ_map->name, fd, err);
 			return err;
 		}
 		pr_debug("map '%s': slot [%d] set to map '%s' fd=%d\n",
···
 	zfree(&map->init_slots);
 	map->init_slots_sz = 0;
 
+	return 0;
+}
+
+static int init_prog_array_slots(struct bpf_object *obj, struct bpf_map *map)
+{
+	const struct bpf_program *targ_prog;
+	unsigned int i;
+	int fd, err;
+
+	if (obj->gen_loader)
+		return -ENOTSUP;
+
+	for (i = 0; i < map->init_slots_sz; i++) {
+		if (!map->init_slots[i])
+			continue;
+
+		targ_prog = map->init_slots[i];
+		fd = bpf_program__fd(targ_prog);
+
+		err = bpf_map_update_elem(map->fd, &i, &fd, 0);
+		if (err) {
+			err = -errno;
+			pr_warn("map '%s': failed to initialize slot [%d] to prog '%s' fd=%d: %d\n",
+				map->name, i, targ_prog->name, fd, err);
+			return err;
+		}
+		pr_debug("map '%s': slot [%d] set to prog '%s' fd=%d\n",
+			 map->name, i, targ_prog->name, fd);
+	}
+
+	zfree(&map->init_slots);
+	map->init_slots_sz = 0;
+
+	return 0;
+}
+
+static int bpf_object_init_prog_arrays(struct bpf_object *obj)
+{
+	struct bpf_map *map;
+	int i, err;
+
+	for (i = 0; i < obj->nr_maps; i++) {
+		map = &obj->maps[i];
+
+		if (!map->init_slots_sz || map->def.type != BPF_MAP_TYPE_PROG_ARRAY)
+			continue;
+
+		err = init_prog_array_slots(obj, map);
+		if (err < 0) {
+			zclose(map->fd);
+			return err;
+		}
+	}
 	return 0;
 }
 
···
 
 	for (i = 0; i < obj->nr_maps; i++) {
 		map = &obj->maps[i];
+
+		/* To support old kernels, we skip creating global data maps
+		 * (.rodata, .data, .kconfig, etc); later on, during program
+		 * loading, if we detect that at least one of the to-be-loaded
+		 * programs is referencing any global data map, we'll error
+		 * out with program name and relocation index logged.
+		 * This approach allows to accommodate Clang emitting
+		 * unnecessary .rodata.str1.1 sections for string literals,
+		 * but also it allows to have CO-RE applications that use
+		 * global variables in some of BPF programs, but not others.
+		 * If those global variable-using programs are not loaded at
+		 * runtime due to bpf_program__set_autoload(prog, false),
+		 * bpf_object loading will succeed just fine even on old
+		 * kernels.
+		 */
+		if (bpf_map__is_internal(map) &&
+		    !kernel_supports(obj, FEAT_GLOBAL_DATA)) {
+			map->skipped = true;
+			continue;
+		}
 
 		retried = false;
 retry:
···
 			}
 		}
 
-		if (map->init_slots_sz) {
-			err = init_map_slots(obj, map);
+		if (map->init_slots_sz && map->def.type != BPF_MAP_TYPE_PROG_ARRAY) {
+			err = init_map_in_map_slots(obj, map);
 			if (err < 0) {
 				zclose(map->fd);
 				goto err_out;
···
 			      struct bpf_core_cand_list *cands)
 {
 	struct bpf_core_cand *new_cands, *cand;
-	const struct btf_type *t;
-	const char *targ_name;
+	const struct btf_type *t, *local_t;
+	const char *targ_name, *local_name;
 	size_t targ_essent_len;
 	int n, i;
+
+	local_t = btf__type_by_id(local_cand->btf, local_cand->id);
+	local_name = btf__str_by_offset(local_cand->btf, local_t->name_off);
 
 	n = btf__type_cnt(targ_btf);
 	for (i = targ_start_id; i < n; i++) {
 		t = btf__type_by_id(targ_btf, i);
-		if (btf_kind(t) != btf_kind(local_cand->t))
+		if (btf_kind(t) != btf_kind(local_t))
 			continue;
 
 		targ_name = btf__name_by_offset(targ_btf, t->name_off);
···
 		if (targ_essent_len != local_essent_len)
 			continue;
 
-		if (strncmp(local_cand->name, targ_name, local_essent_len) != 0)
+		if (strncmp(local_name, targ_name, local_essent_len) != 0)
 			continue;
 
 		pr_debug("CO-RE relocating [%d] %s %s: found target candidate [%d] %s %s in [%s]\n",
-			 local_cand->id, btf_kind_str(local_cand->t),
-			 local_cand->name, i, btf_kind_str(t), targ_name,
+			 local_cand->id, btf_kind_str(local_t),
+			 local_name, i, btf_kind_str(t), targ_name,
 			 targ_btf_name);
 		new_cands = libbpf_reallocarray(cands->cands, cands->len + 1,
 						sizeof(*cands->cands));
···
 
 		cand = &new_cands[cands->len];
 		cand->btf = targ_btf;
-		cand->t = t;
-		cand->name = targ_name;
 		cand->id = i;
 
 		cands->cands = new_cands;
···
 	struct bpf_core_cand local_cand = {};
 	struct bpf_core_cand_list *cands;
 	const struct btf *main_btf;
+	const struct btf_type *local_t;
+	const char *local_name;
 	size_t local_essent_len;
 	int err, i;
 
 	local_cand.btf = local_btf;
-	local_cand.t = btf__type_by_id(local_btf, local_type_id);
-	if (!local_cand.t)
+	local_cand.id = local_type_id;
+	local_t = btf__type_by_id(local_btf, local_type_id);
+	if (!local_t)
 		return ERR_PTR(-EINVAL);
 
-	local_cand.name = btf__name_by_offset(local_btf, local_cand.t->name_off);
-	if (str_is_empty(local_cand.name))
+	local_name = btf__name_by_offset(local_btf, local_t->name_off);
+	if (str_is_empty(local_name))
 		return ERR_PTR(-EINVAL);
-	local_essent_len = bpf_core_essential_name_len(local_cand.name);
+	local_essent_len = bpf_core_essential_name_len(local_name);
 
 	cands = calloc(1, sizeof(*cands));
 	if (!cands)
···
 	return (void *)(uintptr_t)x;
 }
 
+static int record_relo_core(struct bpf_program *prog,
+			    const struct bpf_core_relo *core_relo, int insn_idx)
+{
+	struct reloc_desc *relos, *relo;
+
+	relos = libbpf_reallocarray(prog->reloc_desc,
+				    prog->nr_reloc + 1, sizeof(*relos));
+	if (!relos)
+		return -ENOMEM;
+	relo = &relos[prog->nr_reloc];
+	relo->type = RELO_CORE;
+	relo->insn_idx = insn_idx;
+	relo->core_relo = core_relo;
+	prog->reloc_desc = relos;
+	prog->nr_reloc++;
+	return 0;
+}
+
 static int bpf_core_apply_relo(struct bpf_program *prog,
 			       const struct bpf_core_relo *relo,
 			       int relo_idx,
 			       const struct btf *local_btf,
 			       struct hashmap *cand_cache)
 {
+	struct bpf_core_spec specs_scratch[3] = {};
 	const void *type_key = u32_as_hash_key(relo->type_id);
 	struct bpf_core_cand_list *cands = NULL;
 	const char *prog_name = prog->name;
···
 		return -EINVAL;
 
 	if (prog->obj->gen_loader) {
-		pr_warn("// TODO core_relo: prog %td insn[%d] %s kind %d\n",
+		const char *spec_str = btf__name_by_offset(local_btf, relo->access_str_off);
+
+		pr_debug("record_relo_core: prog %td insn[%d] %s %s %s final insn_idx %d\n",
 			 prog - prog->obj->programs, relo->insn_off / 8,
-			local_name, relo->kind);
-		return -ENOTSUP;
+			 btf_kind_str(local_type), local_name, spec_str, insn_idx);
+		return record_relo_core(prog, relo, insn_idx);
 	}
 
-	if (relo->kind != BPF_TYPE_ID_LOCAL &&
+	if (relo->kind != BPF_CORE_TYPE_ID_LOCAL &&
 	    !hashmap__find(cand_cache, type_key, (void **)&cands)) {
 		cands = bpf_core_find_cands(prog->obj, local_btf, local_id);
 		if (IS_ERR(cands)) {
···
 		}
 	}
 
-	return bpf_core_apply_relo_insn(prog_name, insn, insn_idx, relo, relo_idx, local_btf, cands);
+	return bpf_core_apply_relo_insn(prog_name, insn, insn_idx, relo,
+					relo_idx, local_btf, cands, specs_scratch);
 }
 
 static int
···
 			insn[0].src_reg = BPF_PSEUDO_MAP_IDX_VALUE;
 			insn[0].imm = relo->map_idx;
 		} else {
+			const struct bpf_map *map = &obj->maps[relo->map_idx];
+
+			if (map->skipped) {
+				pr_warn("prog '%s': relo #%d: kernel doesn't support global data\n",
+					prog->name, i);
+				return -ENOTSUP;
+			}
 			insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
 			insn[0].imm = obj->maps[relo->map_idx].fd;
 		}
···
 			break;
 		case RELO_CALL:
 			/* handled already */
+			break;
+		case RELO_CORE:
+			/* will be handled by bpf_program_record_relos() */
 			break;
 		default:
 			pr_warn("prog '%s': relo #%d: bad relo type %d\n",
···
 
 static struct reloc_desc *find_prog_insn_relo(const struct bpf_program *prog, size_t insn_idx)
 {
+	if (!prog->nr_reloc)
+		return NULL;
 	return bsearch(&insn_idx, prog->reloc_desc, prog->nr_reloc,
 		       sizeof(*prog->reloc_desc), cmp_relo_by_insn_idx);
 }
···
 	relos = libbpf_reallocarray(main_prog->reloc_desc, new_cnt, sizeof(*relos));
 	if (!relos)
 		return -ENOMEM;
-	memcpy(relos + main_prog->nr_reloc, subprog->reloc_desc,
-	       sizeof(*relos) * subprog->nr_reloc);
+	if (subprog->nr_reloc)
+		memcpy(relos + main_prog->nr_reloc, subprog->reloc_desc,
+		       sizeof(*relos) * subprog->nr_reloc);
 
 	for (i = main_prog->nr_reloc; i < new_cnt; i++)
 		relos[i].insn_idx += subprog->sub_insn_off;
···
 	}
 }
 
+static int cmp_relocs(const void *_a, const void *_b)
+{
+	const struct reloc_desc *a = _a;
+	const struct reloc_desc *b = _b;
+
+	if (a->insn_idx != b->insn_idx)
+		return a->insn_idx < b->insn_idx ? -1 : 1;
+
+	/* no two relocations should have the same insn_idx, but ... */
+	if (a->type != b->type)
+		return a->type < b->type ? -1 : 1;
+
+	return 0;
+}
+
+static void bpf_object__sort_relos(struct bpf_object *obj)
+{
+	int i;
+
+	for (i = 0; i < obj->nr_programs; i++) {
+		struct bpf_program *p = &obj->programs[i];
+
+		if (!p->nr_reloc)
+			continue;
+
+		qsort(p->reloc_desc, p->nr_reloc, sizeof(*p->reloc_desc), cmp_relocs);
+	}
+}
+
 static int
 bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
 {
···
 				err);
 			return err;
 		}
+		if (obj->gen_loader)
+			bpf_object__sort_relos(obj);
 	}
 
 	/* Before relocating calls pre-process relocations and mark
···
 		 */
 		if (prog_is_subprog(obj, prog))
 			continue;
+		if (!prog->load)
+			continue;
 
 		err = bpf_object__relocate_calls(obj, prog);
 		if (err) {
···
 	for (i = 0; i < obj->nr_programs; i++) {
 		prog = &obj->programs[i];
 		if (prog_is_subprog(obj, prog))
 			continue;
+		if (!prog->load)
+			continue;
 		err = bpf_object__relocate_data(obj, prog);
 		if (err) {
···
 	int i, j, nrels, new_sz;
 	const struct btf_var_secinfo *vi = NULL;
 	const struct btf_type *sec, *var, *def;
-	struct bpf_map *map = NULL, *targ_map;
+	struct bpf_map *map = NULL, *targ_map = NULL;
+	struct bpf_program *targ_prog = NULL;
+	bool is_prog_array, is_map_in_map;
 	const struct btf_member *member;
-	const char *name, *mname;
+	const char *name, *mname, *type;
 	unsigned int moff;
 	Elf64_Sym *sym;
 	Elf64_Rel *rel;
···
 			return -LIBBPF_ERRNO__FORMAT;
 		}
 		name = elf_sym_str(obj, sym->st_name) ?: "<?>";
-		if (sym->st_shndx != obj->efile.btf_maps_shndx) {
-			pr_warn(".maps relo #%d: '%s' isn't a BTF-defined map\n",
-				i, name);
-			return -LIBBPF_ERRNO__RELOC;
-		}
 
 		pr_debug(".maps relo #%d: for %zd value %zd rel->r_offset %zu name %d ('%s')\n",
 			 i, (ssize_t)(rel->r_info >> 32), (size_t)sym->st_value,
···
 			return -EINVAL;
 		}
 
-		if (!bpf_map_type__is_map_in_map(map->def.type))
-			return -EINVAL;
-		if (map->def.type == BPF_MAP_TYPE_HASH_OF_MAPS &&
-		    map->def.key_size != sizeof(int)) {
-			pr_warn(".maps relo #%d: hash-of-maps '%s' should have key size %zu.\n",
-				i, map->name, sizeof(int));
+		is_map_in_map = bpf_map_type__is_map_in_map(map->def.type);
+		is_prog_array = map->def.type == BPF_MAP_TYPE_PROG_ARRAY;
+		type = is_map_in_map ? "map" : "prog";
+		if (is_map_in_map) {
+			if (sym->st_shndx != obj->efile.btf_maps_shndx) {
+				pr_warn(".maps relo #%d: '%s' isn't a BTF-defined map\n",
+					i, name);
+				return -LIBBPF_ERRNO__RELOC;
+			}
+			if (map->def.type == BPF_MAP_TYPE_HASH_OF_MAPS &&
+			    map->def.key_size != sizeof(int)) {
+				pr_warn(".maps relo #%d: hash-of-maps '%s' should have key size %zu.\n",
+					i, map->name, sizeof(int));
+				return -EINVAL;
+			}
+			targ_map = bpf_object__find_map_by_name(obj, name);
+			if (!targ_map) {
+				pr_warn(".maps relo #%d: '%s' isn't a valid map reference\n",
+					i, name);
+				return -ESRCH;
+			}
+		} else if (is_prog_array) {
+			targ_prog = bpf_object__find_program_by_name(obj, name);
+			if (!targ_prog) {
+				pr_warn(".maps relo #%d: '%s' isn't a valid program reference\n",
+					i, name);
+				return -ESRCH;
+			}
+			if (targ_prog->sec_idx != sym->st_shndx ||
+			    targ_prog->sec_insn_off * 8 != sym->st_value ||
+			    prog_is_subprog(obj, targ_prog)) {
+				pr_warn(".maps relo #%d: '%s' isn't an entry-point program\n",
+					i, name);
+				return -LIBBPF_ERRNO__RELOC;
+			}
+		} else {
 			return -EINVAL;
 		}
-
-		targ_map = bpf_object__find_map_by_name(obj, name);
-		if (!targ_map)
-			return -ESRCH;
 
 		var = btf__type_by_id(obj->btf, vi->type);
 		def = skip_mods_and_typedefs(obj->btf, var->type, NULL);
···
 				(new_sz - map->init_slots_sz) * host_ptr_sz);
 			map->init_slots_sz = new_sz;
 		}
-		map->init_slots[moff] = targ_map;
+		map->init_slots[moff] = is_map_in_map ? (void *)targ_map : (void *)targ_prog;
 
-		pr_debug(".maps relo #%d: map '%s' slot [%d] points to map '%s'\n",
-			 i, map->name, moff, name);
+		pr_debug(".maps relo #%d: map '%s' slot [%d] points to %s '%s'\n",
+			 i, map->name, moff, type, name);
 	}
-
-	return 0;
-}
-
-static int cmp_relocs(const void *_a, const void *_b)
-{
-	const struct reloc_desc *a = _a;
-	const struct reloc_desc *b = _b;
-
-	if (a->insn_idx != b->insn_idx)
-		return a->insn_idx < b->insn_idx ? -1 : 1;
-
-	/* no two relocations should have the same insn_idx, but ... */
-	if (a->type != b->type)
-		return a->type < b->type ? -1 : 1;
 
 	return 0;
 }
···
 		return err;
 	}
 
-	for (i = 0; i < obj->nr_programs; i++) {
-		struct bpf_program *p = &obj->programs[i];
-
-		if (!p->nr_reloc)
-			continue;
-
-		qsort(p->reloc_desc, p->nr_reloc, sizeof(*p->reloc_desc), cmp_relocs);
-	}
+	bpf_object__sort_relos(obj);
 	return 0;
 }
···
 	const char *prog_name = NULL;
 	char *cp, errmsg[STRERR_BUFSIZE];
 	size_t log_buf_size = 0;
-	char *log_buf = NULL;
+	char *log_buf = NULL, *tmp;
 	int btf_fd, ret, err;
+	bool own_log_buf = true;
+	__u32 log_level = prog->log_level;
 
 	if (prog->type == BPF_PROG_TYPE_UNSPEC) {
 		/*
···
 	load_attr.expected_attach_type = prog->expected_attach_type;
 	if (kernel_supports(obj, FEAT_PROG_NAME))
 		prog_name = prog->name;
-	load_attr.attach_btf_id = prog->attach_btf_id;
 	load_attr.attach_prog_fd = prog->attach_prog_fd;
 	load_attr.attach_btf_obj_fd = prog->attach_btf_obj_fd;
 	load_attr.attach_btf_id = prog->attach_btf_id;
···
 		load_attr.line_info_rec_size = prog->line_info_rec_size;
 		load_attr.line_info_cnt = prog->line_info_cnt;
 	}
-	load_attr.log_level = prog->log_level;
+	load_attr.log_level = log_level;
 	load_attr.prog_flags = prog->prog_flags;
 	load_attr.fd_array = obj->fd_array;
 
···
 		*prog_fd = -1;
 		return 0;
 	}
-retry_load:
-	if (log_buf_size) {
-		log_buf = malloc(log_buf_size);
-		if (!log_buf)
-			return -ENOMEM;
 
-		*log_buf = 0;
+retry_load:
+	/* if log_level is zero, we don't request logs initiallly even if
+	 * custom log_buf is specified; if the program load fails, then we'll
+	 * bump log_level to 1 and use either custom log_buf or we'll allocate
+	 * our own and retry the load to get details on what failed
+	 */
+	if (log_level) {
+		if (prog->log_buf) {
+			log_buf = prog->log_buf;
+			log_buf_size = prog->log_size;
+			own_log_buf = false;
+		} else if (obj->log_buf) {
+			log_buf = obj->log_buf;
+			log_buf_size = obj->log_size;
+			own_log_buf = false;
+		} else {
+			log_buf_size = max((size_t)BPF_LOG_BUF_SIZE, log_buf_size * 2);
+			tmp = realloc(log_buf, log_buf_size);
+			if (!tmp) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			log_buf = tmp;
+			log_buf[0] = '\0';
+			own_log_buf = true;
+		}
 	}
 
 	load_attr.log_buf = log_buf;
 	load_attr.log_size = log_buf_size;
-	ret = bpf_prog_load(prog->type, prog_name, license, insns, insns_cnt, &load_attr);
+	load_attr.log_level = log_level;
 
+	ret = bpf_prog_load(prog->type, prog_name, license, insns, insns_cnt, &load_attr);
 	if (ret >= 0) {
-		if (log_buf && load_attr.log_level)
-			pr_debug("verifier log:\n%s", log_buf);
+		if (log_level && own_log_buf) {
+			pr_debug("prog '%s': -- BEGIN PROG LOAD LOG --\n%s-- END PROG LOAD LOG --\n",
+				 prog->name, log_buf);
+		}
 
 		if (obj->has_rodata && kernel_supports(obj, FEAT_PROG_BIND_MAP)) {
 			struct bpf_map *map;
···
 
 			if (bpf_prog_bind_map(ret, bpf_map__fd(map), NULL)) {
 				cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
-				pr_warn("prog '%s': failed to bind .rodata map: %s\n",
-					prog->name, cp);
+				pr_warn("prog '%s': failed to bind map '%s': %s\n",
+					prog->name, map->real_name, cp);
 				/* Don't fail hard if can't bind rodata.
*/ 6716 6511 } 6717 6512 } ··· 6722 6517 goto out; 6723 6518 } 6724 6519 6725 - if (!log_buf || errno == ENOSPC) { 6726 - log_buf_size = max((size_t)BPF_LOG_BUF_SIZE, 6727 - log_buf_size << 1); 6728 - 6729 - free(log_buf); 6520 + if (log_level == 0) { 6521 + log_level = 1; 6730 6522 goto retry_load; 6731 6523 } 6732 - ret = errno ? -errno : -LIBBPF_ERRNO__LOAD; 6524 + /* On ENOSPC, increase log buffer size and retry, unless custom 6525 + * log_buf is specified. 6526 + * Be careful to not overflow u32, though. Kernel's log buf size limit 6527 + * isn't part of UAPI so it can always be bumped to full 4GB. So don't 6528 + * multiply by 2 unless we are sure we'll fit within 32 bits. 6529 + * Currently, we'll get -EINVAL when we reach (UINT_MAX >> 2). 6530 + */ 6531 + if (own_log_buf && errno == ENOSPC && log_buf_size <= UINT_MAX / 2) 6532 + goto retry_load; 6533 + 6534 + ret = -errno; 6733 6535 cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); 6734 - pr_warn("load bpf program failed: %s\n", cp); 6536 + pr_warn("prog '%s': BPF program load failed: %s\n", prog->name, cp); 6735 6537 pr_perm_msg(ret); 6736 6538 6737 - if (log_buf && log_buf[0] != '\0') { 6738 - ret = -LIBBPF_ERRNO__VERIFY; 6739 - pr_warn("-- BEGIN DUMP LOG ---\n"); 6740 - pr_warn("\n%s\n", log_buf); 6741 - pr_warn("-- END LOG --\n"); 6742 - } else if (insns_cnt >= BPF_MAXINSNS) { 6743 - pr_warn("Program too large (%d insns), at most %d insns\n", 6744 - insns_cnt, BPF_MAXINSNS); 6745 - ret = -LIBBPF_ERRNO__PROG2BIG; 6746 - } else if (prog->type != BPF_PROG_TYPE_KPROBE) { 6747 - /* Wrong program type? 
*/ 6748 - int fd; 6749 - 6750 - load_attr.expected_attach_type = 0; 6751 - load_attr.log_buf = NULL; 6752 - load_attr.log_size = 0; 6753 - fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, prog_name, license, 6754 - insns, insns_cnt, &load_attr); 6755 - if (fd >= 0) { 6756 - close(fd); 6757 - ret = -LIBBPF_ERRNO__PROGTYPE; 6758 - goto out; 6759 - } 6539 + if (own_log_buf && log_buf && log_buf[0] != '\0') { 6540 + pr_warn("prog '%s': -- BEGIN PROG LOAD LOG --\n%s-- END PROG LOAD LOG --\n", 6541 + prog->name, log_buf); 6542 + } 6543 + if (insns_cnt >= BPF_MAXINSNS) { 6544 + pr_warn("prog '%s': program too large (%d insns), at most %d insns\n", 6545 + prog->name, insns_cnt, BPF_MAXINSNS); 6760 6546 } 6761 6547 6762 6548 out: 6763 - free(log_buf); 6549 + if (own_log_buf) 6550 + free(log_buf); 6764 6551 return ret; 6765 6552 } 6766 6553 6767 - static int bpf_program__record_externs(struct bpf_program *prog) 6554 + static int bpf_program_record_relos(struct bpf_program *prog) 6768 6555 { 6769 6556 struct bpf_object *obj = prog->obj; 6770 6557 int i; ··· 6778 6581 ext->is_weak, false, BTF_KIND_FUNC, 6779 6582 relo->insn_idx); 6780 6583 break; 6584 + case RELO_CORE: { 6585 + struct bpf_core_relo cr = { 6586 + .insn_off = relo->insn_idx * 8, 6587 + .type_id = relo->core_relo->type_id, 6588 + .access_str_off = relo->core_relo->access_str_off, 6589 + .kind = relo->core_relo->kind, 6590 + }; 6591 + 6592 + bpf_gen__record_relo_core(obj->gen_loader, &cr); 6593 + break; 6594 + } 6781 6595 default: 6782 6596 continue; 6783 6597 } ··· 6828 6620 prog->name, prog->instances.nr); 6829 6621 } 6830 6622 if (obj->gen_loader) 6831 - bpf_program__record_externs(prog); 6623 + bpf_program_record_relos(prog); 6832 6624 err = bpf_object_load_prog_instance(obj, prog, 6833 6625 prog->insns, prog->insns_cnt, 6834 6626 license, kern_ver, &fd); ··· 6957 6749 return 0; 6958 6750 } 6959 6751 6960 - static struct bpf_object * 6961 - __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz, 
6962 - const struct bpf_object_open_opts *opts) 6752 + static struct bpf_object *bpf_object_open(const char *path, const void *obj_buf, size_t obj_buf_sz, 6753 + const struct bpf_object_open_opts *opts) 6963 6754 { 6964 6755 const char *obj_name, *kconfig, *btf_tmp_path; 6965 6756 struct bpf_object *obj; 6966 6757 char tmp_name[64]; 6967 6758 int err; 6759 + char *log_buf; 6760 + size_t log_size; 6761 + __u32 log_level; 6968 6762 6969 6763 if (elf_version(EV_CURRENT) == EV_NONE) { 6970 6764 pr_warn("failed to init libelf for %s\n", ··· 6989 6779 pr_debug("loading object '%s' from buffer\n", obj_name); 6990 6780 } 6991 6781 6782 + log_buf = OPTS_GET(opts, kernel_log_buf, NULL); 6783 + log_size = OPTS_GET(opts, kernel_log_size, 0); 6784 + log_level = OPTS_GET(opts, kernel_log_level, 0); 6785 + if (log_size > UINT_MAX) 6786 + return ERR_PTR(-EINVAL); 6787 + if (log_size && !log_buf) 6788 + return ERR_PTR(-EINVAL); 6789 + 6992 6790 obj = bpf_object__new(path, obj_buf, obj_buf_sz, obj_name); 6993 6791 if (IS_ERR(obj)) 6994 6792 return obj; 6793 + 6794 + obj->log_buf = log_buf; 6795 + obj->log_size = log_size; 6796 + obj->log_level = log_level; 6995 6797 6996 6798 btf_tmp_path = OPTS_GET(opts, btf_custom_path, NULL); 6997 6799 if (btf_tmp_path) { ··· 7058 6836 return NULL; 7059 6837 7060 6838 pr_debug("loading %s\n", attr->file); 7061 - return __bpf_object__open(attr->file, NULL, 0, &opts); 6839 + return bpf_object_open(attr->file, NULL, 0, &opts); 7062 6840 } 7063 6841 7064 6842 struct bpf_object *bpf_object__open_xattr(struct bpf_object_open_attr *attr) ··· 7084 6862 7085 6863 pr_debug("loading %s\n", path); 7086 6864 7087 - return libbpf_ptr(__bpf_object__open(path, NULL, 0, opts)); 6865 + return libbpf_ptr(bpf_object_open(path, NULL, 0, opts)); 7088 6866 } 7089 6867 7090 6868 struct bpf_object * ··· 7094 6872 if (!obj_buf || obj_buf_sz == 0) 7095 6873 return libbpf_err_ptr(-EINVAL); 7096 6874 7097 - return libbpf_ptr(__bpf_object__open(NULL, obj_buf, obj_buf_sz, 
opts)); 6875 + return libbpf_ptr(bpf_object_open(NULL, obj_buf, obj_buf_sz, opts)); 7098 6876 } 7099 6877 7100 6878 struct bpf_object * ··· 7111 6889 if (!obj_buf || obj_buf_sz == 0) 7112 6890 return errno = EINVAL, NULL; 7113 6891 7114 - return libbpf_ptr(__bpf_object__open(NULL, obj_buf, obj_buf_sz, &opts)); 6892 + return libbpf_ptr(bpf_object_open(NULL, obj_buf, obj_buf_sz, &opts)); 7115 6893 } 7116 6894 7117 6895 static int bpf_object_unload(struct bpf_object *obj) ··· 7142 6920 bpf_object__for_each_map(m, obj) { 7143 6921 if (!bpf_map__is_internal(m)) 7144 6922 continue; 7145 - if (!kernel_supports(obj, FEAT_GLOBAL_DATA)) { 7146 - pr_warn("kernel doesn't support global data\n"); 7147 - return -ENOTSUP; 7148 - } 7149 6923 if (!kernel_supports(obj, FEAT_ARRAY_MMAP)) 7150 6924 m->def.map_flags ^= BPF_F_MMAPABLE; 7151 6925 } ··· 7464 7246 return 0; 7465 7247 } 7466 7248 7467 - int bpf_object__load_xattr(struct bpf_object_load_attr *attr) 7249 + static int bpf_object_load(struct bpf_object *obj, int extra_log_level, const char *target_btf_path) 7468 7250 { 7469 - struct bpf_object *obj; 7470 7251 int err, i; 7471 7252 7472 - if (!attr) 7473 - return libbpf_err(-EINVAL); 7474 - obj = attr->obj; 7475 7253 if (!obj) 7476 7254 return libbpf_err(-EINVAL); 7477 7255 ··· 7477 7263 } 7478 7264 7479 7265 if (obj->gen_loader) 7480 - bpf_gen__init(obj->gen_loader, attr->log_level, obj->nr_programs, obj->nr_maps); 7266 + bpf_gen__init(obj->gen_loader, extra_log_level, obj->nr_programs, obj->nr_maps); 7481 7267 7482 7268 err = bpf_object__probe_loading(obj); 7483 7269 err = err ? : bpf_object__load_vmlinux_btf(obj, false); ··· 7486 7272 err = err ? : bpf_object__sanitize_maps(obj); 7487 7273 err = err ? : bpf_object__init_kern_struct_ops_maps(obj); 7488 7274 err = err ? : bpf_object__create_maps(obj); 7489 - err = err ? : bpf_object__relocate(obj, obj->btf_custom_path ? : attr->target_btf_path); 7490 - err = err ? 
: bpf_object__load_progs(obj, attr->log_level); 7275 + err = err ? : bpf_object__relocate(obj, obj->btf_custom_path ? : target_btf_path); 7276 + err = err ? : bpf_object__load_progs(obj, extra_log_level); 7277 + err = err ? : bpf_object_init_prog_arrays(obj); 7491 7278 7492 7279 if (obj->gen_loader) { 7493 7280 /* reset FDs */ ··· 7532 7317 return libbpf_err(err); 7533 7318 } 7534 7319 7320 + int bpf_object__load_xattr(struct bpf_object_load_attr *attr) 7321 + { 7322 + return bpf_object_load(attr->obj, attr->log_level, attr->target_btf_path); 7323 + } 7324 + 7535 7325 int bpf_object__load(struct bpf_object *obj) 7536 7326 { 7537 - struct bpf_object_load_attr attr = { 7538 - .obj = obj, 7539 - }; 7540 - 7541 - return bpf_object__load_xattr(&attr); 7327 + return bpf_object_load(obj, 0, NULL); 7542 7328 } 7543 7329 7544 7330 static int make_parent_dir(const char *path) ··· 7927 7711 bpf_object__for_each_map(map, obj) { 7928 7712 char *pin_path = NULL; 7929 7713 char buf[PATH_MAX]; 7714 + 7715 + if (map->skipped) 7716 + continue; 7930 7717 7931 7718 if (path) { 7932 7719 int len; ··· 8515 8296 return prog->prog_flags; 8516 8297 } 8517 8298 8518 - int bpf_program__set_extra_flags(struct bpf_program *prog, __u32 extra_flags) 8299 + int bpf_program__set_flags(struct bpf_program *prog, __u32 flags) 8519 8300 { 8520 8301 if (prog->obj->loaded) 8521 8302 return libbpf_err(-EBUSY); 8522 8303 8523 - prog->prog_flags |= extra_flags; 8304 + prog->prog_flags = flags; 8305 + return 0; 8306 + } 8307 + 8308 + __u32 bpf_program__log_level(const struct bpf_program *prog) 8309 + { 8310 + return prog->log_level; 8311 + } 8312 + 8313 + int bpf_program__set_log_level(struct bpf_program *prog, __u32 log_level) 8314 + { 8315 + if (prog->obj->loaded) 8316 + return libbpf_err(-EBUSY); 8317 + 8318 + prog->log_level = log_level; 8319 + return 0; 8320 + } 8321 + 8322 + const char *bpf_program__log_buf(const struct bpf_program *prog, size_t *log_size) 8323 + { 8324 + *log_size = prog->log_size; 
8325 + return prog->log_buf; 8326 + } 8327 + 8328 + int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log_size) 8329 + { 8330 + if (log_size && !log_buf) 8331 + return -EINVAL; 8332 + if (prog->log_size > UINT_MAX) 8333 + return -EINVAL; 8334 + if (prog->obj->loaded) 8335 + return -EBUSY; 8336 + 8337 + prog->log_buf = log_buf; 8338 + prog->log_size = log_size; 8524 8339 return 0; 8525 8340 } 8526 8341
+111 -4
tools/lib/bpf/libbpf.h
···
extern "C" {
#endif

+LIBBPF_API __u32 libbpf_major_version(void);
+LIBBPF_API __u32 libbpf_minor_version(void);
+LIBBPF_API const char *libbpf_version_string(void);
+
enum libbpf_errno {
	__LIBBPF_ERRNO__START = 4000,
···
	 * struct_ops, etc) will need actual kernel BTF at /sys/kernel/btf/vmlinux.
	 */
	const char *btf_custom_path;
+	/* Pointer to a buffer for storing kernel logs for applicable BPF
+	 * commands. Valid kernel_log_size has to be specified as well and is
+	 * passed through to bpf() syscall. Keep in mind that kernel might
+	 * fail operation with -ENOSPC error if provided buffer is too small
+	 * to contain entire log output.
+	 * See the comment below for kernel_log_level for interaction between
+	 * log_buf and log_level settings.
+	 *
+	 * If specified, this log buffer will be passed for:
+	 *   - each BPF program load (BPF_PROG_LOAD) attempt, unless overridden
+	 *     with bpf_program__set_log_buf() on per-program level, to get
+	 *     BPF verifier log output.
+	 *   - during BPF object's BTF load into kernel (BPF_BTF_LOAD) to get
+	 *     BTF sanity checking log.
+	 *
+	 * Each BPF command (BPF_BTF_LOAD or BPF_PROG_LOAD) will overwrite
+	 * previous contents, so if you need more fine-grained control, set
+	 * per-program buffer with bpf_program__set_log_buf() to preserve each
+	 * individual program's verification log. Keep using kernel_log_buf
+	 * for BTF verification log, if necessary.
+	 */
+	char *kernel_log_buf;
+	size_t kernel_log_size;
+	/*
+	 * Log level can be set independently from log buffer. log_level=0
+	 * means that libbpf will attempt loading BTF or program without any
+	 * logging requested, but will retry with either its own or custom log
+	 * buffer, if provided, and log_level=1 on any error.
+	 * And vice versa, setting log_level>0 will request BTF or prog
+	 * loading with verbose log from the first attempt (and as such also
+	 * for successfully loaded BTF or program), and the actual log buffer
+	 * could be either libbpf's own auto-allocated log buffer, if
+	 * kernel_log_buf is NULL, or user-provided custom kernel_log_buf.
+	 * If user didn't provide custom log buffer, libbpf will emit captured
+	 * logs through its print callback.
+	 */
+	__u32 kernel_log_level;
+
+	size_t :0;
};
-#define bpf_object_open_opts__last_field btf_custom_path
+#define bpf_object_open_opts__last_field kernel_log_level

LIBBPF_API struct bpf_object *bpf_object__open(const char *path);
+
+/**
+ * @brief **bpf_object__open_file()** creates a bpf_object by opening
+ * the BPF ELF object file pointed to by the passed path and loading it
+ * into memory.
+ * @param path BPF object file path
+ * @param opts options for how to load the bpf object, this parameter is
+ * optional and can be set to NULL
+ * @return pointer to the new bpf_object; or NULL is returned on error,
+ * error code is stored in errno
+ */
LIBBPF_API struct bpf_object *
bpf_object__open_file(const char *path, const struct bpf_object_open_opts *opts);
+
+/**
+ * @brief **bpf_object__open_mem()** creates a bpf_object by reading
+ * the BPF object's raw bytes from a memory buffer containing a valid
+ * BPF ELF object file.
+ * @param obj_buf pointer to the buffer containing ELF file bytes
+ * @param obj_buf_sz number of bytes in the buffer
+ * @param opts options for how to load the bpf object
+ * @return pointer to the new bpf_object; or NULL is returned on error,
+ * error code is stored in errno
+ */
LIBBPF_API struct bpf_object *
bpf_object__open_mem(const void *obj_buf, size_t obj_buf_sz,
		     const struct bpf_object_open_opts *opts);
···

/* Load/unload object into/from kernel */
LIBBPF_API int bpf_object__load(struct bpf_object *obj);
+LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_object__load() instead")
LIBBPF_API int bpf_object__load_xattr(struct bpf_object_load_attr *attr);
LIBBPF_DEPRECATED_SINCE(0, 6, "bpf_object__unload() is deprecated, use bpf_object__close() instead")
LIBBPF_API int bpf_object__unload(struct bpf_object *obj);
···
};
#define bpf_uprobe_opts__last_field retprobe

+/**
+ * @brief **bpf_program__attach_uprobe()** attaches a BPF program
+ * to the userspace function which is found by binary path and
+ * offset. You can optionally specify a particular process to attach
+ * to. You can also optionally attach the program to the function
+ * exit instead of entry.
+ *
+ * @param prog BPF program to attach
+ * @param retprobe Attach to function exit
+ * @param pid Process ID to attach the uprobe to, 0 for self (own process),
+ * -1 for all processes
+ * @param binary_path Path to binary that contains the function symbol
+ * @param func_offset Offset within the binary of the function symbol
+ * @return Reference to the newly created BPF link; or NULL is returned on error,
+ * error code is stored in errno
+ */
LIBBPF_API struct bpf_link *
bpf_program__attach_uprobe(const struct bpf_program *prog, bool retprobe,
			   pid_t pid, const char *binary_path,
			   size_t func_offset);
+
+/**
+ * @brief **bpf_program__attach_uprobe_opts()** is just like
+ * bpf_program__attach_uprobe() except with an options struct
+ * for various configurations.
+ *
+ * @param prog BPF program to attach
+ * @param pid Process ID to attach the uprobe to, 0 for self (own process),
+ * -1 for all processes
+ * @param binary_path Path to binary that contains the function symbol
+ * @param func_offset Offset within the binary of the function symbol
+ * @param opts Options for altering program attachment
+ * @return Reference to the newly created BPF link; or NULL is returned on error,
+ * error code is stored in errno
+ */
LIBBPF_API struct bpf_link *
bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
				const char *binary_path, size_t func_offset,
···
				      enum bpf_attach_type type);

LIBBPF_API __u32 bpf_program__flags(const struct bpf_program *prog);
-LIBBPF_API int bpf_program__set_extra_flags(struct bpf_program *prog, __u32 extra_flags);
+LIBBPF_API int bpf_program__set_flags(struct bpf_program *prog, __u32 flags);
+
+/* Per-program log level and log buffer getters/setters.
+ * See bpf_object_open_opts comments regarding log_level and log_buf
+ * interactions.
+ */
+LIBBPF_API __u32 bpf_program__log_level(const struct bpf_program *prog);
+LIBBPF_API int bpf_program__set_log_level(struct bpf_program *prog, __u32 log_level);
+LIBBPF_API const char *bpf_program__log_buf(const struct bpf_program *prog, size_t *log_size);
+LIBBPF_API int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log_size);

LIBBPF_API int
bpf_program__set_attach_target(struct bpf_program *prog, int attach_prog_fd,
···
	int prog_flags;
};

+LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_object__open() and bpf_object__load() instead")
LIBBPF_API int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
				   struct bpf_object **pobj, int *prog_fd);
LIBBPF_DEPRECATED_SINCE(0, 7, "use bpf_object__open() and bpf_object__load() instead")
···
	struct bpf_object **obj;

	int map_cnt;
-	int map_skel_sz; /* sizeof(struct bpf_skeleton_map) */
+	int map_skel_sz; /* sizeof(struct bpf_map_skeleton) */
	struct bpf_map_skeleton *maps;

	int prog_cnt;
-	int prog_skel_sz; /* sizeof(struct bpf_skeleton_prog) */
+	int prog_skel_sz; /* sizeof(struct bpf_prog_skeleton) */
	struct bpf_prog_skeleton *progs;
};
+14 -1
tools/lib/bpf/libbpf.map
···
	global:
		bpf_map__map_extra;
		bpf_map__set_map_extra;
+		bpf_map_create;
		bpf_object__next_map;
		bpf_object__next_program;
		bpf_object__prev_map;
···
		bpf_program__flags;
		bpf_program__insn_cnt;
		bpf_program__insns;
-		bpf_program__set_extra_flags;
+		bpf_program__set_flags;
		btf__add_btf;
		btf__add_decl_tag;
		btf__add_type_tag;
···
		btf__type_cnt;
		btf_dump__new;
		btf_dump__new_deprecated;
+		libbpf_major_version;
+		libbpf_minor_version;
+		libbpf_version_string;
		perf_buffer__new;
		perf_buffer__new_deprecated;
		perf_buffer__new_raw;
		perf_buffer__new_raw_deprecated;
} LIBBPF_0.5.0;
+
+LIBBPF_0.7.0 {
+	global:
+		bpf_btf_load;
+		bpf_program__log_buf;
+		bpf_program__log_level;
+		bpf_program__set_log_buf;
+		bpf_program__set_log_level;
+};
+5
tools/lib/bpf/libbpf_common.h
···
#else
#define __LIBBPF_MARK_DEPRECATED_0_7(X)
#endif
+#if __LIBBPF_CURRENT_VERSION_GEQ(0, 8)
+#define __LIBBPF_MARK_DEPRECATED_0_8(X) X
+#else
+#define __LIBBPF_MARK_DEPRECATED_0_8(X)
+#endif

/* This set of internal macros allows to do "function overloading" based on
 * number of arguments provided by user in backwards-compatible way during the
+2 -22
tools/lib/bpf/libbpf_internal.h
···
struct btf;
struct btf_type;

-struct btf_type *btf_type_by_id(struct btf *btf, __u32 type_id);
+struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id);
const char *btf_kind_str(const struct btf_type *t);
const struct btf_type *skip_mods_and_typedefs(const struct btf *btf, __u32 id, __u32 *res_id);
···
int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
			 const char *str_sec, size_t str_len);
-
-struct bpf_create_map_params {
-	const char *name;
-	enum bpf_map_type map_type;
-	__u32 map_flags;
-	__u32 key_size;
-	__u32 value_size;
-	__u32 max_entries;
-	__u32 numa_node;
-	__u32 btf_fd;
-	__u32 btf_key_type_id;
-	__u32 btf_value_type_id;
-	__u32 map_ifindex;
-	union {
-		__u32 inner_map_fd;
-		__u32 btf_vmlinux_value_type_id;
-	};
-	__u64 map_extra;
-};
-
-int libbpf__bpf_create_map_xattr(const struct bpf_create_map_params *create_attr);
+int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level);

struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf);
void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
+16 -16
tools/lib/bpf/libbpf_probes.c
···
	memcpy(raw_btf + hdr.hdr_len, raw_types, hdr.type_len);
	memcpy(raw_btf + hdr.hdr_len + hdr.type_len, str_sec, hdr.str_len);

-	btf_fd = bpf_load_btf(raw_btf, btf_len, NULL, 0, false);
+	btf_fd = bpf_btf_load(raw_btf, btf_len, NULL);

	free(raw_btf);
	return btf_fd;
···
{
	int key_size, value_size, max_entries, map_flags;
	__u32 btf_key_type_id = 0, btf_value_type_id = 0;
-	struct bpf_create_map_attr attr = {};
	int fd = -1, btf_fd = -1, fd_inner;

	key_size = sizeof(__u32);
···
	if (map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS ||
	    map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
+		LIBBPF_OPTS(bpf_map_create_opts, opts);
+
		/* TODO: probe for device, once libbpf has a function to create
		 * map-in-map for offload
		 */
		if (ifindex)
			return false;

-		fd_inner = bpf_create_map(BPF_MAP_TYPE_HASH,
-					  sizeof(__u32), sizeof(__u32), 1, 0);
+		fd_inner = bpf_map_create(BPF_MAP_TYPE_HASH, NULL,
+					  sizeof(__u32), sizeof(__u32), 1, NULL);
		if (fd_inner < 0)
			return false;
-		fd = bpf_create_map_in_map(map_type, NULL, sizeof(__u32),
-					   fd_inner, 1, 0);
+
+		opts.inner_map_fd = fd_inner;
+		fd = bpf_map_create(map_type, NULL, sizeof(__u32), sizeof(__u32), 1, &opts);
		close(fd_inner);
	} else {
+		LIBBPF_OPTS(bpf_map_create_opts, opts);
+
		/* Note: No other restriction on map type probes for offload */
-		attr.map_type = map_type;
-		attr.key_size = key_size;
-		attr.value_size = value_size;
-		attr.max_entries = max_entries;
-		attr.map_flags = map_flags;
-		attr.map_ifindex = ifindex;
+		opts.map_flags = map_flags;
+		opts.map_ifindex = ifindex;
		if (btf_fd >= 0) {
-			attr.btf_fd = btf_fd;
-			attr.btf_key_type_id = btf_key_type_id;
-			attr.btf_value_type_id = btf_value_type_id;
+			opts.btf_fd = btf_fd;
+			opts.btf_key_type_id = btf_key_type_id;
+			opts.btf_value_type_id = btf_value_type_id;
		}

-		fd = bpf_create_map_xattr(&attr);
+		fd = bpf_map_create(map_type, NULL, key_size, value_size, max_entries, &opts);
	}
	if (fd >= 0)
		close(fd);
+1 -1
tools/lib/bpf/libbpf_version.h
···
#define __LIBBPF_VERSION_H

#define LIBBPF_MAJOR_VERSION 0
-#define LIBBPF_MINOR_VERSION 6
+#define LIBBPF_MINOR_VERSION 7

#endif /* __LIBBPF_VERSION_H */
+5 -1
tools/lib/bpf/linker.c
···
	}
	free(linker->secs);

+	free(linker->glob_syms);
	free(linker);
}
···
static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *obj)
{
	struct src_sec *src_symtab = &obj->secs[obj->symtab_sec_idx];
-	struct dst_sec *dst_symtab = &linker->secs[linker->symtab_sec_idx];
+	struct dst_sec *dst_symtab;
	int i, err;

	for (i = 1; i < obj->sec_cnt; i++) {
···
			pr_warn("sections %s are not compatible\n", src_sec->sec_name);
			return -1;
		}
+
+		/* add_dst_sec() above could have invalidated linker->secs */
+		dst_symtab = &linker->secs[linker->symtab_sec_idx];

		/* shdr->sh_link points to SYMTAB */
		dst_sec->shdr->sh_link = linker->symtab_sec_idx;
+129 -102
tools/lib/bpf/relo_core.c
···
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* Copyright (c) 2019 Facebook */

+#ifdef __KERNEL__
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/string.h>
+#include <linux/bpf_verifier.h>
+#include "relo_core.h"
+
+static const char *btf_kind_str(const struct btf_type *t)
+{
+	return btf_type_str(t);
+}
+
+static bool is_ldimm64_insn(struct bpf_insn *insn)
+{
+	return insn->code == (BPF_LD | BPF_IMM | BPF_DW);
+}
+
+static const struct btf_type *
+skip_mods_and_typedefs(const struct btf *btf, u32 id, u32 *res_id)
+{
+	return btf_type_skip_modifiers(btf, id, res_id);
+}
+
+static const char *btf__name_by_offset(const struct btf *btf, u32 offset)
+{
+	return btf_name_by_offset(btf, offset);
+}
+
+static s64 btf__resolve_size(const struct btf *btf, u32 type_id)
+{
+	const struct btf_type *t;
+	int size;
+
+	t = btf_type_by_id(btf, type_id);
+	t = btf_resolve_size(btf, t, &size);
+	if (IS_ERR(t))
+		return PTR_ERR(t);
+	return size;
+}
+
+enum libbpf_print_level {
+	LIBBPF_WARN,
+	LIBBPF_INFO,
+	LIBBPF_DEBUG,
+};
+
+#undef pr_warn
+#undef pr_info
+#undef pr_debug
+#define pr_warn(fmt, log, ...)	bpf_log((void *)log, fmt, "", ##__VA_ARGS__)
+#define pr_info(fmt, log, ...)	bpf_log((void *)log, fmt, "", ##__VA_ARGS__)
+#define pr_debug(fmt, log, ...)	bpf_log((void *)log, fmt, "", ##__VA_ARGS__)
+#define libbpf_print(level, fmt, ...)	bpf_log((void *)prog_name, fmt, ##__VA_ARGS__)
+#else
#include <stdio.h>
#include <string.h>
#include <errno.h>
···
#include "btf.h"
#include "str_error.h"
#include "libbpf_internal.h"
-
-#define BPF_CORE_SPEC_MAX_LEN 64
-
-/* represents BPF CO-RE field or array element accessor */
-struct bpf_core_accessor {
-	__u32 type_id;		/* struct/union type or array element type */
-	__u32 idx;		/* field index or array index */
-	const char *name;	/* field name or NULL for array accessor */
-};
-
-struct bpf_core_spec {
-	const struct btf *btf;
-	/* high-level spec: named fields and array indices only */
-	struct bpf_core_accessor spec[BPF_CORE_SPEC_MAX_LEN];
-	/* original unresolved (no skip_mods_or_typedefs) root type ID */
-	__u32 root_type_id;
-	/* CO-RE relocation kind */
-	enum bpf_core_relo_kind relo_kind;
-	/* high-level spec length */
-	int len;
-	/* raw, low-level spec: 1-to-1 with accessor spec string */
-	int raw_spec[BPF_CORE_SPEC_MAX_LEN];
-	/* raw spec length */
-	int raw_len;
-	/* field bit offset represented by spec */
-	__u32 bit_offset;
-};
+#endif

static bool is_flex_arr(const struct btf *btf,
			const struct bpf_core_accessor *acc,
···
		return false;

	/* has to be the last member of enclosing struct */
-	t = btf__type_by_id(btf, acc->type_id);
+	t = btf_type_by_id(btf, acc->type_id);
	return acc->idx == btf_vlen(t) - 1;
}

static const char *core_relo_kind_str(enum bpf_core_relo_kind kind)
{
	switch (kind) {
-	case BPF_FIELD_BYTE_OFFSET: return "byte_off";
-	case BPF_FIELD_BYTE_SIZE: return "byte_sz";
-	case BPF_FIELD_EXISTS: return "field_exists";
-	case BPF_FIELD_SIGNED: return "signed";
-	case BPF_FIELD_LSHIFT_U64: return "lshift_u64";
-	case BPF_FIELD_RSHIFT_U64: return "rshift_u64";
-	case BPF_TYPE_ID_LOCAL: return "local_type_id";
-	case BPF_TYPE_ID_TARGET: return "target_type_id";
-	case BPF_TYPE_EXISTS: return "type_exists";
-	case BPF_TYPE_SIZE: return "type_size";
-	case BPF_ENUMVAL_EXISTS: return "enumval_exists";
-	case BPF_ENUMVAL_VALUE: return "enumval_value";
+	case BPF_CORE_FIELD_BYTE_OFFSET: return "byte_off";
+	case BPF_CORE_FIELD_BYTE_SIZE: return "byte_sz";
+	case BPF_CORE_FIELD_EXISTS: return "field_exists";
+	case BPF_CORE_FIELD_SIGNED: return "signed";
+	case BPF_CORE_FIELD_LSHIFT_U64: return "lshift_u64";
+	case BPF_CORE_FIELD_RSHIFT_U64: return "rshift_u64";
+	case BPF_CORE_TYPE_ID_LOCAL: return "local_type_id";
+	case BPF_CORE_TYPE_ID_TARGET: return "target_type_id";
+	case BPF_CORE_TYPE_EXISTS: return "type_exists";
+	case BPF_CORE_TYPE_SIZE: return "type_size";
+	case BPF_CORE_ENUMVAL_EXISTS: return "enumval_exists";
+	case BPF_CORE_ENUMVAL_VALUE: return "enumval_value";
	default: return "unknown";
	}
}

static bool core_relo_is_field_based(enum bpf_core_relo_kind kind)
{
	switch (kind) {
-	case BPF_FIELD_BYTE_OFFSET:
-	case BPF_FIELD_BYTE_SIZE:
-	case BPF_FIELD_EXISTS:
-	case BPF_FIELD_SIGNED:
-	case BPF_FIELD_LSHIFT_U64:
-	case BPF_FIELD_RSHIFT_U64:
+	case BPF_CORE_FIELD_BYTE_OFFSET:
+	case BPF_CORE_FIELD_BYTE_SIZE:
+	case BPF_CORE_FIELD_EXISTS:
+	case BPF_CORE_FIELD_SIGNED:
+	case BPF_CORE_FIELD_LSHIFT_U64:
+	case BPF_CORE_FIELD_RSHIFT_U64:
		return true;
	default:
		return false;
	}
}

static bool core_relo_is_type_based(enum bpf_core_relo_kind kind)
{
	switch (kind) {
-	case BPF_TYPE_ID_LOCAL:
-	case BPF_TYPE_ID_TARGET:
-	case BPF_TYPE_EXISTS:
-	case BPF_TYPE_SIZE:
+	case BPF_CORE_TYPE_ID_LOCAL:
+	case BPF_CORE_TYPE_ID_TARGET:
+	case BPF_CORE_TYPE_EXISTS:
+	case BPF_CORE_TYPE_SIZE:
		return true;
	default:
		return false;
	}
}

static bool core_relo_is_enumval_based(enum bpf_core_relo_kind kind)
{
	switch (kind) {
-	case BPF_ENUMVAL_EXISTS:
-	case BPF_ENUMVAL_VALUE:
+	case BPF_CORE_ENUMVAL_EXISTS:
+	case BPF_CORE_ENUMVAL_VALUE:
		return true;
	default:
		return false;
	}
}
···
 * Enum value-based relocations (ENUMVAL_EXISTS/ENUMVAL_VALUE) use access
 * string to specify enumerator's value index that need to be relocated.
 */
-static int bpf_core_parse_spec(const struct btf *btf,
+static int bpf_core_parse_spec(const char *prog_name, const struct btf *btf,
			       __u32 type_id,
			       const char *spec_str,
			       enum bpf_core_relo_kind relo_kind,
···
				return sz;
			spec->bit_offset += access_idx * sz * 8;
		} else {
-			pr_warn("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %s\n",
-				type_id, spec_str, i, id, btf_kind_str(t));
+			pr_warn("prog '%s': relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %s\n",
+				prog_name, type_id, spec_str, i, id, btf_kind_str(t));
			return -EINVAL;
		}
	}
···
		targ_id = btf_array(targ_type)->type;
		goto recur;
	default:
-		pr_warn("unexpected kind %d relocated, local [%d], target [%d]\n",
-			btf_kind(local_type), local_id, targ_id);
		return 0;
	}
}
···
		return 0;

	local_id = local_acc->type_id;
-	local_type = btf__type_by_id(local_btf, local_id);
+	local_type = btf_type_by_id(local_btf, local_id);
	local_member = btf_members(local_type) + local_acc->idx;
	local_name = btf__name_by_offset(local_btf, local_member->name_off);
···

	*field_sz = 0;

-	if (relo->kind == BPF_FIELD_EXISTS) {
+	if (relo->kind == BPF_CORE_FIELD_EXISTS) {
		*val = spec ? 1 : 0;
		return 0;
	}
···
		return -EUCLEAN; /* request instruction poisoning */

	acc = &spec->spec[spec->len - 1];
-	t = btf__type_by_id(spec->btf, acc->type_id);
+	t = btf_type_by_id(spec->btf, acc->type_id);

	/* a[n] accessor needs special handling */
	if (!acc->name) {
-		if (relo->kind == BPF_FIELD_BYTE_OFFSET) {
+		if (relo->kind == BPF_CORE_FIELD_BYTE_OFFSET) {
			*val = spec->bit_offset / 8;
			/* remember field size for load/store mem size */
			sz = btf__resolve_size(spec->btf, acc->type_id);
···
				return -EINVAL;
			*field_sz = sz;
			*type_id = acc->type_id;
-		} else if (relo->kind == BPF_FIELD_BYTE_SIZE) {
+		} else if (relo->kind == BPF_CORE_FIELD_BYTE_SIZE) {
			sz = btf__resolve_size(spec->btf, acc->type_id);
			if (sz < 0)
				return -EINVAL;
···
		*validate = !bitfield;

	switch (relo->kind) {
-	case BPF_FIELD_BYTE_OFFSET:
+	case BPF_CORE_FIELD_BYTE_OFFSET:
		*val = byte_off;
		if (!bitfield) {
			*field_sz = byte_sz;
			*type_id = field_type_id;
		}
		break;
-	case BPF_FIELD_BYTE_SIZE:
+	case BPF_CORE_FIELD_BYTE_SIZE:
		*val = byte_sz;
		break;
-	case BPF_FIELD_SIGNED:
+	case BPF_CORE_FIELD_SIGNED:
		/* enums will be assumed unsigned */
		*val = btf_is_enum(mt) ||
		       (btf_int_encoding(mt) & BTF_INT_SIGNED);
		if (validate)
			*validate = true; /* signedness is never ambiguous */
		break;
-	case BPF_FIELD_LSHIFT_U64:
+	case BPF_CORE_FIELD_LSHIFT_U64:
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
		*val = 64 - (bit_off + bit_sz - byte_off * 8);
#else
		*val = (8 - byte_sz) * 8 + (bit_off - byte_off * 8);
#endif
		break;
-	case BPF_FIELD_RSHIFT_U64:
+	case BPF_CORE_FIELD_RSHIFT_U64:
		*val = 64 - bit_sz;
		if (validate)
			*validate =
true; /* right shift is never ambiguous */ 701 675 break; 702 - case BPF_FIELD_EXISTS: 676 + case BPF_CORE_FIELD_EXISTS: 703 677 default: 704 678 return -EOPNOTSUPP; 705 679 } ··· 720 694 } 721 695 722 696 switch (relo->kind) { 723 - case BPF_TYPE_ID_TARGET: 697 + case BPF_CORE_TYPE_ID_TARGET: 724 698 *val = spec->root_type_id; 725 699 break; 726 - case BPF_TYPE_EXISTS: 700 + case BPF_CORE_TYPE_EXISTS: 727 701 *val = 1; 728 702 break; 729 - case BPF_TYPE_SIZE: 703 + case BPF_CORE_TYPE_SIZE: 730 704 sz = btf__resolve_size(spec->btf, spec->root_type_id); 731 705 if (sz < 0) 732 706 return -EINVAL; 733 707 *val = sz; 734 708 break; 735 - case BPF_TYPE_ID_LOCAL: 736 - /* BPF_TYPE_ID_LOCAL is handled specially and shouldn't get here */ 709 + case BPF_CORE_TYPE_ID_LOCAL: 710 + /* BPF_CORE_TYPE_ID_LOCAL is handled specially and shouldn't get here */ 737 711 default: 738 712 return -EOPNOTSUPP; 739 713 } ··· 749 723 const struct btf_enum *e; 750 724 751 725 switch (relo->kind) { 752 - case BPF_ENUMVAL_EXISTS: 726 + case BPF_CORE_ENUMVAL_EXISTS: 753 727 *val = spec ? 
1 : 0; 754 728 break; 755 - case BPF_ENUMVAL_VALUE: 729 + case BPF_CORE_ENUMVAL_VALUE: 756 730 if (!spec) 757 731 return -EUCLEAN; /* request instruction poisoning */ 758 - t = btf__type_by_id(spec->btf, spec->spec[0].type_id); 732 + t = btf_type_by_id(spec->btf, spec->spec[0].type_id); 759 733 e = btf_enum(t) + spec->spec[0].idx; 760 734 *val = e->val; 761 735 break; ··· 831 805 if (res->orig_sz != res->new_sz) { 832 806 const struct btf_type *orig_t, *new_t; 833 807 834 - orig_t = btf__type_by_id(local_spec->btf, res->orig_type_id); 835 - new_t = btf__type_by_id(targ_spec->btf, res->new_type_id); 808 + orig_t = btf_type_by_id(local_spec->btf, res->orig_type_id); 809 + new_t = btf_type_by_id(targ_spec->btf, res->new_type_id); 836 810 837 811 /* There are two use cases in which it's safe to 838 812 * adjust load/store's mem size: ··· 1071 1045 * [<type-id>] (<type-name>) + <raw-spec> => <offset>@<spec>, 1072 1046 * where <spec> is a C-syntax view of recorded field access, e.g.: x.a[3].b 1073 1047 */ 1074 - static void bpf_core_dump_spec(int level, const struct bpf_core_spec *spec) 1048 + static void bpf_core_dump_spec(const char *prog_name, int level, const struct bpf_core_spec *spec) 1075 1049 { 1076 1050 const struct btf_type *t; 1077 1051 const struct btf_enum *e; ··· 1080 1054 int i; 1081 1055 1082 1056 type_id = spec->root_type_id; 1083 - t = btf__type_by_id(spec->btf, type_id); 1057 + t = btf_type_by_id(spec->btf, type_id); 1084 1058 s = btf__name_by_offset(spec->btf, t->name_off); 1085 1059 1086 1060 libbpf_print(level, "[%u] %s %s", type_id, btf_kind_str(t), str_is_empty(s) ? 
"<anon>" : s); ··· 1173 1147 const struct bpf_core_relo *relo, 1174 1148 int relo_idx, 1175 1149 const struct btf *local_btf, 1176 - struct bpf_core_cand_list *cands) 1150 + struct bpf_core_cand_list *cands, 1151 + struct bpf_core_spec *specs_scratch) 1177 1152 { 1178 - struct bpf_core_spec local_spec, cand_spec, targ_spec = {}; 1153 + struct bpf_core_spec *local_spec = &specs_scratch[0]; 1154 + struct bpf_core_spec *cand_spec = &specs_scratch[1]; 1155 + struct bpf_core_spec *targ_spec = &specs_scratch[2]; 1179 1156 struct bpf_core_relo_res cand_res, targ_res; 1180 1157 const struct btf_type *local_type; 1181 1158 const char *local_name; ··· 1187 1158 int i, j, err; 1188 1159 1189 1160 local_id = relo->type_id; 1190 - local_type = btf__type_by_id(local_btf, local_id); 1191 - if (!local_type) 1192 - return -EINVAL; 1193 - 1161 + local_type = btf_type_by_id(local_btf, local_id); 1194 1162 local_name = btf__name_by_offset(local_btf, local_type->name_off); 1195 1163 if (!local_name) 1196 1164 return -EINVAL; ··· 1196 1170 if (str_is_empty(spec_str)) 1197 1171 return -EINVAL; 1198 1172 1199 - err = bpf_core_parse_spec(local_btf, local_id, spec_str, relo->kind, &local_spec); 1173 + err = bpf_core_parse_spec(prog_name, local_btf, local_id, spec_str, 1174 + relo->kind, local_spec); 1200 1175 if (err) { 1201 1176 pr_warn("prog '%s': relo #%d: parsing [%d] %s %s + %s failed: %d\n", 1202 1177 prog_name, relo_idx, local_id, btf_kind_str(local_type), ··· 1208 1181 1209 1182 pr_debug("prog '%s': relo #%d: kind <%s> (%d), spec is ", prog_name, 1210 1183 relo_idx, core_relo_kind_str(relo->kind), relo->kind); 1211 - bpf_core_dump_spec(LIBBPF_DEBUG, &local_spec); 1184 + bpf_core_dump_spec(prog_name, LIBBPF_DEBUG, local_spec); 1212 1185 libbpf_print(LIBBPF_DEBUG, "\n"); 1213 1186 1214 1187 /* TYPE_ID_LOCAL relo is special and doesn't need candidate search */ 1215 - if (relo->kind == BPF_TYPE_ID_LOCAL) { 1188 + if (relo->kind == BPF_CORE_TYPE_ID_LOCAL) { 1216 1189 targ_res.validate = 
true; 1217 1190 targ_res.poison = false; 1218 - targ_res.orig_val = local_spec.root_type_id; 1219 - targ_res.new_val = local_spec.root_type_id; 1191 + targ_res.orig_val = local_spec->root_type_id; 1192 + targ_res.new_val = local_spec->root_type_id; 1220 1193 goto patch_insn; 1221 1194 } 1222 1195 ··· 1229 1202 1230 1203 1231 1204 for (i = 0, j = 0; i < cands->len; i++) { 1232 - err = bpf_core_spec_match(&local_spec, cands->cands[i].btf, 1233 - cands->cands[i].id, &cand_spec); 1205 + err = bpf_core_spec_match(local_spec, cands->cands[i].btf, 1206 + cands->cands[i].id, cand_spec); 1234 1207 if (err < 0) { 1235 1208 pr_warn("prog '%s': relo #%d: error matching candidate #%d ", 1236 1209 prog_name, relo_idx, i); 1237 - bpf_core_dump_spec(LIBBPF_WARN, &cand_spec); 1210 + bpf_core_dump_spec(prog_name, LIBBPF_WARN, cand_spec); 1238 1211 libbpf_print(LIBBPF_WARN, ": %d\n", err); 1239 1212 return err; 1240 1213 } 1241 1214 1242 1215 pr_debug("prog '%s': relo #%d: %s candidate #%d ", prog_name, 1243 1216 relo_idx, err == 0 ? 
"non-matching" : "matching", i); 1244 - bpf_core_dump_spec(LIBBPF_DEBUG, &cand_spec); 1217 + bpf_core_dump_spec(prog_name, LIBBPF_DEBUG, cand_spec); 1245 1218 libbpf_print(LIBBPF_DEBUG, "\n"); 1246 1219 1247 1220 if (err == 0) 1248 1221 continue; 1249 1222 1250 - err = bpf_core_calc_relo(prog_name, relo, relo_idx, &local_spec, &cand_spec, &cand_res); 1223 + err = bpf_core_calc_relo(prog_name, relo, relo_idx, local_spec, cand_spec, &cand_res); 1251 1224 if (err) 1252 1225 return err; 1253 1226 1254 1227 if (j == 0) { 1255 1228 targ_res = cand_res; 1256 - targ_spec = cand_spec; 1257 - } else if (cand_spec.bit_offset != targ_spec.bit_offset) { 1229 + *targ_spec = *cand_spec; 1230 + } else if (cand_spec->bit_offset != targ_spec->bit_offset) { 1258 1231 /* if there are many field relo candidates, they 1259 1232 * should all resolve to the same bit offset 1260 1233 */ 1261 1234 pr_warn("prog '%s': relo #%d: field offset ambiguity: %u != %u\n", 1262 - prog_name, relo_idx, cand_spec.bit_offset, 1263 - targ_spec.bit_offset); 1235 + prog_name, relo_idx, cand_spec->bit_offset, 1236 + targ_spec->bit_offset); 1264 1237 return -EINVAL; 1265 1238 } else if (cand_res.poison != targ_res.poison || cand_res.new_val != targ_res.new_val) { 1266 1239 /* all candidates should result in the same relocation ··· 1278 1251 } 1279 1252 1280 1253 /* 1281 - * For BPF_FIELD_EXISTS relo or when used BPF program has field 1254 + * For BPF_CORE_FIELD_EXISTS relo or when used BPF program has field 1282 1255 * existence checks or kernel version/config checks, it's expected 1283 1256 * that we might not find any candidates. 
In this case, if field 1284 1257 * wasn't found in any candidate, the list of candidates shouldn't ··· 1304 1277 prog_name, relo_idx); 1305 1278 1306 1279 /* calculate single target relo result explicitly */ 1307 - err = bpf_core_calc_relo(prog_name, relo, relo_idx, &local_spec, NULL, &targ_res); 1280 + err = bpf_core_calc_relo(prog_name, relo, relo_idx, local_spec, NULL, &targ_res); 1308 1281 if (err) 1309 1282 return err; 1310 1283 }
+30 -73
tools/lib/bpf/relo_core.h
··· 4 4 #ifndef __RELO_CORE_H 5 5 #define __RELO_CORE_H 6 6 7 - /* bpf_core_relo_kind encodes which aspect of captured field/type/enum value 8 - * has to be adjusted by relocations. 9 - */ 10 - enum bpf_core_relo_kind { 11 - BPF_FIELD_BYTE_OFFSET = 0, /* field byte offset */ 12 - BPF_FIELD_BYTE_SIZE = 1, /* field size in bytes */ 13 - BPF_FIELD_EXISTS = 2, /* field existence in target kernel */ 14 - BPF_FIELD_SIGNED = 3, /* field signedness (0 - unsigned, 1 - signed) */ 15 - BPF_FIELD_LSHIFT_U64 = 4, /* bitfield-specific left bitshift */ 16 - BPF_FIELD_RSHIFT_U64 = 5, /* bitfield-specific right bitshift */ 17 - BPF_TYPE_ID_LOCAL = 6, /* type ID in local BPF object */ 18 - BPF_TYPE_ID_TARGET = 7, /* type ID in target kernel */ 19 - BPF_TYPE_EXISTS = 8, /* type existence in target kernel */ 20 - BPF_TYPE_SIZE = 9, /* type size in bytes */ 21 - BPF_ENUMVAL_EXISTS = 10, /* enum value existence in target kernel */ 22 - BPF_ENUMVAL_VALUE = 11, /* enum value integer value */ 23 - }; 24 - 25 - /* The minimum bpf_core_relo checked by the loader 26 - * 27 - * CO-RE relocation captures the following data: 28 - * - insn_off - instruction offset (in bytes) within a BPF program that needs 29 - * its insn->imm field to be relocated with actual field info; 30 - * - type_id - BTF type ID of the "root" (containing) entity of a relocatable 31 - * type or field; 32 - * - access_str_off - offset into corresponding .BTF string section. String 33 - * interpretation depends on specific relocation kind: 34 - * - for field-based relocations, string encodes an accessed field using 35 - * a sequence of field and array indices, separated by colon (:). It's 36 - * conceptually very close to LLVM's getelementptr ([0]) instruction's 37 - * arguments for identifying offset to a field.
38 - * - for type-based relocations, strings is expected to be just "0"; 39 - * - for enum value-based relocations, string contains an index of enum 40 - * value within its enum type; 41 - * 42 - * Example to provide a better feel. 43 - * 44 - * struct sample { 45 - * int a; 46 - * struct { 47 - * int b[10]; 48 - * }; 49 - * }; 50 - * 51 - * struct sample *s = ...; 52 - * int x = &s->a; // encoded as "0:0" (a is field #0) 53 - * int y = &s->b[5]; // encoded as "0:1:0:5" (anon struct is field #1, 54 - * // b is field #0 inside anon struct, accessing elem #5) 55 - * int z = &s[10]->b; // encoded as "10:1" (ptr is used as an array) 56 - * 57 - * type_id for all relocs in this example will capture BTF type id of 58 - * `struct sample`. 59 - * 60 - * Such relocation is emitted when using __builtin_preserve_access_index() 61 - * Clang built-in, passing expression that captures field address, e.g.: 62 - * 63 - * bpf_probe_read(&dst, sizeof(dst), 64 - * __builtin_preserve_access_index(&src->a.b.c)); 65 - * 66 - * In this case Clang will emit field relocation recording necessary data to 67 - * be able to find offset of embedded `a.b.c` field within `src` struct.
68 - * 69 - * [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction 70 - */ 71 - struct bpf_core_relo { 72 - __u32 insn_off; 73 - __u32 type_id; 74 - __u32 access_str_off; 75 - enum bpf_core_relo_kind kind; 76 - }; 7 + #include <linux/bpf.h> 77 8 78 9 struct bpf_core_cand { 79 10 const struct btf *btf; 80 - const struct btf_type *t; 81 - const char *name; 82 11 __u32 id; 83 12 }; 84 13 ··· 17 88 int len; 18 89 }; 19 90 91 + #define BPF_CORE_SPEC_MAX_LEN 64 92 + 93 + /* represents BPF CO-RE field or array element accessor */ 94 + struct bpf_core_accessor { 95 + __u32 type_id; /* struct/union type or array element type */ 96 + __u32 idx; /* field index or array index */ 97 + const char *name; /* field name or NULL for array accessor */ 98 + }; 99 + 100 + struct bpf_core_spec { 101 + const struct btf *btf; 102 + /* high-level spec: named fields and array indices only */ 103 + struct bpf_core_accessor spec[BPF_CORE_SPEC_MAX_LEN]; 104 + /* original unresolved (no skip_mods_or_typedefs) root type ID */ 105 + __u32 root_type_id; 106 + /* CO-RE relocation kind */ 107 + enum bpf_core_relo_kind relo_kind; 108 + /* high-level spec length */ 109 + int len; 110 + /* raw, low-level spec: 1-to-1 with accessor spec string */ 111 + int raw_spec[BPF_CORE_SPEC_MAX_LEN]; 112 + /* raw spec length */ 113 + int raw_len; 114 + /* field bit offset represented by spec */ 115 + __u32 bit_offset; 116 + }; 117 + 20 118 int bpf_core_apply_relo_insn(const char *prog_name, 21 119 struct bpf_insn *insn, int insn_idx, 22 120 const struct bpf_core_relo *relo, int relo_idx, 23 121 const struct btf *local_btf, 24 - struct bpf_core_cand_list *cands); 122 + struct bpf_core_cand_list *cands, 123 + struct bpf_core_spec *specs_scratch); 25 124 int bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id, 26 125 const struct btf *targ_btf, __u32 targ_id); 27 126
+11 -2
tools/lib/bpf/skel_internal.h
··· 7 7 #include <sys/syscall.h> 8 8 #include <sys/mman.h> 9 9 10 + #ifndef __NR_bpf 11 + # if defined(__mips__) && defined(_ABIO32) 12 + # define __NR_bpf 4355 13 + # elif defined(__mips__) && defined(_ABIN32) 14 + # define __NR_bpf 6319 15 + # elif defined(__mips__) && defined(_ABI64) 16 + # define __NR_bpf 5315 17 + # endif 18 + #endif 19 + 10 20 /* This file is a base header for auto-generated *.lskel.h files. 11 21 * Its contents will change and may become part of auto-generation in the future. 12 22 * ··· 75 65 int map_fd = -1, prog_fd = -1, key = 0, err; 76 66 union bpf_attr attr; 77 67 78 - map_fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "__loader.map", 4, 79 - opts->data_sz, 1, 0); 68 + map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "__loader.map", 4, opts->data_sz, 1, NULL); 80 69 if (map_fd < 0) { 81 70 opts->errstr = "failed to create loader map"; 82 71 err = -errno;
+8 -10
tools/lib/bpf/xsk.c
··· 35 35 #include "libbpf_internal.h" 36 36 #include "xsk.h" 37 37 38 + /* entire xsk.h and xsk.c is going away in libbpf 1.0, so ignore all internal 39 + * uses of deprecated APIs 40 + */ 41 + #pragma GCC diagnostic ignored "-Wdeprecated-declarations" 42 + 38 43 #ifndef SOL_XDP 39 44 #define SOL_XDP 283 40 45 #endif ··· 369 364 static enum xsk_prog get_xsk_prog(void) 370 365 { 371 366 enum xsk_prog detected = XSK_PROG_FALLBACK; 372 - struct bpf_create_map_attr map_attr; 373 367 __u32 size_out, retval, duration; 374 368 char data_in = 0, data_out; 375 369 struct bpf_insn insns[] = { ··· 380 376 }; 381 377 int prog_fd, map_fd, ret, insn_cnt = ARRAY_SIZE(insns); 382 378 383 - memset(&map_attr, 0, sizeof(map_attr)); 384 - map_attr.map_type = BPF_MAP_TYPE_XSKMAP; 385 - map_attr.key_size = sizeof(int); 386 - map_attr.value_size = sizeof(int); 387 - map_attr.max_entries = 1; 388 - 389 - map_fd = bpf_create_map_xattr(&map_attr); 379 + map_fd = bpf_map_create(BPF_MAP_TYPE_XSKMAP, NULL, sizeof(int), sizeof(int), 1, NULL); 390 380 if (map_fd < 0) 391 381 return detected; 392 382 ··· 584 586 if (max_queues < 0) 585 587 return max_queues; 586 588 587 - fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map", 588 - sizeof(int), sizeof(int), max_queues, 0); 589 + fd = bpf_map_create(BPF_MAP_TYPE_XSKMAP, "xsks_map", 590 + sizeof(int), sizeof(int), max_queues, NULL); 589 591 if (fd < 0) 590 592 return fd; 591 593
+4
tools/perf/tests/bpf.c
··· 296 296 return err; 297 297 } 298 298 299 + /* temporarily disable libbpf deprecation warnings */ 300 + #pragma GCC diagnostic push 301 + #pragma GCC diagnostic ignored "-Wdeprecated-declarations" 299 302 err = bpf_load_program(BPF_PROG_TYPE_KPROBE, insns, 300 303 ARRAY_SIZE(insns), 301 304 license, kver_int, NULL, 0); 305 + #pragma GCC diagnostic pop 302 306 if (err < 0) { 303 307 pr_err("Missing basic BPF support, skip this test: %s\n", 304 308 strerror(errno));
+3
tools/perf/util/bpf-loader.c
··· 29 29 30 30 #include <internal/xyarray.h> 31 31 32 + /* temporarily disable libbpf deprecation warnings */ 33 + #pragma GCC diagnostic ignored "-Wdeprecated-declarations" 34 + 32 35 static int libbpf_perf_print(enum libbpf_print_level level __attribute__((unused)), 33 36 const char *fmt, va_list args) 34 37 {
+16 -2
tools/perf/util/bpf_counter.c
··· 307 307 (map_info.value_size == sizeof(struct perf_event_attr_map_entry)); 308 308 } 309 309 310 + int __weak 311 + bpf_map_create(enum bpf_map_type map_type, 312 + const char *map_name __maybe_unused, 313 + __u32 key_size, 314 + __u32 value_size, 315 + __u32 max_entries, 316 + const struct bpf_map_create_opts *opts __maybe_unused) 317 + { 318 + #pragma GCC diagnostic push 319 + #pragma GCC diagnostic ignored "-Wdeprecated-declarations" 320 + return bpf_create_map(map_type, key_size, value_size, max_entries, 0); 321 + #pragma GCC diagnostic pop 322 + } 323 + 310 324 static int bperf_lock_attr_map(struct target *target) 311 325 { 312 326 char path[PATH_MAX]; ··· 334 320 } 335 321 336 322 if (access(path, F_OK)) { 337 - map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, 323 + map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, 338 324 sizeof(struct perf_event_attr), 339 325 sizeof(struct perf_event_attr_map_entry), 340 - ATTR_MAP_SIZE, 0); 326 + ATTR_MAP_SIZE, NULL); 341 327 if (map_fd < 0) 342 328 return -1; 343 329
+29 -20
tools/testing/selftests/bpf/Makefile
··· 192 192 193 193 $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): $(BPFOBJ) 194 194 195 - $(OUTPUT)/test_dev_cgroup: cgroup_helpers.c testing_helpers.o 196 - $(OUTPUT)/test_skb_cgroup_id_user: cgroup_helpers.c testing_helpers.o 197 - $(OUTPUT)/test_sock: cgroup_helpers.c testing_helpers.o 198 - $(OUTPUT)/test_sock_addr: cgroup_helpers.c testing_helpers.o 199 - $(OUTPUT)/test_sockmap: cgroup_helpers.c testing_helpers.o 200 - $(OUTPUT)/test_tcpnotify_user: cgroup_helpers.c trace_helpers.c testing_helpers.o 201 - $(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c testing_helpers.o 202 - $(OUTPUT)/test_cgroup_storage: cgroup_helpers.c testing_helpers.o 203 - $(OUTPUT)/test_sock_fields: cgroup_helpers.c testing_helpers.o 204 - $(OUTPUT)/test_sysctl: cgroup_helpers.c testing_helpers.o 205 - $(OUTPUT)/test_tag: testing_helpers.o 206 - $(OUTPUT)/test_lirc_mode2_user: testing_helpers.o 207 - $(OUTPUT)/xdping: testing_helpers.o 208 - $(OUTPUT)/flow_dissector_load: testing_helpers.o 209 - $(OUTPUT)/test_maps: testing_helpers.o 210 - $(OUTPUT)/test_verifier: testing_helpers.o 195 + CGROUP_HELPERS := $(OUTPUT)/cgroup_helpers.o 196 + TESTING_HELPERS := $(OUTPUT)/testing_helpers.o 197 + TRACE_HELPERS := $(OUTPUT)/trace_helpers.o 198 + 199 + $(OUTPUT)/test_dev_cgroup: $(CGROUP_HELPERS) $(TESTING_HELPERS) 200 + $(OUTPUT)/test_skb_cgroup_id_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) 201 + $(OUTPUT)/test_sock: $(CGROUP_HELPERS) $(TESTING_HELPERS) 202 + $(OUTPUT)/test_sock_addr: $(CGROUP_HELPERS) $(TESTING_HELPERS) 203 + $(OUTPUT)/test_sockmap: $(CGROUP_HELPERS) $(TESTING_HELPERS) 204 + $(OUTPUT)/test_tcpnotify_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) $(TRACE_HELPERS) 205 + $(OUTPUT)/get_cgroup_id_user: $(CGROUP_HELPERS) $(TESTING_HELPERS) 206 + $(OUTPUT)/test_cgroup_storage: $(CGROUP_HELPERS) $(TESTING_HELPERS) 207 + $(OUTPUT)/test_sock_fields: $(CGROUP_HELPERS) $(TESTING_HELPERS) 208 + $(OUTPUT)/test_sysctl: $(CGROUP_HELPERS) $(TESTING_HELPERS) 209 + $(OUTPUT)/test_tag: 
$(TESTING_HELPERS) 210 + $(OUTPUT)/test_lirc_mode2_user: $(TESTING_HELPERS) 211 + $(OUTPUT)/xdping: $(TESTING_HELPERS) 212 + $(OUTPUT)/flow_dissector_load: $(TESTING_HELPERS) 213 + $(OUTPUT)/test_maps: $(TESTING_HELPERS) 214 + $(OUTPUT)/test_verifier: $(TESTING_HELPERS) 211 215 212 216 BPFTOOL ?= $(DEFAULT_BPFTOOL) 213 217 $(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) \ ··· 329 325 linked_vars.skel.h linked_maps.skel.h 330 326 331 327 LSKELS := kfunc_call_test.c fentry_test.c fexit_test.c fexit_sleep.c \ 332 - test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c 328 + test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c \ 329 + map_ptr_kern.c core_kern.c 333 330 # Generate both light skeleton and libbpf skeleton for these 334 - LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c 331 + LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test_subprog.c 335 332 SKEL_BLACKLIST += $$(LSKELS) 336 333 337 334 test_static_linked.skel.h-deps := test_static_linked1.o test_static_linked2.o ··· 536 531 $(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \ 537 532 $(OUTPUT)/perfbuf_bench.skel.h 538 533 $(OUTPUT)/bench_bloom_filter_map.o: $(OUTPUT)/bloom_filter_bench.skel.h 534 + $(OUTPUT)/bench_bpf_loop.o: $(OUTPUT)/bpf_loop_bench.skel.h 539 535 $(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ) 540 536 $(OUTPUT)/bench: LDLIBS += -lm 541 - $(OUTPUT)/bench: $(OUTPUT)/bench.o $(OUTPUT)/testing_helpers.o \ 537 + $(OUTPUT)/bench: $(OUTPUT)/bench.o \ 538 + $(TESTING_HELPERS) \ 539 + $(TRACE_HELPERS) \ 542 540 $(OUTPUT)/bench_count.o \ 543 541 $(OUTPUT)/bench_rename.o \ 544 542 $(OUTPUT)/bench_trigger.o \ 545 543 $(OUTPUT)/bench_ringbufs.o \ 546 - $(OUTPUT)/bench_bloom_filter_map.o 544 + $(OUTPUT)/bench_bloom_filter_map.o \ 545 + $(OUTPUT)/bench_bpf_loop.o 547 546 $(call msg,BINARY,,$@) 548 547 $(Q)$(CC) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@ 549 548
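The Makefile refactor above replaces repeated per-target helper lists (and a mix of bare `.c` files and `.o` files) with variables naming the built helper objects once. The generic shape, with illustrative target names:

```make
# Name a shared prerequisite list once, then reuse it per target.
HELPERS := $(OUTPUT)/cgroup_helpers.o $(OUTPUT)/testing_helpers.o

$(OUTPUT)/test_a: $(HELPERS)
$(OUTPUT)/test_b: $(HELPERS)
```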
+47
tools/testing/selftests/bpf/bench.c
··· 134 134 total_ops_mean, total_ops_stddev); 135 135 } 136 136 137 + void ops_report_progress(int iter, struct bench_res *res, long delta_ns) 138 + { 139 + double hits_per_sec, hits_per_prod; 140 + 141 + hits_per_sec = res->hits / 1000000.0 / (delta_ns / 1000000000.0); 142 + hits_per_prod = hits_per_sec / env.producer_cnt; 143 + 144 + printf("Iter %3d (%7.3lfus): ", iter, (delta_ns - 1000000000) / 1000.0); 145 + 146 + printf("hits %8.3lfM/s (%7.3lfM/prod)\n", hits_per_sec, hits_per_prod); 147 + } 148 + 149 + void ops_report_final(struct bench_res res[], int res_cnt) 150 + { 151 + double hits_mean = 0.0, hits_stddev = 0.0; 152 + int i; 153 + 154 + for (i = 0; i < res_cnt; i++) 155 + hits_mean += res[i].hits / 1000000.0 / (0.0 + res_cnt); 156 + 157 + if (res_cnt > 1) { 158 + for (i = 0; i < res_cnt; i++) 159 + hits_stddev += (hits_mean - res[i].hits / 1000000.0) * 160 + (hits_mean - res[i].hits / 1000000.0) / 161 + (res_cnt - 1.0); 162 + 163 + hits_stddev = sqrt(hits_stddev); 164 + } 165 + printf("Summary: throughput %8.3lf \u00B1 %5.3lf M ops/s (%7.3lfM ops/prod), ", 166 + hits_mean, hits_stddev, hits_mean / env.producer_cnt); 167 + printf("latency %8.3lf ns/op\n", 1000.0 / hits_mean * env.producer_cnt); 168 + } 169 + 137 170 const char *argp_program_version = "benchmark"; 138 171 const char *argp_program_bug_address = "<bpf@vger.kernel.org>"; 139 172 const char argp_program_doc[] = ··· 204 171 205 172 extern struct argp bench_ringbufs_argp; 206 173 extern struct argp bench_bloom_map_argp; 174 + extern struct argp bench_bpf_loop_argp; 207 175 208 176 static const struct argp_child bench_parsers[] = { 209 177 { &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 }, 210 178 { &bench_bloom_map_argp, 0, "Bloom filter map benchmark", 0 }, 179 + { &bench_bpf_loop_argp, 0, "bpf_loop helper benchmark", 0 }, 211 180 {}, 212 181 }; 213 182 ··· 394 359 extern const struct bench bench_trig_fentry; 395 360 extern const struct bench bench_trig_fentry_sleep; 396 361 extern 
const struct bench bench_trig_fmodret; 362 + extern const struct bench bench_trig_uprobe_base; 363 + extern const struct bench bench_trig_uprobe_with_nop; 364 + extern const struct bench bench_trig_uretprobe_with_nop; 365 + extern const struct bench bench_trig_uprobe_without_nop; 366 + extern const struct bench bench_trig_uretprobe_without_nop; 397 367 extern const struct bench bench_rb_libbpf; 398 368 extern const struct bench bench_rb_custom; 399 369 extern const struct bench bench_pb_libbpf; ··· 408 368 extern const struct bench bench_bloom_false_positive; 409 369 extern const struct bench bench_hashmap_without_bloom; 410 370 extern const struct bench bench_hashmap_with_bloom; 371 + extern const struct bench bench_bpf_loop; 411 372 412 373 static const struct bench *benchs[] = { 413 374 &bench_count_global, ··· 426 385 &bench_trig_fentry, 427 386 &bench_trig_fentry_sleep, 428 387 &bench_trig_fmodret, 388 + &bench_trig_uprobe_base, 389 + &bench_trig_uprobe_with_nop, 390 + &bench_trig_uretprobe_with_nop, 391 + &bench_trig_uprobe_without_nop, 392 + &bench_trig_uretprobe_without_nop, 429 393 &bench_rb_libbpf, 430 394 &bench_rb_custom, 431 395 &bench_pb_libbpf, ··· 440 394 &bench_bloom_false_positive, 441 395 &bench_hashmap_without_bloom, 442 396 &bench_hashmap_with_bloom, 397 + &bench_bpf_loop, 443 398 }; 444 399 445 400 static void setup_benchmark()
+2
tools/testing/selftests/bpf/bench.h
··· 59 59 void hits_drops_report_final(struct bench_res res[], int res_cnt); 60 60 void false_hits_report_progress(int iter, struct bench_res *res, long delta_ns); 61 61 void false_hits_report_final(struct bench_res res[], int res_cnt); 62 + void ops_report_progress(int iter, struct bench_res *res, long delta_ns); 63 + void ops_report_final(struct bench_res res[], int res_cnt); 62 64 63 65 static inline __u64 get_time_ns() { 64 66 struct timespec t;
+105
tools/testing/selftests/bpf/benchs/bench_bpf_loop.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2021 Facebook */ 3 + 4 + #include <argp.h> 5 + #include "bench.h" 6 + #include "bpf_loop_bench.skel.h" 7 + 8 + /* BPF triggering benchmarks */ 9 + static struct ctx { 10 + struct bpf_loop_bench *skel; 11 + } ctx; 12 + 13 + static struct { 14 + __u32 nr_loops; 15 + } args = { 16 + .nr_loops = 10, 17 + }; 18 + 19 + enum { 20 + ARG_NR_LOOPS = 4000, 21 + }; 22 + 23 + static const struct argp_option opts[] = { 24 + { "nr_loops", ARG_NR_LOOPS, "nr_loops", 0, 25 + "Set number of loops for the bpf_loop helper"}, 26 + {}, 27 + }; 28 + 29 + static error_t parse_arg(int key, char *arg, struct argp_state *state) 30 + { 31 + switch (key) { 32 + case ARG_NR_LOOPS: 33 + args.nr_loops = strtol(arg, NULL, 10); 34 + break; 35 + default: 36 + return ARGP_ERR_UNKNOWN; 37 + } 38 + 39 + return 0; 40 + } 41 + 42 + /* exported into benchmark runner */ 43 + const struct argp bench_bpf_loop_argp = { 44 + .options = opts, 45 + .parser = parse_arg, 46 + }; 47 + 48 + static void validate(void) 49 + { 50 + if (env.consumer_cnt != 1) { 51 + fprintf(stderr, "benchmark doesn't support multi-consumer!\n"); 52 + exit(1); 53 + } 54 + } 55 + 56 + static void *producer(void *input) 57 + { 58 + while (true) 59 + /* trigger the bpf program */ 60 + syscall(__NR_getpgid); 61 + 62 + return NULL; 63 + } 64 + 65 + static void *consumer(void *input) 66 + { 67 + return NULL; 68 + } 69 + 70 + static void measure(struct bench_res *res) 71 + { 72 + res->hits = atomic_swap(&ctx.skel->bss->hits, 0); 73 + } 74 + 75 + static void setup(void) 76 + { 77 + struct bpf_link *link; 78 + 79 + setup_libbpf(); 80 + 81 + ctx.skel = bpf_loop_bench__open_and_load(); 82 + if (!ctx.skel) { 83 + fprintf(stderr, "failed to open skeleton\n"); 84 + exit(1); 85 + } 86 + 87 + link = bpf_program__attach(ctx.skel->progs.benchmark); 88 + if (!link) { 89 + fprintf(stderr, "failed to attach program!\n"); 90 + exit(1); 91 + } 92 + 93 + ctx.skel->bss->nr_loops = args.nr_loops; 94 
+ } 95 + 96 + const struct bench bench_bpf_loop = { 97 + .name = "bpf-loop", 98 + .validate = validate, 99 + .setup = setup, 100 + .producer_thread = producer, 101 + .consumer_thread = consumer, 102 + .measure = measure, 103 + .report_progress = ops_report_progress, 104 + .report_final = ops_report_final, 105 + };
+146
tools/testing/selftests/bpf/benchs/bench_trigger.c
···
 /* Copyright (c) 2020 Facebook */
 #include "bench.h"
 #include "trigger_bench.skel.h"
+#include "trace_helpers.h"

 /* BPF triggering benchmarks */
 static struct trigger_ctx {
···
	return NULL;
 }

+/* Make sure the call is not inlined and not optimized away by the compiler,
+ * hence __weak and inline asm volatile in the body of the function.
+ *
+ * There is a performance difference between uprobing at a nop location vs
+ * other instructions. So use two different targets, one of which starts with
+ * nop and another doesn't.
+ *
+ * GCC doesn't generate a stack setup preamble for these functions due to them
+ * having no input arguments and doing nothing in the body.
+ */
+__weak void uprobe_target_with_nop(void)
+{
+	asm volatile ("nop");
+}
+
+__weak void uprobe_target_without_nop(void)
+{
+	asm volatile ("");
+}
+
+static void *uprobe_base_producer(void *input)
+{
+	while (true) {
+		uprobe_target_with_nop();
+		atomic_inc(&base_hits.value);
+	}
+	return NULL;
+}
+
+static void *uprobe_producer_with_nop(void *input)
+{
+	while (true)
+		uprobe_target_with_nop();
+	return NULL;
+}
+
+static void *uprobe_producer_without_nop(void *input)
+{
+	while (true)
+		uprobe_target_without_nop();
+	return NULL;
+}
+
+static void usetup(bool use_retprobe, bool use_nop)
+{
+	size_t uprobe_offset;
+	ssize_t base_addr;
+	struct bpf_link *link;
+
+	setup_libbpf();
+
+	ctx.skel = trigger_bench__open_and_load();
+	if (!ctx.skel) {
+		fprintf(stderr, "failed to open skeleton\n");
+		exit(1);
+	}
+
+	base_addr = get_base_addr();
+	if (use_nop)
+		uprobe_offset = get_uprobe_offset(&uprobe_target_with_nop, base_addr);
+	else
+		uprobe_offset = get_uprobe_offset(&uprobe_target_without_nop, base_addr);
+
+	link = bpf_program__attach_uprobe(ctx.skel->progs.bench_trigger_uprobe,
+					  use_retprobe,
+					  -1 /* all PIDs */,
+					  "/proc/self/exe",
+					  uprobe_offset);
+	if (!link) {
+		fprintf(stderr, "failed to attach uprobe!\n");
+		exit(1);
+	}
+	ctx.skel->links.bench_trigger_uprobe = link;
+}
+
+static void uprobe_setup_with_nop()
+{
+	usetup(false, true);
+}
+
+static void uretprobe_setup_with_nop()
+{
+	usetup(true, true);
+}
+
+static void uprobe_setup_without_nop()
+{
+	usetup(false, false);
+}
+
+static void uretprobe_setup_without_nop()
+{
+	usetup(true, false);
+}
+
 const struct bench bench_trig_base = {
	.name = "trig-base",
	.validate = trigger_validate,
···
	.validate = trigger_validate,
	.setup = trigger_fmodret_setup,
	.producer_thread = trigger_producer,
+	.consumer_thread = trigger_consumer,
+	.measure = trigger_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_trig_uprobe_base = {
+	.name = "trig-uprobe-base",
+	.setup = NULL, /* no uprobe/uretprobe is attached */
+	.producer_thread = uprobe_base_producer,
+	.consumer_thread = trigger_consumer,
+	.measure = trigger_base_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_trig_uprobe_with_nop = {
+	.name = "trig-uprobe-with-nop",
+	.setup = uprobe_setup_with_nop,
+	.producer_thread = uprobe_producer_with_nop,
+	.consumer_thread = trigger_consumer,
+	.measure = trigger_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_trig_uretprobe_with_nop = {
+	.name = "trig-uretprobe-with-nop",
+	.setup = uretprobe_setup_with_nop,
+	.producer_thread = uprobe_producer_with_nop,
+	.consumer_thread = trigger_consumer,
+	.measure = trigger_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_trig_uprobe_without_nop = {
+	.name = "trig-uprobe-without-nop",
+	.setup = uprobe_setup_without_nop,
+	.producer_thread = uprobe_producer_without_nop,
+	.consumer_thread = trigger_consumer,
+	.measure = trigger_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_trig_uretprobe_without_nop = {
+	.name = "trig-uretprobe-without-nop",
+	.setup = uretprobe_setup_without_nop,
+	.producer_thread = uprobe_producer_without_nop,
	.consumer_thread = trigger_consumer,
	.measure = trigger_measure,
	.report_progress = hits_drops_report_progress,
+15
tools/testing/selftests/bpf/benchs/run_bench_bpf_loop.sh
···
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source ./benchs/run_common.sh
+
+set -eufo pipefail
+
+for t in 1 4 8 12 16; do
+	for i in 10 100 500 1000 5000 10000 50000 100000 500000 1000000; do
+		subtitle "nr_loops: $i, nr_threads: $t"
+		summarize_ops "bpf_loop: " \
+			"$($RUN_BENCH -p $t --nr_loops $i bpf-loop)"
+		printf "\n"
+	done
+done
+15
tools/testing/selftests/bpf/benchs/run_common.sh
···
	echo "$*" | sed -E "s/.*Percentage\s=\s+([0-9]+\.[0-9]+).*/\1/"
 }

+function ops()
+{
+	echo -n "throughput: "
+	echo -n "$*" | sed -E "s/.*throughput\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+\sM\sops\/s).*/\1/"
+	echo -n -e ", latency: "
+	echo "$*" | sed -E "s/.*latency\s+([0-9]+\.[0-9]+\sns\/op).*/\1/"
+}
+
 function total()
 {
	echo "$*" | sed -E "s/.*total operations\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
···
	bench="$1"
	summary=$(echo $2 | tail -n1)
	printf "%-20s %s%%\n" "$bench" "$(percentage $summary)"
+}
+
+function summarize_ops()
+{
+	bench="$1"
+	summary=$(echo $2 | tail -n1)
+	printf "%-20s %s\n" "$bench" "$(ops $summary)"
 }

 function summarize_total()
+3 -10
tools/testing/selftests/bpf/map_tests/array_map_batch_ops.c
···

 static void __test_map_lookup_and_update_batch(bool is_pcpu)
 {
-	struct bpf_create_map_attr xattr = {
-		.name = "array_map",
-		.map_type = is_pcpu ? BPF_MAP_TYPE_PERCPU_ARRAY :
-			    BPF_MAP_TYPE_ARRAY,
-		.key_size = sizeof(int),
-		.value_size = sizeof(__s64),
-	};
	int map_fd, *keys, *visited;
	__u32 count, total, total_success;
	const __u32 max_entries = 10;
···
		.flags = 0,
	);

-	xattr.max_entries = max_entries;
-	map_fd = bpf_create_map_xattr(&xattr);
+	map_fd = bpf_map_create(is_pcpu ? BPF_MAP_TYPE_PERCPU_ARRAY : BPF_MAP_TYPE_ARRAY,
+				"array_map", sizeof(int), sizeof(__s64), max_entries, NULL);
	CHECK(map_fd == -1,
-	      "bpf_create_map_xattr()", "error:%s\n", strerror(errno));
+	      "bpf_map_create()", "error:%s\n", strerror(errno));

	value_size = sizeof(__s64);
	if (is_pcpu)
+3 -10
tools/testing/selftests/bpf/map_tests/htab_map_batch_ops.c
···
	int err, step, value_size;
	bool nospace_err;
	void *values;
-	struct bpf_create_map_attr xattr = {
-		.name = "hash_map",
-		.map_type = is_pcpu ? BPF_MAP_TYPE_PERCPU_HASH :
-			    BPF_MAP_TYPE_HASH,
-		.key_size = sizeof(int),
-		.value_size = sizeof(int),
-	};
	DECLARE_LIBBPF_OPTS(bpf_map_batch_opts, opts,
		.elem_flags = 0,
		.flags = 0,
	);

-	xattr.max_entries = max_entries;
-	map_fd = bpf_create_map_xattr(&xattr);
+	map_fd = bpf_map_create(is_pcpu ? BPF_MAP_TYPE_PERCPU_HASH : BPF_MAP_TYPE_HASH,
+				"hash_map", sizeof(int), sizeof(int), max_entries, NULL);
	CHECK(map_fd == -1,
-	      "bpf_create_map_xattr()", "error:%s\n", strerror(errno));
+	      "bpf_map_create()", "error:%s\n", strerror(errno));

	value_size = is_pcpu ? sizeof(value) : sizeof(int);
	keys = malloc(max_entries * sizeof(int));
+5 -10
tools/testing/selftests/bpf/map_tests/lpm_trie_map_batch_ops.c
···

 void test_lpm_trie_map_batch_ops(void)
 {
-	struct bpf_create_map_attr xattr = {
-		.name = "lpm_trie_map",
-		.map_type = BPF_MAP_TYPE_LPM_TRIE,
-		.key_size = sizeof(struct test_lpm_key),
-		.value_size = sizeof(int),
-		.map_flags = BPF_F_NO_PREALLOC,
-	};
+	LIBBPF_OPTS(bpf_map_create_opts, create_opts, .map_flags = BPF_F_NO_PREALLOC);
	struct test_lpm_key *keys, key;
	int map_fd, *values, *visited;
	__u32 step, count, total, total_success;
···
		.flags = 0,
	);

-	xattr.max_entries = max_entries;
-	map_fd = bpf_create_map_xattr(&xattr);
-	CHECK(map_fd == -1, "bpf_create_map_xattr()", "error:%s\n",
+	map_fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, "lpm_trie_map",
+				sizeof(struct test_lpm_key), sizeof(int),
+				max_entries, &create_opts);
+	CHECK(map_fd == -1, "bpf_map_create()", "error:%s\n",
	      strerror(errno));

	keys = malloc(max_entries * sizeof(struct test_lpm_key));
+23 -29
tools/testing/selftests/bpf/map_tests/sk_storage_map.c
··· 19 19 #include <test_btf.h> 20 20 #include <test_maps.h> 21 21 22 - static struct bpf_create_map_attr xattr = { 23 - .name = "sk_storage_map", 24 - .map_type = BPF_MAP_TYPE_SK_STORAGE, 25 - .map_flags = BPF_F_NO_PREALLOC, 26 - .max_entries = 0, 27 - .key_size = 4, 28 - .value_size = 8, 22 + static struct bpf_map_create_opts map_opts = { 23 + .sz = sizeof(map_opts), 29 24 .btf_key_type_id = 1, 30 25 .btf_value_type_id = 3, 31 26 .btf_fd = -1, 27 + .map_flags = BPF_F_NO_PREALLOC, 32 28 }; 33 29 34 30 static unsigned int nr_sk_threads_done; ··· 136 140 memcpy(raw_btf + sizeof(btf_hdr) + sizeof(btf_raw_types), 137 141 btf_str_sec, sizeof(btf_str_sec)); 138 142 139 - return bpf_load_btf(raw_btf, sizeof(raw_btf), 0, 0, 0); 143 + return bpf_btf_load(raw_btf, sizeof(raw_btf), NULL); 140 144 } 141 145 142 146 static int create_sk_storage_map(void) ··· 146 150 btf_fd = load_btf(); 147 151 CHECK(btf_fd == -1, "bpf_load_btf", "btf_fd:%d errno:%d\n", 148 152 btf_fd, errno); 149 - xattr.btf_fd = btf_fd; 153 + map_opts.btf_fd = btf_fd; 150 154 151 - map_fd = bpf_create_map_xattr(&xattr); 152 - xattr.btf_fd = -1; 155 + map_fd = bpf_map_create(BPF_MAP_TYPE_SK_STORAGE, "sk_storage_map", 4, 8, 0, &map_opts); 156 + map_opts.btf_fd = -1; 153 157 close(btf_fd); 154 158 CHECK(map_fd == -1, 155 - "bpf_create_map_xattr()", "errno:%d\n", errno); 159 + "bpf_map_create()", "errno:%d\n", errno); 156 160 157 161 return map_fd; 158 162 } ··· 459 463 int cnt; 460 464 int lock; 461 465 } value = { .cnt = 0xeB9f, .lock = 0, }, lookup_value; 462 - struct bpf_create_map_attr bad_xattr; 466 + struct bpf_map_create_opts bad_xattr; 463 467 int btf_fd, map_fd, sk_fd, err; 464 468 465 469 btf_fd = load_btf(); 466 470 CHECK(btf_fd == -1, "bpf_load_btf", "btf_fd:%d errno:%d\n", 467 471 btf_fd, errno); 468 - xattr.btf_fd = btf_fd; 472 + map_opts.btf_fd = btf_fd; 469 473 470 474 sk_fd = socket(AF_INET6, SOCK_STREAM, 0); 471 475 CHECK(sk_fd == -1, "socket()", "sk_fd:%d errno:%d\n", 472 476 sk_fd, errno); 
473 477 474 - map_fd = bpf_create_map_xattr(&xattr); 475 - CHECK(map_fd == -1, "bpf_create_map_xattr(good_xattr)", 478 + map_fd = bpf_map_create(BPF_MAP_TYPE_SK_STORAGE, "sk_storage_map", 4, 8, 0, &map_opts); 479 + CHECK(map_fd == -1, "bpf_map_create(good_xattr)", 476 480 "map_fd:%d errno:%d\n", map_fd, errno); 477 481 478 482 /* Add new elem */ ··· 556 560 CHECK(!err || errno != ENOENT, "bpf_map_delete_elem()", 557 561 "err:%d errno:%d\n", err, errno); 558 562 559 - memcpy(&bad_xattr, &xattr, sizeof(xattr)); 563 + memcpy(&bad_xattr, &map_opts, sizeof(map_opts)); 560 564 bad_xattr.btf_key_type_id = 0; 561 - err = bpf_create_map_xattr(&bad_xattr); 562 - CHECK(!err || errno != EINVAL, "bap_create_map_xattr(bad_xattr)", 565 + err = bpf_map_create(BPF_MAP_TYPE_SK_STORAGE, "sk_storage_map", 4, 8, 0, &bad_xattr); 566 + CHECK(!err || errno != EINVAL, "bpf_map_create(bad_xattr)", 563 567 "err:%d errno:%d\n", err, errno); 564 568 565 - memcpy(&bad_xattr, &xattr, sizeof(xattr)); 569 + memcpy(&bad_xattr, &map_opts, sizeof(map_opts)); 566 570 bad_xattr.btf_key_type_id = 3; 567 - err = bpf_create_map_xattr(&bad_xattr); 568 - CHECK(!err || errno != EINVAL, "bap_create_map_xattr(bad_xattr)", 571 + err = bpf_map_create(BPF_MAP_TYPE_SK_STORAGE, "sk_storage_map", 4, 8, 0, &bad_xattr); 572 + CHECK(!err || errno != EINVAL, "bpf_map_create(bad_xattr)", 569 573 "err:%d errno:%d\n", err, errno); 570 574 571 - memcpy(&bad_xattr, &xattr, sizeof(xattr)); 572 - bad_xattr.max_entries = 1; 573 - err = bpf_create_map_xattr(&bad_xattr); 574 - CHECK(!err || errno != EINVAL, "bap_create_map_xattr(bad_xattr)", 575 + err = bpf_map_create(BPF_MAP_TYPE_SK_STORAGE, "sk_storage_map", 4, 8, 1, &map_opts); 576 + CHECK(!err || errno != EINVAL, "bpf_map_create(bad_xattr)", 575 577 "err:%d errno:%d\n", err, errno); 576 578 577 - memcpy(&bad_xattr, &xattr, sizeof(xattr)); 579 + memcpy(&bad_xattr, &map_opts, sizeof(map_opts)); 578 580 bad_xattr.map_flags = 0; 579 - err = bpf_create_map_xattr(&bad_xattr); 581 + 
err = bpf_map_create(BPF_MAP_TYPE_SK_STORAGE, "sk_storage_map", 4, 8, 0, &bad_xattr); 580 582 CHECK(!err || errno != EINVAL, "bap_create_map_xattr(bad_xattr)", 581 583 "err:%d errno:%d\n", err, errno); 582 584 583 - xattr.btf_fd = -1; 585 + map_opts.btf_fd = -1; 584 586 close(btf_fd); 585 587 close(map_fd); 586 588 close(sk_fd);
+2 -2
tools/testing/selftests/bpf/prog_tests/atomics.c
···
	prog_fd = skel->progs.cmpxchg.prog_fd;
	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
				NULL, NULL, &retval, &duration);
-	if (CHECK(err || retval, "test_run add",
+	if (CHECK(err || retval, "test_run cmpxchg",
		  "err %d errno %d retval %d duration %d\n", err, errno, retval, duration))
		goto cleanup;

···
	prog_fd = skel->progs.xchg.prog_fd;
	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
				NULL, NULL, &retval, &duration);
-	if (CHECK(err || retval, "test_run add",
+	if (CHECK(err || retval, "test_run xchg",
		  "err %d errno %d retval %d duration %d\n", err, errno, retval, duration))
		goto cleanup;

+19 -17
tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
··· 7 7 8 8 static void test_fail_cases(void) 9 9 { 10 + LIBBPF_OPTS(bpf_map_create_opts, opts); 10 11 __u32 value; 11 12 int fd, err; 12 13 13 14 /* Invalid key size */ 14 - fd = bpf_create_map(BPF_MAP_TYPE_BLOOM_FILTER, 4, sizeof(value), 100, 0); 15 - if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid key size")) 15 + fd = bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER, NULL, 4, sizeof(value), 100, NULL); 16 + if (!ASSERT_LT(fd, 0, "bpf_map_create bloom filter invalid key size")) 16 17 close(fd); 17 18 18 19 /* Invalid value size */ 19 - fd = bpf_create_map(BPF_MAP_TYPE_BLOOM_FILTER, 0, 0, 100, 0); 20 - if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid value size 0")) 20 + fd = bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER, NULL, 0, 0, 100, NULL); 21 + if (!ASSERT_LT(fd, 0, "bpf_map_create bloom filter invalid value size 0")) 21 22 close(fd); 22 23 23 24 /* Invalid max entries size */ 24 - fd = bpf_create_map(BPF_MAP_TYPE_BLOOM_FILTER, 0, sizeof(value), 0, 0); 25 - if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid max entries size")) 25 + fd = bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER, NULL, 0, sizeof(value), 0, NULL); 26 + if (!ASSERT_LT(fd, 0, "bpf_map_create bloom filter invalid max entries size")) 26 27 close(fd); 27 28 28 29 /* Bloom filter maps do not support BPF_F_NO_PREALLOC */ 29 - fd = bpf_create_map(BPF_MAP_TYPE_BLOOM_FILTER, 0, sizeof(value), 100, 30 - BPF_F_NO_PREALLOC); 31 - if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid flags")) 30 + opts.map_flags = BPF_F_NO_PREALLOC; 31 + fd = bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER, NULL, 0, sizeof(value), 100, &opts); 32 + if (!ASSERT_LT(fd, 0, "bpf_map_create bloom filter invalid flags")) 32 33 close(fd); 33 34 34 - fd = bpf_create_map(BPF_MAP_TYPE_BLOOM_FILTER, 0, sizeof(value), 100, 0); 35 - if (!ASSERT_GE(fd, 0, "bpf_create_map bloom filter")) 35 + fd = bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER, NULL, 0, sizeof(value), 100, NULL); 36 + if (!ASSERT_GE(fd, 0, "bpf_map_create bloom 
filter")) 36 37 return; 37 38 38 39 /* Test invalid flags */ ··· 57 56 58 57 static void test_success_cases(void) 59 58 { 59 + LIBBPF_OPTS(bpf_map_create_opts, opts); 60 60 char value[11]; 61 61 int fd, err; 62 62 63 63 /* Create a map */ 64 - fd = bpf_create_map(BPF_MAP_TYPE_BLOOM_FILTER, 0, sizeof(value), 100, 65 - BPF_F_ZERO_SEED | BPF_F_NUMA_NODE); 66 - if (!ASSERT_GE(fd, 0, "bpf_create_map bloom filter success case")) 64 + opts.map_flags = BPF_F_ZERO_SEED | BPF_F_NUMA_NODE; 65 + fd = bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER, NULL, 0, sizeof(value), 100, &opts); 66 + if (!ASSERT_GE(fd, 0, "bpf_map_create bloom filter success case")) 67 67 return; 68 68 69 69 /* Add a value to the bloom filter */ ··· 102 100 struct bpf_link *link; 103 101 104 102 /* Create a bloom filter map that will be used as the inner map */ 105 - inner_map_fd = bpf_create_map(BPF_MAP_TYPE_BLOOM_FILTER, 0, sizeof(*rand_vals), 106 - nr_rand_vals, 0); 107 - if (!ASSERT_GE(inner_map_fd, 0, "bpf_create_map bloom filter inner map")) 103 + inner_map_fd = bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER, NULL, 0, sizeof(*rand_vals), 104 + nr_rand_vals, NULL); 105 + if (!ASSERT_GE(inner_map_fd, 0, "bpf_map_create bloom filter inner map")) 108 106 return; 109 107 110 108 for (i = 0; i < nr_rand_vals; i++) {
+7 -6
tools/testing/selftests/bpf/prog_tests/bpf_iter.c
···
	 * fills seq_file buffer and then the other will trigger
	 * overflow and needs restart.
	 */
-	map1_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 8, 1, 0);
-	if (CHECK(map1_fd < 0, "bpf_create_map",
+	map1_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, 4, 8, 1, NULL);
+	if (CHECK(map1_fd < 0, "bpf_map_create",
		  "map_creation failed: %s\n", strerror(errno)))
		goto out;
-	map2_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 8, 1, 0);
-	if (CHECK(map2_fd < 0, "bpf_create_map",
+	map2_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, 4, 8, 1, NULL);
+	if (CHECK(map2_fd < 0, "bpf_map_create",
		  "map_creation failed: %s\n", strerror(errno)))
		goto free_map1;
···
		goto out;

	/* Read CMP_BUFFER_SIZE (1kB) from bpf_iter. Read in small chunks
-	 * to trigger seq_file corner cases. The expected output is much
-	 * longer than 1kB, so the while loop will terminate.
+	 * to trigger seq_file corner cases.
	 */
	len = 0;
	while (len < CMP_BUFFER_SIZE) {
		err = read_fd_into_buffer(iter_fd, task_vma_output + len,
					  min(read_size, CMP_BUFFER_SIZE - len));
+		if (!err)
+			break;
		if (CHECK(err < 0, "read_iter_fd", "read_iter_fd failed\n"))
			goto out;
		len += err;
+145
tools/testing/selftests/bpf/prog_tests/bpf_loop.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2021 Facebook */ 3 + 4 + #include <test_progs.h> 5 + #include <network_helpers.h> 6 + #include "bpf_loop.skel.h" 7 + 8 + static void check_nr_loops(struct bpf_loop *skel) 9 + { 10 + struct bpf_link *link; 11 + 12 + link = bpf_program__attach(skel->progs.test_prog); 13 + if (!ASSERT_OK_PTR(link, "link")) 14 + return; 15 + 16 + /* test 0 loops */ 17 + skel->bss->nr_loops = 0; 18 + 19 + usleep(1); 20 + 21 + ASSERT_EQ(skel->bss->nr_loops_returned, skel->bss->nr_loops, 22 + "0 loops"); 23 + 24 + /* test 500 loops */ 25 + skel->bss->nr_loops = 500; 26 + 27 + usleep(1); 28 + 29 + ASSERT_EQ(skel->bss->nr_loops_returned, skel->bss->nr_loops, 30 + "500 loops"); 31 + ASSERT_EQ(skel->bss->g_output, (500 * 499) / 2, "g_output"); 32 + 33 + /* test exceeding the max limit */ 34 + skel->bss->nr_loops = -1; 35 + 36 + usleep(1); 37 + 38 + ASSERT_EQ(skel->bss->err, -E2BIG, "over max limit"); 39 + 40 + bpf_link__destroy(link); 41 + } 42 + 43 + static void check_callback_fn_stop(struct bpf_loop *skel) 44 + { 45 + struct bpf_link *link; 46 + 47 + link = bpf_program__attach(skel->progs.test_prog); 48 + if (!ASSERT_OK_PTR(link, "link")) 49 + return; 50 + 51 + /* testing that loop is stopped when callback_fn returns 1 */ 52 + skel->bss->nr_loops = 400; 53 + skel->data->stop_index = 50; 54 + 55 + usleep(1); 56 + 57 + ASSERT_EQ(skel->bss->nr_loops_returned, skel->data->stop_index + 1, 58 + "nr_loops_returned"); 59 + ASSERT_EQ(skel->bss->g_output, (50 * 49) / 2, 60 + "g_output"); 61 + 62 + bpf_link__destroy(link); 63 + } 64 + 65 + static void check_null_callback_ctx(struct bpf_loop *skel) 66 + { 67 + struct bpf_link *link; 68 + 69 + /* check that user is able to pass in a null callback_ctx */ 70 + link = bpf_program__attach(skel->progs.prog_null_ctx); 71 + if (!ASSERT_OK_PTR(link, "link")) 72 + return; 73 + 74 + skel->bss->nr_loops = 10; 75 + 76 + usleep(1); 77 + 78 + ASSERT_EQ(skel->bss->nr_loops_returned, skel->bss->nr_loops, 
79 + "nr_loops_returned"); 80 + 81 + bpf_link__destroy(link); 82 + } 83 + 84 + static void check_invalid_flags(struct bpf_loop *skel) 85 + { 86 + struct bpf_link *link; 87 + 88 + /* check that passing in non-zero flags returns -EINVAL */ 89 + link = bpf_program__attach(skel->progs.prog_invalid_flags); 90 + if (!ASSERT_OK_PTR(link, "link")) 91 + return; 92 + 93 + usleep(1); 94 + 95 + ASSERT_EQ(skel->bss->err, -EINVAL, "err"); 96 + 97 + bpf_link__destroy(link); 98 + } 99 + 100 + static void check_nested_calls(struct bpf_loop *skel) 101 + { 102 + __u32 nr_loops = 100, nested_callback_nr_loops = 4; 103 + struct bpf_link *link; 104 + 105 + /* check that nested calls are supported */ 106 + link = bpf_program__attach(skel->progs.prog_nested_calls); 107 + if (!ASSERT_OK_PTR(link, "link")) 108 + return; 109 + 110 + skel->bss->nr_loops = nr_loops; 111 + skel->bss->nested_callback_nr_loops = nested_callback_nr_loops; 112 + 113 + usleep(1); 114 + 115 + ASSERT_EQ(skel->bss->nr_loops_returned, nr_loops * nested_callback_nr_loops 116 + * nested_callback_nr_loops, "nr_loops_returned"); 117 + ASSERT_EQ(skel->bss->g_output, (4 * 3) / 2 * nested_callback_nr_loops 118 + * nr_loops, "g_output"); 119 + 120 + bpf_link__destroy(link); 121 + } 122 + 123 + void test_bpf_loop(void) 124 + { 125 + struct bpf_loop *skel; 126 + 127 + skel = bpf_loop__open_and_load(); 128 + if (!ASSERT_OK_PTR(skel, "bpf_loop__open_and_load")) 129 + return; 130 + 131 + skel->bss->pid = getpid(); 132 + 133 + if (test__start_subtest("check_nr_loops")) 134 + check_nr_loops(skel); 135 + if (test__start_subtest("check_callback_fn_stop")) 136 + check_callback_fn_stop(skel); 137 + if (test__start_subtest("check_null_callback_ctx")) 138 + check_null_callback_ctx(skel); 139 + if (test__start_subtest("check_invalid_flags")) 140 + check_invalid_flags(skel); 141 + if (test__start_subtest("check_nested_calls")) 142 + check_nested_calls(skel); 143 + 144 + bpf_loop__destroy(skel); 145 + }
+4 -2
tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
···
 static int libbpf_debug_print(enum libbpf_print_level level,
			      const char *format, va_list args)
 {
-	char *log_buf;
+	const char *log_buf;

	if (level != LIBBPF_WARN ||
-	    strcmp(format, "libbpf: \n%s\n")) {
+	    !strstr(format, "-- BEGIN PROG LOAD LOG --")) {
		vprintf(format, args);
		return 0;
	}

+	/* skip prog_name */
+	va_arg(args, char *);
	log_buf = va_arg(args, char *);
	if (!log_buf)
		goto out;
+32 -8
tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
··· 19 19 20 20 static int check_load(const char *file, enum bpf_prog_type type) 21 21 { 22 - struct bpf_prog_load_attr attr; 23 22 struct bpf_object *obj = NULL; 24 - int err, prog_fd; 23 + struct bpf_program *prog; 24 + int err; 25 25 26 - memset(&attr, 0, sizeof(struct bpf_prog_load_attr)); 27 - attr.file = file; 28 - attr.prog_type = type; 29 - attr.log_level = 4 | extra_prog_load_log_flags; 30 - attr.prog_flags = BPF_F_TEST_RND_HI32; 31 - err = bpf_prog_load_xattr(&attr, &obj, &prog_fd); 26 + obj = bpf_object__open_file(file, NULL); 27 + err = libbpf_get_error(obj); 28 + if (err) 29 + return err; 30 + 31 + prog = bpf_object__next_program(obj, NULL); 32 + if (!prog) { 33 + err = -ENOENT; 34 + goto err_out; 35 + } 36 + 37 + bpf_program__set_type(prog, type); 38 + bpf_program__set_flags(prog, BPF_F_TEST_RND_HI32); 39 + bpf_program__set_log_level(prog, 4 | extra_prog_load_log_flags); 40 + 41 + err = bpf_object__load(obj); 42 + 43 + err_out: 32 44 bpf_object__close(obj); 33 45 return err; 34 46 } ··· 127 115 scale_test("pyperf600.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false); 128 116 } 129 117 118 + void test_verif_scale_pyperf600_bpf_loop(void) 119 + { 120 + /* use the bpf_loop helper*/ 121 + scale_test("pyperf600_bpf_loop.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false); 122 + } 123 + 130 124 void test_verif_scale_pyperf600_nounroll() 131 125 { 132 126 /* no unroll at all. ··· 181 163 * ~350k processed_insns 182 164 */ 183 165 scale_test("strobemeta.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false); 166 + } 167 + 168 + void test_verif_scale_strobemeta_bpf_loop(void) 169 + { 170 + /* use the bpf_loop helper*/ 171 + scale_test("strobemeta_bpf_loop.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false); 184 172 } 185 173 186 174 void test_verif_scale_strobemeta_nounroll1()
+74 -53
tools/testing/selftests/bpf/prog_tests/btf.c
··· 4071 4071 return raw_btf; 4072 4072 } 4073 4073 4074 + static int load_raw_btf(const void *raw_data, size_t raw_size) 4075 + { 4076 + LIBBPF_OPTS(bpf_btf_load_opts, opts); 4077 + int btf_fd; 4078 + 4079 + if (always_log) { 4080 + opts.log_buf = btf_log_buf, 4081 + opts.log_size = BTF_LOG_BUF_SIZE, 4082 + opts.log_level = 1; 4083 + } 4084 + 4085 + btf_fd = bpf_btf_load(raw_data, raw_size, &opts); 4086 + if (btf_fd < 0 && !always_log) { 4087 + opts.log_buf = btf_log_buf, 4088 + opts.log_size = BTF_LOG_BUF_SIZE, 4089 + opts.log_level = 1; 4090 + btf_fd = bpf_btf_load(raw_data, raw_size, &opts); 4091 + } 4092 + 4093 + return btf_fd; 4094 + } 4095 + 4074 4096 static void do_test_raw(unsigned int test_num) 4075 4097 { 4076 4098 struct btf_raw_test *test = &raw_tests[test_num - 1]; 4077 - struct bpf_create_map_attr create_attr = {}; 4099 + LIBBPF_OPTS(bpf_map_create_opts, opts); 4078 4100 int map_fd = -1, btf_fd = -1; 4079 4101 unsigned int raw_btf_size; 4080 4102 struct btf_header *hdr; ··· 4122 4100 hdr->str_len = (int)hdr->str_len + test->str_len_delta; 4123 4101 4124 4102 *btf_log_buf = '\0'; 4125 - btf_fd = bpf_load_btf(raw_btf, raw_btf_size, 4126 - btf_log_buf, BTF_LOG_BUF_SIZE, 4127 - always_log); 4103 + btf_fd = load_raw_btf(raw_btf, raw_btf_size); 4128 4104 free(raw_btf); 4129 4105 4130 4106 err = ((btf_fd < 0) != test->btf_load_err); 4131 4107 if (CHECK(err, "btf_fd:%d test->btf_load_err:%u", 4132 4108 btf_fd, test->btf_load_err) || 4133 4109 CHECK(test->err_str && !strstr(btf_log_buf, test->err_str), 4134 - "expected err_str:%s", test->err_str)) { 4110 + "expected err_str:%s\n", test->err_str)) { 4135 4111 err = -1; 4136 4112 goto done; 4137 4113 } ··· 4137 4117 if (err || btf_fd < 0) 4138 4118 goto done; 4139 4119 4140 - create_attr.name = test->map_name; 4141 - create_attr.map_type = test->map_type; 4142 - create_attr.key_size = test->key_size; 4143 - create_attr.value_size = test->value_size; 4144 - create_attr.max_entries = test->max_entries; 4145 - 
create_attr.btf_fd = btf_fd; 4146 - create_attr.btf_key_type_id = test->key_type_id; 4147 - create_attr.btf_value_type_id = test->value_type_id; 4148 - 4149 - map_fd = bpf_create_map_xattr(&create_attr); 4120 + opts.btf_fd = btf_fd; 4121 + opts.btf_key_type_id = test->key_type_id; 4122 + opts.btf_value_type_id = test->value_type_id; 4123 + map_fd = bpf_map_create(test->map_type, test->map_name, 4124 + test->key_size, test->value_size, test->max_entries, &opts); 4150 4125 4151 4126 err = ((map_fd < 0) != test->map_create_err); 4152 4127 CHECK(err, "map_fd:%d test->map_create_err:%u", ··· 4247 4232 goto done; 4248 4233 } 4249 4234 4250 - btf_fd = bpf_load_btf(raw_btf, raw_btf_size, 4251 - btf_log_buf, BTF_LOG_BUF_SIZE, 4252 - always_log); 4235 + btf_fd = load_raw_btf(raw_btf, raw_btf_size); 4253 4236 if (CHECK(btf_fd < 0, "errno:%d", errno)) { 4254 4237 err = -1; 4255 4238 goto done; ··· 4303 4290 static int test_btf_id(unsigned int test_num) 4304 4291 { 4305 4292 const struct btf_get_info_test *test = &get_info_tests[test_num - 1]; 4306 - struct bpf_create_map_attr create_attr = {}; 4293 + LIBBPF_OPTS(bpf_map_create_opts, opts); 4307 4294 uint8_t *raw_btf = NULL, *user_btf[2] = {}; 4308 4295 int btf_fd[2] = {-1, -1}, map_fd = -1; 4309 4296 struct bpf_map_info map_info = {}; ··· 4333 4320 info[i].btf_size = raw_btf_size; 4334 4321 } 4335 4322 4336 - btf_fd[0] = bpf_load_btf(raw_btf, raw_btf_size, 4337 - btf_log_buf, BTF_LOG_BUF_SIZE, 4338 - always_log); 4323 + btf_fd[0] = load_raw_btf(raw_btf, raw_btf_size); 4339 4324 if (CHECK(btf_fd[0] < 0, "errno:%d", errno)) { 4340 4325 err = -1; 4341 4326 goto done; ··· 4366 4355 } 4367 4356 4368 4357 /* Test btf members in struct bpf_map_info */ 4369 - create_attr.name = "test_btf_id"; 4370 - create_attr.map_type = BPF_MAP_TYPE_ARRAY; 4371 - create_attr.key_size = sizeof(int); 4372 - create_attr.value_size = sizeof(unsigned int); 4373 - create_attr.max_entries = 4; 4374 - create_attr.btf_fd = btf_fd[0]; 4375 - 
create_attr.btf_key_type_id = 1; 4376 - create_attr.btf_value_type_id = 2; 4377 - 4378 - map_fd = bpf_create_map_xattr(&create_attr); 4358 + opts.btf_fd = btf_fd[0]; 4359 + opts.btf_key_type_id = 1; 4360 + opts.btf_value_type_id = 2; 4361 + map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "test_btf_id", 4362 + sizeof(int), sizeof(int), 4, &opts); 4379 4363 if (CHECK(map_fd < 0, "errno:%d", errno)) { 4380 4364 err = -1; 4381 4365 goto done; ··· 4463 4457 goto done; 4464 4458 } 4465 4459 4466 - btf_fd = bpf_load_btf(raw_btf, raw_btf_size, 4467 - btf_log_buf, BTF_LOG_BUF_SIZE, 4468 - always_log); 4460 + btf_fd = load_raw_btf(raw_btf, raw_btf_size); 4469 4461 if (CHECK(btf_fd <= 0, "errno:%d", errno)) { 4470 4462 err = -1; 4471 4463 goto done; ··· 5157 5153 { 5158 5154 const struct btf_raw_test *test = &pprint_test_template[test_num]; 5159 5155 enum pprint_mapv_kind_t mapv_kind = test->mapv_kind; 5160 - struct bpf_create_map_attr create_attr = {}; 5156 + LIBBPF_OPTS(bpf_map_create_opts, opts); 5161 5157 bool ordered_map, lossless_map, percpu_map; 5162 5158 int err, ret, num_cpus, rounded_value_size; 5163 5159 unsigned int key, nr_read_elems; ··· 5183 5179 return; 5184 5180 5185 5181 *btf_log_buf = '\0'; 5186 - btf_fd = bpf_load_btf(raw_btf, raw_btf_size, 5187 - btf_log_buf, BTF_LOG_BUF_SIZE, 5188 - always_log); 5182 + btf_fd = load_raw_btf(raw_btf, raw_btf_size); 5189 5183 free(raw_btf); 5190 5184 5191 - if (CHECK(btf_fd < 0, "errno:%d", errno)) { 5185 + if (CHECK(btf_fd < 0, "errno:%d\n", errno)) { 5192 5186 err = -1; 5193 5187 goto done; 5194 5188 } 5195 5189 5196 - create_attr.name = test->map_name; 5197 - create_attr.map_type = test->map_type; 5198 - create_attr.key_size = test->key_size; 5199 - create_attr.value_size = test->value_size; 5200 - create_attr.max_entries = test->max_entries; 5201 - create_attr.btf_fd = btf_fd; 5202 - create_attr.btf_key_type_id = test->key_type_id; 5203 - create_attr.btf_value_type_id = test->value_type_id; 5204 - 5205 - map_fd = 
bpf_create_map_xattr(&create_attr); 5190 + opts.btf_fd = btf_fd; 5191 + opts.btf_key_type_id = test->key_type_id; 5192 + opts.btf_value_type_id = test->value_type_id; 5193 + map_fd = bpf_map_create(test->map_type, test->map_name, 5194 + test->key_size, test->value_size, test->max_entries, &opts); 5206 5195 if (CHECK(map_fd < 0, "errno:%d", errno)) { 5207 5196 err = -1; 5208 5197 goto done; ··· 6550 6553 return; 6551 6554 6552 6555 *btf_log_buf = '\0'; 6553 - btf_fd = bpf_load_btf(raw_btf, raw_btf_size, 6554 - btf_log_buf, BTF_LOG_BUF_SIZE, 6555 - always_log); 6556 + btf_fd = load_raw_btf(raw_btf, raw_btf_size); 6556 6557 free(raw_btf); 6557 6558 6558 6559 if (CHECK(btf_fd < 0, "invalid btf_fd errno:%d", errno)) { ··· 7345 7350 BTF_END_RAW, 7346 7351 }, 7347 7352 BTF_STR_SEC("\0tag1"), 7353 + }, 7354 + }, 7355 + { 7356 + .descr = "dedup: btf_type_tag #5, struct", 7357 + .input = { 7358 + .raw_types = { 7359 + BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 7360 + BTF_TYPE_TAG_ENC(NAME_NTH(1), 1), /* [2] */ 7361 + BTF_TYPE_ENC(NAME_NTH(2), BTF_INFO_ENC(BTF_KIND_STRUCT, 1, 1), 4), /* [3] */ 7362 + BTF_MEMBER_ENC(NAME_NTH(3), 2, BTF_MEMBER_OFFSET(0, 0)), 7363 + BTF_TYPE_TAG_ENC(NAME_NTH(1), 1), /* [4] */ 7364 + BTF_TYPE_ENC(NAME_NTH(2), BTF_INFO_ENC(BTF_KIND_STRUCT, 1, 1), 4), /* [5] */ 7365 + BTF_MEMBER_ENC(NAME_NTH(3), 4, BTF_MEMBER_OFFSET(0, 0)), 7366 + BTF_END_RAW, 7367 + }, 7368 + BTF_STR_SEC("\0tag1\0t\0m"), 7369 + }, 7370 + .expect = { 7371 + .raw_types = { 7372 + BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 7373 + BTF_TYPE_TAG_ENC(NAME_NTH(1), 1), /* [2] */ 7374 + BTF_TYPE_ENC(NAME_NTH(2), BTF_INFO_ENC(BTF_KIND_STRUCT, 1, 1), 4), /* [3] */ 7375 + BTF_MEMBER_ENC(NAME_NTH(3), 2, BTF_MEMBER_OFFSET(0, 0)), 7376 + BTF_END_RAW, 7377 + }, 7378 + BTF_STR_SEC("\0tag1\0t\0m"), 7348 7379 }, 7349 7380 }, 7350 7381
+113
tools/testing/selftests/bpf/prog_tests/btf_dedup_split.c
··· 314 314 btf__free(btf1); 315 315 } 316 316 317 + static void btf_add_dup_struct_in_cu(struct btf *btf, int start_id) 318 + { 319 + #define ID(n) (start_id + n) 320 + btf__set_pointer_size(btf, 8); /* enforce 64-bit arch */ 321 + 322 + btf__add_int(btf, "int", 4, BTF_INT_SIGNED); /* [1] int */ 323 + 324 + btf__add_struct(btf, "s", 8); /* [2] struct s { */ 325 + btf__add_field(btf, "a", ID(3), 0, 0); /* struct anon a; */ 326 + btf__add_field(btf, "b", ID(4), 0, 0); /* struct anon b; */ 327 + /* } */ 328 + 329 + btf__add_struct(btf, "(anon)", 8); /* [3] struct anon { */ 330 + btf__add_field(btf, "f1", ID(1), 0, 0); /* int f1; */ 331 + btf__add_field(btf, "f2", ID(1), 32, 0); /* int f2; */ 332 + /* } */ 333 + 334 + btf__add_struct(btf, "(anon)", 8); /* [4] struct anon { */ 335 + btf__add_field(btf, "f1", ID(1), 0, 0); /* int f1; */ 336 + btf__add_field(btf, "f2", ID(1), 32, 0); /* int f2; */ 337 + /* } */ 338 + #undef ID 339 + } 340 + 341 + static void test_split_dup_struct_in_cu() 342 + { 343 + struct btf *btf1, *btf2 = NULL; 344 + int err; 345 + 346 + /* generate the base data.. */ 347 + btf1 = btf__new_empty(); 348 + if (!ASSERT_OK_PTR(btf1, "empty_main_btf")) 349 + return; 350 + 351 + btf_add_dup_struct_in_cu(btf1, 0); 352 + 353 + VALIDATE_RAW_BTF( 354 + btf1, 355 + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", 356 + "[2] STRUCT 's' size=8 vlen=2\n" 357 + "\t'a' type_id=3 bits_offset=0\n" 358 + "\t'b' type_id=4 bits_offset=0", 359 + "[3] STRUCT '(anon)' size=8 vlen=2\n" 360 + "\t'f1' type_id=1 bits_offset=0\n" 361 + "\t'f2' type_id=1 bits_offset=32", 362 + "[4] STRUCT '(anon)' size=8 vlen=2\n" 363 + "\t'f1' type_id=1 bits_offset=0\n" 364 + "\t'f2' type_id=1 bits_offset=32"); 365 + 366 + /* ..dedup them... 
*/ 367 + err = btf__dedup(btf1, NULL); 368 + if (!ASSERT_OK(err, "btf_dedup")) 369 + goto cleanup; 370 + 371 + VALIDATE_RAW_BTF( 372 + btf1, 373 + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", 374 + "[2] STRUCT 's' size=8 vlen=2\n" 375 + "\t'a' type_id=3 bits_offset=0\n" 376 + "\t'b' type_id=3 bits_offset=0", 377 + "[3] STRUCT '(anon)' size=8 vlen=2\n" 378 + "\t'f1' type_id=1 bits_offset=0\n" 379 + "\t'f2' type_id=1 bits_offset=32"); 380 + 381 + /* and add the same data on top of it */ 382 + btf2 = btf__new_empty_split(btf1); 383 + if (!ASSERT_OK_PTR(btf2, "empty_split_btf")) 384 + goto cleanup; 385 + 386 + btf_add_dup_struct_in_cu(btf2, 3); 387 + 388 + VALIDATE_RAW_BTF( 389 + btf2, 390 + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", 391 + "[2] STRUCT 's' size=8 vlen=2\n" 392 + "\t'a' type_id=3 bits_offset=0\n" 393 + "\t'b' type_id=3 bits_offset=0", 394 + "[3] STRUCT '(anon)' size=8 vlen=2\n" 395 + "\t'f1' type_id=1 bits_offset=0\n" 396 + "\t'f2' type_id=1 bits_offset=32", 397 + "[4] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", 398 + "[5] STRUCT 's' size=8 vlen=2\n" 399 + "\t'a' type_id=6 bits_offset=0\n" 400 + "\t'b' type_id=7 bits_offset=0", 401 + "[6] STRUCT '(anon)' size=8 vlen=2\n" 402 + "\t'f1' type_id=4 bits_offset=0\n" 403 + "\t'f2' type_id=4 bits_offset=32", 404 + "[7] STRUCT '(anon)' size=8 vlen=2\n" 405 + "\t'f1' type_id=4 bits_offset=0\n" 406 + "\t'f2' type_id=4 bits_offset=32"); 407 + 408 + err = btf__dedup(btf2, NULL); 409 + if (!ASSERT_OK(err, "btf_dedup")) 410 + goto cleanup; 411 + 412 + /* after dedup it should match the original data */ 413 + VALIDATE_RAW_BTF( 414 + btf2, 415 + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", 416 + "[2] STRUCT 's' size=8 vlen=2\n" 417 + "\t'a' type_id=3 bits_offset=0\n" 418 + "\t'b' type_id=3 bits_offset=0", 419 + "[3] STRUCT '(anon)' size=8 vlen=2\n" 420 + "\t'f1' type_id=1 bits_offset=0\n" 421 + "\t'f2' type_id=1 bits_offset=32"); 422 + 423 + 
cleanup: 424 + btf__free(btf2); 425 + btf__free(btf1); 426 + } 427 + 317 428 void test_btf_dedup_split() 318 429 { 319 430 if (test__start_subtest("split_simple")) ··· 433 322 test_split_struct_duped(); 434 323 if (test__start_subtest("split_fwd_resolve")) 435 324 test_split_fwd_resolve(); 325 + if (test__start_subtest("split_dup_struct_in_cu")) 326 + test_split_dup_struct_in_cu(); 436 327 }
+2 -2
tools/testing/selftests/bpf/prog_tests/btf_dump.c
··· 323 323 char *str) 324 324 { 325 325 #ifdef __SIZEOF_INT128__ 326 - __int128 i = 0xffffffffffffffff; 326 + unsigned __int128 i = 0xffffffffffffffff; 327 327 328 328 /* this dance is required because we cannot directly initialize 329 329 * a 128-bit value to anything larger than a 64-bit value. ··· 756 756 /* overflow bpf_sock_ops struct with final element nonzero/zero. 757 757 * Regardless of the value of the final field, we don't have all the 758 758 * data we need to display it, so we should trigger an overflow. 759 - * In other words oveflow checking should trump "is field zero?" 759 + * In other words overflow checking should trump "is field zero?" 760 760 * checks because if we've overflowed, it shouldn't matter what the 761 761 * field is - we can't trust its value so shouldn't display it. 762 762 */
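The one-word btf_dump.c fix above matters because a 64-bit literal assigned to a signed `__int128` is later shifted left into the sign bit, which is undefined behavior; the unsigned type makes the shift well-defined. A minimal standalone sketch of the same "dance" (the helper name is illustrative, not from the test):

```c
#include <assert.h>
#include <stdint.h>

/* Builds an all-ones 128-bit value the way the selftest does. The literal
 * itself can be at most 64 bits wide, so the upper half is filled by
 * shifting. The type must be unsigned: with a signed __int128 the left
 * shift would move a 1 into the sign bit, which is undefined behavior --
 * exactly what the one-word fix above addresses.
 */
static unsigned __int128 all_ones_128(void)
{
	unsigned __int128 i = 0xffffffffffffffffULL;

	i = (i << 64) | i; /* well-defined for the unsigned type */
	return i;
}
```

Both 64-bit halves of the result carry the full bit pattern, which is what the dump test then formats.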
+6 -6
tools/testing/selftests/bpf/prog_tests/cgroup_attach_multi.c
··· 15 15 int cgroup_storage_fd, percpu_cgroup_storage_fd; 16 16 17 17 if (map_fd < 0) 18 - map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 8, 1, 0); 18 + map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, 4, 8, 1, NULL); 19 19 if (map_fd < 0) { 20 20 printf("failed to create map '%s'\n", strerror(errno)); 21 21 return -1; 22 22 } 23 23 24 - cgroup_storage_fd = bpf_create_map(BPF_MAP_TYPE_CGROUP_STORAGE, 25 - sizeof(struct bpf_cgroup_storage_key), 8, 0, 0); 24 + cgroup_storage_fd = bpf_map_create(BPF_MAP_TYPE_CGROUP_STORAGE, NULL, 25 + sizeof(struct bpf_cgroup_storage_key), 8, 0, NULL); 26 26 if (cgroup_storage_fd < 0) { 27 27 printf("failed to create map '%s'\n", strerror(errno)); 28 28 return -1; 29 29 } 30 30 31 - percpu_cgroup_storage_fd = bpf_create_map( 32 - BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, 33 - sizeof(struct bpf_cgroup_storage_key), 8, 0, 0); 31 + percpu_cgroup_storage_fd = bpf_map_create( 32 + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, NULL, 33 + sizeof(struct bpf_cgroup_storage_key), 8, 0, NULL); 34 34 if (percpu_cgroup_storage_fd < 0) { 35 35 printf("failed to create map '%s'\n", strerror(errno)); 36 36 return -1;
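The `bpf_create_map()` to `bpf_map_create()` migration in this file (and several below) follows libbpf's new calling convention: always-relevant parameters stay positional, everything else moves into an optional opts struct where `NULL` means "all defaults". A toy model of that convention, with hypothetical names (not the real libbpf types, which also carry a size field for forward/backward compatibility):

```c
#include <stddef.h>

/* Illustrative stand-in for the libbpf v0.6+ "opts" convention: optional
 * knobs live in a struct, a NULL pointer selects defaults. Names and
 * fields here are assumptions for the sketch, not the real libbpf API.
 */
struct toy_map_create_opts {
	size_t sz;          /* size of this struct, for ABI versioning */
	unsigned int flags; /* optional map flags, default 0 */
	int inner_map_fd;   /* optional, default -1 (unused) */
};

static int toy_map_create(int key_size, int value_size, int max_entries,
			  const struct toy_map_create_opts *opts)
{
	unsigned int flags = opts ? opts->flags : 0;

	if (key_size <= 0 || value_size < 0 || max_entries <= 0)
		return -1;
	/* a real implementation would issue the BPF_MAP_CREATE syscall
	 * here; echo the effective flags back so the sketch is observable
	 */
	return (int)flags;
}
```

The upside of this shape is that new optional parameters can be appended without breaking existing callers, which is why the deprecated fixed-signature variants are being phased out before libbpf 1.0.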
+9 -8
tools/testing/selftests/bpf/prog_tests/connect_force_port.c
··· 51 51 bool v4 = family == AF_INET; 52 52 __u16 expected_local_port = v4 ? 22222 : 22223; 53 53 __u16 expected_peer_port = 60000; 54 - struct bpf_prog_load_attr attr = { 55 - .file = v4 ? "./connect_force_port4.o" : 56 - "./connect_force_port6.o", 57 - }; 58 54 struct bpf_program *prog; 59 55 struct bpf_object *obj; 60 - int xlate_fd, fd, err; 56 + const char *obj_file = v4 ? "connect_force_port4.o" : "connect_force_port6.o"; 57 + int fd, err; 61 58 __u32 duration = 0; 62 59 63 - err = bpf_prog_load_xattr(&attr, &obj, &xlate_fd); 64 - if (err) { 65 - log_err("Failed to load BPF object"); 60 + obj = bpf_object__open_file(obj_file, NULL); 61 + if (!ASSERT_OK_PTR(obj, "bpf_obj_open")) 66 62 return -1; 63 + 64 + err = bpf_object__load(obj); 65 + if (!ASSERT_OK(err, "bpf_obj_load")) { 66 + err = -EIO; 67 + goto close_bpf_object; 67 68 } 68 69 69 70 prog = bpf_object__find_program_by_title(obj, v4 ?
+14
tools/testing/selftests/bpf/prog_tests/core_kern.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2021 Facebook */ 3 + 4 + #include "test_progs.h" 5 + #include "core_kern.lskel.h" 6 + 7 + void test_core_kern_lskel(void) 8 + { 9 + struct core_kern_lskel *skel; 10 + 11 + skel = core_kern_lskel__open_and_load(); 12 + ASSERT_OK_PTR(skel, "open_and_load"); 13 + core_kern_lskel__destroy(skel); 14 + }
+2 -1
tools/testing/selftests/bpf/prog_tests/core_reloc.c
··· 881 881 data = mmap_data; 882 882 883 883 memset(mmap_data, 0, sizeof(*data)); 884 - memcpy(data->in, test_case->input, test_case->input_len); 884 + if (test_case->input_len) 885 + memcpy(data->in, test_case->input, test_case->input_len); 885 886 data->my_pid_tgid = my_pid_tgid; 886 887 887 888 link = bpf_program__attach_raw_tracepoint(prog, tp_name);
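The core_reloc.c guard above exists because some test cases have no input (`input` is NULL and `input_len` is 0), and `memcpy()` with a NULL source pointer is undefined behavior even when the length is zero. A minimal helper capturing the same pattern (the function name is illustrative):

```c
#include <string.h>

/* Copy test input only when there is any: memcpy's pointer arguments
 * must be valid even for a zero length, so a NULL source needs an
 * explicit length check in front of the call.
 */
static void copy_test_input(char *dst, const char *src, size_t len)
{
	if (len)            /* only dereference src when there is data */
		memcpy(dst, src, len);
}
```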
+10 -4
tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
··· 24 24 { 25 25 bool good_kern_stack = false, good_user_stack = false; 26 26 const char *nonjit_func = "___bpf_prog_run"; 27 - struct get_stack_trace_t *e = data; 27 + /* perfbuf-submitted data is 4-byte aligned, but we need 8-byte 28 + * alignment, so copy data into a local variable, for simplicity 29 + */ 30 + struct get_stack_trace_t e; 28 31 int i, num_stack; 29 32 static __u64 cnt; 30 33 struct ksym *ks; 31 34 32 35 cnt++; 36 + 37 + memset(&e, 0, sizeof(e)); 38 + memcpy(&e, data, size <= sizeof(e) ? size : sizeof(e)); 33 39 34 40 if (size < sizeof(struct get_stack_trace_t)) { 35 41 __u64 *raw_data = data; ··· 63 57 good_user_stack = true; 64 58 } 65 59 } else { 66 - num_stack = e->kern_stack_size / sizeof(__u64); 60 + num_stack = e.kern_stack_size / sizeof(__u64); 67 61 if (env.jit_enabled) { 68 62 good_kern_stack = num_stack > 0; 69 63 } else { 70 64 for (i = 0; i < num_stack; i++) { 71 - ks = ksym_search(e->kern_stack[i]); 65 + ks = ksym_search(e.kern_stack[i]); 72 66 if (ks && (strcmp(ks->name, nonjit_func) == 0)) { 73 67 good_kern_stack = true; 74 68 break; 75 69 } 76 70 } 77 71 } 78 - if (e->user_stack_size > 0 && e->user_stack_buildid_size > 0) 72 + if (e.user_stack_size > 0 && e.user_stack_buildid_size > 0) 79 73 good_user_stack = true; 80 74 } 81 75
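The get_stack_raw_tp.c change above replaces a direct pointer cast with a copy: perf-buffer records are only guaranteed 4-byte alignment, so casting the sample pointer to a struct with `__u64` members can fault or be UB on strict-alignment targets. A standalone sketch of the copy-into-aligned-local pattern (the struct is a stand-in for `struct get_stack_trace_t`):

```c
#include <stdint.h>
#include <string.h>

struct sample {                 /* stand-in for struct get_stack_trace_t */
	uint64_t kern_stack_size;
	uint64_t user_stack_size;
};

/* Never cast a possibly 4-byte-aligned perfbuf pointer to a struct with
 * 8-byte members; memcpy the bytes into an aligned local instead,
 * truncating or zero-padding depending on the record size, as the test
 * now does.
 */
static struct sample read_sample(const void *data, size_t size)
{
	struct sample s;

	memset(&s, 0, sizeof(s));
	memcpy(&s, data, size <= sizeof(s) ? size : sizeof(s));
	return s;
}
```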
+20 -40
tools/testing/selftests/bpf/prog_tests/kfree_skb.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 #include <test_progs.h> 3 3 #include <network_helpers.h> 4 + #include "kfree_skb.skel.h" 4 5 5 6 struct meta { 6 7 int ifindex; ··· 59 58 .ctx_in = &skb, 60 59 .ctx_size_in = sizeof(skb), 61 60 }; 62 - struct bpf_prog_load_attr attr = { 63 - .file = "./kfree_skb.o", 64 - }; 65 - 66 - struct bpf_link *link = NULL, *link_fentry = NULL, *link_fexit = NULL; 67 - struct bpf_map *perf_buf_map, *global_data; 68 - struct bpf_program *prog, *fentry, *fexit; 69 - struct bpf_object *obj, *obj2 = NULL; 61 + struct kfree_skb *skel = NULL; 62 + struct bpf_link *link; 63 + struct bpf_object *obj; 70 64 struct perf_buffer *pb = NULL; 71 - int err, kfree_skb_fd; 65 + int err; 72 66 bool passed = false; 73 67 __u32 duration = 0; 74 68 const int zero = 0; ··· 74 78 if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno)) 75 79 return; 76 80 77 - err = bpf_prog_load_xattr(&attr, &obj2, &kfree_skb_fd); 78 - if (CHECK(err, "prog_load raw tp", "err %d errno %d\n", err, errno)) 81 + skel = kfree_skb__open_and_load(); 82 + if (!ASSERT_OK_PTR(skel, "kfree_skb_skel")) 79 83 goto close_prog; 80 84 81 - prog = bpf_object__find_program_by_title(obj2, "tp_btf/kfree_skb"); 82 - if (CHECK(!prog, "find_prog", "prog kfree_skb not found\n")) 83 - goto close_prog; 84 - fentry = bpf_object__find_program_by_title(obj2, "fentry/eth_type_trans"); 85 - if (CHECK(!fentry, "find_prog", "prog eth_type_trans not found\n")) 86 - goto close_prog; 87 - fexit = bpf_object__find_program_by_title(obj2, "fexit/eth_type_trans"); 88 - if (CHECK(!fexit, "find_prog", "prog eth_type_trans not found\n")) 89 - goto close_prog; 90 - 91 - global_data = bpf_object__find_map_by_name(obj2, ".bss"); 92 - if (CHECK(!global_data, "find global data", "not found\n")) 93 - goto close_prog; 94 - 95 - link = bpf_program__attach_raw_tracepoint(prog, NULL); 85 + link = bpf_program__attach_raw_tracepoint(skel->progs.trace_kfree_skb, NULL); 96 86 if (!ASSERT_OK_PTR(link, 
"attach_raw_tp")) 97 87 goto close_prog; 98 - link_fentry = bpf_program__attach_trace(fentry); 99 - if (!ASSERT_OK_PTR(link_fentry, "attach fentry")) 100 - goto close_prog; 101 - link_fexit = bpf_program__attach_trace(fexit); 102 - if (!ASSERT_OK_PTR(link_fexit, "attach fexit")) 103 - goto close_prog; 88 + skel->links.trace_kfree_skb = link; 104 89 105 - perf_buf_map = bpf_object__find_map_by_name(obj2, "perf_buf_map"); 106 - if (CHECK(!perf_buf_map, "find_perf_buf_map", "not found\n")) 90 + link = bpf_program__attach_trace(skel->progs.fentry_eth_type_trans); 91 + if (!ASSERT_OK_PTR(link, "attach fentry")) 107 92 goto close_prog; 93 + skel->links.fentry_eth_type_trans = link; 94 + 95 + link = bpf_program__attach_trace(skel->progs.fexit_eth_type_trans); 96 + if (!ASSERT_OK_PTR(link, "attach fexit")) 97 + goto close_prog; 98 + skel->links.fexit_eth_type_trans = link; 108 99 109 100 /* set up perf buffer */ 110 - pb = perf_buffer__new(bpf_map__fd(perf_buf_map), 1, 101 + pb = perf_buffer__new(bpf_map__fd(skel->maps.perf_buf_map), 1, 111 102 on_sample, NULL, &passed, NULL); 112 103 if (!ASSERT_OK_PTR(pb, "perf_buf__new")) 113 104 goto close_prog; ··· 116 133 */ 117 134 ASSERT_TRUE(passed, "passed"); 118 135 119 - err = bpf_map_lookup_elem(bpf_map__fd(global_data), &zero, test_ok); 136 + err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.bss), &zero, test_ok); 120 137 if (CHECK(err, "get_result", 121 138 "failed to get output data: %d\n", err)) 122 139 goto close_prog; ··· 124 141 CHECK_FAIL(!test_ok[0] || !test_ok[1]); 125 142 close_prog: 126 143 perf_buffer__free(pb); 127 - bpf_link__destroy(link); 128 - bpf_link__destroy(link_fentry); 129 - bpf_link__destroy(link_fexit); 130 144 bpf_object__close(obj); 131 - bpf_object__close(obj2); 145 + kfree_skb__destroy(skel); 132 146 }
+24
tools/testing/selftests/bpf/prog_tests/kfunc_call.c
··· 4 4 #include <network_helpers.h> 5 5 #include "kfunc_call_test.lskel.h" 6 6 #include "kfunc_call_test_subprog.skel.h" 7 + #include "kfunc_call_test_subprog.lskel.h" 7 8 8 9 static void test_main(void) 9 10 { ··· 50 49 kfunc_call_test_subprog__destroy(skel); 51 50 } 52 51 52 + static void test_subprog_lskel(void) 53 + { 54 + struct kfunc_call_test_subprog_lskel *skel; 55 + int prog_fd, retval, err; 56 + 57 + skel = kfunc_call_test_subprog_lskel__open_and_load(); 58 + if (!ASSERT_OK_PTR(skel, "skel")) 59 + return; 60 + 61 + prog_fd = skel->progs.kfunc_call_test1.prog_fd; 62 + err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4), 63 + NULL, NULL, (__u32 *)&retval, NULL); 64 + ASSERT_OK(err, "bpf_prog_test_run(test1)"); 65 + ASSERT_EQ(retval, 10, "test1-retval"); 66 + ASSERT_NEQ(skel->data->active_res, -1, "active_res"); 67 + ASSERT_EQ(skel->data->sk_state_res, BPF_TCP_CLOSE, "sk_state_res"); 68 + 69 + kfunc_call_test_subprog_lskel__destroy(skel); 70 + } 71 + 53 72 void test_kfunc_call(void) 54 73 { 55 74 if (test__start_subtest("main")) ··· 77 56 78 57 if (test__start_subtest("subprog")) 79 58 test_subprog(); 59 + 60 + if (test__start_subtest("subprog_lskel")) 61 + test_subprog_lskel(); 80 62 }
+65
tools/testing/selftests/bpf/prog_tests/legacy_printk.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2021 Facebook */ 3 + #include <test_progs.h> 4 + #include "test_legacy_printk.skel.h" 5 + 6 + static int execute_one_variant(bool legacy) 7 + { 8 + struct test_legacy_printk *skel; 9 + int err, zero = 0, my_pid = getpid(), res, map_fd; 10 + 11 + skel = test_legacy_printk__open(); 12 + if (!ASSERT_OK_PTR(skel, "skel_open")) 13 + return -errno; 14 + 15 + bpf_program__set_autoload(skel->progs.handle_legacy, legacy); 16 + bpf_program__set_autoload(skel->progs.handle_modern, !legacy); 17 + 18 + err = test_legacy_printk__load(skel); 19 + /* no ASSERT_OK, we expect one of two variants can fail here */ 20 + if (err) 21 + goto err_out; 22 + 23 + if (legacy) { 24 + map_fd = bpf_map__fd(skel->maps.my_pid_map); 25 + err = bpf_map_update_elem(map_fd, &zero, &my_pid, BPF_ANY); 26 + if (!ASSERT_OK(err, "my_pid_map_update")) 27 + goto err_out; 28 + err = bpf_map_lookup_elem(map_fd, &zero, &res); 29 + } else { 30 + skel->bss->my_pid_var = my_pid; 31 + } 32 + 33 + err = test_legacy_printk__attach(skel); 34 + if (!ASSERT_OK(err, "skel_attach")) 35 + goto err_out; 36 + 37 + usleep(1); /* trigger */ 38 + 39 + if (legacy) { 40 + map_fd = bpf_map__fd(skel->maps.res_map); 41 + err = bpf_map_lookup_elem(map_fd, &zero, &res); 42 + if (!ASSERT_OK(err, "res_map_lookup")) 43 + goto err_out; 44 + } else { 45 + res = skel->bss->res_var; 46 + } 47 + 48 + if (!ASSERT_GT(res, 0, "res")) { 49 + err = -EINVAL; 50 + goto err_out; 51 + } 52 + 53 + err_out: 54 + test_legacy_printk__destroy(skel); 55 + return err; 56 + } 57 + 58 + void test_legacy_printk(void) 59 + { 60 + /* legacy variant should work everywhere */ 61 + ASSERT_OK(execute_one_variant(true /* legacy */), "legacy_case"); 62 + 63 + /* execute modern variant, can fail the load on old kernels */ 64 + execute_one_variant(false); 65 + }
+276
tools/testing/selftests/bpf/prog_tests/log_buf.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2021 Facebook */ 3 + 4 + #include <test_progs.h> 5 + #include <bpf/btf.h> 6 + 7 + #include "test_log_buf.skel.h" 8 + 9 + static size_t libbpf_log_pos; 10 + static char libbpf_log_buf[1024 * 1024]; 11 + static bool libbpf_log_error; 12 + 13 + static int libbpf_print_cb(enum libbpf_print_level level, const char *fmt, va_list args) 14 + { 15 + int emitted_cnt; 16 + size_t left_cnt; 17 + 18 + left_cnt = sizeof(libbpf_log_buf) - libbpf_log_pos; 19 + emitted_cnt = vsnprintf(libbpf_log_buf + libbpf_log_pos, left_cnt, fmt, args); 20 + 21 + if (emitted_cnt < 0 || emitted_cnt + 1 > left_cnt) { 22 + libbpf_log_error = true; 23 + return 0; 24 + } 25 + 26 + libbpf_log_pos += emitted_cnt; 27 + return 0; 28 + } 29 + 30 + static void obj_load_log_buf(void) 31 + { 32 + libbpf_print_fn_t old_print_cb = libbpf_set_print(libbpf_print_cb); 33 + LIBBPF_OPTS(bpf_object_open_opts, opts); 34 + const size_t log_buf_sz = 1024 * 1024; 35 + struct test_log_buf* skel; 36 + char *obj_log_buf, *good_log_buf, *bad_log_buf; 37 + int err; 38 + 39 + obj_log_buf = malloc(3 * log_buf_sz); 40 + if (!ASSERT_OK_PTR(obj_log_buf, "obj_log_buf")) 41 + return; 42 + 43 + good_log_buf = obj_log_buf + log_buf_sz; 44 + bad_log_buf = obj_log_buf + 2 * log_buf_sz; 45 + obj_log_buf[0] = good_log_buf[0] = bad_log_buf[0] = '\0'; 46 + 47 + opts.kernel_log_buf = obj_log_buf; 48 + opts.kernel_log_size = log_buf_sz; 49 + opts.kernel_log_level = 4; /* for BTF this will turn into 1 */ 50 + 51 + /* In the first round every prog has its own log_buf, so libbpf logs 52 + * don't have program failure logs 53 + */ 54 + skel = test_log_buf__open_opts(&opts); 55 + if (!ASSERT_OK_PTR(skel, "skel_open")) 56 + goto cleanup; 57 + 58 + /* set very verbose level for good_prog so we always get detailed logs */ 59 + bpf_program__set_log_buf(skel->progs.good_prog, good_log_buf, log_buf_sz); 60 + bpf_program__set_log_level(skel->progs.good_prog, 2); 61 + 62 + 
bpf_program__set_log_buf(skel->progs.bad_prog, bad_log_buf, log_buf_sz); 63 + /* log_level 0 with custom log_buf means that verbose logs are not 64 + * requested if program load is successful, but libbpf should retry 65 + * with log_level 1 on error and put program's verbose load log into 66 + * custom log_buf 67 + */ 68 + bpf_program__set_log_level(skel->progs.bad_prog, 0); 69 + 70 + err = test_log_buf__load(skel); 71 + if (!ASSERT_ERR(err, "unexpected_load_success")) 72 + goto cleanup; 73 + 74 + ASSERT_FALSE(libbpf_log_error, "libbpf_log_error"); 75 + 76 + /* there should be no prog loading log because we specified per-prog log buf */ 77 + ASSERT_NULL(strstr(libbpf_log_buf, "-- BEGIN PROG LOAD LOG --"), "unexp_libbpf_log"); 78 + ASSERT_OK_PTR(strstr(libbpf_log_buf, "prog 'bad_prog': BPF program load failed"), 79 + "libbpf_log_not_empty"); 80 + ASSERT_OK_PTR(strstr(obj_log_buf, "DATASEC license"), "obj_log_not_empty"); 81 + ASSERT_OK_PTR(strstr(good_log_buf, "0: R1=ctx(id=0,off=0,imm=0) R10=fp0"), 82 + "good_log_verbose"); 83 + ASSERT_OK_PTR(strstr(bad_log_buf, "invalid access to map value, value_size=16 off=16000 size=4"), 84 + "bad_log_not_empty"); 85 + 86 + if (env.verbosity > VERBOSE_NONE) { 87 + printf("LIBBPF LOG: \n=================\n%s=================\n", libbpf_log_buf); 88 + printf("OBJ LOG: \n=================\n%s=================\n", obj_log_buf); 89 + printf("GOOD_PROG LOG:\n=================\n%s=================\n", good_log_buf); 90 + printf("BAD_PROG LOG:\n=================\n%s=================\n", bad_log_buf); 91 + } 92 + 93 + /* reset everything */ 94 + test_log_buf__destroy(skel); 95 + obj_log_buf[0] = good_log_buf[0] = bad_log_buf[0] = '\0'; 96 + libbpf_log_buf[0] = '\0'; 97 + libbpf_log_pos = 0; 98 + libbpf_log_error = false; 99 + 100 + /* In the second round we let bad_prog's failure be logged through print callback */ 101 + opts.kernel_log_buf = NULL; /* let everything through into print callback */ 102 + opts.kernel_log_size = 0; 103 + 
opts.kernel_log_level = 1; 104 + 105 + skel = test_log_buf__open_opts(&opts); 106 + if (!ASSERT_OK_PTR(skel, "skel_open")) 107 + goto cleanup; 108 + 109 + /* set normal verbose level for good_prog to check log_level is taken into account */ 110 + bpf_program__set_log_buf(skel->progs.good_prog, good_log_buf, log_buf_sz); 111 + bpf_program__set_log_level(skel->progs.good_prog, 1); 112 + 113 + err = test_log_buf__load(skel); 114 + if (!ASSERT_ERR(err, "unexpected_load_success")) 115 + goto cleanup; 116 + 117 + ASSERT_FALSE(libbpf_log_error, "libbpf_log_error"); 118 + 119 + /* this time prog loading error should be logged through print callback */ 120 + ASSERT_OK_PTR(strstr(libbpf_log_buf, "libbpf: prog 'bad_prog': -- BEGIN PROG LOAD LOG --"), 121 + "libbpf_log_correct"); 122 + ASSERT_STREQ(obj_log_buf, "", "obj_log__empty"); 123 + ASSERT_STREQ(good_log_buf, "processed 4 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0\n", 124 + "good_log_ok"); 125 + ASSERT_STREQ(bad_log_buf, "", "bad_log_empty"); 126 + 127 + if (env.verbosity > VERBOSE_NONE) { 128 + printf("LIBBPF LOG: \n=================\n%s=================\n", libbpf_log_buf); 129 + printf("OBJ LOG: \n=================\n%s=================\n", obj_log_buf); 130 + printf("GOOD_PROG LOG:\n=================\n%s=================\n", good_log_buf); 131 + printf("BAD_PROG LOG:\n=================\n%s=================\n", bad_log_buf); 132 + } 133 + 134 + cleanup: 135 + free(obj_log_buf); 136 + test_log_buf__destroy(skel); 137 + libbpf_set_print(old_print_cb); 138 + } 139 + 140 + static void bpf_prog_load_log_buf(void) 141 + { 142 + const struct bpf_insn good_prog_insns[] = { 143 + BPF_MOV64_IMM(BPF_REG_0, 0), 144 + BPF_EXIT_INSN(), 145 + }; 146 + const size_t good_prog_insn_cnt = sizeof(good_prog_insns) / sizeof(struct bpf_insn); 147 + const struct bpf_insn bad_prog_insns[] = { 148 + BPF_EXIT_INSN(), 149 + }; 150 + size_t bad_prog_insn_cnt = sizeof(bad_prog_insns) / sizeof(struct 
bpf_insn); 151 + LIBBPF_OPTS(bpf_prog_load_opts, opts); 152 + const size_t log_buf_sz = 1024 * 1024; 153 + char *log_buf; 154 + int fd = -1; 155 + 156 + log_buf = malloc(log_buf_sz); 157 + if (!ASSERT_OK_PTR(log_buf, "log_buf_alloc")) 158 + return; 159 + opts.log_buf = log_buf; 160 + opts.log_size = log_buf_sz; 161 + 162 + /* with log_level == 0 log_buf shoud stay empty for good prog */ 163 + log_buf[0] = '\0'; 164 + opts.log_level = 0; 165 + fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, "good_prog", "GPL", 166 + good_prog_insns, good_prog_insn_cnt, &opts); 167 + ASSERT_STREQ(log_buf, "", "good_log_0"); 168 + ASSERT_GE(fd, 0, "good_fd1"); 169 + if (fd >= 0) 170 + close(fd); 171 + fd = -1; 172 + 173 + /* log_level == 2 should always fill log_buf, even for good prog */ 174 + log_buf[0] = '\0'; 175 + opts.log_level = 2; 176 + fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, "good_prog", "GPL", 177 + good_prog_insns, good_prog_insn_cnt, &opts); 178 + ASSERT_OK_PTR(strstr(log_buf, "0: R1=ctx(id=0,off=0,imm=0) R10=fp0"), "good_log_2"); 179 + ASSERT_GE(fd, 0, "good_fd2"); 180 + if (fd >= 0) 181 + close(fd); 182 + fd = -1; 183 + 184 + /* log_level == 0 should fill log_buf for bad prog */ 185 + log_buf[0] = '\0'; 186 + opts.log_level = 0; 187 + fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, "bad_prog", "GPL", 188 + bad_prog_insns, bad_prog_insn_cnt, &opts); 189 + ASSERT_OK_PTR(strstr(log_buf, "R0 !read_ok"), "bad_log_0"); 190 + ASSERT_LT(fd, 0, "bad_fd"); 191 + if (fd >= 0) 192 + close(fd); 193 + fd = -1; 194 + 195 + free(log_buf); 196 + } 197 + 198 + static void bpf_btf_load_log_buf(void) 199 + { 200 + LIBBPF_OPTS(bpf_btf_load_opts, opts); 201 + const size_t log_buf_sz = 1024 * 1024; 202 + const void *raw_btf_data; 203 + __u32 raw_btf_size; 204 + struct btf *btf; 205 + char *log_buf; 206 + int fd = -1; 207 + 208 + btf = btf__new_empty(); 209 + if (!ASSERT_OK_PTR(btf, "empty_btf")) 210 + return; 211 + 212 + ASSERT_GT(btf__add_int(btf, "int", 4, 0), 0, "int_type"); 213 + 
214 + raw_btf_data = btf__raw_data(btf, &raw_btf_size); 215 + if (!ASSERT_OK_PTR(raw_btf_data, "raw_btf_data_good")) 216 + goto cleanup; 217 + 218 + log_buf = malloc(log_buf_sz); 219 + if (!ASSERT_OK_PTR(log_buf, "log_buf_alloc")) 220 + goto cleanup; 221 + opts.log_buf = log_buf; 222 + opts.log_size = log_buf_sz; 223 + 224 + /* with log_level == 0 log_buf shoud stay empty for good BTF */ 225 + log_buf[0] = '\0'; 226 + opts.log_level = 0; 227 + fd = bpf_btf_load(raw_btf_data, raw_btf_size, &opts); 228 + ASSERT_STREQ(log_buf, "", "good_log_0"); 229 + ASSERT_GE(fd, 0, "good_fd1"); 230 + if (fd >= 0) 231 + close(fd); 232 + fd = -1; 233 + 234 + /* log_level == 2 should always fill log_buf, even for good BTF */ 235 + log_buf[0] = '\0'; 236 + opts.log_level = 2; 237 + fd = bpf_btf_load(raw_btf_data, raw_btf_size, &opts); 238 + printf("LOG_BUF: %s\n", log_buf); 239 + ASSERT_OK_PTR(strstr(log_buf, "magic: 0xeb9f"), "good_log_2"); 240 + ASSERT_GE(fd, 0, "good_fd2"); 241 + if (fd >= 0) 242 + close(fd); 243 + fd = -1; 244 + 245 + /* make BTF bad, add pointer pointing to non-existing type */ 246 + ASSERT_GT(btf__add_ptr(btf, 100), 0, "bad_ptr_type"); 247 + 248 + raw_btf_data = btf__raw_data(btf, &raw_btf_size); 249 + if (!ASSERT_OK_PTR(raw_btf_data, "raw_btf_data_bad")) 250 + goto cleanup; 251 + 252 + /* log_level == 0 should fill log_buf for bad BTF */ 253 + log_buf[0] = '\0'; 254 + opts.log_level = 0; 255 + fd = bpf_btf_load(raw_btf_data, raw_btf_size, &opts); 256 + printf("LOG_BUF: %s\n", log_buf); 257 + ASSERT_OK_PTR(strstr(log_buf, "[2] PTR (anon) type_id=100 Invalid type_id"), "bad_log_0"); 258 + ASSERT_LT(fd, 0, "bad_fd"); 259 + if (fd >= 0) 260 + close(fd); 261 + fd = -1; 262 + 263 + cleanup: 264 + free(log_buf); 265 + btf__free(btf); 266 + } 267 + 268 + void test_log_buf(void) 269 + { 270 + if (test__start_subtest("obj_load_log_buf")) 271 + obj_load_log_buf(); 272 + if (test__start_subtest("bpf_prog_load_log_buf")) 273 + bpf_prog_load_log_buf(); 274 + if 
(test__start_subtest("bpf_btf_load_log_buf")) 275 + bpf_btf_load_log_buf(); 276 + }
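The `libbpf_print_cb` callback at the top of log_buf.c uses a common accumulation pattern: append each formatted message into one large buffer, track the write position, and set an error flag instead of overflowing when space runs out. A self-contained sketch of that pattern (buffer shrunk to keep the example small):

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Append-only log accumulator, mirroring the test's print callback.
 * vsnprintf returns the length the message *would* have needed, so
 * "n + 1 > left" (the +1 covers the NUL terminator) detects truncation.
 */
static char log_buf[256];
static size_t log_pos;
static bool log_error;

static int log_append(const char *fmt, ...)
{
	size_t left = sizeof(log_buf) - log_pos;
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(log_buf + log_pos, left, fmt, args);
	va_end(args);

	if (n < 0 || (size_t)n + 1 > left) {
		log_error = true; /* message truncated or encoding error */
		return 0;
	}
	log_pos += n;
	return 0;
}
```

The test then greps the accumulated buffer with `strstr()` to check which messages libbpf emitted through the callback versus through the kernel-provided log buffers.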
+7 -9
tools/testing/selftests/bpf/prog_tests/map_ptr.c
··· 4 4 #include <test_progs.h> 5 5 #include <network_helpers.h> 6 6 7 - #include "map_ptr_kern.skel.h" 7 + #include "map_ptr_kern.lskel.h" 8 8 9 9 void test_map_ptr(void) 10 10 { 11 - struct map_ptr_kern *skel; 11 + struct map_ptr_kern_lskel *skel; 12 12 __u32 duration = 0, retval; 13 13 char buf[128]; 14 14 int err; 15 15 int page_size = getpagesize(); 16 16 17 - skel = map_ptr_kern__open(); 17 + skel = map_ptr_kern_lskel__open(); 18 18 if (!ASSERT_OK_PTR(skel, "skel_open")) 19 19 return; 20 20 21 - err = bpf_map__set_max_entries(skel->maps.m_ringbuf, page_size); 22 - if (!ASSERT_OK(err, "bpf_map__set_max_entries")) 23 - goto cleanup; 21 + skel->maps.m_ringbuf.max_entries = page_size; 24 22 25 - err = map_ptr_kern__load(skel); 23 + err = map_ptr_kern_lskel__load(skel); 26 24 if (!ASSERT_OK(err, "skel_load")) 27 25 goto cleanup; 28 26 29 27 skel->bss->page_size = page_size; 30 28 31 - err = bpf_prog_test_run(bpf_program__fd(skel->progs.cg_skb), 1, &pkt_v4, 29 + err = bpf_prog_test_run(skel->progs.cg_skb.prog_fd, 1, &pkt_v4, 32 30 sizeof(pkt_v4), buf, NULL, &retval, NULL); 33 31 34 32 if (CHECK(err, "test_run", "err=%d errno=%d\n", err, errno)) ··· 37 39 goto cleanup; 38 40 39 41 cleanup: 40 - map_ptr_kern__destroy(skel); 42 + map_ptr_kern_lskel__destroy(skel); 41 43 }
+2 -2
tools/testing/selftests/bpf/prog_tests/pinning.c
··· 241 241 goto out; 242 242 } 243 243 244 - map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(__u32), 245 - sizeof(__u64), 1, 0); 244 + map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(__u32), 245 + sizeof(__u64), 1, NULL); 246 246 if (CHECK(map_fd < 0, "create pinmap manually", "fd %d\n", map_fd)) 247 247 goto out; 248 248
+32
tools/testing/selftests/bpf/prog_tests/prog_array_init.c
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* Copyright (c) 2021 Hengqi Chen */ 3 + 4 + #include <test_progs.h> 5 + #include "test_prog_array_init.skel.h" 6 + 7 + void test_prog_array_init(void) 8 + { 9 + struct test_prog_array_init *skel; 10 + int err; 11 + 12 + skel = test_prog_array_init__open(); 13 + if (!ASSERT_OK_PTR(skel, "could not open BPF object")) 14 + return; 15 + 16 + skel->rodata->my_pid = getpid(); 17 + 18 + err = test_prog_array_init__load(skel); 19 + if (!ASSERT_OK(err, "could not load BPF object")) 20 + goto cleanup; 21 + 22 + skel->links.entry = bpf_program__attach_raw_tracepoint(skel->progs.entry, "sys_enter"); 23 + if (!ASSERT_OK_PTR(skel->links.entry, "could not attach BPF program")) 24 + goto cleanup; 25 + 26 + usleep(1); 27 + 28 + ASSERT_EQ(skel->bss->value, 42, "unexpected value"); 29 + 30 + cleanup: 31 + test_prog_array_init__destroy(skel); 32 + }
+7 -5
tools/testing/selftests/bpf/prog_tests/queue_stack_map.c
··· 14 14 int i, err, prog_fd, map_in_fd, map_out_fd; 15 15 char file[32], buf[128]; 16 16 struct bpf_object *obj; 17 - struct iphdr *iph = (void *)buf + sizeof(struct ethhdr); 17 + struct iphdr iph; 18 18 19 19 /* Fill test values to be used */ 20 20 for (i = 0; i < MAP_SIZE; i++) ··· 60 60 61 61 err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4), 62 62 buf, &size, &retval, &duration); 63 - if (err || retval || size != sizeof(pkt_v4) || 64 - iph->daddr != val) 63 + if (err || retval || size != sizeof(pkt_v4)) 64 + break; 65 + memcpy(&iph, buf + sizeof(struct ethhdr), sizeof(iph)); 66 + if (iph.daddr != val) 65 67 break; 66 68 } 67 69 68 - CHECK(err || retval || size != sizeof(pkt_v4) || iph->daddr != val, 70 + CHECK(err || retval || size != sizeof(pkt_v4) || iph.daddr != val, 69 71 "bpf_map_pop_elem", 70 72 "err %d errno %d retval %d size %d iph->daddr %u\n", 71 - err, errno, retval, size, iph->daddr); 73 + err, errno, retval, size, iph.daddr); 72 74 73 75 /* Queue is empty, program should return TC_ACT_SHOT */ 74 76 err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+2 -2
tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c
··· 62 62 if (CHECK(err != 0, "bpf_map__set_max_entries", "bpf_map__set_max_entries failed\n")) 63 63 goto cleanup; 64 64 65 - proto_fd = bpf_create_map(BPF_MAP_TYPE_RINGBUF, 0, 0, page_size, 0); 66 - if (CHECK(proto_fd < 0, "bpf_create_map", "bpf_create_map failed\n")) 65 + proto_fd = bpf_map_create(BPF_MAP_TYPE_RINGBUF, NULL, 0, 0, page_size, NULL); 66 + if (CHECK(proto_fd < 0, "bpf_map_create", "bpf_map_create failed\n")) 67 67 goto cleanup; 68 68 69 69 err = bpf_map__set_inner_map_fd(skel->maps.ringbuf_hash, proto_fd);
+6 -15
tools/testing/selftests/bpf/prog_tests/select_reuseport.c
··· 66 66 67 67 static int create_maps(enum bpf_map_type inner_type) 68 68 { 69 - struct bpf_create_map_attr attr = {}; 69 + LIBBPF_OPTS(bpf_map_create_opts, opts); 70 70 71 71 inner_map_type = inner_type; 72 72 73 73 /* Creating reuseport_array */ 74 - attr.name = "reuseport_array"; 75 - attr.map_type = inner_type; 76 - attr.key_size = sizeof(__u32); 77 - attr.value_size = sizeof(__u32); 78 - attr.max_entries = REUSEPORT_ARRAY_SIZE; 79 - 80 - reuseport_array = bpf_create_map_xattr(&attr); 74 + reuseport_array = bpf_map_create(inner_type, "reuseport_array", 75 + sizeof(__u32), sizeof(__u32), REUSEPORT_ARRAY_SIZE, NULL); 81 76 RET_ERR(reuseport_array < 0, "creating reuseport_array", 82 77 "reuseport_array:%d errno:%d\n", reuseport_array, errno); 83 78 84 79 /* Creating outer_map */ 85 - attr.name = "outer_map"; 86 - attr.map_type = BPF_MAP_TYPE_ARRAY_OF_MAPS; 87 - attr.key_size = sizeof(__u32); 88 - attr.value_size = sizeof(__u32); 89 - attr.max_entries = 1; 90 - attr.inner_map_fd = reuseport_array; 91 - outer_map = bpf_create_map_xattr(&attr); 80 + opts.inner_map_fd = reuseport_array; 81 + outer_map = bpf_map_create(BPF_MAP_TYPE_ARRAY_OF_MAPS, "outer_map", 82 + sizeof(__u32), sizeof(__u32), 1, &opts); 92 83 RET_ERR(outer_map < 0, "creating outer_map", 93 84 "outer_map:%d errno:%d\n", outer_map, errno); 94 85
+2 -2
tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
··· 91 91 if (CHECK_FAIL(s < 0)) 92 92 return; 93 93 94 - map = bpf_create_map(map_type, sizeof(int), sizeof(int), 1, 0); 94 + map = bpf_map_create(map_type, NULL, sizeof(int), sizeof(int), 1, NULL); 95 95 if (CHECK_FAIL(map < 0)) { 96 - perror("bpf_create_map"); 96 + perror("bpf_cmap_create"); 97 97 goto out; 98 98 } 99 99
+1 -1
tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
··· 97 97 char test_name[MAX_TEST_NAME]; 98 98 int map; 99 99 100 - map = bpf_create_map(map_type, sizeof(int), sizeof(int), 1, 0); 100 + map = bpf_map_create(map_type, NULL, sizeof(int), sizeof(int), 1, NULL); 101 101 if (CHECK_FAIL(map < 0)) { 102 102 perror("bpf_map_create"); 103 103 return;
+2 -2
tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
··· 502 502 if (s < 0) 503 503 return; 504 504 505 - mapfd = bpf_create_map(BPF_MAP_TYPE_SOCKMAP, sizeof(key), 506 - sizeof(value32), 1, 0); 505 + mapfd = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL, sizeof(key), 506 + sizeof(value32), 1, NULL); 507 507 if (mapfd < 0) { 508 508 FAIL_ERRNO("map_create"); 509 509 goto close;
+6 -6
tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c
··· 167 167 168 168 static void run_test(int cgroup_fd) 169 169 { 170 - struct bpf_prog_load_attr attr = { 171 - .file = "./sockopt_inherit.o", 172 - }; 173 170 int server_fd = -1, client_fd; 174 171 struct bpf_object *obj; 175 172 void *server_err; 176 173 pthread_t tid; 177 - int ignored; 178 174 int err; 179 175 180 - err = bpf_prog_load_xattr(&attr, &obj, &ignored); 181 - if (CHECK_FAIL(err)) 176 + obj = bpf_object__open_file("sockopt_inherit.o", NULL); 177 + if (!ASSERT_OK_PTR(obj, "obj_open")) 182 178 return; 179 + 180 + err = bpf_object__load(obj); 181 + if (!ASSERT_OK(err, "obj_load")) 182 + goto close_bpf_object; 183 183 184 184 err = prog_attach(obj, cgroup_fd, "cgroup/getsockopt"); 185 185 if (CHECK_FAIL(err))
+6 -6
tools/testing/selftests/bpf/prog_tests/sockopt_multi.c
··· 297 297 298 298 void test_sockopt_multi(void) 299 299 { 300 - struct bpf_prog_load_attr attr = { 301 - .file = "./sockopt_multi.o", 302 - }; 303 300 int cg_parent = -1, cg_child = -1; 304 301 struct bpf_object *obj = NULL; 305 302 int sock_fd = -1; 306 303 int err = -1; 307 - int ignored; 308 304 309 305 cg_parent = test__join_cgroup("/parent"); 310 306 if (CHECK_FAIL(cg_parent < 0)) ··· 310 314 if (CHECK_FAIL(cg_child < 0)) 311 315 goto out; 312 316 313 - err = bpf_prog_load_xattr(&attr, &obj, &ignored); 314 - if (CHECK_FAIL(err)) 317 + obj = bpf_object__open_file("sockopt_multi.o", NULL); 318 + if (!ASSERT_OK_PTR(obj, "obj_open")) 319 + goto out; 320 + 321 + err = bpf_object__load(obj); 322 + if (!ASSERT_OK(err, "obj_load")) 315 323 goto out; 316 324 317 325 sock_fd = socket(AF_INET, SOCK_STREAM, 0);
+7 -14
tools/testing/selftests/bpf/prog_tests/tcp_rtt.c
··· 2 2 #include <test_progs.h> 3 3 #include "cgroup_helpers.h" 4 4 #include "network_helpers.h" 5 + #include "tcp_rtt.skel.h" 5 6 6 7 struct tcp_rtt_storage { 7 8 __u32 invoked; ··· 92 91 93 92 static int run_test(int cgroup_fd, int server_fd) 94 93 { 95 - struct bpf_prog_load_attr attr = { 96 - .prog_type = BPF_PROG_TYPE_SOCK_OPS, 97 - .file = "./tcp_rtt.o", 98 - .expected_attach_type = BPF_CGROUP_SOCK_OPS, 99 - }; 100 - struct bpf_object *obj; 101 - struct bpf_map *map; 94 + struct tcp_rtt *skel; 102 95 int client_fd; 103 96 int prog_fd; 104 97 int map_fd; 105 98 int err; 106 99 107 - err = bpf_prog_load_xattr(&attr, &obj, &prog_fd); 108 - if (err) { 109 - log_err("Failed to load BPF object"); 100 + skel = tcp_rtt__open_and_load(); 101 + if (!ASSERT_OK_PTR(skel, "skel_open_load")) 110 102 return -1; 111 - } 112 103 113 - map = bpf_object__next_map(obj, NULL); 114 - map_fd = bpf_map__fd(map); 104 + map_fd = bpf_map__fd(skel->maps.socket_storage_map); 105 + prog_fd = bpf_program__fd(skel->progs._sockops); 115 106 116 107 err = bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_SOCK_OPS, 0); 117 108 if (err) { ··· 142 149 close(client_fd); 143 150 144 151 close_bpf_object: 145 - bpf_object__close(obj); 152 + tcp_rtt__destroy(skel); 146 153 return err; 147 154 } 148 155
+4 -2
tools/testing/selftests/bpf/prog_tests/test_bpffs.c
··· 19 19 fd = open(file, 0); 20 20 if (fd < 0) 21 21 return -1; 22 - while ((len = read(fd, buf, sizeof(buf))) > 0) 22 + while ((len = read(fd, buf, sizeof(buf))) > 0) { 23 + buf[sizeof(buf) - 1] = '\0'; 23 24 if (strstr(buf, "iter")) { 24 25 close(fd); 25 26 return 0; 26 27 } 28 + } 27 29 close(fd); 28 30 return -1; 29 31 } ··· 82 80 if (!ASSERT_OK(err, "creating " TDIR "/fs1/b")) 83 81 goto out; 84 82 85 - map = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 4, 1, 0); 83 + map = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, 4, 4, 1, NULL); 86 84 if (!ASSERT_GT(map, 0, "create_map(ARRAY)")) 87 85 goto out; 88 86 err = bpf_obj_pin(map, TDIR "/fs1/c");
+20 -8
tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
··· 30 30 31 31 static int check_load(const char *file) 32 32 { 33 - struct bpf_prog_load_attr attr; 34 33 struct bpf_object *obj = NULL; 35 - int err, prog_fd; 34 + struct bpf_program *prog; 35 + int err; 36 36 37 - memset(&attr, 0, sizeof(struct bpf_prog_load_attr)); 38 - attr.file = file; 39 - attr.prog_type = BPF_PROG_TYPE_UNSPEC; 40 - attr.log_level = extra_prog_load_log_flags; 41 - attr.prog_flags = BPF_F_TEST_RND_HI32; 42 37 found = false; 43 - err = bpf_prog_load_xattr(&attr, &obj, &prog_fd); 38 + 39 + obj = bpf_object__open_file(file, NULL); 40 + err = libbpf_get_error(obj); 41 + if (err) 42 + return err; 43 + 44 + prog = bpf_object__next_program(obj, NULL); 45 + if (!prog) { 46 + err = -ENOENT; 47 + goto err_out; 48 + } 49 + 50 + bpf_program__set_flags(prog, BPF_F_TEST_RND_HI32); 51 + bpf_program__set_log_level(prog, extra_prog_load_log_flags); 52 + 53 + err = bpf_object__load(obj); 54 + 55 + err_out: 44 56 bpf_object__close(obj); 45 57 return err; 46 58 }
+6 -5
tools/testing/selftests/bpf/prog_tests/xdp.c
··· 11 11 const char *file = "./test_xdp.o"; 12 12 struct bpf_object *obj; 13 13 char buf[128]; 14 - struct ipv6hdr *iph6 = (void *)buf + sizeof(struct ethhdr); 15 - struct iphdr *iph = (void *)buf + sizeof(struct ethhdr); 14 + struct ipv6hdr iph6; 15 + struct iphdr iph; 16 16 __u32 duration, retval, size; 17 17 int err, prog_fd, map_fd; 18 18 ··· 28 28 29 29 err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4), 30 30 buf, &size, &retval, &duration); 31 - 31 + memcpy(&iph, buf + sizeof(struct ethhdr), sizeof(iph)); 32 32 CHECK(err || retval != XDP_TX || size != 74 || 33 - iph->protocol != IPPROTO_IPIP, "ipv4", 33 + iph.protocol != IPPROTO_IPIP, "ipv4", 34 34 "err %d errno %d retval %d size %d\n", 35 35 err, errno, retval, size); 36 36 37 37 err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6), 38 38 buf, &size, &retval, &duration); 39 + memcpy(&iph6, buf + sizeof(struct ethhdr), sizeof(iph6)); 39 40 CHECK(err || retval != XDP_TX || size != 114 || 40 - iph6->nexthdr != IPPROTO_IPV6, "ipv6", 41 + iph6.nexthdr != IPPROTO_IPV6, "ipv6", 41 42 "err %d errno %d retval %d size %d\n", 42 43 err, errno, retval, size); 43 44 out:
+20 -16
tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
··· 218 218 .h_dest = BOND2_MAC, 219 219 .h_proto = htons(ETH_P_IP), 220 220 }; 221 - uint8_t buf[128] = {}; 222 - struct iphdr *iph = (struct iphdr *)(buf + sizeof(eh)); 223 - struct udphdr *uh = (struct udphdr *)(buf + sizeof(eh) + sizeof(*iph)); 221 + struct iphdr iph = {}; 222 + struct udphdr uh = {}; 223 + uint8_t buf[128]; 224 224 int i, s = -1; 225 225 int ifindex; 226 226 ··· 232 232 if (!ASSERT_GT(ifindex, 0, "get bond1 ifindex")) 233 233 goto err; 234 234 235 - memcpy(buf, &eh, sizeof(eh)); 236 - iph->ihl = 5; 237 - iph->version = 4; 238 - iph->tos = 16; 239 - iph->id = 1; 240 - iph->ttl = 64; 241 - iph->protocol = IPPROTO_UDP; 242 - iph->saddr = 1; 243 - iph->daddr = 2; 244 - iph->tot_len = htons(sizeof(buf) - ETH_HLEN); 245 - iph->check = 0; 235 + iph.ihl = 5; 236 + iph.version = 4; 237 + iph.tos = 16; 238 + iph.id = 1; 239 + iph.ttl = 64; 240 + iph.protocol = IPPROTO_UDP; 241 + iph.saddr = 1; 242 + iph.daddr = 2; 243 + iph.tot_len = htons(sizeof(buf) - ETH_HLEN); 244 + iph.check = 0; 246 245 247 246 for (i = 1; i <= NPACKETS; i++) { 248 247 int n; ··· 252 253 }; 253 254 254 255 /* vary the UDP destination port for even distribution with roundrobin/xor modes */ 255 - uh->dest++; 256 + uh.dest++; 256 257 257 258 if (vary_dst_ip) 258 - iph->daddr++; 259 + iph.daddr++; 260 + 261 + /* construct a packet */ 262 + memcpy(buf, &eh, sizeof(eh)); 263 + memcpy(buf + sizeof(eh), &iph, sizeof(iph)); 264 + memcpy(buf + sizeof(eh) + sizeof(iph), &uh, sizeof(uh)); 259 265 260 266 n = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&saddr_ll, sizeof(saddr_ll)); 261 267 if (!ASSERT_EQ(n, sizeof(buf), "sendto"))
+3 -3
tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
··· 42 42 char buf[128]; 43 43 int err, pkt_fd, map_fd; 44 44 bool passed = false; 45 - struct iphdr *iph = (void *)buf + sizeof(struct ethhdr); 45 + struct iphdr iph; 46 46 struct iptnl_info value4 = {.family = AF_INET}; 47 47 struct test_xdp *pkt_skel = NULL; 48 48 struct test_xdp_bpf2bpf *ftrace_skel = NULL; ··· 93 93 /* Run test program */ 94 94 err = bpf_prog_test_run(pkt_fd, 1, &pkt_v4, sizeof(pkt_v4), 95 95 buf, &size, &retval, &duration); 96 - 96 + memcpy(&iph, buf + sizeof(struct ethhdr), sizeof(iph)); 97 97 if (CHECK(err || retval != XDP_TX || size != 74 || 98 - iph->protocol != IPPROTO_IPIP, "ipv4", 98 + iph.protocol != IPPROTO_IPIP, "ipv4", 99 99 "err %d errno %d retval %d size %d\n", 100 100 err, errno, retval, size)) 101 101 goto out;
+112
tools/testing/selftests/bpf/progs/bpf_loop.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2021 Facebook */ 3 + 4 + #include "vmlinux.h" 5 + #include <bpf/bpf_helpers.h> 6 + 7 + char _license[] SEC("license") = "GPL"; 8 + 9 + struct callback_ctx { 10 + int output; 11 + }; 12 + 13 + /* These should be set by the user program */ 14 + u32 nested_callback_nr_loops; 15 + u32 stop_index = -1; 16 + u32 nr_loops; 17 + int pid; 18 + 19 + /* Making these global variables so that the userspace program 20 + * can verify the output through the skeleton 21 + */ 22 + int nr_loops_returned; 23 + int g_output; 24 + int err; 25 + 26 + static int callback(__u32 index, void *data) 27 + { 28 + struct callback_ctx *ctx = data; 29 + 30 + if (index >= stop_index) 31 + return 1; 32 + 33 + ctx->output += index; 34 + 35 + return 0; 36 + } 37 + 38 + static int empty_callback(__u32 index, void *data) 39 + { 40 + return 0; 41 + } 42 + 43 + static int nested_callback2(__u32 index, void *data) 44 + { 45 + nr_loops_returned += bpf_loop(nested_callback_nr_loops, callback, data, 0); 46 + 47 + return 0; 48 + } 49 + 50 + static int nested_callback1(__u32 index, void *data) 51 + { 52 + bpf_loop(nested_callback_nr_loops, nested_callback2, data, 0); 53 + return 0; 54 + } 55 + 56 + SEC("fentry/__x64_sys_nanosleep") 57 + int test_prog(void *ctx) 58 + { 59 + struct callback_ctx data = {}; 60 + 61 + if (bpf_get_current_pid_tgid() >> 32 != pid) 62 + return 0; 63 + 64 + nr_loops_returned = bpf_loop(nr_loops, callback, &data, 0); 65 + 66 + if (nr_loops_returned < 0) 67 + err = nr_loops_returned; 68 + else 69 + g_output = data.output; 70 + 71 + return 0; 72 + } 73 + 74 + SEC("fentry/__x64_sys_nanosleep") 75 + int prog_null_ctx(void *ctx) 76 + { 77 + if (bpf_get_current_pid_tgid() >> 32 != pid) 78 + return 0; 79 + 80 + nr_loops_returned = bpf_loop(nr_loops, empty_callback, NULL, 0); 81 + 82 + return 0; 83 + } 84 + 85 + SEC("fentry/__x64_sys_nanosleep") 86 + int prog_invalid_flags(void *ctx) 87 + { 88 + struct callback_ctx data = {}; 89 + 
90 + if (bpf_get_current_pid_tgid() >> 32 != pid) 91 + return 0; 92 + 93 + err = bpf_loop(nr_loops, callback, &data, 1); 94 + 95 + return 0; 96 + } 97 + 98 + SEC("fentry/__x64_sys_nanosleep") 99 + int prog_nested_calls(void *ctx) 100 + { 101 + struct callback_ctx data = {}; 102 + 103 + if (bpf_get_current_pid_tgid() >> 32 != pid) 104 + return 0; 105 + 106 + nr_loops_returned = 0; 107 + bpf_loop(nr_loops, nested_callback1, &data, 0); 108 + 109 + g_output = data.output; 110 + 111 + return 0; 112 + }
+26
tools/testing/selftests/bpf/progs/bpf_loop_bench.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2021 Facebook */ 3 + 4 + #include "vmlinux.h" 5 + #include <bpf/bpf_helpers.h> 6 + 7 + char _license[] SEC("license") = "GPL"; 8 + 9 + u32 nr_loops; 10 + long hits; 11 + 12 + static int empty_callback(__u32 index, void *data) 13 + { 14 + return 0; 15 + } 16 + 17 + SEC("fentry/__x64_sys_getpgid") 18 + int benchmark(void *ctx) 19 + { 20 + for (int i = 0; i < 1000; i++) { 21 + bpf_loop(nr_loops, empty_callback, NULL, 0); 22 + 23 + __sync_add_and_fetch(&hits, nr_loops); 24 + } 25 + return 0; 26 + }
+104
tools/testing/selftests/bpf/progs/core_kern.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2021 Facebook */ 3 + #include "vmlinux.h" 4 + 5 + #include <bpf/bpf_helpers.h> 6 + #include <bpf/bpf_tracing.h> 7 + #include <bpf/bpf_core_read.h> 8 + 9 + #define ATTR __always_inline 10 + #include "test_jhash.h" 11 + 12 + struct { 13 + __uint(type, BPF_MAP_TYPE_ARRAY); 14 + __type(key, u32); 15 + __type(value, u32); 16 + __uint(max_entries, 256); 17 + } array1 SEC(".maps"); 18 + 19 + struct { 20 + __uint(type, BPF_MAP_TYPE_ARRAY); 21 + __type(key, u32); 22 + __type(value, u32); 23 + __uint(max_entries, 256); 24 + } array2 SEC(".maps"); 25 + 26 + static __noinline int randmap(int v, const struct net_device *dev) 27 + { 28 + struct bpf_map *map = (struct bpf_map *)&array1; 29 + int key = bpf_get_prandom_u32() & 0xff; 30 + int *val; 31 + 32 + if (bpf_get_prandom_u32() & 1) 33 + map = (struct bpf_map *)&array2; 34 + 35 + val = bpf_map_lookup_elem(map, &key); 36 + if (val) 37 + *val = bpf_get_prandom_u32() + v + dev->mtu; 38 + 39 + return 0; 40 + } 41 + 42 + SEC("tp_btf/xdp_devmap_xmit") 43 + int BPF_PROG(tp_xdp_devmap_xmit_multi, const struct net_device 44 + *from_dev, const struct net_device *to_dev, int sent, int drops, 45 + int err) 46 + { 47 + return randmap(from_dev->ifindex, from_dev); 48 + } 49 + 50 + SEC("fentry/eth_type_trans") 51 + int BPF_PROG(fentry_eth_type_trans, struct sk_buff *skb, 52 + struct net_device *dev, unsigned short protocol) 53 + { 54 + return randmap(dev->ifindex + skb->len, dev); 55 + } 56 + 57 + SEC("fexit/eth_type_trans") 58 + int BPF_PROG(fexit_eth_type_trans, struct sk_buff *skb, 59 + struct net_device *dev, unsigned short protocol) 60 + { 61 + return randmap(dev->ifindex + skb->len, dev); 62 + } 63 + 64 + volatile const int never; 65 + 66 + struct __sk_bUfF /* it will not exist in vmlinux */ { 67 + int len; 68 + } __attribute__((preserve_access_index)); 69 + 70 + struct bpf_testmod_test_read_ctx /* it exists in bpf_testmod */ { 71 + size_t len; 72 + } 
__attribute__((preserve_access_index)); 73 + 74 + SEC("tc") 75 + int balancer_ingress(struct __sk_buff *ctx) 76 + { 77 + void *data_end = (void *)(long)ctx->data_end; 78 + void *data = (void *)(long)ctx->data; 79 + void *ptr; 80 + int ret = 0, nh_off, i = 0; 81 + 82 + nh_off = 14; 83 + 84 + /* pragma unroll doesn't work on large loops */ 85 + #define C do { \ 86 + ptr = data + i; \ 87 + if (ptr + nh_off > data_end) \ 88 + break; \ 89 + ctx->tc_index = jhash(ptr, nh_off, ctx->cb[0] + i++); \ 90 + if (never) { \ 91 + /* below is a dead code with unresolvable CO-RE relo */ \ 92 + i += ((struct __sk_bUfF *)ctx)->len; \ 93 + /* this CO-RE relo may or may not resolve 94 + * depending on whether bpf_testmod is loaded. 95 + */ \ 96 + i += ((struct bpf_testmod_test_read_ctx *)ctx)->len; \ 97 + } \ 98 + } while (0); 99 + #define C30 C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C; 100 + C30;C30;C30; /* 90 calls */ 101 + return 0; 102 + } 103 + 104 + char LICENSE[] SEC("license") = "GPL";
+14 -2
tools/testing/selftests/bpf/progs/map_ptr_kern.c
··· 334 334 return 1; 335 335 } 336 336 337 + #define INNER_MAX_ENTRIES 1234 338 + 337 339 struct inner_map { 338 340 __uint(type, BPF_MAP_TYPE_ARRAY); 339 - __uint(max_entries, 1); 341 + __uint(max_entries, INNER_MAX_ENTRIES); 340 342 __type(key, __u32); 341 343 __type(value, __u32); 342 344 } inner_map SEC(".maps"); ··· 350 348 __type(value, __u32); 351 349 __array(values, struct { 352 350 __uint(type, BPF_MAP_TYPE_ARRAY); 353 - __uint(max_entries, 1); 351 + __uint(max_entries, INNER_MAX_ENTRIES); 354 352 __type(key, __u32); 355 353 __type(value, __u32); 356 354 }); ··· 362 360 { 363 361 struct bpf_array *array_of_maps = (struct bpf_array *)&m_array_of_maps; 364 362 struct bpf_map *map = (struct bpf_map *)&m_array_of_maps; 363 + struct bpf_array *inner_map; 364 + int key = 0; 365 365 366 366 VERIFY(check_default(&array_of_maps->map, map)); 367 + inner_map = bpf_map_lookup_elem(array_of_maps, &key); 368 + VERIFY(inner_map != 0); 369 + VERIFY(inner_map->map.max_entries == INNER_MAX_ENTRIES); 367 370 368 371 return 1; 369 372 } ··· 389 382 { 390 383 struct bpf_htab *hash_of_maps = (struct bpf_htab *)&m_hash_of_maps; 391 384 struct bpf_map *map = (struct bpf_map *)&m_hash_of_maps; 385 + struct bpf_htab *inner_map; 386 + int key = 2; 392 387 393 388 VERIFY(check_default(&hash_of_maps->map, map)); 389 + inner_map = bpf_map_lookup_elem(hash_of_maps, &key); 390 + VERIFY(inner_map != 0); 391 + VERIFY(inner_map->map.max_entries == INNER_MAX_ENTRIES); 394 392 395 393 return 1; 396 394 }
+70 -1
tools/testing/selftests/bpf/progs/pyperf.h
··· 159 159 __uint(value_size, sizeof(long long) * 127); 160 160 } stackmap SEC(".maps"); 161 161 162 + #ifdef USE_BPF_LOOP 163 + struct process_frame_ctx { 164 + int cur_cpu; 165 + int32_t *symbol_counter; 166 + void *frame_ptr; 167 + FrameData *frame; 168 + PidData *pidData; 169 + Symbol *sym; 170 + Event *event; 171 + bool done; 172 + }; 173 + 174 + #define barrier_var(var) asm volatile("" : "=r"(var) : "0"(var)) 175 + 176 + static int process_frame_callback(__u32 i, struct process_frame_ctx *ctx) 177 + { 178 + int zero = 0; 179 + void *frame_ptr = ctx->frame_ptr; 180 + PidData *pidData = ctx->pidData; 181 + FrameData *frame = ctx->frame; 182 + int32_t *symbol_counter = ctx->symbol_counter; 183 + int cur_cpu = ctx->cur_cpu; 184 + Event *event = ctx->event; 185 + Symbol *sym = ctx->sym; 186 + 187 + if (frame_ptr && get_frame_data(frame_ptr, pidData, frame, sym)) { 188 + int32_t new_symbol_id = *symbol_counter * 64 + cur_cpu; 189 + int32_t *symbol_id = bpf_map_lookup_elem(&symbolmap, sym); 190 + 191 + if (!symbol_id) { 192 + bpf_map_update_elem(&symbolmap, sym, &zero, 0); 193 + symbol_id = bpf_map_lookup_elem(&symbolmap, sym); 194 + if (!symbol_id) { 195 + ctx->done = true; 196 + return 1; 197 + } 198 + } 199 + if (*symbol_id == new_symbol_id) 200 + (*symbol_counter)++; 201 + 202 + barrier_var(i); 203 + if (i >= STACK_MAX_LEN) 204 + return 1; 205 + 206 + event->stack[i] = *symbol_id; 207 + 208 + event->stack_len = i + 1; 209 + frame_ptr = frame->f_back; 210 + } 211 + return 0; 212 + } 213 + #endif /* USE_BPF_LOOP */ 214 + 162 215 #ifdef GLOBAL_FUNC 163 216 __noinline 164 217 #elif defined(SUBPROGS) ··· 281 228 int32_t* symbol_counter = bpf_map_lookup_elem(&symbolmap, &sym); 282 229 if (symbol_counter == NULL) 283 230 return 0; 231 + #ifdef USE_BPF_LOOP 232 + struct process_frame_ctx ctx = { 233 + .cur_cpu = cur_cpu, 234 + .symbol_counter = symbol_counter, 235 + .frame_ptr = frame_ptr, 236 + .frame = &frame, 237 + .pidData = pidData, 238 + .sym = &sym, 239 + .event 
= event, 240 + }; 241 + 242 + bpf_loop(STACK_MAX_LEN, process_frame_callback, &ctx, 0); 243 + if (ctx.done) 244 + return 0; 245 + #else 284 246 #ifdef NO_UNROLL 285 247 #pragma clang loop unroll(disable) 286 248 #else 287 249 #pragma clang loop unroll(full) 288 - #endif 250 + #endif /* NO_UNROLL */ 289 251 /* Unwind python stack */ 290 252 for (int i = 0; i < STACK_MAX_LEN; ++i) { 291 253 if (frame_ptr && get_frame_data(frame_ptr, pidData, &frame, &sym)) { ··· 319 251 frame_ptr = frame.f_back; 320 252 } 321 253 } 254 + #endif /* USE_BPF_LOOP */ 322 255 event->stack_complete = frame_ptr == NULL; 323 256 } else { 324 257 event->stack_complete = 1;
+6
tools/testing/selftests/bpf/progs/pyperf600_bpf_loop.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2021 Facebook */ 3 + 4 + #define STACK_MAX_LEN 600 5 + #define USE_BPF_LOOP 6 + #include "pyperf.h"
+72 -3
tools/testing/selftests/bpf/progs/strobemeta.h
··· 445 445 return payload; 446 446 } 447 447 448 + #ifdef USE_BPF_LOOP 449 + enum read_type { 450 + READ_INT_VAR, 451 + READ_MAP_VAR, 452 + READ_STR_VAR, 453 + }; 454 + 455 + struct read_var_ctx { 456 + struct strobemeta_payload *data; 457 + void *tls_base; 458 + struct strobemeta_cfg *cfg; 459 + void *payload; 460 + /* value gets mutated */ 461 + struct strobe_value_generic *value; 462 + enum read_type type; 463 + }; 464 + 465 + static int read_var_callback(__u32 index, struct read_var_ctx *ctx) 466 + { 467 + switch (ctx->type) { 468 + case READ_INT_VAR: 469 + if (index >= STROBE_MAX_INTS) 470 + return 1; 471 + read_int_var(ctx->cfg, index, ctx->tls_base, ctx->value, ctx->data); 472 + break; 473 + case READ_MAP_VAR: 474 + if (index >= STROBE_MAX_MAPS) 475 + return 1; 476 + ctx->payload = read_map_var(ctx->cfg, index, ctx->tls_base, 477 + ctx->value, ctx->data, ctx->payload); 478 + break; 479 + case READ_STR_VAR: 480 + if (index >= STROBE_MAX_STRS) 481 + return 1; 482 + ctx->payload += read_str_var(ctx->cfg, index, ctx->tls_base, 483 + ctx->value, ctx->data, ctx->payload); 484 + break; 485 + } 486 + return 0; 487 + } 488 + #endif /* USE_BPF_LOOP */ 489 + 448 490 /* 449 491 * read_strobe_meta returns NULL, if no metadata was read; otherwise returns 450 492 * pointer to *right after* payload ends ··· 517 475 */ 518 476 tls_base = (void *)task; 519 477 478 + #ifdef USE_BPF_LOOP 479 + struct read_var_ctx ctx = { 480 + .cfg = cfg, 481 + .tls_base = tls_base, 482 + .value = &value, 483 + .data = data, 484 + .payload = payload, 485 + }; 486 + int err; 487 + 488 + ctx.type = READ_INT_VAR; 489 + err = bpf_loop(STROBE_MAX_INTS, read_var_callback, &ctx, 0); 490 + if (err != STROBE_MAX_INTS) 491 + return NULL; 492 + 493 + ctx.type = READ_STR_VAR; 494 + err = bpf_loop(STROBE_MAX_STRS, read_var_callback, &ctx, 0); 495 + if (err != STROBE_MAX_STRS) 496 + return NULL; 497 + 498 + ctx.type = READ_MAP_VAR; 499 + err = bpf_loop(STROBE_MAX_MAPS, read_var_callback, &ctx, 0); 500 + if 
(err != STROBE_MAX_MAPS) 501 + return NULL; 502 + #else 520 503 #ifdef NO_UNROLL 521 504 #pragma clang loop unroll(disable) 522 505 #else 523 506 #pragma unroll 524 - #endif 507 + #endif /* NO_UNROLL */ 525 508 for (int i = 0; i < STROBE_MAX_INTS; ++i) { 526 509 read_int_var(cfg, i, tls_base, &value, data); 527 510 } ··· 554 487 #pragma clang loop unroll(disable) 555 488 #else 556 489 #pragma unroll 557 - #endif 490 + #endif /* NO_UNROLL */ 558 491 for (int i = 0; i < STROBE_MAX_STRS; ++i) { 559 492 payload += read_str_var(cfg, i, tls_base, &value, data, payload); 560 493 } ··· 562 495 #pragma clang loop unroll(disable) 563 496 #else 564 497 #pragma unroll 565 - #endif 498 + #endif /* NO_UNROLL */ 566 499 for (int i = 0; i < STROBE_MAX_MAPS; ++i) { 567 500 payload = read_map_var(cfg, i, tls_base, &value, data, payload); 568 501 } 502 + #endif /* USE_BPF_LOOP */ 503 + 569 504 /* 570 505 * return pointer right after end of payload, so it's possible to 571 506 * calculate exact amount of useful data that needs to be sent
+9
tools/testing/selftests/bpf/progs/strobemeta_bpf_loop.c
··· 1 + // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) 2 + /* Copyright (c) 2021 Facebook */ 3 + 4 + #define STROBE_MAX_INTS 2 5 + #define STROBE_MAX_STRS 25 6 + #define STROBE_MAX_MAPS 100 7 + #define STROBE_MAX_MAP_ENTRIES 20 8 + #define USE_BPF_LOOP 9 + #include "strobemeta.h"
+1 -1
tools/testing/selftests/bpf/progs/test_ksyms_weak.c
··· 38 38 /* tests existing symbols. */ 39 39 rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, 0); 40 40 if (rq) 41 - out__existing_typed = 0; 41 + out__existing_typed = rq->cpu; 42 42 out__existing_typeless = (__u64)&bpf_prog_active; 43 43 44 44 /* tests non-existent symbols. */
+73
tools/testing/selftests/bpf/progs/test_legacy_printk.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2021 Facebook */ 3 + 4 + #include <linux/bpf.h> 5 + #define BPF_NO_GLOBAL_DATA 6 + #include <bpf/bpf_helpers.h> 7 + 8 + char LICENSE[] SEC("license") = "GPL"; 9 + 10 + struct { 11 + __uint(type, BPF_MAP_TYPE_ARRAY); 12 + __type(key, int); 13 + __type(value, int); 14 + __uint(max_entries, 1); 15 + } my_pid_map SEC(".maps"); 16 + 17 + struct { 18 + __uint(type, BPF_MAP_TYPE_ARRAY); 19 + __type(key, int); 20 + __type(value, int); 21 + __uint(max_entries, 1); 22 + } res_map SEC(".maps"); 23 + 24 + volatile int my_pid_var = 0; 25 + volatile int res_var = 0; 26 + 27 + SEC("tp/raw_syscalls/sys_enter") 28 + int handle_legacy(void *ctx) 29 + { 30 + int zero = 0, *my_pid, cur_pid, *my_res; 31 + 32 + my_pid = bpf_map_lookup_elem(&my_pid_map, &zero); 33 + if (!my_pid) 34 + return 1; 35 + 36 + cur_pid = bpf_get_current_pid_tgid() >> 32; 37 + if (cur_pid != *my_pid) 38 + return 1; 39 + 40 + my_res = bpf_map_lookup_elem(&res_map, &zero); 41 + if (!my_res) 42 + return 1; 43 + 44 + if (*my_res == 0) 45 + /* use bpf_printk() in combination with BPF_NO_GLOBAL_DATA to 46 + * force .rodata.str1.1 section that previously caused 47 + * problems on old kernels due to libbpf always tried to 48 + * create a global data map for it 49 + */ 50 + bpf_printk("Legacy-case bpf_printk test, pid %d\n", cur_pid); 51 + *my_res = 1; 52 + 53 + return *my_res; 54 + } 55 + 56 + SEC("tp/raw_syscalls/sys_enter") 57 + int handle_modern(void *ctx) 58 + { 59 + int zero = 0, cur_pid; 60 + 61 + cur_pid = bpf_get_current_pid_tgid() >> 32; 62 + if (cur_pid != my_pid_var) 63 + return 1; 64 + 65 + if (res_var == 0) 66 + /* we need bpf_printk() to validate libbpf logic around unused 67 + * global maps and legacy kernels; see comment in handle_legacy() 68 + */ 69 + bpf_printk("Modern-case bpf_printk test, pid %d\n", cur_pid); 70 + res_var = 1; 71 + 72 + return res_var; 73 + }
+24
tools/testing/selftests/bpf/progs/test_log_buf.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2021 Facebook */ 3 + 4 + #include <linux/bpf.h> 5 + #include <bpf/bpf_helpers.h> 6 + 7 + int a[4]; 8 + const volatile int off = 4000; 9 + 10 + SEC("raw_tp/sys_enter") 11 + int good_prog(const void *ctx) 12 + { 13 + a[0] = (int)(long)ctx; 14 + return a[1]; 15 + } 16 + 17 + SEC("raw_tp/sys_enter") 18 + int bad_prog(const void *ctx) 19 + { 20 + /* out of bounds access */ 21 + return a[off]; 22 + } 23 + 24 + char _license[] SEC("license") = "GPL";
+39
tools/testing/selftests/bpf/progs/test_prog_array_init.c
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* Copyright (c) 2021 Hengqi Chen */ 3 + 4 + #include "vmlinux.h" 5 + #include <bpf/bpf_helpers.h> 6 + #include <bpf/bpf_tracing.h> 7 + 8 + const volatile pid_t my_pid = 0; 9 + int value = 0; 10 + 11 + SEC("raw_tp/sys_enter") 12 + int tailcall_1(void *ctx) 13 + { 14 + value = 42; 15 + return 0; 16 + } 17 + 18 + struct { 19 + __uint(type, BPF_MAP_TYPE_PROG_ARRAY); 20 + __uint(max_entries, 2); 21 + __uint(key_size, sizeof(__u32)); 22 + __array(values, int (void *)); 23 + } prog_array_init SEC(".maps") = { 24 + .values = { 25 + [1] = (void *)&tailcall_1, 26 + }, 27 + }; 28 + 29 + SEC("raw_tp/sys_enter") 30 + int entry(void *ctx) 31 + { 32 + pid_t pid = bpf_get_current_pid_tgid() >> 32; 33 + 34 + if (pid != my_pid) 35 + return 0; 36 + 37 + bpf_tail_call(ctx, &prog_array_init, 1); 38 + return 0; 39 + }
+2 -2
tools/testing/selftests/bpf/progs/test_verif_scale2.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 // Copyright (c) 2019 Facebook 3 - #include <linux/bpf.h> 3 + #include "vmlinux.h" 4 4 #include <bpf/bpf_helpers.h> 5 5 #define ATTR __always_inline 6 6 #include "test_jhash.h" 7 7 8 - SEC("scale90_inline") 8 + SEC("tc") 9 9 int balancer_ingress(struct __sk_buff *ctx) 10 10 { 11 11 void *data_end = (void *)(long)ctx->data_end;
+7
tools/testing/selftests/bpf/progs/trigger_bench.c
··· 52 52 __sync_add_and_fetch(&hits, 1); 53 53 return -22; 54 54 } 55 + 56 + SEC("uprobe/self/uprobe_target") 57 + int bench_trigger_uprobe(void *ctx) 58 + { 59 + __sync_add_and_fetch(&hits, 1); 60 + return 0; 61 + }
+79 -15
tools/testing/selftests/bpf/test_bpftool_synctypes.py
··· 9 9 10 10 LINUX_ROOT = os.path.abspath(os.path.join(__file__, 11 11 os.pardir, os.pardir, os.pardir, os.pardir, os.pardir)) 12 - BPFTOOL_DIR = os.path.join(LINUX_ROOT, 'tools/bpf/bpftool') 12 + BPFTOOL_DIR = os.getenv('BPFTOOL_DIR', 13 + os.path.join(LINUX_ROOT, 'tools/bpf/bpftool')) 14 + BPFTOOL_BASHCOMP_DIR = os.getenv('BPFTOOL_BASHCOMP_DIR', 15 + os.path.join(BPFTOOL_DIR, 'bash-completion')) 16 + BPFTOOL_DOC_DIR = os.getenv('BPFTOOL_DOC_DIR', 17 + os.path.join(BPFTOOL_DIR, 'Documentation')) 18 + INCLUDE_DIR = os.getenv('INCLUDE_DIR', 19 + os.path.join(LINUX_ROOT, 'tools/include')) 20 + 13 21 retval = 0 14 22 15 23 class BlockParser(object): ··· 250 242 end_marker = re.compile('}\\\\n') 251 243 return self.__get_description_list(start_marker, pattern, end_marker) 252 244 253 - def default_options(self): 254 - """ 255 - Return the default options contained in HELP_SPEC_OPTIONS 256 - """ 257 - return { '-j', '--json', '-p', '--pretty', '-d', '--debug' } 258 - 259 245 def get_bashcomp_list(self, block_name): 260 246 """ 261 247 Search for and parse a list of type names from a variable in bash ··· 276 274 defined in children classes. 
277 275 """ 278 276 def get_options(self): 279 - return self.default_options().union(self.get_help_list_macro('HELP_SPEC_OPTIONS')) 277 + return self.get_help_list_macro('HELP_SPEC_OPTIONS') 278 + 279 + class MainHeaderFileExtractor(SourceFileExtractor): 280 + """ 281 + An extractor for bpftool's main.h 282 + """ 283 + filename = os.path.join(BPFTOOL_DIR, 'main.h') 284 + 285 + def get_common_options(self): 286 + """ 287 + Parse the list of common options in main.h (options that apply to all 288 + commands), which looks to the lists of options in other source files 289 + but has different start and end markers: 290 + 291 + "OPTIONS := { {-j|--json} [{-p|--pretty}] | {-d|--debug} | {-l|--legacy}" 292 + 293 + Return a set containing all options, such as: 294 + 295 + {'-p', '-d', '--legacy', '--pretty', '--debug', '--json', '-l', '-j'} 296 + """ 297 + start_marker = re.compile(f'"OPTIONS :=') 298 + pattern = re.compile('([\w-]+) ?(?:\||}[ }\]"])') 299 + end_marker = re.compile('#define') 300 + 301 + parser = InlineListParser(self.reader) 302 + parser.search_block(start_marker) 303 + return parser.parse(pattern, end_marker) 304 + 305 + class ManSubstitutionsExtractor(SourceFileExtractor): 306 + """ 307 + An extractor for substitutions.rst 308 + """ 309 + filename = os.path.join(BPFTOOL_DOC_DIR, 'substitutions.rst') 310 + 311 + def get_common_options(self): 312 + """ 313 + Parse the list of common options in substitutions.rst (options that 314 + apply to all commands). 
315 + 316 + Return a set containing all options, such as: 317 + 318 + {'-p', '-d', '--legacy', '--pretty', '--debug', '--json', '-l', '-j'} 319 + """ 320 + start_marker = re.compile('\|COMMON_OPTIONS\| replace:: {') 321 + pattern = re.compile('\*\*([\w/-]+)\*\*') 322 + end_marker = re.compile('}$') 323 + 324 + parser = InlineListParser(self.reader) 325 + parser.search_block(start_marker) 326 + return parser.parse(pattern, end_marker) 280 327 281 328 class ProgFileExtractor(SourceFileExtractor): 282 329 """ ··· 401 350 """ 402 351 An extractor for the UAPI BPF header. 403 352 """ 404 - filename = os.path.join(LINUX_ROOT, 'tools/include/uapi/linux/bpf.h') 353 + filename = os.path.join(INCLUDE_DIR, 'uapi/linux/bpf.h') 405 354 406 355 def get_prog_types(self): 407 356 return self.get_enum('bpf_prog_type') ··· 425 374 """ 426 375 An extractor for bpftool-prog.rst. 427 376 """ 428 - filename = os.path.join(BPFTOOL_DIR, 'Documentation/bpftool-prog.rst') 377 + filename = os.path.join(BPFTOOL_DOC_DIR, 'bpftool-prog.rst') 429 378 430 379 def get_attach_types(self): 431 380 return self.get_rst_list('ATTACH_TYPE') ··· 434 383 """ 435 384 An extractor for bpftool-map.rst. 436 385 """ 437 - filename = os.path.join(BPFTOOL_DIR, 'Documentation/bpftool-map.rst') 386 + filename = os.path.join(BPFTOOL_DOC_DIR, 'bpftool-map.rst') 438 387 439 388 def get_map_types(self): 440 389 return self.get_rst_list('TYPE') ··· 443 392 """ 444 393 An extractor for bpftool-cgroup.rst. 445 394 """ 446 - filename = os.path.join(BPFTOOL_DIR, 'Documentation/bpftool-cgroup.rst') 395 + filename = os.path.join(BPFTOOL_DOC_DIR, 'bpftool-cgroup.rst') 447 396 448 397 def get_attach_types(self): 449 398 return self.get_rst_list('ATTACH_TYPE') ··· 462 411 """ 463 412 An extractor for bpftool's bash completion file. 
464 413 """ 465 - filename = os.path.join(BPFTOOL_DIR, 'bash-completion/bpftool') 414 + filename = os.path.join(BPFTOOL_BASHCOMP_DIR, 'bpftool') 466 415 467 416 def get_prog_attach_types(self): 468 417 return self.get_bashcomp_list('BPFTOOL_PROG_ATTACH_TYPES') ··· 613 562 help_cmd_options = source_info.get_options() 614 563 source_info.close() 615 564 616 - man_cmd_info = ManGenericExtractor(os.path.join('Documentation', 'bpftool-' + cmd + '.rst')) 565 + man_cmd_info = ManGenericExtractor(os.path.join(BPFTOOL_DOC_DIR, 'bpftool-' + cmd + '.rst')) 617 566 man_cmd_options = man_cmd_info.get_options() 618 567 man_cmd_info.close() 619 568 ··· 624 573 help_main_options = source_main_info.get_options() 625 574 source_main_info.close() 626 575 627 - man_main_info = ManGenericExtractor(os.path.join('Documentation', 'bpftool.rst')) 576 + man_main_info = ManGenericExtractor(os.path.join(BPFTOOL_DOC_DIR, 'bpftool.rst')) 628 577 man_main_options = man_main_info.get_options() 629 578 man_main_info.close() 630 579 631 580 verify(help_main_options, man_main_options, 632 581 f'Comparing {source_main_info.filename} (do_help() OPTIONS) and {man_main_info.filename} (OPTIONS):') 582 + 583 + # Compare common options (options that apply to all commands) 584 + 585 + main_hdr_info = MainHeaderFileExtractor() 586 + source_common_options = main_hdr_info.get_common_options() 587 + main_hdr_info.close() 588 + 589 + man_substitutions = ManSubstitutionsExtractor() 590 + man_common_options = man_substitutions.get_common_options() 591 + man_substitutions.close() 592 + 593 + verify(source_common_options, man_common_options, 594 + f'Comparing common options from {main_hdr_info.filename} (HELP_SPEC_OPTIONS) and {man_substitutions.filename}:') 633 595 634 596 sys.exit(retval) 635 597
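The extractor classes above pull option names out of a single source line with one regex. A standalone sketch of that parsing step, reusing the same pattern as `get_common_options()` on a made-up input line (not read from the real main.h):

```python
import re

# Stand-in for a HELP_SPEC_OPTIONS-style line from bpftool's main.h.
line = '"OPTIONS := { {-j|--json} [{-p|--pretty}] | {-d|--debug} | {-l|--legacy} }"'

# Same pattern as the extractor: an option name followed by '|' or a closer
# ('}' plus space, '}', ']' or '"'). findall() returns only the captured name.
pattern = re.compile(r'([\w-]+) ?(?:\||}[ }\]"])')

options = set(pattern.findall(line))
print(sorted(options))
```

Note that the leading `OPTIONS` token is not matched, since it is followed by `:` rather than `|` or a closing delimiter.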
+4 -4
tools/testing/selftests/bpf/test_cgroup_storage.c
···
 		goto err;
 	}
 
-	map_fd = bpf_create_map(BPF_MAP_TYPE_CGROUP_STORAGE, sizeof(key),
-				sizeof(value), 0, 0);
+	map_fd = bpf_map_create(BPF_MAP_TYPE_CGROUP_STORAGE, NULL, sizeof(key),
+				sizeof(value), 0, NULL);
 	if (map_fd < 0) {
 		printf("Failed to create map: %s\n", strerror(errno));
 		goto out;
 	}
 
-	percpu_map_fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
-				       sizeof(key), sizeof(value), 0, 0);
+	percpu_map_fd = bpf_map_create(BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, NULL,
+				       sizeof(key), sizeof(value), 0, NULL);
 	if (percpu_map_fd < 0) {
 		printf("Failed to create map: %s\n", strerror(errno));
 		goto out;
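Editor's note: the conversion above swaps the old fixed-argument `bpf_create_map()` for the opts-based `bpf_map_create()`, where the flags word moves into an extensible options struct and a NULL opts pointer means "defaults". A minimal sketch of why that shape is forward-compatible, using stand-in types and helpers (none of the names below are real libbpf code):

```c
#include <stddef.h>

/* Stand-in for an opts struct: callers record its size in .sz so a newer
 * library can tell which fields an older caller's struct actually has. */
struct create_opts {
	size_t sz;            /* struct size, for forward/backward compat */
	unsigned int map_flags;
};

/* Hypothetical opts-based creator; a real one would do the BPF_MAP_CREATE
 * syscall, here we just fold the inputs into a number for demonstration. */
static int map_create(int type, const char *name, int key_sz, int val_sz,
		      int max_entries, const struct create_opts *opts)
{
	unsigned int flags = opts ? opts->map_flags : 0;

	(void)name;
	return type + key_sz + val_sz + max_entries + (int)flags;
}

/* Old-style wrapper matching the bpf_create_map() argument order: the raw
 * flags argument simply becomes one field of the opts struct. */
static int map_create_compat(int type, int key_sz, int val_sz,
			     int max_entries, unsigned int flags)
{
	struct create_opts opts = { .sz = sizeof(opts), .map_flags = flags };

	return map_create(type, NULL, key_sz, val_sz, max_entries, &opts);
}
```

New options can later be appended to the struct without breaking either calling convention, which is the motivation for the libbpf 1.0 API churn in this series.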
+15 -12
tools/testing/selftests/bpf/test_lpm_map.c
···
 
 static void test_lpm_map(int keysize)
 {
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);
 	size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups;
 	struct tlpm_node *t, *list = NULL;
 	struct bpf_lpm_trie_key *key;
···
 	key = alloca(sizeof(*key) + keysize);
 	memset(key, 0, sizeof(*key) + keysize);
 
-	map = bpf_create_map(BPF_MAP_TYPE_LPM_TRIE,
+	map = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, NULL,
 			     sizeof(*key) + keysize,
 			     keysize + 1,
 			     4096,
-			     BPF_F_NO_PREALLOC);
+			     &opts);
 	assert(map >= 0);
 
 	for (i = 0; i < n_nodes; ++i) {
···
 
 static void test_lpm_ipaddr(void)
 {
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);
 	struct bpf_lpm_trie_key *key_ipv4;
 	struct bpf_lpm_trie_key *key_ipv6;
 	size_t key_size_ipv4;
···
 	key_ipv4 = alloca(key_size_ipv4);
 	key_ipv6 = alloca(key_size_ipv6);
 
-	map_fd_ipv4 = bpf_create_map(BPF_MAP_TYPE_LPM_TRIE,
+	map_fd_ipv4 = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, NULL,
 				     key_size_ipv4, sizeof(value),
-				     100, BPF_F_NO_PREALLOC);
+				     100, &opts);
 	assert(map_fd_ipv4 >= 0);
 
-	map_fd_ipv6 = bpf_create_map(BPF_MAP_TYPE_LPM_TRIE,
+	map_fd_ipv6 = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, NULL,
 				     key_size_ipv6, sizeof(value),
-				     100, BPF_F_NO_PREALLOC);
+				     100, &opts);
 	assert(map_fd_ipv6 >= 0);
 
 	/* Fill data some IPv4 and IPv6 address ranges */
···
 
 static void test_lpm_delete(void)
 {
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);
 	struct bpf_lpm_trie_key *key;
 	size_t key_size;
 	int map_fd;
···
 	key_size = sizeof(*key) + sizeof(__u32);
 	key = alloca(key_size);
 
-	map_fd = bpf_create_map(BPF_MAP_TYPE_LPM_TRIE,
+	map_fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, NULL,
 				key_size, sizeof(value),
-				100, BPF_F_NO_PREALLOC);
+				100, &opts);
 	assert(map_fd >= 0);
 
 	/* Add nodes:
···
 
 static void test_lpm_get_next_key(void)
 {
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);
 	struct bpf_lpm_trie_key *key_p, *next_key_p;
 	size_t key_size;
 	__u32 value = 0;
···
 	key_p = alloca(key_size);
 	next_key_p = alloca(key_size);
 
-	map_fd = bpf_create_map(BPF_MAP_TYPE_LPM_TRIE, key_size, sizeof(value),
-				100, BPF_F_NO_PREALLOC);
+	map_fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, NULL, key_size, sizeof(value), 100, &opts);
 	assert(map_fd >= 0);
 
 	/* empty tree. get_next_key should return ENOENT */
···
 
 static void test_lpm_multi_thread(void)
 {
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);
 	struct lpm_mt_test_info info[4];
 	size_t key_size, value_size;
 	pthread_t thread_id[4];
···
 	/* create a trie */
 	value_size = sizeof(__u32);
 	key_size = sizeof(struct bpf_lpm_trie_key) + value_size;
-	map_fd = bpf_create_map(BPF_MAP_TYPE_LPM_TRIE, key_size, value_size,
-				100, BPF_F_NO_PREALLOC);
+	map_fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, NULL, key_size, value_size, 100, &opts);
 
 	/* create 4 threads to test update, delete, lookup and get_next_key */
 	setup_lpm_mt_test_info(&info[0], map_fd);
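Editor's note: each converted function above declares its options with libbpf's `LIBBPF_OPTS()` macro. The real macro (from libbpf's headers) zero-initializes the struct, stores its size in `.sz`, and forwards any designated initializers the caller supplies. A simplified stand-in showing the idea (the struct and macro below are illustrative, not libbpf's actual definitions):

```c
#include <stddef.h>

/* Cut-down stand-in for struct bpf_map_create_opts. */
struct map_create_opts {
	size_t sz;              /* filled in automatically by the macro */
	unsigned int map_flags;
	int inner_map_fd;
};

/* Simplified take on LIBBPF_OPTS(): declares NAME as a zero-initialized
 * struct TYPE, records sizeof() in .sz, and splices in the caller's
 * designated initializers. Unmentioned fields stay zero. */
#define DECLARE_OPTS(TYPE, NAME, ...)		\
	struct TYPE NAME = {			\
		.sz = sizeof(struct TYPE),	\
		__VA_ARGS__			\
	}
```

Usage mirrors the diff above, e.g. `DECLARE_OPTS(map_create_opts, opts, .map_flags = 1);` leaves `opts.inner_map_fd` zeroed while setting `.sz` and `.map_flags`.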
+5 -11
tools/testing/selftests/bpf/test_lru_map.c
···
 
 static int create_map(int map_type, int map_flags, unsigned int size)
 {
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = map_flags);
 	int map_fd;
 
-	map_fd = bpf_create_map(map_type, sizeof(unsigned long long),
-				sizeof(unsigned long long), size, map_flags);
+	map_fd = bpf_map_create(map_type, NULL, sizeof(unsigned long long),
+				sizeof(unsigned long long), size, &opts);
 
 	if (map_fd == -1)
-		perror("bpf_create_map");
+		perror("bpf_map_create");
 
 	return map_fd;
 }
···
 static int bpf_map_lookup_elem_with_ref_bit(int fd, unsigned long long key,
 					    void *value)
 {
-	struct bpf_create_map_attr map;
 	struct bpf_insn insns[] = {
 		BPF_LD_MAP_VALUE(BPF_REG_9, 0, 0),
 		BPF_LD_MAP_FD(BPF_REG_1, fd),
···
 	int mfd, pfd, ret, zero = 0;
 	__u32 retval = 0;
 
-	memset(&map, 0, sizeof(map));
-	map.map_type = BPF_MAP_TYPE_ARRAY;
-	map.key_size = sizeof(int);
-	map.value_size = sizeof(unsigned long long);
-	map.max_entries = 1;
-
-	mfd = bpf_create_map_xattr(&map);
+	mfd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(int), sizeof(__u64), 1, NULL);
 	if (mfd < 0)
 		return -1;
 
+56 -54
tools/testing/selftests/bpf/test_maps.c
···
 
 static int skips;
 
-static int map_flags;
+static struct bpf_map_create_opts map_opts = { .sz = sizeof(map_opts) };
 
 static void test_hashmap(unsigned int task, void *data)
 {
 	long long key, next_key, first_key, value;
 	int fd;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
-			    2, map_flags);
+	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value), 2, &map_opts);
 	if (fd < 0) {
 		printf("Failed to create hashmap '%s'!\n", strerror(errno));
 		exit(1);
···
 
 	for (i = 1; i <= 512; i <<= 1)
 		for (j = 1; j <= 1 << 18; j <<= 1) {
-			fd = bpf_create_map(BPF_MAP_TYPE_HASH, i, j,
-					    2, map_flags);
+			fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, i, j, 2, &map_opts);
 			if (fd < 0) {
 				if (errno == ENOMEM)
 					return;
···
 	int expected_key_mask = 0;
 	int fd, i;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_HASH, sizeof(key),
-			    sizeof(bpf_percpu(value, 0)), 2, map_flags);
+	fd = bpf_map_create(BPF_MAP_TYPE_PERCPU_HASH, NULL, sizeof(key),
+			    sizeof(bpf_percpu(value, 0)), 2, &map_opts);
 	if (fd < 0) {
 		printf("Failed to create hashmap '%s'!\n", strerror(errno));
 		exit(1);
···
 	int i, fd, ret;
 	long long key, value;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
-			    max_entries, map_flags);
+	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
+			    max_entries, &map_opts);
 	CHECK(fd < 0,
 	      "failed to create hashmap",
-	      "err: %s, flags: 0x%x\n", strerror(errno), map_flags);
+	      "err: %s, flags: 0x%x\n", strerror(errno), map_opts.map_flags);
 
 	for (i = 0; i < max_entries; i++) {
 		key = i; value = key;
···
 	int i, first, second, old_flags;
 	long long key, next_first, next_second;
 
-	old_flags = map_flags;
-	map_flags |= BPF_F_ZERO_SEED;
+	old_flags = map_opts.map_flags;
+	map_opts.map_flags |= BPF_F_ZERO_SEED;
 
 	first = helper_fill_hashmap(3);
 	second = helper_fill_hashmap(3);
···
 		key = next_first;
 	}
 
-	map_flags = old_flags;
+	map_opts.map_flags = old_flags;
 	close(first);
 	close(second);
 }
···
 	int key, next_key, fd;
 	long long value;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key), sizeof(value),
-			    2, 0);
+	fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(key), sizeof(value), 2, NULL);
 	if (fd < 0) {
 		printf("Failed to create arraymap '%s'!\n", strerror(errno));
 		exit(1);
···
 	BPF_DECLARE_PERCPU(long, values);
 	int key, next_key, fd, i;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_ARRAY, sizeof(key),
-			    sizeof(bpf_percpu(values, 0)), 2, 0);
+	fd = bpf_map_create(BPF_MAP_TYPE_PERCPU_ARRAY, NULL, sizeof(key),
+			    sizeof(bpf_percpu(values, 0)), 2, NULL);
 	if (fd < 0) {
 		printf("Failed to create arraymap '%s'!\n", strerror(errno));
 		exit(1);
···
 	unsigned int nr_keys = 2000;
 	int key, fd, i;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_ARRAY, sizeof(key),
-			    sizeof(bpf_percpu(values, 0)), nr_keys, 0);
+	fd = bpf_map_create(BPF_MAP_TYPE_PERCPU_ARRAY, NULL, sizeof(key),
+			    sizeof(bpf_percpu(values, 0)), nr_keys, NULL);
 	if (fd < 0) {
 		printf("Failed to create per-cpu arraymap '%s'!\n",
 		       strerror(errno));
···
 	int fd;
 	__u32 key, value;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_DEVMAP, sizeof(key), sizeof(value),
-			    2, 0);
+	fd = bpf_map_create(BPF_MAP_TYPE_DEVMAP, NULL, sizeof(key), sizeof(value), 2, NULL);
 	if (fd < 0) {
 		printf("Failed to create devmap '%s'!\n", strerror(errno));
 		exit(1);
···
 	int fd;
 	__u32 key, value;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_DEVMAP_HASH, sizeof(key), sizeof(value),
-			    2, 0);
+	fd = bpf_map_create(BPF_MAP_TYPE_DEVMAP_HASH, NULL, sizeof(key), sizeof(value), 2, NULL);
 	if (fd < 0) {
 		printf("Failed to create devmap_hash '%s'!\n", strerror(errno));
 		exit(1);
···
 		vals[i] = rand();
 
 	/* Invalid key size */
-	fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 4, sizeof(val), MAP_SIZE,
-			    map_flags);
+	fd = bpf_map_create(BPF_MAP_TYPE_QUEUE, NULL, 4, sizeof(val), MAP_SIZE, &map_opts);
 	assert(fd < 0 && errno == EINVAL);
 
-	fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 0, sizeof(val), MAP_SIZE,
-			    map_flags);
+	fd = bpf_map_create(BPF_MAP_TYPE_QUEUE, NULL, 0, sizeof(val), MAP_SIZE, &map_opts);
 	/* Queue map does not support BPF_F_NO_PREALLOC */
-	if (map_flags & BPF_F_NO_PREALLOC) {
+	if (map_opts.map_flags & BPF_F_NO_PREALLOC) {
 		assert(fd < 0 && errno == EINVAL);
 		return;
 	}
···
 		vals[i] = rand();
 
 	/* Invalid key size */
-	fd = bpf_create_map(BPF_MAP_TYPE_STACK, 4, sizeof(val), MAP_SIZE,
-			    map_flags);
+	fd = bpf_map_create(BPF_MAP_TYPE_STACK, NULL, 4, sizeof(val), MAP_SIZE, &map_opts);
 	assert(fd < 0 && errno == EINVAL);
 
-	fd = bpf_create_map(BPF_MAP_TYPE_STACK, 0, sizeof(val), MAP_SIZE,
-			    map_flags);
+	fd = bpf_map_create(BPF_MAP_TYPE_STACK, NULL, 0, sizeof(val), MAP_SIZE, &map_opts);
 	/* Stack map does not support BPF_F_NO_PREALLOC */
-	if (map_flags & BPF_F_NO_PREALLOC) {
+	if (map_opts.map_flags & BPF_F_NO_PREALLOC) {
 		assert(fd < 0 && errno == EINVAL);
 		return;
 	}
···
 	}
 
 	/* Test sockmap with connected sockets */
-	fd = bpf_create_map(BPF_MAP_TYPE_SOCKMAP,
+	fd = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL,
 			    sizeof(key), sizeof(value),
-			    6, 0);
+			    6, NULL);
 	if (fd < 0) {
 		if (!bpf_probe_map_type(BPF_MAP_TYPE_SOCKMAP, 0)) {
 			printf("%s SKIP (unsupported map type BPF_MAP_TYPE_SOCKMAP)\n",
···
 
 	obj = bpf_object__open(MAPINMAP_PROG);
 
-	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(int), sizeof(int),
-			    2, 0);
+	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(int), sizeof(int), 2, NULL);
 	if (fd < 0) {
 		printf("Failed to create hashmap '%s'!\n", strerror(errno));
 		exit(1);
···
 	} key;
 	int fd, i, value;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
-			    MAP_SIZE, map_flags);
+	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
+			    MAP_SIZE, &map_opts);
 	if (fd < 0) {
 		printf("Failed to create large map '%s'!\n", strerror(errno));
 		exit(1);
···
 	int i, fd, key = 0, value = 0;
 	int data[2];
 
-	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
-			    MAP_SIZE, map_flags);
+	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
+			    MAP_SIZE, &map_opts);
 	if (fd < 0) {
 		printf("Failed to create map for parallel test '%s'!\n",
 		       strerror(errno));
···
 static void test_map_rdonly(void)
 {
 	int fd, key = 0, value = 0;
+	__u32 old_flags;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
-			    MAP_SIZE, map_flags | BPF_F_RDONLY);
+	old_flags = map_opts.map_flags;
+	map_opts.map_flags |= BPF_F_RDONLY;
+	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
+			    MAP_SIZE, &map_opts);
+	map_opts.map_flags = old_flags;
 	if (fd < 0) {
 		printf("Failed to create map for read only test '%s'!\n",
 		       strerror(errno));
···
 static void test_map_wronly_hash(void)
 {
 	int fd, key = 0, value = 0;
+	__u32 old_flags;
 
-	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
-			    MAP_SIZE, map_flags | BPF_F_WRONLY);
+	old_flags = map_opts.map_flags;
+	map_opts.map_flags |= BPF_F_WRONLY;
+	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
+			    MAP_SIZE, &map_opts);
+	map_opts.map_flags = old_flags;
 	if (fd < 0) {
 		printf("Failed to create map for write only test '%s'!\n",
 		       strerror(errno));
···
 static void test_map_wronly_stack_or_queue(enum bpf_map_type map_type)
 {
 	int fd, value = 0;
+	__u32 old_flags;
+
 
 	assert(map_type == BPF_MAP_TYPE_QUEUE ||
 	       map_type == BPF_MAP_TYPE_STACK);
-	fd = bpf_create_map(map_type, 0, sizeof(value), MAP_SIZE,
-			    map_flags | BPF_F_WRONLY);
+	old_flags = map_opts.map_flags;
+	map_opts.map_flags |= BPF_F_WRONLY;
+	fd = bpf_map_create(map_type, NULL, 0, sizeof(value), MAP_SIZE, &map_opts);
+	map_opts.map_flags = old_flags;
 	/* Stack/Queue maps do not support BPF_F_NO_PREALLOC */
-	if (map_flags & BPF_F_NO_PREALLOC) {
+	if (map_opts.map_flags & BPF_F_NO_PREALLOC) {
 		assert(fd < 0 && errno == EINVAL);
 		return;
 	}
···
 	__u32 fds_idx = 0;
 	int fd;
 
-	map_fd = bpf_create_map(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
-				sizeof(__u32), sizeof(__u64), array_size, 0);
+	map_fd = bpf_map_create(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, NULL,
+				sizeof(__u32), sizeof(__u64), array_size, NULL);
 	CHECK(map_fd < 0, "reuseport array create",
 	      "map_fd:%d, errno:%d\n", map_fd, errno);
 
···
 	close(map_fd);
 
 	/* Test 32 bit fd */
-	map_fd = bpf_create_map(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
-				sizeof(__u32), sizeof(__u32), array_size, 0);
+	map_fd = bpf_map_create(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, NULL,
+				sizeof(__u32), sizeof(__u32), array_size, NULL);
 	CHECK(map_fd < 0, "reuseport array create",
 	      "map_fd:%d, errno:%d\n", map_fd, errno);
 	prepare_reuseport_grp(SOCK_STREAM, map_fd, sizeof(__u32), &fd64,
···
 
 	libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
 
-	map_flags = 0;
+	map_opts.map_flags = 0;
 	run_all_tests();
 
-	map_flags = BPF_F_NO_PREALLOC;
+	map_opts.map_flags = BPF_F_NO_PREALLOC;
 	run_all_tests();
 
 #define DEFINE_TEST(name) test_##name();
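Editor's note: because the global `map_opts` is now shared across tests, the rdonly/wronly conversions above temporarily OR an extra flag into `map_opts.map_flags` and restore the old value afterwards. A small standalone sketch of that save/restore pattern with a stand-in opts type (names are illustrative, not libbpf's):

```c
#include <stddef.h>

struct map_opts {
	size_t sz;
	unsigned int map_flags;
};

/* Shared, long-lived opts, as in test_maps.c. */
static struct map_opts map_opts = { .sz = sizeof(map_opts) };

/* Temporarily add an extra flag for one creation, then restore the shared
 * state so later tests see the original flags. The return value stands in
 * for "the flags the (hypothetical) bpf_map_create() call observed". */
static unsigned int create_with_extra_flag(unsigned int extra)
{
	unsigned int old_flags = map_opts.map_flags;
	unsigned int used;

	map_opts.map_flags |= extra;
	used = map_opts.map_flags;	/* stand-in for the create call */
	map_opts.map_flags = old_flags;	/* restore for subsequent tests */
	return used;
}
```

The save/restore keeps the test suite's two global passes (prealloc and BPF_F_NO_PREALLOC) from leaking per-test flags into each other.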
+14 -14
tools/testing/selftests/bpf/test_progs.c
···
 #include <prog_tests/tests.h>
 #undef DEFINE_TEST
 };
-const int prog_test_cnt = ARRAY_SIZE(prog_test_defs);
+static const int prog_test_cnt = ARRAY_SIZE(prog_test_defs);
 
 const char *argp_program_version = "test_progs 0.1";
 const char *argp_program_bug_address = "<bpf@vger.kernel.org>";
-const char argp_program_doc[] = "BPF selftests test runner";
+static const char argp_program_doc[] = "BPF selftests test runner";
 
 enum ARG_KEYS {
 	ARG_TEST_NUM = 'n',
···
 {
 	struct dispatch_data *data = ctx;
 	int sock_fd;
-	FILE *log_fd = NULL;
+	FILE *log_fp = NULL;
 
 	sock_fd = data->sock_fd;
···
 	/* collect all logs */
 	if (msg_test_done.test_done.have_log) {
-		log_fd = open_memstream(&result->log_buf, &result->log_cnt);
-		if (!log_fd)
+		log_fp = open_memstream(&result->log_buf, &result->log_cnt);
+		if (!log_fp)
 			goto error;
 
 		while (true) {
···
 			if (msg_log.type != MSG_TEST_LOG)
 				goto error;
 
-			fprintf(log_fd, "%s", msg_log.test_log.log_buf);
+			fprintf(log_fp, "%s", msg_log.test_log.log_buf);
 			if (msg_log.test_log.is_last)
 				break;
 		}
-		fclose(log_fd);
-		log_fd = NULL;
+		fclose(log_fp);
+		log_fp = NULL;
 	}
 	/* output log */
 	{
···
 	if (env.debug)
 		fprintf(stderr, "[%d]: Protocol/IO error: %s.\n", data->worker_id, strerror(errno));
 
-	if (log_fd)
-		fclose(log_fd);
+	if (log_fp)
+		fclose(log_fp);
 done:
 	{
 		struct msg msg_exit;
···
 		env.sub_succ_cnt += result->sub_succ_cnt;
 	}
 
+	print_all_error_logs();
+
 	fprintf(stdout, "Summary: %d/%d PASSED, %d SKIPPED, %d FAILED\n",
 		env.succ_cnt, env.sub_succ_cnt, env.skip_cnt, env.fail_cnt);
-
-	print_all_error_logs();
 
 	/* reap all workers */
 	for (i = 0; i < env.workers; i++) {
···
 	if (env.list_test_names)
 		goto out;
 
+	print_all_error_logs();
+
 	fprintf(stdout, "Summary: %d/%d PASSED, %d SKIPPED, %d FAILED\n",
 		env.succ_cnt, env.sub_succ_cnt, env.skip_cnt, env.fail_cnt);
-
-	print_all_error_logs();
 
 	close(env.saved_netns_fd);
 out:
+25 -12
tools/testing/selftests/bpf/test_sock_addr.c
···
 
 static int load_path(const struct sock_addr_test *test, const char *path)
 {
-	struct bpf_prog_load_attr attr;
 	struct bpf_object *obj;
-	int prog_fd;
+	struct bpf_program *prog;
+	int err;
 
-	memset(&attr, 0, sizeof(struct bpf_prog_load_attr));
-	attr.file = path;
-	attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
-	attr.expected_attach_type = test->expected_attach_type;
-	attr.prog_flags = BPF_F_TEST_RND_HI32;
-
-	if (bpf_prog_load_xattr(&attr, &obj, &prog_fd)) {
-		if (test->expected_result != LOAD_REJECT)
-			log_err(">>> Loading program (%s) error.\n", path);
+	obj = bpf_object__open_file(path, NULL);
+	err = libbpf_get_error(obj);
+	if (err) {
+		log_err(">>> Opening BPF object (%s) error.\n", path);
 		return -1;
 	}
 
-	return prog_fd;
+	prog = bpf_object__next_program(obj, NULL);
+	if (!prog)
+		goto err_out;
+
+	bpf_program__set_type(prog, BPF_PROG_TYPE_CGROUP_SOCK_ADDR);
+	bpf_program__set_expected_attach_type(prog, test->expected_attach_type);
+	bpf_program__set_flags(prog, BPF_F_TEST_RND_HI32);
+
+	err = bpf_object__load(obj);
+	if (err) {
+		if (test->expected_result != LOAD_REJECT)
+			log_err(">>> Loading program (%s) error.\n", path);
+		goto err_out;
+	}
+
+	return bpf_program__fd(prog);
+err_out:
+	bpf_object__close(obj);
+	return -1;
 }
 
 static int bind4_prog_load(const struct sock_addr_test *test)
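Editor's note: the deprecated `bpf_prog_load_xattr()` bundled open, configure, and load into one call; its replacement splits the flow into open, then per-program setters, then load, with one error path that closes the object. A sketch of that control-flow shape using stand-in types (obj_open/obj_load/etc. are illustrative helpers, not libbpf APIs):

```c
#include <stddef.h>

struct obj { int opened; int type; int loaded; };

static int obj_open(struct obj *o)  { o->opened = 1; return 0; }
static void set_type(struct obj *o, int type) { o->type = type; }
static int obj_load(struct obj *o)
{
	/* Configuration must happen between open and load. */
	if (!o->opened || !o->type)
		return -1;
	o->loaded = 1;
	return 0;
}
static void obj_close(struct obj *o) { o->opened = o->loaded = 0; }

/* Open -> configure -> load, with a single cleanup path on failure. */
static int load_path(struct obj *o, int type)
{
	if (obj_open(o))
		return -1;
	set_type(o, type);	/* only effective before obj_load() */
	if (obj_load(o))
		goto err_out;
	return 0;
err_out:
	obj_close(o);
	return -1;
}
```

The key point the diff illustrates: program type, attach type, and flags are now set on the open-but-not-yet-loaded object, rather than passed through an attribute struct.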
+3 -2
tools/testing/selftests/bpf/test_tag.c
···
 
 int main(void)
 {
+	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);
 	uint32_t tests = 0;
 	int i, fd_map;
 
-	fd_map = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(int),
-				sizeof(int), 1, BPF_F_NO_PREALLOC);
+	fd_map = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(int),
+				sizeof(int), 1, &opts);
 	assert(fd_map > 0);
 
 	for (i = 0; i < 5; i++) {
+23 -31
tools/testing/selftests/bpf/test_verifier.c
···
 			uint32_t size_value, uint32_t max_elem,
 			uint32_t extra_flags)
 {
+	LIBBPF_OPTS(bpf_map_create_opts, opts);
 	int fd;
 
-	fd = bpf_create_map(type, size_key, size_value, max_elem,
-			    (type == BPF_MAP_TYPE_HASH ?
-			     BPF_F_NO_PREALLOC : 0) | extra_flags);
+	opts.map_flags = (type == BPF_MAP_TYPE_HASH ? BPF_F_NO_PREALLOC : 0) | extra_flags;
+	fd = bpf_map_create(type, NULL, size_key, size_value, max_elem, &opts);
 	if (fd < 0) {
 		if (skip_unsupported_map(type))
 			return -1;
···
 {
 	int mfd, p1fd, p2fd, p3fd;
 
-	mfd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, sizeof(int),
-			     sizeof(int), max_elem, 0);
+	mfd = bpf_map_create(BPF_MAP_TYPE_PROG_ARRAY, NULL, sizeof(int),
+			     sizeof(int), max_elem, NULL);
 	if (mfd < 0) {
 		if (skip_unsupported_map(BPF_MAP_TYPE_PROG_ARRAY))
 			return -1;
···
 
 static int create_map_in_map(void)
 {
+	LIBBPF_OPTS(bpf_map_create_opts, opts);
 	int inner_map_fd, outer_map_fd;
 
-	inner_map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
-				      sizeof(int), 1, 0);
+	inner_map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, sizeof(int),
+				      sizeof(int), 1, NULL);
 	if (inner_map_fd < 0) {
 		if (skip_unsupported_map(BPF_MAP_TYPE_ARRAY))
 			return -1;
···
 		return inner_map_fd;
 	}
 
-	outer_map_fd = bpf_create_map_in_map(BPF_MAP_TYPE_ARRAY_OF_MAPS, NULL,
-					     sizeof(int), inner_map_fd, 1, 0);
+	opts.inner_map_fd = inner_map_fd;
+	outer_map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY_OF_MAPS, NULL,
+				      sizeof(int), sizeof(int), 1, &opts);
 	if (outer_map_fd < 0) {
 		if (skip_unsupported_map(BPF_MAP_TYPE_ARRAY_OF_MAPS))
 			return -1;
···
 		BPF_MAP_TYPE_CGROUP_STORAGE;
 	int fd;
 
-	fd = bpf_create_map(type, sizeof(struct bpf_cgroup_storage_key),
-			    TEST_DATA_LEN, 0, 0);
+	fd = bpf_map_create(type, NULL, sizeof(struct bpf_cgroup_storage_key),
+			    TEST_DATA_LEN, 0, NULL);
 	if (fd < 0) {
 		if (skip_unsupported_map(type))
 			return -1;
···
 	memcpy(ptr, btf_str_sec, hdr.str_len);
 	ptr += hdr.str_len;
 
-	btf_fd = bpf_load_btf(raw_btf, ptr - raw_btf, 0, 0, 0);
+	btf_fd = bpf_btf_load(raw_btf, ptr - raw_btf, NULL);
 	free(raw_btf);
 	if (btf_fd < 0)
 		return -1;
···
 
 static int create_map_spin_lock(void)
 {
-	struct bpf_create_map_attr attr = {
-		.name = "test_map",
-		.map_type = BPF_MAP_TYPE_ARRAY,
-		.key_size = 4,
-		.value_size = 8,
-		.max_entries = 1,
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
 		.btf_key_type_id = 1,
 		.btf_value_type_id = 3,
-	};
+	);
 	int fd, btf_fd;
 
 	btf_fd = load_btf();
 	if (btf_fd < 0)
 		return -1;
-	attr.btf_fd = btf_fd;
-	fd = bpf_create_map_xattr(&attr);
+	opts.btf_fd = btf_fd;
+	fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "test_map", 4, 8, 1, &opts);
 	if (fd < 0)
 		printf("Failed to create map with spin_lock\n");
 	return fd;
···
 
 static int create_sk_storage_map(void)
 {
-	struct bpf_create_map_attr attr = {
-		.name = "test_map",
-		.map_type = BPF_MAP_TYPE_SK_STORAGE,
-		.key_size = 4,
-		.value_size = 8,
-		.max_entries = 0,
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
 		.map_flags = BPF_F_NO_PREALLOC,
 		.btf_key_type_id = 1,
 		.btf_value_type_id = 3,
-	};
+	);
 	int fd, btf_fd;
 
 	btf_fd = load_btf();
 	if (btf_fd < 0)
 		return -1;
-	attr.btf_fd = btf_fd;
-	fd = bpf_create_map_xattr(&attr);
-	close(attr.btf_fd);
+	opts.btf_fd = btf_fd;
+	fd = bpf_map_create(BPF_MAP_TYPE_SK_STORAGE, "test_map", 4, 8, 0, &opts);
+	close(opts.btf_fd);
 	if (fd < 0)
 		printf("Failed to create sk_storage_map\n");
 	return fd;
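Editor's note: `load_btf()` above hand-assembles a raw BTF blob (header, type section, string section) before handing it to the new `bpf_btf_load()`. A sketch of the fixed-size header it fills in, mirroring `struct btf_header` from the UAPI `<linux/btf.h>` (BTF magic is 0xeB9F, version 1); the init helper here is illustrative, not kernel code:

```c
#include <stdint.h>
#include <string.h>

struct btf_header {
	uint16_t magic;		/* 0xeB9F */
	uint8_t  version;	/* 1 */
	uint8_t  flags;
	uint32_t hdr_len;	/* sizeof(struct btf_header) */
	uint32_t type_off;	/* type section, relative to end of header */
	uint32_t type_len;
	uint32_t str_off;	/* string section, relative to end of header */
	uint32_t str_len;
};

/* Fill a header for a blob where the string section immediately follows
 * the type section, as the test's load_btf() lays it out. */
static void btf_header_init(struct btf_header *hdr, uint32_t type_len,
			    uint32_t str_len)
{
	memset(hdr, 0, sizeof(*hdr));
	hdr->magic = 0xeB9F;
	hdr->version = 1;
	hdr->hdr_len = sizeof(*hdr);
	hdr->type_off = 0;
	hdr->type_len = type_len;
	hdr->str_off = type_len;	/* strings follow the types */
	hdr->str_len = str_len;
}
```

The `ptr - raw_btf` length passed to `bpf_btf_load()` is then simply `hdr_len + type_len + str_len`.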
+8 -6
tools/testing/selftests/bpf/testing_helpers.c
···
 int bpf_prog_test_load(const char *file, enum bpf_prog_type type,
 		       struct bpf_object **pobj, int *prog_fd)
 {
-	struct bpf_object_load_attr attr = {};
+	LIBBPF_OPTS(bpf_object_open_opts, opts,
+		.kernel_log_level = extra_prog_load_log_flags,
+	);
 	struct bpf_object *obj;
 	struct bpf_program *prog;
+	__u32 flags;
 	int err;
 
-	obj = bpf_object__open(file);
+	obj = bpf_object__open_file(file, &opts);
 	if (!obj)
 		return -errno;
 
···
 	if (type != BPF_PROG_TYPE_UNSPEC)
 		bpf_program__set_type(prog, type);
 
-	bpf_program__set_extra_flags(prog, BPF_F_TEST_RND_HI32);
+	flags = bpf_program__flags(prog) | BPF_F_TEST_RND_HI32;
+	bpf_program__set_flags(prog, flags);
 
-	attr.obj = obj;
-	attr.log_level = extra_prog_load_log_flags;
-	err = bpf_object__load_xattr(&attr);
+	err = bpf_object__load(obj);
 	if (err)
 		goto err_out;
 
+31 -15
tools/testing/selftests/bpf/vmtest.sh
···
 set -u
 set -e
 
-# This script currently only works for x86_64, as
-# it is based on the VM image used by the BPF CI which is
-# x86_64.
-QEMU_BINARY="${QEMU_BINARY:="qemu-system-x86_64"}"
-X86_BZIMAGE="arch/x86/boot/bzImage"
+# This script currently only works for x86_64 and s390x, as
+# it is based on the VM image used by the BPF CI, which is
+# available only for these architectures.
+ARCH="$(uname -m)"
+case "${ARCH}" in
+s390x)
+	QEMU_BINARY=qemu-system-s390x
+	QEMU_CONSOLE="ttyS1"
+	QEMU_FLAGS=(-smp 2)
+	BZIMAGE="arch/s390/boot/compressed/vmlinux"
+	;;
+x86_64)
+	QEMU_BINARY=qemu-system-x86_64
+	QEMU_CONSOLE="ttyS0,115200"
+	QEMU_FLAGS=(-cpu host -smp 8)
+	BZIMAGE="arch/x86/boot/bzImage"
+	;;
+*)
+	echo "Unsupported architecture"
+	exit 1
+	;;
+esac
 DEFAULT_COMMAND="./test_progs"
 MOUNT_DIR="mnt"
 ROOTFS_IMAGE="root.img"
 OUTPUT_DIR="$HOME/.bpf_selftests"
-KCONFIG_URL="https://raw.githubusercontent.com/libbpf/libbpf/master/travis-ci/vmtest/configs/latest.config"
-KCONFIG_API_URL="https://api.github.com/repos/libbpf/libbpf/contents/travis-ci/vmtest/configs/latest.config"
+KCONFIG_URL="https://raw.githubusercontent.com/libbpf/libbpf/master/travis-ci/vmtest/configs/config-latest.${ARCH}"
+KCONFIG_API_URL="https://api.github.com/repos/libbpf/libbpf/contents/travis-ci/vmtest/configs/config-latest.${ARCH}"
 INDEX_URL="https://raw.githubusercontent.com/libbpf/libbpf/master/travis-ci/vmtest/configs/INDEX"
 NUM_COMPILE_JOBS="$(nproc)"
 LOG_FILE_BASE="$(date +"bpf_selftests.%Y-%m-%d_%H-%M-%S")"
···
 {
 	{
 	for file in "${!URLS[@]}"; do
-		if [[ $file =~ ^libbpf-vmtest-rootfs-(.*)\.tar\.zst$ ]]; then
+		if [[ $file =~ ^"${ARCH}"/libbpf-vmtest-rootfs-(.*)\.tar\.zst$ ]]; then
 			echo "${BASH_REMATCH[1]}"
 		fi
 	done
···
 		exit 1
 	fi
 
-	download "libbpf-vmtest-rootfs-$rootfsversion.tar.zst" |
+	download "${ARCH}/libbpf-vmtest-rootfs-$rootfsversion.tar.zst" |
 		zstd -d | sudo tar -C "$dir" -x
 }
···
 		-nodefaults \
 		-display none \
 		-serial mon:stdio \
-		-cpu host \
+		"${qemu_flags[@]}" \
 		-enable-kvm \
-		-smp 8 \
 		-m 4G \
 		-drive file="${rootfs_img}",format=raw,index=1,media=disk,if=virtio,cache=none \
 		-kernel "${kernel_bzimage}" \
-		-append "root=/dev/vda rw console=ttyS0,115200"
+		-append "root=/dev/vda rw console=${QEMU_CONSOLE}"
 }
 
 copy_logs()
···
 	local kernel_checkout=$(realpath "${script_dir}"/../../../../)
 	# By default the script searches for the kernel in the checkout directory but
 	# it also obeys environment variables O= and KBUILD_OUTPUT=
-	local kernel_bzimage="${kernel_checkout}/${X86_BZIMAGE}"
+	local kernel_bzimage="${kernel_checkout}/${BZIMAGE}"
 	local command="${DEFAULT_COMMAND}"
 	local update_image="no"
 	local exit_command="poweroff -f"
···
 		if is_rel_path "${O}"; then
 			O="$(realpath "${PWD}/${O}")"
 		fi
-		kernel_bzimage="${O}/${X86_BZIMAGE}"
+		kernel_bzimage="${O}/${BZIMAGE}"
 		make_command="${make_command} O=${O}"
 	elif [[ "${KBUILD_OUTPUT:=""}" != "" ]]; then
 		if is_rel_path "${KBUILD_OUTPUT}"; then
 			KBUILD_OUTPUT="$(realpath "${PWD}/${KBUILD_OUTPUT}")"
 		fi
-		kernel_bzimage="${KBUILD_OUTPUT}/${X86_BZIMAGE}"
+		kernel_bzimage="${KBUILD_OUTPUT}/${BZIMAGE}"
 		make_command="${make_command} KBUILD_OUTPUT=${KBUILD_OUTPUT}"
 	fi
 
+8 -7
tools/testing/selftests/bpf/xdp_redirect_multi.c
···
 {
 	int prog_fd, group_all, mac_map;
 	struct bpf_program *ingress_prog, *egress_prog;
-	struct bpf_prog_load_attr prog_load_attr = {
-		.prog_type = BPF_PROG_TYPE_UNSPEC,
-	};
-	int i, ret, opt, egress_prog_fd = 0;
+	int i, err, ret, opt, egress_prog_fd = 0;
 	struct bpf_devmap_val devmap_val;
 	bool attach_egress_prog = false;
 	unsigned char mac_addr[6];
···
 	printf("\n");
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
-	prog_load_attr.file = filename;
-
-	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+	obj = bpf_object__open_file(filename, NULL);
+	err = libbpf_get_error(obj);
+	if (err)
 		goto err_out;
+	err = bpf_object__load(obj);
+	if (err)
+		goto err_out;
+	prog_fd = bpf_program__fd(bpf_object__next_program(obj, NULL));
 
 	if (attach_egress_prog)
 		group_all = bpf_object__find_map_fd_by_name(obj, "map_egress");
+11 -1
tools/testing/selftests/bpf/xdpxceiver.c
···
 #include "xdpxceiver.h"
 #include "../kselftest.h"
 
+/* AF_XDP APIs were moved into libxdp and marked as deprecated in libbpf.
+ * Until xdpxceiver is either moved or re-writed into libxdp, suppress
+ * deprecation warnings in this file
+ */
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
+
 static const char *MAC1 = "\x00\x0A\x56\x9E\xEE\x62";
 static const char *MAC2 = "\x00\x0A\x56\x9E\xEE\x61";
 static const char *IP1 = "192.168.100.162";
···
 	void *bufs;
 
 	bufs = mmap(NULL, mmap_sz, PROT_READ | PROT_WRITE,
-		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_HUGETLB, -1, 0);
+		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
 	if (bufs == MAP_FAILED)
 		return false;
···
 		testapp_invalid_desc(test);
 		break;
 	case TEST_TYPE_UNALIGNED_INV_DESC:
+		if (!hugepages_present(test->ifobj_tx)) {
+			ksft_test_result_skip("No 2M huge pages present.\n");
+			return;
+		}
 		test_spec_set_name(test, "UNALIGNED_INV_DESC");
 		test->ifobj_tx->umem->unaligned_mode = true;
 		test->ifobj_rx->umem->unaligned_mode = true;
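Editor's note: the xdpxceiver change silences `-Wdeprecated-declarations` file-wide with a pragma rather than annotating every AF_XDP call site. A minimal illustration of the mechanism (`old_api` is a made-up function for demonstration):

```c
#include <stdio.h>

/* Suppress deprecation warnings for everything below this point in the
 * translation unit, as the xdpxceiver.c hunk above does. */
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"

__attribute__((deprecated("moved elsewhere, illustrative only")))
static int old_api(void)
{
	return 42;
}

static int use_old_api(void)
{
	/* Would trigger -Wdeprecated-declarations without the pragma. */
	return old_api();
}
```

A narrower alternative is the `#pragma GCC diagnostic push` / `ignored` / `pop` triple around just the affected calls; the file-wide form is the pragmatic choice here because nearly every call in the file is affected.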