
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2022-10-03

We've added 143 non-merge commits during the last 27 day(s) which contain
a total of 151 files changed, 8321 insertions(+), 1402 deletions(-).

The main changes are:

1) Add kfuncs for PKCS#7 signature verification from BPF programs, from Roberto Sassu.

2) Add support for struct-based arguments for trampoline based BPF programs,
from Yonghong Song.

3) Fix entry IP for kprobe-multi and trampoline probes when IBT is enabled, from Jiri Olsa.

4) Batch of improvements to the veristat selftest tool, in particular adding CSV output,
a comparison mode for CSV outputs, and filtering, from Andrii Nakryiko.

5) Add preparatory changes needed for the BPF core for upcoming BPF HID support,
from Benjamin Tissoires.

6) Support for direct writes to nf_conn's mark field from tc and XDP BPF program
types, from Daniel Xu.

7) Initial batch of documentation improvements for BPF insn set spec, from Dave Thaler.

8) Add a new BPF_MAP_TYPE_USER_RINGBUF map which provides single-user-space-producer /
single-kernel-consumer semantics for BPF ring buffer, from David Vernet.

9) Follow-up fixes to BPF allocator under RT to always use raw spinlock for the BPF
hashtab's bucket lock, from Hou Tao.

10) Allow creating an iterator that loops through only the resources of one
task/thread instead of all, from Kui-Feng Lee.

11) Add support for kptrs in the per-CPU arraymap, from Kumar Kartikeya Dwivedi.

12) Add a new kfunc helper for nf to set src/dst NAT IP/port in a newly allocated CT
entry which is not yet inserted, from Lorenzo Bianconi.

13) Remove invalid recursion check for struct_ops for TCP congestion control BPF
programs, from Martin KaFai Lau.

14) Fix W^X issue with BPF trampoline and BPF dispatcher, from Song Liu.

15) Fix percpu_counter leakage in BPF hashtab allocation error path, from Tetsuo Handa.

16) Various cleanups in BPF selftests to use preferred ASSERT_* macros, from Wang Yufen.

17) Add invocation for cgroup/connect{4,6} BPF programs for ICMP pings, from YiFei Zhu.

18) Lift blinding decision under bpf_jit_harden = 1 to bpf_capable(), from Yauheni Kaliuta.

19) Various libbpf fixes and cleanups including a libbpf NULL pointer deref, from Xin Liu.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (143 commits)
net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c
Documentation: bpf: Add implementation notes documentations to table of contents
bpf, docs: Delete misformatted table.
selftests/xsk: Fix double free
bpftool: Fix error message of strerror
libbpf: Fix overrun in netlink attribute iteration
selftests/bpf: Fix spelling mistake "unpriviledged" -> "unprivileged"
samples/bpf: Fix typo in xdp_router_ipv4 sample
bpftool: Remove unused struct event_ring_info
bpftool: Remove unused struct btf_attach_point
bpf, docs: Add TOC and fix formatting.
bpf, docs: Add Clang note about BPF_ALU
bpf, docs: Move Clang notes to a separate file
bpf, docs: Linux byteswap note
bpf, docs: Move legacy packet instructions to a separate file
selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION)
bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function
bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()
bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline
...
====================

Link: https://lore.kernel.org/r/20221003194915.11847-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+8321 -1402
+3
Documentation/admin-guide/sysctl/net.rst
···
102 102 - 1 - enable JIT hardening for unprivileged users only
103 103 - 2 - enable JIT hardening for all users
104 104
105 + where "privileged user" in this context means a process having
106 + CAP_BPF or CAP_SYS_ADMIN in the root user name space.
107 +
105 108 bpf_jit_kallsyms
106 109 ----------------
107 110
+30
Documentation/bpf/clang-notes.rst
···
1 + .. contents::
2 + .. sectnum::
3 +
4 + ==========================
5 + Clang implementation notes
6 + ==========================
7 +
8 + This document provides more details specific to the Clang/LLVM implementation of the eBPF instruction set.
9 +
10 + Versions
11 + ========
12 +
13 + Clang defined "CPU" versions, where a CPU version of 3 corresponds to the current eBPF ISA.
14 +
15 + Clang can select the eBPF ISA version using ``-mcpu=v3`` for example to select version 3.
16 +
17 + Arithmetic instructions
18 + =======================
19 +
20 + For CPU versions prior to 3, Clang v7.0 and later can enable ``BPF_ALU`` support with
21 + ``-Xclang -target-feature -Xclang +alu32``. In CPU version 3, support is automatically included.
22 +
23 + Atomic operations
24 + =================
25 +
26 + Clang can generate atomic instructions by default when ``-mcpu=v3`` is
27 + enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
28 + Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
29 + the atomics features, while keeping a lower ``-mcpu`` version, you can use
30 + ``-Xclang -target-feature -Xclang +alu32``.
+2
Documentation/bpf/index.rst
···
26 26 classic_vs_extended.rst
27 27 bpf_licensing
28 28 test_debug
29 + clang-notes
30 + linux-notes
29 31 other
30 32
31 33 .. only:: subproject and html
+136 -174
Documentation/bpf/instruction-set.rst
···
1 + .. contents::
2 + .. sectnum::
1 3
2 - ====================
3 - eBPF Instruction Set
4 - ====================
4 + ========================================
5 + eBPF Instruction Set Specification, v1.0
6 + ========================================
7 +
8 + This document specifies version 1.0 of the eBPF instruction set.
9 +
5 10
6 11 Registers and calling convention
7 12 ================================
···
16 11
17 12 The eBPF calling convention is defined as:
18 13
19 - * R0: return value from function calls, and exit value for eBPF programs
20 - * R1 - R5: arguments for function calls
21 - * R6 - R9: callee saved registers that function calls will preserve
22 - * R10: read-only frame pointer to access stack
14 + * R0: return value from function calls, and exit value for eBPF programs
15 + * R1 - R5: arguments for function calls
16 + * R6 - R9: callee saved registers that function calls will preserve
17 + * R10: read-only frame pointer to access stack
23 18
24 19 R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if
25 20 necessary across calls.
···
29 24
30 25 eBPF has two instruction encodings:
31 26
32 - * the basic instruction encoding, which uses 64 bits to encode an instruction
33 - * the wide instruction encoding, which appends a second 64-bit immediate value
34 - (imm64) after the basic instruction for a total of 128 bits.
27 + * the basic instruction encoding, which uses 64 bits to encode an instruction
28 + * the wide instruction encoding, which appends a second 64-bit immediate value
29 + (imm64) after the basic instruction for a total of 128 bits.
35 30
36 31 The basic instruction encoding looks as follows:
37 32
38 - ============= ======= =============== ==================== ============
39 - 32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
40 - ============= ======= =============== ==================== ============
41 - immediate offset source register destination register opcode
42 - ============= ======= =============== ==================== ============
33 + ============= ======= =============== ==================== ============
34 + 32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
35 + ============= ======= =============== ==================== ============
36 + immediate offset source register destination register opcode
37 + ============= ======= =============== ==================== ============
43 38
44 39 Note that most instructions do not use all of the fields.
45 40 Unused fields shall be cleared to zero.
···
49 44
50 45 The three LSB bits of the 'opcode' field store the instruction class:
51 46
52 - ========= ===== ===============================
53 - class value description
54 - ========= ===== ===============================
55 - BPF_LD 0x00 non-standard load operations
56 - BPF_LDX 0x01 load into register operations
57 - BPF_ST 0x02 store from immediate operations
58 - BPF_STX 0x03 store from register operations
59 - BPF_ALU 0x04 32-bit arithmetic operations
60 - BPF_JMP 0x05 64-bit jump operations
61 - BPF_JMP32 0x06 32-bit jump operations
62 - BPF_ALU64 0x07 64-bit arithmetic operations
63 - ========= ===== ===============================
47 + ========= ===== =============================== ===================================
48 + class value description reference
49 + ========= ===== =============================== ===================================
50 + BPF_LD 0x00 non-standard load operations `Load and store instructions`_
51 + BPF_LDX 0x01 load into register operations `Load and store instructions`_
52 + BPF_ST 0x02 store from immediate operations `Load and store instructions`_
53 + BPF_STX 0x03 store from register operations `Load and store instructions`_
54 + BPF_ALU 0x04 32-bit arithmetic operations `Arithmetic and jump instructions`_
55 + BPF_JMP 0x05 64-bit jump operations `Arithmetic and jump instructions`_
56 + BPF_JMP32 0x06 32-bit jump operations `Arithmetic and jump instructions`_
57 + BPF_ALU64 0x07 64-bit arithmetic operations `Arithmetic and jump instructions`_
58 + ========= ===== =============================== ===================================
64 59
65 60 Arithmetic and jump instructions
66 61 ================================
67 62
68 - For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and
69 - BPF_JMP32), the 8-bit 'opcode' field is divided into three parts:
63 + For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
64 + ``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:
70 65
71 - ============== ====== =================
72 - 4 bits (MSB) 1 bit 3 bits (LSB)
73 - ============== ====== =================
74 - operation code source instruction class
75 - ============== ====== =================
66 + ============== ====== =================
67 + 4 bits (MSB) 1 bit 3 bits (LSB)
68 + ============== ====== =================
69 + operation code source instruction class
70 + ============== ====== =================
76 71
77 72 The 4th bit encodes the source operand:
78 73
···
89 84 Arithmetic instructions
90 85 -----------------------
91 86
92 - BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for
87 + ``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
93 88 otherwise identical operations.
94 - The code field encodes the operation as below:
89 + The 'code' field encodes the operation as below:
95 90
96 - ======== ===== =================================================
97 - code value description
98 - ======== ===== =================================================
99 - BPF_ADD 0x00 dst += src
100 - BPF_SUB 0x10 dst -= src
101 - BPF_MUL 0x20 dst \*= src
102 - BPF_DIV 0x30 dst /= src
103 - BPF_OR 0x40 dst \|= src
104 - BPF_AND 0x50 dst &= src
105 - BPF_LSH 0x60 dst <<= src
106 - BPF_RSH 0x70 dst >>= src
107 - BPF_NEG 0x80 dst = ~src
108 - BPF_MOD 0x90 dst %= src
109 - BPF_XOR 0xa0 dst ^= src
110 - BPF_MOV 0xb0 dst = src
111 - BPF_ARSH 0xc0 sign extending shift right
112 - BPF_END 0xd0 byte swap operations (see separate section below)
113 - ======== ===== =================================================
91 + ======== ===== ==========================================================
92 + code value description
93 + ======== ===== ==========================================================
94 + BPF_ADD 0x00 dst += src
95 + BPF_SUB 0x10 dst -= src
96 + BPF_MUL 0x20 dst \*= src
97 + BPF_DIV 0x30 dst /= src
98 + BPF_OR 0x40 dst \|= src
99 + BPF_AND 0x50 dst &= src
100 + BPF_LSH 0x60 dst <<= src
101 + BPF_RSH 0x70 dst >>= src
102 + BPF_NEG 0x80 dst = ~src
103 + BPF_MOD 0x90 dst %= src
104 + BPF_XOR 0xa0 dst ^= src
105 + BPF_MOV 0xb0 dst = src
106 + BPF_ARSH 0xc0 sign extending shift right
107 + BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below)
108 + ======== ===== ==========================================================
114 109
115 - BPF_ADD | BPF_X | BPF_ALU means::
110 + ``BPF_ADD | BPF_X | BPF_ALU`` means::
116 111
117 112 dst_reg = (u32) dst_reg + (u32) src_reg;
118 113
119 - BPF_ADD | BPF_X | BPF_ALU64 means::
114 + ``BPF_ADD | BPF_X | BPF_ALU64`` means::
120 115
121 116 dst_reg = dst_reg + src_reg
122 117
123 - BPF_XOR | BPF_K | BPF_ALU means::
118 + ``BPF_XOR | BPF_K | BPF_ALU`` means::
124 119
125 120 src_reg = (u32) src_reg ^ (u32) imm32
126 121
127 - BPF_XOR | BPF_K | BPF_ALU64 means::
122 + ``BPF_XOR | BPF_K | BPF_ALU64`` means::
128 123
129 124 src_reg = src_reg ^ imm32
130 125
131 126
132 127 Byte swap instructions
133 - ----------------------
128 + ~~~~~~~~~~~~~~~~~~~~~~
134 129
135 130 The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
136 - code field of ``BPF_END``.
131 + 'code' field of ``BPF_END``.
137 132
138 133 The byte swap instructions operate on the destination register
139 134 only and do not use a separate source register or immediate value.
···
141 136 The 1-bit source operand field in the opcode is used to to select what byte
142 137 order the operation convert from or to:
143 138
144 - ========= ===== =================================================
145 - source value description
146 - ========= ===== =================================================
147 - BPF_TO_LE 0x00 convert between host byte order and little endian
148 - BPF_TO_BE 0x08 convert between host byte order and big endian
149 - ========= ===== =================================================
139 + ========= ===== =================================================
140 + source value description
141 + ========= ===== =================================================
142 + BPF_TO_LE 0x00 convert between host byte order and little endian
143 + BPF_TO_BE 0x08 convert between host byte order and big endian
144 + ========= ===== =================================================
150 145
151 - The imm field encodes the width of the swap operations. The following widths
146 + The 'imm' field encodes the width of the swap operations. The following widths
152 147 are supported: 16, 32 and 64.
153 148
154 149 Examples:
···
161 156
162 157 dst_reg = htobe64(dst_reg)
163 158
164 - ``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and
165 - ``BPF_TO_BE`` respectively.
166 -
167 -
168 159 Jump instructions
169 160 -----------------
170 161
171 - BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for
162 + ``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
172 163 otherwise identical operations.
173 - The code field encodes the operation as below:
164 + The 'code' field encodes the operation as below:
174 165
175 - ======== ===== ========================= ============
176 - code value description notes
177 - ======== ===== ========================= ============
178 - BPF_JA 0x00 PC += off BPF_JMP only
179 - BPF_JEQ 0x10 PC += off if dst == src
180 - BPF_JGT 0x20 PC += off if dst > src unsigned
181 - BPF_JGE 0x30 PC += off if dst >= src unsigned
182 - BPF_JSET 0x40 PC += off if dst & src
183 - BPF_JNE 0x50 PC += off if dst != src
184 - BPF_JSGT 0x60 PC += off if dst > src signed
185 - BPF_JSGE 0x70 PC += off if dst >= src signed
186 - BPF_CALL 0x80 function call
187 - BPF_EXIT 0x90 function / program return BPF_JMP only
188 - BPF_JLT 0xa0 PC += off if dst < src unsigned
189 - BPF_JLE 0xb0 PC += off if dst <= src unsigned
190 - BPF_JSLT 0xc0 PC += off if dst < src signed
191 - BPF_JSLE 0xd0 PC += off if dst <= src signed
192 - ======== ===== ========================= ============
166 + ======== ===== ========================= ============
167 + code value description notes
168 + ======== ===== ========================= ============
169 + BPF_JA 0x00 PC += off BPF_JMP only
170 + BPF_JEQ 0x10 PC += off if dst == src
171 + BPF_JGT 0x20 PC += off if dst > src unsigned
172 + BPF_JGE 0x30 PC += off if dst >= src unsigned
173 + BPF_JSET 0x40 PC += off if dst & src
174 + BPF_JNE 0x50 PC += off if dst != src
175 + BPF_JSGT 0x60 PC += off if dst > src signed
176 + BPF_JSGE 0x70 PC += off if dst >= src signed
177 + BPF_CALL 0x80 function call
178 + BPF_EXIT 0x90 function / program return BPF_JMP only
179 + BPF_JLT 0xa0 PC += off if dst < src unsigned
180 + BPF_JLE 0xb0 PC += off if dst <= src unsigned
181 + BPF_JSLT 0xc0 PC += off if dst < src signed
182 + BPF_JSLE 0xd0 PC += off if dst <= src signed
183 + ======== ===== ========================= ============
193 184
194 185 The eBPF program needs to store the return value into register R0 before doing a
195 186 BPF_EXIT.
···
194 193 Load and store instructions
195 194 ===========================
196 195
197 - For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the
196 + For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
198 197 8-bit 'opcode' field is divided as:
199 198
200 - ============ ====== =================
201 - 3 bits (MSB) 2 bits 3 bits (LSB)
202 - ============ ====== =================
203 - mode size instruction class
204 - ============ ====== =================
199 + ============ ====== =================
200 + 3 bits (MSB) 2 bits 3 bits (LSB)
201 + ============ ====== =================
202 + mode size instruction class
203 + ============ ====== =================
204 +
205 + The mode modifier is one of:
206 +
207 + ============= ===== ==================================== =============
208 + mode modifier value description reference
209 + ============= ===== ==================================== =============
210 + BPF_IMM 0x00 64-bit immediate instructions `64-bit immediate instructions`_
211 + BPF_ABS 0x20 legacy BPF packet access (absolute) `Legacy BPF Packet access instructions`_
212 + BPF_IND 0x40 legacy BPF packet access (indirect) `Legacy BPF Packet access instructions`_
213 + BPF_MEM 0x60 regular load and store operations `Regular load and store operations`_
214 + BPF_ATOMIC 0xc0 atomic operations `Atomic operations`_
215 + ============= ===== ==================================== =============
205 216
206 217 The size modifier is one of:
207 218
···
225 212 BPF_B 0x10 byte
226 213 BPF_DW 0x18 double word (8 bytes)
227 214 ============= ===== =====================
228 -
229 - The mode modifier is one of:
230 -
231 - ============= ===== ====================================
232 - mode modifier value description
233 - ============= ===== ====================================
234 - BPF_IMM 0x00 64-bit immediate instructions
235 - BPF_ABS 0x20 legacy BPF packet access (absolute)
236 - BPF_IND 0x40 legacy BPF packet access (indirect)
237 - BPF_MEM 0x60 regular load and store operations
238 - BPF_ATOMIC 0xc0 atomic operations
239 - ============= ===== ====================================
240 -
241 215
242 216 Regular load and store operations
243 217 ---------------------------------
···
256 256 All atomic operations supported by eBPF are encoded as store operations
257 257 that use the ``BPF_ATOMIC`` mode modifier as follows:
258 258
259 - * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
260 - * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
261 - * 8-bit and 16-bit wide atomic operations are not supported.
259 + * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
260 + * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
261 + * 8-bit and 16-bit wide atomic operations are not supported.
262 262
263 - The imm field is used to encode the actual atomic operation.
263 + The 'imm' field is used to encode the actual atomic operation.
264 264 Simple atomic operation use a subset of the values defined to encode
265 - arithmetic operations in the imm field to encode the atomic operation:
265 + arithmetic operations in the 'imm' field to encode the atomic operation:
266 266
267 - ======== ===== ===========
268 - imm value description
269 - ======== ===== ===========
270 - BPF_ADD 0x00 atomic add
271 - BPF_OR 0x40 atomic or
272 - BPF_AND 0x50 atomic and
273 - BPF_XOR 0xa0 atomic xor
274 - ======== ===== ===========
267 + ======== ===== ===========
268 + imm value description
269 + ======== ===== ===========
270 + BPF_ADD 0x00 atomic add
271 + BPF_OR 0x40 atomic or
272 + BPF_AND 0x50 atomic and
273 + BPF_XOR 0xa0 atomic xor
274 + ======== ===== ===========
275 275
276 276
277 - ``BPF_ATOMIC | BPF_W | BPF_STX`` with imm = BPF_ADD means::
277 + ``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means::
278 278
279 279 *(u32 *)(dst_reg + off16) += src_reg
280 280
281 - ``BPF_ATOMIC | BPF_DW | BPF_STX`` with imm = BPF ADD means::
281 + ``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::
282 282
283 283 *(u64 *)(dst_reg + off16) += src_reg
284 -
285 - ``BPF_XADD`` is a deprecated name for ``BPF_ATOMIC | BPF_ADD``.
286 284
287 285 In addition to the simple atomic operations, there also is a modifier and
288 286 two complex atomic operations:
289 287
290 - =========== ================ ===========================
291 - imm value description
292 - =========== ================ ===========================
293 - BPF_FETCH 0x01 modifier: return old value
294 - BPF_XCHG 0xe0 | BPF_FETCH atomic exchange
295 - BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange
296 - =========== ================ ===========================
288 + =========== ================ ===========================
289 + imm value description
290 + =========== ================ ===========================
291 + BPF_FETCH 0x01 modifier: return old value
292 + BPF_XCHG 0xe0 | BPF_FETCH atomic exchange
293 + BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange
294 + =========== ================ ===========================
297 295
298 296 The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
299 297 always set for the complex atomic operations. If the ``BPF_FETCH`` flag
···
307 309 value that was at ``dst_reg + off`` before the operation is zero-extended
308 310 and loaded back to ``R0``.
309 311
310 - Clang can generate atomic instructions by default when ``-mcpu=v3`` is
311 - enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
312 - Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
313 - the atomics features, while keeping a lower ``-mcpu`` version, you can use
314 - ``-Xclang -target-feature -Xclang +alu32``.
315 -
316 312 64-bit immediate instructions
317 313 -----------------------------
318 314
319 - Instructions with the ``BPF_IMM`` mode modifier use the wide instruction
315 + Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
320 316 encoding for an extra imm64 value.
321 317
322 318 There is currently only one such instruction.
···
323 331 Legacy BPF Packet access instructions
324 332 -------------------------------------
325 333
326 - eBPF has special instructions for access to packet data that have been
327 - carried over from classic BPF to retain the performance of legacy socket
328 - filters running in the eBPF interpreter.
329 -
330 - The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
331 - ``BPF_IND | <size> | BPF_LD``.
332 -
333 - These instructions are used to access packet data and can only be used when
334 - the program context is a pointer to networking packet. ``BPF_ABS``
335 - accesses packet data at an absolute offset specified by the immediate data
336 - and ``BPF_IND`` access packet data at an offset that includes the value of
337 - a register in addition to the immediate data.
338 -
339 - These instructions have seven implicit operands:
340 -
341 - * Register R6 is an implicit input that must contain pointer to a
342 - struct sk_buff.
343 - * Register R0 is an implicit output which contains the data fetched from
344 - the packet.
345 - * Registers R1-R5 are scratch registers that are clobbered after a call to
346 - ``BPF_ABS | BPF_LD`` or ``BPF_IND | BPF_LD`` instructions.
347 -
348 - These instructions have an implicit program exit condition as well. When an
349 - eBPF program is trying to access the data beyond the packet boundary, the
350 - program execution will be aborted.
351 -
352 - ``BPF_ABS | BPF_W | BPF_LD`` means::
353 -
354 - R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + imm32))
355 -
356 - ``BPF_IND | BPF_W | BPF_LD`` means::
357 -
358 - R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
334 + eBPF previously introduced special instructions for access to packet data that were
335 + carried over from classic BPF. However, these instructions are
336 + deprecated and should no longer be used.
+16 -8
Documentation/bpf/kfuncs.rst
···
137 137 --------------------------
138 138
139 139 The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
140 - indicates that the all pointer arguments will always be refcounted, and have
141 - their offset set to 0. It can be used to enforce that a pointer to a refcounted
142 - object acquired from a kfunc or BPF helper is passed as an argument to this
143 - kfunc without any modifications (e.g. pointer arithmetic) such that it is
144 - trusted and points to the original object. This flag is often used for kfuncs
145 - that operate (change some property, perform some operation) on an object that
146 - was obtained using an acquire kfunc. Such kfuncs need an unchanged pointer to
147 - ensure the integrity of the operation being performed on the expected object.
140 + indicates that the all pointer arguments will always have a guaranteed lifetime,
141 + and pointers to kernel objects are always passed to helpers in their unmodified
142 + form (as obtained from acquire kfuncs).
143 +
144 + It can be used to enforce that a pointer to a refcounted object acquired from a
145 + kfunc or BPF helper is passed as an argument to this kfunc without any
146 + modifications (e.g. pointer arithmetic) such that it is trusted and points to
147 + the original object.
148 +
149 + Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
150 + but those can have a non-zero offset.
151 +
152 + This flag is often used for kfuncs that operate (change some property, perform
153 + some operation) on an object that was obtained using an acquire kfunc. Such
154 + kfuncs need an unchanged pointer to ensure the integrity of the operation being
155 + performed on the expected object.
148 156
149 157 2.4.6 KF_SLEEPABLE flag
150 158 -----------------------
+53
Documentation/bpf/linux-notes.rst
···
1 + .. contents::
2 + .. sectnum::
3 +
4 + ==========================
5 + Linux implementation notes
6 + ==========================
7 +
8 + This document provides more details specific to the Linux kernel implementation of the eBPF instruction set.
9 +
10 + Byte swap instructions
11 + ======================
12 +
13 + ``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and ``BPF_TO_BE`` respectively.
14 +
15 + Legacy BPF Packet access instructions
16 + =====================================
17 +
18 + As mentioned in the `ISA standard documentation <instruction-set.rst#legacy-bpf-packet-access-instructions>`_,
19 + Linux has special eBPF instructions for access to packet data that have been
20 + carried over from classic BPF to retain the performance of legacy socket
21 + filters running in the eBPF interpreter.
22 +
23 + The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and
24 + ``BPF_IND | <size> | BPF_LD``.
25 +
26 + These instructions are used to access packet data and can only be used when
27 + the program context is a pointer to a networking packet. ``BPF_ABS``
28 + accesses packet data at an absolute offset specified by the immediate data
29 + and ``BPF_IND`` access packet data at an offset that includes the value of
30 + a register in addition to the immediate data.
31 +
32 + These instructions have seven implicit operands:
33 +
34 + * Register R6 is an implicit input that must contain a pointer to a
35 + struct sk_buff.
36 + * Register R0 is an implicit output which contains the data fetched from
37 + the packet.
38 + * Registers R1-R5 are scratch registers that are clobbered by the
39 + instruction.
40 +
41 + These instructions have an implicit program exit condition as well. If an
42 + eBPF program attempts access data beyond the packet boundary, the
43 + program execution will be aborted.
44 +
45 + ``BPF_ABS | BPF_W | BPF_LD`` (0x20) means::
46 +
47 + R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + imm))
48 +
49 + where ``ntohl()`` converts a 32-bit value from network byte order to host byte order.
50 +
51 + ``BPF_IND | BPF_W | BPF_LD`` (0x40) means::
52 +
53 + R0 = ntohl(*(u32 *) ((struct sk_buff *) R6->data + src + imm))
+7 -1
arch/arm64/net/bpf_jit_comp.c
···
1970 1970 u32 flags, struct bpf_tramp_links *tlinks,
1971 1971 void *orig_call)
1972 1972 {
1973 - int ret;
1973 + int i, ret;
1974 1974 int nargs = m->nr_args;
1975 1975 int max_insns = ((long)image_end - (long)image) / AARCH64_INSN_SIZE;
1976 1976 struct jit_ctx ctx = {
···
1981 1981 /* the first 8 arguments are passed by registers */
1982 1982 if (nargs > 8)
1983 1983 return -ENOTSUPP;
1984 +
1985 + /* don't support struct argument */
1986 + for (i = 0; i < MAX_BPF_FUNC_ARGS; i++) {
1987 + if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
1988 + return -ENOTSUPP;
1989 + }
1984 1990
1985 1991 ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
1986 1992 if (ret < 0)
+1
arch/x86/Kconfig
···
284 284 select PROC_PID_ARCH_STATUS if PROC_FS
285 285 select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
286 286 imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
287 + select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
287 288
288 289 config INSTRUCTION_DECODER
289 290 def_bool y
+67 -31
arch/x86/net/bpf_jit_comp.c
···
662 662 */
663 663 emit_mov_imm32(&prog, false, dst_reg, imm32_lo);
664 664 } else {
665 - /* movabsq %rax, imm64 */
665 + /* movabsq rax, imm64 */
666 666 EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
667 667 EMIT(imm32_lo, 4);
668 668 EMIT(imm32_hi, 4);
···
1751 1751 static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_args,
1752 1752 int stack_size)
1753 1753 {
1754 - int i;
1754 + int i, j, arg_size, nr_regs;
1755 1755 /* Store function arguments to stack.
1756 1756 * For a function that accepts two pointers the sequence will be:
1757 1757 * mov QWORD PTR [rbp-0x10],rdi
1758 1758 * mov QWORD PTR [rbp-0x8],rsi
1759 1759 */
1760 - for (i = 0; i < min(nr_args, 6); i++)
1761 - emit_stx(prog, bytes_to_bpf_size(m->arg_size[i]),
1762 - BPF_REG_FP,
1763 - i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
1764 - -(stack_size - i * 8));
1760 + for (i = 0, j = 0; i < min(nr_args, 6); i++) {
1761 + if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG) {
1762 + nr_regs = (m->arg_size[i] + 7) / 8;
1763 + arg_size = 8;
1764 + } else {
1765 + nr_regs = 1;
1766 + arg_size = m->arg_size[i];
1767 + }
1768 +
1769 + while (nr_regs) {
1770 + emit_stx(prog, bytes_to_bpf_size(arg_size),
1771 + BPF_REG_FP,
1772 + j == 5 ? X86_REG_R9 : BPF_REG_1 + j,
1773 + -(stack_size - j * 8));
1774 + nr_regs--;
1775 + j++;
1776 + }
1777 + }
1765 1778 }
1766 1779
1767 1780 static void restore_regs(const struct btf_func_model *m, u8 **prog, int nr_args,
1768 1781 int stack_size)
1769 1782 {
1770 - int i;
1783 + int i, j, arg_size, nr_regs;
1771 1784
1772 1785 /* Restore function arguments from stack.
1773 1786 * For a function that accepts two pointers the sequence will be:
1774 1787 * EMIT4(0x48, 0x8B, 0x7D, 0xF0); mov rdi,QWORD PTR [rbp-0x10]
1775 1788 * EMIT4(0x48, 0x8B, 0x75, 0xF8); mov rsi,QWORD PTR [rbp-0x8]
1776 1789 */
1777 - for (i = 0; i < min(nr_args, 6); i++)
1778 - emit_ldx(prog, bytes_to_bpf_size(m->arg_size[i]),
1779 - i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
1780 - BPF_REG_FP,
1781 - -(stack_size - i * 8));
1790 + for (i = 0, j = 0; i < min(nr_args, 6); i++) {
1791 + if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG) {
1792 + nr_regs = (m->arg_size[i] + 7) / 8;
1793 + arg_size = 8;
1794 + } else {
1795 + nr_regs = 1;
1796 + arg_size = m->arg_size[i];
1797 + }
1798 +
1799 + while (nr_regs) {
1800 + emit_ldx(prog, bytes_to_bpf_size(arg_size),
1801 + j == 5 ? X86_REG_R9 : BPF_REG_1 + j,
1802 + BPF_REG_FP,
1803 + -(stack_size - j * 8));
1804 + nr_regs--;
1805 + j++;
1806 + }
1807 + }
1782 1808 }
1783 1809
1784 1810 static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
···
1836 1810 if (p->aux->sleepable) {
1837 1811 enter = __bpf_prog_enter_sleepable;
1838 1812 exit = __bpf_prog_exit_sleepable;
1813 + } else if (p->type == BPF_PROG_TYPE_STRUCT_OPS) {
1814 + enter = __bpf_prog_enter_struct_ops;
1815 + exit = __bpf_prog_exit_struct_ops;
1839 1816 } else if (p->expected_attach_type == BPF_LSM_CGROUP) {
1840 1817 enter = __bpf_prog_enter_lsm_cgroup;
1841 1818 exit = __bpf_prog_exit_lsm_cgroup;
···
2042 2013 int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end,
2043 2014 const struct btf_func_model *m, u32 flags,
2044 2015 struct bpf_tramp_links *tlinks,
2045 - void *orig_call)
2016 + void *func_addr)
2046 2017 {
2047 - int ret, i, nr_args = m->nr_args;
2018 + int ret, i, nr_args = m->nr_args, extra_nregs = 0;
2048 2019 int regs_off, ip_off, args_off, stack_size = nr_args * 8, run_ctx_off;
2049 2020 struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
2050 2021 struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
2051 2022 struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
2023 + void *orig_call = func_addr;
2052 2024 u8 **branches = NULL;
2053 2025 u8 *prog;
2054 2026 bool save_ret;
···
2057 2027 /* x86-64 supports up to 6 arguments. 7+ can be added in the future */
2058 2028 if (nr_args > 6)
2059 2029 return -ENOTSUPP;
2030 +
2031 + for (i = 0; i < MAX_BPF_FUNC_ARGS; i++) {
2032 + if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
2033 + extra_nregs += (m->arg_size[i] + 7) / 8 - 1;
2034 + }
2035 + if (nr_args + extra_nregs > 6)
2036 + return -ENOTSUPP;
2037 + stack_size += extra_nregs * 8;
2060 2038
2061 2039 /* Generated trampoline stack layout:
2062 2040 *
···
2078 2040 * [ ... ]
2079 2041 * RBP - regs_off [ reg_arg1 ] program's ctx pointer
2080 2042 *
2081 - * RBP - args_off [ args count ] always
2043 + * RBP - args_off [ arg regs count ] always
2082 2044 *
2083 2045 * RBP - ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
2084 2046 *
···
2121 2083 EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
2122 2084 EMIT1(0x53); /* push rbx */
2123 2085
2124 - /* Store number of arguments of the traced function:
2125 - * mov rax, nr_args
2086 + /* Store number of argument registers of the traced function:
2087 + * mov rax, nr_args + extra_nregs
2126 2088 * mov QWORD PTR [rbp - args_off], rax
2127 2089 */
2128 - emit_mov_imm64(&prog, BPF_REG_0, 0, (u32) nr_args);
2090 + emit_mov_imm64(&prog, BPF_REG_0, 0, (u32) nr_args + extra_nregs);
2129 2091 emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -args_off);
2130 2092
2131 2093 if (flags & BPF_TRAMP_F_IP_ARG) {
2132 2094 /* Store IP address of the traced function:
2133 - * mov rax, QWORD PTR [rbp + 8]
2134 - * sub rax, X86_PATCH_SIZE
2095 + * movabsq rax, func_addr
2135 2096 * mov QWORD PTR [rbp - ip_off], rax
2136 2097 */
2137 - emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
2138 - EMIT4(0x48, 0x83, 0xe8, X86_PATCH_SIZE);
2098 + emit_mov_imm64(&prog, BPF_REG_0, (long) func_addr >> 32, (u32) (long) func_addr);
2139 2099 emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -ip_off);
2140 2100 }
···
2245 2209 return ret;
2246 2210 }
2247 2211
2248 - static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
2212 + static int
emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs, u8 *image, u8 *buf) 2249 2213 { 2250 2214 u8 *jg_reloc, *prog = *pprog; 2251 2215 int pivot, err, jg_bytes = 1; ··· 2261 2225 EMIT2_off32(0x81, add_1reg(0xF8, BPF_REG_3), 2262 2226 progs[a]); 2263 2227 err = emit_cond_near_jump(&prog, /* je func */ 2264 - (void *)progs[a], prog, 2228 + (void *)progs[a], image + (prog - buf), 2265 2229 X86_JE); 2266 2230 if (err) 2267 2231 return err; 2268 2232 2269 - emit_indirect_jump(&prog, 2 /* rdx */, prog); 2233 + emit_indirect_jump(&prog, 2 /* rdx */, image + (prog - buf)); 2270 2234 2271 2235 *pprog = prog; 2272 2236 return 0; ··· 2291 2255 jg_reloc = prog; 2292 2256 2293 2257 err = emit_bpf_dispatcher(&prog, a, a + pivot, /* emit lower_part */ 2294 - progs); 2258 + progs, image, buf); 2295 2259 if (err) 2296 2260 return err; 2297 2261 ··· 2305 2269 emit_code(jg_reloc - jg_bytes, jg_offset, jg_bytes); 2306 2270 2307 2271 err = emit_bpf_dispatcher(&prog, a + pivot + 1, /* emit upper_part */ 2308 - b, progs); 2272 + b, progs, image, buf); 2309 2273 if (err) 2310 2274 return err; 2311 2275 ··· 2325 2289 return 0; 2326 2290 } 2327 2291 2328 - int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs) 2292 + int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs) 2329 2293 { 2330 - u8 *prog = image; 2294 + u8 *prog = buf; 2331 2295 2332 2296 sort(funcs, num_funcs, sizeof(funcs[0]), cmp_ips, NULL); 2333 - return emit_bpf_dispatcher(&prog, 0, num_funcs - 1, funcs); 2297 + return emit_bpf_dispatcher(&prog, 0, num_funcs - 1, funcs, image, buf); 2334 2298 } 2335 2299 2336 2300 struct x64_jit_data {
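The save_regs()/restore_regs() rework above spreads a by-value struct argument across (size + 7) / 8 argument registers, and arch_prepare_bpf_trampoline() now counts those extra registers up front before rejecting functions that exceed the 6-register limit. A userspace sketch of just that counting rule (nr_arg_regs and STRUCT_ARG are invented stand-ins for illustration, not kernel API):

```c
#include <assert.h>

#define STRUCT_ARG 0x1	/* stands in for BTF_FMODEL_STRUCT_ARG */

/* Count how many 8-byte argument registers a function model needs:
 * a by-value struct of N bytes is spread across (N + 7) / 8 registers,
 * every other argument takes exactly one. This mirrors the extra_nregs
 * accounting the trampoline code above performs. */
static int nr_arg_regs(const unsigned char *arg_size,
		       const unsigned char *arg_flags, int nr_args)
{
	int i, regs = 0;

	for (i = 0; i < nr_args; i++) {
		if (arg_flags[i] & STRUCT_ARG)
			regs += (arg_size[i] + 7) / 8;
		else
			regs += 1;
	}
	return regs;
}
```

A 12-byte struct argument therefore consumes two registers, which is exactly the case the new `nr_args + extra_nregs > 6` check guards against.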
+10 -1
include/asm-generic/vmlinux.lds.h
··· 154 154 #define MEM_DISCARD(sec) *(.mem##sec) 155 155 #endif 156 156 157 + #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_NO_PATCHABLE 158 + #define KEEP_PATCHABLE KEEP(*(__patchable_function_entries)) 159 + #define PATCHABLE_DISCARDS 160 + #else 161 + #define KEEP_PATCHABLE 162 + #define PATCHABLE_DISCARDS *(__patchable_function_entries) 163 + #endif 164 + 157 165 #ifdef CONFIG_FTRACE_MCOUNT_RECORD 158 166 /* 159 167 * The ftrace call sites are logged to a section whose name depends on the ··· 180 172 #define MCOUNT_REC() . = ALIGN(8); \ 181 173 __start_mcount_loc = .; \ 182 174 KEEP(*(__mcount_loc)) \ 183 - KEEP(*(__patchable_function_entries)) \ 175 + KEEP_PATCHABLE \ 184 176 __stop_mcount_loc = .; \ 185 177 ftrace_stub_graph = ftrace_stub; \ 186 178 ftrace_ops_list_func = arch_ftrace_ops_list_func; ··· 1031 1023 1032 1024 #define COMMON_DISCARDS \ 1033 1025 SANITIZER_DISCARDS \ 1026 + PATCHABLE_DISCARDS \ 1034 1027 *(.discard) \ 1035 1028 *(.discard.*) \ 1036 1029 *(.modinfo) \
+137 -24
include/linux/bpf.h
··· 280 280 } 281 281 } 282 282 283 - /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */ 284 - static inline void copy_map_value(struct bpf_map *map, void *dst, void *src) 283 + /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and 284 + * forced to use 'long' read/writes to try to atomically copy long counters. 285 + * Best-effort only. No barriers here, since it _will_ race with concurrent 286 + * updates from BPF programs. Called from bpf syscall and mostly used with 287 + * size 8 or 16 bytes, so ask compiler to inline it. 288 + */ 289 + static inline void bpf_long_memcpy(void *dst, const void *src, u32 size) 290 + { 291 + const long *lsrc = src; 292 + long *ldst = dst; 293 + 294 + size /= sizeof(long); 295 + while (size--) 296 + *ldst++ = *lsrc++; 297 + } 298 + 299 + /* copy everything but bpf_spin_lock, bpf_timer, and kptrs. There could be one of each. */ 300 + static inline void __copy_map_value(struct bpf_map *map, void *dst, void *src, bool long_memcpy) 285 301 { 286 302 u32 curr_off = 0; 287 303 int i; 288 304 289 305 if (likely(!map->off_arr)) { 290 - memcpy(dst, src, map->value_size); 306 + if (long_memcpy) 307 + bpf_long_memcpy(dst, src, round_up(map->value_size, 8)); 308 + else 309 + memcpy(dst, src, map->value_size); 291 310 return; 292 311 } 293 312 ··· 318 299 } 319 300 memcpy(dst + curr_off, src + curr_off, map->value_size - curr_off); 320 301 } 302 + 303 + static inline void copy_map_value(struct bpf_map *map, void *dst, void *src) 304 + { 305 + __copy_map_value(map, dst, src, false); 306 + } 307 + 308 + static inline void copy_map_value_long(struct bpf_map *map, void *dst, void *src) 309 + { 310 + __copy_map_value(map, dst, src, true); 311 + } 312 + 313 + static inline void zero_map_value(struct bpf_map *map, void *dst) 314 + { 315 + u32 curr_off = 0; 316 + int i; 317 + 318 + if (likely(!map->off_arr)) { 319 + memset(dst, 0, map->value_size); 320 + return; 321 + } 322 + 323 + for (i = 0; i < 
map->off_arr->cnt; i++) { 324 + u32 next_off = map->off_arr->field_off[i]; 325 + 326 + memset(dst + curr_off, 0, next_off - curr_off); 327 + curr_off += map->off_arr->field_sz[i]; 328 + } 329 + memset(dst + curr_off, 0, map->value_size - curr_off); 330 + } 331 + 321 332 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src, 322 333 bool lock_src); 323 334 void bpf_timer_cancel_and_free(void *timer); ··· 451 402 /* DYNPTR points to memory local to the bpf program. */ 452 403 DYNPTR_TYPE_LOCAL = BIT(8 + BPF_BASE_TYPE_BITS), 453 404 454 - /* DYNPTR points to a ringbuf record. */ 405 + /* DYNPTR points to a kernel-produced ringbuf record. */ 455 406 DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS), 456 407 457 408 /* Size is known at compile time. */ ··· 656 607 PTR_TO_MEM, /* reg points to valid memory region */ 657 608 PTR_TO_BUF, /* reg points to a read/write buffer */ 658 609 PTR_TO_FUNC, /* reg points to a bpf program function */ 610 + PTR_TO_DYNPTR, /* reg points to a dynptr */ 659 611 __BPF_REG_TYPE_MAX, 660 612 661 613 /* Extended reg_types. */ ··· 777 727 */ 778 728 #define MAX_BPF_FUNC_REG_ARGS 5 779 729 730 + /* The argument is a structure. 
*/ 731 + #define BTF_FMODEL_STRUCT_ARG BIT(0) 732 + 780 733 struct btf_func_model { 781 734 u8 ret_size; 782 735 u8 nr_args; 783 736 u8 arg_size[MAX_BPF_FUNC_ARGS]; 737 + u8 arg_flags[MAX_BPF_FUNC_ARGS]; 784 738 }; 785 739 786 740 /* Restore arguments before returning from trampoline to let original function ··· 863 809 u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog, 864 810 struct bpf_tramp_run_ctx *run_ctx); 865 811 void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start, 812 + struct bpf_tramp_run_ctx *run_ctx); 813 + u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog, 814 + struct bpf_tramp_run_ctx *run_ctx); 815 + void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start, 866 816 struct bpf_tramp_run_ctx *run_ctx); 867 817 void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr); 868 818 void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr); ··· 950 892 struct bpf_dispatcher_prog progs[BPF_DISPATCHER_MAX]; 951 893 int num_progs; 952 894 void *image; 895 + void *rw_image; 953 896 u32 image_off; 954 897 struct bpf_ksym ksym; 955 898 }; ··· 969 910 struct bpf_trampoline *bpf_trampoline_get(u64 key, 970 911 struct bpf_attach_target_info *tgt_info); 971 912 void bpf_trampoline_put(struct bpf_trampoline *tr); 972 - int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs); 913 + int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs); 973 914 #define BPF_DISPATCHER_INIT(_name) { \ 974 915 .mutex = __MUTEX_INITIALIZER(_name.mutex), \ 975 916 .func = &_name##_func, \ ··· 983 924 }, \ 984 925 } 985 926 927 + #ifdef CONFIG_X86_64 928 + #define BPF_DISPATCHER_ATTRIBUTES __attribute__((patchable_function_entry(5))) 929 + #else 930 + #define BPF_DISPATCHER_ATTRIBUTES 931 + #endif 932 + 986 933 #define DEFINE_BPF_DISPATCHER(name) \ 934 + notrace BPF_DISPATCHER_ATTRIBUTES \ 987 935 noinline __nocfi unsigned int bpf_dispatcher_##name##_func( \ 988 936 const void *ctx, \ 
989 937 const struct bpf_insn *insnsi, \ ··· 1012 946 void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, 1013 947 struct bpf_prog *to); 1014 948 /* Called only from JIT-enabled code, so there's no need for stubs. */ 1015 - void *bpf_jit_alloc_exec_page(void); 1016 949 void bpf_image_ksym_add(void *data, struct bpf_ksym *ksym); 1017 950 void bpf_image_ksym_del(struct bpf_ksym *ksym); 1018 951 void bpf_ksym_add(struct bpf_ksym *ksym); ··· 1398 1333 1399 1334 #define BPF_MAP_CAN_READ BIT(0) 1400 1335 #define BPF_MAP_CAN_WRITE BIT(1) 1336 + 1337 + /* Maximum number of user-producer ring buffer samples that can be drained in 1338 + * a call to bpf_user_ringbuf_drain(). 1339 + */ 1340 + #define BPF_MAX_USER_RINGBUF_SAMPLES (128 * 1024) 1401 1341 1402 1342 static inline u32 bpf_map_flags_to_cap(struct bpf_map *map) 1403 1343 { ··· 1800 1730 extern int bpf_iter_ ## target(args); \ 1801 1731 int __init bpf_iter_ ## target(args) { return 0; } 1802 1732 1733 + /* 1734 + * The task type of iterators. 1735 + * 1736 + * For BPF task iterators, they can be parameterized with various 1737 + * parameters to visit only some of tasks. 1738 + * 1739 + * BPF_TASK_ITER_ALL (default) 1740 + * Iterate over resources of every task. 1741 + * 1742 + * BPF_TASK_ITER_TID 1743 + * Iterate over resources of a task/tid. 1744 + * 1745 + * BPF_TASK_ITER_TGID 1746 + * Iterate over resources of every task of a process / task group. 
1747 + */ 1748 + enum bpf_iter_task_type { 1749 + BPF_TASK_ITER_ALL = 0, 1750 + BPF_TASK_ITER_TID, 1751 + BPF_TASK_ITER_TGID, 1752 + }; 1753 + 1803 1754 struct bpf_iter_aux_info { 1804 1755 /* for map_elem iter */ 1805 1756 struct bpf_map *map; ··· 1830 1739 struct cgroup *start; /* starting cgroup */ 1831 1740 enum bpf_cgroup_iter_order order; 1832 1741 } cgroup; 1742 + struct { 1743 + enum bpf_iter_task_type type; 1744 + u32 pid; 1745 + } task; 1833 1746 }; 1834 1747 1835 1748 typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog, ··· 1917 1822 int bpf_get_file_flag(int flags); 1918 1823 int bpf_check_uarg_tail_zero(bpfptr_t uaddr, size_t expected_size, 1919 1824 size_t actual_size); 1920 - 1921 - /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and 1922 - * forced to use 'long' read/writes to try to atomically copy long counters. 1923 - * Best-effort only. No barriers here, since it _will_ race with concurrent 1924 - * updates from BPF programs. Called from bpf syscall and mostly used with 1925 - * size 8 or 16 bytes, so ask compiler to inline it. 
1926 - */ 1927 - static inline void bpf_long_memcpy(void *dst, const void *src, u32 size) 1928 - { 1929 - const long *lsrc = src; 1930 - long *ldst = dst; 1931 - 1932 - size /= sizeof(long); 1933 - while (size--) 1934 - *ldst++ = *lsrc++; 1935 - } 1936 1825 1937 1826 /* verify correctness of eBPF program */ 1938 1827 int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, bpfptr_t uattr); ··· 2019 1940 const char *func_name, 2020 1941 struct btf_func_model *m); 2021 1942 1943 + struct bpf_kfunc_arg_meta { 1944 + u64 r0_size; 1945 + bool r0_rdonly; 1946 + int ref_obj_id; 1947 + u32 flags; 1948 + }; 1949 + 2022 1950 struct bpf_reg_state; 2023 1951 int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog, 2024 1952 struct bpf_reg_state *regs); 1953 + int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog, 1954 + struct bpf_reg_state *regs); 2025 1955 int btf_check_kfunc_arg_match(struct bpf_verifier_env *env, 2026 1956 const struct btf *btf, u32 func_id, 2027 1957 struct bpf_reg_state *regs, 2028 - u32 kfunc_flags); 1958 + struct bpf_kfunc_arg_meta *meta); 2029 1959 int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog, 2030 1960 struct bpf_reg_state *reg); 2031 1961 int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog, ··· 2071 1983 { 2072 1984 return !!current->bpf_ctx; 2073 1985 } 1986 + 1987 + void notrace bpf_prog_inc_misses_counter(struct bpf_prog *prog); 2074 1988 #else /* !CONFIG_BPF_SYSCALL */ 2075 1989 static inline struct bpf_prog *bpf_prog_get(u32 ufd) 2076 1990 { ··· 2255 2165 return ERR_PTR(-ENOTSUPP); 2256 2166 } 2257 2167 2168 + static inline int btf_struct_access(struct bpf_verifier_log *log, 2169 + const struct btf *btf, 2170 + const struct btf_type *t, int off, int size, 2171 + enum bpf_access_type atype, 2172 + u32 *next_btf_id, enum bpf_type_flag *flag) 2173 + { 2174 + return -EACCES; 2175 + } 2176 + 2258 2177 static inline const struct bpf_func_proto * 2259 2178 
bpf_base_func_proto(enum bpf_func_id func_id) 2260 2179 { ··· 2294 2195 static inline bool has_current_bpf_ctx(void) 2295 2196 { 2296 2197 return false; 2198 + } 2199 + 2200 + static inline void bpf_prog_inc_misses_counter(struct bpf_prog *prog) 2201 + { 2297 2202 } 2298 2203 #endif /* CONFIG_BPF_SYSCALL */ 2299 2204 ··· 2536 2433 extern const struct bpf_func_proto bpf_copy_from_user_task_proto; 2537 2434 extern const struct bpf_func_proto bpf_set_retval_proto; 2538 2435 extern const struct bpf_func_proto bpf_get_retval_proto; 2436 + extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto; 2539 2437 2540 2438 const struct bpf_func_proto *tracing_prog_func_proto( 2541 2439 enum bpf_func_id func_id, const struct bpf_prog *prog); ··· 2681 2577 BPF_DYNPTR_TYPE_INVALID, 2682 2578 /* Points to memory that is local to the bpf program */ 2683 2579 BPF_DYNPTR_TYPE_LOCAL, 2684 - /* Underlying data is a ringbuf record */ 2580 + /* Underlying data is a kernel-produced ringbuf record */ 2685 2581 BPF_DYNPTR_TYPE_RINGBUF, 2686 2582 }; 2687 2583 ··· 2689 2585 enum bpf_dynptr_type type, u32 offset, u32 size); 2690 2586 void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr); 2691 2587 int bpf_dynptr_check_size(u32 size); 2588 + u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr); 2692 2589 2693 2590 #ifdef CONFIG_BPF_LSM 2694 2591 void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype); ··· 2699 2594 static inline void bpf_cgroup_atype_put(int cgroup_atype) {} 2700 2595 #endif /* CONFIG_BPF_LSM */ 2701 2596 2597 + struct key; 2598 + 2599 + #ifdef CONFIG_KEYS 2600 + struct bpf_key { 2601 + struct key *key; 2602 + bool has_ref; 2603 + }; 2604 + #endif /* CONFIG_KEYS */ 2702 2605 #endif /* _LINUX_BPF_H */
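The bpf.h hunks above move bpf_long_memcpy() earlier in the header so the new copy_map_value_long() path can reuse it when copying per-CPU values. Its behavior is straightforward to replicate in userspace (a best-effort sketch; as the kernel comment says, long-sized accesses make counter copies best-effort atomic, and no barriers are involved here either):

```c
#include <assert.h>

/* Replica of bpf_long_memcpy(): copy with long-sized loads/stores so
 * 8-byte counters are read whole rather than byte by byte. The size
 * is expected to be a multiple of sizeof(long). */
static void long_memcpy(void *dst, const void *src, unsigned int size)
{
	const long *lsrc = src;
	long *ldst = dst;

	size /= sizeof(long);
	while (size--)
		*ldst++ = *lsrc++;
}
```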
+1
include/linux/bpf_types.h
··· 126 126 #endif 127 127 BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops) 128 128 BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops) 129 + BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops) 129 130 130 131 BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint) 131 132 BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
+29
include/linux/bpf_verifier.h
··· 248 248 */ 249 249 u32 async_entry_cnt; 250 250 bool in_callback_fn; 251 + struct tnum callback_ret_range; 251 252 bool in_async_callback_fn; 252 253 253 254 /* The following fields should be last. See copy_func_state() */ ··· 348 347 for (iter = 0, reg = bpf_get_spilled_reg(iter, frame); \ 349 348 iter < frame->allocated_stack / BPF_REG_SIZE; \ 350 349 iter++, reg = bpf_get_spilled_reg(iter, frame)) 350 + 351 + /* Invoke __expr over regsiters in __vst, setting __state and __reg */ 352 + #define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \ 353 + ({ \ 354 + struct bpf_verifier_state *___vstate = __vst; \ 355 + int ___i, ___j; \ 356 + for (___i = 0; ___i <= ___vstate->curframe; ___i++) { \ 357 + struct bpf_reg_state *___regs; \ 358 + __state = ___vstate->frame[___i]; \ 359 + ___regs = __state->regs; \ 360 + for (___j = 0; ___j < MAX_BPF_REG; ___j++) { \ 361 + __reg = &___regs[___j]; \ 362 + (void)(__expr); \ 363 + } \ 364 + bpf_for_each_spilled_reg(___j, __state, __reg) { \ 365 + if (!__reg) \ 366 + continue; \ 367 + (void)(__expr); \ 368 + } \ 369 + } \ 370 + }) 351 371 352 372 /* linked list of verifier states used to prune search */ 353 373 struct bpf_verifier_state_list { ··· 593 571 u32 regno); 594 572 int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, 595 573 u32 regno, u32 mem_size); 574 + bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, 575 + struct bpf_reg_state *reg); 576 + bool is_dynptr_type_expected(struct bpf_verifier_env *env, 577 + struct bpf_reg_state *reg, 578 + enum bpf_arg_type arg_type); 596 579 597 580 /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */ 598 581 static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog, ··· 624 597 u32 btf_id, 625 598 struct bpf_attach_target_info *tgt_info); 626 599 void bpf_free_kfunc_btf_tab(struct bpf_kfunc_btf_tab *tab); 600 + 601 + int mark_chain_precision(struct bpf_verifier_env *env, int regno); 627 602 
628 603 #define BPF_BASE_TYPE_MASK GENMASK(BPF_BASE_TYPE_BITS - 1, 0) 629 604
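The new bpf_for_each_reg_in_vstate() macro above walks every register of every active verifier frame, plus spilled stack slots, and evaluates a caller-supplied expression with __reg bound to each one. A simplified userspace sketch of the two-level loop shape (the demo_* types are invented and the spilled-register pass is omitted):

```c
#include <assert.h>

#define NREGS 3

struct demo_frame { int regs[NREGS]; };
struct demo_vstate { int curframe; struct demo_frame frame[2]; };

/* Visit every register in every frame up to curframe and evaluate
 * __expr with __reg pointing at the current register, mirroring the
 * structure of bpf_for_each_reg_in_vstate(). */
#define demo_for_each_reg(__vst, __reg, __expr)				\
	do {								\
		int ___i, ___j;						\
		for (___i = 0; ___i <= (__vst)->curframe; ___i++)	\
			for (___j = 0; ___j < NREGS; ___j++) {		\
				__reg = &(__vst)->frame[___i].regs[___j]; \
				(void)(__expr);				\
			}						\
	} while (0)

static int sum_regs(struct demo_vstate *vst)
{
	int *reg, sum = 0;

	demo_for_each_reg(vst, reg, sum += *reg);
	return sum;
}
```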
+19
include/linux/btf.h
··· 52 52 #define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ 53 53 #define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ 54 54 55 + /* 56 + * Return the name of the passed struct, if exists, or halt the build if for 57 + * example the structure gets renamed. In this way, developers have to revisit 58 + * the code using that structure name, and update it accordingly. 59 + */ 60 + #define stringify_struct(x) \ 61 + ({ BUILD_BUG_ON(sizeof(struct x) < 0); \ 62 + __stringify(x); }) 63 + 55 64 struct btf; 56 65 struct btf_member; 57 66 struct btf_type; ··· 449 440 return 0; 450 441 } 451 442 #endif 443 + 444 + static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t) 445 + { 446 + if (!btf_type_is_ptr(t)) 447 + return false; 448 + 449 + t = btf_type_skip_modifiers(btf, t->type, NULL); 450 + 451 + return btf_type_is_struct(t); 452 + } 452 453 453 454 #endif
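The stringify_struct() macro added above ties a string literal to an actual struct definition: sizeof(struct x) refuses to compile if the struct does not exist (or gets renamed), so the name can never silently go stale. A userspace sketch of the same trick, with _Static_assert standing in for BUILD_BUG_ON and an invented demo_conn struct:

```c
#include <assert.h>
#include <string.h>

#define demo_stringify_1(x)	#x
/* Emit the struct name as a string, but only if `struct x` is a
 * complete type at this point in the translation unit. */
#define demo_stringify_struct(x)					\
	({ _Static_assert(sizeof(struct x) > 0, "unknown struct");	\
	   demo_stringify_1(x); })

struct demo_conn { unsigned int mark; };

static const char *demo_conn_name(void)
{
	return demo_stringify_struct(demo_conn);
}
```

Renaming `struct demo_conn` without updating the call site becomes a build failure instead of a stale string, which is exactly the property the BTF-based kfunc signatures need.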
+12 -1
include/linux/filter.h
··· 567 567 568 568 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key); 569 569 570 + extern struct mutex nf_conn_btf_access_lock; 571 + extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf, 572 + const struct btf_type *t, int off, int size, 573 + enum bpf_access_type atype, u32 *next_btf_id, 574 + enum bpf_type_flag *flag); 575 + 570 576 typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx, 571 577 const struct bpf_insn *insnsi, 572 578 unsigned int (*bpf_func)(const void *, ··· 1023 1017 1024 1018 typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size); 1025 1019 1020 + void bpf_jit_fill_hole_with_zero(void *area, unsigned int size); 1021 + 1026 1022 struct bpf_binary_header * 1027 1023 bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, 1028 1024 unsigned int alignment, ··· 1036 1028 void bpf_jit_free(struct bpf_prog *fp); 1037 1029 struct bpf_binary_header * 1038 1030 bpf_jit_binary_pack_hdr(const struct bpf_prog *fp); 1031 + 1032 + void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns); 1033 + void bpf_prog_pack_free(struct bpf_binary_header *hdr); 1039 1034 1040 1035 static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp) 1041 1036 { ··· 1110 1099 return false; 1111 1100 if (!bpf_jit_harden) 1112 1101 return false; 1113 - if (bpf_jit_harden == 1 && capable(CAP_SYS_ADMIN)) 1102 + if (bpf_jit_harden == 1 && bpf_capable()) 1114 1103 return false; 1115 1104 1116 1105 return true;
+6
include/linux/key.h
··· 88 88 KEY_DEFER_PERM_CHECK, /* Special: permission check is deferred */ 89 89 }; 90 90 91 + enum key_lookup_flag { 92 + KEY_LOOKUP_CREATE = 0x01, 93 + KEY_LOOKUP_PARTIAL = 0x02, 94 + KEY_LOOKUP_ALL = (KEY_LOOKUP_CREATE | KEY_LOOKUP_PARTIAL), 95 + }; 96 + 91 97 struct seq_file; 92 98 struct user_struct; 93 99 struct signal_struct;
+1
include/linux/kprobes.h
··· 103 103 * this flag is only for optimized_kprobe. 104 104 */ 105 105 #define KPROBE_FLAG_FTRACE 8 /* probe is using ftrace */ 106 + #define KPROBE_FLAG_ON_FUNC_ENTRY 16 /* probe is on the function entry */ 106 107 107 108 /* Has this kprobe gone ? */ 108 109 static inline bool kprobe_gone(struct kprobe *p)
+3
include/linux/poison.h
··· 81 81 /********** net/core/page_pool.c **********/ 82 82 #define PP_SIGNATURE (0x40 + POISON_POINTER_DELTA) 83 83 84 + /********** kernel/bpf/ **********/ 85 + #define BPF_PTR_POISON ((void *)(0xeB9FUL + POISON_POINTER_DELTA)) 86 + 84 87 #endif
+6
include/linux/tcp.h
··· 388 388 u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs 389 389 * values defined in uapi/linux/tcp.h 390 390 */ 391 + u8 bpf_chg_cc_inprogress:1; /* In the middle of 392 + * bpf_setsockopt(TCP_CONGESTION), 393 + * it is to avoid the bpf_tcp_cc->init() 394 + * to recur itself by calling 395 + * bpf_setsockopt(TCP_CONGESTION, "itself"). 396 + */ 391 397 #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG) 392 398 #else 393 399 #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
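The new bpf_chg_cc_inprogress bit above exists because a BPF congestion control's init hook may itself call bpf_setsockopt(TCP_CONGESTION), which without a guard recurses indefinitely. A minimal userspace model of the guard (all demo_* names are invented, and the exact errno the kernel returns to the inner call may differ from this sketch):

```c
#include <assert.h>
#include <errno.h>

struct demo_sock { unsigned char chg_cc_inprogress:1; };

static int demo_set_cc(struct demo_sock *sk, int (*init)(struct demo_sock *));

/* A misbehaving congestion control whose init tries to switch CC again. */
static int recursing_init(struct demo_sock *sk)
{
	return demo_set_cc(sk, recursing_init);
}

/* Set the flag around the init call; a nested attempt sees the flag
 * and fails instead of looping forever. */
static int demo_set_cc(struct demo_sock *sk, int (*init)(struct demo_sock *))
{
	int err;

	if (sk->chg_cc_inprogress)
		return -EBUSY;
	sk->chg_cc_inprogress = 1;
	err = init ? init(sk) : 0;
	sk->chg_cc_inprogress = 0;
	return err;
}
```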
+8
include/linux/verification.h
··· 17 17 #define VERIFY_USE_SECONDARY_KEYRING ((struct key *)1UL) 18 18 #define VERIFY_USE_PLATFORM_KEYRING ((struct key *)2UL) 19 19 20 + static inline int system_keyring_id_check(u64 id) 21 + { 22 + if (id > (unsigned long)VERIFY_USE_PLATFORM_KEYRING) 23 + return -EINVAL; 24 + 25 + return 0; 26 + } 27 + 20 28 /* 21 29 * The use to which an asymmetric key is being put. 22 30 */
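The system_keyring_id_check() helper added above lets the PKCS#7 verification kfuncs validate a u64 keyring id coming from a BPF program: only the reserved "fake pointer" ids 0..2 (NULL/builtin, secondary, platform) are acceptable. A direct userspace replica (DEMO_* mirrors the VERIFY_USE_* values in the diff):

```c
#include <assert.h>
#include <errno.h>

#define DEMO_USE_SECONDARY_KEYRING 1ULL
#define DEMO_USE_PLATFORM_KEYRING  2ULL

/* Reject any id above the highest reserved keyring "pointer" value. */
static int keyring_id_check(unsigned long long id)
{
	if (id > DEMO_USE_PLATFORM_KEYRING)
		return -EINVAL;
	return 0;
}
```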
+24 -1
include/net/netfilter/nf_conntrack_bpf.h
··· 3 3 #ifndef _NF_CONNTRACK_BPF_H 4 4 #define _NF_CONNTRACK_BPF_H 5 5 6 - #include <linux/btf.h> 7 6 #include <linux/kconfig.h> 7 + #include <net/netfilter/nf_conntrack.h> 8 + 9 + struct nf_conn___init { 10 + struct nf_conn ct; 11 + }; 8 12 9 13 #if (IS_BUILTIN(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \ 10 14 (IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)) 11 15 12 16 extern int register_nf_conntrack_bpf(void); 17 + extern void cleanup_nf_conntrack_bpf(void); 13 18 14 19 #else 15 20 16 21 static inline int register_nf_conntrack_bpf(void) 22 + { 23 + return 0; 24 + } 25 + 26 + static inline void cleanup_nf_conntrack_bpf(void) 27 + { 28 + } 29 + 30 + #endif 31 + 32 + #if (IS_BUILTIN(CONFIG_NF_NAT) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \ 33 + (IS_MODULE(CONFIG_NF_NAT) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)) 34 + 35 + extern int register_nf_nat_bpf(void); 36 + 37 + #else 38 + 39 + static inline int register_nf_nat_bpf(void) 17 40 { 18 41 return 0; 19 42 }
+55 -4
include/uapi/linux/bpf.h
··· 110 110 __u32 cgroup_fd; 111 111 __u64 cgroup_id; 112 112 } cgroup; 113 + /* Parameters of task iterators. */ 114 + struct { 115 + __u32 tid; 116 + __u32 pid; 117 + __u32 pid_fd; 118 + } task; 113 119 }; 114 120 115 121 /* BPF syscall commands, see bpf(2) man-page for more details. */ ··· 934 928 BPF_MAP_TYPE_INODE_STORAGE, 935 929 BPF_MAP_TYPE_TASK_STORAGE, 936 930 BPF_MAP_TYPE_BLOOM_FILTER, 931 + BPF_MAP_TYPE_USER_RINGBUF, 937 932 }; 938 933 939 934 /* Note that tracing related programs such as ··· 4957 4950 * Get address of the traced function (for tracing and kprobe programs). 4958 4951 * Return 4959 4952 * Address of the traced function. 4953 + * 0 for kprobes placed within the function (not at the entry). 4960 4954 * 4961 4955 * u64 bpf_get_attach_cookie(void *ctx) 4962 4956 * Description ··· 5087 5079 * 5088 5080 * long bpf_get_func_arg(void *ctx, u32 n, u64 *value) 5089 5081 * Description 5090 - * Get **n**-th argument (zero based) of the traced function (for tracing programs) 5082 + * Get **n**-th argument register (zero based) of the traced function (for tracing programs) 5091 5083 * returned in **value**. 5092 5084 * 5093 5085 * Return 5094 5086 * 0 on success. 5095 - * **-EINVAL** if n >= arguments count of traced function. 5087 + * **-EINVAL** if n >= argument register count of traced function. 5096 5088 * 5097 5089 * long bpf_get_func_ret(void *ctx, u64 *value) 5098 5090 * Description ··· 5105 5097 * 5106 5098 * long bpf_get_func_arg_cnt(void *ctx) 5107 5099 * Description 5108 - * Get number of arguments of the traced function (for tracing programs). 5100 + * Get number of registers of the traced function (for tracing programs) where 5101 + * function arguments are stored in these registers. 5109 5102 * 5110 5103 * Return 5111 - * The number of arguments of the traced function. 5104 + * The number of argument registers of the traced function. 
5112 5105 * 5113 5106 * int bpf_get_retval(void) 5114 5107 * Description ··· 5395 5386 * Return 5396 5387 * Current *ktime*. 5397 5388 * 5389 + * long bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags) 5390 + * Description 5391 + * Drain samples from the specified user ring buffer, and invoke 5392 + * the provided callback for each such sample: 5393 + * 5394 + * long (\*callback_fn)(struct bpf_dynptr \*dynptr, void \*ctx); 5395 + * 5396 + * If **callback_fn** returns 0, the helper will continue to try 5397 + * and drain the next sample, up to a maximum of 5398 + * BPF_MAX_USER_RINGBUF_SAMPLES samples. If the return value is 1, 5399 + * the helper will skip the rest of the samples and return. Other 5400 + * return values are not used now, and will be rejected by the 5401 + * verifier. 5402 + * Return 5403 + * The number of drained samples if no error was encountered while 5404 + * draining samples, or 0 if no samples were present in the ring 5405 + * buffer. If a user-space producer was epoll-waiting on this map, 5406 + * and at least one sample was drained, they will receive an event 5407 + * notification notifying them of available space in the ring 5408 + * buffer. If the BPF_RB_NO_WAKEUP flag is passed to this 5409 + * function, no wakeup notification will be sent. If the 5410 + * BPF_RB_FORCE_WAKEUP flag is passed, a wakeup notification will 5411 + * be sent even if no sample was drained. 5412 + * 5413 + * On failure, the returned value is one of the following: 5414 + * 5415 + * **-EBUSY** if the ring buffer is contended, and another calling 5416 + * context was concurrently draining the ring buffer. 5417 + * 5418 + * **-EINVAL** if user-space is not properly tracking the ring 5419 + * buffer due to the producer position not being aligned to 8 5420 + * bytes, a sample not being aligned to 8 bytes, or the producer 5421 + * position not matching the advertised length of a sample. 
5422 + * 5423 + * **-E2BIG** if user-space has tried to publish a sample which is 5424 + * larger than the size of the ring buffer, or which cannot fit 5425 + * within a struct bpf_dynptr. 5398 5426 */ 5399 5427 #define __BPF_FUNC_MAPPER(FN) \ 5400 5428 FN(unspec), \ ··· 5643 5597 FN(tcp_raw_check_syncookie_ipv4), \ 5644 5598 FN(tcp_raw_check_syncookie_ipv6), \ 5645 5599 FN(ktime_get_tai_ns), \ 5600 + FN(user_ringbuf_drain), \ 5646 5601 /* */ 5647 5602 5648 5603 /* integer value in 'imm' field of BPF_CALL instruction selects which helper ··· 6265 6218 __u64 cgroup_id; 6266 6219 __u32 order; 6267 6220 } cgroup; 6221 + struct { 6222 + __u32 tid; 6223 + __u32 pid; 6224 + } task; 6268 6225 }; 6269 6226 } iter; 6270 6227 struct {
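The bpf_user_ringbuf_drain() documentation above defines a simple callback contract: return 0 to keep draining, 1 to stop early, with the helper reporting how many samples were consumed. A userspace model of just that control flow (demo_drain and stop_at are invented names; this does not model the ring buffer itself or its wakeup semantics):

```c
#include <assert.h>

#define MAX_SAMPLES (128 * 1024)	/* BPF_MAX_USER_RINGBUF_SAMPLES */

/* Consume samples until the callback asks to stop or the per-call
 * sample budget is exhausted; return the number consumed. */
static long demo_drain(const int *samples, int nr,
		       long (*cb)(int sample, void *ctx), void *ctx)
{
	long drained = 0;
	int i;

	for (i = 0; i < nr && drained < MAX_SAMPLES; i++) {
		drained++;
		if (cb(samples[i], ctx) == 1)
			break;
	}
	return drained;
}

/* Stop as soon as a sample matches the value passed via ctx. */
static long stop_at(int sample, void *ctx)
{
	return sample == *(int *)ctx ? 1 : 0;
}
```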
+24 -9
kernel/bpf/arraymap.c
···
  	rcu_read_lock();
  	pptr = array->pptrs[index & array->index_mask];
  	for_each_possible_cpu(cpu) {
- 		bpf_long_memcpy(value + off, per_cpu_ptr(pptr, cpu), size);
+ 		copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
+ 		check_and_init_map_value(map, value + off);
  		off += size;
  	}
  	rcu_read_unlock();
···
  		return -EINVAL;

  	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
- 		memcpy(this_cpu_ptr(array->pptrs[index & array->index_mask]),
- 		       value, map->value_size);
+ 		val = this_cpu_ptr(array->pptrs[index & array->index_mask]);
+ 		copy_map_value(map, val, value);
+ 		check_and_free_fields(array, val);
  	} else {
  		val = array->value +
  		      (u64)array->elem_size * (index & array->index_mask);
···
  	rcu_read_lock();
  	pptr = array->pptrs[index & array->index_mask];
  	for_each_possible_cpu(cpu) {
- 		bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value + off, size);
+ 		copy_map_value_long(map, per_cpu_ptr(pptr, cpu), value + off);
+ 		check_and_free_fields(array, per_cpu_ptr(pptr, cpu));
  		off += size;
  	}
  	rcu_read_unlock();
···
  	int i;

  	if (map_value_has_kptrs(map)) {
- 		for (i = 0; i < array->map.max_entries; i++)
- 			bpf_map_free_kptrs(map, array_map_elem_ptr(array, i));
+ 		if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
+ 			for (i = 0; i < array->map.max_entries; i++) {
+ 				void __percpu *pptr = array->pptrs[i & array->index_mask];
+ 				int cpu;
+
+ 				for_each_possible_cpu(cpu) {
+ 					bpf_map_free_kptrs(map, per_cpu_ptr(pptr, cpu));
+ 					cond_resched();
+ 				}
+ 			}
+ 		} else {
+ 			for (i = 0; i < array->map.max_entries; i++)
+ 				bpf_map_free_kptrs(map, array_map_elem_ptr(array, i));
+ 		}
  		bpf_map_free_kptr_off_tab(map);
  	}
···
  	pptr = v;
  	size = array->elem_size;
  	for_each_possible_cpu(cpu) {
- 		bpf_long_memcpy(info->percpu_value_buf + off,
- 				per_cpu_ptr(pptr, cpu),
- 				size);
+ 		copy_map_value_long(map, info->percpu_value_buf + off,
+ 				    per_cpu_ptr(pptr, cpu));
+ 		check_and_init_map_value(map, info->percpu_value_buf + off);
  		off += size;
  	}
  	ctx.value = info->percpu_value_buf;
+222 -47
kernel/bpf/btf.c
···
  };

  enum {
- 	BTF_KFUNC_SET_MAX_CNT = 32,
+ 	BTF_KFUNC_SET_MAX_CNT = 256,
  	BTF_DTOR_KFUNC_MAX_CNT = 256,
  };
···
  		return NULL;
  	return btf->types[type_id];
  }
+ EXPORT_SYMBOL_GPL(btf_type_by_id);

  /*
   * Regular int is not a bit field and it must be either
···
  			     const char *fmt, ...)
  {
  	struct bpf_verifier_log *log = &env->log;
- 	u8 kind = BTF_INFO_KIND(t->info);
  	struct btf *btf = env->btf;
  	va_list args;
···
  	__btf_verifier_log(log, "[%u] %s %s%s",
  			   env->log_type_id,
- 			   btf_kind_str[kind],
+ 			   btf_type_str(t),
  			   __btf_name_by_offset(btf, t->name_off),
  			   log_details ? " " : "");
···
  	u32 hdr_len, hdr_copy, btf_data_size;
  	const struct btf_header *hdr;
  	struct btf *btf;
- 	int err;

  	btf = env->btf;
  	btf_data_size = btf->data_size;
···
  		return -EINVAL;
  	}

- 	err = btf_check_sec_info(env, btf_data_size);
- 	if (err)
- 		return err;
-
- 	return 0;
+ 	return btf_check_sec_info(env, btf_data_size);
  }

  static int btf_check_type_tags(struct btf_verifier_env *env,
···
  	return btf_type_is_int(t);
  }

+ static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto,
+ 			   int off)
+ {
+ 	const struct btf_param *args;
+ 	const struct btf_type *t;
+ 	u32 offset = 0, nr_args;
+ 	int i;
+
+ 	if (!func_proto)
+ 		return off / 8;
+
+ 	nr_args = btf_type_vlen(func_proto);
+ 	args = (const struct btf_param *)(func_proto + 1);
+ 	for (i = 0; i < nr_args; i++) {
+ 		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
+ 		offset += btf_type_is_ptr(t) ? 8 : roundup(t->size, 8);
+ 		if (off < offset)
+ 			return i;
+ 	}
+
+ 	t = btf_type_skip_modifiers(btf, func_proto->type, NULL);
+ 	offset += btf_type_is_ptr(t) ? 8 : roundup(t->size, 8);
+ 	if (off < offset)
+ 		return nr_args;
+
+ 	return nr_args + 1;
+ }
+
  bool btf_ctx_access(int off, int size, enum bpf_access_type type,
  		    const struct bpf_prog *prog,
  		    struct bpf_insn_access_aux *info)
···
  			tname, off);
  		return false;
  	}
- 	arg = off / 8;
+ 	arg = get_ctx_arg_idx(btf, t, off);
  	args = (const struct btf_param *)(t + 1);
  	/* if (t == NULL) Fall back to default BPF prog with
  	 * MAX_BPF_FUNC_REG_ARGS u64 arguments.
···
  		if (!btf_type_is_small_int(t)) {
  			bpf_log(log,
  				"ret type %s not allowed for fmod_ret\n",
- 				btf_kind_str[BTF_INFO_KIND(t->info)]);
+ 				btf_type_str(t));
  			return false;
  		}
  		break;
···
  	/* skip modifiers */
  	while (btf_type_is_modifier(t))
  		t = btf_type_by_id(btf, t->type);
- 	if (btf_type_is_small_int(t) || btf_is_any_enum(t))
+ 	if (btf_type_is_small_int(t) || btf_is_any_enum(t) || __btf_type_is_struct(t))
  		/* accessing a scalar */
  		return true;
  	if (!btf_type_is_ptr(t)) {
···
  			"func '%s' arg%d '%s' has type %s. Only pointer access is allowed\n",
  			tname, arg,
  			__btf_name_by_offset(btf, t->name_off),
- 			btf_kind_str[BTF_INFO_KIND(t->info)]);
+ 			btf_type_str(t));
  		return false;
  	}
···
  	if (!btf_type_is_struct(t)) {
  		bpf_log(log,
  			"func '%s' arg%d type %s is not a struct\n",
- 			tname, arg, btf_kind_str[BTF_INFO_KIND(t->info)]);
+ 			tname, arg, btf_type_str(t));
  		return false;
  	}
  	bpf_log(log, "func '%s' arg%d has btf_id %d type %s '%s'\n",
- 		tname, arg, info->btf_id, btf_kind_str[BTF_INFO_KIND(t->info)],
+ 		tname, arg, info->btf_id, btf_type_str(t),
  		__btf_name_by_offset(btf, t->name_off));
  	return true;
  }
···
  	if (btf_type_is_ptr(t))
  		/* kernel size of pointer. Not BPF's size of pointer*/
  		return sizeof(void *);
- 	if (btf_type_is_int(t) || btf_is_any_enum(t))
+ 	if (btf_type_is_int(t) || btf_is_any_enum(t) || __btf_type_is_struct(t))
  		return t->size;
  	return -EINVAL;
  }
···
  		/* BTF function prototype doesn't match the verifier types.
  		 * Fall back to MAX_BPF_FUNC_REG_ARGS u64 args.
  		 */
- 		for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++)
+ 		for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
  			m->arg_size[i] = 8;
+ 			m->arg_flags[i] = 0;
+ 		}
  		m->ret_size = 8;
  		m->nr_args = MAX_BPF_FUNC_REG_ARGS;
  		return 0;
···
  		return -EINVAL;
  	}
  	ret = __get_type_size(btf, func->type, &t);
- 	if (ret < 0) {
+ 	if (ret < 0 || __btf_type_is_struct(t)) {
  		bpf_log(log,
  			"The function %s return type %s is unsupported.\n",
- 			tname, btf_kind_str[BTF_INFO_KIND(t->info)]);
+ 			tname, btf_type_str(t));
  		return -EINVAL;
  	}
  	m->ret_size = ret;
···
  			return -EINVAL;
  		}
  		ret = __get_type_size(btf, args[i].type, &t);
- 		if (ret < 0) {
+
+ 		/* No support of struct argument size greater than 16 bytes */
+ 		if (ret < 0 || ret > 16) {
  			bpf_log(log,
  				"The function %s arg%d type %s is unsupported.\n",
- 				tname, i, btf_kind_str[BTF_INFO_KIND(t->info)]);
+ 				tname, i, btf_type_str(t));
  			return -EINVAL;
  		}
  		if (ret == 0) {
···
  			return -EINVAL;
  		}
  		m->arg_size[i] = ret;
+ 		m->arg_flags[i] = __btf_type_is_struct(t) ? BTF_FMODEL_STRUCT_ARG : 0;
  	}
  	m->nr_args = nargs;
  	return 0;
···
  	return true;
  }

+ static bool btf_is_kfunc_arg_mem_size(const struct btf *btf,
+ 				      const struct btf_param *arg,
+ 				      const struct bpf_reg_state *reg,
+ 				      const char *name)
+ {
+ 	int len, target_len = strlen(name);
+ 	const struct btf_type *t;
+ 	const char *param_name;
+
+ 	t = btf_type_skip_modifiers(btf, arg->type, NULL);
+ 	if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE)
+ 		return false;
+
+ 	param_name = btf_name_by_offset(btf, arg->name_off);
+ 	if (str_is_empty(param_name))
+ 		return false;
+ 	len = strlen(param_name);
+ 	if (len != target_len)
+ 		return false;
+ 	if (strcmp(param_name, name))
+ 		return false;
+
+ 	return true;
+ }
+
  static int btf_check_func_arg_match(struct bpf_verifier_env *env,
  				    const struct btf *btf, u32 func_id,
  				    struct bpf_reg_state *regs,
  				    bool ptr_to_mem_ok,
- 				    u32 kfunc_flags)
+ 				    struct bpf_kfunc_arg_meta *kfunc_meta,
+ 				    bool processing_call)
  {
  	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
- 	bool rel = false, kptr_get = false, trusted_arg = false;
+ 	bool rel = false, kptr_get = false, trusted_args = false;
  	bool sleepable = false;
  	struct bpf_verifier_log *log = &env->log;
  	u32 i, nargs, ref_id, ref_obj_id = 0;
···
  		return -EINVAL;
  	}

- 	if (is_kfunc) {
+ 	if (is_kfunc && kfunc_meta) {
  		/* Only kfunc can be release func */
- 		rel = kfunc_flags & KF_RELEASE;
- 		kptr_get = kfunc_flags & KF_KPTR_GET;
- 		trusted_arg = kfunc_flags & KF_TRUSTED_ARGS;
- 		sleepable = kfunc_flags & KF_SLEEPABLE;
+ 		rel = kfunc_meta->flags & KF_RELEASE;
+ 		kptr_get = kfunc_meta->flags & KF_KPTR_GET;
+ 		trusted_args = kfunc_meta->flags & KF_TRUSTED_ARGS;
+ 		sleepable = kfunc_meta->flags & KF_SLEEPABLE;
  	}

  	/* check that BTF function arguments match actual types that the
···
  		enum bpf_arg_type arg_type = ARG_DONTCARE;
  		u32 regno = i + 1;
  		struct bpf_reg_state *reg = &regs[regno];
+ 		bool obj_ptr = false;

  		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
  		if (btf_type_is_scalar(t)) {
+ 			if (is_kfunc && kfunc_meta) {
+ 				bool is_buf_size = false;
+
+ 				/* check for any const scalar parameter of name "rdonly_buf_size"
+ 				 * or "rdwr_buf_size"
+ 				 */
+ 				if (btf_is_kfunc_arg_mem_size(btf, &args[i], reg,
+ 							      "rdonly_buf_size")) {
+ 					kfunc_meta->r0_rdonly = true;
+ 					is_buf_size = true;
+ 				} else if (btf_is_kfunc_arg_mem_size(btf, &args[i], reg,
+ 								     "rdwr_buf_size"))
+ 					is_buf_size = true;
+
+ 				if (is_buf_size) {
+ 					if (kfunc_meta->r0_size) {
+ 						bpf_log(log, "2 or more rdonly/rdwr_buf_size parameters for kfunc");
+ 						return -EINVAL;
+ 					}
+
+ 					if (!tnum_is_const(reg->var_off)) {
+ 						bpf_log(log, "R%d is not a const\n", regno);
+ 						return -EINVAL;
+ 					}
+
+ 					kfunc_meta->r0_size = reg->var_off.value;
+ 					ret = mark_chain_precision(env, regno);
+ 					if (ret)
+ 						return ret;
+ 				}
+ 			}
+
  			if (reg->type == SCALAR_VALUE)
  				continue;
  			bpf_log(log, "R%d is not a scalar\n", regno);
···
  			return -EINVAL;
  		}

+ 		/* These register types have special constraints wrt ref_obj_id
+ 		 * and offset checks. The rest of trusted args don't.
+ 		 */
+ 		obj_ptr = reg->type == PTR_TO_CTX || reg->type == PTR_TO_BTF_ID ||
+ 			  reg2btf_ids[base_type(reg->type)];
+
  		/* Check if argument must be a referenced pointer, args + i has
  		 * been verified to be a pointer (after skipping modifiers).
+ 		 * PTR_TO_CTX is ok without having non-zero ref_obj_id.
  		 */
- 		if (is_kfunc && trusted_arg && !reg->ref_obj_id) {
+ 		if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
  			bpf_log(log, "R%d must be referenced\n", regno);
  			return -EINVAL;
  		}
···
  		ref_tname = btf_name_by_offset(btf, ref_t->name_off);

  		/* Trusted args have the same offset checks as release arguments */
- 		if (trusted_arg || (rel && reg->ref_obj_id))
+ 		if ((trusted_args && obj_ptr) || (rel && reg->ref_obj_id))
  			arg_type |= OBJ_RELEASE;
  		ret = check_func_arg_reg_off(env, reg, regno, arg_type);
  		if (ret < 0)
  			return ret;
+
+ 		if (is_kfunc && reg->ref_obj_id) {
+ 			/* Ensure only one argument is referenced PTR_TO_BTF_ID */
+ 			if (ref_obj_id) {
+ 				bpf_log(log, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
+ 					regno, reg->ref_obj_id, ref_obj_id);
+ 				return -EFAULT;
+ 			}
+ 			ref_regno = regno;
+ 			ref_obj_id = reg->ref_obj_id;
+ 		}

  		/* kptr_get is only true for kfunc */
  		if (i == 0 && kptr_get) {
···
  			if (reg->type == PTR_TO_BTF_ID) {
  				reg_btf = reg->btf;
  				reg_ref_id = reg->btf_id;
- 				/* Ensure only one argument is referenced PTR_TO_BTF_ID */
- 				if (reg->ref_obj_id) {
- 					if (ref_obj_id) {
- 						bpf_log(log, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
- 							regno, reg->ref_obj_id, ref_obj_id);
- 						return -EFAULT;
- 					}
- 					ref_regno = regno;
- 					ref_obj_id = reg->ref_obj_id;
- 				}
  			} else {
  				reg_btf = btf_vmlinux;
  				reg_ref_id = *reg2btf_ids[base_type(reg->type)];
···
  							    reg_ref_t->name_off);
  			if (!btf_struct_ids_match(log, reg_btf, reg_ref_id,
  						  reg->off, btf, ref_id,
- 						  trusted_arg || (rel && reg->ref_obj_id))) {
+ 						  trusted_args || (rel && reg->ref_obj_id))) {
  				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
  					func_name, i,
  					btf_type_str(ref_t), ref_tname,
···
  					reg_ref_tname);
  				return -EINVAL;
  			}
- 		} else if (ptr_to_mem_ok) {
+ 		} else if (ptr_to_mem_ok && processing_call) {
  			const struct btf_type *resolve_ret;
  			u32 type_size;

  			if (is_kfunc) {
  				bool arg_mem_size = i + 1 < nargs && is_kfunc_arg_mem_size(btf, &args[i + 1], &regs[regno + 1]);
+ 				bool arg_dynptr = btf_type_is_struct(ref_t) &&
+ 						  !strcmp(ref_tname,
+ 							  stringify_struct(bpf_dynptr_kern));

  				/* Permit pointer to mem, but only when argument
  				 * type is pointer to scalar, or struct composed
  				 * (recursively) of scalars.
  				 * When arg_mem_size is true, the pointer can be
  				 * void *.
+ 				 * Also permit initialized local dynamic pointers.
  				 */
  				if (!btf_type_is_scalar(ref_t) &&
  				    !__btf_type_is_scalar_struct(log, btf, ref_t, 0) &&
+ 				    !arg_dynptr &&
  				    (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) {
  					bpf_log(log,
  						"arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n",
  						i, btf_type_str(ref_t), ref_tname, arg_mem_size ? "void, " : "");
  					return -EINVAL;
+ 				}
+
+ 				if (arg_dynptr) {
+ 					if (reg->type != PTR_TO_STACK) {
+ 						bpf_log(log, "arg#%d pointer type %s %s not to stack\n",
+ 							i, btf_type_str(ref_t),
+ 							ref_tname);
+ 						return -EINVAL;
+ 					}
+
+ 					if (!is_dynptr_reg_valid_init(env, reg)) {
+ 						bpf_log(log,
+ 							"arg#%d pointer type %s %s must be valid and initialized\n",
+ 							i, btf_type_str(ref_t),
+ 							ref_tname);
+ 						return -EINVAL;
+ 					}
+
+ 					if (!is_dynptr_type_expected(env, reg,
+ 							ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL)) {
+ 						bpf_log(log,
+ 							"arg#%d pointer type %s %s points to unsupported dynamic pointer type\n",
+ 							i, btf_type_str(ref_t),
+ 							ref_tname);
+ 						return -EINVAL;
+ 					}
+
+ 					continue;
  				}

  				/* Check for mem, len pair */
···
  		return -EINVAL;
  	}

+ 	if (kfunc_meta && ref_obj_id)
+ 		kfunc_meta->ref_obj_id = ref_obj_id;
+
  	/* returns argument register number > 0 in case of reference release kfunc */
  	return rel ? ref_regno : 0;
  }

- /* Compare BTF of a function with given bpf_reg_state.
+ /* Compare BTF of a function declaration with given bpf_reg_state.
   * Returns:
   * EFAULT - there is a verifier bug. Abort verification.
   * EINVAL - there is a type mismatch or BTF is not available.
···
  		return -EINVAL;

  	is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
- 	err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, 0);
+ 	err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, NULL, false);
+
+ 	/* Compiler optimizations can remove arguments from static functions
+ 	 * or mismatched type can be passed into a global function.
+ 	 * In such cases mark the function as unreliable from BTF point of view.
+ 	 */
+ 	if (err)
+ 		prog->aux->func_info_aux[subprog].unreliable = true;
+ 	return err;
+ }
+
+ /* Compare BTF of a function call with given bpf_reg_state.
+  * Returns:
+  * EFAULT - there is a verifier bug. Abort verification.
+  * EINVAL - there is a type mismatch or BTF is not available.
+  * 0 - BTF matches with what bpf_reg_state expects.
+  * Only PTR_TO_CTX and SCALAR_VALUE states are recognized.
+  *
+  * NOTE: the code is duplicated from btf_check_subprog_arg_match()
+  * because btf_check_func_arg_match() is still doing both. Once that
+  * function is split in 2, we can call from here btf_check_subprog_arg_match()
+  * first, and then treat the calling part in a new code path.
+  */
+ int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
+ 			   struct bpf_reg_state *regs)
+ {
+ 	struct bpf_prog *prog = env->prog;
+ 	struct btf *btf = prog->aux->btf;
+ 	bool is_global;
+ 	u32 btf_id;
+ 	int err;
+
+ 	if (!prog->aux->func_info)
+ 		return -EINVAL;
+
+ 	btf_id = prog->aux->func_info[subprog].type_id;
+ 	if (!btf_id)
+ 		return -EFAULT;
+
+ 	if (prog->aux->func_info_aux[subprog].unreliable)
+ 		return -EINVAL;
+
+ 	is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
+ 	err = btf_check_func_arg_match(env, btf, btf_id, regs, is_global, NULL, true);

  	/* Compiler optimizations can remove arguments from static functions
  	 * or mismatched type can be passed into a global function.
···
  int btf_check_kfunc_arg_match(struct bpf_verifier_env *env,
  			      const struct btf *btf, u32 func_id,
  			      struct bpf_reg_state *regs,
- 			      u32 kfunc_flags)
+ 			      struct bpf_kfunc_arg_meta *meta)
  {
- 	return btf_check_func_arg_match(env, btf, func_id, regs, true, kfunc_flags);
+ 	return btf_check_func_arg_match(env, btf, func_id, regs, true, meta, true);
  }

  /* Convert BTF of a function into bpf_reg_state if possible
···
  			continue;
  		}
  		bpf_log(log, "Arg#%d type %s in %s() is not supported yet.\n",
- 			i, btf_kind_str[BTF_INFO_KIND(t->info)], tname);
+ 			i, btf_type_str(t), tname);
  		return -EINVAL;
  	}
  	return 0;
···
  	case BPF_PROG_TYPE_STRUCT_OPS:
  		return BTF_KFUNC_HOOK_STRUCT_OPS;
  	case BPF_PROG_TYPE_TRACING:
+ 	case BPF_PROG_TYPE_LSM:
  		return BTF_KFUNC_HOOK_TRACING;
  	case BPF_PROG_TYPE_SYSCALL:
  		return BTF_KFUNC_HOOK_SYSCALL;
+7 -2
kernel/bpf/core.c
···
  	unsigned long bitmap[];
  };

+ void bpf_jit_fill_hole_with_zero(void *area, unsigned int size)
+ {
+ 	memset(area, 0, size);
+ }
+
  #define BPF_PROG_SIZE_TO_NBITS(size) (round_up(size, BPF_PROG_CHUNK_SIZE) / BPF_PROG_CHUNK_SIZE)

  static DEFINE_MUTEX(pack_mutex);
···
  	return pack;
  }

- static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns)
+ void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns)
  {
  	unsigned int nbits = BPF_PROG_SIZE_TO_NBITS(size);
  	struct bpf_prog_pack *pack;
···
  	return ptr;
  }

- static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
+ void bpf_prog_pack_free(struct bpf_binary_header *hdr)
  {
  	struct bpf_prog_pack *pack = NULL, *tmp;
  	unsigned int nbits;
+21 -6
kernel/bpf/dispatcher.c
···
  	return false;
  }

- int __weak arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs)
+ int __weak arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs)
  {
  	return -ENOTSUPP;
  }

- static int bpf_dispatcher_prepare(struct bpf_dispatcher *d, void *image)
+ static int bpf_dispatcher_prepare(struct bpf_dispatcher *d, void *image, void *buf)
  {
  	s64 ips[BPF_DISPATCHER_MAX] = {}, *ipsp = &ips[0];
  	int i;
···
  		if (d->progs[i].prog)
  			*ipsp++ = (s64)(uintptr_t)d->progs[i].prog->bpf_func;
  	}
- 	return arch_prepare_bpf_dispatcher(image, &ips[0], d->num_progs);
+ 	return arch_prepare_bpf_dispatcher(image, buf, &ips[0], d->num_progs);
  }

  static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
  {
- 	void *old, *new;
+ 	void *old, *new, *tmp;
  	u32 noff;
  	int err;
···
  	}

  	new = d->num_progs ? d->image + noff : NULL;
+ 	tmp = d->num_progs ? d->rw_image + noff : NULL;
  	if (new) {
- 		if (bpf_dispatcher_prepare(d, new))
+ 		/* Prepare the dispatcher in d->rw_image. Then use
+ 		 * bpf_arch_text_copy to update d->image, which is RO+X.
+ 		 */
+ 		if (bpf_dispatcher_prepare(d, new, tmp))
+ 			return;
+ 		if (IS_ERR(bpf_arch_text_copy(new, tmp, PAGE_SIZE / 2)))
  			return;
  	}
···
  	mutex_lock(&d->mutex);
  	if (!d->image) {
- 		d->image = bpf_jit_alloc_exec_page();
+ 		d->image = bpf_prog_pack_alloc(PAGE_SIZE, bpf_jit_fill_hole_with_zero);
  		if (!d->image)
  			goto out;
+ 		d->rw_image = bpf_jit_alloc_exec(PAGE_SIZE);
+ 		if (!d->rw_image) {
+ 			u32 size = PAGE_SIZE;
+
+ 			bpf_arch_text_copy(d->image, &size, sizeof(size));
+ 			bpf_prog_pack_free((struct bpf_binary_header *)d->image);
+ 			d->image = NULL;
+ 			goto out;
+ 		}
  		bpf_image_ksym_add(d->image, &d->ksym);
  	}
+16 -52
kernel/bpf/hashtab.c
···
   * In theory the BPF locks could be converted to regular spinlocks as well,
   * but the bucket locks and percpu_freelist locks can be taken from
   * arbitrary contexts (perf, kprobes, tracepoints) which are required to be
-  * atomic contexts even on RT. These mechanisms require preallocated maps,
-  * so there is no need to invoke memory allocations within the lock held
-  * sections.
-  *
-  * BPF maps which need dynamic allocation are only used from (forced)
-  * thread context on RT and can therefore use regular spinlocks which in
-  * turn allows to invoke memory allocations from the lock held section.
-  *
-  * On a non RT kernel this distinction is neither possible nor required.
-  * spinlock maps to raw_spinlock and the extra code is optimized out by the
-  * compiler.
+  * atomic contexts even on RT. Before the introduction of bpf_mem_alloc,
+  * it is only safe to use raw spinlock for preallocated hash map on a RT kernel,
+  * because there is no memory allocation within the lock held sections. However
+  * after hash map was fully converted to use bpf_mem_alloc, there will be
+  * non-synchronous memory allocation for non-preallocated hash map, so it is
+  * safe to always use raw spinlock for bucket lock.
   */
  struct bucket {
  	struct hlist_nulls_head head;
- 	union {
- 		raw_spinlock_t raw_lock;
- 		spinlock_t lock;
- 	};
+ 	raw_spinlock_t raw_lock;
  };

  #define HASHTAB_MAP_LOCK_COUNT 8
···
  	return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
  }

- static inline bool htab_use_raw_lock(const struct bpf_htab *htab)
- {
- 	return (!IS_ENABLED(CONFIG_PREEMPT_RT) || htab_is_prealloc(htab));
- }
-
  static void htab_init_buckets(struct bpf_htab *htab)
  {
  	unsigned int i;

  	for (i = 0; i < htab->n_buckets; i++) {
  		INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
- 		if (htab_use_raw_lock(htab)) {
- 			raw_spin_lock_init(&htab->buckets[i].raw_lock);
- 			lockdep_set_class(&htab->buckets[i].raw_lock,
+ 		raw_spin_lock_init(&htab->buckets[i].raw_lock);
+ 		lockdep_set_class(&htab->buckets[i].raw_lock,
  				  &htab->lockdep_key);
- 		} else {
- 			spin_lock_init(&htab->buckets[i].lock);
- 			lockdep_set_class(&htab->buckets[i].lock,
- 					  &htab->lockdep_key);
- 		}
  		cond_resched();
  	}
  }
···
  			     unsigned long *pflags)
  {
  	unsigned long flags;
- 	bool use_raw_lock;

  	hash = hash & HASHTAB_MAP_LOCK_MASK;

- 	use_raw_lock = htab_use_raw_lock(htab);
- 	if (use_raw_lock)
- 		preempt_disable();
- 	else
- 		migrate_disable();
+ 	preempt_disable();
  	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
  		__this_cpu_dec(*(htab->map_locked[hash]));
- 		if (use_raw_lock)
- 			preempt_enable();
- 		else
- 			migrate_enable();
+ 		preempt_enable();
  		return -EBUSY;
  	}

- 	if (use_raw_lock)
- 		raw_spin_lock_irqsave(&b->raw_lock, flags);
- 	else
- 		spin_lock_irqsave(&b->lock, flags);
+ 	raw_spin_lock_irqsave(&b->raw_lock, flags);
  	*pflags = flags;

  	return 0;
···
  				      struct bucket *b, u32 hash,
  				      unsigned long flags)
  {
- 	bool use_raw_lock = htab_use_raw_lock(htab);
-
  	hash = hash & HASHTAB_MAP_LOCK_MASK;
- 	if (use_raw_lock)
- 		raw_spin_unlock_irqrestore(&b->raw_lock, flags);
- 	else
- 		spin_unlock_irqrestore(&b->lock, flags);
+ 	raw_spin_unlock_irqrestore(&b->raw_lock, flags);
  	__this_cpu_dec(*(htab->map_locked[hash]));
- 	if (use_raw_lock)
- 		preempt_enable();
- 	else
- 		migrate_enable();
+ 	preempt_enable();
  }

  static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
···
  free_prealloc:
  	prealloc_destroy(htab);
  free_map_locked:
+ 	if (htab->use_percpu_counter)
+ 		percpu_counter_destroy(&htab->pcount);
  	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
  		free_percpu(htab->map_locked[i]);
  	bpf_map_area_free(htab->buckets);
+8 -4
kernel/bpf/helpers.c
···
  #include <linux/ctype.h>
  #include <linux/jiffies.h>
  #include <linux/pid_namespace.h>
+ #include <linux/poison.h>
  #include <linux/proc_ns.h>
  #include <linux/security.h>
  #include <linux/btf_ids.h>
···
  }

  /* Unlike other PTR_TO_BTF_ID helpers the btf_id in bpf_kptr_xchg()
-  * helper is determined dynamically by the verifier.
+  * helper is determined dynamically by the verifier. Use BPF_PTR_POISON to
+  * denote type that verifier will determine.
   */
- #define BPF_PTR_POISON ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA))
-
  static const struct bpf_func_proto bpf_kptr_xchg_proto = {
  	.func = bpf_kptr_xchg,
  	.gpl_only = false,
···
  	ptr->size |= type << DYNPTR_TYPE_SHIFT;
  }

- static u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
+ u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
  {
  	return ptr->size & DYNPTR_SIZE_MASK;
  }
···
  BPF_CALL_4(bpf_dynptr_from_mem, void *, data, u32, size, u64, flags, struct bpf_dynptr_kern *, ptr)
  {
  	int err;
+
+ 	BTF_TYPE_EMIT(struct bpf_dynptr);

  	err = bpf_dynptr_check_size(size);
  	if (err)
···
  		return &bpf_for_each_map_elem_proto;
  	case BPF_FUNC_loop:
  		return &bpf_loop_proto;
+ 	case BPF_FUNC_user_ringbuf_drain:
+ 		return &bpf_user_ringbuf_drain_proto;
  	default:
  		break;
  	}
+3 -2
kernel/bpf/memalloc.c
···
  		local_dec(&c->active);
  		if (IS_ENABLED(CONFIG_PREEMPT_RT))
  			local_irq_restore(flags);
- 		enque_to_free(c, llnode);
+ 		if (llnode)
+ 			enque_to_free(c, llnode);
  	} while (cnt > (c->high_watermark + c->low_watermark) / 2);

  	/* and drain free_llist_extra */
···
  	if (!ptr)
  		return;

- 	idx = bpf_mem_cache_idx(__ksize(ptr - LLIST_NODE_SZ));
+ 	idx = bpf_mem_cache_idx(ksize(ptr - LLIST_NODE_SZ));
  	if (idx < 0)
  		return;
+16 -32
kernel/bpf/percpu_freelist.c
···
  {
  	int cpu, orig_cpu;

- 	orig_cpu = cpu = raw_smp_processor_id();
+ 	orig_cpu = raw_smp_processor_id();
  	while (1) {
- 		struct pcpu_freelist_head *head;
+ 		for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) {
+ 			struct pcpu_freelist_head *head;

- 		head = per_cpu_ptr(s->freelist, cpu);
- 		if (raw_spin_trylock(&head->lock)) {
- 			pcpu_freelist_push_node(head, node);
- 			raw_spin_unlock(&head->lock);
- 			return;
+ 			head = per_cpu_ptr(s->freelist, cpu);
+ 			if (raw_spin_trylock(&head->lock)) {
+ 				pcpu_freelist_push_node(head, node);
+ 				raw_spin_unlock(&head->lock);
+ 				return;
+ 			}
  		}
- 		cpu = cpumask_next(cpu, cpu_possible_mask);
- 		if (cpu >= nr_cpu_ids)
- 			cpu = 0;

  		/* cannot lock any per cpu lock, try extralist */
- 		if (cpu == orig_cpu &&
- 		    pcpu_freelist_try_push_extra(s, node))
+ 		if (pcpu_freelist_try_push_extra(s, node))
  			return;
  	}
  }
···
  {
  	struct pcpu_freelist_head *head;
  	struct pcpu_freelist_node *node;
- 	int orig_cpu, cpu;
+ 	int cpu;

- 	orig_cpu = cpu = raw_smp_processor_id();
- 	while (1) {
+ 	for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
  		head = per_cpu_ptr(s->freelist, cpu);
  		if (!READ_ONCE(head->first))
- 			goto next_cpu;
+ 			continue;
  		raw_spin_lock(&head->lock);
  		node = head->first;
  		if (node) {
···
  			return node;
  		}
  		raw_spin_unlock(&head->lock);
- next_cpu:
- 		cpu = cpumask_next(cpu, cpu_possible_mask);
- 		if (cpu >= nr_cpu_ids)
- 			cpu = 0;
- 		if (cpu == orig_cpu)
- 			break;
  	}

  	/* per cpu lists are all empty, try extralist */
···
  {
  	struct pcpu_freelist_head *head;
  	struct pcpu_freelist_node *node;
- 	int orig_cpu, cpu;
+ 	int cpu;

- 	orig_cpu = cpu = raw_smp_processor_id();
- 	while (1) {
+ 	for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
  		head = per_cpu_ptr(s->freelist, cpu);
  		if (!READ_ONCE(head->first))
- 			goto next_cpu;
+ 			continue;
  		if (raw_spin_trylock(&head->lock)) {
  			node = head->first;
  			if (node) {
···
  			}
  			raw_spin_unlock(&head->lock);
  		}
- next_cpu:
- 		cpu = cpumask_next(cpu, cpu_possible_mask);
- 		if (cpu >= nr_cpu_ids)
- 			cpu = 0;
- 		if (cpu == orig_cpu)
- 			break;
  	}

  	/* cannot pop from per cpu lists, try extralist */
+232 -11
kernel/bpf/ringbuf.c
···
  	struct page **pages;
  	int nr_pages;
  	spinlock_t spinlock ____cacheline_aligned_in_smp;
- 	/* Consumer and producer counters are put into separate pages to allow
- 	 * mapping consumer page as r/w, but restrict producer page to r/o.
- 	 * This protects producer position from being modified by user-space
- 	 * application and ruining in-kernel position tracking.
+ 	/* For user-space producer ring buffers, an atomic_t busy bit is used
+ 	 * to synchronize access to the ring buffers in the kernel, rather than
+ 	 * the spinlock that is used for kernel-producer ring buffers. This is
+ 	 * done because the ring buffer must hold a lock across a BPF program's
+ 	 * callback:
+ 	 *
+ 	 *    __bpf_user_ringbuf_peek() // lock acquired
+ 	 * -> program callback_fn()
+ 	 * -> __bpf_user_ringbuf_sample_release() // lock released
+ 	 *
+ 	 * It is unsafe and incorrect to hold an IRQ spinlock across what could
+ 	 * be a long execution window, so we instead simply disallow concurrent
+ 	 * access to the ring buffer by kernel consumers, and return -EBUSY from
+ 	 * __bpf_user_ringbuf_peek() if the busy bit is held by another task.
+ 	 */
+ 	atomic_t busy ____cacheline_aligned_in_smp;
+ 	/* Consumer and producer counters are put into separate pages to
+ 	 * allow each position to be mapped with different permissions.
+ 	 * This prevents a user-space application from modifying the
+ 	 * position and ruining in-kernel tracking. The permissions of the
+ 	 * pages depend on who is producing samples: user-space or the
+ 	 * kernel.
+ 	 *
+ 	 * Kernel-producer
+ 	 * ---------------
+ 	 * The producer position and data pages are mapped as r/o in
+ 	 * userspace. For this approach, bits in the header of samples are
+ 	 * used to signal to user-space, and to other producers, whether a
+ 	 * sample is currently being written.
+ 	 *
+ 	 * User-space producer
+ 	 * -------------------
+ 	 * Only the page containing the consumer position is mapped r/o in
+ 	 * user-space. User-space producers also use bits of the header to
+ 	 * communicate to the kernel, but the kernel must carefully check and
+ 	 * validate each sample to ensure that they're correctly formatted, and
+ 	 * fully contained within the ring buffer.
  	 */
  	unsigned long consumer_pos __aligned(PAGE_SIZE);
  	unsigned long producer_pos __aligned(PAGE_SIZE);
···
  		return NULL;

  	spin_lock_init(&rb->spinlock);
+ 	atomic_set(&rb->busy, 0);
  	init_waitqueue_head(&rb->waitq);
  	init_irq_work(&rb->work, bpf_ringbuf_notify);
···
  	return -ENOTSUPP;
  }

- static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma)
+ static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma)
  {
  	struct bpf_ringbuf_map *rb_map;
···
  				   vma->vm_pgoff + RINGBUF_PGOFF);
  }

+ static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma)
+ {
+ 	struct bpf_ringbuf_map *rb_map;
+
+ 	rb_map = container_of(map, struct bpf_ringbuf_map, map);
+
+ 	if (vma->vm_flags & VM_WRITE) {
+ 		if (vma->vm_pgoff == 0)
+ 			/* Disallow writable mappings to the consumer pointer,
+ 			 * and allow writable mappings to both the producer
+ 			 * position, and the ring buffer data itself.
+ 			 */
+ 			return -EPERM;
+ 	} else {
+ 		vma->vm_flags &= ~VM_MAYWRITE;
+ 	}
+ 	/* remap_vmalloc_range() checks size and offset constraints */
+ 	return remap_vmalloc_range(vma, rb_map->rb, vma->vm_pgoff + RINGBUF_PGOFF);
+ }
+
  static unsigned long ringbuf_avail_data_sz(struct bpf_ringbuf *rb)
  {
  	unsigned long cons_pos, prod_pos;
···
  	return prod_pos - cons_pos;
  }

+ static u32 ringbuf_total_data_sz(const struct bpf_ringbuf *rb)
+ {
+ 	return rb->mask + 1;
+ }
+
- static __poll_t ringbuf_map_poll(struct bpf_map *map, struct file *filp,
- 				 struct poll_table_struct *pts)
+ static __poll_t ringbuf_map_poll_kern(struct bpf_map *map, struct file *filp,
+ 				      struct poll_table_struct *pts)
  {
  	struct bpf_ringbuf_map *rb_map;
···
  	return 0;
  }

+ static __poll_t ringbuf_map_poll_user(struct bpf_map *map, struct file *filp,
+ 				      struct poll_table_struct *pts)
+ {
+ 	struct bpf_ringbuf_map *rb_map;
+
+ 	rb_map = container_of(map, struct bpf_ringbuf_map, map);
+ 	poll_wait(filp, &rb_map->rb->waitq, pts);
+
+ 	if (ringbuf_avail_data_sz(rb_map->rb) < ringbuf_total_data_sz(rb_map->rb))
+ 		return EPOLLOUT | EPOLLWRNORM;
+ 	return 0;
+ }
+
  BTF_ID_LIST_SINGLE(ringbuf_map_btf_ids, struct, bpf_ringbuf_map)
  const struct bpf_map_ops ringbuf_map_ops = {
  	.map_meta_equal = bpf_map_meta_equal,
  	.map_alloc = ringbuf_map_alloc,
  	.map_free = ringbuf_map_free,
- 	.map_mmap = ringbuf_map_mmap,
- 	.map_poll = ringbuf_map_poll,
+ 	.map_mmap = ringbuf_map_mmap_kern,
+ 	.map_poll = ringbuf_map_poll_kern,
  	.map_lookup_elem = ringbuf_map_lookup_elem,
  	.map_update_elem = ringbuf_map_update_elem,
  	.map_delete_elem = ringbuf_map_delete_elem,
  	.map_get_next_key = ringbuf_map_get_next_key,
  	.map_btf_id = &ringbuf_map_btf_ids[0],
  };
+
+ BTF_ID_LIST_SINGLE(user_ringbuf_map_btf_ids, struct, bpf_ringbuf_map)
+ const struct bpf_map_ops user_ringbuf_map_ops = {
+ 	.map_meta_equal = bpf_map_meta_equal,
+ 	.map_alloc = ringbuf_map_alloc,
+ 	.map_free = ringbuf_map_free,
+ 	.map_mmap = ringbuf_map_mmap_user,
+ 	.map_poll = ringbuf_map_poll_user,
+ 	.map_lookup_elem = ringbuf_map_lookup_elem,
+ 	.map_update_elem = ringbuf_map_update_elem,
+ 	.map_delete_elem = ringbuf_map_delete_elem,
+ 	.map_get_next_key = ringbuf_map_get_next_key,
+ 	.map_btf_id = &user_ringbuf_map_btf_ids[0],
+ };

  /* Given pointer to ring buffer record metadata and struct bpf_ringbuf itself,
···
  		return NULL;

  	len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
- 	if (len > rb->mask + 1)
+ 	if (len > ringbuf_total_data_sz(rb))
  		return NULL;

  	cons_pos = smp_load_acquire(&rb->consumer_pos);
···
  	case BPF_RB_AVAIL_DATA:
  		return ringbuf_avail_data_sz(rb);
  	case BPF_RB_RING_SIZE:
- 		return rb->mask + 1;
+ 		return ringbuf_total_data_sz(rb);
  	case BPF_RB_CONS_POS:
  		return smp_load_acquire(&rb->consumer_pos);
  	case BPF_RB_PROD_POS:
···
  	.ret_type = RET_VOID,
  	.arg1_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
  	.arg2_type = ARG_ANYTHING,
  };
+
+ static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *size)
+ {
+ 	int err;
+ 	u32 hdr_len, sample_len, total_len, flags, *hdr;
+ 	u64 cons_pos, prod_pos;
+
+ 	/* Synchronizes with smp_store_release() in user-space producer.
*/ 564 + prod_pos = smp_load_acquire(&rb->producer_pos); 565 + if (prod_pos % 8) 566 + return -EINVAL; 567 + 568 + /* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */ 569 + cons_pos = smp_load_acquire(&rb->consumer_pos); 570 + if (cons_pos >= prod_pos) 571 + return -ENODATA; 572 + 573 + hdr = (u32 *)((uintptr_t)rb->data + (uintptr_t)(cons_pos & rb->mask)); 574 + /* Synchronizes with smp_store_release() in user-space producer. */ 575 + hdr_len = smp_load_acquire(hdr); 576 + flags = hdr_len & (BPF_RINGBUF_BUSY_BIT | BPF_RINGBUF_DISCARD_BIT); 577 + sample_len = hdr_len & ~flags; 578 + total_len = round_up(sample_len + BPF_RINGBUF_HDR_SZ, 8); 579 + 580 + /* The sample must fit within the region advertised by the producer position. */ 581 + if (total_len > prod_pos - cons_pos) 582 + return -EINVAL; 583 + 584 + /* The sample must fit within the data region of the ring buffer. */ 585 + if (total_len > ringbuf_total_data_sz(rb)) 586 + return -E2BIG; 587 + 588 + /* The sample must fit into a struct bpf_dynptr. */ 589 + err = bpf_dynptr_check_size(sample_len); 590 + if (err) 591 + return -E2BIG; 592 + 593 + if (flags & BPF_RINGBUF_DISCARD_BIT) { 594 + /* If the discard bit is set, the sample should be skipped. 595 + * 596 + * Update the consumer pos, and return -EAGAIN so the caller 597 + * knows to skip this sample and try to read the next one. 
598 + */ 599 + smp_store_release(&rb->consumer_pos, cons_pos + total_len); 600 + return -EAGAIN; 601 + } 602 + 603 + if (flags & BPF_RINGBUF_BUSY_BIT) 604 + return -ENODATA; 605 + 606 + *sample = (void *)((uintptr_t)rb->data + 607 + (uintptr_t)((cons_pos + BPF_RINGBUF_HDR_SZ) & rb->mask)); 608 + *size = sample_len; 609 + return 0; 610 + } 611 + 612 + static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags) 613 + { 614 + u64 consumer_pos; 615 + u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8); 616 + 617 + /* Using smp_load_acquire() is unnecessary here, as the busy-bit 618 + * prevents another task from writing to consumer_pos after it was read 619 + * by this task with smp_load_acquire() in __bpf_user_ringbuf_peek(). 620 + */ 621 + consumer_pos = rb->consumer_pos; 622 + /* Synchronizes with smp_load_acquire() in user-space producer. */ 623 + smp_store_release(&rb->consumer_pos, consumer_pos + rounded_size); 624 + } 625 + 626 + BPF_CALL_4(bpf_user_ringbuf_drain, struct bpf_map *, map, 627 + void *, callback_fn, void *, callback_ctx, u64, flags) 628 + { 629 + struct bpf_ringbuf *rb; 630 + long samples, discarded_samples = 0, ret = 0; 631 + bpf_callback_t callback = (bpf_callback_t)callback_fn; 632 + u64 wakeup_flags = BPF_RB_NO_WAKEUP | BPF_RB_FORCE_WAKEUP; 633 + int busy = 0; 634 + 635 + if (unlikely(flags & ~wakeup_flags)) 636 + return -EINVAL; 637 + 638 + rb = container_of(map, struct bpf_ringbuf_map, map)->rb; 639 + 640 + /* If another consumer is already consuming a sample, wait for them to finish. 
*/ 641 + if (!atomic_try_cmpxchg(&rb->busy, &busy, 1)) 642 + return -EBUSY; 643 + 644 + for (samples = 0; samples < BPF_MAX_USER_RINGBUF_SAMPLES && ret == 0; samples++) { 645 + int err; 646 + u32 size; 647 + void *sample; 648 + struct bpf_dynptr_kern dynptr; 649 + 650 + err = __bpf_user_ringbuf_peek(rb, &sample, &size); 651 + if (err) { 652 + if (err == -ENODATA) { 653 + break; 654 + } else if (err == -EAGAIN) { 655 + discarded_samples++; 656 + continue; 657 + } else { 658 + ret = err; 659 + goto schedule_work_return; 660 + } 661 + } 662 + 663 + bpf_dynptr_init(&dynptr, sample, BPF_DYNPTR_TYPE_LOCAL, 0, size); 664 + ret = callback((uintptr_t)&dynptr, (uintptr_t)callback_ctx, 0, 0, 0); 665 + __bpf_user_ringbuf_sample_release(rb, size, flags); 666 + } 667 + ret = samples - discarded_samples; 668 + 669 + schedule_work_return: 670 + /* Prevent the clearing of the busy-bit from being reordered before the 671 + * storing of any rb consumer or producer positions. 672 + */ 673 + smp_mb__before_atomic(); 674 + atomic_set(&rb->busy, 0); 675 + 676 + if (flags & BPF_RB_FORCE_WAKEUP) 677 + irq_work_queue(&rb->work); 678 + else if (!(flags & BPF_RB_NO_WAKEUP) && samples > 0) 679 + irq_work_queue(&rb->work); 680 + return ret; 681 + } 682 + 683 + const struct bpf_func_proto bpf_user_ringbuf_drain_proto = { 684 + .func = bpf_user_ringbuf_drain, 685 + .ret_type = RET_INTEGER, 686 + .arg1_type = ARG_CONST_MAP_PTR, 687 + .arg2_type = ARG_PTR_TO_FUNC, 688 + .arg3_type = ARG_PTR_TO_STACK_OR_NULL, 689 + .arg4_type = ARG_ANYTHING, 641 690 };
+18 -11
kernel/bpf/syscall.c
··· 598 598 if (off_desc->type == BPF_KPTR_UNREF) { 599 599 u64 *p = (u64 *)btf_id_ptr; 600 600 601 - WRITE_ONCE(p, 0); 601 + WRITE_ONCE(*p, 0); 602 602 continue; 603 603 } 604 604 old_ptr = xchg(btf_id_ptr, 0); ··· 1049 1049 } 1050 1050 if (map->map_type != BPF_MAP_TYPE_HASH && 1051 1051 map->map_type != BPF_MAP_TYPE_LRU_HASH && 1052 - map->map_type != BPF_MAP_TYPE_ARRAY) { 1052 + map->map_type != BPF_MAP_TYPE_ARRAY && 1053 + map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY) { 1053 1054 ret = -EOPNOTSUPP; 1054 1055 goto free_map_tab; 1055 1056 } ··· 1417 1416 } 1418 1417 1419 1418 value_size = bpf_map_value_size(map); 1420 - 1421 - err = -ENOMEM; 1422 - value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN); 1423 - if (!value) 1419 + value = kvmemdup_bpfptr(uvalue, value_size); 1420 + if (IS_ERR(value)) { 1421 + err = PTR_ERR(value); 1424 1422 goto free_key; 1425 - 1426 - err = -EFAULT; 1427 - if (copy_from_bpfptr(value, uvalue, value_size) != 0) 1428 - goto free_value; 1423 + } 1429 1424 1430 1425 err = bpf_map_update_value(map, f, key, value, attr->flags); 1431 1426 1432 - free_value: 1433 1427 kvfree(value); 1434 1428 free_key: 1435 1429 kvfree(key); ··· 2092 2096 u64 cnt; 2093 2097 u64 misses; 2094 2098 }; 2099 + 2100 + void notrace bpf_prog_inc_misses_counter(struct bpf_prog *prog) 2101 + { 2102 + struct bpf_prog_stats *stats; 2103 + unsigned int flags; 2104 + 2105 + stats = this_cpu_ptr(prog->stats); 2106 + flags = u64_stats_update_begin_irqsave(&stats->syncp); 2107 + u64_stats_inc(&stats->misses); 2108 + u64_stats_update_end_irqrestore(&stats->syncp, flags); 2109 + } 2095 2110 2096 2111 static void bpf_prog_get_stats(const struct bpf_prog *prog, 2097 2112 struct bpf_prog_kstats *stats)
+202 -22
kernel/bpf/task_iter.c
··· 10 10 #include <linux/btf_ids.h> 11 11 #include "mmap_unlock_work.h" 12 12 13 + static const char * const iter_task_type_names[] = { 14 + "ALL", 15 + "TID", 16 + "PID", 17 + }; 18 + 13 19 struct bpf_iter_seq_task_common { 14 20 struct pid_namespace *ns; 21 + enum bpf_iter_task_type type; 22 + u32 pid; 23 + u32 pid_visiting; 15 24 }; 16 25 17 26 struct bpf_iter_seq_task_info { ··· 31 22 u32 tid; 32 23 }; 33 24 34 - static struct task_struct *task_seq_get_next(struct pid_namespace *ns, 25 + static struct task_struct *task_group_seq_get_next(struct bpf_iter_seq_task_common *common, 26 + u32 *tid, 27 + bool skip_if_dup_files) 28 + { 29 + struct task_struct *task, *next_task; 30 + struct pid *pid; 31 + u32 saved_tid; 32 + 33 + if (!*tid) { 34 + /* The first time, the iterator calls this function. */ 35 + pid = find_pid_ns(common->pid, common->ns); 36 + if (!pid) 37 + return NULL; 38 + 39 + task = get_pid_task(pid, PIDTYPE_TGID); 40 + if (!task) 41 + return NULL; 42 + 43 + *tid = common->pid; 44 + common->pid_visiting = common->pid; 45 + 46 + return task; 47 + } 48 + 49 + /* If the control returns to user space and comes back to the 50 + * kernel again, *tid and common->pid_visiting should be the 51 + * same for task_seq_start() to pick up the correct task. 
52 + */ 53 + if (*tid == common->pid_visiting) { 54 + pid = find_pid_ns(common->pid_visiting, common->ns); 55 + task = get_pid_task(pid, PIDTYPE_PID); 56 + 57 + return task; 58 + } 59 + 60 + pid = find_pid_ns(common->pid_visiting, common->ns); 61 + if (!pid) 62 + return NULL; 63 + 64 + task = get_pid_task(pid, PIDTYPE_PID); 65 + if (!task) 66 + return NULL; 67 + 68 + retry: 69 + if (!pid_alive(task)) { 70 + put_task_struct(task); 71 + return NULL; 72 + } 73 + 74 + next_task = next_thread(task); 75 + put_task_struct(task); 76 + if (!next_task) 77 + return NULL; 78 + 79 + saved_tid = *tid; 80 + *tid = __task_pid_nr_ns(next_task, PIDTYPE_PID, common->ns); 81 + if (!*tid || *tid == common->pid) { 82 + /* Run out of tasks of a process. The tasks of a 83 + * thread_group are linked as circular linked list. 84 + */ 85 + *tid = saved_tid; 86 + return NULL; 87 + } 88 + 89 + get_task_struct(next_task); 90 + common->pid_visiting = *tid; 91 + 92 + if (skip_if_dup_files && task->files == task->group_leader->files) { 93 + task = next_task; 94 + goto retry; 95 + } 96 + 97 + return next_task; 98 + } 99 + 100 + static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *common, 35 101 u32 *tid, 36 102 bool skip_if_dup_files) 37 103 { 38 104 struct task_struct *task = NULL; 39 105 struct pid *pid; 40 106 107 + if (common->type == BPF_TASK_ITER_TID) { 108 + if (*tid && *tid != common->pid) 109 + return NULL; 110 + rcu_read_lock(); 111 + pid = find_pid_ns(common->pid, common->ns); 112 + if (pid) { 113 + task = get_pid_task(pid, PIDTYPE_TGID); 114 + *tid = common->pid; 115 + } 116 + rcu_read_unlock(); 117 + 118 + return task; 119 + } 120 + 121 + if (common->type == BPF_TASK_ITER_TGID) { 122 + rcu_read_lock(); 123 + task = task_group_seq_get_next(common, tid, skip_if_dup_files); 124 + rcu_read_unlock(); 125 + 126 + return task; 127 + } 128 + 41 129 rcu_read_lock(); 42 130 retry: 43 - pid = find_ge_pid(*tid, ns); 131 + pid = find_ge_pid(*tid, common->ns); 44 132 if 
(pid) { 45 - *tid = pid_nr_ns(pid, ns); 133 + *tid = pid_nr_ns(pid, common->ns); 46 134 task = get_pid_task(pid, PIDTYPE_PID); 47 135 if (!task) { 48 136 ++*tid; ··· 162 56 struct bpf_iter_seq_task_info *info = seq->private; 163 57 struct task_struct *task; 164 58 165 - task = task_seq_get_next(info->common.ns, &info->tid, false); 59 + task = task_seq_get_next(&info->common, &info->tid, false); 166 60 if (!task) 167 61 return NULL; 168 62 ··· 179 73 ++*pos; 180 74 ++info->tid; 181 75 put_task_struct((struct task_struct *)v); 182 - task = task_seq_get_next(info->common.ns, &info->tid, false); 76 + task = task_seq_get_next(&info->common, &info->tid, false); 183 77 if (!task) 184 78 return NULL; 185 79 ··· 223 117 put_task_struct((struct task_struct *)v); 224 118 } 225 119 120 + static int bpf_iter_attach_task(struct bpf_prog *prog, 121 + union bpf_iter_link_info *linfo, 122 + struct bpf_iter_aux_info *aux) 123 + { 124 + unsigned int flags; 125 + struct pid *pid; 126 + pid_t tgid; 127 + 128 + if ((!!linfo->task.tid + !!linfo->task.pid + !!linfo->task.pid_fd) > 1) 129 + return -EINVAL; 130 + 131 + aux->task.type = BPF_TASK_ITER_ALL; 132 + if (linfo->task.tid != 0) { 133 + aux->task.type = BPF_TASK_ITER_TID; 134 + aux->task.pid = linfo->task.tid; 135 + } 136 + if (linfo->task.pid != 0) { 137 + aux->task.type = BPF_TASK_ITER_TGID; 138 + aux->task.pid = linfo->task.pid; 139 + } 140 + if (linfo->task.pid_fd != 0) { 141 + aux->task.type = BPF_TASK_ITER_TGID; 142 + 143 + pid = pidfd_get_pid(linfo->task.pid_fd, &flags); 144 + if (IS_ERR(pid)) 145 + return PTR_ERR(pid); 146 + 147 + tgid = pid_nr_ns(pid, task_active_pid_ns(current)); 148 + aux->task.pid = tgid; 149 + put_pid(pid); 150 + } 151 + 152 + return 0; 153 + } 154 + 226 155 static const struct seq_operations task_seq_ops = { 227 156 .start = task_seq_start, 228 157 .next = task_seq_next, ··· 278 137 static struct file * 279 138 task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info) 280 139 { 281 - struct 
pid_namespace *ns = info->common.ns; 282 - u32 curr_tid = info->tid; 140 + u32 saved_tid = info->tid; 283 141 struct task_struct *curr_task; 284 142 unsigned int curr_fd = info->fd; 285 143 ··· 291 151 curr_task = info->task; 292 152 curr_fd = info->fd; 293 153 } else { 294 - curr_task = task_seq_get_next(ns, &curr_tid, true); 154 + curr_task = task_seq_get_next(&info->common, &info->tid, true); 295 155 if (!curr_task) { 296 156 info->task = NULL; 297 - info->tid = curr_tid; 298 157 return NULL; 299 158 } 300 159 301 - /* set info->task and info->tid */ 160 + /* set info->task */ 302 161 info->task = curr_task; 303 - if (curr_tid == info->tid) { 162 + if (saved_tid == info->tid) 304 163 curr_fd = info->fd; 305 - } else { 306 - info->tid = curr_tid; 164 + else 307 165 curr_fd = 0; 308 - } 309 166 } 310 167 311 168 rcu_read_lock(); ··· 323 186 /* the current task is done, go to the next task */ 324 187 rcu_read_unlock(); 325 188 put_task_struct(curr_task); 189 + 190 + if (info->common.type == BPF_TASK_ITER_TID) { 191 + info->task = NULL; 192 + return NULL; 193 + } 194 + 326 195 info->task = NULL; 327 196 info->fd = 0; 328 - curr_tid = ++(info->tid); 197 + saved_tid = ++(info->tid); 329 198 goto again; 330 199 } 331 200 ··· 412 269 struct bpf_iter_seq_task_common *common = priv_data; 413 270 414 271 common->ns = get_pid_ns(task_active_pid_ns(current)); 272 + common->type = aux->task.type; 273 + common->pid = aux->task.pid; 274 + 415 275 return 0; 416 276 } 417 277 ··· 453 307 static struct vm_area_struct * 454 308 task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info) 455 309 { 456 - struct pid_namespace *ns = info->common.ns; 457 310 enum bpf_task_vma_iter_find_op op; 458 311 struct vm_area_struct *curr_vma; 459 312 struct task_struct *curr_task; 460 - u32 curr_tid = info->tid; 313 + u32 saved_tid = info->tid; 461 314 462 315 /* If this function returns a non-NULL vma, it holds a reference to 463 316 * the task_struct, and holds read lock on 
vma->mm->mmap_lock. ··· 516 371 } 517 372 } else { 518 373 again: 519 - curr_task = task_seq_get_next(ns, &curr_tid, true); 374 + curr_task = task_seq_get_next(&info->common, &info->tid, true); 520 375 if (!curr_task) { 521 - info->tid = curr_tid + 1; 376 + info->tid++; 522 377 goto finish; 523 378 } 524 379 525 - if (curr_tid != info->tid) { 526 - info->tid = curr_tid; 380 + if (saved_tid != info->tid) { 527 381 /* new task, process the first vma */ 528 382 op = task_vma_iter_first_vma; 529 383 } else { ··· 574 430 return curr_vma; 575 431 576 432 next_task: 433 + if (info->common.type == BPF_TASK_ITER_TID) 434 + goto finish; 435 + 577 436 put_task_struct(curr_task); 578 437 info->task = NULL; 579 - curr_tid++; 438 + info->tid++; 580 439 goto again; 581 440 582 441 finish: ··· 678 531 .seq_priv_size = sizeof(struct bpf_iter_seq_task_info), 679 532 }; 680 533 534 + static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct bpf_link_info *info) 535 + { 536 + switch (aux->task.type) { 537 + case BPF_TASK_ITER_TID: 538 + info->iter.task.tid = aux->task.pid; 539 + break; 540 + case BPF_TASK_ITER_TGID: 541 + info->iter.task.pid = aux->task.pid; 542 + break; 543 + default: 544 + break; 545 + } 546 + return 0; 547 + } 548 + 549 + static void bpf_iter_task_show_fdinfo(const struct bpf_iter_aux_info *aux, struct seq_file *seq) 550 + { 551 + seq_printf(seq, "task_type:\t%s\n", iter_task_type_names[aux->task.type]); 552 + if (aux->task.type == BPF_TASK_ITER_TID) 553 + seq_printf(seq, "tid:\t%u\n", aux->task.pid); 554 + else if (aux->task.type == BPF_TASK_ITER_TGID) 555 + seq_printf(seq, "pid:\t%u\n", aux->task.pid); 556 + } 557 + 681 558 static struct bpf_iter_reg task_reg_info = { 682 559 .target = "task", 560 + .attach_target = bpf_iter_attach_task, 683 561 .feature = BPF_ITER_RESCHED, 684 562 .ctx_arg_info_size = 1, 685 563 .ctx_arg_info = { ··· 712 540 PTR_TO_BTF_ID_OR_NULL }, 713 541 }, 714 542 .seq_info = &task_seq_info, 543 + .fill_link_info = 
bpf_iter_fill_link_info, 544 + .show_fdinfo = bpf_iter_task_show_fdinfo, 715 545 }; 716 546 717 547 static const struct bpf_iter_seq_info task_file_seq_info = { ··· 725 551 726 552 static struct bpf_iter_reg task_file_reg_info = { 727 553 .target = "task_file", 554 + .attach_target = bpf_iter_attach_task, 728 555 .feature = BPF_ITER_RESCHED, 729 556 .ctx_arg_info_size = 2, 730 557 .ctx_arg_info = { ··· 735 560 PTR_TO_BTF_ID_OR_NULL }, 736 561 }, 737 562 .seq_info = &task_file_seq_info, 563 + .fill_link_info = bpf_iter_fill_link_info, 564 + .show_fdinfo = bpf_iter_task_show_fdinfo, 738 565 }; 739 566 740 567 static const struct bpf_iter_seq_info task_vma_seq_info = { ··· 748 571 749 572 static struct bpf_iter_reg task_vma_reg_info = { 750 573 .target = "task_vma", 574 + .attach_target = bpf_iter_attach_task, 751 575 .feature = BPF_ITER_RESCHED, 752 576 .ctx_arg_info_size = 2, 753 577 .ctx_arg_info = { ··· 758 580 PTR_TO_BTF_ID_OR_NULL }, 759 581 }, 760 582 .seq_info = &task_vma_seq_info, 583 + .fill_link_info = bpf_iter_fill_link_info, 584 + .show_fdinfo = bpf_iter_task_show_fdinfo, 761 585 }; 762 586 763 587 BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
+30 -30
kernel/bpf/trampoline.c
··· 116 116 (ptype == BPF_PROG_TYPE_LSM && eatype == BPF_LSM_MAC); 117 117 } 118 118 119 - void *bpf_jit_alloc_exec_page(void) 120 - { 121 - void *image; 122 - 123 - image = bpf_jit_alloc_exec(PAGE_SIZE); 124 - if (!image) 125 - return NULL; 126 - 127 - set_vm_flush_reset_perms(image); 128 - /* Keep image as writeable. The alternative is to keep flipping ro/rw 129 - * every time new program is attached or detached. 130 - */ 131 - set_memory_x((long)image, 1); 132 - return image; 133 - } 134 - 135 119 void bpf_image_ksym_add(void *data, struct bpf_ksym *ksym) 136 120 { 137 121 ksym->start = (unsigned long) data; ··· 388 404 goto out_free_im; 389 405 390 406 err = -ENOMEM; 391 - im->image = image = bpf_jit_alloc_exec_page(); 407 + im->image = image = bpf_jit_alloc_exec(PAGE_SIZE); 392 408 if (!image) 393 409 goto out_uncharge; 410 + set_vm_flush_reset_perms(image); 394 411 395 412 err = percpu_ref_init(&im->pcref, __bpf_tramp_image_release, 0, GFP_KERNEL); 396 413 if (err) ··· 467 482 tr->func.addr); 468 483 if (err < 0) 469 484 goto out; 485 + 486 + set_memory_ro((long)im->image, 1); 487 + set_memory_x((long)im->image, 1); 470 488 471 489 WARN_ON(tr->cur_image && tr->selector == 0); 472 490 WARN_ON(!tr->cur_image && tr->selector); ··· 851 863 return start; 852 864 } 853 865 854 - static void notrace inc_misses_counter(struct bpf_prog *prog) 855 - { 856 - struct bpf_prog_stats *stats; 857 - unsigned int flags; 858 - 859 - stats = this_cpu_ptr(prog->stats); 860 - flags = u64_stats_update_begin_irqsave(&stats->syncp); 861 - u64_stats_inc(&stats->misses); 862 - u64_stats_update_end_irqrestore(&stats->syncp, flags); 863 - } 864 - 865 866 /* The logic is similar to bpf_prog_run(), but with an explicit 866 867 * rcu_read_lock() and migrate_disable() which are required 867 868 * for the trampoline. 
The macro is split into ··· 873 896 run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); 874 897 875 898 if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { 876 - inc_misses_counter(prog); 899 + bpf_prog_inc_misses_counter(prog); 877 900 return 0; 878 901 } 879 902 return bpf_prog_start_time(); ··· 944 967 might_fault(); 945 968 946 969 if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { 947 - inc_misses_counter(prog); 970 + bpf_prog_inc_misses_counter(prog); 948 971 return 0; 949 972 } 950 973 ··· 962 985 this_cpu_dec(*(prog->active)); 963 986 migrate_enable(); 964 987 rcu_read_unlock_trace(); 988 + } 989 + 990 + u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog, 991 + struct bpf_tramp_run_ctx *run_ctx) 992 + __acquires(RCU) 993 + { 994 + rcu_read_lock(); 995 + migrate_disable(); 996 + 997 + run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); 998 + 999 + return bpf_prog_start_time(); 1000 + } 1001 + 1002 + void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start, 1003 + struct bpf_tramp_run_ctx *run_ctx) 1004 + __releases(RCU) 1005 + { 1006 + bpf_reset_run_ctx(run_ctx->saved_run_ctx); 1007 + 1008 + update_prog_stats(prog, start); 1009 + migrate_enable(); 1010 + rcu_read_unlock(); 965 1011 } 966 1012 967 1013 void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr)
+197 -144
kernel/bpf/verifier.c
··· 23 23 #include <linux/error-injection.h> 24 24 #include <linux/bpf_lsm.h> 25 25 #include <linux/btf_ids.h> 26 + #include <linux/poison.h> 26 27 27 28 #include "disasm.h" 28 29 ··· 371 370 bpf_verifier_vlog(log, fmt, args); 372 371 va_end(args); 373 372 } 373 + EXPORT_SYMBOL_GPL(bpf_log); 374 374 375 375 static const char *ltrim(const char *s) 376 376 { ··· 563 561 [PTR_TO_BUF] = "buf", 564 562 [PTR_TO_FUNC] = "func", 565 563 [PTR_TO_MAP_KEY] = "map_key", 564 + [PTR_TO_DYNPTR] = "dynptr_ptr", 566 565 }; 567 566 568 567 if (type & PTR_MAYBE_NULL) { ··· 782 779 return true; 783 780 } 784 781 785 - static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg, 786 - enum bpf_arg_type arg_type) 782 + bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, 783 + struct bpf_reg_state *reg) 787 784 { 788 785 struct bpf_func_state *state = func(env, reg); 789 786 int spi = get_spi(reg->off); ··· 799 796 return false; 800 797 } 801 798 799 + return true; 800 + } 801 + 802 + bool is_dynptr_type_expected(struct bpf_verifier_env *env, 803 + struct bpf_reg_state *reg, 804 + enum bpf_arg_type arg_type) 805 + { 806 + struct bpf_func_state *state = func(env, reg); 807 + enum bpf_dynptr_type dynptr_type; 808 + int spi = get_spi(reg->off); 809 + 802 810 /* ARG_PTR_TO_DYNPTR takes any type of dynptr */ 803 811 if (arg_type == ARG_PTR_TO_DYNPTR) 804 812 return true; 805 813 806 - return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type); 814 + dynptr_type = arg_to_dynptr_type(arg_type); 815 + 816 + return state->stack[spi].spilled_ptr.dynptr.type == dynptr_type; 807 817 } 808 818 809 819 /* The reg state of a pointer or a bounded scalar was saved when ··· 1765 1749 state->callsite = callsite; 1766 1750 state->frameno = frameno; 1767 1751 state->subprogno = subprogno; 1752 + state->callback_ret_range = tnum_range(0, 0); 1768 1753 init_reg_state(env, state); 1769 1754 mark_verifier_state_scratched(env); 1770 1755 } ··· 2925 
2908 return 0; 2926 2909 } 2927 2910 2928 - static int mark_chain_precision(struct bpf_verifier_env *env, int regno) 2911 + int mark_chain_precision(struct bpf_verifier_env *env, int regno) 2929 2912 { 2930 2913 return __mark_chain_precision(env, regno, -1); 2931 2914 } ··· 5250 5233 env, 5251 5234 regno, reg->off, access_size, 5252 5235 zero_size_allowed, ACCESS_HELPER, meta); 5236 + case PTR_TO_CTX: 5237 + /* in case the function doesn't know how to access the context, 5238 + * (because we are in a program of type SYSCALL for example), we 5239 + * can not statically check its size. 5240 + * Dynamically check it now. 5241 + */ 5242 + if (!env->ops->convert_ctx_access) { 5243 + enum bpf_access_type atype = meta && meta->raw_mode ? BPF_WRITE : BPF_READ; 5244 + int offset = access_size - 1; 5245 + 5246 + /* Allow zero-byte read from PTR_TO_CTX */ 5247 + if (access_size == 0) 5248 + return zero_size_allowed ? 0 : -EACCES; 5249 + 5250 + return check_mem_access(env, env->insn_idx, regno, offset, BPF_B, 5251 + atype, -1, false); 5252 + } 5253 + 5254 + fallthrough; 5253 5255 default: /* scalar_value or invalid ptr */ 5254 5256 /* Allow zero-byte read from NULL, regardless of pointer type */ 5255 5257 if (zero_size_allowed && access_size == 0 && ··· 5702 5666 static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } }; 5703 5667 static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } }; 5704 5668 static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } }; 5669 + static const struct bpf_reg_types dynptr_types = { 5670 + .types = { 5671 + PTR_TO_STACK, 5672 + PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL, 5673 + } 5674 + }; 5705 5675 5706 5676 static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = { 5707 5677 [ARG_PTR_TO_MAP_KEY] = &map_key_value_types, ··· 5734 5692 [ARG_PTR_TO_CONST_STR] = &const_str_ptr_types, 5735 5693 [ARG_PTR_TO_TIMER] = &timer_types, 5736 5694 [ARG_PTR_TO_KPTR] = 
&kptr_types, 5737 - [ARG_PTR_TO_DYNPTR] = &stack_ptr_types, 5695 + [ARG_PTR_TO_DYNPTR] = &dynptr_types, 5738 5696 }; 5739 5697 5740 5698 static int check_reg_type(struct bpf_verifier_env *env, u32 regno, ··· 5803 5761 if (meta->func_id == BPF_FUNC_kptr_xchg) { 5804 5762 if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno)) 5805 5763 return -EACCES; 5806 - } else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off, 5807 - btf_vmlinux, *arg_btf_id, 5808 - strict_type_match)) { 5809 - verbose(env, "R%d is of type %s but %s is expected\n", 5810 - regno, kernel_type_name(reg->btf, reg->btf_id), 5811 - kernel_type_name(btf_vmlinux, *arg_btf_id)); 5812 - return -EACCES; 5764 + } else { 5765 + if (arg_btf_id == BPF_PTR_POISON) { 5766 + verbose(env, "verifier internal error:"); 5767 + verbose(env, "R%d has non-overwritten BPF_PTR_POISON type\n", 5768 + regno); 5769 + return -EACCES; 5770 + } 5771 + 5772 + if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off, 5773 + btf_vmlinux, *arg_btf_id, 5774 + strict_type_match)) { 5775 + verbose(env, "R%d is of type %s but %s is expected\n", 5776 + regno, kernel_type_name(reg->btf, reg->btf_id), 5777 + kernel_type_name(btf_vmlinux, *arg_btf_id)); 5778 + return -EACCES; 5779 + } 5813 5780 } 5814 5781 } 5815 5782 ··· 6086 6035 err = check_mem_size_reg(env, reg, regno, true, meta); 6087 6036 break; 6088 6037 case ARG_PTR_TO_DYNPTR: 6038 + /* We only need to check for initialized / uninitialized helper 6039 + * dynptr args if the dynptr is not PTR_TO_DYNPTR, as the 6040 + * assumption is that if it is, that a helper function 6041 + * initialized the dynptr on behalf of the BPF program. 
6042 + */ 6043 + if (base_type(reg->type) == PTR_TO_DYNPTR) 6044 + break; 6089 6045 if (arg_type & MEM_UNINIT) { 6090 6046 if (!is_dynptr_reg_valid_uninit(env, reg)) { 6091 6047 verbose(env, "Dynptr has to be an uninitialized dynptr\n"); ··· 6108 6050 } 6109 6051 6110 6052 meta->uninit_dynptr_regno = regno; 6111 - } else if (!is_dynptr_reg_valid_init(env, reg, arg_type)) { 6053 + } else if (!is_dynptr_reg_valid_init(env, reg)) { 6054 + verbose(env, 6055 + "Expected an initialized dynptr as arg #%d\n", 6056 + arg + 1); 6057 + return -EINVAL; 6058 + } else if (!is_dynptr_type_expected(env, reg, arg_type)) { 6112 6059 const char *err_extra = ""; 6113 6060 6114 6061 switch (arg_type & DYNPTR_TYPE_FLAG_MASK) { 6115 6062 case DYNPTR_TYPE_LOCAL: 6116 - err_extra = "local "; 6063 + err_extra = "local"; 6117 6064 break; 6118 6065 case DYNPTR_TYPE_RINGBUF: 6119 - err_extra = "ringbuf "; 6066 + err_extra = "ringbuf"; 6120 6067 break; 6121 6068 default: 6069 + err_extra = "<unknown>"; 6122 6070 break; 6123 6071 } 6124 - 6125 - verbose(env, "Expected an initialized %sdynptr as arg #%d\n", 6072 + verbose(env, 6073 + "Expected a dynptr of type %s as arg #%d\n", 6126 6074 err_extra, arg + 1); 6127 6075 return -EINVAL; 6128 6076 } ··· 6273 6209 func_id != BPF_FUNC_ringbuf_discard_dynptr) 6274 6210 goto error; 6275 6211 break; 6212 + case BPF_MAP_TYPE_USER_RINGBUF: 6213 + if (func_id != BPF_FUNC_user_ringbuf_drain) 6214 + goto error; 6215 + break; 6276 6216 case BPF_MAP_TYPE_STACK_TRACE: 6277 6217 if (func_id != BPF_FUNC_get_stackid) 6278 6218 goto error; ··· 6394 6326 case BPF_FUNC_ringbuf_submit_dynptr: 6395 6327 case BPF_FUNC_ringbuf_discard_dynptr: 6396 6328 if (map->map_type != BPF_MAP_TYPE_RINGBUF) 6329 + goto error; 6330 + break; 6331 + case BPF_FUNC_user_ringbuf_drain: 6332 + if (map->map_type != BPF_MAP_TYPE_USER_RINGBUF) 6397 6333 goto error; 6398 6334 break; 6399 6335 case BPF_FUNC_get_stackid: ··· 6566 6494 /* Packet data might have moved, any old 
PTR_TO_PACKET[_META,_END] 6567 6495 * are now invalid, so turn them into unknown SCALAR_VALUE. 6568 6496 */ 6569 - static void __clear_all_pkt_pointers(struct bpf_verifier_env *env, 6570 - struct bpf_func_state *state) 6571 - { 6572 - struct bpf_reg_state *regs = state->regs, *reg; 6573 - int i; 6574 - 6575 - for (i = 0; i < MAX_BPF_REG; i++) 6576 - if (reg_is_pkt_pointer_any(&regs[i])) 6577 - mark_reg_unknown(env, regs, i); 6578 - 6579 - bpf_for_each_spilled_reg(i, state, reg) { 6580 - if (!reg) 6581 - continue; 6582 - if (reg_is_pkt_pointer_any(reg)) 6583 - __mark_reg_unknown(env, reg); 6584 - } 6585 - } 6586 - 6587 6497 static void clear_all_pkt_pointers(struct bpf_verifier_env *env) 6588 6498 { 6589 - struct bpf_verifier_state *vstate = env->cur_state; 6590 - int i; 6499 + struct bpf_func_state *state; 6500 + struct bpf_reg_state *reg; 6591 6501 6592 - for (i = 0; i <= vstate->curframe; i++) 6593 - __clear_all_pkt_pointers(env, vstate->frame[i]); 6502 + bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ 6503 + if (reg_is_pkt_pointer_any(reg)) 6504 + __mark_reg_unknown(env, reg); 6505 + })); 6594 6506 } 6595 6507 6596 6508 enum { ··· 6603 6547 reg->range = AT_PKT_END; 6604 6548 } 6605 6549 6606 - static void release_reg_references(struct bpf_verifier_env *env, 6607 - struct bpf_func_state *state, 6608 - int ref_obj_id) 6609 - { 6610 - struct bpf_reg_state *regs = state->regs, *reg; 6611 - int i; 6612 - 6613 - for (i = 0; i < MAX_BPF_REG; i++) 6614 - if (regs[i].ref_obj_id == ref_obj_id) 6615 - mark_reg_unknown(env, regs, i); 6616 - 6617 - bpf_for_each_spilled_reg(i, state, reg) { 6618 - if (!reg) 6619 - continue; 6620 - if (reg->ref_obj_id == ref_obj_id) 6621 - __mark_reg_unknown(env, reg); 6622 - } 6623 - } 6624 - 6625 6550 /* The pointer with the specified id has released its reference to kernel 6626 6551 * resources. Identify all copies of the same pointer and clear the reference. 
6627 6552 */ 6628 6553 static int release_reference(struct bpf_verifier_env *env, 6629 6554 int ref_obj_id) 6630 6555 { 6631 - struct bpf_verifier_state *vstate = env->cur_state; 6556 + struct bpf_func_state *state; 6557 + struct bpf_reg_state *reg; 6632 6558 int err; 6633 - int i; 6634 6559 6635 6560 err = release_reference_state(cur_func(env), ref_obj_id); 6636 6561 if (err) 6637 6562 return err; 6638 6563 6639 - for (i = 0; i <= vstate->curframe; i++) 6640 - release_reg_references(env, vstate->frame[i], ref_obj_id); 6564 + bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ 6565 + if (reg->ref_obj_id == ref_obj_id) 6566 + __mark_reg_unknown(env, reg); 6567 + })); 6641 6568 6642 6569 return 0; 6643 6570 } ··· 6668 6629 func_info_aux = env->prog->aux->func_info_aux; 6669 6630 if (func_info_aux) 6670 6631 is_global = func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL; 6671 - err = btf_check_subprog_arg_match(env, subprog, caller->regs); 6632 + err = btf_check_subprog_call(env, subprog, caller->regs); 6672 6633 if (err == -EFAULT) 6673 6634 return err; 6674 6635 if (is_global) { ··· 6842 6803 return err; 6843 6804 6844 6805 callee->in_callback_fn = true; 6806 + callee->callback_ret_range = tnum_range(0, 1); 6845 6807 return 0; 6846 6808 } 6847 6809 ··· 6864 6824 __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); 6865 6825 6866 6826 callee->in_callback_fn = true; 6827 + callee->callback_ret_range = tnum_range(0, 1); 6867 6828 return 0; 6868 6829 } 6869 6830 ··· 6894 6853 __mark_reg_not_init(env, &callee->regs[BPF_REG_4]); 6895 6854 __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); 6896 6855 callee->in_async_callback_fn = true; 6856 + callee->callback_ret_range = tnum_range(0, 1); 6897 6857 return 0; 6898 6858 } 6899 6859 ··· 6921 6879 /* unused */ 6922 6880 __mark_reg_not_init(env, &callee->regs[BPF_REG_4]); 6923 6881 __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); 6882 + callee->in_callback_fn = true; 6883 + callee->callback_ret_range = tnum_range(0, 1); 
6884 + return 0; 6885 + } 6886 + 6887 + static int set_user_ringbuf_callback_state(struct bpf_verifier_env *env, 6888 + struct bpf_func_state *caller, 6889 + struct bpf_func_state *callee, 6890 + int insn_idx) 6891 + { 6892 + /* bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void 6893 + * callback_ctx, u64 flags); 6894 + * callback_fn(struct bpf_dynptr_t* dynptr, void *callback_ctx); 6895 + */ 6896 + __mark_reg_not_init(env, &callee->regs[BPF_REG_0]); 6897 + callee->regs[BPF_REG_1].type = PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL; 6898 + __mark_reg_known_zero(&callee->regs[BPF_REG_1]); 6899 + callee->regs[BPF_REG_2] = caller->regs[BPF_REG_3]; 6900 + 6901 + /* unused */ 6902 + __mark_reg_not_init(env, &callee->regs[BPF_REG_3]); 6903 + __mark_reg_not_init(env, &callee->regs[BPF_REG_4]); 6904 + __mark_reg_not_init(env, &callee->regs[BPF_REG_5]); 6905 + 6924 6906 callee->in_callback_fn = true; 6925 6907 return 0; 6926 6908 } ··· 6973 6907 caller = state->frame[state->curframe]; 6974 6908 if (callee->in_callback_fn) { 6975 6909 /* enforce R0 return value range [0, 1]. 
*/ 6976 - struct tnum range = tnum_range(0, 1); 6910 + struct tnum range = callee->callback_ret_range; 6977 6911 6978 6912 if (r0->type != SCALAR_VALUE) { 6979 6913 verbose(env, "R0 not a scalar value\n"); ··· 7408 7342 case BPF_FUNC_dynptr_data: 7409 7343 for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) { 7410 7344 if (arg_type_is_dynptr(fn->arg_type[i])) { 7345 + struct bpf_reg_state *reg = &regs[BPF_REG_1 + i]; 7346 + 7411 7347 if (meta.ref_obj_id) { 7412 7348 verbose(env, "verifier internal error: meta.ref_obj_id already set\n"); 7413 7349 return -EFAULT; 7414 7350 } 7415 - /* Find the id of the dynptr we're tracking the reference of */ 7416 - meta.ref_obj_id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]); 7351 + 7352 + if (base_type(reg->type) != PTR_TO_DYNPTR) 7353 + /* Find the id of the dynptr we're 7354 + * tracking the reference of 7355 + */ 7356 + meta.ref_obj_id = stack_slot_get_id(env, reg); 7417 7357 break; 7418 7358 } 7419 7359 } ··· 7427 7355 verbose(env, "verifier internal error: no dynptr in bpf_dynptr_data()\n"); 7428 7356 return -EFAULT; 7429 7357 } 7358 + break; 7359 + case BPF_FUNC_user_ringbuf_drain: 7360 + err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, 7361 + set_user_ringbuf_callback_state); 7430 7362 break; 7431 7363 } 7432 7364 ··· 7541 7465 ret_btf = meta.kptr_off_desc->kptr.btf; 7542 7466 ret_btf_id = meta.kptr_off_desc->kptr.btf_id; 7543 7467 } else { 7468 + if (fn->ret_btf_id == BPF_PTR_POISON) { 7469 + verbose(env, "verifier internal error:"); 7470 + verbose(env, "func %s has non-overwritten BPF_PTR_POISON return type\n", 7471 + func_id_name(func_id)); 7472 + return -EINVAL; 7473 + } 7544 7474 ret_btf = btf_vmlinux; 7545 7475 ret_btf_id = *fn->ret_btf_id; 7546 7476 } ··· 7658 7576 { 7659 7577 const struct btf_type *t, *func, *func_proto, *ptr_type; 7660 7578 struct bpf_reg_state *regs = cur_regs(env); 7579 + struct bpf_kfunc_arg_meta meta = { 0 }; 7661 7580 const char *func_name, *ptr_type_name; 7662 7581 u32 i, nargs, 
func_id, ptr_type_id; 7663 7582 int err, insn_idx = *insn_idx_p; ··· 7693 7610 7694 7611 acq = *kfunc_flags & KF_ACQUIRE; 7695 7612 7613 + meta.flags = *kfunc_flags; 7614 + 7696 7615 /* Check the arguments */ 7697 - err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, *kfunc_flags); 7616 + err = btf_check_kfunc_arg_match(env, desc_btf, func_id, regs, &meta); 7698 7617 if (err < 0) 7699 7618 return err; 7700 7619 /* In case of release function, we get register number of refcounted ··· 7717 7632 /* Check return type */ 7718 7633 t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL); 7719 7634 7720 - if (acq && !btf_type_is_ptr(t)) { 7635 + if (acq && !btf_type_is_struct_ptr(desc_btf, t)) { 7721 7636 verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n"); 7722 7637 return -EINVAL; 7723 7638 } ··· 7729 7644 ptr_type = btf_type_skip_modifiers(desc_btf, t->type, 7730 7645 &ptr_type_id); 7731 7646 if (!btf_type_is_struct(ptr_type)) { 7732 - ptr_type_name = btf_name_by_offset(desc_btf, 7733 - ptr_type->name_off); 7734 - verbose(env, "kernel function %s returns pointer type %s %s is not supported\n", 7735 - func_name, btf_type_str(ptr_type), 7736 - ptr_type_name); 7737 - return -EINVAL; 7647 + if (!meta.r0_size) { 7648 + ptr_type_name = btf_name_by_offset(desc_btf, 7649 + ptr_type->name_off); 7650 + verbose(env, 7651 + "kernel function %s returns pointer type %s %s is not supported\n", 7652 + func_name, 7653 + btf_type_str(ptr_type), 7654 + ptr_type_name); 7655 + return -EINVAL; 7656 + } 7657 + 7658 + mark_reg_known_zero(env, regs, BPF_REG_0); 7659 + regs[BPF_REG_0].type = PTR_TO_MEM; 7660 + regs[BPF_REG_0].mem_size = meta.r0_size; 7661 + 7662 + if (meta.r0_rdonly) 7663 + regs[BPF_REG_0].type |= MEM_RDONLY; 7664 + 7665 + /* Ensures we don't access the memory after a release_reference() */ 7666 + if (meta.ref_obj_id) 7667 + regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id; 7668 + } else { 7669 + mark_reg_known_zero(env, regs, BPF_REG_0); 7670 + 
regs[BPF_REG_0].btf = desc_btf; 7671 + regs[BPF_REG_0].type = PTR_TO_BTF_ID; 7672 + regs[BPF_REG_0].btf_id = ptr_type_id; 7738 7673 } 7739 - mark_reg_known_zero(env, regs, BPF_REG_0); 7740 - regs[BPF_REG_0].btf = desc_btf; 7741 - regs[BPF_REG_0].type = PTR_TO_BTF_ID; 7742 - regs[BPF_REG_0].btf_id = ptr_type_id; 7743 7674 if (*kfunc_flags & KF_RET_NULL) { 7744 7675 regs[BPF_REG_0].type |= PTR_MAYBE_NULL; 7745 7676 /* For mark_ptr_or_null_reg, see 93c230e3f5bd6 */ ··· 9398 9297 return 0; 9399 9298 } 9400 9299 9401 - static void __find_good_pkt_pointers(struct bpf_func_state *state, 9402 - struct bpf_reg_state *dst_reg, 9403 - enum bpf_reg_type type, int new_range) 9404 - { 9405 - struct bpf_reg_state *reg; 9406 - int i; 9407 - 9408 - for (i = 0; i < MAX_BPF_REG; i++) { 9409 - reg = &state->regs[i]; 9410 - if (reg->type == type && reg->id == dst_reg->id) 9411 - /* keep the maximum range already checked */ 9412 - reg->range = max(reg->range, new_range); 9413 - } 9414 - 9415 - bpf_for_each_spilled_reg(i, state, reg) { 9416 - if (!reg) 9417 - continue; 9418 - if (reg->type == type && reg->id == dst_reg->id) 9419 - reg->range = max(reg->range, new_range); 9420 - } 9421 - } 9422 - 9423 9300 static void find_good_pkt_pointers(struct bpf_verifier_state *vstate, 9424 9301 struct bpf_reg_state *dst_reg, 9425 9302 enum bpf_reg_type type, 9426 9303 bool range_right_open) 9427 9304 { 9428 - int new_range, i; 9305 + struct bpf_func_state *state; 9306 + struct bpf_reg_state *reg; 9307 + int new_range; 9429 9308 9430 9309 if (dst_reg->off < 0 || 9431 9310 (dst_reg->off == 0 && range_right_open)) ··· 9470 9389 * the range won't allow anything. 9471 9390 * dst_reg->off is known < MAX_PACKET_OFF, therefore it fits in a u16. 
9472 9391 */ 9473 - for (i = 0; i <= vstate->curframe; i++) 9474 - __find_good_pkt_pointers(vstate->frame[i], dst_reg, type, 9475 - new_range); 9392 + bpf_for_each_reg_in_vstate(vstate, state, reg, ({ 9393 + if (reg->type == type && reg->id == dst_reg->id) 9394 + /* keep the maximum range already checked */ 9395 + reg->range = max(reg->range, new_range); 9396 + })); 9476 9397 } 9477 9398 9478 9399 static int is_branch32_taken(struct bpf_reg_state *reg, u32 val, u8 opcode) ··· 9963 9880 9964 9881 if (!reg_may_point_to_spin_lock(reg)) { 9965 9882 /* For not-NULL ptr, reg->ref_obj_id will be reset 9966 - * in release_reg_references(). 9883 + * in release_reference(). 9967 9884 * 9968 9885 * reg->id is still used by spin_lock ptr. Other 9969 9886 * than spin_lock ptr type, reg->id can be reset. 9970 9887 */ 9971 9888 reg->id = 0; 9972 9889 } 9973 - } 9974 - } 9975 - 9976 - static void __mark_ptr_or_null_regs(struct bpf_func_state *state, u32 id, 9977 - bool is_null) 9978 - { 9979 - struct bpf_reg_state *reg; 9980 - int i; 9981 - 9982 - for (i = 0; i < MAX_BPF_REG; i++) 9983 - mark_ptr_or_null_reg(state, &state->regs[i], id, is_null); 9984 - 9985 - bpf_for_each_spilled_reg(i, state, reg) { 9986 - if (!reg) 9987 - continue; 9988 - mark_ptr_or_null_reg(state, reg, id, is_null); 9989 9890 } 9990 9891 } 9991 9892 ··· 9980 9913 bool is_null) 9981 9914 { 9982 9915 struct bpf_func_state *state = vstate->frame[vstate->curframe]; 9983 - struct bpf_reg_state *regs = state->regs; 9916 + struct bpf_reg_state *regs = state->regs, *reg; 9984 9917 u32 ref_obj_id = regs[regno].ref_obj_id; 9985 9918 u32 id = regs[regno].id; 9986 - int i; 9987 9919 9988 9920 if (ref_obj_id && ref_obj_id == id && is_null) 9989 9921 /* regs[regno] is in the " == NULL" branch. 
··· 9991 9925 */ 9992 9926 WARN_ON_ONCE(release_reference_state(state, id)); 9993 9927 9994 - for (i = 0; i <= vstate->curframe; i++) 9995 - __mark_ptr_or_null_regs(vstate->frame[i], id, is_null); 9928 + bpf_for_each_reg_in_vstate(vstate, state, reg, ({ 9929 + mark_ptr_or_null_reg(state, reg, id, is_null); 9930 + })); 9996 9931 } 9997 9932 9998 9933 static bool try_match_pkt_pointers(const struct bpf_insn *insn, ··· 10106 10039 { 10107 10040 struct bpf_func_state *state; 10108 10041 struct bpf_reg_state *reg; 10109 - int i, j; 10110 10042 10111 - for (i = 0; i <= vstate->curframe; i++) { 10112 - state = vstate->frame[i]; 10113 - for (j = 0; j < MAX_BPF_REG; j++) { 10114 - reg = &state->regs[j]; 10115 - if (reg->type == SCALAR_VALUE && reg->id == known_reg->id) 10116 - *reg = *known_reg; 10117 - } 10118 - 10119 - bpf_for_each_spilled_reg(j, state, reg) { 10120 - if (!reg) 10121 - continue; 10122 - if (reg->type == SCALAR_VALUE && reg->id == known_reg->id) 10123 - *reg = *known_reg; 10124 - } 10125 - } 10043 + bpf_for_each_reg_in_vstate(vstate, state, reg, ({ 10044 + if (reg->type == SCALAR_VALUE && reg->id == known_reg->id) 10045 + *reg = *known_reg; 10046 + })); 10126 10047 } 10127 10048 10128 10049 static int check_cond_jmp_op(struct bpf_verifier_env *env, ··· 12709 12654 case BPF_MAP_TYPE_ARRAY_OF_MAPS: 12710 12655 case BPF_MAP_TYPE_HASH_OF_MAPS: 12711 12656 case BPF_MAP_TYPE_RINGBUF: 12657 + case BPF_MAP_TYPE_USER_RINGBUF: 12712 12658 case BPF_MAP_TYPE_INODE_STORAGE: 12713 12659 case BPF_MAP_TYPE_SK_STORAGE: 12714 12660 case BPF_MAP_TYPE_TASK_STORAGE: ··· 13503 13447 insn->code = BPF_LDX | BPF_PROBE_MEM | 13504 13448 BPF_SIZE((insn)->code); 13505 13449 env->prog->aux->num_exentries++; 13506 - } else if (resolve_prog_type(env->prog) != BPF_PROG_TYPE_STRUCT_OPS) { 13507 - verbose(env, "Writes through BTF pointers are not allowed\n"); 13508 - return -EINVAL; 13509 13450 } 13510 13451 continue; 13511 13452 default:
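A recurring pattern in the verifier.c hunks above is the removal of near-identical per-frame helpers (`__clear_all_pkt_pointers`, `release_reg_references`, `__find_good_pkt_pointers`, `__mark_ptr_or_null_regs`) in favor of a single `bpf_for_each_reg_in_vstate()` iterator that takes the per-register action as an expression. The sketch below models that refactor in plain userspace C; the struct layout, sizes, and macro name are simplified stand-ins, not the kernel's actual definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Miniature model of the verifier state: a few call frames, each with a
 * small fixed register file. Sizes are illustrative only. */
#define MAX_REG    4
#define MAX_FRAMES 3

enum reg_type { SCALAR_VALUE, PTR_TO_PACKET };

struct reg_state { enum reg_type type; int id; };
struct func_state { struct reg_state regs[MAX_REG]; };
struct verifier_state {
        int curframe;
        struct func_state *frame[MAX_FRAMES];
};

/* The body expression runs once per register in every active frame, so
 * each caller supplies only its per-register action instead of an
 * open-coded double loop. */
#define for_each_reg_in_vstate(vst, st, rg, expr)                \
        do {                                                     \
                int fi, ri;                                      \
                for (fi = 0; fi <= (vst)->curframe; fi++) {      \
                        (st) = (vst)->frame[fi];                 \
                        for (ri = 0; ri < MAX_REG; ri++) {       \
                                (rg) = &(st)->regs[ri];          \
                                expr;                            \
                        }                                        \
                }                                                \
        } while (0)

/* clear_all_pkt_pointers() after the refactor: one call site, one body */
static void clear_all_pkt_pointers(struct verifier_state *vst)
{
        struct func_state *st;
        struct reg_state *rg;

        for_each_reg_in_vstate(vst, st, rg, ({
                if (rg->type == PTR_TO_PACKET)
                        rg->type = SCALAR_VALUE; /* mark_reg_unknown() stand-in */
        }));
}
```

The statement-expression body (`({ ... })`) is GNU C, as in the kernel; the payoff is that every future "walk all registers" fix lands in one place instead of five copies.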
+5 -1
kernel/kprobes.c
··· 1607 1607 struct kprobe *old_p; 1608 1608 struct module *probed_mod; 1609 1609 kprobe_opcode_t *addr; 1610 + bool on_func_entry; 1610 1611 1611 1612 /* Adjust probe address from symbol */ 1612 - addr = kprobe_addr(p); 1613 + addr = _kprobe_addr(p->addr, p->symbol_name, p->offset, &on_func_entry); 1613 1614 if (IS_ERR(addr)) 1614 1615 return PTR_ERR(addr); 1615 1616 p->addr = addr; ··· 1629 1628 return ret; 1630 1629 1631 1630 mutex_lock(&kprobe_mutex); 1631 + 1632 + if (on_func_entry) 1633 + p->flags |= KPROBE_FLAG_ON_FUNC_ENTRY; 1632 1634 1633 1635 old_p = get_kprobe(p->addr); 1634 1636 if (old_p) {
+6
kernel/trace/Kconfig
··· 51 51 This allows for use of regs_get_kernel_argument() and 52 52 kernel_stack_pointer(). 53 53 54 + config HAVE_DYNAMIC_FTRACE_NO_PATCHABLE 55 + bool 56 + help 57 + If the architecture generates __patchable_function_entries sections 58 + but does not want them included in the ftrace locations. 59 + 54 60 config HAVE_FTRACE_MCOUNT_RECORD 55 61 bool 56 62 help
+208 -3
kernel/trace/bpf_trace.c
··· 20 20 #include <linux/fprobe.h> 21 21 #include <linux/bsearch.h> 22 22 #include <linux/sort.h> 23 + #include <linux/key.h> 24 + #include <linux/verification.h> 23 25 24 26 #include <net/bpf_sk_storage.h> 25 27 ··· 1028 1026 .arg1_type = ARG_PTR_TO_CTX, 1029 1027 }; 1030 1028 1029 + #ifdef CONFIG_X86_KERNEL_IBT 1030 + static unsigned long get_entry_ip(unsigned long fentry_ip) 1031 + { 1032 + u32 instr; 1033 + 1034 + /* Being extra safe in here in case entry ip is on the page-edge. */ 1035 + if (get_kernel_nofault(instr, (u32 *) fentry_ip - 1)) 1036 + return fentry_ip; 1037 + if (is_endbr(instr)) 1038 + fentry_ip -= ENDBR_INSN_SIZE; 1039 + return fentry_ip; 1040 + } 1041 + #else 1042 + #define get_entry_ip(fentry_ip) fentry_ip 1043 + #endif 1044 + 1031 1045 BPF_CALL_1(bpf_get_func_ip_kprobe, struct pt_regs *, regs) 1032 1046 { 1033 1047 struct kprobe *kp = kprobe_running(); 1034 1048 1035 - return kp ? (uintptr_t)kp->addr : 0; 1049 + if (!kp || !(kp->flags & KPROBE_FLAG_ON_FUNC_ENTRY)) 1050 + return 0; 1051 + 1052 + return get_entry_ip((uintptr_t)kp->addr); 1036 1053 } 1037 1054 1038 1055 static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = { ··· 1201 1180 .ret_type = RET_INTEGER, 1202 1181 .arg1_type = ARG_PTR_TO_CTX, 1203 1182 }; 1183 + 1184 + #ifdef CONFIG_KEYS 1185 + __diag_push(); 1186 + __diag_ignore_all("-Wmissing-prototypes", 1187 + "kfuncs which will be used in BPF programs"); 1188 + 1189 + /** 1190 + * bpf_lookup_user_key - lookup a key by its serial 1191 + * @serial: key handle serial number 1192 + * @flags: lookup-specific flags 1193 + * 1194 + * Search a key with a given *serial* and the provided *flags*. 1195 + * If found, increment the reference count of the key by one, and 1196 + * return it in the bpf_key structure. 1197 + * 1198 + * The bpf_key structure must be passed to bpf_key_put() when done 1199 + * with it, so that the key reference count is decremented and the 1200 + * bpf_key structure is freed. 
1201 + * 1202 + * Permission checks are deferred to the time the key is used by 1203 + * one of the available key-specific kfuncs. 1204 + * 1205 + * Set *flags* with KEY_LOOKUP_CREATE, to attempt creating a requested 1206 + * special keyring (e.g. session keyring), if it doesn't yet exist. 1207 + * Set *flags* with KEY_LOOKUP_PARTIAL, to lookup a key without waiting 1208 + * for the key construction, and to retrieve uninstantiated keys (keys 1209 + * without data attached to them). 1210 + * 1211 + * Return: a bpf_key pointer with a valid key pointer if the key is found, a 1212 + * NULL pointer otherwise. 1213 + */ 1214 + struct bpf_key *bpf_lookup_user_key(u32 serial, u64 flags) 1215 + { 1216 + key_ref_t key_ref; 1217 + struct bpf_key *bkey; 1218 + 1219 + if (flags & ~KEY_LOOKUP_ALL) 1220 + return NULL; 1221 + 1222 + /* 1223 + * Permission check is deferred until the key is used, as the 1224 + * intent of the caller is unknown here. 1225 + */ 1226 + key_ref = lookup_user_key(serial, flags, KEY_DEFER_PERM_CHECK); 1227 + if (IS_ERR(key_ref)) 1228 + return NULL; 1229 + 1230 + bkey = kmalloc(sizeof(*bkey), GFP_KERNEL); 1231 + if (!bkey) { 1232 + key_put(key_ref_to_ptr(key_ref)); 1233 + return NULL; 1234 + } 1235 + 1236 + bkey->key = key_ref_to_ptr(key_ref); 1237 + bkey->has_ref = true; 1238 + 1239 + return bkey; 1240 + } 1241 + 1242 + /** 1243 + * bpf_lookup_system_key - lookup a key by a system-defined ID 1244 + * @id: key ID 1245 + * 1246 + * Obtain a bpf_key structure with a key pointer set to the passed key ID. 1247 + * The key pointer is marked as invalid, to prevent bpf_key_put() from 1248 + * attempting to decrement the key reference count on that pointer. The key 1249 + * pointer set in such way is currently understood only by 1250 + * verify_pkcs7_signature(). 
1251 + * 1252 + * Set *id* to one of the values defined in include/linux/verification.h: 1253 + * 0 for the primary keyring (immutable keyring of system keys); 1254 + * VERIFY_USE_SECONDARY_KEYRING for both the primary and secondary keyring 1255 + * (where keys can be added only if they are vouched for by existing keys 1256 + * in those keyrings); VERIFY_USE_PLATFORM_KEYRING for the platform 1257 + * keyring (primarily used by the integrity subsystem to verify a kexec'ed 1258 + * kerned image and, possibly, the initramfs signature). 1259 + * 1260 + * Return: a bpf_key pointer with an invalid key pointer set from the 1261 + * pre-determined ID on success, a NULL pointer otherwise 1262 + */ 1263 + struct bpf_key *bpf_lookup_system_key(u64 id) 1264 + { 1265 + struct bpf_key *bkey; 1266 + 1267 + if (system_keyring_id_check(id) < 0) 1268 + return NULL; 1269 + 1270 + bkey = kmalloc(sizeof(*bkey), GFP_ATOMIC); 1271 + if (!bkey) 1272 + return NULL; 1273 + 1274 + bkey->key = (struct key *)(unsigned long)id; 1275 + bkey->has_ref = false; 1276 + 1277 + return bkey; 1278 + } 1279 + 1280 + /** 1281 + * bpf_key_put - decrement key reference count if key is valid and free bpf_key 1282 + * @bkey: bpf_key structure 1283 + * 1284 + * Decrement the reference count of the key inside *bkey*, if the pointer 1285 + * is valid, and free *bkey*. 1286 + */ 1287 + void bpf_key_put(struct bpf_key *bkey) 1288 + { 1289 + if (bkey->has_ref) 1290 + key_put(bkey->key); 1291 + 1292 + kfree(bkey); 1293 + } 1294 + 1295 + #ifdef CONFIG_SYSTEM_DATA_VERIFICATION 1296 + /** 1297 + * bpf_verify_pkcs7_signature - verify a PKCS#7 signature 1298 + * @data_ptr: data to verify 1299 + * @sig_ptr: signature of the data 1300 + * @trusted_keyring: keyring with keys trusted for signature verification 1301 + * 1302 + * Verify the PKCS#7 signature *sig_ptr* against the supplied *data_ptr* 1303 + * with keys in a keyring referenced by *trusted_keyring*. 
1304 + * 1305 + * Return: 0 on success, a negative value on error. 1306 + */ 1307 + int bpf_verify_pkcs7_signature(struct bpf_dynptr_kern *data_ptr, 1308 + struct bpf_dynptr_kern *sig_ptr, 1309 + struct bpf_key *trusted_keyring) 1310 + { 1311 + int ret; 1312 + 1313 + if (trusted_keyring->has_ref) { 1314 + /* 1315 + * Do the permission check deferred in bpf_lookup_user_key(). 1316 + * See bpf_lookup_user_key() for more details. 1317 + * 1318 + * A call to key_task_permission() here would be redundant, as 1319 + * it is already done by keyring_search() called by 1320 + * find_asymmetric_key(). 1321 + */ 1322 + ret = key_validate(trusted_keyring->key); 1323 + if (ret < 0) 1324 + return ret; 1325 + } 1326 + 1327 + return verify_pkcs7_signature(data_ptr->data, 1328 + bpf_dynptr_get_size(data_ptr), 1329 + sig_ptr->data, 1330 + bpf_dynptr_get_size(sig_ptr), 1331 + trusted_keyring->key, 1332 + VERIFYING_UNSPECIFIED_SIGNATURE, NULL, 1333 + NULL); 1334 + } 1335 + #endif /* CONFIG_SYSTEM_DATA_VERIFICATION */ 1336 + 1337 + __diag_pop(); 1338 + 1339 + BTF_SET8_START(key_sig_kfunc_set) 1340 + BTF_ID_FLAGS(func, bpf_lookup_user_key, KF_ACQUIRE | KF_RET_NULL | KF_SLEEPABLE) 1341 + BTF_ID_FLAGS(func, bpf_lookup_system_key, KF_ACQUIRE | KF_RET_NULL) 1342 + BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE) 1343 + #ifdef CONFIG_SYSTEM_DATA_VERIFICATION 1344 + BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE) 1345 + #endif 1346 + BTF_SET8_END(key_sig_kfunc_set) 1347 + 1348 + static const struct btf_kfunc_id_set bpf_key_sig_kfunc_set = { 1349 + .owner = THIS_MODULE, 1350 + .set = &key_sig_kfunc_set, 1351 + }; 1352 + 1353 + static int __init bpf_key_sig_kfuncs_init(void) 1354 + { 1355 + return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, 1356 + &bpf_key_sig_kfunc_set); 1357 + } 1358 + 1359 + late_initcall(bpf_key_sig_kfuncs_init); 1360 + #endif /* CONFIG_KEYS */ 1204 1361 1205 1362 static const struct bpf_func_proto * 1206 1363 bpf_tracing_func_proto(enum bpf_func_id func_id, 
const struct bpf_prog *prog) ··· 2241 2042 void __bpf_trace_run(struct bpf_prog *prog, u64 *args) 2242 2043 { 2243 2044 cant_sleep(); 2045 + if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { 2046 + bpf_prog_inc_misses_counter(prog); 2047 + goto out; 2048 + } 2244 2049 rcu_read_lock(); 2245 2050 (void) bpf_prog_run(prog, args); 2246 2051 rcu_read_unlock(); 2052 + out: 2053 + this_cpu_dec(*(prog->active)); 2247 2054 } 2248 2055 2249 2056 #define UNPACK(...) __VA_ARGS__ ··· 2619 2414 } 2620 2415 2621 2416 static void 2622 - kprobe_multi_link_handler(struct fprobe *fp, unsigned long entry_ip, 2417 + kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip, 2623 2418 struct pt_regs *regs) 2624 2419 { 2625 2420 struct bpf_kprobe_multi_link *link; 2626 2421 2627 2422 link = container_of(fp, struct bpf_kprobe_multi_link, fp); 2628 - kprobe_multi_link_prog_run(link, entry_ip, regs); 2423 + kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs); 2629 2424 } 2630 2425 2631 2426 static int symbols_cmp_r(const void *a, const void *b, const void *priv)
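The `get_entry_ip()` helper added to bpf_trace.c above compensates for IBT: with `CONFIG_X86_KERNEL_IBT`, functions begin with a 4-byte `endbr64` (bytes `f3 0f 1e fa`), so the IP that kprobes/fprobe report sits `ENDBR_INSN_SIZE` past the symbol. A userspace model of the same logic, reading from a byte buffer instead of kernel text (the opcode constant is the little-endian encoding of `endbr64`):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ENDBR_INSN_SIZE 4
#define ENDBR64_OPCODE  0xfa1e0ff3u  /* f3 0f 1e fa, read little-endian */

static int is_endbr(uint32_t insn)
{
        return insn == ENDBR64_OPCODE;
}

/* Mimics get_entry_ip(): peek one instruction back from the reported
 * offset; if it is endbr64, the true function entry is 4 bytes earlier. */
static size_t get_entry_ip(const uint8_t *text, size_t fentry_off)
{
        uint32_t insn;

        /* Like the kernel's page-edge caution: never read before the
         * start of the buffer. */
        if (fentry_off < ENDBR_INSN_SIZE)
                return fentry_off;
        memcpy(&insn, text + fentry_off - ENDBR_INSN_SIZE, sizeof(insn));
        return is_endbr(insn) ? fentry_off - ENDBR_INSN_SIZE : fentry_off;
}
```

Without this adjustment, `bpf_get_func_ip()` on an IBT kernel would return an address 4 bytes past the symbol, breaking tools that compare it against kallsyms.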
+1 -2
kernel/trace/ftrace.c
··· 8265 8265 if (args->addrs[idx]) 8266 8266 return 0; 8267 8267 8268 - addr = ftrace_location(addr); 8269 - if (!addr) 8268 + if (!ftrace_location(addr)) 8270 8269 return 0; 8271 8270 8272 8271 args->addrs[idx] = addr;
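The one-line ftrace.c change above is a clobbering fix: `ftrace_location()` may return an address adjusted away from the symbol, and assigning its result back to `addr` meant the wrong address was stored in `args->addrs[idx]`. The fix uses the lookup purely as a predicate. A minimal sketch of the before/after behavior, with a hypothetical lookup that adjusts every address by 4:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for ftrace_location(): returns a possibly
 * adjusted trace-site address, or 0 if the address is not traceable. */
static unsigned long fake_ftrace_location(unsigned long addr)
{
        return addr ? addr + 4 : 0;     /* pretend sites sit 4 bytes in */
}

/* Fixed logic: test traceability, but record the caller's address. */
static int record_addr(unsigned long *slot, unsigned long addr)
{
        if (!fake_ftrace_location(addr))
                return 0;               /* not traceable: skip it */
        *slot = addr;                   /* keep the original address */
        return 1;
}
```

The buggy variant would have stored `addr + 4` here, so later lookups keyed on the symbol address would miss.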
+37
net/bpf/test_run.c
··· 606 606 WARN_ON_ONCE(1); 607 607 } 608 608 609 + static int *__bpf_kfunc_call_test_get_mem(struct prog_test_ref_kfunc *p, const int size) 610 + { 611 + if (size > 2 * sizeof(int)) 612 + return NULL; 613 + 614 + return (int *)p; 615 + } 616 + 617 + noinline int *bpf_kfunc_call_test_get_rdwr_mem(struct prog_test_ref_kfunc *p, const int rdwr_buf_size) 618 + { 619 + return __bpf_kfunc_call_test_get_mem(p, rdwr_buf_size); 620 + } 621 + 622 + noinline int *bpf_kfunc_call_test_get_rdonly_mem(struct prog_test_ref_kfunc *p, const int rdonly_buf_size) 623 + { 624 + return __bpf_kfunc_call_test_get_mem(p, rdonly_buf_size); 625 + } 626 + 627 + /* the next 2 ones can't be really used for testing expect to ensure 628 + * that the verifier rejects the call. 629 + * Acquire functions must return struct pointers, so these ones are 630 + * failing. 631 + */ 632 + noinline int *bpf_kfunc_call_test_acq_rdonly_mem(struct prog_test_ref_kfunc *p, const int rdonly_buf_size) 633 + { 634 + return __bpf_kfunc_call_test_get_mem(p, rdonly_buf_size); 635 + } 636 + 637 + noinline void bpf_kfunc_call_int_mem_release(int *p) 638 + { 639 + } 640 + 609 641 noinline struct prog_test_ref_kfunc * 610 642 bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **pp, int a, int b) 611 643 { ··· 744 712 BTF_ID_FLAGS(func, bpf_kfunc_call_test_release, KF_RELEASE) 745 713 BTF_ID_FLAGS(func, bpf_kfunc_call_memb_release, KF_RELEASE) 746 714 BTF_ID_FLAGS(func, bpf_kfunc_call_memb1_release, KF_RELEASE) 715 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_get_rdwr_mem, KF_RET_NULL) 716 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_get_rdonly_mem, KF_RET_NULL) 717 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_acq_rdonly_mem, KF_ACQUIRE | KF_RET_NULL) 718 + BTF_ID_FLAGS(func, bpf_kfunc_call_int_mem_release, KF_RELEASE) 747 719 BTF_ID_FLAGS(func, bpf_kfunc_call_test_kptr_get, KF_ACQUIRE | KF_RET_NULL | KF_KPTR_GET) 748 720 BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass_ctx) 749 721 BTF_ID_FLAGS(func, bpf_kfunc_call_test_pass1) ··· 
1670 1634 1671 1635 ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set); 1672 1636 ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_prog_test_kfunc_set); 1637 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_prog_test_kfunc_set); 1673 1638 return ret ?: register_btf_id_dtor_kfuncs(bpf_prog_test_dtor_kfunc, 1674 1639 ARRAY_SIZE(bpf_prog_test_dtor_kfunc), 1675 1640 THIS_MODULE);
+108 -16
net/core/filter.c
··· 18 18 */ 19 19 20 20 #include <linux/atomic.h> 21 + #include <linux/bpf_verifier.h> 21 22 #include <linux/module.h> 22 23 #include <linux/types.h> 23 24 #include <linux/mm.h> ··· 5102 5101 return 0; 5103 5102 } 5104 5103 5104 + static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval, 5105 + int *optlen, bool getopt) 5106 + { 5107 + struct tcp_sock *tp; 5108 + int ret; 5109 + 5110 + if (*optlen < 2) 5111 + return -EINVAL; 5112 + 5113 + if (getopt) { 5114 + if (!inet_csk(sk)->icsk_ca_ops) 5115 + return -EINVAL; 5116 + /* BPF expects NULL-terminated tcp-cc string */ 5117 + optval[--(*optlen)] = '\0'; 5118 + return do_tcp_getsockopt(sk, SOL_TCP, TCP_CONGESTION, 5119 + KERNEL_SOCKPTR(optval), 5120 + KERNEL_SOCKPTR(optlen)); 5121 + } 5122 + 5123 + /* "cdg" is the only cc that alloc a ptr 5124 + * in inet_csk_ca area. The bpf-tcp-cc may 5125 + * overwrite this ptr after switching to cdg. 5126 + */ 5127 + if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen)) 5128 + return -ENOTSUPP; 5129 + 5130 + /* It stops this looping 5131 + * 5132 + * .init => bpf_setsockopt(tcp_cc) => .init => 5133 + * bpf_setsockopt(tcp_cc)" => .init => .... 5134 + * 5135 + * The second bpf_setsockopt(tcp_cc) is not allowed 5136 + * in order to break the loop when both .init 5137 + * are the same bpf prog. 5138 + * 5139 + * This applies even the second bpf_setsockopt(tcp_cc) 5140 + * does not cause a loop. This limits only the first 5141 + * '.init' can call bpf_setsockopt(TCP_CONGESTION) to 5142 + * pick a fallback cc (eg. peer does not support ECN) 5143 + * and the second '.init' cannot fallback to 5144 + * another. 
5145 + */ 5146 + tp = tcp_sk(sk); 5147 + if (tp->bpf_chg_cc_inprogress) 5148 + return -EBUSY; 5149 + 5150 + tp->bpf_chg_cc_inprogress = 1; 5151 + ret = do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION, 5152 + KERNEL_SOCKPTR(optval), *optlen); 5153 + tp->bpf_chg_cc_inprogress = 0; 5154 + return ret; 5155 + } 5156 + 5105 5157 static int sol_tcp_sockopt(struct sock *sk, int optname, 5106 5158 char *optval, int *optlen, 5107 5159 bool getopt) ··· 5178 5124 return -EINVAL; 5179 5125 break; 5180 5126 case TCP_CONGESTION: 5181 - if (*optlen < 2) 5182 - return -EINVAL; 5183 - break; 5127 + return sol_tcp_sockopt_congestion(sk, optval, optlen, getopt); 5184 5128 case TCP_SAVED_SYN: 5185 5129 if (*optlen < 1) 5186 5130 return -EINVAL; ··· 5201 5149 * does not know if the user space still needs it. 5202 5150 */ 5203 5151 return 0; 5204 - } 5205 - 5206 - if (optname == TCP_CONGESTION) { 5207 - if (!inet_csk(sk)->icsk_ca_ops) 5208 - return -EINVAL; 5209 - /* BPF expects NULL-terminated tcp-cc string */ 5210 - optval[--(*optlen)] = '\0'; 5211 5152 } 5212 5153 5213 5154 return do_tcp_getsockopt(sk, SOL_TCP, optname, ··· 5329 5284 BPF_CALL_5(bpf_sk_setsockopt, struct sock *, sk, int, level, 5330 5285 int, optname, char *, optval, int, optlen) 5331 5286 { 5332 - if (level == SOL_TCP && optname == TCP_CONGESTION) { 5333 - if (optlen >= sizeof("cdg") - 1 && 5334 - !strncmp("cdg", optval, optlen)) 5335 - return -ENOTSUPP; 5336 - } 5337 - 5338 5287 return _bpf_setsockopt(sk, level, optname, optval, optlen); 5339 5288 } 5340 5289 ··· 8644 8605 return bpf_skb_is_valid_access(off, size, type, prog, info); 8645 8606 } 8646 8607 8608 + DEFINE_MUTEX(nf_conn_btf_access_lock); 8609 + EXPORT_SYMBOL_GPL(nf_conn_btf_access_lock); 8610 + 8611 + int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf, 8612 + const struct btf_type *t, int off, int size, 8613 + enum bpf_access_type atype, u32 *next_btf_id, 8614 + enum bpf_type_flag *flag); 8615 + 
EXPORT_SYMBOL_GPL(nfct_btf_struct_access); 8616 + 8617 + static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log, 8618 + const struct btf *btf, 8619 + const struct btf_type *t, int off, 8620 + int size, enum bpf_access_type atype, 8621 + u32 *next_btf_id, 8622 + enum bpf_type_flag *flag) 8623 + { 8624 + int ret = -EACCES; 8625 + 8626 + if (atype == BPF_READ) 8627 + return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, 8628 + flag); 8629 + 8630 + mutex_lock(&nf_conn_btf_access_lock); 8631 + if (nfct_btf_struct_access) 8632 + ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag); 8633 + mutex_unlock(&nf_conn_btf_access_lock); 8634 + 8635 + return ret; 8636 + } 8637 + 8647 8638 static bool __is_valid_xdp_access(int off, int size) 8648 8639 { 8649 8640 if (off < 0 || off >= sizeof(struct xdp_md)) ··· 8732 8663 act, prog->aux->name, prog->aux->id, dev ? dev->name : "N/A"); 8733 8664 } 8734 8665 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action); 8666 + 8667 + static int xdp_btf_struct_access(struct bpf_verifier_log *log, 8668 + const struct btf *btf, 8669 + const struct btf_type *t, int off, 8670 + int size, enum bpf_access_type atype, 8671 + u32 *next_btf_id, 8672 + enum bpf_type_flag *flag) 8673 + { 8674 + int ret = -EACCES; 8675 + 8676 + if (atype == BPF_READ) 8677 + return btf_struct_access(log, btf, t, off, size, atype, next_btf_id, 8678 + flag); 8679 + 8680 + mutex_lock(&nf_conn_btf_access_lock); 8681 + if (nfct_btf_struct_access) 8682 + ret = nfct_btf_struct_access(log, btf, t, off, size, atype, next_btf_id, flag); 8683 + mutex_unlock(&nf_conn_btf_access_lock); 8684 + 8685 + return ret; 8686 + } 8735 8687 8736 8688 static bool sock_addr_is_valid_access(int off, int size, 8737 8689 enum bpf_access_type type, ··· 10648 10558 .convert_ctx_access = tc_cls_act_convert_ctx_access, 10649 10559 .gen_prologue = tc_cls_act_prologue, 10650 10560 .gen_ld_abs = bpf_gen_ld_abs, 10561 + .btf_struct_access = 
tc_cls_act_btf_struct_access, 10651 10562 }; 10652 10563 10653 10564 const struct bpf_prog_ops tc_cls_act_prog_ops = { ··· 10660 10569 .is_valid_access = xdp_is_valid_access, 10661 10570 .convert_ctx_access = xdp_convert_ctx_access, 10662 10571 .gen_prologue = bpf_noop_prologue, 10572 + .btf_struct_access = xdp_btf_struct_access, 10663 10573 }; 10664 10574 10665 10575 const struct bpf_prog_ops xdp_prog_ops = {
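The new `sol_tcp_sockopt_congestion()` in filter.c above introduces a `bpf_chg_cc_inprogress` flag to break the loop `.init => bpf_setsockopt(tcp_cc) => .init => ...` when a bpf-tcp-cc's init switches congestion control again. The guard can be modeled with a plain re-entrancy flag; struct and function names below are illustrative, not the kernel's:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

struct fake_tcp_sock {
        char cc_name[16];
        int bpf_chg_cc_inprogress;      /* models tp->bpf_chg_cc_inprogress */
};

static int set_congestion(struct fake_tcp_sock *tp, const char *name);

/* A cc module whose .init tries to switch cc yet again, as a
 * misbehaving bpf-tcp-cc might. */
static int recursing_init(struct fake_tcp_sock *tp)
{
        return set_congestion(tp, "cubic");
}

static int set_congestion(struct fake_tcp_sock *tp, const char *name)
{
        int ret = 0;

        if (tp->bpf_chg_cc_inprogress)
                return -EBUSY;          /* nested switch: refuse */

        tp->bpf_chg_cc_inprogress = 1;
        strncpy(tp->cc_name, name, sizeof(tp->cc_name) - 1);
        if (strcmp(name, "loopy") == 0)
                ret = recursing_init(tp); /* cc's .init runs here */
        tp->bpf_chg_cc_inprogress = 0;
        return ret;
}
```

As in the kernel change, only the first `.init` may pick a fallback cc (e.g. when the peer lacks ECN support); the nested attempt fails with `-EBUSY` even when it would not actually loop.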
+8 -4
net/core/skmsg.c
··· 434 434 if (copied + copy > len) 435 435 copy = len - copied; 436 436 copy = copy_page_to_iter(page, sge->offset, copy, iter); 437 - if (!copy) 438 - return copied ? copied : -EFAULT; 437 + if (!copy) { 438 + copied = copied ? copied : -EFAULT; 439 + goto out; 440 + } 439 441 440 442 copied += copy; 441 443 if (likely(!peek)) { ··· 457 455 * didn't copy the entire length lets just break. 458 456 */ 459 457 if (copy != sge->length) 460 - return copied; 458 + goto out; 461 459 sk_msg_iter_var_next(i); 462 460 } 463 461 ··· 479 477 } 480 478 msg_rx = sk_psock_peek_msg(psock); 481 479 } 482 - 480 + out: 481 + if (psock->work_state.skb && copied > 0) 482 + schedule_work(&psock->work); 483 483 return copied; 484 484 } 485 485 EXPORT_SYMBOL_GPL(sk_msg_recvmsg);
+2 -1
net/core/stream.c
··· 159 159 *timeo_p = current_timeo; 160 160 } 161 161 out: 162 - remove_wait_queue(sk_sleep(sk), &wait); 162 + if (!sock_flag(sk, SOCK_DEAD)) 163 + remove_wait_queue(sk_sleep(sk), &wait); 163 164 return err; 164 165 165 166 do_error:
+1 -1
net/ipv4/bpf_tcp_ca.c
··· 124 124 return -EACCES; 125 125 } 126 126 127 - return NOT_INIT; 127 + return 0; 128 128 } 129 129 130 130 BPF_CALL_2(bpf_tcp_send_ack, struct tcp_sock *, tp, u32, rcv_nxt)
+15
net/ipv4/ping.c
··· 33 33 #include <linux/skbuff.h> 34 34 #include <linux/proc_fs.h> 35 35 #include <linux/export.h> 36 + #include <linux/bpf-cgroup.h> 36 37 #include <net/sock.h> 37 38 #include <net/ping.h> 38 39 #include <net/udp.h> ··· 295 294 sk_common_release(sk); 296 295 } 297 296 EXPORT_SYMBOL_GPL(ping_close); 297 + 298 + static int ping_pre_connect(struct sock *sk, struct sockaddr *uaddr, 299 + int addr_len) 300 + { 301 + /* This check is replicated from __ip4_datagram_connect() and 302 + * intended to prevent BPF program called below from accessing bytes 303 + * that are out of the bound specified by user in addr_len. 304 + */ 305 + if (addr_len < sizeof(struct sockaddr_in)) 306 + return -EINVAL; 307 + 308 + return BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr); 309 + } 298 310 299 311 /* Checks the bind address and possibly modifies sk->sk_bound_dev_if. */ 300 312 static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk, ··· 1023 1009 .owner = THIS_MODULE, 1024 1010 .init = ping_init_sock, 1025 1011 .close = ping_close, 1012 + .pre_connect = ping_pre_connect, 1026 1013 .connect = ip4_datagram_connect, 1027 1014 .disconnect = __udp_disconnect, 1028 1015 .setsockopt = ip_setsockopt,
+1
net/ipv4/tcp_minisocks.c
··· 561 561 newtp->fastopen_req = NULL; 562 562 RCU_INIT_POINTER(newtp->fastopen_rsk, NULL); 563 563 564 + newtp->bpf_chg_cc_inprogress = 0; 564 565 tcp_bpf_clone(sk, newsk); 565 566 566 567 __TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
+16
net/ipv6/ping.c
··· 20 20 #include <net/udp.h> 21 21 #include <net/transp_v6.h> 22 22 #include <linux/proc_fs.h> 23 + #include <linux/bpf-cgroup.h> 23 24 #include <net/ping.h> 24 25 25 26 static void ping_v6_destroy(struct sock *sk) ··· 48 47 const struct net_device *dev, int strict) 49 48 { 50 49 return 0; 50 + } 51 + 52 + static int ping_v6_pre_connect(struct sock *sk, struct sockaddr *uaddr, 53 + int addr_len) 54 + { 55 + /* This check is replicated from __ip6_datagram_connect() and 56 + * intended to prevent BPF program called below from accessing 57 + * bytes that are out of the bound specified by user in addr_len. 58 + */ 59 + 60 + if (addr_len < SIN6_LEN_RFC2133) 61 + return -EINVAL; 62 + 63 + return BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr); 51 64 } 52 65 53 66 static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) ··· 206 191 .init = ping_init_sock, 207 192 .close = ping_close, 208 193 .destroy = ping_v6_destroy, 194 + .pre_connect = ping_v6_pre_connect, 209 195 .connect = ip6_datagram_connect_v6_only, 210 196 .disconnect = __udp_disconnect, 211 197 .setsockopt = ipv6_setsockopt,
+6
net/netfilter/Makefile
··· 60 60 nf_nat-$(CONFIG_NF_NAT_REDIRECT) += nf_nat_redirect.o 61 61 nf_nat-$(CONFIG_NF_NAT_MASQUERADE) += nf_nat_masquerade.o 62 62 63 + ifeq ($(CONFIG_NF_NAT),m) 64 + nf_nat-$(CONFIG_DEBUG_INFO_BTF_MODULES) += nf_nat_bpf.o 65 + else ifeq ($(CONFIG_NF_NAT),y) 66 + nf_nat-$(CONFIG_DEBUG_INFO_BTF) += nf_nat_bpf.o 67 + endif 68 + 63 69 # NAT helpers 64 70 obj-$(CONFIG_NF_NAT_AMANDA) += nf_nat_amanda.o 65 71 obj-$(CONFIG_NF_NAT_FTP) += nf_nat_ftp.o
+67 -7
net/netfilter/nf_conntrack_bpf.c
··· 6 6 * are exposed through to BPF programs is explicitly unstable. 7 7 */ 8 8 9 + #include <linux/bpf_verifier.h> 9 10 #include <linux/bpf.h> 10 11 #include <linux/btf.h> 12 + #include <linux/filter.h> 13 + #include <linux/mutex.h> 11 14 #include <linux/types.h> 12 15 #include <linux/btf_ids.h> 13 16 #include <linux/net_namespace.h> 14 - #include <net/netfilter/nf_conntrack.h> 15 17 #include <net/netfilter/nf_conntrack_bpf.h> 16 18 #include <net/netfilter/nf_conntrack_core.h> 17 19 ··· 136 134 137 135 memset(&ct->proto, 0, sizeof(ct->proto)); 138 136 __nf_ct_set_timeout(ct, timeout * HZ); 139 - ct->status |= IPS_CONFIRMED; 140 137 141 138 out: 142 139 if (opts->netns_id >= 0) ··· 185 184 return ct; 186 185 } 187 186 187 + BTF_ID_LIST(btf_nf_conn_ids) 188 + BTF_ID(struct, nf_conn) 189 + BTF_ID(struct, nf_conn___init) 190 + 191 + /* Check writes into `struct nf_conn` */ 192 + static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log, 193 + const struct btf *btf, 194 + const struct btf_type *t, int off, 195 + int size, enum bpf_access_type atype, 196 + u32 *next_btf_id, 197 + enum bpf_type_flag *flag) 198 + { 199 + const struct btf_type *ncit; 200 + const struct btf_type *nct; 201 + size_t end; 202 + 203 + ncit = btf_type_by_id(btf, btf_nf_conn_ids[1]); 204 + nct = btf_type_by_id(btf, btf_nf_conn_ids[0]); 205 + 206 + if (t != nct && t != ncit) { 207 + bpf_log(log, "only read is supported\n"); 208 + return -EACCES; 209 + } 210 + 211 + /* `struct nf_conn` and `struct nf_conn___init` have the same layout 212 + * so we are safe to simply merge offset checks here 213 + */ 214 + switch (off) { 215 + #if defined(CONFIG_NF_CONNTRACK_MARK) 216 + case offsetof(struct nf_conn, mark): 217 + end = offsetofend(struct nf_conn, mark); 218 + break; 219 + #endif 220 + default: 221 + bpf_log(log, "no write support to nf_conn at off %d\n", off); 222 + return -EACCES; 223 + } 224 + 225 + if (off + size > end) { 226 + bpf_log(log, 227 + "write access at off %d with size %d 
beyond the member of nf_conn ended at %zu\n", 228 + off, size, end); 229 + return -EACCES; 230 + } 231 + 232 + return 0; 233 + } 234 + 188 235 __diag_push(); 189 236 __diag_ignore_all("-Wmissing-prototypes", 190 237 "Global functions as their definitions will be in nf_conntrack BTF"); 191 - 192 - struct nf_conn___init { 193 - struct nf_conn ct; 194 - }; 195 238 196 239 /* bpf_xdp_ct_alloc - Allocate a new CT entry 197 240 * ··· 384 339 struct nf_conn *nfct = (struct nf_conn *)nfct_i; 385 340 int err; 386 341 342 + nfct->status |= IPS_CONFIRMED; 387 343 err = nf_conntrack_hash_check_insert(nfct); 388 344 if (err < 0) { 389 345 nf_conntrack_free(nfct); ··· 495 449 int ret; 496 450 497 451 ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_kfunc_set); 498 - return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_kfunc_set); 452 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_kfunc_set); 453 + if (!ret) { 454 + mutex_lock(&nf_conn_btf_access_lock); 455 + nfct_btf_struct_access = _nf_conntrack_btf_struct_access; 456 + mutex_unlock(&nf_conn_btf_access_lock); 457 + } 458 + 459 + return ret; 460 + } 461 + 462 + void cleanup_nf_conntrack_bpf(void) 463 + { 464 + mutex_lock(&nf_conn_btf_access_lock); 465 + nfct_btf_struct_access = NULL; 466 + mutex_unlock(&nf_conn_btf_access_lock); 499 467 }
+1
net/netfilter/nf_conntrack_core.c
··· 2516 2516 2517 2517 void nf_conntrack_cleanup_start(void) 2518 2518 { 2519 + cleanup_nf_conntrack_bpf(); 2519 2520 conntrack_gc_work.exiting = true; 2520 2521 } 2521 2522
+79
net/netfilter/nf_nat_bpf.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* Unstable NAT Helpers for XDP and TC-BPF hook 3 + * 4 + * These are called from the XDP and SCHED_CLS BPF programs. Note that it is 5 + * allowed to break compatibility for these functions since the interface they 6 + * are exposed through to BPF programs is explicitly unstable. 7 + */ 8 + 9 + #include <linux/bpf.h> 10 + #include <linux/btf_ids.h> 11 + #include <net/netfilter/nf_conntrack_bpf.h> 12 + #include <net/netfilter/nf_conntrack_core.h> 13 + #include <net/netfilter/nf_nat.h> 14 + 15 + __diag_push(); 16 + __diag_ignore_all("-Wmissing-prototypes", 17 + "Global functions as their definitions will be in nf_nat BTF"); 18 + 19 + /* bpf_ct_set_nat_info - Set source or destination nat address 20 + * 21 + * Set source or destination nat address of the newly allocated 22 + * nf_conn before insertion. This must be invoked for referenced 23 + * PTR_TO_BTF_ID to nf_conn___init. 24 + * 25 + * Parameters: 26 + * @nfct - Pointer to referenced nf_conn object, obtained using 27 + * bpf_xdp_ct_alloc or bpf_skb_ct_alloc. 28 + * @addr - Nat source/destination address 29 + * @port - Nat source/destination port. Non-positive values are 30 + * interpreted as select a random port. 
31 + * @manip - NF_NAT_MANIP_SRC or NF_NAT_MANIP_DST 32 + */ 33 + int bpf_ct_set_nat_info(struct nf_conn___init *nfct, 34 + union nf_inet_addr *addr, int port, 35 + enum nf_nat_manip_type manip) 36 + { 37 + struct nf_conn *ct = (struct nf_conn *)nfct; 38 + u16 proto = nf_ct_l3num(ct); 39 + struct nf_nat_range2 range; 40 + 41 + if (proto != NFPROTO_IPV4 && proto != NFPROTO_IPV6) 42 + return -EINVAL; 43 + 44 + memset(&range, 0, sizeof(struct nf_nat_range2)); 45 + range.flags = NF_NAT_RANGE_MAP_IPS; 46 + range.min_addr = *addr; 47 + range.max_addr = range.min_addr; 48 + if (port > 0) { 49 + range.flags |= NF_NAT_RANGE_PROTO_SPECIFIED; 50 + range.min_proto.all = cpu_to_be16(port); 51 + range.max_proto.all = range.min_proto.all; 52 + } 53 + 54 + return nf_nat_setup_info(ct, &range, manip) == NF_DROP ? -ENOMEM : 0; 55 + } 56 + 57 + __diag_pop() 58 + 59 + BTF_SET8_START(nf_nat_kfunc_set) 60 + BTF_ID_FLAGS(func, bpf_ct_set_nat_info, KF_TRUSTED_ARGS) 61 + BTF_SET8_END(nf_nat_kfunc_set) 62 + 63 + static const struct btf_kfunc_id_set nf_bpf_nat_kfunc_set = { 64 + .owner = THIS_MODULE, 65 + .set = &nf_nat_kfunc_set, 66 + }; 67 + 68 + int register_nf_nat_bpf(void) 69 + { 70 + int ret; 71 + 72 + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, 73 + &nf_bpf_nat_kfunc_set); 74 + if (ret) 75 + return ret; 76 + 77 + return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, 78 + &nf_bpf_nat_kfunc_set); 79 + }
+2 -2
net/netfilter/nf_nat_core.c
··· 16 16 #include <linux/siphash.h> 17 17 #include <linux/rtnetlink.h> 18 18 19 - #include <net/netfilter/nf_conntrack.h> 19 + #include <net/netfilter/nf_conntrack_bpf.h> 20 20 #include <net/netfilter/nf_conntrack_core.h> 21 21 #include <net/netfilter/nf_conntrack_helper.h> 22 22 #include <net/netfilter/nf_conntrack_seqadj.h> ··· 1152 1152 WARN_ON(nf_nat_hook != NULL); 1153 1153 RCU_INIT_POINTER(nf_nat_hook, &nat_hook); 1154 1154 1155 - return 0; 1155 + return register_nf_nat_bpf(); 1156 1156 } 1157 1157 1158 1158 static void __exit nf_nat_cleanup(void)
+1 -1
samples/bpf/task_fd_query_kern.c
··· 10 10 return 0; 11 11 } 12 12 13 - SEC("kretprobe/blk_account_io_done") 13 + SEC("kretprobe/__blk_account_io_done") 14 14 int bpf_prog2(struct pt_regs *ctx) 15 15 { 16 16 return 0;
+1 -1
samples/bpf/task_fd_query_user.c
··· 348 348 /* test two functions in the corresponding *_kern.c file */ 349 349 CHECK_AND_RET(test_debug_fs_kprobe(0, "blk_mq_start_request", 350 350 BPF_FD_TYPE_KPROBE)); 351 - CHECK_AND_RET(test_debug_fs_kprobe(1, "blk_account_io_done", 351 + CHECK_AND_RET(test_debug_fs_kprobe(1, "__blk_account_io_done", 352 352 BPF_FD_TYPE_KRETPROBE)); 353 353 354 354 /* test nondebug fs kprobe */
+1 -1
samples/bpf/tracex3_kern.c
··· 49 49 __uint(max_entries, SLOTS); 50 50 } lat_map SEC(".maps"); 51 51 52 - SEC("kprobe/blk_account_io_done") 52 + SEC("kprobe/__blk_account_io_done") 53 53 int bpf_prog2(struct pt_regs *ctx) 54 54 { 55 55 long rq = PT_REGS_PARM1(ctx);
+1 -1
samples/bpf/xdp_router_ipv4_user.c
··· 209 209 /* Rereading the route table to check if 210 210 * there is an entry with the same 211 211 * prefix but a different metric as the 212 - * deleted enty. 212 + * deleted entry. 213 213 */ 214 214 get_route_table(AF_INET); 215 215 } else if (prefix_key->data[0] ==
-2
security/keys/internal.h
··· 165 165 166 166 extern bool lookup_user_key_possessed(const struct key *key, 167 167 const struct key_match_data *match_data); 168 - #define KEY_LOOKUP_CREATE 0x01 169 - #define KEY_LOOKUP_PARTIAL 0x02 170 168 171 169 extern long join_session_keyring(const char *name); 172 170 extern void key_change_session_keyring(struct callback_head *twork);
+1 -1
tools/bpf/bpftool/Documentation/bpftool-map.rst
··· 55 55 | | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash** 56 56 | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage** 57 57 | | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage** 58 - | | **task_storage** | **bloom_filter** } 58 + | | **task_storage** | **bloom_filter** | **user_ringbuf** } 59 59 60 60 DESCRIPTION 61 61 ===========
+5 -11
tools/bpf/bpftool/btf.c
··· 43 43 [BTF_KIND_ENUM64] = "ENUM64", 44 44 }; 45 45 46 - struct btf_attach_point { 47 - __u32 obj_id; 48 - __u32 btf_id; 49 - }; 50 - 51 46 static const char *btf_int_enc_str(__u8 encoding) 52 47 { 53 48 switch (encoding) { ··· 635 640 636 641 btf = btf__parse_split(*argv, base ?: base_btf); 637 642 err = libbpf_get_error(btf); 638 - if (err) { 639 - btf = NULL; 643 + if (!btf) { 640 644 p_err("failed to load BTF from %s: %s", 641 - *argv, strerror(err)); 645 + *argv, strerror(errno)); 642 646 goto done; 643 647 } 644 648 NEXT_ARG(); ··· 682 688 683 689 btf = btf__load_from_kernel_by_id_split(btf_id, base_btf); 684 690 err = libbpf_get_error(btf); 685 - if (err) { 686 - p_err("get btf by id (%u): %s", btf_id, strerror(err)); 691 + if (!btf) { 692 + p_err("get btf by id (%u): %s", btf_id, strerror(errno)); 687 693 goto done; 688 694 } 689 695 } ··· 819 825 u32_as_hash_field(id)); 820 826 if (err) { 821 827 p_err("failed to append entry to hashmap for BTF ID %u, object ID %u: %s", 822 - btf_id, id, strerror(errno)); 828 + btf_id, id, strerror(-err)); 823 829 goto err_free; 824 830 } 825 831 }
+2 -2
tools/bpf/bpftool/gen.c
··· 1594 1594 1595 1595 err = bpf_linker__add_file(linker, file, NULL); 1596 1596 if (err) { 1597 - p_err("failed to link '%s': %s (%d)", file, strerror(err), err); 1597 + p_err("failed to link '%s': %s (%d)", file, strerror(errno), errno); 1598 1598 goto out; 1599 1599 } 1600 1600 } 1601 1601 1602 1602 err = bpf_linker__finalize(linker); 1603 1603 if (err) { 1604 - p_err("failed to finalize ELF file: %s (%d)", strerror(err), err); 1604 + p_err("failed to finalize ELF file: %s (%d)", strerror(errno), errno); 1605 1605 goto out; 1606 1606 } 1607 1607
+19
tools/bpf/bpftool/link.c
··· 106 106 } 107 107 } 108 108 109 + static bool is_iter_task_target(const char *target_name) 110 + { 111 + return strcmp(target_name, "task") == 0 || 112 + strcmp(target_name, "task_file") == 0 || 113 + strcmp(target_name, "task_vma") == 0; 114 + } 115 + 109 116 static void show_iter_json(struct bpf_link_info *info, json_writer_t *wtr) 110 117 { 111 118 const char *target_name = u64_to_ptr(info->iter.target_name); ··· 121 114 122 115 if (is_iter_map_target(target_name)) 123 116 jsonw_uint_field(wtr, "map_id", info->iter.map.map_id); 117 + else if (is_iter_task_target(target_name)) { 118 + if (info->iter.task.tid) 119 + jsonw_uint_field(wtr, "tid", info->iter.task.tid); 120 + else if (info->iter.task.pid) 121 + jsonw_uint_field(wtr, "pid", info->iter.task.pid); 122 + } 124 123 125 124 if (is_iter_cgroup_target(target_name)) { 126 125 jsonw_lluint_field(wtr, "cgroup_id", info->iter.cgroup.cgroup_id); ··· 250 237 251 238 if (is_iter_map_target(target_name)) 252 239 printf("map_id %u ", info->iter.map.map_id); 240 + else if (is_iter_task_target(target_name)) { 241 + if (info->iter.task.tid) 242 + printf("tid %u ", info->iter.task.tid); 243 + else if (info->iter.task.pid) 244 + printf("pid %u ", info->iter.task.pid); 245 + } 253 246 254 247 if (is_iter_cgroup_target(target_name)) { 255 248 printf("cgroup_id %llu ", info->iter.cgroup.cgroup_id);
+1 -1
tools/bpf/bpftool/map.c
··· 1459 1459 " devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n" 1460 1460 " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n" 1461 1461 " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n" 1462 - " task_storage | bloom_filter }\n" 1462 + " task_storage | bloom_filter | user_ringbuf }\n" 1463 1463 " " HELP_SPEC_OPTIONS " |\n" 1464 1464 " {-f|--bpffs} | {-n|--nomount} }\n" 1465 1465 "",
+3 -11
tools/bpf/bpftool/map_perf_ring.c
··· 29 29 30 30 static volatile bool stop; 31 31 32 - struct event_ring_info { 33 - int fd; 34 - int key; 35 - unsigned int cpu; 36 - void *mem; 37 - }; 38 - 39 32 struct perf_event_sample { 40 33 struct perf_event_header header; 41 34 __u64 time; ··· 188 195 opts.map_keys = &ctx.idx; 189 196 pb = perf_buffer__new_raw(map_fd, MMAP_PAGE_CNT, &perf_attr, 190 197 print_bpf_output, &ctx, &opts); 191 - err = libbpf_get_error(pb); 192 - if (err) { 198 + if (!pb) { 193 199 p_err("failed to create perf buffer: %s (%d)", 194 - strerror(err), err); 200 + strerror(errno), errno); 195 201 goto err_close_map; 196 202 } 197 203 ··· 205 213 err = perf_buffer__poll(pb, 200); 206 214 if (err < 0 && err != -EINTR) { 207 215 p_err("perf buffer polling failed: %s (%d)", 208 - strerror(err), err); 216 + strerror(errno), errno); 209 217 goto err_close_pb; 210 218 } 211 219 }
+55 -4
tools/include/uapi/linux/bpf.h
··· 110 110 __u32 cgroup_fd; 111 111 __u64 cgroup_id; 112 112 } cgroup; 113 + /* Parameters of task iterators. */ 114 + struct { 115 + __u32 tid; 116 + __u32 pid; 117 + __u32 pid_fd; 118 + } task; 113 119 }; 114 120 115 121 /* BPF syscall commands, see bpf(2) man-page for more details. */ ··· 934 928 BPF_MAP_TYPE_INODE_STORAGE, 935 929 BPF_MAP_TYPE_TASK_STORAGE, 936 930 BPF_MAP_TYPE_BLOOM_FILTER, 931 + BPF_MAP_TYPE_USER_RINGBUF, 937 932 }; 938 933 939 934 /* Note that tracing related programs such as ··· 4957 4950 * Get address of the traced function (for tracing and kprobe programs). 4958 4951 * Return 4959 4952 * Address of the traced function. 4953 + * 0 for kprobes placed within the function (not at the entry). 4960 4954 * 4961 4955 * u64 bpf_get_attach_cookie(void *ctx) 4962 4956 * Description ··· 5087 5079 * 5088 5080 * long bpf_get_func_arg(void *ctx, u32 n, u64 *value) 5089 5081 * Description 5090 - * Get **n**-th argument (zero based) of the traced function (for tracing programs) 5082 + * Get **n**-th argument register (zero based) of the traced function (for tracing programs) 5091 5083 * returned in **value**. 5092 5084 * 5093 5085 * Return 5094 5086 * 0 on success. 5095 - * **-EINVAL** if n >= arguments count of traced function. 5087 + * **-EINVAL** if n >= argument register count of traced function. 5096 5088 * 5097 5089 * long bpf_get_func_ret(void *ctx, u64 *value) 5098 5090 * Description ··· 5105 5097 * 5106 5098 * long bpf_get_func_arg_cnt(void *ctx) 5107 5099 * Description 5108 - * Get number of arguments of the traced function (for tracing programs). 5100 + * Get number of registers of the traced function (for tracing programs) where 5101 + * function arguments are stored in these registers. 5109 5102 * 5110 5103 * Return 5111 - * The number of arguments of the traced function. 5104 + * The number of argument registers of the traced function. 
5112 5105 * 5113 5106 * int bpf_get_retval(void) 5114 5107 * Description ··· 5395 5386 * Return 5396 5387 * Current *ktime*. 5397 5388 * 5389 + * long bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags) 5390 + * Description 5391 + * Drain samples from the specified user ring buffer, and invoke 5392 + * the provided callback for each such sample: 5393 + * 5394 + * long (\*callback_fn)(struct bpf_dynptr \*dynptr, void \*ctx); 5395 + * 5396 + * If **callback_fn** returns 0, the helper will continue to try 5397 + * and drain the next sample, up to a maximum of 5398 + * BPF_MAX_USER_RINGBUF_SAMPLES samples. If the return value is 1, 5399 + * the helper will skip the rest of the samples and return. Other 5400 + * return values are not used now, and will be rejected by the 5401 + * verifier. 5402 + * Return 5403 + * The number of drained samples if no error was encountered while 5404 + * draining samples, or 0 if no samples were present in the ring 5405 + * buffer. If a user-space producer was epoll-waiting on this map, 5406 + * and at least one sample was drained, they will receive an event 5407 + * notification notifying them of available space in the ring 5408 + * buffer. If the BPF_RB_NO_WAKEUP flag is passed to this 5409 + * function, no wakeup notification will be sent. If the 5410 + * BPF_RB_FORCE_WAKEUP flag is passed, a wakeup notification will 5411 + * be sent even if no sample was drained. 5412 + * 5413 + * On failure, the returned value is one of the following: 5414 + * 5415 + * **-EBUSY** if the ring buffer is contended, and another calling 5416 + * context was concurrently draining the ring buffer. 5417 + * 5418 + * **-EINVAL** if user-space is not properly tracking the ring 5419 + * buffer due to the producer position not being aligned to 8 5420 + * bytes, a sample not being aligned to 8 bytes, or the producer 5421 + * position not matching the advertised length of a sample. 
5422 + * 5423 + * **-E2BIG** if user-space has tried to publish a sample which is 5424 + * larger than the size of the ring buffer, or which cannot fit 5425 + * within a struct bpf_dynptr. 5398 5426 */ 5399 5427 #define __BPF_FUNC_MAPPER(FN) \ 5400 5428 FN(unspec), \ ··· 5643 5597 FN(tcp_raw_check_syncookie_ipv4), \ 5644 5598 FN(tcp_raw_check_syncookie_ipv6), \ 5645 5599 FN(ktime_get_tai_ns), \ 5600 + FN(user_ringbuf_drain), \ 5646 5601 /* */ 5647 5602 5648 5603 /* integer value in 'imm' field of BPF_CALL instruction selects which helper ··· 6265 6218 __u64 cgroup_id; 6266 6219 __u32 order; 6267 6220 } cgroup; 6221 + struct { 6222 + __u32 tid; 6223 + __u32 pid; 6224 + } task; 6268 6225 }; 6269 6226 } iter; 6270 6227 struct {
+6 -25
tools/lib/bpf/bpf_helpers.h
··· 131 131 /* 132 132 * Helper function to perform a tail call with a constant/immediate map slot. 133 133 */ 134 - #if (!defined(__clang__) || __clang_major__ >= 8) && defined(__bpf__) 134 + #if __clang_major__ >= 8 && defined(__bpf__) 135 135 static __always_inline void 136 136 bpf_tail_call_static(void *ctx, const void *map, const __u32 slot) 137 137 { ··· 139 139 __bpf_unreachable(); 140 140 141 141 /* 142 - * Provide a hard guarantee that the compiler won't optimize setting r2 143 - * (map pointer) and r3 (constant map index) from _different paths_ ending 142 + * Provide a hard guarantee that LLVM won't optimize setting r2 (map 143 + * pointer) and r3 (constant map index) from _different paths_ ending 144 144 * up at the _same_ call insn as otherwise we won't be able to use the 145 145 * jmpq/nopl retpoline-free patching by the x86-64 JIT in the kernel 146 146 * given they mismatch. See also d2e4c1e6c294 ("bpf: Constant map key ··· 148 148 * 149 149 * Note on clobber list: we need to stay in-line with BPF calling 150 150 * convention, so even if we don't end up using r0, r4, r5, we need 151 - * to mark them as clobber so that the compiler doesn't end up using 152 - * them before / after the call. 151 + * to mark them as clobber so that LLVM doesn't end up using them 152 + * before / after the call. 
153 153 */ 154 - asm volatile( 155 - #ifdef __clang__ 156 - "r1 = %[ctx]\n\t" 154 + asm volatile("r1 = %[ctx]\n\t" 157 155 "r2 = %[map]\n\t" 158 156 "r3 = %[slot]\n\t" 159 - #else 160 - "mov %%r1,%[ctx]\n\t" 161 - "mov %%r2,%[map]\n\t" 162 - "mov %%r3,%[slot]\n\t" 163 - #endif 164 157 "call 12" 165 158 :: [ctx]"r"(ctx), [map]"r"(map), [slot]"i"(slot) 166 159 : "r0", "r1", "r2", "r3", "r4", "r5"); 167 160 } 168 161 #endif 169 - 170 - /* 171 - * Helper structure used by eBPF C program 172 - * to describe BPF map attributes to libbpf loader 173 - */ 174 - struct bpf_map_def { 175 - unsigned int type; 176 - unsigned int key_size; 177 - unsigned int value_size; 178 - unsigned int max_entries; 179 - unsigned int map_flags; 180 - } __attribute__((deprecated("use BTF-defined maps in .maps section"))); 181 162 182 163 enum libbpf_pin_type { 183 164 LIBBPF_PIN_NONE,
+107
tools/lib/bpf/bpf_tracing.h
··· 438 438 static __always_inline typeof(name(0)) \ 439 439 ____##name(unsigned long long *ctx, ##args) 440 440 441 + #ifndef ___bpf_nth2 442 + #define ___bpf_nth2(_, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _13, \ 443 + _14, _15, _16, _17, _18, _19, _20, _21, _22, _23, _24, N, ...) N 444 + #endif 445 + #ifndef ___bpf_narg2 446 + #define ___bpf_narg2(...) \ 447 + ___bpf_nth2(_, ##__VA_ARGS__, 12, 12, 11, 11, 10, 10, 9, 9, 8, 8, 7, 7, \ 448 + 6, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0) 449 + #endif 450 + 451 + #define ___bpf_treg_cnt(t) \ 452 + __builtin_choose_expr(sizeof(t) == 1, 1, \ 453 + __builtin_choose_expr(sizeof(t) == 2, 1, \ 454 + __builtin_choose_expr(sizeof(t) == 4, 1, \ 455 + __builtin_choose_expr(sizeof(t) == 8, 1, \ 456 + __builtin_choose_expr(sizeof(t) == 16, 2, \ 457 + (void)0))))) 458 + 459 + #define ___bpf_reg_cnt0() (0) 460 + #define ___bpf_reg_cnt1(t, x) (___bpf_reg_cnt0() + ___bpf_treg_cnt(t)) 461 + #define ___bpf_reg_cnt2(t, x, args...) (___bpf_reg_cnt1(args) + ___bpf_treg_cnt(t)) 462 + #define ___bpf_reg_cnt3(t, x, args...) (___bpf_reg_cnt2(args) + ___bpf_treg_cnt(t)) 463 + #define ___bpf_reg_cnt4(t, x, args...) (___bpf_reg_cnt3(args) + ___bpf_treg_cnt(t)) 464 + #define ___bpf_reg_cnt5(t, x, args...) (___bpf_reg_cnt4(args) + ___bpf_treg_cnt(t)) 465 + #define ___bpf_reg_cnt6(t, x, args...) (___bpf_reg_cnt5(args) + ___bpf_treg_cnt(t)) 466 + #define ___bpf_reg_cnt7(t, x, args...) (___bpf_reg_cnt6(args) + ___bpf_treg_cnt(t)) 467 + #define ___bpf_reg_cnt8(t, x, args...) (___bpf_reg_cnt7(args) + ___bpf_treg_cnt(t)) 468 + #define ___bpf_reg_cnt9(t, x, args...) (___bpf_reg_cnt8(args) + ___bpf_treg_cnt(t)) 469 + #define ___bpf_reg_cnt10(t, x, args...) (___bpf_reg_cnt9(args) + ___bpf_treg_cnt(t)) 470 + #define ___bpf_reg_cnt11(t, x, args...) (___bpf_reg_cnt10(args) + ___bpf_treg_cnt(t)) 471 + #define ___bpf_reg_cnt12(t, x, args...) (___bpf_reg_cnt11(args) + ___bpf_treg_cnt(t)) 472 + #define ___bpf_reg_cnt(args...) 
___bpf_apply(___bpf_reg_cnt, ___bpf_narg2(args))(args) 473 + 474 + #define ___bpf_union_arg(t, x, n) \ 475 + __builtin_choose_expr(sizeof(t) == 1, ({ union { __u8 z[1]; t x; } ___t = { .z = {ctx[n]}}; ___t.x; }), \ 476 + __builtin_choose_expr(sizeof(t) == 2, ({ union { __u16 z[1]; t x; } ___t = { .z = {ctx[n]} }; ___t.x; }), \ 477 + __builtin_choose_expr(sizeof(t) == 4, ({ union { __u32 z[1]; t x; } ___t = { .z = {ctx[n]} }; ___t.x; }), \ 478 + __builtin_choose_expr(sizeof(t) == 8, ({ union { __u64 z[1]; t x; } ___t = {.z = {ctx[n]} }; ___t.x; }), \ 479 + __builtin_choose_expr(sizeof(t) == 16, ({ union { __u64 z[2]; t x; } ___t = {.z = {ctx[n], ctx[n + 1]} }; ___t.x; }), \ 480 + (void)0))))) 481 + 482 + #define ___bpf_ctx_arg0(n, args...) 483 + #define ___bpf_ctx_arg1(n, t, x) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt1(t, x)) 484 + #define ___bpf_ctx_arg2(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt2(t, x, args)) ___bpf_ctx_arg1(n, args) 485 + #define ___bpf_ctx_arg3(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt3(t, x, args)) ___bpf_ctx_arg2(n, args) 486 + #define ___bpf_ctx_arg4(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt4(t, x, args)) ___bpf_ctx_arg3(n, args) 487 + #define ___bpf_ctx_arg5(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt5(t, x, args)) ___bpf_ctx_arg4(n, args) 488 + #define ___bpf_ctx_arg6(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt6(t, x, args)) ___bpf_ctx_arg5(n, args) 489 + #define ___bpf_ctx_arg7(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt7(t, x, args)) ___bpf_ctx_arg6(n, args) 490 + #define ___bpf_ctx_arg8(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt8(t, x, args)) ___bpf_ctx_arg7(n, args) 491 + #define ___bpf_ctx_arg9(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt9(t, x, args)) ___bpf_ctx_arg8(n, args) 492 + #define ___bpf_ctx_arg10(n, t, x, args...) 
, ___bpf_union_arg(t, x, n - ___bpf_reg_cnt10(t, x, args)) ___bpf_ctx_arg9(n, args) 493 + #define ___bpf_ctx_arg11(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt11(t, x, args)) ___bpf_ctx_arg10(n, args) 494 + #define ___bpf_ctx_arg12(n, t, x, args...) , ___bpf_union_arg(t, x, n - ___bpf_reg_cnt12(t, x, args)) ___bpf_ctx_arg11(n, args) 495 + #define ___bpf_ctx_arg(args...) ___bpf_apply(___bpf_ctx_arg, ___bpf_narg2(args))(___bpf_reg_cnt(args), args) 496 + 497 + #define ___bpf_ctx_decl0() 498 + #define ___bpf_ctx_decl1(t, x) , t x 499 + #define ___bpf_ctx_decl2(t, x, args...) , t x ___bpf_ctx_decl1(args) 500 + #define ___bpf_ctx_decl3(t, x, args...) , t x ___bpf_ctx_decl2(args) 501 + #define ___bpf_ctx_decl4(t, x, args...) , t x ___bpf_ctx_decl3(args) 502 + #define ___bpf_ctx_decl5(t, x, args...) , t x ___bpf_ctx_decl4(args) 503 + #define ___bpf_ctx_decl6(t, x, args...) , t x ___bpf_ctx_decl5(args) 504 + #define ___bpf_ctx_decl7(t, x, args...) , t x ___bpf_ctx_decl6(args) 505 + #define ___bpf_ctx_decl8(t, x, args...) , t x ___bpf_ctx_decl7(args) 506 + #define ___bpf_ctx_decl9(t, x, args...) , t x ___bpf_ctx_decl8(args) 507 + #define ___bpf_ctx_decl10(t, x, args...) , t x ___bpf_ctx_decl9(args) 508 + #define ___bpf_ctx_decl11(t, x, args...) , t x ___bpf_ctx_decl10(args) 509 + #define ___bpf_ctx_decl12(t, x, args...) , t x ___bpf_ctx_decl11(args) 510 + #define ___bpf_ctx_decl(args...) ___bpf_apply(___bpf_ctx_decl, ___bpf_narg2(args))(args) 511 + 512 + /* 513 + * BPF_PROG2 is an enhanced version of BPF_PROG in order to handle struct 514 + * arguments. Since each struct argument might take one or two u64 values 515 + * in the trampoline stack, argument type size is needed to place proper number 516 + * of u64 values for each argument. Therefore, BPF_PROG2 has different 517 + * syntax from BPF_PROG. For example, for the following BPF_PROG syntax: 518 + * 519 + * int BPF_PROG(test2, int a, int b) { ... 
} 520 + * 521 + * the corresponding BPF_PROG2 syntax is: 522 + * 523 + * int BPF_PROG2(test2, int, a, int, b) { ... } 524 + * 525 + * where type and the corresponding argument name are separated by comma. 526 + * 527 + * Use BPF_PROG2 macro if one of the arguments might be a struct/union larger 528 + * than 8 bytes: 529 + * 530 + * int BPF_PROG2(test_struct_arg, struct bpf_testmod_struct_arg_1, a, int, b, 531 + * int, c, int, d, struct bpf_testmod_struct_arg_2, e, int, ret) 532 + * { 533 + * // access a, b, c, d, e, and ret directly 534 + * ... 535 + * } 536 + */ 537 + #define BPF_PROG2(name, args...) \ 538 + name(unsigned long long *ctx); \ 539 + static __always_inline typeof(name(0)) \ 540 + ____##name(unsigned long long *ctx ___bpf_ctx_decl(args)); \ 541 + typeof(name(0)) name(unsigned long long *ctx) \ 542 + { \ 543 + return ____##name(ctx ___bpf_ctx_arg(args)); \ 544 + } \ 545 + static __always_inline typeof(name(0)) \ 546 + ____##name(unsigned long long *ctx ___bpf_ctx_decl(args)) 547 + 441 548 struct pt_regs; 442 549 443 550 #define ___bpf_kprobe_args0() ctx
+13 -19
tools/lib/bpf/btf.c
··· 4642 4642 */ 4643 4643 struct btf *btf__load_vmlinux_btf(void) 4644 4644 { 4645 - struct { 4646 - const char *path_fmt; 4647 - bool raw_btf; 4648 - } locations[] = { 4645 + const char *locations[] = { 4649 4646 /* try canonical vmlinux BTF through sysfs first */ 4650 - { "/sys/kernel/btf/vmlinux", true /* raw BTF */ }, 4651 - /* fall back to trying to find vmlinux ELF on disk otherwise */ 4652 - { "/boot/vmlinux-%1$s" }, 4653 - { "/lib/modules/%1$s/vmlinux-%1$s" }, 4654 - { "/lib/modules/%1$s/build/vmlinux" }, 4655 - { "/usr/lib/modules/%1$s/kernel/vmlinux" }, 4656 - { "/usr/lib/debug/boot/vmlinux-%1$s" }, 4657 - { "/usr/lib/debug/boot/vmlinux-%1$s.debug" }, 4658 - { "/usr/lib/debug/lib/modules/%1$s/vmlinux" }, 4647 + "/sys/kernel/btf/vmlinux", 4648 + /* fall back to trying to find vmlinux on disk otherwise */ 4649 + "/boot/vmlinux-%1$s", 4650 + "/lib/modules/%1$s/vmlinux-%1$s", 4651 + "/lib/modules/%1$s/build/vmlinux", 4652 + "/usr/lib/modules/%1$s/kernel/vmlinux", 4653 + "/usr/lib/debug/boot/vmlinux-%1$s", 4654 + "/usr/lib/debug/boot/vmlinux-%1$s.debug", 4655 + "/usr/lib/debug/lib/modules/%1$s/vmlinux", 4659 4656 }; 4660 4657 char path[PATH_MAX + 1]; 4661 4658 struct utsname buf; ··· 4662 4665 uname(&buf); 4663 4666 4664 4667 for (i = 0; i < ARRAY_SIZE(locations); i++) { 4665 - snprintf(path, PATH_MAX, locations[i].path_fmt, buf.release); 4668 + snprintf(path, PATH_MAX, locations[i], buf.release); 4666 4669 4667 - if (access(path, R_OK)) 4670 + if (faccessat(AT_FDCWD, path, R_OK, AT_EACCESS)) 4668 4671 continue; 4669 4672 4670 - if (locations[i].raw_btf) 4671 - btf = btf__parse_raw(path); 4672 - else 4673 - btf = btf__parse_elf(path, NULL); 4673 + btf = btf__parse(path, NULL); 4674 4674 err = libbpf_get_error(btf); 4675 4675 pr_debug("loading kernel BTF '%s': %d\n", path, err); 4676 4676 if (err)
+24 -1
tools/lib/bpf/btf.h
··· 486 486 return (struct btf_enum *)(t + 1); 487 487 } 488 488 489 + struct btf_enum64; 490 + 489 491 static inline struct btf_enum64 *btf_enum64(const struct btf_type *t) 490 492 { 491 493 return (struct btf_enum64 *)(t + 1); ··· 495 493 496 494 static inline __u64 btf_enum64_value(const struct btf_enum64 *e) 497 495 { 498 - return ((__u64)e->val_hi32 << 32) | e->val_lo32; 496 + /* struct btf_enum64 is introduced in Linux 6.0, which is very 497 + * bleeding-edge. Here we are avoiding relying on struct btf_enum64 498 + * definition coming from kernel UAPI headers to support wider range 499 + * of system-wide kernel headers. 500 + * 501 + * Given this header can be also included from C++ applications, that 502 + * further restricts C tricks we can use (like using compatible 503 + * anonymous struct). So just treat struct btf_enum64 as 504 + * a three-element array of u32 and access second (lo32) and third 505 + * (hi32) elements directly. 506 + * 507 + * For reference, here is a struct btf_enum64 definition: 508 + * 509 + * const struct btf_enum64 { 510 + * __u32 name_off; 511 + * __u32 val_lo32; 512 + * __u32 val_hi32; 513 + * }; 514 + */ 515 + const __u32 *e64 = (const __u32 *)e; 516 + 517 + return ((__u64)e64[2] << 32) | e64[1]; 499 518 } 500 519 501 520 static inline struct btf_member *btf_members(const struct btf_type *t)
+1 -1
tools/lib/bpf/btf_dump.c
··· 2385 2385 d->typed_dump->indent_lvl = OPTS_GET(opts, indent_level, 0); 2386 2386 2387 2387 /* default indent string is a tab */ 2388 - if (!opts->indent_str) 2388 + if (!OPTS_GET(opts, indent_str, NULL)) 2389 2389 d->typed_dump->indent_str[0] = '\t'; 2390 2390 else 2391 2391 libbpf_strlcpy(d->typed_dump->indent_str, opts->indent_str,
+50 -56
tools/lib/bpf/libbpf.c
··· 163 163 [BPF_MAP_TYPE_INODE_STORAGE] = "inode_storage", 164 164 [BPF_MAP_TYPE_TASK_STORAGE] = "task_storage", 165 165 [BPF_MAP_TYPE_BLOOM_FILTER] = "bloom_filter", 166 + [BPF_MAP_TYPE_USER_RINGBUF] = "user_ringbuf", 166 167 }; 167 168 168 169 static const char * const prog_type_name[] = { ··· 884 883 __u32 major, minor, patch; 885 884 struct utsname info; 886 885 887 - if (access(ubuntu_kver_file, R_OK) == 0) { 886 + if (faccessat(AT_FDCWD, ubuntu_kver_file, R_OK, AT_EACCESS) == 0) { 888 887 FILE *f; 889 888 890 889 f = fopen(ubuntu_kver_file, "r"); ··· 2097 2096 return true; 2098 2097 } 2099 2098 2099 + static int pathname_concat(char *buf, size_t buf_sz, const char *path, const char *name) 2100 + { 2101 + int len; 2102 + 2103 + len = snprintf(buf, buf_sz, "%s/%s", path, name); 2104 + if (len < 0) 2105 + return -EINVAL; 2106 + if (len >= buf_sz) 2107 + return -ENAMETOOLONG; 2108 + 2109 + return 0; 2110 + } 2111 + 2100 2112 static int build_map_pin_path(struct bpf_map *map, const char *path) 2101 2113 { 2102 2114 char buf[PATH_MAX]; 2103 - int len; 2115 + int err; 2104 2116 2105 2117 if (!path) 2106 2118 path = "/sys/fs/bpf"; 2107 2119 2108 - len = snprintf(buf, PATH_MAX, "%s/%s", path, bpf_map__name(map)); 2109 - if (len < 0) 2110 - return -EINVAL; 2111 - else if (len >= PATH_MAX) 2112 - return -ENAMETOOLONG; 2120 + err = pathname_concat(buf, sizeof(buf), path, bpf_map__name(map)); 2121 + if (err) 2122 + return err; 2113 2123 2114 2124 return bpf_map__set_pin_path(map, buf); 2115 2125 } ··· 2384 2372 return sz; 2385 2373 } 2386 2374 2375 + static bool map_is_ringbuf(const struct bpf_map *map) 2376 + { 2377 + return map->def.type == BPF_MAP_TYPE_RINGBUF || 2378 + map->def.type == BPF_MAP_TYPE_USER_RINGBUF; 2379 + } 2380 + 2387 2381 static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def) 2388 2382 { 2389 2383 map->def.type = def->map_type; ··· 2404 2386 map->btf_value_type_id = def->value_type_id; 2405 2387 2406 2388 /* auto-adjust BPF 
ringbuf map max_entries to be a multiple of page size */ 2407 - if (map->def.type == BPF_MAP_TYPE_RINGBUF) 2389 + if (map_is_ringbuf(map)) 2408 2390 map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries); 2409 2391 2410 2392 if (def->parts & MAP_DEF_MAP_TYPE) ··· 4387 4369 map->def.max_entries = max_entries; 4388 4370 4389 4371 /* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */ 4390 - if (map->def.type == BPF_MAP_TYPE_RINGBUF) 4372 + if (map_is_ringbuf(map)) 4391 4373 map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries); 4392 4374 4393 4375 return 0; ··· 7979 7961 continue; 7980 7962 7981 7963 if (path) { 7982 - int len; 7983 - 7984 - len = snprintf(buf, PATH_MAX, "%s/%s", path, 7985 - bpf_map__name(map)); 7986 - if (len < 0) { 7987 - err = -EINVAL; 7964 + err = pathname_concat(buf, sizeof(buf), path, bpf_map__name(map)); 7965 + if (err) 7988 7966 goto err_unpin_maps; 7989 - } else if (len >= PATH_MAX) { 7990 - err = -ENAMETOOLONG; 7991 - goto err_unpin_maps; 7992 - } 7993 7967 sanitize_pin_path(buf); 7994 7968 pin_path = buf; 7995 7969 } else if (!map->pin_path) { ··· 8019 8009 char buf[PATH_MAX]; 8020 8010 8021 8011 if (path) { 8022 - int len; 8023 - 8024 - len = snprintf(buf, PATH_MAX, "%s/%s", path, 8025 - bpf_map__name(map)); 8026 - if (len < 0) 8027 - return libbpf_err(-EINVAL); 8028 - else if (len >= PATH_MAX) 8029 - return libbpf_err(-ENAMETOOLONG); 8012 + err = pathname_concat(buf, sizeof(buf), path, bpf_map__name(map)); 8013 + if (err) 8014 + return libbpf_err(err); 8030 8015 sanitize_pin_path(buf); 8031 8016 pin_path = buf; 8032 8017 } else if (!map->pin_path) { ··· 8039 8034 int bpf_object__pin_programs(struct bpf_object *obj, const char *path) 8040 8035 { 8041 8036 struct bpf_program *prog; 8037 + char buf[PATH_MAX]; 8042 8038 int err; 8043 8039 8044 8040 if (!obj) ··· 8051 8045 } 8052 8046 8053 8047 bpf_object__for_each_program(prog, obj) { 8054 - char buf[PATH_MAX]; 8055 - int len; 8056 - 8057 - len = 
snprintf(buf, PATH_MAX, "%s/%s", path, prog->name); 8058 - if (len < 0) { 8059 - err = -EINVAL; 8048 + err = pathname_concat(buf, sizeof(buf), path, prog->name); 8049 + if (err) 8060 8050 goto err_unpin_programs; 8061 - } else if (len >= PATH_MAX) { 8062 - err = -ENAMETOOLONG; 8063 - goto err_unpin_programs; 8064 - } 8065 8051 8066 8052 err = bpf_program__pin(prog, buf); 8067 8053 if (err) ··· 8064 8066 8065 8067 err_unpin_programs: 8066 8068 while ((prog = bpf_object__prev_program(obj, prog))) { 8067 - char buf[PATH_MAX]; 8068 - int len; 8069 - 8070 - len = snprintf(buf, PATH_MAX, "%s/%s", path, prog->name); 8071 - if (len < 0) 8072 - continue; 8073 - else if (len >= PATH_MAX) 8069 + if (pathname_concat(buf, sizeof(buf), path, prog->name)) 8074 8070 continue; 8075 8071 8076 8072 bpf_program__unpin(prog, buf); ··· 8083 8091 8084 8092 bpf_object__for_each_program(prog, obj) { 8085 8093 char buf[PATH_MAX]; 8086 - int len; 8087 8094 8088 - len = snprintf(buf, PATH_MAX, "%s/%s", path, prog->name); 8089 - if (len < 0) 8090 - return libbpf_err(-EINVAL); 8091 - else if (len >= PATH_MAX) 8092 - return libbpf_err(-ENAMETOOLONG); 8095 + err = pathname_concat(buf, sizeof(buf), path, prog->name); 8096 + if (err) 8097 + return libbpf_err(err); 8093 8098 8094 8099 err = bpf_program__unpin(prog, buf); 8095 8100 if (err) ··· 9073 9084 int err = 0; 9074 9085 9075 9086 /* BPF program's BTF ID */ 9076 - if (attach_prog_fd) { 9087 + if (prog->type == BPF_PROG_TYPE_EXT || attach_prog_fd) { 9088 + if (!attach_prog_fd) { 9089 + pr_warn("prog '%s': attach program FD is not set\n", prog->name); 9090 + return -EINVAL; 9091 + } 9077 9092 err = libbpf_find_prog_btf_id(attach_name, attach_prog_fd); 9078 9093 if (err < 0) { 9079 - pr_warn("failed to find BPF program (FD %d) BTF ID for '%s': %d\n", 9080 - attach_prog_fd, attach_name, err); 9094 + pr_warn("prog '%s': failed to find BPF program (FD %d) BTF ID for '%s': %d\n", 9095 + prog->name, attach_prog_fd, attach_name, err); 9081 9096 return 
err; 9082 9097 } 9083 9098 *btf_obj_fd = 0; ··· 9098 9105 err = find_kernel_btf_id(prog->obj, attach_name, attach_type, btf_obj_fd, btf_type_id); 9099 9106 } 9100 9107 if (err) { 9101 - pr_warn("failed to find kernel BTF type ID of '%s': %d\n", attach_name, err); 9108 + pr_warn("prog '%s': failed to find kernel BTF type ID of '%s': %d\n", 9109 + prog->name, attach_name, err); 9102 9110 return err; 9103 9111 } 9104 9112 return 0; ··· 9904 9910 static int has_debugfs = -1; 9905 9911 9906 9912 if (has_debugfs < 0) 9907 - has_debugfs = access(DEBUGFS, F_OK) == 0; 9913 + has_debugfs = faccessat(AT_FDCWD, DEBUGFS, F_OK, AT_EACCESS) == 0; 9908 9914 9909 9915 return has_debugfs == 1; 9910 9916 } ··· 10721 10727 continue; 10722 10728 snprintf(result, result_sz, "%.*s/%s", seg_len, s, file); 10723 10729 /* ensure it has required permissions */ 10724 - if (access(result, perm) < 0) 10730 + if (faccessat(AT_FDCWD, result, perm, AT_EACCESS) < 0) 10725 10731 continue; 10726 10732 pr_debug("resolved '%s' to '%s'\n", file, result); 10727 10733 return 0;
+110 -1
tools/lib/bpf/libbpf.h
··· 118 118 * auto-pinned to that path on load; defaults to "/sys/fs/bpf". 119 119 */ 120 120 const char *pin_root_path; 121 - long :0; 121 + 122 + __u32 :32; /* stub out now removed attach_prog_fd */ 123 + 122 124 /* Additional kernel config content that augments and overrides 123 125 * system Kconfig for CONFIG_xxx externs. 124 126 */ ··· 1013 1011 1014 1012 /* Ring buffer APIs */ 1015 1013 struct ring_buffer; 1014 + struct user_ring_buffer; 1016 1015 1017 1016 typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size); 1018 1017 ··· 1032 1029 LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms); 1033 1030 LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb); 1034 1031 LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb); 1032 + 1033 + struct user_ring_buffer_opts { 1034 + size_t sz; /* size of this struct, for forward/backward compatibility */ 1035 + }; 1036 + 1037 + #define user_ring_buffer_opts__last_field sz 1038 + 1039 + /* @brief **user_ring_buffer__new()** creates a new instance of a user ring 1040 + * buffer. 1041 + * 1042 + * @param map_fd A file descriptor to a BPF_MAP_TYPE_USER_RINGBUF map. 1043 + * @param opts Options for how the ring buffer should be created. 1044 + * @return A user ring buffer on success; NULL and errno being set on a 1045 + * failure. 1046 + */ 1047 + LIBBPF_API struct user_ring_buffer * 1048 + user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts); 1049 + 1050 + /* @brief **user_ring_buffer__reserve()** reserves a pointer to a sample in the 1051 + * user ring buffer. 1052 + * @param rb A pointer to a user ring buffer. 1053 + * @param size The size of the sample, in bytes. 1054 + * @return A pointer to an 8-byte aligned reserved region of the user ring 1055 + * buffer; NULL, and errno being set if a sample could not be reserved. 
1056 + * 1057 + * This function is *not* thread safe, and callers must synchronize accessing 1058 + * this function if there are multiple producers. If a size is requested that 1059 + * is larger than the size of the entire ring buffer, errno will be set to 1060 + * E2BIG and NULL is returned. If the ring buffer could accommodate the size, 1061 + * but currently does not have enough space, errno is set to ENOSPC and NULL is 1062 + * returned. 1063 + * 1064 + * After initializing the sample, callers must invoke 1065 + * **user_ring_buffer__submit()** to post the sample to the kernel. Otherwise, 1066 + * the sample must be freed with **user_ring_buffer__discard()**. 1067 + */ 1068 + LIBBPF_API void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); 1069 + 1070 + /* @brief **user_ring_buffer__reserve_blocking()** reserves a record in the 1071 + * ring buffer, possibly blocking for up to @timeout_ms until a sample becomes 1072 + * available. 1073 + * @param rb The user ring buffer. 1074 + * @param size The size of the sample, in bytes. 1075 + * @param timeout_ms The amount of time, in milliseconds, for which the caller 1076 + * should block when waiting for a sample. -1 causes the caller to block 1077 + * indefinitely. 1078 + * @return A pointer to an 8-byte aligned reserved region of the user ring 1079 + * buffer; NULL, and errno being set if a sample could not be reserved. 1080 + * 1081 + * This function is *not* thread safe, and callers must synchronize 1082 + * accessing this function if there are multiple producers 1083 + * 1084 + * If **timeout_ms** is -1, the function will block indefinitely until a sample 1085 + * becomes available. Otherwise, **timeout_ms** must be non-negative, or errno 1086 + * is set to EINVAL, and NULL is returned. If **timeout_ms** is 0, no blocking 1087 + * will occur and the function will return immediately after attempting to 1088 + * reserve a sample. 
1089 + * 1090 + * If **size** is larger than the size of the entire ring buffer, errno is set 1091 + * to E2BIG and NULL is returned. If the ring buffer could accommodate 1092 + * **size**, but currently does not have enough space, the caller will block 1093 + * until at most **timeout_ms** has elapsed. If insufficient space is available 1094 + * at that time, errno is set to ENOSPC, and NULL is returned. 1095 + * 1096 + * The kernel guarantees that it will wake up this thread to check if 1097 + * sufficient space is available in the ring buffer at least once per 1098 + * invocation of the **bpf_ringbuf_drain()** helper function, provided that at 1099 + * least one sample is consumed, and the BPF program did not invoke the 1100 + * function with BPF_RB_NO_WAKEUP. A wakeup may occur sooner than that, but the 1101 + * kernel does not guarantee this. If the helper function is invoked with 1102 + * BPF_RB_FORCE_WAKEUP, a wakeup event will be sent even if no sample is 1103 + * consumed. 1104 + * 1105 + * When a sample of size **size** is found within **timeout_ms**, a pointer to 1106 + * the sample is returned. After initializing the sample, callers must invoke 1107 + * **user_ring_buffer__submit()** to post the sample to the ring buffer. 1108 + * Otherwise, the sample must be freed with **user_ring_buffer__discard()**. 1109 + */ 1110 + LIBBPF_API void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, 1111 + __u32 size, 1112 + int timeout_ms); 1113 + 1114 + /* @brief **user_ring_buffer__submit()** submits a previously reserved sample 1115 + * into the ring buffer. 1116 + * @param rb The user ring buffer. 1117 + * @param sample A reserved sample. 1118 + * 1119 + * It is not necessary to synchronize amongst multiple producers when invoking 1120 + * this function. 1121 + */ 1122 + LIBBPF_API void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample); 1123 + 1124 + /* @brief **user_ring_buffer__discard()** discards a previously reserved sample. 
1125 + * @param rb The user ring buffer. 1126 + * @param sample A reserved sample. 1127 + * 1128 + * It is not necessary to synchronize amongst multiple producers when invoking 1129 + * this function. 1130 + */ 1131 + LIBBPF_API void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample); 1132 + 1133 + /* @brief **user_ring_buffer__free()** frees a ring buffer that was previously 1134 + * created with **user_ring_buffer__new()**. 1135 + * @param rb The user ring buffer being freed. 1136 + */ 1137 + LIBBPF_API void user_ring_buffer__free(struct user_ring_buffer *rb); 1035 1138 1036 1139 /* Perf buffer APIs */ 1037 1140 struct perf_buffer;
+10
tools/lib/bpf/libbpf.map
··· 368 368 libbpf_bpf_prog_type_str; 369 369 perf_buffer__buffer; 370 370 }; 371 + 372 + LIBBPF_1.1.0 { 373 + global: 374 + user_ring_buffer__discard; 375 + user_ring_buffer__free; 376 + user_ring_buffer__new; 377 + user_ring_buffer__reserve; 378 + user_ring_buffer__reserve_blocking; 379 + user_ring_buffer__submit; 380 + } LIBBPF_1.0.0;
+1
tools/lib/bpf/libbpf_probes.c
··· 231 231 return btf_fd; 232 232 break; 233 233 case BPF_MAP_TYPE_RINGBUF: 234 + case BPF_MAP_TYPE_USER_RINGBUF: 234 235 key_size = 0; 235 236 value_size = 0; 236 237 max_entries = 4096;
+1 -1
tools/lib/bpf/libbpf_version.h
··· 4 4 #define __LIBBPF_VERSION_H 5 5 6 6 #define LIBBPF_MAJOR_VERSION 1 7 - #define LIBBPF_MINOR_VERSION 0 7 + #define LIBBPF_MINOR_VERSION 1 8 8 9 9 #endif /* __LIBBPF_VERSION_H */
+1 -1
tools/lib/bpf/nlattr.c
··· 32 32 33 33 static int nla_ok(const struct nlattr *nla, int remaining) 34 34 { 35 - return remaining >= sizeof(*nla) && 35 + return remaining >= (int)sizeof(*nla) && 36 36 nla->nla_len >= sizeof(*nla) && 37 37 nla->nla_len <= remaining; 38 38 }
+271
tools/lib/bpf/ringbuf.c
··· 16 16 #include <asm/barrier.h> 17 17 #include <sys/mman.h> 18 18 #include <sys/epoll.h> 19 + #include <time.h> 19 20 20 21 #include "libbpf.h" 21 22 #include "libbpf_internal.h" ··· 38 37 size_t page_size; 39 38 int epoll_fd; 40 39 int ring_cnt; 40 + }; 41 + 42 + struct user_ring_buffer { 43 + struct epoll_event event; 44 + unsigned long *consumer_pos; 45 + unsigned long *producer_pos; 46 + void *data; 47 + unsigned long mask; 48 + size_t page_size; 49 + int map_fd; 50 + int epoll_fd; 51 + }; 52 + 53 + /* 8-byte ring buffer header structure */ 54 + struct ringbuf_hdr { 55 + __u32 len; 56 + __u32 pad; 41 57 }; 42 58 43 59 static void ringbuf_unmap_ring(struct ring_buffer *rb, struct ring *r) ··· 317 299 int ring_buffer__epoll_fd(const struct ring_buffer *rb) 318 300 { 319 301 return rb->epoll_fd; 302 + } 303 + 304 + static void user_ringbuf_unmap_ring(struct user_ring_buffer *rb) 305 + { 306 + if (rb->consumer_pos) { 307 + munmap(rb->consumer_pos, rb->page_size); 308 + rb->consumer_pos = NULL; 309 + } 310 + if (rb->producer_pos) { 311 + munmap(rb->producer_pos, rb->page_size + 2 * (rb->mask + 1)); 312 + rb->producer_pos = NULL; 313 + } 314 + } 315 + 316 + void user_ring_buffer__free(struct user_ring_buffer *rb) 317 + { 318 + if (!rb) 319 + return; 320 + 321 + user_ringbuf_unmap_ring(rb); 322 + 323 + if (rb->epoll_fd >= 0) 324 + close(rb->epoll_fd); 325 + 326 + free(rb); 327 + } 328 + 329 + static int user_ringbuf_map(struct user_ring_buffer *rb, int map_fd) 330 + { 331 + struct bpf_map_info info; 332 + __u32 len = sizeof(info); 333 + void *tmp; 334 + struct epoll_event *rb_epoll; 335 + int err; 336 + 337 + memset(&info, 0, sizeof(info)); 338 + 339 + err = bpf_obj_get_info_by_fd(map_fd, &info, &len); 340 + if (err) { 341 + err = -errno; 342 + pr_warn("user ringbuf: failed to get map info for fd=%d: %d\n", map_fd, err); 343 + return err; 344 + } 345 + 346 + if (info.type != BPF_MAP_TYPE_USER_RINGBUF) { 347 + pr_warn("user ringbuf: map fd=%d is not 
BPF_MAP_TYPE_USER_RINGBUF\n", map_fd); 348 + return -EINVAL; 349 + } 350 + 351 + rb->map_fd = map_fd; 352 + rb->mask = info.max_entries - 1; 353 + 354 + /* Map read-only consumer page */ 355 + tmp = mmap(NULL, rb->page_size, PROT_READ, MAP_SHARED, map_fd, 0); 356 + if (tmp == MAP_FAILED) { 357 + err = -errno; 358 + pr_warn("user ringbuf: failed to mmap consumer page for map fd=%d: %d\n", 359 + map_fd, err); 360 + return err; 361 + } 362 + rb->consumer_pos = tmp; 363 + 364 + /* Map read-write the producer page and data pages. We map the data 365 + * region as twice the total size of the ring buffer to allow the 366 + * simple reading and writing of samples that wrap around the end of 367 + * the buffer. See the kernel implementation for details. 368 + */ 369 + tmp = mmap(NULL, rb->page_size + 2 * info.max_entries, 370 + PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, rb->page_size); 371 + if (tmp == MAP_FAILED) { 372 + err = -errno; 373 + pr_warn("user ringbuf: failed to mmap data pages for map fd=%d: %d\n", 374 + map_fd, err); 375 + return err; 376 + } 377 + 378 + rb->producer_pos = tmp; 379 + rb->data = tmp + rb->page_size; 380 + 381 + rb_epoll = &rb->event; 382 + rb_epoll->events = EPOLLOUT; 383 + if (epoll_ctl(rb->epoll_fd, EPOLL_CTL_ADD, map_fd, rb_epoll) < 0) { 384 + err = -errno; 385 + pr_warn("user ringbuf: failed to epoll add map fd=%d: %d\n", map_fd, err); 386 + return err; 387 + } 388 + 389 + return 0; 390 + } 391 + 392 + struct user_ring_buffer * 393 + user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts) 394 + { 395 + struct user_ring_buffer *rb; 396 + int err; 397 + 398 + if (!OPTS_VALID(opts, user_ring_buffer_opts)) 399 + return errno = EINVAL, NULL; 400 + 401 + rb = calloc(1, sizeof(*rb)); 402 + if (!rb) 403 + return errno = ENOMEM, NULL; 404 + 405 + rb->page_size = getpagesize(); 406 + 407 + rb->epoll_fd = epoll_create1(EPOLL_CLOEXEC); 408 + if (rb->epoll_fd < 0) { 409 + err = -errno; 410 + pr_warn("user ringbuf: failed to create 
epoll instance: %d\n", err); 411 + goto err_out; 412 + } 413 + 414 + err = user_ringbuf_map(rb, map_fd); 415 + if (err) 416 + goto err_out; 417 + 418 + return rb; 419 + 420 + err_out: 421 + user_ring_buffer__free(rb); 422 + return errno = -err, NULL; 423 + } 424 + 425 + static void user_ringbuf_commit(struct user_ring_buffer *rb, void *sample, bool discard) 426 + { 427 + __u32 new_len; 428 + struct ringbuf_hdr *hdr; 429 + uintptr_t hdr_offset; 430 + 431 + hdr_offset = rb->mask + 1 + (sample - rb->data) - BPF_RINGBUF_HDR_SZ; 432 + hdr = rb->data + (hdr_offset & rb->mask); 433 + 434 + new_len = hdr->len & ~BPF_RINGBUF_BUSY_BIT; 435 + if (discard) 436 + new_len |= BPF_RINGBUF_DISCARD_BIT; 437 + 438 + /* Synchronizes with smp_load_acquire() in __bpf_user_ringbuf_peek() in 439 + * the kernel. 440 + */ 441 + __atomic_exchange_n(&hdr->len, new_len, __ATOMIC_ACQ_REL); 442 + } 443 + 444 + void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample) 445 + { 446 + user_ringbuf_commit(rb, sample, true); 447 + } 448 + 449 + void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample) 450 + { 451 + user_ringbuf_commit(rb, sample, false); 452 + } 453 + 454 + void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size) 455 + { 456 + __u32 avail_size, total_size, max_size; 457 + /* 64-bit to avoid overflow in case of extreme application behavior */ 458 + __u64 cons_pos, prod_pos; 459 + struct ringbuf_hdr *hdr; 460 + 461 + /* Synchronizes with smp_store_release() in __bpf_user_ringbuf_peek() in 462 + * the kernel. 463 + */ 464 + cons_pos = smp_load_acquire(rb->consumer_pos); 465 + /* Synchronizes with smp_store_release() in user_ringbuf_commit() */ 466 + prod_pos = smp_load_acquire(rb->producer_pos); 467 + 468 + max_size = rb->mask + 1; 469 + avail_size = max_size - (prod_pos - cons_pos); 470 + /* Round up total size to a multiple of 8. 
*/ 471 + total_size = (size + BPF_RINGBUF_HDR_SZ + 7) / 8 * 8; 472 + 473 + if (total_size > max_size) 474 + return errno = E2BIG, NULL; 475 + 476 + if (avail_size < total_size) 477 + return errno = ENOSPC, NULL; 478 + 479 + hdr = rb->data + (prod_pos & rb->mask); 480 + hdr->len = size | BPF_RINGBUF_BUSY_BIT; 481 + hdr->pad = 0; 482 + 483 + /* Synchronizes with smp_load_acquire() in __bpf_user_ringbuf_peek() in 484 + * the kernel. 485 + */ 486 + smp_store_release(rb->producer_pos, prod_pos + total_size); 487 + 488 + return (void *)rb->data + ((prod_pos + BPF_RINGBUF_HDR_SZ) & rb->mask); 489 + } 490 + 491 + static __u64 ns_elapsed_timespec(const struct timespec *start, const struct timespec *end) 492 + { 493 + __u64 start_ns, end_ns, ns_per_s = 1000000000; 494 + 495 + start_ns = (__u64)start->tv_sec * ns_per_s + start->tv_nsec; 496 + end_ns = (__u64)end->tv_sec * ns_per_s + end->tv_nsec; 497 + 498 + return end_ns - start_ns; 499 + } 500 + 501 + void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms) 502 + { 503 + void *sample; 504 + int err, ms_remaining = timeout_ms; 505 + struct timespec start; 506 + 507 + if (timeout_ms < 0 && timeout_ms != -1) 508 + return errno = EINVAL, NULL; 509 + 510 + if (timeout_ms != -1) { 511 + err = clock_gettime(CLOCK_MONOTONIC, &start); 512 + if (err) 513 + return NULL; 514 + } 515 + 516 + do { 517 + int cnt, ms_elapsed; 518 + struct timespec curr; 519 + __u64 ns_per_ms = 1000000; 520 + 521 + sample = user_ring_buffer__reserve(rb, size); 522 + if (sample) 523 + return sample; 524 + else if (errno != ENOSPC) 525 + return NULL; 526 + 527 + /* The kernel guarantees at least one event notification 528 + * delivery whenever at least one sample is drained from the 529 + * ring buffer in an invocation to bpf_ringbuf_drain(). 
Other 530 + * additional events may be delivered at any time, but only one 531 + * event is guaranteed per bpf_ringbuf_drain() invocation, 532 + * provided that a sample is drained, and the BPF program did 533 + * not pass BPF_RB_NO_WAKEUP to bpf_ringbuf_drain(). If 534 + * BPF_RB_FORCE_WAKEUP is passed to bpf_ringbuf_drain(), a 535 + * wakeup event will be delivered even if no samples are 536 + * drained. 537 + */ 538 + cnt = epoll_wait(rb->epoll_fd, &rb->event, 1, ms_remaining); 539 + if (cnt < 0) 540 + return NULL; 541 + 542 + if (timeout_ms == -1) 543 + continue; 544 + 545 + err = clock_gettime(CLOCK_MONOTONIC, &curr); 546 + if (err) 547 + return NULL; 548 + 549 + ms_elapsed = ns_elapsed_timespec(&start, &curr) / ns_per_ms; 550 + ms_remaining = timeout_ms - ms_elapsed; 551 + } while (ms_remaining > 0); 552 + 553 + /* Try one more time to reserve a sample after the specified timeout has elapsed. */ 554 + return user_ring_buffer__reserve(rb, size); 320 555 }
+1 -1
tools/lib/bpf/usdt.c
··· 282 282 * If this is not supported, USDTs with semaphores will not be supported. 283 283 * Added in: a6ca88b241d5 ("trace_uprobe: support reference counter in fd-based uprobe") 284 284 */ 285 - man->has_sema_refcnt = access(ref_ctr_sysfs_path, F_OK) == 0; 285 + man->has_sema_refcnt = faccessat(AT_FDCWD, ref_ctr_sysfs_path, F_OK, AT_EACCESS) == 0; 286 286 287 287 return man; 288 288 }
+2 -1
tools/objtool/check.c
··· 4113 4113 !strcmp(sec->name, "__bug_table") || 4114 4114 !strcmp(sec->name, "__ex_table") || 4115 4115 !strcmp(sec->name, "__jump_table") || 4116 - !strcmp(sec->name, "__mcount_loc")) 4116 + !strcmp(sec->name, "__mcount_loc") || 4117 + strstr(sec->name, "__patchable_function_entries")) 4117 4118 continue; 4118 4119 4119 4120 list_for_each_entry(reloc, &sec->reloc->reloc_list, list)
+2
tools/testing/selftests/bpf/.gitignore
··· 39 39 /tools 40 40 /runqslower 41 41 /bench 42 + /veristat 43 + /sign-file 42 44 *.ko 43 45 *.tmp 44 46 xskxceiver
+5
tools/testing/selftests/bpf/DENYLIST.s390x
··· 70 70 cb_refs # expected error message unexpected error: -524 (trampoline) 71 71 cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc) 72 72 htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline) 73 + tracing_struct # failed to auto-attach: -524 (trampoline) 74 + user_ringbuf # failed to find kernel BTF type ID of '__s390x_sys_prctl': -3 (?) 75 + lookup_key # JIT does not support calling kernel function (kfunc) 76 + verify_pkcs7_sig # JIT does not support calling kernel function (kfunc) 77 + kfunc_dynptr_param # JIT does not support calling kernel function (kfunc)
+21 -6
tools/testing/selftests/bpf/Makefile
··· 14 14 APIDIR := $(TOOLSINCDIR)/uapi 15 15 GENDIR := $(abspath ../../../../include/generated) 16 16 GENHDR := $(GENDIR)/autoconf.h 17 + HOSTPKG_CONFIG := pkg-config 17 18 18 19 ifneq ($(wildcard $(GENHDR)),) 19 20 GENFLAGS := -DHAVE_GENHDR ··· 76 75 test_xsk.sh 77 76 78 77 TEST_PROGS_EXTENDED := with_addr.sh \ 79 - with_tunnels.sh ima_setup.sh \ 78 + with_tunnels.sh ima_setup.sh verify_sig_setup.sh \ 80 79 test_xdp_vlan.sh test_bpftool.py 81 80 82 81 # Compile but not part of 'make run_tests' 83 82 TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \ 84 83 flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \ 85 84 test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \ 86 - xskxceiver xdp_redirect_multi xdp_synproxy 85 + xskxceiver xdp_redirect_multi xdp_synproxy veristat 87 86 88 - TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read 87 + TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file 88 + TEST_GEN_FILES += liburandom_read.so 89 89 90 90 # Emit succinct information message describing current building step 91 91 # $1 - generic step name (e.g., CC, LINK, etc); ··· 190 188 liburandom_read.so $(LDLIBS) \ 191 189 -fuse-ld=$(LLD) -Wl,-znoseparate-code \ 192 190 -Wl,-rpath=. 
-Wl,--build-id=sha1 -o $@ 191 + 192 + $(OUTPUT)/sign-file: ../../../../scripts/sign-file.c 193 + $(call msg,SIGN-FILE,,$@) 194 + $(Q)$(CC) $(shell $(HOSTPKG_CONFIG) --cflags libcrypto 2> /dev/null) \ 195 + $< -o $@ \ 196 + $(shell $(HOSTPKG_CONFIG) --libs libcrypto 2> /dev/null || echo -lcrypto) 193 197 194 198 $(OUTPUT)/bpf_testmod.ko: $(VMLINUX_BTF) $(wildcard bpf_testmod/Makefile bpf_testmod/*.[ch]) 195 199 $(call msg,MOD,,$@) ··· 359 351 test_subskeleton.skel.h test_subskeleton_lib.skel.h \ 360 352 test_usdt.skel.h 361 353 362 - LSKELS := kfunc_call_test.c fentry_test.c fexit_test.c fexit_sleep.c \ 354 + LSKELS := fentry_test.c fexit_test.c fexit_sleep.c \ 363 355 test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c \ 364 356 map_ptr_kern.c core_kern.c core_kern_overflow.c 365 357 # Generate both light skeleton and libbpf skeleton for these 366 - LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test_subprog.c 358 + LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test.c \ 359 + kfunc_call_test_subprog.c 367 360 SKEL_BLACKLIST += $$(LSKELS) 368 361 369 362 test_static_linked.skel.h-deps := test_static_linked1.bpf.o test_static_linked2.bpf.o ··· 524 515 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \ 525 516 $(OUTPUT)/liburandom_read.so \ 526 517 $(OUTPUT)/xdp_synproxy \ 527 - ima_setup.sh \ 518 + $(OUTPUT)/sign-file \ 519 + ima_setup.sh verify_sig_setup.sh \ 528 520 $(wildcard progs/btf_dump_test_case_*.c) 529 521 TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE 530 522 TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS) -DENABLE_ATOMICS_TESTS ··· 601 591 $(OUTPUT)/bench_bpf_hashmap_full_update.o \ 602 592 $(OUTPUT)/bench_local_storage.o \ 603 593 $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o 594 + $(call msg,BINARY,,$@) 595 + $(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@ 596 + 597 + $(OUTPUT)/veristat.o: $(BPFOBJ) 598 + $(OUTPUT)/veristat: $(OUTPUT)/veristat.o 604 599 $(call 

msg,BINARY,,$@) 605 600 $(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@ 606 601
+48
tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
···
 typedef int (*func_proto_typedef_nested2)(func_proto_typedef_nested1);
 
 DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
+long bpf_testmod_test_struct_arg_result;
+
+struct bpf_testmod_struct_arg_1 {
+	int a;
+};
+struct bpf_testmod_struct_arg_2 {
+	long a;
+	long b;
+};
+
+noinline int
+bpf_testmod_test_struct_arg_1(struct bpf_testmod_struct_arg_2 a, int b, int c) {
+	bpf_testmod_test_struct_arg_result = a.a + a.b + b + c;
+	return bpf_testmod_test_struct_arg_result;
+}
+
+noinline int
+bpf_testmod_test_struct_arg_2(int a, struct bpf_testmod_struct_arg_2 b, int c) {
+	bpf_testmod_test_struct_arg_result = a + b.a + b.b + c;
+	return bpf_testmod_test_struct_arg_result;
+}
+
+noinline int
+bpf_testmod_test_struct_arg_3(int a, int b, struct bpf_testmod_struct_arg_2 c) {
+	bpf_testmod_test_struct_arg_result = a + b + c.a + c.b;
+	return bpf_testmod_test_struct_arg_result;
+}
+
+noinline int
+bpf_testmod_test_struct_arg_4(struct bpf_testmod_struct_arg_1 a, int b,
+			      int c, int d, struct bpf_testmod_struct_arg_2 e) {
+	bpf_testmod_test_struct_arg_result = a.a + b + c + d + e.a + e.b;
+	return bpf_testmod_test_struct_arg_result;
+}
+
+noinline int
+bpf_testmod_test_struct_arg_5(void) {
+	bpf_testmod_test_struct_arg_result = 1;
+	return bpf_testmod_test_struct_arg_result;
+}
 
 noinline void
 bpf_testmod_test_mod_kfunc(int i)
···
 		.off = off,
 		.len = len,
 	};
+	struct bpf_testmod_struct_arg_1 struct_arg1 = {10};
+	struct bpf_testmod_struct_arg_2 struct_arg2 = {2, 3};
 	int i = 1;
 
 	while (bpf_testmod_return_ptr(i))
 		i++;
+
+	(void)bpf_testmod_test_struct_arg_1(struct_arg2, 1, 4);
+	(void)bpf_testmod_test_struct_arg_2(1, struct_arg2, 4);
+	(void)bpf_testmod_test_struct_arg_3(1, 4, struct_arg2);
+	(void)bpf_testmod_test_struct_arg_4(struct_arg1, 1, 2, 3, struct_arg2);
+	(void)bpf_testmod_test_struct_arg_5();
 
 	/* This is always true. Use the check to make sure the compiler
 	 * doesn't remove bpf_testmod_loop_test.
+20 -13
tools/testing/selftests/bpf/config
···
 CONFIG_BPF_STREAM_PARSER=y
 CONFIG_BPF_SYSCALL=y
 CONFIG_CGROUP_BPF=y
-CONFIG_CRYPTO_HMAC=m
-CONFIG_CRYPTO_SHA256=m
-CONFIG_CRYPTO_USER_API_HASH=m
+CONFIG_CRYPTO_HMAC=y
+CONFIG_CRYPTO_SHA256=y
+CONFIG_CRYPTO_USER_API_HASH=y
 CONFIG_DYNAMIC_FTRACE=y
 CONFIG_FPROBE=y
 CONFIG_FTRACE_SYSCALLS=y
···
 CONFIG_IP_NF_RAW=y
 CONFIG_IP_NF_TARGET_SYNPROXY=y
 CONFIG_IPV6=y
-CONFIG_IPV6_FOU=m
-CONFIG_IPV6_FOU_TUNNEL=m
+CONFIG_IPV6_FOU=y
+CONFIG_IPV6_FOU_TUNNEL=y
 CONFIG_IPV6_GRE=y
 CONFIG_IPV6_SEG6_BPF=y
-CONFIG_IPV6_SIT=m
+CONFIG_IPV6_SIT=y
 CONFIG_IPV6_TUNNEL=y
+CONFIG_KEYS=y
 CONFIG_LIRC=y
 CONFIG_LWTUNNEL=y
+CONFIG_MODULE_SIG=y
+CONFIG_MODULE_SRCVERSION_ALL=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_MODULES=y
+CONFIG_MODVERSIONS=y
 CONFIG_MPLS=y
-CONFIG_MPLS_IPTUNNEL=m
-CONFIG_MPLS_ROUTING=m
+CONFIG_MPLS_IPTUNNEL=y
+CONFIG_MPLS_ROUTING=y
 CONFIG_MPTCP=y
 CONFIG_NET_CLS_ACT=y
 CONFIG_NET_CLS_BPF=y
-CONFIG_NET_CLS_FLOWER=m
-CONFIG_NET_FOU=m
+CONFIG_NET_CLS_FLOWER=y
+CONFIG_NET_FOU=y
 CONFIG_NET_FOU_IP_TUNNELS=y
 CONFIG_NET_IPGRE=y
 CONFIG_NET_IPGRE_DEMUX=y
 CONFIG_NET_IPIP=y
-CONFIG_NET_MPLS_GSO=m
+CONFIG_NET_MPLS_GSO=y
 CONFIG_NET_SCH_INGRESS=y
 CONFIG_NET_SCHED=y
-CONFIG_NETDEVSIM=m
+CONFIG_NETDEVSIM=y
 CONFIG_NETFILTER=y
 CONFIG_NETFILTER_SYNPROXY=y
 CONFIG_NETFILTER_XT_CONNMARK=y
···
 CONFIG_NF_CONNTRACK_MARK=y
 CONFIG_NF_DEFRAG_IPV4=y
 CONFIG_NF_DEFRAG_IPV6=y
+CONFIG_NF_NAT=y
 CONFIG_RC_CORE=y
 CONFIG_SECURITY=y
 CONFIG_SECURITYFS=y
-CONFIG_TEST_BPF=m
+CONFIG_TEST_BPF=y
 CONFIG_USERFAULTFD=y
 CONFIG_VXLAN=y
 CONFIG_XDP_SOCKETS=y
+1 -6
tools/testing/selftests/bpf/config.x86_64
···
 CONFIG_CPUSETS=y
 CONFIG_CRC_T10DIF=y
 CONFIG_CRYPTO_BLAKE2B=y
-CONFIG_CRYPTO_DEV_VIRTIO=m
+CONFIG_CRYPTO_DEV_VIRTIO=y
 CONFIG_CRYPTO_SEQIV=y
 CONFIG_CRYPTO_XXHASH=y
 CONFIG_DCB=y
···
 CONFIG_MEMCG=y
 CONFIG_MEMORY_FAILURE=y
 CONFIG_MINIX_SUBPARTITION=y
-CONFIG_MODULE_SIG=y
-CONFIG_MODULE_SRCVERSION_ALL=y
-CONFIG_MODULE_UNLOAD=y
-CONFIG_MODULES=y
-CONFIG_MODVERSIONS=y
 CONFIG_NAMESPACES=y
 CONFIG_NET=y
 CONFIG_NET_9P=y
+2
tools/testing/selftests/bpf/map_tests/array_map_batch_ops.c
···
 #include <stdio.h>
 #include <errno.h>
 #include <string.h>
+#include <unistd.h>
 
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
···
 	free(keys);
 	free(values);
 	free(visited);
+	close(map_fd);
 }
 
 static void array_map_batch_ops(void)
+2
tools/testing/selftests/bpf/map_tests/htab_map_batch_ops.c
···
 #include <stdio.h>
 #include <errno.h>
 #include <string.h>
+#include <unistd.h>
 
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
···
 	free(visited);
 	if (!is_pcpu)
 		free(values);
+	close(map_fd);
 }
 
 void htab_map_batch_ops(void)
+2
tools/testing/selftests/bpf/map_tests/lpm_trie_map_batch_ops.c
···
 #include <errno.h>
 #include <string.h>
 #include <stdlib.h>
+#include <unistd.h>
 
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
···
 	free(keys);
 	free(values);
 	free(visited);
+	close(map_fd);
 }
+6 -1
tools/testing/selftests/bpf/map_tests/task_storage_map.c
···
 	CHECK(err, "open_and_load", "error %d\n", err);
 
 	/* Only for a fully preemptible kernel */
-	if (!skel->kconfig->CONFIG_PREEMPT)
+	if (!skel->kconfig->CONFIG_PREEMPT) {
+		printf("%s SKIP (no CONFIG_PREEMPT)\n", __func__);
+		read_bpf_task_storage_busy__destroy(skel);
+		skips++;
 		return;
+	}
 
 	/* Save the old affinity setting */
 	sched_getaffinity(getpid(), sizeof(old), &old);
···
 	read_bpf_task_storage_busy__destroy(skel);
 	/* Restore affinity setting */
 	sched_setaffinity(getpid(), sizeof(old), &old);
+	printf("%s:PASS\n", __func__);
 }
+261 -21
tools/testing/selftests/bpf/prog_tests/bpf_iter.c
···
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2020 Facebook */
 #include <test_progs.h>
+#include <unistd.h>
+#include <sys/syscall.h>
 #include "bpf_iter_ipv6_route.skel.h"
 #include "bpf_iter_netlink.skel.h"
 #include "bpf_iter_bpf_map.skel.h"
···
 #include "bpf_iter_udp4.skel.h"
 #include "bpf_iter_udp6.skel.h"
 #include "bpf_iter_unix.skel.h"
+#include "bpf_iter_vma_offset.skel.h"
 #include "bpf_iter_test_kern1.skel.h"
 #include "bpf_iter_test_kern2.skel.h"
 #include "bpf_iter_test_kern3.skel.h"
···
 	}
 }
 
-static void do_dummy_read(struct bpf_program *prog)
+static void do_dummy_read_opts(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)
 {
 	struct bpf_link *link;
 	char buf[16] = {};
 	int iter_fd, len;
 
-	link = bpf_program__attach_iter(prog, NULL);
+	link = bpf_program__attach_iter(prog, opts);
 	if (!ASSERT_OK_PTR(link, "attach_iter"))
 		return;
···
 
 free_link:
 	bpf_link__destroy(link);
+}
+
+static void do_dummy_read(struct bpf_program *prog)
+{
+	do_dummy_read_opts(prog, NULL);
 }
 
 static void do_read_map_iter_fd(struct bpf_object_skeleton **skel, struct bpf_program *prog,
···
 	bpf_iter_bpf_map__destroy(skel);
 }
 
-static void test_task(void)
+static int pidfd_open(pid_t pid, unsigned int flags)
+{
+	return syscall(SYS_pidfd_open, pid, flags);
+}
+
+static void check_bpf_link_info(const struct bpf_program *prog)
+{
+	LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+	struct bpf_link_info info = {};
+	struct bpf_link *link;
+	__u32 info_len;
+	int err;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	link = bpf_program__attach_iter(prog, &opts);
+	if (!ASSERT_OK_PTR(link, "attach_iter"))
+		return;
+
+	info_len = sizeof(info);
+	err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len);
+	ASSERT_OK(err, "bpf_obj_get_info_by_fd");
+	ASSERT_EQ(info.iter.task.tid, getpid(), "check_task_tid");
+
+	bpf_link__destroy(link);
+}
+
+static pthread_mutex_t do_nothing_mutex;
+
+static void *do_nothing_wait(void *arg)
+{
+	pthread_mutex_lock(&do_nothing_mutex);
+	pthread_mutex_unlock(&do_nothing_mutex);
+
+	pthread_exit(arg);
+}
+
+static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts,
+				     int *num_unknown, int *num_known)
 {
 	struct bpf_iter_task *skel;
+	pthread_t thread_id;
+	void *ret;
 
 	skel = bpf_iter_task__open_and_load();
 	if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
 		return;
 
-	do_dummy_read(skel->progs.dump_task);
+	ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock");
+
+	ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
+		  "pthread_create");
+
+	skel->bss->tid = getpid();
+
+	do_dummy_read_opts(skel->progs.dump_task, opts);
+
+	*num_unknown = skel->bss->num_unknown_tid;
+	*num_known = skel->bss->num_known_tid;
+
+	ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
+	ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
+		     "pthread_join");
 
 	bpf_iter_task__destroy(skel);
+}
+
+static void test_task_common(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known)
+{
+	int num_unknown_tid, num_known_tid;
+
+	test_task_common_nocheck(opts, &num_unknown_tid, &num_known_tid);
+	ASSERT_EQ(num_unknown_tid, num_unknown, "check_num_unknown_tid");
+	ASSERT_EQ(num_known_tid, num_known, "check_num_known_tid");
+}
+
+static void test_task_tid(void)
+{
+	LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+	int num_unknown_tid, num_known_tid;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+	test_task_common(&opts, 0, 1);
+
+	linfo.task.tid = 0;
+	linfo.task.pid = getpid();
+	test_task_common(&opts, 1, 1);
+
+	test_task_common_nocheck(NULL, &num_unknown_tid, &num_known_tid);
+	ASSERT_GT(num_unknown_tid, 1, "check_num_unknown_tid");
+	ASSERT_EQ(num_known_tid, 1, "check_num_known_tid");
+}
+
+static void test_task_pid(void)
+{
+	LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.pid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_common(&opts, 1, 1);
+}
+
+static void test_task_pidfd(void)
+{
+	LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+	int pidfd;
+
+	pidfd = pidfd_open(getpid(), 0);
+	if (!ASSERT_GT(pidfd, 0, "pidfd_open"))
+		return;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.pid_fd = pidfd;
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_common(&opts, 1, 1);
+
+	close(pidfd);
 }
 
 static void test_task_sleepable(void)
···
 	bpf_iter_task_stack__destroy(skel);
 }
 
-static void *do_nothing(void *arg)
-{
-	pthread_exit(arg);
-}
-
 static void test_task_file(void)
 {
+	LIBBPF_OPTS(bpf_iter_attach_opts, opts);
 	struct bpf_iter_task_file *skel;
+	union bpf_iter_link_info linfo;
 	pthread_t thread_id;
 	void *ret;
···
 
 	skel->bss->tgid = getpid();
 
-	if (!ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing, NULL),
-		       "pthread_create"))
-		goto done;
+	ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock");
+
+	ASSERT_OK(pthread_create(&thread_id, NULL, &do_nothing_wait, NULL),
+		  "pthread_create");
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	do_dummy_read_opts(skel->progs.dump_task_file, &opts);
+
+	ASSERT_EQ(skel->bss->count, 0, "check_count");
+	ASSERT_EQ(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
+
+	skel->bss->last_tgid = 0;
+	skel->bss->count = 0;
+	skel->bss->unique_tgid_count = 0;
 
 	do_dummy_read(skel->progs.dump_task_file);
 
-	if (!ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
-			  "pthread_join"))
-		goto done;
-
 	ASSERT_EQ(skel->bss->count, 0, "check_count");
+	ASSERT_GT(skel->bss->unique_tgid_count, 1, "check_unique_tgid_count");
 
-done:
+	check_bpf_link_info(skel->progs.dump_task_file);
+
+	ASSERT_OK(pthread_mutex_unlock(&do_nothing_mutex), "pthread_mutex_unlock");
+	ASSERT_OK(pthread_join(thread_id, &ret), "pthread_join");
+	ASSERT_NULL(ret, "pthread_join");
+
 	bpf_iter_task_file__destroy(skel);
 }
···
 	*dst = '\0';
 }
 
-static void test_task_vma(void)
+static void test_task_vma_common(struct bpf_iter_attach_opts *opts)
 {
 	int err, iter_fd = -1, proc_maps_fd = -1;
 	struct bpf_iter_task_vma *skel;
···
 		return;
 
 	skel->bss->pid = getpid();
+	skel->bss->one_task = opts ? 1 : 0;
 
 	err = bpf_iter_task_vma__load(skel);
 	if (!ASSERT_OK(err, "bpf_iter_task_vma__load"))
 		goto out;
 
 	skel->links.proc_maps = bpf_program__attach_iter(
-		skel->progs.proc_maps, NULL);
+		skel->progs.proc_maps, opts);
 
 	if (!ASSERT_OK_PTR(skel->links.proc_maps, "bpf_program__attach_iter")) {
 		skel->links.proc_maps = NULL;
···
 			goto out;
 		len += err;
 	}
+	if (opts)
+		ASSERT_EQ(skel->bss->one_task_error, 0, "unexpected task");
 
 	/* read CMP_BUFFER_SIZE (1kB) from /proc/pid/maps */
 	snprintf(maps_path, 64, "/proc/%u/maps", skel->bss->pid);
···
 	str_strip_first_line(proc_maps_output);
 
 	ASSERT_STREQ(task_vma_output, proc_maps_output, "compare_output");
+
+	check_bpf_link_info(skel->progs.proc_maps);
+
 out:
 	close(proc_maps_fd);
 	close(iter_fd);
···
 	bpf_iter_sockmap__destroy(skel);
 }
 
+static void test_task_vma(void)
+{
+	LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.tid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_vma_common(&opts);
+	test_task_vma_common(NULL);
+}
+
+/* uprobe attach point */
+static noinline int trigger_func(int arg)
+{
+	asm volatile ("");
+	return arg + 1;
+}
+
+static void test_task_vma_offset_common(struct bpf_iter_attach_opts *opts, bool one_proc)
+{
+	struct bpf_iter_vma_offset *skel;
+	struct bpf_link *link;
+	char buf[16] = {};
+	int iter_fd, len;
+	int pgsz, shift;
+
+	skel = bpf_iter_vma_offset__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "bpf_iter_vma_offset__open_and_load"))
+		return;
+
+	skel->bss->pid = getpid();
+	skel->bss->address = (uintptr_t)trigger_func;
+	for (pgsz = getpagesize(), shift = 0; pgsz > 1; pgsz >>= 1, shift++)
+		;
+	skel->bss->page_shift = shift;
+
+	link = bpf_program__attach_iter(skel->progs.get_vma_offset, opts);
+	if (!ASSERT_OK_PTR(link, "attach_iter"))
+		return;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GT(iter_fd, 0, "create_iter"))
+		goto exit;
+
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	buf[15] = 0;
+	ASSERT_EQ(strcmp(buf, "OK\n"), 0, "strcmp");
+
+	ASSERT_EQ(skel->bss->offset, get_uprobe_offset(trigger_func), "offset");
+	if (one_proc)
+		ASSERT_EQ(skel->bss->unique_tgid_cnt, 1, "unique_tgid_count");
+	else
+		ASSERT_GT(skel->bss->unique_tgid_cnt, 1, "unique_tgid_count");
+
+	close(iter_fd);
+
+exit:
+	bpf_link__destroy(link);
+}
+
+static void test_task_vma_offset(void)
+{
+	LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	union bpf_iter_link_info linfo;
+
+	memset(&linfo, 0, sizeof(linfo));
+	linfo.task.pid = getpid();
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	test_task_vma_offset_common(&opts, true);
+
+	linfo.task.pid = 0;
+	linfo.task.tid = getpid();
+	test_task_vma_offset_common(&opts, true);
+
+	test_task_vma_offset_common(NULL, false);
+}
+
 void test_bpf_iter(void)
 {
+	ASSERT_OK(pthread_mutex_init(&do_nothing_mutex, NULL), "pthread_mutex_init");
+
 	if (test__start_subtest("btf_id_or_null"))
 		test_btf_id_or_null();
 	if (test__start_subtest("ipv6_route"))
···
 		test_netlink();
 	if (test__start_subtest("bpf_map"))
 		test_bpf_map();
-	if (test__start_subtest("task"))
-		test_task();
+	if (test__start_subtest("task_tid"))
+		test_task_tid();
+	if (test__start_subtest("task_pid"))
+		test_task_pid();
+	if (test__start_subtest("task_pidfd"))
+		test_task_pidfd();
 	if (test__start_subtest("task_sleepable"))
 		test_task_sleepable();
 	if (test__start_subtest("task_stack"))
···
 		test_ksym_iter();
 	if (test__start_subtest("bpf_sockmap_map_iter_fd"))
 		test_bpf_sockmap_map_iter_fd();
+	if (test__start_subtest("vma_offset"))
+		test_task_vma_offset();
 }
+10 -3
tools/testing/selftests/bpf/prog_tests/bpf_nf.c
···
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 #include <network_helpers.h>
+#include <linux/netfilter/nf_conntrack_common.h>
 #include "test_bpf_nf.skel.h"
 #include "test_bpf_nf_fail.skel.h"
···
 	{ "set_status_after_insert", "kernel function bpf_ct_set_status args#0 expected pointer to STRUCT nf_conn___init but" },
 	{ "change_timeout_after_alloc", "kernel function bpf_ct_change_timeout args#0 expected pointer to STRUCT nf_conn but" },
 	{ "change_status_after_alloc", "kernel function bpf_ct_change_status args#0 expected pointer to STRUCT nf_conn but" },
+	{ "write_not_allowlisted_field", "no write support to nf_conn at off" },
 };
 
 enum {
···
 	TEST_TC_BPF,
 };
 
-#define TIMEOUT_MS 3000
+#define TIMEOUT_MS		3000
+#define IPS_STATUS_MASK		(IPS_CONFIRMED | IPS_SEEN_REPLY | \
+				 IPS_SRC_NAT_DONE | IPS_DST_NAT_DONE | \
+				 IPS_SRC_NAT | IPS_DST_NAT)
 
 static int connect_to_server(int srv_fd)
 {
···
 	/* allow some tolerance for test_delta_timeout value to avoid races. */
 	ASSERT_GT(skel->bss->test_delta_timeout, 8, "Test for min ct timeout update");
 	ASSERT_LE(skel->bss->test_delta_timeout, 10, "Test for max ct timeout update");
-	/* expected status is IPS_SEEN_REPLY */
-	ASSERT_EQ(skel->bss->test_status, 2, "Test for ct status update ");
+	ASSERT_EQ(skel->bss->test_insert_lookup_mark, 77, "Test for insert and lookup mark value");
+	ASSERT_EQ(skel->bss->test_status, IPS_STATUS_MASK, "Test for ct status update ");
 	ASSERT_EQ(skel->data->test_exist_lookup, 0, "Test existing connection lookup");
 	ASSERT_EQ(skel->bss->test_exist_lookup_mark, 43, "Test existing connection lookup ctmark");
+	ASSERT_EQ(skel->data->test_snat_addr, 0, "Test for source natting");
+	ASSERT_EQ(skel->data->test_dnat_addr, 0, "Test for destination natting");
 end:
 	if (srv_client_fd != -1)
 		close(srv_client_fd);
+4
tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
···
 		goto done;
 	ASSERT_STREQ(dctcp_skel->bss->cc_res, "cubic", "cc_res");
 	ASSERT_EQ(dctcp_skel->bss->tcp_cdg_res, -ENOTSUPP, "tcp_cdg_res");
+	/* All setsockopt(TCP_CONGESTION) in the recurred
+	 * bpf_dctcp->init() should fail with -EBUSY.
+	 */
+	ASSERT_EQ(dctcp_skel->bss->ebusy_cnt, 3, "ebusy_cnt");
 
 	err = getsockopt(srv_fd, SOL_TCP, TCP_CONGESTION, srv_cc, &cc_len);
 	if (!ASSERT_OK(err, "getsockopt(srv_fd, TCP_CONGESTION)"))
+1 -1
tools/testing/selftests/bpf/prog_tests/btf_dump.c
···
 	/* union with nested struct */
 	TEST_BTF_DUMP_DATA(btf, d, "union", str, union bpf_iter_link_info, BTF_F_COMPACT,
-			   "(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},.cgroup = (struct){.order = (enum bpf_cgroup_iter_order)BPF_CGROUP_ITER_SELF_ONLY,.cgroup_fd = (__u32)1,},}",
+			   "(union bpf_iter_link_info){.map = (struct){.map_fd = (__u32)1,},.cgroup = (struct){.order = (enum bpf_cgroup_iter_order)BPF_CGROUP_ITER_SELF_ONLY,.cgroup_fd = (__u32)1,},.task = (struct){.tid = (__u32)1,.pid = (__u32)1,},}",
 			   { .cgroup = { .order = 1, .cgroup_fd = 1, }});
 
 	/* struct skb with nested structs/unions; because type output is so
-20
tools/testing/selftests/bpf/prog_tests/btf_skc_cls_ingress.c
···
 
 #define PROG_PIN_FILE "/sys/fs/bpf/btf_skc_cls_ingress"
 
-static int write_sysctl(const char *sysctl, const char *value)
-{
-	int fd, err, len;
-
-	fd = open(sysctl, O_WRONLY);
-	if (CHECK(fd == -1, "open sysctl", "open(%s): %s (%d)\n",
-		  sysctl, strerror(errno), errno))
-		return -1;
-
-	len = strlen(value);
-	err = write(fd, value, len);
-	close(fd);
-	if (CHECK(err != len, "write sysctl",
-		  "write(%s, %s, %d): err:%d %s (%d)\n",
-		  sysctl, value, len, err, strerror(errno), errno))
-		return -1;
-
-	return 0;
-}
-
 static int prepare_netns(void)
 {
 	if (CHECK(unshare(CLONE_NEWNET), "create netns",
+75 -93
tools/testing/selftests/bpf/prog_tests/cgroup_hierarchical_stats.c
···
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Functions to manage eBPF programs attached to cgroup subsystems
+ * This test makes sure BPF stats collection using rstat works correctly.
+ * The test uses 3 BPF progs:
+ * (a) counter: This BPF prog is invoked every time we attach a process to a
+ *              cgroup and locklessly increments a percpu counter.
+ *              The program then calls cgroup_rstat_updated() to inform rstat
+ *              of an update on the (cpu, cgroup) pair.
+ *
+ * (b) flusher: This BPF prog is invoked when an rstat flush is ongoing, it
+ *              aggregates all percpu counters to a total counter, and also
+ *              propagates the changes to the ancestor cgroups.
+ *
+ * (c) dumper: This BPF prog is a cgroup_iter. It is used to output the total
+ *             counter of a cgroup through reading a file in userspace.
+ *
+ * The test sets up a cgroup hierarchy, and the above programs. It spawns a few
+ * processes in the leaf cgroups and makes sure all the counters are aggregated
+ * correctly.
  *
  * Copyright 2022 Google LLC.
  */
···
 #define PAGE_SIZE 4096
 #define MB(x) (x << 20)
 
+#define PROCESSES_PER_CGROUP 3
+
 #define BPFFS_ROOT "/sys/fs/bpf/"
-#define BPFFS_VMSCAN BPFFS_ROOT"vmscan/"
+#define BPFFS_ATTACH_COUNTERS BPFFS_ROOT "attach_counters/"
 
 #define CG_ROOT_NAME "root"
 #define CG_ROOT_ID 1
···
 		return err;
 
 	/* Create a directory to contain stat files in bpffs */
-	err = mkdir(BPFFS_VMSCAN, 0755);
+	err = mkdir(BPFFS_ATTACH_COUNTERS, 0755);
 	if (!ASSERT_OK(err, "mkdir"))
 		return err;
···
 static void cleanup_bpffs(void)
 {
 	/* Remove created directory in bpffs */
-	ASSERT_OK(rmdir(BPFFS_VMSCAN), "rmdir "BPFFS_VMSCAN);
+	ASSERT_OK(rmdir(BPFFS_ATTACH_COUNTERS), "rmdir "BPFFS_ATTACH_COUNTERS);
 
 	/* Unmount bpffs, if it wasn't already mounted when we started */
 	if (mounted_bpffs)
···
 
 		cgroups[i].fd = fd;
 		cgroups[i].id = get_cgroup_id(cgroups[i].path);
-
-		/*
-		 * Enable memcg controller for the entire hierarchy.
-		 * Note that stats are collected for all cgroups in a hierarchy
-		 * with memcg enabled anyway, but are only exposed for cgroups
-		 * that have memcg enabled.
-		 */
-		if (i < N_NON_LEAF_CGROUPS) {
-			err = enable_controllers(cgroups[i].path, "memory");
-			if (!ASSERT_OK(err, "enable_controllers"))
-				return err;
-		}
 	}
 	return 0;
 }
···
 	cleanup_bpffs();
 }
 
-static int reclaimer(const char *cgroup_path, size_t size)
+static int attach_processes(void)
 {
-	static char size_buf[128];
-	char *buf, *ptr;
-	int err;
+	int i, j, status;
 
-	/* Join cgroup in the parent process workdir */
-	if (join_parent_cgroup(cgroup_path))
-		return EACCES;
-
-	/* Allocate memory */
-	buf = malloc(size);
-	if (!buf)
-		return ENOMEM;
-
-	/* Write to memory to make sure it's actually allocated */
-	for (ptr = buf; ptr < buf + size; ptr += PAGE_SIZE)
-		*ptr = 1;
-
-	/* Try to reclaim memory */
-	snprintf(size_buf, 128, "%lu", size);
-	err = write_cgroup_file_parent(cgroup_path, "memory.reclaim", size_buf);
-
-	free(buf);
-	/* memory.reclaim returns EAGAIN if the amount is not fully reclaimed */
-	if (err && errno != EAGAIN)
-		return errno;
-
-	return 0;
-}
-
-static int induce_vmscan(void)
-{
-	int i, status;
-
-	/*
-	 * In every leaf cgroup, run a child process that allocates some memory
-	 * and attempts to reclaim some of it.
-	 */
+	/* In every leaf cgroup, attach 3 processes */
 	for (i = N_NON_LEAF_CGROUPS; i < N_CGROUPS; i++) {
-		pid_t pid;
+		for (j = 0; j < PROCESSES_PER_CGROUP; j++) {
+			pid_t pid;
 
-		/* Create reclaimer child */
-		pid = fork();
-		if (pid == 0) {
-			status = reclaimer(cgroups[i].path, MB(5));
-			exit(status);
+			/* Create child and attach to cgroup */
+			pid = fork();
+			if (pid == 0) {
+				if (join_parent_cgroup(cgroups[i].path))
+					exit(EACCES);
+				exit(0);
+			}
+
+			/* Cleanup child */
+			waitpid(pid, &status, 0);
+			if (!ASSERT_TRUE(WIFEXITED(status), "child process exited"))
+				return 1;
+			if (!ASSERT_EQ(WEXITSTATUS(status), 0,
+				       "child process exit code"))
+				return 1;
 		}
-
-		/* Cleanup reclaimer child */
-		waitpid(pid, &status, 0);
-		ASSERT_TRUE(WIFEXITED(status), "reclaimer exited");
-		ASSERT_EQ(WEXITSTATUS(status), 0, "reclaim exit code");
 	}
 	return 0;
 }
 
 static unsigned long long
-get_cgroup_vmscan_delay(unsigned long long cgroup_id, const char *file_name)
+get_attach_counter(unsigned long long cgroup_id, const char *file_name)
 {
-	unsigned long long vmscan = 0, id = 0;
+	unsigned long long attach_counter = 0, id = 0;
 	static char buf[128], path[128];
 
 	/* For every cgroup, read the file generated by cgroup_iter */
-	snprintf(path, 128, "%s%s", BPFFS_VMSCAN, file_name);
+	snprintf(path, 128, "%s%s", BPFFS_ATTACH_COUNTERS, file_name);
 	if (!ASSERT_OK(read_from_file(path, buf, 128), "read cgroup_iter"))
 		return 0;
 
 	/* Check the output file formatting */
-	ASSERT_EQ(sscanf(buf, "cg_id: %llu, total_vmscan_delay: %llu\n",
-			 &id, &vmscan), 2, "output format");
+	ASSERT_EQ(sscanf(buf, "cg_id: %llu, attach_counter: %llu\n",
+			 &id, &attach_counter), 2, "output format");
 
 	/* Check that the cgroup_id is displayed correctly */
 	ASSERT_EQ(id, cgroup_id, "cgroup_id");
-	/* Check that the vmscan reading is non-zero */
-	ASSERT_GT(vmscan, 0, "vmscan_reading");
-	return vmscan;
+	/* Check that the counter is non-zero */
+	ASSERT_GT(attach_counter, 0, "attach counter non-zero");
+	return attach_counter;
 }
 
-static void check_vmscan_stats(void)
+static void check_attach_counters(void)
 {
-	unsigned long long vmscan_readings[N_CGROUPS], vmscan_root;
+	unsigned long long attach_counters[N_CGROUPS], root_attach_counter;
 	int i;
 
-	for (i = 0; i < N_CGROUPS; i++) {
-		vmscan_readings[i] = get_cgroup_vmscan_delay(cgroups[i].id,
-							     cgroups[i].name);
-	}
+	for (i = 0; i < N_CGROUPS; i++)
+		attach_counters[i] = get_attach_counter(cgroups[i].id,
+							cgroups[i].name);
 
 	/* Read stats for root too */
-	vmscan_root = get_cgroup_vmscan_delay(CG_ROOT_ID, CG_ROOT_NAME);
+	root_attach_counter = get_attach_counter(CG_ROOT_ID, CG_ROOT_NAME);
+
+	/* Check that all leafs cgroups have an attach counter of 3 */
+	for (i = N_NON_LEAF_CGROUPS; i < N_CGROUPS; i++)
+		ASSERT_EQ(attach_counters[i], PROCESSES_PER_CGROUP,
+			  "leaf cgroup attach counter");
 
 	/* Check that child1 == child1_1 + child1_2 */
-	ASSERT_EQ(vmscan_readings[1], vmscan_readings[3] + vmscan_readings[4],
-		  "child1_vmscan");
+	ASSERT_EQ(attach_counters[1], attach_counters[3] + attach_counters[4],
+		  "child1_counter");
 	/* Check that child2 == child2_1 + child2_2 */
-	ASSERT_EQ(vmscan_readings[2], vmscan_readings[5] + vmscan_readings[6],
-		  "child2_vmscan");
+	ASSERT_EQ(attach_counters[2], attach_counters[5] + attach_counters[6],
+		  "child2_counter");
 	/* Check that test == child1 + child2 */
-	ASSERT_EQ(vmscan_readings[0], vmscan_readings[1] + vmscan_readings[2],
-		  "test_vmscan");
+	ASSERT_EQ(attach_counters[0], attach_counters[1] + attach_counters[2],
+		  "test_counter");
 	/* Check that root >= test */
-	ASSERT_GE(vmscan_root, vmscan_readings[1], "root_vmscan");
+	ASSERT_GE(root_attach_counter, attach_counters[1], "root_counter");
 }
 
 /* Creates iter link and pins in bpffs, returns 0 on success, -errno on failure.
···
 	linfo.cgroup.order = BPF_CGROUP_ITER_SELF_ONLY;
 	opts.link_info = &linfo;
 	opts.link_info_len = sizeof(linfo);
-	link = bpf_program__attach_iter(obj->progs.dump_vmscan, &opts);
+	link = bpf_program__attach_iter(obj->progs.dumper, &opts);
 	if (!ASSERT_OK_PTR(link, "attach_iter"))
 		return -EFAULT;
 
 	/* Pin the link to a bpffs file */
-	snprintf(path, 128, "%s%s", BPFFS_VMSCAN, file_name);
+	snprintf(path, 128, "%s%s", BPFFS_ATTACH_COUNTERS, file_name);
 	err = bpf_link__pin(link, path);
 	ASSERT_OK(err, "pin cgroup_iter");
···
 	if (!ASSERT_OK(err, "setup_cgroup_iter"))
 		return err;
 
-	bpf_program__set_autoattach((*skel)->progs.dump_vmscan, false);
+	bpf_program__set_autoattach((*skel)->progs.dumper, false);
 	err = cgroup_hierarchical_stats__attach(*skel);
 	if (!ASSERT_OK(err, "attach"))
 		return err;
···
 
 	for (i = 0; i < N_CGROUPS; i++) {
 		/* Delete files in bpffs that cgroup_iters are pinned in */
-		snprintf(path, 128, "%s%s", BPFFS_VMSCAN,
+		snprintf(path, 128, "%s%s", BPFFS_ATTACH_COUNTERS,
 			 cgroups[i].name);
 		ASSERT_OK(remove(path), "remove cgroup_iter pin");
 	}
 
 	/* Delete root file in bpffs */
-	snprintf(path, 128, "%s%s", BPFFS_VMSCAN, CG_ROOT_NAME);
+	snprintf(path, 128, "%s%s", BPFFS_ATTACH_COUNTERS, CG_ROOT_NAME);
 	ASSERT_OK(remove(path), "remove cgroup_iter root pin");
 	cgroup_hierarchical_stats__destroy(skel);
 }
···
 		goto hierarchy_cleanup;
 	if (setup_progs(&skel))
 		goto cleanup;
-	if (induce_vmscan())
+	if (attach_processes())
 		goto cleanup;
-	check_vmscan_stats();
+	check_attach_counters();
 cleanup:
 	destroy_progs(skel);
 hierarchy_cleanup:
+178
tools/testing/selftests/bpf/prog_tests/connect_ping.c
···
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright 2022 Google LLC.
+ */
+
+#define _GNU_SOURCE
+#include <sys/mount.h>
+
+#include "test_progs.h"
+#include "cgroup_helpers.h"
+#include "network_helpers.h"
+
+#include "connect_ping.skel.h"
+
+/* 2001:db8::1 */
+#define BINDADDR_V6 { { { 0x20,0x01,0x0d,0xb8,0,0,0,0,0,0,0,0,0,0,0,1 } } }
+static const struct in6_addr bindaddr_v6 = BINDADDR_V6;
+
+static void subtest(int cgroup_fd, struct connect_ping *skel,
+		    int family, int do_bind)
+{
+	struct sockaddr_in sa4 = {
+		.sin_family = AF_INET,
+		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+	};
+	struct sockaddr_in6 sa6 = {
+		.sin6_family = AF_INET6,
+		.sin6_addr = IN6ADDR_LOOPBACK_INIT,
+	};
+	struct sockaddr *sa;
+	socklen_t sa_len;
+	int protocol;
+	int sock_fd;
+
+	switch (family) {
+	case AF_INET:
+		sa = (struct sockaddr *)&sa4;
+		sa_len = sizeof(sa4);
+		protocol = IPPROTO_ICMP;
+		break;
+	case AF_INET6:
+		sa = (struct sockaddr *)&sa6;
+		sa_len = sizeof(sa6);
+		protocol = IPPROTO_ICMPV6;
+		break;
+	}
+
+	memset(skel->bss, 0, sizeof(*skel->bss));
+	skel->bss->do_bind = do_bind;
+
+	sock_fd = socket(family, SOCK_DGRAM, protocol);
+	if (!ASSERT_GE(sock_fd, 0, "sock-create"))
+		return;
+
+	if (!ASSERT_OK(connect(sock_fd, sa, sa_len), "connect"))
+		goto close_sock;
+
+	if (!ASSERT_EQ(skel->bss->invocations_v4, family == AF_INET ? 1 : 0,
+		       "invocations_v4"))
+		goto close_sock;
+	if (!ASSERT_EQ(skel->bss->invocations_v6, family == AF_INET6 ? 1 : 0,
+		       "invocations_v6"))
+		goto close_sock;
+	if (!ASSERT_EQ(skel->bss->has_error, 0, "has_error"))
+		goto close_sock;
+
+	if (!ASSERT_OK(getsockname(sock_fd, sa, &sa_len),
+		       "getsockname"))
+		goto close_sock;
+
+	switch (family) {
+	case AF_INET:
+		if (!ASSERT_EQ(sa4.sin_family, family, "sin_family"))
+			goto close_sock;
+		if (!ASSERT_EQ(sa4.sin_addr.s_addr,
+			       htonl(do_bind ? 0x01010101 : INADDR_LOOPBACK),
+			       "sin_addr"))
+			goto close_sock;
+		break;
+	case AF_INET6:
+		if (!ASSERT_EQ(sa6.sin6_family, AF_INET6, "sin6_family"))
+			goto close_sock;
+		if (!ASSERT_EQ(memcmp(&sa6.sin6_addr,
+				      do_bind ? &bindaddr_v6 : &in6addr_loopback,
+				      sizeof(sa6.sin6_addr)),
+			       0, "sin6_addr"))
+			goto close_sock;
+		break;
+	}
+
+close_sock:
+	close(sock_fd);
+}
+
+void test_connect_ping(void)
+{
+	struct connect_ping *skel;
+	int cgroup_fd;
+
+	if (!ASSERT_OK(unshare(CLONE_NEWNET | CLONE_NEWNS), "unshare"))
+		return;
+
+	/* overmount sysfs, and making original sysfs private so overmount
+	 * does not propagate to other mntns.
+	 */
+	if (!ASSERT_OK(mount("none", "/sys", NULL, MS_PRIVATE, NULL),
+		       "remount-private-sys"))
+		return;
+	if (!ASSERT_OK(mount("sysfs", "/sys", "sysfs", 0, NULL),
+		       "mount-sys"))
+		return;
+	if (!ASSERT_OK(mount("bpffs", "/sys/fs/bpf", "bpf", 0, NULL),
+		       "mount-bpf"))
+		goto clean_mount;
+
+	if (!ASSERT_OK(system("ip link set dev lo up"), "lo-up"))
+		goto clean_mount;
+	if (!ASSERT_OK(system("ip addr add 1.1.1.1 dev lo"), "lo-addr-v4"))
+		goto clean_mount;
+	if (!ASSERT_OK(system("ip -6 addr add 2001:db8::1 dev lo"), "lo-addr-v6"))
+		goto clean_mount;
+	if (write_sysctl("/proc/sys/net/ipv4/ping_group_range", "0 0"))
+		goto clean_mount;
+
+	cgroup_fd = test__join_cgroup("/connect_ping");
+	if (!ASSERT_GE(cgroup_fd, 0, "cg-create"))
+		goto clean_mount;
+
+	skel = connect_ping__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel-load"))
+		goto close_cgroup;
+	skel->links.connect_v4_prog =
+		bpf_program__attach_cgroup(skel->progs.connect_v4_prog, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.connect_v4_prog, "cg-attach-v4"))
+		goto skel_destroy;
+	skel->links.connect_v6_prog =
+		bpf_program__attach_cgroup(skel->progs.connect_v6_prog, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.connect_v6_prog, "cg-attach-v6"))
+		goto skel_destroy;
+
+	/* Connect a v4 ping socket to localhost, assert that only v4 is called,
+	 * and called exactly once, and that the socket's bound address is
+	 * original loopback address.
+	 */
+	if (test__start_subtest("ipv4"))
+		subtest(cgroup_fd, skel, AF_INET, 0);
+
+	/* Connect a v4 ping socket to localhost, assert that only v4 is called,
+	 * and called exactly once, and that the socket's bound address is
+	 * address we explicitly bound.
152 + */ 153 + if (test__start_subtest("ipv4-bind")) 154 + subtest(cgroup_fd, skel, AF_INET, 1); 155 + 156 + /* Connect a v6 ping socket to localhost, assert that only v6 is called, 157 + * and called exactly once, and that the socket's bound address is 158 + * original loopback address. 159 + */ 160 + if (test__start_subtest("ipv6")) 161 + subtest(cgroup_fd, skel, AF_INET6, 0); 162 + 163 + /* Connect a v6 ping socket to localhost, assert that only v6 is called, 164 + * and called exactly once, and that the socket's bound address is 165 + * address we explicitly bound. 166 + */ 167 + if (test__start_subtest("ipv6-bind")) 168 + subtest(cgroup_fd, skel, AF_INET6, 1); 169 + 170 + skel_destroy: 171 + connect_ping__destroy(skel); 172 + 173 + close_cgroup: 174 + close(cgroup_fd); 175 + 176 + clean_mount: 177 + umount2("/sys", MNT_DETACH); 178 + }
+1 -1
tools/testing/selftests/bpf/prog_tests/dynptr.c
··· 30 30 {"invalid_helper2", "Expected an initialized dynptr as arg #3"}, 31 31 {"invalid_write1", "Expected an initialized dynptr as arg #1"}, 32 32 {"invalid_write2", "Expected an initialized dynptr as arg #3"}, 33 - {"invalid_write3", "Expected an initialized ringbuf dynptr as arg #1"}, 33 + {"invalid_write3", "Expected an initialized dynptr as arg #1"}, 34 34 {"invalid_write4", "arg 1 is an unacquired reference"}, 35 35 {"invalid_read1", "invalid read from stack"}, 36 36 {"invalid_read2", "cannot pass in dynptr at an offset"},
+50 -13
tools/testing/selftests/bpf/prog_tests/get_func_ip_test.c
··· 2 2 #include <test_progs.h> 3 3 #include "get_func_ip_test.skel.h" 4 4 5 - void test_get_func_ip_test(void) 5 + static void test_function_entry(void) 6 6 { 7 7 struct get_func_ip_test *skel = NULL; 8 8 int err, prog_fd; ··· 11 11 skel = get_func_ip_test__open(); 12 12 if (!ASSERT_OK_PTR(skel, "get_func_ip_test__open")) 13 13 return; 14 - 15 - /* test6 is x86_64 specifc because of the instruction 16 - * offset, disabling it for all other archs 17 - */ 18 - #ifndef __x86_64__ 19 - bpf_program__set_autoload(skel->progs.test6, false); 20 - bpf_program__set_autoload(skel->progs.test7, false); 21 - #endif 22 14 23 15 err = get_func_ip_test__load(skel); 24 16 if (!ASSERT_OK(err, "get_func_ip_test__load")) ··· 35 43 ASSERT_EQ(skel->bss->test3_result, 1, "test3_result"); 36 44 ASSERT_EQ(skel->bss->test4_result, 1, "test4_result"); 37 45 ASSERT_EQ(skel->bss->test5_result, 1, "test5_result"); 38 - #ifdef __x86_64__ 39 - ASSERT_EQ(skel->bss->test6_result, 1, "test6_result"); 40 - ASSERT_EQ(skel->bss->test7_result, 1, "test7_result"); 41 - #endif 42 46 43 47 cleanup: 44 48 get_func_ip_test__destroy(skel); 49 + } 50 + 51 + /* test6 is x86_64 specific because of the instruction 52 + * offset, disabling it for all other archs 53 + */ 54 + #ifdef __x86_64__ 55 + static void test_function_body(void) 56 + { 57 + struct get_func_ip_test *skel = NULL; 58 + LIBBPF_OPTS(bpf_test_run_opts, topts); 59 + LIBBPF_OPTS(bpf_kprobe_opts, kopts); 60 + struct bpf_link *link6 = NULL; 61 + int err, prog_fd; 62 + 63 + skel = get_func_ip_test__open(); 64 + if (!ASSERT_OK_PTR(skel, "get_func_ip_test__open")) 65 + return; 66 + 67 + bpf_program__set_autoload(skel->progs.test6, true); 68 + 69 + err = get_func_ip_test__load(skel); 70 + if (!ASSERT_OK(err, "get_func_ip_test__load")) 71 + goto cleanup; 72 + 73 + kopts.offset = skel->kconfig->CONFIG_X86_KERNEL_IBT ? 
9 : 5; 74 + 75 + link6 = bpf_program__attach_kprobe_opts(skel->progs.test6, "bpf_fentry_test6", &kopts); 76 + if (!ASSERT_OK_PTR(link6, "link6")) 77 + goto cleanup; 78 + 79 + prog_fd = bpf_program__fd(skel->progs.test1); 80 + err = bpf_prog_test_run_opts(prog_fd, &topts); 81 + ASSERT_OK(err, "test_run"); 82 + ASSERT_EQ(topts.retval, 0, "test_run"); 83 + 84 + ASSERT_EQ(skel->bss->test6_result, 1, "test6_result"); 85 + 86 + cleanup: 87 + bpf_link__destroy(link6); 88 + get_func_ip_test__destroy(skel); 89 + } 90 + #else 91 + #define test_function_body() 92 + #endif 93 + 94 + void test_get_func_ip_test(void) 95 + { 96 + test_function_entry(); 97 + test_function_body(); 45 98 }
+208 -23
tools/testing/selftests/bpf/prog_tests/kfunc_call.c
··· 2 2 /* Copyright (c) 2021 Facebook */ 3 3 #include <test_progs.h> 4 4 #include <network_helpers.h> 5 + #include "kfunc_call_fail.skel.h" 6 + #include "kfunc_call_test.skel.h" 5 7 #include "kfunc_call_test.lskel.h" 6 8 #include "kfunc_call_test_subprog.skel.h" 7 9 #include "kfunc_call_test_subprog.lskel.h" ··· 11 9 12 10 #include "cap_helpers.h" 13 11 14 - static void test_main(void) 15 - { 16 - struct kfunc_call_test_lskel *skel; 17 - int prog_fd, err; 18 - LIBBPF_OPTS(bpf_test_run_opts, topts, 19 - .data_in = &pkt_v4, 20 - .data_size_in = sizeof(pkt_v4), 21 - .repeat = 1, 22 - ); 12 + static size_t log_buf_sz = 1048576; /* 1 MB */ 13 + static char obj_log_buf[1048576]; 23 14 24 - skel = kfunc_call_test_lskel__open_and_load(); 15 + enum kfunc_test_type { 16 + tc_test = 0, 17 + syscall_test, 18 + syscall_null_ctx_test, 19 + }; 20 + 21 + struct kfunc_test_params { 22 + const char *prog_name; 23 + unsigned long lskel_prog_desc_offset; 24 + int retval; 25 + enum kfunc_test_type test_type; 26 + const char *expected_err_msg; 27 + }; 28 + 29 + #define __BPF_TEST_SUCCESS(name, __retval, type) \ 30 + { \ 31 + .prog_name = #name, \ 32 + .lskel_prog_desc_offset = offsetof(struct kfunc_call_test_lskel, progs.name), \ 33 + .retval = __retval, \ 34 + .test_type = type, \ 35 + .expected_err_msg = NULL, \ 36 + } 37 + 38 + #define __BPF_TEST_FAIL(name, __retval, type, error_msg) \ 39 + { \ 40 + .prog_name = #name, \ 41 + .lskel_prog_desc_offset = 0 /* unused when test is failing */, \ 42 + .retval = __retval, \ 43 + .test_type = type, \ 44 + .expected_err_msg = error_msg, \ 45 + } 46 + 47 + #define TC_TEST(name, retval) __BPF_TEST_SUCCESS(name, retval, tc_test) 48 + #define SYSCALL_TEST(name, retval) __BPF_TEST_SUCCESS(name, retval, syscall_test) 49 + #define SYSCALL_NULL_CTX_TEST(name, retval) __BPF_TEST_SUCCESS(name, retval, syscall_null_ctx_test) 50 + 51 + #define TC_FAIL(name, retval, error_msg) __BPF_TEST_FAIL(name, retval, tc_test, error_msg) 52 + #define 
SYSCALL_NULL_CTX_FAIL(name, retval, error_msg) \ 53 + __BPF_TEST_FAIL(name, retval, syscall_null_ctx_test, error_msg) 54 + 55 + static struct kfunc_test_params kfunc_tests[] = { 56 + /* failure cases: 57 + * if retval is 0 -> the program will fail to load and the error message is an error 58 + * if retval is not 0 -> the program can be loaded but running it will give the 59 + * provided return value. The error message is thus the one 60 + * from a successful load 61 + */ 62 + SYSCALL_NULL_CTX_FAIL(kfunc_syscall_test_fail, -EINVAL, "processed 4 insns"), 63 + SYSCALL_NULL_CTX_FAIL(kfunc_syscall_test_null_fail, -EINVAL, "processed 4 insns"), 64 + TC_FAIL(kfunc_call_test_get_mem_fail_rdonly, 0, "R0 cannot write into rdonly_mem"), 65 + TC_FAIL(kfunc_call_test_get_mem_fail_use_after_free, 0, "invalid mem access 'scalar'"), 66 + TC_FAIL(kfunc_call_test_get_mem_fail_oob, 0, "min value is outside of the allowed memory range"), 67 + TC_FAIL(kfunc_call_test_get_mem_fail_not_const, 0, "is not a const"), 68 + TC_FAIL(kfunc_call_test_mem_acquire_fail, 0, "acquire kernel function does not return PTR_TO_BTF_ID"), 69 + 70 + /* success cases */ 71 + TC_TEST(kfunc_call_test1, 12), 72 + TC_TEST(kfunc_call_test2, 3), 73 + TC_TEST(kfunc_call_test_ref_btf_id, 0), 74 + TC_TEST(kfunc_call_test_get_mem, 42), 75 + SYSCALL_TEST(kfunc_syscall_test, 0), 76 + SYSCALL_NULL_CTX_TEST(kfunc_syscall_test_null, 0), 77 + }; 78 + 79 + struct syscall_test_args { 80 + __u8 data[16]; 81 + size_t size; 82 + }; 83 + 84 + static void verify_success(struct kfunc_test_params *param) 85 + { 86 + struct kfunc_call_test_lskel *lskel = NULL; 87 + LIBBPF_OPTS(bpf_test_run_opts, topts); 88 + struct bpf_prog_desc *lskel_prog; 89 + struct kfunc_call_test *skel; 90 + struct bpf_program *prog; 91 + int prog_fd, err; 92 + struct syscall_test_args args = { 93 + .size = 10, 94 + }; 95 + 96 + switch (param->test_type) { 97 + case syscall_test: 98 + topts.ctx_in = &args; 99 + topts.ctx_size_in = sizeof(args); 100 + /*
fallthrough */ 101 + case syscall_null_ctx_test: 102 + break; 103 + case tc_test: 104 + topts.data_in = &pkt_v4; 105 + topts.data_size_in = sizeof(pkt_v4); 106 + topts.repeat = 1; 107 + break; 108 + } 109 + 110 + /* first test with normal libbpf */ 111 + skel = kfunc_call_test__open_and_load(); 25 112 if (!ASSERT_OK_PTR(skel, "skel")) 26 113 return; 27 114 28 - prog_fd = skel->progs.kfunc_call_test1.prog_fd; 29 - err = bpf_prog_test_run_opts(prog_fd, &topts); 30 - ASSERT_OK(err, "bpf_prog_test_run(test1)"); 31 - ASSERT_EQ(topts.retval, 12, "test1-retval"); 115 + prog = bpf_object__find_program_by_name(skel->obj, param->prog_name); 116 + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 117 + goto cleanup; 32 118 33 - prog_fd = skel->progs.kfunc_call_test2.prog_fd; 119 + prog_fd = bpf_program__fd(prog); 34 120 err = bpf_prog_test_run_opts(prog_fd, &topts); 35 - ASSERT_OK(err, "bpf_prog_test_run(test2)"); 36 - ASSERT_EQ(topts.retval, 3, "test2-retval"); 121 + if (!ASSERT_OK(err, param->prog_name)) 122 + goto cleanup; 37 123 38 - prog_fd = skel->progs.kfunc_call_test_ref_btf_id.prog_fd; 124 + if (!ASSERT_EQ(topts.retval, param->retval, "retval")) 125 + goto cleanup; 126 + 127 + /* second test with light skeletons */ 128 + lskel = kfunc_call_test_lskel__open_and_load(); 129 + if (!ASSERT_OK_PTR(lskel, "lskel")) 130 + goto cleanup; 131 + 132 + lskel_prog = (struct bpf_prog_desc *)((char *)lskel + param->lskel_prog_desc_offset); 133 + 134 + prog_fd = lskel_prog->prog_fd; 39 135 err = bpf_prog_test_run_opts(prog_fd, &topts); 40 - ASSERT_OK(err, "bpf_prog_test_run(test_ref_btf_id)"); 41 - ASSERT_EQ(topts.retval, 0, "test_ref_btf_id-retval"); 136 + if (!ASSERT_OK(err, param->prog_name)) 137 + goto cleanup; 42 138 43 - kfunc_call_test_lskel__destroy(skel); 139 + ASSERT_EQ(topts.retval, param->retval, "retval"); 140 + 141 + cleanup: 142 + kfunc_call_test__destroy(skel); 143 + if (lskel) 144 + kfunc_call_test_lskel__destroy(lskel); 145 + } 146 + 147 + static void 
verify_fail(struct kfunc_test_params *param) 148 + { 149 + LIBBPF_OPTS(bpf_object_open_opts, opts); 150 + LIBBPF_OPTS(bpf_test_run_opts, topts); 151 + struct bpf_program *prog; 152 + struct kfunc_call_fail *skel; 153 + int prog_fd, err; 154 + struct syscall_test_args args = { 155 + .size = 10, 156 + }; 157 + 158 + opts.kernel_log_buf = obj_log_buf; 159 + opts.kernel_log_size = log_buf_sz; 160 + opts.kernel_log_level = 1; 161 + 162 + switch (param->test_type) { 163 + case syscall_test: 164 + topts.ctx_in = &args; 165 + topts.ctx_size_in = sizeof(args); 166 + /* fallthrough */ 167 + case syscall_null_ctx_test: 168 + break; 169 + case tc_test: 170 + topts.data_in = &pkt_v4; 171 + topts.data_size_in = sizeof(pkt_v4); 172 + topts.repeat = 1; 173 + break; 174 + } 175 + 176 + skel = kfunc_call_fail__open_opts(&opts); 177 + if (!ASSERT_OK_PTR(skel, "kfunc_call_fail__open_opts")) 178 + goto cleanup; 179 + 180 + prog = bpf_object__find_program_by_name(skel->obj, param->prog_name); 181 + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 182 + goto cleanup; 183 + 184 + bpf_program__set_autoload(prog, true); 185 + 186 + err = kfunc_call_fail__load(skel); 187 + if (!param->retval) { 188 + /* the verifier is supposed to complain and refuse to load */ 189 + if (!ASSERT_ERR(err, "unexpected load success")) 190 + goto out_err; 191 + 192 + } else { 193 + /* the program is loaded but must dynamically fail */ 194 + if (!ASSERT_OK(err, "unexpected load error")) 195 + goto out_err; 196 + 197 + prog_fd = bpf_program__fd(prog); 198 + err = bpf_prog_test_run_opts(prog_fd, &topts); 199 + if (!ASSERT_EQ(err, param->retval, param->prog_name)) 200 + goto out_err; 201 + } 202 + 203 + out_err: 204 + if (!ASSERT_OK_PTR(strstr(obj_log_buf, param->expected_err_msg), "expected_err_msg")) { 205 + fprintf(stderr, "Expected err_msg: %s\n", param->expected_err_msg); 206 + fprintf(stderr, "Verifier output: %s\n", obj_log_buf); 207 + } 208 + 209 + cleanup: 210 + kfunc_call_fail__destroy(skel);
211 + } 212 + 213 + static void test_main(void) 214 + { 215 + int i; 216 + 217 + for (i = 0; i < ARRAY_SIZE(kfunc_tests); i++) { 218 + if (!test__start_subtest(kfunc_tests[i].prog_name)) 219 + continue; 220 + 221 + if (!kfunc_tests[i].expected_err_msg) 222 + verify_success(&kfunc_tests[i]); 223 + else 224 + verify_fail(&kfunc_tests[i]); 225 + } 44 226 } 45 227 46 228 static void test_subprog(void) ··· 307 121 308 122 void test_kfunc_call(void) 309 123 { 310 - if (test__start_subtest("main")) 311 - test_main(); 124 + test_main(); 312 125 313 126 if (test__start_subtest("subprog")) 314 127 test_subprog();
+164
tools/testing/selftests/bpf/prog_tests/kfunc_dynptr_param.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + /* 4 + * Copyright (c) 2022 Facebook 5 + * Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH 6 + * 7 + * Author: Roberto Sassu <roberto.sassu@huawei.com> 8 + */ 9 + 10 + #include <test_progs.h> 11 + #include "test_kfunc_dynptr_param.skel.h" 12 + 13 + static size_t log_buf_sz = 1048576; /* 1 MB */ 14 + static char obj_log_buf[1048576]; 15 + 16 + static struct { 17 + const char *prog_name; 18 + const char *expected_verifier_err_msg; 19 + int expected_runtime_err; 20 + } kfunc_dynptr_tests[] = { 21 + {"dynptr_type_not_supp", 22 + "arg#0 pointer type STRUCT bpf_dynptr_kern points to unsupported dynamic pointer type", 0}, 23 + {"not_valid_dynptr", 24 + "arg#0 pointer type STRUCT bpf_dynptr_kern must be valid and initialized", 0}, 25 + {"not_ptr_to_stack", "arg#0 pointer type STRUCT bpf_dynptr_kern not to stack", 0}, 26 + {"dynptr_data_null", NULL, -EBADMSG}, 27 + }; 28 + 29 + static bool kfunc_not_supported; 30 + 31 + static int libbpf_print_cb(enum libbpf_print_level level, const char *fmt, 32 + va_list args) 33 + { 34 + if (strcmp(fmt, "libbpf: extern (func ksym) '%s': not found in kernel or module BTFs\n")) 35 + return 0; 36 + 37 + if (strcmp(va_arg(args, char *), "bpf_verify_pkcs7_signature")) 38 + return 0; 39 + 40 + kfunc_not_supported = true; 41 + return 0; 42 + } 43 + 44 + static void verify_fail(const char *prog_name, const char *expected_err_msg) 45 + { 46 + struct test_kfunc_dynptr_param *skel; 47 + LIBBPF_OPTS(bpf_object_open_opts, opts); 48 + libbpf_print_fn_t old_print_cb; 49 + struct bpf_program *prog; 50 + int err; 51 + 52 + opts.kernel_log_buf = obj_log_buf; 53 + opts.kernel_log_size = log_buf_sz; 54 + opts.kernel_log_level = 1; 55 + 56 + skel = test_kfunc_dynptr_param__open_opts(&opts); 57 + if (!ASSERT_OK_PTR(skel, "test_kfunc_dynptr_param__open_opts")) 58 + goto cleanup; 59 + 60 + prog = bpf_object__find_program_by_name(skel->obj, prog_name); 61 + if (!ASSERT_OK_PTR(prog, 
"bpf_object__find_program_by_name")) 62 + goto cleanup; 63 + 64 + bpf_program__set_autoload(prog, true); 65 + 66 + bpf_map__set_max_entries(skel->maps.ringbuf, getpagesize()); 67 + 68 + kfunc_not_supported = false; 69 + 70 + old_print_cb = libbpf_set_print(libbpf_print_cb); 71 + err = test_kfunc_dynptr_param__load(skel); 72 + libbpf_set_print(old_print_cb); 73 + 74 + if (err < 0 && kfunc_not_supported) { 75 + fprintf(stderr, 76 + "%s:SKIP:bpf_verify_pkcs7_signature() kfunc not supported\n", 77 + __func__); 78 + test__skip(); 79 + goto cleanup; 80 + } 81 + 82 + if (!ASSERT_ERR(err, "unexpected load success")) 83 + goto cleanup; 84 + 85 + if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) { 86 + fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg); 87 + fprintf(stderr, "Verifier output: %s\n", obj_log_buf); 88 + } 89 + 90 + cleanup: 91 + test_kfunc_dynptr_param__destroy(skel); 92 + } 93 + 94 + static void verify_success(const char *prog_name, int expected_runtime_err) 95 + { 96 + struct test_kfunc_dynptr_param *skel; 97 + libbpf_print_fn_t old_print_cb; 98 + struct bpf_program *prog; 99 + struct bpf_link *link; 100 + __u32 next_id; 101 + int err; 102 + 103 + skel = test_kfunc_dynptr_param__open(); 104 + if (!ASSERT_OK_PTR(skel, "test_kfunc_dynptr_param__open")) 105 + return; 106 + 107 + skel->bss->pid = getpid(); 108 + 109 + bpf_map__set_max_entries(skel->maps.ringbuf, getpagesize()); 110 + 111 + kfunc_not_supported = false; 112 + 113 + old_print_cb = libbpf_set_print(libbpf_print_cb); 114 + err = test_kfunc_dynptr_param__load(skel); 115 + libbpf_set_print(old_print_cb); 116 + 117 + if (err < 0 && kfunc_not_supported) { 118 + fprintf(stderr, 119 + "%s:SKIP:bpf_verify_pkcs7_signature() kfunc not supported\n", 120 + __func__); 121 + test__skip(); 122 + goto cleanup; 123 + } 124 + 125 + if (!ASSERT_OK(err, "test_kfunc_dynptr_param__load")) 126 + goto cleanup; 127 + 128 + prog = bpf_object__find_program_by_name(skel->obj, prog_name); 
129 + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 130 + goto cleanup; 131 + 132 + link = bpf_program__attach(prog); 133 + if (!ASSERT_OK_PTR(link, "bpf_program__attach")) 134 + goto cleanup; 135 + 136 + err = bpf_prog_get_next_id(0, &next_id); 137 + 138 + bpf_link__destroy(link); 139 + 140 + if (!ASSERT_OK(err, "bpf_prog_get_next_id")) 141 + goto cleanup; 142 + 143 + ASSERT_EQ(skel->bss->err, expected_runtime_err, "err"); 144 + 145 + cleanup: 146 + test_kfunc_dynptr_param__destroy(skel); 147 + } 148 + 149 + void test_kfunc_dynptr_param(void) 150 + { 151 + int i; 152 + 153 + for (i = 0; i < ARRAY_SIZE(kfunc_dynptr_tests); i++) { 154 + if (!test__start_subtest(kfunc_dynptr_tests[i].prog_name)) 155 + continue; 156 + 157 + if (kfunc_dynptr_tests[i].expected_verifier_err_msg) 158 + verify_fail(kfunc_dynptr_tests[i].prog_name, 159 + kfunc_dynptr_tests[i].expected_verifier_err_msg); 160 + else 161 + verify_success(kfunc_dynptr_tests[i].prog_name, 162 + kfunc_dynptr_tests[i].expected_runtime_err); 163 + } 164 + }
+112
tools/testing/selftests/bpf/prog_tests/lookup_key.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + /* 4 + * Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH 5 + * 6 + * Author: Roberto Sassu <roberto.sassu@huawei.com> 7 + */ 8 + 9 + #include <linux/keyctl.h> 10 + #include <test_progs.h> 11 + 12 + #include "test_lookup_key.skel.h" 13 + 14 + #define KEY_LOOKUP_CREATE 0x01 15 + #define KEY_LOOKUP_PARTIAL 0x02 16 + 17 + static bool kfunc_not_supported; 18 + 19 + static int libbpf_print_cb(enum libbpf_print_level level, const char *fmt, 20 + va_list args) 21 + { 22 + char *func; 23 + 24 + if (strcmp(fmt, "libbpf: extern (func ksym) '%s': not found in kernel or module BTFs\n")) 25 + return 0; 26 + 27 + func = va_arg(args, char *); 28 + 29 + if (strcmp(func, "bpf_lookup_user_key") && strcmp(func, "bpf_key_put") && 30 + strcmp(func, "bpf_lookup_system_key")) 31 + return 0; 32 + 33 + kfunc_not_supported = true; 34 + return 0; 35 + } 36 + 37 + void test_lookup_key(void) 38 + { 39 + libbpf_print_fn_t old_print_cb; 40 + struct test_lookup_key *skel; 41 + __u32 next_id; 42 + int ret; 43 + 44 + skel = test_lookup_key__open(); 45 + if (!ASSERT_OK_PTR(skel, "test_lookup_key__open")) 46 + return; 47 + 48 + old_print_cb = libbpf_set_print(libbpf_print_cb); 49 + ret = test_lookup_key__load(skel); 50 + libbpf_set_print(old_print_cb); 51 + 52 + if (ret < 0 && kfunc_not_supported) { 53 + printf("%s:SKIP:bpf_lookup_*_key(), bpf_key_put() kfuncs not supported\n", 54 + __func__); 55 + test__skip(); 56 + goto close_prog; 57 + } 58 + 59 + if (!ASSERT_OK(ret, "test_lookup_key__load")) 60 + goto close_prog; 61 + 62 + ret = test_lookup_key__attach(skel); 63 + if (!ASSERT_OK(ret, "test_lookup_key__attach")) 64 + goto close_prog; 65 + 66 + skel->bss->monitored_pid = getpid(); 67 + skel->bss->key_serial = KEY_SPEC_THREAD_KEYRING; 68 + 69 + /* The thread-specific keyring does not exist, this test fails. 
*/ 70 + skel->bss->flags = 0; 71 + 72 + ret = bpf_prog_get_next_id(0, &next_id); 73 + if (!ASSERT_LT(ret, 0, "bpf_prog_get_next_id")) 74 + goto close_prog; 75 + 76 + /* Force creation of the thread-specific keyring, this test succeeds. */ 77 + skel->bss->flags = KEY_LOOKUP_CREATE; 78 + 79 + ret = bpf_prog_get_next_id(0, &next_id); 80 + if (!ASSERT_OK(ret, "bpf_prog_get_next_id")) 81 + goto close_prog; 82 + 83 + /* Pass both lookup flags for parameter validation. */ 84 + skel->bss->flags = KEY_LOOKUP_CREATE | KEY_LOOKUP_PARTIAL; 85 + 86 + ret = bpf_prog_get_next_id(0, &next_id); 87 + if (!ASSERT_OK(ret, "bpf_prog_get_next_id")) 88 + goto close_prog; 89 + 90 + /* Pass invalid flags. */ 91 + skel->bss->flags = UINT64_MAX; 92 + 93 + ret = bpf_prog_get_next_id(0, &next_id); 94 + if (!ASSERT_LT(ret, 0, "bpf_prog_get_next_id")) 95 + goto close_prog; 96 + 97 + skel->bss->key_serial = 0; 98 + skel->bss->key_id = 1; 99 + 100 + ret = bpf_prog_get_next_id(0, &next_id); 101 + if (!ASSERT_OK(ret, "bpf_prog_get_next_id")) 102 + goto close_prog; 103 + 104 + skel->bss->key_id = UINT32_MAX; 105 + 106 + ret = bpf_prog_get_next_id(0, &next_id); 107 + ASSERT_LT(ret, 0, "bpf_prog_get_next_id"); 108 + 109 + close_prog: 110 + skel->bss->monitored_pid = 0; 111 + test_lookup_key__destroy(skel); 112 + }
+33 -54
tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
··· 27 27 int s, repair, err; 28 28 29 29 s = socket(AF_INET, SOCK_STREAM, 0); 30 - if (CHECK_FAIL(s == -1)) 30 + if (!ASSERT_GE(s, 0, "socket")) 31 31 goto error; 32 32 33 33 repair = TCP_REPAIR_ON; 34 34 err = setsockopt(s, SOL_TCP, TCP_REPAIR, &repair, sizeof(repair)); 35 - if (CHECK_FAIL(err)) 35 + if (!ASSERT_OK(err, "setsockopt(TCP_REPAIR)")) 36 36 goto error; 37 37 38 38 err = connect(s, (struct sockaddr *)&addr, len); 39 - if (CHECK_FAIL(err)) 39 + if (!ASSERT_OK(err, "connect")) 40 40 goto error; 41 41 42 42 repair = TCP_REPAIR_OFF_NO_WP; 43 43 err = setsockopt(s, SOL_TCP, TCP_REPAIR, &repair, sizeof(repair)); 44 - if (CHECK_FAIL(err)) 44 + if (!ASSERT_OK(err, "setsockopt(TCP_REPAIR)")) 45 45 goto error; 46 46 47 47 return s; ··· 54 54 static void compare_cookies(struct bpf_map *src, struct bpf_map *dst) 55 55 { 56 56 __u32 i, max_entries = bpf_map__max_entries(src); 57 - int err, duration = 0, src_fd, dst_fd; 57 + int err, src_fd, dst_fd; 58 58 59 59 src_fd = bpf_map__fd(src); 60 60 dst_fd = bpf_map__fd(dst); ··· 65 65 err = bpf_map_lookup_elem(src_fd, &i, &src_cookie); 66 66 if (err && errno == ENOENT) { 67 67 err = bpf_map_lookup_elem(dst_fd, &i, &dst_cookie); 68 - CHECK(!err, "map_lookup_elem(dst)", "element %u not deleted\n", i); 69 - CHECK(err && errno != ENOENT, "map_lookup_elem(dst)", "%s\n", 70 - strerror(errno)); 68 + ASSERT_ERR(err, "map_lookup_elem(dst)"); 69 + ASSERT_EQ(errno, ENOENT, "map_lookup_elem(dst)"); 71 70 continue; 72 71 } 73 - if (CHECK(err, "lookup_elem(src)", "%s\n", strerror(errno))) 72 + if (!ASSERT_OK(err, "lookup_elem(src)")) 74 73 continue; 75 74 76 75 err = bpf_map_lookup_elem(dst_fd, &i, &dst_cookie); 77 - if (CHECK(err, "lookup_elem(dst)", "%s\n", strerror(errno))) 76 + if (!ASSERT_OK(err, "lookup_elem(dst)")) 78 77 continue; 79 78 80 - CHECK(dst_cookie != src_cookie, "cookie mismatch", 81 - "%llu != %llu (pos %u)\n", dst_cookie, src_cookie, i); 79 + ASSERT_EQ(dst_cookie, src_cookie, "cookie mismatch"); 82 80 } 83 81 } 84 
82 ··· 87 89 int s, map, err; 88 90 89 91 s = connected_socket_v4(); 90 - if (CHECK_FAIL(s < 0)) 92 + if (!ASSERT_GE(s, 0, "connected_socket_v4")) 91 93 return; 92 94 93 95 map = bpf_map_create(map_type, NULL, sizeof(int), sizeof(int), 1, NULL); 94 - if (CHECK_FAIL(map < 0)) { 95 - perror("bpf_cmap_create"); 96 + if (!ASSERT_GE(map, 0, "bpf_map_create")) 96 97 goto out; 97 - } 98 98 99 99 err = bpf_map_update_elem(map, &zero, &s, BPF_NOEXIST); 100 - if (CHECK_FAIL(err)) { 101 - perror("bpf_map_update"); 100 + if (!ASSERT_OK(err, "bpf_map_update")) 102 101 goto out; 103 - } 104 102 105 103 out: 106 104 close(map); ··· 109 115 int err, map, verdict; 110 116 111 117 skel = test_skmsg_load_helpers__open_and_load(); 112 - if (CHECK_FAIL(!skel)) { 113 - perror("test_skmsg_load_helpers__open_and_load"); 118 + if (!ASSERT_OK_PTR(skel, "test_skmsg_load_helpers__open_and_load")) 114 119 return; 115 - } 116 120 117 121 verdict = bpf_program__fd(skel->progs.prog_msg_verdict); 118 122 map = bpf_map__fd(skel->maps.sock_map); 119 123 120 124 err = bpf_prog_attach(verdict, map, BPF_SK_MSG_VERDICT, 0); 121 - if (CHECK_FAIL(err)) { 122 - perror("bpf_prog_attach"); 125 + if (!ASSERT_OK(err, "bpf_prog_attach")) 123 126 goto out; 124 - } 125 127 126 128 err = bpf_prog_detach2(verdict, map, BPF_SK_MSG_VERDICT); 127 - if (CHECK_FAIL(err)) { 128 - perror("bpf_prog_detach2"); 129 + if (!ASSERT_OK(err, "bpf_prog_detach2")) 129 130 goto out; 130 - } 131 131 out: 132 132 test_skmsg_load_helpers__destroy(skel); 133 133 } 134 134 135 135 static void test_sockmap_update(enum bpf_map_type map_type) 136 136 { 137 - int err, prog, src, duration = 0; 137 + int err, prog, src; 138 138 struct test_sockmap_update *skel; 139 139 struct bpf_map *dst_map; 140 140 const __u32 zero = 0; ··· 141 153 __s64 sk; 142 154 143 155 sk = connected_socket_v4(); 144 - if (CHECK(sk == -1, "connected_socket_v4", "cannot connect\n")) 156 + if (!ASSERT_NEQ(sk, -1, "connected_socket_v4")) 145 157 return; 146 158 147 159 
skel = test_sockmap_update__open_and_load(); 148 - if (CHECK(!skel, "open_and_load", "cannot load skeleton\n")) 160 + if (!ASSERT_OK_PTR(skel, "open_and_load")) 149 161 goto close_sk; 150 162 151 163 prog = bpf_program__fd(skel->progs.copy_sock_map); ··· 156 168 dst_map = skel->maps.dst_sock_hash; 157 169 158 170 err = bpf_map_update_elem(src, &zero, &sk, BPF_NOEXIST); 159 - if (CHECK(err, "update_elem(src)", "errno=%u\n", errno)) 171 + if (!ASSERT_OK(err, "update_elem(src)")) 160 172 goto out; 161 173 162 174 err = bpf_prog_test_run_opts(prog, &topts); ··· 176 188 static void test_sockmap_invalid_update(void) 177 189 { 178 190 struct test_sockmap_invalid_update *skel; 179 - int duration = 0; 180 191 181 192 skel = test_sockmap_invalid_update__open_and_load(); 182 - if (CHECK(skel, "open_and_load", "verifier accepted map_update\n")) 193 + if (!ASSERT_NULL(skel, "open_and_load")) 183 194 test_sockmap_invalid_update__destroy(skel); 184 195 } 185 196 186 197 static void test_sockmap_copy(enum bpf_map_type map_type) 187 198 { 188 199 DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts); 189 - int err, len, src_fd, iter_fd, duration = 0; 200 + int err, len, src_fd, iter_fd; 190 201 union bpf_iter_link_info linfo = {}; 191 202 __u32 i, num_sockets, num_elems; 192 203 struct bpf_iter_sockmap *skel; ··· 195 208 char buf[64]; 196 209 197 210 skel = bpf_iter_sockmap__open_and_load(); 198 - if (CHECK(!skel, "bpf_iter_sockmap__open_and_load", "skeleton open_and_load failed\n")) 211 + if (!ASSERT_OK_PTR(skel, "bpf_iter_sockmap__open_and_load")) 199 212 return; 200 213 201 214 if (map_type == BPF_MAP_TYPE_SOCKMAP) { ··· 209 222 } 210 223 211 224 sock_fd = calloc(num_sockets, sizeof(*sock_fd)); 212 - if (CHECK(!sock_fd, "calloc(sock_fd)", "failed to allocate\n")) 225 + if (!ASSERT_OK_PTR(sock_fd, "calloc(sock_fd)")) 213 226 goto out; 214 227 215 228 for (i = 0; i < num_sockets; i++) ··· 219 232 220 233 for (i = 0; i < num_sockets; i++) { 221 234 sock_fd[i] = connected_socket_v4(); 222 
- if (CHECK(sock_fd[i] == -1, "connected_socket_v4", "cannot connect\n")) 235 + if (!ASSERT_NEQ(sock_fd[i], -1, "connected_socket_v4")) 223 236 goto out; 224 237 225 238 err = bpf_map_update_elem(src_fd, &i, &sock_fd[i], BPF_NOEXIST); 226 - if (CHECK(err, "map_update", "failed: %s\n", strerror(errno))) 239 + if (!ASSERT_OK(err, "map_update")) 227 240 goto out; 228 241 } 229 242 ··· 235 248 goto out; 236 249 237 250 iter_fd = bpf_iter_create(bpf_link__fd(link)); 238 - if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n")) 251 + if (!ASSERT_GE(iter_fd, 0, "create_iter")) 239 252 goto free_link; 240 253 241 254 /* do some tests */ 242 255 while ((len = read(iter_fd, buf, sizeof(buf))) > 0) 243 256 ; 244 - if (CHECK(len < 0, "read", "failed: %s\n", strerror(errno))) 257 + if (!ASSERT_GE(len, 0, "read")) 245 258 goto close_iter; 246 259 247 260 /* test results */ 248 - if (CHECK(skel->bss->elems != num_elems, "elems", "got %u expected %u\n", 249 - skel->bss->elems, num_elems)) 261 + if (!ASSERT_EQ(skel->bss->elems, num_elems, "elems")) 250 262 goto close_iter; 251 263 252 - if (CHECK(skel->bss->socks != num_sockets, "socks", "got %u expected %u\n", 253 - skel->bss->socks, num_sockets)) 264 + if (!ASSERT_EQ(skel->bss->socks, num_sockets, "socks")) 254 265 goto close_iter; 255 266 256 267 compare_cookies(src, skel->maps.dst); ··· 273 288 int err, map, verdict; 274 289 275 290 skel = test_sockmap_skb_verdict_attach__open_and_load(); 276 - if (CHECK_FAIL(!skel)) { 277 - perror("test_sockmap_skb_verdict_attach__open_and_load"); 291 + if (!ASSERT_OK_PTR(skel, "open_and_load")) 278 292 return; 279 - } 280 293 281 294 verdict = bpf_program__fd(skel->progs.prog_skb_verdict); 282 295 map = bpf_map__fd(skel->maps.sock_map); 283 296 284 297 err = bpf_prog_attach(verdict, map, first, 0); 285 - if (CHECK_FAIL(err)) { 286 - perror("bpf_prog_attach"); 298 + if (!ASSERT_OK(err, "bpf_prog_attach")) 287 299 goto out; 288 - } 289 300 290 301 err = bpf_prog_attach(verdict, map, 
second, 0); 291 302 ASSERT_EQ(err, -EBUSY, "prog_attach_fail"); 292 303 293 304 err = bpf_prog_detach2(verdict, map, first); 294 - if (CHECK_FAIL(err)) { 295 - perror("bpf_prog_detach2"); 305 + if (!ASSERT_OK(err, "bpf_prog_detach2")) 296 306 goto out; 297 - } 298 307 out: 299 308 test_sockmap_skb_verdict_attach__destroy(skel); 300 309 }
+10 -29
tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
··· 15 15 int err, s; 16 16 17 17 s = socket(family, SOCK_STREAM, 0); 18 - if (CHECK_FAIL(s == -1)) { 19 - perror("socket"); 18 + if (!ASSERT_GE(s, 0, "socket")) 20 19 return -1; 21 - } 22 20 23 21 err = listen(s, SOMAXCONN); 24 - if (CHECK_FAIL(err)) { 25 - perror("listen"); 22 + if (!ASSERT_OK(err, "listen")) 26 23 return -1; 27 - } 28 24 29 25 return s; 30 26 } ··· 44 48 return; 45 49 46 50 err = getsockname(srv, (struct sockaddr *)&addr, &len); 47 - if (CHECK_FAIL(err)) { 48 - perror("getsockopt"); 51 + if (!ASSERT_OK(err, "getsockopt")) 49 52 goto close_srv; 50 - } 51 53 52 54 cli = socket(family, SOCK_STREAM, 0); 53 - if (CHECK_FAIL(cli == -1)) { 54 - perror("socket"); 55 + if (!ASSERT_GE(cli, 0, "socket")) 55 56 goto close_srv; 56 - } 57 57 58 58 err = connect(cli, (struct sockaddr *)&addr, len); 59 - if (CHECK_FAIL(err)) { 60 - perror("connect"); 59 + if (!ASSERT_OK(err, "connect")) 61 60 goto close_cli; 62 - } 63 61 64 62 err = bpf_map_update_elem(map, &zero, &cli, 0); 65 - if (CHECK_FAIL(err)) { 66 - perror("bpf_map_update_elem"); 63 + if (!ASSERT_OK(err, "bpf_map_update_elem")) 67 64 goto close_cli; 68 - } 69 65 70 66 err = setsockopt(cli, IPPROTO_TCP, TCP_ULP, "tls", strlen("tls")); 71 - if (CHECK_FAIL(err)) { 72 - perror("setsockopt(TCP_ULP)"); 67 + if (!ASSERT_OK(err, "setsockopt(TCP_ULP)")) 73 68 goto close_cli; 74 - } 75 69 76 70 err = bpf_map_delete_elem(map, &zero); 77 - if (CHECK_FAIL(err)) { 78 - perror("bpf_map_delete_elem"); 71 + if (!ASSERT_OK(err, "bpf_map_delete_elem")) 79 72 goto close_cli; 80 - } 81 73 82 74 err = disconnect(cli); 83 - if (CHECK_FAIL(err)) 84 - perror("disconnect"); 75 + ASSERT_OK(err, "disconnect"); 85 76 86 77 close_cli: 87 78 close(cli); ··· 151 168 int map; 152 169 153 170 map = bpf_map_create(map_type, NULL, sizeof(int), sizeof(int), 1, NULL); 154 - if (CHECK_FAIL(map < 0)) { 155 - perror("bpf_map_create"); 171 + if (!ASSERT_GE(map, 0, "bpf_map_create")) 156 172 return; 157 - } 158 173 159 174 if 
(test__start_subtest(fmt_test_name("disconnect_after_delete", family, map_type))) 160 175 test_sockmap_ktls_disconnect_after_delete(family, map);
+2 -2
tools/testing/selftests/bpf/prog_tests/sockopt.c
··· 972 972 int cgroup_fd, i; 973 973 974 974 cgroup_fd = test__join_cgroup("/sockopt"); 975 - if (CHECK_FAIL(cgroup_fd < 0)) 975 + if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup")) 976 976 return; 977 977 978 978 for (i = 0; i < ARRAY_SIZE(tests); i++) { 979 979 test__start_subtest(tests[i].descr); 980 - CHECK_FAIL(run_test(cgroup_fd, &tests[i])); 980 + ASSERT_OK(run_test(cgroup_fd, &tests[i]), tests[i].descr); 981 981 } 982 982 983 983 close(cgroup_fd);
+13 -17
tools/testing/selftests/bpf/prog_tests/sockopt_inherit.c
··· 76 76 pthread_cond_signal(&server_started); 77 77 pthread_mutex_unlock(&server_started_mtx); 78 78 79 - if (CHECK_FAIL(err < 0)) { 80 - perror("Failed to listed on socket"); 79 + if (!ASSERT_GE(err, 0, "listed on socket")) 81 80 return NULL; 82 - } 83 81 84 82 err += verify_sockopt(fd, CUSTOM_INHERIT1, "listen", 1); 85 83 err += verify_sockopt(fd, CUSTOM_INHERIT2, "listen", 1); 86 84 err += verify_sockopt(fd, CUSTOM_LISTENER, "listen", 1); 87 85 88 86 client_fd = accept(fd, (struct sockaddr *)&addr, &len); 89 - if (CHECK_FAIL(client_fd < 0)) { 90 - perror("Failed to accept client"); 87 + if (!ASSERT_GE(client_fd, 0, "accept client")) 91 88 return NULL; 92 - } 93 89 94 90 err += verify_sockopt(client_fd, CUSTOM_INHERIT1, "accept", 1); 95 91 err += verify_sockopt(client_fd, CUSTOM_INHERIT2, "accept", 1); ··· 179 183 goto close_bpf_object; 180 184 181 185 err = prog_attach(obj, cgroup_fd, "cgroup/getsockopt", "_getsockopt"); 182 - if (CHECK_FAIL(err)) 186 + if (!ASSERT_OK(err, "prog_attach _getsockopt")) 183 187 goto close_bpf_object; 184 188 185 189 err = prog_attach(obj, cgroup_fd, "cgroup/setsockopt", "_setsockopt"); 186 - if (CHECK_FAIL(err)) 190 + if (!ASSERT_OK(err, "prog_attach _setsockopt")) 187 191 goto close_bpf_object; 188 192 189 193 server_fd = start_server(); 190 - if (CHECK_FAIL(server_fd < 0)) 194 + if (!ASSERT_GE(server_fd, 0, "start_server")) 191 195 goto close_bpf_object; 192 196 193 197 pthread_mutex_lock(&server_started_mtx); 194 - if (CHECK_FAIL(pthread_create(&tid, NULL, server_thread, 195 - (void *)&server_fd))) { 198 + if (!ASSERT_OK(pthread_create(&tid, NULL, server_thread, 199 + (void *)&server_fd), "pthread_create")) { 196 200 pthread_mutex_unlock(&server_started_mtx); 197 201 goto close_server_fd; 198 202 } ··· 200 204 pthread_mutex_unlock(&server_started_mtx); 201 205 202 206 client_fd = connect_to_server(server_fd); 203 - if (CHECK_FAIL(client_fd < 0)) 207 + if (!ASSERT_GE(client_fd, 0, "connect_to_server")) 204 208 goto 
close_server_fd; 205 209 206 - CHECK_FAIL(verify_sockopt(client_fd, CUSTOM_INHERIT1, "connect", 0)); 207 - CHECK_FAIL(verify_sockopt(client_fd, CUSTOM_INHERIT2, "connect", 0)); 208 - CHECK_FAIL(verify_sockopt(client_fd, CUSTOM_LISTENER, "connect", 0)); 210 + ASSERT_OK(verify_sockopt(client_fd, CUSTOM_INHERIT1, "connect", 0), "verify_sockopt1"); 211 + ASSERT_OK(verify_sockopt(client_fd, CUSTOM_INHERIT2, "connect", 0), "verify_sockopt2"); 212 + ASSERT_OK(verify_sockopt(client_fd, CUSTOM_LISTENER, "connect", 0), "verify_sockopt_listener"); 209 213 210 214 pthread_join(tid, &server_err); 211 215 212 216 err = (int)(long)server_err; 213 - CHECK_FAIL(err); 217 + ASSERT_OK(err, "pthread_join retval"); 214 218 215 219 close(client_fd); 216 220 ··· 225 229 int cgroup_fd; 226 230 227 231 cgroup_fd = test__join_cgroup("/sockopt_inherit"); 228 - if (CHECK_FAIL(cgroup_fd < 0)) 232 + if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup")) 229 233 return; 230 234 231 235 run_test(cgroup_fd);
+5 -5
tools/testing/selftests/bpf/prog_tests/sockopt_multi.c
··· 303 303 int err = -1; 304 304 305 305 cg_parent = test__join_cgroup("/parent"); 306 - if (CHECK_FAIL(cg_parent < 0)) 306 + if (!ASSERT_GE(cg_parent, 0, "join_cgroup /parent")) 307 307 goto out; 308 308 309 309 cg_child = test__join_cgroup("/parent/child"); 310 - if (CHECK_FAIL(cg_child < 0)) 310 + if (!ASSERT_GE(cg_child, 0, "join_cgroup /parent/child")) 311 311 goto out; 312 312 313 313 obj = bpf_object__open_file("sockopt_multi.bpf.o", NULL); ··· 319 319 goto out; 320 320 321 321 sock_fd = socket(AF_INET, SOCK_STREAM, 0); 322 - if (CHECK_FAIL(sock_fd < 0)) 322 + if (!ASSERT_GE(sock_fd, 0, "socket")) 323 323 goto out; 324 324 325 - CHECK_FAIL(run_getsockopt_test(obj, cg_parent, cg_child, sock_fd)); 326 - CHECK_FAIL(run_setsockopt_test(obj, cg_parent, cg_child, sock_fd)); 325 + ASSERT_OK(run_getsockopt_test(obj, cg_parent, cg_child, sock_fd), "getsockopt_test"); 326 + ASSERT_OK(run_setsockopt_test(obj, cg_parent, cg_child, sock_fd), "setsockopt_test"); 327 327 328 328 out: 329 329 close(sock_fd);
+1 -1
tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
··· 223 223 int cgroup_fd; 224 224 225 225 cgroup_fd = test__join_cgroup("/sockopt_sk"); 226 - if (CHECK_FAIL(cgroup_fd < 0)) 226 + if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /sockopt_sk")) 227 227 return; 228 228 229 229 run_test(cgroup_fd);
+1 -3
tools/testing/selftests/bpf/prog_tests/tcp_estats.c
··· 6 6 const char *file = "./test_tcp_estats.bpf.o"; 7 7 int err, prog_fd; 8 8 struct bpf_object *obj; 9 - __u32 duration = 0; 10 9 11 10 err = bpf_prog_test_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd); 12 - CHECK(err, "", "err %d errno %d\n", err, errno); 13 - if (err) 11 + if (!ASSERT_OK(err, "")) 14 12 return; 15 13 16 14 bpf_object__close(obj);
+28 -72
tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
··· 42 42 43 43 static int create_netns(void) 44 44 { 45 - if (CHECK(unshare(CLONE_NEWNET), "create netns", 46 - "unshare(CLONE_NEWNET): %s (%d)", 47 - strerror(errno), errno)) 45 + if (!ASSERT_OK(unshare(CLONE_NEWNET), "create netns")) 48 46 return -1; 49 47 50 - if (CHECK(system("ip link set dev lo up"), "run ip cmd", 51 - "failed to bring lo link up\n")) 52 - return -1; 53 - 54 - return 0; 55 - } 56 - 57 - static int write_sysctl(const char *sysctl, const char *value) 58 - { 59 - int fd, err, len; 60 - 61 - fd = open(sysctl, O_WRONLY); 62 - if (CHECK(fd == -1, "open sysctl", "open(%s): %s (%d)\n", 63 - sysctl, strerror(errno), errno)) 64 - return -1; 65 - 66 - len = strlen(value); 67 - err = write(fd, value, len); 68 - close(fd); 69 - if (CHECK(err != len, "write sysctl", 70 - "write(%s, %s): err:%d %s (%d)\n", 71 - sysctl, value, err, strerror(errno), errno)) 48 + if (!ASSERT_OK(system("ip link set dev lo up"), "run ip cmd")) 72 49 return -1; 73 50 74 51 return 0; ··· 77 100 78 101 shutdown(sk_fds->active_fd, SHUT_WR); 79 102 ret = read(sk_fds->passive_fd, &abyte, sizeof(abyte)); 80 - if (CHECK(ret != 0, "read-after-shutdown(passive_fd):", 81 - "ret:%d %s (%d)\n", 82 - ret, strerror(errno), errno)) 103 + if (!ASSERT_EQ(ret, 0, "read-after-shutdown(passive_fd):")) 83 104 return -1; 84 105 85 106 shutdown(sk_fds->passive_fd, SHUT_WR); 86 107 ret = read(sk_fds->active_fd, &abyte, sizeof(abyte)); 87 - if (CHECK(ret != 0, "read-after-shutdown(active_fd):", 88 - "ret:%d %s (%d)\n", 89 - ret, strerror(errno), errno)) 108 + if (!ASSERT_EQ(ret, 0, "read-after-shutdown(active_fd):")) 90 109 return -1; 91 110 92 111 return 0; ··· 95 122 socklen_t len; 96 123 97 124 sk_fds->srv_fd = start_server(AF_INET6, SOCK_STREAM, LO_ADDR6, 0, 0); 98 - if (CHECK(sk_fds->srv_fd == -1, "start_server", "%s (%d)\n", 99 - strerror(errno), errno)) 125 + if (!ASSERT_NEQ(sk_fds->srv_fd, -1, "start_server")) 100 126 goto error; 101 127 102 128 if (fast_open) ··· 104 132 else 105 133 
sk_fds->active_fd = connect_to_fd(sk_fds->srv_fd, 0); 106 134 107 - if (CHECK_FAIL(sk_fds->active_fd == -1)) { 135 + if (!ASSERT_NEQ(sk_fds->active_fd, -1, "")) { 108 136 close(sk_fds->srv_fd); 109 137 goto error; 110 138 } 111 139 112 140 len = sizeof(addr6); 113 - if (CHECK(getsockname(sk_fds->srv_fd, (struct sockaddr *)&addr6, 114 - &len), "getsockname(srv_fd)", "%s (%d)\n", 115 - strerror(errno), errno)) 141 + if (!ASSERT_OK(getsockname(sk_fds->srv_fd, (struct sockaddr *)&addr6, 142 + &len), "getsockname(srv_fd)")) 116 143 goto error_close; 117 144 sk_fds->passive_lport = ntohs(addr6.sin6_port); 118 145 119 146 len = sizeof(addr6); 120 - if (CHECK(getsockname(sk_fds->active_fd, (struct sockaddr *)&addr6, 121 - &len), "getsockname(active_fd)", "%s (%d)\n", 122 - strerror(errno), errno)) 147 + if (!ASSERT_OK(getsockname(sk_fds->active_fd, (struct sockaddr *)&addr6, 148 + &len), "getsockname(active_fd)")) 123 149 goto error_close; 124 150 sk_fds->active_lport = ntohs(addr6.sin6_port); 125 151 126 152 sk_fds->passive_fd = accept(sk_fds->srv_fd, NULL, 0); 127 - if (CHECK(sk_fds->passive_fd == -1, "accept(srv_fd)", "%s (%d)\n", 128 - strerror(errno), errno)) 153 + if (!ASSERT_NEQ(sk_fds->passive_fd, -1, "accept(srv_fd)")) 129 154 goto error_close; 130 155 131 156 if (fast_open) { ··· 130 161 int ret; 131 162 132 163 ret = read(sk_fds->passive_fd, bytes_in, sizeof(bytes_in)); 133 - if (CHECK(ret != sizeof(fast), "read fastopen syn data", 134 - "expected=%lu actual=%d\n", sizeof(fast), ret)) { 164 + if (!ASSERT_EQ(ret, sizeof(fast), "read fastopen syn data")) { 135 165 close(sk_fds->passive_fd); 136 166 goto error_close; 137 167 } ··· 151 183 const struct bpf_test_option *act, 152 184 const char *hdr_desc) 153 185 { 154 - if (CHECK(memcmp(exp, act, sizeof(*exp)), 155 - "expected-vs-actual", "unexpected %s\n", hdr_desc)) { 186 + if (!ASSERT_OK(memcmp(exp, act, sizeof(*exp)), hdr_desc)) { 156 187 print_option(exp, "expected: "); 157 188 print_option(act, " actual: "); 
158 189 return -1; ··· 165 198 { 166 199 struct hdr_stg act; 167 200 168 - if (CHECK(bpf_map_lookup_elem(hdr_stg_map_fd, &fd, &act), 169 - "map_lookup(hdr_stg_map_fd)", "%s %s (%d)\n", 170 - stg_desc, strerror(errno), errno)) 201 + if (!ASSERT_OK(bpf_map_lookup_elem(hdr_stg_map_fd, &fd, &act), 202 + "map_lookup(hdr_stg_map_fd)")) 171 203 return -1; 172 204 173 - if (CHECK(memcmp(exp, &act, sizeof(*exp)), 174 - "expected-vs-actual", "unexpected %s\n", stg_desc)) { 205 + if (!ASSERT_OK(memcmp(exp, &act, sizeof(*exp)), stg_desc)) { 175 206 print_hdr_stg(exp, "expected: "); 176 207 print_hdr_stg(&act, " actual: "); 177 208 return -1; ··· 213 248 if (sk_fds_shutdown(sk_fds)) 214 249 goto check_linum; 215 250 216 - if (CHECK(expected_inherit_cb_flags != skel->bss->inherit_cb_flags, 217 - "Unexpected inherit_cb_flags", "0x%x != 0x%x\n", 218 - skel->bss->inherit_cb_flags, expected_inherit_cb_flags)) 251 + if (!ASSERT_EQ(expected_inherit_cb_flags, skel->bss->inherit_cb_flags, 252 + "inherit_cb_flags")) 219 253 goto check_linum; 220 254 221 255 if (check_hdr_stg(&exp_passive_hdr_stg, sk_fds->passive_fd, ··· 241 277 "active_fin_in"); 242 278 243 279 check_linum: 244 - CHECK_FAIL(check_error_linum(sk_fds)); 280 + ASSERT_FALSE(check_error_linum(sk_fds), "check_error_linum"); 245 281 sk_fds_close(sk_fds); 246 282 } 247 283 ··· 481 517 /* MSG_EOR to ensure skb will not be combined */ 482 518 ret = send(sk_fds.active_fd, send_msg, sizeof(send_msg), 483 519 MSG_EOR); 484 - if (CHECK(ret != sizeof(send_msg), "send(msg)", "ret:%d\n", 485 - ret)) 520 + if (!ASSERT_EQ(ret, sizeof(send_msg), "send(msg)")) 486 521 goto check_linum; 487 522 488 523 ret = read(sk_fds.passive_fd, recv_msg, sizeof(recv_msg)); 489 - if (CHECK(ret != sizeof(send_msg), "read(msg)", "ret:%d\n", 490 - ret)) 524 + if (!ASSERT_EQ(ret, sizeof(send_msg), "read(msg)")) 491 525 goto check_linum; 492 526 } 493 527 494 528 if (sk_fds_shutdown(&sk_fds)) 495 529 goto check_linum; 496 530 497 - CHECK(misc_skel->bss->nr_syn
!= 1, "unexpected nr_syn", 498 - "expected (1) != actual (%u)\n", 499 - misc_skel->bss->nr_syn); 531 + ASSERT_EQ(misc_skel->bss->nr_syn, 1, "unexpected nr_syn"); 500 532 501 - CHECK(misc_skel->bss->nr_data != nr_data, "unexpected nr_data", 502 - "expected (%u) != actual (%u)\n", 503 - nr_data, misc_skel->bss->nr_data); 533 + ASSERT_EQ(misc_skel->bss->nr_data, nr_data, "unexpected nr_data"); 504 534 505 535 /* The last ACK may have been delayed, so it is either 1 or 2. */ 506 536 CHECK(misc_skel->bss->nr_pure_ack != 1 && ··· 503 545 "expected (1 or 2) != actual (%u)\n", 504 546 misc_skel->bss->nr_pure_ack); 505 547 506 - CHECK(misc_skel->bss->nr_fin != 1, "unexpected nr_fin", 507 - "expected (1) != actual (%u)\n", 508 - misc_skel->bss->nr_fin); 548 + ASSERT_EQ(misc_skel->bss->nr_fin, 1, "unexpected nr_fin"); 509 549 510 550 check_linum: 511 - CHECK_FAIL(check_error_linum(&sk_fds)); 551 + ASSERT_FALSE(check_error_linum(&sk_fds), "check_error_linum"); 512 552 sk_fds_close(&sk_fds); 513 553 bpf_link__destroy(link); 514 554 } ··· 531 575 int i; 532 576 533 577 skel = test_tcp_hdr_options__open_and_load(); 534 - if (CHECK(!skel, "open and load skel", "failed")) 578 + if (!ASSERT_OK_PTR(skel, "open and load skel")) 535 579 return; 536 580 537 581 misc_skel = test_misc_tcp_hdr_options__open_and_load(); 538 - if (CHECK(!misc_skel, "open and load misc test skel", "failed")) 582 + if (!ASSERT_OK_PTR(misc_skel, "open and load misc test skel")) 539 583 goto skel_destroy; 540 584 541 585 cg_fd = test__join_cgroup(CG_NAME); 542 - if (CHECK_FAIL(cg_fd < 0)) 586 + if (!ASSERT_GE(cg_fd, 0, "join_cgroup")) 543 587 goto skel_destroy; 544 588 545 589 for (i = 0; i < ARRAY_SIZE(tests); i++) {
+5 -8
tools/testing/selftests/bpf/prog_tests/tcp_rtt.c
··· 16 16 { 17 17 char b = 0x55; 18 18 19 - if (CHECK_FAIL(write(fd, &b, sizeof(b)) != 1)) 20 - perror("Failed to send single byte"); 19 + ASSERT_EQ(write(fd, &b, sizeof(b)), 1, "send single byte"); 21 20 } 22 21 23 22 static int wait_for_ack(int fd, int retries) ··· 50 51 int err = 0; 51 52 struct tcp_rtt_storage val; 52 53 53 - if (CHECK_FAIL(bpf_map_lookup_elem(map_fd, &client_fd, &val) < 0)) { 54 - perror("Failed to read socket storage"); 54 + if (!ASSERT_GE(bpf_map_lookup_elem(map_fd, &client_fd, &val), 0, "read socket storage")) 55 55 return -1; 56 - } 57 56 58 57 if (val.invoked != invoked) { 59 58 log_err("%s: unexpected bpf_tcp_sock.invoked %d != %d", ··· 148 151 int server_fd, cgroup_fd; 149 152 150 153 cgroup_fd = test__join_cgroup("/tcp_rtt"); 151 - if (CHECK_FAIL(cgroup_fd < 0)) 154 + if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /tcp_rtt")) 152 155 return; 153 156 154 157 server_fd = start_server(AF_INET, SOCK_STREAM, NULL, 0, 0); 155 - if (CHECK_FAIL(server_fd < 0)) 158 + if (!ASSERT_GE(server_fd, 0, "start_server")) 156 159 goto close_cgroup_fd; 157 160 158 - CHECK_FAIL(run_test(cgroup_fd, server_fd)); 161 + ASSERT_OK(run_test(cgroup_fd, server_fd), "run_test"); 159 162 160 163 close(server_fd); 161 164
+12 -20
tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c
··· 8 8 #define LO_ADDR6 "::1" 9 9 #define CG_NAME "/tcpbpf-user-test" 10 10 11 - static __u32 duration; 12 - 13 11 static void verify_result(struct tcpbpf_globals *result) 14 12 { 15 13 __u32 expected_events = ((1 << BPF_SOCK_OPS_TIMEOUT_INIT) | ··· 20 22 (1 << BPF_SOCK_OPS_TCP_LISTEN_CB)); 21 23 22 24 /* check global map */ 23 - CHECK(expected_events != result->event_map, "event_map", 24 - "unexpected event_map: actual 0x%08x != expected 0x%08x\n", 25 - result->event_map, expected_events); 25 + ASSERT_EQ(expected_events, result->event_map, "event_map"); 26 26 27 27 ASSERT_EQ(result->bytes_received, 501, "bytes_received"); 28 28 ASSERT_EQ(result->bytes_acked, 1002, "bytes_acked"); ··· 52 56 int i, rv; 53 57 54 58 listen_fd = start_server(AF_INET6, SOCK_STREAM, LO_ADDR6, 0, 0); 55 - if (CHECK(listen_fd == -1, "start_server", "listen_fd:%d errno:%d\n", 56 - listen_fd, errno)) 59 + if (!ASSERT_NEQ(listen_fd, -1, "start_server")) 57 60 goto done; 58 61 59 62 cli_fd = connect_to_fd(listen_fd, 0); 60 - if (CHECK(cli_fd == -1, "connect_to_fd(listen_fd)", 61 - "cli_fd:%d errno:%d\n", cli_fd, errno)) 63 + if (!ASSERT_NEQ(cli_fd, -1, "connect_to_fd(listen_fd)")) 62 64 goto done; 63 65 64 66 accept_fd = accept(listen_fd, NULL, NULL); 65 - if (CHECK(accept_fd == -1, "accept(listen_fd)", 66 - "accept_fd:%d errno:%d\n", accept_fd, errno)) 67 + if (!ASSERT_NEQ(accept_fd, -1, "accept(listen_fd)")) 67 68 goto done; 68 69 69 70 /* Send 1000B of '+'s from cli_fd -> accept_fd */ ··· 68 75 buf[i] = '+'; 69 76 70 77 rv = send(cli_fd, buf, 1000, 0); 71 - if (CHECK(rv != 1000, "send(cli_fd)", "rv:%d errno:%d\n", rv, errno)) 78 + if (!ASSERT_EQ(rv, 1000, "send(cli_fd)")) 72 79 goto done; 73 80 74 81 rv = recv(accept_fd, buf, 1000, 0); 75 - if (CHECK(rv != 1000, "recv(accept_fd)", "rv:%d errno:%d\n", rv, errno)) 82 + if (!ASSERT_EQ(rv, 1000, "recv(accept_fd)")) 76 83 goto done; 77 84 78 85 /* Send 500B of '.'s from accept_fd ->cli_fd */ ··· 80 87 buf[i] = '.'; 81 88 82 89 rv = 
send(accept_fd, buf, 500, 0); 83 - if (CHECK(rv != 500, "send(accept_fd)", "rv:%d errno:%d\n", rv, errno)) 90 + if (!ASSERT_EQ(rv, 500, "send(accept_fd)")) 84 91 goto done; 85 92 86 93 rv = recv(cli_fd, buf, 500, 0); 87 - if (CHECK(rv != 500, "recv(cli_fd)", "rv:%d errno:%d\n", rv, errno)) 94 + if (!ASSERT_EQ(rv, 500, "recv(cli_fd)")) 88 95 goto done; 89 96 90 97 /* ··· 93 100 */ 94 101 shutdown(accept_fd, SHUT_WR); 95 102 err = recv(cli_fd, buf, 1, 0); 96 - if (CHECK(err, "recv(cli_fd) for fin", "err:%d errno:%d\n", err, errno)) 103 + if (!ASSERT_OK(err, "recv(cli_fd) for fin")) 97 104 goto done; 98 105 99 106 shutdown(cli_fd, SHUT_WR); 100 107 err = recv(accept_fd, buf, 1, 0); 101 - CHECK(err, "recv(accept_fd) for fin", "err:%d errno:%d\n", err, errno); 108 + ASSERT_OK(err, "recv(accept_fd) for fin"); 102 109 done: 103 110 if (accept_fd != -1) 104 111 close(accept_fd); ··· 117 124 int cg_fd = -1; 118 125 119 126 skel = test_tcpbpf_kern__open_and_load(); 120 - if (CHECK(!skel, "open and load skel", "failed")) 127 + if (!ASSERT_OK_PTR(skel, "open and load skel")) 121 128 return; 122 129 123 130 cg_fd = test__join_cgroup(CG_NAME); 124 - if (CHECK(cg_fd < 0, "test__join_cgroup(" CG_NAME ")", 125 - "cg_fd:%d errno:%d", cg_fd, errno)) 131 + if (!ASSERT_GE(cg_fd, 0, "test__join_cgroup(" CG_NAME ")")) 126 132 goto err; 127 133 128 134 skel->links.bpf_testcb = bpf_program__attach_cgroup(skel->progs.bpf_testcb, cg_fd);
+63
tools/testing/selftests/bpf/prog_tests/tracing_struct.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include <test_progs.h> 5 + #include "tracing_struct.skel.h" 6 + 7 + static void test_fentry(void) 8 + { 9 + struct tracing_struct *skel; 10 + int err; 11 + 12 + skel = tracing_struct__open_and_load(); 13 + if (!ASSERT_OK_PTR(skel, "tracing_struct__open_and_load")) 14 + return; 15 + 16 + err = tracing_struct__attach(skel); 17 + if (!ASSERT_OK(err, "tracing_struct__attach")) 18 + return; 19 + 20 + ASSERT_OK(trigger_module_test_read(256), "trigger_read"); 21 + 22 + ASSERT_EQ(skel->bss->t1_a_a, 2, "t1:a.a"); 23 + ASSERT_EQ(skel->bss->t1_a_b, 3, "t1:a.b"); 24 + ASSERT_EQ(skel->bss->t1_b, 1, "t1:b"); 25 + ASSERT_EQ(skel->bss->t1_c, 4, "t1:c"); 26 + 27 + ASSERT_EQ(skel->bss->t1_nregs, 4, "t1 nregs"); 28 + ASSERT_EQ(skel->bss->t1_reg0, 2, "t1 reg0"); 29 + ASSERT_EQ(skel->bss->t1_reg1, 3, "t1 reg1"); 30 + ASSERT_EQ(skel->bss->t1_reg2, 1, "t1 reg2"); 31 + ASSERT_EQ(skel->bss->t1_reg3, 4, "t1 reg3"); 32 + ASSERT_EQ(skel->bss->t1_ret, 10, "t1 ret"); 33 + 34 + ASSERT_EQ(skel->bss->t2_a, 1, "t2:a"); 35 + ASSERT_EQ(skel->bss->t2_b_a, 2, "t2:b.a"); 36 + ASSERT_EQ(skel->bss->t2_b_b, 3, "t2:b.b"); 37 + ASSERT_EQ(skel->bss->t2_c, 4, "t2:c"); 38 + ASSERT_EQ(skel->bss->t2_ret, 10, "t2 ret"); 39 + 40 + ASSERT_EQ(skel->bss->t3_a, 1, "t3:a"); 41 + ASSERT_EQ(skel->bss->t3_b, 4, "t3:b"); 42 + ASSERT_EQ(skel->bss->t3_c_a, 2, "t3:c.a"); 43 + ASSERT_EQ(skel->bss->t3_c_b, 3, "t3:c.b"); 44 + ASSERT_EQ(skel->bss->t3_ret, 10, "t3 ret"); 45 + 46 + ASSERT_EQ(skel->bss->t4_a_a, 10, "t4:a.a"); 47 + ASSERT_EQ(skel->bss->t4_b, 1, "t4:b"); 48 + ASSERT_EQ(skel->bss->t4_c, 2, "t4:c"); 49 + ASSERT_EQ(skel->bss->t4_d, 3, "t4:d"); 50 + ASSERT_EQ(skel->bss->t4_e_a, 2, "t4:e.a"); 51 + ASSERT_EQ(skel->bss->t4_e_b, 3, "t4:e.b"); 52 + ASSERT_EQ(skel->bss->t4_ret, 21, "t4 ret"); 53 + 54 + ASSERT_EQ(skel->bss->t5_ret, 1, "t5 ret"); 55 + 56 + tracing_struct__detach(skel); 57 + 
tracing_struct__destroy(skel); 58 + } 59 + 60 + void test_tracing_struct(void) 61 + { 62 + test_fentry(); 63 + }
+7 -11
tools/testing/selftests/bpf/prog_tests/udp_limit.c
··· 5 5 #include <sys/types.h> 6 6 #include <sys/socket.h> 7 7 8 - static int duration; 9 - 10 8 void test_udp_limit(void) 11 9 { 12 10 struct udp_limit *skel; ··· 12 14 int cgroup_fd; 13 15 14 16 cgroup_fd = test__join_cgroup("/udp_limit"); 15 - if (CHECK(cgroup_fd < 0, "cg-join", "errno %d", errno)) 17 + if (!ASSERT_GE(cgroup_fd, 0, "cg-join")) 16 18 return; 17 19 18 20 skel = udp_limit__open_and_load(); 19 - if (CHECK(!skel, "skel-load", "errno %d", errno)) 21 + if (!ASSERT_OK_PTR(skel, "skel-load")) 20 22 goto close_cgroup_fd; 21 23 22 24 skel->links.sock = bpf_program__attach_cgroup(skel->progs.sock, cgroup_fd); ··· 30 32 * verify that. 31 33 */ 32 34 fd1 = socket(AF_INET, SOCK_DGRAM, 0); 33 - if (CHECK(fd1 < 0, "fd1", "errno %d", errno)) 35 + if (!ASSERT_GE(fd1, 0, "socket(fd1)")) 34 36 goto close_skeleton; 35 37 36 38 fd2 = socket(AF_INET, SOCK_DGRAM, 0); 37 - if (CHECK(fd2 >= 0, "fd2", "errno %d", errno)) 39 + if (!ASSERT_LT(fd2, 0, "socket(fd2)")) 38 40 goto close_skeleton; 39 41 40 42 /* We can reopen again after close. */ ··· 42 44 fd1 = -1; 43 45 44 46 fd1 = socket(AF_INET, SOCK_DGRAM, 0); 45 - if (CHECK(fd1 < 0, "fd1-again", "errno %d", errno)) 47 + if (!ASSERT_GE(fd1, 0, "socket(fd1-again)")) 46 48 goto close_skeleton; 47 49 48 50 /* Make sure the program was invoked the expected ··· 52 54 * - close fd1 - BPF_CGROUP_INET_SOCK_RELEASE 53 55 * - open fd1 again - BPF_CGROUP_INET_SOCK_CREATE 54 56 */ 55 - if (CHECK(skel->bss->invocations != 4, "bss-invocations", 56 - "invocations=%d", skel->bss->invocations)) 57 + if (!ASSERT_EQ(skel->bss->invocations, 4, "bss-invocations")) 57 58 goto close_skeleton; 58 59 59 60 /* We should still have a single socket in use */ 60 - if (CHECK(skel->bss->in_use != 1, "bss-in_use", 61 - "in_use=%d", skel->bss->in_use)) 61 + if (!ASSERT_EQ(skel->bss->in_use, 1, "bss-in_use")) 62 62 goto close_skeleton; 63 63 64 64 close_skeleton:
+754
tools/testing/selftests/bpf/prog_tests/user_ringbuf.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #define _GNU_SOURCE 5 + #include <linux/compiler.h> 6 + #include <linux/ring_buffer.h> 7 + #include <pthread.h> 8 + #include <stdio.h> 9 + #include <stdlib.h> 10 + #include <sys/mman.h> 11 + #include <sys/syscall.h> 12 + #include <sys/sysinfo.h> 13 + #include <test_progs.h> 14 + #include <uapi/linux/bpf.h> 15 + #include <unistd.h> 16 + 17 + #include "user_ringbuf_fail.skel.h" 18 + #include "user_ringbuf_success.skel.h" 19 + 20 + #include "../progs/test_user_ringbuf.h" 21 + 22 + static size_t log_buf_sz = 1 << 20; /* 1 MB */ 23 + static char obj_log_buf[1048576]; 24 + static const long c_sample_size = sizeof(struct sample) + BPF_RINGBUF_HDR_SZ; 25 + static const long c_ringbuf_size = 1 << 12; /* 1 small page */ 26 + static const long c_max_entries = c_ringbuf_size / c_sample_size; 27 + 28 + static void drain_current_samples(void) 29 + { 30 + syscall(__NR_getpgid); 31 + } 32 + 33 + static int write_samples(struct user_ring_buffer *ringbuf, uint32_t num_samples) 34 + { 35 + int i, err = 0; 36 + 37 + /* Write some number of samples to the ring buffer. */ 38 + for (i = 0; i < num_samples; i++) { 39 + struct sample *entry; 40 + int read; 41 + 42 + entry = user_ring_buffer__reserve(ringbuf, sizeof(*entry)); 43 + if (!entry) { 44 + err = -errno; 45 + goto done; 46 + } 47 + 48 + entry->pid = getpid(); 49 + entry->seq = i; 50 + entry->value = i * i; 51 + 52 + read = snprintf(entry->comm, sizeof(entry->comm), "%u", i); 53 + if (read <= 0) { 54 + /* Assert on the error path to avoid spamming logs with 55 + * mostly success messages. 
56 + */ 57 + ASSERT_GT(read, 0, "snprintf_comm"); 58 + err = read; 59 + user_ring_buffer__discard(ringbuf, entry); 60 + goto done; 61 + } 62 + 63 + user_ring_buffer__submit(ringbuf, entry); 64 + } 65 + 66 + done: 67 + drain_current_samples(); 68 + 69 + return err; 70 + } 71 + 72 + static struct user_ringbuf_success *open_load_ringbuf_skel(void) 73 + { 74 + struct user_ringbuf_success *skel; 75 + int err; 76 + 77 + skel = user_ringbuf_success__open(); 78 + if (!ASSERT_OK_PTR(skel, "skel_open")) 79 + return NULL; 80 + 81 + err = bpf_map__set_max_entries(skel->maps.user_ringbuf, c_ringbuf_size); 82 + if (!ASSERT_OK(err, "set_max_entries")) 83 + goto cleanup; 84 + 85 + err = bpf_map__set_max_entries(skel->maps.kernel_ringbuf, c_ringbuf_size); 86 + if (!ASSERT_OK(err, "set_max_entries")) 87 + goto cleanup; 88 + 89 + err = user_ringbuf_success__load(skel); 90 + if (!ASSERT_OK(err, "skel_load")) 91 + goto cleanup; 92 + 93 + return skel; 94 + 95 + cleanup: 96 + user_ringbuf_success__destroy(skel); 97 + return NULL; 98 + } 99 + 100 + static void test_user_ringbuf_mappings(void) 101 + { 102 + int err, rb_fd; 103 + int page_size = getpagesize(); 104 + void *mmap_ptr; 105 + struct user_ringbuf_success *skel; 106 + 107 + skel = open_load_ringbuf_skel(); 108 + if (!skel) 109 + return; 110 + 111 + rb_fd = bpf_map__fd(skel->maps.user_ringbuf); 112 + /* cons_pos can be mapped R/O, can't add +X with mprotect. */ 113 + mmap_ptr = mmap(NULL, page_size, PROT_READ, MAP_SHARED, rb_fd, 0); 114 + ASSERT_OK_PTR(mmap_ptr, "ro_cons_pos"); 115 + ASSERT_ERR(mprotect(mmap_ptr, page_size, PROT_WRITE), "write_cons_pos_protect"); 116 + ASSERT_ERR(mprotect(mmap_ptr, page_size, PROT_EXEC), "exec_cons_pos_protect"); 117 + ASSERT_ERR_PTR(mremap(mmap_ptr, 0, 4 * page_size, MREMAP_MAYMOVE), "wr_prod_pos"); 118 + err = -errno; 119 + ASSERT_ERR(err, "wr_prod_pos_err"); 120 + ASSERT_OK(munmap(mmap_ptr, page_size), "unmap_ro_cons"); 121 + 122 + /* prod_pos can be mapped RW, can't add +X with mprotect. 
*/ 123 + mmap_ptr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, 124 + rb_fd, page_size); 125 + ASSERT_OK_PTR(mmap_ptr, "rw_prod_pos"); 126 + ASSERT_ERR(mprotect(mmap_ptr, page_size, PROT_EXEC), "exec_prod_pos_protect"); 127 + err = -errno; 128 + ASSERT_ERR(err, "wr_prod_pos_err"); 129 + ASSERT_OK(munmap(mmap_ptr, page_size), "unmap_rw_prod"); 130 + 131 + /* data pages can be mapped RW, can't add +X with mprotect. */ 132 + mmap_ptr = mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, rb_fd, 133 + 2 * page_size); 134 + ASSERT_OK_PTR(mmap_ptr, "rw_data"); 135 + ASSERT_ERR(mprotect(mmap_ptr, page_size, PROT_EXEC), "exec_data_protect"); 136 + err = -errno; 137 + ASSERT_ERR(err, "exec_data_err"); 138 + ASSERT_OK(munmap(mmap_ptr, page_size), "unmap_rw_data"); 139 + 140 + user_ringbuf_success__destroy(skel); 141 + } 142 + 143 + static int load_skel_create_ringbufs(struct user_ringbuf_success **skel_out, 144 + struct ring_buffer **kern_ringbuf_out, 145 + ring_buffer_sample_fn callback, 146 + struct user_ring_buffer **user_ringbuf_out) 147 + { 148 + struct user_ringbuf_success *skel; 149 + struct ring_buffer *kern_ringbuf = NULL; 150 + struct user_ring_buffer *user_ringbuf = NULL; 151 + int err = -ENOMEM, rb_fd; 152 + 153 + skel = open_load_ringbuf_skel(); 154 + if (!skel) 155 + return err; 156 + 157 + /* only trigger BPF program for current process */ 158 + skel->bss->pid = getpid(); 159 + 160 + if (kern_ringbuf_out) { 161 + rb_fd = bpf_map__fd(skel->maps.kernel_ringbuf); 162 + kern_ringbuf = ring_buffer__new(rb_fd, callback, skel, NULL); 163 + if (!ASSERT_OK_PTR(kern_ringbuf, "kern_ringbuf_create")) 164 + goto cleanup; 165 + 166 + *kern_ringbuf_out = kern_ringbuf; 167 + } 168 + 169 + if (user_ringbuf_out) { 170 + rb_fd = bpf_map__fd(skel->maps.user_ringbuf); 171 + user_ringbuf = user_ring_buffer__new(rb_fd, NULL); 172 + if (!ASSERT_OK_PTR(user_ringbuf, "user_ringbuf_create")) 173 + goto cleanup; 174 + 175 + *user_ringbuf_out = user_ringbuf; 176 + 
ASSERT_EQ(skel->bss->read, 0, "no_reads_after_load"); 177 + } 178 + 179 + err = user_ringbuf_success__attach(skel); 180 + if (!ASSERT_OK(err, "skel_attach")) 181 + goto cleanup; 182 + 183 + *skel_out = skel; 184 + return 0; 185 + 186 + cleanup: 187 + if (kern_ringbuf_out) 188 + *kern_ringbuf_out = NULL; 189 + if (user_ringbuf_out) 190 + *user_ringbuf_out = NULL; 191 + ring_buffer__free(kern_ringbuf); 192 + user_ring_buffer__free(user_ringbuf); 193 + user_ringbuf_success__destroy(skel); 194 + return err; 195 + } 196 + 197 + static int load_skel_create_user_ringbuf(struct user_ringbuf_success **skel_out, 198 + struct user_ring_buffer **ringbuf_out) 199 + { 200 + return load_skel_create_ringbufs(skel_out, NULL, NULL, ringbuf_out); 201 + } 202 + 203 + static void manually_write_test_invalid_sample(struct user_ringbuf_success *skel, 204 + __u32 size, __u64 producer_pos, int err) 205 + { 206 + void *data_ptr; 207 + __u64 *producer_pos_ptr; 208 + int rb_fd, page_size = getpagesize(); 209 + 210 + rb_fd = bpf_map__fd(skel->maps.user_ringbuf); 211 + 212 + ASSERT_EQ(skel->bss->read, 0, "num_samples_before_bad_sample"); 213 + 214 + /* Map the producer_pos as RW. */ 215 + producer_pos_ptr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, 216 + MAP_SHARED, rb_fd, page_size); 217 + ASSERT_OK_PTR(producer_pos_ptr, "producer_pos_ptr"); 218 + 219 + /* Map the data pages as RW. */ 220 + data_ptr = mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, rb_fd, 2 * page_size); 221 + ASSERT_OK_PTR(data_ptr, "rw_data"); 222 + 223 + memset(data_ptr, 0, BPF_RINGBUF_HDR_SZ); 224 + *(__u32 *)data_ptr = size; 225 + 226 + /* Synchronizes with smp_load_acquire() in __bpf_user_ringbuf_peek() in the kernel. 
*/ 227 + smp_store_release(producer_pos_ptr, producer_pos + BPF_RINGBUF_HDR_SZ); 228 + 229 + drain_current_samples(); 230 + ASSERT_EQ(skel->bss->read, 0, "num_samples_after_bad_sample"); 231 + ASSERT_EQ(skel->bss->err, err, "err_after_bad_sample"); 232 + 233 + ASSERT_OK(munmap(producer_pos_ptr, page_size), "unmap_producer_pos"); 234 + ASSERT_OK(munmap(data_ptr, page_size), "unmap_data_ptr"); 235 + } 236 + 237 + static void test_user_ringbuf_post_misaligned(void) 238 + { 239 + struct user_ringbuf_success *skel; 240 + struct user_ring_buffer *ringbuf; 241 + int err; 242 + __u32 size = (1 << 5) + 7; 243 + 244 + err = load_skel_create_user_ringbuf(&skel, &ringbuf); 245 + if (!ASSERT_OK(err, "misaligned_skel")) 246 + return; 247 + 248 + manually_write_test_invalid_sample(skel, size, size, -EINVAL); 249 + user_ring_buffer__free(ringbuf); 250 + user_ringbuf_success__destroy(skel); 251 + } 252 + 253 + static void test_user_ringbuf_post_producer_wrong_offset(void) 254 + { 255 + struct user_ringbuf_success *skel; 256 + struct user_ring_buffer *ringbuf; 257 + int err; 258 + __u32 size = (1 << 5); 259 + 260 + err = load_skel_create_user_ringbuf(&skel, &ringbuf); 261 + if (!ASSERT_OK(err, "wrong_offset_skel")) 262 + return; 263 + 264 + manually_write_test_invalid_sample(skel, size, size - 8, -EINVAL); 265 + user_ring_buffer__free(ringbuf); 266 + user_ringbuf_success__destroy(skel); 267 + } 268 + 269 + static void test_user_ringbuf_post_larger_than_ringbuf_sz(void) 270 + { 271 + struct user_ringbuf_success *skel; 272 + struct user_ring_buffer *ringbuf; 273 + int err; 274 + __u32 size = c_ringbuf_size; 275 + 276 + err = load_skel_create_user_ringbuf(&skel, &ringbuf); 277 + if (!ASSERT_OK(err, "huge_sample_skel")) 278 + return; 279 + 280 + manually_write_test_invalid_sample(skel, size, size, -E2BIG); 281 + user_ring_buffer__free(ringbuf); 282 + user_ringbuf_success__destroy(skel); 283 + } 284 + 285 + static void test_user_ringbuf_basic(void) 286 + { 287 + struct 
user_ringbuf_success *skel; 288 + struct user_ring_buffer *ringbuf; 289 + int err; 290 + 291 + err = load_skel_create_user_ringbuf(&skel, &ringbuf); 292 + if (!ASSERT_OK(err, "ringbuf_basic_skel")) 293 + return; 294 + 295 + ASSERT_EQ(skel->bss->read, 0, "num_samples_read_before"); 296 + 297 + err = write_samples(ringbuf, 2); 298 + if (!ASSERT_OK(err, "write_samples")) 299 + goto cleanup; 300 + 301 + ASSERT_EQ(skel->bss->read, 2, "num_samples_read_after"); 302 + 303 + cleanup: 304 + user_ring_buffer__free(ringbuf); 305 + user_ringbuf_success__destroy(skel); 306 + } 307 + 308 + static void test_user_ringbuf_sample_full_ring_buffer(void) 309 + { 310 + struct user_ringbuf_success *skel; 311 + struct user_ring_buffer *ringbuf; 312 + int err; 313 + void *sample; 314 + 315 + err = load_skel_create_user_ringbuf(&skel, &ringbuf); 316 + if (!ASSERT_OK(err, "ringbuf_full_sample_skel")) 317 + return; 318 + 319 + sample = user_ring_buffer__reserve(ringbuf, c_ringbuf_size - BPF_RINGBUF_HDR_SZ); 320 + if (!ASSERT_OK_PTR(sample, "full_sample")) 321 + goto cleanup; 322 + 323 + user_ring_buffer__submit(ringbuf, sample); 324 + ASSERT_EQ(skel->bss->read, 0, "num_samples_read_before"); 325 + drain_current_samples(); 326 + ASSERT_EQ(skel->bss->read, 1, "num_samples_read_after"); 327 + 328 + cleanup: 329 + user_ring_buffer__free(ringbuf); 330 + user_ringbuf_success__destroy(skel); 331 + } 332 + 333 + static void test_user_ringbuf_post_alignment_autoadjust(void) 334 + { 335 + struct user_ringbuf_success *skel; 336 + struct user_ring_buffer *ringbuf; 337 + struct sample *sample; 338 + int err; 339 + 340 + err = load_skel_create_user_ringbuf(&skel, &ringbuf); 341 + if (!ASSERT_OK(err, "ringbuf_align_autoadjust_skel")) 342 + return; 343 + 344 + /* libbpf should automatically round any sample up to an 8-byte alignment. 
*/ 345 + sample = user_ring_buffer__reserve(ringbuf, sizeof(*sample) + 1); 346 + ASSERT_OK_PTR(sample, "reserve_autoaligned"); 347 + user_ring_buffer__submit(ringbuf, sample); 348 + 349 + ASSERT_EQ(skel->bss->read, 0, "num_samples_read_before"); 350 + drain_current_samples(); 351 + ASSERT_EQ(skel->bss->read, 1, "num_samples_read_after"); 352 + 353 + user_ring_buffer__free(ringbuf); 354 + user_ringbuf_success__destroy(skel); 355 + } 356 + 357 + static void test_user_ringbuf_overfill(void) 358 + { 359 + struct user_ringbuf_success *skel; 360 + struct user_ring_buffer *ringbuf; 361 + int err; 362 + 363 + err = load_skel_create_user_ringbuf(&skel, &ringbuf); 364 + if (err) 365 + return; 366 + 367 + err = write_samples(ringbuf, c_max_entries * 5); 368 + ASSERT_ERR(err, "write_samples"); 369 + ASSERT_EQ(skel->bss->read, c_max_entries, "max_entries"); 370 + 371 + user_ring_buffer__free(ringbuf); 372 + user_ringbuf_success__destroy(skel); 373 + } 374 + 375 + static void test_user_ringbuf_discards_properly_ignored(void) 376 + { 377 + struct user_ringbuf_success *skel; 378 + struct user_ring_buffer *ringbuf; 379 + int err, num_discarded = 0; 380 + __u64 *token; 381 + 382 + err = load_skel_create_user_ringbuf(&skel, &ringbuf); 383 + if (err) 384 + return; 385 + 386 + ASSERT_EQ(skel->bss->read, 0, "num_samples_read_before"); 387 + 388 + while (1) { 389 + /* Write samples until the buffer is full. */ 390 + token = user_ring_buffer__reserve(ringbuf, sizeof(*token)); 391 + if (!token) 392 + break; 393 + 394 + user_ring_buffer__discard(ringbuf, token); 395 + num_discarded++; 396 + } 397 + 398 + if (!ASSERT_GE(num_discarded, 0, "num_discarded")) 399 + goto cleanup; 400 + 401 + /* Should not read any samples, as they are all discarded. */ 402 + ASSERT_EQ(skel->bss->read, 0, "num_pre_kick"); 403 + drain_current_samples(); 404 + ASSERT_EQ(skel->bss->read, 0, "num_post_kick"); 405 + 406 + /* Now that the ring buffer has been drained, we should be able to 407 + * reserve another token. 
408 + */ 409 + token = user_ring_buffer__reserve(ringbuf, sizeof(*token)); 410 + 411 + if (!ASSERT_OK_PTR(token, "new_token")) 412 + goto cleanup; 413 + 414 + user_ring_buffer__discard(ringbuf, token); 415 + cleanup: 416 + user_ring_buffer__free(ringbuf); 417 + user_ringbuf_success__destroy(skel); 418 + } 419 + 420 + static void test_user_ringbuf_loop(void) 421 + { 422 + struct user_ringbuf_success *skel; 423 + struct user_ring_buffer *ringbuf; 424 + uint32_t total_samples = 8192; 425 + uint32_t remaining_samples = total_samples; 426 + int err; 427 + 428 + BUILD_BUG_ON(total_samples <= c_max_entries); 429 + err = load_skel_create_user_ringbuf(&skel, &ringbuf); 430 + if (err) 431 + return; 432 + 433 + do { 434 + uint32_t curr_samples; 435 + 436 + curr_samples = remaining_samples > c_max_entries 437 + ? c_max_entries : remaining_samples; 438 + err = write_samples(ringbuf, curr_samples); 439 + if (err != 0) { 440 + /* Assert inside of if statement to avoid flooding logs 441 + * on the success path. 442 + */ 443 + ASSERT_OK(err, "write_samples"); 444 + goto cleanup; 445 + } 446 + 447 + remaining_samples -= curr_samples; 448 + ASSERT_EQ(skel->bss->read, total_samples - remaining_samples, 449 + "current_batched_entries"); 450 + } while (remaining_samples > 0); 451 + ASSERT_EQ(skel->bss->read, total_samples, "total_batched_entries"); 452 + 453 + cleanup: 454 + user_ring_buffer__free(ringbuf); 455 + user_ringbuf_success__destroy(skel); 456 + } 457 + 458 + static int send_test_message(struct user_ring_buffer *ringbuf, 459 + enum test_msg_op op, s64 operand_64, 460 + s32 operand_32) 461 + { 462 + struct test_msg *msg; 463 + 464 + msg = user_ring_buffer__reserve(ringbuf, sizeof(*msg)); 465 + if (!msg) { 466 + /* Assert on the error path to avoid spamming logs with mostly 467 + * success messages. 
468 + */ 469 + ASSERT_OK_PTR(msg, "reserve_msg"); 470 + return -ENOMEM; 471 + } 472 + 473 + msg->msg_op = op; 474 + 475 + switch (op) { 476 + case TEST_MSG_OP_INC64: 477 + case TEST_MSG_OP_MUL64: 478 + msg->operand_64 = operand_64; 479 + break; 480 + case TEST_MSG_OP_INC32: 481 + case TEST_MSG_OP_MUL32: 482 + msg->operand_32 = operand_32; 483 + break; 484 + default: 485 + PRINT_FAIL("Invalid operand %d\n", op); 486 + user_ring_buffer__discard(ringbuf, msg); 487 + return -EINVAL; 488 + } 489 + 490 + user_ring_buffer__submit(ringbuf, msg); 491 + 492 + return 0; 493 + } 494 + 495 + static void kick_kernel_read_messages(void) 496 + { 497 + syscall(__NR_prctl); 498 + } 499 + 500 + static int handle_kernel_msg(void *ctx, void *data, size_t len) 501 + { 502 + struct user_ringbuf_success *skel = ctx; 503 + struct test_msg *msg = data; 504 + 505 + switch (msg->msg_op) { 506 + case TEST_MSG_OP_INC64: 507 + skel->bss->user_mutated += msg->operand_64; 508 + return 0; 509 + case TEST_MSG_OP_INC32: 510 + skel->bss->user_mutated += msg->operand_32; 511 + return 0; 512 + case TEST_MSG_OP_MUL64: 513 + skel->bss->user_mutated *= msg->operand_64; 514 + return 0; 515 + case TEST_MSG_OP_MUL32: 516 + skel->bss->user_mutated *= msg->operand_32; 517 + return 0; 518 + default: 519 + fprintf(stderr, "Invalid operand %d\n", msg->msg_op); 520 + return -EINVAL; 521 + } 522 + } 523 + 524 + static void drain_kernel_messages_buffer(struct ring_buffer *kern_ringbuf, 525 + struct user_ringbuf_success *skel) 526 + { 527 + int cnt; 528 + 529 + cnt = ring_buffer__consume(kern_ringbuf); 530 + ASSERT_EQ(cnt, 8, "consume_kern_ringbuf"); 531 + ASSERT_OK(skel->bss->err, "consume_kern_ringbuf_err"); 532 + } 533 + 534 + static void test_user_ringbuf_msg_protocol(void) 535 + { 536 + struct user_ringbuf_success *skel; 537 + struct user_ring_buffer *user_ringbuf; 538 + struct ring_buffer *kern_ringbuf; 539 + int err, i; 540 + __u64 expected_kern = 0; 541 + 542 + err = load_skel_create_ringbufs(&skel, 
&kern_ringbuf, handle_kernel_msg, &user_ringbuf); 543 + if (!ASSERT_OK(err, "create_ringbufs")) 544 + return; 545 + 546 + for (i = 0; i < 64; i++) { 547 + enum test_msg_op op = i % TEST_MSG_OP_NUM_OPS; 548 + __u64 operand_64 = TEST_OP_64; 549 + __u32 operand_32 = TEST_OP_32; 550 + 551 + err = send_test_message(user_ringbuf, op, operand_64, operand_32); 552 + if (err) { 553 + /* Only assert on a failure to avoid spamming success logs. */ 554 + ASSERT_OK(err, "send_test_message"); 555 + goto cleanup; 556 + } 557 + 558 + switch (op) { 559 + case TEST_MSG_OP_INC64: 560 + expected_kern += operand_64; 561 + break; 562 + case TEST_MSG_OP_INC32: 563 + expected_kern += operand_32; 564 + break; 565 + case TEST_MSG_OP_MUL64: 566 + expected_kern *= operand_64; 567 + break; 568 + case TEST_MSG_OP_MUL32: 569 + expected_kern *= operand_32; 570 + break; 571 + default: 572 + PRINT_FAIL("Unexpected op %d\n", op); 573 + goto cleanup; 574 + } 575 + 576 + if (i % 8 == 0) { 577 + kick_kernel_read_messages(); 578 + ASSERT_EQ(skel->bss->kern_mutated, expected_kern, "expected_kern"); 579 + ASSERT_EQ(skel->bss->err, 0, "bpf_prog_err"); 580 + drain_kernel_messages_buffer(kern_ringbuf, skel); 581 + } 582 + } 583 + 584 + cleanup: 585 + ring_buffer__free(kern_ringbuf); 586 + user_ring_buffer__free(user_ringbuf); 587 + user_ringbuf_success__destroy(skel); 588 + } 589 + 590 + static void *kick_kernel_cb(void *arg) 591 + { 592 + /* Kick the kernel, causing it to drain the ring buffer and then wake 593 + * up the test thread waiting on epoll. 
594 + */ 595 + syscall(__NR_getrlimit); 596 + 597 + return NULL; 598 + } 599 + 600 + static int spawn_kick_thread_for_poll(void) 601 + { 602 + pthread_t thread; 603 + 604 + return pthread_create(&thread, NULL, kick_kernel_cb, NULL); 605 + } 606 + 607 + static void test_user_ringbuf_blocking_reserve(void) 608 + { 609 + struct user_ringbuf_success *skel; 610 + struct user_ring_buffer *ringbuf; 611 + int err, num_written = 0; 612 + __u64 *token; 613 + 614 + err = load_skel_create_user_ringbuf(&skel, &ringbuf); 615 + if (err) 616 + return; 617 + 618 + ASSERT_EQ(skel->bss->read, 0, "num_samples_read_before"); 619 + 620 + while (1) { 621 + /* Write samples until the buffer is full. */ 622 + token = user_ring_buffer__reserve(ringbuf, sizeof(*token)); 623 + if (!token) 624 + break; 625 + 626 + *token = 0xdeadbeef; 627 + 628 + user_ring_buffer__submit(ringbuf, token); 629 + num_written++; 630 + } 631 + 632 + if (!ASSERT_GE(num_written, 0, "num_written")) 633 + goto cleanup; 634 + 635 + /* Should not have read any samples until the kernel is kicked. */ 636 + ASSERT_EQ(skel->bss->read, 0, "num_pre_kick"); 637 + 638 + /* We correctly time out after 1 second, without a sample. */ 639 + token = user_ring_buffer__reserve_blocking(ringbuf, sizeof(*token), 1000); 640 + if (!ASSERT_EQ(token, NULL, "pre_kick_timeout_token")) 641 + goto cleanup; 642 + 643 + err = spawn_kick_thread_for_poll(); 644 + if (!ASSERT_EQ(err, 0, "deferred_kick_thread\n")) 645 + goto cleanup; 646 + 647 + /* After spawning another thread that asynchronously kicks the kernel to 648 + * drain the messages, we're able to block and successfully get a 649 + * sample once we receive an event notification.
650 + */ 651 + token = user_ring_buffer__reserve_blocking(ringbuf, sizeof(*token), 10000); 652 + 653 + if (!ASSERT_OK_PTR(token, "block_token")) 654 + goto cleanup; 655 + 656 + ASSERT_GT(skel->bss->read, 0, "num_post_kill"); 657 + ASSERT_LE(skel->bss->read, num_written, "num_post_kill"); 658 + ASSERT_EQ(skel->bss->err, 0, "err_post_poll"); 659 + user_ring_buffer__discard(ringbuf, token); 660 + 661 + cleanup: 662 + user_ring_buffer__free(ringbuf); 663 + user_ringbuf_success__destroy(skel); 664 + } 665 + 666 + static struct { 667 + const char *prog_name; 668 + const char *expected_err_msg; 669 + } failure_tests[] = { 670 + /* failure cases */ 671 + {"user_ringbuf_callback_bad_access1", "negative offset dynptr_ptr ptr"}, 672 + {"user_ringbuf_callback_bad_access2", "dereference of modified dynptr_ptr ptr"}, 673 + {"user_ringbuf_callback_write_forbidden", "invalid mem access 'dynptr_ptr'"}, 674 + {"user_ringbuf_callback_null_context_write", "invalid mem access 'scalar'"}, 675 + {"user_ringbuf_callback_null_context_read", "invalid mem access 'scalar'"}, 676 + {"user_ringbuf_callback_discard_dynptr", "arg 1 is an unacquired reference"}, 677 + {"user_ringbuf_callback_submit_dynptr", "arg 1 is an unacquired reference"}, 678 + {"user_ringbuf_callback_invalid_return", "At callback return the register R0 has value"}, 679 + }; 680 + 681 + #define SUCCESS_TEST(_func) { _func, #_func } 682 + 683 + static struct { 684 + void (*test_callback)(void); 685 + const char *test_name; 686 + } success_tests[] = { 687 + SUCCESS_TEST(test_user_ringbuf_mappings), 688 + SUCCESS_TEST(test_user_ringbuf_post_misaligned), 689 + SUCCESS_TEST(test_user_ringbuf_post_producer_wrong_offset), 690 + SUCCESS_TEST(test_user_ringbuf_post_larger_than_ringbuf_sz), 691 + SUCCESS_TEST(test_user_ringbuf_basic), 692 + SUCCESS_TEST(test_user_ringbuf_sample_full_ring_buffer), 693 + SUCCESS_TEST(test_user_ringbuf_post_alignment_autoadjust), 694 + SUCCESS_TEST(test_user_ringbuf_overfill), 695 + 
SUCCESS_TEST(test_user_ringbuf_discards_properly_ignored), 696 + SUCCESS_TEST(test_user_ringbuf_loop), 697 + SUCCESS_TEST(test_user_ringbuf_msg_protocol), 698 + SUCCESS_TEST(test_user_ringbuf_blocking_reserve), 699 + }; 700 + 701 + static void verify_fail(const char *prog_name, const char *expected_err_msg) 702 + { 703 + LIBBPF_OPTS(bpf_object_open_opts, opts); 704 + struct bpf_program *prog; 705 + struct user_ringbuf_fail *skel; 706 + int err; 707 + 708 + opts.kernel_log_buf = obj_log_buf; 709 + opts.kernel_log_size = log_buf_sz; 710 + opts.kernel_log_level = 1; 711 + 712 + skel = user_ringbuf_fail__open_opts(&opts); 713 + if (!ASSERT_OK_PTR(skel, "user_ringbuf_fail__open_opts")) 714 + goto cleanup; 715 + 716 + prog = bpf_object__find_program_by_name(skel->obj, prog_name); 717 + if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 718 + goto cleanup; 719 + 720 + bpf_program__set_autoload(prog, true); 721 + 722 + bpf_map__set_max_entries(skel->maps.user_ringbuf, getpagesize()); 723 + 724 + err = user_ringbuf_fail__load(skel); 725 + if (!ASSERT_ERR(err, "unexpected load success")) 726 + goto cleanup; 727 + 728 + if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) { 729 + fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg); 730 + fprintf(stderr, "Verifier output: %s\n", obj_log_buf); 731 + } 732 + 733 + cleanup: 734 + user_ringbuf_fail__destroy(skel); 735 + } 736 + 737 + void test_user_ringbuf(void) 738 + { 739 + int i; 740 + 741 + for (i = 0; i < ARRAY_SIZE(success_tests); i++) { 742 + if (!test__start_subtest(success_tests[i].test_name)) 743 + continue; 744 + 745 + success_tests[i].test_callback(); 746 + } 747 + 748 + for (i = 0; i < ARRAY_SIZE(failure_tests); i++) { 749 + if (!test__start_subtest(failure_tests[i].prog_name)) 750 + continue; 751 + 752 + verify_fail(failure_tests[i].prog_name, failure_tests[i].expected_err_msg); 753 + } 754 + }
+399
tools/testing/selftests/bpf/prog_tests/verify_pkcs7_sig.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + /* 4 + * Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH 5 + * 6 + * Author: Roberto Sassu <roberto.sassu@huawei.com> 7 + */ 8 + 9 + #include <stdio.h> 10 + #include <errno.h> 11 + #include <stdlib.h> 12 + #include <unistd.h> 13 + #include <endian.h> 14 + #include <limits.h> 15 + #include <sys/stat.h> 16 + #include <sys/wait.h> 17 + #include <sys/mman.h> 18 + #include <linux/keyctl.h> 19 + #include <test_progs.h> 20 + 21 + #include "test_verify_pkcs7_sig.skel.h" 22 + 23 + #define MAX_DATA_SIZE (1024 * 1024) 24 + #define MAX_SIG_SIZE 1024 25 + 26 + #define VERIFY_USE_SECONDARY_KEYRING (1UL) 27 + #define VERIFY_USE_PLATFORM_KEYRING (2UL) 28 + 29 + /* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */ 30 + #define MODULE_SIG_STRING "~Module signature appended~\n" 31 + 32 + /* 33 + * Module signature information block. 34 + * 35 + * The constituents of the signature section are, in order: 36 + * 37 + * - Signer's name 38 + * - Key identifier 39 + * - Signature data 40 + * - Information block 41 + */ 42 + struct module_signature { 43 + __u8 algo; /* Public-key crypto algorithm [0] */ 44 + __u8 hash; /* Digest algorithm [0] */ 45 + __u8 id_type; /* Key identifier type [PKEY_ID_PKCS7] */ 46 + __u8 signer_len; /* Length of signer's name [0] */ 47 + __u8 key_id_len; /* Length of key identifier [0] */ 48 + __u8 __pad[3]; 49 + __be32 sig_len; /* Length of signature data */ 50 + }; 51 + 52 + struct data { 53 + __u8 data[MAX_DATA_SIZE]; 54 + __u32 data_len; 55 + __u8 sig[MAX_SIG_SIZE]; 56 + __u32 sig_len; 57 + }; 58 + 59 + static bool kfunc_not_supported; 60 + 61 + static int libbpf_print_cb(enum libbpf_print_level level, const char *fmt, 62 + va_list args) 63 + { 64 + if (strcmp(fmt, "libbpf: extern (func ksym) '%s': not found in kernel or module BTFs\n")) 65 + return 0; 66 + 67 + if (strcmp(va_arg(args, char *), "bpf_verify_pkcs7_signature")) 68 + return 0; 69 + 70 + kfunc_not_supported = true; 71 + 
return 0; 72 + } 73 + 74 + static int _run_setup_process(const char *setup_dir, const char *cmd) 75 + { 76 + int child_pid, child_status; 77 + 78 + child_pid = fork(); 79 + if (child_pid == 0) { 80 + execlp("./verify_sig_setup.sh", "./verify_sig_setup.sh", cmd, 81 + setup_dir, NULL); 82 + exit(errno); 83 + 84 + } else if (child_pid > 0) { 85 + waitpid(child_pid, &child_status, 0); 86 + return WEXITSTATUS(child_status); 87 + } 88 + 89 + return -EINVAL; 90 + } 91 + 92 + static int populate_data_item_str(const char *tmp_dir, struct data *data_item) 93 + { 94 + struct stat st; 95 + char data_template[] = "/tmp/dataXXXXXX"; 96 + char path[PATH_MAX]; 97 + int ret, fd, child_status, child_pid; 98 + 99 + data_item->data_len = 4; 100 + memcpy(data_item->data, "test", data_item->data_len); 101 + 102 + fd = mkstemp(data_template); 103 + if (fd == -1) 104 + return -errno; 105 + 106 + ret = write(fd, data_item->data, data_item->data_len); 107 + 108 + close(fd); 109 + 110 + if (ret != data_item->data_len) { 111 + ret = -EIO; 112 + goto out; 113 + } 114 + 115 + child_pid = fork(); 116 + 117 + if (child_pid == -1) { 118 + ret = -errno; 119 + goto out; 120 + } 121 + 122 + if (child_pid == 0) { 123 + snprintf(path, sizeof(path), "%s/signing_key.pem", tmp_dir); 124 + 125 + return execlp("./sign-file", "./sign-file", "-d", "sha256", 126 + path, path, data_template, NULL); 127 + } 128 + 129 + waitpid(child_pid, &child_status, 0); 130 + 131 + ret = WEXITSTATUS(child_status); 132 + if (ret) 133 + goto out; 134 + 135 + snprintf(path, sizeof(path), "%s.p7s", data_template); 136 + 137 + ret = stat(path, &st); 138 + if (ret == -1) { 139 + ret = -errno; 140 + goto out; 141 + } 142 + 143 + if (st.st_size > sizeof(data_item->sig)) { 144 + ret = -EINVAL; 145 + goto out_sig; 146 + } 147 + 148 + data_item->sig_len = st.st_size; 149 + 150 + fd = open(path, O_RDONLY); 151 + if (fd == -1) { 152 + ret = -errno; 153 + goto out_sig; 154 + } 155 + 156 + ret = read(fd, data_item->sig, data_item->sig_len); 
157 + 158 + close(fd); 159 + 160 + if (ret != data_item->sig_len) { 161 + ret = -EIO; 162 + goto out_sig; 163 + } 164 + 165 + ret = 0; 166 + out_sig: 167 + unlink(path); 168 + out: 169 + unlink(data_template); 170 + return ret; 171 + } 172 + 173 + static int populate_data_item_mod(struct data *data_item) 174 + { 175 + char mod_path[PATH_MAX], *mod_path_ptr; 176 + struct stat st; 177 + void *mod; 178 + FILE *fp; 179 + struct module_signature ms; 180 + int ret, fd, modlen, marker_len, sig_len; 181 + 182 + data_item->data_len = 0; 183 + 184 + if (stat("/lib/modules", &st) == -1) 185 + return 0; 186 + 187 + /* Requires CONFIG_TCP_CONG_BIC=m. */ 188 + fp = popen("find /lib/modules/$(uname -r) -name tcp_bic.ko", "r"); 189 + if (!fp) 190 + return 0; 191 + 192 + mod_path_ptr = fgets(mod_path, sizeof(mod_path), fp); 193 + pclose(fp); 194 + 195 + if (!mod_path_ptr) 196 + return 0; 197 + 198 + mod_path_ptr = strchr(mod_path, '\n'); 199 + if (!mod_path_ptr) 200 + return 0; 201 + 202 + *mod_path_ptr = '\0'; 203 + 204 + if (stat(mod_path, &st) == -1) 205 + return 0; 206 + 207 + modlen = st.st_size; 208 + marker_len = sizeof(MODULE_SIG_STRING) - 1; 209 + 210 + fd = open(mod_path, O_RDONLY); 211 + if (fd == -1) 212 + return -errno; 213 + 214 + mod = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0); 215 + 216 + close(fd); 217 + 218 + if (mod == MAP_FAILED) 219 + return -errno; 220 + 221 + if (strncmp(mod + modlen - marker_len, MODULE_SIG_STRING, marker_len)) { 222 + ret = -EINVAL; 223 + goto out; 224 + } 225 + 226 + modlen -= marker_len; 227 + 228 + memcpy(&ms, mod + (modlen - sizeof(ms)), sizeof(ms)); 229 + 230 + sig_len = __be32_to_cpu(ms.sig_len); 231 + modlen -= sig_len + sizeof(ms); 232 + 233 + if (modlen > sizeof(data_item->data)) { 234 + ret = -E2BIG; 235 + goto out; 236 + } 237 + 238 + memcpy(data_item->data, mod, modlen); 239 + data_item->data_len = modlen; 240 + 241 + if (sig_len > sizeof(data_item->sig)) { 242 + ret = -E2BIG; 243 + goto out; 244 + } 245 + 246 + 
memcpy(data_item->sig, mod + modlen, sig_len); 247 + data_item->sig_len = sig_len; 248 + ret = 0; 249 + out: 250 + munmap(mod, st.st_size); 251 + return ret; 252 + } 253 + 254 + void test_verify_pkcs7_sig(void) 255 + { 256 + libbpf_print_fn_t old_print_cb; 257 + char tmp_dir_template[] = "/tmp/verify_sigXXXXXX"; 258 + char *tmp_dir; 259 + struct test_verify_pkcs7_sig *skel = NULL; 260 + struct bpf_map *map; 261 + struct data data; 262 + int ret, zero = 0; 263 + 264 + /* Trigger creation of session keyring. */ 265 + syscall(__NR_request_key, "keyring", "_uid.0", NULL, 266 + KEY_SPEC_SESSION_KEYRING); 267 + 268 + tmp_dir = mkdtemp(tmp_dir_template); 269 + if (!ASSERT_OK_PTR(tmp_dir, "mkdtemp")) 270 + return; 271 + 272 + ret = _run_setup_process(tmp_dir, "setup"); 273 + if (!ASSERT_OK(ret, "_run_setup_process")) 274 + goto close_prog; 275 + 276 + skel = test_verify_pkcs7_sig__open(); 277 + if (!ASSERT_OK_PTR(skel, "test_verify_pkcs7_sig__open")) 278 + goto close_prog; 279 + 280 + old_print_cb = libbpf_set_print(libbpf_print_cb); 281 + ret = test_verify_pkcs7_sig__load(skel); 282 + libbpf_set_print(old_print_cb); 283 + 284 + if (ret < 0 && kfunc_not_supported) { 285 + printf( 286 + "%s:SKIP:bpf_verify_pkcs7_signature() kfunc not supported\n", 287 + __func__); 288 + test__skip(); 289 + goto close_prog; 290 + } 291 + 292 + if (!ASSERT_OK(ret, "test_verify_pkcs7_sig__load")) 293 + goto close_prog; 294 + 295 + ret = test_verify_pkcs7_sig__attach(skel); 296 + if (!ASSERT_OK(ret, "test_verify_pkcs7_sig__attach")) 297 + goto close_prog; 298 + 299 + map = bpf_object__find_map_by_name(skel->obj, "data_input"); 300 + if (!ASSERT_OK_PTR(map, "data_input not found")) 301 + goto close_prog; 302 + 303 + skel->bss->monitored_pid = getpid(); 304 + 305 + /* Test without data and signature. 
*/ 306 + skel->bss->user_keyring_serial = KEY_SPEC_SESSION_KEYRING; 307 + 308 + ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY); 309 + if (!ASSERT_LT(ret, 0, "bpf_map_update_elem data_input")) 310 + goto close_prog; 311 + 312 + /* Test successful signature verification with session keyring. */ 313 + ret = populate_data_item_str(tmp_dir, &data); 314 + if (!ASSERT_OK(ret, "populate_data_item_str")) 315 + goto close_prog; 316 + 317 + ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY); 318 + if (!ASSERT_OK(ret, "bpf_map_update_elem data_input")) 319 + goto close_prog; 320 + 321 + /* Test successful signature verification with testing keyring. */ 322 + skel->bss->user_keyring_serial = syscall(__NR_request_key, "keyring", 323 + "ebpf_testing_keyring", NULL, 324 + KEY_SPEC_SESSION_KEYRING); 325 + 326 + ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY); 327 + if (!ASSERT_OK(ret, "bpf_map_update_elem data_input")) 328 + goto close_prog; 329 + 330 + /* 331 + * Ensure key_task_permission() is called and rejects the keyring 332 + * (no Search permission). 
333 + */ 334 + syscall(__NR_keyctl, KEYCTL_SETPERM, skel->bss->user_keyring_serial, 335 + 0x37373737); 336 + 337 + ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY); 338 + if (!ASSERT_LT(ret, 0, "bpf_map_update_elem data_input")) 339 + goto close_prog; 340 + 341 + syscall(__NR_keyctl, KEYCTL_SETPERM, skel->bss->user_keyring_serial, 342 + 0x3f3f3f3f); 343 + 344 + /* 345 + * Ensure key_validate() is called and rejects the keyring (key expired) 346 + */ 347 + syscall(__NR_keyctl, KEYCTL_SET_TIMEOUT, 348 + skel->bss->user_keyring_serial, 1); 349 + sleep(1); 350 + 351 + ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY); 352 + if (!ASSERT_LT(ret, 0, "bpf_map_update_elem data_input")) 353 + goto close_prog; 354 + 355 + skel->bss->user_keyring_serial = KEY_SPEC_SESSION_KEYRING; 356 + 357 + /* Test with corrupted data (signature verification should fail). */ 358 + data.data[0] = 'a'; 359 + ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY); 360 + if (!ASSERT_LT(ret, 0, "bpf_map_update_elem data_input")) 361 + goto close_prog; 362 + 363 + ret = populate_data_item_mod(&data); 364 + if (!ASSERT_OK(ret, "populate_data_item_mod")) 365 + goto close_prog; 366 + 367 + /* Test signature verification with system keyrings. 
*/ 368 + if (data.data_len) { 369 + skel->bss->user_keyring_serial = 0; 370 + skel->bss->system_keyring_id = 0; 371 + 372 + ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, 373 + BPF_ANY); 374 + if (!ASSERT_OK(ret, "bpf_map_update_elem data_input")) 375 + goto close_prog; 376 + 377 + skel->bss->system_keyring_id = VERIFY_USE_SECONDARY_KEYRING; 378 + 379 + ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, 380 + BPF_ANY); 381 + if (!ASSERT_OK(ret, "bpf_map_update_elem data_input")) 382 + goto close_prog; 383 + 384 + skel->bss->system_keyring_id = VERIFY_USE_PLATFORM_KEYRING; 385 + 386 + ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, 387 + BPF_ANY); 388 + ASSERT_LT(ret, 0, "bpf_map_update_elem data_input"); 389 + } 390 + 391 + close_prog: 392 + _run_setup_process(tmp_dir, "cleanup"); 393 + 394 + if (!skel) 395 + return; 396 + 397 + skel->bss->monitored_pid = 0; 398 + test_verify_pkcs7_sig__destroy(skel); 399 + }
+17 -8
tools/testing/selftests/bpf/progs/bpf_dctcp.c
··· 11 11 #include <linux/types.h> 12 12 #include <linux/stddef.h> 13 13 #include <linux/tcp.h> 14 + #include <errno.h> 14 15 #include <bpf/bpf_helpers.h> 15 16 #include <bpf/bpf_tracing.h> 16 17 #include "bpf_tcp_helpers.h" ··· 24 23 char cc_res[TCP_CA_NAME_MAX]; 25 24 int tcp_cdg_res = 0; 26 25 int stg_result = 0; 26 + int ebusy_cnt = 0; 27 27 28 28 struct { 29 29 __uint(type, BPF_MAP_TYPE_SK_STORAGE); ··· 66 64 67 65 if (!(tp->ecn_flags & TCP_ECN_OK) && fallback[0]) { 68 66 /* Switch to fallback */ 69 - bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, 70 - (void *)fallback, sizeof(fallback)); 71 - /* Switch back to myself which the bpf trampoline 72 - * stopped calling dctcp_init recursively. 67 + if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, 68 + (void *)fallback, sizeof(fallback)) == -EBUSY) 69 + ebusy_cnt++; 70 + 71 + /* Switch back to myself and the recurred dctcp_init() 72 + * will get -EBUSY for all bpf_setsockopt(TCP_CONGESTION), 73 + * except the last "cdg" one. 73 74 */ 74 - bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, 75 - (void *)bpf_dctcp, sizeof(bpf_dctcp)); 75 + if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, 76 + (void *)bpf_dctcp, sizeof(bpf_dctcp)) == -EBUSY) 77 + ebusy_cnt++; 78 + 76 79 /* Switch back to fallback */ 77 - bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, 78 - (void *)fallback, sizeof(fallback)); 80 + if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, 81 + (void *)fallback, sizeof(fallback)) == -EBUSY) 82 + ebusy_cnt++; 83 + 79 84 /* Expecting -ENOTSUPP for tcp_cdg_res */ 80 85 tcp_cdg_res = bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, 81 86 (void *)tcp_cdg, sizeof(tcp_cdg));
+9
tools/testing/selftests/bpf/progs/bpf_iter_task.c
··· 6 6 7 7 char _license[] SEC("license") = "GPL"; 8 8 9 + uint32_t tid = 0; 10 + int num_unknown_tid = 0; 11 + int num_known_tid = 0; 12 + 9 13 SEC("iter/task") 10 14 int dump_task(struct bpf_iter__task *ctx) 11 15 { ··· 21 17 BPF_SEQ_PRINTF(seq, "%s\n", info); 22 18 return 0; 23 19 } 20 + 21 + if (task->pid != tid) 22 + num_unknown_tid++; 23 + else 24 + num_known_tid++; 24 25 25 26 if (ctx->meta->seq_num == 0) 26 27 BPF_SEQ_PRINTF(seq, " tgid gid\n");
+8 -1
tools/testing/selftests/bpf/progs/bpf_iter_task_file.c
··· 7 7 8 8 int count = 0; 9 9 int tgid = 0; 10 + int last_tgid = 0; 11 + int unique_tgid_count = 0; 10 12 11 13 SEC("iter/task_file") 12 14 int dump_task_file(struct bpf_iter__task_file *ctx) 13 15 { 14 16 struct seq_file *seq = ctx->meta->seq; 15 17 struct task_struct *task = ctx->task; 16 - __u32 fd = ctx->fd; 17 18 struct file *file = ctx->file; 19 + __u32 fd = ctx->fd; 18 20 19 21 if (task == (void *)0 || file == (void *)0) 20 22 return 0; ··· 28 26 29 27 if (tgid == task->tgid && task->tgid != task->pid) 30 28 count++; 29 + 30 + if (last_tgid != task->tgid) { 31 + last_tgid = task->tgid; 32 + unique_tgid_count++; 33 + } 31 34 32 35 BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd, 33 36 (long)file->f_op);
+6 -1
tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c
··· 20 20 #define D_PATH_BUF_SIZE 1024 21 21 char d_path_buf[D_PATH_BUF_SIZE] = {}; 22 22 __u32 pid = 0; 23 + __u32 one_task = 0; 24 + __u32 one_task_error = 0; 23 25 24 26 SEC("iter/task_vma") int proc_maps(struct bpf_iter__task_vma *ctx) 25 27 { ··· 35 33 return 0; 36 34 37 35 file = vma->vm_file; 38 - if (task->tgid != pid) 36 + if (task->tgid != pid) { 37 + if (one_task) 38 + one_task_error = 1; 39 39 return 0; 40 + } 40 41 perm_str[0] = (vma->vm_flags & VM_READ) ? 'r' : '-'; 41 42 perm_str[1] = (vma->vm_flags & VM_WRITE) ? 'w' : '-'; 42 43 perm_str[2] = (vma->vm_flags & VM_EXEC) ? 'x' : '-';
+37
tools/testing/selftests/bpf/progs/bpf_iter_vma_offset.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + #include "bpf_iter.h" 4 + #include <bpf/bpf_helpers.h> 5 + 6 + char _license[] SEC("license") = "GPL"; 7 + 8 + __u32 unique_tgid_cnt = 0; 9 + uintptr_t address = 0; 10 + uintptr_t offset = 0; 11 + __u32 last_tgid = 0; 12 + __u32 pid = 0; 13 + __u32 page_shift = 0; 14 + 15 + SEC("iter/task_vma") 16 + int get_vma_offset(struct bpf_iter__task_vma *ctx) 17 + { 18 + struct vm_area_struct *vma = ctx->vma; 19 + struct seq_file *seq = ctx->meta->seq; 20 + struct task_struct *task = ctx->task; 21 + 22 + if (task == NULL || vma == NULL) 23 + return 0; 24 + 25 + if (last_tgid != task->tgid) 26 + unique_tgid_cnt++; 27 + last_tgid = task->tgid; 28 + 29 + if (task->tgid != pid) 30 + return 0; 31 + 32 + if (vma->vm_start <= address && vma->vm_end > address) { 33 + offset = address - vma->vm_start + (vma->vm_pgoff << page_shift); 34 + BPF_SEQ_PRINTF(seq, "OK\n"); 35 + } 36 + return 0; 37 + }
+54 -125
tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c

```diff
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Functions to manage eBPF programs attached to cgroup subsystems
- *
  * Copyright 2022 Google LLC.
  */
 #include "vmlinux.h"
···
 char _license[] SEC("license") = "GPL";

-/*
- * Start times are stored per-task, not per-cgroup, as multiple tasks in one
- * cgroup can perform reclaim concurrently.
- */
-struct {
-	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
-	__uint(map_flags, BPF_F_NO_PREALLOC);
-	__type(key, int);
-	__type(value, __u64);
-} vmscan_start_time SEC(".maps");
-
-struct vmscan_percpu {
+struct percpu_attach_counter {
 	/* Previous percpu state, to figure out if we have new updates */
 	__u64 prev;
 	/* Current percpu state */
 	__u64 state;
 };

-struct vmscan {
+struct attach_counter {
 	/* State propagated through children, pending aggregation */
 	__u64 pending;
 	/* Total state, including all cpus and all children */
···
 struct {
 	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
-	__uint(max_entries, 100);
+	__uint(max_entries, 1024);
 	__type(key, __u64);
-	__type(value, struct vmscan_percpu);
-} pcpu_cgroup_vmscan_elapsed SEC(".maps");
+	__type(value, struct percpu_attach_counter);
+} percpu_attach_counters SEC(".maps");

 struct {
 	__uint(type, BPF_MAP_TYPE_HASH);
-	__uint(max_entries, 100);
+	__uint(max_entries, 1024);
 	__type(key, __u64);
-	__type(value, struct vmscan);
-} cgroup_vmscan_elapsed SEC(".maps");
+	__type(value, struct attach_counter);
+} attach_counters SEC(".maps");

 extern void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) __ksym;
 extern void cgroup_rstat_flush(struct cgroup *cgrp) __ksym;
-
-static struct cgroup *task_memcg(struct task_struct *task)
-{
-	int cgrp_id;
-
-#if __has_builtin(__builtin_preserve_enum_value)
-	cgrp_id = bpf_core_enum_value(enum cgroup_subsys_id, memory_cgrp_id);
-#else
-	cgrp_id = memory_cgrp_id;
-#endif
-	return task->cgroups->subsys[cgrp_id]->cgroup;
-}

 static uint64_t cgroup_id(struct cgroup *cgrp)
 {
 	return cgrp->kn->id;
 }

-static int create_vmscan_percpu_elem(__u64 cg_id, __u64 state)
+static int create_percpu_attach_counter(__u64 cg_id, __u64 state)
 {
-	struct vmscan_percpu pcpu_init = {.state = state, .prev = 0};
+	struct percpu_attach_counter pcpu_init = {.state = state, .prev = 0};

-	return bpf_map_update_elem(&pcpu_cgroup_vmscan_elapsed, &cg_id,
+	return bpf_map_update_elem(&percpu_attach_counters, &cg_id,
 				   &pcpu_init, BPF_NOEXIST);
 }

-static int create_vmscan_elem(__u64 cg_id, __u64 state, __u64 pending)
+static int create_attach_counter(__u64 cg_id, __u64 state, __u64 pending)
 {
-	struct vmscan init = {.state = state, .pending = pending};
+	struct attach_counter init = {.state = state, .pending = pending};

-	return bpf_map_update_elem(&cgroup_vmscan_elapsed, &cg_id,
+	return bpf_map_update_elem(&attach_counters, &cg_id,
 				   &init, BPF_NOEXIST);
 }

-SEC("tp_btf/mm_vmscan_memcg_reclaim_begin")
-int BPF_PROG(vmscan_start, int order, gfp_t gfp_flags)
+SEC("fentry/cgroup_attach_task")
+int BPF_PROG(counter, struct cgroup *dst_cgrp, struct task_struct *leader,
+	     bool threadgroup)
 {
-	struct task_struct *task = bpf_get_current_task_btf();
-	__u64 *start_time_ptr;
+	__u64 cg_id = cgroup_id(dst_cgrp);
+	struct percpu_attach_counter *pcpu_counter = bpf_map_lookup_elem(
+			&percpu_attach_counters,
+			&cg_id);

-	start_time_ptr = bpf_task_storage_get(&vmscan_start_time, task, 0,
-					      BPF_LOCAL_STORAGE_GET_F_CREATE);
-	if (start_time_ptr)
-		*start_time_ptr = bpf_ktime_get_ns();
-	return 0;
-}
-
-SEC("tp_btf/mm_vmscan_memcg_reclaim_end")
-int BPF_PROG(vmscan_end, unsigned long nr_reclaimed)
-{
-	struct vmscan_percpu *pcpu_stat;
-	struct task_struct *current = bpf_get_current_task_btf();
-	struct cgroup *cgrp;
-	__u64 *start_time_ptr;
-	__u64 current_elapsed, cg_id;
-	__u64 end_time = bpf_ktime_get_ns();
-
-	/*
-	 * cgrp is the first parent cgroup of current that has memcg enabled in
-	 * its subtree_control, or NULL if memcg is disabled in the entire tree.
-	 * In a cgroup hierarchy like this:
-	 *                           a
-	 *                          / \
-	 *                         b   c
-	 * If "a" has memcg enabled, while "b" doesn't, then processes in "b"
-	 * will accumulate their stats directly to "a". This makes sure that no
-	 * stats are lost from processes in leaf cgroups that don't have memcg
-	 * enabled, but only exposes stats for cgroups that have memcg enabled.
-	 */
-	cgrp = task_memcg(current);
-	if (!cgrp)
+	if (pcpu_counter)
+		pcpu_counter->state += 1;
+	else if (create_percpu_attach_counter(cg_id, 1))
 		return 0;

-	cg_id = cgroup_id(cgrp);
-	start_time_ptr = bpf_task_storage_get(&vmscan_start_time, current, 0,
-					      BPF_LOCAL_STORAGE_GET_F_CREATE);
-	if (!start_time_ptr)
-		return 0;
-
-	current_elapsed = end_time - *start_time_ptr;
-	pcpu_stat = bpf_map_lookup_elem(&pcpu_cgroup_vmscan_elapsed,
-					&cg_id);
-	if (pcpu_stat)
-		pcpu_stat->state += current_elapsed;
-	else if (create_vmscan_percpu_elem(cg_id, current_elapsed))
-		return 0;
-
-	cgroup_rstat_updated(cgrp, bpf_get_smp_processor_id());
+	cgroup_rstat_updated(dst_cgrp, bpf_get_smp_processor_id());
 	return 0;
 }

 SEC("fentry/bpf_rstat_flush")
-int BPF_PROG(vmscan_flush, struct cgroup *cgrp, struct cgroup *parent, int cpu)
+int BPF_PROG(flusher, struct cgroup *cgrp, struct cgroup *parent, int cpu)
 {
-	struct vmscan_percpu *pcpu_stat;
-	struct vmscan *total_stat, *parent_stat;
+	struct percpu_attach_counter *pcpu_counter;
+	struct attach_counter *total_counter, *parent_counter;
 	__u64 cg_id = cgroup_id(cgrp);
 	__u64 parent_cg_id = parent ? cgroup_id(parent) : 0;
-	__u64 *pcpu_vmscan;
 	__u64 state;
 	__u64 delta = 0;

 	/* Add CPU changes on this level since the last flush */
-	pcpu_stat = bpf_map_lookup_percpu_elem(&pcpu_cgroup_vmscan_elapsed,
-					       &cg_id, cpu);
-	if (pcpu_stat) {
-		state = pcpu_stat->state;
-		delta += state - pcpu_stat->prev;
-		pcpu_stat->prev = state;
+	pcpu_counter = bpf_map_lookup_percpu_elem(&percpu_attach_counters,
+						  &cg_id, cpu);
+	if (pcpu_counter) {
+		state = pcpu_counter->state;
+		delta += state - pcpu_counter->prev;
+		pcpu_counter->prev = state;
 	}

-	total_stat = bpf_map_lookup_elem(&cgroup_vmscan_elapsed, &cg_id);
-	if (!total_stat) {
-		if (create_vmscan_elem(cg_id, delta, 0))
+	total_counter = bpf_map_lookup_elem(&attach_counters, &cg_id);
+	if (!total_counter) {
+		if (create_attach_counter(cg_id, delta, 0))
 			return 0;
-
 		goto update_parent;
 	}

 	/* Collect pending stats from subtree */
-	if (total_stat->pending) {
-		delta += total_stat->pending;
-		total_stat->pending = 0;
+	if (total_counter->pending) {
+		delta += total_counter->pending;
+		total_counter->pending = 0;
 	}

 	/* Propagate changes to this cgroup's total */
-	total_stat->state += delta;
+	total_counter->state += delta;

 update_parent:
 	/* Skip if there are no changes to propagate, or no parent */
···
 		return 0;

 	/* Propagate changes to cgroup's parent */
-	parent_stat = bpf_map_lookup_elem(&cgroup_vmscan_elapsed,
-					  &parent_cg_id);
-	if (parent_stat)
-		parent_stat->pending += delta;
+	parent_counter = bpf_map_lookup_elem(&attach_counters,
+					     &parent_cg_id);
+	if (parent_counter)
+		parent_counter->pending += delta;
 	else
-		create_vmscan_elem(parent_cg_id, 0, delta);
+		create_attach_counter(parent_cg_id, 0, delta);
 	return 0;
 }

 SEC("iter.s/cgroup")
-int BPF_PROG(dump_vmscan, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+int BPF_PROG(dumper, struct bpf_iter_meta *meta, struct cgroup *cgrp)
 {
 	struct seq_file *seq = meta->seq;
-	struct vmscan *total_stat;
+	struct attach_counter *total_counter;
 	__u64 cg_id = cgrp ? cgroup_id(cgrp) : 0;

 	/* Do nothing for the terminal call */
···
 	/* Flush the stats to make sure we get the most updated numbers */
 	cgroup_rstat_flush(cgrp);

-	total_stat = bpf_map_lookup_elem(&cgroup_vmscan_elapsed, &cg_id);
-	if (!total_stat) {
-		BPF_SEQ_PRINTF(seq, "cg_id: %llu, total_vmscan_delay: 0\n",
+	total_counter = bpf_map_lookup_elem(&attach_counters, &cg_id);
+	if (!total_counter) {
+		BPF_SEQ_PRINTF(seq, "cg_id: %llu, attach_counter: 0\n",
 			       cg_id);
 	} else {
-		BPF_SEQ_PRINTF(seq, "cg_id: %llu, total_vmscan_delay: %llu\n",
-			       cg_id, total_stat->state);
+		BPF_SEQ_PRINTF(seq, "cg_id: %llu, attach_counter: %llu\n",
+			       cg_id, total_counter->state);
 	}
-
-	/*
-	 * We only dump stats for one cgroup here, so return 1 to stop
-	 * iteration after the first cgroup.
-	 */
-	return 1;
+	return 0;
 }
```
tools/testing/selftests/bpf/progs/connect_ping.c (new file, +53)

```c
// SPDX-License-Identifier: GPL-2.0-only

/*
 * Copyright 2022 Google LLC.
 */

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* 2001:db8::1 */
#define BINDADDR_V6 { { { 0x20,0x01,0x0d,0xb8,0,0,0,0,0,0,0,0,0,0,0,1 } } }

__u32 do_bind = 0;
__u32 has_error = 0;
__u32 invocations_v4 = 0;
__u32 invocations_v6 = 0;

SEC("cgroup/connect4")
int connect_v4_prog(struct bpf_sock_addr *ctx)
{
	struct sockaddr_in sa = {
		.sin_family = AF_INET,
		.sin_addr.s_addr = bpf_htonl(0x01010101),
	};

	__sync_fetch_and_add(&invocations_v4, 1);

	if (do_bind && bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)))
		has_error = 1;

	return 1;
}

SEC("cgroup/connect6")
int connect_v6_prog(struct bpf_sock_addr *ctx)
{
	struct sockaddr_in6 sa = {
		.sin6_family = AF_INET6,
		.sin6_addr = BINDADDR_V6,
	};

	__sync_fetch_and_add(&invocations_v6, 1);

	if (do_bind && bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)))
		has_error = 1;

	return 1;
}

char _license[] SEC("license") = "GPL";
```
tools/testing/selftests/bpf/progs/get_func_ip_test.c (+13, -12)

```diff
 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
+#include <stdbool.h>

 char _license[] SEC("license") = "GPL";
···
 extern const void bpf_modify_return_test __ksym;
 extern const void bpf_fentry_test6 __ksym;
 extern const void bpf_fentry_test7 __ksym;
+
+extern bool CONFIG_X86_KERNEL_IBT __kconfig __weak;
+
+/* This function is here to have CONFIG_X86_KERNEL_IBT
+ * used and added to object BTF.
+ */
+int unused(void)
+{
+	return CONFIG_X86_KERNEL_IBT ? 0 : 1;
+}

 __u64 test1_result = 0;
 SEC("fentry/bpf_fentry_test1")
···
 }

 __u64 test6_result = 0;
-SEC("kprobe/bpf_fentry_test6+0x5")
+SEC("?kprobe")
 int test6(struct pt_regs *ctx)
 {
 	__u64 addr = bpf_get_func_ip(ctx);

-	test6_result = (const void *) addr == &bpf_fentry_test6 + 5;
-	return 0;
-}
-
-__u64 test7_result = 0;
-SEC("kprobe/bpf_fentry_test7+5")
-int test7(struct pt_regs *ctx)
-{
-	__u64 addr = bpf_get_func_ip(ctx);
-
-	test7_result = (const void *) addr == &bpf_fentry_test7 + 5;
+	test6_result = (const void *) addr == 0;
 	return 0;
 }
```
tools/testing/selftests/bpf/progs/kfunc_call_fail.c (new file, +160)

```c
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2021 Facebook */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;
extern void bpf_kfunc_call_test_mem_len_pass1(void *mem, int len) __ksym;
extern int *bpf_kfunc_call_test_get_rdwr_mem(struct prog_test_ref_kfunc *p, const int rdwr_buf_size) __ksym;
extern int *bpf_kfunc_call_test_get_rdonly_mem(struct prog_test_ref_kfunc *p, const int rdonly_buf_size) __ksym;
extern int *bpf_kfunc_call_test_acq_rdonly_mem(struct prog_test_ref_kfunc *p, const int rdonly_buf_size) __ksym;
extern void bpf_kfunc_call_int_mem_release(int *p) __ksym;

struct syscall_test_args {
	__u8 data[16];
	size_t size;
};

SEC("?syscall")
int kfunc_syscall_test_fail(struct syscall_test_args *args)
{
	bpf_kfunc_call_test_mem_len_pass1(&args->data, sizeof(*args) + 1);

	return 0;
}

SEC("?syscall")
int kfunc_syscall_test_null_fail(struct syscall_test_args *args)
{
	/* Must be called with args as a NULL pointer
	 * we do not check for it to have the verifier consider that
	 * the pointer might not be null, and so we can load it.
	 *
	 * So the following can not be added:
	 *
	 * if (args)
	 *	return -22;
	 */

	bpf_kfunc_call_test_mem_len_pass1(args, sizeof(*args));

	return 0;
}

SEC("?tc")
int kfunc_call_test_get_mem_fail_rdonly(struct __sk_buff *skb)
{
	struct prog_test_ref_kfunc *pt;
	unsigned long s = 0;
	int *p = NULL;
	int ret = 0;

	pt = bpf_kfunc_call_test_acquire(&s);
	if (pt) {
		p = bpf_kfunc_call_test_get_rdonly_mem(pt, 2 * sizeof(int));
		if (p)
			p[0] = 42; /* this is a read-only buffer, so -EACCES */
		else
			ret = -1;

		bpf_kfunc_call_test_release(pt);
	}
	return ret;
}

SEC("?tc")
int kfunc_call_test_get_mem_fail_use_after_free(struct __sk_buff *skb)
{
	struct prog_test_ref_kfunc *pt;
	unsigned long s = 0;
	int *p = NULL;
	int ret = 0;

	pt = bpf_kfunc_call_test_acquire(&s);
	if (pt) {
		p = bpf_kfunc_call_test_get_rdwr_mem(pt, 2 * sizeof(int));
		if (p) {
			p[0] = 42;
			ret = p[1]; /* 108 */
		} else {
			ret = -1;
		}

		bpf_kfunc_call_test_release(pt);
	}
	if (p)
		ret = p[0]; /* p is not valid anymore */

	return ret;
}

SEC("?tc")
int kfunc_call_test_get_mem_fail_oob(struct __sk_buff *skb)
{
	struct prog_test_ref_kfunc *pt;
	unsigned long s = 0;
	int *p = NULL;
	int ret = 0;

	pt = bpf_kfunc_call_test_acquire(&s);
	if (pt) {
		p = bpf_kfunc_call_test_get_rdonly_mem(pt, 2 * sizeof(int));
		if (p)
			ret = p[2 * sizeof(int)]; /* oob access, so -EACCES */
		else
			ret = -1;

		bpf_kfunc_call_test_release(pt);
	}
	return ret;
}

int not_const_size = 2 * sizeof(int);

SEC("?tc")
int kfunc_call_test_get_mem_fail_not_const(struct __sk_buff *skb)
{
	struct prog_test_ref_kfunc *pt;
	unsigned long s = 0;
	int *p = NULL;
	int ret = 0;

	pt = bpf_kfunc_call_test_acquire(&s);
	if (pt) {
		p = bpf_kfunc_call_test_get_rdonly_mem(pt, not_const_size); /* non const size, -EINVAL */
		if (p)
			ret = p[0];
		else
			ret = -1;

		bpf_kfunc_call_test_release(pt);
	}
	return ret;
}

SEC("?tc")
int kfunc_call_test_mem_acquire_fail(struct __sk_buff *skb)
{
	struct prog_test_ref_kfunc *pt;
	unsigned long s = 0;
	int *p = NULL;
	int ret = 0;

	pt = bpf_kfunc_call_test_acquire(&s);
	if (pt) {
		/* we are failing on this one, because we are not acquiring a PTR_TO_BTF_ID (a struct ptr) */
		p = bpf_kfunc_call_test_acq_rdonly_mem(pt, 2 * sizeof(int));
		if (p)
			ret = p[0];
		else
			ret = -1;

		bpf_kfunc_call_int_mem_release(p);

		bpf_kfunc_call_test_release(pt);
	}
	return ret;
}

char _license[] SEC("license") = "GPL";
```
tools/testing/selftests/bpf/progs/kfunc_call_test.c (+71)

```diff
 extern void bpf_kfunc_call_test_pass2(struct prog_test_pass2 *p) __ksym;
 extern void bpf_kfunc_call_test_mem_len_pass1(void *mem, int len) __ksym;
 extern void bpf_kfunc_call_test_mem_len_fail2(__u64 *mem, int len) __ksym;
+extern int *bpf_kfunc_call_test_get_rdwr_mem(struct prog_test_ref_kfunc *p, const int rdwr_buf_size) __ksym;
+extern int *bpf_kfunc_call_test_get_rdonly_mem(struct prog_test_ref_kfunc *p, const int rdonly_buf_size) __ksym;

 SEC("tc")
 int kfunc_call_test2(struct __sk_buff *skb)
···
 	bpf_kfunc_call_test_mem_len_fail2(&b, -1);

 	return 0;
+}
+
+struct syscall_test_args {
+	__u8 data[16];
+	size_t size;
+};
+
+SEC("syscall")
+int kfunc_syscall_test(struct syscall_test_args *args)
+{
+	const long size = args->size;
+
+	if (size > sizeof(args->data))
+		return -7; /* -E2BIG */
+
+	bpf_kfunc_call_test_mem_len_pass1(&args->data, sizeof(args->data));
+	bpf_kfunc_call_test_mem_len_pass1(&args->data, sizeof(*args));
+	bpf_kfunc_call_test_mem_len_pass1(&args->data, size);
+
+	return 0;
+}
+
+SEC("syscall")
+int kfunc_syscall_test_null(struct syscall_test_args *args)
+{
+	/* Must be called with args as a NULL pointer
+	 * we do not check for it to have the verifier consider that
+	 * the pointer might not be null, and so we can load it.
+	 *
+	 * So the following can not be added:
+	 *
+	 * if (args)
+	 *	return -22;
+	 */
+
+	bpf_kfunc_call_test_mem_len_pass1(args, 0);
+
+	return 0;
+}
+
+SEC("tc")
+int kfunc_call_test_get_mem(struct __sk_buff *skb)
+{
+	struct prog_test_ref_kfunc *pt;
+	unsigned long s = 0;
+	int *p = NULL;
+	int ret = 0;
+
+	pt = bpf_kfunc_call_test_acquire(&s);
+	if (pt) {
+		p = bpf_kfunc_call_test_get_rdwr_mem(pt, 2 * sizeof(int));
+		if (p) {
+			p[0] = 42;
+			ret = p[1]; /* 108 */
+		} else {
+			ret = -1;
+		}
+
+		if (ret >= 0) {
+			p = bpf_kfunc_call_test_get_rdonly_mem(pt, 2 * sizeof(int));
+			if (p)
+				ret = p[0]; /* 42 */
+			else
+				ret = -1;
+		}
+
+		bpf_kfunc_call_test_release(pt);
+	}
+	return ret;
 }

 char _license[] SEC("license") = "GPL";
```
tools/testing/selftests/bpf/progs/kprobe_multi.c (+1, -3)

```diff
 __u64 kretprobe_test7_result = 0;
 __u64 kretprobe_test8_result = 0;

-extern bool CONFIG_X86_KERNEL_IBT __kconfig __weak;
-
 static void kprobe_multi_check(void *ctx, bool is_return)
 {
 	if (bpf_get_current_pid_tgid() >> 32 != pid)
 		return;

 	__u64 cookie = test_cookie ? bpf_get_attach_cookie(ctx) : 0;
-	__u64 addr = bpf_get_func_ip(ctx) - (CONFIG_X86_KERNEL_IBT ? 4 : 0);
+	__u64 addr = bpf_get_func_ip(ctx);

 #define SET(__var, __addr, __cookie) ({			\
 	if (((const void *) addr == __addr) &&		\
```
tools/testing/selftests/bpf/progs/test_bpf_nf.c (+38, -5)

```diff
 // SPDX-License-Identifier: GPL-2.0
 #include <vmlinux.h>
 #include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>

 #define EAFNOSUPPORT 97
 #define EPROTO 71
···
 int test_succ_lookup = -ENOENT;
 u32 test_delta_timeout = 0;
 u32 test_status = 0;
+u32 test_insert_lookup_mark = 0;
+int test_snat_addr = -EINVAL;
+int test_dnat_addr = -EINVAL;
 __be32 saddr = 0;
 __be16 sport = 0;
 __be32 daddr = 0;
···
 int bpf_ct_change_timeout(struct nf_conn *, u32) __ksym;
 int bpf_ct_set_status(struct nf_conn *, u32) __ksym;
 int bpf_ct_change_status(struct nf_conn *, u32) __ksym;
+int bpf_ct_set_nat_info(struct nf_conn *, union nf_inet_addr *,
+			int port, enum nf_nat_manip_type) __ksym;

 static __always_inline void
 nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32,
···
 	ct = alloc_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def,
 		      sizeof(opts_def));
 	if (ct) {
+		__u16 sport = bpf_get_prandom_u32();
+		__u16 dport = bpf_get_prandom_u32();
+		union nf_inet_addr saddr = {};
+		union nf_inet_addr daddr = {};
 		struct nf_conn *ct_ins;

 		bpf_ct_set_timeout(ct, 10000);
-		bpf_ct_set_status(ct, IPS_CONFIRMED);
+		ct->mark = 77;
+
+		/* snat */
+		saddr.ip = bpf_get_prandom_u32();
+		bpf_ct_set_nat_info(ct, &saddr, sport, NF_NAT_MANIP_SRC);
+		/* dnat */
+		daddr.ip = bpf_get_prandom_u32();
+		bpf_ct_set_nat_info(ct, &daddr, dport, NF_NAT_MANIP_DST);

 		ct_ins = bpf_ct_insert_entry(ct);
 		if (ct_ins) {
···
 			ct_lk = lookup_fn(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4),
 					  &opts_def, sizeof(opts_def));
 			if (ct_lk) {
+				struct nf_conntrack_tuple *tuple;
+
+				/* check snat and dnat addresses */
+				tuple = &ct_lk->tuplehash[IP_CT_DIR_REPLY].tuple;
+				if (tuple->dst.u3.ip == saddr.ip &&
+				    tuple->dst.u.all == bpf_htons(sport))
+					test_snat_addr = 0;
+				if (tuple->src.u3.ip == daddr.ip &&
+				    tuple->src.u.all == bpf_htons(dport))
+					test_dnat_addr = 0;
+
 				/* update ct entry timeout */
 				bpf_ct_change_timeout(ct_lk, 10000);
 				test_delta_timeout = ct_lk->timeout - bpf_jiffies64();
 				test_delta_timeout /= CONFIG_HZ;
-				test_status = IPS_SEEN_REPLY;
-				bpf_ct_change_status(ct_lk, IPS_SEEN_REPLY);
+				test_insert_lookup_mark = ct_lk->mark;
+				bpf_ct_change_status(ct_lk,
+						     IPS_CONFIRMED | IPS_SEEN_REPLY);
+				test_status = ct_lk->status;
+
 				bpf_ct_release(ct_lk);
 				test_succ_lookup = 0;
 			}
···
 			   sizeof(opts_def));
 	if (ct) {
 		test_exist_lookup = 0;
-		if (ct->mark == 42)
-			test_exist_lookup_mark = 43;
+		if (ct->mark == 42) {
+			ct->mark++;
+			test_exist_lookup_mark = ct->mark;
+		}
 		bpf_ct_release(ct);
 	} else {
 		test_exist_lookup = opts_def.error;
```
tools/testing/selftests/bpf/progs/test_bpf_nf_fail.c (+14)

```diff
 }

 SEC("?tc")
+int write_not_allowlisted_field(struct __sk_buff *ctx)
+{
+	struct bpf_ct_opts___local opts = {};
+	struct bpf_sock_tuple tup = {};
+	struct nf_conn *ct;
+
+	ct = bpf_skb_ct_lookup(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
+	if (!ct)
+		return 0;
+	ct->status = 0xF00;
+	return 0;
+}
+
+SEC("?tc")
 int set_timeout_after_insert(struct __sk_buff *ctx)
 {
 	struct bpf_ct_opts___local opts = {};
```
tools/testing/selftests/bpf/progs/test_kfunc_dynptr_param.c (new file, +94)

```c
// SPDX-License-Identifier: GPL-2.0

/*
 * Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH
 *
 * Author: Roberto Sassu <roberto.sassu@huawei.com>
 */

#include "vmlinux.h"
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

extern struct bpf_key *bpf_lookup_system_key(__u64 id) __ksym;
extern void bpf_key_put(struct bpf_key *key) __ksym;
extern int bpf_verify_pkcs7_signature(struct bpf_dynptr *data_ptr,
				      struct bpf_dynptr *sig_ptr,
				      struct bpf_key *trusted_keyring) __ksym;

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
} ringbuf SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
} array_map SEC(".maps");

int err, pid;

char _license[] SEC("license") = "GPL";

SEC("?lsm.s/bpf")
int BPF_PROG(dynptr_type_not_supp, int cmd, union bpf_attr *attr,
	     unsigned int size)
{
	char write_data[64] = "hello there, world!!";
	struct bpf_dynptr ptr;

	bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(write_data), 0, &ptr);

	return bpf_verify_pkcs7_signature(&ptr, &ptr, NULL);
}

SEC("?lsm.s/bpf")
int BPF_PROG(not_valid_dynptr, int cmd, union bpf_attr *attr, unsigned int size)
{
	unsigned long val;

	return bpf_verify_pkcs7_signature((struct bpf_dynptr *)&val,
					  (struct bpf_dynptr *)&val, NULL);
}

SEC("?lsm.s/bpf")
int BPF_PROG(not_ptr_to_stack, int cmd, union bpf_attr *attr, unsigned int size)
{
	unsigned long val;

	return bpf_verify_pkcs7_signature((struct bpf_dynptr *)val,
					  (struct bpf_dynptr *)val, NULL);
}

SEC("lsm.s/bpf")
int BPF_PROG(dynptr_data_null, int cmd, union bpf_attr *attr, unsigned int size)
{
	struct bpf_key *trusted_keyring;
	struct bpf_dynptr ptr;
	__u32 *value;
	int ret, zero = 0;

	if (bpf_get_current_pid_tgid() >> 32 != pid)
		return 0;

	value = bpf_map_lookup_elem(&array_map, &zero);
	if (!value)
		return 0;

	/* Pass invalid flags. */
	ret = bpf_dynptr_from_mem(value, sizeof(*value), ((__u64)~0ULL), &ptr);
	if (ret != -EINVAL)
		return 0;

	trusted_keyring = bpf_lookup_system_key(0);
	if (!trusted_keyring)
		return 0;

	err = bpf_verify_pkcs7_signature(&ptr, &ptr, trusted_keyring);

	bpf_key_put(trusted_keyring);

	return 0;
}
```
tools/testing/selftests/bpf/progs/test_lookup_key.c (new file, +46)

```c
// SPDX-License-Identifier: GPL-2.0

/*
 * Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH
 *
 * Author: Roberto Sassu <roberto.sassu@huawei.com>
 */

#include "vmlinux.h"
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

__u32 monitored_pid;
__u32 key_serial;
__u32 key_id;
__u64 flags;

extern struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
extern struct bpf_key *bpf_lookup_system_key(__u64 id) __ksym;
extern void bpf_key_put(struct bpf_key *key) __ksym;

SEC("lsm.s/bpf")
int BPF_PROG(bpf, int cmd, union bpf_attr *attr, unsigned int size)
{
	struct bpf_key *bkey;
	__u32 pid;

	pid = bpf_get_current_pid_tgid() >> 32;
	if (pid != monitored_pid)
		return 0;

	if (key_serial)
		bkey = bpf_lookup_user_key(key_serial, flags);
	else
		bkey = bpf_lookup_system_key(key_id);

	if (!bkey)
		return -ENOENT;

	bpf_key_put(bkey);

	return 0;
}
```
tools/testing/selftests/bpf/progs/test_user_ringbuf.h (new file, +35)

```c
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */

#ifndef _TEST_USER_RINGBUF_H
#define _TEST_USER_RINGBUF_H

#define TEST_OP_64 4
#define TEST_OP_32 2

enum test_msg_op {
	TEST_MSG_OP_INC64,
	TEST_MSG_OP_INC32,
	TEST_MSG_OP_MUL64,
	TEST_MSG_OP_MUL32,

	// Must come last.
	TEST_MSG_OP_NUM_OPS,
};

struct test_msg {
	enum test_msg_op msg_op;
	union {
		__s64 operand_64;
		__s32 operand_32;
	};
};

struct sample {
	int pid;
	int seq;
	long value;
	char comm[16];
};

#endif /* _TEST_USER_RINGBUF_H */
```
tools/testing/selftests/bpf/progs/test_verif_scale1.c (+1, -1)

```diff
 #define ATTR __attribute__((noinline))
 #include "test_jhash.h"

-SEC("scale90_noinline")
+SEC("tc")
 int balancer_ingress(struct __sk_buff *ctx)
 {
 	void *data_end = (void *)(long)ctx->data_end;
```
tools/testing/selftests/bpf/progs/test_verif_scale3.c (+1, -1)

```diff
 #define ATTR __attribute__((noinline))
 #include "test_jhash.h"

-SEC("scale90_noinline32")
+SEC("tc")
 int balancer_ingress(struct __sk_buff *ctx)
 {
 	void *data_end = (void *)(long)ctx->data_end;
```
tools/testing/selftests/bpf/progs/test_verify_pkcs7_sig.c (new file, +90)

```c
// SPDX-License-Identifier: GPL-2.0

/*
 * Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH
 *
 * Author: Roberto Sassu <roberto.sassu@huawei.com>
 */

#include "vmlinux.h"
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define MAX_DATA_SIZE (1024 * 1024)
#define MAX_SIG_SIZE 1024

extern struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
extern struct bpf_key *bpf_lookup_system_key(__u64 id) __ksym;
extern void bpf_key_put(struct bpf_key *key) __ksym;
extern int bpf_verify_pkcs7_signature(struct bpf_dynptr *data_ptr,
				      struct bpf_dynptr *sig_ptr,
				      struct bpf_key *trusted_keyring) __ksym;

__u32 monitored_pid;
__u32 user_keyring_serial;
__u64 system_keyring_id;

struct data {
	__u8 data[MAX_DATA_SIZE];
	__u32 data_len;
	__u8 sig[MAX_SIG_SIZE];
	__u32 sig_len;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, struct data);
} data_input SEC(".maps");

char _license[] SEC("license") = "GPL";

SEC("lsm.s/bpf")
int BPF_PROG(bpf, int cmd, union bpf_attr *attr, unsigned int size)
{
	struct bpf_dynptr data_ptr, sig_ptr;
	struct data *data_val;
	struct bpf_key *trusted_keyring;
	__u32 pid;
	__u64 value;
	int ret, zero = 0;

	pid = bpf_get_current_pid_tgid() >> 32;
	if (pid != monitored_pid)
		return 0;

	data_val = bpf_map_lookup_elem(&data_input, &zero);
	if (!data_val)
		return 0;

	bpf_probe_read(&value, sizeof(value), &attr->value);

	bpf_copy_from_user(data_val, sizeof(struct data),
			   (void *)(unsigned long)value);

	if (data_val->data_len > sizeof(data_val->data))
		return -EINVAL;

	bpf_dynptr_from_mem(data_val->data, data_val->data_len, 0, &data_ptr);

	if (data_val->sig_len > sizeof(data_val->sig))
		return -EINVAL;

	bpf_dynptr_from_mem(data_val->sig, data_val->sig_len, 0, &sig_ptr);

	if (user_keyring_serial)
		trusted_keyring = bpf_lookup_user_key(user_keyring_serial, 0);
	else
		trusted_keyring = bpf_lookup_system_key(system_keyring_id);

	if (!trusted_keyring)
		return -ENOENT;

	ret = bpf_verify_pkcs7_signature(&data_ptr, &sig_ptr, trusted_keyring);

	bpf_key_put(trusted_keyring);

	return ret;
}
```
tools/testing/selftests/bpf/progs/timer.c (+2, -2)

```diff
 }

 SEC("fentry/bpf_fentry_test1")
-int BPF_PROG(test1, int a)
+int BPF_PROG2(test1, int, a)
 {
 	struct bpf_timer *arr_timer, *lru_timer;
 	struct elem init = {};
···
 }

 SEC("fentry/bpf_fentry_test2")
-int BPF_PROG(test2, int a, int b)
+int BPF_PROG2(test2, int, a, int, b)
 {
 	struct hmap_elem init = {}, *val;
 	int key = HTAB, key_malloc = HTAB_MALLOC;
```
tools/testing/selftests/bpf/progs/tracing_struct.c (new file, +120)

```c
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */

#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>

struct bpf_testmod_struct_arg_1 {
	int a;
};
struct bpf_testmod_struct_arg_2 {
	long a;
	long b;
};

long t1_a_a, t1_a_b, t1_b, t1_c, t1_ret, t1_nregs;
__u64 t1_reg0, t1_reg1, t1_reg2, t1_reg3;
long t2_a, t2_b_a, t2_b_b, t2_c, t2_ret;
long t3_a, t3_b, t3_c_a, t3_c_b, t3_ret;
long t4_a_a, t4_b, t4_c, t4_d, t4_e_a, t4_e_b, t4_ret;
long t5_ret;

SEC("fentry/bpf_testmod_test_struct_arg_1")
int BPF_PROG2(test_struct_arg_1, struct bpf_testmod_struct_arg_2, a, int, b, int, c)
{
	t1_a_a = a.a;
	t1_a_b = a.b;
	t1_b = b;
	t1_c = c;
	return 0;
}

SEC("fexit/bpf_testmod_test_struct_arg_1")
int BPF_PROG2(test_struct_arg_2, struct bpf_testmod_struct_arg_2, a, int, b, int, c, int, ret)
{
	t1_nregs = bpf_get_func_arg_cnt(ctx);
	/* a.a */
	bpf_get_func_arg(ctx, 0, &t1_reg0);
	/* a.b */
	bpf_get_func_arg(ctx, 1, &t1_reg1);
	/* b */
	bpf_get_func_arg(ctx, 2, &t1_reg2);
	t1_reg2 = (int)t1_reg2;
	/* c */
	bpf_get_func_arg(ctx, 3, &t1_reg3);
	t1_reg3 = (int)t1_reg3;

	t1_ret = ret;
	return 0;
}

SEC("fentry/bpf_testmod_test_struct_arg_2")
int BPF_PROG2(test_struct_arg_3, int, a, struct bpf_testmod_struct_arg_2, b, int, c)
{
	t2_a = a;
	t2_b_a = b.a;
	t2_b_b = b.b;
	t2_c = c;
	return 0;
}

SEC("fexit/bpf_testmod_test_struct_arg_2")
int BPF_PROG2(test_struct_arg_4, int, a, struct bpf_testmod_struct_arg_2, b, int, c, int, ret)
{
	t2_ret = ret;
	return 0;
}

SEC("fentry/bpf_testmod_test_struct_arg_3")
int BPF_PROG2(test_struct_arg_5, int, a, int, b, struct bpf_testmod_struct_arg_2, c)
{
	t3_a = a;
	t3_b = b;
	t3_c_a = c.a;
	t3_c_b = c.b;
	return 0;
}

SEC("fexit/bpf_testmod_test_struct_arg_3")
int BPF_PROG2(test_struct_arg_6, int, a, int, b, struct bpf_testmod_struct_arg_2, c, int, ret)
{
	t3_ret = ret;
	return 0;
}

SEC("fentry/bpf_testmod_test_struct_arg_4")
int BPF_PROG2(test_struct_arg_7, struct bpf_testmod_struct_arg_1, a, int, b,
	      int, c, int, d, struct bpf_testmod_struct_arg_2, e)
{
	t4_a_a = a.a;
	t4_b = b;
	t4_c = c;
	t4_d = d;
	t4_e_a = e.a;
	t4_e_b = e.b;
	return 0;
}

SEC("fexit/bpf_testmod_test_struct_arg_4")
int BPF_PROG2(test_struct_arg_8, struct bpf_testmod_struct_arg_1, a, int, b,
	      int, c, int, d, struct bpf_testmod_struct_arg_2, e, int, ret)
{
	t4_ret = ret;
	return 0;
}

SEC("fentry/bpf_testmod_test_struct_arg_5")
int BPF_PROG2(test_struct_arg_9)
{
	return 0;
}

SEC("fexit/bpf_testmod_test_struct_arg_5")
int BPF_PROG2(test_struct_arg_10, int, ret)
{
	t5_ret = ret;
	return 0;
}

char _license[] SEC("license") = "GPL";
```
+177
tools/testing/selftests/bpf/progs/user_ringbuf_fail.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include <linux/bpf.h> 5 + #include <bpf/bpf_helpers.h> 6 + #include "bpf_misc.h" 7 + 8 + char _license[] SEC("license") = "GPL"; 9 + 10 + struct sample { 11 + int pid; 12 + int seq; 13 + long value; 14 + char comm[16]; 15 + }; 16 + 17 + struct { 18 + __uint(type, BPF_MAP_TYPE_USER_RINGBUF); 19 + } user_ringbuf SEC(".maps"); 20 + 21 + static long 22 + bad_access1(struct bpf_dynptr *dynptr, void *context) 23 + { 24 + const struct sample *sample; 25 + 26 + sample = bpf_dynptr_data(dynptr - 1, 0, sizeof(*sample)); 27 + bpf_printk("Was able to pass bad pointer %lx\n", (__u64)dynptr - 1); 28 + 29 + return 0; 30 + } 31 + 32 + /* A callback that accesses a dynptr in a bpf_user_ringbuf_drain callback should 33 + * not be able to read before the pointer. 34 + */ 35 + SEC("?raw_tp/sys_nanosleep") 36 + int user_ringbuf_callback_bad_access1(void *ctx) 37 + { 38 + bpf_user_ringbuf_drain(&user_ringbuf, bad_access1, NULL, 0); 39 + 40 + return 0; 41 + } 42 + 43 + static long 44 + bad_access2(struct bpf_dynptr *dynptr, void *context) 45 + { 46 + const struct sample *sample; 47 + 48 + sample = bpf_dynptr_data(dynptr + 1, 0, sizeof(*sample)); 49 + bpf_printk("Was able to pass bad pointer %lx\n", (__u64)dynptr + 1); 50 + 51 + return 0; 52 + } 53 + 54 + /* A callback that accesses a dynptr in a bpf_user_ringbuf_drain callback should 55 + * not be able to read past the end of the pointer. 56 + */ 57 + SEC("?raw_tp/sys_nanosleep") 58 + int user_ringbuf_callback_bad_access2(void *ctx) 59 + { 60 + bpf_user_ringbuf_drain(&user_ringbuf, bad_access2, NULL, 0); 61 + 62 + return 0; 63 + } 64 + 65 + static long 66 + write_forbidden(struct bpf_dynptr *dynptr, void *context) 67 + { 68 + *((long *)dynptr) = 0; 69 + 70 + return 0; 71 + } 72 + 73 + /* A callback that accesses a dynptr in a bpf_user_ringbuf_drain callback should 74 + * not be able to write to that pointer. 
75 + */ 76 + SEC("?raw_tp/sys_nanosleep") 77 + int user_ringbuf_callback_write_forbidden(void *ctx) 78 + { 79 + bpf_user_ringbuf_drain(&user_ringbuf, write_forbidden, NULL, 0); 80 + 81 + return 0; 82 + } 83 + 84 + static long 85 + null_context_write(struct bpf_dynptr *dynptr, void *context) 86 + { 87 + *((__u64 *)context) = 0; 88 + 89 + return 0; 90 + } 91 + 92 + /* A bpf_user_ringbuf_drain callback should not be able to write 93 + * through the NULL context pointer passed to the drain helper. 94 + */ 95 + SEC("?raw_tp/sys_nanosleep") 96 + int user_ringbuf_callback_null_context_write(void *ctx) 97 + { 98 + bpf_user_ringbuf_drain(&user_ringbuf, null_context_write, NULL, 0); 99 + 100 + return 0; 101 + } 102 + 103 + static long 104 + null_context_read(struct bpf_dynptr *dynptr, void *context) 105 + { 106 + __u64 id = *((__u64 *)context); 107 + 108 + bpf_printk("Read id %lu\n", id); 109 + 110 + return 0; 111 + } 112 + 113 + /* A bpf_user_ringbuf_drain callback should not be able to read 114 + * through the NULL context pointer passed to the drain helper. 115 + */ 116 + SEC("?raw_tp/sys_nanosleep") 117 + int user_ringbuf_callback_null_context_read(void *ctx) 118 + { 119 + bpf_user_ringbuf_drain(&user_ringbuf, null_context_read, NULL, 0); 120 + 121 + return 0; 122 + } 123 + 124 + static long 125 + try_discard_dynptr(struct bpf_dynptr *dynptr, void *context) 126 + { 127 + bpf_ringbuf_discard_dynptr(dynptr, 0); 128 + 129 + return 0; 130 + } 131 + 132 + /* A bpf_user_ringbuf_drain callback should not be able to discard 133 + * the dynptr via bpf_ringbuf_discard_dynptr(). 
134 + */ 135 + SEC("?raw_tp/sys_nanosleep") 136 + int user_ringbuf_callback_discard_dynptr(void *ctx) 137 + { 138 + bpf_user_ringbuf_drain(&user_ringbuf, try_discard_dynptr, NULL, 0); 139 + 140 + return 0; 141 + } 142 + 143 + static long 144 + try_submit_dynptr(struct bpf_dynptr *dynptr, void *context) 145 + { 146 + bpf_ringbuf_submit_dynptr(dynptr, 0); 147 + 148 + return 0; 149 + } 150 + 151 + /* A bpf_user_ringbuf_drain callback should not be able to submit 152 + * the dynptr via bpf_ringbuf_submit_dynptr(). 153 + */ 154 + SEC("?raw_tp/sys_nanosleep") 155 + int user_ringbuf_callback_submit_dynptr(void *ctx) 156 + { 157 + bpf_user_ringbuf_drain(&user_ringbuf, try_submit_dynptr, NULL, 0); 158 + 159 + return 0; 160 + } 161 + 162 + static long 163 + invalid_drain_callback_return(struct bpf_dynptr *dynptr, void *context) 164 + { 165 + return 2; 166 + } 167 + 168 + /* A bpf_user_ringbuf_drain callback must return 0 or 1; any other 169 + * return value should be rejected by the verifier. 170 + */ 171 + SEC("?raw_tp/sys_nanosleep") 172 + int user_ringbuf_callback_invalid_return(void *ctx) 173 + { 174 + bpf_user_ringbuf_drain(&user_ringbuf, invalid_drain_callback_return, NULL, 0); 175 + 176 + return 0; 177 + }
+218
tools/testing/selftests/bpf/progs/user_ringbuf_success.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include <linux/bpf.h> 5 + #include <bpf/bpf_helpers.h> 6 + #include "bpf_misc.h" 7 + #include "test_user_ringbuf.h" 8 + 9 + char _license[] SEC("license") = "GPL"; 10 + 11 + struct { 12 + __uint(type, BPF_MAP_TYPE_USER_RINGBUF); 13 + } user_ringbuf SEC(".maps"); 14 + 15 + struct { 16 + __uint(type, BPF_MAP_TYPE_RINGBUF); 17 + } kernel_ringbuf SEC(".maps"); 18 + 19 + /* inputs */ 20 + int pid, err, val; 21 + 22 + int read = 0; 23 + 24 + /* Counter used for end-to-end protocol test */ 25 + __u64 kern_mutated = 0; 26 + __u64 user_mutated = 0; 27 + __u64 expected_user_mutated = 0; 28 + 29 + static int 30 + is_test_process(void) 31 + { 32 + int cur_pid = bpf_get_current_pid_tgid() >> 32; 33 + 34 + return cur_pid == pid; 35 + } 36 + 37 + static long 38 + record_sample(struct bpf_dynptr *dynptr, void *context) 39 + { 40 + const struct sample *sample = NULL; 41 + struct sample stack_sample; 42 + int status; 43 + static int num_calls; 44 + 45 + if (num_calls++ % 2 == 0) { 46 + status = bpf_dynptr_read(&stack_sample, sizeof(stack_sample), dynptr, 0, 0); 47 + if (status) { 48 + bpf_printk("bpf_dynptr_read() failed: %d\n", status); 49 + err = 1; 50 + return 0; 51 + } 52 + } else { 53 + sample = bpf_dynptr_data(dynptr, 0, sizeof(*sample)); 54 + if (!sample) { 55 + bpf_printk("Unexpectedly failed to get sample\n"); 56 + err = 2; 57 + return 0; 58 + } 59 + stack_sample = *sample; 60 + } 61 + 62 + __sync_fetch_and_add(&read, 1); 63 + return 0; 64 + } 65 + 66 + static void 67 + handle_sample_msg(const struct test_msg *msg) 68 + { 69 + switch (msg->msg_op) { 70 + case TEST_MSG_OP_INC64: 71 + kern_mutated += msg->operand_64; 72 + break; 73 + case TEST_MSG_OP_INC32: 74 + kern_mutated += msg->operand_32; 75 + break; 76 + case TEST_MSG_OP_MUL64: 77 + kern_mutated *= msg->operand_64; 78 + break; 79 + case TEST_MSG_OP_MUL32: 80 + kern_mutated *= msg->operand_32; 81 + 
break; 82 + default: 83 + bpf_printk("Unrecognized op %d\n", msg->msg_op); 84 + err = 2; 85 + } 86 + } 87 + 88 + static long 89 + read_protocol_msg(struct bpf_dynptr *dynptr, void *context) 90 + { 91 + const struct test_msg *msg = NULL; 92 + 93 + msg = bpf_dynptr_data(dynptr, 0, sizeof(*msg)); 94 + if (!msg) { 95 + err = 1; 96 + bpf_printk("Unexpectedly failed to get msg\n"); 97 + return 0; 98 + } 99 + 100 + handle_sample_msg(msg); 101 + 102 + return 0; 103 + } 104 + 105 + static int publish_next_kern_msg(__u32 index, void *context) 106 + { 107 + struct test_msg *msg = NULL; 108 + int operand_64 = TEST_OP_64; 109 + int operand_32 = TEST_OP_32; 110 + 111 + msg = bpf_ringbuf_reserve(&kernel_ringbuf, sizeof(*msg), 0); 112 + if (!msg) { 113 + err = 4; 114 + return 1; 115 + } 116 + 117 + switch (index % TEST_MSG_OP_NUM_OPS) { 118 + case TEST_MSG_OP_INC64: 119 + msg->operand_64 = operand_64; 120 + msg->msg_op = TEST_MSG_OP_INC64; 121 + expected_user_mutated += operand_64; 122 + break; 123 + case TEST_MSG_OP_INC32: 124 + msg->operand_32 = operand_32; 125 + msg->msg_op = TEST_MSG_OP_INC32; 126 + expected_user_mutated += operand_32; 127 + break; 128 + case TEST_MSG_OP_MUL64: 129 + msg->operand_64 = operand_64; 130 + msg->msg_op = TEST_MSG_OP_MUL64; 131 + expected_user_mutated *= operand_64; 132 + break; 133 + case TEST_MSG_OP_MUL32: 134 + msg->operand_32 = operand_32; 135 + msg->msg_op = TEST_MSG_OP_MUL32; 136 + expected_user_mutated *= operand_32; 137 + break; 138 + default: 139 + bpf_ringbuf_discard(msg, 0); 140 + err = 5; 141 + return 1; 142 + } 143 + 144 + bpf_ringbuf_submit(msg, 0); 145 + 146 + return 0; 147 + } 148 + 149 + static void 150 + publish_kern_messages(void) 151 + { 152 + if (expected_user_mutated != user_mutated) { 153 + bpf_printk("%lu != %lu\n", expected_user_mutated, user_mutated); 154 + err = 3; 155 + return; 156 + } 157 + 158 + bpf_loop(8, publish_next_kern_msg, NULL, 0); 159 + } 160 + 161 + SEC("fentry/" SYS_PREFIX "sys_prctl") 162 + int 
test_user_ringbuf_protocol(void *ctx) 163 + { 164 + long status = 0; 165 + struct sample *sample = NULL; 166 + struct bpf_dynptr ptr; 167 + 168 + if (!is_test_process()) 169 + return 0; 170 + 171 + status = bpf_user_ringbuf_drain(&user_ringbuf, read_protocol_msg, NULL, 0); 172 + if (status < 0) { 173 + bpf_printk("Drain returned: %ld\n", status); 174 + err = 1; 175 + return 0; 176 + } 177 + 178 + publish_kern_messages(); 179 + 180 + return 0; 181 + } 182 + 183 + SEC("fentry/" SYS_PREFIX "sys_getpgid") 184 + int test_user_ringbuf(void *ctx) 185 + { 186 + int status = 0; 187 + struct sample *sample = NULL; 188 + struct bpf_dynptr ptr; 189 + 190 + if (!is_test_process()) 191 + return 0; 192 + 193 + err = bpf_user_ringbuf_drain(&user_ringbuf, record_sample, NULL, 0); 194 + 195 + return 0; 196 + } 197 + 198 + static long 199 + do_nothing_cb(struct bpf_dynptr *dynptr, void *context) 200 + { 201 + __sync_fetch_and_add(&read, 1); 202 + return 0; 203 + } 204 + 205 + SEC("fentry/" SYS_PREFIX "sys_getrlimit") 206 + int test_user_ringbuf_epoll(void *ctx) 207 + { 208 + long num_samples; 209 + 210 + if (!is_test_process()) 211 + return 0; 212 + 213 + num_samples = bpf_user_ringbuf_drain(&user_ringbuf, do_nothing_cb, NULL, 0); 214 + if (num_samples <= 0) 215 + err = 1; 216 + 217 + return 0; 218 + }
+13 -7
tools/testing/selftests/bpf/test_kmod.sh
··· 1 1 #!/bin/sh 2 2 # SPDX-License-Identifier: GPL-2.0 3 3 4 + # Usage: 5 + # ./test_kmod.sh [module_param]... 6 + # Ex.: ./test_kmod.sh test_range=1,3 7 + # All the parameters are passed to the kernel module. 8 + 4 9 # Kselftest framework requirement - SKIP code is 4. 5 10 ksft_skip=4 6 11 ··· 29 24 sysctl -w net.core.bpf_jit_harden=$2 2>&1 > /dev/null 30 25 31 26 echo "[ JIT enabled:$1 hardened:$2 ]" 27 + shift 2 32 28 dmesg -C 33 29 if [ -f ${OUTPUT}/lib/test_bpf.ko ]; then 34 - insmod ${OUTPUT}/lib/test_bpf.ko 2> /dev/null 30 + insmod ${OUTPUT}/lib/test_bpf.ko "$@" 2> /dev/null 35 31 if [ $? -ne 0 ]; then 36 32 rc=1 37 33 fi 38 34 else 39 35 # Use modprobe dry run to check for missing test_bpf module 40 - if ! /sbin/modprobe -q -n test_bpf; then 36 + if ! /sbin/modprobe -q -n test_bpf "$@"; then 41 37 echo "test_bpf: [SKIP]" 42 - elif /sbin/modprobe -q test_bpf; then 38 + elif /sbin/modprobe -q test_bpf "$@"; then 43 39 echo "test_bpf: ok" 44 40 else 45 41 echo "test_bpf: [FAIL]" ··· 65 59 66 60 rc=0 67 61 test_save 68 - test_run 0 0 69 - test_run 1 0 70 - test_run 1 1 71 - test_run 1 2 62 + test_run 0 0 "$@" 63 + test_run 1 0 "$@" 64 + test_run 1 1 "$@" 65 + test_run 1 2 "$@" 72 66 test_restore 73 67 exit $rc
+16 -10
tools/testing/selftests/bpf/test_maps.c
··· 30 30 #define ENOTSUPP 524 31 31 #endif 32 32 33 - static int skips; 33 + int skips; 34 34 35 35 static struct bpf_map_create_opts map_opts = { .sz = sizeof(map_opts) }; 36 36 ··· 659 659 { 660 660 struct bpf_map *bpf_map_rx, *bpf_map_tx, *bpf_map_msg, *bpf_map_break; 661 661 int map_fd_msg = 0, map_fd_rx = 0, map_fd_tx = 0, map_fd_break; 662 + struct bpf_object *parse_obj, *verdict_obj, *msg_obj; 662 663 int ports[] = {50200, 50201, 50202, 50204}; 663 664 int err, i, fd, udp, sfd[6] = {0xdeadbeef}; 664 665 u8 buf[20] = {0x0, 0x5, 0x3, 0x2, 0x1, 0x0}; 665 666 int parse_prog, verdict_prog, msg_prog; 666 667 struct sockaddr_in addr; 667 668 int one = 1, s, sc, rc; 668 - struct bpf_object *obj; 669 669 struct timeval to; 670 670 __u32 key, value; 671 671 pid_t pid[tasks]; ··· 761 761 i, udp); 762 762 goto out_sockmap; 763 763 } 764 + close(udp); 764 765 765 766 /* Test update without programs */ 766 767 for (i = 0; i < 6; i++) { ··· 824 823 825 824 /* Load SK_SKB program and Attach */ 826 825 err = bpf_prog_test_load(SOCKMAP_PARSE_PROG, 827 - BPF_PROG_TYPE_SK_SKB, &obj, &parse_prog); 826 + BPF_PROG_TYPE_SK_SKB, &parse_obj, &parse_prog); 828 827 if (err) { 829 828 printf("Failed to load SK_SKB parse prog\n"); 830 829 goto out_sockmap; 831 830 } 832 831 833 832 err = bpf_prog_test_load(SOCKMAP_TCP_MSG_PROG, 834 - BPF_PROG_TYPE_SK_MSG, &obj, &msg_prog); 833 + BPF_PROG_TYPE_SK_MSG, &msg_obj, &msg_prog); 835 834 if (err) { 836 835 printf("Failed to load SK_SKB msg prog\n"); 837 836 goto out_sockmap; 838 837 } 839 838 840 839 err = bpf_prog_test_load(SOCKMAP_VERDICT_PROG, 841 - BPF_PROG_TYPE_SK_SKB, &obj, &verdict_prog); 840 + BPF_PROG_TYPE_SK_SKB, &verdict_obj, &verdict_prog); 842 841 if (err) { 843 842 printf("Failed to load SK_SKB verdict prog\n"); 844 843 goto out_sockmap; 845 844 } 846 845 847 - bpf_map_rx = bpf_object__find_map_by_name(obj, "sock_map_rx"); 846 + bpf_map_rx = bpf_object__find_map_by_name(verdict_obj, "sock_map_rx"); 848 847 if (!bpf_map_rx) { 849 
848 printf("Failed to load map rx from verdict prog\n"); 850 849 goto out_sockmap; ··· 856 855 goto out_sockmap; 857 856 } 858 857 859 - bpf_map_tx = bpf_object__find_map_by_name(obj, "sock_map_tx"); 858 + bpf_map_tx = bpf_object__find_map_by_name(verdict_obj, "sock_map_tx"); 860 859 if (!bpf_map_tx) { 861 860 printf("Failed to load map tx from verdict prog\n"); 862 861 goto out_sockmap; ··· 868 867 goto out_sockmap; 869 868 } 870 869 871 - bpf_map_msg = bpf_object__find_map_by_name(obj, "sock_map_msg"); 870 + bpf_map_msg = bpf_object__find_map_by_name(verdict_obj, "sock_map_msg"); 872 871 if (!bpf_map_msg) { 873 872 printf("Failed to load map msg from msg_verdict prog\n"); 874 873 goto out_sockmap; ··· 880 879 goto out_sockmap; 881 880 } 882 881 883 - bpf_map_break = bpf_object__find_map_by_name(obj, "sock_map_break"); 882 + bpf_map_break = bpf_object__find_map_by_name(verdict_obj, "sock_map_break"); 884 883 if (!bpf_map_break) { 885 884 printf("Failed to load map tx from verdict prog\n"); 886 885 goto out_sockmap; ··· 1126 1125 } 1127 1126 close(fd); 1128 1127 close(map_fd_rx); 1129 - bpf_object__close(obj); 1128 + bpf_object__close(parse_obj); 1129 + bpf_object__close(msg_obj); 1130 + bpf_object__close(verdict_obj); 1130 1131 return; 1131 1132 out: 1132 1133 for (i = 0; i < 6; i++) ··· 1286 1283 printf("Inner map mim.inner was not destroyed\n"); 1287 1284 goto out_map_in_map; 1288 1285 } 1286 + 1287 + close(fd); 1289 1288 } 1290 1289 1290 + bpf_object__close(obj); 1291 1291 return; 1292 1292 1293 1293 out_map_in_map:
+2
tools/testing/selftests/bpf/test_maps.h
··· 14 14 } \ 15 15 }) 16 16 17 + extern int skips; 18 + 17 19 #endif
+17
tools/testing/selftests/bpf/test_progs.c
··· 943 943 return 0; 944 944 } 945 945 946 + int write_sysctl(const char *sysctl, const char *value) 947 + { 948 + int fd, err, len; 949 + 950 + fd = open(sysctl, O_WRONLY); 951 + if (!ASSERT_NEQ(fd, -1, "open sysctl")) 952 + return -1; 953 + 954 + len = strlen(value); 955 + err = write(fd, value, len); 956 + close(fd); 957 + if (!ASSERT_EQ(err, len, "write sysctl")) 958 + return -1; 959 + 960 + return 0; 961 + } 962 + 946 963 #define MAX_BACKTRACE_SZ 128 947 964 void crash_handler(int signum) 948 965 {
+1
tools/testing/selftests/bpf/test_progs.h
··· 384 384 int kern_sync_rcu(void); 385 385 int trigger_module_test_read(int read_sz); 386 386 int trigger_module_test_write(int write_sz); 387 + int write_sysctl(const char *sysctl, const char *value); 387 388 388 389 #ifdef __x86_64__ 389 390 #define SYS_NANOSLEEP_KPROBE_NAME "__x64_sys_nanosleep"
+42
tools/testing/selftests/bpf/test_sockmap.c
··· 138 138 bool data_test; 139 139 bool drop_expected; 140 140 bool check_recved_len; 141 + bool tx_wait_mem; 141 142 int iov_count; 142 143 int iov_length; 143 144 int rate; ··· 579 578 sent = sendmsg(fd, &msg, flags); 580 579 581 580 if (!drop && sent < 0) { 581 + if (opt->tx_wait_mem && errno == EACCES) { 582 + errno = 0; 583 + goto out_errno; 584 + } 582 585 perror("sendmsg loop error"); 583 586 goto out_errno; 584 587 } else if (drop && sent >= 0) { ··· 646 641 fprintf(stderr, "unexpected timeout: recved %zu/%f pop_total %f\n", s->bytes_recvd, total_bytes, txmsg_pop_total); 647 642 errno = -EIO; 648 643 clock_gettime(CLOCK_MONOTONIC, &s->end); 644 + goto out_errno; 645 + } 646 + 647 + if (opt->tx_wait_mem) { 648 + FD_ZERO(&w); 649 + FD_SET(fd, &w); 650 + slct = select(max_fd + 1, NULL, NULL, &w, &timeout); 651 + errno = 0; 652 + close(fd); 649 653 goto out_errno; 650 654 } 651 655 ··· 766 752 return err; 767 753 } 768 754 755 + if (opt->tx_wait_mem) { 756 + struct timeval timeout; 757 + int rxtx_buf_len = 1024; 758 + 759 + timeout.tv_sec = 3; 760 + timeout.tv_usec = 0; 761 + 762 + err = setsockopt(c2, SOL_SOCKET, SO_SNDTIMEO, &timeout, sizeof(struct timeval)); 763 + err |= setsockopt(c2, SOL_SOCKET, SO_SNDBUFFORCE, &rxtx_buf_len, sizeof(int)); 764 + err |= setsockopt(p2, SOL_SOCKET, SO_RCVBUFFORCE, &rxtx_buf_len, sizeof(int)); 765 + if (err) { 766 + perror("setsockopt failed()"); 767 + return errno; 768 + } 769 + } 770 + 769 771 rxpid = fork(); 770 772 if (rxpid == 0) { 771 773 if (txmsg_pop || txmsg_start_pop) ··· 817 787 perror("msg_loop_rx"); 818 788 return errno; 819 789 } 790 + 791 + if (opt->tx_wait_mem) 792 + close(c2); 820 793 821 794 txpid = fork(); 822 795 if (txpid == 0) { ··· 1485 1452 test_send(opt, cgrp); 1486 1453 } 1487 1454 1455 + static void test_txmsg_redir_wait_sndmem(int cgrp, struct sockmap_options *opt) 1456 + { 1457 + txmsg_redir = 1; 1458 + opt->tx_wait_mem = true; 1459 + test_send_large(opt, cgrp); 1460 + opt->tx_wait_mem = false; 
1461 + } 1462 + 1488 1463 static void test_txmsg_drop(int cgrp, struct sockmap_options *opt) 1489 1464 { 1490 1465 txmsg_drop = 1; ··· 1841 1800 struct _test test[] = { 1842 1801 {"txmsg test passthrough", test_txmsg_pass}, 1843 1802 {"txmsg test redirect", test_txmsg_redir}, 1803 + {"txmsg test redirect wait send mem", test_txmsg_redir_wait_sndmem}, 1844 1804 {"txmsg test drop", test_txmsg_drop}, 1845 1805 {"txmsg test ingress redirect", test_txmsg_ingress_redir}, 1846 1806 {"txmsg test skb", test_txmsg_skb},
+2 -1
tools/testing/selftests/bpf/test_verifier.c
··· 1498 1498 opts.log_level = DEFAULT_LIBBPF_LOG_LEVEL; 1499 1499 opts.prog_flags = pflags; 1500 1500 1501 - if (prog_type == BPF_PROG_TYPE_TRACING && test->kfunc) { 1501 + if ((prog_type == BPF_PROG_TYPE_TRACING || 1502 + prog_type == BPF_PROG_TYPE_LSM) && test->kfunc) { 1502 1503 int attach_btf_id; 1503 1504 1504 1505 attach_btf_id = libbpf_find_vmlinux_btf_id(test->kfunc,
+1 -1
tools/testing/selftests/bpf/verifier/calls.c
··· 284 284 .result = ACCEPT, 285 285 }, 286 286 { 287 - "calls: not on unpriviledged", 287 + "calls: not on unprivileged", 288 288 .insns = { 289 289 BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2), 290 290 BPF_MOV64_IMM(BPF_REG_0, 1),
+139
tools/testing/selftests/bpf/verifier/ref_tracking.c
··· 85 85 .result = REJECT, 86 86 }, 87 87 { 88 + "reference tracking: acquire/release user key reference", 89 + .insns = { 90 + BPF_MOV64_IMM(BPF_REG_1, -3), 91 + BPF_MOV64_IMM(BPF_REG_2, 0), 92 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), 93 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), 94 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 95 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), 96 + BPF_MOV64_IMM(BPF_REG_0, 0), 97 + BPF_EXIT_INSN(), 98 + }, 99 + .prog_type = BPF_PROG_TYPE_LSM, 100 + .kfunc = "bpf", 101 + .expected_attach_type = BPF_LSM_MAC, 102 + .flags = BPF_F_SLEEPABLE, 103 + .fixup_kfunc_btf_id = { 104 + { "bpf_lookup_user_key", 2 }, 105 + { "bpf_key_put", 5 }, 106 + }, 107 + .result = ACCEPT, 108 + }, 109 + { 110 + "reference tracking: acquire/release system key reference", 111 + .insns = { 112 + BPF_MOV64_IMM(BPF_REG_1, 1), 113 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), 114 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), 115 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 116 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), 117 + BPF_MOV64_IMM(BPF_REG_0, 0), 118 + BPF_EXIT_INSN(), 119 + }, 120 + .prog_type = BPF_PROG_TYPE_LSM, 121 + .kfunc = "bpf", 122 + .expected_attach_type = BPF_LSM_MAC, 123 + .flags = BPF_F_SLEEPABLE, 124 + .fixup_kfunc_btf_id = { 125 + { "bpf_lookup_system_key", 1 }, 126 + { "bpf_key_put", 4 }, 127 + }, 128 + .result = ACCEPT, 129 + }, 130 + { 131 + "reference tracking: release user key reference without check", 132 + .insns = { 133 + BPF_MOV64_IMM(BPF_REG_1, -3), 134 + BPF_MOV64_IMM(BPF_REG_2, 0), 135 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), 136 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 137 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), 138 + BPF_MOV64_IMM(BPF_REG_0, 0), 139 + BPF_EXIT_INSN(), 140 + }, 141 + .prog_type = BPF_PROG_TYPE_LSM, 142 + .kfunc = "bpf", 143 + .expected_attach_type = BPF_LSM_MAC, 144 + .flags = 
BPF_F_SLEEPABLE, 145 + .errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar", 146 + .fixup_kfunc_btf_id = { 147 + { "bpf_lookup_user_key", 2 }, 148 + { "bpf_key_put", 4 }, 149 + }, 150 + .result = REJECT, 151 + }, 152 + { 153 + "reference tracking: release system key reference without check", 154 + .insns = { 155 + BPF_MOV64_IMM(BPF_REG_1, 1), 156 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), 157 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 158 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), 159 + BPF_MOV64_IMM(BPF_REG_0, 0), 160 + BPF_EXIT_INSN(), 161 + }, 162 + .prog_type = BPF_PROG_TYPE_LSM, 163 + .kfunc = "bpf", 164 + .expected_attach_type = BPF_LSM_MAC, 165 + .flags = BPF_F_SLEEPABLE, 166 + .errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar", 167 + .fixup_kfunc_btf_id = { 168 + { "bpf_lookup_system_key", 1 }, 169 + { "bpf_key_put", 3 }, 170 + }, 171 + .result = REJECT, 172 + }, 173 + { 174 + "reference tracking: release with NULL key pointer", 175 + .insns = { 176 + BPF_MOV64_IMM(BPF_REG_1, 0), 177 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), 178 + BPF_MOV64_IMM(BPF_REG_0, 0), 179 + BPF_EXIT_INSN(), 180 + }, 181 + .prog_type = BPF_PROG_TYPE_LSM, 182 + .kfunc = "bpf", 183 + .expected_attach_type = BPF_LSM_MAC, 184 + .flags = BPF_F_SLEEPABLE, 185 + .errstr = "arg#0 pointer type STRUCT bpf_key must point to scalar, or struct with scalar", 186 + .fixup_kfunc_btf_id = { 187 + { "bpf_key_put", 1 }, 188 + }, 189 + .result = REJECT, 190 + }, 191 + { 192 + "reference tracking: leak potential reference to user key", 193 + .insns = { 194 + BPF_MOV64_IMM(BPF_REG_1, -3), 195 + BPF_MOV64_IMM(BPF_REG_2, 0), 196 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), 197 + BPF_EXIT_INSN(), 198 + }, 199 + .prog_type = BPF_PROG_TYPE_LSM, 200 + .kfunc = "bpf", 201 + .expected_attach_type = BPF_LSM_MAC, 202 + .flags = BPF_F_SLEEPABLE, 
203 + .errstr = "Unreleased reference", 204 + .fixup_kfunc_btf_id = { 205 + { "bpf_lookup_user_key", 2 }, 206 + }, 207 + .result = REJECT, 208 + }, 209 + { 210 + "reference tracking: leak potential reference to system key", 211 + .insns = { 212 + BPF_MOV64_IMM(BPF_REG_1, 1), 213 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0), 214 + BPF_EXIT_INSN(), 215 + }, 216 + .prog_type = BPF_PROG_TYPE_LSM, 217 + .kfunc = "bpf", 218 + .expected_attach_type = BPF_LSM_MAC, 219 + .flags = BPF_F_SLEEPABLE, 220 + .errstr = "Unreleased reference", 221 + .fixup_kfunc_btf_id = { 222 + { "bpf_lookup_system_key", 1 }, 223 + }, 224 + .result = REJECT, 225 + }, 226 + { 88 227 "reference tracking: release reference without check", 89 228 .insns = { 90 229 BPF_SK_LOOKUP(sk_lookup_tcp),
+1 -1
tools/testing/selftests/bpf/verifier/var_off.c
··· 121 121 BPF_EXIT_INSN(), 122 122 }, 123 123 .fixup_map_hash_8b = { 1 }, 124 - /* The unpriviledged case is not too interesting; variable 124 + /* The unprivileged case is not too interesting; variable 125 125 * stack access is rejected. 126 126 */ 127 127 .errstr_unpriv = "R2 variable stack access prohibited for !root",
+104
tools/testing/selftests/bpf/verify_sig_setup.sh
··· 1 + #!/bin/bash 2 + # SPDX-License-Identifier: GPL-2.0 3 + 4 + set -e 5 + set -u 6 + set -o pipefail 7 + 8 + VERBOSE="${SELFTESTS_VERBOSE:=0}" 9 + LOG_FILE="$(mktemp /tmp/verify_sig_setup.log.XXXXXX)" 10 + 11 + x509_genkey_content="\ 12 + [ req ] 13 + default_bits = 2048 14 + distinguished_name = req_distinguished_name 15 + prompt = no 16 + string_mask = utf8only 17 + x509_extensions = myexts 18 + 19 + [ req_distinguished_name ] 20 + CN = eBPF Signature Verification Testing Key 21 + 22 + [ myexts ] 23 + basicConstraints=critical,CA:FALSE 24 + keyUsage=digitalSignature 25 + subjectKeyIdentifier=hash 26 + authorityKeyIdentifier=keyid 27 + " 28 + 29 + usage() 30 + { 31 + echo "Usage: $0 <setup|cleanup> <existing_tmp_dir>" 32 + exit 1 33 + } 34 + 35 + setup() 36 + { 37 + local tmp_dir="$1" 38 + 39 + echo "${x509_genkey_content}" > ${tmp_dir}/x509.genkey 40 + 41 + openssl req -new -nodes -utf8 -sha256 -days 36500 \ 42 + -batch -x509 -config ${tmp_dir}/x509.genkey \ 43 + -outform PEM -out ${tmp_dir}/signing_key.pem \ 44 + -keyout ${tmp_dir}/signing_key.pem 2>&1 45 + 46 + openssl x509 -in ${tmp_dir}/signing_key.pem -out \ 47 + ${tmp_dir}/signing_key.der -outform der 48 + 49 + key_id=$(cat ${tmp_dir}/signing_key.der | keyctl padd asymmetric ebpf_testing_key @s) 50 + 51 + keyring_id=$(keyctl newring ebpf_testing_keyring @s) 52 + keyctl link $key_id $keyring_id 53 + } 54 + 55 + cleanup() { 56 + local tmp_dir="$1" 57 + 58 + keyctl unlink $(keyctl search @s asymmetric ebpf_testing_key) @s 59 + keyctl unlink $(keyctl search @s keyring ebpf_testing_keyring) @s 60 + rm -rf ${tmp_dir} 61 + } 62 + 63 + catch() 64 + { 65 + local exit_code="$1" 66 + local log_file="$2" 67 + 68 + if [[ "${exit_code}" -ne 0 ]]; then 69 + cat "${log_file}" >&3 70 + fi 71 + 72 + rm -f "${log_file}" 73 + exit ${exit_code} 74 + } 75 + 76 + main() 77 + { 78 + [[ $# -ne 2 ]] && usage 79 + 80 + local action="$1" 81 + local tmp_dir="$2" 82 + 83 + [[ !
-d "${tmp_dir}" ]] && echo "Directory ${tmp_dir} doesn't exist" && exit 1 84 + 85 + if [[ "${action}" == "setup" ]]; then 86 + setup "${tmp_dir}" 87 + elif [[ "${action}" == "cleanup" ]]; then 88 + cleanup "${tmp_dir}" 89 + else 90 + echo "Unknown action: ${action}" 91 + exit 1 92 + fi 93 + } 94 + 95 + trap 'catch "$?" "${LOG_FILE}"' EXIT 96 + 97 + if [[ "${VERBOSE}" -eq 0 ]]; then 98 + # Save stderr to fd 3 so that we can output back to 99 + # it in case of an error. 100 + exec 3>&2 1>"${LOG_FILE}" 2>&1 101 + fi 102 + 103 + main "$@" 104 + rm -f "${LOG_FILE}"
+1322
tools/testing/selftests/bpf/veristat.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + #define _GNU_SOURCE 4 + #include <argp.h> 5 + #include <string.h> 6 + #include <stdlib.h> 7 + #include <linux/compiler.h> 8 + #include <sched.h> 9 + #include <pthread.h> 10 + #include <dirent.h> 11 + #include <signal.h> 12 + #include <fcntl.h> 13 + #include <unistd.h> 14 + #include <sys/time.h> 15 + #include <sys/sysinfo.h> 16 + #include <sys/stat.h> 17 + #include <bpf/libbpf.h> 18 + #include <libelf.h> 19 + #include <gelf.h> 20 + 21 + enum stat_id { 22 + VERDICT, 23 + DURATION, 24 + TOTAL_INSNS, 25 + TOTAL_STATES, 26 + PEAK_STATES, 27 + MAX_STATES_PER_INSN, 28 + MARK_READ_MAX_LEN, 29 + 30 + FILE_NAME, 31 + PROG_NAME, 32 + 33 + ALL_STATS_CNT, 34 + NUM_STATS_CNT = FILE_NAME - VERDICT, 35 + }; 36 + 37 + struct verif_stats { 38 + char *file_name; 39 + char *prog_name; 40 + 41 + long stats[NUM_STATS_CNT]; 42 + }; 43 + 44 + struct stat_specs { 45 + int spec_cnt; 46 + enum stat_id ids[ALL_STATS_CNT]; 47 + bool asc[ALL_STATS_CNT]; 48 + int lens[ALL_STATS_CNT * 3]; /* 3x for comparison mode */ 49 + }; 50 + 51 + enum resfmt { 52 + RESFMT_TABLE, 53 + RESFMT_TABLE_CALCLEN, /* fake format to pre-calculate table's column widths */ 54 + RESFMT_CSV, 55 + }; 56 + 57 + struct filter { 58 + char *file_glob; 59 + char *prog_glob; 60 + }; 61 + 62 + static struct env { 63 + char **filenames; 64 + int filename_cnt; 65 + bool verbose; 66 + bool quiet; 67 + int log_level; 68 + enum resfmt out_fmt; 69 + bool comparison_mode; 70 + 71 + struct verif_stats *prog_stats; 72 + int prog_stat_cnt; 73 + 74 + /* baseline_stats is allocated and used only in comparison mode */ 75 + struct verif_stats *baseline_stats; 76 + int baseline_stat_cnt; 77 + 78 + struct stat_specs output_spec; 79 + struct stat_specs sort_spec; 80 + 81 + struct filter *allow_filters; 82 + struct filter *deny_filters; 83 + int allow_filter_cnt; 84 + int deny_filter_cnt; 85 + 86 + int files_processed; 87 + int
files_skipped; 88 + int progs_processed; 89 + int progs_skipped; 90 + } env; 91 + 92 + static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args) 93 + { 94 + if (!env.verbose) 95 + return 0; 96 + if (level == LIBBPF_DEBUG /* && !env.verbose */) 97 + return 0; 98 + return vfprintf(stderr, format, args); 99 + } 100 + 101 + const char *argp_program_version = "veristat"; 102 + const char *argp_program_bug_address = "<bpf@vger.kernel.org>"; 103 + const char argp_program_doc[] = 104 + "veristat BPF verifier stats collection and comparison tool.\n" 105 + "\n" 106 + "USAGE: veristat <obj-file> [<obj-file>...]\n" 107 + " OR: veristat -C <baseline.csv> <comparison.csv>\n"; 108 + 109 + static const struct argp_option opts[] = { 110 + { NULL, 'h', NULL, OPTION_HIDDEN, "Show the full help" }, 111 + { "verbose", 'v', NULL, 0, "Verbose mode" }, 112 + { "log-level", 'l', "LEVEL", 0, "Verifier log level (default 0 for normal mode, 1 for verbose mode)" }, 113 + { "quiet", 'q', NULL, 0, "Quiet mode" }, 114 + { "emit", 'e', "SPEC", 0, "Specify stats to be emitted" }, 115 + { "sort", 's', "SPEC", 0, "Specify sort order" }, 116 + { "output-format", 'o', "FMT", 0, "Result output format (table, csv), default is table." }, 117 + { "compare", 'C', NULL, 0, "Comparison mode" }, 118 + { "filter", 'f', "FILTER", 0, "Filter expressions (or @filename for file with expressions)." 
}, 119 + {}, 120 + }; 121 + 122 + static int parse_stats(const char *stats_str, struct stat_specs *specs); 123 + static int append_filter(struct filter **filters, int *cnt, const char *str); 124 + static int append_filter_file(const char *path); 125 + 126 + static error_t parse_arg(int key, char *arg, struct argp_state *state) 127 + { 128 + void *tmp; 129 + int err; 130 + 131 + switch (key) { 132 + case 'h': 133 + argp_state_help(state, stderr, ARGP_HELP_STD_HELP); 134 + break; 135 + case 'v': 136 + env.verbose = true; 137 + break; 138 + case 'q': 139 + env.quiet = true; 140 + break; 141 + case 'e': 142 + err = parse_stats(arg, &env.output_spec); 143 + if (err) 144 + return err; 145 + break; 146 + case 's': 147 + err = parse_stats(arg, &env.sort_spec); 148 + if (err) 149 + return err; 150 + break; 151 + case 'o': 152 + if (strcmp(arg, "table") == 0) { 153 + env.out_fmt = RESFMT_TABLE; 154 + } else if (strcmp(arg, "csv") == 0) { 155 + env.out_fmt = RESFMT_CSV; 156 + } else { 157 + fprintf(stderr, "Unrecognized output format '%s'\n", arg); 158 + return -EINVAL; 159 + } 160 + break; 161 + case 'l': 162 + errno = 0; 163 + env.log_level = strtol(arg, NULL, 10); 164 + if (errno) { 165 + fprintf(stderr, "invalid log level: %s\n", arg); 166 + argp_usage(state); 167 + } 168 + break; 169 + case 'C': 170 + env.comparison_mode = true; 171 + break; 172 + case 'f': 173 + if (arg[0] == '@') 174 + err = append_filter_file(arg + 1); 175 + else if (arg[0] == '!') 176 + err = append_filter(&env.deny_filters, &env.deny_filter_cnt, arg + 1); 177 + else 178 + err = append_filter(&env.allow_filters, &env.allow_filter_cnt, arg); 179 + if (err) { 180 + fprintf(stderr, "Failed to collect program filter expressions: %d\n", err); 181 + return err; 182 + } 183 + break; 184 + case ARGP_KEY_ARG: 185 + tmp = realloc(env.filenames, (env.filename_cnt + 1) * sizeof(*env.filenames)); 186 + if (!tmp) 187 + return -ENOMEM; 188 + env.filenames = tmp; 189 + env.filenames[env.filename_cnt] = strdup(arg); 
190 + if (!env.filenames[env.filename_cnt]) 191 + return -ENOMEM; 192 + env.filename_cnt++; 193 + break; 194 + default: 195 + return ARGP_ERR_UNKNOWN; 196 + } 197 + return 0; 198 + } 199 + 200 + static const struct argp argp = { 201 + .options = opts, 202 + .parser = parse_arg, 203 + .doc = argp_program_doc, 204 + }; 205 + 206 + 207 + /* Adapted from perf/util/string.c */ 208 + static bool glob_matches(const char *str, const char *pat) 209 + { 210 + while (*str && *pat && *pat != '*') { 211 + if (*str != *pat) 212 + return false; 213 + str++; 214 + pat++; 215 + } 216 + /* Check wild card */ 217 + if (*pat == '*') { 218 + while (*pat == '*') 219 + pat++; 220 + if (!*pat) /* Tail wild card matches all */ 221 + return true; 222 + while (*str) 223 + if (glob_matches(str++, pat)) 224 + return true; 225 + } 226 + return !*str && !*pat; 227 + } 228 + 229 + static bool should_process_file(const char *filename) 230 + { 231 + int i; 232 + 233 + if (env.deny_filter_cnt > 0) { 234 + for (i = 0; i < env.deny_filter_cnt; i++) { 235 + if (glob_matches(filename, env.deny_filters[i].file_glob)) 236 + return false; 237 + } 238 + } 239 + 240 + if (env.allow_filter_cnt == 0) 241 + return true; 242 + 243 + for (i = 0; i < env.allow_filter_cnt; i++) { 244 + if (glob_matches(filename, env.allow_filters[i].file_glob)) 245 + return true; 246 + } 247 + 248 + return false; 249 + } 250 + 251 + static bool is_bpf_obj_file(const char *path) { 252 + Elf64_Ehdr *ehdr; 253 + int fd, err = -EINVAL; 254 + Elf *elf = NULL; 255 + 256 + fd = open(path, O_RDONLY | O_CLOEXEC); 257 + if (fd < 0) 258 + return true; /* we'll fail later and propagate error */ 259 + 260 + /* ensure libelf is initialized */ 261 + (void)elf_version(EV_CURRENT); 262 + 263 + elf = elf_begin(fd, ELF_C_READ, NULL); 264 + if (!elf) 265 + goto cleanup; 266 + 267 + if (elf_kind(elf) != ELF_K_ELF || gelf_getclass(elf) != ELFCLASS64) 268 + goto cleanup; 269 + 270 + ehdr = elf64_getehdr(elf); 271 + /* Old LLVM set e_machine to EM_NONE */ 
272 + if (!ehdr || ehdr->e_type != ET_REL || (ehdr->e_machine && ehdr->e_machine != EM_BPF)) 273 + goto cleanup; 274 + 275 + err = 0; 276 + cleanup: 277 + if (elf) 278 + elf_end(elf); 279 + close(fd); 280 + return err == 0; 281 + } 282 + 283 + static bool should_process_prog(const char *path, const char *prog_name) 284 + { 285 + const char *filename = basename(path); 286 + int i; 287 + 288 + if (env.deny_filter_cnt > 0) { 289 + for (i = 0; i < env.deny_filter_cnt; i++) { 290 + if (glob_matches(filename, env.deny_filters[i].file_glob)) 291 + return false; 292 + if (!env.deny_filters[i].prog_glob) 293 + continue; 294 + if (glob_matches(prog_name, env.deny_filters[i].prog_glob)) 295 + return false; 296 + } 297 + } 298 + 299 + if (env.allow_filter_cnt == 0) 300 + return true; 301 + 302 + for (i = 0; i < env.allow_filter_cnt; i++) { 303 + if (!glob_matches(filename, env.allow_filters[i].file_glob)) 304 + continue; 305 + /* if filter specifies only filename glob part, it implicitly 306 + * allows all progs within that file 307 + */ 308 + if (!env.allow_filters[i].prog_glob) 309 + return true; 310 + if (glob_matches(prog_name, env.allow_filters[i].prog_glob)) 311 + return true; 312 + } 313 + 314 + return false; 315 + } 316 + 317 + static int append_filter(struct filter **filters, int *cnt, const char *str) 318 + { 319 + struct filter *f; 320 + void *tmp; 321 + const char *p; 322 + 323 + tmp = realloc(*filters, (*cnt + 1) * sizeof(**filters)); 324 + if (!tmp) 325 + return -ENOMEM; 326 + *filters = tmp; 327 + 328 + f = &(*filters)[*cnt]; 329 + f->file_glob = f->prog_glob = NULL; 330 + 331 + /* filter can be specified either as "<obj-glob>" or "<obj-glob>/<prog-glob>" */ 332 + p = strchr(str, '/'); 333 + if (!p) { 334 + f->file_glob = strdup(str); 335 + if (!f->file_glob) 336 + return -ENOMEM; 337 + } else { 338 + f->file_glob = strndup(str, p - str); 339 + f->prog_glob = strdup(p + 1); 340 + if (!f->file_glob || !f->prog_glob) { 341 + free(f->file_glob); 342 + 
free(f->prog_glob); 343 + f->file_glob = f->prog_glob = NULL; 344 + return -ENOMEM; 345 + } 346 + } 347 + 348 + *cnt = *cnt + 1; 349 + return 0; 350 + } 351 + 352 + static int append_filter_file(const char *path) 353 + { 354 + char buf[1024]; 355 + FILE *f; 356 + int err = 0; 357 + 358 + f = fopen(path, "r"); 359 + if (!f) { 360 + err = -errno; 361 + fprintf(stderr, "Failed to open filters in '%s': %d\n", path, err); 362 + return err; 363 + } 364 + 365 + while (fscanf(f, " %1023[^\n]\n", buf) == 1) { 366 + /* lines starting with # are comments, skip them */ 367 + if (buf[0] == '\0' || buf[0] == '#') 368 + continue; 369 + /* lines starting with ! are negative match filters */ 370 + if (buf[0] == '!') 371 + err = append_filter(&env.deny_filters, &env.deny_filter_cnt, buf + 1); 372 + else 373 + err = append_filter(&env.allow_filters, &env.allow_filter_cnt, buf); 374 + if (err) 375 + goto cleanup; 376 + } 377 + 378 + cleanup: 379 + fclose(f); 380 + return err; 381 + } 382 + 383 + static const struct stat_specs default_output_spec = { 384 + .spec_cnt = 7, 385 + .ids = { 386 + FILE_NAME, PROG_NAME, VERDICT, DURATION, 387 + TOTAL_INSNS, TOTAL_STATES, PEAK_STATES, 388 + }, 389 + }; 390 + 391 + static const struct stat_specs default_sort_spec = { 392 + .spec_cnt = 2, 393 + .ids = { 394 + FILE_NAME, PROG_NAME, 395 + }, 396 + .asc = { true, true, }, 397 + }; 398 + 399 + static struct stat_def { 400 + const char *header; 401 + const char *names[4]; 402 + bool asc_by_default; 403 + } stat_defs[] = { 404 + [FILE_NAME] = { "File", {"file_name", "filename", "file"}, true /* asc */ }, 405 + [PROG_NAME] = { "Program", {"prog_name", "progname", "prog"}, true /* asc */ }, 406 + [VERDICT] = { "Verdict", {"verdict"}, true /* asc: failure, success */ }, 407 + [DURATION] = { "Duration (us)", {"duration", "dur"}, }, 408 + [TOTAL_INSNS] = { "Total insns", {"total_insns", "insns"}, }, 409 + [TOTAL_STATES] = { "Total states", {"total_states", "states"}, }, 410 + [PEAK_STATES] = { "Peak 
states", {"peak_states"}, }, 411 + [MAX_STATES_PER_INSN] = { "Max states per insn", {"max_states_per_insn"}, }, 412 + [MARK_READ_MAX_LEN] = { "Max mark read length", {"max_mark_read_len", "mark_read"}, }, 413 + }; 414 + 415 + static int parse_stat(const char *stat_name, struct stat_specs *specs) 416 + { 417 + int id, i; 418 + 419 + if (specs->spec_cnt >= ARRAY_SIZE(specs->ids)) { 420 + fprintf(stderr, "Can't specify more than %zd stats\n", ARRAY_SIZE(specs->ids)); 421 + return -E2BIG; 422 + } 423 + 424 + for (id = 0; id < ARRAY_SIZE(stat_defs); id++) { 425 + struct stat_def *def = &stat_defs[id]; 426 + 427 + for (i = 0; i < ARRAY_SIZE(stat_defs[id].names); i++) { 428 + if (!def->names[i] || strcmp(def->names[i], stat_name) != 0) 429 + continue; 430 + 431 + specs->ids[specs->spec_cnt] = id; 432 + specs->asc[specs->spec_cnt] = def->asc_by_default; 433 + specs->spec_cnt++; 434 + 435 + return 0; 436 + } 437 + } 438 + 439 + fprintf(stderr, "Unrecognized stat name '%s'\n", stat_name); 440 + return -ESRCH; 441 + } 442 + 443 + static int parse_stats(const char *stats_str, struct stat_specs *specs) 444 + { 445 + char *input, *state = NULL, *next; 446 + int err; 447 + 448 + input = strdup(stats_str); 449 + if (!input) 450 + return -ENOMEM; 451 + 452 + while ((next = strtok_r(state ? 
NULL : input, ",", &state))) { 453 + err = parse_stat(next, specs); 454 + if (err) 455 + return err; 456 + } 457 + 458 + return 0; 459 + } 460 + 461 + static void free_verif_stats(struct verif_stats *stats, size_t stat_cnt) 462 + { 463 + int i; 464 + 465 + if (!stats) 466 + return; 467 + 468 + for (i = 0; i < stat_cnt; i++) { 469 + free(stats[i].file_name); 470 + free(stats[i].prog_name); 471 + } 472 + free(stats); 473 + } 474 + 475 + static char verif_log_buf[64 * 1024]; 476 + 477 + #define MAX_PARSED_LOG_LINES 100 478 + 479 + static int parse_verif_log(char * const buf, size_t buf_sz, struct verif_stats *s) 480 + { 481 + const char *cur; 482 + int pos, lines; 483 + 484 + buf[buf_sz - 1] = '\0'; 485 + 486 + for (pos = strlen(buf) - 1, lines = 0; pos >= 0 && lines < MAX_PARSED_LOG_LINES; lines++) { 487 + /* find previous endline or otherwise take the start of log buf */ 488 + for (cur = &buf[pos]; cur > buf && cur[0] != '\n'; cur--, pos--) { 489 + } 490 + /* next time start from end of previous line (or pos goes to <0) */ 491 + pos--; 492 + /* if we found endline, point right after endline symbol; 493 + * otherwise, stay at the beginning of log buf 494 + */ 495 + if (cur[0] == '\n') 496 + cur++; 497 + 498 + if (1 == sscanf(cur, "verification time %ld usec\n", &s->stats[DURATION])) 499 + continue; 500 + if (5 == sscanf(cur, "processed %ld insns (limit %*d) max_states_per_insn %ld total_states %ld peak_states %ld mark_read %ld", 501 + &s->stats[TOTAL_INSNS], 502 + &s->stats[MAX_STATES_PER_INSN], 503 + &s->stats[TOTAL_STATES], 504 + &s->stats[PEAK_STATES], 505 + &s->stats[MARK_READ_MAX_LEN])) 506 + continue; 507 + } 508 + 509 + return 0; 510 + } 511 + 512 + static int process_prog(const char *filename, struct bpf_object *obj, struct bpf_program *prog) 513 + { 514 + const char *prog_name = bpf_program__name(prog); 515 + size_t buf_sz = sizeof(verif_log_buf); 516 + char *buf = verif_log_buf; 517 + struct verif_stats *stats; 518 + int err = 0; 519 + void *tmp; 520 + 521
+ if (!should_process_prog(filename, bpf_program__name(prog))) { 522 + env.progs_skipped++; 523 + return 0; 524 + } 525 + 526 + tmp = realloc(env.prog_stats, (env.prog_stat_cnt + 1) * sizeof(*env.prog_stats)); 527 + if (!tmp) 528 + return -ENOMEM; 529 + env.prog_stats = tmp; 530 + stats = &env.prog_stats[env.prog_stat_cnt++]; 531 + memset(stats, 0, sizeof(*stats)); 532 + 533 + if (env.verbose) { 534 + buf_sz = 16 * 1024 * 1024; 535 + buf = malloc(buf_sz); 536 + if (!buf) 537 + return -ENOMEM; 538 + bpf_program__set_log_buf(prog, buf, buf_sz); 539 + bpf_program__set_log_level(prog, env.log_level | 4); /* stats + log */ 540 + } else { 541 + bpf_program__set_log_buf(prog, buf, buf_sz); 542 + bpf_program__set_log_level(prog, 4); /* only verifier stats */ 543 + } 544 + verif_log_buf[0] = '\0'; 545 + 546 + err = bpf_object__load(obj); 547 + env.progs_processed++; 548 + 549 + stats->file_name = strdup(basename(filename)); 550 + stats->prog_name = strdup(bpf_program__name(prog)); 551 + stats->stats[VERDICT] = err == 0; /* 1 - success, 0 - failure */ 552 + parse_verif_log(buf, buf_sz, stats); 553 + 554 + if (env.verbose) { 555 + printf("PROCESSING %s/%s, DURATION US: %ld, VERDICT: %s, VERIFIER LOG:\n%s\n", 556 + filename, prog_name, stats->stats[DURATION], 557 + err ? 
"failure" : "success", buf); 558 + } 559 + 560 + if (verif_log_buf != buf) 561 + free(buf); 562 + 563 + return 0; 564 + } 565 + 566 + static int process_obj(const char *filename) 567 + { 568 + struct bpf_object *obj = NULL, *tobj; 569 + struct bpf_program *prog, *tprog, *lprog; 570 + libbpf_print_fn_t old_libbpf_print_fn; 571 + LIBBPF_OPTS(bpf_object_open_opts, opts); 572 + int err = 0, prog_cnt = 0; 573 + 574 + if (!should_process_file(basename(filename))) { 575 + if (env.verbose) 576 + printf("Skipping '%s' due to filters...\n", filename); 577 + env.files_skipped++; 578 + return 0; 579 + } 580 + if (!is_bpf_obj_file(filename)) { 581 + if (env.verbose) 582 + printf("Skipping '%s' as it's not a BPF object file...\n", filename); 583 + env.files_skipped++; 584 + return 0; 585 + } 586 + 587 + if (!env.quiet && env.out_fmt == RESFMT_TABLE) 588 + printf("Processing '%s'...\n", basename(filename)); 589 + 590 + old_libbpf_print_fn = libbpf_set_print(libbpf_print_fn); 591 + obj = bpf_object__open_file(filename, &opts); 592 + if (!obj) { 593 + /* if libbpf can't open BPF object file, it could be because 594 + * that BPF object file is incomplete and has to be statically 595 + * linked into a final BPF object file; instead of bailing 596 + * out, report it into stderr, mark it as skipped, and 597 + * proceed 598 + */ 599 + fprintf(stderr, "Failed to open '%s': %d\n", filename, -errno); 600 + env.files_skipped++; 601 + err = 0; 602 + goto cleanup; 603 + } 604 + 605 + env.files_processed++; 606 + 607 + bpf_object__for_each_program(prog, obj) { 608 + prog_cnt++; 609 + } 610 + 611 + if (prog_cnt == 1) { 612 + prog = bpf_object__next_program(obj, NULL); 613 + bpf_program__set_autoload(prog, true); 614 + process_prog(filename, obj, prog); 615 + goto cleanup; 616 + } 617 + 618 + bpf_object__for_each_program(prog, obj) { 619 + const char *prog_name = bpf_program__name(prog); 620 + 621 + tobj = bpf_object__open_file(filename, &opts); 622 + if (!tobj) { 623 + err = -errno; 624 + 
fprintf(stderr, "Failed to open '%s': %d\n", filename, err); 625 + goto cleanup; 626 + } 627 + 628 + bpf_object__for_each_program(tprog, tobj) { 629 + const char *tprog_name = bpf_program__name(tprog); 630 + 631 + if (strcmp(prog_name, tprog_name) == 0) { 632 + bpf_program__set_autoload(tprog, true); 633 + lprog = tprog; 634 + } else { 635 + bpf_program__set_autoload(tprog, false); 636 + } 637 + } 638 + 639 + process_prog(filename, tobj, lprog); 640 + bpf_object__close(tobj); 641 + } 642 + 643 + cleanup: 644 + bpf_object__close(obj); 645 + libbpf_set_print(old_libbpf_print_fn); 646 + return err; 647 + } 648 + 649 + static int cmp_stat(const struct verif_stats *s1, const struct verif_stats *s2, 650 + enum stat_id id, bool asc) 651 + { 652 + int cmp = 0; 653 + 654 + switch (id) { 655 + case FILE_NAME: 656 + cmp = strcmp(s1->file_name, s2->file_name); 657 + break; 658 + case PROG_NAME: 659 + cmp = strcmp(s1->prog_name, s2->prog_name); 660 + break; 661 + case VERDICT: 662 + case DURATION: 663 + case TOTAL_INSNS: 664 + case TOTAL_STATES: 665 + case PEAK_STATES: 666 + case MAX_STATES_PER_INSN: 667 + case MARK_READ_MAX_LEN: { 668 + long v1 = s1->stats[id]; 669 + long v2 = s2->stats[id]; 670 + 671 + if (v1 != v2) 672 + cmp = v1 < v2 ? -1 : 1; 673 + break; 674 + } 675 + default: 676 + fprintf(stderr, "Unrecognized stat #%d\n", id); 677 + exit(1); 678 + } 679 + 680 + return asc ? 
cmp : -cmp; 681 + } 682 + 683 + static int cmp_prog_stats(const void *v1, const void *v2) 684 + { 685 + const struct verif_stats *s1 = v1, *s2 = v2; 686 + int i, cmp; 687 + 688 + for (i = 0; i < env.sort_spec.spec_cnt; i++) { 689 + cmp = cmp_stat(s1, s2, env.sort_spec.ids[i], env.sort_spec.asc[i]); 690 + if (cmp != 0) 691 + return cmp; 692 + } 693 + 694 + return 0; 695 + } 696 + 697 + #define HEADER_CHAR '-' 698 + #define COLUMN_SEP " " 699 + 700 + static void output_header_underlines(void) 701 + { 702 + int i, j, len; 703 + 704 + for (i = 0; i < env.output_spec.spec_cnt; i++) { 705 + len = env.output_spec.lens[i]; 706 + 707 + printf("%s", i == 0 ? "" : COLUMN_SEP); 708 + for (j = 0; j < len; j++) 709 + printf("%c", HEADER_CHAR); 710 + } 711 + printf("\n"); 712 + } 713 + 714 + static void output_headers(enum resfmt fmt) 715 + { 716 + int i, len; 717 + 718 + for (i = 0; i < env.output_spec.spec_cnt; i++) { 719 + int id = env.output_spec.ids[i]; 720 + int *max_len = &env.output_spec.lens[i]; 721 + 722 + switch (fmt) { 723 + case RESFMT_TABLE_CALCLEN: 724 + len = snprintf(NULL, 0, "%s", stat_defs[id].header); 725 + if (len > *max_len) 726 + *max_len = len; 727 + break; 728 + case RESFMT_TABLE: 729 + printf("%s%-*s", i == 0 ? "" : COLUMN_SEP, *max_len, stat_defs[id].header); 730 + if (i == env.output_spec.spec_cnt - 1) 731 + printf("\n"); 732 + break; 733 + case RESFMT_CSV: 734 + printf("%s%s", i == 0 ? "" : ",", stat_defs[id].names[0]); 735 + if (i == env.output_spec.spec_cnt - 1) 736 + printf("\n"); 737 + break; 738 + } 739 + } 740 + 741 + if (fmt == RESFMT_TABLE) 742 + output_header_underlines(); 743 + } 744 + 745 + static void prepare_value(const struct verif_stats *s, enum stat_id id, 746 + const char **str, long *val) 747 + { 748 + switch (id) { 749 + case FILE_NAME: 750 + *str = s->file_name; 751 + break; 752 + case PROG_NAME: 753 + *str = s->prog_name; 754 + break; 755 + case VERDICT: 756 + *str = s->stats[VERDICT] ? 
"success" : "failure"; 757 + break; 758 + case DURATION: 759 + case TOTAL_INSNS: 760 + case TOTAL_STATES: 761 + case PEAK_STATES: 762 + case MAX_STATES_PER_INSN: 763 + case MARK_READ_MAX_LEN: 764 + *val = s->stats[id]; 765 + break; 766 + default: 767 + fprintf(stderr, "Unrecognized stat #%d\n", id); 768 + exit(1); 769 + } 770 + } 771 + 772 + static void output_stats(const struct verif_stats *s, enum resfmt fmt, bool last) 773 + { 774 + int i; 775 + 776 + for (i = 0; i < env.output_spec.spec_cnt; i++) { 777 + int id = env.output_spec.ids[i]; 778 + int *max_len = &env.output_spec.lens[i], len; 779 + const char *str = NULL; 780 + long val = 0; 781 + 782 + prepare_value(s, id, &str, &val); 783 + 784 + switch (fmt) { 785 + case RESFMT_TABLE_CALCLEN: 786 + if (str) 787 + len = snprintf(NULL, 0, "%s", str); 788 + else 789 + len = snprintf(NULL, 0, "%ld", val); 790 + if (len > *max_len) 791 + *max_len = len; 792 + break; 793 + case RESFMT_TABLE: 794 + if (str) 795 + printf("%s%-*s", i == 0 ? "" : COLUMN_SEP, *max_len, str); 796 + else 797 + printf("%s%*ld", i == 0 ? "" : COLUMN_SEP, *max_len, val); 798 + if (i == env.output_spec.spec_cnt - 1) 799 + printf("\n"); 800 + break; 801 + case RESFMT_CSV: 802 + if (str) 803 + printf("%s%s", i == 0 ? "" : ",", str); 804 + else 805 + printf("%s%ld", i == 0 ? "" : ",", val); 806 + if (i == env.output_spec.spec_cnt - 1) 807 + printf("\n"); 808 + break; 809 + } 810 + } 811 + 812 + if (last && fmt == RESFMT_TABLE) { 813 + output_header_underlines(); 814 + printf("Done. Processed %d files, %d programs. 
Skipped %d files, %d programs.\n", 815 + env.files_processed, env.progs_processed, env.files_skipped, env.progs_skipped); 816 + } 817 + } 818 + 819 + static int handle_verif_mode(void) 820 + { 821 + int i, err; 822 + 823 + if (env.filename_cnt == 0) { 824 + fprintf(stderr, "Please provide path to BPF object file!\n"); 825 + argp_help(&argp, stderr, ARGP_HELP_USAGE, "veristat"); 826 + return -EINVAL; 827 + } 828 + 829 + for (i = 0; i < env.filename_cnt; i++) { 830 + err = process_obj(env.filenames[i]); 831 + if (err) { 832 + fprintf(stderr, "Failed to process '%s': %d\n", env.filenames[i], err); 833 + return err; 834 + } 835 + } 836 + 837 + qsort(env.prog_stats, env.prog_stat_cnt, sizeof(*env.prog_stats), cmp_prog_stats); 838 + 839 + if (env.out_fmt == RESFMT_TABLE) { 840 + /* calculate column widths */ 841 + output_headers(RESFMT_TABLE_CALCLEN); 842 + for (i = 0; i < env.prog_stat_cnt; i++) 843 + output_stats(&env.prog_stats[i], RESFMT_TABLE_CALCLEN, false); 844 + } 845 + 846 + /* actually output the table */ 847 + output_headers(env.out_fmt); 848 + for (i = 0; i < env.prog_stat_cnt; i++) { 849 + output_stats(&env.prog_stats[i], env.out_fmt, i == env.prog_stat_cnt - 1); 850 + } 851 + 852 + return 0; 853 + } 854 + 855 + static int parse_stat_value(const char *str, enum stat_id id, struct verif_stats *st) 856 + { 857 + switch (id) { 858 + case FILE_NAME: 859 + st->file_name = strdup(str); 860 + if (!st->file_name) 861 + return -ENOMEM; 862 + break; 863 + case PROG_NAME: 864 + st->prog_name = strdup(str); 865 + if (!st->prog_name) 866 + return -ENOMEM; 867 + break; 868 + case VERDICT: 869 + if (strcmp(str, "success") == 0) { 870 + st->stats[VERDICT] = true; 871 + } else if (strcmp(str, "failure") == 0) { 872 + st->stats[VERDICT] = false; 873 + } else { 874 + fprintf(stderr, "Unrecognized verification verdict '%s'\n", str); 875 + return -EINVAL; 876 + } 877 + break; 878 + case DURATION: 879 + case TOTAL_INSNS: 880 + case TOTAL_STATES: 881 + case PEAK_STATES: 882 + case 
MAX_STATES_PER_INSN: 883 + case MARK_READ_MAX_LEN: { 884 + long val; 885 + int err, n; 886 + 887 + if (sscanf(str, "%ld %n", &val, &n) != 1 || n != strlen(str)) { 888 + err = -EINVAL; 889 + fprintf(stderr, "Failed to parse '%s' as integer\n", str); 890 + return err; 891 + } 892 + 893 + st->stats[id] = val; 894 + break; 895 + } 896 + default: 897 + fprintf(stderr, "Unrecognized stat #%d\n", id); 898 + return -EINVAL; 899 + } 900 + return 0; 901 + } 902 + 903 + static int parse_stats_csv(const char *filename, struct stat_specs *specs, 904 + struct verif_stats **statsp, int *stat_cntp) 905 + { 906 + char line[4096]; 907 + FILE *f; 908 + int err = 0; 909 + bool header = true; 910 + 911 + f = fopen(filename, "r"); 912 + if (!f) { 913 + err = -errno; 914 + fprintf(stderr, "Failed to open '%s': %d\n", filename, err); 915 + return err; 916 + } 917 + 918 + *stat_cntp = 0; 919 + 920 + while (fgets(line, sizeof(line), f)) { 921 + char *input = line, *state = NULL, *next; 922 + struct verif_stats *st = NULL; 923 + int col = 0; 924 + 925 + if (!header) { 926 + void *tmp; 927 + 928 + tmp = realloc(*statsp, (*stat_cntp + 1) * sizeof(**statsp)); 929 + if (!tmp) { 930 + err = -ENOMEM; 931 + goto cleanup; 932 + } 933 + *statsp = tmp; 934 + 935 + st = &(*statsp)[*stat_cntp]; 936 + memset(st, 0, sizeof(*st)); 937 + 938 + *stat_cntp += 1; 939 + } 940 + 941 + while ((next = strtok_r(state ? 
NULL : input, ",\n", &state))) { 942 + if (header) { 943 + /* for the first line, set up spec stats */ 944 + err = parse_stat(next, specs); 945 + if (err) 946 + goto cleanup; 947 + continue; 948 + } 949 + 950 + /* for all other lines, parse values based on spec */ 951 + if (col >= specs->spec_cnt) { 952 + fprintf(stderr, "Found extraneous column #%d in row #%d of '%s'\n", 953 + col, *stat_cntp, filename); 954 + err = -EINVAL; 955 + goto cleanup; 956 + } 957 + err = parse_stat_value(next, specs->ids[col], st); 958 + if (err) 959 + goto cleanup; 960 + col++; 961 + } 962 + 963 + if (header) { 964 + header = false; 965 + continue; 966 + } 967 + 968 + if (col < specs->spec_cnt) { 969 + fprintf(stderr, "Not enough columns in row #%d in '%s'\n", 970 + *stat_cntp, filename); 971 + err = -EINVAL; 972 + goto cleanup; 973 + } 974 + 975 + if (!st->file_name || !st->prog_name) { 976 + fprintf(stderr, "Row #%d in '%s' is missing file and/or program name\n", 977 + *stat_cntp, filename); 978 + err = -EINVAL; 979 + goto cleanup; 980 + } 981 + 982 + /* in comparison mode we can only check filters after we 983 + * parsed entire line; if row should be ignored we pretend we 984 + * never parsed it 985 + */ 986 + if (!should_process_prog(st->file_name, st->prog_name)) { 987 + free(st->file_name); 988 + free(st->prog_name); 989 + *stat_cntp -= 1; 990 + } 991 + } 992 + 993 + if (!feof(f)) { 994 + err = -errno; 995 + fprintf(stderr, "Failed I/O for '%s': %d\n", filename, err); 996 + } 997 + 998 + cleanup: 999 + fclose(f); 1000 + return err; 1001 + } 1002 + 1003 + /* empty/zero stats for mismatched rows */ 1004 + static const struct verif_stats fallback_stats = { .file_name = "", .prog_name = "" }; 1005 + 1006 + static bool is_key_stat(enum stat_id id) 1007 + { 1008 + return id == FILE_NAME || id == PROG_NAME; 1009 + } 1010 + 1011 + static void output_comp_header_underlines(void) 1012 + { 1013 + int i, j, k; 1014 + 1015 + for (i = 0; i < env.output_spec.spec_cnt; i++) { 1016 + int id = 
env.output_spec.ids[i]; 1017 + int max_j = is_key_stat(id) ? 1 : 3; 1018 + 1019 + for (j = 0; j < max_j; j++) { 1020 + int len = env.output_spec.lens[3 * i + j]; 1021 + 1022 + printf("%s", i + j == 0 ? "" : COLUMN_SEP); 1023 + 1024 + for (k = 0; k < len; k++) 1025 + printf("%c", HEADER_CHAR); 1026 + } 1027 + } 1028 + printf("\n"); 1029 + } 1030 + 1031 + static void output_comp_headers(enum resfmt fmt) 1032 + { 1033 + static const char *table_sfxs[3] = {" (A)", " (B)", " (DIFF)"}; 1034 + static const char *name_sfxs[3] = {"_base", "_comp", "_diff"}; 1035 + int i, j, len; 1036 + 1037 + for (i = 0; i < env.output_spec.spec_cnt; i++) { 1038 + int id = env.output_spec.ids[i]; 1039 + /* key stats don't have A/B/DIFF columns, they are common for both data sets */ 1040 + int max_j = is_key_stat(id) ? 1 : 3; 1041 + 1042 + for (j = 0; j < max_j; j++) { 1043 + int *max_len = &env.output_spec.lens[3 * i + j]; 1044 + bool last = (i == env.output_spec.spec_cnt - 1) && (j == max_j - 1); 1045 + const char *sfx; 1046 + 1047 + switch (fmt) { 1048 + case RESFMT_TABLE_CALCLEN: 1049 + sfx = is_key_stat(id) ? "" : table_sfxs[j]; 1050 + len = snprintf(NULL, 0, "%s%s", stat_defs[id].header, sfx); 1051 + if (len > *max_len) 1052 + *max_len = len; 1053 + break; 1054 + case RESFMT_TABLE: 1055 + sfx = is_key_stat(id) ? "" : table_sfxs[j]; 1056 + printf("%s%-*s%s", i + j == 0 ? "" : COLUMN_SEP, 1057 + *max_len - (int)strlen(sfx), stat_defs[id].header, sfx); 1058 + if (last) 1059 + printf("\n"); 1060 + break; 1061 + case RESFMT_CSV: 1062 + sfx = is_key_stat(id) ? "" : name_sfxs[j]; 1063 + printf("%s%s%s", i + j == 0 ? 
"" : ",", stat_defs[id].names[0], sfx); 1064 + if (last) 1065 + printf("\n"); 1066 + break; 1067 + } 1068 + } 1069 + } 1070 + 1071 + if (fmt == RESFMT_TABLE) 1072 + output_comp_header_underlines(); 1073 + } 1074 + 1075 + static void output_comp_stats(const struct verif_stats *base, const struct verif_stats *comp, 1076 + enum resfmt fmt, bool last) 1077 + { 1078 + char base_buf[1024] = {}, comp_buf[1024] = {}, diff_buf[1024] = {}; 1079 + int i; 1080 + 1081 + for (i = 0; i < env.output_spec.spec_cnt; i++) { 1082 + int id = env.output_spec.ids[i], len; 1083 + int *max_len_base = &env.output_spec.lens[3 * i + 0]; 1084 + int *max_len_comp = &env.output_spec.lens[3 * i + 1]; 1085 + int *max_len_diff = &env.output_spec.lens[3 * i + 2]; 1086 + const char *base_str = NULL, *comp_str = NULL; 1087 + long base_val = 0, comp_val = 0, diff_val = 0; 1088 + 1089 + prepare_value(base, id, &base_str, &base_val); 1090 + prepare_value(comp, id, &comp_str, &comp_val); 1091 + 1092 + /* normalize all the outputs to be in string buffers for simplicity */ 1093 + if (is_key_stat(id)) { 1094 + /* key stats (file and program name) are always strings */ 1095 + if (base != &fallback_stats) 1096 + snprintf(base_buf, sizeof(base_buf), "%s", base_str); 1097 + else 1098 + snprintf(base_buf, sizeof(base_buf), "%s", comp_str); 1099 + } else if (base_str) { 1100 + snprintf(base_buf, sizeof(base_buf), "%s", base_str); 1101 + snprintf(comp_buf, sizeof(comp_buf), "%s", comp_str); 1102 + if (strcmp(base_str, comp_str) == 0) 1103 + snprintf(diff_buf, sizeof(diff_buf), "%s", "MATCH"); 1104 + else 1105 + snprintf(diff_buf, sizeof(diff_buf), "%s", "MISMATCH"); 1106 + } else { 1107 + snprintf(base_buf, sizeof(base_buf), "%ld", base_val); 1108 + snprintf(comp_buf, sizeof(comp_buf), "%ld", comp_val); 1109 + 1110 + diff_val = comp_val - base_val; 1111 + if (base == &fallback_stats || comp == &fallback_stats || base_val == 0) { 1112 + snprintf(diff_buf, sizeof(diff_buf), "%+ld (%+.2lf%%)", 1113 + diff_val, 
comp_val < base_val ? -100.0 : 100.0); 1114 + } else { 1115 + snprintf(diff_buf, sizeof(diff_buf), "%+ld (%+.2lf%%)", 1116 + diff_val, diff_val * 100.0 / base_val); 1117 + } 1118 + } 1119 + 1120 + switch (fmt) { 1121 + case RESFMT_TABLE_CALCLEN: 1122 + len = strlen(base_buf); 1123 + if (len > *max_len_base) 1124 + *max_len_base = len; 1125 + if (!is_key_stat(id)) { 1126 + len = strlen(comp_buf); 1127 + if (len > *max_len_comp) 1128 + *max_len_comp = len; 1129 + len = strlen(diff_buf); 1130 + if (len > *max_len_diff) 1131 + *max_len_diff = len; 1132 + } 1133 + break; 1134 + case RESFMT_TABLE: { 1135 + /* string outputs are left-aligned, number outputs are right-aligned */ 1136 + const char *fmt = base_str ? "%s%-*s" : "%s%*s"; 1137 + 1138 + printf(fmt, i == 0 ? "" : COLUMN_SEP, *max_len_base, base_buf); 1139 + if (!is_key_stat(id)) { 1140 + printf(fmt, COLUMN_SEP, *max_len_comp, comp_buf); 1141 + printf(fmt, COLUMN_SEP, *max_len_diff, diff_buf); 1142 + } 1143 + if (i == env.output_spec.spec_cnt - 1) 1144 + printf("\n"); 1145 + break; 1146 + } 1147 + case RESFMT_CSV: 1148 + printf("%s%s", i == 0 ? "" : ",", base_buf); 1149 + if (!is_key_stat(id)) { 1150 + printf("%s%s", i == 0 ? "" : ",", comp_buf); 1151 + printf("%s%s", i == 0 ? 
"" : ",", diff_buf); 1152 + } 1153 + if (i == env.output_spec.spec_cnt - 1) 1154 + printf("\n"); 1155 + break; 1156 + } 1157 + } 1158 + 1159 + if (last && fmt == RESFMT_TABLE) 1160 + output_comp_header_underlines(); 1161 + } 1162 + 1163 + static int cmp_stats_key(const struct verif_stats *base, const struct verif_stats *comp) 1164 + { 1165 + int r; 1166 + 1167 + r = strcmp(base->file_name, comp->file_name); 1168 + if (r != 0) 1169 + return r; 1170 + return strcmp(base->prog_name, comp->prog_name); 1171 + } 1172 + 1173 + static int handle_comparison_mode(void) 1174 + { 1175 + struct stat_specs base_specs = {}, comp_specs = {}; 1176 + enum resfmt cur_fmt; 1177 + int err, i, j; 1178 + 1179 + if (env.filename_cnt != 2) { 1180 + fprintf(stderr, "Comparison mode expects exactly two input CSV files!\n"); 1181 + argp_help(&argp, stderr, ARGP_HELP_USAGE, "veristat"); 1182 + return -EINVAL; 1183 + } 1184 + 1185 + err = parse_stats_csv(env.filenames[0], &base_specs, 1186 + &env.baseline_stats, &env.baseline_stat_cnt); 1187 + if (err) { 1188 + fprintf(stderr, "Failed to parse stats from '%s': %d\n", env.filenames[0], err); 1189 + return err; 1190 + } 1191 + err = parse_stats_csv(env.filenames[1], &comp_specs, 1192 + &env.prog_stats, &env.prog_stat_cnt); 1193 + if (err) { 1194 + fprintf(stderr, "Failed to parse stats from '%s': %d\n", env.filenames[1], err); 1195 + return err; 1196 + } 1197 + 1198 + /* To keep it simple we validate that the set and order of stats in 1199 + * both CSVs are exactly the same. This can be lifted with a bit more 1200 + * pre-processing later. 
1201 + */ 1202 + if (base_specs.spec_cnt != comp_specs.spec_cnt) { 1203 + fprintf(stderr, "Number of stats in '%s' and '%s' differs (%d != %d)!\n", 1204 + env.filenames[0], env.filenames[1], 1205 + base_specs.spec_cnt, comp_specs.spec_cnt); 1206 + return -EINVAL; 1207 + } 1208 + for (i = 0; i < base_specs.spec_cnt; i++) { 1209 + if (base_specs.ids[i] != comp_specs.ids[i]) { 1210 + fprintf(stderr, "Stats composition differs between '%s' and '%s' (%s != %s)!\n", 1211 + env.filenames[0], env.filenames[1], 1212 + stat_defs[base_specs.ids[i]].names[0], 1213 + stat_defs[comp_specs.ids[i]].names[0]); 1214 + return -EINVAL; 1215 + } 1216 + } 1217 + 1218 + qsort(env.prog_stats, env.prog_stat_cnt, sizeof(*env.prog_stats), cmp_prog_stats); 1219 + qsort(env.baseline_stats, env.baseline_stat_cnt, sizeof(*env.baseline_stats), cmp_prog_stats); 1220 + 1221 + /* for human-readable table output we need to do extra pass to 1222 + * calculate column widths, so we substitute current output format 1223 + * with RESFMT_TABLE_CALCLEN and later revert it back to RESFMT_TABLE 1224 + * and do everything again. 1225 + */ 1226 + if (env.out_fmt == RESFMT_TABLE) 1227 + cur_fmt = RESFMT_TABLE_CALCLEN; 1228 + else 1229 + cur_fmt = env.out_fmt; 1230 + 1231 + one_more_time: 1232 + output_comp_headers(cur_fmt); 1233 + 1234 + /* If baseline and comparison datasets have different subset of rows 1235 + * (we match by 'object + prog' as a unique key) then assume 1236 + * empty/missing/zero value for rows that are missing in the opposite 1237 + * data set 1238 + */ 1239 + i = j = 0; 1240 + while (i < env.baseline_stat_cnt || j < env.prog_stat_cnt) { 1241 + bool last = (i == env.baseline_stat_cnt - 1) || (j == env.prog_stat_cnt - 1); 1242 + const struct verif_stats *base, *comp; 1243 + int r; 1244 + 1245 + base = i < env.baseline_stat_cnt ? &env.baseline_stats[i] : &fallback_stats; 1246 + comp = j < env.prog_stat_cnt ? 
&env.prog_stats[j] : &fallback_stats; 1247 + 1248 + if (!base->file_name || !base->prog_name) { 1249 + fprintf(stderr, "Entry #%d in '%s' doesn't have file and/or program name specified!\n", 1250 + i, env.filenames[0]); 1251 + return -EINVAL; 1252 + } 1253 + if (!comp->file_name || !comp->prog_name) { 1254 + fprintf(stderr, "Entry #%d in '%s' doesn't have file and/or program name specified!\n", 1255 + j, env.filenames[1]); 1256 + return -EINVAL; 1257 + } 1258 + 1259 + r = cmp_stats_key(base, comp); 1260 + if (r == 0) { 1261 + output_comp_stats(base, comp, cur_fmt, last); 1262 + i++; 1263 + j++; 1264 + } else if (comp == &fallback_stats || r < 0) { 1265 + output_comp_stats(base, &fallback_stats, cur_fmt, last); 1266 + i++; 1267 + } else { 1268 + output_comp_stats(&fallback_stats, comp, cur_fmt, last); 1269 + j++; 1270 + } 1271 + } 1272 + 1273 + if (cur_fmt == RESFMT_TABLE_CALCLEN) { 1274 + cur_fmt = RESFMT_TABLE; 1275 + goto one_more_time; /* ... this time with feeling */ 1276 + } 1277 + 1278 + return 0; 1279 + } 1280 + 1281 + int main(int argc, char **argv) 1282 + { 1283 + int err = 0, i; 1284 + 1285 + if (argp_parse(&argp, argc, argv, 0, NULL, NULL)) 1286 + return 1; 1287 + 1288 + if (env.verbose && env.quiet) { 1289 + fprintf(stderr, "Verbose and quiet modes are incompatible, please specify just one or neither!\n"); 1290 + argp_help(&argp, stderr, ARGP_HELP_USAGE, "veristat"); 1291 + return 1; 1292 + } 1293 + if (env.verbose && env.log_level == 0) 1294 + env.log_level = 1; 1295 + 1296 + if (env.output_spec.spec_cnt == 0) 1297 + env.output_spec = default_output_spec; 1298 + if (env.sort_spec.spec_cnt == 0) 1299 + env.sort_spec = default_sort_spec; 1300 + 1301 + if (env.comparison_mode) 1302 + err = handle_comparison_mode(); 1303 + else 1304 + err = handle_verif_mode(); 1305 + 1306 + free_verif_stats(env.prog_stats, env.prog_stat_cnt); 1307 + free_verif_stats(env.baseline_stats, env.baseline_stat_cnt); 1308 + for (i = 0; i < env.filename_cnt; i++) 1309 + 
free(env.filenames[i]); 1310 + free(env.filenames); 1311 + for (i = 0; i < env.allow_filter_cnt; i++) { 1312 + free(env.allow_filters[i].file_glob); 1313 + free(env.allow_filters[i].prog_glob); 1314 + } 1315 + free(env.allow_filters); 1316 + for (i = 0; i < env.deny_filter_cnt; i++) { 1317 + free(env.deny_filters[i].file_glob); 1318 + free(env.deny_filters[i].prog_glob); 1319 + } 1320 + free(env.deny_filters); 1321 + return -err; 1322 + }
+17
tools/testing/selftests/bpf/veristat.cfg
··· 1 + # pre-canned list of rather complex selftests/bpf BPF object files to monitor 2 + # BPF verifier's performance on 3 + bpf_flow* 4 + bpf_loop_bench* 5 + loop* 6 + netif_receive_skb* 7 + profiler* 8 + pyperf* 9 + strobemeta* 10 + test_cls_redirect* 11 + test_l4lb 12 + test_sysctl* 13 + test_tcp_hdr_* 14 + test_usdt* 15 + test_verif_scale* 16 + test_xdp_noinline* 17 + xdp_synproxy*
-3
tools/testing/selftests/bpf/xskxceiver.c
··· 1953 1953 1954 1954 pkt_stream_delete(tx_pkt_stream_default); 1955 1955 pkt_stream_delete(rx_pkt_stream_default); 1956 - free(ifobj_rx->umem); 1957 - if (!ifobj_tx->shared_umem) 1958 - free(ifobj_tx->umem); 1959 1956 ifobject_delete(ifobj_tx); 1960 1957 ifobject_delete(ifobj_rx); 1961 1958