Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

- Fix and improve BTF deduplication of identical BTF types (Alan
  Maguire and Andrii Nakryiko)

- Support up to 12 arguments in BPF trampoline on arm64 (Xu Kuohai and
  Alexis Lothoré)

- Support load-acquire and store-release instructions in BPF JIT on
  riscv64 (Andrea Parri)

- Fix uninitialized values in BPF_{CORE,PROBE}_READ macros (Anton
  Protopopov)

- Streamline allowed helpers across program types (Feng Yang)

- Support atomic update for hashtab of BPF maps (Hou Tao)

- Implement json output for BPF helpers (Ihor Solodrai)

- Several s390 JIT fixes (Ilya Leoshkevich)

- Various sockmap fixes (Jiayuan Chen)

- Support mmap of vmlinux BTF data (Lorenz Bauer)

- Support BPF rbtree traversal and list peeking (Martin KaFai Lau)

- Tests for sockmap/sockhash redirection (Michal Luczaj)

- Introduce kfuncs for memory reads into dynptrs (Mykyta Yatsenko)

- Add support for dma-buf iterators in BPF (T.J. Mercier)

- The verifier support for __bpf_trap() (Yonghong Song)

* tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (135 commits)
bpf, arm64: Remove unused-but-set function and variable.
selftests/bpf: Add tests with stack ptr register in conditional jmp
bpf: Do not include stack ptr register in precision backtracking bookkeeping
selftests/bpf: enable many-args tests for arm64
bpf, arm64: Support up to 12 function arguments
bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem()
bpf: Avoid __bpf_prog_ret0_warn when jit fails
bpftool: Add support for custom BTF path in prog load/loadall
selftests/bpf: Add unit tests with __bpf_trap() kfunc
bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variable
bpf: Remove special_kfunc_set from verifier
selftests/bpf: Add test for open coded dmabuf_iter
selftests/bpf: Add test for dmabuf_iter
bpf: Add open coded dmabuf iterator
bpf: Add dmabuf iterator
dma-buf: Rename debugfs symbols
bpf: Fix error return value in bpf_copy_from_user_dynptr
libbpf: Use mmap to parse vmlinux BTF from sysfs
selftests: bpf: Add a test for mmapable vmlinux BTF
btf: Allow mmap of vmlinux btf
...

5810 insertions(+), 1722 deletions(-)
Documentation/bpf/bpf_iterators.rst  (+112 -5)
···
 BPF Iterators
 =============

+--------
+Overview
+--------

-----------
-Motivation
-----------
+BPF supports two separate entities collectively known as "BPF iterators": BPF
+iterator *program type* and *open-coded* BPF iterators. The former is
+a stand-alone BPF program type which, when attached and activated by user,
+will be called once for each entity (task_struct, cgroup, etc) that is being
+iterated. The latter is a set of BPF-side APIs implementing iterator
+functionality and available across multiple BPF program types. Open-coded
+iterators provide similar functionality to BPF iterator programs, but gives
+more flexibility and control to all other BPF program types. BPF iterator
+programs, on the other hand, can be used to implement anonymous or BPF
+FS-mounted special files, whose contents are generated by attached BPF iterator
+program, backed by seq_file functionality. Both are useful depending on
+specific needs.
+
+When adding a new BPF iterator program, it is expected that similar
+functionality will be added as open-coded iterator for maximum flexibility.
+It's also expected that iteration logic and code will be maximally shared and
+reused between two iterator API surfaces.
+
+------------------------
+Open-coded BPF Iterators
+------------------------
+
+Open-coded BPF iterators are implemented as tightly-coupled trios of kfuncs
+(constructor, next element fetch, destructor) and iterator-specific type
+describing on-the-stack iterator state, which is guaranteed by the BPF
+verifier to not be tampered with outside of the corresponding
+constructor/destructor/next APIs.
+
+Each kind of open-coded BPF iterator has its own associated
+struct bpf_iter_<type>, where <type> denotes a specific type of iterator.
+bpf_iter_<type> state needs to live on BPF program stack, so make sure it's
+small enough to fit on BPF stack. For performance reasons its best to avoid
+dynamic memory allocation for iterator state and size the state struct big
+enough to fit everything necessary. But if necessary, dynamic memory
+allocation is a way to bypass BPF stack limitations. Note, state struct size
+is part of iterator's user-visible API, so changing it will break backwards
+compatibility, so be deliberate about designing it.
+
+All kfuncs (constructor, next, destructor) have to be named consistently as
+bpf_iter_<type>_{new,next,destroy}(), respectively. <type> represents iterator
+type, and iterator state should be represented as a matching
+`struct bpf_iter_<type>` state type. Also, all iter kfuncs should have
+a pointer to this `struct bpf_iter_<type>` as the very first argument.
+
+Additionally:
+  - Constructor, i.e., `bpf_iter_<type>_new()`, can have arbitrary extra
+    number of arguments. Return type is not enforced either.
+  - Next method, i.e., `bpf_iter_<type>_next()`, has to return a pointer
+    type and should have exactly one argument: `struct bpf_iter_<type> *`
+    (const/volatile/restrict and typedefs are ignored).
+  - Destructor, i.e., `bpf_iter_<type>_destroy()`, should return void and
+    should have exactly one argument, similar to the next method.
+  - `struct bpf_iter_<type>` size is enforced to be positive and
+    a multiple of 8 bytes (to fit stack slots correctly).
+
+Such strictness and consistency allows to build generic helpers abstracting
+important, but boilerplate, details to be able to use open-coded iterators
+effectively and ergonomically (see libbpf's bpf_for_each() macro). This is
+enforced at kfunc registration point by the kernel.
+
+Constructor/next/destructor implementation contract is as follows:
+  - constructor, `bpf_iter_<type>_new()`, always initializes iterator state on
+    the stack. If any of the input arguments are invalid, constructor should
+    make sure to still initialize it such that subsequent next() calls will
+    return NULL. I.e., on error, *return error and construct empty iterator*.
+    Constructor kfunc is marked with KF_ITER_NEW flag.
+
+  - next method, `bpf_iter_<type>_next()`, accepts pointer to iterator state
+    and produces an element. Next method should always return a pointer. The
+    contract between BPF verifier is that next method *guarantees* that it
+    will eventually return NULL when elements are exhausted. Once NULL is
+    returned, subsequent next calls *should keep returning NULL*. Next method
+    is marked with KF_ITER_NEXT (and should also have KF_RET_NULL as
+    NULL-returning kfunc, of course).
+
+  - destructor, `bpf_iter_<type>_destroy()`, is always called once. Even if
+    constructor failed or next returned nothing. Destructor frees up any
+    resources and marks stack space used by `struct bpf_iter_<type>` as usable
+    for something else. Destructor is marked with KF_ITER_DESTROY flag.
+
+Any open-coded BPF iterator implementation has to implement at least these
+three methods. It is enforced that for any given type of iterator only
+applicable constructor/destructor/next are callable. I.e., verifier ensures
+you can't pass number iterator state into, say, cgroup iterator's next method.
+
+From a 10,000-feet BPF verification point of view, next methods are the points
+of forking a verification state, which are conceptually similar to what
+verifier is doing when validating conditional jumps. Verifier is branching out
+`call bpf_iter_<type>_next` instruction and simulates two outcomes: NULL
+(iteration is done) and non-NULL (new element is returned). NULL is simulated
+first and is supposed to reach exit without looping. After that non-NULL case
+is validated and it either reaches exit (for trivial examples with no real
+loop), or reaches another `call bpf_iter_<type>_next` instruction with the
+state equivalent to already (partially) validated one. State equivalency at
+that point means we technically are going to be looping forever without
+"breaking out" out of established "state envelope" (i.e., subsequent
+iterations don't add any new knowledge or constraints to the verifier state,
+so running 1, 2, 10, or a million of them doesn't matter). But taking into
+account the contract stating that iterator next method *has to* return NULL
+eventually, we can conclude that loop body is safe and will eventually
+terminate. Given we validated logic outside of the loop (NULL case), and
+concluded that loop body is safe (though potentially looping many times),
+verifier can claim safety of the overall program logic.
+
+------------------------
+BPF Iterators Motivation
+------------------------

 There are a few existing ways to dump kernel data into user space. The most
 popular one is the ``/proc`` system. For example, ``cat /proc/net/tcp6`` dumps
···

 ::

-  link = bpf_program__attach_iter(prog, &opts); iter_fd =
-  bpf_iter_create(bpf_link__fd(link));
+  link = bpf_program__attach_iter(prog, &opts);
+  iter_fd = bpf_iter_create(bpf_link__fd(link));

 If both *tid* and *pid* are zero, an iterator created from this struct
 ``bpf_iter_attach_opts`` will include every opened file of every task in the
Documentation/bpf/kfuncs.rst  (+17)
···
 ...
 }

+2.2.6 __prog Annotation
+---------------------------
+This annotation is used to indicate that the argument needs to be fixed up to
+the bpf_prog_aux of the caller BPF program. Any value passed into this argument
+is ignored, and rewritten by the verifier.
+
+An example is given below::
+
+        __bpf_kfunc int bpf_wq_set_callback_impl(struct bpf_wq *wq,
+                                                 int (callback_fn)(void *map, int *key, void *value),
+                                                 unsigned int flags,
+                                                 void *aux__prog)
+        {
+                struct bpf_prog_aux *aux = aux__prog;
+                ...
+        }
+
 .. _BPF_kfunc_nodef:

 2.3 Using an existing kernel function
arch/arm64/net/bpf_jit_comp.c  (+172 -72)
···
 }

 static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
-                            int args_off, int retval_off, int run_ctx_off,
+                            int bargs_off, int retval_off, int run_ctx_off,
                             bool save_ret)
 {
     __le32 *branch;
···
     branch = ctx->image + ctx->idx;
     emit(A64_NOP, ctx);

-    emit(A64_ADD_I(1, A64_R(0), A64_SP, args_off), ctx);
+    emit(A64_ADD_I(1, A64_R(0), A64_SP, bargs_off), ctx);
     if (!p->jited)
         emit_addr_mov_i64(A64_R(1), (const u64)p->insnsi, ctx);
···
 }

 static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
-                               int args_off, int retval_off, int run_ctx_off,
+                               int bargs_off, int retval_off, int run_ctx_off,
                                __le32 **branches)
 {
     int i;
···
      */
     emit(A64_STR64I(A64_ZR, A64_SP, retval_off), ctx);
     for (i = 0; i < tl->nr_links; i++) {
-        invoke_bpf_prog(ctx, tl->links[i], args_off, retval_off,
+        invoke_bpf_prog(ctx, tl->links[i], bargs_off, retval_off,
                         run_ctx_off, true);
         /* if (*(u64 *)(sp + retval_off) != 0)
          *     goto do_fexit;
···
     }
 }

-static void save_args(struct jit_ctx *ctx, int args_off, int nregs)
-{
-    int i;
+struct arg_aux {
+    /* how many args are passed through registers, the rest of the args are
+     * passed through stack
+     */
+    int args_in_regs;
+    /* how many registers are used to pass arguments */
+    int regs_for_args;
+    /* how much stack is used for additional args passed to bpf program
+     * that did not fit in original function registers
+     */
+    int bstack_for_args;
+    /* home much stack is used for additional args passed to the
+     * original function when called from trampoline (this one needs
+     * arguments to be properly aligned)
+     */
+    int ostack_for_args;
+};

-    for (i = 0; i < nregs; i++) {
-        emit(A64_STR64I(i, A64_SP, args_off), ctx);
-        args_off += 8;
+static int calc_arg_aux(const struct btf_func_model *m,
+                        struct arg_aux *a)
+{
+    int stack_slots, nregs, slots, i;
+
+    /* verifier ensures m->nr_args <= MAX_BPF_FUNC_ARGS */
+    for (i = 0, nregs = 0; i < m->nr_args; i++) {
+        slots = (m->arg_size[i] + 7) / 8;
+        if (nregs + slots <= 8) /* passed through register ? */
+            nregs += slots;
+        else
+            break;
+    }
+
+    a->args_in_regs = i;
+    a->regs_for_args = nregs;
+    a->ostack_for_args = 0;
+    a->bstack_for_args = 0;
+
+    /* the rest arguments are passed through stack */
+    for (; i < m->nr_args; i++) {
+        /* We can not know for sure about exact alignment needs for
+         * struct passed on stack, so deny those
+         */
+        if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
+            return -ENOTSUPP;
+        stack_slots = (m->arg_size[i] + 7) / 8;
+        a->bstack_for_args += stack_slots * 8;
+        a->ostack_for_args = a->ostack_for_args + stack_slots * 8;
+    }
+
+    return 0;
+}
+
+static void clear_garbage(struct jit_ctx *ctx, int reg, int effective_bytes)
+{
+    if (effective_bytes) {
+        int garbage_bits = 64 - 8 * effective_bytes;
+#ifdef CONFIG_CPU_BIG_ENDIAN
+        /* garbage bits are at the right end */
+        emit(A64_LSR(1, reg, reg, garbage_bits), ctx);
+        emit(A64_LSL(1, reg, reg, garbage_bits), ctx);
+#else
+        /* garbage bits are at the left end */
+        emit(A64_LSL(1, reg, reg, garbage_bits), ctx);
+        emit(A64_LSR(1, reg, reg, garbage_bits), ctx);
+#endif
     }
 }

-static void restore_args(struct jit_ctx *ctx, int args_off, int nregs)
+static void save_args(struct jit_ctx *ctx, int bargs_off, int oargs_off,
+                      const struct btf_func_model *m,
+                      const struct arg_aux *a,
+                      bool for_call_origin)
 {
     int i;
+    int reg;
+    int doff;
+    int soff;
+    int slots;
+    u8 tmp = bpf2a64[TMP_REG_1];

-    for (i = 0; i < nregs; i++) {
-        emit(A64_LDR64I(i, A64_SP, args_off), ctx);
-        args_off += 8;
+    /* store arguments to the stack for the bpf program, or restore
+     * arguments from stack for the original function
+     */
+    for (reg = 0; reg < a->regs_for_args; reg++) {
+        emit(for_call_origin ?
+             A64_LDR64I(reg, A64_SP, bargs_off) :
+             A64_STR64I(reg, A64_SP, bargs_off),
+             ctx);
+        bargs_off += 8;
+    }
+
+    soff = 32; /* on stack arguments start from FP + 32 */
+    doff = (for_call_origin ? oargs_off : bargs_off);
+
+    /* save on stack arguments */
+    for (i = a->args_in_regs; i < m->nr_args; i++) {
+        slots = (m->arg_size[i] + 7) / 8;
+        /* verifier ensures arg_size <= 16, so slots equals 1 or 2 */
+        while (slots-- > 0) {
+            emit(A64_LDR64I(tmp, A64_FP, soff), ctx);
+            /* if there is unused space in the last slot, clear
+             * the garbage contained in the space.
+             */
+            if (slots == 0 && !for_call_origin)
+                clear_garbage(ctx, tmp, m->arg_size[i] % 8);
+            emit(A64_STR64I(tmp, A64_SP, doff), ctx);
+            soff += 8;
+            doff += 8;
+        }
+    }
+}
+
+static void restore_args(struct jit_ctx *ctx, int bargs_off, int nregs)
+{
+    int reg;
+
+    for (reg = 0; reg < nregs; reg++) {
+        emit(A64_LDR64I(reg, A64_SP, bargs_off), ctx);
+        bargs_off += 8;
     }
 }
···
  */
 static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
                               struct bpf_tramp_links *tlinks, void *func_addr,
-                              int nregs, u32 flags)
+                              const struct btf_func_model *m,
+                              const struct arg_aux *a,
+                              u32 flags)
 {
     int i;
     int stack_size;
     int retaddr_off;
     int regs_off;
     int retval_off;
-    int args_off;
-    int nregs_off;
+    int bargs_off;
+    int nfuncargs_off;
     int ip_off;
     int run_ctx_off;
+    int oargs_off;
+    int nfuncargs;
     struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
     struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
     struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
···
     bool is_struct_ops = is_struct_ops_tramp(fentry);

     /* trampoline stack layout:
-     * [ parent ip ]
-     * [ FP ]
-     * SP + retaddr_off [ self ip ]
-     * [ FP ]
+     *                    [ parent ip ]
+     *                    [ FP ]
+     * SP + retaddr_off   [ self ip ]
+     *                    [ FP ]
      *
-     * [ padding ] align SP to multiples of 16
+     *                    [ padding ] align SP to multiples of 16
      *
-     * [ x20 ] callee saved reg x20
-     * SP + regs_off [ x19 ] callee saved reg x19
+     *                    [ x20 ] callee saved reg x20
+     * SP + regs_off      [ x19 ] callee saved reg x19
      *
-     * SP + retval_off [ return value ] BPF_TRAMP_F_CALL_ORIG or
-     * BPF_TRAMP_F_RET_FENTRY_RET
+     * SP + retval_off    [ return value ] BPF_TRAMP_F_CALL_ORIG or
+     *                                     BPF_TRAMP_F_RET_FENTRY_RET
+     *                    [ arg reg N ]
+     *                    [ ... ]
+     * SP + bargs_off     [ arg reg 1 ] for bpf
      *
-     * [ arg reg N ]
-     * [ ... ]
-     * SP + args_off [ arg reg 1 ]
+     * SP + nfuncargs_off [ arg regs count ]
      *
-     * SP + nregs_off [ arg regs count ]
+     * SP + ip_off        [ traced function ] BPF_TRAMP_F_IP_ARG flag
      *
-     * SP + ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
+     * SP + run_ctx_off   [ bpf_tramp_run_ctx ]
      *
-     * SP + run_ctx_off [ bpf_tramp_run_ctx ]
+     *                    [ stack arg N ]
+     *                    [ ... ]
+     * SP + oargs_off     [ stack arg 1 ] for original func
      */

     stack_size = 0;
+    oargs_off = stack_size;
+    if (flags & BPF_TRAMP_F_CALL_ORIG)
+        stack_size += a->ostack_for_args;
+
     run_ctx_off = stack_size;
     /* room for bpf_tramp_run_ctx */
     stack_size += round_up(sizeof(struct bpf_tramp_run_ctx), 8);
···
     if (flags & BPF_TRAMP_F_IP_ARG)
         stack_size += 8;

-    nregs_off = stack_size;
+    nfuncargs_off = stack_size;
     /* room for args count */
     stack_size += 8;

-    args_off = stack_size;
+    bargs_off = stack_size;
     /* room for args */
-    stack_size += nregs * 8;
+    nfuncargs = a->regs_for_args + a->bstack_for_args / 8;
+    stack_size += 8 * nfuncargs;

     /* room for return value */
     retval_off = stack_size;
···
     }

     /* save arg regs count*/
-    emit(A64_MOVZ(1, A64_R(10), nregs, 0), ctx);
-    emit(A64_STR64I(A64_R(10), A64_SP, nregs_off), ctx);
+    emit(A64_MOVZ(1, A64_R(10), nfuncargs, 0), ctx);
+    emit(A64_STR64I(A64_R(10), A64_SP, nfuncargs_off), ctx);

-    /* save arg regs */
-    save_args(ctx, args_off, nregs);
+    /* save args for bpf */
+    save_args(ctx, bargs_off, oargs_off, m, a, false);

     /* save callee saved registers */
     emit(A64_STR64I(A64_R(19), A64_SP, regs_off), ctx);
···
     }

     for (i = 0; i < fentry->nr_links; i++)
-        invoke_bpf_prog(ctx, fentry->links[i], args_off,
+        invoke_bpf_prog(ctx, fentry->links[i], bargs_off,
                         retval_off, run_ctx_off,
                         flags & BPF_TRAMP_F_RET_FENTRY_RET);
···
         if (!branches)
             return -ENOMEM;

-        invoke_bpf_mod_ret(ctx, fmod_ret, args_off, retval_off,
+        invoke_bpf_mod_ret(ctx, fmod_ret, bargs_off, retval_off,
                            run_ctx_off, branches);
     }

     if (flags & BPF_TRAMP_F_CALL_ORIG) {
-        restore_args(ctx, args_off, nregs);
+        /* save args for original func */
+        save_args(ctx, bargs_off, oargs_off, m, a, true);
         /* call original func */
         emit(A64_LDR64I(A64_R(10), A64_SP, retaddr_off), ctx);
         emit(A64_ADR(A64_LR, AARCH64_INSN_SIZE * 2), ctx);
···
     }

     for (i = 0; i < fexit->nr_links; i++)
-        invoke_bpf_prog(ctx, fexit->links[i], args_off, retval_off,
+        invoke_bpf_prog(ctx, fexit->links[i], bargs_off, retval_off,
                         run_ctx_off, false);

     if (flags & BPF_TRAMP_F_CALL_ORIG) {
···
     }

     if (flags & BPF_TRAMP_F_RESTORE_REGS)
-        restore_args(ctx, args_off, nregs);
+        restore_args(ctx, bargs_off, a->regs_for_args);

     /* restore callee saved register x19 and x20 */
     emit(A64_LDR64I(A64_R(19), A64_SP, regs_off), ctx);
···
     return ctx->idx;
 }

-static int btf_func_model_nregs(const struct btf_func_model *m)
-{
-    int nregs = m->nr_args;
-    int i;
-
-    /* extra registers needed for struct argument */
-    for (i = 0; i < MAX_BPF_FUNC_ARGS; i++) {
-        /* The arg_size is at most 16 bytes, enforced by the verifier. */
-        if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
-            nregs += (m->arg_size[i] + 7) / 8 - 1;
-    }
-
-    return nregs;
-}
-
 int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags,
                              struct bpf_tramp_links *tlinks, void *func_addr)
 {
···
         .idx = 0,
     };
     struct bpf_tramp_image im;
-    int nregs, ret;
+    struct arg_aux aaux;
+    int ret;

-    nregs = btf_func_model_nregs(m);
-    /* the first 8 registers are used for arguments */
-    if (nregs > 8)
-        return -ENOTSUPP;
+    ret = calc_arg_aux(m, &aaux);
+    if (ret < 0)
+        return ret;

-    ret = prepare_trampoline(&ctx, &im, tlinks, func_addr, nregs, flags);
+    ret = prepare_trampoline(&ctx, &im, tlinks, func_addr, m, &aaux, flags);
     if (ret < 0)
         return ret;
···
                              u32 flags, struct bpf_tramp_links *tlinks,
                              void *func_addr)
 {
-    int ret, nregs;
-    void *image, *tmp;
     u32 size = ro_image_end - ro_image;
+    struct arg_aux aaux;
+    void *image, *tmp;
+    int ret;

     /* image doesn't need to be in module memory range, so we can
      * use kvmalloc.
···
         .write = true,
     };

-    nregs = btf_func_model_nregs(m);
-    /* the first 8 registers are used for arguments */
-    if (nregs > 8)
-        return -ENOTSUPP;

     jit_fill_hole(image, (unsigned int)(ro_image_end - ro_image));
-    ret = prepare_trampoline(&ctx, im, tlinks, func_addr, nregs, flags);
+    ret = calc_arg_aux(m, &aaux);
+    if (ret)
+        goto out;
+    ret = prepare_trampoline(&ctx, im, tlinks, func_addr, m, &aaux, flags);

     if (ret > 0 && validate_code(&ctx) < 0) {
         ret = -EINVAL;
arch/riscv/net/bpf_jit.h  (+15)
···
     return rv_i_insn(imm11_0, 0, 0, 0, 0xf);
 }

+static inline void emit_fence_r_rw(struct rv_jit_context *ctx)
+{
+    emit(rv_fence(0x2, 0x3), ctx);
+}
+
+static inline void emit_fence_rw_w(struct rv_jit_context *ctx)
+{
+    emit(rv_fence(0x3, 0x1), ctx);
+}
+
+static inline void emit_fence_rw_rw(struct rv_jit_context *ctx)
+{
+    emit(rv_fence(0x3, 0x3), ctx);
+}
+
 static inline u32 rv_nop(void)
 {
     return rv_i_insn(0, 0, 0, 0, 0x13);
arch/riscv/net/bpf_jit_comp64.c  (+227 -105)
··· 473 473 emit(hash, ctx); 474 474 } 475 475 476 - static void emit_atomic(u8 rd, u8 rs, s16 off, s32 imm, bool is64, 477 - struct rv_jit_context *ctx) 476 + static int emit_load_8(bool sign_ext, u8 rd, s32 off, u8 rs, struct rv_jit_context *ctx) 478 477 { 479 - u8 r0; 478 + int insns_start; 479 + 480 + if (is_12b_int(off)) { 481 + insns_start = ctx->ninsns; 482 + if (sign_ext) 483 + emit(rv_lb(rd, off, rs), ctx); 484 + else 485 + emit(rv_lbu(rd, off, rs), ctx); 486 + return ctx->ninsns - insns_start; 487 + } 488 + 489 + emit_imm(RV_REG_T1, off, ctx); 490 + emit_add(RV_REG_T1, RV_REG_T1, rs, ctx); 491 + insns_start = ctx->ninsns; 492 + if (sign_ext) 493 + emit(rv_lb(rd, 0, RV_REG_T1), ctx); 494 + else 495 + emit(rv_lbu(rd, 0, RV_REG_T1), ctx); 496 + return ctx->ninsns - insns_start; 497 + } 498 + 499 + static int emit_load_16(bool sign_ext, u8 rd, s32 off, u8 rs, struct rv_jit_context *ctx) 500 + { 501 + int insns_start; 502 + 503 + if (is_12b_int(off)) { 504 + insns_start = ctx->ninsns; 505 + if (sign_ext) 506 + emit(rv_lh(rd, off, rs), ctx); 507 + else 508 + emit(rv_lhu(rd, off, rs), ctx); 509 + return ctx->ninsns - insns_start; 510 + } 511 + 512 + emit_imm(RV_REG_T1, off, ctx); 513 + emit_add(RV_REG_T1, RV_REG_T1, rs, ctx); 514 + insns_start = ctx->ninsns; 515 + if (sign_ext) 516 + emit(rv_lh(rd, 0, RV_REG_T1), ctx); 517 + else 518 + emit(rv_lhu(rd, 0, RV_REG_T1), ctx); 519 + return ctx->ninsns - insns_start; 520 + } 521 + 522 + static int emit_load_32(bool sign_ext, u8 rd, s32 off, u8 rs, struct rv_jit_context *ctx) 523 + { 524 + int insns_start; 525 + 526 + if (is_12b_int(off)) { 527 + insns_start = ctx->ninsns; 528 + if (sign_ext) 529 + emit(rv_lw(rd, off, rs), ctx); 530 + else 531 + emit(rv_lwu(rd, off, rs), ctx); 532 + return ctx->ninsns - insns_start; 533 + } 534 + 535 + emit_imm(RV_REG_T1, off, ctx); 536 + emit_add(RV_REG_T1, RV_REG_T1, rs, ctx); 537 + insns_start = ctx->ninsns; 538 + if (sign_ext) 539 + emit(rv_lw(rd, 0, RV_REG_T1), ctx); 540 + else 
541 + emit(rv_lwu(rd, 0, RV_REG_T1), ctx); 542 + return ctx->ninsns - insns_start; 543 + } 544 + 545 + static int emit_load_64(bool sign_ext, u8 rd, s32 off, u8 rs, struct rv_jit_context *ctx) 546 + { 547 + int insns_start; 548 + 549 + if (is_12b_int(off)) { 550 + insns_start = ctx->ninsns; 551 + emit_ld(rd, off, rs, ctx); 552 + return ctx->ninsns - insns_start; 553 + } 554 + 555 + emit_imm(RV_REG_T1, off, ctx); 556 + emit_add(RV_REG_T1, RV_REG_T1, rs, ctx); 557 + insns_start = ctx->ninsns; 558 + emit_ld(rd, 0, RV_REG_T1, ctx); 559 + return ctx->ninsns - insns_start; 560 + } 561 + 562 + static void emit_store_8(u8 rd, s32 off, u8 rs, struct rv_jit_context *ctx) 563 + { 564 + if (is_12b_int(off)) { 565 + emit(rv_sb(rd, off, rs), ctx); 566 + return; 567 + } 568 + 569 + emit_imm(RV_REG_T1, off, ctx); 570 + emit_add(RV_REG_T1, RV_REG_T1, rd, ctx); 571 + emit(rv_sb(RV_REG_T1, 0, rs), ctx); 572 + } 573 + 574 + static void emit_store_16(u8 rd, s32 off, u8 rs, struct rv_jit_context *ctx) 575 + { 576 + if (is_12b_int(off)) { 577 + emit(rv_sh(rd, off, rs), ctx); 578 + return; 579 + } 580 + 581 + emit_imm(RV_REG_T1, off, ctx); 582 + emit_add(RV_REG_T1, RV_REG_T1, rd, ctx); 583 + emit(rv_sh(RV_REG_T1, 0, rs), ctx); 584 + } 585 + 586 + static void emit_store_32(u8 rd, s32 off, u8 rs, struct rv_jit_context *ctx) 587 + { 588 + if (is_12b_int(off)) { 589 + emit_sw(rd, off, rs, ctx); 590 + return; 591 + } 592 + 593 + emit_imm(RV_REG_T1, off, ctx); 594 + emit_add(RV_REG_T1, RV_REG_T1, rd, ctx); 595 + emit_sw(RV_REG_T1, 0, rs, ctx); 596 + } 597 + 598 + static void emit_store_64(u8 rd, s32 off, u8 rs, struct rv_jit_context *ctx) 599 + { 600 + if (is_12b_int(off)) { 601 + emit_sd(rd, off, rs, ctx); 602 + return; 603 + } 604 + 605 + emit_imm(RV_REG_T1, off, ctx); 606 + emit_add(RV_REG_T1, RV_REG_T1, rd, ctx); 607 + emit_sd(RV_REG_T1, 0, rs, ctx); 608 + } 609 + 610 + static int emit_atomic_ld_st(u8 rd, u8 rs, const struct bpf_insn *insn, 611 + struct rv_jit_context *ctx) 612 + { 613 + u8 
code = insn->code; 614 + s32 imm = insn->imm; 615 + s16 off = insn->off; 616 + 617 + switch (imm) { 618 + /* dst_reg = load_acquire(src_reg + off16) */ 619 + case BPF_LOAD_ACQ: 620 + switch (BPF_SIZE(code)) { 621 + case BPF_B: 622 + emit_load_8(false, rd, off, rs, ctx); 623 + break; 624 + case BPF_H: 625 + emit_load_16(false, rd, off, rs, ctx); 626 + break; 627 + case BPF_W: 628 + emit_load_32(false, rd, off, rs, ctx); 629 + break; 630 + case BPF_DW: 631 + emit_load_64(false, rd, off, rs, ctx); 632 + break; 633 + } 634 + emit_fence_r_rw(ctx); 635 + 636 + /* If our next insn is a redundant zext, return 1 to tell 637 + * build_body() to skip it. 638 + */ 639 + if (BPF_SIZE(code) != BPF_DW && insn_is_zext(&insn[1])) 640 + return 1; 641 + break; 642 + /* store_release(dst_reg + off16, src_reg) */ 643 + case BPF_STORE_REL: 644 + emit_fence_rw_w(ctx); 645 + switch (BPF_SIZE(code)) { 646 + case BPF_B: 647 + emit_store_8(rd, off, rs, ctx); 648 + break; 649 + case BPF_H: 650 + emit_store_16(rd, off, rs, ctx); 651 + break; 652 + case BPF_W: 653 + emit_store_32(rd, off, rs, ctx); 654 + break; 655 + case BPF_DW: 656 + emit_store_64(rd, off, rs, ctx); 657 + break; 658 + } 659 + break; 660 + default: 661 + pr_err_once("bpf-jit: invalid atomic load/store opcode %02x\n", imm); 662 + return -EINVAL; 663 + } 664 + 665 + return 0; 666 + } 667 + 668 + static int emit_atomic_rmw(u8 rd, u8 rs, const struct bpf_insn *insn, 669 + struct rv_jit_context *ctx) 670 + { 671 + u8 r0, code = insn->code; 672 + s16 off = insn->off; 673 + s32 imm = insn->imm; 480 674 int jmp_offset; 675 + bool is64; 676 + 677 + if (BPF_SIZE(code) != BPF_W && BPF_SIZE(code) != BPF_DW) { 678 + pr_err_once("bpf-jit: 1- and 2-byte RMW atomics are not supported\n"); 679 + return -EINVAL; 680 + } 681 + is64 = BPF_SIZE(code) == BPF_DW; 481 682 482 683 if (off) { 483 684 if (is_12b_int(off)) { ··· 755 554 rv_sc_w(RV_REG_T3, rs, rd, 0, 1), ctx); 756 555 jmp_offset = ninsns_rvoff(-6); 757 556 emit(rv_bne(RV_REG_T3, 0, 
jmp_offset >> 1), ctx); 758 - emit(rv_fence(0x3, 0x3), ctx); 557 + emit_fence_rw_rw(ctx); 759 558 break; 559 + default: 560 + pr_err_once("bpf-jit: invalid atomic RMW opcode %02x\n", imm); 561 + return -EINVAL; 760 562 } 563 + 564 + return 0; 761 565 } 762 566 763 567 #define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0) ··· 1856 1650 case BPF_LDX | BPF_PROBE_MEM32 | BPF_W: 1857 1651 case BPF_LDX | BPF_PROBE_MEM32 | BPF_DW: 1858 1652 { 1859 - int insn_len, insns_start; 1860 1653 bool sign_ext; 1654 + int insn_len; 1861 1655 1862 1656 sign_ext = BPF_MODE(insn->code) == BPF_MEMSX || 1863 1657 BPF_MODE(insn->code) == BPF_PROBE_MEMSX; ··· 1869 1663 1870 1664 switch (BPF_SIZE(code)) { 1871 1665 case BPF_B: 1872 - if (is_12b_int(off)) { 1873 - insns_start = ctx->ninsns; 1874 - if (sign_ext) 1875 - emit(rv_lb(rd, off, rs), ctx); 1876 - else 1877 - emit(rv_lbu(rd, off, rs), ctx); 1878 - insn_len = ctx->ninsns - insns_start; 1879 - break; 1880 - } 1881 - 1882 - emit_imm(RV_REG_T1, off, ctx); 1883 - emit_add(RV_REG_T1, RV_REG_T1, rs, ctx); 1884 - insns_start = ctx->ninsns; 1885 - if (sign_ext) 1886 - emit(rv_lb(rd, 0, RV_REG_T1), ctx); 1887 - else 1888 - emit(rv_lbu(rd, 0, RV_REG_T1), ctx); 1889 - insn_len = ctx->ninsns - insns_start; 1666 + insn_len = emit_load_8(sign_ext, rd, off, rs, ctx); 1890 1667 break; 1891 1668 case BPF_H: 1892 - if (is_12b_int(off)) { 1893 - insns_start = ctx->ninsns; 1894 - if (sign_ext) 1895 - emit(rv_lh(rd, off, rs), ctx); 1896 - else 1897 - emit(rv_lhu(rd, off, rs), ctx); 1898 - insn_len = ctx->ninsns - insns_start; 1899 - break; 1900 - } 1901 - 1902 - emit_imm(RV_REG_T1, off, ctx); 1903 - emit_add(RV_REG_T1, RV_REG_T1, rs, ctx); 1904 - insns_start = ctx->ninsns; 1905 - if (sign_ext) 1906 - emit(rv_lh(rd, 0, RV_REG_T1), ctx); 1907 - else 1908 - emit(rv_lhu(rd, 0, RV_REG_T1), ctx); 1909 - insn_len = ctx->ninsns - insns_start; 1669 + insn_len = emit_load_16(sign_ext, rd, off, rs, ctx); 1910 1670 break; 1911 1671 case BPF_W: 1912 - if (is_12b_int(off)) { 
1913 - insns_start = ctx->ninsns; 1914 - if (sign_ext) 1915 - emit(rv_lw(rd, off, rs), ctx); 1916 - else 1917 - emit(rv_lwu(rd, off, rs), ctx); 1918 - insn_len = ctx->ninsns - insns_start; 1919 - break; 1920 - } 1921 - 1922 - emit_imm(RV_REG_T1, off, ctx); 1923 - emit_add(RV_REG_T1, RV_REG_T1, rs, ctx); 1924 - insns_start = ctx->ninsns; 1925 - if (sign_ext) 1926 - emit(rv_lw(rd, 0, RV_REG_T1), ctx); 1927 - else 1928 - emit(rv_lwu(rd, 0, RV_REG_T1), ctx); 1929 - insn_len = ctx->ninsns - insns_start; 1672 + insn_len = emit_load_32(sign_ext, rd, off, rs, ctx); 1930 1673 break; 1931 1674 case BPF_DW: 1932 - if (is_12b_int(off)) { 1933 - insns_start = ctx->ninsns; 1934 - emit_ld(rd, off, rs, ctx); 1935 - insn_len = ctx->ninsns - insns_start; 1936 - break; 1937 - } 1938 - 1939 - emit_imm(RV_REG_T1, off, ctx); 1940 - emit_add(RV_REG_T1, RV_REG_T1, rs, ctx); 1941 - insns_start = ctx->ninsns; 1942 - emit_ld(rd, 0, RV_REG_T1, ctx); 1943 - insn_len = ctx->ninsns - insns_start; 1675 + insn_len = emit_load_64(sign_ext, rd, off, rs, ctx); 1944 1676 break; 1945 1677 } 1946 1678 ··· 2023 1879 2024 1880 /* STX: *(size *)(dst + off) = src */ 2025 1881 case BPF_STX | BPF_MEM | BPF_B: 2026 - if (is_12b_int(off)) { 2027 - emit(rv_sb(rd, off, rs), ctx); 2028 - break; 2029 - } 2030 - 2031 - emit_imm(RV_REG_T1, off, ctx); 2032 - emit_add(RV_REG_T1, RV_REG_T1, rd, ctx); 2033 - emit(rv_sb(RV_REG_T1, 0, rs), ctx); 1882 + emit_store_8(rd, off, rs, ctx); 2034 1883 break; 2035 1884 case BPF_STX | BPF_MEM | BPF_H: 2036 - if (is_12b_int(off)) { 2037 - emit(rv_sh(rd, off, rs), ctx); 2038 - break; 2039 - } 2040 - 2041 - emit_imm(RV_REG_T1, off, ctx); 2042 - emit_add(RV_REG_T1, RV_REG_T1, rd, ctx); 2043 - emit(rv_sh(RV_REG_T1, 0, rs), ctx); 1885 + emit_store_16(rd, off, rs, ctx); 2044 1886 break; 2045 1887 case BPF_STX | BPF_MEM | BPF_W: 2046 - if (is_12b_int(off)) { 2047 - emit_sw(rd, off, rs, ctx); 2048 - break; 2049 - } 2050 - 2051 - emit_imm(RV_REG_T1, off, ctx); 2052 - emit_add(RV_REG_T1, 
RV_REG_T1, rd, ctx); 2053 - emit_sw(RV_REG_T1, 0, rs, ctx); 1888 + emit_store_32(rd, off, rs, ctx); 2054 1889 break; 2055 1890 case BPF_STX | BPF_MEM | BPF_DW: 2056 - if (is_12b_int(off)) { 2057 - emit_sd(rd, off, rs, ctx); 2058 - break; 2059 - } 2060 - 2061 - emit_imm(RV_REG_T1, off, ctx); 2062 - emit_add(RV_REG_T1, RV_REG_T1, rd, ctx); 2063 - emit_sd(RV_REG_T1, 0, rs, ctx); 1891 + emit_store_64(rd, off, rs, ctx); 2064 1892 break; 1893 + case BPF_STX | BPF_ATOMIC | BPF_B: 1894 + case BPF_STX | BPF_ATOMIC | BPF_H: 2065 1895 case BPF_STX | BPF_ATOMIC | BPF_W: 2066 1896 case BPF_STX | BPF_ATOMIC | BPF_DW: 2067 - emit_atomic(rd, rs, off, imm, 2068 - BPF_SIZE(code) == BPF_DW, ctx); 1897 + if (bpf_atomic_is_load_store(insn)) 1898 + ret = emit_atomic_ld_st(rd, rs, insn, ctx); 1899 + else 1900 + ret = emit_atomic_rmw(rd, rs, insn, ctx); 1901 + if (ret) 1902 + return ret; 2069 1903 break; 2070 1904 2071 1905 case BPF_STX | BPF_PROBE_MEM32 | BPF_B:
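The riscv64 hunk above dispatches atomic load/store instructions on `insn->imm` (load-acquire vs store-release) and on `BPF_SIZE(insn->code)` for the access width, emitting a `fence r,rw` after acquiring loads and a `fence rw,w` before releasing stores. A minimal userspace sketch of the imm-based dispatch; the opcode constants mirror include/uapi/linux/bpf.h and should be treated as illustrative assumptions here:

```c
#include <assert.h>
#include <stdint.h>

/* Opcode field extractors and constants mirroring include/uapi/linux/bpf.h
 * (treat the exact values as assumptions for this sketch). */
#define BPF_CLASS(code)  ((code) & 0x07)
#define BPF_MODE(code)   ((code) & 0xe0)
#define BPF_STX          0x03
#define BPF_MEM          0x60
#define BPF_ATOMIC       0xc0
#define BPF_W            0x00
#define BPF_LOAD_ACQ     0x100
#define BPF_STORE_REL    0x110

/* 1 = load-acquire (load, then fence r,rw), 2 = store-release
 * (fence rw,w, then store), -1 = not an atomic load/store. This mimics
 * the switch (imm) dispatch in emit_atomic_ld_st() above. */
static int classify_atomic_ld_st(uint8_t code, int32_t imm)
{
	if (BPF_CLASS(code) != BPF_STX || BPF_MODE(code) != BPF_ATOMIC)
		return -1;
	switch (imm) {
	case BPF_LOAD_ACQ:
		return 1;
	case BPF_STORE_REL:
		return 2;
	default:
		return -1; /* RMW atomics go through emit_atomic_rmw() */
	}
}
```

Note how RMW atomics (`BPF_ADD`, `BPF_XCHG`, ...) fall through to the default arm, matching the split into `emit_atomic_ld_st()` and `emit_atomic_rmw()` in the patch.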
+1 -2
arch/riscv/net/bpf_jit_core.c
··· 26 26 int ret; 27 27 28 28 ret = bpf_jit_emit_insn(insn, ctx, extra_pass); 29 - /* BPF_LD | BPF_IMM | BPF_DW: skip the next instruction. */ 30 29 if (ret > 0) 31 - i++; 30 + i++; /* skip the next instruction */ 32 31 if (offset) 33 32 offset[i] = ctx->ninsns; 34 33 if (ret < 0)
-4
arch/s390/include/asm/nospec-branch.h
··· 26 26 return __is_defined(CC_USING_EXPOLINE) && !nospec_disable; 27 27 } 28 28 29 - #ifdef CONFIG_EXPOLINE_EXTERN 30 - 31 29 void __s390_indirect_jump_r1(void); 32 30 void __s390_indirect_jump_r2(void); 33 31 void __s390_indirect_jump_r3(void); ··· 41 43 void __s390_indirect_jump_r13(void); 42 44 void __s390_indirect_jump_r14(void); 43 45 void __s390_indirect_jump_r15(void); 44 - 45 - #endif 46 46 47 47 #endif /* __ASSEMBLY__ */ 48 48
+67 -73
arch/s390/net/bpf_jit_comp.c
··· 48 48 int lit64; /* Current position in 64-bit literal pool */ 49 49 int base_ip; /* Base address for literal pool */ 50 50 int exit_ip; /* Address of exit */ 51 - int r1_thunk_ip; /* Address of expoline thunk for 'br %r1' */ 52 - int r14_thunk_ip; /* Address of expoline thunk for 'br %r14' */ 53 51 int tail_call_start; /* Tail call start offset */ 54 52 int excnt; /* Number of exception table entries */ 55 53 int prologue_plt_ret; /* Return address for prologue hotpatch PLT */ ··· 125 127 jit->seen_regs |= (1 << r1); 126 128 } 127 129 130 + static s32 off_to_pcrel(struct bpf_jit *jit, u32 off) 131 + { 132 + return off - jit->prg; 133 + } 134 + 135 + static s64 ptr_to_pcrel(struct bpf_jit *jit, const void *ptr) 136 + { 137 + if (jit->prg_buf) 138 + return (const u8 *)ptr - ((const u8 *)jit->prg_buf + jit->prg); 139 + return 0; 140 + } 141 + 128 142 #define REG_SET_SEEN(b1) \ 129 143 ({ \ 130 144 reg_set_seen(jit, b1); \ ··· 211 201 212 202 #define EMIT4_PCREL_RIC(op, mask, target) \ 213 203 ({ \ 214 - int __rel = ((target) - jit->prg) / 2; \ 204 + int __rel = off_to_pcrel(jit, target) / 2; \ 215 205 _EMIT4((op) | (mask) << 20 | (__rel & 0xffff)); \ 216 206 }) 217 207 ··· 249 239 250 240 #define EMIT6_PCREL_RIEB(op1, op2, b1, b2, mask, target) \ 251 241 ({ \ 252 - unsigned int rel = (int)((target) - jit->prg) / 2; \ 242 + unsigned int rel = off_to_pcrel(jit, target) / 2; \ 253 243 _EMIT6((op1) | reg(b1, b2) << 16 | (rel & 0xffff), \ 254 244 (op2) | (mask) << 12); \ 255 245 REG_SET_SEEN(b1); \ ··· 258 248 259 249 #define EMIT6_PCREL_RIEC(op1, op2, b1, imm, mask, target) \ 260 250 ({ \ 261 - unsigned int rel = (int)((target) - jit->prg) / 2; \ 251 + unsigned int rel = off_to_pcrel(jit, target) / 2; \ 262 252 _EMIT6((op1) | (reg_high(b1) | (mask)) << 16 | \ 263 253 (rel & 0xffff), (op2) | ((imm) & 0xff) << 8); \ 264 254 REG_SET_SEEN(b1); \ ··· 267 257 268 258 #define EMIT6_PCREL(op1, op2, b1, b2, i, off, mask) \ 269 259 ({ \ 270 - int rel = (addrs[(i) + (off) + 1] 
- jit->prg) / 2; \ 260 + int rel = off_to_pcrel(jit, addrs[(i) + (off) + 1]) / 2;\ 271 261 _EMIT6((op1) | reg(b1, b2) << 16 | (rel & 0xffff), (op2) | (mask));\ 272 262 REG_SET_SEEN(b1); \ 273 263 REG_SET_SEEN(b2); \ 274 264 }) 275 265 276 - #define EMIT6_PCREL_RILB(op, b, target) \ 277 - ({ \ 278 - unsigned int rel = (int)((target) - jit->prg) / 2; \ 279 - _EMIT6((op) | reg_high(b) << 16 | rel >> 16, rel & 0xffff);\ 280 - REG_SET_SEEN(b); \ 281 - }) 266 + static void emit6_pcrel_ril(struct bpf_jit *jit, u32 op, s64 pcrel) 267 + { 268 + u32 pc32dbl = (s32)(pcrel / 2); 282 269 283 - #define EMIT6_PCREL_RIL(op, target) \ 284 - ({ \ 285 - unsigned int rel = (int)((target) - jit->prg) / 2; \ 286 - _EMIT6((op) | rel >> 16, rel & 0xffff); \ 287 - }) 270 + _EMIT6(op | pc32dbl >> 16, pc32dbl & 0xffff); 271 + } 272 + 273 + static void emit6_pcrel_rilb(struct bpf_jit *jit, u32 op, u8 b, s64 pcrel) 274 + { 275 + emit6_pcrel_ril(jit, op | reg_high(b) << 16, pcrel); 276 + REG_SET_SEEN(b); 277 + } 278 + 279 + #define EMIT6_PCREL_RILB(op, b, target) \ 280 + emit6_pcrel_rilb(jit, op, b, off_to_pcrel(jit, target)) 281 + 282 + #define EMIT6_PCREL_RILB_PTR(op, b, target_ptr) \ 283 + emit6_pcrel_rilb(jit, op, b, ptr_to_pcrel(jit, target_ptr)) 284 + 285 + static void emit6_pcrel_rilc(struct bpf_jit *jit, u32 op, u8 mask, s64 pcrel) 286 + { 287 + emit6_pcrel_ril(jit, op | mask << 20, pcrel); 288 + } 288 289 289 290 #define EMIT6_PCREL_RILC(op, mask, target) \ 290 - ({ \ 291 - EMIT6_PCREL_RIL((op) | (mask) << 20, (target)); \ 292 - }) 291 + emit6_pcrel_rilc(jit, op, mask, off_to_pcrel(jit, target)) 292 + 293 + #define EMIT6_PCREL_RILC_PTR(op, mask, target_ptr) \ 294 + emit6_pcrel_rilc(jit, op, mask, ptr_to_pcrel(jit, target_ptr)) 293 295 294 296 #define _EMIT6_IMM(op, imm) \ 295 297 ({ \ ··· 525 503 { 526 504 if (size >= 6 && !is_valid_rel(size)) { 527 505 /* brcl 0xf,size */ 528 - EMIT6_PCREL_RIL(0xc0f4000000, size); 506 + EMIT6_PCREL_RILC(0xc0040000, 0xf, size); 529 507 size -= 6; 530 
508 } else if (size >= 4 && is_valid_rel(size)) { 531 509 /* brc 0xf,size */ ··· 627 605 } 628 606 /* Setup stack and backchain */ 629 607 if (is_first_pass(jit) || (jit->seen & SEEN_STACK)) { 630 - if (is_first_pass(jit) || (jit->seen & SEEN_FUNC)) 631 - /* lgr %w1,%r15 (backchain) */ 632 - EMIT4(0xb9040000, REG_W1, REG_15); 608 + /* lgr %w1,%r15 (backchain) */ 609 + EMIT4(0xb9040000, REG_W1, REG_15); 633 610 /* la %bfp,STK_160_UNUSED(%r15) (BPF frame pointer) */ 634 611 EMIT4_DISP(0x41000000, BPF_REG_FP, REG_15, STK_160_UNUSED); 635 612 /* aghi %r15,-STK_OFF */ 636 613 EMIT4_IMM(0xa70b0000, REG_15, -(STK_OFF + stack_depth)); 637 - if (is_first_pass(jit) || (jit->seen & SEEN_FUNC)) 638 - /* stg %w1,152(%r15) (backchain) */ 639 - EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0, 640 - REG_15, 152); 614 + /* stg %w1,152(%r15) (backchain) */ 615 + EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0, 616 + REG_15, 152); 641 617 } 642 618 } 643 619 644 620 /* 645 - * Emit an expoline for a jump that follows 621 + * Jump using a register either directly or via an expoline thunk 646 622 */ 647 - static void emit_expoline(struct bpf_jit *jit) 648 - { 649 - /* exrl %r0,.+10 */ 650 - EMIT6_PCREL_RIL(0xc6000000, jit->prg + 10); 651 - /* j . 
*/ 652 - EMIT4_PCREL(0xa7f40000, 0); 653 - } 654 - 655 - /* 656 - * Emit __s390_indirect_jump_r1 thunk if necessary 657 - */ 658 - static void emit_r1_thunk(struct bpf_jit *jit) 659 - { 660 - if (nospec_uses_trampoline()) { 661 - jit->r1_thunk_ip = jit->prg; 662 - emit_expoline(jit); 663 - /* br %r1 */ 664 - _EMIT2(0x07f1); 665 - } 666 - } 623 + #define EMIT_JUMP_REG(reg) do { \ 624 + if (nospec_uses_trampoline()) \ 625 + /* brcl 0xf,__s390_indirect_jump_rN */ \ 626 + EMIT6_PCREL_RILC_PTR(0xc0040000, 0x0f, \ 627 + __s390_indirect_jump_r ## reg); \ 628 + else \ 629 + /* br %rN */ \ 630 + _EMIT2(0x07f0 | reg); \ 631 + } while (0) 667 632 668 633 /* 669 634 * Call r1 either directly or via __s390_indirect_jump_r1 thunk ··· 659 650 { 660 651 if (nospec_uses_trampoline()) 661 652 /* brasl %r14,__s390_indirect_jump_r1 */ 662 - EMIT6_PCREL_RILB(0xc0050000, REG_14, jit->r1_thunk_ip); 653 + EMIT6_PCREL_RILB_PTR(0xc0050000, REG_14, 654 + __s390_indirect_jump_r1); 663 655 else 664 656 /* basr %r14,%r1 */ 665 657 EMIT2(0x0d00, REG_14, REG_1); ··· 676 666 EMIT4(0xb9040000, REG_2, BPF_REG_0); 677 667 /* Restore registers */ 678 668 save_restore_regs(jit, REGS_RESTORE, stack_depth, 0); 679 - if (nospec_uses_trampoline()) { 680 - jit->r14_thunk_ip = jit->prg; 681 - /* Generate __s390_indirect_jump_r14 thunk */ 682 - emit_expoline(jit); 683 - } 684 - /* br %r14 */ 685 - _EMIT2(0x07fe); 686 - 687 - if (is_first_pass(jit) || (jit->seen & SEEN_FUNC)) 688 - emit_r1_thunk(jit); 669 + EMIT_JUMP_REG(14); 689 670 690 671 jit->prg = ALIGN(jit->prg, 8); 691 672 jit->prologue_plt = jit->prg; ··· 1878 1877 /* aghi %r1,tail_call_start */ 1879 1878 EMIT4_IMM(0xa70b0000, REG_1, jit->tail_call_start); 1880 1879 /* brcl 0xf,__s390_indirect_jump_r1 */ 1881 - EMIT6_PCREL_RILC(0xc0040000, 0xf, jit->r1_thunk_ip); 1880 + EMIT6_PCREL_RILC_PTR(0xc0040000, 0xf, 1881 + __s390_indirect_jump_r1); 1882 1882 } else { 1883 1883 /* bc 0xf,tail_call_start(%r1) */ 1884 1884 _EMIT4(0x47f01000 + 
jit->tail_call_start); ··· 2587 2585 if (nr_stack_args > MAX_NR_STACK_ARGS) 2588 2586 return -ENOTSUPP; 2589 2587 2590 - /* Return to %r14, since func_addr and %r0 are not available. */ 2591 - if ((!func_addr && !(flags & BPF_TRAMP_F_ORIG_STACK)) || 2592 - (flags & BPF_TRAMP_F_INDIRECT)) 2588 + /* Return to %r14 in the struct_ops case. */ 2589 + if (flags & BPF_TRAMP_F_INDIRECT) 2593 2590 flags |= BPF_TRAMP_F_SKIP_FRAME; 2594 2591 2595 2592 /* ··· 2848 2847 0xf000 | tjit->tccnt_off); 2849 2848 /* aghi %r15,stack_size */ 2850 2849 EMIT4_IMM(0xa70b0000, REG_15, tjit->stack_size); 2851 - /* Emit an expoline for the following indirect jump. */ 2852 - if (nospec_uses_trampoline()) 2853 - emit_expoline(jit); 2854 2850 if (flags & BPF_TRAMP_F_SKIP_FRAME) 2855 - /* br %r14 */ 2856 - _EMIT2(0x07fe); 2851 + EMIT_JUMP_REG(14); 2857 2852 else 2858 - /* br %r1 */ 2859 - _EMIT2(0x07f1); 2860 - 2861 - emit_r1_thunk(jit); 2853 + EMIT_JUMP_REG(1); 2862 2854 2863 2855 return 0; 2864 2856 }
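The s390 rework above replaces open-coded `(target) - jit->prg` arithmetic with `off_to_pcrel()`/`ptr_to_pcrel()` and funnels RIL-format emission through `emit6_pcrel_ril()`, which encodes the displacement as a signed 32-bit count of 2-byte halfwords (`pc32dbl`). A small model of that offset math, with the JIT context reduced to a byte offset (a sketch, not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of the new s390 helpers: RIL-format instructions encode
 * a pc-relative displacement as a signed 32-bit count of 2-byte
 * halfwords. The JIT context is reduced to the current emit offset. */
struct jit_model {
	uint32_t prg;	/* current position in the program, in bytes */
};

static int32_t off_to_pcrel(const struct jit_model *jit, uint32_t off)
{
	return (int32_t)(off - jit->prg);
}

/* The immediate actually placed into the instruction word. */
static int32_t pc32dbl(int64_t pcrel)
{
	return (int32_t)(pcrel / 2);
}
```

Backward branches simply produce a negative halfword count, which is why the kernel helpers keep the intermediate value signed.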
+80 -22
drivers/dma-buf/dma-buf.c
··· 19 19 #include <linux/anon_inodes.h> 20 20 #include <linux/export.h> 21 21 #include <linux/debugfs.h> 22 + #include <linux/list.h> 22 23 #include <linux/module.h> 24 + #include <linux/mutex.h> 23 25 #include <linux/seq_file.h> 24 26 #include <linux/sync_file.h> 25 27 #include <linux/poll.h> ··· 37 35 38 36 static inline int is_dma_buf_file(struct file *); 39 37 40 - #if IS_ENABLED(CONFIG_DEBUG_FS) 41 - static DEFINE_MUTEX(debugfs_list_mutex); 42 - static LIST_HEAD(debugfs_list); 38 + static DEFINE_MUTEX(dmabuf_list_mutex); 39 + static LIST_HEAD(dmabuf_list); 43 40 44 - static void __dma_buf_debugfs_list_add(struct dma_buf *dmabuf) 41 + static void __dma_buf_list_add(struct dma_buf *dmabuf) 45 42 { 46 - mutex_lock(&debugfs_list_mutex); 47 - list_add(&dmabuf->list_node, &debugfs_list); 48 - mutex_unlock(&debugfs_list_mutex); 43 + mutex_lock(&dmabuf_list_mutex); 44 + list_add(&dmabuf->list_node, &dmabuf_list); 45 + mutex_unlock(&dmabuf_list_mutex); 49 46 } 50 47 51 - static void __dma_buf_debugfs_list_del(struct dma_buf *dmabuf) 48 + static void __dma_buf_list_del(struct dma_buf *dmabuf) 52 49 { 53 50 if (!dmabuf) 54 51 return; 55 52 56 - mutex_lock(&debugfs_list_mutex); 53 + mutex_lock(&dmabuf_list_mutex); 57 54 list_del(&dmabuf->list_node); 58 - mutex_unlock(&debugfs_list_mutex); 59 - } 60 - #else 61 - static void __dma_buf_debugfs_list_add(struct dma_buf *dmabuf) 62 - { 55 + mutex_unlock(&dmabuf_list_mutex); 63 56 } 64 57 65 - static void __dma_buf_debugfs_list_del(struct dma_buf *dmabuf) 58 + /** 59 + * dma_buf_iter_begin - begin iteration through global list of all DMA buffers 60 + * 61 + * Returns the first buffer in the global list of DMA-bufs that's not in the 62 + * process of being destroyed. Increments that buffer's reference count to 63 + * prevent buffer destruction. Callers must release the reference, either by 64 + * continuing iteration with dma_buf_iter_next(), or with dma_buf_put(). 
65 + * 66 + * Return: 67 + * * First buffer from global list, with refcount elevated 68 + * * NULL if no active buffers are present 69 + */ 70 + struct dma_buf *dma_buf_iter_begin(void) 66 71 { 72 + struct dma_buf *ret = NULL, *dmabuf; 73 + 74 + /* 75 + * The list mutex does not protect a dmabuf's refcount, so it can be 76 + * zeroed while we are iterating. We cannot call get_dma_buf() since the 77 + * caller may not already own a reference to the buffer. 78 + */ 79 + mutex_lock(&dmabuf_list_mutex); 80 + list_for_each_entry(dmabuf, &dmabuf_list, list_node) { 81 + if (file_ref_get(&dmabuf->file->f_ref)) { 82 + ret = dmabuf; 83 + break; 84 + } 85 + } 86 + mutex_unlock(&dmabuf_list_mutex); 87 + return ret; 67 88 } 68 - #endif 89 + 90 + /** 91 + * dma_buf_iter_next - continue iteration through global list of all DMA buffers 92 + * @dmabuf: [in] pointer to dma_buf 93 + * 94 + * Decrements the reference count on the provided buffer. Returns the next 95 + * buffer from the remainder of the global list of DMA-bufs with its reference 96 + * count incremented. Callers must release the reference, either by continuing 97 + * iteration with dma_buf_iter_next(), or with dma_buf_put(). 98 + * 99 + * Return: 100 + * * Next buffer from global list, with refcount elevated 101 + * * NULL if no additional active buffers are present 102 + */ 103 + struct dma_buf *dma_buf_iter_next(struct dma_buf *dmabuf) 104 + { 105 + struct dma_buf *ret = NULL; 106 + 107 + /* 108 + * The list mutex does not protect a dmabuf's refcount, so it can be 109 + * zeroed while we are iterating. We cannot call get_dma_buf() since the 110 + * caller may not already own a reference to the buffer. 
111 + */ 112 + mutex_lock(&dmabuf_list_mutex); 113 + dma_buf_put(dmabuf); 114 + list_for_each_entry_continue(dmabuf, &dmabuf_list, list_node) { 115 + if (file_ref_get(&dmabuf->file->f_ref)) { 116 + ret = dmabuf; 117 + break; 118 + } 119 + } 120 + mutex_unlock(&dmabuf_list_mutex); 121 + return ret; 122 + } 69 123 70 124 static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen) 71 125 { ··· 173 115 if (!is_dma_buf_file(file)) 174 116 return -EINVAL; 175 117 176 - __dma_buf_debugfs_list_del(file->private_data); 118 + __dma_buf_list_del(file->private_data); 177 119 178 120 return 0; 179 121 } ··· 743 685 file->f_path.dentry->d_fsdata = dmabuf; 744 686 dmabuf->file = file; 745 687 746 - __dma_buf_debugfs_list_add(dmabuf); 688 + __dma_buf_list_add(dmabuf); 747 689 748 690 return dmabuf; 749 691 ··· 1621 1563 size_t size = 0; 1622 1564 int ret; 1623 1565 1624 - ret = mutex_lock_interruptible(&debugfs_list_mutex); 1566 + ret = mutex_lock_interruptible(&dmabuf_list_mutex); 1625 1567 1626 1568 if (ret) 1627 1569 return ret; ··· 1630 1572 seq_printf(s, "%-8s\t%-8s\t%-8s\t%-8s\texp_name\t%-8s\tname\n", 1631 1573 "size", "flags", "mode", "count", "ino"); 1632 1574 1633 - list_for_each_entry(buf_obj, &debugfs_list, list_node) { 1575 + list_for_each_entry(buf_obj, &dmabuf_list, list_node) { 1634 1576 1635 1577 ret = dma_resv_lock_interruptible(buf_obj->resv, NULL); 1636 1578 if (ret) ··· 1667 1609 1668 1610 seq_printf(s, "\nTotal %d objects, %zu bytes\n", count, size); 1669 1611 1670 - mutex_unlock(&debugfs_list_mutex); 1612 + mutex_unlock(&dmabuf_list_mutex); 1671 1613 return 0; 1672 1614 1673 1615 error_unlock: 1674 - mutex_unlock(&debugfs_list_mutex); 1616 + mutex_unlock(&dmabuf_list_mutex); 1675 1617 return ret; 1676 1618 } 1677 1619
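`dma_buf_iter_begin()`/`dma_buf_iter_next()` hold `dmabuf_list_mutex` only while walking the list and hand each buffer back with an elevated file refcount, which the next call drops before continuing. The same "iterate a mutex-protected list while yielding refcounted elements" pattern can be modeled in userspace with a hypothetical refcounted list (all names here are invented for illustration):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model of the dma_buf_iter_begin()/_next()
 * pattern: the mutex protects only list membership, so every yielded
 * element carries a reference that the caller (or the next iteration
 * step) must drop. A refcount of 0 models a buffer being destroyed. */
struct mbuf {
	int refs;
	struct mbuf *next;
};

static pthread_mutex_t mlist_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mbuf *mlist;

/* Counterpart of file_ref_get(): only succeeds on a live object. */
static int mbuf_tryget(struct mbuf *b)
{
	if (b->refs == 0)
		return 0;
	b->refs++;
	return 1;
}

static struct mbuf *mbuf_iter_begin(void)
{
	struct mbuf *b, *ret = NULL;

	pthread_mutex_lock(&mlist_lock);
	for (b = mlist; b; b = b->next) {
		if (mbuf_tryget(b)) {
			ret = b;
			break;
		}
	}
	pthread_mutex_unlock(&mlist_lock);
	return ret;
}

static struct mbuf *mbuf_iter_next(struct mbuf *prev)
{
	struct mbuf *b, *ret = NULL;

	pthread_mutex_lock(&mlist_lock);
	prev->refs--;	/* drop the reference yielded last time */
	for (b = prev->next; b; b = b->next) {
		if (mbuf_tryget(b)) {
			ret = b;
			break;
		}
	}
	pthread_mutex_unlock(&mlist_lock);
	return ret;
}
```

As in the kernel code, objects whose refcount already hit zero are skipped rather than resurrected, and the caller of a completed iteration is left with no references to drop.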
+2 -1
include/asm-generic/vmlinux.lds.h
··· 667 667 */ 668 668 #ifdef CONFIG_DEBUG_INFO_BTF 669 669 #define BTF \ 670 + . = ALIGN(PAGE_SIZE); \ 670 671 .BTF : AT(ADDR(.BTF) - LOAD_OFFSET) { \ 671 672 BOUNDED_SECTION_BY(.BTF, _BTF) \ 672 673 } \ 673 - . = ALIGN(4); \ 674 + . = ALIGN(PAGE_SIZE); \ 674 675 .BTF_ids : AT(ADDR(.BTF_ids) - LOAD_OFFSET) { \ 675 676 *(.BTF_ids) \ 676 677 }
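Aligning `.BTF` to `PAGE_SIZE` on both sides gives the BTF blob whole-page backing, which the "mmap of vmlinux BTF data" work in this pull relies on. The linker script's `ALIGN()` is the usual power-of-two round-up; as a quick sketch:

```c
#include <assert.h>
#include <stdint.h>

/* The linker script's ALIGN(PAGE_SIZE) rounds the location counter up
 * to the next page boundary; for a power-of-two 'a' that is the usual
 * mask trick, as in the kernel's ALIGN() macro. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}
```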
-8
include/linux/bpf-cgroup.h
··· 426 426 427 427 const struct bpf_func_proto * 428 428 cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog); 429 - const struct bpf_func_proto * 430 - cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog); 431 429 #else 432 430 433 431 static inline void cgroup_bpf_lifetime_notifier_init(void) ··· 460 462 461 463 static inline const struct bpf_func_proto * 462 464 cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) 463 - { 464 - return NULL; 465 - } 466 - 467 - static inline const struct bpf_func_proto * 468 - cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) 469 465 { 470 466 return NULL; 471 467 }
+20
include/linux/bpf.h
··· 346 346 } 347 347 } 348 348 349 + #if IS_ENABLED(CONFIG_DEBUG_KERNEL) 350 + #define BPF_WARN_ONCE(cond, format...) WARN_ONCE(cond, format) 351 + #else 352 + #define BPF_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond) 353 + #endif 354 + 349 355 static inline u32 btf_field_type_size(enum btf_field_type type) 350 356 { 351 357 switch (type) { ··· 1355 1349 const void *__bpf_dynptr_data(const struct bpf_dynptr_kern *ptr, u32 len); 1356 1350 void *__bpf_dynptr_data_rw(const struct bpf_dynptr_kern *ptr, u32 len); 1357 1351 bool __bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr); 1352 + int __bpf_dynptr_write(const struct bpf_dynptr_kern *dst, u32 offset, 1353 + void *src, u32 len, u64 flags); 1354 + void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr *p, u32 offset, 1355 + void *buffer__opt, u32 buffer__szk); 1356 + 1357 + static inline int bpf_dynptr_check_off_len(const struct bpf_dynptr_kern *ptr, u32 offset, u32 len) 1358 + { 1359 + u32 size = __bpf_dynptr_size(ptr); 1360 + 1361 + if (len > size || offset > size - len) 1362 + return -E2BIG; 1363 + 1364 + return 0; 1365 + } 1358 1366 1359 1367 #ifdef CONFIG_BPF_JIT 1360 1368 int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
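The new `bpf_dynptr_check_off_len()` helper tests `len > size || offset > size - len` rather than the naive `offset + len > size`, which can wrap around in u32 arithmetic and wrongly accept a huge offset. A userspace comparison of the two forms (the -1 return stands in for the kernel's -E2BIG):

```c
#include <assert.h>
#include <stdint.h>

/* Overflow-safe range check, as in bpf_dynptr_check_off_len(): after
 * the first comparison, size - len cannot underflow. */
static int check_off_len(uint32_t size, uint32_t offset, uint32_t len)
{
	if (len > size || offset > size - len)
		return -1;	/* -E2BIG in the kernel */
	return 0;
}

/* The naive form wraps: offset + len can overflow u32 and compare
 * small, wrongly accepting an out-of-bounds range. */
static int naive_check_off_len(uint32_t size, uint32_t offset, uint32_t len)
{
	return offset + len > size ? -1 : 0;
}
```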
+20 -4
include/linux/bpf_verifier.h
··· 356 356 INSN_F_SPI_MASK = 0x3f, /* 6 bits */ 357 357 INSN_F_SPI_SHIFT = 3, /* shifted 3 bits to the left */ 358 358 359 - INSN_F_STACK_ACCESS = BIT(9), /* we need 10 bits total */ 359 + INSN_F_STACK_ACCESS = BIT(9), 360 + 361 + INSN_F_DST_REG_STACK = BIT(10), /* dst_reg is PTR_TO_STACK */ 362 + INSN_F_SRC_REG_STACK = BIT(11), /* src_reg is PTR_TO_STACK */ 363 + /* total 12 bits are used now. */ 360 364 }; 361 365 362 366 static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES); ··· 369 365 struct bpf_insn_hist_entry { 370 366 u32 idx; 371 367 /* insn idx can't be bigger than 1 million */ 372 - u32 prev_idx : 22; 373 - /* special flags, e.g., whether insn is doing register stack spill/load */ 374 - u32 flags : 10; 368 + u32 prev_idx : 20; 369 + /* special INSN_F_xxx flags */ 370 + u32 flags : 12; 375 371 /* additional registers that need precision tracking when this 376 372 * jump is backtracked, vector of six 10-bit records 377 373 */ ··· 595 591 * bpf_fastcall pattern. 596 592 */ 597 593 u8 fastcall_spills_num:3; 594 + u8 arg_prog:4; 598 595 599 596 /* below fields are initialized once */ 600 597 unsigned int orig_idx; /* original instruction index */ ··· 842 837 __printf(3, 4) void verbose_linfo(struct bpf_verifier_env *env, 843 838 u32 insn_off, 844 839 const char *prefix_fmt, ...); 840 + 841 + #define verifier_bug_if(cond, env, fmt, args...) \ 842 + ({ \ 843 + bool __cond = (cond); \ 844 + if (unlikely(__cond)) { \ 845 + BPF_WARN_ONCE(1, "verifier bug: " fmt "(" #cond ")\n", ##args); \ 846 + bpf_log(&env->log, "verifier bug: " fmt "(" #cond ")\n", ##args); \ 847 + } \ 848 + (__cond); \ 849 + }) 850 + #define verifier_bug(env, fmt, args...) verifier_bug_if(1, env, fmt, ##args) 845 851 846 852 static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env) 847 853 {
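`verifier_bug_if()` above is a GNU statement expression: it evaluates the condition once, warns and logs when it fires, and then yields the condition as the expression's value, so callers can write `if (verifier_bug_if(cond, env, ...)) return -EFAULT;` without double evaluation. A simplified model with the reporting replaced by a counter (names are illustrative, not the kernel macro):

```c
#include <assert.h>

/* Simplified model of verifier_bug_if(): a GNU statement expression
 * that evaluates the condition exactly once, reports when it fires
 * (a counter stands in for BPF_WARN_ONCE + bpf_log), and yields the
 * condition as the expression's value. */
static int bug_reports;

#define model_bug_if(cond)			\
	({					\
		int __cond = (cond);		\
		if (__cond)			\
			bug_reports++;		\
		__cond;				\
	})
```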
+2 -2
include/linux/dma-buf.h
··· 361 361 */ 362 362 struct module *owner; 363 363 364 - #if IS_ENABLED(CONFIG_DEBUG_FS) 365 364 /** @list_node: node for dma_buf accounting and debugging. */ 366 365 struct list_head list_node; 367 - #endif 368 366 369 367 /** @priv: exporter specific private data for this buffer object. */ 370 368 void *priv; ··· 607 609 void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map); 608 610 int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map); 609 611 void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map); 612 + struct dma_buf *dma_buf_iter_begin(void); 613 + struct dma_buf *dma_buf_iter_next(struct dma_buf *dmabuf); 610 614 #endif /* __DMA_BUF_H__ */
+12 -7
include/uapi/linux/bpf.h
··· 1506 1506 __s32 map_token_fd; 1507 1507 }; 1508 1508 1509 - struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ 1509 + struct { /* anonymous struct used by BPF_MAP_*_ELEM and BPF_MAP_FREEZE commands */ 1510 1510 __u32 map_fd; 1511 1511 __aligned_u64 key; 1512 1512 union { ··· 1995 1995 * long bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags) 1996 1996 * Description 1997 1997 * Store *len* bytes from address *from* into the packet 1998 - * associated to *skb*, at *offset*. *flags* are a combination of 1999 - * **BPF_F_RECOMPUTE_CSUM** (automatically recompute the 2000 - * checksum for the packet after storing the bytes) and 2001 - * **BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\ 2002 - * **->swhash** and *skb*\ **->l4hash** to 0). 1998 + * associated to *skb*, at *offset*. The *flags* are a combination 1999 + * of the following values: 2000 + * 2001 + * **BPF_F_RECOMPUTE_CSUM** 2002 + * Automatically update *skb*\ **->csum** after storing the 2003 + * bytes. 2004 + * **BPF_F_INVALIDATE_HASH** 2005 + * Set *skb*\ **->hash**, *skb*\ **->swhash** and *skb*\ 2006 + * **->l4hash** to 0. 2003 2007 * 2004 2008 * A call to this helper is susceptible to change the underlying 2005 2009 * packet buffer. Therefore, at load time, all checks on pointers ··· 2055 2051 * untouched (unless **BPF_F_MARK_ENFORCE** is added as well), and 2056 2052 * for updates resulting in a null checksum the value is set to 2057 2053 * **CSUM_MANGLED_0** instead. Flag **BPF_F_PSEUDO_HDR** indicates 2058 - * the checksum is to be computed against a pseudo-header. 2054 + * that the modified header field is part of the pseudo-header. 
2059 2055 * 2060 2056 * This helper works in combination with **bpf_csum_diff**\ (), 2061 2057 * which does not update the checksum in-place, but offers more ··· 6727 6723 __u32 name_len; 6728 6724 __u32 offset; /* offset from file_name */ 6729 6725 __u64 cookie; 6726 + __u64 ref_ctr_offset; 6730 6727 } uprobe; /* BPF_PERF_EVENT_UPROBE, BPF_PERF_EVENT_URETPROBE */ 6731 6728 struct { 6732 6729 __aligned_u64 func_name; /* in/out */
+3
kernel/bpf/Makefile
··· 53 53 obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o 54 54 obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o 55 55 obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o 56 + ifeq ($(CONFIG_DMA_SHARED_BUFFER),y) 57 + obj-$(CONFIG_BPF_SYSCALL) += dmabuf_iter.o 58 + endif 56 59 57 60 CFLAGS_REMOVE_percpu_freelist.o = $(CC_FLAGS_FTRACE) 58 61 CFLAGS_REMOVE_bpf_lru_list.o = $(CC_FLAGS_FTRACE)
+1 -1
kernel/bpf/bpf_struct_ops.c
··· 601 601 if (model->ret_size > 0) 602 602 flags |= BPF_TRAMP_F_RET_FENTRY_RET; 603 603 604 - size = arch_bpf_trampoline_size(model, flags, tlinks, NULL); 604 + size = arch_bpf_trampoline_size(model, flags, tlinks, stub_func); 605 605 if (size <= 0) 606 606 return size ? : -EFAULT; 607 607
+20 -25
kernel/bpf/btf.c
··· 26 26 #include <linux/bsearch.h> 27 27 #include <linux/kobject.h> 28 28 #include <linux/sysfs.h> 29 + #include <linux/overflow.h> 29 30 30 31 #include <net/netfilter/nf_bpf_link.h> 31 32 ··· 3958 3957 /* This needs to be kzalloc to zero out padding and unused fields, see 3959 3958 * comment in btf_record_equal. 3960 3959 */ 3961 - rec = kzalloc(offsetof(struct btf_record, fields[cnt]), GFP_KERNEL | __GFP_NOWARN); 3960 + rec = kzalloc(struct_size(rec, fields, cnt), GFP_KERNEL | __GFP_NOWARN); 3962 3961 if (!rec) 3963 3962 return ERR_PTR(-ENOMEM); 3964 3963 ··· 5584 5583 if (id < 0) 5585 5584 continue; 5586 5585 5587 - new_aof = krealloc(aof, offsetof(struct btf_id_set, ids[aof->cnt + 1]), 5586 + new_aof = krealloc(aof, struct_size(new_aof, ids, aof->cnt + 1), 5588 5587 GFP_KERNEL | __GFP_NOWARN); 5589 5588 if (!new_aof) { 5590 5589 ret = -ENOMEM; ··· 5611 5610 if (ret != BTF_FIELD_FOUND) 5612 5611 continue; 5613 5612 5614 - new_aof = krealloc(aof, offsetof(struct btf_id_set, ids[aof->cnt + 1]), 5613 + new_aof = krealloc(aof, struct_size(new_aof, ids, aof->cnt + 1), 5615 5614 GFP_KERNEL | __GFP_NOWARN); 5616 5615 if (!new_aof) { 5617 5616 ret = -ENOMEM; ··· 5648 5647 continue; 5649 5648 parse: 5650 5649 tab_cnt = tab ? 
tab->cnt : 0; 5651 - new_tab = krealloc(tab, offsetof(struct btf_struct_metas, types[tab_cnt + 1]), 5650 + new_tab = krealloc(tab, struct_size(new_tab, types, tab_cnt + 1), 5652 5651 GFP_KERNEL | __GFP_NOWARN); 5653 5652 if (!new_tab) { 5654 5653 ret = -ENOMEM; ··· 6384 6383 return prog->aux->attach_btf; 6385 6384 } 6386 6385 6387 - static bool is_int_ptr(struct btf *btf, const struct btf_type *t) 6386 + static bool is_void_or_int_ptr(struct btf *btf, const struct btf_type *t) 6388 6387 { 6389 6388 /* skip modifiers */ 6390 6389 t = btf_type_skip_modifiers(btf, t->type, NULL); 6391 - 6392 - return btf_type_is_int(t); 6390 + return btf_type_is_void(t) || btf_type_is_int(t); 6393 6391 } 6394 6392 6395 6393 u32 btf_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, ··· 6777 6777 } 6778 6778 } 6779 6779 6780 - if (t->type == 0) 6781 - /* This is a pointer to void. 6782 - * It is the same as scalar from the verifier safety pov. 6783 - * No further pointer walking is allowed. 6784 - */ 6785 - return true; 6786 - 6787 - if (is_int_ptr(btf, t)) 6780 + /* 6781 + * If it's a pointer to void, it's the same as scalar from the verifier 6782 + * safety POV. Either way, no further pointer walking is allowed. 6783 + */ 6784 + if (is_void_or_int_ptr(btf, t)) 6788 6785 return true; 6789 6786 6790 6787 /* this is a pointer to another type */ ··· 6827 6830 /* Is this a func with potential NULL args? */ 6828 6831 if (strcmp(tname, raw_tp_null_args[i].func)) 6829 6832 continue; 6830 - if (raw_tp_null_args[i].mask & (0x1 << (arg * 4))) 6833 + if (raw_tp_null_args[i].mask & (0x1ULL << (arg * 4))) 6831 6834 info->reg_type |= PTR_MAYBE_NULL; 6832 6835 /* Is the current arg IS_ERR?
*/ 6833 - if (raw_tp_null_args[i].mask & (0x2 << (arg * 4))) 6836 + if (raw_tp_null_args[i].mask & (0x2ULL << (arg * 4))) 6834 6837 ptr_err_raw_tp = true; 6835 6838 break; 6836 6839 } ··· 7660 7663 return 0; 7661 7664 7662 7665 if (!prog->aux->func_info) { 7663 - bpf_log(log, "Verifier bug\n"); 7666 + verifier_bug(env, "func_info undefined"); 7664 7667 return -EFAULT; 7665 7668 } 7666 7669 ··· 7684 7687 tname = btf_name_by_offset(btf, fn_t->name_off); 7685 7688 7686 7689 if (prog->aux->func_info_aux[subprog].unreliable) { 7687 - bpf_log(log, "Verifier bug in function %s()\n", tname); 7690 + verifier_bug(env, "unreliable BTF for function %s()", tname); 7688 7691 return -EFAULT; 7689 7692 } 7690 7693 if (prog_type == BPF_PROG_TYPE_EXT) ··· 8561 8564 8562 8565 /* Grow set */ 8563 8566 set = krealloc(tab->sets[hook], 8564 - offsetof(struct btf_id_set8, pairs[set_cnt + add_set->cnt]), 8567 + struct_size(set, pairs, set_cnt + add_set->cnt), 8565 8568 GFP_KERNEL | __GFP_NOWARN); 8566 8569 if (!set) { 8567 8570 ret = -ENOMEM; ··· 8847 8850 } 8848 8851 8849 8852 tab = krealloc(btf->dtor_kfunc_tab, 8850 - offsetof(struct btf_id_dtor_kfunc_tab, dtors[tab_cnt + add_cnt]), 8853 + struct_size(tab, dtors, tab_cnt + add_cnt), 8851 8854 GFP_KERNEL | __GFP_NOWARN); 8852 8855 if (!tab) { 8853 8856 ret = -ENOMEM; ··· 9405 9408 9406 9409 tab = btf->struct_ops_tab; 9407 9410 if (!tab) { 9408 - tab = kzalloc(offsetof(struct btf_struct_ops_tab, ops[4]), 9409 - GFP_KERNEL); 9411 + tab = kzalloc(struct_size(tab, ops, 4), GFP_KERNEL); 9410 9412 if (!tab) 9411 9413 return -ENOMEM; 9412 9414 tab->capacity = 4; ··· 9418 9422 9419 9423 if (tab->cnt == tab->capacity) { 9420 9424 new_tab = krealloc(tab, 9421 - offsetof(struct btf_struct_ops_tab, 9422 - ops[tab->capacity * 2]), 9425 + struct_size(tab, ops, tab->capacity * 2), 9423 9426 GFP_KERNEL); 9424 9427 if (!new_tab) 9425 9428 return -ENOMEM;
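Two patterns from the btf.c hunk are worth calling out: flexible-array allocations switch from `offsetof(..., member[n])` to `struct_size()` (which additionally saturates on multiply overflow), and the `raw_tp_null_args` mask tests gain a `ULL` suffix because arguments 8..15 occupy nibbles above bit 31, where a plain `int` constant cannot be shifted. A userspace sketch of both (`STRUCT_SIZE` here is a simplified stand-in without the overflow saturation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct_size() from
 * linux/overflow.h: bytes needed for a struct whose flexible array
 * member holds n elements. The real macro also saturates on multiply
 * overflow, which this sketch omits. */
#define STRUCT_SIZE(ptr, member, n) \
	(offsetof(__typeof__(*(ptr)), member) + sizeof((ptr)->member[0]) * (size_t)(n))

struct metas_model {
	int cnt;
	long types[];
};

/* Each tracepoint argument owns a 4-bit nibble of a 64-bit mask, so
 * args 8..15 shift past bit 31; without the ULL suffix the shifted
 * constant is a plain int and the shift overflows. */
static int arg_may_be_null(uint64_t mask, int arg)
{
	return (mask & (0x1ULL << (arg * 4))) != 0;
}
```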
-32
kernel/bpf/cgroup.c
··· 1687 1687 if (func_proto) 1688 1688 return func_proto; 1689 1689 1690 - func_proto = cgroup_current_func_proto(func_id, prog); 1691 - if (func_proto) 1692 - return func_proto; 1693 - 1694 1690 switch (func_id) { 1695 1691 case BPF_FUNC_perf_event_output: 1696 1692 return &bpf_event_output_data_proto; ··· 2234 2238 if (func_proto) 2235 2239 return func_proto; 2236 2240 2237 - func_proto = cgroup_current_func_proto(func_id, prog); 2238 - if (func_proto) 2239 - return func_proto; 2240 - 2241 2241 switch (func_id) { 2242 2242 case BPF_FUNC_sysctl_get_name: 2243 2243 return &bpf_sysctl_get_name_proto; ··· 2374 2382 const struct bpf_func_proto *func_proto; 2375 2383 2376 2384 func_proto = cgroup_common_func_proto(func_id, prog); 2377 - if (func_proto) 2378 - return func_proto; 2379 - 2380 - func_proto = cgroup_current_func_proto(func_id, prog); 2381 2385 if (func_proto) 2382 2386 return func_proto; 2383 2387 ··· 2619 2631 default: 2620 2632 return &bpf_set_retval_proto; 2621 2633 } 2622 - default: 2623 - return NULL; 2624 - } 2625 - } 2626 - 2627 - /* Common helpers for cgroup hooks with valid process context. */ 2628 - const struct bpf_func_proto * 2629 - cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) 2630 - { 2631 - switch (func_id) { 2632 - case BPF_FUNC_get_current_uid_gid: 2633 - return &bpf_get_current_uid_gid_proto; 2634 - case BPF_FUNC_get_current_comm: 2635 - return &bpf_get_current_comm_proto; 2636 - #ifdef CONFIG_CGROUP_NET_CLASSID 2637 - case BPF_FUNC_get_cgroup_classid: 2638 - return &bpf_get_cgroup_classid_curr_proto; 2639 - #endif 2640 - case BPF_FUNC_current_task_under_cgroup: 2641 - return &bpf_current_task_under_cgroup_proto; 2642 2634 default: 2643 2635 return NULL; 2644 2636 }
+17 -12
kernel/bpf/core.c
··· 2358 2358 return 0; 2359 2359 } 2360 2360 2361 - bool bpf_prog_map_compatible(struct bpf_map *map, 2362 - const struct bpf_prog *fp) 2361 + static bool __bpf_prog_map_compatible(struct bpf_map *map, 2362 + const struct bpf_prog *fp) 2363 2363 { 2364 2364 enum bpf_prog_type prog_type = resolve_prog_type(fp); 2365 2365 bool ret; 2366 2366 struct bpf_prog_aux *aux = fp->aux; 2367 2367 2368 2368 if (fp->kprobe_override) 2369 - return false; 2370 - 2371 - /* XDP programs inserted into maps are not guaranteed to run on 2372 - * a particular netdev (and can run outside driver context entirely 2373 - * in the case of devmap and cpumap). Until device checks 2374 - * are implemented, prohibit adding dev-bound programs to program maps. 2375 - */ 2376 - if (bpf_prog_is_dev_bound(aux)) 2377 2369 return false; 2378 2370 2379 2371 spin_lock(&map->owner.lock); ··· 2401 2409 return ret; 2402 2410 } 2403 2411 2412 + bool bpf_prog_map_compatible(struct bpf_map *map, const struct bpf_prog *fp) 2413 + { 2414 + /* XDP programs inserted into maps are not guaranteed to run on 2415 + * a particular netdev (and can run outside driver context entirely 2416 + * in the case of devmap and cpumap). Until device checks 2417 + * are implemented, prohibit adding dev-bound programs to program maps. 2418 + */ 2419 + if (bpf_prog_is_dev_bound(fp->aux)) 2420 + return false; 2421 + 2422 + return __bpf_prog_map_compatible(map, fp); 2423 + } 2424 + 2404 2425 static int bpf_check_tail_call(const struct bpf_prog *fp) 2405 2426 { 2406 2427 struct bpf_prog_aux *aux = fp->aux; ··· 2426 2421 if (!map_type_contains_progs(map)) 2427 2422 continue; 2428 2423 2429 - if (!bpf_prog_map_compatible(map, fp)) { 2424 + if (!__bpf_prog_map_compatible(map, fp)) { 2430 2425 ret = -EINVAL; 2431 2426 goto out; 2432 2427 } ··· 2474 2469 /* In case of BPF to BPF calls, verifier did all the prep 2475 2470 * work with regards to JITing, etc. 
2476 2471 */ 2477 - bool jit_needed = false; 2472 + bool jit_needed = fp->jit_requested; 2478 2473 2479 2474 if (fp->bpf_func) 2480 2475 goto finalize;
+150
kernel/bpf/dmabuf_iter.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* Copyright (c) 2025 Google LLC */ 3 + #include <linux/bpf.h> 4 + #include <linux/btf_ids.h> 5 + #include <linux/dma-buf.h> 6 + #include <linux/kernel.h> 7 + #include <linux/seq_file.h> 8 + 9 + static void *dmabuf_iter_seq_start(struct seq_file *seq, loff_t *pos) 10 + { 11 + if (*pos) 12 + return NULL; 13 + 14 + return dma_buf_iter_begin(); 15 + } 16 + 17 + static void *dmabuf_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos) 18 + { 19 + struct dma_buf *dmabuf = v; 20 + 21 + ++*pos; 22 + 23 + return dma_buf_iter_next(dmabuf); 24 + } 25 + 26 + struct bpf_iter__dmabuf { 27 + __bpf_md_ptr(struct bpf_iter_meta *, meta); 28 + __bpf_md_ptr(struct dma_buf *, dmabuf); 29 + }; 30 + 31 + static int __dmabuf_seq_show(struct seq_file *seq, void *v, bool in_stop) 32 + { 33 + struct bpf_iter_meta meta = { 34 + .seq = seq, 35 + }; 36 + struct bpf_iter__dmabuf ctx = { 37 + .meta = &meta, 38 + .dmabuf = v, 39 + }; 40 + struct bpf_prog *prog = bpf_iter_get_info(&meta, in_stop); 41 + 42 + if (prog) 43 + return bpf_iter_run_prog(prog, &ctx); 44 + 45 + return 0; 46 + } 47 + 48 + static int dmabuf_iter_seq_show(struct seq_file *seq, void *v) 49 + { 50 + return __dmabuf_seq_show(seq, v, false); 51 + } 52 + 53 + static void dmabuf_iter_seq_stop(struct seq_file *seq, void *v) 54 + { 55 + struct dma_buf *dmabuf = v; 56 + 57 + if (dmabuf) 58 + dma_buf_put(dmabuf); 59 + } 60 + 61 + static const struct seq_operations dmabuf_iter_seq_ops = { 62 + .start = dmabuf_iter_seq_start, 63 + .next = dmabuf_iter_seq_next, 64 + .stop = dmabuf_iter_seq_stop, 65 + .show = dmabuf_iter_seq_show, 66 + }; 67 + 68 + static void bpf_iter_dmabuf_show_fdinfo(const struct bpf_iter_aux_info *aux, 69 + struct seq_file *seq) 70 + { 71 + seq_puts(seq, "dmabuf iter\n"); 72 + } 73 + 74 + static const struct bpf_iter_seq_info dmabuf_iter_seq_info = { 75 + .seq_ops = &dmabuf_iter_seq_ops, 76 + .init_seq_private = NULL, 77 + .fini_seq_private = NULL, 78 + 
.seq_priv_size = 0, 79 + }; 80 + 81 + static struct bpf_iter_reg bpf_dmabuf_reg_info = { 82 + .target = "dmabuf", 83 + .feature = BPF_ITER_RESCHED, 84 + .show_fdinfo = bpf_iter_dmabuf_show_fdinfo, 85 + .ctx_arg_info_size = 1, 86 + .ctx_arg_info = { 87 + { offsetof(struct bpf_iter__dmabuf, dmabuf), 88 + PTR_TO_BTF_ID_OR_NULL }, 89 + }, 90 + .seq_info = &dmabuf_iter_seq_info, 91 + }; 92 + 93 + DEFINE_BPF_ITER_FUNC(dmabuf, struct bpf_iter_meta *meta, struct dma_buf *dmabuf) 94 + BTF_ID_LIST_SINGLE(bpf_dmabuf_btf_id, struct, dma_buf) 95 + 96 + static int __init dmabuf_iter_init(void) 97 + { 98 + bpf_dmabuf_reg_info.ctx_arg_info[0].btf_id = bpf_dmabuf_btf_id[0]; 99 + return bpf_iter_reg_target(&bpf_dmabuf_reg_info); 100 + } 101 + 102 + late_initcall(dmabuf_iter_init); 103 + 104 + struct bpf_iter_dmabuf { 105 + /* 106 + * opaque iterator state; having __u64 here allows to preserve correct 107 + * alignment requirements in vmlinux.h, generated from BTF 108 + */ 109 + __u64 __opaque[1]; 110 + } __aligned(8); 111 + 112 + /* Non-opaque version of bpf_iter_dmabuf */ 113 + struct bpf_iter_dmabuf_kern { 114 + struct dma_buf *dmabuf; 115 + } __aligned(8); 116 + 117 + __bpf_kfunc_start_defs(); 118 + 119 + __bpf_kfunc int bpf_iter_dmabuf_new(struct bpf_iter_dmabuf *it) 120 + { 121 + struct bpf_iter_dmabuf_kern *kit = (void *)it; 122 + 123 + BUILD_BUG_ON(sizeof(*kit) > sizeof(*it)); 124 + BUILD_BUG_ON(__alignof__(*kit) != __alignof__(*it)); 125 + 126 + kit->dmabuf = NULL; 127 + return 0; 128 + } 129 + 130 + __bpf_kfunc struct dma_buf *bpf_iter_dmabuf_next(struct bpf_iter_dmabuf *it) 131 + { 132 + struct bpf_iter_dmabuf_kern *kit = (void *)it; 133 + 134 + if (kit->dmabuf) 135 + kit->dmabuf = dma_buf_iter_next(kit->dmabuf); 136 + else 137 + kit->dmabuf = dma_buf_iter_begin(); 138 + 139 + return kit->dmabuf; 140 + } 141 + 142 + __bpf_kfunc void bpf_iter_dmabuf_destroy(struct bpf_iter_dmabuf *it) 143 + { 144 + struct bpf_iter_dmabuf_kern *kit = (void *)it; 145 + 146 + if (kit->dmabuf) 
147 + dma_buf_put(kit->dmabuf); 148 + } 149 + 150 + __bpf_kfunc_end_defs();
+72 -76
kernel/bpf/hashtab.c
··· 175 175 htab->map.map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH; 176 176 } 177 177 178 + static inline bool is_fd_htab(const struct bpf_htab *htab) 179 + { 180 + return htab->map.map_type == BPF_MAP_TYPE_HASH_OF_MAPS; 181 + } 182 + 183 + static inline void *htab_elem_value(struct htab_elem *l, u32 key_size) 184 + { 185 + return l->key + round_up(key_size, 8); 186 + } 187 + 178 188 static inline void htab_elem_set_ptr(struct htab_elem *l, u32 key_size, 179 189 void __percpu *pptr) 180 190 { 181 - *(void __percpu **)(l->key + roundup(key_size, 8)) = pptr; 191 + *(void __percpu **)htab_elem_value(l, key_size) = pptr; 182 192 } 183 193 184 194 static inline void __percpu *htab_elem_get_ptr(struct htab_elem *l, u32 key_size) 185 195 { 186 - return *(void __percpu **)(l->key + roundup(key_size, 8)); 196 + return *(void __percpu **)htab_elem_value(l, key_size); 187 197 } 188 198 189 199 static void *fd_htab_map_get_ptr(const struct bpf_map *map, struct htab_elem *l) 190 200 { 191 - return *(void **)(l->key + roundup(map->key_size, 8)); 201 + return *(void **)htab_elem_value(l, map->key_size); 192 202 } 193 203 194 204 static struct htab_elem *get_htab_elem(struct bpf_htab *htab, int i) ··· 206 196 return (struct htab_elem *) (htab->elems + i * (u64)htab->elem_size); 207 197 } 208 198 199 + /* Both percpu and fd htab support in-place update, so no need for 200 + * extra elem. LRU itself can remove the least used element, so 201 + * there is no need for an extra elem during map_update. 
202 + */ 209 203 static bool htab_has_extra_elems(struct bpf_htab *htab) 210 204 { 211 - return !htab_is_percpu(htab) && !htab_is_lru(htab); 205 + return !htab_is_percpu(htab) && !htab_is_lru(htab) && !is_fd_htab(htab); 212 206 } 213 207 214 208 static void htab_free_prealloced_timers_and_wq(struct bpf_htab *htab) ··· 229 215 elem = get_htab_elem(htab, i); 230 216 if (btf_record_has_field(htab->map.record, BPF_TIMER)) 231 217 bpf_obj_free_timer(htab->map.record, 232 - elem->key + round_up(htab->map.key_size, 8)); 218 + htab_elem_value(elem, htab->map.key_size)); 233 219 if (btf_record_has_field(htab->map.record, BPF_WORKQUEUE)) 234 220 bpf_obj_free_workqueue(htab->map.record, 235 - elem->key + round_up(htab->map.key_size, 8)); 221 + htab_elem_value(elem, htab->map.key_size)); 236 222 cond_resched(); 237 223 } 238 224 } ··· 259 245 cond_resched(); 260 246 } 261 247 } else { 262 - bpf_obj_free_fields(htab->map.record, elem->key + round_up(htab->map.key_size, 8)); 248 + bpf_obj_free_fields(htab->map.record, 249 + htab_elem_value(elem, htab->map.key_size)); 263 250 cond_resched(); 264 251 } 265 252 cond_resched(); ··· 468 453 { 469 454 bool percpu = (attr->map_type == BPF_MAP_TYPE_PERCPU_HASH || 470 455 attr->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH); 471 - bool lru = (attr->map_type == BPF_MAP_TYPE_LRU_HASH || 472 - attr->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH); 473 456 /* percpu_lru means each cpu has its own LRU list. 474 457 * it is different from BPF_MAP_TYPE_PERCPU_HASH where 475 458 * the map's value itself is percpu. percpu_lru has ··· 562 549 if (err) 563 550 goto free_map_locked; 564 551 565 - if (!percpu && !lru) { 566 - /* lru itself can remove the least used element, so 567 - * there is no need for an extra elem during map_update. 
568 - */ 552 + if (htab_has_extra_elems(htab)) { 569 553 err = alloc_extra_elems(htab); 570 554 if (err) 571 555 goto free_prealloc; ··· 680 670 struct htab_elem *l = __htab_map_lookup_elem(map, key); 681 671 682 672 if (l) 683 - return l->key + round_up(map->key_size, 8); 673 + return htab_elem_value(l, map->key_size); 684 674 685 675 return NULL; 686 676 } ··· 719 709 if (l) { 720 710 if (mark) 721 711 bpf_lru_node_set_ref(&l->lru_node); 722 - return l->key + round_up(map->key_size, 8); 712 + return htab_elem_value(l, map->key_size); 723 713 } 724 714 725 715 return NULL; ··· 773 763 for_each_possible_cpu(cpu) 774 764 bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu)); 775 765 } else { 776 - void *map_value = elem->key + round_up(htab->map.key_size, 8); 766 + void *map_value = htab_elem_value(elem, htab->map.key_size); 777 767 778 768 bpf_obj_free_fields(htab->map.record, map_value); 779 769 } ··· 978 968 979 969 static bool fd_htab_map_needs_adjust(const struct bpf_htab *htab) 980 970 { 981 - return htab->map.map_type == BPF_MAP_TYPE_HASH_OF_MAPS && 982 - BITS_PER_LONG == 64; 971 + return is_fd_htab(htab) && BITS_PER_LONG == 64; 983 972 } 984 973 985 974 static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key, ··· 1048 1039 htab_elem_set_ptr(l_new, key_size, pptr); 1049 1040 } else if (fd_htab_map_needs_adjust(htab)) { 1050 1041 size = round_up(size, 8); 1051 - memcpy(l_new->key + round_up(key_size, 8), value, size); 1042 + memcpy(htab_elem_value(l_new, key_size), value, size); 1052 1043 } else { 1053 - copy_map_value(&htab->map, 1054 - l_new->key + round_up(key_size, 8), 1055 - value); 1044 + copy_map_value(&htab->map, htab_elem_value(l_new, key_size), value); 1056 1045 } 1057 1046 1058 1047 l_new->hash = hash; ··· 1079 1072 u64 map_flags) 1080 1073 { 1081 1074 struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 1082 - struct htab_elem *l_new = NULL, *l_old; 1075 + struct htab_elem *l_new, *l_old; 1083 1076 struct 
hlist_nulls_head *head; 1084 1077 unsigned long flags; 1085 - void *old_map_ptr; 1086 1078 struct bucket *b; 1087 1079 u32 key_size, hash; 1088 1080 int ret; ··· 1112 1106 if (l_old) { 1113 1107 /* grab the element lock and update value in place */ 1114 1108 copy_map_value_locked(map, 1115 - l_old->key + round_up(key_size, 8), 1109 + htab_elem_value(l_old, key_size), 1116 1110 value, false); 1117 1111 return 0; 1118 1112 } ··· 1140 1134 * and update element in place 1141 1135 */ 1142 1136 copy_map_value_locked(map, 1143 - l_old->key + round_up(key_size, 8), 1137 + htab_elem_value(l_old, key_size), 1144 1138 value, false); 1145 1139 ret = 0; 1146 1140 goto err; ··· 1162 1156 hlist_nulls_del_rcu(&l_old->hash_node); 1163 1157 1164 1158 /* l_old has already been stashed in htab->extra_elems, free 1165 - * its special fields before it is available for reuse. Also 1166 - * save the old map pointer in htab of maps before unlock 1167 - * and release it after unlock. 1159 + * its special fields before it is available for reuse. 
1168 1160 */ 1169 - old_map_ptr = NULL; 1170 - if (htab_is_prealloc(htab)) { 1171 - if (map->ops->map_fd_put_ptr) 1172 - old_map_ptr = fd_htab_map_get_ptr(map, l_old); 1161 + if (htab_is_prealloc(htab)) 1173 1162 check_and_free_fields(htab, l_old); 1174 - } 1175 1163 } 1176 1164 htab_unlock_bucket(b, flags); 1177 - if (l_old) { 1178 - if (old_map_ptr) 1179 - map->ops->map_fd_put_ptr(map, old_map_ptr, true); 1180 - if (!htab_is_prealloc(htab)) 1181 - free_htab_elem(htab, l_old); 1182 - } 1165 + if (l_old && !htab_is_prealloc(htab)) 1166 + free_htab_elem(htab, l_old); 1183 1167 return 0; 1184 1168 err: 1185 1169 htab_unlock_bucket(b, flags); ··· 1216 1220 l_new = prealloc_lru_pop(htab, key, hash); 1217 1221 if (!l_new) 1218 1222 return -ENOMEM; 1219 - copy_map_value(&htab->map, 1220 - l_new->key + round_up(map->key_size, 8), value); 1223 + copy_map_value(&htab->map, htab_elem_value(l_new, map->key_size), value); 1221 1224 1222 1225 ret = htab_lock_bucket(b, &flags); 1223 1226 if (ret) ··· 1250 1255 return ret; 1251 1256 } 1252 1257 1253 - static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, 1258 + static long htab_map_update_elem_in_place(struct bpf_map *map, void *key, 1254 1259 void *value, u64 map_flags, 1255 - bool onallcpus) 1260 + bool percpu, bool onallcpus) 1256 1261 { 1257 1262 struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 1258 - struct htab_elem *l_new = NULL, *l_old; 1263 + struct htab_elem *l_new, *l_old; 1259 1264 struct hlist_nulls_head *head; 1265 + void *old_map_ptr = NULL; 1260 1266 unsigned long flags; 1261 1267 struct bucket *b; 1262 1268 u32 key_size, hash; ··· 1288 1292 goto err; 1289 1293 1290 1294 if (l_old) { 1291 - /* per-cpu hash map can update value in-place */ 1292 - pcpu_copy_value(htab, htab_elem_get_ptr(l_old, key_size), 1293 - value, onallcpus); 1295 + /* Update value in-place */ 1296 + if (percpu) { 1297 + pcpu_copy_value(htab, htab_elem_get_ptr(l_old, key_size), 1298 + value, onallcpus); 1299 + 
} else { 1300 + void **inner_map_pptr = htab_elem_value(l_old, key_size); 1301 + 1302 + old_map_ptr = *inner_map_pptr; 1303 + WRITE_ONCE(*inner_map_pptr, *(void **)value); 1304 + } 1294 1305 } else { 1295 1306 l_new = alloc_htab_elem(htab, key, value, key_size, 1296 - hash, true, onallcpus, NULL); 1307 + hash, percpu, onallcpus, NULL); 1297 1308 if (IS_ERR(l_new)) { 1298 1309 ret = PTR_ERR(l_new); 1299 1310 goto err; 1300 1311 } 1301 1312 hlist_nulls_add_head_rcu(&l_new->hash_node, head); 1302 1313 } 1303 - ret = 0; 1304 1314 err: 1305 1315 htab_unlock_bucket(b, flags); 1316 + if (old_map_ptr) 1317 + map->ops->map_fd_put_ptr(map, old_map_ptr, true); 1306 1318 return ret; 1307 1319 } 1308 1320 ··· 1387 1383 static long htab_percpu_map_update_elem(struct bpf_map *map, void *key, 1388 1384 void *value, u64 map_flags) 1389 1385 { 1390 - return __htab_percpu_map_update_elem(map, key, value, map_flags, false); 1386 + return htab_map_update_elem_in_place(map, key, value, map_flags, true, false); 1391 1387 } 1392 1388 1393 1389 static long htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key, ··· 1504 1500 /* We only free timer on uref dropping to zero */ 1505 1501 if (btf_record_has_field(htab->map.record, BPF_TIMER)) 1506 1502 bpf_obj_free_timer(htab->map.record, 1507 - l->key + round_up(htab->map.key_size, 8)); 1503 + htab_elem_value(l, htab->map.key_size)); 1508 1504 if (btf_record_has_field(htab->map.record, BPF_WORKQUEUE)) 1509 1505 bpf_obj_free_workqueue(htab->map.record, 1510 - l->key + round_up(htab->map.key_size, 8)); 1506 + htab_elem_value(l, htab->map.key_size)); 1511 1507 } 1512 1508 cond_resched_rcu(); 1513 1509 } ··· 1619 1615 off += roundup_value_size; 1620 1616 } 1621 1617 } else { 1622 - u32 roundup_key_size = round_up(map->key_size, 8); 1618 + void *src = htab_elem_value(l, map->key_size); 1623 1619 1624 1620 if (flags & BPF_F_LOCK) 1625 - copy_map_value_locked(map, value, l->key + 1626 - roundup_key_size, 1627 - true); 1621 + 
copy_map_value_locked(map, value, src, true); 1628 1622 else 1629 - copy_map_value(map, value, l->key + 1630 - roundup_key_size); 1623 + copy_map_value(map, value, src); 1631 1624 /* Zeroing special fields in the temp buffer */ 1632 1625 check_and_init_map_value(map, value); 1633 1626 } ··· 1681 1680 bool is_percpu) 1682 1681 { 1683 1682 struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 1684 - u32 bucket_cnt, total, key_size, value_size, roundup_key_size; 1685 1683 void *keys = NULL, *values = NULL, *value, *dst_key, *dst_val; 1686 1684 void __user *uvalues = u64_to_user_ptr(attr->batch.values); 1687 1685 void __user *ukeys = u64_to_user_ptr(attr->batch.keys); 1688 1686 void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch); 1689 1687 u32 batch, max_count, size, bucket_size, map_id; 1688 + u32 bucket_cnt, total, key_size, value_size; 1690 1689 struct htab_elem *node_to_free = NULL; 1691 1690 u64 elem_map_flags, map_flags; 1692 1691 struct hlist_nulls_head *head; ··· 1721 1720 return -ENOENT; 1722 1721 1723 1722 key_size = htab->map.key_size; 1724 - roundup_key_size = round_up(htab->map.key_size, 8); 1725 1723 value_size = htab->map.value_size; 1726 1724 size = round_up(value_size, 8); 1727 1725 if (is_percpu) ··· 1812 1812 off += size; 1813 1813 } 1814 1814 } else { 1815 - value = l->key + roundup_key_size; 1816 - if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) { 1815 + value = htab_elem_value(l, key_size); 1816 + if (is_fd_htab(htab)) { 1817 1817 struct bpf_map **inner_map = value; 1818 1818 1819 1819 /* Actual value is the id of the inner map */ ··· 2063 2063 static int __bpf_hash_map_seq_show(struct seq_file *seq, struct htab_elem *elem) 2064 2064 { 2065 2065 struct bpf_iter_seq_hash_map_info *info = seq->private; 2066 - u32 roundup_key_size, roundup_value_size; 2067 2066 struct bpf_iter__bpf_map_elem ctx = {}; 2068 2067 struct bpf_map *map = info->map; 2069 2068 struct bpf_iter_meta meta; 2070 2069 int ret = 0, off = 0, cpu; 2070 + u32 
roundup_value_size; 2071 2071 struct bpf_prog *prog; 2072 2072 void __percpu *pptr; 2073 2073 ··· 2077 2077 ctx.meta = &meta; 2078 2078 ctx.map = info->map; 2079 2079 if (elem) { 2080 - roundup_key_size = round_up(map->key_size, 8); 2081 2080 ctx.key = elem->key; 2082 2081 if (!info->percpu_value_buf) { 2083 - ctx.value = elem->key + roundup_key_size; 2082 + ctx.value = htab_elem_value(elem, map->key_size); 2084 2083 } else { 2085 2084 roundup_value_size = round_up(map->value_size, 8); 2086 2085 pptr = htab_elem_get_ptr(elem, map->key_size); ··· 2164 2165 struct hlist_nulls_head *head; 2165 2166 struct hlist_nulls_node *n; 2166 2167 struct htab_elem *elem; 2167 - u32 roundup_key_size; 2168 2168 int i, num_elems = 0; 2169 2169 void __percpu *pptr; 2170 2170 struct bucket *b; ··· 2178 2180 2179 2181 is_percpu = htab_is_percpu(htab); 2180 2182 2181 - roundup_key_size = round_up(map->key_size, 8); 2182 2183 /* migration has been disabled, so percpu value prepared here will be 2183 2184 * the same as the one seen by the bpf program with 2184 2185 * bpf_map_lookup_elem(). 
··· 2193 2196 pptr = htab_elem_get_ptr(elem, map->key_size); 2194 2197 val = this_cpu_ptr(pptr); 2195 2198 } else { 2196 - val = elem->key + roundup_key_size; 2199 + val = htab_elem_value(elem, map->key_size); 2197 2200 } 2198 2201 num_elems++; 2199 2202 ret = callback_fn((u64)(long)map, (u64)(long)key, ··· 2408 2411 ret = __htab_lru_percpu_map_update_elem(map, key, value, 2409 2412 map_flags, true); 2410 2413 else 2411 - ret = __htab_percpu_map_update_elem(map, key, value, map_flags, 2412 - true); 2414 + ret = htab_map_update_elem_in_place(map, key, value, map_flags, 2415 + true, true); 2413 2416 rcu_read_unlock(); 2414 2417 2415 2418 return ret; ··· 2533 2536 return ret; 2534 2537 } 2535 2538 2536 - /* only called from syscall */ 2539 + /* Only called from syscall */ 2537 2540 int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file, 2538 2541 void *key, void *value, u64 map_flags) 2539 2542 { 2540 2543 void *ptr; 2541 2544 int ret; 2542 - u32 ufd = *(u32 *)value; 2543 2545 2544 - ptr = map->ops->map_fd_get_ptr(map, map_file, ufd); 2546 + ptr = map->ops->map_fd_get_ptr(map, map_file, *(int *)value); 2545 2547 if (IS_ERR(ptr)) 2546 2548 return PTR_ERR(ptr); 2547 2549 2548 2550 /* The htab bucket lock is always held during update operations in fd 2549 2551 * htab map, and the following rcu_read_lock() is only used to avoid 2550 - * the WARN_ON_ONCE in htab_map_update_elem(). 2552 + * the WARN_ON_ONCE in htab_map_update_elem_in_place(). 2551 2553 */ 2552 2554 rcu_read_lock(); 2553 - ret = htab_map_update_elem(map, key, &ptr, map_flags); 2555 + ret = htab_map_update_elem_in_place(map, key, &ptr, map_flags, false, false); 2554 2556 rcu_read_unlock(); 2555 2557 if (ret) 2556 2558 map->ops->map_fd_put_ptr(map, ptr, false);
+118 -15
kernel/bpf/helpers.c
··· 23 23 #include <linux/btf_ids.h> 24 24 #include <linux/bpf_mem_alloc.h> 25 25 #include <linux/kasan.h> 26 + #include <linux/bpf_verifier.h> 26 27 27 28 #include "../../lib/kstrtox.h" 28 29 ··· 130 129 131 130 BPF_CALL_3(bpf_map_lookup_percpu_elem, struct bpf_map *, map, void *, key, u32, cpu) 132 131 { 133 - WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held()); 132 + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() && 133 + !rcu_read_lock_bh_held()); 134 134 return (unsigned long) map->ops->map_lookup_percpu_elem(map, key, cpu); 135 135 } 136 136 ··· 1715 1713 memset(ptr, 0, sizeof(*ptr)); 1716 1714 } 1717 1715 1718 - static int bpf_dynptr_check_off_len(const struct bpf_dynptr_kern *ptr, u32 offset, u32 len) 1719 - { 1720 - u32 size = __bpf_dynptr_size(ptr); 1721 - 1722 - if (len > size || offset > size - len) 1723 - return -E2BIG; 1724 - 1725 - return 0; 1726 - } 1727 - 1728 1716 BPF_CALL_4(bpf_dynptr_from_mem, void *, data, u32, size, u64, flags, struct bpf_dynptr_kern *, ptr) 1729 1717 { 1730 1718 int err; ··· 1801 1809 .arg5_type = ARG_ANYTHING, 1802 1810 }; 1803 1811 1804 - static int __bpf_dynptr_write(const struct bpf_dynptr_kern *dst, u32 offset, void *src, 1805 - u32 len, u64 flags) 1812 + int __bpf_dynptr_write(const struct bpf_dynptr_kern *dst, u32 offset, void *src, 1813 + u32 len, u64 flags) 1806 1814 { 1807 1815 enum bpf_dynptr_type type; 1808 1816 int err; ··· 1904 1912 const struct bpf_func_proto bpf_probe_read_kernel_proto __weak; 1905 1913 const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak; 1906 1914 const struct bpf_func_proto bpf_task_pt_regs_proto __weak; 1915 + const struct bpf_func_proto bpf_perf_event_read_proto __weak; 1916 + const struct bpf_func_proto bpf_send_signal_proto __weak; 1917 + const struct bpf_func_proto bpf_send_signal_thread_proto __weak; 1918 + const struct bpf_func_proto bpf_get_task_stack_sleepable_proto __weak; 1919 + const struct bpf_func_proto bpf_get_task_stack_proto 
__weak; 1920 + const struct bpf_func_proto bpf_get_branch_snapshot_proto __weak; 1907 1921 1908 1922 const struct bpf_func_proto * 1909 1923 bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) ··· 1963 1965 return &bpf_get_current_pid_tgid_proto; 1964 1966 case BPF_FUNC_get_ns_current_pid_tgid: 1965 1967 return &bpf_get_ns_current_pid_tgid_proto; 1968 + case BPF_FUNC_get_current_uid_gid: 1969 + return &bpf_get_current_uid_gid_proto; 1966 1970 default: 1967 1971 break; 1968 1972 } ··· 2022 2022 return &bpf_get_current_cgroup_id_proto; 2023 2023 case BPF_FUNC_get_current_ancestor_cgroup_id: 2024 2024 return &bpf_get_current_ancestor_cgroup_id_proto; 2025 + case BPF_FUNC_current_task_under_cgroup: 2026 + return &bpf_current_task_under_cgroup_proto; 2025 2027 #endif 2028 + #ifdef CONFIG_CGROUP_NET_CLASSID 2029 + case BPF_FUNC_get_cgroup_classid: 2030 + return &bpf_get_cgroup_classid_curr_proto; 2031 + #endif 2032 + case BPF_FUNC_task_storage_get: 2033 + if (bpf_prog_check_recur(prog)) 2034 + return &bpf_task_storage_get_recur_proto; 2035 + return &bpf_task_storage_get_proto; 2036 + case BPF_FUNC_task_storage_delete: 2037 + if (bpf_prog_check_recur(prog)) 2038 + return &bpf_task_storage_delete_recur_proto; 2039 + return &bpf_task_storage_delete_proto; 2026 2040 default: 2027 2041 break; 2028 2042 } ··· 2051 2037 return &bpf_get_current_task_proto; 2052 2038 case BPF_FUNC_get_current_task_btf: 2053 2039 return &bpf_get_current_task_btf_proto; 2040 + case BPF_FUNC_get_current_comm: 2041 + return &bpf_get_current_comm_proto; 2054 2042 case BPF_FUNC_probe_read_user: 2055 2043 return &bpf_probe_read_user_proto; 2056 2044 case BPF_FUNC_probe_read_kernel: ··· 2063 2047 case BPF_FUNC_probe_read_kernel_str: 2064 2048 return security_locked_down(LOCKDOWN_BPF_READ_KERNEL) < 0 ? 
2065 2049 NULL : &bpf_probe_read_kernel_str_proto; 2050 + case BPF_FUNC_copy_from_user: 2051 + return &bpf_copy_from_user_proto; 2052 + case BPF_FUNC_copy_from_user_task: 2053 + return &bpf_copy_from_user_task_proto; 2066 2054 case BPF_FUNC_snprintf_btf: 2067 2055 return &bpf_snprintf_btf_proto; 2068 2056 case BPF_FUNC_snprintf: ··· 2077 2057 return bpf_get_trace_vprintk_proto(); 2078 2058 case BPF_FUNC_perf_event_read_value: 2079 2059 return bpf_get_perf_event_read_value_proto(); 2060 + case BPF_FUNC_perf_event_read: 2061 + return &bpf_perf_event_read_proto; 2062 + case BPF_FUNC_send_signal: 2063 + return &bpf_send_signal_proto; 2064 + case BPF_FUNC_send_signal_thread: 2065 + return &bpf_send_signal_thread_proto; 2066 + case BPF_FUNC_get_task_stack: 2067 + return prog->sleepable ? &bpf_get_task_stack_sleepable_proto 2068 + : &bpf_get_task_stack_proto; 2069 + case BPF_FUNC_get_branch_snapshot: 2070 + return &bpf_get_branch_snapshot_proto; 2071 + case BPF_FUNC_find_vma: 2072 + return &bpf_find_vma_proto; 2080 2073 default: 2081 2074 return NULL; 2082 2075 } ··· 2326 2293 return __bpf_list_del(head, true); 2327 2294 } 2328 2295 2296 + __bpf_kfunc struct bpf_list_node *bpf_list_front(struct bpf_list_head *head) 2297 + { 2298 + struct list_head *h = (struct list_head *)head; 2299 + 2300 + if (list_empty(h) || unlikely(!h->next)) 2301 + return NULL; 2302 + 2303 + return (struct bpf_list_node *)h->next; 2304 + } 2305 + 2306 + __bpf_kfunc struct bpf_list_node *bpf_list_back(struct bpf_list_head *head) 2307 + { 2308 + struct list_head *h = (struct list_head *)head; 2309 + 2310 + if (list_empty(h) || unlikely(!h->next)) 2311 + return NULL; 2312 + 2313 + return (struct bpf_list_node *)h->prev; 2314 + } 2315 + 2329 2316 __bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root, 2330 2317 struct bpf_rb_node *node) 2331 2318 { ··· 2417 2364 struct rb_root_cached *r = (struct rb_root_cached *)root; 2418 2365 2419 2366 return (struct bpf_rb_node 
*)rb_first_cached(r); 2367 + } 2368 + 2369 + __bpf_kfunc struct bpf_rb_node *bpf_rbtree_root(struct bpf_rb_root *root) 2370 + { 2371 + struct rb_root_cached *r = (struct rb_root_cached *)root; 2372 + 2373 + return (struct bpf_rb_node *)r->rb_root.rb_node; 2374 + } 2375 + 2376 + __bpf_kfunc struct bpf_rb_node *bpf_rbtree_left(struct bpf_rb_root *root, struct bpf_rb_node *node) 2377 + { 2378 + struct bpf_rb_node_kern *node_internal = (struct bpf_rb_node_kern *)node; 2379 + 2380 + if (READ_ONCE(node_internal->owner) != root) 2381 + return NULL; 2382 + 2383 + return (struct bpf_rb_node *)node_internal->rb_node.rb_left; 2384 + } 2385 + 2386 + __bpf_kfunc struct bpf_rb_node *bpf_rbtree_right(struct bpf_rb_root *root, struct bpf_rb_node *node) 2387 + { 2388 + struct bpf_rb_node_kern *node_internal = (struct bpf_rb_node_kern *)node; 2389 + 2390 + if (READ_ONCE(node_internal->owner) != root) 2391 + return NULL; 2392 + 2393 + return (struct bpf_rb_node *)node_internal->rb_node.rb_right; 2420 2394 } 2421 2395 2422 2396 /** ··· 3003 2923 __bpf_kfunc int bpf_wq_set_callback_impl(struct bpf_wq *wq, 3004 2924 int (callback_fn)(void *map, int *key, void *value), 3005 2925 unsigned int flags, 3006 - void *aux__ign) 2926 + void *aux__prog) 3007 2927 { 3008 - struct bpf_prog_aux *aux = (struct bpf_prog_aux *)aux__ign; 2928 + struct bpf_prog_aux *aux = (struct bpf_prog_aux *)aux__prog; 3009 2929 struct bpf_async_kern *async = (struct bpf_async_kern *)wq; 3010 2930 3011 2931 if (flags) ··· 3274 3194 local_irq_restore(*flags__irq_flag); 3275 3195 } 3276 3196 3197 + __bpf_kfunc void __bpf_trap(void) 3198 + { 3199 + } 3200 + 3277 3201 __bpf_kfunc_end_defs(); 3278 3202 3279 3203 BTF_KFUNCS_START(generic_btf_ids) ··· 3293 3209 BTF_ID_FLAGS(func, bpf_list_push_back_impl) 3294 3210 BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL) 3295 3211 BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL) 3212 + BTF_ID_FLAGS(func, bpf_list_front, KF_RET_NULL) 3213 + 
BTF_ID_FLAGS(func, bpf_list_back, KF_RET_NULL) 3296 3214 BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL) 3297 3215 BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE) 3298 3216 BTF_ID_FLAGS(func, bpf_rbtree_remove, KF_ACQUIRE | KF_RET_NULL) 3299 3217 BTF_ID_FLAGS(func, bpf_rbtree_add_impl) 3300 3218 BTF_ID_FLAGS(func, bpf_rbtree_first, KF_RET_NULL) 3219 + BTF_ID_FLAGS(func, bpf_rbtree_root, KF_RET_NULL) 3220 + BTF_ID_FLAGS(func, bpf_rbtree_left, KF_RET_NULL) 3221 + BTF_ID_FLAGS(func, bpf_rbtree_right, KF_RET_NULL) 3301 3222 3302 3223 #ifdef CONFIG_CGROUPS 3303 3224 BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL) ··· 3383 3294 BTF_ID_FLAGS(func, bpf_iter_kmem_cache_destroy, KF_ITER_DESTROY | KF_SLEEPABLE) 3384 3295 BTF_ID_FLAGS(func, bpf_local_irq_save) 3385 3296 BTF_ID_FLAGS(func, bpf_local_irq_restore) 3297 + BTF_ID_FLAGS(func, bpf_probe_read_user_dynptr) 3298 + BTF_ID_FLAGS(func, bpf_probe_read_kernel_dynptr) 3299 + BTF_ID_FLAGS(func, bpf_probe_read_user_str_dynptr) 3300 + BTF_ID_FLAGS(func, bpf_probe_read_kernel_str_dynptr) 3301 + BTF_ID_FLAGS(func, bpf_copy_from_user_dynptr, KF_SLEEPABLE) 3302 + BTF_ID_FLAGS(func, bpf_copy_from_user_str_dynptr, KF_SLEEPABLE) 3303 + BTF_ID_FLAGS(func, bpf_copy_from_user_task_dynptr, KF_SLEEPABLE | KF_TRUSTED_ARGS) 3304 + BTF_ID_FLAGS(func, bpf_copy_from_user_task_str_dynptr, KF_SLEEPABLE | KF_TRUSTED_ARGS) 3305 + #ifdef CONFIG_DMA_SHARED_BUFFER 3306 + BTF_ID_FLAGS(func, bpf_iter_dmabuf_new, KF_ITER_NEW | KF_SLEEPABLE) 3307 + BTF_ID_FLAGS(func, bpf_iter_dmabuf_next, KF_ITER_NEXT | KF_RET_NULL | KF_SLEEPABLE) 3308 + BTF_ID_FLAGS(func, bpf_iter_dmabuf_destroy, KF_ITER_DESTROY | KF_SLEEPABLE) 3309 + #endif 3310 + BTF_ID_FLAGS(func, __bpf_trap) 3386 3311 BTF_KFUNCS_END(common_btf_ids) 3387 3312 3388 3313 static const struct btf_kfunc_id_set common_kfunc_set = {
+6 -4
kernel/bpf/syscall.c
··· 36 36 #include <linux/memcontrol.h> 37 37 #include <linux/trace_events.h> 38 38 #include <linux/tracepoint.h> 39 + #include <linux/overflow.h> 39 40 40 41 #include <net/netfilter/nf_bpf_link.h> 41 42 #include <net/netkit.h> ··· 694 693 695 694 if (IS_ERR_OR_NULL(rec)) 696 695 return NULL; 697 - size = offsetof(struct btf_record, fields[rec->cnt]); 696 + size = struct_size(rec, fields, rec->cnt); 698 697 new_rec = kmemdup(rec, size, GFP_KERNEL | __GFP_NOWARN); 699 698 if (!new_rec) 700 699 return ERR_PTR(-ENOMEM); ··· 749 748 return false; 750 749 if (rec_a->cnt != rec_b->cnt) 751 750 return false; 752 - size = offsetof(struct btf_record, fields[rec_a->cnt]); 751 + size = struct_size(rec_a, fields, rec_a->cnt); 753 752 /* btf_parse_fields uses kzalloc to allocate a btf_record, so unused 754 753 * members are zeroed out. So memcmp is safe to do without worrying 755 754 * about padding/unused fields. ··· 3800 3799 static int bpf_perf_link_fill_uprobe(const struct perf_event *event, 3801 3800 struct bpf_link_info *info) 3802 3801 { 3802 + u64 ref_ctr_offset, offset; 3803 3803 char __user *uname; 3804 - u64 addr, offset; 3805 3804 u32 ulen, type; 3806 3805 int err; 3807 3806 3808 3807 uname = u64_to_user_ptr(info->perf_event.uprobe.file_name); 3809 3808 ulen = info->perf_event.uprobe.name_len; 3810 - err = bpf_perf_link_fill_common(event, uname, &ulen, &offset, &addr, 3809 + err = bpf_perf_link_fill_common(event, uname, &ulen, &offset, &ref_ctr_offset, 3811 3810 &type, NULL); 3812 3811 if (err) 3813 3812 return err; ··· 3819 3818 info->perf_event.uprobe.name_len = ulen; 3820 3819 info->perf_event.uprobe.offset = offset; 3821 3820 info->perf_event.uprobe.cookie = event->bpf_cookie; 3821 + info->perf_event.uprobe.ref_ctr_offset = ref_ctr_offset; 3822 3822 return 0; 3823 3823 } 3824 3824 #endif
+32
kernel/bpf/sysfs_btf.c
··· 7 7 #include <linux/kobject.h> 8 8 #include <linux/init.h> 9 9 #include <linux/sysfs.h> 10 + #include <linux/mm.h> 11 + #include <linux/io.h> 12 + #include <linux/btf.h> 10 13 11 14 /* See scripts/link-vmlinux.sh, gen_btf() func for details */ 12 15 extern char __start_BTF[]; 13 16 extern char __stop_BTF[]; 14 17 18 + static int btf_sysfs_vmlinux_mmap(struct file *filp, struct kobject *kobj, 19 + const struct bin_attribute *attr, 20 + struct vm_area_struct *vma) 21 + { 22 + unsigned long pages = PAGE_ALIGN(attr->size) >> PAGE_SHIFT; 23 + size_t vm_size = vma->vm_end - vma->vm_start; 24 + phys_addr_t addr = virt_to_phys(__start_BTF); 25 + unsigned long pfn = addr >> PAGE_SHIFT; 26 + 27 + if (attr->private != __start_BTF || !PAGE_ALIGNED(addr)) 28 + return -EINVAL; 29 + 30 + if (vma->vm_pgoff) 31 + return -EINVAL; 32 + 33 + if (vma->vm_flags & (VM_WRITE | VM_EXEC | VM_MAYSHARE)) 34 + return -EACCES; 35 + 36 + if (pfn + pages < pfn) 37 + return -EINVAL; 38 + 39 + if ((vm_size >> PAGE_SHIFT) > pages) 40 + return -EINVAL; 41 + 42 + vm_flags_mod(vma, VM_DONTDUMP, VM_MAYEXEC | VM_MAYWRITE); 43 + return remap_pfn_range(vma, vma->vm_start, pfn, vm_size, vma->vm_page_prot); 44 + } 45 + 15 46 static struct bin_attribute bin_attr_btf_vmlinux __ro_after_init = { 16 47 .attr = { .name = "vmlinux", .mode = 0444, }, 17 48 .read_new = sysfs_bin_attr_simple_read, 49 + .mmap = btf_sysfs_vmlinux_mmap, 18 50 }; 19 51 20 52 struct kobject *btf_kobj;
kernel/bpf/verifier.c  +335 -303
···
 	struct btf *arg_btf;
 	u32 arg_btf_id;
 	bool arg_owning_ref;
+	bool arg_prog;
 
 	struct {
 		struct btf_field *field;
···
 	u32 steps = 0;
 
 	while (topmost && topmost->loop_entry) {
-		if (steps++ > st->dfs_depth) {
-			WARN_ONCE(true, "verifier bug: infinite loop in get_loop_entry\n");
-			verbose(env, "verifier bug: infinite loop in get_loop_entry()\n");
+		if (verifier_bug_if(steps++ > st->dfs_depth, env, "infinite loop"))
 			return ERR_PTR(-EFAULT);
-		}
 		topmost = topmost->loop_entry;
 	}
 	return topmost;
···
 		/* if read wasn't screened by an earlier write ... */
 		if (writes && state->live & REG_LIVE_WRITTEN)
 			break;
-		if (parent->live & REG_LIVE_DONE) {
-			verbose(env, "verifier BUG type %s var_off %lld off %d\n",
-				reg_type_str(env, parent->type),
-				parent->var_off.value, parent->off);
+		if (verifier_bug_if(parent->live & REG_LIVE_DONE, env,
+				    "type %s var_off %lld off %d",
+				    reg_type_str(env, parent->type),
+				    parent->var_off.value, parent->off))
 			return -EFAULT;
-		}
 		/* The first condition is more likely to be true than the
 		 * second, checked it first.
 		 */
···
 	case BPF_ST:
 		return -1;
 	case BPF_STX:
-		if ((BPF_MODE(insn->code) == BPF_ATOMIC ||
-		     BPF_MODE(insn->code) == BPF_PROBE_ATOMIC) &&
-		    (insn->imm & BPF_FETCH)) {
+		if (BPF_MODE(insn->code) == BPF_ATOMIC ||
+		    BPF_MODE(insn->code) == BPF_PROBE_ATOMIC) {
 			if (insn->imm == BPF_CMPXCHG)
 				return BPF_REG_0;
-			else
+			else if (insn->imm == BPF_LOAD_ACQ)
+				return insn->dst_reg;
+			else if (insn->imm & BPF_FETCH)
 				return insn->src_reg;
-		} else {
-			return -1;
 		}
+		return -1;
 	default:
 		return insn->dst_reg;
 	}
···
 		/* atomic instructions push insn_flags twice, for READ and
 		 * WRITE sides, but they should agree on stack slot
 		 */
-		WARN_ONCE((env->cur_hist_ent->flags & insn_flags) &&
-			  (env->cur_hist_ent->flags & insn_flags) != insn_flags,
-			  "verifier insn history bug: insn_idx %d cur flags %x new flags %x\n",
-			  env->insn_idx, env->cur_hist_ent->flags, insn_flags);
+		verifier_bug_if((env->cur_hist_ent->flags & insn_flags) &&
+				(env->cur_hist_ent->flags & insn_flags) != insn_flags,
+				env, "insn history: insn_idx %d cur flags %x new flags %x",
+				env->insn_idx, env->cur_hist_ent->flags, insn_flags);
 		env->cur_hist_ent->flags |= insn_flags;
-		WARN_ONCE(env->cur_hist_ent->linked_regs != 0,
-			  "verifier insn history bug: insn_idx %d linked_regs != 0: %#llx\n",
-			  env->insn_idx, env->cur_hist_ent->linked_regs);
+		verifier_bug_if(env->cur_hist_ent->linked_regs != 0, env,
+				"insn history: insn_idx %d linked_regs: %#llx",
+				env->insn_idx, env->cur_hist_ent->linked_regs);
 		env->cur_hist_ent->linked_regs = linked_regs;
 		return 0;
 	}
···
 static inline int bt_subprog_enter(struct backtrack_state *bt)
 {
 	if (bt->frame == MAX_CALL_FRAMES - 1) {
-		verbose(bt->env, "BUG subprog enter from frame %d\n", bt->frame);
-		WARN_ONCE(1, "verifier backtracking bug");
+		verifier_bug(bt->env, "subprog enter from frame %d", bt->frame);
 		return -EFAULT;
 	}
 	bt->frame++;
···
 static inline int bt_subprog_exit(struct backtrack_state *bt)
 {
 	if (bt->frame == 0) {
-		verbose(bt->env, "BUG subprog exit from frame 0\n");
-		WARN_ONCE(1, "verifier backtracking bug");
+		verifier_bug(bt->env, "subprog exit from frame 0");
 		return -EFAULT;
 	}
 	bt->frame--;
···
 			 * should be literally next instruction in
 			 * caller program
 			 */
-			WARN_ONCE(idx + 1 != subseq_idx, "verifier backtracking bug");
+			verifier_bug_if(idx + 1 != subseq_idx, env,
+					"extra insn from subprog");
 			/* r1-r5 are invalidated after subprog call,
 			 * so for global func call it shouldn't be set
 			 * anymore
 			 */
 			if (bt_reg_mask(bt) & BPF_REGMASK_ARGS) {
-				verbose(env, "BUG regs %x\n", bt_reg_mask(bt));
-				WARN_ONCE(1, "verifier backtracking bug");
+				verifier_bug(env, "global subprog unexpected regs %x",
+					     bt_reg_mask(bt));
 				return -EFAULT;
 			}
 			/* global subprog always sets R0 */
···
 			 * the current frame should be zero by now
 			 */
 			if (bt_reg_mask(bt) & ~BPF_REGMASK_ARGS) {
-				verbose(env, "BUG regs %x\n", bt_reg_mask(bt));
-				WARN_ONCE(1, "verifier backtracking bug");
+				verifier_bug(env, "static subprog unexpected regs %x",
+					     bt_reg_mask(bt));
 				return -EFAULT;
 			}
 			/* we are now tracking register spills correctly,
 			 * so any instance of leftover slots is a bug
 			 */
 			if (bt_stack_mask(bt) != 0) {
-				verbose(env, "BUG stack slots %llx\n", bt_stack_mask(bt));
-				WARN_ONCE(1, "verifier backtracking bug (subprog leftover stack slots)");
+				verifier_bug(env,
+					     "static subprog leftover stack slots %llx",
+					     bt_stack_mask(bt));
 				return -EFAULT;
 			}
 			/* propagate r1-r5 to the caller */
···
 		 * not actually arguments passed directly to callback subprogs
 		 */
 		if (bt_reg_mask(bt) & ~BPF_REGMASK_ARGS) {
-			verbose(env, "BUG regs %x\n", bt_reg_mask(bt));
-			WARN_ONCE(1, "verifier backtracking bug");
+			verifier_bug(env, "callback unexpected regs %x",
+				     bt_reg_mask(bt));
 			return -EFAULT;
 		}
 		if (bt_stack_mask(bt) != 0) {
-			verbose(env, "BUG stack slots %llx\n", bt_stack_mask(bt));
-			WARN_ONCE(1, "verifier backtracking bug (callback leftover stack slots)");
+			verifier_bug(env, "callback leftover stack slots %llx",
+				     bt_stack_mask(bt));
 			return -EFAULT;
 		}
 		/* clear r1-r5 in callback subprog's mask */
···
 		/* regular helper call sets R0 */
 		bt_clear_reg(bt, BPF_REG_0);
 		if (bt_reg_mask(bt) & BPF_REGMASK_ARGS) {
-			/* if backtracing was looking for registers R1-R5
+			/* if backtracking was looking for registers R1-R5
 			 * they should have been found already.
 			 */
-			verbose(env, "BUG regs %x\n", bt_reg_mask(bt));
-			WARN_ONCE(1, "verifier backtracking bug");
+			verifier_bug(env, "backtracking call unexpected regs %x",
+				     bt_reg_mask(bt));
 			return -EFAULT;
 		}
 	} else if (opcode == BPF_EXIT) {
···
 		for (i = BPF_REG_1; i <= BPF_REG_5; i++)
 			bt_clear_reg(bt, i);
 		if (bt_reg_mask(bt) & BPF_REGMASK_ARGS) {
-			verbose(env, "BUG regs %x\n", bt_reg_mask(bt));
-			WARN_ONCE(1, "verifier backtracking bug");
+			verifier_bug(env, "backtracking exit unexpected regs %x",
+				     bt_reg_mask(bt));
 			return -EFAULT;
 		}
 
···
 			 * before it would be equally necessary to
 			 * propagate it to dreg.
 			 */
-			bt_set_reg(bt, dreg);
-			bt_set_reg(bt, sreg);
+			if (!hist || !(hist->flags & INSN_F_SRC_REG_STACK))
+				bt_set_reg(bt, sreg);
+			if (!hist || !(hist->flags & INSN_F_DST_REG_STACK))
+				bt_set_reg(bt, dreg);
 		} else if (BPF_SRC(insn->code) == BPF_K) {
 			/* dreg <cond> K
 			 * Only dreg still needs precision before
···
 		return 0;
 	}
 
-	verbose(env, "BUG backtracking func entry subprog %d reg_mask %x stack_mask %llx\n",
-		st->frame[0]->subprogno, bt_reg_mask(bt), bt_stack_mask(bt));
-	WARN_ONCE(1, "verifier backtracking bug");
+	verifier_bug(env, "backtracking func entry subprog %d reg_mask %x stack_mask %llx",
+		     st->frame[0]->subprogno, bt_reg_mask(bt), bt_stack_mask(bt));
 	return -EFAULT;
 }
···
 				 * It means the backtracking missed the spot where
 				 * particular register was initialized with a constant.
 				 */
-				verbose(env, "BUG backtracking idx %d\n", i);
-				WARN_ONCE(1, "verifier backtracking bug");
+				verifier_bug(env, "backtracking idx %d", i);
 				return -EFAULT;
 			}
 		}
···
 
 		bitmap_from_u64(mask, bt_frame_stack_mask(bt, fr));
 		for_each_set_bit(i, mask, 64) {
-			if (i >= func->allocated_stack / BPF_REG_SIZE) {
-				verbose(env, "BUG backtracking (stack slot %d, total slots %d)\n",
-					i, func->allocated_stack / BPF_REG_SIZE);
-				WARN_ONCE(1, "verifier backtracking bug (stack slot out of bounds)");
+			if (verifier_bug_if(i >= func->allocated_stack / BPF_REG_SIZE,
+					    env, "stack slot %d, total slots %d",
+					    i, func->allocated_stack / BPF_REG_SIZE))
 				return -EFAULT;
-			}
 
 			if (!is_spilled_scalar_reg(&func->stack[i])) {
 				bt_clear_frame_slot(bt, fr, i);
···
 		/* find the callee */
 		next_insn = i + insn[i].imm + 1;
 		sidx = find_subprog(env, next_insn);
-		if (sidx < 0) {
-			WARN_ONCE(1, "verifier bug. No program starts at insn %d\n",
-				  next_insn);
+		if (verifier_bug_if(sidx < 0, env, "callee not found at insn %d", next_insn))
 			return -EFAULT;
-		}
 		if (subprog[sidx].is_async_cb) {
 			if (subprog[sidx].has_tail_call) {
-				verbose(env, "verifier bug. subprog has tail_call and async cb\n");
+				verifier_bug(env, "subprog has tail_call and async cb");
 				return -EFAULT;
 			}
 			/* async callbacks don't increase bpf prog stack size unless called directly */
 			if (!bpf_pseudo_call(insn + i))
 				continue;
 			if (subprog[sidx].is_exception_cb) {
 				verbose(env, "insn %d cannot call exception cb directly\n", i);
 				return -EINVAL;
 			}
 		}
···
 	int start = idx + insn->imm + 1, subprog;
 
 	subprog = find_subprog(env, start);
-	if (subprog < 0) {
-		WARN_ONCE(1, "verifier bug. No program starts at insn %d\n",
-			  start);
+	if (verifier_bug_if(subprog < 0, env, "get stack depth: no program at insn %d", start))
 		return -EFAULT;
-	}
 	return env->subprog_info[subprog].stack_depth;
 }
 #endif
···
 		slot = -i - 1;
 		spi = slot / BPF_REG_SIZE;
 		if (state->allocated_stack <= slot) {
-			verbose(env, "verifier bug: allocated_stack too small\n");
+			verbose(env, "allocated_stack too small\n");
 			return -EFAULT;
 		}
···
 			return -EINVAL;
 		}
 		if (meta->map_ptr) {
-			verbose(env, "verifier bug. Two map pointers in a timer helper\n");
+			verifier_bug(env, "Two map pointers in a timer helper");
 			return -EFAULT;
 		}
 		meta->map_uid = reg->map_uid;
···
 	}
 
 	if (state->frame[state->curframe + 1]) {
-		verbose(env, "verifier bug. Frame %d already allocated\n",
-			state->curframe + 1);
+		verifier_bug(env, "Frame %d already allocated", state->curframe + 1);
 		return -EFAULT;
 	}
 
···
 			if (err)
 				return err;
 		} else {
-			bpf_log(log, "verifier bug: unrecognized arg#%d type %d\n",
-				i, arg->arg_type);
+			verifier_bug(env, "unrecognized arg#%d type %d", i, arg->arg_type);
 			return -EFAULT;
 		}
 	}
···
 	env->subprog_info[subprog].is_cb = true;
 	if (bpf_pseudo_kfunc_call(insn) &&
 	    !is_callback_calling_kfunc(insn->imm)) {
-		verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
-			func_id_name(insn->imm), insn->imm);
+		verifier_bug(env, "kfunc %s#%d not marked as callback-calling",
+			     func_id_name(insn->imm), insn->imm);
 		return -EFAULT;
 	} else if (!bpf_pseudo_kfunc_call(insn) &&
 		   !is_callback_calling_function(insn->imm)) { /* helper */
-		verbose(env, "verifier bug: helper %s#%d not marked as callback-calling\n",
-			func_id_name(insn->imm), insn->imm);
+		verifier_bug(env, "helper %s#%d not marked as callback-calling",
+			     func_id_name(insn->imm), insn->imm);
 		return -EFAULT;
 	}
···
 
 	target_insn = *insn_idx + insn->imm + 1;
 	subprog = find_subprog(env, target_insn);
-	if (subprog < 0) {
-		verbose(env, "verifier bug. No program starts at insn %d\n", target_insn);
+	if (verifier_bug_if(subprog < 0, env, "target of func call at insn %d is not a program",
+			    target_insn))
 		return -EFAULT;
-	}
 
 	caller = state->frame[state->curframe];
 	err = btf_check_subprog_call(env, subprog, caller->regs);
···
 	err = fmt_map->ops->map_direct_value_addr(fmt_map, &fmt_addr,
 						  fmt_map_off);
 	if (err) {
-		verbose(env, "verifier bug\n");
+		verbose(env, "failed to retrieve map value address\n");
 		return -EFAULT;
 	}
 	fmt = (char *)(long)fmt_addr + fmt_map_off;
···
 	return btf_param_match_suffix(btf, arg, "__irq_flag");
 }
 
+static bool is_kfunc_arg_prog(const struct btf *btf, const struct btf_param *arg)
+{
+	return btf_param_match_suffix(btf, arg, "__prog");
+}
+
 static bool is_kfunc_arg_scalar_with_name(const struct btf *btf,
 					  const struct btf_param *arg,
 					  const char *name)
···
 	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RES_SPIN_LOCK_ID);
 }
 
+static bool is_rbtree_node_type(const struct btf_type *t)
+{
+	return t == btf_type_by_id(btf_vmlinux, kf_arg_btf_ids[KF_ARG_RB_NODE_ID]);
+}
+
+static bool is_list_node_type(const struct btf_type *t)
+{
+	return t == btf_type_by_id(btf_vmlinux, kf_arg_btf_ids[KF_ARG_LIST_NODE_ID]);
+}
+
 static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
 				  const struct btf_param *arg)
 {
···
 	KF_bpf_list_push_back_impl,
 	KF_bpf_list_pop_front,
 	KF_bpf_list_pop_back,
+	KF_bpf_list_front,
+	KF_bpf_list_back,
 	KF_bpf_cast_to_kern_ctx,
 	KF_bpf_rdonly_cast,
 	KF_bpf_rcu_read_lock,
···
 	KF_bpf_rbtree_remove,
 	KF_bpf_rbtree_add_impl,
 	KF_bpf_rbtree_first,
+	KF_bpf_rbtree_root,
+	KF_bpf_rbtree_left,
+	KF_bpf_rbtree_right,
 	KF_bpf_dynptr_from_skb,
 	KF_bpf_dynptr_from_xdp,
 	KF_bpf_dynptr_slice,
···
 	KF_bpf_res_spin_unlock,
 	KF_bpf_res_spin_lock_irqsave,
 	KF_bpf_res_spin_unlock_irqrestore,
+	KF___bpf_trap,
 };
-
-BTF_SET_START(special_kfunc_set)
-BTF_ID(func, bpf_obj_new_impl)
-BTF_ID(func, bpf_obj_drop_impl)
-BTF_ID(func, bpf_refcount_acquire_impl)
-BTF_ID(func, bpf_list_push_front_impl)
-BTF_ID(func, bpf_list_push_back_impl)
-BTF_ID(func, bpf_list_pop_front)
-BTF_ID(func, bpf_list_pop_back)
-BTF_ID(func, bpf_cast_to_kern_ctx)
-BTF_ID(func, bpf_rdonly_cast)
-BTF_ID(func, bpf_rbtree_remove)
-BTF_ID(func, bpf_rbtree_add_impl)
-BTF_ID(func, bpf_rbtree_first)
-#ifdef CONFIG_NET
-BTF_ID(func, bpf_dynptr_from_skb)
-BTF_ID(func, bpf_dynptr_from_xdp)
-#endif
-BTF_ID(func, bpf_dynptr_slice)
-BTF_ID(func, bpf_dynptr_slice_rdwr)
-BTF_ID(func, bpf_dynptr_clone)
-BTF_ID(func, bpf_percpu_obj_new_impl)
-BTF_ID(func, bpf_percpu_obj_drop_impl)
-BTF_ID(func, bpf_throw)
-BTF_ID(func, bpf_wq_set_callback_impl)
-#ifdef CONFIG_CGROUPS
-BTF_ID(func, bpf_iter_css_task_new)
-#endif
-#ifdef CONFIG_BPF_LSM
-BTF_ID(func, bpf_set_dentry_xattr)
-BTF_ID(func, bpf_remove_dentry_xattr)
-#endif
-BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
 BTF_ID(func, bpf_obj_new_impl)
···
 BTF_ID(func, bpf_list_push_back_impl)
 BTF_ID(func, bpf_list_pop_front)
 BTF_ID(func, bpf_list_pop_back)
+BTF_ID(func, bpf_list_front)
+BTF_ID(func, bpf_list_back)
 BTF_ID(func, bpf_cast_to_kern_ctx)
 BTF_ID(func, bpf_rdonly_cast)
 BTF_ID(func, bpf_rcu_read_lock)
···
 BTF_ID(func, bpf_rbtree_remove)
 BTF_ID(func, bpf_rbtree_add_impl)
 BTF_ID(func, bpf_rbtree_first)
+BTF_ID(func, bpf_rbtree_root)
+BTF_ID(func, bpf_rbtree_left)
+BTF_ID(func, bpf_rbtree_right)
 #ifdef CONFIG_NET
 BTF_ID(func, bpf_dynptr_from_skb)
 BTF_ID(func, bpf_dynptr_from_xdp)
···
 BTF_ID(func, bpf_res_spin_unlock)
 BTF_ID(func, bpf_res_spin_lock_irqsave)
 BTF_ID(func, bpf_res_spin_unlock_irqrestore)
+BTF_ID(func, __bpf_trap)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
···
 	return btf_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
 	       btf_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
 	       btf_id == special_kfunc_list[KF_bpf_list_pop_front] ||
-	       btf_id == special_kfunc_list[KF_bpf_list_pop_back];
+	       btf_id == special_kfunc_list[KF_bpf_list_pop_back] ||
+	       btf_id == special_kfunc_list[KF_bpf_list_front] ||
+	       btf_id == special_kfunc_list[KF_bpf_list_back];
 }
 
 static bool is_bpf_rbtree_api_kfunc(u32 btf_id)
 {
 	return btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl] ||
 	       btf_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
-	       btf_id == special_kfunc_list[KF_bpf_rbtree_first];
+	       btf_id == special_kfunc_list[KF_bpf_rbtree_first] ||
+	       btf_id == special_kfunc_list[KF_bpf_rbtree_root] ||
+	       btf_id == special_kfunc_list[KF_bpf_rbtree_left] ||
+	       btf_id == special_kfunc_list[KF_bpf_rbtree_right];
 }
 
 static bool is_bpf_iter_num_api_kfunc(u32 btf_id)
···
 		break;
 	case BPF_RB_NODE:
 		ret = (kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
-		       kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl]);
+		       kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl] ||
+		       kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_left] ||
+		       kfunc_btf_id == special_kfunc_list[KF_bpf_rbtree_right]);
 		break;
 	default:
 		verbose(env, "verifier internal error: unexpected graph node argument type %s\n",
···
 
 		if (is_kfunc_arg_ignore(btf, &args[i]))
 			continue;
+
+		if (is_kfunc_arg_prog(btf, &args[i])) {
+			/* Used to reject repeated use of __prog. */
+			if (meta->arg_prog) {
+				verbose(env, "Only 1 prog->aux argument supported per-kfunc\n");
+				return -EFAULT;
+			}
+			meta->arg_prog = true;
+			cur_aux(env)->arg_prog = regno;
+			continue;
+		}
 
 		if (btf_type_is_scalar(t)) {
 			if (reg->type != SCALAR_VALUE) {
···
 				return ret;
 			break;
 		case KF_ARG_PTR_TO_RB_NODE:
-			if (meta->func_id == special_kfunc_list[KF_bpf_rbtree_remove]) {
-				if (!type_is_non_owning_ref(reg->type) || reg->ref_obj_id) {
-					verbose(env, "rbtree_remove node input must be non-owning ref\n");
-					return -EINVAL;
-				}
-				if (in_rbtree_lock_required_cb(env)) {
-					verbose(env, "rbtree_remove not allowed in rbtree cb\n");
-					return -EINVAL;
-				}
-			} else {
+			if (meta->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
 				if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
 					verbose(env, "arg#%d expected pointer to allocated object\n", i);
 					return -EINVAL;
 				}
 				if (!reg->ref_obj_id) {
 					verbose(env, "allocated object must be referenced\n");
+					return -EINVAL;
+				}
+			} else {
+				if (!type_is_non_owning_ref(reg->type) && !reg->ref_obj_id) {
+					verbose(env, "%s can only take non-owning or refcounted bpf_rb_node pointer\n", func_name);
+					return -EINVAL;
+				}
+				if (in_rbtree_lock_required_cb(env)) {
+					verbose(env, "%s not allowed in rbtree cb\n", func_name);
 					return -EINVAL;
 				}
 			}
···
 	return 0;
 }
 
+/* check special kfuncs and return:
+ *  1 - not fall-through to 'else' branch, continue verification
+ *  0 - fall-through to 'else' branch
+ * <0 - not fall-through to 'else' branch, return error
+ */
+static int check_special_kfunc(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta,
+			       struct bpf_reg_state *regs, struct bpf_insn_aux_data *insn_aux,
+			       const struct btf_type *ptr_type, struct btf *desc_btf)
+{
+	const struct btf_type *ret_t;
+	int err = 0;
+
+	if (meta->btf != btf_vmlinux)
+		return 0;
+
+	if (meta->func_id == special_kfunc_list[KF_bpf_obj_new_impl] ||
+	    meta->func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
+		struct btf_struct_meta *struct_meta;
+		struct btf *ret_btf;
+		u32 ret_btf_id;
+
+		if (meta->func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
+			return -ENOMEM;
+
+		if (((u64)(u32)meta->arg_constant.value) != meta->arg_constant.value) {
+			verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
+			return -EINVAL;
+		}
+
+		ret_btf = env->prog->aux->btf;
+		ret_btf_id = meta->arg_constant.value;
+
+		/* This may be NULL due to user not supplying a BTF */
+		if (!ret_btf) {
+			verbose(env, "bpf_obj_new/bpf_percpu_obj_new requires prog BTF\n");
+			return -EINVAL;
+		}
+
+		ret_t = btf_type_by_id(ret_btf, ret_btf_id);
+		if (!ret_t || !__btf_type_is_struct(ret_t)) {
+			verbose(env, "bpf_obj_new/bpf_percpu_obj_new type ID argument must be of a struct\n");
+			return -EINVAL;
+		}
+
+		if (meta->func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
+			if (ret_t->size > BPF_GLOBAL_PERCPU_MA_MAX_SIZE) {
+				verbose(env, "bpf_percpu_obj_new type size (%d) is greater than %d\n",
+					ret_t->size, BPF_GLOBAL_PERCPU_MA_MAX_SIZE);
+				return -EINVAL;
+			}
+
+			if (!bpf_global_percpu_ma_set) {
+				mutex_lock(&bpf_percpu_ma_lock);
+				if (!bpf_global_percpu_ma_set) {
+					/* Charge memory allocated with bpf_global_percpu_ma to
+					 * root memcg. The obj_cgroup for root memcg is NULL.
+					 */
+					err = bpf_mem_alloc_percpu_init(&bpf_global_percpu_ma, NULL);
+					if (!err)
+						bpf_global_percpu_ma_set = true;
+				}
+				mutex_unlock(&bpf_percpu_ma_lock);
+				if (err)
+					return err;
+			}
+
+			mutex_lock(&bpf_percpu_ma_lock);
+			err = bpf_mem_alloc_percpu_unit_init(&bpf_global_percpu_ma, ret_t->size);
+			mutex_unlock(&bpf_percpu_ma_lock);
+			if (err)
+				return err;
+		}
+
+		struct_meta = btf_find_struct_meta(ret_btf, ret_btf_id);
+		if (meta->func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
+			if (!__btf_type_is_scalar_struct(env, ret_btf, ret_t, 0)) {
+				verbose(env, "bpf_percpu_obj_new type ID argument must be of a struct of scalars\n");
+				return -EINVAL;
+			}
+
+			if (struct_meta) {
+				verbose(env, "bpf_percpu_obj_new type ID argument must not contain special fields\n");
+				return -EINVAL;
+			}
+		}
+
+		mark_reg_known_zero(env, regs, BPF_REG_0);
+		regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
+		regs[BPF_REG_0].btf = ret_btf;
+		regs[BPF_REG_0].btf_id = ret_btf_id;
+		if (meta->func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl])
+			regs[BPF_REG_0].type |= MEM_PERCPU;
+
+		insn_aux->obj_new_size = ret_t->size;
+		insn_aux->kptr_struct_meta = struct_meta;
+	} else if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]) {
+		mark_reg_known_zero(env, regs, BPF_REG_0);
+		regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
+		regs[BPF_REG_0].btf = meta->arg_btf;
+		regs[BPF_REG_0].btf_id = meta->arg_btf_id;
+
+		insn_aux->kptr_struct_meta =
+			btf_find_struct_meta(meta->arg_btf,
+					     meta->arg_btf_id);
+	} else if (is_list_node_type(ptr_type)) {
+		struct btf_field *field = meta->arg_list_head.field;
+
+		mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
+	} else if (is_rbtree_node_type(ptr_type)) {
+		struct btf_field *field = meta->arg_rbtree_root.field;
+
+		mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
+	} else if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
+		mark_reg_known_zero(env, regs, BPF_REG_0);
+		regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
+		regs[BPF_REG_0].btf = desc_btf;
+		regs[BPF_REG_0].btf_id = meta->ret_btf_id;
+	} else if (meta->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
+		ret_t = btf_type_by_id(desc_btf, meta->arg_constant.value);
+		if (!ret_t || !btf_type_is_struct(ret_t)) {
+			verbose(env,
+				"kfunc bpf_rdonly_cast type ID argument must be of a struct\n");
+			return -EINVAL;
+		}
+
+		mark_reg_known_zero(env, regs, BPF_REG_0);
+		regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_UNTRUSTED;
+		regs[BPF_REG_0].btf = desc_btf;
+		regs[BPF_REG_0].btf_id = meta->arg_constant.value;
+	} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_slice] ||
+		   meta->func_id == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) {
+		enum bpf_type_flag type_flag = get_dynptr_type_flag(meta->initialized_dynptr.type);
+
+		mark_reg_known_zero(env, regs, BPF_REG_0);
+
+		if (!meta->arg_constant.found) {
+			verbose(env, "verifier internal error: bpf_dynptr_slice(_rdwr) no constant size\n");
+			return -EFAULT;
+		}
+
+		regs[BPF_REG_0].mem_size = meta->arg_constant.value;
+
+		/* PTR_MAYBE_NULL will be added when is_kfunc_ret_null is checked */
+		regs[BPF_REG_0].type = PTR_TO_MEM | type_flag;
+
+		if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_slice]) {
+			regs[BPF_REG_0].type |= MEM_RDONLY;
+		} else {
+			/* this will set env->seen_direct_write to true */
+			if (!may_access_direct_pkt_data(env, NULL, BPF_WRITE)) {
+				verbose(env, "the prog does not allow writes to packet data\n");
+				return -EINVAL;
+			}
+		}
+
+		if (!meta->initialized_dynptr.id) {
+			verbose(env, "verifier internal error: no dynptr id\n");
+			return -EFAULT;
+		}
+		regs[BPF_REG_0].dynptr_id = meta->initialized_dynptr.id;
+
+		/* we don't need to set BPF_REG_0's ref obj id
+		 * because packet slices are not refcounted (see
+		 * dynptr_type_refcounted)
+		 */
+	} else {
+		return 0;
+	}
+
+	return 1;
+}
+
 static int check_return_code(struct bpf_verifier_env *env, int regno, const char *reg_name);
 
 static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
···
 	struct bpf_insn_aux_data *insn_aux;
 	int err, insn_idx = *insn_idx_p;
 	const struct btf_param *args;
-	const struct btf_type *ret_t;
 	struct btf *desc_btf;
 
 	/* skip for now, but return error when we find this in fixup_kfunc_call */
···
 			return err;
 		}
 		__mark_btf_func_reg_size(env, regs, BPF_REG_0, sizeof(u32));
+	} else if (!insn->off && insn->imm == special_kfunc_list[KF___bpf_trap]) {
+		verbose(env, "unexpected __bpf_trap() due to uninitialized variable?\n");
+		return -EFAULT;
 	}
 
 	if (is_kfunc_destructive(&meta) && !capable(CAP_SYS_BOOT)) {
···
 		mark_btf_func_reg_size(env, BPF_REG_0, t->size);
 	} else if (btf_type_is_ptr(t)) {
 		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
-
-		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
-			if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] ||
-			    meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
-				struct btf_struct_meta *struct_meta;
-				struct btf *ret_btf;
-				u32 ret_btf_id;
-
-				if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
-					return -ENOMEM;
-
-				if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
-					verbose(env, "local type ID argument must be in range [0, U32_MAX]\n");
-					return -EINVAL;
-				}
-
-				ret_btf = env->prog->aux->btf;
-				ret_btf_id = meta.arg_constant.value;
-
-				/* This may be NULL due to user not supplying a BTF */
-				if (!ret_btf) {
-					verbose(env, "bpf_obj_new/bpf_percpu_obj_new requires prog BTF\n");
-					return -EINVAL;
-				}
-
-				ret_t = btf_type_by_id(ret_btf, ret_btf_id);
-				if (!ret_t || !__btf_type_is_struct(ret_t)) {
-					verbose(env, "bpf_obj_new/bpf_percpu_obj_new type ID argument must be of a struct\n");
-					return -EINVAL;
-				}
-
-				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
-					if (ret_t->size > BPF_GLOBAL_PERCPU_MA_MAX_SIZE) {
-						verbose(env, "bpf_percpu_obj_new type size (%d) is greater than %d\n",
-							ret_t->size, BPF_GLOBAL_PERCPU_MA_MAX_SIZE);
-						return -EINVAL;
-					}
-
-					if (!bpf_global_percpu_ma_set) {
-						mutex_lock(&bpf_percpu_ma_lock);
-						if (!bpf_global_percpu_ma_set) {
-							/* Charge memory allocated with bpf_global_percpu_ma to
-							 * root memcg. The obj_cgroup for root memcg is NULL.
-							 */
-							err = bpf_mem_alloc_percpu_init(&bpf_global_percpu_ma, NULL);
-							if (!err)
-								bpf_global_percpu_ma_set = true;
-						}
-						mutex_unlock(&bpf_percpu_ma_lock);
-						if (err)
-							return err;
-					}
-
-					mutex_lock(&bpf_percpu_ma_lock);
-					err = bpf_mem_alloc_percpu_unit_init(&bpf_global_percpu_ma, ret_t->size);
-					mutex_unlock(&bpf_percpu_ma_lock);
-					if (err)
-						return err;
-				}
-
-				struct_meta = btf_find_struct_meta(ret_btf, ret_btf_id);
-				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
-					if (!__btf_type_is_scalar_struct(env, ret_btf, ret_t, 0)) {
-						verbose(env, "bpf_percpu_obj_new type ID argument must be of a struct of scalars\n");
-						return -EINVAL;
-					}
-
-					if (struct_meta) {
-						verbose(env, "bpf_percpu_obj_new type ID argument must not contain special fields\n");
-						return -EINVAL;
-					}
-				}
-
-				mark_reg_known_zero(env, regs, BPF_REG_0);
-				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
-				regs[BPF_REG_0].btf = ret_btf;
-				regs[BPF_REG_0].btf_id = ret_btf_id;
-				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl])
-					regs[BPF_REG_0].type |= MEM_PERCPU;
-
-				insn_aux->obj_new_size = ret_t->size;
-				insn_aux->kptr_struct_meta = struct_meta;
-			} else if (meta.func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]) {
-				mark_reg_known_zero(env, regs, BPF_REG_0);
-				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
-				regs[BPF_REG_0].btf = meta.arg_btf;
-				regs[BPF_REG_0].btf_id = meta.arg_btf_id;
-
-				insn_aux->kptr_struct_meta =
-					btf_find_struct_meta(meta.arg_btf,
-							     meta.arg_btf_id);
-			} else if (meta.func_id == special_kfunc_list[KF_bpf_list_pop_front] ||
-				   meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) {
-				struct btf_field *field = meta.arg_list_head.field;
-
-				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
-			} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
-				   meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
-				struct btf_field *field = meta.arg_rbtree_root.field;
-
-				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
-			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
-				mark_reg_known_zero(env, regs, BPF_REG_0);
-				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
-				regs[BPF_REG_0].btf = desc_btf;
-				regs[BPF_REG_0].btf_id = meta.ret_btf_id;
-			} else if (meta.func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
-				ret_t = btf_type_by_id(desc_btf, meta.arg_constant.value);
-				if (!ret_t || !btf_type_is_struct(ret_t)) {
-					verbose(env,
-						"kfunc bpf_rdonly_cast type ID argument must be of a struct\n");
-					return -EINVAL;
-				}
-
-				mark_reg_known_zero(env, regs, BPF_REG_0);
-				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_UNTRUSTED;
-				regs[BPF_REG_0].btf = desc_btf;
-				regs[BPF_REG_0].btf_id = meta.arg_constant.value;
-			} else if (meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice] ||
-				   meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) {
-				enum bpf_type_flag type_flag = get_dynptr_type_flag(meta.initialized_dynptr.type);
-
-				mark_reg_known_zero(env, regs, BPF_REG_0);
-
-				if (!meta.arg_constant.found) {
-					verbose(env, "verifier internal error: bpf_dynptr_slice(_rdwr) no constant size\n");
-					return -EFAULT;
-				}
-
-				regs[BPF_REG_0].mem_size = meta.arg_constant.value;
-
-				/* PTR_MAYBE_NULL will be added when is_kfunc_ret_null is checked */
-				regs[BPF_REG_0].type = PTR_TO_MEM | type_flag;
-
-				if (meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice]) {
-					regs[BPF_REG_0].type |= MEM_RDONLY;
-				} else {
-					/* this will set env->seen_direct_write to true */
-					if (!may_access_direct_pkt_data(env, NULL, BPF_WRITE)) {
-						verbose(env, "the prog does not allow writes to packet data\n");
-						return -EINVAL;
-					}
-				}
-
-				if (!meta.initialized_dynptr.id) {
-					verbose(env, "verifier internal error: no dynptr id\n");
-					return -EFAULT;
-				}
-				regs[BPF_REG_0].dynptr_id = meta.initialized_dynptr.id;
-
-				/* we don't need to set BPF_REG_0's ref obj id
-				 * because packet slices are not refcounted (see
-				 * dynptr_type_refcounted)
-				 */
-			} else {
-				verbose(env, "kernel function %s unhandled dynamic return type\n",
-					meta.func_name);
-				return -EFAULT;
-			}
+		err = check_special_kfunc(env, &meta, regs, insn_aux, ptr_type, desc_btf);
+		if (err) {
+			if (err < 0)
+				return err;
 		} else if (btf_type_is_void(ptr_type)) {
 			/* kfunc returning 'void *' is equivalent to returning scalar */
 			mark_reg_unknown(env, regs, BPF_REG_0);
···
 			if (is_kfunc_ret_null(&meta))
 				regs[BPF_REG_0].id = id;
 			regs[BPF_REG_0].ref_obj_id = id;
-		} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
+		} else if (is_rbtree_node_type(ptr_type) || is_list_node_type(ptr_type)) {
 			ref_set_non_owning(env, &regs[BPF_REG_0]);
 		}
 
 		if (reg_may_point_to_spin_lock(&regs[BPF_REG_0]) && !regs[BPF_REG_0].id)
 			regs[BPF_REG_0].id = ++env->id_gen;
 	} else if (btf_type_is_void(t)) {
-		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
+		if (meta.btf == btf_vmlinux) {
			if (meta.func_id ==
special_kfunc_list[KF_bpf_obj_drop_impl] || 13910 13893 meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_drop_impl]) { 13911 13894 insn_aux->kptr_struct_meta = ··· 16394 16377 struct bpf_reg_state *eq_branch_regs; 16395 16378 struct linked_regs linked_regs = {}; 16396 16379 u8 opcode = BPF_OP(insn->code); 16380 + int insn_flags = 0; 16397 16381 bool is_jmp32; 16398 16382 int pred = -1; 16399 16383 int err; ··· 16453 16435 insn->src_reg); 16454 16436 return -EACCES; 16455 16437 } 16438 + 16439 + if (src_reg->type == PTR_TO_STACK) 16440 + insn_flags |= INSN_F_SRC_REG_STACK; 16441 + if (dst_reg->type == PTR_TO_STACK) 16442 + insn_flags |= INSN_F_DST_REG_STACK; 16456 16443 } else { 16457 16444 if (insn->src_reg != BPF_REG_0) { 16458 16445 verbose(env, "BPF_JMP/JMP32 uses reserved fields\n"); ··· 16467 16444 memset(src_reg, 0, sizeof(*src_reg)); 16468 16445 src_reg->type = SCALAR_VALUE; 16469 16446 __mark_reg_known(src_reg, insn->imm); 16447 + 16448 + if (dst_reg->type == PTR_TO_STACK) 16449 + insn_flags |= INSN_F_DST_REG_STACK; 16450 + } 16451 + 16452 + if (insn_flags) { 16453 + err = push_insn_history(env, this_branch, insn_flags, 0); 16454 + if (err) 16455 + return err; 16470 16456 } 16471 16457 16472 16458 is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32; ··· 19691 19659 return err; 19692 19660 break; 19693 19661 } else { 19694 - if (WARN_ON_ONCE(env->cur_state->loop_entry)) { 19695 - verbose(env, "verifier bug: env->cur_state->loop_entry != NULL\n"); 19662 + if (verifier_bug_if(env->cur_state->loop_entry, env, 19663 + "broken loop detection")) 19696 19664 return -EFAULT; 19697 - } 19698 19665 do_print_state = true; 19699 19666 continue; 19700 19667 } ··· 20751 20720 if (bpf_pseudo_kfunc_call(&insn)) 20752 20721 continue; 20753 20722 20754 - if (WARN_ON(load_reg == -1)) { 20755 - verbose(env, "verifier bug. 
zext_dst is set, but no reg is defined\n"); 20723 + if (verifier_bug_if(load_reg == -1, env, 20724 + "zext_dst is set, but no reg is defined")) 20756 20725 return -EFAULT; 20757 - } 20758 20726 20759 20727 zext_patch[0] = insn; 20760 20728 zext_patch[1].dst_reg = load_reg; ··· 21070 21040 * propagated in any case. 21071 21041 */ 21072 21042 subprog = find_subprog(env, i + insn->imm + 1); 21073 - if (subprog < 0) { 21074 - WARN_ONCE(1, "verifier bug. No program starts at insn %d\n", 21075 - i + insn->imm + 1); 21043 + if (verifier_bug_if(subprog < 0, env, "No program to jit at insn %d", 21044 + i + insn->imm + 1)) 21076 21045 return -EFAULT; 21077 - } 21078 21046 /* temporarily remember subprog id inside insn instead of 21079 21047 * aux_data, since next loop will split up all insns into funcs 21080 21048 */ ··· 21515 21487 desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) { 21516 21488 insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1); 21517 21489 *cnt = 1; 21518 - } else if (is_bpf_wq_set_callback_impl_kfunc(desc->func_id)) { 21519 - struct bpf_insn ld_addrs[2] = { BPF_LD_IMM64(BPF_REG_4, (long)env->prog->aux) }; 21490 + } 21520 21491 21521 - insn_buf[0] = ld_addrs[0]; 21522 - insn_buf[1] = ld_addrs[1]; 21523 - insn_buf[2] = *insn; 21524 - *cnt = 3; 21492 + if (env->insn_aux_data[insn_idx].arg_prog) { 21493 + u32 regno = env->insn_aux_data[insn_idx].arg_prog; 21494 + struct bpf_insn ld_addrs[2] = { BPF_LD_IMM64(regno, (long)env->prog->aux) }; 21495 + int idx = *cnt; 21496 + 21497 + insn_buf[idx++] = ld_addrs[0]; 21498 + insn_buf[idx++] = ld_addrs[1]; 21499 + insn_buf[idx++] = *insn; 21500 + *cnt = idx; 21525 21501 } 21526 21502 return 0; 21527 21503 } ··· 22435 22403 continue; 22436 22404 /* We need two slots in case timed may_goto is supported. 
*/ 22437 22405 if (stack_slots > slots) { 22438 - verbose(env, "verifier bug: stack_slots supports may_goto only\n"); 22406 + verifier_bug(env, "stack_slots supports may_goto only"); 22439 22407 return -EFAULT; 22440 22408 } 22441 22409
+1 -14
kernel/sched/ext.c
···
 		return -EACCES;
 	}
 
-static const struct bpf_func_proto *
-bpf_scx_get_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
-{
-	switch (func_id) {
-	case BPF_FUNC_task_storage_get:
-		return &bpf_task_storage_get_proto;
-	case BPF_FUNC_task_storage_delete:
-		return &bpf_task_storage_delete_proto;
-	default:
-		return bpf_base_func_proto(func_id, prog);
-	}
-}
-
 static const struct bpf_verifier_ops bpf_scx_verifier_ops = {
-	.get_func_proto = bpf_scx_get_func_proto,
+	.get_func_proto = bpf_base_func_proto,
 	.is_valid_access = bpf_scx_is_valid_access,
 	.btf_struct_access = bpf_scx_btf_struct_access,
 };
+211 -110
kernel/trace/bpf_trace.c
···
 	return value;
 }
 
-static const struct bpf_func_proto bpf_perf_event_read_proto = {
+const struct bpf_func_proto bpf_perf_event_read_proto = {
 	.func		= bpf_perf_event_read,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
···
 	return bpf_send_signal_common(sig, PIDTYPE_TGID, NULL, 0);
 }
 
-static const struct bpf_func_proto bpf_send_signal_proto = {
+const struct bpf_func_proto bpf_send_signal_proto = {
 	.func		= bpf_send_signal,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
···
 	return bpf_send_signal_common(sig, PIDTYPE_PID, NULL, 0);
 }
 
-static const struct bpf_func_proto bpf_send_signal_thread_proto = {
+const struct bpf_func_proto bpf_send_signal_thread_proto = {
 	.func		= bpf_send_signal_thread,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
···
 	return entry_cnt * br_entry_size;
 }
 
-static const struct bpf_func_proto bpf_get_branch_snapshot_proto = {
+const struct bpf_func_proto bpf_get_branch_snapshot_proto = {
 	.func		= bpf_get_branch_snapshot,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
···
 	const struct bpf_func_proto *func_proto;
 
 	switch (func_id) {
-	case BPF_FUNC_map_lookup_elem:
-		return &bpf_map_lookup_elem_proto;
-	case BPF_FUNC_map_update_elem:
-		return &bpf_map_update_elem_proto;
-	case BPF_FUNC_map_delete_elem:
-		return &bpf_map_delete_elem_proto;
-	case BPF_FUNC_map_push_elem:
-		return &bpf_map_push_elem_proto;
-	case BPF_FUNC_map_pop_elem:
-		return &bpf_map_pop_elem_proto;
-	case BPF_FUNC_map_peek_elem:
-		return &bpf_map_peek_elem_proto;
-	case BPF_FUNC_map_lookup_percpu_elem:
-		return &bpf_map_lookup_percpu_elem_proto;
-	case BPF_FUNC_ktime_get_ns:
-		return &bpf_ktime_get_ns_proto;
-	case BPF_FUNC_ktime_get_boot_ns:
-		return &bpf_ktime_get_boot_ns_proto;
-	case BPF_FUNC_tail_call:
-		return &bpf_tail_call_proto;
-	case BPF_FUNC_get_current_task:
-		return &bpf_get_current_task_proto;
-	case BPF_FUNC_get_current_task_btf:
-		return &bpf_get_current_task_btf_proto;
-	case BPF_FUNC_task_pt_regs:
-		return &bpf_task_pt_regs_proto;
-	case BPF_FUNC_get_current_uid_gid:
-		return &bpf_get_current_uid_gid_proto;
-	case BPF_FUNC_get_current_comm:
-		return &bpf_get_current_comm_proto;
-	case BPF_FUNC_trace_printk:
-		return bpf_get_trace_printk_proto();
 	case BPF_FUNC_get_smp_processor_id:
 		return &bpf_get_smp_processor_id_proto;
-	case BPF_FUNC_get_numa_node_id:
-		return &bpf_get_numa_node_id_proto;
-	case BPF_FUNC_perf_event_read:
-		return &bpf_perf_event_read_proto;
-	case BPF_FUNC_get_prandom_u32:
-		return &bpf_get_prandom_u32_proto;
-	case BPF_FUNC_probe_read_user:
-		return &bpf_probe_read_user_proto;
-	case BPF_FUNC_probe_read_kernel:
-		return security_locked_down(LOCKDOWN_BPF_READ_KERNEL) < 0 ?
-		       NULL : &bpf_probe_read_kernel_proto;
-	case BPF_FUNC_probe_read_user_str:
-		return &bpf_probe_read_user_str_proto;
-	case BPF_FUNC_probe_read_kernel_str:
-		return security_locked_down(LOCKDOWN_BPF_READ_KERNEL) < 0 ?
-		       NULL : &bpf_probe_read_kernel_str_proto;
 #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	case BPF_FUNC_probe_read:
 		return security_locked_down(LOCKDOWN_BPF_READ_KERNEL) < 0 ?
···
 		return security_locked_down(LOCKDOWN_BPF_READ_KERNEL) < 0 ?
 		       NULL : &bpf_probe_read_compat_str_proto;
 #endif
-#ifdef CONFIG_CGROUPS
-	case BPF_FUNC_cgrp_storage_get:
-		return &bpf_cgrp_storage_get_proto;
-	case BPF_FUNC_cgrp_storage_delete:
-		return &bpf_cgrp_storage_delete_proto;
-	case BPF_FUNC_current_task_under_cgroup:
-		return &bpf_current_task_under_cgroup_proto;
-#endif
-	case BPF_FUNC_send_signal:
-		return &bpf_send_signal_proto;
-	case BPF_FUNC_send_signal_thread:
-		return &bpf_send_signal_thread_proto;
-	case BPF_FUNC_perf_event_read_value:
-		return &bpf_perf_event_read_value_proto;
-	case BPF_FUNC_ringbuf_output:
-		return &bpf_ringbuf_output_proto;
-	case BPF_FUNC_ringbuf_reserve:
-		return &bpf_ringbuf_reserve_proto;
-	case BPF_FUNC_ringbuf_submit:
-		return &bpf_ringbuf_submit_proto;
-	case BPF_FUNC_ringbuf_discard:
-		return &bpf_ringbuf_discard_proto;
-	case BPF_FUNC_ringbuf_query:
-		return &bpf_ringbuf_query_proto;
-	case BPF_FUNC_jiffies64:
-		return &bpf_jiffies64_proto;
-	case BPF_FUNC_get_task_stack:
-		return prog->sleepable ? &bpf_get_task_stack_sleepable_proto
-				       : &bpf_get_task_stack_proto;
-	case BPF_FUNC_copy_from_user:
-		return &bpf_copy_from_user_proto;
-	case BPF_FUNC_copy_from_user_task:
-		return &bpf_copy_from_user_task_proto;
-	case BPF_FUNC_snprintf_btf:
-		return &bpf_snprintf_btf_proto;
-	case BPF_FUNC_per_cpu_ptr:
-		return &bpf_per_cpu_ptr_proto;
-	case BPF_FUNC_this_cpu_ptr:
-		return &bpf_this_cpu_ptr_proto;
-	case BPF_FUNC_task_storage_get:
-		if (bpf_prog_check_recur(prog))
-			return &bpf_task_storage_get_recur_proto;
-		return &bpf_task_storage_get_proto;
-	case BPF_FUNC_task_storage_delete:
-		if (bpf_prog_check_recur(prog))
-			return &bpf_task_storage_delete_recur_proto;
-		return &bpf_task_storage_delete_proto;
-	case BPF_FUNC_for_each_map_elem:
-		return &bpf_for_each_map_elem_proto;
-	case BPF_FUNC_snprintf:
-		return &bpf_snprintf_proto;
 	case BPF_FUNC_get_func_ip:
 		return &bpf_get_func_ip_proto_tracing;
-	case BPF_FUNC_get_branch_snapshot:
-		return &bpf_get_branch_snapshot_proto;
-	case BPF_FUNC_find_vma:
-		return &bpf_find_vma_proto;
-	case BPF_FUNC_trace_vprintk:
-		return bpf_get_trace_vprintk_proto();
 	default:
 		break;
 	}
···
 	struct bpf_raw_tp_regs *tp_regs = this_cpu_ptr(&bpf_raw_tp_regs);
 	int nest_level = this_cpu_inc_return(bpf_raw_tp_nest_level);
 
-	if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(tp_regs->regs))) {
+	if (nest_level > ARRAY_SIZE(tp_regs->regs)) {
 		this_cpu_dec(bpf_raw_tp_nest_level);
 		return ERR_PTR(-EBUSY);
 	}
···
 	if (sizeof(u64) != sizeof(void *))
 		return -EOPNOTSUPP;
 
+	if (attr->link_create.flags)
+		return -EINVAL;
+
 	if (!is_kprobe_multi(prog))
 		return -EINVAL;
···
 	if (sizeof(u64) != sizeof(void *))
 		return -EOPNOTSUPP;
 
+	if (attr->link_create.flags)
+		return -EINVAL;
+
 	if (!is_uprobe_multi(prog))
 		return -EINVAL;
···
 	}
 
 	if (pid) {
+		rcu_read_lock();
 		task = get_pid_task(find_vpid(pid), PIDTYPE_TGID);
+		rcu_read_unlock();
 		if (!task) {
 			err = -ESRCH;
 			goto error_path_put;
···
 
 late_initcall(bpf_kprobe_multi_kfuncs_init);
 
+typedef int (*copy_fn_t)(void *dst, const void *src, u32 size, struct task_struct *tsk);
+
+/*
+ * The __always_inline is to make sure the compiler doesn't
+ * generate indirect calls into callbacks, which is expensive,
+ * on some kernel configurations. This allows compiler to put
+ * direct calls into all the specific callback implementations
+ * (copy_user_data_sleepable, copy_user_data_nofault, and so on)
+ */
+static __always_inline int __bpf_dynptr_copy_str(struct bpf_dynptr *dptr, u32 doff, u32 size,
+						 const void *unsafe_src,
+						 copy_fn_t str_copy_fn,
+						 struct task_struct *tsk)
+{
+	struct bpf_dynptr_kern *dst;
+	u32 chunk_sz, off;
+	void *dst_slice;
+	int cnt, err;
+	char buf[256];
+
+	dst_slice = bpf_dynptr_slice_rdwr(dptr, doff, NULL, size);
+	if (likely(dst_slice))
+		return str_copy_fn(dst_slice, unsafe_src, size, tsk);
+
+	dst = (struct bpf_dynptr_kern *)dptr;
+	if (bpf_dynptr_check_off_len(dst, doff, size))
+		return -E2BIG;
+
+	for (off = 0; off < size; off += chunk_sz - 1) {
+		chunk_sz = min_t(u32, sizeof(buf), size - off);
+		/* Expect str_copy_fn to return count of copied bytes, including
+		 * zero terminator. Next iteration increment off by chunk_sz - 1 to
+		 * overwrite NUL.
+		 */
+		cnt = str_copy_fn(buf, unsafe_src + off, chunk_sz, tsk);
+		if (cnt < 0)
+			return cnt;
+		err = __bpf_dynptr_write(dst, doff + off, buf, cnt, 0);
+		if (err)
+			return err;
+		if (cnt < chunk_sz || chunk_sz == 1) /* we are done */
+			return off + cnt;
+	}
+	return off;
+}
+
+static __always_inline int __bpf_dynptr_copy(const struct bpf_dynptr *dptr, u32 doff,
+					     u32 size, const void *unsafe_src,
+					     copy_fn_t copy_fn, struct task_struct *tsk)
+{
+	struct bpf_dynptr_kern *dst;
+	void *dst_slice;
+	char buf[256];
+	u32 off, chunk_sz;
+	int err;
+
+	dst_slice = bpf_dynptr_slice_rdwr(dptr, doff, NULL, size);
+	if (likely(dst_slice))
+		return copy_fn(dst_slice, unsafe_src, size, tsk);
+
+	dst = (struct bpf_dynptr_kern *)dptr;
+	if (bpf_dynptr_check_off_len(dst, doff, size))
+		return -E2BIG;
+
+	for (off = 0; off < size; off += chunk_sz) {
+		chunk_sz = min_t(u32, sizeof(buf), size - off);
+		err = copy_fn(buf, unsafe_src + off, chunk_sz, tsk);
+		if (err)
+			return err;
+		err = __bpf_dynptr_write(dst, doff + off, buf, chunk_sz, 0);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static __always_inline int copy_user_data_nofault(void *dst, const void *unsafe_src,
+						  u32 size, struct task_struct *tsk)
+{
+	return copy_from_user_nofault(dst, (const void __user *)unsafe_src, size);
+}
+
+static __always_inline int copy_user_data_sleepable(void *dst, const void *unsafe_src,
+						    u32 size, struct task_struct *tsk)
+{
+	int ret;
+
+	if (!tsk) { /* Read from the current task */
+		ret = copy_from_user(dst, (const void __user *)unsafe_src, size);
+		if (ret)
+			return -EFAULT;
+		return 0;
+	}
+
+	ret = access_process_vm(tsk, (unsigned long)unsafe_src, dst, size, 0);
+	if (ret != size)
+		return -EFAULT;
+	return 0;
+}
+
+static __always_inline int copy_kernel_data_nofault(void *dst, const void *unsafe_src,
+						    u32 size, struct task_struct *tsk)
+{
+	return copy_from_kernel_nofault(dst, unsafe_src, size);
+}
+
+static __always_inline int copy_user_str_nofault(void *dst, const void *unsafe_src,
+						 u32 size, struct task_struct *tsk)
+{
+	return strncpy_from_user_nofault(dst, (const void __user *)unsafe_src, size);
+}
+
+static __always_inline int copy_user_str_sleepable(void *dst, const void *unsafe_src,
+						   u32 size, struct task_struct *tsk)
+{
+	int ret;
+
+	if (unlikely(size == 0))
+		return 0;
+
+	if (tsk) {
+		ret = copy_remote_vm_str(tsk, (unsigned long)unsafe_src, dst, size, 0);
+	} else {
+		ret = strncpy_from_user(dst, (const void __user *)unsafe_src, size - 1);
+		/* strncpy_from_user does not guarantee NUL termination */
+		if (ret >= 0)
+			((char *)dst)[ret] = '\0';
+	}
+
+	if (ret < 0)
+		return ret;
+	return ret + 1;
+}
+
+static __always_inline int copy_kernel_str_nofault(void *dst, const void *unsafe_src,
+						   u32 size, struct task_struct *tsk)
+{
+	return strncpy_from_kernel_nofault(dst, unsafe_src, size);
+}
+
 __bpf_kfunc_start_defs();
 
 __bpf_kfunc int bpf_send_signal_task(struct task_struct *task, int sig, enum pid_type type,
···
 		return -EINVAL;
 
 	return bpf_send_signal_common(sig, type, task, value);
+}
+
+__bpf_kfunc int bpf_probe_read_user_dynptr(struct bpf_dynptr *dptr, u32 off,
+					   u32 size, const void __user *unsafe_ptr__ign)
+{
+	return __bpf_dynptr_copy(dptr, off, size, (const void *)unsafe_ptr__ign,
+				 copy_user_data_nofault, NULL);
+}
+
+__bpf_kfunc int bpf_probe_read_kernel_dynptr(struct bpf_dynptr *dptr, u32 off,
+					     u32 size, const void *unsafe_ptr__ign)
+{
+	return __bpf_dynptr_copy(dptr, off, size, unsafe_ptr__ign,
+				 copy_kernel_data_nofault, NULL);
+}
+
+__bpf_kfunc int bpf_probe_read_user_str_dynptr(struct bpf_dynptr *dptr, u32 off,
+					       u32 size, const void __user *unsafe_ptr__ign)
+{
+	return __bpf_dynptr_copy_str(dptr, off, size, (const void *)unsafe_ptr__ign,
+				     copy_user_str_nofault, NULL);
+}
+
+__bpf_kfunc int bpf_probe_read_kernel_str_dynptr(struct bpf_dynptr *dptr, u32 off,
+						 u32 size, const void *unsafe_ptr__ign)
+{
+	return __bpf_dynptr_copy_str(dptr, off, size, unsafe_ptr__ign,
+				     copy_kernel_str_nofault, NULL);
+}
+
+__bpf_kfunc int bpf_copy_from_user_dynptr(struct bpf_dynptr *dptr, u32 off,
+					  u32 size, const void __user *unsafe_ptr__ign)
+{
+	return __bpf_dynptr_copy(dptr, off, size, (const void *)unsafe_ptr__ign,
+				 copy_user_data_sleepable, NULL);
+}
+
+__bpf_kfunc int bpf_copy_from_user_str_dynptr(struct bpf_dynptr *dptr, u32 off,
+					      u32 size, const void __user *unsafe_ptr__ign)
+{
+	return __bpf_dynptr_copy_str(dptr, off, size, (const void *)unsafe_ptr__ign,
+				     copy_user_str_sleepable, NULL);
+}
+
+__bpf_kfunc int bpf_copy_from_user_task_dynptr(struct bpf_dynptr *dptr, u32 off,
+					       u32 size, const void __user *unsafe_ptr__ign,
+					       struct task_struct *tsk)
+{
+	return __bpf_dynptr_copy(dptr, off, size, (const void *)unsafe_ptr__ign,
+				 copy_user_data_sleepable, tsk);
+}
+
+__bpf_kfunc int bpf_copy_from_user_task_str_dynptr(struct bpf_dynptr *dptr, u32 off,
+						   u32 size, const void __user *unsafe_ptr__ign,
+						   struct task_struct *tsk)
+{
+	return __bpf_dynptr_copy_str(dptr, off, size, (const void *)unsafe_ptr__ign,
+				     copy_user_str_sleepable, tsk);
 }
 
 __bpf_kfunc_end_defs();
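The subtle part of the __bpf_dynptr_copy_str() fallback above is `off += chunk_sz - 1`: each pass through the 256-byte bounce buffer writes a NUL terminator, and the next pass starts on top of that NUL so the reassembled string has exactly one terminator. A userspace Python model of just that arithmetic (a sketch; the function name and the tiny 4-byte buffer are illustrative, not kernel API):

```python
def chunked_str_copy(src, size, bufsz=4):
    """Model the bounce-buffer loop: each chunk copy NUL-terminates its
    output and reports the copied length including the NUL; off advances
    by chunk_sz - 1 so the next chunk overwrites the forced NUL."""
    out = []
    off = 0
    while off < size:
        chunk_sz = min(bufsz, size - off)
        seg = src[off:off + chunk_sz]
        nul = seg.find("\0")
        if 0 <= nul < chunk_sz:
            buf = seg[:nul + 1]              # source NUL fits in this chunk
        else:
            buf = seg[:chunk_sz - 1] + "\0"  # truncated chunk: forced NUL
        cnt = len(buf)
        out[off:off + cnt] = list(buf)
        if cnt < chunk_sz or chunk_sz == 1:  # hit the real NUL: done
            return "".join(out), off + cnt
        off += chunk_sz - 1                  # overwrite the forced NUL
    return "".join(out), off

# A tiny bounce buffer and a single-pass copy agree on the result:
print(chunked_str_copy("hello\0xx", 16))
print(chunked_str_copy("hello\0xx", 16, bufsz=256))
```

Both calls yield the same NUL-terminated string and length, which is the property the overlap trick buys.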
+1 -1
kernel/trace/trace_uprobe.c
···
 					: BPF_FD_TYPE_UPROBE;
 	*filename = tu->filename;
 	*probe_offset = tu->offset;
-	*probe_addr = 0;
+	*probe_addr = tu->ref_ctr_offset;
 	return 0;
 }
 #endif	/* CONFIG_PERF_EVENTS */
+7 -1
net/bpf/test_run.c
···
 	return *a;
 }
 
+int noinline bpf_fentry_test10(const void *a)
+{
+	return (long)a;
+}
+
 void noinline bpf_fentry_test_sinfo(struct skb_shared_info *sinfo)
 {
 }
···
 		    bpf_fentry_test6(16, (void *)17, 18, 19, (void *)20, 21) != 111 ||
 		    bpf_fentry_test7((struct bpf_fentry_test_t *)0) != 0 ||
 		    bpf_fentry_test8(&arg) != 0 ||
-		    bpf_fentry_test9(&retval) != 0)
+		    bpf_fentry_test9(&retval) != 0 ||
+		    bpf_fentry_test10((void *)0) != 0)
 			goto out;
 		break;
 	case BPF_MODIFY_RETURN:
-14
net/core/filter.c
···
 	if (func_proto)
 		return func_proto;
 
-	func_proto = cgroup_current_func_proto(func_id, prog);
-	if (func_proto)
-		return func_proto;
-
 	switch (func_id) {
 	case BPF_FUNC_get_socket_cookie:
 		return &bpf_get_socket_cookie_sock_proto;
···
 	const struct bpf_func_proto *func_proto;
 
 	func_proto = cgroup_common_func_proto(func_id, prog);
-	if (func_proto)
-		return func_proto;
-
-	func_proto = cgroup_current_func_proto(func_id, prog);
 	if (func_proto)
 		return func_proto;
 
···
 		return &bpf_msg_pop_data_proto;
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
-	case BPF_FUNC_get_current_uid_gid:
-		return &bpf_get_current_uid_gid_proto;
 	case BPF_FUNC_sk_storage_get:
 		return &bpf_sk_storage_get_proto;
 	case BPF_FUNC_sk_storage_delete:
 		return &bpf_sk_storage_delete_proto;
 	case BPF_FUNC_get_netns_cookie:
 		return &bpf_get_netns_cookie_sk_msg_proto;
-#ifdef CONFIG_CGROUP_NET_CLASSID
-	case BPF_FUNC_get_cgroup_classid:
-		return &bpf_get_cgroup_classid_curr_proto;
-#endif
 	default:
 		return bpf_sk_base_func_proto(func_id, prog);
 	}
+35 -21
net/core/skmsg.c
···
 					  u32 off, u32 len,
 					  struct sk_psock *psock,
 					  struct sock *sk,
-					  struct sk_msg *msg)
+					  struct sk_msg *msg,
+					  bool take_ref)
 {
 	int num_sge, copied;
 
+	/* skb_to_sgvec will fail when the total number of fragments in
+	 * frag_list and frags exceeds MAX_MSG_FRAGS. For example, the
+	 * caller may aggregate multiple skbs.
+	 */
 	num_sge = skb_to_sgvec(skb, msg->sg.data, off, len);
 	if (num_sge < 0) {
 		/* skb linearize may fail with ENOMEM, but lets simply try again
 		 * later if this happens. Under memory pressure we don't want to
 		 * drop the skb. We need to linearize the skb so that the mapping
 		 * in skb_to_sgvec can not error.
+		 * Note that skb_linearize requires the skb not to be shared.
 		 */
 		if (skb_linearize(skb))
 			return -EAGAIN;
···
 	msg->sg.start = 0;
 	msg->sg.size = copied;
 	msg->sg.end = num_sge;
-	msg->skb = skb;
+	msg->skb = take_ref ? skb_get(skb) : skb;
 
 	sk_psock_queue_msg(psock, msg);
 	sk_psock_data_ready(sk, psock);
···
 }
 
 static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb,
-				     u32 off, u32 len);
+				     u32 off, u32 len, bool take_ref);
 
 static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
 				u32 off, u32 len)
···
 	 * correctly.
 	 */
 	if (unlikely(skb->sk == sk))
-		return sk_psock_skb_ingress_self(psock, skb, off, len);
+		return sk_psock_skb_ingress_self(psock, skb, off, len, true);
 	msg = sk_psock_create_ingress_msg(sk, skb);
 	if (!msg)
 		return -EAGAIN;
···
 	 * into user buffers.
 	 */
 	skb_set_owner_r(skb, sk);
-	err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg);
+	err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg, true);
 	if (err < 0)
 		kfree(msg);
 	return err;
···
 * because the skb is already accounted for here.
 */
 static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb,
-				     u32 off, u32 len)
+				     u32 off, u32 len, bool take_ref)
 {
 	struct sk_msg *msg = alloc_sk_msg(GFP_ATOMIC);
 	struct sock *sk = psock->sk;
···
 	if (unlikely(!msg))
 		return -EAGAIN;
 	skb_set_owner_r(skb, sk);
-	err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg);
+	err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg, take_ref);
 	if (err < 0)
 		kfree(msg);
 	return err;
···
 static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
 			       u32 off, u32 len, bool ingress)
 {
-	int err = 0;
-
 	if (!ingress) {
 		if (!sock_writeable(psock->sk))
 			return -EAGAIN;
 		return skb_send_sock(psock->sk, skb, off, len);
 	}
-	skb_get(skb);
-	err = sk_psock_skb_ingress(psock, skb, off, len);
-	if (err < 0)
-		kfree_skb(skb);
-	return err;
+
+	return sk_psock_skb_ingress(psock, skb, off, len);
 }
 
 static void sk_psock_skb_state(struct sk_psock *psock,
···
 	bool ingress;
 	int ret;
 
+	/* Increment the psock refcnt to synchronize with close(fd) path in
+	 * sock_map_close(), ensuring we wait for backlog thread completion
+	 * before sk_socket freed. If refcnt increment fails, it indicates
+	 * sock_map_close() completed with sk_socket potentially already freed.
+	 */
+	if (!sk_psock_get(psock->sk))
+		return;
 	mutex_lock(&psock->work_mutex);
-	if (unlikely(state->len)) {
-		len = state->len;
-		off = state->off;
-	}
-
 	while ((skb = skb_peek(&psock->ingress_skb))) {
 		len = skb->len;
 		off = 0;
···
 			off = stm->offset;
 			len = stm->full_len;
 		}
+
+		/* Resume processing from previous partial state */
+		if (unlikely(state->len)) {
+			len = state->len;
+			off = state->off;
+		}
+
 		ingress = skb_bpf_ingress(skb);
 		skb_bpf_redirect_clear(skb);
 		do {
···
 			if (ret <= 0) {
 				if (ret == -EAGAIN) {
 					sk_psock_skb_state(psock, state, len, off);
-
+					/* Restore redir info we cleared before */
+					skb_bpf_set_redir(skb, psock->sk, ingress);
 					/* Delay slightly to prioritize any
 					 * other work that might be here.
 					 */
···
 			len -= ret;
 		} while (len);
 
+		/* The entire skb sent, clear state */
+		sk_psock_skb_state(psock, state, 0, 0);
 		skb = skb_dequeue(&psock->ingress_skb);
 		kfree_skb(skb);
 	}
 end:
 	mutex_unlock(&psock->work_mutex);
+	sk_psock_put(psock->sk, psock);
 }
 
 struct sk_psock *sk_psock_init(struct sock *sk, int node)
···
 		off = stm->offset;
 		len = stm->full_len;
 	}
-	err = sk_psock_skb_ingress_self(psock, skb, off, len);
+	err = sk_psock_skb_ingress_self(psock, skb, off, len, false);
 	}
 	if (err < 0) {
 		spin_lock_bh(&psock->ingress_lock);
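The backlog change above moves the "resume from saved partial state" step inside the loop, after the per-skb defaults are derived, and clears the state only once an entire skb has been sent. A toy Python model of that control flow (purely illustrative names; a send callback stands in for sk_psock_handle_skb(), and a dict stands in for the saved state):

```python
def backlog(queue, state, send):
    """Drain queued payloads, resuming a partial send recorded in state.
    send(payload, off, length) returns bytes sent, or 0 to mean 'retry
    later' (the -EAGAIN case), at which point progress is saved."""
    done = []
    while queue:
        payload = queue[0]
        off, length = 0, len(payload)        # per-skb defaults
        if state["len"]:                     # resume previous partial state
            length, off = state["len"], state["off"]
        while length:
            n = send(payload, off, length)
            if n <= 0:
                # save progress and bail; the next run resumes here
                state["len"], state["off"] = length, off
                return done
            off += n
            length -= n
        state["len"] = state["off"] = 0      # entire skb sent, clear state
        done.append(queue.pop(0))
    return done
```

Running it with a send callback that fails once mid-payload shows the second invocation picking up exactly where the first stopped, which is the invariant the kernel patch preserves.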
+13 -2
net/tls/tls_sw.c
···
 					  &msg_redir, send, flags);
 		lock_sock(sk);
 		if (err < 0) {
+			/* Regardless of whether the data represented by
+			 * msg_redir is sent successfully, we have already
+			 * uncharged it via sk_msg_return_zero(). The
+			 * msg->sg.size represents the remaining unprocessed
+			 * data, which needs to be uncharged here.
+			 */
+			sk_mem_uncharge(sk, msg->sg.size);
 			*copied -= sk_msg_free_nocharge(sk, &msg_redir);
 			msg->sg.size = 0;
 		}
···
 			num_async++;
 		else if (ret == -ENOMEM)
 			goto wait_for_memory;
-		else if (ctx->open_rec && ret == -ENOSPC)
+		else if (ctx->open_rec && ret == -ENOSPC) {
+			if (msg_pl->cork_bytes) {
+				ret = 0;
+				goto send_end;
+			}
 			goto rollback_iter;
-		else if (ret != -EAGAIN)
+		} else if (ret != -EAGAIN)
 			goto send_end;
 	}
 	continue;
+2
scripts/Makefile.btf
···
 # Switch to using --btf_features for v1.26 and later.
 pahole-flags-$(call test-ge, $(pahole-ver), 126)  = -j$(JOBS) --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs
 
+pahole-flags-$(call test-ge, $(pahole-ver), 130) += --btf_features=attributes
+
 ifneq ($(KBUILD_EXTMOD),)
 module-pahole-flags-$(call test-ge, $(pahole-ver), 128) += --btf_features=distilled_base
 endif
+98 -19
scripts/bpf_doc.py
··· 8 8 from __future__ import print_function 9 9 10 10 import argparse 11 + import json 11 12 import re 12 13 import sys, os 13 14 import subprocess ··· 38 37 @desc: textual description of the symbol 39 38 @ret: (optional) description of any associated return value 40 39 """ 41 - def __init__(self, proto='', desc='', ret='', attrs=[]): 40 + def __init__(self, proto='', desc='', ret=''): 42 41 self.proto = proto 43 42 self.desc = desc 44 43 self.ret = ret 45 - self.attrs = attrs 44 + 45 + def to_dict(self): 46 + return { 47 + 'proto': self.proto, 48 + 'desc': self.desc, 49 + 'ret': self.ret 50 + } 46 51 47 52 48 53 class Helper(APIElement): ··· 58 51 @desc: textual description of the helper function 59 52 @ret: description of the return value of the helper function 60 53 """ 61 - def __init__(self, *args, **kwargs): 62 - super().__init__(*args, **kwargs) 54 + def __init__(self, proto='', desc='', ret='', attrs=[]): 55 + super().__init__(proto, desc, ret) 56 + self.attrs = attrs 63 57 self.enum_val = None 64 58 65 59 def proto_break_down(self): ··· 88 80 }) 89 81 90 82 return res 83 + 84 + def to_dict(self): 85 + d = super().to_dict() 86 + d["attrs"] = self.attrs 87 + d.update(self.proto_break_down()) 88 + return d 91 89 92 90 93 91 ATTRS = { ··· 689 675 self.print_elem(command) 690 676 691 677 692 - class PrinterHelpers(Printer): 678 + class PrinterHelpersHeader(Printer): 693 679 """ 694 680 A printer for dumping collected information about helpers as C header to 695 681 be included from BPF program. ··· 910 896 print(') = (void *) %d;' % helper.enum_val) 911 897 print('') 912 898 899 + 900 + class PrinterHelpersJSON(Printer): 901 + """ 902 + A printer for dumping collected information about helpers as a JSON file. 
903 + @parser: A HeaderParser with Helper objects 904 + """ 905 + 906 + def __init__(self, parser): 907 + self.elements = parser.helpers 908 + self.elem_number_check( 909 + parser.desc_unique_helpers, 910 + parser.define_unique_helpers, 911 + "helper", 912 + "___BPF_FUNC_MAPPER", 913 + ) 914 + 915 + def print_all(self): 916 + helper_dicts = [helper.to_dict() for helper in self.elements] 917 + out_dict = {'helpers': helper_dicts} 918 + print(json.dumps(out_dict, indent=4)) 919 + 920 + 921 + class PrinterSyscallJSON(Printer): 922 + """ 923 + A printer for dumping collected syscall information as a JSON file. 924 + @parser: A HeaderParser with APIElement objects 925 + """ 926 + 927 + def __init__(self, parser): 928 + self.elements = parser.commands 929 + self.elem_number_check(parser.desc_syscalls, parser.enum_syscalls, 'syscall', 'bpf_cmd') 930 + 931 + def print_all(self): 932 + syscall_dicts = [syscall.to_dict() for syscall in self.elements] 933 + out_dict = {'syscall': syscall_dicts} 934 + print(json.dumps(out_dict, indent=4)) 935 + 913 936 ############################################################################### 914 937 915 938 # If script is launched from scripts/ from kernel tree and can access ··· 956 905 linuxRoot = os.path.dirname(os.path.dirname(script)) 957 906 bpfh = os.path.join(linuxRoot, 'include/uapi/linux/bpf.h') 958 907 908 + # target -> output format -> printer 959 909 printers = { 960 - 'helpers': PrinterHelpersRST, 961 - 'syscall': PrinterSyscallRST, 910 + 'helpers': { 911 + 'rst': PrinterHelpersRST, 912 + 'json': PrinterHelpersJSON, 913 + 'header': PrinterHelpersHeader, 914 + }, 915 + 'syscall': { 916 + 'rst': PrinterSyscallRST, 917 + 'json': PrinterSyscallJSON 918 + }, 962 919 } 963 920 964 921 argParser = argparse.ArgumentParser(description=""" ··· 976 917 """) 977 918 argParser.add_argument('--header', action='store_true', 978 919 help='generate C header file') 920 + argParser.add_argument('--json', action='store_true', 921 + 
help='generate a JSON') 979 922 if (os.path.isfile(bpfh)): 980 923 argParser.add_argument('--filename', help='path to include/uapi/linux/bpf.h', 981 924 default=bpfh) ··· 985 924 argParser.add_argument('--filename', help='path to include/uapi/linux/bpf.h') 986 925 argParser.add_argument('target', nargs='?', default='helpers', 987 926 choices=printers.keys(), help='eBPF API target') 988 - args = argParser.parse_args() 989 927 990 - # Parse file. 991 - headerParser = HeaderParser(args.filename) 992 - headerParser.run() 928 + def error_die(message: str): 929 + argParser.print_usage(file=sys.stderr) 930 + print('Error: {}'.format(message), file=sys.stderr) 931 + exit(1) 993 932 994 - # Print formatted output to standard output. 995 - if args.header: 996 - if args.target != 'helpers': 997 - raise NotImplementedError('Only helpers header generation is supported') 998 - printer = PrinterHelpers(headerParser) 999 - else: 1000 - printer = printers[args.target](headerParser) 1001 - printer.print_all() 933 + def parse_and_dump(): 934 + args = argParser.parse_args() 935 + 936 + # Parse file. 937 + headerParser = HeaderParser(args.filename) 938 + headerParser.run() 939 + 940 + if args.header and args.json: 941 + error_die('Use either --header or --json, not both') 942 + 943 + output_format = 'rst' 944 + if args.header: 945 + output_format = 'header' 946 + elif args.json: 947 + output_format = 'json' 948 + 949 + try: 950 + printer = printers[args.target][output_format](headerParser) 951 + # Print formatted output to standard output. 952 + printer.print_all() 953 + except KeyError: 954 + error_die('Unsupported target/format combination: "{}", "{}"' 955 + .format(args.target, output_format)) 956 + 957 + if __name__ == "__main__": 958 + parse_and_dump()
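The bpf_doc.py rework replaces ad-hoc `if args.header` branching with a two-level target-to-format dispatch table, falling back to an error for unsupported combinations. A self-contained sketch of that dispatch (printer classes are stubs here):

```python
class PrinterHelpersRST:    pass
class PrinterHelpersJSON:   pass
class PrinterHelpersHeader: pass
class PrinterSyscallRST:    pass
class PrinterSyscallJSON:   pass

# target -> output format -> printer, as in the reworked bpf_doc.py
printers = {
    'helpers': {
        'rst': PrinterHelpersRST,
        'json': PrinterHelpersJSON,
        'header': PrinterHelpersHeader,
    },
    'syscall': {
        'rst': PrinterSyscallRST,
        'json': PrinterSyscallJSON,
    },
}

def pick_printer(target, output_format):
    """Look up the printer class, dying cleanly on unsupported combos."""
    try:
        return printers[target][output_format]
    except KeyError:
        raise SystemExit('Unsupported target/format combination: "{}", "{}"'
                         .format(target, output_format))
```

The table makes the supported matrix explicit: for example, there is intentionally no `header` entry under `syscall`, so `pick_printer('syscall', 'header')` exits with an error instead of silently misbehaving.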
+8 -2
tools/bpf/bpftool/Documentation/bpftool-prog.rst
··· 31 31 | **bpftool** **prog dump xlated** *PROG* [{ **file** *FILE* | [**opcodes**] [**linum**] [**visual**] }] 32 32 | **bpftool** **prog dump jited** *PROG* [{ **file** *FILE* | [**opcodes**] [**linum**] }] 33 33 | **bpftool** **prog pin** *PROG* *FILE* 34 - | **bpftool** **prog** { **load** | **loadall** } *OBJ* *PATH* [**type** *TYPE*] [**map** { **idx** *IDX* | **name** *NAME* } *MAP*] [{ **offload_dev** | **xdpmeta_dev** } *NAME*] [**pinmaps** *MAP_DIR*] [**autoattach**] 34 + | **bpftool** **prog** { **load** | **loadall** } *OBJ* *PATH* [**type** *TYPE*] [**map** { **idx** *IDX* | **name** *NAME* } *MAP*] [{ **offload_dev** | **xdpmeta_dev** } *NAME*] [**pinmaps** *MAP_DIR*] [**autoattach**] [**kernel_btf** *BTF_FILE*] 35 35 | **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*] 36 36 | **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*] 37 37 | **bpftool** **prog tracelog** ··· 127 127 Note: *FILE* must be located in *bpffs* mount. It must not contain a dot 128 128 character ('.'), which is reserved for future extensions of *bpffs*. 129 129 130 - bpftool prog { load | loadall } *OBJ* *PATH* [type *TYPE*] [map { idx *IDX* | name *NAME* } *MAP*] [{ offload_dev | xdpmeta_dev } *NAME*] [pinmaps *MAP_DIR*] [autoattach] 130 + bpftool prog { load | loadall } *OBJ* *PATH* [type *TYPE*] [map { idx *IDX* | name *NAME* } *MAP*] [{ offload_dev | xdpmeta_dev } *NAME*] [pinmaps *MAP_DIR*] [autoattach] [kernel_btf *BTF_FILE*] 131 131 Load bpf program(s) from binary *OBJ* and pin as *PATH*. **bpftool prog 132 132 load** pins only the first program from the *OBJ* as *PATH*. **bpftool prog 133 133 loadall** pins all programs from the *OBJ* under *PATH* directory. **type** ··· 152 152 object file, in particular, it's not supported for all program types. If a 153 153 program does not support autoattach, bpftool falls back to regular pinning 154 154 for that program instead. 
155 + 156 + The **kernel_btf** option allows specifying an external BTF file to replace 157 + the system's own vmlinux BTF file for CO-RE relocations. Note that any 158 + other feature relying on BTF (such as fentry/fexit programs, struct_ops) 159 + requires the BTF file for the actual kernel running on the host, often 160 + exposed at /sys/kernel/btf/vmlinux. 155 161 156 162 Note: *PATH* must be located in *bpffs* mount. It must not contain a dot 157 163 character ('.'), which is reserved for future extensions of *bpffs*.
+2 -2
tools/bpf/bpftool/bash-completion/bpftool
··· 505 505 _bpftool_get_map_names 506 506 return 0 507 507 ;; 508 - pinned|pinmaps) 508 + pinned|pinmaps|kernel_btf) 509 509 _filedir 510 510 return 0 511 511 ;; 512 512 *) 513 513 COMPREPLY=( $( compgen -W "map" -- "$cur" ) ) 514 - _bpftool_once_attr 'type pinmaps autoattach' 514 + _bpftool_once_attr 'type pinmaps autoattach kernel_btf' 515 515 _bpftool_one_of_list 'offload_dev xdpmeta_dev' 516 516 return 0 517 517 ;;
+7 -7
tools/bpf/bpftool/cgroup.c
··· 221 221 for (i = 0; i < ARRAY_SIZE(cgroup_attach_types); i++) { 222 222 int count = count_attached_bpf_progs(cgroup_fd, cgroup_attach_types[i]); 223 223 224 - if (count < 0) 224 + if (count < 0 && errno != EINVAL) 225 225 return -1; 226 226 227 227 if (count > 0) { ··· 318 318 319 319 static int do_show(int argc, char **argv) 320 320 { 321 - enum bpf_attach_type type; 322 321 int has_attached_progs; 323 322 const char *path; 324 323 int cgroup_fd; 325 324 int ret = -1; 325 + unsigned int i; 326 326 327 327 query_flags = 0; 328 328 ··· 370 370 "AttachFlags", "Name"); 371 371 372 372 btf_vmlinux = libbpf_find_kernel_btf(); 373 - for (type = 0; type < __MAX_BPF_ATTACH_TYPE; type++) { 373 + for (i = 0; i < ARRAY_SIZE(cgroup_attach_types); i++) { 374 374 /* 375 375 * Not all attach types may be supported, so it's expected, 376 376 * that some requests will fail. 377 377 * If we were able to get the show for at least one 378 378 * attach type, let's return 0. 379 379 */ 380 - if (show_bpf_progs(cgroup_fd, type, 0) == 0) 380 + if (show_bpf_progs(cgroup_fd, cgroup_attach_types[i], 0) == 0) 381 381 ret = 0; 382 382 } 383 383 ··· 400 400 static int do_show_tree_fn(const char *fpath, const struct stat *sb, 401 401 int typeflag, struct FTW *ftw) 402 402 { 403 - enum bpf_attach_type type; 404 403 int has_attached_progs; 405 404 int cgroup_fd; 405 + unsigned int i; 406 406 407 407 if (typeflag != FTW_D) 408 408 return 0; ··· 434 434 } 435 435 436 436 btf_vmlinux = libbpf_find_kernel_btf(); 437 - for (type = 0; type < __MAX_BPF_ATTACH_TYPE; type++) 438 - show_bpf_progs(cgroup_fd, type, ftw->level); 437 + for (i = 0; i < ARRAY_SIZE(cgroup_attach_types); i++) 438 + show_bpf_progs(cgroup_fd, cgroup_attach_types[i], ftw->level); 439 439 440 440 if (errno == EINVAL) 441 441 /* Last attach type does not support query.
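The cgroup.c change iterates only the known `cgroup_attach_types[]` and tolerates EINVAL (an attach type the running kernel does not know) instead of walking every `__MAX_BPF_ATTACH_TYPE` value and aborting on the first failure. A small Python sketch of that skip-vs-fail policy (the query callback and type names are hypothetical):

```python
import errno

def has_attached_progs(query, attach_types):
    """Query each known attach type; EINVAL means the running kernel
    doesn't support that type, so skip it. Any other error is fatal."""
    for atype in attach_types:
        try:
            count = query(atype)
        except OSError as e:
            if e.errno == errno.EINVAL:
                continue                     # older kernel: unsupported type
            raise
        if count > 0:
            return True
    return False

counts = {"ingress": 0, "egress": 2}         # "sock_create" unsupported here

def query(atype):
    if atype not in counts:
        raise OSError(errno.EINVAL, "unsupported attach type")
    return counts[atype]
```

The probe keeps going past the unsupported type and still reports attached programs found on a later type.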
+3
tools/bpf/bpftool/link.c
··· 380 380 u64_to_ptr(info->perf_event.uprobe.file_name)); 381 381 jsonw_uint_field(wtr, "offset", info->perf_event.uprobe.offset); 382 382 jsonw_uint_field(wtr, "cookie", info->perf_event.uprobe.cookie); 383 + jsonw_uint_field(wtr, "ref_ctr_offset", info->perf_event.uprobe.ref_ctr_offset); 383 384 } 384 385 385 386 static void ··· 824 823 printf("%s+%#x ", buf, info->perf_event.uprobe.offset); 825 824 if (info->perf_event.uprobe.cookie) 826 825 printf("cookie %llu ", info->perf_event.uprobe.cookie); 826 + if (info->perf_event.uprobe.ref_ctr_offset) 827 + printf("ref_ctr_offset 0x%llx ", info->perf_event.uprobe.ref_ctr_offset); 827 828 } 828 829 829 830 static void show_perf_event_tracepoint_plain(struct bpf_link_info *info)
+11 -1
tools/bpf/bpftool/prog.c
··· 1681 1681 } else if (is_prefix(*argv, "autoattach")) { 1682 1682 auto_attach = true; 1683 1683 NEXT_ARG(); 1684 + } else if (is_prefix(*argv, "kernel_btf")) { 1685 + NEXT_ARG(); 1686 + 1687 + if (!REQ_ARGS(1)) 1688 + goto err_free_reuse_maps; 1689 + 1690 + open_opts.btf_custom_path = GET_ARG(); 1684 1691 } else { 1685 - p_err("expected no more arguments, 'type', 'map' or 'dev', got: '%s'?", 1692 + p_err("expected no more arguments, " 1693 + "'type', 'map', 'offload_dev', 'xdpmeta_dev', 'pinmaps', " 1694 + "'autoattach', or 'kernel_btf', got: '%s'?", 1686 1695 *argv); 1687 1696 goto err_free_reuse_maps; 1688 1697 } ··· 2483 2474 " [map { idx IDX | name NAME } MAP]\\\n" 2484 2475 " [pinmaps MAP_DIR]\n" 2485 2476 " [autoattach]\n" 2477 + " [kernel_btf BTF_FILE]\n" 2486 2478 " %1$s %2$s attach PROG ATTACH_TYPE [MAP]\n" 2487 2479 " %1$s %2$s detach PROG ATTACH_TYPE [MAP]\n" 2488 2480 " %1$s %2$s run PROG \\\n"
+12 -7
tools/include/uapi/linux/bpf.h
··· 1506 1506 __s32 map_token_fd; 1507 1507 }; 1508 1508 1509 - struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ 1509 + struct { /* anonymous struct used by BPF_MAP_*_ELEM and BPF_MAP_FREEZE commands */ 1510 1510 __u32 map_fd; 1511 1511 __aligned_u64 key; 1512 1512 union { ··· 1995 1995 * long bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags) 1996 1996 * Description 1997 1997 * Store *len* bytes from address *from* into the packet 1998 - * associated to *skb*, at *offset*. *flags* are a combination of 1999 - * **BPF_F_RECOMPUTE_CSUM** (automatically recompute the 2000 - * checksum for the packet after storing the bytes) and 2001 - * **BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\ 2002 - * **->swhash** and *skb*\ **->l4hash** to 0). 1998 + * associated to *skb*, at *offset*. The *flags* are a combination 1999 + * of the following values: 2000 + * 2001 + * **BPF_F_RECOMPUTE_CSUM** 2002 + * Automatically update *skb*\ **->csum** after storing the 2003 + * bytes. 2004 + * **BPF_F_INVALIDATE_HASH** 2005 + * Set *skb*\ **->hash**, *skb*\ **->swhash** and *skb*\ 2006 + * **->l4hash** to 0. 2003 2007 * 2004 2008 * A call to this helper is susceptible to change the underlying 2005 2009 * packet buffer. Therefore, at load time, all checks on pointers ··· 2055 2051 * untouched (unless **BPF_F_MARK_ENFORCE** is added as well), and 2056 2052 * for updates resulting in a null checksum the value is set to 2057 2053 * **CSUM_MANGLED_0** instead. Flag **BPF_F_PSEUDO_HDR** indicates 2058 - * the checksum is to be computed against a pseudo-header. 2054 + * that the modified header field is part of the pseudo-header. 
2059 2055 * 2060 2056 * This helper works in combination with **bpf_csum_diff**\ (), 2061 2057 * which does not update the checksum in-place, but offers more ··· 6727 6723 __u32 name_len; 6728 6724 __u32 offset; /* offset from file_name */ 6729 6725 __u64 cookie; 6726 + __u64 ref_ctr_offset; 6730 6727 } uprobe; /* BPF_PERF_EVENT_UPROBE, BPF_PERF_EVENT_URETPROBE */ 6731 6728 struct { 6732 6729 __aligned_u64 func_name; /* in/out */
+6
tools/lib/bpf/bpf_core_read.h
··· 388 388 #define ___arrow10(a, b, c, d, e, f, g, h, i, j) a->b->c->d->e->f->g->h->i->j 389 389 #define ___arrow(...) ___apply(___arrow, ___narg(__VA_ARGS__))(__VA_ARGS__) 390 390 391 + #if defined(__clang__) && (__clang_major__ >= 19) 392 + #define ___type(...) __typeof_unqual__(___arrow(__VA_ARGS__)) 393 + #elif defined(__GNUC__) && (__GNUC__ >= 14) 394 + #define ___type(...) __typeof_unqual__(___arrow(__VA_ARGS__)) 395 + #else 391 396 #define ___type(...) typeof(___arrow(__VA_ARGS__)) 397 + #endif 392 398 393 399 #define ___read(read_fn, dst, src_type, src, accessor) \ 394 400 read_fn((void *)(dst), sizeof(*(dst)), &((src_type)(src))->accessor)
+8
tools/lib/bpf/bpf_helpers.h
··· 15 15 #define __array(name, val) typeof(val) *name[] 16 16 #define __ulong(name, val) enum { ___bpf_concat(__unique_value, __COUNTER__) = val } name 17 17 18 + #ifndef likely 19 + #define likely(x) (__builtin_expect(!!(x), 1)) 20 + #endif 21 + 22 + #ifndef unlikely 23 + #define unlikely(x) (__builtin_expect(!!(x), 0)) 24 + #endif 25 + 18 26 /* 19 27 * Helper macro to place programs, maps, license in 20 28 * different sections in elf_bpf file. Section names
+174 -58
tools/lib/bpf/btf.c
··· 12 12 #include <sys/utsname.h> 13 13 #include <sys/param.h> 14 14 #include <sys/stat.h> 15 + #include <sys/mman.h> 15 16 #include <linux/kernel.h> 16 17 #include <linux/err.h> 17 18 #include <linux/btf.h> ··· 120 119 121 120 /* whether base_btf should be freed in btf_free for this instance */ 122 121 bool owns_base; 122 + 123 + /* whether raw_data is a (read-only) mmap */ 124 + bool raw_data_is_mmap; 123 125 124 126 /* BTF object FD, if loaded into kernel */ 125 127 int fd; ··· 955 951 return (void *)btf->hdr != btf->raw_data; 956 952 } 957 953 954 + static void btf_free_raw_data(struct btf *btf) 955 + { 956 + if (btf->raw_data_is_mmap) { 957 + munmap(btf->raw_data, btf->raw_size); 958 + btf->raw_data_is_mmap = false; 959 + } else { 960 + free(btf->raw_data); 961 + } 962 + btf->raw_data = NULL; 963 + } 964 + 958 965 void btf__free(struct btf *btf) 959 966 { 960 967 if (IS_ERR_OR_NULL(btf)) ··· 985 970 free(btf->types_data); 986 971 strset__free(btf->strs_set); 987 972 } 988 - free(btf->raw_data); 973 + btf_free_raw_data(btf); 989 974 free(btf->raw_data_swapped); 990 975 free(btf->type_offs); 991 976 if (btf->owns_base) ··· 1011 996 if (base_btf) { 1012 997 btf->base_btf = base_btf; 1013 998 btf->start_id = btf__type_cnt(base_btf); 1014 - btf->start_str_off = base_btf->hdr->str_len; 999 + btf->start_str_off = base_btf->hdr->str_len + base_btf->start_str_off; 1015 1000 btf->swapped_endian = base_btf->swapped_endian; 1016 1001 } 1017 1002 ··· 1045 1030 return libbpf_ptr(btf_new_empty(base_btf)); 1046 1031 } 1047 1032 1048 - static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf) 1033 + static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf, bool is_mmap) 1049 1034 { 1050 1035 struct btf *btf; 1051 1036 int err; ··· 1065 1050 btf->start_str_off = base_btf->hdr->str_len; 1066 1051 } 1067 1052 1068 - btf->raw_data = malloc(size); 1069 - if (!btf->raw_data) { 1070 - err = -ENOMEM; 1071 - goto done; 1053 + if (is_mmap) { 
1054 + btf->raw_data = (void *)data; 1055 + btf->raw_data_is_mmap = true; 1056 + } else { 1057 + btf->raw_data = malloc(size); 1058 + if (!btf->raw_data) { 1059 + err = -ENOMEM; 1060 + goto done; 1061 + } 1062 + memcpy(btf->raw_data, data, size); 1072 1063 } 1073 - memcpy(btf->raw_data, data, size); 1064 + 1074 1065 btf->raw_size = size; 1075 1066 1076 1067 btf->hdr = btf->raw_data; ··· 1104 1083 1105 1084 struct btf *btf__new(const void *data, __u32 size) 1106 1085 { 1107 - return libbpf_ptr(btf_new(data, size, NULL)); 1086 + return libbpf_ptr(btf_new(data, size, NULL, false)); 1108 1087 } 1109 1088 1110 1089 struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf) 1111 1090 { 1112 - return libbpf_ptr(btf_new(data, size, base_btf)); 1091 + return libbpf_ptr(btf_new(data, size, base_btf, false)); 1113 1092 } 1114 1093 1115 1094 struct btf_elf_secs { ··· 1169 1148 else 1170 1149 continue; 1171 1150 1151 + if (sh.sh_type != SHT_PROGBITS) { 1152 + pr_warn("unexpected section type (%d) of section(%d, %s) from %s\n", 1153 + sh.sh_type, idx, name, path); 1154 + goto err; 1155 + } 1156 + 1172 1157 data = elf_getdata(scn, 0); 1173 1158 if (!data) { 1174 1159 pr_warn("failed to get section(%d, %s) data from %s\n", ··· 1230 1203 1231 1204 if (secs.btf_base_data) { 1232 1205 dist_base_btf = btf_new(secs.btf_base_data->d_buf, secs.btf_base_data->d_size, 1233 - NULL); 1206 + NULL, false); 1234 1207 if (IS_ERR(dist_base_btf)) { 1235 1208 err = PTR_ERR(dist_base_btf); 1236 1209 dist_base_btf = NULL; ··· 1239 1212 } 1240 1213 1241 1214 btf = btf_new(secs.btf_data->d_buf, secs.btf_data->d_size, 1242 - dist_base_btf ?: base_btf); 1215 + dist_base_btf ?: base_btf, false); 1243 1216 if (IS_ERR(btf)) { 1244 1217 err = PTR_ERR(btf); 1245 1218 goto done; ··· 1356 1329 } 1357 1330 1358 1331 /* finally parse BTF data */ 1359 - btf = btf_new(data, sz, base_btf); 1332 + btf = btf_new(data, sz, base_btf, false); 1360 1333 1361 1334 err_out: 1362 1335 free(data); ··· 1373 
1346 struct btf *btf__parse_raw_split(const char *path, struct btf *base_btf) 1374 1347 { 1375 1348 return libbpf_ptr(btf_parse_raw(path, base_btf)); 1349 + } 1350 + 1351 + static struct btf *btf_parse_raw_mmap(const char *path, struct btf *base_btf) 1352 + { 1353 + struct stat st; 1354 + void *data; 1355 + struct btf *btf; 1356 + int fd, err; 1357 + 1358 + fd = open(path, O_RDONLY); 1359 + if (fd < 0) 1360 + return libbpf_err_ptr(-errno); 1361 + 1362 + if (fstat(fd, &st) < 0) { 1363 + err = -errno; 1364 + close(fd); 1365 + return libbpf_err_ptr(err); 1366 + } 1367 + 1368 + data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0); 1369 + err = -errno; 1370 + close(fd); 1371 + 1372 + if (data == MAP_FAILED) 1373 + return libbpf_err_ptr(err); 1374 + 1375 + btf = btf_new(data, st.st_size, base_btf, true); 1376 + if (IS_ERR(btf)) 1377 + munmap(data, st.st_size); 1378 + 1379 + return btf; 1376 1380 } 1377 1381 1378 1382 static struct btf *btf_parse(const char *path, struct btf *base_btf, struct btf_ext **btf_ext) ··· 1670 1612 goto exit_free; 1671 1613 } 1672 1614 1673 - btf = btf_new(ptr, btf_info.btf_size, base_btf); 1615 + btf = btf_new(ptr, btf_info.btf_size, base_btf, false); 1674 1616 1675 1617 exit_free: 1676 1618 free(ptr); ··· 1710 1652 1711 1653 static void btf_invalidate_raw_data(struct btf *btf) 1712 1654 { 1713 - if (btf->raw_data) { 1714 - free(btf->raw_data); 1715 - btf->raw_data = NULL; 1716 - } 1655 + if (btf->raw_data) 1656 + btf_free_raw_data(btf); 1717 1657 if (btf->raw_data_swapped) { 1718 1658 free(btf->raw_data_swapped); 1719 1659 btf->raw_data_swapped = NULL; ··· 4406 4350 return btf_kflag(t) ? 
BTF_KIND_UNION : BTF_KIND_STRUCT; 4407 4351 } 4408 4352 4409 - /* Check if given two types are identical ARRAY definitions */ 4410 - static bool btf_dedup_identical_arrays(struct btf_dedup *d, __u32 id1, __u32 id2) 4353 + static bool btf_dedup_identical_types(struct btf_dedup *d, __u32 id1, __u32 id2, int depth) 4411 4354 { 4412 4355 struct btf_type *t1, *t2; 4413 - 4414 - t1 = btf_type_by_id(d->btf, id1); 4415 - t2 = btf_type_by_id(d->btf, id2); 4416 - if (!btf_is_array(t1) || !btf_is_array(t2)) 4356 + int k1, k2; 4357 + recur: 4358 + if (depth <= 0) 4417 4359 return false; 4418 - 4419 - return btf_equal_array(t1, t2); 4420 - } 4421 - 4422 - /* Check if given two types are identical STRUCT/UNION definitions */ 4423 - static bool btf_dedup_identical_structs(struct btf_dedup *d, __u32 id1, __u32 id2) 4424 - { 4425 - const struct btf_member *m1, *m2; 4426 - struct btf_type *t1, *t2; 4427 - int n, i; 4428 4360 4429 4361 t1 = btf_type_by_id(d->btf, id1); 4430 4362 t2 = btf_type_by_id(d->btf, id2); 4431 4363 4432 - if (!btf_is_composite(t1) || btf_kind(t1) != btf_kind(t2)) 4364 + k1 = btf_kind(t1); 4365 + k2 = btf_kind(t2); 4366 + if (k1 != k2) 4433 4367 return false; 4434 4368 4435 - if (!btf_shallow_equal_struct(t1, t2)) 4436 - return false; 4437 - 4438 - m1 = btf_members(t1); 4439 - m2 = btf_members(t2); 4440 - for (i = 0, n = btf_vlen(t1); i < n; i++, m1++, m2++) { 4441 - if (m1->type != m2->type && 4442 - !btf_dedup_identical_arrays(d, m1->type, m2->type) && 4443 - !btf_dedup_identical_structs(d, m1->type, m2->type)) 4369 + switch (k1) { 4370 + case BTF_KIND_UNKN: /* VOID */ 4371 + return true; 4372 + case BTF_KIND_INT: 4373 + return btf_equal_int_tag(t1, t2); 4374 + case BTF_KIND_ENUM: 4375 + case BTF_KIND_ENUM64: 4376 + return btf_compat_enum(t1, t2); 4377 + case BTF_KIND_FWD: 4378 + case BTF_KIND_FLOAT: 4379 + return btf_equal_common(t1, t2); 4380 + case BTF_KIND_CONST: 4381 + case BTF_KIND_VOLATILE: 4382 + case BTF_KIND_RESTRICT: 4383 + case BTF_KIND_PTR: 4384 
+ case BTF_KIND_TYPEDEF: 4385 + case BTF_KIND_FUNC: 4386 + case BTF_KIND_TYPE_TAG: 4387 + if (t1->info != t2->info || t1->name_off != t2->name_off) 4444 4388 return false; 4389 + id1 = t1->type; 4390 + id2 = t2->type; 4391 + goto recur; 4392 + case BTF_KIND_ARRAY: { 4393 + struct btf_array *a1, *a2; 4394 + 4395 + if (!btf_compat_array(t1, t2)) 4396 + return false; 4397 + 4398 + a1 = btf_array(t1); 4399 + a2 = btf_array(t2); 4400 + 4401 + if (a1->index_type != a2->index_type && 4402 + !btf_dedup_identical_types(d, a1->index_type, a2->index_type, depth - 1)) 4403 + return false; 4404 + 4405 + if (a1->type != a2->type && 4406 + !btf_dedup_identical_types(d, a1->type, a2->type, depth - 1)) 4407 + return false; 4408 + 4409 + return true; 4445 4410 } 4446 - return true; 4411 + case BTF_KIND_STRUCT: 4412 + case BTF_KIND_UNION: { 4413 + const struct btf_member *m1, *m2; 4414 + int i, n; 4415 + 4416 + if (!btf_shallow_equal_struct(t1, t2)) 4417 + return false; 4418 + 4419 + m1 = btf_members(t1); 4420 + m2 = btf_members(t2); 4421 + for (i = 0, n = btf_vlen(t1); i < n; i++, m1++, m2++) { 4422 + if (m1->type == m2->type) 4423 + continue; 4424 + if (!btf_dedup_identical_types(d, m1->type, m2->type, depth - 1)) 4425 + return false; 4426 + } 4427 + return true; 4428 + } 4429 + case BTF_KIND_FUNC_PROTO: { 4430 + const struct btf_param *p1, *p2; 4431 + int i, n; 4432 + 4433 + if (!btf_compat_fnproto(t1, t2)) 4434 + return false; 4435 + 4436 + if (t1->type != t2->type && 4437 + !btf_dedup_identical_types(d, t1->type, t2->type, depth - 1)) 4438 + return false; 4439 + 4440 + p1 = btf_params(t1); 4441 + p2 = btf_params(t2); 4442 + for (i = 0, n = btf_vlen(t1); i < n; i++, p1++, p2++) { 4443 + if (p1->type == p2->type) 4444 + continue; 4445 + if (!btf_dedup_identical_types(d, p1->type, p2->type, depth - 1)) 4446 + return false; 4447 + } 4448 + return true; 4449 + } 4450 + default: 4451 + return false; 4452 + } 4447 4453 } 4454 + 4448 4455 4449 4456 /* 4450 4457 * Check equivalence of 
BTF type graph formed by candidate struct/union (we'll ··· 4627 4508 * different fields within the *same* struct. This breaks type 4628 4509 * equivalence check, which makes an assumption that candidate 4629 4510 * types sub-graph has a consistent and deduped-by-compiler 4630 - * types within a single CU. So work around that by explicitly 4631 - * allowing identical array types here. 4511 + * types within a single CU. And similar situation can happen 4512 + * with struct/union sometimes, and even with pointers. 4513 + * So accommodate cases like this by doing a structural 4514 + * comparison recursively, but avoiding being stuck in endless 4515 + * loops by limiting the depth up to which we check. 4632 4516 */ 4633 - if (btf_dedup_identical_arrays(d, hypot_type_id, cand_id)) 4634 - return 1; 4635 - /* It turns out that similar situation can happen with 4636 - * struct/union sometimes, sigh... Handle the case where 4637 - * structs/unions are exactly the same, down to the referenced 4638 - * type IDs. Anything more complicated (e.g., if referenced 4639 - * types are different, but equivalent) is *way more* 4640 - * complicated and requires a many-to-many equivalence mapping. 4641 - */ 4642 - if (btf_dedup_identical_structs(d, hypot_type_id, cand_id)) 4517 + if (btf_dedup_identical_types(d, hypot_type_id, cand_id, 16)) 4643 4518 return 1; 4644 4519 return 0; 4645 4520 } ··· 5381 5268 pr_warn("kernel BTF is missing at '%s', was CONFIG_DEBUG_INFO_BTF enabled?\n", 5382 5269 sysfs_btf_path); 5383 5270 } else { 5384 - btf = btf__parse(sysfs_btf_path, NULL); 5271 + btf = btf_parse_raw_mmap(sysfs_btf_path, NULL); 5272 + if (IS_ERR(btf)) 5273 + btf = btf__parse(sysfs_btf_path, NULL); 5274 + 5385 5275 if (!btf) { 5386 5276 err = -errno; 5387 5277 pr_warn("failed to read kernel BTF from '%s': %s\n",
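The new btf_dedup_identical_types() replaces the separate identical-array and identical-struct checks with one recursive structural comparison, bounded by a depth budget (16 at the call site) so self-referential type graphs cannot recurse forever. The shape of that bounded recursion, sketched over plain Python values instead of BTF types:

```python
import copy

def identical(t1, t2, depth):
    """Structural equality with a recursion budget, like
    btf_dedup_identical_types(): once the budget is exhausted we
    conservatively report 'not identical' rather than loop forever.
    Types here are plain Python values, not real BTF kinds."""
    if depth <= 0:
        return False                         # budget exhausted: give up safely
    if type(t1) is not type(t2):
        return False
    if isinstance(t1, dict):                 # composite: member name -> type
        return t1.keys() == t2.keys() and all(
            identical(t1[k], t2[k], depth - 1) for k in t1)
    if isinstance(t1, list):                 # array of element types
        return len(t1) == len(t2) and all(
            identical(a, b, depth - 1) for a, b in zip(t1, t2))
    return t1 == t2                          # scalar leaf

shallow = {"a": [1, 2], "b": {"c": "x"}}
deep = 1
for _ in range(20):                          # 20 levels of nesting
    deep = [deep]
deep2 = copy.deepcopy(deep)
```

With a generous budget the deep pair compares equal; with a tiny budget the comparison bails out and reports non-identical, which is the safe answer for dedup (a missed merge, never a wrong one).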
+48 -39
tools/lib/bpf/libbpf.c
··· 60 60 #define BPF_FS_MAGIC 0xcafe4a11 61 61 #endif 62 62 63 + #define MAX_EVENT_NAME_LEN 64 64 + 63 65 #define BPF_FS_DEFAULT_PATH "/sys/fs/bpf" 64 66 65 67 #define BPF_INSN_SZ (sizeof(struct bpf_insn)) ··· 286 284 old_errno = errno; 287 285 288 286 va_start(args, format); 289 - __libbpf_pr(level, format, args); 287 + print_fn(level, format, args); 290 288 va_end(args); 291 289 292 290 errno = old_errno; ··· 898 896 return -LIBBPF_ERRNO__FORMAT; 899 897 } 900 898 901 - if (sec_off + prog_sz > sec_sz) { 899 + if (sec_off + prog_sz > sec_sz || sec_off + prog_sz < sec_off) { 902 900 pr_warn("sec '%s': program at offset %zu crosses section boundary\n", 903 901 sec_name, sec_off); 904 902 return -LIBBPF_ERRNO__FORMAT; ··· 1725 1723 } 1726 1724 1727 1725 return ERR_PTR(-ENOENT); 1728 - } 1729 - 1730 - /* Some versions of Android don't provide memfd_create() in their libc 1731 - * implementation, so avoid complications and just go straight to Linux 1732 - * syscall. 1733 - */ 1734 - static int sys_memfd_create(const char *name, unsigned flags) 1735 - { 1736 - return syscall(__NR_memfd_create, name, flags); 1737 1726 } 1738 1727 1739 1728 #ifndef MFD_CLOEXEC ··· 9448 9455 return 0; 9449 9456 } 9450 9457 9458 + struct bpf_func_info *bpf_program__func_info(const struct bpf_program *prog) 9459 + { 9460 + if (prog->func_info_rec_size != sizeof(struct bpf_func_info)) 9461 + return libbpf_err_ptr(-EOPNOTSUPP); 9462 + return prog->func_info; 9463 + } 9464 + 9465 + __u32 bpf_program__func_info_cnt(const struct bpf_program *prog) 9466 + { 9467 + return prog->func_info_cnt; 9468 + } 9469 + 9470 + struct bpf_line_info *bpf_program__line_info(const struct bpf_program *prog) 9471 + { 9472 + if (prog->line_info_rec_size != sizeof(struct bpf_line_info)) 9473 + return libbpf_err_ptr(-EOPNOTSUPP); 9474 + return prog->line_info; 9475 + } 9476 + 9477 + __u32 bpf_program__line_info_cnt(const struct bpf_program *prog) 9478 + { 9479 + return prog->line_info_cnt; 9480 + } 9481 + 9451 9482 
#define SEC_DEF(sec_pfx, ptype, atype, flags, ...) { \ 9452 9483 .sec = (char *)sec_pfx, \ 9453 9484 .prog_type = BPF_PROG_TYPE_##ptype, \ ··· 11138 11121 : TRACEFS"/available_filter_functions_addrs"; 11139 11122 } 11140 11123 11141 - static void gen_kprobe_legacy_event_name(char *buf, size_t buf_sz, 11142 - const char *kfunc_name, size_t offset) 11124 + static void gen_probe_legacy_event_name(char *buf, size_t buf_sz, 11125 + const char *name, size_t offset) 11143 11126 { 11144 11127 static int index = 0; 11145 11128 int i; 11146 11129 11147 - snprintf(buf, buf_sz, "libbpf_%u_%s_0x%zx_%d", getpid(), kfunc_name, offset, 11148 - __sync_fetch_and_add(&index, 1)); 11130 + snprintf(buf, buf_sz, "libbpf_%u_%d_%s_0x%zx", getpid(), 11131 + __sync_fetch_and_add(&index, 1), name, offset); 11149 11132 11150 - /* sanitize binary_path in the probe name */ 11133 + /* sanitize name in the probe name */ 11151 11134 for (i = 0; buf[i]; i++) { 11152 11135 if (!isalnum(buf[i])) 11153 11136 buf[i] = '_'; ··· 11272 11255 11273 11256 return pfd >= 0 ? 
1 : 0; 11274 11257 } else { /* legacy mode */ 11275 - char probe_name[128]; 11258 + char probe_name[MAX_EVENT_NAME_LEN]; 11276 11259 11277 - gen_kprobe_legacy_event_name(probe_name, sizeof(probe_name), syscall_name, 0); 11260 + gen_probe_legacy_event_name(probe_name, sizeof(probe_name), syscall_name, 0); 11278 11261 if (add_kprobe_event_legacy(probe_name, false, syscall_name, 0) < 0) 11279 11262 return 0; 11280 11263 ··· 11330 11313 func_name, offset, 11331 11314 -1 /* pid */, 0 /* ref_ctr_off */); 11332 11315 } else { 11333 - char probe_name[256]; 11316 + char probe_name[MAX_EVENT_NAME_LEN]; 11334 11317 11335 - gen_kprobe_legacy_event_name(probe_name, sizeof(probe_name), 11336 - func_name, offset); 11318 + gen_probe_legacy_event_name(probe_name, sizeof(probe_name), 11319 + func_name, offset); 11337 11320 11338 11321 legacy_probe = strdup(probe_name); 11339 11322 if (!legacy_probe) ··· 11877 11860 return ret; 11878 11861 } 11879 11862 11880 - static void gen_uprobe_legacy_event_name(char *buf, size_t buf_sz, 11881 - const char *binary_path, uint64_t offset) 11882 - { 11883 - int i; 11884 - 11885 - snprintf(buf, buf_sz, "libbpf_%u_%s_0x%zx", getpid(), binary_path, (size_t)offset); 11886 - 11887 - /* sanitize binary_path in the probe name */ 11888 - for (i = 0; buf[i]; i++) { 11889 - if (!isalnum(buf[i])) 11890 - buf[i] = '_'; 11891 - } 11892 - } 11893 - 11894 11863 static inline int add_uprobe_event_legacy(const char *probe_name, bool retprobe, 11895 11864 const char *binary_path, size_t offset) 11896 11865 { ··· 12300 12297 pfd = perf_event_open_probe(true /* uprobe */, retprobe, binary_path, 12301 12298 func_offset, pid, ref_ctr_off); 12302 12299 } else { 12303 - char probe_name[PATH_MAX + 64]; 12300 + char probe_name[MAX_EVENT_NAME_LEN]; 12304 12301 12305 12302 if (ref_ctr_off) 12306 12303 return libbpf_err_ptr(-EINVAL); 12307 12304 12308 - gen_uprobe_legacy_event_name(probe_name, sizeof(probe_name), 12309 - binary_path, func_offset); 12305 + 
gen_probe_legacy_event_name(probe_name, sizeof(probe_name), 12306 + strrchr(binary_path, '/') ? : binary_path, 12307 + func_offset); 12310 12308 12311 12309 legacy_probe = strdup(probe_name); 12312 12310 if (!legacy_probe) ··· 13375 13371 attr.config = PERF_COUNT_SW_BPF_OUTPUT; 13376 13372 attr.type = PERF_TYPE_SOFTWARE; 13377 13373 attr.sample_type = PERF_SAMPLE_RAW; 13378 - attr.sample_period = sample_period; 13379 13374 attr.wakeup_events = sample_period; 13380 13375 13381 13376 p.attr = &attr; ··· 14102 14099 } 14103 14100 14104 14101 link = map_skel->link; 14102 + if (!link) { 14103 + pr_warn("map '%s': BPF map skeleton link is uninitialized\n", 14104 + bpf_map__name(map)); 14105 + continue; 14106 + } 14107 + 14105 14108 if (*link) 14106 14109 continue; 14107 14110
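The unified gen_probe_legacy_event_name() above puts the unique index *before* the potentially long function or binary name, so truncating to the fixed MAX_EVENT_NAME_LEN buffer can no longer produce colliding event names, and it rewrites every non-alphanumeric byte to '_'. A Python rendering of the same scheme (the format string mirrors the C code; treat it as illustrative, not libbpf API):

```python
import itertools
import os

MAX_EVENT_NAME_LEN = 64                      # tracefs event name budget
_index = itertools.count()

def gen_probe_legacy_event_name(name, offset):
    """libbpf_<pid>_<idx>_<name>_0x<off>, sanitized and truncated
    snprintf-style to MAX_EVENT_NAME_LEN - 1 characters."""
    buf = "libbpf_%u_%d_%s_0x%x" % (os.getpid(), next(_index), name, offset)
    sanitized = "".join(c if c.isalnum() else "_" for c in buf)
    return sanitized[:MAX_EVENT_NAME_LEN - 1]

a = gen_probe_legacy_event_name("libc.so.6", 0x9a0)  # dots become underscores
b = gen_probe_legacy_event_name("x" * 200, 0)        # truncated...
c = gen_probe_legacy_event_name("x" * 200, 0)        # ...but still unique
```

Had the index come after the name (as the old uprobe variant did), `b` and `c` would truncate to identical strings.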
tools/lib/bpf/libbpf.h | +6
···
 LIBBPF_API const char *bpf_program__log_buf(const struct bpf_program *prog, size_t *log_size);
 LIBBPF_API int bpf_program__set_log_buf(struct bpf_program *prog, char *log_buf, size_t log_size);
 
+LIBBPF_API struct bpf_func_info *bpf_program__func_info(const struct bpf_program *prog);
+LIBBPF_API __u32 bpf_program__func_info_cnt(const struct bpf_program *prog);
+
+LIBBPF_API struct bpf_line_info *bpf_program__line_info(const struct bpf_program *prog);
+LIBBPF_API __u32 bpf_program__line_info_cnt(const struct bpf_program *prog);
+
 /**
  * @brief **bpf_program__set_attach_target()** sets BTF-based attach target
  * for supported BPF program types:
tools/lib/bpf/libbpf.map | +4
···
 		bpf_linker__add_fd;
 		bpf_linker__new_fd;
 		bpf_object__prepare;
+		bpf_program__func_info;
+		bpf_program__func_info_cnt;
+		bpf_program__line_info;
+		bpf_program__line_info_cnt;
 		btf__add_decl_attr;
 		btf__add_type_attr;
 } LIBBPF_1.5.0;
tools/lib/bpf/libbpf_internal.h | +9
···
 	return syscall(__NR_dup3, oldfd, newfd, flags);
 }
 
+/* Some versions of Android don't provide memfd_create() in their libc
+ * implementation, so avoid complications and just go straight to Linux
+ * syscall.
+ */
+static inline int sys_memfd_create(const char *name, unsigned flags)
+{
+	return syscall(__NR_memfd_create, name, flags);
+}
+
 /* Point *fixed_fd* to the same file that *tmp_fd* points to.
  * Regardless of success, *tmp_fd* is closed.
  * Whatever *fixed_fd* pointed to is closed silently.
tools/lib/bpf/linker.c | +3 -3
···
 
 	snprintf(filename, sizeof(filename), "mem:%p+%zu", buf, buf_sz);
 
-	fd = memfd_create(filename, 0);
+	fd = sys_memfd_create(filename, 0);
 	if (fd < 0) {
 		ret = -errno;
 		pr_warn("failed to create memfd '%s': %s\n", filename, errstr(ret));
···
 	} else {
 		if (!secs_match(dst_sec, src_sec)) {
 			pr_warn("ELF sections %s are incompatible\n", src_sec->sec_name);
-			return -1;
+			return -EINVAL;
 		}
 
 		/* "license" and "version" sections are deduped */
···
 		}
 	} else if (!secs_match(dst_sec, src_sec)) {
 		pr_warn("sections %s are not compatible\n", src_sec->sec_name);
-		return -1;
+		return -EINVAL;
 	}
 
 	/* shdr->sh_link points to SYMTAB */
tools/lib/bpf/nlattr.c | +7 -8
···
 	minlen = nla_attr_minlen[pt->type];
 
 	if (libbpf_nla_len(nla) < minlen)
-		return -1;
+		return -EINVAL;
 
 	if (pt->maxlen && libbpf_nla_len(nla) > pt->maxlen)
-		return -1;
+		return -EINVAL;
 
 	if (pt->type == LIBBPF_NLA_STRING) {
 		char *data = libbpf_nla_data(nla);
 
 		if (data[libbpf_nla_len(nla) - 1] != '\0')
-			return -1;
+			return -EINVAL;
 	}
 
 	return 0;
···
 		if (policy) {
 			err = validate_nla(nla, maxtype, policy);
 			if (err < 0)
-				goto errout;
+				return err;
 		}
 
-		if (tb[type])
+		if (tb[type]) {
 			pr_warn("Attribute of type %#x found multiple times in message, "
 				"previous attribute is being ignored.\n", type);
+		}
 
 		tb[type] = nla;
 	}
 
-	err = 0;
-errout:
-	return err;
+	return 0;
 }
 
 /**
tools/testing/selftests/bpf/DENYLIST | +1
···
 # TEMPORARY
 # Alphabetical order
+dynptr/test_probe_read_user_str_dynptr  # disabled until https://patchwork.kernel.org/project/linux-mm/patch/20250422131449.57177-1-mykyta.yatsenko5@gmail.com/ makes it into the bpf-next
 get_stack_raw_tp # spams with kernel warnings until next bpf -> bpf-next merge
 stacktrace_build_id
 stacktrace_build_id_nmi
tools/testing/selftests/bpf/DENYLIST.aarch64 | -2
···
-fentry_test/fentry_many_args	# fentry_many_args:FAIL:fentry_many_args_attach unexpected error: -524
-fexit_test/fexit_many_args	# fexit_many_args:FAIL:fexit_many_args_attach unexpected error: -524
 tracing_struct/struct_many_args	# struct_many_args:FAIL:tracing_struct_many_args__attach unexpected error: -524
tools/testing/selftests/bpf/Makefile | +11 -5
···
 LIBELF_CFLAGS	:= $(shell $(PKG_CONFIG) libelf --cflags 2>/dev/null)
 LIBELF_LIBS	:= $(shell $(PKG_CONFIG) libelf --libs 2>/dev/null || echo -lelf)
 
+SKIP_DOCS ?=
+SKIP_LLVM ?=
+
 ifeq ($(srctree),)
 srctree := $(patsubst %/,%,$(dir $(CURDIR)))
 srctree := $(patsubst %/,%,$(dir $(srctree)))
···
 endif
 endif
 
+ifneq ($(SKIP_LLVM),1)
 ifeq ($(feature-llvm),1)
   LLVM_CFLAGS  += -DHAVE_LLVM_SUPPORT
   LLVM_CONFIG_LIB_COMPONENTS := mcdisassembler all-targets
···
   # Prefer linking statically if it's available, otherwise fallback to shared
   ifeq ($(shell $(LLVM_CONFIG) --link-static --libs >/dev/null 2>&1 && echo static),static)
     LLVM_LDLIBS += $(shell $(LLVM_CONFIG) --link-static --libs $(LLVM_CONFIG_LIB_COMPONENTS))
-    LLVM_LDLIBS += $(shell $(LLVM_CONFIG) --link-static --system-libs $(LLVM_CONFIG_LIB_COMPONENTS))
+    LLVM_LDLIBS += $(filter-out -lxml2,$(shell $(LLVM_CONFIG) --link-static --system-libs $(LLVM_CONFIG_LIB_COMPONENTS)))
     LLVM_LDLIBS += -lstdc++
   else
     LLVM_LDLIBS += $(shell $(LLVM_CONFIG) --link-shared --libs $(LLVM_CONFIG_LIB_COMPONENTS))
   endif
   LLVM_LDFLAGS += $(shell $(LLVM_CONFIG) --ldflags)
+endif
 endif
 
 SCRATCH_DIR := $(OUTPUT)/tools
···
 		    prefix= DESTDIR=$(SCRATCH_DIR)/ install-bin
 endif
 
+ifneq ($(SKIP_DOCS),1)
 all: docs
+endif
 
 docs:
 	$(Q)RST2MAN_OPTS="--exit-status=1" $(MAKE) $(submake_extras)	\
···
 	$(Q)rsync -aq $$^ $(TRUNNER_OUTPUT)/
 endif
 
-$(OUTPUT)/$(TRUNNER_BINARY): LDLIBS += $$(LLVM_LDLIBS)
-$(OUTPUT)/$(TRUNNER_BINARY): LDFLAGS += $$(LLVM_LDFLAGS)
-
 # some X.test.o files have runtime dependencies on Y.bpf.o files
 $(OUTPUT)/$(TRUNNER_BINARY): | $(TRUNNER_BPF_OBJS)
 
···
 		      $(OUTPUT)/veristat \
 		      | $(TRUNNER_BINARY)-extras
 	$$(call msg,BINARY,,$$@)
-	$(Q)$$(CC) $$(CFLAGS) $$(filter %.a %.o,$$^) $$(LDLIBS) $$(LDFLAGS) -o $$@
+	$(Q)$$(CC) $$(CFLAGS) $$(filter %.a %.o,$$^) $$(LDLIBS) $$(LLVM_LDLIBS) $$(LDFLAGS) $$(LLVM_LDFLAGS) -o $$@
 	$(Q)$(RESOLVE_BTFIDS) --btf $(TRUNNER_OUTPUT)/btf_data.bpf.o $$@
 	$(Q)ln -sf $(if $2,..,.)/tools/build/bpftool/$(USE_BOOTSTRAP)bpftool \
 		   $(OUTPUT)/$(if $2,$2/)bpftool
···
 $(OUTPUT)/bench_bpf_hashmap_lookup.o: $(OUTPUT)/bpf_hashmap_lookup.skel.h
 $(OUTPUT)/bench_htab_mem.o: $(OUTPUT)/htab_mem_bench.skel.h
 $(OUTPUT)/bench_bpf_crypto.o: $(OUTPUT)/crypto_bench.skel.h
+$(OUTPUT)/bench_sockmap.o: $(OUTPUT)/bench_sockmap_prog.skel.h
 $(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ)
 $(OUTPUT)/bench: LDLIBS += -lm
 $(OUTPUT)/bench: $(OUTPUT)/bench.o \
···
 		 $(OUTPUT)/bench_local_storage_create.o \
 		 $(OUTPUT)/bench_htab_mem.o \
 		 $(OUTPUT)/bench_bpf_crypto.o \
+		 $(OUTPUT)/bench_sockmap.o \
 		 #
 	$(call msg,BINARY,,$@)
 	$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
tools/testing/selftests/bpf/bench.c | +4
···
 extern struct argp bench_htab_mem_argp;
 extern struct argp bench_trigger_batch_argp;
 extern struct argp bench_crypto_argp;
+extern struct argp bench_sockmap_argp;
 
 static const struct argp_child bench_parsers[] = {
 	{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
···
 	{ &bench_htab_mem_argp, 0, "hash map memory benchmark", 0 },
 	{ &bench_trigger_batch_argp, 0, "BPF triggering benchmark", 0 },
 	{ &bench_crypto_argp, 0, "bpf crypto benchmark", 0 },
+	{ &bench_sockmap_argp, 0, "bpf sockmap benchmark", 0 },
 	{},
 };
···
 extern const struct bench bench_htab_mem;
 extern const struct bench bench_crypto_encrypt;
 extern const struct bench bench_crypto_decrypt;
+extern const struct bench bench_sockmap;
 
 static const struct bench *benchs[] = {
 	&bench_count_global,
···
 	&bench_htab_mem,
 	&bench_crypto_encrypt,
 	&bench_crypto_decrypt,
+	&bench_sockmap,
 };
 
 static void find_benchmark(void)
tools/testing/selftests/bpf/benchs/bench_htab_mem.c | +1 -2
···
 	}
 
 	got = read(fd, buf, sizeof(buf) - 1);
+	close(fd);
 	if (got <= 0) {
 		*value = 0;
 		return;
···
 	buf[got] = 0;
 
 	*value = strtoull(buf, NULL, 0);
-
-	close(fd);
 }
 
 static void htab_mem_measure(struct bench_res *res)
tools/testing/selftests/bpf/benchs/bench_sockmap.c | +598
···
+// SPDX-License-Identifier: GPL-2.0
+
+#include <error.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <sys/sendfile.h>
+#include <arpa/inet.h>
+#include <fcntl.h>
+#include <argp.h>
+#include "bench.h"
+#include "bench_sockmap_prog.skel.h"
+
+#define FILE_SIZE (128 * 1024)
+#define DATA_REPEAT_SIZE 10
+
+static const char snd_data[DATA_REPEAT_SIZE] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
+
+/* c1 <-> [p1, p2] <-> c2
+ * RX bench(BPF_SK_SKB_STREAM_VERDICT):
+ *	ARG_FW_RX_PASS:
+ *		send(p2) -> recv(c2) -> bpf skb passthrough -> recv(c2)
+ *	ARG_FW_RX_VERDICT_EGRESS:
+ *		send(c1) -> verdict skb to tx queuec of p2 -> recv(c2)
+ *	ARG_FW_RX_VERDICT_INGRESS:
+ *		send(c1) -> verdict skb to rx queuec of c2 -> recv(c2)
+ *
+ * TX bench(BPF_SK_MSG_VERDIC):
+ *	ARG_FW_TX_PASS:
+ *		send(p2) -> bpf msg passthrough -> send(p2) -> recv(c2)
+ *	ARG_FW_TX_VERDICT_INGRESS:
+ *		send(p2) -> verdict msg to rx queue of c2 -> recv(c2)
+ *	ARG_FW_TX_VERDICT_EGRESS:
+ *		send(p1) -> verdict msg to tx queue of p2 -> recv(c2)
+ */
+enum SOCKMAP_ARG_FLAG {
+	ARG_FW_RX_NORMAL = 11000,
+	ARG_FW_RX_PASS,
+	ARG_FW_RX_VERDICT_EGRESS,
+	ARG_FW_RX_VERDICT_INGRESS,
+	ARG_FW_TX_NORMAL,
+	ARG_FW_TX_PASS,
+	ARG_FW_TX_VERDICT_INGRESS,
+	ARG_FW_TX_VERDICT_EGRESS,
+	ARG_CTL_RX_STRP,
+	ARG_CONSUMER_DELAY_TIME,
+	ARG_PRODUCER_DURATION,
+};
+
+#define TXMODE_NORMAL() \
+	((ctx.mode) == ARG_FW_TX_NORMAL)
+
+#define TXMODE_BPF_INGRESS() \
+	((ctx.mode) == ARG_FW_TX_VERDICT_INGRESS)
+
+#define TXMODE_BPF_EGRESS() \
+	((ctx.mode) == ARG_FW_TX_VERDICT_EGRESS)
+
+#define TXMODE_BPF_PASS() \
+	((ctx.mode) == ARG_FW_TX_PASS)
+
+#define TXMODE_BPF() (			\
+	TXMODE_BPF_PASS() ||		\
+	TXMODE_BPF_INGRESS() ||		\
+	TXMODE_BPF_EGRESS())
+
+#define TXMODE() (			\
+	TXMODE_NORMAL() ||		\
+	TXMODE_BPF())
+
+#define RXMODE_NORMAL() \
+	((ctx.mode) == ARG_FW_RX_NORMAL)
+
+#define RXMODE_BPF_PASS() \
+	((ctx.mode) == ARG_FW_RX_PASS)
+
+#define RXMODE_BPF_VERDICT_EGRESS() \
+	((ctx.mode) == ARG_FW_RX_VERDICT_EGRESS)
+
+#define RXMODE_BPF_VERDICT_INGRESS() \
+	((ctx.mode) == ARG_FW_RX_VERDICT_INGRESS)
+
+#define RXMODE_BPF_VERDICT() (			\
+	RXMODE_BPF_VERDICT_INGRESS() ||		\
+	RXMODE_BPF_VERDICT_EGRESS())
+
+#define RXMODE_BPF() (			\
+	RXMODE_BPF_PASS() ||		\
+	RXMODE_BPF_VERDICT())
+
+#define RXMODE() (			\
+	RXMODE_NORMAL() ||		\
+	RXMODE_BPF())
+
+static struct socmap_ctx {
+	struct bench_sockmap_prog *skel;
+	enum SOCKMAP_ARG_FLAG mode;
+	#define c1	fds[0]
+	#define p1	fds[1]
+	#define c2	fds[2]
+	#define p2	fds[3]
+	#define sfd	fds[4]
+	int		fds[5];
+	long		send_calls;
+	long		read_calls;
+	long		prod_send;
+	long		user_read;
+	int		file_size;
+	int		delay_consumer;
+	int		prod_run_time;
+	int		strp_size;
+} ctx = {
+	.prod_send = 0,
+	.user_read = 0,
+	.file_size = FILE_SIZE,
+	.mode = ARG_FW_RX_VERDICT_EGRESS,
+	.fds = {0},
+	.delay_consumer = 0,
+	.prod_run_time = 0,
+	.strp_size = 0,
+};
+
+static void bench_sockmap_prog_destroy(void)
+{
+	int i;
+
+	for (i = 0; i < sizeof(ctx.fds); i++) {
+		if (ctx.fds[0] > 0)
+			close(ctx.fds[i]);
+	}
+
+	bench_sockmap_prog__destroy(ctx.skel);
+}
+
+static void init_addr(struct sockaddr_storage *ss,
+		      socklen_t *len)
+{
+	struct sockaddr_in *addr4 = memset(ss, 0, sizeof(*ss));
+
+	addr4->sin_family = AF_INET;
+	addr4->sin_port = 0;
+	addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+	*len = sizeof(*addr4);
+}
+
+static bool set_non_block(int fd, bool blocking)
+{
+	int flags = fcntl(fd, F_GETFL, 0);
+
+	if (flags == -1)
+		return false;
+	flags = blocking ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
+	return (fcntl(fd, F_SETFL, flags) == 0);
+}
+
+static int create_pair(int *c, int *p, int type)
+{
+	struct sockaddr_storage addr;
+	int err, cfd, pfd;
+	socklen_t addr_len = sizeof(struct sockaddr_storage);
+
+	err = getsockname(ctx.sfd, (struct sockaddr *)&addr, &addr_len);
+	if (err) {
+		fprintf(stderr, "getsockname error %d\n", errno);
+		return err;
+	}
+	cfd = socket(AF_INET, type, 0);
+	if (cfd < 0) {
+		fprintf(stderr, "socket error %d\n", errno);
+		return err;
+	}
+
+	err = connect(cfd, (struct sockaddr *)&addr, addr_len);
+	if (err && errno != EINPROGRESS) {
+		fprintf(stderr, "connect error %d\n", errno);
+		return err;
+	}
+
+	pfd = accept(ctx.sfd, NULL, NULL);
+	if (pfd < 0) {
+		fprintf(stderr, "accept error %d\n", errno);
+		return err;
+	}
+	*c = cfd;
+	*p = pfd;
+	return 0;
+}
+
+static int create_sockets(void)
+{
+	struct sockaddr_storage addr;
+	int err, one = 1;
+	socklen_t addr_len;
+
+	init_addr(&addr, &addr_len);
+	ctx.sfd = socket(AF_INET, SOCK_STREAM, 0);
+	if (ctx.sfd < 0) {
+		fprintf(stderr, "socket error:%d\n", errno);
+		return ctx.sfd;
+	}
+	err = setsockopt(ctx.sfd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
+	if (err) {
+		fprintf(stderr, "setsockopt error:%d\n", errno);
+		return err;
+	}
+
+	err = bind(ctx.sfd, (struct sockaddr *)&addr, addr_len);
+	if (err) {
+		fprintf(stderr, "bind error:%d\n", errno);
+		return err;
+	}
+
+	err = listen(ctx.sfd, SOMAXCONN);
+	if (err) {
+		fprintf(stderr, "listen error:%d\n", errno);
+		return err;
+	}
+
+	err = create_pair(&ctx.c1, &ctx.p1, SOCK_STREAM);
+	if (err) {
+		fprintf(stderr, "create_pair 1 error\n");
+		return err;
+	}
+
+	err = create_pair(&ctx.c2, &ctx.p2, SOCK_STREAM);
+	if (err) {
+		fprintf(stderr, "create_pair 2 error\n");
+		return err;
+	}
+	printf("create socket fd c1:%d p1:%d c2:%d p2:%d\n",
+	       ctx.c1, ctx.p1, ctx.c2, ctx.p2);
+	return 0;
+}
+
+static void validate(void)
+{
+	if (env.consumer_cnt != 2 || env.producer_cnt != 1 ||
+	    !env.affinity)
+		goto err;
+	return;
+err:
+	fprintf(stderr, "argument '-c 2 -p 1 -a' is necessary");
+	exit(1);
+}
+
+static int setup_rx_sockmap(void)
+{
+	int verdict, pass, parser, map;
+	int zero = 0, one = 1;
+	int err;
+
+	parser = bpf_program__fd(ctx.skel->progs.prog_skb_parser);
+	verdict = bpf_program__fd(ctx.skel->progs.prog_skb_verdict);
+	pass = bpf_program__fd(ctx.skel->progs.prog_skb_pass);
+	map = bpf_map__fd(ctx.skel->maps.sock_map_rx);
+
+	if (ctx.strp_size != 0) {
+		ctx.skel->bss->pkt_size = ctx.strp_size;
+		err = bpf_prog_attach(parser, map, BPF_SK_SKB_STREAM_PARSER, 0);
+		if (err)
+			return err;
+	}
+
+	if (RXMODE_BPF_VERDICT())
+		err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	else if (RXMODE_BPF_PASS())
+		err = bpf_prog_attach(pass, map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (err)
+		return err;
+
+	if (RXMODE_BPF_PASS())
+		return bpf_map_update_elem(map, &zero, &ctx.c2, BPF_NOEXIST);
+
+	err = bpf_map_update_elem(map, &zero, &ctx.p1, BPF_NOEXIST);
+	if (err < 0)
+		return err;
+
+	if (RXMODE_BPF_VERDICT_INGRESS()) {
+		ctx.skel->bss->verdict_dir = BPF_F_INGRESS;
+		err = bpf_map_update_elem(map, &one, &ctx.c2, BPF_NOEXIST);
+	} else {
+		err = bpf_map_update_elem(map, &one, &ctx.p2, BPF_NOEXIST);
+	}
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int setup_tx_sockmap(void)
+{
+	int zero = 0, one = 1;
+	int prog, map;
+	int err;
+
+	map = bpf_map__fd(ctx.skel->maps.sock_map_tx);
+	prog = TXMODE_BPF_PASS() ?
+	       bpf_program__fd(ctx.skel->progs.prog_skmsg_pass) :
+	       bpf_program__fd(ctx.skel->progs.prog_skmsg_verdict);
+
+	err = bpf_prog_attach(prog, map, BPF_SK_MSG_VERDICT, 0);
+	if (err)
+		return err;
+
+	if (TXMODE_BPF_EGRESS()) {
+		err = bpf_map_update_elem(map, &zero, &ctx.p1, BPF_NOEXIST);
+		err |= bpf_map_update_elem(map, &one, &ctx.p2, BPF_NOEXIST);
+	} else {
+		ctx.skel->bss->verdict_dir = BPF_F_INGRESS;
+		err = bpf_map_update_elem(map, &zero, &ctx.p2, BPF_NOEXIST);
+		err |= bpf_map_update_elem(map, &one, &ctx.c2, BPF_NOEXIST);
+	}
+
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static void setup(void)
+{
+	int err;
+
+	ctx.skel = bench_sockmap_prog__open_and_load();
+	if (!ctx.skel) {
+		fprintf(stderr, "error loading skel\n");
+		exit(1);
+	}
+
+	if (create_sockets()) {
+		fprintf(stderr, "create_net_mode error\n");
+		goto err;
+	}
+
+	if (RXMODE_BPF()) {
+		err = setup_rx_sockmap();
+		if (err) {
+			fprintf(stderr, "setup_rx_sockmap error:%d\n", err);
+			goto err;
+		}
+	} else if (TXMODE_BPF()) {
+		err = setup_tx_sockmap();
+		if (err) {
+			fprintf(stderr, "setup_tx_sockmap error:%d\n", err);
+			goto err;
+		}
+	} else {
+		fprintf(stderr, "unknown sockmap bench mode: %d\n", ctx.mode);
+		goto err;
+	}
+
+	return;
+
+err:
+	bench_sockmap_prog_destroy();
+	exit(1);
+}
+
+static void measure(struct bench_res *res)
+{
+	res->drops = atomic_swap(&ctx.prod_send, 0);
+	res->hits = atomic_swap(&ctx.skel->bss->process_byte, 0);
+	res->false_hits = atomic_swap(&ctx.user_read, 0);
+	res->important_hits = atomic_swap(&ctx.send_calls, 0);
+	res->important_hits |= atomic_swap(&ctx.read_calls, 0) << 32;
+}
+
+static void verify_data(int *check_pos, char *buf, int rcv)
+{
+	for (int i = 0 ; i < rcv; i++) {
+		if (buf[i] != snd_data[(*check_pos) % DATA_REPEAT_SIZE]) {
+			fprintf(stderr, "verify data fail");
+			exit(1);
+		}
+		(*check_pos)++;
+		if (*check_pos >= FILE_SIZE)
+			*check_pos = 0;
+	}
+}
+
+static void *consumer(void *input)
+{
+	int rcv, sent;
+	int check_pos = 0;
+	int tid = (long)input;
+	int recv_buf_size = FILE_SIZE;
+	char *buf = malloc(recv_buf_size);
+	int delay_read = ctx.delay_consumer;
+
+	if (!buf) {
+		fprintf(stderr, "fail to init read buffer");
+		return NULL;
+	}
+
+	while (true) {
+		if (tid == 1) {
+			/* consumer 1 is unused for tx test and stream verdict test */
+			if (RXMODE_BPF() || TXMODE())
+				return NULL;
+			/* it's only for RX_NORMAL which service as reserve-proxy mode */
+			rcv = read(ctx.p1, buf, recv_buf_size);
+			if (rcv < 0) {
+				fprintf(stderr, "fail to read p1");
+				return NULL;
+			}
+
+			sent = send(ctx.p2, buf, recv_buf_size, 0);
+			if (sent < 0) {
+				fprintf(stderr, "fail to send p2");
+				return NULL;
+			}
+		} else {
+			if (delay_read != 0) {
+				if (delay_read < 0)
+					return NULL;
+				sleep(delay_read);
+				delay_read = 0;
+			}
+			/* read real endpoint by consumer 0 */
+			atomic_inc(&ctx.read_calls);
+			rcv = read(ctx.c2, buf, recv_buf_size);
+			if (rcv < 0 && errno != EAGAIN) {
+				fprintf(stderr, "%s fail to read c2 %d\n", __func__, errno);
+				return NULL;
+			}
+			verify_data(&check_pos, buf, rcv);
+			atomic_add(&ctx.user_read, rcv);
+		}
+	}
+
+	return NULL;
+}
+
+static void *producer(void *input)
+{
+	int off = 0, fp, need_sent, sent;
+	int file_size = ctx.file_size;
+	struct timespec ts1, ts2;
+	int target;
+	FILE *file;
+
+	file = tmpfile();
+	if (!file) {
+		fprintf(stderr, "create file for sendfile");
+		return NULL;
+	}
+
+	/* we need simple verify */
+	for (int i = 0; i < file_size; i++) {
+		if (fwrite(&snd_data[off], sizeof(char), 1, file) != 1) {
+			fprintf(stderr, "init tmpfile error");
+			return NULL;
+		}
+		if (++off >= sizeof(snd_data))
+			off = 0;
+	}
+	fflush(file);
+	fseek(file, 0, SEEK_SET);
+
+	fp = fileno(file);
+	need_sent = file_size;
+	clock_gettime(CLOCK_MONOTONIC, &ts1);
+
+	if (RXMODE_BPF_VERDICT())
+		target = ctx.c1;
+	else if (TXMODE_BPF_EGRESS())
+		target = ctx.p1;
+	else
+		target = ctx.p2;
+	set_non_block(target, true);
+	while (true) {
+		if (ctx.prod_run_time) {
+			clock_gettime(CLOCK_MONOTONIC, &ts2);
+			if (ts2.tv_sec - ts1.tv_sec > ctx.prod_run_time)
+				return NULL;
+		}
+
+		errno = 0;
+		atomic_inc(&ctx.send_calls);
+		sent = sendfile(target, fp, NULL, need_sent);
+		if (sent < 0) {
+			if (errno != EAGAIN && errno != ENOMEM && errno != ENOBUFS) {
+				fprintf(stderr, "sendfile return %d, errorno %d:%s\n",
+					sent, errno, strerror(errno));
+				return NULL;
+			}
+			continue;
+		} else if (sent < need_sent) {
+			need_sent -= sent;
+			atomic_add(&ctx.prod_send, sent);
+			continue;
+		}
+		atomic_add(&ctx.prod_send, need_sent);
+		need_sent = file_size;
+		lseek(fp, 0, SEEK_SET);
+	}
+
+	return NULL;
+}
+
+static void report_progress(int iter, struct bench_res *res, long delta_ns)
+{
+	double speed_mbs, prod_mbs, bpf_mbs, send_hz, read_hz;
+
+	prod_mbs = res->drops / 1000000.0 / (delta_ns / 1000000000.0);
+	speed_mbs = res->false_hits / 1000000.0 / (delta_ns / 1000000000.0);
+	bpf_mbs = res->hits / 1000000.0 / (delta_ns / 1000000000.0);
+	send_hz = (res->important_hits & 0xFFFFFFFF) / (delta_ns / 1000000000.0);
+	read_hz = (res->important_hits >> 32) / (delta_ns / 1000000000.0);
+
+	printf("Iter %3d (%7.3lfus): ",
+	       iter, (delta_ns - 1000000000) / 1000.0);
+	printf("Send Speed %8.3lf MB/s (%8.3lf calls/s), BPF Speed %8.3lf MB/s, "
+	       "Rcv Speed %8.3lf MB/s (%8.3lf calls/s)\n",
+	       prod_mbs, send_hz, bpf_mbs, speed_mbs, read_hz);
+}
+
+static void report_final(struct bench_res res[], int res_cnt)
+{
+	double verdict_mbs_mean = 0.0;
+	long verdict_total = 0;
+	int i;
+
+	for (i = 0; i < res_cnt; i++) {
+		verdict_mbs_mean += res[i].hits / 1000000.0 / (0.0 + res_cnt);
+		verdict_total += res[i].hits / 1000000.0;
+	}
+
+	printf("Summary: total trans %8.3lu MB \u00B1 %5.3lf MB/s\n",
+	       verdict_total, verdict_mbs_mean);
+}
+
+static const struct argp_option opts[] = {
+	{ "rx-normal", ARG_FW_RX_NORMAL, NULL, 0,
+	  "simple reserve-proxy mode, no bfp enabled"},
+	{ "rx-pass", ARG_FW_RX_PASS, NULL, 0,
+	  "run bpf prog but no redir applied"},
+	{ "rx-strp", ARG_CTL_RX_STRP, "Byte", 0,
+	  "enable strparser and set the encapsulation size"},
+	{ "rx-verdict-egress", ARG_FW_RX_VERDICT_EGRESS, NULL, 0,
+	  "forward data with bpf(stream verdict)"},
+	{ "rx-verdict-ingress", ARG_FW_RX_VERDICT_INGRESS, NULL, 0,
+	  "forward data with bpf(stream verdict)"},
+	{ "tx-normal", ARG_FW_TX_NORMAL, NULL, 0,
+	  "simple c-s mode, no bfp enabled"},
+	{ "tx-pass", ARG_FW_TX_PASS, NULL, 0,
+	  "run bpf prog but no redir applied"},
+	{ "tx-verdict-ingress", ARG_FW_TX_VERDICT_INGRESS, NULL, 0,
+	  "forward msg to ingress queue of another socket"},
+	{ "tx-verdict-egress", ARG_FW_TX_VERDICT_EGRESS, NULL, 0,
+	  "forward msg to egress queue of another socket"},
+	{ "delay-consumer", ARG_CONSUMER_DELAY_TIME, "SEC", 0,
+	  "delay consumer start"},
+	{ "producer-duration", ARG_PRODUCER_DURATION, "SEC", 0,
+	  "producer duration"},
+	{},
+};
+
+static error_t parse_arg(int key, char *arg, struct argp_state *state)
+{
+	switch (key) {
+	case ARG_FW_RX_NORMAL...ARG_FW_TX_VERDICT_EGRESS:
+		ctx.mode = key;
+		break;
+	case ARG_CONSUMER_DELAY_TIME:
+		ctx.delay_consumer = strtol(arg, NULL, 10);
+		break;
+	case ARG_PRODUCER_DURATION:
+		ctx.prod_run_time = strtol(arg, NULL, 10);
+		break;
+	case ARG_CTL_RX_STRP:
+		ctx.strp_size = strtol(arg, NULL, 10);
+		break;
+	default:
+		return ARGP_ERR_UNKNOWN;
+	}
+
+	return 0;
+}
+
+/* exported into benchmark runner */
+const struct argp bench_sockmap_argp = {
+	.options = opts,
+	.parser = parse_arg,
+};
+
+/* Benchmark performance of creating bpf local storage */
+const struct bench bench_sockmap = {
+	.name = "sockmap",
+	.argp = &bench_sockmap_argp,
+	.validate = validate,
+	.setup = setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = report_progress,
+	.report_final = report_final,
+};
tools/testing/selftests/bpf/bpf_arena_spin_lock.h → tools/testing/selftests/bpf/progs/bpf_arena_spin_lock.h | +12 -3
···
 struct __qspinlock {
 	union {
 		atomic_t val;
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
 		struct {
 			u8	locked;
 			u8	pending;
···
 			u16	locked_pending;
 			u16	tail;
 		};
+#else
+		struct {
+			u16	tail;
+			u16	locked_pending;
+		};
+		struct {
+			u8	reserved[2];
+			u8	pending;
+			u8	locked;
+		};
+#endif
 	};
 };
···
 
 #define _Q_LOCKED_VAL		(1U << _Q_LOCKED_OFFSET)
 #define _Q_PENDING_VAL		(1U << _Q_PENDING_OFFSET)
-
-#define likely(x) __builtin_expect(!!(x), 1)
-#define unlikely(x) __builtin_expect(!!(x), 0)
 
 struct arena_qnode __arena qnodes[_Q_MAX_CPUS][_Q_MAX_NODES];
tools/testing/selftests/bpf/bpf_experimental.h | +5
···
 extern struct kmem_cache *bpf_iter_kmem_cache_next(struct bpf_iter_kmem_cache *it) __weak __ksym;
 extern void bpf_iter_kmem_cache_destroy(struct bpf_iter_kmem_cache *it) __weak __ksym;
 
+struct bpf_iter_dmabuf;
+extern int bpf_iter_dmabuf_new(struct bpf_iter_dmabuf *it) __weak __ksym;
+extern struct dma_buf *bpf_iter_dmabuf_next(struct bpf_iter_dmabuf *it) __weak __ksym;
+extern void bpf_iter_dmabuf_destroy(struct bpf_iter_dmabuf *it) __weak __ksym;
+
 #endif
tools/testing/selftests/bpf/config | +3
···
 CONFIG_DEBUG_INFO=y
 CONFIG_DEBUG_INFO_BTF=y
 CONFIG_DEBUG_INFO_DWARF4=y
+CONFIG_DMABUF_HEAPS=y
+CONFIG_DMABUF_HEAPS_SYSTEM=y
 CONFIG_DUMMY=y
 CONFIG_DYNAMIC_FTRACE=y
 CONFIG_FPROBE=y
···
 CONFIG_SECURITYFS=y
 CONFIG_SYN_COOKIES=y
 CONFIG_TEST_BPF=m
+CONFIG_UDMABUF=y
 CONFIG_USERFAULTFD=y
 CONFIG_VSOCKETS=y
 CONFIG_VXLAN=y
tools/testing/selftests/bpf/prog_tests/arena_spin_lock.c | +8 -6
···
 	struct arena_spin_lock *skel;
 	pthread_t thread_id[16];
 	int prog_fd, i, err;
+	int nthreads;
 	void *ret;
 
-	if (get_nprocs() < 2) {
+	nthreads = MIN(get_nprocs(), ARRAY_SIZE(thread_id));
+	if (nthreads < 2) {
 		test__skip();
 		return;
 	}
···
 		goto end;
 	}
 	skel->bss->cs_count = size;
-	skel->bss->limit = repeat * 16;
+	skel->bss->limit = repeat * nthreads;
 
-	ASSERT_OK(pthread_barrier_init(&barrier, NULL, 16), "barrier init");
+	ASSERT_OK(pthread_barrier_init(&barrier, NULL, nthreads), "barrier init");
 
 	prog_fd = bpf_program__fd(skel->progs.prog);
-	for (i = 0; i < 16; i++) {
+	for (i = 0; i < nthreads; i++) {
 		err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd);
 		if (!ASSERT_OK(err, "pthread_create"))
 			goto end_barrier;
 	}
 
-	for (i = 0; i < 16; i++) {
+	for (i = 0; i < nthreads; i++) {
 		if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join"))
 			goto end_barrier;
 		if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd"))
 			goto end_barrier;
 	}
 
-	ASSERT_EQ(skel->bss->counter, repeat * 16, "check counter value");
+	ASSERT_EQ(skel->bss->counter, repeat * nthreads, "check counter value");
 
 end_barrier:
 	pthread_barrier_destroy(&barrier);
tools/testing/selftests/bpf/prog_tests/attach_probe.c | +84
···
 	test_attach_probe_manual__destroy(skel);
 }
 
+/* attach uprobe/uretprobe long event name testings */
+static void test_attach_uprobe_long_event_name(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts);
+	struct bpf_link *uprobe_link, *uretprobe_link;
+	struct test_attach_probe_manual *skel;
+	ssize_t uprobe_offset;
+	char path[PATH_MAX] = {0};
+
+	skel = test_attach_probe_manual__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_kprobe_manual_open_and_load"))
+		return;
+
+	uprobe_offset = get_uprobe_offset(&trigger_func);
+	if (!ASSERT_GE(uprobe_offset, 0, "uprobe_offset"))
+		goto cleanup;
+
+	if (!ASSERT_GT(readlink("/proc/self/exe", path, PATH_MAX - 1), 0, "readlink"))
+		goto cleanup;
+
+	/* manual-attach uprobe/uretprobe */
+	uprobe_opts.attach_mode = PROBE_ATTACH_MODE_LEGACY;
+	uprobe_opts.ref_ctr_offset = 0;
+	uprobe_opts.retprobe = false;
+	uprobe_link = bpf_program__attach_uprobe_opts(skel->progs.handle_uprobe,
+						      0 /* self pid */,
+						      path,
+						      uprobe_offset,
+						      &uprobe_opts);
+	if (!ASSERT_OK_PTR(uprobe_link, "attach_uprobe_long_event_name"))
+		goto cleanup;
+	skel->links.handle_uprobe = uprobe_link;
+
+	uprobe_opts.retprobe = true;
+	uretprobe_link = bpf_program__attach_uprobe_opts(skel->progs.handle_uretprobe,
+							 -1 /* any pid */,
+							 path,
+							 uprobe_offset, &uprobe_opts);
+	if (!ASSERT_OK_PTR(uretprobe_link, "attach_uretprobe_long_event_name"))
+		goto cleanup;
+	skel->links.handle_uretprobe = uretprobe_link;
+
+cleanup:
+	test_attach_probe_manual__destroy(skel);
+}
+
+/* attach kprobe/kretprobe long event name testings */
+static void test_attach_kprobe_long_event_name(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, kprobe_opts);
+	struct bpf_link *kprobe_link, *kretprobe_link;
+	struct test_attach_probe_manual *skel;
+
+	skel = test_attach_probe_manual__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_kprobe_manual_open_and_load"))
+		return;
+
+	/* manual-attach kprobe/kretprobe */
+	kprobe_opts.attach_mode = PROBE_ATTACH_MODE_LEGACY;
+	kprobe_opts.retprobe = false;
+	kprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe,
+						      "bpf_testmod_looooooooooooooooooooooooooooooong_name",
+						      &kprobe_opts);
+	if (!ASSERT_OK_PTR(kprobe_link, "attach_kprobe_long_event_name"))
+		goto cleanup;
+	skel->links.handle_kprobe = kprobe_link;
+
+	kprobe_opts.retprobe = true;
+	kretprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kretprobe,
+							 "bpf_testmod_looooooooooooooooooooooooooooooong_name",
+							 &kprobe_opts);
+	if (!ASSERT_OK_PTR(kretprobe_link, "attach_kretprobe_long_event_name"))
+		goto cleanup;
+	skel->links.handle_kretprobe = kretprobe_link;
+
+cleanup:
+	test_attach_probe_manual__destroy(skel);
+}
+
 static void test_attach_probe_auto(struct test_attach_probe *skel)
 {
 	struct bpf_link *uprobe_err_link;
···
 		test_uprobe_sleepable(skel);
 	if (test__start_subtest("uprobe-ref_ctr"))
 		test_uprobe_ref_ctr(skel);
+
+	if (test__start_subtest("uprobe-long_name"))
+		test_attach_uprobe_long_event_name();
+	if (test__start_subtest("kprobe-long_name"))
+		test_attach_kprobe_long_event_name();
 
 cleanup:
 	test_attach_probe__destroy(skel);
+6
tools/testing/selftests/bpf/prog_tests/bpf_nf.c
···
 		.repeat = 1,
 	);
 
+	if (SYS_NOFAIL("iptables-legacy --version")) {
+		fprintf(stdout, "Missing required iptables-legacy tool\n");
+		test__skip();
+		return;
+	}
+
 	skel = test_bpf_nf__open_and_load();
 	if (!ASSERT_OK_PTR(skel, "test_bpf_nf__open_and_load"))
 		return;
+101
tools/testing/selftests/bpf/prog_tests/btf_dedup_split.c
··· 440 440 btf__free(btf1); 441 441 } 442 442 443 + /* Ensure module split BTF dedup worked correctly; when dedup fails badly 444 + * core kernel types are in split BTF also, so ensure that references to 445 + * such types point at base - not split - BTF. 446 + * 447 + * bpf_testmod_test_write() has multiple core kernel type parameters; 448 + * 449 + * ssize_t 450 + * bpf_testmod_test_write(struct file *file, struct kobject *kobj, 451 + * struct bin_attribute *bin_attr, 452 + * char *buf, loff_t off, size_t len); 453 + * 454 + * Ensure each of the FUNC_PROTO params is a core kernel type. 455 + * 456 + * Do the same for 457 + * 458 + * __bpf_kfunc struct sock *bpf_kfunc_call_test3(struct sock *sk); 459 + * 460 + * ...and 461 + * 462 + * __bpf_kfunc void bpf_kfunc_call_test_pass_ctx(struct __sk_buff *skb); 463 + * 464 + */ 465 + const char *mod_funcs[] = { 466 + "bpf_testmod_test_write", 467 + "bpf_kfunc_call_test3", 468 + "bpf_kfunc_call_test_pass_ctx" 469 + }; 470 + 471 + static void test_split_module(void) 472 + { 473 + struct btf *vmlinux_btf, *btf1 = NULL; 474 + int i, nr_base_types; 475 + 476 + vmlinux_btf = btf__load_vmlinux_btf(); 477 + if (!ASSERT_OK_PTR(vmlinux_btf, "vmlinux_btf")) 478 + return; 479 + nr_base_types = btf__type_cnt(vmlinux_btf); 480 + if (!ASSERT_GT(nr_base_types, 0, "nr_base_types")) 481 + goto cleanup; 482 + 483 + btf1 = btf__parse_split("/sys/kernel/btf/bpf_testmod", vmlinux_btf); 484 + if (!ASSERT_OK_PTR(btf1, "split_btf")) 485 + return; 486 + 487 + for (i = 0; i < ARRAY_SIZE(mod_funcs); i++) { 488 + const struct btf_param *p; 489 + const struct btf_type *t; 490 + __u16 vlen; 491 + __u32 id; 492 + int j; 493 + 494 + id = btf__find_by_name_kind(btf1, mod_funcs[i], BTF_KIND_FUNC); 495 + if (!ASSERT_GE(id, nr_base_types, "func_id")) 496 + goto cleanup; 497 + t = btf__type_by_id(btf1, id); 498 + if (!ASSERT_OK_PTR(t, "func_id_type")) 499 + goto cleanup; 500 + t = btf__type_by_id(btf1, t->type); 501 + if (!ASSERT_OK_PTR(t, 
"func_proto_id_type")) 502 + goto cleanup; 503 + if (!ASSERT_EQ(btf_is_func_proto(t), true, "is_func_proto")) 504 + goto cleanup; 505 + vlen = btf_vlen(t); 506 + 507 + for (j = 0, p = btf_params(t); j < vlen; j++, p++) { 508 + /* bpf_testmod uses resilient split BTF, so any 509 + * reference types will be added to split BTF and their 510 + * associated targets will be base BTF types; for example 511 + * for a "struct sock *" the PTR will be in split BTF 512 + * while the "struct sock" will be in base. 513 + * 514 + * In some cases like loff_t we have to resolve 515 + * multiple typedefs hence the while() loop below. 516 + * 517 + * Note that resilient split BTF generation depends 518 + * on pahole version, so we do not assert that 519 + * reference types are in split BTF, as if pahole 520 + * does not support resilient split BTF they will 521 + * also be base BTF types. 522 + */ 523 + id = p->type; 524 + do { 525 + t = btf__type_by_id(btf1, id); 526 + if (!ASSERT_OK_PTR(t, "param_ref_type")) 527 + goto cleanup; 528 + if (!btf_is_mod(t) && !btf_is_ptr(t) && !btf_is_typedef(t)) 529 + break; 530 + id = t->type; 531 + } while (true); 532 + 533 + if (!ASSERT_LT(id, nr_base_types, "verify_base_type")) 534 + goto cleanup; 535 + } 536 + } 537 + cleanup: 538 + btf__free(btf1); 539 + btf__free(vmlinux_btf); 540 + } 541 + 443 542 void test_btf_dedup_split() 444 543 { 445 544 if (test__start_subtest("split_simple")) ··· 549 450 test_split_fwd_resolve(); 550 451 if (test__start_subtest("split_dup_struct_in_cu")) 551 452 test_split_dup_struct_in_cu(); 453 + if (test__start_subtest("split_module")) 454 + test_split_module(); 552 455 }
+52 -6
tools/testing/selftests/bpf/prog_tests/btf_split.c
··· 12 12 vfprintf(ctx, fmt, args); 13 13 } 14 14 15 - void test_btf_split() { 15 + static void __test_btf_split(bool multi) 16 + { 16 17 struct btf_dump *d = NULL; 17 18 const struct btf_type *t; 18 - struct btf *btf1, *btf2; 19 + struct btf *btf1, *btf2, *btf3 = NULL; 19 20 int str_off, i, err; 20 21 21 22 btf1 = btf__new_empty(); ··· 64 63 ASSERT_EQ(btf_vlen(t), 3, "split_struct_vlen"); 65 64 ASSERT_STREQ(btf__str_by_offset(btf2, t->name_off), "s2", "split_struct_name"); 66 65 66 + if (multi) { 67 + btf3 = btf__new_empty_split(btf2); 68 + if (!ASSERT_OK_PTR(btf3, "multi_split_btf")) 69 + goto cleanup; 70 + } else { 71 + btf3 = btf2; 72 + } 73 + 74 + btf__add_union(btf3, "u1", 16); /* [5] union u1 { */ 75 + btf__add_field(btf3, "f1", 4, 0, 0); /* struct s2 f1; */ 76 + btf__add_field(btf3, "uf2", 1, 0, 0); /* int f2; */ 77 + /* } */ 78 + 79 + if (multi) { 80 + t = btf__type_by_id(btf2, 5); 81 + ASSERT_NULL(t, "multisplit_type_in_first_split"); 82 + } 83 + 84 + t = btf__type_by_id(btf3, 5); 85 + if (!ASSERT_OK_PTR(t, "split_union_type")) 86 + goto cleanup; 87 + ASSERT_EQ(btf_is_union(t), true, "split_union_kind"); 88 + ASSERT_EQ(btf_vlen(t), 2, "split_union_vlen"); 89 + ASSERT_STREQ(btf__str_by_offset(btf3, t->name_off), "u1", "split_union_name"); 90 + ASSERT_EQ(btf__type_cnt(btf3), 6, "split_type_cnt"); 91 + 92 + t = btf__type_by_id(btf3, 1); 93 + if (!ASSERT_OK_PTR(t, "split_base_type")) 94 + goto cleanup; 95 + ASSERT_EQ(btf_is_int(t), true, "split_base_int"); 96 + ASSERT_STREQ(btf__str_by_offset(btf3, t->name_off), "int", "split_base_type_name"); 97 + 67 98 /* BTF-to-C dump of split BTF */ 68 99 dump_buf_file = open_memstream(&dump_buf, &dump_buf_sz); 69 100 if (!ASSERT_OK_PTR(dump_buf_file, "dump_memstream")) 70 101 return; 71 - d = btf_dump__new(btf2, btf_dump_printf, dump_buf_file, NULL); 102 + d = btf_dump__new(btf3, btf_dump_printf, dump_buf_file, NULL); 72 103 if (!ASSERT_OK_PTR(d, "btf_dump__new")) 73 104 goto cleanup; 74 - for (i = 1; i < 
btf__type_cnt(btf2); i++) { 105 + for (i = 1; i < btf__type_cnt(btf3); i++) { 75 106 err = btf_dump__dump_type(d, i); 76 107 ASSERT_OK(err, "dump_type_ok"); 77 108 } ··· 112 79 ASSERT_STREQ(dump_buf, 113 80 "struct s1 {\n" 114 81 " int f1;\n" 115 - "};\n" 116 - "\n" 82 + "};\n\n" 117 83 "struct s2 {\n" 118 84 " struct s1 f1;\n" 119 85 " int f2;\n" 120 86 " int *f3;\n" 87 + "};\n\n" 88 + "union u1 {\n" 89 + " struct s2 f1;\n" 90 + " int uf2;\n" 121 91 "};\n\n", "c_dump"); 122 92 123 93 cleanup: ··· 130 94 btf_dump__free(d); 131 95 btf__free(btf1); 132 96 btf__free(btf2); 97 + if (btf2 != btf3) 98 + btf__free(btf3); 99 + } 100 + 101 + void test_btf_split(void) 102 + { 103 + if (test__start_subtest("single_split")) 104 + __test_btf_split(false); 105 + if (test__start_subtest("multi_split")) 106 + __test_btf_split(true); 133 107 }
+81
tools/testing/selftests/bpf/prog_tests/btf_sysfs.c
···
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+/* Copyright (c) 2025 Isovalent */
+
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+static void test_btf_mmap_sysfs(const char *path, struct btf *base)
+{
+	struct stat st;
+	__u64 btf_size, end;
+	void *raw_data = NULL;
+	int fd = -1;
+	long page_size;
+	struct btf *btf = NULL;
+
+	page_size = sysconf(_SC_PAGESIZE);
+	if (!ASSERT_GE(page_size, 0, "get_page_size"))
+		goto cleanup;
+
+	if (!ASSERT_OK(stat(path, &st), "stat_btf"))
+		goto cleanup;
+
+	btf_size = st.st_size;
+	end = (btf_size + page_size - 1) / page_size * page_size;
+
+	fd = open(path, O_RDONLY);
+	if (!ASSERT_GE(fd, 0, "open_btf"))
+		goto cleanup;
+
+	raw_data = mmap(NULL, btf_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+	if (!ASSERT_EQ(raw_data, MAP_FAILED, "mmap_btf_writable"))
+		goto cleanup;
+
+	raw_data = mmap(NULL, btf_size, PROT_READ, MAP_SHARED, fd, 0);
+	if (!ASSERT_EQ(raw_data, MAP_FAILED, "mmap_btf_shared"))
+		goto cleanup;
+
+	raw_data = mmap(NULL, end + 1, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (!ASSERT_EQ(raw_data, MAP_FAILED, "mmap_btf_invalid_size"))
+		goto cleanup;
+
+	raw_data = mmap(NULL, end, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (!ASSERT_OK_PTR(raw_data, "mmap_btf"))
+		goto cleanup;
+
+	if (!ASSERT_EQ(mprotect(raw_data, btf_size, PROT_READ | PROT_WRITE), -1,
+		       "mprotect_writable"))
+		goto cleanup;
+
+	if (!ASSERT_EQ(mprotect(raw_data, btf_size, PROT_READ | PROT_EXEC), -1,
+		       "mprotect_executable"))
+		goto cleanup;
+
+	/* Check padding is zeroed */
+	for (int i = btf_size; i < end; i++) {
+		if (((__u8 *)raw_data)[i] != 0) {
+			PRINT_FAIL("tail of BTF is not zero at page offset %d\n", i);
+			goto cleanup;
+		}
+	}
+
+	btf = btf__new_split(raw_data, btf_size, base);
+	if (!ASSERT_OK_PTR(btf, "parse_btf"))
+		goto cleanup;
+
+cleanup:
+	btf__free(btf);
+	if (raw_data && raw_data != MAP_FAILED)
+		munmap(raw_data, btf_size);
+	if (fd >= 0)
+		close(fd);
+}
+
+void test_btf_sysfs(void)
+{
+	test_btf_mmap_sysfs("/sys/kernel/btf/vmlinux", NULL);
+}
+285
tools/testing/selftests/bpf/prog_tests/dmabuf_iter.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2025 Google */ 3 + 4 + #include <test_progs.h> 5 + #include <bpf/libbpf.h> 6 + #include <bpf/btf.h> 7 + #include "dmabuf_iter.skel.h" 8 + 9 + #include <fcntl.h> 10 + #include <stdbool.h> 11 + #include <stdio.h> 12 + #include <stdlib.h> 13 + #include <string.h> 14 + #include <sys/ioctl.h> 15 + #include <sys/mman.h> 16 + #include <unistd.h> 17 + 18 + #include <linux/dma-buf.h> 19 + #include <linux/dma-heap.h> 20 + #include <linux/udmabuf.h> 21 + 22 + static int udmabuf = -1; 23 + static const char udmabuf_test_buffer_name[DMA_BUF_NAME_LEN] = "udmabuf_test_buffer_for_iter"; 24 + static size_t udmabuf_test_buffer_size; 25 + static int sysheap_dmabuf = -1; 26 + static const char sysheap_test_buffer_name[DMA_BUF_NAME_LEN] = "sysheap_test_buffer_for_iter"; 27 + static size_t sysheap_test_buffer_size; 28 + 29 + static int create_udmabuf(void) 30 + { 31 + struct udmabuf_create create; 32 + int dev_udmabuf, memfd, local_udmabuf; 33 + 34 + udmabuf_test_buffer_size = 10 * getpagesize(); 35 + 36 + if (!ASSERT_LE(sizeof(udmabuf_test_buffer_name), DMA_BUF_NAME_LEN, "NAMETOOLONG")) 37 + return -1; 38 + 39 + memfd = memfd_create("memfd_test", MFD_ALLOW_SEALING); 40 + if (!ASSERT_OK_FD(memfd, "memfd_create")) 41 + return -1; 42 + 43 + if (!ASSERT_OK(ftruncate(memfd, udmabuf_test_buffer_size), "ftruncate")) 44 + goto close_memfd; 45 + 46 + if (!ASSERT_OK(fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK), "seal")) 47 + goto close_memfd; 48 + 49 + dev_udmabuf = open("/dev/udmabuf", O_RDONLY); 50 + if (!ASSERT_OK_FD(dev_udmabuf, "open udmabuf")) 51 + goto close_memfd; 52 + 53 + memset(&create, 0, sizeof(create)); 54 + create.memfd = memfd; 55 + create.flags = UDMABUF_FLAGS_CLOEXEC; 56 + create.offset = 0; 57 + create.size = udmabuf_test_buffer_size; 58 + 59 + local_udmabuf = ioctl(dev_udmabuf, UDMABUF_CREATE, &create); 60 + close(dev_udmabuf); 61 + if (!ASSERT_OK_FD(local_udmabuf, "udmabuf_create")) 62 + goto close_memfd; 63 + 64 + 
if (!ASSERT_OK(ioctl(local_udmabuf, DMA_BUF_SET_NAME_B, udmabuf_test_buffer_name), "name")) 65 + goto close_udmabuf; 66 + 67 + return local_udmabuf; 68 + 69 + close_udmabuf: 70 + close(local_udmabuf); 71 + close_memfd: 72 + close(memfd); 73 + return -1; 74 + } 75 + 76 + static int create_sys_heap_dmabuf(void) 77 + { 78 + sysheap_test_buffer_size = 20 * getpagesize(); 79 + 80 + struct dma_heap_allocation_data data = { 81 + .len = sysheap_test_buffer_size, 82 + .fd = 0, 83 + .fd_flags = O_RDWR | O_CLOEXEC, 84 + .heap_flags = 0, 85 + }; 86 + int heap_fd, ret; 87 + 88 + if (!ASSERT_LE(sizeof(sysheap_test_buffer_name), DMA_BUF_NAME_LEN, "NAMETOOLONG")) 89 + return -1; 90 + 91 + heap_fd = open("/dev/dma_heap/system", O_RDONLY); 92 + if (!ASSERT_OK_FD(heap_fd, "open dma heap")) 93 + return -1; 94 + 95 + ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data); 96 + close(heap_fd); 97 + if (!ASSERT_OK(ret, "syheap alloc")) 98 + return -1; 99 + 100 + if (!ASSERT_OK(ioctl(data.fd, DMA_BUF_SET_NAME_B, sysheap_test_buffer_name), "name")) 101 + goto close_sysheap_dmabuf; 102 + 103 + return data.fd; 104 + 105 + close_sysheap_dmabuf: 106 + close(data.fd); 107 + return -1; 108 + } 109 + 110 + static int create_test_buffers(void) 111 + { 112 + udmabuf = create_udmabuf(); 113 + sysheap_dmabuf = create_sys_heap_dmabuf(); 114 + 115 + if (udmabuf < 0 || sysheap_dmabuf < 0) 116 + return -1; 117 + 118 + return 0; 119 + } 120 + 121 + static void destroy_test_buffers(void) 122 + { 123 + close(udmabuf); 124 + udmabuf = -1; 125 + 126 + close(sysheap_dmabuf); 127 + sysheap_dmabuf = -1; 128 + } 129 + 130 + enum Fields { INODE, SIZE, NAME, EXPORTER, FIELD_COUNT }; 131 + struct DmabufInfo { 132 + unsigned long inode; 133 + unsigned long size; 134 + char name[DMA_BUF_NAME_LEN]; 135 + char exporter[32]; 136 + }; 137 + 138 + static bool check_dmabuf_info(const struct DmabufInfo *bufinfo, 139 + unsigned long size, 140 + const char *name, const char *exporter) 141 + { 142 + return size == bufinfo->size && 
143 + !strcmp(name, bufinfo->name) && 144 + !strcmp(exporter, bufinfo->exporter); 145 + } 146 + 147 + static void subtest_dmabuf_iter_check_no_infinite_reads(struct dmabuf_iter *skel) 148 + { 149 + int iter_fd; 150 + char buf[256]; 151 + 152 + iter_fd = bpf_iter_create(bpf_link__fd(skel->links.dmabuf_collector)); 153 + if (!ASSERT_OK_FD(iter_fd, "iter_create")) 154 + return; 155 + 156 + while (read(iter_fd, buf, sizeof(buf)) > 0) 157 + ; /* Read out all contents */ 158 + 159 + /* Next reads should return 0 */ 160 + ASSERT_EQ(read(iter_fd, buf, sizeof(buf)), 0, "read"); 161 + 162 + close(iter_fd); 163 + } 164 + 165 + static void subtest_dmabuf_iter_check_default_iter(struct dmabuf_iter *skel) 166 + { 167 + bool found_test_sysheap_dmabuf = false; 168 + bool found_test_udmabuf = false; 169 + struct DmabufInfo bufinfo; 170 + size_t linesize = 0; 171 + char *line = NULL; 172 + FILE *iter_file; 173 + int iter_fd, f = INODE; 174 + 175 + iter_fd = bpf_iter_create(bpf_link__fd(skel->links.dmabuf_collector)); 176 + if (!ASSERT_OK_FD(iter_fd, "iter_create")) 177 + return; 178 + 179 + iter_file = fdopen(iter_fd, "r"); 180 + if (!ASSERT_OK_PTR(iter_file, "fdopen")) 181 + goto close_iter_fd; 182 + 183 + while (getline(&line, &linesize, iter_file) != -1) { 184 + if (f % FIELD_COUNT == INODE) { 185 + ASSERT_EQ(sscanf(line, "%ld", &bufinfo.inode), 1, 186 + "read inode"); 187 + } else if (f % FIELD_COUNT == SIZE) { 188 + ASSERT_EQ(sscanf(line, "%ld", &bufinfo.size), 1, 189 + "read size"); 190 + } else if (f % FIELD_COUNT == NAME) { 191 + ASSERT_EQ(sscanf(line, "%s", bufinfo.name), 1, 192 + "read name"); 193 + } else if (f % FIELD_COUNT == EXPORTER) { 194 + ASSERT_EQ(sscanf(line, "%31s", bufinfo.exporter), 1, 195 + "read exporter"); 196 + 197 + if (check_dmabuf_info(&bufinfo, 198 + sysheap_test_buffer_size, 199 + sysheap_test_buffer_name, 200 + "system")) 201 + found_test_sysheap_dmabuf = true; 202 + else if (check_dmabuf_info(&bufinfo, 203 + udmabuf_test_buffer_size, 204 + 
udmabuf_test_buffer_name, 205 + "udmabuf")) 206 + found_test_udmabuf = true; 207 + } 208 + ++f; 209 + } 210 + 211 + ASSERT_EQ(f % FIELD_COUNT, INODE, "number of fields"); 212 + 213 + ASSERT_TRUE(found_test_sysheap_dmabuf, "found_test_sysheap_dmabuf"); 214 + ASSERT_TRUE(found_test_udmabuf, "found_test_udmabuf"); 215 + 216 + free(line); 217 + fclose(iter_file); 218 + close_iter_fd: 219 + close(iter_fd); 220 + } 221 + 222 + static void subtest_dmabuf_iter_check_open_coded(struct dmabuf_iter *skel, int map_fd) 223 + { 224 + LIBBPF_OPTS(bpf_test_run_opts, topts); 225 + char key[DMA_BUF_NAME_LEN]; 226 + int err, fd; 227 + bool found; 228 + 229 + /* No need to attach it, just run it directly */ 230 + fd = bpf_program__fd(skel->progs.iter_dmabuf_for_each); 231 + 232 + err = bpf_prog_test_run_opts(fd, &topts); 233 + if (!ASSERT_OK(err, "test_run_opts err")) 234 + return; 235 + if (!ASSERT_OK(topts.retval, "test_run_opts retval")) 236 + return; 237 + 238 + if (!ASSERT_OK(bpf_map_get_next_key(map_fd, NULL, key), "get next key")) 239 + return; 240 + 241 + do { 242 + ASSERT_OK(bpf_map_lookup_elem(map_fd, key, &found), "lookup"); 243 + ASSERT_TRUE(found, "found test buffer"); 244 + } while (bpf_map_get_next_key(map_fd, key, key)); 245 + } 246 + 247 + void test_dmabuf_iter(void) 248 + { 249 + struct dmabuf_iter *skel = NULL; 250 + int map_fd; 251 + const bool f = false; 252 + 253 + skel = dmabuf_iter__open_and_load(); 254 + if (!ASSERT_OK_PTR(skel, "dmabuf_iter__open_and_load")) 255 + return; 256 + 257 + map_fd = bpf_map__fd(skel->maps.testbuf_hash); 258 + if (!ASSERT_OK_FD(map_fd, "map_fd")) 259 + goto destroy_skel; 260 + 261 + if (!ASSERT_OK(bpf_map_update_elem(map_fd, udmabuf_test_buffer_name, &f, BPF_ANY), 262 + "insert udmabuf")) 263 + goto destroy_skel; 264 + if (!ASSERT_OK(bpf_map_update_elem(map_fd, sysheap_test_buffer_name, &f, BPF_ANY), 265 + "insert sysheap buffer")) 266 + goto destroy_skel; 267 + 268 + if (!ASSERT_OK(create_test_buffers(), "create_test_buffers")) 269 
+ goto destroy; 270 + 271 + if (!ASSERT_OK(dmabuf_iter__attach(skel), "skel_attach")) 272 + goto destroy; 273 + 274 + if (test__start_subtest("no_infinite_reads")) 275 + subtest_dmabuf_iter_check_no_infinite_reads(skel); 276 + if (test__start_subtest("default_iter")) 277 + subtest_dmabuf_iter_check_default_iter(skel); 278 + if (test__start_subtest("open_coded")) 279 + subtest_dmabuf_iter_check_open_coded(skel, map_fd); 280 + 281 + destroy: 282 + destroy_test_buffers(); 283 + destroy_skel: 284 + dmabuf_iter__destroy(skel); 285 + }
+13
tools/testing/selftests/bpf/prog_tests/dynptr.c
···
 	{"test_dynptr_skb_no_buff", SETUP_SKB_PROG},
 	{"test_dynptr_skb_strcmp", SETUP_SKB_PROG},
 	{"test_dynptr_skb_tp_btf", SETUP_SKB_PROG_TP},
+	{"test_probe_read_user_dynptr", SETUP_XDP_PROG},
+	{"test_probe_read_kernel_dynptr", SETUP_XDP_PROG},
+	{"test_probe_read_user_str_dynptr", SETUP_XDP_PROG},
+	{"test_probe_read_kernel_str_dynptr", SETUP_XDP_PROG},
+	{"test_copy_from_user_dynptr", SETUP_SYSCALL_SLEEP},
+	{"test_copy_from_user_str_dynptr", SETUP_SYSCALL_SLEEP},
+	{"test_copy_from_user_task_dynptr", SETUP_SYSCALL_SLEEP},
+	{"test_copy_from_user_task_str_dynptr", SETUP_SYSCALL_SLEEP},
 };
 
 static void verify_success(const char *prog_name, enum test_setup_type setup_type)
 {
+	char user_data[384] = {[0 ... 382] = 'a', '\0'};
 	struct dynptr_success *skel;
 	struct bpf_program *prog;
 	struct bpf_link *link;
···
 	err = dynptr_success__load(skel);
 	if (!ASSERT_OK(err, "dynptr_success__load"))
 		goto cleanup;
+
+	skel->bss->user_ptr = user_data;
+	skel->data->test_len[0] = sizeof(user_data);
+	memcpy(skel->bss->expected_str, user_data, sizeof(user_data));
 
 	switch (setup_type) {
 	case SETUP_SYSCALL_SLEEP:
+192
tools/testing/selftests/bpf/prog_tests/fd_htab_lookup.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (C) 2025. Huawei Technologies Co., Ltd */ 3 + #define _GNU_SOURCE 4 + #include <stdbool.h> 5 + #include <test_progs.h> 6 + #include "fd_htab_lookup.skel.h" 7 + 8 + struct htab_op_ctx { 9 + int fd; 10 + int loop; 11 + unsigned int entries; 12 + bool stop; 13 + }; 14 + 15 + #define ERR_TO_RETVAL(where, err) ((void *)(long)(((where) << 12) | (-err))) 16 + 17 + static void *htab_lookup_fn(void *arg) 18 + { 19 + struct htab_op_ctx *ctx = arg; 20 + int i = 0; 21 + 22 + while (i++ < ctx->loop && !ctx->stop) { 23 + unsigned int j; 24 + 25 + for (j = 0; j < ctx->entries; j++) { 26 + unsigned int key = j, zero = 0, value; 27 + int inner_fd, err; 28 + 29 + err = bpf_map_lookup_elem(ctx->fd, &key, &value); 30 + if (err) { 31 + ctx->stop = true; 32 + return ERR_TO_RETVAL(1, err); 33 + } 34 + 35 + inner_fd = bpf_map_get_fd_by_id(value); 36 + if (inner_fd < 0) { 37 + /* The old map has been freed */ 38 + if (inner_fd == -ENOENT) 39 + continue; 40 + ctx->stop = true; 41 + return ERR_TO_RETVAL(2, inner_fd); 42 + } 43 + 44 + err = bpf_map_lookup_elem(inner_fd, &zero, &value); 45 + if (err) { 46 + close(inner_fd); 47 + ctx->stop = true; 48 + return ERR_TO_RETVAL(3, err); 49 + } 50 + close(inner_fd); 51 + 52 + if (value != key) { 53 + ctx->stop = true; 54 + return ERR_TO_RETVAL(4, -EINVAL); 55 + } 56 + } 57 + } 58 + 59 + return NULL; 60 + } 61 + 62 + static void *htab_update_fn(void *arg) 63 + { 64 + struct htab_op_ctx *ctx = arg; 65 + int i = 0; 66 + 67 + while (i++ < ctx->loop && !ctx->stop) { 68 + unsigned int j; 69 + 70 + for (j = 0; j < ctx->entries; j++) { 71 + unsigned int key = j, zero = 0; 72 + int inner_fd, err; 73 + 74 + inner_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, 4, 4, 1, NULL); 75 + if (inner_fd < 0) { 76 + ctx->stop = true; 77 + return ERR_TO_RETVAL(1, inner_fd); 78 + } 79 + 80 + err = bpf_map_update_elem(inner_fd, &zero, &key, 0); 81 + if (err) { 82 + close(inner_fd); 83 + ctx->stop = true; 84 + return 
ERR_TO_RETVAL(2, err); 85 + } 86 + 87 + err = bpf_map_update_elem(ctx->fd, &key, &inner_fd, BPF_EXIST); 88 + if (err) { 89 + close(inner_fd); 90 + ctx->stop = true; 91 + return ERR_TO_RETVAL(3, err); 92 + } 93 + close(inner_fd); 94 + } 95 + } 96 + 97 + return NULL; 98 + } 99 + 100 + static int setup_htab(int fd, unsigned int entries) 101 + { 102 + unsigned int i; 103 + 104 + for (i = 0; i < entries; i++) { 105 + unsigned int key = i, zero = 0; 106 + int inner_fd, err; 107 + 108 + inner_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, NULL, 4, 4, 1, NULL); 109 + if (!ASSERT_OK_FD(inner_fd, "new array")) 110 + return -1; 111 + 112 + err = bpf_map_update_elem(inner_fd, &zero, &key, 0); 113 + if (!ASSERT_OK(err, "init array")) { 114 + close(inner_fd); 115 + return -1; 116 + } 117 + 118 + err = bpf_map_update_elem(fd, &key, &inner_fd, 0); 119 + if (!ASSERT_OK(err, "init outer")) { 120 + close(inner_fd); 121 + return -1; 122 + } 123 + close(inner_fd); 124 + } 125 + 126 + return 0; 127 + } 128 + 129 + static int get_int_from_env(const char *name, int dft) 130 + { 131 + const char *value; 132 + 133 + value = getenv(name); 134 + if (!value) 135 + return dft; 136 + 137 + return atoi(value); 138 + } 139 + 140 + void test_fd_htab_lookup(void) 141 + { 142 + unsigned int i, wr_nr = 8, rd_nr = 16; 143 + pthread_t tids[wr_nr + rd_nr]; 144 + struct fd_htab_lookup *skel; 145 + struct htab_op_ctx ctx; 146 + int err; 147 + 148 + skel = fd_htab_lookup__open_and_load(); 149 + if (!ASSERT_OK_PTR(skel, "fd_htab_lookup__open_and_load")) 150 + return; 151 + 152 + ctx.fd = bpf_map__fd(skel->maps.outer_map); 153 + ctx.loop = get_int_from_env("FD_HTAB_LOOP_NR", 5); 154 + ctx.stop = false; 155 + ctx.entries = 8; 156 + 157 + err = setup_htab(ctx.fd, ctx.entries); 158 + if (err) 159 + goto destroy; 160 + 161 + memset(tids, 0, sizeof(tids)); 162 + for (i = 0; i < wr_nr; i++) { 163 + err = pthread_create(&tids[i], NULL, htab_update_fn, &ctx); 164 + if (!ASSERT_OK(err, "pthread_create")) { 165 + ctx.stop = 
true; 166 + goto reap; 167 + } 168 + } 169 + for (i = 0; i < rd_nr; i++) { 170 + err = pthread_create(&tids[i + wr_nr], NULL, htab_lookup_fn, &ctx); 171 + if (!ASSERT_OK(err, "pthread_create")) { 172 + ctx.stop = true; 173 + goto reap; 174 + } 175 + } 176 + 177 + reap: 178 + for (i = 0; i < wr_nr + rd_nr; i++) { 179 + void *ret = NULL; 180 + char desc[32]; 181 + 182 + if (!tids[i]) 183 + continue; 184 + 185 + snprintf(desc, sizeof(desc), "thread %u", i + 1); 186 + err = pthread_join(tids[i], &ret); 187 + ASSERT_OK(err, desc); 188 + ASSERT_EQ(ret, NULL, desc); 189 + } 190 + destroy: 191 + fd_htab_lookup__destroy(skel); 192 + }
+1 -1
tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
···
 		goto destroy;
 
 	memset(buf, 0, sizeof(buf));
-	while (read(iter_fd, buf, sizeof(buf) > 0)) {
+	while (read(iter_fd, buf, sizeof(buf)) > 0) {
 		/* Read out all contents */
 		printf("%s", buf);
 	}
+6
tools/testing/selftests/bpf/prog_tests/linked_list.c
···
 
 #include "linked_list.skel.h"
 #include "linked_list_fail.skel.h"
+#include "linked_list_peek.skel.h"
 
 static char log_buf[1024 * 1024];
 
···
 	test_linked_list_success(LIST_IN_LIST, false);
 	test_linked_list_success(LIST_IN_LIST, true);
 	test_linked_list_success(TEST_ALL, false);
+}
+
+void test_linked_list_peek(void)
+{
+	RUN_TESTS(linked_list_peek);
 }
+6
tools/testing/selftests/bpf/prog_tests/rbtree.c
···
 
 #include "rbtree_fail.skel.h"
 #include "rbtree_btf_fail__wrong_node_type.skel.h"
 #include "rbtree_btf_fail__add_wrong_type.skel.h"
+#include "rbtree_search.skel.h"
 
 static void test_rbtree_add_nodes(void)
 {
···
 void test_rbtree_fail(void)
 {
 	RUN_TESTS(rbtree_fail);
+}
+
+void test_rbtree_search(void)
+{
+	RUN_TESTS(rbtree_search);
 }
+3 -1
tools/testing/selftests/bpf/prog_tests/sk_assign.c
···
 	tc = popen("tc -V", "r");
 	if (CHECK_FAIL(!tc))
 		return false;
-	if (CHECK_FAIL(!fgets(tc_version, sizeof(tc_version), tc)))
+	if (CHECK_FAIL(!fgets(tc_version, sizeof(tc_version), tc))) {
+		pclose(tc);
 		return false;
+	}
 	if (strstr(tc_version, ", libbpf "))
 		prog = "test_sk_assign_libbpf.bpf.o";
 	else
+79 -5
tools/testing/selftests/bpf/prog_tests/socket_helpers.h
··· 3 3 #ifndef __SOCKET_HELPERS__ 4 4 #define __SOCKET_HELPERS__ 5 5 6 + #include <sys/un.h> 6 7 #include <linux/vm_sockets.h> 7 8 8 9 /* include/linux/net.h */ ··· 170 169 *len = sizeof(*addr6); 171 170 } 172 171 172 + static inline void init_addr_loopback_unix(struct sockaddr_storage *ss, 173 + socklen_t *len) 174 + { 175 + struct sockaddr_un *addr = memset(ss, 0, sizeof(*ss)); 176 + 177 + addr->sun_family = AF_UNIX; 178 + *len = sizeof(sa_family_t); 179 + } 180 + 173 181 static inline void init_addr_loopback_vsock(struct sockaddr_storage *ss, 174 182 socklen_t *len) 175 183 { ··· 199 189 return; 200 190 case AF_INET6: 201 191 init_addr_loopback6(ss, len); 192 + return; 193 + case AF_UNIX: 194 + init_addr_loopback_unix(ss, len); 202 195 return; 203 196 case AF_VSOCK: 204 197 init_addr_loopback_vsock(ss, len); ··· 328 315 { 329 316 __close_fd int s, c = -1, p = -1; 330 317 struct sockaddr_storage addr; 331 - socklen_t len = sizeof(addr); 318 + socklen_t len; 332 319 int err; 333 320 334 321 s = socket_loopback(family, sotype); 335 322 if (s < 0) 336 323 return s; 337 324 338 - err = xgetsockname(s, sockaddr(&addr), &len); 339 - if (err) 340 - return err; 341 - 342 325 c = xsocket(family, sotype, 0); 343 326 if (c < 0) 344 327 return c; 328 + 329 + init_addr_loopback(family, &addr, &len); 330 + err = xbind(c, sockaddr(&addr), len); 331 + if (err) 332 + return err; 333 + 334 + len = sizeof(addr); 335 + err = xgetsockname(s, sockaddr(&addr), &len); 336 + if (err) 337 + return err; 345 338 346 339 err = connect(c, sockaddr(&addr), len); 347 340 if (err) { ··· 408 389 } 409 390 410 391 return err; 392 + } 393 + 394 + static inline const char *socket_kind_to_str(int sock_fd) 395 + { 396 + socklen_t opt_len; 397 + int domain, type; 398 + 399 + opt_len = sizeof(domain); 400 + if (getsockopt(sock_fd, SOL_SOCKET, SO_DOMAIN, &domain, &opt_len)) 401 + FAIL_ERRNO("getsockopt(SO_DOMAIN)"); 402 + 403 + opt_len = sizeof(type); 404 + if (getsockopt(sock_fd, SOL_SOCKET, SO_TYPE, 
&type, &opt_len)) 405 + FAIL_ERRNO("getsockopt(SO_TYPE)"); 406 + 407 + switch (domain) { 408 + case AF_INET: 409 + switch (type) { 410 + case SOCK_STREAM: 411 + return "tcp4"; 412 + case SOCK_DGRAM: 413 + return "udp4"; 414 + } 415 + break; 416 + case AF_INET6: 417 + switch (type) { 418 + case SOCK_STREAM: 419 + return "tcp6"; 420 + case SOCK_DGRAM: 421 + return "udp6"; 422 + } 423 + break; 424 + case AF_UNIX: 425 + switch (type) { 426 + case SOCK_STREAM: 427 + return "u_str"; 428 + case SOCK_DGRAM: 429 + return "u_dgr"; 430 + case SOCK_SEQPACKET: 431 + return "u_seq"; 432 + } 433 + break; 434 + case AF_VSOCK: 435 + switch (type) { 436 + case SOCK_STREAM: 437 + return "v_str"; 438 + case SOCK_DGRAM: 439 + return "v_dgr"; 440 + case SOCK_SEQPACKET: 441 + return "v_seq"; 442 + } 443 + break; 444 + } 445 + 446 + return "???"; 411 447 } 412 448 413 449 #endif // __SOCKET_HELPERS__
+11 -14
tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h
···
 
 #define MAX_TEST_NAME 80
 
+#define u32(v) ((u32){(v)})
+#define u64(v) ((u64){(v)})
+
 #define __always_unused __attribute__((__unused__))
 
 #define xbpf_map_delete_elem(fd, key)					\
 	({								\
 		int __ret = bpf_map_delete_elem((fd), (key));		\
-		if (__ret < 0)						\
+		if (__ret < 0)						\
 			FAIL_ERRNO("map_delete");			\
 		__ret;							\
 	})
···
 #define xbpf_map_lookup_elem(fd, key, val)				\
 	({								\
 		int __ret = bpf_map_lookup_elem((fd), (key), (val));	\
-		if (__ret < 0)						\
+		if (__ret < 0)						\
 			FAIL_ERRNO("map_lookup");			\
 		__ret;							\
 	})
···
 #define xbpf_map_update_elem(fd, key, val, flags)			\
 	({								\
 		int __ret = bpf_map_update_elem((fd), (key), (val), (flags)); \
-		if (__ret < 0)						\
+		if (__ret < 0)						\
 			FAIL_ERRNO("map_update");			\
 		__ret;							\
 	})
···
 #define xbpf_prog_attach(prog, target, type, flags)			\
 	({								\
 		int __ret =						\
 			bpf_prog_attach((prog), (target), (type), (flags)); \
-		if (__ret < 0)						\
+		if (__ret < 0)						\
 			FAIL_ERRNO("prog_attach(" #type ")");		\
 		__ret;							\
 	})
···
 #define xbpf_prog_detach2(prog, target, type)				\
 	({								\
 		int __ret = bpf_prog_detach2((prog), (target), (type));	\
-		if (__ret < 0)						\
+		if (__ret < 0)						\
 			FAIL_ERRNO("prog_detach2(" #type ")");		\
 		__ret;							\
 	})
···
 		__ret;							\
 	})
 
-static inline int add_to_sockmap(int sock_mapfd, int fd1, int fd2)
+static inline int add_to_sockmap(int mapfd, int fd1, int fd2)
 {
-	u64 value;
-	u32 key;
 	int err;
 
-	key = 0;
-	value = fd1;
-	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	err = xbpf_map_update_elem(mapfd, &u32(0), &u64(fd1), BPF_NOEXIST);
 	if (err)
 		return err;
 
-	key = 1;
-	value = fd2;
-	return xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	return xbpf_map_update_elem(mapfd, &u32(1), &u64(fd2), BPF_NOEXIST);
 }
 
 #endif // __SOCKMAP_HELPERS__
+241 -60
tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
···
 /*
  * Tests for sockmap/sockhash holding kTLS sockets.
  */
-
+#include <error.h>
 #include <netinet/tcp.h>
+#include <linux/tls.h>
 #include "test_progs.h"
+#include "sockmap_helpers.h"
+#include "test_skmsg_load_helpers.skel.h"
+#include "test_sockmap_ktls.skel.h"
 
 #define MAX_TEST_NAME 80
 #define TCP_ULP 31
 
-static int tcp_server(int family)
+static int init_ktls_pairs(int c, int p)
 {
-	int err, s;
+	int err;
+	struct tls12_crypto_info_aes_gcm_128 crypto_rx;
+	struct tls12_crypto_info_aes_gcm_128 crypto_tx;
 
-	s = socket(family, SOCK_STREAM, 0);
-	if (!ASSERT_GE(s, 0, "socket"))
-		return -1;
-
-	err = listen(s, SOMAXCONN);
-	if (!ASSERT_OK(err, "listen"))
-		return -1;
-
-	return s;
-}
-
-static int disconnect(int fd)
-{
-	struct sockaddr unspec = { AF_UNSPEC };
-
-	return connect(fd, &unspec, sizeof(unspec));
-}
-
-/* Disconnect (unhash) a kTLS socket after removing it from sockmap.
- */
-static void test_sockmap_ktls_disconnect_after_delete(int family, int map)
-{
-	struct sockaddr_storage addr = {0};
-	socklen_t len = sizeof(addr);
-	int err, cli, srv, zero = 0;
-
-	srv = tcp_server(family);
-	if (srv == -1)
-		return;
-
-	err = getsockname(srv, (struct sockaddr *)&addr, &len);
-	if (!ASSERT_OK(err, "getsockopt"))
-		goto close_srv;
-
-	cli = socket(family, SOCK_STREAM, 0);
-	if (!ASSERT_GE(cli, 0, "socket"))
-		goto close_srv;
-
-	err = connect(cli, (struct sockaddr *)&addr, len);
-	if (!ASSERT_OK(err, "connect"))
-		goto close_cli;
-
-	err = bpf_map_update_elem(map, &zero, &cli, 0);
-	if (!ASSERT_OK(err, "bpf_map_update_elem"))
-		goto close_cli;
-
-	err = setsockopt(cli, IPPROTO_TCP, TCP_ULP, "tls", strlen("tls"));
+	err = setsockopt(c, IPPROTO_TCP, TCP_ULP, "tls", strlen("tls"));
 	if (!ASSERT_OK(err, "setsockopt(TCP_ULP)"))
-		goto close_cli;
+		goto out;
 
-	err = bpf_map_delete_elem(map, &zero);
-	if (!ASSERT_OK(err, "bpf_map_delete_elem"))
-		goto close_cli;
+	err = setsockopt(p, IPPROTO_TCP, TCP_ULP, "tls", strlen("tls"));
+	if (!ASSERT_OK(err, "setsockopt(TCP_ULP)"))
+		goto out;
 
-	err = disconnect(cli);
+	memset(&crypto_rx, 0, sizeof(crypto_rx));
+	memset(&crypto_tx, 0, sizeof(crypto_tx));
+	crypto_rx.info.version = TLS_1_2_VERSION;
+	crypto_tx.info.version = TLS_1_2_VERSION;
+	crypto_rx.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	crypto_tx.info.cipher_type = TLS_CIPHER_AES_GCM_128;
 
-close_cli:
-	close(cli);
-close_srv:
-	close(srv);
+	err = setsockopt(c, SOL_TLS, TLS_TX, &crypto_tx, sizeof(crypto_tx));
+	if (!ASSERT_OK(err, "setsockopt(TLS_TX)"))
+		goto out;
+
+	err = setsockopt(p, SOL_TLS, TLS_RX, &crypto_rx, sizeof(crypto_rx));
+	if (!ASSERT_OK(err, "setsockopt(TLS_RX)"))
+		goto out;
+	return 0;
+out:
+	return -1;
+}
+
+static int create_ktls_pairs(int family, int sotype, int *c, int *p)
+{
+	int err;
+
+	err = create_pair(family, sotype, c, p);
+	if (!ASSERT_OK(err, "create_pair()"))
+		return -1;
+
+	err = init_ktls_pairs(*c, *p);
+	if (!ASSERT_OK(err, "init_ktls_pairs(c, p)"))
+		return -1;
+	return 0;
 }
 
 static void test_sockmap_ktls_update_fails_when_sock_has_ulp(int family, int map)
···
 	return test_name;
 }
 
+static void test_sockmap_ktls_offload(int family, int sotype)
+{
+	int err;
+	int c = 0, p = 0, sent, recvd;
+	char msg[12] = "hello world\0";
+	char rcv[13];
+
+	err = create_ktls_pairs(family, sotype, &c, &p);
+	if (!ASSERT_OK(err, "create_ktls_pairs()"))
+		goto out;
+
+	sent = send(c, msg, sizeof(msg), 0);
+	if (!ASSERT_GE(sent, 0, "send(msg)"))
+		goto out;
+
+	recvd = recv(p, rcv, sizeof(rcv), 0);
+	if (!ASSERT_GE(recvd, 0, "recv(msg)") ||
+	    !ASSERT_EQ(recvd, sent, "length mismatch"))
+		goto out;
+
+	ASSERT_OK(memcmp(msg, rcv, sizeof(msg)), "data mismatch");
+
+out:
+	if (c)
+		close(c);
+	if (p)
+		close(p);
+}
+
+static void test_sockmap_ktls_tx_cork(int family, int sotype, bool push)
+{
+	int err, off;
+	int i, j;
+	int start_push = 0, push_len = 0;
+	int c = 0, p = 0, one = 1, sent, recvd;
+	int prog_fd, map_fd;
+	char msg[12] = "hello world\0";
+	char rcv[20] = {0};
+	struct test_sockmap_ktls *skel;
+
+	skel = test_sockmap_ktls__open_and_load();
+	if (!ASSERT_TRUE(skel, "open ktls skel"))
+		return;
+
+	err = create_pair(family, sotype, &c, &p);
+	if (!ASSERT_OK(err, "create_pair()"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs.prog_sk_policy);
+	map_fd = bpf_map__fd(skel->maps.sock_map);
+
+	err = bpf_prog_attach(prog_fd, map_fd, BPF_SK_MSG_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach sk msg"))
+		goto out;
+
+	err = bpf_map_update_elem(map_fd, &one, &c, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(c)"))
+		goto out;
+
+	err = init_ktls_pairs(c, p);
+	if (!ASSERT_OK(err, "init_ktls_pairs(c, p)"))
+		goto out;
+
+	skel->bss->cork_byte = sizeof(msg);
+	if (push) {
+		start_push = 1;
+		push_len = 2;
+	}
+	skel->bss->push_start = start_push;
+	skel->bss->push_end = push_len;
+
+	off = sizeof(msg) / 2;
+	sent = send(c, msg, off, 0);
+	if (!ASSERT_EQ(sent, off, "send(msg)"))
+		goto out;
+
+	recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, 1);
+	if (!ASSERT_EQ(-1, recvd, "expected no data"))
+		goto out;
+
+	/* send remaining msg */
+	sent = send(c, msg + off, sizeof(msg) - off, 0);
+	if (!ASSERT_EQ(sent, sizeof(msg) - off, "send remaining data"))
+		goto out;
+
+	recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, 1);
+	if (!ASSERT_EQ(recvd, sizeof(msg) + push_len, "check length mismatch"))
+		goto out;
+
+	for (i = 0, j = 0; i < recvd;) {
+		/* skip checking the data that has been pushed in */
+		if (i >= start_push && i <= start_push + push_len - 1) {
+			i++;
+			continue;
+		}
+		if (!ASSERT_EQ(rcv[i], msg[j], "data mismatch"))
+			goto out;
+		i++;
+		j++;
+	}
+out:
+	if (c)
+		close(c);
+	if (p)
+		close(p);
+	test_sockmap_ktls__destroy(skel);
+}
+
+static void test_sockmap_ktls_tx_no_buf(int family, int sotype, bool push)
+{
+	int c = -1, p = -1, one = 1, two = 2;
+	struct test_sockmap_ktls *skel;
+	unsigned char *data = NULL;
+	struct msghdr msg = {0};
+	struct iovec iov[2];
+	int prog_fd, map_fd;
+	int txrx_buf = 1024;
+	int iov_length = 8192;
+	int err;
+
+	skel = test_sockmap_ktls__open_and_load();
+	if (!ASSERT_TRUE(skel, "open ktls skel"))
+		return;
+
+	err = create_pair(family, sotype, &c, &p);
+	if (!ASSERT_OK(err, "create_pair()"))
+		goto out;
+
+	err = setsockopt(c, SOL_SOCKET, SO_RCVBUFFORCE, &txrx_buf, sizeof(int));
+	err |= setsockopt(p, SOL_SOCKET, SO_SNDBUFFORCE, &txrx_buf, sizeof(int));
+	if (!ASSERT_OK(err, "set buf limit"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs.prog_sk_policy_redir);
+	map_fd = bpf_map__fd(skel->maps.sock_map);
+
+	err = bpf_prog_attach(prog_fd, map_fd, BPF_SK_MSG_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach sk msg"))
+		goto out;
+
+	err = bpf_map_update_elem(map_fd, &one, &c, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(c)"))
+		goto out;
+
+	err = bpf_map_update_elem(map_fd, &two, &p, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(p)"))
+		goto out;
+
+	skel->bss->apply_bytes = 1024;
+
+	err = init_ktls_pairs(c, p);
+	if (!ASSERT_OK(err, "init_ktls_pairs(c, p)"))
+		goto out;
+
+	data = calloc(iov_length, sizeof(char));
+	if (!data)
+		goto out;
+
+	iov[0].iov_base = data;
+	iov[0].iov_len = iov_length;
+	iov[1].iov_base = data;
+	iov[1].iov_len = iov_length;
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+
+	for (;;) {
+		err = sendmsg(c, &msg, MSG_DONTWAIT);
+		if (err <= 0)
+			break;
+	}
+
+out:
+	if (data)
+		free(data);
+	if (c != -1)
+		close(c);
+	if (p != -1)
+		close(p);
+
+	test_sockmap_ktls__destroy(skel);
+}
+
 static void run_tests(int family, enum bpf_map_type map_type)
 {
 	int map;
···
 	if (!ASSERT_GE(map, 0, "bpf_map_create"))
 		return;
 
-	if (test__start_subtest(fmt_test_name("disconnect_after_delete", family, map_type)))
-		test_sockmap_ktls_disconnect_after_delete(family, map);
 	if (test__start_subtest(fmt_test_name("update_fails_when_sock_has_ulp", family, map_type)))
 		test_sockmap_ktls_update_fails_when_sock_has_ulp(family, map);
 
 	close(map);
+}
+
+static void run_ktls_test(int family, int sotype)
+{
+	if (test__start_subtest("tls simple offload"))
+		test_sockmap_ktls_offload(family, sotype);
+	if (test__start_subtest("tls tx cork"))
+		test_sockmap_ktls_tx_cork(family, sotype, false);
+	if (test__start_subtest("tls tx cork with push"))
+		test_sockmap_ktls_tx_cork(family, sotype, true);
+	if (test__start_subtest("tls tx egress with no buf"))
+		test_sockmap_ktls_tx_no_buf(family, sotype, true);
 }
 
 void test_sockmap_ktls(void)
···
 	run_tests(AF_INET, BPF_MAP_TYPE_SOCKHASH);
 	run_tests(AF_INET6, BPF_MAP_TYPE_SOCKMAP);
 	run_tests(AF_INET6, BPF_MAP_TYPE_SOCKHASH);
+	run_ktls_test(AF_INET, SOCK_STREAM);
+	run_ktls_test(AF_INET6, SOCK_STREAM);
 }
tools/testing/selftests/bpf/prog_tests/sockmap_listen.c (-457 lines)
··· 1366 1366 } 1367 1367 } 1368 1368 1369 - static void pairs_redir_to_connected(int cli0, int peer0, int cli1, int peer1, 1370 - int sock_mapfd, int nop_mapfd, 1371 - int verd_mapfd, enum redir_mode mode, 1372 - int send_flags) 1373 - { 1374 - const char *log_prefix = redir_mode_str(mode); 1375 - unsigned int pass; 1376 - int err, n; 1377 - u32 key; 1378 - char b; 1379 - 1380 - zero_verdict_count(verd_mapfd); 1381 - 1382 - err = add_to_sockmap(sock_mapfd, peer0, peer1); 1383 - if (err) 1384 - return; 1385 - 1386 - if (nop_mapfd >= 0) { 1387 - err = add_to_sockmap(nop_mapfd, cli0, cli1); 1388 - if (err) 1389 - return; 1390 - } 1391 - 1392 - /* Last byte is OOB data when send_flags has MSG_OOB bit set */ 1393 - n = xsend(cli1, "ab", 2, send_flags); 1394 - if (n >= 0 && n < 2) 1395 - FAIL("%s: incomplete send", log_prefix); 1396 - if (n < 2) 1397 - return; 1398 - 1399 - key = SK_PASS; 1400 - err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass); 1401 - if (err) 1402 - return; 1403 - if (pass != 1) 1404 - FAIL("%s: want pass count 1, have %d", log_prefix, pass); 1405 - 1406 - n = recv_timeout(mode == REDIR_INGRESS ? 
peer0 : cli0, &b, 1, 0, IO_TIMEOUT_SEC); 1407 - if (n < 0) 1408 - FAIL_ERRNO("%s: recv_timeout", log_prefix); 1409 - if (n == 0) 1410 - FAIL("%s: incomplete recv", log_prefix); 1411 - 1412 - if (send_flags & MSG_OOB) { 1413 - /* Check that we can't read OOB while in sockmap */ 1414 - errno = 0; 1415 - n = recv(peer1, &b, 1, MSG_OOB | MSG_DONTWAIT); 1416 - if (n != -1 || errno != EOPNOTSUPP) 1417 - FAIL("%s: recv(MSG_OOB): expected EOPNOTSUPP: retval=%d errno=%d", 1418 - log_prefix, n, errno); 1419 - 1420 - /* Remove peer1 from sockmap */ 1421 - xbpf_map_delete_elem(sock_mapfd, &(int){ 1 }); 1422 - 1423 - /* Check that OOB was dropped on redirect */ 1424 - errno = 0; 1425 - n = recv(peer1, &b, 1, MSG_OOB | MSG_DONTWAIT); 1426 - if (n != -1 || errno != EINVAL) 1427 - FAIL("%s: recv(MSG_OOB): expected EINVAL: retval=%d errno=%d", 1428 - log_prefix, n, errno); 1429 - } 1430 - } 1431 - 1432 - static void unix_redir_to_connected(int sotype, int sock_mapfd, 1433 - int verd_mapfd, enum redir_mode mode) 1434 - { 1435 - int c0, c1, p0, p1; 1436 - int sfd[2]; 1437 - 1438 - if (socketpair(AF_UNIX, sotype | SOCK_NONBLOCK, 0, sfd)) 1439 - return; 1440 - c0 = sfd[0], p0 = sfd[1]; 1441 - 1442 - if (socketpair(AF_UNIX, sotype | SOCK_NONBLOCK, 0, sfd)) 1443 - goto close0; 1444 - c1 = sfd[0], p1 = sfd[1]; 1445 - 1446 - pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, -1, verd_mapfd, 1447 - mode, NO_FLAGS); 1448 - 1449 - xclose(c1); 1450 - xclose(p1); 1451 - close0: 1452 - xclose(c0); 1453 - xclose(p0); 1454 - } 1455 - 1456 - static void unix_skb_redir_to_connected(struct test_sockmap_listen *skel, 1457 - struct bpf_map *inner_map, int sotype) 1458 - { 1459 - int verdict = bpf_program__fd(skel->progs.prog_skb_verdict); 1460 - int verdict_map = bpf_map__fd(skel->maps.verdict_map); 1461 - int sock_map = bpf_map__fd(inner_map); 1462 - int err; 1463 - 1464 - err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0); 1465 - if (err) 1466 - return; 1467 - 1468 - 
skel->bss->test_ingress = false; 1469 - unix_redir_to_connected(sotype, sock_map, verdict_map, REDIR_EGRESS); 1470 - skel->bss->test_ingress = true; 1471 - unix_redir_to_connected(sotype, sock_map, verdict_map, REDIR_INGRESS); 1472 - 1473 - xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT); 1474 - } 1475 - 1476 - static void test_unix_redir(struct test_sockmap_listen *skel, struct bpf_map *map, 1477 - int sotype) 1478 - { 1479 - const char *family_name, *map_name; 1480 - char s[MAX_TEST_NAME]; 1481 - 1482 - family_name = family_str(AF_UNIX); 1483 - map_name = map_type_str(map); 1484 - snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__); 1485 - if (!test__start_subtest(s)) 1486 - return; 1487 - unix_skb_redir_to_connected(skel, map, sotype); 1488 - } 1489 - 1490 - /* Returns two connected loopback vsock sockets */ 1491 - static int vsock_socketpair_connectible(int sotype, int *v0, int *v1) 1492 - { 1493 - return create_pair(AF_VSOCK, sotype | SOCK_NONBLOCK, v0, v1); 1494 - } 1495 - 1496 - static void vsock_unix_redir_connectible(int sock_mapfd, int verd_mapfd, 1497 - enum redir_mode mode, int sotype) 1498 - { 1499 - const char *log_prefix = redir_mode_str(mode); 1500 - char a = 'a', b = 'b'; 1501 - int u0, u1, v0, v1; 1502 - int sfd[2]; 1503 - unsigned int pass; 1504 - int err, n; 1505 - u32 key; 1506 - 1507 - zero_verdict_count(verd_mapfd); 1508 - 1509 - if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sfd)) 1510 - return; 1511 - 1512 - u0 = sfd[0]; 1513 - u1 = sfd[1]; 1514 - 1515 - err = vsock_socketpair_connectible(sotype, &v0, &v1); 1516 - if (err) { 1517 - FAIL("vsock_socketpair_connectible() failed"); 1518 - goto close_uds; 1519 - } 1520 - 1521 - err = add_to_sockmap(sock_mapfd, u0, v0); 1522 - if (err) { 1523 - FAIL("add_to_sockmap failed"); 1524 - goto close_vsock; 1525 - } 1526 - 1527 - n = write(v1, &a, sizeof(a)); 1528 - if (n < 0) 1529 - FAIL_ERRNO("%s: write", log_prefix); 1530 - if (n == 0) 1531 - FAIL("%s: incomplete 
write", log_prefix); 1532 - if (n < 1) 1533 - goto out; 1534 - 1535 - n = xrecv_nonblock(mode == REDIR_INGRESS ? u0 : u1, &b, sizeof(b), 0); 1536 - if (n < 0) 1537 - FAIL("%s: recv() err, errno=%d", log_prefix, errno); 1538 - if (n == 0) 1539 - FAIL("%s: incomplete recv", log_prefix); 1540 - if (b != a) 1541 - FAIL("%s: vsock socket map failed, %c != %c", log_prefix, a, b); 1542 - 1543 - key = SK_PASS; 1544 - err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass); 1545 - if (err) 1546 - goto out; 1547 - if (pass != 1) 1548 - FAIL("%s: want pass count 1, have %d", log_prefix, pass); 1549 - out: 1550 - key = 0; 1551 - bpf_map_delete_elem(sock_mapfd, &key); 1552 - key = 1; 1553 - bpf_map_delete_elem(sock_mapfd, &key); 1554 - 1555 - close_vsock: 1556 - close(v0); 1557 - close(v1); 1558 - 1559 - close_uds: 1560 - close(u0); 1561 - close(u1); 1562 - } 1563 - 1564 - static void vsock_unix_skb_redir_connectible(struct test_sockmap_listen *skel, 1565 - struct bpf_map *inner_map, 1566 - int sotype) 1567 - { 1568 - int verdict = bpf_program__fd(skel->progs.prog_skb_verdict); 1569 - int verdict_map = bpf_map__fd(skel->maps.verdict_map); 1570 - int sock_map = bpf_map__fd(inner_map); 1571 - int err; 1572 - 1573 - err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0); 1574 - if (err) 1575 - return; 1576 - 1577 - skel->bss->test_ingress = false; 1578 - vsock_unix_redir_connectible(sock_map, verdict_map, REDIR_EGRESS, sotype); 1579 - skel->bss->test_ingress = true; 1580 - vsock_unix_redir_connectible(sock_map, verdict_map, REDIR_INGRESS, sotype); 1581 - 1582 - xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT); 1583 - } 1584 - 1585 - static void test_vsock_redir(struct test_sockmap_listen *skel, struct bpf_map *map) 1586 - { 1587 - const char *family_name, *map_name; 1588 - char s[MAX_TEST_NAME]; 1589 - 1590 - family_name = family_str(AF_VSOCK); 1591 - map_name = map_type_str(map); 1592 - snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__); 1593 - if 
(!test__start_subtest(s)) 1594 - return; 1595 - 1596 - vsock_unix_skb_redir_connectible(skel, map, SOCK_STREAM); 1597 - vsock_unix_skb_redir_connectible(skel, map, SOCK_SEQPACKET); 1598 - } 1599 - 1600 1369 static void test_reuseport(struct test_sockmap_listen *skel, 1601 1370 struct bpf_map *map, int family, int sotype) 1602 1371 { ··· 1406 1637 } 1407 1638 } 1408 1639 1409 - static int inet_socketpair(int family, int type, int *s, int *c) 1410 - { 1411 - return create_pair(family, type | SOCK_NONBLOCK, s, c); 1412 - } 1413 - 1414 - static void udp_redir_to_connected(int family, int sock_mapfd, int verd_mapfd, 1415 - enum redir_mode mode) 1416 - { 1417 - int c0, c1, p0, p1; 1418 - int err; 1419 - 1420 - err = inet_socketpair(family, SOCK_DGRAM, &p0, &c0); 1421 - if (err) 1422 - return; 1423 - err = inet_socketpair(family, SOCK_DGRAM, &p1, &c1); 1424 - if (err) 1425 - goto close_cli0; 1426 - 1427 - pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, -1, verd_mapfd, 1428 - mode, NO_FLAGS); 1429 - 1430 - xclose(c1); 1431 - xclose(p1); 1432 - close_cli0: 1433 - xclose(c0); 1434 - xclose(p0); 1435 - } 1436 - 1437 - static void udp_skb_redir_to_connected(struct test_sockmap_listen *skel, 1438 - struct bpf_map *inner_map, int family) 1439 - { 1440 - int verdict = bpf_program__fd(skel->progs.prog_skb_verdict); 1441 - int verdict_map = bpf_map__fd(skel->maps.verdict_map); 1442 - int sock_map = bpf_map__fd(inner_map); 1443 - int err; 1444 - 1445 - err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0); 1446 - if (err) 1447 - return; 1448 - 1449 - skel->bss->test_ingress = false; 1450 - udp_redir_to_connected(family, sock_map, verdict_map, REDIR_EGRESS); 1451 - skel->bss->test_ingress = true; 1452 - udp_redir_to_connected(family, sock_map, verdict_map, REDIR_INGRESS); 1453 - 1454 - xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT); 1455 - } 1456 - 1457 - static void test_udp_redir(struct test_sockmap_listen *skel, struct bpf_map *map, 1458 - int family) 
1459 - { 1460 - const char *family_name, *map_name; 1461 - char s[MAX_TEST_NAME]; 1462 - 1463 - family_name = family_str(family); 1464 - map_name = map_type_str(map); 1465 - snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__); 1466 - if (!test__start_subtest(s)) 1467 - return; 1468 - udp_skb_redir_to_connected(skel, map, family); 1469 - } 1470 - 1471 - static void inet_unix_redir_to_connected(int family, int type, int sock_mapfd, 1472 - int verd_mapfd, enum redir_mode mode) 1473 - { 1474 - int c0, c1, p0, p1; 1475 - int sfd[2]; 1476 - int err; 1477 - 1478 - if (socketpair(AF_UNIX, type | SOCK_NONBLOCK, 0, sfd)) 1479 - return; 1480 - c0 = sfd[0], p0 = sfd[1]; 1481 - 1482 - err = inet_socketpair(family, type, &p1, &c1); 1483 - if (err) 1484 - goto close; 1485 - 1486 - pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, -1, verd_mapfd, 1487 - mode, NO_FLAGS); 1488 - 1489 - xclose(c1); 1490 - xclose(p1); 1491 - close: 1492 - xclose(c0); 1493 - xclose(p0); 1494 - } 1495 - 1496 - static void inet_unix_skb_redir_to_connected(struct test_sockmap_listen *skel, 1497 - struct bpf_map *inner_map, int family) 1498 - { 1499 - int verdict = bpf_program__fd(skel->progs.prog_skb_verdict); 1500 - int verdict_map = bpf_map__fd(skel->maps.verdict_map); 1501 - int sock_map = bpf_map__fd(inner_map); 1502 - int err; 1503 - 1504 - err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0); 1505 - if (err) 1506 - return; 1507 - 1508 - skel->bss->test_ingress = false; 1509 - inet_unix_redir_to_connected(family, SOCK_DGRAM, sock_map, verdict_map, 1510 - REDIR_EGRESS); 1511 - inet_unix_redir_to_connected(family, SOCK_STREAM, sock_map, verdict_map, 1512 - REDIR_EGRESS); 1513 - skel->bss->test_ingress = true; 1514 - inet_unix_redir_to_connected(family, SOCK_DGRAM, sock_map, verdict_map, 1515 - REDIR_INGRESS); 1516 - inet_unix_redir_to_connected(family, SOCK_STREAM, sock_map, verdict_map, 1517 - REDIR_INGRESS); 1518 - 1519 - xbpf_prog_detach2(verdict, sock_map, 
BPF_SK_SKB_VERDICT); 1520 - } 1521 - 1522 - static void unix_inet_redir_to_connected(int family, int type, int sock_mapfd, 1523 - int nop_mapfd, int verd_mapfd, 1524 - enum redir_mode mode, int send_flags) 1525 - { 1526 - int c0, c1, p0, p1; 1527 - int sfd[2]; 1528 - int err; 1529 - 1530 - err = inet_socketpair(family, type, &p0, &c0); 1531 - if (err) 1532 - return; 1533 - 1534 - if (socketpair(AF_UNIX, type | SOCK_NONBLOCK, 0, sfd)) 1535 - goto close_cli0; 1536 - c1 = sfd[0], p1 = sfd[1]; 1537 - 1538 - pairs_redir_to_connected(c0, p0, c1, p1, sock_mapfd, nop_mapfd, 1539 - verd_mapfd, mode, send_flags); 1540 - 1541 - xclose(c1); 1542 - xclose(p1); 1543 - close_cli0: 1544 - xclose(c0); 1545 - xclose(p0); 1546 - } 1547 - 1548 - static void unix_inet_skb_redir_to_connected(struct test_sockmap_listen *skel, 1549 - struct bpf_map *inner_map, int family) 1550 - { 1551 - int verdict = bpf_program__fd(skel->progs.prog_skb_verdict); 1552 - int nop_map = bpf_map__fd(skel->maps.nop_map); 1553 - int verdict_map = bpf_map__fd(skel->maps.verdict_map); 1554 - int sock_map = bpf_map__fd(inner_map); 1555 - int err; 1556 - 1557 - err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0); 1558 - if (err) 1559 - return; 1560 - 1561 - skel->bss->test_ingress = false; 1562 - unix_inet_redir_to_connected(family, SOCK_DGRAM, 1563 - sock_map, -1, verdict_map, 1564 - REDIR_EGRESS, NO_FLAGS); 1565 - unix_inet_redir_to_connected(family, SOCK_STREAM, 1566 - sock_map, -1, verdict_map, 1567 - REDIR_EGRESS, NO_FLAGS); 1568 - 1569 - unix_inet_redir_to_connected(family, SOCK_DGRAM, 1570 - sock_map, nop_map, verdict_map, 1571 - REDIR_EGRESS, NO_FLAGS); 1572 - unix_inet_redir_to_connected(family, SOCK_STREAM, 1573 - sock_map, nop_map, verdict_map, 1574 - REDIR_EGRESS, NO_FLAGS); 1575 - 1576 - /* MSG_OOB not supported by AF_UNIX SOCK_DGRAM */ 1577 - unix_inet_redir_to_connected(family, SOCK_STREAM, 1578 - sock_map, nop_map, verdict_map, 1579 - REDIR_EGRESS, MSG_OOB); 1580 - 1581 - 
skel->bss->test_ingress = true; 1582 - unix_inet_redir_to_connected(family, SOCK_DGRAM, 1583 - sock_map, -1, verdict_map, 1584 - REDIR_INGRESS, NO_FLAGS); 1585 - unix_inet_redir_to_connected(family, SOCK_STREAM, 1586 - sock_map, -1, verdict_map, 1587 - REDIR_INGRESS, NO_FLAGS); 1588 - 1589 - unix_inet_redir_to_connected(family, SOCK_DGRAM, 1590 - sock_map, nop_map, verdict_map, 1591 - REDIR_INGRESS, NO_FLAGS); 1592 - unix_inet_redir_to_connected(family, SOCK_STREAM, 1593 - sock_map, nop_map, verdict_map, 1594 - REDIR_INGRESS, NO_FLAGS); 1595 - 1596 - /* MSG_OOB not supported by AF_UNIX SOCK_DGRAM */ 1597 - unix_inet_redir_to_connected(family, SOCK_STREAM, 1598 - sock_map, nop_map, verdict_map, 1599 - REDIR_INGRESS, MSG_OOB); 1600 - 1601 - xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT); 1602 - } 1603 - 1604 - static void test_udp_unix_redir(struct test_sockmap_listen *skel, struct bpf_map *map, 1605 - int family) 1606 - { 1607 - const char *family_name, *map_name; 1608 - struct netns_obj *netns; 1609 - char s[MAX_TEST_NAME]; 1610 - 1611 - family_name = family_str(family); 1612 - map_name = map_type_str(map); 1613 - snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__); 1614 - if (!test__start_subtest(s)) 1615 - return; 1616 - 1617 - netns = netns_new("sockmap_listen", true); 1618 - if (!ASSERT_OK_PTR(netns, "netns_new")) 1619 - return; 1620 - 1621 - inet_unix_skb_redir_to_connected(skel, map, family); 1622 - unix_inet_skb_redir_to_connected(skel, map, family); 1623 - 1624 - netns_free(netns); 1625 - } 1626 - 1627 1640 static void run_tests(struct test_sockmap_listen *skel, struct bpf_map *map, 1628 1641 int family) 1629 1642 { ··· 1414 1863 test_redir(skel, map, family, SOCK_STREAM); 1415 1864 test_reuseport(skel, map, family, SOCK_STREAM); 1416 1865 test_reuseport(skel, map, family, SOCK_DGRAM); 1417 - test_udp_redir(skel, map, family); 1418 - test_udp_unix_redir(skel, map, family); 1419 1866 } 1420 1867 1421 1868 void 
serial_test_sockmap_listen(void) ··· 1429 1880 skel->bss->test_sockmap = true; 1430 1881 run_tests(skel, skel->maps.sock_map, AF_INET); 1431 1882 run_tests(skel, skel->maps.sock_map, AF_INET6); 1432 - test_unix_redir(skel, skel->maps.sock_map, SOCK_DGRAM); 1433 - test_unix_redir(skel, skel->maps.sock_map, SOCK_STREAM); 1434 - test_vsock_redir(skel, skel->maps.sock_map); 1435 1883 1436 1884 skel->bss->test_sockmap = false; 1437 1885 run_tests(skel, skel->maps.sock_hash, AF_INET); 1438 1886 run_tests(skel, skel->maps.sock_hash, AF_INET6); 1439 - test_unix_redir(skel, skel->maps.sock_hash, SOCK_DGRAM); 1440 - test_unix_redir(skel, skel->maps.sock_hash, SOCK_STREAM); 1441 - test_vsock_redir(skel, skel->maps.sock_hash); 1442 1887 1443 1888 test_sockmap_listen__destroy(skel); 1444 1889 }
tools/testing/selftests/bpf/prog_tests/sockmap_redir.c (+465 lines)
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test for sockmap/sockhash redirection.
+ *
+ * BPF_MAP_TYPE_SOCKMAP
+ * BPF_MAP_TYPE_SOCKHASH
+ *	x
+ * sk_msg-to-egress
+ * sk_msg-to-ingress
+ * sk_skb-to-egress
+ * sk_skb-to-ingress
+ *	x
+ * AF_INET, SOCK_STREAM
+ * AF_INET6, SOCK_STREAM
+ * AF_INET, SOCK_DGRAM
+ * AF_INET6, SOCK_DGRAM
+ * AF_UNIX, SOCK_STREAM
+ * AF_UNIX, SOCK_DGRAM
+ * AF_VSOCK, SOCK_STREAM
+ * AF_VSOCK, SOCK_SEQPACKET
+ */
+
+#include <errno.h>
+#include <error.h>
+#include <sched.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <netinet/in.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <sys/un.h>
+#include <linux/string.h>
+#include <linux/vm_sockets.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "linux/const.h"
+#include "test_progs.h"
+#include "sockmap_helpers.h"
+#include "test_sockmap_redir.skel.h"
+
+/* The meaning of SUPPORTED is "will redirect packet as expected".
+ */
+#define SUPPORTED		_BITUL(0)
+
+/* Note on sk_skb-to-ingress -> af_vsock:
+ *
+ * Peer socket may receive the packet some time after the return from sendmsg().
+ * In a typical usage scenario, recvmsg() will block until the redirected packet
+ * appears in the destination queue, or timeout if the packet was dropped. By
+ * that point, the verdict map has already been updated to reflect what has
+ * happened.
+ *
+ * But sk_skb-to-ingress/af_vsock is an unsupported combination, so no recvmsg()
+ * takes place. Which means we may race the execution of the verdict logic and
+ * read map_verd before it has been updated, i.e. we might observe
+ * map_verd[SK_DROP]=0 instead of map_verd[SK_DROP]=1.
+ *
+ * This confuses the selftest logic: if there was no packet dropped, where's the
+ * packet? So here's a heuristic: on map_verd[SK_DROP]=map_verd[SK_PASS]=0
+ * (which implies the verdict program has not been run) just re-read the verdict
+ * map again.
+ */
+#define UNSUPPORTED_RACY_VERD	_BITUL(1)
+
+enum prog_type {
+	SK_MSG_EGRESS,
+	SK_MSG_INGRESS,
+	SK_SKB_EGRESS,
+	SK_SKB_INGRESS,
+};
+
+enum {
+	SEND_INNER = 0,
+	SEND_OUTER,
+};
+
+enum {
+	RECV_INNER = 0,
+	RECV_OUTER,
+};
+
+struct maps {
+	int in;
+	int out;
+	int verd;
+};
+
+struct combo_spec {
+	enum prog_type prog_type;
+	const char *in, *out;
+};
+
+struct redir_spec {
+	const char *name;
+	int idx_send;
+	int idx_recv;
+	enum prog_type prog_type;
+};
+
+struct socket_spec {
+	int family;
+	int sotype;
+	int send_flags;
+	int in[2];
+	int out[2];
+};
+
+static int socket_spec_pairs(struct socket_spec *s)
+{
+	return create_socket_pairs(s->family, s->sotype,
+				   &s->in[0], &s->out[0],
+				   &s->in[1], &s->out[1]);
+}
+
+static void socket_spec_close(struct socket_spec *s)
+{
+	xclose(s->in[0]);
+	xclose(s->in[1]);
+	xclose(s->out[0]);
+	xclose(s->out[1]);
+}
+
+static void get_redir_params(struct redir_spec *redir,
+			     struct test_sockmap_redir *skel, int *prog_fd,
+			     enum bpf_attach_type *attach_type,
+			     int *redirect_flags)
+{
+	enum prog_type type = redir->prog_type;
+	struct bpf_program *prog;
+	bool sk_msg;
+
+	sk_msg = type == SK_MSG_INGRESS || type == SK_MSG_EGRESS;
+	prog = sk_msg ? skel->progs.prog_msg_verdict : skel->progs.prog_skb_verdict;
+
+	*prog_fd = bpf_program__fd(prog);
+	*attach_type = sk_msg ? BPF_SK_MSG_VERDICT : BPF_SK_SKB_VERDICT;
+
+	if (type == SK_MSG_INGRESS || type == SK_SKB_INGRESS)
+		*redirect_flags = BPF_F_INGRESS;
+	else
+		*redirect_flags = 0;
+}
+
+static void try_recv(const char *prefix, int fd, int flags, bool expect_success)
+{
+	ssize_t n;
+	char buf;
+
+	errno = 0;
+	n = recv(fd, &buf, 1, flags);
+	if (n < 0 && expect_success)
+		FAIL_ERRNO("%s: unexpected failure: retval=%zd", prefix, n);
+	if (!n && !expect_success)
+		FAIL("%s: expected failure: retval=%zd", prefix, n);
+}
+
+static void handle_unsupported(int sd_send, int sd_peer, int sd_in, int sd_out,
+			       int sd_recv, int map_verd, int status)
+{
+	unsigned int drop, pass;
+	char recv_buf;
+	ssize_t n;
+
+get_verdict:
+	if (xbpf_map_lookup_elem(map_verd, &u32(SK_DROP), &drop) ||
+	    xbpf_map_lookup_elem(map_verd, &u32(SK_PASS), &pass))
+		return;
+
+	if (pass == 0 && drop == 0 && (status & UNSUPPORTED_RACY_VERD)) {
+		sched_yield();
+		goto get_verdict;
+	}
+
+	if (pass != 0) {
+		FAIL("unsupported: wanted verdict pass 0, have %u", pass);
+		return;
+	}
+
+	/* If nothing was dropped, packet should have reached the peer */
+	if (drop == 0) {
+		errno = 0;
+		n = recv_timeout(sd_peer, &recv_buf, 1, 0, IO_TIMEOUT_SEC);
+		if (n != 1)
+			FAIL_ERRNO("unsupported: packet missing, retval=%zd", n);
+	}
+
+	/* Ensure queues are empty */
+	try_recv("bpf.recv(sd_send)", sd_send, MSG_DONTWAIT, false);
+	if (sd_in != sd_send)
+		try_recv("bpf.recv(sd_in)", sd_in, MSG_DONTWAIT, false);
+
+	try_recv("bpf.recv(sd_out)", sd_out, MSG_DONTWAIT, false);
+	if (sd_recv != sd_out)
+		try_recv("bpf.recv(sd_recv)", sd_recv, MSG_DONTWAIT, false);
+}
+
+static void test_send_redir_recv(int sd_send, int send_flags, int sd_peer,
+				 int sd_in, int sd_out, int sd_recv,
+				 struct maps *maps, int status)
+{
+	unsigned int drop, pass;
+	char *send_buf = "ab";
+	char recv_buf = '\0';
+	ssize_t n, len = 1;
+
+	/* Zero out the verdict map */
+	if (xbpf_map_update_elem(maps->verd, &u32(SK_DROP), &u32(0), BPF_ANY) ||
+	    xbpf_map_update_elem(maps->verd, &u32(SK_PASS), &u32(0), BPF_ANY))
+		return;
+
+	if (xbpf_map_update_elem(maps->in, &u32(0), &u64(sd_in), BPF_NOEXIST))
+		return;
+
+	if (xbpf_map_update_elem(maps->out, &u32(0), &u64(sd_out), BPF_NOEXIST))
+		goto del_in;
+
+	/* Last byte is OOB data when send_flags has MSG_OOB bit set */
+	if (send_flags & MSG_OOB)
+		len++;
+	n = send(sd_send, send_buf, len, send_flags);
+	if (n >= 0 && n < len)
+		FAIL("incomplete send");
+	if (n < 0) {
+		/* sk_msg redirect combo not supported? */
+		if (status & SUPPORTED || errno != EACCES)
+			FAIL_ERRNO("send");
+		goto out;
+	}
+
+	if (!(status & SUPPORTED)) {
+		handle_unsupported(sd_send, sd_peer, sd_in, sd_out, sd_recv,
+				   maps->verd, status);
+		goto out;
+	}
+
+	errno = 0;
+	n = recv_timeout(sd_recv, &recv_buf, 1, 0, IO_TIMEOUT_SEC);
+	if (n != 1) {
+		FAIL_ERRNO("recv_timeout()");
+		goto out;
+	}
+
+	/* Check verdict _after_ recv(); af_vsock may need time to catch up */
+	if (xbpf_map_lookup_elem(maps->verd, &u32(SK_DROP), &drop) ||
+	    xbpf_map_lookup_elem(maps->verd, &u32(SK_PASS), &pass))
+		goto out;
+
+	if (drop != 0 || pass != 1)
+		FAIL("unexpected verdict drop/pass: wanted 0/1, have %u/%u",
+		     drop, pass);
+
+	if (recv_buf != send_buf[0])
+		FAIL("recv(): payload check, %02x != %02x", recv_buf, send_buf[0]);
+
+	if (send_flags & MSG_OOB) {
+		/* Fail reading OOB while in sockmap */
+		try_recv("bpf.recv(sd_out, MSG_OOB)", sd_out,
+			 MSG_OOB | MSG_DONTWAIT, false);
+
+		/* Remove sd_out from sockmap */
+		xbpf_map_delete_elem(maps->out, &u32(0));
+
+		/* Check that OOB was dropped on redirect */
+		try_recv("recv(sd_out, MSG_OOB)", sd_out,
+			 MSG_OOB | MSG_DONTWAIT, false);
+
+		goto del_in;
+	}
+out:
+	xbpf_map_delete_elem(maps->out, &u32(0));
+del_in:
+	xbpf_map_delete_elem(maps->in, &u32(0));
+}
+
+static int is_redir_supported(enum prog_type type, const char *in,
+			      const char *out)
+{
+	/* Matching based on strings returned by socket_kind_to_str():
+	 * tcp4, udp4, tcp6, udp6, u_str, u_dgr, v_str, v_seq
+	 * Plus a wildcard: any
+	 * Not in use: u_seq, v_dgr
+	 */
+	struct combo_spec *c, combos[] = {
+		/* Send to local: TCP -> any, but vsock */
+		{ SK_MSG_INGRESS,	"tcp",	"tcp" },
+		{ SK_MSG_INGRESS,	"tcp",	"udp" },
+		{ SK_MSG_INGRESS,	"tcp",	"u_str" },
+		{ SK_MSG_INGRESS,	"tcp",	"u_dgr" },
+
+		/* Send to egress: TCP -> TCP */
+		{ SK_MSG_EGRESS,	"tcp",	"tcp" },
+
+		/* Ingress to egress: any -> any */
+		{ SK_SKB_EGRESS,	"any",	"any" },
+
+		/* Ingress to local: any -> any, but vsock */
+		{ SK_SKB_INGRESS,	"any",	"tcp" },
+		{ SK_SKB_INGRESS,	"any",	"udp" },
+		{ SK_SKB_INGRESS,	"any",	"u_str" },
+		{ SK_SKB_INGRESS,	"any",	"u_dgr" },
+	};
+
+	for (c = combos; c < combos + ARRAY_SIZE(combos); c++) {
+		if (c->prog_type == type &&
+		    (!strcmp(c->in, "any") || strstarts(in, c->in)) &&
+		    (!strcmp(c->out, "any") || strstarts(out, c->out)))
+			return SUPPORTED;
+	}
+
+	return 0;
+}
+
+static int get_support_status(enum prog_type type, const char *in,
+			      const char *out)
+{
+	int status = is_redir_supported(type, in, out);
+
+	if (type == SK_SKB_INGRESS && strstarts(out, "v_"))
+		status |= UNSUPPORTED_RACY_VERD;
+
+	return status;
+}
+
+static void test_socket(enum bpf_map_type type, struct redir_spec *redir,
+			struct maps *maps, struct socket_spec *s_in,
+			struct socket_spec *s_out)
+{
+	int fd_in, fd_out, fd_send, fd_peer, fd_recv, flags, status;
+	const char *in_str, *out_str;
+	char s[MAX_TEST_NAME];
+
+	fd_in = s_in->in[0];
+	fd_out = s_out->out[0];
+	fd_send = s_in->in[redir->idx_send];
+	fd_peer = s_in->in[redir->idx_send ^ 1];
+	fd_recv = s_out->out[redir->idx_recv];
+	flags = s_in->send_flags;
+
+	in_str = socket_kind_to_str(fd_in);
+	out_str = socket_kind_to_str(fd_out);
+	status = get_support_status(redir->prog_type, in_str, out_str);
+
+	snprintf(s, sizeof(s),
+		 "%-4s %-17s %-5s %s %-5s%6s",
+		 /* hash sk_skb-to-ingress u_str → v_str (OOB) */
+		 type == BPF_MAP_TYPE_SOCKMAP ? "map" : "hash",
+		 redir->name,
+		 in_str,
+		 status & SUPPORTED ? "→" : " ",
+		 out_str,
+		 (flags & MSG_OOB) ? "(OOB)" : "");
+
+	if (!test__start_subtest(s))
+		return;
+
+	test_send_redir_recv(fd_send, flags, fd_peer, fd_in, fd_out, fd_recv,
+			     maps, status);
+}
+
+static void test_redir(enum bpf_map_type type, struct redir_spec *redir,
+		       struct maps *maps)
+{
+	struct socket_spec *s, sockets[] = {
+		{ AF_INET, SOCK_STREAM },
+		// { AF_INET, SOCK_STREAM, MSG_OOB }, /* Known to be broken */
+		{ AF_INET6, SOCK_STREAM },
+		{ AF_INET, SOCK_DGRAM },
+		{ AF_INET6, SOCK_DGRAM },
+		{ AF_UNIX, SOCK_STREAM },
+		{ AF_UNIX, SOCK_STREAM, MSG_OOB },
+		{ AF_UNIX, SOCK_DGRAM },
+		// { AF_UNIX, SOCK_SEQPACKET}, /* Unsupported BPF_MAP_UPDATE_ELEM */
+		{ AF_VSOCK, SOCK_STREAM },
+		// { AF_VSOCK, SOCK_DGRAM }, /* Unsupported socket() */
+		{ AF_VSOCK, SOCK_SEQPACKET },
+	};
+
+	for (s = sockets; s < sockets + ARRAY_SIZE(sockets); s++)
+		if (socket_spec_pairs(s))
+			goto out;
+
+	/* Intra-proto */
+	for (s = sockets; s < sockets + ARRAY_SIZE(sockets); s++)
+		test_socket(type, redir, maps, s,
s); 388 + 389 + /* Cross-proto */ 390 + for (int i = 0; i < ARRAY_SIZE(sockets); i++) { 391 + for (int j = 0; j < ARRAY_SIZE(sockets); j++) { 392 + struct socket_spec *out = &sockets[j]; 393 + struct socket_spec *in = &sockets[i]; 394 + 395 + /* Skip intra-proto and between variants */ 396 + if (out->send_flags || 397 + (in->family == out->family && 398 + in->sotype == out->sotype)) 399 + continue; 400 + 401 + test_socket(type, redir, maps, in, out); 402 + } 403 + } 404 + out: 405 + while (--s >= sockets) 406 + socket_spec_close(s); 407 + } 408 + 409 + static void test_map(enum bpf_map_type type) 410 + { 411 + struct redir_spec *r, redirs[] = { 412 + { "sk_msg-to-ingress", SEND_INNER, RECV_INNER, SK_MSG_INGRESS }, 413 + { "sk_msg-to-egress", SEND_INNER, RECV_OUTER, SK_MSG_EGRESS }, 414 + { "sk_skb-to-egress", SEND_OUTER, RECV_OUTER, SK_SKB_EGRESS }, 415 + { "sk_skb-to-ingress", SEND_OUTER, RECV_INNER, SK_SKB_INGRESS }, 416 + }; 417 + 418 + for (r = redirs; r < redirs + ARRAY_SIZE(redirs); r++) { 419 + enum bpf_attach_type attach_type; 420 + struct test_sockmap_redir *skel; 421 + struct maps maps; 422 + int prog_fd; 423 + 424 + skel = test_sockmap_redir__open_and_load(); 425 + if (!skel) { 426 + FAIL("open_and_load"); 427 + return; 428 + } 429 + 430 + switch (type) { 431 + case BPF_MAP_TYPE_SOCKMAP: 432 + maps.in = bpf_map__fd(skel->maps.nop_map); 433 + maps.out = bpf_map__fd(skel->maps.sock_map); 434 + break; 435 + case BPF_MAP_TYPE_SOCKHASH: 436 + maps.in = bpf_map__fd(skel->maps.nop_hash); 437 + maps.out = bpf_map__fd(skel->maps.sock_hash); 438 + break; 439 + default: 440 + FAIL("Unsupported bpf_map_type"); 441 + return; 442 + } 443 + 444 + skel->bss->redirect_type = type; 445 + maps.verd = bpf_map__fd(skel->maps.verdict_map); 446 + get_redir_params(r, skel, &prog_fd, &attach_type, 447 + &skel->bss->redirect_flags); 448 + 449 + if (xbpf_prog_attach(prog_fd, maps.in, attach_type, 0)) 450 + return; 451 + 452 + test_redir(type, r, &maps); 453 + 454 + if 
(xbpf_prog_detach2(prog_fd, maps.in, attach_type)) 455 + return; 456 + 457 + test_sockmap_redir__destroy(skel); 458 + } 459 + } 460 + 461 + void serial_test_sockmap_redir(void) 462 + { 463 + test_map(BPF_MAP_TYPE_SOCKMAP); 464 + test_map(BPF_MAP_TYPE_SOCKHASH); 465 + }
+6 -5
tools/testing/selftests/bpf/prog_tests/tc_redirect.c
··· 56 56 57 57 #define MAC_DST_FWD "00:11:22:33:44:55" 58 58 #define MAC_DST "00:22:33:44:55:66" 59 + #define MAC_SRC_FWD "00:33:44:55:66:77" 60 + #define MAC_SRC "00:44:55:66:77:88" 59 61 60 62 #define IFADDR_STR_LEN 18 61 63 #define PING_ARGS "-i 0.2 -c 3 -w 10 -q" ··· 209 207 int err; 210 208 211 209 if (result->dev_mode == MODE_VETH) { 212 - SYS(fail, "ip link add src type veth peer name src_fwd"); 213 - SYS(fail, "ip link add dst type veth peer name dst_fwd"); 214 - 215 - SYS(fail, "ip link set dst_fwd address " MAC_DST_FWD); 216 - SYS(fail, "ip link set dst address " MAC_DST); 210 + SYS(fail, "ip link add src address " MAC_SRC " type veth " 211 + "peer name src_fwd address " MAC_SRC_FWD); 212 + SYS(fail, "ip link add dst address " MAC_DST " type veth " 213 + "peer name dst_fwd address " MAC_DST_FWD); 217 214 } else if (result->dev_mode == MODE_NETKIT) { 218 215 err = create_netkit(NETKIT_L3, "src", "src_fwd"); 219 216 if (!ASSERT_OK(err, "create_ifindex_src"))
+64
tools/testing/selftests/bpf/prog_tests/test_btf_ext.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2025 Meta Platforms Inc. */ 3 + #include <test_progs.h> 4 + #include "test_btf_ext.skel.h" 5 + #include "btf_helpers.h" 6 + 7 + static void subtest_line_func_info(void) 8 + { 9 + struct test_btf_ext *skel; 10 + struct bpf_prog_info info; 11 + struct bpf_line_info line_info[128], *libbpf_line_info; 12 + struct bpf_func_info func_info[128], *libbpf_func_info; 13 + __u32 info_len = sizeof(info), libbbpf_line_info_cnt, libbbpf_func_info_cnt; 14 + int err, fd; 15 + 16 + skel = test_btf_ext__open_and_load(); 17 + if (!ASSERT_OK_PTR(skel, "skel_open_and_load")) 18 + return; 19 + 20 + fd = bpf_program__fd(skel->progs.global_func); 21 + 22 + memset(&info, 0, sizeof(info)); 23 + info.line_info = ptr_to_u64(&line_info); 24 + info.nr_line_info = sizeof(line_info); 25 + info.line_info_rec_size = sizeof(*line_info); 26 + err = bpf_prog_get_info_by_fd(fd, &info, &info_len); 27 + if (!ASSERT_OK(err, "prog_line_info")) 28 + goto out; 29 + 30 + libbpf_line_info = bpf_program__line_info(skel->progs.global_func); 31 + libbbpf_line_info_cnt = bpf_program__line_info_cnt(skel->progs.global_func); 32 + 33 + memset(&info, 0, sizeof(info)); 34 + info.func_info = ptr_to_u64(&func_info); 35 + info.nr_func_info = sizeof(func_info); 36 + info.func_info_rec_size = sizeof(*func_info); 37 + err = bpf_prog_get_info_by_fd(fd, &info, &info_len); 38 + if (!ASSERT_OK(err, "prog_func_info")) 39 + goto out; 40 + 41 + libbpf_func_info = bpf_program__func_info(skel->progs.global_func); 42 + libbbpf_func_info_cnt = bpf_program__func_info_cnt(skel->progs.global_func); 43 + 44 + if (!ASSERT_OK_PTR(libbpf_line_info, "bpf_program__line_info")) 45 + goto out; 46 + if (!ASSERT_EQ(libbbpf_line_info_cnt, info.nr_line_info, "line_info_cnt")) 47 + goto out; 48 + if (!ASSERT_OK_PTR(libbpf_func_info, "bpf_program__func_info")) 49 + goto out; 50 + if (!ASSERT_EQ(libbbpf_func_info_cnt, info.nr_func_info, "func_info_cnt")) 51 + goto out; 52 + 
ASSERT_MEMEQ(libbpf_line_info, line_info, libbbpf_line_info_cnt * sizeof(*line_info), 53 + "line_info"); 54 + ASSERT_MEMEQ(libbpf_func_info, func_info, libbbpf_func_info_cnt * sizeof(*func_info), 55 + "func_info"); 56 + out: 57 + test_btf_ext__destroy(skel); 58 + } 59 + 60 + void test_btf_ext(void) 61 + { 62 + if (test__start_subtest("line_func_info")) 63 + subtest_line_func_info(); 64 + }
+5
tools/testing/selftests/bpf/prog_tests/test_veristat.c
··· 63 63 " -G \"var_eb = EB2\" "\ 64 64 " -G \"var_ec = EC2\" "\ 65 65 " -G \"var_b = 1\" "\ 66 + " -G \"struct1.struct2.u.var_u8 = 170\" "\ 67 + " -G \"union1.struct3.var_u8_l = 0xaa\" "\ 68 + " -G \"union1.struct3.var_u8_h = 0xaa\" "\ 66 69 "-vl2 > %s", fix->veristat, fix->tmpfile); 67 70 68 71 read(fix->fd, fix->output, fix->sz); ··· 81 78 __CHECK_STR("_w=12 ", "var_eb = EB2"); 82 79 __CHECK_STR("_w=13 ", "var_ec = EC2"); 83 80 __CHECK_STR("_w=1 ", "var_b = 1"); 81 + __CHECK_STR("_w=170 ", "struct1.struct2.u.var_u8 = 170"); 82 + __CHECK_STR("_w=0xaaaa ", "union1.var_u16 = 0xaaaa"); 84 83 85 84 out: 86 85 teardown_fixture(fix);
+2
tools/testing/selftests/bpf/prog_tests/verifier.c
··· 14 14 #include "verifier_bounds_deduction_non_const.skel.h" 15 15 #include "verifier_bounds_mix_sign_unsign.skel.h" 16 16 #include "verifier_bpf_get_stack.skel.h" 17 + #include "verifier_bpf_trap.skel.h" 17 18 #include "verifier_bswap.skel.h" 18 19 #include "verifier_btf_ctx_access.skel.h" 19 20 #include "verifier_btf_unreliable_prog.skel.h" ··· 149 148 void test_verifier_bounds_deduction_non_const(void) { RUN(verifier_bounds_deduction_non_const); } 150 149 void test_verifier_bounds_mix_sign_unsign(void) { RUN(verifier_bounds_mix_sign_unsign); } 151 150 void test_verifier_bpf_get_stack(void) { RUN(verifier_bpf_get_stack); } 151 + void test_verifier_bpf_trap(void) { RUN(verifier_bpf_trap); } 152 152 void test_verifier_bswap(void) { RUN(verifier_bswap); } 153 153 void test_verifier_btf_ctx_access(void) { RUN(verifier_btf_ctx_access); } 154 154 void test_verifier_btf_unreliable_prog(void) { RUN(verifier_btf_unreliable_prog); }
+21 -1
tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
··· 351 351 struct xdp_metadata2 *bpf_obj2 = NULL; 352 352 struct xdp_metadata *bpf_obj = NULL; 353 353 struct bpf_program *new_prog, *prog; 354 + struct bpf_devmap_val devmap_e = {}; 355 + struct bpf_map *prog_arr, *devmap; 354 356 struct nstoken *tok = NULL; 355 357 __u32 queue_id = QUEUE_ID; 356 - struct bpf_map *prog_arr; 357 358 struct xsk tx_xsk = {}; 358 359 struct xsk rx_xsk = {}; 359 360 __u32 val, key = 0; ··· 410 409 bpf_program__set_ifindex(prog, rx_ifindex); 411 410 bpf_program__set_flags(prog, BPF_F_XDP_DEV_BOUND_ONLY); 412 411 412 + /* Make sure we can load a dev-bound program that performs 413 + * XDP_REDIRECT into a devmap. 414 + */ 415 + new_prog = bpf_object__find_program_by_name(bpf_obj->obj, "redirect"); 416 + bpf_program__set_ifindex(new_prog, rx_ifindex); 417 + bpf_program__set_flags(new_prog, BPF_F_XDP_DEV_BOUND_ONLY); 418 + 413 419 if (!ASSERT_OK(xdp_metadata__load(bpf_obj), "load skeleton")) 414 420 goto out; 415 421 ··· 429 421 if (!ASSERT_ERR(bpf_map__update_elem(prog_arr, &key, sizeof(key), 430 422 &val, sizeof(val), BPF_ANY), 431 423 "update prog_arr")) 424 + goto out; 425 + 426 + /* Make sure we can't add dev-bound programs to devmaps. */ 427 + devmap = bpf_object__find_map_by_name(bpf_obj->obj, "dev_map"); 428 + if (!ASSERT_OK_PTR(devmap, "no dev_map found")) 429 + goto out; 430 + 431 + devmap_e.bpf_prog.fd = val; 432 + if (!ASSERT_ERR(bpf_map__update_elem(devmap, &key, sizeof(key), 433 + &devmap_e, sizeof(devmap_e), 434 + BPF_ANY), 435 + "update dev_map")) 432 436 goto out; 433 437 434 438 /* Attach BPF program to RX interface. */
+65
tools/testing/selftests/bpf/progs/bench_sockmap_prog.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + #include <linux/bpf.h> 3 + #include <bpf/bpf_helpers.h> 4 + #include <bpf/bpf_endian.h> 5 + 6 + long process_byte = 0; 7 + int verdict_dir = 0; 8 + int dropped = 0; 9 + int pkt_size = 0; 10 + struct { 11 + __uint(type, BPF_MAP_TYPE_SOCKMAP); 12 + __uint(max_entries, 20); 13 + __type(key, int); 14 + __type(value, int); 15 + } sock_map_rx SEC(".maps"); 16 + 17 + struct { 18 + __uint(type, BPF_MAP_TYPE_SOCKMAP); 19 + __uint(max_entries, 20); 20 + __type(key, int); 21 + __type(value, int); 22 + } sock_map_tx SEC(".maps"); 23 + 24 + SEC("sk_skb/stream_parser") 25 + int prog_skb_parser(struct __sk_buff *skb) 26 + { 27 + return pkt_size; 28 + } 29 + 30 + SEC("sk_skb/stream_verdict") 31 + int prog_skb_verdict(struct __sk_buff *skb) 32 + { 33 + int one = 1; 34 + int ret = bpf_sk_redirect_map(skb, &sock_map_rx, one, verdict_dir); 35 + 36 + if (ret == SK_DROP) 37 + dropped++; 38 + __sync_fetch_and_add(&process_byte, skb->len); 39 + return ret; 40 + } 41 + 42 + SEC("sk_skb/stream_verdict") 43 + int prog_skb_pass(struct __sk_buff *skb) 44 + { 45 + __sync_fetch_and_add(&process_byte, skb->len); 46 + return SK_PASS; 47 + } 48 + 49 + SEC("sk_msg") 50 + int prog_skmsg_verdict(struct sk_msg_md *msg) 51 + { 52 + int one = 1; 53 + 54 + __sync_fetch_and_add(&process_byte, msg->size); 55 + return bpf_msg_redirect_map(msg, &sock_map_tx, one, verdict_dir); 56 + } 57 + 58 + SEC("sk_msg") 59 + int prog_skmsg_pass(struct sk_msg_md *msg) 60 + { 61 + __sync_fetch_and_add(&process_byte, msg->size); 62 + return SK_PASS; 63 + } 64 + 65 + char _license[] SEC("license") = "GPL";
+3 -2
tools/testing/selftests/bpf/progs/bpf_misc.h
··· 225 225 #define CAN_USE_BPF_ST 226 226 #endif 227 227 228 - #if __clang_major__ >= 18 && defined(ENABLE_ATOMICS_TESTS) && \ 229 - (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86)) 228 + #if __clang_major__ >= 18 && defined(ENABLE_ATOMICS_TESTS) && \ 229 + (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86) || \ 230 + (defined(__TARGET_ARCH_riscv) && __riscv_xlen == 64)) 230 231 #define CAN_USE_LOAD_ACQ_STORE_REL 231 232 #endif 232 233
+101
tools/testing/selftests/bpf/progs/dmabuf_iter.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2025 Google LLC */ 3 + #include <vmlinux.h> 4 + #include <bpf/bpf_core_read.h> 5 + #include <bpf/bpf_helpers.h> 6 + 7 + /* From uapi/linux/dma-buf.h */ 8 + #define DMA_BUF_NAME_LEN 32 9 + 10 + char _license[] SEC("license") = "GPL"; 11 + 12 + struct { 13 + __uint(type, BPF_MAP_TYPE_HASH); 14 + __uint(key_size, DMA_BUF_NAME_LEN); 15 + __type(value, bool); 16 + __uint(max_entries, 5); 17 + } testbuf_hash SEC(".maps"); 18 + 19 + /* 20 + * Fields output by this iterator are delimited by newlines. Convert any 21 + * newlines in user-provided printed strings to spaces. 22 + */ 23 + static void sanitize_string(char *src, size_t size) 24 + { 25 + for (char *c = src; (size_t)(c - src) < size && *c; ++c) 26 + if (*c == '\n') 27 + *c = ' '; 28 + } 29 + 30 + SEC("iter/dmabuf") 31 + int dmabuf_collector(struct bpf_iter__dmabuf *ctx) 32 + { 33 + const struct dma_buf *dmabuf = ctx->dmabuf; 34 + struct seq_file *seq = ctx->meta->seq; 35 + unsigned long inode = 0; 36 + size_t size; 37 + const char *pname, *exporter; 38 + char name[DMA_BUF_NAME_LEN] = {'\0'}; 39 + 40 + if (!dmabuf) 41 + return 0; 42 + 43 + if (BPF_CORE_READ_INTO(&inode, dmabuf, file, f_inode, i_ino) || 44 + bpf_core_read(&size, sizeof(size), &dmabuf->size) || 45 + bpf_core_read(&pname, sizeof(pname), &dmabuf->name) || 46 + bpf_core_read(&exporter, sizeof(exporter), &dmabuf->exp_name)) 47 + return 1; 48 + 49 + /* Buffers are not required to be named */ 50 + if (pname) { 51 + if (bpf_probe_read_kernel(name, sizeof(name), pname)) 52 + return 1; 53 + 54 + /* Name strings can be provided by userspace */ 55 + sanitize_string(name, sizeof(name)); 56 + } 57 + 58 + BPF_SEQ_PRINTF(seq, "%lu\n%llu\n%s\n%s\n", inode, size, name, exporter); 59 + return 0; 60 + } 61 + 62 + SEC("syscall") 63 + int iter_dmabuf_for_each(const void *ctx) 64 + { 65 + struct dma_buf *d; 66 + 67 + bpf_for_each(dmabuf, d) { 68 + char name[DMA_BUF_NAME_LEN]; 69 + const char *pname; 70 + bool 
*found; 71 + long len; 72 + int i; 73 + 74 + if (bpf_core_read(&pname, sizeof(pname), &d->name)) 75 + return 1; 76 + 77 + /* Buffers are not required to be named */ 78 + if (!pname) 79 + continue; 80 + 81 + len = bpf_probe_read_kernel_str(name, sizeof(name), pname); 82 + if (len < 0) 83 + return 1; 84 + 85 + /* 86 + * The entire name buffer is used as a map key. 87 + * Zeroize any uninitialized trailing bytes after the NUL. 88 + */ 89 + bpf_for(i, len, DMA_BUF_NAME_LEN) 90 + name[i] = 0; 91 + 92 + found = bpf_map_lookup_elem(&testbuf_hash, name); 93 + if (found) { 94 + bool t = true; 95 + 96 + bpf_map_update_elem(&testbuf_hash, name, &t, BPF_EXIST); 97 + } 98 + } 99 + 100 + return 0; 101 + }
+230
tools/testing/selftests/bpf/progs/dynptr_success.c
··· 680 680 bpf_ringbuf_discard_dynptr(&ptr_buf, 0); 681 681 return XDP_DROP; 682 682 } 683 + 684 + void *user_ptr; 685 + /* Contains the copy of the data pointed by user_ptr. 686 + * Size 384 to make it not fit into a single kernel chunk when copying 687 + * but less than the maximum bpf stack size (512). 688 + */ 689 + char expected_str[384]; 690 + __u32 test_len[7] = {0/* placeholder */, 0, 1, 2, 255, 256, 257}; 691 + 692 + typedef int (*bpf_read_dynptr_fn_t)(struct bpf_dynptr *dptr, u32 off, 693 + u32 size, const void *unsafe_ptr); 694 + 695 + /* Returns the offset just before the end of the maximum sized xdp fragment. 696 + * Any write larger than 32 bytes will be split between 2 fragments. 697 + */ 698 + __u32 xdp_near_frag_end_offset(void) 699 + { 700 + const __u32 headroom = 256; 701 + const __u32 max_frag_size = __PAGE_SIZE - headroom - sizeof(struct skb_shared_info); 702 + 703 + /* 32 bytes before the approximate end of the fragment */ 704 + return max_frag_size - 32; 705 + } 706 + 707 + /* Use __always_inline on test_dynptr_probe[_str][_xdp]() and callbacks 708 + * of type bpf_read_dynptr_fn_t to prevent compiler from generating 709 + * indirect calls that make program fail to load with "unknown opcode" error. 
710 + */ 711 + static __always_inline void test_dynptr_probe(void *ptr, bpf_read_dynptr_fn_t bpf_read_dynptr_fn) 712 + { 713 + char buf[sizeof(expected_str)]; 714 + struct bpf_dynptr ptr_buf; 715 + int i; 716 + 717 + if (bpf_get_current_pid_tgid() >> 32 != pid) 718 + return; 719 + 720 + err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(buf), 0, &ptr_buf); 721 + 722 + bpf_for(i, 0, ARRAY_SIZE(test_len)) { 723 + __u32 len = test_len[i]; 724 + 725 + err = err ?: bpf_read_dynptr_fn(&ptr_buf, 0, test_len[i], ptr); 726 + if (len > sizeof(buf)) 727 + break; 728 + err = err ?: bpf_dynptr_read(&buf, len, &ptr_buf, 0, 0); 729 + 730 + if (err || bpf_memcmp(expected_str, buf, len)) 731 + err = 1; 732 + 733 + /* Reset buffer and dynptr */ 734 + __builtin_memset(buf, 0, sizeof(buf)); 735 + err = err ?: bpf_dynptr_write(&ptr_buf, 0, buf, len, 0); 736 + } 737 + bpf_ringbuf_discard_dynptr(&ptr_buf, 0); 738 + } 739 + 740 + static __always_inline void test_dynptr_probe_str(void *ptr, 741 + bpf_read_dynptr_fn_t bpf_read_dynptr_fn) 742 + { 743 + char buf[sizeof(expected_str)]; 744 + struct bpf_dynptr ptr_buf; 745 + __u32 cnt, i; 746 + 747 + if (bpf_get_current_pid_tgid() >> 32 != pid) 748 + return; 749 + 750 + bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(buf), 0, &ptr_buf); 751 + 752 + bpf_for(i, 0, ARRAY_SIZE(test_len)) { 753 + __u32 len = test_len[i]; 754 + 755 + cnt = bpf_read_dynptr_fn(&ptr_buf, 0, len, ptr); 756 + if (cnt != len) 757 + err = 1; 758 + 759 + if (len > sizeof(buf)) 760 + continue; 761 + err = err ?: bpf_dynptr_read(&buf, len, &ptr_buf, 0, 0); 762 + if (!len) 763 + continue; 764 + if (err || bpf_memcmp(expected_str, buf, len - 1) || buf[len - 1] != '\0') 765 + err = 1; 766 + } 767 + bpf_ringbuf_discard_dynptr(&ptr_buf, 0); 768 + } 769 + 770 + static __always_inline void test_dynptr_probe_xdp(struct xdp_md *xdp, void *ptr, 771 + bpf_read_dynptr_fn_t bpf_read_dynptr_fn) 772 + { 773 + struct bpf_dynptr ptr_xdp; 774 + char buf[sizeof(expected_str)]; 775 + __u32 off, i; 776 
+ 777 + if (bpf_get_current_pid_tgid() >> 32 != pid) 778 + return; 779 + 780 + off = xdp_near_frag_end_offset(); 781 + err = bpf_dynptr_from_xdp(xdp, 0, &ptr_xdp); 782 + 783 + bpf_for(i, 0, ARRAY_SIZE(test_len)) { 784 + __u32 len = test_len[i]; 785 + 786 + err = err ?: bpf_read_dynptr_fn(&ptr_xdp, off, len, ptr); 787 + if (len > sizeof(buf)) 788 + continue; 789 + err = err ?: bpf_dynptr_read(&buf, len, &ptr_xdp, off, 0); 790 + if (err || bpf_memcmp(expected_str, buf, len)) 791 + err = 1; 792 + /* Reset buffer and dynptr */ 793 + __builtin_memset(buf, 0, sizeof(buf)); 794 + err = err ?: bpf_dynptr_write(&ptr_xdp, off, buf, len, 0); 795 + } 796 + } 797 + 798 + static __always_inline void test_dynptr_probe_str_xdp(struct xdp_md *xdp, void *ptr, 799 + bpf_read_dynptr_fn_t bpf_read_dynptr_fn) 800 + { 801 + struct bpf_dynptr ptr_xdp; 802 + char buf[sizeof(expected_str)]; 803 + __u32 cnt, off, i; 804 + 805 + if (bpf_get_current_pid_tgid() >> 32 != pid) 806 + return; 807 + 808 + off = xdp_near_frag_end_offset(); 809 + err = bpf_dynptr_from_xdp(xdp, 0, &ptr_xdp); 810 + if (err) 811 + return; 812 + 813 + bpf_for(i, 0, ARRAY_SIZE(test_len)) { 814 + __u32 len = test_len[i]; 815 + 816 + cnt = bpf_read_dynptr_fn(&ptr_xdp, off, len, ptr); 817 + if (cnt != len) 818 + err = 1; 819 + 820 + if (len > sizeof(buf)) 821 + continue; 822 + err = err ?: bpf_dynptr_read(&buf, len, &ptr_xdp, off, 0); 823 + 824 + if (!len) 825 + continue; 826 + if (err || bpf_memcmp(expected_str, buf, len - 1) || buf[len - 1] != '\0') 827 + err = 1; 828 + 829 + __builtin_memset(buf, 0, sizeof(buf)); 830 + err = err ?: bpf_dynptr_write(&ptr_xdp, off, buf, len, 0); 831 + } 832 + } 833 + 834 + SEC("xdp") 835 + int test_probe_read_user_dynptr(struct xdp_md *xdp) 836 + { 837 + test_dynptr_probe(user_ptr, bpf_probe_read_user_dynptr); 838 + if (!err) 839 + test_dynptr_probe_xdp(xdp, user_ptr, bpf_probe_read_user_dynptr); 840 + return XDP_PASS; 841 + } 842 + 843 + SEC("xdp") 844 + int 
test_probe_read_kernel_dynptr(struct xdp_md *xdp) 845 + { 846 + test_dynptr_probe(expected_str, bpf_probe_read_kernel_dynptr); 847 + if (!err) 848 + test_dynptr_probe_xdp(xdp, expected_str, bpf_probe_read_kernel_dynptr); 849 + return XDP_PASS; 850 + } 851 + 852 + SEC("xdp") 853 + int test_probe_read_user_str_dynptr(struct xdp_md *xdp) 854 + { 855 + test_dynptr_probe_str(user_ptr, bpf_probe_read_user_str_dynptr); 856 + if (!err) 857 + test_dynptr_probe_str_xdp(xdp, user_ptr, bpf_probe_read_user_str_dynptr); 858 + return XDP_PASS; 859 + } 860 + 861 + SEC("xdp") 862 + int test_probe_read_kernel_str_dynptr(struct xdp_md *xdp) 863 + { 864 + test_dynptr_probe_str(expected_str, bpf_probe_read_kernel_str_dynptr); 865 + if (!err) 866 + test_dynptr_probe_str_xdp(xdp, expected_str, bpf_probe_read_kernel_str_dynptr); 867 + return XDP_PASS; 868 + } 869 + 870 + SEC("fentry.s/" SYS_PREFIX "sys_nanosleep") 871 + int test_copy_from_user_dynptr(void *ctx) 872 + { 873 + test_dynptr_probe(user_ptr, bpf_copy_from_user_dynptr); 874 + return 0; 875 + } 876 + 877 + SEC("fentry.s/" SYS_PREFIX "sys_nanosleep") 878 + int test_copy_from_user_str_dynptr(void *ctx) 879 + { 880 + test_dynptr_probe_str(user_ptr, bpf_copy_from_user_str_dynptr); 881 + return 0; 882 + } 883 + 884 + static int bpf_copy_data_from_user_task(struct bpf_dynptr *dptr, u32 off, 885 + u32 size, const void *unsafe_ptr) 886 + { 887 + struct task_struct *task = bpf_get_current_task_btf(); 888 + 889 + return bpf_copy_from_user_task_dynptr(dptr, off, size, unsafe_ptr, task); 890 + } 891 + 892 + static int bpf_copy_data_from_user_task_str(struct bpf_dynptr *dptr, u32 off, 893 + u32 size, const void *unsafe_ptr) 894 + { 895 + struct task_struct *task = bpf_get_current_task_btf(); 896 + 897 + return bpf_copy_from_user_task_str_dynptr(dptr, off, size, unsafe_ptr, task); 898 + } 899 + 900 + SEC("fentry.s/" SYS_PREFIX "sys_nanosleep") 901 + int test_copy_from_user_task_dynptr(void *ctx) 902 + { 903 + test_dynptr_probe(user_ptr, 
bpf_copy_data_from_user_task); 904 + return 0; 905 + } 906 + 907 + SEC("fentry.s/" SYS_PREFIX "sys_nanosleep") 908 + int test_copy_from_user_task_str_dynptr(void *ctx) 909 + { 910 + test_dynptr_probe_str(user_ptr, bpf_copy_data_from_user_task_str); 911 + return 0; 912 + }
+25
tools/testing/selftests/bpf/progs/fd_htab_lookup.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (C) 2025. Huawei Technologies Co., Ltd */ 3 + #include <linux/bpf.h> 4 + #include <bpf/bpf_helpers.h> 5 + 6 + char _license[] SEC("license") = "GPL"; 7 + 8 + struct inner_map_type { 9 + __uint(type, BPF_MAP_TYPE_ARRAY); 10 + __uint(key_size, 4); 11 + __uint(value_size, 4); 12 + __uint(max_entries, 1); 13 + } inner_map SEC(".maps"); 14 + 15 + struct { 16 + __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS); 17 + __uint(max_entries, 64); 18 + __type(key, int); 19 + __type(value, int); 20 + __array(values, struct inner_map_type); 21 + } outer_map SEC(".maps") = { 22 + .values = { 23 + [0] = &inner_map, 24 + }, 25 + };
-2
tools/testing/selftests/bpf/progs/iters.c
··· 7 7 #include "bpf_misc.h" 8 8 #include "bpf_compiler.h" 9 9 10 - #define unlikely(x) __builtin_expect(!!(x), 0) 11 - 12 10 static volatile int zero = 0; 13 11 14 12 int my_pid;
+113
tools/testing/selftests/bpf/progs/linked_list_peek.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include <vmlinux.h> 5 + #include <bpf/bpf_helpers.h> 6 + #include "bpf_misc.h" 7 + #include "bpf_experimental.h" 8 + 9 + struct node_data { 10 + struct bpf_list_node l; 11 + int key; 12 + }; 13 + 14 + #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8))) 15 + private(A) struct bpf_spin_lock glock; 16 + private(A) struct bpf_list_head ghead __contains(node_data, l); 17 + 18 + #define list_entry(ptr, type, member) container_of(ptr, type, member) 19 + #define NR_NODES 16 20 + 21 + int zero = 0; 22 + 23 + SEC("syscall") 24 + __retval(0) 25 + long list_peek(void *ctx) 26 + { 27 + struct bpf_list_node *l_n; 28 + struct node_data *n; 29 + int i, err = 0; 30 + 31 + bpf_spin_lock(&glock); 32 + l_n = bpf_list_front(&ghead); 33 + bpf_spin_unlock(&glock); 34 + if (l_n) 35 + return __LINE__; 36 + 37 + bpf_spin_lock(&glock); 38 + l_n = bpf_list_back(&ghead); 39 + bpf_spin_unlock(&glock); 40 + if (l_n) 41 + return __LINE__; 42 + 43 + for (i = zero; i < NR_NODES && can_loop; i++) { 44 + n = bpf_obj_new(typeof(*n)); 45 + if (!n) 46 + return __LINE__; 47 + n->key = i; 48 + bpf_spin_lock(&glock); 49 + bpf_list_push_back(&ghead, &n->l); 50 + bpf_spin_unlock(&glock); 51 + } 52 + 53 + bpf_spin_lock(&glock); 54 + 55 + l_n = bpf_list_front(&ghead); 56 + if (!l_n) { 57 + err = __LINE__; 58 + goto done; 59 + } 60 + 61 + n = list_entry(l_n, struct node_data, l); 62 + if (n->key != 0) { 63 + err = __LINE__; 64 + goto done; 65 + } 66 + 67 + l_n = bpf_list_back(&ghead); 68 + if (!l_n) { 69 + err = __LINE__; 70 + goto done; 71 + } 72 + 73 + n = list_entry(l_n, struct node_data, l); 74 + if (n->key != NR_NODES - 1) { 75 + err = __LINE__; 76 + goto done; 77 + } 78 + 79 + done: 80 + bpf_spin_unlock(&glock); 81 + return err; 82 + } 83 + 84 + #define TEST_FB(op, dolock) \ 85 + SEC("syscall") \ 86 + __failure __msg(MSG) \ 87 + long 
test_##op##_spinlock_##dolock(void *ctx) \ 88 + { \ 89 + struct bpf_list_node *l_n; \ 90 + __u64 jiffies = 0; \ 91 + \ 92 + if (dolock) \ 93 + bpf_spin_lock(&glock); \ 94 + l_n = bpf_list_##op(&ghead); \ 95 + if (l_n) \ 96 + jiffies = bpf_jiffies64(); \ 97 + if (dolock) \ 98 + bpf_spin_unlock(&glock); \ 99 + \ 100 + return !!jiffies; \ 101 + } 102 + 103 + #define MSG "call bpf_list_{{(front|back).+}}; R0{{(_w)?}}=ptr_or_null_node_data(id={{[0-9]+}},non_own_ref" 104 + TEST_FB(front, true) 105 + TEST_FB(back, true) 106 + #undef MSG 107 + 108 + #define MSG "bpf_spin_lock at off=0 must be held for bpf_list_head" 109 + TEST_FB(front, false) 110 + TEST_FB(back, false) 111 + #undef MSG 112 + 113 + char _license[] SEC("license") = "GPL";
-1
tools/testing/selftests/bpf/progs/prepare.c
··· 2 2 /* Copyright (c) 2025 Meta */ 3 3 #include <vmlinux.h> 4 4 #include <bpf/bpf_helpers.h> 5 - //#include <bpf/bpf_tracing.h> 6 5 7 6 char _license[] SEC("license") = "GPL"; 8 7
+15 -14
tools/testing/selftests/bpf/progs/rbtree_fail.c
··· 69 69 } 70 70 71 71 SEC("?tc") 72 - __failure __msg("rbtree_remove node input must be non-owning ref") 72 + __retval(0) 73 73 long rbtree_api_remove_unadded_node(void *ctx) 74 74 { 75 75 struct node_data *n, *m; 76 - struct bpf_rb_node *res; 76 + struct bpf_rb_node *res_n, *res_m; 77 77 78 78 n = bpf_obj_new(typeof(*n)); 79 79 if (!n) ··· 88 88 bpf_spin_lock(&glock); 89 89 bpf_rbtree_add(&groot, &n->node, less); 90 90 91 - /* This remove should pass verifier */ 92 - res = bpf_rbtree_remove(&groot, &n->node); 93 - n = container_of(res, struct node_data, node); 91 + res_n = bpf_rbtree_remove(&groot, &n->node); 94 92 95 - /* This remove shouldn't, m isn't in an rbtree */ 96 - res = bpf_rbtree_remove(&groot, &m->node); 97 - m = container_of(res, struct node_data, node); 93 + res_m = bpf_rbtree_remove(&groot, &m->node); 98 94 bpf_spin_unlock(&glock); 99 95 100 - if (n) 101 - bpf_obj_drop(n); 102 - if (m) 103 - bpf_obj_drop(m); 96 + bpf_obj_drop(m); 97 + if (res_n) 98 + bpf_obj_drop(container_of(res_n, struct node_data, node)); 99 + if (res_m) { 100 + bpf_obj_drop(container_of(res_m, struct node_data, node)); 101 + /* m was not added to the rbtree */ 102 + return 2; 103 + } 104 + 104 105 return 0; 105 106 } 106 107 ··· 179 178 } 180 179 181 180 SEC("?tc") 182 - __failure __msg("rbtree_remove node input must be non-owning ref") 181 + __failure __msg("bpf_rbtree_remove can only take non-owning or refcounted bpf_rb_node pointer") 183 182 long rbtree_api_add_release_unlock_escape(void *ctx) 184 183 { 185 184 struct node_data *n; ··· 203 202 } 204 203 205 204 SEC("?tc") 206 - __failure __msg("rbtree_remove node input must be non-owning ref") 205 + __failure __msg("bpf_rbtree_remove can only take non-owning or refcounted bpf_rb_node pointer") 207 206 long rbtree_api_first_release_unlock_escape(void *ctx) 208 207 { 209 208 struct bpf_rb_node *res;
+206
tools/testing/selftests/bpf/progs/rbtree_search.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include <vmlinux.h> 5 + #include <bpf/bpf_helpers.h> 6 + #include "bpf_misc.h" 7 + #include "bpf_experimental.h" 8 + 9 + struct node_data { 10 + struct bpf_refcount ref; 11 + struct bpf_rb_node r0; 12 + struct bpf_rb_node r1; 13 + int key0; 14 + int key1; 15 + }; 16 + 17 + #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8))) 18 + private(A) struct bpf_spin_lock glock0; 19 + private(A) struct bpf_rb_root groot0 __contains(node_data, r0); 20 + 21 + private(B) struct bpf_spin_lock glock1; 22 + private(B) struct bpf_rb_root groot1 __contains(node_data, r1); 23 + 24 + #define rb_entry(ptr, type, member) container_of(ptr, type, member) 25 + #define NR_NODES 16 26 + 27 + int zero = 0; 28 + 29 + static bool less0(struct bpf_rb_node *a, const struct bpf_rb_node *b) 30 + { 31 + struct node_data *node_a; 32 + struct node_data *node_b; 33 + 34 + node_a = rb_entry(a, struct node_data, r0); 35 + node_b = rb_entry(b, struct node_data, r0); 36 + 37 + return node_a->key0 < node_b->key0; 38 + } 39 + 40 + static bool less1(struct bpf_rb_node *a, const struct bpf_rb_node *b) 41 + { 42 + struct node_data *node_a; 43 + struct node_data *node_b; 44 + 45 + node_a = rb_entry(a, struct node_data, r1); 46 + node_b = rb_entry(b, struct node_data, r1); 47 + 48 + return node_a->key1 < node_b->key1; 49 + } 50 + 51 + SEC("syscall") 52 + __retval(0) 53 + long rbtree_search(void *ctx) 54 + { 55 + struct bpf_rb_node *rb_n, *rb_m, *gc_ns[NR_NODES]; 56 + long lookup_key = NR_NODES / 2; 57 + struct node_data *n, *m; 58 + int i, nr_gc = 0; 59 + 60 + for (i = zero; i < NR_NODES && can_loop; i++) { 61 + n = bpf_obj_new(typeof(*n)); 62 + if (!n) 63 + return __LINE__; 64 + 65 + m = bpf_refcount_acquire(n); 66 + 67 + n->key0 = i; 68 + m->key1 = i; 69 + 70 + bpf_spin_lock(&glock0); 71 + bpf_rbtree_add(&groot0, &n->r0, less0); 72 + bpf_spin_unlock(&glock0); 73 + 74 + 
bpf_spin_lock(&glock1); 75 + bpf_rbtree_add(&groot1, &m->r1, less1); 76 + bpf_spin_unlock(&glock1); 77 + } 78 + 79 + n = NULL; 80 + bpf_spin_lock(&glock0); 81 + rb_n = bpf_rbtree_root(&groot0); 82 + while (can_loop) { 83 + if (!rb_n) { 84 + bpf_spin_unlock(&glock0); 85 + return __LINE__; 86 + } 87 + 88 + n = rb_entry(rb_n, struct node_data, r0); 89 + if (lookup_key == n->key0) 90 + break; 91 + if (nr_gc < NR_NODES) 92 + gc_ns[nr_gc++] = rb_n; 93 + if (lookup_key < n->key0) 94 + rb_n = bpf_rbtree_left(&groot0, rb_n); 95 + else 96 + rb_n = bpf_rbtree_right(&groot0, rb_n); 97 + } 98 + 99 + if (!n || lookup_key != n->key0) { 100 + bpf_spin_unlock(&glock0); 101 + return __LINE__; 102 + } 103 + 104 + for (i = 0; i < nr_gc; i++) { 105 + rb_n = gc_ns[i]; 106 + gc_ns[i] = bpf_rbtree_remove(&groot0, rb_n); 107 + } 108 + 109 + m = bpf_refcount_acquire(n); 110 + bpf_spin_unlock(&glock0); 111 + 112 + for (i = 0; i < nr_gc; i++) { 113 + rb_n = gc_ns[i]; 114 + if (rb_n) { 115 + n = rb_entry(rb_n, struct node_data, r0); 116 + bpf_obj_drop(n); 117 + } 118 + } 119 + 120 + if (!m) 121 + return __LINE__; 122 + 123 + bpf_spin_lock(&glock1); 124 + rb_m = bpf_rbtree_remove(&groot1, &m->r1); 125 + bpf_spin_unlock(&glock1); 126 + bpf_obj_drop(m); 127 + if (!rb_m) 128 + return __LINE__; 129 + bpf_obj_drop(rb_entry(rb_m, struct node_data, r1)); 130 + 131 + return 0; 132 + } 133 + 134 + #define TEST_ROOT(dolock) \ 135 + SEC("syscall") \ 136 + __failure __msg(MSG) \ 137 + long test_root_spinlock_##dolock(void *ctx) \ 138 + { \ 139 + struct bpf_rb_node *rb_n; \ 140 + __u64 jiffies = 0; \ 141 + \ 142 + if (dolock) \ 143 + bpf_spin_lock(&glock0); \ 144 + rb_n = bpf_rbtree_root(&groot0); \ 145 + if (rb_n) \ 146 + jiffies = bpf_jiffies64(); \ 147 + if (dolock) \ 148 + bpf_spin_unlock(&glock0); \ 149 + \ 150 + return !!jiffies; \ 151 + } 152 + 153 + #define TEST_LR(op, dolock) \ 154 + SEC("syscall") \ 155 + __failure __msg(MSG) \ 156 + long test_##op##_spinlock_##dolock(void *ctx) \ 157 + { \ 158 + struct bpf_rb_node *rb_n; \ 159 + struct node_data *n; \ 160 + __u64 jiffies = 0; \ 161 + \ 162 + bpf_spin_lock(&glock0); \ 163 + rb_n = bpf_rbtree_root(&groot0); \ 164 + if (!rb_n) { \ 165 + bpf_spin_unlock(&glock0); \ 166 + return 1; \ 167 + } \ 168 + n = rb_entry(rb_n, struct node_data, r0); \ 169 + n = bpf_refcount_acquire(n); \ 170 + bpf_spin_unlock(&glock0); \ 171 + if (!n) \ 172 + return 1; \ 173 + \ 174 + if (dolock) \ 175 + bpf_spin_lock(&glock0); \ 176 + rb_n = bpf_rbtree_##op(&groot0, &n->r0); \ 177 + if (rb_n) \ 178 + jiffies = bpf_jiffies64(); \ 179 + if (dolock) \ 180 + bpf_spin_unlock(&glock0); \ 181 + \ 182 + return !!jiffies; \ 183 + } 184 + 185 + /* 186 + * Use a separate MSG macro instead of passing to TEST_XXX(..., MSG) 187 + * to ensure the message itself is not in the bpf prog lineinfo 188 + * which the verifier includes in its log. 189 + * Otherwise, the test_loader will incorrectly match the prog lineinfo 190 + * instead of the log generated by the verifier. 191 + */ 192 + #define MSG "call bpf_rbtree_root{{.+}}; R0{{(_w)?}}=rcu_ptr_or_null_node_data(id={{[0-9]+}},non_own_ref" 193 + TEST_ROOT(true) 194 + #undef MSG 195 + #define MSG "call bpf_rbtree_{{(left|right).+}}; R0{{(_w)?}}=rcu_ptr_or_null_node_data(id={{[0-9]+}},non_own_ref" 196 + TEST_LR(left, true) 197 + TEST_LR(right, true) 198 + #undef MSG 199 + 200 + #define MSG "bpf_spin_lock at off=0 must be held for bpf_rb_root" 201 + TEST_ROOT(false) 202 + TEST_LR(left, false) 203 + TEST_LR(right, false) 204 + #undef MSG 205 + 206 + char _license[] SEC("license") = "GPL";
+41
tools/testing/selftests/bpf/progs/set_global_vars.c
··· 24 24 const volatile enum Enums64 var_ec = EC1; 25 25 const volatile bool var_b = false; 26 26 27 + struct Struct { 28 + int:16; 29 + __u16 filler; 30 + struct { 31 + const __u16 filler2; 32 + }; 33 + struct Struct2 { 34 + __u16 filler; 35 + volatile struct { 36 + const int:1; 37 + union { 38 + const volatile __u8 var_u8; 39 + const volatile __s16 filler3; 40 + const int:1; 41 + } u; 42 + }; 43 + } struct2; 44 + }; 45 + 46 + const volatile __u32 stru = 0; /* same prefix as below */ 47 + const volatile struct Struct struct1 = {.struct2 = {.u = {.var_u8 = 1}}}; 48 + 49 + union Union { 50 + __u16 var_u16; 51 + struct Struct3 { 52 + struct { 53 + __u8 var_u8_l; 54 + }; 55 + struct { 56 + struct { 57 + __u8 var_u8_h; 58 + }; 59 + }; 60 + } struct3; 61 + }; 62 + 63 + const volatile union Union union1 = {.var_u16 = -1}; 64 + 27 65 char arr[4] = {0}; 28 66 29 67 SEC("socket") ··· 81 43 a = var_eb; 82 44 a = var_ec; 83 45 a = var_b; 46 + a = struct1.struct2.u.var_u8; 47 + a = union1.var_u16; 48 + 84 49 return a; 85 50 }
+22
tools/testing/selftests/bpf/progs/test_btf_ext.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* Copyright (c) 2025 Meta Platforms Inc. */ 3 + 4 + #include <linux/bpf.h> 5 + #include <bpf/bpf_helpers.h> 6 + #include "bpf_misc.h" 7 + 8 + char _license[] SEC("license") = "GPL"; 9 + 10 + __noinline static void f0(void) 11 + { 12 + __u64 a = 1; 13 + 14 + __sink(a); 15 + } 16 + 17 + SEC("xdp") 18 + __u64 global_func(struct xdp_md *xdp) 19 + { 20 + f0(); 21 + return XDP_DROP; 22 + }
+36
tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + #include <linux/bpf.h> 3 + #include <bpf/bpf_helpers.h> 4 + #include <bpf/bpf_endian.h> 5 + 6 + int cork_byte; 7 + int push_start; 8 + int push_end; 9 + int apply_bytes; 10 + 11 + struct { 12 + __uint(type, BPF_MAP_TYPE_SOCKMAP); 13 + __uint(max_entries, 20); 14 + __type(key, int); 15 + __type(value, int); 16 + } sock_map SEC(".maps"); 17 + 18 + SEC("sk_msg") 19 + int prog_sk_policy(struct sk_msg_md *msg) 20 + { 21 + if (cork_byte > 0) 22 + bpf_msg_cork_bytes(msg, cork_byte); 23 + if (push_start > 0 && push_end > 0) 24 + bpf_msg_push_data(msg, push_start, push_end, 0); 25 + 26 + return SK_PASS; 27 + } 28 + 29 + SEC("sk_msg") 30 + int prog_sk_policy_redir(struct sk_msg_md *msg) 31 + { 32 + int two = 2; 33 + 34 + bpf_msg_apply_bytes(msg, apply_bytes); 35 + return bpf_msg_redirect_map(msg, &sock_map, two, 0); 36 + }
+68
tools/testing/selftests/bpf/progs/test_sockmap_redir.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #include <linux/bpf.h> 4 + #include <bpf/bpf_helpers.h> 5 + #include "bpf_misc.h" 6 + 7 + SEC(".maps") struct { 8 + __uint(type, BPF_MAP_TYPE_SOCKMAP); 9 + __uint(max_entries, 1); 10 + __type(key, __u32); 11 + __type(value, __u64); 12 + } nop_map, sock_map; 13 + 14 + SEC(".maps") struct { 15 + __uint(type, BPF_MAP_TYPE_SOCKHASH); 16 + __uint(max_entries, 1); 17 + __type(key, __u32); 18 + __type(value, __u64); 19 + } nop_hash, sock_hash; 20 + 21 + SEC(".maps") struct { 22 + __uint(type, BPF_MAP_TYPE_ARRAY); 23 + __uint(max_entries, 2); 24 + __type(key, int); 25 + __type(value, unsigned int); 26 + } verdict_map; 27 + 28 + /* Set by user space */ 29 + int redirect_type; 30 + int redirect_flags; 31 + 32 + #define redirect_map(__data) \ 33 + _Generic((__data), \ 34 + struct __sk_buff * : bpf_sk_redirect_map, \ 35 + struct sk_msg_md * : bpf_msg_redirect_map \ 36 + )((__data), &sock_map, (__u32){0}, redirect_flags) 37 + 38 + #define redirect_hash(__data) \ 39 + _Generic((__data), \ 40 + struct __sk_buff * : bpf_sk_redirect_hash, \ 41 + struct sk_msg_md * : bpf_msg_redirect_hash \ 42 + )((__data), &sock_hash, &(__u32){0}, redirect_flags) 43 + 44 + #define DEFINE_PROG(__type, __param) \ 45 + SEC("sk_" XSTR(__type)) \ 46 + int prog_ ## __type ## _verdict(__param data) \ 47 + { \ 48 + unsigned int *count; \ 49 + int verdict; \ 50 + \ 51 + if (redirect_type == BPF_MAP_TYPE_SOCKMAP) \ 52 + verdict = redirect_map(data); \ 53 + else if (redirect_type == BPF_MAP_TYPE_SOCKHASH) \ 54 + verdict = redirect_hash(data); \ 55 + else \ 56 + verdict = redirect_type - __MAX_BPF_MAP_TYPE; \ 57 + \ 58 + count = bpf_map_lookup_elem(&verdict_map, &verdict); \ 59 + if (count) \ 60 + (*count)++; \ 61 + \ 62 + return verdict; \ 63 + } 64 + 65 + DEFINE_PROG(skb, struct __sk_buff *); 66 + DEFINE_PROG(msg, struct sk_msg_md *); 67 + 68 + char _license[] SEC("license") = "GPL";
+3 -1
tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c
··· 294 294 (ctx->ipv6 && ctx->attrs.mss != MSS_LOCAL_IPV6)) 295 295 goto err; 296 296 297 - if (!ctx->attrs.wscale_ok || ctx->attrs.snd_wscale != 7) 297 + if (!ctx->attrs.wscale_ok || 298 + !ctx->attrs.snd_wscale || 299 + ctx->attrs.snd_wscale >= BPF_SYNCOOKIE_WSCALE_MASK) 300 goto err; 300 302 301 303 if (!ctx->attrs.tstamp_ok)
+71
tools/testing/selftests/bpf/progs/verifier_bpf_trap.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */ 3 + #include <vmlinux.h> 4 + #include <bpf/bpf_helpers.h> 5 + #include "bpf_misc.h" 6 + 7 + #if __clang_major__ >= 21 && 0 8 + SEC("socket") 9 + __description("__builtin_trap with simple c code") 10 + __failure __msg("unexpected __bpf_trap() due to uninitialized variable?") 11 + void bpf_builtin_trap_with_simple_c(void) 12 + { 13 + __builtin_trap(); 14 + } 15 + #endif 16 + 17 + SEC("socket") 18 + __description("__bpf_trap with simple c code") 19 + __failure __msg("unexpected __bpf_trap() due to uninitialized variable?") 20 + void bpf_trap_with_simple_c(void) 21 + { 22 + __bpf_trap(); 23 + } 24 + 25 + SEC("socket") 26 + __description("__bpf_trap as the second-from-last insn") 27 + __failure __msg("unexpected __bpf_trap() due to uninitialized variable?") 28 + __naked void bpf_trap_at_func_end(void) 29 + { 30 + asm volatile ( 31 + "r0 = 0;" 32 + "call %[__bpf_trap];" 33 + "exit;" 34 + : 35 + : __imm(__bpf_trap) 36 + : __clobber_all); 37 + } 38 + 39 + SEC("socket") 40 + __description("dead code __bpf_trap in the middle of code") 41 + __success 42 + __naked void dead_bpf_trap_in_middle(void) 43 + { 44 + asm volatile ( 45 + "r0 = 0;" 46 + "if r0 == 0 goto +1;" 47 + "call %[__bpf_trap];" 48 + "r0 = 2;" 49 + "exit;" 50 + : 51 + : __imm(__bpf_trap) 52 + : __clobber_all); 53 + } 54 + 55 + SEC("socket") 56 + __description("reachable __bpf_trap in the middle of code") 57 + __failure __msg("unexpected __bpf_trap() due to uninitialized variable?") 58 + __naked void live_bpf_trap_in_middle(void) 59 + { 60 + asm volatile ( 61 + "r0 = 0;" 62 + "if r0 == 1 goto +1;" 63 + "call %[__bpf_trap];" 64 + "r0 = 2;" 65 + "exit;" 66 + : 67 + : __imm(__bpf_trap) 68 + : __clobber_all); 69 + } 70 + 71 + char _license[] SEC("license") = "GPL";
+12
tools/testing/selftests/bpf/progs/verifier_btf_ctx_access.c
··· 65 65 " ::: __clobber_all); 66 66 } 67 67 68 + SEC("fentry/bpf_fentry_test10") 69 + __description("btf_ctx_access const void pointer accept") 70 + __success __retval(0) 71 + __naked void ctx_access_const_void_pointer_accept(void) 72 + { 73 + asm volatile (" \ 74 + r2 = *(u64 *)(r1 + 0); /* load 1st argument value (const void pointer) */\ 75 + r0 = 0; \ 76 + exit; \ 77 + " ::: __clobber_all); 78 + } 79 + 68 80 char _license[] SEC("license") = "GPL";
+32 -16
tools/testing/selftests/bpf/progs/verifier_load_acquire.c
··· 10 10 11 11 SEC("socket") 12 12 __description("load-acquire, 8-bit") 13 - __success __success_unpriv __retval(0x12) 13 + __success __success_unpriv __retval(0) 14 14 __naked void load_acquire_8(void) 15 15 { 16 16 asm volatile ( 17 - "w1 = 0x12;" 17 + "r0 = 0;" 18 + "w1 = 0xfe;" 18 19 "*(u8 *)(r10 - 1) = w1;" 19 - ".8byte %[load_acquire_insn];" // w0 = load_acquire((u8 *)(r10 - 1)); 20 + ".8byte %[load_acquire_insn];" // w2 = load_acquire((u8 *)(r10 - 1)); 21 + "if r2 == r1 goto 1f;" 22 + "r0 = 1;" 23 + "1:" 20 24 "exit;" 21 25 : 22 26 : __imm_insn(load_acquire_insn, 23 - BPF_ATOMIC_OP(BPF_B, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_10, -1)) 27 + BPF_ATOMIC_OP(BPF_B, BPF_LOAD_ACQ, BPF_REG_2, BPF_REG_10, -1)) 24 28 : __clobber_all); 25 29 } 26 30 27 31 SEC("socket") 28 32 __description("load-acquire, 16-bit") 29 - __success __success_unpriv __retval(0x1234) 33 + __success __success_unpriv __retval(0) 30 34 __naked void load_acquire_16(void) 31 35 { 32 36 asm volatile ( 33 - "w1 = 0x1234;" 37 + "r0 = 0;" 38 + "w1 = 0xfedc;" 34 39 "*(u16 *)(r10 - 2) = w1;" 35 - ".8byte %[load_acquire_insn];" // w0 = load_acquire((u16 *)(r10 - 2)); 40 + ".8byte %[load_acquire_insn];" // w2 = load_acquire((u16 *)(r10 - 2)); 41 + "if r2 == r1 goto 1f;" 42 + "r0 = 1;" 43 + "1:" 36 44 "exit;" 37 45 : 38 46 : __imm_insn(load_acquire_insn, 39 - BPF_ATOMIC_OP(BPF_H, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_10, -2)) 47 + BPF_ATOMIC_OP(BPF_H, BPF_LOAD_ACQ, BPF_REG_2, BPF_REG_10, -2)) 40 48 : __clobber_all); 41 49 } 42 50 43 51 SEC("socket") 44 52 __description("load-acquire, 32-bit") 45 - __success __success_unpriv __retval(0x12345678) 53 + __success __success_unpriv __retval(0) 46 54 __naked void load_acquire_32(void) 47 55 { 48 56 asm volatile ( 49 - "w1 = 0x12345678;" 57 + "r0 = 0;" 58 + "w1 = 0xfedcba09;" 50 59 "*(u32 *)(r10 - 4) = w1;" 51 - ".8byte %[load_acquire_insn];" // w0 = load_acquire((u32 *)(r10 - 4)); 60 + ".8byte %[load_acquire_insn];" // w2 = load_acquire((u32 *)(r10 - 4)); 61 + "if r2 == r1 goto 1f;" 62 + "r0 = 1;" 63 + "1:" 52 64 "exit;" 53 65 : 54 66 : __imm_insn(load_acquire_insn, 55 - BPF_ATOMIC_OP(BPF_W, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_10, -4)) 67 + BPF_ATOMIC_OP(BPF_W, BPF_LOAD_ACQ, BPF_REG_2, BPF_REG_10, -4)) 56 68 : __clobber_all); 57 69 } 58 70 59 71 SEC("socket") 60 72 __description("load-acquire, 64-bit") 61 - __success __success_unpriv __retval(0x1234567890abcdef) 73 + __success __success_unpriv __retval(0) 62 74 __naked void load_acquire_64(void) 63 75 { 64 76 asm volatile ( 65 - "r1 = 0x1234567890abcdef ll;" 77 + "r0 = 0;" 78 + "r1 = 0xfedcba0987654321 ll;" 66 79 "*(u64 *)(r10 - 8) = r1;" 67 - ".8byte %[load_acquire_insn];" // r0 = load_acquire((u64 *)(r10 - 8)); 80 + ".8byte %[load_acquire_insn];" // r2 = load_acquire((u64 *)(r10 - 8)); 81 + "if r2 == r1 goto 1f;" 82 + "r0 = 1;" 83 + "1:" 68 84 "exit;" 69 85 : 70 86 : __imm_insn(load_acquire_insn, 71 - BPF_ATOMIC_OP(BPF_DW, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_10, -8)) 87 + BPF_ATOMIC_OP(BPF_DW, BPF_LOAD_ACQ, BPF_REG_2, BPF_REG_10, -8)) 72 88 : __clobber_all); 73 89 } 74 90
+55 -3
tools/testing/selftests/bpf/progs/verifier_precision.c
··· 91 91 ::: __clobber_all); 92 92 } 93 93 94 - #if defined(ENABLE_ATOMICS_TESTS) && \ 95 - (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86)) 94 + #ifdef CAN_USE_LOAD_ACQ_STORE_REL 96 95 97 96 SEC("?raw_tp") 98 97 __success __log_level(2) ··· 137 138 : __clobber_all); 138 139 } 139 140 140 - #endif /* load-acquire, store-release */ 141 + #endif /* CAN_USE_LOAD_ACQ_STORE_REL */ 141 142 #endif /* v4 instruction */ 142 143 143 144 SEC("?raw_tp") ··· 176 177 "exit;" 177 178 ::: __clobber_common 178 179 ); 180 + } 181 + 182 + __used __naked static void __bpf_cond_op_r10(void) 183 + { 184 + asm volatile ( 185 + "r2 = 2314885393468386424 ll;" 186 + "goto +0;" 187 + "if r2 <= r10 goto +3;" 188 + "if r1 >= -1835016 goto +0;" 189 + "if r2 <= 8 goto +0;" 190 + "if r3 <= 0 goto +0;" 191 + "exit;" 192 + ::: __clobber_all); 193 + } 194 + 195 + SEC("?raw_tp") 196 + __success __log_level(2) 197 + __msg("8: (bd) if r2 <= r10 goto pc+3") 198 + __msg("9: (35) if r1 >= 0xffe3fff8 goto pc+0") 199 + __msg("10: (b5) if r2 <= 0x8 goto pc+0") 200 + __msg("mark_precise: frame1: last_idx 10 first_idx 0 subseq_idx -1") 201 + __msg("mark_precise: frame1: regs=r2 stack= before 9: (35) if r1 >= 0xffe3fff8 goto pc+0") 202 + __msg("mark_precise: frame1: regs=r2 stack= before 8: (bd) if r2 <= r10 goto pc+3") 203 + __msg("mark_precise: frame1: regs=r2 stack= before 7: (05) goto pc+0") 204 + __naked void bpf_cond_op_r10(void) 205 + { 206 + asm volatile ( 207 + "r3 = 0 ll;" 208 + "call __bpf_cond_op_r10;" 209 + "r0 = 0;" 210 + "exit;" 211 + ::: __clobber_all); 212 + } 213 + 214 + SEC("?raw_tp") 215 + __success __log_level(2) 216 + __msg("3: (bf) r3 = r10") 217 + __msg("4: (bd) if r3 <= r2 goto pc+1") 218 + __msg("5: (b5) if r2 <= 0x8 goto pc+2") 219 + __msg("mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1") 220 + __msg("mark_precise: frame0: regs=r2 stack= before 4: (bd) if r3 <= r2 goto pc+1") 221 + __msg("mark_precise: frame0: regs=r2 stack= before 3: (bf) r3 = r10") 222 + __naked void bpf_cond_op_not_r10(void) 223 + { 224 + asm volatile ( 225 + "r0 = 0;" 226 + "r2 = 2314885393468386424 ll;" 227 + "r3 = r10;" 228 + "if r3 <= r2 goto +1;" 229 + "if r2 <= 8 goto +2;" 230 + "r0 = 2 ll;" 231 + "exit;" 232 + ::: __clobber_all); 179 233 } 180 234 181 235 char _license[] SEC("license") = "GPL";
+27 -12
tools/testing/selftests/bpf/progs/verifier_store_release.c
··· 6 6 #include "../../../include/linux/filter.h" 7 7 #include "bpf_misc.h" 8 8 9 - #if __clang_major__ >= 18 && defined(ENABLE_ATOMICS_TESTS) && \ 10 - (defined(__TARGET_ARCH_arm64) || defined(__TARGET_ARCH_x86)) 9 + #ifdef CAN_USE_LOAD_ACQ_STORE_REL 11 10 12 11 SEC("socket") 13 12 __description("store-release, 8-bit") 14 - __success __success_unpriv __retval(0x12) 13 + __success __success_unpriv __retval(0) 15 14 __naked void store_release_8(void) 16 15 { 17 16 asm volatile ( 17 + "r0 = 0;" 18 18 "w1 = 0x12;" 19 19 ".8byte %[store_release_insn];" // store_release((u8 *)(r10 - 1), w1); 20 - "w0 = *(u8 *)(r10 - 1);" 20 + "w2 = *(u8 *)(r10 - 1);" 21 + "if r2 == r1 goto 1f;" 22 + "r0 = 1;" 23 + "1:" 21 24 "exit;" 22 25 : 23 26 : __imm_insn(store_release_insn, ··· 30 27 31 28 SEC("socket") 32 29 __description("store-release, 16-bit") 33 - __success __success_unpriv __retval(0x1234) 30 + __success __success_unpriv __retval(0) 34 31 __naked void store_release_16(void) 35 32 { 36 33 asm volatile ( 34 + "r0 = 0;" 37 35 "w1 = 0x1234;" 38 36 ".8byte %[store_release_insn];" // store_release((u16 *)(r10 - 2), w1); 39 - "w0 = *(u16 *)(r10 - 2);" 37 + "w2 = *(u16 *)(r10 - 2);" 38 + "if r2 == r1 goto 1f;" 39 + "r0 = 1;" 40 + "1:" 40 41 "exit;" 41 42 : 42 43 : __imm_insn(store_release_insn, ··· 50 43 51 44 SEC("socket") 52 45 __description("store-release, 32-bit") 53 - __success __success_unpriv __retval(0x12345678) 46 + __success __success_unpriv __retval(0) 54 47 __naked void store_release_32(void) 55 48 { 56 49 asm volatile ( 50 + "r0 = 0;" 57 51 "w1 = 0x12345678;" 58 52 ".8byte %[store_release_insn];" // store_release((u32 *)(r10 - 4), w1); 59 - "w0 = *(u32 *)(r10 - 4);" 53 + "w2 = *(u32 *)(r10 - 4);" 54 + "if r2 == r1 goto 1f;" 55 + "r0 = 1;" 56 + "1:" 60 57 "exit;" 61 58 : 62 59 : __imm_insn(store_release_insn, ··· 70 59 71 60 SEC("socket") 72 61 __description("store-release, 64-bit") 73 - __success __success_unpriv __retval(0x1234567890abcdef) 62 + __success __success_unpriv __retval(0) 74 63 __naked void store_release_64(void) 75 64 { 76 65 asm volatile ( 66 + "r0 = 0;" 77 67 "r1 = 0x1234567890abcdef ll;" 78 68 ".8byte %[store_release_insn];" // store_release((u64 *)(r10 - 8), r1); 79 - "r0 = *(u64 *)(r10 - 8);" 69 + "r2 = *(u64 *)(r10 - 8);" 70 + "if r2 == r1 goto 1f;" 71 + "r0 = 1;" 72 + "1:" 80 73 "exit;" 81 74 : 82 75 : __imm_insn(store_release_insn, ··· 286 271 : __clobber_all); 287 272 } 288 273 289 - #else 274 + #else /* CAN_USE_LOAD_ACQ_STORE_REL */ 290 275 291 276 SEC("socket") 292 277 __description("Clang version < 18, ENABLE_ATOMICS_TESTS not defined, and/or JIT doesn't support store-release, use a dummy test") ··· 296 281 return 0; 297 282 } 298 283 299 - #endif 284 + #endif /* CAN_USE_LOAD_ACQ_STORE_REL */ 300 285 301 286 char _license[] SEC("license") = "GPL";
+13
tools/testing/selftests/bpf/progs/xdp_metadata.c
··· 19 19 __type(value, __u32); 20 20 } prog_arr SEC(".maps"); 21 21 22 + struct { 23 + __uint(type, BPF_MAP_TYPE_DEVMAP); 24 + __uint(key_size, sizeof(__u32)); 25 + __uint(value_size, sizeof(struct bpf_devmap_val)); 26 + __uint(max_entries, 1); 27 + } dev_map SEC(".maps"); 28 + 22 29 extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, 23 30 __u64 *timestamp) __ksym; 24 31 extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash, ··· 100 93 &meta->rx_vlan_tci); 101 94 102 95 return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS); 96 + } 97 + 98 + SEC("xdp") 99 + int redirect(struct xdp_md *ctx) 100 + { 101 + return bpf_redirect_map(&dev_map, ctx->rx_queue_index, XDP_PASS); 103 102 } 104 103 105 104 char _license[] SEC("license") = "GPL";
+6 -2
tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
··· 134 134 return bpf_testmod_test_struct_arg_result; 135 135 } 136 136 137 + __weak noinline void bpf_testmod_looooooooooooooooooooooooooooooong_name(void) 138 + { 139 + } 140 + 137 141 __bpf_kfunc void 138 142 bpf_testmod_test_mod_kfunc(int i) 139 143 { ··· 1344 1340 *insn++ = BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_7, offsetof(struct st_ops_args, a)); 1345 1341 *insn++ = BPF_JMP_IMM(BPF_JA, 0, 0, 2); 1346 1342 *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_0); 1347 - *insn++ = BPF_CALL_KFUNC(0, bpf_cgroup_release_id), 1343 + *insn++ = BPF_CALL_KFUNC(0, bpf_cgroup_release_id); 1348 1344 *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_8); 1349 1345 *insn++ = prog->insnsi[0]; 1350 1346 ··· 1383 1379 *insn++ = BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, offsetof(struct st_ops_args, a)); 1384 1380 *insn++ = BPF_JMP_IMM(BPF_JA, 0, 0, 2); 1385 1381 *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_0); 1386 - *insn++ = BPF_CALL_KFUNC(0, bpf_cgroup_release_id), 1382 + *insn++ = BPF_CALL_KFUNC(0, bpf_cgroup_release_id); 1387 1383 *insn++ = BPF_MOV64_REG(BPF_REG_0, BPF_REG_6); 1388 1384 *insn++ = BPF_ALU64_IMM(BPF_MUL, BPF_REG_0, 2); 1389 1385 *insn++ = BPF_EXIT_INSN();
+8 -6
tools/testing/selftests/bpf/test_loader.c
··· 1042 1042 emit_verifier_log(tester->log_buf, false /*force*/); 1043 1043 validate_msgs(tester->log_buf, &subspec->expect_msgs, emit_verifier_log); 1044 1044 1045 + /* Restore capabilities because the kernel will silently ignore requests 1046 + * for program info (such as xlated program text) if we are not 1047 + * bpf-capable. Also, for some reason test_verifier executes programs 1048 + * with all capabilities restored. Do the same here. 1049 + */ 1050 + if (restore_capabilities(&caps)) 1051 + goto tobj_cleanup; 1052 + 1045 1053 if (subspec->expect_xlated.cnt) { 1046 1054 err = get_xlated_program_text(bpf_program__fd(tprog), 1047 1055 tester->log_buf, tester->log_buf_sz); ··· 1075 1067 } 1076 1068 1077 1069 if (should_do_test_run(spec, subspec)) { 1078 - /* For some reason test_verifier executes programs 1079 - * with all capabilities restored. Do the same here. 1080 - */ 1081 - if (restore_capabilities(&caps)) 1082 - goto tobj_cleanup; 1083 - 1084 1070 /* Do bpf_map__attach_struct_ops() for each struct_ops map. 1085 1071 * This should trigger bpf_struct_ops->reg callback on kernel side. 1086 1072 */
+4 -4
tools/testing/selftests/bpf/test_verifier.c
··· 734 734 BTF_MEMBER_ENC(71, 13, 128), /* struct prog_test_member __kptr *ptr; */ 735 735 }; 736 736 737 - static char bpf_vlog[UINT_MAX >> 8]; 737 + static char bpf_vlog[UINT_MAX >> 5]; 738 738 739 739 static int load_btf_spec(__u32 *types, int types_len, 740 740 const char *strings, int strings_len) ··· 1559 1559 test->errstr_unpriv : test->errstr; 1560 1560 1561 1561 opts.expected_attach_type = test->expected_attach_type; 1562 - if (verbose) 1563 - opts.log_level = verif_log_level | 4; /* force stats */ 1564 - else if (expected_ret == VERBOSE_ACCEPT) 1562 + if (expected_ret == VERBOSE_ACCEPT) 1565 1563 opts.log_level = 2; 1564 + else if (verbose) 1565 + opts.log_level = verif_log_level | 4; /* force stats */ 1566 1566 else 1567 1567 opts.log_level = DEFAULT_LIBBPF_LOG_LEVEL; 1568 1568 opts.prog_flags = pflags;
+94 -7
tools/testing/selftests/bpf/veristat.c
··· 1486 1486 return btf_is_int(t) || btf_is_enum(t) || btf_is_enum64(t); 1487 1487 } 1488 1488 1489 - static int set_global_var(struct bpf_object *obj, struct btf *btf, const struct btf_type *t, 1489 + const int btf_find_member(const struct btf *btf, 1490 + const struct btf_type *parent_type, 1491 + __u32 parent_offset, 1492 + const char *member_name, 1493 + int *member_tid, 1494 + __u32 *member_offset) 1495 + { 1496 + int i; 1497 + 1498 + if (!btf_is_composite(parent_type)) 1499 + return -EINVAL; 1500 + 1501 + for (i = 0; i < btf_vlen(parent_type); ++i) { 1502 + const struct btf_member *member; 1503 + const struct btf_type *member_type; 1504 + int tid; 1505 + 1506 + member = btf_members(parent_type) + i; 1507 + tid = btf__resolve_type(btf, member->type); 1508 + if (tid < 0) 1509 + return -EINVAL; 1510 + 1511 + member_type = btf__type_by_id(btf, tid); 1512 + if (member->name_off) { 1513 + const char *name = btf__name_by_offset(btf, member->name_off); 1514 + 1515 + if (strcmp(member_name, name) == 0) { 1516 + if (btf_member_bitfield_size(parent_type, i) != 0) { 1517 + fprintf(stderr, "Bitfield presets are not supported %s\n", 1518 + name); 1519 + return -EINVAL; 1520 + } 1521 + *member_offset = parent_offset + member->offset; 1522 + *member_tid = tid; 1523 + return 0; 1524 + } 1525 + } else if (btf_is_composite(member_type)) { 1526 + int err; 1527 + 1528 + err = btf_find_member(btf, member_type, parent_offset + member->offset, 1529 + member_name, member_tid, member_offset); 1530 + if (!err) 1531 + return 0; 1532 + } 1533 + } 1534 + 1535 + return -EINVAL; 1536 + } 1537 + 1538 + static int adjust_var_secinfo(struct btf *btf, const struct btf_type *t, 1539 + struct btf_var_secinfo *sinfo, const char *var) 1540 + { 1541 + char expr[256], *saveptr; 1542 + const struct btf_type *base_type, *member_type; 1543 + int err, member_tid; 1544 + char *name; 1545 + __u32 member_offset = 0; 1546 + 1547 + base_type = btf__type_by_id(btf, btf__resolve_type(btf, t->type)); 1548 + snprintf(expr, sizeof(expr), "%s", var); 1549 + strtok_r(expr, ".", &saveptr); 1550 + 1551 + while ((name = strtok_r(NULL, ".", &saveptr))) { 1552 + err = btf_find_member(btf, base_type, 0, name, &member_tid, &member_offset); 1553 + if (err) { 1554 + fprintf(stderr, "Could not find member %s for variable %s\n", name, var); 1555 + return err; 1556 + } 1557 + member_type = btf__type_by_id(btf, member_tid); 1558 + sinfo->offset += member_offset / 8; 1559 + sinfo->size = member_type->size; 1560 + sinfo->type = member_tid; 1561 + base_type = member_type; 1562 + } 1563 + return 0; 1564 + } 1565 + 1566 + static int set_global_var(struct bpf_object *obj, struct btf *btf, 1490 1567 struct bpf_map *map, struct btf_var_secinfo *sinfo, 1491 1568 struct var_preset *preset) 1492 1569 { ··· 1572 1495 long long value = preset->ivalue; 1573 1496 size_t size; 1574 1497 1575 - base_type = btf__type_by_id(btf, btf__resolve_type(btf, t->type)); 1498 + base_type = btf__type_by_id(btf, btf__resolve_type(btf, sinfo->type)); 1576 1499 if (!base_type) { 1577 - fprintf(stderr, "Failed to resolve type %d\n", t->type); 1500 + fprintf(stderr, "Failed to resolve type %d\n", sinfo->type); 1578 1501 return -EINVAL; 1579 1502 } 1580 1503 if (!is_preset_supported(base_type)) { ··· 1607 1530 if (value >= max_val || value < -max_val) { 1608 1531 fprintf(stderr, 1609 1532 "Variable %s value %lld is out of range [%lld; %lld]\n", 1610 - btf__name_by_offset(btf, t->name_off), value, 1533 + btf__name_by_offset(btf, base_type->name_off), value, 1611 1534 is_signed ? -max_val : 0, max_val - 1); 1612 1535 return -EINVAL; 1613 1536 } ··· 1660 1583 for (j = 0; j < n; ++j, ++sinfo) { 1661 1584 const struct btf_type *var_type = btf__type_by_id(btf, sinfo->type); 1662 1585 const char *var_name; 1586 + int var_len; 1663 1587 1664 1588 if (!btf_is_var(var_type)) 1665 1589 continue; 1666 1590 1667 1591 var_name = btf__name_by_offset(btf, var_type->name_off); 1592 + var_len = strlen(var_name); 1668 1593 1669 1594 for (k = 0; k < npresets; ++k) { 1595 + struct btf_var_secinfo tmp_sinfo; 1596 + 1597 + if (strncmp(var_name, presets[k].name, var_len) != 0 || 1598 + (presets[k].name[var_len] != '\0' && 1599 + presets[k].name[var_len] != '.')) 1671 1600 continue; 1672 1601 1673 1602 if (presets[k].applied) { ··· 1681 1598 var_name); 1682 1599 return -EINVAL; 1683 1600 } 1601 + tmp_sinfo = *sinfo; 1602 + err = adjust_var_secinfo(btf, var_type, 1603 + &tmp_sinfo, presets[k].name); 1604 + if (err) 1605 + return err; 1684 1606 1685 - err = set_global_var(obj, btf, var_type, map, sinfo, presets + k); 1607 + err = set_global_var(obj, btf, map, &tmp_sinfo, presets + k); 1686 1608 if (err) 1687 1609 return err; 1688 1610 1689 1611 presets[k].applied = true; 1690 - break; 1691 1612 } 1692 1613 }