Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-03-06

We've added 85 non-merge commits during the last 13 day(s) which contain
a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).

The main changes are:

1) Add skb and XDP typed dynptrs, which allow for more ergonomic and less
brittle iteration through data and variable-sized accesses in BPF
programs, from Joanne Koong.

2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF
open-coded iterators allowing for less restrictive looping capabilities,
from Andrii Nakryiko.

3) Rework RCU enforcement in the verifier: add kptr_rcu and require BPF
programs to NULL-check such pointers before passing them into a kfunc,
from Alexei Starovoitov.

4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
local storage maps, from Kumar Kartikeya Dwivedi.

5) Add BPF verifier support for ST instructions in convert_ctx_access(),
which will help the new -mcpu=v4 clang flag to start emitting them,
from Eduard Zingerman.

6) Make uprobe attachment Android APK aware by supporting attachment
to functions inside ELF objects contained in APKs via function names,
from Daniel Müller.

7) Add a new BPF_F_TIMER_ABS flag for the bpf_timer_start() helper to
start the timer with an absolute expiration value instead of a relative
one, from Tero Kristo.

8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id,
from Tejun Heo.

9) Extend libbpf to support users manually attaching kprobes/uprobes
in the legacy/perf/link mode, from Menglong Dong.

10) Implement workarounds in the mips BPF JIT for DADDI/R4000,
from Jiaxun Yang.

11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT,
from Hengqi Chen.

12) Extend the BPF instruction set doc to describe the encoding of BPF
instructions in terms of how bytes are stored under big/little endian,
from Jose E. Marchesi.

13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.

14) Fix bpf_xdp_query() backwards compatibility on old kernels,
from Yonghong Song.

15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS,
from Florent Revest.

16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache,
from Hou Tao.

17) Fix BPF verifier's check_subprogs to not unnecessarily mark
a subprogram with has_tail_call, from Ilya Leoshkevich.

18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
selftests/bpf: Split test_attach_probe into multi subtests
libbpf: Add support to set kprobe/uprobe attach mode
tools/resolve_btfids: Add /libsubcmd to .gitignore
bpf: add support for fixed-size memory pointer returns for kfuncs
bpf: generalize dynptr_get_spi to be usable for iters
bpf: mark PTR_TO_MEM as non-null register type
bpf: move kfunc_call_arg_meta higher in the file
bpf: ensure that r0 is marked scratched after any function call
bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
bpf: clean up visit_insn()'s instruction processing
selftests/bpf: adjust log_fixup's buffer size for proper truncation
bpf: honor env->test_state_freq flag in is_state_visited()
selftests/bpf: enhance align selftest's expected log matching
bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
bpf: improve stack slot state printing
selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
bpf: allow ctx writes using BPF_ST_MEM instruction
bpf: Use separate RCU callbacks for freeing selem
...
====================

Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+7116 -1808
+2 -2
Documentation/bpf/bpf_design_QA.rst
··· 314 314 Q: Users are allowed to embed bpf_spin_lock, bpf_timer fields in their BPF map 315 315 values (when using BTF support for BPF maps). This allows to use helpers for 316 316 such objects on these fields inside map values. Users are also allowed to embed 317 - pointers to some kernel types (with __kptr and __kptr_ref BTF tags). Will the 317 + pointers to some kernel types (with __kptr_untrusted and __kptr BTF tags). Will the 318 318 kernel preserve backwards compatibility for these features? 319 319 320 320 A: It depends. For bpf_spin_lock, bpf_timer: YES, for kptr and everything else: ··· 324 324 the kernel will preserve backwards compatibility, as they are part of UAPI. 325 325 326 326 For kptrs, they are also part of UAPI, but only with respect to the kptr 327 - mechanism. The types that you can use with a __kptr and __kptr_ref tagged 327 + mechanism. The types that you can use with a __kptr_untrusted and __kptr tagged 328 328 pointer in your struct are NOT part of the UAPI contract. The supported types can 329 329 and will change across kernel releases. However, operations like accessing kptr 330 330 fields and bpf_kptr_xchg() helper will continue to be supported across kernel
+7 -7
Documentation/bpf/bpf_devel_QA.rst
··· 128 128 net-next are both run by David S. Miller. From there, they will go 129 129 into the kernel mainline tree run by Linus Torvalds. To read up on the 130 130 process of net and net-next being merged into the mainline tree, see 131 - the :ref:`netdev-FAQ` 131 + the `netdev-FAQ`_. 132 132 133 133 134 134 ··· 147 147 Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be applied to? 148 148 --------------------------------------------------------------------------------- 149 149 150 - A: The process is the very same as described in the :ref:`netdev-FAQ`, 150 + A: The process is the very same as described in the `netdev-FAQ`_, 151 151 so please read up on it. The subject line must indicate whether the 152 152 patch is a fix or rather "next-like" content in order to let the 153 153 maintainers know whether it is targeted at bpf or bpf-next. ··· 206 206 Once the BPF pull request was accepted by David S. Miller, then 207 207 the patches end up in net or net-next tree, respectively, and 208 208 make their way from there further into mainline. Again, see the 209 - :ref:`netdev-FAQ` for additional information e.g. on how often they are 209 + `netdev-FAQ`_ for additional information e.g. on how often they are 210 210 merged to mainline. 211 211 212 212 Q: How long do I need to wait for feedback on my BPF patches? ··· 230 230 ----------------------------------------------------------------- 231 231 A: For the time when the merge window is open, bpf-next will not be 232 232 processed. This is roughly analogous to net-next patch processing, 233 - so feel free to read up on the :ref:`netdev-FAQ` about further details. 233 + so feel free to read up on the `netdev-FAQ`_ about further details. 234 234 235 235 During those two weeks of merge window, we might ask you to resend 236 236 your patch series once bpf-next is open again. 
Once Linus released ··· 394 394 netdev@vger.kernel.org 395 395 396 396 The process in general is the same as on netdev itself, see also the 397 - :ref:`netdev-FAQ`. 397 + `netdev-FAQ`_. 398 398 399 399 Q: Do you also backport to kernels not currently maintained as stable? 400 400 ---------------------------------------------------------------------- ··· 410 410 What should I do? 411 411 412 412 A: The same rules apply as with netdev patch submissions in general, see 413 - the :ref:`netdev-FAQ`. 413 + the `netdev-FAQ`_. 414 414 415 415 Never add "``Cc: stable@vger.kernel.org``" to the patch description, but 416 416 ask the BPF maintainers to queue the patches instead. This can be done ··· 685 685 686 686 .. Links 687 687 .. _Documentation/process/: https://www.kernel.org/doc/html/latest/process/ 688 - .. _netdev-FAQ: Documentation/process/maintainer-netdev.rst 688 + .. _netdev-FAQ: https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html 689 689 .. _selftests: 690 690 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/ 691 691 .. _Documentation/dev-tools/kselftest.rst:
+2 -2
Documentation/bpf/cpumasks.rst
··· 51 51 .. code-block:: c 52 52 53 53 struct cpumask_map_value { 54 - struct bpf_cpumask __kptr_ref * cpumask; 54 + struct bpf_cpumask __kptr * cpumask; 55 55 }; 56 56 57 57 struct array_map { ··· 128 128 129 129 /* struct containing the struct bpf_cpumask kptr which is stored in the map. */ 130 130 struct cpumasks_kfunc_map_value { 131 - struct bpf_cpumask __kptr_ref * bpf_cpumask; 131 + struct bpf_cpumask __kptr * bpf_cpumask; 132 132 }; 133 133 134 134 /* The map containing struct cpumasks_kfunc_map_value entries. */
+27 -13
Documentation/bpf/instruction-set.rst
··· 38 38 * the wide instruction encoding, which appends a second 64-bit immediate (i.e., 39 39 constant) value after the basic instruction for a total of 128 bits. 40 40 41 - The basic instruction encoding is as follows, where MSB and LSB mean the most significant 42 - bits and least significant bits, respectively: 41 + The fields conforming an encoded basic instruction are stored in the 42 + following order:: 43 43 44 - ============= ======= ======= ======= ============ 45 - 32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) 46 - ============= ======= ======= ======= ============ 47 - imm offset src_reg dst_reg opcode 48 - ============= ======= ======= ======= ============ 44 + opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF. 45 + opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF. 49 46 50 47 **imm** 51 48 signed integer immediate value ··· 60 63 **opcode** 61 64 operation to perform 62 65 66 + Note that the contents of multi-byte fields ('imm' and 'offset') are 67 + stored using big-endian byte ordering in big-endian BPF and 68 + little-endian byte ordering in little-endian BPF. 69 + 70 + For example:: 71 + 72 + opcode offset imm assembly 73 + src_reg dst_reg 74 + 07 0 1 00 00 44 33 22 11 r1 += 0x11223344 // little 75 + dst_reg src_reg 76 + 07 1 0 00 00 11 22 33 44 r1 += 0x11223344 // big 77 + 63 78 Note that most instructions do not use all of the fields. 64 79 Unused fields shall be cleared to zero. 65 80 ··· 81 72 using the same format but with opcode, dst_reg, src_reg, and offset all set to zero, 82 73 and imm containing the high 32 bits of the immediate value. 83 74 84 - ================= ================== 85 - 64 bits (MSB) 64 bits (LSB) 86 - ================= ================== 87 - basic instruction pseudo instruction 88 - ================= ================== 75 + This is depicted in the following figure:: 76 + 77 + basic_instruction 78 + .-----------------------------. 
79 + | | 80 + code:8 regs:8 offset:16 imm:32 unused:32 imm:32 81 + | | 82 + '--------------' 83 + pseudo instruction 89 84 90 85 Thus the 64-bit immediate value is constructed as follows: 91 86 92 87 imm64 = (next_imm << 32) | imm 93 88 94 89 where 'next_imm' refers to the imm value of the pseudo instruction 95 - following the basic instruction. 90 + following the basic instruction. The unused bytes in the pseudo 91 + instruction are reserved and shall be cleared to zero. 96 92 97 93 Instruction classes 98 94 -------------------
+32 -9
Documentation/bpf/kfuncs.rst
··· 100 100 size parameter, and the value of the constant matters for program safety, __k 101 101 suffix should be used. 102 102 103 + 2.2.2 __uninit Annotation 104 + ------------------------- 105 + 106 + This annotation is used to indicate that the argument will be treated as 107 + uninitialized. 108 + 109 + An example is given below:: 110 + 111 + __bpf_kfunc int bpf_dynptr_from_skb(..., struct bpf_dynptr_kern *ptr__uninit) 112 + { 113 + ... 114 + } 115 + 116 + Here, the dynptr will be treated as an uninitialized dynptr. Without this 117 + annotation, the verifier will reject the program if the dynptr passed in is 118 + not initialized. 119 + 103 120 .. _BPF_kfunc_nodef: 104 121 105 122 2.3 Using an existing kernel function ··· 249 232 2.4.8 KF_RCU flag 250 233 ----------------- 251 234 252 - The KF_RCU flag is used for kfuncs which have a rcu ptr as its argument. 253 - When used together with KF_ACQUIRE, it indicates the kfunc should have a 254 - single argument which must be a trusted argument or a MEM_RCU pointer. 255 - The argument may have reference count of 0 and the kfunc must take this 256 - into consideration. 235 + The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with 236 + KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees 237 + that the objects are valid and there is no use-after-free. The pointers are not 238 + NULL, but the object's refcount could have reached zero. The kfuncs need to 239 + consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE 240 + pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely 241 + also be KF_RET_NULL. 257 242 258 243 .. _KF_deprecated_flag: 259 244 ··· 546 527 547 528 /* struct containing the struct task_struct kptr which is actually stored in the map. 
*/ 548 529 struct __cgroups_kfunc_map_value { 549 - struct cgroup __kptr_ref * cgroup; 530 + struct cgroup __kptr * cgroup; 550 531 }; 551 532 552 533 /* The map containing struct __cgroups_kfunc_map_value entries. */ ··· 602 583 603 584 ---- 604 585 605 - Another kfunc available for interacting with ``struct cgroup *`` objects is 606 - bpf_cgroup_ancestor(). This allows callers to access the ancestor of a cgroup, 607 - and return it as a cgroup kptr. 586 + Other kfuncs available for interacting with ``struct cgroup *`` objects are 587 + bpf_cgroup_ancestor() and bpf_cgroup_from_id(), allowing callers to access 588 + the ancestor of a cgroup and find a cgroup by its ID, respectively. Both 589 + return a cgroup kptr. 608 590 609 591 .. kernel-doc:: kernel/bpf/helpers.c 610 592 :identifiers: bpf_cgroup_ancestor 593 + 594 + .. kernel-doc:: kernel/bpf/helpers.c 595 + :identifiers: bpf_cgroup_from_id 611 596 612 597 Eventually, BPF should be updated to allow this to happen with a normal memory 613 598 load in the program itself. This is currently not possible without more work in
+4 -3
Documentation/bpf/maps.rst
··· 11 11 `man-pages`_ for `bpf-helpers(7)`_. 12 12 13 13 BPF maps are accessed from user space via the ``bpf`` syscall, which provides 14 - commands to create maps, lookup elements, update elements and delete 15 - elements. More details of the BPF syscall are available in 16 - :doc:`/userspace-api/ebpf/syscall` and in the `man-pages`_ for `bpf(2)`_. 14 + commands to create maps, lookup elements, update elements and delete elements. 15 + More details of the BPF syscall are available in `ebpf-syscall`_ and in the 16 + `man-pages`_ for `bpf(2)`_. 17 17 18 18 Map Types 19 19 ========= ··· 79 79 .. _man-pages: https://www.kernel.org/doc/man-pages/ 80 80 .. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html 81 81 .. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html 82 + .. _ebpf-syscall: https://docs.kernel.org/userspace-api/ebpf/syscall.html
+6
arch/loongarch/net/bpf_jit.c
···
1248 1248
1249 1249 	return prog;
1250 1250 }
1251 +
1252 + /* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
1253 + bool bpf_jit_supports_subprog_tailcalls(void)
1254 + {
1255 + 	return true;
1256 + }
+1 -4
arch/mips/Kconfig
···
63 63 	select HAVE_DEBUG_STACKOVERFLOW
64 64 	select HAVE_DMA_CONTIGUOUS
65 65 	select HAVE_DYNAMIC_FTRACE
66 - 	select HAVE_EBPF_JIT if !CPU_MICROMIPS && \
67 - 		!CPU_DADDI_WORKAROUNDS && \
68 - 		!CPU_R4000_WORKAROUNDS && \
69 - 		!CPU_R4400_WORKAROUNDS
66 + 	select HAVE_EBPF_JIT if !CPU_MICROMIPS
70 67 	select HAVE_EXIT_THREAD
71 68 	select HAVE_FAST_GUP
72 69 	select HAVE_FTRACE_MCOUNT_RECORD
+4
arch/mips/net/bpf_jit_comp.c
···
218 218 		/* All legal eBPF values are valid */
219 219 		return true;
220 220 	case BPF_ADD:
221 + 		if (IS_ENABLED(CONFIG_CPU_DADDI_WORKAROUNDS))
222 + 			return false;
221 223 		/* imm must be 16 bits */
222 224 		return imm >= -0x8000 && imm <= 0x7fff;
223 225 	case BPF_SUB:
226 + 		if (IS_ENABLED(CONFIG_CPU_DADDI_WORKAROUNDS))
227 + 			return false;
224 228 		/* -imm must be 16 bits */
225 229 		return imm >= -0x7fff && imm <= 0x8000;
226 230 	case BPF_AND:
+3
arch/mips/net/bpf_jit_comp64.c
···
228 228 	} else {
229 229 		emit(ctx, dmultu, dst, src);
230 230 		emit(ctx, mflo, dst);
231 + 		/* Ensure multiplication is completed */
232 + 		if (IS_ENABLED(CONFIG_CPU_R4000_WORKAROUNDS))
233 + 			emit(ctx, mfhi, MIPS_R_ZERO);
231 234 	}
232 235 	break;
233 236 	/* dst = dst / src */
+5
arch/riscv/net/bpf_jit_comp64.c
···
1751 1751 {
1752 1752 	__build_epilogue(false, ctx);
1753 1753 }
1754 +
1755 + bool bpf_jit_supports_kfunc_call(void)
1756 + {
1757 + 	return true;
1758 + }
+65 -32
include/linux/bpf.h
··· 607 607 */ 608 608 NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS), 609 609 610 + /* DYNPTR points to sk_buff */ 611 + DYNPTR_TYPE_SKB = BIT(15 + BPF_BASE_TYPE_BITS), 612 + 613 + /* DYNPTR points to xdp_buff */ 614 + DYNPTR_TYPE_XDP = BIT(16 + BPF_BASE_TYPE_BITS), 615 + 610 616 __BPF_TYPE_FLAG_MAX, 611 617 __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, 612 618 }; 613 619 614 - #define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF) 620 + #define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \ 621 + | DYNPTR_TYPE_XDP) 615 622 616 623 /* Max number of base types. */ 617 624 #define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS) ··· 1130 1123 { 1131 1124 return bpf_func(ctx, insnsi); 1132 1125 } 1126 + 1127 + /* the implementation of the opaque uapi struct bpf_dynptr */ 1128 + struct bpf_dynptr_kern { 1129 + void *data; 1130 + /* Size represents the number of usable bytes of dynptr data. 1131 + * If for example the offset is at 4 for a local dynptr whose data is 1132 + * of type u64, the number of usable bytes is 4. 1133 + * 1134 + * The upper 8 bits are reserved. 
It is as follows: 1135 + * Bits 0 - 23 = size 1136 + * Bits 24 - 30 = dynptr type 1137 + * Bit 31 = whether dynptr is read-only 1138 + */ 1139 + u32 size; 1140 + u32 offset; 1141 + } __aligned(8); 1142 + 1143 + enum bpf_dynptr_type { 1144 + BPF_DYNPTR_TYPE_INVALID, 1145 + /* Points to memory that is local to the bpf program */ 1146 + BPF_DYNPTR_TYPE_LOCAL, 1147 + /* Underlying data is a ringbuf record */ 1148 + BPF_DYNPTR_TYPE_RINGBUF, 1149 + /* Underlying data is a sk_buff */ 1150 + BPF_DYNPTR_TYPE_SKB, 1151 + /* Underlying data is a xdp_buff */ 1152 + BPF_DYNPTR_TYPE_XDP, 1153 + }; 1154 + 1155 + int bpf_dynptr_check_size(u32 size); 1156 + u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr); 1133 1157 1134 1158 #ifdef CONFIG_BPF_JIT 1135 1159 int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr); ··· 2279 2241 2280 2242 bool btf_nested_type_is_trusted(struct bpf_verifier_log *log, 2281 2243 const struct bpf_reg_state *reg, 2282 - int off); 2244 + int off, const char *suffix); 2283 2245 2284 2246 bool btf_type_ids_nocast_alias(struct bpf_verifier_log *log, 2285 2247 const struct btf *reg_btf, u32 reg_id, ··· 2304 2266 } 2305 2267 2306 2268 void notrace bpf_prog_inc_misses_counter(struct bpf_prog *prog); 2269 + 2270 + void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, 2271 + enum bpf_dynptr_type type, u32 offset, u32 size); 2272 + void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr); 2273 + void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr); 2307 2274 #else /* !CONFIG_BPF_SYSCALL */ 2308 2275 static inline struct bpf_prog *bpf_prog_get(u32 ufd) 2309 2276 { ··· 2536 2493 } 2537 2494 2538 2495 static inline void bpf_cgrp_storage_free(struct cgroup *cgroup) 2496 + { 2497 + } 2498 + 2499 + static inline void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, 2500 + enum bpf_dynptr_type type, u32 offset, u32 size) 2501 + { 2502 + } 2503 + 2504 + static inline void bpf_dynptr_set_null(struct bpf_dynptr_kern 
*ptr) 2505 + { 2506 + } 2507 + 2508 + static inline void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr) 2539 2509 { 2540 2510 } 2541 2511 #endif /* CONFIG_BPF_SYSCALL */ ··· 2857 2801 struct bpf_insn *insn_buf, 2858 2802 struct bpf_prog *prog, 2859 2803 u32 *target_size); 2804 + int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags, 2805 + struct bpf_dynptr_kern *ptr); 2860 2806 #else 2861 2807 static inline bool bpf_sock_common_is_valid_access(int off, int size, 2862 2808 enum bpf_access_type type, ··· 2879 2821 u32 *target_size) 2880 2822 { 2881 2823 return 0; 2824 + } 2825 + static inline int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags, 2826 + struct bpf_dynptr_kern *ptr) 2827 + { 2828 + return -EOPNOTSUPP; 2882 2829 } 2883 2830 #endif 2884 2831 ··· 2975 2912 int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args, 2976 2913 u32 num_args, struct bpf_bprintf_data *data); 2977 2914 void bpf_bprintf_cleanup(struct bpf_bprintf_data *data); 2978 - 2979 - /* the implementation of the opaque uapi struct bpf_dynptr */ 2980 - struct bpf_dynptr_kern { 2981 - void *data; 2982 - /* Size represents the number of usable bytes of dynptr data. 2983 - * If for example the offset is at 4 for a local dynptr whose data is 2984 - * of type u64, the number of usable bytes is 4. 2985 - * 2986 - * The upper 8 bits are reserved. 
It is as follows: 2987 - * Bits 0 - 23 = size 2988 - * Bits 24 - 30 = dynptr type 2989 - * Bit 31 = whether dynptr is read-only 2990 - */ 2991 - u32 size; 2992 - u32 offset; 2993 - } __aligned(8); 2994 - 2995 - enum bpf_dynptr_type { 2996 - BPF_DYNPTR_TYPE_INVALID, 2997 - /* Points to memory that is local to the bpf program */ 2998 - BPF_DYNPTR_TYPE_LOCAL, 2999 - /* Underlying data is a kernel-produced ringbuf record */ 3000 - BPF_DYNPTR_TYPE_RINGBUF, 3001 - }; 3002 - 3003 - void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, 3004 - enum bpf_dynptr_type type, u32 offset, u32 size); 3005 - void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr); 3006 - int bpf_dynptr_check_size(u32 size); 3007 - u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr); 3008 2915 3009 2916 #ifdef CONFIG_BPF_LSM 3010 2917 void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
+7

include/linux/bpf_mem_alloc.h
···
14 14 	struct work_struct work;
15 15 };
16 16
17 + /* 'size != 0' is for bpf_mem_alloc which manages fixed-size objects.
18 +  * Alloc and free are done with bpf_mem_cache_{alloc,free}().
19 +  *
20 +  * 'size = 0' is for bpf_mem_alloc which manages many fixed-size objects.
21 +  * Alloc and free are done with bpf_mem_{alloc,free}() and the size of
22 +  * the returned object is given by the size argument of bpf_mem_alloc().
23 +  */
17 24 int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu);
18 25 void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
19 26
-4
include/linux/bpf_verifier.h
···
537 537 	bool bypass_spec_v1;
538 538 	bool bypass_spec_v4;
539 539 	bool seen_direct_write;
540 - 	bool rcu_tag_supported;
541 540 	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
542 541 	const struct bpf_line_info *prev_linfo;
543 542 	struct bpf_verifier_log log;
···
615 616 	enum bpf_arg_type arg_type);
616 617 int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
617 618 	u32 regno, u32 mem_size);
618 - struct bpf_call_arg_meta;
619 - int process_dynptr_func(struct bpf_verifier_env *env, int regno,
620 - 	enum bpf_arg_type arg_type, struct bpf_call_arg_meta *meta);
621 619
622 620 /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
623 621 static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
+1 -1
include/linux/btf.h
···
70 70 #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
71 71 #define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */
72 72 #define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */
73 - #define KF_RCU (1 << 7) /* kfunc only takes rcu pointer arguments */
73 + #define KF_RCU (1 << 7) /* kfunc takes either rcu or trusted pointer arguments */
74 74
75 75 /*
76 76  * Tag marking a kernel function as a kfunc. This is meant to minimize the
+46
include/linux/filter.h
··· 1542 1542 return XDP_REDIRECT; 1543 1543 } 1544 1544 1545 + #ifdef CONFIG_NET 1546 + int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len); 1547 + int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, 1548 + u32 len, u64 flags); 1549 + int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len); 1550 + int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len); 1551 + void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len); 1552 + void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, 1553 + void *buf, unsigned long len, bool flush); 1554 + #else /* CONFIG_NET */ 1555 + static inline int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, 1556 + void *to, u32 len) 1557 + { 1558 + return -EOPNOTSUPP; 1559 + } 1560 + 1561 + static inline int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, 1562 + const void *from, u32 len, u64 flags) 1563 + { 1564 + return -EOPNOTSUPP; 1565 + } 1566 + 1567 + static inline int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset, 1568 + void *buf, u32 len) 1569 + { 1570 + return -EOPNOTSUPP; 1571 + } 1572 + 1573 + static inline int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, 1574 + void *buf, u32 len) 1575 + { 1576 + return -EOPNOTSUPP; 1577 + } 1578 + 1579 + static inline void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) 1580 + { 1581 + return NULL; 1582 + } 1583 + 1584 + static inline void *bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, void *buf, 1585 + unsigned long len, bool flush) 1586 + { 1587 + return NULL; 1588 + } 1589 + #endif /* CONFIG_NET */ 1590 + 1545 1591 #endif /* __LINUX_FILTER_H__ */
+31 -2
include/uapi/linux/bpf.h
··· 4969 4969 * different maps if key/value layout matches across maps. 4970 4970 * Every bpf_timer_set_callback() can have different callback_fn. 4971 4971 * 4972 + * *flags* can be one of: 4973 + * 4974 + * **BPF_F_TIMER_ABS** 4975 + * Start the timer in absolute expire value instead of the 4976 + * default relative one. 4977 + * 4972 4978 * Return 4973 4979 * 0 on success. 4974 4980 * **-EINVAL** if *timer* was not initialized with bpf_timer_init() earlier ··· 5331 5325 * Description 5332 5326 * Write *len* bytes from *src* into *dst*, starting from *offset* 5333 5327 * into *dst*. 5334 - * *flags* is currently unused. 5328 + * 5329 + * *flags* must be 0 except for skb-type dynptrs. 5330 + * 5331 + * For skb-type dynptrs: 5332 + * * All data slices of the dynptr are automatically 5333 + * invalidated after **bpf_dynptr_write**\ (). This is 5334 + * because writing may pull the skb and change the 5335 + * underlying packet buffer. 5336 + * 5337 + * * For *flags*, please see the flags accepted by 5338 + * **bpf_skb_store_bytes**\ (). 5335 5339 * Return 5336 5340 * 0 on success, -E2BIG if *offset* + *len* exceeds the length 5337 5341 * of *dst*'s data, -EINVAL if *dst* is an invalid dynptr or if *dst* 5338 - * is a read-only dynptr or if *flags* is not 0. 5342 + * is a read-only dynptr or if *flags* is not correct. For skb-type dynptrs, 5343 + * other errors correspond to errors returned by **bpf_skb_store_bytes**\ (). 5339 5344 * 5340 5345 * void *bpf_dynptr_data(const struct bpf_dynptr *ptr, u32 offset, u32 len) 5341 5346 * Description ··· 5354 5337 * 5355 5338 * *len* must be a statically known value. The returned data slice 5356 5339 * is invalidated whenever the dynptr is invalidated. 5340 + * 5341 + * skb and xdp type dynptrs may not use bpf_dynptr_data. They should 5342 + * instead use bpf_dynptr_slice and bpf_dynptr_slice_rdwr. 
5357 5343 * Return 5358 5344 * Pointer to the underlying dynptr data, NULL if the dynptr is 5359 5345 * read-only, if the dynptr is invalid, or if the offset and length ··· 7101 7081 __u32 type_id; 7102 7082 __u32 access_str_off; 7103 7083 enum bpf_core_relo_kind kind; 7084 + }; 7085 + 7086 + /* 7087 + * Flags to control bpf_timer_start() behaviour. 7088 + * - BPF_F_TIMER_ABS: Timeout passed is absolute time, by default it is 7089 + * relative to current time. 7090 + */ 7091 + enum { 7092 + BPF_F_TIMER_ABS = (1ULL << 0), 7104 7093 }; 7105 7094 7106 7095 #endif /* _UAPI__LINUX_BPF_H__ */
+77 -8
kernel/bpf/bpf_local_storage.c
···
 	return map->ops->map_owner_storage_ptr(owner);
 }
 
+static bool selem_linked_to_storage_lockless(const struct bpf_local_storage_elem *selem)
+{
+	return !hlist_unhashed_lockless(&selem->snode);
+}
+
 static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem)
 {
 	return !hlist_unhashed(&selem->snode);
+}
+
+static bool selem_linked_to_map_lockless(const struct bpf_local_storage_elem *selem)
+{
+	return !hlist_unhashed_lockless(&selem->map_node);
 }
 
 static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
···
 	if (selem) {
 		if (value)
 			copy_map_value(&smap->map, SDATA(selem)->data, value);
+		/* No need to call check_and_init_map_value as memory is zero init */
 		return selem;
 	}
···
 	kfree_rcu(local_storage, rcu);
 }
 
-static void bpf_selem_free_rcu(struct rcu_head *rcu)
+static void bpf_selem_free_fields_rcu(struct rcu_head *rcu)
+{
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage_map *smap;
+
+	selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
+	/* protected by the rcu_barrier*() */
+	smap = rcu_dereference_protected(SDATA(selem)->smap, true);
+	bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
+	kfree(selem);
+}
+
+static void bpf_selem_free_fields_trace_rcu(struct rcu_head *rcu)
+{
+	/* Free directly if Tasks Trace RCU GP also implies RCU GP */
+	if (rcu_trace_implies_rcu_gp())
+		bpf_selem_free_fields_rcu(rcu);
+	else
+		call_rcu(rcu, bpf_selem_free_fields_rcu);
+}
+
+static void bpf_selem_free_trace_rcu(struct rcu_head *rcu)
 {
 	struct bpf_local_storage_elem *selem;
···
 {
 	struct bpf_local_storage_map *smap;
 	bool free_local_storage;
+	struct btf_record *rec;
 	void *owner;
 
 	smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
···
 		    SDATA(selem))
 		RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
 
-	if (use_trace_rcu)
-		call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu);
-	else
-		kfree_rcu(selem, rcu);
+	/* A different RCU callback is chosen whenever we need to free
+	 * additional fields in selem data before freeing selem.
+	 * bpf_local_storage_map_free only executes rcu_barrier to wait for RCU
+	 * callbacks when it has special fields, hence we can only conditionally
+	 * dereference smap, as by this time the map might have already been
+	 * freed without waiting for our call_rcu callback if it did not have
+	 * any special fields.
+	 */
+	rec = smap->map.record;
+	if (use_trace_rcu) {
+		if (!IS_ERR_OR_NULL(rec))
+			call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_fields_trace_rcu);
+		else
+			call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
+	} else {
+		if (!IS_ERR_OR_NULL(rec))
+			call_rcu(&selem->rcu, bpf_selem_free_fields_rcu);
+		else
+			kfree_rcu(selem, rcu);
+	}
 
 	return free_local_storage;
 }
···
 	bool free_local_storage = false;
 	unsigned long flags;
 
-	if (unlikely(!selem_linked_to_storage(selem)))
+	if (unlikely(!selem_linked_to_storage_lockless(selem)))
 		/* selem has already been unlinked from sk */
 		return;
···
 	struct bpf_local_storage_map_bucket *b;
 	unsigned long flags;
 
-	if (unlikely(!selem_linked_to_map(selem)))
+	if (unlikely(!selem_linked_to_map_lockless(selem)))
 		/* selem has already be unlinked from smap */
 		return;
···
 	err = check_flags(old_sdata, map_flags);
 	if (err)
 		return ERR_PTR(err);
-	if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
+	if (old_sdata && selem_linked_to_storage_lockless(SELEM(old_sdata))) {
 		copy_map_value_locked(&smap->map, old_sdata->data,
 				      value, false);
 		return old_sdata;
···
 	 */
 	synchronize_rcu();
 
+	/* Only delay freeing of smap, buckets are not needed anymore */
 	kvfree(smap->buckets);
+
+	/* When local storage has special fields, callbacks for
+	 * bpf_selem_free_fields_rcu and bpf_selem_free_fields_trace_rcu will
+	 * keep using the map BTF record, we need to execute an RCU barrier to
+	 * wait for them as the record will be freed right after our map_free
+	 * callback.
+	 */
+	if (!IS_ERR_OR_NULL(smap->map.record)) {
+		rcu_barrier_tasks_trace();
+		/* We cannot skip rcu_barrier() when rcu_trace_implies_rcu_gp()
+		 * is true, because while call_rcu invocation is skipped in that
+		 * case in bpf_selem_free_fields_trace_rcu (and all local
+		 * storage maps pass use_trace_rcu = true), there can be
+		 * call_rcu callbacks based on use_trace_rcu = false in the
+		 * while ((selem = ...)) loop above or when owner's free path
+		 * calls bpf_local_storage_unlink_nolock.
+		 */
+		rcu_barrier();
+	}
 	bpf_map_area_free(smap);
 }
+37 -5
kernel/bpf/btf.c
···
 	BTF_KFUNC_HOOK_TRACING,
 	BTF_KFUNC_HOOK_SYSCALL,
 	BTF_KFUNC_HOOK_FMODRET,
+	BTF_KFUNC_HOOK_CGROUP_SKB,
+	BTF_KFUNC_HOOK_SCHED_ACT,
+	BTF_KFUNC_HOOK_SK_SKB,
+	BTF_KFUNC_HOOK_SOCKET_FILTER,
+	BTF_KFUNC_HOOK_LWT,
 	BTF_KFUNC_HOOK_MAX,
 };
···
 	/* Reject extra tags */
 	if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
 		return -EINVAL;
-	if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
+	if (!strcmp("kptr_untrusted", __btf_name_by_offset(btf, t->name_off)))
 		type = BPF_KPTR_UNREF;
-	else if (!strcmp("kptr_ref", __btf_name_by_offset(btf, t->name_off)))
+	else if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
 		type = BPF_KPTR_REF;
 	else
 		return -EINVAL;
···
 	 * int socket_filter_bpf_prog(struct __sk_buff *skb)
 	 * { // no fields of skb are ever used }
 	 */
+	if (strcmp(ctx_tname, "__sk_buff") == 0 && strcmp(tname, "sk_buff") == 0)
+		return ctx_type;
+	if (strcmp(ctx_tname, "xdp_md") == 0 && strcmp(tname, "xdp_buff") == 0)
+		return ctx_type;
 	if (strcmp(ctx_tname, tname)) {
 		/* bpf_user_pt_regs_t is a typedef, so resolve it to
 		 * underlying struct and check name again
···
 	const char *tname, *mname, *tag_value;
 	u32 vlen, elem_id, mid;
 
+	*flag = 0;
 again:
 	tname = __btf_name_by_offset(btf, t->name_off);
 	if (!btf_type_is_struct(t)) {
···
 		 * of this field or inside of this struct
 		 */
 		if (btf_type_is_struct(mtype)) {
+			if (BTF_INFO_KIND(mtype->info) == BTF_KIND_UNION &&
+			    btf_type_vlen(mtype) != 1)
+				/*
+				 * walking unions yields untrusted pointers
+				 * with exception of __bpf_md_ptr and other
+				 * unions with a single member
+				 */
+				*flag |= PTR_UNTRUSTED;
+
 			/* our field must be inside that union or struct */
 			t = mtype;
···
 		stype = btf_type_skip_modifiers(btf, mtype->type, &id);
 		if (btf_type_is_struct(stype)) {
 			*next_btf_id = id;
-			*flag = tmp_flag;
+			*flag |= tmp_flag;
 			return WALK_PTR;
 		}
···
 		return BTF_KFUNC_HOOK_TRACING;
 	case BPF_PROG_TYPE_SYSCALL:
 		return BTF_KFUNC_HOOK_SYSCALL;
+	case BPF_PROG_TYPE_CGROUP_SKB:
+		return BTF_KFUNC_HOOK_CGROUP_SKB;
+	case BPF_PROG_TYPE_SCHED_ACT:
+		return BTF_KFUNC_HOOK_SCHED_ACT;
+	case BPF_PROG_TYPE_SK_SKB:
+		return BTF_KFUNC_HOOK_SK_SKB;
+	case BPF_PROG_TYPE_SOCKET_FILTER:
+		return BTF_KFUNC_HOOK_SOCKET_FILTER;
+	case BPF_PROG_TYPE_LWT_OUT:
+	case BPF_PROG_TYPE_LWT_IN:
+	case BPF_PROG_TYPE_LWT_XMIT:
+	case BPF_PROG_TYPE_LWT_SEG6LOCAL:
+		return BTF_KFUNC_HOOK_LWT;
 	default:
 		return BTF_KFUNC_HOOK_MAX;
 	}
···
 bool btf_nested_type_is_trusted(struct bpf_verifier_log *log,
 				const struct bpf_reg_state *reg,
-				int off)
+				int off, const char *suffix)
 {
 	struct btf *btf = reg->btf;
 	const struct btf_type *walk_type, *safe_type;
···
 	tname = btf_name_by_offset(btf, walk_type->name_off);
 
-	ret = snprintf(safe_tname, sizeof(safe_tname), "%s__safe_fields", tname);
+	ret = snprintf(safe_tname, sizeof(safe_tname), "%s%s", tname, suffix);
 	if (ret < 0)
 		return false;
+30 -23
kernel/bpf/cgroup.c
···
 					BPF_FIELD_SIZEOF(struct bpf_sysctl_kern, ppos),
 					treg, si->dst_reg,
 					offsetof(struct bpf_sysctl_kern, ppos));
-		*insn++ = BPF_STX_MEM(
-			BPF_SIZEOF(u32), treg, si->src_reg,
+		*insn++ = BPF_RAW_INSN(
+			BPF_CLASS(si->code) | BPF_MEM | BPF_SIZEOF(u32),
+			treg, si->src_reg,
 			bpf_ctx_narrow_access_offset(
-				0, sizeof(u32), sizeof(loff_t)));
+				0, sizeof(u32), sizeof(loff_t)),
+			si->imm);
 		*insn++ = BPF_LDX_MEM(
 			BPF_DW, treg, si->dst_reg,
 			offsetof(struct bpf_sysctl_kern, tmp_reg));
···
 	return true;
 }
 
-#define CG_SOCKOPT_ACCESS_FIELD(T, F)					\
-	T(BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F),			\
-	  si->dst_reg, si->src_reg,					\
-	  offsetof(struct bpf_sockopt_kern, F))
+#define CG_SOCKOPT_READ_FIELD(F)					\
+	BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F),	\
+		    si->dst_reg, si->src_reg,				\
+		    offsetof(struct bpf_sockopt_kern, F))
+
+#define CG_SOCKOPT_WRITE_FIELD(F)					\
+	BPF_RAW_INSN((BPF_FIELD_SIZEOF(struct bpf_sockopt_kern, F) |	\
+		      BPF_MEM | BPF_CLASS(si->code)),			\
+		     si->dst_reg, si->src_reg,				\
+		     offsetof(struct bpf_sockopt_kern, F),		\
+		     si->imm)
 
 static u32 cg_sockopt_convert_ctx_access(enum bpf_access_type type,
 					 const struct bpf_insn *si,
···
 	switch (si->off) {
 	case offsetof(struct bpf_sockopt, sk):
-		*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, sk);
+		*insn++ = CG_SOCKOPT_READ_FIELD(sk);
 		break;
 	case offsetof(struct bpf_sockopt, level):
 		if (type == BPF_WRITE)
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, level);
+			*insn++ = CG_SOCKOPT_WRITE_FIELD(level);
 		else
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, level);
+			*insn++ = CG_SOCKOPT_READ_FIELD(level);
 		break;
 	case offsetof(struct bpf_sockopt, optname):
 		if (type == BPF_WRITE)
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, optname);
+			*insn++ = CG_SOCKOPT_WRITE_FIELD(optname);
 		else
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optname);
+			*insn++ = CG_SOCKOPT_READ_FIELD(optname);
 		break;
 	case offsetof(struct bpf_sockopt, optlen):
 		if (type == BPF_WRITE)
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_STX_MEM, optlen);
+			*insn++ = CG_SOCKOPT_WRITE_FIELD(optlen);
 		else
-			*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optlen);
+			*insn++ = CG_SOCKOPT_READ_FIELD(optlen);
 		break;
 	case offsetof(struct bpf_sockopt, retval):
 		BUILD_BUG_ON(offsetof(struct bpf_cg_run_ctx, run_ctx) != 0);
···
 			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct task_struct, bpf_ctx),
 					      treg, treg,
 					      offsetof(struct task_struct, bpf_ctx));
-			*insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(struct bpf_cg_run_ctx, retval),
-					      treg, si->src_reg,
-					      offsetof(struct bpf_cg_run_ctx, retval));
+			*insn++ = BPF_RAW_INSN(BPF_CLASS(si->code) | BPF_MEM |
+					       BPF_FIELD_SIZEOF(struct bpf_cg_run_ctx, retval),
+					       treg, si->src_reg,
+					       offsetof(struct bpf_cg_run_ctx, retval),
+					       si->imm);
 			*insn++ = BPF_LDX_MEM(BPF_DW, treg, si->dst_reg,
 					      offsetof(struct bpf_sockopt_kern, tmp_reg));
 		} else {
···
 		}
 		break;
 	case offsetof(struct bpf_sockopt, optval):
-		*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optval);
+		*insn++ = CG_SOCKOPT_READ_FIELD(optval);
 		break;
 	case offsetof(struct bpf_sockopt, optval_end):
-		*insn++ = CG_SOCKOPT_ACCESS_FIELD(BPF_LDX_MEM, optval_end);
+		*insn++ = CG_SOCKOPT_READ_FIELD(optval_end);
 		break;
 	}
···
 		return &bpf_get_current_pid_tgid_proto;
 	case BPF_FUNC_get_current_comm:
 		return &bpf_get_current_comm_proto;
-	case BPF_FUNC_get_current_cgroup_id:
-		return &bpf_get_current_cgroup_id_proto;
-	case BPF_FUNC_get_current_ancestor_cgroup_id:
-		return &bpf_get_current_ancestor_cgroup_id_proto;
 #ifdef CONFIG_CGROUP_NET_CLASSID
 	case BPF_FUNC_get_cgroup_classid:
 		return &bpf_get_cgroup_classid_curr_proto;
+23 -23
kernel/bpf/cpumask.c
···
 	/* cpumask must be the first element so struct bpf_cpumask be cast to struct cpumask. */
 	BUILD_BUG_ON(offsetof(struct bpf_cpumask, cpumask) != 0);
 
-	cpumask = bpf_mem_alloc(&bpf_cpumask_ma, sizeof(*cpumask));
+	cpumask = bpf_mem_cache_alloc(&bpf_cpumask_ma);
 	if (!cpumask)
 		return NULL;
···
 	if (refcount_dec_and_test(&cpumask->usage)) {
 		migrate_disable();
-		bpf_mem_free(&bpf_cpumask_ma, cpumask);
+		bpf_mem_cache_free(&bpf_cpumask_ma, cpumask);
 		migrate_enable();
 	}
 }
···
 BTF_ID_FLAGS(func, bpf_cpumask_release, KF_RELEASE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_cpumask_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_cpumask_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
-BTF_ID_FLAGS(func, bpf_cpumask_first, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_first_zero, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_set_cpu, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_clear_cpu, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_test_cpu, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_test_and_set_cpu, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_test_and_clear_cpu, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_setall, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_clear, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_and, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_or, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_xor, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_equal, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_intersects, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_subset, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_empty, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_full, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_any, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_cpumask_any_and, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_cpumask_first, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_first_zero, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_set_cpu, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_clear_cpu, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_test_cpu, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_test_and_set_cpu, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_test_and_clear_cpu, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_setall, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_clear, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_and, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_or, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_xor, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_equal, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_intersects, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_subset, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_empty, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_full, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_any, KF_RCU)
+BTF_ID_FLAGS(func, bpf_cpumask_any_and, KF_RCU)
 BTF_SET8_END(cpumask_kfunc_btf_ids)
 
 static const struct btf_kfunc_id_set cpumask_kfunc_set = {
···
 	},
 };
 
-	ret = bpf_mem_alloc_init(&bpf_cpumask_ma, 0, false);
+	ret = bpf_mem_alloc_init(&bpf_cpumask_ma, sizeof(struct bpf_cpumask), false);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &cpumask_kfunc_set);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &cpumask_kfunc_set);
 	return ret ?: register_btf_id_dtor_kfuncs(cpumask_dtors,
+37 -22
kernel/bpf/hashtab.c
···
 		struct htab_elem *elem;
 
 		elem = get_htab_elem(htab, i);
-		bpf_obj_free_fields(htab->map.record, elem->key + round_up(htab->map.key_size, 8));
+		if (htab_is_percpu(htab)) {
+			void __percpu *pptr = htab_elem_get_ptr(elem, htab->map.key_size);
+			int cpu;
+
+			for_each_possible_cpu(cpu) {
+				bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu));
+				cond_resched();
+			}
+		} else {
+			bpf_obj_free_fields(htab->map.record, elem->key + round_up(htab->map.key_size, 8));
+			cond_resched();
+		}
 		cond_resched();
 	}
 }
···
 static void check_and_free_fields(struct bpf_htab *htab,
 				  struct htab_elem *elem)
 {
-	void *map_value = elem->key + round_up(htab->map.key_size, 8);
+	if (htab_is_percpu(htab)) {
+		void __percpu *pptr = htab_elem_get_ptr(elem, htab->map.key_size);
+		int cpu;
 
-	bpf_obj_free_fields(htab->map.record, map_value);
+		for_each_possible_cpu(cpu)
+			bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu));
+	} else {
+		void *map_value = elem->key + round_up(htab->map.key_size, 8);
+
+		bpf_obj_free_fields(htab->map.record, map_value);
+	}
 }
 
 /* It is called from the bpf_lru_list when the LRU needs to delete
···
 static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 {
+	check_and_free_fields(htab, l);
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
 		bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
-	check_and_free_fields(htab, l);
 	bpf_mem_cache_free(&htab->ma, l);
 }
···
 {
 	if (!onallcpus) {
 		/* copy true value_size bytes */
-		memcpy(this_cpu_ptr(pptr), value, htab->map.value_size);
+		copy_map_value(&htab->map, this_cpu_ptr(pptr), value);
 	} else {
 		u32 size = round_up(htab->map.value_size, 8);
 		int off = 0, cpu;
 
 		for_each_possible_cpu(cpu) {
-			bpf_long_memcpy(per_cpu_ptr(pptr, cpu),
-					value + off, size);
+			copy_map_value_long(&htab->map, per_cpu_ptr(pptr, cpu), value + off);
 			off += size;
 		}
 	}
···
 	 * (onallcpus=false always when coming from bpf prog).
 	 */
 	if (!onallcpus) {
-		u32 size = round_up(htab->map.value_size, 8);
 		int current_cpu = raw_smp_processor_id();
 		int cpu;
 
 		for_each_possible_cpu(cpu) {
 			if (cpu == current_cpu)
-				bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value,
-						size);
-			else
-				memset(per_cpu_ptr(pptr, cpu), 0, size);
+				copy_map_value_long(&htab->map, per_cpu_ptr(pptr, cpu), value);
+			else /* Since elem is preallocated, we cannot touch special fields */
+				zero_map_value(&htab->map, per_cpu_ptr(pptr, cpu));
 		}
 	} else {
 		pcpu_copy_value(htab, pptr, value, onallcpus);
···
 		pptr = htab_elem_get_ptr(l, key_size);
 		for_each_possible_cpu(cpu) {
-			bpf_long_memcpy(value + off,
-					per_cpu_ptr(pptr, cpu),
-					roundup_value_size);
+			copy_map_value_long(&htab->map, value + off, per_cpu_ptr(pptr, cpu));
+			check_and_init_map_value(&htab->map, value + off);
 			off += roundup_value_size;
 		}
 	} else {
···
 		pptr = htab_elem_get_ptr(l, map->key_size);
 		for_each_possible_cpu(cpu) {
-			bpf_long_memcpy(dst_val + off,
-					per_cpu_ptr(pptr, cpu), size);
+			copy_map_value_long(&htab->map, dst_val + off, per_cpu_ptr(pptr, cpu));
+			check_and_init_map_value(&htab->map, dst_val + off);
 			off += size;
 		}
 	} else {
···
 		roundup_value_size = round_up(map->value_size, 8);
 		pptr = htab_elem_get_ptr(elem, map->key_size);
 		for_each_possible_cpu(cpu) {
-			bpf_long_memcpy(info->percpu_value_buf + off,
-					per_cpu_ptr(pptr, cpu),
-					roundup_value_size);
+			copy_map_value_long(map, info->percpu_value_buf + off,
+					    per_cpu_ptr(pptr, cpu));
+			check_and_init_map_value(map, info->percpu_value_buf + off);
 			off += roundup_value_size;
 		}
 		ctx.value = info->percpu_value_buf;
···
 	 */
 	pptr = htab_elem_get_ptr(l, map->key_size);
 	for_each_possible_cpu(cpu) {
-		bpf_long_memcpy(value + off,
-				per_cpu_ptr(pptr, cpu), size);
+		copy_map_value_long(map, value + off, per_cpu_ptr(pptr, cpu));
+		check_and_init_map_value(map, value + off);
 		off += size;
 	}
 	ret = 0;
+239 -18
kernel/bpf/helpers.c
···
 {
 	struct bpf_hrtimer *t;
 	int ret = 0;
+	enum hrtimer_mode mode;
 
 	if (in_nmi())
 		return -EOPNOTSUPP;
-	if (flags)
+	if (flags > BPF_F_TIMER_ABS)
 		return -EINVAL;
 	__bpf_spin_lock_irqsave(&timer->lock);
 	t = timer->timer;
···
 		ret = -EINVAL;
 		goto out;
 	}
-	hrtimer_start(&t->timer, ns_to_ktime(nsecs), HRTIMER_MODE_REL_SOFT);
+
+	if (flags & BPF_F_TIMER_ABS)
+		mode = HRTIMER_MODE_ABS_SOFT;
+	else
+		mode = HRTIMER_MODE_REL_SOFT;
+
+	hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
 out:
 	__bpf_spin_unlock_irqrestore(&timer->lock);
 	return ret;
···
 	return ptr->size & DYNPTR_RDONLY_BIT;
 }
 
+void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr)
+{
+	ptr->size |= DYNPTR_RDONLY_BIT;
+}
+
 static void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
 {
 	ptr->size |= type << DYNPTR_TYPE_SHIFT;
+}
+
+static enum bpf_dynptr_type bpf_dynptr_get_type(const struct bpf_dynptr_kern *ptr)
+{
+	return (ptr->size & ~(DYNPTR_RDONLY_BIT)) >> DYNPTR_TYPE_SHIFT;
 }
 
 u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr)
···
 BPF_CALL_5(bpf_dynptr_read, void *, dst, u32, len, const struct bpf_dynptr_kern *, src,
 	   u32, offset, u64, flags)
 {
+	enum bpf_dynptr_type type;
 	int err;
 
 	if (!src->data || flags)
···
 	if (err)
 		return err;
 
-	/* Source and destination may possibly overlap, hence use memmove to
-	 * copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
-	 * pointing to overlapping PTR_TO_MAP_VALUE regions.
-	 */
-	memmove(dst, src->data + src->offset + offset, len);
+	type = bpf_dynptr_get_type(src);
 
-	return 0;
+	switch (type) {
+	case BPF_DYNPTR_TYPE_LOCAL:
+	case BPF_DYNPTR_TYPE_RINGBUF:
+		/* Source and destination may possibly overlap, hence use memmove to
+		 * copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
+		 * pointing to overlapping PTR_TO_MAP_VALUE regions.
+		 */
+		memmove(dst, src->data + src->offset + offset, len);
+		return 0;
+	case BPF_DYNPTR_TYPE_SKB:
+		return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len);
+	case BPF_DYNPTR_TYPE_XDP:
+		return __bpf_xdp_load_bytes(src->data, src->offset + offset, dst, len);
+	default:
+		WARN_ONCE(true, "bpf_dynptr_read: unknown dynptr type %d\n", type);
+		return -EFAULT;
+	}
 }
 
 static const struct bpf_func_proto bpf_dynptr_read_proto = {
···
 BPF_CALL_5(bpf_dynptr_write, const struct bpf_dynptr_kern *, dst, u32, offset, void *, src,
 	   u32, len, u64, flags)
 {
+	enum bpf_dynptr_type type;
 	int err;
 
-	if (!dst->data || flags || bpf_dynptr_is_rdonly(dst))
+	if (!dst->data || bpf_dynptr_is_rdonly(dst))
 		return -EINVAL;
 
 	err = bpf_dynptr_check_off_len(dst, offset, len);
 	if (err)
 		return err;
 
-	/* Source and destination may possibly overlap, hence use memmove to
-	 * copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
-	 * pointing to overlapping PTR_TO_MAP_VALUE regions.
-	 */
-	memmove(dst->data + dst->offset + offset, src, len);
+	type = bpf_dynptr_get_type(dst);
 
-	return 0;
+	switch (type) {
+	case BPF_DYNPTR_TYPE_LOCAL:
+	case BPF_DYNPTR_TYPE_RINGBUF:
+		if (flags)
+			return -EINVAL;
+		/* Source and destination may possibly overlap, hence use memmove to
+		 * copy the data. E.g. bpf_dynptr_from_mem may create two dynptr
+		 * pointing to overlapping PTR_TO_MAP_VALUE regions.
+		 */
+		memmove(dst->data + dst->offset + offset, src, len);
+		return 0;
+	case BPF_DYNPTR_TYPE_SKB:
+		return __bpf_skb_store_bytes(dst->data, dst->offset + offset, src, len,
+					     flags);
+	case BPF_DYNPTR_TYPE_XDP:
+		if (flags)
+			return -EINVAL;
+		return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len);
+	default:
+		WARN_ONCE(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
+		return -EFAULT;
+	}
 }
 
 static const struct bpf_func_proto bpf_dynptr_write_proto = {
···
 BPF_CALL_3(bpf_dynptr_data, const struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
 {
+	enum bpf_dynptr_type type;
 	int err;
 
 	if (!ptr->data)
···
 	if (bpf_dynptr_is_rdonly(ptr))
 		return 0;
 
-	return (unsigned long)(ptr->data + ptr->offset + offset);
+	type = bpf_dynptr_get_type(ptr);
+
+	switch (type) {
+	case BPF_DYNPTR_TYPE_LOCAL:
+	case BPF_DYNPTR_TYPE_RINGBUF:
+		return (unsigned long)(ptr->data + ptr->offset + offset);
+	case BPF_DYNPTR_TYPE_SKB:
+	case BPF_DYNPTR_TYPE_XDP:
+		/* skb and xdp dynptrs should use bpf_dynptr_slice / bpf_dynptr_slice_rdwr */
+		return 0;
+	default:
+		WARN_ONCE(true, "bpf_dynptr_data: unknown dynptr type %d\n", type);
+		return 0;
+	}
 }
 
 static const struct bpf_func_proto bpf_dynptr_data_proto = {
···
 		return &bpf_cgrp_storage_get_proto;
 	case BPF_FUNC_cgrp_storage_delete:
 		return &bpf_cgrp_storage_delete_proto;
+	case BPF_FUNC_get_current_cgroup_id:
+		return &bpf_get_current_cgroup_id_proto;
+	case BPF_FUNC_get_current_ancestor_cgroup_id:
+		return &bpf_get_current_ancestor_cgroup_id_proto;
 #endif
 	default:
 		break;
···
 	if (level > cgrp->level || level < 0)
 		return NULL;
 
+	/* cgrp's refcnt could be 0 here, but ancestors can still be accessed */
 	ancestor = cgrp->ancestors[level];
-	cgroup_get(ancestor);
+	if (!cgroup_tryget(ancestor))
+		return NULL;
 	return ancestor;
+}
+
+/**
+ * bpf_cgroup_from_id - Find a cgroup from its ID. A cgroup returned by this
+ * kfunc which is not subsequently stored in a map, must be released by calling
+ * bpf_cgroup_release().
+ * @cgid: cgroup id.
+ */
+__bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
+{
+	struct cgroup *cgrp;
+
+	cgrp = cgroup_get_from_id(cgid);
+	if (IS_ERR(cgrp))
+		return NULL;
+	return cgrp;
 }
 #endif /* CONFIG_CGROUPS */
···
 	rcu_read_unlock();
 
 	return p;
+}
+
+/**
+ * bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data.
+ * @ptr: The dynptr whose data slice to retrieve
+ * @offset: Offset into the dynptr
+ * @buffer: User-provided buffer to copy contents into
+ * @buffer__szk: Size (in bytes) of the buffer. This is the length of the
+ * requested slice. This must be a constant.
+ *
+ * For non-skb and non-xdp type dynptrs, there is no difference between
+ * bpf_dynptr_slice and bpf_dynptr_data.
+ *
+ * If the intention is to write to the data slice, please use
+ * bpf_dynptr_slice_rdwr.
+ *
+ * The user must check that the returned pointer is not null before using it.
+ *
+ * Please note that in the case of skb and xdp dynptrs, bpf_dynptr_slice
+ * does not change the underlying packet data pointers, so a call to
+ * bpf_dynptr_slice will not invalidate any ctx->data/data_end pointers in
+ * the bpf program.
+ *
+ * Return: NULL if the call failed (eg invalid dynptr), pointer to a read-only
+ * data slice (can be either direct pointer to the data or a pointer to the user
+ * provided buffer, with its contents containing the data, if unable to obtain
+ * direct pointer)
+ */
+__bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset,
+				   void *buffer, u32 buffer__szk)
+{
+	enum bpf_dynptr_type type;
+	u32 len = buffer__szk;
+	int err;
+
+	if (!ptr->data)
+		return NULL;
+
+	err = bpf_dynptr_check_off_len(ptr, offset, len);
+	if (err)
+		return NULL;
+
+	type = bpf_dynptr_get_type(ptr);
+
+	switch (type) {
+	case BPF_DYNPTR_TYPE_LOCAL:
+	case BPF_DYNPTR_TYPE_RINGBUF:
+		return ptr->data + ptr->offset + offset;
+	case BPF_DYNPTR_TYPE_SKB:
+		return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer);
+	case BPF_DYNPTR_TYPE_XDP:
+	{
+		void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset + offset, len);
+		if (xdp_ptr)
+			return xdp_ptr;
+
+		bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer, len, false);
+		return buffer;
+	}
+	default:
+		WARN_ONCE(true, "unknown dynptr type %d\n", type);
+		return NULL;
+	}
+}
+
+/**
+ * bpf_dynptr_slice_rdwr() - Obtain a writable pointer to the dynptr data.
+ * @ptr: The dynptr whose data slice to retrieve
+ * @offset: Offset into the dynptr
+ * @buffer: User-provided buffer to copy contents into
+ * @buffer__szk: Size (in bytes) of the buffer. This is the length of the
+ * requested slice. This must be a constant.
+ *
+ * For non-skb and non-xdp type dynptrs, there is no difference between
+ * bpf_dynptr_slice and bpf_dynptr_data.
+ *
+ * The returned pointer is writable and may point to either directly the dynptr
+ * data at the requested offset or to the buffer if unable to obtain a direct
+ * data pointer to (example: the requested slice is to the paged area of an skb
+ * packet). In the case where the returned pointer is to the buffer, the user
+ * is responsible for persisting writes through calling bpf_dynptr_write(). This
+ * usually looks something like this pattern:
+ *
+ * struct eth_hdr *eth = bpf_dynptr_slice_rdwr(&dynptr, 0, buffer, sizeof(buffer));
+ * if (!eth)
+ *	return TC_ACT_SHOT;
+ *
+ * // mutate eth header //
+ *
+ * if (eth == buffer)
+ *	bpf_dynptr_write(&ptr, 0, buffer, sizeof(buffer), 0);
+ *
+ * Please note that, as in the example above, the user must check that the
+ * returned pointer is not null before using it.
+ *
+ * Please also note that in the case of skb and xdp dynptrs, bpf_dynptr_slice_rdwr
+ * does not change the underlying packet data pointers, so a call to
+ * bpf_dynptr_slice_rdwr will not invalidate any ctx->data/data_end pointers in
+ * the bpf program.
+ *
+ * Return: NULL if the call failed (eg invalid dynptr), pointer to a
+ * data slice (can be either direct pointer to the data or a pointer to the user
+ * provided buffer, with its contents containing the data, if unable to obtain
+ * direct pointer)
+ */
+__bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 offset,
+					void *buffer, u32 buffer__szk)
+{
+	if (!ptr->data || bpf_dynptr_is_rdonly(ptr))
+		return NULL;
+
+	/* bpf_dynptr_slice_rdwr is the same logic as bpf_dynptr_slice.
+	 *
+	 * For skb-type dynptrs, it is safe to write into the returned pointer
+	 * if the bpf program allows skb data writes. There are two possiblities
+	 * that may occur when calling bpf_dynptr_slice_rdwr:
+	 *
+	 * 1) The requested slice is in the head of the skb. In this case, the
+	 * returned pointer is directly to skb data, and if the skb is cloned, the
+	 * verifier will have uncloned it (see bpf_unclone_prologue()) already.
+	 * The pointer can be directly written into.
+	 *
+	 * 2) Some portion of the requested slice is in the paged buffer area.
+	 * In this case, the requested data will be copied out into the buffer
+	 * and the returned pointer will be a pointer to the buffer. The skb
+	 * will not be pulled. To persist the write, the user will need to call
+	 * bpf_dynptr_write(), which will pull the skb and commit the write.
+	 *
+	 * Similarly for xdp programs, if the requested slice is not across xdp
+	 * fragments, then a direct pointer will be returned, otherwise the data
+	 * will be copied out into the buffer and the user will need to call
+	 * bpf_dynptr_write() to commit changes.
+	 */
+	return bpf_dynptr_slice(ptr, offset, buffer, buffer__szk);
 }
 
 __bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
···
 BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_cgroup_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE)
-BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
 #endif
 BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
 BTF_SET8_END(generic_btf_ids)
···
 BTF_ID_FLAGS(func, bpf_rdonly_cast)
 BTF_ID_FLAGS(func, bpf_rcu_read_lock)
 BTF_ID_FLAGS(func, bpf_rcu_read_unlock)
+BTF_ID_FLAGS(func, bpf_dynptr_slice, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
 BTF_SET8_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
+7 -1
kernel/bpf/syscall.c
···
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
 			if (map->map_type != BPF_MAP_TYPE_HASH &&
+			    map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&
 			    map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+			    map->map_type != BPF_MAP_TYPE_LRU_PERCPU_HASH &&
 			    map->map_type != BPF_MAP_TYPE_ARRAY &&
-			    map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY) {
+			    map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY &&
+			    map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
+			    map->map_type != BPF_MAP_TYPE_INODE_STORAGE &&
+			    map->map_type != BPF_MAP_TYPE_TASK_STORAGE &&
+			    map->map_type != BPF_MAP_TYPE_CGRP_STORAGE) {
 				ret = -EOPNOTSUPP;
 				goto free_map_tab;
 			}
+677 -303
kernel/bpf/verifier.c
··· 268 268 u32 ret_btf_id; 269 269 u32 subprogno; 270 270 struct btf_field *kptr_field; 271 - u8 uninit_dynptr_regno; 271 + }; 272 + 273 + struct bpf_kfunc_call_arg_meta { 274 + /* In parameters */ 275 + struct btf *btf; 276 + u32 func_id; 277 + u32 kfunc_flags; 278 + const struct btf_type *func_proto; 279 + const char *func_name; 280 + /* Out parameters */ 281 + u32 ref_obj_id; 282 + u8 release_regno; 283 + bool r0_rdonly; 284 + u32 ret_btf_id; 285 + u64 r0_size; 286 + u32 subprogno; 287 + struct { 288 + u64 value; 289 + bool found; 290 + } arg_constant; 291 + struct { 292 + struct btf *btf; 293 + u32 btf_id; 294 + } arg_obj_drop; 295 + struct { 296 + struct btf_field *field; 297 + } arg_list_head; 298 + struct { 299 + struct btf_field *field; 300 + } arg_rbtree_root; 301 + struct { 302 + enum bpf_dynptr_type type; 303 + u32 id; 304 + } initialized_dynptr; 305 + u64 mem_size; 272 306 }; 273 307 274 308 struct btf *btf_vmlinux; ··· 487 453 type == PTR_TO_TCP_SOCK || 488 454 type == PTR_TO_MAP_VALUE || 489 455 type == PTR_TO_MAP_KEY || 490 - type == PTR_TO_SOCK_COMMON; 456 + type == PTR_TO_SOCK_COMMON || 457 + type == PTR_TO_MEM; 491 458 } 492 459 493 460 static bool type_is_ptr_alloc_obj(u32 type) ··· 710 675 return spi - nr_slots + 1 >= 0 && spi < allocated_slots; 711 676 } 712 677 713 - static int dynptr_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg) 678 + static int stack_slot_obj_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg, 679 + const char *obj_kind, int nr_slots) 714 680 { 715 681 int off, spi; 716 682 717 683 if (!tnum_is_const(reg->var_off)) { 718 - verbose(env, "dynptr has to be at a constant offset\n"); 684 + verbose(env, "%s has to be at a constant offset\n", obj_kind); 719 685 return -EINVAL; 720 686 } 721 687 722 688 off = reg->off + reg->var_off.value; 723 689 if (off % BPF_REG_SIZE) { 724 - verbose(env, "cannot pass in dynptr at an offset=%d\n", off); 690 + verbose(env, "cannot pass in %s at an offset=%d\n", 
obj_kind, off); 725 691 return -EINVAL; 726 692 } 727 693 728 694 spi = __get_spi(off); 729 - if (spi < 1) { 730 - verbose(env, "cannot pass in dynptr at an offset=%d\n", off); 695 + if (spi + 1 < nr_slots) { 696 + verbose(env, "cannot pass in %s at an offset=%d\n", obj_kind, off); 731 697 return -EINVAL; 732 698 } 733 699 734 - if (!is_spi_bounds_valid(func(env, reg), spi, BPF_DYNPTR_NR_SLOTS)) 700 + if (!is_spi_bounds_valid(func(env, reg), spi, nr_slots)) 735 701 return -ERANGE; 736 702 return spi; 703 + } 704 + 705 + static int dynptr_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg) 706 + { 707 + return stack_slot_obj_get_spi(env, reg, "dynptr", BPF_DYNPTR_NR_SLOTS); 737 708 } 738 709 739 710 static const char *kernel_type_name(const struct btf* btf, u32 id) 740 711 { 741 712 return btf_name_by_offset(btf, btf_type_by_id(btf, id)->name_off); 713 + } 714 + 715 + static const char *dynptr_type_str(enum bpf_dynptr_type type) 716 + { 717 + switch (type) { 718 + case BPF_DYNPTR_TYPE_LOCAL: 719 + return "local"; 720 + case BPF_DYNPTR_TYPE_RINGBUF: 721 + return "ringbuf"; 722 + case BPF_DYNPTR_TYPE_SKB: 723 + return "skb"; 724 + case BPF_DYNPTR_TYPE_XDP: 725 + return "xdp"; 726 + case BPF_DYNPTR_TYPE_INVALID: 727 + return "<invalid>"; 728 + default: 729 + WARN_ONCE(1, "unknown dynptr type %d\n", type); 730 + return "<unknown>"; 731 + } 742 732 } 743 733 744 734 static void mark_reg_scratched(struct bpf_verifier_env *env, u32 regno) ··· 811 751 return BPF_DYNPTR_TYPE_LOCAL; 812 752 case DYNPTR_TYPE_RINGBUF: 813 753 return BPF_DYNPTR_TYPE_RINGBUF; 754 + case DYNPTR_TYPE_SKB: 755 + return BPF_DYNPTR_TYPE_SKB; 756 + case DYNPTR_TYPE_XDP: 757 + return BPF_DYNPTR_TYPE_XDP; 814 758 default: 815 759 return BPF_DYNPTR_TYPE_INVALID; 760 + } 761 + } 762 + 763 + static enum bpf_type_flag get_dynptr_type_flag(enum bpf_dynptr_type type) 764 + { 765 + switch (type) { 766 + case BPF_DYNPTR_TYPE_LOCAL: 767 + return DYNPTR_TYPE_LOCAL; 768 + case BPF_DYNPTR_TYPE_RINGBUF: 
769 + return DYNPTR_TYPE_RINGBUF; 770 + case BPF_DYNPTR_TYPE_SKB: 771 + return DYNPTR_TYPE_SKB; 772 + case BPF_DYNPTR_TYPE_XDP: 773 + return DYNPTR_TYPE_XDP; 774 + default: 775 + return 0; 816 776 } 817 777 } 818 778 ··· 975 895 static void __mark_reg_unknown(const struct bpf_verifier_env *env, 976 896 struct bpf_reg_state *reg); 977 897 898 + static void mark_reg_invalid(const struct bpf_verifier_env *env, struct bpf_reg_state *reg) 899 + { 900 + if (!env->allow_ptr_leaks) 901 + __mark_reg_not_init(env, reg); 902 + else 903 + __mark_reg_unknown(env, reg); 904 + } 905 + 978 906 static int destroy_if_dynptr_stack_slot(struct bpf_verifier_env *env, 979 907 struct bpf_func_state *state, int spi) 980 908 { ··· 1022 934 /* Dynptr slices are only PTR_TO_MEM_OR_NULL and PTR_TO_MEM */ 1023 935 if (dreg->type != (PTR_TO_MEM | PTR_MAYBE_NULL) && dreg->type != PTR_TO_MEM) 1024 936 continue; 1025 - if (dreg->dynptr_id == dynptr_id) { 1026 - if (!env->allow_ptr_leaks) 1027 - __mark_reg_not_init(env, dreg); 1028 - else 1029 - __mark_reg_unknown(env, dreg); 1030 - } 937 + if (dreg->dynptr_id == dynptr_id) 938 + mark_reg_invalid(env, dreg); 1031 939 })); 1032 940 1033 941 /* Do not release reference state, we are destroying dynptr on stack, ··· 1039 955 return 0; 1040 956 } 1041 957 1042 - static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg, 1043 - int spi) 958 + static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg) 1044 959 { 960 + int spi; 961 + 1045 962 if (reg->type == CONST_PTR_TO_DYNPTR) 1046 963 return false; 1047 964 1048 - /* For -ERANGE (i.e. spi not falling into allocated stack slots), we 1049 - * will do check_mem_access to check and update stack bounds later, so 1050 - * return true for that case. 965 + spi = dynptr_get_spi(env, reg); 966 + 967 + /* -ERANGE (i.e. 
spi not falling into allocated stack slots) isn't an 968 + * error because this just means the stack state hasn't been updated yet. 969 + * We will do check_mem_access to check and update stack bounds later. 1051 970 */ 1052 - if (spi < 0) 1053 - return spi == -ERANGE; 1054 - /* We allow overwriting existing unreferenced STACK_DYNPTR slots, see 1055 - * mark_stack_slots_dynptr which calls destroy_if_dynptr_stack_slot to 1056 - * ensure dynptr objects at the slots we are touching are completely 1057 - * destructed before we reinitialize them for a new one. For referenced 1058 - * ones, destroy_if_dynptr_stack_slot returns an error early instead of 1059 - * delaying it until the end where the user will get "Unreleased 971 + if (spi < 0 && spi != -ERANGE) 972 + return false; 973 + 974 + /* We don't need to check if the stack slots are marked by previous 975 + * dynptr initializations because we allow overwriting existing unreferenced 976 + * STACK_DYNPTR slots, see mark_stack_slots_dynptr which calls 977 + * destroy_if_dynptr_stack_slot to ensure dynptr objects at the slots we are 978 + * touching are completely destructed before we reinitialize them for a new 979 + * one. For referenced ones, destroy_if_dynptr_stack_slot returns an error early 980 + * instead of delaying it until the end where the user will get "Unreleased 1060 981 * reference" error. 1061 982 */ 1062 983 return true; 1063 984 } 1064 985 1065 - static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg, 1066 - int spi) 986 + static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg) 1067 987 { 1068 988 struct bpf_func_state *state = func(env, reg); 1069 - int i; 989 + int i, spi; 1070 990 1071 - /* This already represents first slot of initialized bpf_dynptr */ 991 + /* This already represents first slot of initialized bpf_dynptr. 
992 + * 993 + * CONST_PTR_TO_DYNPTR already has fixed and var_off as 0 due to 994 + * check_func_arg_reg_off's logic, so we don't need to check its 995 + * offset and alignment. 996 + */ 1072 997 if (reg->type == CONST_PTR_TO_DYNPTR) 1073 998 return true; 1074 999 1000 + spi = dynptr_get_spi(env, reg); 1075 1001 if (spi < 0) 1076 1002 return false; 1077 1003 if (!state->stack[spi].spilled_ptr.dynptr.first_slot) ··· 1237 1143 for (j = 0; j < BPF_REG_SIZE; j++) { 1238 1144 if (state->stack[i].slot_type[j] != STACK_INVALID) 1239 1145 valid = true; 1240 - types_buf[j] = slot_type_char[ 1241 - state->stack[i].slot_type[j]]; 1146 + types_buf[j] = slot_type_char[state->stack[i].slot_type[j]]; 1242 1147 } 1243 1148 types_buf[BPF_REG_SIZE] = 0; 1244 1149 if (!valid) 1245 1150 continue; 1246 1151 if (!print_all && !stack_slot_scratched(env, i)) 1247 1152 continue; 1248 - verbose(env, " fp%d", (-i - 1) * BPF_REG_SIZE); 1249 - print_liveness(env, state->stack[i].spilled_ptr.live); 1250 - if (is_spilled_reg(&state->stack[i])) { 1153 + switch (state->stack[i].slot_type[BPF_REG_SIZE - 1]) { 1154 + case STACK_SPILL: 1251 1155 reg = &state->stack[i].spilled_ptr; 1252 1156 t = reg->type; 1157 + 1158 + verbose(env, " fp%d", (-i - 1) * BPF_REG_SIZE); 1159 + print_liveness(env, reg->live); 1253 1160 verbose(env, "=%s", t == SCALAR_VALUE ? 
"" : reg_type_str(env, t)); 1254 1161 if (t == SCALAR_VALUE && reg->precise) 1255 1162 verbose(env, "P"); 1256 1163 if (t == SCALAR_VALUE && tnum_is_const(reg->var_off)) 1257 1164 verbose(env, "%lld", reg->var_off.value + reg->off); 1258 - } else { 1165 + break; 1166 + case STACK_DYNPTR: 1167 + i += BPF_DYNPTR_NR_SLOTS - 1; 1168 + reg = &state->stack[i].spilled_ptr; 1169 + 1170 + verbose(env, " fp%d", (-i - 1) * BPF_REG_SIZE); 1171 + print_liveness(env, reg->live); 1172 + verbose(env, "=dynptr_%s", dynptr_type_str(reg->dynptr.type)); 1173 + if (reg->ref_obj_id) 1174 + verbose(env, "(ref_id=%d)", reg->ref_obj_id); 1175 + break; 1176 + case STACK_MISC: 1177 + case STACK_ZERO: 1178 + default: 1179 + reg = &state->stack[i].spilled_ptr; 1180 + 1181 + for (j = 0; j < BPF_REG_SIZE; j++) 1182 + types_buf[j] = slot_type_char[state->stack[i].slot_type[j]]; 1183 + types_buf[BPF_REG_SIZE] = 0; 1184 + 1185 + verbose(env, " fp%d", (-i - 1) * BPF_REG_SIZE); 1186 + print_liveness(env, reg->live); 1259 1187 verbose(env, "=%s", types_buf); 1188 + break; 1260 1189 } 1261 1190 } 1262 1191 if (state->acquired_refs && state->refs[0].id) { ··· 1779 1662 { 1780 1663 return reg_is_pkt_pointer(reg) || 1781 1664 reg->type == PTR_TO_PACKET_END; 1665 + } 1666 + 1667 + static bool reg_is_dynptr_slice_pkt(const struct bpf_reg_state *reg) 1668 + { 1669 + return base_type(reg->type) == PTR_TO_MEM && 1670 + (reg->type & DYNPTR_TYPE_SKB || reg->type & DYNPTR_TYPE_XDP); 1782 1671 } 1783 1672 1784 1673 /* Unmodified PTR_TO_PACKET[_META,_END] register from ctx access. 
*/ ··· 2598 2475 u8 code = insn[i].code; 2599 2476 2600 2477 if (code == (BPF_JMP | BPF_CALL) && 2601 - insn[i].imm == BPF_FUNC_tail_call && 2602 - insn[i].src_reg != BPF_PSEUDO_CALL) 2478 + insn[i].src_reg == 0 && 2479 + insn[i].imm == BPF_FUNC_tail_call) 2603 2480 subprog[cur_subprog].has_tail_call = true; 2604 2481 if (BPF_CLASS(code) == BPF_LD && 2605 2482 (BPF_MODE(code) == BPF_ABS || BPF_MODE(code) == BPF_IND)) ··· 3949 3826 continue; 3950 3827 if (type == STACK_MISC) 3951 3828 continue; 3829 + if (type == STACK_INVALID && env->allow_uninit_stack) 3830 + continue; 3952 3831 verbose(env, "invalid read from stack off %d+%d size %d\n", 3953 3832 off, i, size); 3954 3833 return -EACCES; ··· 3987 3862 if (type == STACK_MISC) 3988 3863 continue; 3989 3864 if (type == STACK_ZERO) 3865 + continue; 3866 + if (type == STACK_INVALID && env->allow_uninit_stack) 3990 3867 continue; 3991 3868 verbose(env, "invalid read from stack off %d+%d size %d\n", 3992 3869 off, i, size); ··· 4302 4175 struct bpf_reg_state *reg, u32 regno) 4303 4176 { 4304 4177 const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id); 4305 - int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED; 4178 + int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED | MEM_RCU; 4306 4179 const char *reg_name = ""; 4307 4180 4308 4181 /* Only unreferenced case accepts untrusted pointers */ ··· 4369 4242 return -EINVAL; 4370 4243 } 4371 4244 4245 + /* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock() 4246 + * can dereference RCU protected pointers and result is PTR_TRUSTED. 
4247 + */ 4248 + static bool in_rcu_cs(struct bpf_verifier_env *env) 4249 + { 4250 + return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable; 4251 + } 4252 + 4253 + /* Once GCC supports btf_type_tag the following mechanism will be replaced with tag check */ 4254 + BTF_SET_START(rcu_protected_types) 4255 + BTF_ID(struct, prog_test_ref_kfunc) 4256 + BTF_ID(struct, cgroup) 4257 + BTF_SET_END(rcu_protected_types) 4258 + 4259 + static bool rcu_protected_object(const struct btf *btf, u32 btf_id) 4260 + { 4261 + if (!btf_is_kernel(btf)) 4262 + return false; 4263 + return btf_id_set_contains(&rcu_protected_types, btf_id); 4264 + } 4265 + 4266 + static bool rcu_safe_kptr(const struct btf_field *field) 4267 + { 4268 + const struct btf_field_kptr *kptr = &field->kptr; 4269 + 4270 + return field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id); 4271 + } 4272 + 4372 4273 static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno, 4373 4274 int value_regno, int insn_idx, 4374 4275 struct btf_field *kptr_field) ··· 4431 4276 * value from map as PTR_TO_BTF_ID, with the correct type. 4432 4277 */ 4433 4278 mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, kptr_field->kptr.btf, 4434 - kptr_field->kptr.btf_id, PTR_MAYBE_NULL | PTR_UNTRUSTED); 4279 + kptr_field->kptr.btf_id, 4280 + rcu_safe_kptr(kptr_field) && in_rcu_cs(env) ? 4281 + PTR_MAYBE_NULL | MEM_RCU : 4282 + PTR_MAYBE_NULL | PTR_UNTRUSTED); 4435 4283 /* For mark_ptr_or_null_reg */ 4436 4284 val_reg->id = ++env->id_gen; 4437 4285 } else if (class == BPF_STX) { ··· 5157 4999 return 0; 5158 5000 } 5159 5001 5160 - #define BTF_TYPE_SAFE_NESTED(__type) __PASTE(__type, __safe_fields) 5002 + #define BTF_TYPE_SAFE_RCU(__type) __PASTE(__type, __safe_rcu) 5003 + #define BTF_TYPE_SAFE_TRUSTED(__type) __PASTE(__type, __safe_trusted) 5161 5004 5162 - BTF_TYPE_SAFE_NESTED(struct task_struct) { 5005 + /* 5006 + * Allow list few fields as RCU trusted or full trusted. 
5007 + * This logic doesn't allow mix tagging and will be removed once GCC supports 5008 + * btf_type_tag. 5009 + */ 5010 + 5011 + /* RCU trusted: these fields are trusted in RCU CS and never NULL */ 5012 + BTF_TYPE_SAFE_RCU(struct task_struct) { 5163 5013 const cpumask_t *cpus_ptr; 5014 + struct css_set __rcu *cgroups; 5015 + struct task_struct __rcu *real_parent; 5016 + struct task_struct *group_leader; 5164 5017 }; 5165 5018 5166 - static bool nested_ptr_is_trusted(struct bpf_verifier_env *env, 5167 - struct bpf_reg_state *reg, 5168 - int off) 5019 + BTF_TYPE_SAFE_RCU(struct css_set) { 5020 + struct cgroup *dfl_cgrp; 5021 + }; 5022 + 5023 + /* full trusted: these fields are trusted even outside of RCU CS and never NULL */ 5024 + BTF_TYPE_SAFE_TRUSTED(struct bpf_iter_meta) { 5025 + __bpf_md_ptr(struct seq_file *, seq); 5026 + }; 5027 + 5028 + BTF_TYPE_SAFE_TRUSTED(struct bpf_iter__task) { 5029 + __bpf_md_ptr(struct bpf_iter_meta *, meta); 5030 + __bpf_md_ptr(struct task_struct *, task); 5031 + }; 5032 + 5033 + BTF_TYPE_SAFE_TRUSTED(struct linux_binprm) { 5034 + struct file *file; 5035 + }; 5036 + 5037 + BTF_TYPE_SAFE_TRUSTED(struct file) { 5038 + struct inode *f_inode; 5039 + }; 5040 + 5041 + BTF_TYPE_SAFE_TRUSTED(struct dentry) { 5042 + /* no negative dentry-s in places where bpf can see it */ 5043 + struct inode *d_inode; 5044 + }; 5045 + 5046 + BTF_TYPE_SAFE_TRUSTED(struct socket) { 5047 + struct sock *sk; 5048 + }; 5049 + 5050 + static bool type_is_rcu(struct bpf_verifier_env *env, 5051 + struct bpf_reg_state *reg, 5052 + int off) 5169 5053 { 5170 - /* If its parent is not trusted, it can't regain its trusted status. 
*/ 5171 - if (!is_trusted_reg(reg)) 5172 - return false; 5054 + BTF_TYPE_EMIT(BTF_TYPE_SAFE_RCU(struct task_struct)); 5055 + BTF_TYPE_EMIT(BTF_TYPE_SAFE_RCU(struct css_set)); 5173 5056 5174 - BTF_TYPE_EMIT(BTF_TYPE_SAFE_NESTED(struct task_struct)); 5057 + return btf_nested_type_is_trusted(&env->log, reg, off, "__safe_rcu"); 5058 + } 5175 5059 5176 - return btf_nested_type_is_trusted(&env->log, reg, off); 5060 + static bool type_is_trusted(struct bpf_verifier_env *env, 5061 + struct bpf_reg_state *reg, 5062 + int off) 5063 + { 5064 + BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct bpf_iter_meta)); 5065 + BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct bpf_iter__task)); 5066 + BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct linux_binprm)); 5067 + BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct file)); 5068 + BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct dentry)); 5069 + BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct socket)); 5070 + 5071 + return btf_nested_type_is_trusted(&env->log, reg, off, "__safe_trusted"); 5177 5072 } 5178 5073 5179 5074 static int check_ptr_to_btf_access(struct bpf_verifier_env *env, ··· 5312 5101 if (ret < 0) 5313 5102 return ret; 5314 5103 5315 - /* If this is an untrusted pointer, all pointers formed by walking it 5316 - * also inherit the untrusted flag. 5317 - */ 5318 - if (type_flag(reg->type) & PTR_UNTRUSTED) 5319 - flag |= PTR_UNTRUSTED; 5104 + if (ret != PTR_TO_BTF_ID) { 5105 + /* just mark; */ 5320 5106 5321 - /* By default any pointer obtained from walking a trusted pointer is no 5322 - * longer trusted, unless the field being accessed has explicitly been 5323 - * marked as inheriting its parent's state of trust. 5324 - * 5325 - * An RCU-protected pointer can also be deemed trusted if we are in an 5326 - * RCU read region. This case is handled below. 
5327 - */ 5328 - if (nested_ptr_is_trusted(env, reg, off)) 5329 - flag |= PTR_TRUSTED; 5330 - else 5107 + } else if (type_flag(reg->type) & PTR_UNTRUSTED) { 5108 + /* If this is an untrusted pointer, all pointers formed by walking it 5109 + * also inherit the untrusted flag. 5110 + */ 5111 + flag = PTR_UNTRUSTED; 5112 + 5113 + } else if (is_trusted_reg(reg) || is_rcu_reg(reg)) { 5114 + /* By default any pointer obtained from walking a trusted pointer is no 5115 + * longer trusted, unless the field being accessed has explicitly been 5116 + * marked as inheriting its parent's state of trust (either full or RCU). 5117 + * For example: 5118 + * 'cgroups' pointer is untrusted if task->cgroups dereference 5119 + * happened in a sleepable program outside of bpf_rcu_read_lock() 5120 + * section. In a non-sleepable program it's trusted while in RCU CS (aka MEM_RCU). 5121 + * Note bpf_rcu_read_unlock() converts MEM_RCU pointers to PTR_UNTRUSTED. 5122 + * 5123 + * A regular RCU-protected pointer with __rcu tag can also be deemed 5124 + * trusted if we are in an RCU CS. Such pointer can be NULL. 5125 + */ 5126 + if (type_is_trusted(env, reg, off)) { 5127 + flag |= PTR_TRUSTED; 5128 + } else if (in_rcu_cs(env) && !type_may_be_null(reg->type)) { 5129 + if (type_is_rcu(env, reg, off)) { 5130 + /* ignore __rcu tag and mark it MEM_RCU */ 5131 + flag |= MEM_RCU; 5132 + } else if (flag & MEM_RCU) { 5133 + /* __rcu tagged pointers can be NULL */ 5134 + flag |= PTR_MAYBE_NULL; 5135 + } else if (flag & (MEM_PERCPU | MEM_USER)) { 5136 + /* keep as-is */ 5137 + } else { 5138 + /* walking unknown pointers yields untrusted pointer */ 5139 + flag = PTR_UNTRUSTED; 5140 + } 5141 + } else { 5142 + /* 5143 + * If not in RCU CS or MEM_RCU pointer can be NULL then 5144 + * aggressively mark as untrusted otherwise such 5145 + * pointers will be plain PTR_TO_BTF_ID without flags 5146 + * and will be allowed to be passed into helpers for 5147 + * compat reasons. 
5148 + */ 5149 + flag = PTR_UNTRUSTED; 5150 + } 5151 + } else { 5152 + /* Old compat. Deprecated */ 5331 5153 flag &= ~PTR_TRUSTED; 5332 - 5333 - if (flag & MEM_RCU) { 5334 - /* Mark value register as MEM_RCU only if it is protected by 5335 - * bpf_rcu_read_lock() and the ptr reg is rcu or trusted. MEM_RCU 5336 - * itself can already indicate trustedness inside the rcu 5337 - * read lock region. Also mark rcu pointer as PTR_MAYBE_NULL since 5338 - * it could be null in some cases. 5339 - */ 5340 - if (!env->cur_state->active_rcu_lock || 5341 - !(is_trusted_reg(reg) || is_rcu_reg(reg))) 5342 - flag &= ~MEM_RCU; 5343 - else 5344 - flag |= PTR_MAYBE_NULL; 5345 - } else if (reg->type & MEM_RCU) { 5346 - /* ptr (reg) is marked as MEM_RCU, but the struct field is not tagged 5347 - * with __rcu. Mark the flag as PTR_UNTRUSTED conservatively. 5348 - */ 5349 - flag |= PTR_UNTRUSTED; 5350 5154 } 5351 5155 5352 5156 if (atype == BPF_READ && value_regno >= 0) ··· 5980 5754 stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE]; 5981 5755 if (*stype == STACK_MISC) 5982 5756 goto mark; 5983 - if (*stype == STACK_ZERO) { 5757 + if ((*stype == STACK_ZERO) || 5758 + (*stype == STACK_INVALID && env->allow_uninit_stack)) { 5984 5759 if (clobber) { 5985 5760 /* helper can write anything into the stack */ 5986 5761 *stype = STACK_MISC; ··· 6433 6206 * Helpers which do not mutate the bpf_dynptr set MEM_RDONLY in their argument 6434 6207 * type, and declare it as 'const struct bpf_dynptr *' in their prototype. 
6435 6208 */ 6436 - int process_dynptr_func(struct bpf_verifier_env *env, int regno, 6437 - enum bpf_arg_type arg_type, struct bpf_call_arg_meta *meta) 6209 + static int process_dynptr_func(struct bpf_verifier_env *env, int regno, int insn_idx, 6210 + enum bpf_arg_type arg_type) 6438 6211 { 6439 6212 struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno]; 6440 - int spi = 0; 6213 + int err; 6441 6214 6442 6215 /* MEM_UNINIT and MEM_RDONLY are exclusive, when applied to an 6443 6216 * ARG_PTR_TO_DYNPTR (or ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_*): ··· 6445 6218 if ((arg_type & (MEM_UNINIT | MEM_RDONLY)) == (MEM_UNINIT | MEM_RDONLY)) { 6446 6219 verbose(env, "verifier internal error: misconfigured dynptr helper type flags\n"); 6447 6220 return -EFAULT; 6448 - } 6449 - /* CONST_PTR_TO_DYNPTR already has fixed and var_off as 0 due to 6450 - * check_func_arg_reg_off's logic. We only need to check offset 6451 - * and its alignment for PTR_TO_STACK. 6452 - */ 6453 - if (reg->type == PTR_TO_STACK) { 6454 - spi = dynptr_get_spi(env, reg); 6455 - if (spi < 0 && spi != -ERANGE) 6456 - return spi; 6457 6221 } 6458 6222 6459 6223 /* MEM_UNINIT - Points to memory that is an appropriate candidate for ··· 6463 6245 * to. 6464 6246 */ 6465 6247 if (arg_type & MEM_UNINIT) { 6466 - if (!is_dynptr_reg_valid_uninit(env, reg, spi)) { 6248 + int i; 6249 + 6250 + if (!is_dynptr_reg_valid_uninit(env, reg)) { 6467 6251 verbose(env, "Dynptr has to be an uninitialized dynptr\n"); 6468 6252 return -EINVAL; 6469 6253 } 6470 6254 6471 - /* We only support one dynptr being uninitialized at the moment, 6472 - * which is sufficient for the helper functions we have right now. 
6473 - */ 6474 - if (meta->uninit_dynptr_regno) { 6475 - verbose(env, "verifier internal error: multiple uninitialized dynptr args\n"); 6476 - return -EFAULT; 6255 + /* we write BPF_DW bits (8 bytes) at a time */ 6256 + for (i = 0; i < BPF_DYNPTR_SIZE; i += 8) { 6257 + err = check_mem_access(env, insn_idx, regno, 6258 + i, BPF_DW, BPF_WRITE, -1, false); 6259 + if (err) 6260 + return err; 6477 6261 } 6478 6262 6479 - meta->uninit_dynptr_regno = regno; 6263 + err = mark_stack_slots_dynptr(env, reg, arg_type, insn_idx); 6480 6264 } else /* MEM_RDONLY and None case from above */ { 6481 - int err; 6482 - 6483 6265 /* For the reg->type == PTR_TO_STACK case, bpf_dynptr is never const */ 6484 6266 if (reg->type == CONST_PTR_TO_DYNPTR && !(arg_type & MEM_RDONLY)) { 6485 6267 verbose(env, "cannot pass pointer to const bpf_dynptr, the helper mutates it\n"); 6486 6268 return -EINVAL; 6487 6269 } 6488 6270 6489 - if (!is_dynptr_reg_valid_init(env, reg, spi)) { 6271 + if (!is_dynptr_reg_valid_init(env, reg)) { 6490 6272 verbose(env, 6491 6273 "Expected an initialized dynptr as arg #%d\n", 6492 6274 regno); ··· 6495 6277 6496 6278 /* Fold modifiers (in this case, MEM_RDONLY) when checking expected type */ 6497 6279 if (!is_dynptr_type_expected(env, reg, arg_type & ~MEM_RDONLY)) { 6498 - const char *err_extra = ""; 6499 - 6500 - switch (arg_type & DYNPTR_TYPE_FLAG_MASK) { 6501 - case DYNPTR_TYPE_LOCAL: 6502 - err_extra = "local"; 6503 - break; 6504 - case DYNPTR_TYPE_RINGBUF: 6505 - err_extra = "ringbuf"; 6506 - break; 6507 - default: 6508 - err_extra = "<unknown>"; 6509 - break; 6510 - } 6511 6280 verbose(env, 6512 6281 "Expected a dynptr of type %s as arg #%d\n", 6513 - err_extra, regno); 6282 + dynptr_type_str(arg_to_dynptr_type(arg_type)), regno); 6514 6283 return -EINVAL; 6515 6284 } 6516 6285 6517 6286 err = mark_dynptr_read(env, reg); 6518 - if (err) 6519 - return err; 6520 6287 } 6521 - return 0; 6288 + return err; 6522 6289 } 6523 6290 6524 6291 static bool 
arg_type_is_mem_size(enum bpf_arg_type type) ··· 6725 6522 return -EACCES; 6726 6523 6727 6524 found: 6728 - if (reg->type == PTR_TO_BTF_ID || reg->type & PTR_TRUSTED) { 6525 + if (base_type(reg->type) != PTR_TO_BTF_ID) 6526 + return 0; 6527 + 6528 + switch ((int)reg->type) { 6529 + case PTR_TO_BTF_ID: 6530 + case PTR_TO_BTF_ID | PTR_TRUSTED: 6531 + case PTR_TO_BTF_ID | MEM_RCU: 6532 + { 6729 6533 /* For bpf_sk_release, it needs to match against first member 6730 6534 * 'struct sock_common', hence make an exception for it. This 6731 6535 * allows bpf_sk_release to work for multiple socket types. ··· 6768 6558 return -EACCES; 6769 6559 } 6770 6560 } 6771 - } else if (type_is_alloc(reg->type)) { 6561 + break; 6562 + } 6563 + case PTR_TO_BTF_ID | MEM_ALLOC: 6772 6564 if (meta->func_id != BPF_FUNC_spin_lock && meta->func_id != BPF_FUNC_spin_unlock) { 6773 6565 verbose(env, "verifier internal error: unimplemented handling of MEM_ALLOC\n"); 6774 6566 return -EFAULT; 6775 6567 } 6568 + /* Handled by helper specific checks */ 6569 + break; 6570 + case PTR_TO_BTF_ID | MEM_PERCPU: 6571 + case PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED: 6572 + /* Handled by helper specific checks */ 6573 + break; 6574 + default: 6575 + verbose(env, "verifier internal error: invalid PTR_TO_BTF_ID register for type match\n"); 6576 + return -EFAULT; 6776 6577 } 6777 - 6778 6578 return 0; 6779 6579 } 6780 6580 ··· 6871 6651 case PTR_TO_BTF_ID | MEM_ALLOC: 6872 6652 case PTR_TO_BTF_ID | PTR_TRUSTED: 6873 6653 case PTR_TO_BTF_ID | MEM_RCU: 6874 - case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED: 6875 6654 case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF: 6876 6655 /* When referenced PTR_TO_BTF_ID is passed to release function, 6877 6656 * its fixed offset must be 0. 
In the other cases, fixed offset ··· 6883 6664 default: 6884 6665 return __check_ptr_off_reg(env, reg, regno, false); 6885 6666 } 6667 + } 6668 + 6669 + static struct bpf_reg_state *get_dynptr_arg_reg(struct bpf_verifier_env *env, 6670 + const struct bpf_func_proto *fn, 6671 + struct bpf_reg_state *regs) 6672 + { 6673 + struct bpf_reg_state *state = NULL; 6674 + int i; 6675 + 6676 + for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) 6677 + if (arg_type_is_dynptr(fn->arg_type[i])) { 6678 + if (state) { 6679 + verbose(env, "verifier internal error: multiple dynptr args\n"); 6680 + return NULL; 6681 + } 6682 + state = &regs[BPF_REG_1 + i]; 6683 + } 6684 + 6685 + if (!state) 6686 + verbose(env, "verifier internal error: no dynptr arg found\n"); 6687 + 6688 + return state; 6886 6689 } 6887 6690 6888 6691 static int dynptr_id(struct bpf_verifier_env *env, struct bpf_reg_state *reg) ··· 6933 6692 return state->stack[spi].spilled_ptr.ref_obj_id; 6934 6693 } 6935 6694 6695 + static enum bpf_dynptr_type dynptr_get_type(struct bpf_verifier_env *env, 6696 + struct bpf_reg_state *reg) 6697 + { 6698 + struct bpf_func_state *state = func(env, reg); 6699 + int spi; 6700 + 6701 + if (reg->type == CONST_PTR_TO_DYNPTR) 6702 + return reg->dynptr.type; 6703 + 6704 + spi = __get_spi(reg->off); 6705 + if (spi < 0) { 6706 + verbose(env, "verifier internal error: invalid spi when querying dynptr type\n"); 6707 + return BPF_DYNPTR_TYPE_INVALID; 6708 + } 6709 + 6710 + return state->stack[spi].spilled_ptr.dynptr.type; 6711 + } 6712 + 6936 6713 static int check_func_arg(struct bpf_verifier_env *env, u32 arg, 6937 6714 struct bpf_call_arg_meta *meta, 6938 - const struct bpf_func_proto *fn) 6715 + const struct bpf_func_proto *fn, 6716 + int insn_idx) 6939 6717 { 6940 6718 u32 regno = BPF_REG_1 + arg; 6941 6719 struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno]; ··· 7167 6907 err = check_mem_size_reg(env, reg, regno, true, meta); 7168 6908 break; 7169 6909 case ARG_PTR_TO_DYNPTR: 7170 - err 
= process_dynptr_func(env, regno, arg_type, meta); 6910 + err = process_dynptr_func(env, regno, insn_idx, arg_type); 7171 6911 if (err) 7172 6912 return err; 7173 6913 break; ··· 7386 7126 break; 7387 7127 case BPF_MAP_TYPE_SK_STORAGE: 7388 7128 if (func_id != BPF_FUNC_sk_storage_get && 7389 - func_id != BPF_FUNC_sk_storage_delete) 7129 + func_id != BPF_FUNC_sk_storage_delete && 7130 + func_id != BPF_FUNC_kptr_xchg) 7390 7131 goto error; 7391 7132 break; 7392 7133 case BPF_MAP_TYPE_INODE_STORAGE: 7393 7134 if (func_id != BPF_FUNC_inode_storage_get && 7394 - func_id != BPF_FUNC_inode_storage_delete) 7135 + func_id != BPF_FUNC_inode_storage_delete && 7136 + func_id != BPF_FUNC_kptr_xchg) 7395 7137 goto error; 7396 7138 break; 7397 7139 case BPF_MAP_TYPE_TASK_STORAGE: 7398 7140 if (func_id != BPF_FUNC_task_storage_get && 7399 - func_id != BPF_FUNC_task_storage_delete) 7141 + func_id != BPF_FUNC_task_storage_delete && 7142 + func_id != BPF_FUNC_kptr_xchg) 7400 7143 goto error; 7401 7144 break; 7402 7145 case BPF_MAP_TYPE_CGRP_STORAGE: 7403 7146 if (func_id != BPF_FUNC_cgrp_storage_get && 7404 - func_id != BPF_FUNC_cgrp_storage_delete) 7147 + func_id != BPF_FUNC_cgrp_storage_delete && 7148 + func_id != BPF_FUNC_kptr_xchg) 7405 7149 goto error; 7406 7150 break; 7407 7151 case BPF_MAP_TYPE_BLOOM_FILTER: ··· 7619 7355 7620 7356 /* Packet data might have moved, any old PTR_TO_PACKET[_META,_END] 7621 7357 * are now invalid, so turn them into unknown SCALAR_VALUE. 7358 + * 7359 + * This also applies to dynptr slices belonging to skb and xdp dynptrs, 7360 + * since these slices point to packet data. 
7622 7361 */ 7623 7362 static void clear_all_pkt_pointers(struct bpf_verifier_env *env) 7624 7363 { ··· 7629 7362 struct bpf_reg_state *reg; 7630 7363 7631 7364 bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ 7632 - if (reg_is_pkt_pointer_any(reg)) 7633 - __mark_reg_unknown(env, reg); 7365 + if (reg_is_pkt_pointer_any(reg) || reg_is_dynptr_slice_pkt(reg)) 7366 + mark_reg_invalid(env, reg); 7634 7367 })); 7635 7368 } 7636 7369 ··· 7675 7408 return err; 7676 7409 7677 7410 bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ 7678 - if (reg->ref_obj_id == ref_obj_id) { 7679 - if (!env->allow_ptr_leaks) 7680 - __mark_reg_not_init(env, reg); 7681 - else 7682 - __mark_reg_unknown(env, reg); 7683 - } 7411 + if (reg->ref_obj_id == ref_obj_id) 7412 + mark_reg_invalid(env, reg); 7684 7413 })); 7685 7414 7686 7415 return 0; ··· 7689 7426 7690 7427 bpf_for_each_reg_in_vstate(env->cur_state, unused, reg, ({ 7691 7428 if (type_is_non_owning_ref(reg->type)) 7692 - __mark_reg_unknown(env, reg); 7429 + mark_reg_invalid(env, reg); 7693 7430 })); 7694 7431 } 7695 7432 ··· 8460 8197 meta.func_id = func_id; 8461 8198 /* check args */ 8462 8199 for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) { 8463 - err = check_func_arg(env, i, &meta, fn); 8200 + err = check_func_arg(env, i, &meta, fn, insn_idx); 8464 8201 if (err) 8465 8202 return err; 8466 8203 } ··· 8484 8221 } 8485 8222 8486 8223 regs = cur_regs(env); 8487 - 8488 - /* This can only be set for PTR_TO_STACK, as CONST_PTR_TO_DYNPTR cannot 8489 - * be reinitialized by any dynptr helper. Hence, mark_stack_slots_dynptr 8490 - * is safe to do directly. 
8491 - */ 8492 - if (meta.uninit_dynptr_regno) { 8493 - if (regs[meta.uninit_dynptr_regno].type == CONST_PTR_TO_DYNPTR) { 8494 - verbose(env, "verifier internal error: CONST_PTR_TO_DYNPTR cannot be initialized\n"); 8495 - return -EFAULT; 8496 - } 8497 - /* we write BPF_DW bits (8 bytes) at a time */ 8498 - for (i = 0; i < BPF_DYNPTR_SIZE; i += 8) { 8499 - err = check_mem_access(env, insn_idx, meta.uninit_dynptr_regno, 8500 - i, BPF_DW, BPF_WRITE, -1, false); 8501 - if (err) 8502 - return err; 8503 - } 8504 - 8505 - err = mark_stack_slots_dynptr(env, &regs[meta.uninit_dynptr_regno], 8506 - fn->arg_type[meta.uninit_dynptr_regno - BPF_REG_1], 8507 - insn_idx); 8508 - if (err) 8509 - return err; 8510 - } 8511 8224 8512 8225 if (meta.release_regno) { 8513 8226 err = -EINVAL; ··· 8569 8330 } 8570 8331 break; 8571 8332 case BPF_FUNC_dynptr_data: 8572 - for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) { 8573 - if (arg_type_is_dynptr(fn->arg_type[i])) { 8574 - struct bpf_reg_state *reg = &regs[BPF_REG_1 + i]; 8575 - int id, ref_obj_id; 8333 + { 8334 + struct bpf_reg_state *reg; 8335 + int id, ref_obj_id; 8576 8336 8577 - if (meta.dynptr_id) { 8578 - verbose(env, "verifier internal error: meta.dynptr_id already set\n"); 8579 - return -EFAULT; 8580 - } 8337 + reg = get_dynptr_arg_reg(env, fn, regs); 8338 + if (!reg) 8339 + return -EFAULT; 8581 8340 8582 - if (meta.ref_obj_id) { 8583 - verbose(env, "verifier internal error: meta.ref_obj_id already set\n"); 8584 - return -EFAULT; 8585 - } 8586 8341 8587 - id = dynptr_id(env, reg); 8588 - if (id < 0) { 8589 - verbose(env, "verifier internal error: failed to obtain dynptr id\n"); 8590 - return id; 8591 - } 8592 - 8593 - ref_obj_id = dynptr_ref_obj_id(env, reg); 8594 - if (ref_obj_id < 0) { 8595 - verbose(env, "verifier internal error: failed to obtain dynptr ref_obj_id\n"); 8596 - return ref_obj_id; 8597 - } 8598 - 8599 - meta.dynptr_id = id; 8600 - meta.ref_obj_id = ref_obj_id; 8601 - break; 8602 - } 8603 - } 8604 - if (i == 
MAX_BPF_FUNC_REG_ARGS) { 8605 - verbose(env, "verifier internal error: no dynptr in bpf_dynptr_data()\n"); 8342 + if (meta.dynptr_id) { 8343 + verbose(env, "verifier internal error: meta.dynptr_id already set\n"); 8606 8344 return -EFAULT; 8607 8345 } 8346 + if (meta.ref_obj_id) { 8347 + verbose(env, "verifier internal error: meta.ref_obj_id already set\n"); 8348 + return -EFAULT; 8349 + } 8350 + 8351 + id = dynptr_id(env, reg); 8352 + if (id < 0) { 8353 + verbose(env, "verifier internal error: failed to obtain dynptr id\n"); 8354 + return id; 8355 + } 8356 + 8357 + ref_obj_id = dynptr_ref_obj_id(env, reg); 8358 + if (ref_obj_id < 0) { 8359 + verbose(env, "verifier internal error: failed to obtain dynptr ref_obj_id\n"); 8360 + return ref_obj_id; 8361 + } 8362 + 8363 + meta.dynptr_id = id; 8364 + meta.ref_obj_id = ref_obj_id; 8365 + 8608 8366 break; 8367 + } 8368 + case BPF_FUNC_dynptr_write: 8369 + { 8370 + enum bpf_dynptr_type dynptr_type; 8371 + struct bpf_reg_state *reg; 8372 + 8373 + reg = get_dynptr_arg_reg(env, fn, regs); 8374 + if (!reg) 8375 + return -EFAULT; 8376 + 8377 + dynptr_type = dynptr_get_type(env, reg); 8378 + if (dynptr_type == BPF_DYNPTR_TYPE_INVALID) 8379 + return -EFAULT; 8380 + 8381 + if (dynptr_type == BPF_DYNPTR_TYPE_SKB) 8382 + /* this will trigger clear_all_pkt_pointers(), which will 8383 + * invalidate all dynptr slices associated with the skb 8384 + */ 8385 + changes_data = true; 8386 + 8387 + break; 8388 + } 8609 8389 case BPF_FUNC_user_ringbuf_drain: 8610 8390 err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, 8611 8391 set_user_ringbuf_callback_state); ··· 8853 8595 } 8854 8596 } 8855 8597 8856 - struct bpf_kfunc_call_arg_meta { 8857 - /* In parameters */ 8858 - struct btf *btf; 8859 - u32 func_id; 8860 - u32 kfunc_flags; 8861 - const struct btf_type *func_proto; 8862 - const char *func_name; 8863 - /* Out parameters */ 8864 - u32 ref_obj_id; 8865 - u8 release_regno; 8866 - bool r0_rdonly; 8867 - u32 ret_btf_id; 8868 - 
u64 r0_size; 8869 - u32 subprogno; 8870 - struct { 8871 - u64 value; 8872 - bool found; 8873 - } arg_constant; 8874 - struct { 8875 - struct btf *btf; 8876 - u32 btf_id; 8877 - } arg_obj_drop; 8878 - struct { 8879 - struct btf_field *field; 8880 - } arg_list_head; 8881 - struct { 8882 - struct btf_field *field; 8883 - } arg_rbtree_root; 8884 - }; 8885 - 8886 8598 static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta) 8887 8599 { 8888 8600 return meta->kfunc_flags & KF_ACQUIRE; ··· 8924 8696 return __kfunc_param_match_suffix(btf, arg, "__sz"); 8925 8697 } 8926 8698 8699 + static bool is_kfunc_arg_const_mem_size(const struct btf *btf, 8700 + const struct btf_param *arg, 8701 + const struct bpf_reg_state *reg) 8702 + { 8703 + const struct btf_type *t; 8704 + 8705 + t = btf_type_skip_modifiers(btf, arg->type, NULL); 8706 + if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE) 8707 + return false; 8708 + 8709 + return __kfunc_param_match_suffix(btf, arg, "__szk"); 8710 + } 8711 + 8927 8712 static bool is_kfunc_arg_constant(const struct btf *btf, const struct btf_param *arg) 8928 8713 { 8929 8714 return __kfunc_param_match_suffix(btf, arg, "__k"); ··· 8950 8709 static bool is_kfunc_arg_alloc_obj(const struct btf *btf, const struct btf_param *arg) 8951 8710 { 8952 8711 return __kfunc_param_match_suffix(btf, arg, "__alloc"); 8712 + } 8713 + 8714 + static bool is_kfunc_arg_uninit(const struct btf *btf, const struct btf_param *arg) 8715 + { 8716 + return __kfunc_param_match_suffix(btf, arg, "__uninit"); 8953 8717 } 8954 8718 8955 8719 static bool is_kfunc_arg_scalar_with_name(const struct btf *btf, ··· 9123 8877 KF_bpf_rbtree_remove, 9124 8878 KF_bpf_rbtree_add, 9125 8879 KF_bpf_rbtree_first, 8880 + KF_bpf_dynptr_from_skb, 8881 + KF_bpf_dynptr_from_xdp, 8882 + KF_bpf_dynptr_slice, 8883 + KF_bpf_dynptr_slice_rdwr, 9126 8884 }; 9127 8885 9128 8886 BTF_SET_START(special_kfunc_set) ··· 9141 8891 BTF_ID(func, bpf_rbtree_remove) 9142 8892 BTF_ID(func, 
bpf_rbtree_add) 9143 8893 BTF_ID(func, bpf_rbtree_first) 8894 + BTF_ID(func, bpf_dynptr_from_skb) 8895 + BTF_ID(func, bpf_dynptr_from_xdp) 8896 + BTF_ID(func, bpf_dynptr_slice) 8897 + BTF_ID(func, bpf_dynptr_slice_rdwr) 9144 8898 BTF_SET_END(special_kfunc_set) 9145 8899 9146 8900 BTF_ID_LIST(special_kfunc_list) ··· 9161 8907 BTF_ID(func, bpf_rbtree_remove) 9162 8908 BTF_ID(func, bpf_rbtree_add) 9163 8909 BTF_ID(func, bpf_rbtree_first) 8910 + BTF_ID(func, bpf_dynptr_from_skb) 8911 + BTF_ID(func, bpf_dynptr_from_xdp) 8912 + BTF_ID(func, bpf_dynptr_slice) 8913 + BTF_ID(func, bpf_dynptr_slice_rdwr) 9164 8914 9165 8915 static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta) 9166 8916 { ··· 9244 8986 if (is_kfunc_arg_callback(env, meta->btf, &args[argno])) 9245 8987 return KF_ARG_PTR_TO_CALLBACK; 9246 8988 9247 - if (argno + 1 < nargs && is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1])) 8989 + 8990 + if (argno + 1 < nargs && 8991 + (is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]) || 8992 + is_kfunc_arg_const_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]))) 9248 8993 arg_mem_size = true; 9249 8994 9250 8995 /* This is the catch all argument type of register types supported by ··· 9467 9206 ptr = reg->map_ptr; 9468 9207 break; 9469 9208 case PTR_TO_BTF_ID | MEM_ALLOC: 9470 - case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED: 9471 9209 ptr = reg->btf; 9472 9210 break; 9473 9211 default: ··· 9715 9455 &meta->arg_rbtree_root.field); 9716 9456 } 9717 9457 9718 - static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta) 9458 + static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, 9459 + int insn_idx) 9719 9460 { 9720 9461 const char *func_name = meta->func_name, *ref_tname; 9721 9462 const struct btf *btf = meta->btf; ··· 9799 9538 return -EINVAL; 9800 9539 } 9801 9540 9802 - if (is_kfunc_trusted_args(meta) && 9541 + if 
((is_kfunc_trusted_args(meta) || is_kfunc_rcu(meta)) && 9803 9542 (register_is_null(reg) || type_may_be_null(reg->type))) { 9804 9543 verbose(env, "Possibly NULL pointer passed to trusted arg%d\n", i); 9805 9544 return -EACCES; ··· 9907 9646 return ret; 9908 9647 break; 9909 9648 case KF_ARG_PTR_TO_DYNPTR: 9649 + { 9650 + enum bpf_arg_type dynptr_arg_type = ARG_PTR_TO_DYNPTR; 9651 + 9910 9652 if (reg->type != PTR_TO_STACK && 9911 9653 reg->type != CONST_PTR_TO_DYNPTR) { 9912 9654 verbose(env, "arg#%d expected pointer to stack or dynptr_ptr\n", i); 9913 9655 return -EINVAL; 9914 9656 } 9915 9657 9916 - ret = process_dynptr_func(env, regno, ARG_PTR_TO_DYNPTR | MEM_RDONLY, NULL); 9658 + if (reg->type == CONST_PTR_TO_DYNPTR) 9659 + dynptr_arg_type |= MEM_RDONLY; 9660 + 9661 + if (is_kfunc_arg_uninit(btf, &args[i])) 9662 + dynptr_arg_type |= MEM_UNINIT; 9663 + 9664 + if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb]) 9665 + dynptr_arg_type |= DYNPTR_TYPE_SKB; 9666 + else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_xdp]) 9667 + dynptr_arg_type |= DYNPTR_TYPE_XDP; 9668 + 9669 + ret = process_dynptr_func(env, regno, insn_idx, dynptr_arg_type); 9917 9670 if (ret < 0) 9918 9671 return ret; 9672 + 9673 + if (!(dynptr_arg_type & MEM_UNINIT)) { 9674 + int id = dynptr_id(env, reg); 9675 + 9676 + if (id < 0) { 9677 + verbose(env, "verifier internal error: failed to obtain dynptr id\n"); 9678 + return id; 9679 + } 9680 + meta->initialized_dynptr.id = id; 9681 + meta->initialized_dynptr.type = dynptr_get_type(env, reg); 9682 + } 9683 + 9919 9684 break; 9685 + } 9920 9686 case KF_ARG_PTR_TO_LIST_HEAD: 9921 9687 if (reg->type != PTR_TO_MAP_VALUE && 9922 9688 reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) { ··· 10037 9749 return ret; 10038 9750 break; 10039 9751 case KF_ARG_PTR_TO_MEM_SIZE: 10040 - ret = check_kfunc_mem_size_reg(env, &regs[regno + 1], regno + 1); 9752 + { 9753 + struct bpf_reg_state *size_reg = &regs[regno + 1]; 9754 + const struct btf_param 
*size_arg = &args[i + 1]; 9755 + 9756 + ret = check_kfunc_mem_size_reg(env, size_reg, regno + 1); 10041 9757 if (ret < 0) { 10042 9758 verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1); 10043 9759 return ret; 10044 9760 } 10045 - /* Skip next '__sz' argument */ 9761 + 9762 + if (is_kfunc_arg_const_mem_size(meta->btf, size_arg, size_reg)) { 9763 + if (meta->arg_constant.found) { 9764 + verbose(env, "verifier internal error: only one constant argument permitted\n"); 9765 + return -EFAULT; 9766 + } 9767 + if (!tnum_is_const(size_reg->var_off)) { 9768 + verbose(env, "R%d must be a known constant\n", regno + 1); 9769 + return -EINVAL; 9770 + } 9771 + meta->arg_constant.found = true; 9772 + meta->arg_constant.value = size_reg->var_off.value; 9773 + } 9774 + 9775 + /* Skip next '__sz' or '__szk' argument */ 10046 9776 i++; 10047 9777 break; 9778 + } 10048 9779 case KF_ARG_PTR_TO_CALLBACK: 10049 9780 meta->subprogno = reg->subprogno; 10050 9781 break; ··· 10135 9828 10136 9829 rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta); 10137 9830 rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta); 10138 - if ((rcu_lock || rcu_unlock) && !env->rcu_tag_supported) { 10139 - verbose(env, "no vmlinux btf rcu tag support for kfunc %s\n", func_name); 10140 - return -EACCES; 10141 - } 10142 9831 10143 9832 if (env->cur_state->active_rcu_lock) { 10144 9833 struct bpf_func_state *state; ··· 10163 9860 } 10164 9861 10165 9862 /* Check the arguments */ 10166 - err = check_kfunc_args(env, &meta); 9863 + err = check_kfunc_args(env, &meta, insn_idx); 10167 9864 if (err < 0) 10168 9865 return err; 10169 9866 /* In case of release function, we get register number of refcounted ··· 10294 9991 regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_UNTRUSTED; 10295 9992 regs[BPF_REG_0].btf = desc_btf; 10296 9993 regs[BPF_REG_0].btf_id = meta.arg_constant.value; 9994 + } else if (meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice] || 9995 + meta.func_id == 
special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) { 9996 + enum bpf_type_flag type_flag = get_dynptr_type_flag(meta.initialized_dynptr.type); 9997 + 9998 + mark_reg_known_zero(env, regs, BPF_REG_0); 9999 + 10000 + if (!meta.arg_constant.found) { 10001 + verbose(env, "verifier internal error: bpf_dynptr_slice(_rdwr) no constant size\n"); 10002 + return -EFAULT; 10003 + } 10004 + 10005 + regs[BPF_REG_0].mem_size = meta.arg_constant.value; 10006 + 10007 + /* PTR_MAYBE_NULL will be added when is_kfunc_ret_null is checked */ 10008 + regs[BPF_REG_0].type = PTR_TO_MEM | type_flag; 10009 + 10010 + if (meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice]) { 10011 + regs[BPF_REG_0].type |= MEM_RDONLY; 10012 + } else { 10013 + /* this will set env->seen_direct_write to true */ 10014 + if (!may_access_direct_pkt_data(env, NULL, BPF_WRITE)) { 10015 + verbose(env, "the prog does not allow writes to packet data\n"); 10016 + return -EINVAL; 10017 + } 10018 + } 10019 + 10020 + if (!meta.initialized_dynptr.id) { 10021 + verbose(env, "verifier internal error: no dynptr id\n"); 10022 + return -EFAULT; 10023 + } 10024 + regs[BPF_REG_0].dynptr_id = meta.initialized_dynptr.id; 10025 + 10026 + /* we don't need to set BPF_REG_0's ref obj id 10027 + * because packet slices are not refcounted (see 10028 + * dynptr_type_refcounted) 10029 + */ 10297 10030 } else { 10298 10031 verbose(env, "kernel function %s unhandled dynamic return type\n", 10299 10032 meta.func_name); 10300 10033 return -EFAULT; 10301 10034 } 10302 10035 } else if (!__btf_type_is_struct(ptr_type)) { 10036 + if (!meta.r0_size) { 10037 + __u32 sz; 10038 + 10039 + if (!IS_ERR(btf_resolve_size(desc_btf, ptr_type, &sz))) { 10040 + meta.r0_size = sz; 10041 + meta.r0_rdonly = true; 10042 + } 10043 + } 10303 10044 if (!meta.r0_size) { 10304 10045 ptr_type_name = btf_name_by_offset(desc_btf, 10305 10046 ptr_type->name_off); ··· 13499 13152 */ 13500 13153 static int visit_insn(int t, struct bpf_verifier_env *env) 13501 13154 { 13502 - 
struct bpf_insn *insns = env->prog->insnsi; 13155 + struct bpf_insn *insns = env->prog->insnsi, *insn = &insns[t]; 13503 13156 int ret; 13504 13157 13505 - if (bpf_pseudo_func(insns + t)) 13158 + if (bpf_pseudo_func(insn)) 13506 13159 return visit_func_call_insn(t, insns, env, true); 13507 13160 13508 13161 /* All non-branch instructions have a single fall-through edge. */ 13509 - if (BPF_CLASS(insns[t].code) != BPF_JMP && 13510 - BPF_CLASS(insns[t].code) != BPF_JMP32) 13162 + if (BPF_CLASS(insn->code) != BPF_JMP && 13163 + BPF_CLASS(insn->code) != BPF_JMP32) 13511 13164 return push_insn(t, t + 1, FALLTHROUGH, env, false); 13512 13165 13513 - switch (BPF_OP(insns[t].code)) { 13166 + switch (BPF_OP(insn->code)) { 13514 13167 case BPF_EXIT: 13515 13168 return DONE_EXPLORING; 13516 13169 13517 13170 case BPF_CALL: 13518 - if (insns[t].imm == BPF_FUNC_timer_set_callback) 13171 + if (insn->src_reg == 0 && insn->imm == BPF_FUNC_timer_set_callback) 13519 13172 /* Mark this call insn as a prune point to trigger 13520 13173 * is_state_visited() check before call itself is 13521 13174 * processed by __check_func_call(). Otherwise new 13522 13175 * async state will be pushed for further exploration. 
13523 13176 */ 13524 13177 mark_prune_point(env, t); 13525 - return visit_func_call_insn(t, insns, env, 13526 - insns[t].src_reg == BPF_PSEUDO_CALL); 13178 + return visit_func_call_insn(t, insns, env, insn->src_reg == BPF_PSEUDO_CALL); 13527 13179 13528 13180 case BPF_JA: 13529 - if (BPF_SRC(insns[t].code) != BPF_K) 13181 + if (BPF_SRC(insn->code) != BPF_K) 13530 13182 return -EINVAL; 13531 13183 13532 13184 /* unconditional jump with single edge */ 13533 - ret = push_insn(t, t + insns[t].off + 1, FALLTHROUGH, env, 13185 + ret = push_insn(t, t + insn->off + 1, FALLTHROUGH, env, 13534 13186 true); 13535 13187 if (ret) 13536 13188 return ret; 13537 13189 13538 - mark_prune_point(env, t + insns[t].off + 1); 13539 - mark_jmp_point(env, t + insns[t].off + 1); 13190 + mark_prune_point(env, t + insn->off + 1); 13191 + mark_jmp_point(env, t + insn->off + 1); 13540 13192 13541 13193 return ret; 13542 13194 ··· 13547 13201 if (ret) 13548 13202 return ret; 13549 13203 13550 - return push_insn(t, t + insns[t].off + 1, BRANCH, env, true); 13204 + return push_insn(t, t + insn->off + 1, BRANCH, env, true); 13551 13205 } 13552 13206 } 13553 13207 ··· 14223 13877 tnum_in(rold->var_off, rcur->var_off); 14224 13878 case PTR_TO_MAP_KEY: 14225 13879 case PTR_TO_MAP_VALUE: 13880 + case PTR_TO_MEM: 13881 + case PTR_TO_BUF: 13882 + case PTR_TO_TP_BUFFER: 14226 13883 /* If the new min/max/var_off satisfy the old ones and 14227 13884 * everything else matches, we are OK. 
14228 13885 */ 14229 13886 return memcmp(rold, rcur, offsetof(struct bpf_reg_state, var_off)) == 0 && 14230 13887 range_within(rold, rcur) && 14231 13888 tnum_in(rold->var_off, rcur->var_off) && 14232 - check_ids(rold->id, rcur->id, idmap); 13889 + check_ids(rold->id, rcur->id, idmap) && 13890 + check_ids(rold->ref_obj_id, rcur->ref_obj_id, idmap); 14233 13891 case PTR_TO_PACKET_META: 14234 13892 case PTR_TO_PACKET: 14235 13893 /* We must have at least as much range as the old ptr ··· 14284 13934 } 14285 13935 14286 13936 if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_INVALID) 13937 + continue; 13938 + 13939 + if (env->allow_uninit_stack && 13940 + old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC) 14287 13941 continue; 14288 13942 14289 13943 /* explored stack has more populated slots than current stack ··· 14665 14311 * This threshold shouldn't be too high either, since states 14666 14312 * at the end of the loop are likely to be useful in pruning. 14667 14313 */ 14668 - if (env->jmps_processed - env->prev_jmps_processed < 20 && 14314 + if (!env->test_state_freq && 14315 + env->jmps_processed - env->prev_jmps_processed < 20 && 14669 14316 env->insn_processed - env->prev_insn_processed < 100) 14670 14317 add_new_state = false; 14671 14318 goto miss; ··· 14855 14500 !reg_type_mismatch_ok(prev)); 14856 14501 } 14857 14502 14503 + static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type, 14504 + bool allow_trust_missmatch) 14505 + { 14506 + enum bpf_reg_type *prev_type = &env->insn_aux_data[env->insn_idx].ptr_type; 14507 + 14508 + if (*prev_type == NOT_INIT) { 14509 + /* Saw a valid insn 14510 + * dst_reg = *(u32 *)(src_reg + off) 14511 + * save type to validate intersecting paths 14512 + */ 14513 + *prev_type = type; 14514 + } else if (reg_type_mismatch(type, *prev_type)) { 14515 + /* Abuser program is trying to use the same insn 14516 + * dst_reg = *(u32*) (src_reg + off) 14517 + * with different pointer types: 14518 + * 
src_reg == ctx in one branch and 14519 + * src_reg == stack|map in some other branch. 14520 + * Reject it. 14521 + */ 14522 + if (allow_trust_missmatch && 14523 + base_type(type) == PTR_TO_BTF_ID && 14524 + base_type(*prev_type) == PTR_TO_BTF_ID) { 14525 + /* 14526 + * Have to support a use case when one path through 14527 + * the program yields TRUSTED pointer while another 14528 + * is UNTRUSTED. Fallback to UNTRUSTED to generate 14529 + * BPF_PROBE_MEM. 14530 + */ 14531 + *prev_type = PTR_TO_BTF_ID | PTR_UNTRUSTED; 14532 + } else { 14533 + verbose(env, "same insn cannot be used with different pointers\n"); 14534 + return -EINVAL; 14535 + } 14536 + } 14537 + 14538 + return 0; 14539 + } 14540 + 14858 14541 static int do_check(struct bpf_verifier_env *env) 14859 14542 { 14860 14543 bool pop_log = !(env->log.level & BPF_LOG_LEVEL2); ··· 15002 14609 return err; 15003 14610 15004 14611 } else if (class == BPF_LDX) { 15005 - enum bpf_reg_type *prev_src_type, src_reg_type; 14612 + enum bpf_reg_type src_reg_type; 15006 14613 15007 14614 /* check for reserved fields is already done */ 15008 14615 ··· 15026 14633 if (err) 15027 14634 return err; 15028 14635 15029 - prev_src_type = &env->insn_aux_data[env->insn_idx].ptr_type; 15030 - 15031 - if (*prev_src_type == NOT_INIT) { 15032 - /* saw a valid insn 15033 - * dst_reg = *(u32 *)(src_reg + off) 15034 - * save type to validate intersecting paths 15035 - */ 15036 - *prev_src_type = src_reg_type; 15037 - 15038 - } else if (reg_type_mismatch(src_reg_type, *prev_src_type)) { 15039 - /* ABuser program is trying to use the same insn 15040 - * dst_reg = *(u32*) (src_reg + off) 15041 - * with different pointer types: 15042 - * src_reg == ctx in one branch and 15043 - * src_reg == stack|map in some other branch. 15044 - * Reject it. 
15045 - */ 15046 - verbose(env, "same insn cannot be used with different pointers\n"); 15047 - return -EINVAL; 15048 - } 15049 - 14636 + err = save_aux_ptr_type(env, src_reg_type, true); 14637 + if (err) 14638 + return err; 15050 14639 } else if (class == BPF_STX) { 15051 - enum bpf_reg_type *prev_dst_type, dst_reg_type; 14640 + enum bpf_reg_type dst_reg_type; 15052 14641 15053 14642 if (BPF_MODE(insn->code) == BPF_ATOMIC) { 15054 14643 err = check_atomic(env, env->insn_idx, insn); ··· 15063 14688 if (err) 15064 14689 return err; 15065 14690 15066 - prev_dst_type = &env->insn_aux_data[env->insn_idx].ptr_type; 15067 - 15068 - if (*prev_dst_type == NOT_INIT) { 15069 - *prev_dst_type = dst_reg_type; 15070 - } else if (reg_type_mismatch(dst_reg_type, *prev_dst_type)) { 15071 - verbose(env, "same insn cannot be used with different pointers\n"); 15072 - return -EINVAL; 15073 - } 15074 - 14691 + err = save_aux_ptr_type(env, dst_reg_type, false); 14692 + if (err) 14693 + return err; 15075 14694 } else if (class == BPF_ST) { 14695 + enum bpf_reg_type dst_reg_type; 14696 + 15076 14697 if (BPF_MODE(insn->code) != BPF_MEM || 15077 14698 insn->src_reg != BPF_REG_0) { 15078 14699 verbose(env, "BPF_ST uses reserved fields\n"); ··· 15079 14708 if (err) 15080 14709 return err; 15081 14710 15082 - if (is_ctx_reg(env, insn->dst_reg)) { 15083 - verbose(env, "BPF_ST stores into R%d %s is not allowed\n", 15084 - insn->dst_reg, 15085 - reg_type_str(env, reg_state(env, insn->dst_reg)->type)); 15086 - return -EACCES; 15087 - } 14711 + dst_reg_type = regs[insn->dst_reg].type; 15088 14712 15089 14713 /* check that memory (dst_reg + off) is writeable */ 15090 14714 err = check_mem_access(env, env->insn_idx, insn->dst_reg, ··· 15088 14722 if (err) 15089 14723 return err; 15090 14724 14725 + err = save_aux_ptr_type(env, dst_reg_type, false); 14726 + if (err) 14727 + return err; 15091 14728 } else if (class == BPF_JMP || class == BPF_JMP32) { 15092 14729 u8 opcode = BPF_OP(insn->code); 15093 
14730 ··· 15125 14756 err = check_helper_call(env, insn, &env->insn_idx); 15126 14757 if (err) 15127 14758 return err; 14759 + 14760 + mark_reg_scratched(env, BPF_REG_0); 15128 14761 } else if (opcode == BPF_JA) { 15129 14762 if (BPF_SRC(insn->code) != BPF_K || 15130 14763 insn->imm != 0 || ··· 16201 15830 16202 15831 for (i = 0; i < insn_cnt; i++, insn++) { 16203 15832 bpf_convert_ctx_access_t convert_ctx_access; 16204 - bool ctx_access; 16205 15833 16206 15834 if (insn->code == (BPF_LDX | BPF_MEM | BPF_B) || 16207 15835 insn->code == (BPF_LDX | BPF_MEM | BPF_H) || 16208 15836 insn->code == (BPF_LDX | BPF_MEM | BPF_W) || 16209 15837 insn->code == (BPF_LDX | BPF_MEM | BPF_DW)) { 16210 15838 type = BPF_READ; 16211 - ctx_access = true; 16212 15839 } else if (insn->code == (BPF_STX | BPF_MEM | BPF_B) || 16213 15840 insn->code == (BPF_STX | BPF_MEM | BPF_H) || 16214 15841 insn->code == (BPF_STX | BPF_MEM | BPF_W) || ··· 16216 15847 insn->code == (BPF_ST | BPF_MEM | BPF_W) || 16217 15848 insn->code == (BPF_ST | BPF_MEM | BPF_DW)) { 16218 15849 type = BPF_WRITE; 16219 - ctx_access = BPF_CLASS(insn->code) == BPF_STX; 16220 15850 } else { 16221 15851 continue; 16222 15852 } ··· 16237 15869 insn = new_prog->insnsi + i + delta; 16238 15870 continue; 16239 15871 } 16240 - 16241 - if (!ctx_access) 16242 - continue; 16243 15872 16244 15873 switch ((int)env->insn_aux_data[i + delta].ptr_type) { 16245 15874 case PTR_TO_CTX: ··· 16686 16321 desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) { 16687 16322 insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1); 16688 16323 *cnt = 1; 16324 + } else if (desc->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb]) { 16325 + bool seen_direct_write = env->seen_direct_write; 16326 + bool is_rdonly = !may_access_direct_pkt_data(env, NULL, BPF_WRITE); 16327 + 16328 + if (is_rdonly) 16329 + insn->imm = BPF_CALL_IMM(bpf_dynptr_from_skb_rdonly); 16330 + 16331 + /* restore env->seen_direct_write to its original value, since 16332 + * 
may_access_direct_pkt_data mutates it 16333 + */ 16334 + env->seen_direct_write = seen_direct_write; 16689 16335 } 16690 16336 return 0; 16691 16337 } ··· 18088 17712 env->bypass_spec_v1 = bpf_bypass_spec_v1(); 18089 17713 env->bypass_spec_v4 = bpf_bypass_spec_v4(); 18090 17714 env->bpf_capable = bpf_capable(); 18091 - env->rcu_tag_supported = btf_vmlinux && 18092 - btf_find_by_name_kind(btf_vmlinux, "rcu", BTF_KIND_TYPE_TAG) > 0; 18093 17715 18094 17716 if (is_priv) 18095 17717 env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
-4
kernel/trace/bpf_trace.c
··· 1453 1453 NULL : &bpf_probe_read_compat_str_proto; 1454 1454 #endif 1455 1455 #ifdef CONFIG_CGROUPS 1456 - case BPF_FUNC_get_current_cgroup_id: 1457 - return &bpf_get_current_cgroup_id_proto; 1458 - case BPF_FUNC_get_current_ancestor_cgroup_id: 1459 - return &bpf_get_current_ancestor_cgroup_id_proto; 1460 1456 case BPF_FUNC_cgrp_storage_get: 1461 1457 return &bpf_cgrp_storage_get_proto; 1462 1458 case BPF_FUNC_cgrp_storage_delete:
+2 -1
net/bpf/test_run.c
··· 737 737 738 738 __bpf_kfunc void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p) 739 739 { 740 + /* p != NULL, but p->cnt could be 0 */ 740 741 } 741 742 742 743 __bpf_kfunc void bpf_kfunc_call_test_destructive(void) ··· 785 784 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1) 786 785 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1) 787 786 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2) 788 - BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS) 787 + BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU) 789 788 BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE) 790 789 BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg) 791 790 BTF_SET8_END(test_sk_check_kfunc_ids)
+148 -45
net/core/filter.c
··· 1721 1721 .arg5_type = ARG_ANYTHING, 1722 1722 }; 1723 1723 1724 + int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, 1725 + u32 len, u64 flags) 1726 + { 1727 + return ____bpf_skb_store_bytes(skb, offset, from, len, flags); 1728 + } 1729 + 1724 1730 BPF_CALL_4(bpf_skb_load_bytes, const struct sk_buff *, skb, u32, offset, 1725 1731 void *, to, u32, len) 1726 1732 { ··· 1756 1750 .arg3_type = ARG_PTR_TO_UNINIT_MEM, 1757 1751 .arg4_type = ARG_CONST_SIZE, 1758 1752 }; 1753 + 1754 + int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len) 1755 + { 1756 + return ____bpf_skb_load_bytes(skb, offset, to, len); 1757 + } 1759 1758 1760 1759 BPF_CALL_4(bpf_flow_dissector_load_bytes, 1761 1760 const struct bpf_flow_dissector *, ctx, u32, offset, ··· 3839 3828 .arg3_type = ARG_ANYTHING, 3840 3829 }; 3841 3830 3842 - BPF_CALL_1(bpf_xdp_get_buff_len, struct xdp_buff*, xdp) 3831 + BPF_CALL_1(bpf_xdp_get_buff_len, struct xdp_buff*, xdp) 3843 3832 { 3844 3833 return xdp_get_buff_len(xdp); 3845 3834 } ··· 3894 3883 .arg2_type = ARG_ANYTHING, 3895 3884 }; 3896 3885 3897 - static void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, 3898 - void *buf, unsigned long len, bool flush) 3886 + void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, 3887 + void *buf, unsigned long len, bool flush) 3899 3888 { 3900 3889 unsigned long ptr_len, ptr_off = 0; 3901 3890 skb_frag_t *next_frag, *end_frag; ··· 3941 3930 } 3942 3931 } 3943 3932 3944 - static void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) 3933 + void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) 3945 3934 { 3946 3935 struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); 3947 3936 u32 size = xdp->data_end - xdp->data; ··· 3999 3988 .arg4_type = ARG_CONST_SIZE, 4000 3989 }; 4001 3990 3991 + int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len) 3992 + { 3993 + return ____bpf_xdp_load_bytes(xdp, 
offset, buf, len); 3994 + } 3995 + 4002 3996 BPF_CALL_4(bpf_xdp_store_bytes, struct xdp_buff *, xdp, u32, offset, 4003 3997 void *, buf, u32, len) 4004 3998 { ··· 4030 4014 .arg3_type = ARG_PTR_TO_UNINIT_MEM, 4031 4015 .arg4_type = ARG_CONST_SIZE, 4032 4016 }; 4017 + 4018 + int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len) 4019 + { 4020 + return ____bpf_xdp_store_bytes(xdp, offset, buf, len); 4021 + } 4033 4022 4034 4023 static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset) 4035 4024 { ··· 8165 8144 return &bpf_sk_storage_delete_proto; 8166 8145 case BPF_FUNC_get_netns_cookie: 8167 8146 return &bpf_get_netns_cookie_sk_msg_proto; 8168 - #ifdef CONFIG_CGROUPS 8169 - case BPF_FUNC_get_current_cgroup_id: 8170 - return &bpf_get_current_cgroup_id_proto; 8171 - case BPF_FUNC_get_current_ancestor_cgroup_id: 8172 - return &bpf_get_current_ancestor_cgroup_id_proto; 8173 - #endif 8174 8147 #ifdef CONFIG_CGROUP_NET_CLASSID 8175 8148 case BPF_FUNC_get_cgroup_classid: 8176 8149 return &bpf_get_cgroup_classid_curr_proto; ··· 9279 9264 #endif 9280 9265 9281 9266 /* <store>: skb->tstamp = tstamp */ 9282 - *insn++ = BPF_STX_MEM(BPF_DW, skb_reg, value_reg, 9283 - offsetof(struct sk_buff, tstamp)); 9267 + *insn++ = BPF_RAW_INSN(BPF_CLASS(si->code) | BPF_DW | BPF_MEM, 9268 + skb_reg, value_reg, offsetof(struct sk_buff, tstamp), si->imm); 9284 9269 return insn; 9285 9270 } 9271 + 9272 + #define BPF_EMIT_STORE(size, si, off) \ 9273 + BPF_RAW_INSN(BPF_CLASS((si)->code) | (size) | BPF_MEM, \ 9274 + (si)->dst_reg, (si)->src_reg, (off), (si)->imm) 9286 9275 9287 9276 static u32 bpf_convert_ctx_access(enum bpf_access_type type, 9288 9277 const struct bpf_insn *si, ··· 9317 9298 9318 9299 case offsetof(struct __sk_buff, priority): 9319 9300 if (type == BPF_WRITE) 9320 - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, 9321 - bpf_target_off(struct sk_buff, priority, 4, 9322 - target_size)); 9301 + *insn++ = BPF_EMIT_STORE(BPF_W, si, 9302 + 
bpf_target_off(struct sk_buff, priority, 4, 9303 + target_size)); 9323 9304 else 9324 9305 *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, 9325 9306 bpf_target_off(struct sk_buff, priority, 4, ··· 9350 9331 9351 9332 case offsetof(struct __sk_buff, mark): 9352 9333 if (type == BPF_WRITE) 9353 - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, 9354 - bpf_target_off(struct sk_buff, mark, 4, 9355 - target_size)); 9334 + *insn++ = BPF_EMIT_STORE(BPF_W, si, 9335 + bpf_target_off(struct sk_buff, mark, 4, 9336 + target_size)); 9356 9337 else 9357 9338 *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, 9358 9339 bpf_target_off(struct sk_buff, mark, 4, ··· 9371 9352 9372 9353 case offsetof(struct __sk_buff, queue_mapping): 9373 9354 if (type == BPF_WRITE) { 9374 - *insn++ = BPF_JMP_IMM(BPF_JGE, si->src_reg, NO_QUEUE_MAPPING, 1); 9375 - *insn++ = BPF_STX_MEM(BPF_H, si->dst_reg, si->src_reg, 9376 - bpf_target_off(struct sk_buff, 9377 - queue_mapping, 9378 - 2, target_size)); 9355 + u32 off = bpf_target_off(struct sk_buff, queue_mapping, 2, target_size); 9356 + 9357 + if (BPF_CLASS(si->code) == BPF_ST && si->imm >= NO_QUEUE_MAPPING) { 9358 + *insn++ = BPF_JMP_A(0); /* noop */ 9359 + break; 9360 + } 9361 + 9362 + if (BPF_CLASS(si->code) == BPF_STX) 9363 + *insn++ = BPF_JMP_IMM(BPF_JGE, si->src_reg, NO_QUEUE_MAPPING, 1); 9364 + *insn++ = BPF_EMIT_STORE(BPF_H, si, off); 9379 9365 } else { 9380 9366 *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg, 9381 9367 bpf_target_off(struct sk_buff, ··· 9416 9392 off += offsetof(struct sk_buff, cb); 9417 9393 off += offsetof(struct qdisc_skb_cb, data); 9418 9394 if (type == BPF_WRITE) 9419 - *insn++ = BPF_STX_MEM(BPF_SIZE(si->code), si->dst_reg, 9420 - si->src_reg, off); 9395 + *insn++ = BPF_EMIT_STORE(BPF_SIZE(si->code), si, off); 9421 9396 else 9422 9397 *insn++ = BPF_LDX_MEM(BPF_SIZE(si->code), si->dst_reg, 9423 9398 si->src_reg, off); ··· 9431 9408 off += offsetof(struct qdisc_skb_cb, tc_classid); 9432 9409 
*target_size = 2; 9433 9410 if (type == BPF_WRITE) 9434 - *insn++ = BPF_STX_MEM(BPF_H, si->dst_reg, 9435 - si->src_reg, off); 9411 + *insn++ = BPF_EMIT_STORE(BPF_H, si, off); 9436 9412 else 9437 9413 *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, 9438 9414 si->src_reg, off); ··· 9464 9442 case offsetof(struct __sk_buff, tc_index): 9465 9443 #ifdef CONFIG_NET_SCHED 9466 9444 if (type == BPF_WRITE) 9467 - *insn++ = BPF_STX_MEM(BPF_H, si->dst_reg, si->src_reg, 9468 - bpf_target_off(struct sk_buff, tc_index, 2, 9469 - target_size)); 9445 + *insn++ = BPF_EMIT_STORE(BPF_H, si, 9446 + bpf_target_off(struct sk_buff, tc_index, 2, 9447 + target_size)); 9470 9448 else 9471 9449 *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg, 9472 9450 bpf_target_off(struct sk_buff, tc_index, 2, ··· 9667 9645 BUILD_BUG_ON(sizeof_field(struct sock, sk_bound_dev_if) != 4); 9668 9646 9669 9647 if (type == BPF_WRITE) 9670 - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, 9671 - offsetof(struct sock, sk_bound_dev_if)); 9648 + *insn++ = BPF_EMIT_STORE(BPF_W, si, 9649 + offsetof(struct sock, sk_bound_dev_if)); 9672 9650 else 9673 9651 *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, 9674 9652 offsetof(struct sock, sk_bound_dev_if)); ··· 9678 9656 BUILD_BUG_ON(sizeof_field(struct sock, sk_mark) != 4); 9679 9657 9680 9658 if (type == BPF_WRITE) 9681 - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, 9682 - offsetof(struct sock, sk_mark)); 9659 + *insn++ = BPF_EMIT_STORE(BPF_W, si, 9660 + offsetof(struct sock, sk_mark)); 9683 9661 else 9684 9662 *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, 9685 9663 offsetof(struct sock, sk_mark)); ··· 9689 9667 BUILD_BUG_ON(sizeof_field(struct sock, sk_priority) != 4); 9690 9668 9691 9669 if (type == BPF_WRITE) 9692 - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, 9693 - offsetof(struct sock, sk_priority)); 9670 + *insn++ = BPF_EMIT_STORE(BPF_W, si, 9671 + offsetof(struct sock, sk_priority)); 9694 9672 else 9695 9673 *insn++ = 
BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, 9696 9674 offsetof(struct sock, sk_priority)); ··· 9955 9933 offsetof(S, TF)); \ 9956 9934 *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(S, F), tmp_reg, \ 9957 9935 si->dst_reg, offsetof(S, F)); \ 9958 - *insn++ = BPF_STX_MEM(SIZE, tmp_reg, si->src_reg, \ 9936 + *insn++ = BPF_RAW_INSN(SIZE | BPF_MEM | BPF_CLASS(si->code), \ 9937 + tmp_reg, si->src_reg, \ 9959 9938 bpf_target_off(NS, NF, sizeof_field(NS, NF), \ 9960 9939 target_size) \ 9961 - + OFF); \ 9940 + + OFF, \ 9941 + si->imm); \ 9962 9942 *insn++ = BPF_LDX_MEM(BPF_DW, tmp_reg, si->dst_reg, \ 9963 9943 offsetof(S, TF)); \ 9964 9944 } while (0) ··· 10195 10171 struct bpf_sock_ops_kern, sk),\ 10196 10172 reg, si->dst_reg, \ 10197 10173 offsetof(struct bpf_sock_ops_kern, sk));\ 10198 - *insn++ = BPF_STX_MEM(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD), \ 10199 - reg, si->src_reg, \ 10200 - offsetof(OBJ, OBJ_FIELD)); \ 10174 + *insn++ = BPF_RAW_INSN(BPF_FIELD_SIZEOF(OBJ, OBJ_FIELD) | \ 10175 + BPF_MEM | BPF_CLASS(si->code), \ 10176 + reg, si->src_reg, \ 10177 + offsetof(OBJ, OBJ_FIELD), \ 10178 + si->imm); \ 10201 10179 *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->dst_reg, \ 10202 10180 offsetof(struct bpf_sock_ops_kern, \ 10203 10181 temp)); \ ··· 10231 10205 off -= offsetof(struct bpf_sock_ops, replylong[0]); 10232 10206 off += offsetof(struct bpf_sock_ops_kern, replylong[0]); 10233 10207 if (type == BPF_WRITE) 10234 - *insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg, 10235 - off); 10208 + *insn++ = BPF_EMIT_STORE(BPF_W, si, off); 10236 10209 else 10237 10210 *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg, 10238 10211 off); ··· 10588 10563 off += offsetof(struct sk_buff, cb); 10589 10564 off += offsetof(struct sk_skb_cb, data); 10590 10565 if (type == BPF_WRITE) 10591 - *insn++ = BPF_STX_MEM(BPF_SIZE(si->code), si->dst_reg, 10592 - si->src_reg, off); 10566 + *insn++ = BPF_EMIT_STORE(BPF_SIZE(si->code), si, off); 10593 10567 else 10594 10568 *insn++ = 
BPF_LDX_MEM(BPF_SIZE(si->code), si->dst_reg, 10595 10569 si->src_reg, off); ··· 11645 11621 11646 11622 return func; 11647 11623 } 11624 + 11625 + __diag_push(); 11626 + __diag_ignore_all("-Wmissing-prototypes", 11627 + "Global functions as their definitions will be in vmlinux BTF"); 11628 + __bpf_kfunc int bpf_dynptr_from_skb(struct sk_buff *skb, u64 flags, 11629 + struct bpf_dynptr_kern *ptr__uninit) 11630 + { 11631 + if (flags) { 11632 + bpf_dynptr_set_null(ptr__uninit); 11633 + return -EINVAL; 11634 + } 11635 + 11636 + bpf_dynptr_init(ptr__uninit, skb, BPF_DYNPTR_TYPE_SKB, 0, skb->len); 11637 + 11638 + return 0; 11639 + } 11640 + 11641 + __bpf_kfunc int bpf_dynptr_from_xdp(struct xdp_buff *xdp, u64 flags, 11642 + struct bpf_dynptr_kern *ptr__uninit) 11643 + { 11644 + if (flags) { 11645 + bpf_dynptr_set_null(ptr__uninit); 11646 + return -EINVAL; 11647 + } 11648 + 11649 + bpf_dynptr_init(ptr__uninit, xdp, BPF_DYNPTR_TYPE_XDP, 0, xdp_get_buff_len(xdp)); 11650 + 11651 + return 0; 11652 + } 11653 + __diag_pop(); 11654 + 11655 + int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags, 11656 + struct bpf_dynptr_kern *ptr__uninit) 11657 + { 11658 + int err; 11659 + 11660 + err = bpf_dynptr_from_skb(skb, flags, ptr__uninit); 11661 + if (err) 11662 + return err; 11663 + 11664 + bpf_dynptr_set_rdonly(ptr__uninit); 11665 + 11666 + return 0; 11667 + } 11668 + 11669 + BTF_SET8_START(bpf_kfunc_check_set_skb) 11670 + BTF_ID_FLAGS(func, bpf_dynptr_from_skb) 11671 + BTF_SET8_END(bpf_kfunc_check_set_skb) 11672 + 11673 + BTF_SET8_START(bpf_kfunc_check_set_xdp) 11674 + BTF_ID_FLAGS(func, bpf_dynptr_from_xdp) 11675 + BTF_SET8_END(bpf_kfunc_check_set_xdp) 11676 + 11677 + static const struct btf_kfunc_id_set bpf_kfunc_set_skb = { 11678 + .owner = THIS_MODULE, 11679 + .set = &bpf_kfunc_check_set_skb, 11680 + }; 11681 + 11682 + static const struct btf_kfunc_id_set bpf_kfunc_set_xdp = { 11683 + .owner = THIS_MODULE, 11684 + .set = &bpf_kfunc_check_set_xdp, 11685 + }; 11686 + 11687 
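The convert_ctx_access() hunks above replace direct BPF_STX_MEM emission with the new BPF_EMIT_STORE macro, so that context writes are rewritten correctly whether the original instruction was a register store (BPF_STX) or an immediate store (BPF_ST, which the upcoming clang -mcpu=v4 flag starts emitting). The following is a minimal userspace sketch of that class-preserving rewrite; the struct and macro here are simplified stand-ins for the kernel's bpf_insn encoding, not the real filter.c implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the uapi instruction encoding; values mirror
 * include/uapi/linux/bpf.h but this is an illustrative userspace model. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_ST   0x02
#define BPF_STX  0x03
#define BPF_MEM  0x60
#define BPF_W    0x00
#define BPF_H    0x08

struct insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
	int16_t off;
	int32_t imm;
};

/* Model of what BPF_EMIT_STORE does: rewrite a context store while keeping
 * the original instruction class, so a BPF_ST keeps its immediate and a
 * BPF_STX keeps its source register. */
static struct insn emit_store(uint8_t size, const struct insn *si, int16_t off)
{
	struct insn out = {
		.code    = (uint8_t)(size | BPF_MEM | BPF_CLASS(si->code)),
		.dst_reg = si->dst_reg,
		.src_reg = si->src_reg,
		.off     = off,
		/* imm only matters for BPF_ST; copying it for BPF_STX is harmless */
		.imm     = si->imm,
	};
	return out;
}
```

This is why the queue_mapping hunk can special-case `BPF_CLASS(si->code) == BPF_ST`: with an immediate store the verifier already knows the value at rewrite time, so the range check can be resolved into a noop instead of an emitted conditional jump.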
+ static int __init bpf_kfunc_init(void) 11688 + { 11689 + int ret; 11690 + 11691 + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_skb); 11692 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_ACT, &bpf_kfunc_set_skb); 11693 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SK_SKB, &bpf_kfunc_set_skb); 11694 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCKET_FILTER, &bpf_kfunc_set_skb); 11695 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_skb); 11696 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_OUT, &bpf_kfunc_set_skb); 11697 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_IN, &bpf_kfunc_set_skb); 11698 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_XMIT, &bpf_kfunc_set_skb); 11699 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_SEG6LOCAL, &bpf_kfunc_set_skb); 11700 + return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp); 11701 + } 11702 + late_initcall(bpf_kfunc_init);
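The new skb/XDP dynptr kfuncs above share a simple contract: any non-zero flags value nulls the dynptr and returns -EINVAL, and the read-only skb variant reuses the same init path before marking the dynptr read-only. A userspace model of that contract follows; the struct layout and helper names are illustrative stand-ins, not the kernel's bpf_dynptr_kern:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for bpf_dynptr_kern state. */
struct dynptr {
	void *data;
	uint32_t size;
	bool rdonly;
	bool valid;
};

/* Mirrors bpf_dynptr_from_skb()/bpf_dynptr_from_xdp(): reject any flags,
 * nulling the dynptr on error (as bpf_dynptr_set_null() does). */
static int dynptr_from_buf(void *buf, uint32_t len, uint64_t flags, struct dynptr *p)
{
	if (flags) {
		*p = (struct dynptr){0};
		return -22; /* -EINVAL */
	}
	*p = (struct dynptr){ .data = buf, .size = len, .valid = true };
	return 0;
}

/* Mirrors bpf_dynptr_from_skb_rdonly(): same init path, then mark read-only. */
static int dynptr_from_buf_rdonly(void *buf, uint32_t len, uint64_t flags, struct dynptr *p)
{
	int err = dynptr_from_buf(buf, len, flags, p);

	if (err)
		return err;
	p->rdonly = true;
	return 0;
}
```

Nulling the dynptr before returning an error matters because the verifier treats the `ptr__uninit` argument as initialized after the kfunc returns, so a failed call must not leave stale state behind.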
-9
tools/arch/arm64/include/uapi/asm/bpf_perf_event.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 */ 2 - #ifndef _UAPI__ASM_BPF_PERF_EVENT_H__ 3 - #define _UAPI__ASM_BPF_PERF_EVENT_H__ 4 - 5 - #include <asm/ptrace.h> 6 - 7 - typedef struct user_pt_regs bpf_user_pt_regs_t; 8 - 9 - #endif /* _UAPI__ASM_BPF_PERF_EVENT_H__ */
-9
tools/arch/s390/include/uapi/asm/bpf_perf_event.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 */ 2 - #ifndef _UAPI__ASM_BPF_PERF_EVENT_H__ 3 - #define _UAPI__ASM_BPF_PERF_EVENT_H__ 4 - 5 - #include "ptrace.h" 6 - 7 - typedef user_pt_regs bpf_user_pt_regs_t; 8 - 9 - #endif /* _UAPI__ASM_BPF_PERF_EVENT_H__ */
-458
tools/arch/s390/include/uapi/asm/ptrace.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ 2 - /* 3 - * S390 version 4 - * Copyright IBM Corp. 1999, 2000 5 - * Author(s): Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com) 6 - */ 7 - 8 - #ifndef _UAPI_S390_PTRACE_H 9 - #define _UAPI_S390_PTRACE_H 10 - 11 - /* 12 - * Offsets in the user_regs_struct. They are used for the ptrace 13 - * system call and in entry.S 14 - */ 15 - #ifndef __s390x__ 16 - 17 - #define PT_PSWMASK 0x00 18 - #define PT_PSWADDR 0x04 19 - #define PT_GPR0 0x08 20 - #define PT_GPR1 0x0C 21 - #define PT_GPR2 0x10 22 - #define PT_GPR3 0x14 23 - #define PT_GPR4 0x18 24 - #define PT_GPR5 0x1C 25 - #define PT_GPR6 0x20 26 - #define PT_GPR7 0x24 27 - #define PT_GPR8 0x28 28 - #define PT_GPR9 0x2C 29 - #define PT_GPR10 0x30 30 - #define PT_GPR11 0x34 31 - #define PT_GPR12 0x38 32 - #define PT_GPR13 0x3C 33 - #define PT_GPR14 0x40 34 - #define PT_GPR15 0x44 35 - #define PT_ACR0 0x48 36 - #define PT_ACR1 0x4C 37 - #define PT_ACR2 0x50 38 - #define PT_ACR3 0x54 39 - #define PT_ACR4 0x58 40 - #define PT_ACR5 0x5C 41 - #define PT_ACR6 0x60 42 - #define PT_ACR7 0x64 43 - #define PT_ACR8 0x68 44 - #define PT_ACR9 0x6C 45 - #define PT_ACR10 0x70 46 - #define PT_ACR11 0x74 47 - #define PT_ACR12 0x78 48 - #define PT_ACR13 0x7C 49 - #define PT_ACR14 0x80 50 - #define PT_ACR15 0x84 51 - #define PT_ORIGGPR2 0x88 52 - #define PT_FPC 0x90 53 - /* 54 - * A nasty fact of life that the ptrace api 55 - * only supports passing of longs. 
56 - */ 57 - #define PT_FPR0_HI 0x98 58 - #define PT_FPR0_LO 0x9C 59 - #define PT_FPR1_HI 0xA0 60 - #define PT_FPR1_LO 0xA4 61 - #define PT_FPR2_HI 0xA8 62 - #define PT_FPR2_LO 0xAC 63 - #define PT_FPR3_HI 0xB0 64 - #define PT_FPR3_LO 0xB4 65 - #define PT_FPR4_HI 0xB8 66 - #define PT_FPR4_LO 0xBC 67 - #define PT_FPR5_HI 0xC0 68 - #define PT_FPR5_LO 0xC4 69 - #define PT_FPR6_HI 0xC8 70 - #define PT_FPR6_LO 0xCC 71 - #define PT_FPR7_HI 0xD0 72 - #define PT_FPR7_LO 0xD4 73 - #define PT_FPR8_HI 0xD8 74 - #define PT_FPR8_LO 0XDC 75 - #define PT_FPR9_HI 0xE0 76 - #define PT_FPR9_LO 0xE4 77 - #define PT_FPR10_HI 0xE8 78 - #define PT_FPR10_LO 0xEC 79 - #define PT_FPR11_HI 0xF0 80 - #define PT_FPR11_LO 0xF4 81 - #define PT_FPR12_HI 0xF8 82 - #define PT_FPR12_LO 0xFC 83 - #define PT_FPR13_HI 0x100 84 - #define PT_FPR13_LO 0x104 85 - #define PT_FPR14_HI 0x108 86 - #define PT_FPR14_LO 0x10C 87 - #define PT_FPR15_HI 0x110 88 - #define PT_FPR15_LO 0x114 89 - #define PT_CR_9 0x118 90 - #define PT_CR_10 0x11C 91 - #define PT_CR_11 0x120 92 - #define PT_IEEE_IP 0x13C 93 - #define PT_LASTOFF PT_IEEE_IP 94 - #define PT_ENDREGS 0x140-1 95 - 96 - #define GPR_SIZE 4 97 - #define CR_SIZE 4 98 - 99 - #define STACK_FRAME_OVERHEAD 96 /* size of minimum stack frame */ 100 - 101 - #else /* __s390x__ */ 102 - 103 - #define PT_PSWMASK 0x00 104 - #define PT_PSWADDR 0x08 105 - #define PT_GPR0 0x10 106 - #define PT_GPR1 0x18 107 - #define PT_GPR2 0x20 108 - #define PT_GPR3 0x28 109 - #define PT_GPR4 0x30 110 - #define PT_GPR5 0x38 111 - #define PT_GPR6 0x40 112 - #define PT_GPR7 0x48 113 - #define PT_GPR8 0x50 114 - #define PT_GPR9 0x58 115 - #define PT_GPR10 0x60 116 - #define PT_GPR11 0x68 117 - #define PT_GPR12 0x70 118 - #define PT_GPR13 0x78 119 - #define PT_GPR14 0x80 120 - #define PT_GPR15 0x88 121 - #define PT_ACR0 0x90 122 - #define PT_ACR1 0x94 123 - #define PT_ACR2 0x98 124 - #define PT_ACR3 0x9C 125 - #define PT_ACR4 0xA0 126 - #define PT_ACR5 0xA4 127 - #define PT_ACR6 0xA8 128 - 
#define PT_ACR7 0xAC 129 - #define PT_ACR8 0xB0 130 - #define PT_ACR9 0xB4 131 - #define PT_ACR10 0xB8 132 - #define PT_ACR11 0xBC 133 - #define PT_ACR12 0xC0 134 - #define PT_ACR13 0xC4 135 - #define PT_ACR14 0xC8 136 - #define PT_ACR15 0xCC 137 - #define PT_ORIGGPR2 0xD0 138 - #define PT_FPC 0xD8 139 - #define PT_FPR0 0xE0 140 - #define PT_FPR1 0xE8 141 - #define PT_FPR2 0xF0 142 - #define PT_FPR3 0xF8 143 - #define PT_FPR4 0x100 144 - #define PT_FPR5 0x108 145 - #define PT_FPR6 0x110 146 - #define PT_FPR7 0x118 147 - #define PT_FPR8 0x120 148 - #define PT_FPR9 0x128 149 - #define PT_FPR10 0x130 150 - #define PT_FPR11 0x138 151 - #define PT_FPR12 0x140 152 - #define PT_FPR13 0x148 153 - #define PT_FPR14 0x150 154 - #define PT_FPR15 0x158 155 - #define PT_CR_9 0x160 156 - #define PT_CR_10 0x168 157 - #define PT_CR_11 0x170 158 - #define PT_IEEE_IP 0x1A8 159 - #define PT_LASTOFF PT_IEEE_IP 160 - #define PT_ENDREGS 0x1B0-1 161 - 162 - #define GPR_SIZE 8 163 - #define CR_SIZE 8 164 - 165 - #define STACK_FRAME_OVERHEAD 160 /* size of minimum stack frame */ 166 - 167 - #endif /* __s390x__ */ 168 - 169 - #define NUM_GPRS 16 170 - #define NUM_FPRS 16 171 - #define NUM_CRS 16 172 - #define NUM_ACRS 16 173 - 174 - #define NUM_CR_WORDS 3 175 - 176 - #define FPR_SIZE 8 177 - #define FPC_SIZE 4 178 - #define FPC_PAD_SIZE 4 /* gcc insists on aligning the fpregs */ 179 - #define ACR_SIZE 4 180 - 181 - 182 - #define PTRACE_OLDSETOPTIONS 21 183 - #define PTRACE_SYSEMU 31 184 - #define PTRACE_SYSEMU_SINGLESTEP 32 185 - #ifndef __ASSEMBLY__ 186 - #include <linux/stddef.h> 187 - #include <linux/types.h> 188 - 189 - typedef union { 190 - float f; 191 - double d; 192 - __u64 ui; 193 - struct 194 - { 195 - __u32 hi; 196 - __u32 lo; 197 - } fp; 198 - } freg_t; 199 - 200 - typedef struct { 201 - __u32 fpc; 202 - __u32 pad; 203 - freg_t fprs[NUM_FPRS]; 204 - } s390_fp_regs; 205 - 206 - #define FPC_EXCEPTION_MASK 0xF8000000 207 - #define FPC_FLAGS_MASK 0x00F80000 208 - #define FPC_DXC_MASK 
0x0000FF00 209 - #define FPC_RM_MASK 0x00000003 210 - 211 - /* this typedef defines how a Program Status Word looks like */ 212 - typedef struct { 213 - unsigned long mask; 214 - unsigned long addr; 215 - } __attribute__ ((aligned(8))) psw_t; 216 - 217 - #ifndef __s390x__ 218 - 219 - #define PSW_MASK_PER 0x40000000UL 220 - #define PSW_MASK_DAT 0x04000000UL 221 - #define PSW_MASK_IO 0x02000000UL 222 - #define PSW_MASK_EXT 0x01000000UL 223 - #define PSW_MASK_KEY 0x00F00000UL 224 - #define PSW_MASK_BASE 0x00080000UL /* always one */ 225 - #define PSW_MASK_MCHECK 0x00040000UL 226 - #define PSW_MASK_WAIT 0x00020000UL 227 - #define PSW_MASK_PSTATE 0x00010000UL 228 - #define PSW_MASK_ASC 0x0000C000UL 229 - #define PSW_MASK_CC 0x00003000UL 230 - #define PSW_MASK_PM 0x00000F00UL 231 - #define PSW_MASK_RI 0x00000000UL 232 - #define PSW_MASK_EA 0x00000000UL 233 - #define PSW_MASK_BA 0x00000000UL 234 - 235 - #define PSW_MASK_USER 0x0000FF00UL 236 - 237 - #define PSW_ADDR_AMODE 0x80000000UL 238 - #define PSW_ADDR_INSN 0x7FFFFFFFUL 239 - 240 - #define PSW_DEFAULT_KEY (((unsigned long) PAGE_DEFAULT_ACC) << 20) 241 - 242 - #define PSW_ASC_PRIMARY 0x00000000UL 243 - #define PSW_ASC_ACCREG 0x00004000UL 244 - #define PSW_ASC_SECONDARY 0x00008000UL 245 - #define PSW_ASC_HOME 0x0000C000UL 246 - 247 - #else /* __s390x__ */ 248 - 249 - #define PSW_MASK_PER 0x4000000000000000UL 250 - #define PSW_MASK_DAT 0x0400000000000000UL 251 - #define PSW_MASK_IO 0x0200000000000000UL 252 - #define PSW_MASK_EXT 0x0100000000000000UL 253 - #define PSW_MASK_BASE 0x0000000000000000UL 254 - #define PSW_MASK_KEY 0x00F0000000000000UL 255 - #define PSW_MASK_MCHECK 0x0004000000000000UL 256 - #define PSW_MASK_WAIT 0x0002000000000000UL 257 - #define PSW_MASK_PSTATE 0x0001000000000000UL 258 - #define PSW_MASK_ASC 0x0000C00000000000UL 259 - #define PSW_MASK_CC 0x0000300000000000UL 260 - #define PSW_MASK_PM 0x00000F0000000000UL 261 - #define PSW_MASK_RI 0x0000008000000000UL 262 - #define PSW_MASK_EA 
0x0000000100000000UL 263 - #define PSW_MASK_BA 0x0000000080000000UL 264 - 265 - #define PSW_MASK_USER 0x0000FF0180000000UL 266 - 267 - #define PSW_ADDR_AMODE 0x0000000000000000UL 268 - #define PSW_ADDR_INSN 0xFFFFFFFFFFFFFFFFUL 269 - 270 - #define PSW_DEFAULT_KEY (((unsigned long) PAGE_DEFAULT_ACC) << 52) 271 - 272 - #define PSW_ASC_PRIMARY 0x0000000000000000UL 273 - #define PSW_ASC_ACCREG 0x0000400000000000UL 274 - #define PSW_ASC_SECONDARY 0x0000800000000000UL 275 - #define PSW_ASC_HOME 0x0000C00000000000UL 276 - 277 - #endif /* __s390x__ */ 278 - 279 - 280 - /* 281 - * The s390_regs structure is used to define the elf_gregset_t. 282 - */ 283 - typedef struct { 284 - psw_t psw; 285 - unsigned long gprs[NUM_GPRS]; 286 - unsigned int acrs[NUM_ACRS]; 287 - unsigned long orig_gpr2; 288 - } s390_regs; 289 - 290 - /* 291 - * The user_pt_regs structure exports the beginning of 292 - * the in-kernel pt_regs structure to user space. 293 - */ 294 - typedef struct { 295 - unsigned long args[1]; 296 - psw_t psw; 297 - unsigned long gprs[NUM_GPRS]; 298 - } user_pt_regs; 299 - 300 - /* 301 - * Now for the user space program event recording (trace) definitions. 302 - * The following structures are used only for the ptrace interface, don't 303 - * touch or even look at it if you don't want to modify the user-space 304 - * ptrace interface. In particular stay away from it for in-kernel PER. 305 - */ 306 - typedef struct { 307 - unsigned long cr[NUM_CR_WORDS]; 308 - } per_cr_words; 309 - 310 - #define PER_EM_MASK 0xE8000000UL 311 - 312 - typedef struct { 313 - #ifdef __s390x__ 314 - unsigned : 32; 315 - #endif /* __s390x__ */ 316 - unsigned em_branching : 1; 317 - unsigned em_instruction_fetch : 1; 318 - /* 319 - * Switching on storage alteration automatically fixes 320 - * the storage alteration event bit in the users std. 
321 - */ 322 - unsigned em_storage_alteration : 1; 323 - unsigned em_gpr_alt_unused : 1; 324 - unsigned em_store_real_address : 1; 325 - unsigned : 3; 326 - unsigned branch_addr_ctl : 1; 327 - unsigned : 1; 328 - unsigned storage_alt_space_ctl : 1; 329 - unsigned : 21; 330 - unsigned long starting_addr; 331 - unsigned long ending_addr; 332 - } per_cr_bits; 333 - 334 - typedef struct { 335 - unsigned short perc_atmid; 336 - unsigned long address; 337 - unsigned char access_id; 338 - } per_lowcore_words; 339 - 340 - typedef struct { 341 - unsigned perc_branching : 1; 342 - unsigned perc_instruction_fetch : 1; 343 - unsigned perc_storage_alteration : 1; 344 - unsigned perc_gpr_alt_unused : 1; 345 - unsigned perc_store_real_address : 1; 346 - unsigned : 3; 347 - unsigned atmid_psw_bit_31 : 1; 348 - unsigned atmid_validity_bit : 1; 349 - unsigned atmid_psw_bit_32 : 1; 350 - unsigned atmid_psw_bit_5 : 1; 351 - unsigned atmid_psw_bit_16 : 1; 352 - unsigned atmid_psw_bit_17 : 1; 353 - unsigned si : 2; 354 - unsigned long address; 355 - unsigned : 4; 356 - unsigned access_id : 4; 357 - } per_lowcore_bits; 358 - 359 - typedef struct { 360 - union { 361 - per_cr_words words; 362 - per_cr_bits bits; 363 - } control_regs; 364 - /* 365 - * The single_step and instruction_fetch bits are obsolete, 366 - * the kernel always sets them to zero. To enable single 367 - * stepping use ptrace(PTRACE_SINGLESTEP) instead. 
368 - */ 369 - unsigned single_step : 1; 370 - unsigned instruction_fetch : 1; 371 - unsigned : 30; 372 - /* 373 - * These addresses are copied into cr10 & cr11 if single 374 - * stepping is switched off 375 - */ 376 - unsigned long starting_addr; 377 - unsigned long ending_addr; 378 - union { 379 - per_lowcore_words words; 380 - per_lowcore_bits bits; 381 - } lowcore; 382 - } per_struct; 383 - 384 - typedef struct { 385 - unsigned int len; 386 - unsigned long kernel_addr; 387 - unsigned long process_addr; 388 - } ptrace_area; 389 - 390 - /* 391 - * S/390 specific non posix ptrace requests. I chose unusual values so 392 - * they are unlikely to clash with future ptrace definitions. 393 - */ 394 - #define PTRACE_PEEKUSR_AREA 0x5000 395 - #define PTRACE_POKEUSR_AREA 0x5001 396 - #define PTRACE_PEEKTEXT_AREA 0x5002 397 - #define PTRACE_PEEKDATA_AREA 0x5003 398 - #define PTRACE_POKETEXT_AREA 0x5004 399 - #define PTRACE_POKEDATA_AREA 0x5005 400 - #define PTRACE_GET_LAST_BREAK 0x5006 401 - #define PTRACE_PEEK_SYSTEM_CALL 0x5007 402 - #define PTRACE_POKE_SYSTEM_CALL 0x5008 403 - #define PTRACE_ENABLE_TE 0x5009 404 - #define PTRACE_DISABLE_TE 0x5010 405 - #define PTRACE_TE_ABORT_RAND 0x5011 406 - 407 - /* 408 - * The numbers chosen here are somewhat arbitrary but absolutely MUST 409 - * not overlap with any of the number assigned in <linux/ptrace.h>. 410 - */ 411 - #define PTRACE_SINGLEBLOCK 12 /* resume execution until next branch */ 412 - 413 - /* 414 - * PT_PROT definition is loosely based on hppa bsd definition in 415 - * gdb/hppab-nat.c 416 - */ 417 - #define PTRACE_PROT 21 418 - 419 - typedef enum { 420 - ptprot_set_access_watchpoint, 421 - ptprot_set_write_watchpoint, 422 - ptprot_disable_watchpoint 423 - } ptprot_flags; 424 - 425 - typedef struct { 426 - unsigned long lowaddr; 427 - unsigned long hiaddr; 428 - ptprot_flags prot; 429 - } ptprot_area; 430 - 431 - /* Sequence of bytes for breakpoint illegal instruction. 
*/ 432 - #define S390_BREAKPOINT {0x0,0x1} 433 - #define S390_BREAKPOINT_U16 ((__u16)0x0001) 434 - #define S390_SYSCALL_OPCODE ((__u16)0x0a00) 435 - #define S390_SYSCALL_SIZE 2 436 - 437 - /* 438 - * The user_regs_struct defines the way the user registers are 439 - * store on the stack for signal handling. 440 - */ 441 - struct user_regs_struct { 442 - psw_t psw; 443 - unsigned long gprs[NUM_GPRS]; 444 - unsigned int acrs[NUM_ACRS]; 445 - unsigned long orig_gpr2; 446 - s390_fp_regs fp_regs; 447 - /* 448 - * These per registers are in here so that gdb can modify them 449 - * itself as there is no "official" ptrace interface for hardware 450 - * watchpoints. This is the way intel does it. 451 - */ 452 - per_struct per_info; 453 - unsigned long ieee_instruction_pointer; /* obsolete, always 0 */ 454 - }; 455 - 456 - #endif /* __ASSEMBLY__ */ 457 - 458 - #endif /* _UAPI_S390_PTRACE_H */
-3
tools/bpf/bpftool/json_writer.c
··· 80 80 case '"': 81 81 fputs("\\\"", self->out); 82 82 break; 83 - case '\'': 84 - fputs("\\\'", self->out); 85 - break; 86 83 default: 87 84 putc(*str, self->out); 88 85 }
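The json_writer.c hunk above drops the `\'` escape: RFC 8259 only defines escapes for `"`, `\`, and control characters, so a lone single quote must be emitted as-is. A minimal escaper illustrating the corrected behavior (a sketch, not bpftool's actual jsonw_puts implementation):

```c
#include <stddef.h>
#include <string.h>

/* Escape a string for JSON output per RFC 8259: only '"', '\\' and control
 * characters need escaping; a single quote passes through unchanged. */
static void json_escape(const char *in, char *out, size_t outsz)
{
	size_t n = 0;

	for (; *in && n + 2 < outsz; in++) {
		switch (*in) {
		case '"':  out[n++] = '\\'; out[n++] = '"';  break;
		case '\\': out[n++] = '\\'; out[n++] = '\\'; break;
		case '\n': out[n++] = '\\'; out[n++] = 'n';  break;
		default:   out[n++] = *in;                   break;
		}
	}
	out[n] = '\0';
}
```

Emitting `\'` is harmless to most parsers but is not valid JSON under a strict reading, which is why the case is removed rather than kept for safety.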
+1
tools/bpf/resolve_btfids/.gitignore
··· 1 1 /fixdep 2 2 /resolve_btfids 3 3 /libbpf/ 4 + /libsubcmd/
+31 -2
tools/include/uapi/linux/bpf.h
··· 4969 4969 * different maps if key/value layout matches across maps. 4970 4970 * Every bpf_timer_set_callback() can have different callback_fn. 4971 4971 * 4972 + * *flags* can be one of: 4973 + * 4974 + * **BPF_F_TIMER_ABS** 4975 + * Start the timer in absolute expire value instead of the 4976 + * default relative one. 4977 + * 4972 4978 * Return 4973 4979 * 0 on success. 4974 4980 * **-EINVAL** if *timer* was not initialized with bpf_timer_init() earlier ··· 5331 5325 * Description 5332 5326 * Write *len* bytes from *src* into *dst*, starting from *offset* 5333 5327 * into *dst*. 5334 - * *flags* is currently unused. 5328 + * 5329 + * *flags* must be 0 except for skb-type dynptrs. 5330 + * 5331 + * For skb-type dynptrs: 5332 + * * All data slices of the dynptr are automatically 5333 + * invalidated after **bpf_dynptr_write**\ (). This is 5334 + * because writing may pull the skb and change the 5335 + * underlying packet buffer. 5336 + * 5337 + * * For *flags*, please see the flags accepted by 5338 + * **bpf_skb_store_bytes**\ (). 5335 5339 * Return 5336 5340 * 0 on success, -E2BIG if *offset* + *len* exceeds the length 5337 5341 * of *dst*'s data, -EINVAL if *dst* is an invalid dynptr or if *dst* 5338 - * is a read-only dynptr or if *flags* is not 0. 5342 + * is a read-only dynptr or if *flags* is not correct. For skb-type dynptrs, 5343 + * other errors correspond to errors returned by **bpf_skb_store_bytes**\ (). 5339 5344 * 5340 5345 * void *bpf_dynptr_data(const struct bpf_dynptr *ptr, u32 offset, u32 len) 5341 5346 * Description ··· 5354 5337 * 5355 5338 * *len* must be a statically known value. The returned data slice 5356 5339 * is invalidated whenever the dynptr is invalidated. 5340 + * 5341 + * skb and xdp type dynptrs may not use bpf_dynptr_data. They should 5342 + * instead use bpf_dynptr_slice and bpf_dynptr_slice_rdwr. 
5357 5343 * Return 5358 5344 * Pointer to the underlying dynptr data, NULL if the dynptr is 5359 5345 * read-only, if the dynptr is invalid, or if the offset and length ··· 7101 7081 __u32 type_id; 7102 7082 __u32 access_str_off; 7103 7083 enum bpf_core_relo_kind kind; 7084 + }; 7085 + 7086 + /* 7087 + * Flags to control bpf_timer_start() behaviour. 7088 + * - BPF_F_TIMER_ABS: Timeout passed is absolute time, by default it is 7089 + * relative to current time. 7090 + */ 7091 + enum { 7092 + BPF_F_TIMER_ABS = (1ULL << 0), 7104 7093 }; 7105 7094 7106 7095 #endif /* _UAPI__LINUX_BPF_H__ */
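The BPF_F_TIMER_ABS flag added to the uapi header above changes how the nsec argument of bpf_timer_start() is interpreted. A small hypothetical helper sketching the two calling conventions (the kernel itself does the clock arithmetic; this only models what a caller would pass):

```c
#include <stdint.h>

#define BPF_F_TIMER_ABS (1ULL << 0) /* mirrors the new uapi flag */

/* Given the current clock value and a desired relative delay, compute the
 * nsec argument for bpf_timer_start(): an absolute deadline when
 * BPF_F_TIMER_ABS is set, the relative delay otherwise (the default). */
static uint64_t timer_expiry_nsecs(uint64_t now_ns, uint64_t delay_ns, uint64_t flags)
{
	if (flags & BPF_F_TIMER_ABS)
		return now_ns + delay_ns; /* absolute deadline */
	return delay_ns;                  /* relative to "now" in the kernel */
}
```

An absolute deadline avoids drift when a program repeatedly re-arms a periodic timer, since the next expiry can be computed from the previous deadline rather than from whenever the callback happened to run.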
+1 -1
tools/lib/bpf/Build
··· 1 1 libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \ 2 2 netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \ 3 3 btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \ 4 - usdt.o 4 + usdt.o zip.o
+64 -5
tools/lib/bpf/bpf.h
··· 1 1 /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ 2 2 3 3 /* 4 - * common eBPF ELF operations. 4 + * Common BPF ELF operations. 5 5 * 6 6 * Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org> 7 7 * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com> ··· 386 386 LIBBPF_API int bpf_link_get_fd_by_id_opts(__u32 id, 387 387 const struct bpf_get_fd_by_id_opts *opts); 388 388 LIBBPF_API int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len); 389 - /* Type-safe variants of bpf_obj_get_info_by_fd(). The callers still needs to 390 - * pass info_len, which should normally be 391 - * sizeof(struct bpf_{prog,map,btf,link}_info), in order to be compatible with 392 - * different libbpf and kernel versions. 389 + 390 + /** 391 + * @brief **bpf_prog_get_info_by_fd()** obtains information about the BPF 392 + * program corresponding to *prog_fd*. 393 + * 394 + * Populates up to *info_len* bytes of *info* and updates *info_len* with the 395 + * actual number of bytes written to *info*. 396 + * 397 + * @param prog_fd BPF program file descriptor 398 + * @param info pointer to **struct bpf_prog_info** that will be populated with 399 + * BPF program information 400 + * @param info_len pointer to the size of *info*; on success updated with the 401 + * number of bytes written to *info* 402 + * @return 0, on success; negative error code, otherwise (errno is also set to 403 + * the error code) 393 404 */ 394 405 LIBBPF_API int bpf_prog_get_info_by_fd(int prog_fd, struct bpf_prog_info *info, __u32 *info_len); 406 + 407 + /** 408 + * @brief **bpf_map_get_info_by_fd()** obtains information about the BPF 409 + * map corresponding to *map_fd*. 410 + * 411 + * Populates up to *info_len* bytes of *info* and updates *info_len* with the 412 + * actual number of bytes written to *info*. 
413 + * 414 + * @param map_fd BPF map file descriptor 415 + * @param info pointer to **struct bpf_map_info** that will be populated with 416 + * BPF map information 417 + * @param info_len pointer to the size of *info*; on success updated with the 418 + * number of bytes written to *info* 419 + * @return 0, on success; negative error code, otherwise (errno is also set to 420 + * the error code) 421 + */ 395 422 LIBBPF_API int bpf_map_get_info_by_fd(int map_fd, struct bpf_map_info *info, __u32 *info_len); 423 + 424 + /** 425 + * @brief **bpf_btf_get_info_by_fd()** obtains information about the 426 + * BTF object corresponding to *btf_fd*. 427 + * 428 + * Populates up to *info_len* bytes of *info* and updates *info_len* with the 429 + * actual number of bytes written to *info*. 430 + * 431 + * @param btf_fd BTF object file descriptor 432 + * @param info pointer to **struct bpf_btf_info** that will be populated with 433 + * BTF object information 434 + * @param info_len pointer to the size of *info*; on success updated with the 435 + * number of bytes written to *info* 436 + * @return 0, on success; negative error code, otherwise (errno is also set to 437 + * the error code) 438 + */ 396 439 LIBBPF_API int bpf_btf_get_info_by_fd(int btf_fd, struct bpf_btf_info *info, __u32 *info_len); 440 + 441 + /** 442 + * @brief **bpf_link_get_info_by_fd()** obtains information about the BPF 443 + * link corresponding to *link_fd*. 444 + * 445 + * Populates up to *info_len* bytes of *info* and updates *info_len* with the 446 + * actual number of bytes written to *info*.
447 + * 448 + * @param link_fd BPF link file descriptor 449 + * @param info pointer to **struct bpf_link_info** that will be populated with 450 + * BPF link information 451 + * @param info_len pointer to the size of *info*; on success updated with the 452 + * number of bytes written to *info* 453 + * @return 0, on success; negative error code, otherwise (errno is also set to 454 + * the error code) 455 + */ 397 456 LIBBPF_API int bpf_link_get_info_by_fd(int link_fd, struct bpf_link_info *info, __u32 *info_len); 398 457 399 458 struct bpf_prog_query_opts {
+1 -1
tools/lib/bpf/bpf_helpers.h
··· 174 174 175 175 #define __kconfig __attribute__((section(".kconfig"))) 176 176 #define __ksym __attribute__((section(".ksyms"))) 177 + #define __kptr_untrusted __attribute__((btf_type_tag("kptr_untrusted"))) 177 178 #define __kptr __attribute__((btf_type_tag("kptr"))) 178 - #define __kptr_ref __attribute__((btf_type_tag("kptr_ref"))) 179 179 180 180 #ifndef ___bpf_concat 181 181 #define ___bpf_concat(a, b) a ## b
+3
tools/lib/bpf/bpf_tracing.h
··· 204 204 #define __PT_PARM2_SYSCALL_REG __PT_PARM2_REG 205 205 #define __PT_PARM3_SYSCALL_REG __PT_PARM3_REG 206 206 #define __PT_PARM4_SYSCALL_REG __PT_PARM4_REG 207 + #define __PT_PARM5_SYSCALL_REG uregs[4] 207 208 #define __PT_PARM6_SYSCALL_REG uregs[5] 208 209 #define __PT_PARM7_SYSCALL_REG uregs[6] 209 210 ··· 416 415 * https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html 417 416 */ 418 417 418 + /* loongarch provides struct user_pt_regs instead of struct pt_regs to userspace */ 419 + #define __PT_REGS_CAST(x) ((const struct user_pt_regs *)(x)) 419 420 #define __PT_PARM1_REG regs[4] 420 421 #define __PT_PARM2_REG regs[5] 421 422 #define __PT_PARM3_REG regs[6]
-2
tools/lib/bpf/btf.c
··· 1000 1000 } 1001 1001 } 1002 1002 1003 - err = 0; 1004 - 1005 1003 if (!btf_data) { 1006 1004 pr_warn("failed to find '%s' ELF section in %s\n", BTF_ELF_SEC, path); 1007 1005 err = -ENODATA;
+167 -30
tools/lib/bpf/libbpf.c
··· 53 53 #include "libbpf_internal.h" 54 54 #include "hashmap.h" 55 55 #include "bpf_gen_internal.h" 56 + #include "zip.h" 56 57 57 58 #ifndef BPF_FS_MAGIC 58 59 #define BPF_FS_MAGIC 0xcafe4a11 ··· 799 798 progs = obj->programs; 800 799 nr_progs = obj->nr_programs; 801 800 nr_syms = symbols->d_size / sizeof(Elf64_Sym); 802 - sec_off = 0; 803 801 804 802 for (i = 0; i < nr_syms; i++) { 805 803 sym = elf_sym_by_idx(obj, i); ··· 2615 2615 strict = !OPTS_GET(opts, relaxed_maps, false); 2616 2616 pin_root_path = OPTS_GET(opts, pin_root_path, NULL); 2617 2617 2618 - err = err ?: bpf_object__init_user_btf_maps(obj, strict, pin_root_path); 2618 + err = bpf_object__init_user_btf_maps(obj, strict, pin_root_path); 2619 2619 err = err ?: bpf_object__init_global_data_maps(obj); 2620 2620 err = err ?: bpf_object__init_kconfig_map(obj); 2621 2621 err = err ?: bpf_object__init_struct_ops_maps(obj); ··· 9724 9724 char errmsg[STRERR_BUFSIZE]; 9725 9725 struct bpf_link_perf *link; 9726 9726 int prog_fd, link_fd = -1, err; 9727 + bool force_ioctl_attach; 9727 9728 9728 9729 if (!OPTS_VALID(opts, bpf_perf_event_opts)) 9729 9730 return libbpf_err_ptr(-EINVAL); ··· 9748 9747 link->link.dealloc = &bpf_link_perf_dealloc; 9749 9748 link->perf_event_fd = pfd; 9750 9749 9751 - if (kernel_supports(prog->obj, FEAT_PERF_LINK)) { 9750 + force_ioctl_attach = OPTS_GET(opts, force_ioctl_attach, false); 9751 + if (kernel_supports(prog->obj, FEAT_PERF_LINK) && !force_ioctl_attach) { 9752 9752 DECLARE_LIBBPF_OPTS(bpf_link_create_opts, link_opts, 9753 9753 .perf_event.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0)); 9754 9754 ··· 10108 10106 const struct bpf_kprobe_opts *opts) 10109 10107 { 10110 10108 DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts); 10109 + enum probe_attach_mode attach_mode; 10111 10110 char errmsg[STRERR_BUFSIZE]; 10112 10111 char *legacy_probe = NULL; 10113 10112 struct bpf_link *link; ··· 10119 10116 if (!OPTS_VALID(opts, bpf_kprobe_opts)) 10120 10117 return libbpf_err_ptr(-EINVAL); 
10121 10118 10119 + attach_mode = OPTS_GET(opts, attach_mode, PROBE_ATTACH_MODE_DEFAULT); 10122 10120 retprobe = OPTS_GET(opts, retprobe, false); 10123 10121 offset = OPTS_GET(opts, offset, 0); 10124 10122 pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0); 10125 10123 10126 10124 legacy = determine_kprobe_perf_type() < 0; 10125 + switch (attach_mode) { 10126 + case PROBE_ATTACH_MODE_LEGACY: 10127 + legacy = true; 10128 + pe_opts.force_ioctl_attach = true; 10129 + break; 10130 + case PROBE_ATTACH_MODE_PERF: 10131 + if (legacy) 10132 + return libbpf_err_ptr(-ENOTSUP); 10133 + pe_opts.force_ioctl_attach = true; 10134 + break; 10135 + case PROBE_ATTACH_MODE_LINK: 10136 + if (legacy || !kernel_supports(prog->obj, FEAT_PERF_LINK)) 10137 + return libbpf_err_ptr(-ENOTSUP); 10138 + break; 10139 + case PROBE_ATTACH_MODE_DEFAULT: 10140 + break; 10141 + default: 10142 + return libbpf_err_ptr(-EINVAL); 10143 + } 10144 + 10127 10145 if (!legacy) { 10128 10146 pfd = perf_event_open_probe(false /* uprobe */, retprobe, 10129 10147 func_name, offset, ··· 10555 10531 return NULL; 10556 10532 } 10557 10533 10558 - /* Find offset of function name in object specified by path. "name" matches 10559 - * symbol name or name@@LIB for library functions. 10534 + /* Find offset of function name in the provided ELF object. "binary_path" is 10535 + * the path to the ELF binary represented by "elf", and only used for error 10536 + * reporting matters. "name" matches symbol name or name@@LIB for library 10537 + * functions. 
10560 10538 */ 10561 - static long elf_find_func_offset(const char *binary_path, const char *name) 10539 + static long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name) 10562 10540 { 10563 - int fd, i, sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB }; 10541 + int i, sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB }; 10564 10542 bool is_shared_lib, is_name_qualified; 10565 - char errmsg[STRERR_BUFSIZE]; 10566 10543 long ret = -ENOENT; 10567 10544 size_t name_len; 10568 10545 GElf_Ehdr ehdr; 10569 - Elf *elf; 10570 10546 10571 - fd = open(binary_path, O_RDONLY | O_CLOEXEC); 10572 - if (fd < 0) { 10573 - ret = -errno; 10574 - pr_warn("failed to open %s: %s\n", binary_path, 10575 - libbpf_strerror_r(ret, errmsg, sizeof(errmsg))); 10576 - return ret; 10577 - } 10578 - elf = elf_begin(fd, ELF_C_READ_MMAP, NULL); 10579 - if (!elf) { 10580 - pr_warn("elf: could not read elf from %s: %s\n", binary_path, elf_errmsg(-1)); 10581 - close(fd); 10582 - return -LIBBPF_ERRNO__FORMAT; 10583 - } 10584 10547 if (!gelf_getehdr(elf, &ehdr)) { 10585 10548 pr_warn("elf: failed to get ehdr from %s: %s\n", binary_path, elf_errmsg(-1)); 10586 10549 ret = -LIBBPF_ERRNO__FORMAT; ··· 10580 10569 /* Does name specify "@@LIB"? */ 10581 10570 is_name_qualified = strstr(name, "@@") != NULL; 10582 10571 10583 - /* Search SHT_DYNSYM, SHT_SYMTAB for symbol. This search order is used because if 10572 + /* Search SHT_DYNSYM, SHT_SYMTAB for symbol. This search order is used because if 10584 10573 * a binary is stripped, it may only have SHT_DYNSYM, and a fully-statically 10585 10574 * linked binary may not have SHT_DYMSYM, so absence of a section should not be 10586 10575 * reported as a warning/error. ··· 10693 10682 } 10694 10683 } 10695 10684 out: 10685 + return ret; 10686 + } 10687 + 10688 + /* Find offset of function name in ELF object specified by path. "name" matches 10689 + * symbol name or name@@LIB for library functions. 
10690 + */ 10691 + static long elf_find_func_offset_from_file(const char *binary_path, const char *name) 10692 + { 10693 + char errmsg[STRERR_BUFSIZE]; 10694 + long ret = -ENOENT; 10695 + Elf *elf; 10696 + int fd; 10697 + 10698 + fd = open(binary_path, O_RDONLY | O_CLOEXEC); 10699 + if (fd < 0) { 10700 + ret = -errno; 10701 + pr_warn("failed to open %s: %s\n", binary_path, 10702 + libbpf_strerror_r(ret, errmsg, sizeof(errmsg))); 10703 + return ret; 10704 + } 10705 + elf = elf_begin(fd, ELF_C_READ_MMAP, NULL); 10706 + if (!elf) { 10707 + pr_warn("elf: could not read elf from %s: %s\n", binary_path, elf_errmsg(-1)); 10708 + close(fd); 10709 + return -LIBBPF_ERRNO__FORMAT; 10710 + } 10711 + 10712 + ret = elf_find_func_offset(elf, binary_path, name); 10696 10713 elf_end(elf); 10697 10714 close(fd); 10715 + return ret; 10716 + } 10717 + 10718 + /* Find offset of function name in archive specified by path. Currently 10719 + * supported are .zip files that do not compress their contents, as used on 10720 + * Android in the form of APKs, for example. "file_name" is the name of the ELF 10721 + * file inside the archive. "func_name" matches symbol name or name@@LIB for 10722 + * library functions. 
10723 + * 10724 + * An overview of the APK format specifically provided here: 10725 + * https://en.wikipedia.org/w/index.php?title=Apk_(file_format)&oldid=1139099120#Package_contents 10726 + */ 10727 + static long elf_find_func_offset_from_archive(const char *archive_path, const char *file_name, 10728 + const char *func_name) 10729 + { 10730 + struct zip_archive *archive; 10731 + struct zip_entry entry; 10732 + long ret; 10733 + Elf *elf; 10734 + 10735 + archive = zip_archive_open(archive_path); 10736 + if (IS_ERR(archive)) { 10737 + ret = PTR_ERR(archive); 10738 + pr_warn("zip: failed to open %s: %ld\n", archive_path, ret); 10739 + return ret; 10740 + } 10741 + 10742 + ret = zip_archive_find_entry(archive, file_name, &entry); 10743 + if (ret) { 10744 + pr_warn("zip: could not find archive member %s in %s: %ld\n", file_name, 10745 + archive_path, ret); 10746 + goto out; 10747 + } 10748 + pr_debug("zip: found entry for %s in %s at 0x%lx\n", file_name, archive_path, 10749 + (unsigned long)entry.data_offset); 10750 + 10751 + if (entry.compression) { 10752 + pr_warn("zip: entry %s of %s is compressed and cannot be handled\n", file_name, 10753 + archive_path); 10754 + ret = -LIBBPF_ERRNO__FORMAT; 10755 + goto out; 10756 + } 10757 + 10758 + elf = elf_memory((void *)entry.data, entry.data_length); 10759 + if (!elf) { 10760 + pr_warn("elf: could not read elf file %s from %s: %s\n", file_name, archive_path, 10761 + elf_errmsg(-1)); 10762 + ret = -LIBBPF_ERRNO__LIBELF; 10763 + goto out; 10764 + } 10765 + 10766 + ret = elf_find_func_offset(elf, file_name, func_name); 10767 + if (ret > 0) { 10768 + pr_debug("elf: symbol address match for %s of %s in %s: 0x%x + 0x%lx = 0x%lx\n", 10769 + func_name, file_name, archive_path, entry.data_offset, ret, 10770 + ret + entry.data_offset); 10771 + ret += entry.data_offset; 10772 + } 10773 + elf_end(elf); 10774 + 10775 + out: 10776 + zip_archive_close(archive); 10698 10777 return ret; 10699 10778 } 10700 10779 ··· 10873 10772 const char 
*binary_path, size_t func_offset, 10874 10773 const struct bpf_uprobe_opts *opts) 10875 10774 { 10876 - DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts); 10775 + const char *archive_path = NULL, *archive_sep = NULL; 10877 10776 char errmsg[STRERR_BUFSIZE], *legacy_probe = NULL; 10878 - char full_binary_path[PATH_MAX]; 10777 + DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts); 10778 + enum probe_attach_mode attach_mode; 10779 + char full_path[PATH_MAX]; 10879 10780 struct bpf_link *link; 10880 10781 size_t ref_ctr_off; 10881 10782 int pfd, err; ··· 10887 10784 if (!OPTS_VALID(opts, bpf_uprobe_opts)) 10888 10785 return libbpf_err_ptr(-EINVAL); 10889 10786 10787 + attach_mode = OPTS_GET(opts, attach_mode, PROBE_ATTACH_MODE_DEFAULT); 10890 10788 retprobe = OPTS_GET(opts, retprobe, false); 10891 10789 ref_ctr_off = OPTS_GET(opts, ref_ctr_offset, 0); 10892 10790 pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0); ··· 10895 10791 if (!binary_path) 10896 10792 return libbpf_err_ptr(-EINVAL); 10897 10793 10898 - if (!strchr(binary_path, '/')) { 10899 - err = resolve_full_path(binary_path, full_binary_path, 10900 - sizeof(full_binary_path)); 10794 + /* Check if "binary_path" refers to an archive. 
*/ 10795 + archive_sep = strstr(binary_path, "!/"); 10796 + if (archive_sep) { 10797 + full_path[0] = '\0'; 10798 + libbpf_strlcpy(full_path, binary_path, 10799 + min(sizeof(full_path), (size_t)(archive_sep - binary_path + 1))); 10800 + archive_path = full_path; 10801 + binary_path = archive_sep + 2; 10802 + } else if (!strchr(binary_path, '/')) { 10803 + err = resolve_full_path(binary_path, full_path, sizeof(full_path)); 10901 10804 if (err) { 10902 10805 pr_warn("prog '%s': failed to resolve full path for '%s': %d\n", 10903 10806 prog->name, binary_path, err); 10904 10807 return libbpf_err_ptr(err); 10905 10808 } 10906 - binary_path = full_binary_path; 10809 + binary_path = full_path; 10907 10810 } 10908 10811 func_name = OPTS_GET(opts, func_name, NULL); 10909 10812 if (func_name) { 10910 10813 long sym_off; 10911 10814 10912 - sym_off = elf_find_func_offset(binary_path, func_name); 10815 + if (archive_path) { 10816 + sym_off = elf_find_func_offset_from_archive(archive_path, binary_path, 10817 + func_name); 10818 + binary_path = archive_path; 10819 + } else { 10820 + sym_off = elf_find_func_offset_from_file(binary_path, func_name); 10821 + } 10913 10822 if (sym_off < 0) 10914 10823 return libbpf_err_ptr(sym_off); 10915 10824 func_offset += sym_off; 10916 10825 } 10917 10826 10918 10827 legacy = determine_uprobe_perf_type() < 0; 10828 + switch (attach_mode) { 10829 + case PROBE_ATTACH_MODE_LEGACY: 10830 + legacy = true; 10831 + pe_opts.force_ioctl_attach = true; 10832 + break; 10833 + case PROBE_ATTACH_MODE_PERF: 10834 + if (legacy) 10835 + return libbpf_err_ptr(-ENOTSUP); 10836 + pe_opts.force_ioctl_attach = true; 10837 + break; 10838 + case PROBE_ATTACH_MODE_LINK: 10839 + if (legacy || !kernel_supports(prog->obj, FEAT_PERF_LINK)) 10840 + return libbpf_err_ptr(-ENOTSUP); 10841 + break; 10842 + case PROBE_ATTACH_MODE_DEFAULT: 10843 + break; 10844 + default: 10845 + return libbpf_err_ptr(-EINVAL); 10846 + } 10847 + 10919 10848 if (!legacy) { 10920 10849 pfd = 
perf_event_open_probe(true /* uprobe */, retprobe, binary_path, 10921 10850 func_offset, pid, ref_ctr_off);
+37 -13
tools/lib/bpf/libbpf.h
··· 447 447 bpf_program__attach(const struct bpf_program *prog); 448 448 449 449 struct bpf_perf_event_opts { 450 - /* size of this struct, for forward/backward compatiblity */ 450 + /* size of this struct, for forward/backward compatibility */ 451 451 size_t sz; 452 452 /* custom user-provided value fetchable through bpf_get_attach_cookie() */ 453 453 __u64 bpf_cookie; 454 + /* don't use BPF link when attach BPF program */ 455 + bool force_ioctl_attach; 456 + size_t :0; 454 457 }; 455 - #define bpf_perf_event_opts__last_field bpf_cookie 458 + #define bpf_perf_event_opts__last_field force_ioctl_attach 456 459 457 460 LIBBPF_API struct bpf_link * 458 461 bpf_program__attach_perf_event(const struct bpf_program *prog, int pfd); ··· 464 461 bpf_program__attach_perf_event_opts(const struct bpf_program *prog, int pfd, 465 462 const struct bpf_perf_event_opts *opts); 466 463 464 + /** 465 + * enum probe_attach_mode - the mode to attach kprobe/uprobe 466 + * 467 + * force libbpf to attach kprobe/uprobe in specific mode, -ENOTSUP will 468 + * be returned if it is not supported by the kernel. 
469 + */ 470 + enum probe_attach_mode { 471 + /* attach probe in latest supported mode by kernel */ 472 + PROBE_ATTACH_MODE_DEFAULT = 0, 473 + /* attach probe in legacy mode, using debugfs/tracefs */ 474 + PROBE_ATTACH_MODE_LEGACY, 475 + /* create perf event with perf_event_open() syscall */ 476 + PROBE_ATTACH_MODE_PERF, 477 + /* attach probe with BPF link */ 478 + PROBE_ATTACH_MODE_LINK, 479 + }; 480 + 467 481 struct bpf_kprobe_opts { 468 - /* size of this struct, for forward/backward compatiblity */ 482 + /* size of this struct, for forward/backward compatibility */ 469 483 size_t sz; 470 484 /* custom user-provided value fetchable through bpf_get_attach_cookie() */ 471 485 __u64 bpf_cookie; ··· 490 470 size_t offset; 491 471 /* kprobe is return probe */ 492 472 bool retprobe; 473 + /* kprobe attach mode */ 474 + enum probe_attach_mode attach_mode; 493 475 size_t :0; 494 476 }; 495 - #define bpf_kprobe_opts__last_field retprobe 477 + #define bpf_kprobe_opts__last_field attach_mode 496 478 497 479 LIBBPF_API struct bpf_link * 498 480 bpf_program__attach_kprobe(const struct bpf_program *prog, bool retprobe, ··· 528 506 const struct bpf_kprobe_multi_opts *opts); 529 507 530 508 struct bpf_ksyscall_opts { 531 - /* size of this struct, for forward/backward compatiblity */ 509 + /* size of this struct, for forward/backward compatibility */ 532 510 size_t sz; 533 511 /* custom user-provided value fetchable through bpf_get_attach_cookie() */ 534 512 __u64 bpf_cookie; ··· 574 552 const struct bpf_ksyscall_opts *opts); 575 553 576 554 struct bpf_uprobe_opts { 577 - /* size of this struct, for forward/backward compatiblity */ 555 + /* size of this struct, for forward/backward compatibility */ 578 556 size_t sz; 579 557 /* offset of kernel reference counted USDT semaphore, added in 580 558 * a6ca88b241d5 ("trace_uprobe: support reference counter in fd-based uprobe") ··· 592 570 * binary_path. 
593 571 */ 594 572 const char *func_name; 573 + /* uprobe attach mode */ 574 + enum probe_attach_mode attach_mode; 595 575 size_t :0; 596 576 }; 597 - #define bpf_uprobe_opts__last_field func_name 577 + #define bpf_uprobe_opts__last_field attach_mode 598 578 599 579 /** 600 580 * @brief **bpf_program__attach_uprobe()** attaches a BPF program ··· 670 646 const struct bpf_usdt_opts *opts); 671 647 672 648 struct bpf_tracepoint_opts { 673 - /* size of this struct, for forward/backward compatiblity */ 649 + /* size of this struct, for forward/backward compatibility */ 674 650 size_t sz; 675 651 /* custom user-provided value fetchable through bpf_get_attach_cookie() */ 676 652 __u64 bpf_cookie; ··· 1134 1110 typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size); 1135 1111 1136 1112 struct ring_buffer_opts { 1137 - size_t sz; /* size of this struct, for forward/backward compatiblity */ 1113 + size_t sz; /* size of this struct, for forward/backward compatibility */ 1138 1114 }; 1139 1115 1140 1116 #define ring_buffer_opts__last_field sz ··· 1499 1475 bpf_object__destroy_subskeleton(struct bpf_object_subskeleton *s); 1500 1476 1501 1477 struct gen_loader_opts { 1502 - size_t sz; /* size of this struct, for forward/backward compatiblity */ 1478 + size_t sz; /* size of this struct, for forward/backward compatibility */ 1503 1479 const char *data; 1504 1480 const char *insns; 1505 1481 __u32 data_sz; ··· 1517 1493 }; 1518 1494 1519 1495 struct bpf_linker_opts { 1520 - /* size of this struct, for forward/backward compatiblity */ 1496 + /* size of this struct, for forward/backward compatibility */ 1521 1497 size_t sz; 1522 1498 }; 1523 1499 #define bpf_linker_opts__last_field sz 1524 1500 1525 1501 struct bpf_linker_file_opts { 1526 - /* size of this struct, for forward/backward compatiblity */ 1502 + /* size of this struct, for forward/backward compatibility */ 1527 1503 size_t sz; 1528 1504 }; 1529 1505 #define bpf_linker_file_opts__last_field sz ··· 1566 
1542 struct bpf_link **link); 1567 1543 1568 1544 struct libbpf_prog_handler_opts { 1569 - /* size of this struct, for forward/backward compatiblity */ 1545 + /* size of this struct, for forward/backward compatibility */ 1570 1546 size_t sz; 1571 1547 /* User-provided value that is passed to prog_setup_fn, 1572 1548 * prog_prepare_load_fn, and prog_attach_fn callbacks. Allows user to
+2 -9
tools/lib/bpf/linker.c
··· 1997 1997 static int linker_append_elf_relos(struct bpf_linker *linker, struct src_obj *obj) 1998 1998 { 1999 1999 struct src_sec *src_symtab = &obj->secs[obj->symtab_sec_idx]; 2000 - struct dst_sec *dst_symtab; 2001 2000 int i, err; 2002 2001 2003 2002 for (i = 1; i < obj->sec_cnt; i++) { ··· 2029 2030 return -1; 2030 2031 } 2031 2032 2032 - /* add_dst_sec() above could have invalidated linker->secs */ 2033 - dst_symtab = &linker->secs[linker->symtab_sec_idx]; 2034 - 2035 2033 /* shdr->sh_link points to SYMTAB */ 2036 2034 dst_sec->shdr->sh_link = linker->symtab_sec_idx; 2037 2035 ··· 2045 2049 dst_rel = dst_sec->raw_data + src_sec->dst_off; 2046 2050 n = src_sec->shdr->sh_size / src_sec->shdr->sh_entsize; 2047 2051 for (j = 0; j < n; j++, src_rel++, dst_rel++) { 2048 - size_t src_sym_idx = ELF64_R_SYM(src_rel->r_info); 2049 - size_t sym_type = ELF64_R_TYPE(src_rel->r_info); 2050 - Elf64_Sym *src_sym, *dst_sym; 2051 - size_t dst_sym_idx; 2052 + size_t src_sym_idx, dst_sym_idx, sym_type; 2053 + Elf64_Sym *src_sym; 2052 2054 2053 2055 src_sym_idx = ELF64_R_SYM(src_rel->r_info); 2054 2056 src_sym = src_symtab->data->d_buf + sizeof(*src_sym) * src_sym_idx; 2055 2057 2056 2058 dst_sym_idx = obj->sym_map[src_sym_idx]; 2057 - dst_sym = dst_symtab->raw_data + sizeof(*dst_sym) * dst_sym_idx; 2058 2059 dst_rel->r_offset += src_linked_sec->dst_off; 2059 2060 sym_type = ELF64_R_TYPE(src_rel->r_info); 2060 2061 dst_rel->r_info = ELF64_R_INFO(dst_sym_idx, sym_type);
+7 -1
tools/lib/bpf/netlink.c
··· 468 468 return 0; 469 469 470 470 err = libbpf_netlink_resolve_genl_family_id("netdev", sizeof("netdev"), &id); 471 - if (err < 0) 471 + if (err < 0) { 472 + if (err == -ENOENT) { 473 + opts->feature_flags = 0; 474 + goto skip_feature_flags; 475 + } 472 476 return libbpf_err(err); 477 + } 473 478 474 479 memset(&req, 0, sizeof(req)); 475 480 req.nh.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN); ··· 494 489 495 490 opts->feature_flags = md.flags; 496 491 492 + skip_feature_flags: 497 493 return 0; 498 494 } 499 495
-3
tools/lib/bpf/relo_core.c
··· 1551 1551 if (level <= 0) 1552 1552 return -EINVAL; 1553 1553 1554 - local_t = btf_type_by_id(local_btf, local_id); 1555 - targ_t = btf_type_by_id(targ_btf, targ_id); 1556 - 1557 1554 recur: 1558 1555 depth--; 1559 1556 if (depth < 0)
+328
tools/lib/bpf/zip.c
··· 1 + // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) 2 + /* 3 + * Routines for dealing with .zip archives. 4 + * 5 + * Copyright (c) Meta Platforms, Inc. and affiliates. 6 + */ 7 + 8 + #include <errno.h> 9 + #include <fcntl.h> 10 + #include <stdint.h> 11 + #include <stdlib.h> 12 + #include <string.h> 13 + #include <sys/mman.h> 14 + #include <unistd.h> 15 + 16 + #include "libbpf_internal.h" 17 + #include "zip.h" 18 + 19 + /* Specification of ZIP file format can be found here: 20 + * https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT 21 + * For a high level overview of the structure of a ZIP file see 22 + * sections 4.3.1 - 4.3.6. 23 + * 24 + * Data structures appearing in ZIP files do not contain any 25 + * padding and they might be misaligned. To allow us to safely 26 + * operate on pointers to such structures and their members, we 27 + * declare the types as packed. 28 + */ 29 + 30 + #define END_OF_CD_RECORD_MAGIC 0x06054b50 31 + 32 + /* See section 4.3.16 of the spec. */ 33 + struct end_of_cd_record { 34 + /* Magic value equal to END_OF_CD_RECORD_MAGIC */ 35 + __u32 magic; 36 + 37 + /* Number of the file containing this structure or 0xFFFF if ZIP64 archive. 38 + * Zip archive might span multiple files (disks). 39 + */ 40 + __u16 this_disk; 41 + 42 + /* Number of the file containing the beginning of the central directory or 43 + * 0xFFFF if ZIP64 archive. 44 + */ 45 + __u16 cd_disk; 46 + 47 + /* Number of central directory records on this disk or 0xFFFF if ZIP64 48 + * archive. 49 + */ 50 + __u16 cd_records; 51 + 52 + /* Number of central directory records on all disks or 0xFFFF if ZIP64 53 + * archive. 54 + */ 55 + __u16 cd_records_total; 56 + 57 + /* Size of the central directory record or 0xFFFFFFFF if ZIP64 archive. */ 58 + __u32 cd_size; 59 + 60 + /* Offset of the central directory from the beginning of the archive or 61 + * 0xFFFFFFFF if ZIP64 archive. 
62 + */ 63 + __u32 cd_offset; 64 + 65 + /* Length of comment data following end of central directory record. */ 66 + __u16 comment_length; 67 + 68 + /* Up to 64k of arbitrary bytes. */ 69 + /* uint8_t comment[comment_length] */ 70 + } __attribute__((packed)); 71 + 72 + #define CD_FILE_HEADER_MAGIC 0x02014b50 73 + #define FLAG_ENCRYPTED (1 << 0) 74 + #define FLAG_HAS_DATA_DESCRIPTOR (1 << 3) 75 + 76 + /* See section 4.3.12 of the spec. */ 77 + struct cd_file_header { 78 + /* Magic value equal to CD_FILE_HEADER_MAGIC. */ 79 + __u32 magic; 80 + __u16 version; 81 + /* Minimum zip version needed to extract the file. */ 82 + __u16 min_version; 83 + __u16 flags; 84 + __u16 compression; 85 + __u16 last_modified_time; 86 + __u16 last_modified_date; 87 + __u32 crc; 88 + __u32 compressed_size; 89 + __u32 uncompressed_size; 90 + __u16 file_name_length; 91 + __u16 extra_field_length; 92 + __u16 file_comment_length; 93 + /* Number of the disk where the file starts or 0xFFFF if ZIP64 archive. */ 94 + __u16 disk; 95 + __u16 internal_attributes; 96 + __u32 external_attributes; 97 + /* Offset from the start of the disk containing the local file header to the 98 + * start of the local file header. 99 + */ 100 + __u32 offset; 101 + } __attribute__((packed)); 102 + 103 + #define LOCAL_FILE_HEADER_MAGIC 0x04034b50 104 + 105 + /* See section 4.3.7 of the spec. */ 106 + struct local_file_header { 107 + /* Magic value equal to LOCAL_FILE_HEADER_MAGIC. */ 108 + __u32 magic; 109 + /* Minimum zip version needed to extract the file. 
*/ 110 + __u16 min_version; 111 + __u16 flags; 112 + __u16 compression; 113 + __u16 last_modified_time; 114 + __u16 last_modified_date; 115 + __u32 crc; 116 + __u32 compressed_size; 117 + __u32 uncompressed_size; 118 + __u16 file_name_length; 119 + __u16 extra_field_length; 120 + } __attribute__((packed)); 121 + 122 + struct zip_archive { 123 + void *data; 124 + __u32 size; 125 + __u32 cd_offset; 126 + __u32 cd_records; 127 + }; 128 + 129 + static void *check_access(struct zip_archive *archive, __u32 offset, __u32 size) 130 + { 131 + if (offset + size > archive->size || offset > offset + size) 132 + return NULL; 133 + 134 + return archive->data + offset; 135 + } 136 + 137 + /* Returns 0 on success, -EINVAL on error and -ENOTSUP if the eocd indicates the 138 + * archive uses features which are not supported. 139 + */ 140 + static int try_parse_end_of_cd(struct zip_archive *archive, __u32 offset) 141 + { 142 + __u16 comment_length, cd_records; 143 + struct end_of_cd_record *eocd; 144 + __u32 cd_offset, cd_size; 145 + 146 + eocd = check_access(archive, offset, sizeof(*eocd)); 147 + if (!eocd || eocd->magic != END_OF_CD_RECORD_MAGIC) 148 + return -EINVAL; 149 + 150 + comment_length = eocd->comment_length; 151 + if (offset + sizeof(*eocd) + comment_length != archive->size) 152 + return -EINVAL; 153 + 154 + cd_records = eocd->cd_records; 155 + if (eocd->this_disk != 0 || eocd->cd_disk != 0 || eocd->cd_records_total != cd_records) 156 + /* This is a valid eocd, but we only support single-file non-ZIP64 archives. 
*/ 157 + return -ENOTSUP; 158 + 159 + cd_offset = eocd->cd_offset; 160 + cd_size = eocd->cd_size; 161 + if (!check_access(archive, cd_offset, cd_size)) 162 + return -EINVAL; 163 + 164 + archive->cd_offset = cd_offset; 165 + archive->cd_records = cd_records; 166 + return 0; 167 + } 168 + 169 + static int find_cd(struct zip_archive *archive) 170 + { 171 + int rc = -EINVAL; 172 + int64_t limit; 173 + __u32 offset; 174 + 175 + if (archive->size <= sizeof(struct end_of_cd_record)) 176 + return -EINVAL; 177 + 178 + /* Because the end of central directory ends with a variable length array of 179 + * up to 0xFFFF bytes we can't know exactly where it starts and need to 180 + * search for it at the end of the file, scanning the (limit, offset] range. 181 + */ 182 + offset = archive->size - sizeof(struct end_of_cd_record); 183 + limit = (int64_t)offset - (1 << 16); 184 + 185 + for (; offset >= 0 && offset > limit && rc != 0; offset--) { 186 + rc = try_parse_end_of_cd(archive, offset); 187 + if (rc == -ENOTSUP) 188 + break; 189 + } 190 + return rc; 191 + } 192 + 193 + struct zip_archive *zip_archive_open(const char *path) 194 + { 195 + struct zip_archive *archive; 196 + int err, fd; 197 + off_t size; 198 + void *data; 199 + 200 + fd = open(path, O_RDONLY | O_CLOEXEC); 201 + if (fd < 0) 202 + return ERR_PTR(-errno); 203 + 204 + size = lseek(fd, 0, SEEK_END); 205 + if (size == (off_t)-1 || size > UINT32_MAX) { 206 + close(fd); 207 + return ERR_PTR(-EINVAL); 208 + } 209 + 210 + data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); 211 + err = -errno; 212 + close(fd); 213 + 214 + if (data == MAP_FAILED) 215 + return ERR_PTR(err); 216 + 217 + archive = malloc(sizeof(*archive)); 218 + if (!archive) { 219 + munmap(data, size); 220 + return ERR_PTR(-ENOMEM); 221 + }; 222 + 223 + archive->data = data; 224 + archive->size = size; 225 + 226 + err = find_cd(archive); 227 + if (err) { 228 + munmap(data, size); 229 + free(archive); 230 + return ERR_PTR(err); 231 + } 232 + 233 + return 
archive; 234 + } 235 + 236 + void zip_archive_close(struct zip_archive *archive) 237 + { 238 + munmap(archive->data, archive->size); 239 + free(archive); 240 + } 241 + 242 + static struct local_file_header *local_file_header_at_offset(struct zip_archive *archive, 243 + __u32 offset) 244 + { 245 + struct local_file_header *lfh; 246 + 247 + lfh = check_access(archive, offset, sizeof(*lfh)); 248 + if (!lfh || lfh->magic != LOCAL_FILE_HEADER_MAGIC) 249 + return NULL; 250 + 251 + return lfh; 252 + } 253 + 254 + static int get_entry_at_offset(struct zip_archive *archive, __u32 offset, struct zip_entry *out) 255 + { 256 + struct local_file_header *lfh; 257 + __u32 compressed_size; 258 + const char *name; 259 + void *data; 260 + 261 + lfh = local_file_header_at_offset(archive, offset); 262 + if (!lfh) 263 + return -EINVAL; 264 + 265 + offset += sizeof(*lfh); 266 + if ((lfh->flags & FLAG_ENCRYPTED) || (lfh->flags & FLAG_HAS_DATA_DESCRIPTOR)) 267 + return -EINVAL; 268 + 269 + name = check_access(archive, offset, lfh->file_name_length); 270 + if (!name) 271 + return -EINVAL; 272 + 273 + offset += lfh->file_name_length; 274 + if (!check_access(archive, offset, lfh->extra_field_length)) 275 + return -EINVAL; 276 + 277 + offset += lfh->extra_field_length; 278 + compressed_size = lfh->compressed_size; 279 + data = check_access(archive, offset, compressed_size); 280 + if (!data) 281 + return -EINVAL; 282 + 283 + out->compression = lfh->compression; 284 + out->name_length = lfh->file_name_length; 285 + out->name = name; 286 + out->data = data; 287 + out->data_length = compressed_size; 288 + out->data_offset = offset; 289 + 290 + return 0; 291 + } 292 + 293 + int zip_archive_find_entry(struct zip_archive *archive, const char *file_name, 294 + struct zip_entry *out) 295 + { 296 + size_t file_name_length = strlen(file_name); 297 + __u32 i, offset = archive->cd_offset; 298 + 299 + for (i = 0; i < archive->cd_records; ++i) { 300 + __u16 cdfh_name_length, cdfh_flags; 301 + struct 
cd_file_header *cdfh; 302 + const char *cdfh_name; 303 + 304 + cdfh = check_access(archive, offset, sizeof(*cdfh)); 305 + if (!cdfh || cdfh->magic != CD_FILE_HEADER_MAGIC) 306 + return -EINVAL; 307 + 308 + offset += sizeof(*cdfh); 309 + cdfh_name_length = cdfh->file_name_length; 310 + cdfh_name = check_access(archive, offset, cdfh_name_length); 311 + if (!cdfh_name) 312 + return -EINVAL; 313 + 314 + cdfh_flags = cdfh->flags; 315 + if ((cdfh_flags & FLAG_ENCRYPTED) == 0 && 316 + (cdfh_flags & FLAG_HAS_DATA_DESCRIPTOR) == 0 && 317 + file_name_length == cdfh_name_length && 318 + memcmp(file_name, archive->data + offset, file_name_length) == 0) { 319 + return get_entry_at_offset(archive, cdfh->offset, out); 320 + } 321 + 322 + offset += cdfh_name_length; 323 + offset += cdfh->extra_field_length; 324 + offset += cdfh->file_comment_length; 325 + } 326 + 327 + return -ENOENT; 328 + }
+47
tools/lib/bpf/zip.h
··· 1 + /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ 2 + 3 + #ifndef __LIBBPF_ZIP_H 4 + #define __LIBBPF_ZIP_H 5 + 6 + #include <linux/types.h> 7 + 8 + /* Represents an open zip archive. 9 + * Only basic ZIP files are supported, in particular the following are not 10 + * supported: 11 + * - encryption 12 + * - streaming 13 + * - multi-part ZIP files 14 + * - ZIP64 15 + */ 16 + struct zip_archive; 17 + 18 + /* Carries information on name, compression method, and data corresponding to a 19 + * file in a zip archive. 20 + */ 21 + struct zip_entry { 22 + /* Compression method as defined in pkzip spec. 0 means data is uncompressed. */ 23 + __u16 compression; 24 + 25 + /* Non-null terminated name of the file. */ 26 + const char *name; 27 + /* Length of the file name. */ 28 + __u16 name_length; 29 + 30 + /* Pointer to the file data. */ 31 + const void *data; 32 + /* Length of the file data. */ 33 + __u32 data_length; 34 + /* Offset of the file data within the archive. */ 35 + __u32 data_offset; 36 + }; 37 + 38 + /* Open a zip archive. Returns NULL in case of an error. */ 39 + struct zip_archive *zip_archive_open(const char *path); 40 + 41 + /* Close a zip archive and release resources. */ 42 + void zip_archive_close(struct zip_archive *archive); 43 + 44 + /* Look up an entry corresponding to a file in given zip archive. */ 45 + int zip_archive_find_entry(struct zip_archive *archive, const char *name, struct zip_entry *out); 46 + 47 + #endif
+2
tools/scripts/Makefile.include
··· 108 108 endif # CLANG_CROSS_FLAGS 109 109 CFLAGS += $(CLANG_CROSS_FLAGS) 110 110 AFLAGS += $(CLANG_CROSS_FLAGS) 111 + else 112 + CLANG_CROSS_FLAGS := 111 113 endif # CROSS_COMPILE 112 114 113 115 # Hack to avoid type-punned warnings on old systems such as RHEL5:
+2
tools/testing/selftests/bpf/DENYLIST.s390x
··· 4 4 bpf_cookie # failed to open_and_load program: -524 (trampoline) 5 5 bpf_loop # attaches to __x64_sys_nanosleep 6 6 cgrp_local_storage # prog_attach unexpected error: -524 (trampoline) 7 + dynptr/test_dynptr_skb_data 8 + dynptr/test_skb_readonly 7 9 fexit_sleep # fexit_skel_load fexit skeleton failed (trampoline) 8 10 get_stack_raw_tp # user_stack corrupted user stack (no backchain userspace) 9 11 kprobe_multi_bench_attach # bpf_program__attach_kprobe_multi_opts unexpected error: -95
+4 -3
tools/testing/selftests/bpf/Makefile
··· 338 338 define get_sys_includes 339 339 $(shell $(1) $(2) -v -E - </dev/null 2>&1 \ 340 340 | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }') \ 341 - $(shell $(1) $(2) -dM -E - </dev/null | grep '__riscv_xlen ' | awk '{printf("-D__riscv_xlen=%d -D__BITS_PER_LONG=%d", $$3, $$3)}') 341 + $(shell $(1) $(2) -dM -E - </dev/null | grep '__riscv_xlen ' | awk '{printf("-D__riscv_xlen=%d -D__BITS_PER_LONG=%d", $$3, $$3)}') \ 342 + $(shell $(1) $(2) -dM -E - </dev/null | grep '__loongarch_grlen ' | awk '{printf("-D__BITS_PER_LONG=%d", $$3)}') 342 343 endef 343 344 344 345 # Determine target endianness. ··· 357 356 -I$(abspath $(OUTPUT)/../usr/include) 358 357 359 358 CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \ 360 - -Wno-compare-distinct-pointer-types 359 + -Wno-compare-distinct-pointer-types -Wuninitialized 361 360 362 361 $(OUTPUT)/test_l4lb_noinline.o: BPF_CFLAGS += -fno-inline 363 362 $(OUTPUT)/test_xdp_noinline.o: BPF_CFLAGS += -fno-inline ··· 559 558 TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c \ 560 559 network_helpers.c testing_helpers.c \ 561 560 btf_helpers.c flow_dissector_load.h \ 562 - cap_helpers.c test_loader.c xsk.c 561 + cap_helpers.c test_loader.c xsk.c disasm.c 563 562 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \ 564 563 $(OUTPUT)/liburandom_read.so \ 565 564 $(OUTPUT)/xdp_synproxy \
+38
tools/testing/selftests/bpf/bpf_kfuncs.h
··· 1 + #ifndef __BPF_KFUNCS__ 2 + #define __BPF_KFUNCS__ 3 + 4 + /* Description 5 + * Initializes an skb-type dynptr 6 + * Returns 7 + * Error code 8 + */ 9 + extern int bpf_dynptr_from_skb(struct __sk_buff *skb, __u64 flags, 10 + struct bpf_dynptr *ptr__uninit) __ksym; 11 + 12 + /* Description 13 + * Initializes an xdp-type dynptr 14 + * Returns 15 + * Error code 16 + */ 17 + extern int bpf_dynptr_from_xdp(struct xdp_md *xdp, __u64 flags, 18 + struct bpf_dynptr *ptr__uninit) __ksym; 19 + 20 + /* Description 21 + * Obtain a read-only pointer to the dynptr's data 22 + * Returns 23 + * Either a direct pointer to the dynptr data or a pointer to the user-provided 24 + * buffer if unable to obtain a direct pointer 25 + */ 26 + extern void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, __u32 offset, 27 + void *buffer, __u32 buffer__szk) __ksym; 28 + 29 + /* Description 30 + * Obtain a read-write pointer to the dynptr's data 31 + * Returns 32 + * Either a direct pointer to the dynptr data or a pointer to the user-provided 33 + * buffer if unable to obtain a direct pointer 34 + */ 35 + extern void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr *ptr, __u32 offset, 36 + void *buffer, __u32 buffer__szk) __ksym; 37 + 38 + #endif
+12 -6
tools/testing/selftests/bpf/prog_tests/align.c
··· 660 660 * func#0 @0 661 661 * 0: R1=ctx(off=0,imm=0) R10=fp0 662 662 * 0: (b7) r3 = 2 ; R3_w=2 663 + * 664 + * Sometimes it's actually two lines below, e.g. when 665 + * searching for "6: R3_w=scalar(umax=255,var_off=(0x0; 0xff))": 666 + * from 4 to 6: R0_w=pkt(off=8,r=8,imm=0) R1=ctx(off=0,imm=0) R2_w=pkt(off=0,r=8,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0 667 + * 6: R0_w=pkt(off=8,r=8,imm=0) R1=ctx(off=0,imm=0) R2_w=pkt(off=0,r=8,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0 668 + * 6: (71) r3 = *(u8 *)(r2 +0) ; R2_w=pkt(off=0,r=8,imm=0) R3_w=scalar(umax=255,var_off=(0x0; 0xff)) 663 669 */ 664 - if (!strstr(line_ptr, m.match)) { 670 + while (!strstr(line_ptr, m.match)) { 665 671 cur_line = -1; 666 672 line_ptr = strtok(NULL, "\n"); 667 - sscanf(line_ptr, "%u: ", &cur_line); 673 + sscanf(line_ptr ?: "", "%u: ", &cur_line); 674 + if (!line_ptr || cur_line != m.line) 675 + break; 668 676 } 669 - if (cur_line != m.line || !line_ptr || 670 - !strstr(line_ptr, m.match)) { 671 - printf("Failed to find match %u: %s\n", 672 - m.line, m.match); 677 + if (cur_line != m.line || !line_ptr || !strstr(line_ptr, m.match)) { 678 + printf("Failed to find match %u: %s\n", m.line, m.match); 673 679 ret = 1; 674 680 printf("%s", bpf_vlog); 675 681 break;
+207 -108
tools/testing/selftests/bpf/prog_tests/attach_probe.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 #include <test_progs.h> 3 + #include "test_attach_kprobe_sleepable.skel.h" 4 + #include "test_attach_probe_manual.skel.h" 3 5 #include "test_attach_probe.skel.h" 4 6 5 7 /* this is how USDT semaphore is actually defined, except volatile modifier */ ··· 25 23 asm volatile (""); 26 24 } 27 25 26 + /* attach point for ref_ctr */ 27 + static noinline void trigger_func4(void) 28 + { 29 + asm volatile (""); 30 + } 31 + 28 32 static char test_data[] = "test_data"; 29 33 30 - void test_attach_probe(void) 34 + /* manual attach kprobe/kretprobe/uprobe/uretprobe testings */ 35 + static void test_attach_probe_manual(enum probe_attach_mode attach_mode) 31 36 { 32 37 DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts); 38 + DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, kprobe_opts); 33 39 struct bpf_link *kprobe_link, *kretprobe_link; 34 40 struct bpf_link *uprobe_link, *uretprobe_link; 35 - struct test_attach_probe* skel; 36 - ssize_t uprobe_offset, ref_ctr_offset; 37 - struct bpf_link *uprobe_err_link; 38 - FILE *devnull; 39 - bool legacy; 41 + struct test_attach_probe_manual *skel; 42 + ssize_t uprobe_offset; 40 43 41 - /* Check if new-style kprobe/uprobe API is supported. 42 - * Kernels that support new FD-based kprobe and uprobe BPF attachment 43 - * through perf_event_open() syscall expose 44 - * /sys/bus/event_source/devices/kprobe/type and 45 - * /sys/bus/event_source/devices/uprobe/type files, respectively. They 46 - * contain magic numbers that are passed as "type" field of 47 - * perf_event_attr. Lack of such file in the system indicates legacy 48 - * kernel with old-style kprobe/uprobe attach interface through 49 - * creating per-probe event through tracefs. For such cases 50 - * ref_ctr_offset feature is not supported, so we don't test it. 
51 - */ 52 - legacy = access("/sys/bus/event_source/devices/kprobe/type", F_OK) != 0; 44 + skel = test_attach_probe_manual__open_and_load(); 45 + if (!ASSERT_OK_PTR(skel, "skel_kprobe_manual_open_and_load")) 46 + return; 53 47 54 48 uprobe_offset = get_uprobe_offset(&trigger_func); 55 49 if (!ASSERT_GE(uprobe_offset, 0, "uprobe_offset")) 56 - return; 57 - 58 - ref_ctr_offset = get_rel_offset((uintptr_t)&uprobe_ref_ctr); 59 - if (!ASSERT_GE(ref_ctr_offset, 0, "ref_ctr_offset")) 60 - return; 61 - 62 - skel = test_attach_probe__open(); 63 - if (!ASSERT_OK_PTR(skel, "skel_open")) 64 - return; 65 - 66 - /* sleepable kprobe test case needs flags set before loading */ 67 - if (!ASSERT_OK(bpf_program__set_flags(skel->progs.handle_kprobe_sleepable, 68 - BPF_F_SLEEPABLE), "kprobe_sleepable_flags")) 69 - goto cleanup; 70 - 71 - if (!ASSERT_OK(test_attach_probe__load(skel), "skel_load")) 72 - goto cleanup; 73 - if (!ASSERT_OK_PTR(skel->bss, "check_bss")) 74 50 goto cleanup; 75 51 76 52 /* manual-attach kprobe/kretprobe */ 77 - kprobe_link = bpf_program__attach_kprobe(skel->progs.handle_kprobe, 78 - false /* retprobe */, 79 - SYS_NANOSLEEP_KPROBE_NAME); 53 + kprobe_opts.attach_mode = attach_mode; 54 + kprobe_opts.retprobe = false; 55 + kprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe, 56 + SYS_NANOSLEEP_KPROBE_NAME, 57 + &kprobe_opts); 80 58 if (!ASSERT_OK_PTR(kprobe_link, "attach_kprobe")) 81 59 goto cleanup; 82 60 skel->links.handle_kprobe = kprobe_link; 83 61 84 - kretprobe_link = bpf_program__attach_kprobe(skel->progs.handle_kretprobe, 85 - true /* retprobe */, 86 - SYS_NANOSLEEP_KPROBE_NAME); 62 + kprobe_opts.retprobe = true; 63 + kretprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kretprobe, 64 + SYS_NANOSLEEP_KPROBE_NAME, 65 + &kprobe_opts); 87 66 if (!ASSERT_OK_PTR(kretprobe_link, "attach_kretprobe")) 88 67 goto cleanup; 89 68 skel->links.handle_kretprobe = kretprobe_link; 90 69 91 - /* auto-attachable kprobe and kretprobe */ 92 - 
skel->links.handle_kprobe_auto = bpf_program__attach(skel->progs.handle_kprobe_auto); 93 - ASSERT_OK_PTR(skel->links.handle_kprobe_auto, "attach_kprobe_auto"); 94 - 95 - skel->links.handle_kretprobe_auto = bpf_program__attach(skel->progs.handle_kretprobe_auto); 96 - ASSERT_OK_PTR(skel->links.handle_kretprobe_auto, "attach_kretprobe_auto"); 97 - 98 - if (!legacy) 99 - ASSERT_EQ(uprobe_ref_ctr, 0, "uprobe_ref_ctr_before"); 100 - 70 + /* manual-attach uprobe/uretprobe */ 71 + uprobe_opts.attach_mode = attach_mode; 72 + uprobe_opts.ref_ctr_offset = 0; 101 73 uprobe_opts.retprobe = false; 102 - uprobe_opts.ref_ctr_offset = legacy ? 0 : ref_ctr_offset; 103 74 uprobe_link = bpf_program__attach_uprobe_opts(skel->progs.handle_uprobe, 104 75 0 /* self pid */, 105 76 "/proc/self/exe", ··· 82 107 goto cleanup; 83 108 skel->links.handle_uprobe = uprobe_link; 84 109 85 - if (!legacy) 86 - ASSERT_GT(uprobe_ref_ctr, 0, "uprobe_ref_ctr_after"); 87 - 88 - /* if uprobe uses ref_ctr, uretprobe has to use ref_ctr as well */ 89 110 uprobe_opts.retprobe = true; 90 - uprobe_opts.ref_ctr_offset = legacy ? 
0 : ref_ctr_offset; 91 111 uretprobe_link = bpf_program__attach_uprobe_opts(skel->progs.handle_uretprobe, 92 112 -1 /* any pid */, 93 113 "/proc/self/exe", ··· 91 121 goto cleanup; 92 122 skel->links.handle_uretprobe = uretprobe_link; 93 123 94 - /* verify auto-attach fails for old-style uprobe definition */ 95 - uprobe_err_link = bpf_program__attach(skel->progs.handle_uprobe_byname); 96 - if (!ASSERT_EQ(libbpf_get_error(uprobe_err_link), -EOPNOTSUPP, 97 - "auto-attach should fail for old-style name")) 98 - goto cleanup; 99 - 124 + /* attach uprobe by function name manually */ 100 125 uprobe_opts.func_name = "trigger_func2"; 101 126 uprobe_opts.retprobe = false; 102 127 uprobe_opts.ref_ctr_offset = 0; ··· 103 138 if (!ASSERT_OK_PTR(skel->links.handle_uprobe_byname, "attach_uprobe_byname")) 104 139 goto cleanup; 105 140 141 + /* trigger & validate kprobe && kretprobe */ 142 + usleep(1); 143 + 144 + /* trigger & validate uprobe & uretprobe */ 145 + trigger_func(); 146 + 147 + /* trigger & validate uprobe attached by name */ 148 + trigger_func2(); 149 + 150 + ASSERT_EQ(skel->bss->kprobe_res, 1, "check_kprobe_res"); 151 + ASSERT_EQ(skel->bss->kretprobe_res, 2, "check_kretprobe_res"); 152 + ASSERT_EQ(skel->bss->uprobe_res, 3, "check_uprobe_res"); 153 + ASSERT_EQ(skel->bss->uretprobe_res, 4, "check_uretprobe_res"); 154 + ASSERT_EQ(skel->bss->uprobe_byname_res, 5, "check_uprobe_byname_res"); 155 + 156 + cleanup: 157 + test_attach_probe_manual__destroy(skel); 158 + } 159 + 160 + static void test_attach_probe_auto(struct test_attach_probe *skel) 161 + { 162 + struct bpf_link *uprobe_err_link; 163 + 164 + /* auto-attachable kprobe and kretprobe */ 165 + skel->links.handle_kprobe_auto = bpf_program__attach(skel->progs.handle_kprobe_auto); 166 + ASSERT_OK_PTR(skel->links.handle_kprobe_auto, "attach_kprobe_auto"); 167 + 168 + skel->links.handle_kretprobe_auto = bpf_program__attach(skel->progs.handle_kretprobe_auto); 169 + ASSERT_OK_PTR(skel->links.handle_kretprobe_auto, 
"attach_kretprobe_auto"); 170 + 171 + /* verify auto-attach fails for old-style uprobe definition */ 172 + uprobe_err_link = bpf_program__attach(skel->progs.handle_uprobe_byname); 173 + if (!ASSERT_EQ(libbpf_get_error(uprobe_err_link), -EOPNOTSUPP, 174 + "auto-attach should fail for old-style name")) 175 + return; 176 + 106 177 /* verify auto-attach works */ 107 178 skel->links.handle_uretprobe_byname = 108 179 bpf_program__attach(skel->progs.handle_uretprobe_byname); 109 180 if (!ASSERT_OK_PTR(skel->links.handle_uretprobe_byname, "attach_uretprobe_byname")) 110 - goto cleanup; 181 + return; 182 + 183 + /* trigger & validate kprobe && kretprobe */ 184 + usleep(1); 185 + 186 + /* trigger & validate uprobe attached by name */ 187 + trigger_func2(); 188 + 189 + ASSERT_EQ(skel->bss->kprobe2_res, 11, "check_kprobe_auto_res"); 190 + ASSERT_EQ(skel->bss->kretprobe2_res, 22, "check_kretprobe_auto_res"); 191 + ASSERT_EQ(skel->bss->uretprobe_byname_res, 6, "check_uretprobe_byname_res"); 192 + } 193 + 194 + static void test_uprobe_lib(struct test_attach_probe *skel) 195 + { 196 + DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts); 197 + FILE *devnull; 111 198 112 199 /* test attach by name for a library function, using the library 113 200 * as the binary argument. libc.so.6 will be resolved via dlopen()/dlinfo(). 
··· 172 155 "libc.so.6", 173 156 0, &uprobe_opts); 174 157 if (!ASSERT_OK_PTR(skel->links.handle_uprobe_byname2, "attach_uprobe_byname2")) 175 - goto cleanup; 158 + return; 176 159 177 160 uprobe_opts.func_name = "fclose"; 178 161 uprobe_opts.retprobe = true; ··· 182 165 "libc.so.6", 183 166 0, &uprobe_opts); 184 167 if (!ASSERT_OK_PTR(skel->links.handle_uretprobe_byname2, "attach_uretprobe_byname2")) 185 - goto cleanup; 186 - 187 - /* sleepable kprobes should not attach successfully */ 188 - skel->links.handle_kprobe_sleepable = bpf_program__attach(skel->progs.handle_kprobe_sleepable); 189 - if (!ASSERT_ERR_PTR(skel->links.handle_kprobe_sleepable, "attach_kprobe_sleepable")) 190 - goto cleanup; 191 - 192 - /* test sleepable uprobe and uretprobe variants */ 193 - skel->links.handle_uprobe_byname3_sleepable = bpf_program__attach(skel->progs.handle_uprobe_byname3_sleepable); 194 - if (!ASSERT_OK_PTR(skel->links.handle_uprobe_byname3_sleepable, "attach_uprobe_byname3_sleepable")) 195 - goto cleanup; 196 - 197 - skel->links.handle_uprobe_byname3 = bpf_program__attach(skel->progs.handle_uprobe_byname3); 198 - if (!ASSERT_OK_PTR(skel->links.handle_uprobe_byname3, "attach_uprobe_byname3")) 199 - goto cleanup; 200 - 201 - skel->links.handle_uretprobe_byname3_sleepable = bpf_program__attach(skel->progs.handle_uretprobe_byname3_sleepable); 202 - if (!ASSERT_OK_PTR(skel->links.handle_uretprobe_byname3_sleepable, "attach_uretprobe_byname3_sleepable")) 203 - goto cleanup; 204 - 205 - skel->links.handle_uretprobe_byname3 = bpf_program__attach(skel->progs.handle_uretprobe_byname3); 206 - if (!ASSERT_OK_PTR(skel->links.handle_uretprobe_byname3, "attach_uretprobe_byname3")) 207 - goto cleanup; 208 - 209 - skel->bss->user_ptr = test_data; 210 - 211 - /* trigger & validate kprobe && kretprobe */ 212 - usleep(1); 168 + return; 213 169 214 170 /* trigger & validate shared library u[ret]probes attached by name */ 215 171 devnull = fopen("/dev/null", "r"); 216 172 fclose(devnull); 217 
173 218 - /* trigger & validate uprobe & uretprobe */ 219 - trigger_func(); 174 + ASSERT_EQ(skel->bss->uprobe_byname2_res, 7, "check_uprobe_byname2_res"); 175 + ASSERT_EQ(skel->bss->uretprobe_byname2_res, 8, "check_uretprobe_byname2_res"); 176 + } 220 177 221 - /* trigger & validate uprobe attached by name */ 222 - trigger_func2(); 178 + static void test_uprobe_ref_ctr(struct test_attach_probe *skel) 179 + { 180 + DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts); 181 + struct bpf_link *uprobe_link, *uretprobe_link; 182 + ssize_t uprobe_offset, ref_ctr_offset; 183 + 184 + uprobe_offset = get_uprobe_offset(&trigger_func4); 185 + if (!ASSERT_GE(uprobe_offset, 0, "uprobe_offset_ref_ctr")) 186 + return; 187 + 188 + ref_ctr_offset = get_rel_offset((uintptr_t)&uprobe_ref_ctr); 189 + if (!ASSERT_GE(ref_ctr_offset, 0, "ref_ctr_offset")) 190 + return; 191 + 192 + ASSERT_EQ(uprobe_ref_ctr, 0, "uprobe_ref_ctr_before"); 193 + 194 + uprobe_opts.retprobe = false; 195 + uprobe_opts.ref_ctr_offset = ref_ctr_offset; 196 + uprobe_link = bpf_program__attach_uprobe_opts(skel->progs.handle_uprobe_ref_ctr, 197 + 0 /* self pid */, 198 + "/proc/self/exe", 199 + uprobe_offset, 200 + &uprobe_opts); 201 + if (!ASSERT_OK_PTR(uprobe_link, "attach_uprobe_ref_ctr")) 202 + return; 203 + skel->links.handle_uprobe_ref_ctr = uprobe_link; 204 + 205 + ASSERT_GT(uprobe_ref_ctr, 0, "uprobe_ref_ctr_after"); 206 + 207 + /* if uprobe uses ref_ctr, uretprobe has to use ref_ctr as well */ 208 + uprobe_opts.retprobe = true; 209 + uprobe_opts.ref_ctr_offset = ref_ctr_offset; 210 + uretprobe_link = bpf_program__attach_uprobe_opts(skel->progs.handle_uretprobe_ref_ctr, 211 + -1 /* any pid */, 212 + "/proc/self/exe", 213 + uprobe_offset, &uprobe_opts); 214 + if (!ASSERT_OK_PTR(uretprobe_link, "attach_uretprobe_ref_ctr")) 215 + return; 216 + skel->links.handle_uretprobe_ref_ctr = uretprobe_link; 217 + } 218 + 219 + static void test_kprobe_sleepable(void) 220 + { 221 + struct test_attach_kprobe_sleepable *skel; 222 
+ 223 + skel = test_attach_kprobe_sleepable__open(); 224 + if (!ASSERT_OK_PTR(skel, "skel_kprobe_sleepable_open")) 225 + return; 226 + 227 + /* sleepable kprobe test case needs flags set before loading */ 228 + if (!ASSERT_OK(bpf_program__set_flags(skel->progs.handle_kprobe_sleepable, 229 + BPF_F_SLEEPABLE), "kprobe_sleepable_flags")) 230 + goto cleanup; 231 + 232 + if (!ASSERT_OK(test_attach_kprobe_sleepable__load(skel), 233 + "skel_kprobe_sleepable_load")) 234 + goto cleanup; 235 + 236 + /* sleepable kprobes should not attach successfully */ 237 + skel->links.handle_kprobe_sleepable = bpf_program__attach(skel->progs.handle_kprobe_sleepable); 238 + ASSERT_ERR_PTR(skel->links.handle_kprobe_sleepable, "attach_kprobe_sleepable"); 239 + 240 + cleanup: 241 + test_attach_kprobe_sleepable__destroy(skel); 242 + } 243 + 244 + static void test_uprobe_sleepable(struct test_attach_probe *skel) 245 + { 246 + /* test sleepable uprobe and uretprobe variants */ 247 + skel->links.handle_uprobe_byname3_sleepable = bpf_program__attach(skel->progs.handle_uprobe_byname3_sleepable); 248 + if (!ASSERT_OK_PTR(skel->links.handle_uprobe_byname3_sleepable, "attach_uprobe_byname3_sleepable")) 249 + return; 250 + 251 + skel->links.handle_uprobe_byname3 = bpf_program__attach(skel->progs.handle_uprobe_byname3); 252 + if (!ASSERT_OK_PTR(skel->links.handle_uprobe_byname3, "attach_uprobe_byname3")) 253 + return; 254 + 255 + skel->links.handle_uretprobe_byname3_sleepable = bpf_program__attach(skel->progs.handle_uretprobe_byname3_sleepable); 256 + if (!ASSERT_OK_PTR(skel->links.handle_uretprobe_byname3_sleepable, "attach_uretprobe_byname3_sleepable")) 257 + return; 258 + 259 + skel->links.handle_uretprobe_byname3 = bpf_program__attach(skel->progs.handle_uretprobe_byname3); 260 + if (!ASSERT_OK_PTR(skel->links.handle_uretprobe_byname3, "attach_uretprobe_byname3")) 261 + return; 262 + 263 + skel->bss->user_ptr = test_data; 223 264 224 265 /* trigger & validate sleepable uprobe attached by name */ 225 
266 trigger_func3(); 226 267 227 - ASSERT_EQ(skel->bss->kprobe_res, 1, "check_kprobe_res"); 228 - ASSERT_EQ(skel->bss->kprobe2_res, 11, "check_kprobe_auto_res"); 229 - ASSERT_EQ(skel->bss->kretprobe_res, 2, "check_kretprobe_res"); 230 - ASSERT_EQ(skel->bss->kretprobe2_res, 22, "check_kretprobe_auto_res"); 231 - ASSERT_EQ(skel->bss->uprobe_res, 3, "check_uprobe_res"); 232 - ASSERT_EQ(skel->bss->uretprobe_res, 4, "check_uretprobe_res"); 233 - ASSERT_EQ(skel->bss->uprobe_byname_res, 5, "check_uprobe_byname_res"); 234 - ASSERT_EQ(skel->bss->uretprobe_byname_res, 6, "check_uretprobe_byname_res"); 235 - ASSERT_EQ(skel->bss->uprobe_byname2_res, 7, "check_uprobe_byname2_res"); 236 - ASSERT_EQ(skel->bss->uretprobe_byname2_res, 8, "check_uretprobe_byname2_res"); 237 268 ASSERT_EQ(skel->bss->uprobe_byname3_sleepable_res, 9, "check_uprobe_byname3_sleepable_res"); 238 269 ASSERT_EQ(skel->bss->uprobe_byname3_res, 10, "check_uprobe_byname3_res"); 239 270 ASSERT_EQ(skel->bss->uretprobe_byname3_sleepable_res, 11, "check_uretprobe_byname3_sleepable_res"); 240 271 ASSERT_EQ(skel->bss->uretprobe_byname3_res, 12, "check_uretprobe_byname3_res"); 272 + } 273 + 274 + void test_attach_probe(void) 275 + { 276 + struct test_attach_probe *skel; 277 + 278 + skel = test_attach_probe__open(); 279 + if (!ASSERT_OK_PTR(skel, "skel_open")) 280 + return; 281 + 282 + if (!ASSERT_OK(test_attach_probe__load(skel), "skel_load")) 283 + goto cleanup; 284 + if (!ASSERT_OK_PTR(skel->bss, "check_bss")) 285 + goto cleanup; 286 + 287 + if (test__start_subtest("manual-default")) 288 + test_attach_probe_manual(PROBE_ATTACH_MODE_DEFAULT); 289 + if (test__start_subtest("manual-legacy")) 290 + test_attach_probe_manual(PROBE_ATTACH_MODE_LEGACY); 291 + if (test__start_subtest("manual-perf")) 292 + test_attach_probe_manual(PROBE_ATTACH_MODE_PERF); 293 + if (test__start_subtest("manual-link")) 294 + test_attach_probe_manual(PROBE_ATTACH_MODE_LINK); 295 + 296 + if (test__start_subtest("auto")) 297 + 
test_attach_probe_auto(skel); 298 + if (test__start_subtest("kprobe-sleepable")) 299 + test_kprobe_sleepable(); 300 + if (test__start_subtest("uprobe-lib")) 301 + test_uprobe_lib(skel); 302 + if (test__start_subtest("uprobe-sleepable")) 303 + test_uprobe_sleepable(skel); 304 + if (test__start_subtest("uprobe-ref_ctr")) 305 + test_uprobe_ref_ctr(skel); 241 306 242 307 cleanup: 243 308 test_attach_probe__destroy(skel);
+1
tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
··· 84 84 "test_cgrp_xchg_release", 85 85 "test_cgrp_get_release", 86 86 "test_cgrp_get_ancestors", 87 + "test_cgrp_from_id", 87 88 }; 88 89 89 90 void test_cgrp_kfunc(void)
+7 -7
tools/testing/selftests/bpf/prog_tests/cgrp_local_storage.c
··· 193 193 cgrp_ls_sleepable__destroy(skel); 194 194 } 195 195 196 - static void test_no_rcu_lock(__u64 cgroup_id) 196 + static void test_yes_rcu_lock(__u64 cgroup_id) 197 197 { 198 198 struct cgrp_ls_sleepable *skel; 199 199 int err; ··· 204 204 205 205 skel->bss->target_pid = syscall(SYS_gettid); 206 206 207 - bpf_program__set_autoload(skel->progs.no_rcu_lock, true); 207 + bpf_program__set_autoload(skel->progs.yes_rcu_lock, true); 208 208 err = cgrp_ls_sleepable__load(skel); 209 209 if (!ASSERT_OK(err, "skel_load")) 210 210 goto out; ··· 220 220 cgrp_ls_sleepable__destroy(skel); 221 221 } 222 222 223 - static void test_rcu_lock(void) 223 + static void test_no_rcu_lock(void) 224 224 { 225 225 struct cgrp_ls_sleepable *skel; 226 226 int err; ··· 229 229 if (!ASSERT_OK_PTR(skel, "skel_open")) 230 230 return; 231 231 232 - bpf_program__set_autoload(skel->progs.yes_rcu_lock, true); 232 + bpf_program__set_autoload(skel->progs.no_rcu_lock, true); 233 233 err = cgrp_ls_sleepable__load(skel); 234 234 ASSERT_ERR(err, "skel_load"); 235 235 ··· 256 256 test_negative(); 257 257 if (test__start_subtest("cgroup_iter_sleepable")) 258 258 test_cgroup_iter_sleepable(cgroup_fd, cgroup_id); 259 + if (test__start_subtest("yes_rcu_lock")) 260 + test_yes_rcu_lock(cgroup_id); 259 261 if (test__start_subtest("no_rcu_lock")) 260 - test_no_rcu_lock(cgroup_id); 261 - if (test__start_subtest("rcu_lock")) 262 - test_rcu_lock(); 262 + test_no_rcu_lock(); 263 263 264 264 close(cgroup_fd); 265 265 }
+25
tools/testing/selftests/bpf/prog_tests/cls_redirect.c
··· 13 13 14 14 #include "progs/test_cls_redirect.h" 15 15 #include "test_cls_redirect.skel.h" 16 + #include "test_cls_redirect_dynptr.skel.h" 16 17 #include "test_cls_redirect_subprogs.skel.h" 17 18 18 19 #define ENCAP_IP INADDR_LOOPBACK ··· 447 446 close_fds((int *)conns, sizeof(conns) / sizeof(conns[0][0])); 448 447 } 449 448 449 + static void test_cls_redirect_dynptr(void) 450 + { 451 + struct test_cls_redirect_dynptr *skel; 452 + int err; 453 + 454 + skel = test_cls_redirect_dynptr__open(); 455 + if (!ASSERT_OK_PTR(skel, "skel_open")) 456 + return; 457 + 458 + skel->rodata->ENCAPSULATION_IP = htonl(ENCAP_IP); 459 + skel->rodata->ENCAPSULATION_PORT = htons(ENCAP_PORT); 460 + 461 + err = test_cls_redirect_dynptr__load(skel); 462 + if (!ASSERT_OK(err, "skel_load")) 463 + goto cleanup; 464 + 465 + test_cls_redirect_common(skel->progs.cls_redirect); 466 + 467 + cleanup: 468 + test_cls_redirect_dynptr__destroy(skel); 469 + } 470 + 450 471 static void test_cls_redirect_inlined(void) 451 472 { 452 473 struct test_cls_redirect *skel; ··· 519 496 test_cls_redirect_inlined(); 520 497 if (test__start_subtest("cls_redirect_subprogs")) 521 498 test_cls_redirect_subprogs(); 499 + if (test__start_subtest("cls_redirect_dynptr")) 500 + test_cls_redirect_dynptr(); 522 501 }
+917
tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #include <limits.h> 4 + #include <stdio.h> 5 + #include <string.h> 6 + #include <ctype.h> 7 + #include <regex.h> 8 + #include <test_progs.h> 9 + 10 + #include "bpf/btf.h" 11 + #include "bpf_util.h" 12 + #include "linux/filter.h" 13 + #include "disasm.h" 14 + 15 + #define MAX_PROG_TEXT_SZ (32 * 1024) 16 + 17 + /* The code in this file serves the sole purpose of executing test cases 18 + * specified in the test_cases array. Each test case specifies a program 19 + * type, context field offset, and disassembly patterns that correspond 20 + * to read and write instructions generated by 21 + * verifier.c:convert_ctx_access() for accessing that field. 22 + * 23 + * For each test case, up to three programs are created: 24 + * - One that uses BPF_LDX_MEM to read the context field. 25 + * - One that uses BPF_STX_MEM to write to the context field. 26 + * - One that uses BPF_ST_MEM to write to the context field. 27 + * 28 + * The disassembly of each program is then compared with the pattern 29 + * specified in the test case. 30 + */ 31 + struct test_case { 32 + char *name; 33 + enum bpf_prog_type prog_type; 34 + enum bpf_attach_type expected_attach_type; 35 + int field_offset; 36 + int field_sz; 37 + /* Program generated for BPF_ST_MEM uses value 42 by default, 38 + * this field allows to specify custom value. 39 + */ 40 + struct { 41 + bool use; 42 + int value; 43 + } st_value; 44 + /* Pattern for BPF_LDX_MEM(field_sz, dst, ctx, field_offset) */ 45 + char *read; 46 + /* Pattern for BPF_STX_MEM(field_sz, ctx, src, field_offset) and 47 + * BPF_ST_MEM (field_sz, ctx, src, field_offset) 48 + */ 49 + char *write; 50 + /* Pattern for BPF_ST_MEM(field_sz, ctx, src, field_offset), 51 + * takes priority over `write`. 52 + */ 53 + char *write_st; 54 + /* Pattern for BPF_STX_MEM (field_sz, ctx, src, field_offset), 55 + * takes priority over `write`. 
56 + */ 57 + char *write_stx; 58 + }; 59 + 60 + #define N(_prog_type, type, field, name_extra...) \ 61 + .name = #_prog_type "." #field name_extra, \ 62 + .prog_type = BPF_PROG_TYPE_##_prog_type, \ 63 + .field_offset = offsetof(type, field), \ 64 + .field_sz = sizeof(typeof(((type *)NULL)->field)) 65 + 66 + static struct test_case test_cases[] = { 67 + /* Sign extension on s390 changes the pattern */ 68 + #if defined(__x86_64__) || defined(__aarch64__) 69 + { 70 + N(SCHED_CLS, struct __sk_buff, tstamp), 71 + .read = "r11 = *(u8 *)($ctx + sk_buff::__pkt_vlan_present_offset);" 72 + "w11 &= 160;" 73 + "if w11 != 0xa0 goto pc+2;" 74 + "$dst = 0;" 75 + "goto pc+1;" 76 + "$dst = *(u64 *)($ctx + sk_buff::tstamp);", 77 + .write = "r11 = *(u8 *)($ctx + sk_buff::__pkt_vlan_present_offset);" 78 + "if w11 & 0x80 goto pc+1;" 79 + "goto pc+2;" 80 + "w11 &= -33;" 81 + "*(u8 *)($ctx + sk_buff::__pkt_vlan_present_offset) = r11;" 82 + "*(u64 *)($ctx + sk_buff::tstamp) = $src;", 83 + }, 84 + #endif 85 + { 86 + N(SCHED_CLS, struct __sk_buff, priority), 87 + .read = "$dst = *(u32 *)($ctx + sk_buff::priority);", 88 + .write = "*(u32 *)($ctx + sk_buff::priority) = $src;", 89 + }, 90 + { 91 + N(SCHED_CLS, struct __sk_buff, mark), 92 + .read = "$dst = *(u32 *)($ctx + sk_buff::mark);", 93 + .write = "*(u32 *)($ctx + sk_buff::mark) = $src;", 94 + }, 95 + { 96 + N(SCHED_CLS, struct __sk_buff, cb[0]), 97 + .read = "$dst = *(u32 *)($ctx + $(sk_buff::cb + qdisc_skb_cb::data));", 98 + .write = "*(u32 *)($ctx + $(sk_buff::cb + qdisc_skb_cb::data)) = $src;", 99 + }, 100 + { 101 + N(SCHED_CLS, struct __sk_buff, tc_classid), 102 + .read = "$dst = *(u16 *)($ctx + $(sk_buff::cb + qdisc_skb_cb::tc_classid));", 103 + .write = "*(u16 *)($ctx + $(sk_buff::cb + qdisc_skb_cb::tc_classid)) = $src;", 104 + }, 105 + { 106 + N(SCHED_CLS, struct __sk_buff, tc_index), 107 + .read = "$dst = *(u16 *)($ctx + sk_buff::tc_index);", 108 + .write = "*(u16 *)($ctx + sk_buff::tc_index) = $src;", 109 + }, 110 + { 111 + 
N(SCHED_CLS, struct __sk_buff, queue_mapping), 112 + .read = "$dst = *(u16 *)($ctx + sk_buff::queue_mapping);", 113 + .write_stx = "if $src >= 0xffff goto pc+1;" 114 + "*(u16 *)($ctx + sk_buff::queue_mapping) = $src;", 115 + .write_st = "*(u16 *)($ctx + sk_buff::queue_mapping) = $src;", 116 + }, 117 + { 118 + /* This is a corner case in filter.c:bpf_convert_ctx_access() */ 119 + N(SCHED_CLS, struct __sk_buff, queue_mapping, ".ushrt_max"), 120 + .st_value = { true, USHRT_MAX }, 121 + .write_st = "goto pc+0;", 122 + }, 123 + { 124 + N(CGROUP_SOCK, struct bpf_sock, bound_dev_if), 125 + .read = "$dst = *(u32 *)($ctx + sock_common::skc_bound_dev_if);", 126 + .write = "*(u32 *)($ctx + sock_common::skc_bound_dev_if) = $src;", 127 + }, 128 + { 129 + N(CGROUP_SOCK, struct bpf_sock, mark), 130 + .read = "$dst = *(u32 *)($ctx + sock::sk_mark);", 131 + .write = "*(u32 *)($ctx + sock::sk_mark) = $src;", 132 + }, 133 + { 134 + N(CGROUP_SOCK, struct bpf_sock, priority), 135 + .read = "$dst = *(u32 *)($ctx + sock::sk_priority);", 136 + .write = "*(u32 *)($ctx + sock::sk_priority) = $src;", 137 + }, 138 + { 139 + N(SOCK_OPS, struct bpf_sock_ops, replylong[0]), 140 + .read = "$dst = *(u32 *)($ctx + bpf_sock_ops_kern::replylong);", 141 + .write = "*(u32 *)($ctx + bpf_sock_ops_kern::replylong) = $src;", 142 + }, 143 + { 144 + N(CGROUP_SYSCTL, struct bpf_sysctl, file_pos), 145 + #if __BYTE_ORDER == __LITTLE_ENDIAN 146 + .read = "$dst = *(u64 *)($ctx + bpf_sysctl_kern::ppos);" 147 + "$dst = *(u32 *)($dst +0);", 148 + .write = "*(u64 *)($ctx + bpf_sysctl_kern::tmp_reg) = r9;" 149 + "r9 = *(u64 *)($ctx + bpf_sysctl_kern::ppos);" 150 + "*(u32 *)(r9 +0) = $src;" 151 + "r9 = *(u64 *)($ctx + bpf_sysctl_kern::tmp_reg);", 152 + #else 153 + .read = "$dst = *(u64 *)($ctx + bpf_sysctl_kern::ppos);" 154 + "$dst = *(u32 *)($dst +4);", 155 + .write = "*(u64 *)($ctx + bpf_sysctl_kern::tmp_reg) = r9;" 156 + "r9 = *(u64 *)($ctx + bpf_sysctl_kern::ppos);" 157 + "*(u32 *)(r9 +4) = $src;" 158 + "r9 = *(u64 
*)($ctx + bpf_sysctl_kern::tmp_reg);", 159 + #endif 160 + }, 161 + { 162 + N(CGROUP_SOCKOPT, struct bpf_sockopt, sk), 163 + .read = "$dst = *(u64 *)($ctx + bpf_sockopt_kern::sk);", 164 + .expected_attach_type = BPF_CGROUP_GETSOCKOPT, 165 + }, 166 + { 167 + N(CGROUP_SOCKOPT, struct bpf_sockopt, level), 168 + .read = "$dst = *(u32 *)($ctx + bpf_sockopt_kern::level);", 169 + .write = "*(u32 *)($ctx + bpf_sockopt_kern::level) = $src;", 170 + .expected_attach_type = BPF_CGROUP_SETSOCKOPT, 171 + }, 172 + { 173 + N(CGROUP_SOCKOPT, struct bpf_sockopt, optname), 174 + .read = "$dst = *(u32 *)($ctx + bpf_sockopt_kern::optname);", 175 + .write = "*(u32 *)($ctx + bpf_sockopt_kern::optname) = $src;", 176 + .expected_attach_type = BPF_CGROUP_SETSOCKOPT, 177 + }, 178 + { 179 + N(CGROUP_SOCKOPT, struct bpf_sockopt, optlen), 180 + .read = "$dst = *(u32 *)($ctx + bpf_sockopt_kern::optlen);", 181 + .write = "*(u32 *)($ctx + bpf_sockopt_kern::optlen) = $src;", 182 + .expected_attach_type = BPF_CGROUP_SETSOCKOPT, 183 + }, 184 + { 185 + N(CGROUP_SOCKOPT, struct bpf_sockopt, retval), 186 + .read = "$dst = *(u64 *)($ctx + bpf_sockopt_kern::current_task);" 187 + "$dst = *(u64 *)($dst + task_struct::bpf_ctx);" 188 + "$dst = *(u32 *)($dst + bpf_cg_run_ctx::retval);", 189 + .write = "*(u64 *)($ctx + bpf_sockopt_kern::tmp_reg) = r9;" 190 + "r9 = *(u64 *)($ctx + bpf_sockopt_kern::current_task);" 191 + "r9 = *(u64 *)(r9 + task_struct::bpf_ctx);" 192 + "*(u32 *)(r9 + bpf_cg_run_ctx::retval) = $src;" 193 + "r9 = *(u64 *)($ctx + bpf_sockopt_kern::tmp_reg);", 194 + .expected_attach_type = BPF_CGROUP_GETSOCKOPT, 195 + }, 196 + { 197 + N(CGROUP_SOCKOPT, struct bpf_sockopt, optval), 198 + .read = "$dst = *(u64 *)($ctx + bpf_sockopt_kern::optval);", 199 + .expected_attach_type = BPF_CGROUP_GETSOCKOPT, 200 + }, 201 + { 202 + N(CGROUP_SOCKOPT, struct bpf_sockopt, optval_end), 203 + .read = "$dst = *(u64 *)($ctx + bpf_sockopt_kern::optval_end);", 204 + .expected_attach_type = BPF_CGROUP_GETSOCKOPT, 205 + 
}, 206 + }; 207 + 208 + #undef N 209 + 210 + static regex_t *ident_regex; 211 + static regex_t *field_regex; 212 + 213 + static char *skip_space(char *str) 214 + { 215 + while (*str && isspace(*str)) 216 + ++str; 217 + return str; 218 + } 219 + 220 + static char *skip_space_and_semi(char *str) 221 + { 222 + while (*str && (isspace(*str) || *str == ';')) 223 + ++str; 224 + return str; 225 + } 226 + 227 + static char *match_str(char *str, char *prefix) 228 + { 229 + while (*str && *prefix && *str == *prefix) { 230 + ++str; 231 + ++prefix; 232 + } 233 + if (*prefix) 234 + return NULL; 235 + return str; 236 + } 237 + 238 + static char *match_number(char *str, int num) 239 + { 240 + char *next; 241 + int snum = strtol(str, &next, 10); 242 + 243 + if (next - str == 0 || num != snum) 244 + return NULL; 245 + 246 + return next; 247 + } 248 + 249 + static int find_field_offset_aux(struct btf *btf, int btf_id, char *field_name, int off) 250 + { 251 + const struct btf_type *type = btf__type_by_id(btf, btf_id); 252 + const struct btf_member *m; 253 + __u16 mnum; 254 + int i; 255 + 256 + if (!type) { 257 + PRINT_FAIL("Can't find btf_type for id %d\n", btf_id); 258 + return -1; 259 + } 260 + 261 + if (!btf_is_struct(type) && !btf_is_union(type)) { 262 + PRINT_FAIL("BTF id %d is not struct or union\n", btf_id); 263 + return -1; 264 + } 265 + 266 + m = btf_members(type); 267 + mnum = btf_vlen(type); 268 + 269 + for (i = 0; i < mnum; ++i, ++m) { 270 + const char *mname = btf__name_by_offset(btf, m->name_off); 271 + 272 + if (strcmp(mname, "") == 0) { 273 + int msize = find_field_offset_aux(btf, m->type, field_name, 274 + off + m->offset); 275 + if (msize >= 0) 276 + return msize; 277 + } 278 + 279 + if (strcmp(mname, field_name)) 280 + continue; 281 + 282 + return (off + m->offset) / 8; 283 + } 284 + 285 + return -1; 286 + } 287 + 288 + static int find_field_offset(struct btf *btf, char *pattern, regmatch_t *matches) 289 + { 290 + int type_sz = matches[1].rm_eo - matches[1].rm_so; 
291 + int field_sz = matches[2].rm_eo - matches[2].rm_so; 292 + char *type = pattern + matches[1].rm_so; 293 + char *field = pattern + matches[2].rm_so; 294 + char field_str[128] = {}; 295 + char type_str[128] = {}; 296 + int btf_id, field_offset; 297 + 298 + if (type_sz >= sizeof(type_str)) { 299 + PRINT_FAIL("Malformed pattern: type ident is too long: %d\n", type_sz); 300 + return -1; 301 + } 302 + 303 + if (field_sz >= sizeof(field_str)) { 304 + PRINT_FAIL("Malformed pattern: field ident is too long: %d\n", field_sz); 305 + return -1; 306 + } 307 + 308 + strncpy(type_str, type, type_sz); 309 + strncpy(field_str, field, field_sz); 310 + btf_id = btf__find_by_name(btf, type_str); 311 + if (btf_id < 0) { 312 + PRINT_FAIL("No BTF info for type %s\n", type_str); 313 + return -1; 314 + } 315 + 316 + field_offset = find_field_offset_aux(btf, btf_id, field_str, 0); 317 + if (field_offset < 0) { 318 + PRINT_FAIL("No BTF info for field %s::%s\n", type_str, field_str); 319 + return -1; 320 + } 321 + 322 + return field_offset; 323 + } 324 + 325 + static regex_t *compile_regex(char *pat) 326 + { 327 + regex_t *re; 328 + int err; 329 + 330 + re = malloc(sizeof(regex_t)); 331 + if (!re) { 332 + PRINT_FAIL("Can't alloc regex\n"); 333 + return NULL; 334 + } 335 + 336 + err = regcomp(re, pat, REG_EXTENDED); 337 + if (err) { 338 + char errbuf[512]; 339 + 340 + regerror(err, re, errbuf, sizeof(errbuf)); 341 + PRINT_FAIL("Can't compile regex: %s\n", errbuf); 342 + free(re); 343 + return NULL; 344 + } 345 + 346 + return re; 347 + } 348 + 349 + static void free_regex(regex_t *re) 350 + { 351 + if (!re) 352 + return; 353 + 354 + regfree(re); 355 + free(re); 356 + } 357 + 358 + static u32 max_line_len(char *str) 359 + { 360 + u32 max_line = 0; 361 + char *next = str; 362 + 363 + while (next) { 364 + next = strchr(str, '\n'); 365 + if (next) { 366 + max_line = max_t(u32, max_line, (next - str)); 367 + str = next + 1; 368 + } else { 369 + max_line = max_t(u32, max_line, strlen(str)); 370 
+ } 371 + } 372 + 373 + return min(max_line, 60u); 374 + } 375 + 376 + /* Print strings `pattern_origin` and `text_origin` side by side, 377 + * assume `pattern_pos` and `text_pos` designate location within 378 + * corresponding origin string where match diverges. 379 + * The output should look like: 380 + * 381 + * Can't match disassembly(left) with pattern(right): 382 + * r2 = *(u64 *)(r1 +0) ; $dst = *(u64 *)($ctx + bpf_sockopt_kern::sk1) 383 + * ^ ^ 384 + * r0 = 0 ; 385 + * exit ; 386 + */ 387 + static void print_match_error(FILE *out, 388 + char *pattern_origin, char *text_origin, 389 + char *pattern_pos, char *text_pos) 390 + { 391 + char *pattern = pattern_origin; 392 + char *text = text_origin; 393 + int middle = max_line_len(text) + 2; 394 + 395 + fprintf(out, "Can't match disassembly(left) with pattern(right):\n"); 396 + while (*pattern || *text) { 397 + int column = 0; 398 + int mark1 = -1; 399 + int mark2 = -1; 400 + 401 + /* Print one line from text */ 402 + while (*text && *text != '\n') { 403 + if (text == text_pos) 404 + mark1 = column; 405 + fputc(*text, out); 406 + ++text; 407 + ++column; 408 + } 409 + if (text == text_pos) 410 + mark1 = column; 411 + 412 + /* Pad to the middle */ 413 + while (column < middle) { 414 + fputc(' ', out); 415 + ++column; 416 + } 417 + fputs("; ", out); 418 + column += 3; 419 + 420 + /* Print one line from pattern, pattern lines are terminated by ';' */ 421 + while (*pattern && *pattern != ';') { 422 + if (pattern == pattern_pos) 423 + mark2 = column; 424 + fputc(*pattern, out); 425 + ++pattern; 426 + ++column; 427 + } 428 + if (pattern == pattern_pos) 429 + mark2 = column; 430 + 431 + fputc('\n', out); 432 + if (*pattern) 433 + ++pattern; 434 + if (*text) 435 + ++text; 436 + 437 + /* If pattern and text diverge at this line, print an 438 + * additional line with '^' marks, highlighting 439 + * positions where match fails. 
440 + */ 441 + if (mark1 > 0 || mark2 > 0) { 442 + for (column = 0; column <= max(mark1, mark2); ++column) { 443 + if (column == mark1 || column == mark2) 444 + fputc('^', out); 445 + else 446 + fputc(' ', out); 447 + } 448 + fputc('\n', out); 449 + } 450 + } 451 + } 452 + 453 + /* Test if `text` matches `pattern`. Pattern consists of the following elements: 454 + * 455 + * - Field offset references: 456 + * 457 + * <type>::<field> 458 + * 459 + * When such reference is encountered BTF is used to compute numerical 460 + * value for the offset of <field> in <type>. The `text` is expected to 461 + * contain matching numerical value. 462 + * 463 + * - Field groups: 464 + * 465 + * $(<type>::<field> [+ <type>::<field>]*) 466 + * 467 + * Allows to specify an offset that is a sum of multiple field offsets. 468 + * The `text` is expected to contain matching numerical value. 469 + * 470 + * - Variable references, e.g. `$src`, `$dst`, `$ctx`. 471 + * These are substitutions specified in `reg_map` array. 472 + * If a substring of pattern is equal to `reg_map[i][0]` the `text` is 473 + * expected to contain `reg_map[i][1]` in the matching position. 474 + * 475 + * - Whitespace is ignored, ';' counts as whitespace for `pattern`. 476 + * 477 + * - Any other characters, `pattern` and `text` should match one-to-one. 
478 + * 479 + * Example of a pattern: 480 + * 481 + * __________ fields group ________________ 482 + * ' ' 483 + * *(u16 *)($ctx + $(sk_buff::cb + qdisc_skb_cb::tc_classid)) = $src; 484 + * ^^^^ '______________________' 485 + * variable reference field offset reference 486 + */ 487 + static bool match_pattern(struct btf *btf, char *pattern, char *text, char *reg_map[][2]) 488 + { 489 + char *pattern_origin = pattern; 490 + char *text_origin = text; 491 + regmatch_t matches[3]; 492 + 493 + _continue: 494 + while (*pattern) { 495 + if (!*text) 496 + goto err; 497 + 498 + /* Skip whitespace */ 499 + if (isspace(*pattern) || *pattern == ';') { 500 + if (!isspace(*text) && text != text_origin && isalnum(text[-1])) 501 + goto err; 502 + pattern = skip_space_and_semi(pattern); 503 + text = skip_space(text); 504 + continue; 505 + } 506 + 507 + /* Check for variable references */ 508 + for (int i = 0; reg_map[i][0]; ++i) { 509 + char *pattern_next, *text_next; 510 + 511 + pattern_next = match_str(pattern, reg_map[i][0]); 512 + if (!pattern_next) 513 + continue; 514 + 515 + text_next = match_str(text, reg_map[i][1]); 516 + if (!text_next) 517 + goto err; 518 + 519 + pattern = pattern_next; 520 + text = text_next; 521 + goto _continue; 522 + } 523 + 524 + /* Match field group: 525 + * $(sk_buff::cb + qdisc_skb_cb::tc_classid) 526 + */ 527 + if (strncmp(pattern, "$(", 2) == 0) { 528 + char *group_start = pattern, *text_next; 529 + int acc_offset = 0; 530 + 531 + pattern += 2; 532 + 533 + for (;;) { 534 + int field_offset; 535 + 536 + pattern = skip_space(pattern); 537 + if (!*pattern) { 538 + PRINT_FAIL("Unexpected end of pattern\n"); 539 + goto err; 540 + } 541 + 542 + if (*pattern == ')') { 543 + ++pattern; 544 + break; 545 + } 546 + 547 + if (*pattern == '+') { 548 + ++pattern; 549 + continue; 550 + } 551 + 553 + if (regexec(field_regex, pattern, 3, matches, 0) != 0) { 554 + PRINT_FAIL("Field reference expected\n"); 555 + goto err; 
556 + } 557 + 558 + field_offset = find_field_offset(btf, pattern, matches); 559 + if (field_offset < 0) 560 + goto err; 561 + 562 + pattern += matches[0].rm_eo; 563 + acc_offset += field_offset; 564 + } 565 + 566 + text_next = match_number(text, acc_offset); 567 + if (!text_next) { 568 + PRINT_FAIL("No match for group offset %.*s (%d)\n", 569 + (int)(pattern - group_start), 570 + group_start, 571 + acc_offset); 572 + goto err; 573 + } 574 + text = text_next; 575 + } 576 + 577 + /* Match field reference: 578 + * sk_buff::cb 579 + */ 580 + if (regexec(field_regex, pattern, 3, matches, 0) == 0) { 581 + int field_offset; 582 + char *text_next; 583 + 584 + field_offset = find_field_offset(btf, pattern, matches); 585 + if (field_offset < 0) 586 + goto err; 587 + 588 + text_next = match_number(text, field_offset); 589 + if (!text_next) { 590 + PRINT_FAIL("No match for field offset %.*s (%d)\n", 591 + (int)matches[0].rm_eo, pattern, field_offset); 592 + goto err; 593 + } 594 + 595 + pattern += matches[0].rm_eo; 596 + text = text_next; 597 + continue; 598 + } 599 + 600 + /* If pattern points to identifier not followed by '::' 601 + * skip the identifier to avoid n^2 application of the 602 + * field reference rule. 603 + */ 604 + if (regexec(ident_regex, pattern, 1, matches, 0) == 0) { 605 + if (strncmp(pattern, text, matches[0].rm_eo) != 0) 606 + goto err; 607 + 608 + pattern += matches[0].rm_eo; 609 + text += matches[0].rm_eo; 610 + continue; 611 + } 612 + 613 + /* Match literally */ 614 + if (*pattern != *text) 615 + goto err; 616 + 617 + ++pattern; 618 + ++text; 619 + } 620 + 621 + return true; 622 + 623 + err: 624 + test__fail(); 625 + print_match_error(stdout, pattern_origin, text_origin, pattern, text); 626 + return false; 627 + } 628 + 629 + /* Request BPF program instructions after all rewrites are applied, 630 + * e.g. verifier.c:convert_ctx_access() is done. 
631 + */ 632 + static int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt) 633 + { 634 + struct bpf_prog_info info = {}; 635 + __u32 info_len = sizeof(info); 636 + __u32 xlated_prog_len; 637 + __u32 buf_element_size = sizeof(struct bpf_insn); 638 + 639 + if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) { 640 + perror("bpf_prog_get_info_by_fd failed"); 641 + return -1; 642 + } 643 + 644 + xlated_prog_len = info.xlated_prog_len; 645 + if (xlated_prog_len % buf_element_size) { 646 + printf("Program length %d is not a multiple of %d\n", 647 + xlated_prog_len, buf_element_size); 648 + return -1; 649 + } 650 + 651 + *cnt = xlated_prog_len / buf_element_size; 652 + *buf = calloc(*cnt, buf_element_size); 653 + if (!*buf) { 654 + perror("can't allocate xlated program buffer"); 655 + return -ENOMEM; 656 + } 657 + 658 + bzero(&info, sizeof(info)); 659 + info.xlated_prog_len = xlated_prog_len; 660 + info.xlated_prog_insns = (__u64)(unsigned long)*buf; 661 + if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) { 662 + perror("second bpf_prog_get_info_by_fd failed"); 663 + goto out_free_buf; 664 + } 665 + 666 + return 0; 667 + 668 + out_free_buf: 669 + free(*buf); 670 + return -1; 671 + } 672 + 673 + static void print_insn(void *private_data, const char *fmt, ...) 
674 + { 675 + va_list args; 676 + 677 + va_start(args, fmt); 678 + vfprintf((FILE *)private_data, fmt, args); 679 + va_end(args); 680 + } 681 + 682 + /* Disassemble instructions to a stream */ 683 + static void print_xlated(FILE *out, struct bpf_insn *insn, __u32 len) 684 + { 685 + const struct bpf_insn_cbs cbs = { 686 + .cb_print = print_insn, 687 + .cb_call = NULL, 688 + .cb_imm = NULL, 689 + .private_data = out, 690 + }; 691 + bool double_insn = false; 692 + int i; 693 + 694 + for (i = 0; i < len; i++) { 695 + if (double_insn) { 696 + double_insn = false; 697 + continue; 698 + } 699 + 700 + double_insn = insn[i].code == (BPF_LD | BPF_IMM | BPF_DW); 701 + print_bpf_insn(&cbs, insn + i, true); 702 + } 703 + } 704 + 705 + /* We share code with the kernel BPF disassembler, which adds a '(FF) ' 706 + * prefix to each instruction (FF stands for the instruction's `code` byte). 707 + * This function removes the prefix in place for each line in `str`. 708 + */ 709 + static void remove_insn_prefix(char *str, int size) 710 + { 711 + const int prefix_size = 5; 712 + 713 + int write_pos = 0, read_pos = prefix_size; 714 + int len = strlen(str); 715 + char c; 716 + 717 + size = min(size, len); 718 + 719 + while (read_pos < size) { 720 + c = str[read_pos++]; 721 + if (c == 0) 722 + break; 723 + str[write_pos++] = c; 724 + if (c == '\n') 725 + read_pos += prefix_size; 726 + } 727 + str[write_pos] = 0; 728 + } 729 + 730 + struct prog_info { 731 + char *prog_kind; 732 + enum bpf_prog_type prog_type; 733 + enum bpf_attach_type expected_attach_type; 734 + struct bpf_insn *prog; 735 + u32 prog_len; 736 + }; 737 + 738 + static void match_program(struct btf *btf, 739 + struct prog_info *pinfo, 740 + char *pattern, 741 + char *reg_map[][2], 742 + bool skip_first_insn) 743 + { 744 + struct bpf_insn *buf = NULL; 745 + int err = 0, prog_fd = 0; 746 + FILE *prog_out = NULL; 747 + char *text = NULL; 748 + __u32 cnt = 0; 749 + 750 + text = calloc(MAX_PROG_TEXT_SZ, 1); 751 + if (!text) { 752 + 
PRINT_FAIL("Can't allocate %d bytes\n", MAX_PROG_TEXT_SZ); 753 + goto out; 754 + } 755 + 756 + /* Request the most verbose verifier log available */ 757 + LIBBPF_OPTS(bpf_prog_load_opts, opts); 758 + opts.log_buf = text; 759 + opts.log_size = MAX_PROG_TEXT_SZ; 760 + opts.log_level = 1 | 2 | 4; 761 + opts.expected_attach_type = pinfo->expected_attach_type; 762 + 763 + prog_fd = bpf_prog_load(pinfo->prog_type, NULL, "GPL", 764 + pinfo->prog, pinfo->prog_len, &opts); 765 + if (prog_fd < 0) { 766 + PRINT_FAIL("Can't load program, errno %d (%s), verifier log:\n%s\n", 767 + errno, strerror(errno), text); 768 + goto out; 769 + } 770 + 771 + memset(text, 0, MAX_PROG_TEXT_SZ); 772 + 773 + err = get_xlated_program(prog_fd, &buf, &cnt); 774 + if (err) { 775 + PRINT_FAIL("Can't load back BPF program\n"); 776 + goto out; 777 + } 778 + 779 + prog_out = fmemopen(text, MAX_PROG_TEXT_SZ - 1, "w"); 780 + if (!prog_out) { 781 + PRINT_FAIL("Can't open memory stream\n"); 782 + goto out; 783 + } 784 + if (skip_first_insn) 785 + print_xlated(prog_out, buf + 1, cnt - 1); 786 + else 787 + print_xlated(prog_out, buf, cnt); 788 + fclose(prog_out); 789 + remove_insn_prefix(text, MAX_PROG_TEXT_SZ); 790 + 791 + ASSERT_TRUE(match_pattern(btf, pattern, text, reg_map), 792 + pinfo->prog_kind); 793 + 794 + out: 795 + if (prog_fd) 796 + close(prog_fd); 797 + free(buf); 798 + free(text); 799 + } 800 + 801 + static void run_one_testcase(struct btf *btf, struct test_case *test) 802 + { 803 + struct prog_info pinfo = {}; 804 + int bpf_sz; 805 + 806 + if (!test__start_subtest(test->name)) 807 + return; 808 + 809 + switch (test->field_sz) { 810 + case 8: 811 + bpf_sz = BPF_DW; 812 + break; 813 + case 4: 814 + bpf_sz = BPF_W; 815 + break; 816 + case 2: 817 + bpf_sz = BPF_H; 818 + break; 819 + case 1: 820 + bpf_sz = BPF_B; 821 + break; 822 + default: 823 + PRINT_FAIL("Unexpected field size: %d, want 8, 4, 2 or 1\n", test->field_sz); 824 + return; 825 + } 826 + 827 + pinfo.prog_type = test->prog_type; 828 + pinfo.expected_attach_type = 
test->expected_attach_type; 829 + 830 + if (test->read) { 831 + struct bpf_insn ldx_prog[] = { 832 + BPF_LDX_MEM(bpf_sz, BPF_REG_2, BPF_REG_1, test->field_offset), 833 + BPF_MOV64_IMM(BPF_REG_0, 0), 834 + BPF_EXIT_INSN(), 835 + }; 836 + char *reg_map[][2] = { 837 + { "$ctx", "r1" }, 838 + { "$dst", "r2" }, 839 + {} 840 + }; 841 + 842 + pinfo.prog_kind = "LDX"; 843 + pinfo.prog = ldx_prog; 844 + pinfo.prog_len = ARRAY_SIZE(ldx_prog); 845 + match_program(btf, &pinfo, test->read, reg_map, false); 846 + } 847 + 848 + if (test->write || test->write_st || test->write_stx) { 849 + struct bpf_insn stx_prog[] = { 850 + BPF_MOV64_IMM(BPF_REG_2, 0), 851 + BPF_STX_MEM(bpf_sz, BPF_REG_1, BPF_REG_2, test->field_offset), 852 + BPF_MOV64_IMM(BPF_REG_0, 0), 853 + BPF_EXIT_INSN(), 854 + }; 855 + char *stx_reg_map[][2] = { 856 + { "$ctx", "r1" }, 857 + { "$src", "r2" }, 858 + {} 859 + }; 860 + struct bpf_insn st_prog[] = { 861 + BPF_ST_MEM(bpf_sz, BPF_REG_1, test->field_offset, 862 + test->st_value.use ? test->st_value.value : 42), 863 + BPF_MOV64_IMM(BPF_REG_0, 0), 864 + BPF_EXIT_INSN(), 865 + }; 866 + char *st_reg_map[][2] = { 867 + { "$ctx", "r1" }, 868 + { "$src", "42" }, 869 + {} 870 + }; 871 + 872 + if (test->write || test->write_stx) { 873 + char *pattern = test->write_stx ? test->write_stx : test->write; 874 + 875 + pinfo.prog_kind = "STX"; 876 + pinfo.prog = stx_prog; 877 + pinfo.prog_len = ARRAY_SIZE(stx_prog); 878 + match_program(btf, &pinfo, pattern, stx_reg_map, true); 879 + } 880 + 881 + if (test->write || test->write_st) { 882 + char *pattern = test->write_st ? 
test->write_st : test->write; 883 + 884 + pinfo.prog_kind = "ST"; 885 + pinfo.prog = st_prog; 886 + pinfo.prog_len = ARRAY_SIZE(st_prog); 887 + match_program(btf, &pinfo, pattern, st_reg_map, false); 888 + } 889 + } 890 + 891 + test__end_subtest(); 892 + } 893 + 894 + void test_ctx_rewrite(void) 895 + { 896 + struct btf *btf; 897 + int i; 898 + 899 + field_regex = compile_regex("^([[:alpha:]_][[:alnum:]_]+)::([[:alpha:]_][[:alnum:]_]+)"); 900 + ident_regex = compile_regex("^[[:alpha:]_][[:alnum:]_]+"); 901 + if (!field_regex || !ident_regex) 902 + return; 903 + 904 + btf = btf__load_vmlinux_btf(); 905 + if (!btf) { 906 + PRINT_FAIL("Can't load vmlinux BTF, errno %d (%s)\n", errno, strerror(errno)); 907 + goto out; 908 + } 909 + 910 + for (i = 0; i < ARRAY_SIZE(test_cases); ++i) 911 + run_one_testcase(btf, &test_cases[i]); 912 + 913 + out: 914 + btf__free(btf); 915 + free_regex(field_regex); 916 + free_regex(ident_regex); 917 + }
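The remove_insn_prefix() helper in the new test above strips the kernel disassembler's fixed-width "(FF) " opcode prefix from every line of the rendered program. The core idea in a standalone form (the function name here is mine, not from the patch, and the `size` clamp of the original is dropped):

```c
#include <string.h>

/* Drop the first `prefix_size` bytes of every line of `str`, in place.
 * Same shape as remove_insn_prefix() above: copy bytes down over the
 * prefix, and skip another prefix after each newline. */
static void strip_line_prefix(char *str, int prefix_size)
{
	int len = strlen(str);
	int write_pos = 0, read_pos = prefix_size;

	while (read_pos < len) {
		char c = str[read_pos++];

		str[write_pos++] = c;
		if (c == '\n')
			read_pos += prefix_size;
	}
	str[write_pos] = 0;
}
```

With input "(b7) r0 = 0\n(95) exit\n" and a prefix size of 5, this yields "r0 = 0\nexit\n", which is what match_pattern() is then run against.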
+4 -12
tools/testing/selftests/bpf/prog_tests/decap_sanity.c
··· 10 10 #include "network_helpers.h" 11 11 #include "decap_sanity.skel.h" 12 12 13 - #define SYS(fmt, ...) \ 14 - ({ \ 15 - char cmd[1024]; \ 16 - snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \ 17 - if (!ASSERT_OK(system(cmd), cmd)) \ 18 - goto fail; \ 19 - }) 20 - 21 13 #define NS_TEST "decap_sanity_ns" 22 14 #define IPV6_IFACE_ADDR "face::1" 23 15 #define UDP_TEST_PORT 7777 ··· 29 37 if (!ASSERT_OK_PTR(skel, "skel open_and_load")) 30 38 return; 31 39 32 - SYS("ip netns add %s", NS_TEST); 33 - SYS("ip -net %s -6 addr add %s/128 dev lo nodad", NS_TEST, IPV6_IFACE_ADDR); 34 - SYS("ip -net %s link set dev lo up", NS_TEST); 40 + SYS(fail, "ip netns add %s", NS_TEST); 41 + SYS(fail, "ip -net %s -6 addr add %s/128 dev lo nodad", NS_TEST, IPV6_IFACE_ADDR); 42 + SYS(fail, "ip -net %s link set dev lo up", NS_TEST); 35 43 36 44 nstoken = open_netns(NS_TEST); 37 45 if (!ASSERT_OK_PTR(nstoken, "open_netns")) ··· 72 80 bpf_tc_hook_destroy(&qdisc_hook); 73 81 close_netns(nstoken); 74 82 } 75 - system("ip netns del " NS_TEST " &> /dev/null"); 83 + SYS_NOFAIL("ip netns del " NS_TEST " &> /dev/null"); 76 84 decap_sanity__destroy(skel); 77 85 }
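The hunk above deletes the file-local SYS() macro and switches call sites to shared helpers that take the failure label as their first argument, plus a SYS_NOFAIL() variant for best-effort cleanup. A hedged sketch of what such macros look like, reconstructed from the call sites rather than copied from the shared header (the real definitions may differ, e.g. by reporting failures through the test harness):

```c
#include <stdio.h>
#include <stdlib.h>

/* Run a formatted shell command; jump to `goto_label` on failure.
 * Reconstructed from usage such as SYS(fail, "ip netns add %s", ...). */
#define SYS(goto_label, fmt, ...)                               \
	({                                                      \
		char cmd[1024];                                 \
		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \
		if (system(cmd) != 0)                           \
			goto goto_label;                        \
	})

/* Run a formatted shell command and ignore the result, for cleanup
 * paths such as "ip netns del ... &> /dev/null". */
#define SYS_NOFAIL(fmt, ...)                                    \
	({                                                      \
		char cmd[1024];                                 \
		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \
		(void)system(cmd);                              \
	})

static int demo(void)
{
	SYS(fail, "true");      /* succeeds, no jump */
	SYS_NOFAIL("false");    /* failure deliberately ignored */
	return 0;
fail:
	return 1;
}
```

Passing the label explicitly lets each caller pick its own unwind point, instead of the macro hard-coding `goto fail` as the removed local versions did.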
+58 -16
tools/testing/selftests/bpf/prog_tests/dynptr.c
··· 2 2 /* Copyright (c) 2022 Facebook */ 3 3 4 4 #include <test_progs.h> 5 + #include <network_helpers.h> 5 6 #include "dynptr_fail.skel.h" 6 7 #include "dynptr_success.skel.h" 7 8 8 - static const char * const success_tests[] = { 9 - "test_read_write", 10 - "test_data_slice", 11 - "test_ringbuf", 9 + enum test_setup_type { 10 + SETUP_SYSCALL_SLEEP, 11 + SETUP_SKB_PROG, 12 12 }; 13 13 14 - static void verify_success(const char *prog_name) 14 + static struct { 15 + const char *prog_name; 16 + enum test_setup_type type; 17 + } success_tests[] = { 18 + {"test_read_write", SETUP_SYSCALL_SLEEP}, 19 + {"test_dynptr_data", SETUP_SYSCALL_SLEEP}, 20 + {"test_ringbuf", SETUP_SYSCALL_SLEEP}, 21 + {"test_skb_readonly", SETUP_SKB_PROG}, 22 + {"test_dynptr_skb_data", SETUP_SKB_PROG}, 23 + }; 24 + 25 + static void verify_success(const char *prog_name, enum test_setup_type setup_type) 15 26 { 16 27 struct dynptr_success *skel; 17 28 struct bpf_program *prog; 18 29 struct bpf_link *link; 30 + int err; 19 31 20 32 skel = dynptr_success__open(); 21 33 if (!ASSERT_OK_PTR(skel, "dynptr_success__open")) ··· 35 23 36 24 skel->bss->pid = getpid(); 37 25 38 - dynptr_success__load(skel); 39 - if (!ASSERT_OK_PTR(skel, "dynptr_success__load")) 40 - goto cleanup; 41 - 42 26 prog = bpf_object__find_program_by_name(skel->obj, prog_name); 43 27 if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name")) 44 28 goto cleanup; 45 29 46 - link = bpf_program__attach(prog); 47 - if (!ASSERT_OK_PTR(link, "bpf_program__attach")) 30 + bpf_program__set_autoload(prog, true); 31 + 32 + err = dynptr_success__load(skel); 33 + if (!ASSERT_OK(err, "dynptr_success__load")) 48 34 goto cleanup; 49 35 50 - usleep(1); 36 + switch (setup_type) { 37 + case SETUP_SYSCALL_SLEEP: 38 + link = bpf_program__attach(prog); 39 + if (!ASSERT_OK_PTR(link, "bpf_program__attach")) 40 + goto cleanup; 41 + 42 + usleep(1); 43 + 44 + bpf_link__destroy(link); 45 + break; 46 + case SETUP_SKB_PROG: 47 + { 48 + int prog_fd; 49 + char 
buf[64]; 50 + 51 + LIBBPF_OPTS(bpf_test_run_opts, topts, 52 + .data_in = &pkt_v4, 53 + .data_size_in = sizeof(pkt_v4), 54 + .data_out = buf, 55 + .data_size_out = sizeof(buf), 56 + .repeat = 1, 57 + ); 58 + 59 + prog_fd = bpf_program__fd(prog); 60 + if (!ASSERT_GE(prog_fd, 0, "prog_fd")) 61 + goto cleanup; 62 + 63 + err = bpf_prog_test_run_opts(prog_fd, &topts); 64 + 65 + if (!ASSERT_OK(err, "test_run")) 66 + goto cleanup; 67 + 68 + break; 69 + } 70 + } 51 71 52 72 ASSERT_EQ(skel->bss->err, 0, "err"); 53 - 54 - bpf_link__destroy(link); 55 73 56 74 cleanup: 57 75 dynptr_success__destroy(skel); ··· 92 50 int i; 93 51 94 52 for (i = 0; i < ARRAY_SIZE(success_tests); i++) { 95 - if (!test__start_subtest(success_tests[i])) 53 + if (!test__start_subtest(success_tests[i].prog_name)) 96 54 continue; 97 55 98 - verify_success(success_tests[i]); 56 + verify_success(success_tests[i].prog_name, success_tests[i].type); 99 57 } 100 58 101 59 RUN_TESTS(dynptr_fail);
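The reworked test above declares its run options with LIBBPF_OPTS(bpf_test_run_opts, topts, ...). The general pattern behind that macro is an options struct whose leading size field records sizeof() for forward/backward compatibility; a simplified stdlib-only sketch (struct and macro names here are illustrative, not libbpf's actual definitions):

```c
#include <stddef.h>

/* Extensible options struct: the leading `sz` field lets the callee
 * detect which trailing fields the caller's build knew about. */
struct run_opts {
	size_t sz;
	const void *data_in;
	size_t data_size_in;
	int repeat;
};

/* Declare-and-initialize in one statement, in the spirit of
 * libbpf's LIBBPF_OPTS(): unset fields are zeroed, `sz` is always
 * filled in. */
#define DECLARE_OPTS(type, name, ...)                                   \
	struct type name = { .sz = sizeof(struct type), ##__VA_ARGS__ }
```

This is why new fields (like `data_out`/`data_size_out` used in the SKB setup path) can be added to the options struct without breaking older callers.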
+10 -15
tools/testing/selftests/bpf/prog_tests/empty_skb.c
··· 4 4 #include <net/if.h> 5 5 #include "empty_skb.skel.h" 6 6 7 - #define SYS(cmd) ({ \ 8 - if (!ASSERT_OK(system(cmd), (cmd))) \ 9 - goto out; \ 10 - }) 11 - 12 7 void test_empty_skb(void) 13 8 { 14 9 LIBBPF_OPTS(bpf_test_run_opts, tattr); ··· 88 93 }, 89 94 }; 90 95 91 - SYS("ip netns add empty_skb"); 96 + SYS(out, "ip netns add empty_skb"); 92 97 tok = open_netns("empty_skb"); 93 - SYS("ip link add veth0 type veth peer veth1"); 94 - SYS("ip link set dev veth0 up"); 95 - SYS("ip link set dev veth1 up"); 96 - SYS("ip addr add 10.0.0.1/8 dev veth0"); 97 - SYS("ip addr add 10.0.0.2/8 dev veth1"); 98 + SYS(out, "ip link add veth0 type veth peer veth1"); 99 + SYS(out, "ip link set dev veth0 up"); 100 + SYS(out, "ip link set dev veth1 up"); 101 + SYS(out, "ip addr add 10.0.0.1/8 dev veth0"); 102 + SYS(out, "ip addr add 10.0.0.2/8 dev veth1"); 98 103 veth_ifindex = if_nametoindex("veth0"); 99 104 100 - SYS("ip link add ipip0 type ipip local 10.0.0.1 remote 10.0.0.2"); 101 - SYS("ip link set ipip0 up"); 102 - SYS("ip addr add 192.168.1.1/16 dev ipip0"); 105 + SYS(out, "ip link add ipip0 type ipip local 10.0.0.1 remote 10.0.0.2"); 106 + SYS(out, "ip link set ipip0 up"); 107 + SYS(out, "ip addr add 192.168.1.1/16 dev ipip0"); 103 108 ipip_ifindex = if_nametoindex("ipip0"); 104 109 105 110 bpf_obj = empty_skb__open_and_load(); ··· 137 142 empty_skb__destroy(bpf_obj); 138 143 if (tok) 139 144 close_netns(tok); 140 - system("ip netns del empty_skb"); 145 + SYS_NOFAIL("ip netns del empty_skb"); 141 146 }
+10 -18
tools/testing/selftests/bpf/prog_tests/fib_lookup.c
··· 8 8 #include "network_helpers.h" 9 9 #include "fib_lookup.skel.h" 10 10 11 - #define SYS(fmt, ...) \ 12 - ({ \ 13 - char cmd[1024]; \ 14 - snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \ 15 - if (!ASSERT_OK(system(cmd), cmd)) \ 16 - goto fail; \ 17 - }) 18 - 19 11 #define NS_TEST "fib_lookup_ns" 20 12 #define IPV6_IFACE_ADDR "face::face" 21 13 #define IPV6_NUD_FAILED_ADDR "face::1" ··· 51 59 { 52 60 int err; 53 61 54 - SYS("ip link add veth1 type veth peer name veth2"); 55 - SYS("ip link set dev veth1 up"); 62 + SYS(fail, "ip link add veth1 type veth peer name veth2"); 63 + SYS(fail, "ip link set dev veth1 up"); 56 64 57 - SYS("ip addr add %s/64 dev veth1 nodad", IPV6_IFACE_ADDR); 58 - SYS("ip neigh add %s dev veth1 nud failed", IPV6_NUD_FAILED_ADDR); 59 - SYS("ip neigh add %s dev veth1 lladdr %s nud stale", IPV6_NUD_STALE_ADDR, DMAC); 65 + SYS(fail, "ip addr add %s/64 dev veth1 nodad", IPV6_IFACE_ADDR); 66 + SYS(fail, "ip neigh add %s dev veth1 nud failed", IPV6_NUD_FAILED_ADDR); 67 + SYS(fail, "ip neigh add %s dev veth1 lladdr %s nud stale", IPV6_NUD_STALE_ADDR, DMAC); 60 68 61 - SYS("ip addr add %s/24 dev veth1 nodad", IPV4_IFACE_ADDR); 62 - SYS("ip neigh add %s dev veth1 nud failed", IPV4_NUD_FAILED_ADDR); 63 - SYS("ip neigh add %s dev veth1 lladdr %s nud stale", IPV4_NUD_STALE_ADDR, DMAC); 69 + SYS(fail, "ip addr add %s/24 dev veth1 nodad", IPV4_IFACE_ADDR); 70 + SYS(fail, "ip neigh add %s dev veth1 nud failed", IPV4_NUD_FAILED_ADDR); 71 + SYS(fail, "ip neigh add %s dev veth1 lladdr %s nud stale", IPV4_NUD_STALE_ADDR, DMAC); 64 72 65 73 err = write_sysctl("/proc/sys/net/ipv4/conf/veth1/forwarding", "1"); 66 74 if (!ASSERT_OK(err, "write_sysctl(net.ipv4.conf.veth1.forwarding)")) ··· 132 140 return; 133 141 prog_fd = bpf_program__fd(skel->progs.fib_lookup); 134 142 135 - SYS("ip netns add %s", NS_TEST); 143 + SYS(fail, "ip netns add %s", NS_TEST); 136 144 137 145 nstoken = open_netns(NS_TEST); 138 146 if (!ASSERT_OK_PTR(nstoken, "open_netns")) ··· 174 182 
fail: 175 183 if (nstoken) 176 184 close_netns(nstoken); 177 - system("ip netns del " NS_TEST " &> /dev/null"); 185 + SYS_NOFAIL("ip netns del " NS_TEST " &> /dev/null"); 178 186 fib_lookup__destroy(skel); 179 187 }
+24
tools/testing/selftests/bpf/prog_tests/flow_dissector.c
··· 346 346 .retval = BPF_OK, 347 347 }, 348 348 { 349 + .name = "ipv6-empty-flow-label", 350 + .pkt.ipv6 = { 351 + .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6), 352 + .iph.nexthdr = IPPROTO_TCP, 353 + .iph.payload_len = __bpf_constant_htons(MAGIC_BYTES), 354 + .iph.flow_lbl = { 0x00, 0x00, 0x00 }, 355 + .tcp.doff = 5, 356 + .tcp.source = 80, 357 + .tcp.dest = 8080, 358 + }, 359 + .keys = { 360 + .flags = BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, 361 + .nhoff = ETH_HLEN, 362 + .thoff = ETH_HLEN + sizeof(struct ipv6hdr), 363 + .addr_proto = ETH_P_IPV6, 364 + .ip_proto = IPPROTO_TCP, 365 + .n_proto = __bpf_constant_htons(ETH_P_IPV6), 366 + .sport = 80, 367 + .dport = 8080, 368 + }, 369 + .flags = BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, 370 + .retval = BPF_OK, 371 + }, 372 + { 349 373 .name = "ipip-encap", 350 374 .pkt.ipip = { 351 375 .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
+2
tools/testing/selftests/bpf/prog_tests/l4lb_all.c
··· 93 93 test_l4lb("test_l4lb.bpf.o"); 94 94 if (test__start_subtest("l4lb_noinline")) 95 95 test_l4lb("test_l4lb_noinline.bpf.o"); 96 + if (test__start_subtest("l4lb_noinline_dynptr")) 97 + test_l4lb("test_l4lb_noinline_dynptr.bpf.o"); 96 98 }
+1 -1
tools/testing/selftests/bpf/prog_tests/log_fixup.c
··· 141 141 if (test__start_subtest("bad_core_relo_trunc_partial")) 142 142 bad_core_relo(300, TRUNC_PARTIAL /* truncate original log a bit */); 143 143 if (test__start_subtest("bad_core_relo_trunc_full")) 144 - bad_core_relo(250, TRUNC_FULL /* truncate also libbpf's message patch */); 144 + bad_core_relo(210, TRUNC_FULL /* truncate also libbpf's message patch */); 145 145 if (test__start_subtest("bad_core_relo_subprog")) 146 146 bad_core_relo_subprog(); 147 147 if (test__start_subtest("missing_map"))
+113 -23
tools/testing/selftests/bpf/prog_tests/map_kptr.c
··· 4 4 5 5 #include "map_kptr.skel.h" 6 6 #include "map_kptr_fail.skel.h" 7 + #include "rcu_tasks_trace_gp.skel.h" 7 8 8 9 static void test_map_kptr_success(bool test_run) 9 10 { 11 + LIBBPF_OPTS(bpf_test_run_opts, lopts); 10 12 LIBBPF_OPTS(bpf_test_run_opts, opts, 11 13 .data_in = &pkt_v4, 12 14 .data_size_in = sizeof(pkt_v4), 13 15 .repeat = 1, 14 16 ); 17 + int key = 0, ret, cpu; 15 18 struct map_kptr *skel; 16 - int key = 0, ret; 17 - char buf[16]; 19 + char buf[16], *pbuf; 18 20 19 21 skel = map_kptr__open_and_load(); 20 22 if (!ASSERT_OK_PTR(skel, "map_kptr__open_and_load")) 21 23 return; 22 24 23 - ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref), &opts); 24 - ASSERT_OK(ret, "test_map_kptr_ref refcount"); 25 - ASSERT_OK(opts.retval, "test_map_kptr_ref retval"); 25 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref1), &opts); 26 + ASSERT_OK(ret, "test_map_kptr_ref1 refcount"); 27 + ASSERT_OK(opts.retval, "test_map_kptr_ref1 retval"); 26 28 ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref2), &opts); 27 29 ASSERT_OK(ret, "test_map_kptr_ref2 refcount"); 28 30 ASSERT_OK(opts.retval, "test_map_kptr_ref2 retval"); 29 31 32 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_ls_map_kptr_ref1), &lopts); 33 + ASSERT_OK(ret, "test_ls_map_kptr_ref1 refcount"); 34 + ASSERT_OK(lopts.retval, "test_ls_map_kptr_ref1 retval"); 35 + 36 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_ls_map_kptr_ref2), &lopts); 37 + ASSERT_OK(ret, "test_ls_map_kptr_ref2 refcount"); 38 + ASSERT_OK(lopts.retval, "test_ls_map_kptr_ref2 retval"); 39 + 30 40 if (test_run) 41 + goto exit; 42 + 43 + cpu = libbpf_num_possible_cpus(); 44 + if (!ASSERT_GT(cpu, 0, "libbpf_num_possible_cpus")) 45 + goto exit; 46 + 47 + pbuf = calloc(cpu, sizeof(buf)); 48 + if (!ASSERT_OK_PTR(pbuf, "calloc(pbuf)")) 31 49 goto exit; 32 50 33 51 ret = bpf_map__update_elem(skel->maps.array_map, 34 52 &key, sizeof(key), 
buf, sizeof(buf), 0); 35 53 ASSERT_OK(ret, "array_map update"); 36 - ret = bpf_map__update_elem(skel->maps.array_map, 37 - &key, sizeof(key), buf, sizeof(buf), 0); 38 - ASSERT_OK(ret, "array_map update2"); 54 + skel->data->ref--; 55 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts); 56 + ASSERT_OK(ret, "test_map_kptr_ref3 refcount"); 57 + ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval"); 39 58 40 - ret = bpf_map__update_elem(skel->maps.hash_map, 41 - &key, sizeof(key), buf, sizeof(buf), 0); 42 - ASSERT_OK(ret, "hash_map update"); 59 + ret = bpf_map__update_elem(skel->maps.pcpu_array_map, 60 + &key, sizeof(key), pbuf, cpu * sizeof(buf), 0); 61 + ASSERT_OK(ret, "pcpu_array_map update"); 62 + skel->data->ref--; 63 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts); 64 + ASSERT_OK(ret, "test_map_kptr_ref3 refcount"); 65 + ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval"); 66 + 43 67 ret = bpf_map__delete_elem(skel->maps.hash_map, &key, sizeof(key), 0); 44 68 ASSERT_OK(ret, "hash_map delete"); 69 + skel->data->ref--; 70 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts); 71 + ASSERT_OK(ret, "test_map_kptr_ref3 refcount"); 72 + ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval"); 45 73 46 - ret = bpf_map__update_elem(skel->maps.hash_malloc_map, 47 - &key, sizeof(key), buf, sizeof(buf), 0); 48 - ASSERT_OK(ret, "hash_malloc_map update"); 74 + ret = bpf_map__delete_elem(skel->maps.pcpu_hash_map, &key, sizeof(key), 0); 75 + ASSERT_OK(ret, "pcpu_hash_map delete"); 76 + skel->data->ref--; 77 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts); 78 + ASSERT_OK(ret, "test_map_kptr_ref3 refcount"); 79 + ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval"); 80 + 49 81 ret = bpf_map__delete_elem(skel->maps.hash_malloc_map, &key, sizeof(key), 0); 50 82 ASSERT_OK(ret, "hash_malloc_map delete"); 83 + skel->data->ref--; 84 + ret = 
bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts); 85 + ASSERT_OK(ret, "test_map_kptr_ref3 refcount"); 86 + ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval"); 51 87 52 - ret = bpf_map__update_elem(skel->maps.lru_hash_map, 53 - &key, sizeof(key), buf, sizeof(buf), 0); 54 - ASSERT_OK(ret, "lru_hash_map update"); 88 + ret = bpf_map__delete_elem(skel->maps.pcpu_hash_malloc_map, &key, sizeof(key), 0); 89 + ASSERT_OK(ret, "pcpu_hash_malloc_map delete"); 90 + skel->data->ref--; 91 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts); 92 + ASSERT_OK(ret, "test_map_kptr_ref3 refcount"); 93 + ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval"); 94 + 55 95 ret = bpf_map__delete_elem(skel->maps.lru_hash_map, &key, sizeof(key), 0); 56 96 ASSERT_OK(ret, "lru_hash_map delete"); 97 + skel->data->ref--; 98 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts); 99 + ASSERT_OK(ret, "test_map_kptr_ref3 refcount"); 100 + ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval"); 57 101 102 + ret = bpf_map__delete_elem(skel->maps.lru_pcpu_hash_map, &key, sizeof(key), 0); 103 + ASSERT_OK(ret, "lru_pcpu_hash_map delete"); 104 + skel->data->ref--; 105 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts); 106 + ASSERT_OK(ret, "test_map_kptr_ref3 refcount"); 107 + ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval"); 108 + 109 + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_ls_map_kptr_ref_del), &lopts); 110 + ASSERT_OK(ret, "test_ls_map_kptr_ref_del delete"); 111 + skel->data->ref--; 112 + ASSERT_OK(lopts.retval, "test_ls_map_kptr_ref_del retval"); 113 + 114 + free(pbuf); 58 115 exit: 59 116 map_kptr__destroy(skel); 60 117 } 61 118 62 - void test_map_kptr(void) 119 + static int kern_sync_rcu_tasks_trace(struct rcu_tasks_trace_gp *rcu) 63 120 { 64 - if (test__start_subtest("success")) { 121 + long gp_seq = READ_ONCE(rcu->bss->gp_seq); 122 + 
LIBBPF_OPTS(bpf_test_run_opts, opts); 123 + 124 + if (!ASSERT_OK(bpf_prog_test_run_opts(bpf_program__fd(rcu->progs.do_call_rcu_tasks_trace), 125 + &opts), "do_call_rcu_tasks_trace")) 126 + return -EFAULT; 127 + if (!ASSERT_OK(opts.retval, "opts.retval == 0")) 128 + return -EFAULT; 129 + while (gp_seq == READ_ONCE(rcu->bss->gp_seq)) 130 + sched_yield(); 131 + return 0; 132 + } 133 + 134 + void serial_test_map_kptr(void) 135 + { 136 + struct rcu_tasks_trace_gp *skel; 137 + 138 + RUN_TESTS(map_kptr_fail); 139 + 140 + skel = rcu_tasks_trace_gp__open_and_load(); 141 + if (!ASSERT_OK_PTR(skel, "rcu_tasks_trace_gp__open_and_load")) 142 + return; 143 + if (!ASSERT_OK(rcu_tasks_trace_gp__attach(skel), "rcu_tasks_trace_gp__attach")) 144 + goto end; 145 + 146 + if (test__start_subtest("success-map")) { 147 + test_map_kptr_success(true); 148 + 149 + ASSERT_OK(kern_sync_rcu_tasks_trace(skel), "sync rcu_tasks_trace"); 150 + ASSERT_OK(kern_sync_rcu(), "sync rcu"); 151 + /* Observe refcount dropping to 1 on bpf_map_free_deferred */ 65 152 test_map_kptr_success(false); 66 - /* Do test_run twice, so that we see refcount going back to 1 67 - * after we leave it in map from first iteration. 68 - */ 153 + 154 + ASSERT_OK(kern_sync_rcu_tasks_trace(skel), "sync rcu_tasks_trace"); 155 + ASSERT_OK(kern_sync_rcu(), "sync rcu"); 156 + /* Observe refcount dropping to 1 on synchronous delete elem */ 69 157 test_map_kptr_success(true); 70 158 } 71 159 72 - RUN_TESTS(map_kptr_fail); 160 + end: 161 + rcu_tasks_trace_gp__destroy(skel); 162 + return; 73 163 }
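kern_sync_rcu_tasks_trace() above snapshots a sequence counter, triggers a grace period via a BPF program, then yields until the counter advances. The same wait shape in a self-contained form, with a plain thread standing in for the BPF-driven callback that bumps `gp_seq` (all names here are illustrative):

```c
#include <pthread.h>
#include <sched.h>

#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static long gp_seq;

/* Stand-in for the asynchronous work (in the test: the
 * call_rcu_tasks_trace() callback) that advances the counter. */
static void *bump_seq(void *arg)
{
	(void)arg;
	__sync_fetch_and_add(&gp_seq, 1);
	return NULL;
}

/* Snapshot the counter, kick off the work, then yield until the
 * counter moves past the snapshot. */
static int wait_for_seq_advance(void)
{
	long snap = READ_ONCE(gp_seq);
	pthread_t t;

	if (pthread_create(&t, NULL, bump_seq, NULL))
		return -1;
	while (READ_ONCE(gp_seq) == snap)
		sched_yield();
	pthread_join(t, NULL);
	return 0;
}
```

Polling a published sequence number like this avoids blocking primitives entirely, which is why the selftest can implement it with nothing but a BPF-visible global and sched_yield().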
+17 -2
tools/testing/selftests/bpf/prog_tests/mptcp.c
··· 7 7 #include "network_helpers.h" 8 8 #include "mptcp_sock.skel.h" 9 9 10 + #define NS_TEST "mptcp_ns" 11 + 10 12 #ifndef TCP_CA_NAME_MAX 11 13 #define TCP_CA_NAME_MAX 16 12 14 #endif ··· 140 138 141 139 static void test_base(void) 142 140 { 141 + struct nstoken *nstoken = NULL; 143 142 int server_fd, cgroup_fd; 144 143 145 144 cgroup_fd = test__join_cgroup("/mptcp"); 146 145 if (!ASSERT_GE(cgroup_fd, 0, "test__join_cgroup")) 147 146 return; 147 + 148 + SYS(fail, "ip netns add %s", NS_TEST); 149 + SYS(fail, "ip -net %s link set dev lo up", NS_TEST); 150 + 151 + nstoken = open_netns(NS_TEST); 152 + if (!ASSERT_OK_PTR(nstoken, "open_netns")) 153 + goto fail; 148 154 149 155 /* without MPTCP */ 150 156 server_fd = start_server(AF_INET, SOCK_STREAM, NULL, 0, 0); ··· 167 157 /* with MPTCP */ 168 158 server_fd = start_mptcp_server(AF_INET, NULL, 0, 0); 169 159 if (!ASSERT_GE(server_fd, 0, "start_mptcp_server")) 170 - goto close_cgroup_fd; 160 + goto fail; 171 161 172 162 ASSERT_OK(run_test(cgroup_fd, server_fd, true), "run_test mptcp"); 173 163 174 164 close(server_fd); 175 165 176 - close_cgroup_fd: 166 + fail: 167 + if (nstoken) 168 + close_netns(nstoken); 169 + 170 + SYS_NOFAIL("ip netns del " NS_TEST " &> /dev/null"); 171 + 177 172 close(cgroup_fd); 178 173 } 179 174
+93
tools/testing/selftests/bpf/prog_tests/parse_tcp_hdr_opt.c
···
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include <network_helpers.h>
+#include "test_parse_tcp_hdr_opt.skel.h"
+#include "test_parse_tcp_hdr_opt_dynptr.skel.h"
+#include "test_tcp_hdr_options.h"
+
+struct test_pkt {
+	struct ipv6_packet pk6_v6;
+	u8 options[16];
+} __packed;
+
+struct test_pkt pkt = {
+	.pk6_v6.eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
+	.pk6_v6.iph.nexthdr = IPPROTO_TCP,
+	.pk6_v6.iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
+	.pk6_v6.tcp.urg_ptr = 123,
+	.pk6_v6.tcp.doff = 9, /* 16 bytes of options */
+
+	.options = {
+		TCPOPT_MSS, 4, 0x05, 0xB4, TCPOPT_NOP, TCPOPT_NOP,
+		0, 6, 0xBB, 0xBB, 0xBB, 0xBB, TCPOPT_EOL
+	},
+};
+
+static void test_parse_opt(void)
+{
+	struct test_parse_tcp_hdr_opt *skel;
+	struct bpf_program *prog;
+	char buf[128];
+	int err;
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = &pkt,
+		    .data_size_in = sizeof(pkt),
+		    .data_out = buf,
+		    .data_size_out = sizeof(buf),
+		    .repeat = 3,
+	);
+
+	skel = test_parse_tcp_hdr_opt__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	pkt.options[6] = skel->rodata->tcp_hdr_opt_kind_tpr;
+	prog = skel->progs.xdp_ingress_v6;
+
+	err = bpf_prog_test_run_opts(bpf_program__fd(prog), &topts);
+	ASSERT_OK(err, "ipv6 test_run");
+	ASSERT_EQ(topts.retval, XDP_PASS, "ipv6 test_run retval");
+	ASSERT_EQ(skel->bss->server_id, 0xBBBBBBBB, "server id");
+
+	test_parse_tcp_hdr_opt__destroy(skel);
+}
+
+static void test_parse_opt_dynptr(void)
+{
+	struct test_parse_tcp_hdr_opt_dynptr *skel;
+	struct bpf_program *prog;
+	char buf[128];
+	int err;
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = &pkt,
+		    .data_size_in = sizeof(pkt),
+		    .data_out = buf,
+		    .data_size_out = sizeof(buf),
+		    .repeat = 3,
+	);
+
+	skel = test_parse_tcp_hdr_opt_dynptr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	pkt.options[6] = skel->rodata->tcp_hdr_opt_kind_tpr;
+	prog = skel->progs.xdp_ingress_v6;
+
+	err = bpf_prog_test_run_opts(bpf_program__fd(prog), &topts);
+	ASSERT_OK(err, "ipv6 test_run");
+	ASSERT_EQ(topts.retval, XDP_PASS, "ipv6 test_run retval");
+	ASSERT_EQ(skel->bss->server_id, 0xBBBBBBBB, "server id");
+
+	test_parse_tcp_hdr_opt_dynptr__destroy(skel);
+}
+
+void test_parse_tcp_hdr_opt(void)
+{
+	if (test__start_subtest("parse_tcp_hdr_opt"))
+		test_parse_opt();
+	if (test__start_subtest("parse_tcp_hdr_opt_dynptr"))
+		test_parse_opt_dynptr();
+}
+3 -13
tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
···
 
 	bpf_program__set_autoload(skel->progs.get_cgroup_id, true);
 	bpf_program__set_autoload(skel->progs.task_succ, true);
-	bpf_program__set_autoload(skel->progs.no_lock, true);
 	bpf_program__set_autoload(skel->progs.two_regions, true);
 	bpf_program__set_autoload(skel->progs.non_sleepable_1, true);
 	bpf_program__set_autoload(skel->progs.non_sleepable_2, true);
+	bpf_program__set_autoload(skel->progs.task_trusted_non_rcuptr, true);
 	err = rcu_read_lock__load(skel);
 	if (!ASSERT_OK(err, "skel_load"))
 		goto out;
···
 
 static const char * const inproper_region_tests[] = {
 	"miss_lock",
+	"no_lock",
 	"miss_unlock",
 	"non_sleepable_rcu_mismatch",
 	"inproper_sleepable_helper",
···
 }
 
 static const char * const rcuptr_misuse_tests[] = {
-	"task_untrusted_non_rcuptr",
 	"task_untrusted_rcuptr",
 	"cross_rcu_region",
 };
···
 
 void test_rcu_read_lock(void)
 {
-	struct btf *vmlinux_btf;
 	int cgroup_fd;
-
-	vmlinux_btf = btf__load_vmlinux_btf();
-	if (!ASSERT_OK_PTR(vmlinux_btf, "could not load vmlinux BTF"))
-		return;
-	if (btf__find_by_name_kind(vmlinux_btf, "rcu", BTF_KIND_TYPE_TAG) < 0) {
-		test__skip();
-		goto out;
-	}
 
 	cgroup_fd = test__join_cgroup("/rcu_read_lock");
 	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /rcu_read_lock"))
···
 	if (test__start_subtest("negative_tests_rcuptr_misuse"))
 		test_rcuptr_misuse();
 	close(cgroup_fd);
-out:
-	btf__free(vmlinux_btf);
+out:;
 }
+46 -54
tools/testing/selftests/bpf/prog_tests/tc_redirect.c
···
 	return 0;
 }
 
-#define SYS(fmt, ...) \
-	({ \
-		char cmd[1024]; \
-		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \
-		if (!ASSERT_OK(system(cmd), cmd)) \
-			goto fail; \
-	})
-
 static int netns_setup_links_and_routes(struct netns_setup_result *result)
 {
 	struct nstoken *nstoken = NULL;
 	char veth_src_fwd_addr[IFADDR_STR_LEN+1] = {};
 
-	SYS("ip link add veth_src type veth peer name veth_src_fwd");
-	SYS("ip link add veth_dst type veth peer name veth_dst_fwd");
+	SYS(fail, "ip link add veth_src type veth peer name veth_src_fwd");
+	SYS(fail, "ip link add veth_dst type veth peer name veth_dst_fwd");
 
-	SYS("ip link set veth_dst_fwd address " MAC_DST_FWD);
-	SYS("ip link set veth_dst address " MAC_DST);
+	SYS(fail, "ip link set veth_dst_fwd address " MAC_DST_FWD);
+	SYS(fail, "ip link set veth_dst address " MAC_DST);
 
 	if (get_ifaddr("veth_src_fwd", veth_src_fwd_addr))
 		goto fail;
···
 	if (!ASSERT_GT(result->ifindex_veth_dst_fwd, 0, "ifindex_veth_dst_fwd"))
 		goto fail;
 
-	SYS("ip link set veth_src netns " NS_SRC);
-	SYS("ip link set veth_src_fwd netns " NS_FWD);
-	SYS("ip link set veth_dst_fwd netns " NS_FWD);
-	SYS("ip link set veth_dst netns " NS_DST);
+	SYS(fail, "ip link set veth_src netns " NS_SRC);
+	SYS(fail, "ip link set veth_src_fwd netns " NS_FWD);
+	SYS(fail, "ip link set veth_dst_fwd netns " NS_FWD);
+	SYS(fail, "ip link set veth_dst netns " NS_DST);
 
 	/** setup in 'src' namespace */
 	nstoken = open_netns(NS_SRC);
 	if (!ASSERT_OK_PTR(nstoken, "setns src"))
 		goto fail;
 
-	SYS("ip addr add " IP4_SRC "/32 dev veth_src");
-	SYS("ip addr add " IP6_SRC "/128 dev veth_src nodad");
-	SYS("ip link set dev veth_src up");
+	SYS(fail, "ip addr add " IP4_SRC "/32 dev veth_src");
+	SYS(fail, "ip addr add " IP6_SRC "/128 dev veth_src nodad");
+	SYS(fail, "ip link set dev veth_src up");
 
-	SYS("ip route add " IP4_DST "/32 dev veth_src scope global");
-	SYS("ip route add " IP4_NET "/16 dev veth_src scope global");
-	SYS("ip route add " IP6_DST "/128 dev veth_src scope global");
+	SYS(fail, "ip route add " IP4_DST "/32 dev veth_src scope global");
+	SYS(fail, "ip route add " IP4_NET "/16 dev veth_src scope global");
+	SYS(fail, "ip route add " IP6_DST "/128 dev veth_src scope global");
 
-	SYS("ip neigh add " IP4_DST " dev veth_src lladdr %s",
+	SYS(fail, "ip neigh add " IP4_DST " dev veth_src lladdr %s",
 	    veth_src_fwd_addr);
-	SYS("ip neigh add " IP6_DST " dev veth_src lladdr %s",
+	SYS(fail, "ip neigh add " IP6_DST " dev veth_src lladdr %s",
 	    veth_src_fwd_addr);
 
 	close_netns(nstoken);
···
 	 * needs v4 one in order to start ARP probing. IP4_NET route is added
 	 * to the endpoints so that the ARP processing will reply.
 	 */
-	SYS("ip addr add " IP4_SLL "/32 dev veth_src_fwd");
-	SYS("ip addr add " IP4_DLL "/32 dev veth_dst_fwd");
-	SYS("ip link set dev veth_src_fwd up");
-	SYS("ip link set dev veth_dst_fwd up");
+	SYS(fail, "ip addr add " IP4_SLL "/32 dev veth_src_fwd");
+	SYS(fail, "ip addr add " IP4_DLL "/32 dev veth_dst_fwd");
+	SYS(fail, "ip link set dev veth_src_fwd up");
+	SYS(fail, "ip link set dev veth_dst_fwd up");
 
-	SYS("ip route add " IP4_SRC "/32 dev veth_src_fwd scope global");
-	SYS("ip route add " IP6_SRC "/128 dev veth_src_fwd scope global");
-	SYS("ip route add " IP4_DST "/32 dev veth_dst_fwd scope global");
-	SYS("ip route add " IP6_DST "/128 dev veth_dst_fwd scope global");
+	SYS(fail, "ip route add " IP4_SRC "/32 dev veth_src_fwd scope global");
+	SYS(fail, "ip route add " IP6_SRC "/128 dev veth_src_fwd scope global");
+	SYS(fail, "ip route add " IP4_DST "/32 dev veth_dst_fwd scope global");
+	SYS(fail, "ip route add " IP6_DST "/128 dev veth_dst_fwd scope global");
 
 	close_netns(nstoken);
···
 	if (!ASSERT_OK_PTR(nstoken, "setns dst"))
 		goto fail;
 
-	SYS("ip addr add " IP4_DST "/32 dev veth_dst");
-	SYS("ip addr add " IP6_DST "/128 dev veth_dst nodad");
-	SYS("ip link set dev veth_dst up");
+	SYS(fail, "ip addr add " IP4_DST "/32 dev veth_dst");
+	SYS(fail, "ip addr add " IP6_DST "/128 dev veth_dst nodad");
+	SYS(fail, "ip link set dev veth_dst up");
 
-	SYS("ip route add " IP4_SRC "/32 dev veth_dst scope global");
-	SYS("ip route add " IP4_NET "/16 dev veth_dst scope global");
-	SYS("ip route add " IP6_SRC "/128 dev veth_dst scope global");
+	SYS(fail, "ip route add " IP4_SRC "/32 dev veth_dst scope global");
+	SYS(fail, "ip route add " IP4_NET "/16 dev veth_dst scope global");
+	SYS(fail, "ip route add " IP6_SRC "/128 dev veth_dst scope global");
 
-	SYS("ip neigh add " IP4_SRC " dev veth_dst lladdr " MAC_DST_FWD);
-	SYS("ip neigh add " IP6_SRC " dev veth_dst lladdr " MAC_DST_FWD);
+	SYS(fail, "ip neigh add " IP4_SRC " dev veth_dst lladdr " MAC_DST_FWD);
+	SYS(fail, "ip neigh add " IP6_SRC " dev veth_dst lladdr " MAC_DST_FWD);
 
 	close_netns(nstoken);
···
 
 static int test_ping(int family, const char *addr)
 {
-	SYS("ip netns exec " NS_SRC " %s " PING_ARGS " %s > /dev/null", ping_command(family), addr);
+	SYS(fail, "ip netns exec " NS_SRC " %s " PING_ARGS " %s > /dev/null", ping_command(family), addr);
 	return 0;
 fail:
 	return -1;
···
 	if (!ASSERT_OK(err, "ioctl TUNSETIFF"))
 		goto fail;
 
-	SYS("ip link set dev %s up", name);
+	SYS(fail, "ip link set dev %s up", name);
 
 	return fd;
 fail:
···
 	XGRESS_FILTER_ADD(&qdisc_veth_dst_fwd, BPF_TC_EGRESS, skel->progs.tc_chk, 0);
 
 	/* Setup route and neigh tables */
-	SYS("ip -netns " NS_SRC " addr add dev tun_src " IP4_TUN_SRC "/24");
-	SYS("ip -netns " NS_FWD " addr add dev tun_fwd " IP4_TUN_FWD "/24");
+	SYS(fail, "ip -netns " NS_SRC " addr add dev tun_src " IP4_TUN_SRC "/24");
+	SYS(fail, "ip -netns " NS_FWD " addr add dev tun_fwd " IP4_TUN_FWD "/24");
 
-	SYS("ip -netns " NS_SRC " addr add dev tun_src " IP6_TUN_SRC "/64 nodad");
-	SYS("ip -netns " NS_FWD " addr add dev tun_fwd " IP6_TUN_FWD "/64 nodad");
+	SYS(fail, "ip -netns " NS_SRC " addr add dev tun_src " IP6_TUN_SRC "/64 nodad");
+	SYS(fail, "ip -netns " NS_FWD " addr add dev tun_fwd " IP6_TUN_FWD "/64 nodad");
 
-	SYS("ip -netns " NS_SRC " route del " IP4_DST "/32 dev veth_src scope global");
-	SYS("ip -netns " NS_SRC " route add " IP4_DST "/32 via " IP4_TUN_FWD
+	SYS(fail, "ip -netns " NS_SRC " route del " IP4_DST "/32 dev veth_src scope global");
+	SYS(fail, "ip -netns " NS_SRC " route add " IP4_DST "/32 via " IP4_TUN_FWD
 	    " dev tun_src scope global");
-	SYS("ip -netns " NS_DST " route add " IP4_TUN_SRC "/32 dev veth_dst scope global");
-	SYS("ip -netns " NS_SRC " route del " IP6_DST "/128 dev veth_src scope global");
-	SYS("ip -netns " NS_SRC " route add " IP6_DST "/128 via " IP6_TUN_FWD
+	SYS(fail, "ip -netns " NS_DST " route add " IP4_TUN_SRC "/32 dev veth_dst scope global");
+	SYS(fail, "ip -netns " NS_SRC " route del " IP6_DST "/128 dev veth_src scope global");
+	SYS(fail, "ip -netns " NS_SRC " route add " IP6_DST "/128 via " IP6_TUN_FWD
 	    " dev tun_src scope global");
-	SYS("ip -netns " NS_DST " route add " IP6_TUN_SRC "/128 dev veth_dst scope global");
+	SYS(fail, "ip -netns " NS_DST " route add " IP6_TUN_SRC "/128 dev veth_dst scope global");
 
-	SYS("ip -netns " NS_DST " neigh add " IP4_TUN_SRC " dev veth_dst lladdr " MAC_DST_FWD);
-	SYS("ip -netns " NS_DST " neigh add " IP6_TUN_SRC " dev veth_dst lladdr " MAC_DST_FWD);
+	SYS(fail, "ip -netns " NS_DST " neigh add " IP4_TUN_SRC " dev veth_dst lladdr " MAC_DST_FWD);
+	SYS(fail, "ip -netns " NS_DST " neigh add " IP6_TUN_SRC " dev veth_dst lladdr " MAC_DST_FWD);
 
 	if (!ASSERT_OK(set_forwarding(false), "disable forwarding"))
 		goto fail;
+28 -43
tools/testing/selftests/bpf/prog_tests/test_tunnel.c
···
 
 #define PING_ARGS "-i 0.01 -c 3 -w 10 -q"
 
-#define SYS(fmt, ...) \
-	({ \
-		char cmd[1024]; \
-		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \
-		if (!ASSERT_OK(system(cmd), cmd)) \
-			goto fail; \
-	})
-
-#define SYS_NOFAIL(fmt, ...) \
-	({ \
-		char cmd[1024]; \
-		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \
-		system(cmd); \
-	})
-
 static int config_device(void)
 {
-	SYS("ip netns add at_ns0");
-	SYS("ip link add veth0 address " MAC_VETH1 " type veth peer name veth1");
-	SYS("ip link set veth0 netns at_ns0");
-	SYS("ip addr add " IP4_ADDR1_VETH1 "/24 dev veth1");
-	SYS("ip link set dev veth1 up mtu 1500");
-	SYS("ip netns exec at_ns0 ip addr add " IP4_ADDR_VETH0 "/24 dev veth0");
-	SYS("ip netns exec at_ns0 ip link set dev veth0 up mtu 1500");
+	SYS(fail, "ip netns add at_ns0");
+	SYS(fail, "ip link add veth0 address " MAC_VETH1 " type veth peer name veth1");
+	SYS(fail, "ip link set veth0 netns at_ns0");
+	SYS(fail, "ip addr add " IP4_ADDR1_VETH1 "/24 dev veth1");
+	SYS(fail, "ip link set dev veth1 up mtu 1500");
+	SYS(fail, "ip netns exec at_ns0 ip addr add " IP4_ADDR_VETH0 "/24 dev veth0");
+	SYS(fail, "ip netns exec at_ns0 ip link set dev veth0 up mtu 1500");
 
 	return 0;
 fail:
···
 static int add_vxlan_tunnel(void)
 {
 	/* at_ns0 namespace */
-	SYS("ip netns exec at_ns0 ip link add dev %s type vxlan external gbp dstport 4789",
+	SYS(fail, "ip netns exec at_ns0 ip link add dev %s type vxlan external gbp dstport 4789",
 	    VXLAN_TUNL_DEV0);
-	SYS("ip netns exec at_ns0 ip link set dev %s address %s up",
+	SYS(fail, "ip netns exec at_ns0 ip link set dev %s address %s up",
 	    VXLAN_TUNL_DEV0, MAC_TUNL_DEV0);
-	SYS("ip netns exec at_ns0 ip addr add dev %s %s/24",
+	SYS(fail, "ip netns exec at_ns0 ip addr add dev %s %s/24",
 	    VXLAN_TUNL_DEV0, IP4_ADDR_TUNL_DEV0);
-	SYS("ip netns exec at_ns0 ip neigh add %s lladdr %s dev %s",
+	SYS(fail, "ip netns exec at_ns0 ip neigh add %s lladdr %s dev %s",
 	    IP4_ADDR_TUNL_DEV1, MAC_TUNL_DEV1, VXLAN_TUNL_DEV0);
-	SYS("ip netns exec at_ns0 ip neigh add %s lladdr %s dev veth0",
+	SYS(fail, "ip netns exec at_ns0 ip neigh add %s lladdr %s dev veth0",
 	    IP4_ADDR2_VETH1, MAC_VETH1);
 
 	/* root namespace */
-	SYS("ip link add dev %s type vxlan external gbp dstport 4789",
+	SYS(fail, "ip link add dev %s type vxlan external gbp dstport 4789",
 	    VXLAN_TUNL_DEV1);
-	SYS("ip link set dev %s address %s up", VXLAN_TUNL_DEV1, MAC_TUNL_DEV1);
-	SYS("ip addr add dev %s %s/24", VXLAN_TUNL_DEV1, IP4_ADDR_TUNL_DEV1);
-	SYS("ip neigh add %s lladdr %s dev %s",
+	SYS(fail, "ip link set dev %s address %s up", VXLAN_TUNL_DEV1, MAC_TUNL_DEV1);
+	SYS(fail, "ip addr add dev %s %s/24", VXLAN_TUNL_DEV1, IP4_ADDR_TUNL_DEV1);
+	SYS(fail, "ip neigh add %s lladdr %s dev %s",
 	    IP4_ADDR_TUNL_DEV0, MAC_TUNL_DEV0, VXLAN_TUNL_DEV1);
 
 	return 0;
···
 
 static int add_ip6vxlan_tunnel(void)
 {
-	SYS("ip netns exec at_ns0 ip -6 addr add %s/96 dev veth0",
+	SYS(fail, "ip netns exec at_ns0 ip -6 addr add %s/96 dev veth0",
 	    IP6_ADDR_VETH0);
-	SYS("ip netns exec at_ns0 ip link set dev veth0 up");
-	SYS("ip -6 addr add %s/96 dev veth1", IP6_ADDR1_VETH1);
-	SYS("ip -6 addr add %s/96 dev veth1", IP6_ADDR2_VETH1);
-	SYS("ip link set dev veth1 up");
+	SYS(fail, "ip netns exec at_ns0 ip link set dev veth0 up");
+	SYS(fail, "ip -6 addr add %s/96 dev veth1", IP6_ADDR1_VETH1);
+	SYS(fail, "ip -6 addr add %s/96 dev veth1", IP6_ADDR2_VETH1);
+	SYS(fail, "ip link set dev veth1 up");
 
 	/* at_ns0 namespace */
-	SYS("ip netns exec at_ns0 ip link add dev %s type vxlan external dstport 4789",
+	SYS(fail, "ip netns exec at_ns0 ip link add dev %s type vxlan external dstport 4789",
 	    IP6VXLAN_TUNL_DEV0);
-	SYS("ip netns exec at_ns0 ip addr add dev %s %s/24",
+	SYS(fail, "ip netns exec at_ns0 ip addr add dev %s %s/24",
 	    IP6VXLAN_TUNL_DEV0, IP4_ADDR_TUNL_DEV0);
-	SYS("ip netns exec at_ns0 ip link set dev %s address %s up",
+	SYS(fail, "ip netns exec at_ns0 ip link set dev %s address %s up",
 	    IP6VXLAN_TUNL_DEV0, MAC_TUNL_DEV0);
 
 	/* root namespace */
-	SYS("ip link add dev %s type vxlan external dstport 4789",
+	SYS(fail, "ip link add dev %s type vxlan external dstport 4789",
 	    IP6VXLAN_TUNL_DEV1);
-	SYS("ip addr add dev %s %s/24", IP6VXLAN_TUNL_DEV1, IP4_ADDR_TUNL_DEV1);
-	SYS("ip link set dev %s address %s up",
+	SYS(fail, "ip addr add dev %s %s/24", IP6VXLAN_TUNL_DEV1, IP4_ADDR_TUNL_DEV1);
+	SYS(fail, "ip link set dev %s address %s up",
 	    IP6VXLAN_TUNL_DEV1, MAC_TUNL_DEV1);
 
 	return 0;
···
 
 static int test_ping(int family, const char *addr)
 {
-	SYS("%s %s %s > /dev/null", ping_command(family), PING_ARGS, addr);
+	SYS(fail, "%s %s %s > /dev/null", ping_command(family), PING_ARGS, addr);
 	return 0;
 fail:
 	return -1;
+3
tools/testing/selftests/bpf/prog_tests/timer.c
···
 	/* check that timer_cb2() was executed twice */
 	ASSERT_EQ(timer_skel->bss->bss_data, 10, "bss_data");
 
+	/* check that timer_cb3() was executed twice */
+	ASSERT_EQ(timer_skel->bss->abs_data, 12, "abs_data");
+
 	/* check that there were no errors in timer execution */
 	ASSERT_EQ(timer_skel->bss->err, 0, "err");
 
+9
tools/testing/selftests/bpf/prog_tests/uninit_stack.c
···
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include "uninit_stack.skel.h"
+
+void test_uninit_stack(void)
+{
+	RUN_TESTS(uninit_stack);
+}
+1 -1
tools/testing/selftests/bpf/prog_tests/user_ringbuf.c
···
 	/* Kick the kernel, causing it to drain the ring buffer and then wake
 	 * up the test thread waiting on epoll.
 	 */
-	syscall(__NR_getrlimit);
+	syscall(__NR_prlimit64);
 
 	return NULL;
 }
+9 -2
tools/testing/selftests/bpf/prog_tests/xdp_attach.c
···
 #define IFINDEX_LO 1
 #define XDP_FLAGS_REPLACE (1U << 4)
 
-void serial_test_xdp_attach(void)
+static void test_xdp_attach(const char *file)
 {
 	__u32 duration = 0, id1, id2, id0 = 0, len;
 	struct bpf_object *obj1, *obj2, *obj3;
-	const char *file = "./test_xdp.bpf.o";
 	struct bpf_prog_info info = {};
 	int err, fd1, fd2, fd3;
 	LIBBPF_OPTS(bpf_xdp_attach_opts, opts);
···
 	bpf_object__close(obj2);
 out_1:
 	bpf_object__close(obj1);
+}
+
+void serial_test_xdp_attach(void)
+{
+	if (test__start_subtest("xdp_attach"))
+		test_xdp_attach("./test_xdp.bpf.o");
+	if (test__start_subtest("xdp_attach_dynptr"))
+		test_xdp_attach("./test_xdp_dynptr.bpf.o");
 }
+15 -23
tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
···
 static int bonding_setup(struct skeletons *skeletons, int mode, int xmit_policy,
 			 int bond_both_attach)
 {
-#define SYS(fmt, ...) \
-	({ \
-		char cmd[1024]; \
-		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \
-		if (!ASSERT_OK(system(cmd), cmd)) \
-			return -1; \
-	})
-
-	SYS("ip netns add ns_dst");
-	SYS("ip link add veth1_1 type veth peer name veth2_1 netns ns_dst");
-	SYS("ip link add veth1_2 type veth peer name veth2_2 netns ns_dst");
-
-	SYS("ip link add bond1 type bond mode %s xmit_hash_policy %s",
+	SYS(fail, "ip netns add ns_dst");
+	SYS(fail, "ip link add veth1_1 type veth peer name veth2_1 netns ns_dst");
+	SYS(fail, "ip link add veth1_2 type veth peer name veth2_2 netns ns_dst");
+
+	SYS(fail, "ip link add bond1 type bond mode %s xmit_hash_policy %s",
 	    mode_names[mode], xmit_policy_names[xmit_policy]);
-	SYS("ip link set bond1 up address " BOND1_MAC_STR " addrgenmode none");
-	SYS("ip -netns ns_dst link add bond2 type bond mode %s xmit_hash_policy %s",
+	SYS(fail, "ip link set bond1 up address " BOND1_MAC_STR " addrgenmode none");
+	SYS(fail, "ip -netns ns_dst link add bond2 type bond mode %s xmit_hash_policy %s",
 	    mode_names[mode], xmit_policy_names[xmit_policy]);
-	SYS("ip -netns ns_dst link set bond2 up address " BOND2_MAC_STR " addrgenmode none");
+	SYS(fail, "ip -netns ns_dst link set bond2 up address " BOND2_MAC_STR " addrgenmode none");
 
-	SYS("ip link set veth1_1 master bond1");
+	SYS(fail, "ip link set veth1_1 master bond1");
 	if (bond_both_attach == BOND_BOTH_AND_ATTACH) {
-		SYS("ip link set veth1_2 master bond1");
+		SYS(fail, "ip link set veth1_2 master bond1");
 	} else {
-		SYS("ip link set veth1_2 up addrgenmode none");
+		SYS(fail, "ip link set veth1_2 up addrgenmode none");
 
 		if (xdp_attach(skeletons, skeletons->xdp_dummy->progs.xdp_dummy_prog,
 			       "veth1_2"))
 			return -1;
 	}
 
-	SYS("ip -netns ns_dst link set veth2_1 master bond2");
+	SYS(fail, "ip -netns ns_dst link set veth2_1 master bond2");
 
 	if (bond_both_attach == BOND_BOTH_AND_ATTACH)
-		SYS("ip -netns ns_dst link set veth2_2 master bond2");
+		SYS(fail, "ip -netns ns_dst link set veth2_2 master bond2");
 	else
-		SYS("ip -netns ns_dst link set veth2_2 up addrgenmode none");
+		SYS(fail, "ip -netns ns_dst link set veth2_2 up addrgenmode none");
 
 	/* Load a dummy program on sending side as with veth peer needs to have a
 	 * XDP program loaded as well.
···
 	}
 
 	return 0;
-
-#undef SYS
+fail:
+	return -1;
 }
 
 static void bonding_cleanup(struct skeletons *skeletons)
+11 -19
tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
···
 #include <uapi/linux/netdev.h>
 #include "test_xdp_do_redirect.skel.h"
 
-#define SYS(fmt, ...) \
-	({ \
-		char cmd[1024]; \
-		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \
-		if (!ASSERT_OK(system(cmd), cmd)) \
-			goto out; \
-	})
-
 struct udp_packet {
 	struct ethhdr eth;
 	struct ipv6hdr iph;
···
 	 * iface and NUM_PKTS-2 in the TC hook. We match the packets on the UDP
 	 * payload.
 	 */
-	SYS("ip netns add testns");
+	SYS(out, "ip netns add testns");
 	nstoken = open_netns("testns");
 	if (!ASSERT_OK_PTR(nstoken, "setns"))
 		goto out;
 
-	SYS("ip link add veth_src type veth peer name veth_dst");
-	SYS("ip link set dev veth_src address 00:11:22:33:44:55");
-	SYS("ip link set dev veth_dst address 66:77:88:99:aa:bb");
-	SYS("ip link set dev veth_src up");
-	SYS("ip link set dev veth_dst up");
-	SYS("ip addr add dev veth_src fc00::1/64");
-	SYS("ip addr add dev veth_dst fc00::2/64");
-	SYS("ip neigh add fc00::2 dev veth_src lladdr 66:77:88:99:aa:bb");
+	SYS(out, "ip link add veth_src type veth peer name veth_dst");
+	SYS(out, "ip link set dev veth_src address 00:11:22:33:44:55");
+	SYS(out, "ip link set dev veth_dst address 66:77:88:99:aa:bb");
+	SYS(out, "ip link set dev veth_src up");
+	SYS(out, "ip link set dev veth_dst up");
+	SYS(out, "ip addr add dev veth_src fc00::1/64");
+	SYS(out, "ip addr add dev veth_dst fc00::2/64");
+	SYS(out, "ip neigh add fc00::2 dev veth_src lladdr 66:77:88:99:aa:bb");
 
 	/* We enable forwarding in the test namespace because that will cause
 	 * the packets that go through the kernel stack (with XDP_PASS) to be
···
 	 * code didn't have this, so we keep the test behaviour to make sure the
 	 * bug doesn't resurface.
 	 */
-	SYS("sysctl -qw net.ipv6.conf.all.forwarding=1");
+	SYS(out, "sysctl -qw net.ipv6.conf.all.forwarding=1");
 
 	ifindex_src = if_nametoindex("veth_src");
 	ifindex_dst = if_nametoindex("veth_dst");
···
 out:
 	if (nstoken)
 		close_netns(nstoken);
-	system("ip netns del testns");
+	SYS_NOFAIL("ip netns del testns");
 	test_xdp_do_redirect__destroy(skel);
 }
+9 -14
tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
···
 #define PREFIX_LEN "8"
 #define FAMILY AF_INET
 
-#define SYS(cmd) ({ \
-	if (!ASSERT_OK(system(cmd), (cmd))) \
-		goto out; \
-})
-
 struct xsk {
 	void *umem_area;
 	struct xsk_umem *umem;
···
 
 	/* Setup new networking namespace, with a veth pair. */
 
-	SYS("ip netns add xdp_metadata");
+	SYS(out, "ip netns add xdp_metadata");
 	tok = open_netns("xdp_metadata");
-	SYS("ip link add numtxqueues 1 numrxqueues 1 " TX_NAME
+	SYS(out, "ip link add numtxqueues 1 numrxqueues 1 " TX_NAME
 	    " type veth peer " RX_NAME " numtxqueues 1 numrxqueues 1");
-	SYS("ip link set dev " TX_NAME " address 00:00:00:00:00:01");
-	SYS("ip link set dev " RX_NAME " address 00:00:00:00:00:02");
-	SYS("ip link set dev " TX_NAME " up");
-	SYS("ip link set dev " RX_NAME " up");
-	SYS("ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME);
-	SYS("ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME);
+	SYS(out, "ip link set dev " TX_NAME " address 00:00:00:00:00:01");
+	SYS(out, "ip link set dev " RX_NAME " address 00:00:00:00:00:02");
+	SYS(out, "ip link set dev " TX_NAME " up");
+	SYS(out, "ip link set dev " RX_NAME " up");
+	SYS(out, "ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME);
+	SYS(out, "ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME);
 
 	rx_ifindex = if_nametoindex(RX_NAME);
 	tx_ifindex = if_nametoindex(TX_NAME);
···
 	xdp_metadata__destroy(bpf_obj);
 	if (tok)
 		close_netns(tok);
-	system("ip netns del xdp_metadata");
+	SYS_NOFAIL("ip netns del xdp_metadata");
 }
+18 -23
tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c
···
 
 #define CMD_OUT_BUF_SIZE 1023
 
-#define SYS(cmd) ({ \
-	if (!ASSERT_OK(system(cmd), (cmd))) \
-		goto out; \
-})
-
 #define SYS_OUT(cmd, ...) ({ \
 	char buf[1024]; \
 	snprintf(buf, sizeof(buf), (cmd), ##__VA_ARGS__); \
···
 	char buf[CMD_OUT_BUF_SIZE];
 	size_t size;
 
-	SYS("ip netns add synproxy");
+	SYS(out, "ip netns add synproxy");
 
-	SYS("ip link add tmp0 type veth peer name tmp1");
-	SYS("ip link set tmp1 netns synproxy");
-	SYS("ip link set tmp0 up");
-	SYS("ip addr replace 198.18.0.1/24 dev tmp0");
+	SYS(out, "ip link add tmp0 type veth peer name tmp1");
+	SYS(out, "ip link set tmp1 netns synproxy");
+	SYS(out, "ip link set tmp0 up");
+	SYS(out, "ip addr replace 198.18.0.1/24 dev tmp0");
 
 	/* When checksum offload is enabled, the XDP program sees wrong
 	 * checksums and drops packets.
 	 */
-	SYS("ethtool -K tmp0 tx off");
+	SYS(out, "ethtool -K tmp0 tx off");
 	if (xdp)
 		/* Workaround required for veth. */
-		SYS("ip link set tmp0 xdp object xdp_dummy.bpf.o section xdp 2> /dev/null");
+		SYS(out, "ip link set tmp0 xdp object xdp_dummy.bpf.o section xdp 2> /dev/null");
 
 	ns = open_netns("synproxy");
 	if (!ASSERT_OK_PTR(ns, "setns"))
 		goto out;
 
-	SYS("ip link set lo up");
-	SYS("ip link set tmp1 up");
-	SYS("ip addr replace 198.18.0.2/24 dev tmp1");
-	SYS("sysctl -w net.ipv4.tcp_syncookies=2");
-	SYS("sysctl -w net.ipv4.tcp_timestamps=1");
-	SYS("sysctl -w net.netfilter.nf_conntrack_tcp_loose=0");
-	SYS("iptables-legacy -t raw -I PREROUTING \
+	SYS(out, "ip link set lo up");
+	SYS(out, "ip link set tmp1 up");
+	SYS(out, "ip addr replace 198.18.0.2/24 dev tmp1");
+	SYS(out, "sysctl -w net.ipv4.tcp_syncookies=2");
+	SYS(out, "sysctl -w net.ipv4.tcp_timestamps=1");
+	SYS(out, "sysctl -w net.netfilter.nf_conntrack_tcp_loose=0");
+	SYS(out, "iptables-legacy -t raw -I PREROUTING \
 	    -i tmp1 -p tcp -m tcp --syn --dport 8080 -j CT --notrack");
-	SYS("iptables-legacy -t filter -A INPUT \
+	SYS(out, "iptables-legacy -t filter -A INPUT \
 	    -i tmp1 -p tcp -m tcp --dport 8080 -m state --state INVALID,UNTRACKED \
 	    -j SYNPROXY --sack-perm --timestamp --wscale 7 --mss 1460");
-	SYS("iptables-legacy -t filter -A INPUT \
+	SYS(out, "iptables-legacy -t filter -A INPUT \
 	    -i tmp1 -m state --state INVALID -j DROP");
 
 	ctrl_file = SYS_OUT("./xdp_synproxy --iface tmp1 --ports 8080 \
···
 	if (ns)
 		close_netns(ns);
 
-	system("ip link del tmp0");
-	system("ip netns del synproxy");
+	SYS_NOFAIL("ip link del tmp0");
+	SYS_NOFAIL("ip netns del synproxy");
 }
 
 void test_xdp_synproxy(void)
+26 -41
tools/testing/selftests/bpf/prog_tests/xfrm_info.c
···
 	"proto esp aead 'rfc4106(gcm(aes))' " \
 	"0xe4d8f4b4da1df18a3510b3781496daa82488b713 128 mode tunnel "
 
-#define SYS(fmt, ...) \
-	({ \
-		char cmd[1024]; \
-		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \
-		if (!ASSERT_OK(system(cmd), cmd)) \
-			goto fail; \
-	})
-
-#define SYS_NOFAIL(fmt, ...) \
-	({ \
-		char cmd[1024]; \
-		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \
-		system(cmd); \
-	})
-
 static int attach_tc_prog(struct bpf_tc_hook *hook, int igr_fd, int egr_fd)
 {
 	LIBBPF_OPTS(bpf_tc_opts, opts1, .handle = 1, .priority = 1,
···
 
 static int config_underlay(void)
 {
-	SYS("ip netns add " NS0);
-	SYS("ip netns add " NS1);
-	SYS("ip netns add " NS2);
+	SYS(fail, "ip netns add " NS0);
+	SYS(fail, "ip netns add " NS1);
+	SYS(fail, "ip netns add " NS2);
 
 	/* NS0 <-> NS1 [veth01 <-> veth10] */
-	SYS("ip link add veth01 netns " NS0 " type veth peer name veth10 netns " NS1);
-	SYS("ip -net " NS0 " addr add " IP4_ADDR_VETH01 "/24 dev veth01");
-	SYS("ip -net " NS0 " link set dev veth01 up");
-	SYS("ip -net " NS1 " addr add " IP4_ADDR_VETH10 "/24 dev veth10");
-	SYS("ip -net " NS1 " link set dev veth10 up");
+	SYS(fail, "ip link add veth01 netns " NS0 " type veth peer name veth10 netns " NS1);
+	SYS(fail, "ip -net " NS0 " addr add " IP4_ADDR_VETH01 "/24 dev veth01");
+	SYS(fail, "ip -net " NS0 " link set dev veth01 up");
+	SYS(fail, "ip -net " NS1 " addr add " IP4_ADDR_VETH10 "/24 dev veth10");
+	SYS(fail, "ip -net " NS1 " link set dev veth10 up");
 
 	/* NS0 <-> NS2 [veth02 <-> veth20] */
-	SYS("ip link add veth02 netns " NS0 " type veth peer name veth20 netns " NS2);
-	SYS("ip -net " NS0 " addr add " IP4_ADDR_VETH02 "/24 dev veth02");
-	SYS("ip -net " NS0 " link set dev veth02 up");
-	SYS("ip -net " NS2 " addr add " IP4_ADDR_VETH20 "/24 dev veth20");
-	SYS("ip -net " NS2 " link set dev veth20 up");
+	SYS(fail, "ip link add veth02 netns " NS0 " type veth peer name veth20 netns " NS2);
+	SYS(fail, "ip -net " NS0 " addr add " IP4_ADDR_VETH02 "/24 dev veth02");
+	SYS(fail, "ip -net " NS0 " link set dev veth02 up");
+	SYS(fail, "ip -net " NS2 " addr add " IP4_ADDR_VETH20 "/24 dev veth20");
+	SYS(fail, "ip -net " NS2 " link set dev veth20 up");
 
 	return 0;
 fail:
···
 			      const char *ipv4_remote, int if_id)
 {
 	/* State: local -> remote */
-	SYS("ip -net %s xfrm state add src %s dst %s spi 1 "
+	SYS(fail, "ip -net %s xfrm state add src %s dst %s spi 1 "
 	    ESP_DUMMY_PARAMS "if_id %d", ns, ipv4_local, ipv4_remote, if_id);
 
 	/* State: local <- remote */
-	SYS("ip -net %s xfrm state add src %s dst %s spi 1 "
+	SYS(fail, "ip -net %s xfrm state add src %s dst %s spi 1 "
 	    ESP_DUMMY_PARAMS "if_id %d", ns, ipv4_remote, ipv4_local, if_id);
 
 	/* Policy: local -> remote */
-	SYS("ip -net %s xfrm policy add dir out src 0.0.0.0/0 dst 0.0.0.0/0 "
+	SYS(fail, "ip -net %s xfrm policy add dir out src 0.0.0.0/0 dst 0.0.0.0/0 "
 	    "if_id %d tmpl src %s dst %s proto esp mode tunnel if_id %d", ns,
 	    if_id, ipv4_local, ipv4_remote, if_id);
 
 	/* Policy: local <- remote */
-	SYS("ip -net %s xfrm policy add dir in src 0.0.0.0/0 dst 0.0.0.0/0 "
+	SYS(fail, "ip -net %s xfrm policy add dir in src 0.0.0.0/0 dst 0.0.0.0/0 "
 	    "if_id %d tmpl src %s dst %s proto esp mode tunnel if_id %d", ns,
 	    if_id, ipv4_remote, ipv4_local, if_id);
···
 	if (!ASSERT_OK(setup_xfrmi_external_dev(NS0), "xfrmi"))
 		goto fail;
 
-	SYS("ip -net " NS0 " addr add 192.168.1.100/24 dev ipsec0");
-	SYS("ip -net " NS0 " link set dev ipsec0 up");
+	SYS(fail, "ip -net " NS0 " addr add 192.168.1.100/24 dev ipsec0");
+	SYS(fail, "ip -net " NS0 " link set dev ipsec0 up");
 
-	SYS("ip -net " NS1 " link add ipsec0 type xfrm if_id %d", IF_ID_1);
-	SYS("ip -net " NS1 " addr add 192.168.1.200/24 dev ipsec0");
-	SYS("ip -net " NS1 " link set dev ipsec0 up");
+	SYS(fail, "ip -net " NS1 " link add ipsec0 type xfrm if_id %d", IF_ID_1);
+	SYS(fail, "ip -net " NS1 " addr add 192.168.1.200/24 dev ipsec0");
+	SYS(fail, "ip -net " NS1 " link set dev ipsec0 up");
 
-	SYS("ip -net " NS2 " link add ipsec0 type xfrm if_id %d", IF_ID_2);
-	SYS("ip -net " NS2 " addr add 192.168.1.200/24 dev ipsec0");
-	SYS("ip -net " NS2 " link set dev ipsec0 up");
+	SYS(fail, "ip -net " NS2 " link add ipsec0 type xfrm if_id %d", IF_ID_2);
+	SYS(fail, "ip -net " NS2 " addr add 192.168.1.200/24 dev ipsec0");
+	SYS(fail, "ip -net " NS2 " link set dev ipsec0 up");
 
 	return 0;
 fail:
···
 {
 	skel->bss->req_if_id = if_id;
 
-	SYS("ping -i 0.01 -c 3 -w 10 -q 192.168.1.200 > /dev/null");
+	SYS(fail, "ping -i 0.01 -c 3 -w 10 -q 192.168.1.200 > /dev/null");
 
 	if (!ASSERT_EQ(skel->bss->resp_if_id, if_id, "if_id"))
 		goto fail;
+1 -1
tools/testing/selftests/bpf/progs/bpf_flow.c
···
 	keys->ip_proto = ip6h->nexthdr;
 	keys->flow_label = ip6_flowlabel(ip6h);

-	if (keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL)
+	if (keys->flow_label && keys->flags & BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL)
 		return export_flow_keys(keys, BPF_OK);

 	return parse_ipv6_proto(skb, ip6h->nexthdr);
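The one-line fix above only stops dissection at the flow label when a label is actually present (nonzero). As a user-space sketch of the new predicate (the flag constant is copied from the UAPI `bpf.h` value, treat it as an assumption here):

```c
#include <assert.h>

/* Sketch of the updated stop condition in bpf_flow.c: with the fix,
 * dissection stops early only when the IPv6 flow label is nonzero
 * AND the caller asked to stop at the flow label.
 * Flag value mirrors include/uapi/linux/bpf.h (assumed here).
 */
#define BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL	(1U << 1)

static int should_stop_at_flow_label(unsigned int flow_label, unsigned int flags)
{
	/* & binds tighter than &&, matching the kernel expression */
	return flow_label && flags & BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL;
}
```

Before the fix, a zero flow label with the flag set would also short-circuit and skip protocol parsing.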
+23
tools/testing/selftests/bpf/progs/bpf_misc.h
···
 #ifndef __BPF_MISC_H__
 #define __BPF_MISC_H__

+/* This set of attributes controls behavior of the
+ * test_loader.c:test_loader__run_subtests().
+ *
+ * __msg        Message expected to be found in the verifier log.
+ *              Multiple __msg attributes could be specified.
+ *
+ * __success    Expect program load success in privileged mode.
+ *
+ * __failure    Expect program load failure in privileged mode.
+ *
+ * __log_level  Log level to use for the program, numeric value expected.
+ *
+ * __flag       Adds one flag use for the program, the following values are valid:
+ *              - BPF_F_STRICT_ALIGNMENT;
+ *              - BPF_F_TEST_RND_HI32;
+ *              - BPF_F_TEST_STATE_FREQ;
+ *              - BPF_F_SLEEPABLE;
+ *              - BPF_F_XDP_HAS_FRAGS;
+ *              - A numeric value.
+ *              Multiple __flag attributes could be specified, the final flags
+ *              value is derived by applying binary "or" to all specified values.
+ */
 #define __msg(msg)		__attribute__((btf_decl_tag("comment:test_expect_msg=" msg)))
 #define __failure		__attribute__((btf_decl_tag("comment:test_expect_failure")))
 #define __success		__attribute__((btf_decl_tag("comment:test_expect_success")))
 #define __log_level(lvl)	__attribute__((btf_decl_tag("comment:test_log_level="#lvl)))
+#define __flag(flag)		__attribute__((btf_decl_tag("comment:test_prog_flags="#flag)))

 /* Convenience macro for use with 'asm volatile' blocks */
 #define __naked __attribute__((naked))
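Per the comment above, the final `prog_flags` value is the bitwise OR of every `__flag` attribute on a program. A minimal sketch of that folding step (the flag constants below are copied from the UAPI `bpf.h` bit positions and should be treated as assumptions, not the test_loader implementation):

```c
#include <assert.h>

/* Flag bit positions mirror include/uapi/linux/bpf.h (assumed here). */
#define BPF_F_STRICT_ALIGNMENT	(1U << 0)
#define BPF_F_TEST_RND_HI32	(1U << 2)
#define BPF_F_TEST_STATE_FREQ	(1U << 3)
#define BPF_F_SLEEPABLE		(1U << 4)
#define BPF_F_XDP_HAS_FRAGS	(1U << 5)

/* Fold the per-attribute flag values into one prog_flags word by OR-ing. */
static unsigned int fold_prog_flags(const unsigned int *flags, int n)
{
	unsigned int out = 0;

	for (int i = 0; i < n; i++)
		out |= flags[i];
	return out;
}
```

So a program annotated with both `__flag(BPF_F_TEST_STATE_FREQ)` and `__flag(BPF_F_SLEEPABLE)` loads with both bits set.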
+1 -1
tools/testing/selftests/bpf/progs/cb_refs.c
···
 #include <bpf/bpf_helpers.h>

 struct map_value {
-	struct prog_test_ref_kfunc __kptr_ref *ptr;
+	struct prog_test_ref_kfunc __kptr *ptr;
 };

 struct {
+2 -1
tools/testing/selftests/bpf/progs/cgrp_kfunc_common.h
···
 #include <bpf/bpf_tracing.h>

 struct __cgrps_kfunc_map_value {
-	struct cgroup __kptr_ref * cgrp;
+	struct cgroup __kptr * cgrp;
 };

 struct hash_map {
···
 struct cgroup *bpf_cgroup_kptr_get(struct cgroup **pp) __ksym;
 void bpf_cgroup_release(struct cgroup *p) __ksym;
 struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level) __ksym;
+struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;

 static inline struct __cgrps_kfunc_map_value *cgrps_kfunc_map_value_lookup(struct cgroup *cgrp)
 {
+1 -1
tools/testing/selftests/bpf/progs/cgrp_kfunc_failure.c
···
 }

 SEC("tp_btf/cgroup_mkdir")
-__failure __msg("arg#0 is untrusted_ptr_or_null_ expected ptr_ or socket")
+__failure __msg("expects refcounted")
 int BPF_PROG(cgrp_kfunc_release_untrusted, struct cgroup *cgrp, const char *path)
 {
 	struct __cgrps_kfunc_map_value *v;
+53 -1
tools/testing/selftests/bpf/progs/cgrp_kfunc_success.c
···
 SEC("tp_btf/cgroup_mkdir")
 int BPF_PROG(test_cgrp_xchg_release, struct cgroup *cgrp, const char *path)
 {
-	struct cgroup *kptr;
+	struct cgroup *kptr, *cg;
 	struct __cgrps_kfunc_map_value *v;
 	long status;
···
 		err = 2;
 		return 0;
 	}
+
+	kptr = v->cgrp;
+	if (!kptr) {
+		err = 4;
+		return 0;
+	}
+
+	cg = bpf_cgroup_ancestor(kptr, 1);
+	if (cg) /* verifier only check */
+		bpf_cgroup_release(cg);

 	kptr = bpf_kptr_xchg(&v->cgrp, NULL);
 	if (!kptr) {
···
 	if (invalid) {
 		bpf_cgroup_release(invalid);
 		err = 5;
+		return 0;
+	}
+
+	return 0;
+}
+
+SEC("tp_btf/cgroup_mkdir")
+int BPF_PROG(test_cgrp_from_id, struct cgroup *cgrp, const char *path)
+{
+	struct cgroup *parent, *res;
+	u64 parent_cgid;
+
+	if (!is_test_kfunc_task())
+		return 0;
+
+	/* @cgrp's ID is not visible yet, let's test with the parent */
+	parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
+	if (!parent) {
+		err = 1;
+		return 0;
+	}
+
+	parent_cgid = parent->kn->id;
+	bpf_cgroup_release(parent);
+
+	res = bpf_cgroup_from_id(parent_cgid);
+	if (!res) {
+		err = 2;
+		return 0;
+	}
+
+	bpf_cgroup_release(res);
+
+	if (res != parent) {
+		err = 3;
+		return 0;
+	}
+
+	res = bpf_cgroup_from_id((u64)-1);
+	if (res) {
+		bpf_cgroup_release(res);
+		err = 4;
 		return 0;
 	}
+2 -2
tools/testing/selftests/bpf/progs/cgrp_ls_sleepable.c
···
 	if (task->pid != target_pid)
 		return 0;

-	/* ptr_to_btf_id semantics. should work. */
+	/* task->cgroups is untrusted in sleepable prog outside of RCU CS */
 	cgrp = task->cgroups->dfl_cgrp;
 	ptr = bpf_cgrp_storage_get(&map_a, cgrp, 0,
 				   BPF_LOCAL_STORAGE_GET_F_CREATE);
···
 	bpf_rcu_read_lock();
 	cgrp = task->cgroups->dfl_cgrp;
-	/* cgrp is untrusted and cannot pass to bpf_cgrp_storage_get() helper. */
+	/* cgrp is trusted under RCU CS */
 	ptr = bpf_cgrp_storage_get(&map_a, cgrp, 0, BPF_LOCAL_STORAGE_GET_F_CREATE);
 	if (ptr)
 		cgroup_id = cgrp->kn->id;
+1 -1
tools/testing/selftests/bpf/progs/cpumask_common.h
···
 int err;

 struct __cpumask_map_value {
-	struct bpf_cpumask __kptr_ref * cpumask;
+	struct bpf_cpumask __kptr * cpumask;
 };

 struct array_map {
+1 -1
tools/testing/selftests/bpf/progs/cpumask_failure.c
···
 }

 SEC("tp_btf/task_newtask")
-__failure __msg("bpf_cpumask_acquire args#0 expected pointer to STRUCT bpf_cpumask")
+__failure __msg("must be referenced")
 int BPF_PROG(test_acquire_wrong_cpumask, struct task_struct *task, u64 clone_flags)
 {
 	struct bpf_cpumask *cpumask;
+286 -1
tools/testing/selftests/bpf/progs/dynptr_fail.c
··· 5 5 #include <string.h> 6 6 #include <linux/bpf.h> 7 7 #include <bpf/bpf_helpers.h> 8 + #include <linux/if_ether.h> 8 9 #include "bpf_misc.h" 10 + #include "bpf_kfuncs.h" 9 11 10 12 char _license[] SEC("license") = "GPL"; 11 13 ··· 246 244 return 0; 247 245 } 248 246 247 + /* A data slice can't be accessed out of bounds */ 248 + SEC("?tc") 249 + __failure __msg("value is outside of the allowed memory range") 250 + int data_slice_out_of_bounds_skb(struct __sk_buff *skb) 251 + { 252 + struct bpf_dynptr ptr; 253 + struct ethhdr *hdr; 254 + char buffer[sizeof(*hdr)] = {}; 255 + 256 + bpf_dynptr_from_skb(skb, 0, &ptr); 257 + 258 + hdr = bpf_dynptr_slice_rdwr(&ptr, 0, buffer, sizeof(buffer)); 259 + if (!hdr) 260 + return SK_DROP; 261 + 262 + /* this should fail */ 263 + *(__u8*)(hdr + 1) = 1; 264 + 265 + return SK_PASS; 266 + } 267 + 249 268 SEC("?raw_tp") 250 269 __failure __msg("value is outside of the allowed memory range") 251 270 int data_slice_out_of_bounds_map_value(void *ctx) ··· 422 399 423 400 /* this should fail */ 424 401 bpf_dynptr_read(read_data, sizeof(read_data), (void *)&ptr + 8, 0, 0); 425 - 426 402 return 0; 427 403 } 428 404 ··· 1066 1044 return 0; 1067 1045 } 1068 1046 1047 + /* bpf_dynptr_slice()s are read-only and cannot be written to */ 1048 + SEC("?tc") 1049 + __failure __msg("R0 cannot write into rdonly_mem") 1050 + int skb_invalid_slice_write(struct __sk_buff *skb) 1051 + { 1052 + struct bpf_dynptr ptr; 1053 + struct ethhdr *hdr; 1054 + char buffer[sizeof(*hdr)] = {}; 1055 + 1056 + bpf_dynptr_from_skb(skb, 0, &ptr); 1057 + 1058 + hdr = bpf_dynptr_slice(&ptr, 0, buffer, sizeof(buffer)); 1059 + if (!hdr) 1060 + return SK_DROP; 1061 + 1062 + /* this should fail */ 1063 + hdr->h_proto = 1; 1064 + 1065 + return SK_PASS; 1066 + } 1067 + 1068 + /* The read-only data slice is invalidated whenever a helper changes packet data */ 1069 + SEC("?tc") 1070 + __failure __msg("invalid mem access 'scalar'") 1071 + int skb_invalid_data_slice1(struct 
__sk_buff *skb) 1072 + { 1073 + struct bpf_dynptr ptr; 1074 + struct ethhdr *hdr; 1075 + char buffer[sizeof(*hdr)] = {}; 1076 + 1077 + bpf_dynptr_from_skb(skb, 0, &ptr); 1078 + 1079 + hdr = bpf_dynptr_slice(&ptr, 0, buffer, sizeof(buffer)); 1080 + if (!hdr) 1081 + return SK_DROP; 1082 + 1083 + val = hdr->h_proto; 1084 + 1085 + if (bpf_skb_pull_data(skb, skb->len)) 1086 + return SK_DROP; 1087 + 1088 + /* this should fail */ 1089 + val = hdr->h_proto; 1090 + 1091 + return SK_PASS; 1092 + } 1093 + 1094 + /* The read-write data slice is invalidated whenever a helper changes packet data */ 1095 + SEC("?tc") 1096 + __failure __msg("invalid mem access 'scalar'") 1097 + int skb_invalid_data_slice2(struct __sk_buff *skb) 1098 + { 1099 + struct bpf_dynptr ptr; 1100 + struct ethhdr *hdr; 1101 + char buffer[sizeof(*hdr)] = {}; 1102 + 1103 + bpf_dynptr_from_skb(skb, 0, &ptr); 1104 + 1105 + hdr = bpf_dynptr_slice_rdwr(&ptr, 0, buffer, sizeof(buffer)); 1106 + if (!hdr) 1107 + return SK_DROP; 1108 + 1109 + hdr->h_proto = 123; 1110 + 1111 + if (bpf_skb_pull_data(skb, skb->len)) 1112 + return SK_DROP; 1113 + 1114 + /* this should fail */ 1115 + hdr->h_proto = 1; 1116 + 1117 + return SK_PASS; 1118 + } 1119 + 1120 + /* The read-only data slice is invalidated whenever bpf_dynptr_write() is called */ 1121 + SEC("?tc") 1122 + __failure __msg("invalid mem access 'scalar'") 1123 + int skb_invalid_data_slice3(struct __sk_buff *skb) 1124 + { 1125 + char write_data[64] = "hello there, world!!"; 1126 + struct bpf_dynptr ptr; 1127 + struct ethhdr *hdr; 1128 + char buffer[sizeof(*hdr)] = {}; 1129 + 1130 + bpf_dynptr_from_skb(skb, 0, &ptr); 1131 + 1132 + hdr = bpf_dynptr_slice(&ptr, 0, buffer, sizeof(buffer)); 1133 + if (!hdr) 1134 + return SK_DROP; 1135 + 1136 + val = hdr->h_proto; 1137 + 1138 + bpf_dynptr_write(&ptr, 0, write_data, sizeof(write_data), 0); 1139 + 1140 + /* this should fail */ 1141 + val = hdr->h_proto; 1142 + 1143 + return SK_PASS; 1144 + } 1145 + 1146 + /* The read-write data 
slice is invalidated whenever bpf_dynptr_write() is called */ 1147 + SEC("?tc") 1148 + __failure __msg("invalid mem access 'scalar'") 1149 + int skb_invalid_data_slice4(struct __sk_buff *skb) 1150 + { 1151 + char write_data[64] = "hello there, world!!"; 1152 + struct bpf_dynptr ptr; 1153 + struct ethhdr *hdr; 1154 + char buffer[sizeof(*hdr)] = {}; 1155 + 1156 + bpf_dynptr_from_skb(skb, 0, &ptr); 1157 + hdr = bpf_dynptr_slice_rdwr(&ptr, 0, buffer, sizeof(buffer)); 1158 + if (!hdr) 1159 + return SK_DROP; 1160 + 1161 + hdr->h_proto = 123; 1162 + 1163 + bpf_dynptr_write(&ptr, 0, write_data, sizeof(write_data), 0); 1164 + 1165 + /* this should fail */ 1166 + hdr->h_proto = 1; 1167 + 1168 + return SK_PASS; 1169 + } 1170 + 1171 + /* The read-only data slice is invalidated whenever a helper changes packet data */ 1172 + SEC("?xdp") 1173 + __failure __msg("invalid mem access 'scalar'") 1174 + int xdp_invalid_data_slice1(struct xdp_md *xdp) 1175 + { 1176 + struct bpf_dynptr ptr; 1177 + struct ethhdr *hdr; 1178 + char buffer[sizeof(*hdr)] = {}; 1179 + 1180 + bpf_dynptr_from_xdp(xdp, 0, &ptr); 1181 + hdr = bpf_dynptr_slice(&ptr, 0, buffer, sizeof(buffer)); 1182 + if (!hdr) 1183 + return SK_DROP; 1184 + 1185 + val = hdr->h_proto; 1186 + 1187 + if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(*hdr))) 1188 + return XDP_DROP; 1189 + 1190 + /* this should fail */ 1191 + val = hdr->h_proto; 1192 + 1193 + return XDP_PASS; 1194 + } 1195 + 1196 + /* The read-write data slice is invalidated whenever a helper changes packet data */ 1197 + SEC("?xdp") 1198 + __failure __msg("invalid mem access 'scalar'") 1199 + int xdp_invalid_data_slice2(struct xdp_md *xdp) 1200 + { 1201 + struct bpf_dynptr ptr; 1202 + struct ethhdr *hdr; 1203 + char buffer[sizeof(*hdr)] = {}; 1204 + 1205 + bpf_dynptr_from_xdp(xdp, 0, &ptr); 1206 + hdr = bpf_dynptr_slice_rdwr(&ptr, 0, buffer, sizeof(buffer)); 1207 + if (!hdr) 1208 + return SK_DROP; 1209 + 1210 + hdr->h_proto = 9; 1211 + 1212 + if (bpf_xdp_adjust_head(xdp, 0 
- (int)sizeof(*hdr))) 1213 + return XDP_DROP; 1214 + 1215 + /* this should fail */ 1216 + hdr->h_proto = 1; 1217 + 1218 + return XDP_PASS; 1219 + } 1220 + 1221 + /* Only supported prog type can create skb-type dynptrs */ 1222 + SEC("?raw_tp") 1223 + __failure __msg("calling kernel function bpf_dynptr_from_skb is not allowed") 1224 + int skb_invalid_ctx(void *ctx) 1225 + { 1226 + struct bpf_dynptr ptr; 1227 + 1228 + /* this should fail */ 1229 + bpf_dynptr_from_skb(ctx, 0, &ptr); 1230 + 1231 + return 0; 1232 + } 1233 + 1069 1234 /* Reject writes to dynptr slot for uninit arg */ 1070 1235 SEC("?raw_tp") 1071 1236 __failure __msg("potential write to dynptr at off=-16") ··· 1268 1059 bpf_get_current_comm(data.buf, 80); 1269 1060 1270 1061 return 0; 1062 + } 1063 + 1064 + /* Only supported prog type can create xdp-type dynptrs */ 1065 + SEC("?raw_tp") 1066 + __failure __msg("calling kernel function bpf_dynptr_from_xdp is not allowed") 1067 + int xdp_invalid_ctx(void *ctx) 1068 + { 1069 + struct bpf_dynptr ptr; 1070 + 1071 + /* this should fail */ 1072 + bpf_dynptr_from_xdp(ctx, 0, &ptr); 1073 + 1074 + return 0; 1075 + } 1076 + 1077 + __u32 hdr_size = sizeof(struct ethhdr); 1078 + /* Can't pass in variable-sized len to bpf_dynptr_slice */ 1079 + SEC("?tc") 1080 + __failure __msg("unbounded memory access") 1081 + int dynptr_slice_var_len1(struct __sk_buff *skb) 1082 + { 1083 + struct bpf_dynptr ptr; 1084 + struct ethhdr *hdr; 1085 + char buffer[sizeof(*hdr)] = {}; 1086 + 1087 + bpf_dynptr_from_skb(skb, 0, &ptr); 1088 + 1089 + /* this should fail */ 1090 + hdr = bpf_dynptr_slice(&ptr, 0, buffer, hdr_size); 1091 + if (!hdr) 1092 + return SK_DROP; 1093 + 1094 + return SK_PASS; 1095 + } 1096 + 1097 + /* Can't pass in variable-sized len to bpf_dynptr_slice */ 1098 + SEC("?tc") 1099 + __failure __msg("must be a known constant") 1100 + int dynptr_slice_var_len2(struct __sk_buff *skb) 1101 + { 1102 + char buffer[sizeof(struct ethhdr)] = {}; 1103 + struct bpf_dynptr ptr; 1104 + 
struct ethhdr *hdr; 1105 + 1106 + bpf_dynptr_from_skb(skb, 0, &ptr); 1107 + 1108 + if (hdr_size <= sizeof(buffer)) { 1109 + /* this should fail */ 1110 + hdr = bpf_dynptr_slice_rdwr(&ptr, 0, buffer, hdr_size); 1111 + if (!hdr) 1112 + return SK_DROP; 1113 + hdr->h_proto = 12; 1114 + } 1115 + 1116 + return SK_PASS; 1271 1117 } 1272 1118 1273 1119 static int callback(__u32 index, void *data) ··· 1353 1089 1354 1090 /* this should fail */ 1355 1091 *slice = 1; 1092 + 1093 + return 0; 1094 + } 1095 + 1096 + /* Program types that don't allow writes to packet data should fail if 1097 + * bpf_dynptr_slice_rdwr is called 1098 + */ 1099 + SEC("cgroup_skb/ingress") 1100 + __failure __msg("the prog does not allow writes to packet data") 1101 + int invalid_slice_rdwr_rdonly(struct __sk_buff *skb) 1102 + { 1103 + char buffer[sizeof(struct ethhdr)] = {}; 1104 + struct bpf_dynptr ptr; 1105 + struct ethhdr *hdr; 1106 + 1107 + bpf_dynptr_from_skb(skb, 0, &ptr); 1108 + 1109 + /* this should fail since cgroup_skb doesn't allow 1110 + * changing packet data 1111 + */ 1112 + hdr = bpf_dynptr_slice_rdwr(&ptr, 0, buffer, sizeof(buffer)); 1356 1113 1357 1114 return 0; 1358 1115 }
+51 -4
tools/testing/selftests/bpf/progs/dynptr_success.c
···
 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
 #include "bpf_misc.h"
+#include "bpf_kfuncs.h"
 #include "errno.h"

 char _license[] SEC("license") = "GPL";
···
 	__type(value, __u32);
 } array_map SEC(".maps");

-SEC("tp/syscalls/sys_enter_nanosleep")
+SEC("?tp/syscalls/sys_enter_nanosleep")
 int test_read_write(void *ctx)
 {
 	char write_data[64] = "hello there, world!!";
···
 	return 0;
 }

-SEC("tp/syscalls/sys_enter_nanosleep")
-int test_data_slice(void *ctx)
+SEC("?tp/syscalls/sys_enter_nanosleep")
+int test_dynptr_data(void *ctx)
 {
 	__u32 key = 0, val = 235, *map_val;
 	struct bpf_dynptr ptr;
···
 	return 0;
 }

-SEC("tp/syscalls/sys_enter_nanosleep")
+SEC("?tp/syscalls/sys_enter_nanosleep")
 int test_ringbuf(void *ctx)
 {
 	struct bpf_dynptr ptr;
···
 done:
 	bpf_ringbuf_discard_dynptr(&ptr, 0);
 	return 0;
+}
+
+SEC("?cgroup_skb/egress")
+int test_skb_readonly(struct __sk_buff *skb)
+{
+	__u8 write_data[2] = {1, 2};
+	struct bpf_dynptr ptr;
+	__u64 *data;
+	int ret;
+
+	if (bpf_dynptr_from_skb(skb, 0, &ptr)) {
+		err = 1;
+		return 1;
+	}
+
+	/* since cgroup skbs are read only, writes should fail */
+	ret = bpf_dynptr_write(&ptr, 0, write_data, sizeof(write_data), 0);
+	if (ret != -EINVAL) {
+		err = 2;
+		return 1;
+	}
+
+	return 1;
+}
+
+SEC("?cgroup_skb/egress")
+int test_dynptr_skb_data(struct __sk_buff *skb)
+{
+	__u8 write_data[2] = {1, 2};
+	struct bpf_dynptr ptr;
+	__u64 *data;
+	int ret;
+
+	if (bpf_dynptr_from_skb(skb, 0, &ptr)) {
+		err = 1;
+		return 1;
+	}
+
+	/* This should return NULL. Must use bpf_dynptr_slice API */
+	data = bpf_dynptr_data(&ptr, 0, 1);
+	if (data) {
+		err = 2;
+		return 1;
+	}
+
+	return 1;
 }
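The `test_skb_readonly` case above relies on `bpf_dynptr_write()` rejecting writes through a read-only dynptr with `-EINVAL`. A minimal user-space model of that contract (the `toy_dynptr` type and names are illustrative, not the kernel's layout):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Toy model of a dynptr carrying a read-only bit: writes through a
 * read-only view fail with -EINVAL, mirroring what bpf_dynptr_write()
 * does for cgroup skb dynptrs. Names and layout are illustrative only.
 */
struct toy_dynptr {
	char *data;
	unsigned int size;
	int rdonly;
};

static int toy_dynptr_write(struct toy_dynptr *p, unsigned int off,
			    const void *src, unsigned int len)
{
	if (p->rdonly)
		return -EINVAL;	/* read-only view: reject the write */
	if (len > p->size || off > p->size - len)
		return -E2BIG;	/* out-of-bounds write */
	memcpy(p->data + off, src, len);
	return 0;
}
```

The read path needs no such check, which is why the selftest only probes the write and `bpf_dynptr_data()` paths.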
+1 -1
tools/testing/selftests/bpf/progs/find_vma_fail1.c
···
 			struct callback_ctx *data)
 {
 	/* writing to vma, which is illegal */
-	vma->vm_flags |= 0x55;
+	vma->vm_start = 0xffffffffff600000;

 	return 0;
 }
+1 -1
tools/testing/selftests/bpf/progs/jit_probe_mem.c
···
 #include <bpf/bpf_tracing.h>
 #include <bpf/bpf_helpers.h>

-static struct prog_test_ref_kfunc __kptr_ref *v;
+static struct prog_test_ref_kfunc __kptr *v;
 long total_sum = -1;

 extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
+1 -1
tools/testing/selftests/bpf/progs/lru_bug.c
···
 #include <bpf/bpf_helpers.h>

 struct map_value {
-	struct task_struct __kptr *ptr;
+	struct task_struct __kptr_untrusted *ptr;
 };

 struct {
+317 -45
tools/testing/selftests/bpf/progs/map_kptr.c
··· 4 4 #include <bpf/bpf_helpers.h> 5 5 6 6 struct map_value { 7 - struct prog_test_ref_kfunc __kptr *unref_ptr; 8 - struct prog_test_ref_kfunc __kptr_ref *ref_ptr; 7 + struct prog_test_ref_kfunc __kptr_untrusted *unref_ptr; 8 + struct prog_test_ref_kfunc __kptr *ref_ptr; 9 9 }; 10 10 11 11 struct array_map { ··· 15 15 __uint(max_entries, 1); 16 16 } array_map SEC(".maps"); 17 17 18 + struct pcpu_array_map { 19 + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY); 20 + __type(key, int); 21 + __type(value, struct map_value); 22 + __uint(max_entries, 1); 23 + } pcpu_array_map SEC(".maps"); 24 + 18 25 struct hash_map { 19 26 __uint(type, BPF_MAP_TYPE_HASH); 20 27 __type(key, int); 21 28 __type(value, struct map_value); 22 29 __uint(max_entries, 1); 23 30 } hash_map SEC(".maps"); 31 + 32 + struct pcpu_hash_map { 33 + __uint(type, BPF_MAP_TYPE_PERCPU_HASH); 34 + __type(key, int); 35 + __type(value, struct map_value); 36 + __uint(max_entries, 1); 37 + } pcpu_hash_map SEC(".maps"); 24 38 25 39 struct hash_malloc_map { 26 40 __uint(type, BPF_MAP_TYPE_HASH); ··· 44 30 __uint(map_flags, BPF_F_NO_PREALLOC); 45 31 } hash_malloc_map SEC(".maps"); 46 32 33 + struct pcpu_hash_malloc_map { 34 + __uint(type, BPF_MAP_TYPE_PERCPU_HASH); 35 + __type(key, int); 36 + __type(value, struct map_value); 37 + __uint(max_entries, 1); 38 + __uint(map_flags, BPF_F_NO_PREALLOC); 39 + } pcpu_hash_malloc_map SEC(".maps"); 40 + 47 41 struct lru_hash_map { 48 42 __uint(type, BPF_MAP_TYPE_LRU_HASH); 49 43 __type(key, int); 50 44 __type(value, struct map_value); 51 45 __uint(max_entries, 1); 52 46 } lru_hash_map SEC(".maps"); 47 + 48 + struct lru_pcpu_hash_map { 49 + __uint(type, BPF_MAP_TYPE_LRU_PERCPU_HASH); 50 + __type(key, int); 51 + __type(value, struct map_value); 52 + __uint(max_entries, 1); 53 + } lru_pcpu_hash_map SEC(".maps"); 54 + 55 + struct cgrp_ls_map { 56 + __uint(type, BPF_MAP_TYPE_CGRP_STORAGE); 57 + __uint(map_flags, BPF_F_NO_PREALLOC); 58 + __type(key, int); 59 + __type(value, struct 
map_value); 60 + } cgrp_ls_map SEC(".maps"); 61 + 62 + struct task_ls_map { 63 + __uint(type, BPF_MAP_TYPE_TASK_STORAGE); 64 + __uint(map_flags, BPF_F_NO_PREALLOC); 65 + __type(key, int); 66 + __type(value, struct map_value); 67 + } task_ls_map SEC(".maps"); 68 + 69 + struct inode_ls_map { 70 + __uint(type, BPF_MAP_TYPE_INODE_STORAGE); 71 + __uint(map_flags, BPF_F_NO_PREALLOC); 72 + __type(key, int); 73 + __type(value, struct map_value); 74 + } inode_ls_map SEC(".maps"); 75 + 76 + struct sk_ls_map { 77 + __uint(type, BPF_MAP_TYPE_SK_STORAGE); 78 + __uint(map_flags, BPF_F_NO_PREALLOC); 79 + __type(key, int); 80 + __type(value, struct map_value); 81 + } sk_ls_map SEC(".maps"); 53 82 54 83 #define DEFINE_MAP_OF_MAP(map_type, inner_map_type, name) \ 55 84 struct { \ ··· 118 61 extern struct prog_test_ref_kfunc * 119 62 bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **p, int a, int b) __ksym; 120 63 extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym; 64 + void bpf_kfunc_call_test_ref(struct prog_test_ref_kfunc *p) __ksym; 121 65 122 66 #define WRITE_ONCE(x, val) ((*(volatile typeof(x) *) &(x)) = (val)) 123 67 ··· 148 90 WRITE_ONCE(v->unref_ptr, p); 149 91 if (!p) 150 92 return; 93 + /* 94 + * p is rcu_ptr_prog_test_ref_kfunc, 95 + * because bpf prog is non-sleepable and runs in RCU CS. 96 + * p can be passed to kfunc that requires KF_RCU. 97 + */ 98 + bpf_kfunc_call_test_ref(p); 151 99 if (p->a + p->b > 100) 152 100 return; 153 101 /* store NULL */ 154 102 p = bpf_kptr_xchg(&v->ref_ptr, NULL); 155 103 if (!p) 156 104 return; 105 + /* 106 + * p is trusted_ptr_prog_test_ref_kfunc. 107 + * p can be passed to kfunc that requires KF_RCU. 
108 + */ 109 + bpf_kfunc_call_test_ref(p); 157 110 if (p->a + p->b > 100) { 158 111 bpf_kfunc_call_test_release(p); 159 112 return; ··· 229 160 return 0; 230 161 } 231 162 163 + SEC("tp_btf/cgroup_mkdir") 164 + int BPF_PROG(test_cgrp_map_kptr, struct cgroup *cgrp, const char *path) 165 + { 166 + struct map_value *v; 167 + 168 + v = bpf_cgrp_storage_get(&cgrp_ls_map, cgrp, NULL, BPF_LOCAL_STORAGE_GET_F_CREATE); 169 + if (v) 170 + test_kptr(v); 171 + return 0; 172 + } 173 + 174 + SEC("lsm/inode_unlink") 175 + int BPF_PROG(test_task_map_kptr, struct inode *inode, struct dentry *victim) 176 + { 177 + struct task_struct *task; 178 + struct map_value *v; 179 + 180 + task = bpf_get_current_task_btf(); 181 + if (!task) 182 + return 0; 183 + v = bpf_task_storage_get(&task_ls_map, task, NULL, BPF_LOCAL_STORAGE_GET_F_CREATE); 184 + if (v) 185 + test_kptr(v); 186 + return 0; 187 + } 188 + 189 + SEC("lsm/inode_unlink") 190 + int BPF_PROG(test_inode_map_kptr, struct inode *inode, struct dentry *victim) 191 + { 192 + struct map_value *v; 193 + 194 + v = bpf_inode_storage_get(&inode_ls_map, inode, NULL, BPF_LOCAL_STORAGE_GET_F_CREATE); 195 + if (v) 196 + test_kptr(v); 197 + return 0; 198 + } 199 + 200 + SEC("tc") 201 + int test_sk_map_kptr(struct __sk_buff *ctx) 202 + { 203 + struct map_value *v; 204 + struct bpf_sock *sk; 205 + 206 + sk = ctx->sk; 207 + if (!sk) 208 + return 0; 209 + v = bpf_sk_storage_get(&sk_ls_map, sk, NULL, BPF_LOCAL_STORAGE_GET_F_CREATE); 210 + if (v) 211 + test_kptr(v); 212 + return 0; 213 + } 214 + 232 215 SEC("tc") 233 216 int test_map_in_map_kptr(struct __sk_buff *ctx) 234 217 { ··· 310 189 return 0; 311 190 } 312 191 313 - SEC("tc") 314 - int test_map_kptr_ref(struct __sk_buff *ctx) 192 + int ref = 1; 193 + 194 + static __always_inline 195 + int test_map_kptr_ref_pre(struct map_value *v) 315 196 { 316 197 struct prog_test_ref_kfunc *p, *p_st; 317 198 unsigned long arg = 0; 318 - struct map_value *v; 319 - int key = 0, ret; 199 + int ret; 320 200 321 201 
p = bpf_kfunc_call_test_acquire(&arg); 322 202 if (!p) 323 203 return 1; 204 + ref++; 324 205 325 206 p_st = p->next; 326 - if (p_st->cnt.refs.counter != 2) { 207 + if (p_st->cnt.refs.counter != ref) { 327 208 ret = 2; 328 209 goto end; 329 210 } 330 211 331 - v = bpf_map_lookup_elem(&array_map, &key); 332 - if (!v) { 212 + p = bpf_kptr_xchg(&v->ref_ptr, p); 213 + if (p) { 333 214 ret = 3; 334 215 goto end; 335 216 } 336 - 337 - p = bpf_kptr_xchg(&v->ref_ptr, p); 338 - if (p) { 339 - ret = 4; 340 - goto end; 341 - } 342 - if (p_st->cnt.refs.counter != 2) 343 - return 5; 217 + if (p_st->cnt.refs.counter != ref) 218 + return 4; 344 219 345 220 p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0); 346 221 if (!p) 347 - return 6; 348 - if (p_st->cnt.refs.counter != 3) { 349 - ret = 7; 222 + return 5; 223 + ref++; 224 + if (p_st->cnt.refs.counter != ref) { 225 + ret = 6; 350 226 goto end; 351 227 } 352 228 bpf_kfunc_call_test_release(p); 353 - if (p_st->cnt.refs.counter != 2) 354 - return 8; 229 + ref--; 230 + if (p_st->cnt.refs.counter != ref) 231 + return 7; 355 232 356 233 p = bpf_kptr_xchg(&v->ref_ptr, NULL); 357 234 if (!p) 358 - return 9; 235 + return 8; 359 236 bpf_kfunc_call_test_release(p); 360 - if (p_st->cnt.refs.counter != 1) 361 - return 10; 237 + ref--; 238 + if (p_st->cnt.refs.counter != ref) 239 + return 9; 362 240 363 241 p = bpf_kfunc_call_test_acquire(&arg); 364 242 if (!p) 365 - return 11; 243 + return 10; 244 + ref++; 366 245 p = bpf_kptr_xchg(&v->ref_ptr, p); 367 246 if (p) { 368 - ret = 12; 247 + ret = 11; 369 248 goto end; 370 249 } 371 - if (p_st->cnt.refs.counter != 2) 372 - return 13; 250 + if (p_st->cnt.refs.counter != ref) 251 + return 12; 373 252 /* Leave in map */ 374 253 375 254 return 0; 376 255 end: 256 + ref--; 377 257 bpf_kfunc_call_test_release(p); 378 258 return ret; 379 259 } 380 260 381 - SEC("tc") 382 - int test_map_kptr_ref2(struct __sk_buff *ctx) 261 + static __always_inline 262 + int test_map_kptr_ref_post(struct map_value *v) 
383 263 { 384 264 struct prog_test_ref_kfunc *p, *p_st; 385 - struct map_value *v; 386 - int key = 0; 387 - 388 - v = bpf_map_lookup_elem(&array_map, &key); 389 - if (!v) 390 - return 1; 391 265 392 266 p_st = v->ref_ptr; 393 - if (!p_st || p_st->cnt.refs.counter != 2) 394 - return 2; 267 + if (!p_st || p_st->cnt.refs.counter != ref) 268 + return 1; 395 269 396 270 p = bpf_kptr_xchg(&v->ref_ptr, NULL); 397 271 if (!p) 398 - return 3; 399 - if (p_st->cnt.refs.counter != 2) { 272 + return 2; 273 + if (p_st->cnt.refs.counter != ref) { 400 274 bpf_kfunc_call_test_release(p); 401 - return 4; 275 + return 3; 402 276 } 403 277 404 278 p = bpf_kptr_xchg(&v->ref_ptr, p); 405 279 if (p) { 406 280 bpf_kfunc_call_test_release(p); 407 - return 5; 281 + return 4; 408 282 } 409 - if (p_st->cnt.refs.counter != 2) 410 - return 6; 283 + if (p_st->cnt.refs.counter != ref) 284 + return 5; 411 285 412 286 return 0; 287 + } 288 + 289 + #define TEST(map) \ 290 + v = bpf_map_lookup_elem(&map, &key); \ 291 + if (!v) \ 292 + return -1; \ 293 + ret = test_map_kptr_ref_pre(v); \ 294 + if (ret) \ 295 + return ret; 296 + 297 + #define TEST_PCPU(map) \ 298 + v = bpf_map_lookup_percpu_elem(&map, &key, 0); \ 299 + if (!v) \ 300 + return -1; \ 301 + ret = test_map_kptr_ref_pre(v); \ 302 + if (ret) \ 303 + return ret; 304 + 305 + SEC("tc") 306 + int test_map_kptr_ref1(struct __sk_buff *ctx) 307 + { 308 + struct map_value *v, val = {}; 309 + int key = 0, ret; 310 + 311 + bpf_map_update_elem(&hash_map, &key, &val, 0); 312 + bpf_map_update_elem(&hash_malloc_map, &key, &val, 0); 313 + bpf_map_update_elem(&lru_hash_map, &key, &val, 0); 314 + 315 + bpf_map_update_elem(&pcpu_hash_map, &key, &val, 0); 316 + bpf_map_update_elem(&pcpu_hash_malloc_map, &key, &val, 0); 317 + bpf_map_update_elem(&lru_pcpu_hash_map, &key, &val, 0); 318 + 319 + TEST(array_map); 320 + TEST(hash_map); 321 + TEST(hash_malloc_map); 322 + TEST(lru_hash_map); 323 + 324 + TEST_PCPU(pcpu_array_map); 325 + TEST_PCPU(pcpu_hash_map); 326 + 
TEST_PCPU(pcpu_hash_malloc_map); 327 + TEST_PCPU(lru_pcpu_hash_map); 328 + 329 + return 0; 330 + } 331 + 332 + #undef TEST 333 + #undef TEST_PCPU 334 + 335 + #define TEST(map) \ 336 + v = bpf_map_lookup_elem(&map, &key); \ 337 + if (!v) \ 338 + return -1; \ 339 + ret = test_map_kptr_ref_post(v); \ 340 + if (ret) \ 341 + return ret; 342 + 343 + #define TEST_PCPU(map) \ 344 + v = bpf_map_lookup_percpu_elem(&map, &key, 0); \ 345 + if (!v) \ 346 + return -1; \ 347 + ret = test_map_kptr_ref_post(v); \ 348 + if (ret) \ 349 + return ret; 350 + 351 + SEC("tc") 352 + int test_map_kptr_ref2(struct __sk_buff *ctx) 353 + { 354 + struct map_value *v; 355 + int key = 0, ret; 356 + 357 + TEST(array_map); 358 + TEST(hash_map); 359 + TEST(hash_malloc_map); 360 + TEST(lru_hash_map); 361 + 362 + TEST_PCPU(pcpu_array_map); 363 + TEST_PCPU(pcpu_hash_map); 364 + TEST_PCPU(pcpu_hash_malloc_map); 365 + TEST_PCPU(lru_pcpu_hash_map); 366 + 367 + return 0; 368 + } 369 + 370 + #undef TEST 371 + #undef TEST_PCPU 372 + 373 + SEC("tc") 374 + int test_map_kptr_ref3(struct __sk_buff *ctx) 375 + { 376 + struct prog_test_ref_kfunc *p; 377 + unsigned long sp = 0; 378 + 379 + p = bpf_kfunc_call_test_acquire(&sp); 380 + if (!p) 381 + return 1; 382 + ref++; 383 + if (p->cnt.refs.counter != ref) { 384 + bpf_kfunc_call_test_release(p); 385 + return 2; 386 + } 387 + bpf_kfunc_call_test_release(p); 388 + ref--; 389 + return 0; 390 + } 391 + 392 + SEC("syscall") 393 + int test_ls_map_kptr_ref1(void *ctx) 394 + { 395 + struct task_struct *current; 396 + struct map_value *v; 397 + int ret; 398 + 399 + current = bpf_get_current_task_btf(); 400 + if (!current) 401 + return 100; 402 + v = bpf_task_storage_get(&task_ls_map, current, NULL, 0); 403 + if (v) 404 + return 150; 405 + v = bpf_task_storage_get(&task_ls_map, current, NULL, BPF_LOCAL_STORAGE_GET_F_CREATE); 406 + if (!v) 407 + return 200; 408 + return test_map_kptr_ref_pre(v); 409 + } 410 + 411 + SEC("syscall") 412 + int test_ls_map_kptr_ref2(void *ctx) 413 
+ { 414 + struct task_struct *current; 415 + struct map_value *v; 416 + int ret; 417 + 418 + current = bpf_get_current_task_btf(); 419 + if (!current) 420 + return 100; 421 + v = bpf_task_storage_get(&task_ls_map, current, NULL, 0); 422 + if (!v) 423 + return 200; 424 + return test_map_kptr_ref_post(v); 425 + } 426 + 427 + SEC("syscall") 428 + int test_ls_map_kptr_ref_del(void *ctx) 429 + { 430 + struct task_struct *current; 431 + struct map_value *v; 432 + int ret; 433 + 434 + current = bpf_get_current_task_btf(); 435 + if (!current) 436 + return 100; 437 + v = bpf_task_storage_get(&task_ls_map, current, NULL, 0); 438 + if (!v) 439 + return 200; 440 + if (!v->ref_ptr) 441 + return 300; 442 + return bpf_task_storage_delete(&task_ls_map, current); 413 443 } 414 444 415 445 char _license[] SEC("license") = "GPL";
+5 -5
tools/testing/selftests/bpf/progs/map_kptr_fail.c
···

 struct map_value {
 	char buf[8];
-	struct prog_test_ref_kfunc __kptr *unref_ptr;
-	struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
-	struct prog_test_member __kptr_ref *ref_memb_ptr;
+	struct prog_test_ref_kfunc __kptr_untrusted *unref_ptr;
+	struct prog_test_ref_kfunc __kptr *ref_ptr;
+	struct prog_test_member __kptr *ref_memb_ptr;
 };

 struct array_map {
···
 }

 SEC("?tc")
-__failure __msg("R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_")
+__failure __msg("R1 type=rcu_ptr_or_null_ expected=percpu_ptr_")
 int mark_ref_as_untrusted_or_null(struct __sk_buff *ctx)
 {
 	struct map_value *v;
···
 }

 SEC("?tc")
-__failure __msg("R2 type=untrusted_ptr_ expected=ptr_")
+__failure __msg("R2 must be referenced")
 int reject_untrusted_xchg(struct __sk_buff *ctx)
 {
 	struct prog_test_ref_kfunc *p;
+1 -1
tools/testing/selftests/bpf/progs/nested_trust_failure.c
···
 */

 SEC("tp_btf/task_newtask")
-__failure __msg("R2 must be referenced or trusted")
+__failure __msg("R2 must be")
 int BPF_PROG(test_invalid_nested_user_cpus, struct task_struct *task, u64 clone_flags)
 {
 	bpf_cpumask_test_cpu(0, task->user_cpus_ptr);
+1 -1
tools/testing/selftests/bpf/progs/rbtree.c
···
 long rbtree_add_and_remove(void *ctx)
 {
 	struct bpf_rb_node *res = NULL;
-	struct node_data *n, *m;
+	struct node_data *n, *m = NULL;
 
 	n = bpf_obj_new(typeof(*n));
 	if (!n)
+5 -2
tools/testing/selftests/bpf/progs/rbtree_fail.c
···
 
 	bpf_spin_lock(&glock);
 	res = bpf_rbtree_first(&groot);
-	if (res)
-		n = container_of(res, struct node_data, node);
+	if (!res) {
+		bpf_spin_unlock(&glock);
+		return 1;
+	}
+	n = container_of(res, struct node_data, node);
 	bpf_spin_unlock(&glock);
 
 	bpf_spin_lock(&glock);
+3 -3
tools/testing/selftests/bpf/progs/rcu_read_lock.c
···
 {
 	struct task_struct *task, *real_parent;
 
-	/* no bpf_rcu_read_lock(), old code still works */
+	/* old style ptr_to_btf_id is not allowed in sleepable */
 	task = bpf_get_current_task_btf();
 	real_parent = task->real_parent;
 	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
···
 }
 
 SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
-int task_untrusted_non_rcuptr(void *ctx)
+int task_trusted_non_rcuptr(void *ctx)
 {
 	struct task_struct *task, *group_leader;
 
 	task = bpf_get_current_task_btf();
 	bpf_rcu_read_lock();
-	/* the pointer group_leader marked as untrusted */
+	/* the pointer group_leader is explicitly marked as trusted */
 	group_leader = task->real_parent->group_leader;
 	(void)bpf_task_storage_get(&map_a, group_leader, 0, 0);
 	bpf_rcu_read_unlock();
+36
tools/testing/selftests/bpf/progs/rcu_tasks_trace_gp.c
···
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+
+struct task_ls_map {
+	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, int);
+} task_ls_map SEC(".maps");
+
+long gp_seq;
+
+SEC("syscall")
+int do_call_rcu_tasks_trace(void *ctx)
+{
+	struct task_struct *current;
+	int *v;
+
+	current = bpf_get_current_task_btf();
+	v = bpf_task_storage_get(&task_ls_map, current, NULL, BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!v)
+		return 1;
+	/* Invoke call_rcu_tasks_trace */
+	return bpf_task_storage_delete(&task_ls_map, current);
+}
+
+SEC("kprobe/rcu_tasks_trace_postgp")
+int rcu_tasks_trace_postgp(void *ctx)
+{
+	__sync_add_and_fetch(&gp_seq, 1);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+1 -1
tools/testing/selftests/bpf/progs/task_kfunc_common.h
···
 #include <bpf/bpf_tracing.h>
 
 struct __tasks_kfunc_map_value {
-	struct task_struct __kptr_ref * task;
+	struct task_struct __kptr * task;
 };
 
 struct hash_map {
+23
tools/testing/selftests/bpf/progs/test_attach_kprobe_sleepable.c
···
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2017 Facebook
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_misc.h"
+
+int kprobe_res = 0;
+
+/**
+ * This program will be manually made sleepable on the userspace side
+ * and should thus be unattachable.
+ */
+SEC("kprobe/" SYS_PREFIX "sys_nanosleep")
+int handle_kprobe_sleepable(struct pt_regs *ctx)
+{
+	kprobe_res = 1;
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+2 -33
tools/testing/selftests/bpf/progs/test_attach_probe.c
···
 #include <bpf/bpf_core_read.h>
 #include "bpf_misc.h"
 
-int kprobe_res = 0;
 int kprobe2_res = 0;
-int kretprobe_res = 0;
 int kretprobe2_res = 0;
-int uprobe_res = 0;
-int uretprobe_res = 0;
 int uprobe_byname_res = 0;
 int uretprobe_byname_res = 0;
 int uprobe_byname2_res = 0;
···
 int uretprobe_byname3_res = 0;
 void *user_ptr = 0;
 
-SEC("kprobe")
-int handle_kprobe(struct pt_regs *ctx)
-{
-	kprobe_res = 1;
-	return 0;
-}
-
 SEC("ksyscall/nanosleep")
 int BPF_KSYSCALL(handle_kprobe_auto, struct __kernel_timespec *req, struct __kernel_timespec *rem)
 {
 	kprobe2_res = 11;
-	return 0;
-}
-
-/**
- * This program will be manually made sleepable on the userspace side
- * and should thus be unattachable.
- */
-SEC("kprobe/" SYS_PREFIX "sys_nanosleep")
-int handle_kprobe_sleepable(struct pt_regs *ctx)
-{
-	kprobe_res = 2;
-	return 0;
-}
-
-SEC("kretprobe")
-int handle_kretprobe(struct pt_regs *ctx)
-{
-	kretprobe_res = 2;
 	return 0;
 }
···
 }
 
 SEC("uprobe")
-int handle_uprobe(struct pt_regs *ctx)
+int handle_uprobe_ref_ctr(struct pt_regs *ctx)
 {
-	uprobe_res = 3;
 	return 0;
 }
 
 SEC("uretprobe")
-int handle_uretprobe(struct pt_regs *ctx)
+int handle_uretprobe_ref_ctr(struct pt_regs *ctx)
 {
-	uretprobe_res = 4;
 	return 0;
 }
 
+53
tools/testing/selftests/bpf/progs/test_attach_probe_manual.c
···
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2017 Facebook
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_misc.h"
+
+int kprobe_res = 0;
+int kretprobe_res = 0;
+int uprobe_res = 0;
+int uretprobe_res = 0;
+int uprobe_byname_res = 0;
+void *user_ptr = 0;
+
+SEC("kprobe")
+int handle_kprobe(struct pt_regs *ctx)
+{
+	kprobe_res = 1;
+	return 0;
+}
+
+SEC("kretprobe")
+int handle_kretprobe(struct pt_regs *ctx)
+{
+	kretprobe_res = 2;
+	return 0;
+}
+
+SEC("uprobe")
+int handle_uprobe(struct pt_regs *ctx)
+{
+	uprobe_res = 3;
+	return 0;
+}
+
+SEC("uretprobe")
+int handle_uretprobe(struct pt_regs *ctx)
+{
+	uretprobe_res = 4;
+	return 0;
+}
+
+SEC("uprobe")
+int handle_uprobe_byname(struct pt_regs *ctx)
+{
+	uprobe_byname_res = 5;
+	return 0;
+}
+
+
+char _license[] SEC("license") = "GPL";
+980
tools/testing/selftests/bpf/progs/test_cls_redirect_dynptr.c
···
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2019, 2020 Cloudflare
+
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+#include <linux/bpf.h>
+#include <linux/icmp.h>
+#include <linux/icmpv6.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "test_cls_redirect.h"
+#include "bpf_kfuncs.h"
+
+#define offsetofend(TYPE, MEMBER) \
+	(offsetof(TYPE, MEMBER) + sizeof((((TYPE *)0)->MEMBER)))
+
+#define IP_OFFSET_MASK (0x1FFF)
+#define IP_MF (0x2000)
+
+char _license[] SEC("license") = "Dual BSD/GPL";
+
+/**
+ * Destination port and IP used for UDP encapsulation.
+ */
+volatile const __be16 ENCAPSULATION_PORT;
+volatile const __be32 ENCAPSULATION_IP;
+
+typedef struct {
+	uint64_t processed_packets_total;
+	uint64_t l3_protocol_packets_total_ipv4;
+	uint64_t l3_protocol_packets_total_ipv6;
+	uint64_t l4_protocol_packets_total_tcp;
+	uint64_t l4_protocol_packets_total_udp;
+	uint64_t accepted_packets_total_syn;
+	uint64_t accepted_packets_total_syn_cookies;
+	uint64_t accepted_packets_total_last_hop;
+	uint64_t accepted_packets_total_icmp_echo_request;
+	uint64_t accepted_packets_total_established;
+	uint64_t forwarded_packets_total_gue;
+	uint64_t forwarded_packets_total_gre;
+
+	uint64_t errors_total_unknown_l3_proto;
+	uint64_t errors_total_unknown_l4_proto;
+	uint64_t errors_total_malformed_ip;
+	uint64_t errors_total_fragmented_ip;
+	uint64_t errors_total_malformed_icmp;
+	uint64_t errors_total_unwanted_icmp;
+	uint64_t errors_total_malformed_icmp_pkt_too_big;
+	uint64_t errors_total_malformed_tcp;
+	uint64_t errors_total_malformed_udp;
+	uint64_t errors_total_icmp_echo_replies;
+	uint64_t errors_total_malformed_encapsulation;
+	uint64_t errors_total_encap_adjust_failed;
+	uint64_t errors_total_encap_buffer_too_small;
+	uint64_t errors_total_redirect_loop;
+	uint64_t errors_total_encap_mtu_violate;
+} metrics_t;
+
+typedef enum {
+	INVALID = 0,
+	UNKNOWN,
+	ECHO_REQUEST,
+	SYN,
+	SYN_COOKIE,
+	ESTABLISHED,
+} verdict_t;
+
+typedef struct {
+	uint16_t src, dst;
+} flow_ports_t;
+
+_Static_assert(
+	sizeof(flow_ports_t) !=
+		offsetofend(struct bpf_sock_tuple, ipv4.dport) -
+			offsetof(struct bpf_sock_tuple, ipv4.sport) - 1,
+	"flow_ports_t must match sport and dport in struct bpf_sock_tuple");
+_Static_assert(
+	sizeof(flow_ports_t) !=
+		offsetofend(struct bpf_sock_tuple, ipv6.dport) -
+			offsetof(struct bpf_sock_tuple, ipv6.sport) - 1,
+	"flow_ports_t must match sport and dport in struct bpf_sock_tuple");
+
+struct iphdr_info {
+	void *hdr;
+	__u64 len;
+};
+
+typedef int ret_t;
+
+/* This is a bit of a hack. We need a return value which allows us to
+ * indicate that the regular flow of the program should continue,
+ * while allowing functions to use XDP_PASS and XDP_DROP, etc.
+ */
+static const ret_t CONTINUE_PROCESSING = -1;
+
+/* Convenience macro to call functions which return ret_t.
+ */
+#define MAYBE_RETURN(x)                           \
+	do {                                      \
+		ret_t __ret = x;                  \
+		if (__ret != CONTINUE_PROCESSING) \
+			return __ret;             \
+	} while (0)
+
+static bool ipv4_is_fragment(const struct iphdr *ip)
+{
+	uint16_t frag_off = ip->frag_off & bpf_htons(IP_OFFSET_MASK);
+	return (ip->frag_off & bpf_htons(IP_MF)) != 0 || frag_off > 0;
+}
+
+static int pkt_parse_ipv4(struct bpf_dynptr *dynptr, __u64 *offset, struct iphdr *iphdr)
+{
+	if (bpf_dynptr_read(iphdr, sizeof(*iphdr), dynptr, *offset, 0))
+		return -1;
+
+	*offset += sizeof(*iphdr);
+
+	if (iphdr->ihl < 5)
+		return -1;
+
+	/* skip ipv4 options */
+	*offset += (iphdr->ihl - 5) * 4;
+
+	return 0;
+}
+
+/* Parse the L4 ports from a packet, assuming a layout like TCP or UDP. */
+static bool pkt_parse_icmp_l4_ports(struct bpf_dynptr *dynptr, __u64 *offset, flow_ports_t *ports)
+{
+	if (bpf_dynptr_read(ports, sizeof(*ports), dynptr, *offset, 0))
+		return false;
+
+	*offset += sizeof(*ports);
+
+	/* Ports in the L4 headers are reversed, since we are parsing an ICMP
+	 * payload which is going towards the eyeball.
+	 */
+	uint16_t dst = ports->src;
+	ports->src = ports->dst;
+	ports->dst = dst;
+	return true;
+}
+
+static uint16_t pkt_checksum_fold(uint32_t csum)
+{
+	/* The highest reasonable value for an IPv4 header
+	 * checksum requires two folds, so we just do that always.
+	 */
+	csum = (csum & 0xffff) + (csum >> 16);
+	csum = (csum & 0xffff) + (csum >> 16);
+	return (uint16_t)~csum;
+}
+
+static void pkt_ipv4_checksum(struct iphdr *iph)
+{
+	iph->check = 0;
+
+	/* An IP header without options is 20 bytes. Two of those
+	 * are the checksum, which we always set to zero. Hence,
+	 * the maximum accumulated value is 18 / 2 * 0xffff = 0x8fff7,
+	 * which fits in 32 bit.
+	 */
+	_Static_assert(sizeof(struct iphdr) == 20, "iphdr must be 20 bytes");
+	uint32_t acc = 0;
+	uint16_t *ipw = (uint16_t *)iph;
+
+	for (size_t i = 0; i < sizeof(struct iphdr) / 2; i++)
+		acc += ipw[i];
+
+	iph->check = pkt_checksum_fold(acc);
+}
+
+static bool pkt_skip_ipv6_extension_headers(struct bpf_dynptr *dynptr, __u64 *offset,
+					    const struct ipv6hdr *ipv6, uint8_t *upper_proto,
+					    bool *is_fragment)
+{
+	/* We understand five extension headers.
+	 * https://tools.ietf.org/html/rfc8200#section-4.1 states that all
+	 * headers should occur once, except Destination Options, which may
+	 * occur twice. Hence we give up after 6 headers.
+	 */
+	struct {
+		uint8_t next;
+		uint8_t len;
+	} exthdr = {
+		.next = ipv6->nexthdr,
+	};
+	*is_fragment = false;
+
+	for (int i = 0; i < 6; i++) {
+		switch (exthdr.next) {
+		case IPPROTO_FRAGMENT:
+			*is_fragment = true;
+			/* NB: We don't check that hdrlen == 0 as per spec. */
+			/* fallthrough; */
+
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+		case IPPROTO_MH:
+			if (bpf_dynptr_read(&exthdr, sizeof(exthdr), dynptr, *offset, 0))
+				return false;
+
+			/* hdrlen is in 8-octet units, and excludes the first 8 octets. */
+			*offset += (exthdr.len + 1) * 8;
+
+			/* Decode next header */
+			break;
+
+		default:
+			/* The next header is not one of the known extension
+			 * headers, treat it as the upper layer header.
+			 *
+			 * This handles IPPROTO_NONE.
+			 *
+			 * Encapsulating Security Payload (50) and Authentication
+			 * Header (51) also end up here (and will trigger an
+			 * unknown proto error later). They have a custom header
+			 * format and seem too esoteric to care about.
+			 */
+			*upper_proto = exthdr.next;
+			return true;
+		}
+	}
+
+	/* We never found an upper layer header. */
+	return false;
+}
+
+static int pkt_parse_ipv6(struct bpf_dynptr *dynptr, __u64 *offset, struct ipv6hdr *ipv6,
+			  uint8_t *proto, bool *is_fragment)
+{
+	if (bpf_dynptr_read(ipv6, sizeof(*ipv6), dynptr, *offset, 0))
+		return -1;
+
+	*offset += sizeof(*ipv6);
+
+	if (!pkt_skip_ipv6_extension_headers(dynptr, offset, ipv6, proto, is_fragment))
+		return -1;
+
+	return 0;
+}
+
+/* Global metrics, per CPU
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, unsigned int);
+	__type(value, metrics_t);
+} metrics_map SEC(".maps");
+
+static metrics_t *get_global_metrics(void)
+{
+	uint64_t key = 0;
+	return bpf_map_lookup_elem(&metrics_map, &key);
+}
+
+static ret_t accept_locally(struct __sk_buff *skb, encap_headers_t *encap)
+{
+	const int payload_off =
+		sizeof(*encap) +
+		sizeof(struct in_addr) * encap->unigue.hop_count;
+	int32_t encap_overhead = payload_off - sizeof(struct ethhdr);
+
+	/* Changing the ethertype if the encapsulated packet is ipv6 */
+	if (encap->gue.proto_ctype == IPPROTO_IPV6)
+		encap->eth.h_proto = bpf_htons(ETH_P_IPV6);
+
+	if (bpf_skb_adjust_room(skb, -encap_overhead, BPF_ADJ_ROOM_MAC,
+				BPF_F_ADJ_ROOM_FIXED_GSO |
+				BPF_F_ADJ_ROOM_NO_CSUM_RESET) ||
+	    bpf_csum_level(skb, BPF_CSUM_LEVEL_DEC))
+		return TC_ACT_SHOT;
+
+	return bpf_redirect(skb->ifindex, BPF_F_INGRESS);
+}
+
+static ret_t forward_with_gre(struct __sk_buff *skb, struct bpf_dynptr *dynptr,
+			      encap_headers_t *encap, struct in_addr *next_hop,
+			      metrics_t *metrics)
+{
+	const int payload_off =
+		sizeof(*encap) +
+		sizeof(struct in_addr) * encap->unigue.hop_count;
+	int32_t encap_overhead =
+		payload_off - sizeof(struct ethhdr) - sizeof(struct iphdr);
+	int32_t delta = sizeof(struct gre_base_hdr) - encap_overhead;
+	__u8 encap_buffer[sizeof(encap_gre_t)] = {};
+	uint16_t proto = ETH_P_IP;
+	uint32_t mtu_len = 0;
+	encap_gre_t *encap_gre;
+
+	metrics->forwarded_packets_total_gre++;
+
+	/* Loop protection: the inner packet's TTL is decremented as a safeguard
+	 * against any forwarding loop. As the only interesting field is the TTL
+	 * hop limit for IPv6, it is easier to use bpf_skb_load_bytes/bpf_skb_store_bytes
+	 * as they handle the split packets if needed (no need for the data to be
+	 * in the linear section).
+	 */
+	if (encap->gue.proto_ctype == IPPROTO_IPV6) {
+		proto = ETH_P_IPV6;
+		uint8_t ttl;
+		int rc;
+
+		rc = bpf_skb_load_bytes(
+			skb, payload_off + offsetof(struct ipv6hdr, hop_limit),
+			&ttl, 1);
+		if (rc != 0) {
+			metrics->errors_total_malformed_encapsulation++;
+			return TC_ACT_SHOT;
+		}
+
+		if (ttl == 0) {
+			metrics->errors_total_redirect_loop++;
+			return TC_ACT_SHOT;
+		}
+
+		ttl--;
+		rc = bpf_skb_store_bytes(
+			skb, payload_off + offsetof(struct ipv6hdr, hop_limit),
+			&ttl, 1, 0);
+		if (rc != 0) {
+			metrics->errors_total_malformed_encapsulation++;
+			return TC_ACT_SHOT;
+		}
+	} else {
+		uint8_t ttl;
+		int rc;
+
+		rc = bpf_skb_load_bytes(
+			skb, payload_off + offsetof(struct iphdr, ttl), &ttl,
+			1);
+		if (rc != 0) {
+			metrics->errors_total_malformed_encapsulation++;
+			return TC_ACT_SHOT;
+		}
+
+		if (ttl == 0) {
+			metrics->errors_total_redirect_loop++;
+			return TC_ACT_SHOT;
+		}
+
+		/* IPv4 also has a checksum to patch. While the TTL is only one byte,
+		 * this function only works for 2 and 4 bytes arguments (the result is
+		 * the same).
+		 */
+		rc = bpf_l3_csum_replace(
+			skb, payload_off + offsetof(struct iphdr, check), ttl,
+			ttl - 1, 2);
+		if (rc != 0) {
+			metrics->errors_total_malformed_encapsulation++;
+			return TC_ACT_SHOT;
+		}
+
+		ttl--;
+		rc = bpf_skb_store_bytes(
+			skb, payload_off + offsetof(struct iphdr, ttl), &ttl, 1,
+			0);
+		if (rc != 0) {
+			metrics->errors_total_malformed_encapsulation++;
+			return TC_ACT_SHOT;
+		}
+	}
+
+	if (bpf_check_mtu(skb, skb->ifindex, &mtu_len, delta, 0)) {
+		metrics->errors_total_encap_mtu_violate++;
+		return TC_ACT_SHOT;
+	}
+
+	if (bpf_skb_adjust_room(skb, delta, BPF_ADJ_ROOM_NET,
+				BPF_F_ADJ_ROOM_FIXED_GSO |
+				BPF_F_ADJ_ROOM_NO_CSUM_RESET) ||
+	    bpf_csum_level(skb, BPF_CSUM_LEVEL_INC)) {
+		metrics->errors_total_encap_adjust_failed++;
+		return TC_ACT_SHOT;
+	}
+
+	if (bpf_skb_pull_data(skb, sizeof(encap_gre_t))) {
+		metrics->errors_total_encap_buffer_too_small++;
+		return TC_ACT_SHOT;
+	}
+
+	encap_gre = bpf_dynptr_slice_rdwr(dynptr, 0, encap_buffer, sizeof(encap_buffer));
+	if (!encap_gre) {
+		metrics->errors_total_encap_buffer_too_small++;
+		return TC_ACT_SHOT;
+	}
+
+	encap_gre->ip.protocol = IPPROTO_GRE;
+	encap_gre->ip.daddr = next_hop->s_addr;
+	encap_gre->ip.saddr = ENCAPSULATION_IP;
+	encap_gre->ip.tot_len =
+		bpf_htons(bpf_ntohs(encap_gre->ip.tot_len) + delta);
+	encap_gre->gre.flags = 0;
+	encap_gre->gre.protocol = bpf_htons(proto);
+	pkt_ipv4_checksum((void *)&encap_gre->ip);
+
+	if (encap_gre == encap_buffer)
+		bpf_dynptr_write(dynptr, 0, encap_buffer, sizeof(encap_buffer), 0);
+
+	return bpf_redirect(skb->ifindex, 0);
+}
+
+static ret_t forward_to_next_hop(struct __sk_buff *skb, struct bpf_dynptr *dynptr,
+				 encap_headers_t *encap, struct in_addr *next_hop,
+				 metrics_t *metrics)
+{
+	/* swap L2 addresses */
+	/* This assumes that packets are received from a router.
+	 * So just swapping the MAC addresses here will make the packet go back to
+	 * the router, which will send it to the appropriate machine.
+	 */
+	unsigned char temp[ETH_ALEN];
+	memcpy(temp, encap->eth.h_dest, sizeof(temp));
+	memcpy(encap->eth.h_dest, encap->eth.h_source,
+	       sizeof(encap->eth.h_dest));
+	memcpy(encap->eth.h_source, temp, sizeof(encap->eth.h_source));
+
+	if (encap->unigue.next_hop == encap->unigue.hop_count - 1 &&
+	    encap->unigue.last_hop_gre) {
+		return forward_with_gre(skb, dynptr, encap, next_hop, metrics);
+	}
+
+	metrics->forwarded_packets_total_gue++;
+	uint32_t old_saddr = encap->ip.saddr;
+	encap->ip.saddr = encap->ip.daddr;
+	encap->ip.daddr = next_hop->s_addr;
+	if (encap->unigue.next_hop < encap->unigue.hop_count) {
+		encap->unigue.next_hop++;
+	}
+
+	/* Remove ip->saddr, add next_hop->s_addr */
+	const uint64_t off = offsetof(typeof(*encap), ip.check);
+	int ret = bpf_l3_csum_replace(skb, off, old_saddr, next_hop->s_addr, 4);
+	if (ret < 0) {
+		return TC_ACT_SHOT;
+	}
+
+	return bpf_redirect(skb->ifindex, 0);
+}
+
+static ret_t skip_next_hops(__u64 *offset, int n)
+{
+	__u32 res;
+	switch (n) {
+	case 1:
+		*offset += sizeof(struct in_addr);
+	case 0:
+		return CONTINUE_PROCESSING;
+
+	default:
+		return TC_ACT_SHOT;
+	}
+}
+
+/* Get the next hop from the GLB header.
+ *
+ * Sets next_hop->s_addr to 0 if there are no more hops left.
+ * pkt is positioned just after the variable length GLB header
+ * iff the call is successful.
+ */
+static ret_t get_next_hop(struct bpf_dynptr *dynptr, __u64 *offset, encap_headers_t *encap,
+			  struct in_addr *next_hop)
+{
+	if (encap->unigue.next_hop > encap->unigue.hop_count)
+		return TC_ACT_SHOT;
+
+	/* Skip "used" next hops. */
+	MAYBE_RETURN(skip_next_hops(offset, encap->unigue.next_hop));
+
+	if (encap->unigue.next_hop == encap->unigue.hop_count) {
+		/* No more next hops, we are at the end of the GLB header. */
+		next_hop->s_addr = 0;
+		return CONTINUE_PROCESSING;
+	}
+
+	if (bpf_dynptr_read(next_hop, sizeof(*next_hop), dynptr, *offset, 0))
+		return TC_ACT_SHOT;
+
+	*offset += sizeof(*next_hop);
+
+	/* Skip the remaining next hops (may be zero). */
+	return skip_next_hops(offset, encap->unigue.hop_count - encap->unigue.next_hop - 1);
+}
+
+/* Fill a bpf_sock_tuple to be used with the socket lookup functions.
+ * This is a kludge that lets us work around verifier limitations:
+ *
+ *    fill_tuple(&t, foo, sizeof(struct iphdr), 123, 321)
+ *
+ * clang will substitute a constant for sizeof, which allows the verifier
+ * to track its value. Based on this, it can figure out the constant
+ * return value, and calling code works while still being "generic" to
+ * IPv4 and IPv6.
+ */
+static uint64_t fill_tuple(struct bpf_sock_tuple *tuple, void *iph,
+			   uint64_t iphlen, uint16_t sport, uint16_t dport)
+{
+	switch (iphlen) {
+	case sizeof(struct iphdr): {
+		struct iphdr *ipv4 = (struct iphdr *)iph;
+		tuple->ipv4.daddr = ipv4->daddr;
+		tuple->ipv4.saddr = ipv4->saddr;
+		tuple->ipv4.sport = sport;
+		tuple->ipv4.dport = dport;
+		return sizeof(tuple->ipv4);
+	}
+
+	case sizeof(struct ipv6hdr): {
+		struct ipv6hdr *ipv6 = (struct ipv6hdr *)iph;
+		memcpy(&tuple->ipv6.daddr, &ipv6->daddr,
+		       sizeof(tuple->ipv6.daddr));
+		memcpy(&tuple->ipv6.saddr, &ipv6->saddr,
+		       sizeof(tuple->ipv6.saddr));
+		tuple->ipv6.sport = sport;
+		tuple->ipv6.dport = dport;
+		return sizeof(tuple->ipv6);
+	}
+
+	default:
+		return 0;
+	}
+}
+
+static verdict_t classify_tcp(struct __sk_buff *skb, struct bpf_sock_tuple *tuple,
+			      uint64_t tuplen, void *iph, struct tcphdr *tcp)
+{
+	struct bpf_sock *sk =
+		bpf_skc_lookup_tcp(skb, tuple, tuplen, BPF_F_CURRENT_NETNS, 0);
+
+	if (sk == NULL)
+		return UNKNOWN;
+
+	if (sk->state != BPF_TCP_LISTEN) {
+		bpf_sk_release(sk);
+		return ESTABLISHED;
+	}
+
+	if (iph != NULL && tcp != NULL) {
+		/* Kludge: we've run out of arguments, but need the length of the ip header. */
+		uint64_t iphlen = sizeof(struct iphdr);
+
+		if (tuplen == sizeof(tuple->ipv6))
+			iphlen = sizeof(struct ipv6hdr);
+
+		if (bpf_tcp_check_syncookie(sk, iph, iphlen, tcp,
+					    sizeof(*tcp)) == 0) {
+			bpf_sk_release(sk);
+			return SYN_COOKIE;
+		}
+	}
+
+	bpf_sk_release(sk);
+	return UNKNOWN;
+}
+
+static verdict_t classify_udp(struct __sk_buff *skb, struct bpf_sock_tuple *tuple, uint64_t tuplen)
+{
+	struct bpf_sock *sk =
+		bpf_sk_lookup_udp(skb, tuple, tuplen, BPF_F_CURRENT_NETNS, 0);
+
+	if (sk == NULL)
+		return UNKNOWN;
+
+	if (sk->state == BPF_TCP_ESTABLISHED) {
+		bpf_sk_release(sk);
+		return ESTABLISHED;
+	}
+
+	bpf_sk_release(sk);
+	return UNKNOWN;
+}
+
+static verdict_t classify_icmp(struct __sk_buff *skb, uint8_t proto, struct bpf_sock_tuple *tuple,
+			       uint64_t tuplen, metrics_t *metrics)
+{
+	switch (proto) {
+	case IPPROTO_TCP:
+		return classify_tcp(skb, tuple, tuplen, NULL, NULL);
+
+	case IPPROTO_UDP:
+		return classify_udp(skb, tuple, tuplen);
+
+	default:
+		metrics->errors_total_malformed_icmp++;
+		return INVALID;
+	}
+}
+
+static verdict_t process_icmpv4(struct __sk_buff *skb, struct bpf_dynptr *dynptr, __u64 *offset,
+				metrics_t *metrics)
+{
+	struct icmphdr icmp;
+	struct iphdr ipv4;
+
+	if (bpf_dynptr_read(&icmp, sizeof(icmp), dynptr, *offset, 0)) {
+		metrics->errors_total_malformed_icmp++;
+		return INVALID;
+	}
+
+	*offset += sizeof(icmp);
+
+	/* We should never receive encapsulated echo replies. */
+	if (icmp.type == ICMP_ECHOREPLY) {
+		metrics->errors_total_icmp_echo_replies++;
+		return INVALID;
+	}
+
+	if (icmp.type == ICMP_ECHO)
+		return ECHO_REQUEST;
+
+	if (icmp.type != ICMP_DEST_UNREACH || icmp.code != ICMP_FRAG_NEEDED) {
+		metrics->errors_total_unwanted_icmp++;
+		return INVALID;
+	}
+
+	if (pkt_parse_ipv4(dynptr, offset, &ipv4)) {
+		metrics->errors_total_malformed_icmp_pkt_too_big++;
+		return INVALID;
+	}
+
+	/* The source address in the outer IP header is from the entity that
+	 * originated the ICMP message. Use the original IP header to restore
+	 * the correct flow tuple.
+	 */
+	struct bpf_sock_tuple tuple;
+	tuple.ipv4.saddr = ipv4.daddr;
+	tuple.ipv4.daddr = ipv4.saddr;
+
+	if (!pkt_parse_icmp_l4_ports(dynptr, offset, (flow_ports_t *)&tuple.ipv4.sport)) {
+		metrics->errors_total_malformed_icmp_pkt_too_big++;
+		return INVALID;
+	}
+
+	return classify_icmp(skb, ipv4.protocol, &tuple,
+			     sizeof(tuple.ipv4), metrics);
+}
+
+static verdict_t process_icmpv6(struct bpf_dynptr *dynptr, __u64 *offset, struct __sk_buff *skb,
+				metrics_t *metrics)
+{
+	struct bpf_sock_tuple tuple;
+	struct ipv6hdr ipv6;
+	struct icmp6hdr icmp6;
+	bool is_fragment;
+	uint8_t l4_proto;
+
+	if (bpf_dynptr_read(&icmp6, sizeof(icmp6), dynptr, *offset, 0)) {
+		metrics->errors_total_malformed_icmp++;
+		return INVALID;
+	}
+
+	/* We should never receive encapsulated echo replies. */
+	if (icmp6.icmp6_type == ICMPV6_ECHO_REPLY) {
+		metrics->errors_total_icmp_echo_replies++;
+		return INVALID;
+	}
+
+	if (icmp6.icmp6_type == ICMPV6_ECHO_REQUEST) {
+		return ECHO_REQUEST;
+	}
+
+	if (icmp6.icmp6_type != ICMPV6_PKT_TOOBIG) {
+		metrics->errors_total_unwanted_icmp++;
+		return INVALID;
+	}
+
+	if (pkt_parse_ipv6(dynptr, offset, &ipv6, &l4_proto, &is_fragment)) {
+		metrics->errors_total_malformed_icmp_pkt_too_big++;
+		return INVALID;
+	}
+
+	if (is_fragment) {
+		metrics->errors_total_fragmented_ip++;
+		return INVALID;
+	}
+
+	/* Swap source and dest addresses. */
+	memcpy(&tuple.ipv6.saddr, &ipv6.daddr, sizeof(tuple.ipv6.saddr));
+	memcpy(&tuple.ipv6.daddr, &ipv6.saddr, sizeof(tuple.ipv6.daddr));
+
+	if (!pkt_parse_icmp_l4_ports(dynptr, offset, (flow_ports_t *)&tuple.ipv6.sport)) {
+		metrics->errors_total_malformed_icmp_pkt_too_big++;
+		return INVALID;
+	}
+
+	return classify_icmp(skb, l4_proto, &tuple, sizeof(tuple.ipv6),
+			     metrics);
+}
+
+static verdict_t process_tcp(struct bpf_dynptr *dynptr, __u64 *offset, struct __sk_buff *skb,
+			     struct iphdr_info *info, metrics_t *metrics)
+{
+	struct bpf_sock_tuple tuple;
+	struct tcphdr tcp;
+	uint64_t tuplen;
+
+	metrics->l4_protocol_packets_total_tcp++;
+
+	if (bpf_dynptr_read(&tcp, sizeof(tcp), dynptr, *offset, 0)) {
+		metrics->errors_total_malformed_tcp++;
+		return INVALID;
+	}
+
+	*offset += sizeof(tcp);
+
+	if (tcp.syn)
+		return SYN;
+
+	tuplen = fill_tuple(&tuple, info->hdr, info->len, tcp.source, tcp.dest);
+	return classify_tcp(skb, &tuple, tuplen, info->hdr, &tcp);
+}
+
+static verdict_t process_udp(struct bpf_dynptr *dynptr, __u64 *offset, struct __sk_buff *skb,
+			     struct iphdr_info *info, metrics_t *metrics)
+{
+	struct bpf_sock_tuple tuple;
+	struct udphdr udph;
+	uint64_t tuplen;
+
+	metrics->l4_protocol_packets_total_udp++;
+
+	if (bpf_dynptr_read(&udph, sizeof(udph), dynptr, *offset, 0)) {
+		metrics->errors_total_malformed_udp++;
+		return INVALID;
+	}
+	*offset += sizeof(udph);
+
+	tuplen = fill_tuple(&tuple, info->hdr, info->len, udph.source, udph.dest);
+	return classify_udp(skb, &tuple, tuplen);
+}
+
+static verdict_t process_ipv4(struct __sk_buff *skb, struct bpf_dynptr *dynptr,
+			      __u64 *offset, metrics_t *metrics)
+{
+	struct iphdr ipv4;
+	struct iphdr_info info = {
+		.hdr = &ipv4,
+		.len = sizeof(ipv4),
+	};
+
+	metrics->l3_protocol_packets_total_ipv4++;
+
+	if (pkt_parse_ipv4(dynptr, offset, &ipv4)) {
+		metrics->errors_total_malformed_ip++;
+		return INVALID;
+	}
+
+	if (ipv4.version != 4) {
+		metrics->errors_total_malformed_ip++;
+		return INVALID;
+	}
+
+	if (ipv4_is_fragment(&ipv4)) {
+		metrics->errors_total_fragmented_ip++;
+		return INVALID;
+	}
+
+	switch (ipv4.protocol) {
+	case IPPROTO_ICMP:
+		return process_icmpv4(skb, dynptr, offset, metrics);
+
+	case IPPROTO_TCP:
+		return process_tcp(dynptr, offset, skb, &info, metrics);
+
+	case IPPROTO_UDP:
+		return process_udp(dynptr, offset, skb, &info, metrics);
+
+	default:
+		metrics->errors_total_unknown_l4_proto++;
+		return INVALID;
+	}
+}
+
+static verdict_t process_ipv6(struct __sk_buff *skb, struct bpf_dynptr *dynptr,
+			      __u64 *offset, metrics_t *metrics)
+{
+	struct ipv6hdr ipv6;
+	struct iphdr_info info = {
+		.hdr = &ipv6,
+		.len = sizeof(ipv6),
+	};
+	uint8_t l4_proto;
+	bool is_fragment;
+
+	metrics->l3_protocol_packets_total_ipv6++;
+
+	if (pkt_parse_ipv6(dynptr, offset, &ipv6, &l4_proto, &is_fragment)) {
+		metrics->errors_total_malformed_ip++;
+		return INVALID;
+	}
+
+	if (ipv6.version != 6) {
+		metrics->errors_total_malformed_ip++;
+		return INVALID;
+	}
+
+	if (is_fragment) {
+		metrics->errors_total_fragmented_ip++;
+		return INVALID;
+	}
+
+	switch (l4_proto) {
+	case IPPROTO_ICMPV6:
+		return process_icmpv6(dynptr, offset, skb, metrics);
+
+	case IPPROTO_TCP:
+		return process_tcp(dynptr, offset, skb, &info, metrics);
+
+	case IPPROTO_UDP:
+		return process_udp(dynptr, offset, skb, &info, metrics);
+
+	default:
+		metrics->errors_total_unknown_l4_proto++;
+		return INVALID;
+	}
+}
+
+SEC("tc")
+int cls_redirect(struct __sk_buff *skb)
+{
+	__u8 encap_buffer[sizeof(encap_headers_t)] = {};
+	struct bpf_dynptr dynptr;
+	struct in_addr next_hop;
+	/* Tracks offset of the dynptr. This will be unnecessary once
+	 * bpf_dynptr_advance() is available.
+	 */
+	__u64 off = 0;
+	ret_t ret;
+
+	bpf_dynptr_from_skb(skb, 0, &dynptr);
+
+	metrics_t *metrics = get_global_metrics();
+	if (metrics == NULL)
+		return TC_ACT_SHOT;
+
+	metrics->processed_packets_total++;
+
+	/* Pass bogus packets as long as we're not sure they're
+	 * destined for us.
+	 */
+	if (skb->protocol != bpf_htons(ETH_P_IP))
+		return TC_ACT_OK;
+
+	encap_headers_t *encap;
+
+	/* Make sure that all encapsulation headers are available in
+	 * the linear portion of the skb. This makes it easy to manipulate them.
+	 */
+	if (bpf_skb_pull_data(skb, sizeof(*encap)))
+		return TC_ACT_OK;
+
+	encap = bpf_dynptr_slice_rdwr(&dynptr, 0, encap_buffer, sizeof(encap_buffer));
+	if (!encap)
+		return TC_ACT_OK;
+
+	off += sizeof(*encap);
+
+	if (encap->ip.ihl != 5)
+		/* We never have any options. */
+		return TC_ACT_OK;
+
+	if (encap->ip.daddr != ENCAPSULATION_IP ||
+	    encap->ip.protocol != IPPROTO_UDP)
+		return TC_ACT_OK;
+
+	/* TODO Check UDP length? */
+	if (encap->udp.dest != ENCAPSULATION_PORT)
+		return TC_ACT_OK;
+
+	/* We now know that the packet is destined to us, we can
+	 * drop bogus ones.
+	 */
+	if (ipv4_is_fragment((void *)&encap->ip)) {
+		metrics->errors_total_fragmented_ip++;
+		return TC_ACT_SHOT;
+	}
+
+	if (encap->gue.variant != 0) {
+		metrics->errors_total_malformed_encapsulation++;
+		return TC_ACT_SHOT;
+	}
+
+	if (encap->gue.control != 0) {
+		metrics->errors_total_malformed_encapsulation++;
+		return TC_ACT_SHOT;
+	}
+
+	if (encap->gue.flags != 0) {
+		metrics->errors_total_malformed_encapsulation++;
+		return TC_ACT_SHOT;
+	}
+
+	if (encap->gue.hlen !=
+	    sizeof(encap->unigue) / 4 + encap->unigue.hop_count) {
+		metrics->errors_total_malformed_encapsulation++;
+		return TC_ACT_SHOT;
+	}
+
+	if (encap->unigue.version != 0) {
+		metrics->errors_total_malformed_encapsulation++;
+		return TC_ACT_SHOT;
+	}
+
+	if (encap->unigue.reserved != 0)
+		return TC_ACT_SHOT;
+
+	MAYBE_RETURN(get_next_hop(&dynptr, &off, encap, &next_hop));
+
+	if (next_hop.s_addr == 0) {
+		metrics->accepted_packets_total_last_hop++;
+		return accept_locally(skb, encap);
+	}
+
+	verdict_t verdict;
+	switch (encap->gue.proto_ctype) {
+	case IPPROTO_IPIP:
+		verdict = process_ipv4(skb, &dynptr, &off, metrics);
+		break;
+
+	case IPPROTO_IPV6:
+		verdict = process_ipv6(skb, &dynptr, &off, metrics);
+		break;
+
+	default:
+		metrics->errors_total_unknown_l3_proto++;
+		return TC_ACT_SHOT;
+	}
+
+	switch (verdict) {
+	case INVALID:
+		/* metrics have already been bumped */
+		return TC_ACT_SHOT;
+
+	case UNKNOWN:
+		return forward_to_next_hop(skb, &dynptr, encap, &next_hop, metrics);
+
+	case ECHO_REQUEST:
+		metrics->accepted_packets_total_icmp_echo_request++;
+		break;
+
+	case SYN:
+		if (encap->unigue.forward_syn) {
+			return forward_to_next_hop(skb, &dynptr, encap, &next_hop,
+						   metrics);
+		}
+
+		metrics->accepted_packets_total_syn++;
+		break;
+
+	case SYN_COOKIE:
+		metrics->accepted_packets_total_syn_cookies++;
+		break;
+
+	case ESTABLISHED:
+		metrics->accepted_packets_total_established++;
+		break;
+	}
+
+	ret = accept_locally(skb, encap);
+
+	if (encap == encap_buffer)
+		bpf_dynptr_write(&dynptr, 0, encap_buffer, sizeof(encap_buffer), 0);
+
+	return ret;
+}
+4 -4
tools/testing/selftests/bpf/progs/test_global_func10.c
···
  5   5	#include "bpf_misc.h"
  6   6
  7   7	struct Small {
  8     -		int x;
      8 +		long x;
  9   9	};
 10  10
 11  11	struct Big {
 12     -		int x;
 13     -		int y;
     12 +		long x;
     13 +		long y;
 14  14	};
 15  15
 16  16	__noinline int foo(const struct Big *big)
···
 22  22	}
 23  23
 24  24	SEC("cgroup_skb/ingress")
 25     -	__failure __msg("invalid indirect read from stack")
     25 +	__failure __msg("invalid indirect access to stack")
 26  26	int global_func10(struct __sk_buff *skb)
 27  27	{
 28  28		const struct Small small = {.x = skb->len };
+1 -1
tools/testing/selftests/bpf/progs/test_kfunc_dynptr_param.c
···
 48  48	__failure __msg("arg#0 expected pointer to stack or dynptr_ptr")
 49  49	int BPF_PROG(not_ptr_to_stack, int cmd, union bpf_attr *attr, unsigned int size)
 50  50	{
 51     -	unsigned long val;
     51 +	unsigned long val = 0;
 52  52
 53  53		return bpf_verify_pkcs7_signature((struct bpf_dynptr *)val,
 54  54						  (struct bpf_dynptr *)val, NULL);
+487
tools/testing/selftests/bpf/progs/test_l4lb_noinline_dynptr.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2017 Facebook 3 + #include <stddef.h> 4 + #include <stdbool.h> 5 + #include <string.h> 6 + #include <linux/pkt_cls.h> 7 + #include <linux/bpf.h> 8 + #include <linux/in.h> 9 + #include <linux/if_ether.h> 10 + #include <linux/ip.h> 11 + #include <linux/ipv6.h> 12 + #include <linux/icmp.h> 13 + #include <linux/icmpv6.h> 14 + #include <linux/tcp.h> 15 + #include <linux/udp.h> 16 + #include <bpf/bpf_helpers.h> 17 + #include "test_iptunnel_common.h" 18 + #include <bpf/bpf_endian.h> 19 + 20 + #include "bpf_kfuncs.h" 21 + 22 + static __always_inline __u32 rol32(__u32 word, unsigned int shift) 23 + { 24 + return (word << shift) | (word >> ((-shift) & 31)); 25 + } 26 + 27 + /* copy paste of jhash from kernel sources to make sure llvm 28 + * can compile it into valid sequence of bpf instructions 29 + */ 30 + #define __jhash_mix(a, b, c) \ 31 + { \ 32 + a -= c; a ^= rol32(c, 4); c += b; \ 33 + b -= a; b ^= rol32(a, 6); a += c; \ 34 + c -= b; c ^= rol32(b, 8); b += a; \ 35 + a -= c; a ^= rol32(c, 16); c += b; \ 36 + b -= a; b ^= rol32(a, 19); a += c; \ 37 + c -= b; c ^= rol32(b, 4); b += a; \ 38 + } 39 + 40 + #define __jhash_final(a, b, c) \ 41 + { \ 42 + c ^= b; c -= rol32(b, 14); \ 43 + a ^= c; a -= rol32(c, 11); \ 44 + b ^= a; b -= rol32(a, 25); \ 45 + c ^= b; c -= rol32(b, 16); \ 46 + a ^= c; a -= rol32(c, 4); \ 47 + b ^= a; b -= rol32(a, 14); \ 48 + c ^= b; c -= rol32(b, 24); \ 49 + } 50 + 51 + #define JHASH_INITVAL 0xdeadbeef 52 + 53 + typedef unsigned int u32; 54 + 55 + static __noinline u32 jhash(const void *key, u32 length, u32 initval) 56 + { 57 + u32 a, b, c; 58 + const unsigned char *k = key; 59 + 60 + a = b = c = JHASH_INITVAL + length + initval; 61 + 62 + while (length > 12) { 63 + a += *(u32 *)(k); 64 + b += *(u32 *)(k + 4); 65 + c += *(u32 *)(k + 8); 66 + __jhash_mix(a, b, c); 67 + length -= 12; 68 + k += 12; 69 + } 70 + switch (length) { 71 + case 12: c += (u32)k[11]<<24; 72 + case 11: c += 
(u32)k[10]<<16; 73 + case 10: c += (u32)k[9]<<8; 74 + case 9: c += k[8]; 75 + case 8: b += (u32)k[7]<<24; 76 + case 7: b += (u32)k[6]<<16; 77 + case 6: b += (u32)k[5]<<8; 78 + case 5: b += k[4]; 79 + case 4: a += (u32)k[3]<<24; 80 + case 3: a += (u32)k[2]<<16; 81 + case 2: a += (u32)k[1]<<8; 82 + case 1: a += k[0]; 83 + __jhash_final(a, b, c); 84 + case 0: /* Nothing left to add */ 85 + break; 86 + } 87 + 88 + return c; 89 + } 90 + 91 + static __noinline u32 __jhash_nwords(u32 a, u32 b, u32 c, u32 initval) 92 + { 93 + a += initval; 94 + b += initval; 95 + c += initval; 96 + __jhash_final(a, b, c); 97 + return c; 98 + } 99 + 100 + static __noinline u32 jhash_2words(u32 a, u32 b, u32 initval) 101 + { 102 + return __jhash_nwords(a, b, 0, initval + JHASH_INITVAL + (2 << 2)); 103 + } 104 + 105 + #define PCKT_FRAGMENTED 65343 106 + #define IPV4_HDR_LEN_NO_OPT 20 107 + #define IPV4_PLUS_ICMP_HDR 28 108 + #define IPV6_PLUS_ICMP_HDR 48 109 + #define RING_SIZE 2 110 + #define MAX_VIPS 12 111 + #define MAX_REALS 5 112 + #define CTL_MAP_SIZE 16 113 + #define CH_RINGS_SIZE (MAX_VIPS * RING_SIZE) 114 + #define F_IPV6 (1 << 0) 115 + #define F_HASH_NO_SRC_PORT (1 << 0) 116 + #define F_ICMP (1 << 0) 117 + #define F_SYN_SET (1 << 1) 118 + 119 + struct packet_description { 120 + union { 121 + __be32 src; 122 + __be32 srcv6[4]; 123 + }; 124 + union { 125 + __be32 dst; 126 + __be32 dstv6[4]; 127 + }; 128 + union { 129 + __u32 ports; 130 + __u16 port16[2]; 131 + }; 132 + __u8 proto; 133 + __u8 flags; 134 + }; 135 + 136 + struct ctl_value { 137 + union { 138 + __u64 value; 139 + __u32 ifindex; 140 + __u8 mac[6]; 141 + }; 142 + }; 143 + 144 + struct vip_meta { 145 + __u32 flags; 146 + __u32 vip_num; 147 + }; 148 + 149 + struct real_definition { 150 + union { 151 + __be32 dst; 152 + __be32 dstv6[4]; 153 + }; 154 + __u8 flags; 155 + }; 156 + 157 + struct vip_stats { 158 + __u64 bytes; 159 + __u64 pkts; 160 + }; 161 + 162 + struct eth_hdr { 163 + unsigned char eth_dest[ETH_ALEN]; 164 + 
unsigned char eth_source[ETH_ALEN]; 165 + unsigned short eth_proto; 166 + }; 167 + 168 + struct { 169 + __uint(type, BPF_MAP_TYPE_HASH); 170 + __uint(max_entries, MAX_VIPS); 171 + __type(key, struct vip); 172 + __type(value, struct vip_meta); 173 + } vip_map SEC(".maps"); 174 + 175 + struct { 176 + __uint(type, BPF_MAP_TYPE_ARRAY); 177 + __uint(max_entries, CH_RINGS_SIZE); 178 + __type(key, __u32); 179 + __type(value, __u32); 180 + } ch_rings SEC(".maps"); 181 + 182 + struct { 183 + __uint(type, BPF_MAP_TYPE_ARRAY); 184 + __uint(max_entries, MAX_REALS); 185 + __type(key, __u32); 186 + __type(value, struct real_definition); 187 + } reals SEC(".maps"); 188 + 189 + struct { 190 + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY); 191 + __uint(max_entries, MAX_VIPS); 192 + __type(key, __u32); 193 + __type(value, struct vip_stats); 194 + } stats SEC(".maps"); 195 + 196 + struct { 197 + __uint(type, BPF_MAP_TYPE_ARRAY); 198 + __uint(max_entries, CTL_MAP_SIZE); 199 + __type(key, __u32); 200 + __type(value, struct ctl_value); 201 + } ctl_array SEC(".maps"); 202 + 203 + static __noinline __u32 get_packet_hash(struct packet_description *pckt, bool ipv6) 204 + { 205 + if (ipv6) 206 + return jhash_2words(jhash(pckt->srcv6, 16, MAX_VIPS), 207 + pckt->ports, CH_RINGS_SIZE); 208 + else 209 + return jhash_2words(pckt->src, pckt->ports, CH_RINGS_SIZE); 210 + } 211 + 212 + static __noinline bool get_packet_dst(struct real_definition **real, 213 + struct packet_description *pckt, 214 + struct vip_meta *vip_info, 215 + bool is_ipv6) 216 + { 217 + __u32 hash = get_packet_hash(pckt, is_ipv6); 218 + __u32 key = RING_SIZE * vip_info->vip_num + hash % RING_SIZE; 219 + __u32 *real_pos; 220 + 221 + if (hash != 0x358459b7 /* jhash of ipv4 packet */ && 222 + hash != 0x2f4bc6bb /* jhash of ipv6 packet */) 223 + return false; 224 + 225 + real_pos = bpf_map_lookup_elem(&ch_rings, &key); 226 + if (!real_pos) 227 + return false; 228 + key = *real_pos; 229 + *real = bpf_map_lookup_elem(&reals, &key); 230 + if 
(!(*real)) 231 + return false; 232 + return true; 233 + } 234 + 235 + static __noinline int parse_icmpv6(struct bpf_dynptr *skb_ptr, __u64 off, 236 + struct packet_description *pckt) 237 + { 238 + __u8 buffer[sizeof(struct ipv6hdr)] = {}; 239 + struct icmp6hdr *icmp_hdr; 240 + struct ipv6hdr *ip6h; 241 + 242 + icmp_hdr = bpf_dynptr_slice(skb_ptr, off, buffer, sizeof(buffer)); 243 + if (!icmp_hdr) 244 + return TC_ACT_SHOT; 245 + 246 + if (icmp_hdr->icmp6_type != ICMPV6_PKT_TOOBIG) 247 + return TC_ACT_OK; 248 + off += sizeof(struct icmp6hdr); 249 + ip6h = bpf_dynptr_slice(skb_ptr, off, buffer, sizeof(buffer)); 250 + if (!ip6h) 251 + return TC_ACT_SHOT; 252 + pckt->proto = ip6h->nexthdr; 253 + pckt->flags |= F_ICMP; 254 + memcpy(pckt->srcv6, ip6h->daddr.s6_addr32, 16); 255 + memcpy(pckt->dstv6, ip6h->saddr.s6_addr32, 16); 256 + return TC_ACT_UNSPEC; 257 + } 258 + 259 + static __noinline int parse_icmp(struct bpf_dynptr *skb_ptr, __u64 off, 260 + struct packet_description *pckt) 261 + { 262 + __u8 buffer_icmp[sizeof(struct iphdr)] = {}; 263 + __u8 buffer_ip[sizeof(struct iphdr)] = {}; 264 + struct icmphdr *icmp_hdr; 265 + struct iphdr *iph; 266 + 267 + icmp_hdr = bpf_dynptr_slice(skb_ptr, off, buffer_icmp, sizeof(buffer_icmp)); 268 + if (!icmp_hdr) 269 + return TC_ACT_SHOT; 270 + if (icmp_hdr->type != ICMP_DEST_UNREACH || 271 + icmp_hdr->code != ICMP_FRAG_NEEDED) 272 + return TC_ACT_OK; 273 + off += sizeof(struct icmphdr); 274 + iph = bpf_dynptr_slice(skb_ptr, off, buffer_ip, sizeof(buffer_ip)); 275 + if (!iph || iph->ihl != 5) 276 + return TC_ACT_SHOT; 277 + pckt->proto = iph->protocol; 278 + pckt->flags |= F_ICMP; 279 + pckt->src = iph->daddr; 280 + pckt->dst = iph->saddr; 281 + return TC_ACT_UNSPEC; 282 + } 283 + 284 + static __noinline bool parse_udp(struct bpf_dynptr *skb_ptr, __u64 off, 285 + struct packet_description *pckt) 286 + { 287 + __u8 buffer[sizeof(struct udphdr)] = {}; 288 + struct udphdr *udp; 289 + 290 + udp = bpf_dynptr_slice(skb_ptr, off, buffer, 
sizeof(buffer)); 291 + if (!udp) 292 + return false; 293 + 294 + if (!(pckt->flags & F_ICMP)) { 295 + pckt->port16[0] = udp->source; 296 + pckt->port16[1] = udp->dest; 297 + } else { 298 + pckt->port16[0] = udp->dest; 299 + pckt->port16[1] = udp->source; 300 + } 301 + return true; 302 + } 303 + 304 + static __noinline bool parse_tcp(struct bpf_dynptr *skb_ptr, __u64 off, 305 + struct packet_description *pckt) 306 + { 307 + __u8 buffer[sizeof(struct tcphdr)] = {}; 308 + struct tcphdr *tcp; 309 + 310 + tcp = bpf_dynptr_slice(skb_ptr, off, buffer, sizeof(buffer)); 311 + if (!tcp) 312 + return false; 313 + 314 + if (tcp->syn) 315 + pckt->flags |= F_SYN_SET; 316 + 317 + if (!(pckt->flags & F_ICMP)) { 318 + pckt->port16[0] = tcp->source; 319 + pckt->port16[1] = tcp->dest; 320 + } else { 321 + pckt->port16[0] = tcp->dest; 322 + pckt->port16[1] = tcp->source; 323 + } 324 + return true; 325 + } 326 + 327 + static __noinline int process_packet(struct bpf_dynptr *skb_ptr, 328 + struct eth_hdr *eth, __u64 off, 329 + bool is_ipv6, struct __sk_buff *skb) 330 + { 331 + struct packet_description pckt = {}; 332 + struct bpf_tunnel_key tkey = {}; 333 + struct vip_stats *data_stats; 334 + struct real_definition *dst; 335 + struct vip_meta *vip_info; 336 + struct ctl_value *cval; 337 + __u32 v4_intf_pos = 1; 338 + __u32 v6_intf_pos = 2; 339 + struct ipv6hdr *ip6h; 340 + struct vip vip = {}; 341 + struct iphdr *iph; 342 + int tun_flag = 0; 343 + __u16 pkt_bytes; 344 + __u64 iph_len; 345 + __u32 ifindex; 346 + __u8 protocol; 347 + __u32 vip_num; 348 + int action; 349 + 350 + tkey.tunnel_ttl = 64; 351 + if (is_ipv6) { 352 + __u8 buffer[sizeof(struct ipv6hdr)] = {}; 353 + 354 + ip6h = bpf_dynptr_slice(skb_ptr, off, buffer, sizeof(buffer)); 355 + if (!ip6h) 356 + return TC_ACT_SHOT; 357 + 358 + iph_len = sizeof(struct ipv6hdr); 359 + protocol = ip6h->nexthdr; 360 + pckt.proto = protocol; 361 + pkt_bytes = bpf_ntohs(ip6h->payload_len); 362 + off += iph_len; 363 + if (protocol == 
IPPROTO_FRAGMENT) { 364 + return TC_ACT_SHOT; 365 + } else if (protocol == IPPROTO_ICMPV6) { 366 + action = parse_icmpv6(skb_ptr, off, &pckt); 367 + if (action >= 0) 368 + return action; 369 + off += IPV6_PLUS_ICMP_HDR; 370 + } else { 371 + memcpy(pckt.srcv6, ip6h->saddr.s6_addr32, 16); 372 + memcpy(pckt.dstv6, ip6h->daddr.s6_addr32, 16); 373 + } 374 + } else { 375 + __u8 buffer[sizeof(struct iphdr)] = {}; 376 + 377 + iph = bpf_dynptr_slice(skb_ptr, off, buffer, sizeof(buffer)); 378 + if (!iph || iph->ihl != 5) 379 + return TC_ACT_SHOT; 380 + 381 + protocol = iph->protocol; 382 + pckt.proto = protocol; 383 + pkt_bytes = bpf_ntohs(iph->tot_len); 384 + off += IPV4_HDR_LEN_NO_OPT; 385 + 386 + if (iph->frag_off & PCKT_FRAGMENTED) 387 + return TC_ACT_SHOT; 388 + if (protocol == IPPROTO_ICMP) { 389 + action = parse_icmp(skb_ptr, off, &pckt); 390 + if (action >= 0) 391 + return action; 392 + off += IPV4_PLUS_ICMP_HDR; 393 + } else { 394 + pckt.src = iph->saddr; 395 + pckt.dst = iph->daddr; 396 + } 397 + } 398 + protocol = pckt.proto; 399 + 400 + if (protocol == IPPROTO_TCP) { 401 + if (!parse_tcp(skb_ptr, off, &pckt)) 402 + return TC_ACT_SHOT; 403 + } else if (protocol == IPPROTO_UDP) { 404 + if (!parse_udp(skb_ptr, off, &pckt)) 405 + return TC_ACT_SHOT; 406 + } else { 407 + return TC_ACT_SHOT; 408 + } 409 + 410 + if (is_ipv6) 411 + memcpy(vip.daddr.v6, pckt.dstv6, 16); 412 + else 413 + vip.daddr.v4 = pckt.dst; 414 + 415 + vip.dport = pckt.port16[1]; 416 + vip.protocol = pckt.proto; 417 + vip_info = bpf_map_lookup_elem(&vip_map, &vip); 418 + if (!vip_info) { 419 + vip.dport = 0; 420 + vip_info = bpf_map_lookup_elem(&vip_map, &vip); 421 + if (!vip_info) 422 + return TC_ACT_SHOT; 423 + pckt.port16[1] = 0; 424 + } 425 + 426 + if (vip_info->flags & F_HASH_NO_SRC_PORT) 427 + pckt.port16[0] = 0; 428 + 429 + if (!get_packet_dst(&dst, &pckt, vip_info, is_ipv6)) 430 + return TC_ACT_SHOT; 431 + 432 + if (dst->flags & F_IPV6) { 433 + cval = bpf_map_lookup_elem(&ctl_array, 
&v6_intf_pos); 434 + if (!cval) 435 + return TC_ACT_SHOT; 436 + ifindex = cval->ifindex; 437 + memcpy(tkey.remote_ipv6, dst->dstv6, 16); 438 + tun_flag = BPF_F_TUNINFO_IPV6; 439 + } else { 440 + cval = bpf_map_lookup_elem(&ctl_array, &v4_intf_pos); 441 + if (!cval) 442 + return TC_ACT_SHOT; 443 + ifindex = cval->ifindex; 444 + tkey.remote_ipv4 = dst->dst; 445 + } 446 + vip_num = vip_info->vip_num; 447 + data_stats = bpf_map_lookup_elem(&stats, &vip_num); 448 + if (!data_stats) 449 + return TC_ACT_SHOT; 450 + data_stats->pkts++; 451 + data_stats->bytes += pkt_bytes; 452 + bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), tun_flag); 453 + *(u32 *)eth->eth_dest = tkey.remote_ipv4; 454 + return bpf_redirect(ifindex, 0); 455 + } 456 + 457 + SEC("tc") 458 + int balancer_ingress(struct __sk_buff *ctx) 459 + { 460 + __u8 buffer[sizeof(struct eth_hdr)] = {}; 461 + struct bpf_dynptr ptr; 462 + struct eth_hdr *eth; 463 + __u32 eth_proto; 464 + __u32 nh_off; 465 + int err; 466 + 467 + nh_off = sizeof(struct eth_hdr); 468 + 469 + bpf_dynptr_from_skb(ctx, 0, &ptr); 470 + eth = bpf_dynptr_slice_rdwr(&ptr, 0, buffer, sizeof(buffer)); 471 + if (!eth) 472 + return TC_ACT_SHOT; 473 + eth_proto = eth->eth_proto; 474 + if (eth_proto == bpf_htons(ETH_P_IP)) 475 + err = process_packet(&ptr, eth, nh_off, false, ctx); 476 + else if (eth_proto == bpf_htons(ETH_P_IPV6)) 477 + err = process_packet(&ptr, eth, nh_off, true, ctx); 478 + else 479 + return TC_ACT_SHOT; 480 + 481 + if (eth == buffer) 482 + bpf_dynptr_write(&ptr, 0, buffer, sizeof(buffer), 0); 483 + 484 + return err; 485 + } 486 + 487 + char _license[] SEC("license") = "GPL";
+119
tools/testing/selftests/bpf/progs/test_parse_tcp_hdr_opt.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + /* This parsing logic is taken from the open source library katran, a layer 4 4 + * load balancer. 5 + * 6 + * This code logic using dynptrs can be found in test_parse_tcp_hdr_opt_dynptr.c 7 + * 8 + * https://github.com/facebookincubator/katran/blob/main/katran/lib/bpf/pckt_parsing.h 9 + */ 10 + 11 + #include <linux/bpf.h> 12 + #include <bpf/bpf_helpers.h> 13 + #include <linux/tcp.h> 14 + #include <stdbool.h> 15 + #include <linux/ipv6.h> 16 + #include <linux/if_ether.h> 17 + #include "test_tcp_hdr_options.h" 18 + 19 + char _license[] SEC("license") = "GPL"; 20 + 21 + /* Kind number used for experiments */ 22 + const __u32 tcp_hdr_opt_kind_tpr = 0xFD; 23 + /* Length of the tcp header option */ 24 + const __u32 tcp_hdr_opt_len_tpr = 6; 25 + /* maximum number of header options to check to lookup server_id */ 26 + const __u32 tcp_hdr_opt_max_opt_checks = 15; 27 + 28 + __u32 server_id; 29 + 30 + struct hdr_opt_state { 31 + __u32 server_id; 32 + __u8 byte_offset; 33 + __u8 hdr_bytes_remaining; 34 + }; 35 + 36 + static int parse_hdr_opt(const struct xdp_md *xdp, struct hdr_opt_state *state) 37 + { 38 + const void *data = (void *)(long)xdp->data; 39 + const void *data_end = (void *)(long)xdp->data_end; 40 + __u8 *tcp_opt, kind, hdr_len; 41 + 42 + tcp_opt = (__u8 *)(data + state->byte_offset); 43 + if (tcp_opt + 1 > data_end) 44 + return -1; 45 + 46 + kind = tcp_opt[0]; 47 + 48 + if (kind == TCPOPT_EOL) 49 + return -1; 50 + 51 + if (kind == TCPOPT_NOP) { 52 + state->hdr_bytes_remaining--; 53 + state->byte_offset++; 54 + return 0; 55 + } 56 + 57 + if (state->hdr_bytes_remaining < 2 || 58 + tcp_opt + sizeof(__u8) + sizeof(__u8) > data_end) 59 + return -1; 60 + 61 + hdr_len = tcp_opt[1]; 62 + if (hdr_len > state->hdr_bytes_remaining) 63 + return -1; 64 + 65 + if (kind == tcp_hdr_opt_kind_tpr) { 66 + if (hdr_len != tcp_hdr_opt_len_tpr) 67 + return -1; 68 + 69 + if (tcp_opt + tcp_hdr_opt_len_tpr > data_end) 70 + return -1; 71 + 
72 + state->server_id = *(__u32 *)&tcp_opt[2]; 73 + return 1; 74 + } 75 + 76 + state->hdr_bytes_remaining -= hdr_len; 77 + state->byte_offset += hdr_len; 78 + return 0; 79 + } 80 + 81 + SEC("xdp") 82 + int xdp_ingress_v6(struct xdp_md *xdp) 83 + { 84 + const void *data = (void *)(long)xdp->data; 85 + const void *data_end = (void *)(long)xdp->data_end; 86 + struct hdr_opt_state opt_state = {}; 87 + __u8 tcp_hdr_opt_len = 0; 88 + struct tcphdr *tcp_hdr; 89 + __u64 tcp_offset = 0; 90 + __u32 off; 91 + int err; 92 + 93 + tcp_offset = sizeof(struct ethhdr) + sizeof(struct ipv6hdr); 94 + tcp_hdr = (struct tcphdr *)(data + tcp_offset); 95 + if (tcp_hdr + 1 > data_end) 96 + return XDP_DROP; 97 + 98 + tcp_hdr_opt_len = (tcp_hdr->doff * 4) - sizeof(struct tcphdr); 99 + if (tcp_hdr_opt_len < tcp_hdr_opt_len_tpr) 100 + return XDP_DROP; 101 + 102 + opt_state.hdr_bytes_remaining = tcp_hdr_opt_len; 103 + opt_state.byte_offset = sizeof(struct tcphdr) + tcp_offset; 104 + 105 + /* max number of bytes of options in tcp header is 40 bytes */ 106 + for (int i = 0; i < tcp_hdr_opt_max_opt_checks; i++) { 107 + err = parse_hdr_opt(xdp, &opt_state); 108 + 109 + if (err || !opt_state.hdr_bytes_remaining) 110 + break; 111 + } 112 + 113 + if (!opt_state.server_id) 114 + return XDP_DROP; 115 + 116 + server_id = opt_state.server_id; 117 + 118 + return XDP_PASS; 119 + }
+114
tools/testing/selftests/bpf/progs/test_parse_tcp_hdr_opt_dynptr.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + /* This logic is lifted from a real-world use case of packet parsing, used in 4 + * the open source library katran, a layer 4 load balancer. 5 + * 6 + * This test demonstrates how to parse packet contents using dynptrs. The 7 + * original code (parsing without dynptrs) can be found in test_parse_tcp_hdr_opt.c 8 + */ 9 + 10 + #include <linux/bpf.h> 11 + #include <bpf/bpf_helpers.h> 12 + #include <linux/tcp.h> 13 + #include <stdbool.h> 14 + #include <linux/ipv6.h> 15 + #include <linux/if_ether.h> 16 + #include "test_tcp_hdr_options.h" 17 + #include "bpf_kfuncs.h" 18 + 19 + char _license[] SEC("license") = "GPL"; 20 + 21 + /* Kind number used for experiments */ 22 + const __u32 tcp_hdr_opt_kind_tpr = 0xFD; 23 + /* Length of the tcp header option */ 24 + const __u32 tcp_hdr_opt_len_tpr = 6; 25 + /* maximum number of header options to check to lookup server_id */ 26 + const __u32 tcp_hdr_opt_max_opt_checks = 15; 27 + 28 + __u32 server_id; 29 + 30 + static int parse_hdr_opt(struct bpf_dynptr *ptr, __u32 *off, __u8 *hdr_bytes_remaining, 31 + __u32 *server_id) 32 + { 33 + __u8 *tcp_opt, kind, hdr_len; 34 + __u8 buffer[sizeof(kind) + sizeof(hdr_len) + sizeof(*server_id)]; 35 + __u8 *data; 36 + 37 + __builtin_memset(buffer, 0, sizeof(buffer)); 38 + 39 + data = bpf_dynptr_slice(ptr, *off, buffer, sizeof(buffer)); 40 + if (!data) 41 + return -1; 42 + 43 + kind = data[0]; 44 + 45 + if (kind == TCPOPT_EOL) 46 + return -1; 47 + 48 + if (kind == TCPOPT_NOP) { 49 + *off += 1; 50 + *hdr_bytes_remaining -= 1; 51 + return 0; 52 + } 53 + 54 + if (*hdr_bytes_remaining < 2) 55 + return -1; 56 + 57 + hdr_len = data[1]; 58 + if (hdr_len > *hdr_bytes_remaining) 59 + return -1; 60 + 61 + if (kind == tcp_hdr_opt_kind_tpr) { 62 + if (hdr_len != tcp_hdr_opt_len_tpr) 63 + return -1; 64 + 65 + __builtin_memcpy(server_id, (__u32 *)(data + 2), sizeof(*server_id)); 66 + return 1; 67 + } 68 + 69 + *off += hdr_len; 70 + *hdr_bytes_remaining -= 
hdr_len; 71 + return 0; 72 + } 73 + 74 + SEC("xdp") 75 + int xdp_ingress_v6(struct xdp_md *xdp) 76 + { 77 + __u8 buffer[sizeof(struct tcphdr)] = {}; 78 + __u8 hdr_bytes_remaining; 79 + struct tcphdr *tcp_hdr; 80 + __u8 tcp_hdr_opt_len; 81 + int err = 0; 82 + __u32 off; 83 + 84 + struct bpf_dynptr ptr; 85 + 86 + bpf_dynptr_from_xdp(xdp, 0, &ptr); 87 + 88 + off = sizeof(struct ethhdr) + sizeof(struct ipv6hdr); 89 + 90 + tcp_hdr = bpf_dynptr_slice(&ptr, off, buffer, sizeof(buffer)); 91 + if (!tcp_hdr) 92 + return XDP_DROP; 93 + 94 + tcp_hdr_opt_len = (tcp_hdr->doff * 4) - sizeof(struct tcphdr); 95 + if (tcp_hdr_opt_len < tcp_hdr_opt_len_tpr) 96 + return XDP_DROP; 97 + 98 + hdr_bytes_remaining = tcp_hdr_opt_len; 99 + 100 + off += sizeof(struct tcphdr); 101 + 102 + /* max number of bytes of options in tcp header is 40 bytes */ 103 + for (int i = 0; i < tcp_hdr_opt_max_opt_checks; i++) { 104 + err = parse_hdr_opt(&ptr, &off, &hdr_bytes_remaining, &server_id); 105 + 106 + if (err || !hdr_bytes_remaining) 107 + break; 108 + } 109 + 110 + if (!server_id) 111 + return XDP_DROP; 112 + 113 + return XDP_PASS; 114 + }
+1 -1
tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c
···
 23  23				       bool *ipv4)
 24  24	{
 25  25		struct bpf_sock_tuple *result;
     26 +	__u64 ihl_len = 0;
 26  27		__u8 proto = 0;
 27     -	__u64 ihl_len;
 28  28
 29  29		if (eth_proto == bpf_htons(ETH_P_IP)) {
 30  30			struct iphdr *iph = (struct iphdr *)(data + nh_off);
+5 -5
tools/testing/selftests/bpf/progs/test_tunnel_kern.c
···
324 324	SEC("tc")
325 325	int vxlan_set_tunnel_dst(struct __sk_buff *skb)
326 326	{
327     -	int ret;
328 327		struct bpf_tunnel_key key;
329 328		struct vxlan_metadata md;
330 329		__u32 index = 0;
331 330		__u32 *local_ip = NULL;
    331 +	int ret = 0;
332 332
333 333		local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
334 334		if (!local_ip) {
···
363 363	SEC("tc")
364 364	int vxlan_set_tunnel_src(struct __sk_buff *skb)
365 365	{
366     -	int ret;
367 366		struct bpf_tunnel_key key;
368 367		struct vxlan_metadata md;
369 368		__u32 index = 0;
370 369		__u32 *local_ip = NULL;
    370 +	int ret = 0;
371 371
372 372		local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
373 373		if (!local_ip) {
···
494 494	int ip6vxlan_set_tunnel_dst(struct __sk_buff *skb)
495 495	{
496 496		struct bpf_tunnel_key key;
497     -	int ret;
498 497		__u32 index = 0;
499 498		__u32 *local_ip;
    499 +	int ret = 0;
500 500
501 501		local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
502 502		if (!local_ip) {
···
525 525	int ip6vxlan_set_tunnel_src(struct __sk_buff *skb)
526 526	{
527 527		struct bpf_tunnel_key key;
528     -	int ret;
529 528		__u32 index = 0;
530 529		__u32 *local_ip;
    530 +	int ret = 0;
531 531
532 532		local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
533 533		if (!local_ip) {
···
556 556	int ip6vxlan_get_tunnel_src(struct __sk_buff *skb)
557 557	{
558 558		struct bpf_tunnel_key key;
559     -	int ret;
560 559		__u32 index = 0;
561 560		__u32 *local_ip;
    561 +	int ret = 0;
562 562
563 563		local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
564 564		if (!local_ip) {
+257
tools/testing/selftests/bpf/progs/test_xdp_dynptr.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta */ 3 + #include <stddef.h> 4 + #include <string.h> 5 + #include <linux/bpf.h> 6 + #include <linux/if_ether.h> 7 + #include <linux/if_packet.h> 8 + #include <linux/ip.h> 9 + #include <linux/ipv6.h> 10 + #include <linux/in.h> 11 + #include <linux/udp.h> 12 + #include <linux/tcp.h> 13 + #include <linux/pkt_cls.h> 14 + #include <sys/socket.h> 15 + #include <bpf/bpf_helpers.h> 16 + #include <bpf/bpf_endian.h> 17 + #include "test_iptunnel_common.h" 18 + #include "bpf_kfuncs.h" 19 + 20 + const size_t tcphdr_sz = sizeof(struct tcphdr); 21 + const size_t udphdr_sz = sizeof(struct udphdr); 22 + const size_t ethhdr_sz = sizeof(struct ethhdr); 23 + const size_t iphdr_sz = sizeof(struct iphdr); 24 + const size_t ipv6hdr_sz = sizeof(struct ipv6hdr); 25 + 26 + struct { 27 + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY); 28 + __uint(max_entries, 256); 29 + __type(key, __u32); 30 + __type(value, __u64); 31 + } rxcnt SEC(".maps"); 32 + 33 + struct { 34 + __uint(type, BPF_MAP_TYPE_HASH); 35 + __uint(max_entries, MAX_IPTNL_ENTRIES); 36 + __type(key, struct vip); 37 + __type(value, struct iptnl_info); 38 + } vip2tnl SEC(".maps"); 39 + 40 + static __always_inline void count_tx(__u32 protocol) 41 + { 42 + __u64 *rxcnt_count; 43 + 44 + rxcnt_count = bpf_map_lookup_elem(&rxcnt, &protocol); 45 + if (rxcnt_count) 46 + *rxcnt_count += 1; 47 + } 48 + 49 + static __always_inline int get_dport(void *trans_data, __u8 protocol) 50 + { 51 + struct tcphdr *th; 52 + struct udphdr *uh; 53 + 54 + switch (protocol) { 55 + case IPPROTO_TCP: 56 + th = (struct tcphdr *)trans_data; 57 + return th->dest; 58 + case IPPROTO_UDP: 59 + uh = (struct udphdr *)trans_data; 60 + return uh->dest; 61 + default: 62 + return 0; 63 + } 64 + } 65 + 66 + static __always_inline void set_ethhdr(struct ethhdr *new_eth, 67 + const struct ethhdr *old_eth, 68 + const struct iptnl_info *tnl, 69 + __be16 h_proto) 70 + { 71 + memcpy(new_eth->h_source, 
old_eth->h_dest, sizeof(new_eth->h_source)); 72 + memcpy(new_eth->h_dest, tnl->dmac, sizeof(new_eth->h_dest)); 73 + new_eth->h_proto = h_proto; 74 + } 75 + 76 + static __always_inline int handle_ipv4(struct xdp_md *xdp, struct bpf_dynptr *xdp_ptr) 77 + { 78 + __u8 eth_buffer[ethhdr_sz + iphdr_sz + ethhdr_sz]; 79 + __u8 iph_buffer_tcp[iphdr_sz + tcphdr_sz]; 80 + __u8 iph_buffer_udp[iphdr_sz + udphdr_sz]; 81 + struct bpf_dynptr new_xdp_ptr; 82 + struct iptnl_info *tnl; 83 + struct ethhdr *new_eth; 84 + struct ethhdr *old_eth; 85 + __u32 transport_hdr_sz; 86 + struct iphdr *iph; 87 + __u16 *next_iph; 88 + __u16 payload_len; 89 + struct vip vip = {}; 90 + int dport; 91 + __u32 csum = 0; 92 + int i; 93 + 94 + __builtin_memset(eth_buffer, 0, sizeof(eth_buffer)); 95 + __builtin_memset(iph_buffer_tcp, 0, sizeof(iph_buffer_tcp)); 96 + __builtin_memset(iph_buffer_udp, 0, sizeof(iph_buffer_udp)); 97 + 98 + if (ethhdr_sz + iphdr_sz + tcphdr_sz > xdp->data_end - xdp->data) 99 + iph = bpf_dynptr_slice(xdp_ptr, ethhdr_sz, iph_buffer_udp, sizeof(iph_buffer_udp)); 100 + else 101 + iph = bpf_dynptr_slice(xdp_ptr, ethhdr_sz, iph_buffer_tcp, sizeof(iph_buffer_tcp)); 102 + 103 + if (!iph) 104 + return XDP_DROP; 105 + 106 + dport = get_dport(iph + 1, iph->protocol); 107 + if (dport == -1) 108 + return XDP_DROP; 109 + 110 + vip.protocol = iph->protocol; 111 + vip.family = AF_INET; 112 + vip.daddr.v4 = iph->daddr; 113 + vip.dport = dport; 114 + payload_len = bpf_ntohs(iph->tot_len); 115 + 116 + tnl = bpf_map_lookup_elem(&vip2tnl, &vip); 117 + /* It only does v4-in-v4 */ 118 + if (!tnl || tnl->family != AF_INET) 119 + return XDP_PASS; 120 + 121 + if (bpf_xdp_adjust_head(xdp, 0 - (int)iphdr_sz)) 122 + return XDP_DROP; 123 + 124 + bpf_dynptr_from_xdp(xdp, 0, &new_xdp_ptr); 125 + new_eth = bpf_dynptr_slice_rdwr(&new_xdp_ptr, 0, eth_buffer, sizeof(eth_buffer)); 126 + if (!new_eth) 127 + return XDP_DROP; 128 + 129 + iph = (struct iphdr *)(new_eth + 1); 130 + old_eth = (struct ethhdr *)(iph + 
1); 131 + 132 + set_ethhdr(new_eth, old_eth, tnl, bpf_htons(ETH_P_IP)); 133 + 134 + if (new_eth == eth_buffer) 135 + bpf_dynptr_write(&new_xdp_ptr, 0, eth_buffer, sizeof(eth_buffer), 0); 136 + 137 + iph->version = 4; 138 + iph->ihl = iphdr_sz >> 2; 139 + iph->frag_off = 0; 140 + iph->protocol = IPPROTO_IPIP; 141 + iph->check = 0; 142 + iph->tos = 0; 143 + iph->tot_len = bpf_htons(payload_len + iphdr_sz); 144 + iph->daddr = tnl->daddr.v4; 145 + iph->saddr = tnl->saddr.v4; 146 + iph->ttl = 8; 147 + 148 + next_iph = (__u16 *)iph; 149 + for (i = 0; i < iphdr_sz >> 1; i++) 150 + csum += *next_iph++; 151 + 152 + iph->check = ~((csum & 0xffff) + (csum >> 16)); 153 + 154 + count_tx(vip.protocol); 155 + 156 + return XDP_TX; 157 + } 158 + 159 + static __always_inline int handle_ipv6(struct xdp_md *xdp, struct bpf_dynptr *xdp_ptr) 160 + { 161 + __u8 eth_buffer[ethhdr_sz + ipv6hdr_sz + ethhdr_sz]; 162 + __u8 ip6h_buffer_tcp[ipv6hdr_sz + tcphdr_sz]; 163 + __u8 ip6h_buffer_udp[ipv6hdr_sz + udphdr_sz]; 164 + struct bpf_dynptr new_xdp_ptr; 165 + struct iptnl_info *tnl; 166 + struct ethhdr *new_eth; 167 + struct ethhdr *old_eth; 168 + __u32 transport_hdr_sz; 169 + struct ipv6hdr *ip6h; 170 + __u16 payload_len; 171 + struct vip vip = {}; 172 + int dport; 173 + 174 + __builtin_memset(eth_buffer, 0, sizeof(eth_buffer)); 175 + __builtin_memset(ip6h_buffer_tcp, 0, sizeof(ip6h_buffer_tcp)); 176 + __builtin_memset(ip6h_buffer_udp, 0, sizeof(ip6h_buffer_udp)); 177 + 178 + if (ethhdr_sz + iphdr_sz + tcphdr_sz > xdp->data_end - xdp->data) 179 + ip6h = bpf_dynptr_slice(xdp_ptr, ethhdr_sz, ip6h_buffer_udp, sizeof(ip6h_buffer_udp)); 180 + else 181 + ip6h = bpf_dynptr_slice(xdp_ptr, ethhdr_sz, ip6h_buffer_tcp, sizeof(ip6h_buffer_tcp)); 182 + 183 + if (!ip6h) 184 + return XDP_DROP; 185 + 186 + dport = get_dport(ip6h + 1, ip6h->nexthdr); 187 + if (dport == -1) 188 + return XDP_DROP; 189 + 190 + vip.protocol = ip6h->nexthdr; 191 + vip.family = AF_INET6; 192 + memcpy(vip.daddr.v6, 
ip6h->daddr.s6_addr32, sizeof(vip.daddr)); 193 + vip.dport = dport; 194 + payload_len = ip6h->payload_len; 195 + 196 + tnl = bpf_map_lookup_elem(&vip2tnl, &vip); 197 + /* It only does v6-in-v6 */ 198 + if (!tnl || tnl->family != AF_INET6) 199 + return XDP_PASS; 200 + 201 + if (bpf_xdp_adjust_head(xdp, 0 - (int)ipv6hdr_sz)) 202 + return XDP_DROP; 203 + 204 + bpf_dynptr_from_xdp(xdp, 0, &new_xdp_ptr); 205 + new_eth = bpf_dynptr_slice_rdwr(&new_xdp_ptr, 0, eth_buffer, sizeof(eth_buffer)); 206 + if (!new_eth) 207 + return XDP_DROP; 208 + 209 + ip6h = (struct ipv6hdr *)(new_eth + 1); 210 + old_eth = (struct ethhdr *)(ip6h + 1); 211 + 212 + set_ethhdr(new_eth, old_eth, tnl, bpf_htons(ETH_P_IPV6)); 213 + 214 + if (new_eth == eth_buffer) 215 + bpf_dynptr_write(&new_xdp_ptr, 0, eth_buffer, sizeof(eth_buffer), 0); 216 + 217 + ip6h->version = 6; 218 + ip6h->priority = 0; 219 + memset(ip6h->flow_lbl, 0, sizeof(ip6h->flow_lbl)); 220 + ip6h->payload_len = bpf_htons(bpf_ntohs(payload_len) + ipv6hdr_sz); 221 + ip6h->nexthdr = IPPROTO_IPV6; 222 + ip6h->hop_limit = 8; 223 + memcpy(ip6h->saddr.s6_addr32, tnl->saddr.v6, sizeof(tnl->saddr.v6)); 224 + memcpy(ip6h->daddr.s6_addr32, tnl->daddr.v6, sizeof(tnl->daddr.v6)); 225 + 226 + count_tx(vip.protocol); 227 + 228 + return XDP_TX; 229 + } 230 + 231 + SEC("xdp") 232 + int _xdp_tx_iptunnel(struct xdp_md *xdp) 233 + { 234 + __u8 buffer[ethhdr_sz]; 235 + struct bpf_dynptr ptr; 236 + struct ethhdr *eth; 237 + __u16 h_proto; 238 + 239 + __builtin_memset(buffer, 0, sizeof(buffer)); 240 + 241 + bpf_dynptr_from_xdp(xdp, 0, &ptr); 242 + eth = bpf_dynptr_slice(&ptr, 0, buffer, sizeof(buffer)); 243 + if (!eth) 244 + return XDP_DROP; 245 + 246 + h_proto = eth->h_proto; 247 + 248 + if (h_proto == bpf_htons(ETH_P_IP)) 249 + return handle_ipv4(xdp, &ptr); 250 + else if (h_proto == bpf_htons(ETH_P_IPV6)) 251 + 252 + return handle_ipv6(xdp, &ptr); 253 + else 254 + return XDP_DROP; 255 + } 256 + 257 + char _license[] SEC("license") = "GPL";
+45
tools/testing/selftests/bpf/progs/timer.c
··· 46 46 __type(value, struct elem); 47 47 } lru SEC(".maps"); 48 48 49 + struct { 50 + __uint(type, BPF_MAP_TYPE_ARRAY); 51 + __uint(max_entries, 1); 52 + __type(key, int); 53 + __type(value, struct elem); 54 + } abs_timer SEC(".maps"); 55 + 49 56 __u64 bss_data; 57 + __u64 abs_data; 50 58 __u64 err; 51 59 __u64 ok; 52 60 __u64 callback_check = 52; ··· 291 283 bpf_timer_init(&val->timer, &hmap_malloc, CLOCK_BOOTTIME); 292 284 293 285 return bpf_timer_test(); 286 + } 287 + 288 + /* callback for absolute timer */ 289 + static int timer_cb3(void *map, int *key, struct bpf_timer *timer) 290 + { 291 + abs_data += 6; 292 + 293 + if (abs_data < 12) { 294 + bpf_timer_start(timer, bpf_ktime_get_boot_ns() + 1000, 295 + BPF_F_TIMER_ABS); 296 + } else { 297 + /* Re-arm timer ~35 seconds in future */ 298 + bpf_timer_start(timer, bpf_ktime_get_boot_ns() + (1ull << 35), 299 + BPF_F_TIMER_ABS); 300 + } 301 + 302 + return 0; 303 + } 304 + 305 + SEC("fentry/bpf_fentry_test3") 306 + int BPF_PROG2(test3, int, a) 307 + { 308 + int key = 0; 309 + struct bpf_timer *timer; 310 + 311 + bpf_printk("test3"); 312 + 313 + timer = bpf_map_lookup_elem(&abs_timer, &key); 314 + if (timer) { 315 + if (bpf_timer_init(timer, &abs_timer, CLOCK_BOOTTIME) != 0) 316 + err |= 2048; 317 + bpf_timer_set_callback(timer, timer_cb3); 318 + bpf_timer_start(timer, bpf_ktime_get_boot_ns() + 1000, 319 + BPF_F_TIMER_ABS); 320 + } 321 + 322 + return 0; 294 323 }
+87
tools/testing/selftests/bpf/progs/uninit_stack.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #include <linux/bpf.h> 4 + #include <bpf/bpf_helpers.h> 5 + #include "bpf_misc.h" 6 + 7 + /* Read an uninitialized value from stack at a fixed offset */ 8 + SEC("socket") 9 + __naked int read_uninit_stack_fixed_off(void *ctx) 10 + { 11 + asm volatile (" \ 12 + r0 = 0; \ 13 + /* force stack depth to be 128 */ \ 14 + *(u64*)(r10 - 128) = r1; \ 15 + r1 = *(u8 *)(r10 - 8 ); \ 16 + r0 += r1; \ 17 + r1 = *(u8 *)(r10 - 11); \ 18 + r1 = *(u8 *)(r10 - 13); \ 19 + r1 = *(u8 *)(r10 - 15); \ 20 + r1 = *(u16*)(r10 - 16); \ 21 + r1 = *(u32*)(r10 - 32); \ 22 + r1 = *(u64*)(r10 - 64); \ 23 + /* read from a spill of a wrong size, it is a separate \ 24 + * branch in check_stack_read_fixed_off() \ 25 + */ \ 26 + *(u32*)(r10 - 72) = r1; \ 27 + r1 = *(u64*)(r10 - 72); \ 28 + r0 = 0; \ 29 + exit; \ 30 + " 31 + ::: __clobber_all); 32 + } 33 + 34 + /* Read an uninitialized value from stack at a variable offset */ 35 + SEC("socket") 36 + __naked int read_uninit_stack_var_off(void *ctx) 37 + { 38 + asm volatile (" \ 39 + call %[bpf_get_prandom_u32]; \ 40 + /* force stack depth to be 64 */ \ 41 + *(u64*)(r10 - 64) = r0; \ 42 + r0 = -r0; \ 43 + /* give r0 a range [-31, -1] */ \ 44 + if r0 s<= -32 goto exit_%=; \ 45 + if r0 s>= 0 goto exit_%=; \ 46 + /* access stack using r0 */ \ 47 + r1 = r10; \ 48 + r1 += r0; \ 49 + r2 = *(u8*)(r1 + 0); \ 50 + exit_%=: r0 = 0; \ 51 + exit; \ 52 + " 53 + : 54 + : __imm(bpf_get_prandom_u32) 55 + : __clobber_all); 56 + } 57 + 58 + static __noinline void dummy(void) {} 59 + 60 + /* Pass a pointer to uninitialized stack memory to a helper. 61 + * Passed memory block should be marked as STACK_MISC after helper call. 
62 + */ 63 + SEC("socket") 64 + __log_level(7) __msg("fp-104=mmmmmmmm") 65 + __naked int helper_uninit_to_misc(void *ctx) 66 + { 67 + asm volatile (" \ 68 + /* force stack depth to be 128 */ \ 69 + *(u64*)(r10 - 128) = r1; \ 70 + r1 = r10; \ 71 + r1 += -128; \ 72 + r2 = 32; \ 73 + call %[bpf_trace_printk]; \ 74 + /* Call to dummy() forces print_verifier_state(..., true), \ 75 + * thus showing the stack state, matched by __msg(). \ 76 + */ \ 77 + call %[dummy]; \ 78 + r0 = 0; \ 79 + exit; \ 80 + " 81 + : 82 + : __imm(bpf_trace_printk), 83 + __imm(dummy) 84 + : __clobber_all); 85 + } 86 + 87 + char _license[] SEC("license") = "GPL";
+1 -1
tools/testing/selftests/bpf/progs/user_ringbuf_success.c
··· 202 202 return 0; 203 203 } 204 204 205 - SEC("fentry/" SYS_PREFIX "sys_getrlimit") 205 + SEC("fentry/" SYS_PREFIX "sys_prlimit64") 206 206 int test_user_ringbuf_epoll(void *ctx) 207 207 { 208 208 long num_samples;
+61 -10
tools/testing/selftests/bpf/test_loader.c
··· 13 13 #define TEST_TAG_EXPECT_SUCCESS "comment:test_expect_success" 14 14 #define TEST_TAG_EXPECT_MSG_PFX "comment:test_expect_msg=" 15 15 #define TEST_TAG_LOG_LEVEL_PFX "comment:test_log_level=" 16 + #define TEST_TAG_PROG_FLAGS_PFX "comment:test_prog_flags=" 16 17 17 18 struct test_spec { 18 19 const char *name; 19 20 bool expect_failure; 20 - const char *expect_msg; 21 + const char **expect_msgs; 22 + size_t expect_msg_cnt; 21 23 int log_level; 24 + int prog_flags; 22 25 }; 23 26 24 27 static int tester_init(struct test_loader *tester) ··· 70 67 71 68 for (i = 1; i < btf__type_cnt(btf); i++) { 72 69 const struct btf_type *t; 73 - const char *s; 70 + const char *s, *val; 71 + char *e; 74 72 75 73 t = btf__type_by_id(btf, i); 76 74 if (!btf_is_decl_tag(t)) ··· 86 82 } else if (strcmp(s, TEST_TAG_EXPECT_SUCCESS) == 0) { 87 83 spec->expect_failure = false; 88 84 } else if (str_has_pfx(s, TEST_TAG_EXPECT_MSG_PFX)) { 89 - spec->expect_msg = s + sizeof(TEST_TAG_EXPECT_MSG_PFX) - 1; 85 + void *tmp; 86 + const char **msg; 87 + 88 + tmp = realloc(spec->expect_msgs, 89 + (1 + spec->expect_msg_cnt) * sizeof(void *)); 90 + if (!tmp) { 91 + ASSERT_FAIL("failed to realloc memory for messages\n"); 92 + return -ENOMEM; 93 + } 94 + spec->expect_msgs = tmp; 95 + msg = &spec->expect_msgs[spec->expect_msg_cnt++]; 96 + *msg = s + sizeof(TEST_TAG_EXPECT_MSG_PFX) - 1; 90 97 } else if (str_has_pfx(s, TEST_TAG_LOG_LEVEL_PFX)) { 98 + val = s + sizeof(TEST_TAG_LOG_LEVEL_PFX) - 1; 91 99 errno = 0; 92 - spec->log_level = strtol(s + sizeof(TEST_TAG_LOG_LEVEL_PFX) - 1, NULL, 0); 93 - if (errno) { 100 + spec->log_level = strtol(val, &e, 0); 101 + if (errno || e[0] != '\0') { 94 102 ASSERT_FAIL("failed to parse test log level from '%s'", s); 95 103 return -EINVAL; 104 + } 105 + } else if (str_has_pfx(s, TEST_TAG_PROG_FLAGS_PFX)) { 106 + val = s + sizeof(TEST_TAG_PROG_FLAGS_PFX) - 1; 107 + if (strcmp(val, "BPF_F_STRICT_ALIGNMENT") == 0) { 108 + spec->prog_flags |= BPF_F_STRICT_ALIGNMENT; 109 + 
} else if (strcmp(val, "BPF_F_ANY_ALIGNMENT") == 0) { 110 + spec->prog_flags |= BPF_F_ANY_ALIGNMENT; 111 + } else if (strcmp(val, "BPF_F_TEST_RND_HI32") == 0) { 112 + spec->prog_flags |= BPF_F_TEST_RND_HI32; 113 + } else if (strcmp(val, "BPF_F_TEST_STATE_FREQ") == 0) { 114 + spec->prog_flags |= BPF_F_TEST_STATE_FREQ; 115 + } else if (strcmp(val, "BPF_F_SLEEPABLE") == 0) { 116 + spec->prog_flags |= BPF_F_SLEEPABLE; 117 + } else if (strcmp(val, "BPF_F_XDP_HAS_FRAGS") == 0) { 118 + spec->prog_flags |= BPF_F_XDP_HAS_FRAGS; 119 + } else /* assume numeric value */ { 120 + errno = 0; 121 + spec->prog_flags |= strtol(val, &e, 0); 122 + if (errno || e[0] != '\0') { 123 + ASSERT_FAIL("failed to parse test prog flags from '%s'", s); 124 + return -EINVAL; 125 + } 96 126 } 97 127 } 98 128 } ··· 139 101 struct bpf_object *obj, 140 102 struct bpf_program *prog) 141 103 { 142 - int min_log_level = 0; 104 + int min_log_level = 0, prog_flags; 143 105 144 106 if (env.verbosity > VERBOSE_NONE) 145 107 min_log_level = 1; ··· 157 119 else 158 120 bpf_program__set_log_level(prog, spec->log_level); 159 121 122 + prog_flags = bpf_program__flags(prog); 123 + bpf_program__set_flags(prog, prog_flags | spec->prog_flags); 124 + 160 125 tester->log_buf[0] = '\0'; 126 + tester->next_match_pos = 0; 161 127 } 162 128 163 129 static void emit_verifier_log(const char *log_buf, bool force) ··· 177 135 struct bpf_program *prog, 178 136 int load_err) 179 137 { 180 - if (spec->expect_msg) { 181 - char *match; 138 + int i, j; 182 139 183 - match = strstr(tester->log_buf, spec->expect_msg); 140 + for (i = 0; i < spec->expect_msg_cnt; i++) { 141 + char *match; 142 + const char *expect_msg; 143 + 144 + expect_msg = spec->expect_msgs[i]; 145 + 146 + match = strstr(tester->log_buf + tester->next_match_pos, expect_msg); 184 147 if (!ASSERT_OK_PTR(match, "expect_msg")) { 185 148 /* if we are in verbose mode, we've already emitted log */ 186 149 if (env.verbosity == VERBOSE_NONE) 187 150 
emit_verifier_log(tester->log_buf, true /*force*/); 188 - fprintf(stderr, "EXPECTED MSG: '%s'\n", spec->expect_msg); 151 + for (j = 0; j < i; j++) 152 + fprintf(stderr, "MATCHED MSG: '%s'\n", spec->expect_msgs[j]); 153 + fprintf(stderr, "EXPECTED MSG: '%s'\n", expect_msg); 189 154 return; 190 155 } 156 + 157 + tester->next_match_pos = match - tester->log_buf + strlen(expect_msg); 191 158 } 192 159 } 193 160
+16
tools/testing/selftests/bpf/test_progs.h
··· 376 376 ___ok; \ 377 377 }) 378 378 379 + #define SYS(goto_label, fmt, ...) \ 380 + ({ \ 381 + char cmd[1024]; \ 382 + snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \ 383 + if (!ASSERT_OK(system(cmd), cmd)) \ 384 + goto goto_label; \ 385 + }) 386 + 387 + #define SYS_NOFAIL(fmt, ...) \ 388 + ({ \ 389 + char cmd[1024]; \ 390 + snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \ 391 + system(cmd); \ 392 + }) 393 + 379 394 static inline __u64 ptr_to_u64(const void *ptr) 380 395 { 381 396 return (__u64) (unsigned long) ptr; ··· 427 412 struct test_loader { 428 413 char *log_buf; 429 414 size_t log_buf_sz; 415 + size_t next_match_pos; 430 416 431 417 struct bpf_object *obj; 432 418 };
+1
tools/testing/selftests/bpf/test_tcp_hdr_options.h
··· 50 50 51 51 #define TCPOPT_EOL 0 52 52 #define TCPOPT_NOP 1 53 + #define TCPOPT_MSS 2 53 54 #define TCPOPT_WINDOW 3 54 55 #define TCPOPT_EXP 254 55 56
+11 -11
tools/testing/selftests/bpf/test_verifier.c
··· 699 699 * struct bpf_timer t; 700 700 * }; 701 701 * struct btf_ptr { 702 + * struct prog_test_ref_kfunc __kptr_untrusted *ptr; 702 703 * struct prog_test_ref_kfunc __kptr *ptr; 703 - * struct prog_test_ref_kfunc __kptr_ref *ptr; 704 - * struct prog_test_member __kptr_ref *ptr; 704 + * struct prog_test_member __kptr *ptr; 705 705 * } 706 706 */ 707 707 static const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l\0bpf_timer\0timer\0t" 708 - "\0btf_ptr\0prog_test_ref_kfunc\0ptr\0kptr\0kptr_ref" 708 + "\0btf_ptr\0prog_test_ref_kfunc\0ptr\0kptr\0kptr_untrusted" 709 709 "\0prog_test_member"; 710 710 static __u32 btf_raw_types[] = { 711 711 /* int */ ··· 724 724 BTF_MEMBER_ENC(41, 4, 0), /* struct bpf_timer t; */ 725 725 /* struct prog_test_ref_kfunc */ /* [6] */ 726 726 BTF_STRUCT_ENC(51, 0, 0), 727 - BTF_STRUCT_ENC(89, 0, 0), /* [7] */ 727 + BTF_STRUCT_ENC(95, 0, 0), /* [7] */ 728 + /* type tag "kptr_untrusted" */ 729 + BTF_TYPE_TAG_ENC(80, 6), /* [8] */ 728 730 /* type tag "kptr" */ 729 - BTF_TYPE_TAG_ENC(75, 6), /* [8] */ 730 - /* type tag "kptr_ref" */ 731 - BTF_TYPE_TAG_ENC(80, 6), /* [9] */ 732 - BTF_TYPE_TAG_ENC(80, 7), /* [10] */ 731 + BTF_TYPE_TAG_ENC(75, 6), /* [9] */ 732 + BTF_TYPE_TAG_ENC(75, 7), /* [10] */ 733 733 BTF_PTR_ENC(8), /* [11] */ 734 734 BTF_PTR_ENC(9), /* [12] */ 735 735 BTF_PTR_ENC(10), /* [13] */ 736 736 /* struct btf_ptr */ /* [14] */ 737 737 BTF_STRUCT_ENC(43, 3, 24), 738 - BTF_MEMBER_ENC(71, 11, 0), /* struct prog_test_ref_kfunc __kptr *ptr; */ 739 - BTF_MEMBER_ENC(71, 12, 64), /* struct prog_test_ref_kfunc __kptr_ref *ptr; */ 740 - BTF_MEMBER_ENC(71, 13, 128), /* struct prog_test_member __kptr_ref *ptr; */ 738 + BTF_MEMBER_ENC(71, 11, 0), /* struct prog_test_ref_kfunc __kptr_untrusted *ptr; */ 739 + BTF_MEMBER_ENC(71, 12, 64), /* struct prog_test_ref_kfunc __kptr *ptr; */ 740 + BTF_MEMBER_ENC(71, 13, 128), /* struct prog_test_member __kptr *ptr; */ 741 741 }; 742 742 743 743 static char bpf_vlog[UINT_MAX >> 8];
+10 -7
tools/testing/selftests/bpf/verifier/calls.c
··· 181 181 }, 182 182 .result_unpriv = REJECT, 183 183 .result = REJECT, 184 - .errstr = "negative offset ptr_ ptr R1 off=-4 disallowed", 184 + .errstr = "ptr R1 off=-4 disallowed", 185 185 }, 186 186 { 187 187 "calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset", ··· 243 243 }, 244 244 .result_unpriv = REJECT, 245 245 .result = REJECT, 246 - .errstr = "R1 must be referenced", 246 + .errstr = "R1 must be", 247 247 }, 248 248 { 249 249 "calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID", ··· 2221 2221 * that fp-8 stack slot was unused in the fall-through 2222 2222 * branch and will accept the program incorrectly 2223 2223 */ 2224 - BPF_JMP_IMM(BPF_JGT, BPF_REG_1, 2, 2), 2224 + BPF_EMIT_CALL(BPF_FUNC_get_prandom_u32), 2225 + BPF_JMP_IMM(BPF_JGT, BPF_REG_0, 2, 2), 2225 2226 BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), 2226 2227 BPF_JMP_IMM(BPF_JA, 0, 0, 0), 2227 2228 BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 2228 2229 BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), 2229 2230 BPF_LD_MAP_FD(BPF_REG_1, 0), 2230 2231 BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), 2232 + BPF_MOV64_IMM(BPF_REG_0, 0), 2231 2233 BPF_EXIT_INSN(), 2232 2234 }, 2233 - .fixup_map_hash_48b = { 6 }, 2234 - .errstr = "invalid indirect read from stack R2 off -8+0 size 8", 2235 - .result = REJECT, 2236 - .prog_type = BPF_PROG_TYPE_XDP, 2235 + .fixup_map_hash_48b = { 7 }, 2236 + .errstr_unpriv = "invalid indirect read from stack R2 off -8+0 size 8", 2237 + .result_unpriv = REJECT, 2238 + /* in privileged mode reads from uninitialized stack locations are permitted */ 2239 + .result = ACCEPT, 2237 2240 }, 2238 2241 { 2239 2242 "calls: ctx read at start of subprog",
-11
tools/testing/selftests/bpf/verifier/ctx.c
··· 1 1 { 2 - "context stores via ST", 3 - .insns = { 4 - BPF_MOV64_IMM(BPF_REG_0, 0), 5 - BPF_ST_MEM(BPF_DW, BPF_REG_1, offsetof(struct __sk_buff, mark), 0), 6 - BPF_EXIT_INSN(), 7 - }, 8 - .errstr = "BPF_ST stores into R1 ctx is not allowed", 9 - .result = REJECT, 10 - .prog_type = BPF_PROG_TYPE_SCHED_CLS, 11 - }, 12 - { 13 2 "context stores via BPF_ATOMIC", 14 3 .insns = { 15 4 BPF_MOV64_IMM(BPF_REG_0, 0),
+69 -35
tools/testing/selftests/bpf/verifier/helper_access_var_len.c
··· 29 29 { 30 30 "helper access to variable memory: stack, bitwise AND, zero included", 31 31 .insns = { 32 - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, 8), 33 - BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), 34 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64), 35 - BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128), 36 - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128), 37 - BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 64), 38 - BPF_MOV64_IMM(BPF_REG_3, 0), 39 - BPF_EMIT_CALL(BPF_FUNC_probe_read_kernel), 32 + /* set max stack size */ 33 + BPF_ST_MEM(BPF_DW, BPF_REG_10, -128, 0), 34 + /* set r3 to a random value */ 35 + BPF_EMIT_CALL(BPF_FUNC_get_prandom_u32), 36 + BPF_MOV64_REG(BPF_REG_3, BPF_REG_0), 37 + /* use bitwise AND to limit r3 range to [0, 64] */ 38 + BPF_ALU64_IMM(BPF_AND, BPF_REG_3, 64), 39 + BPF_LD_MAP_FD(BPF_REG_1, 0), 40 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 41 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -64), 42 + BPF_MOV64_IMM(BPF_REG_4, 0), 43 + /* Call bpf_ringbuf_output(), it is one of a few helper functions with 44 + * ARG_CONST_SIZE_OR_ZERO parameter allowed in unpriv mode. 45 + * For unpriv this should signal an error, because memory at &fp[-64] is 46 + * not initialized. 
47 + */ 48 + BPF_EMIT_CALL(BPF_FUNC_ringbuf_output), 40 49 BPF_EXIT_INSN(), 41 50 }, 42 - .errstr = "invalid indirect read from stack R1 off -64+0 size 64", 43 - .result = REJECT, 44 - .prog_type = BPF_PROG_TYPE_TRACEPOINT, 51 + .fixup_map_ringbuf = { 4 }, 52 + .errstr_unpriv = "invalid indirect read from stack R2 off -64+0 size 64", 53 + .result_unpriv = REJECT, 54 + /* in privileged mode reads from uninitialized stack locations are permitted */ 55 + .result = ACCEPT, 45 56 }, 46 57 { 47 58 "helper access to variable memory: stack, bitwise AND + JMP, wrong max", ··· 194 183 { 195 184 "helper access to variable memory: stack, JMP, no min check", 196 185 .insns = { 197 - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, 8), 198 - BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), 199 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64), 200 - BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128), 201 - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128), 202 - BPF_JMP_IMM(BPF_JGT, BPF_REG_2, 64, 3), 203 - BPF_MOV64_IMM(BPF_REG_3, 0), 204 - BPF_EMIT_CALL(BPF_FUNC_probe_read_kernel), 186 + /* set max stack size */ 187 + BPF_ST_MEM(BPF_DW, BPF_REG_10, -128, 0), 188 + /* set r3 to a random value */ 189 + BPF_EMIT_CALL(BPF_FUNC_get_prandom_u32), 190 + BPF_MOV64_REG(BPF_REG_3, BPF_REG_0), 191 + /* use JMP to limit r3 range to [0, 64] */ 192 + BPF_JMP_IMM(BPF_JGT, BPF_REG_3, 64, 6), 193 + BPF_LD_MAP_FD(BPF_REG_1, 0), 194 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 195 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -64), 196 + BPF_MOV64_IMM(BPF_REG_4, 0), 197 + /* Call bpf_ringbuf_output(), it is one of a few helper functions with 198 + * ARG_CONST_SIZE_OR_ZERO parameter allowed in unpriv mode. 199 + * For unpriv this should signal an error, because memory at &fp[-64] is 200 + * not initialized. 
201 + */ 202 + BPF_EMIT_CALL(BPF_FUNC_ringbuf_output), 205 203 BPF_MOV64_IMM(BPF_REG_0, 0), 206 204 BPF_EXIT_INSN(), 207 205 }, 208 - .errstr = "invalid indirect read from stack R1 off -64+0 size 64", 209 - .result = REJECT, 210 - .prog_type = BPF_PROG_TYPE_TRACEPOINT, 206 + .fixup_map_ringbuf = { 4 }, 207 + .errstr_unpriv = "invalid indirect read from stack R2 off -64+0 size 64", 208 + .result_unpriv = REJECT, 209 + /* in privileged mode reads from uninitialized stack locations are permitted */ 210 + .result = ACCEPT, 211 211 }, 212 212 { 213 213 "helper access to variable memory: stack, JMP (signed), no min check", ··· 586 564 { 587 565 "helper access to variable memory: 8 bytes leak", 588 566 .insns = { 589 - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, 8), 590 - BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), 591 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64), 567 + /* set max stack size */ 568 + BPF_ST_MEM(BPF_DW, BPF_REG_10, -128, 0), 569 + /* set r3 to a random value */ 570 + BPF_EMIT_CALL(BPF_FUNC_get_prandom_u32), 571 + BPF_MOV64_REG(BPF_REG_3, BPF_REG_0), 572 + BPF_LD_MAP_FD(BPF_REG_1, 0), 573 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 574 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -64), 592 575 BPF_MOV64_IMM(BPF_REG_0, 0), 593 576 BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -64), 594 577 BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -56), 595 578 BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -48), 596 579 BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -40), 580 + /* Note: fp[-32] left uninitialized */ 597 581 BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -24), 598 582 BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16), 599 583 BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8), 600 - BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -128), 601 - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_10, -128), 602 - BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 63), 603 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, 1), 604 - BPF_MOV64_IMM(BPF_REG_3, 0), 605 - BPF_EMIT_CALL(BPF_FUNC_probe_read_kernel), 606 - BPF_LDX_MEM(BPF_DW, BPF_REG_1, 
BPF_REG_10, -16), 584 + /* Limit r3 range to [1, 64] */ 585 + BPF_ALU64_IMM(BPF_AND, BPF_REG_3, 63), 586 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, 1), 587 + BPF_MOV64_IMM(BPF_REG_4, 0), 588 + /* Call bpf_ringbuf_output(), it is one of a few helper functions with 589 + * ARG_CONST_SIZE_OR_ZERO parameter allowed in unpriv mode. 590 + * For unpriv this should signal an error, because memory region [1, 64] 591 + * at &fp[-64] is not fully initialized. 592 + */ 593 + BPF_EMIT_CALL(BPF_FUNC_ringbuf_output), 594 + BPF_MOV64_IMM(BPF_REG_0, 0), 607 595 BPF_EXIT_INSN(), 608 596 }, 609 - .errstr = "invalid indirect read from stack R1 off -64+32 size 64", 610 - .result = REJECT, 611 - .prog_type = BPF_PROG_TYPE_TRACEPOINT, 597 + .fixup_map_ringbuf = { 3 }, 598 + .errstr_unpriv = "invalid indirect read from stack R2 off -64+32 size 64", 599 + .result_unpriv = REJECT, 600 + /* in privileged mode reads from uninitialized stack locations are permitted */ 601 + .result = ACCEPT, 612 602 }, 613 603 { 614 604 "helper access to variable memory: 8 bytes no leak (init memory)",
+5 -4
tools/testing/selftests/bpf/verifier/int_ptr.c
··· 54 54 /* bpf_strtoul() */ 55 55 BPF_EMIT_CALL(BPF_FUNC_strtoul), 56 56 57 - BPF_MOV64_IMM(BPF_REG_0, 1), 57 + BPF_MOV64_IMM(BPF_REG_0, 0), 58 58 BPF_EXIT_INSN(), 59 59 }, 60 - .result = REJECT, 61 - .prog_type = BPF_PROG_TYPE_CGROUP_SYSCTL, 62 - .errstr = "invalid indirect read from stack R4 off -16+4 size 8", 60 + .result_unpriv = REJECT, 61 + .errstr_unpriv = "invalid indirect read from stack R4 off -16+4 size 8", 62 + /* in privileged mode reads from uninitialized stack locations are permitted */ 63 + .result = ACCEPT, 63 64 }, 64 65 { 65 66 "ARG_PTR_TO_LONG misaligned",
+1 -1
tools/testing/selftests/bpf/verifier/map_kptr.c
··· 336 336 .prog_type = BPF_PROG_TYPE_SCHED_CLS, 337 337 .fixup_map_kptr = { 1 }, 338 338 .result = REJECT, 339 - .errstr = "R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_", 339 + .errstr = "R1 type=rcu_ptr_or_null_ expected=percpu_ptr_", 340 340 }, 341 341 { 342 342 "map_kptr: ref: reject off != 0",
+8 -5
tools/testing/selftests/bpf/verifier/search_pruning.c
··· 128 128 BPF_EXIT_INSN(), 129 129 }, 130 130 .fixup_map_hash_8b = { 3 }, 131 - .errstr = "invalid read from stack off -16+0 size 8", 132 - .result = REJECT, 133 - .prog_type = BPF_PROG_TYPE_TRACEPOINT, 131 + .errstr_unpriv = "invalid read from stack off -16+0 size 8", 132 + .result_unpriv = REJECT, 133 + /* in privileged mode reads from uninitialized stack locations are permitted */ 134 + .result = ACCEPT, 134 135 }, 135 136 { 136 137 "precision tracking for u32 spill/fill", ··· 259 258 BPF_EXIT_INSN(), 260 259 }, 261 260 .flags = BPF_F_TEST_STATE_FREQ, 262 - .errstr = "invalid read from stack off -8+1 size 8", 263 - .result = REJECT, 261 + .errstr_unpriv = "invalid read from stack off -8+1 size 8", 262 + .result_unpriv = REJECT, 263 + /* in privileged mode reads from uninitialized stack locations are permitted */ 264 + .result = ACCEPT, 264 265 },
-27
tools/testing/selftests/bpf/verifier/sock.c
··· 531 531 .result = ACCEPT, 532 532 }, 533 533 { 534 - "sk_storage_get(map, skb->sk, &stack_value, 1): partially init stack_value", 535 - .insns = { 536 - BPF_MOV64_IMM(BPF_REG_2, 0), 537 - BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), 538 - BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)), 539 - BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2), 540 - BPF_MOV64_IMM(BPF_REG_0, 0), 541 - BPF_EXIT_INSN(), 542 - BPF_EMIT_CALL(BPF_FUNC_sk_fullsock), 543 - BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2), 544 - BPF_MOV64_IMM(BPF_REG_0, 0), 545 - BPF_EXIT_INSN(), 546 - BPF_MOV64_IMM(BPF_REG_4, 1), 547 - BPF_MOV64_REG(BPF_REG_3, BPF_REG_10), 548 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, -8), 549 - BPF_MOV64_REG(BPF_REG_2, BPF_REG_0), 550 - BPF_LD_MAP_FD(BPF_REG_1, 0), 551 - BPF_EMIT_CALL(BPF_FUNC_sk_storage_get), 552 - BPF_MOV64_IMM(BPF_REG_0, 0), 553 - BPF_EXIT_INSN(), 554 - }, 555 - .fixup_sk_storage_map = { 14 }, 556 - .prog_type = BPF_PROG_TYPE_SCHED_CLS, 557 - .result = REJECT, 558 - .errstr = "invalid indirect read from stack", 559 - }, 560 - { 561 534 "bpf_map_lookup_elem(smap, &key)", 562 535 .insns = { 563 536 BPF_ST_MEM(BPF_W, BPF_REG_10, -4, 0),
+4 -3
tools/testing/selftests/bpf/verifier/spill_fill.c
··· 171 171 BPF_MOV64_IMM(BPF_REG_0, 0), 172 172 BPF_EXIT_INSN(), 173 173 }, 174 - .result = REJECT, 175 - .errstr = "invalid read from stack off -4+0 size 4", 176 - .prog_type = BPF_PROG_TYPE_SCHED_CLS, 174 + .result_unpriv = REJECT, 175 + .errstr_unpriv = "invalid read from stack off -4+0 size 4", 176 + /* in privileged mode reads from uninitialized stack locations are permitted */ 177 + .result = ACCEPT, 177 178 }, 178 179 { 179 180 "Spill a u32 const scalar. Refill as u16. Offset to skb->data",
+23
tools/testing/selftests/bpf/verifier/unpriv.c
··· 240 240 .prog_type = BPF_PROG_TYPE_SCHED_CLS, 241 241 }, 242 242 { 243 + /* Same as above, but use BPF_ST_MEM to save 42 244 + * instead of BPF_STX_MEM. 245 + */ 246 + "unpriv: spill/fill of different pointers st", 247 + .insns = { 248 + BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10), 249 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8), 250 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 3), 251 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), 252 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -16), 253 + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_2, 0), 254 + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1), 255 + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0), 256 + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 0), 257 + BPF_ST_MEM(BPF_W, BPF_REG_1, offsetof(struct __sk_buff, mark), 42), 258 + BPF_MOV64_IMM(BPF_REG_0, 0), 259 + BPF_EXIT_INSN(), 260 + }, 261 + .result = REJECT, 262 + .errstr = "same insn cannot be used with different pointers", 263 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 264 + }, 265 + { 243 266 "unpriv: spill/fill of different pointers stx - ctx and sock", 244 267 .insns = { 245 268 BPF_MOV64_REG(BPF_REG_8, BPF_REG_1),
-52
tools/testing/selftests/bpf/verifier/var_off.c
··· 213 213 .prog_type = BPF_PROG_TYPE_LWT_IN, 214 214 }, 215 215 { 216 - "indirect variable-offset stack access, max_off+size > max_initialized", 217 - .insns = { 218 - /* Fill only the second from top 8 bytes of the stack. */ 219 - BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0), 220 - /* Get an unknown value. */ 221 - BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0), 222 - /* Make it small and 4-byte aligned. */ 223 - BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4), 224 - BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16), 225 - /* Add it to fp. We now have either fp-12 or fp-16, but we don't know 226 - * which. fp-12 size 8 is partially uninitialized stack. 227 - */ 228 - BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10), 229 - /* Dereference it indirectly. */ 230 - BPF_LD_MAP_FD(BPF_REG_1, 0), 231 - BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem), 232 - BPF_MOV64_IMM(BPF_REG_0, 0), 233 - BPF_EXIT_INSN(), 234 - }, 235 - .fixup_map_hash_8b = { 5 }, 236 - .errstr = "invalid indirect read from stack R2 var_off", 237 - .result = REJECT, 238 - .prog_type = BPF_PROG_TYPE_LWT_IN, 239 - }, 240 - { 241 216 "indirect variable-offset stack access, min_off < min_initialized", 242 217 .insns = { 243 218 /* Fill only the top 8 bytes of the stack. */ ··· 263 288 .result_unpriv = REJECT, 264 289 .result = ACCEPT, 265 290 .prog_type = BPF_PROG_TYPE_CGROUP_SKB, 266 - }, 267 - { 268 - "indirect variable-offset stack access, uninitialized", 269 - .insns = { 270 - BPF_MOV64_IMM(BPF_REG_2, 6), 271 - BPF_MOV64_IMM(BPF_REG_3, 28), 272 - /* Fill the top 16 bytes of the stack. */ 273 - BPF_ST_MEM(BPF_W, BPF_REG_10, -16, 0), 274 - BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), 275 - /* Get an unknown value. */ 276 - BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1, 0), 277 - /* Make it small and 4-byte aligned. */ 278 - BPF_ALU64_IMM(BPF_AND, BPF_REG_4, 4), 279 - BPF_ALU64_IMM(BPF_SUB, BPF_REG_4, 16), 280 - /* Add it to fp. We now have either fp-12 or fp-16, we don't know 281 - * which, but either way it points to initialized stack. 
282 - */ 283 - BPF_ALU64_REG(BPF_ADD, BPF_REG_4, BPF_REG_10), 284 - BPF_MOV64_IMM(BPF_REG_5, 8), 285 - /* Dereference it indirectly. */ 286 - BPF_EMIT_CALL(BPF_FUNC_getsockopt), 287 - BPF_MOV64_IMM(BPF_REG_0, 0), 288 - BPF_EXIT_INSN(), 289 - }, 290 - .errstr = "invalid indirect read from stack R4 var_off", 291 - .result = REJECT, 292 - .prog_type = BPF_PROG_TYPE_SOCK_OPS, 293 291 }, 294 292 { 295 293 "indirect variable-offset stack access, ok",