
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2023-08-03

We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 84 files changed, 4026 insertions(+), 562 deletions(-).

The main changes are:

1) Add SO_REUSEPORT support for TC bpf_sk_assign from Lorenz Bauer,
Daniel Borkmann

2) Support new insns from cpu v4 from Yonghong Song

3) Non-atomically allocate freelist during prefill from YiFei Zhu

4) Support defragmenting IPv(4|6) packets in BPF from Daniel Xu

5) Add tracepoint to xdp attaching failure from Leon Hwang

6) struct netdev_rx_queue and xdp.h reshuffling to reduce
rebuild time from Jakub Kicinski

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
net: invert the netdevice.h vs xdp.h dependency
net: move struct netdev_rx_queue out of netdevice.h
eth: add missing xdp.h includes in drivers
selftests/bpf: Add testcase for xdp attaching failure tracepoint
bpf, xdp: Add tracepoint to xdp attaching failure
selftests/bpf: fix static assert compilation issue for test_cls_*.c
bpf: fix bpf_probe_read_kernel prototype mismatch
riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework
libbpf: fix typos in Makefile
tracing: bpf: use struct trace_entry in struct syscall_tp_t
bpf, devmap: Remove unused dtab field from bpf_dtab_netdev
bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry
netfilter: bpf: Only define get_proto_defrag_hook() if necessary
bpf: Fix an array-index-out-of-bounds issue in disasm.c
net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn
docs/bpf: Fix malformed documentation
bpf: selftests: Add defrag selftests
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Support not connecting client socket
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
...
====================

Link: https://lore.kernel.org/r/20230803174845.825419-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+4023 -559
-5
Documentation/bpf/bpf_design_QA.rst
··· 140 140 it more complicated to support on arm64 and other archs. Also it 141 141 needs div-by-zero runtime check. 142 142 143 - Q: Why there is no BPF_SDIV for signed divide operation? 144 - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 145 - A: Because it would be rarely used. llvm errors in such case and 146 - prints a suggestion to use unsigned divide instead. 147 - 148 143 Q: Why BPF has implicit prologue and epilogue? 149 144 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 150 145 A: Because architectures like sparc have register windows and in general
+93 -37
Documentation/bpf/standardization/instruction-set.rst
··· 154 154 The 'code' field encodes the operation as below, where 'src' and 'dst' refer 155 155 to the values of the source and destination registers, respectively. 156 156 157 - ======== ===== ========================================================== 158 - code value description 159 - ======== ===== ========================================================== 160 - BPF_ADD 0x00 dst += src 161 - BPF_SUB 0x10 dst -= src 162 - BPF_MUL 0x20 dst \*= src 163 - BPF_DIV 0x30 dst = (src != 0) ? (dst / src) : 0 164 - BPF_OR 0x40 dst \|= src 165 - BPF_AND 0x50 dst &= src 166 - BPF_LSH 0x60 dst <<= (src & mask) 167 - BPF_RSH 0x70 dst >>= (src & mask) 168 - BPF_NEG 0x80 dst = -src 169 - BPF_MOD 0x90 dst = (src != 0) ? (dst % src) : dst 170 - BPF_XOR 0xa0 dst ^= src 171 - BPF_MOV 0xb0 dst = src 172 - BPF_ARSH 0xc0 sign extending dst >>= (src & mask) 173 - BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below) 174 - ======== ===== ========================================================== 157 + ========= ===== ======= ========================================================== 158 + code value offset description 159 + ========= ===== ======= ========================================================== 160 + BPF_ADD 0x00 0 dst += src 161 + BPF_SUB 0x10 0 dst -= src 162 + BPF_MUL 0x20 0 dst \*= src 163 + BPF_DIV 0x30 0 dst = (src != 0) ? (dst / src) : 0 164 + BPF_SDIV 0x30 1 dst = (src != 0) ? (dst s/ src) : 0 165 + BPF_OR 0x40 0 dst \|= src 166 + BPF_AND 0x50 0 dst &= src 167 + BPF_LSH 0x60 0 dst <<= (src & mask) 168 + BPF_RSH 0x70 0 dst >>= (src & mask) 169 + BPF_NEG 0x80 0 dst = -dst 170 + BPF_MOD 0x90 0 dst = (src != 0) ? (dst % src) : dst 171 + BPF_SMOD 0x90 1 dst = (src != 0) ? 
(dst s% src) : dst 172 + BPF_XOR 0xa0 0 dst ^= src 173 + BPF_MOV 0xb0 0 dst = src 174 + BPF_MOVSX 0xb0 8/16/32 dst = (s8,s16,s32)src 175 + BPF_ARSH 0xc0 0 sign extending dst >>= (src & mask) 176 + BPF_END 0xd0 0 byte swap operations (see `Byte swap instructions`_ below) 177 + ========= ===== ======= ========================================================== 175 178 176 179 Underflow and overflow are allowed during arithmetic operations, meaning 177 180 the 64-bit or 32-bit value will wrap. If eBPF program execution would ··· 201 198 202 199 dst = dst ^ imm32 203 200 204 - Also note that the division and modulo operations are unsigned. Thus, for 205 - ``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas 206 - for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result 207 - interpreted as an unsigned 64-bit value. There are no instructions for 208 - signed division or modulo. 201 + Note that most instructions have instruction offset of 0. Only three instructions 202 + (``BPF_SDIV``, ``BPF_SMOD``, ``BPF_MOVSX``) have a non-zero offset. 203 + 204 + The division and modulo operations support both unsigned and signed flavors. 205 + 206 + For unsigned operations (``BPF_DIV`` and ``BPF_MOD``), for ``BPF_ALU``, 207 + 'imm' is interpreted as a 32-bit unsigned value. For ``BPF_ALU64``, 208 + 'imm' is first sign extended from 32 to 64 bits, and then interpreted as 209 + a 64-bit unsigned value. 210 + 211 + For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``, 212 + 'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm' 213 + is first sign extended from 32 to 64 bits, and then interpreted as a 214 + 64-bit signed value. 215 + 216 + The ``BPF_MOVSX`` instruction does a move operation with sign extension. 217 + ``BPF_ALU | BPF_MOVSX`` sign extends 8-bit and 16-bit operands into 32 218 + bit operands, and zeroes the remaining upper 32 bits. 
219 + ``BPF_ALU64 | BPF_MOVSX`` sign extends 8-bit, 16-bit, and 32-bit 220 + operands into 64 bit operands. 209 221 210 222 Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31) 211 223 for 32-bit operations. 212 224 213 225 Byte swap instructions 214 - ~~~~~~~~~~~~~~~~~~~~~~ 226 + ---------------------- 215 227 216 - The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit 217 - 'code' field of ``BPF_END``. 228 + The byte swap instructions use instruction classes of ``BPF_ALU`` and ``BPF_ALU64`` 229 + and a 4-bit 'code' field of ``BPF_END``. 218 230 219 231 The byte swap instructions operate on the destination register 220 232 only and do not use a separate source register or immediate value. 221 233 222 - The 1-bit source operand field in the opcode is used to select what byte 223 - order the operation convert from or to: 234 + For ``BPF_ALU``, the 1-bit source operand field in the opcode is used to 235 + select what byte order the operation converts from or to. For 236 + ``BPF_ALU64``, the 1-bit source operand field in the opcode is reserved 237 + and must be set to 0. 
224 238 225 - ========= ===== ================================================= 226 - source value description 227 - ========= ===== ================================================= 228 - BPF_TO_LE 0x00 convert between host byte order and little endian 229 - BPF_TO_BE 0x08 convert between host byte order and big endian 230 - ========= ===== ================================================= 239 + ========= ========= ===== ================================================= 240 + class source value description 241 + ========= ========= ===== ================================================= 242 + BPF_ALU BPF_TO_LE 0x00 convert between host byte order and little endian 243 + BPF_ALU BPF_TO_BE 0x08 convert between host byte order and big endian 244 + BPF_ALU64 Reserved 0x00 do byte swap unconditionally 245 + ========= ========= ===== ================================================= 231 246 232 247 The 'imm' field encodes the width of the swap operations. The following widths 233 248 are supported: 16, 32 and 64. ··· 260 239 261 240 dst = htobe64(dst) 262 241 242 + ``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means:: 243 + 244 + dst = bswap16 dst 245 + dst = bswap32 dst 246 + dst = bswap64 dst 247 + 263 248 Jump instructions 264 249 ----------------- 265 250 ··· 276 249 ======== ===== === =========================================== ========================================= 277 250 code value src description notes 278 251 ======== ===== === =========================================== ========================================= 279 - BPF_JA 0x0 0x0 PC += offset BPF_JMP only 252 + BPF_JA 0x0 0x0 PC += offset BPF_JMP class 253 + BPF_JA 0x0 0x0 PC += imm BPF_JMP32 class 280 254 BPF_JEQ 0x1 any PC += offset if dst == src 281 255 BPF_JGT 0x2 any PC += offset if dst > src unsigned 282 256 BPF_JGE 0x3 any PC += offset if dst >= src unsigned ··· 305 277 if (s32)dst s>= (s32)src goto +offset 306 278 307 279 where 's>=' indicates a signed '>=' comparison. 
280 + 281 + ``BPF_JA | BPF_K | BPF_JMP32`` (0x06) means:: 282 + 283 + gotol +imm 284 + 285 + where 'imm' means the branch offset comes from insn 'imm' field. 286 + 287 + Note that there are two flavors of ``BPF_JA`` instructions. The 288 + ``BPF_JMP`` class permits a 16-bit jump offset specified by the 'offset' 289 + field, whereas the ``BPF_JMP32`` class permits a 32-bit jump offset 290 + specified by the 'imm' field. A > 16-bit conditional jump may be 291 + converted to a < 16-bit conditional jump plus a 32-bit unconditional 292 + jump. 308 293 309 294 Helper functions 310 295 ~~~~~~~~~~~~~~~~ ··· 361 320 BPF_ABS 0x20 legacy BPF packet access (absolute) `Legacy BPF Packet access instructions`_ 362 321 BPF_IND 0x40 legacy BPF packet access (indirect) `Legacy BPF Packet access instructions`_ 363 322 BPF_MEM 0x60 regular load and store operations `Regular load and store operations`_ 323 + BPF_MEMSX 0x80 sign-extension load operations `Sign-extension load operations`_ 364 324 BPF_ATOMIC 0xc0 atomic operations `Atomic operations`_ 365 325 ============= ===== ==================================== ============= 366 326 ··· 392 350 393 351 ``BPF_MEM | <size> | BPF_LDX`` means:: 394 352 395 - dst = *(size *) (src + offset) 353 + dst = *(unsigned size *) (src + offset) 396 354 397 - Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``. 355 + Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW`` and 356 + 'unsigned size' is one of u8, u16, u32 or u64. 357 + 358 + Sign-extension load operations 359 + ------------------------------ 360 + 361 + The ``BPF_MEMSX`` mode modifier is used to encode sign-extension load 362 + instructions that transfer data between a register and memory. 363 + 364 + ``BPF_MEMSX | <size> | BPF_LDX`` means:: 365 + 366 + dst = *(signed size *) (src + offset) 367 + 368 + Where size is one of: ``BPF_B``, ``BPF_H`` or ``BPF_W``, and 369 + 'signed size' is one of s8, s16 or s32. 
398 370 399 371 Atomic operations 400 372 -----------------
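The semantics the updated instruction-set.rst assigns to the new cpu v4 instructions can be checked in plain C. A minimal userspace sketch (the helper names below are mine for illustration, not kernel APIs): BPF_SDIV/BPF_SMOD define divide-by-zero behavior instead of trapping, and BPF_MOVSX sign extends narrow operands, with the 32-bit ALU flavor zeroing the upper half of the register.

```c
#include <assert.h>
#include <stdint.h>

/* BPF_SDIV: dst = (src != 0) ? (dst s/ src) : 0 */
static int64_t bpf_sdiv(int64_t dst, int64_t src)
{
	return src != 0 ? dst / src : 0;
}

/* BPF_SMOD: dst = (src != 0) ? (dst s% src) : dst */
static int64_t bpf_smod(int64_t dst, int64_t src)
{
	return src != 0 ? dst % src : dst;
}

/* BPF_ALU | BPF_MOVSX, 8-bit operand: sign extend into 32 bits and
 * zero the remaining upper 32 bits of the 64-bit register. */
static uint64_t bpf_alu32_movsx8(uint64_t src)
{
	return (uint32_t)(int32_t)(int8_t)src;
}

/* BPF_ALU64 | BPF_MOVSX, 8-bit operand: sign extend to 64 bits. */
static uint64_t bpf_alu64_movsx8(uint64_t src)
{
	return (uint64_t)(int64_t)(int8_t)src;
}
```

C's `/` and `%` truncate toward zero, which matches the signed-divide semantics the table describes.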
+2 -2
MAINTAINERS
··· 3704 3704 M: Andrii Nakryiko <andrii@kernel.org> 3705 3705 R: Martin KaFai Lau <martin.lau@linux.dev> 3706 3706 R: Song Liu <song@kernel.org> 3707 - R: Yonghong Song <yhs@fb.com> 3707 + R: Yonghong Song <yonghong.song@linux.dev> 3708 3708 R: John Fastabend <john.fastabend@gmail.com> 3709 3709 R: KP Singh <kpsingh@kernel.org> 3710 3710 R: Stanislav Fomichev <sdf@google.com> ··· 3743 3743 F: tools/testing/selftests/bpf/ 3744 3744 3745 3745 BPF [ITERATOR] 3746 - M: Yonghong Song <yhs@fb.com> 3746 + M: Yonghong Song <yonghong.song@linux.dev> 3747 3747 L: bpf@vger.kernel.org 3748 3748 S: Maintained 3749 3749 F: kernel/bpf/*iter.c
+81 -70
arch/riscv/net/bpf_jit_comp64.c
··· 13 13 #include <asm/patch.h> 14 14 #include "bpf_jit.h" 15 15 16 + #define RV_FENTRY_NINSNS 2 17 + 16 18 #define RV_REG_TCC RV_REG_A6 17 19 #define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */ 18 20 ··· 243 241 if (!is_tail_call) 244 242 emit_mv(RV_REG_A0, RV_REG_A5, ctx); 245 243 emit_jalr(RV_REG_ZERO, is_tail_call ? RV_REG_T3 : RV_REG_RA, 246 - is_tail_call ? 20 : 0, /* skip reserved nops and TCC init */ 244 + is_tail_call ? (RV_FENTRY_NINSNS + 1) * 4 : 0, /* skip reserved nops and TCC init */ 247 245 ctx); 248 246 } 249 247 ··· 620 618 return 0; 621 619 } 622 620 623 - static int gen_call_or_nops(void *target, void *ip, u32 *insns) 624 - { 625 - s64 rvoff; 626 - int i, ret; 627 - struct rv_jit_context ctx; 628 - 629 - ctx.ninsns = 0; 630 - ctx.insns = (u16 *)insns; 631 - 632 - if (!target) { 633 - for (i = 0; i < 4; i++) 634 - emit(rv_nop(), &ctx); 635 - return 0; 636 - } 637 - 638 - rvoff = (s64)(target - (ip + 4)); 639 - emit(rv_sd(RV_REG_SP, -8, RV_REG_RA), &ctx); 640 - ret = emit_jump_and_link(RV_REG_RA, rvoff, false, &ctx); 641 - if (ret) 642 - return ret; 643 - emit(rv_ld(RV_REG_RA, -8, RV_REG_SP), &ctx); 644 - 645 - return 0; 646 - } 647 - 648 - static int gen_jump_or_nops(void *target, void *ip, u32 *insns) 621 + static int gen_jump_or_nops(void *target, void *ip, u32 *insns, bool is_call) 649 622 { 650 623 s64 rvoff; 651 624 struct rv_jit_context ctx; ··· 635 658 } 636 659 637 660 rvoff = (s64)(target - ip); 638 - return emit_jump_and_link(RV_REG_ZERO, rvoff, false, &ctx); 661 + return emit_jump_and_link(is_call ? 
RV_REG_T0 : RV_REG_ZERO, rvoff, false, &ctx); 639 662 } 640 663 641 664 int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type, 642 665 void *old_addr, void *new_addr) 643 666 { 644 - u32 old_insns[4], new_insns[4]; 667 + u32 old_insns[RV_FENTRY_NINSNS], new_insns[RV_FENTRY_NINSNS]; 645 668 bool is_call = poke_type == BPF_MOD_CALL; 646 - int (*gen_insns)(void *target, void *ip, u32 *insns); 647 - int ninsns = is_call ? 4 : 2; 648 669 int ret; 649 670 650 - if (!is_bpf_text_address((unsigned long)ip)) 671 + if (!is_kernel_text((unsigned long)ip) && 672 + !is_bpf_text_address((unsigned long)ip)) 651 673 return -ENOTSUPP; 652 674 653 - gen_insns = is_call ? gen_call_or_nops : gen_jump_or_nops; 654 - 655 - ret = gen_insns(old_addr, ip, old_insns); 675 + ret = gen_jump_or_nops(old_addr, ip, old_insns, is_call); 656 676 if (ret) 657 677 return ret; 658 678 659 - if (memcmp(ip, old_insns, ninsns * 4)) 679 + if (memcmp(ip, old_insns, RV_FENTRY_NINSNS * 4)) 660 680 return -EFAULT; 661 681 662 - ret = gen_insns(new_addr, ip, new_insns); 682 + ret = gen_jump_or_nops(new_addr, ip, new_insns, is_call); 663 683 if (ret) 664 684 return ret; 665 685 666 686 cpus_read_lock(); 667 687 mutex_lock(&text_mutex); 668 - if (memcmp(ip, new_insns, ninsns * 4)) 669 - ret = patch_text(ip, new_insns, ninsns); 688 + if (memcmp(ip, new_insns, RV_FENTRY_NINSNS * 4)) 689 + ret = patch_text(ip, new_insns, RV_FENTRY_NINSNS); 670 690 mutex_unlock(&text_mutex); 671 691 cpus_read_unlock(); 672 692 ··· 761 787 int i, ret, offset; 762 788 int *branches_off = NULL; 763 789 int stack_size = 0, nregs = m->nr_args; 764 - int retaddr_off, fp_off, retval_off, args_off; 765 - int nregs_off, ip_off, run_ctx_off, sreg_off; 790 + int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off; 766 791 struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY]; 767 792 struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT]; 768 793 struct bpf_tramp_links *fmod_ret = 
&tlinks[BPF_TRAMP_MODIFY_RETURN]; ··· 769 796 bool save_ret; 770 797 u32 insn; 771 798 772 - /* Generated trampoline stack layout: 799 + /* Two types of generated trampoline stack layout: 773 800 * 774 - * FP - 8 [ RA of parent func ] return address of parent 801 + * 1. trampoline called from function entry 802 + * -------------------------------------- 803 + * FP + 8 [ RA to parent func ] return address to parent 775 804 * function 776 - * FP - retaddr_off [ RA of traced func ] return address of traced 805 + * FP + 0 [ FP of parent func ] frame pointer of parent 777 806 * function 778 - * FP - fp_off [ FP of parent func ] 807 + * FP - 8 [ T0 to traced func ] return address of traced 808 + * function 809 + * FP - 16 [ FP of traced func ] frame pointer of traced 810 + * function 811 + * -------------------------------------- 812 + * 813 + * 2. trampoline called directly 814 + * -------------------------------------- 815 + * FP - 8 [ RA to caller func ] return address to caller 816 + * function 817 + * FP - 16 [ FP of caller func ] frame pointer of caller 818 + * function 819 + * -------------------------------------- 779 820 * 780 821 * FP - retval_off [ return value ] BPF_TRAMP_F_CALL_ORIG or 781 822 * BPF_TRAMP_F_RET_FENTRY_RET ··· 820 833 if (nregs > 8) 821 834 return -ENOTSUPP; 822 835 823 - /* room for parent function return address */ 824 - stack_size += 8; 825 - 826 - stack_size += 8; 827 - retaddr_off = stack_size; 828 - 829 - stack_size += 8; 830 - fp_off = stack_size; 836 + /* room of trampoline frame to store return address and frame pointer */ 837 + stack_size += 16; 831 838 832 839 save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET); 833 840 if (save_ret) { ··· 848 867 849 868 stack_size = round_up(stack_size, 16); 850 869 851 - emit_addi(RV_REG_SP, RV_REG_SP, -stack_size, ctx); 870 + if (func_addr) { 871 + /* For the trampoline called from function entry, 872 + * the frame of traced function and the frame of 873 + * trampoline need 
to be considered. 874 + */ 875 + emit_addi(RV_REG_SP, RV_REG_SP, -16, ctx); 876 + emit_sd(RV_REG_SP, 8, RV_REG_RA, ctx); 877 + emit_sd(RV_REG_SP, 0, RV_REG_FP, ctx); 878 + emit_addi(RV_REG_FP, RV_REG_SP, 16, ctx); 852 879 853 - emit_sd(RV_REG_SP, stack_size - retaddr_off, RV_REG_RA, ctx); 854 - emit_sd(RV_REG_SP, stack_size - fp_off, RV_REG_FP, ctx); 855 - 856 - emit_addi(RV_REG_FP, RV_REG_SP, stack_size, ctx); 880 + emit_addi(RV_REG_SP, RV_REG_SP, -stack_size, ctx); 881 + emit_sd(RV_REG_SP, stack_size - 8, RV_REG_T0, ctx); 882 + emit_sd(RV_REG_SP, stack_size - 16, RV_REG_FP, ctx); 883 + emit_addi(RV_REG_FP, RV_REG_SP, stack_size, ctx); 884 + } else { 885 + /* For the trampoline called directly, just handle 886 + * the frame of trampoline. 887 + */ 888 + emit_addi(RV_REG_SP, RV_REG_SP, -stack_size, ctx); 889 + emit_sd(RV_REG_SP, stack_size - 8, RV_REG_RA, ctx); 890 + emit_sd(RV_REG_SP, stack_size - 16, RV_REG_FP, ctx); 891 + emit_addi(RV_REG_FP, RV_REG_SP, stack_size, ctx); 892 + } 857 893 858 894 /* callee saved register S1 to pass start time */ 859 895 emit_sd(RV_REG_FP, -sreg_off, RV_REG_S1, ctx); ··· 888 890 889 891 /* skip to actual body of traced function */ 890 892 if (flags & BPF_TRAMP_F_SKIP_FRAME) 891 - orig_call += 16; 893 + orig_call += RV_FENTRY_NINSNS * 4; 892 894 893 895 if (flags & BPF_TRAMP_F_CALL_ORIG) { 894 896 emit_imm(RV_REG_A0, (const s64)im, ctx); ··· 965 967 966 968 emit_ld(RV_REG_S1, -sreg_off, RV_REG_FP, ctx); 967 969 968 - if (flags & BPF_TRAMP_F_SKIP_FRAME) 969 - /* return address of parent function */ 970 + if (func_addr) { 971 + /* trampoline called from function entry */ 972 + emit_ld(RV_REG_T0, stack_size - 8, RV_REG_SP, ctx); 973 + emit_ld(RV_REG_FP, stack_size - 16, RV_REG_SP, ctx); 974 + emit_addi(RV_REG_SP, RV_REG_SP, stack_size, ctx); 975 + 976 + emit_ld(RV_REG_RA, 8, RV_REG_SP, ctx); 977 + emit_ld(RV_REG_FP, 0, RV_REG_SP, ctx); 978 + emit_addi(RV_REG_SP, RV_REG_SP, 16, ctx); 979 + 980 + if (flags & BPF_TRAMP_F_SKIP_FRAME) 981 + 
/* return to parent function */ 982 + emit_jalr(RV_REG_ZERO, RV_REG_RA, 0, ctx); 983 + else 984 + /* return to traced function */ 985 + emit_jalr(RV_REG_ZERO, RV_REG_T0, 0, ctx); 986 + } else { 987 + /* trampoline called directly */ 970 988 emit_ld(RV_REG_RA, stack_size - 8, RV_REG_SP, ctx); 971 - else 972 - /* return address of traced function */ 973 - emit_ld(RV_REG_RA, stack_size - retaddr_off, RV_REG_SP, ctx); 989 + emit_ld(RV_REG_FP, stack_size - 16, RV_REG_SP, ctx); 990 + emit_addi(RV_REG_SP, RV_REG_SP, stack_size, ctx); 974 991 975 - emit_ld(RV_REG_FP, stack_size - fp_off, RV_REG_SP, ctx); 976 - emit_addi(RV_REG_SP, RV_REG_SP, stack_size, ctx); 977 - 978 - emit_jalr(RV_REG_ZERO, RV_REG_RA, 0, ctx); 992 + emit_jalr(RV_REG_ZERO, RV_REG_RA, 0, ctx); 993 + } 979 994 980 995 ret = ctx->ninsns; 981 996 out: ··· 1702 1691 1703 1692 store_offset = stack_adjust - 8; 1704 1693 1705 - /* reserve 4 nop insns */ 1706 - for (i = 0; i < 4; i++) 1694 + /* nops reserved for auipc+jalr pair */ 1695 + for (i = 0; i < RV_FENTRY_NINSNS; i++) 1707 1696 emit(rv_nop(), ctx); 1708 1697 1709 1698 /* First instruction is always setting the tail-call-counter
+117 -24
arch/x86/net/bpf_jit_comp.c
··· 701 701 *pprog = prog; 702 702 } 703 703 704 + static void emit_movsx_reg(u8 **pprog, int num_bits, bool is64, u32 dst_reg, 705 + u32 src_reg) 706 + { 707 + u8 *prog = *pprog; 708 + 709 + if (is64) { 710 + /* movs[b,w,l]q dst, src */ 711 + if (num_bits == 8) 712 + EMIT4(add_2mod(0x48, src_reg, dst_reg), 0x0f, 0xbe, 713 + add_2reg(0xC0, src_reg, dst_reg)); 714 + else if (num_bits == 16) 715 + EMIT4(add_2mod(0x48, src_reg, dst_reg), 0x0f, 0xbf, 716 + add_2reg(0xC0, src_reg, dst_reg)); 717 + else if (num_bits == 32) 718 + EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x63, 719 + add_2reg(0xC0, src_reg, dst_reg)); 720 + } else { 721 + /* movs[b,w]l dst, src */ 722 + if (num_bits == 8) { 723 + EMIT4(add_2mod(0x40, src_reg, dst_reg), 0x0f, 0xbe, 724 + add_2reg(0xC0, src_reg, dst_reg)); 725 + } else if (num_bits == 16) { 726 + if (is_ereg(dst_reg) || is_ereg(src_reg)) 727 + EMIT1(add_2mod(0x40, src_reg, dst_reg)); 728 + EMIT3(add_2mod(0x0f, src_reg, dst_reg), 0xbf, 729 + add_2reg(0xC0, src_reg, dst_reg)); 730 + } 731 + } 732 + 733 + *pprog = prog; 734 + } 735 + 704 736 /* Emit the suffix (ModR/M etc) for addressing *(ptr_reg + off) and val_reg */ 705 737 static void emit_insn_suffix(u8 **pprog, u32 ptr_reg, u32 val_reg, int off) 706 738 { ··· 805 773 case BPF_DW: 806 774 /* Emit 'mov rax, qword ptr [rax+0x14]' */ 807 775 EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B); 776 + break; 777 + } 778 + emit_insn_suffix(&prog, src_reg, dst_reg, off); 779 + *pprog = prog; 780 + } 781 + 782 + /* LDSX: dst_reg = *(s8*)(src_reg + off) */ 783 + static void emit_ldsx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off) 784 + { 785 + u8 *prog = *pprog; 786 + 787 + switch (size) { 788 + case BPF_B: 789 + /* Emit 'movsx rax, byte ptr [rax + off]' */ 790 + EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xBE); 791 + break; 792 + case BPF_H: 793 + /* Emit 'movsx rax, word ptr [rax + off]' */ 794 + EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xBF); 795 + break; 796 + case BPF_W: 797 + /* 
Emit 'movsx rax, dword ptr [rax+0x14]' */ 798 + EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x63); 808 799 break; 809 800 } 810 801 emit_insn_suffix(&prog, src_reg, dst_reg, off); ··· 1083 1028 1084 1029 case BPF_ALU64 | BPF_MOV | BPF_X: 1085 1030 case BPF_ALU | BPF_MOV | BPF_X: 1086 - emit_mov_reg(&prog, 1087 - BPF_CLASS(insn->code) == BPF_ALU64, 1088 - dst_reg, src_reg); 1031 + if (insn->off == 0) 1032 + emit_mov_reg(&prog, 1033 + BPF_CLASS(insn->code) == BPF_ALU64, 1034 + dst_reg, src_reg); 1035 + else 1036 + emit_movsx_reg(&prog, insn->off, 1037 + BPF_CLASS(insn->code) == BPF_ALU64, 1038 + dst_reg, src_reg); 1089 1039 break; 1090 1040 1091 1041 /* neg dst */ ··· 1194 1134 /* mov rax, dst_reg */ 1195 1135 emit_mov_reg(&prog, is64, BPF_REG_0, dst_reg); 1196 1136 1197 - /* 1198 - * xor edx, edx 1199 - * equivalent to 'xor rdx, rdx', but one byte less 1200 - */ 1201 - EMIT2(0x31, 0xd2); 1137 + if (insn->off == 0) { 1138 + /* 1139 + * xor edx, edx 1140 + * equivalent to 'xor rdx, rdx', but one byte less 1141 + */ 1142 + EMIT2(0x31, 0xd2); 1202 1143 1203 - /* div src_reg */ 1204 - maybe_emit_1mod(&prog, src_reg, is64); 1205 - EMIT2(0xF7, add_1reg(0xF0, src_reg)); 1144 + /* div src_reg */ 1145 + maybe_emit_1mod(&prog, src_reg, is64); 1146 + EMIT2(0xF7, add_1reg(0xF0, src_reg)); 1147 + } else { 1148 + if (BPF_CLASS(insn->code) == BPF_ALU) 1149 + EMIT1(0x99); /* cdq */ 1150 + else 1151 + EMIT2(0x48, 0x99); /* cqo */ 1152 + 1153 + /* idiv src_reg */ 1154 + maybe_emit_1mod(&prog, src_reg, is64); 1155 + EMIT2(0xF7, add_1reg(0xF8, src_reg)); 1156 + } 1206 1157 1207 1158 if (BPF_OP(insn->code) == BPF_MOD && 1208 1159 dst_reg != BPF_REG_3) ··· 1333 1262 break; 1334 1263 1335 1264 case BPF_ALU | BPF_END | BPF_FROM_BE: 1265 + case BPF_ALU64 | BPF_END | BPF_FROM_LE: 1336 1266 switch (imm32) { 1337 1267 case 16: 1338 1268 /* Emit 'ror %ax, 8' to swap lower 2 bytes */ ··· 1442 1370 case BPF_LDX | BPF_PROBE_MEM | BPF_W: 1443 1371 case BPF_LDX | BPF_MEM | BPF_DW: 1444 1372 case BPF_LDX 
| BPF_PROBE_MEM | BPF_DW: 1373 + /* LDXS: dst_reg = *(s8*)(src_reg + off) */ 1374 + case BPF_LDX | BPF_MEMSX | BPF_B: 1375 + case BPF_LDX | BPF_MEMSX | BPF_H: 1376 + case BPF_LDX | BPF_MEMSX | BPF_W: 1377 + case BPF_LDX | BPF_PROBE_MEMSX | BPF_B: 1378 + case BPF_LDX | BPF_PROBE_MEMSX | BPF_H: 1379 + case BPF_LDX | BPF_PROBE_MEMSX | BPF_W: 1445 1380 insn_off = insn->off; 1446 1381 1447 - if (BPF_MODE(insn->code) == BPF_PROBE_MEM) { 1382 + if (BPF_MODE(insn->code) == BPF_PROBE_MEM || 1383 + BPF_MODE(insn->code) == BPF_PROBE_MEMSX) { 1448 1384 /* Conservatively check that src_reg + insn->off is a kernel address: 1449 1385 * src_reg + insn->off >= TASK_SIZE_MAX + PAGE_SIZE 1450 1386 * src_reg is used as scratch for src_reg += insn->off and restored ··· 1495 1415 start_of_ldx = prog; 1496 1416 end_of_jmp[-1] = start_of_ldx - end_of_jmp; 1497 1417 } 1498 - emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn_off); 1499 - if (BPF_MODE(insn->code) == BPF_PROBE_MEM) { 1418 + if (BPF_MODE(insn->code) == BPF_PROBE_MEMSX || 1419 + BPF_MODE(insn->code) == BPF_MEMSX) 1420 + emit_ldsx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn_off); 1421 + else 1422 + emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn_off); 1423 + if (BPF_MODE(insn->code) == BPF_PROBE_MEM || 1424 + BPF_MODE(insn->code) == BPF_PROBE_MEMSX) { 1500 1425 struct exception_table_entry *ex; 1501 1426 u8 *_insn = image + proglen + (start_of_ldx - temp); 1502 1427 s64 delta; ··· 1815 1730 break; 1816 1731 1817 1732 case BPF_JMP | BPF_JA: 1818 - if (insn->off == -1) 1819 - /* -1 jmp instructions will always jump 1820 - * backwards two bytes. Explicitly handling 1821 - * this case avoids wasting too many passes 1822 - * when there are long sequences of replaced 1823 - * dead code. 
1824 - */ 1825 - jmp_offset = -2; 1826 - else 1827 - jmp_offset = addrs[i + insn->off] - addrs[i]; 1733 + case BPF_JMP32 | BPF_JA: 1734 + if (BPF_CLASS(insn->code) == BPF_JMP) { 1735 + if (insn->off == -1) 1736 + /* -1 jmp instructions will always jump 1737 + * backwards two bytes. Explicitly handling 1738 + * this case avoids wasting too many passes 1739 + * when there are long sequences of replaced 1740 + * dead code. 1741 + */ 1742 + jmp_offset = -2; 1743 + else 1744 + jmp_offset = addrs[i + insn->off] - addrs[i]; 1745 + } else { 1746 + if (insn->imm == -1) 1747 + jmp_offset = -2; 1748 + else 1749 + jmp_offset = addrs[i + insn->imm] - addrs[i]; 1750 + } 1828 1751 1829 1752 if (!jmp_offset) { 1830 1753 /*
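The new x86 JIT cases dispatch on the opcode's class/mode/size fields. A userspace sketch of that decoding (mask values mirror include/uapi/linux/bpf.h; the `is_ldsx()` helper is mine, standing in for the JIT's switch logic):

```c
#include <assert.h>

/* Opcode field masks for load/store classes, mirroring
 * include/uapi/linux/bpf.h. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_SIZE(code)  ((code) & 0x18)
#define BPF_MODE(code)  ((code) & 0xe0)

#define BPF_LDX   0x01
#define BPF_B     0x10	/* 8-bit access */
#define BPF_MEM   0x60	/* regular load */
#define BPF_MEMSX 0x80	/* sign-extension load (new mode modifier) */

/* Decide, like the JIT switch above, whether an opcode is a
 * sign-extending load that should go through emit_ldsx() rather than
 * emit_ldx(). */
static int is_ldsx(unsigned char code)
{
	return BPF_CLASS(code) == BPF_LDX && BPF_MODE(code) == BPF_MEMSX;
}
```

For example, `BPF_LDX | BPF_MEMSX | BPF_B` selects a movsx-based byte load, while `BPF_LDX | BPF_MEM | BPF_B` keeps the zero-extending mov path.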
+1
drivers/net/bonding/bond_main.c
··· 90 90 #include <net/tls.h> 91 91 #endif 92 92 #include <net/ip6_route.h> 93 + #include <net/xdp.h> 93 94 94 95 #include "bonding_priv.h" 95 96
+1
drivers/net/ethernet/amazon/ena/ena_netdev.h
··· 14 14 #include <linux/interrupt.h> 15 15 #include <linux/netdevice.h> 16 16 #include <linux/skbuff.h> 17 + #include <net/xdp.h> 17 18 #include <uapi/linux/bpf.h> 18 19 19 20 #include "ena_com.h"
+1
drivers/net/ethernet/engleder/tsnep.h
··· 14 14 #include <linux/net_tstamp.h> 15 15 #include <linux/ptp_clock_kernel.h> 16 16 #include <linux/miscdevice.h> 17 + #include <net/xdp.h> 17 18 18 19 #define TSNEP "tsnep" 19 20
+1
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
··· 12 12 #include <linux/fsl/mc.h> 13 13 #include <linux/net_tstamp.h> 14 14 #include <net/devlink.h> 15 + #include <net/xdp.h> 15 16 16 17 #include <soc/fsl/dpaa2-io.h> 17 18 #include <soc/fsl/dpaa2-fd.h>
+1
drivers/net/ethernet/freescale/enetc/enetc.h
··· 11 11 #include <linux/if_vlan.h> 12 12 #include <linux/phylink.h> 13 13 #include <linux/dim.h> 14 + #include <net/xdp.h> 14 15 15 16 #include "enetc_hw.h" 16 17
+1
drivers/net/ethernet/freescale/fec.h
··· 22 22 #include <linux/timecounter.h> 23 23 #include <dt-bindings/firmware/imx/rsrc.h> 24 24 #include <linux/firmware/imx/sci.h> 25 + #include <net/xdp.h> 25 26 26 27 #if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \ 27 28 defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM) || \
+1
drivers/net/ethernet/fungible/funeth/funeth_txrx.h
··· 5 5 6 6 #include <linux/netdevice.h> 7 7 #include <linux/u64_stats_sync.h> 8 + #include <net/xdp.h> 8 9 9 10 /* Tx descriptor size */ 10 11 #define FUNETH_SQE_SIZE 64U
+1
drivers/net/ethernet/google/gve/gve.h
··· 11 11 #include <linux/netdevice.h> 12 12 #include <linux/pci.h> 13 13 #include <linux/u64_stats_sync.h> 14 + #include <net/xdp.h> 14 15 15 16 #include "gve_desc.h" 16 17 #include "gve_desc_dqo.h"
+1
drivers/net/ethernet/intel/igc/igc.h
··· 15 15 #include <linux/net_tstamp.h> 16 16 #include <linux/bitfield.h> 17 17 #include <linux/hrtimer.h> 18 + #include <net/xdp.h> 18 19 19 20 #include "igc_hw.h" 20 21
+1
drivers/net/ethernet/microchip/lan966x/lan966x_main.h
··· 14 14 #include <net/pkt_cls.h> 15 15 #include <net/pkt_sched.h> 16 16 #include <net/switchdev.h> 17 + #include <net/xdp.h> 17 18 18 19 #include <vcap_api.h> 19 20 #include <vcap_api_client.h>
+1
drivers/net/ethernet/microsoft/mana/mana_en.c
··· 11 11 12 12 #include <net/checksum.h> 13 13 #include <net/ip6_checksum.h> 14 + #include <net/xdp.h> 14 15 15 16 #include <net/mana/mana.h> 16 17 #include <net/mana/mana_auxiliary.h>
+1
drivers/net/ethernet/stmicro/stmmac/stmmac.h
··· 22 22 #include <linux/net_tstamp.h> 23 23 #include <linux/reset.h> 24 24 #include <net/page_pool.h> 25 + #include <net/xdp.h> 25 26 #include <uapi/linux/bpf.h> 26 27 27 28 struct stmmac_resources {
+1
drivers/net/ethernet/ti/cpsw_priv.h
··· 6 6 #ifndef DRIVERS_NET_ETHERNET_TI_CPSW_PRIV_H_ 7 7 #define DRIVERS_NET_ETHERNET_TI_CPSW_PRIV_H_ 8 8 9 + #include <net/xdp.h> 9 10 #include <uapi/linux/bpf.h> 10 11 11 12 #include "davinci_cpdma.h"
+1
drivers/net/hyperv/hyperv_net.h
··· 16 16 #include <linux/hyperv.h> 17 17 #include <linux/rndis.h> 18 18 #include <linux/jhash.h> 19 + #include <net/xdp.h> 19 20 20 21 /* RSS related */ 21 22 #define OID_GEN_RECEIVE_SCALE_CAPABILITIES 0x00010203 /* query only */
+1
drivers/net/tap.c
··· 22 22 #include <net/net_namespace.h> 23 23 #include <net/rtnetlink.h> 24 24 #include <net/sock.h> 25 + #include <net/xdp.h> 25 26 #include <linux/virtio_net.h> 26 27 #include <linux/skb_array.h> 27 28
+1
drivers/net/virtio_net.c
··· 22 22 #include <net/route.h> 23 23 #include <net/xdp.h> 24 24 #include <net/net_failover.h> 25 + #include <net/netdev_rx_queue.h> 25 26 26 27 static int napi_weight = NAPI_POLL_WEIGHT; 27 28 module_param(napi_weight, int, 0444);
+12
include/linux/bpf.h
··· 2661 2661 } 2662 2662 #endif /* CONFIG_BPF_SYSCALL */ 2663 2663 2664 + static __always_inline int 2665 + bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr) 2666 + { 2667 + int ret = -EFAULT; 2668 + 2669 + if (IS_ENABLED(CONFIG_BPF_EVENTS)) 2670 + ret = copy_from_kernel_nofault(dst, unsafe_ptr, size); 2671 + if (unlikely(ret < 0)) 2672 + memset(dst, 0, size); 2673 + return ret; 2674 + } 2675 + 2664 2676 void __bpf_free_used_btfs(struct bpf_prog_aux *aux, 2665 2677 struct btf_mod_pair *used_btfs, u32 len); 2666 2678
+13 -21
include/linux/filter.h
··· 69 69 /* unused opcode to mark special load instruction. Same as BPF_ABS */ 70 70 #define BPF_PROBE_MEM 0x20 71 71 72 + /* unused opcode to mark special ldsx instruction. Same as BPF_IND */ 73 + #define BPF_PROBE_MEMSX 0x40 74 + 72 75 /* unused opcode to mark call to interpreter with arguments */ 73 76 #define BPF_CALL_ARGS 0xe0 74 77 ··· 93 90 94 91 /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */ 95 92 96 - #define BPF_ALU64_REG(OP, DST, SRC) \ 93 + #define BPF_ALU64_REG_OFF(OP, DST, SRC, OFF) \ 97 94 ((struct bpf_insn) { \ 98 95 .code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \ 99 96 .dst_reg = DST, \ 100 97 .src_reg = SRC, \ 101 - .off = 0, \ 98 + .off = OFF, \ 102 99 .imm = 0 }) 103 100 104 - #define BPF_ALU32_REG(OP, DST, SRC) \ 101 + #define BPF_ALU64_REG(OP, DST, SRC) \ 102 + BPF_ALU64_REG_OFF(OP, DST, SRC, 0) 103 + 104 + #define BPF_ALU32_REG_OFF(OP, DST, SRC, OFF) \ 105 105 ((struct bpf_insn) { \ 106 106 .code = BPF_ALU | BPF_OP(OP) | BPF_X, \ 107 107 .dst_reg = DST, \ 108 108 .src_reg = SRC, \ 109 - .off = 0, \ 109 + .off = OFF, \ 110 110 .imm = 0 }) 111 + 112 + #define BPF_ALU32_REG(OP, DST, SRC) \ 113 + BPF_ALU32_REG_OFF(OP, DST, SRC, 0) 111 114 112 115 /* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */ 113 116 ··· 773 764 DECLARE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key); 774 765 775 766 u32 xdp_master_redirect(struct xdp_buff *xdp); 776 - 777 - static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, 778 - struct xdp_buff *xdp) 779 - { 780 - /* Driver XDP hooks are invoked within a single NAPI poll cycle and thus 781 - * under local_bh_disable(), which provides the needed RCU protection 782 - * for accessing map entries. 
783 - */ 784 - u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); 785 - 786 - if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) { 787 - if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev)) 788 - act = xdp_master_redirect(xdp); 789 - } 790 - 791 - return act; 792 - } 793 767 794 768 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog); 795 769
+4 -51
include/linux/netdevice.h
··· 40 40 #include <net/dcbnl.h> 41 41 #endif 42 42 #include <net/netprio_cgroup.h> 43 - #include <net/xdp.h> 44 43 45 44 #include <linux/netdev_features.h> 46 45 #include <linux/neighbour.h> ··· 76 77 struct udp_tunnel_nic; 77 78 struct bpf_prog; 78 79 struct xdp_buff; 80 + struct xdp_frame; 81 + struct xdp_metadata_ops; 79 82 struct xdp_md; 83 + 84 + typedef u32 xdp_features_t; 80 85 81 86 void synchronize_net(void); 82 87 void netdev_set_default_ethtool_ops(struct net_device *dev, ··· 785 782 u16 filter_id); 786 783 #endif 787 784 #endif /* CONFIG_RPS */ 788 - 789 - /* This structure contains an instance of an RX queue. */ 790 - struct netdev_rx_queue { 791 - struct xdp_rxq_info xdp_rxq; 792 - #ifdef CONFIG_RPS 793 - struct rps_map __rcu *rps_map; 794 - struct rps_dev_flow_table __rcu *rps_flow_table; 795 - #endif 796 - struct kobject kobj; 797 - struct net_device *dev; 798 - netdevice_tracker dev_tracker; 799 - 800 - #ifdef CONFIG_XDP_SOCKETS 801 - struct xsk_buff_pool *pool; 802 - #endif 803 - } ____cacheline_aligned_in_smp; 804 - 805 - /* 806 - * RX queue sysfs structures and functions. 807 - */ 808 - struct rx_queue_attribute { 809 - struct attribute attr; 810 - ssize_t (*show)(struct netdev_rx_queue *queue, char *buf); 811 - ssize_t (*store)(struct netdev_rx_queue *queue, 812 - const char *buf, size_t len); 813 - }; 814 785 815 786 /* XPS map type and offset of the xps map within net_device->xps_maps[]. 
*/ 816 787 enum xps_map_type { ··· 1645 1668 int (*ndo_hwtstamp_set)(struct net_device *dev, 1646 1669 struct kernel_hwtstamp_config *kernel_config, 1647 1670 struct netlink_ext_ack *extack); 1648 - }; 1649 - 1650 - struct xdp_metadata_ops { 1651 - int (*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp); 1652 - int (*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash, 1653 - enum xdp_rss_hash_type *rss_type); 1654 1671 }; 1655 1672 1656 1673 /** ··· 3821 3850 #endif 3822 3851 int netif_set_real_num_queues(struct net_device *dev, 3823 3852 unsigned int txq, unsigned int rxq); 3824 - 3825 - static inline struct netdev_rx_queue * 3826 - __netif_get_rx_queue(struct net_device *dev, unsigned int rxq) 3827 - { 3828 - return dev->_rx + rxq; 3829 - } 3830 - 3831 - #ifdef CONFIG_SYSFS 3832 - static inline unsigned int get_netdev_rx_queue_index( 3833 - struct netdev_rx_queue *queue) 3834 - { 3835 - struct net_device *dev = queue->dev; 3836 - int index = queue - dev->_rx; 3837 - 3838 - BUG_ON(index >= dev->num_rx_queues); 3839 - return index; 3840 - } 3841 - #endif 3842 3853 3843 3854 int netif_get_num_default_rss_queues(void); 3844 3855
+10
include/linux/netfilter.h
··· 11 11 #include <linux/wait.h> 12 12 #include <linux/list.h> 13 13 #include <linux/static_key.h> 14 + #include <linux/module.h> 14 15 #include <linux/netfilter_defs.h> 15 16 #include <linux/netdevice.h> 16 17 #include <linux/sockptr.h> ··· 481 480 enum ip_conntrack_info ctinfo, s32 off); 482 481 }; 483 482 extern const struct nfnl_ct_hook __rcu *nfnl_ct_hook; 483 + 484 + struct nf_defrag_hook { 485 + struct module *owner; 486 + int (*enable)(struct net *net); 487 + void (*disable)(struct net *net); 488 + }; 489 + 490 + extern const struct nf_defrag_hook __rcu *nf_defrag_v4_hook; 491 + extern const struct nf_defrag_hook __rcu *nf_defrag_v6_hook; 484 492 485 493 /* 486 494 * nf_skb_duplicated - TEE target has sent a packet
+1
include/net/busy_poll.h
··· 16 16 #include <linux/sched/clock.h> 17 17 #include <linux/sched/signal.h> 18 18 #include <net/ip.h> 19 + #include <net/xdp.h> 19 20 20 21 /* 0 - Reserved to indicate value not set 21 22 * 1..NR_CPUS - Reserved for sender_cpu
+75 -4
include/net/inet6_hashtables.h
··· 48 48 const u16 hnum, const int dif, 49 49 const int sdif); 50 50 51 + typedef u32 (inet6_ehashfn_t)(const struct net *net, 52 + const struct in6_addr *laddr, const u16 lport, 53 + const struct in6_addr *faddr, const __be16 fport); 54 + 55 + inet6_ehashfn_t inet6_ehashfn; 56 + 57 + INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn); 58 + 59 + struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, 60 + struct sk_buff *skb, int doff, 61 + const struct in6_addr *saddr, 62 + __be16 sport, 63 + const struct in6_addr *daddr, 64 + unsigned short hnum, 65 + inet6_ehashfn_t *ehashfn); 66 + 51 67 struct sock *inet6_lookup_listener(struct net *net, 52 68 struct inet_hashinfo *hashinfo, 53 69 struct sk_buff *skb, int doff, ··· 72 56 const struct in6_addr *daddr, 73 57 const unsigned short hnum, 74 58 const int dif, const int sdif); 59 + 60 + struct sock *inet6_lookup_run_sk_lookup(struct net *net, 61 + int protocol, 62 + struct sk_buff *skb, int doff, 63 + const struct in6_addr *saddr, 64 + const __be16 sport, 65 + const struct in6_addr *daddr, 66 + const u16 hnum, const int dif, 67 + inet6_ehashfn_t *ehashfn); 75 68 76 69 static inline struct sock *__inet6_lookup(struct net *net, 77 70 struct inet_hashinfo *hashinfo, ··· 103 78 daddr, hnum, dif, sdif); 104 79 } 105 80 81 + static inline 82 + struct sock *inet6_steal_sock(struct net *net, struct sk_buff *skb, int doff, 83 + const struct in6_addr *saddr, const __be16 sport, 84 + const struct in6_addr *daddr, const __be16 dport, 85 + bool *refcounted, inet6_ehashfn_t *ehashfn) 86 + { 87 + struct sock *sk, *reuse_sk; 88 + bool prefetched; 89 + 90 + sk = skb_steal_sock(skb, refcounted, &prefetched); 91 + if (!sk) 92 + return NULL; 93 + 94 + if (!prefetched) 95 + return sk; 96 + 97 + if (sk->sk_protocol == IPPROTO_TCP) { 98 + if (sk->sk_state != TCP_LISTEN) 99 + return sk; 100 + } else if (sk->sk_protocol == IPPROTO_UDP) { 101 + if (sk->sk_state != TCP_CLOSE) 102 + return sk; 103 + } else { 104 + return sk; 
105 + } 106 + 107 + reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, 108 + saddr, sport, daddr, ntohs(dport), 109 + ehashfn); 110 + if (!reuse_sk) 111 + return sk; 112 + 113 + /* We've chosen a new reuseport sock which is never refcounted. This 114 + * implies that sk also isn't refcounted. 115 + */ 116 + WARN_ON_ONCE(*refcounted); 117 + 118 + return reuse_sk; 119 + } 120 + 106 121 static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo, 107 122 struct sk_buff *skb, int doff, 108 123 const __be16 sport, ··· 150 85 int iif, int sdif, 151 86 bool *refcounted) 152 87 { 153 - struct sock *sk = skb_steal_sock(skb, refcounted); 88 + struct net *net = dev_net(skb_dst(skb)->dev); 89 + const struct ipv6hdr *ip6h = ipv6_hdr(skb); 90 + struct sock *sk; 154 91 92 + sk = inet6_steal_sock(net, skb, doff, &ip6h->saddr, sport, &ip6h->daddr, dport, 93 + refcounted, inet6_ehashfn); 94 + if (IS_ERR(sk)) 95 + return NULL; 155 96 if (sk) 156 97 return sk; 157 98 158 - return __inet6_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb, 159 - doff, &ipv6_hdr(skb)->saddr, sport, 160 - &ipv6_hdr(skb)->daddr, ntohs(dport), 99 + return __inet6_lookup(net, hashinfo, skb, 100 + doff, &ip6h->saddr, sport, 101 + &ip6h->daddr, ntohs(dport), 161 102 iif, sdif, refcounted); 162 103 } 163 104
+68 -6
include/net/inet_hashtables.h
··· 379 379 const __be32 daddr, const u16 hnum, 380 380 const int dif, const int sdif); 381 381 382 + typedef u32 (inet_ehashfn_t)(const struct net *net, 383 + const __be32 laddr, const __u16 lport, 384 + const __be32 faddr, const __be16 fport); 385 + 386 + inet_ehashfn_t inet_ehashfn; 387 + 388 + INDIRECT_CALLABLE_DECLARE(inet_ehashfn_t udp_ehashfn); 389 + 390 + struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, 391 + struct sk_buff *skb, int doff, 392 + __be32 saddr, __be16 sport, 393 + __be32 daddr, unsigned short hnum, 394 + inet_ehashfn_t *ehashfn); 395 + 396 + struct sock *inet_lookup_run_sk_lookup(struct net *net, 397 + int protocol, 398 + struct sk_buff *skb, int doff, 399 + __be32 saddr, __be16 sport, 400 + __be32 daddr, u16 hnum, const int dif, 401 + inet_ehashfn_t *ehashfn); 402 + 382 403 static inline struct sock * 383 404 inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo, 384 405 const __be32 saddr, const __be16 sport, ··· 449 428 return sk; 450 429 } 451 430 431 + static inline 432 + struct sock *inet_steal_sock(struct net *net, struct sk_buff *skb, int doff, 433 + const __be32 saddr, const __be16 sport, 434 + const __be32 daddr, const __be16 dport, 435 + bool *refcounted, inet_ehashfn_t *ehashfn) 436 + { 437 + struct sock *sk, *reuse_sk; 438 + bool prefetched; 439 + 440 + sk = skb_steal_sock(skb, refcounted, &prefetched); 441 + if (!sk) 442 + return NULL; 443 + 444 + if (!prefetched) 445 + return sk; 446 + 447 + if (sk->sk_protocol == IPPROTO_TCP) { 448 + if (sk->sk_state != TCP_LISTEN) 449 + return sk; 450 + } else if (sk->sk_protocol == IPPROTO_UDP) { 451 + if (sk->sk_state != TCP_CLOSE) 452 + return sk; 453 + } else { 454 + return sk; 455 + } 456 + 457 + reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, 458 + saddr, sport, daddr, ntohs(dport), 459 + ehashfn); 460 + if (!reuse_sk) 461 + return sk; 462 + 463 + /* We've chosen a new reuseport sock which is never refcounted. 
This 464 + * implies that sk also isn't refcounted. 465 + */ 466 + WARN_ON_ONCE(*refcounted); 467 + 468 + return reuse_sk; 469 + } 470 + 452 471 static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo, 453 472 struct sk_buff *skb, 454 473 int doff, ··· 497 436 const int sdif, 498 437 bool *refcounted) 499 438 { 500 - struct sock *sk = skb_steal_sock(skb, refcounted); 439 + struct net *net = dev_net(skb_dst(skb)->dev); 501 440 const struct iphdr *iph = ip_hdr(skb); 441 + struct sock *sk; 502 442 443 + sk = inet_steal_sock(net, skb, doff, iph->saddr, sport, iph->daddr, dport, 444 + refcounted, inet_ehashfn); 445 + if (IS_ERR(sk)) 446 + return NULL; 503 447 if (sk) 504 448 return sk; 505 449 506 - return __inet_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb, 450 + return __inet_lookup(net, hashinfo, skb, 507 451 doff, iph->saddr, sport, 508 452 iph->daddr, dport, inet_iif(skb), sdif, 509 453 refcounted); 510 454 } 511 - 512 - u32 inet6_ehashfn(const struct net *net, 513 - const struct in6_addr *laddr, const u16 lport, 514 - const struct in6_addr *faddr, const __be16 fport); 515 455 516 456 static inline void sk_daddr_set(struct sock *sk, __be32 addr) 517 457 {
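In both address families the prefetched-socket path above uses the BPF-assigned socket directly unless it is a listening TCP socket or an unconnected UDP socket, in which case a reuseport group re-lookup is run. That gating can be expressed as a pure predicate (protocol and state constants mirrored from the kernel's enums; the function name is illustrative):

```c
#include <stdbool.h>

/* Values mirrored from the kernel's IPPROTO_* and TCP_* enums. */
#define IPPROTO_TCP 6
#define IPPROTO_UDP 17
#define TCP_CLOSE   7
#define TCP_LISTEN  10

/* Returns true when inet_steal_sock()/inet6_steal_sock() must fall
 * back to a reuseport re-lookup instead of using the prefetched sock. */
static bool needs_reuseport_lookup(int protocol, int state)
{
	if (protocol == IPPROTO_TCP)
		return state == TCP_LISTEN;
	if (protocol == IPPROTO_UDP)
		return state == TCP_CLOSE;
	return false; /* other protocols: always use the socket as-is */
}
```
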
+2
include/net/mana/mana.h
··· 4 4 #ifndef _MANA_H 5 5 #define _MANA_H 6 6 7 + #include <net/xdp.h> 8 + 7 9 #include "gdma.h" 8 10 #include "hw_channel.h" 9 11
+53
include/net/netdev_rx_queue.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _LINUX_NETDEV_RX_QUEUE_H 3 + #define _LINUX_NETDEV_RX_QUEUE_H 4 + 5 + #include <linux/kobject.h> 6 + #include <linux/netdevice.h> 7 + #include <linux/sysfs.h> 8 + #include <net/xdp.h> 9 + 10 + /* This structure contains an instance of an RX queue. */ 11 + struct netdev_rx_queue { 12 + struct xdp_rxq_info xdp_rxq; 13 + #ifdef CONFIG_RPS 14 + struct rps_map __rcu *rps_map; 15 + struct rps_dev_flow_table __rcu *rps_flow_table; 16 + #endif 17 + struct kobject kobj; 18 + struct net_device *dev; 19 + netdevice_tracker dev_tracker; 20 + 21 + #ifdef CONFIG_XDP_SOCKETS 22 + struct xsk_buff_pool *pool; 23 + #endif 24 + } ____cacheline_aligned_in_smp; 25 + 26 + /* 27 + * RX queue sysfs structures and functions. 28 + */ 29 + struct rx_queue_attribute { 30 + struct attribute attr; 31 + ssize_t (*show)(struct netdev_rx_queue *queue, char *buf); 32 + ssize_t (*store)(struct netdev_rx_queue *queue, 33 + const char *buf, size_t len); 34 + }; 35 + 36 + static inline struct netdev_rx_queue * 37 + __netif_get_rx_queue(struct net_device *dev, unsigned int rxq) 38 + { 39 + return dev->_rx + rxq; 40 + } 41 + 42 + #ifdef CONFIG_SYSFS 43 + static inline unsigned int 44 + get_netdev_rx_queue_index(struct netdev_rx_queue *queue) 45 + { 46 + struct net_device *dev = queue->dev; 47 + int index = queue - dev->_rx; 48 + 49 + BUG_ON(index >= dev->num_rx_queues); 50 + return index; 51 + } 52 + #endif 53 + #endif
+5 -2
include/net/sock.h
··· 2815 2815 * skb_steal_sock - steal a socket from an sk_buff 2816 2816 * @skb: sk_buff to steal the socket from 2817 2817 * @refcounted: is set to true if the socket is reference-counted 2818 + * @prefetched: is set to true if the socket was assigned from bpf 2818 2819 */ 2819 2820 static inline struct sock * 2820 - skb_steal_sock(struct sk_buff *skb, bool *refcounted) 2821 + skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched) 2821 2822 { 2822 2823 if (skb->sk) { 2823 2824 struct sock *sk = skb->sk; 2824 2825 2825 2826 *refcounted = true; 2826 - if (skb_sk_is_prefetched(skb)) 2827 + *prefetched = skb_sk_is_prefetched(skb); 2828 + if (*prefetched) 2827 2829 *refcounted = sk_is_refcounted(sk); 2828 2830 skb->destructor = NULL; 2829 2831 skb->sk = NULL; 2830 2832 return sk; 2831 2833 } 2834 + *prefetched = false; 2832 2835 *refcounted = false; 2833 2836 return NULL; 2834 2837 }
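The reworked `skb_steal_sock()` now always reports via the new `prefetched` out-parameter whether the socket was assigned by BPF, instead of folding that fact into `refcounted` alone. A userspace mirror of the flag logic (the struct and helper names are illustrative, not kernel APIs):

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_skb {
	void *sk;
	bool sk_prefetched;  /* stands in for skb_sk_is_prefetched() */
	bool sk_refcounted;  /* stands in for sk_is_refcounted() */
};

/* Mirrors the hunk above: both out-params are always written, and a
 * prefetched socket may override the default refcounted == true. */
static void *steal_sock(struct fake_skb *skb, bool *refcounted,
			bool *prefetched)
{
	if (skb->sk) {
		void *sk = skb->sk;

		*refcounted = true;
		*prefetched = skb->sk_prefetched;
		if (*prefetched)
			*refcounted = skb->sk_refcounted;
		skb->sk = NULL;
		return sk;
	}
	*prefetched = false;
	*refcounted = false;
	return NULL;
}
```
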
+25 -4
include/net/xdp.h
··· 6 6 #ifndef __LINUX_NET_XDP_H__ 7 7 #define __LINUX_NET_XDP_H__ 8 8 9 - #include <linux/skbuff.h> /* skb_shared_info */ 10 - #include <uapi/linux/netdev.h> 11 9 #include <linux/bitfield.h> 10 + #include <linux/filter.h> 11 + #include <linux/netdevice.h> 12 + #include <linux/skbuff.h> /* skb_shared_info */ 12 13 13 14 /** 14 15 * DOC: XDP RX-queue information ··· 45 44 MEM_TYPE_XSK_BUFF_POOL, 46 45 MEM_TYPE_MAX, 47 46 }; 48 - 49 - typedef u32 xdp_features_t; 50 47 51 48 /* XDP flags for ndo_xdp_xmit */ 52 49 #define XDP_XMIT_FLUSH (1U << 0) /* doorbell signal consumer */ ··· 442 443 XDP_RSS_TYPE_L4_IPV6_SCTP_EX = XDP_RSS_TYPE_L4_IPV6_SCTP | XDP_RSS_L3_DYNHDR, 443 444 }; 444 445 446 + struct xdp_metadata_ops { 447 + int (*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp); 448 + int (*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash, 449 + enum xdp_rss_hash_type *rss_type); 450 + }; 451 + 445 452 #ifdef CONFIG_NET 446 453 u32 bpf_xdp_metadata_kfunc_id(int id); 447 454 bool bpf_dev_bound_kfunc_id(u32 btf_id); ··· 479 474 xdp_set_features_flag(dev, 0); 480 475 } 481 476 477 + static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, 478 + struct xdp_buff *xdp) 479 + { 480 + /* Driver XDP hooks are invoked within a single NAPI poll cycle and thus 481 + * under local_bh_disable(), which provides the needed RCU protection 482 + * for accessing map entries. 483 + */ 484 + u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); 485 + 486 + if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) { 487 + if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev)) 488 + act = xdp_master_redirect(xdp); 489 + } 490 + 491 + return act; 492 + } 482 493 #endif /* __LINUX_NET_XDP_H__ */
+18
include/trace/events/xdp.h
··· 9 9 #include <linux/filter.h> 10 10 #include <linux/tracepoint.h> 11 11 #include <linux/bpf.h> 12 + #include <net/xdp.h> 12 13 13 14 #define __XDP_ACT_MAP(FN) \ 14 15 FN(ABORTED) \ ··· 403 402 __print_symbolic(__entry->mem_type, __MEM_TYPE_SYM_TAB), 404 403 __entry->page 405 404 ) 405 + ); 406 + 407 + TRACE_EVENT(bpf_xdp_link_attach_failed, 408 + 409 + TP_PROTO(const char *msg), 410 + 411 + TP_ARGS(msg), 412 + 413 + TP_STRUCT__entry( 414 + __string(msg, msg) 415 + ), 416 + 417 + TP_fast_assign( 418 + __assign_str(msg, msg); 419 + ), 420 + 421 + TP_printk("errmsg=%s", __get_str(msg)) 406 422 ); 407 423 408 424 #endif /* _TRACE_XDP_H */
+6 -3
include/uapi/linux/bpf.h
··· 19 19 20 20 /* ld/ldx fields */ 21 21 #define BPF_DW 0x18 /* double word (64-bit) */ 22 + #define BPF_MEMSX 0x80 /* load with sign extension */ 22 23 #define BPF_ATOMIC 0xc0 /* atomic memory ops - op type in immediate */ 23 24 #define BPF_XADD 0xc0 /* exclusive add - legacy name */ 24 25 ··· 1187 1186 * BPF_TRACE_KPROBE_MULTI attach type to create return probe. 1188 1187 */ 1189 1188 #define BPF_F_KPROBE_MULTI_RETURN (1U << 0) 1189 + 1190 + /* link_create.netfilter.flags used in LINK_CREATE command for 1191 + * BPF_PROG_TYPE_NETFILTER to enable IP packet defragmentation. 1192 + */ 1193 + #define BPF_F_NETFILTER_IP_DEFRAG (1U << 0) 1190 1194 1191 1195 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have 1192 1196 * the following extensions: ··· 4203 4197 * 4204 4198 * **-EOPNOTSUPP** if the operation is not supported, for example 4205 4199 * a call from outside of TC ingress. 4206 - * 4207 - * **-ESOCKTNOSUPPORT** if the socket type is not supported 4208 - * (reuseport). 4209 4200 * 4210 4201 * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags) 4211 4202 * Description
+1
kernel/bpf/btf.c
··· 29 29 #include <net/netfilter/nf_bpf_link.h> 30 30 31 31 #include <net/sock.h> 32 + #include <net/xdp.h> 32 33 #include "../tools/lib/bpf/relo_core.h" 33 34 34 35 /* BTF (BPF Type Format) is the meta data format which describes
+175 -31
kernel/bpf/core.c
··· 61 61 #define AX regs[BPF_REG_AX] 62 62 #define ARG1 regs[BPF_REG_ARG1] 63 63 #define CTX regs[BPF_REG_CTX] 64 + #define OFF insn->off 64 65 #define IMM insn->imm 65 66 66 67 struct bpf_mem_alloc bpf_global_ma; ··· 373 372 { 374 373 const s32 off_min = S16_MIN, off_max = S16_MAX; 375 374 s32 delta = end_new - end_old; 376 - s32 off = insn->off; 375 + s32 off; 376 + 377 + if (insn->code == (BPF_JMP32 | BPF_JA)) 378 + off = insn->imm; 379 + else 380 + off = insn->off; 377 381 378 382 if (curr < pos && curr + off + 1 >= end_old) 379 383 off += delta; ··· 386 380 off -= delta; 387 381 if (off < off_min || off > off_max) 388 382 return -ERANGE; 389 - if (!probe_pass) 390 - insn->off = off; 383 + if (!probe_pass) { 384 + if (insn->code == (BPF_JMP32 | BPF_JA)) 385 + insn->imm = off; 386 + else 387 + insn->off = off; 388 + } 391 389 return 0; 392 390 } 393 391 ··· 1281 1271 case BPF_ALU | BPF_MOD | BPF_K: 1282 1272 *to++ = BPF_ALU32_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm); 1283 1273 *to++ = BPF_ALU32_IMM(BPF_XOR, BPF_REG_AX, imm_rnd); 1284 - *to++ = BPF_ALU32_REG(from->code, from->dst_reg, BPF_REG_AX); 1274 + *to++ = BPF_ALU32_REG_OFF(from->code, from->dst_reg, BPF_REG_AX, from->off); 1285 1275 break; 1286 1276 1287 1277 case BPF_ALU64 | BPF_ADD | BPF_K: ··· 1295 1285 case BPF_ALU64 | BPF_MOD | BPF_K: 1296 1286 *to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm); 1297 1287 *to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd); 1298 - *to++ = BPF_ALU64_REG(from->code, from->dst_reg, BPF_REG_AX); 1288 + *to++ = BPF_ALU64_REG_OFF(from->code, from->dst_reg, BPF_REG_AX, from->off); 1299 1289 break; 1300 1290 1301 1291 case BPF_JMP | BPF_JEQ | BPF_K: ··· 1533 1523 INSN_3(ALU64, DIV, X), \ 1534 1524 INSN_3(ALU64, MOD, X), \ 1535 1525 INSN_2(ALU64, NEG), \ 1526 + INSN_3(ALU64, END, TO_LE), \ 1536 1527 /* Immediate based. 
*/ \ 1537 1528 INSN_3(ALU64, ADD, K), \ 1538 1529 INSN_3(ALU64, SUB, K), \ ··· 1602 1591 INSN_3(JMP, JSLE, K), \ 1603 1592 INSN_3(JMP, JSET, K), \ 1604 1593 INSN_2(JMP, JA), \ 1594 + INSN_2(JMP32, JA), \ 1605 1595 /* Store instructions. */ \ 1606 1596 /* Register based. */ \ 1607 1597 INSN_3(STX, MEM, B), \ ··· 1622 1610 INSN_3(LDX, MEM, H), \ 1623 1611 INSN_3(LDX, MEM, W), \ 1624 1612 INSN_3(LDX, MEM, DW), \ 1613 + INSN_3(LDX, MEMSX, B), \ 1614 + INSN_3(LDX, MEMSX, H), \ 1615 + INSN_3(LDX, MEMSX, W), \ 1625 1616 /* Immediate based. */ \ 1626 1617 INSN_3(LD, IMM, DW) 1627 1618 ··· 1650 1635 } 1651 1636 1652 1637 #ifndef CONFIG_BPF_JIT_ALWAYS_ON 1653 - u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr) 1654 - { 1655 - memset(dst, 0, size); 1656 - return -EFAULT; 1657 - } 1658 - 1659 1638 /** 1660 1639 * ___bpf_prog_run - run eBPF program on a given context 1661 1640 * @regs: is the array of MAX_BPF_EXT_REG eBPF pseudo-registers ··· 1675 1666 [BPF_LDX | BPF_PROBE_MEM | BPF_H] = &&LDX_PROBE_MEM_H, 1676 1667 [BPF_LDX | BPF_PROBE_MEM | BPF_W] = &&LDX_PROBE_MEM_W, 1677 1668 [BPF_LDX | BPF_PROBE_MEM | BPF_DW] = &&LDX_PROBE_MEM_DW, 1669 + [BPF_LDX | BPF_PROBE_MEMSX | BPF_B] = &&LDX_PROBE_MEMSX_B, 1670 + [BPF_LDX | BPF_PROBE_MEMSX | BPF_H] = &&LDX_PROBE_MEMSX_H, 1671 + [BPF_LDX | BPF_PROBE_MEMSX | BPF_W] = &&LDX_PROBE_MEMSX_W, 1678 1672 }; 1679 1673 #undef BPF_INSN_3_LBL 1680 1674 #undef BPF_INSN_2_LBL ··· 1745 1733 DST = -DST; 1746 1734 CONT; 1747 1735 ALU_MOV_X: 1748 - DST = (u32) SRC; 1736 + switch (OFF) { 1737 + case 0: 1738 + DST = (u32) SRC; 1739 + break; 1740 + case 8: 1741 + DST = (u32)(s8) SRC; 1742 + break; 1743 + case 16: 1744 + DST = (u32)(s16) SRC; 1745 + break; 1746 + } 1749 1747 CONT; 1750 1748 ALU_MOV_K: 1751 1749 DST = (u32) IMM; 1752 1750 CONT; 1753 1751 ALU64_MOV_X: 1754 - DST = SRC; 1752 + switch (OFF) { 1753 + case 0: 1754 + DST = SRC; 1755 + break; 1756 + case 8: 1757 + DST = (s8) SRC; 1758 + break; 1759 + case 16: 1760 + DST 
= (s16) SRC; 1761 + break; 1762 + case 32: 1763 + DST = (s32) SRC; 1764 + break; 1765 + } 1755 1766 CONT; 1756 1767 ALU64_MOV_K: 1757 1768 DST = IMM; ··· 1796 1761 (*(s64 *) &DST) >>= IMM; 1797 1762 CONT; 1798 1763 ALU64_MOD_X: 1799 - div64_u64_rem(DST, SRC, &AX); 1800 - DST = AX; 1764 + switch (OFF) { 1765 + case 0: 1766 + div64_u64_rem(DST, SRC, &AX); 1767 + DST = AX; 1768 + break; 1769 + case 1: 1770 + AX = div64_s64(DST, SRC); 1771 + DST = DST - AX * SRC; 1772 + break; 1773 + } 1801 1774 CONT; 1802 1775 ALU_MOD_X: 1803 - AX = (u32) DST; 1804 - DST = do_div(AX, (u32) SRC); 1776 + switch (OFF) { 1777 + case 0: 1778 + AX = (u32) DST; 1779 + DST = do_div(AX, (u32) SRC); 1780 + break; 1781 + case 1: 1782 + AX = abs((s32)DST); 1783 + AX = do_div(AX, abs((s32)SRC)); 1784 + if ((s32)DST < 0) 1785 + DST = (u32)-AX; 1786 + else 1787 + DST = (u32)AX; 1788 + break; 1789 + } 1805 1790 CONT; 1806 1791 ALU64_MOD_K: 1807 - div64_u64_rem(DST, IMM, &AX); 1808 - DST = AX; 1792 + switch (OFF) { 1793 + case 0: 1794 + div64_u64_rem(DST, IMM, &AX); 1795 + DST = AX; 1796 + break; 1797 + case 1: 1798 + AX = div64_s64(DST, IMM); 1799 + DST = DST - AX * IMM; 1800 + break; 1801 + } 1809 1802 CONT; 1810 1803 ALU_MOD_K: 1811 - AX = (u32) DST; 1812 - DST = do_div(AX, (u32) IMM); 1804 + switch (OFF) { 1805 + case 0: 1806 + AX = (u32) DST; 1807 + DST = do_div(AX, (u32) IMM); 1808 + break; 1809 + case 1: 1810 + AX = abs((s32)DST); 1811 + AX = do_div(AX, abs((s32)IMM)); 1812 + if ((s32)DST < 0) 1813 + DST = (u32)-AX; 1814 + else 1815 + DST = (u32)AX; 1816 + break; 1817 + } 1813 1818 CONT; 1814 1819 ALU64_DIV_X: 1815 - DST = div64_u64(DST, SRC); 1820 + switch (OFF) { 1821 + case 0: 1822 + DST = div64_u64(DST, SRC); 1823 + break; 1824 + case 1: 1825 + DST = div64_s64(DST, SRC); 1826 + break; 1827 + } 1816 1828 CONT; 1817 1829 ALU_DIV_X: 1818 - AX = (u32) DST; 1819 - do_div(AX, (u32) SRC); 1820 - DST = (u32) AX; 1830 + switch (OFF) { 1831 + case 0: 1832 + AX = (u32) DST; 1833 + do_div(AX, (u32) 
SRC); 1834 + DST = (u32) AX; 1835 + break; 1836 + case 1: 1837 + AX = abs((s32)DST); 1838 + do_div(AX, abs((s32)SRC)); 1839 + if (((s32)DST < 0) == ((s32)SRC < 0)) 1840 + DST = (u32)AX; 1841 + else 1842 + DST = (u32)-AX; 1843 + break; 1844 + } 1821 1845 CONT; 1822 1846 ALU64_DIV_K: 1823 - DST = div64_u64(DST, IMM); 1847 + switch (OFF) { 1848 + case 0: 1849 + DST = div64_u64(DST, IMM); 1850 + break; 1851 + case 1: 1852 + DST = div64_s64(DST, IMM); 1853 + break; 1854 + } 1824 1855 CONT; 1825 1856 ALU_DIV_K: 1826 - AX = (u32) DST; 1827 - do_div(AX, (u32) IMM); 1828 - DST = (u32) AX; 1857 + switch (OFF) { 1858 + case 0: 1859 + AX = (u32) DST; 1860 + do_div(AX, (u32) IMM); 1861 + DST = (u32) AX; 1862 + break; 1863 + case 1: 1864 + AX = abs((s32)DST); 1865 + do_div(AX, abs((s32)IMM)); 1866 + if (((s32)DST < 0) == ((s32)IMM < 0)) 1867 + DST = (u32)AX; 1868 + else 1869 + DST = (u32)-AX; 1870 + break; 1871 + } 1829 1872 CONT; 1830 1873 ALU_END_TO_BE: 1831 1874 switch (IMM) { ··· 1928 1815 break; 1929 1816 case 64: 1930 1817 DST = (__force u64) cpu_to_le64(DST); 1818 + break; 1819 + } 1820 + CONT; 1821 + ALU64_END_TO_LE: 1822 + switch (IMM) { 1823 + case 16: 1824 + DST = (__force u16) __swab16(DST); 1825 + break; 1826 + case 32: 1827 + DST = (__force u32) __swab32(DST); 1828 + break; 1829 + case 64: 1830 + DST = (__force u64) __swab64(DST); 1931 1831 break; 1932 1832 } 1933 1833 CONT; ··· 1992 1866 } 1993 1867 JMP_JA: 1994 1868 insn += insn->off; 1869 + CONT; 1870 + JMP32_JA: 1871 + insn += insn->imm; 1995 1872 CONT; 1996 1873 JMP_EXIT: 1997 1874 return BPF_R0; ··· 2060 1931 DST = *(SIZE *)(unsigned long) (SRC + insn->off); \ 2061 1932 CONT; \ 2062 1933 LDX_PROBE_MEM_##SIZEOP: \ 2063 - bpf_probe_read_kernel(&DST, sizeof(SIZE), \ 2064 - (const void *)(long) (SRC + insn->off)); \ 1934 + bpf_probe_read_kernel_common(&DST, sizeof(SIZE), \ 1935 + (const void *)(long) (SRC + insn->off)); \ 2065 1936 DST = *((SIZE *)&DST); \ 2066 1937 CONT; 2067 1938 ··· 2070 1941 LDST(W, u32) 2071 
1942 LDST(DW, u64) 2072 1943 #undef LDST 1944 + 1945 + #define LDSX(SIZEOP, SIZE) \ 1946 + LDX_MEMSX_##SIZEOP: \ 1947 + DST = *(SIZE *)(unsigned long) (SRC + insn->off); \ 1948 + CONT; \ 1949 + LDX_PROBE_MEMSX_##SIZEOP: \ 1950 + bpf_probe_read_kernel_common(&DST, sizeof(SIZE), \ 1951 + (const void *)(long) (SRC + insn->off)); \ 1952 + DST = *((SIZE *)&DST); \ 1953 + CONT; 1954 + 1955 + LDSX(B, s8) 1956 + LDSX(H, s16) 1957 + LDSX(W, s32) 1958 + #undef LDSX 2073 1959 2074 1960 #define ATOMIC_ALU_OP(BOP, KOP) \ 2075 1961 case BOP: \
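The cpu v4 semantics the interpreter implements above reduce to plain C casts, truncating signed division, and unconditional byte swaps. A userspace sketch of three of them (function names are illustrative, not kernel APIs; `__builtin_bswap16` is a GCC/Clang builtin):

```c
#include <stdint.h>

/* BPF_MOV64 with off == 8 (movsx): sign-extend the low byte of src. */
static uint64_t movsx8_64(uint64_t src)
{
	return (uint64_t)(int8_t)src;
}

/* BPF_DIV/BPF_MOD with off == 1 (sdiv/smod): C division truncates
 * toward zero, matching the abs()/do_div() construction above. */
static int32_t sdiv32(int32_t dst, int32_t src)
{
	return dst / src;
}

static int32_t smod32(int32_t dst, int32_t src)
{
	return dst % src;
}

/* BPF_ALU64 | BPF_END with imm == 16: unconditional 16-bit swap. */
static uint64_t bswap16_64(uint64_t dst)
{
	return (uint16_t)__builtin_bswap16((uint16_t)dst);
}
```
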
-3
kernel/bpf/cpumap.c
··· 61 61 /* XDP can run multiple RX-ring queues, need __percpu enqueue store */ 62 62 struct xdp_bulk_queue __percpu *bulkq; 63 63 64 - struct bpf_cpu_map *cmap; 65 - 66 64 /* Queue with potential multi-producers, and single-consumer kthread */ 67 65 struct ptr_ring *queue; 68 66 struct task_struct *kthread; ··· 593 595 rcpu = __cpu_map_entry_alloc(map, &cpumap_value, key_cpu); 594 596 if (!rcpu) 595 597 return -ENOMEM; 596 - rcpu->cmap = cmap; 597 598 } 598 599 rcu_read_lock(); 599 600 __cpu_map_entry_replace(cmap, key_cpu, rcpu);
-2
kernel/bpf/devmap.c
··· 65 65 struct bpf_dtab_netdev { 66 66 struct net_device *dev; /* must be first member, due to tracepoint */ 67 67 struct hlist_node index_hlist; 68 - struct bpf_dtab *dtab; 69 68 struct bpf_prog *xdp_prog; 70 69 struct rcu_head rcu; 71 70 unsigned int idx; ··· 873 874 } 874 875 875 876 dev->idx = idx; 876 - dev->dtab = dtab; 877 877 if (prog) { 878 878 dev->xdp_prog = prog; 879 879 dev->val.bpf_prog.id = prog->aux->id;
+52 -6
kernel/bpf/disasm.c
··· 87 87 [BPF_END >> 4] = "endian", 88 88 }; 89 89 90 + const char *const bpf_alu_sign_string[16] = { 91 + [BPF_DIV >> 4] = "s/=", 92 + [BPF_MOD >> 4] = "s%=", 93 + }; 94 + 95 + const char *const bpf_movsx_string[4] = { 96 + [0] = "(s8)", 97 + [1] = "(s16)", 98 + [3] = "(s32)", 99 + }; 100 + 90 101 static const char *const bpf_atomic_alu_string[16] = { 91 102 [BPF_ADD >> 4] = "add", 92 103 [BPF_AND >> 4] = "and", ··· 110 99 [BPF_H >> 3] = "u16", 111 100 [BPF_B >> 3] = "u8", 112 101 [BPF_DW >> 3] = "u64", 102 + }; 103 + 104 + static const char *const bpf_ldsx_string[] = { 105 + [BPF_W >> 3] = "s32", 106 + [BPF_H >> 3] = "s16", 107 + [BPF_B >> 3] = "s8", 113 108 }; 114 109 115 110 static const char *const bpf_jmp_string[16] = { ··· 145 128 insn->imm, insn->dst_reg); 146 129 } 147 130 131 + static void print_bpf_bswap_insn(bpf_insn_print_t verbose, 132 + void *private_data, 133 + const struct bpf_insn *insn) 134 + { 135 + verbose(private_data, "(%02x) r%d = bswap%d r%d\n", 136 + insn->code, insn->dst_reg, 137 + insn->imm, insn->dst_reg); 138 + } 139 + 140 + static bool is_sdiv_smod(const struct bpf_insn *insn) 141 + { 142 + return (BPF_OP(insn->code) == BPF_DIV || BPF_OP(insn->code) == BPF_MOD) && 143 + insn->off == 1; 144 + } 145 + 146 + static bool is_movsx(const struct bpf_insn *insn) 147 + { 148 + return BPF_OP(insn->code) == BPF_MOV && 149 + (insn->off == 8 || insn->off == 16 || insn->off == 32); 150 + } 151 + 148 152 void print_bpf_insn(const struct bpf_insn_cbs *cbs, 149 153 const struct bpf_insn *insn, 150 154 bool allow_ptr_leaks) ··· 176 138 if (class == BPF_ALU || class == BPF_ALU64) { 177 139 if (BPF_OP(insn->code) == BPF_END) { 178 140 if (class == BPF_ALU64) 179 - verbose(cbs->private_data, "BUG_alu64_%02x\n", insn->code); 141 + print_bpf_bswap_insn(verbose, cbs->private_data, insn); 180 142 else 181 143 print_bpf_end_insn(verbose, cbs->private_data, insn); 182 144 } else if (BPF_OP(insn->code) == BPF_NEG) { ··· 185 147 insn->dst_reg, class == BPF_ALU ? 
'w' : 'r', 186 148 insn->dst_reg); 187 149 } else if (BPF_SRC(insn->code) == BPF_X) { 188 - verbose(cbs->private_data, "(%02x) %c%d %s %c%d\n", 150 + verbose(cbs->private_data, "(%02x) %c%d %s %s%c%d\n", 189 151 insn->code, class == BPF_ALU ? 'w' : 'r', 190 152 insn->dst_reg, 191 - bpf_alu_string[BPF_OP(insn->code) >> 4], 153 + is_sdiv_smod(insn) ? bpf_alu_sign_string[BPF_OP(insn->code) >> 4] 154 + : bpf_alu_string[BPF_OP(insn->code) >> 4], 155 + is_movsx(insn) ? bpf_movsx_string[(insn->off >> 3) - 1] : "", 192 156 class == BPF_ALU ? 'w' : 'r', 193 157 insn->src_reg); 194 158 } else { 195 159 verbose(cbs->private_data, "(%02x) %c%d %s %d\n", 196 160 insn->code, class == BPF_ALU ? 'w' : 'r', 197 161 insn->dst_reg, 198 - bpf_alu_string[BPF_OP(insn->code) >> 4], 162 + is_sdiv_smod(insn) ? bpf_alu_sign_string[BPF_OP(insn->code) >> 4] 163 + : bpf_alu_string[BPF_OP(insn->code) >> 4], 199 164 insn->imm); 200 165 } 201 166 } else if (class == BPF_STX) { ··· 259 218 verbose(cbs->private_data, "BUG_st_%02x\n", insn->code); 260 219 } 261 220 } else if (class == BPF_LDX) { 262 - if (BPF_MODE(insn->code) != BPF_MEM) { 221 + if (BPF_MODE(insn->code) != BPF_MEM && BPF_MODE(insn->code) != BPF_MEMSX) { 263 222 verbose(cbs->private_data, "BUG_ldx_%02x\n", insn->code); 264 223 return; 265 224 } 266 225 verbose(cbs->private_data, "(%02x) r%d = *(%s *)(r%d %+d)\n", 267 226 insn->code, insn->dst_reg, 268 - bpf_ldst_string[BPF_SIZE(insn->code) >> 3], 227 + BPF_MODE(insn->code) == BPF_MEM ? 
228 + bpf_ldst_string[BPF_SIZE(insn->code) >> 3] : 229 + bpf_ldsx_string[BPF_SIZE(insn->code) >> 3], 269 230 insn->src_reg, insn->off); 270 231 } else if (class == BPF_LD) { 271 232 if (BPF_MODE(insn->code) == BPF_ABS) { ··· 322 279 } else if (insn->code == (BPF_JMP | BPF_JA)) { 323 280 verbose(cbs->private_data, "(%02x) goto pc%+d\n", 324 281 insn->code, insn->off); 282 + } else if (insn->code == (BPF_JMP32 | BPF_JA)) { 283 + verbose(cbs->private_data, "(%02x) gotol pc%+d\n", 284 + insn->code, insn->imm); 325 285 } else if (insn->code == (BPF_JMP | BPF_EXIT)) { 326 286 verbose(cbs->private_data, "(%02x) exit\n", insn->code); 327 287 } else if (BPF_SRC(insn->code) == BPF_X) {
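The two predicates added to disasm.c key off the previously unused `off` field: `off == 1` marks the signed div/mod variants and `off` in {8, 16, 32} marks sign-extending moves. A self-contained mirror of that decoding (minimal `struct insn` stands in for `struct bpf_insn`; opcode values are from the BPF uapi):

```c
#include <stdint.h>

struct insn {
	uint8_t code;
	int16_t off;
};

#define BPF_OP(code) ((code) & 0xf0)
#define BPF_DIV 0x30
#define BPF_MOD 0x90
#define BPF_MOV 0xb0

/* cpu v4 encodes sdiv/smod as div/mod with off == 1. */
static int is_sdiv_smod(const struct insn *insn)
{
	return (BPF_OP(insn->code) == BPF_DIV ||
		BPF_OP(insn->code) == BPF_MOD) &&
	       insn->off == 1;
}

/* ...and movsx as mov with off giving the source width in bits. */
static int is_movsx(const struct insn *insn)
{
	return BPF_OP(insn->code) == BPF_MOV &&
	       (insn->off == 8 || insn->off == 16 || insn->off == 32);
}
```
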
+14 -10
kernel/bpf/memalloc.c
··· 183 183 WARN_ON_ONCE(local_inc_return(&c->active) != 1); 184 184 } 185 185 186 - static void dec_active(struct bpf_mem_cache *c, unsigned long flags) 186 + static void dec_active(struct bpf_mem_cache *c, unsigned long *flags) 187 187 { 188 188 local_dec(&c->active); 189 189 if (IS_ENABLED(CONFIG_PREEMPT_RT)) 190 - local_irq_restore(flags); 190 + local_irq_restore(*flags); 191 191 } 192 192 193 193 static void add_obj_to_free_list(struct bpf_mem_cache *c, void *obj) ··· 197 197 inc_active(c, &flags); 198 198 __llist_add(obj, &c->free_llist); 199 199 c->free_cnt++; 200 - dec_active(c, flags); 200 + dec_active(c, &flags); 201 201 } 202 202 203 203 /* Mostly runs from irq_work except __init phase. */ 204 - static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node) 204 + static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node, bool atomic) 205 205 { 206 206 struct mem_cgroup *memcg = NULL, *old_memcg; 207 + gfp_t gfp; 207 208 void *obj; 208 209 int i; 210 + 211 + gfp = __GFP_NOWARN | __GFP_ACCOUNT; 212 + gfp |= atomic ? GFP_NOWAIT : GFP_KERNEL; 209 213 210 214 for (i = 0; i < cnt; i++) { 211 215 /* ··· 242 238 * will allocate from the current numa node which is what we 243 239 * want here. 
244 240 */ 245 - obj = __alloc(c, node, GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT); 241 + obj = __alloc(c, node, gfp); 246 242 if (!obj) 247 243 break; 248 244 add_obj_to_free_list(c, obj); ··· 348 344 cnt = --c->free_cnt; 349 345 else 350 346 cnt = 0; 351 - dec_active(c, flags); 347 + dec_active(c, &flags); 352 348 if (llnode) 353 349 enque_to_free(tgt, llnode); 354 350 } while (cnt > (c->high_watermark + c->low_watermark) / 2); ··· 388 384 llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra_rcu)) 389 385 if (__llist_add(llnode, &c->free_by_rcu)) 390 386 c->free_by_rcu_tail = llnode; 391 - dec_active(c, flags); 387 + dec_active(c, &flags); 392 388 } 393 389 394 390 if (llist_empty(&c->free_by_rcu)) ··· 412 408 inc_active(c, &flags); 413 409 WRITE_ONCE(c->waiting_for_gp.first, __llist_del_all(&c->free_by_rcu)); 414 410 c->waiting_for_gp_tail = c->free_by_rcu_tail; 415 - dec_active(c, flags); 411 + dec_active(c, &flags); 416 412 417 413 if (unlikely(READ_ONCE(c->draining))) { 418 414 free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size); ··· 433 429 /* irq_work runs on this cpu and kmalloc will allocate 434 430 * from the current numa node which is what we want here. 435 431 */ 436 - alloc_bulk(c, c->batch, NUMA_NO_NODE); 432 + alloc_bulk(c, c->batch, NUMA_NO_NODE, true); 437 433 else if (cnt > c->high_watermark) 438 434 free_bulk(c); 439 435 ··· 481 477 * prog won't be doing more than 4 map_update_elem from 482 478 * irq disabled region 483 479 */ 484 - alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu)); 480 + alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu), false); 485 481 } 486 482 487 483 /* When size != 0 bpf_mem_cache for each cpu.
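The memalloc.c change lets `alloc_bulk()` callers say whether they run in atomic context, and picks GFP flags accordingly instead of hard-coding `GFP_NOWAIT`. A toy model of that flag-selection pattern, with mock bit values standing in for the kernel's `gfp_t` constants from `<linux/gfp.h>`:

```c
#include <assert.h>
#include <stdbool.h>

/* Mock flag bits; the real values live in <linux/gfp.h>.  The pattern:
 * atomic callers get GFP_NOWAIT (never sleeps), non-atomic prefill gets
 * GFP_KERNEL (may reclaim), and the accounting/no-warn bits are common. */
#define MOCK_GFP_NOWAIT   0x1u
#define MOCK_GFP_KERNEL   0x2u
#define MOCK_GFP_NOWARN   0x4u
#define MOCK_GFP_ACCOUNT  0x8u

static unsigned int pick_gfp(bool atomic)
{
	unsigned int gfp = MOCK_GFP_NOWARN | MOCK_GFP_ACCOUNT;

	gfp |= atomic ? MOCK_GFP_NOWAIT : MOCK_GFP_KERNEL;
	return gfp;
}
```

This is what makes the "non-atomically allocate freelist during prefill" change possible: the `__init`-phase prefill path passes `atomic = false` and may block, while the irq_work refill path keeps the non-sleeping behavior.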
+1
kernel/bpf/offload.c
··· 25 25 #include <linux/rhashtable.h> 26 26 #include <linux/rtnetlink.h> 27 27 #include <linux/rwsem.h> 28 + #include <net/xdp.h> 28 29 29 30 /* Protects offdevs, members of bpf_offload_netdev and offload members 30 31 * of all progs.
+284 -63
kernel/bpf/verifier.c
··· 26 26 #include <linux/poison.h> 27 27 #include <linux/module.h> 28 28 #include <linux/cpumask.h> 29 + #include <net/xdp.h> 29 30 30 31 #include "disasm.h" 31 32 ··· 2856 2855 goto next; 2857 2856 if (BPF_OP(code) == BPF_EXIT || BPF_OP(code) == BPF_CALL) 2858 2857 goto next; 2859 - off = i + insn[i].off + 1; 2858 + if (code == (BPF_JMP32 | BPF_JA)) 2859 + off = i + insn[i].imm + 1; 2860 + else 2861 + off = i + insn[i].off + 1; 2860 2862 if (off < subprog_start || off >= subprog_end) { 2861 2863 verbose(env, "jump out of range from insn %d to %d\n", i, off); 2862 2864 return -EINVAL; ··· 2871 2867 * or unconditional jump back 2872 2868 */ 2873 2869 if (code != (BPF_JMP | BPF_EXIT) && 2870 + code != (BPF_JMP32 | BPF_JA) && 2874 2871 code != (BPF_JMP | BPF_JA)) { 2875 2872 verbose(env, "last insn is not an exit or jmp\n"); 2876 2873 return -EINVAL; ··· 3017 3012 } 3018 3013 } 3019 3014 3015 + if (class == BPF_ALU64 && op == BPF_END && (insn->imm == 16 || insn->imm == 32)) 3016 + return false; 3017 + 3020 3018 if (class == BPF_ALU64 || class == BPF_JMP || 3021 - /* BPF_END always use BPF_ALU class. 
*/ 3022 3019 (class == BPF_ALU && op == BPF_END && insn->imm == 64)) 3023 3020 return true; 3024 3021 ··· 3428 3421 return 0; 3429 3422 if (opcode == BPF_MOV) { 3430 3423 if (BPF_SRC(insn->code) == BPF_X) { 3431 - /* dreg = sreg 3424 + /* dreg = sreg or dreg = (s8, s16, s32)sreg 3432 3425 * dreg needs precision after this insn 3433 3426 * sreg needs precision before this insn 3434 3427 */ ··· 5834 5827 __reg_combine_64_into_32(reg); 5835 5828 } 5836 5829 5830 + static void set_sext64_default_val(struct bpf_reg_state *reg, int size) 5831 + { 5832 + if (size == 1) { 5833 + reg->smin_value = reg->s32_min_value = S8_MIN; 5834 + reg->smax_value = reg->s32_max_value = S8_MAX; 5835 + } else if (size == 2) { 5836 + reg->smin_value = reg->s32_min_value = S16_MIN; 5837 + reg->smax_value = reg->s32_max_value = S16_MAX; 5838 + } else { 5839 + /* size == 4 */ 5840 + reg->smin_value = reg->s32_min_value = S32_MIN; 5841 + reg->smax_value = reg->s32_max_value = S32_MAX; 5842 + } 5843 + reg->umin_value = reg->u32_min_value = 0; 5844 + reg->umax_value = U64_MAX; 5845 + reg->u32_max_value = U32_MAX; 5846 + reg->var_off = tnum_unknown; 5847 + } 5848 + 5849 + static void coerce_reg_to_size_sx(struct bpf_reg_state *reg, int size) 5850 + { 5851 + s64 init_s64_max, init_s64_min, s64_max, s64_min, u64_cval; 5852 + u64 top_smax_value, top_smin_value; 5853 + u64 num_bits = size * 8; 5854 + 5855 + if (tnum_is_const(reg->var_off)) { 5856 + u64_cval = reg->var_off.value; 5857 + if (size == 1) 5858 + reg->var_off = tnum_const((s8)u64_cval); 5859 + else if (size == 2) 5860 + reg->var_off = tnum_const((s16)u64_cval); 5861 + else 5862 + /* size == 4 */ 5863 + reg->var_off = tnum_const((s32)u64_cval); 5864 + 5865 + u64_cval = reg->var_off.value; 5866 + reg->smax_value = reg->smin_value = u64_cval; 5867 + reg->umax_value = reg->umin_value = u64_cval; 5868 + reg->s32_max_value = reg->s32_min_value = u64_cval; 5869 + reg->u32_max_value = reg->u32_min_value = u64_cval; 5870 + return; 5871 + } 5872 + 
5873 + top_smax_value = ((u64)reg->smax_value >> num_bits) << num_bits; 5874 + top_smin_value = ((u64)reg->smin_value >> num_bits) << num_bits; 5875 + 5876 + if (top_smax_value != top_smin_value) 5877 + goto out; 5878 + 5879 + /* find the s64_min and s64_min after sign extension */ 5880 + if (size == 1) { 5881 + init_s64_max = (s8)reg->smax_value; 5882 + init_s64_min = (s8)reg->smin_value; 5883 + } else if (size == 2) { 5884 + init_s64_max = (s16)reg->smax_value; 5885 + init_s64_min = (s16)reg->smin_value; 5886 + } else { 5887 + init_s64_max = (s32)reg->smax_value; 5888 + init_s64_min = (s32)reg->smin_value; 5889 + } 5890 + 5891 + s64_max = max(init_s64_max, init_s64_min); 5892 + s64_min = min(init_s64_max, init_s64_min); 5893 + 5894 + /* both of s64_max/s64_min positive or negative */ 5895 + if ((s64_max >= 0) == (s64_min >= 0)) { 5896 + reg->smin_value = reg->s32_min_value = s64_min; 5897 + reg->smax_value = reg->s32_max_value = s64_max; 5898 + reg->umin_value = reg->u32_min_value = s64_min; 5899 + reg->umax_value = reg->u32_max_value = s64_max; 5900 + reg->var_off = tnum_range(s64_min, s64_max); 5901 + return; 5902 + } 5903 + 5904 + out: 5905 + set_sext64_default_val(reg, size); 5906 + } 5907 + 5908 + static void set_sext32_default_val(struct bpf_reg_state *reg, int size) 5909 + { 5910 + if (size == 1) { 5911 + reg->s32_min_value = S8_MIN; 5912 + reg->s32_max_value = S8_MAX; 5913 + } else { 5914 + /* size == 2 */ 5915 + reg->s32_min_value = S16_MIN; 5916 + reg->s32_max_value = S16_MAX; 5917 + } 5918 + reg->u32_min_value = 0; 5919 + reg->u32_max_value = U32_MAX; 5920 + } 5921 + 5922 + static void coerce_subreg_to_size_sx(struct bpf_reg_state *reg, int size) 5923 + { 5924 + s32 init_s32_max, init_s32_min, s32_max, s32_min, u32_val; 5925 + u32 top_smax_value, top_smin_value; 5926 + u32 num_bits = size * 8; 5927 + 5928 + if (tnum_is_const(reg->var_off)) { 5929 + u32_val = reg->var_off.value; 5930 + if (size == 1) 5931 + reg->var_off = tnum_const((s8)u32_val); 5932 + 
else 5933 + reg->var_off = tnum_const((s16)u32_val); 5934 + 5935 + u32_val = reg->var_off.value; 5936 + reg->s32_min_value = reg->s32_max_value = u32_val; 5937 + reg->u32_min_value = reg->u32_max_value = u32_val; 5938 + return; 5939 + } 5940 + 5941 + top_smax_value = ((u32)reg->s32_max_value >> num_bits) << num_bits; 5942 + top_smin_value = ((u32)reg->s32_min_value >> num_bits) << num_bits; 5943 + 5944 + if (top_smax_value != top_smin_value) 5945 + goto out; 5946 + 5947 + /* find the s32_min and s32_min after sign extension */ 5948 + if (size == 1) { 5949 + init_s32_max = (s8)reg->s32_max_value; 5950 + init_s32_min = (s8)reg->s32_min_value; 5951 + } else { 5952 + /* size == 2 */ 5953 + init_s32_max = (s16)reg->s32_max_value; 5954 + init_s32_min = (s16)reg->s32_min_value; 5955 + } 5956 + s32_max = max(init_s32_max, init_s32_min); 5957 + s32_min = min(init_s32_max, init_s32_min); 5958 + 5959 + if ((s32_min >= 0) == (s32_max >= 0)) { 5960 + reg->s32_min_value = s32_min; 5961 + reg->s32_max_value = s32_max; 5962 + reg->u32_min_value = (u32)s32_min; 5963 + reg->u32_max_value = (u32)s32_max; 5964 + return; 5965 + } 5966 + 5967 + out: 5968 + set_sext32_default_val(reg, size); 5969 + } 5970 + 5837 5971 static bool bpf_map_is_rdonly(const struct bpf_map *map) 5838 5972 { 5839 5973 /* A map is considered read-only if the following condition are true: ··· 5995 5847 !bpf_map_write_active(map); 5996 5848 } 5997 5849 5998 - static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val) 5850 + static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val, 5851 + bool is_ldsx) 5999 5852 { 6000 5853 void *ptr; 6001 5854 u64 addr; ··· 6009 5860 6010 5861 switch (size) { 6011 5862 case sizeof(u8): 6012 - *val = (u64)*(u8 *)ptr; 5863 + *val = is_ldsx ? (s64)*(s8 *)ptr : (u64)*(u8 *)ptr; 6013 5864 break; 6014 5865 case sizeof(u16): 6015 - *val = (u64)*(u16 *)ptr; 5866 + *val = is_ldsx ? 
(s64)*(s16 *)ptr : (u64)*(u16 *)ptr; 6016 5867 break; 6017 5868 case sizeof(u32): 6018 - *val = (u64)*(u32 *)ptr; 5869 + *val = is_ldsx ? (s64)*(s32 *)ptr : (u64)*(u32 *)ptr; 6019 5870 break; 6020 5871 case sizeof(u64): 6021 5872 *val = *(u64 *)ptr; ··· 6434 6285 */ 6435 6286 static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regno, 6436 6287 int off, int bpf_size, enum bpf_access_type t, 6437 - int value_regno, bool strict_alignment_once) 6288 + int value_regno, bool strict_alignment_once, bool is_ldsx) 6438 6289 { 6439 6290 struct bpf_reg_state *regs = cur_regs(env); 6440 6291 struct bpf_reg_state *reg = regs + regno; ··· 6495 6346 u64 val = 0; 6496 6347 6497 6348 err = bpf_map_direct_read(map, map_off, size, 6498 - &val); 6349 + &val, is_ldsx); 6499 6350 if (err) 6500 6351 return err; 6501 6352 ··· 6665 6516 6666 6517 if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ && 6667 6518 regs[value_regno].type == SCALAR_VALUE) { 6668 - /* b/h/w load zero-extends, mark upper bits as known 0 */ 6669 - coerce_reg_to_size(&regs[value_regno], size); 6519 + if (!is_ldsx) 6520 + /* b/h/w load zero-extends, mark upper bits as known 0 */ 6521 + coerce_reg_to_size(&regs[value_regno], size); 6522 + else 6523 + coerce_reg_to_size_sx(&regs[value_regno], size); 6670 6524 } 6671 6525 return err; 6672 6526 } ··· 6761 6609 * case to simulate the register fill. 6762 6610 */ 6763 6611 err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off, 6764 - BPF_SIZE(insn->code), BPF_READ, -1, true); 6612 + BPF_SIZE(insn->code), BPF_READ, -1, true, false); 6765 6613 if (!err && load_reg >= 0) 6766 6614 err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off, 6767 6615 BPF_SIZE(insn->code), BPF_READ, load_reg, 6768 - true); 6616 + true, false); 6769 6617 if (err) 6770 6618 return err; 6771 6619 6772 6620 /* Check whether we can write into the same memory. 
*/ 6773 6621 err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off, 6774 - BPF_SIZE(insn->code), BPF_WRITE, -1, true); 6622 + BPF_SIZE(insn->code), BPF_WRITE, -1, true, false); 6775 6623 if (err) 6776 6624 return err; 6777 6625 ··· 7017 6865 return zero_size_allowed ? 0 : -EACCES; 7018 6866 7019 6867 return check_mem_access(env, env->insn_idx, regno, offset, BPF_B, 7020 - atype, -1, false); 6868 + atype, -1, false, false); 7021 6869 } 7022 6870 7023 6871 fallthrough; ··· 7389 7237 /* we write BPF_DW bits (8 bytes) at a time */ 7390 7238 for (i = 0; i < BPF_DYNPTR_SIZE; i += 8) { 7391 7239 err = check_mem_access(env, insn_idx, regno, 7392 - i, BPF_DW, BPF_WRITE, -1, false); 7240 + i, BPF_DW, BPF_WRITE, -1, false, false); 7393 7241 if (err) 7394 7242 return err; 7395 7243 } ··· 7482 7330 7483 7331 for (i = 0; i < nr_slots * 8; i += BPF_REG_SIZE) { 7484 7332 err = check_mem_access(env, insn_idx, regno, 7485 - i, BPF_DW, BPF_WRITE, -1, false); 7333 + i, BPF_DW, BPF_WRITE, -1, false, false); 7486 7334 if (err) 7487 7335 return err; 7488 7336 } ··· 9626 9474 */ 9627 9475 for (i = 0; i < meta.access_size; i++) { 9628 9476 err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B, 9629 - BPF_WRITE, -1, false); 9477 + BPF_WRITE, -1, false, false); 9630 9478 if (err) 9631 9479 return err; 9632 9480 } ··· 13083 12931 } else { 13084 12932 if (insn->src_reg != BPF_REG_0 || insn->off != 0 || 13085 12933 (insn->imm != 16 && insn->imm != 32 && insn->imm != 64) || 13086 - BPF_CLASS(insn->code) == BPF_ALU64) { 12934 + (BPF_CLASS(insn->code) == BPF_ALU64 && 12935 + BPF_SRC(insn->code) != BPF_TO_LE)) { 13087 12936 verbose(env, "BPF_END uses reserved fields\n"); 13088 12937 return -EINVAL; 13089 12938 } ··· 13109 12956 } else if (opcode == BPF_MOV) { 13110 12957 13111 12958 if (BPF_SRC(insn->code) == BPF_X) { 13112 - if (insn->imm != 0 || insn->off != 0) { 12959 + if (insn->imm != 0) { 13113 12960 verbose(env, "BPF_MOV uses reserved fields\n"); 13114 12961 return -EINVAL; 
12962 + } 12963 + 12964 + if (BPF_CLASS(insn->code) == BPF_ALU) { 12965 + if (insn->off != 0 && insn->off != 8 && insn->off != 16) { 12966 + verbose(env, "BPF_MOV uses reserved fields\n"); 12967 + return -EINVAL; 12968 + } 12969 + } else { 12970 + if (insn->off != 0 && insn->off != 8 && insn->off != 16 && 12971 + insn->off != 32) { 12972 + verbose(env, "BPF_MOV uses reserved fields\n"); 12973 + return -EINVAL; 12974 + } 13115 12975 } 13116 12976 13117 12977 /* check src operand */ ··· 13150 12984 !tnum_is_const(src_reg->var_off); 13151 12985 13152 12986 if (BPF_CLASS(insn->code) == BPF_ALU64) { 13153 - /* case: R1 = R2 13154 - * copy register state to dest reg 13155 - */ 13156 - if (need_id) 13157 - /* Assign src and dst registers the same ID 13158 - * that will be used by find_equal_scalars() 13159 - * to propagate min/max range. 12987 + if (insn->off == 0) { 12988 + /* case: R1 = R2 12989 + * copy register state to dest reg 13160 12990 */ 13161 - src_reg->id = ++env->id_gen; 13162 - copy_register_state(dst_reg, src_reg); 13163 - dst_reg->live |= REG_LIVE_WRITTEN; 13164 - dst_reg->subreg_def = DEF_NOT_SUBREG; 12991 + if (need_id) 12992 + /* Assign src and dst registers the same ID 12993 + * that will be used by find_equal_scalars() 12994 + * to propagate min/max range. 
12995 + */ 12996 + src_reg->id = ++env->id_gen; 12997 + copy_register_state(dst_reg, src_reg); 12998 + dst_reg->live |= REG_LIVE_WRITTEN; 12999 + dst_reg->subreg_def = DEF_NOT_SUBREG; 13000 + } else { 13001 + /* case: R1 = (s8, s16 s32)R2 */ 13002 + bool no_sext; 13003 + 13004 + no_sext = src_reg->umax_value < (1ULL << (insn->off - 1)); 13005 + if (no_sext && need_id) 13006 + src_reg->id = ++env->id_gen; 13007 + copy_register_state(dst_reg, src_reg); 13008 + if (!no_sext) 13009 + dst_reg->id = 0; 13010 + coerce_reg_to_size_sx(dst_reg, insn->off >> 3); 13011 + dst_reg->live |= REG_LIVE_WRITTEN; 13012 + dst_reg->subreg_def = DEF_NOT_SUBREG; 13013 + } 13165 13014 } else { 13166 13015 /* R1 = (u32) R2 */ 13167 13016 if (is_pointer_value(env, insn->src_reg)) { ··· 13185 13004 insn->src_reg); 13186 13005 return -EACCES; 13187 13006 } else if (src_reg->type == SCALAR_VALUE) { 13188 - bool is_src_reg_u32 = src_reg->umax_value <= U32_MAX; 13007 + if (insn->off == 0) { 13008 + bool is_src_reg_u32 = src_reg->umax_value <= U32_MAX; 13189 13009 13190 - if (is_src_reg_u32 && need_id) 13191 - src_reg->id = ++env->id_gen; 13192 - copy_register_state(dst_reg, src_reg); 13193 - /* Make sure ID is cleared if src_reg is not in u32 range otherwise 13194 - * dst_reg min/max could be incorrectly 13195 - * propagated into src_reg by find_equal_scalars() 13196 - */ 13197 - if (!is_src_reg_u32) 13198 - dst_reg->id = 0; 13199 - dst_reg->live |= REG_LIVE_WRITTEN; 13200 - dst_reg->subreg_def = env->insn_idx + 1; 13010 + if (is_src_reg_u32 && need_id) 13011 + src_reg->id = ++env->id_gen; 13012 + copy_register_state(dst_reg, src_reg); 13013 + /* Make sure ID is cleared if src_reg is not in u32 13014 + * range otherwise dst_reg min/max could be incorrectly 13015 + * propagated into src_reg by find_equal_scalars() 13016 + */ 13017 + if (!is_src_reg_u32) 13018 + dst_reg->id = 0; 13019 + dst_reg->live |= REG_LIVE_WRITTEN; 13020 + dst_reg->subreg_def = env->insn_idx + 1; 13021 + } else { 13022 + /* 
case: W1 = (s8, s16)W2 */ 13023 + bool no_sext = src_reg->umax_value < (1ULL << (insn->off - 1)); 13024 + 13025 + if (no_sext && need_id) 13026 + src_reg->id = ++env->id_gen; 13027 + copy_register_state(dst_reg, src_reg); 13028 + if (!no_sext) 13029 + dst_reg->id = 0; 13030 + dst_reg->live |= REG_LIVE_WRITTEN; 13031 + dst_reg->subreg_def = env->insn_idx + 1; 13032 + coerce_subreg_to_size_sx(dst_reg, insn->off >> 3); 13033 + } 13201 13034 } else { 13202 13035 mark_reg_unknown(env, regs, 13203 13036 insn->dst_reg); ··· 13242 13047 } else { /* all other ALU ops: and, sub, xor, add, ... */ 13243 13048 13244 13049 if (BPF_SRC(insn->code) == BPF_X) { 13245 - if (insn->imm != 0 || insn->off != 0) { 13050 + if (insn->imm != 0 || insn->off > 1 || 13051 + (insn->off == 1 && opcode != BPF_MOD && opcode != BPF_DIV)) { 13246 13052 verbose(env, "BPF_ALU uses reserved fields\n"); 13247 13053 return -EINVAL; 13248 13054 } ··· 13252 13056 if (err) 13253 13057 return err; 13254 13058 } else { 13255 - if (insn->src_reg != BPF_REG_0 || insn->off != 0) { 13059 + if (insn->src_reg != BPF_REG_0 || insn->off > 1 || 13060 + (insn->off == 1 && opcode != BPF_MOD && opcode != BPF_DIV)) { 13256 13061 verbose(env, "BPF_ALU uses reserved fields\n"); 13257 13062 return -EINVAL; 13258 13063 } ··· 14797 14600 static int visit_insn(int t, struct bpf_verifier_env *env) 14798 14601 { 14799 14602 struct bpf_insn *insns = env->prog->insnsi, *insn = &insns[t]; 14800 - int ret; 14603 + int ret, off; 14801 14604 14802 14605 if (bpf_pseudo_func(insn)) 14803 14606 return visit_func_call_insn(t, insns, env, true); ··· 14845 14648 if (BPF_SRC(insn->code) != BPF_K) 14846 14649 return -EINVAL; 14847 14650 14651 + if (BPF_CLASS(insn->code) == BPF_JMP) 14652 + off = insn->off; 14653 + else 14654 + off = insn->imm; 14655 + 14848 14656 /* unconditional jump with single edge */ 14849 - ret = push_insn(t, t + insn->off + 1, FALLTHROUGH, env, 14657 + ret = push_insn(t, t + off + 1, FALLTHROUGH, env, 14850 14658 true); 
14851 14659 if (ret) 14852 14660 return ret; 14853 14661 14854 - mark_prune_point(env, t + insn->off + 1); 14855 - mark_jmp_point(env, t + insn->off + 1); 14662 + mark_prune_point(env, t + off + 1); 14663 + mark_jmp_point(env, t + off + 1); 14856 14664 14857 14665 return ret; 14858 14666 ··· 16404 16202 * Have to support a use case when one path through 16405 16203 * the program yields TRUSTED pointer while another 16406 16204 * is UNTRUSTED. Fallback to UNTRUSTED to generate 16407 - * BPF_PROBE_MEM. 16205 + * BPF_PROBE_MEM/BPF_PROBE_MEMSX. 16408 16206 */ 16409 16207 *prev_type = PTR_TO_BTF_ID | PTR_UNTRUSTED; 16410 16208 } else { ··· 16545 16343 */ 16546 16344 err = check_mem_access(env, env->insn_idx, insn->src_reg, 16547 16345 insn->off, BPF_SIZE(insn->code), 16548 - BPF_READ, insn->dst_reg, false); 16346 + BPF_READ, insn->dst_reg, false, 16347 + BPF_MODE(insn->code) == BPF_MEMSX); 16549 16348 if (err) 16550 16349 return err; 16551 16350 ··· 16583 16380 /* check that memory (dst_reg + off) is writeable */ 16584 16381 err = check_mem_access(env, env->insn_idx, insn->dst_reg, 16585 16382 insn->off, BPF_SIZE(insn->code), 16586 - BPF_WRITE, insn->src_reg, false); 16383 + BPF_WRITE, insn->src_reg, false, false); 16587 16384 if (err) 16588 16385 return err; 16589 16386 ··· 16608 16405 /* check that memory (dst_reg + off) is writeable */ 16609 16406 err = check_mem_access(env, env->insn_idx, insn->dst_reg, 16610 16407 insn->off, BPF_SIZE(insn->code), 16611 - BPF_WRITE, -1, false); 16408 + BPF_WRITE, -1, false, false); 16612 16409 if (err) 16613 16410 return err; 16614 16411 ··· 16653 16450 mark_reg_scratched(env, BPF_REG_0); 16654 16451 } else if (opcode == BPF_JA) { 16655 16452 if (BPF_SRC(insn->code) != BPF_K || 16656 - insn->imm != 0 || 16657 16453 insn->src_reg != BPF_REG_0 || 16658 16454 insn->dst_reg != BPF_REG_0 || 16659 - class == BPF_JMP32) { 16455 + (class == BPF_JMP && insn->imm != 0) || 16456 + (class == BPF_JMP32 && insn->off != 0)) { 16660 16457 
verbose(env, "BPF_JA uses reserved fields\n"); 16661 16458 return -EINVAL; 16662 16459 } 16663 16460 16664 - env->insn_idx += insn->off + 1; 16461 + if (class == BPF_JMP) 16462 + env->insn_idx += insn->off + 1; 16463 + else 16464 + env->insn_idx += insn->imm + 1; 16665 16465 continue; 16666 16466 16667 16467 } else if (opcode == BPF_EXIT) { ··· 17039 16833 17040 16834 for (i = 0; i < insn_cnt; i++, insn++) { 17041 16835 if (BPF_CLASS(insn->code) == BPF_LDX && 17042 - (BPF_MODE(insn->code) != BPF_MEM || insn->imm != 0)) { 16836 + ((BPF_MODE(insn->code) != BPF_MEM && BPF_MODE(insn->code) != BPF_MEMSX) || 16837 + insn->imm != 0)) { 17043 16838 verbose(env, "BPF_LDX uses reserved fields\n"); 17044 16839 return -EINVAL; 17045 16840 } ··· 17511 17304 { 17512 17305 u8 op; 17513 17306 17307 + op = BPF_OP(code); 17514 17308 if (BPF_CLASS(code) == BPF_JMP32) 17515 - return true; 17309 + return op != BPF_JA; 17516 17310 17517 17311 if (BPF_CLASS(code) != BPF_JMP) 17518 17312 return false; 17519 17313 17520 - op = BPF_OP(code); 17521 17314 return op != BPF_JA && op != BPF_EXIT && op != BPF_CALL; 17522 17315 } 17523 17316 ··· 17734 17527 17735 17528 for (i = 0; i < insn_cnt; i++, insn++) { 17736 17529 bpf_convert_ctx_access_t convert_ctx_access; 17530 + u8 mode; 17737 17531 17738 17532 if (insn->code == (BPF_LDX | BPF_MEM | BPF_B) || 17739 17533 insn->code == (BPF_LDX | BPF_MEM | BPF_H) || 17740 17534 insn->code == (BPF_LDX | BPF_MEM | BPF_W) || 17741 - insn->code == (BPF_LDX | BPF_MEM | BPF_DW)) { 17535 + insn->code == (BPF_LDX | BPF_MEM | BPF_DW) || 17536 + insn->code == (BPF_LDX | BPF_MEMSX | BPF_B) || 17537 + insn->code == (BPF_LDX | BPF_MEMSX | BPF_H) || 17538 + insn->code == (BPF_LDX | BPF_MEMSX | BPF_W)) { 17742 17539 type = BPF_READ; 17743 17540 } else if (insn->code == (BPF_STX | BPF_MEM | BPF_B) || 17744 17541 insn->code == (BPF_STX | BPF_MEM | BPF_H) || ··· 17801 17590 */ 17802 17591 case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED: 17803 17592 if (type == BPF_READ) { 
17804 - insn->code = BPF_LDX | BPF_PROBE_MEM | 17805 - BPF_SIZE((insn)->code); 17593 + if (BPF_MODE(insn->code) == BPF_MEM) 17594 + insn->code = BPF_LDX | BPF_PROBE_MEM | 17595 + BPF_SIZE((insn)->code); 17596 + else 17597 + insn->code = BPF_LDX | BPF_PROBE_MEMSX | 17598 + BPF_SIZE((insn)->code); 17806 17599 env->prog->aux->num_exentries++; 17807 17600 } 17808 17601 continue; ··· 17816 17601 17817 17602 ctx_field_size = env->insn_aux_data[i + delta].ctx_field_size; 17818 17603 size = BPF_LDST_BYTES(insn); 17604 + mode = BPF_MODE(insn->code); 17819 17605 17820 17606 /* If the read access is a narrower load of the field, 17821 17607 * convert to a 4/8-byte load, to minimum program type specific ··· 17876 17660 (1ULL << size * 8) - 1); 17877 17661 } 17878 17662 } 17663 + if (mode == BPF_MEMSX) 17664 + insn_buf[cnt++] = BPF_RAW_INSN(BPF_ALU64 | BPF_MOV | BPF_X, 17665 + insn->dst_reg, insn->dst_reg, 17666 + size * 8, 0); 17879 17667 17880 17668 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt); 17881 17669 if (!new_prog) ··· 17999 17779 insn = func[i]->insnsi; 18000 17780 for (j = 0; j < func[i]->len; j++, insn++) { 18001 17781 if (BPF_CLASS(insn->code) == BPF_LDX && 18002 - BPF_MODE(insn->code) == BPF_PROBE_MEM) 17782 + (BPF_MODE(insn->code) == BPF_PROBE_MEM || 17783 + BPF_MODE(insn->code) == BPF_PROBE_MEMSX)) 18003 17784 num_exentries++; 18004 17785 } 18005 17786 func[i]->aux->num_exentries = num_exentries;
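Much of the verifier hunk above tracks what `BPF_MOVSX` and `BPF_MEMSX` do at runtime: truncate to 1, 2 or 4 bytes, then sign-extend. For a known-constant register, `coerce_reg_to_size_sx()` computes exactly this value. A portable userspace sketch of the operation itself (not the verifier's tnum/bounds machinery):

```c
#include <assert.h>
#include <stdint.h>

/* Truncate v to `size` bytes and sign-extend, using the classic
 * (x ^ m) - m trick so the result is well-defined even where a
 * narrowing signed cast would be implementation-defined. */
static int64_t sign_extend(uint64_t v, int size)
{
	uint64_t m = 1ull << (size * 8 - 1);	/* sign bit of the narrow type */

	v &= (m << 1) - 1;			/* keep the low size*8 bits */
	return (int64_t)((v ^ m) - m);
}
```

For non-constant registers the verifier additionally checks that `smin`/`smax` share the same upper bits before deriving new bounds, falling back to the full `[S8_MIN, S8_MAX]`-style default range otherwise.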
-11
kernel/trace/bpf_trace.c
··· 223 223 .arg3_type = ARG_ANYTHING, 224 224 }; 225 225 226 - static __always_inline int 227 - bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr) 228 - { 229 - int ret; 230 - 231 - ret = copy_from_kernel_nofault(dst, unsafe_ptr, size); 232 - if (unlikely(ret < 0)) 233 - memset(dst, 0, size); 234 - return ret; 235 - } 236 - 237 226 BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, 238 227 const void *, unsafe_ptr) 239 228 {
+8 -4
kernel/trace/trace_syscalls.c
··· 555 555 struct syscall_trace_enter *rec) 556 556 { 557 557 struct syscall_tp_t { 558 - unsigned long long regs; 558 + struct trace_entry ent; 559 559 unsigned long syscall_nr; 560 560 unsigned long args[SYSCALL_DEFINE_MAXARGS]; 561 - } param; 561 + } __aligned(8) param; 562 562 int i; 563 563 564 + BUILD_BUG_ON(sizeof(param.ent) < sizeof(void *)); 565 + 566 + /* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */ 564 567 *(struct pt_regs **)&param = regs; 565 568 param.syscall_nr = rec->nr; 566 569 for (i = 0; i < sys_data->nb_args; i++) ··· 660 657 struct syscall_trace_exit *rec) 661 658 { 662 659 struct syscall_tp_t { 663 - unsigned long long regs; 660 + struct trace_entry ent; 664 661 unsigned long syscall_nr; 665 662 unsigned long ret; 666 - } param; 663 + } __aligned(8) param; 667 664 665 + /* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */ 668 666 *(struct pt_regs **)&param = regs; 669 667 param.syscall_nr = rec->nr; 670 668 param.ret = rec->ret;
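The trace_syscalls.c fix works because the perf path smuggles a `pt_regs` pointer over the first bytes of the on-stack record, so the first member must be at least pointer-sized (the new `BUILD_BUG_ON`) and the struct 8-byte aligned. A sketch of that layout contract, with `fake_trace_entry` as a stand-in carrying the same size properties as `struct trace_entry`:

```c
#include <assert.h>
#include <stddef.h>

struct fake_trace_entry {
	unsigned short type;
	unsigned char  flags;
	unsigned char  preempt_count;
	int            pid;
};

struct toy_syscall_tp {
	struct fake_trace_entry ent;
	unsigned long syscall_nr;
	unsigned long args[6];
} __attribute__((aligned(8)));	/* mirrors the kernel's __aligned(8) */

static const void *stash_regs(struct toy_syscall_tp *param, const void *regs)
{
	/* mirrors: *(struct pt_regs **)&param = regs; */
	*(const void **)param = regs;
	return *(const void **)param;
}

static int layout_ok(void)
{
	struct toy_syscall_tp rec;
	int probe;

	if (sizeof(rec.ent) < sizeof(void *))	/* the BUILD_BUG_ON condition */
		return 0;
	return stash_regs(&rec, &probe) == &probe;
}
```

The point of the patch is that BPF programs attached to the raw syscall tracepoints see a context starting with `struct trace_entry`, so the shadow struct must match that layout rather than a bare `unsigned long long`.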
+1
net/bpf/test_run.c
··· 20 20 #include <linux/smp.h> 21 21 #include <linux/sock_diag.h> 22 22 #include <linux/netfilter.h> 23 + #include <net/netdev_rx_queue.h> 23 24 #include <net/xdp.h> 24 25 #include <net/netfilter/nf_bpf_link.h> 25 26
+5 -1
net/core/dev.c
··· 133 133 #include <trace/events/net.h> 134 134 #include <trace/events/skb.h> 135 135 #include <trace/events/qdisc.h> 136 + #include <trace/events/xdp.h> 136 137 #include <linux/inetdevice.h> 137 138 #include <linux/cpu_rmap.h> 138 139 #include <linux/static_key.h> ··· 152 151 #include <linux/pm_runtime.h> 153 152 #include <linux/prandom.h> 154 153 #include <linux/once_lite.h> 154 + #include <net/netdev_rx_queue.h> 155 155 156 156 #include "dev.h" 157 157 #include "net-sysfs.h" ··· 9477 9475 { 9478 9476 struct net *net = current->nsproxy->net_ns; 9479 9477 struct bpf_link_primer link_primer; 9478 + struct netlink_ext_ack extack = {}; 9480 9479 struct bpf_xdp_link *link; 9481 9480 struct net_device *dev; 9482 9481 int err, fd; ··· 9505 9502 goto unlock; 9506 9503 } 9507 9504 9508 - err = dev_xdp_attach_link(dev, NULL, link); 9505 + err = dev_xdp_attach_link(dev, &extack, link); 9509 9506 rtnl_unlock(); 9510 9507 9511 9508 if (err) { 9512 9509 link->dev = NULL; 9513 9510 bpf_link_cleanup(&link_primer); 9511 + trace_bpf_xdp_link_attach_failed(extack._msg); 9514 9512 goto out_put_dev; 9515 9513 } 9516 9514
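In the dev.c hunk, the XDP link path now passes a real `netlink_ext_ack` into `dev_xdp_attach_link()` (the old code passed NULL and lost the failure reason) and forwards the message to the new tracepoint. A toy model of that extack flow, with all names as stand-ins:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_extack { const char *msg; };

/* Stand-in for dev_xdp_attach_link(): on failure, record why. */
static int toy_attach(int fail, struct toy_extack *extack)
{
	if (fail) {
		extack->msg = "toy: device does not support XDP";
		return -1;
	}
	return 0;
}

/* Stand-in for the caller: the extack message is what the new
 * bpf_xdp_link_attach_failed tracepoint would report. */
static const char *attach_and_trace(int fail)
{
	struct toy_extack extack = { NULL };

	if (toy_attach(fail, &extack))
		return extack.msg;
	return NULL;
}
```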
+2 -2
net/core/filter.c
··· 7351 7351 return -EOPNOTSUPP; 7352 7352 if (unlikely(dev_net(skb->dev) != sock_net(sk))) 7353 7353 return -ENETUNREACH; 7354 - if (unlikely(sk_fullsock(sk) && sk->sk_reuseport)) 7355 - return -ESOCKTNOSUPPORT; 7354 + if (sk_unhashed(sk)) 7355 + return -EOPNOTSUPP; 7356 7356 if (sk_is_refcounted(sk) && 7357 7357 unlikely(!refcount_inc_not_zero(&sk->sk_refcnt))) 7358 7358 return -ENOENT;
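The filter.c change is what enables SO_REUSEPORT with TC `bpf_sk_assign`: the blanket `-ESOCKTNOSUPPORT` rejection of reuseport sockets is replaced by a narrower `sk_unhashed()` test. A toy model of the revised precondition, with stand-in errno constants:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EOPNOTSUPP   95
#define TOY_ENETUNREACH 101

static int toy_sk_assign_check(bool same_netns, bool unhashed)
{
	if (!same_netns)
		return -TOY_ENETUNREACH;
	if (unhashed)
		return -TOY_EOPNOTSUPP;	/* reuseport alone no longer fails */
	return 0;
}
```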
+1
net/core/net-sysfs.c
··· 23 23 #include <linux/of.h> 24 24 #include <linux/of_net.h> 25 25 #include <linux/cpu.h> 26 + #include <net/netdev_rx_queue.h> 26 27 27 28 #include "dev.h" 28 29 #include "net-sysfs.h"
+43 -23
net/ipv4/inet_hashtables.c
··· 28 28 #include <net/tcp.h> 29 29 #include <net/sock_reuseport.h> 30 30 31 - static u32 inet_ehashfn(const struct net *net, const __be32 laddr, 32 - const __u16 lport, const __be32 faddr, 33 - const __be16 fport) 31 + u32 inet_ehashfn(const struct net *net, const __be32 laddr, 32 + const __u16 lport, const __be32 faddr, 33 + const __be16 fport) 34 34 { 35 35 static u32 inet_ehash_secret __read_mostly; 36 36 ··· 39 39 return __inet_ehashfn(laddr, lport, faddr, fport, 40 40 inet_ehash_secret + net_hash_mix(net)); 41 41 } 42 + EXPORT_SYMBOL_GPL(inet_ehashfn); 42 43 43 44 /* This function handles inet_sock, but also timewait and request sockets 44 45 * for IPv4/IPv6. ··· 333 332 return score; 334 333 } 335 334 336 - static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, 337 - struct sk_buff *skb, int doff, 338 - __be32 saddr, __be16 sport, 339 - __be32 daddr, unsigned short hnum) 335 + /** 336 + * inet_lookup_reuseport() - execute reuseport logic on AF_INET socket if necessary. 337 + * @net: network namespace. 338 + * @sk: AF_INET socket, must be in TCP_LISTEN state for TCP or TCP_CLOSE for UDP. 339 + * @skb: context for a potential SK_REUSEPORT program. 340 + * @doff: header offset. 341 + * @saddr: source address. 342 + * @sport: source port. 343 + * @daddr: destination address. 344 + * @hnum: destination port in host byte order. 345 + * @ehashfn: hash function used to generate the fallback hash. 346 + * 347 + * Return: NULL if sk doesn't have SO_REUSEPORT set, otherwise a pointer to 348 + * the selected sock or an error. 
349 + */ 350 + struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, 351 + struct sk_buff *skb, int doff, 352 + __be32 saddr, __be16 sport, 353 + __be32 daddr, unsigned short hnum, 354 + inet_ehashfn_t *ehashfn) 340 355 { 341 356 struct sock *reuse_sk = NULL; 342 357 u32 phash; 343 358 344 359 if (sk->sk_reuseport) { 345 - phash = inet_ehashfn(net, daddr, hnum, saddr, sport); 360 + phash = INDIRECT_CALL_2(ehashfn, udp_ehashfn, inet_ehashfn, 361 + net, daddr, hnum, saddr, sport); 346 362 reuse_sk = reuseport_select_sock(sk, phash, skb, doff); 347 363 } 348 364 return reuse_sk; 349 365 } 366 + EXPORT_SYMBOL_GPL(inet_lookup_reuseport); 350 367 351 368 /* 352 369 * Here are some nice properties to exploit here. The BSD API ··· 388 369 sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) { 389 370 score = compute_score(sk, net, hnum, daddr, dif, sdif); 390 371 if (score > hiscore) { 391 - result = lookup_reuseport(net, sk, skb, doff, 392 - saddr, sport, daddr, hnum); 372 + result = inet_lookup_reuseport(net, sk, skb, doff, 373 + saddr, sport, daddr, hnum, inet_ehashfn); 393 374 if (result) 394 375 return result; 395 376 ··· 401 382 return result; 402 383 } 403 384 404 - static inline struct sock *inet_lookup_run_bpf(struct net *net, 405 - struct inet_hashinfo *hashinfo, 406 - struct sk_buff *skb, int doff, 407 - __be32 saddr, __be16 sport, 408 - __be32 daddr, u16 hnum, const int dif) 385 + struct sock *inet_lookup_run_sk_lookup(struct net *net, 386 + int protocol, 387 + struct sk_buff *skb, int doff, 388 + __be32 saddr, __be16 sport, 389 + __be32 daddr, u16 hnum, const int dif, 390 + inet_ehashfn_t *ehashfn) 409 391 { 410 392 struct sock *sk, *reuse_sk; 411 393 bool no_reuseport; 412 394 413 - if (hashinfo != net->ipv4.tcp_death_row.hashinfo) 414 - return NULL; /* only TCP is supported */ 415 - 416 - no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_TCP, saddr, sport, 395 + no_reuseport = bpf_sk_lookup_run_v4(net, protocol, saddr, sport, 417 396 daddr, hnum, 
dif, &sk); 418 397 if (no_reuseport || IS_ERR_OR_NULL(sk)) 419 398 return sk; 420 399 421 - reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); 400 + reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum, 401 + ehashfn); 422 402 if (reuse_sk) 423 403 sk = reuse_sk; 424 404 return sk; ··· 435 417 unsigned int hash2; 436 418 437 419 /* Lookup redirect from BPF */ 438 - if (static_branch_unlikely(&bpf_sk_lookup_enabled)) { 439 - result = inet_lookup_run_bpf(net, hashinfo, skb, doff, 440 - saddr, sport, daddr, hnum, dif); 420 + if (static_branch_unlikely(&bpf_sk_lookup_enabled) && 421 + hashinfo == net->ipv4.tcp_death_row.hashinfo) { 422 + result = inet_lookup_run_sk_lookup(net, IPPROTO_TCP, skb, doff, 423 + saddr, sport, daddr, hnum, dif, 424 + inet_ehashfn); 441 425 if (result) 442 426 goto done; 443 427 }
+16 -1
net/ipv4/netfilter/nf_defrag_ipv4.c
··· 7 7 #include <linux/ip.h> 8 8 #include <linux/netfilter.h> 9 9 #include <linux/module.h> 10 + #include <linux/rcupdate.h> 10 11 #include <linux/skbuff.h> 11 12 #include <net/netns/generic.h> 12 13 #include <net/route.h> ··· 114 113 } 115 114 } 116 115 116 + static const struct nf_defrag_hook defrag_hook = { 117 + .owner = THIS_MODULE, 118 + .enable = nf_defrag_ipv4_enable, 119 + .disable = nf_defrag_ipv4_disable, 120 + }; 121 + 117 122 static struct pernet_operations defrag4_net_ops = { 118 123 .exit = defrag4_net_exit, 119 124 }; 120 125 121 126 static int __init nf_defrag_init(void) 122 127 { 123 - return register_pernet_subsys(&defrag4_net_ops); 128 + int err; 129 + 130 + err = register_pernet_subsys(&defrag4_net_ops); 131 + if (err) 132 + return err; 133 + 134 + rcu_assign_pointer(nf_defrag_v4_hook, &defrag_hook); 135 + return err; 124 136 } 125 137 126 138 static void __exit nf_defrag_fini(void) 127 139 { 140 + rcu_assign_pointer(nf_defrag_v4_hook, NULL); 128 141 unregister_pernet_subsys(&defrag4_net_ops); 129 142 } 130 143
+36 -52
net/ipv4/udp.c
··· 407 407 return score; 408 408 } 409 409 410 - static u32 udp_ehashfn(const struct net *net, const __be32 laddr, 411 - const __u16 lport, const __be32 faddr, 412 - const __be16 fport) 410 + INDIRECT_CALLABLE_SCOPE 411 + u32 udp_ehashfn(const struct net *net, const __be32 laddr, const __u16 lport, 412 + const __be32 faddr, const __be16 fport) 413 413 { 414 414 static u32 udp_ehash_secret __read_mostly; 415 415 ··· 417 417 418 418 return __inet_ehashfn(laddr, lport, faddr, fport, 419 419 udp_ehash_secret + net_hash_mix(net)); 420 - } 421 - 422 - static struct sock *lookup_reuseport(struct net *net, struct sock *sk, 423 - struct sk_buff *skb, 424 - __be32 saddr, __be16 sport, 425 - __be32 daddr, unsigned short hnum) 426 - { 427 - struct sock *reuse_sk = NULL; 428 - u32 hash; 429 - 430 - if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) { 431 - hash = udp_ehashfn(net, daddr, hnum, saddr, sport); 432 - reuse_sk = reuseport_select_sock(sk, hash, skb, 433 - sizeof(struct udphdr)); 434 - } 435 - return reuse_sk; 436 420 } 437 421 438 422 /* called with rcu_read_lock() */ ··· 436 452 score = compute_score(sk, net, saddr, sport, 437 453 daddr, hnum, dif, sdif); 438 454 if (score > badness) { 439 - result = lookup_reuseport(net, sk, skb, 440 - saddr, sport, daddr, hnum); 455 + badness = score; 456 + 457 + if (sk->sk_state == TCP_ESTABLISHED) { 458 + result = sk; 459 + continue; 460 + } 461 + 462 + result = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), 463 + saddr, sport, daddr, hnum, udp_ehashfn); 464 + if (!result) { 465 + result = sk; 466 + continue; 467 + } 468 + 441 469 /* Fall back to scoring if group has connections */ 442 - if (result && !reuseport_has_conns(sk)) 470 + if (!reuseport_has_conns(sk)) 443 471 return result; 444 472 445 - result = result ? : sk; 446 - badness = score; 473 + /* Reuseport logic returned an error, keep original score. 
*/ 474 + if (IS_ERR(result)) 475 + continue; 476 + 477 + badness = compute_score(result, net, saddr, sport, 478 + daddr, hnum, dif, sdif); 479 + 447 480 } 448 481 } 449 482 return result; 450 - } 451 - 452 - static struct sock *udp4_lookup_run_bpf(struct net *net, 453 - struct udp_table *udptable, 454 - struct sk_buff *skb, 455 - __be32 saddr, __be16 sport, 456 - __be32 daddr, u16 hnum, const int dif) 457 - { 458 - struct sock *sk, *reuse_sk; 459 - bool no_reuseport; 460 - 461 - if (udptable != net->ipv4.udp_table) 462 - return NULL; /* only UDP is supported */ 463 - 464 - no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_UDP, saddr, sport, 465 - daddr, hnum, dif, &sk); 466 - if (no_reuseport || IS_ERR_OR_NULL(sk)) 467 - return sk; 468 - 469 - reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); 470 - if (reuse_sk) 471 - sk = reuse_sk; 472 - return sk; 473 483 } 474 484 475 485 /* UDP is nearly always wildcards out the wazoo, it makes no sense to try ··· 490 512 goto done; 491 513 492 514 /* Lookup redirect from BPF */ 493 - if (static_branch_unlikely(&bpf_sk_lookup_enabled)) { 494 - sk = udp4_lookup_run_bpf(net, udptable, skb, 495 - saddr, sport, daddr, hnum, dif); 515 + if (static_branch_unlikely(&bpf_sk_lookup_enabled) && 516 + udptable == net->ipv4.udp_table) { 517 + sk = inet_lookup_run_sk_lookup(net, IPPROTO_UDP, skb, sizeof(struct udphdr), 518 + saddr, sport, daddr, hnum, dif, 519 + udp_ehashfn); 496 520 if (sk) { 497 521 result = sk; 498 522 goto done; ··· 2392 2412 if (udp4_csum_init(skb, uh, proto)) 2393 2413 goto csum_error; 2394 2414 2395 - sk = skb_steal_sock(skb, &refcounted); 2415 + sk = inet_steal_sock(net, skb, sizeof(struct udphdr), saddr, uh->source, daddr, uh->dest, 2416 + &refcounted, udp_ehashfn); 2417 + if (IS_ERR(sk)) 2418 + goto no_sk; 2419 + 2396 2420 if (sk) { 2397 2421 struct dst_entry *dst = skb_dst(skb); 2398 2422 int ret; ··· 2417 2433 sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable); 2418 2434 if (sk) 
2419 2435 return udp_unicast_rcv_skb(sk, skb, uh); 2420 - 2436 + no_sk: 2421 2437 if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) 2422 2438 goto drop; 2423 2439 nf_reset_ct(skb);
+45 -24
net/ipv6/inet6_hashtables.c
··· 39 39 return __inet6_ehashfn(lhash, lport, fhash, fport, 40 40 inet6_ehash_secret + net_hash_mix(net)); 41 41 } 42 + EXPORT_SYMBOL_GPL(inet6_ehashfn); 42 43 43 44 /* 44 45 * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so ··· 112 111 return score; 113 112 } 114 113 115 - static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, 116 - struct sk_buff *skb, int doff, 117 - const struct in6_addr *saddr, 118 - __be16 sport, 119 - const struct in6_addr *daddr, 120 - unsigned short hnum) 114 + /** 115 + * inet6_lookup_reuseport() - execute reuseport logic on AF_INET6 socket if necessary. 116 + * @net: network namespace. 117 + * @sk: AF_INET6 socket, must be in TCP_LISTEN state for TCP or TCP_CLOSE for UDP. 118 + * @skb: context for a potential SK_REUSEPORT program. 119 + * @doff: header offset. 120 + * @saddr: source address. 121 + * @sport: source port. 122 + * @daddr: destination address. 123 + * @hnum: destination port in host byte order. 124 + * @ehashfn: hash function used to generate the fallback hash. 125 + * 126 + * Return: NULL if sk doesn't have SO_REUSEPORT set, otherwise a pointer to 127 + * the selected sock or an error. 
128 + */ 129 + struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, 130 + struct sk_buff *skb, int doff, 131 + const struct in6_addr *saddr, 132 + __be16 sport, 133 + const struct in6_addr *daddr, 134 + unsigned short hnum, 135 + inet6_ehashfn_t *ehashfn) 121 136 { 122 137 struct sock *reuse_sk = NULL; 123 138 u32 phash; 124 139 125 140 if (sk->sk_reuseport) { 126 - phash = inet6_ehashfn(net, daddr, hnum, saddr, sport); 141 + phash = INDIRECT_CALL_INET(ehashfn, udp6_ehashfn, inet6_ehashfn, 142 + net, daddr, hnum, saddr, sport); 127 143 reuse_sk = reuseport_select_sock(sk, phash, skb, doff); 128 144 } 129 145 return reuse_sk; 130 146 } 147 + EXPORT_SYMBOL_GPL(inet6_lookup_reuseport); 131 148 132 149 /* called with rcu_read_lock() */ 133 150 static struct sock *inet6_lhash2_lookup(struct net *net, ··· 162 143 sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) { 163 144 score = compute_score(sk, net, hnum, daddr, dif, sdif); 164 145 if (score > hiscore) { 165 - result = lookup_reuseport(net, sk, skb, doff, 166 - saddr, sport, daddr, hnum); 146 + result = inet6_lookup_reuseport(net, sk, skb, doff, 147 + saddr, sport, daddr, hnum, inet6_ehashfn); 167 148 if (result) 168 149 return result; 169 150 ··· 175 156 return result; 176 157 } 177 158 178 - static inline struct sock *inet6_lookup_run_bpf(struct net *net, 179 - struct inet_hashinfo *hashinfo, 180 - struct sk_buff *skb, int doff, 181 - const struct in6_addr *saddr, 182 - const __be16 sport, 183 - const struct in6_addr *daddr, 184 - const u16 hnum, const int dif) 159 + struct sock *inet6_lookup_run_sk_lookup(struct net *net, 160 + int protocol, 161 + struct sk_buff *skb, int doff, 162 + const struct in6_addr *saddr, 163 + const __be16 sport, 164 + const struct in6_addr *daddr, 165 + const u16 hnum, const int dif, 166 + inet6_ehashfn_t *ehashfn) 185 167 { 186 168 struct sock *sk, *reuse_sk; 187 169 bool no_reuseport; 188 170 189 - if (hashinfo != net->ipv4.tcp_death_row.hashinfo) 190 - return NULL; /* 
only TCP is supported */ 191 - 192 - no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_TCP, saddr, sport, 171 + no_reuseport = bpf_sk_lookup_run_v6(net, protocol, saddr, sport, 193 172 daddr, hnum, dif, &sk); 194 173 if (no_reuseport || IS_ERR_OR_NULL(sk)) 195 174 return sk; 196 175 197 - reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); 176 + reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, 177 + saddr, sport, daddr, hnum, ehashfn); 198 178 if (reuse_sk) 199 179 sk = reuse_sk; 200 180 return sk; 201 181 } 182 + EXPORT_SYMBOL_GPL(inet6_lookup_run_sk_lookup); 202 183 203 184 struct sock *inet6_lookup_listener(struct net *net, 204 185 struct inet_hashinfo *hashinfo, ··· 212 193 unsigned int hash2; 213 194 214 195 /* Lookup redirect from BPF */ 215 - if (static_branch_unlikely(&bpf_sk_lookup_enabled)) { 216 - result = inet6_lookup_run_bpf(net, hashinfo, skb, doff, 217 - saddr, sport, daddr, hnum, dif); 196 + if (static_branch_unlikely(&bpf_sk_lookup_enabled) && 197 + hashinfo == net->ipv4.tcp_death_row.hashinfo) { 198 + result = inet6_lookup_run_sk_lookup(net, IPPROTO_TCP, skb, doff, 199 + saddr, sport, daddr, hnum, dif, 200 + inet6_ehashfn); 218 201 if (result) 219 202 goto done; 220 203 }
+11
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
··· 10 10 #include <linux/module.h> 11 11 #include <linux/skbuff.h> 12 12 #include <linux/icmp.h> 13 + #include <linux/rcupdate.h> 13 14 #include <linux/sysctl.h> 14 15 #include <net/ipv6_frag.h> 15 16 ··· 97 96 } 98 97 } 99 98 99 + static const struct nf_defrag_hook defrag_hook = { 100 + .owner = THIS_MODULE, 101 + .enable = nf_defrag_ipv6_enable, 102 + .disable = nf_defrag_ipv6_disable, 103 + }; 104 + 100 105 static struct pernet_operations defrag6_net_ops = { 101 106 .exit = defrag6_net_exit, 102 107 }; ··· 121 114 pr_err("nf_defrag_ipv6: can't register pernet ops\n"); 122 115 goto cleanup_frag6; 123 116 } 117 + 118 + rcu_assign_pointer(nf_defrag_v6_hook, &defrag_hook); 119 + 124 120 return ret; 125 121 126 122 cleanup_frag6: ··· 134 124 135 125 static void __exit nf_defrag_fini(void) 136 126 { 127 + rcu_assign_pointer(nf_defrag_v6_hook, NULL); 137 128 unregister_pernet_subsys(&defrag6_net_ops); 138 129 nf_ct_frag6_cleanup(); 139 130 }
+38 -58
net/ipv6/udp.c
··· 72 72 return 0; 73 73 } 74 74 75 - static u32 udp6_ehashfn(const struct net *net, 76 - const struct in6_addr *laddr, 77 - const u16 lport, 78 - const struct in6_addr *faddr, 79 - const __be16 fport) 75 + INDIRECT_CALLABLE_SCOPE 76 + u32 udp6_ehashfn(const struct net *net, 77 + const struct in6_addr *laddr, 78 + const u16 lport, 79 + const struct in6_addr *faddr, 80 + const __be16 fport) 80 81 { 81 82 static u32 udp6_ehash_secret __read_mostly; 82 83 static u32 udp_ipv6_hash_secret __read_mostly; ··· 162 161 return score; 163 162 } 164 163 165 - static struct sock *lookup_reuseport(struct net *net, struct sock *sk, 166 - struct sk_buff *skb, 167 - const struct in6_addr *saddr, 168 - __be16 sport, 169 - const struct in6_addr *daddr, 170 - unsigned int hnum) 171 - { 172 - struct sock *reuse_sk = NULL; 173 - u32 hash; 174 - 175 - if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) { 176 - hash = udp6_ehashfn(net, daddr, hnum, saddr, sport); 177 - reuse_sk = reuseport_select_sock(sk, hash, skb, 178 - sizeof(struct udphdr)); 179 - } 180 - return reuse_sk; 181 - } 182 - 183 164 /* called with rcu_read_lock() */ 184 165 static struct sock *udp6_lib_lookup2(struct net *net, 185 166 const struct in6_addr *saddr, __be16 sport, ··· 178 195 score = compute_score(sk, net, saddr, sport, 179 196 daddr, hnum, dif, sdif); 180 197 if (score > badness) { 181 - result = lookup_reuseport(net, sk, skb, 182 - saddr, sport, daddr, hnum); 198 + badness = score; 199 + 200 + if (sk->sk_state == TCP_ESTABLISHED) { 201 + result = sk; 202 + continue; 203 + } 204 + 205 + result = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), 206 + saddr, sport, daddr, hnum, udp6_ehashfn); 207 + if (!result) { 208 + result = sk; 209 + continue; 210 + } 211 + 183 212 /* Fall back to scoring if group has connections */ 184 - if (result && !reuseport_has_conns(sk)) 213 + if (!reuseport_has_conns(sk)) 185 214 return result; 186 215 187 - result = result ? 
: sk; 188 - badness = score; 216 + /* Reuseport logic returned an error, keep original score. */ 217 + if (IS_ERR(result)) 218 + continue; 219 + 220 + badness = compute_score(sk, net, saddr, sport, 221 + daddr, hnum, dif, sdif); 189 222 } 190 223 } 191 224 return result; 192 - } 193 - 194 - static inline struct sock *udp6_lookup_run_bpf(struct net *net, 195 - struct udp_table *udptable, 196 - struct sk_buff *skb, 197 - const struct in6_addr *saddr, 198 - __be16 sport, 199 - const struct in6_addr *daddr, 200 - u16 hnum, const int dif) 201 - { 202 - struct sock *sk, *reuse_sk; 203 - bool no_reuseport; 204 - 205 - if (udptable != net->ipv4.udp_table) 206 - return NULL; /* only UDP is supported */ 207 - 208 - no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_UDP, saddr, sport, 209 - daddr, hnum, dif, &sk); 210 - if (no_reuseport || IS_ERR_OR_NULL(sk)) 211 - return sk; 212 - 213 - reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); 214 - if (reuse_sk) 215 - sk = reuse_sk; 216 - return sk; 217 225 } 218 226 219 227 /* rcu_read_lock() must be held */ ··· 231 257 goto done; 232 258 233 259 /* Lookup redirect from BPF */ 234 - if (static_branch_unlikely(&bpf_sk_lookup_enabled)) { 235 - sk = udp6_lookup_run_bpf(net, udptable, skb, 236 - saddr, sport, daddr, hnum, dif); 260 + if (static_branch_unlikely(&bpf_sk_lookup_enabled) && 261 + udptable == net->ipv4.udp_table) { 262 + sk = inet6_lookup_run_sk_lookup(net, IPPROTO_UDP, skb, sizeof(struct udphdr), 263 + saddr, sport, daddr, hnum, dif, 264 + udp6_ehashfn); 237 265 if (sk) { 238 266 result = sk; 239 267 goto done; ··· 968 992 goto csum_error; 969 993 970 994 /* Check if the socket is already available, e.g. 
due to early demux */ 971 - sk = skb_steal_sock(skb, &refcounted); 995 + sk = inet6_steal_sock(net, skb, sizeof(struct udphdr), saddr, uh->source, daddr, uh->dest, 996 + &refcounted, udp6_ehashfn); 997 + if (IS_ERR(sk)) 998 + goto no_sk; 999 + 972 1000 if (sk) { 973 1001 struct dst_entry *dst = skb_dst(skb); 974 1002 int ret; ··· 1006 1026 goto report_csum_error; 1007 1027 return udp6_unicast_rcv_skb(sk, skb, uh); 1008 1028 } 1009 - 1029 + no_sk: 1010 1030 reason = SKB_DROP_REASON_NO_SOCKET; 1011 1031 1012 1032 if (!uh->check)
+6
net/netfilter/core.c
··· 680 680 const struct nf_ct_hook __rcu *nf_ct_hook __read_mostly; 681 681 EXPORT_SYMBOL_GPL(nf_ct_hook); 682 682 683 + const struct nf_defrag_hook __rcu *nf_defrag_v4_hook __read_mostly; 684 + EXPORT_SYMBOL_GPL(nf_defrag_v4_hook); 685 + 686 + const struct nf_defrag_hook __rcu *nf_defrag_v6_hook __read_mostly; 687 + EXPORT_SYMBOL_GPL(nf_defrag_v6_hook); 688 + 683 689 #if IS_ENABLED(CONFIG_NF_CONNTRACK) 684 690 u8 nf_ctnetlink_has_listener; 685 691 EXPORT_SYMBOL_GPL(nf_ctnetlink_has_listener);
+110 -15
net/netfilter/nf_bpf_link.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 #include <linux/bpf.h> 3 3 #include <linux/filter.h> 4 + #include <linux/kmod.h> 5 + #include <linux/module.h> 4 6 #include <linux/netfilter.h> 5 7 6 8 #include <net/netfilter/nf_bpf_link.h> ··· 25 23 struct nf_hook_ops hook_ops; 26 24 struct net *net; 27 25 u32 dead; 26 + const struct nf_defrag_hook *defrag_hook; 28 27 }; 28 + 29 + #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4) || IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) 30 + static const struct nf_defrag_hook * 31 + get_proto_defrag_hook(struct bpf_nf_link *link, 32 + const struct nf_defrag_hook __rcu *global_hook, 33 + const char *mod) 34 + { 35 + const struct nf_defrag_hook *hook; 36 + int err; 37 + 38 + /* RCU protects us from races against module unloading */ 39 + rcu_read_lock(); 40 + hook = rcu_dereference(global_hook); 41 + if (!hook) { 42 + rcu_read_unlock(); 43 + err = request_module(mod); 44 + if (err) 45 + return ERR_PTR(err < 0 ? err : -EINVAL); 46 + 47 + rcu_read_lock(); 48 + hook = rcu_dereference(global_hook); 49 + } 50 + 51 + if (hook && try_module_get(hook->owner)) { 52 + /* Once we have a refcnt on the module, we no longer need RCU */ 53 + hook = rcu_pointer_handoff(hook); 54 + } else { 55 + WARN_ONCE(!hook, "%s has bad registration", mod); 56 + hook = ERR_PTR(-ENOENT); 57 + } 58 + rcu_read_unlock(); 59 + 60 + if (!IS_ERR(hook)) { 61 + err = hook->enable(link->net); 62 + if (err) { 63 + module_put(hook->owner); 64 + hook = ERR_PTR(err); 65 + } 66 + } 67 + 68 + return hook; 69 + } 70 + #endif 71 + 72 + static int bpf_nf_enable_defrag(struct bpf_nf_link *link) 73 + { 74 + const struct nf_defrag_hook __maybe_unused *hook; 75 + 76 + switch (link->hook_ops.pf) { 77 + #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4) 78 + case NFPROTO_IPV4: 79 + hook = get_proto_defrag_hook(link, nf_defrag_v4_hook, "nf_defrag_ipv4"); 80 + if (IS_ERR(hook)) 81 + return PTR_ERR(hook); 82 + 83 + link->defrag_hook = hook; 84 + return 0; 85 + #endif 86 + #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) 87 + case 
NFPROTO_IPV6: 88 + hook = get_proto_defrag_hook(link, nf_defrag_v6_hook, "nf_defrag_ipv6"); 89 + if (IS_ERR(hook)) 90 + return PTR_ERR(hook); 91 + 92 + link->defrag_hook = hook; 93 + return 0; 94 + #endif 95 + default: 96 + return -EAFNOSUPPORT; 97 + } 98 + } 99 + 100 + static void bpf_nf_disable_defrag(struct bpf_nf_link *link) 101 + { 102 + const struct nf_defrag_hook *hook = link->defrag_hook; 103 + 104 + if (!hook) 105 + return; 106 + hook->disable(link->net); 107 + module_put(hook->owner); 108 + } 29 109 30 110 static void bpf_nf_link_release(struct bpf_link *link) 31 111 { ··· 116 32 if (nf_link->dead) 117 33 return; 118 34 119 - /* prevent hook-not-found warning splat from netfilter core when 120 - * .detach was already called 121 - */ 122 - if (!cmpxchg(&nf_link->dead, 0, 1)) 35 + /* do not double release in case .detach was already called */ 36 + if (!cmpxchg(&nf_link->dead, 0, 1)) { 123 37 nf_unregister_net_hook(nf_link->net, &nf_link->hook_ops); 38 + bpf_nf_disable_defrag(nf_link); 39 + } 124 40 } 125 41 126 42 static void bpf_nf_link_dealloc(struct bpf_link *link) ··· 176 92 177 93 static int bpf_nf_check_pf_and_hooks(const union bpf_attr *attr) 178 94 { 95 + int prio; 96 + 179 97 switch (attr->link_create.netfilter.pf) { 180 98 case NFPROTO_IPV4: 181 99 case NFPROTO_IPV6: ··· 188 102 return -EAFNOSUPPORT; 189 103 } 190 104 191 - if (attr->link_create.netfilter.flags) 105 + if (attr->link_create.netfilter.flags & ~BPF_F_NETFILTER_IP_DEFRAG) 192 106 return -EOPNOTSUPP; 193 107 194 - /* make sure conntrack confirm is always last. 195 - * 196 - * In the future, if userspace can e.g. request defrag, then 197 - * "defrag_requested && prio before NF_IP_PRI_CONNTRACK_DEFRAG" 198 - * should fail. 199 - */ 200 - switch (attr->link_create.netfilter.priority) { 201 - case NF_IP_PRI_FIRST: return -ERANGE; /* sabotage_in and other warts */ 202 - case NF_IP_PRI_LAST: return -ERANGE; /* e.g. 
conntrack confirm */ 203 - } 108 + /* make sure conntrack confirm is always last */ 109 + prio = attr->link_create.netfilter.priority; 110 + if (prio == NF_IP_PRI_FIRST) 111 + return -ERANGE; /* sabotage_in and other warts */ 112 + else if (prio == NF_IP_PRI_LAST) 113 + return -ERANGE; /* e.g. conntrack confirm */ 114 + else if ((attr->link_create.netfilter.flags & BPF_F_NETFILTER_IP_DEFRAG) && 115 + prio <= NF_IP_PRI_CONNTRACK_DEFRAG) 116 + return -ERANGE; /* cannot use defrag if prog runs before nf_defrag */ 204 117 205 118 return 0; 206 119 } ··· 234 149 235 150 link->net = net; 236 151 link->dead = false; 152 + link->defrag_hook = NULL; 237 153 238 154 err = bpf_link_prime(&link->link, &link_primer); 239 155 if (err) { ··· 242 156 return err; 243 157 } 244 158 159 + if (attr->link_create.netfilter.flags & BPF_F_NETFILTER_IP_DEFRAG) { 160 + err = bpf_nf_enable_defrag(link); 161 + if (err) { 162 + bpf_link_cleanup(&link_primer); 163 + return err; 164 + } 165 + } 166 + 245 167 err = nf_register_net_hook(net, &link->hook_ops); 246 168 if (err) { 169 + bpf_nf_disable_defrag(link); 247 170 bpf_link_cleanup(&link_primer); 248 171 return err; 249 172 }
+1
net/netfilter/nf_conntrack_bpf.c
··· 14 14 #include <linux/types.h> 15 15 #include <linux/btf_ids.h> 16 16 #include <linux/net_namespace.h> 17 + #include <net/xdp.h> 17 18 #include <net/netfilter/nf_conntrack_bpf.h> 18 19 #include <net/netfilter/nf_conntrack_core.h> 19 20
+1
net/xdp/xsk.c
··· 25 25 #include <linux/vmalloc.h> 26 26 #include <net/xdp_sock_drv.h> 27 27 #include <net/busy_poll.h> 28 + #include <net/netdev_rx_queue.h> 28 29 #include <net/xdp.h> 29 30 30 31 #include "xsk_queue.h"
+6 -3
tools/include/uapi/linux/bpf.h
··· 19 19 20 20 /* ld/ldx fields */ 21 21 #define BPF_DW 0x18 /* double word (64-bit) */ 22 + #define BPF_MEMSX 0x80 /* load with sign extension */ 22 23 #define BPF_ATOMIC 0xc0 /* atomic memory ops - op type in immediate */ 23 24 #define BPF_XADD 0xc0 /* exclusive add - legacy name */ 24 25 ··· 1187 1186 * BPF_TRACE_KPROBE_MULTI attach type to create return probe. 1188 1187 */ 1189 1188 #define BPF_F_KPROBE_MULTI_RETURN (1U << 0) 1189 + 1190 + /* link_create.netfilter.flags used in LINK_CREATE command for 1191 + * BPF_PROG_TYPE_NETFILTER to enable IP packet defragmentation. 1192 + */ 1193 + #define BPF_F_NETFILTER_IP_DEFRAG (1U << 0) 1190 1194 1191 1195 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have 1192 1196 * the following extensions: ··· 4203 4197 * 4204 4198 * **-EOPNOTSUPP** if the operation is not supported, for example 4205 4199 * a call from outside of TC ingress. 4206 - * 4207 - * **-ESOCKTNOSUPPORT** if the socket type is not supported 4208 - * (reuseport). 4209 4200 * 4210 4201 * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags) 4211 4202 * Description
+2 -2
tools/lib/bpf/Makefile
··· 293 293 @echo ' HINT: use "V=1" to enable verbose build' 294 294 @echo ' all - build libraries and pkgconfig' 295 295 @echo ' clean - remove all generated files' 296 - @echo ' check - check abi and version info' 296 + @echo ' check - check ABI and version info' 297 297 @echo '' 298 298 @echo 'libbpf install targets:' 299 299 @echo ' HINT: use "prefix"(defaults to "/usr/local") or "DESTDIR" (defaults to "/")' 300 - @echo ' to adjust target desitantion, e.g. "make prefix=/usr/local install"' 300 + @echo ' to adjust target destination, e.g. "make prefix=/usr/local install"' 301 301 @echo ' install - build and install all headers, libraries and pkgconfig' 302 302 @echo ' install_headers - install only headers to include/bpf' 303 303 @echo ''
+2
tools/testing/selftests/bpf/.gitignore
··· 13 13 /test_progs 14 14 /test_progs-no_alu32 15 15 /test_progs-bpf_gcc 16 + /test_progs-cpuv4 16 17 test_verifier_log 17 18 feature 18 19 test_sock ··· 37 36 *.lskel.h 38 37 /no_alu32 39 38 /bpf_gcc 39 + /cpuv4 40 40 /host-tools 41 41 /tools 42 42 /runqslower
+27 -6
tools/testing/selftests/bpf/Makefile
··· 33 33 LDFLAGS += $(SAN_LDFLAGS) 34 34 LDLIBS += -lelf -lz -lrt -lpthread 35 35 36 - # Silence some warnings when compiled with clang 37 36 ifneq ($(LLVM),) 37 + # Silence some warnings when compiled with clang 38 38 CFLAGS += -Wno-unused-command-line-argument 39 + endif 40 + 41 + # Check whether bpf cpu=v4 is supported or not by clang 42 + ifneq ($(shell $(CLANG) --target=bpf -mcpu=help 2>&1 | grep 'v4'),) 43 + CLANG_CPUV4 := 1 39 44 endif 40 45 41 46 # Order correspond to 'make run_tests' order ··· 54 49 # Also test bpf-gcc, if present 55 50 ifneq ($(BPF_GCC),) 56 51 TEST_GEN_PROGS += test_progs-bpf_gcc 52 + endif 53 + 54 + ifneq ($(CLANG_CPUV4),) 55 + TEST_GEN_PROGS += test_progs-cpuv4 57 56 endif 58 57 59 58 TEST_GEN_FILES = test_lwt_ip_encap.bpf.o test_tc_edt.bpf.o ··· 392 383 $(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2) 393 384 $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v2 -o $2 394 385 endef 386 + # Similar to CLANG_BPF_BUILD_RULE, but with cpu-v4 387 + define CLANG_CPUV4_BPF_BUILD_RULE 388 + $(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2) 389 + $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v4 -o $2 390 + endef 395 391 # Build BPF object using GCC 396 392 define GCC_BPF_BUILD_RULE 397 393 $(call msg,GCC-BPF,$(TRUNNER_BINARY),$2) ··· 439 425 # $eval()) and pass control to DEFINE_TEST_RUNNER_RULES. 
440 426 # Parameters: 441 427 # $1 - test runner base binary name (e.g., test_progs) 442 - # $2 - test runner extra "flavor" (e.g., no_alu32, gcc-bpf, etc) 428 + # $2 - test runner extra "flavor" (e.g., no_alu32, cpuv4, gcc-bpf, etc) 443 429 define DEFINE_TEST_RUNNER 444 430 445 431 TRUNNER_OUTPUT := $(OUTPUT)$(if $2,/)$2 ··· 467 453 # Using TRUNNER_XXX variables, provided by callers of DEFINE_TEST_RUNNER and 468 454 # set up by DEFINE_TEST_RUNNER itself, create test runner build rules with: 469 455 # $1 - test runner base binary name (e.g., test_progs) 470 - # $2 - test runner extra "flavor" (e.g., no_alu32, gcc-bpf, etc) 456 + # $2 - test runner extra "flavor" (e.g., no_alu32, cpuv4, gcc-bpf, etc) 471 457 define DEFINE_TEST_RUNNER_RULES 472 458 473 459 ifeq ($($(TRUNNER_OUTPUT)-dir),) ··· 579 565 network_helpers.c testing_helpers.c \ 580 566 btf_helpers.c flow_dissector_load.h \ 581 567 cap_helpers.c test_loader.c xsk.c disasm.c \ 582 - json_writer.c unpriv_helpers.c 583 - 568 + json_writer.c unpriv_helpers.c \ 569 + ip_check_defrag_frags.h 584 570 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko \ 585 571 $(OUTPUT)/liburandom_read.so \ 586 572 $(OUTPUT)/xdp_synproxy \ ··· 597 583 TRUNNER_BPF_BUILD_RULE := CLANG_NOALU32_BPF_BUILD_RULE 598 584 TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS) 599 585 $(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32)) 586 + 587 + # Define test_progs-cpuv4 test runner. 588 + ifneq ($(CLANG_CPUV4),) 589 + TRUNNER_BPF_BUILD_RULE := CLANG_CPUV4_BPF_BUILD_RULE 590 + TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS) 591 + $(eval $(call DEFINE_TEST_RUNNER,test_progs,cpuv4)) 592 + endif 600 593 601 594 # Define test_progs BPF-GCC-flavored test runner. 
602 595 ifneq ($(BPF_GCC),) ··· 702 681 prog_tests/tests.h map_tests/tests.h verifier/tests.h \ 703 682 feature bpftool \ 704 683 $(addprefix $(OUTPUT)/,*.o *.skel.h *.lskel.h *.subskel.h \ 705 - no_alu32 bpf_gcc bpf_testmod.ko \ 684 + no_alu32 cpuv4 bpf_gcc bpf_testmod.ko \ 706 685 liburandom_read.so) 707 686 708 687 .PHONY: docs docs-clean
+8 -1
tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
··· 98 98 return bpf_testmod_test_struct_arg_result; 99 99 } 100 100 101 + noinline int 102 + bpf_testmod_test_arg_ptr_to_struct(struct bpf_testmod_struct_arg_1 *a) { 103 + bpf_testmod_test_struct_arg_result = a->a; 104 + return bpf_testmod_test_struct_arg_result; 105 + } 106 + 101 107 __bpf_kfunc void 102 108 bpf_testmod_test_mod_kfunc(int i) 103 109 { ··· 246 240 .off = off, 247 241 .len = len, 248 242 }; 249 - struct bpf_testmod_struct_arg_1 struct_arg1 = {10}; 243 + struct bpf_testmod_struct_arg_1 struct_arg1 = {10}, struct_arg1_2 = {-1}; 250 244 struct bpf_testmod_struct_arg_2 struct_arg2 = {2, 3}; 251 245 struct bpf_testmod_struct_arg_3 *struct_arg3; 252 246 struct bpf_testmod_struct_arg_4 struct_arg4 = {21, 22}; ··· 265 259 (void)bpf_testmod_test_struct_arg_8(16, (void *)17, 18, 19, 266 260 (void *)20, struct_arg4, 23); 267 261 262 + (void)bpf_testmod_test_arg_ptr_to_struct(&struct_arg1_2); 268 263 269 264 struct_arg3 = kmalloc((sizeof(struct bpf_testmod_struct_arg_3) + 270 265 sizeof(int)), GFP_KERNEL);
+90
tools/testing/selftests/bpf/generate_udp_fragments.py
··· 1 + #!/bin/env python3 2 + # SPDX-License-Identifier: GPL-2.0 3 + 4 + """ 5 + This script helps generate fragmented UDP packets. 6 + 7 + While it is technically possible to dynamically generate 8 + fragmented packets in C, it is much harder to read and write 9 + said code. `scapy` is relatively industry standard and really 10 + easy to read / write. 11 + 12 + So we choose to write this script that generates a valid C 13 + header. Rerun script and commit generated file after any 14 + modifications. 15 + """ 16 + 17 + import argparse 18 + import os 19 + 20 + from scapy.all import * 21 + 22 + 23 + # These constants must stay in sync with `ip_check_defrag.c` 24 + VETH1_ADDR = "172.16.1.200" 25 + VETH0_ADDR6 = "fc00::100" 26 + VETH1_ADDR6 = "fc00::200" 27 + CLIENT_PORT = 48878 28 + SERVER_PORT = 48879 29 + MAGIC_MESSAGE = "THIS IS THE ORIGINAL MESSAGE, PLEASE REASSEMBLE ME" 30 + 31 + 32 + def print_header(f): 33 + f.write("// SPDX-License-Identifier: GPL-2.0\n") 34 + f.write("/* DO NOT EDIT -- this file is generated */\n") 35 + f.write("\n") 36 + f.write("#ifndef _IP_CHECK_DEFRAG_FRAGS_H\n") 37 + f.write("#define _IP_CHECK_DEFRAG_FRAGS_H\n") 38 + f.write("\n") 39 + f.write("#include <stdint.h>\n") 40 + f.write("\n") 41 + 42 + 43 + def print_frags(f, frags, v6): 44 + for idx, frag in enumerate(frags): 45 + # 10 bytes per line to keep width in check 46 + chunks = [frag[i : i + 10] for i in range(0, len(frag), 10)] 47 + chunks_fmted = [", ".join([str(hex(b)) for b in chunk]) for chunk in chunks] 48 + suffix = "6" if v6 else "" 49 + 50 + f.write(f"static uint8_t frag{suffix}_{idx}[] = {{\n") 51 + for chunk in chunks_fmted: 52 + f.write(f"\t{chunk},\n") 53 + f.write(f"}};\n") 54 + 55 + 56 + def print_trailer(f): 57 + f.write("\n") 58 + f.write("#endif /* _IP_CHECK_DEFRAG_FRAGS_H */\n") 59 + 60 + 61 + def main(f): 62 + # srcip of 0 is filled in by IP_HDRINCL 63 + sip = "0.0.0.0" 64 + sip6 = VETH0_ADDR6 65 + dip = VETH1_ADDR 66 + dip6 = VETH1_ADDR6 67 + sport = CLIENT_PORT 
68 + dport = SERVER_PORT 69 + payload = MAGIC_MESSAGE.encode() 70 + 71 + # Disable UDPv4 checksums to keep code simpler 72 + pkt = IP(src=sip,dst=dip) / UDP(sport=sport,dport=dport,chksum=0) / Raw(load=payload) 73 + # UDPv6 requires a checksum 74 + # Also pin the ipv6 fragment header ID, otherwise it's a random value 75 + pkt6 = IPv6(src=sip6,dst=dip6) / IPv6ExtHdrFragment(id=0xBEEF) / UDP(sport=sport,dport=dport) / Raw(load=payload) 76 + 77 + frags = [f.build() for f in pkt.fragment(24)] 78 + frags6 = [f.build() for f in fragment6(pkt6, 72)] 79 + 80 + print_header(f) 81 + print_frags(f, frags, False) 82 + print_frags(f, frags6, True) 83 + print_trailer(f) 84 + 85 + 86 + if __name__ == "__main__": 87 + dir = os.path.dirname(os.path.realpath(__file__)) 88 + header = f"{dir}/ip_check_defrag_frags.h" 89 + with open(header, "w") as f: 90 + main(f)
+57
tools/testing/selftests/bpf/ip_check_defrag_frags.h
···
+// SPDX-License-Identifier: GPL-2.0
+/* DO NOT EDIT -- this file is generated */
+
+#ifndef _IP_CHECK_DEFRAG_FRAGS_H
+#define _IP_CHECK_DEFRAG_FRAGS_H
+
+#include <stdint.h>
+
+static uint8_t frag_0[] = {
+	0x45, 0x0, 0x0, 0x2c, 0x0, 0x1, 0x20, 0x0, 0x40, 0x11,
+	0xac, 0xe8, 0x0, 0x0, 0x0, 0x0, 0xac, 0x10, 0x1, 0xc8,
+	0xbe, 0xee, 0xbe, 0xef, 0x0, 0x3a, 0x0, 0x0, 0x54, 0x48,
+	0x49, 0x53, 0x20, 0x49, 0x53, 0x20, 0x54, 0x48, 0x45, 0x20,
+	0x4f, 0x52, 0x49, 0x47,
+};
+static uint8_t frag_1[] = {
+	0x45, 0x0, 0x0, 0x2c, 0x0, 0x1, 0x20, 0x3, 0x40, 0x11,
+	0xac, 0xe5, 0x0, 0x0, 0x0, 0x0, 0xac, 0x10, 0x1, 0xc8,
+	0x49, 0x4e, 0x41, 0x4c, 0x20, 0x4d, 0x45, 0x53, 0x53, 0x41,
+	0x47, 0x45, 0x2c, 0x20, 0x50, 0x4c, 0x45, 0x41, 0x53, 0x45,
+	0x20, 0x52, 0x45, 0x41,
+};
+static uint8_t frag_2[] = {
+	0x45, 0x0, 0x0, 0x1e, 0x0, 0x1, 0x0, 0x6, 0x40, 0x11,
+	0xcc, 0xf0, 0x0, 0x0, 0x0, 0x0, 0xac, 0x10, 0x1, 0xc8,
+	0x53, 0x53, 0x45, 0x4d, 0x42, 0x4c, 0x45, 0x20, 0x4d, 0x45,
+};
+static uint8_t frag6_0[] = {
+	0x60, 0x0, 0x0, 0x0, 0x0, 0x20, 0x2c, 0x40, 0xfc, 0x0,
+	0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+	0x0, 0x0, 0x1, 0x0, 0xfc, 0x0, 0x0, 0x0, 0x0, 0x0,
+	0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0,
+	0x11, 0x0, 0x0, 0x1, 0x0, 0x0, 0xbe, 0xef, 0xbe, 0xee,
+	0xbe, 0xef, 0x0, 0x3a, 0xd0, 0xf8, 0x54, 0x48, 0x49, 0x53,
+	0x20, 0x49, 0x53, 0x20, 0x54, 0x48, 0x45, 0x20, 0x4f, 0x52,
+	0x49, 0x47,
+};
+static uint8_t frag6_1[] = {
+	0x60, 0x0, 0x0, 0x0, 0x0, 0x20, 0x2c, 0x40, 0xfc, 0x0,
+	0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+	0x0, 0x0, 0x1, 0x0, 0xfc, 0x0, 0x0, 0x0, 0x0, 0x0,
+	0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0,
+	0x11, 0x0, 0x0, 0x19, 0x0, 0x0, 0xbe, 0xef, 0x49, 0x4e,
+	0x41, 0x4c, 0x20, 0x4d, 0x45, 0x53, 0x53, 0x41, 0x47, 0x45,
+	0x2c, 0x20, 0x50, 0x4c, 0x45, 0x41, 0x53, 0x45, 0x20, 0x52,
+	0x45, 0x41,
+};
+static uint8_t frag6_2[] = {
+	0x60, 0x0, 0x0, 0x0, 0x0, 0x12, 0x2c, 0x40, 0xfc, 0x0,
+	0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
+	0x0, 0x0, 0x1, 0x0, 0xfc, 0x0, 0x0, 0x0, 0x0, 0x0,
+	0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0,
+	0x11, 0x0, 0x0, 0x30, 0x0, 0x0, 0xbe, 0xef, 0x53, 0x53,
+	0x45, 0x4d, 0x42, 0x4c, 0x45, 0x20, 0x4d, 0x45,
+};
+
+#endif /* _IP_CHECK_DEFRAG_FRAGS_H */
+21 -8
tools/testing/selftests/bpf/network_helpers.c
···
 	opts = &default_opts;
 
 	optlen = sizeof(type);
-	if (getsockopt(server_fd, SOL_SOCKET, SO_TYPE, &type, &optlen)) {
-		log_err("getsockopt(SOL_TYPE)");
-		return -1;
+
+	if (opts->type) {
+		type = opts->type;
+	} else {
+		if (getsockopt(server_fd, SOL_SOCKET, SO_TYPE, &type, &optlen)) {
+			log_err("getsockopt(SOL_TYPE)");
+			return -1;
+		}
 	}
 
-	if (getsockopt(server_fd, SOL_SOCKET, SO_PROTOCOL, &protocol, &optlen)) {
-		log_err("getsockopt(SOL_PROTOCOL)");
-		return -1;
+	if (opts->proto) {
+		protocol = opts->proto;
+	} else {
+		if (getsockopt(server_fd, SOL_SOCKET, SO_PROTOCOL, &protocol, &optlen)) {
+			log_err("getsockopt(SOL_PROTOCOL)");
+			return -1;
+		}
 	}
 
 	addrlen = sizeof(addr);
···
 		       strlen(opts->cc) + 1))
 		goto error_close;
 
-	if (connect_fd_to_addr(fd, &addr, addrlen, opts->must_fail))
-		goto error_close;
+	if (!opts->noconnect)
+		if (connect_fd_to_addr(fd, &addr, addrlen, opts->must_fail))
+			goto error_close;
 
 	return fd;
···
 
 void close_netns(struct nstoken *token)
 {
+	if (!token)
+		return;
+
 	ASSERT_OK(setns(token->orig_netns_fd, CLONE_NEWNET), "setns");
 	close(token->orig_netns_fd);
 	free(token);
+3
tools/testing/selftests/bpf/network_helpers.h
···
 	const char *cc;
 	int timeout_ms;
 	bool must_fail;
+	bool noconnect;
+	int type;
+	int proto;
 };
 
 /* ipv4 test vector */
+199
tools/testing/selftests/bpf/prog_tests/assign_reuse.c
···
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+#include <uapi/linux/if_link.h>
+#include <test_progs.h>
+
+#include <netinet/tcp.h>
+#include <netinet/udp.h>
+
+#include "network_helpers.h"
+#include "test_assign_reuse.skel.h"
+
+#define NS_TEST "assign_reuse"
+#define LOOPBACK 1
+#define PORT 4443
+
+static int attach_reuseport(int sock_fd, int prog_fd)
+{
+	return setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF,
+			  &prog_fd, sizeof(prog_fd));
+}
+
+static __u64 cookie(int fd)
+{
+	__u64 cookie = 0;
+	socklen_t cookie_len = sizeof(cookie);
+	int ret;
+
+	ret = getsockopt(fd, SOL_SOCKET, SO_COOKIE, &cookie, &cookie_len);
+	ASSERT_OK(ret, "cookie");
+	ASSERT_GT(cookie, 0, "cookie_invalid");
+
+	return cookie;
+}
+
+static int echo_test_udp(int fd_sv)
+{
+	struct sockaddr_storage addr = {};
+	socklen_t len = sizeof(addr);
+	char buff[1] = {};
+	int fd_cl = -1, ret;
+
+	fd_cl = connect_to_fd(fd_sv, 100);
+	ASSERT_GT(fd_cl, 0, "create_client");
+	ASSERT_EQ(getsockname(fd_cl, (void *)&addr, &len), 0, "getsockname");
+
+	ASSERT_EQ(send(fd_cl, buff, sizeof(buff), 0), 1, "send_client");
+
+	ret = recv(fd_sv, buff, sizeof(buff), 0);
+	if (ret < 0) {
+		close(fd_cl);
+		return errno;
+	}
+
+	ASSERT_EQ(ret, 1, "recv_server");
+	ASSERT_EQ(sendto(fd_sv, buff, sizeof(buff), 0, (void *)&addr, len), 1, "send_server");
+	ASSERT_EQ(recv(fd_cl, buff, sizeof(buff), 0), 1, "recv_client");
+	close(fd_cl);
+	return 0;
+}
+
+static int echo_test_tcp(int fd_sv)
+{
+	char buff[1] = {};
+	int fd_cl = -1, fd_sv_cl = -1;
+
+	fd_cl = connect_to_fd(fd_sv, 100);
+	if (fd_cl < 0)
+		return errno;
+
+	fd_sv_cl = accept(fd_sv, NULL, NULL);
+	ASSERT_GE(fd_sv_cl, 0, "accept_fd");
+
+	ASSERT_EQ(send(fd_cl, buff, sizeof(buff), 0), 1, "send_client");
+	ASSERT_EQ(recv(fd_sv_cl, buff, sizeof(buff), 0), 1, "recv_server");
+	ASSERT_EQ(send(fd_sv_cl, buff, sizeof(buff), 0), 1, "send_server");
+	ASSERT_EQ(recv(fd_cl, buff, sizeof(buff), 0), 1, "recv_client");
+	close(fd_sv_cl);
+	close(fd_cl);
+	return 0;
+}
+
+void run_assign_reuse(int family, int sotype, const char *ip, __u16 port)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
+		.ifindex = LOOPBACK,
+		.attach_point = BPF_TC_INGRESS,
+	);
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, tc_opts,
+		.handle = 1,
+		.priority = 1,
+	);
+	bool hook_created = false, tc_attached = false;
+	int ret, fd_tc, fd_accept, fd_drop, fd_map;
+	int *fd_sv = NULL;
+	__u64 fd_val;
+	struct test_assign_reuse *skel;
+	const int zero = 0;
+
+	skel = test_assign_reuse__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	skel->rodata->dest_port = port;
+
+	ret = test_assign_reuse__load(skel);
+	if (!ASSERT_OK(ret, "skel_load"))
+		goto cleanup;
+
+	ASSERT_EQ(skel->bss->sk_cookie_seen, 0, "cookie_init");
+
+	fd_tc = bpf_program__fd(skel->progs.tc_main);
+	fd_accept = bpf_program__fd(skel->progs.reuse_accept);
+	fd_drop = bpf_program__fd(skel->progs.reuse_drop);
+	fd_map = bpf_map__fd(skel->maps.sk_map);
+
+	fd_sv = start_reuseport_server(family, sotype, ip, port, 100, 1);
+	if (!ASSERT_NEQ(fd_sv, NULL, "start_reuseport_server"))
+		goto cleanup;
+
+	ret = attach_reuseport(*fd_sv, fd_drop);
+	if (!ASSERT_OK(ret, "attach_reuseport"))
+		goto cleanup;
+
+	fd_val = *fd_sv;
+	ret = bpf_map_update_elem(fd_map, &zero, &fd_val, BPF_NOEXIST);
+	if (!ASSERT_OK(ret, "bpf_sk_map"))
+		goto cleanup;
+
+	ret = bpf_tc_hook_create(&tc_hook);
+	if (ret == 0)
+		hook_created = true;
+	ret = ret == -EEXIST ? 0 : ret;
+	if (!ASSERT_OK(ret, "bpf_tc_hook_create"))
+		goto cleanup;
+
+	tc_opts.prog_fd = fd_tc;
+	ret = bpf_tc_attach(&tc_hook, &tc_opts);
+	if (!ASSERT_OK(ret, "bpf_tc_attach"))
+		goto cleanup;
+	tc_attached = true;
+
+	if (sotype == SOCK_STREAM)
+		ASSERT_EQ(echo_test_tcp(*fd_sv), ECONNREFUSED, "drop_tcp");
+	else
+		ASSERT_EQ(echo_test_udp(*fd_sv), EAGAIN, "drop_udp");
+	ASSERT_EQ(skel->bss->reuseport_executed, 1, "program executed once");
+
+	skel->bss->sk_cookie_seen = 0;
+	skel->bss->reuseport_executed = 0;
+	ASSERT_OK(attach_reuseport(*fd_sv, fd_accept), "attach_reuseport(accept)");
+
+	if (sotype == SOCK_STREAM)
+		ASSERT_EQ(echo_test_tcp(*fd_sv), 0, "echo_tcp");
+	else
+		ASSERT_EQ(echo_test_udp(*fd_sv), 0, "echo_udp");
+
+	ASSERT_EQ(skel->bss->sk_cookie_seen, cookie(*fd_sv),
+		  "cookie_mismatch");
+	ASSERT_EQ(skel->bss->reuseport_executed, 1, "program executed once");
+cleanup:
+	if (tc_attached) {
+		tc_opts.flags = tc_opts.prog_fd = tc_opts.prog_id = 0;
+		ret = bpf_tc_detach(&tc_hook, &tc_opts);
+		ASSERT_OK(ret, "bpf_tc_detach");
+	}
+	if (hook_created) {
+		tc_hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS;
+		bpf_tc_hook_destroy(&tc_hook);
+	}
+	test_assign_reuse__destroy(skel);
+	free_fds(fd_sv, 1);
+}
+
+void test_assign_reuse(void)
+{
+	struct nstoken *tok = NULL;
+
+	SYS(out, "ip netns add %s", NS_TEST);
+	SYS(cleanup, "ip -net %s link set dev lo up", NS_TEST);
+
+	tok = open_netns(NS_TEST);
+	if (!ASSERT_OK_PTR(tok, "netns token"))
+		return;
+
+	if (test__start_subtest("tcpv4"))
+		run_assign_reuse(AF_INET, SOCK_STREAM, "127.0.0.1", PORT);
+	if (test__start_subtest("tcpv6"))
+		run_assign_reuse(AF_INET6, SOCK_STREAM, "::1", PORT);
+	if (test__start_subtest("udpv4"))
+		run_assign_reuse(AF_INET, SOCK_DGRAM, "127.0.0.1", PORT);
+	if (test__start_subtest("udpv6"))
+		run_assign_reuse(AF_INET6, SOCK_DGRAM, "::1", PORT);
+
+cleanup:
+	close_netns(tok);
+	SYS_NOFAIL("ip netns delete %s", NS_TEST);
+out:
+	return;
+}
+283
tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
···
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <net/if.h>
+#include <linux/netfilter.h>
+#include <network_helpers.h>
+#include "ip_check_defrag.skel.h"
+#include "ip_check_defrag_frags.h"
+
+/*
+ * This selftest spins up a client and an echo server, each in their own
+ * network namespace. The client will send a fragmented message to the server.
+ * The prog attached to the server will shoot down any fragments. Thus, if
+ * the server is able to correctly echo back the message to the client, we will
+ * have verified that netfilter is reassembling packets for us.
+ *
+ * Topology:
+ * =========
+ *        NS0        |        NS1
+ *                   |
+ *      client       |       server
+ *    ----------     |     ----------
+ *    |  veth0  | --------- |  veth1  |
+ *    ----------    peer    ----------
+ *                   |
+ *                   |       with bpf
+ */
+
+#define NS0 "defrag_ns0"
+#define NS1 "defrag_ns1"
+#define VETH0 "veth0"
+#define VETH1 "veth1"
+#define VETH0_ADDR "172.16.1.100"
+#define VETH0_ADDR6 "fc00::100"
+/* The following constants must stay in sync with `generate_udp_fragments.py` */
+#define VETH1_ADDR "172.16.1.200"
+#define VETH1_ADDR6 "fc00::200"
+#define CLIENT_PORT 48878
+#define SERVER_PORT 48879
+#define MAGIC_MESSAGE "THIS IS THE ORIGINAL MESSAGE, PLEASE REASSEMBLE ME"
+
+static int setup_topology(bool ipv6)
+{
+	bool up;
+	int i;
+
+	SYS(fail, "ip netns add " NS0);
+	SYS(fail, "ip netns add " NS1);
+	SYS(fail, "ip link add " VETH0 " netns " NS0 " type veth peer name " VETH1 " netns " NS1);
+	if (ipv6) {
+		SYS(fail, "ip -6 -net " NS0 " addr add " VETH0_ADDR6 "/64 dev " VETH0 " nodad");
+		SYS(fail, "ip -6 -net " NS1 " addr add " VETH1_ADDR6 "/64 dev " VETH1 " nodad");
+	} else {
+		SYS(fail, "ip -net " NS0 " addr add " VETH0_ADDR "/24 dev " VETH0);
+		SYS(fail, "ip -net " NS1 " addr add " VETH1_ADDR "/24 dev " VETH1);
+	}
+	SYS(fail, "ip -net " NS0 " link set dev " VETH0 " up");
+	SYS(fail, "ip -net " NS1 " link set dev " VETH1 " up");
+
+	/* Wait for up to 5s for links to come up */
+	for (i = 0; i < 5; ++i) {
+		if (ipv6)
+			up = !system("ip netns exec " NS0 " ping -6 -c 1 -W 1 " VETH1_ADDR6 " &>/dev/null");
+		else
+			up = !system("ip netns exec " NS0 " ping -c 1 -W 1 " VETH1_ADDR " &>/dev/null");
+
+		if (up)
+			break;
+	}
+
+	return 0;
+fail:
+	return -1;
+}
+
+static void cleanup_topology(void)
+{
+	SYS_NOFAIL("test -f /var/run/netns/" NS0 " && ip netns delete " NS0);
+	SYS_NOFAIL("test -f /var/run/netns/" NS1 " && ip netns delete " NS1);
+}
+
+static int attach(struct ip_check_defrag *skel, bool ipv6)
+{
+	LIBBPF_OPTS(bpf_netfilter_opts, opts,
+		    .pf = ipv6 ? NFPROTO_IPV6 : NFPROTO_IPV4,
+		    .priority = 42,
+		    .flags = BPF_F_NETFILTER_IP_DEFRAG);
+	struct nstoken *nstoken;
+	int err = -1;
+
+	nstoken = open_netns(NS1);
+
+	skel->links.defrag = bpf_program__attach_netfilter(skel->progs.defrag, &opts);
+	if (!ASSERT_OK_PTR(skel->links.defrag, "program attach"))
+		goto out;
+
+	err = 0;
+out:
+	close_netns(nstoken);
+	return err;
+}
+
+static int send_frags(int client)
+{
+	struct sockaddr_storage saddr;
+	struct sockaddr *saddr_p;
+	socklen_t saddr_len;
+	int err;
+
+	saddr_p = (struct sockaddr *)&saddr;
+	err = make_sockaddr(AF_INET, VETH1_ADDR, SERVER_PORT, &saddr, &saddr_len);
+	if (!ASSERT_OK(err, "make_sockaddr"))
+		return -1;
+
+	err = sendto(client, frag_0, sizeof(frag_0), 0, saddr_p, saddr_len);
+	if (!ASSERT_GE(err, 0, "sendto frag_0"))
+		return -1;
+
+	err = sendto(client, frag_1, sizeof(frag_1), 0, saddr_p, saddr_len);
+	if (!ASSERT_GE(err, 0, "sendto frag_1"))
+		return -1;
+
+	err = sendto(client, frag_2, sizeof(frag_2), 0, saddr_p, saddr_len);
+	if (!ASSERT_GE(err, 0, "sendto frag_2"))
+		return -1;
+
+	return 0;
+}
+
+static int send_frags6(int client)
+{
+	struct sockaddr_storage saddr;
+	struct sockaddr *saddr_p;
+	socklen_t saddr_len;
+	int err;
+
+	saddr_p = (struct sockaddr *)&saddr;
+	/* Port needs to be set to 0 for raw ipv6 socket for some reason */
+	err = make_sockaddr(AF_INET6, VETH1_ADDR6, 0, &saddr, &saddr_len);
+	if (!ASSERT_OK(err, "make_sockaddr"))
+		return -1;
+
+	err = sendto(client, frag6_0, sizeof(frag6_0), 0, saddr_p, saddr_len);
+	if (!ASSERT_GE(err, 0, "sendto frag6_0"))
+		return -1;
+
+	err = sendto(client, frag6_1, sizeof(frag6_1), 0, saddr_p, saddr_len);
+	if (!ASSERT_GE(err, 0, "sendto frag6_1"))
+		return -1;
+
+	err = sendto(client, frag6_2, sizeof(frag6_2), 0, saddr_p, saddr_len);
+	if (!ASSERT_GE(err, 0, "sendto frag6_2"))
+		return -1;
+
+	return 0;
+}
+
+void test_bpf_ip_check_defrag_ok(bool ipv6)
+{
+	struct network_helper_opts rx_opts = {
+		.timeout_ms = 1000,
+		.noconnect = true,
+	};
+	struct network_helper_opts tx_ops = {
+		.timeout_ms = 1000,
+		.type = SOCK_RAW,
+		.proto = IPPROTO_RAW,
+		.noconnect = true,
+	};
+	struct sockaddr_storage caddr;
+	struct ip_check_defrag *skel;
+	struct nstoken *nstoken;
+	int client_tx_fd = -1;
+	int client_rx_fd = -1;
+	socklen_t caddr_len;
+	int srv_fd = -1;
+	char buf[1024];
+	int len, err;
+
+	skel = ip_check_defrag__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	if (!ASSERT_OK(setup_topology(ipv6), "setup_topology"))
+		goto out;
+
+	if (!ASSERT_OK(attach(skel, ipv6), "attach"))
+		goto out;
+
+	/* Start server in ns1 */
+	nstoken = open_netns(NS1);
+	if (!ASSERT_OK_PTR(nstoken, "setns ns1"))
+		goto out;
+	srv_fd = start_server(ipv6 ? AF_INET6 : AF_INET, SOCK_DGRAM, NULL, SERVER_PORT, 0);
+	close_netns(nstoken);
+	if (!ASSERT_GE(srv_fd, 0, "start_server"))
+		goto out;
+
+	/* Open tx raw socket in ns0 */
+	nstoken = open_netns(NS0);
+	if (!ASSERT_OK_PTR(nstoken, "setns ns0"))
+		goto out;
+	client_tx_fd = connect_to_fd_opts(srv_fd, &tx_ops);
+	close_netns(nstoken);
+	if (!ASSERT_GE(client_tx_fd, 0, "connect_to_fd_opts"))
+		goto out;
+
+	/* Open rx socket in ns0 */
+	nstoken = open_netns(NS0);
+	if (!ASSERT_OK_PTR(nstoken, "setns ns0"))
+		goto out;
+	client_rx_fd = connect_to_fd_opts(srv_fd, &rx_opts);
+	close_netns(nstoken);
+	if (!ASSERT_GE(client_rx_fd, 0, "connect_to_fd_opts"))
+		goto out;
+
+	/* Bind rx socket to a premeditated port */
+	memset(&caddr, 0, sizeof(caddr));
+	nstoken = open_netns(NS0);
+	if (!ASSERT_OK_PTR(nstoken, "setns ns0"))
+		goto out;
+	if (ipv6) {
+		struct sockaddr_in6 *c = (struct sockaddr_in6 *)&caddr;
+
+		c->sin6_family = AF_INET6;
+		inet_pton(AF_INET6, VETH0_ADDR6, &c->sin6_addr);
+		c->sin6_port = htons(CLIENT_PORT);
+		err = bind(client_rx_fd, (struct sockaddr *)c, sizeof(*c));
+	} else {
+		struct sockaddr_in *c = (struct sockaddr_in *)&caddr;
+
+		c->sin_family = AF_INET;
+		inet_pton(AF_INET, VETH0_ADDR, &c->sin_addr);
+		c->sin_port = htons(CLIENT_PORT);
+		err = bind(client_rx_fd, (struct sockaddr *)c, sizeof(*c));
+	}
+	close_netns(nstoken);
+	if (!ASSERT_OK(err, "bind"))
+		goto out;
+
+	/* Send message in fragments */
+	if (ipv6) {
+		if (!ASSERT_OK(send_frags6(client_tx_fd), "send_frags6"))
+			goto out;
+	} else {
+		if (!ASSERT_OK(send_frags(client_tx_fd), "send_frags"))
+			goto out;
+	}
+
+	if (!ASSERT_EQ(skel->bss->shootdowns, 0, "shootdowns"))
+		goto out;
+
+	/* Receive reassembled msg on server and echo back to client */
+	caddr_len = sizeof(caddr);
+	len = recvfrom(srv_fd, buf, sizeof(buf), 0, (struct sockaddr *)&caddr, &caddr_len);
+	if (!ASSERT_GE(len, 0, "server recvfrom"))
+		goto out;
+	len = sendto(srv_fd, buf, len, 0, (struct sockaddr *)&caddr, caddr_len);
+	if (!ASSERT_GE(len, 0, "server sendto"))
+		goto out;
+
+	/* Expect reassembed message to be echoed back */
+	len = recvfrom(client_rx_fd, buf, sizeof(buf), 0, NULL, NULL);
+	if (!ASSERT_EQ(len, sizeof(MAGIC_MESSAGE) - 1, "client short read"))
+		goto out;
+
+out:
+	if (client_rx_fd != -1)
+		close(client_rx_fd);
+	if (client_tx_fd != -1)
+		close(client_tx_fd);
+	if (srv_fd != -1)
+		close(srv_fd);
+	cleanup_topology();
+	ip_check_defrag__destroy(skel);
+}
+
+void test_bpf_ip_check_defrag(void)
+{
+	if (test__start_subtest("v4"))
+		test_bpf_ip_check_defrag_ok(false);
+	if (test__start_subtest("v6"))
+		test_bpf_ip_check_defrag_ok(true);
+}
+139
tools/testing/selftests/bpf/prog_tests/test_ldsx_insn.c
···
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates.*/
+
+#include <test_progs.h>
+#include <network_helpers.h>
+#include "test_ldsx_insn.skel.h"
+
+static void test_map_val_and_probed_memory(void)
+{
+	struct test_ldsx_insn *skel;
+	int err;
+
+	skel = test_ldsx_insn__open();
+	if (!ASSERT_OK_PTR(skel, "test_ldsx_insn__open"))
+		return;
+
+	if (skel->rodata->skip) {
+		test__skip();
+		goto out;
+	}
+
+	bpf_program__set_autoload(skel->progs.rdonly_map_prog, true);
+	bpf_program__set_autoload(skel->progs.map_val_prog, true);
+	bpf_program__set_autoload(skel->progs.test_ptr_struct_arg, true);
+
+	err = test_ldsx_insn__load(skel);
+	if (!ASSERT_OK(err, "test_ldsx_insn__load"))
+		goto out;
+
+	err = test_ldsx_insn__attach(skel);
+	if (!ASSERT_OK(err, "test_ldsx_insn__attach"))
+		goto out;
+
+	ASSERT_OK(trigger_module_test_read(256), "trigger_read");
+
+	ASSERT_EQ(skel->bss->done1, 1, "done1");
+	ASSERT_EQ(skel->bss->ret1, 1, "ret1");
+	ASSERT_EQ(skel->bss->done2, 1, "done2");
+	ASSERT_EQ(skel->bss->ret2, 1, "ret2");
+	ASSERT_EQ(skel->bss->int_member, -1, "int_member");
+
+out:
+	test_ldsx_insn__destroy(skel);
+}
+
+static void test_ctx_member_sign_ext(void)
+{
+	struct test_ldsx_insn *skel;
+	int err, fd, cgroup_fd;
+	char buf[16] = {0};
+	socklen_t optlen;
+
+	cgroup_fd = test__join_cgroup("/ldsx_test");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /ldsx_test"))
+		return;
+
+	skel = test_ldsx_insn__open();
+	if (!ASSERT_OK_PTR(skel, "test_ldsx_insn__open"))
+		goto close_cgroup_fd;
+
+	if (skel->rodata->skip) {
+		test__skip();
+		goto destroy_skel;
+	}
+
+	bpf_program__set_autoload(skel->progs._getsockopt, true);
+
+	err = test_ldsx_insn__load(skel);
+	if (!ASSERT_OK(err, "test_ldsx_insn__load"))
+		goto destroy_skel;
+
+	skel->links._getsockopt =
+		bpf_program__attach_cgroup(skel->progs._getsockopt, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links._getsockopt, "getsockopt_link"))
+		goto destroy_skel;
+
+	fd = socket(AF_INET, SOCK_STREAM, 0);
+	if (!ASSERT_GE(fd, 0, "socket"))
+		goto destroy_skel;
+
+	optlen = sizeof(buf);
+	(void)getsockopt(fd, SOL_IP, IP_TTL, buf, &optlen);
+
+	ASSERT_EQ(skel->bss->set_optlen, -1, "optlen");
+	ASSERT_EQ(skel->bss->set_retval, -1, "retval");
+
+	close(fd);
+destroy_skel:
+	test_ldsx_insn__destroy(skel);
+close_cgroup_fd:
+	close(cgroup_fd);
+}
+
+static void test_ctx_member_narrow_sign_ext(void)
+{
+	struct test_ldsx_insn *skel;
+	struct __sk_buff skb = {};
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .ctx_in = &skb,
+		    .ctx_size_in = sizeof(skb),
+	);
+	int err, prog_fd;
+
+	skel = test_ldsx_insn__open();
+	if (!ASSERT_OK_PTR(skel, "test_ldsx_insn__open"))
+		return;
+
+	if (skel->rodata->skip) {
+		test__skip();
+		goto out;
+	}
+
+	bpf_program__set_autoload(skel->progs._tc, true);
+
+	err = test_ldsx_insn__load(skel);
+	if (!ASSERT_OK(err, "test_ldsx_insn__load"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs._tc);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run");
+
+	ASSERT_EQ(skel->bss->set_mark, -2, "set_mark");
+
+out:
+	test_ldsx_insn__destroy(skel);
+}
+
+void test_ldsx_insn(void)
+{
+	if (test__start_subtest("map_val and probed_memory"))
+		test_map_val_and_probed_memory();
+	if (test__start_subtest("ctx_member_sign_ext"))
+		test_ctx_member_sign_ext();
+	if (test__start_subtest("ctx_member_narrow_sign_ext"))
+		test_ctx_member_narrow_sign_ext();
+}
+10
tools/testing/selftests/bpf/prog_tests/verifier.c
···
 #include "verifier_bounds_deduction_non_const.skel.h"
 #include "verifier_bounds_mix_sign_unsign.skel.h"
 #include "verifier_bpf_get_stack.skel.h"
+#include "verifier_bswap.skel.h"
 #include "verifier_btf_ctx_access.skel.h"
 #include "verifier_cfg.skel.h"
 #include "verifier_cgroup_inv_retcode.skel.h"
···
 #include "verifier_direct_stack_access_wraparound.skel.h"
 #include "verifier_div0.skel.h"
 #include "verifier_div_overflow.skel.h"
+#include "verifier_gotol.skel.h"
 #include "verifier_helper_access_var_len.skel.h"
 #include "verifier_helper_packet_access.skel.h"
 #include "verifier_helper_restricted.skel.h"
···
 #include "verifier_int_ptr.skel.h"
 #include "verifier_jeq_infer_not_null.skel.h"
 #include "verifier_ld_ind.skel.h"
+#include "verifier_ldsx.skel.h"
 #include "verifier_leak_ptr.skel.h"
 #include "verifier_loops1.skel.h"
 #include "verifier_lwt.skel.h"
···
 #include "verifier_map_ret_val.skel.h"
 #include "verifier_masking.skel.h"
 #include "verifier_meta_access.skel.h"
+#include "verifier_movsx.skel.h"
 #include "verifier_netfilter_ctx.skel.h"
 #include "verifier_netfilter_retcode.skel.h"
 #include "verifier_prevent_map_lookup.skel.h"
···
 #include "verifier_ringbuf.skel.h"
 #include "verifier_runtime_jit.skel.h"
 #include "verifier_scalar_ids.skel.h"
+#include "verifier_sdiv.skel.h"
 #include "verifier_search_pruning.skel.h"
 #include "verifier_sock.skel.h"
 #include "verifier_spill_fill.skel.h"
···
 void test_verifier_bounds_deduction_non_const(void) { RUN(verifier_bounds_deduction_non_const); }
 void test_verifier_bounds_mix_sign_unsign(void) { RUN(verifier_bounds_mix_sign_unsign); }
 void test_verifier_bpf_get_stack(void) { RUN(verifier_bpf_get_stack); }
+void test_verifier_bswap(void) { RUN(verifier_bswap); }
 void test_verifier_btf_ctx_access(void) { RUN(verifier_btf_ctx_access); }
 void test_verifier_cfg(void) { RUN(verifier_cfg); }
 void test_verifier_cgroup_inv_retcode(void) { RUN(verifier_cgroup_inv_retcode); }
···
 void test_verifier_direct_stack_access_wraparound(void) { RUN(verifier_direct_stack_access_wraparound); }
 void test_verifier_div0(void) { RUN(verifier_div0); }
 void test_verifier_div_overflow(void) { RUN(verifier_div_overflow); }
+void test_verifier_gotol(void) { RUN(verifier_gotol); }
 void test_verifier_helper_access_var_len(void) { RUN(verifier_helper_access_var_len); }
 void test_verifier_helper_packet_access(void) { RUN(verifier_helper_packet_access); }
 void test_verifier_helper_restricted(void) { RUN(verifier_helper_restricted); }
···
 void test_verifier_int_ptr(void) { RUN(verifier_int_ptr); }
 void test_verifier_jeq_infer_not_null(void) { RUN(verifier_jeq_infer_not_null); }
 void test_verifier_ld_ind(void) { RUN(verifier_ld_ind); }
+void test_verifier_ldsx(void) { RUN(verifier_ldsx); }
 void test_verifier_leak_ptr(void) { RUN(verifier_leak_ptr); }
 void test_verifier_loops1(void) { RUN(verifier_loops1); }
 void test_verifier_lwt(void) { RUN(verifier_lwt); }
···
 void test_verifier_map_ret_val(void) { RUN(verifier_map_ret_val); }
 void test_verifier_masking(void) { RUN(verifier_masking); }
 void test_verifier_meta_access(void) { RUN(verifier_meta_access); }
+void test_verifier_movsx(void) { RUN(verifier_movsx); }
 void test_verifier_netfilter_ctx(void) { RUN(verifier_netfilter_ctx); }
 void test_verifier_netfilter_retcode(void) { RUN(verifier_netfilter_retcode); }
 void test_verifier_prevent_map_lookup(void) { RUN(verifier_prevent_map_lookup); }
···
 void test_verifier_ringbuf(void) { RUN(verifier_ringbuf); }
 void test_verifier_runtime_jit(void) { RUN(verifier_runtime_jit); }
 void test_verifier_scalar_ids(void) { RUN(verifier_scalar_ids); }
+void test_verifier_sdiv(void) { RUN(verifier_sdiv); }
 void test_verifier_search_pruning(void) { RUN(verifier_search_pruning); }
 void test_verifier_sock(void) { RUN(verifier_sock); }
 void test_verifier_spill_fill(void) { RUN(verifier_spill_fill); }
+65
tools/testing/selftests/bpf/prog_tests/xdp_attach.c
···
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
+#include "test_xdp_attach_fail.skel.h"
 
 #define IFINDEX_LO 1
 #define XDP_FLAGS_REPLACE (1U << 4)
···
 	bpf_object__close(obj1);
 }
 
+#define ERRMSG_LEN 64
+
+struct xdp_errmsg {
+	char msg[ERRMSG_LEN];
+};
+
+static void on_xdp_errmsg(void *ctx, int cpu, void *data, __u32 size)
+{
+	struct xdp_errmsg *ctx_errmg = ctx, *tp_errmsg = data;
+
+	memcpy(&ctx_errmg->msg, &tp_errmsg->msg, ERRMSG_LEN);
+}
+
+static const char tgt_errmsg[] = "Invalid XDP flags for BPF link attachment";
+
+static void test_xdp_attach_fail(const char *file)
+{
+	struct test_xdp_attach_fail *skel = NULL;
+	struct xdp_errmsg errmsg = {};
+	struct perf_buffer *pb = NULL;
+	struct bpf_object *obj = NULL;
+	int err, fd_xdp;
+
+	LIBBPF_OPTS(bpf_link_create_opts, opts);
+
+	skel = test_xdp_attach_fail__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_xdp_attach_fail__open_and_load"))
+		goto out_close;
+
+	err = test_xdp_attach_fail__attach(skel);
+	if (!ASSERT_EQ(err, 0, "test_xdp_attach_fail__attach"))
+		goto out_close;
+
+	/* set up perf buffer */
+	pb = perf_buffer__new(bpf_map__fd(skel->maps.xdp_errmsg_pb), 1,
+			      on_xdp_errmsg, NULL, &errmsg, NULL);
+	if (!ASSERT_OK_PTR(pb, "perf_buffer__new"))
+		goto out_close;
+
+	err = bpf_prog_test_load(file, BPF_PROG_TYPE_XDP, &obj, &fd_xdp);
+	if (!ASSERT_EQ(err, 0, "bpf_prog_test_load"))
+		goto out_close;
+
+	opts.flags = 0xFF; // invalid flags to fail to attach XDP prog
+	err = bpf_link_create(fd_xdp, IFINDEX_LO, BPF_XDP, &opts);
+	if (!ASSERT_EQ(err, -EINVAL, "bpf_link_create"))
+		goto out_close;
+
+	/* read perf buffer */
+	err = perf_buffer__poll(pb, 100);
+	if (!ASSERT_GT(err, -1, "perf_buffer__poll"))
+		goto out_close;
+
+	ASSERT_STRNEQ((const char *) errmsg.msg, tgt_errmsg,
+		      42 /* strlen(tgt_errmsg) */, "check error message");
+
+out_close:
+	perf_buffer__free(pb);
+	bpf_object__close(obj);
+	test_xdp_attach_fail__destroy(skel);
+}
+
 void serial_test_xdp_attach(void)
 {
 	if (test__start_subtest("xdp_attach"))
 		test_xdp_attach("./test_xdp.bpf.o");
 	if (test__start_subtest("xdp_attach_dynptr"))
 		test_xdp_attach("./test_xdp_dynptr.bpf.o");
+	if (test__start_subtest("xdp_attach_failed"))
+		test_xdp_attach_fail("./xdp_dummy.bpf.o");
 }
+104
tools/testing/selftests/bpf/progs/ip_check_defrag.c
···
+// SPDX-License-Identifier: GPL-2.0-only
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+#include "bpf_tracing_net.h"
+
+#define NF_DROP 0
+#define NF_ACCEPT 1
+#define ETH_P_IP 0x0800
+#define ETH_P_IPV6 0x86DD
+#define IP_MF 0x2000
+#define IP_OFFSET 0x1FFF
+#define NEXTHDR_FRAGMENT 44
+
+extern int bpf_dynptr_from_skb(struct sk_buff *skb, __u64 flags,
+			       struct bpf_dynptr *ptr__uninit) __ksym;
+extern void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, uint32_t offset,
+			      void *buffer, uint32_t buffer__sz) __ksym;
+
+volatile int shootdowns = 0;
+
+static bool is_frag_v4(struct iphdr *iph)
+{
+	int offset;
+	int flags;
+
+	offset = bpf_ntohs(iph->frag_off);
+	flags = offset & ~IP_OFFSET;
+	offset &= IP_OFFSET;
+	offset <<= 3;
+
+	return (flags & IP_MF) || offset;
+}
+
+static bool is_frag_v6(struct ipv6hdr *ip6h)
+{
+	/* Simplifying assumption that there are no extension headers
+	 * between fixed header and fragmentation header. This assumption
+	 * is only valid in this test case. It saves us the hassle of
+	 * searching all potential extension headers.
+	 */
+	return ip6h->nexthdr == NEXTHDR_FRAGMENT;
+}
+
+static int handle_v4(struct sk_buff *skb)
+{
+	struct bpf_dynptr ptr;
+	u8 iph_buf[20] = {};
+	struct iphdr *iph;
+
+	if (bpf_dynptr_from_skb(skb, 0, &ptr))
+		return NF_DROP;
+
+	iph = bpf_dynptr_slice(&ptr, 0, iph_buf, sizeof(iph_buf));
+	if (!iph)
+		return NF_DROP;
+
+	/* Shootdown any frags */
+	if (is_frag_v4(iph)) {
+		shootdowns++;
+		return NF_DROP;
+	}
+
+	return NF_ACCEPT;
+}
+
+static int handle_v6(struct sk_buff *skb)
+{
+	struct bpf_dynptr ptr;
+	struct ipv6hdr *ip6h;
+	u8 ip6h_buf[40] = {};
+
+	if (bpf_dynptr_from_skb(skb, 0, &ptr))
+		return NF_DROP;
+
+	ip6h = bpf_dynptr_slice(&ptr, 0, ip6h_buf, sizeof(ip6h_buf));
+	if (!ip6h)
+		return NF_DROP;
+
+	/* Shootdown any frags */
+	if (is_frag_v6(ip6h)) {
+		shootdowns++;
+		return NF_DROP;
+	}
+
+	return NF_ACCEPT;
+}
+
+SEC("netfilter")
+int defrag(struct bpf_nf_ctx *ctx)
+{
+	struct sk_buff *skb = ctx->skb;
+
+	switch (bpf_ntohs(skb->protocol)) {
+	case ETH_P_IP:
+		return handle_v4(skb);
+	case ETH_P_IPV6:
+		return handle_v6(skb);
+	default:
+		return NF_ACCEPT;
+	}
+}
+
+char _license[] SEC("license") = "GPL";
+142
tools/testing/selftests/bpf/progs/test_assign_reuse.c
···
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+#include <stdbool.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
+#include <bpf/bpf_endian.h>
+#include <bpf/bpf_helpers.h>
+#include <linux/pkt_cls.h>
+
+char LICENSE[] SEC("license") = "GPL";
+
+__u64 sk_cookie_seen;
+__u64 reuseport_executed;
+union {
+	struct tcphdr tcp;
+	struct udphdr udp;
+} headers;
+
+const volatile __u16 dest_port;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u64);
+} sk_map SEC(".maps");
+
+SEC("sk_reuseport")
+int reuse_accept(struct sk_reuseport_md *ctx)
+{
+	reuseport_executed++;
+
+	if (ctx->ip_protocol == IPPROTO_TCP) {
+		if (ctx->data + sizeof(headers.tcp) > ctx->data_end)
+			return SK_DROP;
+
+		if (__builtin_memcmp(&headers.tcp, ctx->data, sizeof(headers.tcp)) != 0)
+			return SK_DROP;
+	} else if (ctx->ip_protocol == IPPROTO_UDP) {
+		if (ctx->data + sizeof(headers.udp) > ctx->data_end)
+			return SK_DROP;
+
+		if (__builtin_memcmp(&headers.udp, ctx->data, sizeof(headers.udp)) != 0)
+			return SK_DROP;
+	} else {
+		return SK_DROP;
+	}
+
+	sk_cookie_seen = bpf_get_socket_cookie(ctx->sk);
+	return SK_PASS;
+}
+
+SEC("sk_reuseport")
+int reuse_drop(struct sk_reuseport_md *ctx)
+{
+	reuseport_executed++;
+	sk_cookie_seen = 0;
+	return SK_DROP;
+}
+
+static int
+assign_sk(struct __sk_buff *skb)
+{
+	int zero = 0, ret = 0;
+	struct bpf_sock *sk;
+
+	sk = bpf_map_lookup_elem(&sk_map, &zero);
+	if (!sk)
+		return TC_ACT_SHOT;
+	ret = bpf_sk_assign(skb, sk, 0);
+	bpf_sk_release(sk);
+	return ret ? TC_ACT_SHOT : TC_ACT_OK;
+}
+
+static bool
+maybe_assign_tcp(struct __sk_buff *skb, struct tcphdr *th)
+{
+	if (th + 1 > (void *)(long)(skb->data_end))
+		return TC_ACT_SHOT;
+
+	if (!th->syn || th->ack || th->dest != bpf_htons(dest_port))
+		return TC_ACT_OK;
+
+	__builtin_memcpy(&headers.tcp, th, sizeof(headers.tcp));
+	return assign_sk(skb);
+}
+
+static bool
+maybe_assign_udp(struct __sk_buff *skb, struct udphdr *uh)
+{
+	if (uh + 1 > (void *)(long)(skb->data_end))
+		return TC_ACT_SHOT;
+
+	if (uh->dest != bpf_htons(dest_port))
+		return TC_ACT_OK;
+
+	__builtin_memcpy(&headers.udp, uh, sizeof(headers.udp));
+	return assign_sk(skb);
+}
+
+SEC("tc")
+int tc_main(struct __sk_buff *skb)
+{
+	void *data_end = (void *)(long)skb->data_end;
+	void *data = (void *)(long)skb->data;
+	struct ethhdr *eth;
+
+	eth = (struct ethhdr *)(data);
+	if (eth + 1 > data_end)
+		return TC_ACT_SHOT;
+
+	if (eth->h_proto == bpf_htons(ETH_P_IP)) {
+		struct iphdr *iph = (struct iphdr *)(data + sizeof(*eth));
+
+		if (iph + 1 > data_end)
+			return TC_ACT_SHOT;
+
+		if (iph->protocol == IPPROTO_TCP)
+			return maybe_assign_tcp(skb, (struct tcphdr *)(iph + 1));
+		else if (iph->protocol == IPPROTO_UDP)
+			return maybe_assign_udp(skb, (struct udphdr *)(iph + 1));
+		else
+			return TC_ACT_SHOT;
+	} else {
+		struct ipv6hdr *ip6h = (struct ipv6hdr *)(data + sizeof(*eth));
+
+		if (ip6h + 1 > data_end)
+			return TC_ACT_SHOT;
+
+		if (ip6h->nexthdr == IPPROTO_TCP)
+			return maybe_assign_tcp(skb, (struct tcphdr *)(ip6h + 1));
+		else if (ip6h->nexthdr == IPPROTO_UDP)
+			return maybe_assign_udp(skb, (struct udphdr *)(ip6h + 1));
+		else
+			return TC_ACT_SHOT;
+	}
+}
+9
tools/testing/selftests/bpf/progs/test_cls_redirect.h
 #include <linux/ipv6.h>
 #include <linux/udp.h>

+/* offsetof() is used in static asserts, and the libbpf-redefined CO-RE
+ * friendly version breaks compilation for older clang versions <= 15
+ * when invoked in a static assert. Restore original here.
+ */
+#ifdef offsetof
+#undef offsetof
+#define offsetof(type, member) __builtin_offsetof(type, member)
+#endif
+
 struct gre_base_hdr {
 	uint16_t flags;
 	uint16_t protocol;
+118
tools/testing/selftests/bpf/progs/test_ldsx_insn.c
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18
const volatile int skip = 0;
#else
const volatile int skip = 1;
#endif

volatile const short val1 = -1;
volatile const int val2 = -1;
short val3 = -1;
int val4 = -1;
int done1, done2, ret1, ret2;

SEC("?raw_tp/sys_enter")
int rdonly_map_prog(const void *ctx)
{
	if (done1)
		return 0;

	done1 = 1;
	/* val1/val2 readonly map */
	if (val1 == val2)
		ret1 = 1;
	return 0;
}

SEC("?raw_tp/sys_enter")
int map_val_prog(const void *ctx)
{
	if (done2)
		return 0;

	done2 = 1;
	/* val3/val4 regular read/write map */
	if (val3 == val4)
		ret2 = 1;
	return 0;
}

struct bpf_testmod_struct_arg_1 {
	int a;
};

long long int_member;

SEC("?fentry/bpf_testmod_test_arg_ptr_to_struct")
int BPF_PROG2(test_ptr_struct_arg, struct bpf_testmod_struct_arg_1 *, p)
{
	/* probed memory access */
	int_member = p->a;
	return 0;
}

long long set_optlen, set_retval;

SEC("?cgroup/getsockopt")
int _getsockopt(volatile struct bpf_sockopt *ctx)
{
	int old_optlen, old_retval;

	old_optlen = ctx->optlen;
	old_retval = ctx->retval;

	ctx->optlen = -1;
	ctx->retval = -1;

	/* sign extension for ctx member */
	set_optlen = ctx->optlen;
	set_retval = ctx->retval;

	ctx->optlen = old_optlen;
	ctx->retval = old_retval;

	return 0;
}

long long set_mark;

SEC("?tc")
int _tc(volatile struct __sk_buff *skb)
{
	long long tmp_mark;
	int old_mark;

	old_mark = skb->mark;

	skb->mark = 0xf6fe;

	/* narrowed sign extension for ctx member */
#if __clang_major__ >= 18
	/* force narrow one-byte signed load. Otherwise, compiler may
	 * generate a 32-bit unsigned load followed by an s8 movsx.
	 */
	asm volatile ("r1 = *(s8 *)(%[ctx] + %[off_mark])\n\t"
		      "%[tmp_mark] = r1"
		      : [tmp_mark]"=r"(tmp_mark)
		      : [ctx]"r"(skb),
			[off_mark]"i"(offsetof(struct __sk_buff, mark))
		      : "r1");
#else
	tmp_mark = (char)skb->mark;
#endif
	set_mark = tmp_mark;

	skb->mark = old_mark;

	return 0;
}

char _license[] SEC("license") = "GPL";
+54
tools/testing/selftests/bpf/progs/test_xdp_attach_fail.c
// SPDX-License-Identifier: GPL-2.0
/* Copyright Leon Hwang */

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define ERRMSG_LEN 64

struct xdp_errmsg {
	char msg[ERRMSG_LEN];
};

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__type(key, int);
	__type(value, int);
} xdp_errmsg_pb SEC(".maps");

struct xdp_attach_error_ctx {
	unsigned long unused;

	/*
	 * bpf does not support tracepoint __data_loc directly.
	 *
	 * Actually, this field is a 32 bit integer whose value encodes
	 * information on where to find the actual data. The first 2 bytes is
	 * the size of the data. The last 2 bytes is the offset from the start
	 * of the tracepoint struct where the data begins.
	 * -- https://github.com/iovisor/bpftrace/pull/1542
	 */
	__u32 msg; // __data_loc char[] msg;
};

/*
 * Catch the error message at the tracepoint.
 */

SEC("tp/xdp/bpf_xdp_link_attach_failed")
int tp__xdp__bpf_xdp_link_attach_failed(struct xdp_attach_error_ctx *ctx)
{
	char *msg = (void *)(__u64) ((void *) ctx + (__u16) ctx->msg);
	struct xdp_errmsg errmsg = {};

	bpf_probe_read_kernel_str(&errmsg.msg, ERRMSG_LEN, msg);
	bpf_perf_event_output(ctx, &xdp_errmsg_pb, BPF_F_CURRENT_CPU, &errmsg,
			      ERRMSG_LEN);
	return 0;
}

/*
 * Reuse the XDP program in xdp_dummy.c.
 */

char LICENSE[] SEC("license") = "GPL";
+59
tools/testing/selftests/bpf/progs/verifier_bswap.c
// SPDX-License-Identifier: GPL-2.0

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"

#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18

SEC("socket")
__description("BSWAP, 16")
__success __success_unpriv __retval(0x23ff)
__naked void bswap_16(void)
{
	asm volatile ("		\
	r0 = 0xff23;		\
	r0 = bswap16 r0;	\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("BSWAP, 32")
__success __success_unpriv __retval(0x23ff0000)
__naked void bswap_32(void)
{
	asm volatile ("		\
	r0 = 0xff23;		\
	r0 = bswap32 r0;	\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("BSWAP, 64")
__success __success_unpriv __retval(0x34ff12ff)
__naked void bswap_64(void)
{
	asm volatile ("		\
	r0 = %[u64_val] ll;	\
	r0 = bswap64 r0;	\
	exit;			\
"	:
	: [u64_val]"i"(0xff12ff34ff56ff78ull)
	: __clobber_all);
}

#else

SEC("socket")
__description("cpuv4 is not supported by compiler or jit, use a dummy test")
__success
int dummy_test(void)
{
	return 0;
}

#endif

char _license[] SEC("license") = "GPL";
+44
tools/testing/selftests/bpf/progs/verifier_gotol.c
// SPDX-License-Identifier: GPL-2.0

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"

#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18

SEC("socket")
__description("gotol, small_imm")
__success __success_unpriv __retval(1)
__naked void gotol_small_imm(void)
{
	asm volatile ("			\
	call %[bpf_ktime_get_ns];	\
	if r0 == 0 goto l0_%=;		\
	gotol l1_%=;			\
l2_%=:					\
	gotol l3_%=;			\
l1_%=:					\
	r0 = 1;				\
	gotol l2_%=;			\
l0_%=:					\
	r0 = 2;				\
l3_%=:					\
	exit;				\
"	:
	: __imm(bpf_ktime_get_ns)
	: __clobber_all);
}

#else

SEC("socket")
__description("cpuv4 is not supported by compiler or jit, use a dummy test")
__success
int dummy_test(void)
{
	return 0;
}

#endif

char _license[] SEC("license") = "GPL";
+131
tools/testing/selftests/bpf/progs/verifier_ldsx.c
// SPDX-License-Identifier: GPL-2.0

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"

#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18

SEC("socket")
__description("LDSX, S8")
__success __success_unpriv __retval(-2)
__naked void ldsx_s8(void)
{
	asm volatile ("			\
	r1 = 0x3fe;			\
	*(u64 *)(r10 - 8) = r1;		\
	r0 = *(s8 *)(r10 - 8);		\
	exit;				\
"	::: __clobber_all);
}

SEC("socket")
__description("LDSX, S16")
__success __success_unpriv __retval(-2)
__naked void ldsx_s16(void)
{
	asm volatile ("			\
	r1 = 0x3fffe;			\
	*(u64 *)(r10 - 8) = r1;		\
	r0 = *(s16 *)(r10 - 8);		\
	exit;				\
"	::: __clobber_all);
}

SEC("socket")
__description("LDSX, S32")
__success __success_unpriv __retval(-1)
__naked void ldsx_s32(void)
{
	asm volatile ("			\
	r1 = 0xfffffffe;		\
	*(u64 *)(r10 - 8) = r1;		\
	r0 = *(s32 *)(r10 - 8);		\
	r0 >>= 1;			\
	exit;				\
"	::: __clobber_all);
}

SEC("socket")
__description("LDSX, S8 range checking, privileged")
__log_level(2) __success __retval(1)
__msg("R1_w=scalar(smin=-128,smax=127)")
__naked void ldsx_s8_range_priv(void)
{
	asm volatile ("			\
	call %[bpf_get_prandom_u32];	\
	*(u64 *)(r10 - 8) = r0;		\
	r1 = *(s8 *)(r10 - 8);		\
	/* r1 with s8 range */		\
	if r1 s> 0x7f goto l0_%=;	\
	if r1 s< -0x80 goto l0_%=;	\
	r0 = 1;				\
l1_%=:					\
	exit;				\
l0_%=:					\
	r0 = 2;				\
	goto l1_%=;			\
"	:
	: __imm(bpf_get_prandom_u32)
	: __clobber_all);
}

SEC("socket")
__description("LDSX, S16 range checking")
__success __success_unpriv __retval(1)
__naked void ldsx_s16_range(void)
{
	asm volatile ("			\
	call %[bpf_get_prandom_u32];	\
	*(u64 *)(r10 - 8) = r0;		\
	r1 = *(s16 *)(r10 - 8);		\
	/* r1 with s16 range */		\
	if r1 s> 0x7fff goto l0_%=;	\
	if r1 s< -0x8000 goto l0_%=;	\
	r0 = 1;				\
l1_%=:					\
	exit;				\
l0_%=:					\
	r0 = 2;				\
	goto l1_%=;			\
"	:
	: __imm(bpf_get_prandom_u32)
	: __clobber_all);
}

SEC("socket")
__description("LDSX, S32 range checking")
__success __success_unpriv __retval(1)
__naked void ldsx_s32_range(void)
{
	asm volatile ("			\
	call %[bpf_get_prandom_u32];	\
	*(u64 *)(r10 - 8) = r0;		\
	r1 = *(s32 *)(r10 - 8);		\
	/* r1 with s32 range */		\
	if r1 s> 0x7fffFFFF goto l0_%=;	\
	if r1 s< -0x80000000 goto l0_%=;	\
	r0 = 1;				\
l1_%=:					\
	exit;				\
l0_%=:					\
	r0 = 2;				\
	goto l1_%=;			\
"	:
	: __imm(bpf_get_prandom_u32)
	: __clobber_all);
}

#else

SEC("socket")
__description("cpuv4 is not supported by compiler or jit, use a dummy test")
__success
int dummy_test(void)
{
	return 0;
}

#endif

char _license[] SEC("license") = "GPL";
+213
tools/testing/selftests/bpf/progs/verifier_movsx.c
// SPDX-License-Identifier: GPL-2.0

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"

#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18

SEC("socket")
__description("MOV32SX, S8")
__success __success_unpriv __retval(0x23)
__naked void mov32sx_s8(void)
{
	asm volatile ("		\
	w0 = 0xff23;		\
	w0 = (s8)w0;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("MOV32SX, S16")
__success __success_unpriv __retval(0xFFFFff23)
__naked void mov32sx_s16(void)
{
	asm volatile ("		\
	w0 = 0xff23;		\
	w0 = (s16)w0;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("MOV64SX, S8")
__success __success_unpriv __retval(-2)
__naked void mov64sx_s8(void)
{
	asm volatile ("		\
	r0 = 0x1fe;		\
	r0 = (s8)r0;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("MOV64SX, S16")
__success __success_unpriv __retval(0xf23)
__naked void mov64sx_s16(void)
{
	asm volatile ("		\
	r0 = 0xf0f23;		\
	r0 = (s16)r0;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("MOV64SX, S32")
__success __success_unpriv __retval(-1)
__naked void mov64sx_s32(void)
{
	asm volatile ("		\
	r0 = 0xfffffffe;	\
	r0 = (s32)r0;		\
	r0 >>= 1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("MOV32SX, S8, range_check")
__success __success_unpriv __retval(1)
__naked void mov32sx_s8_range(void)
{
	asm volatile ("			\
	call %[bpf_get_prandom_u32];	\
	w1 = (s8)w0;			\
	/* w1 with s8 range */		\
	if w1 s> 0x7f goto l0_%=;	\
	if w1 s< -0x80 goto l0_%=;	\
	r0 = 1;				\
l1_%=:					\
	exit;				\
l0_%=:					\
	r0 = 2;				\
	goto l1_%=;			\
"	:
	: __imm(bpf_get_prandom_u32)
	: __clobber_all);
}

SEC("socket")
__description("MOV32SX, S16, range_check")
__success __success_unpriv __retval(1)
__naked void mov32sx_s16_range(void)
{
	asm volatile ("			\
	call %[bpf_get_prandom_u32];	\
	w1 = (s16)w0;			\
	/* w1 with s16 range */		\
	if w1 s> 0x7fff goto l0_%=;	\
	if w1 s< -0x80ff goto l0_%=;	\
	r0 = 1;				\
l1_%=:					\
	exit;				\
l0_%=:					\
	r0 = 2;				\
	goto l1_%=;			\
"	:
	: __imm(bpf_get_prandom_u32)
	: __clobber_all);
}

SEC("socket")
__description("MOV32SX, S16, range_check 2")
__success __success_unpriv __retval(1)
__naked void mov32sx_s16_range_2(void)
{
	asm volatile ("			\
	r1 = 65535;			\
	w2 = (s16)w1;			\
	r2 >>= 1;			\
	if r2 != 0x7fffFFFF goto l0_%=;	\
	r0 = 1;				\
l1_%=:					\
	exit;				\
l0_%=:					\
	r0 = 0;				\
	goto l1_%=;			\
"	:
	: __imm(bpf_get_prandom_u32)
	: __clobber_all);
}

SEC("socket")
__description("MOV64SX, S8, range_check")
__success __success_unpriv __retval(1)
__naked void mov64sx_s8_range(void)
{
	asm volatile ("			\
	call %[bpf_get_prandom_u32];	\
	r1 = (s8)r0;			\
	/* r1 with s8 range */		\
	if r1 s> 0x7f goto l0_%=;	\
	if r1 s< -0x80 goto l0_%=;	\
	r0 = 1;				\
l1_%=:					\
	exit;				\
l0_%=:					\
	r0 = 2;				\
	goto l1_%=;			\
"	:
	: __imm(bpf_get_prandom_u32)
	: __clobber_all);
}

SEC("socket")
__description("MOV64SX, S16, range_check")
__success __success_unpriv __retval(1)
__naked void mov64sx_s16_range(void)
{
	asm volatile ("			\
	call %[bpf_get_prandom_u32];	\
	r1 = (s16)r0;			\
	/* r1 with s16 range */		\
	if r1 s> 0x7fff goto l0_%=;	\
	if r1 s< -0x8000 goto l0_%=;	\
	r0 = 1;				\
l1_%=:					\
	exit;				\
l0_%=:					\
	r0 = 2;				\
	goto l1_%=;			\
"	:
	: __imm(bpf_get_prandom_u32)
	: __clobber_all);
}

SEC("socket")
__description("MOV64SX, S32, range_check")
__success __success_unpriv __retval(1)
__naked void mov64sx_s32_range(void)
{
	asm volatile ("			\
	call %[bpf_get_prandom_u32];	\
	r1 = (s32)r0;			\
	/* r1 with s32 range */		\
	if r1 s> 0x7fffffff goto l0_%=;	\
	if r1 s< -0x80000000 goto l0_%=;	\
	r0 = 1;				\
l1_%=:					\
	exit;				\
l0_%=:					\
	r0 = 2;				\
	goto l1_%=;			\
"	:
	: __imm(bpf_get_prandom_u32)
	: __clobber_all);
}

#else

SEC("socket")
__description("cpuv4 is not supported by compiler or jit, use a dummy test")
__success
int dummy_test(void)
{
	return 0;
}

#endif

char _license[] SEC("license") = "GPL";
+781
tools/testing/selftests/bpf/progs/verifier_sdiv.c
// SPDX-License-Identifier: GPL-2.0

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"

#if defined(__TARGET_ARCH_x86) && __clang_major__ >= 18

SEC("socket")
__description("SDIV32, non-zero imm divisor, check 1")
__success __success_unpriv __retval(-20)
__naked void sdiv32_non_zero_imm_1(void)
{
	asm volatile ("		\
	w0 = -41;		\
	w0 s/= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero imm divisor, check 2")
__success __success_unpriv __retval(-20)
__naked void sdiv32_non_zero_imm_2(void)
{
	asm volatile ("		\
	w0 = 41;		\
	w0 s/= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero imm divisor, check 3")
__success __success_unpriv __retval(20)
__naked void sdiv32_non_zero_imm_3(void)
{
	asm volatile ("		\
	w0 = -41;		\
	w0 s/= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero imm divisor, check 4")
__success __success_unpriv __retval(-21)
__naked void sdiv32_non_zero_imm_4(void)
{
	asm volatile ("		\
	w0 = -42;		\
	w0 s/= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero imm divisor, check 5")
__success __success_unpriv __retval(-21)
__naked void sdiv32_non_zero_imm_5(void)
{
	asm volatile ("		\
	w0 = 42;		\
	w0 s/= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero imm divisor, check 6")
__success __success_unpriv __retval(21)
__naked void sdiv32_non_zero_imm_6(void)
{
	asm volatile ("		\
	w0 = -42;		\
	w0 s/= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero imm divisor, check 7")
__success __success_unpriv __retval(21)
__naked void sdiv32_non_zero_imm_7(void)
{
	asm volatile ("		\
	w0 = 42;		\
	w0 s/= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero imm divisor, check 8")
__success __success_unpriv __retval(20)
__naked void sdiv32_non_zero_imm_8(void)
{
	asm volatile ("		\
	w0 = 41;		\
	w0 s/= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero reg divisor, check 1")
__success __success_unpriv __retval(-20)
__naked void sdiv32_non_zero_reg_1(void)
{
	asm volatile ("		\
	w0 = -41;		\
	w1 = 2;			\
	w0 s/= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero reg divisor, check 2")
__success __success_unpriv __retval(-20)
__naked void sdiv32_non_zero_reg_2(void)
{
	asm volatile ("		\
	w0 = 41;		\
	w1 = -2;		\
	w0 s/= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero reg divisor, check 3")
__success __success_unpriv __retval(20)
__naked void sdiv32_non_zero_reg_3(void)
{
	asm volatile ("		\
	w0 = -41;		\
	w1 = -2;		\
	w0 s/= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero reg divisor, check 4")
__success __success_unpriv __retval(-21)
__naked void sdiv32_non_zero_reg_4(void)
{
	asm volatile ("		\
	w0 = -42;		\
	w1 = 2;			\
	w0 s/= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero reg divisor, check 5")
__success __success_unpriv __retval(-21)
__naked void sdiv32_non_zero_reg_5(void)
{
	asm volatile ("		\
	w0 = 42;		\
	w1 = -2;		\
	w0 s/= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero reg divisor, check 6")
__success __success_unpriv __retval(21)
__naked void sdiv32_non_zero_reg_6(void)
{
	asm volatile ("		\
	w0 = -42;		\
	w1 = -2;		\
	w0 s/= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero reg divisor, check 7")
__success __success_unpriv __retval(21)
__naked void sdiv32_non_zero_reg_7(void)
{
	asm volatile ("		\
	w0 = 42;		\
	w1 = 2;			\
	w0 s/= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, non-zero reg divisor, check 8")
__success __success_unpriv __retval(20)
__naked void sdiv32_non_zero_reg_8(void)
{
	asm volatile ("		\
	w0 = 41;		\
	w1 = 2;			\
	w0 s/= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero imm divisor, check 1")
__success __success_unpriv __retval(-20)
__naked void sdiv64_non_zero_imm_1(void)
{
	asm volatile ("		\
	r0 = -41;		\
	r0 s/= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero imm divisor, check 2")
__success __success_unpriv __retval(-20)
__naked void sdiv64_non_zero_imm_2(void)
{
	asm volatile ("		\
	r0 = 41;		\
	r0 s/= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero imm divisor, check 3")
__success __success_unpriv __retval(20)
__naked void sdiv64_non_zero_imm_3(void)
{
	asm volatile ("		\
	r0 = -41;		\
	r0 s/= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero imm divisor, check 4")
__success __success_unpriv __retval(-21)
__naked void sdiv64_non_zero_imm_4(void)
{
	asm volatile ("		\
	r0 = -42;		\
	r0 s/= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero imm divisor, check 5")
__success __success_unpriv __retval(-21)
__naked void sdiv64_non_zero_imm_5(void)
{
	asm volatile ("		\
	r0 = 42;		\
	r0 s/= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero imm divisor, check 6")
__success __success_unpriv __retval(21)
__naked void sdiv64_non_zero_imm_6(void)
{
	asm volatile ("		\
	r0 = -42;		\
	r0 s/= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero reg divisor, check 1")
__success __success_unpriv __retval(-20)
__naked void sdiv64_non_zero_reg_1(void)
{
	asm volatile ("		\
	r0 = -41;		\
	r1 = 2;			\
	r0 s/= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero reg divisor, check 2")
__success __success_unpriv __retval(-20)
__naked void sdiv64_non_zero_reg_2(void)
{
	asm volatile ("		\
	r0 = 41;		\
	r1 = -2;		\
	r0 s/= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero reg divisor, check 3")
__success __success_unpriv __retval(20)
__naked void sdiv64_non_zero_reg_3(void)
{
	asm volatile ("		\
	r0 = -41;		\
	r1 = -2;		\
	r0 s/= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero reg divisor, check 4")
__success __success_unpriv __retval(-21)
__naked void sdiv64_non_zero_reg_4(void)
{
	asm volatile ("		\
	r0 = -42;		\
	r1 = 2;			\
	r0 s/= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero reg divisor, check 5")
__success __success_unpriv __retval(-21)
__naked void sdiv64_non_zero_reg_5(void)
{
	asm volatile ("		\
	r0 = 42;		\
	r1 = -2;		\
	r0 s/= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, non-zero reg divisor, check 6")
__success __success_unpriv __retval(21)
__naked void sdiv64_non_zero_reg_6(void)
{
	asm volatile ("		\
	r0 = -42;		\
	r1 = -2;		\
	r0 s/= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero imm divisor, check 1")
__success __success_unpriv __retval(-1)
__naked void smod32_non_zero_imm_1(void)
{
	asm volatile ("		\
	w0 = -41;		\
	w0 s%%= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero imm divisor, check 2")
__success __success_unpriv __retval(1)
__naked void smod32_non_zero_imm_2(void)
{
	asm volatile ("		\
	w0 = 41;		\
	w0 s%%= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero imm divisor, check 3")
__success __success_unpriv __retval(-1)
__naked void smod32_non_zero_imm_3(void)
{
	asm volatile ("		\
	w0 = -41;		\
	w0 s%%= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero imm divisor, check 4")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_imm_4(void)
{
	asm volatile ("		\
	w0 = -42;		\
	w0 s%%= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero imm divisor, check 5")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_imm_5(void)
{
	asm volatile ("		\
	w0 = 42;		\
	w0 s%%= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero imm divisor, check 6")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_imm_6(void)
{
	asm volatile ("		\
	w0 = -42;		\
	w0 s%%= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero reg divisor, check 1")
__success __success_unpriv __retval(-1)
__naked void smod32_non_zero_reg_1(void)
{
	asm volatile ("		\
	w0 = -41;		\
	w1 = 2;			\
	w0 s%%= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero reg divisor, check 2")
__success __success_unpriv __retval(1)
__naked void smod32_non_zero_reg_2(void)
{
	asm volatile ("		\
	w0 = 41;		\
	w1 = -2;		\
	w0 s%%= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero reg divisor, check 3")
__success __success_unpriv __retval(-1)
__naked void smod32_non_zero_reg_3(void)
{
	asm volatile ("		\
	w0 = -41;		\
	w1 = -2;		\
	w0 s%%= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero reg divisor, check 4")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_reg_4(void)
{
	asm volatile ("		\
	w0 = -42;		\
	w1 = 2;			\
	w0 s%%= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero reg divisor, check 5")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_reg_5(void)
{
	asm volatile ("		\
	w0 = 42;		\
	w1 = -2;		\
	w0 s%%= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, non-zero reg divisor, check 6")
__success __success_unpriv __retval(0)
__naked void smod32_non_zero_reg_6(void)
{
	asm volatile ("		\
	w0 = -42;		\
	w1 = -2;		\
	w0 s%%= w1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero imm divisor, check 1")
__success __success_unpriv __retval(-1)
__naked void smod64_non_zero_imm_1(void)
{
	asm volatile ("		\
	r0 = -41;		\
	r0 s%%= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero imm divisor, check 2")
__success __success_unpriv __retval(1)
__naked void smod64_non_zero_imm_2(void)
{
	asm volatile ("		\
	r0 = 41;		\
	r0 s%%= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero imm divisor, check 3")
__success __success_unpriv __retval(-1)
__naked void smod64_non_zero_imm_3(void)
{
	asm volatile ("		\
	r0 = -41;		\
	r0 s%%= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero imm divisor, check 4")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_imm_4(void)
{
	asm volatile ("		\
	r0 = -42;		\
	r0 s%%= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero imm divisor, check 5")
__success __success_unpriv __retval(-0)
__naked void smod64_non_zero_imm_5(void)
{
	asm volatile ("		\
	r0 = 42;		\
	r0 s%%= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero imm divisor, check 6")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_imm_6(void)
{
	asm volatile ("		\
	r0 = -42;		\
	r0 s%%= -2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero imm divisor, check 7")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_imm_7(void)
{
	asm volatile ("		\
	r0 = 42;		\
	r0 s%%= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero imm divisor, check 8")
__success __success_unpriv __retval(1)
__naked void smod64_non_zero_imm_8(void)
{
	asm volatile ("		\
	r0 = 41;		\
	r0 s%%= 2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero reg divisor, check 1")
__success __success_unpriv __retval(-1)
__naked void smod64_non_zero_reg_1(void)
{
	asm volatile ("		\
	r0 = -41;		\
	r1 = 2;			\
	r0 s%%= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero reg divisor, check 2")
__success __success_unpriv __retval(1)
__naked void smod64_non_zero_reg_2(void)
{
	asm volatile ("		\
	r0 = 41;		\
	r1 = -2;		\
	r0 s%%= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero reg divisor, check 3")
__success __success_unpriv __retval(-1)
__naked void smod64_non_zero_reg_3(void)
{
	asm volatile ("		\
	r0 = -41;		\
	r1 = -2;		\
	r0 s%%= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero reg divisor, check 4")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_reg_4(void)
{
	asm volatile ("		\
	r0 = -42;		\
	r1 = 2;			\
	r0 s%%= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero reg divisor, check 5")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_reg_5(void)
{
	asm volatile ("		\
	r0 = 42;		\
	r1 = -2;		\
	r0 s%%= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero reg divisor, check 6")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_reg_6(void)
{
	asm volatile ("		\
	r0 = -42;		\
	r1 = -2;		\
	r0 s%%= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero reg divisor, check 7")
__success __success_unpriv __retval(0)
__naked void smod64_non_zero_reg_7(void)
{
	asm volatile ("		\
	r0 = 42;		\
	r1 = 2;			\
	r0 s%%= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, non-zero reg divisor, check 8")
__success __success_unpriv __retval(1)
__naked void smod64_non_zero_reg_8(void)
{
	asm volatile ("		\
	r0 = 41;		\
	r1 = 2;			\
	r0 s%%= r1;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV32, zero divisor")
__success __success_unpriv __retval(0)
__naked void sdiv32_zero_divisor(void)
{
	asm volatile ("		\
	w0 = 42;		\
	w1 = 0;			\
	w2 = -1;		\
	w2 s/= w1;		\
	w0 = w2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SDIV64, zero divisor")
__success __success_unpriv __retval(0)
__naked void sdiv64_zero_divisor(void)
{
	asm volatile ("		\
	r0 = 42;		\
	r1 = 0;			\
	r2 = -1;		\
	r2 s/= r1;		\
	r0 = r2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD32, zero divisor")
__success __success_unpriv __retval(-1)
__naked void smod32_zero_divisor(void)
{
	asm volatile ("		\
	w0 = 42;		\
	w1 = 0;			\
	w2 = -1;		\
	w2 s%%= w1;		\
	w0 = w2;		\
	exit;			\
"	::: __clobber_all);
}

SEC("socket")
__description("SMOD64, zero divisor")
__success __success_unpriv __retval(-1)
__naked void smod64_zero_divisor(void)
{
	asm volatile ("		\
	r0 = 42;		\
	r1 = 0;			\
	r2 = -1;		\
	r2 s%%= r1;		\
	r0 = r2;		\
	exit;			\
"	::: __clobber_all);
}

#else

SEC("socket")
__description("cpuv4 is not supported by compiler or jit, use a dummy test")
__success
int dummy_test(void)
{
	return 0;
}

#endif

char _license[] SEC("license") = "GPL";
+3 -3
tools/testing/selftests/bpf/verifier/basic_instr.c
···
 	.retval = 1,
 },
 {
-	"invalid 64-bit BPF_END",
+	"invalid 64-bit BPF_END with BPF_TO_BE",
 	.insns = {
 	BPF_MOV32_IMM(BPF_REG_0, 0),
 	{
 		.code = BPF_ALU64 | BPF_END | BPF_TO_LE,
+		.code = BPF_ALU64 | BPF_END | BPF_TO_BE,
 		.dst_reg = BPF_REG_0,
 		.src_reg = 0,
 		.off = 0,
···
 	},
 	BPF_EXIT_INSN(),
 	},
-	.errstr = "unknown opcode d7",
+	.errstr = "unknown opcode df",
 	.result = REJECT,
 },
 {
+1 -1
tools/testing/selftests/bpf/xskxceiver.c
···
 
 	err = bpf_xdp_query(ifobj->ifindex, XDP_FLAGS_DRV_MODE, &query_opts);
 	if (err) {
-		ksft_print_msg("Error querrying XDP capabilities\n");
+		ksft_print_msg("Error querying XDP capabilities\n");
 		exit_with_error(-err);
 	}
 	if (query_opts.feature_flags & NETDEV_XDP_ACT_RX_SG)