
Merge tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking updates from Jakub Kicinski:
"Including fixes from netfilter and bpf.

Current release - regressions:

- eth: stmmac: fix failure to probe without MAC interface specified

Current release - new code bugs:

- docs: netlink: fix missing classic_netlink doc reference

Previous releases - regressions:

- deal with integer overflows in kmalloc_reserve()

- use sk_forward_alloc_get() in sk_get_meminfo()

- bpf_sk_storage: fix the missing uncharge in sk_omem_alloc

- fib: avoid warn splat in flow dissector after packet mangling

- skb_segment: call zero copy functions before using skbuff frags

- eth: sfc: check for zero length in EF10 RX prefix

Previous releases - always broken:

- af_unix: fix msg_controllen test in scm_pidfd_recv() for
MSG_CMSG_COMPAT

- xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()

- netfilter:
- nft_exthdr: fix non-linear header modification
- xt_u32, xt_sctp: validate user space input
- nftables: exthdr: fix 4-byte stack OOB write
- nfnetlink_osf: avoid OOB read
- one more fix for the garbage collection work from last release

- igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU

- bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t

- handshake: fix null-deref in handshake_nl_done_doit()

- ip: ignore dst hint for multipath routes to ensure packets are
hashed across the nexthops

- phy: micrel:
- correct bit assignments for cable test errata
- disable EEE according to the KSZ9477 errata

Misc:

- docs/bpf: document compile-once-run-everywhere (CO-RE) relocations

- Revert "net: macsec: preserve ingress frame ordering", it appears
to have been developed against an older kernel, problem doesn't
exist upstream"

* tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
Revert "net: team: do not use dynamic lockdep key"
net: hns3: remove GSO partial feature bit
net: hns3: fix the port information display when sfp is absent
net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
net: hns3: fix debugfs concurrency issue between kfree buffer and read
net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
net: hns3: Support query tx timeout threshold by debugfs
net: hns3: fix tx timeout issue
net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
netfilter: nf_tables: Unbreak audit log reset
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
netfilter: nfnetlink_osf: avoid OOB read
netfilter: nftables: exthdr: fix 4-byte stack OOB write
selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
s390/bpf: Pass through tail call counter in trampolines
...

Total: +1405 -410
+25 -6
Documentation/bpf/btf.rst
··· 726 726 4.2 .BTF.ext section 727 727 -------------------- 728 728 729 - The .BTF.ext section encodes func_info and line_info which needs loader 730 - manipulation before loading into the kernel. 729 + The .BTF.ext section encodes func_info, line_info and CO-RE relocations 730 + which needs loader manipulation before loading into the kernel. 731 731 732 732 The specification for .BTF.ext section is defined at ``tools/lib/bpf/btf.h`` 733 733 and ``tools/lib/bpf/btf.c``. ··· 745 745 __u32 func_info_len; 746 746 __u32 line_info_off; 747 747 __u32 line_info_len; 748 + 749 + /* optional part of .BTF.ext header */ 750 + __u32 core_relo_off; 751 + __u32 core_relo_len; 748 752 }; 749 753 750 754 It is very similar to .BTF section. Instead of type/string section, it 751 - contains func_info and line_info section. See :ref:`BPF_Prog_Load` for details 752 - about func_info and line_info record format. 755 + contains func_info, line_info and core_relo sub-sections. 756 + See :ref:`BPF_Prog_Load` for details about func_info and line_info 757 + record format. 753 758 754 759 The func_info is organized as below.:: 755 760 756 - func_info_rec_size 761 + func_info_rec_size /* __u32 value */ 757 762 btf_ext_info_sec for section #1 /* func_info for section #1 */ 758 763 btf_ext_info_sec for section #2 /* func_info for section #2 */ 759 764 ... ··· 778 773 779 774 The line_info is organized as below.:: 780 775 781 - line_info_rec_size 776 + line_info_rec_size /* __u32 value */ 782 777 btf_ext_info_sec for section #1 /* line_info for section #1 */ 783 778 btf_ext_info_sec for section #2 /* line_info for section #2 */ 784 779 ... ··· 791 786 kernel API, the ``insn_off`` is the instruction offset in the unit of ``struct 792 787 bpf_insn``. For ELF API, the ``insn_off`` is the byte offset from the 793 788 beginning of section (``btf_ext_info_sec->sec_name_off``). 
789 + 790 + The core_relo is organized as below.:: 791 + 792 + core_relo_rec_size /* __u32 value */ 793 + btf_ext_info_sec for section #1 /* core_relo for section #1 */ 794 + btf_ext_info_sec for section #2 /* core_relo for section #2 */ 795 + 796 + ``core_relo_rec_size`` specifies the size of ``bpf_core_relo`` 797 + structure when .BTF.ext is generated. All ``bpf_core_relo`` structures 798 + within a single ``btf_ext_info_sec`` describe relocations applied to 799 + section named by ``btf_ext_info_sec->sec_name_off``. 800 + 801 + See :ref:`Documentation/bpf/llvm_reloc.rst <btf-co-re-relocations>` 802 + for more information on CO-RE relocations. 794 803 795 804 4.2 .BTF_ids section 796 805 --------------------
+1
Documentation/bpf/index.rst
··· 29 29 bpf_licensing 30 30 test_debug 31 31 clang-notes 32 + linux-notes 32 33 other 33 34 redirect 34 35
+304
Documentation/bpf/llvm_reloc.rst
··· 240 240 Offset Info Type Symbol's Value Symbol's Name 241 241 000000000000002c 0000000200000004 R_BPF_64_NODYLD32 0000000000000000 .text 242 242 0000000000000040 0000000200000004 R_BPF_64_NODYLD32 0000000000000000 .text 243 + 244 + .. _btf-co-re-relocations: 245 + 246 + ================= 247 + CO-RE Relocations 248 + ================= 249 + 250 + From object file point of view CO-RE mechanism is implemented as a set 251 + of CO-RE specific relocation records. These relocation records are not 252 + related to ELF relocations and are encoded in .BTF.ext section. 253 + See :ref:`Documentation/bpf/btf.rst <BTF_Ext_Section>` for more 254 + information on .BTF.ext structure. 255 + 256 + CO-RE relocations are applied to BPF instructions to update immediate 257 + or offset fields of the instruction at load time with information 258 + relevant for target kernel. 259 + 260 + Field to patch is selected basing on the instruction class: 261 + 262 + * For BPF_ALU, BPF_ALU64, BPF_LD `immediate` field is patched; 263 + * For BPF_LDX, BPF_STX, BPF_ST `offset` field is patched; 264 + * BPF_JMP, BPF_JMP32 instructions **should not** be patched. 265 + 266 + Relocation kinds 267 + ================ 268 + 269 + There are several kinds of CO-RE relocations that could be split in 270 + three groups: 271 + 272 + * Field-based - patch instruction with field related information, e.g. 273 + change offset field of the BPF_LDX instruction to reflect offset 274 + of a specific structure field in the target kernel. 275 + 276 + * Type-based - patch instruction with type related information, e.g. 277 + change immediate field of the BPF_ALU move instruction to 0 or 1 to 278 + reflect if specific type is present in the target kernel. 279 + 280 + * Enum-based - patch instruction with enum related information, e.g. 281 + change immediate field of the BPF_LD_IMM64 instruction to reflect 282 + value of a specific enum literal in the target kernel. 
283 + 284 + The complete list of relocation kinds is represented by the following enum: 285 + 286 + .. code-block:: c 287 + 288 + enum bpf_core_relo_kind { 289 + BPF_CORE_FIELD_BYTE_OFFSET = 0, /* field byte offset */ 290 + BPF_CORE_FIELD_BYTE_SIZE = 1, /* field size in bytes */ 291 + BPF_CORE_FIELD_EXISTS = 2, /* field existence in target kernel */ 292 + BPF_CORE_FIELD_SIGNED = 3, /* field signedness (0 - unsigned, 1 - signed) */ 293 + BPF_CORE_FIELD_LSHIFT_U64 = 4, /* bitfield-specific left bitshift */ 294 + BPF_CORE_FIELD_RSHIFT_U64 = 5, /* bitfield-specific right bitshift */ 295 + BPF_CORE_TYPE_ID_LOCAL = 6, /* type ID in local BPF object */ 296 + BPF_CORE_TYPE_ID_TARGET = 7, /* type ID in target kernel */ 297 + BPF_CORE_TYPE_EXISTS = 8, /* type existence in target kernel */ 298 + BPF_CORE_TYPE_SIZE = 9, /* type size in bytes */ 299 + BPF_CORE_ENUMVAL_EXISTS = 10, /* enum value existence in target kernel */ 300 + BPF_CORE_ENUMVAL_VALUE = 11, /* enum value integer value */ 301 + BPF_CORE_TYPE_MATCHES = 12, /* type match in target kernel */ 302 + }; 303 + 304 + Notes: 305 + 306 + * ``BPF_CORE_FIELD_LSHIFT_U64`` and ``BPF_CORE_FIELD_RSHIFT_U64`` are 307 + supposed to be used to read bitfield values using the following 308 + algorithm: 309 + 310 + .. 
code-block:: c 311 + 312 + // To read bitfield ``f`` from ``struct s`` 313 + is_signed = relo(s->f, BPF_CORE_FIELD_SIGNED) 314 + off = relo(s->f, BPF_CORE_FIELD_BYTE_OFFSET) 315 + sz = relo(s->f, BPF_CORE_FIELD_BYTE_SIZE) 316 + l = relo(s->f, BPF_CORE_FIELD_LSHIFT_U64) 317 + r = relo(s->f, BPF_CORE_FIELD_RSHIFT_U64) 318 + // define ``v`` as signed or unsigned integer of size ``sz`` 319 + v = *({s|u}<sz> *)((void *)s + off) 320 + v <<= l 321 + v >>= r 322 + 323 + * The ``BPF_CORE_TYPE_MATCHES`` queries matching relation, defined as 324 + follows: 325 + 326 + * for integers: types match if size and signedness match; 327 + * for arrays & pointers: target types are recursively matched; 328 + * for structs & unions: 329 + 330 + * local members need to exist in target with the same name; 331 + 332 + * for each member we recursively check match unless it is already behind a 333 + pointer, in which case we only check matching names and compatible kind; 334 + 335 + * for enums: 336 + 337 + * local variants have to have a match in target by symbolic name (but not 338 + numeric value); 339 + 340 + * size has to match (but enum may match enum64 and vice versa); 341 + 342 + * for function pointers: 343 + 344 + * number and position of arguments in local type has to match target; 345 + * for each argument and the return value we recursively check match. 346 + 347 + CO-RE Relocation Record 348 + ======================= 349 + 350 + Relocation record is encoded as the following structure: 351 + 352 + .. code-block:: c 353 + 354 + struct bpf_core_relo { 355 + __u32 insn_off; 356 + __u32 type_id; 357 + __u32 access_str_off; 358 + enum bpf_core_relo_kind kind; 359 + }; 360 + 361 + * ``insn_off`` - instruction offset (in bytes) within a code section 362 + associated with this relocation; 363 + 364 + * ``type_id`` - BTF type ID of the "root" (containing) entity of a 365 + relocatable type or field; 366 + 367 + * ``access_str_off`` - offset into corresponding .BTF string section. 
368 + String interpretation depends on specific relocation kind: 369 + 370 + * for field-based relocations, string encodes an accessed field using 371 + a sequence of field and array indices, separated by colon (:). It's 372 + conceptually very close to LLVM's `getelementptr <GEP_>`_ instruction's 373 + arguments for identifying offset to a field. For example, consider the 374 + following C code: 375 + 376 + .. code-block:: c 377 + 378 + struct sample { 379 + int a; 380 + int b; 381 + struct { int c[10]; }; 382 + } __attribute__((preserve_access_index)); 383 + struct sample *s; 384 + 385 + * Access to ``s[0].a`` would be encoded as ``0:0``: 386 + 387 + * ``0``: first element of ``s`` (as if ``s`` is an array); 388 + * ``0``: index of field ``a`` in ``struct sample``. 389 + 390 + * Access to ``s->a`` would be encoded as ``0:0`` as well. 391 + * Access to ``s->b`` would be encoded as ``0:1``: 392 + 393 + * ``0``: first element of ``s``; 394 + * ``1``: index of field ``b`` in ``struct sample``. 395 + 396 + * Access to ``s[1].c[5]`` would be encoded as ``1:2:0:5``: 397 + 398 + * ``1``: second element of ``s``; 399 + * ``2``: index of anonymous structure field in ``struct sample``; 400 + * ``0``: index of field ``c`` in anonymous structure; 401 + * ``5``: access to array element #5. 402 + 403 + * for type-based relocations, string is expected to be just "0"; 404 + 405 + * for enum value-based relocations, string contains an index of enum 406 + value within its enum type; 407 + 408 + * ``kind`` - one of ``enum bpf_core_relo_kind``. 409 + 410 + .. _GEP: https://llvm.org/docs/LangRef.html#getelementptr-instruction 411 + 412 + .. _btf_co_re_relocation_examples: 413 + 414 + CO-RE Relocation Examples 415 + ========================= 416 + 417 + For the following C code: 418 + 419 + .. 
code-block:: c 420 + 421 + struct foo { 422 + int a; 423 + int b; 424 + unsigned c:15; 425 + } __attribute__((preserve_access_index)); 426 + 427 + enum bar { U, V }; 428 + 429 + With the following BTF definitions: 430 + 431 + .. code-block:: 432 + 433 + ... 434 + [2] STRUCT 'foo' size=8 vlen=2 435 + 'a' type_id=3 bits_offset=0 436 + 'b' type_id=3 bits_offset=32 437 + 'c' type_id=4 bits_offset=64 bitfield_size=15 438 + [3] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED 439 + [4] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none) 440 + ... 441 + [16] ENUM 'bar' encoding=UNSIGNED size=4 vlen=2 442 + 'U' val=0 443 + 'V' val=1 444 + 445 + Field offset relocations are generated automatically when 446 + ``__attribute__((preserve_access_index))`` is used, for example: 447 + 448 + .. code-block:: c 449 + 450 + void alpha(struct foo *s, volatile unsigned long *g) { 451 + *g = s->a; 452 + s->a = 1; 453 + } 454 + 455 + 00 <alpha>: 456 + 0: r3 = *(s32 *)(r1 + 0x0) 457 + 00: CO-RE <byte_off> [2] struct foo::a (0:0) 458 + 1: *(u64 *)(r2 + 0x0) = r3 459 + 2: *(u32 *)(r1 + 0x0) = 0x1 460 + 10: CO-RE <byte_off> [2] struct foo::a (0:0) 461 + 3: exit 462 + 463 + 464 + All relocation kinds could be requested via built-in functions. 465 + E.g. field-based relocations: 466 + 467 + .. 
code-block:: c 468 + 469 + void bravo(struct foo *s, volatile unsigned long *g) { 470 + *g = __builtin_preserve_field_info(s->b, 0 /* field byte offset */); 471 + *g = __builtin_preserve_field_info(s->b, 1 /* field byte size */); 472 + *g = __builtin_preserve_field_info(s->b, 2 /* field existence */); 473 + *g = __builtin_preserve_field_info(s->b, 3 /* field signedness */); 474 + *g = __builtin_preserve_field_info(s->c, 4 /* bitfield left shift */); 475 + *g = __builtin_preserve_field_info(s->c, 5 /* bitfield right shift */); 476 + } 477 + 478 + 20 <bravo>: 479 + 4: r1 = 0x4 480 + 20: CO-RE <byte_off> [2] struct foo::b (0:1) 481 + 5: *(u64 *)(r2 + 0x0) = r1 482 + 6: r1 = 0x4 483 + 30: CO-RE <byte_sz> [2] struct foo::b (0:1) 484 + 7: *(u64 *)(r2 + 0x0) = r1 485 + 8: r1 = 0x1 486 + 40: CO-RE <field_exists> [2] struct foo::b (0:1) 487 + 9: *(u64 *)(r2 + 0x0) = r1 488 + 10: r1 = 0x1 489 + 50: CO-RE <signed> [2] struct foo::b (0:1) 490 + 11: *(u64 *)(r2 + 0x0) = r1 491 + 12: r1 = 0x31 492 + 60: CO-RE <lshift_u64> [2] struct foo::c (0:2) 493 + 13: *(u64 *)(r2 + 0x0) = r1 494 + 14: r1 = 0x31 495 + 70: CO-RE <rshift_u64> [2] struct foo::c (0:2) 496 + 15: *(u64 *)(r2 + 0x0) = r1 497 + 16: exit 498 + 499 + 500 + Type-based relocations: 501 + 502 + .. 
code-block:: c 503 + 504 + void charlie(struct foo *s, volatile unsigned long *g) { 505 + *g = __builtin_preserve_type_info(*s, 0 /* type existence */); 506 + *g = __builtin_preserve_type_info(*s, 1 /* type size */); 507 + *g = __builtin_preserve_type_info(*s, 2 /* type matches */); 508 + *g = __builtin_btf_type_id(*s, 0 /* type id in this object file */); 509 + *g = __builtin_btf_type_id(*s, 1 /* type id in target kernel */); 510 + } 511 + 512 + 88 <charlie>: 513 + 17: r1 = 0x1 514 + 88: CO-RE <type_exists> [2] struct foo 515 + 18: *(u64 *)(r2 + 0x0) = r1 516 + 19: r1 = 0xc 517 + 98: CO-RE <type_size> [2] struct foo 518 + 20: *(u64 *)(r2 + 0x0) = r1 519 + 21: r1 = 0x1 520 + a8: CO-RE <type_matches> [2] struct foo 521 + 22: *(u64 *)(r2 + 0x0) = r1 522 + 23: r1 = 0x2 ll 523 + b8: CO-RE <local_type_id> [2] struct foo 524 + 25: *(u64 *)(r2 + 0x0) = r1 525 + 26: r1 = 0x2 ll 526 + d0: CO-RE <target_type_id> [2] struct foo 527 + 28: *(u64 *)(r2 + 0x0) = r1 528 + 29: exit 529 + 530 + Enum-based relocations: 531 + 532 + .. code-block:: c 533 + 534 + void delta(struct foo *s, volatile unsigned long *g) { 535 + *g = __builtin_preserve_enum_value(*(enum bar *)U, 0 /* enum literal existence */); 536 + *g = __builtin_preserve_enum_value(*(enum bar *)V, 1 /* enum literal value */); 537 + } 538 + 539 + f0 <delta>: 540 + 30: r1 = 0x1 ll 541 + f0: CO-RE <enumval_exists> [16] enum bar::U = 0 542 + 32: *(u64 *)(r2 + 0x0) = r1 543 + 33: r1 = 0x1 ll 544 + 108: CO-RE <enumval_value> [16] enum bar::V = 1 545 + 35: *(u64 *)(r2 + 0x0) = r1 546 + 36: exit
+25
Documentation/bpf/standardization/abi.rst
··· 1 + .. contents:: 2 + .. sectnum:: 3 + 4 + =================================================== 5 + BPF ABI Recommended Conventions and Guidelines v1.0 6 + =================================================== 7 + 8 + This is version 1.0 of an informational document containing recommended 9 + conventions and guidelines for producing portable BPF program binaries. 10 + 11 + Registers and calling convention 12 + ================================ 13 + 14 + BPF has 10 general purpose registers and a read-only frame pointer register, 15 + all of which are 64-bits wide. 16 + 17 + The BPF calling convention is defined as: 18 + 19 + * R0: return value from function calls, and exit value for BPF programs 20 + * R1 - R5: arguments for function calls 21 + * R6 - R9: callee saved registers that function calls will preserve 22 + * R10: read-only frame pointer to access stack 23 + 24 + R0 - R5 are scratch registers and BPF programs needs to spill/fill them if 25 + necessary across calls.
+1 -1
Documentation/bpf/standardization/index.rst
··· 12 12 :maxdepth: 1 13 13 14 14 instruction-set 15 - linux-notes 15 + abi 16 16 17 17 .. Links: 18 18 .. _IETF BPF Working Group: https://datatracker.ietf.org/wg/bpf/about/
+14 -30
Documentation/bpf/standardization/instruction-set.rst
··· 1 1 .. contents:: 2 2 .. sectnum:: 3 3 4 - ======================================== 5 - eBPF Instruction Set Specification, v1.0 6 - ======================================== 4 + ======================================= 5 + BPF Instruction Set Specification, v1.0 6 + ======================================= 7 7 8 - This document specifies version 1.0 of the eBPF instruction set. 8 + This document specifies version 1.0 of the BPF instruction set. 9 9 10 10 Documentation conventions 11 11 ========================= ··· 97 97 A: 10000110 98 98 B: 11111111 10000110 99 99 100 - Registers and calling convention 101 - ================================ 102 - 103 - eBPF has 10 general purpose registers and a read-only frame pointer register, 104 - all of which are 64-bits wide. 105 - 106 - The eBPF calling convention is defined as: 107 - 108 - * R0: return value from function calls, and exit value for eBPF programs 109 - * R1 - R5: arguments for function calls 110 - * R6 - R9: callee saved registers that function calls will preserve 111 - * R10: read-only frame pointer to access stack 112 - 113 - R0 - R5 are scratch registers and eBPF programs needs to spill/fill them if 114 - necessary across calls. 115 - 116 100 Instruction encoding 117 101 ==================== 118 102 119 - eBPF has two instruction encodings: 103 + BPF has two instruction encodings: 120 104 121 105 * the basic instruction encoding, which uses 64 bits to encode an instruction 122 106 * the wide instruction encoding, which appends a second 64-bit immediate (i.e., ··· 244 260 ========= ===== ======= ========================================================== 245 261 246 262 Underflow and overflow are allowed during arithmetic operations, meaning 247 - the 64-bit or 32-bit value will wrap. If eBPF program execution would 263 + the 64-bit or 32-bit value will wrap. If BPF program execution would 248 264 result in division by zero, the destination register is instead set to zero. 
249 265 If execution would result in modulo by zero, for ``BPF_ALU64`` the value of 250 266 the destination register is unchanged whereas for ``BPF_ALU`` the upper ··· 357 373 BPF_JSGT 0x6 any PC += offset if dst > src signed 358 374 BPF_JSGE 0x7 any PC += offset if dst >= src signed 359 375 BPF_CALL 0x8 0x0 call helper function by address see `Helper functions`_ 360 - BPF_CALL 0x8 0x1 call PC += offset see `Program-local functions`_ 376 + BPF_CALL 0x8 0x1 call PC += imm see `Program-local functions`_ 361 377 BPF_CALL 0x8 0x2 call helper function by BTF ID see `Helper functions`_ 362 378 BPF_EXIT 0x9 0x0 return BPF_JMP only 363 379 BPF_JLT 0xa any PC += offset if dst < src unsigned ··· 366 382 BPF_JSLE 0xd any PC += offset if dst <= src signed 367 383 ======== ===== === =========================================== ========================================= 368 384 369 - The eBPF program needs to store the return value into register R0 before doing a 385 + The BPF program needs to store the return value into register R0 before doing a 370 386 ``BPF_EXIT``. 371 387 372 388 Example: ··· 408 424 ~~~~~~~~~~~~~~~~~~~~~~~ 409 425 Program-local functions are functions exposed by the same BPF program as the 410 426 caller, and are referenced by offset from the call instruction, similar to 411 - ``BPF_JA``. A ``BPF_EXIT`` within the program-local function will return to 412 - the caller. 427 + ``BPF_JA``. The offset is encoded in the imm field of the call instruction. 428 + A ``BPF_EXIT`` within the program-local function will return to the caller. 413 429 414 430 Load and store instructions 415 431 =========================== ··· 486 502 487 503 Atomic operations are operations that operate on memory and can not be 488 504 interrupted or corrupted by other access to the same memory region 489 - by other eBPF programs or means outside of this specification. 505 + by other BPF programs or means outside of this specification. 
490 506 491 - All atomic operations supported by eBPF are encoded as store operations 507 + All atomic operations supported by BPF are encoded as store operations 492 508 that use the ``BPF_ATOMIC`` mode modifier as follows: 493 509 494 510 * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations ··· 578 594 Maps 579 595 ~~~~ 580 596 581 - Maps are shared memory regions accessible by eBPF programs on some platforms. 597 + Maps are shared memory regions accessible by BPF programs on some platforms. 582 598 A map can have various semantics as defined in a separate document, and may or 583 599 may not have a single contiguous memory region, but the 'map_val(map)' is 584 600 currently only defined for maps that do have a single contiguous memory region. ··· 600 616 Legacy BPF Packet access instructions 601 617 ------------------------------------- 602 618 603 - eBPF previously introduced special instructions for access to packet data that were 619 + BPF previously introduced special instructions for access to packet data that were 604 620 carried over from classic BPF. However, these instructions are 605 621 deprecated and should no longer be used.
Documentation/bpf/standardization/linux-notes.rst → Documentation/bpf/linux-notes.rst
+33 -3
Documentation/process/maintainer-netdev.rst
··· 98 98 repository link above for any new networking-related commits. You may 99 99 also check the following website for the current status: 100 100 101 - https://patchwork.hopto.org/net-next.html 101 + https://netdev.bots.linux.dev/net-next.html 102 102 103 103 The ``net`` tree continues to collect fixes for the vX.Y content, and is 104 104 fed back to Linus at regular (~weekly) intervals. Meaning that the ··· 120 120 https://patchwork.kernel.org/project/netdevbpf/list/ 121 121 122 122 The "State" field will tell you exactly where things are at with your 123 - patch. Patches are indexed by the ``Message-ID`` header of the emails 123 + patch: 124 + 125 + ================== ============================================================= 126 + Patch state Description 127 + ================== ============================================================= 128 + New, Under review pending review, patch is in the maintainer’s queue for 129 + review; the two states are used interchangeably (depending on 130 + the exact co-maintainer handling patchwork at the time) 131 + Accepted patch was applied to the appropriate networking tree, this is 132 + usually set automatically by the pw-bot 133 + Needs ACK waiting for an ack from an area expert or testing 134 + Changes requested patch has not passed the review, new revision is expected 135 + with appropriate code and commit message changes 136 + Rejected patch has been rejected and new revision is not expected 137 + Not applicable patch is expected to be applied outside of the networking 138 + subsystem 139 + Awaiting upstream patch should be reviewed and handled by appropriate 140 + sub-maintainer, who will send it on to the networking trees; 141 + patches set to ``Awaiting upstream`` in netdev's patchwork 142 + will usually remain in this state, whether the sub-maintainer 143 + requested changes, accepted or rejected the patch 144 + Deferred patch needs to be reposted later, usually due to dependency 145 + or because it was 
posted for a closed tree 146 + Superseded new version of the patch was posted, usually set by the 147 + pw-bot 148 + RFC not to be applied, usually not in maintainer’s review queue, 149 + pw-bot can automatically set patches to this state based 150 + on subject tags 151 + ================== ============================================================= 152 + 153 + Patches are indexed by the ``Message-ID`` header of the emails 124 154 which carried them so if you have trouble finding your patch append 125 155 the value of ``Message-ID`` to the URL above. 126 156 ··· 185 155 186 156 Bot records its activity here: 187 157 188 - https://patchwork.hopto.org/pw-bot.html 158 + https://netdev.bots.linux.dev/pw-bot.html 189 159 190 160 Review timelines 191 161 ~~~~~~~~~~~~~~~~
+2
Documentation/userspace-api/netlink/intro.rst
··· 528 528 for most efficient handling of dumps (larger buffer fits more dumped 529 529 objects and therefore fewer recvmsg() calls are needed). 530 530 531 + .. _classic_netlink: 532 + 531 533 Classic Netlink 532 534 =============== 533 535
+10
arch/s390/net/bpf_jit_comp.c
··· 2088 2088 */ 2089 2089 int r14_off; /* Offset of saved %r14 */ 2090 2090 int run_ctx_off; /* Offset of struct bpf_tramp_run_ctx */ 2091 + int tccnt_off; /* Offset of saved tailcall counter */ 2091 2092 int do_fexit; /* do_fexit: label */ 2092 2093 }; 2093 2094 ··· 2259 2258 tjit->r14_off = alloc_stack(tjit, sizeof(u64)); 2260 2259 tjit->run_ctx_off = alloc_stack(tjit, 2261 2260 sizeof(struct bpf_tramp_run_ctx)); 2261 + tjit->tccnt_off = alloc_stack(tjit, sizeof(u64)); 2262 2262 /* The caller has already reserved STACK_FRAME_OVERHEAD bytes. */ 2263 2263 tjit->stack_size -= STACK_FRAME_OVERHEAD; 2264 2264 tjit->orig_stack_args_off = tjit->stack_size + STACK_FRAME_OVERHEAD; 2265 2265 2266 2266 /* aghi %r15,-stack_size */ 2267 2267 EMIT4_IMM(0xa70b0000, REG_15, -tjit->stack_size); 2268 + /* mvc tccnt_off(4,%r15),stack_size+STK_OFF_TCCNT(%r15) */ 2269 + _EMIT6(0xd203f000 | tjit->tccnt_off, 2270 + 0xf000 | (tjit->stack_size + STK_OFF_TCCNT)); 2268 2271 /* stmg %r2,%rN,fwd_reg_args_off(%r15) */ 2269 2272 if (nr_reg_args) 2270 2273 EMIT6_DISP_LH(0xeb000000, 0x0024, REG_2, ··· 2405 2400 (nr_stack_args * sizeof(u64) - 1) << 16 | 2406 2401 tjit->stack_args_off, 2407 2402 0xf000 | tjit->orig_stack_args_off); 2403 + /* mvc STK_OFF_TCCNT(4,%r15),tccnt_off(%r15) */ 2404 + _EMIT6(0xd203f000 | STK_OFF_TCCNT, 0xf000 | tjit->tccnt_off); 2408 2405 /* lgr %r1,%r8 */ 2409 2406 EMIT4(0xb9040000, REG_1, REG_8); 2410 2407 /* %r1() */ ··· 2463 2456 if (flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET)) 2464 2457 EMIT6_DISP_LH(0xe3000000, 0x0004, REG_2, REG_0, REG_15, 2465 2458 tjit->retval_off); 2459 + /* mvc stack_size+STK_OFF_TCCNT(4,%r15),tccnt_off(%r15) */ 2460 + _EMIT6(0xd203f000 | (tjit->stack_size + STK_OFF_TCCNT), 2461 + 0xf000 | tjit->tccnt_off); 2466 2462 /* aghi %r15,stack_size */ 2467 2463 EMIT4_IMM(0xa70b0000, REG_15, tjit->stack_size); 2468 2464 /* Emit an expoline for the following indirect jump. */
+15 -1
drivers/net/dsa/microchip/ksz_common.c
··· 2335 2335 { 2336 2336 struct ksz_device *dev = ds->priv; 2337 2337 2338 - if (dev->chip_id == KSZ8830_CHIP_ID) { 2338 + switch (dev->chip_id) { 2339 + case KSZ8830_CHIP_ID: 2339 2340 /* Silicon Errata Sheet (DS80000830A): 2340 2341 * Port 1 does not work with LinkMD Cable-Testing. 2341 2342 * Port 1 does not respond to received PAUSE control frames. 2342 2343 */ 2343 2344 if (!port) 2344 2345 return MICREL_KSZ8_P1_ERRATA; 2346 + break; 2347 + case KSZ9477_CHIP_ID: 2348 + /* KSZ9477 Errata DS80000754C 2349 + * 2350 + * Module 4: Energy Efficient Ethernet (EEE) feature select must 2351 + * be manually disabled 2352 + * The EEE feature is enabled by default, but it is not fully 2353 + * operational. It must be manually disabled through register 2354 + * controls. If not disabled, the PHY ports can auto-negotiate 2355 + * to enable EEE, and this feature can cause link drops when 2356 + * linked to another device supporting EEE. 2357 + */ 2358 + return MICREL_NO_EEE; 2345 2359 } 2346 2360 2347 2361 return 0;
+2
drivers/net/dsa/sja1105/sja1105.h
··· 132 132 int max_frame_mem; 133 133 int num_ports; 134 134 bool multiple_cascade_ports; 135 + /* Every {port, TXQ} has its own CBS shaper */ 136 + bool fixed_cbs_mapping; 135 137 enum dsa_tag_protocol tag_proto; 136 138 const struct sja1105_dynamic_table_ops *dyn_ops; 137 139 const struct sja1105_table_ops *static_ops;
+45 -6
drivers/net/dsa/sja1105/sja1105_main.c
··· 2115 2115 } 2116 2116 2117 2117 #define BYTES_PER_KBIT (1000LL / 8) 2118 + /* Port 0 (the uC port) does not have CBS shapers */ 2119 + #define SJA1110_FIXED_CBS(port, prio) ((((port) - 1) * SJA1105_NUM_TC) + (prio)) 2120 + 2121 + static int sja1105_find_cbs_shaper(struct sja1105_private *priv, 2122 + int port, int prio) 2123 + { 2124 + int i; 2125 + 2126 + if (priv->info->fixed_cbs_mapping) { 2127 + i = SJA1110_FIXED_CBS(port, prio); 2128 + if (i >= 0 && i < priv->info->num_cbs_shapers) 2129 + return i; 2130 + 2131 + return -1; 2132 + } 2133 + 2134 + for (i = 0; i < priv->info->num_cbs_shapers; i++) 2135 + if (priv->cbs[i].port == port && priv->cbs[i].prio == prio) 2136 + return i; 2137 + 2138 + return -1; 2139 + } 2118 2140 2119 2141 static int sja1105_find_unused_cbs_shaper(struct sja1105_private *priv) 2120 2142 { 2121 2143 int i; 2144 + 2145 + if (priv->info->fixed_cbs_mapping) 2146 + return -1; 2122 2147 2123 2148 for (i = 0; i < priv->info->num_cbs_shapers; i++) 2124 2149 if (!priv->cbs[i].idle_slope && !priv->cbs[i].send_slope) ··· 2175 2150 { 2176 2151 struct sja1105_private *priv = ds->priv; 2177 2152 struct sja1105_cbs_entry *cbs; 2153 + s64 port_transmit_rate_kbps; 2178 2154 int index; 2179 2155 2180 2156 if (!offload->enable) 2181 2157 return sja1105_delete_cbs_shaper(priv, port, offload->queue); 2182 2158 2183 - index = sja1105_find_unused_cbs_shaper(priv); 2184 - if (index < 0) 2185 - return -ENOSPC; 2159 + /* The user may be replacing an existing shaper */ 2160 + index = sja1105_find_cbs_shaper(priv, port, offload->queue); 2161 + if (index < 0) { 2162 + /* That isn't the case - see if we can allocate a new one */ 2163 + index = sja1105_find_unused_cbs_shaper(priv); 2164 + if (index < 0) 2165 + return -ENOSPC; 2166 + } 2186 2167 2187 2168 cbs = &priv->cbs[index]; 2188 2169 cbs->port = port; ··· 2198 2167 */ 2199 2168 cbs->credit_hi = offload->hicredit; 2200 2169 cbs->credit_lo = abs(offload->locredit); 2201 - /* User space is in kbits/sec, 
hardware in bytes/sec */ 2202 - cbs->idle_slope = offload->idleslope * BYTES_PER_KBIT; 2203 - cbs->send_slope = abs(offload->sendslope * BYTES_PER_KBIT); 2170 + /* User space is in kbits/sec, while the hardware in bytes/sec times 2171 + * link speed. Since the given offload->sendslope is good only for the 2172 + * current link speed anyway, and user space is likely to reprogram it 2173 + * when that changes, don't even bother to track the port's link speed, 2174 + * but deduce the port transmit rate from idleslope - sendslope. 2175 + */ 2176 + port_transmit_rate_kbps = offload->idleslope - offload->sendslope; 2177 + cbs->idle_slope = div_s64(offload->idleslope * BYTES_PER_KBIT, 2178 + port_transmit_rate_kbps); 2179 + cbs->send_slope = div_s64(abs(offload->sendslope * BYTES_PER_KBIT), 2180 + port_transmit_rate_kbps); 2204 2181 /* Convert the negative values from 64-bit 2's complement 2205 2182 * to 32-bit 2's complement (for the case of 0x80000000 whose 2206 2183 * negative is still negative).
+4
drivers/net/dsa/sja1105/sja1105_spi.c
··· 781 781 .tag_proto = DSA_TAG_PROTO_SJA1110, 782 782 .can_limit_mcast_flood = true, 783 783 .multiple_cascade_ports = true, 784 + .fixed_cbs_mapping = true, 784 785 .ptp_ts_bits = 32, 785 786 .ptpegr_ts_bytes = 8, 786 787 .max_frame_mem = SJA1110_MAX_FRAME_MEMORY, ··· 832 831 .tag_proto = DSA_TAG_PROTO_SJA1110, 833 832 .can_limit_mcast_flood = true, 834 833 .multiple_cascade_ports = true, 834 + .fixed_cbs_mapping = true, 835 835 .ptp_ts_bits = 32, 836 836 .ptpegr_ts_bytes = 8, 837 837 .max_frame_mem = SJA1110_MAX_FRAME_MEMORY, ··· 883 881 .tag_proto = DSA_TAG_PROTO_SJA1110, 884 882 .can_limit_mcast_flood = true, 885 883 .multiple_cascade_ports = true, 884 + .fixed_cbs_mapping = true, 886 885 .ptp_ts_bits = 32, 887 886 .ptpegr_ts_bytes = 8, 888 887 .max_frame_mem = SJA1110_MAX_FRAME_MEMORY, ··· 934 931 .tag_proto = DSA_TAG_PROTO_SJA1110, 935 932 .can_limit_mcast_flood = true, 936 933 .multiple_cascade_ports = true, 934 + .fixed_cbs_mapping = true, 937 935 .ptp_ts_bits = 32, 938 936 .ptpegr_ts_bytes = 8, 939 937 .max_frame_mem = SJA1110_MAX_FRAME_MEMORY,
+1 -1
drivers/net/ethernet/freescale/enetc/enetc_pf.c
··· 1402 1402 return; 1403 1403 1404 1404 si = enetc_psi_create(pdev); 1405 - if (si) 1405 + if (!IS_ERR(si)) 1406 1406 enetc_psi_destroy(pdev); 1407 1407 } 1408 1408 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_FREESCALE, ENETC_DEV_ID_PF,
+4 -1
drivers/net/ethernet/google/gve/gve_rx_dqo.c
··· 570 570 if (!skb) 571 571 return -1; 572 572 573 - skb_shinfo(rx->ctx.skb_tail)->frag_list = skb; 573 + if (rx->ctx.skb_tail == rx->ctx.skb_head) 574 + skb_shinfo(rx->ctx.skb_head)->frag_list = skb; 575 + else 576 + rx->ctx.skb_tail->next = skb; 574 577 rx->ctx.skb_tail = skb; 575 578 num_frags = 0; 576 579 }
+1
drivers/net/ethernet/hisilicon/hns3/hnae3.h
··· 814 814 u8 max_tc; /* Total number of TCs */ 815 815 u8 num_tc; /* Total number of enabled TCs */ 816 816 bool mqprio_active; 817 + bool dcb_ets_active; 817 818 }; 818 819 819 820 #define HNAE3_MAX_DSCP 64
+8 -3
drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
··· 1045 1045 struct hnae3_ae_dev *ae_dev = pci_get_drvdata(h->pdev); 1046 1046 struct hnae3_dev_specs *dev_specs = &ae_dev->dev_specs; 1047 1047 struct hnae3_knic_private_info *kinfo = &h->kinfo; 1048 + struct net_device *dev = kinfo->netdev; 1048 1049 1049 1050 *pos += scnprintf(buf + *pos, len - *pos, "dev_spec:\n"); 1050 1051 *pos += scnprintf(buf + *pos, len - *pos, "MAC entry num: %u\n", ··· 1088 1087 dev_specs->mc_mac_size); 1089 1088 *pos += scnprintf(buf + *pos, len - *pos, "MAC statistics number: %u\n", 1090 1089 dev_specs->mac_stats_num); 1090 + *pos += scnprintf(buf + *pos, len - *pos, 1091 + "TX timeout threshold: %d seconds\n", 1092 + dev->watchdog_timeo / HZ); 1091 1093 } 1092 1094 1093 1095 static int hns3_dbg_dev_info(struct hnae3_handle *h, char *buf, int len) ··· 1415 1411 return 0; 1416 1412 1417 1413 out: 1418 - mutex_destroy(&handle->dbgfs_lock); 1419 1414 debugfs_remove_recursive(handle->hnae3_dbgfs); 1420 1415 handle->hnae3_dbgfs = NULL; 1416 + mutex_destroy(&handle->dbgfs_lock); 1421 1417 return ret; 1422 1418 } 1423 1419 1424 1420 void hns3_dbg_uninit(struct hnae3_handle *handle) 1425 1421 { 1426 1422 u32 i; 1423 + 1424 + debugfs_remove_recursive(handle->hnae3_dbgfs); 1425 + handle->hnae3_dbgfs = NULL; 1427 1426 1428 1427 for (i = 0; i < ARRAY_SIZE(hns3_dbg_cmd); i++) 1429 1428 if (handle->dbgfs_buf[i]) { ··· 1435 1428 } 1436 1429 1437 1430 mutex_destroy(&handle->dbgfs_lock); 1438 - debugfs_remove_recursive(handle->hnae3_dbgfs); 1439 - handle->hnae3_dbgfs = NULL; 1440 1431 } 1441 1432 1442 1433 void hns3_dbg_register_debugfs(const char *debugfs_dir_name)
+12 -7
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
··· 2103 2103 */ 2104 2104 if (test_bit(HNS3_NIC_STATE_TX_PUSH_ENABLE, &priv->state) && num && 2105 2105 !ring->pending_buf && num <= HNS3_MAX_PUSH_BD_NUM && doorbell) { 2106 + /* This smp_store_release() pairs with smp_load_aquire() in 2107 + * hns3_nic_reclaim_desc(). Ensure that the BD valid bit 2108 + * is updated. 2109 + */ 2110 + smp_store_release(&ring->last_to_use, ring->next_to_use); 2106 2111 hns3_tx_push_bd(ring, num); 2107 - WRITE_ONCE(ring->last_to_use, ring->next_to_use); 2108 2112 return; 2109 2113 } 2110 2114 ··· 2119 2115 return; 2120 2116 } 2121 2117 2118 + /* This smp_store_release() pairs with smp_load_aquire() in 2119 + * hns3_nic_reclaim_desc(). Ensure that the BD valid bit is updated. 2120 + */ 2121 + smp_store_release(&ring->last_to_use, ring->next_to_use); 2122 + 2122 2123 if (ring->tqp->mem_base) 2123 2124 hns3_tx_mem_doorbell(ring); 2124 2125 else ··· 2131 2122 ring->tqp->io_base + HNS3_RING_TX_RING_TAIL_REG); 2132 2123 2133 2124 ring->pending_buf = 0; 2134 - WRITE_ONCE(ring->last_to_use, ring->next_to_use); 2135 2125 } 2136 2126 2137 2127 static void hns3_tsyn(struct net_device *netdev, struct sk_buff *skb, ··· 3316 3308 3317 3309 netdev->priv_flags |= IFF_UNICAST_FLT; 3318 3310 3319 - netdev->gso_partial_features |= NETIF_F_GSO_GRE_CSUM; 3320 - 3321 3311 netdev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | 3322 3312 NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | 3323 3313 NETIF_F_RXCSUM | NETIF_F_SG | NETIF_F_GSO | ··· 3569 3563 static bool hns3_nic_reclaim_desc(struct hns3_enet_ring *ring, 3570 3564 int *bytes, int *pkts, int budget) 3571 3565 { 3572 - /* pair with ring->last_to_use update in hns3_tx_doorbell(), 3573 - * smp_store_release() is not used in hns3_tx_doorbell() because 3574 - * the doorbell operation already have the needed barrier operation. 3566 + /* This smp_load_acquire() pairs with smp_store_release() in 3567 + * hns3_tx_doorbell(). 
3575 3568 */ 3576 3569 int ltu = smp_load_acquire(&ring->last_to_use); 3577 3570 int ntc = ring->next_to_clean;
+3 -1
drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
··· 773 773 hns3_get_ksettings(h, cmd); 774 774 break; 775 775 case HNAE3_MEDIA_TYPE_FIBER: 776 - if (module_type == HNAE3_MODULE_TYPE_CR) 776 + if (module_type == HNAE3_MODULE_TYPE_UNKNOWN) 777 + cmd->base.port = PORT_OTHER; 778 + else if (module_type == HNAE3_MODULE_TYPE_CR) 777 779 cmd->base.port = PORT_DA; 778 780 else 779 781 cmd->base.port = PORT_FIBRE;
+5 -15
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
··· 259 259 int ret; 260 260 261 261 if (!(hdev->dcbx_cap & DCB_CAP_DCBX_VER_IEEE) || 262 - hdev->flag & HCLGE_FLAG_MQPRIO_ENABLE) 262 + h->kinfo.tc_info.mqprio_active) 263 263 return -EINVAL; 264 264 265 265 ret = hclge_ets_validate(hdev, ets, &num_tc, &map_changed); ··· 275 275 } 276 276 277 277 hclge_tm_schd_info_update(hdev, num_tc); 278 - if (num_tc > 1) 279 - hdev->flag |= HCLGE_FLAG_DCB_ENABLE; 280 - else 281 - hdev->flag &= ~HCLGE_FLAG_DCB_ENABLE; 278 + h->kinfo.tc_info.dcb_ets_active = num_tc > 1; 282 279 283 280 ret = hclge_ieee_ets_to_tm_info(hdev, ets); 284 281 if (ret) ··· 484 487 struct hclge_vport *vport = hclge_get_vport(h); 485 488 struct hclge_dev *hdev = vport->back; 486 489 487 - if (hdev->flag & HCLGE_FLAG_MQPRIO_ENABLE) 490 + if (h->kinfo.tc_info.mqprio_active) 488 491 return 0; 489 492 490 493 return hdev->dcbx_cap; ··· 608 611 if (!test_bit(HCLGE_STATE_NIC_REGISTERED, &hdev->state)) 609 612 return -EBUSY; 610 613 611 - if (hdev->flag & HCLGE_FLAG_DCB_ENABLE) 614 + kinfo = &vport->nic.kinfo; 615 + if (kinfo->tc_info.dcb_ets_active) 612 616 return -EINVAL; 613 617 614 618 ret = hclge_mqprio_qopt_check(hdev, mqprio_qopt); ··· 623 625 if (ret) 624 626 return ret; 625 627 626 - kinfo = &vport->nic.kinfo; 627 628 memcpy(&old_tc_info, &kinfo->tc_info, sizeof(old_tc_info)); 628 629 hclge_sync_mqprio_qopt(&kinfo->tc_info, mqprio_qopt); 629 630 kinfo->tc_info.mqprio_active = tc > 0; ··· 630 633 ret = hclge_config_tc(hdev, &kinfo->tc_info); 631 634 if (ret) 632 635 goto err_out; 633 - 634 - hdev->flag &= ~HCLGE_FLAG_DCB_ENABLE; 635 - 636 - if (tc > 1) 637 - hdev->flag |= HCLGE_FLAG_MQPRIO_ENABLE; 638 - else 639 - hdev->flag &= ~HCLGE_FLAG_MQPRIO_ENABLE; 640 636 641 637 return hclge_notify_init_up(hdev); 642 638
+7 -7
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
··· 1519 1519 struct hclge_desc desc[3]; 1520 1520 int pos = 0; 1521 1521 int ret, i; 1522 - u32 *req; 1522 + __le32 *req; 1523 1523 1524 1524 hclge_cmd_setup_basic_desc(&desc[0], HCLGE_OPC_FD_TCAM_OP, true); 1525 1525 desc[0].flag |= cpu_to_le16(HCLGE_COMM_CMD_FLAG_NEXT); ··· 1544 1544 tcam_msg.loc); 1545 1545 1546 1546 /* tcam_data0 ~ tcam_data1 */ 1547 - req = (u32 *)req1->tcam_data; 1547 + req = (__le32 *)req1->tcam_data; 1548 1548 for (i = 0; i < 2; i++) 1549 1549 pos += scnprintf(tcam_buf + pos, HCLGE_DBG_TCAM_BUF_SIZE - pos, 1550 - "%08x\n", *req++); 1550 + "%08x\n", le32_to_cpu(*req++)); 1551 1551 1552 1552 /* tcam_data2 ~ tcam_data7 */ 1553 - req = (u32 *)req2->tcam_data; 1553 + req = (__le32 *)req2->tcam_data; 1554 1554 for (i = 0; i < 6; i++) 1555 1555 pos += scnprintf(tcam_buf + pos, HCLGE_DBG_TCAM_BUF_SIZE - pos, 1556 - "%08x\n", *req++); 1556 + "%08x\n", le32_to_cpu(*req++)); 1557 1557 1558 1558 /* tcam_data8 ~ tcam_data12 */ 1559 - req = (u32 *)req3->tcam_data; 1559 + req = (__le32 *)req3->tcam_data; 1560 1560 for (i = 0; i < 5; i++) 1561 1561 pos += scnprintf(tcam_buf + pos, HCLGE_DBG_TCAM_BUF_SIZE - pos, 1562 - "%08x\n", *req++); 1562 + "%08x\n", le32_to_cpu(*req++)); 1563 1563 1564 1564 return ret; 1565 1565 }
+3 -2
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
··· 11026 11026 11027 11027 static void hclge_info_show(struct hclge_dev *hdev) 11028 11028 { 11029 + struct hnae3_handle *handle = &hdev->vport->nic; 11029 11030 struct device *dev = &hdev->pdev->dev; 11030 11031 11031 11032 dev_info(dev, "PF info begin:\n"); ··· 11043 11042 dev_info(dev, "This is %s PF\n", 11044 11043 hdev->flag & HCLGE_FLAG_MAIN ? "main" : "not main"); 11045 11044 dev_info(dev, "DCB %s\n", 11046 - hdev->flag & HCLGE_FLAG_DCB_ENABLE ? "enable" : "disable"); 11045 + handle->kinfo.tc_info.dcb_ets_active ? "enable" : "disable"); 11047 11046 dev_info(dev, "MQPRIO %s\n", 11048 - hdev->flag & HCLGE_FLAG_MQPRIO_ENABLE ? "enable" : "disable"); 11047 + handle->kinfo.tc_info.mqprio_active ? "enable" : "disable"); 11049 11048 dev_info(dev, "Default tx spare buffer size: %u\n", 11050 11049 hdev->tx_spare_buf_size); 11051 11050
-2
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
··· 919 919 920 920 #define HCLGE_FLAG_MAIN BIT(0) 921 921 #define HCLGE_FLAG_DCB_CAPABLE BIT(1) 922 - #define HCLGE_FLAG_DCB_ENABLE BIT(2) 923 - #define HCLGE_FLAG_MQPRIO_ENABLE BIT(3) 924 922 u32 flag; 925 923 926 924 u32 pkt_buf_size; /* Total pf buf size for tx/rx */
+2 -2
drivers/net/ethernet/intel/igb/igb.h
··· 34 34 /* TX/RX descriptor defines */ 35 35 #define IGB_DEFAULT_TXD 256 36 36 #define IGB_DEFAULT_TX_WORK 128 37 - #define IGB_MIN_TXD 80 37 + #define IGB_MIN_TXD 64 38 38 #define IGB_MAX_TXD 4096 39 39 40 40 #define IGB_DEFAULT_RXD 256 41 - #define IGB_MIN_RXD 80 41 + #define IGB_MIN_RXD 64 42 42 #define IGB_MAX_RXD 4096 43 43 44 44 #define IGB_DEFAULT_ITR 3 /* dynamic */
+3 -2
drivers/net/ethernet/intel/igb/igb_main.c
··· 3933 3933 struct pci_dev *pdev = adapter->pdev; 3934 3934 struct e1000_hw *hw = &adapter->hw; 3935 3935 3936 - /* Virtualization features not supported on i210 family. */ 3937 - if ((hw->mac.type == e1000_i210) || (hw->mac.type == e1000_i211)) 3936 + /* Virtualization features not supported on i210 and 82580 family. */ 3937 + if ((hw->mac.type == e1000_i210) || (hw->mac.type == e1000_i211) || 3938 + (hw->mac.type == e1000_82580)) 3938 3939 return; 3939 3940 3940 3941 /* Of the below we really only want the effect of getting
+2 -2
drivers/net/ethernet/intel/igbvf/igbvf.h
··· 39 39 /* Tx/Rx descriptor defines */ 40 40 #define IGBVF_DEFAULT_TXD 256 41 41 #define IGBVF_MAX_TXD 4096 42 - #define IGBVF_MIN_TXD 80 42 + #define IGBVF_MIN_TXD 64 43 43 44 44 #define IGBVF_DEFAULT_RXD 256 45 45 #define IGBVF_MAX_RXD 4096 46 - #define IGBVF_MIN_RXD 80 46 + #define IGBVF_MIN_RXD 64 47 47 48 48 #define IGBVF_MIN_ITR_USECS 10 /* 100000 irq/sec */ 49 49 #define IGBVF_MAX_ITR_USECS 10000 /* 100 irq/sec */
+2 -2
drivers/net/ethernet/intel/igc/igc.h
··· 379 379 /* TX/RX descriptor defines */ 380 380 #define IGC_DEFAULT_TXD 256 381 381 #define IGC_DEFAULT_TX_WORK 128 382 - #define IGC_MIN_TXD 80 382 + #define IGC_MIN_TXD 64 383 383 #define IGC_MAX_TXD 4096 384 384 385 385 #define IGC_DEFAULT_RXD 256 386 - #define IGC_MIN_RXD 80 386 + #define IGC_MIN_RXD 64 387 387 #define IGC_MAX_RXD 4096 388 388 389 389 /* Supported Rx Buffer Sizes */
+19 -2
drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
··· 846 846 return 0; 847 847 } 848 848 849 + static void nix_get_aq_req_smq(struct rvu *rvu, struct nix_aq_enq_req *req, 850 + u16 *smq, u16 *smq_mask) 851 + { 852 + struct nix_cn10k_aq_enq_req *aq_req; 853 + 854 + if (!is_rvu_otx2(rvu)) { 855 + aq_req = (struct nix_cn10k_aq_enq_req *)req; 856 + *smq = aq_req->sq.smq; 857 + *smq_mask = aq_req->sq_mask.smq; 858 + } else { 859 + *smq = req->sq.smq; 860 + *smq_mask = req->sq_mask.smq; 861 + } 862 + } 863 + 849 864 static int rvu_nix_blk_aq_enq_inst(struct rvu *rvu, struct nix_hw *nix_hw, 850 865 struct nix_aq_enq_req *req, 851 866 struct nix_aq_enq_rsp *rsp) ··· 872 857 struct rvu_block *block; 873 858 struct admin_queue *aq; 874 859 struct rvu_pfvf *pfvf; 860 + u16 smq, smq_mask; 875 861 void *ctx, *mask; 876 862 bool ena; 877 863 u64 cfg; ··· 944 928 if (rc) 945 929 return rc; 946 930 931 + nix_get_aq_req_smq(rvu, req, &smq, &smq_mask); 947 932 /* Check if SQ pointed SMQ belongs to this PF/VF or not */ 948 933 if (req->ctype == NIX_AQ_CTYPE_SQ && 949 934 ((req->op == NIX_AQ_INSTOP_INIT && req->sq.ena) || 950 935 (req->op == NIX_AQ_INSTOP_WRITE && 951 - req->sq_mask.ena && req->sq_mask.smq && req->sq.ena))) { 936 + req->sq_mask.ena && req->sq.ena && smq_mask))) { 952 937 if (!is_valid_txschq(rvu, blkaddr, NIX_TXSCH_LVL_SMQ, 953 - pcifunc, req->sq.smq)) 938 + pcifunc, smq)) 954 939 return NIX_AF_ERR_AQ_ENQUEUE; 955 940 } 956 941
+3 -1
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c
··· 17 17 if (err) 18 18 return err; 19 19 20 - if (mlx5e_is_eswitch_flow(parse_state->flow)) 20 + if (mlx5e_is_eswitch_flow(parse_state->flow)) { 21 21 attr->esw_attr->split_count = attr->esw_attr->out_count; 22 + parse_state->if_count = 0; 23 + } 22 24 23 25 attr->flags |= MLX5_ATTR_FLAG_CT; 24 26
+1
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/mirred.c
··· 294 294 if (err) 295 295 return err; 296 296 297 + parse_state->if_count = 0; 297 298 esw_attr->out_count++; 298 299 return 0; 299 300 }
+3 -1
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/pedit.c
··· 98 98 99 99 attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR; 100 100 101 - if (ns_type == MLX5_FLOW_NAMESPACE_FDB) 101 + if (ns_type == MLX5_FLOW_NAMESPACE_FDB) { 102 102 esw_attr->split_count = esw_attr->out_count; 103 + parse_state->if_count = 0; 104 + } 103 105 104 106 return 0; 105 107 }
+1
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/redirect_ingress.c
··· 66 66 if (err) 67 67 return err; 68 68 69 + parse_state->if_count = 0; 69 70 esw_attr->out_count++; 70 71 71 72 return 0;
+1
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan.c
··· 166 166 return err; 167 167 168 168 esw_attr->split_count = esw_attr->out_count; 169 + parse_state->if_count = 0; 169 170 170 171 return 0; 171 172 }
+3 -1
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/vlan_mangle.c
··· 65 65 if (err) 66 66 return err; 67 67 68 - if (ns_type == MLX5_FLOW_NAMESPACE_FDB) 68 + if (ns_type == MLX5_FLOW_NAMESPACE_FDB) { 69 69 attr->esw_attr->split_count = attr->esw_attr->out_count; 70 + parse_state->if_count = 0; 71 + } 70 72 71 73 return 0; 72 74 }
+1
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
··· 3936 3936 } 3937 3937 3938 3938 i_split = i + 1; 3939 + parse_state->if_count = 0; 3939 3940 list_add(&attr->list, &flow->attrs); 3940 3941 } 3941 3942
+16 -5
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
··· 1276 1276 mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw, 1277 1277 enum mlx5_eswitch_vport_event enabled_events) 1278 1278 { 1279 + bool pf_needed; 1279 1280 int ret; 1280 1281 1282 + pf_needed = mlx5_core_is_ecpf_esw_manager(esw->dev) || 1283 + esw->mode == MLX5_ESWITCH_LEGACY; 1284 + 1281 1285 /* Enable PF vport */ 1282 - ret = mlx5_eswitch_load_pf_vf_vport(esw, MLX5_VPORT_PF, enabled_events); 1283 - if (ret) 1284 - return ret; 1286 + if (pf_needed) { 1287 + ret = mlx5_eswitch_load_pf_vf_vport(esw, MLX5_VPORT_PF, 1288 + enabled_events); 1289 + if (ret) 1290 + return ret; 1291 + } 1285 1292 1286 1293 /* Enable external host PF HCA */ 1287 1294 ret = host_pf_enable_hca(esw->dev); ··· 1324 1317 ecpf_err: 1325 1318 host_pf_disable_hca(esw->dev); 1326 1319 pf_hca_err: 1327 - mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); 1320 + if (pf_needed) 1321 + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); 1328 1322 return ret; 1329 1323 } 1330 1324 ··· 1343 1335 } 1344 1336 1345 1337 host_pf_disable_hca(esw->dev); 1346 - mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); 1338 + 1339 + if (mlx5_core_is_ecpf_esw_manager(esw->dev) || 1340 + esw->mode == MLX5_ESWITCH_LEGACY) 1341 + mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF); 1347 1342 } 1348 1343 1349 1344 static void mlx5_eswitch_get_devlink_param(struct mlx5_eswitch *esw)
+35 -14
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
··· 3216 3216 esw_acl_ingress_ofld_cleanup(esw, vport); 3217 3217 } 3218 3218 3219 - static int esw_create_uplink_offloads_acl_tables(struct mlx5_eswitch *esw) 3219 + static int esw_create_offloads_acl_tables(struct mlx5_eswitch *esw) 3220 3220 { 3221 - struct mlx5_vport *vport; 3221 + struct mlx5_vport *uplink, *manager; 3222 + int ret; 3222 3223 3223 - vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_UPLINK); 3224 - if (IS_ERR(vport)) 3225 - return PTR_ERR(vport); 3224 + uplink = mlx5_eswitch_get_vport(esw, MLX5_VPORT_UPLINK); 3225 + if (IS_ERR(uplink)) 3226 + return PTR_ERR(uplink); 3226 3227 3227 - return esw_vport_create_offloads_acl_tables(esw, vport); 3228 + ret = esw_vport_create_offloads_acl_tables(esw, uplink); 3229 + if (ret) 3230 + return ret; 3231 + 3232 + manager = mlx5_eswitch_get_vport(esw, esw->manager_vport); 3233 + if (IS_ERR(manager)) { 3234 + ret = PTR_ERR(manager); 3235 + goto err_manager; 3236 + } 3237 + 3238 + ret = esw_vport_create_offloads_acl_tables(esw, manager); 3239 + if (ret) 3240 + goto err_manager; 3241 + 3242 + return 0; 3243 + 3244 + err_manager: 3245 + esw_vport_destroy_offloads_acl_tables(esw, uplink); 3246 + return ret; 3228 3247 } 3229 3248 3230 - static void esw_destroy_uplink_offloads_acl_tables(struct mlx5_eswitch *esw) 3249 + static void esw_destroy_offloads_acl_tables(struct mlx5_eswitch *esw) 3231 3250 { 3232 3251 struct mlx5_vport *vport; 3233 3252 3234 - vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_UPLINK); 3235 - if (IS_ERR(vport)) 3236 - return; 3253 + vport = mlx5_eswitch_get_vport(esw, esw->manager_vport); 3254 + if (!IS_ERR(vport)) 3255 + esw_vport_destroy_offloads_acl_tables(esw, vport); 3237 3256 3238 - esw_vport_destroy_offloads_acl_tables(esw, vport); 3257 + vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_UPLINK); 3258 + if (!IS_ERR(vport)) 3259 + esw_vport_destroy_offloads_acl_tables(esw, vport); 3239 3260 } 3240 3261 3241 3262 int mlx5_eswitch_reload_reps(struct mlx5_eswitch *esw) ··· 3301 3280 } 3302 3281 
esw->fdb_table.offloads.indir = indir; 3303 3282 3304 - err = esw_create_uplink_offloads_acl_tables(esw); 3283 + err = esw_create_offloads_acl_tables(esw); 3305 3284 if (err) 3306 3285 goto create_acl_err; 3307 3286 ··· 3342 3321 create_restore_err: 3343 3322 esw_destroy_offloads_table(esw); 3344 3323 create_offloads_err: 3345 - esw_destroy_uplink_offloads_acl_tables(esw); 3324 + esw_destroy_offloads_acl_tables(esw); 3346 3325 create_acl_err: 3347 3326 mlx5_esw_indir_table_destroy(esw->fdb_table.offloads.indir); 3348 3327 create_indir_err: ··· 3358 3337 esw_destroy_offloads_fdb_tables(esw); 3359 3338 esw_destroy_restore_table(esw); 3360 3339 esw_destroy_offloads_table(esw); 3361 - esw_destroy_uplink_offloads_acl_tables(esw); 3340 + esw_destroy_offloads_acl_tables(esw); 3362 3341 mlx5_esw_indir_table_destroy(esw->fdb_table.offloads.indir); 3363 3342 mutex_destroy(&esw->fdb_table.offloads.vports.lock); 3364 3343 }
+15 -5
drivers/net/ethernet/sfc/rx.c
··· 359 359 /* Handle a received packet. Second half: Touches packet payload. */ 360 360 void __efx_rx_packet(struct efx_channel *channel) 361 361 { 362 + struct efx_rx_queue *rx_queue = efx_channel_get_rx_queue(channel); 362 363 struct efx_nic *efx = channel->efx; 363 364 struct efx_rx_buffer *rx_buf = 364 - efx_rx_buffer(&channel->rx_queue, channel->rx_pkt_index); 365 + efx_rx_buffer(rx_queue, channel->rx_pkt_index); 365 366 u8 *eh = efx_rx_buf_va(rx_buf); 366 367 367 368 /* Read length from the prefix if necessary. This already 368 369 * excludes the length of the prefix itself. 369 370 */ 370 - if (rx_buf->flags & EFX_RX_PKT_PREFIX_LEN) 371 + if (rx_buf->flags & EFX_RX_PKT_PREFIX_LEN) { 371 372 rx_buf->len = le16_to_cpup((__le16 *) 372 373 (eh + efx->rx_packet_len_offset)); 374 + /* A known issue may prevent this being filled in; 375 + * if that happens, just drop the packet. 376 + * Must do that in the driver since passing a zero-length 377 + * packet up to the stack may cause a crash. 378 + */ 379 + if (unlikely(!rx_buf->len)) { 380 + efx_free_rx_buffers(rx_queue, rx_buf, 381 + channel->rx_pkt_n_frags); 382 + channel->n_rx_frm_trunc++; 383 + goto out; 384 + } 385 + } 373 386 374 387 /* If we're in loopback test, then pass the packet directly to the 375 388 * loopback layer, and free the rx_buf here 376 389 */ 377 390 if (unlikely(efx->loopback_selftest)) { 378 - struct efx_rx_queue *rx_queue; 379 - 380 391 efx_loopback_rx_packet(efx, eh, rx_buf->len); 381 - rx_queue = efx_channel_get_rx_queue(channel); 382 392 efx_free_rx_buffers(rx_queue, rx_buf, 383 393 channel->rx_pkt_n_frags); 384 394 goto out;
+2 -3
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
··· 419 419 return ERR_PTR(phy_mode); 420 420 421 421 plat->phy_interface = phy_mode; 422 - plat->mac_interface = stmmac_of_get_mac_mode(np); 423 - if (plat->mac_interface < 0) 424 - plat->mac_interface = plat->phy_interface; 422 + rc = stmmac_of_get_mac_mode(np); 423 + plat->mac_interface = rc < 0 ? plat->phy_interface : rc; 425 424 426 425 /* Some wrapper drivers still rely on phy_node. Let's save it while 427 426 * they are not converted to phylink. */
+1 -2
drivers/net/macsec.c
··· 1330 1330 struct crypto_aead *tfm; 1331 1331 int ret; 1332 1332 1333 - /* Pick a sync gcm(aes) cipher to ensure order is preserved. */ 1334 - tfm = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC); 1333 + tfm = crypto_alloc_aead("gcm(aes)", 0, 0); 1335 1334 1336 1335 if (IS_ERR(tfm)) 1337 1336 return tfm;
+6 -3
drivers/net/phy/micrel.c
··· 1800 1800 /* Transmit waveform amplitude can be improved (1000BASE-T, 100BASE-TX, 10BASE-Te) */ 1801 1801 {0x1c, 0x04, 0x00d0}, 1802 1802 1803 - /* Energy Efficient Ethernet (EEE) feature select must be manually disabled */ 1804 - {0x07, 0x3c, 0x0000}, 1805 - 1806 1803 /* Register settings are required to meet data sheet supply current specifications */ 1807 1804 {0x1c, 0x13, 0x6eff}, 1808 1805 {0x1c, 0x14, 0xe6ff}, ··· 1843 1846 if (err) 1844 1847 return err; 1845 1848 } 1849 + 1850 + /* According to KSZ9477 Errata DS80000754C (Module 4) all EEE modes 1851 + * in this switch shall be regarded as broken. 1852 + */ 1853 + if (phydev->dev_flags & MICREL_NO_EEE) 1854 + phydev->eee_broken_modes = -1; 1846 1855 1847 1856 err = genphy_restart_aneg(phydev); 1848 1857 if (err)
+3 -1
drivers/net/veth.c
··· 344 344 { 345 345 struct veth_priv *rcv_priv, *priv = netdev_priv(dev); 346 346 struct veth_rq *rq = NULL; 347 + int ret = NETDEV_TX_OK; 347 348 struct net_device *rcv; 348 349 int length = skb->len; 349 350 bool use_napi = false; ··· 379 378 } else { 380 379 drop: 381 380 atomic64_inc(&priv->dropped); 381 + ret = NET_XMIT_DROP; 382 382 } 383 383 384 384 rcu_read_unlock(); 385 385 386 - return NETDEV_TX_OK; 386 + return ret; 387 387 } 388 388 389 389 static u64 veth_stats_tx(struct net_device *dev, u64 *packets, u64 *bytes)
+1
drivers/nfc/nxp-nci/i2c.c
··· 336 336 #ifdef CONFIG_ACPI 337 337 static const struct acpi_device_id acpi_id[] = { 338 338 { "NXP1001" }, 339 + { "NXP1002" }, 339 340 { "NXP7471" }, 340 341 { } 341 342 };
+2
include/linux/audit.h
··· 117 117 AUDIT_NFT_OP_OBJ_RESET, 118 118 AUDIT_NFT_OP_FLOWTABLE_REGISTER, 119 119 AUDIT_NFT_OP_FLOWTABLE_UNREGISTER, 120 + AUDIT_NFT_OP_SETELEM_RESET, 121 + AUDIT_NFT_OP_RULE_RESET, 120 122 AUDIT_NFT_OP_INVALID, 121 123 }; 122 124
+1 -1
include/linux/bpf.h
··· 438 438 439 439 size /= sizeof(long); 440 440 while (size--) 441 - *ldst++ = *lsrc++; 441 + data_race(*ldst++ = *lsrc++); 442 442 } 443 443 444 444 /* copy everything but bpf_spin_lock, bpf_timer, and kptrs. There could be one of each. */
+1
include/linux/ipv6.h
··· 147 147 #define IP6SKB_JUMBOGRAM 128 148 148 #define IP6SKB_SEG6 256 149 149 #define IP6SKB_FAKEJUMBO 512 150 + #define IP6SKB_MULTIPATH 1024 150 151 }; 151 152 152 153 #if defined(CONFIG_NET_L3_MASTER_DEV)
+4 -3
include/linux/micrel_phy.h
··· 41 41 #define PHY_ID_KSZ9477 0x00221631 42 42 43 43 /* struct phy_device dev_flags definitions */ 44 - #define MICREL_PHY_50MHZ_CLK 0x00000001 45 - #define MICREL_PHY_FXEN 0x00000002 46 - #define MICREL_KSZ8_P1_ERRATA 0x00000003 44 + #define MICREL_PHY_50MHZ_CLK BIT(0) 45 + #define MICREL_PHY_FXEN BIT(1) 46 + #define MICREL_KSZ8_P1_ERRATA BIT(2) 47 + #define MICREL_NO_EEE BIT(3) 47 48 48 49 #define MICREL_KSZ9021_EXTREG_CTRL 0xB 49 50 #define MICREL_KSZ9021_EXTREG_DATA_WRITE 0xC
+2 -2
include/linux/phylink.h
··· 600 600 * 601 601 * The %neg_mode argument should be tested via the phylink_mode_*() family of 602 602 * functions, or for PCS that set pcs->neg_mode true, should be tested 603 - * against the %PHYLINK_PCS_NEG_* definitions. 603 + * against the PHYLINK_PCS_NEG_* definitions. 604 604 */ 605 605 int pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode, 606 606 phy_interface_t interface, const unsigned long *advertising, ··· 630 630 * 631 631 * The %mode argument should be tested via the phylink_mode_*() family of 632 632 * functions, or for PCS that set pcs->neg_mode true, should be tested 633 - * against the %PHYLINK_PCS_NEG_* definitions. 633 + * against the PHYLINK_PCS_NEG_* definitions. 634 634 */ 635 635 void pcs_link_up(struct phylink_pcs *pcs, unsigned int neg_mode, 636 636 phy_interface_t interface, int speed, int duplex);
+2 -1
include/net/ip.h
··· 57 57 #define IPSKB_FRAG_PMTU BIT(6) 58 58 #define IPSKB_L3SLAVE BIT(7) 59 59 #define IPSKB_NOPOLICY BIT(8) 60 + #define IPSKB_MULTIPATH BIT(9) 60 61 61 62 u16 frag_max_size; 62 63 }; ··· 95 94 ipcm_init(ipcm); 96 95 97 96 ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark); 98 - ipcm->sockc.tsflags = inet->sk.sk_tsflags; 97 + ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags); 99 98 ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if); 100 99 ipcm->addr = inet->inet_saddr; 101 100 ipcm->protocol = inet->inet_num;
+4 -1
include/net/ip6_fib.h
··· 642 642 if (!net->ipv6.fib6_rules_require_fldissect) 643 643 return false; 644 644 645 - skb_flow_dissect_flow_keys(skb, flkeys, flag); 645 + memset(flkeys, 0, sizeof(*flkeys)); 646 + __skb_flow_dissect(net, skb, &flow_keys_dissector, 647 + flkeys, NULL, 0, 0, 0, flag); 648 + 646 649 fl6->fl6_sport = flkeys->ports.src; 647 650 fl6->fl6_dport = flkeys->ports.dst; 648 651 fl6->flowi6_proto = flkeys->basic.ip_proto;
+4 -1
include/net/ip_fib.h
··· 418 418 if (!net->ipv4.fib_rules_require_fldissect) 419 419 return false; 420 420 421 - skb_flow_dissect_flow_keys(skb, flkeys, flag); 421 + memset(flkeys, 0, sizeof(*flkeys)); 422 + __skb_flow_dissect(net, skb, &flow_keys_dissector, 423 + flkeys, NULL, 0, 0, 0, flag); 424 + 422 425 fl4->fl4_sport = flkeys->ports.src; 423 426 fl4->fl4_dport = flkeys->ports.dst; 424 427 fl4->flowi4_proto = flkeys->basic.ip_proto;
+7 -8
include/net/ip_tunnels.h
··· 483 483 u64_stats_inc(&tstats->tx_packets); 484 484 u64_stats_update_end(&tstats->syncp); 485 485 put_cpu_ptr(tstats); 486 - } else { 487 - struct net_device_stats *err_stats = &dev->stats; 486 + return; 487 + } 488 488 489 - if (pkt_len < 0) { 490 - err_stats->tx_errors++; 491 - err_stats->tx_aborted_errors++; 492 - } else { 493 - err_stats->tx_dropped++; 494 - } 489 + if (pkt_len < 0) { 490 + DEV_STATS_INC(dev, tx_errors); 491 + DEV_STATS_INC(dev, tx_aborted_errors); 492 + } else { 493 + DEV_STATS_INC(dev, tx_dropped); 495 494 } 496 495 } 497 496
+9 -5
include/net/scm.h
··· 9 9 #include <linux/pid.h> 10 10 #include <linux/nsproxy.h> 11 11 #include <linux/sched/signal.h> 12 + #include <net/compat.h> 12 13 13 14 /* Well, we should have at least one descriptor open 14 15 * to accept passed FDs 8) ··· 124 123 static __inline__ void scm_pidfd_recv(struct msghdr *msg, struct scm_cookie *scm) 125 124 { 126 125 struct file *pidfd_file = NULL; 127 - int pidfd; 126 + int len, pidfd; 128 127 129 - /* 130 - * put_cmsg() doesn't return an error if CMSG is truncated, 128 + /* put_cmsg() doesn't return an error if CMSG is truncated, 131 129 * that's why we need to opencode these checks here. 132 130 */ 133 - if ((msg->msg_controllen <= sizeof(struct cmsghdr)) || 134 - (msg->msg_controllen - sizeof(struct cmsghdr)) < sizeof(int)) { 131 + if (msg->msg_flags & MSG_CMSG_COMPAT) 132 + len = sizeof(struct compat_cmsghdr) + sizeof(int); 133 + else 134 + len = sizeof(struct cmsghdr) + sizeof(int); 135 + 136 + if (msg->msg_controllen < len) { 135 137 msg->msg_flags |= MSG_CTRUNC; 136 138 return; 137 139 }
+19 -10
include/net/sock.h
@@
 	WRITE_ONCE(sk->sk_wmem_queued, sk->sk_wmem_queued + val);
 }
 
+static inline void sk_forward_alloc_add(struct sock *sk, int val)
+{
+	/* Paired with lockless reads of sk->sk_forward_alloc */
+	WRITE_ONCE(sk->sk_forward_alloc, sk->sk_forward_alloc + val);
+}
+
 void sk_stream_write_space(struct sock *sk);
 
 /* OOB backlog add */
@@
 	if (sk->sk_prot->forward_alloc_get)
 		return sk->sk_prot->forward_alloc_get(sk);
 #endif
-	return sk->sk_forward_alloc;
+	return READ_ONCE(sk->sk_forward_alloc);
 }
 
 static inline bool __sk_stream_memory_free(const struct sock *sk, int wake)
@@
 {
 	if (!sk_has_account(sk))
 		return;
-	sk->sk_forward_alloc -= size;
+	sk_forward_alloc_add(sk, -size);
 }
 
 static inline void sk_mem_uncharge(struct sock *sk, int size)
 {
 	if (!sk_has_account(sk))
 		return;
-	sk->sk_forward_alloc += size;
+	sk_forward_alloc_add(sk, size);
 	sk_mem_reclaim(sk);
 }
@@
 static inline void sockcm_init(struct sockcm_cookie *sockc,
 			       const struct sock *sk)
 {
-	*sockc = (struct sockcm_cookie) { .tsflags = sk->sk_tsflags };
+	*sockc = (struct sockcm_cookie) {
+		.tsflags = READ_ONCE(sk->sk_tsflags)
+	};
 }
 
 int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
@@
 static inline void
 sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 {
-	ktime_t kt = skb->tstamp;
 	struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb);
-
+	u32 tsflags = READ_ONCE(sk->sk_tsflags);
+	ktime_t kt = skb->tstamp;
 	/*
 	 * generate control messages if
 	 * - receive time stamping in software requested
 	 * - hardware time stamps available and wanted
 	 */
 	if (sock_flag(sk, SOCK_RCVTSTAMP) ||
-	    (sk->sk_tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
-	    (kt && sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
+	    (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
+	    (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
 	    (hwtstamps->hwtstamp &&
-	     (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
+	     (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
 		__sock_recv_timestamp(msg, sk, skb);
 	else
 		sock_write_timestamp(sk, kt);
@@
 #define TSFLAGS_ANY	  (SOF_TIMESTAMPING_SOFTWARE | \
 			   SOF_TIMESTAMPING_RAW_HARDWARE)
 
-	if (sk->sk_flags & FLAGS_RECV_CMSGS || sk->sk_tsflags & TSFLAGS_ANY)
+	if (sk->sk_flags & FLAGS_RECV_CMSGS ||
+	    READ_ONCE(sk->sk_tsflags) & TSFLAGS_ANY)
 		__sock_recv_cmsgs(msg, sk, skb);
 	else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
 		sock_write_timestamp(sk, skb->tstamp);
include/uapi/linux/netfilter/nf_tables.h (+1)
@@
  * @NFTA_RULE_USERDATA: user data (NLA_BINARY, NFT_USERDATA_MAXLEN)
  * @NFTA_RULE_ID: uniquely identifies a rule in a transaction (NLA_U32)
  * @NFTA_RULE_POSITION_ID: transaction unique identifier of the previous rule (NLA_U32)
+ * @NFTA_RULE_CHAIN_ID: add the rule to chain by ID, alternative to @NFTA_RULE_CHAIN (NLA_U32)
  */
 enum nft_rule_attributes {
 	NFTA_RULE_UNSPEC,
kernel/auditsc.c (+2)
@@
 	{ AUDIT_NFT_OP_OBJ_RESET,		"nft_reset_obj"		   },
 	{ AUDIT_NFT_OP_FLOWTABLE_REGISTER,	"nft_register_flowtable"   },
 	{ AUDIT_NFT_OP_FLOWTABLE_UNREGISTER,	"nft_unregister_flowtable" },
+	{ AUDIT_NFT_OP_SETELEM_RESET,		"nft_reset_setelem"	   },
+	{ AUDIT_NFT_OP_RULE_RESET,		"nft_reset_rule"	   },
 	{ AUDIT_NFT_OP_INVALID,			"nft_invalid"		   },
 };
 
kernel/bpf/bpf_local_storage.c (+15 -34)
@@
 		   void *value, u64 map_flags, gfp_t gfp_flags)
 {
 	struct bpf_local_storage_data *old_sdata = NULL;
-	struct bpf_local_storage_elem *selem = NULL;
+	struct bpf_local_storage_elem *alloc_selem, *selem = NULL;
 	struct bpf_local_storage *local_storage;
 	unsigned long flags;
 	int err;
@@
 		}
 	}
 
-	if (gfp_flags == GFP_KERNEL) {
-		selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
-		if (!selem)
-			return ERR_PTR(-ENOMEM);
-	}
+	/* A lookup has just been done before and concluded a new selem is
+	 * needed. The chance of an unnecessary alloc is unlikely.
+	 */
+	alloc_selem = selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
+	if (!alloc_selem)
+		return ERR_PTR(-ENOMEM);
 
 	raw_spin_lock_irqsave(&local_storage->lock, flags);
@@
 		 * simple.
 		 */
 		err = -EAGAIN;
-		goto unlock_err;
+		goto unlock;
 	}
 
 	old_sdata = bpf_local_storage_lookup(local_storage, smap, false);
 	err = check_flags(old_sdata, map_flags);
 	if (err)
-		goto unlock_err;
+		goto unlock;
 
 	if (old_sdata && (map_flags & BPF_F_LOCK)) {
 		copy_map_value_locked(&smap->map, old_sdata->data, value,
@@
 		goto unlock;
 	}
 
-	if (gfp_flags != GFP_KERNEL) {
-		/* local_storage->lock is held.  Hence, we are sure
-		 * we can unlink and uncharge the old_sdata successfully
-		 * later.  Hence, instead of charging the new selem now
-		 * and then uncharge the old selem later (which may cause
-		 * a potential but unnecessary charge failure),  avoid taking
-		 * a charge at all here (the "!old_sdata" check) and the
-		 * old_sdata will not be uncharged later during
-		 * bpf_selem_unlink_storage_nolock().
-		 */
-		selem = bpf_selem_alloc(smap, owner, value, !old_sdata, gfp_flags);
-		if (!selem) {
-			err = -ENOMEM;
-			goto unlock_err;
-		}
-	}
-
+	alloc_selem = NULL;
 	/* First, link the new selem to the map */
 	bpf_selem_link_map(smap, selem);
@@
 	if (old_sdata) {
 		bpf_selem_unlink_map(SELEM(old_sdata));
 		bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
-						false, false);
+						true, false);
 	}
 
 unlock:
 	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
-	return SDATA(selem);
-
-unlock_err:
-	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
-	if (selem) {
+	if (alloc_selem) {
 		mem_uncharge(smap, owner, smap->elem_size);
-		bpf_selem_free(selem, smap, true);
+		bpf_selem_free(alloc_selem, smap, true);
 	}
-	return ERR_PTR(err);
+	return err ? ERR_PTR(err) : SDATA(selem);
 }
 
 static u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache)
@@
 		 * of the loop will set the free_cgroup_storage to true.
 		 */
 		free_storage = bpf_selem_unlink_storage_nolock(
-			local_storage, selem, false, true);
+			local_storage, selem, true, true);
 	}
 	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
 
kernel/bpf/syscall.c (+1 -1)
@@
 	}
 
 	run_ctx.bpf_cookie = 0;
-	run_ctx.saved_run_ctx = NULL;
 	if (!__bpf_prog_enter_sleepable_recur(prog, &run_ctx)) {
 		/* recursion detected */
+		__bpf_prog_exit_sleepable_recur(prog, 0, &run_ctx);
 		bpf_prog_put(prog);
 		return -EBUSY;
 	}
kernel/bpf/trampoline.c (+2 -3)
@@
 	migrate_disable();
 	might_fault();
 
+	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
+
 	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
 		bpf_prog_inc_misses_counter(prog);
 		return 0;
 	}
-
-	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
-
 	return bpf_prog_start_time();
 }
 
net/bpf/test_run.c (+1)
@@
 
 int noinline bpf_fentry_test7(struct bpf_fentry_test_t *arg)
 {
+	asm volatile ("");
 	return (long)arg;
 }
 
net/can/j1939/socket.c (+6 -4)
@@
 	struct sock_exterr_skb *serr;
 	struct sk_buff *skb;
 	char *state = "UNK";
+	u32 tsflags;
 	int err;
 
 	jsk = j1939_sk(sk);
@@
 	if (!(jsk->state & J1939_SOCK_ERRQUEUE))
 		return;
 
+	tsflags = READ_ONCE(sk->sk_tsflags);
 	switch (type) {
 	case J1939_ERRQUEUE_TX_ACK:
-		if (!(sk->sk_tsflags & SOF_TIMESTAMPING_TX_ACK))
+		if (!(tsflags & SOF_TIMESTAMPING_TX_ACK))
 			return;
 		break;
 	case J1939_ERRQUEUE_TX_SCHED:
-		if (!(sk->sk_tsflags & SOF_TIMESTAMPING_TX_SCHED))
+		if (!(tsflags & SOF_TIMESTAMPING_TX_SCHED))
 			return;
 		break;
 	case J1939_ERRQUEUE_TX_ABORT:
@@
 	case J1939_ERRQUEUE_RX_DPO:
 		fallthrough;
 	case J1939_ERRQUEUE_RX_ABORT:
-		if (!(sk->sk_tsflags & SOF_TIMESTAMPING_RX_SOFTWARE))
+		if (!(tsflags & SOF_TIMESTAMPING_RX_SOFTWARE))
 			return;
 		break;
 	default:
@@
 	}
 
 	serr->opt_stats = true;
-	if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
+	if (tsflags & SOF_TIMESTAMPING_OPT_ID)
 		serr->ee.ee_data = session->tskey;
 
 	netdev_dbg(session->priv->ndev, "%s: 0x%p tskey: %i, state: %s\n",
net/core/flow_dissector.c (+1 -2)
@@
 
 	memset(&keys, 0, sizeof(keys));
 	__skb_flow_dissect(NULL, skb, &flow_keys_dissector_symmetric,
-			   &keys, NULL, 0, 0, 0,
-			   FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
+			   &keys, NULL, 0, 0, 0, 0);
 
 	return __flow_hash_from_keys(&keys, &hashrnd);
 }
net/core/skbuff.c (+34 -20)
@@
 			     bool *pfmemalloc)
 {
 	bool ret_pfmemalloc = false;
-	unsigned int obj_size;
+	size_t obj_size;
 	void *obj;
 
 	obj_size = SKB_HEAD_ALIGN(*size);
@@
 		obj = kmem_cache_alloc_node(skb_small_head_cache, flags, node);
 		goto out;
 	}
-	*size = obj_size = kmalloc_size_roundup(obj_size);
+
+	obj_size = kmalloc_size_roundup(obj_size);
+	/* The following cast might truncate high-order bits of obj_size, this
+	 * is harmless because kmalloc(obj_size >= 2^32) will fail anyway.
+	 */
+	*size = (unsigned int)obj_size;
+
 	/*
 	 * Try a regular allocation, when that fails and we're not entitled
 	 * to the reserves, fail.
@@
 	struct sk_buff *segs = NULL;
 	struct sk_buff *tail = NULL;
 	struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
-	skb_frag_t *frag = skb_shinfo(head_skb)->frags;
 	unsigned int mss = skb_shinfo(head_skb)->gso_size;
 	unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
-	struct sk_buff *frag_skb = head_skb;
 	unsigned int offset = doffset;
 	unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
 	unsigned int partial_segs = 0;
 	unsigned int headroom;
 	unsigned int len = head_skb->len;
+	struct sk_buff *frag_skb;
+	skb_frag_t *frag;
 	__be16 proto;
 	bool csum, sg;
-	int nfrags = skb_shinfo(head_skb)->nr_frags;
 	int err = -ENOMEM;
 	int i = 0;
-	int pos;
+	int nfrags, pos;
 
 	if ((skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY) &&
 	    mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
@@
 	headroom = skb_headroom(head_skb);
 	pos = skb_headlen(head_skb);
 
+	if (skb_orphan_frags(head_skb, GFP_ATOMIC))
+		return ERR_PTR(-ENOMEM);
+
+	nfrags = skb_shinfo(head_skb)->nr_frags;
+	frag = skb_shinfo(head_skb)->frags;
+	frag_skb = head_skb;
+
 	do {
 		struct sk_buff *nskb;
 		skb_frag_t *nskb_frag;
@@
 		    (skb_headlen(list_skb) == len || sg)) {
 			BUG_ON(skb_headlen(list_skb) > len);
 
+			nskb = skb_clone(list_skb, GFP_ATOMIC);
+			if (unlikely(!nskb))
+				goto err;
+
 			i = 0;
 			nfrags = skb_shinfo(list_skb)->nr_frags;
 			frag = skb_shinfo(list_skb)->frags;
@@
 				frag++;
 			}
 
-			nskb = skb_clone(list_skb, GFP_ATOMIC);
 			list_skb = list_skb->next;
-
-			if (unlikely(!nskb))
-				goto err;
 
 			if (unlikely(pskb_trim(nskb, len))) {
 				kfree_skb(nskb);
@@
 		skb_shinfo(nskb)->flags |= skb_shinfo(head_skb)->flags &
 					   SKBFL_SHARED_FRAG;
 
-		if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
-		    skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+		if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
 			goto err;
 
 		while (pos < offset + len) {
 			if (i >= nfrags) {
+				if (skb_orphan_frags(list_skb, GFP_ATOMIC) ||
+				    skb_zerocopy_clone(nskb, list_skb,
+						       GFP_ATOMIC))
+					goto err;
+
 				i = 0;
 				nfrags = skb_shinfo(list_skb)->nr_frags;
 				frag = skb_shinfo(list_skb)->frags;
@@
 					i--;
 					frag--;
 				}
-				if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
-				    skb_zerocopy_clone(nskb, frag_skb,
-						       GFP_ATOMIC))
-					goto err;
 
 				list_skb = list_skb->next;
 			}
@@
 	serr->ee.ee_info = tstype;
 	serr->opt_stats = opt_stats;
 	serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
-	if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID) {
+	if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
 		serr->ee.ee_data = skb_shinfo(skb)->tskey;
 		if (sk_is_tcp(sk))
 			serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
@@
 {
 	struct sk_buff *skb;
 	bool tsonly, opt_stats = false;
+	u32 tsflags;
 
 	if (!sk)
 		return;
 
-	if (!hwtstamps && !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) &&
+	tsflags = READ_ONCE(sk->sk_tsflags);
+	if (!hwtstamps && !(tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) &&
 	    skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)
 		return;
 
-	tsonly = sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TSONLY;
+	tsonly = tsflags & SOF_TIMESTAMPING_OPT_TSONLY;
 	if (!skb_may_tx_timestamp(sk, tsonly))
 		return;
 
 	if (tsonly) {
 #ifdef CONFIG_INET
-		if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_STATS) &&
+		if ((tsflags & SOF_TIMESTAMPING_OPT_STATS) &&
 		    sk_is_tcp(sk)) {
 			skb = tcp_get_timestamping_opt_stats(sk, orig_skb,
 							     ack_skb);
net/core/skmsg.c (+8 -4)
@@
 static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
 			       u32 off, u32 len, bool ingress)
 {
+	int err = 0;
+
 	if (!ingress) {
 		if (!sock_writeable(psock->sk))
 			return -EAGAIN;
 		return skb_send_sock(psock->sk, skb, off, len);
 	}
-	return sk_psock_skb_ingress(psock, skb, off, len);
+	skb_get(skb);
+	err = sk_psock_skb_ingress(psock, skb, off, len);
+	if (err < 0)
+		kfree_skb(skb);
+	return err;
 }
 
 static void sk_psock_skb_state(struct sk_psock *psock,
@@
 		} while (len);
 
 		skb = skb_dequeue(&psock->ingress_skb);
-		if (!ingress) {
-			kfree_skb(skb);
-		}
+		kfree_skb(skb);
 	}
 end:
 	mutex_unlock(&psock->work_mutex);
net/core/sock.c (+14 -13)
@@
 		return false;
 	if (!sk)
 		return true;
-	switch (sk->sk_family) {
+	/* IPV6_ADDRFORM can change sk->sk_family under us. */
+	switch (READ_ONCE(sk->sk_family)) {
 	case AF_INET:
 		return inet_test_bit(MC_LOOP, sk);
 #if IS_ENABLED(CONFIG_IPV6)
@@
 	if (!match)
 		return -EINVAL;
 
-	sk->sk_bind_phc = phc_index;
+	WRITE_ONCE(sk->sk_bind_phc, phc_index);
 
 	return 0;
 }
@@
 			return ret;
 	}
 
-	sk->sk_tsflags = val;
+	WRITE_ONCE(sk->sk_tsflags, val);
 	sock_valbool_flag(sk, SOCK_TSTAMP_NEW, optname == SO_TIMESTAMPING_NEW);
 
 	if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
@@
 		mem_cgroup_uncharge_skmem(sk->sk_memcg, pages);
 		return -ENOMEM;
 	}
-	sk->sk_forward_alloc += pages << PAGE_SHIFT;
+	sk_forward_alloc_add(sk, pages << PAGE_SHIFT);
 
 	WRITE_ONCE(sk->sk_reserved_mem,
 		   sk->sk_reserved_mem + (pages << PAGE_SHIFT));
@@
 
 	case SO_TIMESTAMPING_OLD:
 		lv = sizeof(v.timestamping);
-		v.timestamping.flags = sk->sk_tsflags;
-		v.timestamping.bind_phc = sk->sk_bind_phc;
+		v.timestamping.flags = READ_ONCE(sk->sk_tsflags);
+		v.timestamping.bind_phc = READ_ONCE(sk->sk_bind_phc);
 		break;
 
 	case SO_RCVTIMEO_OLD:
@@
 		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		if (refcount_read(&sk->sk_wmem_alloc) < READ_ONCE(sk->sk_sndbuf))
 			break;
-		if (sk->sk_shutdown & SEND_SHUTDOWN)
+		if (READ_ONCE(sk->sk_shutdown) & SEND_SHUTDOWN)
 			break;
-		if (sk->sk_err)
+		if (READ_ONCE(sk->sk_err))
 			break;
 		timeo = schedule_timeout(timeo);
 	}
@@
 		goto failure;
 
 	err = -EPIPE;
-	if (sk->sk_shutdown & SEND_SHUTDOWN)
+	if (READ_ONCE(sk->sk_shutdown) & SEND_SHUTDOWN)
 		goto failure;
 
 	if (sk_wmem_alloc_get(sk) < READ_ONCE(sk->sk_sndbuf))
@@
 {
 	int ret, amt = sk_mem_pages(size);
 
-	sk->sk_forward_alloc += amt << PAGE_SHIFT;
+	sk_forward_alloc_add(sk, amt << PAGE_SHIFT);
 	ret = __sk_mem_raise_allocated(sk, size, amt, kind);
 	if (!ret)
-		sk->sk_forward_alloc -= amt << PAGE_SHIFT;
+		sk_forward_alloc_add(sk, -(amt << PAGE_SHIFT));
 	return ret;
 }
 EXPORT_SYMBOL(__sk_mem_schedule);
@@
 void __sk_mem_reclaim(struct sock *sk, int amount)
 {
 	amount >>= PAGE_SHIFT;
-	sk->sk_forward_alloc -= amount << PAGE_SHIFT;
+	sk_forward_alloc_add(sk, -(amount << PAGE_SHIFT));
 	__sk_mem_reduce_allocated(sk, amount);
 }
 EXPORT_SYMBOL(__sk_mem_reclaim);
@@
 	mem[SK_MEMINFO_RCVBUF] = READ_ONCE(sk->sk_rcvbuf);
 	mem[SK_MEMINFO_WMEM_ALLOC] = sk_wmem_alloc_get(sk);
 	mem[SK_MEMINFO_SNDBUF] = READ_ONCE(sk->sk_sndbuf);
-	mem[SK_MEMINFO_FWD_ALLOC] = sk->sk_forward_alloc;
+	mem[SK_MEMINFO_FWD_ALLOC] = sk_forward_alloc_get(sk);
 	mem[SK_MEMINFO_WMEM_QUEUED] = READ_ONCE(sk->sk_wmem_queued);
 	mem[SK_MEMINFO_OPTMEM] = atomic_read(&sk->sk_omem_alloc);
 	mem[SK_MEMINFO_BACKLOG] = READ_ONCE(sk->sk_backlog.len);
net/core/sock_map.c (+18 -18)
@@
 	struct bpf_map map;
 	struct sock **sks;
 	struct sk_psock_progs progs;
-	raw_spinlock_t lock;
+	spinlock_t lock;
 };
 
 #define SOCK_CREATE_FLAG_MASK \
@@
 		return ERR_PTR(-ENOMEM);
 
 	bpf_map_init_from_attr(&stab->map, attr);
-	raw_spin_lock_init(&stab->lock);
+	spin_lock_init(&stab->lock);
 
 	stab->sks = bpf_map_area_alloc((u64) stab->map.max_entries *
 				       sizeof(struct sock *),
@@
 	struct sock *sk;
 	int err = 0;
 
-	raw_spin_lock_bh(&stab->lock);
+	spin_lock_bh(&stab->lock);
 	sk = *psk;
 	if (!sk_test || sk_test == sk)
 		sk = xchg(psk, NULL);
@@
 	else
 		err = -EINVAL;
 
-	raw_spin_unlock_bh(&stab->lock);
+	spin_unlock_bh(&stab->lock);
 	return err;
 }
@@
 	psock = sk_psock(sk);
 	WARN_ON_ONCE(!psock);
 
-	raw_spin_lock_bh(&stab->lock);
+	spin_lock_bh(&stab->lock);
 	osk = stab->sks[idx];
 	if (osk && flags == BPF_NOEXIST) {
 		ret = -EEXIST;
@@
 	stab->sks[idx] = sk;
 	if (osk)
 		sock_map_unref(osk, &stab->sks[idx]);
-	raw_spin_unlock_bh(&stab->lock);
+	spin_unlock_bh(&stab->lock);
 	return 0;
 out_unlock:
-	raw_spin_unlock_bh(&stab->lock);
+	spin_unlock_bh(&stab->lock);
 	if (psock)
 		sk_psock_put(sk, psock);
 out_free:
@@
 
 struct bpf_shtab_bucket {
 	struct hlist_head head;
-	raw_spinlock_t lock;
+	spinlock_t lock;
 };
 
 struct bpf_shtab {
@@
 	 * is okay since it's going away only after RCU grace period.
 	 * However, we need to check whether it's still present.
 	 */
-	raw_spin_lock_bh(&bucket->lock);
+	spin_lock_bh(&bucket->lock);
 	elem_probe = sock_hash_lookup_elem_raw(&bucket->head, elem->hash,
 					       elem->key, map->key_size);
 	if (elem_probe && elem_probe == elem) {
@@
 		sock_map_unref(elem->sk, elem);
 		sock_hash_free_elem(htab, elem);
 	}
-	raw_spin_unlock_bh(&bucket->lock);
+	spin_unlock_bh(&bucket->lock);
 }
 
 static long sock_hash_delete_elem(struct bpf_map *map, void *key)
@@
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
-	raw_spin_lock_bh(&bucket->lock);
+	spin_lock_bh(&bucket->lock);
 	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
 	if (elem) {
 		hlist_del_rcu(&elem->node);
@@
 		sock_map_unref(elem->sk, elem);
 		sock_hash_free_elem(htab, elem);
 		ret = 0;
 	}
-	raw_spin_unlock_bh(&bucket->lock);
+	spin_unlock_bh(&bucket->lock);
 	return ret;
 }
@@
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
-	raw_spin_lock_bh(&bucket->lock);
+	spin_lock_bh(&bucket->lock);
 	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
 	if (elem && flags == BPF_NOEXIST) {
 		ret = -EEXIST;
@@
 		sock_map_unref(elem->sk, elem);
 		sock_hash_free_elem(htab, elem);
 	}
-	raw_spin_unlock_bh(&bucket->lock);
+	spin_unlock_bh(&bucket->lock);
 	return 0;
 out_unlock:
-	raw_spin_unlock_bh(&bucket->lock);
+	spin_unlock_bh(&bucket->lock);
 	sk_psock_put(sk, psock);
 out_free:
 	sk_psock_free_link(link);
@@
 
 	for (i = 0; i < htab->buckets_num; i++) {
 		INIT_HLIST_HEAD(&htab->buckets[i].head);
-		raw_spin_lock_init(&htab->buckets[i].lock);
+		spin_lock_init(&htab->buckets[i].lock);
 	}
 
 	return &htab->map;
@@
 		 * exists, psock exists and holds a ref to socket. That
 		 * lets us to grab a socket ref too.
 		 */
-		raw_spin_lock_bh(&bucket->lock);
+		spin_lock_bh(&bucket->lock);
 		hlist_for_each_entry(elem, &bucket->head, node)
 			sock_hold(elem->sk);
 		hlist_move_list(&bucket->head, &unlink_list);
-		raw_spin_unlock_bh(&bucket->lock);
+		spin_unlock_bh(&bucket->lock);
 
 		/* Process removed entries out of atomic context to
 		 * block for socket lock before deleting the psock's
net/handshake/netlink.c (+6 -12)
@@
 int handshake_nl_done_doit(struct sk_buff *skb, struct genl_info *info)
 {
 	struct net *net = sock_net(skb->sk);
-	struct handshake_req *req = NULL;
-	struct socket *sock = NULL;
+	struct handshake_req *req;
+	struct socket *sock;
 	int fd, status, err;
 
 	if (GENL_REQ_ATTR_CHECK(info, HANDSHAKE_A_DONE_SOCKFD))
 		return -EINVAL;
 	fd = nla_get_u32(info->attrs[HANDSHAKE_A_DONE_SOCKFD]);
 
-	err = 0;
 	sock = sockfd_lookup(fd, &err);
-	if (err) {
-		err = -EBADF;
-		goto out_status;
-	}
+	if (!sock)
+		return err;
 
 	req = handshake_req_hash_lookup(sock->sk);
 	if (!req) {
 		err = -EBUSY;
+		trace_handshake_cmd_done_err(net, req, sock->sk, err);
 		fput(sock->file);
-		goto out_status;
+		return err;
 	}
 
 	trace_handshake_cmd_done(net, req, sock->sk, fd);
@@
 	handshake_complete(req, status, info);
 	fput(sock->file);
 	return 0;
-
-out_status:
-	trace_handshake_cmd_done_err(net, req, sock->sk, err);
-	return err;
 }
 
 static unsigned int handshake_net_id;
net/ipv4/fib_semantics.c (+4 -1)
@@
 				hlist_del(&nexthop_nh->nh_hash);
 			} endfor_nexthops(fi)
 		}
-		fi->fib_dead = 1;
+		/* Paired with READ_ONCE() from fib_table_lookup() */
+		WRITE_ONCE(fi->fib_dead, 1);
 		fib_info_put(fi);
 	}
 	spin_unlock_bh(&fib_info_lock);
@@
 link_it:
 	ofi = fib_find_info(fi);
 	if (ofi) {
+		/* fib_table_lookup() should not see @fi yet. */
 		fi->fib_dead = 1;
 		free_fib_info(fi);
 		refcount_inc(&ofi->fib_treeref);
@@
 
 failure:
 	if (fi) {
+		/* fib_table_lookup() should not see @fi yet. */
 		fi->fib_dead = 1;
 		free_fib_info(fi);
 	}
net/ipv4/fib_trie.c (+2 -1)
@@
 		if (fa->fa_dscp &&
 		    inet_dscp_to_dsfield(fa->fa_dscp) != flp->flowi4_tos)
 			continue;
-		if (fi->fib_dead)
+		/* Paired with WRITE_ONCE() in fib_release_info() */
+		if (READ_ONCE(fi->fib_dead))
 			continue;
 		if (fa->fa_info->fib_scope < flp->flowi4_scope)
 			continue;
net/ipv4/igmp.c (+2 -1)
@@
 	struct flowi4 fl4;
 	int hlen = LL_RESERVED_SPACE(dev);
 	int tlen = dev->needed_tailroom;
-	unsigned int size = mtu;
+	unsigned int size;
 
+	size = min(mtu, IP_MAX_MTU);
 	while (1) {
 		skb = alloc_skb(size + hlen + tlen,
 				GFP_ATOMIC | __GFP_NOWARN);
net/ipv4/ip_forward.c (-1)
@@
 	struct ip_options *opt = &(IPCB(skb)->opt);
 
 	__IP_INC_STATS(net, IPSTATS_MIB_OUTFORWDATAGRAMS);
-	__IP_ADD_STATS(net, IPSTATS_MIB_OUTOCTETS, skb->len);
 
 #ifdef CONFIG_NET_SWITCHDEV
 	if (skb->offload_l3_fwd_mark) {
net/ipv4/ip_input.c (+2 -1)
@@
 static struct sk_buff *ip_extract_route_hint(const struct net *net,
 					     struct sk_buff *skb, int rt_type)
 {
-	if (fib4_has_custom_rules(net) || rt_type == RTN_BROADCAST)
+	if (fib4_has_custom_rules(net) || rt_type == RTN_BROADCAST ||
+	    IPCB(skb)->flags & IPSKB_MULTIPATH)
 		return NULL;
 
 	return skb;
net/ipv4/ip_output.c (+4 -5)
@@
 	} else if (rt->rt_type == RTN_BROADCAST)
 		IP_UPD_PO_STATS(net, IPSTATS_MIB_OUTBCAST, skb->len);
 
+	/* OUTOCTETS should be counted after fragment */
+	IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
+
 	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
 		skb = skb_expand_head(skb, hh_len);
 		if (!skb)
@@
 	/*
 	 *	If the indicated interface is up and running, send the packet.
 	 */
-	IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
-
 	skb->dev = dev;
 	skb->protocol = htons(ETH_P_IP);
@@
 int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	struct net_device *dev = skb_dst(skb)->dev, *indev = skb->dev;
-
-	IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
 
 	skb->dev = dev;
 	skb->protocol = htons(ETH_P_IP);
@@
 	paged = !!cork->gso_size;
 
 	if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
-	    sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
+	    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID)
 		tskey = atomic_inc_return(&sk->sk_tskey) - 1;
 
 	hh_len = LL_RESERVED_SPACE(rt->dst.dev);
net/ipv4/ip_sockglue.c (+1 -1)
@@
 	 * or without payload (SOF_TIMESTAMPING_OPT_TSONLY).
 	 */
 	info = PKTINFO_SKB_CB(skb);
-	if (!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_CMSG) ||
+	if (!(READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_CMSG) ||
 	    !info->ipi_ifindex)
 		return false;
 
net/ipv4/ipmr.c (-1)
@@
 	struct ip_options *opt = &(IPCB(skb)->opt);
 
 	IP_INC_STATS(net, IPSTATS_MIB_OUTFORWDATAGRAMS);
-	IP_ADD_STATS(net, IPSTATS_MIB_OUTOCTETS, skb->len);
 
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
net/ipv4/route.c (+1)
@@
 		int h = fib_multipath_hash(res->fi->fib_net, NULL, skb, hkeys);
 
 		fib_select_multipath(res, h);
+		IPCB(skb)->flags |= IPSKB_MULTIPATH;
 	}
 #endif
 
net/ipv4/tcp.c (+2 -2)
@@
 			}
 		}
 
-		if (sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE)
+		if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_SOFTWARE)
 			has_timestamping = true;
 		else
 			tss->ts[0] = (struct timespec64) {0};
 	}
 
 	if (tss->ts[2].tv_sec || tss->ts[2].tv_nsec) {
-		if (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
+		if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_RAW_HARDWARE)
 			has_timestamping = true;
 		else
 			tss->ts[2] = (struct timespec64) {0};
net/ipv4/tcp_output.c (+1 -1)
@@
 	if (delta <= 0)
 		return;
 	amt = sk_mem_pages(delta);
-	sk->sk_forward_alloc += amt << PAGE_SHIFT;
+	sk_forward_alloc_add(sk, amt << PAGE_SHIFT);
 	sk_memory_allocated_add(sk, amt);
 
 	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
net/ipv4/udp.c (+3 -3)
@@
 		spin_lock(&sk_queue->lock);
 
 
-	sk->sk_forward_alloc += size;
+	sk_forward_alloc_add(sk, size);
 	amt = (sk->sk_forward_alloc - partial) & ~(PAGE_SIZE - 1);
-	sk->sk_forward_alloc -= amt;
+	sk_forward_alloc_add(sk, -amt);
 
 	if (amt)
 		__sk_mem_reduce_allocated(sk, amt >> PAGE_SHIFT);
@@
 		goto uncharge_drop;
 	}
 
-	sk->sk_forward_alloc -= size;
+	sk_forward_alloc_add(sk, -size);
 
 	/* no need to setup a destructor, we will explicitly release the
 	 * forward allocated memory on dequeue
net/ipv6/addrconf.c (+1 -1)
@@
 	 * idev->desync_factor if it's larger
 	 */
 	cnf_temp_preferred_lft = READ_ONCE(idev->cnf.temp_prefered_lft);
-	max_desync_factor = min_t(__u32,
+	max_desync_factor = min_t(long,
 				  idev->cnf.max_desync_factor,
 				  cnf_temp_preferred_lft - regen_advance);
 
net/ipv6/ip6_input.c (+2 -1)
@@
 static struct sk_buff *ip6_extract_route_hint(const struct net *net,
 					      struct sk_buff *skb)
 {
-	if (fib6_routes_require_src(net) || fib6_has_custom_rules(net))
+	if (fib6_routes_require_src(net) || fib6_has_custom_rules(net) ||
+	    IP6CB(skb)->flags & IP6SKB_MULTIPATH)
 		return NULL;
 
 	return skb;
net/ipv6/ip6_output.c (+1 -2)
@@
 	struct dst_entry *dst = skb_dst(skb);
 
 	__IP6_INC_STATS(net, ip6_dst_idev(dst), IPSTATS_MIB_OUTFORWDATAGRAMS);
-	__IP6_ADD_STATS(net, ip6_dst_idev(dst), IPSTATS_MIB_OUTOCTETS, skb->len);
 
 #ifdef CONFIG_NET_SWITCHDEV
 	if (skb->offload_l3_fwd_mark) {
@@
 	orig_mtu = mtu;
 
 	if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
-	    sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
+	    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID)
 		tskey = atomic_inc_return(&sk->sk_tskey) - 1;
 
 	hh_len = LL_RESERVED_SPACE(rt->dst.dev);
net/ipv6/ip6mr.c (-2)
@@
 {
 	IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
 		      IPSTATS_MIB_OUTFORWDATAGRAMS);
-	IP6_ADD_STATS(net, ip6_dst_idev(skb_dst(skb)),
-		      IPSTATS_MIB_OUTOCTETS, skb->len);
 	return dst_output(net, sk, skb);
 }
 
net/ipv6/ping.c (+1 -1)
@@
 		return -EINVAL;
 
 	ipcm6_init_sk(&ipc6, np);
-	ipc6.sockc.tsflags = sk->sk_tsflags;
+	ipc6.sockc.tsflags = READ_ONCE(sk->sk_tsflags);
 	ipc6.sockc.mark = READ_ONCE(sk->sk_mark);
 
 	fl6.flowi6_oif = oif;
net/ipv6/raw.c (+1 -1)
@@
 	fl6.flowi6_uid = sk->sk_uid;
 
 	ipcm6_init(&ipc6);
-	ipc6.sockc.tsflags = sk->sk_tsflags;
+	ipc6.sockc.tsflags = READ_ONCE(sk->sk_tsflags);
 	ipc6.sockc.mark = fl6.flowi6_mark;
 
 	if (sin6) {
net/ipv6/route.c (+3)
@@
 	if (match->nh && have_oif_match && res->nh)
 		return;
 
+	if (skb)
+		IP6CB(skb)->flags |= IP6SKB_MULTIPATH;
+
 	/* We might have already computed the hash for ICMPv6 errors. In such
 	 * case it will always be non-zero. Otherwise now is the time to do it.
 	 */
net/ipv6/udp.c (+1 -1)
@@
 
 	ipcm6_init(&ipc6);
 	ipc6.gso_size = READ_ONCE(up->gso_size);
-	ipc6.sockc.tsflags = sk->sk_tsflags;
+	ipc6.sockc.tsflags = READ_ONCE(sk->sk_tsflags);
 	ipc6.sockc.mark = READ_ONCE(sk->sk_mark);
 
 	/* destination address check */
net/kcm/kcmsock.c (+2)
@@
 	 * that all multiplexors and psocks have been destroyed.
 	 */
 	WARN_ON(!list_empty(&knet->mux_list));
+
+	mutex_destroy(&knet->mutex);
 }
 
 static struct pernet_operations kcm_net_ops = {
net/mptcp/protocol.c (+15 -8)
@@
 	__kfree_skb(skb);
 }
 
+static void mptcp_rmem_fwd_alloc_add(struct sock *sk, int size)
+{
+	WRITE_ONCE(mptcp_sk(sk)->rmem_fwd_alloc,
+		   mptcp_sk(sk)->rmem_fwd_alloc + size);
+}
+
 static void mptcp_rmem_charge(struct sock *sk, int size)
 {
-	mptcp_sk(sk)->rmem_fwd_alloc -= size;
+	mptcp_rmem_fwd_alloc_add(sk, -size);
 }
 
 static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
@@
 static void __mptcp_rmem_reclaim(struct sock *sk, int amount)
 {
 	amount >>= PAGE_SHIFT;
-	mptcp_sk(sk)->rmem_fwd_alloc -= amount << PAGE_SHIFT;
+	mptcp_rmem_charge(sk, amount << PAGE_SHIFT);
 	__sk_mem_reduce_allocated(sk, amount);
 }
@@
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	int reclaimable;
 
-	msk->rmem_fwd_alloc += size;
+	mptcp_rmem_fwd_alloc_add(sk, size);
 	reclaimable = msk->rmem_fwd_alloc - sk_unused_reserved_mem(sk);
 
 	/* see sk_mem_uncharge() for the rationale behind the following schema */
@@
 	if (!__sk_mem_raise_allocated(sk, size, amt, SK_MEM_RECV))
 		return false;
 
-	msk->rmem_fwd_alloc += amount;
+	mptcp_rmem_fwd_alloc_add(sk, amount);
 	return true;
 }
@@
 	}
 
 	/* data successfully copied into the write queue */
-	sk->sk_forward_alloc -= total_ts;
+	sk_forward_alloc_add(sk, -total_ts);
 	copied += psize;
 	dfrag->data_len += psize;
 	frag_truesize += psize;
@@
 	/* move all the rx fwd alloc into the sk_mem_reclaim_final in
 	 * inet_sock_destruct() will dispose it
 	 */
-	sk->sk_forward_alloc += msk->rmem_fwd_alloc;
-	msk->rmem_fwd_alloc = 0;
+	sk_forward_alloc_add(sk, msk->rmem_fwd_alloc);
+	WRITE_ONCE(msk->rmem_fwd_alloc, 0);
 	mptcp_token_destroy(msk);
 	mptcp_pm_free_anno_list(msk);
 	mptcp_free_local_addr_list(msk);
@@
 
 static int mptcp_forward_alloc_get(const struct sock *sk)
 {
-	return sk->sk_forward_alloc + mptcp_sk(sk)->rmem_fwd_alloc;
+	return READ_ONCE(sk->sk_forward_alloc) +
+	       READ_ONCE(mptcp_sk(sk)->rmem_fwd_alloc);
 }
 
 static int mptcp_ioctl_outq(const struct mptcp_sock *msk, u64 v)
+1
net/netfilter/ipset/ip_set_hash_netportnet.c
···
 #define IP_SET_HASH_WITH_PROTO
 #define IP_SET_HASH_WITH_NETS
 #define IPSET_NET_COUNT 2
+#define IP_SET_HASH_WITH_NET0
 
 /* IPv4 variant */
 
+49 -5
net/netfilter/nf_tables_api.c
···
 	[NFT_MSG_NEWFLOWTABLE]	= AUDIT_NFT_OP_FLOWTABLE_REGISTER,
 	[NFT_MSG_GETFLOWTABLE]	= AUDIT_NFT_OP_INVALID,
 	[NFT_MSG_DELFLOWTABLE]	= AUDIT_NFT_OP_FLOWTABLE_UNREGISTER,
+	[NFT_MSG_GETSETELEM_RESET] = AUDIT_NFT_OP_SETELEM_RESET,
 };
 
 static void nft_validate_state_update(struct nft_table *table, u8 new_validate_state)
···
 	nfnetlink_set_err(ctx->net, ctx->portid, NFNLGRP_NFTABLES, -ENOBUFS);
 }
 
+static void audit_log_rule_reset(const struct nft_table *table,
+				 unsigned int base_seq,
+				 unsigned int nentries)
+{
+	char *buf = kasprintf(GFP_ATOMIC, "%s:%u",
+			      table->name, base_seq);
+
+	audit_log_nfcfg(buf, table->family, nentries,
+			AUDIT_NFT_OP_RULE_RESET, GFP_ATOMIC);
+	kfree(buf);
+}
+
 struct nft_rule_dump_ctx {
 	char *table;
 	char *chain;
···
 cont_skip:
 		(*idx)++;
 	}
+
+	if (reset && *idx)
+		audit_log_rule_reset(table, cb->seq, *idx);
+
 	return 0;
 }
···
 				   family, table, chain, rule, 0, reset);
 	if (err < 0)
 		goto err_fill_rule_info;
+
+	if (reset)
+		audit_log_rule_reset(table, nft_pernet(net)->base_seq, 1);
 
 	return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid);
···
 	return nf_tables_fill_setelem(args->skb, set, elem, args->reset);
 }
 
+static void audit_log_nft_set_reset(const struct nft_table *table,
+				    unsigned int base_seq,
+				    unsigned int nentries)
+{
+	char *buf = kasprintf(GFP_ATOMIC, "%s:%u", table->name, base_seq);
+
+	audit_log_nfcfg(buf, table->family, nentries,
+			AUDIT_NFT_OP_SETELEM_RESET, GFP_ATOMIC);
+	kfree(buf);
+}
+
 struct nft_set_dump_ctx {
 	const struct nft_set *set;
 	struct nft_ctx ctx;
 };
 
 static int nft_set_catchall_dump(struct net *net, struct sk_buff *skb,
-				 const struct nft_set *set, bool reset)
+				 const struct nft_set *set, bool reset,
+				 unsigned int base_seq)
 {
 	struct nft_set_elem_catchall *catchall;
 	u8 genmask = nft_genmask_cur(net);
···
 
 		elem.priv = catchall->elem;
 		ret = nf_tables_fill_setelem(skb, set, &elem, reset);
+		if (reset && !ret)
+			audit_log_nft_set_reset(set->table, base_seq, 1);
 		break;
 	}
···
 	set->ops->walk(&dump_ctx->ctx, set, &args.iter);
 
 	if (!args.iter.err && args.iter.count == cb->args[0])
-		args.iter.err = nft_set_catchall_dump(net, skb, set, reset);
-	rcu_read_unlock();
-
+		args.iter.err = nft_set_catchall_dump(net, skb, set,
+						      reset, cb->seq);
 	nla_nest_end(skb, nest);
 	nlmsg_end(skb, nlh);
+
+	if (reset && args.iter.count > args.iter.skip)
+		audit_log_nft_set_reset(table, cb->seq,
+					args.iter.count - args.iter.skip);
+
+	rcu_read_unlock();
 
 	if (args.iter.err && args.iter.err != -EMSGSIZE)
 		return args.iter.err;
···
 	struct netlink_ext_ack *extack = info->extack;
 	u8 genmask = nft_genmask_cur(info->net);
 	u8 family = info->nfmsg->nfgen_family;
+	int rem, err = 0, nelems = 0;
 	struct net *net = info->net;
 	struct nft_table *table;
 	struct nft_set *set;
 	struct nlattr *attr;
 	struct nft_ctx ctx;
 	bool reset = false;
-	int rem, err = 0;
 
 	table = nft_table_lookup(net, nla[NFTA_SET_ELEM_LIST_TABLE], family,
 				 genmask, 0);
···
 			NL_SET_BAD_ATTR(extack, attr);
 			break;
 		}
+		nelems++;
 	}
+
+	if (reset)
+		audit_log_nft_set_reset(table, nft_pernet(net)->base_seq,
+					nelems);
 
 	return err;
 }
+22 -20
net/netfilter/nft_exthdr.c
···
 	return opt[offset + 1];
 }
 
+static int nft_skb_copy_to_reg(const struct sk_buff *skb, int offset, u32 *dest, unsigned int len)
+{
+	if (len % NFT_REG32_SIZE)
+		dest[len / NFT_REG32_SIZE] = 0;
+
+	return skb_copy_bits(skb, offset, dest, len);
+}
+
 static void nft_exthdr_ipv6_eval(const struct nft_expr *expr,
 				 struct nft_regs *regs,
 				 const struct nft_pktinfo *pkt)
···
 	}
 	offset += priv->offset;
 
-	dest[priv->len / NFT_REG32_SIZE] = 0;
-	if (skb_copy_bits(pkt->skb, offset, dest, priv->len) < 0)
+	if (nft_skb_copy_to_reg(pkt->skb, offset, dest, priv->len) < 0)
 		goto err;
 	return;
 err:
···
 	}
 	offset += priv->offset;
 
-	dest[priv->len / NFT_REG32_SIZE] = 0;
-	if (skb_copy_bits(pkt->skb, offset, dest, priv->len) < 0)
+	if (nft_skb_copy_to_reg(pkt->skb, offset, dest, priv->len) < 0)
 		goto err;
 	return;
 err:
···
 	if (priv->flags & NFT_EXTHDR_F_PRESENT) {
 		*dest = 1;
 	} else {
-		dest[priv->len / NFT_REG32_SIZE] = 0;
+		if (priv->len % NFT_REG32_SIZE)
+			dest[priv->len / NFT_REG32_SIZE] = 0;
 		memcpy(dest, opt + offset, priv->len);
 	}
···
 	if (!tcph)
 		goto err;
 
+	if (skb_ensure_writable(pkt->skb, nft_thoff(pkt) + tcphdr_len))
+		goto err;
+
+	tcph = (struct tcphdr *)(pkt->skb->data + nft_thoff(pkt));
 	opt = (u8 *)tcph;
+
 	for (i = sizeof(*tcph); i < tcphdr_len - 1; i += optl) {
 		union {
 			__be16 v16;
···
 			continue;
 
 		if (i + optl > tcphdr_len || priv->len + priv->offset > optl)
-			goto err;
-
-		if (skb_ensure_writable(pkt->skb,
-					nft_thoff(pkt) + i + priv->len))
-			goto err;
-
-		tcph = nft_tcp_header_pointer(pkt, sizeof(buff), buff,
-					      &tcphdr_len);
-		if (!tcph)
 			goto err;
 
 		offset = i + priv->offset;
···
 	if (skb_ensure_writable(pkt->skb, nft_thoff(pkt) + tcphdr_len))
 		goto drop;
 
-	opt = (u8 *)nft_tcp_header_pointer(pkt, sizeof(buff), buff, &tcphdr_len);
-	if (!opt)
-		goto err;
+	tcph = (struct tcphdr *)(pkt->skb->data + nft_thoff(pkt));
+	opt = (u8 *)tcph;
+
 	for (i = sizeof(*tcph); i < tcphdr_len - 1; i += optl) {
 		unsigned int j;
···
 		    offset + ntohs(sch->length) > pkt->skb->len)
 			break;
 
-		dest[priv->len / NFT_REG32_SIZE] = 0;
-		if (skb_copy_bits(pkt->skb, offset + priv->offset,
-				  dest, priv->len) < 0)
+		if (nft_skb_copy_to_reg(pkt->skb, offset + priv->offset,
+					dest, priv->len) < 0)
 			break;
 		return;
 	}
+6 -2
net/netfilter/nft_set_rbtree.c
···
 	struct nft_rbtree_elem *rbe, *rbe_le = NULL, *rbe_ge = NULL;
 	struct rb_node *node, *next, *parent, **p, *first = NULL;
 	struct nft_rbtree *priv = nft_set_priv(set);
+	u8 cur_genmask = nft_genmask_cur(net);
 	u8 genmask = nft_genmask_next(net);
 	int d, err;
···
 		if (!nft_set_elem_active(&rbe->ext, genmask))
 			continue;
 
-		/* perform garbage collection to avoid bogus overlap reports. */
-		if (nft_set_elem_expired(&rbe->ext)) {
+		/* perform garbage collection to avoid bogus overlap reports
+		 * but skip new elements in this transaction.
+		 */
+		if (nft_set_elem_expired(&rbe->ext) &&
+		    nft_set_elem_active(&rbe->ext, cur_genmask)) {
 			err = nft_rbtree_gc_elem(set, priv, rbe, genmask);
 			if (err < 0)
 				return err;
+2
net/netfilter/xt_sctp.c
···
 {
 	const struct xt_sctp_info *info = par->matchinfo;
 
+	if (info->flag_count > ARRAY_SIZE(info->flag_info))
+		return -EINVAL;
 	if (info->flags & ~XT_SCTP_VALID_FLAGS)
 		return -EINVAL;
 	if (info->invflags & ~XT_SCTP_VALID_FLAGS)
+21
net/netfilter/xt_u32.c
···
 	return ret ^ data->invert;
 }
 
+static int u32_mt_checkentry(const struct xt_mtchk_param *par)
+{
+	const struct xt_u32 *data = par->matchinfo;
+	const struct xt_u32_test *ct;
+	unsigned int i;
+
+	if (data->ntests > ARRAY_SIZE(data->tests))
+		return -EINVAL;
+
+	for (i = 0; i < data->ntests; ++i) {
+		ct = &data->tests[i];
+
+		if (ct->nnums > ARRAY_SIZE(ct->location) ||
+		    ct->nvalues > ARRAY_SIZE(ct->value))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
 static struct xt_match xt_u32_mt_reg __read_mostly = {
 	.name       = "u32",
 	.revision   = 0,
 	.family     = NFPROTO_UNSPEC,
 	.match      = u32_mt,
+	.checkentry = u32_mt_checkentry,
 	.matchsize  = sizeof(struct xt_u32),
 	.me         = THIS_MODULE,
 };
+19 -8
net/sched/sch_fq_pie.c
···
 	struct pie_params p_params;
 	u32 ecn_prob;
 	u32 flows_cnt;
+	u32 flows_cursor;
 	u32 quantum;
 	u32 memory_limit;
 	u32 new_flow_count;
···
 static void fq_pie_timer(struct timer_list *t)
 {
 	struct fq_pie_sched_data *q = from_timer(q, t, adapt_timer);
+	unsigned long next, tupdate;
 	struct Qdisc *sch = q->sch;
 	spinlock_t *root_lock; /* to lock qdisc for probability calculations */
-	u32 idx;
+	int max_cnt, i;
 
 	rcu_read_lock();
 	root_lock = qdisc_lock(qdisc_root_sleeping(sch));
 	spin_lock(root_lock);
 
-	for (idx = 0; idx < q->flows_cnt; idx++)
-		pie_calculate_probability(&q->p_params, &q->flows[idx].vars,
-					  q->flows[idx].backlog);
+	/* Limit this expensive loop to 2048 flows per round. */
+	max_cnt = min_t(int, q->flows_cnt - q->flows_cursor, 2048);
+	for (i = 0; i < max_cnt; i++) {
+		pie_calculate_probability(&q->p_params,
+					  &q->flows[q->flows_cursor].vars,
+					  q->flows[q->flows_cursor].backlog);
+		q->flows_cursor++;
+	}
 
-	/* reset the timer to fire after 'tupdate' jiffies. */
-	if (q->p_params.tupdate)
-		mod_timer(&q->adapt_timer, jiffies + q->p_params.tupdate);
-
+	tupdate = q->p_params.tupdate;
+	next = 0;
+	if (q->flows_cursor >= q->flows_cnt) {
+		q->flows_cursor = 0;
+		next = tupdate;
+	}
+	if (tupdate)
+		mod_timer(&q->adapt_timer, jiffies + next);
 	spin_unlock(root_lock);
 	rcu_read_unlock();
 }
+1 -1
net/sched/sch_plug.c
···
 	.priv_size   =       sizeof(struct plug_sched_data),
 	.enqueue     =       plug_enqueue,
 	.dequeue     =       plug_dequeue,
-	.peek        =       qdisc_peek_head,
+	.peek        =       qdisc_peek_dequeued,
 	.init        =       plug_init,
 	.change      =       plug_change,
 	.reset       =       qdisc_reset_queue,
+17 -5
net/sched/sch_qfq.c
···
 }
 
 /* Dequeue head packet of the head class in the DRR queue of the aggregate. */
-static void agg_dequeue(struct qfq_aggregate *agg,
-			struct qfq_class *cl, unsigned int len)
+static struct sk_buff *agg_dequeue(struct qfq_aggregate *agg,
+				   struct qfq_class *cl, unsigned int len)
 {
-	qdisc_dequeue_peeked(cl->qdisc);
+	struct sk_buff *skb = qdisc_dequeue_peeked(cl->qdisc);
+
+	if (!skb)
+		return NULL;
 
 	cl->deficit -= (int) len;
···
 		cl->deficit += agg->lmax;
 		list_move_tail(&cl->alist, &agg->active);
 	}
+
+	return skb;
 }
 
 static inline struct sk_buff *qfq_peek_skb(struct qfq_aggregate *agg,
···
 	if (!skb)
 		return NULL;
 
-	qdisc_qstats_backlog_dec(sch, skb);
 	sch->q.qlen--;
+
+	skb = agg_dequeue(in_serv_agg, cl, len);
+
+	if (!skb) {
+		sch->q.qlen++;
+		return NULL;
+	}
+
+	qdisc_qstats_backlog_dec(sch, skb);
 	qdisc_bstats_update(sch, skb);
 
-	agg_dequeue(in_serv_agg, cl, len);
 	/* If lmax is lowered, through qfq_change_class, for a class
 	 * owning pending packets with larger size than the new value
 	 * of lmax, then the following condition may hold.
+1 -1
net/sctp/proc.c
···
 		   assoc->init_retries, assoc->shutdown_retries,
 		   assoc->rtx_data_chunks,
 		   refcount_read(&sk->sk_wmem_alloc),
-		   sk->sk_wmem_queued,
+		   READ_ONCE(sk->sk_wmem_queued),
 		   sk->sk_sndbuf,
 		   sk->sk_rcvbuf);
 	seq_printf(seq, "\n");
+5 -5
net/sctp/socket.c
···
 #include <net/sctp/stream_sched.h>
 
 /* Forward declarations for internal helper functions. */
-static bool sctp_writeable(struct sock *sk);
+static bool sctp_writeable(const struct sock *sk);
 static void sctp_wfree(struct sk_buff *skb);
 static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
 				size_t msg_len);
···
 
 	refcount_add(sizeof(struct sctp_chunk), &sk->sk_wmem_alloc);
 	asoc->sndbuf_used += chunk->skb->truesize + sizeof(struct sctp_chunk);
-	sk->sk_wmem_queued += chunk->skb->truesize + sizeof(struct sctp_chunk);
+	sk_wmem_queued_add(sk, chunk->skb->truesize + sizeof(struct sctp_chunk));
 	sk_mem_charge(sk, chunk->skb->truesize);
 }
···
 	struct sock *sk = asoc->base.sk;
 
 	sk_mem_uncharge(sk, skb->truesize);
-	sk->sk_wmem_queued -= skb->truesize + sizeof(struct sctp_chunk);
+	sk_wmem_queued_add(sk, -(skb->truesize + sizeof(struct sctp_chunk)));
 	asoc->sndbuf_used -= skb->truesize + sizeof(struct sctp_chunk);
 	WARN_ON(refcount_sub_and_test(sizeof(struct sctp_chunk),
 				      &sk->sk_wmem_alloc));
···
  * UDP-style sockets or TCP-style sockets, this code should work.
  * - Daisy
  */
-static bool sctp_writeable(struct sock *sk)
+static bool sctp_writeable(const struct sock *sk)
 {
-	return sk->sk_sndbuf > sk->sk_wmem_queued;
+	return READ_ONCE(sk->sk_sndbuf) > READ_ONCE(sk->sk_wmem_queued);
 }
 
 /* Wait for an association to go into ESTABLISHED state. If timeout is 0,
+8 -7
net/socket.c
···
 
 static ktime_t get_timestamp(struct sock *sk, struct sk_buff *skb, int *if_index)
 {
-	bool cycles = sk->sk_tsflags & SOF_TIMESTAMPING_BIND_PHC;
+	bool cycles = READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_BIND_PHC;
 	struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb);
 	struct net_device *orig_dev;
 	ktime_t hwtstamp;
···
 	int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP);
 	int new_tstamp = sock_flag(sk, SOCK_TSTAMP_NEW);
 	struct scm_timestamping_internal tss;
-
 	int empty = 1, false_tstamp = 0;
 	struct skb_shared_hwtstamps *shhwtstamps =
 		skb_hwtstamps(skb);
 	int if_index;
 	ktime_t hwtstamp;
+	u32 tsflags;
 
 	/* Race occurred between timestamp enabling and packet
 	   receiving. Fill in the current time for now. */
···
 	}
 
 	memset(&tss, 0, sizeof(tss));
-	if ((sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
+	tsflags = READ_ONCE(sk->sk_tsflags);
+	if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
 	    ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0))
 		empty = 0;
 	if (shhwtstamps &&
-	    (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
+	    (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
 	    !skb_is_swtx_tstamp(skb, false_tstamp)) {
 		if_index = 0;
 		if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV)
···
 		else
 			hwtstamp = shhwtstamps->hwtstamp;
 
-		if (sk->sk_tsflags & SOF_TIMESTAMPING_BIND_PHC)
+		if (tsflags & SOF_TIMESTAMPING_BIND_PHC)
 			hwtstamp = ptp_convert_timestamp(&hwtstamp,
-							 sk->sk_bind_phc);
+							 READ_ONCE(sk->sk_bind_phc));
 
 		if (ktime_to_timespec64_cond(hwtstamp, tss.ts + 2)) {
 			empty = 0;
 
-			if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_PKTINFO) &&
+			if ((tsflags & SOF_TIMESTAMPING_OPT_PKTINFO) &&
 			    !skb_is_err_queue(skb))
 				put_ts_pktinfo(msg, skb, if_index);
 		}
+1 -1
net/unix/af_unix.c
···
 	 * What the above comment does talk about? --ANK(980817)
 	 */
 
-	if (unix_tot_inflight)
+	if (READ_ONCE(unix_tot_inflight))
 		unix_gc();		/* Garbage collect fds */
 }
+3 -3
net/unix/scm.c
···
 		/* Paired with READ_ONCE() in wait_for_unix_gc() */
 		WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1);
 	}
-	user->unix_inflight++;
+	WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1);
 	spin_unlock(&unix_gc_lock);
 }
···
 		/* Paired with READ_ONCE() in wait_for_unix_gc() */
 		WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1);
 	}
-	user->unix_inflight--;
+	WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1);
 	spin_unlock(&unix_gc_lock);
 }
···
 {
 	struct user_struct *user = current_user();
 
-	if (unlikely(user->unix_inflight > task_rlimit(p, RLIMIT_NOFILE)))
+	if (unlikely(READ_ONCE(user->unix_inflight) > task_rlimit(p, RLIMIT_NOFILE)))
 		return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN);
 	return false;
 }
+13 -9
net/xdp/xsk.c
···
 
 	for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
 		if (unlikely(i >= MAX_SKB_FRAGS))
-			return ERR_PTR(-EFAULT);
+			return ERR_PTR(-EOVERFLOW);
 
 		page = pool->umem->pgs[addr >> PAGE_SHIFT];
 		get_page(page);
···
 		skb_put(skb, len);
 
 		err = skb_store_bits(skb, 0, buffer, len);
-		if (unlikely(err))
+		if (unlikely(err)) {
+			kfree_skb(skb);
 			goto free_err;
+		}
 	} else {
 		int nr_frags = skb_shinfo(skb)->nr_frags;
 		struct page *page;
 		u8 *vaddr;
 
 		if (unlikely(nr_frags == (MAX_SKB_FRAGS - 1) && xp_mb_desc(desc))) {
-			err = -EFAULT;
+			err = -EOVERFLOW;
 			goto free_err;
 		}
···
 	return skb;
 
 free_err:
-	if (err == -EAGAIN) {
-		xsk_cq_cancel_locked(xs, 1);
-	} else {
-		xsk_set_destructor_arg(skb);
-		xsk_drop_skb(skb);
+	if (err == -EOVERFLOW) {
+		/* Drop the packet */
+		xsk_set_destructor_arg(xs->skb);
+		xsk_drop_skb(xs->skb);
 		xskq_cons_release(xs->tx);
+	} else {
+		/* Let application retry */
+		xsk_cq_cancel_locked(xs, 1);
 	}
 
 	return ERR_PTR(err);
···
 			skb = xsk_build_skb(xs, &desc);
 			if (IS_ERR(skb)) {
 				err = PTR_ERR(skb);
-				if (err == -EAGAIN)
+				if (err != -EOVERFLOW)
 					goto out;
 				err = 0;
 				continue;
+3
net/xdp/xsk_diag.c
···
 	sock_diag_save_cookie(sk, msg->xdiag_cookie);
 
 	mutex_lock(&xs->mutex);
+	if (READ_ONCE(xs->state) == XSK_UNBOUND)
+		goto out_nlmsg_trim;
+
 	if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb))
 		goto out_nlmsg_trim;
 
+28 -28
scripts/bpf_doc.py
···
 		Break down helper function protocol into smaller chunks: return type,
 		name, distincts arguments.
 		"""
-		arg_re = re.compile('((\w+ )*?(\w+|...))( (\**)(\w+))?$')
+		arg_re = re.compile(r'((\w+ )*?(\w+|...))( (\**)(\w+))?$')
 		res = {}
-		proto_re = re.compile('(.+) (\**)(\w+)\(((([^,]+)(, )?){1,5})\)$')
+		proto_re = re.compile(r'(.+) (\**)(\w+)\(((([^,]+)(, )?){1,5})\)$')
 
 		capture = proto_re.match(self.proto)
 		res['ret_type'] = capture.group(1)
···
 		return Helper(proto=proto, desc=desc, ret=ret)
 
 	def parse_symbol(self):
-		p = re.compile(' \* ?(BPF\w+)$')
+		p = re.compile(r' \* ?(BPF\w+)$')
 		capture = p.match(self.line)
 		if not capture:
 			raise NoSyscallCommandFound
-		end_re = re.compile(' \* ?NOTES$')
+		end_re = re.compile(r' \* ?NOTES$')
 		end = end_re.match(self.line)
 		if end:
 			raise NoSyscallCommandFound
···
 		# - Same as above, with "const" and/or "struct" in front of type
 		# - "..." (undefined number of arguments, for bpf_trace_printk())
 		# There is at least one term ("void"), and at most five arguments.
-		p = re.compile(' \* ?((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
+		p = re.compile(r' \* ?((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
 		capture = p.match(self.line)
 		if not capture:
 			raise NoHelperFound
···
 		return capture.group(1)
 
 	def parse_desc(self, proto):
-		p = re.compile(' \* ?(?:\t| {5,8})Description$')
+		p = re.compile(r' \* ?(?:\t| {5,8})Description$')
 		capture = p.match(self.line)
 		if not capture:
 			raise Exception("No description section found for " + proto)
···
 			if self.line == ' *\n':
 				desc += '\n'
 			else:
-				p = re.compile(' \* ?(?:\t| {5,8})(?:\t| {8})(.*)')
+				p = re.compile(r' \* ?(?:\t| {5,8})(?:\t| {8})(.*)')
 				capture = p.match(self.line)
 				if capture:
 					desc_present = True
···
 		return desc
 
 	def parse_ret(self, proto):
-		p = re.compile(' \* ?(?:\t| {5,8})Return$')
+		p = re.compile(r' \* ?(?:\t| {5,8})Return$')
 		capture = p.match(self.line)
 		if not capture:
 			raise Exception("No return section found for " + proto)
···
 			if self.line == ' *\n':
 				ret += '\n'
 			else:
-				p = re.compile(' \* ?(?:\t| {5,8})(?:\t| {8})(.*)')
+				p = re.compile(r' \* ?(?:\t| {5,8})(?:\t| {8})(.*)')
 				capture = p.match(self.line)
 				if capture:
 					ret_present = True
···
 		self.seek_to('enum bpf_cmd {',
 			     'Could not find start of bpf_cmd enum', 0)
 		# Searches for either one or more BPF\w+ enums
-		bpf_p = re.compile('\s*(BPF\w+)+')
+		bpf_p = re.compile(r'\s*(BPF\w+)+')
 		# Searches for an enum entry assigned to another entry,
 		# for e.g. BPF_PROG_RUN = BPF_PROG_TEST_RUN, which is
 		# not documented hence should be skipped in check to
 		# determine if the right number of syscalls are documented
-		assign_p = re.compile('\s*(BPF\w+)\s*=\s*(BPF\w+)')
+		assign_p = re.compile(r'\s*(BPF\w+)\s*=\s*(BPF\w+)')
 		bpf_cmd_str = ''
 		while True:
 			capture = assign_p.match(self.line)
···
 				break
 			self.line = self.reader.readline()
 		# Find the number of occurences of BPF\w+
-		self.enum_syscalls = re.findall('(BPF\w+)+', bpf_cmd_str)
+		self.enum_syscalls = re.findall(r'(BPF\w+)+', bpf_cmd_str)
 
 	def parse_desc_helpers(self):
 		self.seek_to(helpersDocStart,
···
 		self.seek_to('#define ___BPF_FUNC_MAPPER(FN, ctx...)',
 			     'Could not find start of eBPF helper definition list')
 		# Searches for one FN(\w+) define or a backslash for newline
-		p = re.compile('\s*FN\((\w+), (\d+), ##ctx\)|\\\\')
+		p = re.compile(r'\s*FN\((\w+), (\d+), ##ctx\)|\\\\')
 		fn_defines_str = ''
 		i = 0
 		while True:
···
 				break
 			self.line = self.reader.readline()
 		# Find the number of occurences of FN(\w+)
-		self.define_unique_helpers = re.findall('FN\(\w+, \d+, ##ctx\)', fn_defines_str)
+		self.define_unique_helpers = re.findall(r'FN\(\w+, \d+, ##ctx\)', fn_defines_str)
 
 	def validate_helpers(self):
 		last_helper = ''
···
 		try:
 			cmd = ['git', 'log', '-1', '--pretty=format:%cs', '--no-patch',
 			       '-L',
-			       '/{}/,/\*\//:include/uapi/linux/bpf.h'.format(delimiter)]
+			       '/{}/,/\\*\\//:include/uapi/linux/bpf.h'.format(delimiter)]
 			date = subprocess.run(cmd, cwd=linuxRoot,
 					      capture_output=True, check=True)
 			return date.stdout.decode().rstrip()
···
 programs that are compatible with the GNU Privacy License (GPL).
 
 In order to use such helpers, the eBPF program must be loaded with the correct
-license string passed (via **attr**) to the **bpf**\ () system call, and this
+license string passed (via **attr**) to the **bpf**\\ () system call, and this
 generally translates into the C source code of the program containing a line
 similar to the following:
···
 * The bpftool utility can be used to probe the availability of helper functions
 on the system (as well as supported program and map types, and a number of
 other parameters). To do so, run **bpftool feature probe** (see
-**bpftool-feature**\ (8) for details). Add the **unprivileged** keyword to
+**bpftool-feature**\\ (8) for details). Add the **unprivileged** keyword to
 list features available to unprivileged users.
 
 Compatibility between helper functions and program types can generally be found
···
 requirement for GPL license is also in those **struct bpf_func_proto**.
 
 Compatibility between helper functions and map types can be found in the
-**check_map_func_compatibility**\ () function in file *kernel/bpf/verifier.c*.
+**check_map_func_compatibility**\\ () function in file *kernel/bpf/verifier.c*.
 
 Helper functions that invalidate the checks on **data** and **data_end**
 pointers for network processing are listed in function
-**bpf_helper_changes_pkt_data**\ () in file *net/core/filter.c*.
+**bpf_helper_changes_pkt_data**\\ () in file *net/core/filter.c*.
 
 SEE ALSO
 ========
 
-**bpf**\ (2),
-**bpftool**\ (8),
-**cgroups**\ (7),
-**ip**\ (8),
-**perf_event_open**\ (2),
-**sendmsg**\ (2),
-**socket**\ (7),
-**tc-bpf**\ (8)'''
+**bpf**\\ (2),
+**bpftool**\\ (8),
+**cgroups**\\ (7),
+**ip**\\ (8),
+**perf_event_open**\\ (2),
+**sendmsg**\\ (2),
+**socket**\\ (7),
+**tc-bpf**\\ (8)'''
 	print(footer)
 
 def print_proto(self, helper):
···
 			one_arg = '{}{}'.format(comma, a['type'])
 			if a['name']:
 				if a['star']:
-					one_arg += ' {}**\ '.format(a['star'].replace('*', '\\*'))
+					one_arg += ' {}**\\ '.format(a['star'].replace('*', '\\*'))
 				else:
 					one_arg += '** '
 				one_arg += '*{}*\\ **'.format(a['name'])
+1 -1
tools/bpf/bpftool/link.c
···
 #define perf_event_name(array, id) ({			\
 	const char *event_str = NULL;			\
 							\
-	if ((id) >= 0 && (id) < ARRAY_SIZE(array))	\
+	if ((id) < ARRAY_SIZE(array))			\
 		event_str = array[id];			\
 	event_str;					\
 })
+12
tools/testing/selftests/bpf/Makefile
···
 	test_cgroup_storage \
 	test_tcpnotify_user test_sysctl \
 	test_progs-no_alu32
+TEST_INST_SUBDIRS := no_alu32
 
 # Also test bpf-gcc, if present
 ifneq ($(BPF_GCC),)
 TEST_GEN_PROGS += test_progs-bpf_gcc
+TEST_INST_SUBDIRS += bpf_gcc
 endif
 
 ifneq ($(CLANG_CPUV4),)
 TEST_GEN_PROGS += test_progs-cpuv4
+TEST_INST_SUBDIRS += cpuv4
 endif
 
 TEST_GEN_FILES = test_lwt_ip_encap.bpf.o test_tc_edt.bpf.o
···
 
 # Delete partially updated (corrupted) files on error
 .DELETE_ON_ERROR:
+
+DEFAULT_INSTALL_RULE := $(INSTALL_RULE)
+override define INSTALL_RULE
+	$(DEFAULT_INSTALL_RULE)
+	@for DIR in $(TEST_INST_SUBDIRS); do		  \
+		mkdir -p $(INSTALL_PATH)/$$DIR;		  \
+		rsync -a $(OUTPUT)/$$DIR/*.bpf.o $(INSTALL_PATH)/$$DIR;\
+	done
+endef
+3 -2
tools/testing/selftests/bpf/prog_tests/bpf_obj_pinning.c
···
 #include <linux/unistd.h>
 #include <linux/mount.h>
 #include <sys/syscall.h>
+#include "bpf/libbpf_internal.h"
 
 static inline int sys_fsopen(const char *fsname, unsigned flags)
 {
···
 	ASSERT_OK(err, "obj_pin");
 
 	/* cleanup */
-	if (pin_opts.path_fd >= 0)
+	if (path_kind == PATH_FD_REL && pin_opts.path_fd >= 0)
 		close(pin_opts.path_fd);
 	if (old_cwd[0])
 		ASSERT_OK(chdir(old_cwd), "restore_cwd");
···
 		goto cleanup;
 
 	/* cleanup */
-	if (get_opts.path_fd >= 0)
+	if (path_kind == PATH_FD_REL && get_opts.path_fd >= 0)
 		close(get_opts.path_fd);
 	if (old_cwd[0])
 		ASSERT_OK(chdir(old_cwd), "restore_cwd");
+18 -1
tools/testing/selftests/bpf/prog_tests/d_path.c
···
 #include "test_d_path_check_rdonly_mem.skel.h"
 #include "test_d_path_check_types.skel.h"
 
+/* sys_close_range has not been around for long, so make
+ * sure we can call it on systems with older glibc
+ */
+#ifndef __NR_close_range
+#ifdef __alpha__
+#define __NR_close_range 546
+#else
+#define __NR_close_range 436
+#endif
+#endif
+
 static int duration;
 
 static struct {
···
 	fstat(indicatorfd, &fileStat);
 
 out_close:
-	/* triggers filp_close */
+	/* sys_close no longer triggers filp_close, but we can
+	 * call sys_close_range instead, which still does
+	 */
+#define close(fd) syscall(__NR_close_range, fd, fd, 0)
+
 	close(pipefd[0]);
 	close(pipefd[1]);
 	close(sockfd);
···
 	close(devfd);
 	close(localfd);
 	close(indicatorfd);
+
+#undef close
 	return ret;
 }
+56
tools/testing/selftests/bpf/prog_tests/sk_storage_omem_uncharge.c
···
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Facebook */
+#include <test_progs.h>
+#include <bpf/libbpf.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include "sk_storage_omem_uncharge.skel.h"
+
+void test_sk_storage_omem_uncharge(void)
+{
+	struct sk_storage_omem_uncharge *skel;
+	int sk_fd = -1, map_fd, err, value;
+	socklen_t optlen;
+
+	skel = sk_storage_omem_uncharge__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel open_and_load"))
+		return;
+	map_fd = bpf_map__fd(skel->maps.sk_storage);
+
+	/* A standalone socket not binding to addr:port,
+	 * so a netns is not needed.
+	 */
+	sk_fd = socket(AF_INET6, SOCK_STREAM, 0);
+	if (!ASSERT_GE(sk_fd, 0, "socket"))
+		goto done;
+
+	optlen = sizeof(skel->bss->cookie);
+	err = getsockopt(sk_fd, SOL_SOCKET, SO_COOKIE, &skel->bss->cookie, &optlen);
+	if (!ASSERT_OK(err, "getsockopt(SO_COOKIE)"))
+		goto done;
+
+	value = 0;
+	err = bpf_map_update_elem(map_fd, &sk_fd, &value, 0);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(value=0)"))
+		goto done;
+
+	value = 0xdeadbeef;
+	err = bpf_map_update_elem(map_fd, &sk_fd, &value, 0);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(value=0xdeadbeef)"))
+		goto done;
+
+	err = sk_storage_omem_uncharge__attach(skel);
+	if (!ASSERT_OK(err, "attach"))
+		goto done;
+
+	close(sk_fd);
+	sk_fd = -1;
+
+	ASSERT_EQ(skel->bss->cookie_found, 2, "cookie_found");
+	ASSERT_EQ(skel->bss->omem, 0, "omem");
+
+done:
+	sk_storage_omem_uncharge__destroy(skel);
+	if (sk_fd != -1)
+		close(sk_fd);
+}
+26
tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h
···
 	__ret;								\
 })
 
+static inline int poll_connect(int fd, unsigned int timeout_sec)
+{
+	struct timeval timeout = { .tv_sec = timeout_sec };
+	fd_set wfds;
+	int r, eval;
+	socklen_t esize = sizeof(eval);
+
+	FD_ZERO(&wfds);
+	FD_SET(fd, &wfds);
+
+	r = select(fd + 1, NULL, &wfds, NULL, &timeout);
+	if (r == 0)
+		errno = ETIME;
+	if (r != 1)
+		return -1;
+
+	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &eval, &esize) < 0)
+		return -1;
+	if (eval != 0) {
+		errno = eval;
+		return -1;
+	}
+
+	return 0;
+}
+
 static inline int poll_read(int fd, unsigned int timeout_sec)
 {
 	struct timeval timeout = { .tv_sec = timeout_sec };
tools/testing/selftests/bpf/prog_tests/sockmap_listen.c (+7)
···
  	if (p < 0)
  		goto close_cli;

+	if (poll_connect(c, IO_TIMEOUT_SEC) < 0) {
+		FAIL_ERRNO("poll_connect");
+		goto close_acc;
+	}
+
  	*v0 = p;
  	*v1 = c;

  	return 0;

+ close_acc:
+	close(p);
  close_cli:
  	close(c);
  close_srv:
tools/testing/selftests/bpf/progs/bpf_tracing_net.h (+1)
···
  #define sk_v6_rcv_saddr		__sk_common.skc_v6_rcv_saddr
  #define sk_flags		__sk_common.skc_flags
  #define sk_reuse		__sk_common.skc_reuse
+ #define sk_cookie		__sk_common.skc_cookie

  #define s6_addr32		in6_u.u6_addr32
tools/testing/selftests/bpf/progs/sk_storage_omem_uncharge.c (+61)
···
+ // SPDX-License-Identifier: GPL-2.0
+ /* Copyright (c) 2023 Facebook */
+ #include "vmlinux.h"
+ #include "bpf_tracing_net.h"
+ #include <bpf/bpf_helpers.h>
+ #include <bpf/bpf_tracing.h>
+ #include <bpf/bpf_core_read.h>
+
+ void *local_storage_ptr = NULL;
+ void *sk_ptr = NULL;
+ int cookie_found = 0;
+ __u64 cookie = 0;
+ __u32 omem = 0;
+
+ void *bpf_rdonly_cast(void *, __u32) __ksym;
+
+ struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, int);
+ } sk_storage SEC(".maps");
+
+ SEC("fexit/bpf_local_storage_destroy")
+ int BPF_PROG(bpf_local_storage_destroy, struct bpf_local_storage *local_storage)
+ {
+	struct sock *sk;
+
+	if (local_storage_ptr != local_storage)
+		return 0;
+
+	sk = bpf_rdonly_cast(sk_ptr, bpf_core_type_id_kernel(struct sock));
+	if (sk->sk_cookie.counter != cookie)
+		return 0;
+
+	cookie_found++;
+	omem = sk->sk_omem_alloc.counter;
+	local_storage_ptr = NULL;
+
+	return 0;
+ }
+
+ SEC("fentry/inet6_sock_destruct")
+ int BPF_PROG(inet6_sock_destruct, struct sock *sk)
+ {
+	int *value;
+
+	if (!cookie || sk->sk_cookie.counter != cookie)
+		return 0;
+
+	value = bpf_sk_storage_get(&sk_storage, sk, 0, 0);
+	if (value && *value == 0xdeadbeef) {
+		cookie_found++;
+		sk_ptr = sk;
+		local_storage_ptr = sk->sk_bpf_storage;
+	}
+
+	return 0;
+ }
+
+ char _license[] SEC("license") = "GPL";
tools/testing/selftests/net/fib_tests.sh (+154 -1)
···
  TESTS="unregister down carrier nexthop suppress ipv6_notify ipv4_notify \
         ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics \
         ipv4_route_metrics ipv4_route_v6_gw rp_filter ipv4_del_addr \
-        ipv6_del_addr ipv4_mangle ipv6_mangle ipv4_bcast_neigh fib6_gc_test"
+        ipv6_del_addr ipv4_mangle ipv6_mangle ipv4_bcast_neigh fib6_gc_test \
+        ipv4_mpath_list ipv6_mpath_list"

  VERBOSE=0
  PAUSE_ON_FAIL=no
···
  	cleanup
  }

+ mpath_dep_check()
+ {
+	if [ ! -x "$(command -v mausezahn)" ]; then
+		echo "mausezahn command not found. Skipping test"
+		return 1
+	fi
+
+	if [ ! -x "$(command -v jq)" ]; then
+		echo "jq command not found. Skipping test"
+		return 1
+	fi
+
+	if [ ! -x "$(command -v bc)" ]; then
+		echo "bc command not found. Skipping test"
+		return 1
+	fi
+
+	if [ ! -x "$(command -v perf)" ]; then
+		echo "perf command not found. Skipping test"
+		return 1
+	fi
+
+	perf list fib:* | grep -q fib_table_lookup
+	if [ $? -ne 0 ]; then
+		echo "IPv4 FIB tracepoint not found. Skipping test"
+		return 1
+	fi
+
+	perf list fib6:* | grep -q fib6_table_lookup
+	if [ $? -ne 0 ]; then
+		echo "IPv6 FIB tracepoint not found. Skipping test"
+		return 1
+	fi
+
+	return 0
+ }
+
+ link_stats_get()
+ {
+	local ns=$1; shift
+	local dev=$1; shift
+	local dir=$1; shift
+	local stat=$1; shift
+
+	ip -n $ns -j -s link show dev $dev \
+		| jq '.[]["stats64"]["'$dir'"]["'$stat'"]'
+ }
+
+ list_rcv_eval()
+ {
+	local file=$1; shift
+	local expected=$1; shift
+
+	local count=$(tail -n 1 $file | jq '.["counter-value"] | tonumber | floor')
+	local ratio=$(echo "scale=2; $count / $expected" | bc -l)
+	local res=$(echo "$ratio >= 0.95" | bc)
+	[[ $res -eq 1 ]]
+	log_test $? 0 "Multipath route hit ratio ($ratio)"
+ }
+
+ ipv4_mpath_list_test()
+ {
+	echo
+	echo "IPv4 multipath list receive tests"
+
+	mpath_dep_check || return 1
+
+	route_setup
+
+	set -e
+	run_cmd "ip netns exec ns1 ethtool -K veth1 tcp-segmentation-offload off"
+
+	run_cmd "ip netns exec ns2 bash -c \"echo 20000 > /sys/class/net/veth2/gro_flush_timeout\""
+	run_cmd "ip netns exec ns2 bash -c \"echo 1 > /sys/class/net/veth2/napi_defer_hard_irqs\""
+	run_cmd "ip netns exec ns2 ethtool -K veth2 generic-receive-offload on"
+	run_cmd "ip -n ns2 link add name nh1 up type dummy"
+	run_cmd "ip -n ns2 link add name nh2 up type dummy"
+	run_cmd "ip -n ns2 address add 172.16.201.1/24 dev nh1"
+	run_cmd "ip -n ns2 address add 172.16.202.1/24 dev nh2"
+	run_cmd "ip -n ns2 neigh add 172.16.201.2 lladdr 00:11:22:33:44:55 nud perm dev nh1"
+	run_cmd "ip -n ns2 neigh add 172.16.202.2 lladdr 00:aa:bb:cc:dd:ee nud perm dev nh2"
+	run_cmd "ip -n ns2 route add 203.0.113.0/24
+		nexthop via 172.16.201.2 nexthop via 172.16.202.2"
+	run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.fib_multipath_hash_policy=1"
+	set +e
+
+	local dmac=$(ip -n ns2 -j link show dev veth2 | jq -r '.[]["address"]')
+	local tmp_file=$(mktemp)
+	local cmd="ip netns exec ns1 mausezahn veth1 -a own -b $dmac
+		-A 172.16.101.1 -B 203.0.113.1 -t udp 'sp=12345,dp=0-65535' -q"
+
+	# Packets forwarded in a list using a multipath route must not reuse a
+	# cached result so that a flow always hits the same nexthop. In other
+	# words, the FIB lookup tracepoint needs to be triggered for every
+	# packet.
+	local t0_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
+	run_cmd "perf stat -e fib:fib_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
+	local t1_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
+	local diff=$(echo $t1_rx_pkts - $t0_rx_pkts | bc -l)
+	list_rcv_eval $tmp_file $diff
+
+	rm $tmp_file
+	route_cleanup
+ }
+
+ ipv6_mpath_list_test()
+ {
+	echo
+	echo "IPv6 multipath list receive tests"
+
+	mpath_dep_check || return 1
+
+	route_setup
+
+	set -e
+	run_cmd "ip netns exec ns1 ethtool -K veth1 tcp-segmentation-offload off"
+
+	run_cmd "ip netns exec ns2 bash -c \"echo 20000 > /sys/class/net/veth2/gro_flush_timeout\""
+	run_cmd "ip netns exec ns2 bash -c \"echo 1 > /sys/class/net/veth2/napi_defer_hard_irqs\""
+	run_cmd "ip netns exec ns2 ethtool -K veth2 generic-receive-offload on"
+	run_cmd "ip -n ns2 link add name nh1 up type dummy"
+	run_cmd "ip -n ns2 link add name nh2 up type dummy"
+	run_cmd "ip -n ns2 -6 address add 2001:db8:201::1/64 dev nh1"
+	run_cmd "ip -n ns2 -6 address add 2001:db8:202::1/64 dev nh2"
+	run_cmd "ip -n ns2 -6 neigh add 2001:db8:201::2 lladdr 00:11:22:33:44:55 nud perm dev nh1"
+	run_cmd "ip -n ns2 -6 neigh add 2001:db8:202::2 lladdr 00:aa:bb:cc:dd:ee nud perm dev nh2"
+	run_cmd "ip -n ns2 -6 route add 2001:db8:301::/64
+		nexthop via 2001:db8:201::2 nexthop via 2001:db8:202::2"
+	run_cmd "ip netns exec ns2 sysctl -qw net.ipv6.fib_multipath_hash_policy=1"
+	set +e
+
+	local dmac=$(ip -n ns2 -j link show dev veth2 | jq -r '.[]["address"]')
+	local tmp_file=$(mktemp)
+	local cmd="ip netns exec ns1 mausezahn -6 veth1 -a own -b $dmac
+		-A 2001:db8:101::1 -B 2001:db8:301::1 -t udp 'sp=12345,dp=0-65535' -q"
+
+	# Packets forwarded in a list using a multipath route must not reuse a
+	# cached result so that a flow always hits the same nexthop. In other
+	# words, the FIB lookup tracepoint needs to be triggered for every
+	# packet.
+	local t0_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
+	run_cmd "perf stat -e fib6:fib6_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
+	local t1_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
+	local diff=$(echo $t1_rx_pkts - $t0_rx_pkts | bc -l)
+	list_rcv_eval $tmp_file $diff
+
+	rm $tmp_file
+	route_cleanup
+ }
+
  ################################################################################
  # usage
···
  	ipv6_mangle)			ipv6_mangle_test;;
  	ipv4_bcast_neigh)		ipv4_bcast_neigh_test;;
  	fib6_gc_test|ipv6_gc)		fib6_gc_test;;
+	ipv4_mpath_list)		ipv4_mpath_list_test;;
+	ipv6_mpath_list)		ipv6_mpath_list_test;;

  	help) echo "Test names: $TESTS"; exit 0;;
  esac