
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-09-01

The following pull-request contains BPF updates for your *net-next* tree.

There are two small conflicts when pulling; resolve them as follows:

1) Merge conflict in tools/lib/bpf/libbpf.c between 88a82120282b ("libbpf: Factor
out common ELF operations and improve logging") in bpf-next and 1e891e513e16
("libbpf: Fix map index used in error message") in net-next. Resolve by taking
the hunk in bpf-next:

[...]
  scn = elf_sec_by_idx(obj, obj->efile.btf_maps_shndx);
  data = elf_sec_data(obj, scn);
  if (!scn || !data) {
          pr_warn("elf: failed to get %s map definitions for %s\n",
                  MAPS_ELF_SEC, obj->path);
          return -EINVAL;
  }
[...]

2) Merge conflict in drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c between
9647c57b11e5 ("xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for
better performance") in bpf-next and e20f0dbf204f ("net/mlx5e: RX, Add a prefetch
command for small L1_CACHE_BYTES") in net-next. Resolve the two locations by retaining
net_prefetch() and taking xsk_buff_dma_sync_for_cpu() from bpf-next. Should look like:

[...]
  xdp_set_data_meta_invalid(xdp);
  xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
  net_prefetch(xdp->data);
[...]

We've added 133 non-merge commits during the last 14 day(s) which contain
a total of 246 files changed, 13832 insertions(+), 3105 deletions(-).

The main changes are:

1) Initial support for sleepable BPF programs along with bpf_copy_from_user() helper
for tracing to reliably access user memory, from Alexei Starovoitov.

2) Add BPF infra for writing and parsing TCP header options, from Martin KaFai Lau.

3) bpf_d_path() helper for returning full path for given 'struct path', from Jiri Olsa.

4) AF_XDP support for shared umems between devices and queues, from Magnus Karlsson.

5) Initial prep work for full BPF-to-BPF call support in libbpf, from Andrii Nakryiko.

6) Generalize bpf_sk_storage map & add local storage for inodes, from KP Singh.

7) Implement sockmap/hash updates from BPF context, from Lorenz Bauer.

8) BPF xor verification for scalar types & add BPF link iterator, from Yonghong Song.

9) Use target's prog type for BPF_PROG_TYPE_EXT prog verification, from Udip Pant.

10) Rework BPF tracing samples to use libbpf loader, from Daniel T. Lee.

11) Fix xdpsock sample to really cycle through all buffers, from Weqaar Janjua.

12) Improve type safety for tun/veth XDP frame handling, from Maciej Żenczykowski.

13) Various smaller cleanups and improvements all over the place.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+13815 -3088
+12 -7
Documentation/bpf/bpf_devel_QA.rst
···
 149  149   again in a second or later revision, it is also required to add a
 150  150   version number (``v2``, ``v3``, ...) into the subject prefix::
 151  151
 152       -     git format-patch --subject-prefix='PATCH net-next v2' start..finish
      152  +     git format-patch --subject-prefix='PATCH bpf-next v2' start..finish
 153  153
 154  154   When changes have been requested to the patch series, always send the
 155  155   whole patch series again with the feedback incorporated (never send
···
 479  479
 480  480       $ llc --version
 481  481       LLVM (http://llvm.org/):
 482       -       LLVM version 6.0.0svn
      482  +       LLVM version 10.0.0
 483  483       Optimized build.
 484  484       Default target: x86_64-unknown-linux-gnu
 485  485       Host CPU: skylake
 486  486
 487  487       Registered Targets:
 488       -         bpf    - BPF (host endian)
 489       -         bpfeb  - BPF (big endian)
 490       -         bpfel  - BPF (little endian)
 491       -         x86    - 32-bit X86: Pentium-Pro and above
 492       -         x86-64 - 64-bit X86: EM64T and AMD64
      488  +         aarch64 - AArch64 (little endian)
      489  +         bpf     - BPF (host endian)
      490  +         bpfeb   - BPF (big endian)
      491  +         bpfel   - BPF (little endian)
      492  +         x86     - 32-bit X86: Pentium-Pro and above
      493  +         x86-64  - 64-bit X86: EM64T and AMD64
 493  494
 494  495   For developers in order to utilize the latest features added to LLVM's
 495  496   BPF back end, it is advisable to run the latest LLVM releases. Support
···
 517  516
 518  517   The built binaries can then be found in the build/bin/ directory, where
 519  518   you can point the PATH variable to.
      519  +
      520  + Set ``-DLLVM_TARGETS_TO_BUILD`` equal to the target you wish to build, you
      521  + will find a full list of targets within the llvm-project/llvm/lib/Target
      522  + directory.
 520  523
 521  524   Q: Reporting LLVM BPF issues
 522  525   ----------------------------
+25
Documentation/bpf/btf.rst
···
 724  724       BTF_ID_UNUSED
 725  725       BTF_ID(struct, task_struct)
 726  726
      727  + The ``BTF_SET_START/END`` macros pair defines sorted list of BTF ID values
      728  + and their count, with following syntax::
      729  +
      730  +   BTF_SET_START(set)
      731  +   BTF_ID(type1, name1)
      732  +   BTF_ID(type2, name2)
      733  +   BTF_SET_END(set)
      734  +
      735  + resulting in following layout in .BTF_ids section::
      736  +
      737  +   __BTF_ID__set__set:
      738  +   .zero 4
      739  +   __BTF_ID__type1__name1__3:
      740  +   .zero 4
      741  +   __BTF_ID__type2__name2__4:
      742  +   .zero 4
      743  +
      744  + The ``struct btf_id_set set;`` variable is defined to access the list.
      745  +
      746  + The ``typeX`` name can be one of following::
      747  +
      748  +    struct, union, typedef, func
      749  +
      750  + and is used as a filter when resolving the BTF ID value.
      751  +
 727  752   All the BTF ID lists and sets are compiled in the .BTF_ids section and
 728  753   resolved during the linking phase of kernel build by ``resolve_btfids`` tool.
 729  754
+1
Documentation/bpf/index.rst
···
 52  52      prog_cgroup_sysctl
 53  53      prog_flow_dissector
 54  54      bpf_lsm
     55  +   prog_sk_lookup
 55  56
 56  57
 57  58   Map types
+98
Documentation/bpf/prog_sk_lookup.rst
···
      1  + .. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
      2  +
      3  + =====================
      4  + BPF sk_lookup program
      5  + =====================
      6  +
      7  + BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces programmability
      8  + into the socket lookup performed by the transport layer when a packet is to be
      9  + delivered locally.
     10  +
     11  + When invoked BPF sk_lookup program can select a socket that will receive the
     12  + incoming packet by calling the ``bpf_sk_assign()`` BPF helper function.
     13  +
     14  + Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.
     15  +
     16  + Motivation
     17  + ==========
     18  +
     19  + BPF sk_lookup program type was introduced to address setup scenarios where
     20  + binding sockets to an address with ``bind()`` socket call is impractical, such
     21  + as:
     22  +
     23  + 1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
     24  +    binding to a wildcard address ``INADRR_ANY`` is not possible due to a port
     25  +    conflict,
     26  + 2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use
     27  +    case.
     28  +
     29  + Such setups would require creating and ``bind()``'ing one socket to each of the
     30  + IP address/port in the range, leading to resource consumption and potential
     31  + latency spikes during socket lookup.
     32  +
     33  + Attachment
     34  + ==========
     35  +
     36  + BPF sk_lookup program can be attached to a network namespace with
     37  + ``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type and a
     38  + netns FD as attachment ``target_fd``.
     39  +
     40  + Multiple programs can be attached to one network namespace. Programs will be
     41  + invoked in the same order as they were attached.
     42  +
     43  + Hooks
     44  + =====
     45  +
     46  + The attached BPF sk_lookup programs run whenever the transport layer needs to
     47  + find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.
     48  +
     49  + Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
     50  + as usual without triggering the BPF sk_lookup hook.
     51  +
     52  + The attached BPF programs must return with either ``SK_PASS`` or ``SK_DROP``
     53  + verdict code. As for other BPF program types that are network filters,
     54  + ``SK_PASS`` signifies that the socket lookup should continue on to regular
     55  + hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the
     56  + packet.
     57  +
     58  + A BPF sk_lookup program can also select a socket to receive the packet by
     59  + calling ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a socket
     60  + in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes a
     61  + ``struct bpf_sock *`` to ``bpf_sk_assign()`` helper to record the
     62  + selection. Selecting a socket only takes effect if the program has terminated
     63  + with ``SK_PASS`` code.
     64  +
     65  + When multiple programs are attached, the end result is determined from return
     66  + codes of all the programs according to the following rules:
     67  +
     68  + 1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
     69  +    is used as the result of the socket lookup.
     70  + 2. If more than one program returned ``SK_PASS`` and selected a socket, the last
     71  +    selection takes effect.
     72  + 3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
     73  +    selected a socket, socket lookup fails.
     74  + 4. If all programs returned ``SK_PASS`` and none of them selected a socket,
     75  +    socket lookup continues on.
     76  +
     77  + API
     78  + ===
     79  +
     80  + In its context, an instance of ``struct bpf_sk_lookup``, BPF sk_lookup program
     81  + receives information about the packet that triggered the socket lookup. Namely:
     82  +
     83  + * IP version (``AF_INET`` or ``AF_INET6``),
     84  + * L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
     85  + * source and destination IP address,
     86  + * source and destination L4 port,
     87  + * the socket that has been selected with ``bpf_sk_assign()``.
     88  +
     89  + Refer to ``struct bpf_sk_lookup`` declaration in ``linux/bpf.h`` user API
     90  + header, and `bpf-helpers(7)
     91  + <https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section
     92  + for ``bpf_sk_assign()`` for details.
     93  +
     94  + Example
     95  + =======
     96  +
     97  + See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
     98  + implementation.
+58 -10
Documentation/networking/af_xdp.rst
···
 258  258   XDP_SHARED_UMEM bind flag
 259  259   -------------------------
 260  260
 261       - This flag enables you to bind multiple sockets to the same UMEM, but
 262       - only if they share the same queue id. In this mode, each socket has
 263       - their own RX and TX rings, but the UMEM (tied to the fist socket
 264       - created) only has a single FILL ring and a single COMPLETION
 265       - ring. To use this mode, create the first socket and bind it in the normal
 266       - way. Create a second socket and create an RX and a TX ring, or at
 267       - least one of them, but no FILL or COMPLETION rings as the ones from
 268       - the first socket will be used. In the bind call, set he
      261  + This flag enables you to bind multiple sockets to the same UMEM. It
      262  + works on the same queue id, between queue ids and between
      263  + netdevs/devices. In this mode, each socket has their own RX and TX
      264  + rings as usual, but you are going to have one or more FILL and
      265  + COMPLETION ring pairs. You have to create one of these pairs per
      266  + unique netdev and queue id tuple that you bind to.
      267  +
      268  + Starting with the case were we would like to share a UMEM between
      269  + sockets bound to the same netdev and queue id. The UMEM (tied to the
      270  + fist socket created) will only have a single FILL ring and a single
      271  + COMPLETION ring as there is only on unique netdev,queue_id tuple that
      272  + we have bound to. To use this mode, create the first socket and bind
      273  + it in the normal way. Create a second socket and create an RX and a TX
      274  + ring, or at least one of them, but no FILL or COMPLETION rings as the
      275  + ones from the first socket will be used. In the bind call, set he
 269  276   XDP_SHARED_UMEM option and provide the initial socket's fd in the
 270  277   sxdp_shared_umem_fd field. You can attach an arbitrary number of extra
 271  278   sockets this way.
···
 312  305   libbpf code that protects multiple users at this point in time.
 313  306
 314  307   Libbpf uses this mode if you create more than one socket tied to the
 315       - same umem. However, note that you need to supply the
      308  + same UMEM. However, note that you need to supply the
 316  309   XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf_flag with the
 317  310   xsk_socket__create calls and load your own XDP program as there is no
 318  311   built in one in libbpf that will route the traffic for you.
      312  +
      313  + The second case is when you share a UMEM between sockets that are
      314  + bound to different queue ids and/or netdevs. In this case you have to
      315  + create one FILL ring and one COMPLETION ring for each unique
      316  + netdev,queue_id pair. Let us say you want to create two sockets bound
      317  + to two different queue ids on the same netdev. Create the first socket
      318  + and bind it in the normal way. Create a second socket and create an RX
      319  + and a TX ring, or at least one of them, and then one FILL and
      320  + COMPLETION ring for this socket. Then in the bind call, set he
      321  + XDP_SHARED_UMEM option and provide the initial socket's fd in the
      322  + sxdp_shared_umem_fd field as you registered the UMEM on that
      323  + socket. These two sockets will now share one and the same UMEM.
      324  +
      325  + There is no need to supply an XDP program like the one in the previous
      326  + case where sockets were bound to the same queue id and
      327  + device. Instead, use the NIC's packet steering capabilities to steer
      328  + the packets to the right queue. In the previous example, there is only
      329  + one queue shared among sockets, so the NIC cannot do this steering. It
      330  + can only steer between queues.
      331  +
      332  + In libbpf, you need to use the xsk_socket__create_shared() API as it
      333  + takes a reference to a FILL ring and a COMPLETION ring that will be
      334  + created for you and bound to the shared UMEM. You can use this
      335  + function for all the sockets you create, or you can use it for the
      336  + second and following ones and use xsk_socket__create() for the first
      337  + one. Both methods yield the same result.
      338  +
      339  + Note that a UMEM can be shared between sockets on the same queue id
      340  + and device, as well as between queues on the same device and between
      341  + devices at the same time.
 319  342
 320  343   XDP_USE_NEED_WAKEUP bind flag
 321  344   -----------------------------
···
 401  364   COMPLETION ring are mandatory as you need to have a UMEM tied to your
 402  365   socket. But if the XDP_SHARED_UMEM flag is used, any socket after the
 403  366   first one does not have a UMEM and should in that case not have any
 404       - FILL or COMPLETION rings created as the ones from the shared umem will
      367  + FILL or COMPLETION rings created as the ones from the shared UMEM will
 405  368   be used. Note, that the rings are single-producer single-consumer, so
 406  369   do not try to access them from multiple processes at the same
 407  370   time. See the XDP_SHARED_UMEM section.
···
 603  566      to the same queue id Y. In zero-copy mode, you should use the
 604  567      switch, or other distribution mechanism, in your NIC to direct
 605  568      traffic to the correct queue id and socket.
      569  +
      570  + Q: My packets are sometimes corrupted. What is wrong?
      571  +
      572  + A: Care has to be taken not to feed the same buffer in the UMEM into
      573  +    more than one ring at the same time. If you for example feed the
      574  +    same buffer into the FILL ring and the TX ring at the same time, the
      575  +    NIC might receive data into the buffer at the same time it is
      576  +    sending it. This will cause some packets to become corrupted. Same
      577  +    thing goes for feeding the same buffer into the FILL rings
      578  +    belonging to different queue ids or netdevs bound with the
      579  +    XDP_SHARED_UMEM flag.
 606  580
 607  581   Credits
 608  582   =======
+21 -11
arch/x86/net/bpf_jit_comp.c
···
 1379  1379   	u8 *prog = *pprog;
 1380  1380   	int cnt = 0;
 1381  1381
 1382        - 	if (emit_call(&prog, __bpf_prog_enter, prog))
 1383        - 		return -EINVAL;
 1384        - 	/* remember prog start time returned by __bpf_prog_enter */
 1385        - 	emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
       1382  + 	if (p->aux->sleepable) {
       1383  + 		if (emit_call(&prog, __bpf_prog_enter_sleepable, prog))
       1384  + 			return -EINVAL;
       1385  + 	} else {
       1386  + 		if (emit_call(&prog, __bpf_prog_enter, prog))
       1387  + 			return -EINVAL;
       1388  + 		/* remember prog start time returned by __bpf_prog_enter */
       1389  + 		emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
       1390  + 	}
 1386  1391
 1387  1392   	/* arg1: lea rdi, [rbp - stack_size] */
 1388  1393   	EMIT4(0x48, 0x8D, 0x7D, -stack_size);
···
 1407  1402   	if (mod_ret)
 1408  1403   		emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
 1409  1404
 1410        - 	/* arg1: mov rdi, progs[i] */
 1411        - 	emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32,
 1412        - 		       (u32) (long) p);
 1413        - 	/* arg2: mov rsi, rbx <- start time in nsec */
 1414        - 	emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
 1415        - 	if (emit_call(&prog, __bpf_prog_exit, prog))
 1416        - 		return -EINVAL;
       1405  + 	if (p->aux->sleepable) {
       1406  + 		if (emit_call(&prog, __bpf_prog_exit_sleepable, prog))
       1407  + 			return -EINVAL;
       1408  + 	} else {
       1409  + 		/* arg1: mov rdi, progs[i] */
       1410  + 		emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32,
       1411  + 			       (u32) (long) p);
       1412  + 		/* arg2: mov rsi, rbx <- start time in nsec */
       1413  + 		emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
       1414  + 		if (emit_call(&prog, __bpf_prog_exit, prog))
       1415  + 			return -EINVAL;
       1416  + 	}
 1417  1417
 1418  1418   	*pprog = prog;
 1419  1419   	return 0;
+1 -1
drivers/net/ethernet/intel/i40e/i40e_ethtool.c
···
 1967  1967   	    (new_rx_count == vsi->rx_rings[0]->count))
 1968  1968   		return 0;
 1969  1969
 1970        - 	/* If there is a AF_XDP UMEM attached to any of Rx rings,
       1970  + 	/* If there is a AF_XDP page pool attached to any of Rx rings,
 1971  1971   	 * disallow changing the number of descriptors -- regardless
 1972  1972   	 * if the netdev is running or not.
 1973  1973   	 */
+15 -14
drivers/net/ethernet/intel/i40e/i40e_main.c
···
 3122  3122   }
 3123  3123
 3124  3124   /**
 3125        -  * i40e_xsk_umem - Retrieve the AF_XDP ZC if XDP and ZC is enabled
       3125  +  * i40e_xsk_pool - Retrieve the AF_XDP buffer pool if XDP and ZC is enabled
 3126  3126    * @ring: The Tx or Rx ring
 3127  3127    *
 3128        -  * Returns the UMEM or NULL.
       3128  +  * Returns the AF_XDP buffer pool or NULL.
 3129  3129    **/
 3130        - static struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring)
       3130  + static struct xsk_buff_pool *i40e_xsk_pool(struct i40e_ring *ring)
 3131  3131   {
 3132  3132   	bool xdp_on = i40e_enabled_xdp_vsi(ring->vsi);
 3133  3133   	int qid = ring->queue_index;
···
 3138  3138   	if (!xdp_on || !test_bit(qid, ring->vsi->af_xdp_zc_qps))
 3139  3139   		return NULL;
 3140  3140
 3141        - 	return xdp_get_umem_from_qid(ring->vsi->netdev, qid);
       3141  + 	return xsk_get_pool_from_qid(ring->vsi->netdev, qid);
 3142  3142   }
 3143  3143
 3144  3144   /**
···
 3157  3157   	u32 qtx_ctl = 0;
 3158  3158
 3159  3159   	if (ring_is_xdp(ring))
 3160        - 		ring->xsk_umem = i40e_xsk_umem(ring);
       3160  + 		ring->xsk_pool = i40e_xsk_pool(ring);
 3161  3161
 3162  3162   	/* some ATR related tx ring init */
 3163  3163   	if (vsi->back->flags & I40E_FLAG_FD_ATR_ENABLED) {
···
 3280  3280   		xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
 3281  3281
 3282  3282   		kfree(ring->rx_bi);
 3283        - 		ring->xsk_umem = i40e_xsk_umem(ring);
 3284        - 		if (ring->xsk_umem) {
       3283  + 		ring->xsk_pool = i40e_xsk_pool(ring);
       3284  + 		if (ring->xsk_pool) {
 3285  3285   			ret = i40e_alloc_rx_bi_zc(ring);
 3286  3286   			if (ret)
 3287  3287   				return ret;
 3288        - 			ring->rx_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem);
       3288  + 			ring->rx_buf_len =
       3289  + 				xsk_pool_get_rx_frame_size(ring->xsk_pool);
 3289  3290   			/* For AF_XDP ZC, we disallow packets to span on
 3290  3291   			 * multiple buffers, thus letting us skip that
 3291  3292   			 * handling in the fast-path.
···
 3369  3368   	ring->tail = hw->hw_addr + I40E_QRX_TAIL(pf_q);
 3370  3369   	writel(0, ring->tail);
 3371  3370
 3372        - 	if (ring->xsk_umem) {
 3373        - 		xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq);
       3371  + 	if (ring->xsk_pool) {
       3372  + 		xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
 3374  3373   		ok = i40e_alloc_rx_buffers_zc(ring, I40E_DESC_UNUSED(ring));
 3375  3374   	} else {
 3376  3375   		ok = !i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring));
···
 3381  3380   		 */
 3382  3381   		dev_info(&vsi->back->pdev->dev,
 3383  3382   			 "Failed to allocate some buffers on %sRx ring %d (pf_q %d)\n",
 3384        - 			 ring->xsk_umem ? "UMEM enabled " : "",
       3383  + 			 ring->xsk_pool ? "AF_XDP ZC enabled " : "",
 3385  3384   			 ring->queue_index, pf_q);
 3386  3385   	}
 3387  3386
···
 12645  12644   	 */
 12646  12645   	if (need_reset && prog)
 12647  12646   		for (i = 0; i < vsi->num_queue_pairs; i++)
 12648         - 			if (vsi->xdp_rings[i]->xsk_umem)
        12647  + 			if (vsi->xdp_rings[i]->xsk_pool)
 12649  12648   				(void)i40e_xsk_wakeup(vsi->netdev, i,
 12650  12649   						      XDP_WAKEUP_RX);
 12651  12650
···
 12924  12923   	switch (xdp->command) {
 12925  12924   	case XDP_SETUP_PROG:
 12926  12925   		return i40e_xdp_setup(vsi, xdp->prog);
 12927         - 	case XDP_SETUP_XSK_UMEM:
 12928         - 		return i40e_xsk_umem_setup(vsi, xdp->xsk.umem,
        12926  + 	case XDP_SETUP_XSK_POOL:
        12927  + 		return i40e_xsk_pool_setup(vsi, xdp->xsk.pool,
 12929  12928   					   xdp->xsk.queue_id);
 12930  12929   	default:
 12931  12930   		return -EINVAL;
+5 -5
drivers/net/ethernet/intel/i40e/i40e_txrx.c
···
 636  636   	unsigned long bi_size;
 637  637   	u16 i;
 638  638
 639       - 	if (ring_is_xdp(tx_ring) && tx_ring->xsk_umem) {
      639  + 	if (ring_is_xdp(tx_ring) && tx_ring->xsk_pool) {
 640  640   		i40e_xsk_clean_tx_ring(tx_ring);
 641  641   	} else {
 642  642   		/* ring already cleared, nothing to do */
···
 1335  1335   		rx_ring->skb = NULL;
 1336  1336   	}
 1337  1337
 1338        - 	if (rx_ring->xsk_umem) {
       1338  + 	if (rx_ring->xsk_pool) {
 1339  1339   		i40e_xsk_clean_rx_ring(rx_ring);
 1340  1340   		goto skip_free;
 1341  1341   	}
···
 1369  1369   	}
 1370  1370
 1371  1371   skip_free:
 1372        - 	if (rx_ring->xsk_umem)
       1372  + 	if (rx_ring->xsk_pool)
 1373  1373   		i40e_clear_rx_bi_zc(rx_ring);
 1374  1374   	else
 1375  1375   		i40e_clear_rx_bi(rx_ring);
···
 2575  2575   	 * budget and be more aggressive about cleaning up the Tx descriptors.
 2576  2576   	 */
 2577  2577   	i40e_for_each_ring(ring, q_vector->tx) {
 2578        - 		bool wd = ring->xsk_umem ?
       2578  + 		bool wd = ring->xsk_pool ?
 2579  2579   			  i40e_clean_xdp_tx_irq(vsi, ring) :
 2580  2580   			  i40e_clean_tx_irq(vsi, ring, budget);
 2581  2581
···
 2603  2603   		budget_per_ring = budget;
 2604  2604
 2605  2605   	i40e_for_each_ring(ring, q_vector->rx) {
 2606        - 		int cleaned = ring->xsk_umem ?
       2606  + 		int cleaned = ring->xsk_pool ?
 2607  2607   			      i40e_clean_rx_irq_zc(ring, budget_per_ring) :
 2608  2608   			      i40e_clean_rx_irq(ring, budget_per_ring);
 2609  2609
+1 -1
drivers/net/ethernet/intel/i40e/i40e_txrx.h
···
 388  388
 389  389   	struct i40e_channel *ch;
 390  390   	struct xdp_rxq_info xdp_rxq;
 391       - 	struct xdp_umem *xsk_umem;
      391  + 	struct xsk_buff_pool *xsk_pool;
 392  392   } ____cacheline_internodealigned_in_smp;
 393  393
 394  394   static inline bool ring_uses_build_skb(struct i40e_ring *ring)
+43 -38
drivers/net/ethernet/intel/i40e/i40e_xsk.c
···
 29  29   }
 30  30
 31  31   /**
 32      -  * i40e_xsk_umem_enable - Enable/associate a UMEM to a certain ring/qid
     32  +  * i40e_xsk_pool_enable - Enable/associate an AF_XDP buffer pool to a
     33  +  * certain ring/qid
 33  34    * @vsi: Current VSI
 34      -  * @umem: UMEM
 35      -  * @qid: Rx ring to associate UMEM to
     35  +  * @pool: buffer pool
     36  +  * @qid: Rx ring to associate buffer pool with
 36  37    *
 37  38    * Returns 0 on success, <0 on failure
 38  39    **/
 39      - static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
     40  + static int i40e_xsk_pool_enable(struct i40e_vsi *vsi,
     41  + 				struct xsk_buff_pool *pool,
 40  42   				u16 qid)
 41  43   {
 42  44   	struct net_device *netdev = vsi->netdev;
···
 55  53   	    qid >= netdev->real_num_tx_queues)
 56  54   		return -EINVAL;
 57  55
 58      - 	err = xsk_buff_dma_map(umem, &vsi->back->pdev->dev, I40E_RX_DMA_ATTR);
     56  + 	err = xsk_pool_dma_map(pool, &vsi->back->pdev->dev, I40E_RX_DMA_ATTR);
 59  57   	if (err)
 60  58   		return err;
 61  59
···
 82  80   }
 83  81
 84  82   /**
 85      -  * i40e_xsk_umem_disable - Disassociate a UMEM from a certain ring/qid
     83  +  * i40e_xsk_pool_disable - Disassociate an AF_XDP buffer pool from a
     84  +  * certain ring/qid
 86  85    * @vsi: Current VSI
 87      -  * @qid: Rx ring to associate UMEM to
     86  +  * @qid: Rx ring to associate buffer pool with
 88  87    *
 89  88    * Returns 0 on success, <0 on failure
 90  89    **/
 91      - static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
     90  + static int i40e_xsk_pool_disable(struct i40e_vsi *vsi, u16 qid)
 92  91   {
 93  92   	struct net_device *netdev = vsi->netdev;
 94      - 	struct xdp_umem *umem;
     93  + 	struct xsk_buff_pool *pool;
 95  94   	bool if_running;
 96  95   	int err;
 97  96
 98      - 	umem = xdp_get_umem_from_qid(netdev, qid);
 99      - 	if (!umem)
     97  + 	pool = xsk_get_pool_from_qid(netdev, qid);
     98  + 	if (!pool)
 100  99   		return -EINVAL;
 101  100
 102  101   	if_running = netif_running(vsi->netdev) && i40e_enabled_xdp_vsi(vsi);
···
 109  106   	}
 110  107
 111  108   	clear_bit(qid, vsi->af_xdp_zc_qps);
 112       - 	xsk_buff_dma_unmap(umem, I40E_RX_DMA_ATTR);
      109  + 	xsk_pool_dma_unmap(pool, I40E_RX_DMA_ATTR);
 113  110
 114  111   	if (if_running) {
 115  112   		err = i40e_queue_pair_enable(vsi, qid);
···
 121  118   }
 122  119
 123  120   /**
 124       -  * i40e_xsk_umem_setup - Enable/disassociate a UMEM to/from a ring/qid
      121  +  * i40e_xsk_pool_setup - Enable/disassociate an AF_XDP buffer pool to/from
      122  +  * a ring/qid
 125  123    * @vsi: Current VSI
 126       -  * @umem: UMEM to enable/associate to a ring, or NULL to disable
 127       -  * @qid: Rx ring to (dis)associate UMEM (from)to
      124  +  * @pool: Buffer pool to enable/associate to a ring, or NULL to disable
      125  +  * @qid: Rx ring to (dis)associate buffer pool (from)to
 128  126    *
 129       -  * This function enables or disables a UMEM to a certain ring.
      127  +  * This function enables or disables a buffer pool to a certain ring.
 130  128    *
 131  129    * Returns 0 on success, <0 on failure
 132  130    **/
 133       - int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
      131  + int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
 134  132   			u16 qid)
 135  133   {
 136       - 	return umem ? i40e_xsk_umem_enable(vsi, umem, qid) :
 137       - 		      i40e_xsk_umem_disable(vsi, qid);
      134  + 	return pool ? i40e_xsk_pool_enable(vsi, pool, qid) :
      135  + 		      i40e_xsk_pool_disable(vsi, qid);
 138  136   }
 139  137
 140  138   /**
···
 195  191   	rx_desc = I40E_RX_DESC(rx_ring, ntu);
 196  192   	bi = i40e_rx_bi(rx_ring, ntu);
 197  193   	do {
 198       - 		xdp = xsk_buff_alloc(rx_ring->xsk_umem);
      194  + 		xdp = xsk_buff_alloc(rx_ring->xsk_pool);
 199  195   		if (!xdp) {
 200  196   			ok = false;
 201  197   			goto no_buffers;
···
 314  310
 315  311   	bi = i40e_rx_bi(rx_ring, rx_ring->next_to_clean);
 316  312   	(*bi)->data_end = (*bi)->data + size;
 317       - 	xsk_buff_dma_sync_for_cpu(*bi);
      313  + 	xsk_buff_dma_sync_for_cpu(*bi, rx_ring->xsk_pool);
 318  314
 319  315   	xdp_res = i40e_run_xdp_zc(rx_ring, *bi);
 320  316   	if (xdp_res) {
···
 362  358   	i40e_finalize_xdp_rx(rx_ring, xdp_xmit);
 363  359   	i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
 364  360
 365       - 	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
      361  + 	if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
 366  362   		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
 367       - 			xsk_set_rx_need_wakeup(rx_ring->xsk_umem);
      363  + 			xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
 368  364   		else
 369       - 			xsk_clear_rx_need_wakeup(rx_ring->xsk_umem);
      365  + 			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);
 370  366
 371  367   		return (int)total_rx_packets;
 372  368   	}
···
 389  385   	dma_addr_t dma;
 390  386
 391  387   	while (budget-- > 0) {
 392       - 		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
      388  + 		if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc))
 393  389   			break;
 394  390
 395       - 		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr);
 396       - 		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma,
      391  + 		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr);
      392  + 		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma,
 397  393   						 desc.len);
 398  394
 399  395   		tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];
···
 420  416   			     I40E_TXD_QW1_CMD_SHIFT);
 421  417   		i40e_xdp_ring_update_tail(xdp_ring);
 422  418
 423       - 		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
      419  + 		xsk_tx_release(xdp_ring->xsk_pool);
 424  420   		i40e_update_tx_stats(xdp_ring, sent_frames, total_bytes);
 425  421   	}
 426  422
···
 452  448    **/
 453  449   bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
 454  450   {
 455       - 	struct xdp_umem *umem = tx_ring->xsk_umem;
      451  + 	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
 456  452   	u32 i, completed_frames, xsk_frames = 0;
 457  453   	u32 head_idx = i40e_get_head(tx_ring);
 458  454   	struct i40e_tx_buffer *tx_bi;
···
 492  488   		tx_ring->next_to_clean -= tx_ring->count;
 493  489
 494  490   	if (xsk_frames)
 495       - 		xsk_umem_complete_tx(umem, xsk_frames);
      491  + 		xsk_tx_completed(bp, xsk_frames);
 496  492
 497  493   	i40e_arm_wb(tx_ring, vsi, completed_frames);
 498  494
 499  495   out_xmit:
 500       - 	if (xsk_umem_uses_need_wakeup(tx_ring->xsk_umem))
 501       - 		xsk_set_tx_need_wakeup(tx_ring->xsk_umem);
      496  + 	if (xsk_uses_need_wakeup(tx_ring->xsk_pool))
      497  + 		xsk_set_tx_need_wakeup(tx_ring->xsk_pool);
 502  498
 503  499   	return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring));
 504  500   }
···
 530  526   	if (queue_id >= vsi->num_queue_pairs)
 531  527   		return -ENXIO;
 532  528
 533       - 	if (!vsi->xdp_rings[queue_id]->xsk_umem)
      529  + 	if (!vsi->xdp_rings[queue_id]->xsk_pool)
 534  530   		return -ENXIO;
 535  531
 536  532   	ring = vsi->xdp_rings[queue_id];
···
 569  565   void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
 570  566   {
 571  567   	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
 572       - 	struct xdp_umem *umem = tx_ring->xsk_umem;
      568  + 	struct xsk_buff_pool *bp = tx_ring->xsk_pool;
 573  569   	struct i40e_tx_buffer *tx_bi;
 574  570   	u32 xsk_frames = 0;
 575  571
···
 589  585   	}
 590  586
 591  587   	if (xsk_frames)
 592       - 		xsk_umem_complete_tx(umem, xsk_frames);
      588  + 		xsk_tx_completed(bp, xsk_frames);
 593  589   }
 594  590
 595  591   /**
 596       -  * i40e_xsk_any_rx_ring_enabled - Checks if Rx rings have AF_XDP UMEM attached
      592  +  * i40e_xsk_any_rx_ring_enabled - Checks if Rx rings have an AF_XDP
      593  +  * buffer pool attached
 597  594    * @vsi: vsi
 598  595    *
 599       -  * Returns true if any of the Rx rings has an AF_XDP UMEM attached
      596  +  * Returns true if any of the Rx rings has an AF_XDP buffer pool attached
 600  597    **/
 601  598   bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi)
 602  599   {
···
 605  600   	int i;
 606  601
 607  602   	for (i = 0; i < vsi->num_queue_pairs; i++) {
 608       - 		if (xdp_get_umem_from_qid(netdev, i))
      603  + 		if (xsk_get_pool_from_qid(netdev, i))
 609  604   			return true;
 610  605
 611  606
+2 -2
drivers/net/ethernet/intel/i40e/i40e_xsk.h
···
 5  5   #define _I40E_XSK_H_
 6  6
 7  7   struct i40e_vsi;
 8      - struct xdp_umem;
     8  + struct xsk_buff_pool;
 9  9   struct zero_copy_allocator;
 10  10
 11  11   int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
 12  12   int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
 13       - int i40e_xsk_umem_setup(struct i40e_vsi *vsi, struct xdp_umem *umem,
      13  + int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
 14  14   			u16 qid);
 15  15   bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
 16  16   int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
+9 -9
drivers/net/ethernet/intel/ice/ice.h
···
 321  321   	struct ice_ring **xdp_rings;	 /* XDP ring array */
 322  322   	u16 num_xdp_txq;		 /* Used XDP queues */
 323  323   	u8 xdp_mapping_mode;		 /* ICE_MAP_MODE_[CONTIG|SCATTER] */
 324       - 	struct xdp_umem **xsk_umems;
 325       - 	u16 num_xsk_umems_used;
 326       - 	u16 num_xsk_umems;
      324  + 	struct xsk_buff_pool **xsk_pools;
      325  + 	u16 num_xsk_pools_used;
      326  + 	u16 num_xsk_pools;
 327  327   } ____cacheline_internodealigned_in_smp;
 328  328
 329  329   /* struct that defines an interrupt vector */
···
 507  507   }
 508  508
 509  509   /**
 510       -  * ice_xsk_umem - get XDP UMEM bound to a ring
      510  +  * ice_xsk_pool - get XSK buffer pool bound to a ring
 511  511    * @ring - ring to use
 512  512    *
 513       -  * Returns a pointer to xdp_umem structure if there is an UMEM present,
      513  +  * Returns a pointer to xdp_umem structure if there is a buffer pool present,
 514  514    * NULL otherwise.
 515  515    */
 516       - static inline struct xdp_umem *ice_xsk_umem(struct ice_ring *ring)
      516  + static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_ring *ring)
 517  517   {
 518       - 	struct xdp_umem **umems = ring->vsi->xsk_umems;
      518  + 	struct xsk_buff_pool **pools = ring->vsi->xsk_pools;
 519  519   	u16 qid = ring->q_index;
 520  520
 521  521   	if (ice_ring_is_xdp(ring))
 522  522   		qid -= ring->vsi->num_xdp_txq;
 523  523
 524       - 	if (qid >= ring->vsi->num_xsk_umems || !umems || !umems[qid] ||
      524  + 	if (qid >= ring->vsi->num_xsk_pools || !pools || !pools[qid] ||
 525  525   	    !ice_is_xdp_ena_vsi(ring->vsi))
 526  526   		return NULL;
 527  527
 528       - 	return umems[qid];
      528  + 	return pools[qid];
 529  529   }
 530  530
 531  531   /**
+8 -8
drivers/net/ethernet/intel/ice/ice_base.c
···
 		xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
 				 ring->q_index);

-	ring->xsk_umem = ice_xsk_umem(ring);
-	if (ring->xsk_umem) {
+	ring->xsk_pool = ice_xsk_pool(ring);
+	if (ring->xsk_pool) {
 		xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);

 		ring->rx_buf_len =
-			xsk_umem_get_rx_frame_size(ring->xsk_umem);
+			xsk_pool_get_rx_frame_size(ring->xsk_pool);
 		/* For AF_XDP ZC, we disallow packets to span on
 		 * multiple buffers, thus letting us skip that
 		 * handling in the fast-path.
···
 						 NULL);
 		if (err)
 			return err;
-		xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq);
+		xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);

 		dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n",
 			 ring->q_index);
···
 	ring->tail = hw->hw_addr + QRX_TAIL(pf_q);
 	writel(0, ring->tail);

-	if (ring->xsk_umem) {
-		if (!xsk_buff_can_alloc(ring->xsk_umem, num_bufs)) {
-			dev_warn(dev, "UMEM does not provide enough addresses to fill %d buffers on Rx ring %d\n",
+	if (ring->xsk_pool) {
+		if (!xsk_buff_can_alloc(ring->xsk_pool, num_bufs)) {
+			dev_warn(dev, "XSK buffer pool does not provide enough addresses to fill %d buffers on Rx ring %d\n",
 				 num_bufs, ring->q_index);
 			dev_warn(dev, "Change Rx ring/fill queue size to avoid performance issues\n");

···

 	err = ice_alloc_rx_bufs_zc(ring, num_bufs);
 	if (err)
-		dev_info(dev, "Failed to allocate some buffers on UMEM enabled Rx ring %d (pf_q %d)\n",
+		dev_info(dev, "Failed to allocate some buffers on XSK buffer pool enabled Rx ring %d (pf_q %d)\n",
 			 ring->q_index, pf_q);
 	return 0;
 }
+1 -1
drivers/net/ethernet/intel/ice/ice_lib.c
···
 		return ret;

 	for (i = 0; i < vsi->num_xdp_txq; i++)
-		vsi->xdp_rings[i]->xsk_umem = ice_xsk_umem(vsi->xdp_rings[i]);
+		vsi->xdp_rings[i]->xsk_pool = ice_xsk_pool(vsi->xdp_rings[i]);

 	return ret;
 }
+5 -5
drivers/net/ethernet/intel/ice/ice_main.c
···
 		if (ice_setup_tx_ring(xdp_ring))
 			goto free_xdp_rings;
 		ice_set_ring_xdp(xdp_ring);
-		xdp_ring->xsk_umem = ice_xsk_umem(xdp_ring);
+		xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
 	}

 	return 0;
···
 	if (if_running)
 		ret = ice_up(vsi);

-	if (!ret && prog && vsi->xsk_umems) {
+	if (!ret && prog && vsi->xsk_pools) {
 		int i;

 		ice_for_each_rxq(vsi, i) {
 			struct ice_ring *rx_ring = vsi->rx_rings[i];

-			if (rx_ring->xsk_umem)
+			if (rx_ring->xsk_pool)
 				napi_schedule(&rx_ring->q_vector->napi);
 		}
 	}
···
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
 		return ice_xdp_setup_prog(vsi, xdp->prog, xdp->extack);
-	case XDP_SETUP_XSK_UMEM:
-		return ice_xsk_umem_setup(vsi, xdp->xsk.umem,
+	case XDP_SETUP_XSK_POOL:
+		return ice_xsk_pool_setup(vsi, xdp->xsk.pool,
 					  xdp->xsk.queue_id);
 	default:
 		return -EINVAL;
+4 -4
drivers/net/ethernet/intel/ice/ice_txrx.c
···
 {
 	u16 i;

-	if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_umem) {
+	if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_pool) {
 		ice_xsk_clean_xdp_ring(tx_ring);
 		goto tx_skip_free;
 	}
···
 	if (!rx_ring->rx_buf)
 		return;

-	if (rx_ring->xsk_umem) {
+	if (rx_ring->xsk_pool) {
 		ice_xsk_clean_rx_ring(rx_ring);
 		goto rx_skip_free;
 	}
···
 	 * budget and be more aggressive about cleaning up the Tx descriptors.
 	 */
 	ice_for_each_ring(ring, q_vector->tx) {
-		bool wd = ring->xsk_umem ?
+		bool wd = ring->xsk_pool ?
 			  ice_clean_tx_irq_zc(ring, budget) :
 			  ice_clean_tx_irq(ring, budget);

···
 	 * comparison in the irq context instead of many inside the
 	 * ice_clean_rx_irq function and makes the codebase cleaner.
 	 */
-	cleaned = ring->xsk_umem ?
+	cleaned = ring->xsk_pool ?
 		  ice_clean_rx_irq_zc(ring, budget_per_ring) :
 		  ice_clean_rx_irq(ring, budget_per_ring);
 	work_done += cleaned;
+1 -1
drivers/net/ethernet/intel/ice/ice_txrx.h
···

 	struct rcu_head rcu;		/* to avoid race on free */
 	struct bpf_prog *xdp_prog;
-	struct xdp_umem *xsk_umem;
+	struct xsk_buff_pool *xsk_pool;
 	/* CL3 - 3rd cacheline starts here */
 	struct xdp_rxq_info xdp_rxq;
 	/* CLX - the below items are only accessed infrequently and should be
+69 -69
drivers/net/ethernet/intel/ice/ice_xsk.c
···
 	if (err)
 		goto free_buf;
 	ice_set_ring_xdp(xdp_ring);
-	xdp_ring->xsk_umem = ice_xsk_umem(xdp_ring);
+	xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
 }

 err = ice_setup_rx_ctx(rx_ring);
···
 }

 /**
- * ice_xsk_alloc_umems - allocate a UMEM region for an XDP socket
- * @vsi: VSI to allocate the UMEM on
+ * ice_xsk_alloc_pools - allocate a buffer pool for an XDP socket
+ * @vsi: VSI to allocate the buffer pool on
  *
  * Returns 0 on success, negative on error
  */
-static int ice_xsk_alloc_umems(struct ice_vsi *vsi)
+static int ice_xsk_alloc_pools(struct ice_vsi *vsi)
 {
-	if (vsi->xsk_umems)
+	if (vsi->xsk_pools)
 		return 0;

-	vsi->xsk_umems = kcalloc(vsi->num_xsk_umems, sizeof(*vsi->xsk_umems),
+	vsi->xsk_pools = kcalloc(vsi->num_xsk_pools, sizeof(*vsi->xsk_pools),
 				 GFP_KERNEL);

-	if (!vsi->xsk_umems) {
-		vsi->num_xsk_umems = 0;
+	if (!vsi->xsk_pools) {
+		vsi->num_xsk_pools = 0;
 		return -ENOMEM;
 	}

···
 }

 /**
- * ice_xsk_remove_umem - Remove an UMEM for a certain ring/qid
+ * ice_xsk_remove_pool - Remove a buffer pool for a certain ring/qid
  * @vsi: VSI from which the VSI will be removed
- * @qid: Ring/qid associated with the UMEM
+ * @qid: Ring/qid associated with the buffer pool
  */
-static void ice_xsk_remove_umem(struct ice_vsi *vsi, u16 qid)
+static void ice_xsk_remove_pool(struct ice_vsi *vsi, u16 qid)
 {
-	vsi->xsk_umems[qid] = NULL;
-	vsi->num_xsk_umems_used--;
+	vsi->xsk_pools[qid] = NULL;
+	vsi->num_xsk_pools_used--;

-	if (vsi->num_xsk_umems_used == 0) {
-		kfree(vsi->xsk_umems);
-		vsi->xsk_umems = NULL;
-		vsi->num_xsk_umems = 0;
+	if (vsi->num_xsk_pools_used == 0) {
+		kfree(vsi->xsk_pools);
+		vsi->xsk_pools = NULL;
+		vsi->num_xsk_pools = 0;
 	}
 }

 /**
- * ice_xsk_umem_disable - disable a UMEM region
+ * ice_xsk_pool_disable - disable a buffer pool region
  * @vsi: Current VSI
  * @qid: queue ID
  *
  * Returns 0 on success, negative on failure
  */
-static int ice_xsk_umem_disable(struct ice_vsi *vsi, u16 qid)
+static int ice_xsk_pool_disable(struct ice_vsi *vsi, u16 qid)
 {
-	if (!vsi->xsk_umems || qid >= vsi->num_xsk_umems ||
-	    !vsi->xsk_umems[qid])
+	if (!vsi->xsk_pools || qid >= vsi->num_xsk_pools ||
+	    !vsi->xsk_pools[qid])
 		return -EINVAL;

-	xsk_buff_dma_unmap(vsi->xsk_umems[qid], ICE_RX_DMA_ATTR);
-	ice_xsk_remove_umem(vsi, qid);
+	xsk_pool_dma_unmap(vsi->xsk_pools[qid], ICE_RX_DMA_ATTR);
+	ice_xsk_remove_pool(vsi, qid);

 	return 0;
 }

 /**
- * ice_xsk_umem_enable - enable a UMEM region
+ * ice_xsk_pool_enable - enable a buffer pool region
  * @vsi: Current VSI
- * @umem: pointer to a requested UMEM region
+ * @pool: pointer to a requested buffer pool region
  * @qid: queue ID
  *
  * Returns 0 on success, negative on failure
  */
 static int
-ice_xsk_umem_enable(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
+ice_xsk_pool_enable(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
 {
 	int err;

 	if (vsi->type != ICE_VSI_PF)
 		return -EINVAL;

-	if (!vsi->num_xsk_umems)
-		vsi->num_xsk_umems = min_t(u16, vsi->num_rxq, vsi->num_txq);
-	if (qid >= vsi->num_xsk_umems)
+	if (!vsi->num_xsk_pools)
+		vsi->num_xsk_pools = min_t(u16, vsi->num_rxq, vsi->num_txq);
+	if (qid >= vsi->num_xsk_pools)
 		return -EINVAL;

-	err = ice_xsk_alloc_umems(vsi);
+	err = ice_xsk_alloc_pools(vsi);
 	if (err)
 		return err;

-	if (vsi->xsk_umems && vsi->xsk_umems[qid])
+	if (vsi->xsk_pools && vsi->xsk_pools[qid])
 		return -EBUSY;

-	vsi->xsk_umems[qid] = umem;
-	vsi->num_xsk_umems_used++;
+	vsi->xsk_pools[qid] = pool;
+	vsi->num_xsk_pools_used++;

-	err = xsk_buff_dma_map(vsi->xsk_umems[qid], ice_pf_to_dev(vsi->back),
+	err = xsk_pool_dma_map(vsi->xsk_pools[qid], ice_pf_to_dev(vsi->back),
 			       ICE_RX_DMA_ATTR);
 	if (err)
 		return err;
···
 }

 /**
- * ice_xsk_umem_setup - enable/disable a UMEM region depending on its state
+ * ice_xsk_pool_setup - enable/disable a buffer pool region depending on its state
  * @vsi: Current VSI
- * @umem: UMEM to enable/associate to a ring, NULL to disable
+ * @pool: buffer pool to enable/associate to a ring, NULL to disable
  * @qid: queue ID
  *
  * Returns 0 on success, negative on failure
  */
-int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid)
+int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
 {
-	bool if_running, umem_present = !!umem;
-	int ret = 0, umem_failure = 0;
+	bool if_running, pool_present = !!pool;
+	int ret = 0, pool_failure = 0;

 	if_running = netif_running(vsi->netdev) && ice_is_xdp_ena_vsi(vsi);

···
 		ret = ice_qp_dis(vsi, qid);
 		if (ret) {
 			netdev_err(vsi->netdev, "ice_qp_dis error = %d\n", ret);
-			goto xsk_umem_if_up;
+			goto xsk_pool_if_up;
 		}
 	}

-	umem_failure = umem_present ? ice_xsk_umem_enable(vsi, umem, qid) :
-				      ice_xsk_umem_disable(vsi, qid);
+	pool_failure = pool_present ? ice_xsk_pool_enable(vsi, pool, qid) :
+				      ice_xsk_pool_disable(vsi, qid);

-xsk_umem_if_up:
+xsk_pool_if_up:
 	if (if_running) {
 		ret = ice_qp_ena(vsi, qid);
-		if (!ret && umem_present)
+		if (!ret && pool_present)
 			napi_schedule(&vsi->xdp_rings[qid]->q_vector->napi);
 		else if (ret)
 			netdev_err(vsi->netdev, "ice_qp_ena error = %d\n", ret);
 	}

-	if (umem_failure) {
-		netdev_err(vsi->netdev, "Could not %sable UMEM, error = %d\n",
-			   umem_present ? "en" : "dis", umem_failure);
-		return umem_failure;
+	if (pool_failure) {
+		netdev_err(vsi->netdev, "Could not %sable buffer pool, error = %d\n",
+			   pool_present ? "en" : "dis", pool_failure);
+		return pool_failure;
 	}

 	return ret;
···
 	rx_buf = &rx_ring->rx_buf[ntu];

 	do {
-		rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_umem);
+		rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool);
 		if (!rx_buf->xdp) {
 			ret = true;
 			break;
···

 	rx_buf = &rx_ring->rx_buf[rx_ring->next_to_clean];
 	rx_buf->xdp->data_end = rx_buf->xdp->data + size;
-	xsk_buff_dma_sync_for_cpu(rx_buf->xdp);
+	xsk_buff_dma_sync_for_cpu(rx_buf->xdp, rx_ring->xsk_pool);

 	xdp_res = ice_run_xdp_zc(rx_ring, rx_buf->xdp);
 	if (xdp_res) {
···
 	ice_finalize_xdp_rx(rx_ring, xdp_xmit);
 	ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);

-	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
+	if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
 		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
-			xsk_set_rx_need_wakeup(rx_ring->xsk_umem);
+			xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
 		else
-			xsk_clear_rx_need_wakeup(rx_ring->xsk_umem);
+			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);

 		return (int)total_rx_packets;
 	}
···

 	tx_buf = &xdp_ring->tx_buf[xdp_ring->next_to_use];

-	if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
+	if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc))
 		break;

-	dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr);
-	xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma,
+	dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr);
+	xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma,
 					 desc.len);

 	tx_buf->bytecount = desc.len;
···

 	if (tx_desc) {
 		ice_xdp_ring_update_tail(xdp_ring);
-		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
+		xsk_tx_release(xdp_ring->xsk_pool);
 	}

 	return budget > 0 && work_done;
···
 	xdp_ring->next_to_clean = ntc;

 	if (xsk_frames)
-		xsk_umem_complete_tx(xdp_ring->xsk_umem, xsk_frames);
+		xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);

-	if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_umem))
-		xsk_set_tx_need_wakeup(xdp_ring->xsk_umem);
+	if (xsk_uses_need_wakeup(xdp_ring->xsk_pool))
+		xsk_set_tx_need_wakeup(xdp_ring->xsk_pool);

 	ice_update_tx_ring_stats(xdp_ring, total_packets, total_bytes);
 	xmit_done = ice_xmit_zc(xdp_ring, ICE_DFLT_IRQ_WORK);
···
 	if (queue_id >= vsi->num_txq)
 		return -ENXIO;

-	if (!vsi->xdp_rings[queue_id]->xsk_umem)
+	if (!vsi->xdp_rings[queue_id]->xsk_pool)
 		return -ENXIO;

 	ring = vsi->xdp_rings[queue_id];
···
 }

 /**
- * ice_xsk_any_rx_ring_ena - Checks if Rx rings have AF_XDP UMEM attached
+ * ice_xsk_any_rx_ring_ena - Checks if Rx rings have AF_XDP buff pool attached
  * @vsi: VSI to be checked
  *
- * Returns true if any of the Rx rings has an AF_XDP UMEM attached
+ * Returns true if any of the Rx rings has an AF_XDP buff pool attached
  */
 bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi)
 {
 	int i;

-	if (!vsi->xsk_umems)
+	if (!vsi->xsk_pools)
 		return false;

-	for (i = 0; i < vsi->num_xsk_umems; i++) {
-		if (vsi->xsk_umems[i])
+	for (i = 0; i < vsi->num_xsk_pools; i++) {
+		if (vsi->xsk_pools[i])
 			return true;
 	}

···
 }

 /**
- * ice_xsk_clean_rx_ring - clean UMEM queues connected to a given Rx ring
+ * ice_xsk_clean_rx_ring - clean buffer pool queues connected to a given Rx ring
  * @rx_ring: ring to be cleaned
  */
 void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring)
···
 }

 /**
- * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its UMEM queues
+ * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its buffer pool queues
  * @xdp_ring: XDP_Tx ring
  */
 void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring)
···
 	}

 	if (xsk_frames)
-		xsk_umem_complete_tx(xdp_ring->xsk_umem, xsk_frames);
+		xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
 }
+4 -3
drivers/net/ethernet/intel/ice/ice_xsk.h
···
 struct ice_vsi;

 #ifdef CONFIG_XDP_SOCKETS
-int ice_xsk_umem_setup(struct ice_vsi *vsi, struct xdp_umem *umem, u16 qid);
+int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
+		       u16 qid);
 int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget);
 bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget);
 int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
···
 void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring);
 #else
 static inline int
-ice_xsk_umem_setup(struct ice_vsi __always_unused *vsi,
-		   struct xdp_umem __always_unused *umem,
+ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
+		   struct xsk_buff_pool __always_unused *pool,
 		   u16 __always_unused qid)
 {
 	return -EOPNOTSUPP;
+1 -1
drivers/net/ethernet/intel/ixgbe/ixgbe.h
···
 		struct ixgbe_rx_queue_stats rx_stats;
 	};
 	struct xdp_rxq_info xdp_rxq;
-	struct xdp_umem *xsk_umem;
+	struct xsk_buff_pool *xsk_pool;
 	u16 ring_idx;		/* {rx,tx,xdp}_ring back reference idx */
 	u16 rx_buf_len;
 } ____cacheline_internodealigned_in_smp;
+17 -17
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
···
 #endif

 	ixgbe_for_each_ring(ring, q_vector->tx) {
-		bool wd = ring->xsk_umem ?
+		bool wd = ring->xsk_pool ?
 			  ixgbe_clean_xdp_tx_irq(q_vector, ring, budget) :
 			  ixgbe_clean_tx_irq(q_vector, ring, budget);

···
 		per_ring_budget = budget;

 	ixgbe_for_each_ring(ring, q_vector->rx) {
-		int cleaned = ring->xsk_umem ?
+		int cleaned = ring->xsk_pool ?
 			      ixgbe_clean_rx_irq_zc(q_vector, ring,
 						    per_ring_budget) :
 			      ixgbe_clean_rx_irq(q_vector, ring,
···
 	u32 txdctl = IXGBE_TXDCTL_ENABLE;
 	u8 reg_idx = ring->reg_idx;

-	ring->xsk_umem = NULL;
+	ring->xsk_pool = NULL;
 	if (ring_is_xdp(ring))
-		ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
+		ring->xsk_pool = ixgbe_xsk_pool(adapter, ring);

 	/* disable queue to avoid issues while updating state */
 	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0);
···
 	srrctl = IXGBE_RX_HDR_SIZE << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT;

 	/* configure the packet buffer length */
-	if (rx_ring->xsk_umem) {
-		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_umem);
+	if (rx_ring->xsk_pool) {
+		u32 xsk_buf_len = xsk_pool_get_rx_frame_size(rx_ring->xsk_pool);

 		/* If the MAC support setting RXDCTL.RLPML, the
 		 * SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and
···
 	u8 reg_idx = ring->reg_idx;

 	xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
-	ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
-	if (ring->xsk_umem) {
+	ring->xsk_pool = ixgbe_xsk_pool(adapter, ring);
+	if (ring->xsk_pool) {
 		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
 						   MEM_TYPE_XSK_BUFF_POOL,
 						   NULL));
-		xsk_buff_set_rxq_info(ring->xsk_umem, &ring->xdp_rxq);
+		xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
 	} else {
 		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
 						   MEM_TYPE_PAGE_SHARED, NULL));
···
 #endif
 	}

-	if (ring->xsk_umem && hw->mac.type != ixgbe_mac_82599EB) {
-		u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_umem);
+	if (ring->xsk_pool && hw->mac.type != ixgbe_mac_82599EB) {
+		u32 xsk_buf_len = xsk_pool_get_rx_frame_size(ring->xsk_pool);

 		rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK |
 			    IXGBE_RXDCTL_RLPML_EN);
···
 	IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), rxdctl);

 	ixgbe_rx_desc_queue_enable(adapter, ring);
-	if (ring->xsk_umem)
+	if (ring->xsk_pool)
 		ixgbe_alloc_rx_buffers_zc(ring, ixgbe_desc_unused(ring));
 	else
 		ixgbe_alloc_rx_buffers(ring, ixgbe_desc_unused(ring));
···
 	u16 i = rx_ring->next_to_clean;
 	struct ixgbe_rx_buffer *rx_buffer = &rx_ring->rx_buffer_info[i];

-	if (rx_ring->xsk_umem) {
+	if (rx_ring->xsk_pool) {
 		ixgbe_xsk_clean_rx_ring(rx_ring);
 		goto skip_free;
 	}
···
 	u16 i = tx_ring->next_to_clean;
 	struct ixgbe_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];

-	if (tx_ring->xsk_umem) {
+	if (tx_ring->xsk_pool) {
 		ixgbe_xsk_clean_tx_ring(tx_ring);
 		goto out;
 	}
···
 	 */
 	if (need_reset && prog)
 		for (i = 0; i < adapter->num_rx_queues; i++)
-			if (adapter->xdp_ring[i]->xsk_umem)
+			if (adapter->xdp_ring[i]->xsk_pool)
 				(void)ixgbe_xsk_wakeup(adapter->netdev, i,
 						       XDP_WAKEUP_RX);

···
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
 		return ixgbe_xdp_setup(dev, xdp->prog);
-	case XDP_SETUP_XSK_UMEM:
-		return ixgbe_xsk_umem_setup(adapter, xdp->xsk.umem,
+	case XDP_SETUP_XSK_POOL:
+		return ixgbe_xsk_pool_setup(adapter, xdp->xsk.pool,
 					    xdp->xsk.queue_id);

 	default:
+4 -3
drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
···
 void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
 void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);

-struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
-				struct ixgbe_ring *ring);
-int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
+struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter,
+				     struct ixgbe_ring *ring);
+int ixgbe_xsk_pool_setup(struct ixgbe_adapter *adapter,
+			 struct xsk_buff_pool *pool,
 			 u16 qid);

 void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
+32 -31
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
···
 #include "ixgbe.h"
 #include "ixgbe_txrx_common.h"

-struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
-				struct ixgbe_ring *ring)
+struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter,
+				     struct ixgbe_ring *ring)
 {
 	bool xdp_on = READ_ONCE(adapter->xdp_prog);
 	int qid = ring->ring_idx;
···
 	if (!xdp_on || !test_bit(qid, adapter->af_xdp_zc_qps))
 		return NULL;

-	return xdp_get_umem_from_qid(adapter->netdev, qid);
+	return xsk_get_pool_from_qid(adapter->netdev, qid);
 }

-static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
-				 struct xdp_umem *umem,
+static int ixgbe_xsk_pool_enable(struct ixgbe_adapter *adapter,
+				 struct xsk_buff_pool *pool,
 				 u16 qid)
 {
 	struct net_device *netdev = adapter->netdev;
···
 	    qid >= netdev->real_num_tx_queues)
 		return -EINVAL;

-	err = xsk_buff_dma_map(umem, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR);
+	err = xsk_pool_dma_map(pool, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR);
 	if (err)
 		return err;

···
 	return 0;
 }

-static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
+static int ixgbe_xsk_pool_disable(struct ixgbe_adapter *adapter, u16 qid)
 {
-	struct xdp_umem *umem;
+	struct xsk_buff_pool *pool;
 	bool if_running;

-	umem = xdp_get_umem_from_qid(adapter->netdev, qid);
-	if (!umem)
+	pool = xsk_get_pool_from_qid(adapter->netdev, qid);
+	if (!pool)
 		return -EINVAL;

 	if_running = netif_running(adapter->netdev) &&
···
 		ixgbe_txrx_ring_disable(adapter, qid);

 	clear_bit(qid, adapter->af_xdp_zc_qps);
-	xsk_buff_dma_unmap(umem, IXGBE_RX_DMA_ATTR);
+	xsk_pool_dma_unmap(pool, IXGBE_RX_DMA_ATTR);

 	if (if_running)
 		ixgbe_txrx_ring_enable(adapter, qid);
···
 	return 0;
 }

-int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
+int ixgbe_xsk_pool_setup(struct ixgbe_adapter *adapter,
+			 struct xsk_buff_pool *pool,
 			 u16 qid)
 {
-	return umem ? ixgbe_xsk_umem_enable(adapter, umem, qid) :
-		      ixgbe_xsk_umem_disable(adapter, qid);
+	return pool ? ixgbe_xsk_pool_enable(adapter, pool, qid) :
+		      ixgbe_xsk_pool_disable(adapter, qid);
 }

 static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
···
 	i -= rx_ring->count;

 	do {
-		bi->xdp = xsk_buff_alloc(rx_ring->xsk_umem);
+		bi->xdp = xsk_buff_alloc(rx_ring->xsk_pool);
 		if (!bi->xdp) {
 			ok = false;
 			break;
···
 	}

 	bi->xdp->data_end = bi->xdp->data + size;
-	xsk_buff_dma_sync_for_cpu(bi->xdp);
+	xsk_buff_dma_sync_for_cpu(bi->xdp, rx_ring->xsk_pool);
 	xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, bi->xdp);

 	if (xdp_res) {
···
 	q_vector->rx.total_packets += total_rx_packets;
 	q_vector->rx.total_bytes += total_rx_bytes;

-	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
+	if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
 		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
-			xsk_set_rx_need_wakeup(rx_ring->xsk_umem);
+			xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
 		else
-			xsk_clear_rx_need_wakeup(rx_ring->xsk_umem);
+			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);

 		return (int)total_rx_packets;
 	}
···

 static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
 {
+	struct xsk_buff_pool *pool = xdp_ring->xsk_pool;
 	union ixgbe_adv_tx_desc *tx_desc = NULL;
 	struct ixgbe_tx_buffer *tx_bi;
 	bool work_done = true;
···
 		break;
 	}

-	if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
+	if (!xsk_tx_peek_desc(pool, &desc))
 		break;

-	dma = xsk_buff_raw_get_dma(xdp_ring->xsk_umem, desc.addr);
-	xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_umem, dma,
-					 desc.len);
+	dma = xsk_buff_raw_get_dma(pool, desc.addr);
+	xsk_buff_raw_dma_sync_for_device(pool, dma, desc.len);

 	tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use];
 	tx_bi->bytecount = desc.len;
···
 	if (tx_desc) {
 		ixgbe_xdp_ring_update_tail(xdp_ring);
-		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
+		xsk_tx_release(pool);
 	}

 	return !!budget && work_done;
···
 {
 	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
 	unsigned int total_packets = 0, total_bytes = 0;
-	struct xdp_umem *umem = tx_ring->xsk_umem;
+	struct xsk_buff_pool *pool = tx_ring->xsk_pool;
 	union ixgbe_adv_tx_desc *tx_desc;
 	struct ixgbe_tx_buffer *tx_bi;
 	u32 xsk_frames = 0;
···
 	q_vector->tx.total_packets += total_packets;

 	if (xsk_frames)
-		xsk_umem_complete_tx(umem, xsk_frames);
+		xsk_tx_completed(pool, xsk_frames);

-	if (xsk_umem_uses_need_wakeup(tx_ring->xsk_umem))
-		xsk_set_tx_need_wakeup(tx_ring->xsk_umem);
+	if (xsk_uses_need_wakeup(pool))
+		xsk_set_tx_need_wakeup(pool);

 	return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
 }
···
 	if (test_bit(__IXGBE_TX_DISABLED, &ring->state))
 		return -ENETDOWN;

-	if (!ring->xsk_umem)
+	if (!ring->xsk_pool)
 		return -ENXIO;

 	if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
···
 void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring)
 {
 	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
-	struct xdp_umem *umem = tx_ring->xsk_umem;
+	struct xsk_buff_pool *pool = tx_ring->xsk_pool;
 	struct ixgbe_tx_buffer *tx_bi;
 	u32 xsk_frames = 0;
···
 	}

 	if (xsk_frames)
-		xsk_umem_complete_tx(umem, xsk_frames);
+		xsk_tx_completed(pool, xsk_frames);
 }
+1 -1
drivers/net/ethernet/mellanox/mlx5/core/Makefile
···
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
 		en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
 		en_selftest.o en/port.o en/monitor_stats.o en/health.o \
-		en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/umem.o \
+		en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/pool.o \
 		en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o en/devlink.o

 #
+10 -9
drivers/net/ethernet/mellanox/mlx5/core/en.h
···
 	struct mlx5e_cq cq;

 	/* read only */
-	struct xdp_umem *umem;
+	struct xsk_buff_pool *xsk_pool;
 	struct mlx5_wq_cyc wq;
 	struct mlx5e_xdpsq_stats *stats;
 	mlx5e_fp_xmit_xdp_frame_check xmit_xdp_frame_check;
···
 	struct page_pool *page_pool;

 	/* AF_XDP zero-copy */
-	struct xdp_umem *umem;
+	struct xsk_buff_pool *xsk_pool;

 	struct work_struct recover_work;
···
 #endif

 struct mlx5e_xsk {
-	/* UMEMs are stored separately from channels, because we don't want to
-	 * lose them when channels are recreated. The kernel also stores UMEMs,
-	 * but it doesn't distinguish between zero-copy and non-zero-copy UMEMs,
-	 * so rely on our mechanism.
+	/* XSK buffer pools are stored separately from channels,
+	 * because we don't want to lose them when channels are
+	 * recreated. The kernel also stores buffer pools, but it doesn't
+	 * distinguish between zero-copy and non-zero-copy buffer pools,
+	 * so rely on our mechanism.
 	 */
-	struct xdp_umem **umems;
+	struct xsk_buff_pool **pools;
 	u16 refcnt;
 	bool ever_used;
 };
···
 struct mlx5e_rq_param;
 int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
 		  struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk,
-		  struct xdp_umem *umem, struct mlx5e_rq *rq);
+		  struct xsk_buff_pool *xsk_pool, struct mlx5e_rq *rq);
 int mlx5e_wait_for_min_rx_wqes(struct mlx5e_rq *rq, int wait_time);
 void mlx5e_deactivate_rq(struct mlx5e_rq *rq);
 void mlx5e_close_rq(struct mlx5e_rq *rq);
···
 		  struct mlx5e_sq_param *param, struct mlx5e_icosq *sq);
 void mlx5e_close_icosq(struct mlx5e_icosq *sq);
 int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
-		     struct mlx5e_sq_param *param, struct xdp_umem *umem,
+		     struct mlx5e_sq_param *param, struct xsk_buff_pool *xsk_pool,
 		     struct mlx5e_xdpsq *sq, bool is_redirect);
 void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq);
+2 -3
drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
···
 	} while ((++i < MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq)));

 	if (xsk_frames)
-		xsk_umem_complete_tx(sq->umem, xsk_frames);
+		xsk_tx_completed(sq->xsk_pool, xsk_frames);

 	sq->stats->cqes += i;

···
 	}

 	if (xsk_frames)
-		xsk_umem_complete_tx(sq->umem, xsk_frames);
+		xsk_tx_completed(sq->xsk_pool, xsk_frames);
 }
···
 	sq->xmit_xdp_frame = is_mpw ?
 		mlx5e_xmit_xdp_frame_mpwqe : mlx5e_xmit_xdp_frame;
 }
-
+27
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.h
···
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019-2020, Mellanox Technologies inc. All rights reserved. */
+
+#ifndef __MLX5_EN_XSK_POOL_H__
+#define __MLX5_EN_XSK_POOL_H__
+
+#include "en.h"
+
+static inline struct xsk_buff_pool *mlx5e_xsk_get_pool(struct mlx5e_params *params,
+						       struct mlx5e_xsk *xsk, u16 ix)
+{
+	if (!xsk || !xsk->pools)
+		return NULL;
+
+	if (unlikely(ix >= params->num_channels))
+		return NULL;
+
+	return xsk->pools[ix];
+}
+
+struct mlx5e_xsk_param;
+void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk);
+
+/* .ndo_bpf callback. */
+int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid);
+
+#endif /* __MLX5_EN_XSK_POOL_H__ */
+2 -2
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
···
 
 	xdp->data_end = xdp->data + cqe_bcnt32;
 	xdp_set_data_meta_invalid(xdp);
-	xsk_buff_dma_sync_for_cpu(xdp);
+	xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
 	net_prefetch(xdp->data);
 
 	rcu_read_lock();
···
 
 	xdp->data_end = xdp->data + cqe_bcnt;
 	xdp_set_data_meta_invalid(xdp);
-	xsk_buff_dma_sync_for_cpu(xdp);
+	xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
 	net_prefetch(xdp->data);
 
 	if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)) {
+5 -5
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h
···
 			      struct mlx5e_wqe_frag_info *wi,
 			      u32 cqe_bcnt);
 
-static inline int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
+static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq,
 					    struct mlx5e_dma_info *dma_info)
 {
-	dma_info->xsk = xsk_buff_alloc(rq->umem);
+	dma_info->xsk = xsk_buff_alloc(rq->xsk_pool);
 	if (!dma_info->xsk)
 		return -ENOMEM;
 
···
 
 static inline bool mlx5e_xsk_update_rx_wakeup(struct mlx5e_rq *rq, bool alloc_err)
 {
-	if (!xsk_umem_uses_need_wakeup(rq->umem))
+	if (!xsk_uses_need_wakeup(rq->xsk_pool))
 		return alloc_err;
 
 	if (unlikely(alloc_err))
-		xsk_set_rx_need_wakeup(rq->umem);
+		xsk_set_rx_need_wakeup(rq->xsk_pool);
 	else
-		xsk_clear_rx_need_wakeup(rq->umem);
+		xsk_clear_rx_need_wakeup(rq->xsk_pool);
 
 	return false;
 }
+6 -6
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
···
 }
 
 int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
-		   struct mlx5e_xsk_param *xsk, struct xdp_umem *umem,
+		   struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
 		   struct mlx5e_channel *c)
 {
 	struct mlx5e_channel_param *cparam;
···
 	if (unlikely(err))
 		goto err_free_cparam;
 
-	err = mlx5e_open_rq(c, params, &cparam->rq, xsk, umem, &c->xskrq);
+	err = mlx5e_open_rq(c, params, &cparam->rq, xsk, pool, &c->xskrq);
 	if (unlikely(err))
 		goto err_close_rx_cq;
 
···
 	if (unlikely(err))
 		goto err_close_rq;
 
-	/* Create a separate SQ, so that when the UMEM is disabled, we could
+	/* Create a separate SQ, so that when the buff pool is disabled, we could
 	 * close this SQ safely and stop receiving CQEs. In other case, e.g., if
-	 * the XDPSQ was used instead, we might run into trouble when the UMEM
+	 * the XDPSQ was used instead, we might run into trouble when the buff pool
 	 * is disabled and then reenabled, but the SQ continues receiving CQEs
-	 * from the old UMEM.
+	 * from the old buff pool.
 	 */
-	err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, umem, &c->xsksq, true);
+	err = mlx5e_open_xdpsq(c, params, &cparam->xdp_sq, pool, &c->xsksq, true);
 	if (unlikely(err))
 		goto err_close_tx_cq;
 
+1 -1
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.h
···
 			      struct mlx5e_xsk_param *xsk,
 			      struct mlx5_core_dev *mdev);
 int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
-		   struct mlx5e_xsk_param *xsk, struct xdp_umem *umem,
+		   struct mlx5e_xsk_param *xsk, struct xsk_buff_pool *pool,
 		   struct mlx5e_channel *c);
 void mlx5e_close_xsk(struct mlx5e_channel *c);
 void mlx5e_activate_xsk(struct mlx5e_channel *c);
+7 -7
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
···
 /* Copyright (c) 2019 Mellanox Technologies. */
 
 #include "tx.h"
-#include "umem.h"
+#include "pool.h"
 #include "en/xdp.h"
 #include "en/params.h"
 #include <net/xdp_sock_drv.h>
···
 
 bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
 {
-	struct xdp_umem *umem = sq->umem;
+	struct xsk_buff_pool *pool = sq->xsk_pool;
 	struct mlx5e_xdp_info xdpi;
 	struct mlx5e_xdp_xmit_data xdptxd;
 	bool work_done = true;
···
 			break;
 		}
 
-		if (!xsk_umem_consume_tx(umem, &desc)) {
+		if (!xsk_tx_peek_desc(pool, &desc)) {
 			/* TX will get stuck until something wakes it up by
 			 * triggering NAPI. Currently it's expected that the
 			 * application calls sendto() if there are consumed, but
···
 			break;
 		}
 
-		xdptxd.dma_addr = xsk_buff_raw_get_dma(umem, desc.addr);
-		xdptxd.data = xsk_buff_raw_get_data(umem, desc.addr);
+		xdptxd.dma_addr = xsk_buff_raw_get_dma(pool, desc.addr);
+		xdptxd.data = xsk_buff_raw_get_data(pool, desc.addr);
 		xdptxd.len = desc.len;
 
-		xsk_buff_raw_dma_sync_for_device(umem, xdptxd.dma_addr, xdptxd.len);
+		xsk_buff_raw_dma_sync_for_device(pool, xdptxd.dma_addr, xdptxd.len);
 
 		ret = INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe,
 				      mlx5e_xmit_xdp_frame, sq, &xdptxd, &xdpi, check_result);
···
 		mlx5e_xdp_mpwqe_complete(sq);
 		mlx5e_xmit_xdp_doorbell(sq);
 
-		xsk_umem_consume_tx_done(umem);
+		xsk_tx_release(pool);
 	}
 
 	return !(budget && work_done);
+3 -3
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h
···
 
 static inline void mlx5e_xsk_update_tx_wakeup(struct mlx5e_xdpsq *sq)
 {
-	if (!xsk_umem_uses_need_wakeup(sq->umem))
+	if (!xsk_uses_need_wakeup(sq->xsk_pool))
 		return;
 
 	if (sq->pc != sq->cc)
-		xsk_clear_tx_need_wakeup(sq->umem);
+		xsk_clear_tx_need_wakeup(sq->xsk_pool);
 	else
-		xsk_set_tx_need_wakeup(sq->umem);
+		xsk_set_tx_need_wakeup(sq->xsk_pool);
 }
 
 #endif /* __MLX5_EN_XSK_TX_H__ */
+55 -55
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c → drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
···
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
-/* Copyright (c) 2019 Mellanox Technologies. */
+/* Copyright (c) 2019-2020, Mellanox Technologies inc. All rights reserved. */
 
 #include <net/xdp_sock_drv.h>
-#include "umem.h"
+#include "pool.h"
 #include "setup.h"
 #include "en/params.h"
 
-static int mlx5e_xsk_map_umem(struct mlx5e_priv *priv,
-			      struct xdp_umem *umem)
+static int mlx5e_xsk_map_pool(struct mlx5e_priv *priv,
+			      struct xsk_buff_pool *pool)
 {
 	struct device *dev = priv->mdev->device;
 
-	return xsk_buff_dma_map(umem, dev, 0);
+	return xsk_pool_dma_map(pool, dev, 0);
 }
 
-static void mlx5e_xsk_unmap_umem(struct mlx5e_priv *priv,
-				 struct xdp_umem *umem)
+static void mlx5e_xsk_unmap_pool(struct mlx5e_priv *priv,
+				 struct xsk_buff_pool *pool)
 {
-	return xsk_buff_dma_unmap(umem, 0);
+	return xsk_pool_dma_unmap(pool, 0);
 }
 
-static int mlx5e_xsk_get_umems(struct mlx5e_xsk *xsk)
+static int mlx5e_xsk_get_pools(struct mlx5e_xsk *xsk)
 {
-	if (!xsk->umems) {
-		xsk->umems = kcalloc(MLX5E_MAX_NUM_CHANNELS,
-				     sizeof(*xsk->umems), GFP_KERNEL);
-		if (unlikely(!xsk->umems))
+	if (!xsk->pools) {
+		xsk->pools = kcalloc(MLX5E_MAX_NUM_CHANNELS,
+				     sizeof(*xsk->pools), GFP_KERNEL);
+		if (unlikely(!xsk->pools))
 			return -ENOMEM;
 	}
 
···
 	return 0;
 }
 
-static void mlx5e_xsk_put_umems(struct mlx5e_xsk *xsk)
+static void mlx5e_xsk_put_pools(struct mlx5e_xsk *xsk)
 {
 	if (!--xsk->refcnt) {
-		kfree(xsk->umems);
-		xsk->umems = NULL;
+		kfree(xsk->pools);
+		xsk->pools = NULL;
 	}
 }
 
-static int mlx5e_xsk_add_umem(struct mlx5e_xsk *xsk, struct xdp_umem *umem, u16 ix)
+static int mlx5e_xsk_add_pool(struct mlx5e_xsk *xsk, struct xsk_buff_pool *pool, u16 ix)
 {
 	int err;
 
-	err = mlx5e_xsk_get_umems(xsk);
+	err = mlx5e_xsk_get_pools(xsk);
 	if (unlikely(err))
 		return err;
 
-	xsk->umems[ix] = umem;
+	xsk->pools[ix] = pool;
 	return 0;
 }
 
-static void mlx5e_xsk_remove_umem(struct mlx5e_xsk *xsk, u16 ix)
+static void mlx5e_xsk_remove_pool(struct mlx5e_xsk *xsk, u16 ix)
 {
-	xsk->umems[ix] = NULL;
+	xsk->pools[ix] = NULL;
 
-	mlx5e_xsk_put_umems(xsk);
+	mlx5e_xsk_put_pools(xsk);
 }
 
-static bool mlx5e_xsk_is_umem_sane(struct xdp_umem *umem)
+static bool mlx5e_xsk_is_pool_sane(struct xsk_buff_pool *pool)
 {
-	return xsk_umem_get_headroom(umem) <= 0xffff &&
-	       xsk_umem_get_chunk_size(umem) <= 0xffff;
+	return xsk_pool_get_headroom(pool) <= 0xffff &&
+	       xsk_pool_get_chunk_size(pool) <= 0xffff;
 }
 
-void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk)
+void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk)
 {
-	xsk->headroom = xsk_umem_get_headroom(umem);
-	xsk->chunk_size = xsk_umem_get_chunk_size(umem);
+	xsk->headroom = xsk_pool_get_headroom(pool);
+	xsk->chunk_size = xsk_pool_get_chunk_size(pool);
 }
 
 static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv,
-				   struct xdp_umem *umem, u16 ix)
+				   struct xsk_buff_pool *pool, u16 ix)
 {
 	struct mlx5e_params *params = &priv->channels.params;
 	struct mlx5e_xsk_param xsk;
 	struct mlx5e_channel *c;
 	int err;
 
-	if (unlikely(mlx5e_xsk_get_umem(&priv->channels.params, &priv->xsk, ix)))
+	if (unlikely(mlx5e_xsk_get_pool(&priv->channels.params, &priv->xsk, ix)))
 		return -EBUSY;
 
-	if (unlikely(!mlx5e_xsk_is_umem_sane(umem)))
+	if (unlikely(!mlx5e_xsk_is_pool_sane(pool)))
 		return -EINVAL;
 
-	err = mlx5e_xsk_map_umem(priv, umem);
+	err = mlx5e_xsk_map_pool(priv, pool);
 	if (unlikely(err))
 		return err;
 
-	err = mlx5e_xsk_add_umem(&priv->xsk, umem, ix);
+	err = mlx5e_xsk_add_pool(&priv->xsk, pool, ix);
 	if (unlikely(err))
-		goto err_unmap_umem;
+		goto err_unmap_pool;
 
-	mlx5e_build_xsk_param(umem, &xsk);
+	mlx5e_build_xsk_param(pool, &xsk);
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
 		/* XSK objects will be created on open. */
···
 
 	c = priv->channels.c[ix];
 
-	err = mlx5e_open_xsk(priv, params, &xsk, umem, c);
+	err = mlx5e_open_xsk(priv, params, &xsk, pool, c);
 	if (unlikely(err))
-		goto err_remove_umem;
+		goto err_remove_pool;
 
 	mlx5e_activate_xsk(c);
 
···
 	mlx5e_deactivate_xsk(c);
 	mlx5e_close_xsk(c);
 
-err_remove_umem:
-	mlx5e_xsk_remove_umem(&priv->xsk, ix);
+err_remove_pool:
+	mlx5e_xsk_remove_pool(&priv->xsk, ix);
 
-err_unmap_umem:
-	mlx5e_xsk_unmap_umem(priv, umem);
+err_unmap_pool:
+	mlx5e_xsk_unmap_pool(priv, pool);
 
 	return err;
 
···
 	 */
 	if (!mlx5e_validate_xsk_param(params, &xsk, priv->mdev)) {
 		err = -EINVAL;
-		goto err_remove_umem;
+		goto err_remove_pool;
 	}
 
 	return 0;
···
 
 static int mlx5e_xsk_disable_locked(struct mlx5e_priv *priv, u16 ix)
 {
-	struct xdp_umem *umem = mlx5e_xsk_get_umem(&priv->channels.params,
-						   &priv->xsk, ix);
+	struct xsk_buff_pool *pool = mlx5e_xsk_get_pool(&priv->channels.params,
+							&priv->xsk, ix);
 	struct mlx5e_channel *c;
 
-	if (unlikely(!umem))
+	if (unlikely(!pool))
 		return -EINVAL;
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
-		goto remove_umem;
+		goto remove_pool;
 
 	/* XSK RQ and SQ are only created if XDP program is set. */
 	if (!priv->channels.params.xdp_prog)
-		goto remove_umem;
+		goto remove_pool;
 
 	c = priv->channels.c[ix];
 	mlx5e_xsk_redirect_rqt_to_drop(priv, ix);
 	mlx5e_deactivate_xsk(c);
 	mlx5e_close_xsk(c);
 
-remove_umem:
-	mlx5e_xsk_remove_umem(&priv->xsk, ix);
-	mlx5e_xsk_unmap_umem(priv, umem);
+remove_pool:
+	mlx5e_xsk_remove_pool(&priv->xsk, ix);
+	mlx5e_xsk_unmap_pool(priv, pool);
 
 	return 0;
 }
 
-static int mlx5e_xsk_enable_umem(struct mlx5e_priv *priv, struct xdp_umem *umem,
+static int mlx5e_xsk_enable_pool(struct mlx5e_priv *priv, struct xsk_buff_pool *pool,
 				 u16 ix)
 {
 	int err;
 
 	mutex_lock(&priv->state_lock);
-	err = mlx5e_xsk_enable_locked(priv, umem, ix);
+	err = mlx5e_xsk_enable_locked(priv, pool, ix);
 	mutex_unlock(&priv->state_lock);
 
 	return err;
 }
 
-static int mlx5e_xsk_disable_umem(struct mlx5e_priv *priv, u16 ix)
+static int mlx5e_xsk_disable_pool(struct mlx5e_priv *priv, u16 ix)
 {
 	int err;
 
···
 	return err;
 }
 
-int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid)
+int mlx5e_xsk_setup_pool(struct net_device *dev, struct xsk_buff_pool *pool, u16 qid)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
 	struct mlx5e_params *params = &priv->channels.params;
···
 	if (unlikely(!mlx5e_qid_get_ch_if_in_group(params, qid, MLX5E_RQ_GROUP_XSK, &ix)))
 		return -EINVAL;
 
-	return umem ? mlx5e_xsk_enable_umem(priv, umem, ix) :
-		      mlx5e_xsk_disable_umem(priv, ix);
+	return pool ? mlx5e_xsk_enable_pool(priv, pool, ix) :
+		      mlx5e_xsk_disable_pool(priv, ix);
 }
-29
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
···
-/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
-/* Copyright (c) 2019 Mellanox Technologies. */
-
-#ifndef __MLX5_EN_XSK_UMEM_H__
-#define __MLX5_EN_XSK_UMEM_H__
-
-#include "en.h"
-
-static inline struct xdp_umem *mlx5e_xsk_get_umem(struct mlx5e_params *params,
-						  struct mlx5e_xsk *xsk, u16 ix)
-{
-	if (!xsk || !xsk->umems)
-		return NULL;
-
-	if (unlikely(ix >= params->num_channels))
-		return NULL;
-
-	return xsk->umems[ix];
-}
-
-struct mlx5e_xsk_param;
-void mlx5e_build_xsk_param(struct xdp_umem *umem, struct mlx5e_xsk_param *xsk);
-
-/* .ndo_bpf callback. */
-int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid);
-
-int mlx5e_xsk_resize_reuseq(struct xdp_umem *umem, u32 nentries);
-
-#endif /* __MLX5_EN_XSK_UMEM_H__ */
+1 -1
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
···
 
 #include "en.h"
 #include "en/port.h"
-#include "en/xsk/umem.h"
+#include "en/xsk/pool.h"
 #include "lib/clock.h"
 
 void mlx5e_ethtool_get_drvinfo(struct mlx5e_priv *priv,
+1 -1
drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
···
 #include <linux/mlx5/fs.h>
 #include "en.h"
 #include "en/params.h"
-#include "en/xsk/umem.h"
+#include "en/xsk/pool.h"
 
 struct mlx5e_ethtool_rule {
 	struct list_head list;
+25 -24
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
···
 #include "en/monitor_stats.h"
 #include "en/health.h"
 #include "en/params.h"
-#include "en/xsk/umem.h"
+#include "en/xsk/pool.h"
 #include "en/xsk/setup.h"
 #include "en/xsk/rx.h"
 #include "en/xsk/tx.h"
···
 static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 			  struct mlx5e_params *params,
 			  struct mlx5e_xsk_param *xsk,
-			  struct xdp_umem *umem,
+			  struct xsk_buff_pool *xsk_pool,
 			  struct mlx5e_rq_param *rqp,
 			  struct mlx5e_rq *rq)
 {
···
 	rq->mdev = mdev;
 	rq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu);
 	rq->xdpsq = &c->rq_xdpsq;
-	rq->umem = umem;
+	rq->xsk_pool = xsk_pool;
 
-	if (rq->umem)
+	if (rq->xsk_pool)
 		rq->stats = &c->priv->channel_stats[c->ix].xskrq;
 	else
 		rq->stats = &c->priv->channel_stats[c->ix].rq;
···
 	if (xsk) {
 		err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
 						 MEM_TYPE_XSK_BUFF_POOL, NULL);
-		xsk_buff_set_rxq_info(rq->umem, &rq->xdp_rxq);
+		xsk_pool_set_rxq_info(rq->xsk_pool, &rq->xdp_rxq);
 	} else {
 		/* Create a page_pool and register it with rxq */
 		pp_params.order = 0;
···
 
 int mlx5e_open_rq(struct mlx5e_channel *c, struct mlx5e_params *params,
 		  struct mlx5e_rq_param *param, struct mlx5e_xsk_param *xsk,
-		  struct xdp_umem *umem, struct mlx5e_rq *rq)
+		  struct xsk_buff_pool *xsk_pool, struct mlx5e_rq *rq)
 {
 	int err;
 
-	err = mlx5e_alloc_rq(c, params, xsk, umem, param, rq);
+	err = mlx5e_alloc_rq(c, params, xsk, xsk_pool, param, rq);
 	if (err)
 		return err;
 
···
 
 static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c,
 			     struct mlx5e_params *params,
-			     struct xdp_umem *umem,
+			     struct xsk_buff_pool *xsk_pool,
 			     struct mlx5e_sq_param *param,
 			     struct mlx5e_xdpsq *sq,
 			     bool is_redirect)
···
 	sq->uar_map = mdev->mlx5e_res.bfreg.map;
 	sq->min_inline_mode = params->tx_min_inline_mode;
 	sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu);
-	sq->umem = umem;
+	sq->xsk_pool = xsk_pool;
 
-	sq->stats = sq->umem ?
+	sq->stats = sq->xsk_pool ?
 		&c->priv->channel_stats[c->ix].xsksq :
 		is_redirect ?
 			&c->priv->channel_stats[c->ix].xdpsq :
···
 }
 
 int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
-		     struct mlx5e_sq_param *param, struct xdp_umem *umem,
+		     struct mlx5e_sq_param *param, struct xsk_buff_pool *xsk_pool,
 		     struct mlx5e_xdpsq *sq, bool is_redirect)
 {
 	struct mlx5e_create_sq_param csp = {};
 	int err;
 
-	err = mlx5e_alloc_xdpsq(c, params, umem, param, sq, is_redirect);
+	err = mlx5e_alloc_xdpsq(c, params, xsk_pool, param, sq, is_redirect);
 	if (err)
 		return err;
 
···
 static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 			      struct mlx5e_params *params,
 			      struct mlx5e_channel_param *cparam,
-			      struct xdp_umem *umem,
+			      struct xsk_buff_pool *xsk_pool,
 			      struct mlx5e_channel **cp)
 {
 	int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
···
 	if (unlikely(err))
 		goto err_napi_del;
 
-	if (umem) {
-		mlx5e_build_xsk_param(umem, &xsk);
-		err = mlx5e_open_xsk(priv, params, &xsk, umem, c);
+	if (xsk_pool) {
+		mlx5e_build_xsk_param(xsk_pool, &xsk);
+		err = mlx5e_open_xsk(priv, params, &xsk, xsk_pool, c);
 		if (unlikely(err))
 			goto err_close_queues;
 	}
···
 
 	mlx5e_build_channel_param(priv, &chs->params, cparam);
 	for (i = 0; i < chs->num; i++) {
-		struct xdp_umem *umem = NULL;
+		struct xsk_buff_pool *xsk_pool = NULL;
 
 		if (chs->params.xdp_prog)
-			umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, i);
+			xsk_pool = mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, i);
 
-		err = mlx5e_open_channel(priv, i, &chs->params, cparam, umem, &chs->c[i]);
+		err = mlx5e_open_channel(priv, i, &chs->params, cparam, xsk_pool, &chs->c[i]);
 		if (err)
 			goto err_close_channels;
 	}
···
 	u16 ix;
 
 	for (ix = 0; ix < chs->params.num_channels; ix++) {
-		struct xdp_umem *umem = mlx5e_xsk_get_umem(&chs->params, chs->params.xsk, ix);
+		struct xsk_buff_pool *xsk_pool =
+			mlx5e_xsk_get_pool(&chs->params, chs->params.xsk, ix);
 		struct mlx5e_xsk_param xsk;
 
-		if (!umem)
+		if (!xsk_pool)
 			continue;
 
-		mlx5e_build_xsk_param(umem, &xsk);
+		mlx5e_build_xsk_param(xsk_pool, &xsk);
 
 		if (!mlx5e_validate_xsk_param(new_params, &xsk, mdev)) {
 			u32 hr = mlx5e_get_linear_rq_headroom(new_params, &xsk);
···
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
 		return mlx5e_xdp_set(dev, xdp->prog);
-	case XDP_SETUP_XSK_UMEM:
-		return mlx5e_xsk_setup_umem(dev, xdp->xsk.umem,
+	case XDP_SETUP_XSK_POOL:
+		return mlx5e_xsk_setup_pool(dev, xdp->xsk.pool,
 					    xdp->xsk.queue_id);
 	default:
 		return -EINVAL;
+8 -8
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
···
 static inline int mlx5e_page_alloc(struct mlx5e_rq *rq,
 				   struct mlx5e_dma_info *dma_info)
 {
-	if (rq->umem)
-		return mlx5e_xsk_page_alloc_umem(rq, dma_info);
+	if (rq->xsk_pool)
+		return mlx5e_xsk_page_alloc_pool(rq, dma_info);
 	else
 		return mlx5e_page_alloc_pool(rq, dma_info);
 }
···
 				      struct mlx5e_dma_info *dma_info,
 				      bool recycle)
 {
-	if (rq->umem)
+	if (rq->xsk_pool)
 		/* The `recycle` parameter is ignored, and the page is always
 		 * put into the Reuse Ring, because there is no way to return
 		 * the page to the userspace when the interface goes down.
···
 	int err;
 	int i;
 
-	if (rq->umem) {
+	if (rq->xsk_pool) {
 		int pages_desired = wqe_bulk << rq->wqe.info.log_num_frags;
 
 		/* Check in advance that we have enough frames, instead of
 		 * allocating one-by-one, failing and moving frames to the
 		 * Reuse Ring.
 		 */
-		if (unlikely(!xsk_buff_can_alloc(rq->umem, pages_desired)))
+		if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool, pages_desired)))
 			return -ENOMEM;
 	}
 
···
 	/* Check in advance that we have enough frames, instead of allocating
 	 * one-by-one, failing and moving frames to the Reuse Ring.
 	 */
-	if (rq->umem &&
-	    unlikely(!xsk_buff_can_alloc(rq->umem, MLX5_MPWRQ_PAGES_PER_WQE))) {
+	if (rq->xsk_pool &&
+	    unlikely(!xsk_buff_can_alloc(rq->xsk_pool, MLX5_MPWRQ_PAGES_PER_WQE))) {
 		err = -ENOMEM;
 		goto err;
 	}
···
 	 * the driver when it refills the Fill Ring.
 	 * 2. Otherwise, busy poll by rescheduling the NAPI poll.
 	 */
-	if (unlikely(alloc_err == -ENOMEM && rq->umem))
+	if (unlikely(alloc_err == -ENOMEM && rq->xsk_pool))
 		return true;
 
 	return false;
-18
drivers/net/tun.c
···
 	__be16 h_vlan_TCI;
 };
 
-bool tun_is_xdp_frame(void *ptr)
-{
-	return (unsigned long)ptr & TUN_XDP_FLAG;
-}
-EXPORT_SYMBOL(tun_is_xdp_frame);
-
-void *tun_xdp_to_ptr(void *ptr)
-{
-	return (void *)((unsigned long)ptr | TUN_XDP_FLAG);
-}
-EXPORT_SYMBOL(tun_xdp_to_ptr);
-
-void *tun_ptr_to_xdp(void *ptr)
-{
-	return (void *)((unsigned long)ptr & ~TUN_XDP_FLAG);
-}
-EXPORT_SYMBOL(tun_ptr_to_xdp);
-
 static int tun_napi_receive(struct napi_struct *napi, int budget)
 {
 	struct tun_file *tfile = container_of(napi, struct tun_file, napi);
+3 -3
drivers/net/veth.c
···
 	return (unsigned long)ptr & VETH_XDP_FLAG;
 }
 
-static void *veth_ptr_to_xdp(void *ptr)
+static struct xdp_frame *veth_ptr_to_xdp(void *ptr)
 {
 	return (void *)((unsigned long)ptr & ~VETH_XDP_FLAG);
 }
 
-static void *veth_xdp_to_ptr(void *ptr)
+static void *veth_xdp_to_ptr(struct xdp_frame *xdp)
 {
-	return (void *)((unsigned long)ptr | VETH_XDP_FLAG);
+	return (void *)((unsigned long)xdp | VETH_XDP_FLAG);
 }
 
 static void veth_ptr_free(void *ptr)
+25
include/linux/bpf-cgroup.h
···
 #define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr)			\
 	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_UDP6_RECVMSG, NULL)
 
+/* The SOCK_OPS"_SK" macro should be used when sock_ops->sk is not a
+ * fullsock and its parent fullsock cannot be traced by
+ * sk_to_full_sk().
+ *
+ * e.g. sock_ops->sk is a request_sock and it is under syncookie mode.
+ * Its listener-sk is not attached to the rsk_listener.
+ * In this case, the caller holds the listener-sk (unlocked),
+ * set its sock_ops->sk to req_sk, and call this SOCK_OPS"_SK" with
+ * the listener-sk such that the cgroup-bpf-progs of the
+ * listener-sk will be run.
+ *
+ * Regardless of syncookie mode or not,
+ * calling bpf_setsockopt on listener-sk will not make sense anyway,
+ * so passing 'sock_ops->sk == req_sk' to the bpf prog is appropriate here.
+ */
+#define BPF_CGROUP_RUN_PROG_SOCK_OPS_SK(sock_ops, sk)			\
+({									\
+	int __ret = 0;							\
+	if (cgroup_bpf_enabled)						\
+		__ret = __cgroup_bpf_run_filter_sock_ops(sk,		\
+							 sock_ops,	\
+							 BPF_CGROUP_SOCK_OPS); \
+	__ret;								\
+})
+
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops)				\
 ({									\
 	int __ret = 0;							\
+52
include/linux/bpf.h
···
 struct exception_table_entry;
 struct seq_operations;
 struct bpf_iter_aux_info;
+struct bpf_local_storage;
+struct bpf_local_storage_map;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
···
 	int (*map_mmap)(struct bpf_map *map, struct vm_area_struct *vma);
 	__poll_t (*map_poll)(struct bpf_map *map, struct file *filp,
 			     struct poll_table_struct *pts);
+
+	/* Functions called by bpf_local_storage maps */
+	int (*map_local_storage_charge)(struct bpf_local_storage_map *smap,
+					void *owner, u32 size);
+	void (*map_local_storage_uncharge)(struct bpf_local_storage_map *smap,
+					   void *owner, u32 size);
+	struct bpf_local_storage __rcu ** (*map_owner_storage_ptr)(void *owner);
+
+	/* map_meta_equal must be implemented for maps that can be
+	 * used as an inner map.  It is a runtime check to ensure
+	 * an inner map can be inserted to an outer map.
+	 *
+	 * Some properties of the inner map has been used during the
+	 * verification time.  When inserting an inner map at the runtime,
+	 * map_meta_equal has to ensure the inserting map has the same
+	 * properties that the verifier has used earlier.
+	 */
+	bool (*map_meta_equal)(const struct bpf_map *meta0,
+			       const struct bpf_map *meta1);
 
 	/* BTF name and id of struct allocated by map_alloc */
 	const char * const map_btf_name;
···
 				  const struct btf_type *key_type,
 				  const struct btf_type *value_type);
 
+bool bpf_map_meta_equal(const struct bpf_map *meta0,
+			const struct bpf_map *meta1);
+
 extern const struct bpf_map_ops bpf_map_offload_ops;
 
 /* function argument constraints */
···
 	 * for this argument.
 	 */
 	int *ret_btf_id; /* return value btf_id */
+	bool (*allowed)(const struct bpf_prog *prog);
 };
 
 /* bpf_context is intentionally undefined structure. Pointer to bpf_context is
···
 /* these two functions are called from generated trampoline */
 u64 notrace __bpf_prog_enter(void);
 void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start);
+void notrace __bpf_prog_enter_sleepable(void);
+void notrace __bpf_prog_exit_sleepable(void);
 
 struct bpf_ksym {
 	unsigned long start;
···
 	bool offload_requested;
 	bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
 	bool func_proto_unreliable;
+	bool sleepable;
 	enum bpf_tramp_prog_type trampoline_prog_type;
 	struct bpf_trampoline *trampoline;
 	struct hlist_node tramp_hlist;
···
 					union bpf_iter_link_info *linfo,
 					struct bpf_iter_aux_info *aux);
 typedef void (*bpf_iter_detach_target_t)(struct bpf_iter_aux_info *aux);
+typedef void (*bpf_iter_show_fdinfo_t) (const struct bpf_iter_aux_info *aux,
+					struct seq_file *seq);
+typedef int (*bpf_iter_fill_link_info_t)(const struct bpf_iter_aux_info *aux,
+					 struct bpf_link_info *info);
 
 #define BPF_ITER_CTX_ARG_MAX 2
 struct bpf_iter_reg {
 	const char *target;
 	bpf_iter_attach_target_t attach_target;
 	bpf_iter_detach_target_t detach_target;
+	bpf_iter_show_fdinfo_t show_fdinfo;
+	bpf_iter_fill_link_info_t fill_link_info;
 	u32 ctx_arg_info_size;
 	struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX];
 	const struct bpf_iter_seq_info *seq_info;
···
 bool bpf_link_is_iter(struct bpf_link *link);
 struct bpf_prog *bpf_iter_get_info(struct bpf_iter_meta *meta, bool in_stop);
 int bpf_iter_run_prog(struct bpf_prog *prog, void *ctx);
+void bpf_iter_map_show_fdinfo(const struct bpf_iter_aux_info *aux,
+			      struct seq_file *seq);
+int bpf_iter_map_fill_link_info(const struct bpf_iter_aux_info *aux,
+				struct bpf_link_info *info);
 
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
···
 		      const struct btf_type *t, int off, int size,
 		      enum bpf_access_type atype,
 		      u32 *next_btf_id);
+bool btf_struct_ids_match(struct bpf_verifier_log *log,
+			  int off, u32 id, u32 need_type_id);
 int btf_resolve_helper_id(struct bpf_verifier_log *log,
 			  const struct bpf_func_proto *fn, int);
 
···
 		    struct btf *btf, const struct btf_type *t);
 
 struct bpf_prog *bpf_prog_by_id(u32 id);
+struct bpf_link *bpf_link_by_id(u32 id);
 
 const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
 #else /* !CONFIG_BPF_SYSCALL */
···
 			 struct bpf_prog *old, u32 which);
 int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog);
 int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype);
+int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags);
 void sock_map_unhash(struct sock *sk);
 void sock_map_close(struct sock *sk, long timeout);
 #else
···
 
 static inline int sock_map_prog_detach(const union bpf_attr *attr,
 				       enum bpf_prog_type ptype)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value,
+					   u64 flags)
 {
 	return -EOPNOTSUPP;
 }
···
 extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto;
+extern const struct bpf_func_proto bpf_copy_from_user_proto;
 
 const struct bpf_func_proto *bpf_tracing_func_proto(
 	enum bpf_func_id func_id, const struct bpf_prog *prog);
···
 
 int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 		       void *addr1, void *addr2);
+
+struct btf_id_set;
+bool btf_id_set_contains(struct btf_id_set *set, u32 id);
 
 #endif /* _LINUX_BPF_H */
+163
include/linux/bpf_local_storage.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * Copyright (c) 2019 Facebook 4 + * Copyright 2020 Google LLC. 5 + */ 6 + 7 + #ifndef _BPF_LOCAL_STORAGE_H 8 + #define _BPF_LOCAL_STORAGE_H 9 + 10 + #include <linux/bpf.h> 11 + #include <linux/rculist.h> 12 + #include <linux/list.h> 13 + #include <linux/hash.h> 14 + #include <linux/types.h> 15 + #include <uapi/linux/btf.h> 16 + 17 + #define BPF_LOCAL_STORAGE_CACHE_SIZE 16 18 + 19 + struct bpf_local_storage_map_bucket { 20 + struct hlist_head list; 21 + raw_spinlock_t lock; 22 + }; 23 + 24 + /* Thp map is not the primary owner of a bpf_local_storage_elem. 25 + * Instead, the container object (eg. sk->sk_bpf_storage) is. 26 + * 27 + * The map (bpf_local_storage_map) is for two purposes 28 + * 1. Define the size of the "local storage". It is 29 + * the map's value_size. 30 + * 31 + * 2. Maintain a list to keep track of all elems such 32 + * that they can be cleaned up during the map destruction. 33 + * 34 + * When a bpf local storage is being looked up for a 35 + * particular object, the "bpf_map" pointer is actually used 36 + * as the "key" to search in the list of elem in 37 + * the respective bpf_local_storage owned by the object. 38 + * 39 + * e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer 40 + * as the searching key. 41 + */ 42 + struct bpf_local_storage_map { 43 + struct bpf_map map; 44 + /* Lookup elem does not require accessing the map. 45 + * 46 + * Updating/Deleting requires a bucket lock to 47 + * link/unlink the elem from the map. Having 48 + * multiple buckets to improve contention. 49 + */ 50 + struct bpf_local_storage_map_bucket *buckets; 51 + u32 bucket_log; 52 + u16 elem_size; 53 + u16 cache_idx; 54 + }; 55 + 56 + struct bpf_local_storage_data { 57 + /* smap is used as the searching key when looking up 58 + * from the object's bpf_local_storage. 
59 + * 60 + * Put it in the same cacheline as the data to minimize 61 + * the number of cacheline accesses in the cache-hit case. 62 + */ 63 + struct bpf_local_storage_map __rcu *smap; 64 + u8 data[] __aligned(8); 65 + }; 66 + 67 + /* Linked to bpf_local_storage and bpf_local_storage_map */ 68 + struct bpf_local_storage_elem { 69 + struct hlist_node map_node; /* Linked to bpf_local_storage_map */ 70 + struct hlist_node snode; /* Linked to bpf_local_storage */ 71 + struct bpf_local_storage __rcu *local_storage; 72 + struct rcu_head rcu; 73 + /* 8 bytes hole */ 74 + /* The data is stored in another cacheline to minimize 75 + * the number of cacheline accesses during a cache hit. 76 + */ 77 + struct bpf_local_storage_data sdata ____cacheline_aligned; 78 + }; 79 + 80 + struct bpf_local_storage { 81 + struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE]; 82 + struct hlist_head list; /* List of bpf_local_storage_elem */ 83 + void *owner; /* The object that owns the above "list" of 84 + * bpf_local_storage_elem. 85 + */ 86 + struct rcu_head rcu; 87 + raw_spinlock_t lock; /* Protect adding/removing from the "list" */ 88 + }; 89 + 90 + /* U16_MAX is much more than enough for sk local storage 91 + * considering a tcp_sock is ~2k. 
92 + */ 93 + #define BPF_LOCAL_STORAGE_MAX_VALUE_SIZE \ 94 + min_t(u32, \ 95 + (KMALLOC_MAX_SIZE - MAX_BPF_STACK - \ 96 + sizeof(struct bpf_local_storage_elem)), \ 97 + (U16_MAX - sizeof(struct bpf_local_storage_elem))) 98 + 99 + #define SELEM(_SDATA) \ 100 + container_of((_SDATA), struct bpf_local_storage_elem, sdata) 101 + #define SDATA(_SELEM) (&(_SELEM)->sdata) 102 + 103 + #define BPF_LOCAL_STORAGE_CACHE_SIZE 16 104 + 105 + struct bpf_local_storage_cache { 106 + spinlock_t idx_lock; 107 + u64 idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE]; 108 + }; 109 + 110 + #define DEFINE_BPF_STORAGE_CACHE(name) \ 111 + static struct bpf_local_storage_cache name = { \ 112 + .idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock), \ 113 + } 114 + 115 + u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache); 116 + void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache, 117 + u16 idx); 118 + 119 + /* Helper functions for bpf_local_storage */ 120 + int bpf_local_storage_map_alloc_check(union bpf_attr *attr); 121 + 122 + struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr); 123 + 124 + struct bpf_local_storage_data * 125 + bpf_local_storage_lookup(struct bpf_local_storage *local_storage, 126 + struct bpf_local_storage_map *smap, 127 + bool cacheit_lockit); 128 + 129 + void bpf_local_storage_map_free(struct bpf_local_storage_map *smap); 130 + 131 + int bpf_local_storage_map_check_btf(const struct bpf_map *map, 132 + const struct btf *btf, 133 + const struct btf_type *key_type, 134 + const struct btf_type *value_type); 135 + 136 + void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, 137 + struct bpf_local_storage_elem *selem); 138 + 139 + bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, 140 + struct bpf_local_storage_elem *selem, 141 + bool uncharge_omem); 142 + 143 + void bpf_selem_unlink(struct bpf_local_storage_elem *selem); 144 + 145 + void bpf_selem_link_map(struct 
bpf_local_storage_map *smap, 146 + struct bpf_local_storage_elem *selem); 147 + 148 + void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem); 149 + 150 + struct bpf_local_storage_elem * 151 + bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value, 152 + bool charge_mem); 153 + 154 + int 155 + bpf_local_storage_alloc(void *owner, 156 + struct bpf_local_storage_map *smap, 157 + struct bpf_local_storage_elem *first_selem); 158 + 159 + struct bpf_local_storage_data * 160 + bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, 161 + void *value, u64 map_flags); 162 + 163 + #endif /* _BPF_LOCAL_STORAGE_H */
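The comment at the top of this new header describes the lookup scheme: the bpf_local_storage_map pointer itself acts as the search key inside the owner's list of elems. A minimal userspace sketch of that idea (the toy_* names are made up for illustration and are not the kernel API):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the scheme above: each elem records which map it belongs
 * to, and lookup walks the owner's list comparing map pointers. */
struct toy_map { int value_size; };

struct toy_elem {
	struct toy_map *smap;  /* the "key": which map owns this value */
	long data;             /* the local-storage value itself */
	struct toy_elem *next; /* owner's list of elems */
};

struct toy_storage { struct toy_elem *list; };

static struct toy_elem *toy_lookup(struct toy_storage *st, struct toy_map *smap)
{
	struct toy_elem *e;

	for (e = st->list; e; e = e->next)
		if (e->smap == smap)  /* pointer comparison, as in the kernel */
			return e;
	return NULL;
}
```

The kernel additionally keeps a small per-owner cache array (BPF_LOCAL_STORAGE_CACHE_SIZE slots) so the common case avoids the list walk entirely.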
+29
include/linux/bpf_lsm.h
··· 17 17 #include <linux/lsm_hook_defs.h> 18 18 #undef LSM_HOOK 19 19 20 + struct bpf_storage_blob { 21 + struct bpf_local_storage __rcu *storage; 22 + }; 23 + 24 + extern struct lsm_blob_sizes bpf_lsm_blob_sizes; 25 + 20 26 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, 21 27 const struct bpf_prog *prog); 28 + 29 + static inline struct bpf_storage_blob *bpf_inode( 30 + const struct inode *inode) 31 + { 32 + if (unlikely(!inode->i_security)) 33 + return NULL; 34 + 35 + return inode->i_security + bpf_lsm_blob_sizes.lbs_inode; 36 + } 37 + 38 + extern const struct bpf_func_proto bpf_inode_storage_get_proto; 39 + extern const struct bpf_func_proto bpf_inode_storage_delete_proto; 40 + void bpf_inode_storage_free(struct inode *inode); 22 41 23 42 #else /* !CONFIG_BPF_LSM */ 24 43 ··· 45 26 const struct bpf_prog *prog) 46 27 { 47 28 return -EOPNOTSUPP; 29 + } 30 + 31 + static inline struct bpf_storage_blob *bpf_inode( 32 + const struct inode *inode) 33 + { 34 + return NULL; 35 + } 36 + 37 + static inline void bpf_inode_storage_free(struct inode *inode) 38 + { 48 39 } 49 40 50 41 #endif /* CONFIG_BPF_LSM */
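bpf_inode() above finds the BPF LSM's per-inode data by adding a fixed offset (bpf_lsm_blob_sizes.lbs_inode) into the shared i_security blob. A userspace sketch of that offset-into-blob pattern, with an invented offset and _sketch struct names:

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-ins for lsm_blob_sizes and bpf_storage_blob. */
struct blob_sizes_sketch { size_t lbs_inode; };
static const struct blob_sizes_sketch sizes = { .lbs_inode = 16 };

struct storage_sketch { int storage; };

/* Mirrors the shape of bpf_inode(): NULL blob -> NULL,
 * otherwise this LSM's slot sits at a fixed offset into the blob. */
static struct storage_sketch *blob_slot(void *security_blob)
{
	if (!security_blob)
		return NULL;
	return (struct storage_sketch *)((char *)security_blob + sizes.lbs_inode);
}
```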
+3
include/linux/bpf_types.h
··· 107 107 BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops) 108 108 BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops) 109 109 #endif 110 + #ifdef CONFIG_BPF_LSM 111 + BPF_MAP_TYPE(BPF_MAP_TYPE_INODE_STORAGE, inode_storage_map_ops) 112 + #endif 110 113 BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops) 111 114 #if defined(CONFIG_XDP_SOCKETS) 112 115 BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops)
+1 -2
include/linux/btf.h
··· 64 64 u32 id, u32 *res_id); 65 65 const struct btf_type * 66 66 btf_resolve_size(const struct btf *btf, const struct btf_type *type, 67 - u32 *type_size, const struct btf_type **elem_type, 68 - u32 *total_nelems); 67 + u32 *type_size); 69 68 70 69 #define for_each_member(i, struct_type, member) \ 71 70 for (i = 0, member = btf_type_member(struct_type); \
+50 -1
include/linux/btf_ids.h
··· 3 3 #ifndef _LINUX_BTF_IDS_H 4 4 #define _LINUX_BTF_IDS_H 5 5 6 + struct btf_id_set { 7 + u32 cnt; 8 + u32 ids[]; 9 + }; 10 + 6 11 #ifdef CONFIG_DEBUG_INFO_BTF 7 12 8 13 #include <linux/compiler.h> /* for __PASTE */ ··· 67 62 ".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ 68 63 "." #scope " " #name "; \n" \ 69 64 #name ":; \n" \ 70 - ".popsection; \n"); \ 65 + ".popsection; \n"); 71 66 72 67 #define BTF_ID_LIST(name) \ 73 68 __BTF_ID_LIST(name, local) \ ··· 93 88 ".zero 4 \n" \ 94 89 ".popsection; \n"); 95 90 91 + /* 92 + * The BTF_SET_START/END macro pair defines a sorted list of 93 + * BTF IDs plus its member count, with the following layout: 94 + * 95 + * BTF_SET_START(list) 96 + * BTF_ID(type1, name1) 97 + * BTF_ID(type2, name2) 98 + * BTF_SET_END(list) 99 + * 100 + * __BTF_ID__set__list: 101 + * .zero 4 102 + * list: 103 + * __BTF_ID__type1__name1__3: 104 + * .zero 4 105 + * __BTF_ID__type2__name2__4: 106 + * .zero 4 107 + * 108 + */ 109 + #define __BTF_SET_START(name, scope) \ 110 + asm( \ 111 + ".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ 112 + "." 
#scope " __BTF_ID__set__" #name "; \n" \ 113 + "__BTF_ID__set__" #name ":; \n" \ 114 + ".zero 4 \n" \ 115 + ".popsection; \n"); 116 + 117 + #define BTF_SET_START(name) \ 118 + __BTF_ID_LIST(name, local) \ 119 + __BTF_SET_START(name, local) 120 + 121 + #define BTF_SET_START_GLOBAL(name) \ 122 + __BTF_ID_LIST(name, globl) \ 123 + __BTF_SET_START(name, globl) 124 + 125 + #define BTF_SET_END(name) \ 126 + asm( \ 127 + ".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ 128 + ".size __BTF_ID__set__" #name ", .-" #name " \n" \ 129 + ".popsection; \n"); \ 130 + extern struct btf_id_set name; 131 + 96 132 #else 97 133 98 134 #define BTF_ID_LIST(name) static u32 name[5]; 99 135 #define BTF_ID(prefix, name) 100 136 #define BTF_ID_UNUSED 101 137 #define BTF_ID_LIST_GLOBAL(name) u32 name[1]; 138 + #define BTF_SET_START(name) static struct btf_id_set name = { 0 }; 139 + #define BTF_SET_START_GLOBAL(name) static struct btf_id_set name = { 0 }; 140 + #define BTF_SET_END(name) 102 141 103 142 #endif /* CONFIG_DEBUG_INFO_BTF */ 104 143
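Given the layout the BTF_SET_START/END pair emits (a 4-byte count followed by IDs that resolve_btfids sorts at build time), the btf_id_set_contains() declared in bpf.h can be a plain binary search. A hedged userspace sketch of that idea; the _sketch names are invented and a fixed-size array stands in for the kernel's flexible array:

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for struct btf_id_set: count + ascending IDs. */
struct btf_id_set_sketch {
	unsigned int cnt;
	unsigned int ids[8];
};

static int cmp_id(const void *a, const void *b)
{
	unsigned int x = *(const unsigned int *)a;
	unsigned int y = *(const unsigned int *)b;

	return x < y ? -1 : x > y;
}

/* Binary search over the sorted ids[] array, as the sorted layout allows. */
static int set_contains(const struct btf_id_set_sketch *set, unsigned int id)
{
	return bsearch(&id, set->ids, set->cnt, sizeof(id), cmp_id) != NULL;
}
```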
+6 -2
include/linux/filter.h
··· 1236 1236 1237 1237 struct bpf_sock_ops_kern { 1238 1238 struct sock *sk; 1239 - u32 op; 1240 1239 union { 1241 1240 u32 args[4]; 1242 1241 u32 reply; 1243 1242 u32 replylong[4]; 1244 1243 }; 1245 - u32 is_fullsock; 1244 + struct sk_buff *syn_skb; 1245 + struct sk_buff *skb; 1246 + void *skb_data_end; 1247 + u8 op; 1248 + u8 is_fullsock; 1249 + u8 remaining_opt_len; 1246 1250 u64 temp; /* temp and everything after is not 1247 1251 * initialized to 0 before calling 1248 1252 * the BPF program. New fields that
+14 -5
include/linux/if_tun.h
··· 27 27 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE) 28 28 struct socket *tun_get_socket(struct file *); 29 29 struct ptr_ring *tun_get_tx_ring(struct file *file); 30 - bool tun_is_xdp_frame(void *ptr); 31 - void *tun_xdp_to_ptr(void *ptr); 32 - void *tun_ptr_to_xdp(void *ptr); 30 + static inline bool tun_is_xdp_frame(void *ptr) 31 + { 32 + return (unsigned long)ptr & TUN_XDP_FLAG; 33 + } 34 + static inline void *tun_xdp_to_ptr(struct xdp_frame *xdp) 35 + { 36 + return (void *)((unsigned long)xdp | TUN_XDP_FLAG); 37 + } 38 + static inline struct xdp_frame *tun_ptr_to_xdp(void *ptr) 39 + { 40 + return (void *)((unsigned long)ptr & ~TUN_XDP_FLAG); 41 + } 33 42 void tun_ptr_free(void *ptr); 34 43 #else 35 44 #include <linux/err.h> ··· 57 48 { 58 49 return false; 59 50 } 60 - static inline void *tun_xdp_to_ptr(void *ptr) 51 + static inline void *tun_xdp_to_ptr(struct xdp_frame *xdp) 61 52 { 62 53 return NULL; 63 54 } 64 - static inline void *tun_ptr_to_xdp(void *ptr) 55 + static inline struct xdp_frame *tun_ptr_to_xdp(void *ptr) 65 56 { 66 57 return NULL; 67 58 }
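The helpers made inline above rely on pointer tagging: xdp_frame pointers are at least word aligned, so the low bit (TUN_XDP_FLAG) is free to distinguish XDP frames from skbs stored in the same ptr_ring. A userspace sketch of the trick, with TAG standing in for TUN_XDP_FLAG:

```c
#include <assert.h>
#include <stdint.h>

/* Low-bit pointer tagging: valid only because the tagged pointers are
 * known to be aligned, leaving bit 0 always clear in the raw pointer. */
#define TAG 0x1UL

static void *tag(void *p)       { return (void *)((uintptr_t)p | TAG); }
static int   is_tagged(void *p) { return (uintptr_t)p & TAG; }
static void *untag(void *p)     { return (void *)((uintptr_t)p & ~TAG); }
```

Making these one-liners inline (as the diff does) lets drivers test and strip the tag without an out-of-line call into the tun module.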
+5 -5
include/linux/netdevice.h
··· 618 618 /* Subordinate device that the queue has been assigned to */ 619 619 struct net_device *sb_dev; 620 620 #ifdef CONFIG_XDP_SOCKETS 621 - struct xdp_umem *umem; 621 + struct xsk_buff_pool *pool; 622 622 #endif 623 623 /* 624 624 * write-mostly part ··· 755 755 struct net_device *dev; 756 756 struct xdp_rxq_info xdp_rxq; 757 757 #ifdef CONFIG_XDP_SOCKETS 758 - struct xdp_umem *umem; 758 + struct xsk_buff_pool *pool; 759 759 #endif 760 760 } ____cacheline_aligned_in_smp; 761 761 ··· 883 883 /* BPF program for offload callbacks, invoked at program load time. */ 884 884 BPF_OFFLOAD_MAP_ALLOC, 885 885 BPF_OFFLOAD_MAP_FREE, 886 - XDP_SETUP_XSK_UMEM, 886 + XDP_SETUP_XSK_POOL, 887 887 }; 888 888 889 889 struct bpf_prog_offload_ops; ··· 917 917 struct { 918 918 struct bpf_offloaded_map *offmap; 919 919 }; 920 - /* XDP_SETUP_XSK_UMEM */ 920 + /* XDP_SETUP_XSK_POOL */ 921 921 struct { 922 - struct xdp_umem *umem; 922 + struct xsk_buff_pool *pool; 923 923 u16 queue_id; 924 924 } xsk; 925 925 };
+8 -1
include/linux/rcupdate_trace.h
··· 82 82 void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func); 83 83 void synchronize_rcu_tasks_trace(void); 84 84 void rcu_barrier_tasks_trace(void); 85 - 85 + #else 86 + /* 87 + * The BPF JIT forms these addresses even when it doesn't call these 88 + * functions, so provide definitions that result in runtime errors. 89 + */ 90 + static inline void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func) { BUG(); } 91 + static inline void rcu_read_lock_trace(void) { BUG(); } 92 + static inline void rcu_read_unlock_trace(void) { BUG(); } 86 93 #endif /* #ifdef CONFIG_TASKS_TRACE_RCU */ 87 94 88 95 #endif /* __LINUX_RCUPDATE_TRACE_H */
-17
include/linux/skmsg.h
··· 340 340 struct sk_psock *psock, 341 341 struct proto *ops) 342 342 { 343 - /* Initialize saved callbacks and original proto only once, since this 344 - * function may be called multiple times for a psock, e.g. when 345 - * psock->progs.msg_parser is updated. 346 - * 347 - * Since we've not installed the new proto, psock is not yet in use and 348 - * we can initialize it without synchronization. 349 - */ 350 - if (!psock->sk_proto) { 351 - struct proto *orig = READ_ONCE(sk->sk_prot); 352 - 353 - psock->saved_unhash = orig->unhash; 354 - psock->saved_close = orig->close; 355 - psock->saved_write_space = sk->sk_write_space; 356 - 357 - psock->sk_proto = orig; 358 - } 359 - 360 343 /* Pairs with lockless read in sk_clone_lock() */ 361 344 WRITE_ONCE(sk->sk_prot, ops); 362 345 }
+15 -5
include/linux/tcp.h
··· 92 92 smc_ok : 1, /* SMC seen on SYN packet */ 93 93 snd_wscale : 4, /* Window scaling received from sender */ 94 94 rcv_wscale : 4; /* Window scaling to send to receiver */ 95 + u8 saw_unknown:1, /* Received unknown option */ 96 + unused:7; 95 97 u8 num_sacks; /* Number of SACK blocks */ 96 98 u16 user_mss; /* mss requested by user in ioctl */ 97 99 u16 mss_clamp; /* Maximal mss, negotiated at connection setup */ ··· 239 237 repair : 1, 240 238 frto : 1;/* F-RTO (RFC5682) activated in CA_Loss */ 241 239 u8 repair_queue; 242 - u8 syn_data:1, /* SYN includes data */ 240 + u8 save_syn:2, /* Save headers of SYN packet */ 241 + syn_data:1, /* SYN includes data */ 243 242 syn_fastopen:1, /* SYN includes Fast Open option */ 244 243 syn_fastopen_exp:1,/* SYN includes Fast Open exp. option */ 245 244 syn_fastopen_ch:1, /* Active TFO re-enabling probe */ 246 245 syn_data_acked:1,/* data in SYN is acked by SYN-ACK */ 247 - save_syn:1, /* Save headers of SYN packet */ 248 - is_cwnd_limited:1,/* forward progress limited by snd_cwnd? */ 249 - syn_smc:1; /* SYN includes SMC */ 246 + is_cwnd_limited:1;/* forward progress limited by snd_cwnd? */ 250 247 u32 tlp_high_seq; /* snd_nxt at the time of TLP */ 251 248 252 249 u32 tcp_tx_delay; /* delay (in usec) added to TX packets */ ··· 392 391 #if IS_ENABLED(CONFIG_MPTCP) 393 392 bool is_mptcp; 394 393 #endif 394 + #if IS_ENABLED(CONFIG_SMC) 395 + bool syn_smc; /* SYN includes SMC */ 396 + #endif 395 397 396 398 #ifdef CONFIG_TCP_MD5SIG 397 399 /* TCP AF-Specific parts; only used by MD5 Signature support so far */ ··· 410 406 * socket. Used to retransmit SYNACKs etc. 
411 407 */ 412 408 struct request_sock __rcu *fastopen_rsk; 413 - u32 *saved_syn; 409 + struct saved_syn *saved_syn; 414 410 }; 415 411 416 412 enum tsq_enum { ··· 486 482 { 487 483 kfree(tp->saved_syn); 488 484 tp->saved_syn = NULL; 485 + } 486 + 487 + static inline u32 tcp_saved_syn_len(const struct saved_syn *saved_syn) 488 + { 489 + return saved_syn->mac_hdrlen + saved_syn->network_hdrlen + 490 + saved_syn->tcp_hdrlen; 489 491 } 490 492 491 493 struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk,
+14
include/net/bpf_sk_storage.h
··· 3 3 #ifndef _BPF_SK_STORAGE_H 4 4 #define _BPF_SK_STORAGE_H 5 5 6 + #include <linux/rculist.h> 7 + #include <linux/list.h> 8 + #include <linux/hash.h> 9 + #include <linux/types.h> 10 + #include <linux/spinlock.h> 11 + #include <linux/bpf.h> 12 + #include <net/sock.h> 13 + #include <uapi/linux/sock_diag.h> 14 + #include <uapi/linux/btf.h> 15 + #include <linux/bpf_local_storage.h> 16 + 6 17 struct sock; 7 18 8 19 void bpf_sk_storage_free(struct sock *sk); 9 20 10 21 extern const struct bpf_func_proto bpf_sk_storage_get_proto; 11 22 extern const struct bpf_func_proto bpf_sk_storage_delete_proto; 23 + extern const struct bpf_func_proto sk_storage_get_btf_proto; 24 + extern const struct bpf_func_proto sk_storage_delete_btf_proto; 12 25 26 + struct bpf_local_storage_elem; 13 27 struct bpf_sk_storage_diag; 14 28 struct sk_buff; 15 29 struct nlattr;
+2
include/net/inet_connection_sock.h
··· 86 86 struct timer_list icsk_retransmit_timer; 87 87 struct timer_list icsk_delack_timer; 88 88 __u32 icsk_rto; 89 + __u32 icsk_rto_min; 90 + __u32 icsk_delack_max; 89 91 __u32 icsk_pmtu_cookie; 90 92 const struct tcp_congestion_ops *icsk_ca_ops; 91 93 const struct inet_connection_sock_af_ops *icsk_af_ops;
+8 -1
include/net/request_sock.h
··· 41 41 42 42 int inet_rtx_syn_ack(const struct sock *parent, struct request_sock *req); 43 43 44 + struct saved_syn { 45 + u32 mac_hdrlen; 46 + u32 network_hdrlen; 47 + u32 tcp_hdrlen; 48 + u8 data[]; 49 + }; 50 + 44 51 /* struct request_sock - mini sock to represent a connection request 45 52 */ 46 53 struct request_sock { ··· 67 60 struct timer_list rsk_timer; 68 61 const struct request_sock_ops *rsk_ops; 69 62 struct sock *sk; 70 - u32 *saved_syn; 63 + struct saved_syn *saved_syn; 71 64 u32 secid; 72 65 u32 peer_secid; 73 66 };
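saved_syn is now a small header of three lengths followed by a flexible array holding the concatenated mac/network/tcp headers, so the stored payload length is simply the sum of the three fields (this is what the new tcp_saved_syn_len() helper in include/linux/tcp.h computes). A minimal userspace sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the saved_syn shape: three header lengths, then the
 * concatenated headers themselves in the flexible array. */
struct saved_syn_sketch {
	uint32_t mac_hdrlen;
	uint32_t network_hdrlen;
	uint32_t tcp_hdrlen;
	uint8_t data[];
};

static uint32_t saved_syn_len(const struct saved_syn_sketch *ss)
{
	return ss->mac_hdrlen + ss->network_hdrlen + ss->tcp_hdrlen;
}
```

Recording the per-layer lengths (rather than one opaque length, as the old `u32 *saved_syn` did) lets a BPF program ask for just the network or TCP header of the saved SYN.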
+2 -2
include/net/sock.h
··· 246 246 /* public: */ 247 247 }; 248 248 249 - struct bpf_sk_storage; 249 + struct bpf_local_storage; 250 250 251 251 /** 252 252 * struct sock - network layer representation of sockets ··· 517 517 void (*sk_destruct)(struct sock *sk); 518 518 struct sock_reuseport __rcu *sk_reuseport_cb; 519 519 #ifdef CONFIG_BPF_SYSCALL 520 - struct bpf_sk_storage __rcu *sk_bpf_storage; 520 + struct bpf_local_storage __rcu *sk_bpf_storage; 521 521 #endif 522 522 struct rcu_head sk_rcu; 523 523 };
+55 -4
include/net/tcp.h
··· 394 394 bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst); 395 395 void tcp_close(struct sock *sk, long timeout); 396 396 void tcp_init_sock(struct sock *sk); 397 - void tcp_init_transfer(struct sock *sk, int bpf_op); 397 + void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb); 398 398 __poll_t tcp_poll(struct file *file, struct socket *sock, 399 399 struct poll_table_struct *wait); 400 400 int tcp_getsockopt(struct sock *sk, int level, int optname, ··· 455 455 struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, 456 456 struct request_sock *req, 457 457 struct tcp_fastopen_cookie *foc, 458 - enum tcp_synack_type synack_type); 458 + enum tcp_synack_type synack_type, 459 + struct sk_buff *syn_skb); 459 460 int tcp_disconnect(struct sock *sk, int flags); 460 461 461 462 void tcp_finish_connect(struct sock *sk, struct sk_buff *skb); ··· 700 699 static inline u32 tcp_rto_min(struct sock *sk) 701 700 { 702 701 const struct dst_entry *dst = __sk_dst_get(sk); 703 - u32 rto_min = TCP_RTO_MIN; 702 + u32 rto_min = inet_csk(sk)->icsk_rto_min; 704 703 705 704 if (dst && dst_metric_locked(dst, RTAX_RTO_MIN)) 706 705 rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN); ··· 2026 2025 int (*send_synack)(const struct sock *sk, struct dst_entry *dst, 2027 2026 struct flowi *fl, struct request_sock *req, 2028 2027 struct tcp_fastopen_cookie *foc, 2029 - enum tcp_synack_type synack_type); 2028 + enum tcp_synack_type synack_type, 2029 + struct sk_buff *syn_skb); 2030 2030 }; 2031 2031 2032 2032 extern const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops; ··· 2224 2222 int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, 2225 2223 struct msghdr *msg, int len, int flags); 2226 2224 #endif /* CONFIG_NET_SOCK_MSG */ 2225 + 2226 + #ifdef CONFIG_CGROUP_BPF 2227 + /* Copy the listen sk's HDR_OPT_CB flags to its child. 
2228 + * 2229 + * During the 3-way handshake, the synack is usually sent from 2230 + * the listen sk with the HDR_OPT_CB flags set so that 2231 + * the bpf-prog will be called to write the BPF hdr option. 2232 + * 2233 + * In fastopen, the child sk is used to send synack instead 2234 + * of the listen sk. Thus, inheriting the HDR_OPT_CB flags 2235 + * from the listen sk gives the bpf-prog a chance to write 2236 + * BPF hdr option in the synack pkt during fastopen. 2237 + * 2238 + * Both fastopen and non-fastopen children inherit the 2239 + * HDR_OPT_CB flags so that the bpf-prog behaves consistently 2240 + * when deciding whether to clear these cb flags 2241 + * during the PASSIVE_ESTABLISHED_CB. 2242 + * 2243 + * In the future, other cb flags could be inherited here also. 2244 + */ 2245 + static inline void bpf_skops_init_child(const struct sock *sk, 2246 + struct sock *child) 2247 + { 2248 + tcp_sk(child)->bpf_sock_ops_cb_flags = 2249 + tcp_sk(sk)->bpf_sock_ops_cb_flags & 2250 + (BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG | 2251 + BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG | 2252 + BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG); 2253 + } 2254 + 2255 + static inline void bpf_skops_init_skb(struct bpf_sock_ops_kern *skops, 2256 + struct sk_buff *skb, 2257 + unsigned int end_offset) 2258 + { 2259 + skops->skb = skb; 2260 + skops->skb_data_end = skb->data + end_offset; 2261 + } 2262 + #else 2263 + static inline void bpf_skops_init_child(const struct sock *sk, 2264 + struct sock *child) 2265 + { 2266 + } 2267 + 2268 + static inline void bpf_skops_init_skb(struct bpf_sock_ops_kern *skops, 2269 + struct sk_buff *skb, 2270 + unsigned int end_offset) 2271 + { 2272 + } 2273 + #endif 2227 2274 2228 2275 /* Call BPF_SOCK_OPS program that returns an int. If the return value 2229 2276 * is < 0, then the BPF op failed (for example if the loaded BPF
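bpf_skops_init_child() inherits only the three header-option callback flags from the listener. A sketch of that masking with invented flag values (the real BPF_SOCK_OPS_*_CB_FLAG constants live in the UAPI header and differ):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag bits; values are illustrative only. */
#define PARSE_ALL_HDR_OPT     (1U << 0)
#define PARSE_UNKNOWN_HDR_OPT (1U << 1)
#define WRITE_HDR_OPT         (1U << 2)
#define SOME_OTHER_CB_FLAG    (1U << 3)

/* Only the header-option cb flags survive the copy to the child;
 * everything else starts clear, mirroring bpf_skops_init_child(). */
static uint32_t inherit_cb_flags(uint32_t listen_flags)
{
	return listen_flags & (PARSE_ALL_HDR_OPT |
			       PARSE_UNKNOWN_HDR_OPT |
			       WRITE_HDR_OPT);
}
```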
+14 -16
include/net/xdp_sock.h
··· 18 18 struct xdp_buff; 19 19 20 20 struct xdp_umem { 21 - struct xsk_queue *fq; 22 - struct xsk_queue *cq; 23 - struct xsk_buff_pool *pool; 21 + void *addrs; 24 22 u64 size; 25 23 u32 headroom; 26 24 u32 chunk_size; 25 + u32 chunks; 26 + u32 npgs; 27 27 struct user_struct *user; 28 28 refcount_t users; 29 - struct work_struct work; 30 - struct page **pgs; 31 - u32 npgs; 32 - u16 queue_id; 33 - u8 need_wakeup; 34 29 u8 flags; 35 - int id; 36 - struct net_device *dev; 37 30 bool zc; 38 - spinlock_t xsk_tx_list_lock; 39 - struct list_head xsk_tx_list; 31 + struct page **pgs; 32 + int id; 33 + struct list_head xsk_dma_list; 40 34 }; 41 35 42 36 struct xsk_map { ··· 42 48 struct xdp_sock { 43 49 /* struct sock must be the first member of struct xdp_sock */ 44 50 struct sock sk; 45 - struct xsk_queue *rx; 51 + struct xsk_queue *rx ____cacheline_aligned_in_smp; 46 52 struct net_device *dev; 47 53 struct xdp_umem *umem; 48 54 struct list_head flush_node; 55 + struct xsk_buff_pool *pool; 49 56 u16 queue_id; 50 57 bool zc; 51 58 enum { ··· 54 59 XSK_BOUND, 55 60 XSK_UNBOUND, 56 61 } state; 57 - /* Protects multiple processes in the control path */ 58 - struct mutex mutex; 62 + 59 63 struct xsk_queue *tx ____cacheline_aligned_in_smp; 60 - struct list_head list; 64 + struct list_head tx_list; 61 65 /* Mutual exclusion of NAPI TX thread and sendmsg error paths 62 66 * in the SKB destructor callback. 63 67 */ ··· 71 77 struct list_head map_list; 72 78 /* Protects map_list */ 73 79 spinlock_t map_list_lock; 80 + /* Protects multiple processes in the control path */ 81 + struct mutex mutex; 82 + struct xsk_queue *fq_tmp; /* Only as tmp storage before bind */ 83 + struct xsk_queue *cq_tmp; /* Only as tmp storage before bind */ 74 84 }; 75 85 76 86 #ifdef CONFIG_XDP_SOCKETS
+65 -57
include/net/xdp_sock_drv.h
··· 11 11 12 12 #ifdef CONFIG_XDP_SOCKETS 13 13 14 - void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries); 15 - bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc); 16 - void xsk_umem_consume_tx_done(struct xdp_umem *umem); 17 - struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id); 18 - void xsk_set_rx_need_wakeup(struct xdp_umem *umem); 19 - void xsk_set_tx_need_wakeup(struct xdp_umem *umem); 20 - void xsk_clear_rx_need_wakeup(struct xdp_umem *umem); 21 - void xsk_clear_tx_need_wakeup(struct xdp_umem *umem); 22 - bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem); 14 + void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries); 15 + bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc); 16 + void xsk_tx_release(struct xsk_buff_pool *pool); 17 + struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev, 18 + u16 queue_id); 19 + void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool); 20 + void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool); 21 + void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool); 22 + void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool); 23 + bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool); 23 24 24 - static inline u32 xsk_umem_get_headroom(struct xdp_umem *umem) 25 + static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool) 25 26 { 26 - return XDP_PACKET_HEADROOM + umem->headroom; 27 + return XDP_PACKET_HEADROOM + pool->headroom; 27 28 } 28 29 29 - static inline u32 xsk_umem_get_chunk_size(struct xdp_umem *umem) 30 + static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool) 30 31 { 31 - return umem->chunk_size; 32 + return pool->chunk_size; 32 33 } 33 34 34 - static inline u32 xsk_umem_get_rx_frame_size(struct xdp_umem *umem) 35 + static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool) 35 36 { 36 - return xsk_umem_get_chunk_size(umem) - xsk_umem_get_headroom(umem); 37 + return 
xsk_pool_get_chunk_size(pool) - xsk_pool_get_headroom(pool); 37 38 } 38 39 39 - static inline void xsk_buff_set_rxq_info(struct xdp_umem *umem, 40 + static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool, 40 41 struct xdp_rxq_info *rxq) 41 42 { 42 - xp_set_rxq_info(umem->pool, rxq); 43 + xp_set_rxq_info(pool, rxq); 43 44 } 44 45 45 - static inline void xsk_buff_dma_unmap(struct xdp_umem *umem, 46 + static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool, 46 47 unsigned long attrs) 47 48 { 48 - xp_dma_unmap(umem->pool, attrs); 49 + xp_dma_unmap(pool, attrs); 49 50 } 50 51 51 - static inline int xsk_buff_dma_map(struct xdp_umem *umem, struct device *dev, 52 - unsigned long attrs) 52 + static inline int xsk_pool_dma_map(struct xsk_buff_pool *pool, 53 + struct device *dev, unsigned long attrs) 53 54 { 54 - return xp_dma_map(umem->pool, dev, attrs, umem->pgs, umem->npgs); 55 + struct xdp_umem *umem = pool->umem; 56 + 57 + return xp_dma_map(pool, dev, attrs, umem->pgs, umem->npgs); 55 58 } 56 59 57 60 static inline dma_addr_t xsk_buff_xdp_get_dma(struct xdp_buff *xdp) ··· 71 68 return xp_get_frame_dma(xskb); 72 69 } 73 70 74 - static inline struct xdp_buff *xsk_buff_alloc(struct xdp_umem *umem) 71 + static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) 75 72 { 76 - return xp_alloc(umem->pool); 73 + return xp_alloc(pool); 77 74 } 78 75 79 - static inline bool xsk_buff_can_alloc(struct xdp_umem *umem, u32 count) 76 + static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count) 80 77 { 81 - return xp_can_alloc(umem->pool, count); 78 + return xp_can_alloc(pool, count); 82 79 } 83 80 84 81 static inline void xsk_buff_free(struct xdp_buff *xdp) ··· 88 85 xp_free(xskb); 89 86 } 90 87 91 - static inline dma_addr_t xsk_buff_raw_get_dma(struct xdp_umem *umem, u64 addr) 88 + static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool, 89 + u64 addr) 92 90 { 93 - return xp_raw_get_dma(umem->pool, addr); 91 + 
return xp_raw_get_dma(pool, addr); 94 92 } 95 93 96 - static inline void *xsk_buff_raw_get_data(struct xdp_umem *umem, u64 addr) 94 + static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr) 97 95 { 98 - return xp_raw_get_data(umem->pool, addr); 96 + return xp_raw_get_data(pool, addr); 99 97 } 100 98 101 - static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) 99 + static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool) 102 100 { 103 101 struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); 102 + 103 + if (!pool->dma_need_sync) 104 + return; 104 105 105 106 xp_dma_sync_for_cpu(xskb); 106 107 } 107 108 108 - static inline void xsk_buff_raw_dma_sync_for_device(struct xdp_umem *umem, 109 + static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool, 109 110 dma_addr_t dma, 110 111 size_t size) 111 112 { 112 - xp_dma_sync_for_device(umem->pool, dma, size); 113 + xp_dma_sync_for_device(pool, dma, size); 113 114 } 114 115 115 116 #else 116 117 117 - static inline void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries) 118 + static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries) 118 119 { 119 120 } 120 121 121 - static inline bool xsk_umem_consume_tx(struct xdp_umem *umem, 122 - struct xdp_desc *desc) 122 + static inline bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, 123 + struct xdp_desc *desc) 123 124 { 124 125 return false; 125 126 } 126 127 127 - static inline void xsk_umem_consume_tx_done(struct xdp_umem *umem) 128 + static inline void xsk_tx_release(struct xsk_buff_pool *pool) 128 129 { 129 130 } 130 131 131 - static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, 132 - u16 queue_id) 132 + static inline struct xsk_buff_pool * 133 + xsk_get_pool_from_qid(struct net_device *dev, u16 queue_id) 133 134 { 134 135 return NULL; 135 136 } 136 137 137 - static inline void xsk_set_rx_need_wakeup(struct 
xdp_umem *umem) 138 + static inline void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool) 138 139 { 139 140 } 140 141 141 - static inline void xsk_set_tx_need_wakeup(struct xdp_umem *umem) 142 + static inline void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool) 142 143 { 143 144 } 144 145 145 - static inline void xsk_clear_rx_need_wakeup(struct xdp_umem *umem) 146 + static inline void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool) 146 147 { 147 148 } 148 149 149 - static inline void xsk_clear_tx_need_wakeup(struct xdp_umem *umem) 150 + static inline void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool) 150 151 { 151 152 } 152 153 153 - static inline bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem) 154 + static inline bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool) 154 155 { 155 156 return false; 156 157 } 157 158 158 - static inline u32 xsk_umem_get_headroom(struct xdp_umem *umem) 159 + static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool) 159 160 { 160 161 return 0; 161 162 } 162 163 163 - static inline u32 xsk_umem_get_chunk_size(struct xdp_umem *umem) 164 + static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool) 164 165 { 165 166 return 0; 166 167 } 167 168 168 - static inline u32 xsk_umem_get_rx_frame_size(struct xdp_umem *umem) 169 + static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool) 169 170 { 170 171 return 0; 171 172 } 172 173 173 - static inline void xsk_buff_set_rxq_info(struct xdp_umem *umem, 174 + static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool, 174 175 struct xdp_rxq_info *rxq) 175 176 { 176 177 } 177 178 178 - static inline void xsk_buff_dma_unmap(struct xdp_umem *umem, 179 + static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool, 179 180 unsigned long attrs) 180 181 { 181 182 } 182 183 183 - static inline int xsk_buff_dma_map(struct xdp_umem *umem, struct device *dev, 184 - unsigned long attrs) 184 + static inline int xsk_pool_dma_map(struct 
xsk_buff_pool *pool, 185 + struct device *dev, unsigned long attrs) 185 186 { 186 187 return 0; 187 188 } ··· 200 193 return 0; 201 194 } 202 195 203 - static inline struct xdp_buff *xsk_buff_alloc(struct xdp_umem *umem) 196 + static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) 204 197 { 205 198 return NULL; 206 199 } 207 200 208 - static inline bool xsk_buff_can_alloc(struct xdp_umem *umem, u32 count) 201 + static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count) 209 202 { 210 203 return false; 211 204 } ··· 214 207 { 215 208 } 216 209 217 - static inline dma_addr_t xsk_buff_raw_get_dma(struct xdp_umem *umem, u64 addr) 210 + static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool, 211 + u64 addr) 218 212 { 219 213 return 0; 220 214 } 221 215 222 - static inline void *xsk_buff_raw_get_data(struct xdp_umem *umem, u64 addr) 216 + static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr) 223 217 { 224 218 return NULL; 225 219 } 226 220 227 - static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) 221 + static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool) 228 222 { 229 223 } 230 224 231 - static inline void xsk_buff_raw_dma_sync_for_device(struct xdp_umem *umem, 225 + static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool, 232 226 dma_addr_t dma, 233 227 size_t size) 234 228 {
+43 -10
include/net/xsk_buff_pool.h
···
 struct xdp_rxq_info;
 struct xsk_queue;
 struct xdp_desc;
+struct xdp_umem;
+struct xdp_sock;
 struct device;
 struct page;
 
···
 	struct list_head free_list_node;
 };
 
+struct xsk_dma_map {
+	dma_addr_t *dma_pages;
+	struct device *dev;
+	struct net_device *netdev;
+	refcount_t users;
+	struct list_head list; /* Protected by the RTNL_LOCK */
+	u32 dma_pages_cnt;
+	bool dma_need_sync;
+};
+
 struct xsk_buff_pool {
-	struct xsk_queue *fq;
+	/* Members only used in the control path first. */
+	struct device *dev;
+	struct net_device *netdev;
+	struct list_head xsk_tx_list;
+	/* Protects modifications to the xsk_tx_list */
+	spinlock_t xsk_tx_list_lock;
+	refcount_t users;
+	struct xdp_umem *umem;
+	struct work_struct work;
 	struct list_head free_list;
+	u32 heads_cnt;
+	u16 queue_id;
+
+	/* Data path members as close to free_heads at the end as possible. */
+	struct xsk_queue *fq ____cacheline_aligned_in_smp;
+	struct xsk_queue *cq;
+	/* For performance reasons, each buff pool has its own array of dma_pages
+	 * even when they are identical.
+	 */
 	dma_addr_t *dma_pages;
 	struct xdp_buff_xsk *heads;
 	u64 chunk_mask;
 	u64 addrs_cnt;
 	u32 free_list_cnt;
 	u32 dma_pages_cnt;
-	u32 heads_cnt;
 	u32 free_heads_cnt;
 	u32 headroom;
 	u32 chunk_size;
 	u32 frame_len;
+	u8 cached_need_wakeup;
+	bool uses_need_wakeup;
 	bool dma_need_sync;
 	bool unaligned;
 	void *addrs;
-	struct device *dev;
 	struct xdp_buff_xsk *free_heads[];
 };
 
 /* AF_XDP core. */
-struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks,
-				u32 chunk_size, u32 headroom, u64 size,
-				bool unaligned);
-void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq);
+struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
+						struct xdp_umem *umem);
+int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *dev,
+		  u16 queue_id, u16 flags);
+int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_umem *umem,
+			 struct net_device *dev, u16 queue_id);
 void xp_destroy(struct xsk_buff_pool *pool);
 void xp_release(struct xdp_buff_xsk *xskb);
+void xp_get_pool(struct xsk_buff_pool *pool);
+void xp_put_pool(struct xsk_buff_pool *pool);
+void xp_clear_dev(struct xsk_buff_pool *pool);
+void xp_add_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs);
+void xp_del_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs);
 
 /* AF_XDP, and XDP core. */
 void xp_free(struct xdp_buff_xsk *xskb);
···
 void xp_dma_sync_for_cpu_slow(struct xdp_buff_xsk *xskb);
 static inline void xp_dma_sync_for_cpu(struct xdp_buff_xsk *xskb)
 {
-	if (!xskb->pool->dma_need_sync)
-		return;
-
 	xp_dma_sync_for_cpu_slow(xskb);
 }
+393 -5
include/uapi/linux/bpf.h
···
 	BPF_MAP_TYPE_DEVMAP_HASH,
 	BPF_MAP_TYPE_STRUCT_OPS,
 	BPF_MAP_TYPE_RINGBUF,
+	BPF_MAP_TYPE_INODE_STORAGE,
 };
 
 /* Note that tracing related programs such as
···
 
 /* The verifier internal test flag. Behavior is undefined */
 #define BPF_F_TEST_STATE_FREQ	(1U << 3)
+
+/* If BPF_F_SLEEPABLE is used in BPF_PROG_LOAD command, the verifier will
+ * restrict map and helper usage for such programs. Sleepable BPF programs can
+ * only be attached to hooks where kernel execution context allows sleeping.
+ * Such programs are allowed to use helpers that may sleep like
+ * bpf_copy_from_user().
+ */
+#define BPF_F_SLEEPABLE		(1U << 4)
 
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
 * two extensions:
···
 *
 *		**-ERANGE** if resulting value was out of range.
 *
- * void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags)
+ * void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags)
 *	Description
 *		Get a bpf-local-storage from a *sk*.
 *
···
 *		"type". The bpf-local-storage "type" (i.e. the *map*) is
 *		searched against all bpf-local-storages residing at *sk*.
 *
+ *		*sk* is a kernel **struct sock** pointer for LSM program.
+ *		*sk* is a **struct bpf_sock** pointer for other program types.
+ *
 *		An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be
 *		used such that a new bpf-local-storage will be
 *		created if one does not exist. *value* can be used
···
 *		**NULL** if not found or there was an error in adding
 *		a new bpf-local-storage.
 *
- * long bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk)
+ * long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
 *	Description
 *		Delete a bpf-local-storage from a *sk*.
 *	Return
···
 *		A non-negative value equal to or less than *size* on success,
 *		or a negative error in case of failure.
 *
+ * long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res, u32 len, u64 flags)
+ *	Description
+ *		Load header option. Support reading a particular TCP header
+ *		option for bpf program (BPF_PROG_TYPE_SOCK_OPS).
+ *
+ *		If *flags* is 0, it will search the option from the
+ *		sock_ops->skb_data. The comment in "struct bpf_sock_ops"
+ *		has details on what skb_data contains under different
+ *		sock_ops->op.
+ *
+ *		The first byte of the *searchby_res* specifies the
+ *		kind that it wants to search.
+ *
+ *		If the searching kind is an experimental kind
+ *		(i.e. 253 or 254 according to RFC6994), it also
+ *		needs to specify the "magic" which is either
+ *		2 bytes or 4 bytes. It then also needs to
+ *		specify the size of the magic by using
+ *		the 2nd byte which is "kind-length" of a TCP
+ *		header option and the "kind-length" also
+ *		includes the first 2 bytes "kind" and "kind-length"
+ *		itself as a normal TCP header option also does.
+ *
+ *		For example, to search experimental kind 254 with
+ *		2 byte magic 0xeB9F, the searchby_res should be
+ *		[ 254, 4, 0xeB, 0x9F, 0, 0, .... 0 ].
+ *
+ *		To search for the standard window scale option (3),
+ *		the searchby_res should be [ 3, 0, 0, .... 0 ].
+ *		Note, kind-length must be 0 for regular option.
+ *
+ *		Searching for No-Op (0) and End-of-Option-List (1) is
+ *		not supported.
+ *
+ *		*len* must be at least 2 bytes which is the minimal size
+ *		of a header option.
+ *
+ *		Supported flags:
+ *		* **BPF_LOAD_HDR_OPT_TCP_SYN** to search from the
+ *		  saved_syn packet or the just-received syn packet.
+ *
+ *	Return
+ *		>0 when found, the header option is copied to *searchby_res*.
+ *		The return value is the total length copied.
+ *
+ *		**-EINVAL** If param is invalid
+ *
+ *		**-ENOMSG** The option is not found
+ *
+ *		**-ENOENT** No syn packet available when
+ *			    **BPF_LOAD_HDR_OPT_TCP_SYN** is used
+ *
+ *		**-ENOSPC** Not enough space. Only *len* number of
+ *			    bytes are copied.
+ *
+ *		**-EFAULT** Cannot parse the header options in the packet
+ *
+ *		**-EPERM** This helper cannot be used under the
+ *			   current sock_ops->op.
+ *
+ * long bpf_store_hdr_opt(struct bpf_sock_ops *skops, const void *from, u32 len, u64 flags)
+ *	Description
+ *		Store header option. The data will be copied
+ *		from buffer *from* with length *len* to the TCP header.
+ *
+ *		The buffer *from* should have the whole option that
+ *		includes the kind, kind-length, and the actual
+ *		option data. The *len* must be at least kind-length
+ *		long. The kind-length does not have to be 4 byte
+ *		aligned. The kernel will take care of the padding
+ *		and setting the 4 bytes aligned value to th->doff.
+ *
+ *		This helper will check for duplicated option
+ *		by searching the same option in the outgoing skb.
+ *
+ *		This helper can only be called during
+ *		BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
+ *
+ *	Return
+ *		0 on success, or negative error in case of failure:
+ *
+ *		**-EINVAL** If param is invalid
+ *
+ *		**-ENOSPC** Not enough space in the header.
+ *			    Nothing has been written
+ *
+ *		**-EEXIST** The option already exists
+ *
+ *		**-EFAULT** Cannot parse the existing header options
+ *
+ *		**-EPERM** This helper cannot be used under the
+ *			   current sock_ops->op.
+ *
+ * long bpf_reserve_hdr_opt(struct bpf_sock_ops *skops, u32 len, u64 flags)
+ *	Description
+ *		Reserve *len* bytes for the bpf header option. The
+ *		space will be used by bpf_store_hdr_opt() later in
+ *		BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
+ *
+ *		If bpf_reserve_hdr_opt() is called multiple times,
+ *		the total number of bytes will be reserved.
+ *
+ *		This helper can only be called during
+ *		BPF_SOCK_OPS_HDR_OPT_LEN_CB.
+ *
+ *	Return
+ *		0 on success, or negative error in case of failure:
+ *
+ *		**-EINVAL** if param is invalid
+ *
+ *		**-ENOSPC** Not enough space in the header.
+ *
+ *		**-EPERM** This helper cannot be used under the
+ *			   current sock_ops->op.
+ *
+ * void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags)
+ *	Description
+ *		Get a bpf_local_storage from an *inode*.
+ *
+ *		Logically, it could be thought of as getting the value from
+ *		a *map* with *inode* as the **key**. From this
+ *		perspective, the usage is not much different from
+ *		**bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this
+ *		helper enforces the key must be an inode and the map must also
+ *		be a **BPF_MAP_TYPE_INODE_STORAGE**.
+ *
+ *		Underneath, the value is stored locally at *inode* instead of
+ *		the *map*. The *map* is used as the bpf-local-storage
+ *		"type". The bpf-local-storage "type" (i.e. the *map*) is
+ *		searched against all bpf_local_storage residing at *inode*.
+ *
+ *		An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
+ *		used such that a new bpf_local_storage will be
+ *		created if one does not exist. *value* can be used
+ *		together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
+ *		the initial value of a bpf_local_storage. If *value* is
+ *		**NULL**, the new bpf_local_storage will be zero initialized.
+ *	Return
+ *		A bpf_local_storage pointer is returned on success.
+ *
+ *		**NULL** if not found or there was an error in adding
+ *		a new bpf_local_storage.
+ *
+ * int bpf_inode_storage_delete(struct bpf_map *map, void *inode)
+ *	Description
+ *		Delete a bpf_local_storage from an *inode*.
+ *	Return
+ *		0 on success.
+ *
+ *		**-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * long bpf_d_path(struct path *path, char *buf, u32 sz)
+ *	Description
+ *		Return full path for given 'struct path' object, which
+ *		needs to be the kernel BTF 'path' object. The path is
+ *		returned in the provided buffer 'buf' of size 'sz' and
+ *		is zero terminated.
+ *
+ *	Return
+ *		On success, the strictly positive length of the string,
+ *		including the trailing NUL character. On error, a negative
+ *		value.
+ *
+ * long bpf_copy_from_user(void *dst, u32 size, const void *user_ptr)
+ *	Description
+ *		Read *size* bytes from user space address *user_ptr* and store
+ *		the data in *dst*. This is a wrapper of copy_from_user().
+ *	Return
+ *		0 on success, or a negative error in case of failure.
 */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
···
 	FN(skc_to_tcp_request_sock),	\
 	FN(skc_to_udp6_sock),		\
 	FN(get_task_stack),		\
+	FN(load_hdr_opt),		\
+	FN(store_hdr_opt),		\
+	FN(reserve_hdr_opt),		\
+	FN(inode_storage_get),		\
+	FN(inode_storage_delete),	\
+	FN(d_path),			\
+	FN(copy_from_user),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
···
 	BPF_F_SYSCTL_BASE_NAME		= (1ULL << 0),
 };
 
-/* BPF_FUNC_sk_storage_get flags */
+/* BPF_FUNC_<kernel_obj>_storage_get flags */
 enum {
-	BPF_SK_STORAGE_GET_F_CREATE	= (1ULL << 0),
+	BPF_LOCAL_STORAGE_GET_F_CREATE	= (1ULL << 0),
+	/* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility
+	 * and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead.
+	 */
+	BPF_SK_STORAGE_GET_F_CREATE	= BPF_LOCAL_STORAGE_GET_F_CREATE,
 };
 
 /* BPF_FUNC_read_branch_records flags. */
···
 		__u64 cgroup_id;
 		__u32 attach_type;
 	} cgroup;
+	struct {
+		__aligned_u64 target_name; /* in/out: target_name buffer ptr */
+		__u32 target_name_len;	   /* in/out: target_name buffer len */
+		union {
+			struct {
+				__u32 map_id;
+			} map;
+		};
+	} iter;
 	struct {
 		__u32 netns_ino;
 		__u32 attach_type;
···
 	__u64 bytes_received;
 	__u64 bytes_acked;
 	__bpf_md_ptr(struct bpf_sock *, sk);
+	/* [skb_data, skb_data_end) covers the whole TCP header.
+	 *
+	 * BPF_SOCK_OPS_PARSE_HDR_OPT_CB: The packet received
+	 * BPF_SOCK_OPS_HDR_OPT_LEN_CB:   Not useful because the
+	 *                                header has not been written.
+	 * BPF_SOCK_OPS_WRITE_HDR_OPT_CB: The header and options have
+	 *                                been written so far.
+	 * BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:  The SYNACK that concludes
+	 *                                      the 3WHS.
+	 * BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: The ACK that concludes
+	 *                                      the 3WHS.
+	 *
+	 * bpf_load_hdr_opt() can also be used to read a particular option.
+	 */
+	__bpf_md_ptr(void *, skb_data);
+	__bpf_md_ptr(void *, skb_data_end);
+	__u32 skb_len;		/* The total length of a packet.
+				 * It includes the header, options,
+				 * and payload.
+				 */
+	__u32 skb_tcp_flags;	/* tcp_flags of the header. It provides
+				 * an easy way to check for tcp_flags
+				 * without parsing skb_data.
+				 *
+				 * In particular, the skb_tcp_flags
+				 * will still be available in
+				 * BPF_SOCK_OPS_HDR_OPT_LEN even though
+				 * the outgoing header has not
+				 * been written yet.
+				 */
 };
 
 /* Definitions for bpf_sock_ops_cb_flags */
···
 	BPF_SOCK_OPS_RETRANS_CB_FLAG	= (1<<1),
 	BPF_SOCK_OPS_STATE_CB_FLAG	= (1<<2),
 	BPF_SOCK_OPS_RTT_CB_FLAG	= (1<<3),
+	/* Call bpf for all received TCP headers. The bpf prog will be
+	 * called under sock_ops->op == BPF_SOCK_OPS_PARSE_HDR_OPT_CB
+	 *
+	 * Please refer to the comment in BPF_SOCK_OPS_PARSE_HDR_OPT_CB
+	 * for the header option related helpers that will be useful
+	 * to the bpf programs.
+	 *
+	 * It could be used at the client/active side (i.e. connect() side)
+	 * when the server told it that the server was in syncookie
+	 * mode and required the active side to resend the bpf-written
+	 * options. The active side can keep writing the bpf-options until
+	 * it receives a valid packet from the server side to confirm
+	 * the earlier packet (and options) has been received. The later
+	 * example patch is using it like this at the active side when the
+	 * server is in syncookie mode.
+	 *
+	 * The bpf prog will usually turn this off in the common cases.
+	 */
+	BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG	= (1<<4),
+	/* Call bpf when the kernel has received a header option that
+	 * the kernel cannot handle. The bpf prog will be called under
+	 * sock_ops->op == BPF_SOCK_OPS_PARSE_HDR_OPT_CB.
+	 *
+	 * Please refer to the comment in BPF_SOCK_OPS_PARSE_HDR_OPT_CB
+	 * for the header option related helpers that will be useful
+	 * to the bpf programs.
+	 */
+	BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG = (1<<5),
+	/* Call bpf when the kernel is writing header options for the
+	 * outgoing packet. The bpf prog will first be called
+	 * to reserve space in a skb under
+	 * sock_ops->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB. Then
+	 * the bpf prog will be called to write the header option(s)
+	 * under sock_ops->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
+	 *
+	 * Please refer to the comment in BPF_SOCK_OPS_HDR_OPT_LEN_CB
+	 * and BPF_SOCK_OPS_WRITE_HDR_OPT_CB for the header option
+	 * related helpers that will be useful to the bpf programs.
+	 *
+	 * The kernel gets its chance to reserve space and write
+	 * options first before the BPF program does.
+	 */
+	BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6),
 	/* Mask of all currently supported cb flags */
-	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0xF,
+	BPF_SOCK_OPS_ALL_CB_FLAGS       = 0x7F,
 };
 
 /* List of known BPF sock_ops operators.
···
 	 */
 	BPF_SOCK_OPS_RTT_CB,		/* Called on every RTT.
 					 */
+	BPF_SOCK_OPS_PARSE_HDR_OPT_CB,	/* Parse the header option.
+					 * It will be called to handle
+					 * the packets received at
+					 * an already established
+					 * connection.
+					 *
+					 * sock_ops->skb_data:
+					 * Referring to the received skb.
+					 * It covers the TCP header only.
+					 *
+					 * bpf_load_hdr_opt() can also
+					 * be used to search for a
+					 * particular option.
+					 */
+	BPF_SOCK_OPS_HDR_OPT_LEN_CB,	/* Reserve space for writing the
+					 * header option later in
+					 * BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
+					 * Arg1: bool want_cookie. (in
+					 *       writing SYNACK only)
+					 *
+					 * sock_ops->skb_data:
+					 * Not available because no header has
+					 * been written yet.
+					 *
+					 * sock_ops->skb_tcp_flags:
+					 * The tcp_flags of the
+					 * outgoing skb. (e.g. SYN, ACK, FIN).
+					 *
+					 * bpf_reserve_hdr_opt() should
+					 * be used to reserve space.
+					 */
+	BPF_SOCK_OPS_WRITE_HDR_OPT_CB,	/* Write the header options
+					 * Arg1: bool want_cookie. (in
+					 *       writing SYNACK only)
+					 *
+					 * sock_ops->skb_data:
+					 * Referring to the outgoing skb.
+					 * It covers the TCP header
+					 * that has already been written
+					 * by the kernel and the
+					 * earlier bpf-progs.
+					 *
+					 * sock_ops->skb_tcp_flags:
+					 * The tcp_flags of the outgoing
+					 * skb. (e.g. SYN, ACK, FIN).
+					 *
+					 * bpf_store_hdr_opt() should
+					 * be used to write the
+					 * option.
+					 *
+					 * bpf_load_hdr_opt() can also
+					 * be used to search for a
+					 * particular option that
+					 * has already been written
+					 * by the kernel or the
+					 * earlier bpf-progs.
+					 */
 };
 
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
···
 enum {
 	TCP_BPF_IW		= 1001,	/* Set TCP initial congestion window */
 	TCP_BPF_SNDCWND_CLAMP	= 1002,	/* Set sndcwnd_clamp */
+	TCP_BPF_DELACK_MAX	= 1003,	/* Max delay ack in usecs */
+	TCP_BPF_RTO_MIN		= 1004,	/* Min delay ack in usecs */
+	/* Copy the SYN pkt to optval
+	 *
+	 * BPF_PROG_TYPE_SOCK_OPS only. It is similar to the
+	 * bpf_getsockopt(TCP_SAVED_SYN) but it does not limit
+	 * to only getting from the saved_syn. It can either get the
+	 * syn packet from:
+	 *
+	 * 1. the just-received SYN packet (only available when writing the
+	 *    SYNACK). It will be useful when it is not necessary to
+	 *    save the SYN packet for later use. It is also the only way
+	 *    to get the SYN during syncookie mode because the syn
+	 *    packet cannot be saved during syncookie.
+	 *
+	 * OR
+	 *
+	 * 2. the earlier saved syn which was done by
+	 *    bpf_setsockopt(TCP_SAVE_SYN).
+	 *
+	 * The bpf_getsockopt(TCP_BPF_SYN*) option will hide where the
+	 * SYN packet is obtained.
+	 *
+	 * If the bpf-prog does not need the IP[46] header, the
+	 * bpf-prog can avoid parsing the IP header by using
+	 * TCP_BPF_SYN. Otherwise, the bpf-prog can get both
+	 * IP[46] and TCP header by using TCP_BPF_SYN_IP.
+	 *
+	 * >0: Total number of bytes copied
+	 * -ENOSPC: Not enough space in optval. Only optlen number of
+	 *          bytes is copied.
+	 * -ENOENT: The SYN skb is not available now and the earlier SYN pkt
+	 *          is not saved by setsockopt(TCP_SAVE_SYN).
+	 */
+	TCP_BPF_SYN		= 1005, /* Copy the TCP header */
+	TCP_BPF_SYN_IP		= 1006, /* Copy the IP[46] and TCP header */
+	TCP_BPF_SYN_MAC		= 1007, /* Copy the MAC, IP[46], and TCP header */
+};
+
+enum {
+	BPF_LOAD_HDR_OPT_TCP_SYN = (1ULL << 0),
+};
+
+/* args[0] value during BPF_SOCK_OPS_HDR_OPT_LEN_CB and
+ * BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
+ */
+enum {
+	BPF_WRITE_HDR_TCP_CURRENT_MSS = 1,	/* Kernel is finding the
+						 * total option spaces
+						 * required for an established
+						 * sk in order to calculate the
+						 * MSS. No skb is actually
+						 * sent.
+						 */
+	BPF_WRITE_HDR_TCP_SYNACK_COOKIE = 2,	/* Kernel is in syncookie mode
+						 * when sending a SYN.
+						 */
 };
 
 struct bpf_perf_event_value {
+3
init/Kconfig
···
 	bool "Enable bpf() system call"
 	select BPF
 	select IRQ_WORK
+	select TASKS_TRACE_RCU
 	default n
 	help
 	  Enable the bpf() system call that allows to manipulate eBPF
···
 config BPF_JIT_DEFAULT_ON
 	def_bool ARCH_WANT_DEFAULT_BPF_JIT || BPF_JIT_ALWAYS_ON
 	depends on HAVE_EBPF_JIT && BPF_JIT
+
+source "kernel/bpf/preload/Kconfig"
 
 config USERFAULTFD
 	bool "Enable userfaultfd() system call"
+1 -1
kernel/Makefile
···
 	    notifier.o ksysfs.o cred.o reboot.o \
 	    async.o range.o smpboot.o ucount.o regset.o
 
-obj-$(CONFIG_BPFILTER) += usermode_driver.o
+obj-$(CONFIG_USERMODE_DRIVER) += usermode_driver.o
 obj-$(CONFIG_MODULES) += kmod.o
 obj-$(CONFIG_MULTIUSER) += groups.o
+3
kernel/bpf/Makefile
···
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
+obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
 obj-$(CONFIG_BPF_SYSCALL) += btf.o
···
 ifeq ($(CONFIG_NET),y)
 obj-$(CONFIG_BPF_SYSCALL) += devmap.o
 obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += offload.o
 obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
 endif
···
 obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
 obj-${CONFIG_BPF_LSM} += bpf_lsm.o
 endif
+obj-$(CONFIG_BPF_PRELOAD) += preload/
+17
kernel/bpf/arraymap.c
···
 #include <linux/filter.h>
 #include <linux/perf_event.h>
 #include <uapi/linux/btf.h>
+#include <linux/rcupdate_trace.h>
 
 #include "map_in_map.h"
 
···
 			   vma->vm_pgoff + pgoff);
 }
 
+static bool array_map_meta_equal(const struct bpf_map *meta0,
+				 const struct bpf_map *meta1)
+{
+	return meta0->max_entries == meta1->max_entries &&
+	       bpf_map_meta_equal(meta0, meta1);
+}
+
 struct bpf_iter_seq_array_map_info {
 	struct bpf_map *map;
 	void *percpu_value_buf;
···
 
 static int array_map_btf_id;
 const struct bpf_map_ops array_map_ops = {
+	.map_meta_equal = array_map_meta_equal,
 	.map_alloc_check = array_map_alloc_check,
 	.map_alloc = array_map_alloc,
 	.map_free = array_map_free,
···
 
 static int percpu_array_map_btf_id;
 const struct bpf_map_ops percpu_array_map_ops = {
+	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = array_map_alloc_check,
 	.map_alloc = array_map_alloc,
 	.map_free = array_map_free,
···
 	fd_array_map_free(map);
 }
 
+/* prog_array->aux->{type,jited} is a runtime binding.
+ * Doing static check alone in the verifier is not enough.
+ * Thus, prog_array_map cannot be used as an inner_map
+ * and map_meta_equal is not implemented.
+ */
 static int prog_array_map_btf_id;
 const struct bpf_map_ops prog_array_map_ops = {
 	.map_alloc_check = fd_array_map_alloc_check,
···
 
 static int perf_event_array_map_btf_id;
 const struct bpf_map_ops perf_event_array_map_ops = {
+	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = fd_array_map_alloc_check,
 	.map_alloc = array_map_alloc,
 	.map_free = fd_array_map_free,
···
 
 static int cgroup_array_map_btf_id;
 const struct bpf_map_ops cgroup_array_map_ops = {
+	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = fd_array_map_alloc_check,
 	.map_alloc = array_map_alloc,
 	.map_free = cgroup_fd_array_free,
+274
kernel/bpf/bpf_inode_storage.c
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019 Facebook
+ * Copyright 2020 Google LLC.
+ */
+
+#include <linux/rculist.h>
+#include <linux/list.h>
+#include <linux/hash.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/bpf.h>
+#include <linux/bpf_local_storage.h>
+#include <net/sock.h>
+#include <uapi/linux/sock_diag.h>
+#include <uapi/linux/btf.h>
+#include <linux/bpf_lsm.h>
+#include <linux/btf_ids.h>
+#include <linux/fdtable.h>
+
+DEFINE_BPF_STORAGE_CACHE(inode_cache);
+
+static struct bpf_local_storage __rcu **
+inode_storage_ptr(void *owner)
+{
+	struct inode *inode = owner;
+	struct bpf_storage_blob *bsb;
+
+	bsb = bpf_inode(inode);
+	if (!bsb)
+		return NULL;
+	return &bsb->storage;
+}
+
+static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
+							   struct bpf_map *map,
+							   bool cacheit_lockit)
+{
+	struct bpf_local_storage *inode_storage;
+	struct bpf_local_storage_map *smap;
+	struct bpf_storage_blob *bsb;
+
+	bsb = bpf_inode(inode);
+	if (!bsb)
+		return NULL;
+
+	inode_storage = rcu_dereference(bsb->storage);
+	if (!inode_storage)
+		return NULL;
+
+	smap = (struct bpf_local_storage_map *)map;
+	return bpf_local_storage_lookup(inode_storage, smap, cacheit_lockit);
+}
+
+void bpf_inode_storage_free(struct inode *inode)
+{
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage *local_storage;
+	bool free_inode_storage = false;
+	struct bpf_storage_blob *bsb;
+	struct hlist_node *n;
+
+	bsb = bpf_inode(inode);
+	if (!bsb)
+		return;
+
+	rcu_read_lock();
+
+	local_storage = rcu_dereference(bsb->storage);
+	if (!local_storage) {
+		rcu_read_unlock();
+		return;
+	}
+
+	/* Neither the bpf_prog nor the bpf-map's syscall
+	 * could be modifying the local_storage->list now.
+	 * Thus, no elem can be added-to or deleted-from the
+	 * local_storage->list by the bpf_prog or by the bpf-map's syscall.
+	 *
+	 * It is racing with bpf_local_storage_map_free() alone
+	 * when unlinking elem from the local_storage->list and
+	 * the map's bucket->list.
+	 */
+	raw_spin_lock_bh(&local_storage->lock);
+	hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
+		/* Always unlink from map before unlinking from
+		 * local_storage.
+		 */
+		bpf_selem_unlink_map(selem);
+		free_inode_storage = bpf_selem_unlink_storage_nolock(
+			local_storage, selem, false);
+	}
+	raw_spin_unlock_bh(&local_storage->lock);
+	rcu_read_unlock();
+
+	/* free_inode_storage should always be true as long as
+	 * local_storage->list was non-empty.
+	 */
+	if (free_inode_storage)
+		kfree_rcu(local_storage, rcu);
+}
+
+static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
+{
+	struct bpf_local_storage_data *sdata;
+	struct file *f;
+	int fd;
+
+	fd = *(int *)key;
+	f = fget_raw(fd);
+	if (!f)
+		return NULL;
+
+	sdata = inode_storage_lookup(f->f_inode, map, true);
+	fput(f);
+	return sdata ? sdata->data : NULL;
+}
+
+static int bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
+					    void *value, u64 map_flags)
+{
+	struct bpf_local_storage_data *sdata;
+	struct file *f;
+	int fd;
+
+	fd = *(int *)key;
+	f = fget_raw(fd);
+	if (!f || !inode_storage_ptr(f->f_inode))
+		return -EBADF;
+
+	sdata = bpf_local_storage_update(f->f_inode,
+					 (struct bpf_local_storage_map *)map,
+					 value, map_flags);
+	fput(f);
+	return PTR_ERR_OR_ZERO(sdata);
+}
+
+static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
+{
+	struct bpf_local_storage_data *sdata;
+
+	sdata = inode_storage_lookup(inode, map, false);
+	if (!sdata)
+		return -ENOENT;
+
+	bpf_selem_unlink(SELEM(sdata));
+
+	return 0;
+}
+
+static int bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
+{
+	struct file *f;
+	int fd, err;
+
+	fd = *(int *)key;
+	f = fget_raw(fd);
+	if (!f)
+		return -EBADF;
+
+	err = inode_storage_delete(f->f_inode, map);
+	fput(f);
+	return err;
+}
+
+BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
+	   void *, value, u64, flags)
+{
+	struct bpf_local_storage_data *sdata;
+
+	if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
+		return (unsigned long)NULL;
+
+	/* explicitly check that the inode_storage_ptr is not
+	 * NULL as inode_storage_lookup returns NULL in this case and
+	 * bpf_local_storage_update expects the owner to have a
+	 * valid storage pointer.
+	 */
+	if (!inode_storage_ptr(inode))
+		return (unsigned long)NULL;
+
+	sdata = inode_storage_lookup(inode, map, true);
+	if (sdata)
+		return (unsigned long)sdata->data;
+
+	/* This helper must only be called from where the inode is guaranteed
+	 * to have a refcount and cannot be freed.
+	 */
+	if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
+		sdata = bpf_local_storage_update(
+			inode, (struct bpf_local_storage_map *)map, value,
+			BPF_NOEXIST);
+		return IS_ERR(sdata) ? (unsigned long)NULL :
+				       (unsigned long)sdata->data;
+	}
+
+	return (unsigned long)NULL;
+}
+
+BPF_CALL_2(bpf_inode_storage_delete,
+	   struct bpf_map *, map, struct inode *, inode)
+{
+	/* This helper must only be called from where the inode is guaranteed
+	 * to have a refcount and cannot be freed.
+	 */
+	return inode_storage_delete(inode, map);
+}
+
+static int notsupp_get_next_key(struct bpf_map *map, void *key,
+				void *next_key)
+{
+	return -ENOTSUPP;
+}
+
+static struct bpf_map *inode_storage_map_alloc(union bpf_attr *attr)
+{
+	struct bpf_local_storage_map *smap;
+
+	smap = bpf_local_storage_map_alloc(attr);
+	if (IS_ERR(smap))
+		return ERR_CAST(smap);
+
+	smap->cache_idx = bpf_local_storage_cache_idx_get(&inode_cache);
+	return &smap->map;
+}
+
+static void inode_storage_map_free(struct bpf_map *map)
+{
+	struct bpf_local_storage_map *smap;
+
+	smap = (struct bpf_local_storage_map *)map;
+	bpf_local_storage_cache_idx_free(&inode_cache, smap->cache_idx);
+	bpf_local_storage_map_free(smap);
+}
+
+static int inode_storage_map_btf_id;
+const struct bpf_map_ops inode_storage_map_ops = {
+	.map_meta_equal = bpf_map_meta_equal,
+	.map_alloc_check = bpf_local_storage_map_alloc_check,
+	.map_alloc = inode_storage_map_alloc,
+	.map_free = inode_storage_map_free,
+	.map_get_next_key = notsupp_get_next_key,
+	.map_lookup_elem = bpf_fd_inode_storage_lookup_elem,
+	.map_update_elem = bpf_fd_inode_storage_update_elem,
+	.map_delete_elem = bpf_fd_inode_storage_delete_elem,
+	.map_check_btf = bpf_local_storage_map_check_btf,
+	.map_btf_name = "bpf_local_storage_map",
+	.map_btf_id = &inode_storage_map_btf_id,
+	.map_owner_storage_ptr = inode_storage_ptr,
+};
+
+BTF_ID_LIST(bpf_inode_storage_btf_ids)
+BTF_ID_UNUSED
+BTF_ID(struct, inode)
+
+const struct bpf_func_proto bpf_inode_storage_get_proto = {
+	.func		= bpf_inode_storage_get,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_BTF_ID,
+	.arg3_type	= ARG_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg4_type	= ARG_ANYTHING,
+	.btf_id		= bpf_inode_storage_btf_ids,
+};
+
+const struct bpf_func_proto bpf_inode_storage_delete_proto = {
+	.func		= bpf_inode_storage_delete,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_BTF_ID,
+	.btf_id		= bpf_inode_storage_btf_ids,
+};
+58
kernel/bpf/bpf_iter.c
··· 390 390 return ret; 391 391 } 392 392 393 + static void bpf_iter_link_show_fdinfo(const struct bpf_link *link, 394 + struct seq_file *seq) 395 + { 396 + struct bpf_iter_link *iter_link = 397 + container_of(link, struct bpf_iter_link, link); 398 + bpf_iter_show_fdinfo_t show_fdinfo; 399 + 400 + seq_printf(seq, 401 + "target_name:\t%s\n", 402 + iter_link->tinfo->reg_info->target); 403 + 404 + show_fdinfo = iter_link->tinfo->reg_info->show_fdinfo; 405 + if (show_fdinfo) 406 + show_fdinfo(&iter_link->aux, seq); 407 + } 408 + 409 + static int bpf_iter_link_fill_link_info(const struct bpf_link *link, 410 + struct bpf_link_info *info) 411 + { 412 + struct bpf_iter_link *iter_link = 413 + container_of(link, struct bpf_iter_link, link); 414 + char __user *ubuf = u64_to_user_ptr(info->iter.target_name); 415 + bpf_iter_fill_link_info_t fill_link_info; 416 + u32 ulen = info->iter.target_name_len; 417 + const char *target_name; 418 + u32 target_len; 419 + 420 + if (!ulen ^ !ubuf) 421 + return -EINVAL; 422 + 423 + target_name = iter_link->tinfo->reg_info->target; 424 + target_len = strlen(target_name); 425 + info->iter.target_name_len = target_len + 1; 426 + 427 + if (ubuf) { 428 + if (ulen >= target_len + 1) { 429 + if (copy_to_user(ubuf, target_name, target_len + 1)) 430 + return -EFAULT; 431 + } else { 432 + char zero = '\0'; 433 + 434 + if (copy_to_user(ubuf, target_name, ulen - 1)) 435 + return -EFAULT; 436 + if (put_user(zero, ubuf + ulen - 1)) 437 + return -EFAULT; 438 + return -ENOSPC; 439 + } 440 + } 441 + 442 + fill_link_info = iter_link->tinfo->reg_info->fill_link_info; 443 + if (fill_link_info) 444 + return fill_link_info(&iter_link->aux, info); 445 + 446 + return 0; 447 + } 448 + 393 449 static const struct bpf_link_ops bpf_iter_link_lops = { 394 450 .release = bpf_iter_link_release, 395 451 .dealloc = bpf_iter_link_dealloc, 396 452 .update_prog = bpf_iter_link_replace, 453 + .show_fdinfo = bpf_iter_link_show_fdinfo, 454 + .fill_link_info = 
bpf_iter_link_fill_link_info, 397 455 }; 398 456 399 457 bool bpf_link_is_iter(struct bpf_link *link)
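bpf_iter_link_fill_link_info() above validates that the user buffer and its length are either both set or both zero, always reports the required size, and NUL-terminates even a truncated copy before returning -ENOSPC. That copy logic can be modeled in plain userspace C (copy_to_user replaced by memcpy; function and parameter names are illustrative):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

static int fill_target_name(char *ubuf, unsigned int ulen,
			    const char *target, unsigned int *needed)
{
	unsigned int target_len = (unsigned int)strlen(target);

	/* Buffer pointer and length must be both set or both zero. */
	if (!ulen ^ !ubuf)
		return -EINVAL;

	*needed = target_len + 1;	/* report required size to caller */
	if (!ubuf)
		return 0;

	if (ulen >= target_len + 1) {
		memcpy(ubuf, target, target_len + 1);
		return 0;
	}
	memcpy(ubuf, target, ulen - 1);	/* truncated copy... */
	ubuf[ulen - 1] = '\0';		/* ...but still NUL-terminated */
	return -ENOSPC;
}
```

Userspace can therefore call once with a zero-length buffer to learn the size, then call again with a buffer of at least target_name_len bytes.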
+600
kernel/bpf/bpf_local_storage.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2019 Facebook */ 3 + #include <linux/rculist.h> 4 + #include <linux/list.h> 5 + #include <linux/hash.h> 6 + #include <linux/types.h> 7 + #include <linux/spinlock.h> 8 + #include <linux/bpf.h> 9 + #include <linux/btf_ids.h> 10 + #include <linux/bpf_local_storage.h> 11 + #include <net/sock.h> 12 + #include <uapi/linux/sock_diag.h> 13 + #include <uapi/linux/btf.h> 14 + 15 + #define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE) 16 + 17 + static struct bpf_local_storage_map_bucket * 18 + select_bucket(struct bpf_local_storage_map *smap, 19 + struct bpf_local_storage_elem *selem) 20 + { 21 + return &smap->buckets[hash_ptr(selem, smap->bucket_log)]; 22 + } 23 + 24 + static int mem_charge(struct bpf_local_storage_map *smap, void *owner, u32 size) 25 + { 26 + struct bpf_map *map = &smap->map; 27 + 28 + if (!map->ops->map_local_storage_charge) 29 + return 0; 30 + 31 + return map->ops->map_local_storage_charge(smap, owner, size); 32 + } 33 + 34 + static void mem_uncharge(struct bpf_local_storage_map *smap, void *owner, 35 + u32 size) 36 + { 37 + struct bpf_map *map = &smap->map; 38 + 39 + if (map->ops->map_local_storage_uncharge) 40 + map->ops->map_local_storage_uncharge(smap, owner, size); 41 + } 42 + 43 + static struct bpf_local_storage __rcu ** 44 + owner_storage(struct bpf_local_storage_map *smap, void *owner) 45 + { 46 + struct bpf_map *map = &smap->map; 47 + 48 + return map->ops->map_owner_storage_ptr(owner); 49 + } 50 + 51 + static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem) 52 + { 53 + return !hlist_unhashed(&selem->snode); 54 + } 55 + 56 + static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem) 57 + { 58 + return !hlist_unhashed(&selem->map_node); 59 + } 60 + 61 + struct bpf_local_storage_elem * 62 + bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, 63 + void *value, bool charge_mem) 64 + { 65 + struct 
bpf_local_storage_elem *selem; 66 + 67 + if (charge_mem && mem_charge(smap, owner, smap->elem_size)) 68 + return NULL; 69 + 70 + selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN); 71 + if (selem) { 72 + if (value) 73 + memcpy(SDATA(selem)->data, value, smap->map.value_size); 74 + return selem; 75 + } 76 + 77 + if (charge_mem) 78 + mem_uncharge(smap, owner, smap->elem_size); 79 + 80 + return NULL; 81 + } 82 + 83 + /* local_storage->lock must be held and selem->local_storage == local_storage. 84 + * The caller must ensure selem->smap is still valid to be 85 + * dereferenced for its smap->elem_size and smap->cache_idx. 86 + */ 87 + bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, 88 + struct bpf_local_storage_elem *selem, 89 + bool uncharge_mem) 90 + { 91 + struct bpf_local_storage_map *smap; 92 + bool free_local_storage; 93 + void *owner; 94 + 95 + smap = rcu_dereference(SDATA(selem)->smap); 96 + owner = local_storage->owner; 97 + 98 + /* All uncharging on the owner must be done first. 99 + * The owner may be freed once the last selem is unlinked 100 + * from local_storage. 101 + */ 102 + if (uncharge_mem) 103 + mem_uncharge(smap, owner, smap->elem_size); 104 + 105 + free_local_storage = hlist_is_singular_node(&selem->snode, 106 + &local_storage->list); 107 + if (free_local_storage) { 108 + mem_uncharge(smap, owner, sizeof(struct bpf_local_storage)); 109 + local_storage->owner = NULL; 110 + 111 + /* After this RCU_INIT, owner may be freed and cannot be used */ 112 + RCU_INIT_POINTER(*owner_storage(smap, owner), NULL); 113 + 114 + /* local_storage is not freed now. local_storage->lock is 115 + * still held and raw_spin_unlock_bh(&local_storage->lock) 116 + * will be done by the caller. 117 + * 118 + * Although the unlock will be done under 119 + * rcu_read_lock(), it is more intuitive to 120 + * read if kfree_rcu(local_storage, rcu) is done 121 + * after the raw_spin_unlock_bh(&local_storage->lock). 
122 + * 123 + * Hence, a "bool free_local_storage" is returned 124 + * to the caller which then calls the kfree_rcu() 125 + * after unlock. 126 + */ 127 + } 128 + hlist_del_init_rcu(&selem->snode); 129 + if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) == 130 + SDATA(selem)) 131 + RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL); 132 + 133 + kfree_rcu(selem, rcu); 134 + 135 + return free_local_storage; 136 + } 137 + 138 + static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem) 139 + { 140 + struct bpf_local_storage *local_storage; 141 + bool free_local_storage = false; 142 + 143 + if (unlikely(!selem_linked_to_storage(selem))) 144 + /* selem has already been unlinked from sk */ 145 + return; 146 + 147 + local_storage = rcu_dereference(selem->local_storage); 148 + raw_spin_lock_bh(&local_storage->lock); 149 + if (likely(selem_linked_to_storage(selem))) 150 + free_local_storage = bpf_selem_unlink_storage_nolock( 151 + local_storage, selem, true); 152 + raw_spin_unlock_bh(&local_storage->lock); 153 + 154 + if (free_local_storage) 155 + kfree_rcu(local_storage, rcu); 156 + } 157 + 158 + void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, 159 + struct bpf_local_storage_elem *selem) 160 + { 161 + RCU_INIT_POINTER(selem->local_storage, local_storage); 162 + hlist_add_head(&selem->snode, &local_storage->list); 163 + } 164 + 165 + void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem) 166 + { 167 + struct bpf_local_storage_map *smap; 168 + struct bpf_local_storage_map_bucket *b; 169 + 170 + if (unlikely(!selem_linked_to_map(selem))) 171 + /* selem has already been unlinked from smap */ 172 + return; 173 + 174 + smap = rcu_dereference(SDATA(selem)->smap); 175 + b = select_bucket(smap, selem); 176 + raw_spin_lock_bh(&b->lock); 177 + if (likely(selem_linked_to_map(selem))) 178 + hlist_del_init_rcu(&selem->map_node); 179 + raw_spin_unlock_bh(&b->lock); 180 + } 181 + 182 + void 
bpf_selem_link_map(struct bpf_local_storage_map *smap, 183 + struct bpf_local_storage_elem *selem) 184 + { 185 + struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem); 186 + 187 + raw_spin_lock_bh(&b->lock); 188 + RCU_INIT_POINTER(SDATA(selem)->smap, smap); 189 + hlist_add_head_rcu(&selem->map_node, &b->list); 190 + raw_spin_unlock_bh(&b->lock); 191 + } 192 + 193 + void bpf_selem_unlink(struct bpf_local_storage_elem *selem) 194 + { 195 + /* Always unlink from map before unlinking from local_storage 196 + * because selem will be freed after successfully unlinked from 197 + * the local_storage. 198 + */ 199 + bpf_selem_unlink_map(selem); 200 + __bpf_selem_unlink_storage(selem); 201 + } 202 + 203 + struct bpf_local_storage_data * 204 + bpf_local_storage_lookup(struct bpf_local_storage *local_storage, 205 + struct bpf_local_storage_map *smap, 206 + bool cacheit_lockit) 207 + { 208 + struct bpf_local_storage_data *sdata; 209 + struct bpf_local_storage_elem *selem; 210 + 211 + /* Fast path (cache hit) */ 212 + sdata = rcu_dereference(local_storage->cache[smap->cache_idx]); 213 + if (sdata && rcu_access_pointer(sdata->smap) == smap) 214 + return sdata; 215 + 216 + /* Slow path (cache miss) */ 217 + hlist_for_each_entry_rcu(selem, &local_storage->list, snode) 218 + if (rcu_access_pointer(SDATA(selem)->smap) == smap) 219 + break; 220 + 221 + if (!selem) 222 + return NULL; 223 + 224 + sdata = SDATA(selem); 225 + if (cacheit_lockit) { 226 + /* spinlock is needed to avoid racing with the 227 + * parallel delete. Otherwise, publishing an already 228 + * deleted sdata to the cache will become a use-after-free 229 + * problem in the next bpf_local_storage_lookup(). 
230 + */ 231 + raw_spin_lock_bh(&local_storage->lock); 232 + if (selem_linked_to_storage(selem)) 233 + rcu_assign_pointer(local_storage->cache[smap->cache_idx], 234 + sdata); 235 + raw_spin_unlock_bh(&local_storage->lock); 236 + } 237 + 238 + return sdata; 239 + } 240 + 241 + static int check_flags(const struct bpf_local_storage_data *old_sdata, 242 + u64 map_flags) 243 + { 244 + if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST) 245 + /* elem already exists */ 246 + return -EEXIST; 247 + 248 + if (!old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_EXIST) 249 + /* elem doesn't exist, cannot update it */ 250 + return -ENOENT; 251 + 252 + return 0; 253 + } 254 + 255 + int bpf_local_storage_alloc(void *owner, 256 + struct bpf_local_storage_map *smap, 257 + struct bpf_local_storage_elem *first_selem) 258 + { 259 + struct bpf_local_storage *prev_storage, *storage; 260 + struct bpf_local_storage **owner_storage_ptr; 261 + int err; 262 + 263 + err = mem_charge(smap, owner, sizeof(*storage)); 264 + if (err) 265 + return err; 266 + 267 + storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN); 268 + if (!storage) { 269 + err = -ENOMEM; 270 + goto uncharge; 271 + } 272 + 273 + INIT_HLIST_HEAD(&storage->list); 274 + raw_spin_lock_init(&storage->lock); 275 + storage->owner = owner; 276 + 277 + bpf_selem_link_storage_nolock(storage, first_selem); 278 + bpf_selem_link_map(smap, first_selem); 279 + 280 + owner_storage_ptr = 281 + (struct bpf_local_storage **)owner_storage(smap, owner); 282 + /* Publish storage to the owner. 283 + * Instead of using any lock of the kernel object (i.e. owner), 284 + * cmpxchg will work with any kernel object regardless what 285 + * the running context is, bh, irq...etc. 286 + * 287 + * From now on, the owner->storage pointer (e.g. sk->sk_bpf_storage) 288 + * is protected by the storage->lock. Hence, when freeing 289 + * the owner->storage, the storage->lock must be held before 290 + * setting owner->storage ptr to NULL. 
291 + */ 292 + prev_storage = cmpxchg(owner_storage_ptr, NULL, storage); 293 + if (unlikely(prev_storage)) { 294 + bpf_selem_unlink_map(first_selem); 295 + err = -EAGAIN; 296 + goto uncharge; 297 + 298 + /* Note that even though first_selem was linked to smap's 299 + * bucket->list, first_selem can be freed immediately 300 + * (instead of kfree_rcu) because 301 + * bpf_local_storage_map_free() does a 302 + * synchronize_rcu() before walking the bucket->list. 303 + * Hence, no one is accessing selem from the 304 + * bucket->list under rcu_read_lock(). 305 + */ 306 + } 307 + 308 + return 0; 309 + 310 + uncharge: 311 + kfree(storage); 312 + mem_uncharge(smap, owner, sizeof(*storage)); 313 + return err; 314 + } 315 + 316 + /* sk cannot be going away because it is linking new elem 317 + * to sk->sk_bpf_storage. (i.e. sk->sk_refcnt cannot be 0). 318 + * Otherwise, it will become a leak (and other memory issues 319 + * during map destruction). 320 + */ 321 + struct bpf_local_storage_data * 322 + bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, 323 + void *value, u64 map_flags) 324 + { 325 + struct bpf_local_storage_data *old_sdata = NULL; 326 + struct bpf_local_storage_elem *selem; 327 + struct bpf_local_storage *local_storage; 328 + int err; 329 + 330 + /* BPF_EXIST and BPF_NOEXIST cannot be both set */ 331 + if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) || 332 + /* BPF_F_LOCK can only be used in a value with spin_lock */ 333 + unlikely((map_flags & BPF_F_LOCK) && 334 + !map_value_has_spin_lock(&smap->map))) 335 + return ERR_PTR(-EINVAL); 336 + 337 + local_storage = rcu_dereference(*owner_storage(smap, owner)); 338 + if (!local_storage || hlist_empty(&local_storage->list)) { 339 + /* Very first elem for the owner */ 340 + err = check_flags(NULL, map_flags); 341 + if (err) 342 + return ERR_PTR(err); 343 + 344 + selem = bpf_selem_alloc(smap, owner, value, true); 345 + if (!selem) 346 + return ERR_PTR(-ENOMEM); 347 + 348 + err = 
bpf_local_storage_alloc(owner, smap, selem); 349 + if (err) { 350 + kfree(selem); 351 + mem_uncharge(smap, owner, smap->elem_size); 352 + return ERR_PTR(err); 353 + } 354 + 355 + return SDATA(selem); 356 + } 357 + 358 + if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) { 359 + /* Hoping to find an old_sdata to do inline update 360 + * such that it can avoid taking the local_storage->lock 361 + * and changing the lists. 362 + */ 363 + old_sdata = 364 + bpf_local_storage_lookup(local_storage, smap, false); 365 + err = check_flags(old_sdata, map_flags); 366 + if (err) 367 + return ERR_PTR(err); 368 + if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) { 369 + copy_map_value_locked(&smap->map, old_sdata->data, 370 + value, false); 371 + return old_sdata; 372 + } 373 + } 374 + 375 + raw_spin_lock_bh(&local_storage->lock); 376 + 377 + /* Recheck local_storage->list under local_storage->lock */ 378 + if (unlikely(hlist_empty(&local_storage->list))) { 379 + /* A parallel del is happening and local_storage is going 380 + * away. It has just been checked before, so very 381 + * unlikely. Return instead of retry to keep things 382 + * simple. 383 + */ 384 + err = -EAGAIN; 385 + goto unlock_err; 386 + } 387 + 388 + old_sdata = bpf_local_storage_lookup(local_storage, smap, false); 389 + err = check_flags(old_sdata, map_flags); 390 + if (err) 391 + goto unlock_err; 392 + 393 + if (old_sdata && (map_flags & BPF_F_LOCK)) { 394 + copy_map_value_locked(&smap->map, old_sdata->data, value, 395 + false); 396 + selem = SELEM(old_sdata); 397 + goto unlock; 398 + } 399 + 400 + /* local_storage->lock is held. Hence, we are sure 401 + * we can unlink and uncharge the old_sdata successfully 402 + * later. 
Hence, instead of charging the new selem now 403 + * and then uncharge the old selem later (which may cause 404 + * a potential but unnecessary charge failure), avoid taking 405 + * a charge at all here (the "!old_sdata" check) and the 406 + * old_sdata will not be uncharged later during 407 + * bpf_selem_unlink_storage_nolock(). 408 + */ 409 + selem = bpf_selem_alloc(smap, owner, value, !old_sdata); 410 + if (!selem) { 411 + err = -ENOMEM; 412 + goto unlock_err; 413 + } 414 + 415 + /* First, link the new selem to the map */ 416 + bpf_selem_link_map(smap, selem); 417 + 418 + /* Second, link (and publish) the new selem to local_storage */ 419 + bpf_selem_link_storage_nolock(local_storage, selem); 420 + 421 + /* Third, remove old selem, SELEM(old_sdata) */ 422 + if (old_sdata) { 423 + bpf_selem_unlink_map(SELEM(old_sdata)); 424 + bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata), 425 + false); 426 + } 427 + 428 + unlock: 429 + raw_spin_unlock_bh(&local_storage->lock); 430 + return SDATA(selem); 431 + 432 + unlock_err: 433 + raw_spin_unlock_bh(&local_storage->lock); 434 + return ERR_PTR(err); 435 + } 436 + 437 + u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache) 438 + { 439 + u64 min_usage = U64_MAX; 440 + u16 i, res = 0; 441 + 442 + spin_lock(&cache->idx_lock); 443 + 444 + for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) { 445 + if (cache->idx_usage_counts[i] < min_usage) { 446 + min_usage = cache->idx_usage_counts[i]; 447 + res = i; 448 + 449 + /* Found a free cache_idx */ 450 + if (!min_usage) 451 + break; 452 + } 453 + } 454 + cache->idx_usage_counts[res]++; 455 + 456 + spin_unlock(&cache->idx_lock); 457 + 458 + return res; 459 + } 460 + 461 + void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache, 462 + u16 idx) 463 + { 464 + spin_lock(&cache->idx_lock); 465 + cache->idx_usage_counts[idx]--; 466 + spin_unlock(&cache->idx_lock); 467 + } 468 + 469 + void bpf_local_storage_map_free(struct 
bpf_local_storage_map *smap) 470 + { 471 + struct bpf_local_storage_elem *selem; 472 + struct bpf_local_storage_map_bucket *b; 473 + unsigned int i; 474 + 475 + /* Note that this map might be concurrently cloned from 476 + * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone 477 + * RCU read section to finish before proceeding. New RCU 478 + * read sections should be prevented via bpf_map_inc_not_zero. 479 + */ 480 + synchronize_rcu(); 481 + 482 + /* bpf prog and the userspace can no longer access this map 483 + * now. No new selem (of this map) can be added 484 + * to the owner->storage or to the map bucket's list. 485 + * 486 + * The elem of this map can be cleaned up here 487 + * or when the storage is freed e.g. 488 + * by bpf_sk_storage_free() during __sk_destruct(). 489 + */ 490 + for (i = 0; i < (1U << smap->bucket_log); i++) { 491 + b = &smap->buckets[i]; 492 + 493 + rcu_read_lock(); 494 + /* No one is adding to b->list now */ 495 + while ((selem = hlist_entry_safe( 496 + rcu_dereference_raw(hlist_first_rcu(&b->list)), 497 + struct bpf_local_storage_elem, map_node))) { 498 + bpf_selem_unlink(selem); 499 + cond_resched_rcu(); 500 + } 501 + rcu_read_unlock(); 502 + } 503 + 504 + /* While freeing the storage we may still need to access the map. 505 + * 506 + * e.g. when bpf_sk_storage_free() has unlinked selem from the map 507 + * which then made the above while((selem = ...)) loop 508 + * exit immediately. 509 + * 510 + * However, while freeing the storage one still needs to access the 511 + * smap->elem_size to do the uncharging in 512 + * bpf_selem_unlink_storage_nolock(). 513 + * 514 + * Hence, wait another rcu grace period for the storage to be freed. 
515 + */ 516 + synchronize_rcu(); 517 + 518 + kvfree(smap->buckets); 519 + kfree(smap); 520 + } 521 + 522 + int bpf_local_storage_map_alloc_check(union bpf_attr *attr) 523 + { 524 + if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK || 525 + !(attr->map_flags & BPF_F_NO_PREALLOC) || 526 + attr->max_entries || 527 + attr->key_size != sizeof(int) || !attr->value_size || 528 + /* Enforce BTF for userspace sk dumping */ 529 + !attr->btf_key_type_id || !attr->btf_value_type_id) 530 + return -EINVAL; 531 + 532 + if (!bpf_capable()) 533 + return -EPERM; 534 + 535 + if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE) 536 + return -E2BIG; 537 + 538 + return 0; 539 + } 540 + 541 + struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr) 542 + { 543 + struct bpf_local_storage_map *smap; 544 + unsigned int i; 545 + u32 nbuckets; 546 + u64 cost; 547 + int ret; 548 + 549 + smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN); 550 + if (!smap) 551 + return ERR_PTR(-ENOMEM); 552 + bpf_map_init_from_attr(&smap->map, attr); 553 + 554 + nbuckets = roundup_pow_of_two(num_possible_cpus()); 555 + /* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */ 556 + nbuckets = max_t(u32, 2, nbuckets); 557 + smap->bucket_log = ilog2(nbuckets); 558 + cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap); 559 + 560 + ret = bpf_map_charge_init(&smap->map.memory, cost); 561 + if (ret < 0) { 562 + kfree(smap); 563 + return ERR_PTR(ret); 564 + } 565 + 566 + smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets, 567 + GFP_USER | __GFP_NOWARN); 568 + if (!smap->buckets) { 569 + bpf_map_charge_finish(&smap->map.memory); 570 + kfree(smap); 571 + return ERR_PTR(-ENOMEM); 572 + } 573 + 574 + for (i = 0; i < nbuckets; i++) { 575 + INIT_HLIST_HEAD(&smap->buckets[i].list); 576 + raw_spin_lock_init(&smap->buckets[i].lock); 577 + } 578 + 579 + smap->elem_size = 580 + sizeof(struct bpf_local_storage_elem) + attr->value_size; 581 + 582 + return 
smap; 583 + } 584 + 585 + int bpf_local_storage_map_check_btf(const struct bpf_map *map, 586 + const struct btf *btf, 587 + const struct btf_type *key_type, 588 + const struct btf_type *value_type) 589 + { 590 + u32 int_data; 591 + 592 + if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT) 593 + return -EINVAL; 594 + 595 + int_data = *(u32 *)(key_type + 1); 596 + if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data)) 597 + return -EINVAL; 598 + 599 + return 0; 600 + }
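The update path above leans on check_flags() to enforce the UAPI update-flag semantics: BPF_NOEXIST fails if an element already exists, BPF_EXIST fails if it doesn't, and BPF_F_LOCK is masked out before the comparison. A standalone model of that function (flag values follow the UAPI: BPF_ANY = 0, BPF_NOEXIST = 1, BPF_EXIST = 2, BPF_F_LOCK = 4; old_sdata is reduced to a bool here):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define BPF_ANY     0
#define BPF_NOEXIST 1
#define BPF_EXIST   2
#define BPF_F_LOCK  4

static int check_flags(bool old_sdata, unsigned long long map_flags)
{
	if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST)
		return -EEXIST;	/* elem already exists */

	if (!old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_EXIST)
		return -ENOENT;	/* elem doesn't exist, cannot update it */

	return 0;
}
```

Because BPF_F_LOCK is masked out, a locked update (BPF_EXIST | BPF_F_LOCK) behaves like BPF_EXIST for existence checks, which is what lets the caller take the inline copy_map_value_locked() fast path.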
+20 -1
kernel/bpf/bpf_lsm.c
··· 11 11 #include <linux/bpf_lsm.h> 12 12 #include <linux/kallsyms.h> 13 13 #include <linux/bpf_verifier.h> 14 + #include <net/bpf_sk_storage.h> 15 + #include <linux/bpf_local_storage.h> 14 16 15 17 /* For every LSM hook that allows attachment of BPF programs, declare a nop 16 18 * function where a BPF program can be attached. ··· 47 45 return 0; 48 46 } 49 47 48 + static const struct bpf_func_proto * 49 + bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) 50 + { 51 + switch (func_id) { 52 + case BPF_FUNC_inode_storage_get: 53 + return &bpf_inode_storage_get_proto; 54 + case BPF_FUNC_inode_storage_delete: 55 + return &bpf_inode_storage_delete_proto; 56 + case BPF_FUNC_sk_storage_get: 57 + return &sk_storage_get_btf_proto; 58 + case BPF_FUNC_sk_storage_delete: 59 + return &sk_storage_delete_btf_proto; 60 + default: 61 + return tracing_prog_func_proto(func_id, prog); 62 + } 63 + } 64 + 50 65 const struct bpf_prog_ops lsm_prog_ops = { 51 66 }; 52 67 53 68 const struct bpf_verifier_ops lsm_verifier_ops = { 54 - .get_func_proto = tracing_prog_func_proto, 69 + .get_func_proto = bpf_lsm_func_proto, 55 70 .is_valid_access = btf_ctx_access, 56 71 };
+2 -4
kernel/bpf/bpf_struct_ops.c
··· 298 298 return -EINVAL; 299 299 300 300 mtype = btf_type_by_id(btf_vmlinux, member->type); 301 - mtype = btf_resolve_size(btf_vmlinux, mtype, &msize, 302 - NULL, NULL); 301 + mtype = btf_resolve_size(btf_vmlinux, mtype, &msize); 303 302 if (IS_ERR(mtype)) 304 303 return PTR_ERR(mtype); 305 304 prev_mend = moff + msize; ··· 395 396 u32 msize; 396 397 397 398 mtype = btf_type_by_id(btf_vmlinux, member->type); 398 - mtype = btf_resolve_size(btf_vmlinux, mtype, &msize, 399 - NULL, NULL); 399 + mtype = btf_resolve_size(btf_vmlinux, mtype, &msize); 400 400 if (IS_ERR(mtype)) { 401 401 err = PTR_ERR(mtype); 402 402 goto reset_unlock;
+139 -24
kernel/bpf/btf.c
··· 21 21 #include <linux/btf_ids.h> 22 22 #include <linux/skmsg.h> 23 23 #include <linux/perf_event.h> 24 + #include <linux/bsearch.h> 25 + #include <linux/btf_ids.h> 24 26 #include <net/sock.h> 25 27 26 28 /* BTF (BPF Type Format) is the meta data format which describes ··· 1081 1079 * *type_size: (x * y * sizeof(u32)). Hence, *type_size always 1082 1080 * corresponds to the return type. 1083 1081 * *elem_type: u32 1082 + * *elem_id: id of u32 1084 1083 * *total_nelems: (x * y). Hence, individual elem size is 1085 1084 * (*type_size / *total_nelems) 1085 + * *type_id: id of type if it's changed within the function, 0 if not 1086 1086 * 1087 1087 * type: is not an array (e.g. const struct X) 1088 1088 * return type: type "struct X" 1089 1089 * *type_size: sizeof(struct X) 1090 1090 * *elem_type: same as return type ("struct X") 1091 + * *elem_id: 0 1091 1092 * *total_nelems: 1 1093 + * *type_id: id of type if it's changed within the function, 0 if not 1092 1094 */ 1093 - const struct btf_type * 1094 - btf_resolve_size(const struct btf *btf, const struct btf_type *type, 1095 - u32 *type_size, const struct btf_type **elem_type, 1096 - u32 *total_nelems) 1095 + static const struct btf_type * 1096 + __btf_resolve_size(const struct btf *btf, const struct btf_type *type, 1097 + u32 *type_size, const struct btf_type **elem_type, 1098 + u32 *elem_id, u32 *total_nelems, u32 *type_id) 1097 1099 { 1098 1100 const struct btf_type *array_type = NULL; 1099 - const struct btf_array *array; 1100 - u32 i, size, nelems = 1; 1101 + const struct btf_array *array = NULL; 1102 + u32 i, size, nelems = 1, id = 0; 1101 1103 1102 1104 for (i = 0; i < MAX_RESOLVE_DEPTH; i++) { 1103 1105 switch (BTF_INFO_KIND(type->info)) { ··· 1122 1116 case BTF_KIND_VOLATILE: 1123 1117 case BTF_KIND_CONST: 1124 1118 case BTF_KIND_RESTRICT: 1119 + id = type->type; 1125 1120 type = btf_type_by_id(btf, type->type); 1126 1121 break; 1127 1122 ··· 1153 1146 *total_nelems = nelems; 1154 1147 if (elem_type) 1155 
1148 *elem_type = type; 1149 + if (elem_id) 1150 + *elem_id = array ? array->type : 0; 1151 + if (type_id && id) 1152 + *type_id = id; 1156 1153 1157 1154 return array_type ? : type; 1155 + } 1156 + 1157 + const struct btf_type * 1158 + btf_resolve_size(const struct btf *btf, const struct btf_type *type, 1159 + u32 *type_size) 1160 + { 1161 + return __btf_resolve_size(btf, type, type_size, NULL, NULL, NULL, NULL); 1158 1162 } 1159 1163 1160 1164 /* The input param "type_id" must point to a needs_resolve type */ ··· 3888 3870 return true; 3889 3871 } 3890 3872 3891 - int btf_struct_access(struct bpf_verifier_log *log, 3892 - const struct btf_type *t, int off, int size, 3893 - enum bpf_access_type atype, 3894 - u32 *next_btf_id) 3873 + enum bpf_struct_walk_result { 3874 + /* < 0 error */ 3875 + WALK_SCALAR = 0, 3876 + WALK_PTR, 3877 + WALK_STRUCT, 3878 + }; 3879 + 3880 + static int btf_struct_walk(struct bpf_verifier_log *log, 3881 + const struct btf_type *t, int off, int size, 3882 + u32 *next_btf_id) 3895 3883 { 3896 3884 u32 i, moff, mtrue_end, msize = 0, total_nelems = 0; 3897 3885 const struct btf_type *mtype, *elem_type = NULL; 3898 3886 const struct btf_member *member; 3899 3887 const char *tname, *mname; 3900 - u32 vlen; 3888 + u32 vlen, elem_id, mid; 3901 3889 3902 3890 again: 3903 3891 tname = __btf_name_by_offset(btf_vmlinux, t->name_off); ··· 3939 3915 /* Only allow structure for now, can be relaxed for 3940 3916 * other types later. 
3941 3917 */ 3942 - elem_type = btf_type_skip_modifiers(btf_vmlinux, 3943 - array_elem->type, NULL); 3944 - if (!btf_type_is_struct(elem_type)) 3918 + t = btf_type_skip_modifiers(btf_vmlinux, array_elem->type, 3919 + NULL); 3920 + if (!btf_type_is_struct(t)) 3945 3921 goto error; 3946 3922 3947 - off = (off - moff) % elem_type->size; 3948 - return btf_struct_access(log, elem_type, off, size, atype, 3949 - next_btf_id); 3923 + off = (off - moff) % t->size; 3924 + goto again; 3950 3925 3951 3926 error: 3952 3927 bpf_log(log, "access beyond struct %s at off %u size %u\n", ··· 3974 3951 */ 3975 3952 if (off <= moff && 3976 3953 BITS_ROUNDUP_BYTES(end_bit) <= off + size) 3977 - return SCALAR_VALUE; 3954 + return WALK_SCALAR; 3978 3955 3979 3956 /* off may be accessing a following member 3980 3957 * ··· 3996 3973 break; 3997 3974 3998 3975 /* type of the field */ 3976 + mid = member->type; 3999 3977 mtype = btf_type_by_id(btf_vmlinux, member->type); 4000 3978 mname = __btf_name_by_offset(btf_vmlinux, member->name_off); 4001 3979 4002 - mtype = btf_resolve_size(btf_vmlinux, mtype, &msize, 4003 - &elem_type, &total_nelems); 3980 + mtype = __btf_resolve_size(btf_vmlinux, mtype, &msize, 3981 + &elem_type, &elem_id, &total_nelems, 3982 + &mid); 4004 3983 if (IS_ERR(mtype)) { 4005 3984 bpf_log(log, "field %s doesn't have size\n", mname); 4006 3985 return -EFAULT; ··· 4016 3991 if (btf_type_is_array(mtype)) { 4017 3992 u32 elem_idx; 4018 3993 4019 - /* btf_resolve_size() above helps to 3994 + /* __btf_resolve_size() above helps to 4020 3995 * linearize a multi-dimensional array. 
4021 3996 * 4022 3997 * The logic here is treating an array ··· 4064 4039 elem_idx = (off - moff) / msize; 4065 4040 moff += elem_idx * msize; 4066 4041 mtype = elem_type; 4042 + mid = elem_id; 4067 4043 } 4068 4044 4069 4045 /* the 'off' we're looking for is either equal to start ··· 4073 4047 if (btf_type_is_struct(mtype)) { 4074 4048 /* our field must be inside that union or struct */ 4075 4049 t = mtype; 4050 + 4051 + /* return if the offset matches the member offset */ 4052 + if (off == moff) { 4053 + *next_btf_id = mid; 4054 + return WALK_STRUCT; 4055 + } 4076 4056 4077 4057 /* adjust offset we're looking for */ 4078 4058 off -= moff; ··· 4095 4063 mname, moff, tname, off, size); 4096 4064 return -EACCES; 4097 4065 } 4098 - 4099 4066 stype = btf_type_skip_modifiers(btf_vmlinux, mtype->type, &id); 4100 4067 if (btf_type_is_struct(stype)) { 4101 4068 *next_btf_id = id; 4102 - return PTR_TO_BTF_ID; 4069 + return WALK_PTR; 4103 4070 } 4104 4071 } 4105 4072 ··· 4115 4084 return -EACCES; 4116 4085 } 4117 4086 4118 - return SCALAR_VALUE; 4087 + return WALK_SCALAR; 4119 4088 } 4120 4089 bpf_log(log, "struct %s doesn't have field at offset %d\n", tname, off); 4121 4090 return -EINVAL; 4091 + } 4092 + 4093 + int btf_struct_access(struct bpf_verifier_log *log, 4094 + const struct btf_type *t, int off, int size, 4095 + enum bpf_access_type atype __maybe_unused, 4096 + u32 *next_btf_id) 4097 + { 4098 + int err; 4099 + u32 id; 4100 + 4101 + do { 4102 + err = btf_struct_walk(log, t, off, size, &id); 4103 + 4104 + switch (err) { 4105 + case WALK_PTR: 4106 + /* If we found the pointer or scalar on t+off, 4107 + * we're done. 4108 + */ 4109 + *next_btf_id = id; 4110 + return PTR_TO_BTF_ID; 4111 + case WALK_SCALAR: 4112 + return SCALAR_VALUE; 4113 + case WALK_STRUCT: 4114 + /* We found nested struct, so continue the search 4115 + * by diving in it. At this point the offset is 4116 + * aligned with the new type, so set it to 0. 
4117 + */ 4118 + t = btf_type_by_id(btf_vmlinux, id); 4119 + off = 0; 4120 + break; 4121 + default: 4122 + /* It's either error or unknown return value.. 4123 + * scream and leave. 4124 + */ 4125 + if (WARN_ONCE(err > 0, "unknown btf_struct_walk return value")) 4126 + return -EINVAL; 4127 + return err; 4128 + } 4129 + } while (t); 4130 + 4131 + return -EINVAL; 4132 + } 4133 + 4134 + bool btf_struct_ids_match(struct bpf_verifier_log *log, 4135 + int off, u32 id, u32 need_type_id) 4136 + { 4137 + const struct btf_type *type; 4138 + int err; 4139 + 4140 + /* Are we already done? */ 4141 + if (need_type_id == id && off == 0) 4142 + return true; 4143 + 4144 + again: 4145 + type = btf_type_by_id(btf_vmlinux, id); 4146 + if (!type) 4147 + return false; 4148 + err = btf_struct_walk(log, type, off, 1, &id); 4149 + if (err != WALK_STRUCT) 4150 + return false; 4151 + 4152 + /* We found nested struct object. If it matches 4153 + * the requested ID, we're done. Otherwise let's 4154 + * continue the search with offset 0 in the new 4155 + * type. 4156 + */ 4157 + if (need_type_id != id) { 4158 + off = 0; 4159 + goto again; 4160 + } 4161 + 4162 + return true; 4122 4163 } 4123 4164 4124 4165 int btf_resolve_helper_id(struct bpf_verifier_log *log, ··· 4763 4660 u32 btf_id(const struct btf *btf) 4764 4661 { 4765 4662 return btf->id; 4663 + } 4664 + 4665 + static int btf_id_cmp_func(const void *a, const void *b) 4666 + { 4667 + const int *pa = a, *pb = b; 4668 + 4669 + return *pa - *pb; 4670 + } 4671 + 4672 + bool btf_id_set_contains(struct btf_id_set *set, u32 id) 4673 + { 4674 + return bsearch(&id, set->ids, set->cnt, sizeof(u32), btf_id_cmp_func) != NULL; 4766 4675 }
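The new btf_id_set_contains() above works because the BTF_ID set macros emit a sorted array of u32 IDs, so membership is a plain bsearch with the same comparator as btf_id_cmp_func(). A userspace sketch (the struct layout is simplified; the kernel's btf_id_set uses a flexible array member):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct btf_id_set { uint32_t cnt; uint32_t ids[8]; };

/* Same ordering as the patch's btf_id_cmp_func(); valid because BTF IDs
 * are small positive integers, so the subtraction cannot overflow. */
static int btf_id_cmp_func(const void *a, const void *b)
{
	const uint32_t *pa = a, *pb = b;

	return (int)*pa - (int)*pb;
}

static bool btf_id_set_contains(const struct btf_id_set *set, uint32_t id)
{
	return bsearch(&id, set->ids, set->cnt, sizeof(uint32_t),
		       btf_id_cmp_func) != NULL;
}
```

The sortedness precondition is what makes this O(log n); an unsorted ID list would silently produce wrong answers from bsearch.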
+4 -8
kernel/bpf/cpumap.c
··· 79 79 80 80 static DEFINE_PER_CPU(struct list_head, cpu_map_flush_list); 81 81 82 - static int bq_flush_to_queue(struct xdp_bulk_queue *bq); 83 - 84 82 static struct bpf_map *cpu_map_alloc(union bpf_attr *attr) 85 83 { 86 84 u32 value_size = attr->value_size; ··· 656 658 657 659 static int cpu_map_btf_id; 658 660 const struct bpf_map_ops cpu_map_ops = { 661 + .map_meta_equal = bpf_map_meta_equal, 659 662 .map_alloc = cpu_map_alloc, 660 663 .map_free = cpu_map_free, 661 664 .map_delete_elem = cpu_map_delete_elem, ··· 668 669 .map_btf_id = &cpu_map_btf_id, 669 670 }; 670 671 671 - static int bq_flush_to_queue(struct xdp_bulk_queue *bq) 672 + static void bq_flush_to_queue(struct xdp_bulk_queue *bq) 672 673 { 673 674 struct bpf_cpu_map_entry *rcpu = bq->obj; 674 675 unsigned int processed = 0, drops = 0; ··· 677 678 int i; 678 679 679 680 if (unlikely(!bq->count)) 680 - return 0; 681 + return; 681 682 682 683 q = rcpu->queue; 683 684 spin_lock(&q->producer_lock); ··· 700 701 701 702 /* Feedback loop via tracepoints */ 702 703 trace_xdp_cpumap_enqueue(rcpu->map_id, processed, drops, to_cpu); 703 - return 0; 704 704 } 705 705 706 706 /* Runs under RCU-read-side, plus in softirq under NAPI protection. 707 707 * Thus, safe percpu variable access. 708 708 */ 709 - static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf) 709 + static void bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf) 710 710 { 711 711 struct list_head *flush_list = this_cpu_ptr(&cpu_map_flush_list); 712 712 struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq); ··· 726 728 727 729 if (!bq->flush_node.prev) 728 730 list_add(&bq->flush_node, flush_list); 729 - 730 - return 0; 731 731 } 732 732 733 733 int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
+9 -8
kernel/bpf/devmap.c
··· 341 341 return false; 342 342 } 343 343 344 - static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) 344 + static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) 345 345 { 346 346 struct net_device *dev = bq->dev; 347 347 int sent = 0, drops = 0, err = 0; 348 348 int i; 349 349 350 350 if (unlikely(!bq->count)) 351 - return 0; 351 + return; 352 352 353 353 for (i = 0; i < bq->count; i++) { 354 354 struct xdp_frame *xdpf = bq->q[i]; ··· 369 369 trace_xdp_devmap_xmit(bq->dev_rx, dev, sent, drops, err); 370 370 bq->dev_rx = NULL; 371 371 __list_del_clearprev(&bq->flush_node); 372 - return 0; 372 + return; 373 373 error: 374 374 /* If ndo_xdp_xmit fails with an errno, no frames have been 375 375 * xmit'ed and it's our responsibility to free them all. ··· 421 421 /* Runs under RCU-read-side, plus in softirq under NAPI protection. 422 422 * Thus, safe percpu variable access. 423 423 */ 424 - static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, 425 - struct net_device *dev_rx) 424 + static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, 425 + struct net_device *dev_rx) 426 426 { 427 427 struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); 428 428 struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); ··· 441 441 442 442 if (!bq->flush_node.prev) 443 443 list_add(&bq->flush_node, flush_list); 444 - 445 - return 0; 446 444 } 447 445 448 446 static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, ··· 460 462 if (unlikely(!xdpf)) 461 463 return -EOVERFLOW; 462 464 463 - return bq_enqueue(dev, xdpf, dev_rx); 465 + bq_enqueue(dev, xdpf, dev_rx); 466 + return 0; 464 467 } 465 468 466 469 static struct xdp_buff *dev_map_run_prog(struct net_device *dev, ··· 750 751 751 752 static int dev_map_btf_id; 752 753 const struct bpf_map_ops dev_map_ops = { 754 + .map_meta_equal = bpf_map_meta_equal, 753 755 .map_alloc = dev_map_alloc, 754 756 .map_free = dev_map_free, 755 757 .map_get_next_key = 
dev_map_get_next_key, ··· 764 764 765 765 static int dev_map_hash_map_btf_id; 766 766 const struct bpf_map_ops dev_map_hash_ops = { 767 + .map_meta_equal = bpf_map_meta_equal, 767 768 .map_alloc = dev_map_alloc, 768 769 .map_free = dev_map_free, 769 770 .map_get_next_key = dev_map_hash_get_next_key,
+10 -6
kernel/bpf/hashtab.c
··· 9 9 #include <linux/rculist_nulls.h> 10 10 #include <linux/random.h> 11 11 #include <uapi/linux/btf.h> 12 + #include <linux/rcupdate_trace.h> 12 13 #include "percpu_freelist.h" 13 14 #include "bpf_lru_list.h" 14 15 #include "map_in_map.h" ··· 578 577 struct htab_elem *l; 579 578 u32 hash, key_size; 580 579 581 - /* Must be called with rcu_read_lock. */ 582 - WARN_ON_ONCE(!rcu_read_lock_held()); 580 + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held()); 583 581 584 582 key_size = map->key_size; 585 583 ··· 941 941 /* unknown flags */ 942 942 return -EINVAL; 943 943 944 - WARN_ON_ONCE(!rcu_read_lock_held()); 944 + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held()); 945 945 946 946 key_size = map->key_size; 947 947 ··· 1032 1032 /* unknown flags */ 1033 1033 return -EINVAL; 1034 1034 1035 - WARN_ON_ONCE(!rcu_read_lock_held()); 1035 + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held()); 1036 1036 1037 1037 key_size = map->key_size; 1038 1038 ··· 1220 1220 u32 hash, key_size; 1221 1221 int ret = -ENOENT; 1222 1222 1223 - WARN_ON_ONCE(!rcu_read_lock_held()); 1223 + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held()); 1224 1224 1225 1225 key_size = map->key_size; 1226 1226 ··· 1252 1252 u32 hash, key_size; 1253 1253 int ret = -ENOENT; 1254 1254 1255 - WARN_ON_ONCE(!rcu_read_lock_held()); 1255 + WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held()); 1256 1256 1257 1257 key_size = map->key_size; 1258 1258 ··· 1810 1810 1811 1811 static int htab_map_btf_id; 1812 1812 const struct bpf_map_ops htab_map_ops = { 1813 + .map_meta_equal = bpf_map_meta_equal, 1813 1814 .map_alloc_check = htab_map_alloc_check, 1814 1815 .map_alloc = htab_map_alloc, 1815 1816 .map_free = htab_map_free, ··· 1828 1827 1829 1828 static int htab_lru_map_btf_id; 1830 1829 const struct bpf_map_ops htab_lru_map_ops = { 1830 + .map_meta_equal = bpf_map_meta_equal, 1831 1831 .map_alloc_check = htab_map_alloc_check, 1832 1832 
.map_alloc = htab_map_alloc, 1833 1833 .map_free = htab_map_free, ··· 1949 1947 1950 1948 static int htab_percpu_map_btf_id; 1951 1949 const struct bpf_map_ops htab_percpu_map_ops = { 1950 + .map_meta_equal = bpf_map_meta_equal, 1952 1951 .map_alloc_check = htab_map_alloc_check, 1953 1952 .map_alloc = htab_map_alloc, 1954 1953 .map_free = htab_map_free, ··· 1966 1963 1967 1964 static int htab_lru_percpu_map_btf_id; 1968 1965 const struct bpf_map_ops htab_lru_percpu_map_ops = { 1966 + .map_meta_equal = bpf_map_meta_equal, 1969 1967 .map_alloc_check = htab_map_alloc_check, 1970 1968 .map_alloc = htab_map_alloc, 1971 1969 .map_free = htab_map_free,
+22
kernel/bpf/helpers.c
··· 601 601 .arg5_type = ARG_CONST_SIZE_OR_ZERO, 602 602 }; 603 603 604 + BPF_CALL_3(bpf_copy_from_user, void *, dst, u32, size, 605 + const void __user *, user_ptr) 606 + { 607 + int ret = copy_from_user(dst, user_ptr, size); 608 + 609 + if (unlikely(ret)) { 610 + memset(dst, 0, size); 611 + ret = -EFAULT; 612 + } 613 + 614 + return ret; 615 + } 616 + 617 + const struct bpf_func_proto bpf_copy_from_user_proto = { 618 + .func = bpf_copy_from_user, 619 + .gpl_only = false, 620 + .ret_type = RET_INTEGER, 621 + .arg1_type = ARG_PTR_TO_UNINIT_MEM, 622 + .arg2_type = ARG_CONST_SIZE_OR_ZERO, 623 + .arg3_type = ARG_ANYTHING, 624 + }; 625 + 604 626 const struct bpf_func_proto bpf_get_current_task_proto __weak; 605 627 const struct bpf_func_proto bpf_probe_read_user_proto __weak; 606 628 const struct bpf_func_proto bpf_probe_read_user_str_proto __weak;
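The key contract of the new bpf_copy_from_user() helper is visible in the hunk above: on any fault the destination buffer is fully zeroed and -EFAULT is returned, so a sleepable BPF program never observes a partially filled buffer. A userspace sketch of that contract (`copy_sim()` and its `fail_at` parameter are hypothetical stand-ins for copy_from_user(), which likewise returns the number of bytes it could not copy):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Stand-in for copy_from_user(): copies up to fail_at bytes, then
 * "faults" and returns how many bytes were NOT copied. */
static unsigned long copy_sim(void *dst, const void *src, unsigned long n,
			      unsigned long fail_at)
{
	unsigned long ok = fail_at < n ? fail_at : n;

	memcpy(dst, src, ok);
	return n - ok;
}

/* Sketch of bpf_copy_from_user()'s semantics: all-or-nothing from the
 * caller's point of view -- on failure, zero dst and report -EFAULT. */
static int bpf_copy_from_user_sim(void *dst, unsigned int size,
				  const void *user_ptr, unsigned long fail_at)
{
	int ret = copy_sim(dst, user_ptr, size, fail_at);

	if (ret) {
		memset(dst, 0, size);
		ret = -EFAULT;
	}
	return ret;
}
```

Zeroing on failure matches ARG_PTR_TO_UNINIT_MEM: the verifier allows the program to read the buffer afterwards, so it must be fully initialized on every path.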
+113 -3
kernel/bpf/inode.c
··· 20 20 #include <linux/filter.h> 21 21 #include <linux/bpf.h> 22 22 #include <linux/bpf_trace.h> 23 + #include "preload/bpf_preload.h" 23 24 24 25 enum bpf_type { 25 26 BPF_TYPE_UNSPEC = 0, ··· 370 369 bpf_lookup(struct inode *dir, struct dentry *dentry, unsigned flags) 371 370 { 372 371 /* Dots in names (e.g. "/sys/fs/bpf/foo.bar") are reserved for future 373 - * extensions. 372 + * extensions. That allows populate_bpffs() to create special files. 374 373 */ 375 - if (strchr(dentry->d_name.name, '.')) 374 + if ((dir->i_mode & S_IALLUGO) && 375 + strchr(dentry->d_name.name, '.')) 376 376 return ERR_PTR(-EPERM); 377 377 378 378 return simple_lookup(dir, dentry, flags); ··· 410 408 .link = simple_link, 411 409 .unlink = simple_unlink, 412 410 }; 411 + 412 + /* pin iterator link into bpffs */ 413 + static int bpf_iter_link_pin_kernel(struct dentry *parent, 414 + const char *name, struct bpf_link *link) 415 + { 416 + umode_t mode = S_IFREG | S_IRUSR; 417 + struct dentry *dentry; 418 + int ret; 419 + 420 + inode_lock(parent->d_inode); 421 + dentry = lookup_one_len(name, parent, strlen(name)); 422 + if (IS_ERR(dentry)) { 423 + inode_unlock(parent->d_inode); 424 + return PTR_ERR(dentry); 425 + } 426 + ret = bpf_mkobj_ops(dentry, mode, link, &bpf_link_iops, 427 + &bpf_iter_fops); 428 + dput(dentry); 429 + inode_unlock(parent->d_inode); 430 + return ret; 431 + } 413 432 414 433 static int bpf_obj_do_pin(const char __user *pathname, void *raw, 415 434 enum bpf_type type) ··· 661 638 return 0; 662 639 } 663 640 641 + struct bpf_preload_ops *bpf_preload_ops; 642 + EXPORT_SYMBOL_GPL(bpf_preload_ops); 643 + 644 + static bool bpf_preload_mod_get(void) 645 + { 646 + /* If bpf_preload.ko wasn't loaded earlier then load it now. 647 + * When bpf_preload is built into vmlinux the module's __init 648 + * function will populate it. 
649 + */ 650 + if (!bpf_preload_ops) { 651 + request_module("bpf_preload"); 652 + if (!bpf_preload_ops) 653 + return false; 654 + } 655 + /* And grab the reference, so the module doesn't disappear while the 656 + * kernel is interacting with the kernel module and its UMD. 657 + */ 658 + if (!try_module_get(bpf_preload_ops->owner)) { 659 + pr_err("bpf_preload module get failed.\n"); 660 + return false; 661 + } 662 + return true; 663 + } 664 + 665 + static void bpf_preload_mod_put(void) 666 + { 667 + if (bpf_preload_ops) 668 + /* now user can "rmmod bpf_preload" if necessary */ 669 + module_put(bpf_preload_ops->owner); 670 + } 671 + 672 + static DEFINE_MUTEX(bpf_preload_lock); 673 + 674 + static int populate_bpffs(struct dentry *parent) 675 + { 676 + struct bpf_preload_info objs[BPF_PRELOAD_LINKS] = {}; 677 + struct bpf_link *links[BPF_PRELOAD_LINKS] = {}; 678 + int err = 0, i; 679 + 680 + /* grab the mutex to make sure the kernel interactions with bpf_preload 681 + * UMD are serialized 682 + */ 683 + mutex_lock(&bpf_preload_lock); 684 + 685 + /* if bpf_preload.ko wasn't built into vmlinux then load it */ 686 + if (!bpf_preload_mod_get()) 687 + goto out; 688 + 689 + if (!bpf_preload_ops->info.tgid) { 690 + /* preload() will start UMD that will load BPF iterator programs */ 691 + err = bpf_preload_ops->preload(objs); 692 + if (err) 693 + goto out_put; 694 + for (i = 0; i < BPF_PRELOAD_LINKS; i++) { 695 + links[i] = bpf_link_by_id(objs[i].link_id); 696 + if (IS_ERR(links[i])) { 697 + err = PTR_ERR(links[i]); 698 + goto out_put; 699 + } 700 + } 701 + for (i = 0; i < BPF_PRELOAD_LINKS; i++) { 702 + err = bpf_iter_link_pin_kernel(parent, 703 + objs[i].link_name, links[i]); 704 + if (err) 705 + goto out_put; 706 + /* do not unlink successfully pinned links even 707 + * if later link fails to pin 708 + */ 709 + links[i] = NULL; 710 + } 711 + /* finish() will tell UMD process to exit */ 712 + err = bpf_preload_ops->finish(); 713 + if (err) 714 + goto out_put; 715 + } 716 + 
out_put: 717 + bpf_preload_mod_put(); 718 + out: 719 + mutex_unlock(&bpf_preload_lock); 720 + for (i = 0; i < BPF_PRELOAD_LINKS && err; i++) 721 + if (!IS_ERR_OR_NULL(links[i])) 722 + bpf_link_put(links[i]); 723 + return err; 724 + } 725 + 664 726 static int bpf_fill_super(struct super_block *sb, struct fs_context *fc) 665 727 { 666 728 static const struct tree_descr bpf_rfiles[] = { { "" } }; ··· 762 654 inode = sb->s_root->d_inode; 763 655 inode->i_op = &bpf_dir_iops; 764 656 inode->i_mode &= ~S_IALLUGO; 657 + populate_bpffs(sb->s_root); 765 658 inode->i_mode |= S_ISVTX | opts->mode; 766 - 767 659 return 0; 768 660 } 769 661 ··· 812 704 static int __init bpf_init(void) 813 705 { 814 706 int ret; 707 + 708 + mutex_init(&bpf_preload_lock); 815 709 816 710 ret = sysfs_create_mount_point(fs_kobj, "bpf"); 817 711 if (ret)
+1
kernel/bpf/lpm_trie.c
··· 732 732 733 733 static int trie_map_btf_id; 734 734 const struct bpf_map_ops trie_map_ops = { 735 + .map_meta_equal = bpf_map_meta_equal, 735 736 .map_alloc = trie_alloc, 736 737 .map_free = trie_free, 737 738 .map_get_next_key = trie_get_next_key,
+9 -15
kernel/bpf/map_in_map.c
··· 17 17 if (IS_ERR(inner_map)) 18 18 return inner_map; 19 19 20 - /* prog_array->aux->{type,jited} is a runtime binding. 21 - * Doing static check alone in the verifier is not enough. 22 - */ 23 - if (inner_map->map_type == BPF_MAP_TYPE_PROG_ARRAY || 24 - inner_map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE || 25 - inner_map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE || 26 - inner_map->map_type == BPF_MAP_TYPE_STRUCT_OPS) { 27 - fdput(f); 28 - return ERR_PTR(-ENOTSUPP); 29 - } 30 - 31 20 /* Does not support >1 level map-in-map */ 32 21 if (inner_map->inner_map_meta) { 33 22 fdput(f); 34 23 return ERR_PTR(-EINVAL); 24 + } 25 + 26 + if (!inner_map->ops->map_meta_equal) { 27 + fdput(f); 28 + return ERR_PTR(-ENOTSUPP); 35 29 } 36 30 37 31 if (map_value_has_spin_lock(inner_map)) { ··· 75 81 return meta0->map_type == meta1->map_type && 76 82 meta0->key_size == meta1->key_size && 77 83 meta0->value_size == meta1->value_size && 78 - meta0->map_flags == meta1->map_flags && 79 - meta0->max_entries == meta1->max_entries; 84 + meta0->map_flags == meta1->map_flags; 80 85 } 81 86 82 87 void *bpf_map_fd_get_ptr(struct bpf_map *map, 83 88 struct file *map_file /* not used */, 84 89 int ufd) 85 90 { 86 - struct bpf_map *inner_map; 91 + struct bpf_map *inner_map, *inner_map_meta; 87 92 struct fd f; 88 93 89 94 f = fdget(ufd); ··· 90 97 if (IS_ERR(inner_map)) 91 98 return inner_map; 92 99 93 - if (bpf_map_meta_equal(map->inner_map_meta, inner_map)) 100 + inner_map_meta = map->inner_map_meta; 101 + if (inner_map_meta->ops->map_meta_equal(inner_map_meta, inner_map)) 94 102 bpf_map_inc(inner_map); 95 103 else 96 104 inner_map = ERR_PTR(-EINVAL);
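The hunk above drops max_entries from the comparison and dispatches through a per-map-type map_meta_equal callback, so an inner map can be swapped for one with a different capacity. A minimal sketch of the relaxed default predicate (the `bpf_map` struct here is reduced to just the compared fields; the real one carries far more state):

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int u32;

/* Reduced bpf_map: only the fields the meta-equality check reads. */
struct bpf_map {
	int map_type;
	u32 key_size;
	u32 value_size;
	u32 map_flags;
	u32 max_entries;
};

/* Sketch of bpf_map_meta_equal(): max_entries is deliberately NOT
 * compared, so inner maps of different capacity are interchangeable
 * as long as type, key/value sizes and flags agree. */
static bool bpf_map_meta_equal(const struct bpf_map *meta0,
			       const struct bpf_map *meta1)
{
	return meta0->map_type == meta1->map_type &&
	       meta0->key_size == meta1->key_size &&
	       meta0->value_size == meta1->value_size &&
	       meta0->map_flags == meta1->map_flags;
}
```

Map types that genuinely depend on capacity or runtime binding simply leave the callback NULL, which the updated bpf_map_meta_alloc() rejects with -ENOTSUPP, replacing the old hard-coded type blacklist.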
-2
kernel/bpf/map_in_map.h
··· 11 11 12 12 struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd); 13 13 void bpf_map_meta_free(struct bpf_map *map_meta); 14 - bool bpf_map_meta_equal(const struct bpf_map *meta0, 15 - const struct bpf_map *meta1); 16 14 void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file, 17 15 int ufd); 18 16 void bpf_map_fd_put_ptr(void *ptr);
+15
kernel/bpf/map_iter.c
··· 149 149 bpf_map_put_with_uref(aux->map); 150 150 } 151 151 152 + void bpf_iter_map_show_fdinfo(const struct bpf_iter_aux_info *aux, 153 + struct seq_file *seq) 154 + { 155 + seq_printf(seq, "map_id:\t%u\n", aux->map->id); 156 + } 157 + 158 + int bpf_iter_map_fill_link_info(const struct bpf_iter_aux_info *aux, 159 + struct bpf_link_info *info) 160 + { 161 + info->iter.map.map_id = aux->map->id; 162 + return 0; 163 + } 164 + 152 165 DEFINE_BPF_ITER_FUNC(bpf_map_elem, struct bpf_iter_meta *meta, 153 166 struct bpf_map *map, void *key, void *value) 154 167 ··· 169 156 .target = "bpf_map_elem", 170 157 .attach_target = bpf_iter_attach_map, 171 158 .detach_target = bpf_iter_detach_map, 159 + .show_fdinfo = bpf_iter_map_show_fdinfo, 160 + .fill_link_info = bpf_iter_map_fill_link_info, 172 161 .ctx_arg_info_size = 2, 173 162 .ctx_arg_info = { 174 163 { offsetof(struct bpf_iter__bpf_map_elem, key),
+26
kernel/bpf/preload/Kconfig
··· 1 + # SPDX-License-Identifier: GPL-2.0-only 2 + config USERMODE_DRIVER 3 + bool 4 + default n 5 + 6 + menuconfig BPF_PRELOAD 7 + bool "Preload BPF file system with kernel specific program and map iterators" 8 + depends on BPF 9 + # The dependency on !COMPILE_TEST prevents it from being enabled 10 + # in allmodconfig or allyesconfig configurations 11 + depends on !COMPILE_TEST 12 + select USERMODE_DRIVER 13 + help 14 + This builds a kernel module with several embedded BPF programs that are 15 + pinned into the BPF FS mount point as human readable files that are 16 + useful in debugging and introspection of BPF programs and maps. 17 + 18 + if BPF_PRELOAD 19 + config BPF_PRELOAD_UMD 20 + tristate "bpf_preload kernel module with user mode driver" 21 + depends on CC_CAN_LINK 22 + depends on m || CC_CAN_LINK_STATIC 23 + default m 24 + help 25 + This builds the bpf_preload kernel module with an embedded user mode driver. 26 + endif
+23
kernel/bpf/preload/Makefile
··· 1 + # SPDX-License-Identifier: GPL-2.0 2 + 3 + LIBBPF_SRCS = $(srctree)/tools/lib/bpf/ 4 + LIBBPF_A = $(obj)/libbpf.a 5 + LIBBPF_OUT = $(abspath $(obj)) 6 + 7 + $(LIBBPF_A): 8 + $(Q)$(MAKE) -C $(LIBBPF_SRCS) OUTPUT=$(LIBBPF_OUT)/ $(LIBBPF_OUT)/libbpf.a 9 + 10 + userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi \ 11 + -I $(srctree)/tools/lib/ -Wno-unused-result 12 + 13 + userprogs := bpf_preload_umd 14 + 15 + bpf_preload_umd-objs := iterators/iterators.o 16 + bpf_preload_umd-userldlibs := $(LIBBPF_A) -lelf -lz 17 + 18 + $(obj)/bpf_preload_umd: $(LIBBPF_A) 19 + 20 + $(obj)/bpf_preload_umd_blob.o: $(obj)/bpf_preload_umd 21 + 22 + obj-$(CONFIG_BPF_PRELOAD_UMD) += bpf_preload.o 23 + bpf_preload-objs += bpf_preload_kern.o bpf_preload_umd_blob.o
+16
kernel/bpf/preload/bpf_preload.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _BPF_PRELOAD_H 3 + #define _BPF_PRELOAD_H 4 + 5 + #include <linux/usermode_driver.h> 6 + #include "iterators/bpf_preload_common.h" 7 + 8 + struct bpf_preload_ops { 9 + struct umd_info info; 10 + int (*preload)(struct bpf_preload_info *); 11 + int (*finish)(void); 12 + struct module *owner; 13 + }; 14 + extern struct bpf_preload_ops *bpf_preload_ops; 15 + #define BPF_PRELOAD_LINKS 2 16 + #endif
+91
kernel/bpf/preload/bpf_preload_kern.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 3 + #include <linux/init.h> 4 + #include <linux/module.h> 5 + #include <linux/pid.h> 6 + #include <linux/fs.h> 7 + #include <linux/sched/signal.h> 8 + #include "bpf_preload.h" 9 + 10 + extern char bpf_preload_umd_start; 11 + extern char bpf_preload_umd_end; 12 + 13 + static int preload(struct bpf_preload_info *obj); 14 + static int finish(void); 15 + 16 + static struct bpf_preload_ops umd_ops = { 17 + .info.driver_name = "bpf_preload", 18 + .preload = preload, 19 + .finish = finish, 20 + .owner = THIS_MODULE, 21 + }; 22 + 23 + static int preload(struct bpf_preload_info *obj) 24 + { 25 + int magic = BPF_PRELOAD_START; 26 + loff_t pos = 0; 27 + int i, err; 28 + ssize_t n; 29 + 30 + err = fork_usermode_driver(&umd_ops.info); 31 + if (err) 32 + return err; 33 + 34 + /* send the start magic to let UMD proceed with loading BPF progs */ 35 + n = kernel_write(umd_ops.info.pipe_to_umh, 36 + &magic, sizeof(magic), &pos); 37 + if (n != sizeof(magic)) 38 + return -EPIPE; 39 + 40 + /* receive bpf_link IDs and names from UMD */ 41 + pos = 0; 42 + for (i = 0; i < BPF_PRELOAD_LINKS; i++) { 43 + n = kernel_read(umd_ops.info.pipe_from_umh, 44 + &obj[i], sizeof(*obj), &pos); 45 + if (n != sizeof(*obj)) 46 + return -EPIPE; 47 + } 48 + return 0; 49 + } 50 + 51 + static int finish(void) 52 + { 53 + int magic = BPF_PRELOAD_END; 54 + struct pid *tgid; 55 + loff_t pos = 0; 56 + ssize_t n; 57 + 58 + /* send the last magic to UMD. It will do a normal exit. 
*/ 59 + n = kernel_write(umd_ops.info.pipe_to_umh, 60 + &magic, sizeof(magic), &pos); 61 + if (n != sizeof(magic)) 62 + return -EPIPE; 63 + tgid = umd_ops.info.tgid; 64 + wait_event(tgid->wait_pidfd, thread_group_exited(tgid)); 65 + umd_ops.info.tgid = NULL; 66 + return 0; 67 + } 68 + 69 + static int __init load_umd(void) 70 + { 71 + int err; 72 + 73 + err = umd_load_blob(&umd_ops.info, &bpf_preload_umd_start, 74 + &bpf_preload_umd_end - &bpf_preload_umd_start); 75 + if (err) 76 + return err; 77 + bpf_preload_ops = &umd_ops; 78 + return err; 79 + } 80 + 81 + static void __exit fini_umd(void) 82 + { 83 + bpf_preload_ops = NULL; 84 + /* kill UMD in case it's still there due to earlier error */ 85 + kill_pid(umd_ops.info.tgid, SIGKILL, 1); 86 + umd_ops.info.tgid = NULL; 87 + umd_unload_blob(&umd_ops.info); 88 + } 89 + late_initcall(load_umd); 90 + module_exit(fini_umd); 91 + MODULE_LICENSE("GPL");
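The kernel side and the UMD above speak a tiny fixed-size protocol over two pipes: a start magic down, BPF_PRELOAD_LINKS link records back, an end magic down. A self-contained sketch of that framing with both pipe ends held in one process (the `handshake()` function and its canned reply records are inventions of this sketch, not kernel API; the kernel uses kernel_read()/kernel_write() on the UMD's pipes instead):

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define BPF_PRELOAD_START 0x5555
#define BPF_PRELOAD_LINKS 2

/* Same layout as bpf_preload_common.h: both sides must agree, since
 * records are read and written as raw fixed-size blobs. */
struct bpf_preload_info {
	char link_name[16];
	int link_id;
};

/* One round of the preload() framing: "kernel" writes the start magic,
 * "UMD" answers with BPF_PRELOAD_LINKS back-to-back records. */
static int handshake(struct bpf_preload_info *out)
{
	struct bpf_preload_info reply[BPF_PRELOAD_LINKS] = {
		{ "maps.debug", 1 }, { "progs.debug", 2 },
	};
	int to_umh[2], from_umh[2], magic = BPF_PRELOAD_START;
	int i;

	if (pipe(to_umh) || pipe(from_umh))
		return -1;
	write(to_umh[1], &magic, sizeof(magic));   /* kernel -> UMD */
	read(to_umh[0], &magic, sizeof(magic));    /* UMD reads magic */
	if (magic != BPF_PRELOAD_START)
		return -1;
	for (i = 0; i < BPF_PRELOAD_LINKS; i++)    /* UMD -> kernel */
		write(from_umh[1], &reply[i], sizeof(reply[i]));
	for (i = 0; i < BPF_PRELOAD_LINKS; i++)
		if (read(from_umh[0], &out[i], sizeof(out[i])) !=
		    sizeof(out[i]))
			return -1;
	return 0;
}
```

Because every message is a fixed-size struct or int, neither side needs length prefixes or parsing; a short read or write is simply treated as -EPIPE, as in preload() and finish() above.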
+7
kernel/bpf/preload/bpf_preload_umd_blob.S
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + .section .init.rodata, "a" 3 + .global bpf_preload_umd_start 4 + bpf_preload_umd_start: 5 + .incbin "kernel/bpf/preload/bpf_preload_umd" 6 + .global bpf_preload_umd_end 7 + bpf_preload_umd_end:
+2
kernel/bpf/preload/iterators/.gitignore
··· 1 + # SPDX-License-Identifier: GPL-2.0-only 2 + /.output
+57
kernel/bpf/preload/iterators/Makefile
··· 1 + # SPDX-License-Identifier: GPL-2.0 2 + OUTPUT := .output 3 + CLANG ?= clang 4 + LLC ?= llc 5 + LLVM_STRIP ?= llvm-strip 6 + DEFAULT_BPFTOOL := $(OUTPUT)/sbin/bpftool 7 + BPFTOOL ?= $(DEFAULT_BPFTOOL) 8 + LIBBPF_SRC := $(abspath ../../../../tools/lib/bpf) 9 + BPFOBJ := $(OUTPUT)/libbpf.a 10 + BPF_INCLUDE := $(OUTPUT) 11 + INCLUDES := -I$(OUTPUT) -I$(BPF_INCLUDE) -I$(abspath ../../../../tools/lib) \ 12 + -I$(abspath ../../../../tools/include/uapi) 13 + CFLAGS := -g -Wall 14 + 15 + abs_out := $(abspath $(OUTPUT)) 16 + ifeq ($(V),1) 17 + Q = 18 + msg = 19 + else 20 + Q = @ 21 + msg = @printf ' %-8s %s%s\n' "$(1)" "$(notdir $(2))" "$(if $(3), $(3))"; 22 + MAKEFLAGS += --no-print-directory 23 + submake_extras := feature_display=0 24 + endif 25 + 26 + .DELETE_ON_ERROR: 27 + 28 + .PHONY: all clean 29 + 30 + all: iterators.skel.h 31 + 32 + clean: 33 + $(call msg,CLEAN) 34 + $(Q)rm -rf $(OUTPUT) iterators 35 + 36 + iterators.skel.h: $(OUTPUT)/iterators.bpf.o | $(BPFTOOL) 37 + $(call msg,GEN-SKEL,$@) 38 + $(Q)$(BPFTOOL) gen skeleton $< > $@ 39 + 40 + 41 + $(OUTPUT)/iterators.bpf.o: iterators.bpf.c $(BPFOBJ) | $(OUTPUT) 42 + $(call msg,BPF,$@) 43 + $(Q)$(CLANG) -g -O2 -target bpf $(INCLUDES) \ 44 + -c $(filter %.c,$^) -o $@ && \ 45 + $(LLVM_STRIP) -g $@ 46 + 47 + $(OUTPUT): 48 + $(call msg,MKDIR,$@) 49 + $(Q)mkdir -p $(OUTPUT) 50 + 51 + $(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(OUTPUT) 52 + $(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) \ 53 + OUTPUT=$(abspath $(dir $@))/ $(abspath $@) 54 + 55 + $(DEFAULT_BPFTOOL): 56 + $(Q)$(MAKE) $(submake_extras) -C ../../../../tools/bpf/bpftool \ 57 + prefix= OUTPUT=$(abs_out)/ DESTDIR=$(abs_out) install
+4
kernel/bpf/preload/iterators/README
··· 1 + WARNING: 2 + If you change "iterators.bpf.c" do "make -j" in this directory to rebuild "iterators.skel.h". 3 + Make sure to have clang 10 installed. 4 + See Documentation/bpf/bpf_devel_QA.rst
+13
kernel/bpf/preload/iterators/bpf_preload_common.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _BPF_PRELOAD_COMMON_H 3 + #define _BPF_PRELOAD_COMMON_H 4 + 5 + #define BPF_PRELOAD_START 0x5555 6 + #define BPF_PRELOAD_END 0xAAAA 7 + 8 + struct bpf_preload_info { 9 + char link_name[16]; 10 + int link_id; 11 + }; 12 + 13 + #endif
+114
kernel/bpf/preload/iterators/iterators.bpf.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2020 Facebook */ 3 + #include <linux/bpf.h> 4 + #include <bpf/bpf_helpers.h> 5 + #include <bpf/bpf_tracing.h> 6 + #include <bpf/bpf_core_read.h> 7 + 8 + #pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record) 9 + struct seq_file; 10 + struct bpf_iter_meta { 11 + struct seq_file *seq; 12 + __u64 session_id; 13 + __u64 seq_num; 14 + }; 15 + 16 + struct bpf_map { 17 + __u32 id; 18 + char name[16]; 19 + __u32 max_entries; 20 + }; 21 + 22 + struct bpf_iter__bpf_map { 23 + struct bpf_iter_meta *meta; 24 + struct bpf_map *map; 25 + }; 26 + 27 + struct btf_type { 28 + __u32 name_off; 29 + }; 30 + 31 + struct btf_header { 32 + __u32 str_len; 33 + }; 34 + 35 + struct btf { 36 + const char *strings; 37 + struct btf_type **types; 38 + struct btf_header hdr; 39 + }; 40 + 41 + struct bpf_prog_aux { 42 + __u32 id; 43 + char name[16]; 44 + const char *attach_func_name; 45 + struct bpf_prog *linked_prog; 46 + struct bpf_func_info *func_info; 47 + struct btf *btf; 48 + }; 49 + 50 + struct bpf_prog { 51 + struct bpf_prog_aux *aux; 52 + }; 53 + 54 + struct bpf_iter__bpf_prog { 55 + struct bpf_iter_meta *meta; 56 + struct bpf_prog *prog; 57 + }; 58 + #pragma clang attribute pop 59 + 60 + static const char *get_name(struct btf *btf, long btf_id, const char *fallback) 61 + { 62 + struct btf_type **types, *t; 63 + unsigned int name_off; 64 + const char *str; 65 + 66 + if (!btf) 67 + return fallback; 68 + str = btf->strings; 69 + types = btf->types; 70 + bpf_probe_read_kernel(&t, sizeof(t), types + btf_id); 71 + name_off = BPF_CORE_READ(t, name_off); 72 + if (name_off >= btf->hdr.str_len) 73 + return fallback; 74 + return str + name_off; 75 + } 76 + 77 + SEC("iter/bpf_map") 78 + int dump_bpf_map(struct bpf_iter__bpf_map *ctx) 79 + { 80 + struct seq_file *seq = ctx->meta->seq; 81 + __u64 seq_num = ctx->meta->seq_num; 82 + struct bpf_map *map = ctx->map; 83 + 84 + if (!map) 85 + return 0; 86 + 
87 + if (seq_num == 0) 88 + BPF_SEQ_PRINTF(seq, " id name max_entries\n"); 89 + 90 + BPF_SEQ_PRINTF(seq, "%4u %-16s%6d\n", map->id, map->name, map->max_entries); 91 + return 0; 92 + } 93 + 94 + SEC("iter/bpf_prog") 95 + int dump_bpf_prog(struct bpf_iter__bpf_prog *ctx) 96 + { 97 + struct seq_file *seq = ctx->meta->seq; 98 + __u64 seq_num = ctx->meta->seq_num; 99 + struct bpf_prog *prog = ctx->prog; 100 + struct bpf_prog_aux *aux; 101 + 102 + if (!prog) 103 + return 0; 104 + 105 + aux = prog->aux; 106 + if (seq_num == 0) 107 + BPF_SEQ_PRINTF(seq, " id name attached\n"); 108 + 109 + BPF_SEQ_PRINTF(seq, "%4u %-16s %s %s\n", aux->id, 110 + get_name(aux->btf, aux->func_info[0].type_id, aux->name), 111 + aux->attach_func_name, aux->linked_prog->aux->name); 112 + return 0; 113 + } 114 + char LICENSE[] SEC("license") = "GPL";
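The get_name() helper in the BPF program above resolves a name offset into the BTF string section, falling back whenever BTF is absent or the offset is out of bounds. The core bounds-checked lookup can be sketched in plain userspace C (the structs here mirror the reduced CO-RE views in the program; the BPF version additionally needs bpf_probe_read_kernel()/BPF_CORE_READ to dereference kernel pointers):

```c
#include <assert.h>
#include <string.h>

/* Reduced views of the in-kernel BTF objects, as in iterators.bpf.c. */
struct btf_header {
	unsigned int str_len;
};

struct btf {
	const char *strings;
	struct btf_header hdr;
};

/* Sketch of get_name(): an offset into the string section is only
 * trusted after checking it against hdr.str_len; anything else
 * yields the fallback name. */
static const char *get_name(const struct btf *btf, unsigned int name_off,
			    const char *fallback)
{
	if (!btf)
		return fallback;
	if (name_off >= btf->hdr.str_len)
		return fallback;
	return btf->strings + name_off;
}
```

BTF string sections start with a NUL byte, so offset 0 conveniently resolves to the empty string and nonzero offsets index real names.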
+94
kernel/bpf/preload/iterators/iterators.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2020 Facebook */ 3 + #include <argp.h> 4 + #include <stdio.h> 5 + #include <stdlib.h> 6 + #include <string.h> 7 + #include <unistd.h> 8 + #include <fcntl.h> 9 + #include <sys/resource.h> 10 + #include <bpf/libbpf.h> 11 + #include <bpf/bpf.h> 12 + #include <sys/mount.h> 13 + #include "iterators.skel.h" 14 + #include "bpf_preload_common.h" 15 + 16 + int to_kernel = -1; 17 + int from_kernel = 0; 18 + 19 + static int send_link_to_kernel(struct bpf_link *link, const char *link_name) 20 + { 21 + struct bpf_preload_info obj = {}; 22 + struct bpf_link_info info = {}; 23 + __u32 info_len = sizeof(info); 24 + int err; 25 + 26 + err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len); 27 + if (err) 28 + return err; 29 + obj.link_id = info.id; 30 + if (strlen(link_name) >= sizeof(obj.link_name)) 31 + return -E2BIG; 32 + strcpy(obj.link_name, link_name); 33 + if (write(to_kernel, &obj, sizeof(obj)) != sizeof(obj)) 34 + return -EPIPE; 35 + return 0; 36 + } 37 + 38 + int main(int argc, char **argv) 39 + { 40 + struct rlimit rlim = { RLIM_INFINITY, RLIM_INFINITY }; 41 + struct iterators_bpf *skel; 42 + int err, magic; 43 + int debug_fd; 44 + 45 + debug_fd = open("/dev/console", O_WRONLY | O_NOCTTY | O_CLOEXEC); 46 + if (debug_fd < 0) 47 + return 1; 48 + to_kernel = dup(1); 49 + close(1); 50 + dup(debug_fd); 51 + /* now stdout points to /dev/console */ 52 + 53 + read(from_kernel, &magic, sizeof(magic)); 54 + if (magic != BPF_PRELOAD_START) { 55 + printf("bad start magic %d\n", magic); 56 + return 1; 57 + } 58 + setrlimit(RLIMIT_MEMLOCK, &rlim); 59 + /* libbpf opens BPF object and loads it into the kernel */ 60 + skel = iterators_bpf__open_and_load(); 61 + if (!skel) { 62 + /* iterators.skel.h is little endian. 63 + * libbpf doesn't support automatic little->big conversion 64 + * of BPF bytecode yet. 65 + * The program load will fail in such a case. 
66 + */ 67 + printf("Failed load could be due to wrong endianness\n"); 68 + return 1; 69 + } 70 + err = iterators_bpf__attach(skel); 71 + if (err) 72 + goto cleanup; 73 + 74 + /* send two bpf_link IDs with names to the kernel */ 75 + err = send_link_to_kernel(skel->links.dump_bpf_map, "maps.debug"); 76 + if (err) 77 + goto cleanup; 78 + err = send_link_to_kernel(skel->links.dump_bpf_prog, "progs.debug"); 79 + if (err) 80 + goto cleanup; 81 + 82 + /* The kernel will proceed with pinning the links in bpffs. 83 + * UMD will wait on read from pipe. 84 + */ 85 + read(from_kernel, &magic, sizeof(magic)); 86 + if (magic != BPF_PRELOAD_END) { 87 + printf("bad final magic %d\n", magic); 88 + err = -EINVAL; 89 + } 90 + cleanup: 91 + iterators_bpf__destroy(skel); 92 + 93 + return err != 0; 94 + }
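The descriptor shuffle at the top of main() above (`to_kernel = dup(1); close(1); dup(debug_fd);`) relies on dup() returning the lowest free descriptor, so the debug fd lands on the just-freed slot 1. A self-contained demonstration (the `redirect_stdout_demo()` function is an invention of this sketch, with a pipe standing in for both the kernel pipe and /dev/console; the real stdout is saved and restored around the demo):

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Demonstrates the UMD's fd juggling: save fd 1 with dup(), free slot 1,
 * then dup() the debug fd so it occupies slot 1. Afterwards the saved fd
 * still reaches the "kernel" while plain fd 1 reaches the "console". */
static int redirect_stdout_demo(char *to_kernel_buf, char *debug_buf)
{
	int kern[2], dbg[2], saved, orig;

	if (pipe(kern) || pipe(dbg))
		return -1;
	orig = dup(1);             /* remember the real stdout */
	dup2(kern[1], 1);          /* pretend fd 1 is the kernel pipe */
	saved = dup(1);            /* to_kernel = dup(1) */
	close(1);                  /* free descriptor 1 ...           */
	dup(dbg[1]);               /* ... so the debug fd lands on it */

	write(saved, "link", 4);   /* goes to the kernel pipe    */
	write(1, "dbg", 3);        /* goes to the debug "console" */

	read(kern[0], to_kernel_buf, 4);
	read(dbg[0], debug_buf, 3);

	dup2(orig, 1);             /* restore the real stdout */
	close(orig);
	return 0;
}
```

The same lowest-free-descriptor guarantee is why the UMD can get away with a bare dup(debug_fd) instead of dup2(debug_fd, 1): at that point fds 0 and 2 are still open, so 1 is necessarily the lowest free slot.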
+410
kernel/bpf/preload/iterators/iterators.skel.h
··· 1 + /* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ 2 + 3 + /* THIS FILE IS AUTOGENERATED! */ 4 + #ifndef __ITERATORS_BPF_SKEL_H__ 5 + #define __ITERATORS_BPF_SKEL_H__ 6 + 7 + #include <stdlib.h> 8 + #include <bpf/libbpf.h> 9 + 10 + struct iterators_bpf { 11 + struct bpf_object_skeleton *skeleton; 12 + struct bpf_object *obj; 13 + struct { 14 + struct bpf_map *rodata; 15 + } maps; 16 + struct { 17 + struct bpf_program *dump_bpf_map; 18 + struct bpf_program *dump_bpf_prog; 19 + } progs; 20 + struct { 21 + struct bpf_link *dump_bpf_map; 22 + struct bpf_link *dump_bpf_prog; 23 + } links; 24 + struct iterators_bpf__rodata { 25 + char dump_bpf_map____fmt[35]; 26 + char dump_bpf_map____fmt_1[14]; 27 + char dump_bpf_prog____fmt[32]; 28 + char dump_bpf_prog____fmt_2[17]; 29 + } *rodata; 30 + }; 31 + 32 + static void 33 + iterators_bpf__destroy(struct iterators_bpf *obj) 34 + { 35 + if (!obj) 36 + return; 37 + if (obj->skeleton) 38 + bpf_object__destroy_skeleton(obj->skeleton); 39 + free(obj); 40 + } 41 + 42 + static inline int 43 + iterators_bpf__create_skeleton(struct iterators_bpf *obj); 44 + 45 + static inline struct iterators_bpf * 46 + iterators_bpf__open_opts(const struct bpf_object_open_opts *opts) 47 + { 48 + struct iterators_bpf *obj; 49 + 50 + obj = (typeof(obj))calloc(1, sizeof(*obj)); 51 + if (!obj) 52 + return NULL; 53 + if (iterators_bpf__create_skeleton(obj)) 54 + goto err; 55 + if (bpf_object__open_skeleton(obj->skeleton, opts)) 56 + goto err; 57 + 58 + return obj; 59 + err: 60 + iterators_bpf__destroy(obj); 61 + return NULL; 62 + } 63 + 64 + static inline struct iterators_bpf * 65 + iterators_bpf__open(void) 66 + { 67 + return iterators_bpf__open_opts(NULL); 68 + } 69 + 70 + static inline int 71 + iterators_bpf__load(struct iterators_bpf *obj) 72 + { 73 + return bpf_object__load_skeleton(obj->skeleton); 74 + } 75 + 76 + static inline struct iterators_bpf * 77 + iterators_bpf__open_and_load(void) 78 + { 79 + struct iterators_bpf *obj; 80 + 81 
+ obj = iterators_bpf__open(); 82 + if (!obj) 83 + return NULL; 84 + if (iterators_bpf__load(obj)) { 85 + iterators_bpf__destroy(obj); 86 + return NULL; 87 + } 88 + return obj; 89 + } 90 + 91 + static inline int 92 + iterators_bpf__attach(struct iterators_bpf *obj) 93 + { 94 + return bpf_object__attach_skeleton(obj->skeleton); 95 + } 96 + 97 + static inline void 98 + iterators_bpf__detach(struct iterators_bpf *obj) 99 + { 100 + return bpf_object__detach_skeleton(obj->skeleton); 101 + } 102 + 103 + static inline int 104 + iterators_bpf__create_skeleton(struct iterators_bpf *obj) 105 + { 106 + struct bpf_object_skeleton *s; 107 + 108 + s = (typeof(s))calloc(1, sizeof(*s)); 109 + if (!s) 110 + return -1; 111 + obj->skeleton = s; 112 + 113 + s->sz = sizeof(*s); 114 + s->name = "iterators_bpf"; 115 + s->obj = &obj->obj; 116 + 117 + /* maps */ 118 + s->map_cnt = 1; 119 + s->map_skel_sz = sizeof(*s->maps); 120 + s->maps = (typeof(s->maps))calloc(s->map_cnt, s->map_skel_sz); 121 + if (!s->maps) 122 + goto err; 123 + 124 + s->maps[0].name = "iterator.rodata"; 125 + s->maps[0].map = &obj->maps.rodata; 126 + s->maps[0].mmaped = (void **)&obj->rodata; 127 + 128 + /* programs */ 129 + s->prog_cnt = 2; 130 + s->prog_skel_sz = sizeof(*s->progs); 131 + s->progs = (typeof(s->progs))calloc(s->prog_cnt, s->prog_skel_sz); 132 + if (!s->progs) 133 + goto err; 134 + 135 + s->progs[0].name = "dump_bpf_map"; 136 + s->progs[0].prog = &obj->progs.dump_bpf_map; 137 + s->progs[0].link = &obj->links.dump_bpf_map; 138 + 139 + s->progs[1].name = "dump_bpf_prog"; 140 + s->progs[1].prog = &obj->progs.dump_bpf_prog; 141 + s->progs[1].link = &obj->links.dump_bpf_prog; 142 + 143 + s->data_sz = 7128; 144 + s->data = (void *)"\ 145 + \x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\ 146 + \0\0\0\0\0\0\0\0\0\0\0\x18\x18\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0f\0\ 147 + \x0e\0\x79\x12\0\0\0\0\0\0\x79\x26\0\0\0\0\0\0\x79\x17\x08\0\0\0\0\0\x15\x07\ 148 + 
\x1a\0\0\0\0\0\x79\x21\x10\0\0\0\0\0\x55\x01\x08\0\0\0\0\0\xbf\xa4\0\0\0\0\0\0\ 149 + \x07\x04\0\0\xe8\xff\xff\xff\xbf\x61\0\0\0\0\0\0\x18\x02\0\0\0\0\0\0\0\0\0\0\0\ 150 + \0\0\0\xb7\x03\0\0\x23\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x7e\0\0\0\x61\x71\0\ 151 + \0\0\0\0\0\x7b\x1a\xe8\xff\0\0\0\0\xb7\x01\0\0\x04\0\0\0\xbf\x72\0\0\0\0\0\0\ 152 + \x0f\x12\0\0\0\0\0\0\x7b\x2a\xf0\xff\0\0\0\0\x61\x71\x14\0\0\0\0\0\x7b\x1a\xf8\ 153 + \xff\0\0\0\0\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\xe8\xff\xff\xff\xbf\x61\0\0\0\0\0\ 154 + \0\x18\x02\0\0\x23\0\0\0\0\0\0\0\0\0\0\0\xb7\x03\0\0\x0e\0\0\0\xb7\x05\0\0\x18\ 155 + \0\0\0\x85\0\0\0\x7e\0\0\0\xb7\0\0\0\0\0\0\0\x95\0\0\0\0\0\0\0\x79\x12\0\0\0\0\ 156 + \0\0\x79\x26\0\0\0\0\0\0\x79\x11\x08\0\0\0\0\0\x15\x01\x3b\0\0\0\0\0\x79\x17\0\ 157 + \0\0\0\0\0\x79\x21\x10\0\0\0\0\0\x55\x01\x08\0\0\0\0\0\xbf\xa4\0\0\0\0\0\0\x07\ 158 + \x04\0\0\xd0\xff\xff\xff\xbf\x61\0\0\0\0\0\0\x18\x02\0\0\x31\0\0\0\0\0\0\0\0\0\ 159 + \0\0\xb7\x03\0\0\x20\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x7e\0\0\0\x7b\x6a\xc8\ 160 + \xff\0\0\0\0\x61\x71\0\0\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xb7\x03\0\0\x04\0\0\0\ 161 + \xbf\x79\0\0\0\0\0\0\x0f\x39\0\0\0\0\0\0\x79\x71\x28\0\0\0\0\0\x79\x78\x30\0\0\ 162 + \0\0\0\x15\x08\x18\0\0\0\0\0\xb7\x02\0\0\0\0\0\0\x0f\x21\0\0\0\0\0\0\x61\x11\ 163 + \x04\0\0\0\0\0\x79\x83\x08\0\0\0\0\0\x67\x01\0\0\x03\0\0\0\x0f\x13\0\0\0\0\0\0\ 164 + \x79\x86\0\0\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\x01\0\0\xf8\xff\xff\xff\xb7\x02\0\ 165 + \0\x08\0\0\0\x85\0\0\0\x71\0\0\0\xb7\x01\0\0\0\0\0\0\x79\xa3\xf8\xff\0\0\0\0\ 166 + \x0f\x13\0\0\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\x01\0\0\xf4\xff\xff\xff\xb7\x02\0\ 167 + \0\x04\0\0\0\x85\0\0\0\x04\0\0\0\xb7\x03\0\0\x04\0\0\0\x61\xa1\xf4\xff\0\0\0\0\ 168 + \x61\x82\x10\0\0\0\0\0\x3d\x21\x02\0\0\0\0\0\x0f\x16\0\0\0\0\0\0\xbf\x69\0\0\0\ 169 + \0\0\0\x7b\x9a\xd8\xff\0\0\0\0\x79\x71\x18\0\0\0\0\0\x7b\x1a\xe0\xff\0\0\0\0\ 170 + \x79\x71\x20\0\0\0\0\0\x79\x11\0\0\0\0\0\0\x0f\x31\0\0\0\0\0\0\x7b\x1a\xe8\xff\ 171 + 
\0\0\0\0\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\xd0\xff\xff\xff\x79\xa1\xc8\xff\0\0\0\ 172 + \0\x18\x02\0\0\x51\0\0\0\0\0\0\0\0\0\0\0\xb7\x03\0\0\x11\0\0\0\xb7\x05\0\0\x20\ 173 + \0\0\0\x85\0\0\0\x7e\0\0\0\xb7\0\0\0\0\0\0\0\x95\0\0\0\0\0\0\0\x20\x20\x69\x64\ 174 + \x20\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x6d\ 175 + \x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\x0a\0\x25\x34\x75\x20\x25\x2d\x31\x36\ 176 + \x73\x25\x36\x64\x0a\0\x20\x20\x69\x64\x20\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\ 177 + \x20\x20\x20\x20\x20\x20\x20\x20\x61\x74\x74\x61\x63\x68\x65\x64\x0a\0\x25\x34\ 178 + \x75\x20\x25\x2d\x31\x36\x73\x20\x25\x73\x20\x25\x73\x0a\0\x47\x50\x4c\0\x9f\ 179 + \xeb\x01\0\x18\0\0\0\0\0\0\0\x1c\x04\0\0\x1c\x04\0\0\0\x05\0\0\0\0\0\0\0\0\0\ 180 + \x02\x02\0\0\0\x01\0\0\0\x02\0\0\x04\x10\0\0\0\x13\0\0\0\x03\0\0\0\0\0\0\0\x18\ 181 + \0\0\0\x04\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\x02\x08\0\0\0\0\0\0\0\0\0\0\x02\x0d\0\ 182 + \0\0\0\0\0\0\x01\0\0\x0d\x06\0\0\0\x1c\0\0\0\x01\0\0\0\x20\0\0\0\0\0\0\x01\x04\ 183 + \0\0\0\x20\0\0\x01\x24\0\0\0\x01\0\0\x0c\x05\0\0\0\xa3\0\0\0\x03\0\0\x04\x18\0\ 184 + \0\0\xb1\0\0\0\x09\0\0\0\0\0\0\0\xb5\0\0\0\x0b\0\0\0\x40\0\0\0\xc0\0\0\0\x0b\0\ 185 + \0\0\x80\0\0\0\0\0\0\0\0\0\0\x02\x0a\0\0\0\xc8\0\0\0\0\0\0\x07\0\0\0\0\xd1\0\0\ 186 + \0\0\0\0\x08\x0c\0\0\0\xd7\0\0\0\0\0\0\x01\x08\0\0\0\x40\0\0\0\x98\x01\0\0\x03\ 187 + \0\0\x04\x18\0\0\0\xa0\x01\0\0\x0e\0\0\0\0\0\0\0\xa3\x01\0\0\x11\0\0\0\x20\0\0\ 188 + \0\xa8\x01\0\0\x0e\0\0\0\xa0\0\0\0\xb4\x01\0\0\0\0\0\x08\x0f\0\0\0\xba\x01\0\0\ 189 + \0\0\0\x01\x04\0\0\0\x20\0\0\0\xc7\x01\0\0\0\0\0\x01\x01\0\0\0\x08\0\0\x01\0\0\ 190 + \0\0\0\0\0\x03\0\0\0\0\x10\0\0\0\x12\0\0\0\x10\0\0\0\xcc\x01\0\0\0\0\0\x01\x04\ 191 + \0\0\0\x20\0\0\0\0\0\0\0\0\0\0\x02\x14\0\0\0\x30\x02\0\0\x02\0\0\x04\x10\0\0\0\ 192 + \x13\0\0\0\x03\0\0\0\0\0\0\0\x43\x02\0\0\x15\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\x02\ 193 + \x18\0\0\0\0\0\0\0\x01\0\0\x0d\x06\0\0\0\x1c\0\0\0\x13\0\0\0\x48\x02\0\0\x01\0\ 194 + 
\0\x0c\x16\0\0\0\x94\x02\0\0\x01\0\0\x04\x08\0\0\0\x9d\x02\0\0\x19\0\0\0\0\0\0\ 195 + \0\0\0\0\0\0\0\0\x02\x1a\0\0\0\xee\x02\0\0\x06\0\0\x04\x38\0\0\0\xa0\x01\0\0\ 196 + \x0e\0\0\0\0\0\0\0\xa3\x01\0\0\x11\0\0\0\x20\0\0\0\xfb\x02\0\0\x1b\0\0\0\xc0\0\ 197 + \0\0\x0c\x03\0\0\x15\0\0\0\0\x01\0\0\x18\x03\0\0\x1d\0\0\0\x40\x01\0\0\x22\x03\ 198 + \0\0\x1e\0\0\0\x80\x01\0\0\0\0\0\0\0\0\0\x02\x1c\0\0\0\0\0\0\0\0\0\0\x0a\x10\0\ 199 + \0\0\0\0\0\0\0\0\0\x02\x1f\0\0\0\0\0\0\0\0\0\0\x02\x20\0\0\0\x6c\x03\0\0\x02\0\ 200 + \0\x04\x08\0\0\0\x7a\x03\0\0\x0e\0\0\0\0\0\0\0\x83\x03\0\0\x0e\0\0\0\x20\0\0\0\ 201 + \x22\x03\0\0\x03\0\0\x04\x18\0\0\0\x8d\x03\0\0\x1b\0\0\0\0\0\0\0\x95\x03\0\0\ 202 + \x21\0\0\0\x40\0\0\0\x9b\x03\0\0\x23\0\0\0\x80\0\0\0\0\0\0\0\0\0\0\x02\x22\0\0\ 203 + \0\0\0\0\0\0\0\0\x02\x24\0\0\0\x9f\x03\0\0\x01\0\0\x04\x04\0\0\0\xaa\x03\0\0\ 204 + \x0e\0\0\0\0\0\0\0\x13\x04\0\0\x01\0\0\x04\x04\0\0\0\x1c\x04\0\0\x0e\0\0\0\0\0\ 205 + \0\0\0\0\0\0\0\0\0\x03\0\0\0\0\x1c\0\0\0\x12\0\0\0\x23\0\0\0\x92\x04\0\0\0\0\0\ 206 + \x0e\x25\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\0\0\0\x1c\0\0\0\x12\0\0\0\x0e\0\0\0\ 207 + \xa6\x04\0\0\0\0\0\x0e\x27\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\0\0\0\x1c\0\0\0\ 208 + \x12\0\0\0\x20\0\0\0\xbc\x04\0\0\0\0\0\x0e\x29\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\ 209 + \0\0\0\0\x1c\0\0\0\x12\0\0\0\x11\0\0\0\xd1\x04\0\0\0\0\0\x0e\x2b\0\0\0\0\0\0\0\ 210 + \0\0\0\0\0\0\0\x03\0\0\0\0\x10\0\0\0\x12\0\0\0\x04\0\0\0\xe8\x04\0\0\0\0\0\x0e\ 211 + \x2d\0\0\0\x01\0\0\0\xf0\x04\0\0\x04\0\0\x0f\0\0\0\0\x26\0\0\0\0\0\0\0\x23\0\0\ 212 + \0\x28\0\0\0\x23\0\0\0\x0e\0\0\0\x2a\0\0\0\x31\0\0\0\x20\0\0\0\x2c\0\0\0\x51\0\ 213 + \0\0\x11\0\0\0\xf8\x04\0\0\x01\0\0\x0f\0\0\0\0\x2e\0\0\0\0\0\0\0\x04\0\0\0\0\ 214 + \x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x5f\x62\x70\x66\x5f\x6d\x61\x70\0\x6d\x65\ 215 + \x74\x61\0\x6d\x61\x70\0\x63\x74\x78\0\x69\x6e\x74\0\x64\x75\x6d\x70\x5f\x62\ 216 + \x70\x66\x5f\x6d\x61\x70\0\x69\x74\x65\x72\x2f\x62\x70\x66\x5f\x6d\x61\x70\0\ 217 + 
\x30\x3a\x30\0\x2f\x77\x2f\x6e\x65\x74\x2d\x6e\x65\x78\x74\x2f\x6b\x65\x72\x6e\ 218 + \x65\x6c\x2f\x62\x70\x66\x2f\x70\x72\x65\x6c\x6f\x61\x64\x2f\x69\x74\x65\x72\ 219 + \x61\x74\x6f\x72\x73\x2f\x69\x74\x65\x72\x61\x74\x6f\x72\x73\x2e\x62\x70\x66\ 220 + \x2e\x63\0\x09\x73\x74\x72\x75\x63\x74\x20\x73\x65\x71\x5f\x66\x69\x6c\x65\x20\ 221 + \x2a\x73\x65\x71\x20\x3d\x20\x63\x74\x78\x2d\x3e\x6d\x65\x74\x61\x2d\x3e\x73\ 222 + \x65\x71\x3b\0\x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x6d\x65\x74\x61\0\x73\x65\ 223 + \x71\0\x73\x65\x73\x73\x69\x6f\x6e\x5f\x69\x64\0\x73\x65\x71\x5f\x6e\x75\x6d\0\ 224 + \x73\x65\x71\x5f\x66\x69\x6c\x65\0\x5f\x5f\x75\x36\x34\0\x6c\x6f\x6e\x67\x20\ 225 + \x6c\x6f\x6e\x67\x20\x75\x6e\x73\x69\x67\x6e\x65\x64\x20\x69\x6e\x74\0\x30\x3a\ 226 + \x31\0\x09\x73\x74\x72\x75\x63\x74\x20\x62\x70\x66\x5f\x6d\x61\x70\x20\x2a\x6d\ 227 + \x61\x70\x20\x3d\x20\x63\x74\x78\x2d\x3e\x6d\x61\x70\x3b\0\x09\x69\x66\x20\x28\ 228 + \x21\x6d\x61\x70\x29\0\x30\x3a\x32\0\x09\x5f\x5f\x75\x36\x34\x20\x73\x65\x71\ 229 + \x5f\x6e\x75\x6d\x20\x3d\x20\x63\x74\x78\x2d\x3e\x6d\x65\x74\x61\x2d\x3e\x73\ 230 + \x65\x71\x5f\x6e\x75\x6d\x3b\0\x09\x69\x66\x20\x28\x73\x65\x71\x5f\x6e\x75\x6d\ 231 + \x20\x3d\x3d\x20\x30\x29\0\x09\x09\x42\x50\x46\x5f\x53\x45\x51\x5f\x50\x52\x49\ 232 + \x4e\x54\x46\x28\x73\x65\x71\x2c\x20\x22\x20\x20\x69\x64\x20\x6e\x61\x6d\x65\ 233 + \x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x6d\x61\x78\x5f\x65\x6e\ 234 + \x74\x72\x69\x65\x73\x5c\x6e\x22\x29\x3b\0\x62\x70\x66\x5f\x6d\x61\x70\0\x69\ 235 + \x64\0\x6e\x61\x6d\x65\0\x6d\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\0\x5f\x5f\ 236 + \x75\x33\x32\0\x75\x6e\x73\x69\x67\x6e\x65\x64\x20\x69\x6e\x74\0\x63\x68\x61\ 237 + \x72\0\x5f\x5f\x41\x52\x52\x41\x59\x5f\x53\x49\x5a\x45\x5f\x54\x59\x50\x45\x5f\ 238 + \x5f\0\x09\x42\x50\x46\x5f\x53\x45\x51\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\x65\ 239 + \x71\x2c\x20\x22\x25\x34\x75\x20\x25\x2d\x31\x36\x73\x25\x36\x64\x5c\x6e\x22\ 240 + 
\x2c\x20\x6d\x61\x70\x2d\x3e\x69\x64\x2c\x20\x6d\x61\x70\x2d\x3e\x6e\x61\x6d\ 241 + \x65\x2c\x20\x6d\x61\x70\x2d\x3e\x6d\x61\x78\x5f\x65\x6e\x74\x72\x69\x65\x73\ 242 + \x29\x3b\0\x7d\0\x62\x70\x66\x5f\x69\x74\x65\x72\x5f\x5f\x62\x70\x66\x5f\x70\ 243 + \x72\x6f\x67\0\x70\x72\x6f\x67\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\ 244 + \x6f\x67\0\x69\x74\x65\x72\x2f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x09\x73\x74\ 245 + \x72\x75\x63\x74\x20\x62\x70\x66\x5f\x70\x72\x6f\x67\x20\x2a\x70\x72\x6f\x67\ 246 + \x20\x3d\x20\x63\x74\x78\x2d\x3e\x70\x72\x6f\x67\x3b\0\x09\x69\x66\x20\x28\x21\ 247 + \x70\x72\x6f\x67\x29\0\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x61\x75\x78\0\x09\x61\ 248 + \x75\x78\x20\x3d\x20\x70\x72\x6f\x67\x2d\x3e\x61\x75\x78\x3b\0\x09\x09\x42\x50\ 249 + \x46\x5f\x53\x45\x51\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\x65\x71\x2c\x20\x22\ 250 + \x20\x20\x69\x64\x20\x6e\x61\x6d\x65\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\ 251 + \x20\x20\x20\x61\x74\x74\x61\x63\x68\x65\x64\x5c\x6e\x22\x29\x3b\0\x62\x70\x66\ 252 + \x5f\x70\x72\x6f\x67\x5f\x61\x75\x78\0\x61\x74\x74\x61\x63\x68\x5f\x66\x75\x6e\ 253 + \x63\x5f\x6e\x61\x6d\x65\0\x6c\x69\x6e\x6b\x65\x64\x5f\x70\x72\x6f\x67\0\x66\ 254 + \x75\x6e\x63\x5f\x69\x6e\x66\x6f\0\x62\x74\x66\0\x09\x42\x50\x46\x5f\x53\x45\ 255 + \x51\x5f\x50\x52\x49\x4e\x54\x46\x28\x73\x65\x71\x2c\x20\x22\x25\x34\x75\x20\ 256 + \x25\x2d\x31\x36\x73\x20\x25\x73\x20\x25\x73\x5c\x6e\x22\x2c\x20\x61\x75\x78\ 257 + \x2d\x3e\x69\x64\x2c\0\x30\x3a\x34\0\x30\x3a\x35\0\x09\x69\x66\x20\x28\x21\x62\ 258 + \x74\x66\x29\0\x62\x70\x66\x5f\x66\x75\x6e\x63\x5f\x69\x6e\x66\x6f\0\x69\x6e\ 259 + \x73\x6e\x5f\x6f\x66\x66\0\x74\x79\x70\x65\x5f\x69\x64\0\x30\0\x73\x74\x72\x69\ 260 + \x6e\x67\x73\0\x74\x79\x70\x65\x73\0\x68\x64\x72\0\x62\x74\x66\x5f\x68\x65\x61\ 261 + \x64\x65\x72\0\x73\x74\x72\x5f\x6c\x65\x6e\0\x09\x74\x79\x70\x65\x73\x20\x3d\ 262 + \x20\x62\x74\x66\x2d\x3e\x74\x79\x70\x65\x73\x3b\0\x09\x62\x70\x66\x5f\x70\x72\ 263 + 
\x6f\x62\x65\x5f\x72\x65\x61\x64\x5f\x6b\x65\x72\x6e\x65\x6c\x28\x26\x74\x2c\ 264 + \x20\x73\x69\x7a\x65\x6f\x66\x28\x74\x29\x2c\x20\x74\x79\x70\x65\x73\x20\x2b\ 265 + \x20\x62\x74\x66\x5f\x69\x64\x29\x3b\0\x09\x73\x74\x72\x20\x3d\x20\x62\x74\x66\ 266 + \x2d\x3e\x73\x74\x72\x69\x6e\x67\x73\x3b\0\x62\x74\x66\x5f\x74\x79\x70\x65\0\ 267 + \x6e\x61\x6d\x65\x5f\x6f\x66\x66\0\x09\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x20\x3d\ 268 + \x20\x42\x50\x46\x5f\x43\x4f\x52\x45\x5f\x52\x45\x41\x44\x28\x74\x2c\x20\x6e\ 269 + \x61\x6d\x65\x5f\x6f\x66\x66\x29\x3b\0\x30\x3a\x32\x3a\x30\0\x09\x69\x66\x20\ 270 + \x28\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x20\x3e\x3d\x20\x62\x74\x66\x2d\x3e\x68\ 271 + \x64\x72\x2e\x73\x74\x72\x5f\x6c\x65\x6e\x29\0\x09\x72\x65\x74\x75\x72\x6e\x20\ 272 + \x73\x74\x72\x20\x2b\x20\x6e\x61\x6d\x65\x5f\x6f\x66\x66\x3b\0\x30\x3a\x33\0\ 273 + \x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\x61\x70\x2e\x5f\x5f\x5f\x66\x6d\x74\0\ 274 + \x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\x61\x70\x2e\x5f\x5f\x5f\x66\x6d\x74\ 275 + \x2e\x31\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\x2e\x5f\x5f\x5f\ 276 + \x66\x6d\x74\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\x2e\x5f\x5f\ 277 + \x5f\x66\x6d\x74\x2e\x32\0\x4c\x49\x43\x45\x4e\x53\x45\0\x2e\x72\x6f\x64\x61\ 278 + \x74\x61\0\x6c\x69\x63\x65\x6e\x73\x65\0\x9f\xeb\x01\0\x20\0\0\0\0\0\0\0\x24\0\ 279 + \0\0\x24\0\0\0\x44\x02\0\0\x68\x02\0\0\xa4\x01\0\0\x08\0\0\0\x31\0\0\0\x01\0\0\ 280 + \0\0\0\0\0\x07\0\0\0\x56\x02\0\0\x01\0\0\0\0\0\0\0\x17\0\0\0\x10\0\0\0\x31\0\0\ 281 + \0\x09\0\0\0\0\0\0\0\x42\0\0\0\x7b\0\0\0\x1e\x40\x01\0\x08\0\0\0\x42\0\0\0\x7b\ 282 + \0\0\0\x24\x40\x01\0\x10\0\0\0\x42\0\0\0\xf2\0\0\0\x1d\x48\x01\0\x18\0\0\0\x42\ 283 + \0\0\0\x13\x01\0\0\x06\x50\x01\0\x20\0\0\0\x42\0\0\0\x22\x01\0\0\x1d\x44\x01\0\ 284 + \x28\0\0\0\x42\0\0\0\x47\x01\0\0\x06\x5c\x01\0\x38\0\0\0\x42\0\0\0\x5a\x01\0\0\ 285 + \x03\x60\x01\0\x70\0\0\0\x42\0\0\0\xe0\x01\0\0\x02\x68\x01\0\xf0\0\0\0\x42\0\0\ 286 + 
\0\x2e\x02\0\0\x01\x70\x01\0\x56\x02\0\0\x1a\0\0\0\0\0\0\0\x42\0\0\0\x7b\0\0\0\ 287 + \x1e\x84\x01\0\x08\0\0\0\x42\0\0\0\x7b\0\0\0\x24\x84\x01\0\x10\0\0\0\x42\0\0\0\ 288 + \x64\x02\0\0\x1f\x8c\x01\0\x18\0\0\0\x42\0\0\0\x88\x02\0\0\x06\x98\x01\0\x20\0\ 289 + \0\0\x42\0\0\0\xa1\x02\0\0\x0e\xa4\x01\0\x28\0\0\0\x42\0\0\0\x22\x01\0\0\x1d\ 290 + \x88\x01\0\x30\0\0\0\x42\0\0\0\x47\x01\0\0\x06\xa8\x01\0\x40\0\0\0\x42\0\0\0\ 291 + \xb3\x02\0\0\x03\xac\x01\0\x80\0\0\0\x42\0\0\0\x26\x03\0\0\x02\xb4\x01\0\xb8\0\ 292 + \0\0\x42\0\0\0\x61\x03\0\0\x06\x08\x01\0\xd0\0\0\0\x42\0\0\0\0\0\0\0\0\0\0\0\ 293 + \xd8\0\0\0\x42\0\0\0\xb2\x03\0\0\x0f\x14\x01\0\xe0\0\0\0\x42\0\0\0\xc7\x03\0\0\ 294 + \x2d\x18\x01\0\xf0\0\0\0\x42\0\0\0\xfe\x03\0\0\x0d\x10\x01\0\0\x01\0\0\x42\0\0\ 295 + \0\0\0\0\0\0\0\0\0\x08\x01\0\0\x42\0\0\0\xc7\x03\0\0\x02\x18\x01\0\x20\x01\0\0\ 296 + \x42\0\0\0\x25\x04\0\0\x0d\x1c\x01\0\x38\x01\0\0\x42\0\0\0\0\0\0\0\0\0\0\0\x40\ 297 + \x01\0\0\x42\0\0\0\x25\x04\0\0\x0d\x1c\x01\0\x58\x01\0\0\x42\0\0\0\x25\x04\0\0\ 298 + \x0d\x1c\x01\0\x60\x01\0\0\x42\0\0\0\x53\x04\0\0\x1b\x20\x01\0\x68\x01\0\0\x42\ 299 + \0\0\0\x53\x04\0\0\x06\x20\x01\0\x70\x01\0\0\x42\0\0\0\x76\x04\0\0\x0d\x28\x01\ 300 + \0\x78\x01\0\0\x42\0\0\0\0\0\0\0\0\0\0\0\x80\x01\0\0\x42\0\0\0\x26\x03\0\0\x02\ 301 + \xb4\x01\0\xf8\x01\0\0\x42\0\0\0\x2e\x02\0\0\x01\xc4\x01\0\x10\0\0\0\x31\0\0\0\ 302 + \x07\0\0\0\0\0\0\0\x02\0\0\0\x3e\0\0\0\0\0\0\0\x08\0\0\0\x08\0\0\0\x3e\0\0\0\0\ 303 + \0\0\0\x10\0\0\0\x02\0\0\0\xee\0\0\0\0\0\0\0\x20\0\0\0\x08\0\0\0\x1e\x01\0\0\0\ 304 + \0\0\0\x70\0\0\0\x0d\0\0\0\x3e\0\0\0\0\0\0\0\x80\0\0\0\x0d\0\0\0\xee\0\0\0\0\0\ 305 + \0\0\xa0\0\0\0\x0d\0\0\0\x1e\x01\0\0\0\0\0\0\x56\x02\0\0\x12\0\0\0\0\0\0\0\x14\ 306 + \0\0\0\x3e\0\0\0\0\0\0\0\x08\0\0\0\x08\0\0\0\x3e\0\0\0\0\0\0\0\x10\0\0\0\x14\0\ 307 + \0\0\xee\0\0\0\0\0\0\0\x20\0\0\0\x18\0\0\0\x3e\0\0\0\0\0\0\0\x28\0\0\0\x08\0\0\ 308 + \0\x1e\x01\0\0\0\0\0\0\x80\0\0\0\x1a\0\0\0\x3e\0\0\0\0\0\0\0\x90\0\0\0\x1a\0\0\ 309 + 
\0\xee\0\0\0\0\0\0\0\xa8\0\0\0\x1a\0\0\0\x59\x03\0\0\0\0\0\0\xb0\0\0\0\x1a\0\0\ 310 + \0\x5d\x03\0\0\0\0\0\0\xc0\0\0\0\x1f\0\0\0\x8b\x03\0\0\0\0\0\0\xd8\0\0\0\x20\0\ 311 + \0\0\xee\0\0\0\0\0\0\0\xf0\0\0\0\x20\0\0\0\x3e\0\0\0\0\0\0\0\x18\x01\0\0\x24\0\ 312 + \0\0\x3e\0\0\0\0\0\0\0\x50\x01\0\0\x1a\0\0\0\xee\0\0\0\0\0\0\0\x60\x01\0\0\x20\ 313 + \0\0\0\x4d\x04\0\0\0\0\0\0\x88\x01\0\0\x1a\0\0\0\x1e\x01\0\0\0\0\0\0\x98\x01\0\ 314 + \0\x1a\0\0\0\x8e\x04\0\0\0\0\0\0\xa0\x01\0\0\x18\0\0\0\x3e\0\0\0\0\0\0\0\0\0\0\ 315 + \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xd6\0\0\0\0\0\x02\0\x70\0\0\0\0\ 316 + \0\0\0\0\0\0\0\0\0\0\0\xc8\0\0\0\0\0\x02\0\xf0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\ 317 + \xcf\0\0\0\0\0\x03\0\x78\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xc1\0\0\0\0\0\x03\0\x80\ 318 + \x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xba\0\0\0\0\0\x03\0\xf8\x01\0\0\0\0\0\0\0\0\0\ 319 + \0\0\0\0\0\x14\0\0\0\x01\0\x04\0\0\0\0\0\0\0\0\0\x23\0\0\0\0\0\0\0\xf4\0\0\0\ 320 + \x01\0\x04\0\x23\0\0\0\0\0\0\0\x0e\0\0\0\0\0\0\0\x28\0\0\0\x01\0\x04\0\x31\0\0\ 321 + \0\0\0\0\0\x20\0\0\0\0\0\0\0\xdd\0\0\0\x01\0\x04\0\x51\0\0\0\0\0\0\0\x11\0\0\0\ 322 + \0\0\0\0\0\0\0\0\x03\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\x03\ 323 + \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x03\0\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\ 324 + \0\0\0\0\xb2\0\0\0\x11\0\x05\0\0\0\0\0\0\0\0\0\x04\0\0\0\0\0\0\0\x3d\0\0\0\x12\ 325 + \0\x02\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\x5b\0\0\0\x12\0\x03\0\0\0\0\0\0\0\0\ 326 + \0\x08\x02\0\0\0\0\0\0\x48\0\0\0\0\0\0\0\x01\0\0\0\x0c\0\0\0\xc8\0\0\0\0\0\0\0\ 327 + \x01\0\0\0\x0c\0\0\0\x50\0\0\0\0\0\0\0\x01\0\0\0\x0c\0\0\0\xd0\x01\0\0\0\0\0\0\ 328 + \x01\0\0\0\x0c\0\0\0\xf0\x03\0\0\0\0\0\0\x0a\0\0\0\x0c\0\0\0\xfc\x03\0\0\0\0\0\ 329 + \0\x0a\0\0\0\x0c\0\0\0\x08\x04\0\0\0\0\0\0\x0a\0\0\0\x0c\0\0\0\x14\x04\0\0\0\0\ 330 + \0\0\x0a\0\0\0\x0c\0\0\0\x2c\x04\0\0\0\0\0\0\0\0\0\0\x0d\0\0\0\x2c\0\0\0\0\0\0\ 331 + \0\0\0\0\0\x0a\0\0\0\x3c\0\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x50\0\0\0\0\0\0\0\0\0\ 332 + 
\0\0\x0a\0\0\0\x60\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\x70\0\0\0\0\0\0\0\0\0\0\0\ 333 + \x0a\0\0\0\x80\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\x90\0\0\0\0\0\0\0\0\0\0\0\x0a\0\ 334 + \0\0\xa0\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xb0\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\ 335 + \xc0\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xd0\0\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xe8\0\ 336 + \0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xf8\0\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x08\x01\0\0\ 337 + \0\0\0\0\0\0\0\0\x0b\0\0\0\x18\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x28\x01\0\0\0\ 338 + \0\0\0\0\0\0\0\x0b\0\0\0\x38\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x48\x01\0\0\0\0\ 339 + \0\0\0\0\0\0\x0b\0\0\0\x58\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x68\x01\0\0\0\0\0\ 340 + \0\0\0\0\0\x0b\0\0\0\x78\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x88\x01\0\0\0\0\0\0\ 341 + \0\0\0\0\x0b\0\0\0\x98\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xa8\x01\0\0\0\0\0\0\0\ 342 + \0\0\0\x0b\0\0\0\xb8\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xc8\x01\0\0\0\0\0\0\0\0\ 343 + \0\0\x0b\0\0\0\xd8\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xe8\x01\0\0\0\0\0\0\0\0\0\ 344 + \0\x0b\0\0\0\xf8\x01\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x08\x02\0\0\0\0\0\0\0\0\0\0\ 345 + \x0b\0\0\0\x18\x02\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x28\x02\0\0\0\0\0\0\0\0\0\0\ 346 + \x0b\0\0\0\x38\x02\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x48\x02\0\0\0\0\0\0\0\0\0\0\ 347 + \x0b\0\0\0\x58\x02\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x68\x02\0\0\0\0\0\0\0\0\0\0\ 348 + \x0b\0\0\0\x78\x02\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x94\x02\0\0\0\0\0\0\0\0\0\0\ 349 + \x0a\0\0\0\xa4\x02\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xb4\x02\0\0\0\0\0\0\0\0\0\0\ 350 + \x0a\0\0\0\xc4\x02\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xd4\x02\0\0\0\0\0\0\0\0\0\0\ 351 + \x0a\0\0\0\xe4\x02\0\0\0\0\0\0\0\0\0\0\x0a\0\0\0\xf4\x02\0\0\0\0\0\0\0\0\0\0\ 352 + \x0a\0\0\0\x0c\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x1c\x03\0\0\0\0\0\0\0\0\0\0\ 353 + \x0b\0\0\0\x2c\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x3c\x03\0\0\0\0\0\0\0\0\0\0\ 354 + \x0b\0\0\0\x4c\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x5c\x03\0\0\0\0\0\0\0\0\0\0\ 355 + 
\x0b\0\0\0\x6c\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x7c\x03\0\0\0\0\0\0\0\0\0\0\ 356 + \x0b\0\0\0\x8c\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x9c\x03\0\0\0\0\0\0\0\0\0\0\ 357 + \x0b\0\0\0\xac\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xbc\x03\0\0\0\0\0\0\0\0\0\0\ 358 + \x0b\0\0\0\xcc\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xdc\x03\0\0\0\0\0\0\0\0\0\0\ 359 + \x0b\0\0\0\xec\x03\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\xfc\x03\0\0\0\0\0\0\0\0\0\0\ 360 + \x0b\0\0\0\x0c\x04\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x1c\x04\0\0\0\0\0\0\0\0\0\0\ 361 + \x0b\0\0\0\x4e\x4f\x41\x42\x43\x44\x4d\0\x2e\x74\x65\x78\x74\0\x2e\x72\x65\x6c\ 362 + \x2e\x42\x54\x46\x2e\x65\x78\x74\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\x61\ 363 + \x70\x2e\x5f\x5f\x5f\x66\x6d\x74\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\ 364 + \x6f\x67\x2e\x5f\x5f\x5f\x66\x6d\x74\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\ 365 + \x61\x70\0\x2e\x72\x65\x6c\x69\x74\x65\x72\x2f\x62\x70\x66\x5f\x6d\x61\x70\0\ 366 + \x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x2e\x72\x65\x6c\x69\x74\ 367 + \x65\x72\x2f\x62\x70\x66\x5f\x70\x72\x6f\x67\0\x2e\x6c\x6c\x76\x6d\x5f\x61\x64\ 368 + \x64\x72\x73\x69\x67\0\x6c\x69\x63\x65\x6e\x73\x65\0\x2e\x73\x74\x72\x74\x61\ 369 + \x62\0\x2e\x73\x79\x6d\x74\x61\x62\0\x2e\x72\x6f\x64\x61\x74\x61\0\x2e\x72\x65\ 370 + \x6c\x2e\x42\x54\x46\0\x4c\x49\x43\x45\x4e\x53\x45\0\x4c\x42\x42\x31\x5f\x37\0\ 371 + \x4c\x42\x42\x31\x5f\x36\0\x4c\x42\x42\x30\x5f\x34\0\x4c\x42\x42\x31\x5f\x33\0\ 372 + \x4c\x42\x42\x30\x5f\x33\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x70\x72\x6f\x67\ 373 + \x2e\x5f\x5f\x5f\x66\x6d\x74\x2e\x32\0\x64\x75\x6d\x70\x5f\x62\x70\x66\x5f\x6d\ 374 + \x61\x70\x2e\x5f\x5f\x5f\x66\x6d\x74\x2e\x31\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\ 375 + \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\ 376 + \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\0\0\ 377 + \0\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x04\0\0\0\0\0\0\0\0\ 378 + 
\0\0\0\0\0\0\0\x4e\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\ 379 + \0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\ 380 + \x6d\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x40\x01\0\0\0\0\0\0\x08\ 381 + \x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xa1\0\0\0\ 382 + \x01\0\0\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x48\x03\0\0\0\0\0\0\x62\0\0\0\0\0\ 383 + \0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x89\0\0\0\x01\0\0\0\x03\ 384 + \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xaa\x03\0\0\0\0\0\0\x04\0\0\0\0\0\0\0\0\0\0\0\0\ 385 + \0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xad\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\ 386 + \0\0\0\0\0\0\0\xae\x03\0\0\0\0\0\0\x34\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\ 387 + \0\0\0\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\ 388 + \xe2\x0c\0\0\0\0\0\0\x2c\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\ 389 + \0\0\0\0\0\0\x99\0\0\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x10\x11\0\0\0\ 390 + \0\0\0\x80\x01\0\0\0\0\0\0\x0e\0\0\0\x0d\0\0\0\x08\0\0\0\0\0\0\0\x18\0\0\0\0\0\ 391 + \0\0\x4a\0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x90\x12\0\0\0\0\0\0\ 392 + \x20\0\0\0\0\0\0\0\x08\0\0\0\x02\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x69\ 393 + \0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xb0\x12\0\0\0\0\0\0\x20\0\0\0\ 394 + \0\0\0\0\x08\0\0\0\x03\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\xa9\0\0\0\x09\ 395 + \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xd0\x12\0\0\0\0\0\0\x50\0\0\0\0\0\0\0\ 396 + \x08\0\0\0\x06\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x07\0\0\0\x09\0\0\0\0\ 397 + \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x20\x13\0\0\0\0\0\0\xe0\x03\0\0\0\0\0\0\x08\0\0\ 398 + \0\x07\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x7b\0\0\0\x03\x4c\xff\x6f\0\0\ 399 + \0\x80\0\0\0\0\0\0\0\0\0\0\0\0\0\x17\0\0\0\0\0\0\x07\0\0\0\0\0\0\0\0\0\0\0\0\0\ 400 + \0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x91\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\ 401 + 
\0\0\0\0\0\x07\x17\0\0\0\0\0\0\x0a\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0";

        return 0;
err:
        bpf_object__destroy_skeleton(s);
        return -1;
}

#endif /* __ITERATORS_BPF_SKEL_H__ */
kernel/bpf/queue_stack_maps.c (+2)

···
 
 static int queue_map_btf_id;
 const struct bpf_map_ops queue_map_ops = {
+        .map_meta_equal = bpf_map_meta_equal,
         .map_alloc_check = queue_stack_map_alloc_check,
         .map_alloc = queue_stack_map_alloc,
         .map_free = queue_stack_map_free,
···
 
 static int stack_map_btf_id;
 const struct bpf_map_ops stack_map_ops = {
+        .map_meta_equal = bpf_map_meta_equal,
         .map_alloc_check = queue_stack_map_alloc_check,
         .map_alloc = queue_stack_map_alloc,
         .map_free = queue_stack_map_free,
kernel/bpf/reuseport_array.c (+1)

···
 
 static int reuseport_array_map_btf_id;
 const struct bpf_map_ops reuseport_array_ops = {
+        .map_meta_equal = bpf_map_meta_equal,
         .map_alloc_check = reuseport_array_alloc_check,
         .map_alloc = reuseport_array_alloc,
         .map_free = reuseport_array_free,
kernel/bpf/ringbuf.c (+1)

···
 
 static int ringbuf_map_btf_id;
 const struct bpf_map_ops ringbuf_map_ops = {
+        .map_meta_equal = bpf_map_meta_equal,
         .map_alloc = ringbuf_map_alloc,
         .map_free = ringbuf_map_free,
         .map_mmap = ringbuf_map_mmap,
kernel/bpf/stackmap.c (+1)

···
 
 static int stack_trace_map_btf_id;
 const struct bpf_map_ops stack_trace_map_ops = {
+        .map_meta_equal = bpf_map_meta_equal,
         .map_alloc = stack_map_alloc,
         .map_free = stack_map_free,
         .map_get_next_key = stack_map_get_next_key,
kernel/bpf/syscall.c (+44 -24)

···
 #include <linux/bpf_lsm.h>
 #include <linux/poll.h>
 #include <linux/bpf-netns.h>
+#include <linux/rcupdate_trace.h>
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
                           (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
···
 }
 
 const struct bpf_map_ops bpf_map_offload_ops = {
+        .map_meta_equal = bpf_map_meta_equal,
         .map_alloc = bpf_map_offload_map_alloc,
         .map_free = bpf_map_offload_map_free,
         .map_check_btf = map_check_no_btf,
···
         if (bpf_map_is_dev_bound(map)) {
                 return bpf_map_offload_update_elem(map, key, value, flags);
         } else if (map->map_type == BPF_MAP_TYPE_CPUMAP ||
-                   map->map_type == BPF_MAP_TYPE_SOCKHASH ||
-                   map->map_type == BPF_MAP_TYPE_SOCKMAP ||
                    map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
                 return map->ops->map_update_elem(map, key, value, flags);
+        } else if (map->map_type == BPF_MAP_TYPE_SOCKHASH ||
+                   map->map_type == BPF_MAP_TYPE_SOCKMAP) {
+                return sock_map_update_elem_sys(map, key, value, flags);
         } else if (IS_FD_PROG_ARRAY(map)) {
                 return bpf_fd_array_map_update_elem(map, f.file, key, value,
                                                     flags);
···
         if (map->map_type != BPF_MAP_TYPE_HASH &&
             map->map_type != BPF_MAP_TYPE_ARRAY &&
             map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
-            map->map_type != BPF_MAP_TYPE_SK_STORAGE)
+            map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
+            map->map_type != BPF_MAP_TYPE_INODE_STORAGE)
                 return -ENOTSUPP;
         if (map->spin_lock_off + sizeof(struct bpf_spin_lock) >
             map->value_size) {
···
         btf_put(prog->aux->btf);
         bpf_prog_free_linfo(prog);
 
-        if (deferred)
-                call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
-        else
+        if (deferred) {
+                if (prog->aux->sleepable)
+                        call_rcu_tasks_trace(&prog->aux->rcu, __bpf_prog_put_rcu);
+                else
+                        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
+        } else {
                 __bpf_prog_put_rcu(&prog->aux->rcu);
+        }
 }
 
 static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
···
         if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT |
                                  BPF_F_ANY_ALIGNMENT |
                                  BPF_F_TEST_STATE_FREQ |
+                                 BPF_F_SLEEPABLE |
                                  BPF_F_TEST_RND_HI32))
                 return -EINVAL;
···
         }
 
         prog->aux->offload_requested = !!attr->prog_ifindex;
+        prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
 
         err = security_bpf_prog_alloc(prog->aux);
         if (err)
···
         return ret;
 }
 
-static int bpf_link_inc_not_zero(struct bpf_link *link)
+static struct bpf_link *bpf_link_inc_not_zero(struct bpf_link *link)
 {
-        return atomic64_fetch_add_unless(&link->refcnt, 1, 0) ? 0 : -ENOENT;
+        return atomic64_fetch_add_unless(&link->refcnt, 1, 0) ? link : ERR_PTR(-ENOENT);
+}
+
+struct bpf_link *bpf_link_by_id(u32 id)
+{
+        struct bpf_link *link;
+
+        if (!id)
+                return ERR_PTR(-ENOENT);
+
+        spin_lock_bh(&link_idr_lock);
+        /* before link is "settled", ID is 0, pretend it doesn't exist yet */
+        link = idr_find(&link_idr, id);
+        if (link) {
+                if (link->id)
+                        link = bpf_link_inc_not_zero(link);
+                else
+                        link = ERR_PTR(-EAGAIN);
+        } else {
+                link = ERR_PTR(-ENOENT);
+        }
+        spin_unlock_bh(&link_idr_lock);
+        return link;
 }
 
 #define BPF_LINK_GET_FD_BY_ID_LAST_FIELD link_id
···
 {
         struct bpf_link *link;
         u32 id = attr->link_id;
-        int fd, err;
+        int fd;
 
         if (CHECK_ATTR(BPF_LINK_GET_FD_BY_ID))
                 return -EINVAL;
···
         if (!capable(CAP_SYS_ADMIN))
                 return -EPERM;
 
-        spin_lock_bh(&link_idr_lock);
-        link = idr_find(&link_idr, id);
-        /* before link is "settled", ID is 0, pretend it doesn't exist yet */
-        if (link) {
-                if (link->id)
-                        err = bpf_link_inc_not_zero(link);
-                else
-                        err = -EAGAIN;
-        } else {
-                err = -ENOENT;
-        }
-        spin_unlock_bh(&link_idr_lock);
-
-        if (err)
-                return err;
+        link = bpf_link_by_id(id);
+        if (IS_ERR(link))
+                return PTR_ERR(link);
 
         fd = bpf_link_new_fd(link);
         if (fd < 0)
kernel/bpf/trampoline.c (+26 -3)

···
 #include <linux/rbtree_latch.h>
 #include <linux/perf_event.h>
 #include <linux/btf.h>
+#include <linux/rcupdate_trace.h>
+#include <linux/rcupdate_wait.h>
 
 /* dummy _ops. The verifier will operate on target program's ops. */
 const struct bpf_verifier_ops bpf_extension_verifier_ops = {
···
          * updates to trampoline would change the code from underneath the
          * preempted task. Hence wait for tasks to voluntarily schedule or go
          * to userspace.
+         * The same trampoline can hold both sleepable and non-sleepable progs.
+         * synchronize_rcu_tasks_trace() is needed to make sure all sleepable
+         * programs finish executing.
+         * Wait for these two grace periods together.
          */
-
-        synchronize_rcu_tasks();
+        synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
 
         err = arch_prepare_bpf_trampoline(new_image, new_image + PAGE_SIZE / 2,
                                           &tr->func.model, flags, tprogs,
···
         if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FEXIT])))
                 goto out;
         bpf_image_ksym_del(&tr->ksym);
-        /* wait for tasks to get out of trampoline before freeing it */
+        /* This code will be executed when all bpf progs (both sleepable and
+         * non-sleepable) went through
+         * bpf_prog_put()->call_rcu[_tasks_trace]()->bpf_prog_free_deferred().
+         * Hence no need for another synchronize_rcu_tasks_trace() here,
+         * but synchronize_rcu_tasks() is still needed, since trampoline
+         * may not have had any sleepable programs and we need to wait
+         * for tasks to get out of trampoline code before freeing it.
+         */
         synchronize_rcu_tasks();
         bpf_jit_free_exec(tr->image);
         hlist_del(&tr->hlist);
···
         }
         migrate_enable();
         rcu_read_unlock();
+}
+
+void notrace __bpf_prog_enter_sleepable(void)
+{
+        rcu_read_lock_trace();
+        might_fault();
+}
+
+void notrace __bpf_prog_exit_sleepable(void)
+{
+        rcu_read_unlock_trace();
 }
 
 int __weak
+263 -20
kernel/bpf/verifier.c
···
 #include <linux/ctype.h>
 #include <linux/error-injection.h>
 #include <linux/bpf_lsm.h>
+#include <linux/btf_ids.h>
 
 #include "disasm.h"
···
 
 #define MAX_PACKET_OFF 0xffff
 
+static enum bpf_prog_type resolve_prog_type(struct bpf_prog *prog)
+{
+	return prog->aux->linked_prog ? prog->aux->linked_prog->type
+				      : prog->type;
+}
+
 static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
 				       const struct bpf_call_arg_meta *meta,
 				       enum bpf_access_type t)
 {
-	switch (env->prog->type) {
+	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
+
+	switch (prog_type) {
 	/* Program types only with direct read access go here! */
 	case BPF_PROG_TYPE_LWT_IN:
 	case BPF_PROG_TYPE_LWT_OUT:
···
 	return -EINVAL;
 }
 
+static int resolve_map_arg_type(struct bpf_verifier_env *env,
+				const struct bpf_call_arg_meta *meta,
+				enum bpf_arg_type *arg_type)
+{
+	if (!meta->map_ptr) {
+		/* kernel subsystem misconfigured verifier */
+		verbose(env, "invalid map_ptr to access map->type\n");
+		return -EACCES;
+	}
+
+	switch (meta->map_ptr->map_type) {
+	case BPF_MAP_TYPE_SOCKMAP:
+	case BPF_MAP_TYPE_SOCKHASH:
+		if (*arg_type == ARG_PTR_TO_MAP_VALUE) {
+			*arg_type = ARG_PTR_TO_SOCKET;
+		} else {
+			verbose(env, "invalid arg_type for sockmap/sockhash\n");
+			return -EINVAL;
+		}
+		break;
+
+	default:
+		break;
+	}
+	return 0;
+}
+
 static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			  struct bpf_call_arg_meta *meta,
 			  const struct bpf_func_proto *fn)
···
 	    !may_access_direct_pkt_data(env, meta, BPF_READ)) {
 		verbose(env, "helper access to the packet is not allowed\n");
 		return -EACCES;
+	}
+
+	if (arg_type == ARG_PTR_TO_MAP_VALUE ||
+	    arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE ||
+	    arg_type == ARG_PTR_TO_MAP_VALUE_OR_NULL) {
+		err = resolve_map_arg_type(env, meta, &arg_type);
+		if (err)
+			return err;
 	}
 
 	if (arg_type == ARG_PTR_TO_MAP_KEY ||
···
 			goto err_type;
 		}
 	} else if (arg_type == ARG_PTR_TO_BTF_ID) {
+		bool ids_match = false;
+
 		expected_type = PTR_TO_BTF_ID;
 		if (type != expected_type)
 			goto err_type;
 		if (!fn->check_btf_id) {
 			if (reg->btf_id != meta->btf_id) {
-				verbose(env, "Helper has type %s got %s in R%d\n",
-					kernel_type_name(meta->btf_id),
-					kernel_type_name(reg->btf_id), regno);
-
-				return -EACCES;
+				ids_match = btf_struct_ids_match(&env->log, reg->off, reg->btf_id,
+								 meta->btf_id);
+				if (!ids_match) {
+					verbose(env, "Helper has type %s got %s in R%d\n",
+						kernel_type_name(meta->btf_id),
+						kernel_type_name(reg->btf_id), regno);
+					return -EACCES;
+				}
 			}
 		} else if (!fn->check_btf_id(reg->btf_id, arg)) {
 			verbose(env, "Helper does not support %s in R%d\n",
···
 
 			return -EACCES;
 		}
-		if (!tnum_is_const(reg->var_off) || reg->var_off.value || reg->off) {
+		if ((reg->off && !ids_match) || !tnum_is_const(reg->var_off) || reg->var_off.value) {
 			verbose(env, "R%d is a pointer to in-kernel struct with non-zero offset\n",
 				regno);
 			return -EACCES;
···
 		return -EACCES;
 	}
 
+static bool may_update_sockmap(struct bpf_verifier_env *env, int func_id)
+{
+	enum bpf_attach_type eatype = env->prog->expected_attach_type;
+	enum bpf_prog_type type = resolve_prog_type(env->prog);
+
+	if (func_id != BPF_FUNC_map_update_elem)
+		return false;
+
+	/* It's not possible to get access to a locked struct sock in these
+	 * contexts, so updating is safe.
+	 */
+	switch (type) {
+	case BPF_PROG_TYPE_TRACING:
+		if (eatype == BPF_TRACE_ITER)
+			return true;
+		break;
+	case BPF_PROG_TYPE_SOCKET_FILTER:
+	case BPF_PROG_TYPE_SCHED_CLS:
+	case BPF_PROG_TYPE_SCHED_ACT:
+	case BPF_PROG_TYPE_XDP:
+	case BPF_PROG_TYPE_SK_REUSEPORT:
+	case BPF_PROG_TYPE_FLOW_DISSECTOR:
+	case BPF_PROG_TYPE_SK_LOOKUP:
+		return true;
+	default:
+		break;
+	}
+
+	verbose(env, "cannot update sockmap in this context\n");
+	return false;
+}
+
 static int check_map_func_compatibility(struct bpf_verifier_env *env,
 					struct bpf_map *map, int func_id)
 {
···
 		    func_id != BPF_FUNC_map_delete_elem &&
 		    func_id != BPF_FUNC_msg_redirect_map &&
 		    func_id != BPF_FUNC_sk_select_reuseport &&
-		    func_id != BPF_FUNC_map_lookup_elem)
+		    func_id != BPF_FUNC_map_lookup_elem &&
+		    !may_update_sockmap(env, func_id))
 			goto error;
 		break;
 	case BPF_MAP_TYPE_SOCKHASH:
···
 		    func_id != BPF_FUNC_map_delete_elem &&
 		    func_id != BPF_FUNC_msg_redirect_hash &&
 		    func_id != BPF_FUNC_sk_select_reuseport &&
-		    func_id != BPF_FUNC_map_lookup_elem)
+		    func_id != BPF_FUNC_map_lookup_elem &&
+		    !may_update_sockmap(env, func_id))
 			goto error;
 		break;
 	case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
···
 	case BPF_MAP_TYPE_SK_STORAGE:
 		if (func_id != BPF_FUNC_sk_storage_get &&
 		    func_id != BPF_FUNC_sk_storage_delete)
+			goto error;
+		break;
+	case BPF_MAP_TYPE_INODE_STORAGE:
+		if (func_id != BPF_FUNC_inode_storage_get &&
+		    func_id != BPF_FUNC_inode_storage_delete)
 			goto error;
 		break;
 	default:
···
 	case BPF_FUNC_sk_storage_get:
 	case BPF_FUNC_sk_storage_delete:
 		if (map->map_type != BPF_MAP_TYPE_SK_STORAGE)
+			goto error;
+		break;
+	case BPF_FUNC_inode_storage_get:
+	case BPF_FUNC_inode_storage_delete:
+		if (map->map_type != BPF_MAP_TYPE_INODE_STORAGE)
 			goto error;
 		break;
 	default:
···
 	/* eBPF programs must be GPL compatible to use GPL-ed functions */
 	if (!env->prog->gpl_compatible && fn->gpl_only) {
 		verbose(env, "cannot call GPL-restricted function from non-GPL compatible program\n");
+		return -EINVAL;
+	}
+
+	if (fn->allowed && !fn->allowed(env->prog)) {
+		verbose(env, "helper call is not allowed in probe\n");
 		return -EINVAL;
 	}
 
···
 	__update_reg_bounds(dst_reg);
 }
 
+static void scalar32_min_max_xor(struct bpf_reg_state *dst_reg,
+				 struct bpf_reg_state *src_reg)
+{
+	bool src_known = tnum_subreg_is_const(src_reg->var_off);
+	bool dst_known = tnum_subreg_is_const(dst_reg->var_off);
+	struct tnum var32_off = tnum_subreg(dst_reg->var_off);
+	s32 smin_val = src_reg->s32_min_value;
+
+	/* Assuming scalar64_min_max_xor will be called so it is safe
+	 * to skip updating register for known case.
+	 */
+	if (src_known && dst_known)
+		return;
+
+	/* We get both minimum and maximum from the var32_off. */
+	dst_reg->u32_min_value = var32_off.value;
+	dst_reg->u32_max_value = var32_off.value | var32_off.mask;
+
+	if (dst_reg->s32_min_value >= 0 && smin_val >= 0) {
+		/* XORing two positive sign numbers gives a positive,
+		 * so safe to cast u32 result into s32.
+		 */
+		dst_reg->s32_min_value = dst_reg->u32_min_value;
+		dst_reg->s32_max_value = dst_reg->u32_max_value;
+	} else {
+		dst_reg->s32_min_value = S32_MIN;
+		dst_reg->s32_max_value = S32_MAX;
+	}
+}
+
+static void scalar_min_max_xor(struct bpf_reg_state *dst_reg,
+			       struct bpf_reg_state *src_reg)
+{
+	bool src_known = tnum_is_const(src_reg->var_off);
+	bool dst_known = tnum_is_const(dst_reg->var_off);
+	s64 smin_val = src_reg->smin_value;
+
+	if (src_known && dst_known) {
+		/* dst_reg->var_off.value has been updated earlier */
+		__mark_reg_known(dst_reg, dst_reg->var_off.value);
+		return;
+	}
+
+	/* We get both minimum and maximum from the var_off. */
+	dst_reg->umin_value = dst_reg->var_off.value;
+	dst_reg->umax_value = dst_reg->var_off.value | dst_reg->var_off.mask;
+
+	if (dst_reg->smin_value >= 0 && smin_val >= 0) {
+		/* XORing two positive sign numbers gives a positive,
+		 * so safe to cast u64 result into s64.
+		 */
+		dst_reg->smin_value = dst_reg->umin_value;
+		dst_reg->smax_value = dst_reg->umax_value;
+	} else {
+		dst_reg->smin_value = S64_MIN;
+		dst_reg->smax_value = S64_MAX;
+	}
+
+	__update_reg_bounds(dst_reg);
+}
+
 static void __scalar32_min_max_lsh(struct bpf_reg_state *dst_reg,
 				   u64 umin_val, u64 umax_val)
 {
···
 		dst_reg->var_off = tnum_or(dst_reg->var_off, src_reg.var_off);
 		scalar32_min_max_or(dst_reg, &src_reg);
 		scalar_min_max_or(dst_reg, &src_reg);
+		break;
+	case BPF_XOR:
+		dst_reg->var_off = tnum_xor(dst_reg->var_off, src_reg.var_off);
+		scalar32_min_max_xor(dst_reg, &src_reg);
+		scalar_min_max_xor(dst_reg, &src_reg);
 		break;
 	case BPF_LSH:
 		if (umax_val >= insn_bitness) {
···
 	u8 mode = BPF_MODE(insn->code);
 	int i, err;
 
-	if (!may_access_skb(env->prog->type)) {
+	if (!may_access_skb(resolve_prog_type(env->prog))) {
 		verbose(env, "BPF_LD_[ABS|IND] instructions not allowed for this program type\n");
 		return -EINVAL;
 	}
···
 	const struct bpf_prog *prog = env->prog;
 	struct bpf_reg_state *reg;
 	struct tnum range = tnum_range(0, 1);
+	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
 	int err;
 
 	/* LSM and struct_ops func-ptr's return type could be "void" */
-	if ((env->prog->type == BPF_PROG_TYPE_STRUCT_OPS ||
-	     env->prog->type == BPF_PROG_TYPE_LSM) &&
+	if ((prog_type == BPF_PROG_TYPE_STRUCT_OPS ||
+	     prog_type == BPF_PROG_TYPE_LSM) &&
 	    !prog->aux->attach_func_proto->type)
 		return 0;
···
 		return -EACCES;
 	}
 
-	switch (env->prog->type) {
+	switch (prog_type) {
 	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
 		if (env->prog->expected_attach_type == BPF_CGROUP_UDP4_RECVMSG ||
 		    env->prog->expected_attach_type == BPF_CGROUP_UDP6_RECVMSG ||
···
 					struct bpf_prog *prog)
 
 {
+	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 	/*
 	 * Validate that trace type programs use preallocated hash maps.
 	 *
···
 	 * now, but warnings are emitted so developers are made aware of
 	 * the unsafety and can fix their programs before this is enforced.
 	 */
-	if (is_tracing_prog_type(prog->type) && !is_preallocated_map(map)) {
-		if (prog->type == BPF_PROG_TYPE_PERF_EVENT) {
+	if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) {
+		if (prog_type == BPF_PROG_TYPE_PERF_EVENT) {
 			verbose(env, "perf_event programs can only use preallocated hash map\n");
 			return -EINVAL;
 		}
···
 		verbose(env, "trace type programs with run-time allocated hash maps are unsafe. Switch to preallocated hash maps.\n");
 	}
 
-	if ((is_tracing_prog_type(prog->type) ||
-	     prog->type == BPF_PROG_TYPE_SOCKET_FILTER) &&
+	if ((is_tracing_prog_type(prog_type) ||
+	     prog_type == BPF_PROG_TYPE_SOCKET_FILTER) &&
 	    map_value_has_spin_lock(map)) {
 		verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
 		return -EINVAL;
···
 		verbose(env, "bpf_struct_ops map cannot be used in prog\n");
 		return -EINVAL;
 	}
+
+	if (prog->aux->sleepable)
+		switch (map->map_type) {
+		case BPF_MAP_TYPE_HASH:
+		case BPF_MAP_TYPE_LRU_HASH:
+		case BPF_MAP_TYPE_ARRAY:
+			if (!is_preallocated_map(map)) {
+				verbose(env,
+					"Sleepable programs can only use preallocated hash maps\n");
+				return -EINVAL;
+			}
+			break;
+		default:
+			verbose(env,
+				"Sleepable programs can only use array and hash maps\n");
+			return -EINVAL;
+		}
 
 	return 0;
 }
···
 			insn->code = BPF_LDX | BPF_PROBE_MEM |
 				BPF_SIZE((insn)->code);
 			env->prog->aux->num_exentries++;
-		} else if (env->prog->type != BPF_PROG_TYPE_STRUCT_OPS) {
+		} else if (resolve_prog_type(env->prog) != BPF_PROG_TYPE_STRUCT_OPS) {
 			verbose(env, "Writes through BTF pointers are not allowed\n");
 			return -EINVAL;
 		}
···
 	return -EINVAL;
 }
 
+/* non exhaustive list of sleepable bpf_lsm_*() functions */
+BTF_SET_START(btf_sleepable_lsm_hooks)
+#ifdef CONFIG_BPF_LSM
+BTF_ID(func, bpf_lsm_bprm_committed_creds)
+#else
+BTF_ID_UNUSED
+#endif
+BTF_SET_END(btf_sleepable_lsm_hooks)
+
+static int check_sleepable_lsm_hook(u32 btf_id)
+{
+	return btf_id_set_contains(&btf_sleepable_lsm_hooks, btf_id);
+}
+
+/* list of non-sleepable functions that are otherwise on
+ * ALLOW_ERROR_INJECTION list
+ */
+BTF_SET_START(btf_non_sleepable_error_inject)
+/* Three functions below can be called from sleepable and non-sleepable context.
+ * Assume non-sleepable from bpf safety point of view.
+ */
+BTF_ID(func, __add_to_page_cache_locked)
+BTF_ID(func, should_fail_alloc_page)
+BTF_ID(func, should_failslab)
+BTF_SET_END(btf_non_sleepable_error_inject)
+
+static int check_non_sleepable_error_inject(u32 btf_id)
+{
+	return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id);
+}
+
 static int check_attach_btf_id(struct bpf_verifier_env *env)
 {
 	struct bpf_prog *prog = env->prog;
···
 	struct btf *btf;
 	long addr;
 	u64 key;
+
+	if (prog->aux->sleepable && prog->type != BPF_PROG_TYPE_TRACING &&
+	    prog->type != BPF_PROG_TYPE_LSM) {
+		verbose(env, "Only fentry/fexit/fmod_ret and lsm programs can be sleepable\n");
+		return -EINVAL;
+	}
 
 	if (prog->type == BPF_PROG_TYPE_STRUCT_OPS)
 		return check_struct_ops_btf_id(env);
···
 		}
 	}
 
-	if (prog->expected_attach_type == BPF_MODIFY_RETURN) {
+	if (prog->aux->sleepable) {
+		ret = -EINVAL;
+		switch (prog->type) {
+		case BPF_PROG_TYPE_TRACING:
+			/* fentry/fexit/fmod_ret progs can be sleepable only if they are
+			 * attached to ALLOW_ERROR_INJECTION and are not in denylist.
+			 */
+			if (!check_non_sleepable_error_inject(btf_id) &&
+			    within_error_injection_list(addr))
+				ret = 0;
+			break;
+		case BPF_PROG_TYPE_LSM:
+			/* LSM progs check that they are attached to bpf_lsm_*() funcs.
+			 * Only some of them are sleepable.
+			 */
+			if (check_sleepable_lsm_hook(btf_id))
+				ret = 0;
+			break;
+		default:
+			break;
+		}
+		if (ret)
+			verbose(env, "%s is not sleepable\n",
+				prog->aux->attach_func_name);
+	} else if (prog->expected_attach_type == BPF_MODIFY_RETURN) {
 		ret = check_attach_modify_return(prog, addr);
 		if (ret)
 			verbose(env, "%s() is not modifiable\n",
 				prog->aux->attach_func_name);
 	}
-
 	if (ret)
 		goto out;
 	tr->func.addr = (void *)addr;
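The new BPF_XOR tracking above derives the unsigned bounds straight from the tnum (known-bits) representation: after tnum_xor(), clearing every unknown bit gives umin and setting every unknown bit gives umax, exactly as scalar_min_max_xor() does. A minimal Python model of that arithmetic (a sketch of the idea with names mirroring the kernel's, not the kernel code itself):

```python
# Simplified model of the verifier's tnum (tristate number) tracking for
# BPF_XOR: tnum_xor() combines known bits, then the unsigned bounds fall
# out of the (value, mask) pair as in scalar_min_max_xor().

MASK64 = (1 << 64) - 1

def tnum_xor(v1, m1, v2, m2):
    """XOR two tnums (value, mask): a bit is unknown if unknown in either."""
    v = v1 ^ v2
    mu = (m1 | m2) & MASK64
    return (v & ~mu) & MASK64, mu

def xor_bounds(value, mask):
    """umin: all unknown bits cleared; umax: all unknown bits set."""
    return value, (value | mask) & MASK64

# dst known to be in [4, 7] -> tnum value=0b100, mask=0b011; src is constant 1.
val, mask = tnum_xor(0b100, 0b011, 1, 0)
umin, umax = xor_bounds(val, mask)
# Bit 2 stays known-set, the low two bits stay unknown: bounds remain [4, 7].
```

XOR with a constant only permutes values within the unknown-bit positions, which is why the tnum-derived bounds here are tight.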
kernel/trace/bpf_trace.c (+50)
···
 	.arg1_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_3(bpf_d_path, struct path *, path, char *, buf, u32, sz)
+{
+	long len;
+	char *p;
+
+	if (!sz)
+		return 0;
+
+	p = d_path(path, buf, sz);
+	if (IS_ERR(p)) {
+		len = PTR_ERR(p);
+	} else {
+		len = buf + sz - p;
+		memmove(buf, p, len);
+	}
+
+	return len;
+}
+
+BTF_SET_START(btf_allowlist_d_path)
+BTF_ID(func, vfs_truncate)
+BTF_ID(func, vfs_fallocate)
+BTF_ID(func, dentry_open)
+BTF_ID(func, vfs_getattr)
+BTF_ID(func, filp_close)
+BTF_SET_END(btf_allowlist_d_path)
+
+static bool bpf_d_path_allowed(const struct bpf_prog *prog)
+{
+	return btf_id_set_contains(&btf_allowlist_d_path, prog->aux->attach_btf_id);
+}
+
+BTF_ID_LIST(bpf_d_path_btf_ids)
+BTF_ID(struct, path)
+
+static const struct bpf_func_proto bpf_d_path_proto = {
+	.func		= bpf_d_path,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
+	.btf_id		= bpf_d_path_btf_ids,
+	.allowed	= bpf_d_path_allowed,
+};
+
 const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
···
 		return &bpf_jiffies64_proto;
 	case BPF_FUNC_get_task_stack:
 		return &bpf_get_task_stack_proto;
+	case BPF_FUNC_copy_from_user:
+		return prog->aux->sleepable ? &bpf_copy_from_user_proto : NULL;
 	default:
 		return NULL;
 	}
···
 		return prog->expected_attach_type == BPF_TRACE_ITER ?
 		       &bpf_seq_write_proto :
 		       NULL;
+	case BPF_FUNC_d_path:
+		return &bpf_d_path_proto;
 	default:
 		return raw_tp_prog_func_proto(func_id, prog);
 	}
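The memmove() in bpf_d_path() exists because the kernel's d_path() composes the path string at the *end* of the caller's buffer and returns a pointer into it, so the length is `buf + sz - p` and the string must be shifted to the front. A small Python model of just that buffer arithmetic (`d_path_model` and the exact error value are illustrative stand-ins, not kernel code):

```python
# Pure-Python illustration of the pointer arithmetic in bpf_d_path().
# d_path() places the NUL-terminated path at the end of an sz-byte buffer
# and returns a pointer into it; the helper computes len = buf + sz - p
# and memmove()s the string to the front.

def d_path_model(path: str, sz: int):
    """Place 'path' plus NUL at the end of an sz-byte buffer, like d_path()."""
    data = path.encode() + b"\0"
    if len(data) > sz:
        return None, -36          # stand-in for ERR_PTR(-ENAMETOOLONG)
    buf = bytearray(sz)
    p = sz - len(data)            # offset of the string inside the buffer
    buf[p:] = data
    return buf, p

def bpf_d_path_model(path: str, sz: int):
    if not sz:
        return 0, b""
    buf, p = d_path_model(path, sz)
    if buf is None:
        return p, b""             # propagate the errno as the return value
    length = sz - p               # len = buf + sz - p, includes the NUL
    buf[:length] = buf[p:p + length]   # memmove(buf, p, len)
    return length, bytes(buf[:length])
```

Calling `bpf_d_path_model("/tmp/x", 64)` yields length 7 (six characters plus the terminating NUL) with the string moved to the start of the buffer.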
mm/filemap.c (+4 -4)
···
 }
 EXPORT_SYMBOL_GPL(replace_page_cache_page);
 
-static int __add_to_page_cache_locked(struct page *page,
-				      struct address_space *mapping,
-				      pgoff_t offset, gfp_t gfp_mask,
-				      void **shadowp)
+noinline int __add_to_page_cache_locked(struct page *page,
+					struct address_space *mapping,
+					pgoff_t offset, gfp_t gfp_mask,
+					void **shadowp)
 {
 	XA_STATE(xas, &mapping->i_pages, offset);
 	int huge = PageHuge(page);
mm/page_alloc.c (+1 -1)
···
 
 #endif /* CONFIG_FAIL_PAGE_ALLOC */
 
-static noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
+noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
 {
 	return __should_fail_alloc_page(gfp_mask, order);
 }
net/bpfilter/Kconfig (+1)
···
 menuconfig BPFILTER
 	bool "BPF based packet filtering framework (BPFILTER)"
 	depends on NET && BPF && INET
+	select USERMODE_DRIVER
 	help
 	  This builds experimental bpfilter framework that is aiming to
 	  provide netfilter compatible functionality via BPF
net/core/bpf_sk_storage.c (+138 -695)
··· 7 7 #include <linux/spinlock.h> 8 8 #include <linux/bpf.h> 9 9 #include <linux/btf_ids.h> 10 + #include <linux/bpf_local_storage.h> 10 11 #include <net/bpf_sk_storage.h> 11 12 #include <net/sock.h> 12 13 #include <uapi/linux/sock_diag.h> 13 14 #include <uapi/linux/btf.h> 15 + #include <linux/btf_ids.h> 14 16 15 - #define SK_STORAGE_CREATE_FLAG_MASK \ 16 - (BPF_F_NO_PREALLOC | BPF_F_CLONE) 17 - 18 - struct bucket { 19 - struct hlist_head list; 20 - raw_spinlock_t lock; 21 - }; 22 - 23 - /* Thp map is not the primary owner of a bpf_sk_storage_elem. 24 - * Instead, the sk->sk_bpf_storage is. 25 - * 26 - * The map (bpf_sk_storage_map) is for two purposes 27 - * 1. Define the size of the "sk local storage". It is 28 - * the map's value_size. 29 - * 30 - * 2. Maintain a list to keep track of all elems such 31 - * that they can be cleaned up during the map destruction. 32 - * 33 - * When a bpf local storage is being looked up for a 34 - * particular sk, the "bpf_map" pointer is actually used 35 - * as the "key" to search in the list of elem in 36 - * sk->sk_bpf_storage. 37 - * 38 - * Hence, consider sk->sk_bpf_storage is the mini-map 39 - * with the "bpf_map" pointer as the searching key. 40 - */ 41 - struct bpf_sk_storage_map { 42 - struct bpf_map map; 43 - /* Lookup elem does not require accessing the map. 44 - * 45 - * Updating/Deleting requires a bucket lock to 46 - * link/unlink the elem from the map. Having 47 - * multiple buckets to improve contention. 48 - */ 49 - struct bucket *buckets; 50 - u32 bucket_log; 51 - u16 elem_size; 52 - u16 cache_idx; 53 - }; 54 - 55 - struct bpf_sk_storage_data { 56 - /* smap is used as the searching key when looking up 57 - * from sk->sk_bpf_storage. 58 - * 59 - * Put it in the same cacheline as the data to minimize 60 - * the number of cachelines access during the cache hit case. 
61 - */ 62 - struct bpf_sk_storage_map __rcu *smap; 63 - u8 data[] __aligned(8); 64 - }; 65 - 66 - /* Linked to bpf_sk_storage and bpf_sk_storage_map */ 67 - struct bpf_sk_storage_elem { 68 - struct hlist_node map_node; /* Linked to bpf_sk_storage_map */ 69 - struct hlist_node snode; /* Linked to bpf_sk_storage */ 70 - struct bpf_sk_storage __rcu *sk_storage; 71 - struct rcu_head rcu; 72 - /* 8 bytes hole */ 73 - /* The data is stored in aother cacheline to minimize 74 - * the number of cachelines access during a cache hit. 75 - */ 76 - struct bpf_sk_storage_data sdata ____cacheline_aligned; 77 - }; 78 - 79 - #define SELEM(_SDATA) container_of((_SDATA), struct bpf_sk_storage_elem, sdata) 80 - #define SDATA(_SELEM) (&(_SELEM)->sdata) 81 - #define BPF_SK_STORAGE_CACHE_SIZE 16 82 - 83 - static DEFINE_SPINLOCK(cache_idx_lock); 84 - static u64 cache_idx_usage_counts[BPF_SK_STORAGE_CACHE_SIZE]; 85 - 86 - struct bpf_sk_storage { 87 - struct bpf_sk_storage_data __rcu *cache[BPF_SK_STORAGE_CACHE_SIZE]; 88 - struct hlist_head list; /* List of bpf_sk_storage_elem */ 89 - struct sock *sk; /* The sk that owns the the above "list" of 90 - * bpf_sk_storage_elem. 
91 - */ 92 - struct rcu_head rcu; 93 - raw_spinlock_t lock; /* Protect adding/removing from the "list" */ 94 - }; 95 - 96 - static struct bucket *select_bucket(struct bpf_sk_storage_map *smap, 97 - struct bpf_sk_storage_elem *selem) 98 - { 99 - return &smap->buckets[hash_ptr(selem, smap->bucket_log)]; 100 - } 17 + DEFINE_BPF_STORAGE_CACHE(sk_cache); 101 18 102 19 static int omem_charge(struct sock *sk, unsigned int size) 103 20 { ··· 28 111 return -ENOMEM; 29 112 } 30 113 31 - static bool selem_linked_to_sk(const struct bpf_sk_storage_elem *selem) 32 - { 33 - return !hlist_unhashed(&selem->snode); 34 - } 35 - 36 - static bool selem_linked_to_map(const struct bpf_sk_storage_elem *selem) 37 - { 38 - return !hlist_unhashed(&selem->map_node); 39 - } 40 - 41 - static struct bpf_sk_storage_elem *selem_alloc(struct bpf_sk_storage_map *smap, 42 - struct sock *sk, void *value, 43 - bool charge_omem) 44 - { 45 - struct bpf_sk_storage_elem *selem; 46 - 47 - if (charge_omem && omem_charge(sk, smap->elem_size)) 48 - return NULL; 49 - 50 - selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN); 51 - if (selem) { 52 - if (value) 53 - memcpy(SDATA(selem)->data, value, smap->map.value_size); 54 - return selem; 55 - } 56 - 57 - if (charge_omem) 58 - atomic_sub(smap->elem_size, &sk->sk_omem_alloc); 59 - 60 - return NULL; 61 - } 62 - 63 - /* sk_storage->lock must be held and selem->sk_storage == sk_storage. 64 - * The caller must ensure selem->smap is still valid to be 65 - * dereferenced for its smap->elem_size and smap->cache_idx. 66 - */ 67 - static bool __selem_unlink_sk(struct bpf_sk_storage *sk_storage, 68 - struct bpf_sk_storage_elem *selem, 69 - bool uncharge_omem) 70 - { 71 - struct bpf_sk_storage_map *smap; 72 - bool free_sk_storage; 73 - struct sock *sk; 74 - 75 - smap = rcu_dereference(SDATA(selem)->smap); 76 - sk = sk_storage->sk; 77 - 78 - /* All uncharging on sk->sk_omem_alloc must be done first. 
79 - * sk may be freed once the last selem is unlinked from sk_storage. 80 - */ 81 - if (uncharge_omem) 82 - atomic_sub(smap->elem_size, &sk->sk_omem_alloc); 83 - 84 - free_sk_storage = hlist_is_singular_node(&selem->snode, 85 - &sk_storage->list); 86 - if (free_sk_storage) { 87 - atomic_sub(sizeof(struct bpf_sk_storage), &sk->sk_omem_alloc); 88 - sk_storage->sk = NULL; 89 - /* After this RCU_INIT, sk may be freed and cannot be used */ 90 - RCU_INIT_POINTER(sk->sk_bpf_storage, NULL); 91 - 92 - /* sk_storage is not freed now. sk_storage->lock is 93 - * still held and raw_spin_unlock_bh(&sk_storage->lock) 94 - * will be done by the caller. 95 - * 96 - * Although the unlock will be done under 97 - * rcu_read_lock(), it is more intutivie to 98 - * read if kfree_rcu(sk_storage, rcu) is done 99 - * after the raw_spin_unlock_bh(&sk_storage->lock). 100 - * 101 - * Hence, a "bool free_sk_storage" is returned 102 - * to the caller which then calls the kfree_rcu() 103 - * after unlock. 104 - */ 105 - } 106 - hlist_del_init_rcu(&selem->snode); 107 - if (rcu_access_pointer(sk_storage->cache[smap->cache_idx]) == 108 - SDATA(selem)) 109 - RCU_INIT_POINTER(sk_storage->cache[smap->cache_idx], NULL); 110 - 111 - kfree_rcu(selem, rcu); 112 - 113 - return free_sk_storage; 114 - } 115 - 116 - static void selem_unlink_sk(struct bpf_sk_storage_elem *selem) 117 - { 118 - struct bpf_sk_storage *sk_storage; 119 - bool free_sk_storage = false; 120 - 121 - if (unlikely(!selem_linked_to_sk(selem))) 122 - /* selem has already been unlinked from sk */ 123 - return; 124 - 125 - sk_storage = rcu_dereference(selem->sk_storage); 126 - raw_spin_lock_bh(&sk_storage->lock); 127 - if (likely(selem_linked_to_sk(selem))) 128 - free_sk_storage = __selem_unlink_sk(sk_storage, selem, true); 129 - raw_spin_unlock_bh(&sk_storage->lock); 130 - 131 - if (free_sk_storage) 132 - kfree_rcu(sk_storage, rcu); 133 - } 134 - 135 - static void __selem_link_sk(struct bpf_sk_storage *sk_storage, 136 - struct 
bpf_sk_storage_elem *selem) 137 - { 138 - RCU_INIT_POINTER(selem->sk_storage, sk_storage); 139 - hlist_add_head(&selem->snode, &sk_storage->list); 140 - } 141 - 142 - static void selem_unlink_map(struct bpf_sk_storage_elem *selem) 143 - { 144 - struct bpf_sk_storage_map *smap; 145 - struct bucket *b; 146 - 147 - if (unlikely(!selem_linked_to_map(selem))) 148 - /* selem has already be unlinked from smap */ 149 - return; 150 - 151 - smap = rcu_dereference(SDATA(selem)->smap); 152 - b = select_bucket(smap, selem); 153 - raw_spin_lock_bh(&b->lock); 154 - if (likely(selem_linked_to_map(selem))) 155 - hlist_del_init_rcu(&selem->map_node); 156 - raw_spin_unlock_bh(&b->lock); 157 - } 158 - 159 - static void selem_link_map(struct bpf_sk_storage_map *smap, 160 - struct bpf_sk_storage_elem *selem) 161 - { 162 - struct bucket *b = select_bucket(smap, selem); 163 - 164 - raw_spin_lock_bh(&b->lock); 165 - RCU_INIT_POINTER(SDATA(selem)->smap, smap); 166 - hlist_add_head_rcu(&selem->map_node, &b->list); 167 - raw_spin_unlock_bh(&b->lock); 168 - } 169 - 170 - static void selem_unlink(struct bpf_sk_storage_elem *selem) 171 - { 172 - /* Always unlink from map before unlinking from sk_storage 173 - * because selem will be freed after successfully unlinked from 174 - * the sk_storage. 
175 - */ 176 - selem_unlink_map(selem); 177 - selem_unlink_sk(selem); 178 - } 179 - 180 - static struct bpf_sk_storage_data * 181 - __sk_storage_lookup(struct bpf_sk_storage *sk_storage, 182 - struct bpf_sk_storage_map *smap, 183 - bool cacheit_lockit) 184 - { 185 - struct bpf_sk_storage_data *sdata; 186 - struct bpf_sk_storage_elem *selem; 187 - 188 - /* Fast path (cache hit) */ 189 - sdata = rcu_dereference(sk_storage->cache[smap->cache_idx]); 190 - if (sdata && rcu_access_pointer(sdata->smap) == smap) 191 - return sdata; 192 - 193 - /* Slow path (cache miss) */ 194 - hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) 195 - if (rcu_access_pointer(SDATA(selem)->smap) == smap) 196 - break; 197 - 198 - if (!selem) 199 - return NULL; 200 - 201 - sdata = SDATA(selem); 202 - if (cacheit_lockit) { 203 - /* spinlock is needed to avoid racing with the 204 - * parallel delete. Otherwise, publishing an already 205 - * deleted sdata to the cache will become a use-after-free 206 - * problem in the next __sk_storage_lookup(). 
207 - */ 208 - raw_spin_lock_bh(&sk_storage->lock); 209 - if (selem_linked_to_sk(selem)) 210 - rcu_assign_pointer(sk_storage->cache[smap->cache_idx], 211 - sdata); 212 - raw_spin_unlock_bh(&sk_storage->lock); 213 - } 214 - 215 - return sdata; 216 - } 217 - 218 - static struct bpf_sk_storage_data * 114 + static struct bpf_local_storage_data * 219 115 sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit) 220 116 { 221 - struct bpf_sk_storage *sk_storage; 222 - struct bpf_sk_storage_map *smap; 117 + struct bpf_local_storage *sk_storage; 118 + struct bpf_local_storage_map *smap; 223 119 224 120 sk_storage = rcu_dereference(sk->sk_bpf_storage); 225 121 if (!sk_storage) 226 122 return NULL; 227 123 228 - smap = (struct bpf_sk_storage_map *)map; 229 - return __sk_storage_lookup(sk_storage, smap, cacheit_lockit); 230 - } 231 - 232 - static int check_flags(const struct bpf_sk_storage_data *old_sdata, 233 - u64 map_flags) 234 - { 235 - if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST) 236 - /* elem already exists */ 237 - return -EEXIST; 238 - 239 - if (!old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_EXIST) 240 - /* elem doesn't exist, cannot update it */ 241 - return -ENOENT; 242 - 243 - return 0; 244 - } 245 - 246 - static int sk_storage_alloc(struct sock *sk, 247 - struct bpf_sk_storage_map *smap, 248 - struct bpf_sk_storage_elem *first_selem) 249 - { 250 - struct bpf_sk_storage *prev_sk_storage, *sk_storage; 251 - int err; 252 - 253 - err = omem_charge(sk, sizeof(*sk_storage)); 254 - if (err) 255 - return err; 256 - 257 - sk_storage = kzalloc(sizeof(*sk_storage), GFP_ATOMIC | __GFP_NOWARN); 258 - if (!sk_storage) { 259 - err = -ENOMEM; 260 - goto uncharge; 261 - } 262 - INIT_HLIST_HEAD(&sk_storage->list); 263 - raw_spin_lock_init(&sk_storage->lock); 264 - sk_storage->sk = sk; 265 - 266 - __selem_link_sk(sk_storage, first_selem); 267 - selem_link_map(smap, first_selem); 268 - /* Publish sk_storage to sk. sk->sk_lock cannot be acquired. 
269 - * Hence, atomic ops is used to set sk->sk_bpf_storage 270 - * from NULL to the newly allocated sk_storage ptr. 271 - * 272 - * From now on, the sk->sk_bpf_storage pointer is protected 273 - * by the sk_storage->lock. Hence, when freeing 274 - * the sk->sk_bpf_storage, the sk_storage->lock must 275 - * be held before setting sk->sk_bpf_storage to NULL. 276 - */ 277 - prev_sk_storage = cmpxchg((struct bpf_sk_storage **)&sk->sk_bpf_storage, 278 - NULL, sk_storage); 279 - if (unlikely(prev_sk_storage)) { 280 - selem_unlink_map(first_selem); 281 - err = -EAGAIN; 282 - goto uncharge; 283 - 284 - /* Note that even first_selem was linked to smap's 285 - * bucket->list, first_selem can be freed immediately 286 - * (instead of kfree_rcu) because 287 - * bpf_sk_storage_map_free() does a 288 - * synchronize_rcu() before walking the bucket->list. 289 - * Hence, no one is accessing selem from the 290 - * bucket->list under rcu_read_lock(). 291 - */ 292 - } 293 - 294 - return 0; 295 - 296 - uncharge: 297 - kfree(sk_storage); 298 - atomic_sub(sizeof(*sk_storage), &sk->sk_omem_alloc); 299 - return err; 300 - } 301 - 302 - /* sk cannot be going away because it is linking new elem 303 - * to sk->sk_bpf_storage. (i.e. sk->sk_refcnt cannot be 0). 304 - * Otherwise, it will become a leak (and other memory issues 305 - * during map destruction). 
- */
-static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
-						     struct bpf_map *map,
-						     void *value,
-						     u64 map_flags)
-{
-	struct bpf_sk_storage_data *old_sdata = NULL;
-	struct bpf_sk_storage_elem *selem;
-	struct bpf_sk_storage *sk_storage;
-	struct bpf_sk_storage_map *smap;
-	int err;
-
-	/* BPF_EXIST and BPF_NOEXIST cannot be both set */
-	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
-	    /* BPF_F_LOCK can only be used in a value with spin_lock */
-	    unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
-		return ERR_PTR(-EINVAL);
-
-	smap = (struct bpf_sk_storage_map *)map;
-	sk_storage = rcu_dereference(sk->sk_bpf_storage);
-	if (!sk_storage || hlist_empty(&sk_storage->list)) {
-		/* Very first elem for this sk */
-		err = check_flags(NULL, map_flags);
-		if (err)
-			return ERR_PTR(err);
-
-		selem = selem_alloc(smap, sk, value, true);
-		if (!selem)
-			return ERR_PTR(-ENOMEM);
-
-		err = sk_storage_alloc(sk, smap, selem);
-		if (err) {
-			kfree(selem);
-			atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
-			return ERR_PTR(err);
-		}
-
-		return SDATA(selem);
-	}
-
-	if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
-		/* Hoping to find an old_sdata to do inline update
-		 * such that it can avoid taking the sk_storage->lock
-		 * and changing the lists.
-		 */
-		old_sdata = __sk_storage_lookup(sk_storage, smap, false);
-		err = check_flags(old_sdata, map_flags);
-		if (err)
-			return ERR_PTR(err);
-		if (old_sdata && selem_linked_to_sk(SELEM(old_sdata))) {
-			copy_map_value_locked(map, old_sdata->data,
-					      value, false);
-			return old_sdata;
-		}
-	}
-
-	raw_spin_lock_bh(&sk_storage->lock);
-
-	/* Recheck sk_storage->list under sk_storage->lock */
-	if (unlikely(hlist_empty(&sk_storage->list))) {
-		/* A parallel del is happening and sk_storage is going
-		 * away. It has just been checked before, so very
-		 * unlikely. Return instead of retry to keep things
-		 * simple.
-		 */
-		err = -EAGAIN;
-		goto unlock_err;
-	}
-
-	old_sdata = __sk_storage_lookup(sk_storage, smap, false);
-	err = check_flags(old_sdata, map_flags);
-	if (err)
-		goto unlock_err;
-
-	if (old_sdata && (map_flags & BPF_F_LOCK)) {
-		copy_map_value_locked(map, old_sdata->data, value, false);
-		selem = SELEM(old_sdata);
-		goto unlock;
-	}
-
-	/* sk_storage->lock is held. Hence, we are sure
-	 * we can unlink and uncharge the old_sdata successfully
-	 * later. Hence, instead of charging the new selem now
-	 * and then uncharge the old selem later (which may cause
-	 * a potential but unnecessary charge failure), avoid taking
-	 * a charge at all here (the "!old_sdata" check) and the
-	 * old_sdata will not be uncharged later during __selem_unlink_sk().
-	 */
-	selem = selem_alloc(smap, sk, value, !old_sdata);
-	if (!selem) {
-		err = -ENOMEM;
-		goto unlock_err;
-	}
-
-	/* First, link the new selem to the map */
-	selem_link_map(smap, selem);
-
-	/* Second, link (and publish) the new selem to sk_storage */
-	__selem_link_sk(sk_storage, selem);
-
-	/* Third, remove old selem, SELEM(old_sdata) */
-	if (old_sdata) {
-		selem_unlink_map(SELEM(old_sdata));
-		__selem_unlink_sk(sk_storage, SELEM(old_sdata), false);
-	}
-
-unlock:
-	raw_spin_unlock_bh(&sk_storage->lock);
-	return SDATA(selem);
-
-unlock_err:
-	raw_spin_unlock_bh(&sk_storage->lock);
-	return ERR_PTR(err);
+	smap = (struct bpf_local_storage_map *)map;
+	return bpf_local_storage_lookup(sk_storage, smap, cacheit_lockit);
 }

 static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
 {
-	struct bpf_sk_storage_data *sdata;
+	struct bpf_local_storage_data *sdata;

 	sdata = sk_storage_lookup(sk, map, false);
 	if (!sdata)
 		return -ENOENT;

-	selem_unlink(SELEM(sdata));
+	bpf_selem_unlink(SELEM(sdata));

 	return 0;
-}
-
-static u16 cache_idx_get(void)
-{
-	u64 min_usage = U64_MAX;
-	u16 i, res = 0;
-
-	spin_lock(&cache_idx_lock);
-
-	for (i = 0; i < BPF_SK_STORAGE_CACHE_SIZE; i++) {
-		if (cache_idx_usage_counts[i] < min_usage) {
-			min_usage = cache_idx_usage_counts[i];
-			res = i;
-
-			/* Found a free cache_idx */
-			if (!min_usage)
-				break;
-		}
-	}
-	cache_idx_usage_counts[res]++;
-
-	spin_unlock(&cache_idx_lock);
-
-	return res;
-}
-
-static void cache_idx_free(u16 idx)
-{
-	spin_lock(&cache_idx_lock);
-	cache_idx_usage_counts[idx]--;
-	spin_unlock(&cache_idx_lock);
 }

 /* Called by __sk_destruct() & bpf_sk_storage_clone() */
 void bpf_sk_storage_free(struct sock *sk)
 {
-	struct bpf_sk_storage_elem *selem;
-	struct bpf_sk_storage *sk_storage;
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage *sk_storage;
 	bool free_sk_storage = false;
 	struct hlist_node *n;

[...]
 	 * Thus, no elem can be added-to or deleted-from the
 	 * sk_storage->list by the bpf_prog or by the bpf-map's syscall.
 	 *
-	 * It is racing with bpf_sk_storage_map_free() alone
+	 * It is racing with bpf_local_storage_map_free() alone
 	 * when unlinking elem from the sk_storage->list and
 	 * the map's bucket->list.
 	 */
[...]
 		/* Always unlink from map before unlinking from
 		 * sk_storage.
 		 */
-		selem_unlink_map(selem);
-		free_sk_storage = __selem_unlink_sk(sk_storage, selem, true);
+		bpf_selem_unlink_map(selem);
+		free_sk_storage = bpf_selem_unlink_storage_nolock(sk_storage,
+								  selem, true);
 	}
 	raw_spin_unlock_bh(&sk_storage->lock);
 	rcu_read_unlock();
[...]
 		kfree_rcu(sk_storage, rcu);
 }

-static void bpf_sk_storage_map_free(struct bpf_map *map)
+static void sk_storage_map_free(struct bpf_map *map)
 {
-	struct bpf_sk_storage_elem *selem;
-	struct bpf_sk_storage_map *smap;
-	struct bucket *b;
-	unsigned int i;
+	struct bpf_local_storage_map *smap;

-	smap = (struct bpf_sk_storage_map *)map;
-
-	cache_idx_free(smap->cache_idx);
-
-	/* Note that this map might be concurrently cloned from
-	 * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
-	 * RCU read section to finish before proceeding. New RCU
-	 * read sections should be prevented via bpf_map_inc_not_zero.
-	 */
-	synchronize_rcu();
-
-	/* bpf prog and the userspace can no longer access this map
-	 * now. No new selem (of this map) can be added
-	 * to the sk->sk_bpf_storage or to the map bucket's list.
-	 *
-	 * The elem of this map can be cleaned up here
-	 * or
-	 * by bpf_sk_storage_free() during __sk_destruct().
-	 */
-	for (i = 0; i < (1U << smap->bucket_log); i++) {
-		b = &smap->buckets[i];
-
-		rcu_read_lock();
-		/* No one is adding to b->list now */
-		while ((selem = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(&b->list)),
-						 struct bpf_sk_storage_elem,
-						 map_node))) {
-			selem_unlink(selem);
-			cond_resched_rcu();
-		}
-		rcu_read_unlock();
-	}
-
-	/* bpf_sk_storage_free() may still need to access the map.
-	 * e.g. bpf_sk_storage_free() has unlinked selem from the map
-	 * which then made the above while((selem = ...)) loop
-	 * exited immediately.
-	 *
-	 * However, the bpf_sk_storage_free() still needs to access
-	 * the smap->elem_size to do the uncharging in
-	 * __selem_unlink_sk().
-	 *
-	 * Hence, wait another rcu grace period for the
-	 * bpf_sk_storage_free() to finish.
-	 */
-	synchronize_rcu();
-
-	kvfree(smap->buckets);
-	kfree(map);
+	smap = (struct bpf_local_storage_map *)map;
+	bpf_local_storage_cache_idx_free(&sk_cache, smap->cache_idx);
+	bpf_local_storage_map_free(smap);
 }

-/* U16_MAX is much more than enough for sk local storage
- * considering a tcp_sock is ~2k.
- */
-#define MAX_VALUE_SIZE						\
-	min_t(u32,						\
-	      (KMALLOC_MAX_SIZE - MAX_BPF_STACK - sizeof(struct bpf_sk_storage_elem)),	\
-	      (U16_MAX - sizeof(struct bpf_sk_storage_elem)))
-
-static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
+static struct bpf_map *sk_storage_map_alloc(union bpf_attr *attr)
 {
-	if (attr->map_flags & ~SK_STORAGE_CREATE_FLAG_MASK ||
-	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
-	    attr->max_entries ||
-	    attr->key_size != sizeof(int) || !attr->value_size ||
-	    /* Enforce BTF for userspace sk dumping */
-	    !attr->btf_key_type_id || !attr->btf_value_type_id)
-		return -EINVAL;
+	struct bpf_local_storage_map *smap;

-	if (!bpf_capable())
-		return -EPERM;
+	smap = bpf_local_storage_map_alloc(attr);
+	if (IS_ERR(smap))
+		return ERR_CAST(smap);

-	if (attr->value_size > MAX_VALUE_SIZE)
-		return -E2BIG;
-
-	return 0;
-}
-
-static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
-{
-	struct bpf_sk_storage_map *smap;
-	unsigned int i;
-	u32 nbuckets;
-	u64 cost;
-	int ret;
-
-	smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN);
-	if (!smap)
-		return ERR_PTR(-ENOMEM);
-	bpf_map_init_from_attr(&smap->map, attr);
-
-	nbuckets = roundup_pow_of_two(num_possible_cpus());
-	/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
-	nbuckets = max_t(u32, 2, nbuckets);
-	smap->bucket_log = ilog2(nbuckets);
-	cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
-
-	ret = bpf_map_charge_init(&smap->map.memory, cost);
-	if (ret < 0) {
-		kfree(smap);
-		return ERR_PTR(ret);
-	}
-
-	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
-				 GFP_USER | __GFP_NOWARN);
-	if (!smap->buckets) {
-		bpf_map_charge_finish(&smap->map.memory);
-		kfree(smap);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	for (i = 0; i < nbuckets; i++) {
-		INIT_HLIST_HEAD(&smap->buckets[i].list);
-		raw_spin_lock_init(&smap->buckets[i].lock);
-	}
-
-	smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size;
-	smap->cache_idx = cache_idx_get();
-
+	smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache);
 	return &smap->map;
 }

[...]
 	return -ENOTSUPP;
 }

-static int bpf_sk_storage_map_check_btf(const struct bpf_map *map,
-					const struct btf *btf,
-					const struct btf_type *key_type,
-					const struct btf_type *value_type)
-{
-	u32 int_data;
-
-	if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
-		return -EINVAL;
-
-	int_data = *(u32 *)(key_type + 1);
-	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
-		return -EINVAL;
-
-	return 0;
-}
-
 static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key)
 {
-	struct bpf_sk_storage_data *sdata;
+	struct bpf_local_storage_data *sdata;
 	struct socket *sock;
 	int fd, err;

[...]
 static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key,
 					 void *value, u64 map_flags)
 {
-	struct bpf_sk_storage_data *sdata;
+	struct bpf_local_storage_data *sdata;
 	struct socket *sock;
 	int fd, err;

 	fd = *(int *)key;
 	sock = sockfd_lookup(fd, &err);
 	if (sock) {
-		sdata = sk_storage_update(sock->sk, map, value, map_flags);
+		sdata = bpf_local_storage_update(
+			sock->sk, (struct bpf_local_storage_map *)map, value,
+			map_flags);
 		sockfd_put(sock);
 		return PTR_ERR_OR_ZERO(sdata);
 	}
[...]
 	return err;
 }

-static struct bpf_sk_storage_elem *
+static struct bpf_local_storage_elem *
 bpf_sk_storage_clone_elem(struct sock *newsk,
-			  struct bpf_sk_storage_map *smap,
-			  struct bpf_sk_storage_elem *selem)
+			  struct bpf_local_storage_map *smap,
+			  struct bpf_local_storage_elem *selem)
 {
-	struct bpf_sk_storage_elem *copy_selem;
+	struct bpf_local_storage_elem *copy_selem;

-	copy_selem = selem_alloc(smap, newsk, NULL, true);
+	copy_selem = bpf_selem_alloc(smap, newsk, NULL, true);
 	if (!copy_selem)
 		return NULL;

[...]

 int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
 {
-	struct bpf_sk_storage *new_sk_storage = NULL;
-	struct bpf_sk_storage *sk_storage;
-	struct bpf_sk_storage_elem *selem;
+	struct bpf_local_storage *new_sk_storage = NULL;
+	struct bpf_local_storage *sk_storage;
+	struct bpf_local_storage_elem *selem;
 	int ret = 0;

 	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
[...]
 		goto out;

 	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
-		struct bpf_sk_storage_elem *copy_selem;
-		struct bpf_sk_storage_map *smap;
+		struct bpf_local_storage_elem *copy_selem;
+		struct bpf_local_storage_map *smap;
 		struct bpf_map *map;

 		smap = rcu_dereference(SDATA(selem)->smap);
[...]
 			continue;

 		/* Note that for lockless listeners adding new element
-		 * here can race with cleanup in bpf_sk_storage_map_free.
+		 * here can race with cleanup in bpf_local_storage_map_free.
 		 * Try to grab map refcnt to make sure that it's still
 		 * alive and prevent concurrent removal.
 		 */
[...]
 		}

 		if (new_sk_storage) {
-			selem_link_map(smap, copy_selem);
-			__selem_link_sk(new_sk_storage, copy_selem);
+			bpf_selem_link_map(smap, copy_selem);
+			bpf_selem_link_storage_nolock(new_sk_storage, copy_selem);
 		} else {
-			ret = sk_storage_alloc(newsk, smap, copy_selem);
+			ret = bpf_local_storage_alloc(newsk, smap, copy_selem);
 			if (ret) {
 				kfree(copy_selem);
 				atomic_sub(smap->elem_size,
[...]
 				goto out;
 			}

-			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
+			new_sk_storage =
+				rcu_dereference(copy_selem->local_storage);
 		}
 		bpf_map_put(map);
 	}
[...]
 BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
 	   void *, value, u64, flags)
 {
-	struct bpf_sk_storage_data *sdata;
+	struct bpf_local_storage_data *sdata;

 	if (flags > BPF_SK_STORAGE_GET_F_CREATE)
 		return (unsigned long)NULL;
[...]
 	     * destruction).
 	     */
 	    refcount_inc_not_zero(&sk->sk_refcnt)) {
-		sdata = sk_storage_update(sk, map, value, BPF_NOEXIST);
+		sdata = bpf_local_storage_update(
+			sk, (struct bpf_local_storage_map *)map, value,
+			BPF_NOEXIST);
 		/* sk must be a fullsock (guaranteed by verifier),
 		 * so sock_gen_put() is unnecessary.
 		 */
[...]
 	return -ENOENT;
 }

+static int sk_storage_charge(struct bpf_local_storage_map *smap,
+			     void *owner, u32 size)
+{
+	return omem_charge(owner, size);
+}
+
+static void sk_storage_uncharge(struct bpf_local_storage_map *smap,
+				void *owner, u32 size)
+{
+	struct sock *sk = owner;
+
+	atomic_sub(size, &sk->sk_omem_alloc);
+}
+
+static struct bpf_local_storage __rcu **
+sk_storage_ptr(void *owner)
+{
+	struct sock *sk = owner;
+
+	return &sk->sk_bpf_storage;
+}
+
 static int sk_storage_map_btf_id;
 const struct bpf_map_ops sk_storage_map_ops = {
-	.map_alloc_check = bpf_sk_storage_map_alloc_check,
-	.map_alloc = bpf_sk_storage_map_alloc,
-	.map_free = bpf_sk_storage_map_free,
+	.map_meta_equal = bpf_map_meta_equal,
+	.map_alloc_check = bpf_local_storage_map_alloc_check,
+	.map_alloc = sk_storage_map_alloc,
+	.map_free = sk_storage_map_free,
 	.map_get_next_key = notsupp_get_next_key,
 	.map_lookup_elem = bpf_fd_sk_storage_lookup_elem,
 	.map_update_elem = bpf_fd_sk_storage_update_elem,
 	.map_delete_elem = bpf_fd_sk_storage_delete_elem,
-	.map_check_btf = bpf_sk_storage_map_check_btf,
-	.map_btf_name = "bpf_sk_storage_map",
+	.map_check_btf = bpf_local_storage_map_check_btf,
+	.map_btf_name = "bpf_local_storage_map",
 	.map_btf_id = &sk_storage_map_btf_id,
+	.map_local_storage_charge = sk_storage_charge,
+	.map_local_storage_uncharge = sk_storage_uncharge,
+	.map_owner_storage_ptr = sk_storage_ptr,
 };

 const struct bpf_func_proto bpf_sk_storage_get_proto = {
[...]
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_CONST_MAP_PTR,
 	.arg2_type	= ARG_PTR_TO_SOCKET,
+};
+
+BTF_ID_LIST(sk_storage_btf_ids)
+BTF_ID_UNUSED
+BTF_ID(struct, sock)
+
+const struct bpf_func_proto sk_storage_get_btf_proto = {
+	.func		= bpf_sk_storage_get,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_BTF_ID,
+	.arg3_type	= ARG_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg4_type	= ARG_ANYTHING,
+	.btf_id		= sk_storage_btf_ids,
+};
+
+const struct bpf_func_proto sk_storage_delete_btf_proto = {
+	.func		= bpf_sk_storage_delete,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_BTF_ID,
+	.btf_id		= sk_storage_btf_ids,
 };

 struct bpf_sk_storage_diag {
[...]
 	u32 nr_maps = 0;
 	int rem, err;

-	/* bpf_sk_storage_map is currently limited to CAP_SYS_ADMIN as
+	/* bpf_local_storage_map is currently limited to CAP_SYS_ADMIN as
 	 * the map_alloc_check() side also does.
 	 */
 	if (!bpf_capable())
[...]
 }
 EXPORT_SYMBOL_GPL(bpf_sk_storage_diag_alloc);

-static int diag_get(struct bpf_sk_storage_data *sdata, struct sk_buff *skb)
+static int diag_get(struct bpf_local_storage_data *sdata, struct sk_buff *skb)
 {
 	struct nlattr *nla_stg, *nla_value;
-	struct bpf_sk_storage_map *smap;
+	struct bpf_local_storage_map *smap;

 	/* It cannot exceed max nlattr's payload */
-	BUILD_BUG_ON(U16_MAX - NLA_HDRLEN < MAX_VALUE_SIZE);
+	BUILD_BUG_ON(U16_MAX - NLA_HDRLEN < BPF_LOCAL_STORAGE_MAX_VALUE_SIZE);

 	nla_stg = nla_nest_start(skb, SK_DIAG_BPF_STORAGE);
 	if (!nla_stg)
[...]
 {
 	/* stg_array_type (e.g. INET_DIAG_BPF_SK_STORAGES) */
 	unsigned int diag_size = nla_total_size(0);
-	struct bpf_sk_storage *sk_storage;
-	struct bpf_sk_storage_elem *selem;
-	struct bpf_sk_storage_map *smap;
+	struct bpf_local_storage *sk_storage;
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage_map *smap;
 	struct nlattr *nla_stgs;
 	unsigned int saved_len;
 	int err = 0;
[...]
 {
 	/* stg_array_type (e.g. INET_DIAG_BPF_SK_STORAGES) */
 	unsigned int diag_size = nla_total_size(0);
-	struct bpf_sk_storage *sk_storage;
-	struct bpf_sk_storage_data *sdata;
+	struct bpf_local_storage *sk_storage;
+	struct bpf_local_storage_data *sdata;
 	struct nlattr *nla_stgs;
 	unsigned int saved_len;
 	int err = 0;
[...]

 	saved_len = skb->len;
 	for (i = 0; i < diag->nr_maps; i++) {
-		sdata = __sk_storage_lookup(sk_storage,
-			(struct bpf_sk_storage_map *)diag->maps[i],
+		sdata = bpf_local_storage_lookup(sk_storage,
+			(struct bpf_local_storage_map *)diag->maps[i],
 			false);

 		if (!sdata)
[...]
 	unsigned skip_elems;
 };

-static struct bpf_sk_storage_elem *
+static struct bpf_local_storage_elem *
 bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
-				 struct bpf_sk_storage_elem *prev_selem)
+				 struct bpf_local_storage_elem *prev_selem)
 {
-	struct bpf_sk_storage *sk_storage;
-	struct bpf_sk_storage_elem *selem;
+	struct bpf_local_storage *sk_storage;
+	struct bpf_local_storage_elem *selem;
 	u32 skip_elems = info->skip_elems;
-	struct bpf_sk_storage_map *smap;
+	struct bpf_local_storage_map *smap;
 	u32 bucket_id = info->bucket_id;
 	u32 i, count, n_buckets;
-	struct bucket *b;
+	struct bpf_local_storage_map_bucket *b;

-	smap = (struct bpf_sk_storage_map *)info->map;
+	smap = (struct bpf_local_storage_map *)info->map;
 	n_buckets = 1U << smap->bucket_log;
 	if (bucket_id >= n_buckets)
 		return NULL;
[...]
 	count = 0;
 	while (selem) {
 		selem = hlist_entry_safe(selem->map_node.next,
-					 struct bpf_sk_storage_elem, map_node);
+					 struct bpf_local_storage_elem, map_node);
 		if (!selem) {
 			/* not found, unlock and go to the next bucket */
 			b = &smap->buckets[bucket_id++];
[...]
 			skip_elems = 0;
 			break;
 		}
-		sk_storage = rcu_dereference_raw(selem->sk_storage);
+		sk_storage = rcu_dereference_raw(selem->local_storage);
 		if (sk_storage) {
 			info->skip_elems = skip_elems + count;
 			return selem;
[...]
 		raw_spin_lock_bh(&b->lock);
 		count = 0;
 		hlist_for_each_entry(selem, &b->list, map_node) {
-			sk_storage = rcu_dereference_raw(selem->sk_storage);
+			sk_storage = rcu_dereference_raw(selem->local_storage);
 			if (sk_storage && count >= skip_elems) {
 				info->bucket_id = i;
 				info->skip_elems = count;
[...]

 static void *bpf_sk_storage_map_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	struct bpf_sk_storage_elem *selem;
+	struct bpf_local_storage_elem *selem;

 	selem = bpf_sk_storage_map_seq_find_next(seq->private, NULL);
 	if (!selem)
[...]
 		     void *value)

 static int __bpf_sk_storage_map_seq_show(struct seq_file *seq,
-					 struct bpf_sk_storage_elem *selem)
+					 struct bpf_local_storage_elem *selem)
 {
 	struct bpf_iter_seq_sk_storage_map_info *info = seq->private;
 	struct bpf_iter__bpf_sk_storage_map ctx = {};
-	struct bpf_sk_storage *sk_storage;
+	struct bpf_local_storage *sk_storage;
 	struct bpf_iter_meta meta;
 	struct bpf_prog *prog;
 	int ret = 0;
[...]
 	ctx.meta = &meta;
 	ctx.map = info->map;
 	if (selem) {
-		sk_storage = rcu_dereference_raw(selem->sk_storage);
-		ctx.sk = sk_storage->sk;
+		sk_storage = rcu_dereference_raw(selem->local_storage);
+		ctx.sk = sk_storage->owner;
 		ctx.value = SDATA(selem)->data;
 	}
 	ret = bpf_iter_run_prog(prog, &ctx);
[...]
 static void bpf_sk_storage_map_seq_stop(struct seq_file *seq, void *v)
 {
 	struct bpf_iter_seq_sk_storage_map_info *info = seq->private;
-	struct bpf_sk_storage_map *smap;
-	struct bucket *b;
+	struct bpf_local_storage_map *smap;
+	struct bpf_local_storage_map_bucket *b;

 	if (!v) {
 		(void)__bpf_sk_storage_map_seq_show(seq, v);
 	} else {
-		smap = (struct bpf_sk_storage_map *)info->map;
+		smap = (struct bpf_local_storage_map *)info->map;
 		b = &smap->buckets[info->bucket_id];
 		raw_spin_unlock_bh(&b->lock);
 	}
[...]
 	.target			= "bpf_sk_storage_map",
 	.attach_target		= bpf_iter_attach_map,
 	.detach_target		= bpf_iter_detach_map,
+	.show_fdinfo		= bpf_iter_map_show_fdinfo,
+	.fill_link_info		= bpf_iter_map_fill_link_info,
 	.ctx_arg_info_size	= 2,
 	.ctx_arg_info		= {
 		{ offsetof(struct bpf_iter__bpf_sk_storage_map, sk),
net/core/filter.c (+409, -7)
[...]
 	} else {
 		struct inet_connection_sock *icsk = inet_csk(sk);
 		struct tcp_sock *tp = tcp_sk(sk);
+		unsigned long timeout;

 		if (optlen != sizeof(int))
 			return -EINVAL;
[...]
 				tp->snd_cwnd_clamp = val;
 				tp->snd_ssthresh = val;
 			}
+			break;
+		case TCP_BPF_DELACK_MAX:
+			timeout = usecs_to_jiffies(val);
+			if (timeout > TCP_DELACK_MAX ||
+			    timeout < TCP_TIMEOUT_MIN)
+				return -EINVAL;
+			inet_csk(sk)->icsk_delack_max = timeout;
+			break;
+		case TCP_BPF_RTO_MIN:
+			timeout = usecs_to_jiffies(val);
+			if (timeout > TCP_RTO_MIN ||
+			    timeout < TCP_TIMEOUT_MIN)
+				return -EINVAL;
+			inet_csk(sk)->icsk_rto_min = timeout;
 			break;
 		case TCP_SAVE_SYN:
 			if (val < 0 || val > 1)
[...]
 		tp = tcp_sk(sk);

 		if (optlen <= 0 || !tp->saved_syn ||
-		    optlen > tp->saved_syn[0])
+		    optlen > tcp_saved_syn_len(tp->saved_syn))
 			goto err_clear;
-		memcpy(optval, tp->saved_syn + 1, optlen);
+		memcpy(optval, tp->saved_syn->data, optlen);
 		break;
 	default:
 		goto err_clear;
[...]
 	.arg5_type	= ARG_CONST_SIZE,
 };

+static int bpf_sock_ops_get_syn(struct bpf_sock_ops_kern *bpf_sock,
+				int optname, const u8 **start)
+{
+	struct sk_buff *syn_skb = bpf_sock->syn_skb;
+	const u8 *hdr_start;
+	int ret;
+
+	if (syn_skb) {
+		/* sk is a request_sock here */
+
+		if (optname == TCP_BPF_SYN) {
+			hdr_start = syn_skb->data;
+			ret = tcp_hdrlen(syn_skb);
+		} else if (optname == TCP_BPF_SYN_IP) {
+			hdr_start = skb_network_header(syn_skb);
+			ret = skb_network_header_len(syn_skb) +
+				tcp_hdrlen(syn_skb);
+		} else {
+			/* optname == TCP_BPF_SYN_MAC */
+			hdr_start = skb_mac_header(syn_skb);
+			ret = skb_mac_header_len(syn_skb) +
+				skb_network_header_len(syn_skb) +
+				tcp_hdrlen(syn_skb);
+		}
+	} else {
+		struct sock *sk = bpf_sock->sk;
+		struct saved_syn *saved_syn;
+
+		if (sk->sk_state == TCP_NEW_SYN_RECV)
+			/* synack retransmit. bpf_sock->syn_skb will
+			 * not be available. It has to resort to
+			 * saved_syn (if it is saved).
+			 */
+			saved_syn = inet_reqsk(sk)->saved_syn;
+		else
+			saved_syn = tcp_sk(sk)->saved_syn;
+
+		if (!saved_syn)
+			return -ENOENT;
+
+		if (optname == TCP_BPF_SYN) {
+			hdr_start = saved_syn->data +
+				saved_syn->mac_hdrlen +
+				saved_syn->network_hdrlen;
+			ret = saved_syn->tcp_hdrlen;
+		} else if (optname == TCP_BPF_SYN_IP) {
+			hdr_start = saved_syn->data +
+				saved_syn->mac_hdrlen;
+			ret = saved_syn->network_hdrlen +
+				saved_syn->tcp_hdrlen;
+		} else {
+			/* optname == TCP_BPF_SYN_MAC */
+
+			/* TCP_SAVE_SYN may not have saved the mac hdr */
+			if (!saved_syn->mac_hdrlen)
+				return -ENOENT;
+
+			hdr_start = saved_syn->data;
+			ret = saved_syn->mac_hdrlen +
+				saved_syn->network_hdrlen +
+				saved_syn->tcp_hdrlen;
+		}
+	}
+
+	*start = hdr_start;
+	return ret;
+}
+
 BPF_CALL_5(bpf_sock_ops_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
+	if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP &&
+	    optname >= TCP_BPF_SYN && optname <= TCP_BPF_SYN_MAC) {
+		int ret, copy_len = 0;
+		const u8 *start;
+
+		ret = bpf_sock_ops_get_syn(bpf_sock, optname, &start);
+		if (ret > 0) {
+			copy_len = ret;
+			if (optlen < copy_len) {
+				copy_len = optlen;
+				ret = -ENOSPC;
+			}
+
+			memcpy(optval, start, copy_len);
+		}
+
+		/* Zero out unused buffer at the end */
+		memset(optval + copy_len, 0, optlen - copy_len);
+
+		return ret;
+	}
+
 	return _bpf_getsockopt(bpf_sock->sk, level, optname, optval, optlen);
 }

[...]
 	.arg3_type	= ARG_ANYTHING,
 };

+static const u8 *bpf_search_tcp_opt(const u8 *op, const u8 *opend,
+				    u8 search_kind, const u8 *magic,
+				    u8 magic_len, bool *eol)
+{
+	u8 kind, kind_len;
+
+	*eol = false;
+
+	while (op < opend) {
+		kind = op[0];
+
+		if (kind == TCPOPT_EOL) {
+			*eol = true;
+			return ERR_PTR(-ENOMSG);
+		} else if (kind == TCPOPT_NOP) {
+			op++;
+			continue;
+		}
+
+		if (opend - op < 2 || opend - op < op[1] || op[1] < 2)
+			/* Something is wrong in the received header.
+			 * Follow the TCP stack's tcp_parse_options()
+			 * and just bail here.
+			 */
+			return ERR_PTR(-EFAULT);
+
+		kind_len = op[1];
+		if (search_kind == kind) {
+			if (!magic_len)
+				return op;
+
+			if (magic_len > kind_len - 2)
+				return ERR_PTR(-ENOMSG);
+
+			if (!memcmp(&op[2], magic, magic_len))
+				return op;
+		}
+
+		op += kind_len;
+	}
+
+	return ERR_PTR(-ENOMSG);
+}
+
+BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
+	   void *, search_res, u32, len, u64, flags)
+{
+	bool eol, load_syn = flags & BPF_LOAD_HDR_OPT_TCP_SYN;
+	const u8 *op, *opend, *magic, *search = search_res;
+	u8 search_kind, search_len, copy_len, magic_len;
+	int ret;
+
+	/* 2 byte is the minimal option len except TCPOPT_NOP and
+	 * TCPOPT_EOL which are useless for the bpf prog to learn
+	 * and this helper disallow loading them also.
+	 */
+	if (len < 2 || flags & ~BPF_LOAD_HDR_OPT_TCP_SYN)
+		return -EINVAL;
+
+	search_kind = search[0];
+	search_len = search[1];
+
+	if (search_len > len || search_kind == TCPOPT_NOP ||
+	    search_kind == TCPOPT_EOL)
+		return -EINVAL;
+
+	if (search_kind == TCPOPT_EXP || search_kind == 253) {
+		/* 16 or 32 bit magic.  +2 for kind and kind length */
+		if (search_len != 4 && search_len != 6)
+			return -EINVAL;
+		magic = &search[2];
+		magic_len = search_len - 2;
+	} else {
+		if (search_len)
+			return -EINVAL;
+		magic = NULL;
+		magic_len = 0;
+	}
+
+	if (load_syn) {
+		ret = bpf_sock_ops_get_syn(bpf_sock, TCP_BPF_SYN, &op);
+		if (ret < 0)
+			return ret;
+
+		opend = op + ret;
+		op += sizeof(struct tcphdr);
+	} else {
+		if (!bpf_sock->skb ||
+		    bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB)
+			/* This bpf_sock->op cannot call this helper */
+			return -EPERM;
+
+		opend = bpf_sock->skb_data_end;
+		op = bpf_sock->skb->data + sizeof(struct tcphdr);
+	}
+
+	op = bpf_search_tcp_opt(op, opend, search_kind, magic, magic_len,
+				&eol);
+	if (IS_ERR(op))
+		return PTR_ERR(op);
+
+	copy_len = op[1];
+	ret = copy_len;
+	if (copy_len > len) {
+		ret = -ENOSPC;
+		copy_len = len;
+	}
+
+	memcpy(search_res, op, copy_len);
+	return ret;
+}
+
+static const struct bpf_func_proto bpf_sock_ops_load_hdr_opt_proto = {
+	.func		= bpf_sock_ops_load_hdr_opt,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_4(bpf_sock_ops_store_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
+	   const void *, from, u32, len, u64, flags)
+{
+	u8 new_kind, new_kind_len, magic_len = 0, *opend;
+	const u8 *op, *new_op, *magic = NULL;
+	struct sk_buff *skb;
+	bool eol;
+
+	if (bpf_sock->op != BPF_SOCK_OPS_WRITE_HDR_OPT_CB)
+		return -EPERM;
+
+	if (len < 2 || flags)
+		return -EINVAL;
+
+	new_op = from;
+	new_kind = new_op[0];
+	new_kind_len = new_op[1];
+
+	if (new_kind_len > len || new_kind == TCPOPT_NOP ||
+	    new_kind == TCPOPT_EOL)
+		return -EINVAL;
+
+	if (new_kind_len > bpf_sock->remaining_opt_len)
+		return -ENOSPC;
+
+	/* 253 is another experimental kind */
+	if (new_kind == TCPOPT_EXP || new_kind == 253) {
+		if (new_kind_len < 4)
+			return -EINVAL;
+		/* Match for the 2 byte magic also.
+		 * RFC 6994: the magic could be 2 or 4 bytes.
+		 * Hence, matching by 2 byte only is on the
+		 * conservative side but it is the right
+		 * thing to do for the 'search-for-duplication'
+		 * purpose.
+		 */
+		magic = &new_op[2];
+		magic_len = 2;
+	}
+
+	/* Check for duplication */
+	skb = bpf_sock->skb;
+	op = skb->data + sizeof(struct tcphdr);
+	opend = bpf_sock->skb_data_end;
+
+	op = bpf_search_tcp_opt(op, opend, new_kind, magic, magic_len,
+				&eol);
+	if (!IS_ERR(op))
+		return -EEXIST;
+
+	if (PTR_ERR(op) != -ENOMSG)
+		return PTR_ERR(op);
+
+	if (eol)
+		/* The option has been ended. Treat it as no more
+		 * header option can be written.
+		 */
+		return -ENOSPC;
+
+	/* No duplication found. Store the header option. */
+	memcpy(opend, from, new_kind_len);
+
+	bpf_sock->remaining_opt_len -= new_kind_len;
+	bpf_sock->skb_data_end += new_kind_len;
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_sock_ops_store_hdr_opt_proto = {
+	.func		= bpf_sock_ops_store_hdr_opt,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_3(bpf_sock_ops_reserve_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
+	   u32, len, u64, flags)
+{
+	if (bpf_sock->op != BPF_SOCK_OPS_HDR_OPT_LEN_CB)
+		return -EPERM;
+
+	if (flags || len < 2)
+		return -EINVAL;
+
+	if (len > bpf_sock->remaining_opt_len)
+		return -ENOSPC;
+
+	bpf_sock->remaining_opt_len -= len;
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_sock_ops_reserve_hdr_opt_proto = {
+	.func		= bpf_sock_ops_reserve_hdr_opt,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 #endif /* CONFIG_INET */

 bool bpf_helper_changes_pkt_data(void *func)
[...]
 	    func == bpf_lwt_seg6_store_bytes ||
 	    func == bpf_lwt_seg6_adjust_srh ||
 	    func == bpf_lwt_seg6_action ||
+#endif
+#ifdef CONFIG_INET
+	    func == bpf_sock_ops_store_hdr_opt ||
 #endif
 	    func == bpf_lwt_in_push_encap ||
 	    func == bpf_lwt_xmit_push_encap)
[...]
 	case BPF_FUNC_sk_storage_delete:
 		return &bpf_sk_storage_delete_proto;
 #ifdef CONFIG_INET
+	case BPF_FUNC_load_hdr_opt:
+		return &bpf_sock_ops_load_hdr_opt_proto;
+	case BPF_FUNC_store_hdr_opt:
+		return &bpf_sock_ops_store_hdr_opt_proto;
+	case BPF_FUNC_reserve_hdr_opt:
+		return &bpf_sock_ops_reserve_hdr_opt_proto;
 	case BPF_FUNC_tcp_sock:
 		return &bpf_tcp_sock_proto;
 #endif /* CONFIG_INET */
[...]
 			return false;
 		info->reg_type = PTR_TO_SOCKET_OR_NULL;
 		break;
+	case offsetof(struct bpf_sock_ops, skb_data):
+		if (size != sizeof(__u64))
+			return false;
+		info->reg_type = PTR_TO_PACKET;
+		break;
+	case offsetof(struct bpf_sock_ops, skb_data_end):
+		if (size != sizeof(__u64))
+			return false;
+		info->reg_type = PTR_TO_PACKET_END;
+		break;
+	case offsetof(struct bpf_sock_ops, skb_tcp_flags):
+		bpf_ctx_record_field_size(info, size_default);
+		return bpf_ctx_narrow_access_ok(off, size,
+						size_default);
 	default:
 		if (size != size_default)
 			return false;
[...]
 		return insn - insn_buf;

 	switch (si->off) {
-	case offsetof(struct bpf_sock_ops, op) ...
+	case offsetof(struct bpf_sock_ops, op):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern,
+						       op),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern, op));
+		break;
+
+	case offsetof(struct bpf_sock_ops, replylong[0]) ...
 	     offsetof(struct bpf_sock_ops, replylong[3]):
-		BUILD_BUG_ON(sizeof_field(struct bpf_sock_ops, op) !=
-			     sizeof_field(struct bpf_sock_ops_kern, op));
 		BUILD_BUG_ON(sizeof_field(struct bpf_sock_ops, reply) !=
 			     sizeof_field(struct bpf_sock_ops_kern, reply));
 		BUILD_BUG_ON(sizeof_field(struct bpf_sock_ops, replylong) !=
 			     sizeof_field(struct bpf_sock_ops_kern, replylong));
 		off = si->off;
-		off -= offsetof(struct bpf_sock_ops, op);
-		off += offsetof(struct bpf_sock_ops_kern, op);
+		off -= offsetof(struct bpf_sock_ops, replylong[0]);
+		off += offsetof(struct bpf_sock_ops_kern, replylong[0]);
 		if (type == BPF_WRITE)
 			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
 					      off);
[...]
 		break;
 	case offsetof(struct bpf_sock_ops, sk):
 		SOCK_OPS_GET_SK();
+		break;
+	case offsetof(struct bpf_sock_ops, skb_data_end):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern,
+						       skb_data_end),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern,
+					       skb_data_end));
+		break;
+	case offsetof(struct bpf_sock_ops, skb_data):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern,
+						       skb),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern,
+					       skb));
+		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1);
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, data),
+				      si->dst_reg, si->dst_reg,
+				      offsetof(struct sk_buff, data));
+		break;
+	case offsetof(struct bpf_sock_ops, skb_len):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern,
+						       skb),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern,
+					       skb));
+		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1);
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, len),
+				      si->dst_reg, si->dst_reg,
+ offsetof(struct sk_buff, len)); 8712 + break; 8713 + case offsetof(struct bpf_sock_ops, skb_tcp_flags): 8714 + off = offsetof(struct sk_buff, cb); 8715 + off += offsetof(struct tcp_skb_cb, tcp_flags); 8716 + *target_size = sizeof_field(struct tcp_skb_cb, tcp_flags); 8717 + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_sock_ops_kern, 8718 + skb), 8719 + si->dst_reg, si->src_reg, 8720 + offsetof(struct bpf_sock_ops_kern, 8721 + skb)); 8722 + *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1); 8723 + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct tcp_skb_cb, 8724 + tcp_flags), 8725 + si->dst_reg, si->dst_reg, off); 9042 8726 break; 9043 8727 } 9044 8728 return insn - insn_buf;
+28 -6
net/core/skmsg.c
··· 494 494 495 495 struct sk_psock *sk_psock_init(struct sock *sk, int node) 496 496 { 497 - struct sk_psock *psock = kzalloc_node(sizeof(*psock), 498 - GFP_ATOMIC | __GFP_NOWARN, 499 - node); 500 - if (!psock) 501 - return NULL; 497 + struct sk_psock *psock; 498 + struct proto *prot; 502 499 500 + write_lock_bh(&sk->sk_callback_lock); 501 + 502 + if (inet_csk_has_ulp(sk)) { 503 + psock = ERR_PTR(-EINVAL); 504 + goto out; 505 + } 506 + 507 + if (sk->sk_user_data) { 508 + psock = ERR_PTR(-EBUSY); 509 + goto out; 510 + } 511 + 512 + psock = kzalloc_node(sizeof(*psock), GFP_ATOMIC | __GFP_NOWARN, node); 513 + if (!psock) { 514 + psock = ERR_PTR(-ENOMEM); 515 + goto out; 516 + } 517 + 518 + prot = READ_ONCE(sk->sk_prot); 503 519 psock->sk = sk; 504 - psock->eval = __SK_NONE; 520 + psock->eval = __SK_NONE; 521 + psock->sk_proto = prot; 522 + psock->saved_unhash = prot->unhash; 523 + psock->saved_close = prot->close; 524 + psock->saved_write_space = sk->sk_write_space; 505 525 506 526 INIT_LIST_HEAD(&psock->link); 507 527 spin_lock_init(&psock->link_lock); ··· 536 516 rcu_assign_sk_user_data_nocopy(sk, psock); 537 517 sock_hold(sk); 538 518 519 + out: 520 + write_unlock_bh(&sk->sk_callback_lock); 539 521 return psock; 540 522 } 541 523 EXPORT_SYMBOL_GPL(sk_psock_init);
+37 -54
net/core/sock_map.c
··· 184 184 { 185 185 struct proto *prot; 186 186 187 - sock_owned_by_me(sk); 188 - 189 187 switch (sk->sk_type) { 190 188 case SOCK_STREAM: 191 189 prot = tcp_bpf_get_proto(sk, psock); ··· 270 272 } 271 273 } else { 272 274 psock = sk_psock_init(sk, map->numa_node); 273 - if (!psock) { 274 - ret = -ENOMEM; 275 + if (IS_ERR(psock)) { 276 + ret = PTR_ERR(psock); 275 277 goto out_progs; 276 278 } 277 279 } ··· 320 322 321 323 if (!psock) { 322 324 psock = sk_psock_init(sk, map->numa_node); 323 - if (!psock) 324 - return -ENOMEM; 325 + if (IS_ERR(psock)) 326 + return PTR_ERR(psock); 325 327 } 326 328 327 329 ret = sock_map_init_proto(sk, psock); ··· 476 478 return -EINVAL; 477 479 if (unlikely(idx >= map->max_entries)) 478 480 return -E2BIG; 479 - if (inet_csk_has_ulp(sk)) 480 - return -EINVAL; 481 481 482 482 link = sk_psock_init_link(); 483 483 if (!link) ··· 559 563 return false; 560 564 } 561 565 562 - static int sock_map_update_elem(struct bpf_map *map, void *key, 563 - void *value, u64 flags) 566 + static int sock_hash_update_common(struct bpf_map *map, void *key, 567 + struct sock *sk, u64 flags); 568 + 569 + int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, 570 + u64 flags) 564 571 { 565 - u32 idx = *(u32 *)key; 566 572 struct socket *sock; 567 573 struct sock *sk; 568 574 int ret; ··· 593 595 sock_map_sk_acquire(sk); 594 596 if (!sock_map_sk_state_allowed(sk)) 595 597 ret = -EOPNOTSUPP; 598 + else if (map->map_type == BPF_MAP_TYPE_SOCKMAP) 599 + ret = sock_map_update_common(map, *(u32 *)key, sk, flags); 596 600 else 597 - ret = sock_map_update_common(map, idx, sk, flags); 601 + ret = sock_hash_update_common(map, key, sk, flags); 598 602 sock_map_sk_release(sk); 599 603 out: 600 604 fput(sock->file); 605 + return ret; 606 + } 607 + 608 + static int sock_map_update_elem(struct bpf_map *map, void *key, 609 + void *value, u64 flags) 610 + { 611 + struct sock *sk = (struct sock *)value; 612 + int ret; 613 + 614 + if 
(!sock_map_sk_is_suitable(sk)) 615 + return -EOPNOTSUPP; 616 + 617 + local_bh_disable(); 618 + bh_lock_sock(sk); 619 + if (!sock_map_sk_state_allowed(sk)) 620 + ret = -EOPNOTSUPP; 621 + else if (map->map_type == BPF_MAP_TYPE_SOCKMAP) 622 + ret = sock_map_update_common(map, *(u32 *)key, sk, flags); 623 + else 624 + ret = sock_hash_update_common(map, key, sk, flags); 625 + bh_unlock_sock(sk); 626 + local_bh_enable(); 601 627 return ret; 602 628 } 603 629 ··· 705 683 706 684 static int sock_map_btf_id; 707 685 const struct bpf_map_ops sock_map_ops = { 686 + .map_meta_equal = bpf_map_meta_equal, 708 687 .map_alloc = sock_map_alloc, 709 688 .map_free = sock_map_free, 710 689 .map_get_next_key = sock_map_get_next_key, ··· 878 855 WARN_ON_ONCE(!rcu_read_lock_held()); 879 856 if (unlikely(flags > BPF_EXIST)) 880 857 return -EINVAL; 881 - if (inet_csk_has_ulp(sk)) 882 - return -EINVAL; 883 858 884 859 link = sk_psock_init_link(); 885 860 if (!link) ··· 933 912 sk_psock_put(sk, psock); 934 913 out_free: 935 914 sk_psock_free_link(link); 936 - return ret; 937 - } 938 - 939 - static int sock_hash_update_elem(struct bpf_map *map, void *key, 940 - void *value, u64 flags) 941 - { 942 - struct socket *sock; 943 - struct sock *sk; 944 - int ret; 945 - u64 ufd; 946 - 947 - if (map->value_size == sizeof(u64)) 948 - ufd = *(u64 *)value; 949 - else 950 - ufd = *(u32 *)value; 951 - if (ufd > S32_MAX) 952 - return -EINVAL; 953 - 954 - sock = sockfd_lookup(ufd, &ret); 955 - if (!sock) 956 - return ret; 957 - sk = sock->sk; 958 - if (!sk) { 959 - ret = -EINVAL; 960 - goto out; 961 - } 962 - if (!sock_map_sk_is_suitable(sk)) { 963 - ret = -EOPNOTSUPP; 964 - goto out; 965 - } 966 - 967 - sock_map_sk_acquire(sk); 968 - if (!sock_map_sk_state_allowed(sk)) 969 - ret = -EOPNOTSUPP; 970 - else 971 - ret = sock_hash_update_common(map, key, sk, flags); 972 - sock_map_sk_release(sk); 973 - out: 974 - fput(sock->file); 975 915 return ret; 976 916 } 977 917 ··· 1201 1219 1202 1220 static int 
sock_hash_map_btf_id; 1203 1221 const struct bpf_map_ops sock_hash_ops = { 1222 + .map_meta_equal = bpf_map_meta_equal, 1204 1223 .map_alloc = sock_hash_alloc, 1205 1224 .map_free = sock_hash_free, 1206 1225 .map_get_next_key = sock_hash_get_next_key, 1207 - .map_update_elem = sock_hash_update_elem, 1226 + .map_update_elem = sock_map_update_elem, 1208 1227 .map_delete_elem = sock_hash_delete_elem, 1209 1228 .map_lookup_elem = sock_hash_lookup, 1210 1229 .map_lookup_elem_sys_only = sock_hash_lookup_sys,
+1 -1
net/ethtool/channels.c
··· 223 223 from_channel = channels.combined_count + 224 224 min(channels.rx_count, channels.tx_count); 225 225 for (i = from_channel; i < old_total; i++) 226 - if (xdp_get_umem_from_qid(dev, i)) { 226 + if (xsk_get_pool_from_qid(dev, i)) { 227 227 GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing zerocopy AF_XDP sockets"); 228 228 return -EINVAL; 229 229 }
+1 -1
net/ethtool/ioctl.c
··· 1706 1706 min(channels.rx_count, channels.tx_count); 1707 1707 to_channel = curr.combined_count + max(curr.rx_count, curr.tx_count); 1708 1708 for (i = from_channel; i < to_channel; i++) 1709 - if (xdp_get_umem_from_qid(dev, i)) 1709 + if (xsk_get_pool_from_qid(dev, i)) 1710 1710 return -EINVAL; 1711 1711 1712 1712 ret = dev->ethtool_ops->set_channels(dev, &channels);
+11 -5
net/ipv4/tcp.c
··· 418 418 INIT_LIST_HEAD(&tp->tsorted_sent_queue); 419 419 420 420 icsk->icsk_rto = TCP_TIMEOUT_INIT; 421 + icsk->icsk_rto_min = TCP_RTO_MIN; 422 + icsk->icsk_delack_max = TCP_DELACK_MAX; 421 423 tp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT); 422 424 minmax_reset(&tp->rtt_min, tcp_jiffies32, ~0U); 423 425 ··· 2687 2685 icsk->icsk_backoff = 0; 2688 2686 icsk->icsk_probes_out = 0; 2689 2687 icsk->icsk_rto = TCP_TIMEOUT_INIT; 2688 + icsk->icsk_rto_min = TCP_RTO_MIN; 2689 + icsk->icsk_delack_max = TCP_DELACK_MAX; 2690 2690 tp->snd_ssthresh = TCP_INFINITE_SSTHRESH; 2691 2691 tp->snd_cwnd = TCP_INIT_CWND; 2692 2692 tp->snd_cwnd_cnt = 0; ··· 3211 3207 break; 3212 3208 3213 3209 case TCP_SAVE_SYN: 3214 - if (val < 0 || val > 1) 3210 + /* 0: disable, 1: enable, 2: start from ether_header */ 3211 + if (val < 0 || val > 2) 3215 3212 err = -EINVAL; 3216 3213 else 3217 3214 tp->save_syn = val; ··· 3793 3788 3794 3789 lock_sock(sk); 3795 3790 if (tp->saved_syn) { 3796 - if (len < tp->saved_syn[0]) { 3797 - if (put_user(tp->saved_syn[0], optlen)) { 3791 + if (len < tcp_saved_syn_len(tp->saved_syn)) { 3792 + if (put_user(tcp_saved_syn_len(tp->saved_syn), 3793 + optlen)) { 3798 3794 release_sock(sk); 3799 3795 return -EFAULT; 3800 3796 } 3801 3797 release_sock(sk); 3802 3798 return -EINVAL; 3803 3799 } 3804 - len = tp->saved_syn[0]; 3800 + len = tcp_saved_syn_len(tp->saved_syn); 3805 3801 if (put_user(len, optlen)) { 3806 3802 release_sock(sk); 3807 3803 return -EFAULT; 3808 3804 } 3809 - if (copy_to_user(optval, tp->saved_syn + 1, len)) { 3805 + if (copy_to_user(optval, tp->saved_syn->data, len)) { 3810 3806 release_sock(sk); 3811 3807 return -EFAULT; 3812 3808 }
+5 -8
net/ipv4/tcp_bpf.c
··· 567 567 prot[TCP_BPF_TX].sendpage = tcp_bpf_sendpage; 568 568 } 569 569 570 - static void tcp_bpf_check_v6_needs_rebuild(struct sock *sk, struct proto *ops) 570 + static void tcp_bpf_check_v6_needs_rebuild(struct proto *ops) 571 571 { 572 - if (sk->sk_family == AF_INET6 && 573 - unlikely(ops != smp_load_acquire(&tcpv6_prot_saved))) { 572 + if (unlikely(ops != smp_load_acquire(&tcpv6_prot_saved))) { 574 573 spin_lock_bh(&tcpv6_prot_lock); 575 574 if (likely(ops != tcpv6_prot_saved)) { 576 575 tcp_bpf_rebuild_protos(tcp_bpf_prots[TCP_BPF_IPV6], ops); ··· 602 603 int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4; 603 604 int config = psock->progs.msg_parser ? TCP_BPF_TX : TCP_BPF_BASE; 604 605 605 - if (!psock->sk_proto) { 606 - struct proto *ops = READ_ONCE(sk->sk_prot); 607 - 608 - if (tcp_bpf_assert_proto_ops(ops)) 606 + if (sk->sk_family == AF_INET6) { 607 + if (tcp_bpf_assert_proto_ops(psock->sk_proto)) 609 608 return ERR_PTR(-EINVAL); 610 609 611 - tcp_bpf_check_v6_needs_rebuild(sk, ops); 610 + tcp_bpf_check_v6_needs_rebuild(psock->sk_proto); 612 611 } 613 612 614 613 return &tcp_bpf_prots[family][config];
+1 -1
net/ipv4/tcp_fastopen.c
··· 295 295 refcount_set(&req->rsk_refcnt, 2); 296 296 297 297 /* Now finish processing the fastopen child socket. */ 298 - tcp_init_transfer(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB); 298 + tcp_init_transfer(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB, skb); 299 299 300 300 tp->rcv_nxt = TCP_SKB_CB(skb)->seq + 1; 301 301
+109 -18
net/ipv4/tcp_input.c
··· 138 138 EXPORT_SYMBOL_GPL(clean_acked_data_flush); 139 139 #endif 140 140 141 + #ifdef CONFIG_CGROUP_BPF 142 + static void bpf_skops_parse_hdr(struct sock *sk, struct sk_buff *skb) 143 + { 144 + bool unknown_opt = tcp_sk(sk)->rx_opt.saw_unknown && 145 + BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), 146 + BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG); 147 + bool parse_all_opt = BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), 148 + BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG); 149 + struct bpf_sock_ops_kern sock_ops; 150 + 151 + if (likely(!unknown_opt && !parse_all_opt)) 152 + return; 153 + 154 + /* The skb will be handled in the 155 + * bpf_skops_established() or 156 + * bpf_skops_write_hdr_opt(). 157 + */ 158 + switch (sk->sk_state) { 159 + case TCP_SYN_RECV: 160 + case TCP_SYN_SENT: 161 + case TCP_LISTEN: 162 + return; 163 + } 164 + 165 + sock_owned_by_me(sk); 166 + 167 + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp)); 168 + sock_ops.op = BPF_SOCK_OPS_PARSE_HDR_OPT_CB; 169 + sock_ops.is_fullsock = 1; 170 + sock_ops.sk = sk; 171 + bpf_skops_init_skb(&sock_ops, skb, tcp_hdrlen(skb)); 172 + 173 + BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops); 174 + } 175 + 176 + static void bpf_skops_established(struct sock *sk, int bpf_op, 177 + struct sk_buff *skb) 178 + { 179 + struct bpf_sock_ops_kern sock_ops; 180 + 181 + sock_owned_by_me(sk); 182 + 183 + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp)); 184 + sock_ops.op = bpf_op; 185 + sock_ops.is_fullsock = 1; 186 + sock_ops.sk = sk; 187 + /* sk with TCP_REPAIR_ON does not have skb in tcp_finish_connect */ 188 + if (skb) 189 + bpf_skops_init_skb(&sock_ops, skb, tcp_hdrlen(skb)); 190 + 191 + BPF_CGROUP_RUN_PROG_SOCK_OPS(&sock_ops); 192 + } 193 + #else 194 + static void bpf_skops_parse_hdr(struct sock *sk, struct sk_buff *skb) 195 + { 196 + } 197 + 198 + static void bpf_skops_established(struct sock *sk, int bpf_op, 199 + struct sk_buff *skb) 200 + { 201 + } 202 + #endif 203 + 141 204 static void tcp_gro_dev_warn(struct sock *sk, 
const struct sk_buff *skb, 142 205 unsigned int len) 143 206 { ··· 3864 3801 foc->exp = exp_opt; 3865 3802 } 3866 3803 3867 - static void smc_parse_options(const struct tcphdr *th, 3804 + static bool smc_parse_options(const struct tcphdr *th, 3868 3805 struct tcp_options_received *opt_rx, 3869 3806 const unsigned char *ptr, 3870 3807 int opsize) ··· 3873 3810 if (static_branch_unlikely(&tcp_have_smc)) { 3874 3811 if (th->syn && !(opsize & 1) && 3875 3812 opsize >= TCPOLEN_EXP_SMC_BASE && 3876 - get_unaligned_be32(ptr) == TCPOPT_SMC_MAGIC) 3813 + get_unaligned_be32(ptr) == TCPOPT_SMC_MAGIC) { 3877 3814 opt_rx->smc_ok = 1; 3815 + return true; 3816 + } 3878 3817 } 3879 3818 #endif 3819 + return false; 3880 3820 } 3881 3821 3882 3822 /* Try to parse the MSS option from the TCP header. Return 0 on failure, clamped ··· 3940 3874 3941 3875 ptr = (const unsigned char *)(th + 1); 3942 3876 opt_rx->saw_tstamp = 0; 3877 + opt_rx->saw_unknown = 0; 3943 3878 3944 3879 while (length > 0) { 3945 3880 int opcode = *ptr++; ··· 4031 3964 */ 4032 3965 if (opsize >= TCPOLEN_EXP_FASTOPEN_BASE && 4033 3966 get_unaligned_be16(ptr) == 4034 - TCPOPT_FASTOPEN_MAGIC) 3967 + TCPOPT_FASTOPEN_MAGIC) { 4035 3968 tcp_parse_fastopen_option(opsize - 4036 3969 TCPOLEN_EXP_FASTOPEN_BASE, 4037 3970 ptr + 2, th->syn, foc, true); 4038 - else 4039 - smc_parse_options(th, opt_rx, ptr, 4040 - opsize); 3971 + break; 3972 + } 3973 + 3974 + if (smc_parse_options(th, opt_rx, ptr, opsize)) 3975 + break; 3976 + 3977 + opt_rx->saw_unknown = 1; 4041 3978 break; 4042 3979 3980 + default: 3981 + opt_rx->saw_unknown = 1; 4043 3982 } 4044 3983 ptr += opsize-2; 4045 3984 length -= opsize; ··· 5663 5590 goto discard; 5664 5591 } 5665 5592 5593 + bpf_skops_parse_hdr(sk, skb); 5594 + 5666 5595 return true; 5667 5596 5668 5597 discard: ··· 5873 5798 } 5874 5799 EXPORT_SYMBOL(tcp_rcv_established); 5875 5800 5876 - void tcp_init_transfer(struct sock *sk, int bpf_op) 5801 + void tcp_init_transfer(struct sock *sk, int bpf_op, 
struct sk_buff *skb) 5877 5802 { 5878 5803 struct inet_connection_sock *icsk = inet_csk(sk); 5879 5804 struct tcp_sock *tp = tcp_sk(sk); ··· 5894 5819 tp->snd_cwnd = tcp_init_cwnd(tp, __sk_dst_get(sk)); 5895 5820 tp->snd_cwnd_stamp = tcp_jiffies32; 5896 5821 5897 - tcp_call_bpf(sk, bpf_op, 0, NULL); 5822 + bpf_skops_established(sk, bpf_op, skb); 5898 5823 tcp_init_congestion_control(sk); 5899 5824 tcp_init_buffer_space(sk); 5900 5825 } ··· 5913 5838 sk_mark_napi_id(sk, skb); 5914 5839 } 5915 5840 5916 - tcp_init_transfer(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB); 5841 + tcp_init_transfer(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB, skb); 5917 5842 5918 5843 /* Prevent spurious tcp_cwnd_restart() on first data 5919 5844 * packet. ··· 6385 6310 } else { 6386 6311 tcp_try_undo_spurious_syn(sk); 6387 6312 tp->retrans_stamp = 0; 6388 - tcp_init_transfer(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB); 6313 + tcp_init_transfer(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB, 6314 + skb); 6389 6315 WRITE_ONCE(tp->copied_seq, tp->rcv_nxt); 6390 6316 } 6391 6317 smp_mb(); ··· 6675 6599 { 6676 6600 if (tcp_sk(sk)->save_syn) { 6677 6601 u32 len = skb_network_header_len(skb) + tcp_hdrlen(skb); 6678 - u32 *copy; 6602 + struct saved_syn *saved_syn; 6603 + u32 mac_hdrlen; 6604 + void *base; 6679 6605 6680 - copy = kmalloc(len + sizeof(u32), GFP_ATOMIC); 6681 - if (copy) { 6682 - copy[0] = len; 6683 - memcpy(&copy[1], skb_network_header(skb), len); 6684 - req->saved_syn = copy; 6606 + if (tcp_sk(sk)->save_syn == 2) { /* Save full header. 
*/ 6607 + base = skb_mac_header(skb); 6608 + mac_hdrlen = skb_mac_header_len(skb); 6609 + len += mac_hdrlen; 6610 + } else { 6611 + base = skb_network_header(skb); 6612 + mac_hdrlen = 0; 6613 + } 6614 + 6615 + saved_syn = kmalloc(struct_size(saved_syn, data, len), 6616 + GFP_ATOMIC); 6617 + if (saved_syn) { 6618 + saved_syn->mac_hdrlen = mac_hdrlen; 6619 + saved_syn->network_hdrlen = skb_network_header_len(skb); 6620 + saved_syn->tcp_hdrlen = tcp_hdrlen(skb); 6621 + memcpy(saved_syn->data, base, len); 6622 + req->saved_syn = saved_syn; 6685 6623 } 6686 6624 } 6687 6625 } ··· 6842 6752 } 6843 6753 if (fastopen_sk) { 6844 6754 af_ops->send_synack(fastopen_sk, dst, &fl, req, 6845 - &foc, TCP_SYNACK_FASTOPEN); 6755 + &foc, TCP_SYNACK_FASTOPEN, skb); 6846 6756 /* Add the child socket directly into the accept queue */ 6847 6757 if (!inet_csk_reqsk_queue_add(sk, req, fastopen_sk)) { 6848 6758 reqsk_fastopen_remove(fastopen_sk, req, false); ··· 6860 6770 tcp_timeout_init((struct sock *)req)); 6861 6771 af_ops->send_synack(sk, dst, &fl, req, &foc, 6862 6772 !want_cookie ? TCP_SYNACK_NORMAL : 6863 - TCP_SYNACK_COOKIE); 6773 + TCP_SYNACK_COOKIE, 6774 + skb); 6864 6775 if (want_cookie) { 6865 6776 reqsk_free(req); 6866 6777 return 0;
+3 -2
net/ipv4/tcp_ipv4.c
··· 965 965 struct flowi *fl, 966 966 struct request_sock *req, 967 967 struct tcp_fastopen_cookie *foc, 968 - enum tcp_synack_type synack_type) 968 + enum tcp_synack_type synack_type, 969 + struct sk_buff *syn_skb) 969 970 { 970 971 const struct inet_request_sock *ireq = inet_rsk(req); 971 972 struct flowi4 fl4; ··· 977 976 if (!dst && (dst = inet_csk_route_req(sk, &fl4, req)) == NULL) 978 977 return -1; 979 978 980 - skb = tcp_make_synack(sk, dst, req, foc, synack_type); 979 + skb = tcp_make_synack(sk, dst, req, foc, synack_type, syn_skb); 981 980 982 981 if (skb) { 983 982 __tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr);
+1
net/ipv4/tcp_minisocks.c
··· 548 548 newtp->fastopen_req = NULL; 549 549 RCU_INIT_POINTER(newtp->fastopen_rsk, NULL); 550 550 551 + bpf_skops_init_child(sk, newsk); 551 552 tcp_bpf_clone(sk, newsk); 552 553 553 554 __TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
+180 -13
net/ipv4/tcp_output.c
··· 438 438 u8 ws; /* window scale, 0 to disable */ 439 439 u8 num_sack_blocks; /* number of SACK blocks to include */ 440 440 u8 hash_size; /* bytes in hash_location */ 441 + u8 bpf_opt_len; /* length of BPF hdr option */ 441 442 __u8 *hash_location; /* temporary pointer, overloaded */ 442 443 __u32 tsval, tsecr; /* need to include OPTION_TS */ 443 444 struct tcp_fastopen_cookie *fastopen_cookie; /* Fast open cookie */ ··· 452 451 mptcp_write_options(ptr, &opts->mptcp); 453 452 #endif 454 453 } 454 + 455 + #ifdef CONFIG_CGROUP_BPF 456 + static int bpf_skops_write_hdr_opt_arg0(struct sk_buff *skb, 457 + enum tcp_synack_type synack_type) 458 + { 459 + if (unlikely(!skb)) 460 + return BPF_WRITE_HDR_TCP_CURRENT_MSS; 461 + 462 + if (unlikely(synack_type == TCP_SYNACK_COOKIE)) 463 + return BPF_WRITE_HDR_TCP_SYNACK_COOKIE; 464 + 465 + return 0; 466 + } 467 + 468 + /* req, syn_skb and synack_type are used when writing synack */ 469 + static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb, 470 + struct request_sock *req, 471 + struct sk_buff *syn_skb, 472 + enum tcp_synack_type synack_type, 473 + struct tcp_out_options *opts, 474 + unsigned int *remaining) 475 + { 476 + struct bpf_sock_ops_kern sock_ops; 477 + int err; 478 + 479 + if (likely(!BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), 480 + BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG)) || 481 + !*remaining) 482 + return; 483 + 484 + /* *remaining has already been aligned to 4 bytes, so *remaining >= 4 */ 485 + 486 + /* init sock_ops */ 487 + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp)); 488 + 489 + sock_ops.op = BPF_SOCK_OPS_HDR_OPT_LEN_CB; 490 + 491 + if (req) { 492 + /* The listen "sk" cannot be passed here because 493 + * it is not locked. It would not make too much 494 + * sense to do bpf_setsockopt(listen_sk) based 495 + * on individual connection request also. 496 + * 497 + * Thus, "req" is passed here and the cgroup-bpf-progs 498 + * of the listen "sk" will be run. 
499 + * 500 + * "req" is also used here for fastopen even the "sk" here is 501 + * a fullsock "child" sk. It is to keep the behavior 502 + * consistent between fastopen and non-fastopen on 503 + * the bpf programming side. 504 + */ 505 + sock_ops.sk = (struct sock *)req; 506 + sock_ops.syn_skb = syn_skb; 507 + } else { 508 + sock_owned_by_me(sk); 509 + 510 + sock_ops.is_fullsock = 1; 511 + sock_ops.sk = sk; 512 + } 513 + 514 + sock_ops.args[0] = bpf_skops_write_hdr_opt_arg0(skb, synack_type); 515 + sock_ops.remaining_opt_len = *remaining; 516 + /* tcp_current_mss() does not pass a skb */ 517 + if (skb) 518 + bpf_skops_init_skb(&sock_ops, skb, 0); 519 + 520 + err = BPF_CGROUP_RUN_PROG_SOCK_OPS_SK(&sock_ops, sk); 521 + 522 + if (err || sock_ops.remaining_opt_len == *remaining) 523 + return; 524 + 525 + opts->bpf_opt_len = *remaining - sock_ops.remaining_opt_len; 526 + /* round up to 4 bytes */ 527 + opts->bpf_opt_len = (opts->bpf_opt_len + 3) & ~3; 528 + 529 + *remaining -= opts->bpf_opt_len; 530 + } 531 + 532 + static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb, 533 + struct request_sock *req, 534 + struct sk_buff *syn_skb, 535 + enum tcp_synack_type synack_type, 536 + struct tcp_out_options *opts) 537 + { 538 + u8 first_opt_off, nr_written, max_opt_len = opts->bpf_opt_len; 539 + struct bpf_sock_ops_kern sock_ops; 540 + int err; 541 + 542 + if (likely(!max_opt_len)) 543 + return; 544 + 545 + memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp)); 546 + 547 + sock_ops.op = BPF_SOCK_OPS_WRITE_HDR_OPT_CB; 548 + 549 + if (req) { 550 + sock_ops.sk = (struct sock *)req; 551 + sock_ops.syn_skb = syn_skb; 552 + } else { 553 + sock_owned_by_me(sk); 554 + 555 + sock_ops.is_fullsock = 1; 556 + sock_ops.sk = sk; 557 + } 558 + 559 + sock_ops.args[0] = bpf_skops_write_hdr_opt_arg0(skb, synack_type); 560 + sock_ops.remaining_opt_len = max_opt_len; 561 + first_opt_off = tcp_hdrlen(skb) - max_opt_len; 562 + bpf_skops_init_skb(&sock_ops, skb, 
first_opt_off); 563 + 564 + err = BPF_CGROUP_RUN_PROG_SOCK_OPS_SK(&sock_ops, sk); 565 + 566 + if (err) 567 + nr_written = 0; 568 + else 569 + nr_written = max_opt_len - sock_ops.remaining_opt_len; 570 + 571 + if (nr_written < max_opt_len) 572 + memset(skb->data + first_opt_off + nr_written, TCPOPT_NOP, 573 + max_opt_len - nr_written); 574 + } 575 + #else 576 + static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb, 577 + struct request_sock *req, 578 + struct sk_buff *syn_skb, 579 + enum tcp_synack_type synack_type, 580 + struct tcp_out_options *opts, 581 + unsigned int *remaining) 582 + { 583 + } 584 + 585 + static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb, 586 + struct request_sock *req, 587 + struct sk_buff *syn_skb, 588 + enum tcp_synack_type synack_type, 589 + struct tcp_out_options *opts) 590 + { 591 + } 592 + #endif 455 593 456 594 /* Write previously computed TCP options to the packet. 457 595 * ··· 831 691 } 832 692 } 833 693 694 + bpf_skops_hdr_opt_len(sk, skb, NULL, NULL, 0, opts, &remaining); 695 + 834 696 return MAX_TCP_OPTION_SPACE - remaining; 835 697 } 836 698 ··· 843 701 struct tcp_out_options *opts, 844 702 const struct tcp_md5sig_key *md5, 845 703 struct tcp_fastopen_cookie *foc, 846 - enum tcp_synack_type synack_type) 704 + enum tcp_synack_type synack_type, 705 + struct sk_buff *syn_skb) 847 706 { 848 707 struct inet_request_sock *ireq = inet_rsk(req); 849 708 unsigned int remaining = MAX_TCP_OPTION_SPACE; ··· 900 757 mptcp_set_option_cond(req, opts, &remaining); 901 758 902 759 smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining); 760 + 761 + bpf_skops_hdr_opt_len((struct sock *)sk, skb, req, syn_skb, 762 + synack_type, opts, &remaining); 903 763 904 764 return MAX_TCP_OPTION_SPACE - remaining; 905 765 } ··· 970 824 971 825 size += TCPOLEN_SACK_BASE_ALIGNED + 972 826 opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK; 827 + } 828 + 829 + if (unlikely(BPF_SOCK_OPS_TEST_FLAG(tp, 830 + 
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))) { 831 + unsigned int remaining = MAX_TCP_OPTION_SPACE - size; 832 + 833 + bpf_skops_hdr_opt_len(sk, skb, NULL, NULL, 0, opts, &remaining); 834 + 835 + size = MAX_TCP_OPTION_SPACE - remaining; 973 836 } 974 837 975 838 return size; ··· 1367 1212 md5, sk, skb); 1368 1213 } 1369 1214 #endif 1215 + 1216 + /* BPF prog is the last one writing header option */ 1217 + bpf_skops_write_hdr_opt(sk, skb, NULL, NULL, 0, &opts); 1370 1218 1371 1219 INDIRECT_CALL_INET(icsk->icsk_af_ops->send_check, 1372 1220 tcp_v6_send_check, tcp_v4_send_check, ··· 3494 3336 } 3495 3337 3496 3338 /** 3497 - * tcp_make_synack - Prepare a SYN-ACK. 3498 - * sk: listener socket 3499 - * dst: dst entry attached to the SYNACK 3500 - * req: request_sock pointer 3501 - * foc: cookie for tcp fast open 3502 - * synack_type: Type of synback to prepare 3503 - * 3504 - * Allocate one skb and build a SYNACK packet. 3505 - * @dst is consumed : Caller should not use it again. 3339 + * tcp_make_synack - Allocate one skb and build a SYNACK packet. 3340 + * @sk: listener socket 3341 + * @dst: dst entry attached to the SYNACK. It is consumed and caller 3342 + * should not use it again. 3343 + * @req: request_sock pointer 3344 + * @foc: cookie for tcp fast open 3345 + * @synack_type: Type of synack to prepare 3346 + * @syn_skb: SYN packet just received. It could be NULL for rtx case. 
3506 3347 */ 3507 3348 struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, 3508 3349 struct request_sock *req, 3509 3350 struct tcp_fastopen_cookie *foc, 3510 - enum tcp_synack_type synack_type) 3351 + enum tcp_synack_type synack_type, 3352 + struct sk_buff *syn_skb) 3511 3353 { 3512 3354 struct inet_request_sock *ireq = inet_rsk(req); 3513 3355 const struct tcp_sock *tp = tcp_sk(sk); ··· 3566 3408 md5 = tcp_rsk(req)->af_specific->req_md5_lookup(sk, req_to_sk(req)); 3567 3409 #endif 3568 3410 skb_set_hash(skb, tcp_rsk(req)->txhash, PKT_HASH_TYPE_L4); 3411 + /* bpf program will be interested in the tcp_flags */ 3412 + TCP_SKB_CB(skb)->tcp_flags = TCPHDR_SYN | TCPHDR_ACK; 3569 3413 tcp_header_size = tcp_synack_options(sk, req, mss, skb, &opts, md5, 3570 - foc, synack_type) + sizeof(*th); 3414 + foc, synack_type, 3415 + syn_skb) + sizeof(*th); 3571 3416 3572 3417 skb_push(skb, tcp_header_size); 3573 3418 skb_reset_transport_header(skb); ··· 3601 3440 md5, req_to_sk(req), skb); 3602 3441 rcu_read_unlock(); 3603 3442 #endif 3443 + 3444 + bpf_skops_write_hdr_opt((struct sock *)sk, skb, req, syn_skb, 3445 + synack_type, &opts); 3604 3446 3605 3447 skb->skb_mstamp_ns = now; 3606 3448 tcp_add_tx_delay(skb, tp); ··· 3905 3741 ato = min(ato, max_ato); 3906 3742 } 3907 3743 3744 + ato = min_t(u32, ato, inet_csk(sk)->icsk_delack_max); 3745 + 3908 3746 /* Stay within the limit we were given */ 3909 3747 timeout = jiffies + ato; 3910 3748 ··· 4100 3934 int res; 4101 3935 4102 3936 tcp_rsk(req)->txhash = net_tx_rndhash(); 4103 - res = af_ops->send_synack(sk, NULL, &fl, req, NULL, TCP_SYNACK_NORMAL); 3937 + res = af_ops->send_synack(sk, NULL, &fl, req, NULL, TCP_SYNACK_NORMAL, 3938 + NULL); 4104 3939 if (!res) { 4105 3940 __TCP_INC_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS); 4106 3941 __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSYNRETRANS);
+4 -5
net/ipv4/udp_bpf.c
··· 22 22 prot->close = sock_map_close; 23 23 } 24 24 25 - static void udp_bpf_check_v6_needs_rebuild(struct sock *sk, struct proto *ops) 25 + static void udp_bpf_check_v6_needs_rebuild(struct proto *ops) 26 26 { 27 - if (sk->sk_family == AF_INET6 && 28 - unlikely(ops != smp_load_acquire(&udpv6_prot_saved))) { 27 + if (unlikely(ops != smp_load_acquire(&udpv6_prot_saved))) { 29 28 spin_lock_bh(&udpv6_prot_lock); 30 29 if (likely(ops != udpv6_prot_saved)) { 31 30 udp_bpf_rebuild_protos(&udp_bpf_prots[UDP_BPF_IPV6], ops); ··· 45 46 { 46 47 int family = sk->sk_family == AF_INET ? UDP_BPF_IPV4 : UDP_BPF_IPV6; 47 48 48 - if (!psock->sk_proto) 49 - udp_bpf_check_v6_needs_rebuild(sk, READ_ONCE(sk->sk_prot)); 49 + if (sk->sk_family == AF_INET6) 50 + udp_bpf_check_v6_needs_rebuild(psock->sk_proto); 50 51 51 52 return &udp_bpf_prots[family]; 52 53 }
+3 -2
net/ipv6/tcp_ipv6.c
··· 501 501 struct flowi *fl, 502 502 struct request_sock *req, 503 503 struct tcp_fastopen_cookie *foc, 504 - enum tcp_synack_type synack_type) 504 + enum tcp_synack_type synack_type, 505 + struct sk_buff *syn_skb) 505 506 { 506 507 struct inet_request_sock *ireq = inet_rsk(req); 507 508 struct ipv6_pinfo *np = tcp_inet6_sk(sk); ··· 516 515 IPPROTO_TCP)) == NULL) 517 516 goto done; 518 517 519 - skb = tcp_make_synack(sk, dst, req, foc, synack_type); 518 + skb = tcp_make_synack(sk, dst, req, foc, synack_type, syn_skb); 520 519 521 520 if (skb) { 522 521 __tcp_v6_send_check(skb, &ireq->ir_v6_loc_addr,
net/xdp/xdp_umem.c (+27 -198)
···
 
 static DEFINE_IDA(umem_ida);
 
-void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
-{
-	unsigned long flags;
-
-	if (!xs->tx)
-		return;
-
-	spin_lock_irqsave(&umem->xsk_tx_list_lock, flags);
-	list_add_rcu(&xs->list, &umem->xsk_tx_list);
-	spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags);
-}
-
-void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
-{
-	unsigned long flags;
-
-	if (!xs->tx)
-		return;
-
-	spin_lock_irqsave(&umem->xsk_tx_list_lock, flags);
-	list_del_rcu(&xs->list);
-	spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags);
-}
-
-/* The umem is stored both in the _rx struct and the _tx struct as we do
- * not know if the device has more tx queues than rx, or the opposite.
- * This might also change during run time.
- */
-static int xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
-			       u16 queue_id)
-{
-	if (queue_id >= max_t(unsigned int,
-			      dev->real_num_rx_queues,
-			      dev->real_num_tx_queues))
-		return -EINVAL;
-
-	if (queue_id < dev->real_num_rx_queues)
-		dev->_rx[queue_id].umem = umem;
-	if (queue_id < dev->real_num_tx_queues)
-		dev->_tx[queue_id].umem = umem;
-
-	return 0;
-}
-
-struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
-				       u16 queue_id)
-{
-	if (queue_id < dev->real_num_rx_queues)
-		return dev->_rx[queue_id].umem;
-	if (queue_id < dev->real_num_tx_queues)
-		return dev->_tx[queue_id].umem;
-
-	return NULL;
-}
-EXPORT_SYMBOL(xdp_get_umem_from_qid);
-
-static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
-{
-	if (queue_id < dev->real_num_rx_queues)
-		dev->_rx[queue_id].umem = NULL;
-	if (queue_id < dev->real_num_tx_queues)
-		dev->_tx[queue_id].umem = NULL;
-}
-
-int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
-			u16 queue_id, u16 flags)
-{
-	bool force_zc, force_copy;
-	struct netdev_bpf bpf;
-	int err = 0;
-
-	ASSERT_RTNL();
-
-	force_zc = flags & XDP_ZEROCOPY;
-	force_copy = flags & XDP_COPY;
-
-	if (force_zc && force_copy)
-		return -EINVAL;
-
-	if (xdp_get_umem_from_qid(dev, queue_id))
-		return -EBUSY;
-
-	err = xdp_reg_umem_at_qid(dev, umem, queue_id);
-	if (err)
-		return err;
-
-	umem->dev = dev;
-	umem->queue_id = queue_id;
-
-	if (flags & XDP_USE_NEED_WAKEUP) {
-		umem->flags |= XDP_UMEM_USES_NEED_WAKEUP;
-		/* Tx needs to be explicitly woken up the first time.
-		 * Also for supporting drivers that do not implement this
-		 * feature. They will always have to call sendto().
-		 */
-		xsk_set_tx_need_wakeup(umem);
-	}
-
-	dev_hold(dev);
-
-	if (force_copy)
-		/* For copy-mode, we are done. */
-		return 0;
-
-	if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) {
-		err = -EOPNOTSUPP;
-		goto err_unreg_umem;
-	}
-
-	bpf.command = XDP_SETUP_XSK_UMEM;
-	bpf.xsk.umem = umem;
-	bpf.xsk.queue_id = queue_id;
-
-	err = dev->netdev_ops->ndo_bpf(dev, &bpf);
-	if (err)
-		goto err_unreg_umem;
-
-	umem->zc = true;
-	return 0;
-
-err_unreg_umem:
-	if (!force_zc)
-		err = 0; /* fallback to copy mode */
-	if (err)
-		xdp_clear_umem_at_qid(dev, queue_id);
-	return err;
-}
-
-void xdp_umem_clear_dev(struct xdp_umem *umem)
-{
-	struct netdev_bpf bpf;
-	int err;
-
-	ASSERT_RTNL();
-
-	if (!umem->dev)
-		return;
-
-	if (umem->zc) {
-		bpf.command = XDP_SETUP_XSK_UMEM;
-		bpf.xsk.umem = NULL;
-		bpf.xsk.queue_id = umem->queue_id;
-
-		err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf);
-
-		if (err)
-			WARN(1, "failed to disable umem!\n");
-	}
-
-	xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
-
-	dev_put(umem->dev);
-	umem->dev = NULL;
-	umem->zc = false;
-}
-
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
 	unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
···
 	}
 }
 
+static void xdp_umem_addr_unmap(struct xdp_umem *umem)
+{
+	vunmap(umem->addrs);
+	umem->addrs = NULL;
+}
+
+static int xdp_umem_addr_map(struct xdp_umem *umem, struct page **pages,
+			     u32 nr_pages)
+{
+	umem->addrs = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
+	if (!umem->addrs)
+		return -ENOMEM;
+	return 0;
+}
+
 static void xdp_umem_release(struct xdp_umem *umem)
 {
-	rtnl_lock();
-	xdp_umem_clear_dev(umem);
-	rtnl_unlock();
-
+	umem->zc = false;
 	ida_simple_remove(&umem_ida, umem->id);
 
-	if (umem->fq) {
-		xskq_destroy(umem->fq);
-		umem->fq = NULL;
-	}
-
-	if (umem->cq) {
-		xskq_destroy(umem->cq);
-		umem->cq = NULL;
-	}
-
-	xp_destroy(umem->pool);
+	xdp_umem_addr_unmap(umem);
 	xdp_umem_unpin_pages(umem);
 
 	xdp_umem_unaccount_pages(umem);
 	kfree(umem);
-}
-
-static void xdp_umem_release_deferred(struct work_struct *work)
-{
-	struct xdp_umem *umem = container_of(work, struct xdp_umem, work);
-
-	xdp_umem_release(umem);
 }
 
 void xdp_get_umem(struct xdp_umem *umem)
···
 	if (!umem)
 		return;
 
-	if (refcount_dec_and_test(&umem->users)) {
-		INIT_WORK(&umem->work, xdp_umem_release_deferred);
-		schedule_work(&umem->work);
-	}
+	if (refcount_dec_and_test(&umem->users))
+		xdp_umem_release(umem);
 }
 
 static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
···
 		return -EINVAL;
 	}
 
-	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNK_FLAG |
-			  XDP_UMEM_USES_NEED_WAKEUP))
+	if (mr->flags & ~XDP_UMEM_UNALIGNED_CHUNK_FLAG)
 		return -EINVAL;
 
 	if (!unaligned_chunks && !is_power_of_2(chunk_size))
···
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size = chunk_size;
+	umem->chunks = chunks;
 	umem->npgs = (u32)npgs;
 	umem->pgs = NULL;
 	umem->user = NULL;
 	umem->flags = mr->flags;
-	INIT_LIST_HEAD(&umem->xsk_tx_list);
-	spin_lock_init(&umem->xsk_tx_list_lock);
 
+	INIT_LIST_HEAD(&umem->xsk_dma_list);
 	refcount_set(&umem->users, 1);
 
 	err = xdp_umem_account_pages(umem);
···
 	if (err)
 		goto out_account;
 
-	umem->pool = xp_create(umem->pgs, umem->npgs, chunks, chunk_size,
-			       headroom, size, unaligned_chunks);
-	if (!umem->pool) {
-		err = -ENOMEM;
-		goto out_pin;
-	}
+	err = xdp_umem_addr_map(umem, umem->pgs, umem->npgs);
+	if (err)
+		goto out_unpin;
+
 	return 0;
 
-out_pin:
+out_unpin:
 	xdp_umem_unpin_pages(umem);
 out_account:
 	xdp_umem_unaccount_pages(umem);
···
 	}
 
 	return umem;
-}
-
-bool xdp_umem_validate_queues(struct xdp_umem *umem)
-{
-	return umem->fq && umem->cq;
 }
net/xdp/xdp_umem.h (-6)
···
 
 #include <net/xdp_sock_drv.h>
 
-int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
-			u16 queue_id, u16 flags);
-void xdp_umem_clear_dev(struct xdp_umem *umem);
-bool xdp_umem_validate_queues(struct xdp_umem *umem);
 void xdp_get_umem(struct xdp_umem *umem);
 void xdp_put_umem(struct xdp_umem *umem);
-void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs);
-void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs);
 struct xdp_umem *xdp_umem_create(struct xdp_umem_reg *mr);
 
 #endif /* XDP_UMEM_H_ */
net/xdp/xsk.c (+142 -71)
···
 bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
 {
 	return READ_ONCE(xs->rx) && READ_ONCE(xs->umem) &&
-		READ_ONCE(xs->umem->fq);
+		(xs->pool->fq || READ_ONCE(xs->fq_tmp));
 }
 
-void xsk_set_rx_need_wakeup(struct xdp_umem *umem)
+void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool)
 {
-	if (umem->need_wakeup & XDP_WAKEUP_RX)
+	if (pool->cached_need_wakeup & XDP_WAKEUP_RX)
 		return;
 
-	umem->fq->ring->flags |= XDP_RING_NEED_WAKEUP;
-	umem->need_wakeup |= XDP_WAKEUP_RX;
+	pool->fq->ring->flags |= XDP_RING_NEED_WAKEUP;
+	pool->cached_need_wakeup |= XDP_WAKEUP_RX;
 }
 EXPORT_SYMBOL(xsk_set_rx_need_wakeup);
 
-void xsk_set_tx_need_wakeup(struct xdp_umem *umem)
+void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool)
 {
 	struct xdp_sock *xs;
 
-	if (umem->need_wakeup & XDP_WAKEUP_TX)
+	if (pool->cached_need_wakeup & XDP_WAKEUP_TX)
 		return;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) {
+	list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
 		xs->tx->ring->flags |= XDP_RING_NEED_WAKEUP;
 	}
 	rcu_read_unlock();
 
-	umem->need_wakeup |= XDP_WAKEUP_TX;
+	pool->cached_need_wakeup |= XDP_WAKEUP_TX;
 }
 EXPORT_SYMBOL(xsk_set_tx_need_wakeup);
 
-void xsk_clear_rx_need_wakeup(struct xdp_umem *umem)
+void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool)
 {
-	if (!(umem->need_wakeup & XDP_WAKEUP_RX))
+	if (!(pool->cached_need_wakeup & XDP_WAKEUP_RX))
 		return;
 
-	umem->fq->ring->flags &= ~XDP_RING_NEED_WAKEUP;
-	umem->need_wakeup &= ~XDP_WAKEUP_RX;
+	pool->fq->ring->flags &= ~XDP_RING_NEED_WAKEUP;
+	pool->cached_need_wakeup &= ~XDP_WAKEUP_RX;
 }
 EXPORT_SYMBOL(xsk_clear_rx_need_wakeup);
 
-void xsk_clear_tx_need_wakeup(struct xdp_umem *umem)
+void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool)
 {
 	struct xdp_sock *xs;
 
-	if (!(umem->need_wakeup & XDP_WAKEUP_TX))
+	if (!(pool->cached_need_wakeup & XDP_WAKEUP_TX))
 		return;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) {
+	list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
 		xs->tx->ring->flags &= ~XDP_RING_NEED_WAKEUP;
 	}
 	rcu_read_unlock();
 
-	umem->need_wakeup &= ~XDP_WAKEUP_TX;
+	pool->cached_need_wakeup &= ~XDP_WAKEUP_TX;
 }
 EXPORT_SYMBOL(xsk_clear_tx_need_wakeup);
 
-bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem)
+bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool)
 {
-	return umem->flags & XDP_UMEM_USES_NEED_WAKEUP;
+	return pool->uses_need_wakeup;
 }
-EXPORT_SYMBOL(xsk_umem_uses_need_wakeup);
+EXPORT_SYMBOL(xsk_uses_need_wakeup);
+
+struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
+					    u16 queue_id)
+{
+	if (queue_id < dev->real_num_rx_queues)
+		return dev->_rx[queue_id].pool;
+	if (queue_id < dev->real_num_tx_queues)
+		return dev->_tx[queue_id].pool;
+
+	return NULL;
+}
+EXPORT_SYMBOL(xsk_get_pool_from_qid);
+
+void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id)
+{
+	if (queue_id < dev->real_num_rx_queues)
+		dev->_rx[queue_id].pool = NULL;
+	if (queue_id < dev->real_num_tx_queues)
+		dev->_tx[queue_id].pool = NULL;
+}
+
+/* The buffer pool is stored both in the _rx struct and the _tx struct as we do
+ * not know if the device has more tx queues than rx, or the opposite.
+ * This might also change during run time.
+ */
+int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
+			u16 queue_id)
+{
+	if (queue_id >= max_t(unsigned int,
+			      dev->real_num_rx_queues,
+			      dev->real_num_tx_queues))
+		return -EINVAL;
+
+	if (queue_id < dev->real_num_rx_queues)
+		dev->_rx[queue_id].pool = pool;
+	if (queue_id < dev->real_num_tx_queues)
+		dev->_tx[queue_id].pool = pool;
+
+	return 0;
+}
 
 void xp_release(struct xdp_buff_xsk *xskb)
 {
···
 	struct xdp_buff *xsk_xdp;
 	int err;
 
-	if (len > xsk_umem_get_rx_frame_size(xs->umem)) {
+	if (len > xsk_pool_get_rx_frame_size(xs->pool)) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
 
-	xsk_xdp = xsk_buff_alloc(xs->umem);
+	xsk_xdp = xsk_buff_alloc(xs->pool);
 	if (!xsk_xdp) {
 		xs->rx_dropped++;
 		return -ENOSPC;
···
 static void xsk_flush(struct xdp_sock *xs)
 {
 	xskq_prod_submit(xs->rx);
-	__xskq_cons_release(xs->umem->fq);
+	__xskq_cons_release(xs->pool->fq);
 	sock_def_readable(&xs->sk);
 }
···
 	}
 }
 
-void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries)
+void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
 {
-	xskq_prod_submit_n(umem->cq, nb_entries);
+	xskq_prod_submit_n(pool->cq, nb_entries);
 }
-EXPORT_SYMBOL(xsk_umem_complete_tx);
+EXPORT_SYMBOL(xsk_tx_completed);
 
-void xsk_umem_consume_tx_done(struct xdp_umem *umem)
+void xsk_tx_release(struct xsk_buff_pool *pool)
 {
 	struct xdp_sock *xs;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) {
+	list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
 		__xskq_cons_release(xs->tx);
 		xs->sk.sk_write_space(&xs->sk);
 	}
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(xsk_umem_consume_tx_done);
+EXPORT_SYMBOL(xsk_tx_release);
 
-bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
+bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
 {
 	struct xdp_sock *xs;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) {
-		if (!xskq_cons_peek_desc(xs->tx, desc, umem)) {
+	list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
+		if (!xskq_cons_peek_desc(xs->tx, desc, pool)) {
 			xs->tx->queue_empty_descs++;
 			continue;
 		}
···
 		 * if there is space in it. This avoids having to implement
 		 * any buffering in the Tx path.
 		 */
-		if (xskq_prod_reserve_addr(umem->cq, desc->addr))
+		if (xskq_prod_reserve_addr(pool->cq, desc->addr))
 			goto out;
 
 		xskq_cons_release(xs->tx);
···
 	rcu_read_unlock();
 	return false;
 }
-EXPORT_SYMBOL(xsk_umem_consume_tx);
+EXPORT_SYMBOL(xsk_tx_peek_desc);
 
 static int xsk_wakeup(struct xdp_sock *xs, u8 flags)
 {
···
 	unsigned long flags;
 
 	spin_lock_irqsave(&xs->tx_completion_lock, flags);
-	xskq_prod_submit_addr(xs->umem->cq, addr);
+	xskq_prod_submit_addr(xs->pool->cq, addr);
 	spin_unlock_irqrestore(&xs->tx_completion_lock, flags);
 
 	sock_wfree(skb);
···
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
-	while (xskq_cons_peek_desc(xs->tx, &desc, xs->umem)) {
+	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
···
 
 		skb_put(skb, len);
 		addr = desc.addr;
-		buffer = xsk_buff_raw_get_data(xs->umem, addr);
+		buffer = xsk_buff_raw_get_data(xs->pool, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
 		/* This is the backpressure mechanism for the Tx path.
 		 * Reserve space in the completion queue and only proceed
 		 * if there is space in it. This avoids having to implement
 		 * any buffering in the Tx path.
 		 */
-		if (unlikely(err) || xskq_prod_reserve(xs->umem->cq)) {
+		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
 			kfree_skb(skb);
 			goto out;
 		}
···
 	__poll_t mask = datagram_poll(file, sock, wait);
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
-	struct xdp_umem *umem;
+	struct xsk_buff_pool *pool;
 
 	if (unlikely(!xsk_is_bound(xs)))
 		return mask;
 
-	umem = xs->umem;
+	pool = xs->pool;
 
-	if (umem->need_wakeup) {
+	if (pool->cached_need_wakeup) {
 		if (xs->zc)
-			xsk_wakeup(xs, umem->need_wakeup);
+			xsk_wakeup(xs, pool->cached_need_wakeup);
 		else
 			/* Poll needs to drive Tx also in copy mode */
 			__xsk_sendmsg(sk);
···
 	WRITE_ONCE(xs->state, XSK_UNBOUND);
 
 	/* Wait for driver to stop using the xdp socket. */
-	xdp_del_sk_umem(xs->umem, xs);
+	xp_del_xsk(xs->pool, xs);
 	xs->dev = NULL;
 	synchronize_net();
 	dev_put(dev);
···
 
 	xskq_destroy(xs->rx);
 	xskq_destroy(xs->tx);
+	xskq_destroy(xs->fq_tmp);
+	xskq_destroy(xs->cq_tmp);
 
 	sock_orphan(sk);
 	sock->sk = NULL;
···
 	}
 
 	return sock;
+}
+
+static bool xsk_validate_queues(struct xdp_sock *xs)
+{
+	return xs->fq_tmp && xs->cq_tmp;
 }
 
 static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
···
 			sockfd_put(sock);
 			goto out_unlock;
 		}
-		if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
-			err = -EINVAL;
-			sockfd_put(sock);
-			goto out_unlock;
+
+		if (umem_xs->queue_id != qid || umem_xs->dev != dev) {
+			/* Share the umem with another socket on another qid
+			 * and/or device.
+			 */
+			xs->pool = xp_create_and_assign_umem(xs,
+							     umem_xs->umem);
+			if (!xs->pool) {
+				sockfd_put(sock);
+				goto out_unlock;
+			}
+
+			err = xp_assign_dev_shared(xs->pool, umem_xs->umem,
+						   dev, qid);
+			if (err) {
+				xp_destroy(xs->pool);
+				sockfd_put(sock);
+				goto out_unlock;
+			}
+		} else {
+			/* Share the buffer pool with the other socket. */
+			if (xs->fq_tmp || xs->cq_tmp) {
+				/* Do not allow setting your own fq or cq. */
+				err = -EINVAL;
+				sockfd_put(sock);
+				goto out_unlock;
+			}
+
+			xp_get_pool(umem_xs->pool);
+			xs->pool = umem_xs->pool;
 		}
 
 		xdp_get_umem(umem_xs->umem);
 		WRITE_ONCE(xs->umem, umem_xs->umem);
 		sockfd_put(sock);
-	} else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
+	} else if (!xs->umem || !xsk_validate_queues(xs)) {
 		err = -EINVAL;
 		goto out_unlock;
 	} else {
 		/* This xsk has its own umem. */
-		err = xdp_umem_assign_dev(xs->umem, dev, qid, flags);
-		if (err)
+		xs->pool = xp_create_and_assign_umem(xs, xs->umem);
+		if (!xs->pool) {
+			err = -ENOMEM;
 			goto out_unlock;
+		}
+
+		err = xp_assign_dev(xs->pool, dev, qid, flags);
+		if (err) {
+			xp_destroy(xs->pool);
+			xs->pool = NULL;
+			goto out_unlock;
+		}
 	}
 
 	xs->dev = dev;
 	xs->zc = xs->umem->zc;
 	xs->queue_id = qid;
-	xdp_add_sk_umem(xs->umem, xs);
+	xp_add_xsk(xs->pool, xs);
 
 out_unlock:
 	if (err) {
···
 		mutex_unlock(&xs->mutex);
 		return -EBUSY;
 	}
-	if (!xs->umem) {
-		mutex_unlock(&xs->mutex);
-		return -EINVAL;
-	}
 
-	q = (optname == XDP_UMEM_FILL_RING) ? &xs->umem->fq :
-		&xs->umem->cq;
+	q = (optname == XDP_UMEM_FILL_RING) ? &xs->fq_tmp :
+		&xs->cq_tmp;
 	err = xsk_init_queue(entries, q, true);
-	if (optname == XDP_UMEM_FILL_RING)
-		xp_set_fq(xs->umem->pool, *q);
 	mutex_unlock(&xs->mutex);
 	return err;
 }
···
 	if (extra_stats) {
 		stats.rx_ring_full = xs->rx_queue_full;
 		stats.rx_fill_ring_empty_descs =
-			xs->umem ? xskq_nb_queue_empty_descs(xs->umem->fq) : 0;
+			xs->pool ? xskq_nb_queue_empty_descs(xs->pool->fq) : 0;
 		stats.tx_ring_empty_descs = xskq_nb_queue_empty_descs(xs->tx);
 	} else {
 		stats.rx_dropped += xs->rx_queue_full;
···
 	unsigned long size = vma->vm_end - vma->vm_start;
 	struct xdp_sock *xs = xdp_sk(sock->sk);
 	struct xsk_queue *q = NULL;
-	struct xdp_umem *umem;
 	unsigned long pfn;
 	struct page *qpg;
···
 	} else if (offset == XDP_PGOFF_TX_RING) {
 		q = READ_ONCE(xs->tx);
 	} else {
-		umem = READ_ONCE(xs->umem);
-		if (!umem)
-			return -EINVAL;
-
 		/* Matches the smp_wmb() in XDP_UMEM_REG */
 		smp_rmb();
 		if (offset == XDP_UMEM_PGOFF_FILL_RING)
-			q = READ_ONCE(umem->fq);
+			q = READ_ONCE(xs->fq_tmp);
 		else if (offset == XDP_UMEM_PGOFF_COMPLETION_RING)
-			q = READ_ONCE(umem->cq);
+			q = READ_ONCE(xs->cq_tmp);
 	}
 
 	if (!q)
···
 
 		xsk_unbind_dev(xs);
 
-		/* Clear device references in umem. */
-		xdp_umem_clear_dev(xs->umem);
+		/* Clear device references. */
+		xp_clear_dev(xs->pool);
 	}
 	mutex_unlock(&xs->mutex);
···
 	if (!sock_flag(sk, SOCK_DEAD))
 		return;
 
-	xdp_put_umem(xs->umem);
+	xp_put_pool(xs->pool);
 
 	sk_refcnt_debug_dec(sk);
···
 static int xsk_create(struct net *net, struct socket *sock, int protocol,
 		      int kern)
 {
-	struct sock *sk;
 	struct xdp_sock *xs;
+	struct sock *sk;
 
 	if (!ns_capable(net->user_ns, CAP_NET_RAW))
 		return -EPERM;
net/xdp/xsk.h (+3 -7)
···
 #define XSK_NEXT_PG_CONTIG_SHIFT 0
 #define XSK_NEXT_PG_CONTIG_MASK BIT_ULL(XSK_NEXT_PG_CONTIG_SHIFT)
 
-/* Flags for the umem flags field.
- *
- * The NEED_WAKEUP flag is 1 due to the reuse of the flags field for public
- * flags. See inlude/uapi/include/linux/if_xdp.h.
- */
-#define XDP_UMEM_USES_NEED_WAKEUP BIT(1)
-
 struct xdp_ring_offset_v1 {
 	__u64 producer;
 	__u64 consumer;
···
 			     struct xdp_sock **map_entry);
 int xsk_map_inc(struct xsk_map *map);
 void xsk_map_put(struct xsk_map *map);
+void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id);
+int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
+			u16 queue_id);
 
 #endif /* XSK_H_ */
net/xdp/xsk_buff_pool.c (+324 -56)
··· 2 2 3 3 #include <net/xsk_buff_pool.h> 4 4 #include <net/xdp_sock.h> 5 + #include <net/xdp_sock_drv.h> 6 + #include <linux/dma-direct.h> 7 + #include <linux/dma-noncoherent.h> 8 + #include <linux/swiotlb.h> 5 9 6 10 #include "xsk_queue.h" 11 + #include "xdp_umem.h" 12 + #include "xsk.h" 7 13 8 - static void xp_addr_unmap(struct xsk_buff_pool *pool) 14 + void xp_add_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs) 9 15 { 10 - vunmap(pool->addrs); 16 + unsigned long flags; 17 + 18 + if (!xs->tx) 19 + return; 20 + 21 + spin_lock_irqsave(&pool->xsk_tx_list_lock, flags); 22 + list_add_rcu(&xs->tx_list, &pool->xsk_tx_list); 23 + spin_unlock_irqrestore(&pool->xsk_tx_list_lock, flags); 11 24 } 12 25 13 - static int xp_addr_map(struct xsk_buff_pool *pool, 14 - struct page **pages, u32 nr_pages) 26 + void xp_del_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs) 15 27 { 16 - pool->addrs = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL); 17 - if (!pool->addrs) 18 - return -ENOMEM; 19 - return 0; 28 + unsigned long flags; 29 + 30 + if (!xs->tx) 31 + return; 32 + 33 + spin_lock_irqsave(&pool->xsk_tx_list_lock, flags); 34 + list_del_rcu(&xs->tx_list); 35 + spin_unlock_irqrestore(&pool->xsk_tx_list_lock, flags); 20 36 } 21 37 22 38 void xp_destroy(struct xsk_buff_pool *pool) ··· 40 24 if (!pool) 41 25 return; 42 26 43 - xp_addr_unmap(pool); 44 27 kvfree(pool->heads); 45 28 kvfree(pool); 46 29 } 47 30 48 - struct xsk_buff_pool *xp_create(struct page **pages, u32 nr_pages, u32 chunks, 49 - u32 chunk_size, u32 headroom, u64 size, 50 - bool unaligned) 31 + struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs, 32 + struct xdp_umem *umem) 51 33 { 52 34 struct xsk_buff_pool *pool; 53 35 struct xdp_buff_xsk *xskb; 54 - int err; 55 36 u32 i; 56 37 57 - pool = kvzalloc(struct_size(pool, free_heads, chunks), GFP_KERNEL); 38 + pool = kvzalloc(struct_size(pool, free_heads, umem->chunks), 39 + GFP_KERNEL); 58 40 if (!pool) 59 41 goto out; 60 42 61 - pool->heads = 
kvcalloc(chunks, sizeof(*pool->heads), GFP_KERNEL); 43 + pool->heads = kvcalloc(umem->chunks, sizeof(*pool->heads), GFP_KERNEL); 62 44 if (!pool->heads) 63 45 goto out; 64 46 65 - pool->chunk_mask = ~((u64)chunk_size - 1); 66 - pool->addrs_cnt = size; 67 - pool->heads_cnt = chunks; 68 - pool->free_heads_cnt = chunks; 69 - pool->headroom = headroom; 70 - pool->chunk_size = chunk_size; 71 - pool->unaligned = unaligned; 72 - pool->frame_len = chunk_size - headroom - XDP_PACKET_HEADROOM; 47 + pool->chunk_mask = ~((u64)umem->chunk_size - 1); 48 + pool->addrs_cnt = umem->size; 49 + pool->heads_cnt = umem->chunks; 50 + pool->free_heads_cnt = umem->chunks; 51 + pool->headroom = umem->headroom; 52 + pool->chunk_size = umem->chunk_size; 53 + pool->unaligned = umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG; 54 + pool->frame_len = umem->chunk_size - umem->headroom - 55 + XDP_PACKET_HEADROOM; 56 + pool->umem = umem; 57 + pool->addrs = umem->addrs; 73 58 INIT_LIST_HEAD(&pool->free_list); 59 + INIT_LIST_HEAD(&pool->xsk_tx_list); 60 + spin_lock_init(&pool->xsk_tx_list_lock); 61 + refcount_set(&pool->users, 1); 62 + 63 + pool->fq = xs->fq_tmp; 64 + pool->cq = xs->cq_tmp; 65 + xs->fq_tmp = NULL; 66 + xs->cq_tmp = NULL; 74 67 75 68 for (i = 0; i < pool->free_heads_cnt; i++) { 76 69 xskb = &pool->heads[i]; 77 70 xskb->pool = pool; 78 - xskb->xdp.frame_sz = chunk_size - headroom; 71 + xskb->xdp.frame_sz = umem->chunk_size - umem->headroom; 79 72 pool->free_heads[i] = xskb; 80 73 } 81 74 82 - err = xp_addr_map(pool, pages, nr_pages); 83 - if (!err) 84 - return pool; 75 + return pool; 85 76 86 77 out: 87 78 xp_destroy(pool); 88 79 return NULL; 89 - } 90 - 91 - void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq) 92 - { 93 - pool->fq = fq; 94 80 } 95 81 96 82 void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq) ··· 104 86 } 105 87 EXPORT_SYMBOL(xp_set_rxq_info); 106 88 107 - void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs) 89 + static void 
xp_disable_drv_zc(struct xsk_buff_pool *pool) 90 + { 91 + struct netdev_bpf bpf; 92 + int err; 93 + 94 + ASSERT_RTNL(); 95 + 96 + if (pool->umem->zc) { 97 + bpf.command = XDP_SETUP_XSK_POOL; 98 + bpf.xsk.pool = NULL; 99 + bpf.xsk.queue_id = pool->queue_id; 100 + 101 + err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf); 102 + 103 + if (err) 104 + WARN(1, "Failed to disable zero-copy!\n"); 105 + } 106 + } 107 + 108 + static int __xp_assign_dev(struct xsk_buff_pool *pool, 109 + struct net_device *netdev, u16 queue_id, u16 flags) 110 + { 111 + bool force_zc, force_copy; 112 + struct netdev_bpf bpf; 113 + int err = 0; 114 + 115 + ASSERT_RTNL(); 116 + 117 + force_zc = flags & XDP_ZEROCOPY; 118 + force_copy = flags & XDP_COPY; 119 + 120 + if (force_zc && force_copy) 121 + return -EINVAL; 122 + 123 + if (xsk_get_pool_from_qid(netdev, queue_id)) 124 + return -EBUSY; 125 + 126 + pool->netdev = netdev; 127 + pool->queue_id = queue_id; 128 + err = xsk_reg_pool_at_qid(netdev, pool, queue_id); 129 + if (err) 130 + return err; 131 + 132 + if (flags & XDP_USE_NEED_WAKEUP) { 133 + pool->uses_need_wakeup = true; 134 + /* Tx needs to be explicitly woken up the first time. 135 + * Also for supporting drivers that do not implement this 136 + * feature. They will always have to call sendto(). 137 + */ 138 + pool->cached_need_wakeup = XDP_WAKEUP_TX; 139 + } 140 + 141 + dev_hold(netdev); 142 + 143 + if (force_copy) 144 + /* For copy-mode, we are done. 
*/ 145 + return 0; 146 + 147 + if (!netdev->netdev_ops->ndo_bpf || 148 + !netdev->netdev_ops->ndo_xsk_wakeup) { 149 + err = -EOPNOTSUPP; 150 + goto err_unreg_pool; 151 + } 152 + 153 + bpf.command = XDP_SETUP_XSK_POOL; 154 + bpf.xsk.pool = pool; 155 + bpf.xsk.queue_id = queue_id; 156 + 157 + err = netdev->netdev_ops->ndo_bpf(netdev, &bpf); 158 + if (err) 159 + goto err_unreg_pool; 160 + 161 + if (!pool->dma_pages) { 162 + WARN(1, "Driver did not DMA map zero-copy buffers"); 163 + goto err_unreg_xsk; 164 + } 165 + pool->umem->zc = true; 166 + return 0; 167 + 168 + err_unreg_xsk: 169 + xp_disable_drv_zc(pool); 170 + err_unreg_pool: 171 + if (!force_zc) 172 + err = 0; /* fallback to copy mode */ 173 + if (err) 174 + xsk_clear_pool_at_qid(netdev, queue_id); 175 + return err; 176 + } 177 + 178 + int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *dev, 179 + u16 queue_id, u16 flags) 180 + { 181 + return __xp_assign_dev(pool, dev, queue_id, flags); 182 + } 183 + 184 + int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_umem *umem, 185 + struct net_device *dev, u16 queue_id) 186 + { 187 + u16 flags; 188 + 189 + /* One fill and completion ring required for each queue id. */ 190 + if (!pool->fq || !pool->cq) 191 + return -EINVAL; 192 + 193 + flags = umem->zc ? 
XDP_ZEROCOPY : XDP_COPY; 194 + if (pool->uses_need_wakeup) 195 + flags |= XDP_USE_NEED_WAKEUP; 196 + 197 + return __xp_assign_dev(pool, dev, queue_id, flags); 198 + } 199 + 200 + void xp_clear_dev(struct xsk_buff_pool *pool) 201 + { 202 + if (!pool->netdev) 203 + return; 204 + 205 + xp_disable_drv_zc(pool); 206 + xsk_clear_pool_at_qid(pool->netdev, pool->queue_id); 207 + dev_put(pool->netdev); 208 + pool->netdev = NULL; 209 + } 210 + 211 + static void xp_release_deferred(struct work_struct *work) 212 + { 213 + struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool, 214 + work); 215 + 216 + rtnl_lock(); 217 + xp_clear_dev(pool); 218 + rtnl_unlock(); 219 + 220 + if (pool->fq) { 221 + xskq_destroy(pool->fq); 222 + pool->fq = NULL; 223 + } 224 + 225 + if (pool->cq) { 226 + xskq_destroy(pool->cq); 227 + pool->cq = NULL; 228 + } 229 + 230 + xdp_put_umem(pool->umem); 231 + xp_destroy(pool); 232 + } 233 + 234 + void xp_get_pool(struct xsk_buff_pool *pool) 235 + { 236 + refcount_inc(&pool->users); 237 + } 238 + 239 + void xp_put_pool(struct xsk_buff_pool *pool) 240 + { 241 + if (!pool) 242 + return; 243 + 244 + if (refcount_dec_and_test(&pool->users)) { 245 + INIT_WORK(&pool->work, xp_release_deferred); 246 + schedule_work(&pool->work); 247 + } 248 + } 249 + 250 + static struct xsk_dma_map *xp_find_dma_map(struct xsk_buff_pool *pool) 251 + { 252 + struct xsk_dma_map *dma_map; 253 + 254 + list_for_each_entry(dma_map, &pool->umem->xsk_dma_list, list) { 255 + if (dma_map->netdev == pool->netdev) 256 + return dma_map; 257 + } 258 + 259 + return NULL; 260 + } 261 + 262 + static struct xsk_dma_map *xp_create_dma_map(struct device *dev, struct net_device *netdev, 263 + u32 nr_pages, struct xdp_umem *umem) 264 + { 265 + struct xsk_dma_map *dma_map; 266 + 267 + dma_map = kzalloc(sizeof(*dma_map), GFP_KERNEL); 268 + if (!dma_map) 269 + return NULL; 270 + 271 + dma_map->dma_pages = kvcalloc(nr_pages, sizeof(*dma_map->dma_pages), GFP_KERNEL); 272 + if (!dma_map) { 273 + 
+		kfree(dma_map);
+		return NULL;
+	}
+
+	dma_map->netdev = netdev;
+	dma_map->dev = dev;
+	dma_map->dma_need_sync = false;
+	dma_map->dma_pages_cnt = nr_pages;
+	refcount_set(&dma_map->users, 0);
+	list_add(&dma_map->list, &umem->xsk_dma_list);
+	return dma_map;
+}
+
+static void xp_destroy_dma_map(struct xsk_dma_map *dma_map)
+{
+	list_del(&dma_map->list);
+	kvfree(dma_map->dma_pages);
+	kfree(dma_map);
+}
+
+static void __xp_dma_unmap(struct xsk_dma_map *dma_map, unsigned long attrs)
 {
 	dma_addr_t *dma;
 	u32 i;
 
-	if (pool->dma_pages_cnt == 0)
-		return;
-
-	for (i = 0; i < pool->dma_pages_cnt; i++) {
-		dma = &pool->dma_pages[i];
+	for (i = 0; i < dma_map->dma_pages_cnt; i++) {
+		dma = &dma_map->dma_pages[i];
 		if (*dma) {
-			dma_unmap_page_attrs(pool->dev, *dma, PAGE_SIZE,
+			dma_unmap_page_attrs(dma_map->dev, *dma, PAGE_SIZE,
 					     DMA_BIDIRECTIONAL, attrs);
 			*dma = 0;
 		}
 	}
 
+	xp_destroy_dma_map(dma_map);
+}
+
+void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
+{
+	struct xsk_dma_map *dma_map;
+
+	if (pool->dma_pages_cnt == 0)
+		return;
+
+	dma_map = xp_find_dma_map(pool);
+	if (!dma_map) {
+		WARN(1, "Could not find dma_map for device");
+		return;
+	}
+
+	if (!refcount_dec_and_test(&dma_map->users))
+		return;
+
+	__xp_dma_unmap(dma_map, attrs);
 	kvfree(pool->dma_pages);
 	pool->dma_pages_cnt = 0;
 	pool->dev = NULL;
 }
 EXPORT_SYMBOL(xp_dma_unmap);
 
-static void xp_check_dma_contiguity(struct xsk_buff_pool *pool)
+static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map)
 {
 	u32 i;
 
-	for (i = 0; i < pool->dma_pages_cnt - 1; i++) {
-		if (pool->dma_pages[i] + PAGE_SIZE == pool->dma_pages[i + 1])
-			pool->dma_pages[i] |= XSK_NEXT_PG_CONTIG_MASK;
+	for (i = 0; i < dma_map->dma_pages_cnt - 1; i++) {
+		if (dma_map->dma_pages[i] + PAGE_SIZE == dma_map->dma_pages[i + 1])
+			dma_map->dma_pages[i] |= XSK_NEXT_PG_CONTIG_MASK;
 		else
-			pool->dma_pages[i] &= ~XSK_NEXT_PG_CONTIG_MASK;
+			dma_map->dma_pages[i] &= ~XSK_NEXT_PG_CONTIG_MASK;
 	}
+}
+
+static int xp_init_dma_info(struct xsk_buff_pool *pool, struct xsk_dma_map *dma_map)
+{
+	pool->dma_pages = kvcalloc(dma_map->dma_pages_cnt, sizeof(*pool->dma_pages), GFP_KERNEL);
+	if (!pool->dma_pages)
+		return -ENOMEM;
+
+	pool->dev = dma_map->dev;
+	pool->dma_pages_cnt = dma_map->dma_pages_cnt;
+	pool->dma_need_sync = dma_map->dma_need_sync;
+	refcount_inc(&dma_map->users);
+	memcpy(pool->dma_pages, dma_map->dma_pages,
+	       pool->dma_pages_cnt * sizeof(*pool->dma_pages));
+
+	return 0;
 }
 
 int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
 	       unsigned long attrs, struct page **pages, u32 nr_pages)
 {
+	struct xsk_dma_map *dma_map;
 	dma_addr_t dma;
+	int err;
 	u32 i;
 
-	pool->dma_pages = kvcalloc(nr_pages, sizeof(*pool->dma_pages),
-				   GFP_KERNEL);
-	if (!pool->dma_pages)
+	dma_map = xp_find_dma_map(pool);
+	if (dma_map) {
+		err = xp_init_dma_info(pool, dma_map);
+		if (err)
+			return err;
+
+		return 0;
+	}
+
+	dma_map = xp_create_dma_map(dev, pool->netdev, nr_pages, pool->umem);
+	if (!dma_map)
 		return -ENOMEM;
 
-	pool->dev = dev;
-	pool->dma_pages_cnt = nr_pages;
-	pool->dma_need_sync = false;
-
-	for (i = 0; i < pool->dma_pages_cnt; i++) {
+	for (i = 0; i < dma_map->dma_pages_cnt; i++) {
 		dma = dma_map_page_attrs(dev, pages[i], 0, PAGE_SIZE,
 					 DMA_BIDIRECTIONAL, attrs);
 		if (dma_mapping_error(dev, dma)) {
-			xp_dma_unmap(pool, attrs);
+			__xp_dma_unmap(dma_map, attrs);
 			return -ENOMEM;
 		}
 		if (dma_need_sync(dev, dma))
-			pool->dma_need_sync = true;
-		pool->dma_pages[i] = dma;
+			dma_map->dma_need_sync = true;
+		dma_map->dma_pages[i] = dma;
 	}
 
 	if (pool->unaligned)
-		xp_check_dma_contiguity(pool);
+		xp_check_dma_contiguity(dma_map);
+
+	err = xp_init_dma_info(pool, dma_map);
+	if (err) {
+		__xp_dma_unmap(dma_map, attrs);
+		return err;
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL(xp_dma_map);
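The sharing scheme above (find an existing DMA mapping for the same device via xp_find_dma_map(), bump its `users` refcount on reuse, and tear it down only on the last xp_dma_unmap()) can be modeled outside the kernel. This is a hypothetical plain-C sketch of the lifecycle only; names such as `map_get()`/`map_put()` are invented and no actual DMA work is modeled:

```c
#include <stdlib.h>

/* Hypothetical model of the xsk_dma_map lifecycle: one mapping is
 * shared by every pool bound to the same device, and destroyed only
 * when the last user drops its reference. */
struct dma_map_model {
	int dev_id;
	int users;			/* models the refcount_t 'users' */
	struct dma_map_model *next;
};

static struct dma_map_model *maps;	/* models umem->xsk_dma_list */

/* find-or-create, as in xp_dma_map(): reuse raises the refcount */
static struct dma_map_model *map_get(int dev_id)
{
	struct dma_map_model *m;

	for (m = maps; m; m = m->next) {
		if (m->dev_id == dev_id) {
			m->users++;
			return m;
		}
	}
	m = calloc(1, sizeof(*m));
	m->dev_id = dev_id;
	m->users = 1;
	m->next = maps;
	maps = m;
	return m;
}

/* release, as in xp_dma_unmap(): only the last put destroys */
static int map_put(struct dma_map_model *m)
{
	struct dma_map_model **p;

	if (--m->users)
		return 0;	/* still shared, keep the mapping */

	/* unlink and free, as __xp_dma_unmap() + xp_destroy_dma_map() */
	for (p = &maps; *p != m; p = &(*p)->next)
		;
	*p = m->next;
	free(m);
	return 1;		/* mapping actually destroyed */
}
```

The point of the design is that a second socket binding to the same device and umem pays no mapping cost and cannot outlive or double-free the shared mapping.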
+9 -7
net/xdp/xsk_diag.c
···
 
 static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb)
 {
+	struct xsk_buff_pool *pool = xs->pool;
 	struct xdp_umem *umem = xs->umem;
 	struct xdp_diag_umem du = {};
 	int err;
···
 	du.num_pages = umem->npgs;
 	du.chunk_size = umem->chunk_size;
 	du.headroom = umem->headroom;
-	du.ifindex = umem->dev ? umem->dev->ifindex : 0;
-	du.queue_id = umem->queue_id;
+	du.ifindex = pool->netdev ? pool->netdev->ifindex : 0;
+	du.queue_id = pool->queue_id;
 	du.flags = 0;
 	if (umem->zc)
 		du.flags |= XDP_DU_F_ZEROCOPY;
···
 
 	err = nla_put(nlskb, XDP_DIAG_UMEM, sizeof(du), &du);
 
-	if (!err && umem->fq)
-		err = xsk_diag_put_ring(umem->fq, XDP_DIAG_UMEM_FILL_RING, nlskb);
-	if (!err && umem->cq) {
-		err = xsk_diag_put_ring(umem->cq, XDP_DIAG_UMEM_COMPLETION_RING,
+	if (!err && pool->fq)
+		err = xsk_diag_put_ring(pool->fq,
+					XDP_DIAG_UMEM_FILL_RING, nlskb);
+	if (!err && pool->cq) {
+		err = xsk_diag_put_ring(pool->cq, XDP_DIAG_UMEM_COMPLETION_RING,
 					nlskb);
 	}
 	return err;
···
 	du.n_rx_dropped = xs->rx_dropped;
 	du.n_rx_invalid = xskq_nb_invalid_descs(xs->rx);
 	du.n_rx_full = xs->rx_queue_full;
-	du.n_fill_ring_empty = xs->umem ? xskq_nb_queue_empty_descs(xs->umem->fq) : 0;
+	du.n_fill_ring_empty = xs->pool ? xskq_nb_queue_empty_descs(xs->pool->fq) : 0;
 	du.n_tx_invalid = xskq_nb_invalid_descs(xs->tx);
 	du.n_tx_ring_empty = xskq_nb_queue_empty_descs(xs->tx);
 	return nla_put(nlskb, XDP_DIAG_STATS, sizeof(du), &du);
+6 -6
net/xdp/xsk_queue.h
···
 
 static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q,
 					   struct xdp_desc *d,
-					   struct xdp_umem *umem)
+					   struct xsk_buff_pool *pool)
 {
-	if (!xp_validate_desc(umem->pool, d)) {
+	if (!xp_validate_desc(pool, d)) {
 		q->invalid_descs++;
 		return false;
 	}
···
 
 static inline bool xskq_cons_read_desc(struct xsk_queue *q,
 				       struct xdp_desc *desc,
-				       struct xdp_umem *umem)
+				       struct xsk_buff_pool *pool)
 {
 	while (q->cached_cons != q->cached_prod) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		u32 idx = q->cached_cons & q->ring_mask;
 
 		*desc = ring->desc[idx];
-		if (xskq_cons_is_valid_desc(q, desc, umem))
+		if (xskq_cons_is_valid_desc(q, desc, pool))
 			return true;
 
 		q->cached_cons++;
···
 
 static inline bool xskq_cons_peek_desc(struct xsk_queue *q,
 				       struct xdp_desc *desc,
-				       struct xdp_umem *umem)
+				       struct xsk_buff_pool *pool)
 {
 	if (q->cached_prod == q->cached_cons)
 		xskq_cons_get_entries(q);
-	return xskq_cons_read_desc(q, desc, umem);
+	return xskq_cons_read_desc(q, desc, pool);
 }
 
 static inline void xskq_cons_release(struct xsk_queue *q)
+8
net/xdp/xskmap.c
···
 	spin_unlock_bh(&map->lock);
 }
 
+static bool xsk_map_meta_equal(const struct bpf_map *meta0,
+			       const struct bpf_map *meta1)
+{
+	return meta0->max_entries == meta1->max_entries &&
+		bpf_map_meta_equal(meta0, meta1);
+}
+
 static int xsk_map_btf_id;
 const struct bpf_map_ops xsk_map_ops = {
+	.map_meta_equal = xsk_map_meta_equal,
 	.map_alloc = xsk_map_alloc,
 	.map_free = xsk_map_free,
 	.map_get_next_key = xsk_map_get_next_key,
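The intent of xsk_map_meta_equal() above is that XSKMAPs additionally pin `max_entries` on top of the generic metadata comparison. This can be illustrated with a stubbed-down userspace model; `struct bpf_map_stub` and both functions here are illustrative stand-ins, not kernel APIs:

```c
#include <stdbool.h>

/* Stub of struct bpf_map reduced to the fields this check looks at. */
struct bpf_map_stub {
	unsigned int map_type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Models the generic compatibility check: map type and element sizes
 * must match; max_entries is deliberately not compared. */
static bool meta_equal_generic(const struct bpf_map_stub *a,
			       const struct bpf_map_stub *b)
{
	return a->map_type == b->map_type &&
	       a->key_size == b->key_size &&
	       a->value_size == b->value_size;
}

/* Models xsk_map_meta_equal(): XSKMAPs also require identical
 * max_entries before falling back to the generic check. */
static bool xsk_meta_equal(const struct bpf_map_stub *a,
			   const struct bpf_map_stub *b)
{
	return a->max_entries == b->max_entries &&
	       meta_equal_generic(a, b);
}
```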
+12 -9
samples/bpf/Makefile
···
 tprogs-y += cpustat
 tprogs-y += xdp_adjust_tail
 tprogs-y += xdpsock
+tprogs-y += xsk_fwd
 tprogs-y += xdp_fwd
 tprogs-y += task_fd_query
 tprogs-y += xdp_sample_pkts
···
 tracex5-objs := tracex5_user.o $(TRACE_HELPERS)
 tracex6-objs := tracex6_user.o
 tracex7-objs := tracex7_user.o
-test_probe_write_user-objs := bpf_load.o test_probe_write_user_user.o
-trace_output-objs := bpf_load.o trace_output_user.o $(TRACE_HELPERS)
-lathist-objs := bpf_load.o lathist_user.o
-offwaketime-objs := bpf_load.o offwaketime_user.o $(TRACE_HELPERS)
-spintest-objs := bpf_load.o spintest_user.o $(TRACE_HELPERS)
-map_perf_test-objs := bpf_load.o map_perf_test_user.o
+test_probe_write_user-objs := test_probe_write_user_user.o
+trace_output-objs := trace_output_user.o $(TRACE_HELPERS)
+lathist-objs := lathist_user.o
+offwaketime-objs := offwaketime_user.o $(TRACE_HELPERS)
+spintest-objs := spintest_user.o $(TRACE_HELPERS)
+map_perf_test-objs := map_perf_test_user.o
 test_overhead-objs := bpf_load.o test_overhead_user.o
 test_cgrp2_array_pin-objs := test_cgrp2_array_pin.o
 test_cgrp2_attach-objs := test_cgrp2_attach.o
···
 # reuse xdp1 source intentionally
 xdp2-objs := xdp1_user.o
 xdp_router_ipv4-objs := xdp_router_ipv4_user.o
-test_current_task_under_cgroup-objs := bpf_load.o $(CGROUP_HELPERS) \
+test_current_task_under_cgroup-objs := $(CGROUP_HELPERS) \
 				       test_current_task_under_cgroup_user.o
 trace_event-objs := trace_event_user.o $(TRACE_HELPERS)
 sampleip-objs := sampleip_user.o $(TRACE_HELPERS)
···
 xdp_redirect_cpu-objs := bpf_load.o xdp_redirect_cpu_user.o
 xdp_monitor-objs := bpf_load.o xdp_monitor_user.o
 xdp_rxq_info-objs := xdp_rxq_info_user.o
-syscall_tp-objs := bpf_load.o syscall_tp_user.o
-cpustat-objs := bpf_load.o cpustat_user.o
+syscall_tp-objs := syscall_tp_user.o
+cpustat-objs := cpustat_user.o
 xdp_adjust_tail-objs := xdp_adjust_tail_user.o
 xdpsock-objs := xdpsock_user.o
+xsk_fwd-objs := xsk_fwd.o
 xdp_fwd-objs := xdp_fwd_user.o
 task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
 xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS)
···
 TPROGLDLIBS_map_perf_test += -lrt
 TPROGLDLIBS_test_overhead += -lrt
 TPROGLDLIBS_xdpsock += -pthread
+TPROGLDLIBS_xsk_fwd += -pthread
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 # make M=samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
+18 -18
samples/bpf/cpustat_kern.c
···
 #define MAP_OFF_PSTATE_IDX 3
 #define MAP_OFF_NUM 4
 
-struct bpf_map_def SEC("maps") my_map = {
-	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(u32),
-	.value_size = sizeof(u64),
-	.max_entries = MAX_CPU * MAP_OFF_NUM,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, u32);
+	__type(value, u64);
+	__uint(max_entries, MAX_CPU * MAP_OFF_NUM);
+} my_map SEC(".maps");
 
 /* cstate_duration records duration time for every idle state per CPU */
-struct bpf_map_def SEC("maps") cstate_duration = {
-	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(u32),
-	.value_size = sizeof(u64),
-	.max_entries = MAX_CPU * MAX_CSTATE_ENTRIES,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, u32);
+	__type(value, u64);
+	__uint(max_entries, MAX_CPU * MAX_CSTATE_ENTRIES);
+} cstate_duration SEC(".maps");
 
 /* pstate_duration records duration time for every operating point per CPU */
-struct bpf_map_def SEC("maps") pstate_duration = {
-	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(u32),
-	.value_size = sizeof(u64),
-	.max_entries = MAX_CPU * MAX_PSTATE_ENTRIES,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, u32);
+	__type(value, u64);
+	__uint(max_entries, MAX_CPU * MAX_PSTATE_ENTRIES);
+} pstate_duration SEC(".maps");
 
 /*
  * The trace events for cpu_idle and cpu_frequency are taken from:
+40 -7
samples/bpf/cpustat_user.c
···
 #include <string.h>
 #include <unistd.h>
 #include <fcntl.h>
-#include <linux/bpf.h>
 #include <locale.h>
 #include <sys/types.h>
 #include <sys/stat.h>
···
 #include <sys/wait.h>
 
 #include <bpf/bpf.h>
-#include "bpf_load.h"
+#include <bpf/libbpf.h>
+
+static int cstate_map_fd, pstate_map_fd;
 
 #define MAX_CPU 8
 #define MAX_PSTATE_ENTRIES 5
···
 {
 	cpu_stat_inject_cpu_idle_event();
 	cpu_stat_inject_cpu_frequency_event();
-	cpu_stat_update(map_fd[1], map_fd[2]);
+	cpu_stat_update(cstate_map_fd, pstate_map_fd);
 	cpu_stat_print();
 	exit(0);
 }
 
 int main(int argc, char **argv)
 {
+	struct bpf_link *link = NULL;
+	struct bpf_program *prog;
+	struct bpf_object *obj;
 	char filename[256];
 	int ret;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	obj = bpf_object__open_file(filename, NULL);
+	if (libbpf_get_error(obj)) {
+		fprintf(stderr, "ERROR: opening BPF object file failed\n");
+		return 0;
+	}
 
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
-		return 1;
+	prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
+	if (!prog) {
+		printf("finding a prog in obj file failed\n");
+		goto cleanup;
+	}
+
+	/* load BPF program */
+	if (bpf_object__load(obj)) {
+		fprintf(stderr, "ERROR: loading BPF object file failed\n");
+		goto cleanup;
+	}
+
+	cstate_map_fd = bpf_object__find_map_fd_by_name(obj, "cstate_duration");
+	pstate_map_fd = bpf_object__find_map_fd_by_name(obj, "pstate_duration");
+	if (cstate_map_fd < 0 || pstate_map_fd < 0) {
+		fprintf(stderr, "ERROR: finding a map in obj file failed\n");
+		goto cleanup;
+	}
+
+	link = bpf_program__attach(prog);
+	if (libbpf_get_error(link)) {
+		fprintf(stderr, "ERROR: bpf_program__attach failed\n");
+		link = NULL;
+		goto cleanup;
 	}
 
 	ret = cpu_stat_inject_cpu_idle_event();
···
 	signal(SIGTERM, int_exit);
 
 	while (1) {
-		cpu_stat_update(map_fd[1], map_fd[2]);
+		cpu_stat_update(cstate_map_fd, pstate_map_fd);
 		cpu_stat_print();
 		sleep(5);
 	}
 
+cleanup:
+	bpf_link__destroy(link);
+	bpf_object__close(obj);
 	return 0;
 }
+12 -12
samples/bpf/lathist_kern.c
···
  * trace_preempt_[on|off] tracepoints hooks is not supported.
  */
 
-struct bpf_map_def SEC("maps") my_map = {
-	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(u64),
-	.max_entries = MAX_CPU,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, u64);
+	__uint(max_entries, MAX_CPU);
+} my_map SEC(".maps");
 
 SEC("kprobe/trace_preempt_off")
 int bpf_prog1(struct pt_regs *ctx)
···
 	return log2(v);
 }
 
-struct bpf_map_def SEC("maps") my_lat = {
-	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(long),
-	.max_entries = MAX_CPU * MAX_ENTRIES,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, long);
+	__uint(max_entries, MAX_CPU * MAX_ENTRIES);
+} my_lat SEC(".maps");
 
 SEC("kprobe/trace_preempt_on")
 int bpf_prog2(struct pt_regs *ctx)
+36 -6
samples/bpf/lathist_user.c
···
 #include <unistd.h>
 #include <stdlib.h>
 #include <signal.h>
-#include <linux/bpf.h>
+#include <bpf/libbpf.h>
 #include <bpf/bpf.h>
-#include "bpf_load.h"
 
 #define MAX_ENTRIES 20
 #define MAX_CPU 4
···
 
 int main(int argc, char **argv)
 {
+	struct bpf_link *links[2];
+	struct bpf_program *prog;
+	struct bpf_object *obj;
 	char filename[256];
+	int map_fd, i = 0;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	obj = bpf_object__open_file(filename, NULL);
+	if (libbpf_get_error(obj)) {
+		fprintf(stderr, "ERROR: opening BPF object file failed\n");
+		return 0;
+	}
 
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
-		return 1;
+	/* load BPF program */
+	if (bpf_object__load(obj)) {
+		fprintf(stderr, "ERROR: loading BPF object file failed\n");
+		goto cleanup;
+	}
+
+	map_fd = bpf_object__find_map_fd_by_name(obj, "my_lat");
+	if (map_fd < 0) {
+		fprintf(stderr, "ERROR: finding a map in obj file failed\n");
+		goto cleanup;
+	}
+
+	bpf_object__for_each_program(prog, obj) {
+		links[i] = bpf_program__attach(prog);
+		if (libbpf_get_error(links[i])) {
+			fprintf(stderr, "ERROR: bpf_program__attach failed\n");
+			links[i] = NULL;
+			goto cleanup;
+		}
+		i++;
 	}
 
 	while (1) {
-		get_data(map_fd[1]);
+		get_data(map_fd);
 		print_hist();
 		sleep(5);
 	}
 
+cleanup:
+	for (i--; i >= 0; i--)
+		bpf_link__destroy(links[i]);
+
+	bpf_object__close(obj);
 	return 0;
 }
+24 -24
samples/bpf/offwaketime_kern.c
···
 	u32 tret;
 };
 
-struct bpf_map_def SEC("maps") counts = {
-	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(struct key_t),
-	.value_size = sizeof(u64),
-	.max_entries = 10000,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, struct key_t);
+	__type(value, u64);
+	__uint(max_entries, 10000);
+} counts SEC(".maps");
 
-struct bpf_map_def SEC("maps") start = {
-	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(u32),
-	.value_size = sizeof(u64),
-	.max_entries = 10000,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, u32);
+	__type(value, u64);
+	__uint(max_entries, 10000);
+} start SEC(".maps");
 
 struct wokeby_t {
 	char name[TASK_COMM_LEN];
 	u32 ret;
 };
 
-struct bpf_map_def SEC("maps") wokeby = {
-	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(u32),
-	.value_size = sizeof(struct wokeby_t),
-	.max_entries = 10000,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, u32);
+	__type(value, struct wokeby_t);
+	__uint(max_entries, 10000);
+} wokeby SEC(".maps");
 
-struct bpf_map_def SEC("maps") stackmap = {
-	.type = BPF_MAP_TYPE_STACK_TRACE,
-	.key_size = sizeof(u32),
-	.value_size = PERF_MAX_STACK_DEPTH * sizeof(u64),
-	.max_entries = 10000,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
+	__uint(key_size, sizeof(u32));
+	__uint(value_size, PERF_MAX_STACK_DEPTH * sizeof(u64));
+	__uint(max_entries, 10000);
+} stackmap SEC(".maps");
 
 #define STACKID_FLAGS (0 | BPF_F_FAST_STACK_CMP)
 
+51 -15
samples/bpf/offwaketime_user.c
···
 #include <unistd.h>
 #include <stdlib.h>
 #include <signal.h>
-#include <linux/bpf.h>
-#include <string.h>
 #include <linux/perf_event.h>
 #include <errno.h>
-#include <assert.h>
 #include <stdbool.h>
 #include <sys/resource.h>
 #include <bpf/libbpf.h>
-#include "bpf_load.h"
+#include <bpf/bpf.h>
 #include "trace_helpers.h"
 
 #define PRINT_RAW_ADDR 0
+
+/* counts, stackmap */
+static int map_fd[2];
 
 static void print_ksym(__u64 addr)
 {
···
 	int i;
 
 	printf("%s;", key->target);
-	if (bpf_map_lookup_elem(map_fd[3], &key->tret, ip) != 0) {
+	if (bpf_map_lookup_elem(map_fd[1], &key->tret, ip) != 0) {
 		printf("---;");
 	} else {
 		for (i = PERF_MAX_STACK_DEPTH - 1; i >= 0; i--)
 			print_ksym(ip[i]);
 	}
 	printf("-;");
-	if (bpf_map_lookup_elem(map_fd[3], &key->wret, ip) != 0) {
+	if (bpf_map_lookup_elem(map_fd[1], &key->wret, ip) != 0) {
 		printf("---;");
 	} else {
 		for (i = 0; i < PERF_MAX_STACK_DEPTH; i++)
···
 int main(int argc, char **argv)
 {
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	struct bpf_object *obj = NULL;
+	struct bpf_link *links[2];
+	struct bpf_program *prog;
+	int delay = 1, i = 0;
 	char filename[256];
-	int delay = 1;
 
-	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
-	setrlimit(RLIMIT_MEMLOCK, &r);
-
-	signal(SIGINT, int_exit);
-	signal(SIGTERM, int_exit);
+	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+		perror("setrlimit(RLIMIT_MEMLOCK)");
+		return 1;
+	}
 
 	if (load_kallsyms()) {
 		printf("failed to process /proc/kallsyms\n");
 		return 2;
 	}
 
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
-		return 1;
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	obj = bpf_object__open_file(filename, NULL);
+	if (libbpf_get_error(obj)) {
+		fprintf(stderr, "ERROR: opening BPF object file failed\n");
+		obj = NULL;
+		goto cleanup;
+	}
+
+	/* load BPF program */
+	if (bpf_object__load(obj)) {
+		fprintf(stderr, "ERROR: loading BPF object file failed\n");
+		goto cleanup;
+	}
+
+	map_fd[0] = bpf_object__find_map_fd_by_name(obj, "counts");
+	map_fd[1] = bpf_object__find_map_fd_by_name(obj, "stackmap");
+	if (map_fd[0] < 0 || map_fd[1] < 0) {
+		fprintf(stderr, "ERROR: finding a map in obj file failed\n");
+		goto cleanup;
+	}
+
+	signal(SIGINT, int_exit);
+	signal(SIGTERM, int_exit);
+
+	bpf_object__for_each_program(prog, obj) {
+		links[i] = bpf_program__attach(prog);
+		if (libbpf_get_error(links[i])) {
+			fprintf(stderr, "ERROR: bpf_program__attach failed\n");
+			links[i] = NULL;
+			goto cleanup;
+		}
+		i++;
 	}
 
 	if (argc > 1)
···
 	sleep(delay);
 	print_stacks(map_fd[0]);
 
+cleanup:
+	for (i--; i >= 0; i--)
+		bpf_link__destroy(links[i]);
+
+	bpf_object__close(obj);
 	return 0;
 }
+18 -18
samples/bpf/spintest_kern.c
···
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
 
-struct bpf_map_def SEC("maps") my_map = {
-	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(long),
-	.value_size = sizeof(long),
-	.max_entries = 1024,
-};
-struct bpf_map_def SEC("maps") my_map2 = {
-	.type = BPF_MAP_TYPE_PERCPU_HASH,
-	.key_size = sizeof(long),
-	.value_size = sizeof(long),
-	.max_entries = 1024,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, long);
+	__type(value, long);
+	__uint(max_entries, 1024);
+} my_map SEC(".maps");
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
+	__uint(key_size, sizeof(long));
+	__uint(value_size, sizeof(long));
+	__uint(max_entries, 1024);
+} my_map2 SEC(".maps");
 
-struct bpf_map_def SEC("maps") stackmap = {
-	.type = BPF_MAP_TYPE_STACK_TRACE,
-	.key_size = sizeof(u32),
-	.value_size = PERF_MAX_STACK_DEPTH * sizeof(u64),
-	.max_entries = 10000,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
+	__uint(key_size, sizeof(u32));
+	__uint(value_size, PERF_MAX_STACK_DEPTH * sizeof(u64));
+	__uint(max_entries, 10000);
+} stackmap SEC(".maps");
 
 #define PROG(foo) \
 int foo(struct pt_regs *ctx) \
+55 -13
samples/bpf/spintest_user.c
···
 // SPDX-License-Identifier: GPL-2.0
 #include <stdio.h>
 #include <unistd.h>
-#include <linux/bpf.h>
 #include <string.h>
 #include <assert.h>
 #include <sys/resource.h>
 #include <bpf/libbpf.h>
-#include "bpf_load.h"
+#include <bpf/bpf.h>
 #include "trace_helpers.h"
 
 int main(int ac, char **argv)
 {
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	char filename[256], symbol[256];
+	struct bpf_object *obj = NULL;
+	struct bpf_link *links[20];
 	long key, next_key, value;
-	char filename[256];
+	struct bpf_program *prog;
+	int map_fd, i, j = 0;
+	const char *title;
 	struct ksym *sym;
-	int i;
 
-	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
-	setrlimit(RLIMIT_MEMLOCK, &r);
+	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+		perror("setrlimit(RLIMIT_MEMLOCK)");
+		return 1;
+	}
 
 	if (load_kallsyms()) {
 		printf("failed to process /proc/kallsyms\n");
 		return 2;
 	}
 
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
-		return 1;
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	obj = bpf_object__open_file(filename, NULL);
+	if (libbpf_get_error(obj)) {
+		fprintf(stderr, "ERROR: opening BPF object file failed\n");
+		obj = NULL;
+		goto cleanup;
+	}
+
+	/* load BPF program */
+	if (bpf_object__load(obj)) {
+		fprintf(stderr, "ERROR: loading BPF object file failed\n");
+		goto cleanup;
+	}
+
+	map_fd = bpf_object__find_map_fd_by_name(obj, "my_map");
+	if (map_fd < 0) {
+		fprintf(stderr, "ERROR: finding a map in obj file failed\n");
+		goto cleanup;
+	}
+
+	bpf_object__for_each_program(prog, obj) {
+		title = bpf_program__title(prog, false);
+		if (sscanf(title, "kprobe/%s", symbol) != 1)
+			continue;
+
+		/* Attach prog only when symbol exists */
+		if (ksym_get_addr(symbol)) {
+			links[j] = bpf_program__attach(prog);
+			if (libbpf_get_error(links[j])) {
+				fprintf(stderr, "bpf_program__attach failed\n");
+				links[j] = NULL;
+				goto cleanup;
+			}
+			j++;
+		}
 	}
 
 	for (i = 0; i < 5; i++) {
 		key = 0;
 		printf("kprobing funcs:");
-		while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0) {
-			bpf_map_lookup_elem(map_fd[0], &next_key, &value);
+		while (bpf_map_get_next_key(map_fd, &key, &next_key) == 0) {
+			bpf_map_lookup_elem(map_fd, &next_key, &value);
 			assert(next_key == value);
 			sym = ksym_search(value);
 			key = next_key;
···
 		if (key)
 			printf("\n");
 		key = 0;
-		while (bpf_map_get_next_key(map_fd[0], &key, &next_key) == 0)
-			bpf_map_delete_elem(map_fd[0], &next_key);
+		while (bpf_map_get_next_key(map_fd, &key, &next_key) == 0)
+			bpf_map_delete_elem(map_fd, &next_key);
 		sleep(1);
 	}
 
+cleanup:
+	for (j--; j >= 0; j--)
+		bpf_link__destroy(links[j]);
+
+	bpf_object__close(obj);
 	return 0;
 }
+12 -12
samples/bpf/syscall_tp_kern.c
···
 	long ret;
 };
 
-struct bpf_map_def SEC("maps") enter_open_map = {
-	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(u32),
-	.value_size = sizeof(u32),
-	.max_entries = 1,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, u32);
+	__type(value, u32);
+	__uint(max_entries, 1);
+} enter_open_map SEC(".maps");
 
-struct bpf_map_def SEC("maps") exit_open_map = {
-	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(u32),
-	.value_size = sizeof(u32),
-	.max_entries = 1,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, u32);
+	__type(value, u32);
+	__uint(max_entries, 1);
+} exit_open_map SEC(".maps");
 
 static __always_inline void count(void *map)
 {
+42 -12
samples/bpf/syscall_tp_user.c
···
 #include <unistd.h>
 #include <fcntl.h>
 #include <stdlib.h>
-#include <signal.h>
-#include <linux/bpf.h>
 #include <string.h>
 #include <linux/perf_event.h>
 #include <errno.h>
-#include <assert.h>
-#include <stdbool.h>
 #include <sys/resource.h>
+#include <bpf/libbpf.h>
 #include <bpf/bpf.h>
-#include "bpf_load.h"
 
 /* This program verifies bpf attachment to tracepoint sys_enter_* and sys_exit_*.
  * This requires kernel CONFIG_FTRACE_SYSCALLS to be set.
···
 
 static int test(char *filename, int num_progs)
 {
-	int i, fd, map0_fds[num_progs], map1_fds[num_progs];
+	int map0_fds[num_progs], map1_fds[num_progs], fd, i, j = 0;
+	struct bpf_link *links[num_progs * 4];
+	struct bpf_object *objs[num_progs];
+	struct bpf_program *prog;
 
 	for (i = 0; i < num_progs; i++) {
-		if (load_bpf_file(filename)) {
-			fprintf(stderr, "%s", bpf_log_buf);
-			return 1;
+		objs[i] = bpf_object__open_file(filename, NULL);
+		if (libbpf_get_error(objs[i])) {
+			fprintf(stderr, "opening BPF object file failed\n");
+			objs[i] = NULL;
+			goto cleanup;
 		}
-		printf("prog #%d: map ids %d %d\n", i, map_fd[0], map_fd[1]);
-		map0_fds[i] = map_fd[0];
-		map1_fds[i] = map_fd[1];
+
+		/* load BPF program */
+		if (bpf_object__load(objs[i])) {
+			fprintf(stderr, "loading BPF object file failed\n");
+			goto cleanup;
+		}
+
+		map0_fds[i] = bpf_object__find_map_fd_by_name(objs[i],
+							      "enter_open_map");
+		map1_fds[i] = bpf_object__find_map_fd_by_name(objs[i],
+							      "exit_open_map");
+		if (map0_fds[i] < 0 || map1_fds[i] < 0) {
+			fprintf(stderr, "finding a map in obj file failed\n");
+			goto cleanup;
+		}
+
+		bpf_object__for_each_program(prog, objs[i]) {
+			links[j] = bpf_program__attach(prog);
+			if (libbpf_get_error(links[j])) {
+				fprintf(stderr, "bpf_program__attach failed\n");
+				links[j] = NULL;
+				goto cleanup;
+			}
+			j++;
+		}
+		printf("prog #%d: map ids %d %d\n", i, map0_fds[i], map1_fds[i]);
 	}
 
 	/* current load_bpf_file has perf_event_open default pid = -1
···
 		verify_map(map1_fds[i]);
 	}
 
+cleanup:
+	for (j--; j >= 0; j--)
+		bpf_link__destroy(links[j]);
+
+	for (i--; i >= 0; i--)
+		bpf_object__close(objs[i]);
 	return 0;
 }
 
+1 -1
samples/bpf/task_fd_query_kern.c
···
 	return 0;
 }
 
-SEC("kretprobe/blk_account_io_completion")
+SEC("kretprobe/blk_account_io_done")
 int bpf_prog2(struct pt_regs *ctx)
 {
 	return 0;
+1 -1
samples/bpf/task_fd_query_user.c
···
 	/* test two functions in the corresponding *_kern.c file */
 	CHECK_AND_RET(test_debug_fs_kprobe(0, "blk_mq_start_request",
 					   BPF_FD_TYPE_KPROBE));
-	CHECK_AND_RET(test_debug_fs_kprobe(1, "blk_account_io_completion",
+	CHECK_AND_RET(test_debug_fs_kprobe(1, "blk_account_io_done",
 					   BPF_FD_TYPE_KRETPROBE));
 
 	/* test nondebug fs kprobe */
+14 -13
samples/bpf/test_current_task_under_cgroup_kern.c
···
 #include <linux/version.h>
 #include <bpf/bpf_helpers.h>
 #include <uapi/linux/utsname.h>
+#include "trace_common.h"
 
-struct bpf_map_def SEC("maps") cgroup_map = {
-	.type = BPF_MAP_TYPE_CGROUP_ARRAY,
-	.key_size = sizeof(u32),
-	.value_size = sizeof(u32),
-	.max_entries = 1,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
+	__uint(key_size, sizeof(u32));
+	__uint(value_size, sizeof(u32));
+	__uint(max_entries, 1);
+} cgroup_map SEC(".maps");
 
-struct bpf_map_def SEC("maps") perf_map = {
-	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(u32),
-	.value_size = sizeof(u64),
-	.max_entries = 1,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, u32);
+	__type(value, u64);
+	__uint(max_entries, 1);
+} perf_map SEC(".maps");
 
 /* Writes the last PID that called sync to a map at index 0 */
-SEC("kprobe/sys_sync")
+SEC("kprobe/" SYSCALL(sys_sync))
 int bpf_prog1(struct pt_regs *ctx)
 {
 	u64 pid = bpf_get_current_pid_tgid();
+42 -10
samples/bpf/test_current_task_under_cgroup_user.c
···
 
 #define _GNU_SOURCE
 #include <stdio.h>
-#include <linux/bpf.h>
 #include <unistd.h>
 #include <bpf/bpf.h>
-#include "bpf_load.h"
+#include <bpf/libbpf.h>
 #include "cgroup_helpers.h"
 
 #define CGROUP_PATH "/my-cgroup"
···
 int main(int argc, char **argv)
 {
 	pid_t remote_pid, local_pid = getpid();
-	int cg2, idx = 0, rc = 0;
+	struct bpf_link *link = NULL;
+	struct bpf_program *prog;
+	int cg2, idx = 0, rc = 1;
+	struct bpf_object *obj;
 	char filename[256];
+	int map_fd[2];
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
-		return 1;
+	obj = bpf_object__open_file(filename, NULL);
+	if (libbpf_get_error(obj)) {
+		fprintf(stderr, "ERROR: opening BPF object file failed\n");
+		return 0;
+	}
+
+	prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
+	if (!prog) {
+		printf("finding a prog in obj file failed\n");
+		goto cleanup;
+	}
+
+	/* load BPF program */
+	if (bpf_object__load(obj)) {
+		fprintf(stderr, "ERROR: loading BPF object file failed\n");
+		goto cleanup;
+	}
+
+	map_fd[0] = bpf_object__find_map_fd_by_name(obj, "cgroup_map");
+	map_fd[1] = bpf_object__find_map_fd_by_name(obj, "perf_map");
+	if (map_fd[0] < 0 || map_fd[1] < 0) {
+		fprintf(stderr, "ERROR: finding a map in obj file failed\n");
+		goto cleanup;
+	}
+
+	link = bpf_program__attach(prog);
+	if (libbpf_get_error(link)) {
+		fprintf(stderr, "ERROR: bpf_program__attach failed\n");
+		link = NULL;
+		goto cleanup;
 	}
 
 	if (setup_cgroup_environment())
···
 		goto err;
 	}
 
-	goto out;
-err:
-	rc = 1;
+	rc = 0;
 
-out:
+err:
 	close(cg2);
 	cleanup_cgroup_environment();
+
+cleanup:
+	bpf_link__destroy(link);
+	bpf_object__close(obj);
 	return rc;
 }
+6 -6
samples/bpf/test_probe_write_user_kern.c
···
 #include <bpf/bpf_core_read.h>
 #include "trace_common.h"
 
-struct bpf_map_def SEC("maps") dnat_map = {
-	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(struct sockaddr_in),
-	.value_size = sizeof(struct sockaddr_in),
-	.max_entries = 256,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, struct sockaddr_in);
+	__type(value, struct sockaddr_in);
+	__uint(max_entries, 256);
+} dnat_map SEC(".maps");
 
 /* kprobe is NOT a stable ABI
  * kernel functions can be removed, renamed or completely change semantics.
+39 -10
samples/bpf/test_probe_write_user_user.c
···
 // SPDX-License-Identifier: GPL-2.0
 #include <stdio.h>
 #include <assert.h>
-#include <linux/bpf.h>
 #include <unistd.h>
 #include <bpf/bpf.h>
-#include "bpf_load.h"
+#include <bpf/libbpf.h>
 #include <sys/socket.h>
-#include <string.h>
 #include <netinet/in.h>
 #include <arpa/inet.h>
 
 int main(int ac, char **argv)
 {
-	int serverfd, serverconnfd, clientfd;
-	socklen_t sockaddr_len;
-	struct sockaddr serv_addr, mapped_addr, tmp_addr;
 	struct sockaddr_in *serv_addr_in, *mapped_addr_in, *tmp_addr_in;
+	struct sockaddr serv_addr, mapped_addr, tmp_addr;
+	int serverfd, serverconnfd, clientfd, map_fd;
+	struct bpf_link *link = NULL;
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	socklen_t sockaddr_len;
 	char filename[256];
 	char *ip;
 
···
 	tmp_addr_in = (struct sockaddr_in *)&tmp_addr;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	obj = bpf_object__open_file(filename, NULL);
+	if (libbpf_get_error(obj)) {
+		fprintf(stderr, "ERROR: opening BPF object file failed\n");
+		return 0;
+	}
 
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
-		return 1;
+	prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
+	if (libbpf_get_error(prog)) {
+		fprintf(stderr, "ERROR: finding a prog in obj file failed\n");
+		goto cleanup;
+	}
+
+	/* load BPF program */
+	if (bpf_object__load(obj)) {
+		fprintf(stderr, "ERROR: loading BPF object file failed\n");
+		goto cleanup;
+	}
+
+	map_fd = bpf_object__find_map_fd_by_name(obj, "dnat_map");
+	if (map_fd < 0) {
+		fprintf(stderr, "ERROR: finding a map in obj file failed\n");
+		goto cleanup;
+	}
+
+	link = bpf_program__attach(prog);
+	if (libbpf_get_error(link)) {
+		fprintf(stderr, "ERROR: bpf_program__attach failed\n");
+		link = NULL;
+		goto cleanup;
 	}
 
 	assert((serverfd = socket(AF_INET, SOCK_STREAM, 0)) > 0);
···
 	mapped_addr_in->sin_port = htons(5555);
 	mapped_addr_in->sin_addr.s_addr = inet_addr("255.255.255.255");
 
-	assert(!bpf_map_update_elem(map_fd[0], &mapped_addr, &serv_addr, BPF_ANY));
+	assert(!bpf_map_update_elem(map_fd, &mapped_addr, &serv_addr, BPF_ANY));
 
 	assert(listen(serverfd, 5) == 0);
 
···
 	/* Is the server's getsockname = the socket getpeername */
 	assert(memcmp(&serv_addr, &tmp_addr, sizeof(struct sockaddr_in)) == 0);
 
+cleanup:
+	bpf_link__destroy(link);
+	bpf_object__close(obj);
 	return 0;
 }
+8 -7
samples/bpf/trace_output_kern.c
···
 #include <linux/version.h>
 #include <uapi/linux/bpf.h>
 #include <bpf/bpf_helpers.h>
+#include "trace_common.h"
 
-struct bpf_map_def SEC("maps") my_map = {
-	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(u32),
-	.max_entries = 2,
-};
+struct {
+	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(u32));
+	__uint(max_entries, 2);
+} my_map SEC(".maps");
 
-SEC("kprobe/sys_write")
+SEC("kprobe/" SYSCALL(sys_write))
 int bpf_prog1(struct pt_regs *ctx)
 {
 	struct S {
+37 -18
samples/bpf/trace_output_user.c
···
 // SPDX-License-Identifier: GPL-2.0-only
 #include <stdio.h>
-#include <unistd.h>
-#include <stdlib.h>
-#include <stdbool.h>
-#include <string.h>
 #include <fcntl.h>
 #include <poll.h>
-#include <linux/perf_event.h>
-#include <linux/bpf.h>
-#include <errno.h>
-#include <assert.h>
-#include <sys/syscall.h>
-#include <sys/ioctl.h>
-#include <sys/mman.h>
 #include <time.h>
 #include <signal.h>
 #include <bpf/libbpf.h>
-#include "bpf_load.h"
-#include "perf-sys.h"
 
 static __u64 time_get_ns(void)
 {
···
 int main(int argc, char **argv)
 {
 	struct perf_buffer_opts pb_opts = {};
+	struct bpf_link *link = NULL;
+	struct bpf_program *prog;
 	struct perf_buffer *pb;
+	struct bpf_object *obj;
+	int map_fd, ret = 0;
 	char filename[256];
 	FILE *f;
-	int ret;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+	obj = bpf_object__open_file(filename, NULL);
+	if (libbpf_get_error(obj)) {
+		fprintf(stderr, "ERROR: opening BPF object file failed\n");
+		return 0;
+	}
 
-	if (load_bpf_file(filename)) {
-		printf("%s", bpf_log_buf);
-		return 1;
+	/* load BPF program */
+	if (bpf_object__load(obj)) {
+		fprintf(stderr, "ERROR: loading BPF object file failed\n");
+		goto cleanup;
+	}
+
+	map_fd = bpf_object__find_map_fd_by_name(obj, "my_map");
+	if (map_fd < 0) {
+		fprintf(stderr, "ERROR: finding a map in obj file failed\n");
+		goto cleanup;
+	}
+
+	prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
+	if (libbpf_get_error(prog)) {
+		fprintf(stderr, "ERROR: finding a prog in obj file failed\n");
+		goto cleanup;
+	}
+
+	link = bpf_program__attach(prog);
+	if (libbpf_get_error(link)) {
+		fprintf(stderr, "ERROR: bpf_program__attach failed\n");
+		link = NULL;
+		goto cleanup;
 	}
 
 	pb_opts.sample_cb = print_bpf_output;
-	pb = perf_buffer__new(map_fd[0], 8, &pb_opts);
+	pb = perf_buffer__new(map_fd, 8, &pb_opts);
 	ret = libbpf_get_error(pb);
 	if (ret) {
 		printf("failed to setup perf_buffer: %d\n", ret);
···
 	while ((ret = perf_buffer__poll(pb, 1000)) >= 0 && cnt < MAX_CNT) {
 	}
 	kill(0, SIGINT);
+
+cleanup:
+	bpf_link__destroy(link);
+	bpf_object__close(obj);
 	return ret;
 }
+1 -1
samples/bpf/tracex3_kern.c
···
 	__uint(max_entries, SLOTS);
 } lat_map SEC(".maps");
 
-SEC("kprobe/blk_account_io_completion")
+SEC("kprobe/blk_account_io_done")
 int bpf_prog2(struct pt_regs *ctx)
 {
 	long rq = PT_REGS_PARM1(ctx);
+19 -13
samples/bpf/xdpsock_user.c
···
 {
 	struct xsk_umem_info *umem;
 	struct xsk_umem_config cfg = {
-		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
+		/* We recommend that you set the fill ring size >= HW RX ring size +
+		 * AF_XDP RX ring size. Make sure you fill up the fill ring
+		 * with buffers at regular intervals, and you will with this setting
+		 * avoid allocation failures in the driver. These are usually quite
+		 * expensive since drivers have not been written to assume that
+		 * allocation failures are common. For regular sockets, kernel
+		 * allocated memory is used that only runs out in OOM situations
+		 * that should be rare.
+		 */
+		.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS * 2,
 		.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
 		.frame_size = opt_xsk_frame_size,
 		.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
···
 	u32 idx;
 
 	ret = xsk_ring_prod__reserve(&umem->fq,
-				     XSK_RING_PROD__DEFAULT_NUM_DESCS, &idx);
-	if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
+				     XSK_RING_PROD__DEFAULT_NUM_DESCS * 2, &idx);
+	if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS * 2)
 		exit_with_error(-ret);
-	for (i = 0; i < XSK_RING_PROD__DEFAULT_NUM_DESCS; i++)
+	for (i = 0; i < XSK_RING_PROD__DEFAULT_NUM_DESCS * 2; i++)
 		*xsk_ring_prod__fill_addr(&umem->fq, idx++) =
 			i * opt_xsk_frame_size;
-	xsk_ring_prod__submit(&umem->fq, XSK_RING_PROD__DEFAULT_NUM_DESCS);
+	xsk_ring_prod__submit(&umem->fq, XSK_RING_PROD__DEFAULT_NUM_DESCS * 2);
 }
 
 static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem,
···
 	if (!xsk->outstanding_tx)
 		return;
 
-	if (!opt_need_wakeup || xsk_ring_prod__needs_wakeup(&xsk->tx))
-		kick_tx(xsk);
-
 	ndescs = (xsk->outstanding_tx > opt_batch_size) ? opt_batch_size :
 		xsk->outstanding_tx;
 
···
 	}
 }
 
-static void tx_only(struct xsk_socket_info *xsk, u32 frame_nb, int batch_size)
+static void tx_only(struct xsk_socket_info *xsk, u32 *frame_nb, int batch_size)
 {
 	u32 idx;
 	unsigned int i;
···
 	for (i = 0; i < batch_size; i++) {
 		struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx,
 								  idx + i);
-		tx_desc->addr = (frame_nb + i) << XSK_UMEM__DEFAULT_FRAME_SHIFT;
+		tx_desc->addr = (*frame_nb + i) << XSK_UMEM__DEFAULT_FRAME_SHIFT;
 		tx_desc->len = PKT_SIZE;
 	}
 
 	xsk_ring_prod__submit(&xsk->tx, batch_size);
 	xsk->outstanding_tx += batch_size;
-	frame_nb += batch_size;
-	frame_nb %= NUM_FRAMES;
+	*frame_nb += batch_size;
+	*frame_nb %= NUM_FRAMES;
 	complete_tx_only(xsk, batch_size);
 }
 
···
 	}
 
 	for (i = 0; i < num_socks; i++)
-		tx_only(xsks[i], frame_nb[i], batch_size);
+		tx_only(xsks[i], &frame_nb[i], batch_size);
 
 	pkt_cnt += batch_size;
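Besides doubling the fill ring, the diff above changes tx_only() to take its frame counter by pointer: previously `frame_nb += batch_size; frame_nb %= NUM_FRAMES;` updated a by-value copy, so each socket's position never advanced between batches. A minimal sketch of the corrected idiom (NUM_FRAMES shrunk here purely for illustration):

```c
#include <assert.h>

#define NUM_FRAMES 8	/* illustrative; the sample's umem holds far more frames */

/* Mirrors the fixed tx_only() parameter: the caller's per-socket frame
 * counter is advanced through the pointer and wrapped at NUM_FRAMES, so
 * the position persists across successive batches. */
static void advance_frame_nb(unsigned int *frame_nb, int batch_size)
{
	*frame_nb += batch_size;
	*frame_nb %= NUM_FRAMES;
}
```

With the old by-value parameter, three batches of 3 would leave the caller's counter at 0; with the pointer it advances to 3, 6, then wraps to 1.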
+1085
samples/bpf/xsk_fwd.c
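The new sample below builds its forwarding ports on a buffer pool shared by all threads, where per-thread caches trade whole slabs of buffer handles with the pool by swapping pointers under a lock. The consumer-side trade can be sketched in isolation as follows (the struct names and the tiny SLAB_SIZE are mine for illustration; the real code sizes slabs from bpool_params and holds a pthread mutex around the swap):

```c
#include <assert.h>

#define SLAB_SIZE 4	/* hypothetical; derived from bpool_params in the sample */
#define MAX_SLABS 2

/* Shared pool: a stack of pointers to full slabs of buffer handles. */
struct pool {
	unsigned long long *slabs[MAX_SLABS];
	int n_full;
};

/* Per-thread cache: one slab for allocation, plus its fill level. */
struct cache {
	unsigned long long *slab_cons;
	int n_cons;
};

/* When the cache's allocation slab runs empty, trade it (now empty) for a
 * full slab from the pool by swapping pointers -- no buffer handle is
 * copied. Returns the number of buffers now available locally, or 0 when
 * the pool has no full slab left. Locking is elided here for brevity. */
static int cache_refill(struct cache *bc, struct pool *bp)
{
	unsigned long long *full;

	if (bc->n_cons)
		return bc->n_cons;
	if (!bp->n_full)
		return 0;

	full = bp->slabs[--bp->n_full];		/* take a full slab */
	bp->slabs[bp->n_full] = bc->slab_cons;	/* leave the empty one behind */
	bc->slab_cons = full;
	bc->n_cons = SLAB_SIZE;
	return bc->n_cons;
}
```

Because only completely full or completely empty slabs change hands, the lock is held for a couple of pointer writes rather than for a per-buffer copy loop.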
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2020 Intel Corporation. */

#define _GNU_SOURCE
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
#include <getopt.h>
#include <netinet/ether.h>
#include <net/if.h>

#include <linux/bpf.h>
#include <linux/if_link.h>
#include <linux/if_xdp.h>

#include <bpf/libbpf.h>
#include <bpf/xsk.h>
#include <bpf/bpf.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

typedef __u64 u64;
typedef __u32 u32;
typedef __u16 u16;
typedef __u8  u8;

/* This program illustrates the packet forwarding between multiple AF_XDP
 * sockets in multi-threaded environment. All threads are sharing a common
 * buffer pool, with each socket having its own private buffer cache.
 *
 * Example 1: Single thread handling two sockets. The packets received by socket
 * A (interface IFA, queue QA) are forwarded to socket B (interface IFB, queue
 * QB), while the packets received by socket B are forwarded to socket A. The
 * thread is running on CPU core X:
 *
 *         ./xsk_fwd -i IFA -q QA -i IFB -q QB -c X
 *
 * Example 2: Two threads, each handling two sockets. The thread running on CPU
 * core X forwards all the packets received by socket A to socket B, and all the
 * packets received by socket B to socket A. The thread running on CPU core Y is
 * performing the same packet forwarding between sockets C and D:
 *
 *         ./xsk_fwd -i IFA -q QA -i IFB -q QB -i IFC -q QC -i IFD -q QD
 *                   -c CX -c CY
 */

/*
 * Buffer pool and buffer cache
 *
 * For packet forwarding, the packet buffers are typically allocated from the
 * pool for packet reception and freed back to the pool for further reuse once
 * the packet transmission is completed.
 *
 * The buffer pool is shared between multiple threads. In order to minimize the
 * access latency to the shared buffer pool, each thread creates one (or
 * several) buffer caches, which, unlike the buffer pool, are private to the
 * thread that creates them and therefore cannot be shared with other threads.
 * The access to the shared pool is only needed either (A) when the cache gets
 * empty due to repeated buffer allocations and it needs to be replenished from
 * the pool, or (B) when the cache gets full due to repeated buffer free and it
 * needs to be flushed back to the pull.
 *
 * In a packet forwarding system, a packet received on any input port can
 * potentially be transmitted on any output port, depending on the forwarding
 * configuration. For AF_XDP sockets, for this to work with zero-copy of the
 * packet buffers when, it is required that the buffer pool memory fits into the
 * UMEM area shared by all the sockets.
 */

struct bpool_params {
	u32 n_buffers;
	u32 buffer_size;
	int mmap_flags;

	u32 n_users_max;
	u32 n_buffers_per_slab;
};

/* This buffer pool implementation organizes the buffers into equally sized
 * slabs of *n_buffers_per_slab*. Initially, there are *n_slabs* slabs in the
 * pool that are completely filled with buffer pointers (full slabs).
 *
 * Each buffer cache has a slab for buffer allocation and a slab for buffer
 * free, with both of these slabs initially empty. When the cache's allocation
 * slab goes empty, it is swapped with one of the available full slabs from the
 * pool, if any is available. When the cache's free slab goes full, it is
 * swapped for one of the empty slabs from the pool, which is guaranteed to
 * succeed.
 *
 * Partially filled slabs never get traded between the cache and the pool
 * (except when the cache itself is destroyed), which enables fast operation
 * through pointer swapping.
 */
struct bpool {
	struct bpool_params params;
	pthread_mutex_t lock;
	void *addr;

	u64 **slabs;
	u64 **slabs_reserved;
	u64 *buffers;
	u64 *buffers_reserved;

	u64 n_slabs;
	u64 n_slabs_reserved;
	u64 n_buffers;

	u64 n_slabs_available;
	u64 n_slabs_reserved_available;

	struct xsk_umem_config umem_cfg;
	struct xsk_ring_prod umem_fq;
	struct xsk_ring_cons umem_cq;
	struct xsk_umem *umem;
};

static struct bpool *
bpool_init(struct bpool_params *params,
	   struct xsk_umem_config *umem_cfg)
{
	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
	u64 n_slabs, n_slabs_reserved, n_buffers, n_buffers_reserved;
	u64 slabs_size, slabs_reserved_size;
	u64 buffers_size, buffers_reserved_size;
	u64 total_size, i;
	struct bpool *bp;
	u8 *p;
	int status;

	/* mmap prep. */
	if (setrlimit(RLIMIT_MEMLOCK, &r))
		return NULL;

	/* bpool internals dimensioning. */
	n_slabs = (params->n_buffers + params->n_buffers_per_slab - 1) /
		  params->n_buffers_per_slab;
	n_slabs_reserved = params->n_users_max * 2;
	n_buffers = n_slabs * params->n_buffers_per_slab;
	n_buffers_reserved = n_slabs_reserved * params->n_buffers_per_slab;

	slabs_size = n_slabs * sizeof(u64 *);
	slabs_reserved_size = n_slabs_reserved * sizeof(u64 *);
	buffers_size = n_buffers * sizeof(u64);
	buffers_reserved_size = n_buffers_reserved * sizeof(u64);

	total_size = sizeof(struct bpool) +
		     slabs_size + slabs_reserved_size +
		     buffers_size + buffers_reserved_size;

	/* bpool memory allocation. */
	p = calloc(total_size, sizeof(u8));
	if (!p)
		return NULL;

	/* bpool memory initialization. */
	bp = (struct bpool *)p;
	memcpy(&bp->params, params, sizeof(*params));
	bp->params.n_buffers = n_buffers;

	bp->slabs = (u64 **)&p[sizeof(struct bpool)];
	bp->slabs_reserved = (u64 **)&p[sizeof(struct bpool) +
					slabs_size];
	bp->buffers = (u64 *)&p[sizeof(struct bpool) +
				slabs_size + slabs_reserved_size];
	bp->buffers_reserved = (u64 *)&p[sizeof(struct bpool) +
					 slabs_size + slabs_reserved_size + buffers_size];

	bp->n_slabs = n_slabs;
	bp->n_slabs_reserved = n_slabs_reserved;
	bp->n_buffers = n_buffers;

	for (i = 0; i < n_slabs; i++)
		bp->slabs[i] = &bp->buffers[i * params->n_buffers_per_slab];
	bp->n_slabs_available = n_slabs;

	for (i = 0; i < n_slabs_reserved; i++)
		bp->slabs_reserved[i] = &bp->buffers_reserved[i *
			params->n_buffers_per_slab];
	bp->n_slabs_reserved_available = n_slabs_reserved;

	for (i = 0; i < n_buffers; i++)
		bp->buffers[i] = i * params->buffer_size;

	/* lock. */
	status = pthread_mutex_init(&bp->lock, NULL);
	if (status) {
		free(p);
		return NULL;
	}

	/* mmap. */
	bp->addr = mmap(NULL,
			n_buffers * params->buffer_size,
			PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | params->mmap_flags,
			-1,
			0);
	if (bp->addr == MAP_FAILED) {
		pthread_mutex_destroy(&bp->lock);
		free(p);
		return NULL;
	}

	/* umem. */
	status = xsk_umem__create(&bp->umem,
				  bp->addr,
				  bp->params.n_buffers * bp->params.buffer_size,
				  &bp->umem_fq,
				  &bp->umem_cq,
				  umem_cfg);
	if (status) {
		munmap(bp->addr, bp->params.n_buffers * bp->params.buffer_size);
		pthread_mutex_destroy(&bp->lock);
		free(p);
		return NULL;
	}
	memcpy(&bp->umem_cfg, umem_cfg, sizeof(*umem_cfg));

	return bp;
}

static void
bpool_free(struct bpool *bp)
{
	if (!bp)
		return;

	xsk_umem__delete(bp->umem);
	munmap(bp->addr, bp->params.n_buffers * bp->params.buffer_size);
	pthread_mutex_destroy(&bp->lock);
	free(bp);
}

struct bcache {
	struct bpool *bp;

	u64 *slab_cons;
	u64 *slab_prod;

	u64 n_buffers_cons;
	u64 n_buffers_prod;
};

static u32
bcache_slab_size(struct bcache *bc)
{
	struct bpool *bp = bc->bp;

	return bp->params.n_buffers_per_slab;
}

static struct bcache *
bcache_init(struct bpool *bp)
{
	struct bcache *bc;

	bc = calloc(1, sizeof(struct bcache));
	if (!bc)
		return NULL;

	bc->bp = bp;
	bc->n_buffers_cons = 0;
	bc->n_buffers_prod = 0;

	pthread_mutex_lock(&bp->lock);
	if (bp->n_slabs_reserved_available == 0) {
		pthread_mutex_unlock(&bp->lock);
		free(bc);
		return NULL;
	}

	bc->slab_cons = bp->slabs_reserved[bp->n_slabs_reserved_available - 1];
	bc->slab_prod = bp->slabs_reserved[bp->n_slabs_reserved_available - 2];
	bp->n_slabs_reserved_available -= 2;
	pthread_mutex_unlock(&bp->lock);

	return bc;
}

static void
bcache_free(struct bcache *bc)
{
	struct bpool *bp;

	if (!bc)
		return;

	/* In order to keep this example simple, the case of freeing any
	 * existing buffers from the cache back to the pool is ignored.
	 */

	bp = bc->bp;
	pthread_mutex_lock(&bp->lock);
	bp->slabs_reserved[bp->n_slabs_reserved_available] = bc->slab_prod;
	bp->slabs_reserved[bp->n_slabs_reserved_available + 1] = bc->slab_cons;
	bp->n_slabs_reserved_available += 2;
	pthread_mutex_unlock(&bp->lock);

	free(bc);
}

/* To work correctly, the implementation requires that the *n_buffers* input
 * argument is never greater than the buffer pool's *n_buffers_per_slab*. This
 * is typically the case, with one exception taking place when large number of
 * buffers are allocated at init time (e.g. for the UMEM fill queue setup).
 */
static inline u32
bcache_cons_check(struct bcache *bc, u32 n_buffers)
{
	struct bpool *bp = bc->bp;
	u64 n_buffers_per_slab = bp->params.n_buffers_per_slab;
	u64 n_buffers_cons = bc->n_buffers_cons;
	u64 n_slabs_available;
	u64 *slab_full;

	/*
	 * Consumer slab is not empty: Use what's available locally. Do not
	 * look for more buffers from the pool when the ask can only be
	 * partially satisfied.
	 */
	if (n_buffers_cons)
		return (n_buffers_cons < n_buffers) ?
			n_buffers_cons :
			n_buffers;

	/*
	 * Consumer slab is empty: look to trade the current consumer slab
	 * (full) for a full slab from the pool, if any is available.
	 */
	pthread_mutex_lock(&bp->lock);
	n_slabs_available = bp->n_slabs_available;
	if (!n_slabs_available) {
		pthread_mutex_unlock(&bp->lock);
		return 0;
	}

	n_slabs_available--;
	slab_full = bp->slabs[n_slabs_available];
	bp->slabs[n_slabs_available] = bc->slab_cons;
	bp->n_slabs_available = n_slabs_available;
	pthread_mutex_unlock(&bp->lock);

	bc->slab_cons = slab_full;
	bc->n_buffers_cons = n_buffers_per_slab;
	return n_buffers;
}

static inline u64
bcache_cons(struct bcache *bc)
{
	u64 n_buffers_cons = bc->n_buffers_cons - 1;
	u64 buffer;

	buffer = bc->slab_cons[n_buffers_cons];
	bc->n_buffers_cons = n_buffers_cons;
	return buffer;
}

static inline void
bcache_prod(struct bcache *bc, u64 buffer)
{
	struct bpool *bp = bc->bp;
	u64 n_buffers_per_slab = bp->params.n_buffers_per_slab;
	u64 n_buffers_prod = bc->n_buffers_prod;
	u64 n_slabs_available;
	u64 *slab_empty;

	/*
	 * Producer slab is not yet full: store the current buffer to it.
	 */
	if (n_buffers_prod < n_buffers_per_slab) {
		bc->slab_prod[n_buffers_prod] = buffer;
		bc->n_buffers_prod = n_buffers_prod + 1;
		return;
	}

	/*
	 * Producer slab is full: trade the cache's current producer slab
	 * (full) for an empty slab from the pool, then store the current
	 * buffer to the new producer slab. As one full slab exists in the
	 * cache, it is guaranteed that there is at least one empty slab
	 * available in the pool.
	 */
	pthread_mutex_lock(&bp->lock);
	n_slabs_available = bp->n_slabs_available;
	slab_empty = bp->slabs[n_slabs_available];
	bp->slabs[n_slabs_available] = bc->slab_prod;
	bp->n_slabs_available = n_slabs_available + 1;
	pthread_mutex_unlock(&bp->lock);

	slab_empty[0] = buffer;
	bc->slab_prod = slab_empty;
	bc->n_buffers_prod = 1;
}

/*
 * Port
 *
 * Each of the forwarding ports sits on top of an AF_XDP socket. In order for
 * packet forwarding to happen with no packet buffer copy, all the sockets need
 * to share the same UMEM area, which is used as the buffer pool memory.
 */
#ifndef MAX_BURST_RX
#define MAX_BURST_RX 64
#endif

#ifndef MAX_BURST_TX
#define MAX_BURST_TX 64
#endif

struct burst_rx {
	u64 addr[MAX_BURST_RX];
	u32 len[MAX_BURST_RX];
};

struct burst_tx {
	u64 addr[MAX_BURST_TX];
	u32 len[MAX_BURST_TX];
	u32 n_pkts;
};

struct port_params {
	struct xsk_socket_config xsk_cfg;
	struct bpool *bp;
	const char *iface;
	u32 iface_queue;
};

struct port {
	struct port_params params;

	struct bcache *bc;

	struct xsk_ring_cons rxq;
	struct xsk_ring_prod txq;
	struct xsk_ring_prod umem_fq;
	struct xsk_ring_cons umem_cq;
	struct xsk_socket *xsk;
	int umem_fq_initialized;

	u64 n_pkts_rx;
	u64 n_pkts_tx;
};

static void
port_free(struct port *p)
{
	if (!p)
		return;

	/* To keep this example simple, the code to free the buffers from the
	 * socket's receive and transmit queues, as well as from the UMEM fill
	 * and completion queues, is not included.
	 */

	if (p->xsk)
		xsk_socket__delete(p->xsk);

	bcache_free(p->bc);

	free(p);
}

static struct port *
port_init(struct port_params *params)
{
	struct port *p;
	u32 umem_fq_size, pos = 0;
	int status, i;

	/* Memory allocation and initialization. */
	p = calloc(sizeof(struct port), 1);
	if (!p)
		return NULL;

	memcpy(&p->params, params, sizeof(p->params));
	umem_fq_size = params->bp->umem_cfg.fill_size;

	/* bcache. */
	p->bc = bcache_init(params->bp);
	if (!p->bc ||
	    (bcache_slab_size(p->bc) < umem_fq_size) ||
	    (bcache_cons_check(p->bc, umem_fq_size) < umem_fq_size)) {
		port_free(p);
		return NULL;
	}

	/* xsk socket. */
	status = xsk_socket__create_shared(&p->xsk,
					   params->iface,
					   params->iface_queue,
					   params->bp->umem,
					   &p->rxq,
					   &p->txq,
					   &p->umem_fq,
					   &p->umem_cq,
					   &params->xsk_cfg);
	if (status) {
		port_free(p);
		return NULL;
	}

	/* umem fq. */
	xsk_ring_prod__reserve(&p->umem_fq, umem_fq_size, &pos);

	for (i = 0; i < umem_fq_size; i++)
		*xsk_ring_prod__fill_addr(&p->umem_fq, pos + i) =
			bcache_cons(p->bc);

	xsk_ring_prod__submit(&p->umem_fq, umem_fq_size);
	p->umem_fq_initialized = 1;

	return p;
}

static inline u32
port_rx_burst(struct port *p, struct burst_rx *b)
{
	u32 n_pkts, pos, i;

	/* Free buffers for FQ replenish. */
	n_pkts = ARRAY_SIZE(b->addr);

	n_pkts = bcache_cons_check(p->bc, n_pkts);
	if (!n_pkts)
		return 0;

	/* RXQ. */
	n_pkts = xsk_ring_cons__peek(&p->rxq, n_pkts, &pos);
	if (!n_pkts) {
		if (xsk_ring_prod__needs_wakeup(&p->umem_fq)) {
			struct pollfd pollfd = {
				.fd = xsk_socket__fd(p->xsk),
				.events = POLLIN,
			};

			poll(&pollfd, 1, 0);
		}
		return 0;
	}

	for (i = 0; i < n_pkts; i++) {
		b->addr[i] = xsk_ring_cons__rx_desc(&p->rxq, pos + i)->addr;
		b->len[i] = xsk_ring_cons__rx_desc(&p->rxq, pos + i)->len;
	}

	xsk_ring_cons__release(&p->rxq, n_pkts);
	p->n_pkts_rx += n_pkts;

	/* UMEM FQ. */
	for ( ; ; ) {
		int status;

		status = xsk_ring_prod__reserve(&p->umem_fq, n_pkts, &pos);
		if (status == n_pkts)
			break;

		if (xsk_ring_prod__needs_wakeup(&p->umem_fq)) {
			struct pollfd pollfd = {
				.fd = xsk_socket__fd(p->xsk),
				.events = POLLIN,
			};

			poll(&pollfd, 1, 0);
		}
	}

	for (i = 0; i < n_pkts; i++)
		*xsk_ring_prod__fill_addr(&p->umem_fq, pos + i) =
			bcache_cons(p->bc);

	xsk_ring_prod__submit(&p->umem_fq, n_pkts);

	return n_pkts;
}

static inline void
port_tx_burst(struct port *p, struct burst_tx *b)
{
	u32 n_pkts, pos, i;
	int status;

	/* UMEM CQ. */
	n_pkts = p->params.bp->umem_cfg.comp_size;

	n_pkts = xsk_ring_cons__peek(&p->umem_cq, n_pkts, &pos);

	for (i = 0; i < n_pkts; i++) {
		u64 addr = *xsk_ring_cons__comp_addr(&p->umem_cq, pos + i);

		bcache_prod(p->bc, addr);
	}

	xsk_ring_cons__release(&p->umem_cq, n_pkts);

	/* TXQ. */
	n_pkts = b->n_pkts;

	for ( ; ; ) {
		status = xsk_ring_prod__reserve(&p->txq, n_pkts, &pos);
		if (status == n_pkts)
			break;

		if (xsk_ring_prod__needs_wakeup(&p->txq))
			sendto(xsk_socket__fd(p->xsk), NULL, 0, MSG_DONTWAIT,
			       NULL, 0);
	}

	for (i = 0; i < n_pkts; i++) {
		xsk_ring_prod__tx_desc(&p->txq, pos + i)->addr = b->addr[i];
		xsk_ring_prod__tx_desc(&p->txq, pos + i)->len = b->len[i];
	}

	xsk_ring_prod__submit(&p->txq, n_pkts);
	if (xsk_ring_prod__needs_wakeup(&p->txq))
		sendto(xsk_socket__fd(p->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
	p->n_pkts_tx += n_pkts;
}

/*
 * Thread
 *
 * Packet forwarding threads.
 */
#ifndef MAX_PORTS_PER_THREAD
#define MAX_PORTS_PER_THREAD 16
#endif

struct thread_data {
	struct port *ports_rx[MAX_PORTS_PER_THREAD];
	struct port *ports_tx[MAX_PORTS_PER_THREAD];
	u32 n_ports_rx;
	struct burst_rx burst_rx;
	struct burst_tx burst_tx[MAX_PORTS_PER_THREAD];
	u32 cpu_core_id;
	int quit;
};

static void swap_mac_addresses(void *data)
{
	struct ether_header *eth = (struct ether_header *)data;
	struct ether_addr *src_addr = (struct ether_addr *)&eth->ether_shost;
	struct ether_addr *dst_addr = (struct ether_addr *)&eth->ether_dhost;
	struct ether_addr tmp;

	tmp = *src_addr;
	*src_addr = *dst_addr;
	*dst_addr = tmp;
}

static void *
thread_func(void *arg)
{
	struct thread_data *t = arg;
	cpu_set_t cpu_cores;
	u32 i;

	CPU_ZERO(&cpu_cores);
	CPU_SET(t->cpu_core_id, &cpu_cores);
	pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpu_cores);

	for (i = 0; !t->quit; i = (i + 1) & (t->n_ports_rx - 1)) {
		struct port *port_rx = t->ports_rx[i];
		struct port *port_tx = t->ports_tx[i];
		struct burst_rx *brx = &t->burst_rx;
		struct burst_tx *btx = &t->burst_tx[i];
		u32 n_pkts, j;

		/* RX. */
		n_pkts = port_rx_burst(port_rx, brx);
		if (!n_pkts)
			continue;

		/* Process & TX. */
		for (j = 0; j < n_pkts; j++) {
			u64 addr = xsk_umem__add_offset_to_addr(brx->addr[j]);
			u8 *pkt = xsk_umem__get_data(port_rx->params.bp->addr,
						     addr);

			swap_mac_addresses(pkt);

			btx->addr[btx->n_pkts] = brx->addr[j];
			btx->len[btx->n_pkts] = brx->len[j];
			btx->n_pkts++;

			if (btx->n_pkts == MAX_BURST_TX) {
				port_tx_burst(port_tx, btx);
				btx->n_pkts = 0;
			}
		}
	}

	return NULL;
}

/*
 * Process
 */
static const struct bpool_params bpool_params_default = {
	.n_buffers = 64 * 1024,
	.buffer_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
	.mmap_flags = 0,

	.n_users_max = 16,
	.n_buffers_per_slab = XSK_RING_PROD__DEFAULT_NUM_DESCS * 2,
};

static const struct xsk_umem_config umem_cfg_default = {
	.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS * 2,
	.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
	.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,
	.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,
	.flags = 0,
};

static const struct port_params port_params_default = {
	.xsk_cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.libbpf_flags = 0,
		.xdp_flags = XDP_FLAGS_DRV_MODE,
		.bind_flags = XDP_USE_NEED_WAKEUP | XDP_ZEROCOPY,
	},

	.bp = NULL,
	.iface = NULL,
	.iface_queue = 0,
};

#ifndef MAX_PORTS
#define MAX_PORTS 64
#endif

#ifndef MAX_THREADS
#define MAX_THREADS 64
#endif

static struct bpool_params bpool_params;
static struct xsk_umem_config umem_cfg;
static struct bpool *bp;

static struct port_params port_params[MAX_PORTS];
static struct port *ports[MAX_PORTS];
static u64 n_pkts_rx[MAX_PORTS];
static u64 n_pkts_tx[MAX_PORTS];
static int n_ports;

static pthread_t threads[MAX_THREADS];
static struct thread_data thread_data[MAX_THREADS];
static int n_threads;

static void
print_usage(char *prog_name)
{
	const char *usage =
		"Usage:\n"
		"\t%s [ -b SIZE ] -c CORE -i INTERFACE [ -q QUEUE ]\n"
		"\n"
		"-c CORE        CPU core to run a packet forwarding thread\n"
		"               on. May be invoked multiple times.\n"
		"\n"
		"-b SIZE        Number of buffers in the buffer pool shared\n"
		"               by all the forwarding threads. Default: %u.\n"
		"\n"
		"-i INTERFACE   Network interface. Each (INTERFACE, QUEUE)\n"
		"               pair specifies one forwarding port. May be\n"
		"               invoked multiple times.\n"
		"\n"
		"-q QUEUE       Network interface queue for RX and TX. Each\n"
		"               (INTERFACE, QUEUE) pair specified one\n"
		"               forwarding port. Default: %u. May be invoked\n"
		"               multiple times.\n"
		"\n";
	printf(usage,
	       prog_name,
	       bpool_params_default.n_buffers,
	       port_params_default.iface_queue);
}

static int
parse_args(int argc, char **argv)
{
	struct option lgopts[] = {
		{ NULL,  0, 0, 0 }
	};
	int opt, option_index;

	/* Parse the input arguments. */
	for ( ; ;) {
		opt = getopt_long(argc, argv, "c:i:q:", lgopts, &option_index);
		if (opt == EOF)
			break;

		switch (opt) {
		case 'b':
			bpool_params.n_buffers = atoi(optarg);
			break;

		case 'c':
			if (n_threads == MAX_THREADS) {
				printf("Max number of threads (%d) reached.\n",
				       MAX_THREADS);
				return -1;
			}

			thread_data[n_threads].cpu_core_id = atoi(optarg);
			n_threads++;
			break;

		case 'i':
			if (n_ports == MAX_PORTS) {
				printf("Max number of ports (%d) reached.\n",
				       MAX_PORTS);
				return -1;
			}

			port_params[n_ports].iface = optarg;
			port_params[n_ports].iface_queue = 0;
			n_ports++;
			break;

		case 'q':
			if (n_ports == 0) {
				printf("No port specified for queue.\n");
				return -1;
			}
			port_params[n_ports - 1].iface_queue = atoi(optarg);
			break;

		default:
			printf("Illegal argument.\n");
			return -1;
		}
	}

	optind = 1; /* reset getopt lib */

	/* Check the input arguments. */
	if (!n_ports) {
		printf("No ports specified.\n");
		return -1;
	}

	if (!n_threads) {
		printf("No threads specified.\n");
		return -1;
	}

	if (n_ports % n_threads) {
		printf("Ports cannot be evenly distributed to threads.\n");
		return -1;
	}

	return 0;
}

static void
print_port(u32 port_id)
{
	struct port *port = ports[port_id];

	printf("Port %u: interface = %s, queue = %u\n",
	       port_id, port->params.iface, port->params.iface_queue);
}

static void
print_thread(u32 thread_id)
{
	struct thread_data *t = &thread_data[thread_id];
	u32 i;

	printf("Thread %u (CPU core %u): ",
	       thread_id, t->cpu_core_id);

	for (i = 0; i < t->n_ports_rx; i++) {
		struct port *port_rx = t->ports_rx[i];
		struct port *port_tx = t->ports_tx[i];

		printf("(%s, %u) -> (%s, %u), ",
		       port_rx->params.iface,
		       port_rx->params.iface_queue,
		       port_tx->params.iface,
		       port_tx->params.iface_queue);
	}

	printf("\n");
}

static void
print_port_stats_separator(void)
{
	printf("+-%4s-+-%12s-+-%13s-+-%12s-+-%13s-+\n",
	       "----",
	       "------------",
	       "-------------",
	       "------------",
	       "-------------");
}

static void
print_port_stats_header(void)
{
	print_port_stats_separator();
	printf("| %4s | %12s | %13s | %12s | %13s |\n",
	       "Port",
	       "RX packets",
	       "RX rate (pps)",
	       "TX packets",
	       "TX_rate (pps)");
	print_port_stats_separator();
}

static void
print_port_stats_trailer(void)
{
	print_port_stats_separator();
	printf("\n");
}

static void
print_port_stats(int port_id, u64 ns_diff)
{
	struct port *p = ports[port_id];
	double rx_pps, tx_pps;

	rx_pps =
(p->n_pkts_rx - n_pkts_rx[port_id]) * 1000000000. / ns_diff; 937 + tx_pps = (p->n_pkts_tx - n_pkts_tx[port_id]) * 1000000000. / ns_diff; 938 + 939 + printf("| %4d | %12llu | %13.0f | %12llu | %13.0f |\n", 940 + port_id, 941 + p->n_pkts_rx, 942 + rx_pps, 943 + p->n_pkts_tx, 944 + tx_pps); 945 + 946 + n_pkts_rx[port_id] = p->n_pkts_rx; 947 + n_pkts_tx[port_id] = p->n_pkts_tx; 948 + } 949 + 950 + static void 951 + print_port_stats_all(u64 ns_diff) 952 + { 953 + int i; 954 + 955 + print_port_stats_header(); 956 + for (i = 0; i < n_ports; i++) 957 + print_port_stats(i, ns_diff); 958 + print_port_stats_trailer(); 959 + } 960 + 961 + static int quit; 962 + 963 + static void 964 + signal_handler(int sig) 965 + { 966 + quit = 1; 967 + } 968 + 969 + static void remove_xdp_program(void) 970 + { 971 + int i; 972 + 973 + for (i = 0 ; i < n_ports; i++) 974 + bpf_set_link_xdp_fd(if_nametoindex(port_params[i].iface), -1, 975 + port_params[i].xsk_cfg.xdp_flags); 976 + } 977 + 978 + int main(int argc, char **argv) 979 + { 980 + struct timespec time; 981 + u64 ns0; 982 + int i; 983 + 984 + /* Parse args. */ 985 + memcpy(&bpool_params, &bpool_params_default, 986 + sizeof(struct bpool_params)); 987 + memcpy(&umem_cfg, &umem_cfg_default, 988 + sizeof(struct xsk_umem_config)); 989 + for (i = 0; i < MAX_PORTS; i++) 990 + memcpy(&port_params[i], &port_params_default, 991 + sizeof(struct port_params)); 992 + 993 + if (parse_args(argc, argv)) { 994 + print_usage(argv[0]); 995 + return -1; 996 + } 997 + 998 + /* Buffer pool initialization. */ 999 + bp = bpool_init(&bpool_params, &umem_cfg); 1000 + if (!bp) { 1001 + printf("Buffer pool initialization failed.\n"); 1002 + return -1; 1003 + } 1004 + printf("Buffer pool created successfully.\n"); 1005 + 1006 + /* Ports initialization. 
*/ 1007 + for (i = 0; i < MAX_PORTS; i++) 1008 + port_params[i].bp = bp; 1009 + 1010 + for (i = 0; i < n_ports; i++) { 1011 + ports[i] = port_init(&port_params[i]); 1012 + if (!ports[i]) { 1013 + printf("Port %d initialization failed.\n", i); 1014 + return -1; 1015 + } 1016 + print_port(i); 1017 + } 1018 + printf("All ports created successfully.\n"); 1019 + 1020 + /* Threads. */ 1021 + for (i = 0; i < n_threads; i++) { 1022 + struct thread_data *t = &thread_data[i]; 1023 + u32 n_ports_per_thread = n_ports / n_threads, j; 1024 + 1025 + for (j = 0; j < n_ports_per_thread; j++) { 1026 + t->ports_rx[j] = ports[i * n_ports_per_thread + j]; 1027 + t->ports_tx[j] = ports[i * n_ports_per_thread + 1028 + (j + 1) % n_ports_per_thread]; 1029 + } 1030 + 1031 + t->n_ports_rx = n_ports_per_thread; 1032 + 1033 + print_thread(i); 1034 + } 1035 + 1036 + for (i = 0; i < n_threads; i++) { 1037 + int status; 1038 + 1039 + status = pthread_create(&threads[i], 1040 + NULL, 1041 + thread_func, 1042 + &thread_data[i]); 1043 + if (status) { 1044 + printf("Thread %d creation failed.\n", i); 1045 + return -1; 1046 + } 1047 + } 1048 + printf("All threads created successfully.\n"); 1049 + 1050 + /* Print statistics. */ 1051 + signal(SIGINT, signal_handler); 1052 + signal(SIGTERM, signal_handler); 1053 + signal(SIGABRT, signal_handler); 1054 + 1055 + clock_gettime(CLOCK_MONOTONIC, &time); 1056 + ns0 = time.tv_sec * 1000000000UL + time.tv_nsec; 1057 + for ( ; !quit; ) { 1058 + u64 ns1, ns_diff; 1059 + 1060 + sleep(1); 1061 + clock_gettime(CLOCK_MONOTONIC, &time); 1062 + ns1 = time.tv_sec * 1000000000UL + time.tv_nsec; 1063 + ns_diff = ns1 - ns0; 1064 + ns0 = ns1; 1065 + 1066 + print_port_stats_all(ns_diff); 1067 + } 1068 + 1069 + /* Threads completion. 
*/ 1070 + printf("Quit.\n"); 1071 + for (i = 0; i < n_threads; i++) 1072 + thread_data[i].quit = 1; 1073 + 1074 + for (i = 0; i < n_threads; i++) 1075 + pthread_join(threads[i], NULL); 1076 + 1077 + for (i = 0; i < n_ports; i++) 1078 + port_free(ports[i]); 1079 + 1080 + bpool_free(bp); 1081 + 1082 + remove_xdp_program(); 1083 + 1084 + return 0; 1085 + }
scripts/bpf_helpers_doc.py (+2)

···
 	'struct __sk_buff',
 	'struct sk_msg_md',
 	'struct xdp_md',
+	'struct path',
 ]
 known_types = {
 	'...',
···
 	'struct tcp_request_sock',
 	'struct udp6_sock',
 	'struct task_struct',
+	'struct path',
 }
 mapped_types = {
 	'u8': '__u8',
security/bpf/hooks.c (+6)

···
 	LSM_HOOK_INIT(NAME, bpf_lsm_##NAME),
 	#include <linux/lsm_hook_defs.h>
 	#undef LSM_HOOK
+	LSM_HOOK_INIT(inode_free_security, bpf_inode_storage_free),
 };
 
 static int __init bpf_lsm_init(void)
···
 	return 0;
 }
 
+struct lsm_blob_sizes bpf_lsm_blob_sizes __lsm_ro_after_init = {
+	.lbs_inode = sizeof(struct bpf_storage_blob),
+};
+
 DEFINE_LSM(bpf) = {
 	.name = "bpf",
 	.init = bpf_lsm_init,
+	.blobs = &bpf_lsm_blob_sizes
 };
tools/bpf/bpftool/Documentation/bpftool-map.rst (+1 -1)

···
 |	| **lru_percpu_hash** | **lpm_trie** | **array_of_maps** | **hash_of_maps**
 |	| **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
 |	| **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
-|	| **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** }
+|	| **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage** }
 
 DESCRIPTION
 ===========
tools/bpf/bpftool/Makefile (+5 -1)

···
 $(OUTPUT)%.o: %.c
 	$(QUIET_CC)$(CC) $(CFLAGS) -c -MMD -o $@ $<
 
-clean: $(LIBBPF)-clean
+feature-detect-clean:
+	$(call QUIET_CLEAN, feature-detect)
+	$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null
+
+clean: $(LIBBPF)-clean feature-detect-clean
 	$(call QUIET_CLEAN, bpftool)
 	$(Q)$(RM) -- $(OUTPUT)bpftool $(OUTPUT)*.o $(OUTPUT)*.d
 	$(Q)$(RM) -- $(BPFTOOL_BOOTSTRAP) $(OUTPUT)*.skel.h $(OUTPUT)vmlinux.h
tools/bpf/bpftool/bash-completion/bpftool (+2 -1)

···
                              lru_percpu_hash lpm_trie array_of_maps \
                              hash_of_maps devmap devmap_hash sockmap cpumap \
                              xskmap sockhash cgroup_storage reuseport_sockarray \
-                             percpu_cgroup_storage queue stack' -- \
+                             percpu_cgroup_storage queue stack sk_storage \
+                             struct_ops inode_storage' -- \
                              "$cur" ) )
             return 0
             ;;
tools/bpf/bpftool/gen.c (-2)

···
 #include <sys/mman.h>
 #include <bpf/btf.h>
 
-#include "bpf/libbpf_internal.h"
 #include "json_writer.h"
 #include "main.h"
-
 
 #define MAX_OBJ_NAME_LEN 64
 
tools/bpf/bpftool/link.c (+41 -3)

···
 	jsonw_uint_field(wtr, "attach_type", attach_type);
 }
 
+static bool is_iter_map_target(const char *target_name)
+{
+	return strcmp(target_name, "bpf_map_elem") == 0 ||
+	       strcmp(target_name, "bpf_sk_storage_map") == 0;
+}
+
+static void show_iter_json(struct bpf_link_info *info, json_writer_t *wtr)
+{
+	const char *target_name = u64_to_ptr(info->iter.target_name);
+
+	jsonw_string_field(wtr, "target_name", target_name);
+
+	if (is_iter_map_target(target_name))
+		jsonw_uint_field(wtr, "map_id", info->iter.map.map_id);
+}
+
 static int get_prog_info(int prog_id, struct bpf_prog_info *info)
 {
 	__u32 len = sizeof(*info);
···
 				 info->cgroup.cgroup_id);
 		show_link_attach_type_json(info->cgroup.attach_type, json_wtr);
 		break;
+	case BPF_LINK_TYPE_ITER:
+		show_iter_json(info, json_wtr);
+		break;
 	case BPF_LINK_TYPE_NETNS:
 		jsonw_uint_field(json_wtr, "netns_ino",
 				 info->netns.netns_ino);
···
 	printf("attach_type %u  ", attach_type);
 }
 
+static void show_iter_plain(struct bpf_link_info *info)
+{
+	const char *target_name = u64_to_ptr(info->iter.target_name);
+
+	printf("target_name %s  ", target_name);
+
+	if (is_iter_map_target(target_name))
+		printf("map_id %u  ", info->iter.map.map_id);
+}
+
 static int show_link_close_plain(int fd, struct bpf_link_info *info)
 {
 	struct bpf_prog_info prog_info;
···
 		printf("\n\tcgroup_id %zu  ", (size_t)info->cgroup.cgroup_id);
 		show_link_attach_type_plain(info->cgroup.attach_type);
 		break;
+	case BPF_LINK_TYPE_ITER:
+		show_iter_plain(info);
+		break;
 	case BPF_LINK_TYPE_NETNS:
 		printf("\n\tnetns_ino %u  ", info->netns.netns_ino);
 		show_link_attach_type_plain(info->netns.attach_type);
···
 {
 	struct bpf_link_info info;
 	__u32 len = sizeof(info);
-	char raw_tp_name[256];
+	char buf[256];
 	int err;
 
 	memset(&info, 0, sizeof(info));
···
 	}
 	if (info.type == BPF_LINK_TYPE_RAW_TRACEPOINT &&
 	    !info.raw_tracepoint.tp_name) {
-		info.raw_tracepoint.tp_name = (unsigned long)&raw_tp_name;
-		info.raw_tracepoint.tp_name_len = sizeof(raw_tp_name);
+		info.raw_tracepoint.tp_name = (unsigned long)&buf;
+		info.raw_tracepoint.tp_name_len = sizeof(buf);
+		goto again;
+	}
+	if (info.type == BPF_LINK_TYPE_ITER &&
+	    !info.iter.target_name) {
+		info.iter.target_name = (unsigned long)&buf;
+		info.iter.target_name_len = sizeof(buf);
 		goto again;
 	}
 
tools/bpf/bpftool/map.c (+2 -1)

···
 	[BPF_MAP_TYPE_SK_STORAGE]		= "sk_storage",
 	[BPF_MAP_TYPE_STRUCT_OPS]		= "struct_ops",
 	[BPF_MAP_TYPE_RINGBUF]			= "ringbuf",
+	[BPF_MAP_TYPE_INODE_STORAGE]		= "inode_storage",
 };
 
 const size_t map_type_name_size = ARRAY_SIZE(map_type_name);
···
 		"                 lru_percpu_hash | lpm_trie | array_of_maps | hash_of_maps |\n"
 		"                 devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
 		"                 cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
-		"                 queue | stack | sk_storage | struct_ops | ringbuf }\n"
+		"                 queue | stack | sk_storage | struct_ops | ringbuf | inode_storage }\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",
 		bin_name, argv[-2]);
tools/bpf/bpftool/net.c (+282 -17)

···
 #include <fcntl.h>
 #include <stdlib.h>
 #include <string.h>
+#include <time.h>
 #include <unistd.h>
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
 #include <net/if.h>
 #include <linux/if.h>
 #include <linux/rtnetlink.h>
+#include <linux/socket.h>
 #include <linux/tc_act/tc_bpf.h>
 #include <sys/socket.h>
 #include <sys/stat.h>
 #include <sys/types.h>
 
 #include "bpf/nlattr.h"
-#include "bpf/libbpf_internal.h"
 #include "main.h"
 #include "netlink_dumper.h"
+
+#ifndef SOL_NETLINK
+#define SOL_NETLINK 270
+#endif
 
 struct ip_devname_ifindex {
 	char	devname[64];
···
 	}
 
 	return net_attach_type_size;
+}
+
+typedef int (*dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
+
+typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, dump_nlmsg_t, void *cookie);
+
+static int netlink_open(__u32 *nl_pid)
+{
+	struct sockaddr_nl sa;
+	socklen_t addrlen;
+	int one = 1, ret;
+	int sock;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.nl_family = AF_NETLINK;
+
+	sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
+	if (sock < 0)
+		return -errno;
+
+	if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK,
+		       &one, sizeof(one)) < 0) {
+		p_err("Netlink error reporting not supported");
+	}
+
+	if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
+		ret = -errno;
+		goto cleanup;
+	}
+
+	addrlen = sizeof(sa);
+	if (getsockname(sock, (struct sockaddr *)&sa, &addrlen) < 0) {
+		ret = -errno;
+		goto cleanup;
+	}
+
+	if (addrlen != sizeof(sa)) {
+		ret = -LIBBPF_ERRNO__INTERNAL;
+		goto cleanup;
+	}
+
+	*nl_pid = sa.nl_pid;
+	return sock;
+
+cleanup:
+	close(sock);
+	return ret;
+}
+
+static int netlink_recv(int sock, __u32 nl_pid, __u32 seq,
+			__dump_nlmsg_t _fn, dump_nlmsg_t fn,
+			void *cookie)
+{
+	bool multipart = true;
+	struct nlmsgerr *err;
+	struct nlmsghdr *nh;
+	char buf[4096];
+	int len, ret;
+
+	while (multipart) {
+		multipart = false;
+		len = recv(sock, buf, sizeof(buf), 0);
+		if (len < 0) {
+			ret = -errno;
+			goto done;
+		}
+
+		if (len == 0)
+			break;
+
+		for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
+		     nh = NLMSG_NEXT(nh, len)) {
+			if (nh->nlmsg_pid != nl_pid) {
+				ret = -LIBBPF_ERRNO__WRNGPID;
+				goto done;
+			}
+			if (nh->nlmsg_seq != seq) {
+				ret = -LIBBPF_ERRNO__INVSEQ;
+				goto done;
+			}
+			if (nh->nlmsg_flags & NLM_F_MULTI)
+				multipart = true;
+			switch (nh->nlmsg_type) {
+			case NLMSG_ERROR:
+				err = (struct nlmsgerr *)NLMSG_DATA(nh);
+				if (!err->error)
+					continue;
+				ret = err->error;
+				libbpf_nla_dump_errormsg(nh);
+				goto done;
+			case NLMSG_DONE:
+				return 0;
+			default:
+				break;
+			}
+			if (_fn) {
+				ret = _fn(nh, fn, cookie);
+				if (ret)
+					return ret;
+			}
+		}
+	}
+	ret = 0;
+done:
+	return ret;
+}
+
+static int __dump_class_nlmsg(struct nlmsghdr *nlh,
+			      dump_nlmsg_t dump_class_nlmsg,
+			      void *cookie)
+{
+	struct nlattr *tb[TCA_MAX + 1], *attr;
+	struct tcmsg *t = NLMSG_DATA(nlh);
+	int len;
+
+	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
+	attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
+	if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	return dump_class_nlmsg(cookie, t, tb);
+}
+
+static int netlink_get_class(int sock, unsigned int nl_pid, int ifindex,
+			     dump_nlmsg_t dump_class_nlmsg, void *cookie)
+{
+	struct {
+		struct nlmsghdr nlh;
+		struct tcmsg t;
+	} req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+		.nlh.nlmsg_type = RTM_GETTCLASS,
+		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		.t.tcm_family = AF_UNSPEC,
+		.t.tcm_ifindex = ifindex,
+	};
+	int seq = time(NULL);
+
+	req.nlh.nlmsg_seq = seq;
+	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
+		return -errno;
+
+	return netlink_recv(sock, nl_pid, seq, __dump_class_nlmsg,
+			    dump_class_nlmsg, cookie);
+}
+
+static int __dump_qdisc_nlmsg(struct nlmsghdr *nlh,
+			      dump_nlmsg_t dump_qdisc_nlmsg,
+			      void *cookie)
+{
+	struct nlattr *tb[TCA_MAX + 1], *attr;
+	struct tcmsg *t = NLMSG_DATA(nlh);
+	int len;
+
+	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
+	attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
+	if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	return dump_qdisc_nlmsg(cookie, t, tb);
+}
+
+static int netlink_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
+			     dump_nlmsg_t dump_qdisc_nlmsg, void *cookie)
+{
+	struct {
+		struct nlmsghdr nlh;
+		struct tcmsg t;
+	} req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+		.nlh.nlmsg_type = RTM_GETQDISC,
+		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		.t.tcm_family = AF_UNSPEC,
+		.t.tcm_ifindex = ifindex,
+	};
+	int seq = time(NULL);
+
+	req.nlh.nlmsg_seq = seq;
+	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
+		return -errno;
+
+	return netlink_recv(sock, nl_pid, seq, __dump_qdisc_nlmsg,
+			    dump_qdisc_nlmsg, cookie);
+}
+
+static int __dump_filter_nlmsg(struct nlmsghdr *nlh,
+			       dump_nlmsg_t dump_filter_nlmsg,
+			       void *cookie)
+{
+	struct nlattr *tb[TCA_MAX + 1], *attr;
+	struct tcmsg *t = NLMSG_DATA(nlh);
+	int len;
+
+	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
+	attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
+	if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	return dump_filter_nlmsg(cookie, t, tb);
+}
+
+static int netlink_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
+			      dump_nlmsg_t dump_filter_nlmsg, void *cookie)
+{
+	struct {
+		struct nlmsghdr nlh;
+		struct tcmsg t;
+	} req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+		.nlh.nlmsg_type = RTM_GETTFILTER,
+		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		.t.tcm_family = AF_UNSPEC,
+		.t.tcm_ifindex = ifindex,
+		.t.tcm_parent = handle,
+	};
+	int seq = time(NULL);
+
+	req.nlh.nlmsg_seq = seq;
+	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
+		return -errno;
+
+	return netlink_recv(sock, nl_pid, seq, __dump_filter_nlmsg,
+			    dump_filter_nlmsg, cookie);
+}
+
+static int __dump_link_nlmsg(struct nlmsghdr *nlh,
+			     dump_nlmsg_t dump_link_nlmsg, void *cookie)
+{
+	struct nlattr *tb[IFLA_MAX + 1], *attr;
+	struct ifinfomsg *ifi = NLMSG_DATA(nlh);
+	int len;
+
+	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
+	attr = (struct nlattr *) ((void *) ifi + NLMSG_ALIGN(sizeof(*ifi)));
+	if (libbpf_nla_parse(tb, IFLA_MAX, attr, len, NULL) != 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	return dump_link_nlmsg(cookie, ifi, tb);
+}
+
+static int netlink_get_link(int sock, unsigned int nl_pid,
+			    dump_nlmsg_t dump_link_nlmsg, void *cookie)
+{
+	struct {
+		struct nlmsghdr nlh;
+		struct ifinfomsg ifm;
+	} req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+		.nlh.nlmsg_type = RTM_GETLINK,
+		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		.ifm.ifi_family = AF_PACKET,
+	};
+	int seq = time(NULL);
+
+	req.nlh.nlmsg_seq = seq;
+	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
+		return -errno;
+
+	return netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
+			    dump_link_nlmsg, cookie);
 }
 
 static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
···
 	tcinfo.array_len = 0;
 
 	tcinfo.is_qdisc = false;
-	ret = libbpf_nl_get_class(sock, nl_pid, dev->ifindex,
-				  dump_class_qdisc_nlmsg, &tcinfo);
+	ret = netlink_get_class(sock, nl_pid, dev->ifindex,
+				dump_class_qdisc_nlmsg, &tcinfo);
 	if (ret)
 		goto out;
 
 	tcinfo.is_qdisc = true;
-	ret = libbpf_nl_get_qdisc(sock, nl_pid, dev->ifindex,
-				  dump_class_qdisc_nlmsg, &tcinfo);
+	ret = netlink_get_qdisc(sock, nl_pid, dev->ifindex,
+				dump_class_qdisc_nlmsg, &tcinfo);
 	if (ret)
 		goto out;
 
···
 	filter_info.ifindex = dev->ifindex;
 	for (i = 0; i < tcinfo.used_len; i++) {
 		filter_info.kind = tcinfo.handle_array[i].kind;
-		ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex,
-					   tcinfo.handle_array[i].handle,
-					   dump_filter_nlmsg, &filter_info);
+		ret = netlink_get_filter(sock, nl_pid, dev->ifindex,
+					 tcinfo.handle_array[i].handle,
+					 dump_filter_nlmsg, &filter_info);
 		if (ret)
 			goto out;
 	}
···
 	/* root, ingress and egress handle */
 	handle = TC_H_ROOT;
 	filter_info.kind = "root";
-	ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
-				   dump_filter_nlmsg, &filter_info);
+	ret = netlink_get_filter(sock, nl_pid, dev->ifindex, handle,
+				 dump_filter_nlmsg, &filter_info);
 	if (ret)
 		goto out;
 
 	handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
 	filter_info.kind = "clsact/ingress";
-	ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
-				   dump_filter_nlmsg, &filter_info);
+	ret = netlink_get_filter(sock, nl_pid, dev->ifindex, handle,
+				 dump_filter_nlmsg, &filter_info);
 	if (ret)
 		goto out;
 
 	handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS);
 	filter_info.kind = "clsact/egress";
-	ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
-				   dump_filter_nlmsg, &filter_info);
+	ret = netlink_get_filter(sock, nl_pid, dev->ifindex, handle,
+				 dump_filter_nlmsg, &filter_info);
 	if (ret)
 		goto out;
 
···
 	struct bpf_attach_info attach_info = {};
 	int i, sock, ret, filter_idx = -1;
 	struct bpf_netdev_t dev_array;
-	unsigned int nl_pid;
+	unsigned int nl_pid = 0;
 	char err_buf[256];
 
 	if (argc == 2) {
···
 	if (ret)
 		return -1;
 
-	sock = libbpf_netlink_open(&nl_pid);
+	sock = netlink_open(&nl_pid);
 	if (sock < 0) {
 		fprintf(stderr, "failed to open netlink sock\n");
 		return -1;
···
 	jsonw_start_array(json_wtr);
 	NET_START_OBJECT;
 	NET_START_ARRAY("xdp", "%s:\n");
-	ret = libbpf_nl_get_link(sock, nl_pid, dump_link_nlmsg, &dev_array);
+	ret = netlink_get_link(sock, nl_pid, dump_link_nlmsg, &dev_array);
 	NET_END_ARRAY("\n");
 
 	if (!ret) {
tools/bpf/resolve_btfids/main.c (+27 -2)

···
 	/*
 	 * __BTF_ID__func__vfs_truncate__0
 	 * prefix_end =  ^
+	 * pos        =    ^
 	 */
-	char *p, *id = strdup(prefix_end + sizeof("__") - 1);
+	int len = strlen(prefix_end);
+	int pos = sizeof("__") - 1;
+	char *p, *id;
 
+	if (pos >= len)
+		return NULL;
+
+	id = strdup(prefix_end + pos);
 	if (id) {
 		/*
 		 * __BTF_ID__func__vfs_truncate__0
···
 			*p = '\0';
 	}
 	return id;
+}
+
+static struct btf_id *add_set(struct object *obj, char *name)
+{
+	/*
+	 * __BTF_ID__set__name
+	 * name =    ^
+	 * id   =        ^
+	 */
+	char *id = name + sizeof(BTF_SET "__") - 1;
+	int len = strlen(name);
+
+	if (id >= name + len) {
+		pr_err("FAILED to parse set name: %s\n", name);
+		return NULL;
+	}
+
+	return btf_id__add(&obj->sets, id, true);
 }
 
 static struct btf_id *add_symbol(struct rb_root *root, char *name, size_t size)
···
 		id = add_symbol(&obj->funcs, prefix, sizeof(BTF_FUNC) - 1);
 	/* set */
 	} else if (!strncmp(prefix, BTF_SET, sizeof(BTF_SET) - 1)) {
-		id = add_symbol(&obj->sets, prefix, sizeof(BTF_SET) - 1);
+		id = add_set(obj, prefix);
 		/*
 		 * SET objects store list's count, which is encoded
 		 * in symbol's size, together with 'cnt' field hence
tools/build/Makefile (+2)

···
 	$(call QUIET_CLEAN, fixdep)
 	$(Q)find $(if $(OUTPUT),$(OUTPUT),.) -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
 	$(Q)rm -f $(OUTPUT)fixdep
+	$(call QUIET_CLEAN, feature-detect)
+	$(Q)$(MAKE) -C feature/ clean >/dev/null
 
 $(OUTPUT)fixdep-in.o: FORCE
 	$(Q)$(MAKE) $(build)=fixdep
tools/build/Makefile.feature (-1)

···
         libelf-getphdrnum               \
         libelf-gelf_getnote             \
         libelf-getshdrstrndx            \
-        libelf-mmap                     \
         libnuma                         \
         numa_num_possible_cpus          \
         libperl                         \
tools/build/feature/Makefile (-4)

···
          test-libelf-getphdrnum.bin             \
          test-libelf-gelf_getnote.bin           \
          test-libelf-getshdrstrndx.bin          \
-         test-libelf-mmap.bin                   \
          test-libdebuginfod.bin                 \
          test-libnuma.bin                       \
          test-numa_num_possible_cpus.bin        \
···
 
 $(OUTPUT)test-dwarf_getlocations.bin:
 	$(BUILD) $(DWARFLIBS)
-
-$(OUTPUT)test-libelf-mmap.bin:
-	$(BUILD) -lelf
 
 $(OUTPUT)test-libelf-getphdrnum.bin:
 	$(BUILD) -lelf
tools/build/feature/test-all.c (-4)

···
 # include "test-libelf.c"
 #undef main
 
-#define main main_test_libelf_mmap
-# include "test-libelf-mmap.c"
-#undef main
-
 #define main main_test_get_current_dir_name
 # include "test-get_current_dir_name.c"
 #undef main
tools/build/feature/test-libelf-mmap.c (-9, file deleted)

···
-// SPDX-License-Identifier: GPL-2.0
-#include <libelf.h>
-
-int main(void)
-{
-	Elf *elf = elf_begin(0, ELF_C_READ_MMAP, 0);
-
-	return (long)elf;
-}
tools/include/linux/btf_ids.h (+50 -1)

···
 #ifndef _LINUX_BTF_IDS_H
 #define _LINUX_BTF_IDS_H
 
+struct btf_id_set {
+	u32 cnt;
+	u32 ids[];
+};
+
 #ifdef CONFIG_DEBUG_INFO_BTF
 
 #include <linux/compiler.h> /* for __PASTE */
···
 ".pushsection " BTF_IDS_SECTION ",\"a\";       \n"	\
 "." #scope " " #name ";                        \n"	\
 #name ":;                                      \n"	\
-".popsection;                                  \n");	\
+".popsection;                                  \n");
 
 #define BTF_ID_LIST(name)				\
 __BTF_ID_LIST(name, local)				\
···
 ".zero 4                                       \n"	\
 ".popsection;                                  \n");
 
+/*
+ * The BTF_SET_START/END macros pair defines sorted list of
+ * BTF IDs plus its members count, with following layout:
+ *
+ * BTF_SET_START(list)
+ * BTF_ID(type1, name1)
+ * BTF_ID(type2, name2)
+ * BTF_SET_END(list)
+ *
+ * __BTF_ID__set__list:
+ * .zero 4
+ * list:
+ * __BTF_ID__type1__name1__3:
+ * .zero 4
+ * __BTF_ID__type2__name2__4:
+ * .zero 4
+ *
+ */
+#define __BTF_SET_START(name, scope)			\
+asm(							\
+".pushsection " BTF_IDS_SECTION ",\"a\";       \n"	\
+"." #scope " __BTF_ID__set__" #name ";         \n"	\
+"__BTF_ID__set__" #name ":;                    \n"	\
+".zero 4                                       \n"	\
+".popsection;                                  \n");
+
+#define BTF_SET_START(name)				\
+__BTF_ID_LIST(name, local)				\
+__BTF_SET_START(name, local)
+
+#define BTF_SET_START_GLOBAL(name)			\
+__BTF_ID_LIST(name, globl)				\
+__BTF_SET_START(name, globl)
+
+#define BTF_SET_END(name)				\
+asm(							\
+".pushsection " BTF_IDS_SECTION ",\"a\";      \n"	\
+".size __BTF_ID__set__" #name ", .-" #name "  \n"	\
+".popsection;                                 \n");	\
+extern struct btf_id_set name;
+
 #else
 
 #define BTF_ID_LIST(name) static u32 name[5];
 #define BTF_ID(prefix, name)
 #define BTF_ID_UNUSED
 #define BTF_ID_LIST_GLOBAL(name) u32 name[1];
+#define BTF_SET_START(name) static struct btf_id_set name = { 0 };
+#define BTF_SET_START_GLOBAL(name) static struct btf_id_set name = { 0 };
+#define BTF_SET_END(name)
 
 #endif /* CONFIG_DEBUG_INFO_BTF */
 
tools/include/uapi/linux/bpf.h (+393 -5)

···
 	BPF_MAP_TYPE_DEVMAP_HASH,
 	BPF_MAP_TYPE_STRUCT_OPS,
 	BPF_MAP_TYPE_RINGBUF,
+	BPF_MAP_TYPE_INODE_STORAGE,
 };
 
 /* Note that tracing related programs such as
···
 
 /* The verifier internal test flag. Behavior is undefined */
 #define BPF_F_TEST_STATE_FREQ	(1U << 3)
+
+/* If BPF_F_SLEEPABLE is used in BPF_PROG_LOAD command, the verifier will
+ * restrict map and helper usage for such programs. Sleepable BPF programs can
+ * only be attached to hooks where kernel execution context allows sleeping.
+ * Such programs are allowed to use helpers that may sleep like
+ * bpf_copy_from_user().
+ */
+#define BPF_F_SLEEPABLE		(1U << 4)
 
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * two extensions:
···
  *
  *		**-ERANGE** if resulting value was out of range.
  *
- * void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags)
+ * void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags)
  *	Description
  *		Get a bpf-local-storage from a *sk*.
  *
···
  *		"type". The bpf-local-storage "type" (i.e. the *map*) is
  *		searched against all bpf-local-storages residing at *sk*.
  *
+ *		*sk* is a kernel **struct sock** pointer for LSM program.
+ *		*sk* is a **struct bpf_sock** pointer for other program types.
+ *
  *		An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be
  *		used such that a new bpf-local-storage will be
  *		created if one does not exist. *value* can be used
···
  *		**NULL** if not found or there was an error in adding
  *		a new bpf-local-storage.
  *
- * long bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk)
+ * long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
  *	Description
  *		Delete a bpf-local-storage from a *sk*.
  *	Return
···
  *		A non-negative value equal to or less than *size* on success,
  *		or a negative error in case of failure.
  *
+ * long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res, u32 len, u64 flags)
+ *	Description
+ *		Load header option. Support reading a particular TCP header
+ *		option for bpf program (BPF_PROG_TYPE_SOCK_OPS).
+ *
+ *		If *flags* is 0, it will search the option from the
+ *		sock_ops->skb_data. The comment in "struct bpf_sock_ops"
+ *		has details on what skb_data contains under different
+ *		sock_ops->op.
+ *
+ *		The first byte of the *searchby_res* specifies the
+ *		kind that it wants to search.
+ *
+ *		If the searching kind is an experimental kind
+ *		(i.e. 253 or 254 according to RFC6994). It also
+ *		needs to specify the "magic" which is either
+ *		2 bytes or 4 bytes. It then also needs to
+ *		specify the size of the magic by using
+ *		the 2nd byte which is "kind-length" of a TCP
+ *		header option and the "kind-length" also
+ *		includes the first 2 bytes "kind" and "kind-length"
+ *		itself as a normal TCP header option also does.
+ *
+ *		For example, to search experimental kind 254 with
+ *		2 byte magic 0xeB9F, the searchby_res should be
+ *		[ 254, 4, 0xeB, 0x9F, 0, 0, .... 0 ].
+ *
+ *		To search for the standard window scale option (3),
+ *		the searchby_res should be [ 3, 0, 0, .... 0 ].
+ *		Note, kind-length must be 0 for regular option.
+ *
+ *		Searching for No-Op (0) and End-of-Option-List (1) are
+ *		not supported.
+ *
+ *		*len* must be at least 2 bytes which is the minimal size
+ *		of a header option.
+ *
+ *		Supported flags:
+ *		* **BPF_LOAD_HDR_OPT_TCP_SYN** to search from the
+ *		  saved_syn packet or the just-received syn packet.
+ *
+ *	Return
+ *		>0 when found, the header option is copied to *searchby_res*.
+ *		The return value is the total length copied.
+ *
+ *		**-EINVAL** If param is invalid
+ *
+ *		**-ENOMSG** The option is not found
+ *
+ *		**-ENOENT** No syn packet available when
+ *			    **BPF_LOAD_HDR_OPT_TCP_SYN** is used
+ *
+ *		**-ENOSPC** Not enough space. Only *len* number of
+ *			    bytes are copied.
+ *
+ *		**-EFAULT** Cannot parse the header options in the packet
+ *
+ *		**-EPERM** This helper cannot be used under the
+ *			   current sock_ops->op.
+ *
+ * long bpf_store_hdr_opt(struct bpf_sock_ops *skops, const void *from, u32 len, u64 flags)
+ *	Description
+ *		Store header option. The data will be copied
+ *		from buffer *from* with length *len* to the TCP header.
+ *
+ *		The buffer *from* should have the whole option that
+ *		includes the kind, kind-length, and the actual
+ *		option data. The *len* must be at least kind-length
+ *		long. The kind-length does not have to be 4 byte
+ *		aligned. The kernel will take care of the padding
+ *		and setting the 4 bytes aligned value to th->doff.
+ *
+ *		This helper will check for duplicated option
+ *		by searching the same option in the outgoing skb.
+ *
+ *		This helper can only be called during
+ *		BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
+ *
+ *	Return
+ *		0 on success, or negative error in case of failure:
+ *
+ *		**-EINVAL** If param is invalid
+ *
+ *		**-ENOSPC** Not enough space in the header.
+ *			    Nothing has been written
+ *
+ *		**-EEXIST** The option has already existed
+ *
+ *		**-EFAULT** Cannot parse the existing header options
+ *
+ *		**-EPERM** This helper cannot be used under the
+ *			   current sock_ops->op.
+ *
+ * long bpf_reserve_hdr_opt(struct bpf_sock_ops *skops, u32 len, u64 flags)
+ *	Description
+ *		Reserve *len* bytes for the bpf header option. The
+ *		space will be used by bpf_store_hdr_opt() later in
+ *		BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
+ *
+ *		If bpf_reserve_hdr_opt() is called multiple times,
+ *		the total number of bytes will be reserved.
+ *
+ *		This helper can only be called during
+ *		BPF_SOCK_OPS_HDR_OPT_LEN_CB.
+ *
+ *	Return
+ *		0 on success, or negative error in case of failure:
+ *
+ *		**-EINVAL** if param is invalid
+ *
+ *		**-ENOSPC** Not enough space in the header.
+ *
+ *		**-EPERM** This helper cannot be used under the
+ *			   current sock_ops->op.
+ *
+ * void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags)
+ *	Description
+ *		Get a bpf_local_storage from an *inode*.
+ *
+ *		Logically, it could be thought of as getting the value from
+ *		a *map* with *inode* as the **key**. From this
+ *		perspective, the usage is not much different from
+ *		**bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this
+ *		helper enforces the key must be an inode and the map must also
+ *		be a **BPF_MAP_TYPE_INODE_STORAGE**.
+ *
+ *		Underneath, the value is stored locally at *inode* instead of
+ *		the *map*. The *map* is used as the bpf-local-storage
+ *		"type". The bpf-local-storage "type" (i.e. the *map*) is
+ *		searched against all bpf_local_storage residing at *inode*.
3528 + * 3529 + * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be 3530 + * used such that a new bpf_local_storage will be 3531 + * created if one does not exist. *value* can be used 3532 + * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify 3533 + * the initial value of a bpf_local_storage. If *value* is 3534 + * **NULL**, the new bpf_local_storage will be zero initialized. 3535 + * Return 3536 + * A bpf_local_storage pointer is returned on success. 3537 + * 3538 + * **NULL** if not found or there was an error in adding 3539 + * a new bpf_local_storage. 3540 + * 3541 + * int bpf_inode_storage_delete(struct bpf_map *map, void *inode) 3542 + * Description 3543 + * Delete a bpf_local_storage from an *inode*. 3544 + * Return 3545 + * 0 on success. 3546 + * 3547 + * **-ENOENT** if the bpf_local_storage cannot be found. 3548 + * 3549 + * long bpf_d_path(struct path *path, char *buf, u32 sz) 3550 + * Description 3551 + * Return full path for given 'struct path' object, which 3552 + * needs to be the kernel BTF 'path' object. The path is 3553 + * returned in the provided buffer 'buf' of size 'sz' and 3554 + * is zero terminated. 3555 + * 3556 + * Return 3557 + * On success, the strictly positive length of the string, 3558 + * including the trailing NUL character. On error, a negative 3559 + * value. 3560 + * 3561 + * long bpf_copy_from_user(void *dst, u32 size, const void *user_ptr) 3562 + * Description 3563 + * Read *size* bytes from user space address *user_ptr* and store 3564 + * the data in *dst*. This is a wrapper of copy_from_user(). 3565 + * Return 3566 + * 0 on success, or a negative error in case of failure. 
3410 3567 */ 3411 3568 #define __BPF_FUNC_MAPPER(FN) \ 3412 3569 FN(unspec), \ ··· 3720 3539 FN(skc_to_tcp_request_sock), \ 3721 3540 FN(skc_to_udp6_sock), \ 3722 3541 FN(get_task_stack), \ 3542 + FN(load_hdr_opt), \ 3543 + FN(store_hdr_opt), \ 3544 + FN(reserve_hdr_opt), \ 3545 + FN(inode_storage_get), \ 3546 + FN(inode_storage_delete), \ 3547 + FN(d_path), \ 3548 + FN(copy_from_user), \ 3723 3549 /* */ 3724 3550 3725 3551 /* integer value in 'imm' field of BPF_CALL instruction selects which helper ··· 3836 3648 BPF_F_SYSCTL_BASE_NAME = (1ULL << 0), 3837 3649 }; 3838 3650 3839 - /* BPF_FUNC_sk_storage_get flags */ 3651 + /* BPF_FUNC_<kernel_obj>_storage_get flags */ 3840 3652 enum { 3841 - BPF_SK_STORAGE_GET_F_CREATE = (1ULL << 0), 3653 + BPF_LOCAL_STORAGE_GET_F_CREATE = (1ULL << 0), 3654 + /* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility 3655 + * and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead. 3656 + */ 3657 + BPF_SK_STORAGE_GET_F_CREATE = BPF_LOCAL_STORAGE_GET_F_CREATE, 3842 3658 }; 3843 3659 3844 3660 /* BPF_FUNC_read_branch_records flags. */ ··· 4263 4071 __u64 cgroup_id; 4264 4072 __u32 attach_type; 4265 4073 } cgroup; 4074 + struct { 4075 + __aligned_u64 target_name; /* in/out: target_name buffer ptr */ 4076 + __u32 target_name_len; /* in/out: target_name buffer len */ 4077 + union { 4078 + struct { 4079 + __u32 map_id; 4080 + } map; 4081 + }; 4082 + } iter; 4266 4083 struct { 4267 4084 __u32 netns_ino; 4268 4085 __u32 attach_type; ··· 4359 4158 __u64 bytes_received; 4360 4159 __u64 bytes_acked; 4361 4160 __bpf_md_ptr(struct bpf_sock *, sk); 4161 + /* [skb_data, skb_data_end) covers the whole TCP header. 4162 + * 4163 + * BPF_SOCK_OPS_PARSE_HDR_OPT_CB: The packet received 4164 + * BPF_SOCK_OPS_HDR_OPT_LEN_CB: Not useful because the 4165 + * header has not been written. 4166 + * BPF_SOCK_OPS_WRITE_HDR_OPT_CB: The header and options have 4167 + * been written so far. 
4168 + * BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: The SYNACK that concludes 4169 + * the 3WHS. 4170 + * BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: The ACK that concludes 4171 + * the 3WHS. 4172 + * 4173 + * bpf_load_hdr_opt() can also be used to read a particular option. 4174 + */ 4175 + __bpf_md_ptr(void *, skb_data); 4176 + __bpf_md_ptr(void *, skb_data_end); 4177 + __u32 skb_len; /* The total length of a packet. 4178 + * It includes the header, options, 4179 + * and payload. 4180 + */ 4181 + __u32 skb_tcp_flags; /* tcp_flags of the header. It provides 4182 + * an easy way to check for tcp_flags 4183 + * without parsing skb_data. 4184 + * 4185 + * In particular, the skb_tcp_flags 4186 + * will still be available in 4187 + * BPF_SOCK_OPS_HDR_OPT_LEN even though 4188 + * the outgoing header has not 4189 + * been written yet. 4190 + */ 4362 4191 }; 4363 4192 4364 4193 /* Definitions for bpf_sock_ops_cb_flags */ ··· 4397 4166 BPF_SOCK_OPS_RETRANS_CB_FLAG = (1<<1), 4398 4167 BPF_SOCK_OPS_STATE_CB_FLAG = (1<<2), 4399 4168 BPF_SOCK_OPS_RTT_CB_FLAG = (1<<3), 4169 + /* Call bpf for all received TCP headers. The bpf prog will be 4170 + * called under sock_ops->op == BPF_SOCK_OPS_PARSE_HDR_OPT_CB 4171 + * 4172 + * Please refer to the comment in BPF_SOCK_OPS_PARSE_HDR_OPT_CB 4173 + * for the header option related helpers that will be useful 4174 + * to the bpf programs. 4175 + * 4176 + * It could be used at the client/active side (i.e. connect() side) 4177 + * when the server told it that the server was in syncookie 4178 + * mode and required the active side to resend the bpf-written 4179 + * options. The active side can keep writing the bpf-options until 4180 + * it received a valid packet from the server side to confirm 4181 + * the earlier packet (and options) has been received. The later 4182 + * example patch is using it like this at the active side when the 4183 + * server is in syncookie mode. 4184 + * 4185 + * The bpf prog will usually turn this off in the common cases. 
4186 + */ 4187 + BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG = (1<<4), 4188 + /* Call bpf when kernel has received a header option that 4189 + * the kernel cannot handle. The bpf prog will be called under 4190 + * sock_ops->op == BPF_SOCK_OPS_PARSE_HDR_OPT_CB. 4191 + * 4192 + * Please refer to the comment in BPF_SOCK_OPS_PARSE_HDR_OPT_CB 4193 + * for the header option related helpers that will be useful 4194 + * to the bpf programs. 4195 + */ 4196 + BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG = (1<<5), 4197 + /* Call bpf when the kernel is writing header options for the 4198 + * outgoing packet. The bpf prog will first be called 4199 + * to reserve space in a skb under 4200 + * sock_ops->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB. Then 4201 + * the bpf prog will be called to write the header option(s) 4202 + * under sock_ops->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB. 4203 + * 4204 + * Please refer to the comment in BPF_SOCK_OPS_HDR_OPT_LEN_CB 4205 + * and BPF_SOCK_OPS_WRITE_HDR_OPT_CB for the header option 4206 + * related helpers that will be useful to the bpf programs. 4207 + * 4208 + * The kernel gets its chance to reserve space and write 4209 + * options first before the BPF program does. 4210 + */ 4211 + BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG = (1<<6), 4400 4212 /* Mask of all currently supported cb flags */ 4401 - BPF_SOCK_OPS_ALL_CB_FLAGS = 0xF, 4213 + BPF_SOCK_OPS_ALL_CB_FLAGS = 0x7F, 4402 4214 }; 4403 4215 4404 4216 /* List of known BPF sock_ops operators. ··· 4497 4223 */ 4498 4224 BPF_SOCK_OPS_RTT_CB, /* Called on every RTT. 4499 4225 */ 4226 + BPF_SOCK_OPS_PARSE_HDR_OPT_CB, /* Parse the header option. 4227 + * It will be called to handle 4228 + * the packets received at 4229 + * an already established 4230 + * connection. 4231 + * 4232 + * sock_ops->skb_data: 4233 + * Referring to the received skb. 4234 + * It covers the TCP header only. 4235 + * 4236 + * bpf_load_hdr_opt() can also 4237 + * be used to search for a 4238 + * particular option. 
4239 + */ 4240 + BPF_SOCK_OPS_HDR_OPT_LEN_CB, /* Reserve space for writing the 4241 + * header option later in 4242 + * BPF_SOCK_OPS_WRITE_HDR_OPT_CB. 4243 + * Arg1: bool want_cookie. (in 4244 + * writing SYNACK only) 4245 + * 4246 + * sock_ops->skb_data: 4247 + * Not available because no header has 4248 + * been written yet. 4249 + * 4250 + * sock_ops->skb_tcp_flags: 4251 + * The tcp_flags of the 4252 + * outgoing skb. (e.g. SYN, ACK, FIN). 4253 + * 4254 + * bpf_reserve_hdr_opt() should 4255 + * be used to reserve space. 4256 + */ 4257 + BPF_SOCK_OPS_WRITE_HDR_OPT_CB, /* Write the header options 4258 + * Arg1: bool want_cookie. (in 4259 + * writing SYNACK only) 4260 + * 4261 + * sock_ops->skb_data: 4262 + * Referring to the outgoing skb. 4263 + * It covers the TCP header 4264 + * that has already been written 4265 + * by the kernel and the 4266 + * earlier bpf-progs. 4267 + * 4268 + * sock_ops->skb_tcp_flags: 4269 + * The tcp_flags of the outgoing 4270 + * skb. (e.g. SYN, ACK, FIN). 4271 + * 4272 + * bpf_store_hdr_opt() should 4273 + * be used to write the 4274 + * option. 4275 + * 4276 + * bpf_load_hdr_opt() can also 4277 + * be used to search for a 4278 + * particular option that 4279 + * has already been written 4280 + * by the kernel or the 4281 + * earlier bpf-progs. 4282 + */ 4500 4283 }; 4501 4284 4502 4285 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect ··· 4581 4250 enum { 4582 4251 TCP_BPF_IW = 1001, /* Set TCP initial congestion window */ 4583 4252 TCP_BPF_SNDCWND_CLAMP = 1002, /* Set sndcwnd_clamp */ 4253 + TCP_BPF_DELACK_MAX = 1003, /* Max delay ack in usecs */ 4254 + TCP_BPF_RTO_MIN = 1004, /* Min delay ack in usecs */ 4255 + /* Copy the SYN pkt to optval 4256 + * 4257 + * BPF_PROG_TYPE_SOCK_OPS only. It is similar to the 4258 + * bpf_getsockopt(TCP_SAVED_SYN) but it does not limit 4259 + * to only getting from the saved_syn. It can either get the 4260 + * syn packet from: 4261 + * 4262 + * 1. 
the just-received SYN packet (only available when writing the 4263 + * SYNACK). It will be useful when it is not necessary to 4264 + * save the SYN packet for latter use. It is also the only way 4265 + * to get the SYN during syncookie mode because the syn 4266 + * packet cannot be saved during syncookie. 4267 + * 4268 + * OR 4269 + * 4270 + * 2. the earlier saved syn which was done by 4271 + * bpf_setsockopt(TCP_SAVE_SYN). 4272 + * 4273 + * The bpf_getsockopt(TCP_BPF_SYN*) option will hide where the 4274 + * SYN packet is obtained. 4275 + * 4276 + * If the bpf-prog does not need the IP[46] header, the 4277 + * bpf-prog can avoid parsing the IP header by using 4278 + * TCP_BPF_SYN. Otherwise, the bpf-prog can get both 4279 + * IP[46] and TCP header by using TCP_BPF_SYN_IP. 4280 + * 4281 + * >0: Total number of bytes copied 4282 + * -ENOSPC: Not enough space in optval. Only optlen number of 4283 + * bytes is copied. 4284 + * -ENOENT: The SYN skb is not available now and the earlier SYN pkt 4285 + * is not saved by setsockopt(TCP_SAVE_SYN). 4286 + */ 4287 + TCP_BPF_SYN = 1005, /* Copy the TCP header */ 4288 + TCP_BPF_SYN_IP = 1006, /* Copy the IP[46] and TCP header */ 4289 + TCP_BPF_SYN_MAC = 1007, /* Copy the MAC, IP[46], and TCP header */ 4290 + }; 4291 + 4292 + enum { 4293 + BPF_LOAD_HDR_OPT_TCP_SYN = (1ULL << 0), 4294 + }; 4295 + 4296 + /* args[0] value during BPF_SOCK_OPS_HDR_OPT_LEN_CB and 4297 + * BPF_SOCK_OPS_WRITE_HDR_OPT_CB. 4298 + */ 4299 + enum { 4300 + BPF_WRITE_HDR_TCP_CURRENT_MSS = 1, /* Kernel is finding the 4301 + * total option spaces 4302 + * required for an established 4303 + * sk in order to calculate the 4304 + * MSS. No skb is actually 4305 + * sent. 4306 + */ 4307 + BPF_WRITE_HDR_TCP_SYNACK_COOKIE = 2, /* Kernel is in syncookie mode 4308 + * when sending a SYN. 4309 + */ 4584 4310 }; 4585 4311 4586 4312 struct bpf_perf_event_value {
+9 -14
tools/lib/bpf/Makefile
···
 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
 # Most of this file is copied from tools/lib/traceevent/Makefile
 
+RM ?= rm
+srctree = $(abs_srctree)
+
 LIBBPF_VERSION := $(shell \
 	grep -oE '^LIBBPF_([0-9.]+)' libbpf.map | \
 	sort -rV | head -n1 | cut -d'_' -f2)
···
 endif
 
 FEATURE_USER = .libbpf
-FEATURE_TESTS = libelf libelf-mmap zlib bpf reallocarray
+FEATURE_TESTS = libelf zlib bpf
 FEATURE_DISPLAY = libelf zlib bpf
 
 INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi
···
 	CFLAGS := -g -Wall
 endif
 
-ifeq ($(feature-libelf-mmap), 1)
-  override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
-endif
-
-ifeq ($(feature-reallocarray), 0)
-  override CFLAGS += -DCOMPAT_NEED_REALLOCARRAY
-endif
-
 # Append required CFLAGS
-override CFLAGS += $(EXTRA_WARNINGS)
+override CFLAGS += $(EXTRA_WARNINGS) -Wno-switch-enum
 override CFLAGS += -Werror -Wall
 override CFLAGS += -fPIC
 override CFLAGS += $(INCLUDES)
···
 	@ln -sf $(@F) $(OUTPUT)libbpf.so.$(LIBBPF_MAJOR_VERSION)
 
 $(OUTPUT)libbpf.a: $(BPF_IN_STATIC)
-	$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
+	$(QUIET_LINK)$(RM) -f $@; $(AR) rcs $@ $^
 
 $(OUTPUT)libbpf.pc:
 	$(QUIET_GEN)sed -e "s|@PREFIX@|$(prefix)|" \
···
 ### Cleaning rules
 
 config-clean:
-	$(call QUIET_CLEAN, config)
+	$(call QUIET_CLEAN, feature-detect)
 	$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null
 
-clean:
+clean: config-clean
 	$(call QUIET_CLEAN, libbpf) $(RM) -rf $(CMD_TARGETS) \
 		*~ .*.d .*.cmd LIBBPF-CFLAGS $(BPF_HELPER_DEFS) \
 		$(SHARED_OBJDIR) $(STATIC_OBJDIR) \
···
 	cscope -b -q -I $(srctree)/include -f cscope.out
 
 tags:
-	rm -f TAGS tags
+	$(RM) -f TAGS tags
 	ls *.c *.h | xargs $(TAGS_PROG) -a
 
 # Declare the contents of the .PHONY variable as phony. We keep that
-3
tools/lib/bpf/bpf.c
···
 #include "libbpf.h"
 #include "libbpf_internal.h"
 
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
-
 /*
  * When building perf, unistd.h is overridden. __NR_bpf is
  * required to be defined explicitly.
+101 -19
tools/lib/bpf/bpf_core_read.h
··· 19 19 BPF_FIELD_RSHIFT_U64 = 5, 20 20 }; 21 21 22 + /* second argument to __builtin_btf_type_id() built-in */ 23 + enum bpf_type_id_kind { 24 + BPF_TYPE_ID_LOCAL = 0, /* BTF type ID in local program */ 25 + BPF_TYPE_ID_TARGET = 1, /* BTF type ID in target kernel */ 26 + }; 27 + 28 + /* second argument to __builtin_preserve_type_info() built-in */ 29 + enum bpf_type_info_kind { 30 + BPF_TYPE_EXISTS = 0, /* type existence in target kernel */ 31 + BPF_TYPE_SIZE = 1, /* type size in target kernel */ 32 + }; 33 + 34 + /* second argument to __builtin_preserve_enum_value() built-in */ 35 + enum bpf_enum_value_kind { 36 + BPF_ENUMVAL_EXISTS = 0, /* enum value existence in kernel */ 37 + BPF_ENUMVAL_VALUE = 1, /* enum value value relocation */ 38 + }; 39 + 22 40 #define __CORE_RELO(src, field, info) \ 23 41 __builtin_preserve_field_info((src)->field, BPF_FIELD_##info) 24 42 25 43 #if __BYTE_ORDER == __LITTLE_ENDIAN 26 44 #define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \ 27 - bpf_probe_read((void *)dst, \ 28 - __CORE_RELO(src, fld, BYTE_SIZE), \ 29 - (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) 45 + bpf_probe_read_kernel( \ 46 + (void *)dst, \ 47 + __CORE_RELO(src, fld, BYTE_SIZE), \ 48 + (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) 30 49 #else 31 50 /* semantics of LSHIFT_64 assumes loading values into low-ordered bytes, so 32 51 * for big-endian we need to adjust destination pointer accordingly, based on 33 52 * field byte size 34 53 */ 35 54 #define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \ 36 - bpf_probe_read((void *)dst + (8 - __CORE_RELO(src, fld, BYTE_SIZE)), \ 37 - __CORE_RELO(src, fld, BYTE_SIZE), \ 38 - (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) 55 + bpf_probe_read_kernel( \ 56 + (void *)dst + (8 - __CORE_RELO(src, fld, BYTE_SIZE)), \ 57 + __CORE_RELO(src, fld, BYTE_SIZE), \ 58 + (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) 39 59 #endif 40 60 41 61 /* 42 62 * Extract bitfield, identified by s->field, and return 
its value as u64. 43 63 * All this is done in relocatable manner, so bitfield changes such as 44 64 * signedness, bit size, offset changes, this will be handled automatically. 45 - * This version of macro is using bpf_probe_read() to read underlying integer 46 - * storage. Macro functions as an expression and its return type is 47 - * bpf_probe_read()'s return value: 0, on success, <0 on error. 65 + * This version of macro is using bpf_probe_read_kernel() to read underlying 66 + * integer storage. Macro functions as an expression and its return type is 67 + * bpf_probe_read_kernel()'s return value: 0, on success, <0 on error. 48 68 */ 49 69 #define BPF_CORE_READ_BITFIELD_PROBED(s, field) ({ \ 50 70 unsigned long long val = 0; \ ··· 112 92 __builtin_preserve_field_info(field, BPF_FIELD_EXISTS) 113 93 114 94 /* 115 - * Convenience macro to get byte size of a field. Works for integers, 95 + * Convenience macro to get the byte size of a field. Works for integers, 116 96 * struct/unions, pointers, arrays, and enums. 117 97 */ 118 98 #define bpf_core_field_size(field) \ 119 99 __builtin_preserve_field_info(field, BPF_FIELD_BYTE_SIZE) 120 100 121 101 /* 122 - * bpf_core_read() abstracts away bpf_probe_read() call and captures offset 123 - * relocation for source address using __builtin_preserve_access_index() 102 + * Convenience macro to get BTF type ID of a specified type, using a local BTF 103 + * information. Return 32-bit unsigned integer with type ID from program's own 104 + * BTF. Always succeeds. 105 + */ 106 + #define bpf_core_type_id_local(type) \ 107 + __builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_LOCAL) 108 + 109 + /* 110 + * Convenience macro to get BTF type ID of a target kernel's type that matches 111 + * specified local type. 112 + * Returns: 113 + * - valid 32-bit unsigned type ID in kernel BTF; 114 + * - 0, if no matching type was found in a target kernel BTF. 
115 + */ 116 + #define bpf_core_type_id_kernel(type) \ 117 + __builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_TARGET) 118 + 119 + /* 120 + * Convenience macro to check that provided named type 121 + * (struct/union/enum/typedef) exists in a target kernel. 122 + * Returns: 123 + * 1, if such type is present in target kernel's BTF; 124 + * 0, if no matching type is found. 125 + */ 126 + #define bpf_core_type_exists(type) \ 127 + __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_EXISTS) 128 + 129 + /* 130 + * Convenience macro to get the byte size of a provided named type 131 + * (struct/union/enum/typedef) in a target kernel. 132 + * Returns: 133 + * >= 0 size (in bytes), if type is present in target kernel's BTF; 134 + * 0, if no matching type is found. 135 + */ 136 + #define bpf_core_type_size(type) \ 137 + __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_SIZE) 138 + 139 + /* 140 + * Convenience macro to check that provided enumerator value is defined in 141 + * a target kernel. 142 + * Returns: 143 + * 1, if specified enum type and its enumerator value are present in target 144 + * kernel's BTF; 145 + * 0, if no matching enum and/or enum value within that enum is found. 146 + */ 147 + #define bpf_core_enum_value_exists(enum_type, enum_value) \ 148 + __builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_EXISTS) 149 + 150 + /* 151 + * Convenience macro to get the integer value of an enumerator value in 152 + * a target kernel. 153 + * Returns: 154 + * 64-bit value, if specified enum type and its enumerator value are 155 + * present in target kernel's BTF; 156 + * 0, if no matching enum and/or enum value within that enum is found. 
157 + */ 158 + #define bpf_core_enum_value(enum_type, enum_value) \ 159 + __builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_VALUE) 160 + 161 + /* 162 + * bpf_core_read() abstracts away bpf_probe_read_kernel() call and captures 163 + * offset relocation for source address using __builtin_preserve_access_index() 124 164 * built-in, provided by Clang. 125 165 * 126 166 * __builtin_preserve_access_index() takes as an argument an expression of ··· 195 115 * (local) BTF, used to record relocation. 196 116 */ 197 117 #define bpf_core_read(dst, sz, src) \ 198 - bpf_probe_read(dst, sz, \ 199 - (const void *)__builtin_preserve_access_index(src)) 118 + bpf_probe_read_kernel(dst, sz, \ 119 + (const void *)__builtin_preserve_access_index(src)) 200 120 201 121 /* 202 122 * bpf_core_read_str() is a thin wrapper around bpf_probe_read_str() ··· 204 124 * argument. 205 125 */ 206 126 #define bpf_core_read_str(dst, sz, src) \ 207 - bpf_probe_read_str(dst, sz, \ 208 - (const void *)__builtin_preserve_access_index(src)) 127 + bpf_probe_read_kernel_str(dst, sz, \ 128 + (const void *)__builtin_preserve_access_index(src)) 209 129 210 130 #define ___concat(a, b) a ## b 211 131 #define ___apply(fn, n) ___concat(fn, n) ··· 319 239 * int x = BPF_CORE_READ(s, a.b.c, d.e, f, g); 320 240 * 321 241 * BPF_CORE_READ will decompose above statement into 4 bpf_core_read (BPF 322 - * CO-RE relocatable bpf_probe_read() wrapper) calls, logically equivalent to: 242 + * CO-RE relocatable bpf_probe_read_kernel() wrapper) calls, logically 243 + * equivalent to: 323 244 * 1. const void *__t = s->a.b.c; 324 245 * 2. __t = __t->d.e; 325 246 * 3. __t = __t->f; 326 247 * 4. return __t->g; 327 248 * 328 249 * Equivalence is logical, because there is a heavy type casting/preservation 329 - * involved, as well as all the reads are happening through bpf_probe_read() 330 - * calls using __builtin_preserve_access_index() to emit CO-RE relocations. 
250 + * involved, as well as all the reads are happening through 251 + * bpf_probe_read_kernel() calls using __builtin_preserve_access_index() to 252 + * emit CO-RE relocations. 331 253 * 332 254 * N.B. Only up to 9 "field accessors" are supported, which should be more 333 255 * than enough for any practical purpose.
+3
tools/lib/bpf/bpf_helpers.h
···
 #ifndef __always_inline
 #define __always_inline __attribute__((always_inline))
 #endif
+#ifndef __noinline
+#define __noinline __attribute__((noinline))
+#endif
 #ifndef __weak
 #define __weak __attribute__((weak))
 #endif
-3
tools/lib/bpf/bpf_prog_linfo.c
···
 #include "libbpf.h"
 #include "libbpf_internal.h"
 
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
-
 struct bpf_prog_linfo {
 	void *raw_linfo;
 	void *raw_jited_linfo;
+2 -2
tools/lib/bpf/bpf_tracing.h
···
 #define BPF_KRETPROBE_READ_RET_IP BPF_KPROBE_READ_RET_IP
 #else
 #define BPF_KPROBE_READ_RET_IP(ip, ctx) \
-	({ bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
+	({ bpf_probe_read_kernel(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
 #define BPF_KRETPROBE_READ_RET_IP(ip, ctx) \
-	({ bpf_probe_read(&(ip), sizeof(ip), \
+	({ bpf_probe_read_kernel(&(ip), sizeof(ip), \
 			  (void *)(PT_REGS_FP(ctx) + sizeof(ip))); })
 #endif
+13 -18
tools/lib/bpf/btf.c
···
 #include "libbpf_internal.h"
 #include "hashmap.h"
 
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
-
 #define BTF_MAX_NR_TYPES 0x7fffffffU
 #define BTF_MAX_STR_OFFSET 0x7fffffffU
···
 	expand_by = max(btf->types_size >> 2, 16U);
 	new_size = min(BTF_MAX_NR_TYPES, btf->types_size + expand_by);
 
-	new_types = realloc(btf->types, sizeof(*new_types) * new_size);
+	new_types = libbpf_reallocarray(btf->types, new_size, sizeof(*new_types));
 	if (!new_types)
 		return -ENOMEM;
 
···
 	return btf_ext_setup_info(btf_ext, &param);
 }
 
-static int btf_ext_setup_field_reloc(struct btf_ext *btf_ext)
+static int btf_ext_setup_core_relos(struct btf_ext *btf_ext)
 {
 	struct btf_ext_sec_setup_param param = {
-		.off = btf_ext->hdr->field_reloc_off,
-		.len = btf_ext->hdr->field_reloc_len,
-		.min_rec_size = sizeof(struct bpf_field_reloc),
-		.ext_info = &btf_ext->field_reloc_info,
-		.desc = "field_reloc",
+		.off = btf_ext->hdr->core_relo_off,
+		.len = btf_ext->hdr->core_relo_len,
+		.min_rec_size = sizeof(struct bpf_core_relo),
+		.ext_info = &btf_ext->core_relo_info,
+		.desc = "core_relo",
 	};
 
 	return btf_ext_setup_info(btf_ext, &param);
···
 	if (err)
 		goto done;
 
-	if (btf_ext->hdr->hdr_len <
-	    offsetofend(struct btf_ext_header, field_reloc_len))
+	if (btf_ext->hdr->hdr_len < offsetofend(struct btf_ext_header, core_relo_len))
 		goto done;
-	err = btf_ext_setup_field_reloc(btf_ext);
+	err = btf_ext_setup_core_relos(btf_ext);
 	if (err)
 		goto done;
···
 	__u32 *new_list;
 
 	d->hypot_cap += max((size_t)16, d->hypot_cap / 2);
-	new_list = realloc(d->hypot_list, sizeof(__u32) * d->hypot_cap);
+	new_list = libbpf_reallocarray(d->hypot_list, d->hypot_cap, sizeof(__u32));
 	if (!new_list)
 		return -ENOMEM;
 	d->hypot_list = new_list;
···
 	struct btf_str_ptr *new_ptrs;
 
 	strs.cap += max(strs.cnt / 2, 16U);
-	new_ptrs = realloc(strs.ptrs,
-			   sizeof(strs.ptrs[0]) * strs.cap);
+	new_ptrs = libbpf_reallocarray(strs.ptrs, strs.cap, sizeof(strs.ptrs[0]));
 	if (!new_ptrs) {
 		err = -ENOMEM;
 		goto done;
···
 	d->btf->nr_types = next_type_id - 1;
 	d->btf->types_size = d->btf->nr_types;
 	d->btf->hdr->type_len = p - types_start;
-	new_types = realloc(d->btf->types,
-			    (1 + d->btf->nr_types) * sizeof(struct btf_type *));
+	new_types = libbpf_reallocarray(d->btf->types, (1 + d->btf->nr_types),
+					sizeof(struct btf_type *));
 	if (!new_types)
 		return -ENOMEM;
 	d->btf->types = new_types;
-38
tools/lib/bpf/btf.h
···
 
 struct bpf_object;
 
-/*
- * The .BTF.ext ELF section layout defined as
- *   struct btf_ext_header
- *   func_info subsection
- *
- * The func_info subsection layout:
- *   record size for struct bpf_func_info in the func_info subsection
- *   struct btf_sec_func_info for section #1
- *   a list of bpf_func_info records for section #1
- *   where struct bpf_func_info mimics one in include/uapi/linux/bpf.h
- *   but may not be identical
- *   struct btf_sec_func_info for section #2
- *   a list of bpf_func_info records for section #2
- *   ......
- *
- * Note that the bpf_func_info record size in .BTF.ext may not
- * be the same as the one defined in include/uapi/linux/bpf.h.
- * The loader should ensure that record_size meets minimum
- * requirement and pass the record as is to the kernel. The
- * kernel will handle the func_info properly based on its contents.
- */
-struct btf_ext_header {
-	__u16	magic;
-	__u8	version;
-	__u8	flags;
-	__u32	hdr_len;
-
-	/* All offsets are in bytes relative to the end of this header */
-	__u32	func_info_off;
-	__u32	func_info_len;
-	__u32	line_info_off;
-	__u32	line_info_len;
-
-	/* optional part of .BTF.ext header */
-	__u32	field_reloc_off;
-	__u32	field_reloc_len;
-};
-
 LIBBPF_API void btf__free(struct btf *btf);
 LIBBPF_API struct btf *btf__new(const void *data, __u32 size);
 LIBBPF_API struct btf *btf__parse(const char *path, struct btf_ext **btf_ext);
+2 -7
tools/lib/bpf/btf_dump.c
···
 #include "libbpf.h"
 #include "libbpf_internal.h"
 
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
-
 static const char PREFIXES[] = "\t\t\t\t\t\t\t\t\t\t\t\t\t";
 static const size_t PREFIX_CNT = sizeof(PREFIXES) - 1;
···
 
 	if (d->emit_queue_cnt >= d->emit_queue_cap) {
 		new_cap = max(16, d->emit_queue_cap * 3 / 2);
-		new_queue = realloc(d->emit_queue,
-				    new_cap * sizeof(new_queue[0]));
+		new_queue = libbpf_reallocarray(d->emit_queue, new_cap, sizeof(new_queue[0]));
 		if (!new_queue)
 			return -ENOMEM;
 		d->emit_queue = new_queue;
···
 
 	if (d->decl_stack_cnt >= d->decl_stack_cap) {
 		new_cap = max(16, d->decl_stack_cap * 3 / 2);
-		new_stack = realloc(d->decl_stack,
-				    new_cap * sizeof(new_stack[0]));
+		new_stack = libbpf_reallocarray(d->decl_stack, new_cap, sizeof(new_stack[0]));
 		if (!new_stack)
 			return -ENOMEM;
 		d->decl_stack = new_stack;
+3
tools/lib/bpf/hashmap.c
··· 15 15 /* make sure libbpf doesn't use kernel-only integer typedefs */ 16 16 #pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 17 17 18 + /* prevent accidental re-addition of reallocarray() */ 19 + #pragma GCC poison reallocarray 20 + 18 21 /* start with 4 buckets */ 19 22 #define HASHMAP_MIN_CAP_BITS 2 20 23
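The hashmap.c hunk poisons `reallocarray` so a bare libc call cannot be reintroduced by accident; any later use of a poisoned identifier in the translation unit is a hard compile-time error in GCC and Clang. A small standalone illustration (the function here is a hypothetical demo, not libbpf code):

```c
/* After this pragma, spelling any of these identifiers in code fails
 * the build with "attempt to use poisoned ..." in GCC and Clang.
 * Comments and strings are unaffected, since poisoning operates on
 * preprocessor tokens. */
#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64

static int poison_demo(void)
{
	/* u32 x = 1;  <- uncommenting this line breaks the build */
	return 1;
}
```

Note the ordering constraint this implies for libbpf itself: `libbpf_reallocarray()` must be declared before the `#pragma GCC poison reallocarray` line takes effect in files that use it.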
+1226 -501
tools/lib/bpf/libbpf.c
··· 44 44 #include <sys/vfs.h> 45 45 #include <sys/utsname.h> 46 46 #include <sys/resource.h> 47 - #include <tools/libc_compat.h> 48 47 #include <libelf.h> 49 48 #include <gelf.h> 50 49 #include <zlib.h> ··· 55 56 #include "libbpf_internal.h" 56 57 #include "hashmap.h" 57 58 58 - /* make sure libbpf doesn't use kernel-only integer typedefs */ 59 - #pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 60 - 61 59 #ifndef EM_BPF 62 60 #define EM_BPF 247 63 61 #endif ··· 62 66 #ifndef BPF_FS_MAGIC 63 67 #define BPF_FS_MAGIC 0xcafe4a11 64 68 #endif 69 + 70 + #define BPF_INSN_SZ (sizeof(struct bpf_insn)) 65 71 66 72 /* vsprintf() in __base_pr() uses nonliteral format string. It may break 67 73 * compilation if user enables corresponding warning. Disable it explicitly. ··· 152 154 ___err; }) 153 155 #endif 154 156 155 - #ifdef HAVE_LIBELF_MMAP_SUPPORT 156 - # define LIBBPF_ELF_C_READ_MMAP ELF_C_READ_MMAP 157 - #else 158 - # define LIBBPF_ELF_C_READ_MMAP ELF_C_READ 159 - #endif 160 - 161 157 static inline __u64 ptr_to_u64(const void *ptr) 162 158 { 163 159 return (__u64) (unsigned long) ptr; 164 160 } 165 161 166 - struct bpf_capabilities { 162 + enum kern_feature_id { 167 163 /* v4.14: kernel support for program & map names. */ 168 - __u32 name:1; 164 + FEAT_PROG_NAME, 169 165 /* v5.2: kernel support for global data sections. 
*/ 170 - __u32 global_data:1; 166 + FEAT_GLOBAL_DATA, 167 + /* BTF support */ 168 + FEAT_BTF, 171 169 /* BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO support */ 172 - __u32 btf_func:1; 170 + FEAT_BTF_FUNC, 173 171 /* BTF_KIND_VAR and BTF_KIND_DATASEC support */ 174 - __u32 btf_datasec:1; 175 - /* BPF_F_MMAPABLE is supported for arrays */ 176 - __u32 array_mmap:1; 172 + FEAT_BTF_DATASEC, 177 173 /* BTF_FUNC_GLOBAL is supported */ 178 - __u32 btf_func_global:1; 174 + FEAT_BTF_GLOBAL_FUNC, 175 + /* BPF_F_MMAPABLE is supported for arrays */ 176 + FEAT_ARRAY_MMAP, 179 177 /* kernel support for expected_attach_type in BPF_PROG_LOAD */ 180 - __u32 exp_attach_type:1; 178 + FEAT_EXP_ATTACH_TYPE, 179 + /* bpf_probe_read_{kernel,user}[_str] helpers */ 180 + FEAT_PROBE_READ_KERN, 181 + __FEAT_CNT, 181 182 }; 183 + 184 + static bool kernel_supports(enum kern_feature_id feat_id); 182 185 183 186 enum reloc_type { 184 187 RELO_LD64, ··· 208 209 bool is_exp_attach_type_optional; 209 210 bool is_attachable; 210 211 bool is_attach_btf; 212 + bool is_sleepable; 211 213 attach_fn_t attach_fn; 212 214 }; 213 215 ··· 252 252 void *func_info; 253 253 __u32 func_info_rec_size; 254 254 __u32 func_info_cnt; 255 - 256 - struct bpf_capabilities *caps; 257 255 258 256 void *line_info; 259 257 __u32 line_info_rec_size; ··· 401 403 Elf_Data *rodata; 402 404 Elf_Data *bss; 403 405 Elf_Data *st_ops_data; 406 + size_t shstrndx; /* section index for section name strings */ 404 407 size_t strtabidx; 405 408 struct { 406 409 GElf_Shdr shdr; ··· 435 436 void *priv; 436 437 bpf_object_clear_priv_t clear_priv; 437 438 438 - struct bpf_capabilities caps; 439 - 440 439 char path[]; 441 440 }; 442 441 #define obj_elf_valid(o) ((o)->efile.elf) 442 + 443 + static const char *elf_sym_str(const struct bpf_object *obj, size_t off); 444 + static const char *elf_sec_str(const struct bpf_object *obj, size_t off); 445 + static Elf_Scn *elf_sec_by_idx(const struct bpf_object *obj, size_t idx); 446 + static Elf_Scn 
*elf_sec_by_name(const struct bpf_object *obj, const char *name); 447 + static int elf_sec_hdr(const struct bpf_object *obj, Elf_Scn *scn, GElf_Shdr *hdr); 448 + static const char *elf_sec_name(const struct bpf_object *obj, Elf_Scn *scn); 449 + static Elf_Data *elf_sec_data(const struct bpf_object *obj, Elf_Scn *scn); 443 450 444 451 void bpf_program__unload(struct bpf_program *prog) 445 452 { ··· 508 503 } 509 504 510 505 static int 511 - bpf_program__init(void *data, size_t size, char *section_name, int idx, 506 + bpf_program__init(void *data, size_t size, const char *section_name, int idx, 512 507 struct bpf_program *prog) 513 508 { 514 509 const size_t bpf_insn_sz = sizeof(struct bpf_insn); ··· 557 552 558 553 static int 559 554 bpf_object__add_program(struct bpf_object *obj, void *data, size_t size, 560 - char *section_name, int idx) 555 + const char *section_name, int idx) 561 556 { 562 557 struct bpf_program prog, *progs; 563 558 int nr_progs, err; ··· 566 561 if (err) 567 562 return err; 568 563 569 - prog.caps = &obj->caps; 570 564 progs = obj->programs; 571 565 nr_progs = obj->nr_programs; 572 566 573 - progs = reallocarray(progs, nr_progs + 1, sizeof(progs[0])); 567 + progs = libbpf_reallocarray(progs, nr_progs + 1, sizeof(progs[0])); 574 568 if (!progs) { 575 569 /* 576 570 * In this case the original obj->programs ··· 582 578 return -ENOMEM; 583 579 } 584 580 585 - pr_debug("found program %s\n", prog.section_name); 581 + pr_debug("elf: found program '%s'\n", prog.section_name); 586 582 obj->programs = progs; 587 583 obj->nr_programs = nr_progs + 1; 588 584 prog.obj = obj; ··· 602 598 603 599 prog = &obj->programs[pi]; 604 600 605 - for (si = 0; si < symbols->d_size / sizeof(GElf_Sym) && !name; 606 - si++) { 601 + for (si = 0; si < symbols->d_size / sizeof(GElf_Sym) && !name; si++) { 607 602 GElf_Sym sym; 608 603 609 604 if (!gelf_getsym(symbols, si, &sym)) ··· 612 609 if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL) 613 610 continue; 614 611 615 - name = 
elf_strptr(obj->efile.elf, 616 - obj->efile.strtabidx, 617 - sym.st_name); 612 + name = elf_sym_str(obj, sym.st_name); 618 613 if (!name) { 619 - pr_warn("failed to get sym name string for prog %s\n", 614 + pr_warn("prog '%s': failed to get symbol name\n", 620 615 prog->section_name); 621 616 return -LIBBPF_ERRNO__LIBELF; 622 617 } ··· 624 623 name = ".text"; 625 624 626 625 if (!name) { 627 - pr_warn("failed to find sym for prog %s\n", 626 + pr_warn("prog '%s': failed to find program symbol\n", 628 627 prog->section_name); 629 628 return -EINVAL; 630 629 } 631 630 632 631 prog->name = strdup(name); 633 - if (!prog->name) { 634 - pr_warn("failed to allocate memory for prog sym %s\n", 635 - name); 632 + if (!prog->name) 636 633 return -ENOMEM; 637 - } 638 634 } 639 635 640 636 return 0; ··· 1064 1066 obj->efile.obj_buf_sz = 0; 1065 1067 } 1066 1068 1069 + /* if libelf is old and doesn't support mmap(), fall back to read() */ 1070 + #ifndef ELF_C_READ_MMAP 1071 + #define ELF_C_READ_MMAP ELF_C_READ 1072 + #endif 1073 + 1067 1074 static int bpf_object__elf_init(struct bpf_object *obj) 1068 1075 { 1069 1076 int err = 0; 1070 1077 GElf_Ehdr *ep; 1071 1078 1072 1079 if (obj_elf_valid(obj)) { 1073 - pr_warn("elf init: internal error\n"); 1080 + pr_warn("elf: init internal error\n"); 1074 1081 return -LIBBPF_ERRNO__LIBELF; 1075 1082 } 1076 1083 ··· 1093 1090 1094 1091 err = -errno; 1095 1092 cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg)); 1096 - pr_warn("failed to open %s: %s\n", obj->path, cp); 1093 + pr_warn("elf: failed to open %s: %s\n", obj->path, cp); 1097 1094 return err; 1098 1095 } 1099 1096 1100 - obj->efile.elf = elf_begin(obj->efile.fd, 1101 - LIBBPF_ELF_C_READ_MMAP, NULL); 1097 + obj->efile.elf = elf_begin(obj->efile.fd, ELF_C_READ_MMAP, NULL); 1102 1098 } 1103 1099 1104 1100 if (!obj->efile.elf) { 1105 - pr_warn("failed to open %s as ELF file\n", obj->path); 1101 + pr_warn("elf: failed to open %s as ELF file: %s\n", obj->path, elf_errmsg(-1)); 1106 1102 
err = -LIBBPF_ERRNO__LIBELF; 1107 1103 goto errout; 1108 1104 } 1109 1105 1110 1106 if (!gelf_getehdr(obj->efile.elf, &obj->efile.ehdr)) { 1111 - pr_warn("failed to get EHDR from %s\n", obj->path); 1107 + pr_warn("elf: failed to get ELF header from %s: %s\n", obj->path, elf_errmsg(-1)); 1112 1108 err = -LIBBPF_ERRNO__FORMAT; 1113 1109 goto errout; 1114 1110 } 1115 1111 ep = &obj->efile.ehdr; 1116 1112 1113 + if (elf_getshdrstrndx(obj->efile.elf, &obj->efile.shstrndx)) { 1114 + pr_warn("elf: failed to get section names section index for %s: %s\n", 1115 + obj->path, elf_errmsg(-1)); 1116 + err = -LIBBPF_ERRNO__FORMAT; 1117 + goto errout; 1118 + } 1119 + 1120 + /* Elf is corrupted/truncated, avoid calling elf_strptr. */ 1121 + if (!elf_rawdata(elf_getscn(obj->efile.elf, obj->efile.shstrndx), NULL)) { 1122 + pr_warn("elf: failed to get section names strings from %s: %s\n", 1123 + obj->path, elf_errmsg(-1)); 1124 + return -LIBBPF_ERRNO__FORMAT; 1125 + } 1126 + 1117 1127 /* Old LLVM set e_machine to EM_NONE */ 1118 1128 if (ep->e_type != ET_REL || 1119 1129 (ep->e_machine && ep->e_machine != EM_BPF)) { 1120 - pr_warn("%s is not an eBPF object file\n", obj->path); 1130 + pr_warn("elf: %s is not a valid eBPF object file\n", obj->path); 1121 1131 err = -LIBBPF_ERRNO__FORMAT; 1122 1132 goto errout; 1123 1133 } ··· 1152 1136 #else 1153 1137 # error "Unrecognized __BYTE_ORDER__" 1154 1138 #endif 1155 - pr_warn("endianness mismatch.\n"); 1139 + pr_warn("elf: endianness mismatch in %s.\n", obj->path); 1156 1140 return -LIBBPF_ERRNO__ENDIAN; 1157 1141 } 1158 1142 ··· 1187 1171 return false; 1188 1172 } 1189 1173 1190 - static int bpf_object_search_section_size(const struct bpf_object *obj, 1191 - const char *name, size_t *d_size) 1192 - { 1193 - const GElf_Ehdr *ep = &obj->efile.ehdr; 1194 - Elf *elf = obj->efile.elf; 1195 - Elf_Scn *scn = NULL; 1196 - int idx = 0; 1197 - 1198 - while ((scn = elf_nextscn(elf, scn)) != NULL) { 1199 - const char *sec_name; 1200 - Elf_Data *data; 
1201 - GElf_Shdr sh; 1202 - 1203 - idx++; 1204 - if (gelf_getshdr(scn, &sh) != &sh) { 1205 - pr_warn("failed to get section(%d) header from %s\n", 1206 - idx, obj->path); 1207 - return -EIO; 1208 - } 1209 - 1210 - sec_name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name); 1211 - if (!sec_name) { 1212 - pr_warn("failed to get section(%d) name from %s\n", 1213 - idx, obj->path); 1214 - return -EIO; 1215 - } 1216 - 1217 - if (strcmp(name, sec_name)) 1218 - continue; 1219 - 1220 - data = elf_getdata(scn, 0); 1221 - if (!data) { 1222 - pr_warn("failed to get section(%d) data from %s(%s)\n", 1223 - idx, name, obj->path); 1224 - return -EIO; 1225 - } 1226 - 1227 - *d_size = data->d_size; 1228 - return 0; 1229 - } 1230 - 1231 - return -ENOENT; 1232 - } 1233 - 1234 1174 int bpf_object__section_size(const struct bpf_object *obj, const char *name, 1235 1175 __u32 *size) 1236 1176 { 1237 1177 int ret = -ENOENT; 1238 - size_t d_size; 1239 1178 1240 1179 *size = 0; 1241 1180 if (!name) { ··· 1208 1237 if (obj->efile.st_ops_data) 1209 1238 *size = obj->efile.st_ops_data->d_size; 1210 1239 } else { 1211 - ret = bpf_object_search_section_size(obj, name, &d_size); 1212 - if (!ret) 1213 - *size = d_size; 1240 + Elf_Scn *scn = elf_sec_by_name(obj, name); 1241 + Elf_Data *data = elf_sec_data(obj, scn); 1242 + 1243 + if (data) { 1244 + ret = 0; /* found it */ 1245 + *size = data->d_size; 1246 + } 1214 1247 } 1215 1248 1216 1249 return *size ? 
0 : ret; ··· 1239 1264 GELF_ST_TYPE(sym.st_info) != STT_OBJECT) 1240 1265 continue; 1241 1266 1242 - sname = elf_strptr(obj->efile.elf, obj->efile.strtabidx, 1243 - sym.st_name); 1267 + sname = elf_sym_str(obj, sym.st_name); 1244 1268 if (!sname) { 1245 1269 pr_warn("failed to get sym name string for var %s\n", 1246 1270 name); ··· 1264 1290 return &obj->maps[obj->nr_maps++]; 1265 1291 1266 1292 new_cap = max((size_t)4, obj->maps_cap * 3 / 2); 1267 - new_maps = realloc(obj->maps, new_cap * sizeof(*obj->maps)); 1293 + new_maps = libbpf_reallocarray(obj->maps, new_cap, sizeof(*obj->maps)); 1268 1294 if (!new_maps) { 1269 1295 pr_warn("alloc maps for object failed\n"); 1270 1296 return ERR_PTR(-ENOMEM); ··· 1716 1742 if (!symbols) 1717 1743 return -EINVAL; 1718 1744 1719 - scn = elf_getscn(obj->efile.elf, obj->efile.maps_shndx); 1720 - if (scn) 1721 - data = elf_getdata(scn, NULL); 1745 + 1746 + scn = elf_sec_by_idx(obj, obj->efile.maps_shndx); 1747 + data = elf_sec_data(obj, scn); 1722 1748 if (!scn || !data) { 1723 - pr_warn("failed to get Elf_Data from map section %d\n", 1724 - obj->efile.maps_shndx); 1749 + pr_warn("elf: failed to get legacy map definitions for %s\n", 1750 + obj->path); 1725 1751 return -EINVAL; 1726 1752 } 1727 1753 ··· 1743 1769 nr_maps++; 1744 1770 } 1745 1771 /* Assume equally sized map definitions */ 1746 - pr_debug("maps in %s: %d maps in %zd bytes\n", 1747 - obj->path, nr_maps, data->d_size); 1772 + pr_debug("elf: found %d legacy map definitions (%zd bytes) in %s\n", 1773 + nr_maps, data->d_size, obj->path); 1748 1774 1749 1775 if (!data->d_size || nr_maps == 0 || (data->d_size % nr_maps) != 0) { 1750 - pr_warn("unable to determine map definition size section %s, %d maps in %zd bytes\n", 1751 - obj->path, nr_maps, data->d_size); 1776 + pr_warn("elf: unable to determine legacy map definition size in %s\n", 1777 + obj->path); 1752 1778 return -EINVAL; 1753 1779 } 1754 1780 map_def_sz = data->d_size / nr_maps; ··· 1769 1795 if (IS_ERR(map)) 
1770 1796 return PTR_ERR(map); 1771 1797 1772 - map_name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, 1773 - sym.st_name); 1798 + map_name = elf_sym_str(obj, sym.st_name); 1774 1799 if (!map_name) { 1775 1800 pr_warn("failed to get map #%d name sym string for obj %s\n", 1776 1801 i, obj->path); ··· 1857 1884 return btf_is_func_proto(t) ? t : NULL; 1858 1885 } 1859 1886 1887 + static const char *btf_kind_str(const struct btf_type *t) 1888 + { 1889 + switch (btf_kind(t)) { 1890 + case BTF_KIND_UNKN: return "void"; 1891 + case BTF_KIND_INT: return "int"; 1892 + case BTF_KIND_PTR: return "ptr"; 1893 + case BTF_KIND_ARRAY: return "array"; 1894 + case BTF_KIND_STRUCT: return "struct"; 1895 + case BTF_KIND_UNION: return "union"; 1896 + case BTF_KIND_ENUM: return "enum"; 1897 + case BTF_KIND_FWD: return "fwd"; 1898 + case BTF_KIND_TYPEDEF: return "typedef"; 1899 + case BTF_KIND_VOLATILE: return "volatile"; 1900 + case BTF_KIND_CONST: return "const"; 1901 + case BTF_KIND_RESTRICT: return "restrict"; 1902 + case BTF_KIND_FUNC: return "func"; 1903 + case BTF_KIND_FUNC_PROTO: return "func_proto"; 1904 + case BTF_KIND_VAR: return "var"; 1905 + case BTF_KIND_DATASEC: return "datasec"; 1906 + default: return "unknown"; 1907 + } 1908 + } 1909 + 1860 1910 /* 1861 1911 * Fetch integer attribute of BTF map definition. 
Such attributes are 1862 1912 * represented using a pointer to an array, in which dimensionality of array ··· 1896 1900 const struct btf_type *arr_t; 1897 1901 1898 1902 if (!btf_is_ptr(t)) { 1899 - pr_warn("map '%s': attr '%s': expected PTR, got %u.\n", 1900 - map_name, name, btf_kind(t)); 1903 + pr_warn("map '%s': attr '%s': expected PTR, got %s.\n", 1904 + map_name, name, btf_kind_str(t)); 1901 1905 return false; 1902 1906 } 1903 1907 ··· 1908 1912 return false; 1909 1913 } 1910 1914 if (!btf_is_array(arr_t)) { 1911 - pr_warn("map '%s': attr '%s': expected ARRAY, got %u.\n", 1912 - map_name, name, btf_kind(arr_t)); 1915 + pr_warn("map '%s': attr '%s': expected ARRAY, got %s.\n", 1916 + map_name, name, btf_kind_str(arr_t)); 1913 1917 return false; 1914 1918 } 1915 1919 arr_info = btf_array(arr_t); ··· 1920 1924 static int build_map_pin_path(struct bpf_map *map, const char *path) 1921 1925 { 1922 1926 char buf[PATH_MAX]; 1923 - int err, len; 1927 + int len; 1924 1928 1925 1929 if (!path) 1926 1930 path = "/sys/fs/bpf"; ··· 1931 1935 else if (len >= PATH_MAX) 1932 1936 return -ENAMETOOLONG; 1933 1937 1934 - err = bpf_map__set_pin_path(map, buf); 1935 - if (err) 1936 - return err; 1937 - 1938 - return 0; 1938 + return bpf_map__set_pin_path(map, buf); 1939 1939 } 1940 1940 1941 1941 ··· 1999 2007 return -EINVAL; 2000 2008 } 2001 2009 if (!btf_is_ptr(t)) { 2002 - pr_warn("map '%s': key spec is not PTR: %u.\n", 2003 - map->name, btf_kind(t)); 2010 + pr_warn("map '%s': key spec is not PTR: %s.\n", 2011 + map->name, btf_kind_str(t)); 2004 2012 return -EINVAL; 2005 2013 } 2006 2014 sz = btf__resolve_size(obj->btf, t->type); ··· 2041 2049 return -EINVAL; 2042 2050 } 2043 2051 if (!btf_is_ptr(t)) { 2044 - pr_warn("map '%s': value spec is not PTR: %u.\n", 2045 - map->name, btf_kind(t)); 2052 + pr_warn("map '%s': value spec is not PTR: %s.\n", 2053 + map->name, btf_kind_str(t)); 2046 2054 return -EINVAL; 2047 2055 } 2048 2056 sz = btf__resolve_size(obj->btf, t->type); ··· 
2099 2107 t = skip_mods_and_typedefs(obj->btf, btf_array(t)->type, 2100 2108 NULL); 2101 2109 if (!btf_is_ptr(t)) { 2102 - pr_warn("map '%s': map-in-map inner def is of unexpected kind %u.\n", 2103 - map->name, btf_kind(t)); 2110 + pr_warn("map '%s': map-in-map inner def is of unexpected kind %s.\n", 2111 + map->name, btf_kind_str(t)); 2104 2112 return -EINVAL; 2105 2113 } 2106 2114 t = skip_mods_and_typedefs(obj->btf, t->type, NULL); 2107 2115 if (!btf_is_struct(t)) { 2108 - pr_warn("map '%s': map-in-map inner def is of unexpected kind %u.\n", 2109 - map->name, btf_kind(t)); 2116 + pr_warn("map '%s': map-in-map inner def is of unexpected kind %s.\n", 2117 + map->name, btf_kind_str(t)); 2110 2118 return -EINVAL; 2111 2119 } 2112 2120 ··· 2197 2205 return -EINVAL; 2198 2206 } 2199 2207 if (!btf_is_var(var)) { 2200 - pr_warn("map '%s': unexpected var kind %u.\n", 2201 - map_name, btf_kind(var)); 2208 + pr_warn("map '%s': unexpected var kind %s.\n", 2209 + map_name, btf_kind_str(var)); 2202 2210 return -EINVAL; 2203 2211 } 2204 2212 if (var_extra->linkage != BTF_VAR_GLOBAL_ALLOCATED && ··· 2210 2218 2211 2219 def = skip_mods_and_typedefs(obj->btf, var->type, NULL); 2212 2220 if (!btf_is_struct(def)) { 2213 - pr_warn("map '%s': unexpected def kind %u.\n", 2214 - map_name, btf_kind(var)); 2221 + pr_warn("map '%s': unexpected def kind %s.\n", 2222 + map_name, btf_kind_str(var)); 2215 2223 return -EINVAL; 2216 2224 } 2217 2225 if (def->size > vi->size) { ··· 2251 2259 if (obj->efile.btf_maps_shndx < 0) 2252 2260 return 0; 2253 2261 2254 - scn = elf_getscn(obj->efile.elf, obj->efile.btf_maps_shndx); 2255 - if (scn) 2256 - data = elf_getdata(scn, NULL); 2262 + scn = elf_sec_by_idx(obj, obj->efile.btf_maps_shndx); 2263 + data = elf_sec_data(obj, scn); 2257 2264 if (!scn || !data) { 2258 - pr_warn("failed to get Elf_Data from map section %d (%s)\n", 2259 - obj->efile.btf_maps_shndx, MAPS_ELF_SEC); 2265 + pr_warn("elf: failed to get %s map definitions for %s\n", 2266 + 
MAPS_ELF_SEC, obj->path); 2260 2267 return -EINVAL; 2261 2268 } 2262 2269 ··· 2313 2322 2314 2323 static bool section_have_execinstr(struct bpf_object *obj, int idx) 2315 2324 { 2316 - Elf_Scn *scn; 2317 2325 GElf_Shdr sh; 2318 2326 2319 - scn = elf_getscn(obj->efile.elf, idx); 2320 - if (!scn) 2327 + if (elf_sec_hdr(obj, elf_sec_by_idx(obj, idx), &sh)) 2321 2328 return false; 2322 2329 2323 - if (gelf_getshdr(scn, &sh) != &sh) 2324 - return false; 2325 - 2326 - if (sh.sh_flags & SHF_EXECINSTR) 2327 - return true; 2328 - 2329 - return false; 2330 + return sh.sh_flags & SHF_EXECINSTR; 2330 2331 } 2331 2332 2332 2333 static bool btf_needs_sanitization(struct bpf_object *obj) 2333 2334 { 2334 - bool has_func_global = obj->caps.btf_func_global; 2335 - bool has_datasec = obj->caps.btf_datasec; 2336 - bool has_func = obj->caps.btf_func; 2335 + bool has_func_global = kernel_supports(FEAT_BTF_GLOBAL_FUNC); 2336 + bool has_datasec = kernel_supports(FEAT_BTF_DATASEC); 2337 + bool has_func = kernel_supports(FEAT_BTF_FUNC); 2337 2338 2338 2339 return !has_func || !has_datasec || !has_func_global; 2339 2340 } 2340 2341 2341 2342 static void bpf_object__sanitize_btf(struct bpf_object *obj, struct btf *btf) 2342 2343 { 2343 - bool has_func_global = obj->caps.btf_func_global; 2344 - bool has_datasec = obj->caps.btf_datasec; 2345 - bool has_func = obj->caps.btf_func; 2344 + bool has_func_global = kernel_supports(FEAT_BTF_GLOBAL_FUNC); 2345 + bool has_datasec = kernel_supports(FEAT_BTF_DATASEC); 2346 + bool has_func = kernel_supports(FEAT_BTF_FUNC); 2346 2347 struct btf_type *t; 2347 2348 int i, j, vlen; 2348 2349 ··· 2482 2499 int err; 2483 2500 2484 2501 /* CO-RE relocations need kernel BTF */ 2485 - if (obj->btf_ext && obj->btf_ext->field_reloc_info.len) 2502 + if (obj->btf_ext && obj->btf_ext->core_relo_info.len) 2486 2503 need_vmlinux_btf = true; 2487 2504 2488 2505 bpf_object__for_each_program(prog, obj) { ··· 2516 2533 if (!obj->btf) 2517 2534 return 0; 2518 2535 2536 + if 
(!kernel_supports(FEAT_BTF)) { 2537 + if (kernel_needs_btf(obj)) { 2538 + err = -EOPNOTSUPP; 2539 + goto report; 2540 + } 2541 + pr_debug("Kernel doesn't support BTF, skipping uploading it.\n"); 2542 + return 0; 2543 + } 2544 + 2519 2545 sanitize = btf_needs_sanitization(obj); 2520 2546 if (sanitize) { 2521 2547 const void *raw_data; ··· 2550 2558 } 2551 2559 btf__free(kern_btf); 2552 2560 } 2561 + report: 2553 2562 if (err) { 2554 2563 btf_mandatory = kernel_needs_btf(obj); 2555 2564 pr_warn("Error loading .BTF into kernel: %d. %s\n", err, ··· 2562 2569 return err; 2563 2570 } 2564 2571 2572 + static const char *elf_sym_str(const struct bpf_object *obj, size_t off) 2573 + { 2574 + const char *name; 2575 + 2576 + name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, off); 2577 + if (!name) { 2578 + pr_warn("elf: failed to get section name string at offset %zu from %s: %s\n", 2579 + off, obj->path, elf_errmsg(-1)); 2580 + return NULL; 2581 + } 2582 + 2583 + return name; 2584 + } 2585 + 2586 + static const char *elf_sec_str(const struct bpf_object *obj, size_t off) 2587 + { 2588 + const char *name; 2589 + 2590 + name = elf_strptr(obj->efile.elf, obj->efile.shstrndx, off); 2591 + if (!name) { 2592 + pr_warn("elf: failed to get section name string at offset %zu from %s: %s\n", 2593 + off, obj->path, elf_errmsg(-1)); 2594 + return NULL; 2595 + } 2596 + 2597 + return name; 2598 + } 2599 + 2600 + static Elf_Scn *elf_sec_by_idx(const struct bpf_object *obj, size_t idx) 2601 + { 2602 + Elf_Scn *scn; 2603 + 2604 + scn = elf_getscn(obj->efile.elf, idx); 2605 + if (!scn) { 2606 + pr_warn("elf: failed to get section(%zu) from %s: %s\n", 2607 + idx, obj->path, elf_errmsg(-1)); 2608 + return NULL; 2609 + } 2610 + return scn; 2611 + } 2612 + 2613 + static Elf_Scn *elf_sec_by_name(const struct bpf_object *obj, const char *name) 2614 + { 2615 + Elf_Scn *scn = NULL; 2616 + Elf *elf = obj->efile.elf; 2617 + const char *sec_name; 2618 + 2619 + while ((scn = elf_nextscn(elf, scn)) != 
NULL) { 2620 + sec_name = elf_sec_name(obj, scn); 2621 + if (!sec_name) 2622 + return NULL; 2623 + 2624 + if (strcmp(sec_name, name) != 0) 2625 + continue; 2626 + 2627 + return scn; 2628 + } 2629 + return NULL; 2630 + } 2631 + 2632 + static int elf_sec_hdr(const struct bpf_object *obj, Elf_Scn *scn, GElf_Shdr *hdr) 2633 + { 2634 + if (!scn) 2635 + return -EINVAL; 2636 + 2637 + if (gelf_getshdr(scn, hdr) != hdr) { 2638 + pr_warn("elf: failed to get section(%zu) header from %s: %s\n", 2639 + elf_ndxscn(scn), obj->path, elf_errmsg(-1)); 2640 + return -EINVAL; 2641 + } 2642 + 2643 + return 0; 2644 + } 2645 + 2646 + static const char *elf_sec_name(const struct bpf_object *obj, Elf_Scn *scn) 2647 + { 2648 + const char *name; 2649 + GElf_Shdr sh; 2650 + 2651 + if (!scn) 2652 + return NULL; 2653 + 2654 + if (elf_sec_hdr(obj, scn, &sh)) 2655 + return NULL; 2656 + 2657 + name = elf_sec_str(obj, sh.sh_name); 2658 + if (!name) { 2659 + pr_warn("elf: failed to get section(%zu) name from %s: %s\n", 2660 + elf_ndxscn(scn), obj->path, elf_errmsg(-1)); 2661 + return NULL; 2662 + } 2663 + 2664 + return name; 2665 + } 2666 + 2667 + static Elf_Data *elf_sec_data(const struct bpf_object *obj, Elf_Scn *scn) 2668 + { 2669 + Elf_Data *data; 2670 + 2671 + if (!scn) 2672 + return NULL; 2673 + 2674 + data = elf_getdata(scn, 0); 2675 + if (!data) { 2676 + pr_warn("elf: failed to get section(%zu) %s data from %s: %s\n", 2677 + elf_ndxscn(scn), elf_sec_name(obj, scn) ?: "<?>", 2678 + obj->path, elf_errmsg(-1)); 2679 + return NULL; 2680 + } 2681 + 2682 + return data; 2683 + } 2684 + 2685 + static bool is_sec_name_dwarf(const char *name) 2686 + { 2687 + /* approximation, but the actual list is too long */ 2688 + return strncmp(name, ".debug_", sizeof(".debug_") - 1) == 0; 2689 + } 2690 + 2691 + static bool ignore_elf_section(GElf_Shdr *hdr, const char *name) 2692 + { 2693 + /* no special handling of .strtab */ 2694 + if (hdr->sh_type == SHT_STRTAB) 2695 + return true; 2696 + 2697 + /* ignore 
.llvm_addrsig section as well */ 2698 + if (hdr->sh_type == 0x6FFF4C03 /* SHT_LLVM_ADDRSIG */) 2699 + return true; 2700 + 2701 + /* no subprograms will lead to an empty .text section, ignore it */ 2702 + if (hdr->sh_type == SHT_PROGBITS && hdr->sh_size == 0 && 2703 + strcmp(name, ".text") == 0) 2704 + return true; 2705 + 2706 + /* DWARF sections */ 2707 + if (is_sec_name_dwarf(name)) 2708 + return true; 2709 + 2710 + if (strncmp(name, ".rel", sizeof(".rel") - 1) == 0) { 2711 + name += sizeof(".rel") - 1; 2712 + /* DWARF section relocations */ 2713 + if (is_sec_name_dwarf(name)) 2714 + return true; 2715 + 2716 + /* .BTF and .BTF.ext don't need relocations */ 2717 + if (strcmp(name, BTF_ELF_SEC) == 0 || 2718 + strcmp(name, BTF_EXT_ELF_SEC) == 0) 2719 + return true; 2720 + } 2721 + 2722 + return false; 2723 + } 2724 + 2565 2725 static int bpf_object__elf_collect(struct bpf_object *obj) 2566 2726 { 2567 2727 Elf *elf = obj->efile.elf; 2568 - GElf_Ehdr *ep = &obj->efile.ehdr; 2569 2728 Elf_Data *btf_ext_data = NULL; 2570 2729 Elf_Data *btf_data = NULL; 2571 2730 Elf_Scn *scn = NULL; 2572 2731 int idx = 0, err = 0; 2573 2732 2574 - /* Elf is corrupted/truncated, avoid calling elf_strptr. 
*/ 2575 - if (!elf_rawdata(elf_getscn(elf, ep->e_shstrndx), NULL)) { 2576 - pr_warn("failed to get e_shstrndx from %s\n", obj->path); 2577 - return -LIBBPF_ERRNO__FORMAT; 2578 - } 2579 - 2580 2733 while ((scn = elf_nextscn(elf, scn)) != NULL) { 2581 - char *name; 2734 + const char *name; 2582 2735 GElf_Shdr sh; 2583 2736 Elf_Data *data; 2584 2737 2585 2738 idx++; 2586 - if (gelf_getshdr(scn, &sh) != &sh) { 2587 - pr_warn("failed to get section(%d) header from %s\n", 2588 - idx, obj->path); 2589 - return -LIBBPF_ERRNO__FORMAT; 2590 - } 2591 2739 2592 - name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name); 2593 - if (!name) { 2594 - pr_warn("failed to get section(%d) name from %s\n", 2595 - idx, obj->path); 2740 + if (elf_sec_hdr(obj, scn, &sh)) 2596 2741 return -LIBBPF_ERRNO__FORMAT; 2597 - } 2598 2742 2599 - data = elf_getdata(scn, 0); 2600 - if (!data) { 2601 - pr_warn("failed to get section(%d) data from %s(%s)\n", 2602 - idx, name, obj->path); 2743 + name = elf_sec_str(obj, sh.sh_name); 2744 + if (!name) 2603 2745 return -LIBBPF_ERRNO__FORMAT; 2604 - } 2605 - pr_debug("section(%d) %s, size %ld, link %d, flags %lx, type=%d\n", 2746 + 2747 + if (ignore_elf_section(&sh, name)) 2748 + continue; 2749 + 2750 + data = elf_sec_data(obj, scn); 2751 + if (!data) 2752 + return -LIBBPF_ERRNO__FORMAT; 2753 + 2754 + pr_debug("elf: section(%d) %s, size %ld, link %d, flags %lx, type=%d\n", 2606 2755 idx, name, (unsigned long)data->d_size, 2607 2756 (int)sh.sh_link, (unsigned long)sh.sh_flags, 2608 2757 (int)sh.sh_type); 2609 2758 2610 2759 if (strcmp(name, "license") == 0) { 2611 - err = bpf_object__init_license(obj, 2612 - data->d_buf, 2613 - data->d_size); 2760 + err = bpf_object__init_license(obj, data->d_buf, data->d_size); 2614 2761 if (err) 2615 2762 return err; 2616 2763 } else if (strcmp(name, "version") == 0) { 2617 - err = bpf_object__init_kversion(obj, 2618 - data->d_buf, 2619 - data->d_size); 2764 + err = bpf_object__init_kversion(obj, data->d_buf, data->d_size); 2620 
2765 if (err) 2621 2766 return err; 2622 2767 } else if (strcmp(name, "maps") == 0) { ··· 2767 2636 btf_ext_data = data; 2768 2637 } else if (sh.sh_type == SHT_SYMTAB) { 2769 2638 if (obj->efile.symbols) { 2770 - pr_warn("bpf: multiple SYMTAB in %s\n", 2771 - obj->path); 2639 + pr_warn("elf: multiple symbol tables in %s\n", obj->path); 2772 2640 return -LIBBPF_ERRNO__FORMAT; 2773 2641 } 2774 2642 obj->efile.symbols = data; ··· 2780 2650 err = bpf_object__add_program(obj, data->d_buf, 2781 2651 data->d_size, 2782 2652 name, idx); 2783 - if (err) { 2784 - char errmsg[STRERR_BUFSIZE]; 2785 - char *cp; 2786 - 2787 - cp = libbpf_strerror_r(-err, errmsg, 2788 - sizeof(errmsg)); 2789 - pr_warn("failed to alloc program %s (%s): %s", 2790 - name, obj->path, cp); 2653 + if (err) 2791 2654 return err; 2792 - } 2793 2655 } else if (strcmp(name, DATA_SEC) == 0) { 2794 2656 obj->efile.data = data; 2795 2657 obj->efile.data_shndx = idx; ··· 2792 2670 obj->efile.st_ops_data = data; 2793 2671 obj->efile.st_ops_shndx = idx; 2794 2672 } else { 2795 - pr_debug("skip section(%d) %s\n", idx, name); 2673 + pr_info("elf: skipping unrecognized data section(%d) %s\n", 2674 + idx, name); 2796 2675 } 2797 2676 } else if (sh.sh_type == SHT_REL) { 2798 2677 int nr_sects = obj->efile.nr_reloc_sects; ··· 2804 2681 if (!section_have_execinstr(obj, sec) && 2805 2682 strcmp(name, ".rel" STRUCT_OPS_SEC) && 2806 2683 strcmp(name, ".rel" MAPS_ELF_SEC)) { 2807 - pr_debug("skip relo %s(%d) for section(%d)\n", 2808 - name, idx, sec); 2684 + pr_info("elf: skipping relo section(%d) %s for section(%d) %s\n", 2685 + idx, name, sec, 2686 + elf_sec_name(obj, elf_sec_by_idx(obj, sec)) ?: "<?>"); 2809 2687 continue; 2810 2688 } 2811 2689 2812 - sects = reallocarray(sects, nr_sects + 1, 2813 - sizeof(*obj->efile.reloc_sects)); 2814 - if (!sects) { 2815 - pr_warn("reloc_sects realloc failed\n"); 2690 + sects = libbpf_reallocarray(sects, nr_sects + 1, 2691 + sizeof(*obj->efile.reloc_sects)); 2692 + if (!sects) 2816 
 			return -ENOMEM;
-	}

 		obj->efile.reloc_sects = sects;
 		obj->efile.nr_reloc_sects++;

 		obj->efile.reloc_sects[nr_sects].shdr = sh;
 		obj->efile.reloc_sects[nr_sects].data = data;
-	} else if (sh.sh_type == SHT_NOBITS &&
-		   strcmp(name, BSS_SEC) == 0) {
+	} else if (sh.sh_type == SHT_NOBITS && strcmp(name, BSS_SEC) == 0) {
 		obj->efile.bss = data;
 		obj->efile.bss_shndx = idx;
 	} else {
-		pr_debug("skip section(%d) %s\n", idx, name);
+		pr_info("elf: skipping section(%d) %s (size %zu)\n", idx, name,
+			(size_t)sh.sh_size);
 	}
 }

 if (!obj->efile.strtabidx || obj->efile.strtabidx > idx) {
-	pr_warn("Corrupted ELF file: index of strtab invalid\n");
+	pr_warn("elf: symbol strings section missing or invalid in %s\n", obj->path);
 	return -LIBBPF_ERRNO__FORMAT;
 }
 return bpf_object__init_btf(obj, btf_data, btf_ext_data);

[...]

 	if (!obj->efile.symbols)
 		return 0;

-	scn = elf_getscn(obj->efile.elf, obj->efile.symbols_shndx);
-	if (!scn)
+	scn = elf_sec_by_idx(obj, obj->efile.symbols_shndx);
+	if (elf_sec_hdr(obj, scn, &sh))
 		return -LIBBPF_ERRNO__FORMAT;
-	if (gelf_getshdr(scn, &sh) != &sh)
-		return -LIBBPF_ERRNO__FORMAT;
-	n = sh.sh_size / sh.sh_entsize;

+	n = sh.sh_size / sh.sh_entsize;
 	pr_debug("looking for externs among %d symbols...\n", n);
+
 	for (i = 0; i < n; i++) {
 		GElf_Sym sym;

[...]

 			return -LIBBPF_ERRNO__FORMAT;
 		if (!sym_is_extern(&sym))
 			continue;
-		ext_name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
-				      sym.st_name);
+		ext_name = elf_sym_str(obj, sym.st_name);
 		if (!ext_name || !ext_name[0])
 			continue;

 		ext = obj->externs;
-		ext = reallocarray(ext, obj->nr_extern + 1, sizeof(*ext));
+		ext = libbpf_reallocarray(ext, obj->nr_extern + 1, sizeof(*ext));
 		if (!ext)
 			return -ENOMEM;
 		obj->externs = ext;

[...]

 static int bpf_program__record_reloc(struct bpf_program *prog,
 				     struct reloc_desc *reloc_desc,
-				     __u32 insn_idx, const char *name,
+				     __u32 insn_idx, const char *sym_name,
 				     const GElf_Sym *sym, const GElf_Rel *rel)
 {
 	struct bpf_insn *insn = &prog->insns[insn_idx];

[...]

 	struct bpf_object *obj = prog->obj;
 	__u32 shdr_idx = sym->st_shndx;
 	enum libbpf_map_type type;
+	const char *sym_sec_name;
 	struct bpf_map *map;

 	/* sub-program call relocation */
 	if (insn->code == (BPF_JMP | BPF_CALL)) {
 		if (insn->src_reg != BPF_PSEUDO_CALL) {
-			pr_warn("incorrect bpf_call opcode\n");
+			pr_warn("prog '%s': incorrect bpf_call opcode\n", prog->name);
 			return -LIBBPF_ERRNO__RELOC;
 		}
 		/* text_shndx can be 0, if no default "main" program exists */
 		if (!shdr_idx || shdr_idx != obj->efile.text_shndx) {
-			pr_warn("bad call relo against section %u\n", shdr_idx);
+			sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx));
+			pr_warn("prog '%s': bad call relo against '%s' in section '%s'\n",
+				prog->name, sym_name, sym_sec_name);
 			return -LIBBPF_ERRNO__RELOC;
 		}
-		if (sym->st_value % 8) {
-			pr_warn("bad call relo offset: %zu\n",
-				(size_t)sym->st_value);
+		if (sym->st_value % BPF_INSN_SZ) {
+			pr_warn("prog '%s': bad call relo against '%s' at offset %zu\n",
+				prog->name, sym_name, (size_t)sym->st_value);
 			return -LIBBPF_ERRNO__RELOC;
 		}
 		reloc_desc->type = RELO_CALL;

[...]

 	}

 	if (insn->code != (BPF_LD | BPF_IMM | BPF_DW)) {
-		pr_warn("invalid relo for insns[%d].code 0x%x\n",
-			insn_idx, insn->code);
+		pr_warn("prog '%s': invalid relo against '%s' for insns[%d].code 0x%x\n",
+			prog->name, sym_name, insn_idx, insn->code);
 		return -LIBBPF_ERRNO__RELOC;
 	}

[...]

 				break;
 		}
 		if (i >= n) {
-			pr_warn("extern relo failed to find extern for sym %d\n",
-				sym_idx);
+			pr_warn("prog '%s': extern relo failed to find extern for '%s' (%d)\n",
+				prog->name, sym_name, sym_idx);
 			return -LIBBPF_ERRNO__RELOC;
 		}
-		pr_debug("found extern #%d '%s' (sym %d) for insn %u\n",
-			 i, ext->name, ext->sym_idx, insn_idx);
+		pr_debug("prog '%s': found extern #%d '%s' (sym %d) for insn #%u\n",
+			 prog->name, i, ext->name, ext->sym_idx, insn_idx);
 		reloc_desc->type = RELO_EXTERN;
 		reloc_desc->insn_idx = insn_idx;
 		reloc_desc->sym_off = i; /* sym_off stores extern index */

[...]

 	}

 	if (!shdr_idx || shdr_idx >= SHN_LORESERVE) {
-		pr_warn("invalid relo for \'%s\' in special section 0x%x; forgot to initialize global var?..\n",
-			name, shdr_idx);
+		pr_warn("prog '%s': invalid relo against '%s' in special section 0x%x; forgot to initialize global var?..\n",
+			prog->name, sym_name, shdr_idx);
 		return -LIBBPF_ERRNO__RELOC;
 	}

 	type = bpf_object__section_to_libbpf_map_type(obj, shdr_idx);
+	sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx));

 	/* generic map reference relocation */
 	if (type == LIBBPF_MAP_UNSPEC) {
 		if (!bpf_object__shndx_is_maps(obj, shdr_idx)) {
-			pr_warn("bad map relo against section %u\n",
-				shdr_idx);
+			pr_warn("prog '%s': bad map relo against '%s' in section '%s'\n",
+				prog->name, sym_name, sym_sec_name);
 			return -LIBBPF_ERRNO__RELOC;
 		}
 		for (map_idx = 0; map_idx < nr_maps; map_idx++) {

[...]

 			    map->sec_idx != sym->st_shndx ||
 			    map->sec_offset != sym->st_value)
 				continue;
-			pr_debug("found map %zd (%s, sec %d, off %zu) for insn %u\n",
-				 map_idx, map->name, map->sec_idx,
+			pr_debug("prog '%s': found map %zd (%s, sec %d, off %zu) for insn #%u\n",
+				 prog->name, map_idx, map->name, map->sec_idx,
 				 map->sec_offset, insn_idx);
 			break;
 		}
 		if (map_idx >= nr_maps) {
-			pr_warn("map relo failed to find map for sec %u, off %zu\n",
-				shdr_idx, (size_t)sym->st_value);
+			pr_warn("prog '%s': map relo failed to find map for section '%s', off %zu\n",
+				prog->name, sym_sec_name, (size_t)sym->st_value);
 			return -LIBBPF_ERRNO__RELOC;
 		}
 		reloc_desc->type = RELO_LD64;

[...]

 	/* global data map relocation */
 	if (!bpf_object__shndx_is_data(obj, shdr_idx)) {
-		pr_warn("bad data relo against section %u\n", shdr_idx);
+		pr_warn("prog '%s': bad data relo against section '%s'\n",
+			prog->name, sym_sec_name);
 		return -LIBBPF_ERRNO__RELOC;
 	}
 	for (map_idx = 0; map_idx < nr_maps; map_idx++) {
 		map = &obj->maps[map_idx];
 		if (map->libbpf_type != type)
 			continue;
-		pr_debug("found data map %zd (%s, sec %d, off %zu) for insn %u\n",
-			 map_idx, map->name, map->sec_idx, map->sec_offset,
-			 insn_idx);
+		pr_debug("prog '%s': found data map %zd (%s, sec %d, off %zu) for insn %u\n",
+			 prog->name, map_idx, map->name, map->sec_idx,
+			 map->sec_offset, insn_idx);
 		break;
 	}
 	if (map_idx >= nr_maps) {
-		pr_warn("data relo failed to find map for sec %u\n",
-			shdr_idx);
+		pr_warn("prog '%s': data relo failed to find map for section '%s'\n",
+			prog->name, sym_sec_name);
 		return -LIBBPF_ERRNO__RELOC;
 	}

[...]

 			  Elf_Data *data, struct bpf_object *obj)
 {
 	Elf_Data *symbols = obj->efile.symbols;
+	const char *relo_sec_name, *sec_name;
+	size_t sec_idx = shdr->sh_info;
 	int err, i, nrels;

-	pr_debug("collecting relocating info for: '%s'\n", prog->section_name);
+	relo_sec_name = elf_sec_str(obj, shdr->sh_name);
+	sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx));
+	if (!relo_sec_name || !sec_name)
+		return -EINVAL;
+
+	pr_debug("sec '%s': collecting relocation for section(%zu) '%s'\n",
+		 relo_sec_name, sec_idx, sec_name);
 	nrels = shdr->sh_size / shdr->sh_entsize;

 	prog->reloc_desc = malloc(sizeof(*prog->reloc_desc) * nrels);

[...]

 	prog->nr_reloc = nrels;

 	for (i = 0; i < nrels; i++) {
-		const char *name;
+		const char *sym_name;
 		__u32 insn_idx;
 		GElf_Sym sym;
 		GElf_Rel rel;

 		if (!gelf_getrel(data, i, &rel)) {
-			pr_warn("relocation: failed to get %d reloc\n", i);
+			pr_warn("sec '%s': failed to get relo #%d\n", relo_sec_name, i);
 			return -LIBBPF_ERRNO__FORMAT;
 		}
 		if (!gelf_getsym(symbols, GELF_R_SYM(rel.r_info), &sym)) {
-			pr_warn("relocation: symbol %"PRIx64" not found\n",
-				GELF_R_SYM(rel.r_info));
+			pr_warn("sec '%s': symbol 0x%zx not found for relo #%d\n",
+				relo_sec_name, (size_t)GELF_R_SYM(rel.r_info), i);
 			return -LIBBPF_ERRNO__FORMAT;
 		}
-		if (rel.r_offset % sizeof(struct bpf_insn))
+		if (rel.r_offset % BPF_INSN_SZ) {
+			pr_warn("sec '%s': invalid offset 0x%zx for relo #%d\n",
+				relo_sec_name, (size_t)GELF_R_SYM(rel.r_info), i);
 			return -LIBBPF_ERRNO__FORMAT;
+		}

-		insn_idx = rel.r_offset / sizeof(struct bpf_insn);
-		name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
-				  sym.st_name) ? : "<?>";
+		insn_idx = rel.r_offset / BPF_INSN_SZ;
+		sym_name = elf_sym_str(obj, sym.st_name) ?: "<?>";

-		pr_debug("relo for shdr %u, symb %zu, value %zu, type %d, bind %d, name %d (\'%s\'), insn %u\n",
-			 (__u32)sym.st_shndx, (size_t)GELF_R_SYM(rel.r_info),
-			 (size_t)sym.st_value, GELF_ST_TYPE(sym.st_info),
-			 GELF_ST_BIND(sym.st_info), sym.st_name, name,
-			 insn_idx);
+		pr_debug("sec '%s': relo #%d: insn #%u against '%s'\n",
+			 relo_sec_name, i, insn_idx, sym_name);

 		err = bpf_program__record_reloc(prog, &prog->reloc_desc[i],
-						insn_idx, name, &sym, &rel);
+						insn_idx, sym_name, &sym, &rel);
 		if (err)
 			return err;
 	}

[...]

 	return 0;
 }

-static int
-bpf_object__probe_name(struct bpf_object *obj)
+static int probe_fd(int fd)
+{
+	if (fd >= 0)
+		close(fd);
+	return fd >= 0;
+}
+
+static int probe_kern_prog_name(void)
 {
 	struct bpf_load_program_attr attr;
 	struct bpf_insn insns[] = {

[...]

 	attr.license = "GPL";
 	attr.name = "test";
 	ret = bpf_load_program_xattr(&attr, NULL, 0);
-	if (ret >= 0) {
-		obj->caps.name = 1;
-		close(ret);
-	}
-
-	return 0;
+	return probe_fd(ret);
 }

-static int
-bpf_object__probe_global_data(struct bpf_object *obj)
+static int probe_kern_global_data(void)
 {
 	struct bpf_load_program_attr prg_attr;
 	struct bpf_create_map_attr map_attr;

[...]

 	prg_attr.license = "GPL";

 	ret = bpf_load_program_xattr(&prg_attr, NULL, 0);
-	if (ret >= 0) {
-		obj->caps.global_data = 1;
-		close(ret);
-	}
-
 	close(map);
-	return 0;
+	return probe_fd(ret);
 }

-static int bpf_object__probe_btf_func(struct bpf_object *obj)
+static int probe_kern_btf(void)
+{
+	static const char strs[] = "\0int";
+	__u32 types[] = {
+		/* int */
+		BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
+	};
+
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
+}
+
+static int probe_kern_btf_func(void)
 {
 	static const char strs[] = "\0int\0x\0a";
 	/* void x(int a) {} */

[...]

 		/* FUNC x */                                    /* [3] */
 		BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2),
 	};
-	int btf_fd;

-	btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types),
-				      strs, sizeof(strs));
-	if (btf_fd >= 0) {
-		obj->caps.btf_func = 1;
-		close(btf_fd);
-		return 1;
-	}
-
-	return 0;
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
 }

-static int bpf_object__probe_btf_func_global(struct bpf_object *obj)
+static int probe_kern_btf_func_global(void)
 {
 	static const char strs[] = "\0int\0x\0a";
 	/* static void x(int a) {} */

[...]

 		/* FUNC x BTF_FUNC_GLOBAL */                    /* [3] */
 		BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 2),
 	};
-	int btf_fd;

-	btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types),
-				      strs, sizeof(strs));
-	if (btf_fd >= 0) {
-		obj->caps.btf_func_global = 1;
-		close(btf_fd);
-		return 1;
-	}
-
-	return 0;
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
 }

-static int bpf_object__probe_btf_datasec(struct bpf_object *obj)
+static int probe_kern_btf_datasec(void)
 {
 	static const char strs[] = "\0x\0.data";
 	/* static int a; */

[...]

 		BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
 		BTF_VAR_SECINFO_ENC(2, 0, 4),
 	};
-	int btf_fd;

-	btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types),
-				      strs, sizeof(strs));
-	if (btf_fd >= 0) {
-		obj->caps.btf_datasec = 1;
-		close(btf_fd);
-		return 1;
-	}
-
-	return 0;
+	return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
+					     strs, sizeof(strs)));
 }

-static int bpf_object__probe_array_mmap(struct bpf_object *obj)
+static int probe_kern_array_mmap(void)
 {
 	struct bpf_create_map_attr attr = {
 		.map_type = BPF_MAP_TYPE_ARRAY,

[...]

 		.value_size = sizeof(int),
 		.max_entries = 1,
 	};
-	int fd;

-	fd = bpf_create_map_xattr(&attr);
-	if (fd >= 0) {
-		obj->caps.array_mmap = 1;
-		close(fd);
-		return 1;
-	}
-
-	return 0;
+	return probe_fd(bpf_create_map_xattr(&attr));
 }

-static int
-bpf_object__probe_exp_attach_type(struct bpf_object *obj)
+static int probe_kern_exp_attach_type(void)
 {
 	struct bpf_load_program_attr attr;
 	struct bpf_insn insns[] = {
 		BPF_MOV64_IMM(BPF_REG_0, 0),
 		BPF_EXIT_INSN(),
 	};
-	int fd;

 	memset(&attr, 0, sizeof(attr));
 	/* use any valid combination of program type and (optional)

[...]

 	attr.insns_cnt = ARRAY_SIZE(insns);
 	attr.license = "GPL";

-	fd = bpf_load_program_xattr(&attr, NULL, 0);
-	if (fd >= 0) {
-		obj->caps.exp_attach_type = 1;
-		close(fd);
-		return 1;
-	}
-	return 0;
+	return probe_fd(bpf_load_program_xattr(&attr, NULL, 0));
 }

-static int
-bpf_object__probe_caps(struct bpf_object *obj)
+static int probe_kern_probe_read_kernel(void)
 {
-	int (*probe_fn[])(struct bpf_object *obj) = {
-		bpf_object__probe_name,
-		bpf_object__probe_global_data,
-		bpf_object__probe_btf_func,
-		bpf_object__probe_btf_func_global,
-		bpf_object__probe_btf_datasec,
-		bpf_object__probe_array_mmap,
-		bpf_object__probe_exp_attach_type,
+	struct bpf_load_program_attr attr;
+	struct bpf_insn insns[] = {
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),	/* r1 = r10 (fp) */
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),	/* r1 += -8 */
+		BPF_MOV64_IMM(BPF_REG_2, 8),		/* r2 = 8 */
+		BPF_MOV64_IMM(BPF_REG_3, 0),		/* r3 = 0 */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel),
+		BPF_EXIT_INSN(),
 	};
-	int i, ret;

-	for (i = 0; i < ARRAY_SIZE(probe_fn); i++) {
-		ret = probe_fn[i](obj);
-		if (ret < 0)
-			pr_debug("Probe #%d failed with %d.\n", i, ret);
+	memset(&attr, 0, sizeof(attr));
+	attr.prog_type = BPF_PROG_TYPE_KPROBE;
+	attr.insns = insns;
+	attr.insns_cnt = ARRAY_SIZE(insns);
+	attr.license = "GPL";
+
+	return probe_fd(bpf_load_program_xattr(&attr, NULL, 0));
+}
+
+enum kern_feature_result {
+	FEAT_UNKNOWN = 0,
+	FEAT_SUPPORTED = 1,
+	FEAT_MISSING = 2,
+};
+
+typedef int (*feature_probe_fn)(void);
+
+static struct kern_feature_desc {
+	const char *desc;
+	feature_probe_fn probe;
+	enum kern_feature_result res;
+} feature_probes[__FEAT_CNT] = {
+	[FEAT_PROG_NAME] = {
+		"BPF program name", probe_kern_prog_name,
+	},
+	[FEAT_GLOBAL_DATA] = {
+		"global variables", probe_kern_global_data,
+	},
+	[FEAT_BTF] = {
+		"minimal BTF", probe_kern_btf,
+	},
+	[FEAT_BTF_FUNC] = {
+		"BTF functions", probe_kern_btf_func,
+	},
+	[FEAT_BTF_GLOBAL_FUNC] = {
+		"BTF global function", probe_kern_btf_func_global,
+	},
+	[FEAT_BTF_DATASEC] = {
+		"BTF data section and variable", probe_kern_btf_datasec,
+	},
+	[FEAT_ARRAY_MMAP] = {
+		"ARRAY map mmap()", probe_kern_array_mmap,
+	},
+	[FEAT_EXP_ATTACH_TYPE] = {
+		"BPF_PROG_LOAD expected_attach_type attribute",
+		probe_kern_exp_attach_type,
+	},
+	[FEAT_PROBE_READ_KERN] = {
+		"bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel,
+	}
+};
+
+static bool kernel_supports(enum kern_feature_id feat_id)
+{
+	struct kern_feature_desc *feat = &feature_probes[feat_id];
+	int ret;
+
+	if (READ_ONCE(feat->res) == FEAT_UNKNOWN) {
+		ret = feat->probe();
+		if (ret > 0) {
+			WRITE_ONCE(feat->res, FEAT_SUPPORTED);
+		} else if (ret == 0) {
+			WRITE_ONCE(feat->res, FEAT_MISSING);
+		} else {
+			pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret);
+			WRITE_ONCE(feat->res, FEAT_MISSING);
+		}
 	}

-	return 0;
+	return READ_ONCE(feat->res) == FEAT_SUPPORTED;
 }

 static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd)

[...]

 	memset(&create_attr, 0, sizeof(create_attr));

-	if (obj->caps.name)
+	if (kernel_supports(FEAT_PROG_NAME))
 		create_attr.name = map->name;
 	create_attr.map_ifindex = map->map_ifindex;
 	create_attr.map_type = def->type;

[...]

 	const struct btf *btf;
 	/* high-level spec: named fields and array indices only */
 	struct bpf_core_accessor spec[BPF_CORE_SPEC_MAX_LEN];
+	/* original unresolved (no skip_mods_or_typedefs) root type ID */
+	__u32 root_type_id;
+	/* CO-RE relocation kind */
+	enum bpf_core_relo_kind relo_kind;
 	/* high-level spec length */
 	int len;
 	/* raw, low-level spec: 1-to-1 with accessor spec string */

[...]

 	return acc->idx == btf_vlen(t) - 1;
 }

+static const char *core_relo_kind_str(enum bpf_core_relo_kind kind)
+{
+	switch (kind) {
+	case BPF_FIELD_BYTE_OFFSET: return "byte_off";
+	case BPF_FIELD_BYTE_SIZE: return "byte_sz";
+	case BPF_FIELD_EXISTS: return "field_exists";
+	case BPF_FIELD_SIGNED: return "signed";
+	case BPF_FIELD_LSHIFT_U64: return "lshift_u64";
+	case BPF_FIELD_RSHIFT_U64: return "rshift_u64";
+	case BPF_TYPE_ID_LOCAL: return "local_type_id";
+	case BPF_TYPE_ID_TARGET: return "target_type_id";
+	case BPF_TYPE_EXISTS: return "type_exists";
+	case BPF_TYPE_SIZE: return "type_size";
+	case BPF_ENUMVAL_EXISTS: return "enumval_exists";
+	case BPF_ENUMVAL_VALUE: return "enumval_value";
+	default: return "unknown";
+	}
+}
+
+static bool core_relo_is_field_based(enum bpf_core_relo_kind kind)
+{
+	switch (kind) {
+	case BPF_FIELD_BYTE_OFFSET:
+	case BPF_FIELD_BYTE_SIZE:
+	case BPF_FIELD_EXISTS:
+	case BPF_FIELD_SIGNED:
+	case BPF_FIELD_LSHIFT_U64:
+	case BPF_FIELD_RSHIFT_U64:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static bool core_relo_is_type_based(enum bpf_core_relo_kind kind)
+{
+	switch (kind) {
+	case BPF_TYPE_ID_LOCAL:
+	case BPF_TYPE_ID_TARGET:
+	case BPF_TYPE_EXISTS:
+	case BPF_TYPE_SIZE:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static bool core_relo_is_enumval_based(enum bpf_core_relo_kind kind)
+{
+	switch (kind) {
+	case BPF_ENUMVAL_EXISTS:
+	case BPF_ENUMVAL_VALUE:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*
- * Turn bpf_field_reloc into a low- and high-level spec representation,
+ * Turn bpf_core_relo into a low- and high-level spec representation,
 * validating correctness along the way, as well as calculating resulting
 * field bit offset, specified by accessor string. Low-level spec captures
 * every single level of nestedness, including traversing anonymous

[...]

 * - field 'a' access (corresponds to '2' in low-level spec);
 * - array element #3 access (corresponds to '3' in low-level spec).
 *
+ * Type-based relocations (TYPE_EXISTS/TYPE_SIZE,
+ * TYPE_ID_LOCAL/TYPE_ID_TARGET) don't capture any field information. Their
+ * spec and raw_spec are kept empty.
+ *
+ * Enum value-based relocations (ENUMVAL_EXISTS/ENUMVAL_VALUE) use access
+ * string to specify enumerator's value index that need to be relocated.
 */
-static int bpf_core_spec_parse(const struct btf *btf,
+static int bpf_core_parse_spec(const struct btf *btf,
 			       __u32 type_id,
 			       const char *spec_str,
+			       enum bpf_core_relo_kind relo_kind,
 			       struct bpf_core_spec *spec)
 {
 	int access_idx, parsed_len, i;

[...]

 	memset(spec, 0, sizeof(*spec));
 	spec->btf = btf;
+	spec->root_type_id = type_id;
+	spec->relo_kind = relo_kind;
+
+	/* type-based relocations don't have a field access string */
+	if (core_relo_is_type_based(relo_kind)) {
+		if (strcmp(spec_str, "0"))
+			return -EINVAL;
+		return 0;
+	}

 	/* parse spec_str="0:1:2:3:4" into array raw_spec=[0, 1, 2, 3, 4] */
 	while (*spec_str) {

[...]

 	if (spec->raw_len == 0)
 		return -EINVAL;

-	/* first spec value is always reloc type array index */
 	t = skip_mods_and_typedefs(btf, type_id, &id);
 	if (!t)
 		return -EINVAL;

 	access_idx = spec->raw_spec[0];
-	spec->spec[0].type_id = id;
-	spec->spec[0].idx = access_idx;
+	acc = &spec->spec[0];
+	acc->type_id = id;
+	acc->idx = access_idx;
 	spec->len++;
+
+	if (core_relo_is_enumval_based(relo_kind)) {
+		if (!btf_is_enum(t) || spec->raw_len > 1 || access_idx >= btf_vlen(t))
+			return -EINVAL;
+
+		/* record enumerator name in a first accessor */
+		acc->name = btf__name_by_offset(btf, btf_enum(t)[access_idx].name_off);
+		return 0;
+	}
+
+	if (!core_relo_is_field_based(relo_kind))
+		return -EINVAL;

 	sz = btf__resolve_size(btf, id);
 	if (sz < 0)

[...]

 			return sz;
 		spec->bit_offset += access_idx * sz * 8;
 	} else {
-		pr_warn("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %d\n",
-			type_id, spec_str, i, id, btf_kind(t));
+		pr_warn("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %s\n",
+			type_id, spec_str, i, id, btf_kind_str(t));
 		return -EINVAL;
 	}
 }

[...]

 {
 	size_t local_essent_len, targ_essent_len;
 	const char *local_name, *targ_name;
-	const struct btf_type *t;
+	const struct btf_type *t, *local_t;
 	struct ids_vec *cand_ids;
 	__u32 *new_ids;
 	int i, err, n;

-	t = btf__type_by_id(local_btf, local_type_id);
-	if (!t)
+	local_t = btf__type_by_id(local_btf, local_type_id);
+	if (!local_t)
 		return ERR_PTR(-EINVAL);

-	local_name = btf__name_by_offset(local_btf, t->name_off);
+	local_name = btf__name_by_offset(local_btf, local_t->name_off);
 	if (str_is_empty(local_name))
 		return ERR_PTR(-EINVAL);
 	local_essent_len = bpf_core_essential_name_len(local_name);

[...]

 	n = btf__get_nr_types(targ_btf);
 	for (i = 1; i <= n; i++) {
 		t = btf__type_by_id(targ_btf, i);
-		targ_name = btf__name_by_offset(targ_btf, t->name_off);
-		if (str_is_empty(targ_name))
+		if (btf_kind(t) != btf_kind(local_t))
 			continue;

-		t = skip_mods_and_typedefs(targ_btf, i, NULL);
-		if (!btf_is_composite(t) && !btf_is_array(t))
+		targ_name = btf__name_by_offset(targ_btf, t->name_off);
+		if (str_is_empty(targ_name))
 			continue;

 		targ_essent_len = bpf_core_essential_name_len(targ_name);

[...]

 			continue;

 		if (strncmp(local_name, targ_name, local_essent_len) == 0) {
-			pr_debug("[%d] %s: found candidate [%d] %s\n",
-				 local_type_id, local_name, i, targ_name);
-			new_ids = reallocarray(cand_ids->data,
-					       cand_ids->len + 1,
-					       sizeof(*cand_ids->data));
+			pr_debug("CO-RE relocating [%d] %s %s: found target candidate [%d] %s %s\n",
+				 local_type_id, btf_kind_str(local_t),
+				 local_name, i, btf_kind_str(t), targ_name);
+			new_ids = libbpf_reallocarray(cand_ids->data,
+						      cand_ids->len + 1,
+						      sizeof(*cand_ids->data));
 			if (!new_ids) {
 				err = -ENOMEM;
 				goto err_out;

[...]

 	return ERR_PTR(err);
 }

-/* Check two types for compatibility, skipping const/volatile/restrict and
- * typedefs, to ensure we are relocating compatible entities:
+/* Check two types for compatibility for the purpose of field access
+ * relocation. const/volatile/restrict and typedefs are skipped to ensure we
+ * are relocating semantically compatible entities:
 * - any two STRUCTs/UNIONs are compatible and can be mixed;
 * - any two FWDs are compatible, if their names match (modulo flavor suffix);
 * - any two PTRs are always compatible;

[...]

 	return 0;
 }

+/* Check local and target types for compatibility. This check is used for
+ * type-based CO-RE relocations and follow slightly different rules than
+ * field-based relocations. This function assumes that root types were already
+ * checked for name match. Beyond that initial root-level name check, names
+ * are completely ignored. Compatibility rules are as follows:
+ * - any two STRUCTs/UNIONs/FWDs/ENUMs/INTs are considered compatible, but
+ *   kind should match for local and target types (i.e., STRUCT is not
+ *   compatible with UNION);
+ * - for ENUMs, the size is ignored;
+ * - for INT, size and signedness are ignored;
+ * - for ARRAY, dimensionality is ignored, element types are checked for
+ *   compatibility recursively;
+ * - CONST/VOLATILE/RESTRICT modifiers are ignored;
+ * - TYPEDEFs/PTRs are compatible if types they pointing to are compatible;
+ * - FUNC_PROTOs are compatible if they have compatible signature: same
+ *   number of input args and compatible return and argument types.
+ * These rules are not set in stone and probably will be adjusted as we get
+ * more experience with using BPF CO-RE relocations.
+ */
+static int bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id,
+				     const struct btf *targ_btf, __u32 targ_id)
+{
+	const struct btf_type *local_type, *targ_type;
+	int depth = 32; /* max recursion depth */
+
+	/* caller made sure that names match (ignoring flavor suffix) */
+	local_type = btf__type_by_id(local_btf, local_id);
+	targ_type = btf__type_by_id(targ_btf, targ_id);
+	if (btf_kind(local_type) != btf_kind(targ_type))
+		return 0;
+
+recur:
+	depth--;
+	if (depth < 0)
+		return -EINVAL;
+
+	local_type = skip_mods_and_typedefs(local_btf, local_id, &local_id);
+	targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
+	if (!local_type || !targ_type)
+		return -EINVAL;
+
+	if (btf_kind(local_type) != btf_kind(targ_type))
+		return 0;
+
+	switch (btf_kind(local_type)) {
+	case BTF_KIND_UNKN:
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION:
+	case BTF_KIND_ENUM:
+	case BTF_KIND_FWD:
+		return 1;
+	case BTF_KIND_INT:
+		/* just reject deprecated bitfield-like integers; all other
+		 * integers are by default compatible between each other
+		 */
+		return btf_int_offset(local_type) == 0 && btf_int_offset(targ_type) == 0;
+	case BTF_KIND_PTR:
+		local_id = local_type->type;
+		targ_id = targ_type->type;
+		goto recur;
+	case BTF_KIND_ARRAY:
+		local_id = btf_array(local_type)->type;
+		targ_id = btf_array(targ_type)->type;
+		goto recur;
+	case BTF_KIND_FUNC_PROTO: {
+		struct btf_param *local_p = btf_params(local_type);
+		struct btf_param *targ_p = btf_params(targ_type);
+		__u16 local_vlen = btf_vlen(local_type);
+		__u16 targ_vlen = btf_vlen(targ_type);
+		int i, err;
+
+		if (local_vlen != targ_vlen)
+			return 0;
+
+		for (i = 0; i < local_vlen; i++, local_p++, targ_p++) {
+			skip_mods_and_typedefs(local_btf, local_p->type, &local_id);
+			skip_mods_and_typedefs(targ_btf, targ_p->type, &targ_id);
+			err = bpf_core_types_are_compat(local_btf, local_id, targ_btf, targ_id);
+			if (err <= 0)
+				return err;
+		}
+
+		/* tail recurse for return type check */
+		skip_mods_and_typedefs(local_btf, local_type->type, &local_id);
+		skip_mods_and_typedefs(targ_btf, targ_type->type, &targ_id);
+		goto recur;
+	}
+	default:
+		pr_warn("unexpected kind %s relocated, local [%d], target [%d]\n",
+			btf_kind_str(local_type), local_id, targ_id);
+		return 0;
+	}
+}
+
 /*
 * Try to match local spec to a target type and, if successful, produce full
 * target spec (high-level, low-level + bit offset).
[...]

 	memset(targ_spec, 0, sizeof(*targ_spec));
 	targ_spec->btf = targ_btf;
+	targ_spec->root_type_id = targ_id;
+	targ_spec->relo_kind = local_spec->relo_kind;
+
+	if (core_relo_is_type_based(local_spec->relo_kind)) {
+		return bpf_core_types_are_compat(local_spec->btf,
+						 local_spec->root_type_id,
+						 targ_btf, targ_id);
+	}

 	local_acc = &local_spec->spec[0];
 	targ_acc = &targ_spec->spec[0];
+
+	if (core_relo_is_enumval_based(local_spec->relo_kind)) {
+		size_t local_essent_len, targ_essent_len;
+		const struct btf_enum *e;
+		const char *targ_name;
+
+		/* has to resolve to an enum */
+		targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id, &targ_id);
+		if (!btf_is_enum(targ_type))
+			return 0;
+
+		local_essent_len = bpf_core_essential_name_len(local_acc->name);
+
+		for (i = 0, e = btf_enum(targ_type); i < btf_vlen(targ_type); i++, e++) {
+			targ_name = btf__name_by_offset(targ_spec->btf, e->name_off);
+			targ_essent_len = bpf_core_essential_name_len(targ_name);
+			if (targ_essent_len != local_essent_len)
+				continue;
+			if (strncmp(local_acc->name, targ_name, local_essent_len) == 0) {
+				targ_acc->type_id = targ_id;
+				targ_acc->idx = i;
+				targ_acc->name = targ_name;
+				targ_spec->len++;
+				targ_spec->raw_spec[targ_spec->raw_len] = targ_acc->idx;
+				targ_spec->raw_len++;
+				return 1;
+			}
+		}
+		return 0;
+	}
+
+	if (!core_relo_is_field_based(local_spec->relo_kind))
+		return -EINVAL;

 	for (i = 0; i < local_spec->len; i++, local_acc++, targ_acc++) {
 		targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id,

[...]

 }

 static int bpf_core_calc_field_relo(const struct bpf_program *prog,
-				    const struct bpf_field_reloc *relo,
+				    const struct bpf_core_relo *relo,
 				    const struct bpf_core_spec *spec,
 				    __u32 *val, bool *validate)
 {
-	const struct bpf_core_accessor *acc = &spec->spec[spec->len - 1];
-	const struct btf_type *t = btf__type_by_id(spec->btf, acc->type_id);
+	const struct bpf_core_accessor *acc;
+	const struct btf_type *t;
 	__u32 byte_off, byte_sz, bit_off, bit_sz;
 	const struct btf_member *m;
 	const struct btf_type *mt;
 	bool bitfield;
 	__s64 sz;
+
+	if (relo->kind == BPF_FIELD_EXISTS) {
+		*val = spec ? 1 : 0;
+		return 0;
+	}
+
+	if (!spec)
+		return -EUCLEAN; /* request instruction poisoning */
+
+	acc = &spec->spec[spec->len - 1];
+	t = btf__type_by_id(spec->btf, acc->type_id);

 	/* a[n] accessor needs special handling */
 	if (!acc->name) {

[...]

 		break;
 	case BPF_FIELD_EXISTS:
 	default:
-		pr_warn("prog '%s': unknown relo %d at insn #%d\n",
-			bpf_program__title(prog, false),
-			relo->kind, relo->insn_off / 8);
-		return -EINVAL;
+		return -EOPNOTSUPP;
 	}

 	return 0;
+}
+
+static int bpf_core_calc_type_relo(const struct bpf_core_relo *relo,
+				   const struct bpf_core_spec *spec,
+				   __u32 *val)
+{
+	__s64 sz;
+
+	/* type-based relos return zero when target type is not found */
+	if (!spec) {
+		*val = 0;
+		return 0;
+	}
+
+	switch (relo->kind) {
+	case BPF_TYPE_ID_TARGET:
+		*val = spec->root_type_id;
+		break;
+	case BPF_TYPE_EXISTS:
+		*val = 1;
+		break;
+	case BPF_TYPE_SIZE:
+		sz = btf__resolve_size(spec->btf, spec->root_type_id);
+		if (sz < 0)
+			return -EINVAL;
+		*val = sz;
+		break;
+	case BPF_TYPE_ID_LOCAL:
+	/* BPF_TYPE_ID_LOCAL is handled specially and shouldn't get here */
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static int bpf_core_calc_enumval_relo(const struct bpf_core_relo *relo,
+				      const struct bpf_core_spec *spec,
+				      __u32 *val)
+{
+	const struct btf_type *t;
+	const struct btf_enum *e;
+
+	switch (relo->kind) {
+	case BPF_ENUMVAL_EXISTS:
+		*val = spec ? 1 : 0;
+		break;
+	case BPF_ENUMVAL_VALUE:
+		if (!spec)
+			return -EUCLEAN; /* request instruction poisoning */
+		t = btf__type_by_id(spec->btf, spec->spec[0].type_id);
+		e = btf_enum(t) + spec->spec[0].idx;
+		*val = e->val;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+struct bpf_core_relo_res
+{
+	/* expected value in the instruction, unless validate == false */
+	__u32 orig_val;
+	/* new value that needs to be patched up to */
+	__u32 new_val;
+	/* relocation unsuccessful, poison instruction, but don't fail load */
+	bool poison;
+	/* some relocations can't be validated against orig_val */
+	bool validate;
+};
+
+/* Calculate original and target relocation values, given local and target
+ * specs and relocation kind. These values are calculated for each candidate.
+ * If there are multiple candidates, resulting values should all be consistent
+ * with each other. Otherwise, libbpf will refuse to proceed due to ambiguity.
+ * If instruction has to be poisoned, *poison will be set to true.
+ */
+static int bpf_core_calc_relo(const struct bpf_program *prog,
+			      const struct bpf_core_relo *relo,
+			      int relo_idx,
+			      const struct bpf_core_spec *local_spec,
+			      const struct bpf_core_spec *targ_spec,
+			      struct bpf_core_relo_res *res)
+{
+	int err = -EOPNOTSUPP;
+
+	res->orig_val = 0;
+	res->new_val = 0;
+	res->poison = false;
+	res->validate = true;
+
+	if (core_relo_is_field_based(relo->kind)) {
+		err = bpf_core_calc_field_relo(prog, relo, local_spec, &res->orig_val, &res->validate);
+		err = err ?: bpf_core_calc_field_relo(prog, relo, targ_spec, &res->new_val, NULL);
+	} else if (core_relo_is_type_based(relo->kind)) {
+		err = bpf_core_calc_type_relo(relo, local_spec, &res->orig_val);
+		err = err ?: bpf_core_calc_type_relo(relo, targ_spec, &res->new_val);
+	} else if (core_relo_is_enumval_based(relo->kind)) {
+		err = bpf_core_calc_enumval_relo(relo, local_spec, &res->orig_val);
+		err = err ?: bpf_core_calc_enumval_relo(relo, targ_spec, &res->new_val);
+	}
+
+	if (err == -EUCLEAN) {
+		/* EUCLEAN is used to signal instruction poisoning request */
+		res->poison = true;
+		err = 0;
+	} else if (err == -EOPNOTSUPP) {
+		/* EOPNOTSUPP means unknown/unsupported relocation */
+		pr_warn("prog '%s': relo #%d: unrecognized CO-RE relocation %s (%d) at insn #%d\n",
+			bpf_program__title(prog, false), relo_idx,
+			core_relo_kind_str(relo->kind), relo->kind, relo->insn_off / 8);
+	}
+
+	return err;
+}
+
+/*
+ * Turn instruction for which CO_RE relocation failed into invalid one with
+ * distinct signature.
+ */
+static void bpf_core_poison_insn(struct bpf_program *prog, int relo_idx,
+				 int insn_idx, struct bpf_insn *insn)
+{
+	pr_debug("prog '%s': relo #%d: substituting insn #%d w/ invalid insn\n",
+		 bpf_program__title(prog, false), relo_idx, insn_idx);
+	insn->code = BPF_JMP | BPF_CALL;
+	insn->dst_reg = 0;
+	insn->src_reg = 0;
+	insn->off = 0;
+	/* if this instruction is reachable (not a dead code),
+	 * verifier will complain with the following message:
+	 * invalid func unknown#195896080
+	 */
+	insn->imm = 195896080; /* => 0xbad2310 => "bad relo" */
+}
+
+static bool is_ldimm64(struct bpf_insn *insn)
+{
+	return insn->code == (BPF_LD | BPF_IMM | BPF_DW);
 }

 /*
 * Patch relocatable BPF instruction.
 *
 * Patched value is determined by relocation kind and target specification.
- * For field existence relocation target spec will be NULL if field is not
- * found.
+ * For existence relocations target spec will be NULL if field/type is not found.
 * Expected insn->imm value is determined using relocation kind and local
 * spec, and is checked before patching instruction. If actual insn->imm value
 * is wrong, bail out with error.

[...]

 * Currently three kinds of BPF instructions are supported:
 * 1. rX = <imm> (assignment with immediate operand);
 * 2. rX += <imm> (arithmetic operations with immediate operand);
+ * 3. rX = <imm64> (load with 64-bit immediate value).
5163 4630 */ 5164 - static int bpf_core_reloc_insn(struct bpf_program *prog, 5165 - const struct bpf_field_reloc *relo, 4631 + static int bpf_core_patch_insn(struct bpf_program *prog, 4632 + const struct bpf_core_relo *relo, 5166 4633 int relo_idx, 5167 - const struct bpf_core_spec *local_spec, 5168 - const struct bpf_core_spec *targ_spec) 4634 + const struct bpf_core_relo_res *res) 5169 4635 { 5170 4636 __u32 orig_val, new_val; 5171 4637 struct bpf_insn *insn; 5172 - bool validate = true; 5173 - int insn_idx, err; 4638 + int insn_idx; 5174 4639 __u8 class; 5175 4640 5176 - if (relo->insn_off % sizeof(struct bpf_insn)) 4641 + if (relo->insn_off % BPF_INSN_SZ) 5177 4642 return -EINVAL; 5178 - insn_idx = relo->insn_off / sizeof(struct bpf_insn); 4643 + insn_idx = relo->insn_off / BPF_INSN_SZ; 5179 4644 insn = &prog->insns[insn_idx]; 5180 4645 class = BPF_CLASS(insn->code); 5181 4646 5182 - if (relo->kind == BPF_FIELD_EXISTS) { 5183 - orig_val = 1; /* can't generate EXISTS relo w/o local field */ 5184 - new_val = targ_spec ? 
1 : 0; 5185 - } else if (!targ_spec) { 5186 - pr_debug("prog '%s': relo #%d: substituting insn #%d w/ invalid insn\n", 5187 - bpf_program__title(prog, false), relo_idx, insn_idx); 5188 - insn->code = BPF_JMP | BPF_CALL; 5189 - insn->dst_reg = 0; 5190 - insn->src_reg = 0; 5191 - insn->off = 0; 5192 - /* if this instruction is reachable (not a dead code), 5193 - * verifier will complain with the following message: 5194 - * invalid func unknown#195896080 4647 + if (res->poison) { 4648 + /* poison second part of ldimm64 to avoid confusing error from 4649 + * verifier about "unknown opcode 00" 5195 4650 */ 5196 - insn->imm = 195896080; /* => 0xbad2310 => "bad relo" */ 4651 + if (is_ldimm64(insn)) 4652 + bpf_core_poison_insn(prog, relo_idx, insn_idx + 1, insn + 1); 4653 + bpf_core_poison_insn(prog, relo_idx, insn_idx, insn); 5197 4654 return 0; 5198 - } else { 5199 - err = bpf_core_calc_field_relo(prog, relo, local_spec, 5200 - &orig_val, &validate); 5201 - if (err) 5202 - return err; 5203 - err = bpf_core_calc_field_relo(prog, relo, targ_spec, 5204 - &new_val, NULL); 5205 - if (err) 5206 - return err; 5207 4655 } 4656 + 4657 + orig_val = res->orig_val; 4658 + new_val = res->new_val; 5208 4659 5209 4660 switch (class) { 5210 4661 case BPF_ALU: 5211 4662 case BPF_ALU64: 5212 4663 if (BPF_SRC(insn->code) != BPF_K) 5213 4664 return -EINVAL; 5214 - if (validate && insn->imm != orig_val) { 4665 + if (res->validate && insn->imm != orig_val) { 5215 4666 pr_warn("prog '%s': relo #%d: unexpected insn #%d (ALU/ALU64) value: got %u, exp %u -> %u\n", 5216 4667 bpf_program__title(prog, false), relo_idx, 5217 4668 insn_idx, insn->imm, orig_val, new_val); ··· 5211 4692 case BPF_LDX: 5212 4693 case BPF_ST: 5213 4694 case BPF_STX: 5214 - if (validate && insn->off != orig_val) { 5215 - pr_warn("prog '%s': relo #%d: unexpected insn #%d (LD/LDX/ST/STX) value: got %u, exp %u -> %u\n", 4695 + if (res->validate && insn->off != orig_val) { 4696 + pr_warn("prog '%s': relo #%d: unexpected insn 
#%d (LDX/ST/STX) value: got %u, exp %u -> %u\n", 5216 4697 bpf_program__title(prog, false), relo_idx, 5217 4698 insn_idx, insn->off, orig_val, new_val); 5218 4699 return -EINVAL; ··· 5229 4710 bpf_program__title(prog, false), relo_idx, insn_idx, 5230 4711 orig_val, new_val); 5231 4712 break; 4713 + case BPF_LD: { 4714 + __u64 imm; 4715 + 4716 + if (!is_ldimm64(insn) || 4717 + insn[0].src_reg != 0 || insn[0].off != 0 || 4718 + insn_idx + 1 >= prog->insns_cnt || 4719 + insn[1].code != 0 || insn[1].dst_reg != 0 || 4720 + insn[1].src_reg != 0 || insn[1].off != 0) { 4721 + pr_warn("prog '%s': relo #%d: insn #%d (LDIMM64) has unexpected form\n", 4722 + bpf_program__title(prog, false), relo_idx, insn_idx); 4723 + return -EINVAL; 4724 + } 4725 + 4726 + imm = insn[0].imm + ((__u64)insn[1].imm << 32); 4727 + if (res->validate && imm != orig_val) { 4728 + pr_warn("prog '%s': relo #%d: unexpected insn #%d (LDIMM64) value: got %llu, exp %u -> %u\n", 4729 + bpf_program__title(prog, false), relo_idx, 4730 + insn_idx, (unsigned long long)imm, 4731 + orig_val, new_val); 4732 + return -EINVAL; 4733 + } 4734 + 4735 + insn[0].imm = new_val; 4736 + insn[1].imm = 0; /* currently only 32-bit values are supported */ 4737 + pr_debug("prog '%s': relo #%d: patched insn #%d (LDIMM64) imm64 %llu -> %u\n", 4738 + bpf_program__title(prog, false), relo_idx, insn_idx, 4739 + (unsigned long long)imm, new_val); 4740 + break; 4741 + } 5232 4742 default: 5233 - pr_warn("prog '%s': relo #%d: trying to relocate unrecognized insn #%d, code:%x, src:%x, dst:%x, off:%x, imm:%x\n", 4743 + pr_warn("prog '%s': relo #%d: trying to relocate unrecognized insn #%d, code:0x%x, src:0x%x, dst:0x%x, off:0x%x, imm:0x%x\n", 5234 4744 bpf_program__title(prog, false), relo_idx, 5235 4745 insn_idx, insn->code, insn->src_reg, insn->dst_reg, 5236 4746 insn->off, insn->imm); ··· 5276 4728 static void bpf_core_dump_spec(int level, const struct bpf_core_spec *spec) 5277 4729 { 5278 4730 const struct btf_type *t; 4731 + const 
struct btf_enum *e; 5279 4732 const char *s; 5280 4733 __u32 type_id; 5281 4734 int i; 5282 4735 5283 - type_id = spec->spec[0].type_id; 4736 + type_id = spec->root_type_id; 5284 4737 t = btf__type_by_id(spec->btf, type_id); 5285 4738 s = btf__name_by_offset(spec->btf, t->name_off); 5286 - libbpf_print(level, "[%u] %s + ", type_id, s); 5287 4739 5288 - for (i = 0; i < spec->raw_len; i++) 5289 - libbpf_print(level, "%d%s", spec->raw_spec[i], 5290 - i == spec->raw_len - 1 ? " => " : ":"); 4740 + libbpf_print(level, "[%u] %s %s", type_id, btf_kind_str(t), str_is_empty(s) ? "<anon>" : s); 5291 4741 5292 - libbpf_print(level, "%u.%u @ &x", 5293 - spec->bit_offset / 8, spec->bit_offset % 8); 4742 + if (core_relo_is_type_based(spec->relo_kind)) 4743 + return; 5294 4744 5295 - for (i = 0; i < spec->len; i++) { 5296 - if (spec->spec[i].name) 5297 - libbpf_print(level, ".%s", spec->spec[i].name); 5298 - else 5299 - libbpf_print(level, "[%u]", spec->spec[i].idx); 4745 + if (core_relo_is_enumval_based(spec->relo_kind)) { 4746 + t = skip_mods_and_typedefs(spec->btf, type_id, NULL); 4747 + e = btf_enum(t) + spec->raw_spec[0]; 4748 + s = btf__name_by_offset(spec->btf, e->name_off); 4749 + 4750 + libbpf_print(level, "::%s = %u", s, e->val); 4751 + return; 5300 4752 } 5301 4753 4754 + if (core_relo_is_field_based(spec->relo_kind)) { 4755 + for (i = 0; i < spec->len; i++) { 4756 + if (spec->spec[i].name) 4757 + libbpf_print(level, ".%s", spec->spec[i].name); 4758 + else if (i > 0 || spec->spec[i].idx > 0) 4759 + libbpf_print(level, "[%u]", spec->spec[i].idx); 4760 + } 4761 + 4762 + libbpf_print(level, " ("); 4763 + for (i = 0; i < spec->raw_len; i++) 4764 + libbpf_print(level, "%s%d", i == 0 ? 
"" : ":", spec->raw_spec[i]); 4765 + 4766 + if (spec->bit_offset % 8) 4767 + libbpf_print(level, " @ offset %u.%u)", 4768 + spec->bit_offset / 8, spec->bit_offset % 8); 4769 + else 4770 + libbpf_print(level, " @ offset %u)", spec->bit_offset / 8); 4771 + return; 4772 + } 5302 4773 } 5303 4774 5304 4775 static size_t bpf_core_hash_fn(const void *key, void *ctx) ··· 5381 4814 * CPU-wise compared to prebuilding a map from all local type names to 5382 4815 * a list of candidate type names. It's also sped up by caching resolved 5383 4816 * list of matching candidates per each local "root" type ID, that has at 5384 - * least one bpf_field_reloc associated with it. This list is shared 4817 + * least one bpf_core_relo associated with it. This list is shared 5385 4818 * between multiple relocations for the same type ID and is updated as some 5386 4819 * of the candidates are pruned due to structural incompatibility. 5387 4820 */ 5388 - static int bpf_core_reloc_field(struct bpf_program *prog, 5389 - const struct bpf_field_reloc *relo, 5390 - int relo_idx, 5391 - const struct btf *local_btf, 5392 - const struct btf *targ_btf, 5393 - struct hashmap *cand_cache) 4821 + static int bpf_core_apply_relo(struct bpf_program *prog, 4822 + const struct bpf_core_relo *relo, 4823 + int relo_idx, 4824 + const struct btf *local_btf, 4825 + const struct btf *targ_btf, 4826 + struct hashmap *cand_cache) 5394 4827 { 5395 4828 const char *prog_name = bpf_program__title(prog, false); 5396 - struct bpf_core_spec local_spec, cand_spec, targ_spec; 4829 + struct bpf_core_spec local_spec, cand_spec, targ_spec = {}; 5397 4830 const void *type_key = u32_as_hash_key(relo->type_id); 5398 - const struct btf_type *local_type, *cand_type; 5399 - const char *local_name, *cand_name; 4831 + struct bpf_core_relo_res cand_res, targ_res; 4832 + const struct btf_type *local_type; 4833 + const char *local_name; 5400 4834 struct ids_vec *cand_ids; 5401 4835 __u32 local_id, cand_id; 5402 4836 const char *spec_str; 
··· 5409 4841 return -EINVAL; 5410 4842 5411 4843 local_name = btf__name_by_offset(local_btf, local_type->name_off); 5412 - if (str_is_empty(local_name)) 4844 + if (!local_name) 5413 4845 return -EINVAL; 5414 4846 5415 4847 spec_str = btf__name_by_offset(local_btf, relo->access_str_off); 5416 4848 if (str_is_empty(spec_str)) 5417 4849 return -EINVAL; 5418 4850 5419 - err = bpf_core_spec_parse(local_btf, local_id, spec_str, &local_spec); 4851 + err = bpf_core_parse_spec(local_btf, local_id, spec_str, relo->kind, &local_spec); 5420 4852 if (err) { 5421 - pr_warn("prog '%s': relo #%d: parsing [%d] %s + %s failed: %d\n", 5422 - prog_name, relo_idx, local_id, local_name, spec_str, 5423 - err); 4853 + pr_warn("prog '%s': relo #%d: parsing [%d] %s %s + %s failed: %d\n", 4854 + prog_name, relo_idx, local_id, btf_kind_str(local_type), 4855 + str_is_empty(local_name) ? "<anon>" : local_name, 4856 + spec_str, err); 5424 4857 return -EINVAL; 5425 4858 } 5426 4859 5427 - pr_debug("prog '%s': relo #%d: kind %d, spec is ", prog_name, relo_idx, 5428 - relo->kind); 4860 + pr_debug("prog '%s': relo #%d: kind <%s> (%d), spec is ", prog_name, 4861 + relo_idx, core_relo_kind_str(relo->kind), relo->kind); 5429 4862 bpf_core_dump_spec(LIBBPF_DEBUG, &local_spec); 5430 4863 libbpf_print(LIBBPF_DEBUG, "\n"); 4864 + 4865 + /* TYPE_ID_LOCAL relo is special and doesn't need candidate search */ 4866 + if (relo->kind == BPF_TYPE_ID_LOCAL) { 4867 + targ_res.validate = true; 4868 + targ_res.poison = false; 4869 + targ_res.orig_val = local_spec.root_type_id; 4870 + targ_res.new_val = local_spec.root_type_id; 4871 + goto patch_insn; 4872 + } 4873 + 4874 + /* libbpf doesn't support candidate search for anonymous types */ 4875 + if (str_is_empty(spec_str)) { 4876 + pr_warn("prog '%s': relo #%d: <%s> (%d) relocation doesn't support anonymous types\n", 4877 + prog_name, relo_idx, core_relo_kind_str(relo->kind), relo->kind); 4878 + return -EOPNOTSUPP; 4879 + } 5431 4880 5432 4881 if 
(!hashmap__find(cand_cache, type_key, (void **)&cand_ids)) { 5433 4882 cand_ids = bpf_core_find_cands(local_btf, local_id, targ_btf); 5434 4883 if (IS_ERR(cand_ids)) { 5435 - pr_warn("prog '%s': relo #%d: target candidate search failed for [%d] %s: %ld", 5436 - prog_name, relo_idx, local_id, local_name, 5437 - PTR_ERR(cand_ids)); 4884 + pr_warn("prog '%s': relo #%d: target candidate search failed for [%d] %s %s: %ld", 4885 + prog_name, relo_idx, local_id, btf_kind_str(local_type), 4886 + local_name, PTR_ERR(cand_ids)); 5438 4887 return PTR_ERR(cand_ids); 5439 4888 } 5440 4889 err = hashmap__set(cand_cache, type_key, cand_ids, NULL, NULL); ··· 5463 4878 5464 4879 for (i = 0, j = 0; i < cand_ids->len; i++) { 5465 4880 cand_id = cand_ids->data[i]; 5466 - cand_type = btf__type_by_id(targ_btf, cand_id); 5467 - cand_name = btf__name_by_offset(targ_btf, cand_type->name_off); 5468 - 5469 - err = bpf_core_spec_match(&local_spec, targ_btf, 5470 - cand_id, &cand_spec); 5471 - pr_debug("prog '%s': relo #%d: matching candidate #%d %s against spec ", 5472 - prog_name, relo_idx, i, cand_name); 5473 - bpf_core_dump_spec(LIBBPF_DEBUG, &cand_spec); 5474 - libbpf_print(LIBBPF_DEBUG, ": %d\n", err); 4881 + err = bpf_core_spec_match(&local_spec, targ_btf, cand_id, &cand_spec); 5475 4882 if (err < 0) { 5476 - pr_warn("prog '%s': relo #%d: matching error: %d\n", 5477 - prog_name, relo_idx, err); 4883 + pr_warn("prog '%s': relo #%d: error matching candidate #%d ", 4884 + prog_name, relo_idx, i); 4885 + bpf_core_dump_spec(LIBBPF_WARN, &cand_spec); 4886 + libbpf_print(LIBBPF_WARN, ": %d\n", err); 5478 4887 return err; 5479 4888 } 4889 + 4890 + pr_debug("prog '%s': relo #%d: %s candidate #%d ", prog_name, 4891 + relo_idx, err == 0 ? 
"non-matching" : "matching", i); 4892 + bpf_core_dump_spec(LIBBPF_DEBUG, &cand_spec); 4893 + libbpf_print(LIBBPF_DEBUG, "\n"); 4894 + 5480 4895 if (err == 0) 5481 4896 continue; 5482 4897 4898 + err = bpf_core_calc_relo(prog, relo, relo_idx, &local_spec, &cand_spec, &cand_res); 4899 + if (err) 4900 + return err; 4901 + 5483 4902 if (j == 0) { 4903 + targ_res = cand_res; 5484 4904 targ_spec = cand_spec; 5485 4905 } else if (cand_spec.bit_offset != targ_spec.bit_offset) { 5486 - /* if there are many candidates, they should all 5487 - * resolve to the same bit offset 4906 + /* if there are many field relo candidates, they 4907 + * should all resolve to the same bit offset 5488 4908 */ 5489 - pr_warn("prog '%s': relo #%d: offset ambiguity: %u != %u\n", 4909 + pr_warn("prog '%s': relo #%d: field offset ambiguity: %u != %u\n", 5490 4910 prog_name, relo_idx, cand_spec.bit_offset, 5491 4911 targ_spec.bit_offset); 5492 4912 return -EINVAL; 4913 + } else if (cand_res.poison != targ_res.poison || cand_res.new_val != targ_res.new_val) { 4914 + /* all candidates should result in the same relocation 4915 + * decision and value, otherwise it's dangerous to 4916 + * proceed due to ambiguity 4917 + */ 4918 + pr_warn("prog '%s': relo #%d: relocation decision ambiguity: %s %u != %s %u\n", 4919 + prog_name, relo_idx, 4920 + cand_res.poison ? "failure" : "success", cand_res.new_val, 4921 + targ_res.poison ? "failure" : "success", targ_res.new_val); 4922 + return -EINVAL; 5493 4923 } 5494 4924 5495 - cand_ids->data[j++] = cand_spec.spec[0].type_id; 4925 + cand_ids->data[j++] = cand_spec.root_type_id; 5496 4926 } 5497 4927 5498 4928 /* ··· 5526 4926 * as well as expected case, depending whether instruction w/ 5527 4927 * relocation is guarded in some way that makes it unreachable (dead 5528 4928 * code) if relocation can't be resolved. 
This is handled in 5529 - * bpf_core_reloc_insn() uniformly by replacing that instruction with 4929 + * bpf_core_patch_insn() uniformly by replacing that instruction with 5530 4930 * BPF helper call insn (using invalid helper ID). If that instruction 5531 4931 * is indeed unreachable, then it will be ignored and eliminated by 5532 4932 * verifier. If it was an error, then verifier will complain and point 5533 4933 * to a specific instruction number in its log. 5534 4934 */ 5535 - if (j == 0) 5536 - pr_debug("prog '%s': relo #%d: no matching targets found for [%d] %s + %s\n", 5537 - prog_name, relo_idx, local_id, local_name, spec_str); 4935 + if (j == 0) { 4936 + pr_debug("prog '%s': relo #%d: no matching targets found\n", 4937 + prog_name, relo_idx); 5538 4938 5539 - /* bpf_core_reloc_insn should know how to handle missing targ_spec */ 5540 - err = bpf_core_reloc_insn(prog, relo, relo_idx, &local_spec, 5541 - j ? &targ_spec : NULL); 4939 + /* calculate single target relo result explicitly */ 4940 + err = bpf_core_calc_relo(prog, relo, relo_idx, &local_spec, NULL, &targ_res); 4941 + if (err) 4942 + return err; 4943 + } 4944 + 4945 + patch_insn: 4946 + /* bpf_core_patch_insn() should know how to handle missing targ_spec */ 4947 + err = bpf_core_patch_insn(prog, relo, relo_idx, &targ_res); 5542 4948 if (err) { 5543 4949 pr_warn("prog '%s': relo #%d: failed to patch insn at offset %d: %d\n", 5544 4950 prog_name, relo_idx, relo->insn_off, err); ··· 5555 4949 } 5556 4950 5557 4951 static int 5558 - bpf_core_reloc_fields(struct bpf_object *obj, const char *targ_btf_path) 4952 + bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path) 5559 4953 { 5560 4954 const struct btf_ext_info_sec *sec; 5561 - const struct bpf_field_reloc *rec; 4955 + const struct bpf_core_relo *rec; 5562 4956 const struct btf_ext_info *seg; 5563 4957 struct hashmap_entry *entry; 5564 4958 struct hashmap *cand_cache = NULL; ··· 5566 4960 struct btf *targ_btf; 5567 4961 const char 
*sec_name; 5568 4962 int i, err = 0; 4963 + 4964 + if (obj->btf_ext->core_relo_info.len == 0) 4965 + return 0; 5569 4966 5570 4967 if (targ_btf_path) 5571 4968 targ_btf = btf__parse_elf(targ_btf_path, NULL); ··· 5585 4976 goto out; 5586 4977 } 5587 4978 5588 - seg = &obj->btf_ext->field_reloc_info; 4979 + seg = &obj->btf_ext->core_relo_info; 5589 4980 for_each_btf_ext_sec(seg, sec) { 5590 4981 sec_name = btf__name_by_offset(obj->btf, sec->sec_name_off); 5591 4982 if (str_is_empty(sec_name)) { ··· 5606 4997 goto out; 5607 4998 } 5608 4999 5609 - pr_debug("prog '%s': performing %d CO-RE offset relocs\n", 5000 + pr_debug("sec '%s': found %d CO-RE relocations\n", 5610 5001 sec_name, sec->num_info); 5611 5002 5612 5003 for_each_btf_ext_rec(seg, sec, i, rec) { 5613 - err = bpf_core_reloc_field(prog, rec, i, obj->btf, 5614 - targ_btf, cand_cache); 5004 + err = bpf_core_apply_relo(prog, rec, i, obj->btf, 5005 + targ_btf, cand_cache); 5615 5006 if (err) { 5616 5007 pr_warn("prog '%s': relo #%d: failed to relocate: %d\n", 5617 - sec_name, i, err); 5008 + prog->name, i, err); 5618 5009 goto out; 5619 5010 } 5620 5011 } ··· 5634 5025 } 5635 5026 5636 5027 static int 5637 - bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path) 5638 - { 5639 - int err = 0; 5640 - 5641 - if (obj->btf_ext->field_reloc_info.len) 5642 - err = bpf_core_reloc_fields(obj, targ_btf_path); 5643 - 5644 - return err; 5645 - } 5646 - 5647 - static int 5648 5028 bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj, 5649 5029 struct reloc_desc *relo) 5650 5030 { ··· 5649 5051 return -LIBBPF_ERRNO__RELOC; 5650 5052 } 5651 5053 new_cnt = prog->insns_cnt + text->insns_cnt; 5652 - new_insn = reallocarray(prog->insns, new_cnt, sizeof(*insn)); 5054 + new_insn = libbpf_reallocarray(prog->insns, new_cnt, sizeof(*insn)); 5653 5055 if (!new_insn) { 5654 5056 pr_warn("oom in prog realloc\n"); 5655 5057 return -ENOMEM; ··· 5734 5136 return err; 5735 5137 break; 5736 5138 
default: 5737 - pr_warn("relo #%d: bad relo type %d\n", i, relo->type); 5139 + pr_warn("prog '%s': relo #%d: bad relo type %d\n", 5140 + prog->name, i, relo->type); 5738 5141 return -EINVAL; 5739 5142 } 5740 5143 } ··· 5770 5171 5771 5172 err = bpf_program__relocate(prog, obj); 5772 5173 if (err) { 5773 - pr_warn("failed to relocate '%s'\n", prog->section_name); 5174 + pr_warn("prog '%s': failed to relocate data references: %d\n", 5175 + prog->name, err); 5774 5176 return err; 5775 5177 } 5776 5178 break; ··· 5786 5186 5787 5187 err = bpf_program__relocate(prog, obj); 5788 5188 if (err) { 5789 - pr_warn("failed to relocate '%s'\n", prog->section_name); 5189 + pr_warn("prog '%s': failed to relocate calls: %d\n", 5190 + prog->name, err); 5790 5191 return err; 5791 5192 } 5792 5193 } ··· 5831 5230 i, (size_t)GELF_R_SYM(rel.r_info)); 5832 5231 return -LIBBPF_ERRNO__FORMAT; 5833 5232 } 5834 - name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, 5835 - sym.st_name) ? : "<?>"; 5233 + name = elf_sym_str(obj, sym.st_name) ?: "<?>"; 5836 5234 if (sym.st_shndx != obj->efile.btf_maps_shndx) { 5837 5235 pr_warn(".maps relo #%d: '%s' isn't a BTF-defined map\n", 5838 5236 i, name); ··· 5893 5293 moff /= bpf_ptr_sz; 5894 5294 if (moff >= map->init_slots_sz) { 5895 5295 new_sz = moff + 1; 5896 - tmp = realloc(map->init_slots, new_sz * host_ptr_sz); 5296 + tmp = libbpf_reallocarray(map->init_slots, new_sz, host_ptr_sz); 5897 5297 if (!tmp) 5898 5298 return -ENOMEM; 5899 5299 map->init_slots = tmp; ··· 5948 5348 return 0; 5949 5349 } 5950 5350 5351 + static bool insn_is_helper_call(struct bpf_insn *insn, enum bpf_func_id *func_id) 5352 + { 5353 + if (BPF_CLASS(insn->code) == BPF_JMP && 5354 + BPF_OP(insn->code) == BPF_CALL && 5355 + BPF_SRC(insn->code) == BPF_K && 5356 + insn->src_reg == 0 && 5357 + insn->dst_reg == 0) { 5358 + *func_id = insn->imm; 5359 + return true; 5360 + } 5361 + return false; 5362 + } 5363 + 5364 + static int bpf_object__sanitize_prog(struct bpf_object* obj, 
struct bpf_program *prog) 5365 + { 5366 + struct bpf_insn *insn = prog->insns; 5367 + enum bpf_func_id func_id; 5368 + int i; 5369 + 5370 + for (i = 0; i < prog->insns_cnt; i++, insn++) { 5371 + if (!insn_is_helper_call(insn, &func_id)) 5372 + continue; 5373 + 5374 + /* on kernels that don't yet support 5375 + * bpf_probe_read_{kernel,user}[_str] helpers, fall back 5376 + * to bpf_probe_read() which works well for old kernels 5377 + */ 5378 + switch (func_id) { 5379 + case BPF_FUNC_probe_read_kernel: 5380 + case BPF_FUNC_probe_read_user: 5381 + if (!kernel_supports(FEAT_PROBE_READ_KERN)) 5382 + insn->imm = BPF_FUNC_probe_read; 5383 + break; 5384 + case BPF_FUNC_probe_read_kernel_str: 5385 + case BPF_FUNC_probe_read_user_str: 5386 + if (!kernel_supports(FEAT_PROBE_READ_KERN)) 5387 + insn->imm = BPF_FUNC_probe_read_str; 5388 + break; 5389 + default: 5390 + break; 5391 + } 5392 + } 5393 + return 0; 5394 + } 5395 + 5951 5396 static int 5952 5397 load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt, 5953 5398 char *license, __u32 kern_version, int *pfd) ··· 6009 5364 memset(&load_attr, 0, sizeof(struct bpf_load_program_attr)); 6010 5365 load_attr.prog_type = prog->type; 6011 5366 /* old kernels might not support specifying expected_attach_type */ 6012 - if (!prog->caps->exp_attach_type && prog->sec_def && 5367 + if (!kernel_supports(FEAT_EXP_ATTACH_TYPE) && prog->sec_def && 6013 5368 prog->sec_def->is_exp_attach_type_optional) 6014 5369 load_attr.expected_attach_type = 0; 6015 5370 else 6016 5371 load_attr.expected_attach_type = prog->expected_attach_type; 6017 - if (prog->caps->name) 5372 + if (kernel_supports(FEAT_PROG_NAME)) 6018 5373 load_attr.name = prog->name; 6019 5374 load_attr.insns = insns; 6020 5375 load_attr.insns_cnt = insns_cnt; ··· 6032 5387 } 6033 5388 /* specify func_info/line_info only if kernel supports them */ 6034 5389 btf_fd = bpf_object__btf_fd(prog->obj); 6035 - if (btf_fd >= 0 && prog->obj->caps.btf_func) { 5390 + if 
(btf_fd >= 0 && kernel_supports(FEAT_BTF_FUNC)) { 6036 5391 load_attr.prog_btf_fd = btf_fd; 6037 5392 load_attr.func_info = prog->func_info; 6038 5393 load_attr.func_info_rec_size = prog->func_info_rec_size; ··· 6070 5425 free(log_buf); 6071 5426 goto retry_load; 6072 5427 } 6073 - ret = -errno; 5428 + ret = errno ? -errno : -LIBBPF_ERRNO__LOAD; 6074 5429 cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg)); 6075 5430 pr_warn("load bpf program failed: %s\n", cp); 6076 5431 pr_perm_msg(ret); ··· 6209 5564 6210 5565 for (i = 0; i < obj->nr_programs; i++) { 6211 5566 prog = &obj->programs[i]; 5567 + err = bpf_object__sanitize_prog(obj, prog); 5568 + if (err) 5569 + return err; 5570 + } 5571 + 5572 + for (i = 0; i < obj->nr_programs; i++) { 5573 + prog = &obj->programs[i]; 6212 5574 if (bpf_program__is_function_storage(prog, obj)) 6213 5575 continue; 6214 5576 if (!prog->load) { 6215 - pr_debug("prog '%s'('%s'): skipped loading\n", 6216 - prog->name, prog->section_name); 5577 + pr_debug("prog '%s': skipped loading\n", prog->name); 6217 5578 continue; 6218 5579 } 6219 5580 prog->log_level |= log_level; ··· 6292 5641 /* couldn't guess, but user might manually specify */ 6293 5642 continue; 6294 5643 5644 + if (prog->sec_def->is_sleepable) 5645 + prog->prog_flags |= BPF_F_SLEEPABLE; 6295 5646 bpf_program__set_type(prog, prog->sec_def->prog_type); 6296 5647 bpf_program__set_expected_attach_type(prog, 6297 5648 prog->sec_def->expected_attach_type); ··· 6403 5750 bpf_object__for_each_map(m, obj) { 6404 5751 if (!bpf_map__is_internal(m)) 6405 5752 continue; 6406 - if (!obj->caps.global_data) { 5753 + if (!kernel_supports(FEAT_GLOBAL_DATA)) { 6407 5754 pr_warn("kernel doesn't support global data\n"); 6408 5755 return -ENOTSUP; 6409 5756 } 6410 - if (!obj->caps.array_mmap) 5757 + if (!kernel_supports(FEAT_ARRAY_MMAP)) 6411 5758 m->def.map_flags ^= BPF_F_MMAPABLE; 6412 5759 } 6413 5760 ··· 6557 5904 } 6558 5905 6559 5906 err = bpf_object__probe_loading(obj); 6560 - err = err ? 
: bpf_object__probe_caps(obj); 6561 5907 err = err ? : bpf_object__resolve_externs(obj, obj->kconfig); 6562 5908 err = err ? : bpf_object__sanitize_and_load_btf(obj); 6563 5909 err = err ? : bpf_object__sanitize_maps(obj); ··· 7365 6713 7366 6714 size_t bpf_program__size(const struct bpf_program *prog) 7367 6715 { 7368 - return prog->insns_cnt * sizeof(struct bpf_insn); 6716 + return prog->insns_cnt * BPF_INSN_SZ; 7369 6717 } 7370 6718 7371 6719 int bpf_program__set_prep(struct bpf_program *prog, int nr_instances, ··· 7562 6910 .expected_attach_type = BPF_TRACE_FEXIT, 7563 6911 .is_attach_btf = true, 7564 6912 .attach_fn = attach_trace), 6913 + SEC_DEF("fentry.s/", TRACING, 6914 + .expected_attach_type = BPF_TRACE_FENTRY, 6915 + .is_attach_btf = true, 6916 + .is_sleepable = true, 6917 + .attach_fn = attach_trace), 6918 + SEC_DEF("fmod_ret.s/", TRACING, 6919 + .expected_attach_type = BPF_MODIFY_RETURN, 6920 + .is_attach_btf = true, 6921 + .is_sleepable = true, 6922 + .attach_fn = attach_trace), 6923 + SEC_DEF("fexit.s/", TRACING, 6924 + .expected_attach_type = BPF_TRACE_FEXIT, 6925 + .is_attach_btf = true, 6926 + .is_sleepable = true, 6927 + .attach_fn = attach_trace), 7565 6928 SEC_DEF("freplace/", EXT, 7566 6929 .is_attach_btf = true, 7567 6930 .attach_fn = attach_trace), 7568 6931 SEC_DEF("lsm/", LSM, 7569 6932 .is_attach_btf = true, 6933 + .expected_attach_type = BPF_LSM_MAC, 6934 + .attach_fn = attach_lsm), 6935 + SEC_DEF("lsm.s/", LSM, 6936 + .is_attach_btf = true, 6937 + .is_sleepable = true, 7570 6938 .expected_attach_type = BPF_LSM_MAC, 7571 6939 .attach_fn = attach_lsm), 7572 6940 SEC_DEF("iter/", TRACING, ··· 7794 7122 return -LIBBPF_ERRNO__FORMAT; 7795 7123 } 7796 7124 7797 - name = elf_strptr(obj->efile.elf, obj->efile.strtabidx, 7798 - sym.st_name) ? 
: "<?>"; 7125 + name = elf_sym_str(obj, sym.st_name) ?: "<?>"; 7799 7126 map = find_struct_ops_map_by_offset(obj, rel.r_offset); 7800 7127 if (!map) { 7801 7128 pr_warn("struct_ops reloc: cannot find map at rel.r_offset %zu\n", ··· 8311 7640 8312 7641 prog->prog_ifindex = attr->ifindex; 8313 7642 prog->log_level = attr->log_level; 8314 - prog->prog_flags = attr->prog_flags; 7643 + prog->prog_flags |= attr->prog_flags; 8315 7644 if (!first_prog) 8316 7645 first_prog = prog; 8317 7646 } ··· 9265 8594 struct perf_buffer_params p = {}; 9266 8595 struct perf_event_attr attr = { 0, }; 9267 8596 9268 - attr.config = PERF_COUNT_SW_BPF_OUTPUT, 8597 + attr.config = PERF_COUNT_SW_BPF_OUTPUT; 9269 8598 attr.type = PERF_TYPE_SOFTWARE; 9270 8599 attr.sample_type = PERF_SAMPLE_RAW; 9271 8600 attr.sample_period = 1; ··· 9503 8832 return 0; 9504 8833 } 9505 8834 8835 + int perf_buffer__epoll_fd(const struct perf_buffer *pb) 8836 + { 8837 + return pb->epoll_fd; 8838 + } 8839 + 9506 8840 int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms) 9507 8841 { 9508 8842 int i, cnt, err; ··· 9525 8849 return cnt < 0 ? -errno : cnt; 9526 8850 } 9527 8851 8852 + /* Return number of PERF_EVENT_ARRAY map slots set up by this perf_buffer 8853 + * manager. 8854 + */ 8855 + size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb) 8856 + { 8857 + return pb->cpu_cnt; 8858 + } 8859 + 8860 + /* 8861 + * Return perf_event FD of a ring buffer in *buf_idx* slot of 8862 + * PERF_EVENT_ARRAY BPF map. This FD can be polled for new data using 8863 + * select()/poll()/epoll() Linux syscalls. 
8864 + */ 8865 + int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx) 8866 + { 8867 + struct perf_cpu_buf *cpu_buf; 8868 + 8869 + if (buf_idx >= pb->cpu_cnt) 8870 + return -EINVAL; 8871 + 8872 + cpu_buf = pb->cpu_bufs[buf_idx]; 8873 + if (!cpu_buf) 8874 + return -ENOENT; 8875 + 8876 + return cpu_buf->fd; 8877 + } 8878 + 8879 + /* 8880 + * Consume data from perf ring buffer corresponding to slot *buf_idx* in 8881 + * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to 8882 + * consume, do nothing and return success. 8883 + * Returns: 8884 + * - 0 on success; 8885 + * - <0 on failure. 8886 + */ 8887 + int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx) 8888 + { 8889 + struct perf_cpu_buf *cpu_buf; 8890 + 8891 + if (buf_idx >= pb->cpu_cnt) 8892 + return -EINVAL; 8893 + 8894 + cpu_buf = pb->cpu_bufs[buf_idx]; 8895 + if (!cpu_buf) 8896 + return -ENOENT; 8897 + 8898 + return perf_buffer__process_records(pb, cpu_buf); 8899 + } 8900 + 9528 8901 int perf_buffer__consume(struct perf_buffer *pb) 9529 8902 { 9530 8903 int i, err; ··· 9586 8861 9587 8862 err = perf_buffer__process_records(pb, cpu_buf); 9588 8863 if (err) { 9589 - pr_warn("error while processing records: %d\n", err); 8864 + pr_warn("perf_buffer: failed to process records in buffer #%d: %d\n", i, err); 9590 8865 return err; 9591 8866 } 9592 8867 }
tools/lib/bpf/libbpf.h (+4)

 						     const struct perf_buffer_raw_opts *opts);
 
 LIBBPF_API void perf_buffer__free(struct perf_buffer *pb);
+LIBBPF_API int perf_buffer__epoll_fd(const struct perf_buffer *pb);
 LIBBPF_API int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms);
 LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
+LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
+LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
+LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
 
 typedef enum bpf_perf_event_ret
 	(*bpf_perf_event_print_t)(struct perf_event_header *hdr,
tools/lib/bpf/libbpf.map (+9)

 		btf__set_fd;
 		btf__set_pointer_size;
 } LIBBPF_0.0.9;
+
+LIBBPF_0.2.0 {
+	global:
+		perf_buffer__buffer_cnt;
+		perf_buffer__buffer_fd;
+		perf_buffer__epoll_fd;
+		perf_buffer__consume_buffer;
+		xsk_socket__create_shared;
+} LIBBPF_0.1.0;
tools/lib/bpf/libbpf_internal.h (+107 -31)

 #ifndef __LIBBPF_LIBBPF_INTERNAL_H
 #define __LIBBPF_LIBBPF_INTERNAL_H
 
+#include <stdlib.h>
+#include <limits.h>
+
+/* make sure libbpf doesn't use kernel-only integer typedefs */
+#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
+
+/* prevent accidental re-addition of reallocarray() */
+#pragma GCC poison reallocarray
+
 #include "libbpf.h"
 
 #define BTF_INFO_ENC(kind, kind_flag, vlen) \
···
 #define BTF_PARAM_ENC(name, type) (name), (type)
 #define BTF_VAR_SECINFO_ENC(type, offset, size) (type), (offset), (size)
 
+#ifndef likely
+#define likely(x) __builtin_expect(!!(x), 1)
+#endif
+#ifndef unlikely
+#define unlikely(x) __builtin_expect(!!(x), 0)
+#endif
 #ifndef min
 # define min(x, y) ((x) < (y) ? (x) : (y))
 #endif
···
 #define pr_warn(fmt, ...)	__pr(LIBBPF_WARN, fmt, ##__VA_ARGS__)
 #define pr_info(fmt, ...)	__pr(LIBBPF_INFO, fmt, ##__VA_ARGS__)
 #define pr_debug(fmt, ...)	__pr(LIBBPF_DEBUG, fmt, ##__VA_ARGS__)
+
+#ifndef __has_builtin
+#define __has_builtin(x) 0
+#endif
+/*
+ * Re-implement glibc's reallocarray() for libbpf internal-only use.
+ * reallocarray(), unfortunately, is not available in all versions of glibc,
+ * so requires extra feature detection and using reallocarray() stub from
+ * <tools/libc_compat.h> and COMPAT_NEED_REALLOCARRAY. All this complicates
+ * build of libbpf unnecessarily and is just a maintenance burden. Instead,
+ * it's trivial to implement libbpf-specific internal version and use it
+ * throughout libbpf.
+ */
+static inline void *libbpf_reallocarray(void *ptr, size_t nmemb, size_t size)
+{
+	size_t total;
+
+#if __has_builtin(__builtin_mul_overflow)
+	if (unlikely(__builtin_mul_overflow(nmemb, size, &total)))
+		return NULL;
+#else
+	if (size == 0 || nmemb > ULONG_MAX / size)
+		return NULL;
+	total = nmemb * size;
+#endif
+	return realloc(ptr, total);
+}
 
 static inline bool libbpf_validate_opts(const char *opts,
 					size_t opts_sz, size_t user_sz,
···
 int bpf_object__variable_offset(const struct bpf_object *obj, const char *name,
 				__u32 *off);
 
-struct nlattr;
-typedef int (*libbpf_dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
-int libbpf_netlink_open(unsigned int *nl_pid);
-int libbpf_nl_get_link(int sock, unsigned int nl_pid,
-		       libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie);
-int libbpf_nl_get_class(int sock, unsigned int nl_pid, int ifindex,
-			libbpf_dump_nlmsg_t dump_class_nlmsg, void *cookie);
-int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
-			libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie);
-int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
-			 libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie);
-
 struct btf_ext_info {
 	/*
 	 * info points to the individual info section (e.g. func_info and
···
 	     i < (sec)->num_info;					\
 	     i++, rec = (void *)rec + (seg)->rec_size)
 
+/*
+ * The .BTF.ext ELF section layout defined as
+ *   struct btf_ext_header
+ *   func_info subsection
+ *
+ * The func_info subsection layout:
+ *   record size for struct bpf_func_info in the func_info subsection
+ *   struct btf_sec_func_info for section #1
+ *   a list of bpf_func_info records for section #1
+ *     where struct bpf_func_info mimics one in include/uapi/linux/bpf.h
+ *     but may not be identical
+ *   struct btf_sec_func_info for section #2
+ *   a list of bpf_func_info records for section #2
+ *   ......
+ *
+ * Note that the bpf_func_info record size in .BTF.ext may not
+ * be the same as the one defined in include/uapi/linux/bpf.h.
+ * The loader should ensure that record_size meets minimum
+ * requirement and pass the record as is to the kernel. The
+ * kernel will handle the func_info properly based on its contents.
+ */
+struct btf_ext_header {
+	__u16	magic;
+	__u8	version;
+	__u8	flags;
+	__u32	hdr_len;
+
+	/* All offsets are in bytes relative to the end of this header */
+	__u32	func_info_off;
+	__u32	func_info_len;
+	__u32	line_info_off;
+	__u32	line_info_len;
+
+	/* optional part of .BTF.ext header */
+	__u32	core_relo_off;
+	__u32	core_relo_len;
+};
+
 struct btf_ext {
 	union {
 		struct btf_ext_header *hdr;
···
 	};
 	struct btf_ext_info func_info;
 	struct btf_ext_info line_info;
-	struct btf_ext_info field_reloc_info;
+	struct btf_ext_info core_relo_info;
 	__u32 data_size;
 };
···
 	__u32 line_col;
 };
 
-/* bpf_field_info_kind encodes which aspect of captured field has to be
- * adjusted by relocations. Currently supported values are:
- *   - BPF_FIELD_BYTE_OFFSET: field offset (in bytes);
- *   - BPF_FIELD_EXISTS: field existence (1, if field exists; 0, otherwise);
+/* bpf_core_relo_kind encodes which aspect of captured field/type/enum value
+ * has to be adjusted by relocations.
  */
-enum bpf_field_info_kind {
+enum bpf_core_relo_kind {
 	BPF_FIELD_BYTE_OFFSET = 0,	/* field byte offset */
-	BPF_FIELD_BYTE_SIZE = 1,
+	BPF_FIELD_BYTE_SIZE = 1,	/* field size in bytes */
 	BPF_FIELD_EXISTS = 2,		/* field existence in target kernel */
-	BPF_FIELD_SIGNED = 3,
-	BPF_FIELD_LSHIFT_U64 = 4,
-	BPF_FIELD_RSHIFT_U64 = 5,
+	BPF_FIELD_SIGNED = 3,		/* field signedness (0 - unsigned, 1 - signed) */
+	BPF_FIELD_LSHIFT_U64 = 4,	/* bitfield-specific left bitshift */
+	BPF_FIELD_RSHIFT_U64 = 5,	/* bitfield-specific right bitshift */
+	BPF_TYPE_ID_LOCAL = 6,		/* type ID in local BPF object */
+	BPF_TYPE_ID_TARGET = 7,		/* type ID in target kernel */
+	BPF_TYPE_EXISTS = 8,		/* type existence in target kernel */
+	BPF_TYPE_SIZE = 9,		/* type size in bytes */
+	BPF_ENUMVAL_EXISTS = 10,	/* enum value existence in target kernel */
+	BPF_ENUMVAL_VALUE = 11,		/* enum value integer value */
 };
 
-/* The minimum bpf_field_reloc checked by the loader
+/* The minimum bpf_core_relo checked by the loader
  *
- * Field relocation captures the following data:
+ * CO-RE relocation captures the following data:
  * - insn_off - instruction offset (in bytes) within a BPF program that needs
  *   its insn->imm field to be relocated with actual field info;
  * - type_id - BTF type ID of the "root" (containing) entity of a relocatable
-  *   field;
+  *   type or field;
  * - access_str_off - offset into corresponding .BTF string section. String
-  *   itself encodes an accessed field using a sequence of field and array
-  *   indicies, separated by colon (:). It's conceptually very close to LLVM's
-  *   getelementptr ([0]) instruction's arguments for identifying offset to
-  *   a field.
+  *   interpretation depends on specific relocation kind:
+  *     - for field-based relocations, string encodes an accessed field using
+  *       a sequence of field and array indices, separated by colon (:). It's
+  *       conceptually very close to LLVM's getelementptr ([0]) instruction's
+  *       arguments for identifying offset to a field.
+  *     - for type-based relocations, strings is expected to be just "0";
+  *     - for enum value-based relocations, string contains an index of enum
+  *       value within its enum type;
  *
  * Example to provide a better feel.
···
  *
  * [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
  */
-struct bpf_field_reloc {
+struct bpf_core_relo {
 	__u32 insn_off;
 	__u32 type_id;
 	__u32 access_str_off;
-	enum bpf_field_info_kind kind;
+	enum bpf_core_relo_kind kind;
 };
 
 #endif /* __LIBBPF_LIBBPF_INTERNAL_H */
tools/lib/bpf/libbpf_probes.c (+3 -5)

 #include "libbpf.h"
 #include "libbpf_internal.h"
 
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
-
 static bool grep(const char *buffer, const char *pattern)
 {
 	return !!strstr(buffer, pattern);
···
 	return btf_fd;
 }
 
-static int load_sk_storage_btf(void)
+static int load_local_storage_btf(void)
 {
 	const char strs[] = "\0bpf_spin_lock\0val\0cnt\0l";
 	/* struct bpf_spin_lock {
···
 		key_size = 0;
 		break;
 	case BPF_MAP_TYPE_SK_STORAGE:
+	case BPF_MAP_TYPE_INODE_STORAGE:
 		btf_key_type_id = 1;
 		btf_value_type_id = 3;
 		value_size = 8;
 		max_entries = 0;
 		map_flags = BPF_F_NO_PREALLOC;
-		btf_fd = load_sk_storage_btf();
+		btf_fd = load_local_storage_btf();
 		if (btf_fd < 0)
 			return false;
 		break;
tools/lib/bpf/netlink.c (+6 -122)

 #include "libbpf_internal.h"
 #include "nlattr.h"
 
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
-
 #ifndef SOL_NETLINK
 #define SOL_NETLINK 270
 #endif
+
+typedef int (*libbpf_dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
 
 typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, libbpf_dump_nlmsg_t,
 			      void *cookie);
···
 	struct xdp_link_info info;
 };
 
-int libbpf_netlink_open(__u32 *nl_pid)
+static int libbpf_netlink_open(__u32 *nl_pid)
 {
 	struct sockaddr_nl sa;
 	socklen_t addrlen;
···
 	return 0;
 }
 
+static int libbpf_nl_get_link(int sock, unsigned int nl_pid,
+			      libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie);
+
 int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 			  size_t info_size, __u32 flags)
 {
···
 
 	return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
 				dump_link_nlmsg, cookie);
-}
-
-static int __dump_class_nlmsg(struct nlmsghdr *nlh,
-			      libbpf_dump_nlmsg_t dump_class_nlmsg,
-			      void *cookie)
-{
-	struct nlattr *tb[TCA_MAX + 1], *attr;
-	struct tcmsg *t = NLMSG_DATA(nlh);
-	int len;
-
-	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
-	attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
-	if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
-		return -LIBBPF_ERRNO__NLPARSE;
-
-	return dump_class_nlmsg(cookie, t, tb);
-}
-
-int libbpf_nl_get_class(int sock, unsigned int nl_pid, int ifindex,
-			libbpf_dump_nlmsg_t dump_class_nlmsg, void *cookie)
-{
-	struct {
-		struct nlmsghdr nlh;
-		struct tcmsg t;
-	} req = {
-		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
-		.nlh.nlmsg_type = RTM_GETTCLASS,
-		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
-		.t.tcm_family = AF_UNSPEC,
-		.t.tcm_ifindex = ifindex,
-	};
-	int seq = time(NULL);
-
-	req.nlh.nlmsg_seq = seq;
-	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
-		return -errno;
-
-	return bpf_netlink_recv(sock, nl_pid, seq, __dump_class_nlmsg,
-				dump_class_nlmsg, cookie);
-}
-
-static int __dump_qdisc_nlmsg(struct nlmsghdr *nlh,
-			      libbpf_dump_nlmsg_t dump_qdisc_nlmsg,
-			      void *cookie)
-{
-	struct nlattr *tb[TCA_MAX + 1], *attr;
-	struct tcmsg *t = NLMSG_DATA(nlh);
-	int len;
-
-	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
-	attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
-	if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
-		return -LIBBPF_ERRNO__NLPARSE;
-
-	return dump_qdisc_nlmsg(cookie, t, tb);
-}
-
-int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
-			libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie)
-{
-	struct {
-		struct nlmsghdr nlh;
-		struct tcmsg t;
-	} req = {
-		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
-		.nlh.nlmsg_type = RTM_GETQDISC,
-		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
-		.t.tcm_family = AF_UNSPEC,
-		.t.tcm_ifindex = ifindex,
-	};
-	int seq = time(NULL);
-
-	req.nlh.nlmsg_seq = seq;
-	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
-		return -errno;
-
-	return bpf_netlink_recv(sock, nl_pid, seq, __dump_qdisc_nlmsg,
-				dump_qdisc_nlmsg, cookie);
-}
-
-static int __dump_filter_nlmsg(struct nlmsghdr *nlh,
-			       libbpf_dump_nlmsg_t dump_filter_nlmsg,
-			       void *cookie)
-{
-	struct nlattr *tb[TCA_MAX + 1], *attr;
-	struct tcmsg *t = NLMSG_DATA(nlh);
-	int len;
-
-	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
-	attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
-	if (libbpf_nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
-		return -LIBBPF_ERRNO__NLPARSE;
-
-	return dump_filter_nlmsg(cookie, t, tb);
-}
-
-int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
-			 libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie)
-{
-	struct {
-		struct nlmsghdr nlh;
-		struct tcmsg t;
-	} req = {
-		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
-		.nlh.nlmsg_type = RTM_GETTFILTER,
-		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
-		.t.tcm_family = AF_UNSPEC,
-		.t.tcm_ifindex = ifindex,
-		.t.tcm_parent = handle,
-	};
-	int seq = time(NULL);
-
-	req.nlh.nlmsg_seq = seq;
-	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
-		return -errno;
-
-	return bpf_netlink_recv(sock, nl_pid, seq, __dump_filter_nlmsg,
-				dump_filter_nlmsg, cookie);
 }
tools/lib/bpf/nlattr.c (+3 -6)

  */
 
 #include <errno.h>
-#include "nlattr.h"
-#include "libbpf_internal.h"
-#include <linux/rtnetlink.h>
 #include <string.h>
 #include <stdio.h>
-
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
+#include <linux/rtnetlink.h>
+#include "nlattr.h"
+#include "libbpf_internal.h"
 
 static uint16_t nla_attr_minlen[LIBBPF_NLA_TYPE_MAX+1] = {
 	[LIBBPF_NLA_U8]		= sizeof(uint8_t),
tools/lib/bpf/ringbuf.c (+2 -6)

 #include <asm/barrier.h>
 #include <sys/mman.h>
 #include <sys/epoll.h>
-#include <tools/libc_compat.h>
 
 #include "libbpf.h"
 #include "libbpf_internal.h"
 #include "bpf.h"
-
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
 
 struct ring {
 	ring_buffer_sample_fn sample_cb;
···
 		return -EINVAL;
 	}
 
-	tmp = reallocarray(rb->rings, rb->ring_cnt + 1, sizeof(*rb->rings));
+	tmp = libbpf_reallocarray(rb->rings, rb->ring_cnt + 1, sizeof(*rb->rings));
 	if (!tmp)
 		return -ENOMEM;
 	rb->rings = tmp;
 
-	tmp = reallocarray(rb->events, rb->ring_cnt + 1, sizeof(*rb->events));
+	tmp = libbpf_reallocarray(rb->events, rb->ring_cnt + 1, sizeof(*rb->events));
 	if (!tmp)
 		return -ENOMEM;
 	rb->events = tmp;
tools/lib/bpf/xsk.c (+245 -136)

 #include <linux/if_ether.h>
 #include <linux/if_packet.h>
 #include <linux/if_xdp.h>
+#include <linux/list.h>
 #include <linux/sockios.h>
 #include <net/if.h>
 #include <sys/ioctl.h>
···
 #include "libbpf.h"
 #include "libbpf_internal.h"
 #include "xsk.h"
-
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
 
 #ifndef SOL_XDP
 #define SOL_XDP 283
···
 #endif
 
 struct xsk_umem {
-	struct xsk_ring_prod *fill;
-	struct xsk_ring_cons *comp;
+	struct xsk_ring_prod *fill_save;
+	struct xsk_ring_cons *comp_save;
 	char *umem_area;
 	struct xsk_umem_config config;
 	int fd;
 	int refcount;
+	struct list_head ctx_list;
+};
+
+struct xsk_ctx {
+	struct xsk_ring_prod *fill;
+	struct xsk_ring_cons *comp;
+	__u32 queue_id;
+	struct xsk_umem *umem;
+	int refcount;
+	int ifindex;
+	struct list_head list;
+	int prog_fd;
+	int xsks_map_fd;
+	char ifname[IFNAMSIZ];
 };
 
 struct xsk_socket {
 	struct xsk_ring_cons *rx;
 	struct xsk_ring_prod *tx;
 	__u64 outstanding_tx;
-	struct xsk_umem *umem;
+	struct xsk_ctx *ctx;
 	struct xsk_socket_config config;
 	int fd;
-	int ifindex;
-	int prog_fd;
-	int xsks_map_fd;
-	__u32 queue_id;
-	char ifname[IFNAMSIZ];
 };
 
 struct xsk_nl_info {
···
 	return -EINVAL;
 }
 
+static int xsk_create_umem_rings(struct xsk_umem *umem, int fd,
+				 struct xsk_ring_prod *fill,
+				 struct xsk_ring_cons *comp)
+{
+	struct xdp_mmap_offsets off;
+	void *map;
+	int err;
+
+	err = setsockopt(fd, SOL_XDP, XDP_UMEM_FILL_RING,
+			 &umem->config.fill_size,
+			 sizeof(umem->config.fill_size));
+	if (err)
+		return -errno;
+
+	err = setsockopt(fd, SOL_XDP, XDP_UMEM_COMPLETION_RING,
+			 &umem->config.comp_size,
+			 sizeof(umem->config.comp_size));
+	if (err)
+		return -errno;
+
+	err = xsk_get_mmap_offsets(fd, &off);
+	if (err)
+		return -errno;
+
+	map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64),
+		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
+		   XDP_UMEM_PGOFF_FILL_RING);
+	if (map == MAP_FAILED)
+		return -errno;
+
+	fill->mask = umem->config.fill_size - 1;
+	fill->size = umem->config.fill_size;
+	fill->producer = map + off.fr.producer;
+	fill->consumer = map + off.fr.consumer;
+	fill->flags = map + off.fr.flags;
+	fill->ring = map + off.fr.desc;
+	fill->cached_cons = umem->config.fill_size;
+
+	map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64),
+		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd,
+		   XDP_UMEM_PGOFF_COMPLETION_RING);
+	if (map == MAP_FAILED) {
+		err = -errno;
+		goto out_mmap;
+	}
+
+	comp->mask = umem->config.comp_size - 1;
+	comp->size = umem->config.comp_size;
+	comp->producer = map + off.cr.producer;
+	comp->consumer = map + off.cr.consumer;
+	comp->flags = map + off.cr.flags;
+	comp->ring = map + off.cr.desc;
+
+	return 0;
+
+out_mmap:
+	munmap(map, off.fr.desc + umem->config.fill_size * sizeof(__u64));
+	return err;
+}
+
 int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area,
 			    __u64 size, struct xsk_ring_prod *fill,
 			    struct xsk_ring_cons *comp,
 			    const struct xsk_umem_config *usr_config)
 {
-	struct xdp_mmap_offsets off;
 	struct xdp_umem_reg mr;
 	struct xsk_umem *umem;
-	void *map;
 	int err;
 
 	if (!umem_area || !umem_ptr || !fill || !comp)
···
 	}
 
 	umem->umem_area = umem_area;
+	INIT_LIST_HEAD(&umem->ctx_list);
 	xsk_set_umem_config(&umem->config, usr_config);
 
 	memset(&mr, 0, sizeof(mr));
···
 		err = -errno;
 		goto out_socket;
 	}
-	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_FILL_RING,
-			 &umem->config.fill_size,
-			 sizeof(umem->config.fill_size));
-	if (err) {
-		err = -errno;
+
+	err = xsk_create_umem_rings(umem, umem->fd, fill, comp);
+	if (err)
 		goto out_socket;
-	}
-	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_COMPLETION_RING,
-			 &umem->config.comp_size,
-			 sizeof(umem->config.comp_size));
-	if (err) {
-		err = -errno;
-		goto out_socket;
-	}
 
-	err = xsk_get_mmap_offsets(umem->fd, &off);
-	if (err) {
-		err = -errno;
-		goto out_socket;
-	}
-
-	map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64),
-		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd,
-		   XDP_UMEM_PGOFF_FILL_RING);
-	if (map == MAP_FAILED) {
-		err = -errno;
-		goto out_socket;
-	}
-
-	umem->fill = fill;
-	fill->mask = umem->config.fill_size - 1;
-	fill->size = umem->config.fill_size;
-	fill->producer = map + off.fr.producer;
-	fill->consumer = map + off.fr.consumer;
-	fill->flags = map + off.fr.flags;
-	fill->ring = map + off.fr.desc;
-	fill->cached_prod = *fill->producer;
-	/* cached_cons is "size" bigger than the real consumer pointer
-	 * See xsk_prod_nb_free
-	 */
-	fill->cached_cons = *fill->consumer + umem->config.fill_size;
-
-	map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64),
-		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd,
-		   XDP_UMEM_PGOFF_COMPLETION_RING);
-	if (map == MAP_FAILED) {
-		err = -errno;
-		goto out_mmap;
-	}
-
-	umem->comp = comp;
-	comp->mask = umem->config.comp_size - 1;
-	comp->size = umem->config.comp_size;
-	comp->producer = map + off.cr.producer;
-	comp->consumer = map + off.cr.consumer;
-	comp->flags = map + off.cr.flags;
-	comp->ring = map + off.cr.desc;
-	comp->cached_prod = *comp->producer;
-	comp->cached_cons = *comp->consumer;
-
+	umem->fill_save = fill;
+	umem->comp_save = comp;
 	*umem_ptr = umem;
 	return 0;
 
-out_mmap:
-	munmap(map, off.fr.desc + umem->config.fill_size * sizeof(__u64));
 out_socket:
 	close(umem->fd);
 out_umem_alloc:
···
 static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 {
 	static const int log_buf_size = 16 * 1024;
+	struct xsk_ctx *ctx = xsk->ctx;
 	char log_buf[log_buf_size];
 	int err, prog_fd;
 
···
 		/* *(u32 *)(r10 - 4) = r2 */
 		BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -4),
 		/* r1 = xskmap[] */
-		BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
+		BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd),
 		/* r3 = XDP_PASS */
 		BPF_MOV64_IMM(BPF_REG_3, 2),
 		/* call bpf_redirect_map */
···
 		/* r2 += -4 */
 		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
 		/* r1 = xskmap[] */
-		BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
+		BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd),
 		/* call bpf_map_lookup_elem */
 		BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
 		/* r1 = r0 */
···
 		/* r2 = *(u32 *)(r10 - 4) */
 		BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
 		/* r1 = xskmap[] */
-		BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
+		BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd),
 		/* r3 = 0 */
 		BPF_MOV64_IMM(BPF_REG_3, 0),
 		/* call bpf_redirect_map */
···
 		return prog_fd;
 	}
 
-	err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags);
+	err = bpf_set_link_xdp_fd(xsk->ctx->ifindex, prog_fd,
+				  xsk->config.xdp_flags);
 	if (err) {
 		close(prog_fd);
 		return err;
 	}
 
-	xsk->prog_fd = prog_fd;
+	ctx->prog_fd = prog_fd;
 	return 0;
 }
 
 static int xsk_get_max_queues(struct xsk_socket *xsk)
 {
 	struct ethtool_channels channels = { .cmd = ETHTOOL_GCHANNELS };
+	struct xsk_ctx *ctx = xsk->ctx;
 	struct ifreq ifr = {};
 	int fd, err, ret;
 
···
 		return -errno;
 
 	ifr.ifr_data = (void *)&channels;
-	memcpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
+	memcpy(ifr.ifr_name, ctx->ifname, IFNAMSIZ - 1);
 	ifr.ifr_name[IFNAMSIZ - 1] = '\0';
 	err = ioctl(fd, SIOCETHTOOL, &ifr);
 	if (err && errno != EOPNOTSUPP) {
···
 
 static int xsk_create_bpf_maps(struct xsk_socket *xsk)
 {
+	struct xsk_ctx *ctx = xsk->ctx;
 	int max_queues;
 	int fd;
 
···
 	if (fd < 0)
 		return fd;
 
-	xsk->xsks_map_fd = fd;
+	ctx->xsks_map_fd = fd;
 
 	return 0;
 }
 
 static void xsk_delete_bpf_maps(struct xsk_socket *xsk)
 {
-	bpf_map_delete_elem(xsk->xsks_map_fd, &xsk->queue_id);
-	close(xsk->xsks_map_fd);
+	struct xsk_ctx *ctx = xsk->ctx;
+
+	bpf_map_delete_elem(ctx->xsks_map_fd, &ctx->queue_id);
+	close(ctx->xsks_map_fd);
 }
 
 static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
···
 	__u32 i, *map_ids, num_maps, prog_len = sizeof(struct bpf_prog_info);
 	__u32 map_len = sizeof(struct bpf_map_info);
 	struct bpf_prog_info prog_info = {};
+	struct xsk_ctx *ctx = xsk->ctx;
 	struct bpf_map_info map_info;
 	int fd, err;
 
-	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
+	err = bpf_obj_get_info_by_fd(ctx->prog_fd, &prog_info, &prog_len);
 	if (err)
 		return err;
 
···
 	prog_info.nr_map_ids = num_maps;
 	prog_info.map_ids = (__u64)(unsigned long)map_ids;
 
-	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
+	err = bpf_obj_get_info_by_fd(ctx->prog_fd, &prog_info, &prog_len);
 	if (err)
 		goto out_map_ids;
 
-	xsk->xsks_map_fd = -1;
+	ctx->xsks_map_fd = -1;
 
 	for (i = 0; i < prog_info.nr_map_ids; i++) {
 		fd = bpf_map_get_fd_by_id(map_ids[i]);
···
 		}
 
 		if (!strcmp(map_info.name, "xsks_map")) {
-			xsk->xsks_map_fd = fd;
+			ctx->xsks_map_fd = fd;
 			continue;
 		}
 
···
 	}
 
 	err = 0;
-	if (xsk->xsks_map_fd == -1)
+	if (ctx->xsks_map_fd == -1)
 		err = -ENOENT;
 
 out_map_ids:
···
 
 static int xsk_set_bpf_maps(struct xsk_socket *xsk)
 {
-	return bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id,
+	struct xsk_ctx *ctx = xsk->ctx;
+
+	return bpf_map_update_elem(ctx->xsks_map_fd, &ctx->queue_id,
 				   &xsk->fd, 0);
 }
 
 static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
 {
+	struct xsk_ctx *ctx = xsk->ctx;
 	__u32 prog_id = 0;
 	int err;
 
-	err = bpf_get_link_xdp_id(xsk->ifindex, &prog_id,
+	err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id,
 				  xsk->config.xdp_flags);
 	if (err)
 		return err;
···
 			return err;
 		}
 	} else {
-		xsk->prog_fd = bpf_prog_get_fd_by_id(prog_id);
-		if (xsk->prog_fd < 0)
+		ctx->prog_fd = bpf_prog_get_fd_by_id(prog_id);
+		if (ctx->prog_fd < 0)
 			return -errno;
 		err = xsk_lookup_bpf_maps(xsk);
 		if (err) {
-			close(xsk->prog_fd);
+			close(ctx->prog_fd);
 			return err;
 		}
 	}
···
 	err = xsk_set_bpf_maps(xsk);
 	if (err) {
 		xsk_delete_bpf_maps(xsk);
-		close(xsk->prog_fd);
+		close(ctx->prog_fd);
 		return err;
 	}
 
 	return 0;
 }
 
-int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
-		       __u32 queue_id, struct xsk_umem *umem,
-		       struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
-		       const struct xsk_socket_config *usr_config)
+static struct xsk_ctx *xsk_get_ctx(struct xsk_umem *umem, int ifindex,
+				   __u32 queue_id)
+{
+	struct xsk_ctx *ctx;
+
+	if (list_empty(&umem->ctx_list))
+		return NULL;
+
+	list_for_each_entry(ctx, &umem->ctx_list, list) {
+		if (ctx->ifindex == ifindex && ctx->queue_id == queue_id) {
+			ctx->refcount++;
+			return ctx;
+		}
+	}
+
+	return NULL;
+}
+
+static void xsk_put_ctx(struct xsk_ctx *ctx)
+{
+	struct xsk_umem *umem = ctx->umem;
+	struct xdp_mmap_offsets off;
+	int err;
+
+	if (--ctx->refcount == 0) {
+		err = xsk_get_mmap_offsets(umem->fd, &off);
+		if (!err) {
+			munmap(ctx->fill->ring - off.fr.desc,
+			       off.fr.desc + umem->config.fill_size *
+			       sizeof(__u64));
+			munmap(ctx->comp->ring - off.cr.desc,
+			       off.cr.desc + umem->config.comp_size *
+			       sizeof(__u64));
+		}
+
+		list_del(&ctx->list);
+		free(ctx);
+	}
+}
+
+static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk,
+				      struct xsk_umem *umem, int ifindex,
+				      const char *ifname, __u32 queue_id,
+				      struct xsk_ring_prod *fill,
+				      struct xsk_ring_cons *comp)
+{
+	struct xsk_ctx *ctx;
+	int err;
+
+	ctx = calloc(1, sizeof(*ctx));
+	if (!ctx)
+		return NULL;
+
+	if (!umem->fill_save) {
+		err = xsk_create_umem_rings(umem, xsk->fd, fill, comp);
+		if (err) {
+			free(ctx);
+			return NULL;
+		}
+	} else if (umem->fill_save != fill || umem->comp_save != comp) {
+		/* Copy over rings to new structs. */
+		memcpy(fill, umem->fill_save, sizeof(*fill));
+		memcpy(comp, umem->comp_save, sizeof(*comp));
+	}
+
+	ctx->ifindex = ifindex;
+	ctx->refcount = 1;
+	ctx->umem = umem;
+	ctx->queue_id = queue_id;
+	memcpy(ctx->ifname, ifname, IFNAMSIZ - 1);
+	ctx->ifname[IFNAMSIZ - 1] = '\0';
+
+	umem->fill_save = NULL;
+	umem->comp_save = NULL;
+	ctx->fill = fill;
+	ctx->comp = comp;
+	list_add(&ctx->list, &umem->ctx_list);
+	return ctx;
+}
+
+int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
+			      const char *ifname,
+			      __u32 queue_id, struct xsk_umem *umem,
+			      struct xsk_ring_cons *rx,
+			      struct xsk_ring_prod *tx,
+			      struct xsk_ring_prod *fill,
+			      struct xsk_ring_cons *comp,
+			      const struct xsk_socket_config *usr_config)
 {
 	void *rx_map = NULL, *tx_map = NULL;
 	struct sockaddr_xdp sxdp = {};
 	struct xdp_mmap_offsets off;
 	struct xsk_socket *xsk;
-	int err;
+	struct xsk_ctx *ctx;
+	int err, ifindex;
 
-	if (!umem || !xsk_ptr || !(rx || tx))
+	if (!umem || !xsk_ptr || !(rx || tx) || !fill || !comp)
 		return -EFAULT;
 
 	xsk = calloc(1, sizeof(*xsk));
···
 	if (err)
 		goto out_xsk_alloc;
 
-	if (umem->refcount &&
-	    !(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
-		pr_warn("Error: shared umems not supported by libbpf supplied XDP program.\n");
-		err = -EBUSY;
+	xsk->outstanding_tx = 0;
+	ifindex = if_nametoindex(ifname);
+	if (!ifindex) {
+		err = -errno;
 		goto out_xsk_alloc;
 	}
 
···
 		xsk->fd = umem->fd;
 	}
 
-	xsk->outstanding_tx = 0;
-	xsk->queue_id = queue_id;
-	xsk->umem = umem;
-	xsk->ifindex = if_nametoindex(ifname);
-	if (!xsk->ifindex) {
-		err = -errno;
-		goto out_socket;
+	ctx = xsk_get_ctx(umem, ifindex, queue_id);
+	if (!ctx) {
+		ctx = xsk_create_ctx(xsk, umem, ifindex, ifname, queue_id,
+				     fill, comp);
+		if (!ctx) {
+			err = -ENOMEM;
+			goto out_socket;
+		}
 	}
-	memcpy(xsk->ifname, ifname, IFNAMSIZ - 1);
-	xsk->ifname[IFNAMSIZ - 1] = '\0';
+	xsk->ctx = ctx;
 
 	if (rx) {
 		err = setsockopt(xsk->fd, SOL_XDP, XDP_RX_RING,
···
 				 sizeof(xsk->config.rx_size));
 		if (err) {
 			err = -errno;
-			goto out_socket;
+			goto out_put_ctx;
 		}
 	}
 	if (tx) {
···
 				 sizeof(xsk->config.tx_size));
 		if (err) {
 			err = -errno;
-			goto out_socket;
+			goto out_put_ctx;
 		}
 	}
 
 	err = xsk_get_mmap_offsets(xsk->fd, &off);
 	if (err) {
 		err = -errno;
-		goto out_socket;
+		goto out_put_ctx;
 	}
 
 	if (rx) {
···
 			      xsk->fd, XDP_PGOFF_RX_RING);
 		if (rx_map == MAP_FAILED) {
 			err = -errno;
-			goto out_socket;
+			goto out_put_ctx;
 		}
 
 		rx->mask = xsk->config.rx_size - 1;
···
 	xsk->tx = tx;
 
 	sxdp.sxdp_family = PF_XDP;
-	sxdp.sxdp_ifindex = xsk->ifindex;
-	sxdp.sxdp_queue_id = xsk->queue_id;
+	sxdp.sxdp_ifindex = ctx->ifindex;
+	sxdp.sxdp_queue_id = ctx->queue_id;
 	if (umem->refcount > 1) {
-		sxdp.sxdp_flags = XDP_SHARED_UMEM;
+		sxdp.sxdp_flags |= XDP_SHARED_UMEM;
 		sxdp.sxdp_shared_umem_fd = umem->fd;
 	} else {
 		sxdp.sxdp_flags = xsk->config.bind_flags;
···
 		goto out_mmap_tx;
 	}
 
-	xsk->prog_fd = -1;
+	ctx->prog_fd = -1;
 
 	if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
 		err = xsk_setup_xdp_prog(xsk);
···
 	if (rx)
 		munmap(rx_map, off.rx.desc +
 		       xsk->config.rx_size * sizeof(struct xdp_desc));
+out_put_ctx:
+	xsk_put_ctx(ctx);
 out_socket:
 	if (--umem->refcount)
 		close(xsk->fd);
···
 	return err;
 }
 
+int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
+		       __u32 queue_id, struct xsk_umem *umem,
+		       struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
+		       const struct xsk_socket_config *usr_config)
+{
+	return xsk_socket__create_shared(xsk_ptr, ifname, queue_id, umem,
+					 rx, tx, umem->fill_save,
+					 umem->comp_save, usr_config);
+}
+
 int xsk_umem__delete(struct xsk_umem *umem)
 {
-	struct xdp_mmap_offsets off;
-	int err;
-
 	if (!umem)
 		return 0;
 
 	if (umem->refcount)
 		return -EBUSY;
-
-	err = xsk_get_mmap_offsets(umem->fd, &off);
-	if (!err) {
-		munmap(umem->fill->ring - off.fr.desc,
-		       off.fr.desc + umem->config.fill_size * sizeof(__u64));
-		munmap(umem->comp->ring - off.cr.desc,
-		       off.cr.desc + umem->config.comp_size * sizeof(__u64));
-	}
 
 	close(umem->fd);
 	free(umem);
···
 void xsk_socket__delete(struct xsk_socket *xsk)
 {
 	size_t desc_sz = sizeof(struct xdp_desc);
+	struct xsk_ctx *ctx = xsk->ctx;
 	struct xdp_mmap_offsets off;
 	int err;
 
 	if (!xsk)
 		return;
 
-	if (xsk->prog_fd != -1) {
+	if (ctx->prog_fd != -1) {
 		xsk_delete_bpf_maps(xsk);
-		close(xsk->prog_fd);
+		close(ctx->prog_fd);
 	}
 
 	err = xsk_get_mmap_offsets(xsk->fd, &off);
···
 			munmap(xsk->tx->ring - off.tx.desc,
 			       off.tx.desc + xsk->config.tx_size * desc_sz);
 		}
-
 	}
 
-	xsk->umem->refcount--;
+	xsk_put_ctx(ctx);
+
+	ctx->umem->refcount--;
 	/* Do not close an fd that also has an associated umem connected
 	 * to it.
 	 */
-	if (xsk->fd != xsk->umem->fd)
+	if (xsk->fd != ctx->umem->fd)
 		close(xsk->fd);
 	free(xsk);
 }
+9
tools/lib/bpf/xsk.h
···
 			       struct xsk_ring_cons *rx,
 			       struct xsk_ring_prod *tx,
 			       const struct xsk_socket_config *config);
+LIBBPF_API int
+xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
+			  const char *ifname,
+			  __u32 queue_id, struct xsk_umem *umem,
+			  struct xsk_ring_cons *rx,
+			  struct xsk_ring_prod *tx,
+			  struct xsk_ring_prod *fill,
+			  struct xsk_ring_cons *comp,
+			  const struct xsk_socket_config *config);
 
 /* Returns 0 for success and -EBUSY if the umem is still in use. */
 LIBBPF_API int xsk_umem__delete(struct xsk_umem *umem);
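
The header above is the whole user-visible surface of the shared-umem change: the existing xsk_socket__create() keeps its signature, while xsk_socket__create_shared() additionally takes per-socket fill and completion rings. As a non-runnable, illustrative sketch only (it needs libbpf and an AF_XDP-capable kernel; umem_area, UMEM_SIZE, and "eth0" are placeholders, and real code must check each return value and set up the umem buffer properly), the intended call pattern looks roughly like:

```c
/* Illustrative sketch, not runnable as-is: share one umem between two
 * queues of the same netdev. The first socket reuses the fill/comp rings
 * registered with the umem; each additional socket brings its own rings
 * via xsk_socket__create_shared().
 */
struct xsk_ring_prod fill0, fill1, tx0, tx1;
struct xsk_ring_cons comp0, comp1, rx0, rx1;
struct xsk_socket *xsk0, *xsk1;
struct xsk_umem *umem;
int err;

err = xsk_umem__create(&umem, umem_area, UMEM_SIZE, &fill0, &comp0, NULL);
/* first socket: plain create() picks up fill0/comp0 saved on the umem */
err = err ?: xsk_socket__create(&xsk0, "eth0", /*queue_id=*/0, umem,
				&rx0, &tx0, NULL);
/* second socket on queue 1: shares the umem, supplies its own rings */
err = err ?: xsk_socket__create_shared(&xsk1, "eth0", /*queue_id=*/1, umem,
				       &rx1, &tx1, &fill1, &comp1, NULL);
```

This matches the implementation in xsk.c above, where xsk_socket__create() simply forwards umem->fill_save and umem->comp_save to the shared variant.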
-4
tools/perf/Makefile.config
···
   EXTLIBS += -lelf
   $(call detected,CONFIG_LIBELF)
 
-  ifeq ($(feature-libelf-mmap), 1)
-    CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
-  endif
-
   ifeq ($(feature-libelf-getphdrnum), 1)
     CFLAGS += -DHAVE_ELF_GETPHDRNUM_SUPPORT
   endif
+1 -1
tools/perf/util/symbol.h
···
  * libelf 0.8.x and earlier do not support ELF_C_READ_MMAP;
  * for newer versions we can use mmap to reduce memory usage:
  */
-#ifdef HAVE_LIBELF_MMAP_SUPPORT
+#ifdef ELF_C_READ_MMAP
 # define PERF_ELF_C_READ_MMAP ELF_C_READ_MMAP
 #else
 # define PERF_ELF_C_READ_MMAP ELF_C_READ
+1 -1
tools/testing/selftests/bpf/Makefile
···
 		      $(TRUNNER_BPF_PROGS_DIR)/%.c \
 		      $(TRUNNER_BPF_PROGS_DIR)/*.h \
 		      $$(INCLUDE_DIR)/vmlinux.h \
-		      $$(BPFOBJ) | $(TRUNNER_OUTPUT)
+		      $(wildcard $(BPFDIR)/bpf_*.h) | $(TRUNNER_OUTPUT)
 	$$(call $(TRUNNER_BPF_BUILD_RULE),$$<,$$@, \
 					  $(TRUNNER_BPF_CFLAGS), \
 					  $(TRUNNER_BPF_LDFLAGS))
+21
tools/testing/selftests/bpf/README.rst
···
   https://reviews.llvm.org/D78466
   has been pushed to llvm 10.x release branch and will be
   available in 10.0.1. The fix is available in llvm 11.0.0 trunk.
+
+BPF CO-RE-based tests and Clang version
+=======================================
+
+A set of selftests use BPF target-specific built-ins, which might require
+bleeding-edge Clang versions (Clang 12 nightly at this time).
+
+Few sub-tests of core_reloc test suit (part of test_progs test runner) require
+the following built-ins, listed with corresponding Clang diffs introducing
+them to Clang/LLVM. These sub-tests are going to be skipped if Clang is too
+old to support them, they shouldn't cause build failures or runtime test
+failures:
+
+- __builtin_btf_type_id() ([0], [1], [2]);
+- __builtin_preserve_type_info(), __builtin_preserve_enum_value() ([3], [4]).
+
+  [0] https://reviews.llvm.org/D74572
+  [1] https://reviews.llvm.org/D74668
+  [2] https://reviews.llvm.org/D85174
+  [3] https://reviews.llvm.org/D83878
+  [4] https://reviews.llvm.org/D83242
+2
tools/testing/selftests/bpf/bench.c
···
 extern const struct bench bench_trig_rawtp;
 extern const struct bench bench_trig_kprobe;
 extern const struct bench bench_trig_fentry;
+extern const struct bench bench_trig_fentry_sleep;
 extern const struct bench bench_trig_fmodret;
 extern const struct bench bench_rb_libbpf;
 extern const struct bench bench_rb_custom;
···
 	&bench_trig_rawtp,
 	&bench_trig_kprobe,
 	&bench_trig_fentry,
+	&bench_trig_fentry_sleep,
 	&bench_trig_fmodret,
 	&bench_rb_libbpf,
 	&bench_rb_custom,
+17
tools/testing/selftests/bpf/benchs/bench_trigger.c
···
 	attach_bpf(ctx.skel->progs.bench_trigger_fentry);
 }
 
+static void trigger_fentry_sleep_setup()
+{
+	setup_ctx();
+	attach_bpf(ctx.skel->progs.bench_trigger_fentry_sleep);
+}
+
 static void trigger_fmodret_setup()
 {
 	setup_ctx();
···
 	.name = "trig-fentry",
 	.validate = trigger_validate,
 	.setup = trigger_fentry_setup,
+	.producer_thread = trigger_producer,
+	.consumer_thread = trigger_consumer,
+	.measure = trigger_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_trig_fentry_sleep = {
+	.name = "trig-fentry-sleep",
+	.validate = trigger_validate,
+	.setup = trigger_fentry_sleep_setup,
 	.producer_thread = trigger_producer,
 	.consumer_thread = trigger_consumer,
 	.measure = trigger_measure,
+37
tools/testing/selftests/bpf/network_helpers.c
···
 	return -1;
 }
 
+int fastopen_connect(int server_fd, const char *data, unsigned int data_len,
+		     int timeout_ms)
+{
+	struct sockaddr_storage addr;
+	socklen_t addrlen = sizeof(addr);
+	struct sockaddr_in *addr_in;
+	int fd, ret;
+
+	if (getsockname(server_fd, (struct sockaddr *)&addr, &addrlen)) {
+		log_err("Failed to get server addr");
+		return -1;
+	}
+
+	addr_in = (struct sockaddr_in *)&addr;
+	fd = socket(addr_in->sin_family, SOCK_STREAM, 0);
+	if (fd < 0) {
+		log_err("Failed to create client socket");
+		return -1;
+	}
+
+	if (settimeo(fd, timeout_ms))
+		goto error_close;
+
+	ret = sendto(fd, data, data_len, MSG_FASTOPEN, (struct sockaddr *)&addr,
+		     addrlen);
+	if (ret != data_len) {
+		log_err("sendto(data, %u) != %d\n", data_len, ret);
+		goto error_close;
+	}
+
+	return fd;
+
+error_close:
+	save_errno_close(fd);
+	return -1;
+}
+
 static int connect_fd_to_addr(int fd,
 			      const struct sockaddr_storage *addr,
 			      socklen_t addrlen)
+2
tools/testing/selftests/bpf/network_helpers.h
···
 	       int timeout_ms);
 int connect_to_fd(int server_fd, int timeout_ms);
 int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms);
+int fastopen_connect(int server_fd, const char *data, unsigned int data_len,
+		     int timeout_ms);
 int make_sockaddr(int family, const char *addr_str, __u16 port,
 		  struct sockaddr_storage *addr, socklen_t *len);
+34 -1
tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c
···
 	return err;
 }
 
-void test_btf_map_in_map(void)
+static void test_lookup_update(void)
 {
 	int err, key = 0, val, i;
 	struct test_btf_map_in_map *skel;
···
 
 cleanup:
 	test_btf_map_in_map__destroy(skel);
+}
+
+static void test_diff_size(void)
+{
+	struct test_btf_map_in_map *skel;
+	int err, inner_map_fd, zero = 0;
+
+	skel = test_btf_map_in_map__open_and_load();
+	if (CHECK(!skel, "skel_open", "failed to open&load skeleton\n"))
+		return;
+
+	inner_map_fd = bpf_map__fd(skel->maps.sockarr_sz2);
+	err = bpf_map_update_elem(bpf_map__fd(skel->maps.outer_sockarr), &zero,
+				  &inner_map_fd, 0);
+	CHECK(err, "outer_sockarr inner map size check",
+	      "cannot use a different size inner_map\n");
+
+	inner_map_fd = bpf_map__fd(skel->maps.inner_map_sz2);
+	err = bpf_map_update_elem(bpf_map__fd(skel->maps.outer_arr), &zero,
+				  &inner_map_fd, 0);
+	CHECK(!err, "outer_arr inner map size check",
+	      "incorrectly updated with a different size inner_map\n");
+
+	test_btf_map_in_map__destroy(skel);
+}
+
+void test_btf_map_in_map(void)
+{
+	if (test__start_subtest("lookup_update"))
+		test_lookup_update();
+
+	if (test__start_subtest("diff_size"))
+		test_diff_size();
 }
+324 -26
tools/testing/selftests/bpf/prog_tests/core_reloc.c
···
 #include "progs/core_reloc_types.h"
 #include <sys/mman.h>
 #include <sys/syscall.h>
+#include <bpf/btf.h>
+
+static int duration = 0;
 
 #define STRUCT_TO_CHAR_PTR(struct_name) (const char *)&(struct struct_name)
···
 	.fails = true,						\
 }
 
-#define EXISTENCE_CASE_COMMON(name)				\
+#define FIELD_EXISTS_CASE_COMMON(name)				\
 	.case_name = #name,					\
 	.bpf_obj_file = "test_core_reloc_existence.o",		\
-	.btf_src_file = "btf__core_reloc_" #name ".o",		\
-	.relaxed_core_relocs = true
+	.btf_src_file = "btf__core_reloc_" #name ".o"		\
 
-#define EXISTENCE_ERR_CASE(name) {				\
-	EXISTENCE_CASE_COMMON(name),				\
+#define FIELD_EXISTS_ERR_CASE(name) {				\
+	FIELD_EXISTS_CASE_COMMON(name),				\
 	.fails = true,						\
 }
···
 	.fails = true,						\
 }
 
+#define TYPE_BASED_CASE_COMMON(name)				\
+	.case_name = #name,					\
+	.bpf_obj_file = "test_core_reloc_type_based.o",		\
+	.btf_src_file = "btf__core_reloc_" #name ".o"		\
+
+#define TYPE_BASED_CASE(name, ...) {				\
+	TYPE_BASED_CASE_COMMON(name),				\
+	.output = STRUCT_TO_CHAR_PTR(core_reloc_type_based_output) \
+			__VA_ARGS__,				\
+	.output_len = sizeof(struct core_reloc_type_based_output), \
+}
+
+#define TYPE_BASED_ERR_CASE(name) {				\
+	TYPE_BASED_CASE_COMMON(name),				\
+	.fails = true,						\
+}
+
+#define TYPE_ID_CASE_COMMON(name)				\
+	.case_name = #name,					\
+	.bpf_obj_file = "test_core_reloc_type_id.o",		\
+	.btf_src_file = "btf__core_reloc_" #name ".o"		\
+
+#define TYPE_ID_CASE(name, setup_fn) {				\
+	TYPE_ID_CASE_COMMON(name),				\
+	.output = STRUCT_TO_CHAR_PTR(core_reloc_type_id_output) {}, \
+	.output_len = sizeof(struct core_reloc_type_id_output),	\
+	.setup = setup_fn,					\
+}
+
+#define TYPE_ID_ERR_CASE(name) {				\
+	TYPE_ID_CASE_COMMON(name),				\
+	.fails = true,						\
+}
+
+#define ENUMVAL_CASE_COMMON(name)				\
+	.case_name = #name,					\
+	.bpf_obj_file = "test_core_reloc_enumval.o",		\
+	.btf_src_file = "btf__core_reloc_" #name ".o"		\
+
+#define ENUMVAL_CASE(name, ...) {				\
+	ENUMVAL_CASE_COMMON(name),				\
+	.output = STRUCT_TO_CHAR_PTR(core_reloc_enumval_output)	\
+			__VA_ARGS__,				\
+	.output_len = sizeof(struct core_reloc_enumval_output),	\
+}
+
+#define ENUMVAL_ERR_CASE(name) {				\
+	ENUMVAL_CASE_COMMON(name),				\
+	.fails = true,						\
+}
+
+struct core_reloc_test_case;
+
+typedef int (*setup_test_fn)(struct core_reloc_test_case *test);
+
 struct core_reloc_test_case {
 	const char *case_name;
 	const char *bpf_obj_file;
···
 	bool fails;
 	bool relaxed_core_relocs;
 	bool direct_raw_tp;
+	setup_test_fn setup;
 };
+
+static int find_btf_type(const struct btf *btf, const char *name, __u32 kind)
+{
+	int id;
+
+	id = btf__find_by_name_kind(btf, name, kind);
+	if (CHECK(id <= 0, "find_type_id", "failed to find '%s', kind %d: %d\n", name, kind, id))
+		return -1;
+
+	return id;
+}
+
+static int setup_type_id_case_local(struct core_reloc_test_case *test)
+{
+	struct core_reloc_type_id_output *exp = (void *)test->output;
+	struct btf *local_btf = btf__parse(test->bpf_obj_file, NULL);
+	struct btf *targ_btf = btf__parse(test->btf_src_file, NULL);
+	const struct btf_type *t;
+	const char *name;
+	int i;
+
+	if (CHECK(IS_ERR(local_btf), "local_btf", "failed: %ld\n", PTR_ERR(local_btf)) ||
+	    CHECK(IS_ERR(targ_btf), "targ_btf", "failed: %ld\n", PTR_ERR(targ_btf))) {
+		btf__free(local_btf);
+		btf__free(targ_btf);
+		return -EINVAL;
+	}
+
+	exp->local_anon_struct = -1;
+	exp->local_anon_union = -1;
+	exp->local_anon_enum = -1;
+	exp->local_anon_func_proto_ptr = -1;
+	exp->local_anon_void_ptr = -1;
+	exp->local_anon_arr = -1;
+
+	for (i = 1; i <= btf__get_nr_types(local_btf); i++)
+	{
+		t = btf__type_by_id(local_btf, i);
+		/* we are interested only in anonymous types */
+		if (t->name_off)
+			continue;
+
+		if (btf_is_struct(t) && btf_vlen(t) &&
+		    (name = btf__name_by_offset(local_btf, btf_members(t)[0].name_off)) &&
+		    strcmp(name, "marker_field") == 0) {
+			exp->local_anon_struct = i;
+		} else if (btf_is_union(t) && btf_vlen(t) &&
+			   (name = btf__name_by_offset(local_btf, btf_members(t)[0].name_off)) &&
+			   strcmp(name, "marker_field") == 0) {
+			exp->local_anon_union = i;
+		} else if (btf_is_enum(t) && btf_vlen(t) &&
+			   (name = btf__name_by_offset(local_btf, btf_enum(t)[0].name_off)) &&
+			   strcmp(name, "MARKER_ENUM_VAL") == 0) {
+			exp->local_anon_enum = i;
+		} else if (btf_is_ptr(t) && (t = btf__type_by_id(local_btf, t->type))) {
+			if (btf_is_func_proto(t) && (t = btf__type_by_id(local_btf, t->type)) &&
+			    btf_is_int(t) && (name = btf__name_by_offset(local_btf, t->name_off)) &&
+			    strcmp(name, "_Bool") == 0) {
+				/* ptr -> func_proto -> _Bool */
+				exp->local_anon_func_proto_ptr = i;
+			} else if (btf_is_void(t)) {
+				/* ptr -> void */
+				exp->local_anon_void_ptr = i;
+			}
+		} else if (btf_is_array(t) && (t = btf__type_by_id(local_btf, btf_array(t)->type)) &&
+			   btf_is_int(t) && (name = btf__name_by_offset(local_btf, t->name_off)) &&
+			   strcmp(name, "_Bool") == 0) {
+			/* _Bool[] */
+			exp->local_anon_arr = i;
+		}
+	}
+
+	exp->local_struct = find_btf_type(local_btf, "a_struct", BTF_KIND_STRUCT);
+	exp->local_union = find_btf_type(local_btf, "a_union", BTF_KIND_UNION);
+	exp->local_enum = find_btf_type(local_btf, "an_enum", BTF_KIND_ENUM);
+	exp->local_int = find_btf_type(local_btf, "int", BTF_KIND_INT);
+	exp->local_struct_typedef = find_btf_type(local_btf, "named_struct_typedef", BTF_KIND_TYPEDEF);
+	exp->local_func_proto_typedef = find_btf_type(local_btf, "func_proto_typedef", BTF_KIND_TYPEDEF);
+	exp->local_arr_typedef = find_btf_type(local_btf, "arr_typedef", BTF_KIND_TYPEDEF);
+
+	btf__free(local_btf);
+	btf__free(targ_btf);
+	return 0;
+}
+
+static int setup_type_id_case_success(struct core_reloc_test_case *test) {
+	struct core_reloc_type_id_output *exp = (void *)test->output;
+	struct btf *targ_btf = btf__parse(test->btf_src_file, NULL);
+	int err;
+
+	err = setup_type_id_case_local(test);
+	if (err)
+		return err;
+
+	targ_btf = btf__parse(test->btf_src_file, NULL);
+
+	exp->targ_struct = find_btf_type(targ_btf, "a_struct", BTF_KIND_STRUCT);
+	exp->targ_union = find_btf_type(targ_btf, "a_union", BTF_KIND_UNION);
+	exp->targ_enum = find_btf_type(targ_btf, "an_enum", BTF_KIND_ENUM);
+	exp->targ_int = find_btf_type(targ_btf, "int", BTF_KIND_INT);
+	exp->targ_struct_typedef = find_btf_type(targ_btf, "named_struct_typedef", BTF_KIND_TYPEDEF);
+	exp->targ_func_proto_typedef = find_btf_type(targ_btf, "func_proto_typedef", BTF_KIND_TYPEDEF);
+	exp->targ_arr_typedef = find_btf_type(targ_btf, "arr_typedef", BTF_KIND_TYPEDEF);
+
+	btf__free(targ_btf);
+	return 0;
+}
+
+static int setup_type_id_case_failure(struct core_reloc_test_case *test)
+{
+	struct core_reloc_type_id_output *exp = (void *)test->output;
+	int err;
+
+	err = setup_type_id_case_local(test);
+	if (err)
+		return err;
+
+	exp->targ_struct = 0;
+	exp->targ_union = 0;
+	exp->targ_enum = 0;
+	exp->targ_int = 0;
+	exp->targ_struct_typedef = 0;
+	exp->targ_func_proto_typedef = 0;
+	exp->targ_arr_typedef = 0;
+
+	return 0;
+}
 
 static struct core_reloc_test_case test_cases[] = {
 	/* validate we can find kernel image and use its BTF for relocs */
···
 
 	/* validate field existence checks */
 	{
-		EXISTENCE_CASE_COMMON(existence),
+		FIELD_EXISTS_CASE_COMMON(existence),
 		.input = STRUCT_TO_CHAR_PTR(core_reloc_existence) {
 			.a = 1,
 			.b = 2,
···
 		.output_len = sizeof(struct core_reloc_existence_output),
 	},
 	{
-		EXISTENCE_CASE_COMMON(existence___minimal),
+		FIELD_EXISTS_CASE_COMMON(existence___minimal),
 		.input = STRUCT_TO_CHAR_PTR(core_reloc_existence___minimal) {
 			.a = 42,
 		},
···
 		.output_len = sizeof(struct core_reloc_existence_output),
 	},
 
-	EXISTENCE_ERR_CASE(existence__err_int_sz),
-	EXISTENCE_ERR_CASE(existence__err_int_type),
-	EXISTENCE_ERR_CASE(existence__err_int_kind),
-	EXISTENCE_ERR_CASE(existence__err_arr_kind),
-	EXISTENCE_ERR_CASE(existence__err_arr_value_type),
-	EXISTENCE_ERR_CASE(existence__err_struct_type),
+	FIELD_EXISTS_ERR_CASE(existence__err_int_sz),
+	FIELD_EXISTS_ERR_CASE(existence__err_int_type),
+	FIELD_EXISTS_ERR_CASE(existence__err_int_kind),
+	FIELD_EXISTS_ERR_CASE(existence__err_arr_kind),
+	FIELD_EXISTS_ERR_CASE(existence__err_arr_value_type),
+	FIELD_EXISTS_ERR_CASE(existence__err_struct_type),
 
 	/* bitfield relocation checks */
 	BITFIELDS_CASE(bitfields, {
···
 	/* size relocation checks */
 	SIZE_CASE(size),
 	SIZE_CASE(size___diff_sz),
+	SIZE_ERR_CASE(size___err_ambiguous),
+
+	/* validate type existence and size relocations */
+	TYPE_BASED_CASE(type_based, {
+		.struct_exists = 1,
+		.union_exists = 1,
+		.enum_exists = 1,
+		.typedef_named_struct_exists = 1,
+		.typedef_anon_struct_exists = 1,
+		.typedef_struct_ptr_exists = 1,
+		.typedef_int_exists = 1,
+		.typedef_enum_exists = 1,
+		.typedef_void_ptr_exists = 1,
+		.typedef_func_proto_exists = 1,
+		.typedef_arr_exists = 1,
+		.struct_sz = sizeof(struct a_struct),
+		.union_sz = sizeof(union a_union),
+		.enum_sz = sizeof(enum an_enum),
+		.typedef_named_struct_sz = sizeof(named_struct_typedef),
+		.typedef_anon_struct_sz = sizeof(anon_struct_typedef),
+		.typedef_struct_ptr_sz = sizeof(struct_ptr_typedef),
+		.typedef_int_sz = sizeof(int_typedef),
+		.typedef_enum_sz = sizeof(enum_typedef),
+		.typedef_void_ptr_sz = sizeof(void_ptr_typedef),
+		.typedef_func_proto_sz = sizeof(func_proto_typedef),
+		.typedef_arr_sz = sizeof(arr_typedef),
+	}),
+	TYPE_BASED_CASE(type_based___all_missing, {
+		/* all zeros */
+	}),
+	TYPE_BASED_CASE(type_based___diff_sz, {
+		.struct_exists = 1,
+		.union_exists = 1,
+		.enum_exists = 1,
+		.typedef_named_struct_exists = 1,
+		.typedef_anon_struct_exists = 1,
+		.typedef_struct_ptr_exists = 1,
+		.typedef_int_exists = 1,
+		.typedef_enum_exists = 1,
+		.typedef_void_ptr_exists = 1,
+		.typedef_func_proto_exists = 1,
+		.typedef_arr_exists = 1,
+		.struct_sz = sizeof(struct a_struct___diff_sz),
+		.union_sz = sizeof(union a_union___diff_sz),
+		.enum_sz = sizeof(enum an_enum___diff_sz),
+		.typedef_named_struct_sz = sizeof(named_struct_typedef___diff_sz),
+		.typedef_anon_struct_sz = sizeof(anon_struct_typedef___diff_sz),
+		.typedef_struct_ptr_sz = sizeof(struct_ptr_typedef___diff_sz),
+		.typedef_int_sz = sizeof(int_typedef___diff_sz),
+		.typedef_enum_sz = sizeof(enum_typedef___diff_sz),
+		.typedef_void_ptr_sz = sizeof(void_ptr_typedef___diff_sz),
+		.typedef_func_proto_sz = sizeof(func_proto_typedef___diff_sz),
+		.typedef_arr_sz = sizeof(arr_typedef___diff_sz),
+	}),
+	TYPE_BASED_CASE(type_based___incompat, {
+		.enum_exists = 1,
+		.enum_sz = sizeof(enum an_enum),
+	}),
+	TYPE_BASED_CASE(type_based___fn_wrong_args, {
+		.struct_exists = 1,
+		.struct_sz = sizeof(struct a_struct),
+	}),
+
+	/* BTF_TYPE_ID_LOCAL/BTF_TYPE_ID_TARGET tests */
+	TYPE_ID_CASE(type_id, setup_type_id_case_success),
+	TYPE_ID_CASE(type_id___missing_targets, setup_type_id_case_failure),
+
+	/* Enumerator value existence and value relocations */
+	ENUMVAL_CASE(enumval, {
+		.named_val1_exists = true,
+		.named_val2_exists = true,
+		.named_val3_exists = true,
+		.anon_val1_exists = true,
+		.anon_val2_exists = true,
+		.anon_val3_exists = true,
+		.named_val1 = 1,
+		.named_val2 = 2,
+		.anon_val1 = 0x10,
+		.anon_val2 = 0x20,
+	}),
+	ENUMVAL_CASE(enumval___diff, {
+		.named_val1_exists = true,
+		.named_val2_exists = true,
+		.named_val3_exists = true,
+		.anon_val1_exists = true,
+		.anon_val2_exists = true,
+		.anon_val3_exists = true,
+		.named_val1 = 101,
+		.named_val2 = 202,
+		.anon_val1 = 0x11,
+		.anon_val2 = 0x22,
+	}),
+	ENUMVAL_CASE(enumval___val3_missing, {
+		.named_val1_exists = true,
+		.named_val2_exists = true,
+		.named_val3_exists = false,
+		.anon_val1_exists = true,
+		.anon_val2_exists = true,
+		.anon_val3_exists = false,
+		.named_val1 = 111,
+		.named_val2 = 222,
+		.anon_val1 = 0x111,
+		.anon_val2 = 0x222,
+	}),
+	ENUMVAL_ERR_CASE(enumval___err_missing),
 };
 
 struct data {
 	char in[256];
 	char out[256];
+	bool skip;
 	uint64_t my_pid_tgid;
 };
···
 	struct bpf_object_load_attr load_attr = {};
 	struct core_reloc_test_case *test_case;
 	const char *tp_name, *probe_name;
-	int err, duration = 0, i, equal;
+	int err, i, equal;
 	struct bpf_link *link = NULL;
 	struct bpf_map *data_map;
 	struct bpf_program *prog;
···
 		if (!test__start_subtest(test_case->case_name))
 			continue;
 
-		DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
-			.relaxed_core_relocs = test_case->relaxed_core_relocs,
-		);
+		if (test_case->setup) {
+			err = test_case->setup(test_case);
+			if (CHECK(err, "test_setup", "test #%d setup failed: %d\n", i, err))
+				continue;
+		}
 
-		obj = bpf_object__open_file(test_case->bpf_obj_file, &opts);
+		obj = bpf_object__open_file(test_case->bpf_obj_file, NULL);
 		if (CHECK(IS_ERR(obj), "obj_open", "failed to open '%s': %ld\n",
 			  test_case->bpf_obj_file, PTR_ERR(obj)))
 			continue;
···
 		load_attr.log_level = 0;
 		load_attr.target_btf_path = test_case->btf_src_file;
 		err = bpf_object__load_xattr(&load_attr);
-		if (test_case->fails) {
-			CHECK(!err, "obj_load_fail",
-			      "should fail to load prog '%s'\n", probe_name);
+		if (err) {
+			if (!test_case->fails)
+				CHECK(false, "obj_load", "failed to load prog '%s': %d\n", probe_name, err);
 			goto cleanup;
-		} else {
-			if (CHECK(err, "obj_load",
-				  "failed to load prog '%s': %d\n",
-				  probe_name, err))
-				goto cleanup;
 		}
 
 		data_map = bpf_object__find_map_by_name(obj, "test_cor.bss");
···
 
 		/* trigger test run */
 		usleep(1);
+
+		if (data->skip) {
+			test__skip();
+			goto cleanup;
+		}
+
+		if (test_case->fails) {
+			CHECK(false, "obj_load_fail", "should fail to load prog '%s'\n", probe_name);
+			goto cleanup;
+		}
 
 		equal = memcmp(data->out, test_case->output,
 			       test_case->output_len) == 0;
+147
tools/testing/selftests/bpf/prog_tests/d_path.c
···
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <test_progs.h>
+#include <sys/stat.h>
+#include <linux/sched.h>
+#include <sys/syscall.h>
+
+#define MAX_PATH_LEN		128
+#define MAX_FILES		7
+
+#include "test_d_path.skel.h"
+
+static int duration;
+
+static struct {
+	__u32 cnt;
+	char paths[MAX_FILES][MAX_PATH_LEN];
+} src;
+
+static int set_pathname(int fd, pid_t pid)
+{
+	char buf[MAX_PATH_LEN];
+
+	snprintf(buf, MAX_PATH_LEN, "/proc/%d/fd/%d", pid, fd);
+	return readlink(buf, src.paths[src.cnt++], MAX_PATH_LEN);
+}
+
+static int trigger_fstat_events(pid_t pid)
+{
+	int sockfd = -1, procfd = -1, devfd = -1;
+	int localfd = -1, indicatorfd = -1;
+	int pipefd[2] = { -1, -1 };
+	struct stat fileStat;
+	int ret = -1;
+
+	/* unmountable pseudo-filesystems */
+	if (CHECK(pipe(pipefd) < 0, "trigger", "pipe failed\n"))
+		return ret;
+	/* unmountable pseudo-filesystems */
+	sockfd = socket(AF_INET, SOCK_STREAM, 0);
+	if (CHECK(sockfd < 0, "trigger", "socket failed\n"))
+		goto out_close;
+	/* mountable pseudo-filesystems */
+	procfd = open("/proc/self/comm", O_RDONLY);
+	if (CHECK(procfd < 0, "trigger", "open /proc/self/comm failed\n"))
+		goto out_close;
+	devfd = open("/dev/urandom", O_RDONLY);
+	if (CHECK(devfd < 0, "trigger", "open /dev/urandom failed\n"))
+		goto out_close;
+	localfd = open("/tmp/d_path_loadgen.txt", O_CREAT | O_RDONLY, 0644);
+	if (CHECK(localfd < 0, "trigger", "open /tmp/d_path_loadgen.txt failed\n"))
+		goto out_close;
+	/* bpf_d_path will return path with (deleted) */
+	remove("/tmp/d_path_loadgen.txt");
+	indicatorfd = open("/tmp/", O_PATH);
+	if (CHECK(indicatorfd < 0, "trigger", "open /tmp/ failed\n"))
+		goto out_close;
+
+	ret = set_pathname(pipefd[0], pid);
+	if (CHECK(ret < 0, "trigger", "set_pathname failed for pipe[0]\n"))
+		goto out_close;
+	ret = set_pathname(pipefd[1], pid);
+	if (CHECK(ret < 0, "trigger", "set_pathname failed for pipe[1]\n"))
+		goto out_close;
+	ret = set_pathname(sockfd, pid);
+	if (CHECK(ret < 0, "trigger", "set_pathname failed for socket\n"))
+		goto out_close;
+	ret = set_pathname(procfd, pid);
+	if (CHECK(ret < 0, "trigger", "set_pathname failed for proc\n"))
+		goto out_close;
+	ret = set_pathname(devfd, pid);
+	if (CHECK(ret < 0, "trigger", "set_pathname failed for dev\n"))
+		goto out_close;
+	ret = set_pathname(localfd, pid);
+	if (CHECK(ret < 0, "trigger", "set_pathname failed for file\n"))
+		goto out_close;
+	ret = set_pathname(indicatorfd, pid);
+	if (CHECK(ret < 0, "trigger", "set_pathname failed for dir\n"))
+		goto out_close;
+
+	/* triggers vfs_getattr */
+	fstat(pipefd[0], &fileStat);
+	fstat(pipefd[1], &fileStat);
+	fstat(sockfd, &fileStat);
+	fstat(procfd, &fileStat);
+	fstat(devfd, &fileStat);
+	fstat(localfd, &fileStat);
+	fstat(indicatorfd, &fileStat);
+
+out_close:
+	/* triggers filp_close */
+	close(pipefd[0]);
+	close(pipefd[1]);
+	close(sockfd);
+	close(procfd);
+	close(devfd);
+	close(localfd);
+	close(indicatorfd);
+	return ret;
+}
+
+void test_d_path(void)
+{
+	struct test_d_path__bss *bss;
+	struct test_d_path *skel;
+	int err;
+
+	skel = test_d_path__open_and_load();
+	if (CHECK(!skel, "setup", "d_path skeleton failed\n"))
+		goto cleanup;
+
+	err = test_d_path__attach(skel);
+	if (CHECK(err, "setup", "attach failed: %d\n", err))
+		goto cleanup;
+
+	bss = skel->bss;
+	bss->my_pid = getpid();
+
+	err = trigger_fstat_events(bss->my_pid);
+	if (err < 0)
+		goto cleanup;
+
+	for (int i = 0; i < MAX_FILES; i++) {
+		CHECK(strncmp(src.paths[i], bss->paths_stat[i], MAX_PATH_LEN),
+		      "check",
+		      "failed to get stat path[%d]: %s vs %s\n",
+		      i, src.paths[i], bss->paths_stat[i]);
+		CHECK(strncmp(src.paths[i], bss->paths_close[i], MAX_PATH_LEN),
+		      "check",
+		      "failed to get close path[%d]: %s vs %s\n",
+		      i, src.paths[i], bss->paths_close[i]);
+		/* The d_path helper returns size plus NUL char, hence + 1 */
+		CHECK(bss->rets_stat[i] != strlen(bss->paths_stat[i]) + 1,
+		      "check",
+		      "failed to match stat return [%d]: %d vs %zd [%s]\n",
+		      i, bss->rets_stat[i], strlen(bss->paths_stat[i]) + 1,
+		      bss->paths_stat[i]);
+		CHECK(bss->rets_close[i] != strlen(bss->paths_stat[i]) + 1,
+		      "check",
+		      "failed to match stat return [%d]: %d vs %zd [%s]\n",
+		      i, bss->rets_close[i], strlen(bss->paths_close[i]) + 1,
+		      bss->paths_stat[i]);
+	}
+
+cleanup:
+	test_d_path__destroy(skel);
+}
+68
tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c
···
 		"freplace/get_skb_len",
 		"freplace/get_skb_ifindex",
 		"freplace/get_constant",
+		"freplace/test_pkt_write_access_subprog",
 	};
 	test_fexit_bpf2bpf_common("./fexit_bpf2bpf.o",
 				  "./test_pkt_access.o",
···
 				  prog_name, false);
 }
 
+static void test_func_sockmap_update(void)
+{
+	const char *prog_name[] = {
+		"freplace/cls_redirect",
+	};
+	test_fexit_bpf2bpf_common("./freplace_cls_redirect.o",
+				  "./test_cls_redirect.o",
+				  ARRAY_SIZE(prog_name),
+				  prog_name, false);
+}
+
+static void test_obj_load_failure_common(const char *obj_file,
+					 const char *target_obj_file)
+
+{
+	/*
+	 * standalone test that asserts failure to load freplace prog
+	 * because of invalid return code.
+	 */
+	struct bpf_object *obj = NULL, *pkt_obj;
+	int err, pkt_fd;
+	__u32 duration = 0;
+
+	err = bpf_prog_load(target_obj_file, BPF_PROG_TYPE_UNSPEC,
+			    &pkt_obj, &pkt_fd);
+	/* the target prog should load fine */
+	if (CHECK(err, "tgt_prog_load", "file %s err %d errno %d\n",
+		  target_obj_file, err, errno))
+		return;
+	DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
+			    .attach_prog_fd = pkt_fd,
+			   );
+
+	obj = bpf_object__open_file(obj_file, &opts);
+	if (CHECK(IS_ERR_OR_NULL(obj), "obj_open",
+		  "failed to open %s: %ld\n", obj_file,
+		  PTR_ERR(obj)))
+		goto close_prog;
+
+	/* It should fail to load the program */
+	err = bpf_object__load(obj);
+	if (CHECK(!err, "bpf_obj_load should fail", "err %d\n", err))
+		goto close_prog;
+
+close_prog:
+	if (!IS_ERR_OR_NULL(obj))
+		bpf_object__close(obj);
+	bpf_object__close(pkt_obj);
+}
+
+static void test_func_replace_return_code(void)
+{
+	/* test invalid return code in the replaced program */
+	test_obj_load_failure_common("./freplace_connect_v4_prog.o",
+				     "./connect4_prog.o");
+}
+
+static void test_func_map_prog_compatibility(void)
+{
+	/* test with spin lock map value in the replaced program */
+	test_obj_load_failure_common("./freplace_attach_probe.o",
+				     "./test_attach_probe.o");
+}
+
 void test_fexit_bpf2bpf(void)
 {
 	test_target_no_callees();
 	test_target_yes_callees();
 	test_func_replace();
 	test_func_replace_verify();
+	test_func_sockmap_update();
+	test_func_replace_return_code();
+	test_func_map_prog_compatibility();
 }
+54 -11
tools/testing/selftests/bpf/prog_tests/perf_buffer.c
@@
 #include "test_perf_buffer.skel.h"
 #include "bpf/libbpf_internal.h"
 
+static int duration;
+
 /* AddressSanitizer sometimes crashes due to data dereference below, due to
  * this being mmap()'ed memory. Disable instrumentation with
  * no_sanitize_address attribute
@@
 	CPU_SET(cpu, cpu_seen);
 }
 
+int trigger_on_cpu(int cpu)
+{
+	cpu_set_t cpu_set;
+	int err;
+
+	CPU_ZERO(&cpu_set);
+	CPU_SET(cpu, &cpu_set);
+
+	err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
+	if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n", cpu, err))
+		return err;
+
+	usleep(1);
+
+	return 0;
+}
+
 void test_perf_buffer(void)
 {
-	int err, on_len, nr_on_cpus = 0, nr_cpus, i, duration = 0;
+	int err, on_len, nr_on_cpus = 0, nr_cpus, i;
 	struct perf_buffer_opts pb_opts = {};
 	struct test_perf_buffer *skel;
-	cpu_set_t cpu_set, cpu_seen;
+	cpu_set_t cpu_seen;
 	struct perf_buffer *pb;
+	int last_fd = -1, fd;
 	bool *online;
 
 	nr_cpus = libbpf_num_possible_cpus();
@@
 	if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
 		goto out_close;
 
+	CHECK(perf_buffer__epoll_fd(pb) < 0, "epoll_fd",
+	      "bad fd: %d\n", perf_buffer__epoll_fd(pb));
+
 	/* trigger kprobe on every CPU */
 	CPU_ZERO(&cpu_seen);
 	for (i = 0; i < nr_cpus; i++) {
@@
 			continue;
 		}
 
-		CPU_ZERO(&cpu_set);
-		CPU_SET(i, &cpu_set);
-
-		err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set),
-					     &cpu_set);
-		if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n",
-				 i, err))
+		if (trigger_on_cpu(i))
 			goto out_close;
-
-		usleep(1);
 	}
 
 	/* read perf buffer */
@@
 	if (CHECK(CPU_COUNT(&cpu_seen) != nr_on_cpus, "seen_cpu_cnt",
 		  "expect %d, seen %d\n", nr_on_cpus, CPU_COUNT(&cpu_seen)))
 		goto out_free_pb;
+
+	if (CHECK(perf_buffer__buffer_cnt(pb) != nr_cpus, "buf_cnt",
+		  "got %zu, expected %d\n", perf_buffer__buffer_cnt(pb), nr_cpus))
+		goto out_close;
+
+	for (i = 0; i < nr_cpus; i++) {
+		if (i >= on_len || !online[i])
+			continue;
+
+		fd = perf_buffer__buffer_fd(pb, i);
+		CHECK(fd < 0 || last_fd == fd, "fd_check", "last fd %d == fd %d\n", last_fd, fd);
+		last_fd = fd;
+
+		err = perf_buffer__consume_buffer(pb, i);
+		if (CHECK(err, "drain_buf", "cpu %d, err %d\n", i, err))
+			goto out_close;
+
+		CPU_CLR(i, &cpu_seen);
+		if (trigger_on_cpu(i))
+			goto out_close;
+
+		err = perf_buffer__consume_buffer(pb, i);
+		if (CHECK(err, "consume_buf", "cpu %d, err %d\n", i, err))
+			goto out_close;
+
+		if (CHECK(!CPU_ISSET(i, &cpu_seen), "cpu_seen", "cpu %d not seen\n", i))
+			goto out_close;
+	}
 
 out_free_pb:
 	perf_buffer__free(pb);
+38 -1
tools/testing/selftests/bpf/prog_tests/resolve_btfids.c
@@
 BTF_ID(union, U)
 BTF_ID(func, func)
 
+BTF_SET_START(test_set)
+BTF_ID(typedef, S)
+BTF_ID(typedef, T)
+BTF_ID(typedef, U)
+BTF_ID(struct, S)
+BTF_ID(union, U)
+BTF_ID(func, func)
+BTF_SET_END(test_set)
+
 static int
 __resolve_symbol(struct btf *btf, int type_id)
 {
@@
 	 */
 	for (j = 0; j < ARRAY_SIZE(test_lists); j++) {
 		test_list = test_lists[j];
-		for (i = 0; i < ARRAY_SIZE(test_symbols) && !ret; i++) {
+		for (i = 0; i < ARRAY_SIZE(test_symbols); i++) {
 			ret = CHECK(test_list[i] != test_symbols[i].id,
 				    "id_check",
 				    "wrong ID for %s (%d != %d)\n",
 				    test_symbols[i].name,
 				    test_list[i], test_symbols[i].id);
+			if (ret)
+				return ret;
+		}
+	}
+
+	/* Check BTF_SET_START(test_set) IDs */
+	for (i = 0; i < test_set.cnt; i++) {
+		bool found = false;
+
+		for (j = 0; j < ARRAY_SIZE(test_symbols); j++) {
+			if (test_symbols[j].id != test_set.ids[i])
+				continue;
+			found = true;
+			break;
+		}
+
+		ret = CHECK(!found, "id_check",
+			    "ID %d not found in test_symbols\n",
+			    test_set.ids[i]);
+		if (ret)
+			break;
+
+		if (i > 0) {
+			ret = CHECK(test_set.ids[i - 1] > test_set.ids[i],
+				    "sort_check",
+				    "test_set is not sorted\n");
+			if (ret)
+				break;
 		}
 	}
 
+3 -2
tools/testing/selftests/bpf/prog_tests/sk_assign.c
@@
 sprintf(tc_cmd, "%s %s %s %s", "tc filter add dev lo ingress bpf",
 	"direct-action object-file ./test_sk_assign.o",
 	"section classifier/sk_assign_test",
-	(env.verbosity < VERBOSE_VERY) ? " 2>/dev/null" : "");
+	(env.verbosity < VERBOSE_VERY) ? " 2>/dev/null" : "verbose");
 if (CHECK(system(tc_cmd), "BPF load failed;",
 	  "run with -vv for more info\n"))
 	return false;
@@
 int server = -1;
 int server_map;
 int self_net;
+int i;
 
 self_net = open(NS_SELF, O_RDONLY);
 if (CHECK_FAIL(self_net < 0)) {
@@
 	goto cleanup;
 }
 
-for (int i = 0; i < ARRAY_SIZE(tests) && !READ_ONCE(stop); i++) {
+for (i = 0; i < ARRAY_SIZE(tests) && !READ_ONCE(stop); i++) {
 	struct test_sk_cfg *test = &tests[i];
 	const struct sockaddr *addr;
 	const int zero = 0;
+76
tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@
 
 #include "test_progs.h"
 #include "test_skmsg_load_helpers.skel.h"
+#include "test_sockmap_update.skel.h"
+#include "test_sockmap_invalid_update.skel.h"
 
 #define TCP_REPAIR 19 /* TCP sock is under repair right now */
 
@@
 	test_skmsg_load_helpers__destroy(skel);
 }
 
+static void test_sockmap_update(enum bpf_map_type map_type)
+{
+	struct bpf_prog_test_run_attr tattr;
+	int err, prog, src, dst, duration = 0;
+	struct test_sockmap_update *skel;
+	__u64 src_cookie, dst_cookie;
+	const __u32 zero = 0;
+	char dummy[14] = {0};
+	__s64 sk;
+
+	sk = connected_socket_v4();
+	if (CHECK(sk == -1, "connected_socket_v4", "cannot connect\n"))
+		return;
+
+	skel = test_sockmap_update__open_and_load();
+	if (CHECK(!skel, "open_and_load", "cannot load skeleton\n"))
+		goto close_sk;
+
+	prog = bpf_program__fd(skel->progs.copy_sock_map);
+	src = bpf_map__fd(skel->maps.src);
+	if (map_type == BPF_MAP_TYPE_SOCKMAP)
+		dst = bpf_map__fd(skel->maps.dst_sock_map);
+	else
+		dst = bpf_map__fd(skel->maps.dst_sock_hash);
+
+	err = bpf_map_update_elem(src, &zero, &sk, BPF_NOEXIST);
+	if (CHECK(err, "update_elem(src)", "errno=%u\n", errno))
+		goto out;
+
+	err = bpf_map_lookup_elem(src, &zero, &src_cookie);
+	if (CHECK(err, "lookup_elem(src, cookie)", "errno=%u\n", errno))
+		goto out;
+
+	tattr = (struct bpf_prog_test_run_attr){
+		.prog_fd = prog,
+		.repeat = 1,
+		.data_in = dummy,
+		.data_size_in = sizeof(dummy),
+	};
+
+	err = bpf_prog_test_run_xattr(&tattr);
+	if (CHECK_ATTR(err || !tattr.retval, "bpf_prog_test_run",
+		       "errno=%u retval=%u\n", errno, tattr.retval))
+		goto out;
+
+	err = bpf_map_lookup_elem(dst, &zero, &dst_cookie);
+	if (CHECK(err, "lookup_elem(dst, cookie)", "errno=%u\n", errno))
+		goto out;
+
+	CHECK(dst_cookie != src_cookie, "cookie mismatch", "%llu != %llu\n",
+	      dst_cookie, src_cookie);
+
+out:
+	test_sockmap_update__destroy(skel);
+close_sk:
+	close(sk);
+}
+
+static void test_sockmap_invalid_update(void)
+{
+	struct test_sockmap_invalid_update *skel;
+	int duration = 0;
+
+	skel = test_sockmap_invalid_update__open_and_load();
+	if (CHECK(skel, "open_and_load", "verifier accepted map_update\n"))
+		test_sockmap_invalid_update__destroy(skel);
+}
+
 void test_sockmap_basic(void)
 {
 	if (test__start_subtest("sockmap create_update_free"))
@@
 		test_skmsg_helpers(BPF_MAP_TYPE_SOCKMAP);
 	if (test__start_subtest("sockhash sk_msg load helpers"))
 		test_skmsg_helpers(BPF_MAP_TYPE_SOCKHASH);
+	if (test__start_subtest("sockmap update"))
+		test_sockmap_update(BPF_MAP_TYPE_SOCKMAP);
+	if (test__start_subtest("sockhash update"))
+		test_sockmap_update(BPF_MAP_TYPE_SOCKHASH);
+	if (test__start_subtest("sockmap update in unsafe context"))
+		test_sockmap_invalid_update();
 }
+622
tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+#include <linux/compiler.h>
+
+#include "test_progs.h"
+#include "cgroup_helpers.h"
+#include "network_helpers.h"
+#include "test_tcp_hdr_options.h"
+#include "test_tcp_hdr_options.skel.h"
+#include "test_misc_tcp_hdr_options.skel.h"
+
+#define LO_ADDR6 "::eB9F"
+#define CG_NAME "/tcpbpf-hdr-opt-test"
+
+struct bpf_test_option exp_passive_estab_in;
+struct bpf_test_option exp_active_estab_in;
+struct bpf_test_option exp_passive_fin_in;
+struct bpf_test_option exp_active_fin_in;
+struct hdr_stg exp_passive_hdr_stg;
+struct hdr_stg exp_active_hdr_stg = { .active = true, };
+
+static struct test_misc_tcp_hdr_options *misc_skel;
+static struct test_tcp_hdr_options *skel;
+static int lport_linum_map_fd;
+static int hdr_stg_map_fd;
+static __u32 duration;
+static int cg_fd;
+
+struct sk_fds {
+	int srv_fd;
+	int passive_fd;
+	int active_fd;
+	int passive_lport;
+	int active_lport;
+};
+
+static int add_lo_addr(void)
+{
+	char ip_addr_cmd[256];
+	int cmdlen;
+
+	cmdlen = snprintf(ip_addr_cmd, sizeof(ip_addr_cmd),
+			  "ip -6 addr add %s/128 dev lo scope host",
+			  LO_ADDR6);
+
+	if (CHECK(cmdlen >= sizeof(ip_addr_cmd), "compile ip cmd",
+		  "failed to add host addr %s to lo. ip cmdlen is too long\n",
+		  LO_ADDR6))
+		return -1;
+
+	if (CHECK(system(ip_addr_cmd), "run ip cmd",
+		  "failed to add host addr %s to lo\n", LO_ADDR6))
+		return -1;
+
+	return 0;
+}
+
+static int create_netns(void)
+{
+	if (CHECK(unshare(CLONE_NEWNET), "create netns",
+		  "unshare(CLONE_NEWNET): %s (%d)",
+		  strerror(errno), errno))
+		return -1;
+
+	if (CHECK(system("ip link set dev lo up"), "run ip cmd",
+		  "failed to bring lo link up\n"))
+		return -1;
+
+	if (add_lo_addr())
+		return -1;
+
+	return 0;
+}
+
+static int write_sysctl(const char *sysctl, const char *value)
+{
+	int fd, err, len;
+
+	fd = open(sysctl, O_WRONLY);
+	if (CHECK(fd == -1, "open sysctl", "open(%s): %s (%d)\n",
+		  sysctl, strerror(errno), errno))
+		return -1;
+
+	len = strlen(value);
+	err = write(fd, value, len);
+	close(fd);
+	if (CHECK(err != len, "write sysctl",
+		  "write(%s, %s): err:%d %s (%d)\n",
+		  sysctl, value, err, strerror(errno), errno))
+		return -1;
+
+	return 0;
+}
+
+static void print_hdr_stg(const struct hdr_stg *hdr_stg, const char *prefix)
+{
+	fprintf(stderr, "%s{active:%u, resend_syn:%u, syncookie:%u, fastopen:%u}\n",
+		prefix ? : "", hdr_stg->active, hdr_stg->resend_syn,
+		hdr_stg->syncookie, hdr_stg->fastopen);
+}
+
+static void print_option(const struct bpf_test_option *opt, const char *prefix)
+{
+	fprintf(stderr, "%s{flags:0x%x, max_delack_ms:%u, rand:0x%x}\n",
+		prefix ? : "", opt->flags, opt->max_delack_ms, opt->rand);
+}
+
+static void sk_fds_close(struct sk_fds *sk_fds)
+{
+	close(sk_fds->srv_fd);
+	close(sk_fds->passive_fd);
+	close(sk_fds->active_fd);
+}
+
+static int sk_fds_shutdown(struct sk_fds *sk_fds)
+{
+	int ret, abyte;
+
+	shutdown(sk_fds->active_fd, SHUT_WR);
+	ret = read(sk_fds->passive_fd, &abyte, sizeof(abyte));
+	if (CHECK(ret != 0, "read-after-shutdown(passive_fd):",
+		  "ret:%d %s (%d)\n",
+		  ret, strerror(errno), errno))
+		return -1;
+
+	shutdown(sk_fds->passive_fd, SHUT_WR);
+	ret = read(sk_fds->active_fd, &abyte, sizeof(abyte));
+	if (CHECK(ret != 0, "read-after-shutdown(active_fd):",
+		  "ret:%d %s (%d)\n",
+		  ret, strerror(errno), errno))
+		return -1;
+
+	return 0;
+}
+
+static int sk_fds_connect(struct sk_fds *sk_fds, bool fast_open)
+{
+	const char fast[] = "FAST!!!";
+	struct sockaddr_in6 addr6;
+	socklen_t len;
+
+	sk_fds->srv_fd = start_server(AF_INET6, SOCK_STREAM, LO_ADDR6, 0, 0);
+	if (CHECK(sk_fds->srv_fd == -1, "start_server", "%s (%d)\n",
+		  strerror(errno), errno))
+		goto error;
+
+	if (fast_open)
+		sk_fds->active_fd = fastopen_connect(sk_fds->srv_fd, fast,
+						     sizeof(fast), 0);
+	else
+		sk_fds->active_fd = connect_to_fd(sk_fds->srv_fd, 0);
+
+	if (CHECK_FAIL(sk_fds->active_fd == -1)) {
+		close(sk_fds->srv_fd);
+		goto error;
+	}
+
+	len = sizeof(addr6);
+	if (CHECK(getsockname(sk_fds->srv_fd, (struct sockaddr *)&addr6,
+			      &len), "getsockname(srv_fd)", "%s (%d)\n",
+		  strerror(errno), errno))
+		goto error_close;
+	sk_fds->passive_lport = ntohs(addr6.sin6_port);
+
+	len = sizeof(addr6);
+	if (CHECK(getsockname(sk_fds->active_fd, (struct sockaddr *)&addr6,
+			      &len), "getsockname(active_fd)", "%s (%d)\n",
+		  strerror(errno), errno))
+		goto error_close;
+	sk_fds->active_lport = ntohs(addr6.sin6_port);
+
+	sk_fds->passive_fd = accept(sk_fds->srv_fd, NULL, 0);
+	if (CHECK(sk_fds->passive_fd == -1, "accept(srv_fd)", "%s (%d)\n",
+		  strerror(errno), errno))
+		goto error_close;
+
+	if (fast_open) {
+		char bytes_in[sizeof(fast)];
+		int ret;
+
+		ret = read(sk_fds->passive_fd, bytes_in, sizeof(bytes_in));
+		if (CHECK(ret != sizeof(fast), "read fastopen syn data",
+			  "expected=%lu actual=%d\n", sizeof(fast), ret)) {
+			close(sk_fds->passive_fd);
+			goto error_close;
+		}
+	}
+
+	return 0;
+
+error_close:
+	close(sk_fds->active_fd);
+	close(sk_fds->srv_fd);
+
+error:
+	memset(sk_fds, -1, sizeof(*sk_fds));
+	return -1;
+}
+
+static int check_hdr_opt(const struct bpf_test_option *exp,
+			 const struct bpf_test_option *act,
+			 const char *hdr_desc)
+{
+	if (CHECK(memcmp(exp, act, sizeof(*exp)),
+		  "expected-vs-actual", "unexpected %s\n", hdr_desc)) {
+		print_option(exp, "expected: ");
+		print_option(act, " actual: ");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int check_hdr_stg(const struct hdr_stg *exp, int fd,
+			 const char *stg_desc)
+{
+	struct hdr_stg act;
+
+	if (CHECK(bpf_map_lookup_elem(hdr_stg_map_fd, &fd, &act),
+		  "map_lookup(hdr_stg_map_fd)", "%s %s (%d)\n",
+		  stg_desc, strerror(errno), errno))
+		return -1;
+
+	if (CHECK(memcmp(exp, &act, sizeof(*exp)),
+		  "expected-vs-actual", "unexpected %s\n", stg_desc)) {
+		print_hdr_stg(exp, "expected: ");
+		print_hdr_stg(&act, " actual: ");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int check_error_linum(const struct sk_fds *sk_fds)
+{
+	unsigned int nr_errors = 0;
+	struct linum_err linum_err;
+	int lport;
+
+	lport = sk_fds->passive_lport;
+	if (!bpf_map_lookup_elem(lport_linum_map_fd, &lport, &linum_err)) {
+		fprintf(stderr,
+			"bpf prog error out at lport:passive(%d), linum:%u err:%d\n",
+			lport, linum_err.linum, linum_err.err);
+		nr_errors++;
+	}
+
+	lport = sk_fds->active_lport;
+	if (!bpf_map_lookup_elem(lport_linum_map_fd, &lport, &linum_err)) {
+		fprintf(stderr,
+			"bpf prog error out at lport:active(%d), linum:%u err:%d\n",
+			lport, linum_err.linum, linum_err.err);
+		nr_errors++;
+	}
+
+	return nr_errors;
+}
+
+static void check_hdr_and_close_fds(struct sk_fds *sk_fds)
+{
+	if (sk_fds_shutdown(sk_fds))
+		goto check_linum;
+
+	if (check_hdr_stg(&exp_passive_hdr_stg, sk_fds->passive_fd,
+			  "passive_hdr_stg"))
+		goto check_linum;
+
+	if (check_hdr_stg(&exp_active_hdr_stg, sk_fds->active_fd,
+			  "active_hdr_stg"))
+		goto check_linum;
+
+	if (check_hdr_opt(&exp_passive_estab_in, &skel->bss->passive_estab_in,
+			  "passive_estab_in"))
+		goto check_linum;
+
+	if (check_hdr_opt(&exp_active_estab_in, &skel->bss->active_estab_in,
+			  "active_estab_in"))
+		goto check_linum;
+
+	if (check_hdr_opt(&exp_passive_fin_in, &skel->bss->passive_fin_in,
+			  "passive_fin_in"))
+		goto check_linum;
+
+	check_hdr_opt(&exp_active_fin_in, &skel->bss->active_fin_in,
+		      "active_fin_in");
+
+check_linum:
+	CHECK_FAIL(check_error_linum(sk_fds));
+	sk_fds_close(sk_fds);
+}
+
+static void prepare_out(void)
+{
+	skel->bss->active_syn_out = exp_passive_estab_in;
+	skel->bss->passive_synack_out = exp_active_estab_in;
+
+	skel->bss->active_fin_out = exp_passive_fin_in;
+	skel->bss->passive_fin_out = exp_active_fin_in;
+}
+
+static void reset_test(void)
+{
+	size_t optsize = sizeof(struct bpf_test_option);
+	int lport, err;
+
+	memset(&skel->bss->passive_synack_out, 0, optsize);
+	memset(&skel->bss->passive_fin_out, 0, optsize);
+
+	memset(&skel->bss->passive_estab_in, 0, optsize);
+	memset(&skel->bss->passive_fin_in, 0, optsize);
+
+	memset(&skel->bss->active_syn_out, 0, optsize);
+	memset(&skel->bss->active_fin_out, 0, optsize);
+
+	memset(&skel->bss->active_estab_in, 0, optsize);
+	memset(&skel->bss->active_fin_in, 0, optsize);
+
+	skel->data->test_kind = TCPOPT_EXP;
+	skel->data->test_magic = 0xeB9F;
+
+	memset(&exp_passive_estab_in, 0, optsize);
+	memset(&exp_active_estab_in, 0, optsize);
+	memset(&exp_passive_fin_in, 0, optsize);
+	memset(&exp_active_fin_in, 0, optsize);
+
+	memset(&exp_passive_hdr_stg, 0, sizeof(exp_passive_hdr_stg));
+	memset(&exp_active_hdr_stg, 0, sizeof(exp_active_hdr_stg));
+	exp_active_hdr_stg.active = true;
+
+	err = bpf_map_get_next_key(lport_linum_map_fd, NULL, &lport);
+	while (!err) {
+		bpf_map_delete_elem(lport_linum_map_fd, &lport);
+		err = bpf_map_get_next_key(lport_linum_map_fd, &lport, &lport);
+	}
+}
+
+static void fastopen_estab(void)
+{
+	struct bpf_link *link;
+	struct sk_fds sk_fds;
+
+	hdr_stg_map_fd = bpf_map__fd(skel->maps.hdr_stg_map);
+	lport_linum_map_fd = bpf_map__fd(skel->maps.lport_linum_map);
+
+	exp_passive_estab_in.flags = OPTION_F_RAND | OPTION_F_MAX_DELACK_MS;
+	exp_passive_estab_in.rand = 0xfa;
+	exp_passive_estab_in.max_delack_ms = 11;
+
+	exp_active_estab_in.flags = OPTION_F_RAND | OPTION_F_MAX_DELACK_MS;
+	exp_active_estab_in.rand = 0xce;
+	exp_active_estab_in.max_delack_ms = 22;
+
+	exp_passive_hdr_stg.fastopen = true;
+
+	prepare_out();
+
+	/* Allow fastopen without fastopen cookie */
+	if (write_sysctl("/proc/sys/net/ipv4/tcp_fastopen", "1543"))
+		return;
+
+	link = bpf_program__attach_cgroup(skel->progs.estab, cg_fd);
+	if (CHECK(IS_ERR(link), "attach_cgroup(estab)", "err: %ld\n",
+		  PTR_ERR(link)))
+		return;
+
+	if (sk_fds_connect(&sk_fds, true)) {
+		bpf_link__destroy(link);
+		return;
+	}
+
+	check_hdr_and_close_fds(&sk_fds);
+	bpf_link__destroy(link);
+}
+
+static void syncookie_estab(void)
+{
+	struct bpf_link *link;
+	struct sk_fds sk_fds;
+
+	hdr_stg_map_fd = bpf_map__fd(skel->maps.hdr_stg_map);
+	lport_linum_map_fd = bpf_map__fd(skel->maps.lport_linum_map);
+
+	exp_passive_estab_in.flags = OPTION_F_RAND | OPTION_F_MAX_DELACK_MS;
+	exp_passive_estab_in.rand = 0xfa;
+	exp_passive_estab_in.max_delack_ms = 11;
+
+	exp_active_estab_in.flags = OPTION_F_RAND | OPTION_F_MAX_DELACK_MS |
+				    OPTION_F_RESEND;
+	exp_active_estab_in.rand = 0xce;
+	exp_active_estab_in.max_delack_ms = 22;
+
+	exp_passive_hdr_stg.syncookie = true;
+	exp_active_hdr_stg.resend_syn = true,
+
+	prepare_out();
+
+	/* Clear the RESEND to ensure the bpf prog can learn
+	 * want_cookie and set the RESEND by itself.
+	 */
+	skel->bss->passive_synack_out.flags &= ~OPTION_F_RESEND;
+
+	/* Enforce syncookie mode */
+	if (write_sysctl("/proc/sys/net/ipv4/tcp_syncookies", "2"))
+		return;
+
+	link = bpf_program__attach_cgroup(skel->progs.estab, cg_fd);
+	if (CHECK(IS_ERR(link), "attach_cgroup(estab)", "err: %ld\n",
+		  PTR_ERR(link)))
+		return;
+
+	if (sk_fds_connect(&sk_fds, false)) {
+		bpf_link__destroy(link);
+		return;
+	}
+
+	check_hdr_and_close_fds(&sk_fds);
+	bpf_link__destroy(link);
+}
+
+static void fin(void)
+{
+	struct bpf_link *link;
+	struct sk_fds sk_fds;
+
+	hdr_stg_map_fd = bpf_map__fd(skel->maps.hdr_stg_map);
+	lport_linum_map_fd = bpf_map__fd(skel->maps.lport_linum_map);
+
+	exp_passive_fin_in.flags = OPTION_F_RAND;
+	exp_passive_fin_in.rand = 0xfa;
+
+	exp_active_fin_in.flags = OPTION_F_RAND;
+	exp_active_fin_in.rand = 0xce;
+
+	prepare_out();
+
+	if (write_sysctl("/proc/sys/net/ipv4/tcp_syncookies", "1"))
+		return;
+
+	link = bpf_program__attach_cgroup(skel->progs.estab, cg_fd);
+	if (CHECK(IS_ERR(link), "attach_cgroup(estab)", "err: %ld\n",
+		  PTR_ERR(link)))
+		return;
+
+	if (sk_fds_connect(&sk_fds, false)) {
+		bpf_link__destroy(link);
+		return;
+	}
+
+	check_hdr_and_close_fds(&sk_fds);
+	bpf_link__destroy(link);
+}
+
+static void __simple_estab(bool exprm)
+{
+	struct bpf_link *link;
+	struct sk_fds sk_fds;
+
+	hdr_stg_map_fd = bpf_map__fd(skel->maps.hdr_stg_map);
+	lport_linum_map_fd = bpf_map__fd(skel->maps.lport_linum_map);
+
+	exp_passive_estab_in.flags = OPTION_F_RAND | OPTION_F_MAX_DELACK_MS;
+	exp_passive_estab_in.rand = 0xfa;
+	exp_passive_estab_in.max_delack_ms = 11;
+
+	exp_active_estab_in.flags = OPTION_F_RAND | OPTION_F_MAX_DELACK_MS;
+	exp_active_estab_in.rand = 0xce;
+	exp_active_estab_in.max_delack_ms = 22;
+
+	prepare_out();
+
+	if (!exprm) {
+		skel->data->test_kind = 0xB9;
+		skel->data->test_magic = 0;
+	}
+
+	if (write_sysctl("/proc/sys/net/ipv4/tcp_syncookies", "1"))
+		return;
+
+	link = bpf_program__attach_cgroup(skel->progs.estab, cg_fd);
+	if (CHECK(IS_ERR(link), "attach_cgroup(estab)", "err: %ld\n",
+		  PTR_ERR(link)))
+		return;
+
+	if (sk_fds_connect(&sk_fds, false)) {
+		bpf_link__destroy(link);
+		return;
+	}
+
+	check_hdr_and_close_fds(&sk_fds);
+	bpf_link__destroy(link);
+}
+
+static void no_exprm_estab(void)
+{
+	__simple_estab(false);
+}
+
+static void simple_estab(void)
+{
+	__simple_estab(true);
+}
+
+static void misc(void)
+{
+	const char send_msg[] = "MISC!!!";
+	char recv_msg[sizeof(send_msg)];
+	const unsigned int nr_data = 2;
+	struct bpf_link *link;
+	struct sk_fds sk_fds;
+	int i, ret;
+
+	lport_linum_map_fd = bpf_map__fd(misc_skel->maps.lport_linum_map);
+
+	if (write_sysctl("/proc/sys/net/ipv4/tcp_syncookies", "1"))
+		return;
+
+	link = bpf_program__attach_cgroup(misc_skel->progs.misc_estab, cg_fd);
+	if (CHECK(IS_ERR(link), "attach_cgroup(misc_estab)", "err: %ld\n",
+		  PTR_ERR(link)))
+		return;
+
+	if (sk_fds_connect(&sk_fds, false)) {
+		bpf_link__destroy(link);
+		return;
+	}
+
+	for (i = 0; i < nr_data; i++) {
+		/* MSG_EOR to ensure skb will not be combined */
+		ret = send(sk_fds.active_fd, send_msg, sizeof(send_msg),
+			   MSG_EOR);
+		if (CHECK(ret != sizeof(send_msg), "send(msg)", "ret:%d\n",
+			  ret))
+			goto check_linum;
+
+		ret = read(sk_fds.passive_fd, recv_msg, sizeof(recv_msg));
+		if (CHECK(ret != sizeof(send_msg), "read(msg)", "ret:%d\n",
+			  ret))
+			goto check_linum;
+	}
+
+	if (sk_fds_shutdown(&sk_fds))
+		goto check_linum;
+
+	CHECK(misc_skel->bss->nr_syn != 1, "unexpected nr_syn",
+	      "expected (1) != actual (%u)\n",
+	      misc_skel->bss->nr_syn);
+
+	CHECK(misc_skel->bss->nr_data != nr_data, "unexpected nr_data",
+	      "expected (%u) != actual (%u)\n",
+	      nr_data, misc_skel->bss->nr_data);
+
+	/* The last ACK may have been delayed, so it is either 1 or 2. */
+	CHECK(misc_skel->bss->nr_pure_ack != 1 &&
+	      misc_skel->bss->nr_pure_ack != 2,
+	      "unexpected nr_pure_ack",
+	      "expected (1 or 2) != actual (%u)\n",
+	      misc_skel->bss->nr_pure_ack);
+
+	CHECK(misc_skel->bss->nr_fin != 1, "unexpected nr_fin",
+	      "expected (1) != actual (%u)\n",
+	      misc_skel->bss->nr_fin);
+
+check_linum:
+	CHECK_FAIL(check_error_linum(&sk_fds));
+	sk_fds_close(&sk_fds);
+	bpf_link__destroy(link);
+}
+
+struct test {
+	const char *desc;
+	void (*run)(void);
+};
+
+#define DEF_TEST(name) { #name, name }
+static struct test tests[] = {
+	DEF_TEST(simple_estab),
+	DEF_TEST(no_exprm_estab),
+	DEF_TEST(syncookie_estab),
+	DEF_TEST(fastopen_estab),
+	DEF_TEST(fin),
+	DEF_TEST(misc),
+};
+
+void test_tcp_hdr_options(void)
+{
+	int i;
+
+	skel = test_tcp_hdr_options__open_and_load();
+	if (CHECK(!skel, "open and load skel", "failed"))
+		return;
+
+	misc_skel = test_misc_tcp_hdr_options__open_and_load();
+	if (CHECK(!misc_skel, "open and load misc test skel", "failed"))
+		goto skel_destroy;
+
+	cg_fd = test__join_cgroup(CG_NAME);
+	if (CHECK_FAIL(cg_fd < 0))
+		goto skel_destroy;
+
+	for (i = 0; i < ARRAY_SIZE(tests); i++) {
+		if (!test__start_subtest(tests[i].desc))
+			continue;
+
+		if (create_netns())
+			break;
+
+		tests[i].run();
+
+		reset_test();
+	}
+
+	close(cg_fd);
+skel_destroy:
+	test_misc_tcp_hdr_options__destroy(misc_skel);
+	test_tcp_hdr_options__destroy(skel);
+}
+94
tools/testing/selftests/bpf/prog_tests/test_bpffs.c
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <test_progs.h>
+
+#define TDIR "/sys/kernel/debug"
+
+static int read_iter(char *file)
+{
+	/* 1024 should be enough to get contiguous 4 "iter" letters at some point */
+	char buf[1024];
+	int fd, len;
+
+	fd = open(file, 0);
+	if (fd < 0)
+		return -1;
+	while ((len = read(fd, buf, sizeof(buf))) > 0)
+		if (strstr(buf, "iter")) {
+			close(fd);
+			return 0;
+		}
+	close(fd);
+	return -1;
+}
+
+static int fn(void)
+{
+	int err, duration = 0;
+
+	err = unshare(CLONE_NEWNS);
+	if (CHECK(err, "unshare", "failed: %d\n", errno))
+		goto out;
+
+	err = mount("", "/", "", MS_REC | MS_PRIVATE, NULL);
+	if (CHECK(err, "mount /", "failed: %d\n", errno))
+		goto out;
+
+	err = umount(TDIR);
+	if (CHECK(err, "umount " TDIR, "failed: %d\n", errno))
+		goto out;
+
+	err = mount("none", TDIR, "tmpfs", 0, NULL);
+	if (CHECK(err, "mount", "mount root failed: %d\n", errno))
+		goto out;
+
+	err = mkdir(TDIR "/fs1", 0777);
+	if (CHECK(err, "mkdir "TDIR"/fs1", "failed: %d\n", errno))
+		goto out;
+	err = mkdir(TDIR "/fs2", 0777);
+	if (CHECK(err, "mkdir "TDIR"/fs2", "failed: %d\n", errno))
+		goto out;
+
+	err = mount("bpf", TDIR "/fs1", "bpf", 0, NULL);
+	if (CHECK(err, "mount bpffs "TDIR"/fs1", "failed: %d\n", errno))
+		goto out;
+	err = mount("bpf", TDIR "/fs2", "bpf", 0, NULL);
+	if (CHECK(err, "mount bpffs " TDIR "/fs2", "failed: %d\n", errno))
+		goto out;
+
+	err = read_iter(TDIR "/fs1/maps.debug");
+	if (CHECK(err, "reading " TDIR "/fs1/maps.debug", "failed\n"))
+		goto out;
+	err = read_iter(TDIR "/fs2/progs.debug");
+	if (CHECK(err, "reading " TDIR "/fs2/progs.debug", "failed\n"))
+		goto out;
+out:
+	umount(TDIR "/fs1");
+	umount(TDIR "/fs2");
+	rmdir(TDIR "/fs1");
+	rmdir(TDIR "/fs2");
+	umount(TDIR);
+	exit(err);
+}
+
+void test_test_bpffs(void)
+{
+	int err, duration = 0, status = 0;
+	pid_t pid;
+
+	pid = fork();
+	if (CHECK(pid == -1, "clone", "clone failed %d", errno))
+		return;
+	if (pid == 0)
+		fn();
+	err = waitpid(pid, &status, 0);
+	if (CHECK(err == -1 && errno != ECHILD, "waitpid", "failed %d", errno))
+		return;
+	if (CHECK(WEXITSTATUS(status), "bpffs test ", "failed %d", WEXITSTATUS(status)))
+		return;
+}
+60
tools/testing/selftests/bpf/prog_tests/test_local_storage.c
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (C) 2020 Google LLC.
+ */
+
+#include <test_progs.h>
+#include <linux/limits.h>
+
+#include "local_storage.skel.h"
+#include "network_helpers.h"
+
+int create_and_unlink_file(void)
+{
+	char fname[PATH_MAX] = "/tmp/fileXXXXXX";
+	int fd;
+
+	fd = mkstemp(fname);
+	if (fd < 0)
+		return fd;
+
+	close(fd);
+	unlink(fname);
+	return 0;
+}
+
+void test_test_local_storage(void)
+{
+	struct local_storage *skel = NULL;
+	int err, duration = 0, serv_sk = -1;
+
+	skel = local_storage__open_and_load();
+	if (CHECK(!skel, "skel_load", "lsm skeleton failed\n"))
+		goto close_prog;
+
+	err = local_storage__attach(skel);
+	if (CHECK(err, "attach", "lsm attach failed: %d\n", err))
+		goto close_prog;
+
+	skel->bss->monitored_pid = getpid();
+
+	err = create_and_unlink_file();
+	if (CHECK(err < 0, "exec_cmd", "err %d errno %d\n", err, errno))
+		goto close_prog;
+
+	CHECK(skel->data->inode_storage_result != 0, "inode_storage_result",
+	      "inode_local_storage not set\n");
+
+	serv_sk = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+	if (CHECK(serv_sk < 0, "start_server", "failed to start server\n"))
+		goto close_prog;
+
+	CHECK(skel->data->sk_storage_result != 0, "sk_storage_result",
+	      "sk_local_storage not set\n");
+
+	close(serv_sk);
+
+close_prog:
+	local_storage__destroy(skel);
+}
+9
tools/testing/selftests/bpf/prog_tests/test_lsm.c
@@
 #include <unistd.h>
 #include <malloc.h>
 #include <stdlib.h>
+#include <unistd.h>
 
 #include "lsm.skel.h"
 
@@
 {
 	struct lsm *skel = NULL;
 	int err, duration = 0;
+	int buf = 1234;
 
 	skel = lsm__open_and_load();
 	if (CHECK(!skel, "skel_load", "lsm skeleton failed\n"))
@@
 
 	CHECK(skel->bss->mprotect_count != 1, "mprotect_count",
 	      "mprotect_count = %d\n", skel->bss->mprotect_count);
+
+	syscall(__NR_setdomainname, &buf, -2L);
+	syscall(__NR_setdomainname, 0, -3L);
+	syscall(__NR_setdomainname, ~0L, -4L);
+
+	CHECK(skel->bss->copy_test != 3, "copy_test",
+	      "copy_test = %d\n", skel->bss->copy_test);
 
 close_prog:
 	lsm__destroy(skel);
+3
tools/testing/selftests/bpf/progs/btf__core_reloc_enumval.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_enumval x) {}
+3
tools/testing/selftests/bpf/progs/btf__core_reloc_enumval___diff.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_enumval___diff x) {}
+3
tools/testing/selftests/bpf/progs/btf__core_reloc_enumval___err_missing.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_enumval___err_missing x) {}
+3
tools/testing/selftests/bpf/progs/btf__core_reloc_enumval___val3_missing.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_enumval___val3_missing x) {}
+4
tools/testing/selftests/bpf/progs/btf__core_reloc_size___err_ambiguous.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_size___err_ambiguous1 x, 4 + struct core_reloc_size___err_ambiguous2 y) {}
+3
tools/testing/selftests/bpf/progs/btf__core_reloc_type_based.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_type_based x) {}
+3
tools/testing/selftests/bpf/progs/btf__core_reloc_type_based___all_missing.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_type_based___all_missing x) {}
+3
tools/testing/selftests/bpf/progs/btf__core_reloc_type_based___diff_sz.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_type_based___diff_sz x) {}
+3
tools/testing/selftests/bpf/progs/btf__core_reloc_type_based___fn_wrong_args.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_type_based___fn_wrong_args x) {}
+3
tools/testing/selftests/bpf/progs/btf__core_reloc_type_based___incompat.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_type_based___incompat x) {}
+3
tools/testing/selftests/bpf/progs/btf__core_reloc_type_id.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_type_id x) {}
+3
tools/testing/selftests/bpf/progs/btf__core_reloc_type_id___missing_targets.c
··· 1 + #include "core_reloc_types.h" 2 + 3 + void f(struct core_reloc_type_id___missing_targets x) {}
+351 -1
tools/testing/selftests/bpf/progs/core_reloc_types.h
··· 652 652 }; 653 653 654 654 /* 655 - * EXISTENCE 655 + * FIELD EXISTENCE 656 656 */ 657 657 struct core_reloc_existence_output { 658 658 int a_exists; ··· 808 808 char arr_field[10]; 809 809 void *ptr_field; 810 810 enum { OTHER_VALUE = 0xFFFFFFFFFFFFFFFF } enum_field; 811 + }; 812 + 813 + /* Error case of two candidates with the fields (int_field) at the same 814 + * offset, but with differing final relocation values: size 4 vs size 1 815 + */ 816 + struct core_reloc_size___err_ambiguous1 { 817 + /* int at offset 0 */ 818 + int int_field; 819 + 820 + struct { int x; } struct_field; 821 + union { int x; } union_field; 822 + int arr_field[4]; 823 + void *ptr_field; 824 + enum { VALUE___1 = 123 } enum_field; 825 + }; 826 + 827 + struct core_reloc_size___err_ambiguous2 { 828 + /* char at offset 0 */ 829 + char int_field; 830 + 831 + struct { int x; } struct_field; 832 + union { int x; } union_field; 833 + int arr_field[4]; 834 + void *ptr_field; 835 + enum { VALUE___2 = 123 } enum_field; 836 + }; 837 + 838 + /* 839 + * TYPE EXISTENCE & SIZE 840 + */ 841 + struct core_reloc_type_based_output { 842 + bool struct_exists; 843 + bool union_exists; 844 + bool enum_exists; 845 + bool typedef_named_struct_exists; 846 + bool typedef_anon_struct_exists; 847 + bool typedef_struct_ptr_exists; 848 + bool typedef_int_exists; 849 + bool typedef_enum_exists; 850 + bool typedef_void_ptr_exists; 851 + bool typedef_func_proto_exists; 852 + bool typedef_arr_exists; 853 + 854 + int struct_sz; 855 + int union_sz; 856 + int enum_sz; 857 + int typedef_named_struct_sz; 858 + int typedef_anon_struct_sz; 859 + int typedef_struct_ptr_sz; 860 + int typedef_int_sz; 861 + int typedef_enum_sz; 862 + int typedef_void_ptr_sz; 863 + int typedef_func_proto_sz; 864 + int typedef_arr_sz; 865 + }; 866 + 867 + struct a_struct { 868 + int x; 869 + }; 870 + 871 + union a_union { 872 + int y; 873 + int z; 874 + }; 875 + 876 + typedef struct a_struct named_struct_typedef; 877 + 878 + typedef struct { int x, 
y, z; } anon_struct_typedef; 879 + 880 + typedef struct { 881 + int a, b, c; 882 + } *struct_ptr_typedef; 883 + 884 + enum an_enum { 885 + AN_ENUM_VAL1 = 1, 886 + AN_ENUM_VAL2 = 2, 887 + AN_ENUM_VAL3 = 3, 888 + }; 889 + 890 + typedef int int_typedef; 891 + 892 + typedef enum { TYPEDEF_ENUM_VAL1, TYPEDEF_ENUM_VAL2 } enum_typedef; 893 + 894 + typedef void *void_ptr_typedef; 895 + 896 + typedef int (*func_proto_typedef)(long); 897 + 898 + typedef char arr_typedef[20]; 899 + 900 + struct core_reloc_type_based { 901 + struct a_struct f1; 902 + union a_union f2; 903 + enum an_enum f3; 904 + named_struct_typedef f4; 905 + anon_struct_typedef f5; 906 + struct_ptr_typedef f6; 907 + int_typedef f7; 908 + enum_typedef f8; 909 + void_ptr_typedef f9; 910 + func_proto_typedef f10; 911 + arr_typedef f11; 912 + }; 913 + 914 + /* no types in target */ 915 + struct core_reloc_type_based___all_missing { 916 + }; 917 + 918 + /* different type sizes, extra modifiers, anon vs named enums, etc */ 919 + struct a_struct___diff_sz { 920 + long x; 921 + int y; 922 + char z; 923 + }; 924 + 925 + union a_union___diff_sz { 926 + char yy; 927 + char zz; 928 + }; 929 + 930 + typedef struct a_struct___diff_sz named_struct_typedef___diff_sz; 931 + 932 + typedef struct { long xx, yy, zzz; } anon_struct_typedef___diff_sz; 933 + 934 + typedef struct { 935 + char aa[1], bb[2], cc[3]; 936 + } *struct_ptr_typedef___diff_sz; 937 + 938 + enum an_enum___diff_sz { 939 + AN_ENUM_VAL1___diff_sz = 0x123412341234, 940 + AN_ENUM_VAL2___diff_sz = 2, 941 + }; 942 + 943 + typedef unsigned long int_typedef___diff_sz; 944 + 945 + typedef enum an_enum___diff_sz enum_typedef___diff_sz; 946 + 947 + typedef const void * const void_ptr_typedef___diff_sz; 948 + 949 + typedef int_typedef___diff_sz (*func_proto_typedef___diff_sz)(char); 950 + 951 + typedef int arr_typedef___diff_sz[2]; 952 + 953 + struct core_reloc_type_based___diff_sz { 954 + struct a_struct___diff_sz f1; 955 + union a_union___diff_sz f2; 956 + enum 
an_enum___diff_sz f3; 957 + named_struct_typedef___diff_sz f4; 958 + anon_struct_typedef___diff_sz f5; 959 + struct_ptr_typedef___diff_sz f6; 960 + int_typedef___diff_sz f7; 961 + enum_typedef___diff_sz f8; 962 + void_ptr_typedef___diff_sz f9; 963 + func_proto_typedef___diff_sz f10; 964 + arr_typedef___diff_sz f11; 965 + }; 966 + 967 + /* incompatibilities between target and local types */ 968 + union a_struct___incompat { /* union instead of struct */ 969 + int x; 970 + }; 971 + 972 + struct a_union___incompat { /* struct instead of union */ 973 + int y; 974 + int z; 975 + }; 976 + 977 + /* typedef to union, not to struct */ 978 + typedef union a_struct___incompat named_struct_typedef___incompat; 979 + 980 + /* typedef to void pointer, instead of struct */ 981 + typedef void *anon_struct_typedef___incompat; 982 + 983 + /* extra pointer indirection */ 984 + typedef struct { 985 + int a, b, c; 986 + } **struct_ptr_typedef___incompat; 987 + 988 + /* typedef of a struct with int, instead of int */ 989 + typedef struct { int x; } int_typedef___incompat; 990 + 991 + /* typedef to func_proto, instead of enum */ 992 + typedef int (*enum_typedef___incompat)(void); 993 + 994 + /* pointer to char instead of void */ 995 + typedef char *void_ptr_typedef___incompat; 996 + 997 + /* void return type instead of int */ 998 + typedef void (*func_proto_typedef___incompat)(long); 999 + 1000 + /* multi-dimensional array instead of a single-dimensional */ 1001 + typedef int arr_typedef___incompat[20][2]; 1002 + 1003 + struct core_reloc_type_based___incompat { 1004 + union a_struct___incompat f1; 1005 + struct a_union___incompat f2; 1006 + /* the only valid one is enum, to check that something still succeeds */ 1007 + enum an_enum f3; 1008 + named_struct_typedef___incompat f4; 1009 + anon_struct_typedef___incompat f5; 1010 + struct_ptr_typedef___incompat f6; 1011 + int_typedef___incompat f7; 1012 + enum_typedef___incompat f8; 1013 + void_ptr_typedef___incompat f9; 1014 + 
func_proto_typedef___incompat f10; 1015 + arr_typedef___incompat f11; 1016 + }; 1017 + 1018 + /* func_proto with incompatible signature */ 1019 + typedef void (*func_proto_typedef___fn_wrong_ret1)(long); 1020 + typedef int * (*func_proto_typedef___fn_wrong_ret2)(long); 1021 + typedef struct { int x; } int_struct_typedef; 1022 + typedef int_struct_typedef (*func_proto_typedef___fn_wrong_ret3)(long); 1023 + typedef int (*func_proto_typedef___fn_wrong_arg)(void *); 1024 + typedef int (*func_proto_typedef___fn_wrong_arg_cnt1)(long, long); 1025 + typedef int (*func_proto_typedef___fn_wrong_arg_cnt2)(void); 1026 + 1027 + struct core_reloc_type_based___fn_wrong_args { 1028 + /* one valid type to make sure relos still work */ 1029 + struct a_struct f1; 1030 + func_proto_typedef___fn_wrong_ret1 f2; 1031 + func_proto_typedef___fn_wrong_ret2 f3; 1032 + func_proto_typedef___fn_wrong_ret3 f4; 1033 + func_proto_typedef___fn_wrong_arg f5; 1034 + func_proto_typedef___fn_wrong_arg_cnt1 f6; 1035 + func_proto_typedef___fn_wrong_arg_cnt2 f7; 1036 + }; 1037 + 1038 + /* 1039 + * TYPE ID MAPPING (LOCAL AND TARGET) 1040 + */ 1041 + struct core_reloc_type_id_output { 1042 + int local_anon_struct; 1043 + int local_anon_union; 1044 + int local_anon_enum; 1045 + int local_anon_func_proto_ptr; 1046 + int local_anon_void_ptr; 1047 + int local_anon_arr; 1048 + 1049 + int local_struct; 1050 + int local_union; 1051 + int local_enum; 1052 + int local_int; 1053 + int local_struct_typedef; 1054 + int local_func_proto_typedef; 1055 + int local_arr_typedef; 1056 + 1057 + int targ_struct; 1058 + int targ_union; 1059 + int targ_enum; 1060 + int targ_int; 1061 + int targ_struct_typedef; 1062 + int targ_func_proto_typedef; 1063 + int targ_arr_typedef; 1064 + }; 1065 + 1066 + struct core_reloc_type_id { 1067 + struct a_struct f1; 1068 + union a_union f2; 1069 + enum an_enum f3; 1070 + named_struct_typedef f4; 1071 + func_proto_typedef f5; 1072 + arr_typedef f6; 1073 + }; 1074 + 1075 + struct 
core_reloc_type_id___missing_targets { 1076 + /* nothing */ 1077 + }; 1078 + 1079 + /* 1080 + * ENUMERATOR VALUE EXISTENCE AND VALUE RELOCATION 1081 + */ 1082 + struct core_reloc_enumval_output { 1083 + bool named_val1_exists; 1084 + bool named_val2_exists; 1085 + bool named_val3_exists; 1086 + bool anon_val1_exists; 1087 + bool anon_val2_exists; 1088 + bool anon_val3_exists; 1089 + 1090 + int named_val1; 1091 + int named_val2; 1092 + int anon_val1; 1093 + int anon_val2; 1094 + }; 1095 + 1096 + enum named_enum { 1097 + NAMED_ENUM_VAL1 = 1, 1098 + NAMED_ENUM_VAL2 = 2, 1099 + NAMED_ENUM_VAL3 = 3, 1100 + }; 1101 + 1102 + typedef enum { 1103 + ANON_ENUM_VAL1 = 0x10, 1104 + ANON_ENUM_VAL2 = 0x20, 1105 + ANON_ENUM_VAL3 = 0x30, 1106 + } anon_enum; 1107 + 1108 + struct core_reloc_enumval { 1109 + enum named_enum f1; 1110 + anon_enum f2; 1111 + }; 1112 + 1113 + /* differing enumerator values */ 1114 + enum named_enum___diff { 1115 + NAMED_ENUM_VAL1___diff = 101, 1116 + NAMED_ENUM_VAL2___diff = 202, 1117 + NAMED_ENUM_VAL3___diff = 303, 1118 + }; 1119 + 1120 + typedef enum { 1121 + ANON_ENUM_VAL1___diff = 0x11, 1122 + ANON_ENUM_VAL2___diff = 0x22, 1123 + ANON_ENUM_VAL3___diff = 0x33, 1124 + } anon_enum___diff; 1125 + 1126 + struct core_reloc_enumval___diff { 1127 + enum named_enum___diff f1; 1128 + anon_enum___diff f2; 1129 + }; 1130 + 1131 + /* missing (optional) third enum value */ 1132 + enum named_enum___val3_missing { 1133 + NAMED_ENUM_VAL1___val3_missing = 111, 1134 + NAMED_ENUM_VAL2___val3_missing = 222, 1135 + }; 1136 + 1137 + typedef enum { 1138 + ANON_ENUM_VAL1___val3_missing = 0x111, 1139 + ANON_ENUM_VAL2___val3_missing = 0x222, 1140 + } anon_enum___val3_missing; 1141 + 1142 + struct core_reloc_enumval___val3_missing { 1143 + enum named_enum___val3_missing f1; 1144 + anon_enum___val3_missing f2; 1145 + }; 1146 + 1147 + /* missing (mandatory) second enum value, should fail */ 1148 + enum named_enum___err_missing { 1149 + NAMED_ENUM_VAL1___err_missing = 1, 1150 + 
NAMED_ENUM_VAL3___err_missing = 3, 1151 + }; 1152 + 1153 + typedef enum { 1154 + ANON_ENUM_VAL1___err_missing = 0x111, 1155 + ANON_ENUM_VAL3___err_missing = 0x222, 1156 + } anon_enum___err_missing; 1157 + 1158 + struct core_reloc_enumval___err_missing { 1159 + enum named_enum___err_missing f1; 1160 + anon_enum___err_missing f2; 811 1161 };
+27
tools/testing/selftests/bpf/progs/fexit_bpf2bpf.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 /* Copyright (c) 2019 Facebook */ 3 3 #include <linux/stddef.h> 4 + #include <linux/if_ether.h> 4 5 #include <linux/ipv6.h> 5 6 #include <linux/bpf.h> 7 + #include <linux/tcp.h> 6 8 #include <bpf/bpf_helpers.h> 7 9 #include <bpf/bpf_endian.h> 8 10 #include <bpf/bpf_tracing.h> ··· 153 151 test_get_constant = 1; 154 152 return test_get_constant; /* original get_constant() returns val - 122 */ 155 153 } 154 + 155 + __u64 test_pkt_write_access_subprog = 0; 156 + SEC("freplace/test_pkt_write_access_subprog") 157 + int new_test_pkt_write_access_subprog(struct __sk_buff *skb, __u32 off) 158 + { 159 + 160 + void *data = (void *)(long)skb->data; 161 + void *data_end = (void *)(long)skb->data_end; 162 + struct tcphdr *tcp; 163 + 164 + if (off > sizeof(struct ethhdr) + sizeof(struct ipv6hdr)) 165 + return -1; 166 + 167 + tcp = data + off; 168 + if (tcp + 1 > data_end) 169 + return -1; 170 + 171 + /* make modifications to the packet data */ 172 + tcp->check++; 173 + tcp->syn = 0; 174 + 175 + test_pkt_write_access_subprog = 1; 176 + return 0; 177 + } 178 + 156 179 char _license[] SEC("license") = "GPL";
+40
tools/testing/selftests/bpf/progs/freplace_attach_probe.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2020 Facebook 3 + 4 + #include <linux/ptrace.h> 5 + #include <linux/bpf.h> 6 + #include <bpf/bpf_helpers.h> 7 + #include <bpf/bpf_tracing.h> 8 + 9 + #define VAR_NUM 2 10 + 11 + struct hmap_elem { 12 + struct bpf_spin_lock lock; 13 + int var[VAR_NUM]; 14 + }; 15 + 16 + struct { 17 + __uint(type, BPF_MAP_TYPE_HASH); 18 + __uint(max_entries, 1); 19 + __type(key, __u32); 20 + __type(value, struct hmap_elem); 21 + } hash_map SEC(".maps"); 22 + 23 + SEC("freplace/handle_kprobe") 24 + int new_handle_kprobe(struct pt_regs *ctx) 25 + { 26 + struct hmap_elem zero = {}, *val; 27 + int key = 0; 28 + 29 + val = bpf_map_lookup_elem(&hash_map, &key); 30 + if (!val) 31 + return 1; 32 + /* spin_lock in hash map */ 33 + bpf_spin_lock(&val->lock); 34 + val->var[0] = 99; 35 + bpf_spin_unlock(&val->lock); 36 + 37 + return 0; 38 + } 39 + 40 + char _license[] SEC("license") = "GPL";
+34
tools/testing/selftests/bpf/progs/freplace_cls_redirect.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2020 Facebook 3 + 4 + #include <linux/stddef.h> 5 + #include <linux/bpf.h> 6 + #include <linux/pkt_cls.h> 7 + #include <bpf/bpf_endian.h> 8 + #include <bpf/bpf_helpers.h> 9 + 10 + struct bpf_map_def SEC("maps") sock_map = { 11 + .type = BPF_MAP_TYPE_SOCKMAP, 12 + .key_size = sizeof(int), 13 + .value_size = sizeof(int), 14 + .max_entries = 2, 15 + }; 16 + 17 + SEC("freplace/cls_redirect") 18 + int freplace_cls_redirect_test(struct __sk_buff *skb) 19 + { 20 + int ret = 0; 21 + const int zero = 0; 22 + struct bpf_sock *sk; 23 + 24 + sk = bpf_map_lookup_elem(&sock_map, &zero); 25 + if (!sk) 26 + return TC_ACT_SHOT; 27 + 28 + ret = bpf_map_update_elem(&sock_map, &zero, sk, 0); 29 + bpf_sk_release(sk); 30 + 31 + return ret == 0 ? TC_ACT_OK : TC_ACT_SHOT; 32 + } 33 + 34 + char _license[] SEC("license") = "GPL";
+19
tools/testing/selftests/bpf/progs/freplace_connect_v4_prog.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2020 Facebook 3 + 4 + #include <linux/stddef.h> 5 + #include <linux/ipv6.h> 6 + #include <linux/bpf.h> 7 + #include <linux/in.h> 8 + #include <sys/socket.h> 9 + #include <bpf/bpf_helpers.h> 10 + #include <bpf/bpf_endian.h> 11 + 12 + SEC("freplace/connect_v4_prog") 13 + int new_connect_v4_prog(struct bpf_sock_addr *ctx) 14 + { 15 + // return value that's in an invalid range 16 + return 255; 17 + } 18 + 19 + char _license[] SEC("license") = "GPL";
+140
tools/testing/selftests/bpf/progs/local_storage.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + /* 4 + * Copyright 2020 Google LLC. 5 + */ 6 + 7 + #include <errno.h> 8 + #include <linux/bpf.h> 9 + #include <stdbool.h> 10 + #include <bpf/bpf_helpers.h> 11 + #include <bpf/bpf_tracing.h> 12 + 13 + char _license[] SEC("license") = "GPL"; 14 + 15 + #define DUMMY_STORAGE_VALUE 0xdeadbeef 16 + 17 + int monitored_pid = 0; 18 + int inode_storage_result = -1; 19 + int sk_storage_result = -1; 20 + 21 + struct dummy_storage { 22 + __u32 value; 23 + }; 24 + 25 + struct { 26 + __uint(type, BPF_MAP_TYPE_INODE_STORAGE); 27 + __uint(map_flags, BPF_F_NO_PREALLOC); 28 + __type(key, int); 29 + __type(value, struct dummy_storage); 30 + } inode_storage_map SEC(".maps"); 31 + 32 + struct { 33 + __uint(type, BPF_MAP_TYPE_SK_STORAGE); 34 + __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE); 35 + __type(key, int); 36 + __type(value, struct dummy_storage); 37 + } sk_storage_map SEC(".maps"); 38 + 39 + /* TODO Use vmlinux.h once BTF pruning for embedded types is fixed. 
40 + */ 41 + struct sock {} __attribute__((preserve_access_index)); 42 + struct sockaddr {} __attribute__((preserve_access_index)); 43 + struct socket { 44 + struct sock *sk; 45 + } __attribute__((preserve_access_index)); 46 + 47 + struct inode {} __attribute__((preserve_access_index)); 48 + struct dentry { 49 + struct inode *d_inode; 50 + } __attribute__((preserve_access_index)); 51 + struct file { 52 + struct inode *f_inode; 53 + } __attribute__((preserve_access_index)); 54 + 55 + 56 + SEC("lsm/inode_unlink") 57 + int BPF_PROG(unlink_hook, struct inode *dir, struct dentry *victim) 58 + { 59 + __u32 pid = bpf_get_current_pid_tgid() >> 32; 60 + struct dummy_storage *storage; 61 + 62 + if (pid != monitored_pid) 63 + return 0; 64 + 65 + storage = bpf_inode_storage_get(&inode_storage_map, victim->d_inode, 0, 66 + BPF_SK_STORAGE_GET_F_CREATE); 67 + if (!storage) 68 + return 0; 69 + 70 + if (storage->value == DUMMY_STORAGE_VALUE) 71 + inode_storage_result = -1; 72 + 73 + inode_storage_result = 74 + bpf_inode_storage_delete(&inode_storage_map, victim->d_inode); 75 + 76 + return 0; 77 + } 78 + 79 + SEC("lsm/socket_bind") 80 + int BPF_PROG(socket_bind, struct socket *sock, struct sockaddr *address, 81 + int addrlen) 82 + { 83 + __u32 pid = bpf_get_current_pid_tgid() >> 32; 84 + struct dummy_storage *storage; 85 + 86 + if (pid != monitored_pid) 87 + return 0; 88 + 89 + storage = bpf_sk_storage_get(&sk_storage_map, sock->sk, 0, 90 + BPF_SK_STORAGE_GET_F_CREATE); 91 + if (!storage) 92 + return 0; 93 + 94 + if (storage->value == DUMMY_STORAGE_VALUE) 95 + sk_storage_result = -1; 96 + 97 + sk_storage_result = bpf_sk_storage_delete(&sk_storage_map, sock->sk); 98 + return 0; 99 + } 100 + 101 + SEC("lsm/socket_post_create") 102 + int BPF_PROG(socket_post_create, struct socket *sock, int family, int type, 103 + int protocol, int kern) 104 + { 105 + __u32 pid = bpf_get_current_pid_tgid() >> 32; 106 + struct dummy_storage *storage; 107 + 108 + if (pid != monitored_pid) 109 + return 0; 
110 + 111 + storage = bpf_sk_storage_get(&sk_storage_map, sock->sk, 0, 112 + BPF_SK_STORAGE_GET_F_CREATE); 113 + if (!storage) 114 + return 0; 115 + 116 + storage->value = DUMMY_STORAGE_VALUE; 117 + 118 + return 0; 119 + } 120 + 121 + SEC("lsm/file_open") 122 + int BPF_PROG(file_open, struct file *file) 123 + { 124 + __u32 pid = bpf_get_current_pid_tgid() >> 32; 125 + struct dummy_storage *storage; 126 + 127 + if (pid != monitored_pid) 128 + return 0; 129 + 130 + if (!file->f_inode) 131 + return 0; 132 + 133 + storage = bpf_inode_storage_get(&inode_storage_map, file->f_inode, 0, 134 + BPF_LOCAL_STORAGE_GET_F_CREATE); 135 + if (!storage) 136 + return 0; 137 + 138 + storage->value = DUMMY_STORAGE_VALUE; 139 + return 0; 140 + }
+63 -1
tools/testing/selftests/bpf/progs/lsm.c
··· 9 9 #include <bpf/bpf_tracing.h> 10 10 #include <errno.h> 11 11 12 + struct { 13 + __uint(type, BPF_MAP_TYPE_ARRAY); 14 + __uint(max_entries, 1); 15 + __type(key, __u32); 16 + __type(value, __u64); 17 + } array SEC(".maps"); 18 + 19 + struct { 20 + __uint(type, BPF_MAP_TYPE_HASH); 21 + __uint(max_entries, 1); 22 + __type(key, __u32); 23 + __type(value, __u64); 24 + } hash SEC(".maps"); 25 + 26 + struct { 27 + __uint(type, BPF_MAP_TYPE_LRU_HASH); 28 + __uint(max_entries, 1); 29 + __type(key, __u32); 30 + __type(value, __u64); 31 + } lru_hash SEC(".maps"); 32 + 12 33 char _license[] SEC("license") = "GPL"; 13 34 14 35 int monitored_pid = 0; ··· 57 36 return ret; 58 37 } 59 38 60 - SEC("lsm/bprm_committed_creds") 39 + SEC("lsm.s/bprm_committed_creds") 61 40 int BPF_PROG(test_void_hook, struct linux_binprm *bprm) 62 41 { 63 42 __u32 pid = bpf_get_current_pid_tgid() >> 32; 43 + char args[64]; 44 + __u32 key = 0; 45 + __u64 *value; 64 46 65 47 if (monitored_pid == pid) 66 48 bprm_count++; 67 49 50 + bpf_copy_from_user(args, sizeof(args), (void *)bprm->vma->vm_mm->arg_start); 51 + bpf_copy_from_user(args, sizeof(args), (void *)bprm->mm->arg_start); 52 + 53 + value = bpf_map_lookup_elem(&array, &key); 54 + if (value) 55 + *value = 0; 56 + value = bpf_map_lookup_elem(&hash, &key); 57 + if (value) 58 + *value = 0; 59 + value = bpf_map_lookup_elem(&lru_hash, &key); 60 + if (value) 61 + *value = 0; 62 + 63 + return 0; 64 + } 65 + SEC("lsm/task_free") /* lsm/ is ok, lsm.s/ fails */ 66 + int BPF_PROG(test_task_free, struct task_struct *task) 67 + { 68 + return 0; 69 + } 70 + 71 + int copy_test = 0; 72 + 73 + SEC("fentry.s/__x64_sys_setdomainname") 74 + int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs) 75 + { 76 + void *ptr = (void *)PT_REGS_PARM1(regs); 77 + int len = PT_REGS_PARM2(regs); 78 + int buf = 0; 79 + long ret; 80 + 81 + ret = bpf_copy_from_user(&buf, sizeof(buf), ptr); 82 + if (len == -2 && ret == 0 && buf == 1234) 83 + copy_test++; 84 + if (len == -3 && 
ret == -EFAULT) 85 + copy_test++; 86 + if (len == -4 && ret == -EFAULT) 87 + copy_test++; 68 88 return 0; 69 89 }
+3 -3
tools/testing/selftests/bpf/progs/map_ptr_kern.c
··· 589 589 return 1; 590 590 } 591 591 592 - struct bpf_sk_storage_map { 592 + struct bpf_local_storage_map { 593 593 struct bpf_map map; 594 594 } __attribute__((preserve_access_index)); 595 595 ··· 602 602 603 603 static inline int check_sk_storage(void) 604 604 { 605 - struct bpf_sk_storage_map *sk_storage = 606 - (struct bpf_sk_storage_map *)&m_sk_storage; 605 + struct bpf_local_storage_map *sk_storage = 606 + (struct bpf_local_storage_map *)&m_sk_storage; 607 607 struct bpf_map *map = (struct bpf_map *)&m_sk_storage; 608 608 609 609 VERIFY(check(&sk_storage->map, map, sizeof(__u32), sizeof(__u32), 0));
+31
tools/testing/selftests/bpf/progs/test_btf_map_in_map.c
··· 11 11 } inner_map1 SEC(".maps"), 12 12 inner_map2 SEC(".maps"); 13 13 14 + struct inner_map_sz2 { 15 + __uint(type, BPF_MAP_TYPE_ARRAY); 16 + __uint(max_entries, 2); 17 + __type(key, int); 18 + __type(value, int); 19 + } inner_map_sz2 SEC(".maps"); 20 + 14 21 struct outer_arr { 15 22 __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS); 16 23 __uint(max_entries, 3); ··· 55 48 [0] = &inner_map2, 56 49 [4] = &inner_map1, 57 50 }, 51 + }; 52 + 53 + struct sockarr_sz1 { 54 + __uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY); 55 + __uint(max_entries, 1); 56 + __type(key, int); 57 + __type(value, int); 58 + } sockarr_sz1 SEC(".maps"); 59 + 60 + struct sockarr_sz2 { 61 + __uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY); 62 + __uint(max_entries, 2); 63 + __type(key, int); 64 + __type(value, int); 65 + } sockarr_sz2 SEC(".maps"); 66 + 67 + struct outer_sockarr_sz1 { 68 + __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS); 69 + __uint(max_entries, 1); 70 + __uint(key_size, sizeof(int)); 71 + __uint(value_size, sizeof(int)); 72 + __array(values, struct sockarr_sz1); 73 + } outer_sockarr SEC(".maps") = { 74 + .values = { (void *)&sockarr_sz1 }, 58 75 }; 59 76 60 77 int input = 0;
+72
tools/testing/selftests/bpf/progs/test_core_reloc_enumval.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2020 Facebook 3 + 4 + #include <linux/bpf.h> 5 + #include <stdint.h> 6 + #include <stdbool.h> 7 + #include <bpf/bpf_helpers.h> 8 + #include <bpf/bpf_core_read.h> 9 + 10 + char _license[] SEC("license") = "GPL"; 11 + 12 + struct { 13 + char in[256]; 14 + char out[256]; 15 + bool skip; 16 + } data = {}; 17 + 18 + enum named_enum { 19 + NAMED_ENUM_VAL1 = 1, 20 + NAMED_ENUM_VAL2 = 2, 21 + NAMED_ENUM_VAL3 = 3, 22 + }; 23 + 24 + typedef enum { 25 + ANON_ENUM_VAL1 = 0x10, 26 + ANON_ENUM_VAL2 = 0x20, 27 + ANON_ENUM_VAL3 = 0x30, 28 + } anon_enum; 29 + 30 + struct core_reloc_enumval_output { 31 + bool named_val1_exists; 32 + bool named_val2_exists; 33 + bool named_val3_exists; 34 + bool anon_val1_exists; 35 + bool anon_val2_exists; 36 + bool anon_val3_exists; 37 + 38 + int named_val1; 39 + int named_val2; 40 + int anon_val1; 41 + int anon_val2; 42 + }; 43 + 44 + SEC("raw_tracepoint/sys_enter") 45 + int test_core_enumval(void *ctx) 46 + { 47 + #if __has_builtin(__builtin_preserve_enum_value) 48 + struct core_reloc_enumval_output *out = (void *)&data.out; 49 + enum named_enum named = 0; 50 + anon_enum anon = 0; 51 + 52 + out->named_val1_exists = bpf_core_enum_value_exists(named, NAMED_ENUM_VAL1); 53 + out->named_val2_exists = bpf_core_enum_value_exists(enum named_enum, NAMED_ENUM_VAL2); 54 + out->named_val3_exists = bpf_core_enum_value_exists(enum named_enum, NAMED_ENUM_VAL3); 55 + 56 + out->anon_val1_exists = bpf_core_enum_value_exists(anon, ANON_ENUM_VAL1); 57 + out->anon_val2_exists = bpf_core_enum_value_exists(anon_enum, ANON_ENUM_VAL2); 58 + out->anon_val3_exists = bpf_core_enum_value_exists(anon_enum, ANON_ENUM_VAL3); 59 + 60 + out->named_val1 = bpf_core_enum_value(named, NAMED_ENUM_VAL1); 61 + out->named_val2 = bpf_core_enum_value(named, NAMED_ENUM_VAL2); 62 + /* NAMED_ENUM_VAL3 value is optional */ 63 + 64 + out->anon_val1 = bpf_core_enum_value(anon, ANON_ENUM_VAL1); 65 + out->anon_val2 = 
bpf_core_enum_value(anon, ANON_ENUM_VAL2); 66 + /* ANON_ENUM_VAL3 value is optional */ 67 + #else 68 + data.skip = true; 69 + #endif 70 + 71 + return 0; 72 + }
+2
tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
··· 3 3 4 4 #include <linux/bpf.h> 5 5 #include <stdint.h> 6 + #include <stdbool.h> 6 7 #include <bpf/bpf_helpers.h> 7 8 #include <bpf/bpf_core_read.h> 8 9 ··· 12 11 struct { 13 12 char in[256]; 14 13 char out[256]; 14 + bool skip; 15 15 uint64_t my_pid_tgid; 16 16 } data = {}; 17 17
+110
tools/testing/selftests/bpf/progs/test_core_reloc_type_based.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2020 Facebook 3 + 4 + #include <linux/bpf.h> 5 + #include <stdint.h> 6 + #include <stdbool.h> 7 + #include <bpf/bpf_helpers.h> 8 + #include <bpf/bpf_core_read.h> 9 + 10 + char _license[] SEC("license") = "GPL"; 11 + 12 + struct { 13 + char in[256]; 14 + char out[256]; 15 + bool skip; 16 + } data = {}; 17 + 18 + struct a_struct { 19 + int x; 20 + }; 21 + 22 + union a_union { 23 + int y; 24 + int z; 25 + }; 26 + 27 + typedef struct a_struct named_struct_typedef; 28 + 29 + typedef struct { int x, y, z; } anon_struct_typedef; 30 + 31 + typedef struct { 32 + int a, b, c; 33 + } *struct_ptr_typedef; 34 + 35 + enum an_enum { 36 + AN_ENUM_VAL1 = 1, 37 + AN_ENUM_VAL2 = 2, 38 + AN_ENUM_VAL3 = 3, 39 + }; 40 + 41 + typedef int int_typedef; 42 + 43 + typedef enum { TYPEDEF_ENUM_VAL1, TYPEDEF_ENUM_VAL2 } enum_typedef; 44 + 45 + typedef void *void_ptr_typedef; 46 + 47 + typedef int (*func_proto_typedef)(long); 48 + 49 + typedef char arr_typedef[20]; 50 + 51 + struct core_reloc_type_based_output { 52 + bool struct_exists; 53 + bool union_exists; 54 + bool enum_exists; 55 + bool typedef_named_struct_exists; 56 + bool typedef_anon_struct_exists; 57 + bool typedef_struct_ptr_exists; 58 + bool typedef_int_exists; 59 + bool typedef_enum_exists; 60 + bool typedef_void_ptr_exists; 61 + bool typedef_func_proto_exists; 62 + bool typedef_arr_exists; 63 + 64 + int struct_sz; 65 + int union_sz; 66 + int enum_sz; 67 + int typedef_named_struct_sz; 68 + int typedef_anon_struct_sz; 69 + int typedef_struct_ptr_sz; 70 + int typedef_int_sz; 71 + int typedef_enum_sz; 72 + int typedef_void_ptr_sz; 73 + int typedef_func_proto_sz; 74 + int typedef_arr_sz; 75 + }; 76 + 77 + SEC("raw_tracepoint/sys_enter") 78 + int test_core_type_based(void *ctx) 79 + { 80 + #if __has_builtin(__builtin_preserve_type_info) 81 + struct core_reloc_type_based_output *out = (void *)&data.out; 82 + 83 + out->struct_exists = bpf_core_type_exists(struct a_struct); 
84 + out->union_exists = bpf_core_type_exists(union a_union); 85 + out->enum_exists = bpf_core_type_exists(enum an_enum); 86 + out->typedef_named_struct_exists = bpf_core_type_exists(named_struct_typedef); 87 + out->typedef_anon_struct_exists = bpf_core_type_exists(anon_struct_typedef); 88 + out->typedef_struct_ptr_exists = bpf_core_type_exists(struct_ptr_typedef); 89 + out->typedef_int_exists = bpf_core_type_exists(int_typedef); 90 + out->typedef_enum_exists = bpf_core_type_exists(enum_typedef); 91 + out->typedef_void_ptr_exists = bpf_core_type_exists(void_ptr_typedef); 92 + out->typedef_func_proto_exists = bpf_core_type_exists(func_proto_typedef); 93 + out->typedef_arr_exists = bpf_core_type_exists(arr_typedef); 94 + 95 + out->struct_sz = bpf_core_type_size(struct a_struct); 96 + out->union_sz = bpf_core_type_size(union a_union); 97 + out->enum_sz = bpf_core_type_size(enum an_enum); 98 + out->typedef_named_struct_sz = bpf_core_type_size(named_struct_typedef); 99 + out->typedef_anon_struct_sz = bpf_core_type_size(anon_struct_typedef); 100 + out->typedef_struct_ptr_sz = bpf_core_type_size(struct_ptr_typedef); 101 + out->typedef_int_sz = bpf_core_type_size(int_typedef); 102 + out->typedef_enum_sz = bpf_core_type_size(enum_typedef); 103 + out->typedef_void_ptr_sz = bpf_core_type_size(void_ptr_typedef); 104 + out->typedef_func_proto_sz = bpf_core_type_size(func_proto_typedef); 105 + out->typedef_arr_sz = bpf_core_type_size(arr_typedef); 106 + #else 107 + data.skip = true; 108 + #endif 109 + return 0; 110 + }
+115
tools/testing/selftests/bpf/progs/test_core_reloc_type_id.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + // Copyright (c) 2020 Facebook 3 + 4 + #include <linux/bpf.h> 5 + #include <stdint.h> 6 + #include <stdbool.h> 7 + #include <bpf/bpf_helpers.h> 8 + #include <bpf/bpf_core_read.h> 9 + 10 + char _license[] SEC("license") = "GPL"; 11 + 12 + struct { 13 + char in[256]; 14 + char out[256]; 15 + bool skip; 16 + } data = {}; 17 + 18 + /* some types are shared with test_core_reloc_type_based.c */ 19 + struct a_struct { 20 + int x; 21 + }; 22 + 23 + union a_union { 24 + int y; 25 + int z; 26 + }; 27 + 28 + enum an_enum { 29 + AN_ENUM_VAL1 = 1, 30 + AN_ENUM_VAL2 = 2, 31 + AN_ENUM_VAL3 = 3, 32 + }; 33 + 34 + typedef struct a_struct named_struct_typedef; 35 + 36 + typedef int (*func_proto_typedef)(long); 37 + 38 + typedef char arr_typedef[20]; 39 + 40 + struct core_reloc_type_id_output { 41 + int local_anon_struct; 42 + int local_anon_union; 43 + int local_anon_enum; 44 + int local_anon_func_proto_ptr; 45 + int local_anon_void_ptr; 46 + int local_anon_arr; 47 + 48 + int local_struct; 49 + int local_union; 50 + int local_enum; 51 + int local_int; 52 + int local_struct_typedef; 53 + int local_func_proto_typedef; 54 + int local_arr_typedef; 55 + 56 + int targ_struct; 57 + int targ_union; 58 + int targ_enum; 59 + int targ_int; 60 + int targ_struct_typedef; 61 + int targ_func_proto_typedef; 62 + int targ_arr_typedef; 63 + }; 64 + 65 + /* preserve types even if Clang doesn't support built-in */ 66 + struct a_struct t1 = {}; 67 + union a_union t2 = {}; 68 + enum an_enum t3 = 0; 69 + named_struct_typedef t4 = {}; 70 + func_proto_typedef t5 = 0; 71 + arr_typedef t6 = {}; 72 + 73 + SEC("raw_tracepoint/sys_enter") 74 + int test_core_type_id(void *ctx) 75 + { 76 + /* We use __builtin_btf_type_id() in this tests, but up until the time 77 + * __builtin_preserve_type_info() was added it contained a bug that 78 + * would make this test fail. 
The bug was fixed ([0]) with addition of 79 + * __builtin_preserve_type_info(), though, so that's what we are using 80 + * to detect whether this test has to be executed, however strange 81 + * that might look like. 82 + * 83 + * [0] https://reviews.llvm.org/D85174 84 + */ 85 + #if __has_builtin(__builtin_preserve_type_info) 86 + struct core_reloc_type_id_output *out = (void *)&data.out; 87 + 88 + out->local_anon_struct = bpf_core_type_id_local(struct { int marker_field; }); 89 + out->local_anon_union = bpf_core_type_id_local(union { int marker_field; }); 90 + out->local_anon_enum = bpf_core_type_id_local(enum { MARKER_ENUM_VAL = 123 }); 91 + out->local_anon_func_proto_ptr = bpf_core_type_id_local(_Bool(*)(int)); 92 + out->local_anon_void_ptr = bpf_core_type_id_local(void *); 93 + out->local_anon_arr = bpf_core_type_id_local(_Bool[47]); 94 + 95 + out->local_struct = bpf_core_type_id_local(struct a_struct); 96 + out->local_union = bpf_core_type_id_local(union a_union); 97 + out->local_enum = bpf_core_type_id_local(enum an_enum); 98 + out->local_int = bpf_core_type_id_local(int); 99 + out->local_struct_typedef = bpf_core_type_id_local(named_struct_typedef); 100 + out->local_func_proto_typedef = bpf_core_type_id_local(func_proto_typedef); 101 + out->local_arr_typedef = bpf_core_type_id_local(arr_typedef); 102 + 103 + out->targ_struct = bpf_core_type_id_kernel(struct a_struct); 104 + out->targ_union = bpf_core_type_id_kernel(union a_union); 105 + out->targ_enum = bpf_core_type_id_kernel(enum an_enum); 106 + out->targ_int = bpf_core_type_id_kernel(int); 107 + out->targ_struct_typedef = bpf_core_type_id_kernel(named_struct_typedef); 108 + out->targ_func_proto_typedef = bpf_core_type_id_kernel(func_proto_typedef); 109 + out->targ_arr_typedef = bpf_core_type_id_kernel(arr_typedef); 110 + #else 111 + data.skip = true; 112 + #endif 113 + 114 + return 0; 115 + }
+58
tools/testing/selftests/bpf/progs/test_d_path.c
···
1 + // SPDX-License-Identifier: GPL-2.0
2 +
3 + #include "vmlinux.h"
4 + #include <bpf/bpf_helpers.h>
5 + #include <bpf/bpf_tracing.h>
6 +
7 + #define MAX_PATH_LEN 128
8 + #define MAX_FILES 7
9 +
10 + pid_t my_pid = 0;
11 + __u32 cnt_stat = 0;
12 + __u32 cnt_close = 0;
13 + char paths_stat[MAX_FILES][MAX_PATH_LEN] = {};
14 + char paths_close[MAX_FILES][MAX_PATH_LEN] = {};
15 + int rets_stat[MAX_FILES] = {};
16 + int rets_close[MAX_FILES] = {};
17 +
18 + SEC("fentry/vfs_getattr")
19 + int BPF_PROG(prog_stat, struct path *path, struct kstat *stat,
20 + 	     __u32 request_mask, unsigned int query_flags)
21 + {
22 + 	pid_t pid = bpf_get_current_pid_tgid() >> 32;
23 + 	__u32 cnt = cnt_stat;
24 + 	int ret;
25 +
26 + 	if (pid != my_pid)
27 + 		return 0;
28 +
29 + 	if (cnt >= MAX_FILES)
30 + 		return 0;
31 + 	ret = bpf_d_path(path, paths_stat[cnt], MAX_PATH_LEN);
32 +
33 + 	rets_stat[cnt] = ret;
34 + 	cnt_stat++;
35 + 	return 0;
36 + }
37 +
38 + SEC("fentry/filp_close")
39 + int BPF_PROG(prog_close, struct file *file, void *id)
40 + {
41 + 	pid_t pid = bpf_get_current_pid_tgid() >> 32;
42 + 	__u32 cnt = cnt_close;
43 + 	int ret;
44 +
45 + 	if (pid != my_pid)
46 + 		return 0;
47 +
48 + 	if (cnt >= MAX_FILES)
49 + 		return 0;
50 + 	ret = bpf_d_path(&file->f_path,
51 + 			 paths_close[cnt], MAX_PATH_LEN);
52 +
53 + 	rets_close[cnt] = ret;
54 + 	cnt_close++;
55 + 	return 0;
56 + }
57 +
58 + char _license[] SEC("license") = "GPL";
+325
tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2020 Facebook */ 3 + 4 + #include <stddef.h> 5 + #include <errno.h> 6 + #include <stdbool.h> 7 + #include <sys/types.h> 8 + #include <sys/socket.h> 9 + #include <linux/ipv6.h> 10 + #include <linux/tcp.h> 11 + #include <linux/socket.h> 12 + #include <linux/bpf.h> 13 + #include <linux/types.h> 14 + #include <bpf/bpf_helpers.h> 15 + #include <bpf/bpf_endian.h> 16 + #define BPF_PROG_TEST_TCP_HDR_OPTIONS 17 + #include "test_tcp_hdr_options.h" 18 + 19 + __u16 last_addr16_n = __bpf_htons(0xeB9F); 20 + __u16 active_lport_n = 0; 21 + __u16 active_lport_h = 0; 22 + __u16 passive_lport_n = 0; 23 + __u16 passive_lport_h = 0; 24 + 25 + /* options received at passive side */ 26 + unsigned int nr_pure_ack = 0; 27 + unsigned int nr_data = 0; 28 + unsigned int nr_syn = 0; 29 + unsigned int nr_fin = 0; 30 + 31 + /* Check the header received from the active side */ 32 + static int __check_active_hdr_in(struct bpf_sock_ops *skops, bool check_syn) 33 + { 34 + union { 35 + struct tcphdr th; 36 + struct ipv6hdr ip6; 37 + struct tcp_exprm_opt exprm_opt; 38 + struct tcp_opt reg_opt; 39 + __u8 data[100]; /* IPv6 (40) + Max TCP hdr (60) */ 40 + } hdr = {}; 41 + __u64 load_flags = check_syn ? 
BPF_LOAD_HDR_OPT_TCP_SYN : 0; 42 + struct tcphdr *pth; 43 + int ret; 44 + 45 + hdr.reg_opt.kind = 0xB9; 46 + 47 + /* The option is 4 bytes long instead of 2 bytes */ 48 + ret = bpf_load_hdr_opt(skops, &hdr.reg_opt, 2, load_flags); 49 + if (ret != -ENOSPC) 50 + RET_CG_ERR(ret); 51 + 52 + /* Test searching magic with regular kind */ 53 + hdr.reg_opt.len = 4; 54 + ret = bpf_load_hdr_opt(skops, &hdr.reg_opt, sizeof(hdr.reg_opt), 55 + load_flags); 56 + if (ret != -EINVAL) 57 + RET_CG_ERR(ret); 58 + 59 + hdr.reg_opt.len = 0; 60 + ret = bpf_load_hdr_opt(skops, &hdr.reg_opt, sizeof(hdr.reg_opt), 61 + load_flags); 62 + if (ret != 4 || hdr.reg_opt.len != 4 || hdr.reg_opt.kind != 0xB9 || 63 + hdr.reg_opt.data[0] != 0xfa || hdr.reg_opt.data[1] != 0xce) 64 + RET_CG_ERR(ret); 65 + 66 + /* Test searching experimental option with invalid kind length */ 67 + hdr.exprm_opt.kind = TCPOPT_EXP; 68 + hdr.exprm_opt.len = 5; 69 + hdr.exprm_opt.magic = 0; 70 + ret = bpf_load_hdr_opt(skops, &hdr.exprm_opt, sizeof(hdr.exprm_opt), 71 + load_flags); 72 + if (ret != -EINVAL) 73 + RET_CG_ERR(ret); 74 + 75 + /* Test searching experimental option with 0 magic value */ 76 + hdr.exprm_opt.len = 4; 77 + ret = bpf_load_hdr_opt(skops, &hdr.exprm_opt, sizeof(hdr.exprm_opt), 78 + load_flags); 79 + if (ret != -ENOMSG) 80 + RET_CG_ERR(ret); 81 + 82 + hdr.exprm_opt.magic = __bpf_htons(0xeB9F); 83 + ret = bpf_load_hdr_opt(skops, &hdr.exprm_opt, sizeof(hdr.exprm_opt), 84 + load_flags); 85 + if (ret != 4 || hdr.exprm_opt.len != 4 || 86 + hdr.exprm_opt.kind != TCPOPT_EXP || 87 + hdr.exprm_opt.magic != __bpf_htons(0xeB9F)) 88 + RET_CG_ERR(ret); 89 + 90 + if (!check_syn) 91 + return CG_OK; 92 + 93 + /* Test loading from skops->syn_skb if sk_state == TCP_NEW_SYN_RECV 94 + * 95 + * Test loading from tp->saved_syn for other sk_state. 
96 + */ 97 + ret = bpf_getsockopt(skops, SOL_TCP, TCP_BPF_SYN_IP, &hdr.ip6, 98 + sizeof(hdr.ip6)); 99 + if (ret != -ENOSPC) 100 + RET_CG_ERR(ret); 101 + 102 + if (hdr.ip6.saddr.s6_addr16[7] != last_addr16_n || 103 + hdr.ip6.daddr.s6_addr16[7] != last_addr16_n) 104 + RET_CG_ERR(0); 105 + 106 + ret = bpf_getsockopt(skops, SOL_TCP, TCP_BPF_SYN_IP, &hdr, sizeof(hdr)); 107 + if (ret < 0) 108 + RET_CG_ERR(ret); 109 + 110 + pth = (struct tcphdr *)(&hdr.ip6 + 1); 111 + if (pth->dest != passive_lport_n || pth->source != active_lport_n) 112 + RET_CG_ERR(0); 113 + 114 + ret = bpf_getsockopt(skops, SOL_TCP, TCP_BPF_SYN, &hdr, sizeof(hdr)); 115 + if (ret < 0) 116 + RET_CG_ERR(ret); 117 + 118 + if (hdr.th.dest != passive_lport_n || hdr.th.source != active_lport_n) 119 + RET_CG_ERR(0); 120 + 121 + return CG_OK; 122 + } 123 + 124 + static int check_active_syn_in(struct bpf_sock_ops *skops) 125 + { 126 + return __check_active_hdr_in(skops, true); 127 + } 128 + 129 + static int check_active_hdr_in(struct bpf_sock_ops *skops) 130 + { 131 + struct tcphdr *th; 132 + 133 + if (__check_active_hdr_in(skops, false) == CG_ERR) 134 + return CG_ERR; 135 + 136 + th = skops->skb_data; 137 + if (th + 1 > skops->skb_data_end) 138 + RET_CG_ERR(0); 139 + 140 + if (tcp_hdrlen(th) < skops->skb_len) 141 + nr_data++; 142 + 143 + if (th->fin) 144 + nr_fin++; 145 + 146 + if (th->ack && !th->fin && tcp_hdrlen(th) == skops->skb_len) 147 + nr_pure_ack++; 148 + 149 + return CG_OK; 150 + } 151 + 152 + static int active_opt_len(struct bpf_sock_ops *skops) 153 + { 154 + int err; 155 + 156 + /* Reserve more than enough to allow the -EEXIST test in 157 + * the write_active_opt(). 
158 + */ 159 + err = bpf_reserve_hdr_opt(skops, 12, 0); 160 + if (err) 161 + RET_CG_ERR(err); 162 + 163 + return CG_OK; 164 + } 165 + 166 + static int write_active_opt(struct bpf_sock_ops *skops) 167 + { 168 + struct tcp_exprm_opt exprm_opt = {}; 169 + struct tcp_opt win_scale_opt = {}; 170 + struct tcp_opt reg_opt = {}; 171 + struct tcphdr *th; 172 + int err, ret; 173 + 174 + exprm_opt.kind = TCPOPT_EXP; 175 + exprm_opt.len = 4; 176 + exprm_opt.magic = __bpf_htons(0xeB9F); 177 + 178 + reg_opt.kind = 0xB9; 179 + reg_opt.len = 4; 180 + reg_opt.data[0] = 0xfa; 181 + reg_opt.data[1] = 0xce; 182 + 183 + win_scale_opt.kind = TCPOPT_WINDOW; 184 + 185 + err = bpf_store_hdr_opt(skops, &exprm_opt, sizeof(exprm_opt), 0); 186 + if (err) 187 + RET_CG_ERR(err); 188 + 189 + /* Store the same exprm option */ 190 + err = bpf_store_hdr_opt(skops, &exprm_opt, sizeof(exprm_opt), 0); 191 + if (err != -EEXIST) 192 + RET_CG_ERR(err); 193 + 194 + err = bpf_store_hdr_opt(skops, &reg_opt, sizeof(reg_opt), 0); 195 + if (err) 196 + RET_CG_ERR(err); 197 + err = bpf_store_hdr_opt(skops, &reg_opt, sizeof(reg_opt), 0); 198 + if (err != -EEXIST) 199 + RET_CG_ERR(err); 200 + 201 + /* Check the option has been written and can be searched */ 202 + ret = bpf_load_hdr_opt(skops, &exprm_opt, sizeof(exprm_opt), 0); 203 + if (ret != 4 || exprm_opt.len != 4 || exprm_opt.kind != TCPOPT_EXP || 204 + exprm_opt.magic != __bpf_htons(0xeB9F)) 205 + RET_CG_ERR(ret); 206 + 207 + reg_opt.len = 0; 208 + ret = bpf_load_hdr_opt(skops, &reg_opt, sizeof(reg_opt), 0); 209 + if (ret != 4 || reg_opt.len != 4 || reg_opt.kind != 0xB9 || 210 + reg_opt.data[0] != 0xfa || reg_opt.data[1] != 0xce) 211 + RET_CG_ERR(ret); 212 + 213 + th = skops->skb_data; 214 + if (th + 1 > skops->skb_data_end) 215 + RET_CG_ERR(0); 216 + 217 + if (th->syn) { 218 + active_lport_h = skops->local_port; 219 + active_lport_n = th->source; 220 + 221 + /* Search the win scale option written by kernel 222 + * in the SYN packet. 
223 + */ 224 + ret = bpf_load_hdr_opt(skops, &win_scale_opt, 225 + sizeof(win_scale_opt), 0); 226 + if (ret != 3 || win_scale_opt.len != 3 || 227 + win_scale_opt.kind != TCPOPT_WINDOW) 228 + RET_CG_ERR(ret); 229 + 230 + /* Write the win scale option that kernel 231 + * has already written. 232 + */ 233 + err = bpf_store_hdr_opt(skops, &win_scale_opt, 234 + sizeof(win_scale_opt), 0); 235 + if (err != -EEXIST) 236 + RET_CG_ERR(err); 237 + } 238 + 239 + return CG_OK; 240 + } 241 + 242 + static int handle_hdr_opt_len(struct bpf_sock_ops *skops) 243 + { 244 + __u8 tcp_flags = skops_tcp_flags(skops); 245 + 246 + if ((tcp_flags & TCPHDR_SYNACK) == TCPHDR_SYNACK) 247 + /* Check the SYN from bpf_sock_ops_kern->syn_skb */ 248 + return check_active_syn_in(skops); 249 + 250 + /* Passive side should have cleared the write hdr cb by now */ 251 + if (skops->local_port == passive_lport_h) 252 + RET_CG_ERR(0); 253 + 254 + return active_opt_len(skops); 255 + } 256 + 257 + static int handle_write_hdr_opt(struct bpf_sock_ops *skops) 258 + { 259 + if (skops->local_port == passive_lport_h) 260 + RET_CG_ERR(0); 261 + 262 + return write_active_opt(skops); 263 + } 264 + 265 + static int handle_parse_hdr(struct bpf_sock_ops *skops) 266 + { 267 + /* Passive side is not writing any non-standard/unknown 268 + * option, so the active side should never be called. 
269 + */ 270 + if (skops->local_port == active_lport_h) 271 + RET_CG_ERR(0); 272 + 273 + return check_active_hdr_in(skops); 274 + } 275 + 276 + static int handle_passive_estab(struct bpf_sock_ops *skops) 277 + { 278 + int err; 279 + 280 + /* No more write hdr cb */ 281 + bpf_sock_ops_cb_flags_set(skops, 282 + skops->bpf_sock_ops_cb_flags & 283 + ~BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG); 284 + 285 + /* Recheck the SYN but check the tp->saved_syn this time */ 286 + err = check_active_syn_in(skops); 287 + if (err == CG_ERR) 288 + return err; 289 + 290 + nr_syn++; 291 + 292 + /* The ack has header option written by the active side also */ 293 + return check_active_hdr_in(skops); 294 + } 295 + 296 + SEC("sockops/misc_estab") 297 + int misc_estab(struct bpf_sock_ops *skops) 298 + { 299 + int true_val = 1; 300 + 301 + switch (skops->op) { 302 + case BPF_SOCK_OPS_TCP_LISTEN_CB: 303 + passive_lport_h = skops->local_port; 304 + passive_lport_n = __bpf_htons(passive_lport_h); 305 + bpf_setsockopt(skops, SOL_TCP, TCP_SAVE_SYN, 306 + &true_val, sizeof(true_val)); 307 + set_hdr_cb_flags(skops); 308 + break; 309 + case BPF_SOCK_OPS_TCP_CONNECT_CB: 310 + set_hdr_cb_flags(skops); 311 + break; 312 + case BPF_SOCK_OPS_PARSE_HDR_OPT_CB: 313 + return handle_parse_hdr(skops); 314 + case BPF_SOCK_OPS_HDR_OPT_LEN_CB: 315 + return handle_hdr_opt_len(skops); 316 + case BPF_SOCK_OPS_WRITE_HDR_OPT_CB: 317 + return handle_write_hdr_opt(skops); 318 + case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: 319 + return handle_passive_estab(skops); 320 + } 321 + 322 + return CG_OK; 323 + } 324 + 325 + char _license[] SEC("license") = "GPL";
+20
tools/testing/selftests/bpf/progs/test_pkt_access.c
···
79  79 	return skb->ifindex * val * var;
80  80 }
81  81
82    + __attribute__ ((noinline))
83    + int test_pkt_write_access_subprog(struct __sk_buff *skb, __u32 off)
84    + {
85    + 	void *data = (void *)(long)skb->data;
86    + 	void *data_end = (void *)(long)skb->data_end;
87    + 	struct tcphdr *tcp = NULL;
88    +
89    + 	if (off > sizeof(struct ethhdr) + sizeof(struct ipv6hdr))
90    + 		return -1;
91    +
92    + 	tcp = data + off;
93    + 	if (tcp + 1 > data_end)
94    + 		return -1;
95    + 	/* make modification to the packet data */
96    + 	tcp->check++;
97    + 	return 0;
98    + }
99    +
82 100 SEC("classifier/test_pkt_access")
83 101 int test_pkt_access(struct __sk_buff *skb)
84 102 {
···
135 117 	if (test_pkt_access_subprog3(3, skb) != skb->len * 3 * skb->ifindex)
136 118 		return TC_ACT_SHOT;
137 119 	if (tcp) {
    120 + 		if (test_pkt_write_access_subprog(skb, (void *)tcp - data))
    121 + 			return TC_ACT_SHOT;
138 122 		if (((void *)(tcp) + 20) > data_end || proto != 6)
139 123 			return TC_ACT_SHOT;
140 124 		barrier(); /* to force ordering of checks */
+23
tools/testing/selftests/bpf/progs/test_sockmap_invalid_update.c
···
1 + // SPDX-License-Identifier: GPL-2.0
2 + // Copyright (c) 2020 Cloudflare
3 + #include "vmlinux.h"
4 + #include <bpf/bpf_helpers.h>
5 +
6 + struct {
7 + 	__uint(type, BPF_MAP_TYPE_SOCKMAP);
8 + 	__uint(max_entries, 1);
9 + 	__type(key, __u32);
10 + 	__type(value, __u64);
11 + } map SEC(".maps");
12 +
13 + SEC("sockops")
14 + int bpf_sockmap(struct bpf_sock_ops *skops)
15 + {
16 + 	__u32 key = 0;
17 +
18 + 	if (skops->sk)
19 + 		bpf_map_update_elem(&map, &key, skops->sk, 0);
20 + 	return 0;
21 + }
22 +
23 + char _license[] SEC("license") = "GPL";
+48
tools/testing/selftests/bpf/progs/test_sockmap_update.c
···
1 + // SPDX-License-Identifier: GPL-2.0
2 + // Copyright (c) 2020 Cloudflare
3 + #include "vmlinux.h"
4 + #include <bpf/bpf_helpers.h>
5 +
6 + struct {
7 + 	__uint(type, BPF_MAP_TYPE_SOCKMAP);
8 + 	__uint(max_entries, 1);
9 + 	__type(key, __u32);
10 + 	__type(value, __u64);
11 + } src SEC(".maps");
12 +
13 + struct {
14 + 	__uint(type, BPF_MAP_TYPE_SOCKMAP);
15 + 	__uint(max_entries, 1);
16 + 	__type(key, __u32);
17 + 	__type(value, __u64);
18 + } dst_sock_map SEC(".maps");
19 +
20 + struct {
21 + 	__uint(type, BPF_MAP_TYPE_SOCKHASH);
22 + 	__uint(max_entries, 1);
23 + 	__type(key, __u32);
24 + 	__type(value, __u64);
25 + } dst_sock_hash SEC(".maps");
26 +
27 + SEC("classifier/copy_sock_map")
28 + int copy_sock_map(void *ctx)
29 + {
30 + 	struct bpf_sock *sk;
31 + 	bool failed = false;
32 + 	__u32 key = 0;
33 +
34 + 	sk = bpf_map_lookup_elem(&src, &key);
35 + 	if (!sk)
36 + 		return SK_DROP;
37 +
38 + 	if (bpf_map_update_elem(&dst_sock_map, &key, sk, 0))
39 + 		failed = true;
40 +
41 + 	if (bpf_map_update_elem(&dst_sock_hash, &key, sk, 0))
42 + 		failed = true;
43 +
44 + 	bpf_sk_release(sk);
45 + 	return failed ? SK_DROP : SK_PASS;
46 + }
47 +
48 + char _license[] SEC("license") = "GPL";
+623
tools/testing/selftests/bpf/progs/test_tcp_hdr_options.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2020 Facebook */ 3 + 4 + #include <stddef.h> 5 + #include <errno.h> 6 + #include <stdbool.h> 7 + #include <sys/types.h> 8 + #include <sys/socket.h> 9 + #include <linux/tcp.h> 10 + #include <linux/socket.h> 11 + #include <linux/bpf.h> 12 + #include <linux/types.h> 13 + #include <bpf/bpf_helpers.h> 14 + #include <bpf/bpf_endian.h> 15 + #define BPF_PROG_TEST_TCP_HDR_OPTIONS 16 + #include "test_tcp_hdr_options.h" 17 + 18 + #ifndef sizeof_field 19 + #define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER)) 20 + #endif 21 + 22 + __u8 test_kind = TCPOPT_EXP; 23 + __u16 test_magic = 0xeB9F; 24 + 25 + struct bpf_test_option passive_synack_out = {}; 26 + struct bpf_test_option passive_fin_out = {}; 27 + 28 + struct bpf_test_option passive_estab_in = {}; 29 + struct bpf_test_option passive_fin_in = {}; 30 + 31 + struct bpf_test_option active_syn_out = {}; 32 + struct bpf_test_option active_fin_out = {}; 33 + 34 + struct bpf_test_option active_estab_in = {}; 35 + struct bpf_test_option active_fin_in = {}; 36 + 37 + struct { 38 + __uint(type, BPF_MAP_TYPE_SK_STORAGE); 39 + __uint(map_flags, BPF_F_NO_PREALLOC); 40 + __type(key, int); 41 + __type(value, struct hdr_stg); 42 + } hdr_stg_map SEC(".maps"); 43 + 44 + static bool skops_want_cookie(const struct bpf_sock_ops *skops) 45 + { 46 + return skops->args[0] == BPF_WRITE_HDR_TCP_SYNACK_COOKIE; 47 + } 48 + 49 + static bool skops_current_mss(const struct bpf_sock_ops *skops) 50 + { 51 + return skops->args[0] == BPF_WRITE_HDR_TCP_CURRENT_MSS; 52 + } 53 + 54 + static __u8 option_total_len(__u8 flags) 55 + { 56 + __u8 i, len = 1; /* +1 for flags */ 57 + 58 + if (!flags) 59 + return 0; 60 + 61 + /* RESEND bit does not use a byte */ 62 + for (i = OPTION_RESEND + 1; i < __NR_OPTION_FLAGS; i++) 63 + len += !!TEST_OPTION_FLAGS(flags, i); 64 + 65 + if (test_kind == TCPOPT_EXP) 66 + return len + TCP_BPF_EXPOPT_BASE_LEN; 67 + else 68 + return len + 2; /* +1 kind, +1 
kind-len */ 69 + } 70 + 71 + static void write_test_option(const struct bpf_test_option *test_opt, 72 + __u8 *data) 73 + { 74 + __u8 offset = 0; 75 + 76 + data[offset++] = test_opt->flags; 77 + if (TEST_OPTION_FLAGS(test_opt->flags, OPTION_MAX_DELACK_MS)) 78 + data[offset++] = test_opt->max_delack_ms; 79 + 80 + if (TEST_OPTION_FLAGS(test_opt->flags, OPTION_RAND)) 81 + data[offset++] = test_opt->rand; 82 + } 83 + 84 + static int store_option(struct bpf_sock_ops *skops, 85 + const struct bpf_test_option *test_opt) 86 + { 87 + union { 88 + struct tcp_exprm_opt exprm; 89 + struct tcp_opt regular; 90 + } write_opt; 91 + int err; 92 + 93 + if (test_kind == TCPOPT_EXP) { 94 + write_opt.exprm.kind = TCPOPT_EXP; 95 + write_opt.exprm.len = option_total_len(test_opt->flags); 96 + write_opt.exprm.magic = __bpf_htons(test_magic); 97 + write_opt.exprm.data32 = 0; 98 + write_test_option(test_opt, write_opt.exprm.data); 99 + err = bpf_store_hdr_opt(skops, &write_opt.exprm, 100 + sizeof(write_opt.exprm), 0); 101 + } else { 102 + write_opt.regular.kind = test_kind; 103 + write_opt.regular.len = option_total_len(test_opt->flags); 104 + write_opt.regular.data32 = 0; 105 + write_test_option(test_opt, write_opt.regular.data); 106 + err = bpf_store_hdr_opt(skops, &write_opt.regular, 107 + sizeof(write_opt.regular), 0); 108 + } 109 + 110 + if (err) 111 + RET_CG_ERR(err); 112 + 113 + return CG_OK; 114 + } 115 + 116 + static int parse_test_option(struct bpf_test_option *opt, const __u8 *start) 117 + { 118 + opt->flags = *start++; 119 + 120 + if (TEST_OPTION_FLAGS(opt->flags, OPTION_MAX_DELACK_MS)) 121 + opt->max_delack_ms = *start++; 122 + 123 + if (TEST_OPTION_FLAGS(opt->flags, OPTION_RAND)) 124 + opt->rand = *start++; 125 + 126 + return 0; 127 + } 128 + 129 + static int load_option(struct bpf_sock_ops *skops, 130 + struct bpf_test_option *test_opt, bool from_syn) 131 + { 132 + union { 133 + struct tcp_exprm_opt exprm; 134 + struct tcp_opt regular; 135 + } search_opt; 136 + int ret, 
load_flags = from_syn ? BPF_LOAD_HDR_OPT_TCP_SYN : 0; 137 + 138 + if (test_kind == TCPOPT_EXP) { 139 + search_opt.exprm.kind = TCPOPT_EXP; 140 + search_opt.exprm.len = 4; 141 + search_opt.exprm.magic = __bpf_htons(test_magic); 142 + search_opt.exprm.data32 = 0; 143 + ret = bpf_load_hdr_opt(skops, &search_opt.exprm, 144 + sizeof(search_opt.exprm), load_flags); 145 + if (ret < 0) 146 + return ret; 147 + return parse_test_option(test_opt, search_opt.exprm.data); 148 + } else { 149 + search_opt.regular.kind = test_kind; 150 + search_opt.regular.len = 0; 151 + search_opt.regular.data32 = 0; 152 + ret = bpf_load_hdr_opt(skops, &search_opt.regular, 153 + sizeof(search_opt.regular), load_flags); 154 + if (ret < 0) 155 + return ret; 156 + return parse_test_option(test_opt, search_opt.regular.data); 157 + } 158 + } 159 + 160 + static int synack_opt_len(struct bpf_sock_ops *skops) 161 + { 162 + struct bpf_test_option test_opt = {}; 163 + __u8 optlen; 164 + int err; 165 + 166 + if (!passive_synack_out.flags) 167 + return CG_OK; 168 + 169 + err = load_option(skops, &test_opt, true); 170 + 171 + /* bpf_test_option is not found */ 172 + if (err == -ENOMSG) 173 + return CG_OK; 174 + 175 + if (err) 176 + RET_CG_ERR(err); 177 + 178 + optlen = option_total_len(passive_synack_out.flags); 179 + if (optlen) { 180 + err = bpf_reserve_hdr_opt(skops, optlen, 0); 181 + if (err) 182 + RET_CG_ERR(err); 183 + } 184 + 185 + return CG_OK; 186 + } 187 + 188 + static int write_synack_opt(struct bpf_sock_ops *skops) 189 + { 190 + struct bpf_test_option opt; 191 + 192 + if (!passive_synack_out.flags) 193 + /* We should not even be called since no header 194 + * space has been reserved. 
195 + */ 196 + RET_CG_ERR(0); 197 + 198 + opt = passive_synack_out; 199 + if (skops_want_cookie(skops)) 200 + SET_OPTION_FLAGS(opt.flags, OPTION_RESEND); 201 + 202 + return store_option(skops, &opt); 203 + } 204 + 205 + static int syn_opt_len(struct bpf_sock_ops *skops) 206 + { 207 + __u8 optlen; 208 + int err; 209 + 210 + if (!active_syn_out.flags) 211 + return CG_OK; 212 + 213 + optlen = option_total_len(active_syn_out.flags); 214 + if (optlen) { 215 + err = bpf_reserve_hdr_opt(skops, optlen, 0); 216 + if (err) 217 + RET_CG_ERR(err); 218 + } 219 + 220 + return CG_OK; 221 + } 222 + 223 + static int write_syn_opt(struct bpf_sock_ops *skops) 224 + { 225 + if (!active_syn_out.flags) 226 + RET_CG_ERR(0); 227 + 228 + return store_option(skops, &active_syn_out); 229 + } 230 + 231 + static int fin_opt_len(struct bpf_sock_ops *skops) 232 + { 233 + struct bpf_test_option *opt; 234 + struct hdr_stg *hdr_stg; 235 + __u8 optlen; 236 + int err; 237 + 238 + if (!skops->sk) 239 + RET_CG_ERR(0); 240 + 241 + hdr_stg = bpf_sk_storage_get(&hdr_stg_map, skops->sk, NULL, 0); 242 + if (!hdr_stg) 243 + RET_CG_ERR(0); 244 + 245 + if (hdr_stg->active) 246 + opt = &active_fin_out; 247 + else 248 + opt = &passive_fin_out; 249 + 250 + optlen = option_total_len(opt->flags); 251 + if (optlen) { 252 + err = bpf_reserve_hdr_opt(skops, optlen, 0); 253 + if (err) 254 + RET_CG_ERR(err); 255 + } 256 + 257 + return CG_OK; 258 + } 259 + 260 + static int write_fin_opt(struct bpf_sock_ops *skops) 261 + { 262 + struct bpf_test_option *opt; 263 + struct hdr_stg *hdr_stg; 264 + 265 + if (!skops->sk) 266 + RET_CG_ERR(0); 267 + 268 + hdr_stg = bpf_sk_storage_get(&hdr_stg_map, skops->sk, NULL, 0); 269 + if (!hdr_stg) 270 + RET_CG_ERR(0); 271 + 272 + if (hdr_stg->active) 273 + opt = &active_fin_out; 274 + else 275 + opt = &passive_fin_out; 276 + 277 + if (!opt->flags) 278 + RET_CG_ERR(0); 279 + 280 + return store_option(skops, opt); 281 + } 282 + 283 + static int resend_in_ack(struct bpf_sock_ops *skops) 284 + 
{ 285 + struct hdr_stg *hdr_stg; 286 + 287 + if (!skops->sk) 288 + return -1; 289 + 290 + hdr_stg = bpf_sk_storage_get(&hdr_stg_map, skops->sk, NULL, 0); 291 + if (!hdr_stg) 292 + return -1; 293 + 294 + return !!hdr_stg->resend_syn; 295 + } 296 + 297 + static int nodata_opt_len(struct bpf_sock_ops *skops) 298 + { 299 + int resend; 300 + 301 + resend = resend_in_ack(skops); 302 + if (resend < 0) 303 + RET_CG_ERR(0); 304 + 305 + if (resend) 306 + return syn_opt_len(skops); 307 + 308 + return CG_OK; 309 + } 310 + 311 + static int write_nodata_opt(struct bpf_sock_ops *skops) 312 + { 313 + int resend; 314 + 315 + resend = resend_in_ack(skops); 316 + if (resend < 0) 317 + RET_CG_ERR(0); 318 + 319 + if (resend) 320 + return write_syn_opt(skops); 321 + 322 + return CG_OK; 323 + } 324 + 325 + static int data_opt_len(struct bpf_sock_ops *skops) 326 + { 327 + /* Same as the nodata version. Mostly to show 328 + * an example usage on skops->skb_len. 329 + */ 330 + return nodata_opt_len(skops); 331 + } 332 + 333 + static int write_data_opt(struct bpf_sock_ops *skops) 334 + { 335 + return write_nodata_opt(skops); 336 + } 337 + 338 + static int current_mss_opt_len(struct bpf_sock_ops *skops) 339 + { 340 + /* Reserve maximum that may be needed */ 341 + int err; 342 + 343 + err = bpf_reserve_hdr_opt(skops, option_total_len(OPTION_MASK), 0); 344 + if (err) 345 + RET_CG_ERR(err); 346 + 347 + return CG_OK; 348 + } 349 + 350 + static int handle_hdr_opt_len(struct bpf_sock_ops *skops) 351 + { 352 + __u8 tcp_flags = skops_tcp_flags(skops); 353 + 354 + if ((tcp_flags & TCPHDR_SYNACK) == TCPHDR_SYNACK) 355 + return synack_opt_len(skops); 356 + 357 + if (tcp_flags & TCPHDR_SYN) 358 + return syn_opt_len(skops); 359 + 360 + if (tcp_flags & TCPHDR_FIN) 361 + return fin_opt_len(skops); 362 + 363 + if (skops_current_mss(skops)) 364 + /* The kernel is calculating the MSS */ 365 + return current_mss_opt_len(skops); 366 + 367 + if (skops->skb_len) 368 + return data_opt_len(skops); 369 + 370 + return 
nodata_opt_len(skops); 371 + } 372 + 373 + static int handle_write_hdr_opt(struct bpf_sock_ops *skops) 374 + { 375 + __u8 tcp_flags = skops_tcp_flags(skops); 376 + struct tcphdr *th; 377 + 378 + if ((tcp_flags & TCPHDR_SYNACK) == TCPHDR_SYNACK) 379 + return write_synack_opt(skops); 380 + 381 + if (tcp_flags & TCPHDR_SYN) 382 + return write_syn_opt(skops); 383 + 384 + if (tcp_flags & TCPHDR_FIN) 385 + return write_fin_opt(skops); 386 + 387 + th = skops->skb_data; 388 + if (th + 1 > skops->skb_data_end) 389 + RET_CG_ERR(0); 390 + 391 + if (skops->skb_len > tcp_hdrlen(th)) 392 + return write_data_opt(skops); 393 + 394 + return write_nodata_opt(skops); 395 + } 396 + 397 + static int set_delack_max(struct bpf_sock_ops *skops, __u8 max_delack_ms) 398 + { 399 + __u32 max_delack_us = max_delack_ms * 1000; 400 + 401 + return bpf_setsockopt(skops, SOL_TCP, TCP_BPF_DELACK_MAX, 402 + &max_delack_us, sizeof(max_delack_us)); 403 + } 404 + 405 + static int set_rto_min(struct bpf_sock_ops *skops, __u8 peer_max_delack_ms) 406 + { 407 + __u32 min_rto_us = peer_max_delack_ms * 1000; 408 + 409 + return bpf_setsockopt(skops, SOL_TCP, TCP_BPF_RTO_MIN, &min_rto_us, 410 + sizeof(min_rto_us)); 411 + } 412 + 413 + static int handle_active_estab(struct bpf_sock_ops *skops) 414 + { 415 + struct hdr_stg init_stg = { 416 + .active = true, 417 + }; 418 + int err; 419 + 420 + err = load_option(skops, &active_estab_in, false); 421 + if (err && err != -ENOMSG) 422 + RET_CG_ERR(err); 423 + 424 + init_stg.resend_syn = TEST_OPTION_FLAGS(active_estab_in.flags, 425 + OPTION_RESEND); 426 + if (!skops->sk || !bpf_sk_storage_get(&hdr_stg_map, skops->sk, 427 + &init_stg, 428 + BPF_SK_STORAGE_GET_F_CREATE)) 429 + RET_CG_ERR(0); 430 + 431 + if (init_stg.resend_syn) 432 + /* Don't clear the write_hdr cb now because 433 + * the ACK may get lost and retransmit may 434 + * be needed. 435 + * 436 + * PARSE_ALL_HDR cb flag is set to learn if this 437 + * resend_syn option has received by the peer. 
438 + * 439 + * The header option will be resent until a valid 440 + * packet is received at handle_parse_hdr() 441 + * and all hdr cb flags will be cleared in 442 + * handle_parse_hdr(). 443 + */ 444 + set_parse_all_hdr_cb_flags(skops); 445 + else if (!active_fin_out.flags) 446 + /* No options will be written from now */ 447 + clear_hdr_cb_flags(skops); 448 + 449 + if (active_syn_out.max_delack_ms) { 450 + err = set_delack_max(skops, active_syn_out.max_delack_ms); 451 + if (err) 452 + RET_CG_ERR(err); 453 + } 454 + 455 + if (active_estab_in.max_delack_ms) { 456 + err = set_rto_min(skops, active_estab_in.max_delack_ms); 457 + if (err) 458 + RET_CG_ERR(err); 459 + } 460 + 461 + return CG_OK; 462 + } 463 + 464 + static int handle_passive_estab(struct bpf_sock_ops *skops) 465 + { 466 + struct hdr_stg init_stg = {}; 467 + struct tcphdr *th; 468 + int err; 469 + 470 + err = load_option(skops, &passive_estab_in, true); 471 + if (err == -ENOENT) { 472 + /* saved_syn is not found. It was in syncookie mode. 473 + * We have asked the active side to resend the options 474 + * in ACK, so try to find the bpf_test_option from ACK now. 475 + */ 476 + err = load_option(skops, &passive_estab_in, false); 477 + init_stg.syncookie = true; 478 + } 479 + 480 + /* ENOMSG: The bpf_test_option is not found which is fine. 481 + * Bail out now for all other errors. 482 + */ 483 + if (err && err != -ENOMSG) 484 + RET_CG_ERR(err); 485 + 486 + th = skops->skb_data; 487 + if (th + 1 > skops->skb_data_end) 488 + RET_CG_ERR(0); 489 + 490 + if (th->syn) { 491 + /* Fastopen */ 492 + 493 + /* Cannot clear cb_flags to stop write_hdr cb. 494 + * synack is not sent yet for fast open. 495 + * Even it was, the synack may need to be retransmitted. 496 + * 497 + * PARSE_ALL_HDR cb flag is set to learn 498 + * if synack has reached the peer. 499 + * All cb_flags will be cleared in handle_parse_hdr(). 
500 + */ 501 + set_parse_all_hdr_cb_flags(skops); 502 + init_stg.fastopen = true; 503 + } else if (!passive_fin_out.flags) { 504 + /* No options will be written from now */ 505 + clear_hdr_cb_flags(skops); 506 + } 507 + 508 + if (!skops->sk || 509 + !bpf_sk_storage_get(&hdr_stg_map, skops->sk, &init_stg, 510 + BPF_SK_STORAGE_GET_F_CREATE)) 511 + RET_CG_ERR(0); 512 + 513 + if (passive_synack_out.max_delack_ms) { 514 + err = set_delack_max(skops, passive_synack_out.max_delack_ms); 515 + if (err) 516 + RET_CG_ERR(err); 517 + } 518 + 519 + if (passive_estab_in.max_delack_ms) { 520 + err = set_rto_min(skops, passive_estab_in.max_delack_ms); 521 + if (err) 522 + RET_CG_ERR(err); 523 + } 524 + 525 + return CG_OK; 526 + } 527 + 528 + static int handle_parse_hdr(struct bpf_sock_ops *skops) 529 + { 530 + struct hdr_stg *hdr_stg; 531 + struct tcphdr *th; 532 + 533 + if (!skops->sk) 534 + RET_CG_ERR(0); 535 + 536 + th = skops->skb_data; 537 + if (th + 1 > skops->skb_data_end) 538 + RET_CG_ERR(0); 539 + 540 + hdr_stg = bpf_sk_storage_get(&hdr_stg_map, skops->sk, NULL, 0); 541 + if (!hdr_stg) 542 + RET_CG_ERR(0); 543 + 544 + if (hdr_stg->resend_syn || hdr_stg->fastopen) 545 + /* The PARSE_ALL_HDR cb flag was turned on 546 + * to ensure that the previously written 547 + * options have reached the peer. 548 + * Those previously written option includes: 549 + * - Active side: resend_syn in ACK during syncookie 550 + * or 551 + * - Passive side: SYNACK during fastopen 552 + * 553 + * A valid packet has been received here after 554 + * the 3WHS, so the PARSE_ALL_HDR cb flag 555 + * can be cleared now. 556 + */ 557 + clear_parse_all_hdr_cb_flags(skops); 558 + 559 + if (hdr_stg->resend_syn && !active_fin_out.flags) 560 + /* Active side resent the syn option in ACK 561 + * because the server was in syncookie mode. 562 + * A valid packet has been received, so 563 + * clear header cb flags if there is no 564 + * more option to send. 
565 + */ 566 + clear_hdr_cb_flags(skops); 567 + 568 + if (hdr_stg->fastopen && !passive_fin_out.flags) 569 + /* Passive side was in fastopen. 570 + * A valid packet has been received, so 571 + * the SYNACK has reached the peer. 572 + * Clear header cb flags if there is no more 573 + * option to send. 574 + */ 575 + clear_hdr_cb_flags(skops); 576 + 577 + if (th->fin) { 578 + struct bpf_test_option *fin_opt; 579 + int err; 580 + 581 + if (hdr_stg->active) 582 + fin_opt = &active_fin_in; 583 + else 584 + fin_opt = &passive_fin_in; 585 + 586 + err = load_option(skops, fin_opt, false); 587 + if (err && err != -ENOMSG) 588 + RET_CG_ERR(err); 589 + } 590 + 591 + return CG_OK; 592 + } 593 + 594 + SEC("sockops/estab") 595 + int estab(struct bpf_sock_ops *skops) 596 + { 597 + int true_val = 1; 598 + 599 + switch (skops->op) { 600 + case BPF_SOCK_OPS_TCP_LISTEN_CB: 601 + bpf_setsockopt(skops, SOL_TCP, TCP_SAVE_SYN, 602 + &true_val, sizeof(true_val)); 603 + set_hdr_cb_flags(skops); 604 + break; 605 + case BPF_SOCK_OPS_TCP_CONNECT_CB: 606 + set_hdr_cb_flags(skops); 607 + break; 608 + case BPF_SOCK_OPS_PARSE_HDR_OPT_CB: 609 + return handle_parse_hdr(skops); 610 + case BPF_SOCK_OPS_HDR_OPT_LEN_CB: 611 + return handle_hdr_opt_len(skops); 612 + case BPF_SOCK_OPS_WRITE_HDR_OPT_CB: 613 + return handle_write_hdr_opt(skops); 614 + case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: 615 + return handle_passive_estab(skops); 616 + case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: 617 + return handle_active_estab(skops); 618 + } 619 + 620 + return CG_OK; 621 + } 622 + 623 + char _license[] SEC("license") = "GPL";
+9 -3
tools/testing/selftests/bpf/progs/test_vmlinux.c
···
 int handle__tp(struct trace_event_raw_sys_enter *args)
 {
 	struct __kernel_timespec *ts;
+	long tv_nsec;
 
 	if (args->id != __NR_nanosleep)
 		return 0;
 
 	ts = (void *)args->args[0];
-	if (BPF_CORE_READ(ts, tv_nsec) != MY_TV_NSEC)
+	if (bpf_probe_read_user(&tv_nsec, sizeof(ts->tv_nsec), &ts->tv_nsec) ||
+	    tv_nsec != MY_TV_NSEC)
 		return 0;
 
 	tp_called = true;
···
 int BPF_PROG(handle__raw_tp, struct pt_regs *regs, long id)
 {
 	struct __kernel_timespec *ts;
+	long tv_nsec;
 
 	if (id != __NR_nanosleep)
 		return 0;
 
 	ts = (void *)PT_REGS_PARM1_CORE(regs);
-	if (BPF_CORE_READ(ts, tv_nsec) != MY_TV_NSEC)
+	if (bpf_probe_read_user(&tv_nsec, sizeof(ts->tv_nsec), &ts->tv_nsec) ||
+	    tv_nsec != MY_TV_NSEC)
 		return 0;
 
 	raw_tp_called = true;
···
 int BPF_PROG(handle__tp_btf, struct pt_regs *regs, long id)
 {
 	struct __kernel_timespec *ts;
+	long tv_nsec;
 
 	if (id != __NR_nanosleep)
 		return 0;
 
 	ts = (void *)PT_REGS_PARM1_CORE(regs);
-	if (BPF_CORE_READ(ts, tv_nsec) != MY_TV_NSEC)
+	if (bpf_probe_read_user(&tv_nsec, sizeof(ts->tv_nsec), &ts->tv_nsec) ||
+	    tv_nsec != MY_TV_NSEC)
 		return 0;
 
 	tp_btf_called = true;
+7
tools/testing/selftests/bpf/progs/trigger_bench.c
···
 	return 0;
 }
 
+SEC("fentry.s/__x64_sys_getpgid")
+int bench_trigger_fentry_sleep(void *ctx)
+{
+	__sync_add_and_fetch(&hits, 1);
+	return 0;
+}
+
 SEC("fmod_ret/__x64_sys_getpgid")
 int bench_trigger_fmodret(void *ctx)
 {
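The sleepable-fentry benchmark above counts invocations with the `__sync_add_and_fetch()` builtin. As a minimal userspace sketch (not part of the patch; the `trigger()` wrapper is hypothetical), the builtin performs an atomic read-modify-write and returns the value after the addition, so concurrent triggers never lose an increment:

```c
/* Minimal userspace analogue of the trigger_bench hit counter.
 * __sync_add_and_fetch() atomically adds to the shared counter and
 * returns the new value.
 */
static long hits;

static long trigger(void)
{
	return __sync_add_and_fetch(&hits, 1);
}
```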
+1
tools/testing/selftests/bpf/test_current_pid_tgid_new_ns.c
···
 			bpf_object__close(obj);
 		}
 	}
+	return 0;
 }
+151
tools/testing/selftests/bpf/test_tcp_hdr_options.h
···
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2020 Facebook */
+
+#ifndef _TEST_TCP_HDR_OPTIONS_H
+#define _TEST_TCP_HDR_OPTIONS_H
+
+struct bpf_test_option {
+	__u8 flags;
+	__u8 max_delack_ms;
+	__u8 rand;
+} __attribute__((packed));
+
+enum {
+	OPTION_RESEND,
+	OPTION_MAX_DELACK_MS,
+	OPTION_RAND,
+	__NR_OPTION_FLAGS,
+};
+
+#define OPTION_F_RESEND		(1 << OPTION_RESEND)
+#define OPTION_F_MAX_DELACK_MS	(1 << OPTION_MAX_DELACK_MS)
+#define OPTION_F_RAND		(1 << OPTION_RAND)
+#define OPTION_MASK		((1 << __NR_OPTION_FLAGS) - 1)
+
+#define TEST_OPTION_FLAGS(flags, option) (1 & ((flags) >> (option)))
+#define SET_OPTION_FLAGS(flags, option)	((flags) |= (1 << (option)))
+
+/* Store in bpf_sk_storage */
+struct hdr_stg {
+	bool active;
+	bool resend_syn; /* active side only */
+	bool syncookie;  /* passive side only */
+	bool fastopen;	 /* passive side only */
+};
+
+struct linum_err {
+	unsigned int linum;
+	int err;
+};
+
+#define TCPHDR_FIN 0x01
+#define TCPHDR_SYN 0x02
+#define TCPHDR_RST 0x04
+#define TCPHDR_PSH 0x08
+#define TCPHDR_ACK 0x10
+#define TCPHDR_URG 0x20
+#define TCPHDR_ECE 0x40
+#define TCPHDR_CWR 0x80
+#define TCPHDR_SYNACK (TCPHDR_SYN | TCPHDR_ACK)
+
+#define TCPOPT_EOL	0
+#define TCPOPT_NOP	1
+#define TCPOPT_WINDOW	3
+#define TCPOPT_EXP	254
+
+#define TCP_BPF_EXPOPT_BASE_LEN 4
+#define MAX_TCP_HDR_LEN		60
+#define MAX_TCP_OPTION_SPACE	40
+
+#ifdef BPF_PROG_TEST_TCP_HDR_OPTIONS
+
+#define CG_OK	1
+#define CG_ERR	0
+
+#ifndef SOL_TCP
+#define SOL_TCP 6
+#endif
+
+struct tcp_exprm_opt {
+	__u8 kind;
+	__u8 len;
+	__u16 magic;
+	union {
+		__u8 data[4];
+		__u32 data32;
+	};
+} __attribute__((packed));
+
+struct tcp_opt {
+	__u8 kind;
+	__u8 len;
+	union {
+		__u8 data[4];
+		__u32 data32;
+	};
+} __attribute__((packed));
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 2);
+	__type(key, int);
+	__type(value, struct linum_err);
+} lport_linum_map SEC(".maps");
+
+static inline unsigned int tcp_hdrlen(const struct tcphdr *th)
+{
+	return th->doff << 2;
+}
+
+static inline __u8 skops_tcp_flags(const struct bpf_sock_ops *skops)
+{
+	return skops->skb_tcp_flags;
+}
+
+static inline void clear_hdr_cb_flags(struct bpf_sock_ops *skops)
+{
+	bpf_sock_ops_cb_flags_set(skops,
+				  skops->bpf_sock_ops_cb_flags &
+				  ~(BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG |
+				    BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG));
+}
+
+static inline void set_hdr_cb_flags(struct bpf_sock_ops *skops)
+{
+	bpf_sock_ops_cb_flags_set(skops,
+				  skops->bpf_sock_ops_cb_flags |
+				  BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG |
+				  BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG);
+}
+
+static inline void
+clear_parse_all_hdr_cb_flags(struct bpf_sock_ops *skops)
+{
+	bpf_sock_ops_cb_flags_set(skops,
+				  skops->bpf_sock_ops_cb_flags &
+				  ~BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG);
+}
+
+static inline void
+set_parse_all_hdr_cb_flags(struct bpf_sock_ops *skops)
+{
+	bpf_sock_ops_cb_flags_set(skops,
+				  skops->bpf_sock_ops_cb_flags |
+				  BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG);
+}
+
+#define RET_CG_ERR(__err) ({			\
+	struct linum_err __linum_err;		\
+	int __lport;				\
+						\
+	__linum_err.linum = __LINE__;		\
+	__linum_err.err = __err;		\
+	__lport = skops->local_port;		\
+	bpf_map_update_elem(&lport_linum_map, &__lport, &__linum_err, BPF_NOEXIST); \
+	clear_hdr_cb_flags(skops);		\
+	clear_parse_all_hdr_cb_flags(skops);	\
+	return CG_ERR;				\
+})
+
+#endif /* BPF_PROG_TEST_TCP_HDR_OPTIONS */
+
+#endif /* _TEST_TCP_HDR_OPTIONS_H */
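The helpers in this header are plain bit and shift arithmetic. A standalone userspace sketch (macro and enum definitions copied from the header above; the wrapper functions are hypothetical, added only for the demonstration) checks the intended behavior:

```c
#include <stdbool.h>

/* Copied from test_tcp_hdr_options.h */
enum {
	OPTION_RESEND,
	OPTION_MAX_DELACK_MS,
	OPTION_RAND,
	__NR_OPTION_FLAGS,
};

#define OPTION_MASK ((1 << __NR_OPTION_FLAGS) - 1)

#define TEST_OPTION_FLAGS(flags, option) (1 & ((flags) >> (option)))
#define SET_OPTION_FLAGS(flags, option) ((flags) |= (1 << (option)))

/* Set only the RAND flag and verify no other bit leaks. */
static bool rand_flag_roundtrip(void)
{
	unsigned char flags = 0;

	SET_OPTION_FLAGS(flags, OPTION_RAND);
	return TEST_OPTION_FLAGS(flags, OPTION_RAND) &&
	       !TEST_OPTION_FLAGS(flags, OPTION_RESEND) &&
	       (flags & ~OPTION_MASK) == 0;
}

/* tcp_hdrlen(): th->doff counts 32-bit words, so bytes = doff << 2. */
static unsigned int hdrlen_from_doff(unsigned int doff)
{
	return doff << 2;
}
```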
+18 -1
tools/testing/selftests/bpf/test_verifier.c
···
 	bpf_testdata_struct_t retvals[MAX_TEST_RUNS];
 	};
 	enum bpf_attach_type expected_attach_type;
+	const char *kfunc;
 };
 
 /* Note we want this to be 64 bit aligned so that the end of our array is
···
 	attr.log_level = 4;
 	attr.prog_flags = pflags;
 
+	if (prog_type == BPF_PROG_TYPE_TRACING && test->kfunc) {
+		attr.attach_btf_id = libbpf_find_vmlinux_btf_id(test->kfunc,
+						attr.expected_attach_type);
+		if (attr.attach_btf_id < 0) {
+			printf("FAIL\nFailed to find BTF ID for '%s'!\n",
+			       test->kfunc);
+			(*errors)++;
+			return;
+		}
+	}
+
 	fd_prog = bpf_load_program_xattr(&attr, bpf_vlog, sizeof(bpf_vlog));
-	if (fd_prog < 0 && !bpf_probe_prog_type(prog_type, 0)) {
+
+	/* BPF_PROG_TYPE_TRACING requires more setup and
+	 * bpf_probe_prog_type won't give correct answer
+	 */
+	if (fd_prog < 0 && prog_type != BPF_PROG_TYPE_TRACING &&
+	    !bpf_probe_prog_type(prog_type, 0)) {
 		printf("SKIP (unsupported program type %d)\n", prog_type);
 		skips++;
 		goto close_fds;
+146
tools/testing/selftests/bpf/verifier/bounds.c
···
 	.result = ACCEPT,
 	.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
+{
+	"bounds check for reg = 0, reg xor 1",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_1, 0),
+	BPF_ALU64_IMM(BPF_XOR, BPF_REG_1, 1),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 8),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_hash_8b = { 3 },
+	.result = ACCEPT,
+},
+{
+	"bounds check for reg32 = 0, reg32 xor 1",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV32_IMM(BPF_REG_1, 0),
+	BPF_ALU32_IMM(BPF_XOR, BPF_REG_1, 1),
+	BPF_JMP32_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 8),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_hash_8b = { 3 },
+	.result = ACCEPT,
+},
+{
+	"bounds check for reg = 2, reg xor 3",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_1, 2),
+	BPF_ALU64_IMM(BPF_XOR, BPF_REG_1, 3),
+	BPF_JMP_IMM(BPF_JGT, BPF_REG_1, 0, 1),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 8),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_hash_8b = { 3 },
+	.result = ACCEPT,
+},
+{
+	"bounds check for reg = any, reg xor 3",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+	BPF_ALU64_IMM(BPF_XOR, BPF_REG_1, 3),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 8),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_hash_8b = { 3 },
+	.result = REJECT,
+	.errstr = "invalid access to map value",
+	.errstr_unpriv = "invalid access to map value",
+},
+{
+	"bounds check for reg32 = any, reg32 xor 3",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+	BPF_ALU32_IMM(BPF_XOR, BPF_REG_1, 3),
+	BPF_JMP32_IMM(BPF_JNE, BPF_REG_1, 0, 1),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 8),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_hash_8b = { 3 },
+	.result = REJECT,
+	.errstr = "invalid access to map value",
+	.errstr_unpriv = "invalid access to map value",
+},
+{
+	"bounds check for reg > 0, reg xor 3",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+	BPF_JMP_IMM(BPF_JLE, BPF_REG_1, 0, 3),
+	BPF_ALU64_IMM(BPF_XOR, BPF_REG_1, 3),
+	BPF_JMP_IMM(BPF_JGE, BPF_REG_1, 0, 1),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 8),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_hash_8b = { 3 },
+	.result = ACCEPT,
+},
+{
+	"bounds check for reg32 > 0, reg32 xor 3",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+	BPF_JMP32_IMM(BPF_JLE, BPF_REG_1, 0, 3),
+	BPF_ALU32_IMM(BPF_XOR, BPF_REG_1, 3),
+	BPF_JMP32_IMM(BPF_JGE, BPF_REG_1, 0, 1),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 8),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_hash_8b = { 3 },
+	.result = ACCEPT,
+},
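These tests hinge on a simple arithmetic fact: xor-ing a fully known register with an immediate yields another fully known value (so the verifier can prove the guarded branch dead or alive), while an unknown register stays unknown after the xor. A plain C restatement of the checked identities (the `xor_imm()` helper is hypothetical, for illustration only):

```c
#include <stdint.h>

/* The identity behind the "reg xor imm" bounds tests: for a known
 * input the xor result is exactly determined, so a branch on the
 * result has a statically known outcome.
 */
static uint64_t xor_imm(uint64_t reg, uint64_t imm)
{
	return reg ^ imm;
}
```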
+37
tools/testing/selftests/bpf/verifier/d_path.c
···
+{
+	"d_path accept",
+	.insns = {
+	BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_MOV64_IMM(BPF_REG_6, 0),
+	BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_6, 0),
+	BPF_LD_IMM64(BPF_REG_3, 8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_d_path),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_TRACING,
+	.expected_attach_type = BPF_TRACE_FENTRY,
+	.kfunc = "dentry_open",
+},
+{
+	"d_path reject",
+	.insns = {
+	BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_MOV64_IMM(BPF_REG_6, 0),
+	BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_6, 0),
+	BPF_LD_IMM64(BPF_REG_3, 8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_d_path),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.errstr = "helper call is not allowed in probe",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_TRACING,
+	.expected_attach_type = BPF_TRACE_FENTRY,
+	.kfunc = "d_path",
+},