Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-11-30

We've added 30 non-merge commits during the last 7 day(s) which contain
a total of 58 files changed, 1598 insertions(+), 154 deletions(-).

The main changes are:

1) Add initial TX metadata implementation for AF_XDP with support in the
   mlx5 and stmmac drivers. Two types of offloads are supported right now:
   TX timestamp and TX checksum offload. From Stanislav Fomichev, with the
   stmmac implementation from Song Yoong Siang.

2) Change BPF verifier logic to validate global subprograms lazily instead
of unconditionally before the main program, so they can be guarded using
BPF CO-RE techniques, from Andrii Nakryiko.

3) Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter, from Jiri Olsa.

4) Use pkg-config in BPF selftests to determine the ld flags, which is
   needed in particular for static linking, from Akihiko Odaki.

5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18,
from Yonghong Song.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits)
bpf/tests: Remove duplicate JSGT tests
selftests/bpf: Add TX side to xdp_hw_metadata
selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP
selftests/bpf: Add TX side to xdp_metadata
selftests/bpf: Add csum helpers
selftests/xsk: Support tx_metadata_len
xsk: Add option to calculate TX checksum in SW
xsk: Validate xsk_tx_metadata flags
xsk: Document tx_metadata_len layout
net: stmmac: Add Tx HWTS support to XDP ZC
net/mlx5e: Implement AF_XDP TX timestamp and checksum offload
tools: ynl: Print xsk-features from the sample
xsk: Add TX timestamp and TX checksum offload support
xsk: Support tx_metadata_len
selftests/bpf: Use pkg-config for libelf
selftests/bpf: Override PKG_CONFIG for static builds
selftests/bpf: Choose pkg-config for the target
bpftool: Add support to display uprobe_multi links
selftests/bpf: Add link_info test for uprobe_multi link
selftests/bpf: Use bpf_link__destroy in fill_link_info tests
...
====================

Conflicts:

Documentation/netlink/specs/netdev.yaml:
839ff60df3ab ("net: page_pool: add nlspec for basic access to page pools")
48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/

While at it, also regenerate the netlink code; the tree is dirty after:
48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
It looks like the code wasn't re-rendered after "render-max" was removed.

Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+1588 -156
+18 -1
Documentation/netlink/specs/netdev.yaml
··· 45 45 - 46 46 type: flags 47 47 name: xdp-rx-metadata 48 - render-max: true 49 48 entries: 50 49 - 51 50 name: timestamp ··· 54 55 name: hash 55 56 doc: 56 57 Device is capable of exposing receive packet hash via bpf_xdp_metadata_rx_hash(). 58 + - 59 + type: flags 60 + name: xsk-flags 61 + entries: 62 + - 63 + name: tx-timestamp 64 + doc: 65 + HW timestamping egress packets is supported by the driver. 66 + - 67 + name: tx-checksum 68 + doc: 69 + L3 checksum HW offload is supported by the driver. 57 70 58 71 attribute-sets: 59 72 - ··· 97 86 See Documentation/networking/xdp-rx-metadata.rst for more details. 98 87 type: u64 99 88 enum: xdp-rx-metadata 89 + - 90 + name: xsk-features 91 + doc: Bitmask of enabled AF_XDP features. 92 + type: u64 93 + enum: xsk-flags 100 94 - 101 95 name: page-pool 102 96 attributes: ··· 225 209 - xdp-features 226 210 - xdp-zc-max-segs 227 211 - xdp-rx-metadata-features 212 + - xsk-features 228 213 dump: 229 214 reply: *dev-all 230 215 -
+1
Documentation/networking/index.rst
··· 124 124 xfrm_sync 125 125 xfrm_sysctl 126 126 xdp-rx-metadata 127 + xsk-tx-metadata 127 128 128 129 .. only:: subproject and html 129 130
+2
Documentation/networking/xdp-rx-metadata.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 1 3 =============== 2 4 XDP RX Metadata 3 5 ===============
+79
Documentation/networking/xsk-tx-metadata.rst
··· 1 + ================== 2 + AF_XDP TX Metadata 3 + ================== 4 + 5 + This document describes how to enable offloads when transmitting packets 6 + via :doc:`af_xdp`. Refer to :doc:`xdp-rx-metadata` on how to access similar 7 + metadata on the receive side. 8 + 9 + General Design 10 + ============== 11 + 12 + The headroom for the metadata is reserved via ``tx_metadata_len`` in 13 + ``struct xdp_umem_reg``. The metadata length is therefore the same for 14 + every socket that shares the same umem. The metadata layout is a fixed UAPI; 15 + refer to ``struct xsk_tx_metadata`` in ``include/uapi/linux/if_xdp.h``. 16 + Thus, generally, the ``tx_metadata_len`` field above should contain 17 + ``sizeof(struct xsk_tx_metadata)``. 18 + 19 + The headroom and the metadata itself should be located right before 20 + ``xdp_desc->addr`` in the umem frame. Within a frame, the metadata 21 + layout is as follows:: 22 + 23 + tx_metadata_len 24 + / \ 25 + +-----------------+---------+----------------------------+ 26 + | xsk_tx_metadata | padding | payload | 27 + +-----------------+---------+----------------------------+ 28 + ^ 29 + | 30 + xdp_desc->addr 31 + 32 + An AF_XDP application can request headrooms larger than ``sizeof(struct 33 + xsk_tx_metadata)``. The kernel will ignore the padding (and will still 34 + use ``xdp_desc->addr - tx_metadata_len`` to locate 35 + the ``xsk_tx_metadata``). For the frames that shouldn't carry 36 + any metadata (i.e., the ones that don't have the ``XDP_TX_METADATA`` option), 37 + the metadata area is ignored by the kernel as well. 38 + 39 + The ``flags`` field enables the particular offload: 40 + 41 + - ``XDP_TXMD_FLAGS_TIMESTAMP``: requests the device to put the transmission 42 + timestamp into the ``tx_timestamp`` field of ``struct xsk_tx_metadata``. 43 + - ``XDP_TXMD_FLAGS_CHECKSUM``: requests the device to calculate the L4 44 + checksum. 
``csum_start`` specifies the byte offset where the checksumming 45 + should start and ``csum_offset`` specifies the byte offset where the 46 + device should store the computed checksum. 47 + 48 + Besides the flags above, in order to trigger the offloads, the first 49 + packet's ``struct xdp_desc`` descriptor should set the ``XDP_TX_METADATA`` 50 + bit in the ``options`` field. Also note that in a multi-buffer packet 51 + only the first chunk should carry the metadata. 52 + 53 + Software TX Checksum 54 + ==================== 55 + 56 + For development and testing purposes it's possible to pass the 57 + ``XDP_UMEM_TX_SW_CSUM`` flag to the ``XDP_UMEM_REG`` UMEM registration call. 58 + In this case, when running in ``XDP_COPY`` mode, the TX checksum 59 + is calculated on the CPU. Do not enable this option in production because 60 + it will negatively affect performance. 61 + 62 + Querying Device Capabilities 63 + ============================ 64 + 65 + Every device exports its offload capabilities via the netlink netdev family. 66 + Refer to the ``xsk-flags`` features bitmask in 67 + ``Documentation/netlink/specs/netdev.yaml``. 68 + 69 + - ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP`` 70 + - ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM`` 71 + 72 + See ``tools/net/ynl/samples/netdev.c`` for how to query this information. 73 + 74 + Example 75 + ======= 76 + 77 + See ``tools/testing/selftests/bpf/xdp_hw_metadata.c`` for an example 78 + program that handles TX metadata. Also see https://github.com/fomichev/xskgen 79 + for a more bare-bones example.
+3 -1
drivers/net/ethernet/mellanox/mlx5/core/en.h
··· 484 484 485 485 struct mlx5e_xdpsq; 486 486 struct mlx5e_xmit_data; 487 + struct xsk_tx_metadata; 487 488 typedef int (*mlx5e_fp_xmit_xdp_frame_check)(struct mlx5e_xdpsq *); 488 489 typedef bool (*mlx5e_fp_xmit_xdp_frame)(struct mlx5e_xdpsq *, 489 490 struct mlx5e_xmit_data *, 490 - int); 491 + int, 492 + struct xsk_tx_metadata *); 491 493 492 494 struct mlx5e_xdpsq { 493 495 /* data path */
+61 -11
drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
··· 103 103 xdptxd->dma_addr = dma_addr; 104 104 105 105 if (unlikely(!INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe, 106 - mlx5e_xmit_xdp_frame, sq, xdptxd, 0))) 106 + mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL))) 107 107 return false; 108 108 109 109 /* xmit_mode == MLX5E_XDP_XMIT_MODE_FRAME */ ··· 145 145 xdptxd->dma_addr = dma_addr; 146 146 147 147 if (unlikely(!INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe, 148 - mlx5e_xmit_xdp_frame, sq, xdptxd, 0))) 148 + mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL))) 149 149 return false; 150 150 151 151 /* xmit_mode == MLX5E_XDP_XMIT_MODE_PAGE */ ··· 259 259 const struct xdp_metadata_ops mlx5e_xdp_metadata_ops = { 260 260 .xmo_rx_timestamp = mlx5e_xdp_rx_timestamp, 261 261 .xmo_rx_hash = mlx5e_xdp_rx_hash, 262 + }; 263 + 264 + struct mlx5e_xsk_tx_complete { 265 + struct mlx5_cqe64 *cqe; 266 + struct mlx5e_cq *cq; 267 + }; 268 + 269 + static u64 mlx5e_xsk_fill_timestamp(void *_priv) 270 + { 271 + struct mlx5e_xsk_tx_complete *priv = _priv; 272 + u64 ts; 273 + 274 + ts = get_cqe_ts(priv->cqe); 275 + 276 + if (mlx5_is_real_time_rq(priv->cq->mdev) || mlx5_is_real_time_sq(priv->cq->mdev)) 277 + return mlx5_real_time_cyc2time(&priv->cq->mdev->clock, ts); 278 + 279 + return mlx5_timecounter_cyc2time(&priv->cq->mdev->clock, ts); 280 + } 281 + 282 + static void mlx5e_xsk_request_checksum(u16 csum_start, u16 csum_offset, void *priv) 283 + { 284 + struct mlx5_wqe_eth_seg *eseg = priv; 285 + 286 + /* HW/FW is doing parsing, so offsets are largely ignored. 
*/ 287 + eseg->cs_flags |= MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM; 288 + } 289 + 290 + const struct xsk_tx_metadata_ops mlx5e_xsk_tx_metadata_ops = { 291 + .tmo_fill_timestamp = mlx5e_xsk_fill_timestamp, 292 + .tmo_request_checksum = mlx5e_xsk_request_checksum, 262 293 }; 263 294 264 295 /* returns true if packet was consumed by xdp */ ··· 429 398 430 399 INDIRECT_CALLABLE_SCOPE bool 431 400 mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd, 432 - int check_result); 401 + int check_result, struct xsk_tx_metadata *meta); 433 402 434 403 INDIRECT_CALLABLE_SCOPE bool 435 404 mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd, 436 - int check_result) 405 + int check_result, struct xsk_tx_metadata *meta) 437 406 { 438 407 struct mlx5e_tx_mpwqe *session = &sq->mpwqe; 439 408 struct mlx5e_xdpsq_stats *stats = sq->stats; ··· 451 420 */ 452 421 if (unlikely(sq->mpwqe.wqe)) 453 422 mlx5e_xdp_mpwqe_complete(sq); 454 - return mlx5e_xmit_xdp_frame(sq, xdptxd, 0); 423 + return mlx5e_xmit_xdp_frame(sq, xdptxd, 0, meta); 455 424 } 456 425 if (!xdptxd->len) { 457 426 skb_frag_t *frag = &xdptxdf->sinfo->frags[0]; ··· 481 450 * and it's safe to complete it at any time. 
482 451 */ 483 452 mlx5e_xdp_mpwqe_session_start(sq); 453 + xsk_tx_metadata_request(meta, &mlx5e_xsk_tx_metadata_ops, &session->wqe->eth); 484 454 } 485 455 486 456 mlx5e_xdp_mpwqe_add_dseg(sq, p, stats); ··· 512 480 513 481 INDIRECT_CALLABLE_SCOPE bool 514 482 mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd, 515 - int check_result) 483 + int check_result, struct xsk_tx_metadata *meta) 516 484 { 517 485 struct mlx5e_xmit_data_frags *xdptxdf = 518 486 container_of(xdptxd, struct mlx5e_xmit_data_frags, xd); ··· 631 599 sq->pc++; 632 600 } 633 601 602 + xsk_tx_metadata_request(meta, &mlx5e_xsk_tx_metadata_ops, eseg); 603 + 634 604 sq->doorbell_cseg = cseg; 635 605 636 606 stats->xmit++; ··· 642 608 static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq, 643 609 struct mlx5e_xdp_wqe_info *wi, 644 610 u32 *xsk_frames, 645 - struct xdp_frame_bulk *bq) 611 + struct xdp_frame_bulk *bq, 612 + struct mlx5e_cq *cq, 613 + struct mlx5_cqe64 *cqe) 646 614 { 647 615 struct mlx5e_xdp_info_fifo *xdpi_fifo = &sq->db.xdpi_fifo; 648 616 u16 i; ··· 704 668 705 669 break; 706 670 } 707 - case MLX5E_XDP_XMIT_MODE_XSK: 671 + case MLX5E_XDP_XMIT_MODE_XSK: { 708 672 /* AF_XDP send */ 673 + struct xsk_tx_metadata_compl *compl = NULL; 674 + struct mlx5e_xsk_tx_complete priv = { 675 + .cqe = cqe, 676 + .cq = cq, 677 + }; 678 + 679 + if (xp_tx_metadata_enabled(sq->xsk_pool)) { 680 + xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo); 681 + compl = &xdpi.xsk_meta; 682 + 683 + xsk_tx_metadata_complete(compl, &mlx5e_xsk_tx_metadata_ops, &priv); 684 + } 685 + 709 686 (*xsk_frames)++; 710 687 break; 688 + } 711 689 default: 712 690 WARN_ON_ONCE(true); 713 691 } ··· 770 720 771 721 sqcc += wi->num_wqebbs; 772 722 773 - mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq); 723 + mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq, cq, cqe); 774 724 } while (!last_wqe); 775 725 776 726 if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_REQ)) { ··· 817 767 818 768 sq->cc += wi->num_wqebbs; 819 769 820 
- mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq); 770 + mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq, NULL, NULL); 821 771 } 822 772 823 773 xdp_flush_frame_bulk(&bq); ··· 890 840 } 891 841 892 842 ret = INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe, 893 - mlx5e_xmit_xdp_frame, sq, xdptxd, 0); 843 + mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL); 894 844 if (unlikely(!ret)) { 895 845 int j; 896 846
+8 -3
drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
··· 33 33 #define __MLX5_EN_XDP_H__ 34 34 35 35 #include <linux/indirect_call_wrapper.h> 36 + #include <net/xdp_sock.h> 36 37 37 38 #include "en.h" 38 39 #include "en/txrx.h" ··· 83 82 * num, page_1, page_2, ... , page_num. 84 83 * 85 84 * MLX5E_XDP_XMIT_MODE_XSK: 86 - * none. 85 + * frame.xsk_meta. 87 86 */ 88 87 #define MLX5E_XDP_FIFO_ENTRIES2DS_MAX_RATIO 4 89 88 ··· 98 97 u8 num; 99 98 struct page *page; 100 99 } page; 100 + struct xsk_tx_metadata_compl xsk_meta; 101 101 }; 102 102 103 103 struct mlx5e_xsk_param; ··· 114 112 u32 flags); 115 113 116 114 extern const struct xdp_metadata_ops mlx5e_xdp_metadata_ops; 115 + extern const struct xsk_tx_metadata_ops mlx5e_xsk_tx_metadata_ops; 117 116 118 117 INDIRECT_CALLABLE_DECLARE(bool mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, 119 118 struct mlx5e_xmit_data *xdptxd, 120 - int check_result)); 119 + int check_result, 120 + struct xsk_tx_metadata *meta)); 121 121 INDIRECT_CALLABLE_DECLARE(bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, 122 122 struct mlx5e_xmit_data *xdptxd, 123 - int check_result)); 123 + int check_result, 124 + struct xsk_tx_metadata *meta)); 124 125 INDIRECT_CALLABLE_DECLARE(int mlx5e_xmit_xdp_frame_check_mpwqe(struct mlx5e_xdpsq *sq)); 125 126 INDIRECT_CALLABLE_DECLARE(int mlx5e_xmit_xdp_frame_check(struct mlx5e_xdpsq *sq)); 126 127
+16 -1
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
··· 55 55 56 56 nopwqe = mlx5e_post_nop(&sq->wq, sq->sqn, &sq->pc); 57 57 mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo, *xdpi); 58 + if (xp_tx_metadata_enabled(sq->xsk_pool)) 59 + mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo, 60 + (union mlx5e_xdp_info) { .xsk_meta = {} }); 58 61 sq->doorbell_cseg = &nopwqe->ctrl; 59 62 } 60 63 61 64 bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) 62 65 { 63 66 struct xsk_buff_pool *pool = sq->xsk_pool; 67 + struct xsk_tx_metadata *meta = NULL; 64 68 union mlx5e_xdp_info xdpi; 65 69 bool work_done = true; 66 70 bool flush = false; ··· 97 93 xdptxd.dma_addr = xsk_buff_raw_get_dma(pool, desc.addr); 98 94 xdptxd.data = xsk_buff_raw_get_data(pool, desc.addr); 99 95 xdptxd.len = desc.len; 96 + meta = xsk_buff_get_metadata(pool, desc.addr); 100 97 101 98 xsk_buff_raw_dma_sync_for_device(pool, xdptxd.dma_addr, xdptxd.len); 102 99 103 100 ret = INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe, 104 101 mlx5e_xmit_xdp_frame, sq, &xdptxd, 105 - check_result); 102 + check_result, meta); 106 103 if (unlikely(!ret)) { 107 104 if (sq->mpwqe.wqe) 108 105 mlx5e_xdp_mpwqe_complete(sq); ··· 111 106 mlx5e_xsk_tx_post_err(sq, &xdpi); 112 107 } else { 113 108 mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo, xdpi); 109 + if (xp_tx_metadata_enabled(sq->xsk_pool)) { 110 + struct xsk_tx_metadata_compl compl; 111 + 112 + xsk_tx_metadata_to_compl(meta, &compl); 113 + XSK_TX_COMPL_FITS(void *); 114 + 115 + mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo, 116 + (union mlx5e_xdp_info) 117 + { .xsk_meta = compl }); 118 + } 114 119 } 115 120 116 121 flush = true;
+1
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
··· 5165 5165 5166 5166 netdev->netdev_ops = &mlx5e_netdev_ops; 5167 5167 netdev->xdp_metadata_ops = &mlx5e_xdp_metadata_ops; 5168 + netdev->xsk_tx_metadata_ops = &mlx5e_xsk_tx_metadata_ops; 5168 5169 5169 5170 mlx5e_dcbnl_build_netdev(netdev); 5170 5171
+12
drivers/net/ethernet/stmicro/stmmac/stmmac.h
··· 51 51 bool last_segment; 52 52 bool is_jumbo; 53 53 enum stmmac_txbuf_type buf_type; 54 + struct xsk_tx_metadata_compl xsk_meta; 54 55 }; 55 56 56 57 #define STMMAC_TBS_AVAIL BIT(0) ··· 99 98 struct stmmac_priv *priv; 100 99 struct dma_desc *desc; 101 100 struct dma_desc *ndesc; 101 + }; 102 + 103 + struct stmmac_metadata_request { 104 + struct stmmac_priv *priv; 105 + struct dma_desc *tx_desc; 106 + bool *set_ic; 107 + }; 108 + 109 + struct stmmac_xsk_tx_complete { 110 + struct stmmac_priv *priv; 111 + struct dma_desc *desc; 102 112 }; 103 113 104 114 struct stmmac_rx_queue {
+63 -1
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
··· 2430 2430 } 2431 2431 } 2432 2432 2433 + static void stmmac_xsk_request_timestamp(void *_priv) 2434 + { 2435 + struct stmmac_metadata_request *meta_req = _priv; 2436 + 2437 + stmmac_enable_tx_timestamp(meta_req->priv, meta_req->tx_desc); 2438 + *meta_req->set_ic = true; 2439 + } 2440 + 2441 + static u64 stmmac_xsk_fill_timestamp(void *_priv) 2442 + { 2443 + struct stmmac_xsk_tx_complete *tx_compl = _priv; 2444 + struct stmmac_priv *priv = tx_compl->priv; 2445 + struct dma_desc *desc = tx_compl->desc; 2446 + bool found = false; 2447 + u64 ns = 0; 2448 + 2449 + if (!priv->hwts_tx_en) 2450 + return 0; 2451 + 2452 + /* check tx tstamp status */ 2453 + if (stmmac_get_tx_timestamp_status(priv, desc)) { 2454 + stmmac_get_timestamp(priv, desc, priv->adv_ts, &ns); 2455 + found = true; 2456 + } else if (!stmmac_get_mac_tx_timestamp(priv, priv->hw, &ns)) { 2457 + found = true; 2458 + } 2459 + 2460 + if (found) { 2461 + ns -= priv->plat->cdc_error_adj; 2462 + return ns_to_ktime(ns); 2463 + } 2464 + 2465 + return 0; 2466 + } 2467 + 2468 + static const struct xsk_tx_metadata_ops stmmac_xsk_tx_metadata_ops = { 2469 + .tmo_request_timestamp = stmmac_xsk_request_timestamp, 2470 + .tmo_fill_timestamp = stmmac_xsk_fill_timestamp, 2471 + }; 2472 + 2433 2473 static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget) 2434 2474 { 2435 2475 struct netdev_queue *nq = netdev_get_tx_queue(priv->dev, queue); ··· 2489 2449 budget = min(budget, stmmac_tx_avail(priv, queue)); 2490 2450 2491 2451 while (budget-- > 0) { 2452 + struct stmmac_metadata_request meta_req; 2453 + struct xsk_tx_metadata *meta = NULL; 2492 2454 dma_addr_t dma_addr; 2493 2455 bool set_ic; 2494 2456 ··· 2514 2472 tx_desc = tx_q->dma_tx + entry; 2515 2473 2516 2474 dma_addr = xsk_buff_raw_get_dma(pool, xdp_desc.addr); 2475 + meta = xsk_buff_get_metadata(pool, xdp_desc.addr); 2517 2476 xsk_buff_raw_dma_sync_for_device(pool, dma_addr, xdp_desc.len); 2518 2477 2519 2478 
tx_q->tx_skbuff_dma[entry].buf_type = STMMAC_TXBUF_T_XSK_TX; ··· 2542 2499 else 2543 2500 set_ic = false; 2544 2501 2502 + meta_req.priv = priv; 2503 + meta_req.tx_desc = tx_desc; 2504 + meta_req.set_ic = &set_ic; 2505 + xsk_tx_metadata_request(meta, &stmmac_xsk_tx_metadata_ops, 2506 + &meta_req); 2545 2507 if (set_ic) { 2546 2508 tx_q->tx_count_frames = 0; 2547 2509 stmmac_set_tx_ic(priv, tx_desc); ··· 2558 2510 xdp_desc.len); 2559 2511 2560 2512 stmmac_enable_dma_transmission(priv, priv->ioaddr); 2513 + 2514 + xsk_tx_metadata_to_compl(meta, 2515 + &tx_q->tx_skbuff_dma[entry].xsk_meta); 2561 2516 2562 2517 tx_q->cur_tx = STMMAC_GET_ENTRY(tx_q->cur_tx, priv->dma_conf.dma_tx_size); 2563 2518 entry = tx_q->cur_tx; ··· 2671 2620 } else { 2672 2621 tx_packets++; 2673 2622 } 2674 - if (skb) 2623 + if (skb) { 2675 2624 stmmac_get_tx_hwtstamp(priv, p, skb); 2625 + } else { 2626 + struct stmmac_xsk_tx_complete tx_compl = { 2627 + .priv = priv, 2628 + .desc = p, 2629 + }; 2630 + 2631 + xsk_tx_metadata_complete(&tx_q->tx_skbuff_dma[entry].xsk_meta, 2632 + &stmmac_xsk_tx_metadata_ops, 2633 + &tx_compl); 2634 + } 2676 2635 } 2677 2636 2678 2637 if (likely(tx_q->tx_skbuff_dma[entry].buf && ··· 7525 7464 ndev->netdev_ops = &stmmac_netdev_ops; 7526 7465 7527 7466 ndev->xdp_metadata_ops = &stmmac_xdp_metadata_ops; 7467 + ndev->xsk_tx_metadata_ops = &stmmac_xsk_tx_metadata_ops; 7528 7468 7529 7469 ndev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | 7530 7470 NETIF_F_RXCSUM;
+2
include/linux/bpf.h
··· 1347 1347 struct bpf_func_info_aux { 1348 1348 u16 linkage; 1349 1349 bool unreliable; 1350 + bool called : 1; 1351 + bool verified : 1; 1350 1352 }; 1351 1353 1352 1354 enum bpf_jit_poke_reason {
+2
include/linux/netdevice.h
··· 1865 1865 * @netdev_ops: Includes several pointers to callbacks, 1866 1866 * if one wants to override the ndo_*() functions 1867 1867 * @xdp_metadata_ops: Includes pointers to XDP metadata callbacks. 1868 + * @xsk_tx_metadata_ops: Includes pointers to AF_XDP TX metadata callbacks. 1868 1869 * @ethtool_ops: Management operations 1869 1870 * @l3mdev_ops: Layer 3 master device operations 1870 1871 * @ndisc_ops: Includes callbacks for different IPv6 neighbour ··· 2129 2128 unsigned long long priv_flags; 2130 2129 const struct net_device_ops *netdev_ops; 2131 2130 const struct xdp_metadata_ops *xdp_metadata_ops; 2131 + const struct xsk_tx_metadata_ops *xsk_tx_metadata_ops; 2132 2132 int ifindex; 2133 2133 unsigned short gflags; 2134 2134 unsigned short hard_header_len;
+13 -1
include/linux/skbuff.h
··· 566 566 int mm_account_pinned_pages(struct mmpin *mmp, size_t size); 567 567 void mm_unaccount_pinned_pages(struct mmpin *mmp); 568 568 569 + /* Preserve some data across TX submission and completion. 570 + * 571 + * Note, this state is stored in the driver. Extending the layout 572 + * might need some special care. 573 + */ 574 + struct xsk_tx_metadata_compl { 575 + __u64 *tx_timestamp; 576 + }; 577 + 569 578 /* This data is invariant across clones and lives at 570 579 * the end of the header data, ie. at skb->end. 571 580 */ ··· 587 578 /* Warning: this field is not always filled in (UFO)! */ 588 579 unsigned short gso_segs; 589 580 struct sk_buff *frag_list; 590 - struct skb_shared_hwtstamps hwtstamps; 581 + union { 582 + struct skb_shared_hwtstamps hwtstamps; 583 + struct xsk_tx_metadata_compl xsk_meta; 584 + }; 591 585 unsigned int gso_type; 592 586 u32 tskey; 593 587
+111
include/net/xdp_sock.h
··· 30 30 struct user_struct *user; 31 31 refcount_t users; 32 32 u8 flags; 33 + u8 tx_metadata_len; 33 34 bool zc; 34 35 struct page **pgs; 35 36 int id; ··· 93 92 struct xsk_queue *cq_tmp; /* Only as tmp storage before bind */ 94 93 }; 95 94 95 + /* 96 + * AF_XDP TX metadata hooks for network devices. 97 + * The following hooks can be defined; unless noted otherwise, they are 98 + * optional and can be filled with a null pointer. 99 + * 100 + * void (*tmo_request_timestamp)(void *priv) 101 + * Called when AF_XDP frame requested egress timestamp. 102 + * 103 + * u64 (*tmo_fill_timestamp)(void *priv) 104 + * Called when AF_XDP frame, that had requested egress timestamp, 105 + * received a completion. The hook needs to return the actual HW timestamp. 106 + * 107 + * void (*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv) 108 + * Called when AF_XDP frame requested HW checksum offload. csum_start 109 + * indicates position where checksumming should start. 110 + * csum_offset indicates position where checksum should be stored. 111 + * 112 + */ 113 + struct xsk_tx_metadata_ops { 114 + void (*tmo_request_timestamp)(void *priv); 115 + u64 (*tmo_fill_timestamp)(void *priv); 116 + void (*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv); 117 + }; 118 + 96 119 #ifdef CONFIG_XDP_SOCKETS 97 120 98 121 int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); 99 122 int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp); 100 123 void __xsk_map_flush(void); 124 + 125 + /** 126 + * xsk_tx_metadata_to_compl - Save enough relevant metadata information 127 + * to perform tx completion in the future. 128 + * @meta: pointer to AF_XDP metadata area 129 + * @compl: pointer to output struct xsk_tx_metadata_compl 130 + * 131 + * This function should be called by the networking device when 132 + * it prepares an AF_XDP egress packet. The value of @compl should be stored 133 + * and passed to xsk_tx_metadata_complete upon TX completion. 
134 + */ 135 + static inline void xsk_tx_metadata_to_compl(struct xsk_tx_metadata *meta, 136 + struct xsk_tx_metadata_compl *compl) 137 + { 138 + if (!meta) 139 + return; 140 + 141 + if (meta->flags & XDP_TXMD_FLAGS_TIMESTAMP) 142 + compl->tx_timestamp = &meta->completion.tx_timestamp; 143 + else 144 + compl->tx_timestamp = NULL; 145 + } 146 + 147 + /** 148 + * xsk_tx_metadata_request - Evaluate AF_XDP TX metadata at submission 149 + * and call appropriate xsk_tx_metadata_ops operation. 150 + * @meta: pointer to AF_XDP metadata area 151 + * @ops: pointer to struct xsk_tx_metadata_ops 152 + * @priv: pointer to driver-private area 153 + * 154 + * This function should be called by the networking device when 155 + * it prepares an AF_XDP egress packet. 156 + */ 157 + static inline void xsk_tx_metadata_request(const struct xsk_tx_metadata *meta, 158 + const struct xsk_tx_metadata_ops *ops, 159 + void *priv) 160 + { 161 + if (!meta) 162 + return; 163 + 164 + if (ops->tmo_request_timestamp) 165 + if (meta->flags & XDP_TXMD_FLAGS_TIMESTAMP) 166 + ops->tmo_request_timestamp(priv); 167 + 168 + if (ops->tmo_request_checksum) 169 + if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) 170 + ops->tmo_request_checksum(meta->request.csum_start, 171 + meta->request.csum_offset, priv); 172 + } 173 + 174 + /** 175 + * xsk_tx_metadata_complete - Evaluate AF_XDP TX metadata at completion 176 + * and call appropriate xsk_tx_metadata_ops operation. 177 + * @compl: pointer to completion metadata produced from xsk_tx_metadata_to_compl 178 + * @ops: pointer to struct xsk_tx_metadata_ops 179 + * @priv: pointer to driver-private area 180 + * 181 + * This function should be called by the networking device upon 182 + * AF_XDP egress completion. 
183 + */ 184 + static inline void xsk_tx_metadata_complete(struct xsk_tx_metadata_compl *compl, 185 + const struct xsk_tx_metadata_ops *ops, 186 + void *priv) 187 + { 188 + if (!compl) 189 + return; 190 + 191 + *compl->tx_timestamp = ops->tmo_fill_timestamp(priv); 192 + } 101 193 102 194 #else 103 195 ··· 205 111 } 206 112 207 113 static inline void __xsk_map_flush(void) 114 + { 115 + } 116 + 117 + static inline void xsk_tx_metadata_to_compl(struct xsk_tx_metadata *meta, 118 + struct xsk_tx_metadata_compl *compl) 119 + { 120 + } 121 + 122 + static inline void xsk_tx_metadata_request(struct xsk_tx_metadata *meta, 123 + const struct xsk_tx_metadata_ops *ops, 124 + void *priv) 125 + { 126 + } 127 + 128 + static inline void xsk_tx_metadata_complete(struct xsk_tx_metadata_compl *compl, 129 + const struct xsk_tx_metadata_ops *ops, 130 + void *priv) 208 131 { 209 132 } 210 133
+34
include/net/xdp_sock_drv.h
··· 165 165 return xp_raw_get_data(pool, addr); 166 166 } 167 167 168 + #define XDP_TXMD_FLAGS_VALID ( \ 169 + XDP_TXMD_FLAGS_TIMESTAMP | \ 170 + XDP_TXMD_FLAGS_CHECKSUM | \ 171 + 0) 172 + 173 + static inline bool xsk_buff_valid_tx_metadata(struct xsk_tx_metadata *meta) 174 + { 175 + return !(meta->flags & ~XDP_TXMD_FLAGS_VALID); 176 + } 177 + 178 + static inline struct xsk_tx_metadata *xsk_buff_get_metadata(struct xsk_buff_pool *pool, u64 addr) 179 + { 180 + struct xsk_tx_metadata *meta; 181 + 182 + if (!pool->tx_metadata_len) 183 + return NULL; 184 + 185 + meta = xp_raw_get_data(pool, addr) - pool->tx_metadata_len; 186 + if (unlikely(!xsk_buff_valid_tx_metadata(meta))) 187 + return NULL; /* no way to signal the error to the user */ 188 + 189 + return meta; 190 + } 191 + 168 192 static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool) 169 193 { 170 194 struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); ··· 344 320 } 345 321 346 322 static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr) 323 + { 324 + return NULL; 325 + } 326 + 327 + static inline bool xsk_buff_valid_tx_metadata(struct xsk_tx_metadata *meta) 328 + { 329 + return false; 330 + } 331 + 332 + static inline struct xsk_tx_metadata *xsk_buff_get_metadata(struct xsk_buff_pool *pool, u64 addr) 347 333 { 348 334 return NULL; 349 335 }
+8
include/net/xsk_buff_pool.h
··· 33 33 }; 34 34 35 35 #define XSK_CHECK_PRIV_TYPE(t) BUILD_BUG_ON(sizeof(t) > offsetofend(struct xdp_buff_xsk, cb)) 36 + #define XSK_TX_COMPL_FITS(t) BUILD_BUG_ON(sizeof(struct xsk_tx_metadata_compl) > sizeof(t)) 36 37 37 38 struct xsk_dma_map { 38 39 dma_addr_t *dma_pages; ··· 78 77 u32 chunk_size; 79 78 u32 chunk_shift; 80 79 u32 frame_len; 80 + u8 tx_metadata_len; /* inherited from umem */ 81 81 u8 cached_need_wakeup; 82 82 bool uses_need_wakeup; 83 83 bool dma_need_sync; 84 84 bool unaligned; 85 + bool tx_sw_csum; 85 86 void *addrs; 86 87 /* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect: 87 88 * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when ··· 234 231 if (!xskb->pool->unaligned) 235 232 return xskb->orig_addr + offset; 236 233 return xskb->orig_addr + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT); 234 + } 235 + 236 + static inline bool xp_tx_metadata_enabled(const struct xsk_buff_pool *pool) 237 + { 238 + return pool->tx_metadata_len > 0; 237 239 } 238 240 239 241 #endif /* XSK_BUFF_POOL_H_ */
+10
include/uapi/linux/bpf.h
··· 6563 6563 __u64 missed; 6564 6564 } kprobe_multi; 6565 6565 struct { 6566 + __aligned_u64 path; 6567 + __aligned_u64 offsets; 6568 + __aligned_u64 ref_ctr_offsets; 6569 + __aligned_u64 cookies; 6570 + __u32 path_size; /* in/out: real path size on success, including zero byte */ 6571 + __u32 count; /* in/out: uprobe_multi offsets/ref_ctr_offsets/cookies count */ 6572 + __u32 flags; 6573 + __u32 pid; 6574 + } uprobe_multi; 6575 + struct { 6566 6576 __u32 type; /* enum bpf_perf_event_type */ 6567 6577 __u32 :32; 6568 6578 union {
+46 -1
include/uapi/linux/if_xdp.h
··· 33 33 #define XDP_USE_SG (1 << 4) 34 34 35 35 /* Flags for xsk_umem_config flags */ 36 - #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0) 36 + #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0) 37 + 38 + /* Force checksum calculation in software. Can be used for testing or 39 + * working around potential HW issues. This option causes performance 40 + * degradation and only works in XDP_COPY mode. 41 + */ 42 + #define XDP_UMEM_TX_SW_CSUM (1 << 1) 37 43 38 44 struct sockaddr_xdp { 39 45 __u16 sxdp_family; ··· 82 76 __u32 chunk_size; 83 77 __u32 headroom; 84 78 __u32 flags; 79 + __u32 tx_metadata_len; 85 80 }; 86 81 87 82 struct xdp_statistics { ··· 112 105 #define XSK_UNALIGNED_BUF_ADDR_MASK \ 113 106 ((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1) 114 107 108 + /* Request transmit timestamp. Upon completion, put it into tx_timestamp 109 + * field of struct xsk_tx_metadata. 110 + */ 111 + #define XDP_TXMD_FLAGS_TIMESTAMP (1 << 0) 112 + 113 + /* Request transmit checksum offload. Checksum start position and offset 114 + * are communicated via csum_start and csum_offset fields of struct 115 + * xsk_tx_metadata. 116 + */ 117 + #define XDP_TXMD_FLAGS_CHECKSUM (1 << 1) 118 + 119 + /* AF_XDP offloads request. 'request' union member is consumed by the driver 120 + * when the packet is being transmitted. 'completion' union member is 121 + * filled by the driver when the transmit completion arrives. 122 + */ 123 + struct xsk_tx_metadata { 124 + __u64 flags; 125 + 126 + union { 127 + struct { 128 + /* XDP_TXMD_FLAGS_CHECKSUM */ 129 + 130 + /* Offset from desc->addr where checksumming should start. */ 131 + __u16 csum_start; 132 + /* Offset from csum_start where checksum should be stored. 
*/ 133 + __u16 csum_offset; 134 + } request; 135 + 136 + struct { 137 + /* XDP_TXMD_FLAGS_TIMESTAMP */ 138 + __u64 tx_timestamp; 139 + } completion; 140 + }; 141 + }; 142 + 115 143 /* Rx/Tx descriptor */ 116 144 struct xdp_desc { 117 145 __u64 addr; ··· 162 120 * to 0 and this maintains backward compatibility. 163 121 */ 164 122 #define XDP_PKT_CONTD (1 << 0) 123 + 124 + /* TX packet carries valid metadata. */ 125 + #define XDP_TX_METADATA (1 << 1) 165 126 166 127 #endif /* _LINUX_IF_XDP_H */
+12 -2
include/uapi/linux/netdev.h
··· 48 48 enum netdev_xdp_rx_metadata { 49 49 NETDEV_XDP_RX_METADATA_TIMESTAMP = 1, 50 50 NETDEV_XDP_RX_METADATA_HASH = 2, 51 + }; 51 52 52 - /* private: */ 53 - NETDEV_XDP_RX_METADATA_MASK = 3, 53 + /** 54 + * enum netdev_xsk_flags 55 + * @NETDEV_XSK_FLAGS_TX_TIMESTAMP: HW timestamping egress packets is supported 56 + * by the driver. 57 + * @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the 58 + * driver. 59 + */ 60 + enum netdev_xsk_flags { 61 + NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1, 62 + NETDEV_XSK_FLAGS_TX_CHECKSUM = 2, 54 63 }; 55 64 56 65 enum { ··· 68 59 NETDEV_A_DEV_XDP_FEATURES, 69 60 NETDEV_A_DEV_XDP_ZC_MAX_SEGS, 70 61 NETDEV_A_DEV_XDP_RX_METADATA_FEATURES, 62 + NETDEV_A_DEV_XSK_FEATURES, 71 63 72 64 __NETDEV_A_DEV_MAX, 73 65 NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
+66 -17
kernel/bpf/verifier.c
··· 339 339 340 340 struct btf *btf_vmlinux; 341 341 342 + static const char *btf_type_name(const struct btf *btf, u32 id) 343 + { 344 + return btf_name_by_offset(btf, btf_type_by_id(btf, id)->name_off); 345 + } 346 + 342 347 static DEFINE_MUTEX(bpf_verifier_lock); 343 348 static DEFINE_MUTEX(bpf_percpu_ma_lock); 344 349 ··· 421 416 struct bpf_func_info_aux *aux = env->prog->aux->func_info_aux; 422 417 423 418 return aux && aux[subprog].linkage == BTF_FUNC_GLOBAL; 419 + } 420 + 421 + static const char *subprog_name(const struct bpf_verifier_env *env, int subprog) 422 + { 423 + struct bpf_func_info *info; 424 + 425 + if (!env->prog->aux->func_info) 426 + return ""; 427 + 428 + info = &env->prog->aux->func_info[subprog]; 429 + return btf_type_name(env->prog->aux->btf, info->type_id); 430 + } 431 + 432 + static struct bpf_func_info_aux *subprog_aux(const struct bpf_verifier_env *env, int subprog) 433 + { 434 + return &env->prog->aux->func_info_aux[subprog]; 424 435 } 425 436 426 437 static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) ··· 606 585 static int iter_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int nr_slots) 607 586 { 608 587 return stack_slot_obj_get_spi(env, reg, "iter", nr_slots); 609 - } 610 - 611 - static const char *btf_type_name(const struct btf *btf, u32 id) 612 - { 613 - return btf_name_by_offset(btf, btf_type_by_id(btf, id)->name_off); 614 588 } 615 589 616 590 static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type) ··· 9285 9269 if (err == -EFAULT) 9286 9270 return err; 9287 9271 if (subprog_is_global(env, subprog)) { 9272 + const char *sub_name = subprog_name(env, subprog); 9273 + 9288 9274 if (err) { 9289 - verbose(env, "Caller passes invalid args into func#%d\n", subprog); 9275 + verbose(env, "Caller passes invalid args into func#%d ('%s')\n", 9276 + subprog, sub_name); 9290 9277 return err; 9291 9278 } 9292 9279 9293 - if (env->log.level & BPF_LOG_LEVEL) 9294 - verbose(env, "Func#%d 
is global and valid. Skipping.\n", subprog); 9280 + verbose(env, "Func#%d ('%s') is global and assumed valid.\n", 9281 + subprog, sub_name); 9282 + /* mark global subprog for verifying after main prog */ 9283 + subprog_aux(env, subprog)->called = true; 9295 9284 clear_caller_saved_regs(env, caller->regs); 9296 9285 9297 9286 /* All global functions return a 64-bit SCALAR_VALUE */ ··· 19880 19859 return ret; 19881 19860 } 19882 19861 19883 - /* Verify all global functions in a BPF program one by one based on their BTF. 19884 - * All global functions must pass verification. Otherwise the whole program is rejected. 19862 + /* Lazily verify all global functions based on their BTF, if they are called 19863 + * from main BPF program or any of subprograms transitively. 19864 + * BPF global subprogs called from dead code are not validated. 19865 + * All callable global functions must pass verification. 19866 + * Otherwise the whole program is rejected. 19885 19867 * Consider: 19886 19868 * int bar(int); 19887 19869 * int foo(int f) ··· 19903 19879 static int do_check_subprogs(struct bpf_verifier_env *env) 19904 19880 { 19905 19881 struct bpf_prog_aux *aux = env->prog->aux; 19906 - int i, ret; 19882 + struct bpf_func_info_aux *sub_aux; 19883 + int i, ret, new_cnt; 19907 19884 19908 19885 if (!aux->func_info) 19909 19886 return 0; 19910 19887 19888 + /* exception callback is presumed to be always called */ 19889 + if (env->exception_callback_subprog) 19890 + subprog_aux(env, env->exception_callback_subprog)->called = true; 19891 + 19892 + again: 19893 + new_cnt = 0; 19911 19894 for (i = 1; i < env->subprog_cnt; i++) { 19912 - if (aux->func_info_aux[i].linkage != BTF_FUNC_GLOBAL) 19895 + if (!subprog_is_global(env, i)) 19913 19896 continue; 19897 + 19898 + sub_aux = subprog_aux(env, i); 19899 + if (!sub_aux->called || sub_aux->verified) 19900 + continue; 19901 + 19914 19902 env->insn_idx = env->subprog_info[i].start; 19915 19903 WARN_ON_ONCE(env->insn_idx == 0); 19916 19904 
ret = do_check_common(env, i, env->exception_callback_subprog == i); 19917 19905 if (ret) { 19918 19906 return ret; 19919 19907 } else if (env->log.level & BPF_LOG_LEVEL) { 19920 - verbose(env, 19921 - "Func#%d is safe for any args that match its prototype\n", 19922 - i); 19908 + verbose(env, "Func#%d ('%s') is safe for any args that match its prototype\n", 19909 + i, subprog_name(env, i)); 19923 19910 } 19911 + 19912 + /* We verified new global subprog, it might have called some 19913 + * more global subprogs that we haven't verified yet, so we 19914 + * need to do another pass over subprogs to verify those. 19915 + */ 19916 + sub_aux->verified = true; 19917 + new_cnt++; 19924 19918 } 19919 + 19920 + /* We can't loop forever as we verify at least one global subprog on 19921 + * each pass. 19922 + */ 19923 + if (new_cnt) 19924 + goto again; 19925 + 19925 19926 return 0; 19926 19927 } 19927 19928 ··· 20592 20543 if (ret < 0) 20593 20544 goto skip_full_check; 20594 20545 20595 - ret = do_check_subprogs(env); 20596 - ret = ret ?: do_check_main(env); 20546 + ret = do_check_main(env); 20547 + ret = ret ?: do_check_subprogs(env); 20597 20548 20598 20549 if (ret == 0 && bpf_prog_is_offloaded(env->prog->aux)) 20599 20550 ret = bpf_prog_offload_finalize(env);
+75 -11
kernel/trace/bpf_trace.c
··· 3033 3033 struct bpf_uprobe { 3034 3034 struct bpf_uprobe_multi_link *link; 3035 3035 loff_t offset; 3036 + unsigned long ref_ctr_offset; 3036 3037 u64 cookie; 3037 3038 struct uprobe_consumer consumer; 3038 3039 }; ··· 3042 3041 struct path path; 3043 3042 struct bpf_link link; 3044 3043 u32 cnt; 3044 + u32 flags; 3045 3045 struct bpf_uprobe *uprobes; 3046 3046 struct task_struct *task; 3047 3047 }; ··· 3084 3082 kfree(umulti_link); 3085 3083 } 3086 3084 3085 + static int bpf_uprobe_multi_link_fill_link_info(const struct bpf_link *link, 3086 + struct bpf_link_info *info) 3087 + { 3088 + u64 __user *uref_ctr_offsets = u64_to_user_ptr(info->uprobe_multi.ref_ctr_offsets); 3089 + u64 __user *ucookies = u64_to_user_ptr(info->uprobe_multi.cookies); 3090 + u64 __user *uoffsets = u64_to_user_ptr(info->uprobe_multi.offsets); 3091 + u64 __user *upath = u64_to_user_ptr(info->uprobe_multi.path); 3092 + u32 upath_size = info->uprobe_multi.path_size; 3093 + struct bpf_uprobe_multi_link *umulti_link; 3094 + u32 ucount = info->uprobe_multi.count; 3095 + int err = 0, i; 3096 + long left; 3097 + 3098 + if (!upath ^ !upath_size) 3099 + return -EINVAL; 3100 + 3101 + if ((uoffsets || uref_ctr_offsets || ucookies) && !ucount) 3102 + return -EINVAL; 3103 + 3104 + umulti_link = container_of(link, struct bpf_uprobe_multi_link, link); 3105 + info->uprobe_multi.count = umulti_link->cnt; 3106 + info->uprobe_multi.flags = umulti_link->flags; 3107 + info->uprobe_multi.pid = umulti_link->task ? 
3108 + task_pid_nr_ns(umulti_link->task, task_active_pid_ns(current)) : 0; 3109 + 3110 + if (upath) { 3111 + char *p, *buf; 3112 + 3113 + upath_size = min_t(u32, upath_size, PATH_MAX); 3114 + 3115 + buf = kmalloc(upath_size, GFP_KERNEL); 3116 + if (!buf) 3117 + return -ENOMEM; 3118 + p = d_path(&umulti_link->path, buf, upath_size); 3119 + if (IS_ERR(p)) { 3120 + kfree(buf); 3121 + return PTR_ERR(p); 3122 + } 3123 + upath_size = buf + upath_size - p; 3124 + left = copy_to_user(upath, p, upath_size); 3125 + kfree(buf); 3126 + if (left) 3127 + return -EFAULT; 3128 + info->uprobe_multi.path_size = upath_size; 3129 + } 3130 + 3131 + if (!uoffsets && !ucookies && !uref_ctr_offsets) 3132 + return 0; 3133 + 3134 + if (ucount < umulti_link->cnt) 3135 + err = -ENOSPC; 3136 + else 3137 + ucount = umulti_link->cnt; 3138 + 3139 + for (i = 0; i < ucount; i++) { 3140 + if (uoffsets && 3141 + put_user(umulti_link->uprobes[i].offset, uoffsets + i)) 3142 + return -EFAULT; 3143 + if (uref_ctr_offsets && 3144 + put_user(umulti_link->uprobes[i].ref_ctr_offset, uref_ctr_offsets + i)) 3145 + return -EFAULT; 3146 + if (ucookies && 3147 + put_user(umulti_link->uprobes[i].cookie, ucookies + i)) 3148 + return -EFAULT; 3149 + } 3150 + 3151 + return err; 3152 + } 3153 + 3087 3154 static const struct bpf_link_ops bpf_uprobe_multi_link_lops = { 3088 3155 .release = bpf_uprobe_multi_link_release, 3089 3156 .dealloc = bpf_uprobe_multi_link_dealloc, 3157 + .fill_link_info = bpf_uprobe_multi_link_fill_link_info, 3090 3158 }; 3091 3159 3092 3160 static int uprobe_prog_run(struct bpf_uprobe *uprobe, ··· 3244 3172 { 3245 3173 struct bpf_uprobe_multi_link *link = NULL; 3246 3174 unsigned long __user *uref_ctr_offsets; 3247 - unsigned long *ref_ctr_offsets = NULL; 3248 3175 struct bpf_link_primer link_primer; 3249 3176 struct bpf_uprobe *uprobes = NULL; 3250 3177 struct task_struct *task = NULL; ··· 3316 3245 if (!uprobes || !link) 3317 3246 goto error_free; 3318 3247 3319 - if (uref_ctr_offsets) { 3320 
- ref_ctr_offsets = kvcalloc(cnt, sizeof(*ref_ctr_offsets), GFP_KERNEL); 3321 - if (!ref_ctr_offsets) 3322 - goto error_free; 3323 - } 3324 - 3325 3248 for (i = 0; i < cnt; i++) { 3326 3249 if (ucookies && __get_user(uprobes[i].cookie, ucookies + i)) { 3327 3250 err = -EFAULT; 3328 3251 goto error_free; 3329 3252 } 3330 - if (uref_ctr_offsets && __get_user(ref_ctr_offsets[i], uref_ctr_offsets + i)) { 3253 + if (uref_ctr_offsets && __get_user(uprobes[i].ref_ctr_offset, uref_ctr_offsets + i)) { 3331 3254 err = -EFAULT; 3332 3255 goto error_free; 3333 3256 } ··· 3345 3280 link->uprobes = uprobes; 3346 3281 link->path = path; 3347 3282 link->task = task; 3283 + link->flags = flags; 3348 3284 3349 3285 bpf_link_init(&link->link, BPF_LINK_TYPE_UPROBE_MULTI, 3350 3286 &bpf_uprobe_multi_link_lops, prog); ··· 3353 3287 for (i = 0; i < cnt; i++) { 3354 3288 err = uprobe_register_refctr(d_real_inode(link->path.dentry), 3355 3289 uprobes[i].offset, 3356 - ref_ctr_offsets ? ref_ctr_offsets[i] : 0, 3290 + uprobes[i].ref_ctr_offset, 3357 3291 &uprobes[i].consumer); 3358 3292 if (err) { 3359 3293 bpf_uprobe_unregister(&path, uprobes, i); ··· 3365 3299 if (err) 3366 3300 goto error_free; 3367 3301 3368 - kvfree(ref_ctr_offsets); 3369 3302 return bpf_link_settle(&link_primer); 3370 3303 3371 3304 error_free: 3372 - kvfree(ref_ctr_offsets); 3373 3305 kvfree(uprobes); 3374 3306 kfree(link); 3375 3307 if (task)
-2
lib/test_bpf.c
··· 12199 12199 BPF_JMP32_IMM_ZEXT(JLE), 12200 12200 BPF_JMP32_IMM_ZEXT(JSGT), 12201 12201 BPF_JMP32_IMM_ZEXT(JSGE), 12202 - BPF_JMP32_IMM_ZEXT(JSGT), 12203 12202 BPF_JMP32_IMM_ZEXT(JSLT), 12204 12203 BPF_JMP32_IMM_ZEXT(JSLE), 12205 12204 #undef BPF_JMP32_IMM_ZEXT ··· 12234 12235 BPF_JMP32_REG_ZEXT(JLE), 12235 12236 BPF_JMP32_REG_ZEXT(JSGT), 12236 12237 BPF_JMP32_REG_ZEXT(JSGE), 12237 - BPF_JMP32_REG_ZEXT(JSGT), 12238 12238 BPF_JMP32_REG_ZEXT(JSLT), 12239 12239 BPF_JMP32_REG_ZEXT(JSLE), 12240 12240 #undef BPF_JMP32_REG_ZEXT
+1 -1
net/bpf/test_run.c
··· 542 542 543 543 int noinline bpf_fentry_test7(struct bpf_fentry_test_t *arg) 544 544 { 545 - asm volatile (""); 545 + asm volatile ("": "+r"(arg)); 546 546 return (long)arg; 547 547 } 548 548
+12 -1
net/core/netdev-genl.c
··· 6 6 #include <net/net_namespace.h> 7 7 #include <net/sock.h> 8 8 #include <net/xdp.h> 9 + #include <net/xdp_sock.h> 9 10 10 11 #include "netdev-genl-gen.h" 11 12 ··· 14 13 netdev_nl_dev_fill(struct net_device *netdev, struct sk_buff *rsp, 15 14 const struct genl_info *info) 16 15 { 16 + u64 xsk_features = 0; 17 17 u64 xdp_rx_meta = 0; 18 18 void *hdr; 19 19 ··· 28 26 XDP_METADATA_KFUNC_xxx 29 27 #undef XDP_METADATA_KFUNC 30 28 29 + if (netdev->xsk_tx_metadata_ops) { 30 + if (netdev->xsk_tx_metadata_ops->tmo_fill_timestamp) 31 + xsk_features |= NETDEV_XSK_FLAGS_TX_TIMESTAMP; 32 + if (netdev->xsk_tx_metadata_ops->tmo_request_checksum) 33 + xsk_features |= NETDEV_XSK_FLAGS_TX_CHECKSUM; 34 + } 35 + 31 36 if (nla_put_u32(rsp, NETDEV_A_DEV_IFINDEX, netdev->ifindex) || 32 37 nla_put_u64_64bit(rsp, NETDEV_A_DEV_XDP_FEATURES, 33 38 netdev->xdp_features, NETDEV_A_DEV_PAD) || 34 39 nla_put_u64_64bit(rsp, NETDEV_A_DEV_XDP_RX_METADATA_FEATURES, 35 - xdp_rx_meta, NETDEV_A_DEV_PAD)) { 40 + xdp_rx_meta, NETDEV_A_DEV_PAD) || 41 + nla_put_u64_64bit(rsp, NETDEV_A_DEV_XSK_FEATURES, 42 + xsk_features, NETDEV_A_DEV_PAD)) { 36 43 genlmsg_cancel(rsp, hdr); 37 44 return -EINVAL; 38 45 }
+10 -1
net/xdp/xdp_umem.c
··· 148 148 return 0; 149 149 } 150 150 151 + #define XDP_UMEM_FLAGS_VALID ( \ 152 + XDP_UMEM_UNALIGNED_CHUNK_FLAG | \ 153 + XDP_UMEM_TX_SW_CSUM | \ 154 + 0) 155 + 151 156 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) 152 157 { 153 158 bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG; ··· 172 167 return -EINVAL; 173 168 } 174 169 175 - if (mr->flags & ~XDP_UMEM_UNALIGNED_CHUNK_FLAG) 170 + if (mr->flags & ~XDP_UMEM_FLAGS_VALID) 176 171 return -EINVAL; 177 172 178 173 if (!unaligned_chunks && !is_power_of_2(chunk_size)) ··· 204 199 if (headroom >= chunk_size - XDP_PACKET_HEADROOM) 205 200 return -EINVAL; 206 201 202 + if (mr->tx_metadata_len >= 256 || mr->tx_metadata_len % 8) 203 + return -EINVAL; 204 + 207 205 umem->size = size; 208 206 umem->headroom = headroom; 209 207 umem->chunk_size = chunk_size; ··· 215 207 umem->pgs = NULL; 216 208 umem->user = NULL; 217 209 umem->flags = mr->flags; 210 + umem->tx_metadata_len = mr->tx_metadata_len; 218 211 219 212 INIT_LIST_HEAD(&umem->xsk_dma_list); 220 213 refcount_set(&umem->users, 1);
+55 -1
net/xdp/xsk.c
··· 571 571 572 572 static void xsk_destruct_skb(struct sk_buff *skb) 573 573 { 574 + struct xsk_tx_metadata_compl *compl = &skb_shinfo(skb)->xsk_meta; 575 + 576 + if (compl->tx_timestamp) { 577 + /* sw completion timestamp, not a real one */ 578 + *compl->tx_timestamp = ktime_get_tai_fast_ns(); 579 + } 580 + 574 581 xsk_cq_submit_locked(xdp_sk(skb->sk), xsk_get_num_desc(skb)); 575 582 sock_wfree(skb); 576 583 } ··· 662 655 static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, 663 656 struct xdp_desc *desc) 664 657 { 658 + struct xsk_tx_metadata *meta = NULL; 665 659 struct net_device *dev = xs->dev; 666 660 struct sk_buff *skb = xs->skb; 661 + bool first_frag = false; 667 662 int err; 668 663 669 664 if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { ··· 696 687 kfree_skb(skb); 697 688 goto free_err; 698 689 } 690 + 691 + first_frag = true; 699 692 } else { 700 693 int nr_frags = skb_shinfo(skb)->nr_frags; 701 694 struct page *page; ··· 720 709 721 710 skb_add_rx_frag(skb, nr_frags, page, 0, len, 0); 722 711 } 712 + 713 + if (first_frag && desc->options & XDP_TX_METADATA) { 714 + if (unlikely(xs->pool->tx_metadata_len == 0)) { 715 + err = -EINVAL; 716 + goto free_err; 717 + } 718 + 719 + meta = buffer - xs->pool->tx_metadata_len; 720 + if (unlikely(!xsk_buff_valid_tx_metadata(meta))) { 721 + err = -EINVAL; 722 + goto free_err; 723 + } 724 + 725 + if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) { 726 + if (unlikely(meta->request.csum_start + 727 + meta->request.csum_offset + 728 + sizeof(__sum16) > len)) { 729 + err = -EINVAL; 730 + goto free_err; 731 + } 732 + 733 + skb->csum_start = hr + meta->request.csum_start; 734 + skb->csum_offset = meta->request.csum_offset; 735 + skb->ip_summed = CHECKSUM_PARTIAL; 736 + 737 + if (unlikely(xs->pool->tx_sw_csum)) { 738 + err = skb_checksum_help(skb); 739 + if (err) 740 + goto free_err; 741 + } 742 + } 743 + } 723 744 } 724 745 725 746 skb->dev = dev; 726 747 skb->priority = READ_ONCE(xs->sk.sk_priority); 727 748 skb->mark = 
READ_ONCE(xs->sk.sk_mark); 728 749 skb->destructor = xsk_destruct_skb; 750 + xsk_tx_metadata_to_compl(meta, &skb_shinfo(skb)->xsk_meta); 729 751 xsk_set_destructor_arg(skb); 730 752 731 753 return skb; ··· 1327 1283 __u32 headroom; 1328 1284 }; 1329 1285 1286 + struct xdp_umem_reg_v2 { 1287 + __u64 addr; /* Start of packet data area */ 1288 + __u64 len; /* Length of packet data area */ 1289 + __u32 chunk_size; 1290 + __u32 headroom; 1291 + __u32 flags; 1292 + }; 1293 + 1330 1294 static int xsk_setsockopt(struct socket *sock, int level, int optname, 1331 1295 sockptr_t optval, unsigned int optlen) 1332 1296 { ··· 1378 1326 1379 1327 if (optlen < sizeof(struct xdp_umem_reg_v1)) 1380 1328 return -EINVAL; 1381 - else if (optlen < sizeof(mr)) 1329 + else if (optlen < sizeof(struct xdp_umem_reg_v2)) 1382 1330 mr_size = sizeof(struct xdp_umem_reg_v1); 1331 + else if (optlen < sizeof(mr)) 1332 + mr_size = sizeof(struct xdp_umem_reg_v2); 1383 1333 1384 1334 if (copy_from_sockptr(&mr, optval, mr_size)) 1385 1335 return -EFAULT;
+2
net/xdp/xsk_buff_pool.c
··· 85 85 XDP_PACKET_HEADROOM; 86 86 pool->umem = umem; 87 87 pool->addrs = umem->addrs; 88 + pool->tx_metadata_len = umem->tx_metadata_len; 89 + pool->tx_sw_csum = umem->flags & XDP_UMEM_TX_SW_CSUM; 88 90 INIT_LIST_HEAD(&pool->free_list); 89 91 INIT_LIST_HEAD(&pool->xskb_list); 90 92 INIT_LIST_HEAD(&pool->xsk_tx_list);
+11 -8
net/xdp/xsk_queue.h
··· 137 137 138 138 static inline bool xp_unused_options_set(u32 options) 139 139 { 140 - return options & ~XDP_PKT_CONTD; 140 + return options & ~(XDP_PKT_CONTD | XDP_TX_METADATA); 141 141 } 142 142 143 143 static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, 144 144 struct xdp_desc *desc) 145 145 { 146 - u64 offset = desc->addr & (pool->chunk_size - 1); 146 + u64 addr = desc->addr - pool->tx_metadata_len; 147 + u64 len = desc->len + pool->tx_metadata_len; 148 + u64 offset = addr & (pool->chunk_size - 1); 147 149 148 150 if (!desc->len) 149 151 return false; 150 152 151 - if (offset + desc->len > pool->chunk_size) 153 + if (offset + len > pool->chunk_size) 152 154 return false; 153 155 154 - if (desc->addr >= pool->addrs_cnt) 156 + if (addr >= pool->addrs_cnt) 155 157 return false; 156 158 157 159 if (xp_unused_options_set(desc->options)) ··· 164 162 static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool, 165 163 struct xdp_desc *desc) 166 164 { 167 - u64 addr = xp_unaligned_add_offset_to_addr(desc->addr); 165 + u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len; 166 + u64 len = desc->len + pool->tx_metadata_len; 168 167 169 168 if (!desc->len) 170 169 return false; 171 170 172 - if (desc->len > pool->chunk_size) 171 + if (len > pool->chunk_size) 173 172 return false; 174 173 175 - if (addr >= pool->addrs_cnt || addr + desc->len > pool->addrs_cnt || 176 - xp_desc_crosses_non_contig_pg(pool, addr, desc->len)) 174 + if (addr >= pool->addrs_cnt || addr + len > pool->addrs_cnt || 175 + xp_desc_crosses_non_contig_pg(pool, addr, len)) 177 176 return false; 178 177 179 178 if (xp_unused_options_set(desc->options))
+103 -2
tools/bpf/bpftool/link.c
··· 294 294 jsonw_end_array(json_wtr); 295 295 } 296 296 297 + static __u64 *u64_to_arr(__u64 val) 298 + { 299 + return (__u64 *) u64_to_ptr(val); 300 + } 301 + 302 + static void 303 + show_uprobe_multi_json(struct bpf_link_info *info, json_writer_t *wtr) 304 + { 305 + __u32 i; 306 + 307 + jsonw_bool_field(json_wtr, "retprobe", 308 + info->uprobe_multi.flags & BPF_F_UPROBE_MULTI_RETURN); 309 + jsonw_string_field(json_wtr, "path", (char *) u64_to_ptr(info->uprobe_multi.path)); 310 + jsonw_uint_field(json_wtr, "func_cnt", info->uprobe_multi.count); 311 + jsonw_int_field(json_wtr, "pid", (int) info->uprobe_multi.pid); 312 + jsonw_name(json_wtr, "funcs"); 313 + jsonw_start_array(json_wtr); 314 + 315 + for (i = 0; i < info->uprobe_multi.count; i++) { 316 + jsonw_start_object(json_wtr); 317 + jsonw_uint_field(json_wtr, "offset", 318 + u64_to_arr(info->uprobe_multi.offsets)[i]); 319 + jsonw_uint_field(json_wtr, "ref_ctr_offset", 320 + u64_to_arr(info->uprobe_multi.ref_ctr_offsets)[i]); 321 + jsonw_uint_field(json_wtr, "cookie", 322 + u64_to_arr(info->uprobe_multi.cookies)[i]); 323 + jsonw_end_object(json_wtr); 324 + } 325 + jsonw_end_array(json_wtr); 326 + } 327 + 297 328 static void 298 329 show_perf_event_kprobe_json(struct bpf_link_info *info, json_writer_t *wtr) 299 330 { ··· 495 464 break; 496 465 case BPF_LINK_TYPE_KPROBE_MULTI: 497 466 show_kprobe_multi_json(info, json_wtr); 467 + break; 468 + case BPF_LINK_TYPE_UPROBE_MULTI: 469 + show_uprobe_multi_json(info, json_wtr); 498 470 break; 499 471 case BPF_LINK_TYPE_PERF_EVENT: 500 472 switch (info->perf_event.type) { ··· 708 674 } 709 675 } 710 676 677 + static void show_uprobe_multi_plain(struct bpf_link_info *info) 678 + { 679 + __u32 i; 680 + 681 + if (!info->uprobe_multi.count) 682 + return; 683 + 684 + if (info->uprobe_multi.flags & BPF_F_UPROBE_MULTI_RETURN) 685 + printf("\n\turetprobe.multi "); 686 + else 687 + printf("\n\tuprobe.multi "); 688 + 689 + printf("path %s ", (char *) 
u64_to_ptr(info->uprobe_multi.path)); 690 + printf("func_cnt %u ", info->uprobe_multi.count); 691 + 692 + if (info->uprobe_multi.pid) 693 + printf("pid %d ", info->uprobe_multi.pid); 694 + 695 + printf("\n\t%-16s %-16s %-16s", "offset", "ref_ctr_offset", "cookies"); 696 + for (i = 0; i < info->uprobe_multi.count; i++) { 697 + printf("\n\t0x%-16llx 0x%-16llx 0x%-16llx", 698 + u64_to_arr(info->uprobe_multi.offsets)[i], 699 + u64_to_arr(info->uprobe_multi.ref_ctr_offsets)[i], 700 + u64_to_arr(info->uprobe_multi.cookies)[i]); 701 + } 702 + } 703 + 711 704 static void show_perf_event_kprobe_plain(struct bpf_link_info *info) 712 705 { 713 706 const char *buf; ··· 868 807 case BPF_LINK_TYPE_KPROBE_MULTI: 869 808 show_kprobe_multi_plain(info); 870 809 break; 810 + case BPF_LINK_TYPE_UPROBE_MULTI: 811 + show_uprobe_multi_plain(info); 812 + break; 871 813 case BPF_LINK_TYPE_PERF_EVENT: 872 814 switch (info->perf_event.type) { 873 815 case BPF_PERF_EVENT_EVENT: ··· 910 846 911 847 static int do_show_link(int fd) 912 848 { 849 + __u64 *ref_ctr_offsets = NULL, *offsets = NULL, *cookies = NULL; 913 850 struct bpf_link_info info; 914 851 __u32 len = sizeof(info); 852 + char path_buf[PATH_MAX]; 915 853 __u64 *addrs = NULL; 916 854 char buf[PATH_MAX]; 917 855 int count; ··· 955 889 goto again; 956 890 } 957 891 } 892 + if (info.type == BPF_LINK_TYPE_UPROBE_MULTI && 893 + !info.uprobe_multi.offsets) { 894 + count = info.uprobe_multi.count; 895 + if (count) { 896 + offsets = calloc(count, sizeof(__u64)); 897 + if (!offsets) { 898 + p_err("mem alloc failed"); 899 + close(fd); 900 + return -ENOMEM; 901 + } 902 + info.uprobe_multi.offsets = ptr_to_u64(offsets); 903 + ref_ctr_offsets = calloc(count, sizeof(__u64)); 904 + if (!ref_ctr_offsets) { 905 + p_err("mem alloc failed"); 906 + free(offsets); 907 + close(fd); 908 + return -ENOMEM; 909 + } 910 + info.uprobe_multi.ref_ctr_offsets = ptr_to_u64(ref_ctr_offsets); 911 + cookies = calloc(count, sizeof(__u64)); 912 + if (!cookies) { 913 + 
p_err("mem alloc failed"); 914 + free(ref_ctr_offsets); 915 + free(offsets); 916 + close(fd); 917 + return -ENOMEM; 918 + } 919 + info.uprobe_multi.cookies = ptr_to_u64(cookies); 920 + info.uprobe_multi.path = ptr_to_u64(path_buf); 921 + info.uprobe_multi.path_size = sizeof(path_buf); 922 + goto again; 923 + } 924 + } 958 925 if (info.type == BPF_LINK_TYPE_PERF_EVENT) { 959 926 switch (info.perf_event.type) { 960 927 case BPF_PERF_EVENT_TRACEPOINT: ··· 1023 924 else 1024 925 show_link_close_plain(fd, &info); 1025 926 1026 - if (addrs) 1027 - free(addrs); 927 + free(ref_ctr_offsets); 928 + free(cookies); 929 + free(offsets); 930 + free(addrs); 1028 931 close(fd); 1029 932 return 0; 1030 933 }
+9 -5
tools/bpf/bpftool/prog.c
··· 442 442 jsonw_uint_field(json_wtr, "recursion_misses", info->recursion_misses); 443 443 } 444 444 445 - static void print_prog_json(struct bpf_prog_info *info, int fd) 445 + static void print_prog_json(struct bpf_prog_info *info, int fd, bool orphaned) 446 446 { 447 447 char *memlock; 448 448 ··· 461 461 jsonw_uint_field(json_wtr, "uid", info->created_by_uid); 462 462 } 463 463 464 + jsonw_bool_field(json_wtr, "orphaned", orphaned); 464 465 jsonw_uint_field(json_wtr, "bytes_xlated", info->xlated_prog_len); 465 466 466 467 if (info->jited_prog_len) { ··· 528 527 printf("\n"); 529 528 } 530 529 531 - static void print_prog_plain(struct bpf_prog_info *info, int fd) 530 + static void print_prog_plain(struct bpf_prog_info *info, int fd, bool orphaned) 532 531 { 533 532 char *memlock; 534 533 ··· 554 553 if (memlock) 555 554 printf(" memlock %sB", memlock); 556 555 free(memlock); 556 + 557 + if (orphaned) 558 + printf(" orphaned"); 557 559 558 560 if (info->nr_map_ids) 559 561 show_prog_maps(fd, info->nr_map_ids); ··· 585 581 int err; 586 582 587 583 err = bpf_prog_get_info_by_fd(fd, &info, &len); 588 - if (err) { 584 + if (err && err != -ENODEV) { 589 585 p_err("can't get prog info: %s", strerror(errno)); 590 586 return -1; 591 587 } 592 588 593 589 if (json_output) 594 - print_prog_json(&info, fd); 590 + print_prog_json(&info, fd, err == -ENODEV); 595 591 else 596 - print_prog_plain(&info, fd); 592 + print_prog_plain(&info, fd, err == -ENODEV); 597 593 598 594 return 0; 599 595 }
+10
tools/include/uapi/linux/bpf.h
··· 6563 6563 __u64 missed; 6564 6564 } kprobe_multi; 6565 6565 struct { 6566 + __aligned_u64 path; 6567 + __aligned_u64 offsets; 6568 + __aligned_u64 ref_ctr_offsets; 6569 + __aligned_u64 cookies; 6570 + __u32 path_size; /* in/out: real path size on success, including zero byte */ 6571 + __u32 count; /* in/out: uprobe_multi offsets/ref_ctr_offsets/cookies count */ 6572 + __u32 flags; 6573 + __u32 pid; 6574 + } uprobe_multi; 6575 + struct { 6566 6576 __u32 type; /* enum bpf_perf_event_type */ 6567 6577 __u32 :32; 6568 6578 union {
+55 -6
tools/include/uapi/linux/if_xdp.h
··· 26 26 */ 27 27 #define XDP_USE_NEED_WAKEUP (1 << 3) 28 28 /* By setting this option, userspace application indicates that it can 29 - * handle multiple descriptors per packet thus enabling xsk core to split 29 + * handle multiple descriptors per packet thus enabling AF_XDP to split 30 30 * multi-buffer XDP frames into multiple Rx descriptors. Without this set 31 - * such frames will be dropped by xsk. 31 + * such frames will be dropped. 32 32 */ 33 - #define XDP_USE_SG (1 << 4) 33 + #define XDP_USE_SG (1 << 4) 34 34 35 35 /* Flags for xsk_umem_config flags */ 36 - #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0) 36 + #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0) 37 + 38 + /* Force checksum calculation in software. Can be used for testing or 39 + * working around potential HW issues. This option causes performance 40 + * degradation and only works in XDP_COPY mode. 41 + */ 42 + #define XDP_UMEM_TX_SW_CSUM (1 << 1) 37 43 38 44 struct sockaddr_xdp { 39 45 __u16 sxdp_family; ··· 82 76 __u32 chunk_size; 83 77 __u32 headroom; 84 78 __u32 flags; 79 + __u32 tx_metadata_len; 85 80 }; 86 81 87 82 struct xdp_statistics { ··· 112 105 #define XSK_UNALIGNED_BUF_ADDR_MASK \ 113 106 ((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1) 114 107 108 + /* Request transmit timestamp. Upon completion, put it into tx_timestamp 109 + * field of struct xsk_tx_metadata. 110 + */ 111 + #define XDP_TXMD_FLAGS_TIMESTAMP (1 << 0) 112 + 113 + /* Request transmit checksum offload. Checksum start position and offset 114 + * are communicated via csum_start and csum_offset fields of struct 115 + * xsk_tx_metadata. 116 + */ 117 + #define XDP_TXMD_FLAGS_CHECKSUM (1 << 1) 118 + 119 + /* AF_XDP offloads request. 'request' union member is consumed by the driver 120 + * when the packet is being transmitted. 'completion' union member is 121 + * filled by the driver when the transmit completion arrives. 
122 + */ 123 + struct xsk_tx_metadata { 124 + __u64 flags; 125 + 126 + union { 127 + struct { 128 + /* XDP_TXMD_FLAGS_CHECKSUM */ 129 + 130 + /* Offset from desc->addr where checksumming should start. */ 131 + __u16 csum_start; 132 + /* Offset from csum_start where checksum should be stored. */ 133 + __u16 csum_offset; 134 + } request; 135 + 136 + struct { 137 + /* XDP_TXMD_FLAGS_TIMESTAMP */ 138 + __u64 tx_timestamp; 139 + } completion; 140 + }; 141 + }; 142 + 115 143 /* Rx/Tx descriptor */ 116 144 struct xdp_desc { 117 145 __u64 addr; ··· 154 112 __u32 options; 155 113 }; 156 114 157 - /* Flag indicating packet constitutes of multiple buffers*/ 115 + /* UMEM descriptor is __u64 */ 116 + 117 + /* Flag indicating that the packet continues with the buffer pointed out by the 118 + * next frame in the ring. The end of the packet is signalled by setting this 119 + * bit to zero. For single buffer packets, every descriptor has 'options' set 120 + * to 0 and this maintains backward compatibility. 121 + */ 158 122 #define XDP_PKT_CONTD (1 << 0) 159 123 160 - /* UMEM descriptor is __u64 */ 124 + /* TX packet carries valid metadata. */ 125 + #define XDP_TX_METADATA (1 << 1) 161 126 162 127 #endif /* _LINUX_IF_XDP_H */
+12 -2
tools/include/uapi/linux/netdev.h
··· 48 48 enum netdev_xdp_rx_metadata { 49 49 NETDEV_XDP_RX_METADATA_TIMESTAMP = 1, 50 50 NETDEV_XDP_RX_METADATA_HASH = 2, 51 + }; 51 52 52 - /* private: */ 53 - NETDEV_XDP_RX_METADATA_MASK = 3, 53 + /** 54 + * enum netdev_xsk_flags 55 + * @NETDEV_XSK_FLAGS_TX_TIMESTAMP: HW timestamping egress packets is supported 56 + * by the driver. 57 + * @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the 58 + * driver. 59 + */ 60 + enum netdev_xsk_flags { 61 + NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1, 62 + NETDEV_XSK_FLAGS_TX_CHECKSUM = 2, 54 63 }; 55 64 56 65 enum { ··· 68 59 NETDEV_A_DEV_XDP_FEATURES, 69 60 NETDEV_A_DEV_XDP_ZC_MAX_SEGS, 70 61 NETDEV_A_DEV_XDP_RX_METADATA_FEATURES, 62 + NETDEV_A_DEV_XSK_FEATURES, 71 63 72 64 __NETDEV_A_DEV_MAX, 73 65 NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
+3 -2
tools/lib/bpf/elf.c
··· 407 407 * size, that needs to be released by the caller. 408 408 */ 409 409 int elf_resolve_syms_offsets(const char *binary_path, int cnt, 410 - const char **syms, unsigned long **poffsets) 410 + const char **syms, unsigned long **poffsets, 411 + int st_type) 411 412 { 412 413 int sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB }; 413 414 int err = 0, i, cnt_done = 0; ··· 439 438 struct elf_sym_iter iter; 440 439 struct elf_sym *sym; 441 440 442 - err = elf_sym_iter_new(&iter, elf_fd.elf, binary_path, sh_types[i], STT_FUNC); 441 + err = elf_sym_iter_new(&iter, elf_fd.elf, binary_path, sh_types[i], st_type); 443 442 if (err == -ENOENT) 444 443 continue; 445 444 if (err)
+1 -1
tools/lib/bpf/libbpf.c
··· 11447 11447 return libbpf_err_ptr(err); 11448 11448 offsets = resolved_offsets; 11449 11449 } else if (syms) { 11450 - err = elf_resolve_syms_offsets(path, cnt, syms, &resolved_offsets); 11450 + err = elf_resolve_syms_offsets(path, cnt, syms, &resolved_offsets, STT_FUNC); 11451 11451 if (err < 0) 11452 11452 return libbpf_err_ptr(err); 11453 11453 offsets = resolved_offsets;
+3
tools/lib/bpf/libbpf.map
··· 409 409 ring__size; 410 410 ring_buffer__ring; 411 411 } LIBBPF_1.2.0; 412 + 413 + LIBBPF_1.4.0 { 414 + } LIBBPF_1.3.0;
+2 -1
tools/lib/bpf/libbpf_internal.h
··· 594 594 void elf_close(struct elf_fd *elf_fd); 595 595 596 596 int elf_resolve_syms_offsets(const char *binary_path, int cnt, 597 - const char **syms, unsigned long **poffsets); 597 + const char **syms, unsigned long **poffsets, 598 + int st_type); 598 599 int elf_resolve_pattern_offsets(const char *binary_path, const char *pattern, 599 600 unsigned long **poffsets, size_t *pcnt); 600 601
+1 -1
tools/lib/bpf/libbpf_version.h
··· 4 4 #define __LIBBPF_VERSION_H 5 5 6 6 #define LIBBPF_MAJOR_VERSION 1 7 - #define LIBBPF_MINOR_VERSION 3 7 + #define LIBBPF_MINOR_VERSION 4 8 8 9 9 #endif /* __LIBBPF_VERSION_H */
+19
tools/net/ynl/generated/netdev-user.c
··· 63 63 return netdev_xdp_rx_metadata_strmap[value]; 64 64 } 65 65 66 + static const char * const netdev_xsk_flags_strmap[] = { 67 + [0] = "tx-timestamp", 68 + [1] = "tx-checksum", 69 + }; 70 + 71 + const char *netdev_xsk_flags_str(enum netdev_xsk_flags value) 72 + { 73 + value = ffs(value) - 1; 74 + if (value < 0 || value >= (int)MNL_ARRAY_SIZE(netdev_xsk_flags_strmap)) 75 + return NULL; 76 + return netdev_xsk_flags_strmap[value]; 77 + } 78 + 66 79 /* Policies */ 67 80 struct ynl_policy_attr netdev_page_pool_info_policy[NETDEV_A_PAGE_POOL_MAX + 1] = { 68 81 [NETDEV_A_PAGE_POOL_ID] = { .name = "id", .type = YNL_PT_UINT, }, ··· 93 80 [NETDEV_A_DEV_XDP_FEATURES] = { .name = "xdp-features", .type = YNL_PT_U64, }, 94 81 [NETDEV_A_DEV_XDP_ZC_MAX_SEGS] = { .name = "xdp-zc-max-segs", .type = YNL_PT_U32, }, 95 82 [NETDEV_A_DEV_XDP_RX_METADATA_FEATURES] = { .name = "xdp-rx-metadata-features", .type = YNL_PT_U64, }, 83 + [NETDEV_A_DEV_XSK_FEATURES] = { .name = "xsk-features", .type = YNL_PT_U64, }, 96 84 }; 97 85 98 86 struct ynl_policy_nest netdev_dev_nest = { ··· 223 209 return MNL_CB_ERROR; 224 210 dst->_present.xdp_rx_metadata_features = 1; 225 211 dst->xdp_rx_metadata_features = mnl_attr_get_u64(attr); 212 + } else if (type == NETDEV_A_DEV_XSK_FEATURES) { 213 + if (ynl_attr_validate(yarg, attr)) 214 + return MNL_CB_ERROR; 215 + dst->_present.xsk_features = 1; 216 + dst->xsk_features = mnl_attr_get_u64(attr); 226 217 } 227 218 } 228 219
+3
tools/net/ynl/generated/netdev-user.h
··· 19 19 const char *netdev_op_str(int op); 20 20 const char *netdev_xdp_act_str(enum netdev_xdp_act value); 21 21 const char *netdev_xdp_rx_metadata_str(enum netdev_xdp_rx_metadata value); 22 + const char *netdev_xsk_flags_str(enum netdev_xsk_flags value); 22 23 23 24 /* Common nested types */ 24 25 struct netdev_page_pool_info { ··· 61 60 __u32 xdp_features:1; 62 61 __u32 xdp_zc_max_segs:1; 63 62 __u32 xdp_rx_metadata_features:1; 63 + __u32 xsk_features:1; 64 64 } _present; 65 65 66 66 __u32 ifindex; 67 67 __u64 xdp_features; 68 68 __u32 xdp_zc_max_segs; 69 69 __u64 xdp_rx_metadata_features; 70 + __u64 xsk_features; 70 71 }; 71 72 72 73 void netdev_dev_get_rsp_free(struct netdev_dev_get_rsp *rsp);
+8 -2
tools/net/ynl/samples/netdev.c
··· 33 33 return; 34 34 35 35 printf("xdp-features (%llx):", d->xdp_features); 36 - for (int i = 0; d->xdp_features > 1U << i; i++) { 36 + for (int i = 0; d->xdp_features >= 1U << i; i++) { 37 37 if (d->xdp_features & (1U << i)) 38 38 printf(" %s", netdev_xdp_act_str(1 << i)); 39 39 } 40 40 41 41 printf(" xdp-rx-metadata-features (%llx):", d->xdp_rx_metadata_features); 42 - for (int i = 0; d->xdp_rx_metadata_features > 1U << i; i++) { 42 + for (int i = 0; d->xdp_rx_metadata_features >= 1U << i; i++) { 43 43 if (d->xdp_rx_metadata_features & (1U << i)) 44 44 printf(" %s", netdev_xdp_rx_metadata_str(1 << i)); 45 + } 46 + 47 + printf(" xsk-features (%llx):", d->xsk_features); 48 + for (int i = 0; d->xsk_features >= 1U << i; i++) { 49 + if (d->xsk_features & (1U << i)) 50 + printf(" %s", netdev_xsk_flags_str(1 << i)); 45 51 } 46 52 47 53 printf(" xdp-zc-max-segs=%u", d->xdp_zc_max_segs);
+9 -5
tools/testing/selftests/bpf/Makefile
··· 18 18 GENDIR := $(abspath ../../../../include/generated) 19 19 endif 20 20 GENHDR := $(GENDIR)/autoconf.h 21 - HOSTPKG_CONFIG := pkg-config 21 + PKG_CONFIG ?= $(CROSS_COMPILE)pkg-config 22 22 23 23 ifneq ($(wildcard $(GENHDR)),) 24 24 GENFLAGS := -DHAVE_GENHDR ··· 29 29 SAN_LDFLAGS ?= $(SAN_CFLAGS) 30 30 RELEASE ?= 31 31 OPT_FLAGS ?= $(if $(RELEASE),-O2,-O0) 32 + 33 + LIBELF_CFLAGS := $(shell $(PKG_CONFIG) libelf --cflags 2>/dev/null) 34 + LIBELF_LIBS := $(shell $(PKG_CONFIG) libelf --libs 2>/dev/null || echo -lelf) 35 + 32 36 CFLAGS += -g $(OPT_FLAGS) -rdynamic \ 33 37 -Wall -Werror \ 34 - $(GENFLAGS) $(SAN_CFLAGS) \ 38 + $(GENFLAGS) $(SAN_CFLAGS) $(LIBELF_CFLAGS) \ 35 39 -I$(CURDIR) -I$(INCLUDE_DIR) -I$(GENDIR) -I$(LIBDIR) \ 36 40 -I$(TOOLSINCDIR) -I$(APIDIR) -I$(OUTPUT) 37 41 LDFLAGS += $(SAN_LDFLAGS) 38 - LDLIBS += -lelf -lz -lrt -lpthread 42 + LDLIBS += $(LIBELF_LIBS) -lz -lrt -lpthread 39 43 40 44 ifneq ($(LLVM),) 41 45 # Silence some warnings when compiled with clang ··· 223 219 224 220 $(OUTPUT)/sign-file: ../../../../scripts/sign-file.c 225 221 $(call msg,SIGN-FILE,,$@) 226 - $(Q)$(CC) $(shell $(HOSTPKG_CONFIG) --cflags libcrypto 2> /dev/null) \ 222 + $(Q)$(CC) $(shell $(PKG_CONFIG) --cflags libcrypto 2> /dev/null) \ 227 223 $< -o $@ \ 228 - $(shell $(HOSTPKG_CONFIG) --libs libcrypto 2> /dev/null || echo -lcrypto) 224 + $(shell $(PKG_CONFIG) --libs libcrypto 2> /dev/null || echo -lcrypto) 229 225 230 226 $(OUTPUT)/bpf_testmod.ko: $(VMLINUX_BTF) $(RESOLVE_BTFIDS) $(wildcard bpf_testmod/Makefile bpf_testmod/*.[ch]) 231 227 $(call msg,MOD,,$@)
+1 -1
tools/testing/selftests/bpf/README.rst
··· 77 77 78 78 .. code-block:: console 79 79 80 - $ LDLIBS=-static vmtest.sh 80 + $ LDLIBS=-static PKG_CONFIG='pkg-config --static' vmtest.sh 81 81 82 82 .. note:: Some distros may not support static linking. 83 83
+43
tools/testing/selftests/bpf/network_helpers.h
··· 71 71 */ 72 72 struct nstoken *open_netns(const char *name); 73 73 void close_netns(struct nstoken *token); 74 + 75 + static __u16 csum_fold(__u32 csum) 76 + { 77 + csum = (csum & 0xffff) + (csum >> 16); 78 + csum = (csum & 0xffff) + (csum >> 16); 79 + 80 + return (__u16)~csum; 81 + } 82 + 83 + static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, 84 + __u32 len, __u8 proto, 85 + __wsum csum) 86 + { 87 + __u64 s = csum; 88 + 89 + s += (__u32)saddr; 90 + s += (__u32)daddr; 91 + s += htons(proto + len); 92 + s = (s & 0xffffffff) + (s >> 32); 93 + s = (s & 0xffffffff) + (s >> 32); 94 + 95 + return csum_fold((__u32)s); 96 + } 97 + 98 + static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr, 99 + const struct in6_addr *daddr, 100 + __u32 len, __u8 proto, 101 + __wsum csum) 102 + { 103 + __u64 s = csum; 104 + int i; 105 + 106 + for (i = 0; i < 4; i++) 107 + s += (__u32)saddr->s6_addr32[i]; 108 + for (i = 0; i < 4; i++) 109 + s += (__u32)daddr->s6_addr32[i]; 110 + s += htons(proto + len); 111 + s = (s & 0xffffffff) + (s >> 32); 112 + s = (s & 0xffffffff) + (s >> 32); 113 + 114 + return csum_fold((__u32)s); 115 + } 116 + 74 117 #endif
+1 -1
tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
··· 249 249 int link_extra_fd = -1; 250 250 int err; 251 251 252 - err = elf_resolve_syms_offsets(path, 3, syms, (unsigned long **) &offsets); 252 + err = elf_resolve_syms_offsets(path, 3, syms, (unsigned long **) &offsets, STT_FUNC); 253 253 if (!ASSERT_OK(err, "elf_resolve_syms_offsets")) 254 254 return; 255 255
+2
tools/testing/selftests/bpf/prog_tests/verifier.c
··· 25 25 #include "verifier_direct_stack_access_wraparound.skel.h" 26 26 #include "verifier_div0.skel.h" 27 27 #include "verifier_div_overflow.skel.h" 28 + #include "verifier_global_subprogs.skel.h" 28 29 #include "verifier_gotol.skel.h" 29 30 #include "verifier_helper_access_var_len.skel.h" 30 31 #include "verifier_helper_packet_access.skel.h" ··· 135 134 void test_verifier_direct_stack_access_wraparound(void) { RUN(verifier_direct_stack_access_wraparound); } 136 135 void test_verifier_div0(void) { RUN(verifier_div0); } 137 136 void test_verifier_div_overflow(void) { RUN(verifier_div_overflow); } 137 + void test_verifier_global_subprogs(void) { RUN(verifier_global_subprogs); } 138 138 void test_verifier_gotol(void) { RUN(verifier_gotol); } 139 139 void test_verifier_helper_access_var_len(void) { RUN(verifier_helper_access_var_len); } 140 140 void test_verifier_helper_packet_access(void) { RUN(verifier_helper_packet_access); }
+29 -4
tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
··· 56 56 .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, 57 57 .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS, 58 58 .frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE, 59 - .flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG, 59 + .flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG | XDP_UMEM_TX_SW_CSUM, 60 + .tx_metadata_len = sizeof(struct xsk_tx_metadata), 60 61 }; 61 62 __u32 idx; 62 63 u64 addr; ··· 139 138 140 139 static int generate_packet(struct xsk *xsk, __u16 dst_port) 141 140 { 141 + struct xsk_tx_metadata *meta; 142 142 struct xdp_desc *tx_desc; 143 143 struct udphdr *udph; 144 144 struct ethhdr *eth; ··· 153 151 return -1; 154 152 155 153 tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx); 156 - tx_desc->addr = idx % (UMEM_NUM / 2) * UMEM_FRAME_SIZE; 154 + tx_desc->addr = idx % (UMEM_NUM / 2) * UMEM_FRAME_SIZE + sizeof(struct xsk_tx_metadata); 157 155 printf("%p: tx_desc[%u]->addr=%llx\n", xsk, idx, tx_desc->addr); 158 156 data = xsk_umem__get_data(xsk->umem_area, tx_desc->addr); 157 + 158 + meta = data - sizeof(struct xsk_tx_metadata); 159 + memset(meta, 0, sizeof(*meta)); 160 + meta->flags = XDP_TXMD_FLAGS_TIMESTAMP; 159 161 160 162 eth = data; 161 163 iph = (void *)(eth + 1); ··· 184 178 udph->source = htons(AF_XDP_SOURCE_PORT); 185 179 udph->dest = htons(dst_port); 186 180 udph->len = htons(sizeof(*udph) + UDP_PAYLOAD_BYTES); 187 - udph->check = 0; 181 + udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, 182 + ntohs(udph->len), IPPROTO_UDP, 0); 188 183 189 184 memset(udph + 1, 0xAA, UDP_PAYLOAD_BYTES); 190 185 186 + meta->flags |= XDP_TXMD_FLAGS_CHECKSUM; 187 + meta->request.csum_start = sizeof(*eth) + sizeof(*iph); 188 + meta->request.csum_offset = offsetof(struct udphdr, check); 189 + 191 190 tx_desc->len = sizeof(*eth) + sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES; 191 + tx_desc->options |= XDP_TX_METADATA; 192 192 xsk_ring_prod__submit(&xsk->tx, 1); 193 193 194 194 ret = sendto(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, 0); ··· 206 194 207 195 static void complete_tx(struct xsk *xsk) 208 196 { 209 - __u32 idx; 197 + struct xsk_tx_metadata *meta; 210 198 __u64 addr; 199 + void *data; 200 + __u32 idx; 211 201 212 202 if (ASSERT_EQ(xsk_ring_cons__peek(&xsk->comp, 1, &idx), 1, "xsk_ring_cons__peek")) { 213 203 addr = *xsk_ring_cons__comp_addr(&xsk->comp, idx); 214 204 215 205 printf("%p: complete tx idx=%u addr=%llx\n", xsk, idx, addr); 206 + 207 + data = xsk_umem__get_data(xsk->umem_area, addr); 208 + meta = data - sizeof(struct xsk_tx_metadata); 209 + 210 + ASSERT_NEQ(meta->completion.tx_timestamp, 0, "tx_timestamp"); 211 + 216 212 xsk_ring_cons__release(&xsk->comp, 1); 217 213 } 218 214 } ··· 241 221 const struct xdp_desc *rx_desc; 242 222 struct pollfd fds = {}; 243 223 struct xdp_meta *meta; 224 + struct udphdr *udph; 244 225 struct ethhdr *eth; 245 226 struct iphdr *iph; 246 227 __u64 comp_addr; ··· 278 257 ASSERT_EQ(eth->h_proto, htons(ETH_P_IP), "eth->h_proto"); 279 258 iph = (void *)(eth + 1); 280 259 ASSERT_EQ((int)iph->version, 4, "iph->version"); 260 + udph = (void *)(iph + 1); 281 261 282 262 /* custom metadata */ 283 263 ··· 291 269 return -1; 292 270 293 271 ASSERT_EQ(meta->rx_hash_type, 0, "rx_hash_type"); 272 + 273 + /* checksum offload */ 274 + ASSERT_EQ(udph->check, htons(0x721c), "csum"); 294 275 295 276 xsk_ring_cons__release(&xsk->rx, 1); 296 277 refill_rx(xsk, comp_addr);
+3 -1
tools/testing/selftests/bpf/progs/test_global_func12.c
··· 19 19 { 20 20 const struct S s = {.x = skb->len }; 21 21 22 - return foo(&s); 22 + foo(&s); 23 + 24 + return 1; 23 25 }
+1
tools/testing/selftests/bpf/progs/test_global_func17.c
··· 5 5 6 6 __noinline int foo(int *p) 7 7 { 8 + barrier_var(p); 8 9 return p ? (*p = 42) : 0; 9 10 } 10 11
+92
tools/testing/selftests/bpf/progs/verifier_global_subprogs.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include <stdbool.h> 5 + #include <errno.h> 6 + #include <string.h> 7 + #include <linux/bpf.h> 8 + #include <bpf/bpf_helpers.h> 9 + #include "bpf_misc.h" 10 + 11 + int arr[1]; 12 + int unkn_idx; 13 + 14 + __noinline long global_bad(void) 15 + { 16 + return arr[unkn_idx]; /* BOOM */ 17 + } 18 + 19 + __noinline long global_good(void) 20 + { 21 + return arr[0]; 22 + } 23 + 24 + __noinline long global_calls_bad(void) 25 + { 26 + return global_good() + global_bad() /* does BOOM indirectly */; 27 + } 28 + 29 + __noinline long global_calls_good_only(void) 30 + { 31 + return global_good(); 32 + } 33 + 34 + SEC("?raw_tp") 35 + __success __log_level(2) 36 + /* main prog is validated completely first */ 37 + __msg("('global_calls_good_only') is global and assumed valid.") 38 + __msg("1: (95) exit") 39 + /* eventually global_good() is transitively validated as well */ 40 + __msg("Validating global_good() func") 41 + __msg("('global_good') is safe for any args that match its prototype") 42 + int chained_global_func_calls_success(void) 43 + { 44 + return global_calls_good_only(); 45 + } 46 + 47 + SEC("?raw_tp") 48 + __failure __log_level(2) 49 + /* main prog validated successfully first */ 50 + __msg("1: (95) exit") 51 + /* eventually we validate global_bad() and fail */ 52 + __msg("Validating global_bad() func") 53 + __msg("math between map_value pointer and register") /* BOOM */ 54 + int chained_global_func_calls_bad(void) 55 + { 56 + return global_calls_bad(); 57 + } 58 + 59 + /* do out of bounds access forcing verifier to fail verification if this 60 + * global func is called 61 + */ 62 + __noinline int global_unsupp(const int *mem) 63 + { 64 + if (!mem) 65 + return 0; 66 + return mem[100]; /* BOOM */ 67 + } 68 + 69 + const volatile bool skip_unsupp_global = true; 70 + 71 + SEC("?raw_tp") 72 + __success 73 + int guarded_unsupp_global_called(void) 74 + { 75 + if (!skip_unsupp_global) 76 + return global_unsupp(NULL); 77 + return 0; 78 + } 79 + 80 + SEC("?raw_tp") 81 + __failure __log_level(2) 82 + __msg("Func#1 ('global_unsupp') is global and assumed valid.") 83 + __msg("Validating global_unsupp() func#1...") 84 + __msg("value is outside of the allowed memory range") 85 + int unguarded_unsupp_global_called(void) 86 + { 87 + int x = 0; 88 + 89 + return global_unsupp(&x); 90 + } 91 + 92 + char _license[] SEC("license") = "GPL";
+1 -3
tools/testing/selftests/bpf/progs/verifier_subprog_precision.c
··· 370 370 SEC("?raw_tp") 371 371 __success __log_level(2) 372 372 __msg("9: (0f) r1 += r6") 373 - __msg("mark_precise: frame0: last_idx 9 first_idx 6") 373 + __msg("mark_precise: frame0: last_idx 9 first_idx 0") 374 374 __msg("mark_precise: frame0: regs=r6 stack= before 8: (bf) r1 = r7") 375 375 __msg("mark_precise: frame0: regs=r6 stack= before 7: (27) r6 *= 4") 376 376 __msg("mark_precise: frame0: regs=r6 stack= before 6: (79) r6 = *(u64 *)(r10 -8)") 377 - __msg("mark_precise: frame0: parent state regs= stack=-8:") 378 - __msg("mark_precise: frame0: last_idx 5 first_idx 0") 379 377 __msg("mark_precise: frame0: regs= stack=-8 before 5: (85) call pc+6") 380 378 __msg("mark_precise: frame0: regs= stack=-8 before 4: (b7) r1 = 0") 381 379 __msg("mark_precise: frame0: regs= stack=-8 before 3: (7b) *(u64 *)(r10 -8) = r6")
+6 -9
tools/testing/selftests/bpf/test_offload.py
··· 169 169 return tool("bpftool", args, {"json":"-p"}, JSON=JSON, ns=ns, 170 170 fail=fail, include_stderr=include_stderr) 171 171 172 - def bpftool_prog_list(expected=None, ns=""): 172 + def bpftool_prog_list(expected=None, ns="", exclude_orphaned=True): 173 173 _, progs = bpftool("prog show", JSON=True, ns=ns, fail=True) 174 174 # Remove the base progs 175 175 for p in base_progs: 176 176 if p in progs: 177 177 progs.remove(p) 178 + if exclude_orphaned: 179 + progs = [ p for p in progs if not p['orphaned'] ] 178 180 if expected is not None: 179 181 if len(progs) != expected: 180 182 fail(True, "%d BPF programs loaded, expected %d" % ··· 614 612 615 613 def check_dev_info_removed(prog_file=None, map_file=None): 616 614 bpftool_prog_list(expected=0) 615 + bpftool_prog_list(expected=1, exclude_orphaned=False) 617 616 ret, err = bpftool("prog show pin %s" % (prog_file), fail=False) 618 - fail(ret == 0, "Showing prog with removed device did not fail") 619 - fail(err["error"].find("No such device") == -1, 620 - "Showing prog with removed device expected ENODEV, error is %s" % 621 - (err["error"])) 617 + fail(ret != 0, "failed to show prog with removed device") 622 618 623 619 bpftool_map_list(expected=0) 624 620 ret, err = bpftool("map show pin %s" % (map_file), fail=False) ··· 1395 1395 1396 1396 start_test("Test multi-dev ASIC cross-dev destruction - orphaned...") 1397 1397 ret, out = bpftool("prog show %s" % (progB), fail=False) 1398 - fail(ret == 0, "got information about orphaned program") 1399 - fail("error" not in out, "no error reported for get info on orphaned") 1400 - fail(out["error"] != "can't get prog info: No such device", 1401 - "wrong error for get info on orphaned") 1398 + fail(ret != 0, "couldn't get information about orphaned program") 1402 1399 1403 1400 print("%s: OK" % (os.path.basename(__file__))) 1404 1401
+206 -25
tools/testing/selftests/bpf/xdp_hw_metadata.c
··· 10 10 * - rx_hash 11 11 * 12 12 * TX: 13 - * - TBD 13 + * - UDP 9091 packets trigger TX reply 14 + * - TX HW timestamp is requested and reported back upon completion 15 + * - TX checksum is requested 14 16 */ 15 17 16 18 #include <test_progs.h> ··· 26 24 #include <linux/net_tstamp.h> 27 25 #include <linux/udp.h> 28 26 #include <linux/sockios.h> 27 + #include <linux/if_xdp.h> 29 28 #include <sys/mman.h> 30 29 #include <net/if.h> 31 30 #include <ctype.h> 32 31 #include <poll.h> 33 32 #include <time.h> 33 + #include <unistd.h> 34 + #include <libgen.h> 34 35 35 36 #include "xdp_metadata.h" 36 37 37 - #define UMEM_NUM 16 38 + #define UMEM_NUM 256 38 39 #define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE 39 40 #define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM) 40 41 #define XDP_FLAGS (XDP_FLAGS_DRV_MODE | XDP_FLAGS_REPLACE) ··· 53 48 }; 54 49 55 50 struct xdp_hw_metadata *bpf_obj; 56 - __u16 bind_flags = XDP_COPY; 51 + __u16 bind_flags = XDP_USE_NEED_WAKEUP | XDP_ZEROCOPY; 57 52 struct xsk *rx_xsk; 58 53 const char *ifname; 59 54 int ifindex; 60 55 int rxq; 56 + bool skip_tx; 57 + __u64 last_hw_rx_timestamp; 58 + __u64 last_xdp_rx_timestamp; 61 59 62 60 void test__fail(void) { /* for network_helpers.c */ } 63 61 ··· 76 68 .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, 77 69 .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS, 78 70 .frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE, 79 - .flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG, 71 + .flags = XSK_UMEM__DEFAULT_FLAGS, 72 + .tx_metadata_len = sizeof(struct xsk_tx_metadata), 80 73 }; 81 74 __u32 idx; 82 75 u64 addr; ··· 119 110 for (i = 0; i < UMEM_NUM / 2; i++) { 120 111 addr = (UMEM_NUM / 2 + i) * UMEM_FRAME_SIZE; 121 112 printf("%p: rx_desc[%d] -> %lx\n", xsk, i, addr); 122 - *xsk_ring_prod__fill_addr(&xsk->fill, i) = addr; 113 + *xsk_ring_prod__fill_addr(&xsk->fill, idx + i) = addr; 123 114 } 124 115 xsk_ring_prod__submit(&xsk->fill, ret); 125 116 ··· 140 131 __u32 idx; 141 132 142 133 if (xsk_ring_prod__reserve(&xsk->fill, 1, &idx) 
== 1) { 143 - printf("%p: complete idx=%u addr=%llx\n", xsk, idx, addr); 134 + printf("%p: complete rx idx=%u addr=%llx\n", xsk, idx, addr); 144 135 *xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr; 145 136 xsk_ring_prod__submit(&xsk->fill, 1); 146 137 } 138 + } 139 + 140 + static int kick_tx(struct xsk *xsk) 141 + { 142 + return sendto(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, 0); 143 + } 144 + 145 + static int kick_rx(struct xsk *xsk) 146 + { 147 + return recvfrom(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, NULL); 147 148 } 148 149 149 150 #define NANOSEC_PER_SEC 1000000000 /* 10^9 */ ··· 171 152 return (__u64) t.tv_sec * NANOSEC_PER_SEC + t.tv_nsec; 172 153 } 173 154 155 + static void print_tstamp_delta(const char *name, const char *refname, 156 + __u64 tstamp, __u64 reference) 157 + { 158 + __s64 delta = (__s64)reference - (__s64)tstamp; 159 + 160 + printf("%s: %llu (sec:%0.4f) delta to %s sec:%0.4f (%0.3f usec)\n", 161 + name, tstamp, (double)tstamp / NANOSEC_PER_SEC, refname, 162 + (double)delta / NANOSEC_PER_SEC, 163 + (double)delta / 1000); 164 + } 165 + 174 166 static void verify_xdp_metadata(void *data, clockid_t clock_id) 175 167 { 176 168 struct xdp_meta *meta; ··· 194 164 printf("rx_hash: 0x%X with RSS type:0x%X\n", 195 165 meta->rx_hash, meta->rx_hash_type); 196 166 197 - printf("rx_timestamp: %llu (sec:%0.4f)\n", meta->rx_timestamp, 198 - (double)meta->rx_timestamp / NANOSEC_PER_SEC); 199 167 if (meta->rx_timestamp) { 200 - __u64 usr_clock = gettime(clock_id); 201 - __u64 xdp_clock = meta->xdp_timestamp; 202 - __s64 delta_X = xdp_clock - meta->rx_timestamp; 203 - __s64 delta_X2U = usr_clock - xdp_clock; 168 + __u64 ref_tstamp = gettime(clock_id); 204 169 205 - printf("XDP RX-time: %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n", 206 - xdp_clock, (double)xdp_clock / NANOSEC_PER_SEC, 207 - (double)delta_X / NANOSEC_PER_SEC, 208 - (double)delta_X / 1000); 170 + /* store received timestamps to calculate a delta at tx */ 
171 + last_hw_rx_timestamp = meta->rx_timestamp; 172 + last_xdp_rx_timestamp = meta->xdp_timestamp; 209 173 210 - printf("AF_XDP time: %llu (sec:%0.4f) delta sec:%0.4f (%0.3f usec)\n", 211 - usr_clock, (double)usr_clock / NANOSEC_PER_SEC, 212 - (double)delta_X2U / NANOSEC_PER_SEC, 213 - (double)delta_X2U / 1000); 174 + print_tstamp_delta("HW RX-time", "User RX-time", 175 + meta->rx_timestamp, ref_tstamp); 176 + print_tstamp_delta("XDP RX-time", "User RX-time", 177 + meta->xdp_timestamp, ref_tstamp); 178 + } else { 179 + printf("No rx_timestamp\n"); 214 180 } 215 - 216 181 } 217 182 218 183 static void verify_skb_metadata(int fd) ··· 255 230 printf("skb hwtstamp is not found!\n"); 256 231 } 257 232 233 + static bool complete_tx(struct xsk *xsk, clockid_t clock_id) 234 + { 235 + struct xsk_tx_metadata *meta; 236 + __u64 addr; 237 + void *data; 238 + __u32 idx; 239 + 240 + if (!xsk_ring_cons__peek(&xsk->comp, 1, &idx)) 241 + return false; 242 + 243 + addr = *xsk_ring_cons__comp_addr(&xsk->comp, idx); 244 + data = xsk_umem__get_data(xsk->umem_area, addr); 245 + meta = data - sizeof(struct xsk_tx_metadata); 246 + 247 + printf("%p: complete tx idx=%u addr=%llx\n", xsk, idx, addr); 248 + 249 + if (meta->completion.tx_timestamp) { 250 + __u64 ref_tstamp = gettime(clock_id); 251 + 252 + print_tstamp_delta("HW TX-complete-time", "User TX-complete-time", 253 + meta->completion.tx_timestamp, ref_tstamp); 254 + print_tstamp_delta("XDP RX-time", "User TX-complete-time", 255 + last_xdp_rx_timestamp, ref_tstamp); 256 + print_tstamp_delta("HW RX-time", "HW TX-complete-time", 257 + last_hw_rx_timestamp, meta->completion.tx_timestamp); 258 + } else { 259 + printf("No tx_timestamp\n"); 260 + } 261 + 262 + xsk_ring_cons__release(&xsk->comp, 1); 263 + 264 + return true; 265 + } 266 + 267 + #define swap(a, b, len) do { \ 268 + for (int i = 0; i < len; i++) { \ 269 + __u8 tmp = ((__u8 *)a)[i]; \ 270 + ((__u8 *)a)[i] = ((__u8 *)b)[i]; \ 271 + ((__u8 *)b)[i] = tmp; \ 272 + } \ 273 + } while (0) 274 + 275 + static void ping_pong(struct xsk *xsk, void *rx_packet, clockid_t clock_id) 276 + { 277 + struct xsk_tx_metadata *meta; 278 + struct ipv6hdr *ip6h = NULL; 279 + struct iphdr *iph = NULL; 280 + struct xdp_desc *tx_desc; 281 + struct udphdr *udph; 282 + struct ethhdr *eth; 283 + __sum16 want_csum; 284 + void *data; 285 + __u32 idx; 286 + int ret; 287 + int len; 288 + 289 + ret = xsk_ring_prod__reserve(&xsk->tx, 1, &idx); 290 + if (ret != 1) { 291 + printf("%p: failed to reserve tx slot\n", xsk); 292 + return; 293 + } 294 + 295 + tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx); 296 + tx_desc->addr = idx % (UMEM_NUM / 2) * UMEM_FRAME_SIZE + sizeof(struct xsk_tx_metadata); 297 + data = xsk_umem__get_data(xsk->umem_area, tx_desc->addr); 298 + 299 + meta = data - sizeof(struct xsk_tx_metadata); 300 + memset(meta, 0, sizeof(*meta)); 301 + meta->flags = XDP_TXMD_FLAGS_TIMESTAMP; 302 + 303 + eth = rx_packet; 304 + 305 + if (eth->h_proto == htons(ETH_P_IP)) { 306 + iph = (void *)(eth + 1); 307 + udph = (void *)(iph + 1); 308 + } else if (eth->h_proto == htons(ETH_P_IPV6)) { 309 + ip6h = (void *)(eth + 1); 310 + udph = (void *)(ip6h + 1); 311 + } else { 312 + printf("%p: failed to detect IP version for ping pong %04x\n", xsk, eth->h_proto); 313 + xsk_ring_prod__cancel(&xsk->tx, 1); 314 + return; 315 + } 316 + 317 + len = ETH_HLEN; 318 + if (ip6h) 319 + len += sizeof(*ip6h) + ntohs(ip6h->payload_len); 320 + if (iph) 321 + len += ntohs(iph->tot_len); 322 + 323 + swap(eth->h_dest, eth->h_source, ETH_ALEN); 324 + if (iph) 325 + swap(&iph->saddr, &iph->daddr, 4); 326 + else 327 + swap(&ip6h->saddr, &ip6h->daddr, 16); 328 + swap(&udph->source, &udph->dest, 2); 329 + 330 + want_csum = udph->check; 331 + if (ip6h) 332 + udph->check = ~csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr, 333 + ntohs(udph->len), IPPROTO_UDP, 0); 334 + else 335 + udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, 336 + ntohs(udph->len), IPPROTO_UDP, 0); 337 + 338 + meta->flags |= XDP_TXMD_FLAGS_CHECKSUM; 339 + if (iph) 340 + meta->request.csum_start = sizeof(*eth) + sizeof(*iph); 341 + else 342 + meta->request.csum_start = sizeof(*eth) + sizeof(*ip6h); 343 + meta->request.csum_offset = offsetof(struct udphdr, check); 344 + 345 + printf("%p: ping-pong with csum=%04x (want %04x) csum_start=%d csum_offset=%d\n", 346 + xsk, ntohs(udph->check), ntohs(want_csum), 347 + meta->request.csum_start, meta->request.csum_offset); 348 + 349 + memcpy(data, rx_packet, len); /* don't share umem chunk for simplicity */ 350 + tx_desc->options |= XDP_TX_METADATA; 351 + tx_desc->len = len; 352 + 353 + xsk_ring_prod__submit(&xsk->tx, 1); 354 + } 355 + 258 356 static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t clock_id) 259 357 { 260 358 const struct xdp_desc *rx_desc; ··· 400 252 401 253 while (true) { 402 254 errno = 0; 255 + 256 + for (i = 0; i < rxq; i++) { 257 + ret = kick_rx(&rx_xsk[i]); 258 + if (ret) 259 + printf("kick_rx ret=%d\n", ret); 260 + } 261 + 403 262 ret = poll(fds, rxq + 1, 1000); 404 263 printf("poll: %d (%d) skip=%llu fail=%llu redir=%llu\n", 405 264 ret, errno, bpf_obj->bss->pkts_skip, ··· 443 288 verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr), 444 289 clock_id); 445 290 first_seg = false; 291 + 292 + if (!skip_tx) { 293 + /* mirror first chunk back */ 294 + ping_pong(xsk, xsk_umem__get_data(xsk->umem_area, addr), 295 + clock_id); 296 + 297 + ret = kick_tx(xsk); 298 + if (ret) 299 + printf("kick_tx ret=%d\n", ret); 300 + 301 + for (int j = 0; j < 500; j++) { 302 + if (complete_tx(xsk, clock_id)) 303 + break; 304 + usleep(10*1000); 305 + } 306 + } 446 307 } 447 308 448 309 xsk_ring_cons__release(&xsk->rx, 1); ··· 591 420 { 592 421 const char *usage = 593 422 "Usage: xdp_hw_metadata [OPTIONS] [IFNAME]\n" 594 - " -m Enable multi-buffer XDP for larger MTU\n" 423 + " -c Run in copy mode (zerocopy is default)\n" 595 424 " -h Display this help and exit\n\n" 425 + " -m Enable multi-buffer XDP for larger MTU\n" 426 + " -r Don't generate AF_XDP reply (rx metadata only)\n" 596 427 "Generate test packets on the other machine with:\n" 597 428 " echo -n xdp | nc -u -q1 <dst_ip> 9091\n"; 598 429 ··· 605 432 { 606 433 int opt; 607 434 608 - while ((opt = getopt(argc, argv, "mh")) != -1) { 435 + while ((opt = getopt(argc, argv, "chmr")) != -1) { 609 436 switch (opt) { 610 - case 'm': 611 - bind_flags |= XDP_USE_SG; 437 + case 'c': 438 + bind_flags &= ~XDP_USE_NEED_WAKEUP; 439 + bind_flags &= ~XDP_ZEROCOPY; 440 + bind_flags |= XDP_COPY; 612 441 break; 613 442 case 'h': 614 443 print_usage(); 615 444 exit(0); 445 + case 'm': 446 + bind_flags |= XDP_USE_SG; 447 + break; 448 + case 'r': 449 + skip_tx = true; 450 + break; 616 451 case '?': 617 452 if (isprint(optopt)) 618 453 fprintf(stderr, "Unknown option: -%c\n", optopt);
+3
tools/testing/selftests/bpf/xsk.c
··· 115 115 cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE; 116 116 cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM; 117 117 cfg->flags = XSK_UMEM__DEFAULT_FLAGS; 118 + cfg->tx_metadata_len = 0; 118 119 return; 119 120 } 120 121 ··· 124 123 cfg->frame_size = usr_cfg->frame_size; 125 124 cfg->frame_headroom = usr_cfg->frame_headroom; 126 125 cfg->flags = usr_cfg->flags; 126 + cfg->tx_metadata_len = usr_cfg->tx_metadata_len; 127 127 } 128 128 129 129 static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg, ··· 254 252 mr.chunk_size = umem->config.frame_size; 255 253 mr.headroom = umem->config.frame_headroom; 256 254 mr.flags = umem->config.flags; 255 + mr.tx_metadata_len = umem->config.tx_metadata_len; 257 256 258 257 err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)); 259 258 if (err) {
+1
tools/testing/selftests/bpf/xsk.h
··· 200 200 __u32 frame_size; 201 201 __u32 frame_headroom; 202 202 __u32 flags; 203 + __u32 tx_metadata_len; 203 204 }; 204 205 205 206 int xsk_attach_xdp_program(struct bpf_program *prog, int ifindex, u32 xdp_flags);