Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-07-19

We've added 45 non-merge commits during the last 3 day(s) which contain
a total of 71 files changed, 7808 insertions(+), 592 deletions(-).

The main changes are:

1) Multi-buffer support in AF_XDP, from Maciej Fijalkowski,
Magnus Karlsson, Tirthendu Sarkar.

2) BPF link support for tc BPF programs, from Daniel Borkmann.

3) Enable bpf_map_sum_elem_count kfunc for all program types,
from Anton Protopopov.

4) Add 'owner' field to bpf_rb_node to fix races in shared ownership,
from Dave Marchevsky.

5) Prevent potential skb_header_pointer() misuse, from Alexei Starovoitov.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
bpf, net: Introduce skb_pointer_if_linear().
bpf: sync tools/ uapi header with
selftests/bpf: Add mprog API tests for BPF tcx links
selftests/bpf: Add mprog API tests for BPF tcx opts
bpftool: Extend net dump with tcx progs
libbpf: Add helper macro to clear opts structs
libbpf: Add link-based API for tcx
libbpf: Add opts-based attach/detach/query API for tcx
bpf: Add fd-based tcx multi-prog infra with link support
bpf: Add generic attach/detach/query API for multi-progs
selftests/xsk: reset NIC settings to default after running test suite
selftests/xsk: add test for too many frags
selftests/xsk: add metadata copy test for multi-buff
selftests/xsk: add invalid descriptor test for multi-buffer
selftests/xsk: add unaligned mode test for multi-buffer
selftests/xsk: add basic multi-buffer test
selftests/xsk: transmit and receive multi-buffer packets
xsk: add multi-buffer documentation
i40e: xsk: add TX multi-buffer support
ice: xsk: Tx multi-buffer support
...
====================

Link: https://lore.kernel.org/r/20230719175424.75717-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+7830 -614
+6
Documentation/netlink/specs/netdev.yaml
···
         type: u64
         enum: xdp-act
         enum-as-flags: true
+      -
+        name: xdp-zc-max-segs
+        doc: max fragment count supported by ZC driver
+        type: u32
+        checks:
+          min: 1

 operations:
   list:
+210 -1
Documentation/networking/af_xdp.rst
···
 Gets options from an XDP socket. The only one supported so far is
 XDP_OPTIONS_ZEROCOPY which tells you if zero-copy is on or not.

+Multi-Buffer Support
+====================
+
+With multi-buffer support, programs using AF_XDP sockets can receive
+and transmit packets consisting of multiple buffers both in copy and
+zero-copy mode. For example, a packet can consist of two
+frames/buffers, one with the header and the other one with the data,
+or a 9K Ethernet jumbo frame can be constructed by chaining together
+three 4K frames.
+
+Some definitions:
+
+* A packet consists of one or more frames
+
+* A descriptor in one of the AF_XDP rings always refers to a single
+  frame. In the case the packet consists of a single frame, the
+  descriptor refers to the whole packet.
+
+To enable multi-buffer support for an AF_XDP socket, use the new bind
+flag XDP_USE_SG. If this is not provided, all multi-buffer packets
+will be dropped just as before. Note that the XDP program loaded also
+needs to be in multi-buffer mode. This can be accomplished by using
+"xdp.frags" as the section name of the XDP program used.
+
+To represent a packet consisting of multiple frames, a new flag called
+XDP_PKT_CONTD is introduced in the options field of the Rx and Tx
+descriptors. If it is true (1) the packet continues with the next
+descriptor and if it is false (0) it means this is the last descriptor
+of the packet. Why the reverse logic of end-of-packet (eop) flag found
+in many NICs? Just to preserve compatibility with non-multi-buffer
+applications that have this bit set to false for all packets on Rx,
+and the apps set the options field to zero for Tx, as anything else
+will be treated as an invalid descriptor.
+
+These are the semantics for producing packets onto AF_XDP Tx ring
+consisting of multiple frames:
+
+* When an invalid descriptor is found, all the other
+  descriptors/frames of this packet are marked as invalid and not
+  completed. The next descriptor is treated as the start of a new
+  packet, even if this was not the intent (because we cannot guess
+  the intent). As before, if your program is producing invalid
+  descriptors you have a bug that must be fixed.
+
+* Zero length descriptors are treated as invalid descriptors.
+
+* For copy mode, the maximum supported number of frames in a packet is
+  equal to CONFIG_MAX_SKB_FRAGS + 1. If it is exceeded, all
+  descriptors accumulated so far are dropped and treated as
+  invalid. To produce an application that will work on any system
+  regardless of this config setting, limit the number of frags to 18,
+  as the minimum value of the config is 17.
+
+* For zero-copy mode, the limit is up to what the NIC HW
+  supports. Usually at least five on the NICs we have checked. We
+  consciously chose to not enforce a rigid limit (such as
+  CONFIG_MAX_SKB_FRAGS + 1) for zero-copy mode, as it would have
+  resulted in copy actions under the hood to fit into what limit the
+  NIC supports. Kind of defeats the purpose of zero-copy mode. How to
+  probe for this limit is explained in the "probe for multi-buffer
+  support" section.
+
+On the Rx path in copy-mode, the xsk core copies the XDP data into
+multiple descriptors, if needed, and sets the XDP_PKT_CONTD flag as
+detailed before. Zero-copy mode works the same, though the data is not
+copied. When the application gets a descriptor with the XDP_PKT_CONTD
+flag set to one, it means that the packet consists of multiple buffers
+and it continues with the next buffer in the following
+descriptor. When a descriptor with XDP_PKT_CONTD == 0 is received, it
+means that this is the last buffer of the packet. AF_XDP guarantees
+that only a complete packet (all frames in the packet) is sent to the
+application. If there is not enough space in the AF_XDP Rx ring, all
+frames of the packet will be dropped.
+
+If application reads a batch of descriptors, using for example the libxdp
+interfaces, it is not guaranteed that the batch will end with a full
+packet. It might end in the middle of a packet and the rest of the
+buffers of that packet will arrive at the beginning of the next batch,
+since the libxdp interface does not read the whole ring (unless you
+have an enormous batch size or a very small ring size).
+
+An example program each for Rx and Tx multi-buffer support can be found
+later in this document.
+
 Usage
-=====
+-----

 In order to use AF_XDP sockets two parts are needed. The
 user-space application and the XDP program. For a complete setup and
···
 But please use the libbpf functions as they are optimized and ready to
 use. Will make your life easier.
+
+Usage Multi-Buffer Rx
+---------------------
+
+Here is a simple Rx path pseudo-code example (using libxdp interfaces
+for simplicity). Error paths have been excluded to keep it short:
+
+.. code-block:: c
+
+	void rx_packets(struct xsk_socket_info *xsk)
+	{
+		static bool new_packet = true;
+		u32 idx_rx = 0, idx_fq = 0;
+		static char *pkt;
+
+		int rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
+
+		xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+
+		for (int i = 0; i < rcvd; i++) {
+			struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++);
+			char *frag = xsk_umem__get_data(xsk->umem->buffer, desc->addr);
+			bool eop = !(desc->options & XDP_PKT_CONTD);
+
+			if (new_packet)
+				pkt = frag;
+			else
+				add_frag_to_pkt(pkt, frag);
+
+			if (eop)
+				process_pkt(pkt);
+
+			new_packet = eop;
+
+			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = desc->addr;
+		}
+
+		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
+		xsk_ring_cons__release(&xsk->rx, rcvd);
+	}
+
+Usage Multi-Buffer Tx
+---------------------
+
+Here is an example Tx path pseudo-code (using libxdp interfaces for
+simplicity) ignoring that the umem is finite in size, and that we
+eventually will run out of packets to send. Also assumes pkts.addr
+points to a valid location in the umem.
+
+.. code-block:: c
+
+	void tx_packets(struct xsk_socket_info *xsk, struct pkt *pkts,
+			int batch_size)
+	{
+		u32 idx, i, pkt_nb = 0;
+
+		xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx);
+
+		for (i = 0; i < batch_size;) {
+			u64 addr = pkts[pkt_nb].addr;
+			u32 len = pkts[pkt_nb].size;
+
+			do {
+				struct xdp_desc *tx_desc;
+
+				tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i++);
+				tx_desc->addr = addr;
+
+				if (len > xsk_frame_size) {
+					tx_desc->len = xsk_frame_size;
+					tx_desc->options = XDP_PKT_CONTD;
+				} else {
+					tx_desc->len = len;
+					tx_desc->options = 0;
+					pkt_nb++;
+				}
+				len -= tx_desc->len;
+				addr += xsk_frame_size;
+
+				if (i == batch_size) {
+					/* Remember len, addr, pkt_nb for next iteration.
+					 * Skipped for simplicity.
+					 */
+					break;
+				}
+			} while (len);
+		}
+
+		xsk_ring_prod__submit(&xsk->tx, i);
+	}
+
+Probing for Multi-Buffer Support
+--------------------------------
+
+To discover if a driver supports multi-buffer AF_XDP in SKB or DRV
+mode, use the XDP_FEATURES feature of netlink in linux/netdev.h to
+query for NETDEV_XDP_ACT_RX_SG support. This is the same flag as for
+querying for XDP multi-buffer support. If XDP supports multi-buffer in
+a driver, then AF_XDP will also support that in SKB and DRV mode.
+
+To discover if a driver supports multi-buffer AF_XDP in zero-copy
+mode, use XDP_FEATURES and first check the NETDEV_XDP_ACT_XSK_ZEROCOPY
+flag. If it is set, it means that at least zero-copy is supported and
+you should go and check the netlink attribute
+NETDEV_A_DEV_XDP_ZC_MAX_SEGS in linux/netdev.h. An unsigned integer
+value will be returned stating the max number of frags that are
+supported by this device in zero-copy mode. These are the possible
+return values:
+
+1: Multi-buffer for zero-copy is not supported by this device, as max
+   one fragment supported means that multi-buffer is not possible.
+
+>=2: Multi-buffer is supported in zero-copy mode for this device. The
+   returned number signifies the max number of frags supported.
+
+For an example on how these are used through libbpf, please take a
+look at tools/testing/selftests/bpf/xskxceiver.c.
+
+Multi-Buffer Support for Zero-Copy Drivers
+------------------------------------------
+
+Zero-copy drivers usually use the batched APIs for Rx and Tx
+processing. Note that the Tx batch API guarantees that it will provide
+a batch of Tx descriptors that ends with full packet at the end. This
+to facilitate extending a zero-copy driver with multi-buffer support.

 Sample application
 ==================
+4 -1
MAINTAINERS
···
 F:	include/linux/tnum.h
 F:	kernel/bpf/core.c
 F:	kernel/bpf/dispatcher.c
+F:	kernel/bpf/mprog.c
 F:	kernel/bpf/syscall.c
 F:	kernel/bpf/tnum.c
 F:	kernel/bpf/trampoline.c
···
 S:	Maintained
 F:	kernel/bpf/bpf_struct*

-BPF [NETWORKING] (tc BPF, sock_addr)
+BPF [NETWORKING] (tcx & tc BPF, sock_addr)
 M:	Martin KaFai Lau <martin.lau@linux.dev>
 M:	Daniel Borkmann <daniel@iogearbox.net>
 R:	John Fastabend <john.fastabend@gmail.com>
 L:	bpf@vger.kernel.org
 L:	netdev@vger.kernel.org
 S:	Maintained
+F:	include/net/tcx.h
+F:	kernel/bpf/tcx.c
 F:	net/core/filter.c
 F:	net/sched/act_bpf.c
 F:	net/sched/cls_bpf.c
+1 -1
arch/x86/net/bpf_jit_comp.c
···
 static void save_args(const struct btf_func_model *m, u8 **prog,
 		      int stack_size, bool for_call_origin)
 {
-	int arg_regs, first_off, nr_regs = 0, nr_stack_slots = 0;
+	int arg_regs, first_off = 0, nr_regs = 0, nr_stack_slots = 0;
 	int i, j;

 	/* Store function arguments to stack.
+1 -5
drivers/net/ethernet/intel/i40e/i40e_main.c
···
 	if (ring->xsk_pool) {
 		ring->rx_buf_len =
 			xsk_pool_get_rx_frame_size(ring->xsk_pool);
-		/* For AF_XDP ZC, we disallow packets to span on
-		 * multiple buffers, thus letting us skip that
-		 * handling in the fast-path.
-		 */
-		chain_len = 1;
 		ret = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
 						 MEM_TYPE_XSK_BUFF_POOL,
 						 NULL);
···
 			       NETDEV_XDP_ACT_REDIRECT |
 			       NETDEV_XDP_ACT_XSK_ZEROCOPY |
 			       NETDEV_XDP_ACT_RX_SG;
+		netdev->xdp_zc_max_segs = I40E_MAX_BUFFER_TXD;
 	} else {
 		/* Relate the VSI_VMDQ name to the VSI_MAIN name. Note that we
 		 * are still limited by IFNAMSIZ, but we're adding 'v%d\0' to
+2 -2
drivers/net/ethernet/intel/i40e/i40e_txrx.c
···
  * If the buffer is an EOP buffer, this function exits returning false,
  * otherwise return true indicating that this is in fact a non-EOP buffer.
  */
-static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
-			    union i40e_rx_desc *rx_desc)
+bool i40e_is_non_eop(struct i40e_ring *rx_ring,
+		     union i40e_rx_desc *rx_desc)
 {
 	/* if we are the last buffer then there is nothing else to do */
 #define I40E_RXD_EOF BIT(I40E_RX_DESC_STATUS_EOF_SHIFT)
+2
drivers/net/ethernet/intel/i40e/i40e_txrx.h
···
 bool __i40e_chk_linearize(struct sk_buff *skb);
 int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 		  u32 flags);
+bool i40e_is_non_eop(struct i40e_ring *rx_ring,
+		     union i40e_rx_desc *rx_desc);

 /**
  * i40e_get_head - Retrieve head from head writeback
+85 -16
drivers/net/ethernet/intel/i40e/i40e_xsk.c
···
 {
 	unsigned int totalsize = xdp->data_end - xdp->data_meta;
 	unsigned int metasize = xdp->data - xdp->data_meta;
+	struct skb_shared_info *sinfo = NULL;
 	struct sk_buff *skb;
+	u32 nr_frags = 0;

+	if (unlikely(xdp_buff_has_frags(xdp))) {
+		sinfo = xdp_get_shared_info_from_buff(xdp);
+		nr_frags = sinfo->nr_frags;
+	}
 	net_prefetch(xdp->data_meta);

 	/* allocate a skb to store the frags */
···
 		__skb_pull(skb, metasize);
 	}

+	if (likely(!xdp_buff_has_frags(xdp)))
+		goto out;
+
+	for (int i = 0; i < nr_frags; i++) {
+		struct skb_shared_info *skinfo = skb_shinfo(skb);
+		skb_frag_t *frag = &sinfo->frags[i];
+		struct page *page;
+		void *addr;
+
+		page = dev_alloc_page();
+		if (!page) {
+			dev_kfree_skb(skb);
+			return NULL;
+		}
+		addr = page_to_virt(page);
+
+		memcpy(addr, skb_frag_page(frag), skb_frag_size(frag));
+
+		__skb_fill_page_desc_noacc(skinfo, skinfo->nr_frags++,
+					   addr, 0, skb_frag_size(frag));
+	}
+
 out:
 	xsk_buff_free(xdp);
 	return skb;
···
 				      union i40e_rx_desc *rx_desc,
 				      unsigned int *rx_packets,
 				      unsigned int *rx_bytes,
-				      unsigned int size,
 				      unsigned int xdp_res,
 				      bool *failure)
 {
 	struct sk_buff *skb;

 	*rx_packets = 1;
-	*rx_bytes = size;
+	*rx_bytes = xdp_get_buff_len(xdp_buff);

 	if (likely(xdp_res == I40E_XDP_REDIR) || xdp_res == I40E_XDP_TX)
 		return;
···
 		return;
 	}

-	*rx_bytes = skb->len;
 	i40e_process_skb_fields(rx_ring, rx_desc, skb);
 	napi_gro_receive(&rx_ring->q_vector->napi, skb);
 	return;
···
 	/* Should never get here, as all valid cases have been handled already.
 	 */
 	WARN_ON_ONCE(1);
+}
+
+static int
+i40e_add_xsk_frag(struct i40e_ring *rx_ring, struct xdp_buff *first,
+		  struct xdp_buff *xdp, const unsigned int size)
+{
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(first);
+
+	if (!xdp_buff_has_frags(first)) {
+		sinfo->nr_frags = 0;
+		sinfo->xdp_frags_size = 0;
+		xdp_buff_set_frags_flag(first);
+	}
+
+	if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) {
+		xsk_buff_free(first);
+		return -ENOMEM;
+	}
+
+	__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++,
+				   virt_to_page(xdp->data_hard_start), 0, size);
+	sinfo->xdp_frags_size += size;
+	xsk_buff_add_frag(xdp);
+
+	return 0;
 }

 /**
···
 int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
+	u16 next_to_process = rx_ring->next_to_process;
 	u16 next_to_clean = rx_ring->next_to_clean;
 	u16 count_mask = rx_ring->count - 1;
 	unsigned int xdp_res, xdp_xmit = 0;
+	struct xdp_buff *first = NULL;
 	struct bpf_prog *xdp_prog;
 	bool failure = false;
 	u16 cleaned_count;
+
+	if (next_to_process != next_to_clean)
+		first = *i40e_rx_bi(rx_ring, next_to_clean);

 	/* NB! xdp_prog will always be !NULL, due to the fact that
 	 * this path is enabled by setting an XDP program.
···
 		unsigned int size;
 		u64 qword;

-		rx_desc = I40E_RX_DESC(rx_ring, next_to_clean);
+		rx_desc = I40E_RX_DESC(rx_ring, next_to_process);
 		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);

 		/* This memory barrier is needed to keep us from reading
···
 			i40e_clean_programming_status(rx_ring,
 						      rx_desc->raw.qword[0],
 						      qword);
-			bi = *i40e_rx_bi(rx_ring, next_to_clean);
+			bi = *i40e_rx_bi(rx_ring, next_to_process);
 			xsk_buff_free(bi);
-			next_to_clean = (next_to_clean + 1) & count_mask;
+			next_to_process = (next_to_process + 1) & count_mask;
 			continue;
 		}
···
 		if (!size)
 			break;

-		bi = *i40e_rx_bi(rx_ring, next_to_clean);
+		bi = *i40e_rx_bi(rx_ring, next_to_process);
 		xsk_buff_set_size(bi, size);
 		xsk_buff_dma_sync_for_cpu(bi, rx_ring->xsk_pool);

-		xdp_res = i40e_run_xdp_zc(rx_ring, bi, xdp_prog);
-		i40e_handle_xdp_result_zc(rx_ring, bi, rx_desc, &rx_packets,
-					  &rx_bytes, size, xdp_res, &failure);
+		if (!first)
+			first = bi;
+		else if (i40e_add_xsk_frag(rx_ring, first, bi, size))
+			break;
+
+		next_to_process = (next_to_process + 1) & count_mask;
+
+		if (i40e_is_non_eop(rx_ring, rx_desc))
+			continue;
+
+		xdp_res = i40e_run_xdp_zc(rx_ring, first, xdp_prog);
+		i40e_handle_xdp_result_zc(rx_ring, first, rx_desc, &rx_packets,
+					  &rx_bytes, xdp_res, &failure);
+		first->flags = 0;
+		next_to_clean = next_to_process;
 		if (failure)
 			break;
 		total_rx_packets += rx_packets;
 		total_rx_bytes += rx_bytes;
 		xdp_xmit |= xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR);
-		next_to_clean = (next_to_clean + 1) & count_mask;
+		first = NULL;
 	}

 	rx_ring->next_to_clean = next_to_clean;
+	rx_ring->next_to_process = next_to_process;
 	cleaned_count = (next_to_clean - rx_ring->next_to_use - 1) & count_mask;

 	if (cleaned_count >= I40E_RX_BUFFER_WRITE)
···
 static void i40e_xmit_pkt(struct i40e_ring *xdp_ring, struct xdp_desc *desc,
 			  unsigned int *total_bytes)
 {
+	u32 cmd = I40E_TX_DESC_CMD_ICRC | xsk_is_eop_desc(desc);
 	struct i40e_tx_desc *tx_desc;
 	dma_addr_t dma;
···
 	tx_desc = I40E_TX_DESC(xdp_ring, xdp_ring->next_to_use++);
 	tx_desc->buffer_addr = cpu_to_le64(dma);
-	tx_desc->cmd_type_offset_bsz = build_ctob(I40E_TX_DESC_CMD_ICRC | I40E_TX_DESC_CMD_EOP,
-						  0, desc->len, 0);
+	tx_desc->cmd_type_offset_bsz = build_ctob(cmd, 0, desc->len, 0);

 	*total_bytes += desc->len;
 }
···
 	u32 i;

 	loop_unrolled_for(i = 0; i < PKTS_PER_BATCH; i++) {
+		u32 cmd = I40E_TX_DESC_CMD_ICRC | xsk_is_eop_desc(&desc[i]);
+
 		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc[i].addr);
 		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc[i].len);

 		tx_desc = I40E_TX_DESC(xdp_ring, ntu++);
 		tx_desc->buffer_addr = cpu_to_le64(dma);
-		tx_desc->cmd_type_offset_bsz = build_ctob(I40E_TX_DESC_CMD_ICRC |
-							  I40E_TX_DESC_CMD_EOP,
-							  0, desc[i].len, 0);
+		tx_desc->cmd_type_offset_bsz = build_ctob(cmd, 0, desc[i].len, 0);

 		*total_bytes += desc[i].len;
 	}
+1 -8
drivers/net/ethernet/intel/ice/ice_base.c
···
  */
 static int ice_setup_rx_ctx(struct ice_rx_ring *ring)
 {
-	int chain_len = ICE_MAX_CHAINED_RX_BUFS;
 	struct ice_vsi *vsi = ring->vsi;
 	u32 rxdid = ICE_RXDID_FLEX_NIC;
 	struct ice_rlan_ctx rlan_ctx;
···
 	 */
 	rlan_ctx.showiv = 0;

-	/* For AF_XDP ZC, we disallow packets to span on
-	 * multiple buffers, thus letting us skip that
-	 * handling in the fast-path.
-	 */
-	if (ring->xsk_pool)
-		chain_len = 1;
 	/* Max packet size for this queue - must not be set to a larger value
 	 * than 5 x DBUF
 	 */
 	rlan_ctx.rxmax = min_t(u32, vsi->max_frame,
-			       chain_len * ring->rx_buf_len);
+			       ICE_MAX_CHAINED_RX_BUFS * ring->rx_buf_len);

 	/* Rx queue threshold in units of 64 */
 	rlan_ctx.lrxqthresh = 1;
+1
drivers/net/ethernet/intel/ice/ice_main.c
···
 	netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
 			       NETDEV_XDP_ACT_XSK_ZEROCOPY |
 			       NETDEV_XDP_ACT_RX_SG;
+	netdev->xdp_zc_max_segs = ICE_MAX_BUF_TXD;
 }

 /**
+161 -58
drivers/net/ethernet/intel/ice/ice_xsk.c
···
 }

 /**
- * ice_bump_ntc - Bump the next_to_clean counter of an Rx ring
- * @rx_ring: Rx ring
- */
-static void ice_bump_ntc(struct ice_rx_ring *rx_ring)
-{
-	int ntc = rx_ring->next_to_clean + 1;
-
-	ntc = (ntc < rx_ring->count) ? ntc : 0;
-	rx_ring->next_to_clean = ntc;
-	prefetch(ICE_RX_DESC(rx_ring, ntc));
-}
-
-/**
  * ice_construct_skb_zc - Create an sk_buff from zero-copy buffer
  * @rx_ring: Rx ring
  * @xdp: Pointer to XDP buffer
···
 {
 	unsigned int totalsize = xdp->data_end - xdp->data_meta;
 	unsigned int metasize = xdp->data - xdp->data_meta;
+	struct skb_shared_info *sinfo = NULL;
 	struct sk_buff *skb;
+	u32 nr_frags = 0;

+	if (unlikely(xdp_buff_has_frags(xdp))) {
+		sinfo = xdp_get_shared_info_from_buff(xdp);
+		nr_frags = sinfo->nr_frags;
+	}
 	net_prefetch(xdp->data_meta);

 	skb = __napi_alloc_skb(&rx_ring->q_vector->napi, totalsize,
···
 		__skb_pull(skb, metasize);
 	}

+	if (likely(!xdp_buff_has_frags(xdp)))
+		goto out;
+
+	for (int i = 0; i < nr_frags; i++) {
+		struct skb_shared_info *skinfo = skb_shinfo(skb);
+		skb_frag_t *frag = &sinfo->frags[i];
+		struct page *page;
+		void *addr;
+
+		page = dev_alloc_page();
+		if (!page) {
+			dev_kfree_skb(skb);
+			return NULL;
+		}
+		addr = page_to_virt(page);
+
+		memcpy(addr, skb_frag_page(frag), skb_frag_size(frag));
+
+		__skb_fill_page_desc_noacc(skinfo, skinfo->nr_frags++,
+					   addr, 0, skb_frag_size(frag));
+	}
+
+out:
 	xsk_buff_free(xdp);
 	return skb;
 }
···
  * ice_clean_xdp_irq_zc - produce AF_XDP descriptors to CQ
  * @xdp_ring: XDP Tx ring
  */
-static void ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring)
+static u32 ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring)
 {
 	u16 ntc = xdp_ring->next_to_clean;
 	struct ice_tx_desc *tx_desc;
···
 	}

 	if (!completed_frames)
-		return;
+		return 0;

 	if (likely(!xdp_ring->xdp_tx_active)) {
 		xsk_frames = completed_frames;
···
 	xdp_ring->next_to_clean -= cnt;
 	if (xsk_frames)
 		xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
+
+	return completed_frames;
 }

 /**
···
 static int ice_xmit_xdp_tx_zc(struct xdp_buff *xdp,
 			      struct ice_tx_ring *xdp_ring)
 {
+	struct skb_shared_info *sinfo = NULL;
 	u32 size = xdp->data_end - xdp->data;
 	u32 ntu = xdp_ring->next_to_use;
 	struct ice_tx_desc *tx_desc;
 	struct ice_tx_buf *tx_buf;
-	dma_addr_t dma;
+	struct xdp_buff *head;
+	u32 nr_frags = 0;
+	u32 free_space;
+	u32 frag = 0;

-	if (ICE_DESC_UNUSED(xdp_ring) < ICE_RING_QUARTER(xdp_ring)) {
-		ice_clean_xdp_irq_zc(xdp_ring);
-		if (!ICE_DESC_UNUSED(xdp_ring)) {
-			xdp_ring->ring_stats->tx_stats.tx_busy++;
-			return ICE_XDP_CONSUMED;
-		}
+	free_space = ICE_DESC_UNUSED(xdp_ring);
+	if (free_space < ICE_RING_QUARTER(xdp_ring))
+		free_space += ice_clean_xdp_irq_zc(xdp_ring);
+
+	if (unlikely(!free_space))
+		goto busy;
+
+	if (unlikely(xdp_buff_has_frags(xdp))) {
+		sinfo = xdp_get_shared_info_from_buff(xdp);
+		nr_frags = sinfo->nr_frags;
+		if (free_space < nr_frags + 1)
+			goto busy;
 	}

-	dma = xsk_buff_xdp_get_dma(xdp);
-	xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, size);
-
-	tx_buf = &xdp_ring->tx_buf[ntu];
-	tx_buf->xdp = xdp;
-	tx_buf->type = ICE_TX_BUF_XSK_TX;
 	tx_desc = ICE_TX_DESC(xdp_ring, ntu);
-	tx_desc->buf_addr = cpu_to_le64(dma);
-	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP,
-						      0, size, 0);
-	xdp_ring->xdp_tx_active++;
+	tx_buf = &xdp_ring->tx_buf[ntu];
+	head = xdp;

-	if (++ntu == xdp_ring->count)
-		ntu = 0;
+	for (;;) {
+		dma_addr_t dma;
+
+		dma = xsk_buff_xdp_get_dma(xdp);
+		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, size);
+
+		tx_buf->xdp = xdp;
+		tx_buf->type = ICE_TX_BUF_XSK_TX;
+		tx_desc->buf_addr = cpu_to_le64(dma);
+		tx_desc->cmd_type_offset_bsz = ice_build_ctob(0, 0, size, 0);
+		/* account for each xdp_buff from xsk_buff_pool */
+		xdp_ring->xdp_tx_active++;
+
+		if (++ntu == xdp_ring->count)
+			ntu = 0;
+
+		if (frag == nr_frags)
+			break;
+
+		tx_desc = ICE_TX_DESC(xdp_ring, ntu);
+		tx_buf = &xdp_ring->tx_buf[ntu];
+
+		xdp = xsk_buff_get_frag(head);
+		size = skb_frag_size(&sinfo->frags[frag]);
+		frag++;
+	}
+
 	xdp_ring->next_to_use = ntu;
+	/* update last descriptor from a frame with EOP */
+	tx_desc->cmd_type_offset_bsz |=
+		cpu_to_le64(ICE_TX_DESC_CMD_EOP << ICE_TXD_QW1_CMD_S);

 	return ICE_XDP_TX;
+
+busy:
+	xdp_ring->ring_stats->tx_stats.tx_busy++;
+
+	return ICE_XDP_CONSUMED;
 }

 /**
···
 	return result;
 }

+static int
+ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first,
+		 struct xdp_buff *xdp, const unsigned int size)
+{
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(first);
+
+	if (!size)
+		return 0;
+
+	if (!xdp_buff_has_frags(first)) {
+		sinfo->nr_frags = 0;
+		sinfo->xdp_frags_size = 0;
+		xdp_buff_set_frags_flag(first);
+	}
+
+	if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) {
+		xsk_buff_free(first);
+		return -ENOMEM;
+	}
+
+	__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++,
+				   virt_to_page(xdp->data_hard_start), 0, size);
+	sinfo->xdp_frags_size += size;
+	xsk_buff_add_frag(xdp);
+
+	return 0;
+}
+
 /**
  * ice_clean_rx_irq_zc - consumes packets from the hardware ring
  * @rx_ring: AF_XDP Rx ring
···
 int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
+	struct xsk_buff_pool *xsk_pool = rx_ring->xsk_pool;
+	u32 ntc = rx_ring->next_to_clean;
+	u32 ntu = rx_ring->next_to_use;
+	struct xdp_buff *first = NULL;
 	struct ice_tx_ring *xdp_ring;
 	unsigned int xdp_xmit = 0;
 	struct bpf_prog *xdp_prog;
+	u32 cnt = rx_ring->count;
 	bool failure = false;
 	int entries_to_alloc;
···
 	 */
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	xdp_ring = rx_ring->xdp_ring;
+
+	if (ntc != rx_ring->first_desc)
+		first = *ice_xdp_buf(rx_ring, rx_ring->first_desc);

 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		union ice_32b_rx_flex_desc *rx_desc;
···
 		u16 vlan_tag = 0;
 		u16 rx_ptype;

-		rx_desc = ICE_RX_DESC(rx_ring, rx_ring->next_to_clean);
+		rx_desc = ICE_RX_DESC(rx_ring, ntc);

 		stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S);
 		if (!ice_test_staterr(rx_desc->wb.status_error0, stat_err_bits))
···
 		 */
 		dma_rmb();

-		if (unlikely(rx_ring->next_to_clean == rx_ring->next_to_use))
+		if (unlikely(ntc == ntu))
 			break;

-		xdp = *ice_xdp_buf(rx_ring, rx_ring->next_to_clean);
+		xdp = *ice_xdp_buf(rx_ring, ntc);

 		size = le16_to_cpu(rx_desc->wb.pkt_len) &
 		       ICE_RX_FLX_DESC_PKT_LEN_M;
-		if (!size) {
-			xdp->data = NULL;
-			xdp->data_end = NULL;
-			xdp->data_hard_start = NULL;
-			xdp->data_meta = NULL;
-			goto construct_skb;
-		}

 		xsk_buff_set_size(xdp, size);
-		xsk_buff_dma_sync_for_cpu(xdp, rx_ring->xsk_pool);
+		xsk_buff_dma_sync_for_cpu(xdp, xsk_pool);

-		xdp_res = ice_run_xdp_zc(rx_ring, xdp, xdp_prog, xdp_ring);
+		if (!first) {
+			first = xdp;
+			xdp_buff_clear_frags_flag(first);
+		} else if (ice_add_xsk_frag(rx_ring, first, xdp, size)) {
+			break;
+		}
+
+		if (++ntc == cnt)
+			ntc = 0;
+
+		if (ice_is_non_eop(rx_ring, rx_desc))
+			continue;
+
+		xdp_res = ice_run_xdp_zc(rx_ring, first, xdp_prog, xdp_ring);
 		if (likely(xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))) {
 			xdp_xmit |= xdp_res;
 		} else if (xdp_res == ICE_XDP_EXIT) {
 			failure = true;
+			first = NULL;
+			rx_ring->first_desc = ntc;
 			break;
 		} else if (xdp_res == ICE_XDP_CONSUMED) {
-			xsk_buff_free(xdp);
+			xsk_buff_free(first);
 		} else if (xdp_res == ICE_XDP_PASS) {
 			goto construct_skb;
 		}

-		total_rx_bytes += size;
+		total_rx_bytes += xdp_get_buff_len(first);
 		total_rx_packets++;

-		ice_bump_ntc(rx_ring);
+		first = NULL;
+		rx_ring->first_desc = ntc;
 		continue;

 construct_skb:
 		/* XDP_PASS path */
-		skb = ice_construct_skb_zc(rx_ring, xdp);
+		skb = ice_construct_skb_zc(rx_ring, first);
 		if (!skb) {
 			rx_ring->ring_stats->rx_stats.alloc_buf_failed++;
 			break;
 		}

-		ice_bump_ntc(rx_ring);
+		first = NULL;
+		rx_ring->first_desc = ntc;

 		if (eth_skb_pad(skb)) {
 			skb = NULL;
···
 		ice_receive_skb(rx_ring, skb, vlan_tag);
 	}

-	entries_to_alloc = ICE_DESC_UNUSED(rx_ring);
+	rx_ring->next_to_clean = ntc;
+	entries_to_alloc = ICE_RX_DESC_UNUSED(rx_ring);
 	if (entries_to_alloc > ICE_RING_QUARTER(rx_ring))
 		failure |= !ice_alloc_rx_bufs_zc(rx_ring, entries_to_alloc);

 	ice_finalize_xdp_rx(xdp_ring, xdp_xmit, 0);
 	ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);

-	if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
-		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
-			xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
+	if (xsk_uses_need_wakeup(xsk_pool)) {
+		/* ntu could have changed when allocating entries above, so
+		 * use rx_ring value instead of stack based one
+		 */
+		if (failure || ntc == rx_ring->next_to_use)
+			xsk_set_rx_need_wakeup(xsk_pool);
 		else
-			xsk_clear_rx_need_wakeup(rx_ring->xsk_pool);
+			xsk_clear_rx_need_wakeup(xsk_pool);

 		return (int)total_rx_packets;
 	}
···
 	tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_to_use++);
 	tx_desc->buf_addr = cpu_to_le64(dma);
-	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP,
+	tx_desc->cmd_type_offset_bsz = ice_build_ctob(xsk_is_eop_desc(desc),
						      0, desc->len, 0);

 	*total_bytes += desc->len;
···
 	tx_desc = ICE_TX_DESC(xdp_ring, ntu++);
 	tx_desc->buf_addr = cpu_to_le64(dma);
-	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP,
+	tx_desc->cmd_type_offset_bsz = ice_build_ctob(xsk_is_eop_desc(&descs[i]),
						      0, descs[i].len, 0);

 	*total_bytes += descs[i].len;
include/linux/bpf.h | +12

···
     struct btf_field fields[];
 };
 
+/* Non-opaque version of bpf_rb_node in uapi/linux/bpf.h */
+struct bpf_rb_node_kern {
+    struct rb_node rb_node;
+    void *owner;
+} __attribute__((aligned(8)));
+
+/* Non-opaque version of bpf_list_node in uapi/linux/bpf.h */
+struct bpf_list_node_kern {
+    struct list_head list_head;
+    void *owner;
+} __attribute__((aligned(8)));
+
 struct bpf_map {
     /* The first two cachelines with read-mostly members of which some
      * are also accessed in fast-path (e.g. ops, max_entries).
include/linux/bpf_mprog.h | +327 (new file)

/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2023 Isovalent */
#ifndef __BPF_MPROG_H
#define __BPF_MPROG_H

#include <linux/bpf.h>

/* bpf_mprog framework:
 *
 * bpf_mprog is a generic layer for multi-program attachment. In-kernel users
 * of the bpf_mprog don't need to care about the dependency resolution
 * internals, they can just consume it with a few API calls. Currently
 * available dependency directives are BPF_F_{BEFORE,AFTER} which enable
 * insertion of a BPF program or BPF link relative to an existing BPF program
 * or BPF link inside the multi-program array as well as prepend and append
 * behavior if no relative object was specified, see corresponding selftests
 * for concrete examples (e.g. tc_links and tc_opts test cases of test_progs).
 *
 * Usage of bpf_mprog_{attach,detach,query}() core APIs with pseudo code:
 *
 * Attach case:
 *
 *   struct bpf_mprog_entry *entry, *entry_new;
 *   int ret;
 *
 *   // bpf_mprog user-side lock
 *   // fetch active @entry from attach location
 *   [...]
 *   ret = bpf_mprog_attach(entry, &entry_new, [...]);
 *   if (!ret) {
 *       if (entry != entry_new) {
 *           // swap @entry to @entry_new at attach location
 *           // ensure there are no inflight users of @entry:
 *           synchronize_rcu();
 *       }
 *       bpf_mprog_commit(entry);
 *   } else {
 *       // error path, bail out, propagate @ret
 *   }
 *   // bpf_mprog user-side unlock
 *
 * Detach case:
 *
 *   struct bpf_mprog_entry *entry, *entry_new;
 *   int ret;
 *
 *   // bpf_mprog user-side lock
 *   // fetch active @entry from attach location
 *   [...]
 *   ret = bpf_mprog_detach(entry, &entry_new, [...]);
 *   if (!ret) {
 *       // all (*) marked is optional and depends on the use-case
 *       // whether bpf_mprog_bundle should be freed or not
 *       if (!bpf_mprog_total(entry_new))     (*)
 *           entry_new = NULL                 (*)
 *       // swap @entry to @entry_new at attach location
 *       // ensure there are no inflight users of @entry:
 *       synchronize_rcu();
 *       bpf_mprog_commit(entry);
 *       if (!entry_new)                      (*)
 *           // free bpf_mprog_bundle         (*)
 *   } else {
 *       // error path, bail out, propagate @ret
 *   }
 *   // bpf_mprog user-side unlock
 *
 * Query case:
 *
 *   struct bpf_mprog_entry *entry;
 *   int ret;
 *
 *   // bpf_mprog user-side lock
 *   // fetch active @entry from attach location
 *   [...]
 *   ret = bpf_mprog_query(attr, uattr, entry);
 *   // bpf_mprog user-side unlock
 *
 * Data/fast path:
 *
 *   struct bpf_mprog_entry *entry;
 *   struct bpf_mprog_fp *fp;
 *   struct bpf_prog *prog;
 *   int ret = [...];
 *
 *   rcu_read_lock();
 *   // fetch active @entry from attach location
 *   [...]
 *   bpf_mprog_foreach_prog(entry, fp, prog) {
 *       ret = bpf_prog_run(prog, [...]);
 *       // process @ret from program
 *   }
 *   [...]
 *   rcu_read_unlock();
 *
 * bpf_mprog locking considerations:
 *
 * bpf_mprog_{attach,detach,query}() must be protected by an external lock
 * (like RTNL in case of tcx).
 *
 * bpf_mprog_entry pointer can be an __rcu annotated pointer (in case of tcx
 * the netdevice has tcx_ingress and tcx_egress __rcu pointer) which gets
 * updated via rcu_assign_pointer() pointing to the active bpf_mprog_entry of
 * the bpf_mprog_bundle.
 *
 * Fast path accesses the active bpf_mprog_entry within RCU critical section
 * (in case of tcx it runs in NAPI which provides RCU protection there,
 * other users might need explicit rcu_read_lock()). The bpf_mprog_commit()
 * assumes that for the old bpf_mprog_entry there are no inflight users
 * anymore.
 *
 * The READ_ONCE()/WRITE_ONCE() pairing for bpf_mprog_fp's prog access is for
 * the replacement case where we don't swap the bpf_mprog_entry.
 */

#define bpf_mprog_foreach_tuple(entry, fp, cp, t)                       \
    for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];    \
         ({                                                             \
            t.prog = READ_ONCE(fp->prog);                               \
            t.link = cp->link;                                          \
            t.prog;                                                     \
          });                                                           \
         fp++, cp++)

#define bpf_mprog_foreach_prog(entry, fp, p)            \
    for (fp = &entry->fp_items[0];                      \
         (p = READ_ONCE(fp->prog));                     \
         fp++)

#define BPF_MPROG_MAX 64

struct bpf_mprog_fp {
    struct bpf_prog *prog;
};

struct bpf_mprog_cp {
    struct bpf_link *link;
};

struct bpf_mprog_entry {
    struct bpf_mprog_fp fp_items[BPF_MPROG_MAX];
    struct bpf_mprog_bundle *parent;
};

struct bpf_mprog_bundle {
    struct bpf_mprog_entry a;
    struct bpf_mprog_entry b;
    struct bpf_mprog_cp cp_items[BPF_MPROG_MAX];
    struct bpf_prog *ref;
    atomic64_t revision;
    u32 count;
};

struct bpf_tuple {
    struct bpf_prog *prog;
    struct bpf_link *link;
};

static inline struct bpf_mprog_entry *
bpf_mprog_peer(const struct bpf_mprog_entry *entry)
{
    if (entry == &entry->parent->a)
        return &entry->parent->b;
    else
        return &entry->parent->a;
}

static inline void bpf_mprog_bundle_init(struct bpf_mprog_bundle *bundle)
{
    BUILD_BUG_ON(sizeof(bundle->a.fp_items[0]) > sizeof(u64));
    BUILD_BUG_ON(ARRAY_SIZE(bundle->a.fp_items) !=
                 ARRAY_SIZE(bundle->cp_items));

    memset(bundle, 0, sizeof(*bundle));
    atomic64_set(&bundle->revision, 1);
    bundle->a.parent = bundle;
    bundle->b.parent = bundle;
}

static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry)
{
    entry->parent->count++;
}

static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry)
{
    entry->parent->count--;
}

static inline int bpf_mprog_max(void)
{
    return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1;
}

static inline int bpf_mprog_total(struct bpf_mprog_entry *entry)
{
    int total = entry->parent->count;

    WARN_ON_ONCE(total > bpf_mprog_max());
    return total;
}

static inline bool bpf_mprog_exists(struct bpf_mprog_entry *entry,
                                    struct bpf_prog *prog)
{
    const struct bpf_mprog_fp *fp;
    const struct bpf_prog *tmp;

    bpf_mprog_foreach_prog(entry, fp, tmp) {
        if (tmp == prog)
            return true;
    }
    return false;
}

static inline void bpf_mprog_mark_for_release(struct bpf_mprog_entry *entry,
                                              struct bpf_tuple *tuple)
{
    WARN_ON_ONCE(entry->parent->ref);
    if (!tuple->link)
        entry->parent->ref = tuple->prog;
}

static inline void bpf_mprog_complete_release(struct bpf_mprog_entry *entry)
{
    /* In the non-link case prog deletions can only drop the reference
     * to the prog after the bpf_mprog_entry got swapped and the
     * bpf_mprog ensured that there are no inflight users anymore.
     *
     * Paired with bpf_mprog_mark_for_release().
     */
    if (entry->parent->ref) {
        bpf_prog_put(entry->parent->ref);
        entry->parent->ref = NULL;
    }
}

static inline void bpf_mprog_revision_new(struct bpf_mprog_entry *entry)
{
    atomic64_inc(&entry->parent->revision);
}

static inline void bpf_mprog_commit(struct bpf_mprog_entry *entry)
{
    bpf_mprog_complete_release(entry);
    bpf_mprog_revision_new(entry);
}

static inline u64 bpf_mprog_revision(struct bpf_mprog_entry *entry)
{
    return atomic64_read(&entry->parent->revision);
}

static inline void bpf_mprog_entry_copy(struct bpf_mprog_entry *dst,
                                        struct bpf_mprog_entry *src)
{
    memcpy(dst->fp_items, src->fp_items, sizeof(src->fp_items));
}

static inline void bpf_mprog_entry_grow(struct bpf_mprog_entry *entry, int idx)
{
    int total = bpf_mprog_total(entry);

    memmove(entry->fp_items + idx + 1,
            entry->fp_items + idx,
            (total - idx) * sizeof(struct bpf_mprog_fp));

    memmove(entry->parent->cp_items + idx + 1,
            entry->parent->cp_items + idx,
            (total - idx) * sizeof(struct bpf_mprog_cp));
}

static inline void bpf_mprog_entry_shrink(struct bpf_mprog_entry *entry, int idx)
{
    /* Total array size is needed in this case to ensure the NULL
     * entry is copied at the end.
     */
    int total = ARRAY_SIZE(entry->fp_items);

    memmove(entry->fp_items + idx,
            entry->fp_items + idx + 1,
            (total - idx - 1) * sizeof(struct bpf_mprog_fp));

    memmove(entry->parent->cp_items + idx,
            entry->parent->cp_items + idx + 1,
            (total - idx - 1) * sizeof(struct bpf_mprog_cp));
}

static inline void bpf_mprog_read(struct bpf_mprog_entry *entry, u32 idx,
                                  struct bpf_mprog_fp **fp,
                                  struct bpf_mprog_cp **cp)
{
    *fp = &entry->fp_items[idx];
    *cp = &entry->parent->cp_items[idx];
}

static inline void bpf_mprog_write(struct bpf_mprog_fp *fp,
                                   struct bpf_mprog_cp *cp,
                                   struct bpf_tuple *tuple)
{
    WRITE_ONCE(fp->prog, tuple->prog);
    cp->link = tuple->link;
}

int bpf_mprog_attach(struct bpf_mprog_entry *entry,
                     struct bpf_mprog_entry **entry_new,
                     struct bpf_prog *prog_new, struct bpf_link *link,
                     struct bpf_prog *prog_old,
                     u32 flags, u32 id_or_fd, u64 revision);

int bpf_mprog_detach(struct bpf_mprog_entry *entry,
                     struct bpf_mprog_entry **entry_new,
                     struct bpf_prog *prog, struct bpf_link *link,
                     u32 flags, u32 id_or_fd, u64 revision);

int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
                    struct bpf_mprog_entry *entry);

static inline bool bpf_mprog_supported(enum bpf_prog_type type)
{
    switch (type) {
    case BPF_PROG_TYPE_SCHED_CLS:
        return true;
    default:
        return false;
    }
}
#endif /* __BPF_MPROG_H */
include/linux/btf_ids.h | +1

···
 extern u32 btf_tracing_ids[];
 extern u32 bpf_cgroup_btf_id[];
 extern u32 bpf_local_storage_map_btf_id[];
+extern u32 btf_bpf_map_id[];
 
 #endif
include/linux/netdevice.h | +7 -9

···
 *
 *  @rx_handler:        handler for received packets
 *  @rx_handler_data:   XXX: need comments on this one
- *  @miniq_ingress:     ingress/clsact qdisc specific data for
- *                      ingress processing
+ *  @tcx_ingress:       BPF & clsact qdisc specific data for ingress processing
 *  @ingress_queue:     XXX: need comments on this one
 *  @nf_hooks_ingress:  netfilter hooks executed for ingress packets
 *  @broadcast:         hw bcast address
···
 *  @xps_maps:  all CPUs/RXQs maps for XPS device
 *
 *  @xps_maps:  XXX: need comments on this one
- *  @miniq_egress:      clsact qdisc specific data for
- *                      egress processing
+ *  @tcx_egress:        BPF & clsact qdisc specific data for egress processing
 *  @nf_hooks_egress:   netfilter hooks executed for egress packets
 *  @qdisc_hash:        qdisc hash table
 *  @watchdog_timeo:    Represents the timeout that is used by
···
 #define GRO_MAX_SIZE (8 * 65535u)
     unsigned int gro_max_size;
     unsigned int gro_ipv4_max_size;
+    unsigned int xdp_zc_max_segs;
     rx_handler_func_t __rcu *rx_handler;
     void __rcu *rx_handler_data;
-
-#ifdef CONFIG_NET_CLS_ACT
-    struct mini_Qdisc __rcu *miniq_ingress;
+#ifdef CONFIG_NET_XGRESS
+    struct bpf_mprog_entry __rcu *tcx_ingress;
 #endif
     struct netdev_queue __rcu *ingress_queue;
 #ifdef CONFIG_NETFILTER_INGRESS
···
 #ifdef CONFIG_XPS
     struct xps_dev_maps __rcu *xps_maps[XPS_MAPS_MAX];
 #endif
-#ifdef CONFIG_NET_CLS_ACT
-    struct mini_Qdisc __rcu *miniq_egress;
+#ifdef CONFIG_NET_XGRESS
+    struct bpf_mprog_entry __rcu *tcx_egress;
 #endif
 #ifdef CONFIG_NETFILTER_EGRESS
     struct nf_hook_entries __rcu *nf_hooks_egress;
include/linux/skbuff.h | +11 -3

···
     __u8 __mono_tc_offset[0];
     /* public: */
     __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */
-#ifdef CONFIG_NET_CLS_ACT
+#ifdef CONFIG_NET_XGRESS
     __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
     __u8 tc_skip_classify:1;
 #endif
···
     __u8 csum_not_inet:1;
 #endif
 
-#ifdef CONFIG_NET_SCHED
+#if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
     __u16 tc_index; /* traffic control index */
 #endif
···
     if (likely(hlen - offset >= len))
         return (void *)data + offset;
 
-    if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
+    if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
         return NULL;
 
     return buffer;
···
 {
     return __skb_header_pointer(skb, offset, len, skb->data,
                                 skb_headlen(skb), buffer);
+}
+
+static inline void * __must_check
+skb_pointer_if_linear(const struct sk_buff *skb, int offset, int len)
+{
+    if (likely(skb_headlen(skb) - offset >= len))
+        return skb->data + offset;
+    return NULL;
 }
 
 /**
include/net/sch_generic.h | +1 -1

···
 
 static inline bool skb_at_tc_ingress(const struct sk_buff *skb)
 {
-#ifdef CONFIG_NET_CLS_ACT
+#ifdef CONFIG_NET_XGRESS
     return skb->tc_at_ingress;
 #else
     return false;
include/net/tcx.h | +206 (new file)

/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2023 Isovalent */
#ifndef __NET_TCX_H
#define __NET_TCX_H

#include <linux/bpf.h>
#include <linux/bpf_mprog.h>

#include <net/sch_generic.h>

struct mini_Qdisc;

struct tcx_entry {
    struct mini_Qdisc __rcu *miniq;
    struct bpf_mprog_bundle bundle;
    bool miniq_active;
    struct rcu_head rcu;
};

struct tcx_link {
    struct bpf_link link;
    struct net_device *dev;
    u32 location;
};

static inline void tcx_set_ingress(struct sk_buff *skb, bool ingress)
{
#ifdef CONFIG_NET_XGRESS
    skb->tc_at_ingress = ingress;
#endif
}

#ifdef CONFIG_NET_XGRESS
static inline struct tcx_entry *tcx_entry(struct bpf_mprog_entry *entry)
{
    struct bpf_mprog_bundle *bundle = entry->parent;

    return container_of(bundle, struct tcx_entry, bundle);
}

static inline struct tcx_link *tcx_link(struct bpf_link *link)
{
    return container_of(link, struct tcx_link, link);
}

static inline const struct tcx_link *tcx_link_const(const struct bpf_link *link)
{
    return tcx_link((struct bpf_link *)link);
}

void tcx_inc(void);
void tcx_dec(void);

static inline void tcx_entry_sync(void)
{
    /* bpf_mprog_entry got a/b swapped, therefore ensure that
     * there are no inflight users on the old one anymore.
     */
    synchronize_rcu();
}

static inline void
tcx_entry_update(struct net_device *dev, struct bpf_mprog_entry *entry,
                 bool ingress)
{
    ASSERT_RTNL();
    if (ingress)
        rcu_assign_pointer(dev->tcx_ingress, entry);
    else
        rcu_assign_pointer(dev->tcx_egress, entry);
}

static inline struct bpf_mprog_entry *
tcx_entry_fetch(struct net_device *dev, bool ingress)
{
    ASSERT_RTNL();
    if (ingress)
        return rcu_dereference_rtnl(dev->tcx_ingress);
    else
        return rcu_dereference_rtnl(dev->tcx_egress);
}

static inline struct bpf_mprog_entry *tcx_entry_create(void)
{
    struct tcx_entry *tcx = kzalloc(sizeof(*tcx), GFP_KERNEL);

    if (tcx) {
        bpf_mprog_bundle_init(&tcx->bundle);
        return &tcx->bundle.a;
    }
    return NULL;
}

static inline void tcx_entry_free(struct bpf_mprog_entry *entry)
{
    kfree_rcu(tcx_entry(entry), rcu);
}

static inline struct bpf_mprog_entry *
tcx_entry_fetch_or_create(struct net_device *dev, bool ingress, bool *created)
{
    struct bpf_mprog_entry *entry = tcx_entry_fetch(dev, ingress);

    *created = false;
    if (!entry) {
        entry = tcx_entry_create();
        if (!entry)
            return NULL;
        *created = true;
    }
    return entry;
}

static inline void tcx_skeys_inc(bool ingress)
{
    tcx_inc();
    if (ingress)
        net_inc_ingress_queue();
    else
        net_inc_egress_queue();
}

static inline void tcx_skeys_dec(bool ingress)
{
    if (ingress)
        net_dec_ingress_queue();
    else
        net_dec_egress_queue();
    tcx_dec();
}

static inline void tcx_miniq_set_active(struct bpf_mprog_entry *entry,
                                        const bool active)
{
    ASSERT_RTNL();
    tcx_entry(entry)->miniq_active = active;
}

static inline bool tcx_entry_is_active(struct bpf_mprog_entry *entry)
{
    ASSERT_RTNL();
    return bpf_mprog_total(entry) || tcx_entry(entry)->miniq_active;
}

static inline enum tcx_action_base tcx_action_code(struct sk_buff *skb,
                                                   int code)
{
    switch (code) {
    case TCX_PASS:
        skb->tc_index = qdisc_skb_cb(skb)->tc_classid;
        fallthrough;
    case TCX_DROP:
    case TCX_REDIRECT:
        return code;
    case TCX_NEXT:
    default:
        return TCX_NEXT;
    }
}
#endif /* CONFIG_NET_XGRESS */

#if defined(CONFIG_NET_XGRESS) && defined(CONFIG_BPF_SYSCALL)
int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog);
void tcx_uninstall(struct net_device *dev, bool ingress);

int tcx_prog_query(const union bpf_attr *attr,
                   union bpf_attr __user *uattr);

static inline void dev_tcx_uninstall(struct net_device *dev)
{
    ASSERT_RTNL();
    tcx_uninstall(dev, true);
    tcx_uninstall(dev, false);
}
#else
static inline int tcx_prog_attach(const union bpf_attr *attr,
                                  struct bpf_prog *prog)
{
    return -EINVAL;
}

static inline int tcx_link_attach(const union bpf_attr *attr,
                                  struct bpf_prog *prog)
{
    return -EINVAL;
}

static inline int tcx_prog_detach(const union bpf_attr *attr,
                                  struct bpf_prog *prog)
{
    return -EINVAL;
}

static inline int tcx_prog_query(const union bpf_attr *attr,
                                 union bpf_attr __user *uattr)
{
    return -EINVAL;
}

static inline void dev_tcx_uninstall(struct net_device *dev)
{
}
#endif /* CONFIG_NET_XGRESS && CONFIG_BPF_SYSCALL */
#endif /* __NET_TCX_H */
include/net/xdp_sock.h | +7

···
     struct xsk_buff_pool *pool;
     u16 queue_id;
     bool zc;
+    bool sg;
     enum {
         XSK_READY = 0,
         XSK_BOUND,
···
     /* Statistics */
     u64 rx_dropped;
     u64 rx_queue_full;
+
+    /* When __xsk_generic_xmit() must return before it sees the EOP descriptor for the current
+     * packet, the partially built skb is saved here so that packet building can resume in next
+     * call of __xsk_generic_xmit().
+     */
+    struct sk_buff *skb;
 
     struct list_head map_list;
     /* Protects map_list */
include/net/xdp_sock_drv.h | +54

···
     return xp_alloc(pool);
 }
 
+static inline bool xsk_is_eop_desc(struct xdp_desc *desc)
+{
+    return !xp_mb_desc(desc);
+}
+
 /* Returns as many entries as possible up to max. 0 <= N <= max. */
 static inline u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max)
 {
···
 static inline void xsk_buff_free(struct xdp_buff *xdp)
 {
     struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
+    struct list_head *xskb_list = &xskb->pool->xskb_list;
+    struct xdp_buff_xsk *pos, *tmp;
 
+    if (likely(!xdp_buff_has_frags(xdp)))
+        goto out;
+
+    list_for_each_entry_safe(pos, tmp, xskb_list, xskb_list_node) {
+        list_del(&pos->xskb_list_node);
+        xp_free(pos);
+    }
+
+    xdp_get_shared_info_from_buff(xdp)->nr_frags = 0;
+out:
     xp_free(xskb);
+}
+
+static inline void xsk_buff_add_frag(struct xdp_buff *xdp)
+{
+    struct xdp_buff_xsk *frag = container_of(xdp, struct xdp_buff_xsk, xdp);
+
+    list_add_tail(&frag->xskb_list_node, &frag->pool->xskb_list);
+}
+
+static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first)
+{
+    struct xdp_buff_xsk *xskb = container_of(first, struct xdp_buff_xsk, xdp);
+    struct xdp_buff *ret = NULL;
+    struct xdp_buff_xsk *frag;
+
+    frag = list_first_entry_or_null(&xskb->pool->xskb_list,
+                                    struct xdp_buff_xsk, xskb_list_node);
+    if (frag) {
+        list_del(&frag->xskb_list_node);
+        ret = &frag->xdp;
+    }
+
+    return ret;
 }
 
 static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size)
···
     return NULL;
 }
 
+static inline bool xsk_is_eop_desc(struct xdp_desc *desc)
+{
+    return false;
+}
+
 static inline u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max)
 {
     return 0;
···
 
 static inline void xsk_buff_free(struct xdp_buff *xdp)
 {
+}
+
+static inline void xsk_buff_add_frag(struct xdp_buff *xdp)
+{
+}
+
+static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first)
+{
+    return NULL;
 }
 
 static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size)
include/net/xsk_buff_pool.h | +7

···
     struct xsk_buff_pool *pool;
     u64 orig_addr;
     struct list_head free_list_node;
+    struct list_head xskb_list_node;
 };
 
 #define XSK_CHECK_PRIV_TYPE(t) BUILD_BUG_ON(sizeof(t) > offsetofend(struct xdp_buff_xsk, cb))
···
     struct xdp_umem *umem;
     struct work_struct work;
     struct list_head free_list;
+    struct list_head xskb_list;
     u32 heads_cnt;
     u16 queue_id;
···
 
     return pool->dma_pages &&
            !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK);
+}
+
+static inline bool xp_mb_desc(struct xdp_desc *desc)
+{
+    return desc->options & XDP_PKT_CONTD;
 }
 
 static inline u64 xp_aligned_extract_addr(struct xsk_buff_pool *pool, u64 addr)
include/uapi/linux/bpf.h | +60 -12

···
     BPF_LSM_CGROUP,
     BPF_STRUCT_OPS,
     BPF_NETFILTER,
+    BPF_TCX_INGRESS,
+    BPF_TCX_EGRESS,
     __MAX_BPF_ATTACH_TYPE
 };
···
     BPF_LINK_TYPE_KPROBE_MULTI = 8,
     BPF_LINK_TYPE_STRUCT_OPS = 9,
     BPF_LINK_TYPE_NETFILTER = 10,
-
+    BPF_LINK_TYPE_TCX = 11,
     MAX_BPF_LINK_TYPE,
 };
···
 */
 #define BPF_F_ALLOW_OVERRIDE (1U << 0)
 #define BPF_F_ALLOW_MULTI    (1U << 1)
+/* Generic attachment flags. */
 #define BPF_F_REPLACE        (1U << 2)
+#define BPF_F_BEFORE         (1U << 3)
+#define BPF_F_AFTER          (1U << 4)
+#define BPF_F_ID             (1U << 5)
+#define BPF_F_LINK           BPF_F_LINK /* 1 << 13 */
 
 /* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
  * verifier will perform strict alignment checking as if the kernel
···
     };
 
     struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
-        __u32 target_fd;      /* container object to attach to */
-        __u32 attach_bpf_fd;  /* eBPF program to attach */
+        union {
+            __u32 target_fd;      /* target object to attach to or ... */
+            __u32 target_ifindex; /* target ifindex */
+        };
+        __u32 attach_bpf_fd;
         __u32 attach_type;
         __u32 attach_flags;
-        __u32 replace_bpf_fd; /* previously attached eBPF
-                               * program to replace if
-                               * BPF_F_REPLACE is used
-                               */
+        __u32 replace_bpf_fd;
+        union {
+            __u32 relative_fd;
+            __u32 relative_id;
+        };
+        __u64 expected_revision;
     };
 
     struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
···
     } info;
 
     struct { /* anonymous struct used by BPF_PROG_QUERY command */
-        __u32 target_fd;          /* container object to query */
+        union {
+            __u32 target_fd;      /* target object to query or ... */
+            __u32 target_ifindex; /* target ifindex */
+        };
         __u32 attach_type;
         __u32 query_flags;
         __u32 attach_flags;
         __aligned_u64 prog_ids;
-        __u32 prog_cnt;
+        union {
+            __u32 prog_cnt;
+            __u32 count;
+        };
+        __u32 :32;
         /* output: per-program attach_flags.
          * not allowed to be set during effective query.
          */
         __aligned_u64 prog_attach_flags;
+        __aligned_u64 link_ids;
+        __aligned_u64 link_attach_flags;
+        __u64 revision;
     } query;
 
     struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
···
             __u32 map_fd; /* struct_ops to attach */
         };
         union {
-            __u32 target_fd;      /* object to attach to */
+            __u32 target_fd;      /* target object to attach to or ... */
             __u32 target_ifindex; /* target ifindex */
         };
         __u32 attach_type; /* attach type */
         __u32 flags;       /* extra flags */
         union {
             __u32 target_btf_id; /* btf_id of target to attach to */
             struct {
                 __aligned_u64 iter_info; /* extra bpf_iter_link_info */
                 __u32 iter_info_len;     /* iter_info length */
···
             __s32 priority;
             __u32 flags;
         } netfilter;
+        struct {
+            union {
+                __u32 relative_fd;
+                __u32 relative_id;
+            };
+            __u64 expected_revision;
+        } tcx;
     };
 } link_create;
···
         };
     };
 
+/* (Simplified) user return codes for tcx prog type.
+ * A valid tcx program must return one of these defined values. All other
+ * return codes are reserved for future use. Must remain compatible with
+ * their TC_ACT_* counter-parts. For compatibility in behavior, unknown
+ * return codes are mapped to TCX_NEXT.
+ */
+enum tcx_action_base {
+    TCX_NEXT     = -1,
+    TCX_PASS     = 0,
+    TCX_DROP     = 2,
+    TCX_REDIRECT = 7,
+};
+
 struct bpf_xdp_sock {
     __u32 queue_id;
 };
···
                 } event; /* BPF_PERF_EVENT_EVENT */
             };
         } perf_event;
+        struct {
+            __u32 ifindex;
+            __u32 attach_type;
+        } tcx;
     };
 } __attribute__((aligned(8)));
···
 struct bpf_list_node {
     __u64 :64;
     __u64 :64;
+    __u64 :64;
 } __attribute__((aligned(8)));
 
 struct bpf_rb_root {
···
 } __attribute__((aligned(8)));
 
 struct bpf_rb_node {
+    __u64 :64;
     __u64 :64;
     __u64 :64;
     __u64 :64;
include/uapi/linux/if_xdp.h | +13

···
 * application.
 */
 #define XDP_USE_NEED_WAKEUP (1 << 3)
+/* By setting this option, userspace application indicates that it can
+ * handle multiple descriptors per packet thus enabling AF_XDP to split
+ * multi-buffer XDP frames into multiple Rx descriptors. Without this set
+ * such frames will be dropped.
+ */
+#define XDP_USE_SG (1 << 4)
 
 /* Flags for xsk_umem_config flags */
 #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0)
···
 };
 
 /* UMEM descriptor is __u64 */
+
+/* Flag indicating that the packet continues with the buffer pointed out by the
+ * next frame in the ring. The end of the packet is signalled by setting this
+ * bit to zero. For single buffer packets, every descriptor has 'options' set
+ * to 0 and this maintains backward compatibility.
+ */
+#define XDP_PKT_CONTD (1 << 0)
 
 #endif /* _LINUX_IF_XDP_H */
include/uapi/linux/netdev.h | +1

···
     NETDEV_A_DEV_IFINDEX = 1,
     NETDEV_A_DEV_PAD,
     NETDEV_A_DEV_XDP_FEATURES,
+    NETDEV_A_DEV_XDP_ZC_MAX_SEGS,
 
     __NETDEV_A_DEV_MAX,
     NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
kernel/bpf/Kconfig | +1

···
     select TASKS_TRACE_RCU
     select BINARY_PRINTF
     select NET_SOCK_MSG if NET
+    select NET_XGRESS if NET
     select PAGE_POOL if NET
     default n
     help
kernel/bpf/Makefile | +2 -1

···
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
-obj-$(CONFIG_BPF_SYSCALL) += disasm.o
+obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
 obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
 obj-$(CONFIG_BPF_JIT) += dispatcher.o
···
 obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
 obj-$(CONFIG_BPF_SYSCALL) += offload.o
 obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
+obj-$(CONFIG_BPF_SYSCALL) += tcx.o
 endif
 ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
+41 -14
kernel/bpf/helpers.c
···
1942 1942		return (void *)p__refcounted_kptr;
1943 1943	}
1944 1944
1945 -	static int __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head,
1945 +	static int __bpf_list_add(struct bpf_list_node_kern *node,
1946 +				  struct bpf_list_head *head,
1946 1947				  bool tail, struct btf_record *rec, u64 off)
1947 1948	{
1948 -		struct list_head *n = (void *)node, *h = (void *)head;
1949 +		struct list_head *n = &node->list_head, *h = (void *)head;
1949 1950
1950 1951		/* If list_head was 0-initialized by map, bpf_obj_init_field wasn't
1951 1952		 * called on its fields, so init here
1952 1953		 */
1953 1954		if (unlikely(!h->next))
1954 1955			INIT_LIST_HEAD(h);
1955 -		if (!list_empty(n)) {
1956 +
1957 +		/* node->owner != NULL implies !list_empty(n), no need to separately
1958 +		 * check the latter
1959 +		 */
1960 +		if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) {
1956 1961			/* Only called from BPF prog, no need to migrate_disable */
1957 1962			__bpf_obj_drop_impl((void *)n - off, rec);
1958 1963			return -EINVAL;
1959 1964		}
1960 1965
1961 1966		tail ? list_add_tail(n, h) : list_add(n, h);
1967 +		WRITE_ONCE(node->owner, head);
1962 1968
1963 1969		return 0;
1964 1970	}
···
1973 1967					  struct bpf_list_node *node,
1974 1968					  void *meta__ign, u64 off)
1975 1969	{
1970 +		struct bpf_list_node_kern *n = (void *)node;
1976 1971		struct btf_struct_meta *meta = meta__ign;
1977 1972
1978 -		return __bpf_list_add(node, head, false,
1979 -				      meta ? meta->record : NULL, off);
1973 +		return __bpf_list_add(n, head, false, meta ? meta->record : NULL, off);
1980 1974	}
1981 1975
1982 1976	__bpf_kfunc int bpf_list_push_back_impl(struct bpf_list_head *head,
1983 1977						struct bpf_list_node *node,
1984 1978						void *meta__ign, u64 off)
1985 1979	{
1980 +		struct bpf_list_node_kern *n = (void *)node;
1986 1981		struct btf_struct_meta *meta = meta__ign;
1987 1982
1988 -		return __bpf_list_add(node, head, true,
1989 -				      meta ? meta->record : NULL, off);
1983 +		return __bpf_list_add(n, head, true, meta ? meta->record : NULL, off);
1990 1984	}
1991 1985
1992 1986	static struct bpf_list_node *__bpf_list_del(struct bpf_list_head *head, bool tail)
1993 1987	{
1994 1988		struct list_head *n, *h = (void *)head;
1989 +		struct bpf_list_node_kern *node;
1995 1990
1996 1991		/* If list_head was 0-initialized by map, bpf_obj_init_field wasn't
1997 1992		 * called on its fields, so init here
···
2001 1994			INIT_LIST_HEAD(h);
2002 1995		if (list_empty(h))
2003 1996			return NULL;
1997 +
2004 1998		n = tail ? h->prev : h->next;
1999 +		node = container_of(n, struct bpf_list_node_kern, list_head);
2000 +		if (WARN_ON_ONCE(READ_ONCE(node->owner) != head))
2001 +			return NULL;
2002 +
2005 2003		list_del_init(n);
2004 +		WRITE_ONCE(node->owner, NULL);
2006 2005		return (struct bpf_list_node *)n;
2007 2006	}
2008 2007
···
2025 2012	__bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
2026 2013							  struct bpf_rb_node *node)
2027 2014	{
2015 +		struct bpf_rb_node_kern *node_internal = (struct bpf_rb_node_kern *)node;
2028 2016		struct rb_root_cached *r = (struct rb_root_cached *)root;
2029 -		struct rb_node *n = (struct rb_node *)node;
2017 +		struct rb_node *n = &node_internal->rb_node;
2030 2018
2031 -		if (RB_EMPTY_NODE(n))
2019 +		/* node_internal->owner != root implies either RB_EMPTY_NODE(n) or
2020 +		 * n is owned by some other tree. No need to check RB_EMPTY_NODE(n)
2021 +		 */
2022 +		if (READ_ONCE(node_internal->owner) != root)
2032 2023			return NULL;
2033 2024
2034 2025		rb_erase_cached(n, r);
2035 2026		RB_CLEAR_NODE(n);
2027 +		WRITE_ONCE(node_internal->owner, NULL);
2036 2028		return (struct bpf_rb_node *)n;
2037 2029	}
2038 2030
2039 2031	/* Need to copy rbtree_add_cached's logic here because our 'less' is a BPF
2040 2032	 * program
2041 2033	 */
2042 -	static int __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
2034 +	static int __bpf_rbtree_add(struct bpf_rb_root *root,
2035 +				    struct bpf_rb_node_kern *node,
2043 2036				    void *less, struct btf_record *rec, u64 off)
2044 2037	{
2045 2038		struct rb_node **link = &((struct rb_root_cached *)root)->rb_root.rb_node;
2046 -		struct rb_node *parent = NULL, *n = (struct rb_node *)node;
2039 +		struct rb_node *parent = NULL, *n = &node->rb_node;
2047 2040		bpf_callback_t cb = (bpf_callback_t)less;
2048 2041		bool leftmost = true;
2049 2042
2050 -		if (!RB_EMPTY_NODE(n)) {
2043 +		/* node->owner != NULL implies !RB_EMPTY_NODE(n), no need to separately
2044 +		 * check the latter
2045 +		 */
2046 +		if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) {
2051 2047			/* Only called from BPF prog, no need to migrate_disable */
2052 2048			__bpf_obj_drop_impl((void *)n - off, rec);
2053 2049			return -EINVAL;
···
2074 2052
2075 2053		rb_link_node(n, parent, link);
2076 2054		rb_insert_color_cached(n, (struct rb_root_cached *)root, leftmost);
2055 +		WRITE_ONCE(node->owner, root);
2077 2056		return 0;
2078 2057	}
···
2083 2060					 void *meta__ign, u64 off)
2084 2061	{
2085 2062		struct btf_struct_meta *meta = meta__ign;
2063 +		struct bpf_rb_node_kern *n = (void *)node;
2086 2064
2087 -		return __bpf_rbtree_add(root, node, (void *)less, meta ? meta->record : NULL, off);
2065 +		return __bpf_rbtree_add(root, n, (void *)less, meta ? meta->record : NULL, off);
2088 2066	}
2089 2067
2090 2068	__bpf_kfunc struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root)
···
2263 2239		case BPF_DYNPTR_TYPE_RINGBUF:
2264 2240			return ptr->data + ptr->offset + offset;
2265 2241		case BPF_DYNPTR_TYPE_SKB:
2266 -			return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer__opt);
2242 +			if (buffer__opt)
2243 +				return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer__opt);
2244 +			else
2245 +				return skb_pointer_if_linear(ptr->data, ptr->offset + offset, len);
2267 2246		case BPF_DYNPTR_TYPE_XDP:
2268 2247		{
2269 2248			void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset + offset, len);
+3 -4
kernel/bpf/map_iter.c
···
78 78		.show = bpf_map_seq_show,
79 79	};
80 80
81 -	BTF_ID_LIST(btf_bpf_map_id)
82 -	BTF_ID(struct, bpf_map)
81 +	BTF_ID_LIST_GLOBAL_SINGLE(btf_bpf_map_id, struct, bpf_map)
83 82
84 83	static const struct bpf_iter_seq_info bpf_map_seq_info = {
85 84		.seq_ops = &bpf_map_seq_ops,
···
197 198	__diag_ignore_all("-Wmissing-prototypes",
198 199			  "Global functions as their definitions will be in vmlinux BTF");
199 200
200 -	__bpf_kfunc s64 bpf_map_sum_elem_count(struct bpf_map *map)
201 +	__bpf_kfunc s64 bpf_map_sum_elem_count(const struct bpf_map *map)
201 202	{
202 203		s64 *pcount;
203 204		s64 ret = 0;
···
226 227
227 228	static int init_subsystem(void)
228 229	{
229 -		return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_map_iter_kfunc_set);
230 +		return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &bpf_map_iter_kfunc_set);
230 231	}
231 232	late_initcall(init_subsystem);
+445
kernel/bpf/mprog.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2023 Isovalent */ 3 + 4 + #include <linux/bpf.h> 5 + #include <linux/bpf_mprog.h> 6 + 7 + static int bpf_mprog_link(struct bpf_tuple *tuple, 8 + u32 id_or_fd, u32 flags, 9 + enum bpf_prog_type type) 10 + { 11 + struct bpf_link *link = ERR_PTR(-EINVAL); 12 + bool id = flags & BPF_F_ID; 13 + 14 + if (id) 15 + link = bpf_link_by_id(id_or_fd); 16 + else if (id_or_fd) 17 + link = bpf_link_get_from_fd(id_or_fd); 18 + if (IS_ERR(link)) 19 + return PTR_ERR(link); 20 + if (type && link->prog->type != type) { 21 + bpf_link_put(link); 22 + return -EINVAL; 23 + } 24 + 25 + tuple->link = link; 26 + tuple->prog = link->prog; 27 + return 0; 28 + } 29 + 30 + static int bpf_mprog_prog(struct bpf_tuple *tuple, 31 + u32 id_or_fd, u32 flags, 32 + enum bpf_prog_type type) 33 + { 34 + struct bpf_prog *prog = ERR_PTR(-EINVAL); 35 + bool id = flags & BPF_F_ID; 36 + 37 + if (id) 38 + prog = bpf_prog_by_id(id_or_fd); 39 + else if (id_or_fd) 40 + prog = bpf_prog_get(id_or_fd); 41 + if (IS_ERR(prog)) 42 + return PTR_ERR(prog); 43 + if (type && prog->type != type) { 44 + bpf_prog_put(prog); 45 + return -EINVAL; 46 + } 47 + 48 + tuple->link = NULL; 49 + tuple->prog = prog; 50 + return 0; 51 + } 52 + 53 + static int bpf_mprog_tuple_relative(struct bpf_tuple *tuple, 54 + u32 id_or_fd, u32 flags, 55 + enum bpf_prog_type type) 56 + { 57 + bool link = flags & BPF_F_LINK; 58 + bool id = flags & BPF_F_ID; 59 + 60 + memset(tuple, 0, sizeof(*tuple)); 61 + if (link) 62 + return bpf_mprog_link(tuple, id_or_fd, flags, type); 63 + /* If no relevant flag is set and no id_or_fd was passed, then 64 + * tuple link/prog is just NULLed. This is the case when before/ 65 + * after selects first/last position without passing fd. 
66 + */ 67 + if (!id && !id_or_fd) 68 + return 0; 69 + return bpf_mprog_prog(tuple, id_or_fd, flags, type); 70 + } 71 + 72 + static void bpf_mprog_tuple_put(struct bpf_tuple *tuple) 73 + { 74 + if (tuple->link) 75 + bpf_link_put(tuple->link); 76 + else if (tuple->prog) 77 + bpf_prog_put(tuple->prog); 78 + } 79 + 80 + /* The bpf_mprog_{replace,delete}() operate on exact idx position with the 81 + * one exception that for deletion we support delete from front/back. In 82 + * case of front idx is -1, in case of back idx is bpf_mprog_total(entry). 83 + * Adjustment to first and last entry is trivial. The bpf_mprog_insert() 84 + * we have to deal with the following cases: 85 + * 86 + * idx + before: 87 + * 88 + * Insert P4 before P3: idx for old array is 1, idx for new array is 2, 89 + * hence we adjust target idx for the new array, so that memmove copies 90 + * P1 and P2 to the new entry, and we insert P4 into idx 2. Inserting 91 + * before P1 would have old idx -1 and new idx 0. 92 + * 93 + * +--+--+--+ +--+--+--+--+ +--+--+--+--+ 94 + * |P1|P2|P3| ==> |P1|P2| |P3| ==> |P1|P2|P4|P3| 95 + * +--+--+--+ +--+--+--+--+ +--+--+--+--+ 96 + * 97 + * idx + after: 98 + * 99 + * Insert P4 after P2: idx for old array is 2, idx for new array is 2. 100 + * Again, memmove copies P1 and P2 to the new entry, and we insert P4 101 + * into idx 2. Inserting after P3 would have both old/new idx at 4 aka 102 + * bpf_mprog_total(entry). 
103 + * 104 + * +--+--+--+ +--+--+--+--+ +--+--+--+--+ 105 + * |P1|P2|P3| ==> |P1|P2| |P3| ==> |P1|P2|P4|P3| 106 + * +--+--+--+ +--+--+--+--+ +--+--+--+--+ 107 + */ 108 + static int bpf_mprog_replace(struct bpf_mprog_entry *entry, 109 + struct bpf_mprog_entry **entry_new, 110 + struct bpf_tuple *ntuple, int idx) 111 + { 112 + struct bpf_mprog_fp *fp; 113 + struct bpf_mprog_cp *cp; 114 + struct bpf_prog *oprog; 115 + 116 + bpf_mprog_read(entry, idx, &fp, &cp); 117 + oprog = READ_ONCE(fp->prog); 118 + bpf_mprog_write(fp, cp, ntuple); 119 + if (!ntuple->link) { 120 + WARN_ON_ONCE(cp->link); 121 + bpf_prog_put(oprog); 122 + } 123 + *entry_new = entry; 124 + return 0; 125 + } 126 + 127 + static int bpf_mprog_insert(struct bpf_mprog_entry *entry, 128 + struct bpf_mprog_entry **entry_new, 129 + struct bpf_tuple *ntuple, int idx, u32 flags) 130 + { 131 + int total = bpf_mprog_total(entry); 132 + struct bpf_mprog_entry *peer; 133 + struct bpf_mprog_fp *fp; 134 + struct bpf_mprog_cp *cp; 135 + 136 + peer = bpf_mprog_peer(entry); 137 + bpf_mprog_entry_copy(peer, entry); 138 + if (idx == total) 139 + goto insert; 140 + else if (flags & BPF_F_BEFORE) 141 + idx += 1; 142 + bpf_mprog_entry_grow(peer, idx); 143 + insert: 144 + bpf_mprog_read(peer, idx, &fp, &cp); 145 + bpf_mprog_write(fp, cp, ntuple); 146 + bpf_mprog_inc(peer); 147 + *entry_new = peer; 148 + return 0; 149 + } 150 + 151 + static int bpf_mprog_delete(struct bpf_mprog_entry *entry, 152 + struct bpf_mprog_entry **entry_new, 153 + struct bpf_tuple *dtuple, int idx) 154 + { 155 + int total = bpf_mprog_total(entry); 156 + struct bpf_mprog_entry *peer; 157 + 158 + peer = bpf_mprog_peer(entry); 159 + bpf_mprog_entry_copy(peer, entry); 160 + if (idx == -1) 161 + idx = 0; 162 + else if (idx == total) 163 + idx = total - 1; 164 + bpf_mprog_entry_shrink(peer, idx); 165 + bpf_mprog_dec(peer); 166 + bpf_mprog_mark_for_release(peer, dtuple); 167 + *entry_new = peer; 168 + return 0; 169 + } 170 + 171 + /* In bpf_mprog_pos_*() we 
evaluate the target position for the BPF 172 + * program/link that needs to be replaced, inserted or deleted for 173 + * each "rule" independently. If all rules agree on that position 174 + * or existing element, then enact replacement, addition or deletion. 175 + * If this is not the case, then the request cannot be satisfied and 176 + * we bail out with an error. 177 + */ 178 + static int bpf_mprog_pos_exact(struct bpf_mprog_entry *entry, 179 + struct bpf_tuple *tuple) 180 + { 181 + struct bpf_mprog_fp *fp; 182 + struct bpf_mprog_cp *cp; 183 + int i; 184 + 185 + for (i = 0; i < bpf_mprog_total(entry); i++) { 186 + bpf_mprog_read(entry, i, &fp, &cp); 187 + if (tuple->prog == READ_ONCE(fp->prog)) 188 + return tuple->link == cp->link ? i : -EBUSY; 189 + } 190 + return -ENOENT; 191 + } 192 + 193 + static int bpf_mprog_pos_before(struct bpf_mprog_entry *entry, 194 + struct bpf_tuple *tuple) 195 + { 196 + struct bpf_mprog_fp *fp; 197 + struct bpf_mprog_cp *cp; 198 + int i; 199 + 200 + for (i = 0; i < bpf_mprog_total(entry); i++) { 201 + bpf_mprog_read(entry, i, &fp, &cp); 202 + if (tuple->prog == READ_ONCE(fp->prog) && 203 + (!tuple->link || tuple->link == cp->link)) 204 + return i - 1; 205 + } 206 + return tuple->prog ? -ENOENT : -1; 207 + } 208 + 209 + static int bpf_mprog_pos_after(struct bpf_mprog_entry *entry, 210 + struct bpf_tuple *tuple) 211 + { 212 + struct bpf_mprog_fp *fp; 213 + struct bpf_mprog_cp *cp; 214 + int i; 215 + 216 + for (i = 0; i < bpf_mprog_total(entry); i++) { 217 + bpf_mprog_read(entry, i, &fp, &cp); 218 + if (tuple->prog == READ_ONCE(fp->prog) && 219 + (!tuple->link || tuple->link == cp->link)) 220 + return i + 1; 221 + } 222 + return tuple->prog ? 
-ENOENT : bpf_mprog_total(entry); 223 + } 224 + 225 + int bpf_mprog_attach(struct bpf_mprog_entry *entry, 226 + struct bpf_mprog_entry **entry_new, 227 + struct bpf_prog *prog_new, struct bpf_link *link, 228 + struct bpf_prog *prog_old, 229 + u32 flags, u32 id_or_fd, u64 revision) 230 + { 231 + struct bpf_tuple rtuple, ntuple = { 232 + .prog = prog_new, 233 + .link = link, 234 + }, otuple = { 235 + .prog = prog_old, 236 + .link = link, 237 + }; 238 + int ret, idx = -ERANGE, tidx; 239 + 240 + if (revision && revision != bpf_mprog_revision(entry)) 241 + return -ESTALE; 242 + if (bpf_mprog_exists(entry, prog_new)) 243 + return -EEXIST; 244 + ret = bpf_mprog_tuple_relative(&rtuple, id_or_fd, 245 + flags & ~BPF_F_REPLACE, 246 + prog_new->type); 247 + if (ret) 248 + return ret; 249 + if (flags & BPF_F_REPLACE) { 250 + tidx = bpf_mprog_pos_exact(entry, &otuple); 251 + if (tidx < 0) { 252 + ret = tidx; 253 + goto out; 254 + } 255 + idx = tidx; 256 + } 257 + if (flags & BPF_F_BEFORE) { 258 + tidx = bpf_mprog_pos_before(entry, &rtuple); 259 + if (tidx < -1 || (idx >= -1 && tidx != idx)) { 260 + ret = tidx < -1 ? tidx : -ERANGE; 261 + goto out; 262 + } 263 + idx = tidx; 264 + } 265 + if (flags & BPF_F_AFTER) { 266 + tidx = bpf_mprog_pos_after(entry, &rtuple); 267 + if (tidx < -1 || (idx >= -1 && tidx != idx)) { 268 + ret = tidx < 0 ? 
tidx : -ERANGE; 269 + goto out; 270 + } 271 + idx = tidx; 272 + } 273 + if (idx < -1) { 274 + if (rtuple.prog || flags) { 275 + ret = -EINVAL; 276 + goto out; 277 + } 278 + idx = bpf_mprog_total(entry); 279 + flags = BPF_F_AFTER; 280 + } 281 + if (idx >= bpf_mprog_max()) { 282 + ret = -ERANGE; 283 + goto out; 284 + } 285 + if (flags & BPF_F_REPLACE) 286 + ret = bpf_mprog_replace(entry, entry_new, &ntuple, idx); 287 + else 288 + ret = bpf_mprog_insert(entry, entry_new, &ntuple, idx, flags); 289 + out: 290 + bpf_mprog_tuple_put(&rtuple); 291 + return ret; 292 + } 293 + 294 + static int bpf_mprog_fetch(struct bpf_mprog_entry *entry, 295 + struct bpf_tuple *tuple, int idx) 296 + { 297 + int total = bpf_mprog_total(entry); 298 + struct bpf_mprog_cp *cp; 299 + struct bpf_mprog_fp *fp; 300 + struct bpf_prog *prog; 301 + struct bpf_link *link; 302 + 303 + if (idx == -1) 304 + idx = 0; 305 + else if (idx == total) 306 + idx = total - 1; 307 + bpf_mprog_read(entry, idx, &fp, &cp); 308 + prog = READ_ONCE(fp->prog); 309 + link = cp->link; 310 + /* The deletion request can either be without filled tuple in which 311 + * case it gets populated here based on idx, or with filled tuple 312 + * where the only thing we end up doing is the WARN_ON_ONCE() assert. 313 + * If we hit a BPF link at the given index, it must not be removed 314 + * from opts path. 
315 + */ 316 + if (link && !tuple->link) 317 + return -EBUSY; 318 + WARN_ON_ONCE(tuple->prog && tuple->prog != prog); 319 + WARN_ON_ONCE(tuple->link && tuple->link != link); 320 + tuple->prog = prog; 321 + tuple->link = link; 322 + return 0; 323 + } 324 + 325 + int bpf_mprog_detach(struct bpf_mprog_entry *entry, 326 + struct bpf_mprog_entry **entry_new, 327 + struct bpf_prog *prog, struct bpf_link *link, 328 + u32 flags, u32 id_or_fd, u64 revision) 329 + { 330 + struct bpf_tuple rtuple, dtuple = { 331 + .prog = prog, 332 + .link = link, 333 + }; 334 + int ret, idx = -ERANGE, tidx; 335 + 336 + if (flags & BPF_F_REPLACE) 337 + return -EINVAL; 338 + if (revision && revision != bpf_mprog_revision(entry)) 339 + return -ESTALE; 340 + ret = bpf_mprog_tuple_relative(&rtuple, id_or_fd, flags, 341 + prog ? prog->type : 342 + BPF_PROG_TYPE_UNSPEC); 343 + if (ret) 344 + return ret; 345 + if (dtuple.prog) { 346 + tidx = bpf_mprog_pos_exact(entry, &dtuple); 347 + if (tidx < 0) { 348 + ret = tidx; 349 + goto out; 350 + } 351 + idx = tidx; 352 + } 353 + if (flags & BPF_F_BEFORE) { 354 + tidx = bpf_mprog_pos_before(entry, &rtuple); 355 + if (tidx < -1 || (idx >= -1 && tidx != idx)) { 356 + ret = tidx < -1 ? tidx : -ERANGE; 357 + goto out; 358 + } 359 + idx = tidx; 360 + } 361 + if (flags & BPF_F_AFTER) { 362 + tidx = bpf_mprog_pos_after(entry, &rtuple); 363 + if (tidx < -1 || (idx >= -1 && tidx != idx)) { 364 + ret = tidx < 0 ? 
tidx : -ERANGE; 365 + goto out; 366 + } 367 + idx = tidx; 368 + } 369 + if (idx < -1) { 370 + if (rtuple.prog || flags) { 371 + ret = -EINVAL; 372 + goto out; 373 + } 374 + idx = bpf_mprog_total(entry); 375 + flags = BPF_F_AFTER; 376 + } 377 + if (idx >= bpf_mprog_max()) { 378 + ret = -ERANGE; 379 + goto out; 380 + } 381 + ret = bpf_mprog_fetch(entry, &dtuple, idx); 382 + if (ret) 383 + goto out; 384 + ret = bpf_mprog_delete(entry, entry_new, &dtuple, idx); 385 + out: 386 + bpf_mprog_tuple_put(&rtuple); 387 + return ret; 388 + } 389 + 390 + int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, 391 + struct bpf_mprog_entry *entry) 392 + { 393 + u32 __user *uprog_flags, *ulink_flags; 394 + u32 __user *uprog_id, *ulink_id; 395 + struct bpf_mprog_fp *fp; 396 + struct bpf_mprog_cp *cp; 397 + struct bpf_prog *prog; 398 + const u32 flags = 0; 399 + int i, ret = 0; 400 + u32 id, count; 401 + u64 revision; 402 + 403 + if (attr->query.query_flags || attr->query.attach_flags) 404 + return -EINVAL; 405 + revision = bpf_mprog_revision(entry); 406 + count = bpf_mprog_total(entry); 407 + if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) 408 + return -EFAULT; 409 + if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision))) 410 + return -EFAULT; 411 + if (copy_to_user(&uattr->query.count, &count, sizeof(count))) 412 + return -EFAULT; 413 + uprog_id = u64_to_user_ptr(attr->query.prog_ids); 414 + uprog_flags = u64_to_user_ptr(attr->query.prog_attach_flags); 415 + ulink_id = u64_to_user_ptr(attr->query.link_ids); 416 + ulink_flags = u64_to_user_ptr(attr->query.link_attach_flags); 417 + if (attr->query.count == 0 || !uprog_id || !count) 418 + return 0; 419 + if (attr->query.count < count) { 420 + count = attr->query.count; 421 + ret = -ENOSPC; 422 + } 423 + for (i = 0; i < bpf_mprog_max(); i++) { 424 + bpf_mprog_read(entry, i, &fp, &cp); 425 + prog = READ_ONCE(fp->prog); 426 + if (!prog) 427 + break; 428 + id = prog->aux->id; 429 + 
if (copy_to_user(uprog_id + i, &id, sizeof(id))) 430 + return -EFAULT; 431 + if (uprog_flags && 432 + copy_to_user(uprog_flags + i, &flags, sizeof(flags))) 433 + return -EFAULT; 434 + id = cp->link ? cp->link->id : 0; 435 + if (ulink_id && 436 + copy_to_user(ulink_id + i, &id, sizeof(id))) 437 + return -EFAULT; 438 + if (ulink_flags && 439 + copy_to_user(ulink_flags + i, &flags, sizeof(flags))) 440 + return -EFAULT; 441 + if (i + 1 == count) 442 + break; 443 + } 444 + return ret; 445 + }
+69 -13
kernel/bpf/syscall.c
··· 37 37 #include <linux/trace_events.h> 38 38 #include <net/netfilter/nf_bpf_link.h> 39 39 40 + #include <net/tcx.h> 41 + 40 42 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \ 41 43 (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \ 42 44 (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) ··· 3742 3740 return BPF_PROG_TYPE_XDP; 3743 3741 case BPF_LSM_CGROUP: 3744 3742 return BPF_PROG_TYPE_LSM; 3743 + case BPF_TCX_INGRESS: 3744 + case BPF_TCX_EGRESS: 3745 + return BPF_PROG_TYPE_SCHED_CLS; 3745 3746 default: 3746 3747 return BPF_PROG_TYPE_UNSPEC; 3747 3748 } 3748 3749 } 3749 3750 3750 - #define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd 3751 + #define BPF_PROG_ATTACH_LAST_FIELD expected_revision 3751 3752 3752 - #define BPF_F_ATTACH_MASK \ 3753 - (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI | BPF_F_REPLACE) 3753 + #define BPF_F_ATTACH_MASK_BASE \ 3754 + (BPF_F_ALLOW_OVERRIDE | \ 3755 + BPF_F_ALLOW_MULTI | \ 3756 + BPF_F_REPLACE) 3757 + 3758 + #define BPF_F_ATTACH_MASK_MPROG \ 3759 + (BPF_F_REPLACE | \ 3760 + BPF_F_BEFORE | \ 3761 + BPF_F_AFTER | \ 3762 + BPF_F_ID | \ 3763 + BPF_F_LINK) 3754 3764 3755 3765 static int bpf_prog_attach(const union bpf_attr *attr) 3756 3766 { 3757 3767 enum bpf_prog_type ptype; 3758 3768 struct bpf_prog *prog; 3769 + u32 mask; 3759 3770 int ret; 3760 3771 3761 3772 if (CHECK_ATTR(BPF_PROG_ATTACH)) 3762 3773 return -EINVAL; 3763 3774 3764 - if (attr->attach_flags & ~BPF_F_ATTACH_MASK) 3765 - return -EINVAL; 3766 - 3767 3775 ptype = attach_type_to_prog_type(attr->attach_type); 3768 3776 if (ptype == BPF_PROG_TYPE_UNSPEC) 3777 + return -EINVAL; 3778 + mask = bpf_mprog_supported(ptype) ? 
3779 + BPF_F_ATTACH_MASK_MPROG : BPF_F_ATTACH_MASK_BASE; 3780 + if (attr->attach_flags & ~mask) 3769 3781 return -EINVAL; 3770 3782 3771 3783 prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype); ··· 3816 3800 else 3817 3801 ret = cgroup_bpf_prog_attach(attr, ptype, prog); 3818 3802 break; 3803 + case BPF_PROG_TYPE_SCHED_CLS: 3804 + ret = tcx_prog_attach(attr, prog); 3805 + break; 3819 3806 default: 3820 3807 ret = -EINVAL; 3821 3808 } ··· 3828 3809 return ret; 3829 3810 } 3830 3811 3831 - #define BPF_PROG_DETACH_LAST_FIELD attach_type 3812 + #define BPF_PROG_DETACH_LAST_FIELD expected_revision 3832 3813 3833 3814 static int bpf_prog_detach(const union bpf_attr *attr) 3834 3815 { 3816 + struct bpf_prog *prog = NULL; 3835 3817 enum bpf_prog_type ptype; 3818 + int ret; 3836 3819 3837 3820 if (CHECK_ATTR(BPF_PROG_DETACH)) 3838 3821 return -EINVAL; 3839 3822 3840 3823 ptype = attach_type_to_prog_type(attr->attach_type); 3824 + if (bpf_mprog_supported(ptype)) { 3825 + if (ptype == BPF_PROG_TYPE_UNSPEC) 3826 + return -EINVAL; 3827 + if (attr->attach_flags & ~BPF_F_ATTACH_MASK_MPROG) 3828 + return -EINVAL; 3829 + if (attr->attach_bpf_fd) { 3830 + prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype); 3831 + if (IS_ERR(prog)) 3832 + return PTR_ERR(prog); 3833 + } 3834 + } 3841 3835 3842 3836 switch (ptype) { 3843 3837 case BPF_PROG_TYPE_SK_MSG: 3844 3838 case BPF_PROG_TYPE_SK_SKB: 3845 - return sock_map_prog_detach(attr, ptype); 3839 + ret = sock_map_prog_detach(attr, ptype); 3840 + break; 3846 3841 case BPF_PROG_TYPE_LIRC_MODE2: 3847 - return lirc_prog_detach(attr); 3842 + ret = lirc_prog_detach(attr); 3843 + break; 3848 3844 case BPF_PROG_TYPE_FLOW_DISSECTOR: 3849 - return netns_bpf_prog_detach(attr, ptype); 3845 + ret = netns_bpf_prog_detach(attr, ptype); 3846 + break; 3850 3847 case BPF_PROG_TYPE_CGROUP_DEVICE: 3851 3848 case BPF_PROG_TYPE_CGROUP_SKB: 3852 3849 case BPF_PROG_TYPE_CGROUP_SOCK: ··· 3871 3836 case BPF_PROG_TYPE_CGROUP_SYSCTL: 3872 3837 case 
BPF_PROG_TYPE_SOCK_OPS: 3873 3838 case BPF_PROG_TYPE_LSM: 3874 - return cgroup_bpf_prog_detach(attr, ptype); 3839 + ret = cgroup_bpf_prog_detach(attr, ptype); 3840 + break; 3841 + case BPF_PROG_TYPE_SCHED_CLS: 3842 + ret = tcx_prog_detach(attr, prog); 3843 + break; 3875 3844 default: 3876 - return -EINVAL; 3845 + ret = -EINVAL; 3877 3846 } 3847 + 3848 + if (prog) 3849 + bpf_prog_put(prog); 3850 + return ret; 3878 3851 } 3879 3852 3880 - #define BPF_PROG_QUERY_LAST_FIELD query.prog_attach_flags 3853 + #define BPF_PROG_QUERY_LAST_FIELD query.link_attach_flags 3881 3854 3882 3855 static int bpf_prog_query(const union bpf_attr *attr, 3883 3856 union bpf_attr __user *uattr) ··· 3933 3890 case BPF_SK_MSG_VERDICT: 3934 3891 case BPF_SK_SKB_VERDICT: 3935 3892 return sock_map_bpf_prog_query(attr, uattr); 3893 + case BPF_TCX_INGRESS: 3894 + case BPF_TCX_EGRESS: 3895 + return tcx_prog_query(attr, uattr); 3936 3896 default: 3937 3897 return -EINVAL; 3938 3898 } ··· 4898 4852 goto out; 4899 4853 } 4900 4854 break; 4855 + case BPF_PROG_TYPE_SCHED_CLS: 4856 + if (attr->link_create.attach_type != BPF_TCX_INGRESS && 4857 + attr->link_create.attach_type != BPF_TCX_EGRESS) { 4858 + ret = -EINVAL; 4859 + goto out; 4860 + } 4861 + break; 4901 4862 default: 4902 4863 ptype = attach_type_to_prog_type(attr->link_create.attach_type); 4903 4864 if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) { ··· 4955 4902 #ifdef CONFIG_NET 4956 4903 case BPF_PROG_TYPE_XDP: 4957 4904 ret = bpf_xdp_link_attach(attr, prog); 4905 + break; 4906 + case BPF_PROG_TYPE_SCHED_CLS: 4907 + ret = tcx_link_attach(attr, prog); 4958 4908 break; 4959 4909 case BPF_PROG_TYPE_NETFILTER: 4960 4910 ret = bpf_nf_link_attach(attr, prog);
+348
kernel/bpf/tcx.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2023 Isovalent */ 3 + 4 + #include <linux/bpf.h> 5 + #include <linux/bpf_mprog.h> 6 + #include <linux/netdevice.h> 7 + 8 + #include <net/tcx.h> 9 + 10 + int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog) 11 + { 12 + bool created, ingress = attr->attach_type == BPF_TCX_INGRESS; 13 + struct net *net = current->nsproxy->net_ns; 14 + struct bpf_mprog_entry *entry, *entry_new; 15 + struct bpf_prog *replace_prog = NULL; 16 + struct net_device *dev; 17 + int ret; 18 + 19 + rtnl_lock(); 20 + dev = __dev_get_by_index(net, attr->target_ifindex); 21 + if (!dev) { 22 + ret = -ENODEV; 23 + goto out; 24 + } 25 + if (attr->attach_flags & BPF_F_REPLACE) { 26 + replace_prog = bpf_prog_get_type(attr->replace_bpf_fd, 27 + prog->type); 28 + if (IS_ERR(replace_prog)) { 29 + ret = PTR_ERR(replace_prog); 30 + replace_prog = NULL; 31 + goto out; 32 + } 33 + } 34 + entry = tcx_entry_fetch_or_create(dev, ingress, &created); 35 + if (!entry) { 36 + ret = -ENOMEM; 37 + goto out; 38 + } 39 + ret = bpf_mprog_attach(entry, &entry_new, prog, NULL, replace_prog, 40 + attr->attach_flags, attr->relative_fd, 41 + attr->expected_revision); 42 + if (!ret) { 43 + if (entry != entry_new) { 44 + tcx_entry_update(dev, entry_new, ingress); 45 + tcx_entry_sync(); 46 + tcx_skeys_inc(ingress); 47 + } 48 + bpf_mprog_commit(entry); 49 + } else if (created) { 50 + tcx_entry_free(entry); 51 + } 52 + out: 53 + if (replace_prog) 54 + bpf_prog_put(replace_prog); 55 + rtnl_unlock(); 56 + return ret; 57 + } 58 + 59 + int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog) 60 + { 61 + bool ingress = attr->attach_type == BPF_TCX_INGRESS; 62 + struct net *net = current->nsproxy->net_ns; 63 + struct bpf_mprog_entry *entry, *entry_new; 64 + struct net_device *dev; 65 + int ret; 66 + 67 + rtnl_lock(); 68 + dev = __dev_get_by_index(net, attr->target_ifindex); 69 + if (!dev) { 70 + ret = -ENODEV; 71 + goto out; 72 + } 73 + entry 
= tcx_entry_fetch(dev, ingress); 74 + if (!entry) { 75 + ret = -ENOENT; 76 + goto out; 77 + } 78 + ret = bpf_mprog_detach(entry, &entry_new, prog, NULL, attr->attach_flags, 79 + attr->relative_fd, attr->expected_revision); 80 + if (!ret) { 81 + if (!tcx_entry_is_active(entry_new)) 82 + entry_new = NULL; 83 + tcx_entry_update(dev, entry_new, ingress); 84 + tcx_entry_sync(); 85 + tcx_skeys_dec(ingress); 86 + bpf_mprog_commit(entry); 87 + if (!entry_new) 88 + tcx_entry_free(entry); 89 + } 90 + out: 91 + rtnl_unlock(); 92 + return ret; 93 + } 94 + 95 + void tcx_uninstall(struct net_device *dev, bool ingress) 96 + { 97 + struct bpf_tuple tuple = {}; 98 + struct bpf_mprog_entry *entry; 99 + struct bpf_mprog_fp *fp; 100 + struct bpf_mprog_cp *cp; 101 + 102 + entry = tcx_entry_fetch(dev, ingress); 103 + if (!entry) 104 + return; 105 + tcx_entry_update(dev, NULL, ingress); 106 + tcx_entry_sync(); 107 + bpf_mprog_foreach_tuple(entry, fp, cp, tuple) { 108 + if (tuple.link) 109 + tcx_link(tuple.link)->dev = NULL; 110 + else 111 + bpf_prog_put(tuple.prog); 112 + tcx_skeys_dec(ingress); 113 + } 114 + WARN_ON_ONCE(tcx_entry(entry)->miniq_active); 115 + tcx_entry_free(entry); 116 + } 117 + 118 + int tcx_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) 119 + { 120 + bool ingress = attr->query.attach_type == BPF_TCX_INGRESS; 121 + struct net *net = current->nsproxy->net_ns; 122 + struct bpf_mprog_entry *entry; 123 + struct net_device *dev; 124 + int ret; 125 + 126 + rtnl_lock(); 127 + dev = __dev_get_by_index(net, attr->query.target_ifindex); 128 + if (!dev) { 129 + ret = -ENODEV; 130 + goto out; 131 + } 132 + entry = tcx_entry_fetch(dev, ingress); 133 + if (!entry) { 134 + ret = -ENOENT; 135 + goto out; 136 + } 137 + ret = bpf_mprog_query(attr, uattr, entry); 138 + out: 139 + rtnl_unlock(); 140 + return ret; 141 + } 142 + 143 + static int tcx_link_prog_attach(struct bpf_link *link, u32 flags, u32 id_or_fd, 144 + u64 revision) 145 + { 146 + struct tcx_link *tcx = 
tcx_link(link); 147 + bool created, ingress = tcx->location == BPF_TCX_INGRESS; 148 + struct bpf_mprog_entry *entry, *entry_new; 149 + struct net_device *dev = tcx->dev; 150 + int ret; 151 + 152 + ASSERT_RTNL(); 153 + entry = tcx_entry_fetch_or_create(dev, ingress, &created); 154 + if (!entry) 155 + return -ENOMEM; 156 + ret = bpf_mprog_attach(entry, &entry_new, link->prog, link, NULL, flags, 157 + id_or_fd, revision); 158 + if (!ret) { 159 + if (entry != entry_new) { 160 + tcx_entry_update(dev, entry_new, ingress); 161 + tcx_entry_sync(); 162 + tcx_skeys_inc(ingress); 163 + } 164 + bpf_mprog_commit(entry); 165 + } else if (created) { 166 + tcx_entry_free(entry); 167 + } 168 + return ret; 169 + } 170 + 171 + static void tcx_link_release(struct bpf_link *link) 172 + { 173 + struct tcx_link *tcx = tcx_link(link); 174 + bool ingress = tcx->location == BPF_TCX_INGRESS; 175 + struct bpf_mprog_entry *entry, *entry_new; 176 + struct net_device *dev; 177 + int ret = 0; 178 + 179 + rtnl_lock(); 180 + dev = tcx->dev; 181 + if (!dev) 182 + goto out; 183 + entry = tcx_entry_fetch(dev, ingress); 184 + if (!entry) { 185 + ret = -ENOENT; 186 + goto out; 187 + } 188 + ret = bpf_mprog_detach(entry, &entry_new, link->prog, link, 0, 0, 0); 189 + if (!ret) { 190 + if (!tcx_entry_is_active(entry_new)) 191 + entry_new = NULL; 192 + tcx_entry_update(dev, entry_new, ingress); 193 + tcx_entry_sync(); 194 + tcx_skeys_dec(ingress); 195 + bpf_mprog_commit(entry); 196 + if (!entry_new) 197 + tcx_entry_free(entry); 198 + tcx->dev = NULL; 199 + } 200 + out: 201 + WARN_ON_ONCE(ret); 202 + rtnl_unlock(); 203 + } 204 + 205 + static int tcx_link_update(struct bpf_link *link, struct bpf_prog *nprog, 206 + struct bpf_prog *oprog) 207 + { 208 + struct tcx_link *tcx = tcx_link(link); 209 + bool ingress = tcx->location == BPF_TCX_INGRESS; 210 + struct bpf_mprog_entry *entry, *entry_new; 211 + struct net_device *dev; 212 + int ret = 0; 213 + 214 + rtnl_lock(); 215 + dev = tcx->dev; 216 + if (!dev) { 217 + 
ret = -ENOLINK; 218 + goto out; 219 + } 220 + if (oprog && link->prog != oprog) { 221 + ret = -EPERM; 222 + goto out; 223 + } 224 + oprog = link->prog; 225 + if (oprog == nprog) { 226 + bpf_prog_put(nprog); 227 + goto out; 228 + } 229 + entry = tcx_entry_fetch(dev, ingress); 230 + if (!entry) { 231 + ret = -ENOENT; 232 + goto out; 233 + } 234 + ret = bpf_mprog_attach(entry, &entry_new, nprog, link, oprog, 235 + BPF_F_REPLACE | BPF_F_ID, 236 + link->prog->aux->id, 0); 237 + if (!ret) { 238 + WARN_ON_ONCE(entry != entry_new); 239 + oprog = xchg(&link->prog, nprog); 240 + bpf_prog_put(oprog); 241 + bpf_mprog_commit(entry); 242 + } 243 + out: 244 + rtnl_unlock(); 245 + return ret; 246 + } 247 + 248 + static void tcx_link_dealloc(struct bpf_link *link) 249 + { 250 + kfree(tcx_link(link)); 251 + } 252 + 253 + static void tcx_link_fdinfo(const struct bpf_link *link, struct seq_file *seq) 254 + { 255 + const struct tcx_link *tcx = tcx_link_const(link); 256 + u32 ifindex = 0; 257 + 258 + rtnl_lock(); 259 + if (tcx->dev) 260 + ifindex = tcx->dev->ifindex; 261 + rtnl_unlock(); 262 + 263 + seq_printf(seq, "ifindex:\t%u\n", ifindex); 264 + seq_printf(seq, "attach_type:\t%u (%s)\n", 265 + tcx->location, 266 + tcx->location == BPF_TCX_INGRESS ? 
"ingress" : "egress"); 267 + } 268 + 269 + static int tcx_link_fill_info(const struct bpf_link *link, 270 + struct bpf_link_info *info) 271 + { 272 + const struct tcx_link *tcx = tcx_link_const(link); 273 + u32 ifindex = 0; 274 + 275 + rtnl_lock(); 276 + if (tcx->dev) 277 + ifindex = tcx->dev->ifindex; 278 + rtnl_unlock(); 279 + 280 + info->tcx.ifindex = ifindex; 281 + info->tcx.attach_type = tcx->location; 282 + return 0; 283 + } 284 + 285 + static int tcx_link_detach(struct bpf_link *link) 286 + { 287 + tcx_link_release(link); 288 + return 0; 289 + } 290 + 291 + static const struct bpf_link_ops tcx_link_lops = { 292 + .release = tcx_link_release, 293 + .detach = tcx_link_detach, 294 + .dealloc = tcx_link_dealloc, 295 + .update_prog = tcx_link_update, 296 + .show_fdinfo = tcx_link_fdinfo, 297 + .fill_link_info = tcx_link_fill_info, 298 + }; 299 + 300 + static int tcx_link_init(struct tcx_link *tcx, 301 + struct bpf_link_primer *link_primer, 302 + const union bpf_attr *attr, 303 + struct net_device *dev, 304 + struct bpf_prog *prog) 305 + { 306 + bpf_link_init(&tcx->link, BPF_LINK_TYPE_TCX, &tcx_link_lops, prog); 307 + tcx->location = attr->link_create.attach_type; 308 + tcx->dev = dev; 309 + return bpf_link_prime(&tcx->link, link_primer); 310 + } 311 + 312 + int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) 313 + { 314 + struct net *net = current->nsproxy->net_ns; 315 + struct bpf_link_primer link_primer; 316 + struct net_device *dev; 317 + struct tcx_link *tcx; 318 + int ret; 319 + 320 + rtnl_lock(); 321 + dev = __dev_get_by_index(net, attr->link_create.target_ifindex); 322 + if (!dev) { 323 + ret = -ENODEV; 324 + goto out; 325 + } 326 + tcx = kzalloc(sizeof(*tcx), GFP_USER); 327 + if (!tcx) { 328 + ret = -ENOMEM; 329 + goto out; 330 + } 331 + ret = tcx_link_init(tcx, &link_primer, attr, dev, prog); 332 + if (ret) { 333 + kfree(tcx); 334 + goto out; 335 + } 336 + ret = tcx_link_prog_attach(&tcx->link, attr->link_create.flags, 337 + 
attr->link_create.tcx.relative_fd, 338 + attr->link_create.tcx.expected_revision); 339 + if (ret) { 340 + tcx->dev = NULL; 341 + bpf_link_cleanup(&link_primer); 342 + goto out; 343 + } 344 + ret = bpf_link_settle(&link_primer); 345 + out: 346 + rtnl_unlock(); 347 + return ret; 348 + }
+13 -9
kernel/bpf/verifier.c
··· 5413 5413 return reg->type == PTR_TO_FLOW_KEYS; 5414 5414 } 5415 5415 5416 + static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = { 5417 + #ifdef CONFIG_NET 5418 + [PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK], 5419 + [PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON], 5420 + [PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP], 5421 + #endif 5422 + [CONST_PTR_TO_MAP] = btf_bpf_map_id, 5423 + }; 5424 + 5416 5425 static bool is_trusted_reg(const struct bpf_reg_state *reg) 5417 5426 { 5418 5427 /* A referenced register is always trusted. */ 5419 5428 if (reg->ref_obj_id) 5429 + return true; 5430 + 5431 + /* Types listed in the reg2btf_ids are always trusted */ 5432 + if (reg2btf_ids[base_type(reg->type)]) 5420 5433 return true; 5421 5434 5422 5435 /* If a register is not referenced, it is trusted if it has the ··· 10064 10051 } 10065 10052 return true; 10066 10053 } 10067 - 10068 - 10069 - static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = { 10070 - #ifdef CONFIG_NET 10071 - [PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK], 10072 - [PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON], 10073 - [PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP], 10074 - #endif 10075 - }; 10076 10054 10077 10055 enum kfunc_ptr_arg_type { 10078 10056 KF_ARG_PTR_TO_CTX,
+5
net/Kconfig
··· 52 52 config NET_EGRESS 53 53 bool 54 54 55 + config NET_XGRESS 56 + select NET_INGRESS 57 + select NET_EGRESS 58 + bool 59 + 55 60 config NET_REDIRECT 56 61 bool 57 62
+177 -115
net/core/dev.c
··· 107 107 #include <net/pkt_cls.h> 108 108 #include <net/checksum.h> 109 109 #include <net/xfrm.h> 110 + #include <net/tcx.h> 110 111 #include <linux/highmem.h> 111 112 #include <linux/init.h> 112 113 #include <linux/module.h> ··· 154 153 155 154 #include "dev.h" 156 155 #include "net-sysfs.h" 157 - 158 156 159 157 static DEFINE_SPINLOCK(ptype_lock); 160 158 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly; ··· 3882 3882 EXPORT_SYMBOL(dev_loopback_xmit); 3883 3883 3884 3884 #ifdef CONFIG_NET_EGRESS 3885 - static struct sk_buff * 3886 - sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) 3887 - { 3888 - #ifdef CONFIG_NET_CLS_ACT 3889 - struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress); 3890 - struct tcf_result cl_res; 3891 - 3892 - if (!miniq) 3893 - return skb; 3894 - 3895 - /* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */ 3896 - tc_skb_cb(skb)->mru = 0; 3897 - tc_skb_cb(skb)->post_ct = false; 3898 - mini_qdisc_bstats_cpu_update(miniq, skb); 3899 - 3900 - switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) { 3901 - case TC_ACT_OK: 3902 - case TC_ACT_RECLASSIFY: 3903 - skb->tc_index = TC_H_MIN(cl_res.classid); 3904 - break; 3905 - case TC_ACT_SHOT: 3906 - mini_qdisc_qstats_cpu_drop(miniq); 3907 - *ret = NET_XMIT_DROP; 3908 - kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS); 3909 - return NULL; 3910 - case TC_ACT_STOLEN: 3911 - case TC_ACT_QUEUED: 3912 - case TC_ACT_TRAP: 3913 - *ret = NET_XMIT_SUCCESS; 3914 - consume_skb(skb); 3915 - return NULL; 3916 - case TC_ACT_REDIRECT: 3917 - /* No need to push/pop skb's mac_header here on egress! 
*/ 3918 - skb_do_redirect(skb); 3919 - *ret = NET_XMIT_SUCCESS; 3920 - return NULL; 3921 - default: 3922 - break; 3923 - } 3924 - #endif /* CONFIG_NET_CLS_ACT */ 3925 - 3926 - return skb; 3927 - } 3928 - 3929 3885 static struct netdev_queue * 3930 3886 netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb) 3931 3887 { ··· 3901 3945 } 3902 3946 EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue); 3903 3947 #endif /* CONFIG_NET_EGRESS */ 3948 + 3949 + #ifdef CONFIG_NET_XGRESS 3950 + static int tc_run(struct tcx_entry *entry, struct sk_buff *skb) 3951 + { 3952 + int ret = TC_ACT_UNSPEC; 3953 + #ifdef CONFIG_NET_CLS_ACT 3954 + struct mini_Qdisc *miniq = rcu_dereference_bh(entry->miniq); 3955 + struct tcf_result res; 3956 + 3957 + if (!miniq) 3958 + return ret; 3959 + 3960 + tc_skb_cb(skb)->mru = 0; 3961 + tc_skb_cb(skb)->post_ct = false; 3962 + 3963 + mini_qdisc_bstats_cpu_update(miniq, skb); 3964 + ret = tcf_classify(skb, miniq->block, miniq->filter_list, &res, false); 3965 + /* Only tcf related quirks below. 
*/ 3966 + switch (ret) { 3967 + case TC_ACT_SHOT: 3968 + mini_qdisc_qstats_cpu_drop(miniq); 3969 + break; 3970 + case TC_ACT_OK: 3971 + case TC_ACT_RECLASSIFY: 3972 + skb->tc_index = TC_H_MIN(res.classid); 3973 + break; 3974 + } 3975 + #endif /* CONFIG_NET_CLS_ACT */ 3976 + return ret; 3977 + } 3978 + 3979 + static DEFINE_STATIC_KEY_FALSE(tcx_needed_key); 3980 + 3981 + void tcx_inc(void) 3982 + { 3983 + static_branch_inc(&tcx_needed_key); 3984 + } 3985 + 3986 + void tcx_dec(void) 3987 + { 3988 + static_branch_dec(&tcx_needed_key); 3989 + } 3990 + 3991 + static __always_inline enum tcx_action_base 3992 + tcx_run(const struct bpf_mprog_entry *entry, struct sk_buff *skb, 3993 + const bool needs_mac) 3994 + { 3995 + const struct bpf_mprog_fp *fp; 3996 + const struct bpf_prog *prog; 3997 + int ret = TCX_NEXT; 3998 + 3999 + if (needs_mac) 4000 + __skb_push(skb, skb->mac_len); 4001 + bpf_mprog_foreach_prog(entry, fp, prog) { 4002 + bpf_compute_data_pointers(skb); 4003 + ret = bpf_prog_run(prog, skb); 4004 + if (ret != TCX_NEXT) 4005 + break; 4006 + } 4007 + if (needs_mac) 4008 + __skb_pull(skb, skb->mac_len); 4009 + return tcx_action_code(skb, ret); 4010 + } 4011 + 4012 + static __always_inline struct sk_buff * 4013 + sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, 4014 + struct net_device *orig_dev, bool *another) 4015 + { 4016 + struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress); 4017 + int sch_ret; 4018 + 4019 + if (!entry) 4020 + return skb; 4021 + if (*pt_prev) { 4022 + *ret = deliver_skb(skb, *pt_prev, orig_dev); 4023 + *pt_prev = NULL; 4024 + } 4025 + 4026 + qdisc_skb_cb(skb)->pkt_len = skb->len; 4027 + tcx_set_ingress(skb, true); 4028 + 4029 + if (static_branch_unlikely(&tcx_needed_key)) { 4030 + sch_ret = tcx_run(entry, skb, true); 4031 + if (sch_ret != TC_ACT_UNSPEC) 4032 + goto ingress_verdict; 4033 + } 4034 + sch_ret = tc_run(tcx_entry(entry), skb); 4035 + ingress_verdict: 4036 + switch (sch_ret) { 4037 + 
case TC_ACT_REDIRECT: 4038 + /* skb_mac_header check was done by BPF, so we can safely 4039 + * push the L2 header back before redirecting to another 4040 + * netdev. 4041 + */ 4042 + __skb_push(skb, skb->mac_len); 4043 + if (skb_do_redirect(skb) == -EAGAIN) { 4044 + __skb_pull(skb, skb->mac_len); 4045 + *another = true; 4046 + break; 4047 + } 4048 + *ret = NET_RX_SUCCESS; 4049 + return NULL; 4050 + case TC_ACT_SHOT: 4051 + kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS); 4052 + *ret = NET_RX_DROP; 4053 + return NULL; 4054 + /* used by tc_run */ 4055 + case TC_ACT_STOLEN: 4056 + case TC_ACT_QUEUED: 4057 + case TC_ACT_TRAP: 4058 + consume_skb(skb); 4059 + fallthrough; 4060 + case TC_ACT_CONSUMED: 4061 + *ret = NET_RX_SUCCESS; 4062 + return NULL; 4063 + } 4064 + 4065 + return skb; 4066 + } 4067 + 4068 + static __always_inline struct sk_buff * 4069 + sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) 4070 + { 4071 + struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress); 4072 + int sch_ret; 4073 + 4074 + if (!entry) 4075 + return skb; 4076 + 4077 + /* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was 4078 + * already set by the caller. 4079 + */ 4080 + if (static_branch_unlikely(&tcx_needed_key)) { 4081 + sch_ret = tcx_run(entry, skb, false); 4082 + if (sch_ret != TC_ACT_UNSPEC) 4083 + goto egress_verdict; 4084 + } 4085 + sch_ret = tc_run(tcx_entry(entry), skb); 4086 + egress_verdict: 4087 + switch (sch_ret) { 4088 + case TC_ACT_REDIRECT: 4089 + /* No need to push/pop skb's mac_header here on egress! 
*/ 4090 + skb_do_redirect(skb); 4091 + *ret = NET_XMIT_SUCCESS; 4092 + return NULL; 4093 + case TC_ACT_SHOT: 4094 + kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS); 4095 + *ret = NET_XMIT_DROP; 4096 + return NULL; 4097 + /* used by tc_run */ 4098 + case TC_ACT_STOLEN: 4099 + case TC_ACT_QUEUED: 4100 + case TC_ACT_TRAP: 4101 + *ret = NET_XMIT_SUCCESS; 4102 + return NULL; 4103 + } 4104 + 4105 + return skb; 4106 + } 4107 + #else 4108 + static __always_inline struct sk_buff * 4109 + sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, 4110 + struct net_device *orig_dev, bool *another) 4111 + { 4112 + return skb; 4113 + } 4114 + 4115 + static __always_inline struct sk_buff * 4116 + sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) 4117 + { 4118 + return skb; 4119 + } 4120 + #endif /* CONFIG_NET_XGRESS */ 3904 4121 3905 4122 #ifdef CONFIG_XPS 3906 4123 static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb, ··· 4257 4128 skb_update_prio(skb); 4258 4129 4259 4130 qdisc_pkt_len_init(skb); 4260 - #ifdef CONFIG_NET_CLS_ACT 4261 - skb->tc_at_ingress = 0; 4262 - #endif 4131 + tcx_set_ingress(skb, false); 4263 4132 #ifdef CONFIG_NET_EGRESS 4264 4133 if (static_branch_unlikely(&egress_needed_key)) { 4265 4134 if (nf_hook_egress_active()) { ··· 5190 5063 unsigned char *addr) __read_mostly; 5191 5064 EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook); 5192 5065 #endif 5193 - 5194 - static inline struct sk_buff * 5195 - sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, 5196 - struct net_device *orig_dev, bool *another) 5197 - { 5198 - #ifdef CONFIG_NET_CLS_ACT 5199 - struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress); 5200 - struct tcf_result cl_res; 5201 - 5202 - /* If there's at least one ingress present somewhere (so 5203 - * we get here via enabled static key), remaining devices 5204 - * that are not configured with an ingress qdisc will bail 5205 - * out here. 
5206 - */ 5207 - if (!miniq) 5208 - return skb; 5209 - 5210 - if (*pt_prev) { 5211 - *ret = deliver_skb(skb, *pt_prev, orig_dev); 5212 - *pt_prev = NULL; 5213 - } 5214 - 5215 - qdisc_skb_cb(skb)->pkt_len = skb->len; 5216 - tc_skb_cb(skb)->mru = 0; 5217 - tc_skb_cb(skb)->post_ct = false; 5218 - skb->tc_at_ingress = 1; 5219 - mini_qdisc_bstats_cpu_update(miniq, skb); 5220 - 5221 - switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) { 5222 - case TC_ACT_OK: 5223 - case TC_ACT_RECLASSIFY: 5224 - skb->tc_index = TC_H_MIN(cl_res.classid); 5225 - break; 5226 - case TC_ACT_SHOT: 5227 - mini_qdisc_qstats_cpu_drop(miniq); 5228 - kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS); 5229 - *ret = NET_RX_DROP; 5230 - return NULL; 5231 - case TC_ACT_STOLEN: 5232 - case TC_ACT_QUEUED: 5233 - case TC_ACT_TRAP: 5234 - consume_skb(skb); 5235 - *ret = NET_RX_SUCCESS; 5236 - return NULL; 5237 - case TC_ACT_REDIRECT: 5238 - /* skb_mac_header check was done by cls/act_bpf, so 5239 - * we can safely push the L2 header back before 5240 - * redirecting to another netdev 5241 - */ 5242 - __skb_push(skb, skb->mac_len); 5243 - if (skb_do_redirect(skb) == -EAGAIN) { 5244 - __skb_pull(skb, skb->mac_len); 5245 - *another = true; 5246 - break; 5247 - } 5248 - *ret = NET_RX_SUCCESS; 5249 - return NULL; 5250 - case TC_ACT_CONSUMED: 5251 - *ret = NET_RX_SUCCESS; 5252 - return NULL; 5253 - default: 5254 - break; 5255 - } 5256 - #endif /* CONFIG_NET_CLS_ACT */ 5257 - return skb; 5258 - } 5259 5066 5260 5067 /** 5261 5068 * netdev_is_rx_handler_busy - check if receive handler is registered ··· 10674 10613 dev_net_set(dev, &init_net); 10675 10614 10676 10615 dev->gso_max_size = GSO_LEGACY_MAX_SIZE; 10616 + dev->xdp_zc_max_segs = 1; 10677 10617 dev->gso_max_segs = GSO_MAX_SEGS; 10678 10618 dev->gro_max_size = GRO_LEGACY_MAX_SIZE; 10679 10619 dev->gso_ipv4_max_size = GSO_LEGACY_MAX_SIZE; ··· 10896 10834 10897 10835 /* Shutdown queueing discipline. 
*/ 10898 10836 dev_shutdown(dev); 10899 - 10837 + dev_tcx_uninstall(dev); 10900 10838 dev_xdp_uninstall(dev); 10901 10839 bpf_dev_bound_netdev_unregister(dev); 10902 10840
+3 -8
net/core/filter.c
··· 4345 4345 struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); 4346 4346 enum bpf_map_type map_type = ri->map_type; 4347 4347 4348 - if (map_type == BPF_MAP_TYPE_XSKMAP) { 4349 - /* XDP_REDIRECT is not supported AF_XDP yet. */ 4350 - if (unlikely(xdp_buff_has_frags(xdp))) 4351 - return -EOPNOTSUPP; 4352 - 4348 + if (map_type == BPF_MAP_TYPE_XSKMAP) 4353 4349 return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog); 4354 - } 4355 4350 4356 4351 return __xdp_do_redirect_frame(ri, dev, xdp_convert_buff_to_frame(xdp), 4357 4352 xdp_prog); ··· 9307 9312 __u8 value_reg = si->dst_reg; 9308 9313 __u8 skb_reg = si->src_reg; 9309 9314 9310 - #ifdef CONFIG_NET_CLS_ACT 9315 + #ifdef CONFIG_NET_XGRESS 9311 9316 /* If the tstamp_type is read, 9312 9317 * the bpf prog is aware the tstamp could have delivery time. 9313 9318 * Thus, read skb->tstamp as is if tstamp_type_access is true. ··· 9341 9346 __u8 value_reg = si->src_reg; 9342 9347 __u8 skb_reg = si->dst_reg; 9343 9348 9344 - #ifdef CONFIG_NET_CLS_ACT 9349 + #ifdef CONFIG_NET_XGRESS 9345 9350 /* If the tstamp_type is read, 9346 9351 * the bpf prog is aware the tstamp could have delivery time. 9347 9352 * Thus, write skb->tstamp as is if tstamp_type_access is true.
+8
net/core/netdev-genl.c
··· 25 25 return -EINVAL; 26 26 } 27 27 28 + if (netdev->xdp_features & NETDEV_XDP_ACT_XSK_ZEROCOPY) { 29 + if (nla_put_u32(rsp, NETDEV_A_DEV_XDP_ZC_MAX_SEGS, 30 + netdev->xdp_zc_max_segs)) { 31 + genlmsg_cancel(rsp, hdr); 32 + return -EINVAL; 33 + } 34 + } 35 + 28 36 genlmsg_end(rsp, hdr); 29 37 30 38 return 0;
-2
net/ipv4/bpf_tcp_ca.c
··· 51 51 return false; 52 52 } 53 53 54 - extern struct btf *btf_vmlinux; 55 - 56 54 static bool bpf_tcp_ca_is_valid_access(int off, int size, 57 55 enum bpf_access_type type, 58 56 const struct bpf_prog *prog,
+2 -2
net/sched/Kconfig
··· 347 347 config NET_SCH_INGRESS 348 348 tristate "Ingress/classifier-action Qdisc" 349 349 depends on NET_CLS_ACT 350 - select NET_INGRESS 351 - select NET_EGRESS 350 + select NET_XGRESS 352 351 help 353 352 Say Y here if you want to use classifiers for incoming and/or outgoing 354 353 packets. This qdisc doesn't do anything else besides running classifiers, ··· 678 679 config NET_CLS_ACT 679 680 bool "Actions" 680 681 select NET_CLS 682 + select NET_XGRESS 681 683 help 682 684 Say Y here if you want to use traffic control actions. Actions 683 685 get attached to classifiers and are invoked after a successful
+57 -4
net/sched/sch_ingress.c
··· 13 13 #include <net/netlink.h> 14 14 #include <net/pkt_sched.h> 15 15 #include <net/pkt_cls.h> 16 + #include <net/tcx.h> 16 17 17 18 struct ingress_sched_data { 18 19 struct tcf_block *block; ··· 79 78 { 80 79 struct ingress_sched_data *q = qdisc_priv(sch); 81 80 struct net_device *dev = qdisc_dev(sch); 81 + struct bpf_mprog_entry *entry; 82 + bool created; 82 83 int err; 83 84 84 85 if (sch->parent != TC_H_INGRESS) ··· 88 85 89 86 net_inc_ingress_queue(); 90 87 91 - mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress); 88 + entry = tcx_entry_fetch_or_create(dev, true, &created); 89 + if (!entry) 90 + return -ENOMEM; 91 + tcx_miniq_set_active(entry, true); 92 + mini_qdisc_pair_init(&q->miniqp, sch, &tcx_entry(entry)->miniq); 93 + if (created) 94 + tcx_entry_update(dev, entry, true); 92 95 93 96 q->block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS; 94 97 q->block_info.chain_head_change = clsact_chain_head_change; ··· 112 103 static void ingress_destroy(struct Qdisc *sch) 113 104 { 114 105 struct ingress_sched_data *q = qdisc_priv(sch); 106 + struct net_device *dev = qdisc_dev(sch); 107 + struct bpf_mprog_entry *entry = rtnl_dereference(dev->tcx_ingress); 115 108 116 109 if (sch->parent != TC_H_INGRESS) 117 110 return; 118 111 119 112 tcf_block_put_ext(q->block, sch, &q->block_info); 113 + 114 + if (entry) { 115 + tcx_miniq_set_active(entry, false); 116 + if (!tcx_entry_is_active(entry)) { 117 + tcx_entry_update(dev, NULL, false); 118 + tcx_entry_free(entry); 119 + } 120 + } 121 + 120 122 net_dec_ingress_queue(); 121 123 } 122 124 ··· 243 223 { 244 224 struct clsact_sched_data *q = qdisc_priv(sch); 245 225 struct net_device *dev = qdisc_dev(sch); 226 + struct bpf_mprog_entry *entry; 227 + bool created; 246 228 int err; 247 229 248 230 if (sch->parent != TC_H_CLSACT) ··· 253 231 net_inc_ingress_queue(); 254 232 net_inc_egress_queue(); 255 233 256 - mini_qdisc_pair_init(&q->miniqp_ingress, sch, &dev->miniq_ingress); 234 + entry = 
tcx_entry_fetch_or_create(dev, true, &created); 235 + if (!entry) 236 + return -ENOMEM; 237 + tcx_miniq_set_active(entry, true); 238 + mini_qdisc_pair_init(&q->miniqp_ingress, sch, &tcx_entry(entry)->miniq); 239 + if (created) 240 + tcx_entry_update(dev, entry, true); 257 241 258 242 q->ingress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS; 259 243 q->ingress_block_info.chain_head_change = clsact_chain_head_change; ··· 272 244 273 245 mini_qdisc_pair_block_init(&q->miniqp_ingress, q->ingress_block); 274 246 275 - mini_qdisc_pair_init(&q->miniqp_egress, sch, &dev->miniq_egress); 247 + entry = tcx_entry_fetch_or_create(dev, false, &created); 248 + if (!entry) 249 + return -ENOMEM; 250 + tcx_miniq_set_active(entry, true); 251 + mini_qdisc_pair_init(&q->miniqp_egress, sch, &tcx_entry(entry)->miniq); 252 + if (created) 253 + tcx_entry_update(dev, entry, false); 276 254 277 255 q->egress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS; 278 256 q->egress_block_info.chain_head_change = clsact_chain_head_change; ··· 290 256 static void clsact_destroy(struct Qdisc *sch) 291 257 { 292 258 struct clsact_sched_data *q = qdisc_priv(sch); 259 + struct net_device *dev = qdisc_dev(sch); 260 + struct bpf_mprog_entry *ingress_entry = rtnl_dereference(dev->tcx_ingress); 261 + struct bpf_mprog_entry *egress_entry = rtnl_dereference(dev->tcx_egress); 293 262 294 263 if (sch->parent != TC_H_CLSACT) 295 264 return; 296 265 297 - tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info); 298 266 tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info); 267 + tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info); 268 + 269 + if (ingress_entry) { 270 + tcx_miniq_set_active(ingress_entry, false); 271 + if (!tcx_entry_is_active(ingress_entry)) { 272 + tcx_entry_update(dev, NULL, true); 273 + tcx_entry_free(ingress_entry); 274 + } 275 + } 276 + 277 + if (egress_entry) { 278 + tcx_miniq_set_active(egress_entry, false); 279 + if 
(!tcx_entry_is_active(egress_entry)) { 280 + tcx_entry_update(dev, NULL, false); 281 + tcx_entry_free(egress_entry); 282 + } 283 + } 299 284 300 285 net_dec_ingress_queue(); 301 286 net_dec_egress_queue();
+285 -90
net/xdp/xsk.c
··· 135 135 return 0; 136 136 } 137 137 138 - static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) 138 + static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff_xsk *xskb, u32 len, 139 + u32 flags) 139 140 { 140 - struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); 141 141 u64 addr; 142 142 int err; 143 143 144 144 addr = xp_get_handle(xskb); 145 - err = xskq_prod_reserve_desc(xs->rx, addr, len); 145 + err = xskq_prod_reserve_desc(xs->rx, addr, len, flags); 146 146 if (err) { 147 147 xs->rx_queue_full++; 148 148 return err; ··· 152 152 return 0; 153 153 } 154 154 155 - static void xsk_copy_xdp(struct xdp_buff *to, struct xdp_buff *from, u32 len) 155 + static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) 156 156 { 157 - void *from_buf, *to_buf; 158 - u32 metalen; 157 + struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); 158 + u32 frags = xdp_buff_has_frags(xdp); 159 + struct xdp_buff_xsk *pos, *tmp; 160 + struct list_head *xskb_list; 161 + u32 contd = 0; 162 + int err; 159 163 160 - if (unlikely(xdp_data_meta_unsupported(from))) { 161 - from_buf = from->data; 162 - to_buf = to->data; 163 - metalen = 0; 164 - } else { 165 - from_buf = from->data_meta; 166 - metalen = from->data - from->data_meta; 167 - to_buf = to->data - metalen; 164 + if (frags) 165 + contd = XDP_PKT_CONTD; 166 + 167 + err = __xsk_rcv_zc(xs, xskb, len, contd); 168 + if (err || likely(!frags)) 169 + goto out; 170 + 171 + xskb_list = &xskb->pool->xskb_list; 172 + list_for_each_entry_safe(pos, tmp, xskb_list, xskb_list_node) { 173 + if (list_is_singular(xskb_list)) 174 + contd = 0; 175 + len = pos->xdp.data_end - pos->xdp.data; 176 + err = __xsk_rcv_zc(xs, pos, len, contd); 177 + if (err) 178 + return err; 179 + list_del(&pos->xskb_list_node); 168 180 } 169 181 170 - memcpy(to_buf, from_buf, len + metalen); 182 + out: 183 + return err; 171 184 } 172 185 173 - static int __xsk_rcv(struct xdp_sock *xs, struct 
xdp_buff *xdp) 186 + static void *xsk_copy_xdp_start(struct xdp_buff *from) 174 187 { 175 - struct xdp_buff *xsk_xdp; 176 - int err; 177 - u32 len; 188 + if (unlikely(xdp_data_meta_unsupported(from))) 189 + return from->data; 190 + else 191 + return from->data_meta; 192 + } 178 193 179 - len = xdp->data_end - xdp->data; 180 - if (len > xsk_pool_get_rx_frame_size(xs->pool)) { 181 - xs->rx_dropped++; 182 - return -ENOSPC; 194 + static u32 xsk_copy_xdp(void *to, void **from, u32 to_len, 195 + u32 *from_len, skb_frag_t **frag, u32 rem) 196 + { 197 + u32 copied = 0; 198 + 199 + while (1) { 200 + u32 copy_len = min_t(u32, *from_len, to_len); 201 + 202 + memcpy(to, *from, copy_len); 203 + copied += copy_len; 204 + if (rem == copied) 205 + return copied; 206 + 207 + if (*from_len == copy_len) { 208 + *from = skb_frag_address(*frag); 209 + *from_len = skb_frag_size((*frag)++); 210 + } else { 211 + *from += copy_len; 212 + *from_len -= copy_len; 213 + } 214 + if (to_len == copy_len) 215 + return copied; 216 + 217 + to_len -= copy_len; 218 + to += copy_len; 219 + } 220 + } 221 + 222 + static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) 223 + { 224 + u32 frame_size = xsk_pool_get_rx_frame_size(xs->pool); 225 + void *copy_from = xsk_copy_xdp_start(xdp), *copy_to; 226 + u32 from_len, meta_len, rem, num_desc; 227 + struct xdp_buff_xsk *xskb; 228 + struct xdp_buff *xsk_xdp; 229 + skb_frag_t *frag; 230 + 231 + from_len = xdp->data_end - copy_from; 232 + meta_len = xdp->data - copy_from; 233 + rem = len + meta_len; 234 + 235 + if (len <= frame_size && !xdp_buff_has_frags(xdp)) { 236 + int err; 237 + 238 + xsk_xdp = xsk_buff_alloc(xs->pool); 239 + if (!xsk_xdp) { 240 + xs->rx_dropped++; 241 + return -ENOMEM; 242 + } 243 + memcpy(xsk_xdp->data - meta_len, copy_from, rem); 244 + xskb = container_of(xsk_xdp, struct xdp_buff_xsk, xdp); 245 + err = __xsk_rcv_zc(xs, xskb, len, 0); 246 + if (err) { 247 + xsk_buff_free(xsk_xdp); 248 + return err; 249 + } 250 + 251 + 
return 0; 183 252 } 184 253 185 - xsk_xdp = xsk_buff_alloc(xs->pool); 186 - if (!xsk_xdp) { 254 + num_desc = (len - 1) / frame_size + 1; 255 + 256 + if (!xsk_buff_can_alloc(xs->pool, num_desc)) { 187 257 xs->rx_dropped++; 188 258 return -ENOMEM; 189 259 } 190 - 191 - xsk_copy_xdp(xsk_xdp, xdp, len); 192 - err = __xsk_rcv_zc(xs, xsk_xdp, len); 193 - if (err) { 194 - xsk_buff_free(xsk_xdp); 195 - return err; 260 + if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) { 261 + xs->rx_queue_full++; 262 + return -ENOBUFS; 196 263 } 264 + 265 + if (xdp_buff_has_frags(xdp)) { 266 + struct skb_shared_info *sinfo; 267 + 268 + sinfo = xdp_get_shared_info_from_buff(xdp); 269 + frag = &sinfo->frags[0]; 270 + } 271 + 272 + do { 273 + u32 to_len = frame_size + meta_len; 274 + u32 copied; 275 + 276 + xsk_xdp = xsk_buff_alloc(xs->pool); 277 + copy_to = xsk_xdp->data - meta_len; 278 + 279 + copied = xsk_copy_xdp(copy_to, &copy_from, to_len, &from_len, &frag, rem); 280 + rem -= copied; 281 + 282 + xskb = container_of(xsk_xdp, struct xdp_buff_xsk, xdp); 283 + __xsk_rcv_zc(xs, xskb, copied - meta_len, rem ? 
XDP_PKT_CONTD : 0); 284 + meta_len = 0; 285 + } while (rem); 286 + 197 287 return 0; 198 288 } 199 289 ··· 305 215 return false; 306 216 } 307 217 308 - static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp) 218 + static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) 309 219 { 310 220 if (!xsk_is_bound(xs)) 311 221 return -ENXIO; 312 222 313 223 if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index) 314 224 return -EINVAL; 225 + 226 + if (len > xsk_pool_get_rx_frame_size(xs->pool) && !xs->sg) { 227 + xs->rx_dropped++; 228 + return -ENOSPC; 229 + } 315 230 316 231 sk_mark_napi_id_once_xdp(&xs->sk, xdp); 317 232 return 0; ··· 331 236 332 237 int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) 333 238 { 239 + u32 len = xdp_get_buff_len(xdp); 334 240 int err; 335 241 336 242 spin_lock_bh(&xs->rx_lock); 337 - err = xsk_rcv_check(xs, xdp); 243 + err = xsk_rcv_check(xs, xdp, len); 338 244 if (!err) { 339 - err = __xsk_rcv(xs, xdp); 245 + err = __xsk_rcv(xs, xdp, len); 340 246 xsk_flush(xs); 341 247 } 342 248 spin_unlock_bh(&xs->rx_lock); ··· 346 250 347 251 static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) 348 252 { 253 + u32 len = xdp_get_buff_len(xdp); 349 254 int err; 350 - u32 len; 351 255 352 - err = xsk_rcv_check(xs, xdp); 256 + err = xsk_rcv_check(xs, xdp, len); 353 257 if (err) 354 258 return err; 355 259 356 260 if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) { 357 261 len = xdp->data_end - xdp->data; 358 - return __xsk_rcv_zc(xs, xdp, len); 262 + return xsk_rcv_zc(xs, xdp, len); 359 263 } 360 264 361 - err = __xsk_rcv(xs, xdp); 265 + err = __xsk_rcv(xs, xdp, len); 362 266 if (!err) 363 267 xdp_return_buff(xdp); 364 268 return err; ··· 417 321 rcu_read_lock(); 418 322 list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) { 419 323 if (!xskq_cons_peek_desc(xs->tx, desc, pool)) { 420 - xs->tx->queue_empty_descs++; 324 + if (xskq_has_descs(xs->tx)) 325 + xskq_cons_release(xs->tx); 421 
326 continue; 422 327 } 423 328 ··· 505 408 return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, flags); 506 409 } 507 410 508 - static void xsk_destruct_skb(struct sk_buff *skb) 411 + static int xsk_cq_reserve_addr_locked(struct xdp_sock *xs, u64 addr) 509 412 { 510 - u64 addr = (u64)(long)skb_shinfo(skb)->destructor_arg; 511 - struct xdp_sock *xs = xdp_sk(skb->sk); 413 + unsigned long flags; 414 + int ret; 415 + 416 + spin_lock_irqsave(&xs->pool->cq_lock, flags); 417 + ret = xskq_prod_reserve_addr(xs->pool->cq, addr); 418 + spin_unlock_irqrestore(&xs->pool->cq_lock, flags); 419 + 420 + return ret; 421 + } 422 + 423 + static void xsk_cq_submit_locked(struct xdp_sock *xs, u32 n) 424 + { 512 425 unsigned long flags; 513 426 514 427 spin_lock_irqsave(&xs->pool->cq_lock, flags); 515 - xskq_prod_submit_addr(xs->pool->cq, addr); 428 + xskq_prod_submit_n(xs->pool->cq, n); 516 429 spin_unlock_irqrestore(&xs->pool->cq_lock, flags); 430 + } 517 431 432 + static void xsk_cq_cancel_locked(struct xdp_sock *xs, u32 n) 433 + { 434 + unsigned long flags; 435 + 436 + spin_lock_irqsave(&xs->pool->cq_lock, flags); 437 + xskq_prod_cancel_n(xs->pool->cq, n); 438 + spin_unlock_irqrestore(&xs->pool->cq_lock, flags); 439 + } 440 + 441 + static u32 xsk_get_num_desc(struct sk_buff *skb) 442 + { 443 + return skb ? 
(long)skb_shinfo(skb)->destructor_arg : 0; 444 + } 445 + 446 + static void xsk_destruct_skb(struct sk_buff *skb) 447 + { 448 + xsk_cq_submit_locked(xdp_sk(skb->sk), xsk_get_num_desc(skb)); 518 449 sock_wfree(skb); 450 + } 451 + 452 + static void xsk_set_destructor_arg(struct sk_buff *skb) 453 + { 454 + long num = xsk_get_num_desc(xdp_sk(skb->sk)->skb) + 1; 455 + 456 + skb_shinfo(skb)->destructor_arg = (void *)num; 457 + } 458 + 459 + static void xsk_consume_skb(struct sk_buff *skb) 460 + { 461 + struct xdp_sock *xs = xdp_sk(skb->sk); 462 + 463 + skb->destructor = sock_wfree; 464 + xsk_cq_cancel_locked(xs, xsk_get_num_desc(skb)); 465 + /* Free skb without triggering the perf drop trace */ 466 + consume_skb(skb); 467 + xs->skb = NULL; 468 + } 469 + 470 + static void xsk_drop_skb(struct sk_buff *skb) 471 + { 472 + xdp_sk(skb->sk)->tx->invalid_descs += xsk_get_num_desc(skb); 473 + xsk_consume_skb(skb); 519 474 } 520 475 521 476 static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, ··· 575 426 { 576 427 struct xsk_buff_pool *pool = xs->pool; 577 428 u32 hr, len, ts, offset, copy, copied; 578 - struct sk_buff *skb; 429 + struct sk_buff *skb = xs->skb; 579 430 struct page *page; 580 431 void *buffer; 581 432 int err, i; 582 433 u64 addr; 583 434 584 - hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom)); 435 + if (!skb) { 436 + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom)); 585 437 586 - skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err); 587 - if (unlikely(!skb)) 588 - return ERR_PTR(err); 438 + skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err); 439 + if (unlikely(!skb)) 440 + return ERR_PTR(err); 589 441 590 - skb_reserve(skb, hr); 442 + skb_reserve(skb, hr); 443 + } 591 444 592 445 addr = desc->addr; 593 446 len = desc->len; ··· 599 448 offset = offset_in_page(buffer); 600 449 addr = buffer - pool->addrs; 601 450 602 - for (copied = 0, i = 0; copied < len; i++) { 451 + for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; 
i++) { 452 + if (unlikely(i >= MAX_SKB_FRAGS)) 453 + return ERR_PTR(-EFAULT); 454 + 603 455 page = pool->umem->pgs[addr >> PAGE_SHIFT]; 604 456 get_page(page); 605 457 ··· 627 473 struct xdp_desc *desc) 628 474 { 629 475 struct net_device *dev = xs->dev; 630 - struct sk_buff *skb; 476 + struct sk_buff *skb = xs->skb; 477 + int err; 631 478 632 479 if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { 633 480 skb = xsk_build_skb_zerocopy(xs, desc); 634 - if (IS_ERR(skb)) 635 - return skb; 481 + if (IS_ERR(skb)) { 482 + err = PTR_ERR(skb); 483 + goto free_err; 484 + } 636 485 } else { 637 486 u32 hr, tr, len; 638 487 void *buffer; 639 - int err; 640 - 641 - hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)); 642 - tr = dev->needed_tailroom; 643 - len = desc->len; 644 - 645 - skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err); 646 - if (unlikely(!skb)) 647 - return ERR_PTR(err); 648 - 649 - skb_reserve(skb, hr); 650 - skb_put(skb, len); 651 488 652 489 buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); 653 - err = skb_store_bits(skb, 0, buffer, len); 654 - if (unlikely(err)) { 655 - kfree_skb(skb); 656 - return ERR_PTR(err); 490 + len = desc->len; 491 + 492 + if (!skb) { 493 + hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)); 494 + tr = dev->needed_tailroom; 495 + skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err); 496 + if (unlikely(!skb)) 497 + goto free_err; 498 + 499 + skb_reserve(skb, hr); 500 + skb_put(skb, len); 501 + 502 + err = skb_store_bits(skb, 0, buffer, len); 503 + if (unlikely(err)) 504 + goto free_err; 505 + } else { 506 + int nr_frags = skb_shinfo(skb)->nr_frags; 507 + struct page *page; 508 + u8 *vaddr; 509 + 510 + if (unlikely(nr_frags == (MAX_SKB_FRAGS - 1) && xp_mb_desc(desc))) { 511 + err = -EFAULT; 512 + goto free_err; 513 + } 514 + 515 + page = alloc_page(xs->sk.sk_allocation); 516 + if (unlikely(!page)) { 517 + err = -EAGAIN; 518 + goto free_err; 519 + } 520 + 521 + vaddr = kmap_local_page(page); 522 + 
memcpy(vaddr, buffer, len); 523 + kunmap_local(vaddr); 524 + 525 + skb_add_rx_frag(skb, nr_frags, page, 0, len, 0); 657 526 } 658 527 } 659 528 660 529 skb->dev = dev; 661 530 skb->priority = xs->sk.sk_priority; 662 531 skb->mark = xs->sk.sk_mark; 663 - skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; 664 532 skb->destructor = xsk_destruct_skb; 533 + xsk_set_destructor_arg(skb); 665 534 666 535 return skb; 536 + 537 + free_err: 538 + if (err == -EAGAIN) { 539 + xsk_cq_cancel_locked(xs, 1); 540 + } else { 541 + xsk_set_destructor_arg(skb); 542 + xsk_drop_skb(skb); 543 + xskq_cons_release(xs->tx); 544 + } 545 + 546 + return ERR_PTR(err); 667 547 } 668 548 669 549 static int __xsk_generic_xmit(struct sock *sk) ··· 707 519 bool sent_frame = false; 708 520 struct xdp_desc desc; 709 521 struct sk_buff *skb; 710 - unsigned long flags; 711 522 int err = 0; 712 523 713 524 mutex_lock(&xs->mutex); ··· 731 544 * if there is space in it. This avoids having to implement 732 545 * any buffering in the Tx path. 
733 546 */ 734 - spin_lock_irqsave(&xs->pool->cq_lock, flags); 735 - if (xskq_prod_reserve(xs->pool->cq)) { 736 - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); 547 + if (xsk_cq_reserve_addr_locked(xs, desc.addr)) 737 548 goto out; 738 - } 739 - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); 740 549 741 550 skb = xsk_build_skb(xs, &desc); 742 551 if (IS_ERR(skb)) { 743 552 err = PTR_ERR(skb); 744 - spin_lock_irqsave(&xs->pool->cq_lock, flags); 745 - xskq_prod_cancel(xs->pool->cq); 746 - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); 747 - goto out; 553 + if (err == -EAGAIN) 554 + goto out; 555 + err = 0; 556 + continue; 557 + } 558 + 559 + xskq_cons_release(xs->tx); 560 + 561 + if (xp_mb_desc(&desc)) { 562 + xs->skb = skb; 563 + continue; 748 564 } 749 565 750 566 err = __dev_direct_xmit(skb, xs->queue_id); 751 567 if (err == NETDEV_TX_BUSY) { 752 568 /* Tell user-space to retry the send */ 753 - skb->destructor = sock_wfree; 754 - spin_lock_irqsave(&xs->pool->cq_lock, flags); 755 - xskq_prod_cancel(xs->pool->cq); 756 - spin_unlock_irqrestore(&xs->pool->cq_lock, flags); 757 - /* Free skb without triggering the perf drop trace */ 758 - consume_skb(skb); 569 + xskq_cons_cancel_n(xs->tx, xsk_get_num_desc(skb)); 570 + xsk_consume_skb(skb); 759 571 err = -EAGAIN; 760 572 goto out; 761 573 } 762 574 763 - xskq_cons_release(xs->tx); 764 575 /* Ignore NET_XMIT_CN as packet might have been sent */ 765 576 if (err == NET_XMIT_DROP) { 766 577 /* SKB completed but not sent */ 767 578 err = -EBUSY; 579 + xs->skb = NULL; 768 580 goto out; 769 581 } 770 582 771 583 sent_frame = true; 584 + xs->skb = NULL; 772 585 } 773 586 774 - xs->tx->queue_empty_descs++; 587 + if (xskq_has_descs(xs->tx)) { 588 + if (xs->skb) 589 + xsk_drop_skb(xs->skb); 590 + xskq_cons_release(xs->tx); 591 + } 775 592 776 593 out: 777 594 if (sent_frame) ··· 1025 834 1026 835 net = sock_net(sk); 1027 836 837 + if (xs->skb) 838 + xsk_drop_skb(xs->skb); 839 + 1028 840 mutex_lock(&net->xdp.lock); 1029 
841 sk_del_node_init_rcu(sk); 1030 842 mutex_unlock(&net->xdp.lock); ··· 1091 897 1092 898 flags = sxdp->sxdp_flags; 1093 899 if (flags & ~(XDP_SHARED_UMEM | XDP_COPY | XDP_ZEROCOPY | 1094 - XDP_USE_NEED_WAKEUP)) 900 + XDP_USE_NEED_WAKEUP | XDP_USE_SG)) 1095 901 return -EINVAL; 1096 902 1097 903 bound_dev_if = READ_ONCE(sk->sk_bound_dev_if); ··· 1123 929 struct socket *sock; 1124 930 1125 931 if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY) || 1126 - (flags & XDP_USE_NEED_WAKEUP)) { 932 + (flags & XDP_USE_NEED_WAKEUP) || (flags & XDP_USE_SG)) { 1127 933 /* Cannot specify flags for shared sockets. */ 1128 934 err = -EINVAL; 1129 935 goto out_unlock; ··· 1222 1028 1223 1029 xs->dev = dev; 1224 1030 xs->zc = xs->umem->zc; 1031 + xs->sg = !!(flags & XDP_USE_SG); 1225 1032 xs->queue_id = qid; 1226 1033 xp_add_xsk(xs->pool, xs); 1227 1034
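The xsk.c changes above thread a per-skb descriptor count through `skb_shinfo(skb)->destructor_arg`: `xsk_set_destructor_arg()` bumps it once per appended descriptor, and `xsk_get_num_desc()` reads it back at completion time to submit/cancel that many cq entries. A minimal standalone sketch of that encode/decode trick (plain C, not kernel code; `struct fake_skb` is a stand-in for the real skb):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the skb: the descriptor count is smuggled
 * through an opaque pointer slot, as the patch does with destructor_arg. */
struct fake_skb {
	void *destructor_arg;
};

/* Mirror of xsk_get_num_desc(): decode the count stored in the slot. */
static long get_num_desc(const struct fake_skb *skb)
{
	return skb ? (long)skb->destructor_arg : 0;
}

/* Mirror of xsk_set_destructor_arg(): account one more descriptor. */
static void set_destructor_arg(struct fake_skb *skb)
{
	long num = get_num_desc(skb) + 1;

	skb->destructor_arg = (void *)num;
}
```

On completion, the real code hands `xsk_get_num_desc(skb)` to `xsk_cq_submit_locked()` so a multi-buffer skb releases all of its cq reservations at once.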
+7
net/xdp/xsk_buff_pool.c
··· 86 86 pool->umem = umem; 87 87 pool->addrs = umem->addrs; 88 88 INIT_LIST_HEAD(&pool->free_list); 89 + INIT_LIST_HEAD(&pool->xskb_list); 89 90 INIT_LIST_HEAD(&pool->xsk_tx_list); 90 91 spin_lock_init(&pool->xsk_tx_list_lock); 91 92 spin_lock_init(&pool->cq_lock); ··· 100 99 xskb->pool = pool; 101 100 xskb->xdp.frame_sz = umem->chunk_size - umem->headroom; 102 101 INIT_LIST_HEAD(&xskb->free_list_node); 102 + INIT_LIST_HEAD(&xskb->xskb_list_node); 103 103 if (pool->unaligned) 104 104 pool->free_heads[i] = xskb; 105 105 else ··· 185 183 return 0; 186 184 187 185 if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) { 186 + err = -EOPNOTSUPP; 187 + goto err_unreg_pool; 188 + } 189 + 190 + if (netdev->xdp_zc_max_segs == 1 && (flags & XDP_USE_SG)) { 188 191 err = -EOPNOTSUPP; 189 192 goto err_unreg_pool; 190 193 }
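The xsk_buff_pool.c hunk above rejects an `XDP_USE_SG` bind when the netdev only advertises a single zero-copy segment (`xdp_zc_max_segs == 1`). A sketch of that capability gate in isolation (plain C, not kernel code; the helper name is ours):

```c
#include <assert.h>
#include <errno.h>

#define XDP_USE_SG (1 << 4)	/* bind flag from the patched if_xdp.h uapi */

/* Sketch of the new bind-time check: a device that can only handle one
 * zero-copy segment per frame cannot honour a multi-buffer bind. */
static int check_sg_support(unsigned int xdp_zc_max_segs, unsigned int flags)
{
	if (xdp_zc_max_segs == 1 && (flags & XDP_USE_SG))
		return -EOPNOTSUPP;
	return 0;
}
```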
+67 -30
net/xdp/xsk_queue.h
··· 48 48 size_t ring_vmalloc_size; 49 49 }; 50 50 51 + struct parsed_desc { 52 + u32 mb; 53 + u32 valid; 54 + }; 55 + 51 56 /* The structure of the shared state of the rings are a simple 52 57 * circular buffer, as outlined in 53 58 * Documentation/core-api/circular-buffers.rst. For the Rx and ··· 135 130 return false; 136 131 } 137 132 133 + static inline bool xp_unused_options_set(u32 options) 134 + { 135 + return options & ~XDP_PKT_CONTD; 136 + } 137 + 138 138 static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, 139 139 struct xdp_desc *desc) 140 140 { 141 141 u64 offset = desc->addr & (pool->chunk_size - 1); 142 + 143 + if (!desc->len) 144 + return false; 142 145 143 146 if (offset + desc->len > pool->chunk_size) 144 147 return false; ··· 154 141 if (desc->addr >= pool->addrs_cnt) 155 142 return false; 156 143 157 - if (desc->options) 144 + if (xp_unused_options_set(desc->options)) 158 145 return false; 159 146 return true; 160 147 } ··· 164 151 { 165 152 u64 addr = xp_unaligned_add_offset_to_addr(desc->addr); 166 153 154 + if (!desc->len) 155 + return false; 156 + 167 157 if (desc->len > pool->chunk_size) 168 158 return false; 169 159 ··· 174 158 xp_desc_crosses_non_contig_pg(pool, addr, desc->len)) 175 159 return false; 176 160 177 - if (desc->options) 161 + if (xp_unused_options_set(desc->options)) 178 162 return false; 179 163 return true; 180 164 } ··· 184 168 { 185 169 return pool->unaligned ? 
xp_unaligned_validate_desc(pool, desc) : 186 170 xp_aligned_validate_desc(pool, desc); 171 + } 172 + 173 + static inline bool xskq_has_descs(struct xsk_queue *q) 174 + { 175 + return q->cached_cons != q->cached_prod; 187 176 } 188 177 189 178 static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q, ··· 206 185 struct xdp_desc *desc, 207 186 struct xsk_buff_pool *pool) 208 187 { 209 - while (q->cached_cons != q->cached_prod) { 188 + if (q->cached_cons != q->cached_prod) { 210 189 struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; 211 190 u32 idx = q->cached_cons & q->ring_mask; 212 191 213 192 *desc = ring->desc[idx]; 214 - if (xskq_cons_is_valid_desc(q, desc, pool)) 215 - return true; 216 - 217 - q->cached_cons++; 193 + return xskq_cons_is_valid_desc(q, desc, pool); 218 194 } 219 195 196 + q->queue_empty_descs++; 220 197 return false; 221 198 } 222 199 ··· 223 204 q->cached_cons += cnt; 224 205 } 225 206 226 - static inline u32 xskq_cons_read_desc_batch(struct xsk_queue *q, struct xsk_buff_pool *pool, 227 - u32 max) 207 + static inline void parse_desc(struct xsk_queue *q, struct xsk_buff_pool *pool, 208 + struct xdp_desc *desc, struct parsed_desc *parsed) 209 + { 210 + parsed->valid = xskq_cons_is_valid_desc(q, desc, pool); 211 + parsed->mb = xp_mb_desc(desc); 212 + } 213 + 214 + static inline 215 + u32 xskq_cons_read_desc_batch(struct xsk_queue *q, struct xsk_buff_pool *pool, 216 + u32 max) 228 217 { 229 218 u32 cached_cons = q->cached_cons, nb_entries = 0; 230 219 struct xdp_desc *descs = pool->tx_descs; 220 + u32 total_descs = 0, nr_frags = 0; 231 221 222 + /* track first entry, if stumble upon *any* invalid descriptor, rewind 223 + * current packet that consists of frags and stop the processing 224 + */ 232 225 while (cached_cons != q->cached_prod && nb_entries < max) { 233 226 struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; 234 227 u32 idx = cached_cons & q->ring_mask; 228 + struct parsed_desc parsed; 235 229 236 230 
descs[nb_entries] = ring->desc[idx]; 237 - if (unlikely(!xskq_cons_is_valid_desc(q, &descs[nb_entries], pool))) { 238 - /* Skip the entry */ 239 - cached_cons++; 240 - continue; 241 - } 242 - 243 - nb_entries++; 244 231 cached_cons++; 232 + parse_desc(q, pool, &descs[nb_entries], &parsed); 233 + if (unlikely(!parsed.valid)) 234 + break; 235 + 236 + if (likely(!parsed.mb)) { 237 + total_descs += (nr_frags + 1); 238 + nr_frags = 0; 239 + } else { 240 + nr_frags++; 241 + if (nr_frags == pool->netdev->xdp_zc_max_segs) { 242 + nr_frags = 0; 243 + break; 244 + } 245 + } 246 + nb_entries++; 245 247 } 246 248 249 + cached_cons -= nr_frags; 247 250 /* Release valid plus any invalid entries */ 248 251 xskq_cons_release_n(q, cached_cons - q->cached_cons); 249 - return nb_entries; 252 + return total_descs; 250 253 } 251 254 252 255 /* Functions for consumers */ ··· 333 292 q->cached_cons++; 334 293 } 335 294 295 + static inline void xskq_cons_cancel_n(struct xsk_queue *q, u32 cnt) 296 + { 297 + q->cached_cons -= cnt; 298 + } 299 + 336 300 static inline u32 xskq_cons_present_entries(struct xsk_queue *q) 337 301 { 338 302 /* No barriers needed since data is not accessed */ ··· 365 319 return xskq_prod_nb_free(q, 1) ? 
false : true; 366 320 } 367 321 368 - static inline void xskq_prod_cancel(struct xsk_queue *q) 322 + static inline void xskq_prod_cancel_n(struct xsk_queue *q, u32 cnt) 369 323 { 370 - q->cached_prod--; 324 + q->cached_prod -= cnt; 371 325 } 372 326 373 327 static inline int xskq_prod_reserve(struct xsk_queue *q) ··· 406 360 } 407 361 408 362 static inline int xskq_prod_reserve_desc(struct xsk_queue *q, 409 - u64 addr, u32 len) 363 + u64 addr, u32 len, u32 flags) 410 364 { 411 365 struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; 412 366 u32 idx; ··· 418 372 idx = q->cached_prod++ & q->ring_mask; 419 373 ring->desc[idx].addr = addr; 420 374 ring->desc[idx].len = len; 375 + ring->desc[idx].options = flags; 421 376 422 377 return 0; 423 378 } ··· 431 384 static inline void xskq_prod_submit(struct xsk_queue *q) 432 385 { 433 386 __xskq_prod_submit(q, q->cached_prod); 434 - } 435 - 436 - static inline void xskq_prod_submit_addr(struct xsk_queue *q, u64 addr) 437 - { 438 - struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring; 439 - u32 idx = q->ring->producer; 440 - 441 - ring->desc[idx++ & q->ring_mask] = addr; 442 - 443 - __xskq_prod_submit(q, idx); 444 387 } 445 388 446 389 static inline void xskq_prod_submit_n(struct xsk_queue *q, u32 nb_entries)
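The rewind logic in the reworked `xskq_cons_read_desc_batch()` above only hands the driver descriptors belonging to *complete* packets: a trailing run of `XDP_PKT_CONTD` frags with no terminating descriptor is backed out via `cached_cons -= nr_frags`. A standalone sketch of that counting (plain C over a flat array, not the ring code):

```c
#include <assert.h>

#define XDP_PKT_CONTD (1 << 0)	/* from the patched if_xdp.h uapi */

struct mini_desc {
	unsigned int options;
};

/* Count only descriptors that belong to complete packets. A packet ends
 * on the first descriptor without XDP_PKT_CONTD; frags left over at the
 * tail had no end-of-packet marker and are excluded (the "rewind"). */
static unsigned int count_complete_descs(const struct mini_desc *descs,
					 unsigned int n)
{
	unsigned int total = 0, nr_frags = 0, i;

	for (i = 0; i < n; i++) {
		if (!(descs[i].options & XDP_PKT_CONTD)) {
			total += nr_frags + 1;	/* frags plus final desc */
			nr_frags = 0;
		} else {
			nr_frags++;
		}
	}
	return total;
}
```

With descriptors `[CONTD, CONTD, plain, CONTD]` this yields 3: the first packet's three descriptors count, the dangling fourth frag does not.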
+9 -5
samples/bpf/README.rst
··· 8 8 ================== 9 9 10 10 Compiling requires having installed: 11 - * clang >= version 3.4.0 12 - * llvm >= version 3.7.1 11 + * clang 12 + * llvm 13 + * pahole 13 14 14 - Note that LLVM's tool 'llc' must support target 'bpf', list version 15 - and supported targets with command: ``llc --version`` 15 + Consult :ref:`Documentation/process/changes.rst <changes>` for the minimum 16 + version numbers required and how to update them. Note that LLVM's tool 17 + 'llc' must support target 'bpf', list version and supported targets with 18 + command: ``llc --version`` 16 19 17 20 Clean and configuration 18 21 ----------------------- ··· 27 24 make -C samples/bpf clean 28 25 make clean 29 26 30 - Configure kernel, defconfig for instance:: 27 + Configure kernel, defconfig for instance 28 + (see "tools/testing/selftests/bpf/config" for a reference config):: 31 29 32 30 make defconfig 33 31
+15 -11
tools/bpf/bpftool/Documentation/bpftool-net.rst
··· 4 4 bpftool-net 5 5 ================ 6 6 ------------------------------------------------------------------------------- 7 - tool for inspection of netdev/tc related bpf prog attachments 7 + tool for inspection of networking related bpf prog attachments 8 8 ------------------------------------------------------------------------------- 9 9 10 10 :Manual section: 8 ··· 37 37 **bpftool net { show | list }** [ **dev** *NAME* ] 38 38 List bpf program attachments in the kernel networking subsystem. 39 39 40 - Currently, only device driver xdp attachments and tc filter 41 - classification/action attachments are implemented, i.e., for 42 - program types **BPF_PROG_TYPE_SCHED_CLS**, 43 - **BPF_PROG_TYPE_SCHED_ACT** and **BPF_PROG_TYPE_XDP**. 40 + Currently, device driver xdp attachments, tcx and old-style tc 41 + classifier/action attachments, flow_dissector as well as netfilter 42 + attachments are implemented, i.e., for 43 + program types **BPF_PROG_TYPE_XDP**, **BPF_PROG_TYPE_SCHED_CLS**, 44 + **BPF_PROG_TYPE_SCHED_ACT**, **BPF_PROG_TYPE_FLOW_DISSECTOR**, 45 + **BPF_PROG_TYPE_NETFILTER**. 46 + 44 47 For programs attached to a particular cgroup, e.g., 45 48 **BPF_PROG_TYPE_CGROUP_SKB**, **BPF_PROG_TYPE_CGROUP_SOCK**, 46 49 **BPF_PROG_TYPE_SOCK_OPS** and **BPF_PROG_TYPE_CGROUP_SOCK_ADDR**, ··· 52 49 bpf programs, users should consult other tools, e.g., iproute2. 53 50 54 51 The current output will start with all xdp program attachments, followed by 55 - all tc class/qdisc bpf program attachments. Both xdp programs and 56 - tc programs are ordered based on ifindex number. If multiple bpf 57 - programs attached to the same networking device through **tc filter**, 58 - the order will be first all bpf programs attached to tc classes, then 59 - all bpf programs attached to non clsact qdiscs, and finally all 60 - bpf programs attached to root and clsact qdisc. 52 + all tcx, then tc class/qdisc bpf program attachments, then flow_dissector 53 + and finally netfilter programs. 
Both xdp programs and tcx/tc programs are 54 + ordered based on ifindex number. If multiple bpf programs are attached 55 + to the same networking device through **tc**, the order will be first 56 + all bpf programs attached to tcx, then tc classes, then all bpf programs 57 + attached to non clsact qdiscs, and finally all bpf programs attached 58 + to root and clsact qdisc. 61 59 62 60 **bpftool** **net attach** *ATTACH_TYPE* *PROG* **dev** *NAME* [ **overwrite** ] 63 61 Attach bpf program *PROG* to network interface *NAME* with
+93 -5
tools/bpf/bpftool/net.c
··· 76 76 [NET_ATTACH_TYPE_XDP_OFFLOAD] = "xdpoffload", 77 77 }; 78 78 79 + static const char * const attach_loc_strings[] = { 80 + [BPF_TCX_INGRESS] = "tcx/ingress", 81 + [BPF_TCX_EGRESS] = "tcx/egress", 82 + }; 83 + 79 84 const size_t net_attach_type_size = ARRAY_SIZE(attach_type_strings); 80 85 81 86 static enum net_attach_type parse_attach_type(const char *str) ··· 427 422 filter_info->devname, filter_info->ifindex); 428 423 } 429 424 430 - static int show_dev_tc_bpf(int sock, unsigned int nl_pid, 431 - struct ip_devname_ifindex *dev) 425 + static int __show_dev_tc_bpf_name(__u32 id, char *name, size_t len) 426 + { 427 + struct bpf_prog_info info = {}; 428 + __u32 ilen = sizeof(info); 429 + int fd, ret; 430 + 431 + fd = bpf_prog_get_fd_by_id(id); 432 + if (fd < 0) 433 + return fd; 434 + ret = bpf_obj_get_info_by_fd(fd, &info, &ilen); 435 + if (ret < 0) 436 + goto out; 437 + ret = -ENOENT; 438 + if (info.name[0]) { 439 + get_prog_full_name(&info, fd, name, len); 440 + ret = 0; 441 + } 442 + out: 443 + close(fd); 444 + return ret; 445 + } 446 + 447 + static void __show_dev_tc_bpf(const struct ip_devname_ifindex *dev, 448 + const enum bpf_attach_type loc) 449 + { 450 + __u32 prog_flags[64] = {}, link_flags[64] = {}, i, j; 451 + __u32 prog_ids[64] = {}, link_ids[64] = {}; 452 + LIBBPF_OPTS(bpf_prog_query_opts, optq); 453 + char prog_name[MAX_PROG_FULL_NAME]; 454 + int ret; 455 + 456 + optq.prog_ids = prog_ids; 457 + optq.prog_attach_flags = prog_flags; 458 + optq.link_ids = link_ids; 459 + optq.link_attach_flags = link_flags; 460 + optq.count = ARRAY_SIZE(prog_ids); 461 + 462 + ret = bpf_prog_query_opts(dev->ifindex, loc, &optq); 463 + if (ret) 464 + return; 465 + for (i = 0; i < optq.count; i++) { 466 + NET_START_OBJECT; 467 + NET_DUMP_STR("devname", "%s", dev->devname); 468 + NET_DUMP_UINT("ifindex", "(%u)", dev->ifindex); 469 + NET_DUMP_STR("kind", " %s", attach_loc_strings[loc]); 470 + ret = __show_dev_tc_bpf_name(prog_ids[i], prog_name, 471 + 
sizeof(prog_name)); 472 + if (!ret) 473 + NET_DUMP_STR("name", " %s", prog_name); 474 + NET_DUMP_UINT("prog_id", " prog_id %u ", prog_ids[i]); 475 + if (prog_flags[i] || json_output) { 476 + NET_START_ARRAY("prog_flags", "%s "); 477 + for (j = 0; prog_flags[i] && j < 32; j++) { 478 + if (!(prog_flags[i] & (1 << j))) 479 + continue; 480 + NET_DUMP_UINT_ONLY(1 << j); 481 + } 482 + NET_END_ARRAY(""); 483 + } 484 + if (link_ids[i] || json_output) { 485 + NET_DUMP_UINT("link_id", "link_id %u ", link_ids[i]); 486 + if (link_flags[i] || json_output) { 487 + NET_START_ARRAY("link_flags", "%s "); 488 + for (j = 0; link_flags[i] && j < 32; j++) { 489 + if (!(link_flags[i] & (1 << j))) 490 + continue; 491 + NET_DUMP_UINT_ONLY(1 << j); 492 + } 493 + NET_END_ARRAY(""); 494 + } 495 + } 496 + NET_END_OBJECT_FINAL; 497 + } 498 + } 499 + 500 + static void show_dev_tc_bpf(struct ip_devname_ifindex *dev) 501 + { 502 + __show_dev_tc_bpf(dev, BPF_TCX_INGRESS); 503 + __show_dev_tc_bpf(dev, BPF_TCX_EGRESS); 504 + } 505 + 506 + static int show_dev_tc_bpf_classic(int sock, unsigned int nl_pid, 507 + struct ip_devname_ifindex *dev) 432 508 { 433 509 struct bpf_filter_t filter_info; 434 510 struct bpf_tcinfo_t tcinfo; ··· 876 790 if (!ret) { 877 791 NET_START_ARRAY("tc", "%s:\n"); 878 792 for (i = 0; i < dev_array.used_len; i++) { 879 - ret = show_dev_tc_bpf(sock, nl_pid, 880 - &dev_array.devices[i]); 793 + show_dev_tc_bpf(&dev_array.devices[i]); 794 + ret = show_dev_tc_bpf_classic(sock, nl_pid, 795 + &dev_array.devices[i]); 881 796 if (ret) 882 797 break; 883 798 } ··· 926 839 " ATTACH_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n" 927 840 " " HELP_SPEC_OPTIONS " }\n" 928 841 "\n" 929 - "Note: Only xdp and tc attachments are supported now.\n" 842 + "Note: Only xdp, tcx, tc, flow_dissector and netfilter attachments\n" 843 + " are currently supported.\n" 930 844 " For progs attached to cgroups, use \"bpftool cgroup\"\n" 931 845 " to dump program attachments. 
For program types\n" 932 846 " sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
+60 -12
tools/include/uapi/linux/bpf.h
··· 1036 1036 BPF_LSM_CGROUP, 1037 1037 BPF_STRUCT_OPS, 1038 1038 BPF_NETFILTER, 1039 + BPF_TCX_INGRESS, 1040 + BPF_TCX_EGRESS, 1039 1041 __MAX_BPF_ATTACH_TYPE 1040 1042 }; 1041 1043 ··· 1055 1053 BPF_LINK_TYPE_KPROBE_MULTI = 8, 1056 1054 BPF_LINK_TYPE_STRUCT_OPS = 9, 1057 1055 BPF_LINK_TYPE_NETFILTER = 10, 1058 - 1056 + BPF_LINK_TYPE_TCX = 11, 1059 1057 MAX_BPF_LINK_TYPE, 1060 1058 }; 1061 1059 ··· 1115 1113 */ 1116 1114 #define BPF_F_ALLOW_OVERRIDE (1U << 0) 1117 1115 #define BPF_F_ALLOW_MULTI (1U << 1) 1116 + /* Generic attachment flags. */ 1118 1117 #define BPF_F_REPLACE (1U << 2) 1118 + #define BPF_F_BEFORE (1U << 3) 1119 + #define BPF_F_AFTER (1U << 4) 1120 + #define BPF_F_ID (1U << 5) 1121 + #define BPF_F_LINK BPF_F_LINK /* 1 << 13 */ 1119 1122 1120 1123 /* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the 1121 1124 * verifier will perform strict alignment checking as if the kernel ··· 1451 1444 }; 1452 1445 1453 1446 struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */ 1454 - __u32 target_fd; /* container object to attach to */ 1455 - __u32 attach_bpf_fd; /* eBPF program to attach */ 1447 + union { 1448 + __u32 target_fd; /* target object to attach to or ... */ 1449 + __u32 target_ifindex; /* target ifindex */ 1450 + }; 1451 + __u32 attach_bpf_fd; 1456 1452 __u32 attach_type; 1457 1453 __u32 attach_flags; 1458 - __u32 replace_bpf_fd; /* previously attached eBPF 1459 - * program to replace if 1460 - * BPF_F_REPLACE is used 1461 - */ 1454 + __u32 replace_bpf_fd; 1455 + union { 1456 + __u32 relative_fd; 1457 + __u32 relative_id; 1458 + }; 1459 + __u64 expected_revision; 1462 1460 }; 1463 1461 1464 1462 struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */ ··· 1509 1497 } info; 1510 1498 1511 1499 struct { /* anonymous struct used by BPF_PROG_QUERY command */ 1512 - __u32 target_fd; /* container object to query */ 1500 + union { 1501 + __u32 target_fd; /* target object to query or ... 
*/ 1502 + __u32 target_ifindex; /* target ifindex */ 1503 + }; 1513 1504 __u32 attach_type; 1514 1505 __u32 query_flags; 1515 1506 __u32 attach_flags; 1516 1507 __aligned_u64 prog_ids; 1517 - __u32 prog_cnt; 1508 + union { 1509 + __u32 prog_cnt; 1510 + __u32 count; 1511 + }; 1512 + __u32 :32; 1518 1513 /* output: per-program attach_flags. 1519 1514 * not allowed to be set during effective query. 1520 1515 */ 1521 1516 __aligned_u64 prog_attach_flags; 1517 + __aligned_u64 link_ids; 1518 + __aligned_u64 link_attach_flags; 1519 + __u64 revision; 1522 1520 } query; 1523 1521 1524 1522 struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */ ··· 1571 1549 __u32 map_fd; /* struct_ops to attach */ 1572 1550 }; 1573 1551 union { 1574 - __u32 target_fd; /* object to attach to */ 1575 - __u32 target_ifindex; /* target ifindex */ 1552 + __u32 target_fd; /* target object to attach to or ... */ 1553 + __u32 target_ifindex; /* target ifindex */ 1576 1554 }; 1577 1555 __u32 attach_type; /* attach type */ 1578 1556 __u32 flags; /* extra flags */ 1579 1557 union { 1580 - __u32 target_btf_id; /* btf_id of target to attach to */ 1558 + __u32 target_btf_id; /* btf_id of target to attach to */ 1581 1559 struct { 1582 1560 __aligned_u64 iter_info; /* extra bpf_iter_link_info */ 1583 1561 __u32 iter_info_len; /* iter_info length */ ··· 1611 1589 __s32 priority; 1612 1590 __u32 flags; 1613 1591 } netfilter; 1592 + struct { 1593 + union { 1594 + __u32 relative_fd; 1595 + __u32 relative_id; 1596 + }; 1597 + __u64 expected_revision; 1598 + } tcx; 1614 1599 }; 1615 1600 } link_create; 1616 1601 ··· 6226 6197 }; 6227 6198 }; 6228 6199 6200 + /* (Simplified) user return codes for tcx prog type. 6201 + * A valid tcx program must return one of these defined values. All other 6202 + * return codes are reserved for future use. Must remain compatible with 6203 + * their TC_ACT_* counter-parts. For compatibility in behavior, unknown 6204 + * return codes are mapped to TCX_NEXT. 
6205 + */ 6206 + enum tcx_action_base { 6207 + TCX_NEXT = -1, 6208 + TCX_PASS = 0, 6209 + TCX_DROP = 2, 6210 + TCX_REDIRECT = 7, 6211 + }; 6212 + 6229 6213 struct bpf_xdp_sock { 6230 6214 __u32 queue_id; 6231 6215 }; ··· 6521 6479 } event; /* BPF_PERF_EVENT_EVENT */ 6522 6480 }; 6523 6481 } perf_event; 6482 + struct { 6483 + __u32 ifindex; 6484 + __u32 attach_type; 6485 + } tcx; 6524 6486 }; 6525 6487 } __attribute__((aligned(8))); 6526 6488 ··· 7098 7052 struct bpf_list_node { 7099 7053 __u64 :64; 7100 7054 __u64 :64; 7055 + __u64 :64; 7101 7056 } __attribute__((aligned(8))); 7102 7057 7103 7058 struct bpf_rb_root { ··· 7107 7060 } __attribute__((aligned(8))); 7108 7061 7109 7062 struct bpf_rb_node { 7063 + __u64 :64; 7110 7064 __u64 :64; 7111 7065 __u64 :64; 7112 7066 __u64 :64;
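The `tcx_action_base` comment above states the return-code contract: a tcx program must return one of the defined values, and unknown codes are mapped to `TCX_NEXT` for compatibility. A sketch of that mapping as a pure function (the helper name is ours, not from the patch):

```c
#include <assert.h>

/* Values copied from the patched uapi header; compatible with TC_ACT_*. */
enum tcx_action_base {
	TCX_NEXT	= -1,
	TCX_PASS	= 0,
	TCX_DROP	= 2,
	TCX_REDIRECT	= 7,
};

/* Sketch of the documented policy: pass through the defined codes,
 * fold everything else into TCX_NEXT (continue to the next program). */
static int tcx_sanitize_ret(int ret)
{
	switch (ret) {
	case TCX_NEXT:
	case TCX_PASS:
	case TCX_DROP:
	case TCX_REDIRECT:
		return ret;
	default:
		return TCX_NEXT;
	}
}
```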
+9
tools/include/uapi/linux/if_xdp.h
··· 25 25 * application. 26 26 */ 27 27 #define XDP_USE_NEED_WAKEUP (1 << 3) 28 + /* By setting this option, userspace application indicates that it can 29 + * handle multiple descriptors per packet thus enabling xsk core to split 30 + * multi-buffer XDP frames into multiple Rx descriptors. Without this set 31 + * such frames will be dropped by xsk. 32 + */ 33 + #define XDP_USE_SG (1 << 4) 28 34 29 35 /* Flags for xsk_umem_config flags */ 30 36 #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0) ··· 111 105 __u32 len; 112 106 __u32 options; 113 107 }; 108 + 109 + /* Flag indicating packet consists of multiple buffers */ 110 + #define XDP_PKT_CONTD (1 << 0) 114 111 115 112 /* UMEM descriptor is __u64 */ 116 113
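On the Tx side, the `XDP_PKT_CONTD` option added above is how userspace describes a multi-buffer packet: the flag is set on every descriptor of the packet except the last one. A userspace-side sketch (plain C; `struct xdp_desc_lite` is a simplified stand-in for the uapi `struct xdp_desc`):

```c
#include <assert.h>

#define XDP_PKT_CONTD (1 << 0)	/* from the patched if_xdp.h uapi */

/* Simplified stand-in for struct xdp_desc. */
struct xdp_desc_lite {
	unsigned long long addr;
	unsigned int len;
	unsigned int options;
};

/* Mark a packet spanning n descriptors: XDP_PKT_CONTD on all but the
 * last descriptor signals "more buffers of this packet follow". */
static void mark_multibuf(struct xdp_desc_lite *descs, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		descs[i].options = (i + 1 < n) ? XDP_PKT_CONTD : 0;
}
```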
+1
tools/include/uapi/linux/netdev.h
··· 41 41 NETDEV_A_DEV_IFINDEX = 1, 42 42 NETDEV_A_DEV_PAD, 43 43 NETDEV_A_DEV_XDP_FEATURES, 44 + NETDEV_A_DEV_XDP_ZC_MAX_SEGS, 44 45 45 46 __NETDEV_A_DEV_MAX, 46 47 NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
+89 -38
tools/lib/bpf/bpf.c
··· 629 629 return bpf_prog_attach_opts(prog_fd, target_fd, type, &opts); 630 630 } 631 631 632 - int bpf_prog_attach_opts(int prog_fd, int target_fd, 633 - enum bpf_attach_type type, 634 - const struct bpf_prog_attach_opts *opts) 632 + int bpf_prog_attach_opts(int prog_fd, int target, enum bpf_attach_type type, 633 + const struct bpf_prog_attach_opts *opts) 635 634 { 636 - const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd); 635 + const size_t attr_sz = offsetofend(union bpf_attr, expected_revision); 636 + __u32 relative_id, flags; 637 + int ret, relative_fd; 637 638 union bpf_attr attr; 638 - int ret; 639 639 640 640 if (!OPTS_VALID(opts, bpf_prog_attach_opts)) 641 641 return libbpf_err(-EINVAL); 642 642 643 + relative_id = OPTS_GET(opts, relative_id, 0); 644 + relative_fd = OPTS_GET(opts, relative_fd, 0); 645 + flags = OPTS_GET(opts, flags, 0); 646 + 647 + /* validate we don't have unexpected combinations of non-zero fields */ 648 + if (relative_fd && relative_id) 649 + return libbpf_err(-EINVAL); 650 + 643 651 memset(&attr, 0, attr_sz); 644 - attr.target_fd = target_fd; 645 - attr.attach_bpf_fd = prog_fd; 646 - attr.attach_type = type; 647 - attr.attach_flags = OPTS_GET(opts, flags, 0); 648 - attr.replace_bpf_fd = OPTS_GET(opts, replace_prog_fd, 0); 652 + attr.target_fd = target; 653 + attr.attach_bpf_fd = prog_fd; 654 + attr.attach_type = type; 655 + attr.replace_bpf_fd = OPTS_GET(opts, replace_fd, 0); 656 + attr.expected_revision = OPTS_GET(opts, expected_revision, 0); 657 + 658 + if (relative_id) { 659 + attr.attach_flags = flags | BPF_F_ID; 660 + attr.relative_id = relative_id; 661 + } else { 662 + attr.attach_flags = flags; 663 + attr.relative_fd = relative_fd; 664 + } 649 665 650 666 ret = sys_bpf(BPF_PROG_ATTACH, &attr, attr_sz); 651 667 return libbpf_err_errno(ret); 652 668 } 653 669 654 - int bpf_prog_detach(int target_fd, enum bpf_attach_type type) 670 + int bpf_prog_detach_opts(int prog_fd, int target, enum bpf_attach_type type, 671 + 
const struct bpf_prog_detach_opts *opts) 655 672 { 656 - const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd); 673 + const size_t attr_sz = offsetofend(union bpf_attr, expected_revision); 674 + __u32 relative_id, flags; 675 + int ret, relative_fd; 657 676 union bpf_attr attr; 658 - int ret; 677 + 678 + if (!OPTS_VALID(opts, bpf_prog_detach_opts)) 679 + return libbpf_err(-EINVAL); 680 + 681 + relative_id = OPTS_GET(opts, relative_id, 0); 682 + relative_fd = OPTS_GET(opts, relative_fd, 0); 683 + flags = OPTS_GET(opts, flags, 0); 684 + 685 + /* validate we don't have unexpected combinations of non-zero fields */ 686 + if (relative_fd && relative_id) 687 + return libbpf_err(-EINVAL); 659 688 660 689 memset(&attr, 0, attr_sz); 661 - attr.target_fd = target_fd; 662 - attr.attach_type = type; 690 + attr.target_fd = target; 691 + attr.attach_bpf_fd = prog_fd; 692 + attr.attach_type = type; 693 + attr.expected_revision = OPTS_GET(opts, expected_revision, 0); 694 + 695 + if (relative_id) { 696 + attr.attach_flags = flags | BPF_F_ID; 697 + attr.relative_id = relative_id; 698 + } else { 699 + attr.attach_flags = flags; 700 + attr.relative_fd = relative_fd; 701 + } 663 702 664 703 ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz); 665 704 return libbpf_err_errno(ret); 666 705 } 667 706 707 + int bpf_prog_detach(int target_fd, enum bpf_attach_type type) 708 + { 709 + return bpf_prog_detach_opts(0, target_fd, type, NULL); 710 + } 711 + 668 712 int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type) 669 713 { 670 - const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd); 671 - union bpf_attr attr; 672 - int ret; 673 - 674 - memset(&attr, 0, attr_sz); 675 - attr.target_fd = target_fd; 676 - attr.attach_bpf_fd = prog_fd; 677 - attr.attach_type = type; 678 - 679 - ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz); 680 - return libbpf_err_errno(ret); 714 + return bpf_prog_detach_opts(prog_fd, target_fd, type, NULL); 681 715 } 682 716 683 717 int 
bpf_link_create(int prog_fd, int target_fd, ··· 719 685 const struct bpf_link_create_opts *opts) 720 686 { 721 687 const size_t attr_sz = offsetofend(union bpf_attr, link_create); 722 - __u32 target_btf_id, iter_info_len; 688 + __u32 target_btf_id, iter_info_len, relative_id; 689 + int fd, err, relative_fd; 723 690 union bpf_attr attr; 724 - int fd, err; 725 691 726 692 if (!OPTS_VALID(opts, bpf_link_create_opts)) 727 693 return libbpf_err(-EINVAL); ··· 781 747 attr.link_create.netfilter.priority = OPTS_GET(opts, netfilter.priority, 0); 782 748 attr.link_create.netfilter.flags = OPTS_GET(opts, netfilter.flags, 0); 783 749 if (!OPTS_ZEROED(opts, netfilter)) 750 + return libbpf_err(-EINVAL); 751 + break; 752 + case BPF_TCX_INGRESS: 753 + case BPF_TCX_EGRESS: 754 + relative_fd = OPTS_GET(opts, tcx.relative_fd, 0); 755 + relative_id = OPTS_GET(opts, tcx.relative_id, 0); 756 + if (relative_fd && relative_id) 757 + return libbpf_err(-EINVAL); 758 + if (relative_id) { 759 + attr.link_create.tcx.relative_id = relative_id; 760 + attr.link_create.flags |= BPF_F_ID; 761 + } else { 762 + attr.link_create.tcx.relative_fd = relative_fd; 763 + } 764 + attr.link_create.tcx.expected_revision = OPTS_GET(opts, tcx.expected_revision, 0); 765 + if (!OPTS_ZEROED(opts, tcx)) 784 766 return libbpf_err(-EINVAL); 785 767 break; 786 768 default: ··· 891 841 return libbpf_err_errno(fd); 892 842 } 893 843 894 - int bpf_prog_query_opts(int target_fd, 895 - enum bpf_attach_type type, 844 + int bpf_prog_query_opts(int target, enum bpf_attach_type type, 896 845 struct bpf_prog_query_opts *opts) 897 846 { 898 847 const size_t attr_sz = offsetofend(union bpf_attr, query); ··· 902 853 return libbpf_err(-EINVAL); 903 854 904 855 memset(&attr, 0, attr_sz); 905 - 906 - attr.query.target_fd = target_fd; 907 - attr.query.attach_type = type; 908 - attr.query.query_flags = OPTS_GET(opts, query_flags, 0); 909 - attr.query.prog_cnt = OPTS_GET(opts, prog_cnt, 0); 910 - attr.query.prog_ids = 
ptr_to_u64(OPTS_GET(opts, prog_ids, NULL)); 911 - attr.query.prog_attach_flags = ptr_to_u64(OPTS_GET(opts, prog_attach_flags, NULL)); 856 + attr.query.target_fd = target; 857 + attr.query.attach_type = type; 858 + attr.query.query_flags = OPTS_GET(opts, query_flags, 0); 859 + attr.query.count = OPTS_GET(opts, count, 0); 860 + attr.query.prog_ids = ptr_to_u64(OPTS_GET(opts, prog_ids, NULL)); 861 + attr.query.link_ids = ptr_to_u64(OPTS_GET(opts, link_ids, NULL)); 862 + attr.query.prog_attach_flags = ptr_to_u64(OPTS_GET(opts, prog_attach_flags, NULL)); 863 + attr.query.link_attach_flags = ptr_to_u64(OPTS_GET(opts, link_attach_flags, NULL)); 912 864 913 865 ret = sys_bpf(BPF_PROG_QUERY, &attr, attr_sz); 914 866 915 867 OPTS_SET(opts, attach_flags, attr.query.attach_flags); 916 - OPTS_SET(opts, prog_cnt, attr.query.prog_cnt); 868 + OPTS_SET(opts, revision, attr.query.revision); 869 + OPTS_SET(opts, count, attr.query.count); 917 870 918 871 return libbpf_err_errno(ret); 919 872 }
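The libbpf attach/detach paths above share one piece of plumbing: `relative_fd` and `relative_id` are mutually exclusive, and using an ID implies OR-ing `BPF_F_ID` into the attach flags. A sketch of that resolution as a pure function (the helper name and signature are ours; the real code writes into `union bpf_attr` instead of returning a value):

```c
#include <assert.h>
#include <errno.h>

#define BPF_F_ID (1U << 5)	/* from the patched uapi bpf.h */

/* Resolve the final attach_flags from user-supplied opts: reject setting
 * both a relative fd and a relative id; an id implies BPF_F_ID.
 * Returns the flags (>= 0) or -EINVAL. */
static long long resolve_attach_flags(unsigned int flags,
				      int relative_fd,
				      unsigned int relative_id)
{
	if (relative_fd && relative_id)
		return -EINVAL;
	if (relative_id)
		return (long long)(flags | BPF_F_ID);
	return (long long)flags;
}
```

The `BPF_TCX_INGRESS`/`BPF_TCX_EGRESS` cases in `bpf_link_create()` apply the same rule to `opts->tcx.relative_fd` / `opts->tcx.relative_id`.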
+83 -14
tools/lib/bpf/bpf.h
···
 LIBBPF_API int bpf_obj_get_opts(const char *pathname,
 				const struct bpf_obj_get_opts *opts);

-struct bpf_prog_attach_opts {
-	size_t sz; /* size of this struct for forward/backward compatibility */
-	unsigned int flags;
-	int replace_prog_fd;
-};
-#define bpf_prog_attach_opts__last_field replace_prog_fd
-
 LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
 			       enum bpf_attach_type type, unsigned int flags);
-LIBBPF_API int bpf_prog_attach_opts(int prog_fd, int attachable_fd,
-				    enum bpf_attach_type type,
-				    const struct bpf_prog_attach_opts *opts);
 LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
 LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
 				enum bpf_attach_type type);
+
+struct bpf_prog_attach_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;
+	union {
+		int replace_prog_fd;
+		int replace_fd;
+	};
+	int relative_fd;
+	__u32 relative_id;
+	__u64 expected_revision;
+	size_t :0;
+};
+#define bpf_prog_attach_opts__last_field expected_revision
+
+struct bpf_prog_detach_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;
+	int relative_fd;
+	__u32 relative_id;
+	__u64 expected_revision;
+	size_t :0;
+};
+#define bpf_prog_detach_opts__last_field expected_revision
+
+/**
+ * @brief **bpf_prog_attach_opts()** attaches the BPF program corresponding to
+ * *prog_fd* to a *target* which can represent a file descriptor or netdevice
+ * ifindex.
+ *
+ * @param prog_fd BPF program file descriptor
+ * @param target attach location file descriptor or ifindex
+ * @param type attach type for the BPF program
+ * @param opts options for configuring the attachment
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_prog_attach_opts(int prog_fd, int target,
+				    enum bpf_attach_type type,
+				    const struct bpf_prog_attach_opts *opts);
+
+/**
+ * @brief **bpf_prog_detach_opts()** detaches the BPF program corresponding to
+ * *prog_fd* from a *target* which can represent a file descriptor or netdevice
+ * ifindex.
+ *
+ * @param prog_fd BPF program file descriptor
+ * @param target detach location file descriptor or ifindex
+ * @param type detach type for the BPF program
+ * @param opts options for configuring the detachment
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_prog_detach_opts(int prog_fd, int target,
+				    enum bpf_attach_type type,
+				    const struct bpf_prog_detach_opts *opts);

 union bpf_iter_link_info; /* defined in up-to-date linux/bpf.h */
 struct bpf_link_create_opts {
···
 		struct {
 			__s32 priority;
 			__u32 flags;
 		} netfilter;
+		struct {
+			__u32 relative_fd;
+			__u32 relative_id;
+			__u64 expected_revision;
+		} tcx;
 	};
 	size_t :0;
 };
···
 	__u32 query_flags;
 	__u32 attach_flags; /* output argument */
 	__u32 *prog_ids;
-	__u32 prog_cnt; /* input+output argument */
+	union {
+		/* input+output argument */
+		__u32 prog_cnt;
+		__u32 count;
+	};
 	__u32 *prog_attach_flags;
+	__u32 *link_ids;
+	__u32 *link_attach_flags;
+	__u64 revision;
+	size_t :0;
 };
-#define bpf_prog_query_opts__last_field prog_attach_flags
+#define bpf_prog_query_opts__last_field revision

-LIBBPF_API int bpf_prog_query_opts(int target_fd,
-				   enum bpf_attach_type type,
+/**
+ * @brief **bpf_prog_query_opts()** queries the BPF programs and BPF links
+ * which are attached to *target* which can represent a file descriptor or
+ * netdevice ifindex.
+ *
+ * @param target query location file descriptor or ifindex
+ * @param type attach type for the BPF program
+ * @param opts options for configuring the query
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_prog_query_opts(int target, enum bpf_attach_type type,
 				   struct bpf_prog_query_opts *opts);
 LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
 			      __u32 query_flags, __u32 *attach_flags,
+58 -12
tools/lib/bpf/libbpf.c
···
 	[BPF_TRACE_KPROBE_MULTI]	= "trace_kprobe_multi",
 	[BPF_STRUCT_OPS]		= "struct_ops",
 	[BPF_NETFILTER]			= "netfilter",
+	[BPF_TCX_INGRESS]		= "tcx_ingress",
+	[BPF_TCX_EGRESS]		= "tcx_egress",
 };

 static const char * const link_type_name[] = {
···
 	[BPF_LINK_TYPE_KPROBE_MULTI]	= "kprobe_multi",
 	[BPF_LINK_TYPE_STRUCT_OPS]	= "struct_ops",
 	[BPF_LINK_TYPE_NETFILTER]	= "netfilter",
+	[BPF_LINK_TYPE_TCX]		= "tcx",
 };

 static const char * const map_type_name[] = {
···
 	SEC_DEF("ksyscall+",		KPROBE, 0, SEC_NONE, attach_ksyscall),
 	SEC_DEF("kretsyscall+",		KPROBE, 0, SEC_NONE, attach_ksyscall),
 	SEC_DEF("usdt+",		KPROBE, 0, SEC_NONE, attach_usdt),
-	SEC_DEF("tc",			SCHED_CLS, 0, SEC_NONE),
-	SEC_DEF("classifier",		SCHED_CLS, 0, SEC_NONE),
-	SEC_DEF("action",		SCHED_ACT, 0, SEC_NONE),
+	SEC_DEF("tc/ingress",		SCHED_CLS, BPF_TCX_INGRESS, SEC_NONE), /* alias for tcx */
+	SEC_DEF("tc/egress",		SCHED_CLS, BPF_TCX_EGRESS, SEC_NONE),  /* alias for tcx */
+	SEC_DEF("tcx/ingress",		SCHED_CLS, BPF_TCX_INGRESS, SEC_NONE),
+	SEC_DEF("tcx/egress",		SCHED_CLS, BPF_TCX_EGRESS, SEC_NONE),
+	SEC_DEF("tc",			SCHED_CLS, 0, SEC_NONE), /* deprecated / legacy, use tcx */
+	SEC_DEF("classifier",		SCHED_CLS, 0, SEC_NONE), /* deprecated / legacy, use tcx */
+	SEC_DEF("action",		SCHED_ACT, 0, SEC_NONE), /* deprecated / legacy, use tcx */
 	SEC_DEF("tracepoint+",		TRACEPOINT, 0, SEC_NONE, attach_tp),
 	SEC_DEF("tp+",			TRACEPOINT, 0, SEC_NONE, attach_tp),
 	SEC_DEF("raw_tracepoint+",	RAW_TRACEPOINT, 0, SEC_NONE, attach_raw_tp),
···
 }

 static struct bpf_link *
-bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id,
-		       const char *target_name)
+bpf_program_attach_fd(const struct bpf_program *prog,
+		      int target_fd, const char *target_name,
+		      const struct bpf_link_create_opts *opts)
 {
-	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
-			    .target_btf_id = btf_id);
 	enum bpf_attach_type attach_type;
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link *link;
···
 	link->detach = &bpf_link__detach_fd;

 	attach_type = bpf_program__expected_attach_type(prog);
-	link_fd = bpf_link_create(prog_fd, target_fd, attach_type, &opts);
+	link_fd = bpf_link_create(prog_fd, target_fd, attach_type, opts);
 	if (link_fd < 0) {
 		link_fd = -errno;
 		free(link);
···
 struct bpf_link *
 bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd)
 {
-	return bpf_program__attach_fd(prog, cgroup_fd, 0, "cgroup");
+	return bpf_program_attach_fd(prog, cgroup_fd, "cgroup", NULL);
 }

 struct bpf_link *
 bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd)
 {
-	return bpf_program__attach_fd(prog, netns_fd, 0, "netns");
+	return bpf_program_attach_fd(prog, netns_fd, "netns", NULL);
 }

 struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex)
 {
 	/* target_fd/target_ifindex use the same field in LINK_CREATE */
-	return bpf_program__attach_fd(prog, ifindex, 0, "xdp");
+	return bpf_program_attach_fd(prog, ifindex, "xdp", NULL);
+}
+
+struct bpf_link *
+bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
+			const struct bpf_tcx_opts *opts)
+{
+	LIBBPF_OPTS(bpf_link_create_opts, link_create_opts);
+	__u32 relative_id;
+	int relative_fd;
+
+	if (!OPTS_VALID(opts, bpf_tcx_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	relative_id = OPTS_GET(opts, relative_id, 0);
+	relative_fd = OPTS_GET(opts, relative_fd, 0);
+
+	/* validate we don't have unexpected combinations of non-zero fields */
+	if (!ifindex) {
+		pr_warn("prog '%s': target netdevice ifindex cannot be zero\n",
+			prog->name);
+		return libbpf_err_ptr(-EINVAL);
+	}
+	if (relative_fd && relative_id) {
+		pr_warn("prog '%s': relative_fd and relative_id cannot be set at the same time\n",
+			prog->name);
+		return libbpf_err_ptr(-EINVAL);
+	}
+
+	link_create_opts.tcx.expected_revision = OPTS_GET(opts, expected_revision, 0);
+	link_create_opts.tcx.relative_fd = relative_fd;
+	link_create_opts.tcx.relative_id = relative_id;
+	link_create_opts.flags = OPTS_GET(opts, flags, 0);
+
+	/* target_fd/target_ifindex use the same field in LINK_CREATE */
+	return bpf_program_attach_fd(prog, ifindex, "tcx", &link_create_opts);
 }

 struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
···
 	}

 	if (target_fd) {
+		LIBBPF_OPTS(bpf_link_create_opts, target_opts);
+
 		btf_id = libbpf_find_prog_btf_id(attach_func_name, target_fd);
 		if (btf_id < 0)
 			return libbpf_err_ptr(btf_id);

-		return bpf_program__attach_fd(prog, target_fd, btf_id, "freplace");
+		target_opts.target_btf_id = btf_id;
+
+		return bpf_program_attach_fd(prog, target_fd, "freplace",
+					     &target_opts);
 	} else {
 		/* no target, so use raw_tracepoint_open for compatibility
 		 * with old kernels
+17 -1
tools/lib/bpf/libbpf.h
···
 bpf_program__attach_netfilter(const struct bpf_program *prog,
 			      const struct bpf_netfilter_opts *opts);

+struct bpf_tcx_opts {
+	/* size of this struct, for forward/backward compatibility */
+	size_t sz;
+	__u32 flags;
+	__u32 relative_fd;
+	__u32 relative_id;
+	__u64 expected_revision;
+	size_t :0;
+};
+#define bpf_tcx_opts__last_field expected_revision
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
+			const struct bpf_tcx_opts *opts);
+
 struct bpf_map;

 LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);
···
 	__u32 skb_prog_id;	/* output */
 	__u8 attach_mode;	/* output */
 	__u64 feature_flags;	/* output */
+	__u32 xdp_zc_max_segs;	/* output */
 	size_t :0;
 };
-#define bpf_xdp_query_opts__last_field feature_flags
+#define bpf_xdp_query_opts__last_field xdp_zc_max_segs

 LIBBPF_API int bpf_xdp_attach(int ifindex, int prog_fd, __u32 flags,
 			      const struct bpf_xdp_attach_opts *opts);
+2
tools/lib/bpf/libbpf.map
···
 LIBBPF_1.3.0 {
 	global:
 		bpf_obj_pin_opts;
+		bpf_prog_detach_opts;
 		bpf_program__attach_netfilter;
+		bpf_program__attach_tcx;
 } LIBBPF_1.2.0;
+16
tools/lib/bpf/libbpf_common.h
···
 	};						\
 })

+/* Helper macro to clear and optionally reinitialize libbpf options struct
+ *
+ * Small helper macro to reset all fields and to reinitialize the common
+ * structure size member. Values provided by users in struct initializer-
+ * syntax as varargs can be provided as well to reinitialize options struct
+ * specific members.
+ */
+#define LIBBPF_OPTS_RESET(NAME, ...)		\
+	do {					\
+		memset(&NAME, 0, sizeof(NAME));	\
+		NAME = (typeof(NAME)) {		\
+			.sz = sizeof(NAME),	\
+			__VA_ARGS__		\
+		};				\
+	} while (0)
+
 #endif /* __LIBBPF_LIBBPF_COMMON_H */
+5
tools/lib/bpf/netlink.c
···
 struct xdp_features_md {
 	int ifindex;
+	__u32 xdp_zc_max_segs;
 	__u64 flags;
 };
···
 		return NL_CONT;

 	md->flags = libbpf_nla_getattr_u64(tb[NETDEV_A_DEV_XDP_FEATURES]);
+	if (tb[NETDEV_A_DEV_XDP_ZC_MAX_SEGS])
+		md->xdp_zc_max_segs =
+			libbpf_nla_getattr_u32(tb[NETDEV_A_DEV_XDP_ZC_MAX_SEGS]);
 	return NL_DONE;
 }
···
 		return libbpf_err(err);

 	opts->feature_flags = md.flags;
+	opts->xdp_zc_max_segs = md.xdp_zc_max_segs;

 skip_feature_flags:
 	return 0;
+39 -39
tools/testing/selftests/bpf/prog_tests/linked_list.c
···
 	  "bpf_spin_lock at off=" #off " must be held for bpf_list_head" }, \
 	{ #test "_missing_lock_pop_back", \
 	  "bpf_spin_lock at off=" #off " must be held for bpf_list_head" },
-	TEST(kptr, 32)
+	TEST(kptr, 40)
 	TEST(global, 16)
 	TEST(map, 0)
 	TEST(inner_map, 0)
···
 #define TEST(test, op) \
 	{ #test "_kptr_incorrect_lock_" #op, \
 	  "held lock and object are not in the same allocation\n" \
-	  "bpf_spin_lock at off=32 must be held for bpf_list_head" }, \
+	  "bpf_spin_lock at off=40 must be held for bpf_list_head" }, \
 	{ #test "_global_incorrect_lock_" #op, \
 	  "held lock and object are not in the same allocation\n" \
 	  "bpf_spin_lock at off=16 must be held for bpf_list_head" }, \
···
 	{ "double_push_back", "arg#1 expected pointer to allocated object" },
 	{ "no_node_value_type", "bpf_list_node not found at offset=0" },
 	{ "incorrect_value_type",
-	  "operation on bpf_list_head expects arg#1 bpf_list_node at offset=40 in struct foo, "
+	  "operation on bpf_list_head expects arg#1 bpf_list_node at offset=48 in struct foo, "
 	  "but arg is at offset=0 in struct bar" },
 	{ "incorrect_node_var_off", "variable ptr_ access var_off=(0x0; 0xffffffff) disallowed" },
-	{ "incorrect_node_off1", "bpf_list_node not found at offset=41" },
-	{ "incorrect_node_off2", "arg#1 offset=0, but expected bpf_list_node at offset=40 in struct foo" },
+	{ "incorrect_node_off1", "bpf_list_node not found at offset=49" },
+	{ "incorrect_node_off2", "arg#1 offset=0, but expected bpf_list_node at offset=48 in struct foo" },
 	{ "no_head_type", "bpf_list_head not found at offset=0" },
 	{ "incorrect_head_var_off1", "R1 doesn't have constant offset" },
 	{ "incorrect_head_var_off2", "variable ptr_ access var_off=(0x0; 0xffffffff) disallowed" },
-	{ "incorrect_head_off1", "bpf_list_head not found at offset=17" },
+	{ "incorrect_head_off1", "bpf_list_head not found at offset=25" },
 	{ "incorrect_head_off2", "bpf_list_head not found at offset=1" },
 	{ "pop_front_off",
-	  "15: (bf) r1 = r6 ; R1_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) "
-	  "R6_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) refs=2,4\n"
+	  "15: (bf) r1 = r6 ; R1_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=48,imm=0) "
+	  "R6_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=48,imm=0) refs=2,4\n"
 	  "16: (85) call bpf_this_cpu_ptr#154\nR1 type=ptr_or_null_ expected=percpu_ptr_" },
 	{ "pop_back_off",
-	  "15: (bf) r1 = r6 ; R1_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) "
-	  "R6_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=40,imm=0) refs=2,4\n"
+	  "15: (bf) r1 = r6 ; R1_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=48,imm=0) "
+	  "R6_w=ptr_or_null_foo(id=4,ref_obj_id=4,off=48,imm=0) refs=2,4\n"
 	  "16: (85) call bpf_this_cpu_ptr#154\nR1 type=ptr_or_null_ expected=percpu_ptr_" },
 };
···
 	hid = btf__add_struct(btf, "bpf_list_head", 16);
 	if (!ASSERT_EQ(hid, LIST_HEAD, "btf__add_struct bpf_list_head"))
 		goto end;
-	nid = btf__add_struct(btf, "bpf_list_node", 16);
+	nid = btf__add_struct(btf, "bpf_list_node", 24);
 	if (!ASSERT_EQ(nid, LIST_NODE, "btf__add_struct bpf_list_node"))
 		goto end;
 	return btf;
···
 	if (!ASSERT_OK_PTR(btf, "init_btf"))
 		return;

-	bpf_rb_node_btf_id = btf__add_struct(btf, "bpf_rb_node", 24);
+	bpf_rb_node_btf_id = btf__add_struct(btf, "bpf_rb_node", 32);
 	if (!ASSERT_GT(bpf_rb_node_btf_id, 0, "btf__add_struct bpf_rb_node"))
 		return;
···
 		return;
 	}

-	id = btf__add_struct(btf, "bar", refcount_field ? 44 : 40);
+	id = btf__add_struct(btf, "bar", refcount_field ? 60 : 56);
 	if (!ASSERT_GT(id, 0, "btf__add_struct bar"))
 		return;
 	err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
 	if (!ASSERT_OK(err, "btf__add_field bar::a"))
 		return;
-	err = btf__add_field(btf, "c", bpf_rb_node_btf_id, 128, 0);
+	err = btf__add_field(btf, "c", bpf_rb_node_btf_id, 192, 0);
 	if (!ASSERT_OK(err, "btf__add_field bar::c"))
 		return;
 	if (refcount_field) {
-		err = btf__add_field(btf, "ref", bpf_refcount_btf_id, 320, 0);
+		err = btf__add_field(btf, "ref", bpf_refcount_btf_id, 448, 0);
 		if (!ASSERT_OK(err, "btf__add_field bar::ref"))
 			return;
 	}
···
 		btf = init_btf();
 		if (!ASSERT_OK_PTR(btf, "init_btf"))
 			break;
-		id = btf__add_struct(btf, "foo", 36);
+		id = btf__add_struct(btf, "foo", 44);
 		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
 			break;
 		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
···
 		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
 		if (!ASSERT_OK(err, "btf__add_field foo::b"))
 			break;
-		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		err = btf__add_field(btf, "c", SPIN_LOCK, 320, 0);
 		if (!ASSERT_OK(err, "btf__add_field foo::c"))
 			break;
 		id = btf__add_decl_tag(btf, "contains:foo:b", 5, 0);
···
 		btf = init_btf();
 		if (!ASSERT_OK_PTR(btf, "init_btf"))
 			break;
-		id = btf__add_struct(btf, "foo", 36);
+		id = btf__add_struct(btf, "foo", 44);
 		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
 			break;
 		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
···
 		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
 		if (!ASSERT_OK(err, "btf__add_field foo::b"))
 			break;
-		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		err = btf__add_field(btf, "c", SPIN_LOCK, 320, 0);
 		if (!ASSERT_OK(err, "btf__add_field foo::c"))
 			break;
 		id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
 		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
 			break;
-		id = btf__add_struct(btf, "bar", 36);
+		id = btf__add_struct(btf, "bar", 44);
 		if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
 			break;
 		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
···
 		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
 		if (!ASSERT_OK(err, "btf__add_field bar::b"))
 			break;
-		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		err = btf__add_field(btf, "c", SPIN_LOCK, 320, 0);
 		if (!ASSERT_OK(err, "btf__add_field bar::c"))
 			break;
 		id = btf__add_decl_tag(btf, "contains:foo:b", 7, 0);
···
 		btf = init_btf();
 		if (!ASSERT_OK_PTR(btf, "init_btf"))
 			break;
-		id = btf__add_struct(btf, "foo", 20);
+		id = btf__add_struct(btf, "foo", 28);
 		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
 			break;
 		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
 		if (!ASSERT_OK(err, "btf__add_field foo::a"))
 			break;
-		err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
+		err = btf__add_field(btf, "b", SPIN_LOCK, 192, 0);
 		if (!ASSERT_OK(err, "btf__add_field foo::b"))
 			break;
 		id = btf__add_decl_tag(btf, "contains:bar:a", 5, 0);
 		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:a"))
 			break;
-		id = btf__add_struct(btf, "bar", 16);
+		id = btf__add_struct(btf, "bar", 24);
 		if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
 			break;
 		err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
···
 		btf = init_btf();
 		if (!ASSERT_OK_PTR(btf, "init_btf"))
 			break;
-		id = btf__add_struct(btf, "foo", 20);
+		id = btf__add_struct(btf, "foo", 28);
 		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
 			break;
 		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
 		if (!ASSERT_OK(err, "btf__add_field foo::a"))
 			break;
-		err = btf__add_field(btf, "b", SPIN_LOCK, 128, 0);
+		err = btf__add_field(btf, "b", SPIN_LOCK, 192, 0);
 		if (!ASSERT_OK(err, "btf__add_field foo::b"))
 			break;
 		id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
 		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
 			break;
-		id = btf__add_struct(btf, "bar", 36);
+		id = btf__add_struct(btf, "bar", 44);
 		if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
 			break;
 		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
···
 		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
 		if (!ASSERT_OK(err, "btf__add_field bar::b"))
 			break;
-		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		err = btf__add_field(btf, "c", SPIN_LOCK, 320, 0);
 		if (!ASSERT_OK(err, "btf__add_field bar::c"))
 			break;
 		id = btf__add_decl_tag(btf, "contains:baz:a", 7, 0);
 		if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:baz:a"))
 			break;
-		id = btf__add_struct(btf, "baz", 16);
+		id = btf__add_struct(btf, "baz", 24);
 		if (!ASSERT_EQ(id, 9, "btf__add_struct baz"))
 			break;
 		err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
···
 		btf = init_btf();
 		if (!ASSERT_OK_PTR(btf, "init_btf"))
 			break;
-		id = btf__add_struct(btf, "foo", 36);
+		id = btf__add_struct(btf, "foo", 44);
 		if (!ASSERT_EQ(id, 5, "btf__add_struct foo"))
 			break;
 		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
···
 		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
 		if (!ASSERT_OK(err, "btf__add_field foo::b"))
 			break;
-		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		err = btf__add_field(btf, "c", SPIN_LOCK, 320, 0);
 		if (!ASSERT_OK(err, "btf__add_field foo::c"))
 			break;
 		id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
 		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
 			break;
-		id = btf__add_struct(btf, "bar", 36);
+		id = btf__add_struct(btf, "bar", 44);
 		if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
 			break;
 		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
···
 		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
 		if (!ASSERT_OK(err, "btf__add_field bar:b"))
 			break;
-		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		err = btf__add_field(btf, "c", SPIN_LOCK, 320, 0);
 		if (!ASSERT_OK(err, "btf__add_field bar:c"))
 			break;
 		id = btf__add_decl_tag(btf, "contains:baz:a", 7, 0);
 		if (!ASSERT_EQ(id, 8, "btf__add_decl_tag contains:baz:a"))
 			break;
-		id = btf__add_struct(btf, "baz", 16);
+		id = btf__add_struct(btf, "baz", 24);
 		if (!ASSERT_EQ(id, 9, "btf__add_struct baz"))
 			break;
 		err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
···
 		id = btf__add_decl_tag(btf, "contains:bar:b", 5, 0);
 		if (!ASSERT_EQ(id, 6, "btf__add_decl_tag contains:bar:b"))
 			break;
-		id = btf__add_struct(btf, "bar", 36);
+		id = btf__add_struct(btf, "bar", 44);
 		if (!ASSERT_EQ(id, 7, "btf__add_struct bar"))
 			break;
 		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
···
 		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
 		if (!ASSERT_OK(err, "btf__add_field bar::b"))
 			break;
-		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		err = btf__add_field(btf, "c", SPIN_LOCK, 320, 0);
 		if (!ASSERT_OK(err, "btf__add_field bar::c"))
 			break;
 		id = btf__add_decl_tag(btf, "contains:baz:b", 7, 0);
 		if (!ASSERT_EQ(id, 8, "btf__add_decl_tag"))
 			break;
-		id = btf__add_struct(btf, "baz", 36);
+		id = btf__add_struct(btf, "baz", 44);
 		if (!ASSERT_EQ(id, 9, "btf__add_struct baz"))
 			break;
 		err = btf__add_field(btf, "a", LIST_HEAD, 0, 0);
···
 		err = btf__add_field(btf, "b", LIST_NODE, 128, 0);
 		if (!ASSERT_OK(err, "btf__add_field bar::b"))
 			break;
-		err = btf__add_field(btf, "c", SPIN_LOCK, 256, 0);
+		err = btf__add_field(btf, "c", SPIN_LOCK, 320, 0);
 		if (!ASSERT_OK(err, "btf__add_field bar::c"))
 			break;
 		id = btf__add_decl_tag(btf, "contains:bam:a", 9, 0);
 		if (!ASSERT_EQ(id, 10, "btf__add_decl_tag contains:bam:a"))
 			break;
-		id = btf__add_struct(btf, "bam", 16);
+		id = btf__add_struct(btf, "bam", 24);
 		if (!ASSERT_EQ(id, 11, "btf__add_struct bam"))
 			break;
 		err = btf__add_field(btf, "a", LIST_NODE, 0, 0);
+4
tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
···
 void test_refcounted_kptr_fail(void)
 {
 }
+
+void test_refcounted_kptr_wrong_owner(void)
+{
+}
+72
tools/testing/selftests/bpf/prog_tests/tc_helpers.h
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2023 Isovalent */
+#ifndef TC_HELPERS
+#define TC_HELPERS
+#include <test_progs.h>
+
+static inline __u32 id_from_prog_fd(int fd)
+{
+	struct bpf_prog_info prog_info = {};
+	__u32 prog_info_len = sizeof(prog_info);
+	int err;
+
+	err = bpf_obj_get_info_by_fd(fd, &prog_info, &prog_info_len);
+	if (!ASSERT_OK(err, "id_from_prog_fd"))
+		return 0;
+
+	ASSERT_NEQ(prog_info.id, 0, "prog_info.id");
+	return prog_info.id;
+}
+
+static inline __u32 id_from_link_fd(int fd)
+{
+	struct bpf_link_info link_info = {};
+	__u32 link_info_len = sizeof(link_info);
+	int err;
+
+	err = bpf_link_get_info_by_fd(fd, &link_info, &link_info_len);
+	if (!ASSERT_OK(err, "id_from_link_fd"))
+		return 0;
+
+	ASSERT_NEQ(link_info.id, 0, "link_info.id");
+	return link_info.id;
+}
+
+static inline __u32 ifindex_from_link_fd(int fd)
+{
+	struct bpf_link_info link_info = {};
+	__u32 link_info_len = sizeof(link_info);
+	int err;
+
+	err = bpf_link_get_info_by_fd(fd, &link_info, &link_info_len);
+	if (!ASSERT_OK(err, "id_from_link_fd"))
+		return 0;
+
+	return link_info.tcx.ifindex;
+}
+
+static inline void __assert_mprog_count(int target, int expected, bool miniq, int ifindex)
+{
+	__u32 count = 0, attach_flags = 0;
+	int err;
+
+	err = bpf_prog_query(ifindex, target, 0, &attach_flags,
+			     NULL, &count);
+	ASSERT_EQ(count, expected, "count");
+	if (!expected && !miniq)
+		ASSERT_EQ(err, -ENOENT, "prog_query");
+	else
+		ASSERT_EQ(err, 0, "prog_query");
+}
+
+static inline void assert_mprog_count(int target, int expected)
+{
+	__assert_mprog_count(target, expected, false, loopback);
+}
+
+static inline void assert_mprog_count_ifindex(int ifindex, int target, int expected)
+{
+	__assert_mprog_count(target, expected, false, ifindex);
+}
+
+#endif /* TC_HELPERS */
+1583
tools/testing/selftests/bpf/prog_tests/tc_links.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2023 Isovalent */ 3 + #include <uapi/linux/if_link.h> 4 + #include <net/if.h> 5 + #include <test_progs.h> 6 + 7 + #define loopback 1 8 + #define ping_cmd "ping -q -c1 -w1 127.0.0.1 > /dev/null" 9 + 10 + #include "test_tc_link.skel.h" 11 + #include "tc_helpers.h" 12 + 13 + void serial_test_tc_links_basic(void) 14 + { 15 + LIBBPF_OPTS(bpf_prog_query_opts, optq); 16 + LIBBPF_OPTS(bpf_tcx_opts, optl); 17 + __u32 prog_ids[2], link_ids[2]; 18 + __u32 pid1, pid2, lid1, lid2; 19 + struct test_tc_link *skel; 20 + struct bpf_link *link; 21 + int err; 22 + 23 + skel = test_tc_link__open_and_load(); 24 + if (!ASSERT_OK_PTR(skel, "skel_load")) 25 + goto cleanup; 26 + 27 + pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1)); 28 + pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2)); 29 + 30 + ASSERT_NEQ(pid1, pid2, "prog_ids_1_2"); 31 + 32 + assert_mprog_count(BPF_TCX_INGRESS, 0); 33 + assert_mprog_count(BPF_TCX_EGRESS, 0); 34 + 35 + ASSERT_EQ(skel->bss->seen_tc1, false, "seen_tc1"); 36 + ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2"); 37 + 38 + link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl); 39 + if (!ASSERT_OK_PTR(link, "link_attach")) 40 + goto cleanup; 41 + 42 + skel->links.tc1 = link; 43 + 44 + lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1)); 45 + 46 + assert_mprog_count(BPF_TCX_INGRESS, 1); 47 + assert_mprog_count(BPF_TCX_EGRESS, 0); 48 + 49 + optq.prog_ids = prog_ids; 50 + optq.link_ids = link_ids; 51 + 52 + memset(prog_ids, 0, sizeof(prog_ids)); 53 + memset(link_ids, 0, sizeof(link_ids)); 54 + optq.count = ARRAY_SIZE(prog_ids); 55 + 56 + err = bpf_prog_query_opts(loopback, BPF_TCX_INGRESS, &optq); 57 + if (!ASSERT_OK(err, "prog_query")) 58 + goto cleanup; 59 + 60 + ASSERT_EQ(optq.count, 1, "count"); 61 + ASSERT_EQ(optq.revision, 2, "revision"); 62 + ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]"); 63 + ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]"); 64 + 
ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]"); 65 + ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]"); 66 + 67 + ASSERT_OK(system(ping_cmd), ping_cmd); 68 + 69 + ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1"); 70 + ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2"); 71 + 72 + link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl); 73 + if (!ASSERT_OK_PTR(link, "link_attach")) 74 + goto cleanup; 75 + 76 + skel->links.tc2 = link; 77 + 78 + lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2)); 79 + ASSERT_NEQ(lid1, lid2, "link_ids_1_2"); 80 + 81 + assert_mprog_count(BPF_TCX_INGRESS, 1); 82 + assert_mprog_count(BPF_TCX_EGRESS, 1); 83 + 84 + memset(prog_ids, 0, sizeof(prog_ids)); 85 + memset(link_ids, 0, sizeof(link_ids)); 86 + optq.count = ARRAY_SIZE(prog_ids); 87 + 88 + err = bpf_prog_query_opts(loopback, BPF_TCX_EGRESS, &optq); 89 + if (!ASSERT_OK(err, "prog_query")) 90 + goto cleanup; 91 + 92 + ASSERT_EQ(optq.count, 1, "count"); 93 + ASSERT_EQ(optq.revision, 2, "revision"); 94 + ASSERT_EQ(optq.prog_ids[0], pid2, "prog_ids[0]"); 95 + ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]"); 96 + ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]"); 97 + ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]"); 98 + 99 + ASSERT_OK(system(ping_cmd), ping_cmd); 100 + 101 + ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1"); 102 + ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2"); 103 + cleanup: 104 + test_tc_link__destroy(skel); 105 + 106 + assert_mprog_count(BPF_TCX_INGRESS, 0); 107 + assert_mprog_count(BPF_TCX_EGRESS, 0); 108 + } 109 + 110 + static void test_tc_links_before_target(int target) 111 + { 112 + LIBBPF_OPTS(bpf_prog_query_opts, optq); 113 + LIBBPF_OPTS(bpf_tcx_opts, optl); 114 + __u32 prog_ids[5], link_ids[5]; 115 + __u32 pid1, pid2, pid3, pid4; 116 + __u32 lid1, lid2, lid3, lid4; 117 + struct test_tc_link *skel; 118 + struct bpf_link *link; 119 + int err; 120 + 121 + skel = test_tc_link__open(); 122 + if (!ASSERT_OK_PTR(skel, "skel_open")) 123 + goto cleanup; 124 + 125 + 
ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target), 126 + 0, "tc1_attach_type"); 127 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target), 128 + 0, "tc2_attach_type"); 129 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target), 130 + 0, "tc3_attach_type"); 131 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target), 132 + 0, "tc4_attach_type"); 133 + 134 + err = test_tc_link__load(skel); 135 + if (!ASSERT_OK(err, "skel_load")) 136 + goto cleanup; 137 + 138 + pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1)); 139 + pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2)); 140 + pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3)); 141 + pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4)); 142 + 143 + ASSERT_NEQ(pid1, pid2, "prog_ids_1_2"); 144 + ASSERT_NEQ(pid3, pid4, "prog_ids_3_4"); 145 + ASSERT_NEQ(pid2, pid3, "prog_ids_2_3"); 146 + 147 + assert_mprog_count(target, 0); 148 + 149 + link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl); 150 + if (!ASSERT_OK_PTR(link, "link_attach")) 151 + goto cleanup; 152 + 153 + skel->links.tc1 = link; 154 + 155 + lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1)); 156 + 157 + assert_mprog_count(target, 1); 158 + 159 + link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl); 160 + if (!ASSERT_OK_PTR(link, "link_attach")) 161 + goto cleanup; 162 + 163 + skel->links.tc2 = link; 164 + 165 + lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2)); 166 + 167 + assert_mprog_count(target, 2); 168 + 169 + optq.prog_ids = prog_ids; 170 + optq.link_ids = link_ids; 171 + 172 + memset(prog_ids, 0, sizeof(prog_ids)); 173 + memset(link_ids, 0, sizeof(link_ids)); 174 + optq.count = ARRAY_SIZE(prog_ids); 175 + 176 + err = bpf_prog_query_opts(loopback, target, &optq); 177 + if (!ASSERT_OK(err, "prog_query")) 178 + goto cleanup; 179 + 180 + ASSERT_EQ(optq.count, 2, "count"); 181 + ASSERT_EQ(optq.revision, 3, "revision"); 182 + 
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE,
+		.relative_fd = bpf_program__fd(skel->progs.tc2),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc3 = link;
+
+	lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3));
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE | BPF_F_LINK,
+		.relative_id = lid1,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc4, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc4 = link;
+
+	lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+	assert_mprog_count(target, 4);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid4, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid4, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], pid3, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], lid3, "link_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], pid2, "prog_ids[3]");
+	ASSERT_EQ(optq.link_ids[3], lid2, "link_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+	ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_before(void)
+{
+	test_tc_links_before_target(BPF_TCX_INGRESS);
+	test_tc_links_before_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_after_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	LIBBPF_OPTS(bpf_tcx_opts, optl);
+	__u32 prog_ids[5], link_ids[5];
+	__u32 pid1, pid2, pid3, pid4;
+	__u32 lid1, lid2, lid3, lid4;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+		  0, "tc3_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+		  0, "tc4_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+	pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+	optq.link_ids = link_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_AFTER,
+		.relative_fd = bpf_program__fd(skel->progs.tc1),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc3 = link;
+
+	lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3));
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_AFTER | BPF_F_LINK,
+		.relative_fd = bpf_link__fd(skel->links.tc2),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc4, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc4 = link;
+
+	lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+	assert_mprog_count(target, 4);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid3, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid3, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], pid2, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], lid2, "link_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], pid4, "prog_ids[3]");
+	ASSERT_EQ(optq.link_ids[3], lid4, "link_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+	ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_after(void)
+{
+	test_tc_links_after_target(BPF_TCX_INGRESS);
+	test_tc_links_after_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_revision_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	LIBBPF_OPTS(bpf_tcx_opts, optl);
+	__u32 prog_ids[3], link_ids[3];
+	__u32 pid1, pid2, lid1, lid2;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+
+	assert_mprog_count(target, 0);
+
+	optl.expected_revision = 1;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	optl.expected_revision = 1;
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 1);
+
+	optl.expected_revision = 2;
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+	optq.link_ids = link_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_revision(void)
+{
+	test_tc_links_revision_target(BPF_TCX_INGRESS);
+	test_tc_links_revision_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_chain_classic(int target, bool chain_tc_old)
+{
+	LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1);
+	LIBBPF_OPTS(bpf_tc_hook, tc_hook, .ifindex = loopback);
+	bool hook_created = false, tc_attached = false;
+	LIBBPF_OPTS(bpf_tcx_opts, optl);
+	__u32 pid1, pid2, pid3;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	if (chain_tc_old) {
+		tc_hook.attach_point = target == BPF_TCX_INGRESS ?
+				       BPF_TC_INGRESS : BPF_TC_EGRESS;
+		err = bpf_tc_hook_create(&tc_hook);
+		if (err == 0)
+			hook_created = true;
+		err = err == -EEXIST ? 0 : err;
+		if (!ASSERT_OK(err, "bpf_tc_hook_create"))
+			goto cleanup;
+
+		tc_opts.prog_fd = bpf_program__fd(skel->progs.tc3);
+		err = bpf_tc_attach(&tc_hook, &tc_opts);
+		if (!ASSERT_OK(err, "bpf_tc_attach"))
+			goto cleanup;
+		tc_attached = true;
+	}
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	assert_mprog_count(target, 2);
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+	skel->bss->seen_tc3 = false;
+
+	err = bpf_link__detach(skel->links.tc2);
+	if (!ASSERT_OK(err, "prog_detach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");
+cleanup:
+	if (tc_attached) {
+		tc_opts.flags = tc_opts.prog_fd = tc_opts.prog_id = 0;
+		err = bpf_tc_detach(&tc_hook, &tc_opts);
+		ASSERT_OK(err, "bpf_tc_detach");
+	}
+	if (hook_created) {
+		tc_hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS;
+		bpf_tc_hook_destroy(&tc_hook);
+	}
+	assert_mprog_count(target, 1);
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_chain_classic(void)
+{
+	test_tc_chain_classic(BPF_TCX_INGRESS, false);
+	test_tc_chain_classic(BPF_TCX_EGRESS, false);
+	test_tc_chain_classic(BPF_TCX_INGRESS, true);
+	test_tc_chain_classic(BPF_TCX_EGRESS, true);
+}
+
+static void test_tc_links_replace_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	LIBBPF_OPTS(bpf_tcx_opts, optl);
+	__u32 pid1, pid2, pid3, lid1, lid2;
+	__u32 prog_ids[4], link_ids[4];
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+		  0, "tc3_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	optl.expected_revision = 1;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE,
+		.relative_id = pid1,
+		.expected_revision = 2,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+	optq.link_ids = link_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid2, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+	skel->bss->seen_tc3 = false;
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_REPLACE,
+		.relative_fd = bpf_program__fd(skel->progs.tc2),
+		.expected_revision = 3,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_REPLACE | BPF_F_LINK,
+		.relative_fd = bpf_link__fd(skel->links.tc2),
+		.expected_revision = 3,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_REPLACE | BPF_F_LINK | BPF_F_AFTER,
+		.relative_id = lid2,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 2);
+
+	err = bpf_link__update_program(skel->links.tc2, skel->progs.tc3);
+	if (!ASSERT_OK(err, "link_update"))
+		goto cleanup;
+
+	assert_mprog_count(target, 2);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 4, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid3, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+	skel->bss->seen_tc3 = false;
+
+	err = bpf_link__detach(skel->links.tc2);
+	if (!ASSERT_OK(err, "link_detach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+	skel->bss->seen_tc3 = false;
+
+	err = bpf_link__update_program(skel->links.tc1, skel->progs.tc1);
+	if (!ASSERT_OK(err, "link_update_self"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_replace(void)
+{
+	test_tc_links_replace_target(BPF_TCX_INGRESS);
+	test_tc_links_replace_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_invalid_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	LIBBPF_OPTS(bpf_tcx_opts, optl);
+	__u32 pid1, pid2, lid1;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+
+	assert_mprog_count(target, 0);
+
+	optl.flags = BPF_F_BEFORE | BPF_F_AFTER;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE | BPF_F_ID,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_AFTER | BPF_F_ID,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_ID,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_LINK,
+		.relative_fd = bpf_program__fd(skel->progs.tc2),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_LINK,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.relative_fd = bpf_program__fd(skel->progs.tc2),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE | BPF_F_AFTER,
+		.relative_fd = bpf_program__fd(skel->progs.tc2),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE,
+		.relative_fd = bpf_program__fd(skel->progs.tc1),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_ID,
+		.relative_id = pid2,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_ID,
+		.relative_id = 42,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE,
+		.relative_fd = bpf_program__fd(skel->progs.tc1),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE | BPF_F_LINK,
+		.relative_fd = bpf_program__fd(skel->progs.tc1),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_AFTER,
+		.relative_fd = bpf_program__fd(skel->progs.tc1),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, 0, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_AFTER | BPF_F_LINK,
+		.relative_fd = bpf_program__fd(skel->progs.tc1),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optl);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_AFTER | BPF_F_LINK,
+		.relative_fd = bpf_program__fd(skel->progs.tc1),
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE | BPF_F_LINK | BPF_F_ID,
+		.relative_id = ~0,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE | BPF_F_LINK | BPF_F_ID,
+		.relative_id = lid1,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE | BPF_F_ID,
+		.relative_id = pid1,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE | BPF_F_LINK | BPF_F_ID,
+		.relative_id = lid1,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	assert_mprog_count(target, 2);
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_invalid(void)
+{
+	test_tc_links_invalid_target(BPF_TCX_INGRESS);
+	test_tc_links_invalid_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_prepend_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	LIBBPF_OPTS(bpf_tcx_opts, optl);
+	__u32 prog_ids[5], link_ids[5];
+	__u32 pid1, pid2, pid3, pid4;
+	__u32 lid1, lid2, lid3, lid4;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+		  0, "tc3_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+		  0, "tc4_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+	pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_RESET(optl,
+		.flags = BPF_F_BEFORE,
+	);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
1234 + assert_mprog_count(target, 2); 1235 + 1236 + optq.prog_ids = prog_ids; 1237 + optq.link_ids = link_ids; 1238 + 1239 + memset(prog_ids, 0, sizeof(prog_ids)); 1240 + memset(link_ids, 0, sizeof(link_ids)); 1241 + optq.count = ARRAY_SIZE(prog_ids); 1242 + 1243 + err = bpf_prog_query_opts(loopback, target, &optq); 1244 + if (!ASSERT_OK(err, "prog_query")) 1245 + goto cleanup; 1246 + 1247 + ASSERT_EQ(optq.count, 2, "count"); 1248 + ASSERT_EQ(optq.revision, 3, "revision"); 1249 + ASSERT_EQ(optq.prog_ids[0], pid2, "prog_ids[0]"); 1250 + ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]"); 1251 + ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]"); 1252 + ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]"); 1253 + ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]"); 1254 + ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]"); 1255 + 1256 + ASSERT_OK(system(ping_cmd), ping_cmd); 1257 + 1258 + ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1"); 1259 + ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2"); 1260 + ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3"); 1261 + ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4"); 1262 + 1263 + skel->bss->seen_tc1 = false; 1264 + skel->bss->seen_tc2 = false; 1265 + 1266 + LIBBPF_OPTS_RESET(optl, 1267 + .flags = BPF_F_BEFORE, 1268 + ); 1269 + 1270 + link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl); 1271 + if (!ASSERT_OK_PTR(link, "link_attach")) 1272 + goto cleanup; 1273 + 1274 + skel->links.tc3 = link; 1275 + 1276 + lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3)); 1277 + 1278 + LIBBPF_OPTS_RESET(optl, 1279 + .flags = BPF_F_BEFORE, 1280 + ); 1281 + 1282 + link = bpf_program__attach_tcx(skel->progs.tc4, loopback, &optl); 1283 + if (!ASSERT_OK_PTR(link, "link_attach")) 1284 + goto cleanup; 1285 + 1286 + skel->links.tc4 = link; 1287 + 1288 + lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4)); 1289 + 1290 + assert_mprog_count(target, 4); 1291 + 1292 + memset(prog_ids, 0, sizeof(prog_ids)); 1293 + memset(link_ids, 0, sizeof(link_ids)); 1294 
+ optq.count = ARRAY_SIZE(prog_ids); 1295 + 1296 + err = bpf_prog_query_opts(loopback, target, &optq); 1297 + if (!ASSERT_OK(err, "prog_query")) 1298 + goto cleanup; 1299 + 1300 + ASSERT_EQ(optq.count, 4, "count"); 1301 + ASSERT_EQ(optq.revision, 5, "revision"); 1302 + ASSERT_EQ(optq.prog_ids[0], pid4, "prog_ids[0]"); 1303 + ASSERT_EQ(optq.link_ids[0], lid4, "link_ids[0]"); 1304 + ASSERT_EQ(optq.prog_ids[1], pid3, "prog_ids[1]"); 1305 + ASSERT_EQ(optq.link_ids[1], lid3, "link_ids[1]"); 1306 + ASSERT_EQ(optq.prog_ids[2], pid2, "prog_ids[2]"); 1307 + ASSERT_EQ(optq.link_ids[2], lid2, "link_ids[2]"); 1308 + ASSERT_EQ(optq.prog_ids[3], pid1, "prog_ids[3]"); 1309 + ASSERT_EQ(optq.link_ids[3], lid1, "link_ids[3]"); 1310 + ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]"); 1311 + ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]"); 1312 + 1313 + ASSERT_OK(system(ping_cmd), ping_cmd); 1314 + 1315 + ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1"); 1316 + ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2"); 1317 + ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3"); 1318 + ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4"); 1319 + cleanup: 1320 + test_tc_link__destroy(skel); 1321 + assert_mprog_count(target, 0); 1322 + } 1323 + 1324 + void serial_test_tc_links_prepend(void) 1325 + { 1326 + test_tc_links_prepend_target(BPF_TCX_INGRESS); 1327 + test_tc_links_prepend_target(BPF_TCX_EGRESS); 1328 + } 1329 + 1330 + static void test_tc_links_append_target(int target) 1331 + { 1332 + LIBBPF_OPTS(bpf_prog_query_opts, optq); 1333 + LIBBPF_OPTS(bpf_tcx_opts, optl); 1334 + __u32 prog_ids[5], link_ids[5]; 1335 + __u32 pid1, pid2, pid3, pid4; 1336 + __u32 lid1, lid2, lid3, lid4; 1337 + struct test_tc_link *skel; 1338 + struct bpf_link *link; 1339 + int err; 1340 + 1341 + skel = test_tc_link__open(); 1342 + if (!ASSERT_OK_PTR(skel, "skel_open")) 1343 + goto cleanup; 1344 + 1345 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target), 1346 + 0, "tc1_attach_type"); 1347 + 
ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target), 1348 + 0, "tc2_attach_type"); 1349 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target), 1350 + 0, "tc3_attach_type"); 1351 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target), 1352 + 0, "tc4_attach_type"); 1353 + 1354 + err = test_tc_link__load(skel); 1355 + if (!ASSERT_OK(err, "skel_load")) 1356 + goto cleanup; 1357 + 1358 + pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1)); 1359 + pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2)); 1360 + pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3)); 1361 + pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4)); 1362 + 1363 + ASSERT_NEQ(pid1, pid2, "prog_ids_1_2"); 1364 + ASSERT_NEQ(pid3, pid4, "prog_ids_3_4"); 1365 + ASSERT_NEQ(pid2, pid3, "prog_ids_2_3"); 1366 + 1367 + assert_mprog_count(target, 0); 1368 + 1369 + link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl); 1370 + if (!ASSERT_OK_PTR(link, "link_attach")) 1371 + goto cleanup; 1372 + 1373 + skel->links.tc1 = link; 1374 + 1375 + lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1)); 1376 + 1377 + assert_mprog_count(target, 1); 1378 + 1379 + LIBBPF_OPTS_RESET(optl, 1380 + .flags = BPF_F_AFTER, 1381 + ); 1382 + 1383 + link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl); 1384 + if (!ASSERT_OK_PTR(link, "link_attach")) 1385 + goto cleanup; 1386 + 1387 + skel->links.tc2 = link; 1388 + 1389 + lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2)); 1390 + 1391 + assert_mprog_count(target, 2); 1392 + 1393 + optq.prog_ids = prog_ids; 1394 + optq.link_ids = link_ids; 1395 + 1396 + memset(prog_ids, 0, sizeof(prog_ids)); 1397 + memset(link_ids, 0, sizeof(link_ids)); 1398 + optq.count = ARRAY_SIZE(prog_ids); 1399 + 1400 + err = bpf_prog_query_opts(loopback, target, &optq); 1401 + if (!ASSERT_OK(err, "prog_query")) 1402 + goto cleanup; 1403 + 1404 + ASSERT_EQ(optq.count, 2, "count"); 1405 + ASSERT_EQ(optq.revision, 3, 
"revision"); 1406 + ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]"); 1407 + ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]"); 1408 + ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]"); 1409 + ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]"); 1410 + ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]"); 1411 + ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]"); 1412 + 1413 + ASSERT_OK(system(ping_cmd), ping_cmd); 1414 + 1415 + ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1"); 1416 + ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2"); 1417 + ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3"); 1418 + ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4"); 1419 + 1420 + skel->bss->seen_tc1 = false; 1421 + skel->bss->seen_tc2 = false; 1422 + 1423 + LIBBPF_OPTS_RESET(optl, 1424 + .flags = BPF_F_AFTER, 1425 + ); 1426 + 1427 + link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl); 1428 + if (!ASSERT_OK_PTR(link, "link_attach")) 1429 + goto cleanup; 1430 + 1431 + skel->links.tc3 = link; 1432 + 1433 + lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3)); 1434 + 1435 + LIBBPF_OPTS_RESET(optl, 1436 + .flags = BPF_F_AFTER, 1437 + ); 1438 + 1439 + link = bpf_program__attach_tcx(skel->progs.tc4, loopback, &optl); 1440 + if (!ASSERT_OK_PTR(link, "link_attach")) 1441 + goto cleanup; 1442 + 1443 + skel->links.tc4 = link; 1444 + 1445 + lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4)); 1446 + 1447 + assert_mprog_count(target, 4); 1448 + 1449 + memset(prog_ids, 0, sizeof(prog_ids)); 1450 + memset(link_ids, 0, sizeof(link_ids)); 1451 + optq.count = ARRAY_SIZE(prog_ids); 1452 + 1453 + err = bpf_prog_query_opts(loopback, target, &optq); 1454 + if (!ASSERT_OK(err, "prog_query")) 1455 + goto cleanup; 1456 + 1457 + ASSERT_EQ(optq.count, 4, "count"); 1458 + ASSERT_EQ(optq.revision, 5, "revision"); 1459 + ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]"); 1460 + ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]"); 1461 + ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]"); 1462 + 
ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]"); 1463 + ASSERT_EQ(optq.prog_ids[2], pid3, "prog_ids[2]"); 1464 + ASSERT_EQ(optq.link_ids[2], lid3, "link_ids[2]"); 1465 + ASSERT_EQ(optq.prog_ids[3], pid4, "prog_ids[3]"); 1466 + ASSERT_EQ(optq.link_ids[3], lid4, "link_ids[3]"); 1467 + ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]"); 1468 + ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]"); 1469 + 1470 + ASSERT_OK(system(ping_cmd), ping_cmd); 1471 + 1472 + ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1"); 1473 + ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2"); 1474 + ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3"); 1475 + ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4"); 1476 + cleanup: 1477 + test_tc_link__destroy(skel); 1478 + assert_mprog_count(target, 0); 1479 + } 1480 + 1481 + void serial_test_tc_links_append(void) 1482 + { 1483 + test_tc_links_append_target(BPF_TCX_INGRESS); 1484 + test_tc_links_append_target(BPF_TCX_EGRESS); 1485 + } 1486 + 1487 + static void test_tc_links_dev_cleanup_target(int target) 1488 + { 1489 + LIBBPF_OPTS(bpf_tcx_opts, optl); 1490 + LIBBPF_OPTS(bpf_prog_query_opts, optq); 1491 + __u32 pid1, pid2, pid3, pid4; 1492 + struct test_tc_link *skel; 1493 + struct bpf_link *link; 1494 + int err, ifindex; 1495 + 1496 + ASSERT_OK(system("ip link add dev tcx_opts1 type veth peer name tcx_opts2"), "add veth"); 1497 + ifindex = if_nametoindex("tcx_opts1"); 1498 + ASSERT_NEQ(ifindex, 0, "non_zero_ifindex"); 1499 + 1500 + skel = test_tc_link__open(); 1501 + if (!ASSERT_OK_PTR(skel, "skel_open")) 1502 + goto cleanup; 1503 + 1504 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target), 1505 + 0, "tc1_attach_type"); 1506 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target), 1507 + 0, "tc2_attach_type"); 1508 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target), 1509 + 0, "tc3_attach_type"); 1510 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target), 1511 + 0, 
"tc4_attach_type"); 1512 + 1513 + err = test_tc_link__load(skel); 1514 + if (!ASSERT_OK(err, "skel_load")) 1515 + goto cleanup; 1516 + 1517 + pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1)); 1518 + pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2)); 1519 + pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3)); 1520 + pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4)); 1521 + 1522 + ASSERT_NEQ(pid1, pid2, "prog_ids_1_2"); 1523 + ASSERT_NEQ(pid3, pid4, "prog_ids_3_4"); 1524 + ASSERT_NEQ(pid2, pid3, "prog_ids_2_3"); 1525 + 1526 + assert_mprog_count(target, 0); 1527 + 1528 + link = bpf_program__attach_tcx(skel->progs.tc1, ifindex, &optl); 1529 + if (!ASSERT_OK_PTR(link, "link_attach")) 1530 + goto cleanup; 1531 + 1532 + skel->links.tc1 = link; 1533 + 1534 + assert_mprog_count_ifindex(ifindex, target, 1); 1535 + 1536 + link = bpf_program__attach_tcx(skel->progs.tc2, ifindex, &optl); 1537 + if (!ASSERT_OK_PTR(link, "link_attach")) 1538 + goto cleanup; 1539 + 1540 + skel->links.tc2 = link; 1541 + 1542 + assert_mprog_count_ifindex(ifindex, target, 2); 1543 + 1544 + link = bpf_program__attach_tcx(skel->progs.tc3, ifindex, &optl); 1545 + if (!ASSERT_OK_PTR(link, "link_attach")) 1546 + goto cleanup; 1547 + 1548 + skel->links.tc3 = link; 1549 + 1550 + assert_mprog_count_ifindex(ifindex, target, 3); 1551 + 1552 + link = bpf_program__attach_tcx(skel->progs.tc4, ifindex, &optl); 1553 + if (!ASSERT_OK_PTR(link, "link_attach")) 1554 + goto cleanup; 1555 + 1556 + skel->links.tc4 = link; 1557 + 1558 + assert_mprog_count_ifindex(ifindex, target, 4); 1559 + 1560 + ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth"); 1561 + ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed"); 1562 + ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed"); 1563 + 1564 + ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc1)), 0, "tc1_ifindex"); 1565 + ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc2)), 0, "tc2_ifindex"); 1566 + 
ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc3)), 0, "tc3_ifindex"); 1567 + ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc4)), 0, "tc4_ifindex"); 1568 + 1569 + test_tc_link__destroy(skel); 1570 + return; 1571 + cleanup: 1572 + test_tc_link__destroy(skel); 1573 + 1574 + ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth"); 1575 + ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed"); 1576 + ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed"); 1577 + } 1578 + 1579 + void serial_test_tc_links_dev_cleanup(void) 1580 + { 1581 + test_tc_links_dev_cleanup_target(BPF_TCX_INGRESS); 1582 + test_tc_links_dev_cleanup_target(BPF_TCX_EGRESS); 1583 + }
tools/testing/selftests/bpf/prog_tests/tc_opts.c
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Isovalent */
#include <uapi/linux/if_link.h>
#include <net/if.h>
#include <test_progs.h>

#define loopback 1
#define ping_cmd "ping -q -c1 -w1 127.0.0.1 > /dev/null"

#include "test_tc_link.skel.h"
#include "tc_helpers.h"

void serial_test_tc_opts_basic(void)
{
	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
	LIBBPF_OPTS(bpf_prog_query_opts, optq);
	__u32 fd1, fd2, id1, id2;
	struct test_tc_link *skel;
	__u32 prog_ids[2];
	int err;

	skel = test_tc_link__open_and_load();
	if (!ASSERT_OK_PTR(skel, "skel_load"))
		goto cleanup;

	fd1 = bpf_program__fd(skel->progs.tc1);
	fd2 = bpf_program__fd(skel->progs.tc2);

	id1 = id_from_prog_fd(fd1);
	id2 = id_from_prog_fd(fd2);

	ASSERT_NEQ(id1, id2, "prog_ids_1_2");

	assert_mprog_count(BPF_TCX_INGRESS, 0);
	assert_mprog_count(BPF_TCX_EGRESS, 0);

	ASSERT_EQ(skel->bss->seen_tc1, false, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");

	err = bpf_prog_attach_opts(fd1, loopback, BPF_TCX_INGRESS, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup;

	assert_mprog_count(BPF_TCX_INGRESS, 1);
	assert_mprog_count(BPF_TCX_EGRESS, 0);

	optq.prog_ids = prog_ids;

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, BPF_TCX_INGRESS, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_in;

	ASSERT_EQ(optq.count, 1, "count");
	ASSERT_EQ(optq.revision, 2, "revision");
	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");

	ASSERT_OK(system(ping_cmd), ping_cmd);

	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");

	err = bpf_prog_attach_opts(fd2, loopback, BPF_TCX_EGRESS, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup_in;

	assert_mprog_count(BPF_TCX_INGRESS, 1);
	assert_mprog_count(BPF_TCX_EGRESS, 1);

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, BPF_TCX_EGRESS, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_eg;

	ASSERT_EQ(optq.count, 1, "count");
	ASSERT_EQ(optq.revision, 2, "revision");
	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");

	ASSERT_OK(system(ping_cmd), ping_cmd);

	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");

cleanup_eg:
	err = bpf_prog_detach_opts(fd2, loopback, BPF_TCX_EGRESS, &optd);
	ASSERT_OK(err, "prog_detach_eg");

	assert_mprog_count(BPF_TCX_INGRESS, 1);
	assert_mprog_count(BPF_TCX_EGRESS, 0);

cleanup_in:
	err = bpf_prog_detach_opts(fd1, loopback, BPF_TCX_INGRESS, &optd);
	ASSERT_OK(err, "prog_detach_in");

	assert_mprog_count(BPF_TCX_INGRESS, 0);
	assert_mprog_count(BPF_TCX_EGRESS, 0);

cleanup:
	test_tc_link__destroy(skel);
}

static void test_tc_opts_before_target(int target)
{
	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
	LIBBPF_OPTS(bpf_prog_query_opts, optq);
	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
	struct test_tc_link *skel;
	__u32 prog_ids[5];
	int err;

	skel = test_tc_link__open_and_load();
	if (!ASSERT_OK_PTR(skel, "skel_load"))
		goto cleanup;

	fd1 = bpf_program__fd(skel->progs.tc1);
	fd2 = bpf_program__fd(skel->progs.tc2);
	fd3 = bpf_program__fd(skel->progs.tc3);
	fd4 = bpf_program__fd(skel->progs.tc4);

	id1 = id_from_prog_fd(fd1);
	id2 = id_from_prog_fd(fd2);
	id3 = id_from_prog_fd(fd3);
	id4 = id_from_prog_fd(fd4);

	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
	ASSERT_NEQ(id2, id3, "prog_ids_2_3");

	assert_mprog_count(target, 0);

	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup;

	assert_mprog_count(target, 1);

	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup_target;

	assert_mprog_count(target, 2);

	optq.prog_ids = prog_ids;

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target2;

	ASSERT_EQ(optq.count, 2, "count");
	ASSERT_EQ(optq.revision, 3, "revision");
	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");

	ASSERT_OK(system(ping_cmd), ping_cmd);

	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");

	LIBBPF_OPTS_RESET(opta,
		.flags = BPF_F_BEFORE,
		.relative_fd = fd2,
	);

	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup_target2;

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target3;

	ASSERT_EQ(optq.count, 3, "count");
	ASSERT_EQ(optq.revision, 4, "revision");
	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
	ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");

	LIBBPF_OPTS_RESET(opta,
		.flags = BPF_F_BEFORE,
		.relative_id = id1,
	);

	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup_target3;

	assert_mprog_count(target, 4);

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target4;

	ASSERT_EQ(optq.count, 4, "count");
	ASSERT_EQ(optq.revision, 5, "revision");
	ASSERT_EQ(optq.prog_ids[0], id4, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
	ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
	ASSERT_EQ(optq.prog_ids[3], id2, "prog_ids[3]");
	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");

	ASSERT_OK(system(ping_cmd), ping_cmd);

	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");

cleanup_target4:
	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
	ASSERT_OK(err, "prog_detach");
	assert_mprog_count(target, 3);

cleanup_target3:
	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
	ASSERT_OK(err, "prog_detach");
	assert_mprog_count(target, 2);

cleanup_target2:
	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
	ASSERT_OK(err, "prog_detach");
	assert_mprog_count(target, 1);

cleanup_target:
	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
	ASSERT_OK(err, "prog_detach");
	assert_mprog_count(target, 0);

cleanup:
	test_tc_link__destroy(skel);
}

void serial_test_tc_opts_before(void)
{
	test_tc_opts_before_target(BPF_TCX_INGRESS);
	test_tc_opts_before_target(BPF_TCX_EGRESS);
}

static void test_tc_opts_after_target(int target)
{
	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
	LIBBPF_OPTS(bpf_prog_query_opts, optq);
	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
	struct test_tc_link *skel;
	__u32 prog_ids[5];
	int err;

	skel = test_tc_link__open_and_load();
	if (!ASSERT_OK_PTR(skel, "skel_load"))
		goto cleanup;

	fd1 = bpf_program__fd(skel->progs.tc1);
	fd2 = bpf_program__fd(skel->progs.tc2);
	fd3 = bpf_program__fd(skel->progs.tc3);
	fd4 = bpf_program__fd(skel->progs.tc4);

	id1 = id_from_prog_fd(fd1);
	id2 = id_from_prog_fd(fd2);
	id3 = id_from_prog_fd(fd3);
	id4 = id_from_prog_fd(fd4);

	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
	ASSERT_NEQ(id2, id3, "prog_ids_2_3");

	assert_mprog_count(target, 0);

	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup;

	assert_mprog_count(target, 1);

	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup_target;

	assert_mprog_count(target, 2);

	optq.prog_ids = prog_ids;

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target2;

	ASSERT_EQ(optq.count, 2, "count");
	ASSERT_EQ(optq.revision, 3, "revision");
	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");

	ASSERT_OK(system(ping_cmd), ping_cmd);

	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");

	LIBBPF_OPTS_RESET(opta,
		.flags = BPF_F_AFTER,
		.relative_fd = fd1,
	);

	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup_target2;

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target3;

	ASSERT_EQ(optq.count, 3, "count");
	ASSERT_EQ(optq.revision, 4, "revision");
	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
	ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");

	LIBBPF_OPTS_RESET(opta,
		.flags = BPF_F_AFTER,
		.relative_id = id2,
	);

	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup_target3;

	assert_mprog_count(target, 4);

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target4;

	ASSERT_EQ(optq.count, 4, "count");
	ASSERT_EQ(optq.revision, 5, "revision");
	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
	ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");

	ASSERT_OK(system(ping_cmd), ping_cmd);

	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");

cleanup_target4:
	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
	ASSERT_OK(err, "prog_detach");
	assert_mprog_count(target, 3);

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target3;

	ASSERT_EQ(optq.count, 3, "count");
	ASSERT_EQ(optq.revision, 6, "revision");
	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
	ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");

cleanup_target3:
	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
	ASSERT_OK(err, "prog_detach");
	assert_mprog_count(target, 2);

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target2;

	ASSERT_EQ(optq.count, 2, "count");
	ASSERT_EQ(optq.revision, 7, "revision");
	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");

cleanup_target2:
	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
	ASSERT_OK(err, "prog_detach");
	assert_mprog_count(target, 1);

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target;

	ASSERT_EQ(optq.count, 1, "count");
	ASSERT_EQ(optq.revision, 8, "revision");
	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");

cleanup_target:
	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
	ASSERT_OK(err, "prog_detach");
	assert_mprog_count(target, 0);

cleanup:
	test_tc_link__destroy(skel);
}

void serial_test_tc_opts_after(void)
{
	test_tc_opts_after_target(BPF_TCX_INGRESS);
	test_tc_opts_after_target(BPF_TCX_EGRESS);
}

static void test_tc_opts_revision_target(int target)
{
	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
	LIBBPF_OPTS(bpf_prog_query_opts, optq);
	__u32 fd1, fd2, id1, id2;
	struct test_tc_link *skel;
	__u32 prog_ids[3];
	int err;

	skel = test_tc_link__open_and_load();
	if (!ASSERT_OK_PTR(skel, "skel_load"))
		goto cleanup;

	fd1 = bpf_program__fd(skel->progs.tc1);
	fd2 = bpf_program__fd(skel->progs.tc2);

	id1 = id_from_prog_fd(fd1);
	id2 = id_from_prog_fd(fd2);

	ASSERT_NEQ(id1, id2, "prog_ids_1_2");

	assert_mprog_count(target, 0);

	LIBBPF_OPTS_RESET(opta,
		.expected_revision = 1,
	);

	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup;

	assert_mprog_count(target, 1);

	LIBBPF_OPTS_RESET(opta,
		.expected_revision = 1,
	);

	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
	if (!ASSERT_EQ(err, -ESTALE, "prog_attach"))
		goto cleanup_target;

	assert_mprog_count(target, 1);

	LIBBPF_OPTS_RESET(opta,
		.expected_revision = 2,
	);

	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup_target;

	assert_mprog_count(target, 2);

	optq.prog_ids = prog_ids;

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target2;

	ASSERT_EQ(optq.count, 2, "count");
	ASSERT_EQ(optq.revision, 3, "revision");
	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");

	ASSERT_OK(system(ping_cmd), ping_cmd);

	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");

	LIBBPF_OPTS_RESET(optd,
		.expected_revision = 2,
	);

	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
	ASSERT_EQ(err, -ESTALE, "prog_detach");
	assert_mprog_count(target, 2);

cleanup_target2:
	LIBBPF_OPTS_RESET(optd,
		.expected_revision = 3,
	);

	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
	ASSERT_OK(err, "prog_detach");
	assert_mprog_count(target, 1);

cleanup_target:
	LIBBPF_OPTS_RESET(optd);

	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
	ASSERT_OK(err, "prog_detach");
	assert_mprog_count(target, 0);

cleanup:
	test_tc_link__destroy(skel);
}

void serial_test_tc_opts_revision(void)
{
	test_tc_opts_revision_target(BPF_TCX_INGRESS);
	test_tc_opts_revision_target(BPF_TCX_EGRESS);
}

static void test_tc_chain_classic(int target, bool chain_tc_old)
{
	LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1);
	LIBBPF_OPTS(bpf_tc_hook, tc_hook, .ifindex = loopback);
	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
	bool hook_created = false, tc_attached = false;
	__u32 fd1, fd2, fd3, id1, id2, id3;
	struct test_tc_link *skel;
	int err;

	skel = test_tc_link__open_and_load();
	if (!ASSERT_OK_PTR(skel, "skel_load"))
		goto cleanup;

	fd1 = bpf_program__fd(skel->progs.tc1);
	fd2 = bpf_program__fd(skel->progs.tc2);
	fd3 = bpf_program__fd(skel->progs.tc3);

	id1 = id_from_prog_fd(fd1);
	id2 = id_from_prog_fd(fd2);
	id3 = id_from_prog_fd(fd3);

	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
	ASSERT_NEQ(id2, id3, "prog_ids_2_3");

	assert_mprog_count(target, 0);

	if (chain_tc_old) {
		tc_hook.attach_point = target == BPF_TCX_INGRESS ?
				       BPF_TC_INGRESS : BPF_TC_EGRESS;
		err = bpf_tc_hook_create(&tc_hook);
		if (err == 0)
			hook_created = true;
		err = err == -EEXIST ? 0 : err;
		if (!ASSERT_OK(err, "bpf_tc_hook_create"))
			goto cleanup;

		tc_opts.prog_fd = fd3;
		err = bpf_tc_attach(&tc_hook, &tc_opts);
		if (!ASSERT_OK(err, "bpf_tc_attach"))
			goto cleanup;
		tc_attached = true;
	}

	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup;

	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup_detach;

	assert_mprog_count(target, 2);

	ASSERT_OK(system(ping_cmd), ping_cmd);

	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
	ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");

	skel->bss->seen_tc1 = false;
	skel->bss->seen_tc2 = false;
	skel->bss->seen_tc3 = false;

	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
	if (!ASSERT_OK(err, "prog_detach"))
		goto cleanup_detach;

	assert_mprog_count(target, 1);

	ASSERT_OK(system(ping_cmd), ping_cmd);

	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
	ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");

cleanup_detach:
	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
	if (!ASSERT_OK(err, "prog_detach"))
		goto cleanup;

	__assert_mprog_count(target, 0, chain_tc_old, loopback);
cleanup:
	if (tc_attached) {
		tc_opts.flags = tc_opts.prog_fd = tc_opts.prog_id = 0;
		err = bpf_tc_detach(&tc_hook, &tc_opts);
		ASSERT_OK(err, "bpf_tc_detach");
	}
	if (hook_created) {
		tc_hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS;
		bpf_tc_hook_destroy(&tc_hook);
	}
	test_tc_link__destroy(skel);
	assert_mprog_count(target, 0);
}

void serial_test_tc_opts_chain_classic(void)
{
	test_tc_chain_classic(BPF_TCX_INGRESS, false);
	test_tc_chain_classic(BPF_TCX_EGRESS, false);
	test_tc_chain_classic(BPF_TCX_INGRESS, true);
	test_tc_chain_classic(BPF_TCX_EGRESS, true);
}

static void test_tc_opts_replace_target(int target)
{
	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
	LIBBPF_OPTS(bpf_prog_query_opts, optq);
	__u32 fd1, fd2, fd3, id1, id2, id3, detach_fd;
	__u32 prog_ids[4], prog_flags[4];
	struct test_tc_link *skel;
	int err;

	skel = test_tc_link__open_and_load();
	if (!ASSERT_OK_PTR(skel, "skel_load"))
		goto cleanup;

	fd1 = bpf_program__fd(skel->progs.tc1);
	fd2 = bpf_program__fd(skel->progs.tc2);
	fd3 = bpf_program__fd(skel->progs.tc3);

	id1 = id_from_prog_fd(fd1);
	id2 = id_from_prog_fd(fd2);
	id3 = id_from_prog_fd(fd3);

	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
	ASSERT_NEQ(id2, id3, "prog_ids_2_3");

	assert_mprog_count(target, 0);

	LIBBPF_OPTS_RESET(opta,
		.expected_revision = 1,
	);

	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup;

	assert_mprog_count(target, 1);

	LIBBPF_OPTS_RESET(opta,
		.flags = BPF_F_BEFORE,
		.relative_id = id1,
		.expected_revision = 2,
	);

	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup_target;

	detach_fd = fd2;

	assert_mprog_count(target, 2);

	optq.prog_attach_flags = prog_flags;
	optq.prog_ids = prog_ids;

	memset(prog_flags, 0, sizeof(prog_flags));
	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target2;

	ASSERT_EQ(optq.count, 2, "count");
	ASSERT_EQ(optq.revision, 3, "revision");
	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");

	ASSERT_EQ(optq.prog_attach_flags[0], 0, "prog_flags[0]");
	ASSERT_EQ(optq.prog_attach_flags[1], 0, "prog_flags[1]");
	ASSERT_EQ(optq.prog_attach_flags[2], 0, "prog_flags[2]");

	ASSERT_OK(system(ping_cmd), ping_cmd);

	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");

	skel->bss->seen_tc1 = false;
	skel->bss->seen_tc2 = false;
	skel->bss->seen_tc3 = false;

	LIBBPF_OPTS_RESET(opta,
		.flags = BPF_F_REPLACE,
		.replace_prog_fd = fd2,
		.expected_revision = 3,
	);

	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
	if (!ASSERT_EQ(err, 0, "prog_attach"))
		goto cleanup_target2;

	detach_fd = fd3;

	assert_mprog_count(target, 2);

	memset(prog_ids, 0, sizeof(prog_ids));
	optq.count = ARRAY_SIZE(prog_ids);

	err = bpf_prog_query_opts(loopback, target, &optq);
	if (!ASSERT_OK(err, "prog_query"))
		goto cleanup_target2;

	ASSERT_EQ(optq.count, 2, "count");
	ASSERT_EQ(optq.revision, 4, "revision");
	ASSERT_EQ(optq.prog_ids[0], id3, "prog_ids[0]");
	ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");

	ASSERT_OK(system(ping_cmd), ping_cmd);

	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");

	skel->bss->seen_tc1 = false;
	skel->bss->seen_tc2 =
false; 778 + skel->bss->seen_tc3 = false; 779 + 780 + LIBBPF_OPTS_RESET(opta, 781 + .flags = BPF_F_REPLACE | BPF_F_BEFORE, 782 + .replace_prog_fd = fd3, 783 + .relative_fd = fd1, 784 + .expected_revision = 4, 785 + ); 786 + 787 + err = bpf_prog_attach_opts(fd2, loopback, target, &opta); 788 + if (!ASSERT_EQ(err, 0, "prog_attach")) 789 + goto cleanup_target2; 790 + 791 + detach_fd = fd2; 792 + 793 + assert_mprog_count(target, 2); 794 + 795 + memset(prog_ids, 0, sizeof(prog_ids)); 796 + optq.count = ARRAY_SIZE(prog_ids); 797 + 798 + err = bpf_prog_query_opts(loopback, target, &optq); 799 + if (!ASSERT_OK(err, "prog_query")) 800 + goto cleanup_target2; 801 + 802 + ASSERT_EQ(optq.count, 2, "count"); 803 + ASSERT_EQ(optq.revision, 5, "revision"); 804 + ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]"); 805 + ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]"); 806 + ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]"); 807 + 808 + ASSERT_OK(system(ping_cmd), ping_cmd); 809 + 810 + ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1"); 811 + ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2"); 812 + ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3"); 813 + 814 + LIBBPF_OPTS_RESET(opta, 815 + .flags = BPF_F_REPLACE, 816 + .replace_prog_fd = fd2, 817 + ); 818 + 819 + err = bpf_prog_attach_opts(fd2, loopback, target, &opta); 820 + ASSERT_EQ(err, -EEXIST, "prog_attach"); 821 + assert_mprog_count(target, 2); 822 + 823 + LIBBPF_OPTS_RESET(opta, 824 + .flags = BPF_F_REPLACE | BPF_F_AFTER, 825 + .replace_prog_fd = fd2, 826 + .relative_fd = fd1, 827 + .expected_revision = 5, 828 + ); 829 + 830 + err = bpf_prog_attach_opts(fd3, loopback, target, &opta); 831 + ASSERT_EQ(err, -ERANGE, "prog_attach"); 832 + assert_mprog_count(target, 2); 833 + 834 + LIBBPF_OPTS_RESET(opta, 835 + .flags = BPF_F_BEFORE | BPF_F_AFTER | BPF_F_REPLACE, 836 + .replace_prog_fd = fd2, 837 + .relative_fd = fd1, 838 + .expected_revision = 5, 839 + ); 840 + 841 + err = bpf_prog_attach_opts(fd3, loopback, target, &opta); 842 + 
ASSERT_EQ(err, -ERANGE, "prog_attach"); 843 + assert_mprog_count(target, 2); 844 + 845 + LIBBPF_OPTS_RESET(optd, 846 + .flags = BPF_F_BEFORE, 847 + .relative_id = id1, 848 + .expected_revision = 5, 849 + ); 850 + 851 + cleanup_target2: 852 + err = bpf_prog_detach_opts(detach_fd, loopback, target, &optd); 853 + ASSERT_OK(err, "prog_detach"); 854 + assert_mprog_count(target, 1); 855 + 856 + cleanup_target: 857 + LIBBPF_OPTS_RESET(optd); 858 + 859 + err = bpf_prog_detach_opts(fd1, loopback, target, &optd); 860 + ASSERT_OK(err, "prog_detach"); 861 + assert_mprog_count(target, 0); 862 + 863 + cleanup: 864 + test_tc_link__destroy(skel); 865 + } 866 + 867 + void serial_test_tc_opts_replace(void) 868 + { 869 + test_tc_opts_replace_target(BPF_TCX_INGRESS); 870 + test_tc_opts_replace_target(BPF_TCX_EGRESS); 871 + } 872 + 873 + static void test_tc_opts_invalid_target(int target) 874 + { 875 + LIBBPF_OPTS(bpf_prog_attach_opts, opta); 876 + LIBBPF_OPTS(bpf_prog_detach_opts, optd); 877 + __u32 fd1, fd2, id1, id2; 878 + struct test_tc_link *skel; 879 + int err; 880 + 881 + skel = test_tc_link__open_and_load(); 882 + if (!ASSERT_OK_PTR(skel, "skel_load")) 883 + goto cleanup; 884 + 885 + fd1 = bpf_program__fd(skel->progs.tc1); 886 + fd2 = bpf_program__fd(skel->progs.tc2); 887 + 888 + id1 = id_from_prog_fd(fd1); 889 + id2 = id_from_prog_fd(fd2); 890 + 891 + ASSERT_NEQ(id1, id2, "prog_ids_1_2"); 892 + 893 + assert_mprog_count(target, 0); 894 + 895 + LIBBPF_OPTS_RESET(opta, 896 + .flags = BPF_F_BEFORE | BPF_F_AFTER, 897 + ); 898 + 899 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 900 + ASSERT_EQ(err, -ERANGE, "prog_attach"); 901 + assert_mprog_count(target, 0); 902 + 903 + LIBBPF_OPTS_RESET(opta, 904 + .flags = BPF_F_BEFORE | BPF_F_ID, 905 + ); 906 + 907 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 908 + ASSERT_EQ(err, -ENOENT, "prog_attach"); 909 + assert_mprog_count(target, 0); 910 + 911 + LIBBPF_OPTS_RESET(opta, 912 + .flags = BPF_F_AFTER | BPF_F_ID, 
913 + ); 914 + 915 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 916 + ASSERT_EQ(err, -ENOENT, "prog_attach"); 917 + assert_mprog_count(target, 0); 918 + 919 + LIBBPF_OPTS_RESET(opta, 920 + .relative_fd = fd2, 921 + ); 922 + 923 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 924 + ASSERT_EQ(err, -EINVAL, "prog_attach"); 925 + assert_mprog_count(target, 0); 926 + 927 + LIBBPF_OPTS_RESET(opta, 928 + .flags = BPF_F_BEFORE | BPF_F_AFTER, 929 + .relative_fd = fd2, 930 + ); 931 + 932 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 933 + ASSERT_EQ(err, -ENOENT, "prog_attach"); 934 + assert_mprog_count(target, 0); 935 + 936 + LIBBPF_OPTS_RESET(opta, 937 + .flags = BPF_F_ID, 938 + .relative_id = id2, 939 + ); 940 + 941 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 942 + ASSERT_EQ(err, -EINVAL, "prog_attach"); 943 + assert_mprog_count(target, 0); 944 + 945 + LIBBPF_OPTS_RESET(opta, 946 + .flags = BPF_F_BEFORE, 947 + .relative_fd = fd1, 948 + ); 949 + 950 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 951 + ASSERT_EQ(err, -ENOENT, "prog_attach"); 952 + assert_mprog_count(target, 0); 953 + 954 + LIBBPF_OPTS_RESET(opta, 955 + .flags = BPF_F_AFTER, 956 + .relative_fd = fd1, 957 + ); 958 + 959 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 960 + ASSERT_EQ(err, -ENOENT, "prog_attach"); 961 + assert_mprog_count(target, 0); 962 + 963 + LIBBPF_OPTS_RESET(opta); 964 + 965 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 966 + if (!ASSERT_EQ(err, 0, "prog_attach")) 967 + goto cleanup; 968 + 969 + assert_mprog_count(target, 1); 970 + 971 + LIBBPF_OPTS_RESET(opta); 972 + 973 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 974 + ASSERT_EQ(err, -EEXIST, "prog_attach"); 975 + assert_mprog_count(target, 1); 976 + 977 + LIBBPF_OPTS_RESET(opta, 978 + .flags = BPF_F_BEFORE, 979 + .relative_fd = fd1, 980 + ); 981 + 982 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 983 + 
ASSERT_EQ(err, -EEXIST, "prog_attach"); 984 + assert_mprog_count(target, 1); 985 + 986 + LIBBPF_OPTS_RESET(opta, 987 + .flags = BPF_F_AFTER, 988 + .relative_fd = fd1, 989 + ); 990 + 991 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 992 + ASSERT_EQ(err, -EEXIST, "prog_attach"); 993 + assert_mprog_count(target, 1); 994 + 995 + LIBBPF_OPTS_RESET(opta, 996 + .flags = BPF_F_REPLACE, 997 + .relative_fd = fd1, 998 + ); 999 + 1000 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 1001 + ASSERT_EQ(err, -EINVAL, "prog_attach_x1"); 1002 + assert_mprog_count(target, 1); 1003 + 1004 + LIBBPF_OPTS_RESET(opta, 1005 + .flags = BPF_F_REPLACE, 1006 + .replace_prog_fd = fd1, 1007 + ); 1008 + 1009 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 1010 + ASSERT_EQ(err, -EEXIST, "prog_attach"); 1011 + assert_mprog_count(target, 1); 1012 + 1013 + err = bpf_prog_detach_opts(fd1, loopback, target, &optd); 1014 + ASSERT_OK(err, "prog_detach"); 1015 + assert_mprog_count(target, 0); 1016 + cleanup: 1017 + test_tc_link__destroy(skel); 1018 + } 1019 + 1020 + void serial_test_tc_opts_invalid(void) 1021 + { 1022 + test_tc_opts_invalid_target(BPF_TCX_INGRESS); 1023 + test_tc_opts_invalid_target(BPF_TCX_EGRESS); 1024 + } 1025 + 1026 + static void test_tc_opts_prepend_target(int target) 1027 + { 1028 + LIBBPF_OPTS(bpf_prog_attach_opts, opta); 1029 + LIBBPF_OPTS(bpf_prog_detach_opts, optd); 1030 + LIBBPF_OPTS(bpf_prog_query_opts, optq); 1031 + __u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4; 1032 + struct test_tc_link *skel; 1033 + __u32 prog_ids[5]; 1034 + int err; 1035 + 1036 + skel = test_tc_link__open_and_load(); 1037 + if (!ASSERT_OK_PTR(skel, "skel_load")) 1038 + goto cleanup; 1039 + 1040 + fd1 = bpf_program__fd(skel->progs.tc1); 1041 + fd2 = bpf_program__fd(skel->progs.tc2); 1042 + fd3 = bpf_program__fd(skel->progs.tc3); 1043 + fd4 = bpf_program__fd(skel->progs.tc4); 1044 + 1045 + id1 = id_from_prog_fd(fd1); 1046 + id2 = id_from_prog_fd(fd2); 1047 + id3 = 
id_from_prog_fd(fd3); 1048 + id4 = id_from_prog_fd(fd4); 1049 + 1050 + ASSERT_NEQ(id1, id2, "prog_ids_1_2"); 1051 + ASSERT_NEQ(id3, id4, "prog_ids_3_4"); 1052 + ASSERT_NEQ(id2, id3, "prog_ids_2_3"); 1053 + 1054 + assert_mprog_count(target, 0); 1055 + 1056 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 1057 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1058 + goto cleanup; 1059 + 1060 + assert_mprog_count(target, 1); 1061 + 1062 + LIBBPF_OPTS_RESET(opta, 1063 + .flags = BPF_F_BEFORE, 1064 + ); 1065 + 1066 + err = bpf_prog_attach_opts(fd2, loopback, target, &opta); 1067 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1068 + goto cleanup_target; 1069 + 1070 + assert_mprog_count(target, 2); 1071 + 1072 + optq.prog_ids = prog_ids; 1073 + 1074 + memset(prog_ids, 0, sizeof(prog_ids)); 1075 + optq.count = ARRAY_SIZE(prog_ids); 1076 + 1077 + err = bpf_prog_query_opts(loopback, target, &optq); 1078 + if (!ASSERT_OK(err, "prog_query")) 1079 + goto cleanup_target2; 1080 + 1081 + ASSERT_EQ(optq.count, 2, "count"); 1082 + ASSERT_EQ(optq.revision, 3, "revision"); 1083 + ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]"); 1084 + ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]"); 1085 + ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]"); 1086 + 1087 + ASSERT_OK(system(ping_cmd), ping_cmd); 1088 + 1089 + ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1"); 1090 + ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2"); 1091 + ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3"); 1092 + ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4"); 1093 + 1094 + LIBBPF_OPTS_RESET(opta, 1095 + .flags = BPF_F_BEFORE, 1096 + ); 1097 + 1098 + err = bpf_prog_attach_opts(fd3, loopback, target, &opta); 1099 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1100 + goto cleanup_target2; 1101 + 1102 + LIBBPF_OPTS_RESET(opta, 1103 + .flags = BPF_F_BEFORE, 1104 + ); 1105 + 1106 + err = bpf_prog_attach_opts(fd4, loopback, target, &opta); 1107 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1108 + goto cleanup_target3; 1109 + 1110 + 
assert_mprog_count(target, 4); 1111 + 1112 + memset(prog_ids, 0, sizeof(prog_ids)); 1113 + optq.count = ARRAY_SIZE(prog_ids); 1114 + 1115 + err = bpf_prog_query_opts(loopback, target, &optq); 1116 + if (!ASSERT_OK(err, "prog_query")) 1117 + goto cleanup_target4; 1118 + 1119 + ASSERT_EQ(optq.count, 4, "count"); 1120 + ASSERT_EQ(optq.revision, 5, "revision"); 1121 + ASSERT_EQ(optq.prog_ids[0], id4, "prog_ids[0]"); 1122 + ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]"); 1123 + ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]"); 1124 + ASSERT_EQ(optq.prog_ids[3], id1, "prog_ids[3]"); 1125 + ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]"); 1126 + 1127 + ASSERT_OK(system(ping_cmd), ping_cmd); 1128 + 1129 + ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1"); 1130 + ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2"); 1131 + ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3"); 1132 + ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4"); 1133 + 1134 + cleanup_target4: 1135 + err = bpf_prog_detach_opts(fd4, loopback, target, &optd); 1136 + ASSERT_OK(err, "prog_detach"); 1137 + assert_mprog_count(target, 3); 1138 + 1139 + cleanup_target3: 1140 + err = bpf_prog_detach_opts(fd3, loopback, target, &optd); 1141 + ASSERT_OK(err, "prog_detach"); 1142 + assert_mprog_count(target, 2); 1143 + 1144 + cleanup_target2: 1145 + err = bpf_prog_detach_opts(fd2, loopback, target, &optd); 1146 + ASSERT_OK(err, "prog_detach"); 1147 + assert_mprog_count(target, 1); 1148 + 1149 + cleanup_target: 1150 + err = bpf_prog_detach_opts(fd1, loopback, target, &optd); 1151 + ASSERT_OK(err, "prog_detach"); 1152 + assert_mprog_count(target, 0); 1153 + 1154 + cleanup: 1155 + test_tc_link__destroy(skel); 1156 + } 1157 + 1158 + void serial_test_tc_opts_prepend(void) 1159 + { 1160 + test_tc_opts_prepend_target(BPF_TCX_INGRESS); 1161 + test_tc_opts_prepend_target(BPF_TCX_EGRESS); 1162 + } 1163 + 1164 + static void test_tc_opts_append_target(int target) 1165 + { 1166 + LIBBPF_OPTS(bpf_prog_attach_opts, opta); 1167 + 
LIBBPF_OPTS(bpf_prog_detach_opts, optd); 1168 + LIBBPF_OPTS(bpf_prog_query_opts, optq); 1169 + __u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4; 1170 + struct test_tc_link *skel; 1171 + __u32 prog_ids[5]; 1172 + int err; 1173 + 1174 + skel = test_tc_link__open_and_load(); 1175 + if (!ASSERT_OK_PTR(skel, "skel_load")) 1176 + goto cleanup; 1177 + 1178 + fd1 = bpf_program__fd(skel->progs.tc1); 1179 + fd2 = bpf_program__fd(skel->progs.tc2); 1180 + fd3 = bpf_program__fd(skel->progs.tc3); 1181 + fd4 = bpf_program__fd(skel->progs.tc4); 1182 + 1183 + id1 = id_from_prog_fd(fd1); 1184 + id2 = id_from_prog_fd(fd2); 1185 + id3 = id_from_prog_fd(fd3); 1186 + id4 = id_from_prog_fd(fd4); 1187 + 1188 + ASSERT_NEQ(id1, id2, "prog_ids_1_2"); 1189 + ASSERT_NEQ(id3, id4, "prog_ids_3_4"); 1190 + ASSERT_NEQ(id2, id3, "prog_ids_2_3"); 1191 + 1192 + assert_mprog_count(target, 0); 1193 + 1194 + err = bpf_prog_attach_opts(fd1, loopback, target, &opta); 1195 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1196 + goto cleanup; 1197 + 1198 + assert_mprog_count(target, 1); 1199 + 1200 + LIBBPF_OPTS_RESET(opta, 1201 + .flags = BPF_F_AFTER, 1202 + ); 1203 + 1204 + err = bpf_prog_attach_opts(fd2, loopback, target, &opta); 1205 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1206 + goto cleanup_target; 1207 + 1208 + assert_mprog_count(target, 2); 1209 + 1210 + optq.prog_ids = prog_ids; 1211 + 1212 + memset(prog_ids, 0, sizeof(prog_ids)); 1213 + optq.count = ARRAY_SIZE(prog_ids); 1214 + 1215 + err = bpf_prog_query_opts(loopback, target, &optq); 1216 + if (!ASSERT_OK(err, "prog_query")) 1217 + goto cleanup_target2; 1218 + 1219 + ASSERT_EQ(optq.count, 2, "count"); 1220 + ASSERT_EQ(optq.revision, 3, "revision"); 1221 + ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]"); 1222 + ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]"); 1223 + ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]"); 1224 + 1225 + ASSERT_OK(system(ping_cmd), ping_cmd); 1226 + 1227 + ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1"); 1228 + 
ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2"); 1229 + ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3"); 1230 + ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4"); 1231 + 1232 + LIBBPF_OPTS_RESET(opta, 1233 + .flags = BPF_F_AFTER, 1234 + ); 1235 + 1236 + err = bpf_prog_attach_opts(fd3, loopback, target, &opta); 1237 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1238 + goto cleanup_target2; 1239 + 1240 + LIBBPF_OPTS_RESET(opta, 1241 + .flags = BPF_F_AFTER, 1242 + ); 1243 + 1244 + err = bpf_prog_attach_opts(fd4, loopback, target, &opta); 1245 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1246 + goto cleanup_target3; 1247 + 1248 + assert_mprog_count(target, 4); 1249 + 1250 + memset(prog_ids, 0, sizeof(prog_ids)); 1251 + optq.count = ARRAY_SIZE(prog_ids); 1252 + 1253 + err = bpf_prog_query_opts(loopback, target, &optq); 1254 + if (!ASSERT_OK(err, "prog_query")) 1255 + goto cleanup_target4; 1256 + 1257 + ASSERT_EQ(optq.count, 4, "count"); 1258 + ASSERT_EQ(optq.revision, 5, "revision"); 1259 + ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]"); 1260 + ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]"); 1261 + ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]"); 1262 + ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]"); 1263 + ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]"); 1264 + 1265 + ASSERT_OK(system(ping_cmd), ping_cmd); 1266 + 1267 + ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1"); 1268 + ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2"); 1269 + ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3"); 1270 + ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4"); 1271 + 1272 + cleanup_target4: 1273 + err = bpf_prog_detach_opts(fd4, loopback, target, &optd); 1274 + ASSERT_OK(err, "prog_detach"); 1275 + assert_mprog_count(target, 3); 1276 + 1277 + cleanup_target3: 1278 + err = bpf_prog_detach_opts(fd3, loopback, target, &optd); 1279 + ASSERT_OK(err, "prog_detach"); 1280 + assert_mprog_count(target, 2); 1281 + 1282 + cleanup_target2: 1283 + err = bpf_prog_detach_opts(fd2, loopback, target, &optd); 1284 
+ ASSERT_OK(err, "prog_detach"); 1285 + assert_mprog_count(target, 1); 1286 + 1287 + cleanup_target: 1288 + err = bpf_prog_detach_opts(fd1, loopback, target, &optd); 1289 + ASSERT_OK(err, "prog_detach"); 1290 + assert_mprog_count(target, 0); 1291 + 1292 + cleanup: 1293 + test_tc_link__destroy(skel); 1294 + } 1295 + 1296 + void serial_test_tc_opts_append(void) 1297 + { 1298 + test_tc_opts_append_target(BPF_TCX_INGRESS); 1299 + test_tc_opts_append_target(BPF_TCX_EGRESS); 1300 + } 1301 + 1302 + static void test_tc_opts_dev_cleanup_target(int target) 1303 + { 1304 + LIBBPF_OPTS(bpf_prog_attach_opts, opta); 1305 + LIBBPF_OPTS(bpf_prog_detach_opts, optd); 1306 + LIBBPF_OPTS(bpf_prog_query_opts, optq); 1307 + __u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4; 1308 + struct test_tc_link *skel; 1309 + int err, ifindex; 1310 + 1311 + ASSERT_OK(system("ip link add dev tcx_opts1 type veth peer name tcx_opts2"), "add veth"); 1312 + ifindex = if_nametoindex("tcx_opts1"); 1313 + ASSERT_NEQ(ifindex, 0, "non_zero_ifindex"); 1314 + 1315 + skel = test_tc_link__open_and_load(); 1316 + if (!ASSERT_OK_PTR(skel, "skel_load")) 1317 + goto cleanup; 1318 + 1319 + fd1 = bpf_program__fd(skel->progs.tc1); 1320 + fd2 = bpf_program__fd(skel->progs.tc2); 1321 + fd3 = bpf_program__fd(skel->progs.tc3); 1322 + fd4 = bpf_program__fd(skel->progs.tc4); 1323 + 1324 + id1 = id_from_prog_fd(fd1); 1325 + id2 = id_from_prog_fd(fd2); 1326 + id3 = id_from_prog_fd(fd3); 1327 + id4 = id_from_prog_fd(fd4); 1328 + 1329 + ASSERT_NEQ(id1, id2, "prog_ids_1_2"); 1330 + ASSERT_NEQ(id3, id4, "prog_ids_3_4"); 1331 + ASSERT_NEQ(id2, id3, "prog_ids_2_3"); 1332 + 1333 + assert_mprog_count_ifindex(ifindex, target, 0); 1334 + 1335 + err = bpf_prog_attach_opts(fd1, ifindex, target, &opta); 1336 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1337 + goto cleanup; 1338 + 1339 + assert_mprog_count_ifindex(ifindex, target, 1); 1340 + 1341 + err = bpf_prog_attach_opts(fd2, ifindex, target, &opta); 1342 + if (!ASSERT_EQ(err, 0, "prog_attach")) 
1343 + goto cleanup1; 1344 + 1345 + assert_mprog_count_ifindex(ifindex, target, 2); 1346 + 1347 + err = bpf_prog_attach_opts(fd3, ifindex, target, &opta); 1348 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1349 + goto cleanup2; 1350 + 1351 + assert_mprog_count_ifindex(ifindex, target, 3); 1352 + 1353 + err = bpf_prog_attach_opts(fd4, ifindex, target, &opta); 1354 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1355 + goto cleanup3; 1356 + 1357 + assert_mprog_count_ifindex(ifindex, target, 4); 1358 + 1359 + ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth"); 1360 + ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed"); 1361 + ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed"); 1362 + return; 1363 + cleanup3: 1364 + err = bpf_prog_detach_opts(fd3, loopback, target, &optd); 1365 + ASSERT_OK(err, "prog_detach"); 1366 + 1367 + assert_mprog_count_ifindex(ifindex, target, 2); 1368 + cleanup2: 1369 + err = bpf_prog_detach_opts(fd2, loopback, target, &optd); 1370 + ASSERT_OK(err, "prog_detach"); 1371 + 1372 + assert_mprog_count_ifindex(ifindex, target, 1); 1373 + cleanup1: 1374 + err = bpf_prog_detach_opts(fd1, loopback, target, &optd); 1375 + ASSERT_OK(err, "prog_detach"); 1376 + 1377 + assert_mprog_count_ifindex(ifindex, target, 0); 1378 + cleanup: 1379 + test_tc_link__destroy(skel); 1380 + 1381 + ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth"); 1382 + ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed"); 1383 + ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed"); 1384 + } 1385 + 1386 + void serial_test_tc_opts_dev_cleanup(void) 1387 + { 1388 + test_tc_opts_dev_cleanup_target(BPF_TCX_INGRESS); 1389 + test_tc_opts_dev_cleanup_target(BPF_TCX_EGRESS); 1390 + } 1391 + 1392 + static void test_tc_opts_mixed_target(int target) 1393 + { 1394 + LIBBPF_OPTS(bpf_prog_attach_opts, opta); 1395 + LIBBPF_OPTS(bpf_prog_detach_opts, optd); 1396 + LIBBPF_OPTS(bpf_prog_query_opts, optq); 1397 + LIBBPF_OPTS(bpf_tcx_opts, optl); 1398 + __u32 pid1, pid2, pid3, 
pid4, lid2, lid4; 1399 + __u32 prog_flags[4], link_flags[4]; 1400 + __u32 prog_ids[4], link_ids[4]; 1401 + struct test_tc_link *skel; 1402 + struct bpf_link *link; 1403 + int err, detach_fd; 1404 + 1405 + skel = test_tc_link__open(); 1406 + if (!ASSERT_OK_PTR(skel, "skel_open")) 1407 + goto cleanup; 1408 + 1409 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target), 1410 + 0, "tc1_attach_type"); 1411 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target), 1412 + 0, "tc2_attach_type"); 1413 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target), 1414 + 0, "tc3_attach_type"); 1415 + ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target), 1416 + 0, "tc4_attach_type"); 1417 + 1418 + err = test_tc_link__load(skel); 1419 + if (!ASSERT_OK(err, "skel_load")) 1420 + goto cleanup; 1421 + 1422 + pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1)); 1423 + pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2)); 1424 + pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3)); 1425 + pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4)); 1426 + 1427 + ASSERT_NEQ(pid1, pid2, "prog_ids_1_2"); 1428 + ASSERT_NEQ(pid3, pid4, "prog_ids_3_4"); 1429 + ASSERT_NEQ(pid2, pid3, "prog_ids_2_3"); 1430 + 1431 + assert_mprog_count(target, 0); 1432 + 1433 + err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc1), 1434 + loopback, target, &opta); 1435 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1436 + goto cleanup; 1437 + 1438 + detach_fd = bpf_program__fd(skel->progs.tc1); 1439 + 1440 + assert_mprog_count(target, 1); 1441 + 1442 + link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl); 1443 + if (!ASSERT_OK_PTR(link, "link_attach")) 1444 + goto cleanup1; 1445 + skel->links.tc2 = link; 1446 + 1447 + lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2)); 1448 + 1449 + assert_mprog_count(target, 2); 1450 + 1451 + LIBBPF_OPTS_RESET(opta, 1452 + .flags = BPF_F_REPLACE, 1453 + .replace_prog_fd = 
bpf_program__fd(skel->progs.tc1), 1454 + ); 1455 + 1456 + err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc2), 1457 + loopback, target, &opta); 1458 + ASSERT_EQ(err, -EEXIST, "prog_attach"); 1459 + 1460 + assert_mprog_count(target, 2); 1461 + 1462 + LIBBPF_OPTS_RESET(opta, 1463 + .flags = BPF_F_REPLACE, 1464 + .replace_prog_fd = bpf_program__fd(skel->progs.tc2), 1465 + ); 1466 + 1467 + err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc1), 1468 + loopback, target, &opta); 1469 + ASSERT_EQ(err, -EEXIST, "prog_attach"); 1470 + 1471 + assert_mprog_count(target, 2); 1472 + 1473 + LIBBPF_OPTS_RESET(opta, 1474 + .flags = BPF_F_REPLACE, 1475 + .replace_prog_fd = bpf_program__fd(skel->progs.tc2), 1476 + ); 1477 + 1478 + err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc3), 1479 + loopback, target, &opta); 1480 + ASSERT_EQ(err, -EBUSY, "prog_attach"); 1481 + 1482 + assert_mprog_count(target, 2); 1483 + 1484 + LIBBPF_OPTS_RESET(opta, 1485 + .flags = BPF_F_REPLACE, 1486 + .replace_prog_fd = bpf_program__fd(skel->progs.tc1), 1487 + ); 1488 + 1489 + err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc3), 1490 + loopback, target, &opta); 1491 + if (!ASSERT_EQ(err, 0, "prog_attach")) 1492 + goto cleanup1; 1493 + 1494 + detach_fd = bpf_program__fd(skel->progs.tc3); 1495 + 1496 + assert_mprog_count(target, 2); 1497 + 1498 + link = bpf_program__attach_tcx(skel->progs.tc4, loopback, &optl); 1499 + if (!ASSERT_OK_PTR(link, "link_attach")) 1500 + goto cleanup1; 1501 + skel->links.tc4 = link; 1502 + 1503 + lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4)); 1504 + 1505 + assert_mprog_count(target, 3); 1506 + 1507 + LIBBPF_OPTS_RESET(opta, 1508 + .flags = BPF_F_REPLACE, 1509 + .replace_prog_fd = bpf_program__fd(skel->progs.tc4), 1510 + ); 1511 + 1512 + err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc2), 1513 + loopback, target, &opta); 1514 + ASSERT_EQ(err, -EEXIST, "prog_attach"); 1515 + 1516 + optq.prog_ids = prog_ids; 1517 + optq.prog_attach_flags 
= prog_flags; 1518 + optq.link_ids = link_ids; 1519 + optq.link_attach_flags = link_flags; 1520 + 1521 + memset(prog_ids, 0, sizeof(prog_ids)); 1522 + memset(prog_flags, 0, sizeof(prog_flags)); 1523 + memset(link_ids, 0, sizeof(link_ids)); 1524 + memset(link_flags, 0, sizeof(link_flags)); 1525 + optq.count = ARRAY_SIZE(prog_ids); 1526 + 1527 + err = bpf_prog_query_opts(loopback, target, &optq); 1528 + if (!ASSERT_OK(err, "prog_query")) 1529 + goto cleanup1; 1530 + 1531 + ASSERT_EQ(optq.count, 3, "count"); 1532 + ASSERT_EQ(optq.revision, 5, "revision"); 1533 + ASSERT_EQ(optq.prog_ids[0], pid3, "prog_ids[0]"); 1534 + ASSERT_EQ(optq.prog_attach_flags[0], 0, "prog_flags[0]"); 1535 + ASSERT_EQ(optq.link_ids[0], 0, "link_ids[0]"); 1536 + ASSERT_EQ(optq.link_attach_flags[0], 0, "link_flags[0]"); 1537 + ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]"); 1538 + ASSERT_EQ(optq.prog_attach_flags[1], 0, "prog_flags[1]"); 1539 + ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]"); 1540 + ASSERT_EQ(optq.link_attach_flags[1], 0, "link_flags[1]"); 1541 + ASSERT_EQ(optq.prog_ids[2], pid4, "prog_ids[2]"); 1542 + ASSERT_EQ(optq.prog_attach_flags[2], 0, "prog_flags[2]"); 1543 + ASSERT_EQ(optq.link_ids[2], lid4, "link_ids[2]"); 1544 + ASSERT_EQ(optq.link_attach_flags[2], 0, "link_flags[2]"); 1545 + ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]"); 1546 + ASSERT_EQ(optq.prog_attach_flags[3], 0, "prog_flags[3]"); 1547 + ASSERT_EQ(optq.link_ids[3], 0, "link_ids[3]"); 1548 + ASSERT_EQ(optq.link_attach_flags[3], 0, "link_flags[3]"); 1549 + 1550 + ASSERT_OK(system(ping_cmd), ping_cmd); 1551 + 1552 + cleanup1: 1553 + err = bpf_prog_detach_opts(detach_fd, loopback, target, &optd); 1554 + ASSERT_OK(err, "prog_detach"); 1555 + assert_mprog_count(target, 2); 1556 + 1557 + cleanup: 1558 + test_tc_link__destroy(skel); 1559 + assert_mprog_count(target, 0); 1560 + } 1561 + 1562 + void serial_test_tc_opts_mixed(void) 1563 + { 1564 + test_tc_opts_mixed_target(BPF_TCX_INGRESS); 1565 + 
+	test_tc_opts_mixed_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_demixed_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_tcx_opts, optl);
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	__u32 pid1, pid2;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc1),
+				   loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup1;
+	skel->links.tc2 = link;
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_AFTER,
+	);
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_EQ(err, -EBUSY, "prog_detach");
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_BEFORE,
+	);
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 1);
+	goto cleanup;
+
+cleanup1:
+	err = bpf_prog_detach_opts(bpf_program__fd(skel->progs.tc1),
+				   loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_opts_demixed(void)
+{
+	test_tc_opts_demixed_target(BPF_TCX_INGRESS);
+	test_tc_opts_demixed_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_detach_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct test_tc_link *skel;
+	__u32 prog_ids[5];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+	fd4 = bpf_program__fd(skel->progs.tc4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup1;
+
+	assert_mprog_count(target, 2);
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(target, 3);
+
+	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup3;
+
+	assert_mprog_count(target, 4);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_BEFORE,
+	);
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 3);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 3, "count");
+	ASSERT_EQ(optq.revision, 6, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id4, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_AFTER,
+	);
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 2);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 7, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	LIBBPF_OPTS_RESET(optd);
+
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_BEFORE,
+	);
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_EQ(err, -ENOENT, "prog_detach");
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_AFTER,
+	);
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_EQ(err, -ENOENT, "prog_detach");
+	goto cleanup;
+
+cleanup4:
+	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 3);
+
+cleanup3:
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup2:
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+cleanup1:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_detach(void)
+{
+	test_tc_opts_detach_target(BPF_TCX_INGRESS);
+	test_tc_opts_detach_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_detach_before_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct test_tc_link *skel;
+	__u32 prog_ids[5];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+	fd4 = bpf_program__fd(skel->progs.tc4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup1;
+
+	assert_mprog_count(target, 2);
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(target, 3);
+
+	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup3;
+
+	assert_mprog_count(target, 4);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_BEFORE,
+		.relative_fd = fd2,
+	);
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 3);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 3, "count");
+	ASSERT_EQ(optq.revision, 6, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id4, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_BEFORE,
+		.relative_fd = fd2,
+	);
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_EQ(err, -ENOENT, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_BEFORE,
+		.relative_fd = fd4,
+	);
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_EQ(err, -ERANGE, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_BEFORE,
+		.relative_fd = fd1,
+	);
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_EQ(err, -ENOENT, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_BEFORE,
+		.relative_fd = fd3,
+	);
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 2);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 7, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id3, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id4, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_BEFORE,
+		.relative_fd = fd4,
+	);
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 1);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 8, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id4, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_BEFORE,
+	);
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 0);
+	goto cleanup;
+
+cleanup4:
+	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 3);
+
+cleanup3:
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup2:
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+cleanup1:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_detach_before(void)
+{
+	test_tc_opts_detach_before_target(BPF_TCX_INGRESS);
+	test_tc_opts_detach_before_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_detach_after_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct test_tc_link *skel;
+	__u32 prog_ids[5];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+	fd4 = bpf_program__fd(skel->progs.tc4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup1;
+
+	assert_mprog_count(target, 2);
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(target, 3);
+
+	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup3;
+
+	assert_mprog_count(target, 4);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_AFTER,
+		.relative_fd = fd1,
+	);
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 3);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 3, "count");
+	ASSERT_EQ(optq.revision, 6, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id4, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_AFTER,
+		.relative_fd = fd1,
+	);
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_EQ(err, -ENOENT, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_AFTER,
+		.relative_fd = fd4,
+	);
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_EQ(err, -ERANGE, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_AFTER,
+		.relative_fd = fd3,
+	);
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_EQ(err, -ERANGE, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_AFTER,
+		.relative_fd = fd1,
+	);
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_EQ(err, -ERANGE, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_AFTER,
+		.relative_fd = fd1,
+	);
+
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 2);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 7, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id4, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_AFTER,
+		.relative_fd = fd1,
+	);
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 1);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 8, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+	LIBBPF_OPTS_RESET(optd,
+		.flags = BPF_F_AFTER,
+	);
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 0);
+	goto cleanup;
+
+cleanup4:
+	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 3);
+
+cleanup3:
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup2:
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+cleanup1:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_detach_after(void)
+{
+	test_tc_opts_detach_after_target(BPF_TCX_INGRESS);
+	test_tc_opts_detach_after_target(BPF_TCX_EGRESS);
+}
tools/testing/selftests/bpf/progs/map_ptr_kern.c (+5)
···
 	__type(value, __u32);
 } m_hash SEC(".maps");
 
+__s64 bpf_map_sum_elem_count(struct bpf_map *map) __ksym;
+
 static inline int check_hash(void)
 {
 	struct bpf_htab *hash = (struct bpf_htab *)&m_hash;
···
 	VERIFY(hash->elem_size == 64);
 
 	VERIFY(hash->count.counter == 0);
+	VERIFY(bpf_map_sum_elem_count(map) == 0);
+
 	for (i = 0; i < HALF_ENTRIES; ++i) {
 		const __u32 key = i;
 		const __u32 val = 1;
···
 		return 0;
 	}
 	VERIFY(hash->count.counter == HALF_ENTRIES);
+	VERIFY(bpf_map_sum_elem_count(map) == HALF_ENTRIES);
 
 	return 1;
 }
tools/testing/selftests/bpf/progs/refcounted_kptr.c (+93 -1)
···
 	__uint(type, BPF_MAP_TYPE_ARRAY);
 	__type(key, int);
 	__type(value, struct map_value);
-	__uint(max_entries, 1);
+	__uint(max_entries, 2);
 } stashed_nodes SEC(".maps");
 
 struct node_acquire {
···
 
 private(B) struct bpf_spin_lock alock;
 private(B) struct bpf_rb_root aroot __contains(node_acquire, node);
+
+private(C) struct bpf_spin_lock block;
+private(C) struct bpf_rb_root broot __contains(node_data, r);
 
 static bool less(struct bpf_rb_node *node_a, const struct bpf_rb_node *node_b)
 {
···
 
 	bpf_obj_drop(m);
 
+	return 0;
+}
+
+static long __stash_map_empty_xchg(struct node_data *n, int idx)
+{
+	struct map_value *mapval = bpf_map_lookup_elem(&stashed_nodes, &idx);
+
+	if (!mapval) {
+		bpf_obj_drop(n);
+		return 1;
+	}
+	n = bpf_kptr_xchg(&mapval->node, n);
+	if (n) {
+		bpf_obj_drop(n);
+		return 2;
+	}
+	return 0;
+}
+
+SEC("tc")
+long rbtree_wrong_owner_remove_fail_a1(void *ctx)
+{
+	struct node_data *n, *m;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 1;
+	m = bpf_refcount_acquire(n);
+
+	if (__stash_map_empty_xchg(n, 0)) {
+		bpf_obj_drop(m);
+		return 2;
+	}
+
+	if (__stash_map_empty_xchg(m, 1))
+		return 3;
+
+	return 0;
+}
+
+SEC("tc")
+long rbtree_wrong_owner_remove_fail_b(void *ctx)
+{
+	struct map_value *mapval;
+	struct node_data *n;
+	int idx = 0;
+
+	mapval = bpf_map_lookup_elem(&stashed_nodes, &idx);
+	if (!mapval)
+		return 1;
+
+	n = bpf_kptr_xchg(&mapval->node, NULL);
+	if (!n)
+		return 2;
+
+	bpf_spin_lock(&block);
+
+	bpf_rbtree_add(&broot, &n->r, less);
+
+	bpf_spin_unlock(&block);
+	return 0;
+}
+
+SEC("tc")
+long rbtree_wrong_owner_remove_fail_a2(void *ctx)
+{
+	struct map_value *mapval;
+	struct bpf_rb_node *res;
+	struct node_data *m;
+	int idx = 1;
+
+	mapval = bpf_map_lookup_elem(&stashed_nodes, &idx);
+	if (!mapval)
+		return 1;
+
+	m = bpf_kptr_xchg(&mapval->node, NULL);
+	if (!m)
+		return 2;
+	bpf_spin_lock(&lock);
+
+	/* make m non-owning ref */
+	bpf_list_push_back(&head, &m->l);
+	res = bpf_rbtree_remove(&root, &m->r);
+
+	bpf_spin_unlock(&lock);
+	if (res) {
+		bpf_obj_drop(container_of(res, struct node_data, r));
+		return 3;
+	}
 	return 0;
 }
 
tools/testing/selftests/bpf/progs/test_tc_link.c (+40, new file)
···
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+#include <stdbool.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+char LICENSE[] SEC("license") = "GPL";
+
+bool seen_tc1;
+bool seen_tc2;
+bool seen_tc3;
+bool seen_tc4;
+
+SEC("tc/ingress")
+int tc1(struct __sk_buff *skb)
+{
+	seen_tc1 = true;
+	return TCX_NEXT;
+}
+
+SEC("tc/egress")
+int tc2(struct __sk_buff *skb)
+{
+	seen_tc2 = true;
+	return TCX_NEXT;
+}
+
+SEC("tc/egress")
+int tc3(struct __sk_buff *skb)
+{
+	seen_tc3 = true;
+	return TCX_NEXT;
+}
+
+SEC("tc/egress")
+int tc4(struct __sk_buff *skb)
+{
+	seen_tc4 = true;
+	return TCX_NEXT;
+}
tools/testing/selftests/bpf/progs/xsk_xdp_progs.c (+3 -3)
···
 static unsigned int idx;
 int count = 0;
 
-SEC("xdp") int xsk_def_prog(struct xdp_md *xdp)
+SEC("xdp.frags") int xsk_def_prog(struct xdp_md *xdp)
 {
 	return bpf_redirect_map(&xsk, 0, XDP_DROP);
 }
 
-SEC("xdp") int xsk_xdp_drop(struct xdp_md *xdp)
+SEC("xdp.frags") int xsk_xdp_drop(struct xdp_md *xdp)
 {
 	/* Drop every other packet */
 	if (idx++ % 2)
···
 	return bpf_redirect_map(&xsk, 0, XDP_DROP);
 }
 
-SEC("xdp") int xsk_xdp_populate_metadata(struct xdp_md *xdp)
+SEC("xdp.frags") int xsk_xdp_populate_metadata(struct xdp_md *xdp)
 {
 	void *data, *data_meta;
 	struct xdp_info *meta;
tools/testing/selftests/bpf/test_xsk.sh (+5)
···
 
 if [ -z $ETH ]; then
 	cleanup_exit ${VETH0} ${VETH1}
+else
+	cleanup_iface ${ETH} ${MTU}
 fi
+
 TEST_NAME="XSK_SELFTESTS_${VETH0}_BUSY_POLL"
 busy_poll=1
 
···
 
 if [ -z $ETH ]; then
 	cleanup_exit ${VETH0} ${VETH1}
+else
+	cleanup_iface ${ETH} ${MTU}
 fi
 
 failures=0
tools/testing/selftests/bpf/xsk.c (+135 -1)
···
 #include <linux/ethtool.h>
 #include <linux/filter.h>
 #include <linux/if_ether.h>
+#include <linux/if_link.h>
 #include <linux/if_packet.h>
 #include <linux/if_xdp.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
+#include <linux/netlink.h>
+#include <linux/rtnetlink.h>
 #include <linux/sockios.h>
 #include <net/if.h>
 #include <sys/ioctl.h>
 #include <sys/mman.h>
 #include <sys/socket.h>
 #include <sys/types.h>
-#include <linux/if_link.h>
 
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
···
 	struct xsk_ctx *ctx;
 	struct xsk_socket_config config;
 	int fd;
+};
+
+struct nl_mtu_req {
+	struct nlmsghdr nh;
+	struct ifinfomsg msg;
+	char buf[512];
 };
 
 int xsk_umem__fd(const struct xsk_umem *umem)
···
 		return opts.attach_mode == XDP_ATTACHED_SKB;
 
 	return false;
+}
+
+/* Lifted from netlink.c in tools/lib/bpf */
+static int netlink_recvmsg(int sock, struct msghdr *mhdr, int flags)
+{
+	int len;
+
+	do {
+		len = recvmsg(sock, mhdr, flags);
+	} while (len < 0 && (errno == EINTR || errno == EAGAIN));
+
+	if (len < 0)
+		return -errno;
+	return len;
+}
+
+/* Lifted from netlink.c in tools/lib/bpf */
+static int alloc_iov(struct iovec *iov, int len)
+{
+	void *nbuf;
+
+	nbuf = realloc(iov->iov_base, len);
+	if (!nbuf)
+		return -ENOMEM;
+
+	iov->iov_base = nbuf;
+	iov->iov_len = len;
+	return 0;
+}
+
+/* Original version lifted from netlink.c in tools/lib/bpf */
+static int netlink_recv(int sock)
+{
+	struct iovec iov = {};
+	struct msghdr mhdr = {
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
+	};
+	bool multipart = true;
+	struct nlmsgerr *err;
+	struct nlmsghdr *nh;
+	int len, ret;
+
+	ret = alloc_iov(&iov, 4096);
+	if (ret)
+		goto done;
+
+	while (multipart) {
+		multipart = false;
+		len = netlink_recvmsg(sock, &mhdr, MSG_PEEK | MSG_TRUNC);
+		if (len < 0) {
+			ret = len;
+			goto done;
+		}
+
+		if (len > iov.iov_len) {
+			ret = alloc_iov(&iov, len);
+			if (ret)
+				goto done;
+		}
+
+		len = netlink_recvmsg(sock, &mhdr, 0);
+		if (len < 0) {
+			ret = len;
+			goto done;
+		}
+
+		if (len == 0)
+			break;
+
+		for (nh = (struct nlmsghdr *)iov.iov_base; NLMSG_OK(nh, len);
+		     nh = NLMSG_NEXT(nh, len)) {
+			if (nh->nlmsg_flags & NLM_F_MULTI)
+				multipart = true;
+			switch (nh->nlmsg_type) {
+			case NLMSG_ERROR:
+				err = (struct nlmsgerr *)NLMSG_DATA(nh);
+				if (!err->error)
+					continue;
+				ret = err->error;
+				goto done;
+			case NLMSG_DONE:
+				ret = 0;
+				goto done;
+			default:
+				break;
+			}
+		}
+	}
+	ret = 0;
+done:
+	free(iov.iov_base);
+	return ret;
+}
+
+int xsk_set_mtu(int ifindex, int mtu)
+{
+	struct nl_mtu_req req;
+	struct rtattr *rta;
+	int fd, ret;
+
+	fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE);
+	if (fd < 0)
+		return fd;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_type = RTM_NEWLINK;
+	req.msg.ifi_family = AF_UNSPEC;
+	req.msg.ifi_index = ifindex;
+	rta = (struct rtattr *)(((char *)&req) + NLMSG_ALIGN(req.nh.nlmsg_len));
+	rta->rta_type = IFLA_MTU;
+	rta->rta_len = RTA_LENGTH(sizeof(unsigned int));
+	req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + RTA_LENGTH(sizeof(mtu));
+	memcpy(RTA_DATA(rta), &mtu, sizeof(mtu));
+
+	ret = send(fd, &req, req.nh.nlmsg_len, 0);
+	if (ret < 0) {
+		close(fd);
+		return errno;
+	}
+
+	ret = netlink_recv(fd);
+	close(fd);
+	return ret;
 }
 
 int xsk_attach_xdp_program(struct bpf_program *prog, int ifindex, u32 xdp_flags)
tools/testing/selftests/bpf/xsk.h (+2)
···
 int xsk_umem__delete(struct xsk_umem *umem);
 void xsk_socket__delete(struct xsk_socket *xsk);
 
+int xsk_set_mtu(int ifindex, int mtu);
+
 #ifdef __cplusplus
 }	/* extern "C" */
 #endif
tools/testing/selftests/bpf/xsk_prereqs.sh (+7)
···
 	exit 1
 }
 
+cleanup_iface()
+{
+	ip link set $1 mtu $2
+	ip link set $1 xdp off
+	ip link set $1 xdpgeneric off
+}
+
 clear_configs()
 {
 	[ $(ip link show $1 &>/dev/null; echo $?;) == 0 ] &&
tools/testing/selftests/bpf/xskxceiver.c (+404 -62)
···
 * h. tests for invalid and corner case Tx descriptors so that the correct ones
 *    are discarded and let through, respectively.
 * i. 2K frame size tests
- *
- * Total tests: 12
+ * j. If multi-buffer is supported, send 9k packets divided into 3 frames
+ * k. If multi-buffer and huge pages are supported, send 9k packets in a single frame
+ *    using unaligned mode
+ * l. If multi-buffer is supported, try various nasty combinations of descriptors to
+ *    check if they pass the validation or not
 *
 * Flow:
 * -----
···
 #include <fcntl.h>
 #include <errno.h>
 #include <getopt.h>
-#include <asm/barrier.h>
 #include <linux/if_link.h>
 #include <linux/if_ether.h>
 #include <linux/mman.h>
+#include <linux/netdev.h>
 #include <arpa/inet.h>
 #include <net/if.h>
 #include <locale.h>
···
 #include <sys/socket.h>
 #include <sys/time.h>
 #include <sys/types.h>
-#include <time.h>
 #include <unistd.h>
 
 #include "xsk_xdp_progs.skel.h"
···
 	cfg.bind_flags = ifobject->bind_flags;
 	if (shared)
 		cfg.bind_flags |= XDP_SHARED_UMEM;
+	if (ifobject->pkt_stream && ifobject->mtu > MAX_ETH_PKT_SIZE)
+		cfg.bind_flags |= XDP_USE_SG;
 
 	txr = ifobject->tx_on ? &xsk->tx : NULL;
 	rxr = ifobject->rx_on ? &xsk->rx : NULL;
···
 	test->total_steps = 1;
 	test->nb_sockets = 1;
 	test->fail = false;
+	test->mtu = MAX_ETH_PKT_SIZE;
 	test->xdp_prog_rx = ifobj_rx->xdp_progs->progs.xsk_def_prog;
 	test->xskmap_rx = ifobj_rx->xdp_progs->maps.xsk;
 	test->xdp_prog_tx = ifobj_tx->xdp_progs->progs.xsk_def_prog;
···
 	test->xdp_prog_tx = xdp_prog_tx;
 	test->xskmap_rx = xskmap_rx;
 	test->xskmap_tx = xskmap_tx;
+}
+
+static int test_spec_set_mtu(struct test_spec *test, int mtu)
+{
+	int err;
+
+	if (test->ifobj_rx->mtu != mtu) {
+		err = xsk_set_mtu(test->ifobj_rx->ifindex, mtu);
+		if (err)
+			return err;
+		test->ifobj_rx->mtu = mtu;
+	}
+	if (test->ifobj_tx->mtu != mtu) {
+		err = xsk_set_mtu(test->ifobj_tx->ifindex, mtu);
+		if (err)
+			return err;
+		test->ifobj_tx->mtu = mtu;
+	}
+
+	return 0;
 }
 
 static void pkt_stream_reset(struct pkt_stream *pkt_stream)
···
 	return pkt_stream;
 }
 
+static bool pkt_continues(u32 options)
+{
+	return options & XDP_PKT_CONTD;
+}
+
 static u32 ceil_u32(u32 a, u32 b)
 {
 	return (a + b - 1) / b;
 }
 
-static u32 pkt_nb_frags(u32 frame_size, struct pkt *pkt)
+static u32 pkt_nb_frags(u32 frame_size, struct pkt_stream *pkt_stream, struct pkt *pkt)
 {
-	if (!pkt || !pkt->valid)
+	u32 nb_frags = 1, next_frag;
+
+	if (!pkt)
 		return 1;
-	return ceil_u32(pkt->len, frame_size);
+
+	if (!pkt_stream->verbatim) {
+		if (!pkt->valid || !pkt->len)
+			return 1;
+		return ceil_u32(pkt->len, frame_size);
+	}
+
+	/* Search for the end of the packet in verbatim mode */
+	if (!pkt_continues(pkt->options))
+		return nb_frags;
+
+	next_frag = pkt_stream->current_pkt_nb;
+	pkt++;
+	while (next_frag++ < pkt_stream->nb_pkts) {
+		nb_frags++;
+		if (!pkt_continues(pkt->options) || !pkt->valid)
+			break;
+		pkt++;
+	}
+	return nb_frags;
 }
 
 static void pkt_set(struct xsk_umem_info *umem, struct pkt *pkt, int offset, u32 len)
 {
 	pkt->offset = offset;
 	pkt->len = len;
-	if (len > umem->frame_size - XDP_PACKET_HEADROOM - MIN_PKT_SIZE * 2 - umem->frame_headroom)
+	if (len > MAX_ETH_JUMBO_SIZE)
 		pkt->valid = false;
 	else
 		pkt->valid = true;
···
 	return pkt->offset + umem_alloc_buffer(umem);
 }
 
+static void pkt_stream_cancel(struct pkt_stream *pkt_stream)
+{
+	pkt_stream->current_pkt_nb--;
+}
+
 static void pkt_generate(struct ifobject *ifobject, u64 addr, u32 len, u32 pkt_nb,
 			 u32 bytes_written)
 {
···
 	write_payload(data, pkt_nb, bytes_written, len);
 }
 
-static void __pkt_stream_generate_custom(struct ifobject *ifobj,
-					 struct pkt *pkts, u32 nb_pkts)
+static struct pkt_stream *__pkt_stream_generate_custom(struct ifobject *ifobj, struct pkt *frames,
+						       u32 nb_frames, bool verbatim)
 {
+	u32 i, len = 0, pkt_nb = 0, payload = 0;
 	struct pkt_stream *pkt_stream;
-	u32 i;
 
-	pkt_stream = __pkt_stream_alloc(nb_pkts);
+	pkt_stream = __pkt_stream_alloc(nb_frames);
 	if (!pkt_stream)
 		exit_with_error(ENOMEM);
 
-	for (i = 0; i < nb_pkts; i++) {
-		struct pkt *pkt = &pkt_stream->pkts[i];
+	for (i = 0; i < nb_frames; i++) {
+		struct pkt *pkt = &pkt_stream->pkts[pkt_nb];
+		struct pkt *frame = &frames[i];
 
-		pkt->offset = pkts[i].offset;
-		pkt->len = pkts[i].len;
-		pkt->pkt_nb = i;
-		pkt->valid = pkts[i].valid;
-		if (pkt->len > pkt_stream->max_pkt_len)
+		pkt->offset = frame->offset;
+		if (verbatim) {
+			*pkt = *frame;
+			pkt->pkt_nb = payload;
+			if (!frame->valid || !pkt_continues(frame->options))
+				payload++;
+		} else {
+			if (frame->valid)
+				len += frame->len;
+			if (frame->valid && pkt_continues(frame->options))
+				continue;
+
+			pkt->pkt_nb = pkt_nb;
+			pkt->len = len;
+			pkt->valid = frame->valid;
+			pkt->options = 0;
+
+			len = 0;
+		}
+
+		if (pkt->valid && pkt->len > pkt_stream->max_pkt_len)
 			pkt_stream->max_pkt_len = pkt->len;
+		pkt_nb++;
 	}
 
-	ifobj->pkt_stream = pkt_stream;
+	pkt_stream->nb_pkts = pkt_nb;
+	pkt_stream->verbatim = verbatim;
+	return pkt_stream;
 }
 
 static void pkt_stream_generate_custom(struct test_spec *test, struct pkt *pkts, u32 nb_pkts)
 {
-	__pkt_stream_generate_custom(test->ifobj_tx, pkts, nb_pkts);
-	__pkt_stream_generate_custom(test->ifobj_rx, pkts, nb_pkts);
+	struct pkt_stream *pkt_stream;
+
+	pkt_stream = __pkt_stream_generate_custom(test->ifobj_tx, pkts, nb_pkts, true);
+	test->ifobj_tx->pkt_stream = pkt_stream;
+
+	pkt_stream = __pkt_stream_generate_custom(test->ifobj_rx, pkts, nb_pkts, false);
+	test->ifobj_rx->pkt_stream = pkt_stream;
 }
 
 static void pkt_print_data(u32 *data, u32 cnt)
···
 	return true;
 }
 
-static bool is_pkt_valid(struct pkt *pkt, void *buffer, u64 addr, u32 len)
+static bool is_frag_valid(struct xsk_umem_info *umem, u64 addr, u32 len, u32 expected_pkt_nb,
+			  u32 bytes_processed)
 {
-	void *data = xsk_umem__get_data(buffer, addr);
-	u32 seqnum, pkt_data;
+	u32 seqnum, pkt_nb, *pkt_data, words_to_end, expected_seqnum;
+	void *data = xsk_umem__get_data(umem->buffer, addr);
 
-	if (!pkt) {
-		ksft_print_msg("[%s] too many packets received\n", __func__);
+	addr -= umem->base_addr;
+
+	if (addr >= umem->num_frames * umem->frame_size ||
+	    addr + len > umem->num_frames * umem->frame_size) {
+		ksft_print_msg("Frag invalid addr: %llx len: %u\n",
addr, len); 779 + return false; 780 + } 781 + if (!umem->unaligned_mode && addr % umem->frame_size + len > umem->frame_size) { 782 + ksft_print_msg("Frag crosses frame boundary addr: %llx len: %u\n", addr, len); 783 + return false; 784 + } 785 + 786 + pkt_data = data; 787 + if (!bytes_processed) { 788 + pkt_data += PKT_HDR_SIZE / sizeof(*pkt_data); 789 + len -= PKT_HDR_SIZE; 790 + } else { 791 + bytes_processed -= PKT_HDR_SIZE; 792 + } 793 + 794 + expected_seqnum = bytes_processed / sizeof(*pkt_data); 795 + seqnum = ntohl(*pkt_data) & 0xffff; 796 + pkt_nb = ntohl(*pkt_data) >> 16; 797 + 798 + if (expected_pkt_nb != pkt_nb) { 799 + ksft_print_msg("[%s] expected pkt_nb [%u], got pkt_nb [%u]\n", 800 + __func__, expected_pkt_nb, pkt_nb); 801 + goto error; 802 + } 803 + if (expected_seqnum != seqnum) { 804 + ksft_print_msg("[%s] expected seqnum at start [%u], got seqnum [%u]\n", 805 + __func__, expected_seqnum, seqnum); 856 806 goto error; 857 807 } 858 808 859 - if (len < MIN_PKT_SIZE || pkt->len < MIN_PKT_SIZE) { 860 - /* Do not try to verify packets that are smaller than minimum size. 
*/ 861 - return true; 862 - } 863 - 864 - if (pkt->len != len) { 865 - ksft_print_msg("[%s] expected length [%d], got length [%d]\n", 866 - __func__, pkt->len, len); 867 - goto error; 868 - } 869 - 870 - pkt_data = ntohl(*((u32 *)(data + PKT_HDR_SIZE))); 871 - seqnum = pkt_data >> 16; 872 - 873 - if (pkt->pkt_nb != seqnum) { 874 - ksft_print_msg("[%s] expected seqnum [%d], got seqnum [%d]\n", 875 - __func__, pkt->pkt_nb, seqnum); 809 + words_to_end = len / sizeof(*pkt_data) - 1; 810 + pkt_data += words_to_end; 811 + seqnum = ntohl(*pkt_data) & 0xffff; 812 + expected_seqnum += words_to_end; 813 + if (expected_seqnum != seqnum) { 814 + ksft_print_msg("[%s] expected seqnum at end [%u], got seqnum [%u]\n", 815 + __func__, expected_seqnum, seqnum); 876 816 goto error; 877 817 } 878 818 879 819 return true; 880 820 881 821 error: 882 - pkt_dump(data, len, true); 822 + pkt_dump(data, len, !bytes_processed); 883 823 return false; 824 + } 825 + 826 + static bool is_pkt_valid(struct pkt *pkt, void *buffer, u64 addr, u32 len) 827 + { 828 + if (pkt->len != len) { 829 + ksft_print_msg("[%s] expected packet length [%d], got length [%d]\n", 830 + __func__, pkt->len, len); 831 + pkt_dump(xsk_umem__get_data(buffer, addr), len, true); 832 + return false; 833 + } 834 + 835 + return true; 884 836 } 885 837 886 838 static void kick_tx(struct xsk_socket_info *xsk) ··· 968 854 { 969 855 struct timeval tv_end, tv_now, tv_timeout = {THREAD_TMOUT, 0}; 970 856 struct pkt_stream *pkt_stream = test->ifobj_rx->pkt_stream; 971 - u32 idx_rx = 0, idx_fq = 0, rcvd, i, pkts_sent = 0; 972 857 struct xsk_socket_info *xsk = test->ifobj_rx->xsk; 858 + u32 idx_rx = 0, idx_fq = 0, rcvd, pkts_sent = 0; 973 859 struct ifobject *ifobj = test->ifobj_rx; 974 860 struct xsk_umem_info *umem = xsk->umem; 975 861 struct pkt *pkt; ··· 982 868 983 869 pkt = pkt_stream_get_next_rx_pkt(pkt_stream, &pkts_sent); 984 870 while (pkt) { 871 + u32 frags_processed = 0, nb_frags = 0, pkt_len = 0; 872 + u64 first_addr; 873 + 
985 874 ret = gettimeofday(&tv_now, NULL); 986 875 if (ret) 987 876 exit_with_error(errno); ··· 1005 888 1006 889 ksft_print_msg("ERROR: [%s] Poll timed out\n", __func__); 1007 890 return TEST_FAILURE; 1008 - 1009 891 } 1010 892 1011 893 if (!(fds->revents & POLLIN)) ··· 1029 913 } 1030 914 } 1031 915 1032 - for (i = 0; i < rcvd; i++) { 916 + while (frags_processed < rcvd) { 1033 917 const struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++); 1034 918 u64 addr = desc->addr, orig; 1035 919 1036 920 orig = xsk_umem__extract_addr(addr); 1037 921 addr = xsk_umem__add_offset_to_addr(addr); 1038 922 1039 - if (!is_pkt_valid(pkt, umem->buffer, addr, desc->len) || 923 + if (!pkt) { 924 + ksft_print_msg("[%s] received too many packets addr: %lx len %u\n", 925 + __func__, addr, desc->len); 926 + return TEST_FAILURE; 927 + } 928 + 929 + if (!is_frag_valid(umem, addr, desc->len, pkt->pkt_nb, pkt_len) || 1040 930 !is_offset_correct(umem, pkt, addr) || 1041 931 (ifobj->use_metadata && !is_metadata_correct(pkt, umem->buffer, addr))) 1042 932 return TEST_FAILURE; 1043 933 934 + if (!nb_frags++) 935 + first_addr = addr; 936 + frags_processed++; 937 + pkt_len += desc->len; 1044 938 if (ifobj->use_fill_ring) 1045 939 *xsk_ring_prod__fill_addr(&umem->fq, idx_fq++) = orig; 940 + 941 + if (pkt_continues(desc->options)) 942 + continue; 943 + 944 + /* The complete packet has been received */ 945 + if (!is_pkt_valid(pkt, umem->buffer, first_addr, pkt_len) || 946 + !is_offset_correct(umem, pkt, addr)) 947 + return TEST_FAILURE; 948 + 1046 949 pkt = pkt_stream_get_next_rx_pkt(pkt_stream, &pkts_sent); 950 + nb_frags = 0; 951 + pkt_len = 0; 952 + } 953 + 954 + if (nb_frags) { 955 + /* In the middle of a packet. Start over from beginning of packet. 
*/ 956 + idx_rx -= nb_frags; 957 + xsk_ring_cons__cancel(&xsk->rx, nb_frags); 958 + if (ifobj->use_fill_ring) { 959 + idx_fq -= nb_frags; 960 + xsk_ring_prod__cancel(&umem->fq, nb_frags); 961 + } 962 + frags_processed -= nb_frags; 1047 963 } 1048 964 1049 965 if (ifobj->use_fill_ring) 1050 - xsk_ring_prod__submit(&umem->fq, rcvd); 966 + xsk_ring_prod__submit(&umem->fq, frags_processed); 1051 967 if (ifobj->release_rx) 1052 - xsk_ring_cons__release(&xsk->rx, rcvd); 968 + xsk_ring_cons__release(&xsk->rx, frags_processed); 1053 969 1054 970 pthread_mutex_lock(&pacing_mutex); 1055 971 pkts_in_flight -= pkts_sent; ··· 1094 946 1095 947 static int __send_pkts(struct ifobject *ifobject, struct pollfd *fds, bool timeout) 1096 948 { 949 + u32 i, idx = 0, valid_pkts = 0, valid_frags = 0, buffer_len; 950 + struct pkt_stream *pkt_stream = ifobject->pkt_stream; 1097 951 struct xsk_socket_info *xsk = ifobject->xsk; 1098 952 struct xsk_umem_info *umem = ifobject->umem; 1099 - u32 i, idx = 0, valid_pkts = 0, buffer_len; 1100 953 bool use_poll = ifobject->use_poll; 1101 954 int ret; 1102 955 1103 - buffer_len = pkt_get_buffer_len(umem, ifobject->pkt_stream->max_pkt_len); 956 + buffer_len = pkt_get_buffer_len(umem, pkt_stream->max_pkt_len); 1104 957 /* pkts_in_flight might be negative if many invalid packets are sent */ 1105 958 if (pkts_in_flight >= (int)((umem_size(umem) - BATCH_SIZE * buffer_len) / buffer_len)) { 1106 959 kick_tx(xsk); ··· 1132 983 } 1133 984 1134 985 for (i = 0; i < BATCH_SIZE; i++) { 1135 - struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i); 1136 - struct pkt *pkt = pkt_stream_get_next_tx_pkt(ifobject->pkt_stream); 986 + struct pkt *pkt = pkt_stream_get_next_tx_pkt(pkt_stream); 987 + u32 nb_frags_left, nb_frags, bytes_written = 0; 1137 988 1138 989 if (!pkt) 1139 990 break; 1140 991 1141 - tx_desc->addr = pkt_get_addr(pkt, umem); 1142 - tx_desc->len = pkt->len; 1143 - if (pkt->valid) { 992 + nb_frags = pkt_nb_frags(umem->frame_size, 
pkt_stream, pkt); 993 + if (nb_frags > BATCH_SIZE - i) { 994 + pkt_stream_cancel(pkt_stream); 995 + xsk_ring_prod__cancel(&xsk->tx, BATCH_SIZE - i); 996 + break; 997 + } 998 + nb_frags_left = nb_frags; 999 + 1000 + while (nb_frags_left--) { 1001 + struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i); 1002 + 1003 + tx_desc->addr = pkt_get_addr(pkt, ifobject->umem); 1004 + if (pkt_stream->verbatim) { 1005 + tx_desc->len = pkt->len; 1006 + tx_desc->options = pkt->options; 1007 + } else if (nb_frags_left) { 1008 + tx_desc->len = umem->frame_size; 1009 + tx_desc->options = XDP_PKT_CONTD; 1010 + } else { 1011 + tx_desc->len = pkt->len - bytes_written; 1012 + tx_desc->options = 0; 1013 + } 1014 + if (pkt->valid) 1015 + pkt_generate(ifobject, tx_desc->addr, tx_desc->len, pkt->pkt_nb, 1016 + bytes_written); 1017 + bytes_written += tx_desc->len; 1018 + 1019 + if (nb_frags_left) { 1020 + i++; 1021 + if (pkt_stream->verbatim) 1022 + pkt = pkt_stream_get_next_tx_pkt(pkt_stream); 1023 + } 1024 + } 1025 + 1026 + if (pkt && pkt->valid) { 1144 1027 valid_pkts++; 1145 - pkt_generate(ifobject, tx_desc->addr, tx_desc->len, pkt->pkt_nb, 0); 1028 + valid_frags += nb_frags; 1146 1029 } 1147 1030 } 1148 1031 ··· 1183 1002 pthread_mutex_unlock(&pacing_mutex); 1184 1003 1185 1004 xsk_ring_prod__submit(&xsk->tx, i); 1186 - xsk->outstanding_tx += valid_pkts; 1005 + xsk->outstanding_tx += valid_frags; 1187 1006 1188 1007 if (use_poll) { 1189 1008 ret = poll(fds, 1, POLL_TMOUT); ··· 1403 1222 u64 addr; 1404 1223 u32 i; 1405 1224 1406 - for (i = 0; i < pkt_nb_frags(rx_frame_size, pkt); i++) { 1225 + for (i = 0; i < pkt_nb_frags(rx_frame_size, pkt_stream, pkt); i++) { 1407 1226 if (!pkt) { 1408 1227 if (!fill_up) 1409 1228 break; ··· 1596 1415 struct ifobject *ifobj2) 1597 1416 { 1598 1417 pthread_t t0, t1; 1418 + int err; 1419 + 1420 + if (test->mtu > MAX_ETH_PKT_SIZE) { 1421 + if (test->mode == TEST_MODE_ZC && (!ifobj1->multi_buff_zc_supp || 1422 + (ifobj2 && 
!ifobj2->multi_buff_zc_supp))) { 1423 + ksft_test_result_skip("Multi buffer for zero-copy not supported.\n"); 1424 + return TEST_SKIP; 1425 + } 1426 + if (test->mode != TEST_MODE_ZC && (!ifobj1->multi_buff_supp || 1427 + (ifobj2 && !ifobj2->multi_buff_supp))) { 1428 + ksft_test_result_skip("Multi buffer not supported.\n"); 1429 + return TEST_SKIP; 1430 + } 1431 + } 1432 + err = test_spec_set_mtu(test, test->mtu); 1433 + if (err) { 1434 + ksft_print_msg("Error, could not set mtu.\n"); 1435 + exit_with_error(err); 1436 + } 1599 1437 1600 1438 if (ifobj2) { 1601 1439 if (pthread_barrier_init(&barr, NULL, 2)) ··· 1816 1616 return testapp_validate_traffic(test); 1817 1617 } 1818 1618 1619 + static int testapp_unaligned_mb(struct test_spec *test) 1620 + { 1621 + test_spec_set_name(test, "UNALIGNED_MODE_9K"); 1622 + test->mtu = MAX_ETH_JUMBO_SIZE; 1623 + test->ifobj_tx->umem->unaligned_mode = true; 1624 + test->ifobj_rx->umem->unaligned_mode = true; 1625 + pkt_stream_replace(test, DEFAULT_PKT_CNT, MAX_ETH_JUMBO_SIZE); 1626 + return testapp_validate_traffic(test); 1627 + } 1628 + 1819 1629 static int testapp_single_pkt(struct test_spec *test) 1820 1630 { 1821 1631 struct pkt pkts[] = {{0, MIN_PKT_SIZE, 0, true}}; 1822 1632 1633 + pkt_stream_generate_custom(test, pkts, ARRAY_SIZE(pkts)); 1634 + return testapp_validate_traffic(test); 1635 + } 1636 + 1637 + static int testapp_multi_buffer(struct test_spec *test) 1638 + { 1639 + test_spec_set_name(test, "RUN_TO_COMPLETION_9K_PACKETS"); 1640 + test->mtu = MAX_ETH_JUMBO_SIZE; 1641 + pkt_stream_replace(test, DEFAULT_PKT_CNT, MAX_ETH_JUMBO_SIZE); 1642 + 1643 + return testapp_validate_traffic(test); 1644 + } 1645 + 1646 + static int testapp_invalid_desc_mb(struct test_spec *test) 1647 + { 1648 + struct xsk_umem_info *umem = test->ifobj_tx->umem; 1649 + u64 umem_size = umem->num_frames * umem->frame_size; 1650 + struct pkt pkts[] = { 1651 + /* Valid packet for synch to start with */ 1652 + {0, MIN_PKT_SIZE, 0, true, 0}, 1653 + /* 
Zero frame len is not legal */ 1654 + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, 1655 + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, 1656 + {0, 0, 0, false, 0}, 1657 + /* Invalid address in the second frame */ 1658 + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, 1659 + {umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, 1660 + /* Invalid len in the middle */ 1661 + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, 1662 + {0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, 1663 + /* Invalid options in the middle */ 1664 + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, 1665 + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XSK_DESC__INVALID_OPTION}, 1666 + /* Transmit 2 frags, receive 3 */ 1667 + {0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, XDP_PKT_CONTD}, 1668 + {0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, 0}, 1669 + /* Middle frame crosses chunk boundary with small length */ 1670 + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, 1671 + {-MIN_PKT_SIZE / 2, MIN_PKT_SIZE, 0, false, 0}, 1672 + /* Valid packet for synch so that something is received */ 1673 + {0, MIN_PKT_SIZE, 0, true, 0}}; 1674 + 1675 + if (umem->unaligned_mode) { 1676 + /* Crossing a chunk boundary allowed */ 1677 + pkts[12].valid = true; 1678 + pkts[13].valid = true; 1679 + } 1680 + 1681 + test->mtu = MAX_ETH_JUMBO_SIZE; 1823 1682 pkt_stream_generate_custom(test, pkts, ARRAY_SIZE(pkts)); 1824 1683 return testapp_validate_traffic(test); 1825 1684 } ··· 1949 1690 int count = 0; 1950 1691 int key = 0; 1951 1692 1952 - test_spec_set_name(test, "XDP_METADATA_COUNT"); 1953 1693 test_spec_set_xdp_prog(test, skel_rx->progs.xsk_xdp_populate_metadata, 1954 1694 skel_tx->progs.xsk_xdp_populate_metadata, 1955 1695 skel_rx->maps.xsk, skel_tx->maps.xsk); ··· 1980 1722 test_spec_set_name(test, "POLL_RXQ_EMPTY"); 1981 1723 test->ifobj_rx->use_poll = true; 1982 1724 return testapp_validate_traffic_single_thread(test, test->ifobj_rx); 1725 + } 
1726 + 1727 + static int testapp_too_many_frags(struct test_spec *test) 1728 + { 1729 + struct pkt pkts[2 * XSK_DESC__MAX_SKB_FRAGS + 2] = {}; 1730 + u32 max_frags, i; 1731 + 1732 + test_spec_set_name(test, "TOO_MANY_FRAGS"); 1733 + if (test->mode == TEST_MODE_ZC) 1734 + max_frags = test->ifobj_tx->xdp_zc_max_segs; 1735 + else 1736 + max_frags = XSK_DESC__MAX_SKB_FRAGS; 1737 + 1738 + test->mtu = MAX_ETH_JUMBO_SIZE; 1739 + 1740 + /* Valid packet for synch */ 1741 + pkts[0].len = MIN_PKT_SIZE; 1742 + pkts[0].valid = true; 1743 + 1744 + /* One valid packet with the max amount of frags */ 1745 + for (i = 1; i < max_frags + 1; i++) { 1746 + pkts[i].len = MIN_PKT_SIZE; 1747 + pkts[i].options = XDP_PKT_CONTD; 1748 + pkts[i].valid = true; 1749 + } 1750 + pkts[max_frags].options = 0; 1751 + 1752 + /* An invalid packet with the max amount of frags but signals packet 1753 + * continues on the last frag 1754 + */ 1755 + for (i = max_frags + 1; i < 2 * max_frags + 1; i++) { 1756 + pkts[i].len = MIN_PKT_SIZE; 1757 + pkts[i].options = XDP_PKT_CONTD; 1758 + pkts[i].valid = false; 1759 + } 1760 + 1761 + /* Valid packet for synch */ 1762 + pkts[2 * max_frags + 1].len = MIN_PKT_SIZE; 1763 + pkts[2 * max_frags + 1].valid = true; 1764 + 1765 + pkt_stream_generate_custom(test, pkts, 2 * max_frags + 2); 1766 + return testapp_validate_traffic(test); 1983 1767 } 1984 1768 1985 1769 static int xsk_load_xdp_programs(struct ifobject *ifobj) ··· 2057 1757 static void init_iface(struct ifobject *ifobj, const char *dst_mac, const char *src_mac, 2058 1758 thread_func_t func_ptr) 2059 1759 { 1760 + LIBBPF_OPTS(bpf_xdp_query_opts, query_opts); 2060 1761 int err; 2061 1762 2062 1763 memcpy(ifobj->dst_mac, dst_mac, ETH_ALEN); ··· 2073 1772 2074 1773 if (hugepages_present()) 2075 1774 ifobj->unaligned_supp = true; 1775 + 1776 + err = bpf_xdp_query(ifobj->ifindex, XDP_FLAGS_DRV_MODE, &query_opts); 1777 + if (err) { 1778 + ksft_print_msg("Error querrying XDP capabilities\n"); 1779 + 
exit_with_error(-err); 1780 + } 1781 + if (query_opts.feature_flags & NETDEV_XDP_ACT_RX_SG) 1782 + ifobj->multi_buff_supp = true; 1783 + if (query_opts.feature_flags & NETDEV_XDP_ACT_XSK_ZEROCOPY) { 1784 + if (query_opts.xdp_zc_max_segs > 1) { 1785 + ifobj->multi_buff_zc_supp = true; 1786 + ifobj->xdp_zc_max_segs = query_opts.xdp_zc_max_segs; 1787 + } else { 1788 + ifobj->xdp_zc_max_segs = 0; 1789 + } 1790 + } 2076 1791 } 2077 1792 2078 1793 static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_type type) ··· 2120 1803 case TEST_TYPE_RUN_TO_COMPLETION: 2121 1804 test_spec_set_name(test, "RUN_TO_COMPLETION"); 2122 1805 ret = testapp_validate_traffic(test); 1806 + break; 1807 + case TEST_TYPE_RUN_TO_COMPLETION_MB: 1808 + ret = testapp_multi_buffer(test); 2123 1809 break; 2124 1810 case TEST_TYPE_RUN_TO_COMPLETION_SINGLE_PKT: 2125 1811 test_spec_set_name(test, "RUN_TO_COMPLETION_SINGLE_PKT"); ··· 2186 1866 ret = testapp_invalid_desc(test); 2187 1867 break; 2188 1868 } 1869 + case TEST_TYPE_ALIGNED_INV_DESC_MB: 1870 + test_spec_set_name(test, "ALIGNED_INV_DESC_MULTI_BUFF"); 1871 + ret = testapp_invalid_desc_mb(test); 1872 + break; 1873 + case TEST_TYPE_UNALIGNED_INV_DESC_MB: 1874 + test_spec_set_name(test, "UNALIGNED_INV_DESC_MULTI_BUFF"); 1875 + test->ifobj_tx->umem->unaligned_mode = true; 1876 + test->ifobj_rx->umem->unaligned_mode = true; 1877 + ret = testapp_invalid_desc_mb(test); 1878 + break; 2189 1879 case TEST_TYPE_UNALIGNED: 2190 1880 ret = testapp_unaligned(test); 1881 + break; 1882 + case TEST_TYPE_UNALIGNED_MB: 1883 + ret = testapp_unaligned_mb(test); 2191 1884 break; 2192 1885 case TEST_TYPE_HEADROOM: 2193 1886 ret = testapp_headroom(test); ··· 2209 1876 ret = testapp_xdp_drop(test); 2210 1877 break; 2211 1878 case TEST_TYPE_XDP_METADATA_COUNT: 1879 + test_spec_set_name(test, "XDP_METADATA_COUNT"); 2212 1880 ret = testapp_xdp_metadata_count(test); 1881 + break; 1882 + case TEST_TYPE_XDP_METADATA_COUNT_MB: 1883 + 
test_spec_set_name(test, "XDP_METADATA_COUNT_MULTI_BUFF"); 1884 + test->mtu = MAX_ETH_JUMBO_SIZE; 1885 + ret = testapp_xdp_metadata_count(test); 1886 + break; 1887 + case TEST_TYPE_TOO_MANY_FRAGS: 1888 + ret = testapp_too_many_frags(test); 2213 1889 break; 2214 1890 default: 2215 1891 break;
+20 -1
tools/testing/selftests/bpf/xskxceiver.h
···
 #define MAX_TEARDOWN_ITER 10
 #define PKT_HDR_SIZE (sizeof(struct ethhdr) + 2) /* Just to align the data in the packet */
 #define MIN_PKT_SIZE 64
+#define MAX_ETH_PKT_SIZE 1518
+#define MAX_ETH_JUMBO_SIZE 9000
 #define USLEEP_MAX 10000
 #define SOCK_RECONF_CTR 10
 #define BATCH_SIZE 64
···
 #define DEFAULT_UMEM_BUFFERS (DEFAULT_PKT_CNT / 4)
 #define RX_FULL_RXQSIZE 32
 #define UMEM_HEADROOM_TEST_SIZE 128
-#define XSK_UMEM__INVALID_FRAME_SIZE (XSK_UMEM__DEFAULT_FRAME_SIZE + 1)
+#define XSK_UMEM__INVALID_FRAME_SIZE (MAX_ETH_JUMBO_SIZE + 1)
+#define XSK_UMEM__LARGE_FRAME_SIZE (3 * 1024)
+#define XSK_UMEM__MAX_FRAME_SIZE (4 * 1024)
+#define XSK_DESC__INVALID_OPTION (0xffff)
+#define XSK_DESC__MAX_SKB_FRAGS 18
 #define HUGEPAGE_SIZE (2 * 1024 * 1024)
 #define PKT_DUMP_NB_TO_PRINT 16
···
 	TEST_TYPE_BPF_RES,
 	TEST_TYPE_XDP_DROP_HALF,
 	TEST_TYPE_XDP_METADATA_COUNT,
+	TEST_TYPE_XDP_METADATA_COUNT_MB,
+	TEST_TYPE_RUN_TO_COMPLETION_MB,
+	TEST_TYPE_UNALIGNED_MB,
+	TEST_TYPE_ALIGNED_INV_DESC_MB,
+	TEST_TYPE_UNALIGNED_INV_DESC_MB,
+	TEST_TYPE_TOO_MANY_FRAGS,
 	TEST_TYPE_MAX
 };
···
 	u32 len;
 	u32 pkt_nb;
 	bool valid;
+	u16 options;
 };
 
 struct pkt_stream {
···
 	u32 current_pkt_nb;
 	struct pkt *pkts;
 	u32 max_pkt_len;
+	bool verbatim;
 };
 
 struct ifobject;
···
 	struct bpf_program *xdp_prog;
 	enum test_mode mode;
 	int ifindex;
+	int mtu;
 	u32 bind_flags;
+	u32 xdp_zc_max_segs;
 	bool tx_on;
 	bool rx_on;
 	bool use_poll;
···
 	bool shared_umem;
 	bool use_metadata;
 	bool unaligned_supp;
+	bool multi_buff_supp;
+	bool multi_buff_zc_supp;
 	u8 dst_mac[ETH_ALEN];
 	u8 src_mac[ETH_ALEN];
 };
···
 	struct bpf_program *xdp_prog_tx;
 	struct bpf_map *xskmap_rx;
 	struct bpf_map *xskmap_tx;
+	int mtu;
 	u16 total_steps;
 	u16 current_step;
 	u16 nb_sockets;