Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Documentation/bpf: Add documentation for BPF_PROG_RUN

This adds documentation for the BPF_PROG_RUN command; a short overview of
the command itself, and a more verbose description of the "live packet"
mode for XDP introduced in the previous commit.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-3-toke@redhat.com

Authored by Toke Høiland-Jørgensen and committed by Alexei Starovoitov
1a7551f1 b530e9e1

+118
+117
Documentation/bpf/bpf_prog_run.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Running BPF programs from userspace
+===================================
+
+This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
+from userspace.
+
+.. contents::
+    :local:
+    :depth: 2
+
+
+Overview
+--------
+
+The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
+execute a BPF program in the kernel and return the results to userspace. This
+can be used to unit test BPF programs against user-supplied context objects, and
+as a way to explicitly execute programs in the kernel for their side effects.
+The command was previously named ``BPF_PROG_TEST_RUN``, and both constants
+continue to be defined in the UAPI header, aliased to the same value.
+
+The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
+following types:
+
+- ``BPF_PROG_TYPE_SOCKET_FILTER``
+- ``BPF_PROG_TYPE_SCHED_CLS``
+- ``BPF_PROG_TYPE_SCHED_ACT``
+- ``BPF_PROG_TYPE_XDP``
+- ``BPF_PROG_TYPE_SK_LOOKUP``
+- ``BPF_PROG_TYPE_CGROUP_SKB``
+- ``BPF_PROG_TYPE_LWT_IN``
+- ``BPF_PROG_TYPE_LWT_OUT``
+- ``BPF_PROG_TYPE_LWT_XMIT``
+- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
+- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
+- ``BPF_PROG_TYPE_STRUCT_OPS``
+- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
+- ``BPF_PROG_TYPE_SYSCALL``
+
+When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
+object and (for program types operating on network packets) a buffer containing
+the packet data that the BPF program will operate on. The kernel will then
+execute the program and return the results to userspace. Note that programs will
+not have any side effects while being run in this mode; in particular, packets
+will not actually be redirected or dropped, the program return code will just be
+returned to userspace. A separate mode for live execution of XDP programs is
+provided, documented separately below.
+
+Running XDP programs in "live frame mode"
+-----------------------------------------
+
+The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs,
+which can be used to execute XDP programs in a way where packets will actually
+be processed by the kernel after the execution of the XDP program as if they
+arrived on a physical interface. This mode is activated by setting the
+``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to
+``BPF_PROG_RUN``.
+
+Live frame mode is optimised for high-performance execution of the supplied
+XDP program many times (suitable for, e.g., running as a traffic generator),
+which means the semantics are not quite as straightforward as the regular test
+run mode. Specifically:
+
+- When executing an XDP program in live frame mode, the result of the execution
+  will not be returned to userspace; instead, the kernel will perform the
+  operation indicated by the program's return code (drop the packet, redirect
+  it, etc.). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
+  in the syscall parameters when running in this mode will be rejected. In
+  addition, not all failures will be reported back to userspace directly;
+  specifically, only fatal errors in setup or during execution (like memory
+  allocation errors) will halt execution and return an error. If an error occurs
+  in packet processing, like a failure to redirect to a given interface,
+  execution will continue with the next repetition; these errors can be detected
+  via the same trace points as for regular XDP programs.
+
+- Userspace can supply an ifindex as part of the context object, just like in
+  the regular (non-live) mode. The XDP program will be executed as though the
+  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
+  object will point to that interface. Furthermore, if the XDP program returns
+  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
+  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
+  will be transmitted *out* of that same interface. Do note, though, that
+  because the program execution is not happening in driver context, an
+  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
+  that same interface (i.e., it will only work if the driver has support for the
+  ``ndo_xdp_xmit`` driver op).
+
+- When running the program with multiple repetitions, the execution will happen
+  in batches. The batch size defaults to 64 packets (which is the same as the
+  maximum NAPI receive batch size), but can be specified by userspace through
+  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
+  the kernel executes the XDP program repeatedly, each invocation getting a
+  separate copy of the packet data. For each repetition, if the program drops
+  the packet, the data page is immediately recycled (see below). Otherwise, the
+  packet is buffered until the end of the batch, at which point all packets
+  buffered this way during the batch are transmitted at once.
+
+- When setting up the test run, the kernel will initialise a pool of memory
+  pages of the same size as the batch size. Each memory page will be initialised
+  with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
+  invocation. When possible, the pages will be recycled on future program
+  invocations, to improve performance. Pages will generally be recycled a full
+  batch at a time, except when a packet is dropped (by return code or because
+  of, say, a redirection error), in which case that page will be recycled
+  immediately. If a packet ends up being passed to the regular networking stack
+  (because the XDP program returns ``XDP_PASS``, or because it ends up being
+  redirected to an interface that injects it into the stack), the page will be
+  released and a new one will be allocated when the pool is empty.
+
+  When recycling, the page content is not rewritten; only the packet boundary
+  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
+  be reset to the original values. This means that if a program rewrites the
+  packet contents, it has to be prepared to see either the original content or
+  the modified version on subsequent invocations.
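
The batching and page-recycling rules in the new document can be easier to follow with a small model. The Python sketch below is only a userspace toy, not kernel code and not the ``bpf()`` API: ``resolve_batch_size``, ``run_live`` and the string verdicts are hypothetical names invented for illustration. It assumes a FIFO pool, and it assumes that a freshly allocated page starts from the original packet data; the document does not specify either detail.

```python
from collections import deque

DEFAULT_BATCH_SIZE = 64   # documented default batch size
MAX_BATCH_SIZE = 256      # documented upper bound for batch_size


def resolve_batch_size(requested=None):
    """Return the batch size the toy model will use (None means 'not set')."""
    if requested is None:
        return DEFAULT_BATCH_SIZE
    if not 1 <= requested <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be between 1 and 256")
    return requested


def run_live(program, packet, repeat, batch_size=None):
    """Toy model of live frame mode.

    program: callable taking a mutable page (bytearray) and returning
             "drop", "pass" or "tx" (hypothetical stand-ins for XDP verdicts).
    Returns (number_of_batches, page contents seen by each invocation).
    """
    size = resolve_batch_size(batch_size)
    # The pool holds one page per batch slot, each seeded with the packet.
    pool = deque(bytearray(packet) for _ in range(size))
    seen = []
    batches = 0
    done = 0
    while done < repeat:
        batches += 1
        buffered = []
        for _ in range(min(size, repeat - done)):
            # Assumption: a newly allocated page starts from the original data.
            page = pool.popleft() if pool else bytearray(packet)
            seen.append(bytes(page))
            verdict = program(page)
            if verdict == "drop":
                pool.append(page)      # recycled immediately, content kept
            elif verdict == "pass":
                pass                   # handed to the stack, leaves the pool
            else:
                buffered.append(page)  # held until the end of the batch
            done += 1
        # End of batch: buffered packets are sent together, pages recycled.
        pool.extend(buffered)
    return batches, seen


def rewrite_and_drop(page):
    """Example program: overwrite the first byte, then drop the packet."""
    page[0] = 0xFF
    return "drop"


if __name__ == "__main__":
    batches, seen = run_live(rewrite_and_drop, b"\x00abc", repeat=130)
    print(batches, seen[0], seen[64])
```

In this model the first 64 invocations see the original packet, while later ones receive recycled pages that still carry the rewritten first byte. This is the situation the last paragraph of the document warns about.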
+1
Documentation/bpf/index.rst
···
    helpers
    programs
    maps
+   bpf_prog_run
    classic_vs_extended.rst
    bpf_licensing
    test_debug