
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-05-17

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Provide a new BPF helper for doing a FIB and neighbor lookup
in the kernel tables from an XDP or tc BPF program. The helper
provides a fast-path for forwarding packets. The API supports
IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6 are
implemented in this initial work, from David (Ahern).

2) Just a tiny diff, but a huge feature enabled for the nfp driver by
extending the BPF offload beyond a pure host processing offload.
Offloaded XDP programs are allowed to set the RX queue index, thus
opening the door to a fully programmable RSS/n-tuple filter
replacement. Once BPF has decided on a queue, the device data-path
will skip the conventional RSS processing completely, from Jakub.

3) The original sockmap implementation was array based, similar to
devmap. However, unlike devmap where an ifindex has a 1:1 mapping
into the map, there are use cases with sockets that need to be
referenced using longer keys. Hence, a sockhash map is added, reusing
as much of the sockmap code as possible, from John.

4) Introduce BTF ID. The ID is allocated through an IDR, similarly to
BPF maps and progs. It also makes BTF accessible to user
space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data
through BPF_OBJ_GET_INFO_BY_FD, from Martin.

5) Enable BPF stackmap with build_id also in NMI context. Since the
up_read() of current->mm->mmap_sem cannot be called in NMI context,
build_id could not be parsed there. This work defers the up_read()
via a per-cpu irq_work so that at least limited support can be
enabled, from Song.

6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND
JIT conversion, as well as implementation of an optimized 32/64 bit
immediate load in the arm64 JIT that reduces the number of
emitted instructions; tested real-world programs shrank by three
percent, from Daniel.

7) Add an ifindex parameter to the libbpf loader in order to enable
BPF offload support. Right now only iproute2 can load offloaded
BPF; this enables libbpf to do so as well, allowing direct integration
into other applications, from David (Beckett).

8) Convert the plain text documentation under Documentation/bpf/ into
RST format since this is the appropriate standard the kernel is
moving to for all documentation. Also add an overview README.rst,
from Jesper.

9) Add __printf verification attribute to the bpf_verifier_vlog()
helper. Though it uses va_list we can still allow gcc to check
the format string, from Mathieu.

10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...'
is a bash 4.0+ feature which is not guaranteed to be available
when calling out to shell, therefore use a more portable variant,
from Joe.

11) Fix a 64 bit division in xdp_umem_reg() by using div_u64()
instead of relying on the gcc built-in, from Björn.

12) Fix a sock hashmap kmalloc warning reported by syzbot when an
overly large key size is used in the hashmap, causing overflows
in htab->elem_size. Reject a bogus attr->key_size early in
sock_hash_alloc(), from Yonghong.

13) Ensure in BPF selftests when urandom_read is being linked that
--build-id is always enabled so that test_stacktrace_build_id[_nmi]
won't be failing, from Alexei.

14) Add bitsperlong.h as well as errno.h uapi headers into the tools
header infrastructure which point to one of the arch specific
uapi headers. This was needed in order to fix a build error on
some systems for the BPF selftests, from Sirio.

15) Allow for short options to be used in the xdp_monitor BPF sample
code. And also a bpf.h tools uapi header sync in order to fix a
selftest build failure. Both from Prashant.

16) More formally clarify the meaning of ID in the direct packet access
section of the BPF documentation, from Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+4601 -1866

Documentation/bpf/README.rst (+36 lines, new file)
=================
BPF documentation
=================

This directory contains documentation for the BPF (Berkeley Packet
Filter) facility, with a focus on the extended BPF version (eBPF).

This kernel side documentation is still work in progress. The main
textual documentation is (for historical reasons) described in
`Documentation/networking/filter.txt`_, which describes both the
classical and extended BPF instruction-set.
The Cilium project also maintains a `BPF and XDP Reference Guide`_
that goes into great technical depth about the BPF architecture.

The primary info for the bpf syscall is available in the `man-pages`_
for `bpf(2)`_.


Frequently asked questions (FAQ)
================================

Two sets of Questions and Answers (Q&A) are maintained.

* Q&A for common questions about BPF: bpf_design_QA_

* Q&A for developers interacting with the BPF subsystem: bpf_devel_QA_


.. Links:
.. _bpf_design_QA: bpf_design_QA.rst
.. _bpf_devel_QA: bpf_devel_QA.rst
.. _Documentation/networking/filter.txt: ../networking/filter.txt
.. _man-pages: https://www.kernel.org/doc/man-pages/
.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html
.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/
Documentation/bpf/bpf_design_QA.rst (+221 lines, new file)
==============
BPF Design Q&A
==============

BPF's extensibility and applicability to networking, tracing and security
in the Linux kernel, together with the several user space implementations
of the BPF virtual machine, have led to a number of misunderstandings about
what BPF actually is. This short Q&A is an attempt to address that and to
outline the direction in which BPF is heading long term.

.. contents::
   :local:
   :depth: 3

Questions and Answers
=====================

Q: Is BPF a generic instruction set similar to x64 and arm64?
-------------------------------------------------------------
A: NO.

Q: Is BPF a generic virtual machine?
------------------------------------
A: NO.

BPF is a generic instruction set *with* C calling convention.
-------------------------------------------------------------

Q: Why was the C calling convention chosen?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A: Because BPF programs are designed to run in the Linux kernel,
which is written in C. Hence BPF defines an instruction set compatible
with the two most used architectures, x64 and arm64 (taking into
consideration important quirks of other architectures), and defines
a calling convention that is compatible with the C calling
convention of the Linux kernel on those architectures.

Q: Can multiple return values be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF allows only register R0 to be used as the return value.

Q: Can more than 5 function arguments be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. The BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set.
(unlike the x64 ISA that allows msft, cdecl and other conventions)

Q: Can BPF programs access the instruction pointer or return address?
---------------------------------------------------------------------
A: NO.

Q: Can BPF programs access the stack pointer?
---------------------------------------------
A: NO.

Only the frame pointer (register R10) is accessible.
From the compiler's point of view it is necessary to have a stack pointer.
For example, LLVM defines register R11 as the stack pointer in its
BPF back end, but it makes sure that generated code never uses it.

Q: Does the C calling convention diminish possible use cases?
-------------------------------------------------------------
A: YES.

The BPF design forces addition of major functionality in the form
of kernel helper functions and kernel objects like BPF maps, with
seamless interoperability between them. It lets the kernel call into
BPF programs and programs call kernel helpers with zero overhead,
as if all of them were native C code. That is particularly the case
for JITed BPF programs that are indistinguishable from
native kernel C code.

Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
------------------------------------------------------------------------
A: Soft yes.

At least for now, until the BPF core has support for
bpf-to-bpf calls, indirect calls, loops, global variables,
jump tables, read-only sections and all other normal constructs
that C code can produce.

Q: Can loops be supported in a safe way?
----------------------------------------
A: It's not clear yet.

BPF developers are trying to find a way to
support bounded loops where the verifier can guarantee that
the program terminates in less than 4096 instructions.

Instruction level questions
---------------------------

Q: LD_ABS and LD_IND instructions vs C code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Q: How come LD_ABS and LD_IND instructions are present in BPF whereas
C code cannot express them and has to use builtin intrinsics?

A: This is an artifact of compatibility with classic BPF. Modern
networking code in BPF performs better without them.
See 'direct packet access'.

Q: BPF instructions not mapping one-to-one to native CPU instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: It seems not all BPF instructions map one-to-one to native CPU
instructions. For example, why are BPF_JNE and other compare-and-jump
instructions not cpu-like?

A: This was necessary to avoid introducing flags into the ISA, which are
impossible to make generic and efficient across CPU architectures.

Q: Why doesn't the BPF_DIV instruction map to x64 div?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because if we picked a one-to-one relationship to x64 it would have made
it more complicated to support on arm64 and other archs. It also
needs a div-by-zero runtime check.

Q: Why is there no BPF_SDIV for signed divide operations?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because it would be rarely used. llvm errors out in such cases and
prints a suggestion to use unsigned divide instead.

Q: Why does BPF have an implicit prologue and epilogue?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because architectures like sparc have register windows, and in general
there are enough subtle differences between architectures that a naive
store of the return address onto the stack won't work. Another reason is
that BPF has to be safe from division by zero (and the legacy exception
path of the LD_ABS insn). Those instructions need to invoke the epilogue
and return implicitly.

Q: Why were BPF_JLT and BPF_JLE instructions not introduced in the beginning?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because classic BPF didn't have them and the BPF authors felt that a
compiler workaround would be acceptable. It turned out that programs lose
performance due to the lack of these compare instructions, so they were
added. These two instructions are a perfect example of what kind of new
BPF instructions are acceptable and can be added in the future:
they already had equivalent instructions in native CPUs.
New instructions that don't have a one-to-one mapping to HW instructions
will not be accepted.

Q: BPF 32-bit subregister requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: BPF 32-bit subregisters have a requirement to zero the upper 32 bits of
BPF registers, which makes BPF an inefficient virtual machine for 32-bit
CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
be added to BPF in the future?

A: NO. The first step to improve performance on 32-bit archs is to teach
LLVM to generate code that uses 32-bit subregisters. The second step
is to teach the verifier to mark operations where zeroing the upper bits
is unnecessary. Then JITs can take advantage of those markings and
drastically reduce the size of the generated code and improve performance.

Q: Does BPF have a stable ABI?
------------------------------
A: YES. BPF instructions, arguments to BPF programs, the set of helper
functions and their arguments, and the recognized return codes are all
part of the ABI. However, when tracing programs use the bpf_probe_read()
helper to walk kernel internal data structures and compile with kernel
internal headers, these accesses can and will break with newer
kernels. The union bpf_attr -> kern_version is checked at load time
to prevent accidentally loading kprobe-based bpf programs written
for a different kernel. Networking programs don't do a kern_version check.

Q: How much stack space does a BPF program use?
-----------------------------------------------
A: Currently all program types are limited to 512 bytes of stack
space, but the verifier computes the actual amount of stack used,
and both the interpreter and most JITed code consume only the
necessary amount.

Q: Can BPF be offloaded to HW?
------------------------------
A: YES. BPF HW offload is supported by the NFP driver.

Q: Does the classic BPF interpreter still exist?
------------------------------------------------
A: NO. Classic BPF programs are converted into extended BPF instructions.

Q: Can BPF call arbitrary kernel functions?
-------------------------------------------
A: NO. BPF programs can only call a set of helper functions which
is defined for every program type.

Q: Can BPF overwrite arbitrary kernel memory?
---------------------------------------------
A: NO.

Tracing bpf programs can *read* arbitrary memory with the bpf_probe_read()
and bpf_probe_read_str() helpers. Networking programs cannot read
arbitrary memory, since they don't have access to these helpers.
Programs can never read or write arbitrary memory directly.

Q: Can BPF overwrite arbitrary user memory?
-------------------------------------------
A: Sort-of.

Tracing BPF programs can overwrite the user memory
of the current task with bpf_probe_write_user(). Every time such
a program is loaded the kernel will print a warning message, so
this helper is only useful for experiments and prototypes.
Tracing BPF programs are root only.

Q: bpf_trace_printk() helper warning
------------------------------------
Q: When the bpf_trace_printk() helper is used, the kernel prints a nasty
warning message. Why is that?

A: This is done to nudge program authors toward better interfaces when
programs need to pass data to user space. For example,
bpf_perf_event_output() can be used to efficiently stream data via the
perf ring buffer, and BPF maps can be used for asynchronous data sharing
between kernel and user space. bpf_trace_printk() should only be used
for debugging.

Q: New functionality via kernel modules?
----------------------------------------
Q: Can BPF functionality such as new program or map types, new
helpers, etc. be added from kernel module code?

A: NO.
Documentation/bpf/bpf_design_QA.txt (-156 lines, removed)
Documentation/bpf/bpf_devel_QA.rst (+640 lines, new file)
··· 1 + ================================= 2 + HOWTO interact with BPF subsystem 3 + ================================= 4 + 5 + This document provides information for the BPF subsystem about various 6 + workflows related to reporting bugs, submitting patches, and queueing 7 + patches for stable kernels. 8 + 9 + For general information about submitting patches, please refer to 10 + `Documentation/process/`_. This document only describes additional specifics 11 + related to BPF. 12 + 13 + .. contents:: 14 + :local: 15 + :depth: 2 16 + 17 + Reporting bugs 18 + ============== 19 + 20 + Q: How do I report bugs for BPF kernel code? 21 + -------------------------------------------- 22 + A: Since all BPF kernel development as well as bpftool and iproute2 BPF 23 + loader development happens through the netdev kernel mailing list, 24 + please report any found issues around BPF to the following mailing 25 + list: 26 + 27 + netdev@vger.kernel.org 28 + 29 + This may also include issues related to XDP, BPF tracing, etc. 30 + 31 + Given netdev has a high volume of traffic, please also add the BPF 32 + maintainers to Cc (from kernel MAINTAINERS_ file): 33 + 34 + * Alexei Starovoitov <ast@kernel.org> 35 + * Daniel Borkmann <daniel@iogearbox.net> 36 + 37 + In case a buggy commit has already been identified, make sure to keep 38 + the actual commit authors in Cc as well for the report. They can 39 + typically be identified through the kernel's git tree. 40 + 41 + **Please do NOT report BPF issues to bugzilla.kernel.org since it 42 + is a guarantee that the reported issue will be overlooked.** 43 + 44 + Submitting patches 45 + ================== 46 + 47 + Q: To which mailing list do I need to submit my BPF patches? 
48 + ------------------------------------------------------------ 49 + A: Please submit your BPF patches to the netdev kernel mailing list: 50 + 51 + netdev@vger.kernel.org 52 + 53 + Historically, BPF came out of networking and has always been maintained 54 + by the kernel networking community. Although these days BPF touches 55 + many other subsystems as well, the patches are still routed mainly 56 + through the networking community. 57 + 58 + In case your patch has changes in various different subsystems (e.g. 59 + tracing, security, etc), make sure to Cc the related kernel mailing 60 + lists and maintainers from there as well, so they are able to review 61 + the changes and provide their Acked-by's to the patches. 62 + 63 + Q: Where can I find patches currently under discussion for BPF subsystem? 64 + ------------------------------------------------------------------------- 65 + A: All patches that are Cc'ed to netdev are queued for review under netdev 66 + patchwork project: 67 + 68 + http://patchwork.ozlabs.org/project/netdev/list/ 69 + 70 + Those patches which target BPF, are assigned to a 'bpf' delegate for 71 + further processing from BPF maintainers. The current queue with 72 + patches under review can be found at: 73 + 74 + https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147 75 + 76 + Once the patches have been reviewed by the BPF community as a whole 77 + and approved by the BPF maintainers, their status in patchwork will be 78 + changed to 'Accepted' and the submitter will be notified by mail. This 79 + means that the patches look good from a BPF perspective and have been 80 + applied to one of the two BPF kernel trees. 81 + 82 + In case feedback from the community requires a respin of the patches, 83 + their status in patchwork will be set to 'Changes Requested', and purged 84 + from the current review queue. 
Likewise for cases where patches would 85 + get rejected or are not applicable to the BPF trees (but assigned to 86 + the 'bpf' delegate). 87 + 88 + Q: How do the changes make their way into Linux? 89 + ------------------------------------------------ 90 + A: There are two BPF kernel trees (git repositories). Once patches have 91 + been accepted by the BPF maintainers, they will be applied to one 92 + of the two BPF trees: 93 + 94 + * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/ 95 + * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/ 96 + 97 + The bpf tree itself is for fixes only, whereas bpf-next for features, 98 + cleanups or other kind of improvements ("next-like" content). This is 99 + analogous to net and net-next trees for networking. Both bpf and 100 + bpf-next will only have a master branch in order to simplify against 101 + which branch patches should get rebased to. 102 + 103 + Accumulated BPF patches in the bpf tree will regularly get pulled 104 + into the net kernel tree. Likewise, accumulated BPF patches accepted 105 + into the bpf-next tree will make their way into net-next tree. net and 106 + net-next are both run by David S. Miller. From there, they will go 107 + into the kernel mainline tree run by Linus Torvalds. To read up on the 108 + process of net and net-next being merged into the mainline tree, see 109 + the `netdev FAQ`_ under: 110 + 111 + `Documentation/networking/netdev-FAQ.txt`_ 112 + 113 + Occasionally, to prevent merge conflicts, we might send pull requests 114 + to other trees (e.g. tracing) with a small subset of the patches, but 115 + net and net-next are always the main trees targeted for integration. 
116 + 117 + The pull requests will contain a high-level summary of the accumulated 118 + patches and can be searched on netdev kernel mailing list through the 119 + following subject lines (``yyyy-mm-dd`` is the date of the pull 120 + request):: 121 + 122 + pull-request: bpf yyyy-mm-dd 123 + pull-request: bpf-next yyyy-mm-dd 124 + 125 + Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be applied to? 126 + --------------------------------------------------------------------------------- 127 + 128 + A: The process is the very same as described in the `netdev FAQ`_, so 129 + please read up on it. The subject line must indicate whether the 130 + patch is a fix or rather "next-like" content in order to let the 131 + maintainers know whether it is targeted at bpf or bpf-next. 132 + 133 + For fixes eventually landing in bpf -> net tree, the subject must 134 + look like:: 135 + 136 + git format-patch --subject-prefix='PATCH bpf' start..finish 137 + 138 + For features/improvements/etc that should eventually land in 139 + bpf-next -> net-next, the subject must look like:: 140 + 141 + git format-patch --subject-prefix='PATCH bpf-next' start..finish 142 + 143 + If unsure whether the patch or patch series should go into bpf 144 + or net directly, or bpf-next or net-next directly, it is not a 145 + problem either if the subject line says net or net-next as target. 146 + It is eventually up to the maintainers to do the delegation of 147 + the patches. 148 + 149 + If it is clear that patches should go into bpf or bpf-next tree, 150 + please make sure to rebase the patches against those trees in 151 + order to reduce potential conflicts. 152 + 153 + In case the patch or patch series has to be reworked and sent out 154 + again in a second or later revision, it is also required to add a 155 + version number (``v2``, ``v3``, ...) 
into the subject prefix:: 156 + 157 + git format-patch --subject-prefix='PATCH net-next v2' start..finish 158 + 159 + When changes have been requested to the patch series, always send the 160 + whole patch series again with the feedback incorporated (never send 161 + individual diffs on top of the old series). 162 + 163 + Q: What does it mean when a patch gets applied to bpf or bpf-next tree? 164 + ----------------------------------------------------------------------- 165 + A: It means that the patch looks good for mainline inclusion from 166 + a BPF point of view. 167 + 168 + Be aware that this is not a final verdict that the patch will 169 + automatically get accepted into net or net-next trees eventually: 170 + 171 + On the netdev kernel mailing list reviews can come in at any point 172 + in time. If discussions around a patch conclude that they cannot 173 + get included as-is, we will either apply a follow-up fix or drop 174 + them from the trees entirely. Therefore, we also reserve to rebase 175 + the trees when deemed necessary. After all, the purpose of the tree 176 + is to: 177 + 178 + i) accumulate and stage BPF patches for integration into trees 179 + like net and net-next, and 180 + 181 + ii) run extensive BPF test suite and 182 + workloads on the patches before they make their way any further. 183 + 184 + Once the BPF pull request was accepted by David S. Miller, then 185 + the patches end up in net or net-next tree, respectively, and 186 + make their way from there further into mainline. Again, see the 187 + `netdev FAQ`_ for additional information e.g. on how often they are 188 + merged to mainline. 189 + 190 + Q: How long do I need to wait for feedback on my BPF patches? 191 + ------------------------------------------------------------- 192 + A: We try to keep the latency low. The usual time to feedback will 193 + be around 2 or 3 business days. It may vary depending on the 194 + complexity of changes and current patch load. 
Q: How often do you send pull requests to major kernel trees like net or net-next?
----------------------------------------------------------------------------------

A: Pull requests will be sent out rather often in order to not
accumulate too many patches in bpf or bpf-next.

As a rule of thumb, expect pull requests for each tree regularly
at the end of the week. In some cases, pull requests may additionally
be sent in the middle of the week, depending on the current patch
load or urgency.

Q: Are patches applied to bpf-next when the merge window is open?
-----------------------------------------------------------------
A: For the time when the merge window is open, bpf-next will not be
processed. This is roughly analogous to net-next patch processing,
so feel free to read up on the `netdev FAQ`_ for further details.

During those two weeks of the merge window, we might ask you to resend
your patch series once bpf-next is open again. Once Linus has released
a ``v*-rc1`` after the merge window, we continue processing of bpf-next.

For non-subscribers to kernel mailing lists, there is also a status
page run by David S. Miller on net-next that provides guidance:

  http://vger.kernel.org/~davem/net-next.html

Q: Verifier changes and test cases
----------------------------------
Q: I made a BPF verifier change, do I need to add test cases for
BPF kernel selftests_?

A: If the patch changes the behavior of the verifier, then yes, it is
absolutely necessary to add test cases to the BPF kernel selftests_
suite. If they are not present and we think they are needed, then we
might ask for them before accepting any changes.
In particular, test_verifier.c tracks a high number of BPF test
cases, including a lot of corner cases that the LLVM BPF back end may
generate out of the restricted C code. Thus, adding test cases is
absolutely crucial to make sure future changes do not accidentally
affect prior use-cases. In other words, treat those test cases as a
contract: verifier behavior that is not tracked in test_verifier.c
could potentially be subject to change.

Q: samples/bpf preference vs selftests?
---------------------------------------
Q: When should I add code to `samples/bpf/`_ and when to BPF kernel
selftests_ ?

A: In general, we prefer additions to BPF kernel selftests_ rather than
`samples/bpf/`_. The rationale is very simple: kernel selftests are
regularly run by various bots to test for kernel regressions.

The more test cases we add to BPF selftests, the better the coverage
and the less likely it is that those could accidentally break. This is
not to say that BPF kernel selftests cannot also demonstrate how a
specific feature can be used.

That said, `samples/bpf/`_ may be a good place for people to get
started, so simple demos of features can go into `samples/bpf/`_, while
advanced functional and corner-case testing belongs in kernel
selftests.

If your sample looks like a test case, then go for BPF kernel selftests
instead!

Q: When should I add code to the bpftool?
-----------------------------------------
A: The main purpose of bpftool (under tools/bpf/bpftool/) is to provide
a central user space tool for debugging and introspection of BPF programs
and maps that are active in the kernel. If UAPI changes related to BPF
enable dumping additional information about programs or maps, then
bpftool should be extended as well to support dumping them.
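For introspection, bpftool provides subcommands for listing and dumping
what is loaded in the kernel; a brief sketch (these commands need a
BPF-enabled kernel and typically root privileges, and the program id
``10`` is only a placeholder for an id reported by ``prog show``):

```shell
# List BPF programs and maps currently loaded in the kernel.
bpftool prog show
bpftool map show

# Dump the translated instructions of one program by its id.
bpftool prog dump xlated id 10
```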
Q: When should I add code to iproute2's BPF loader?
---------------------------------------------------
A: For UAPI changes related to the XDP or tc layer (e.g. ``cls_bpf``),
the convention is that those control-path related changes are added to
iproute2's BPF loader as well on the user space side. This is not only
useful to have UAPI changes properly designed to be usable, but also
to make those changes available to a wider user base of major
downstream distributions.

Q: Do you accept patches as well for iproute2's BPF loader?
-----------------------------------------------------------
A: Patches for iproute2's BPF loader have to be sent to:

  netdev@vger.kernel.org

While those patches are not processed by the BPF kernel maintainers,
please keep them in Cc as well, so they can be reviewed.

The official git repository for iproute2 is run by Stephen Hemminger
and can be found at:

  https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/

The patches need to have a subject prefix of '``[PATCH iproute2
master]``' or '``[PATCH iproute2 net-next]``'. '``master``' or
'``net-next``' describes the target branch where the patch should be
applied. Meaning, if kernel changes went into the net-next kernel
tree, then the related iproute2 changes need to go into the iproute2
net-next branch, otherwise they can be targeted at the master branch.
The iproute2 net-next branch will get merged into the master branch
after the current iproute2 version from master has been released.

Like BPF, the patches end up in patchwork under the netdev project and
are delegated to 'shemminger' for further processing:

  http://patchwork.ozlabs.org/project/netdev/list/?delegate=389

Q: What is the minimum requirement before I submit my BPF patches?
------------------------------------------------------------------
A: When submitting patches, always take the time and properly test your
patches *prior* to submission. Never rush them! If maintainers find
that your patches have not been properly tested, it is a good way to
get them grumpy. Testing patch submissions is a hard requirement!

Note, fixes that go to the bpf tree *must* have a ``Fixes:`` tag
included. The same applies to fixes that target bpf-next, where the
affected commit is in net-next (or in some cases bpf-next). The
``Fixes:`` tag is crucial in order to identify follow-up commits and
tremendously helps people who have to do backporting, so it is a must
have!

We also don't accept patches with an empty commit message. Take your
time and properly write up a high quality commit message, it is
essential!

Think about it this way: other developers looking at your code a month
from now need to understand *why* a certain change has been done that
way, and whether there have been flaws in the analysis or assumptions
that the original author made. Thus providing a proper rationale and
describing the use-case for the changes is a must.

Patch submissions with >1 patch must have a cover letter which includes
a high level description of the series. This high level summary will
then be placed into the merge commit by the BPF maintainers such that
it is also accessible from the git log for future reference.

Q: Features changing BPF JIT and/or LLVM
----------------------------------------
Q: What do I need to consider when adding a new instruction or feature
that would require BPF JIT and/or LLVM integration as well?
A: We try hard to keep all BPF JITs up to date such that the same user
experience can be guaranteed when running BPF programs on different
architectures, without having the program punt to the less efficient
interpreter in case the in-kernel BPF JIT is enabled.

If you are unable to implement or test the required JIT changes for
certain architectures, please work together with the related BPF JIT
developers in order to get the feature implemented in a timely manner.
Please refer to the git log (``arch/*/net/``) to locate the necessary
people for helping out.

Also always make sure to add BPF test cases (e.g. test_bpf.c and
test_verifier.c) for new instructions, so that they can receive
broad test coverage and help with run-time testing of the various BPF
JITs.

In case of new BPF instructions, once the changes have been accepted
into the Linux kernel, please implement support in LLVM's BPF back
end. See the LLVM_ section below for further information.

Stable submission
=================

Q: I need a specific BPF commit in stable kernels. What should I do?
--------------------------------------------------------------------
A: In case you need a specific fix in stable kernels, first check whether
the commit has already been applied in the related ``linux-*.y`` branches:

  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/

If that is not the case, then drop an email to the BPF maintainers with
the netdev kernel mailing list in Cc and ask for the fix to be queued up:

  netdev@vger.kernel.org

The process in general is the same as on netdev itself, see also the
`netdev FAQ`_ document.

Q: Do you also backport to kernels not currently maintained as stable?
----------------------------------------------------------------------
A: No. If you need a specific BPF commit in kernels that are currently
not maintained by the stable maintainers, then you are on your own.

The current stable and longterm stable kernels are all listed here:

  https://www.kernel.org/

Q: The BPF patch I am about to submit needs to go to stable as well
-------------------------------------------------------------------
What should I do?

A: The same rules apply as with netdev patch submissions in general, see
the `netdev FAQ`_ under:

  `Documentation/networking/netdev-FAQ.txt`_

Never add "``Cc: stable@vger.kernel.org``" to the patch description, but
ask the BPF maintainers to queue the patches instead. This can be done
with a note, for example, under the ``---`` part of the patch which does
not go into the git log. Alternatively, this can be done as a simple
request by mail instead.

Q: Queue stable patches
-----------------------
Q: Where do I find currently queued BPF patches that will be submitted
to stable?

A: Once patches that fix critical bugs have been applied to the bpf
tree, they are queued up for stable submission under:

  http://patchwork.ozlabs.org/bundle/bpf/stable/?state=*

They will be on hold there at minimum until the related commit has made
its way into the mainline kernel tree.

After having been under broader exposure, the queued patches will be
submitted by the BPF maintainers to the stable maintainers.
Testing patches
===============

Q: How to run BPF selftests
---------------------------
A: After you have booted into the newly compiled kernel, navigate to
the BPF selftests_ suite in order to test BPF functionality (current
working directory points to the root of the cloned git tree)::

  $ cd tools/testing/selftests/bpf/
  $ make

To run the verifier tests::

  $ sudo ./test_verifier

The verifier tests print out all the current checks being
performed. The summary at the end of running all tests will dump
information on test successes and failures::

  Summary: 418 PASSED, 0 FAILED

In order to run through all BPF selftests, the following command is
needed::

  $ sudo make run_tests

See the kernel's selftest `Documentation/dev-tools/kselftest.rst`_
document for further documentation.

Q: Which BPF kernel selftests version should I run my kernel against?
---------------------------------------------------------------------
A: If you run a kernel ``xyz``, then always run the BPF kernel selftests
from that kernel ``xyz`` as well. Do not expect that the BPF selftests
from the latest mainline tree will pass all the time.

In particular, test_bpf.c and test_verifier.c have a large number of
test cases and are constantly updated with new BPF test sequences, or
existing ones are adapted to verifier changes, e.g. due to the verifier
becoming smarter and being able to better track certain things.

LLVM
====

Q: Where do I find LLVM with BPF support?
-----------------------------------------
A: The BPF back end for LLVM is upstream in LLVM since version 3.7.1.
All major distributions these days ship LLVM with the BPF back end
enabled, so for the majority of use-cases it is not required to compile
LLVM by hand anymore; just install the distribution-provided package.

LLVM's static compiler lists the supported targets through
``llc --version``; make sure BPF targets are listed. Example::

  $ llc --version
  LLVM (http://llvm.org/):
    LLVM version 6.0.0svn
    Optimized build.
    Default target: x86_64-unknown-linux-gnu
    Host CPU: skylake

    Registered Targets:
      bpf    - BPF (host endian)
      bpfeb  - BPF (big endian)
      bpfel  - BPF (little endian)
      x86    - 32-bit X86: Pentium-Pro and above
      x86-64 - 64-bit X86: EM64T and AMD64

Developers who want to utilize the latest features added to LLVM's
BPF back end are advised to run the latest LLVM releases. Support
for new BPF kernel features such as additions to the BPF instruction
set are often developed together.

All LLVM releases can be found at: http://releases.llvm.org/

Q: Got it, so how do I build LLVM manually anyway?
--------------------------------------------------
A: You need cmake and gcc-c++ as build requisites for LLVM. Once you have
that set up, proceed with building the latest LLVM and clang version
from the git repositories::

  $ git clone http://llvm.org/git/llvm.git
  $ cd llvm/tools
  $ git clone --depth 1 http://llvm.org/git/clang.git
  $ cd ..; mkdir build; cd build
  $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
             -DBUILD_SHARED_LIBS=OFF           \
             -DCMAKE_BUILD_TYPE=Release        \
             -DLLVM_BUILD_RUNTIME=OFF
  $ make -j $(getconf _NPROCESSORS_ONLN)

The built binaries can then be found in the build/bin/ directory, to
which you can point your PATH variable.
Q: Reporting LLVM BPF issues
----------------------------
Q: Should I notify BPF kernel maintainers about issues in LLVM's BPF code
generation back end or about LLVM generated code that the verifier
refuses to accept?

A: Yes, please do!

LLVM's BPF back end is a key piece of the whole BPF
infrastructure and it ties deeply into verification of programs from the
kernel side. Therefore, any issues on either side need to be investigated
and fixed whenever necessary.

Therefore, please make sure to bring them up at the netdev kernel
mailing list and Cc BPF maintainers for LLVM and kernel bits:

* Yonghong Song <yhs@fb.com>
* Alexei Starovoitov <ast@kernel.org>
* Daniel Borkmann <daniel@iogearbox.net>

LLVM also has an issue tracker where BPF related bugs can be found:

  https://bugs.llvm.org/buglist.cgi?quicksearch=bpf

However, it is better to reach out through mailing lists with the
maintainers in Cc.

Q: New BPF instruction for kernel and LLVM
------------------------------------------
Q: I have added a new BPF instruction to the kernel, how can I integrate
it into LLVM?

A: LLVM has a ``-mcpu`` selector for the BPF back end in order to allow
the selection of BPF instruction set extensions. By default the
``generic`` processor target is used, which is the base instruction set
(v1) of BPF.

LLVM has an option to select ``-mcpu=probe`` where it will probe the host
kernel for supported BPF instruction set extensions and selects the
optimal set automatically.

For cross-compilation, a specific version can be selected manually as
well::

  $ llc -march bpf -mcpu=help
  Available CPUs for this target:

    generic - Select the generic processor.
    probe   - Select the probe processor.
    v1      - Select the v1 processor.
    v2      - Select the v2 processor.
  [...]

Newly added BPF instructions to the Linux kernel need to follow the same
scheme: bump the instruction set version and implement probing for the
extensions such that ``-mcpu=probe`` users can benefit from the
optimization transparently when upgrading their kernels.

If you are unable to implement support for the newly added BPF
instruction, please reach out to BPF developers for help.

By the way, the BPF kernel selftests run with ``-mcpu=probe`` for better
test coverage.

Q: clang flag for target bpf?
-----------------------------
Q: In some cases clang flag ``-target bpf`` is used but in other cases the
default clang target, which matches the underlying architecture, is used.
What is the difference and when should I use which?

A: Although LLVM IR generation and optimization try to stay architecture
independent, ``-target <arch>`` still has some impact on generated code:

- A BPF program may recursively include header file(s) with file scope
  inline assembly codes. The default target can handle this well,
  while the ``bpf`` target may fail if the bpf back end assembler does
  not understand these assembly codes, which is true in most cases.

- When compiled without ``-g``, additional elf sections, e.g.,
  .eh_frame and .rela.eh_frame, may be present in the object file
  with the default target, but not with the ``bpf`` target.

- The default target may turn a C switch statement into a switch table
  lookup and jump operation. Since the switch table is placed
  in the global read-only section, the bpf program will fail to load.
  The bpf target does not support switch table optimization.
  The clang option ``-fno-jump-tables`` can be used to disable
  switch table generation.
- For clang ``-target bpf``, it is guaranteed that pointer or long /
  unsigned long types will always have a width of 64 bit, no matter
  whether the underlying clang binary or default target (or kernel) is
  32 bit. However, when the native clang target is used, it will
  compile these types based on the underlying architecture's
  conventions, meaning that in case of a 32 bit architecture, pointer
  or long / unsigned long types, e.g. in the BPF context structure,
  will have a width of 32 bit while the BPF LLVM back end still
  operates in 64 bit. The native target is mostly needed in tracing
  for the case of walking ``pt_regs`` or other kernel structures where
  the CPU's register width matters. Otherwise, ``clang -target bpf``
  is generally recommended.

You should use the default target when:

- Your program includes a header file, e.g., ptrace.h, which eventually
  pulls in some header files containing file scope host assembly codes.

- You can add ``-fno-jump-tables`` to work around the switch table issue.

Otherwise, you can use the ``bpf`` target. Additionally, you *must* use
the bpf target when:

- Your program uses data structures with pointer or long / unsigned long
  types that interface with BPF helpers or context data structures. Access
  into these structures is verified by the BPF verifier and may result
  in verification failures if the native architecture is not aligned with
  the BPF architecture, e.g. 64-bit. An example of this is
  BPF_PROG_TYPE_SK_MSG, which requires ``-target bpf``.


.. Links
.. _Documentation/process/: https://www.kernel.org/doc/html/latest/process/
.. _MAINTAINERS: ../../MAINTAINERS
.. _Documentation/networking/netdev-FAQ.txt: ../networking/netdev-FAQ.txt
.. _netdev FAQ: ../networking/netdev-FAQ.txt
.. _samples/bpf/: ../../samples/bpf/
.. _selftests: ../../tools/testing/selftests/bpf/
.. _Documentation/dev-tools/kselftest.rst:
   https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html

Happy BPF hacking!
··· 1 - This document provides information for the BPF subsystem about various 2 - workflows related to reporting bugs, submitting patches, and queueing 3 - patches for stable kernels. 4 - 5 - For general information about submitting patches, please refer to 6 - Documentation/process/. This document only describes additional specifics 7 - related to BPF. 8 - 9 - Reporting bugs: 10 - --------------- 11 - 12 - Q: How do I report bugs for BPF kernel code? 13 - 14 - A: Since all BPF kernel development as well as bpftool and iproute2 BPF 15 - loader development happens through the netdev kernel mailing list, 16 - please report any found issues around BPF to the following mailing 17 - list: 18 - 19 - netdev@vger.kernel.org 20 - 21 - This may also include issues related to XDP, BPF tracing, etc. 22 - 23 - Given netdev has a high volume of traffic, please also add the BPF 24 - maintainers to Cc (from kernel MAINTAINERS file): 25 - 26 - Alexei Starovoitov <ast@kernel.org> 27 - Daniel Borkmann <daniel@iogearbox.net> 28 - 29 - In case a buggy commit has already been identified, make sure to keep 30 - the actual commit authors in Cc as well for the report. They can 31 - typically be identified through the kernel's git tree. 32 - 33 - Please do *not* report BPF issues to bugzilla.kernel.org since it 34 - is a guarantee that the reported issue will be overlooked. 35 - 36 - Submitting patches: 37 - ------------------- 38 - 39 - Q: To which mailing list do I need to submit my BPF patches? 40 - 41 - A: Please submit your BPF patches to the netdev kernel mailing list: 42 - 43 - netdev@vger.kernel.org 44 - 45 - Historically, BPF came out of networking and has always been maintained 46 - by the kernel networking community. Although these days BPF touches 47 - many other subsystems as well, the patches are still routed mainly 48 - through the networking community. 49 - 50 - In case your patch has changes in various different subsystems (e.g. 
51 - tracing, security, etc), make sure to Cc the related kernel mailing 52 - lists and maintainers from there as well, so they are able to review 53 - the changes and provide their Acked-by's to the patches. 54 - 55 - Q: Where can I find patches currently under discussion for BPF subsystem? 56 - 57 - A: All patches that are Cc'ed to netdev are queued for review under netdev 58 - patchwork project: 59 - 60 - http://patchwork.ozlabs.org/project/netdev/list/ 61 - 62 - Those patches which target BPF, are assigned to a 'bpf' delegate for 63 - further processing from BPF maintainers. The current queue with 64 - patches under review can be found at: 65 - 66 - https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147 67 - 68 - Once the patches have been reviewed by the BPF community as a whole 69 - and approved by the BPF maintainers, their status in patchwork will be 70 - changed to 'Accepted' and the submitter will be notified by mail. This 71 - means that the patches look good from a BPF perspective and have been 72 - applied to one of the two BPF kernel trees. 73 - 74 - In case feedback from the community requires a respin of the patches, 75 - their status in patchwork will be set to 'Changes Requested', and purged 76 - from the current review queue. Likewise for cases where patches would 77 - get rejected or are not applicable to the BPF trees (but assigned to 78 - the 'bpf' delegate). 79 - 80 - Q: How do the changes make their way into Linux? 81 - 82 - A: There are two BPF kernel trees (git repositories). Once patches have 83 - been accepted by the BPF maintainers, they will be applied to one 84 - of the two BPF trees: 85 - 86 - https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/ 87 - https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/ 88 - 89 - The bpf tree itself is for fixes only, whereas bpf-next for features, 90 - cleanups or other kind of improvements ("next-like" content). 
This is 91 - analogous to net and net-next trees for networking. Both bpf and 92 - bpf-next will only have a master branch in order to simplify against 93 - which branch patches should get rebased to. 94 - 95 - Accumulated BPF patches in the bpf tree will regularly get pulled 96 - into the net kernel tree. Likewise, accumulated BPF patches accepted 97 - into the bpf-next tree will make their way into net-next tree. net and 98 - net-next are both run by David S. Miller. From there, they will go 99 - into the kernel mainline tree run by Linus Torvalds. To read up on the 100 - process of net and net-next being merged into the mainline tree, see 101 - the netdev FAQ under: 102 - 103 - Documentation/networking/netdev-FAQ.txt 104 - 105 - Occasionally, to prevent merge conflicts, we might send pull requests 106 - to other trees (e.g. tracing) with a small subset of the patches, but 107 - net and net-next are always the main trees targeted for integration. 108 - 109 - The pull requests will contain a high-level summary of the accumulated 110 - patches and can be searched on netdev kernel mailing list through the 111 - following subject lines (yyyy-mm-dd is the date of the pull request): 112 - 113 - pull-request: bpf yyyy-mm-dd 114 - pull-request: bpf-next yyyy-mm-dd 115 - 116 - Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be 117 - applied to? 118 - 119 - A: The process is the very same as described in the netdev FAQ, so 120 - please read up on it. The subject line must indicate whether the 121 - patch is a fix or rather "next-like" content in order to let the 122 - maintainers know whether it is targeted at bpf or bpf-next. 
123 - 124 - For fixes eventually landing in bpf -> net tree, the subject must 125 - look like: 126 - 127 - git format-patch --subject-prefix='PATCH bpf' start..finish 128 - 129 - For features/improvements/etc that should eventually land in 130 - bpf-next -> net-next, the subject must look like: 131 - 132 - git format-patch --subject-prefix='PATCH bpf-next' start..finish 133 - 134 - If unsure whether the patch or patch series should go into bpf 135 - or net directly, or bpf-next or net-next directly, it is not a 136 - problem either if the subject line says net or net-next as target. 137 - It is eventually up to the maintainers to do the delegation of 138 - the patches. 139 - 140 - If it is clear that patches should go into bpf or bpf-next tree, 141 - please make sure to rebase the patches against those trees in 142 - order to reduce potential conflicts. 143 - 144 - In case the patch or patch series has to be reworked and sent out 145 - again in a second or later revision, it is also required to add a 146 - version number (v2, v3, ...) into the subject prefix: 147 - 148 - git format-patch --subject-prefix='PATCH net-next v2' start..finish 149 - 150 - When changes have been requested to the patch series, always send the 151 - whole patch series again with the feedback incorporated (never send 152 - individual diffs on top of the old series). 153 - 154 - Q: What does it mean when a patch gets applied to bpf or bpf-next tree? 155 - 156 - A: It means that the patch looks good for mainline inclusion from 157 - a BPF point of view. 158 - 159 - Be aware that this is not a final verdict that the patch will 160 - automatically get accepted into net or net-next trees eventually: 161 - 162 - On the netdev kernel mailing list reviews can come in at any point 163 - in time. If discussions around a patch conclude that they cannot 164 - get included as-is, we will either apply a follow-up fix or drop 165 - them from the trees entirely. 
Therefore, we also reserve to rebase 166 - the trees when deemed necessary. After all, the purpose of the tree 167 - is to i) accumulate and stage BPF patches for integration into trees 168 - like net and net-next, and ii) run extensive BPF test suite and 169 - workloads on the patches before they make their way any further. 170 - 171 - Once the BPF pull request was accepted by David S. Miller, then 172 - the patches end up in net or net-next tree, respectively, and 173 - make their way from there further into mainline. Again, see the 174 - netdev FAQ for additional information e.g. on how often they are 175 - merged to mainline. 176 - 177 - Q: How long do I need to wait for feedback on my BPF patches? 178 - 179 - A: We try to keep the latency low. The usual time to feedback will 180 - be around 2 or 3 business days. It may vary depending on the 181 - complexity of changes and current patch load. 182 - 183 - Q: How often do you send pull requests to major kernel trees like 184 - net or net-next? 185 - 186 - A: Pull requests will be sent out rather often in order to not 187 - accumulate too many patches in bpf or bpf-next. 188 - 189 - As a rule of thumb, expect pull requests for each tree regularly 190 - at the end of the week. In some cases pull requests could additionally 191 - come also in the middle of the week depending on the current patch 192 - load or urgency. 193 - 194 - Q: Are patches applied to bpf-next when the merge window is open? 195 - 196 - A: For the time when the merge window is open, bpf-next will not be 197 - processed. This is roughly analogous to net-next patch processing, 198 - so feel free to read up on the netdev FAQ about further details. 199 - 200 - During those two weeks of merge window, we might ask you to resend 201 - your patch series once bpf-next is open again. Once Linus released 202 - a v*-rc1 after the merge window, we continue processing of bpf-next. 
203 - 204 - For non-subscribers to kernel mailing lists, there is also a status 205 - page run by David S. Miller on net-next that provides guidance: 206 - 207 - http://vger.kernel.org/~davem/net-next.html 208 - 209 - Q: I made a BPF verifier change, do I need to add test cases for 210 - BPF kernel selftests? 211 - 212 - A: If the patch has changes to the behavior of the verifier, then yes, 213 - it is absolutely necessary to add test cases to the BPF kernel 214 - selftests suite. If they are not present and we think they are 215 - needed, then we might ask for them before accepting any changes. 216 - 217 - In particular, test_verifier.c is tracking a high number of BPF test 218 - cases, including a lot of corner cases that LLVM BPF back end may 219 - generate out of the restricted C code. Thus, adding test cases is 220 - absolutely crucial to make sure future changes do not accidentally 221 - affect prior use-cases. Thus, treat those test cases as: verifier 222 - behavior that is not tracked in test_verifier.c could potentially 223 - be subject to change. 224 - 225 - Q: When should I add code to samples/bpf/ and when to BPF kernel 226 - selftests? 227 - 228 - A: In general, we prefer additions to BPF kernel selftests rather than 229 - samples/bpf/. The rationale is very simple: kernel selftests are 230 - regularly run by various bots to test for kernel regressions. 231 - 232 - The more test cases we add to BPF selftests, the better the coverage 233 - and the less likely it is that those could accidentally break. It is 234 - not that BPF kernel selftests cannot demo how a specific feature can 235 - be used. 236 - 237 - That said, samples/bpf/ may be a good place for people to get started, 238 - so it might be advisable that simple demos of features could go into 239 - samples/bpf/, but advanced functional and corner-case testing rather 240 - into kernel selftests. 241 - 242 - If your sample looks like a test case, then go for BPF kernel selftests 243 - instead! 
Q: When should I add code to bpftool?

A: The main purpose of bpftool (under tools/bpf/bpftool/) is to provide
a central user space tool for debugging and introspection of BPF
programs and maps that are active in the kernel. If UAPI changes
related to BPF allow dumping additional information about programs or
maps, then bpftool should be extended as well to support dumping them.

Q: When should I add code to iproute2's BPF loader?

A: For UAPI changes related to the XDP or tc layer (e.g. cls_bpf), the
convention is that those control-path related changes are added to
iproute2's BPF loader as well on the user space side. This is not only
useful to have UAPI changes properly designed to be usable, but also to
make those changes available to a wider user base of major downstream
distributions.

Q: Do you accept patches as well for iproute2's BPF loader?

A: Patches for iproute2's BPF loader have to be sent to:

   netdev@vger.kernel.org

While those patches are not processed by the BPF kernel maintainers,
please keep them in Cc as well so they can be reviewed.

The official git repository for iproute2 is run by Stephen Hemminger
and can be found at:

   https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/

The patches need to have a subject prefix of '[PATCH iproute2 master]'
or '[PATCH iproute2 net-next]'. 'master' or 'net-next' describes the
target branch where the patch should be applied. Meaning, if kernel
changes went into the net-next kernel tree, then the related iproute2
changes need to go into the iproute2 net-next branch; otherwise they
can be targeted at the master branch. The iproute2 net-next branch will
get merged into the master branch after the current iproute2 version
from master has been released.
Like BPF, the patches end up in patchwork under the netdev project and
are delegated to 'shemminger' for further processing:

   http://patchwork.ozlabs.org/project/netdev/list/?delegate=389

Q: What is the minimum requirement before I submit my BPF patches?

A: When submitting patches, always take the time to properly test your
patches *prior* to submission. Never rush them! If maintainers find
that your patches have not been properly tested, it is a good way to
get them grumpy. Testing patch submissions is a hard requirement!

Note, fixes that go to the bpf tree *must* have a Fixes: tag included.
The same applies to fixes that target bpf-next, where the affected
commit is in net-next (or in some cases bpf-next). The Fixes: tag is
crucial in order to identify follow-up commits and tremendously helps
people who have to do backporting, so it is a must-have!

We also don't accept patches with an empty commit message. Take your
time and properly write up a high quality commit message, it is
essential!

Think about it this way: other developers looking at your code a month
from now need to understand *why* a certain change has been done that
way, and whether there have been flaws in the analysis or assumptions
that the original author made. Thus, providing a proper rationale and
describing the use-case for the changes is a must.

Patch submissions with more than one patch must have a cover letter
which includes a high level description of the series. This high level
summary will then be placed into the merge commit by the BPF
maintainers such that it is also accessible from the git log for future
reference.

Q: What do I need to consider when adding a new instruction or feature
that would require BPF JIT and/or LLVM integration as well?
A: We try hard to keep all BPF JITs up to date such that the same user
experience can be guaranteed when running BPF programs on different
architectures, without having the program punt to the less efficient
interpreter in case the in-kernel BPF JIT is enabled.

If you are unable to implement or test the required JIT changes for
certain architectures, please work together with the related BPF JIT
developers in order to get the feature implemented in a timely manner.
Please refer to the git log (arch/*/net/) to locate the necessary
people for helping out.

Also always make sure to add BPF test cases (e.g. test_bpf.c and
test_verifier.c) for new instructions, so that they can receive broad
test coverage and help run-time testing of the various BPF JITs.

In case of new BPF instructions, once the changes have been accepted
into the Linux kernel, please implement support in LLVM's BPF back end.
See the LLVM section below for further information.

Stable submission:
------------------

Q: I need a specific BPF commit in stable kernels. What should I do?

A: In case you need a specific fix in stable kernels, first check
whether the commit has already been applied in the related linux-*.y
branches:

   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/

If that is not the case, then drop an email to the BPF maintainers with
the netdev kernel mailing list in Cc and ask for the fix to be queued
up:

   netdev@vger.kernel.org

The process in general is the same as on netdev itself, see also the
netdev FAQ document.

Q: Do you also backport to kernels not currently maintained as stable?

A: No. If you need a specific BPF commit in kernels that are currently
not maintained by the stable maintainers, then you are on your own.
The current stable and longterm stable kernels are all listed here:

   https://www.kernel.org/

Q: The BPF patch I am about to submit needs to go to stable as well.
What should I do?

A: The same rules apply as with netdev patch submissions in general,
see the netdev FAQ under:

   Documentation/networking/netdev-FAQ.txt

Never add "Cc: stable@vger.kernel.org" to the patch description, but
ask the BPF maintainers to queue the patches instead. This can be done
with a note, for example, under the "---" part of the patch, which does
not go into the git log. Alternatively, this can be done as a simple
request by mail instead.

Q: Where do I find currently queued BPF patches that will be submitted
to stable?

A: Once patches that fix critical bugs have been applied to the bpf
tree, they are queued up for stable submission under:

   http://patchwork.ozlabs.org/bundle/bpf/stable/?state=*

They will be on hold there at minimum until the related commit has made
its way into the mainline kernel tree.

After having been under broader exposure, the queued patches will be
submitted by the BPF maintainers to the stable maintainers.

Testing patches:
----------------

Q: Which BPF kernel selftests version should I run my kernel against?

A: If you run a kernel xyz, then always run the BPF kernel selftests
from that kernel xyz as well. Do not expect that the BPF selftests from
the latest mainline tree will pass all the time.

In particular, test_bpf.c and test_verifier.c have a large number of
test cases and are constantly updated with new BPF test sequences, or
existing ones are adapted to verifier changes, e.g. due to the verifier
becoming smarter and being able to better track certain things.
LLVM:
-----

Q: Where do I find LLVM with BPF support?

A: The BPF back end for LLVM is upstream in LLVM since version 3.7.1.

All major distributions these days ship LLVM with the BPF back end
enabled, so for the majority of use-cases it is not required to compile
LLVM by hand anymore; just install the distribution-provided package.

LLVM's static compiler lists the supported targets through
'llc --version', make sure BPF targets are listed. Example:

   $ llc --version
   LLVM (http://llvm.org/):
     LLVM version 6.0.0svn
     Optimized build.
     Default target: x86_64-unknown-linux-gnu
     Host CPU: skylake

     Registered Targets:
       bpf    - BPF (host endian)
       bpfeb  - BPF (big endian)
       bpfel  - BPF (little endian)
       x86    - 32-bit X86: Pentium-Pro and above
       x86-64 - 64-bit X86: EM64T and AMD64

Developers who want to utilize the latest features added to LLVM's BPF
back end are advised to run the latest LLVM releases. Support for new
BPF kernel features such as additions to the BPF instruction set is
often developed together.

All LLVM releases can be found at: http://releases.llvm.org/

Q: Got it, so how do I build LLVM manually anyway?

A: You need cmake and gcc-c++ as build requisites for LLVM. Once you
have that set up, proceed with building the latest LLVM and clang
version from the git repositories:

   $ git clone http://llvm.org/git/llvm.git
   $ cd llvm/tools
   $ git clone --depth 1 http://llvm.org/git/clang.git
   $ cd ..; mkdir build; cd build
   $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
              -DBUILD_SHARED_LIBS=OFF           \
              -DCMAKE_BUILD_TYPE=Release        \
              -DLLVM_BUILD_RUNTIME=OFF
   $ make -j $(getconf _NPROCESSORS_ONLN)

The built binaries can then be found in the build/bin/ directory, to
which you can point your PATH variable.

Q: Should I notify BPF kernel maintainers about issues in LLVM's BPF
code generation back end or about LLVM generated code that the verifier
refuses to accept?

A: Yes, please do! LLVM's BPF back end is a key piece of the whole BPF
infrastructure and it ties deeply into the verification of programs on
the kernel side. Therefore, any issues on either side need to be
investigated and fixed whenever necessary.

Please make sure to bring them up at the netdev kernel mailing list and
Cc the BPF maintainers for the LLVM and kernel bits:

   Yonghong Song <yhs@fb.com>
   Alexei Starovoitov <ast@kernel.org>
   Daniel Borkmann <daniel@iogearbox.net>

LLVM also has an issue tracker where BPF related bugs can be found:

   https://bugs.llvm.org/buglist.cgi?quicksearch=bpf

However, it is better to reach out through mailing lists with the
maintainers in Cc.

Q: I have added a new BPF instruction to the kernel, how can I
integrate it into LLVM?

A: LLVM has a -mcpu selector for the BPF back end in order to allow the
selection of BPF instruction set extensions. By default the 'generic'
processor target is used, which is the base instruction set (v1) of
BPF.

LLVM has an option to select -mcpu=probe, where it will probe the host
kernel for supported BPF instruction set extensions and select the
optimal set automatically.

For cross-compilation, a specific version can be selected manually as
well:
   $ llc -march bpf -mcpu=help
   Available CPUs for this target:

     generic - Select the generic processor.
     probe   - Select the probe processor.
     v1      - Select the v1 processor.
     v2      - Select the v2 processor.
   [...]

Newly added BPF instructions to the Linux kernel need to follow the
same scheme: bump the instruction set version and implement probing for
the extensions such that -mcpu=probe users can benefit from the
optimization transparently when upgrading their kernels.

If you are unable to implement support for the newly added BPF
instruction, please reach out to the BPF developers for help.

By the way, the BPF kernel selftests run with -mcpu=probe for better
test coverage.

Q: In some cases the clang flag "-target bpf" is used, but in other
cases the default clang target, which matches the underlying
architecture, is used. What is the difference and when should I use
which?

A: Although LLVM IR generation and optimization try to stay
architecture independent, "-target <arch>" still has some impact on the
generated code:

- A BPF program may recursively include header file(s) with file scope
  inline assembly codes. The default target can handle this well, while
  the bpf target may fail if the bpf back end assembler does not
  understand these assembly codes, which is true in most cases.

- When compiled without -g, additional elf sections, e.g. .eh_frame and
  .rela.eh_frame, may be present in the object file with the default
  target, but not with the bpf target.

- The default target may turn a C switch statement into a switch table
  lookup and jump operation. Since the switch table is placed in the
  global read-only section, the BPF program will fail to load. The bpf
  target does not support switch table optimization. The clang option
  "-fno-jump-tables" can be used to disable switch table generation.

- For clang -target bpf, it is guaranteed that pointer or long /
  unsigned long types will always have a width of 64 bit, no matter
  whether the underlying clang binary or the default target (or kernel)
  is 32 bit. However, when the native clang target is used, it will
  compile these types based on the underlying architecture's
  conventions, meaning that in case of a 32 bit architecture, pointer
  or long / unsigned long types e.g. in the BPF context structure will
  have a width of 32 bit while the BPF LLVM back end still operates in
  64 bit. The native target is mostly needed in tracing for the case of
  walking pt_regs or other kernel structures where the CPU's register
  width matters. Otherwise, clang -target bpf is generally recommended.

You should use the default target when:

- Your program includes a header file, e.g. ptrace.h, which eventually
  pulls in some header files containing file scope host assembly codes.
- You can add "-fno-jump-tables" to work around the switch table issue.

Otherwise, you can use the bpf target. Additionally, you _must_ use the
bpf target when:

- Your program uses data structures with pointer or long / unsigned
  long types that interface with BPF helpers or context data
  structures. Access into these structures is verified by the BPF
  verifier and may result in verification failures if the native
  architecture is not aligned with the BPF architecture, e.g. 64-bit.
  An example of this is BPF_PROG_TYPE_SK_MSG, which requires
  '-target bpf'.

Happy BPF hacking!
Documentation/networking/filter.txt (+9 -6):

 the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
 then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
 0x1ff), because of potential carries.
+
 Besides arithmetic, the register state can also be updated by conditional
 branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
 it will have a umin_value (unsigned minimum value) of 9, whereas in the 'false'
[...]
 from the signed and unsigned bounds can be combined; for instance if a value is
 first tested < 8 and then tested s> 4, the verifier will conclude that the value
 is also > 4 and s< 8, since the bounds prevent crossing the sign boundary.
+
 PTR_TO_PACKETs with a variable offset part have an 'id', which is common to all
 pointers sharing that same variable offset. This is important for packet range
-checks: after adding some variable to a packet pointer, if you then copy it to
-another register and (say) add a constant 4, both registers will share the same
-'id' but one will have a fixed offset of +4. Then if it is bounds-checked and
-found to be less than a PTR_TO_PACKET_END, the other register is now known to
-have a safe range of at least 4 bytes. See 'Direct packet access', below, for
-more on PTR_TO_PACKET ranges.
+checks: after adding a variable to a packet pointer register A, if you then copy
+it to another register B and then add a constant 4 to A, both registers will
+share the same 'id' but the A will have a fixed offset of +4. Then if A is
+bounds-checked and found to be less than a PTR_TO_PACKET_END, the register B is
+now known to have a safe range of at least 4 bytes. See 'Direct packet access',
+below, for more on PTR_TO_PACKET ranges.
+
 The 'id' field is also used on PTR_TO_MAP_VALUE_OR_NULL, common to all copies of
 the pointer returned from a map lookup. This means that when one copy is
 checked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.
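The tnum example quoted in the documentation hunk above can be worked
through in shell arithmetic. This is a simplified userspace sketch
mirroring the kernel's tnum_or()/tnum_add() (kernel/bpf/tnum.c), not
the kernel code itself:

```shell
# A tnum is a pair (value, mask): mask bits are unknown, the rest
# equal value. Start with "low 8 bits unknown": (0x0; 0xff).
tnum_print() { printf '(0x%x; 0x%x)\n' "$1" "$2"; }
v=$((0x0)); m=$((0xff))

# OR with the constant 0x40 (a constant is a tnum with mask 0):
# known-one bits become known ones, so the 0x40 bit leaves the mask.
ov=$(( v | 0x40 ))
om=$(( m & ~ov ))
tnum_print "$ov" "$om"

# ADD the constant 1: carries may propagate through unknown bits,
# so the result can spill into bit 8.
sv=$(( ov + 1 ))
sigma=$(( om + sv ))
chi=$(( sigma ^ sv ))
mu=$(( chi | om ))
tnum_print $(( sv & ~mu )) "$mu"
```

The two printed tnums match the values in the text: (0x40; 0xbf) after
the OR, and (0x0; 0x1ff) after the add.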
arch/arm/net/bpf_jit_32.c (+3 -10):

 #define SCRATCH_SIZE 80
 
 /* total stack size used in JITed code */
-#define _STACK_SIZE \
-	(ctx->prog->aux->stack_depth + \
-	 + SCRATCH_SIZE + \
-	 + 4 /* extra for skb_copy_bits buffer */)
-
-#define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
+#define _STACK_SIZE	(ctx->prog->aux->stack_depth + SCRATCH_SIZE)
+#define STACK_SIZE	ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
 
 /* Get the offset of eBPF REGISTERs stored on scratch space. */
-#define STACK_VAR(off) (STACK_SIZE-off-4)
-
-/* Offset of skb_copy_bits buffer */
-#define SKB_BUFFER STACK_VAR(SCRATCH_SIZE)
+#define STACK_VAR(off) (STACK_SIZE - off)
 
 #if __LINUX_ARM_ARCH__ < 7
arch/arm64/net/bpf_jit_comp.c (+70 -47):

 #include <linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/printk.h>
-#include <linux/skbuff.h>
 #include <linux/slab.h>
 
 #include <asm/byteorder.h>
[...]
 	ctx->idx++;
 }
 
-static inline void emit_a64_mov_i64(const int reg, const u64 val,
-				    struct jit_ctx *ctx)
-{
-	u64 tmp = val;
-	int shift = 0;
-
-	emit(A64_MOVZ(1, reg, tmp & 0xffff, shift), ctx);
-	tmp >>= 16;
-	shift += 16;
-	while (tmp) {
-		if (tmp & 0xffff)
-			emit(A64_MOVK(1, reg, tmp & 0xffff, shift), ctx);
-		tmp >>= 16;
-		shift += 16;
-	}
-}
-
-static inline void emit_addr_mov_i64(const int reg, const u64 val,
-				     struct jit_ctx *ctx)
-{
-	u64 tmp = val;
-	int shift = 0;
-
-	emit(A64_MOVZ(1, reg, tmp & 0xffff, shift), ctx);
-	for (;shift < 48;) {
-		tmp >>= 16;
-		shift += 16;
-		emit(A64_MOVK(1, reg, tmp & 0xffff, shift), ctx);
-	}
-}
-
 static inline void emit_a64_mov_i(const int is64, const int reg,
 				  const s32 val, struct jit_ctx *ctx)
 {
[...]
 			emit(A64_MOVN(is64, reg, (u16)~lo, 0), ctx);
 		} else {
 			emit(A64_MOVN(is64, reg, (u16)~hi, 16), ctx);
-			emit(A64_MOVK(is64, reg, lo, 0), ctx);
+			if (lo != 0xffff)
+				emit(A64_MOVK(is64, reg, lo, 0), ctx);
 		}
 	} else {
 		emit(A64_MOVZ(is64, reg, lo, 0), ctx);
 		if (hi)
 			emit(A64_MOVK(is64, reg, hi, 16), ctx);
 	}
 }
+
+static int i64_i16_blocks(const u64 val, bool inverse)
+{
+	return (((val >>  0) & 0xffff) != (inverse ? 0xffff : 0x0000)) +
+	       (((val >> 16) & 0xffff) != (inverse ? 0xffff : 0x0000)) +
+	       (((val >> 32) & 0xffff) != (inverse ? 0xffff : 0x0000)) +
+	       (((val >> 48) & 0xffff) != (inverse ? 0xffff : 0x0000));
+}
+
+static inline void emit_a64_mov_i64(const int reg, const u64 val,
+				    struct jit_ctx *ctx)
+{
+	u64 nrm_tmp = val, rev_tmp = ~val;
+	bool inverse;
+	int shift;
+
+	if (!(nrm_tmp >> 32))
+		return emit_a64_mov_i(0, reg, (u32)val, ctx);
+
+	inverse = i64_i16_blocks(nrm_tmp, true) < i64_i16_blocks(nrm_tmp, false);
+	shift = max(round_down((inverse ? (fls64(rev_tmp) - 1) :
+					  (fls64(nrm_tmp) - 1)), 16), 0);
+	if (inverse)
+		emit(A64_MOVN(1, reg, (rev_tmp >> shift) & 0xffff, shift), ctx);
+	else
+		emit(A64_MOVZ(1, reg, (nrm_tmp >> shift) & 0xffff, shift), ctx);
+	shift -= 16;
+	while (shift >= 0) {
+		if (((nrm_tmp >> shift) & 0xffff) != (inverse ? 0xffff : 0x0000))
+			emit(A64_MOVK(1, reg, (nrm_tmp >> shift) & 0xffff, shift), ctx);
+		shift -= 16;
+	}
+}
+
+/*
+ * This is an unoptimized 64 immediate emission used for BPF to BPF call
+ * addresses. It will always do a full 64 bit decomposition as otherwise
+ * more complexity in the last extra pass is required since we previously
+ * reserved 4 instructions for the address.
+ */
+static inline void emit_addr_mov_i64(const int reg, const u64 val,
+				     struct jit_ctx *ctx)
+{
+	u64 tmp = val;
+	int shift = 0;
+
+	emit(A64_MOVZ(1, reg, tmp & 0xffff, shift), ctx);
+	for (;shift < 48;) {
+		tmp >>= 16;
+		shift += 16;
+		emit(A64_MOVK(1, reg, tmp & 0xffff, shift), ctx);
+	}
+}
[...]
 /* Tail call offset to jump into */
 #define PROLOGUE_OFFSET 7
 
-static int build_prologue(struct jit_ctx *ctx)
+static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 {
 	const struct bpf_prog *prog = ctx->prog;
 	const u8 r6 = bpf2a64[BPF_REG_6];
[...]
 	 * | ... | BPF prog stack
 	 * |     |
 	 * +-----+ <= (BPF_FP - prog->aux->stack_depth)
-	 * |RSVD | JIT scratchpad
+	 * |RSVD | padding
 	 * current A64_SP =>  +-----+ <= (BPF_FP - ctx->stack_size)
 	 * |     |
 	 * | ... | Function call stack
[...]
 	/* Set up BPF prog stack base register */
 	emit(A64_MOV(1, fp, A64_SP), ctx);
 
-	/* Initialize tail_call_cnt */
-	emit(A64_MOVZ(1, tcc, 0, 0), ctx);
+	if (!ebpf_from_cbpf) {
+		/* Initialize tail_call_cnt */
+		emit(A64_MOVZ(1, tcc, 0, 0), ctx);
 
-	cur_offset = ctx->idx - idx0;
-	if (cur_offset != PROLOGUE_OFFSET) {
-		pr_err_once("PROLOGUE_OFFSET = %d, expected %d!\n",
-			    cur_offset, PROLOGUE_OFFSET);
-		return -1;
+		cur_offset = ctx->idx - idx0;
+		if (cur_offset != PROLOGUE_OFFSET) {
+			pr_err_once("PROLOGUE_OFFSET = %d, expected %d!\n",
+				    cur_offset, PROLOGUE_OFFSET);
+			return -1;
+		}
 	}
 
-	/* 4 byte extra for skb_copy_bits buffer */
-	ctx->stack_size = prog->aux->stack_depth + 4;
-	ctx->stack_size = STACK_ALIGN(ctx->stack_size);
+	ctx->stack_size = STACK_ALIGN(prog->aux->stack_depth);
 
 	/* Set up function call stack */
 	emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
[...]
 	struct bpf_prog *tmp, *orig_prog = prog;
 	struct bpf_binary_header *header;
 	struct arm64_jit_data *jit_data;
+	bool was_classic = bpf_prog_was_classic(prog);
 	bool tmp_blinded = false;
 	bool extra_pass = false;
 	struct jit_ctx ctx;
[...]
 		goto out_off;
 	}
 
-	if (build_prologue(&ctx)) {
+	if (build_prologue(&ctx, was_classic)) {
 		prog = orig_prog;
 		goto out_off;
 	}
[...]
 skip_init_ctx:
 	ctx.idx = 0;
 
-	build_prologue(&ctx);
+	build_prologue(&ctx, was_classic);
 
 	if (build_body(&ctx)) {
 		bpf_jit_binary_free(header);
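The heuristic in the arm64 JIT change above picks between a MOVZ-based
and a MOVN-based sequence by counting how many 16-bit chunks of the
immediate would still need a MOVK fixup. A userspace shell sketch
mirroring i64_i16_blocks() (an illustration, not the kernel code):

```shell
# Count 16-bit chunks of a 64-bit value that differ from the fill
# pattern: != 0x0000 for a MOVZ start, != 0xffff for a MOVN start.
i64_i16_blocks() {
	val=$1; want=0
	[ "$2" = "inverse" ] && want=$((0xffff))
	n=0
	for shift in 0 16 32 48; do
		chunk=$(( (val >> shift) & 0xffff ))
		[ "$chunk" -ne "$want" ] && n=$((n + 1))
	done
	echo "$n"
}

val=$(( 0xffffffffffff1234 ))   # a mostly-ones immediate
norm=$(i64_i16_blocks "$val" normal)
inv=$(i64_i16_blocks "$val" inverse)
if [ "$inv" -lt "$norm" ]; then
	echo "start with MOVN: $inv chunk(s) to patch, vs $norm with MOVZ"
else
	echo "start with MOVZ: $norm chunk(s) to patch"
fi
```

For a mostly-ones value like this, the MOVN path wins: only one chunk
(0x1234) needs a follow-up MOVK, instead of four instructions with the
old unconditional MOVZ/MOVK sequence.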
arch/mips/net/ebpf_jit.c (-26):

  * struct jit_ctx - JIT context
  * @skf:		The sk_filter
  * @stack_size:		eBPF stack size
- * @tmp_offset:		eBPF $sp offset to 8-byte temporary memory
  * @idx:		Instruction index
  * @flags:		JIT flags
  * @offsets:		Instruction offsets
[...]
 struct jit_ctx {
 	const struct bpf_prog *skf;
 	int stack_size;
-	int tmp_offset;
 	u32 idx;
 	u32 flags;
 	u32 *offsets;
[...]
 	locals_size = (ctx->flags & EBPF_SEEN_FP) ? MAX_BPF_STACK : 0;
 
 	stack_adjust += locals_size;
-	ctx->tmp_offset = locals_size;
 
 	ctx->stack_size = stack_adjust;
[...]
 		emit_instr(ctx, lui, reg, upper >> 16);
 		emit_instr(ctx, addiu, reg, reg, lower);
 	}
-
 }
 
 static int gen_imm_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
[...]
 		}
 	}
 
-	return 0;
-}
-
-static void * __must_check
-ool_skb_header_pointer(const struct sk_buff *skb, int offset,
-		       int len, void *buffer)
-{
-	return skb_header_pointer(skb, offset, len, buffer);
-}
-
-static int size_to_len(const struct bpf_insn *insn)
-{
-	switch (BPF_SIZE(insn->code)) {
-	case BPF_B:
-		return 1;
-	case BPF_H:
-		return 2;
-	case BPF_W:
-		return 4;
-	case BPF_DW:
-		return 8;
-	}
 	return 0;
 }
arch/sparc/net/bpf_jit_comp_64.c (-1):

 	const int i = insn - ctx->prog->insnsi;
 	const s16 off = insn->off;
 	const s32 imm = insn->imm;
-	u32 *func;
 
 	if (insn->src_reg == BPF_REG_FP)
 		ctx->saw_frame_pointer = true;
arch/x86/include/asm/nospec-branch.h (+14 -15):

  *      jmp *%edx for x86_32
  */
 #ifdef CONFIG_RETPOLINE
-#ifdef CONFIG_X86_64
-# define RETPOLINE_RAX_BPF_JIT_SIZE	17
-# define RETPOLINE_RAX_BPF_JIT()				\
+# ifdef CONFIG_X86_64
+#  define RETPOLINE_RAX_BPF_JIT_SIZE	17
+#  define RETPOLINE_RAX_BPF_JIT()				\
 do {								\
 	EMIT1_off32(0xE8, 7);	 /* callq do_rop */		\
 	/* spec_trap: */					\
[...]
 	EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */	\
 	EMIT1(0xC3);             /* retq */			\
 } while (0)
-#else
-# define RETPOLINE_EDX_BPF_JIT()				\
+# else /* !CONFIG_X86_64 */
+#  define RETPOLINE_EDX_BPF_JIT()				\
 do {								\
 	EMIT1_off32(0xE8, 7);	 /* call do_rop */		\
 	/* spec_trap: */					\
[...]
 	EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */		\
 	EMIT1(0xC3);             /* ret */			\
 } while (0)
-#endif
+# endif
 #else /* !CONFIG_RETPOLINE */
-
-#ifdef CONFIG_X86_64
-# define RETPOLINE_RAX_BPF_JIT_SIZE	2
-# define RETPOLINE_RAX_BPF_JIT()				\
-	EMIT2(0xFF, 0xE0);	 /* jmp *%rax */
-#else
-# define RETPOLINE_EDX_BPF_JIT()				\
-	EMIT2(0xFF, 0xE2)	 /* jmp *%edx */
-#endif
+# ifdef CONFIG_X86_64
+#  define RETPOLINE_RAX_BPF_JIT_SIZE	2
+#  define RETPOLINE_RAX_BPF_JIT()				\
+	EMIT2(0xFF, 0xE0);       /* jmp *%rax */
+# else /* !CONFIG_X86_64 */
+#  define RETPOLINE_EDX_BPF_JIT()				\
+	EMIT2(0xFF, 0xE2)        /* jmp *%edx */
+# endif
 #endif
 
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
drivers/net/ethernet/netronome/nfp/bpf/fw.h (+1):

 	NFP_BPF_CAP_TYPE_ADJUST_HEAD	= 2,
 	NFP_BPF_CAP_TYPE_MAPS		= 3,
 	NFP_BPF_CAP_TYPE_RANDOM		= 4,
+	NFP_BPF_CAP_TYPE_QUEUE_SELECT	= 5,
 };
 
 struct nfp_bpf_cap_tlv_func {
drivers/net/ethernet/netronome/nfp/bpf/jit.c (+47):

 #include "main.h"
 #include "../nfp_asm.h"
+#include "../nfp_net_ctrl.h"
 
 /* --- NFP prog --- */
 /* Foreach "multiple" entries macros provide pos and next<n> pointers.
[...]
 	return 0;
 }
 
+static int
+nfp_queue_select(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	u32 jmp_tgt;
+
+	jmp_tgt = nfp_prog_current_offset(nfp_prog) + 5;
+
+	/* Make sure the queue id fits into FW field */
+	emit_alu(nfp_prog, reg_none(), reg_a(meta->insn.src_reg * 2),
+		 ALU_OP_AND_NOT_B, reg_imm(0xff));
+	emit_br(nfp_prog, BR_BEQ, jmp_tgt, 2);
+
+	/* Set the 'queue selected' bit and the queue value */
+	emit_shf(nfp_prog, pv_qsel_set(nfp_prog),
+		 pv_qsel_set(nfp_prog), SHF_OP_OR, reg_imm(1),
+		 SHF_SC_L_SHF, PKT_VEL_QSEL_SET_BIT);
+	emit_ld_field(nfp_prog,
+		      pv_qsel_val(nfp_prog), 0x1, reg_b(meta->insn.src_reg * 2),
+		      SHF_SC_NONE, 0);
+	/* Delay slots end here, we will jump over next instruction if queue
+	 * value fits into the field.
+	 */
+	emit_ld_field(nfp_prog,
+		      pv_qsel_val(nfp_prog), 0x1, reg_imm(NFP_NET_RXR_MAX),
+		      SHF_SC_NONE, 0);
+
+	if (!nfp_prog_confirm_current_offset(nfp_prog, jmp_tgt))
+		return -EINVAL;
+
+	return 0;
+}
+
 /* --- Callbacks --- */
 static int mov_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
[...]
 			   false, wrp_lmem_store);
 }
 
+static int mem_stx_xdp(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	switch (meta->insn.off) {
+	case offsetof(struct xdp_md, rx_queue_index):
+		return nfp_queue_select(nfp_prog, meta);
+	}
+
+	WARN_ON_ONCE(1); /* verifier should have rejected bad accesses */
+	return -EOPNOTSUPP;
+}
+
 static int
 mem_stx(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	unsigned int size)
[...]
 
 static int mem_stx4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
+	if (meta->ptr.type == PTR_TO_CTX)
+		if (nfp_prog->type == BPF_PROG_TYPE_XDP)
+			return mem_stx_xdp(nfp_prog, meta);
 	return mem_stx(nfp_prog, meta, 4);
 }
drivers/net/ethernet/netronome/nfp/bpf/main.c (+11):

 	return 0;
 }
 
+static int
+nfp_bpf_parse_cap_qsel(struct nfp_app_bpf *bpf, void __iomem *value, u32 length)
+{
+	bpf->queue_select = true;
+	return 0;
+}
+
 static int nfp_bpf_parse_capabilities(struct nfp_app *app)
 {
 	struct nfp_cpp *cpp = app->pf->cpp;
[...]
 			break;
 		case NFP_BPF_CAP_TYPE_RANDOM:
 			if (nfp_bpf_parse_cap_random(app->priv, value, length))
+				goto err_release_free;
+			break;
+		case NFP_BPF_CAP_TYPE_QUEUE_SELECT:
+			if (nfp_bpf_parse_cap_qsel(app->priv, value, length))
 				goto err_release_free;
 			break;
 		default:
drivers/net/ethernet/netronome/nfp/bpf/main.h (+8):

 enum pkt_vec {
 	PKT_VEC_PKT_LEN		= 0,
 	PKT_VEC_PKT_PTR		= 2,
+	PKT_VEC_QSEL_SET	= 4,
+	PKT_VEC_QSEL_VAL	= 6,
 };
+
+#define PKT_VEL_QSEL_SET_BIT	4
 
 #define pv_len(np)	reg_lm(1, PKT_VEC_PKT_LEN)
 #define pv_ctm_ptr(np)	reg_lm(1, PKT_VEC_PKT_PTR)
+#define pv_qsel_set(np)	reg_lm(1, PKT_VEC_QSEL_SET)
+#define pv_qsel_val(np)	reg_lm(1, PKT_VEC_QSEL_VAL)
 
 #define stack_reg(np)	reg_a(STATIC_REG_STACK)
 #define stack_imm(np)	imm_b(np)
[...]
  * @helpers.perf_event_output:	output perf event to a ring buffer
  *
  * @pseudo_random:	FW initialized the pseudo-random machinery (CSRs)
+ * @queue_select:	BPF can set the RX queue ID in packet vector
  */
 struct nfp_app_bpf {
 	struct nfp_app *app;
[...]
 	} helpers;
 
 	bool pseudo_random;
+	bool queue_select;
 };
 
 enum nfp_bpf_map_use {
drivers/net/ethernet/netronome/nfp/bpf/verifier.c (+26 -2):

 }
 
 static int
+nfp_bpf_check_store(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+		    struct bpf_verifier_env *env)
+{
+	const struct bpf_reg_state *reg = cur_regs(env) + meta->insn.dst_reg;
+
+	if (reg->type == PTR_TO_CTX) {
+		if (nfp_prog->type == BPF_PROG_TYPE_XDP) {
+			/* XDP ctx accesses must be 4B in size */
+			switch (meta->insn.off) {
+			case offsetof(struct xdp_md, rx_queue_index):
+				if (nfp_prog->bpf->queue_select)
+					goto exit_check_ptr;
+				pr_vlog(env, "queue selection not supported by FW\n");
+				return -EOPNOTSUPP;
+			}
+		}
+		pr_vlog(env, "unsupported store to context field\n");
+		return -EOPNOTSUPP;
+	}
+exit_check_ptr:
+	return nfp_bpf_check_ptr(nfp_prog, meta, env, meta->insn.dst_reg);
+}
+
+static int
 nfp_bpf_check_xadd(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 		   struct bpf_verifier_env *env)
 {
[...]
 		return nfp_bpf_check_ptr(nfp_prog, meta, env,
 					 meta->insn.src_reg);
 	if (is_mbpf_store(meta))
-		return nfp_bpf_check_ptr(nfp_prog, meta, env,
-					 meta->insn.dst_reg);
+		return nfp_bpf_check_store(nfp_prog, meta, env);
+
 	if (is_mbpf_xadd(meta))
 		return nfp_bpf_check_xadd(nfp_prog, meta, env);
+12 -10
drivers/net/ethernet/netronome/nfp/nfp_asm.h
···
 183  183  #define OP_ALU_DST_LMEXTN	0x80000000000ULL
 184  184  
 185  185  enum alu_op {
 186      -	ALU_OP_NONE	= 0x00,
 187      -	ALU_OP_ADD	= 0x01,
 188      -	ALU_OP_NOT	= 0x04,
 189      -	ALU_OP_ADD_2B	= 0x05,
 190      -	ALU_OP_AND	= 0x08,
 191      -	ALU_OP_SUB_C	= 0x0d,
 192      -	ALU_OP_ADD_C	= 0x11,
 193      -	ALU_OP_OR	= 0x14,
 194      -	ALU_OP_SUB	= 0x15,
 195      -	ALU_OP_XOR	= 0x18,
      186 +	ALU_OP_NONE		= 0x00,
      187 +	ALU_OP_ADD		= 0x01,
      188 +	ALU_OP_NOT		= 0x04,
      189 +	ALU_OP_ADD_2B		= 0x05,
      190 +	ALU_OP_AND		= 0x08,
      191 +	ALU_OP_AND_NOT_A	= 0x0c,
      192 +	ALU_OP_SUB_C		= 0x0d,
      193 +	ALU_OP_AND_NOT_B	= 0x10,
      194 +	ALU_OP_ADD_C		= 0x11,
      195 +	ALU_OP_OR		= 0x14,
      196 +	ALU_OP_SUB		= 0x15,
      197 +	ALU_OP_XOR		= 0x18,
 196  198  };
 197  199  
 198  200  enum alu_dst_ab {
+9 -1
include/linux/bpf.h
··· 627 627 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL) 628 628 int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr); 629 629 630 - static inline bool bpf_prog_is_dev_bound(struct bpf_prog_aux *aux) 630 + static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux) 631 631 { 632 632 return aux->offload_requested; 633 633 } ··· 668 668 669 669 #if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_INET) 670 670 struct sock *__sock_map_lookup_elem(struct bpf_map *map, u32 key); 671 + struct sock *__sock_hash_lookup_elem(struct bpf_map *map, void *key); 671 672 int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type); 672 673 #else 673 674 static inline struct sock *__sock_map_lookup_elem(struct bpf_map *map, u32 key) 675 + { 676 + return NULL; 677 + } 678 + 679 + static inline struct sock *__sock_hash_lookup_elem(struct bpf_map *map, 680 + void *key) 674 681 { 675 682 return NULL; 676 683 } ··· 731 724 extern const struct bpf_func_proto bpf_get_stackid_proto; 732 725 extern const struct bpf_func_proto bpf_get_stack_proto; 733 726 extern const struct bpf_func_proto bpf_sock_map_update_proto; 727 + extern const struct bpf_func_proto bpf_sock_hash_update_proto; 734 728 735 729 /* Shared helpers among cBPF and eBPF. */ 736 730 void bpf_user_rnd_init_once(void);
+1
include/linux/bpf_types.h
···
  47   47  BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)
  48   48  #if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_INET)
  49   49  BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
       50 +BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops)
  50   51  #endif
  51   52  BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
  52   53  #if defined(CONFIG_XDP_SOCKETS)
+2 -2
include/linux/bpf_verifier.h
···
 200  200  	u32 subprog_cnt;
 201  201  };
 202  202  
 203      -void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
 204      -		       va_list args);
      203 +__printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
      204 +				      const char *fmt, va_list args);
 205  205  __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
 206  206  					   const char *fmt, ...);
 207  207  
+2
include/linux/btf.h
···
  44   44  		   u32 *ret_size);
  45   45  void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
  46   46  		       struct seq_file *m);
       47 +int btf_get_fd_by_id(u32 id);
       48 +u32 btf_id(const struct btf *btf);
  47   49  
  48   50  #endif
+1 -2
include/linux/filter.h
···
 515  515  	int sg_end;
 516  516  	struct scatterlist sg_data[MAX_SKB_FRAGS];
 517  517  	bool sg_copy[MAX_SKB_FRAGS];
 518      -	__u32 key;
 519  518  	__u32 flags;
 520      -	struct bpf_map *map;
      519 +	struct sock *sk_redir;
 521  520  	struct sk_buff *skb;
 522  521  	struct list_head list;
 523  522  };
+14
include/net/addrconf.h
··· 223 223 const struct in6_addr *addr); 224 224 int (*ipv6_dst_lookup)(struct net *net, struct sock *sk, 225 225 struct dst_entry **dst, struct flowi6 *fl6); 226 + 227 + struct fib6_table *(*fib6_get_table)(struct net *net, u32 id); 228 + struct fib6_info *(*fib6_lookup)(struct net *net, int oif, 229 + struct flowi6 *fl6, int flags); 230 + struct fib6_info *(*fib6_table_lookup)(struct net *net, 231 + struct fib6_table *table, 232 + int oif, struct flowi6 *fl6, 233 + int flags); 234 + struct fib6_info *(*fib6_multipath_select)(const struct net *net, 235 + struct fib6_info *f6i, 236 + struct flowi6 *fl6, int oif, 237 + const struct sk_buff *skb, 238 + int strict); 239 + 226 240 void (*udpv6_encap_enable)(void); 227 241 void (*ndisc_send_na)(struct net_device *dev, const struct in6_addr *daddr, 228 242 const struct in6_addr *solicited_addr,
+18 -3
include/net/ip6_fib.h
···
 376  376  			       const struct sk_buff *skb,
 377  377  			       int flags, pol_lookup_t lookup);
 378  378  
 379      -struct fib6_node *fib6_lookup(struct fib6_node *root,
 380      -			      const struct in6_addr *daddr,
 381      -			      const struct in6_addr *saddr);
      379 +/* called with rcu lock held; can return error pointer
      380 + * caller needs to select path
      381 + */
      382 +struct fib6_info *fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
      383 +			      int flags);
      384 +
      385 +/* called with rcu lock held; caller needs to select path */
      386 +struct fib6_info *fib6_table_lookup(struct net *net, struct fib6_table *table,
      387 +				    int oif, struct flowi6 *fl6, int strict);
      388 +
      389 +struct fib6_info *fib6_multipath_select(const struct net *net,
      390 +					struct fib6_info *match,
      391 +					struct flowi6 *fl6, int oif,
      392 +					const struct sk_buff *skb, int strict);
      393 +
      394 +struct fib6_node *fib6_node_lookup(struct fib6_node *root,
      395 +				   const struct in6_addr *daddr,
      396 +				   const struct in6_addr *saddr);
 382  397  
 383  398  struct fib6_node *fib6_locate(struct fib6_node *root,
 384  399  			      const struct in6_addr *daddr, int dst_len,
+1 -2
include/net/tcp.h
···
 816  816  #endif
 817  817  	} header;	/* For incoming skbs */
 818  818  	struct {
 819      -		__u32 key;
 820  819  		__u32 flags;
 821      -		struct bpf_map *map;
      820 +		struct sock *sk_redir;
 822  821  		void *data_end;
 823  822  	} bpf;
 824  823  };
+7 -7
include/trace/events/fib6.h
···
  12   12  
  13   13  TRACE_EVENT(fib6_table_lookup,
  14   14  
  15      -	TP_PROTO(const struct net *net, const struct rt6_info *rt,
       15 +	TP_PROTO(const struct net *net, const struct fib6_info *f6i,
  16   16  		 struct fib6_table *table, const struct flowi6 *flp),
  17   17  
  18      -	TP_ARGS(net, rt, table, flp),
       18 +	TP_ARGS(net, f6i, table, flp),
  19   19  
  20   20  	TP_STRUCT__entry(
  21   21  		__field(	u32,	tb_id	)
···
  48   48  		in6 = (struct in6_addr *)__entry->dst;
  49   49  		*in6 = flp->daddr;
  50   50  
  51      -		if (rt->rt6i_idev) {
  52      -			__assign_str(name, rt->rt6i_idev->dev->name);
       51 +		if (f6i->fib6_nh.nh_dev) {
       52 +			__assign_str(name, f6i->fib6_nh.nh_dev->name);
  53   53  		} else {
  54   54  			__assign_str(name, "");
  55   55  		}
  56      -		if (rt == net->ipv6.ip6_null_entry) {
       56 +		if (f6i == net->ipv6.fib6_null_entry) {
  57   57  			struct in6_addr in6_zero = {};
  58   58  
  59   59  			in6 = (struct in6_addr *)__entry->gw;
  60   60  			*in6 = in6_zero;
  61   61  
  62      -		} else if (rt) {
       62 +		} else if (f6i) {
  63   63  			in6 = (struct in6_addr *)__entry->gw;
  64      -			*in6 = rt->rt6i_gateway;
       64 +			*in6 = f6i->fib6_nh.nh_gw;
  65   65  		}
  66   66  	),
  67   67  
+141 -1
include/uapi/linux/bpf.h
···
   96    96  	BPF_PROG_QUERY,
   97    97  	BPF_RAW_TRACEPOINT_OPEN,
   98    98  	BPF_BTF_LOAD,
         99 +	BPF_BTF_GET_FD_BY_ID,
   99   100  };
  100   101  
  101   102  enum bpf_map_type {
···
  118   117  	BPF_MAP_TYPE_SOCKMAP,
  119   118  	BPF_MAP_TYPE_CPUMAP,
  120   119  	BPF_MAP_TYPE_XSKMAP,
        120 +	BPF_MAP_TYPE_SOCKHASH,
  121   121  };
  122   122  
  123   123  enum bpf_prog_type {
···
  346   344  		__u32		start_id;
  347   345  		__u32		prog_id;
  348   346  		__u32		map_id;
        347 +		__u32		btf_id;
  349   348  	};
  350   349  	__u32		next_id;
  351   350  	__u32		open_flags;
···
 1829  1826   *	Return
 1830  1827   *		0 on success, or a negative error in case of failure.
 1831  1828   *
       1829 + * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 flags)
       1830 + *	Description
       1831 + *		Do FIB lookup in kernel tables using parameters in *params*.
       1832 + *		If lookup is successful and result shows packet is to be
       1833 + *		forwarded, the neighbor tables are searched for the nexthop.
       1834 + *		If successful (i.e., FIB lookup shows forwarding and nexthop
       1835 + *		is resolved), the nexthop address is returned in ipv4_dst,
       1836 + *		ipv6_dst or mpls_out based on family, smac is set to mac
       1837 + *		address of egress device, dmac is set to nexthop mac address,
       1838 + *		rt_metric is set to metric from route.
       1839 + *
       1840 + *		*plen* argument is the size of the passed in struct.
       1841 + *		*flags* argument can be one or more BPF_FIB_LOOKUP_ flags:
       1842 + *
       1843 + *		**BPF_FIB_LOOKUP_DIRECT** means do a direct table lookup vs
       1844 + *		full lookup using FIB rules
       1845 + *		**BPF_FIB_LOOKUP_OUTPUT** means do lookup from an egress
       1846 + *		perspective (default is ingress)
       1847 + *
       1848 + *		*ctx* is either **struct xdp_md** for XDP programs or
       1849 + *		**struct sk_buff** for tc cls_act programs.
       1850 + *
       1851 + *	Return
       1852 + *		Egress device index on success, 0 if packet needs to continue
       1853 + *		up the stack for further processing or a negative error in case
       1854 + *		of failure.
       1855 + *
       1856 + * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
       1857 + *	Description
       1858 + *		Add an entry to, or update a sockhash *map* referencing sockets.
       1859 + *		The *skops* is used as a new value for the entry associated to
       1860 + *		*key*. *flags* is one of:
       1861 + *
       1862 + *		**BPF_NOEXIST**
       1863 + *			The entry for *key* must not exist in the map.
       1864 + *		**BPF_EXIST**
       1865 + *			The entry for *key* must already exist in the map.
       1866 + *		**BPF_ANY**
       1867 + *			No condition on the existence of the entry for *key*.
       1868 + *
       1869 + *		If the *map* has eBPF programs (parser and verdict), those will
       1870 + *		be inherited by the socket being added. If the socket is
       1871 + *		already attached to eBPF programs, this results in an error.
       1872 + *	Return
       1873 + *		0 on success, or a negative error in case of failure.
       1874 + *
       1875 + * int bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map *map, void *key, u64 flags)
       1876 + *	Description
       1877 + *		This helper is used in programs implementing policies at the
       1878 + *		socket level. If the message *msg* is allowed to pass (i.e. if
       1879 + *		the verdict eBPF program returns **SK_PASS**), redirect it to
       1880 + *		the socket referenced by *map* (of type
       1881 + *		**BPF_MAP_TYPE_SOCKHASH**) using hash *key*. Both ingress and
       1882 + *		egress interfaces can be used for redirection. The
       1883 + *		**BPF_F_INGRESS** value in *flags* is used to make the
       1884 + *		distinction (ingress path is selected if the flag is present,
       1885 + *		egress path otherwise). This is the only flag supported for now.
       1886 + *	Return
       1887 + *		**SK_PASS** on success, or **SK_DROP** on error.
       1888 + *
       1889 + * int bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map, void *key, u64 flags)
       1890 + *	Description
       1891 + *		This helper is used in programs implementing policies at the
       1892 + *		skb socket level. If the sk_buff *skb* is allowed to pass (i.e.
       1893 + *		if the verdict eBPF program returns **SK_PASS**), redirect it
       1894 + *		to the socket referenced by *map* (of type
       1895 + *		**BPF_MAP_TYPE_SOCKHASH**) using hash *key*. Both ingress and
       1896 + *		egress interfaces can be used for redirection. The
       1897 + *		**BPF_F_INGRESS** value in *flags* is used to make the
       1898 + *		distinction (ingress path is selected if the flag is present,
       1899 + *		egress otherwise). This is the only flag supported for now.
       1900 + *	Return
       1901 + *		**SK_PASS** on success, or **SK_DROP** on error.
 1832  1902   */
 1833  1903  #define __BPF_FUNC_MAPPER(FN)		\
 1834  1904  	FN(unspec),			\
···
 1972  1896  	FN(xdp_adjust_tail),		\
 1973  1897  	FN(skb_get_xfrm_state),		\
 1974  1898  	FN(get_stack),			\
 1975       -	FN(skb_load_bytes_relative),
       1899 +	FN(skb_load_bytes_relative),	\
       1900 +	FN(fib_lookup),			\
       1901 +	FN(sock_hash_update),		\
       1902 +	FN(msg_redirect_hash),		\
       1903 +	FN(sk_redirect_hash),
 1976  1904  
 1977  1905  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
 1978  1906   * function eBPF program intends to call
···
 2210  2130  	__u32 ifindex;
 2211  2131  	__u64 netns_dev;
 2212  2132  	__u64 netns_ino;
       2133 +	__u32 btf_id;
       2134 +	__u32 btf_key_id;
       2135 +	__u32 btf_value_id;
       2136 +} __attribute__((aligned(8)));
       2137 +
       2138 +struct bpf_btf_info {
       2139 +	__aligned_u64 btf;
       2140 +	__u32 btf_size;
       2141 +	__u32 id;
 2213  2142  } __attribute__((aligned(8)));
 2214  2143  
 2215  2144  /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
···
 2397  2308  
 2398  2309  struct bpf_raw_tracepoint_args {
 2399  2310  	__u64 args[0];
       2311 +};
       2312 +
       2313 +/* DIRECT: Skip the FIB rules and go to FIB table associated with device
       2314 + * OUTPUT: Do lookup from egress perspective; default is ingress
       2315 + */
       2316 +#define BPF_FIB_LOOKUP_DIRECT  BIT(0)
       2317 +#define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
       2318 +
       2319 +struct bpf_fib_lookup {
       2320 +	/* input */
       2321 +	__u8	family;   /* network family, AF_INET, AF_INET6, AF_MPLS */
       2322 +
       2323 +	/* set if lookup is to consider L4 data - e.g., FIB rules */
       2324 +	__u8	l4_protocol;
       2325 +	__be16	sport;
       2326 +	__be16	dport;
       2327 +
       2328 +	/* total length of packet from network header - used for MTU check */
       2329 +	__u16	tot_len;
       2330 +	__u32	ifindex;  /* L3 device index for lookup */
       2331 +
       2332 +	union {
       2333 +		/* inputs to lookup */
       2334 +		__u8	tos;		/* AF_INET */
       2335 +		__be32	flowlabel;	/* AF_INET6 */
       2336 +
       2337 +		/* output: metric of fib result */
       2338 +		__u32	rt_metric;
       2339 +	};
       2340 +
       2341 +	union {
       2342 +		__be32		mpls_in;
       2343 +		__be32		ipv4_src;
       2344 +		__u32		ipv6_src[4];  /* in6_addr; network order */
       2345 +	};
       2346 +
       2347 +	/* input to bpf_fib_lookup, *dst is destination address.
       2348 +	 * output: bpf_fib_lookup sets to gateway address
       2349 +	 */
       2350 +	union {
       2351 +		/* return for MPLS lookups */
       2352 +		__be32		mpls_out[4];  /* support up to 4 labels */
       2353 +		__be32		ipv4_dst;
       2354 +		__u32		ipv6_dst[4];  /* in6_addr; network order */
       2355 +	};
       2356 +
       2357 +	/* output */
       2358 +	__be16	h_vlan_proto;
       2359 +	__be16	h_vlan_TCI;
       2360 +	__u8	smac[6];     /* ETH_ALEN */
       2361 +	__u8	dmac[6];     /* ETH_ALEN */
 2400  2362  };
 2401  2363  
 2402  2364  #endif /* _UAPI__LINUX_BPF_H__ */
+1
init/Kconfig
···
 1391  1391  	bool "Enable bpf() system call"
 1392  1392  	select ANON_INODES
 1393  1393  	select BPF
       1394 +	select IRQ_WORK
 1394  1395  	default n
 1395  1396  	help
 1396  1397  	  Enable the bpf() system call that allows to manipulate eBPF
+120 -16
kernel/bpf/btf.c
··· 11 11 #include <linux/file.h> 12 12 #include <linux/uaccess.h> 13 13 #include <linux/kernel.h> 14 + #include <linux/idr.h> 14 15 #include <linux/bpf_verifier.h> 15 16 #include <linux/btf.h> 16 17 ··· 180 179 i < btf_type_vlen(struct_type); \ 181 180 i++, member++) 182 181 182 + static DEFINE_IDR(btf_idr); 183 + static DEFINE_SPINLOCK(btf_idr_lock); 184 + 183 185 struct btf { 184 186 union { 185 187 struct btf_header *hdr; ··· 197 193 u32 types_size; 198 194 u32 data_size; 199 195 refcount_t refcnt; 196 + u32 id; 197 + struct rcu_head rcu; 200 198 }; 201 199 202 200 enum verifier_phase { ··· 604 598 return 0; 605 599 } 606 600 601 + static int btf_alloc_id(struct btf *btf) 602 + { 603 + int id; 604 + 605 + idr_preload(GFP_KERNEL); 606 + spin_lock_bh(&btf_idr_lock); 607 + id = idr_alloc_cyclic(&btf_idr, btf, 1, INT_MAX, GFP_ATOMIC); 608 + if (id > 0) 609 + btf->id = id; 610 + spin_unlock_bh(&btf_idr_lock); 611 + idr_preload_end(); 612 + 613 + if (WARN_ON_ONCE(!id)) 614 + return -ENOSPC; 615 + 616 + return id > 0 ? 0 : id; 617 + } 618 + 619 + static void btf_free_id(struct btf *btf) 620 + { 621 + unsigned long flags; 622 + 623 + /* 624 + * In map-in-map, calling map_delete_elem() on outer 625 + * map will call bpf_map_put on the inner map. 626 + * It will then eventually call btf_free_id() 627 + * on the inner map. Some of the map_delete_elem() 628 + * implementation may have irq disabled, so 629 + * we need to use the _irqsave() version instead 630 + * of the _bh() version. 
631 + */ 632 + spin_lock_irqsave(&btf_idr_lock, flags); 633 + idr_remove(&btf_idr, btf->id); 634 + spin_unlock_irqrestore(&btf_idr_lock, flags); 635 + } 636 + 607 637 static void btf_free(struct btf *btf) 608 638 { 609 639 kvfree(btf->types); ··· 649 607 kfree(btf); 650 608 } 651 609 652 - static void btf_get(struct btf *btf) 610 + static void btf_free_rcu(struct rcu_head *rcu) 653 611 { 654 - refcount_inc(&btf->refcnt); 612 + struct btf *btf = container_of(rcu, struct btf, rcu); 613 + 614 + btf_free(btf); 655 615 } 656 616 657 617 void btf_put(struct btf *btf) 658 618 { 659 - if (btf && refcount_dec_and_test(&btf->refcnt)) 660 - btf_free(btf); 619 + if (btf && refcount_dec_and_test(&btf->refcnt)) { 620 + btf_free_id(btf); 621 + call_rcu(&btf->rcu, btf_free_rcu); 622 + } 661 623 } 662 624 663 625 static int env_resolve_init(struct btf_verifier_env *env) ··· 2023 1977 2024 1978 if (!err) { 2025 1979 btf_verifier_env_free(env); 2026 - btf_get(btf); 1980 + refcount_set(&btf->refcnt, 1); 2027 1981 return btf; 2028 1982 } 2029 1983 ··· 2052 2006 .release = btf_release, 2053 2007 }; 2054 2008 2009 + static int __btf_new_fd(struct btf *btf) 2010 + { 2011 + return anon_inode_getfd("btf", &btf_fops, btf, O_RDONLY | O_CLOEXEC); 2012 + } 2013 + 2055 2014 int btf_new_fd(const union bpf_attr *attr) 2056 2015 { 2057 2016 struct btf *btf; 2058 - int fd; 2017 + int ret; 2059 2018 2060 2019 btf = btf_parse(u64_to_user_ptr(attr->btf), 2061 2020 attr->btf_size, attr->btf_log_level, ··· 2069 2018 if (IS_ERR(btf)) 2070 2019 return PTR_ERR(btf); 2071 2020 2072 - fd = anon_inode_getfd("btf", &btf_fops, btf, 2073 - O_RDONLY | O_CLOEXEC); 2074 - if (fd < 0) 2021 + ret = btf_alloc_id(btf); 2022 + if (ret) { 2023 + btf_free(btf); 2024 + return ret; 2025 + } 2026 + 2027 + /* 2028 + * The BTF ID is published to the userspace. 2029 + * All BTF free must go through call_rcu() from 2030 + * now on (i.e. free by calling btf_put()). 
2031 + */ 2032 + 2033 + ret = __btf_new_fd(btf); 2034 + if (ret < 0) 2075 2035 btf_put(btf); 2076 2036 2077 - return fd; 2037 + return ret; 2078 2038 } 2079 2039 2080 2040 struct btf *btf_get_by_fd(int fd) ··· 2104 2042 } 2105 2043 2106 2044 btf = f.file->private_data; 2107 - btf_get(btf); 2045 + refcount_inc(&btf->refcnt); 2108 2046 fdput(f); 2109 2047 2110 2048 return btf; ··· 2114 2052 const union bpf_attr *attr, 2115 2053 union bpf_attr __user *uattr) 2116 2054 { 2117 - void __user *udata = u64_to_user_ptr(attr->info.info); 2118 - u32 copy_len = min_t(u32, btf->data_size, 2119 - attr->info.info_len); 2055 + struct bpf_btf_info __user *uinfo; 2056 + struct bpf_btf_info info = {}; 2057 + u32 info_copy, btf_copy; 2058 + void __user *ubtf; 2059 + u32 uinfo_len; 2120 2060 2121 - if (copy_to_user(udata, btf->data, copy_len) || 2122 - put_user(btf->data_size, &uattr->info.info_len)) 2061 + uinfo = u64_to_user_ptr(attr->info.info); 2062 + uinfo_len = attr->info.info_len; 2063 + 2064 + info_copy = min_t(u32, uinfo_len, sizeof(info)); 2065 + if (copy_from_user(&info, uinfo, info_copy)) 2066 + return -EFAULT; 2067 + 2068 + info.id = btf->id; 2069 + ubtf = u64_to_user_ptr(info.btf); 2070 + btf_copy = min_t(u32, btf->data_size, info.btf_size); 2071 + if (copy_to_user(ubtf, btf->data, btf_copy)) 2072 + return -EFAULT; 2073 + info.btf_size = btf->data_size; 2074 + 2075 + if (copy_to_user(uinfo, &info, info_copy) || 2076 + put_user(info_copy, &uattr->info.info_len)) 2123 2077 return -EFAULT; 2124 2078 2125 2079 return 0; 2080 + } 2081 + 2082 + int btf_get_fd_by_id(u32 id) 2083 + { 2084 + struct btf *btf; 2085 + int fd; 2086 + 2087 + rcu_read_lock(); 2088 + btf = idr_find(&btf_idr, id); 2089 + if (!btf || !refcount_inc_not_zero(&btf->refcnt)) 2090 + btf = ERR_PTR(-ENOENT); 2091 + rcu_read_unlock(); 2092 + 2093 + if (IS_ERR(btf)) 2094 + return PTR_ERR(btf); 2095 + 2096 + fd = __btf_new_fd(btf); 2097 + if (fd < 0) 2098 + btf_put(btf); 2099 + 2100 + return fd; 2101 + } 2102 + 2103 
+ u32 btf_id(const struct btf *btf) 2104 + { 2105 + return btf->id; 2126 2106 }
+1
kernel/bpf/core.c
···
 1707  1707  const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 1708  1708  const struct bpf_func_proto bpf_get_current_comm_proto __weak;
 1709  1709  const struct bpf_func_proto bpf_sock_map_update_proto __weak;
       1710 +const struct bpf_func_proto bpf_sock_hash_update_proto __weak;
 1710  1711  
 1711  1712  const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 1712  1713  {
+571 -73
kernel/bpf/sockmap.c
··· 48 48 #define SOCK_CREATE_FLAG_MASK \ 49 49 (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY) 50 50 51 - struct bpf_stab { 52 - struct bpf_map map; 53 - struct sock **sock_map; 51 + struct bpf_sock_progs { 54 52 struct bpf_prog *bpf_tx_msg; 55 53 struct bpf_prog *bpf_parse; 56 54 struct bpf_prog *bpf_verdict; 55 + }; 56 + 57 + struct bpf_stab { 58 + struct bpf_map map; 59 + struct sock **sock_map; 60 + struct bpf_sock_progs progs; 61 + }; 62 + 63 + struct bucket { 64 + struct hlist_head head; 65 + raw_spinlock_t lock; 66 + }; 67 + 68 + struct bpf_htab { 69 + struct bpf_map map; 70 + struct bucket *buckets; 71 + atomic_t count; 72 + u32 n_buckets; 73 + u32 elem_size; 74 + struct bpf_sock_progs progs; 75 + }; 76 + 77 + struct htab_elem { 78 + struct rcu_head rcu; 79 + struct hlist_node hash_node; 80 + u32 hash; 81 + struct sock *sk; 82 + char key[0]; 57 83 }; 58 84 59 85 enum smap_psock_state { ··· 89 63 struct smap_psock_map_entry { 90 64 struct list_head list; 91 65 struct sock **entry; 66 + struct htab_elem *hash_link; 67 + struct bpf_htab *htab; 92 68 }; 93 69 94 70 struct smap_psock { ··· 219 191 rcu_read_unlock(); 220 192 } 221 193 194 + static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l) 195 + { 196 + atomic_dec(&htab->count); 197 + kfree_rcu(l, rcu); 198 + } 199 + 222 200 static void bpf_tcp_close(struct sock *sk, long timeout) 223 201 { 224 202 void (*close_fun)(struct sock *sk, long timeout); ··· 261 227 } 262 228 263 229 list_for_each_entry_safe(e, tmp, &psock->maps, list) { 264 - osk = cmpxchg(e->entry, sk, NULL); 265 - if (osk == sk) { 266 - list_del(&e->list); 267 - smap_release_sock(psock, sk); 230 + if (e->entry) { 231 + osk = cmpxchg(e->entry, sk, NULL); 232 + if (osk == sk) { 233 + list_del(&e->list); 234 + smap_release_sock(psock, sk); 235 + } 236 + } else { 237 + hlist_del_rcu(&e->hash_link->hash_node); 238 + smap_release_sock(psock, e->hash_link->sk); 239 + free_htab_elem(e->htab, e->hash_link); 268 240 } 269 241 } 270 242 
write_unlock_bh(&sk->sk_callback_lock); ··· 501 461 static int bpf_map_msg_verdict(int _rc, struct sk_msg_buff *md) 502 462 { 503 463 return ((_rc == SK_PASS) ? 504 - (md->map ? __SK_REDIRECT : __SK_PASS) : 464 + (md->sk_redir ? __SK_REDIRECT : __SK_PASS) : 505 465 __SK_DROP); 506 466 } 507 467 ··· 1132 1092 * when we orphan the skb so that we don't have the possibility 1133 1093 * to reference a stale map. 1134 1094 */ 1135 - TCP_SKB_CB(skb)->bpf.map = NULL; 1095 + TCP_SKB_CB(skb)->bpf.sk_redir = NULL; 1136 1096 skb->sk = psock->sock; 1137 1097 bpf_compute_data_pointers(skb); 1138 1098 preempt_disable(); ··· 1142 1102 1143 1103 /* Moving return codes from UAPI namespace into internal namespace */ 1144 1104 return rc == SK_PASS ? 1145 - (TCP_SKB_CB(skb)->bpf.map ? __SK_REDIRECT : __SK_PASS) : 1105 + (TCP_SKB_CB(skb)->bpf.sk_redir ? __SK_REDIRECT : __SK_PASS) : 1146 1106 __SK_DROP; 1147 1107 } 1148 1108 ··· 1412 1372 } 1413 1373 1414 1374 static void smap_init_progs(struct smap_psock *psock, 1415 - struct bpf_stab *stab, 1416 1375 struct bpf_prog *verdict, 1417 1376 struct bpf_prog *parse) 1418 1377 { ··· 1489 1450 kfree(psock); 1490 1451 } 1491 1452 1492 - static struct smap_psock *smap_init_psock(struct sock *sock, 1493 - struct bpf_stab *stab) 1453 + static struct smap_psock *smap_init_psock(struct sock *sock, int node) 1494 1454 { 1495 1455 struct smap_psock *psock; 1496 1456 1497 1457 psock = kzalloc_node(sizeof(struct smap_psock), 1498 1458 GFP_ATOMIC | __GFP_NOWARN, 1499 - stab->map.numa_node); 1459 + node); 1500 1460 if (!psock) 1501 1461 return ERR_PTR(-ENOMEM); 1502 1462 ··· 1563 1525 return ERR_PTR(err); 1564 1526 } 1565 1527 1566 - static void smap_list_remove(struct smap_psock *psock, struct sock **entry) 1528 + static void smap_list_remove(struct smap_psock *psock, 1529 + struct sock **entry, 1530 + struct htab_elem *hash_link) 1567 1531 { 1568 1532 struct smap_psock_map_entry *e, *tmp; 1569 1533 1570 1534 list_for_each_entry_safe(e, tmp, &psock->maps, 
list) { 1571 - if (e->entry == entry) { 1535 + if (e->entry == entry || e->hash_link == hash_link) { 1572 1536 list_del(&e->list); 1573 1537 break; 1574 1538 } ··· 1608 1568 * to be null and queued for garbage collection. 1609 1569 */ 1610 1570 if (likely(psock)) { 1611 - smap_list_remove(psock, &stab->sock_map[i]); 1571 + smap_list_remove(psock, &stab->sock_map[i], NULL); 1612 1572 smap_release_sock(psock, sock); 1613 1573 } 1614 1574 write_unlock_bh(&sock->sk_callback_lock); ··· 1667 1627 1668 1628 if (psock->bpf_parse) 1669 1629 smap_stop_sock(psock, sock); 1670 - smap_list_remove(psock, &stab->sock_map[k]); 1630 + smap_list_remove(psock, &stab->sock_map[k], NULL); 1671 1631 smap_release_sock(psock, sock); 1672 1632 out: 1673 1633 write_unlock_bh(&sock->sk_callback_lock); ··· 1702 1662 * - sock_map must use READ_ONCE and (cmp)xchg operations 1703 1663 * - BPF verdict/parse programs must use READ_ONCE and xchg operations 1704 1664 */ 1705 - static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops, 1706 - struct bpf_map *map, 1707 - void *key, u64 flags) 1665 + 1666 + static int __sock_map_ctx_update_elem(struct bpf_map *map, 1667 + struct bpf_sock_progs *progs, 1668 + struct sock *sock, 1669 + struct sock **map_link, 1670 + void *key) 1708 1671 { 1709 - struct bpf_stab *stab = container_of(map, struct bpf_stab, map); 1710 - struct smap_psock_map_entry *e = NULL; 1711 1672 struct bpf_prog *verdict, *parse, *tx_msg; 1712 - struct sock *osock, *sock; 1673 + struct smap_psock_map_entry *e = NULL; 1713 1674 struct smap_psock *psock; 1714 - u32 i = *(u32 *)key; 1715 1675 bool new = false; 1716 1676 int err; 1717 - 1718 - if (unlikely(flags > BPF_EXIST)) 1719 - return -EINVAL; 1720 - 1721 - if (unlikely(i >= stab->map.max_entries)) 1722 - return -E2BIG; 1723 - 1724 - sock = READ_ONCE(stab->sock_map[i]); 1725 - if (flags == BPF_EXIST && !sock) 1726 - return -ENOENT; 1727 - else if (flags == BPF_NOEXIST && sock) 1728 - return -EEXIST; 1729 - 1730 - sock = 
skops->sk; 1731 1677 1732 1678 /* 1. If sock map has BPF programs those will be inherited by the 1733 1679 * sock being added. If the sock is already attached to BPF programs 1734 1680 * this results in an error. 1735 1681 */ 1736 - verdict = READ_ONCE(stab->bpf_verdict); 1737 - parse = READ_ONCE(stab->bpf_parse); 1738 - tx_msg = READ_ONCE(stab->bpf_tx_msg); 1682 + verdict = READ_ONCE(progs->bpf_verdict); 1683 + parse = READ_ONCE(progs->bpf_parse); 1684 + tx_msg = READ_ONCE(progs->bpf_tx_msg); 1739 1685 1740 1686 if (parse && verdict) { 1741 1687 /* bpf prog refcnt may be zero if a concurrent attach operation ··· 1729 1703 * we increment the refcnt. If this is the case abort with an 1730 1704 * error. 1731 1705 */ 1732 - verdict = bpf_prog_inc_not_zero(stab->bpf_verdict); 1706 + verdict = bpf_prog_inc_not_zero(progs->bpf_verdict); 1733 1707 if (IS_ERR(verdict)) 1734 1708 return PTR_ERR(verdict); 1735 1709 1736 - parse = bpf_prog_inc_not_zero(stab->bpf_parse); 1710 + parse = bpf_prog_inc_not_zero(progs->bpf_parse); 1737 1711 if (IS_ERR(parse)) { 1738 1712 bpf_prog_put(verdict); 1739 1713 return PTR_ERR(parse); ··· 1741 1715 } 1742 1716 1743 1717 if (tx_msg) { 1744 - tx_msg = bpf_prog_inc_not_zero(stab->bpf_tx_msg); 1718 + tx_msg = bpf_prog_inc_not_zero(progs->bpf_tx_msg); 1745 1719 if (IS_ERR(tx_msg)) { 1746 1720 if (verdict) 1747 1721 bpf_prog_put(verdict); ··· 1774 1748 goto out_progs; 1775 1749 } 1776 1750 } else { 1777 - psock = smap_init_psock(sock, stab); 1751 + psock = smap_init_psock(sock, map->numa_node); 1778 1752 if (IS_ERR(psock)) { 1779 1753 err = PTR_ERR(psock); 1780 1754 goto out_progs; ··· 1784 1758 new = true; 1785 1759 } 1786 1760 1787 - e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN); 1788 - if (!e) { 1789 - err = -ENOMEM; 1790 - goto out_progs; 1761 + if (map_link) { 1762 + e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN); 1763 + if (!e) { 1764 + err = -ENOMEM; 1765 + goto out_progs; 1766 + } 1791 1767 } 1792 - e->entry = &stab->sock_map[i]; 
1793 1768 1794 1769 /* 3. At this point we have a reference to a valid psock that is 1795 1770 * running. Attach any BPF programs needed. ··· 1807 1780 err = smap_init_sock(psock, sock); 1808 1781 if (err) 1809 1782 goto out_free; 1810 - smap_init_progs(psock, stab, verdict, parse); 1783 + smap_init_progs(psock, verdict, parse); 1811 1784 smap_start_sock(psock, sock); 1812 1785 } 1813 1786 ··· 1816 1789 * it with. Because we can only have a single set of programs if 1817 1790 * old_sock has a strp we can stop it. 1818 1791 */ 1819 - list_add_tail(&e->list, &psock->maps); 1820 - write_unlock_bh(&sock->sk_callback_lock); 1821 - 1822 - osock = xchg(&stab->sock_map[i], sock); 1823 - if (osock) { 1824 - struct smap_psock *opsock = smap_psock_sk(osock); 1825 - 1826 - write_lock_bh(&osock->sk_callback_lock); 1827 - smap_list_remove(opsock, &stab->sock_map[i]); 1828 - smap_release_sock(opsock, osock); 1829 - write_unlock_bh(&osock->sk_callback_lock); 1792 + if (map_link) { 1793 + e->entry = map_link; 1794 + list_add_tail(&e->list, &psock->maps); 1830 1795 } 1831 - return 0; 1796 + write_unlock_bh(&sock->sk_callback_lock); 1797 + return err; 1832 1798 out_free: 1799 + kfree(e); 1833 1800 smap_release_sock(psock, sock); 1834 1801 out_progs: 1835 1802 if (verdict) ··· 1837 1816 return err; 1838 1817 } 1839 1818 1840 - int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type) 1819 + static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops, 1820 + struct bpf_map *map, 1821 + void *key, u64 flags) 1841 1822 { 1842 1823 struct bpf_stab *stab = container_of(map, struct bpf_stab, map); 1824 + struct bpf_sock_progs *progs = &stab->progs; 1825 + struct sock *osock, *sock; 1826 + u32 i = *(u32 *)key; 1827 + int err; 1828 + 1829 + if (unlikely(flags > BPF_EXIST)) 1830 + return -EINVAL; 1831 + 1832 + if (unlikely(i >= stab->map.max_entries)) 1833 + return -E2BIG; 1834 + 1835 + sock = READ_ONCE(stab->sock_map[i]); 1836 + if (flags == BPF_EXIST && !sock) 1837 + 
return -ENOENT; 1838 + else if (flags == BPF_NOEXIST && sock) 1839 + return -EEXIST; 1840 + 1841 + sock = skops->sk; 1842 + err = __sock_map_ctx_update_elem(map, progs, sock, &stab->sock_map[i], 1843 + key); 1844 + if (err) 1845 + goto out; 1846 + 1847 + osock = xchg(&stab->sock_map[i], sock); 1848 + if (osock) { 1849 + struct smap_psock *opsock = smap_psock_sk(osock); 1850 + 1851 + write_lock_bh(&osock->sk_callback_lock); 1852 + smap_list_remove(opsock, &stab->sock_map[i], NULL); 1853 + smap_release_sock(opsock, osock); 1854 + write_unlock_bh(&osock->sk_callback_lock); 1855 + } 1856 + out: 1857 + return err; 1858 + } 1859 + 1860 + int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type) 1861 + { 1862 + struct bpf_sock_progs *progs; 1843 1863 struct bpf_prog *orig; 1844 1864 1845 - if (unlikely(map->map_type != BPF_MAP_TYPE_SOCKMAP)) 1865 + if (map->map_type == BPF_MAP_TYPE_SOCKMAP) { 1866 + struct bpf_stab *stab = container_of(map, struct bpf_stab, map); 1867 + 1868 + progs = &stab->progs; 1869 + } else if (map->map_type == BPF_MAP_TYPE_SOCKHASH) { 1870 + struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 1871 + 1872 + progs = &htab->progs; 1873 + } else { 1846 1874 return -EINVAL; 1875 + } 1847 1876 1848 1877 switch (type) { 1849 1878 case BPF_SK_MSG_VERDICT: 1850 - orig = xchg(&stab->bpf_tx_msg, prog); 1879 + orig = xchg(&progs->bpf_tx_msg, prog); 1851 1880 break; 1852 1881 case BPF_SK_SKB_STREAM_PARSER: 1853 - orig = xchg(&stab->bpf_parse, prog); 1882 + orig = xchg(&progs->bpf_parse, prog); 1854 1883 break; 1855 1884 case BPF_SK_SKB_STREAM_VERDICT: 1856 - orig = xchg(&stab->bpf_verdict, prog); 1885 + orig = xchg(&progs->bpf_verdict, prog); 1857 1886 break; 1858 1887 default: 1859 1888 return -EOPNOTSUPP; ··· 1951 1880 1952 1881 static void sock_map_release(struct bpf_map *map) 1953 1882 { 1954 - struct bpf_stab *stab = container_of(map, struct bpf_stab, map); 1883 + struct bpf_sock_progs *progs; 1955 1884 struct bpf_prog *orig; 1956 
1885 1957 - orig = xchg(&stab->bpf_parse, NULL); 1886 + if (map->map_type == BPF_MAP_TYPE_SOCKMAP) { 1887 + struct bpf_stab *stab = container_of(map, struct bpf_stab, map); 1888 + 1889 + progs = &stab->progs; 1890 + } else { 1891 + struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 1892 + 1893 + progs = &htab->progs; 1894 + } 1895 + 1896 + orig = xchg(&progs->bpf_parse, NULL); 1958 1897 if (orig) 1959 1898 bpf_prog_put(orig); 1960 - orig = xchg(&stab->bpf_verdict, NULL); 1899 + orig = xchg(&progs->bpf_verdict, NULL); 1961 1900 if (orig) 1962 1901 bpf_prog_put(orig); 1963 1902 1964 - orig = xchg(&stab->bpf_tx_msg, NULL); 1903 + orig = xchg(&progs->bpf_tx_msg, NULL); 1965 1904 if (orig) 1966 1905 bpf_prog_put(orig); 1906 + } 1907 + 1908 + static struct bpf_map *sock_hash_alloc(union bpf_attr *attr) 1909 + { 1910 + struct bpf_htab *htab; 1911 + int i, err; 1912 + u64 cost; 1913 + 1914 + if (!capable(CAP_NET_ADMIN)) 1915 + return ERR_PTR(-EPERM); 1916 + 1917 + /* check sanity of attributes */ 1918 + if (attr->max_entries == 0 || attr->value_size != 4 || 1919 + attr->map_flags & ~SOCK_CREATE_FLAG_MASK) 1920 + return ERR_PTR(-EINVAL); 1921 + 1922 + if (attr->key_size > MAX_BPF_STACK) 1923 + /* eBPF programs initialize keys on stack, so they cannot be 1924 + * larger than max stack size 1925 + */ 1926 + return ERR_PTR(-E2BIG); 1927 + 1928 + err = bpf_tcp_ulp_register(); 1929 + if (err && err != -EEXIST) 1930 + return ERR_PTR(err); 1931 + 1932 + htab = kzalloc(sizeof(*htab), GFP_USER); 1933 + if (!htab) 1934 + return ERR_PTR(-ENOMEM); 1935 + 1936 + bpf_map_init_from_attr(&htab->map, attr); 1937 + 1938 + htab->n_buckets = roundup_pow_of_two(htab->map.max_entries); 1939 + htab->elem_size = sizeof(struct htab_elem) + 1940 + round_up(htab->map.key_size, 8); 1941 + err = -EINVAL; 1942 + if (htab->n_buckets == 0 || 1943 + htab->n_buckets > U32_MAX / sizeof(struct bucket)) 1944 + goto free_htab; 1945 + 1946 + cost = (u64) htab->n_buckets * sizeof(struct bucket) + 
1947 + (u64) htab->elem_size * htab->map.max_entries; 1948 + 1949 + if (cost >= U32_MAX - PAGE_SIZE) 1950 + goto free_htab; 1951 + 1952 + htab->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT; 1953 + err = bpf_map_precharge_memlock(htab->map.pages); 1954 + if (err) 1955 + goto free_htab; 1956 + 1957 + err = -ENOMEM; 1958 + htab->buckets = bpf_map_area_alloc( 1959 + htab->n_buckets * sizeof(struct bucket), 1960 + htab->map.numa_node); 1961 + if (!htab->buckets) 1962 + goto free_htab; 1963 + 1964 + for (i = 0; i < htab->n_buckets; i++) { 1965 + INIT_HLIST_HEAD(&htab->buckets[i].head); 1966 + raw_spin_lock_init(&htab->buckets[i].lock); 1967 + } 1968 + 1969 + return &htab->map; 1970 + free_htab: 1971 + kfree(htab); 1972 + return ERR_PTR(err); 1973 + } 1974 + 1975 + static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash) 1976 + { 1977 + return &htab->buckets[hash & (htab->n_buckets - 1)]; 1978 + } 1979 + 1980 + static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash) 1981 + { 1982 + return &__select_bucket(htab, hash)->head; 1983 + } 1984 + 1985 + static void sock_hash_free(struct bpf_map *map) 1986 + { 1987 + struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 1988 + int i; 1989 + 1990 + synchronize_rcu(); 1991 + 1992 + /* At this point no update, lookup or delete operations can happen. 1993 + * However, be aware we can still get socket state event updates, 1994 + * and data ready callbacks that reference the psock from sk_user_data. 1995 + * Also psock worker threads are still in-flight. So smap_release_sock 1996 + * will only free the psock after cancel_sync on the worker threads 1997 + * and a grace period expires to ensure psock is really safe to remove. 
1998 + */ 1999 + rcu_read_lock(); 2000 + for (i = 0; i < htab->n_buckets; i++) { 2001 + struct hlist_head *head = select_bucket(htab, i); 2002 + struct hlist_node *n; 2003 + struct htab_elem *l; 2004 + 2005 + hlist_for_each_entry_safe(l, n, head, hash_node) { 2006 + struct sock *sock = l->sk; 2007 + struct smap_psock *psock; 2008 + 2009 + hlist_del_rcu(&l->hash_node); 2010 + write_lock_bh(&sock->sk_callback_lock); 2011 + psock = smap_psock_sk(sock); 2012 + /* This check handles a racing sock event that can get 2013 + * the sk_callback_lock before this case but after xchg 2014 + * causing the refcnt to hit zero and sock user data 2015 + * (psock) to be null and queued for garbage collection. 2016 + */ 2017 + if (likely(psock)) { 2018 + smap_list_remove(psock, NULL, l); 2019 + smap_release_sock(psock, sock); 2020 + } 2021 + write_unlock_bh(&sock->sk_callback_lock); 2022 + kfree(l); 2023 + } 2024 + } 2025 + rcu_read_unlock(); 2026 + bpf_map_area_free(htab->buckets); 2027 + kfree(htab); 2028 + } 2029 + 2030 + static struct htab_elem *alloc_sock_hash_elem(struct bpf_htab *htab, 2031 + void *key, u32 key_size, u32 hash, 2032 + struct sock *sk, 2033 + struct htab_elem *old_elem) 2034 + { 2035 + struct htab_elem *l_new; 2036 + 2037 + if (atomic_inc_return(&htab->count) > htab->map.max_entries) { 2038 + if (!old_elem) { 2039 + atomic_dec(&htab->count); 2040 + return ERR_PTR(-E2BIG); 2041 + } 2042 + } 2043 + l_new = kmalloc_node(htab->elem_size, GFP_ATOMIC | __GFP_NOWARN, 2044 + htab->map.numa_node); 2045 + if (!l_new) 2046 + return ERR_PTR(-ENOMEM); 2047 + 2048 + memcpy(l_new->key, key, key_size); 2049 + l_new->sk = sk; 2050 + l_new->hash = hash; 2051 + return l_new; 2052 + } 2053 + 2054 + static struct htab_elem *lookup_elem_raw(struct hlist_head *head, 2055 + u32 hash, void *key, u32 key_size) 2056 + { 2057 + struct htab_elem *l; 2058 + 2059 + hlist_for_each_entry_rcu(l, head, hash_node) { 2060 + if (l->hash == hash && !memcmp(&l->key, key, key_size)) 2061 + return l; 
2062 + } 2063 + 2064 + return NULL; 2065 + } 2066 + 2067 + static inline u32 htab_map_hash(const void *key, u32 key_len) 2068 + { 2069 + return jhash(key, key_len, 0); 2070 + } 2071 + 2072 + static int sock_hash_get_next_key(struct bpf_map *map, 2073 + void *key, void *next_key) 2074 + { 2075 + struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 2076 + struct htab_elem *l, *next_l; 2077 + struct hlist_head *h; 2078 + u32 hash, key_size; 2079 + int i = 0; 2080 + 2081 + WARN_ON_ONCE(!rcu_read_lock_held()); 2082 + 2083 + key_size = map->key_size; 2084 + if (!key) 2085 + goto find_first_elem; 2086 + hash = htab_map_hash(key, key_size); 2087 + h = select_bucket(htab, hash); 2088 + 2089 + l = lookup_elem_raw(h, hash, key, key_size); 2090 + if (!l) 2091 + goto find_first_elem; 2092 + next_l = hlist_entry_safe( 2093 + rcu_dereference_raw(hlist_next_rcu(&l->hash_node)), 2094 + struct htab_elem, hash_node); 2095 + if (next_l) { 2096 + memcpy(next_key, next_l->key, key_size); 2097 + return 0; 2098 + } 2099 + 2100 + /* no more elements in this hash list, go to the next bucket */ 2101 + i = hash & (htab->n_buckets - 1); 2102 + i++; 2103 + 2104 + find_first_elem: 2105 + /* iterate over buckets */ 2106 + for (; i < htab->n_buckets; i++) { 2107 + h = select_bucket(htab, i); 2108 + 2109 + /* pick first element in the bucket */ 2110 + next_l = hlist_entry_safe( 2111 + rcu_dereference_raw(hlist_first_rcu(h)), 2112 + struct htab_elem, hash_node); 2113 + if (next_l) { 2114 + /* if it's not empty, just return it */ 2115 + memcpy(next_key, next_l->key, key_size); 2116 + return 0; 2117 + } 2118 + } 2119 + 2120 + /* iterated over all buckets and all elements */ 2121 + return -ENOENT; 2122 + } 2123 + 2124 + static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops, 2125 + struct bpf_map *map, 2126 + void *key, u64 map_flags) 2127 + { 2128 + struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 2129 + struct bpf_sock_progs *progs = &htab->progs; 2130 + 
struct htab_elem *l_new = NULL, *l_old; 2131 + struct smap_psock_map_entry *e = NULL; 2132 + struct hlist_head *head; 2133 + struct smap_psock *psock; 2134 + u32 key_size, hash; 2135 + struct sock *sock; 2136 + struct bucket *b; 2137 + int err; 2138 + 2139 + sock = skops->sk; 2140 + 2141 + if (sock->sk_type != SOCK_STREAM || 2142 + sock->sk_protocol != IPPROTO_TCP) 2143 + return -EOPNOTSUPP; 2144 + 2145 + if (unlikely(map_flags > BPF_EXIST)) 2146 + return -EINVAL; 2147 + 2148 + e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN); 2149 + if (!e) 2150 + return -ENOMEM; 2151 + 2152 + WARN_ON_ONCE(!rcu_read_lock_held()); 2153 + key_size = map->key_size; 2154 + hash = htab_map_hash(key, key_size); 2155 + b = __select_bucket(htab, hash); 2156 + head = &b->head; 2157 + 2158 + err = __sock_map_ctx_update_elem(map, progs, sock, NULL, key); 2159 + if (err) 2160 + goto err; 2161 + 2162 + /* bpf_map_update_elem() can be called in_irq() */ 2163 + raw_spin_lock_bh(&b->lock); 2164 + l_old = lookup_elem_raw(head, hash, key, key_size); 2165 + if (l_old && map_flags == BPF_NOEXIST) { 2166 + err = -EEXIST; 2167 + goto bucket_err; 2168 + } 2169 + if (!l_old && map_flags == BPF_EXIST) { 2170 + err = -ENOENT; 2171 + goto bucket_err; 2172 + } 2173 + 2174 + l_new = alloc_sock_hash_elem(htab, key, key_size, hash, sock, l_old); 2175 + if (IS_ERR(l_new)) { 2176 + err = PTR_ERR(l_new); 2177 + goto bucket_err; 2178 + } 2179 + 2180 + psock = smap_psock_sk(sock); 2181 + if (unlikely(!psock)) { 2182 + err = -EINVAL; 2183 + goto bucket_err; 2184 + } 2185 + 2186 + e->hash_link = l_new; 2187 + e->htab = container_of(map, struct bpf_htab, map); 2188 + list_add_tail(&e->list, &psock->maps); 2189 + 2190 + /* add new element to the head of the list, so that 2191 + * concurrent search will find it before old elem 2192 + */ 2193 + hlist_add_head_rcu(&l_new->hash_node, head); 2194 + if (l_old) { 2195 + psock = smap_psock_sk(l_old->sk); 2196 + 2197 + hlist_del_rcu(&l_old->hash_node); 2198 + 
smap_list_remove(psock, NULL, l_old); 2199 + smap_release_sock(psock, l_old->sk); 2200 + free_htab_elem(htab, l_old); 2201 + } 2202 + raw_spin_unlock_bh(&b->lock); 2203 + return 0; 2204 + bucket_err: 2205 + raw_spin_unlock_bh(&b->lock); 2206 + err: 2207 + kfree(e); 2208 + psock = smap_psock_sk(sock); 2209 + if (psock) 2210 + smap_release_sock(psock, sock); 2211 + return err; 2212 + } 2213 + 2214 + static int sock_hash_update_elem(struct bpf_map *map, 2215 + void *key, void *value, u64 flags) 2216 + { 2217 + struct bpf_sock_ops_kern skops; 2218 + u32 fd = *(u32 *)value; 2219 + struct socket *socket; 2220 + int err; 2221 + 2222 + socket = sockfd_lookup(fd, &err); 2223 + if (!socket) 2224 + return err; 2225 + 2226 + skops.sk = socket->sk; 2227 + if (!skops.sk) { 2228 + fput(socket->file); 2229 + return -EINVAL; 2230 + } 2231 + 2232 + err = sock_hash_ctx_update_elem(&skops, map, key, flags); 2233 + fput(socket->file); 2234 + return err; 2235 + } 2236 + 2237 + static int sock_hash_delete_elem(struct bpf_map *map, void *key) 2238 + { 2239 + struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 2240 + struct hlist_head *head; 2241 + struct bucket *b; 2242 + struct htab_elem *l; 2243 + u32 hash, key_size; 2244 + int ret = -ENOENT; 2245 + 2246 + key_size = map->key_size; 2247 + hash = htab_map_hash(key, key_size); 2248 + b = __select_bucket(htab, hash); 2249 + head = &b->head; 2250 + 2251 + raw_spin_lock_bh(&b->lock); 2252 + l = lookup_elem_raw(head, hash, key, key_size); 2253 + if (l) { 2254 + struct sock *sock = l->sk; 2255 + struct smap_psock *psock; 2256 + 2257 + hlist_del_rcu(&l->hash_node); 2258 + write_lock_bh(&sock->sk_callback_lock); 2259 + psock = smap_psock_sk(sock); 2260 + /* This check handles a racing sock event that can get the 2261 + * sk_callback_lock before this case but after xchg happens 2262 + * causing the refcnt to hit zero and sock user data (psock) 2263 + * to be null and queued for garbage collection. 
2264 + */ 2265 + if (likely(psock)) { 2266 + smap_list_remove(psock, NULL, l); 2267 + smap_release_sock(psock, sock); 2268 + } 2269 + write_unlock_bh(&sock->sk_callback_lock); 2270 + free_htab_elem(htab, l); 2271 + ret = 0; 2272 + } 2273 + raw_spin_unlock_bh(&b->lock); 2274 + return ret; 2275 + } 2276 + 2277 + struct sock *__sock_hash_lookup_elem(struct bpf_map *map, void *key) 2278 + { 2279 + struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 2280 + struct hlist_head *head; 2281 + struct htab_elem *l; 2282 + u32 key_size, hash; 2283 + struct bucket *b; 2284 + struct sock *sk; 2285 + 2286 + key_size = map->key_size; 2287 + hash = htab_map_hash(key, key_size); 2288 + b = __select_bucket(htab, hash); 2289 + head = &b->head; 2290 + 2291 + raw_spin_lock_bh(&b->lock); 2292 + l = lookup_elem_raw(head, hash, key, key_size); 2293 + sk = l ? l->sk : NULL; 2294 + raw_spin_unlock_bh(&b->lock); 2295 + return sk; 1967 2296 } 1968 2297 1969 2298 const struct bpf_map_ops sock_map_ops = { ··· 2376 1905 .map_release_uref = sock_map_release, 2377 1906 }; 2378 1907 1908 + const struct bpf_map_ops sock_hash_ops = { 1909 + .map_alloc = sock_hash_alloc, 1910 + .map_free = sock_hash_free, 1911 + .map_lookup_elem = sock_map_lookup, 1912 + .map_get_next_key = sock_hash_get_next_key, 1913 + .map_update_elem = sock_hash_update_elem, 1914 + .map_delete_elem = sock_hash_delete_elem, 1915 + }; 1916 + 2379 1917 BPF_CALL_4(bpf_sock_map_update, struct bpf_sock_ops_kern *, bpf_sock, 2380 1918 struct bpf_map *, map, void *, key, u64, flags) 2381 1919 { ··· 2394 1914 2395 1915 const struct bpf_func_proto bpf_sock_map_update_proto = { 2396 1916 .func = bpf_sock_map_update, 1917 + .gpl_only = false, 1918 + .pkt_access = true, 1919 + .ret_type = RET_INTEGER, 1920 + .arg1_type = ARG_PTR_TO_CTX, 1921 + .arg2_type = ARG_CONST_MAP_PTR, 1922 + .arg3_type = ARG_PTR_TO_MAP_KEY, 1923 + .arg4_type = ARG_ANYTHING, 1924 + }; 1925 + 1926 + BPF_CALL_4(bpf_sock_hash_update, struct bpf_sock_ops_kern *, 
bpf_sock, 1927 + struct bpf_map *, map, void *, key, u64, flags) 1928 + { 1929 + WARN_ON_ONCE(!rcu_read_lock_held()); 1930 + return sock_hash_ctx_update_elem(bpf_sock, map, key, flags); 1931 + } 1932 + 1933 + const struct bpf_func_proto bpf_sock_hash_update_proto = { 1934 + .func = bpf_sock_hash_update, 2397 1935 .gpl_only = false, 2398 1936 .pkt_access = true, 2399 1937 .ret_type = RET_INTEGER,
+53 -6
kernel/bpf/stackmap.c
··· 11 11 #include <linux/perf_event.h> 12 12 #include <linux/elf.h> 13 13 #include <linux/pagemap.h> 14 + #include <linux/irq_work.h> 14 15 #include "percpu_freelist.h" 15 16 16 17 #define STACK_CREATE_FLAG_MASK \ ··· 32 31 u32 n_buckets; 33 32 struct stack_map_bucket *buckets[]; 34 33 }; 34 + 35 + /* irq_work to run up_read() for build_id lookup in nmi context */ 36 + struct stack_map_irq_work { 37 + struct irq_work irq_work; 38 + struct rw_semaphore *sem; 39 + }; 40 + 41 + static void do_up_read(struct irq_work *entry) 42 + { 43 + struct stack_map_irq_work *work; 44 + 45 + work = container_of(entry, struct stack_map_irq_work, irq_work); 46 + up_read(work->sem); 47 + work->sem = NULL; 48 + } 49 + 50 + static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work); 35 51 36 52 static inline bool stack_map_use_build_id(struct bpf_map *map) 37 53 { ··· 285 267 { 286 268 int i; 287 269 struct vm_area_struct *vma; 270 + bool in_nmi_ctx = in_nmi(); 271 + bool irq_work_busy = false; 272 + struct stack_map_irq_work *work; 273 + 274 + if (in_nmi_ctx) { 275 + work = this_cpu_ptr(&up_read_work); 276 + if (work->irq_work.flags & IRQ_WORK_BUSY) 277 + /* cannot queue more up_read, fallback */ 278 + irq_work_busy = true; 279 + } 288 280 289 281 /* 290 - * We cannot do up_read() in nmi context, so build_id lookup is 291 - * only supported for non-nmi events. If at some point, it is 292 - * possible to run find_vma() without taking the semaphore, we 293 - * would like to allow build_id lookup in nmi context. 282 + * We cannot do up_read() in nmi context. To do build_id lookup 283 + * in nmi context, we need to run up_read() in irq_work. We use 284 + * a percpu variable to do the irq_work. If the irq_work is 285 + * already used by another lookup, we fall back to report ips. 294 286 * 295 287 * Same fallback is used for kernel stack (!user) on a stackmap 296 288 * with build_id. 
297 289 */ 298 - if (!user || !current || !current->mm || in_nmi() || 290 + if (!user || !current || !current->mm || irq_work_busy || 299 291 down_read_trylock(&current->mm->mmap_sem) == 0) { 300 292 /* cannot access current->mm, fall back to ips */ 301 293 for (i = 0; i < trace_nr; i++) { ··· 327 299 - vma->vm_start; 328 300 id_offs[i].status = BPF_STACK_BUILD_ID_VALID; 329 301 } 330 - up_read(&current->mm->mmap_sem); 302 + 303 + if (!in_nmi_ctx) { 304 + up_read(&current->mm->mmap_sem); 305 + } else { 306 + work->sem = &current->mm->mmap_sem; 307 + irq_work_queue(&work->irq_work); 308 + } 331 309 } 332 310 333 311 BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map, ··· 609 575 .map_update_elem = stack_map_update_elem, 610 576 .map_delete_elem = stack_map_delete_elem, 611 577 }; 578 + 579 + static int __init stack_map_init(void) 580 + { 581 + int cpu; 582 + struct stack_map_irq_work *work; 583 + 584 + for_each_possible_cpu(cpu) { 585 + work = per_cpu_ptr(&up_read_work, cpu); 586 + init_irq_work(&work->irq_work, do_up_read); 587 + } 588 + return 0; 589 + } 590 + subsys_initcall(stack_map_init);
+39 -2
kernel/bpf/syscall.c
··· 255 255 256 256 bpf_map_uncharge_memlock(map); 257 257 security_bpf_map_free(map); 258 - btf_put(map->btf); 259 258 /* implementation dependent freeing */ 260 259 map->ops->map_free(map); 261 260 } ··· 275 276 if (atomic_dec_and_test(&map->refcnt)) { 276 277 /* bpf_map_free_id() must be called first */ 277 278 bpf_map_free_id(map, do_idr_lock); 279 + btf_put(map->btf); 278 280 INIT_WORK(&map->work, bpf_map_free_deferred); 279 281 schedule_work(&map->work); 280 282 } ··· 2011 2011 info.map_flags = map->map_flags; 2012 2012 memcpy(info.name, map->name, sizeof(map->name)); 2013 2013 2014 + if (map->btf) { 2015 + info.btf_id = btf_id(map->btf); 2016 + info.btf_key_id = map->btf_key_id; 2017 + info.btf_value_id = map->btf_value_id; 2018 + } 2019 + 2014 2020 if (bpf_map_is_dev_bound(map)) { 2015 2021 err = bpf_map_offload_info_fill(&info, map); 2016 2022 if (err) ··· 2028 2022 return -EFAULT; 2029 2023 2030 2024 return 0; 2025 + } 2026 + 2027 + static int bpf_btf_get_info_by_fd(struct btf *btf, 2028 + const union bpf_attr *attr, 2029 + union bpf_attr __user *uattr) 2030 + { 2031 + struct bpf_btf_info __user *uinfo = u64_to_user_ptr(attr->info.info); 2032 + u32 info_len = attr->info.info_len; 2033 + int err; 2034 + 2035 + err = check_uarg_tail_zero(uinfo, sizeof(*uinfo), info_len); 2036 + if (err) 2037 + return err; 2038 + 2039 + return btf_get_info_by_fd(btf, attr, uattr); 2031 2040 } 2032 2041 2033 2042 #define BPF_OBJ_GET_INFO_BY_FD_LAST_FIELD info.info ··· 2068 2047 err = bpf_map_get_info_by_fd(f.file->private_data, attr, 2069 2048 uattr); 2070 2049 else if (f.file->f_op == &btf_fops) 2071 - err = btf_get_info_by_fd(f.file->private_data, attr, uattr); 2050 + err = bpf_btf_get_info_by_fd(f.file->private_data, attr, uattr); 2072 2051 else 2073 2052 err = -EINVAL; 2074 2053 ··· 2087 2066 return -EPERM; 2088 2067 2089 2068 return btf_new_fd(attr); 2069 + } 2070 + 2071 + #define BPF_BTF_GET_FD_BY_ID_LAST_FIELD btf_id 2072 + 2073 + static int bpf_btf_get_fd_by_id(const 
union bpf_attr *attr) 2074 + { 2075 + if (CHECK_ATTR(BPF_BTF_GET_FD_BY_ID)) 2076 + return -EINVAL; 2077 + 2078 + if (!capable(CAP_SYS_ADMIN)) 2079 + return -EPERM; 2080 + 2081 + return btf_get_fd_by_id(attr->btf_id); 2090 2082 } 2091 2083 2092 2084 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size) ··· 2184 2150 break; 2185 2151 case BPF_BTF_LOAD: 2186 2152 err = bpf_btf_load(&attr); 2153 + break; 2154 + case BPF_BTF_GET_FD_BY_ID: 2155 + err = bpf_btf_get_fd_by_id(&attr); 2187 2156 break; 2188 2157 default: 2189 2158 err = -EINVAL;
+13 -3
kernel/bpf/verifier.c
··· 2093 2093 func_id != BPF_FUNC_msg_redirect_map) 2094 2094 goto error; 2095 2095 break; 2096 + case BPF_MAP_TYPE_SOCKHASH: 2097 + if (func_id != BPF_FUNC_sk_redirect_hash && 2098 + func_id != BPF_FUNC_sock_hash_update && 2099 + func_id != BPF_FUNC_map_delete_elem && 2100 + func_id != BPF_FUNC_msg_redirect_hash) 2101 + goto error; 2102 + break; 2096 2103 default: 2097 2104 break; 2098 2105 } ··· 2137 2130 break; 2138 2131 case BPF_FUNC_sk_redirect_map: 2139 2132 case BPF_FUNC_msg_redirect_map: 2133 + case BPF_FUNC_sock_map_update: 2140 2134 if (map->map_type != BPF_MAP_TYPE_SOCKMAP) 2141 2135 goto error; 2142 2136 break; 2143 - case BPF_FUNC_sock_map_update: 2144 - if (map->map_type != BPF_MAP_TYPE_SOCKMAP) 2137 + case BPF_FUNC_sk_redirect_hash: 2138 + case BPF_FUNC_msg_redirect_hash: 2139 + case BPF_FUNC_sock_hash_update: 2140 + if (map->map_type != BPF_MAP_TYPE_SOCKHASH) 2145 2141 goto error; 2146 2142 break; 2147 2143 default: ··· 5225 5215 } 5226 5216 } 5227 5217 5228 - if (!ops->convert_ctx_access) 5218 + if (!ops->convert_ctx_access || bpf_prog_is_dev_bound(env->prog->aux)) 5229 5219 return 0; 5230 5220 5231 5221 insn = env->prog->insnsi + delta;
+341 -24
net/core/filter.c
··· 60 60 #include <net/xfrm.h> 61 61 #include <linux/bpf_trace.h> 62 62 #include <net/xdp_sock.h> 63 + #include <linux/inetdevice.h> 64 + #include <net/ip_fib.h> 65 + #include <net/flow.h> 66 + #include <net/arp.h> 63 67 64 68 /** 65 69 * sk_filter_trim_cap - run a packet through a socket filter ··· 2074 2070 .arg2_type = ARG_ANYTHING, 2075 2071 }; 2076 2072 2073 + BPF_CALL_4(bpf_sk_redirect_hash, struct sk_buff *, skb, 2074 + struct bpf_map *, map, void *, key, u64, flags) 2075 + { 2076 + struct tcp_skb_cb *tcb = TCP_SKB_CB(skb); 2077 + 2078 + /* If user passes invalid input drop the packet. */ 2079 + if (unlikely(flags & ~(BPF_F_INGRESS))) 2080 + return SK_DROP; 2081 + 2082 + tcb->bpf.flags = flags; 2083 + tcb->bpf.sk_redir = __sock_hash_lookup_elem(map, key); 2084 + if (!tcb->bpf.sk_redir) 2085 + return SK_DROP; 2086 + 2087 + return SK_PASS; 2088 + } 2089 + 2090 + static const struct bpf_func_proto bpf_sk_redirect_hash_proto = { 2091 + .func = bpf_sk_redirect_hash, 2092 + .gpl_only = false, 2093 + .ret_type = RET_INTEGER, 2094 + .arg1_type = ARG_PTR_TO_CTX, 2095 + .arg2_type = ARG_CONST_MAP_PTR, 2096 + .arg3_type = ARG_PTR_TO_MAP_KEY, 2097 + .arg4_type = ARG_ANYTHING, 2098 + }; 2099 + 2077 2100 BPF_CALL_4(bpf_sk_redirect_map, struct sk_buff *, skb, 2078 2101 struct bpf_map *, map, u32, key, u64, flags) 2079 2102 { ··· 2110 2079 if (unlikely(flags & ~(BPF_F_INGRESS))) 2111 2080 return SK_DROP; 2112 2081 2113 - tcb->bpf.key = key; 2114 2082 tcb->bpf.flags = flags; 2115 - tcb->bpf.map = map; 2083 + tcb->bpf.sk_redir = __sock_map_lookup_elem(map, key); 2084 + if (!tcb->bpf.sk_redir) 2085 + return SK_DROP; 2116 2086 2117 2087 return SK_PASS; 2118 2088 } ··· 2121 2089 struct sock *do_sk_redirect_map(struct sk_buff *skb) 2122 2090 { 2123 2091 struct tcp_skb_cb *tcb = TCP_SKB_CB(skb); 2124 - struct sock *sk = NULL; 2125 2092 2126 - if (tcb->bpf.map) { 2127 - sk = __sock_map_lookup_elem(tcb->bpf.map, tcb->bpf.key); 2128 - 2129 - tcb->bpf.key = 0; 2130 - tcb->bpf.map = 
NULL; 2131 - } 2132 - 2133 - return sk; 2093 + return tcb->bpf.sk_redir; 2134 2094 } 2135 2095 2136 2096 static const struct bpf_func_proto bpf_sk_redirect_map_proto = { ··· 2135 2111 .arg4_type = ARG_ANYTHING, 2136 2112 }; 2137 2113 2114 + BPF_CALL_4(bpf_msg_redirect_hash, struct sk_msg_buff *, msg, 2115 + struct bpf_map *, map, void *, key, u64, flags) 2116 + { 2117 + /* If user passes invalid input drop the packet. */ 2118 + if (unlikely(flags & ~(BPF_F_INGRESS))) 2119 + return SK_DROP; 2120 + 2121 + msg->flags = flags; 2122 + msg->sk_redir = __sock_hash_lookup_elem(map, key); 2123 + if (!msg->sk_redir) 2124 + return SK_DROP; 2125 + 2126 + return SK_PASS; 2127 + } 2128 + 2129 + static const struct bpf_func_proto bpf_msg_redirect_hash_proto = { 2130 + .func = bpf_msg_redirect_hash, 2131 + .gpl_only = false, 2132 + .ret_type = RET_INTEGER, 2133 + .arg1_type = ARG_PTR_TO_CTX, 2134 + .arg2_type = ARG_CONST_MAP_PTR, 2135 + .arg3_type = ARG_PTR_TO_MAP_KEY, 2136 + .arg4_type = ARG_ANYTHING, 2137 + }; 2138 + 2138 2139 BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg_buff *, msg, 2139 2140 struct bpf_map *, map, u32, key, u64, flags) 2140 2141 { ··· 2167 2118 if (unlikely(flags & ~(BPF_F_INGRESS))) 2168 2119 return SK_DROP; 2169 2120 2170 - msg->key = key; 2171 2121 msg->flags = flags; 2172 - msg->map = map; 2122 + msg->sk_redir = __sock_map_lookup_elem(map, key); 2123 + if (!msg->sk_redir) 2124 + return SK_DROP; 2173 2125 2174 2126 return SK_PASS; 2175 2127 } 2176 2128 2177 2129 struct sock *do_msg_redirect_map(struct sk_msg_buff *msg) 2178 2130 { 2179 - struct sock *sk = NULL; 2180 - 2181 - if (msg->map) { 2182 - sk = __sock_map_lookup_elem(msg->map, msg->key); 2183 - 2184 - msg->key = 0; 2185 - msg->map = NULL; 2186 - } 2187 - 2188 - return sk; 2131 + return msg->sk_redir; 2189 2132 } 2190 2133 2191 2134 static const struct bpf_func_proto bpf_msg_redirect_map_proto = { ··· 4073 4032 }; 4074 4033 #endif 4075 4034 4035 + #if IS_ENABLED(CONFIG_INET) || 
IS_ENABLED(CONFIG_IPV6) 4036 + static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, 4037 + const struct neighbour *neigh, 4038 + const struct net_device *dev) 4039 + { 4040 + memcpy(params->dmac, neigh->ha, ETH_ALEN); 4041 + memcpy(params->smac, dev->dev_addr, ETH_ALEN); 4042 + params->h_vlan_TCI = 0; 4043 + params->h_vlan_proto = 0; 4044 + 4045 + return dev->ifindex; 4046 + } 4047 + #endif 4048 + 4049 + #if IS_ENABLED(CONFIG_INET) 4050 + static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params, 4051 + u32 flags) 4052 + { 4053 + struct in_device *in_dev; 4054 + struct neighbour *neigh; 4055 + struct net_device *dev; 4056 + struct fib_result res; 4057 + struct fib_nh *nh; 4058 + struct flowi4 fl4; 4059 + int err; 4060 + 4061 + dev = dev_get_by_index_rcu(net, params->ifindex); 4062 + if (unlikely(!dev)) 4063 + return -ENODEV; 4064 + 4065 + /* verify forwarding is enabled on this interface */ 4066 + in_dev = __in_dev_get_rcu(dev); 4067 + if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev))) 4068 + return 0; 4069 + 4070 + if (flags & BPF_FIB_LOOKUP_OUTPUT) { 4071 + fl4.flowi4_iif = 1; 4072 + fl4.flowi4_oif = params->ifindex; 4073 + } else { 4074 + fl4.flowi4_iif = params->ifindex; 4075 + fl4.flowi4_oif = 0; 4076 + } 4077 + fl4.flowi4_tos = params->tos & IPTOS_RT_MASK; 4078 + fl4.flowi4_scope = RT_SCOPE_UNIVERSE; 4079 + fl4.flowi4_flags = 0; 4080 + 4081 + fl4.flowi4_proto = params->l4_protocol; 4082 + fl4.daddr = params->ipv4_dst; 4083 + fl4.saddr = params->ipv4_src; 4084 + fl4.fl4_sport = params->sport; 4085 + fl4.fl4_dport = params->dport; 4086 + 4087 + if (flags & BPF_FIB_LOOKUP_DIRECT) { 4088 + u32 tbid = l3mdev_fib_table_rcu(dev) ? 
: RT_TABLE_MAIN; 4089 + struct fib_table *tb; 4090 + 4091 + tb = fib_get_table(net, tbid); 4092 + if (unlikely(!tb)) 4093 + return 0; 4094 + 4095 + err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF); 4096 + } else { 4097 + fl4.flowi4_mark = 0; 4098 + fl4.flowi4_secid = 0; 4099 + fl4.flowi4_tun_key.tun_id = 0; 4100 + fl4.flowi4_uid = sock_net_uid(net, NULL); 4101 + 4102 + err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF); 4103 + } 4104 + 4105 + if (err || res.type != RTN_UNICAST) 4106 + return 0; 4107 + 4108 + if (res.fi->fib_nhs > 1) 4109 + fib_select_path(net, &res, &fl4, NULL); 4110 + 4111 + nh = &res.fi->fib_nh[res.nh_sel]; 4112 + 4113 + /* do not handle lwt encaps right now */ 4114 + if (nh->nh_lwtstate) 4115 + return 0; 4116 + 4117 + dev = nh->nh_dev; 4118 + if (unlikely(!dev)) 4119 + return 0; 4120 + 4121 + if (nh->nh_gw) 4122 + params->ipv4_dst = nh->nh_gw; 4123 + 4124 + params->rt_metric = res.fi->fib_priority; 4125 + 4126 + /* xdp and cls_bpf programs are run in RCU-bh so 4127 + * rcu_read_lock_bh is not needed here 4128 + */ 4129 + neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst); 4130 + if (neigh) 4131 + return bpf_fib_set_fwd_params(params, neigh, dev); 4132 + 4133 + return 0; 4134 + } 4135 + #endif 4136 + 4137 + #if IS_ENABLED(CONFIG_IPV6) 4138 + static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params, 4139 + u32 flags) 4140 + { 4141 + struct in6_addr *src = (struct in6_addr *) params->ipv6_src; 4142 + struct in6_addr *dst = (struct in6_addr *) params->ipv6_dst; 4143 + struct neighbour *neigh; 4144 + struct net_device *dev; 4145 + struct inet6_dev *idev; 4146 + struct fib6_info *f6i; 4147 + struct flowi6 fl6; 4148 + int strict = 0; 4149 + int oif; 4150 + 4151 + /* link local addresses are never forwarded */ 4152 + if (rt6_need_strict(dst) || rt6_need_strict(src)) 4153 + return 0; 4154 + 4155 + dev = dev_get_by_index_rcu(net, params->ifindex); 4156 + if (unlikely(!dev)) 4157 + return -ENODEV; 4158 + 
4159 +	idev = __in6_dev_get_safely(dev);
4160 +	if (unlikely(!idev || !net->ipv6.devconf_all->forwarding))
4161 +		return 0;
4162 +
4163 +	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
4164 +		fl6.flowi6_iif = 1;
4165 +		oif = fl6.flowi6_oif = params->ifindex;
4166 +	} else {
4167 +		oif = fl6.flowi6_iif = params->ifindex;
4168 +		fl6.flowi6_oif = 0;
4169 +		strict = RT6_LOOKUP_F_HAS_SADDR;
4170 +	}
4171 +	fl6.flowlabel = params->flowlabel;
4172 +	fl6.flowi6_scope = 0;
4173 +	fl6.flowi6_flags = 0;
4174 +	fl6.mp_hash = 0;
4175 +
4176 +	fl6.flowi6_proto = params->l4_protocol;
4177 +	fl6.daddr = *dst;
4178 +	fl6.saddr = *src;
4179 +	fl6.fl6_sport = params->sport;
4180 +	fl6.fl6_dport = params->dport;
4181 +
4182 +	if (flags & BPF_FIB_LOOKUP_DIRECT) {
4183 +		u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
4184 +		struct fib6_table *tb;
4185 +
4186 +		tb = ipv6_stub->fib6_get_table(net, tbid);
4187 +		if (unlikely(!tb))
4188 +			return 0;
4189 +
4190 +		f6i = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, strict);
4191 +	} else {
4192 +		fl6.flowi6_mark = 0;
4193 +		fl6.flowi6_secid = 0;
4194 +		fl6.flowi6_tun_key.tun_id = 0;
4195 +		fl6.flowi6_uid = sock_net_uid(net, NULL);
4196 +
4197 +		f6i = ipv6_stub->fib6_lookup(net, oif, &fl6, strict);
4198 +	}
4199 +
4200 +	if (unlikely(IS_ERR_OR_NULL(f6i) || f6i == net->ipv6.fib6_null_entry))
4201 +		return 0;
4202 +
4203 +	if (unlikely(f6i->fib6_flags & RTF_REJECT ||
4204 +		     f6i->fib6_type != RTN_UNICAST))
4205 +		return 0;
4206 +
4207 +	if (f6i->fib6_nsiblings && fl6.flowi6_oif == 0)
4208 +		f6i = ipv6_stub->fib6_multipath_select(net, f6i, &fl6,
4209 +						       fl6.flowi6_oif, NULL,
4210 +						       strict);
4211 +
4212 +	if (f6i->fib6_nh.nh_lwtstate)
4213 +		return 0;
4214 +
4215 +	if (f6i->fib6_flags & RTF_GATEWAY)
4216 +		*dst = f6i->fib6_nh.nh_gw;
4217 +
4218 +	dev = f6i->fib6_nh.nh_dev;
4219 +	params->rt_metric = f6i->fib6_metric;
4220 +
4221 +	/* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is
4222 +	 * not needed here. Can not use __ipv6_neigh_lookup_noref here
4223 +	 * because we need to get nd_tbl via the stub
4224 +	 */
4225 +	neigh = ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128,
4226 +				      ndisc_hashfn, dst, dev);
4227 +	if (neigh)
4228 +		return bpf_fib_set_fwd_params(params, neigh, dev);
4229 +
4230 +	return 0;
4231 +}
4232 +#endif
4233 +
4234 +BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
4235 +	   struct bpf_fib_lookup *, params, int, plen, u32, flags)
4236 +{
4237 +	if (plen < sizeof(*params))
4238 +		return -EINVAL;
4239 +
4240 +	switch (params->family) {
4241 +#if IS_ENABLED(CONFIG_INET)
4242 +	case AF_INET:
4243 +		return bpf_ipv4_fib_lookup(dev_net(ctx->rxq->dev), params,
4244 +					   flags);
4245 +#endif
4246 +#if IS_ENABLED(CONFIG_IPV6)
4247 +	case AF_INET6:
4248 +		return bpf_ipv6_fib_lookup(dev_net(ctx->rxq->dev), params,
4249 +					   flags);
4250 +#endif
4251 +	}
4252 +	return 0;
4253 +}
4254 +
4255 +static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
4256 +	.func		= bpf_xdp_fib_lookup,
4257 +	.gpl_only	= true,
4258 +	.ret_type	= RET_INTEGER,
4259 +	.arg1_type	= ARG_PTR_TO_CTX,
4260 +	.arg2_type	= ARG_PTR_TO_MEM,
4261 +	.arg3_type	= ARG_CONST_SIZE,
4262 +	.arg4_type	= ARG_ANYTHING,
4263 +};
4264 +
4265 +BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
4266 +	   struct bpf_fib_lookup *, params, int, plen, u32, flags)
4267 +{
4268 +	if (plen < sizeof(*params))
4269 +		return -EINVAL;
4270 +
4271 +	switch (params->family) {
4272 +#if IS_ENABLED(CONFIG_INET)
4273 +	case AF_INET:
4274 +		return bpf_ipv4_fib_lookup(dev_net(skb->dev), params, flags);
4275 +#endif
4276 +#if IS_ENABLED(CONFIG_IPV6)
4277 +	case AF_INET6:
4278 +		return bpf_ipv6_fib_lookup(dev_net(skb->dev), params, flags);
4279 +#endif
4280 +	}
4281 +	return -ENOTSUPP;
4282 +}
4283 +
4284 +static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
4285 +	.func		= bpf_skb_fib_lookup,
4286 +	.gpl_only	= true,
4287 +	.ret_type	= RET_INTEGER,
4288 +	.arg1_type	= ARG_PTR_TO_CTX,
4289 +	.arg2_type	= ARG_PTR_TO_MEM,
4290 +	.arg3_type	= ARG_CONST_SIZE,
4291 +	.arg4_type	= ARG_ANYTHING,
4292 +};
4293 +
4076 4294 static const struct bpf_func_proto *
4077 4295 bpf_base_func_proto(enum bpf_func_id func_id)
4078 4296 {
···
4481 4181 	case BPF_FUNC_skb_get_xfrm_state:
4482 4182 		return &bpf_skb_get_xfrm_state_proto;
4483 4183 #endif
4184 +	case BPF_FUNC_fib_lookup:
4185 +		return &bpf_skb_fib_lookup_proto;
4484 4186 	default:
4485 4187 		return bpf_base_func_proto(func_id);
4486 4188 	}
···
4508 4206 		return &bpf_xdp_redirect_map_proto;
4509 4207 	case BPF_FUNC_xdp_adjust_tail:
4510 4208 		return &bpf_xdp_adjust_tail_proto;
4209 +	case BPF_FUNC_fib_lookup:
4210 +		return &bpf_xdp_fib_lookup_proto;
4511 4211 	default:
4512 4212 		return bpf_base_func_proto(func_id);
4513 4213 	}
···
4554 4250 		return &bpf_sock_ops_cb_flags_set_proto;
4555 4251 	case BPF_FUNC_sock_map_update:
4556 4252 		return &bpf_sock_map_update_proto;
4253 +	case BPF_FUNC_sock_hash_update:
4254 +		return &bpf_sock_hash_update_proto;
4557 4255 	default:
4558 4256 		return bpf_base_func_proto(func_id);
4559 4257 	}
···
4567 4261 	switch (func_id) {
4568 4262 	case BPF_FUNC_msg_redirect_map:
4569 4263 		return &bpf_msg_redirect_map_proto;
4264 +	case BPF_FUNC_msg_redirect_hash:
4265 +		return &bpf_msg_redirect_hash_proto;
4570 4266 	case BPF_FUNC_msg_apply_bytes:
4571 4267 		return &bpf_msg_apply_bytes_proto;
4572 4268 	case BPF_FUNC_msg_cork_bytes:
···
4600 4292 		return &bpf_get_socket_uid_proto;
4601 4293 	case BPF_FUNC_sk_redirect_map:
4602 4294 		return &bpf_sk_redirect_map_proto;
4295 +	case BPF_FUNC_sk_redirect_hash:
4296 +		return &bpf_sk_redirect_hash_proto;
4603 4297 	default:
4604 4298 		return bpf_base_func_proto(func_id);
4605 4299 	}
···
4955 4645 				const struct bpf_prog *prog,
4956 4646 				struct bpf_insn_access_aux *info)
4957 4647 {
4958      -	if (type == BPF_WRITE)
     4648 +	if (type == BPF_WRITE) {
     4649 +		if (bpf_prog_is_dev_bound(prog->aux)) {
     4650 +			switch (off) {
     4651 +			case offsetof(struct xdp_md, rx_queue_index):
     4652 +				return __is_valid_xdp_access(off, size);
     4653 +			}
     4654 +		}
4959 4655 		return false;
     4656 +	}
4960 4657 
4961 4658 	switch (off) {
4962 4659 	case offsetof(struct xdp_md, data):
+32 -1
net/ipv6/addrconf_core.c
···
134 134 	return -EAFNOSUPPORT;
135 135 }
136 136 
    137 +static struct fib6_table *eafnosupport_fib6_get_table(struct net *net, u32 id)
    138 +{
    139 +	return NULL;
    140 +}
    141 +
    142 +static struct fib6_info *
    143 +eafnosupport_fib6_table_lookup(struct net *net, struct fib6_table *table,
    144 +			       int oif, struct flowi6 *fl6, int flags)
    145 +{
    146 +	return NULL;
    147 +}
    148 +
    149 +static struct fib6_info *
    150 +eafnosupport_fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
    151 +			 int flags)
    152 +{
    153 +	return NULL;
    154 +}
    155 +
    156 +static struct fib6_info *
    157 +eafnosupport_fib6_multipath_select(const struct net *net, struct fib6_info *f6i,
    158 +				   struct flowi6 *fl6, int oif,
    159 +				   const struct sk_buff *skb, int strict)
    160 +{
    161 +	return f6i;
    162 +}
    163 +
137 164 const struct ipv6_stub *ipv6_stub __read_mostly = &(struct ipv6_stub) {
138     -	.ipv6_dst_lookup = eafnosupport_ipv6_dst_lookup,
    165 +	.ipv6_dst_lookup   = eafnosupport_ipv6_dst_lookup,
    166 +	.fib6_get_table    = eafnosupport_fib6_get_table,
    167 +	.fib6_table_lookup = eafnosupport_fib6_table_lookup,
    168 +	.fib6_lookup       = eafnosupport_fib6_lookup,
    169 +	.fib6_multipath_select = eafnosupport_fib6_multipath_select,
139 170 };
140 171 EXPORT_SYMBOL_GPL(ipv6_stub);
141 172 
+5 -1
net/ipv6/af_inet6.c
···
889 889 static const struct ipv6_stub ipv6_stub_impl = {
890 890 	.ipv6_sock_mc_join = ipv6_sock_mc_join,
891 891 	.ipv6_sock_mc_drop = ipv6_sock_mc_drop,
892     -	.ipv6_dst_lookup = ip6_dst_lookup,
    892 +	.ipv6_dst_lookup   = ip6_dst_lookup,
    893 +	.fib6_get_table    = fib6_get_table,
    894 +	.fib6_table_lookup = fib6_table_lookup,
    895 +	.fib6_lookup       = fib6_lookup,
    896 +	.fib6_multipath_select = fib6_multipath_select,
893 897 	.udpv6_encap_enable = udpv6_encap_enable,
894 898 	.ndisc_send_na = ndisc_send_na,
895 899 	.nd_tbl = &nd_tbl,
+113 -21
net/ipv6/fib6_rules.c
···
 60  60 	return fib_rules_seq_read(net, AF_INET6);
 61  61 }
 62  62 
     63 +/* called with rcu lock held; no reference taken on fib6_info */
     64 +struct fib6_info *fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
     65 +			      int flags)
     66 +{
     67 +	struct fib6_info *f6i;
     68 +	int err;
     69 +
     70 +	if (net->ipv6.fib6_has_custom_rules) {
     71 +		struct fib_lookup_arg arg = {
     72 +			.lookup_ptr = fib6_table_lookup,
     73 +			.lookup_data = &oif,
     74 +			.flags = FIB_LOOKUP_NOREF,
     75 +		};
     76 +
     77 +		l3mdev_update_flow(net, flowi6_to_flowi(fl6));
     78 +
     79 +		err = fib_rules_lookup(net->ipv6.fib6_rules_ops,
     80 +				       flowi6_to_flowi(fl6), flags, &arg);
     81 +		if (err)
     82 +			return ERR_PTR(err);
     83 +
     84 +		f6i = arg.result ? : net->ipv6.fib6_null_entry;
     85 +	} else {
     86 +		f6i = fib6_table_lookup(net, net->ipv6.fib6_local_tbl,
     87 +					oif, fl6, flags);
     88 +		if (!f6i || f6i == net->ipv6.fib6_null_entry)
     89 +			f6i = fib6_table_lookup(net, net->ipv6.fib6_main_tbl,
     90 +						oif, fl6, flags);
     91 +	}
     92 +
     93 +	return f6i;
     94 +}
     95 +
 63  96 struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
 64  97 				   const struct sk_buff *skb,
 65  98 				   int flags, pol_lookup_t lookup)
···
129  96 	return &net->ipv6.ip6_null_entry->dst;
130  97 }
131  98 
132     -static int fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
133     -			    int flags, struct fib_lookup_arg *arg)
     99 +static int fib6_rule_saddr(struct net *net, struct fib_rule *rule, int flags,
    100 +			   struct flowi6 *flp6, const struct net_device *dev)
    101 +{
    102 +	struct fib6_rule *r = (struct fib6_rule *)rule;
    103 +
    104 +	/* If we need to find a source address for this traffic,
    105 +	 * we check the result if it meets requirement of the rule.
    106 +	 */
    107 +	if ((rule->flags & FIB_RULE_FIND_SADDR) &&
    108 +	    r->src.plen && !(flags & RT6_LOOKUP_F_HAS_SADDR)) {
    109 +		struct in6_addr saddr;
    110 +
    111 +		if (ipv6_dev_get_saddr(net, dev, &flp6->daddr,
    112 +				       rt6_flags2srcprefs(flags), &saddr))
    113 +			return -EAGAIN;
    114 +
    115 +		if (!ipv6_prefix_equal(&saddr, &r->src.addr, r->src.plen))
    116 +			return -EAGAIN;
    117 +
    118 +		flp6->saddr = saddr;
    119 +	}
    120 +
    121 +	return 0;
    122 +}
    123 +
    124 +static int fib6_rule_action_alt(struct fib_rule *rule, struct flowi *flp,
    125 +				int flags, struct fib_lookup_arg *arg)
    126 +{
    127 +	struct flowi6 *flp6 = &flp->u.ip6;
    128 +	struct net *net = rule->fr_net;
    129 +	struct fib6_table *table;
    130 +	struct fib6_info *f6i;
    131 +	int err = -EAGAIN, *oif;
    132 +	u32 tb_id;
    133 +
    134 +	switch (rule->action) {
    135 +	case FR_ACT_TO_TBL:
    136 +		break;
    137 +	case FR_ACT_UNREACHABLE:
    138 +		return -ENETUNREACH;
    139 +	case FR_ACT_PROHIBIT:
    140 +		return -EACCES;
    141 +	case FR_ACT_BLACKHOLE:
    142 +	default:
    143 +		return -EINVAL;
    144 +	}
    145 +
    146 +	tb_id = fib_rule_get_table(rule, arg);
    147 +	table = fib6_get_table(net, tb_id);
    148 +	if (!table)
    149 +		return -EAGAIN;
    150 +
    151 +	oif = (int *)arg->lookup_data;
    152 +	f6i = fib6_table_lookup(net, table, *oif, flp6, flags);
    153 +	if (f6i != net->ipv6.fib6_null_entry) {
    154 +		err = fib6_rule_saddr(net, rule, flags, flp6,
    155 +				      fib6_info_nh_dev(f6i));
    156 +
    157 +		if (likely(!err))
    158 +			arg->result = f6i;
    159 +	}
    160 +
    161 +	return err;
    162 +}
    163 +
    164 +static int __fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
    165 +			      int flags, struct fib_lookup_arg *arg)
134 166 {
135 167 	struct flowi6 *flp6 = &flp->u.ip6;
136 168 	struct rt6_info *rt = NULL;
···
232 134 
233 135 	rt = lookup(net, table, flp6, arg->lookup_data, flags);
234 136 	if (rt != net->ipv6.ip6_null_entry) {
235     -		struct fib6_rule *r = (struct fib6_rule *)rule;
    137 +		err = fib6_rule_saddr(net, rule, flags, flp6,
    138 +				      ip6_dst_idev(&rt->dst)->dev);
236     -
237     -		/*
238     -		 * If we need to find a source address for this traffic,
239     -		 * we check the result if it meets requirement of the rule.
240     -		 */
241     -		if ((rule->flags & FIB_RULE_FIND_SADDR) &&
242     -		    r->src.plen && !(flags & RT6_LOOKUP_F_HAS_SADDR)) {
243     -			struct in6_addr saddr;
    139 +
    140 +		if (err == -EAGAIN)
    141 +			goto again;
244     -
245     -			if (ipv6_dev_get_saddr(net,
246     -					       ip6_dst_idev(&rt->dst)->dev,
247     -					       &flp6->daddr,
248     -					       rt6_flags2srcprefs(flags),
249     -					       &saddr))
250     -				goto again;
251     -			if (!ipv6_prefix_equal(&saddr, &r->src.addr,
252     -					       r->src.plen))
253     -				goto again;
254     -			flp6->saddr = saddr;
255     -		}
    142 +
256 143 		err = rt->dst.error;
257 144 		if (err != -EAGAIN)
258 145 			goto out;
···
253 170 out:
254 171 	arg->result = rt;
255 172 	return err;
    173 +}
    174 +
    175 +static int fib6_rule_action(struct fib_rule *rule, struct flowi *flp,
    176 +			    int flags, struct fib_lookup_arg *arg)
    177 +{
    178 +	if (arg->lookup_ptr == fib6_table_lookup)
    179 +		return fib6_rule_action_alt(rule, flp, flags, arg);
    180 +
    181 +	return __fib6_rule_action(rule, flp, flags, arg);
257 182 }
258 183 
259 184 static bool fib6_rule_suppress(struct fib_rule *rule, struct fib_lookup_arg *arg)
+15 -6
net/ipv6/ip6_fib.c
···
 354  354 	return &rt->dst;
 355  355 }
 356  356 
      357 +/* called with rcu lock held; no reference taken on fib6_info */
      358 +struct fib6_info *fib6_lookup(struct net *net, int oif, struct flowi6 *fl6,
      359 +			      int flags)
      360 +{
      361 +	return fib6_table_lookup(net, net->ipv6.fib6_main_tbl, oif, fl6, flags);
      362 +}
      363 +
 357  364 static void __net_init fib6_tables_init(struct net *net)
 358  365 {
 359  366 	fib6_link_table(net, net->ipv6.fib6_main_tbl);
···
1361 1354 	const struct in6_addr	*addr;		/* search key */
1362 1355 };
1363 1356 
1364      -static struct fib6_node *fib6_lookup_1(struct fib6_node *root,
1365      -				       struct lookup_args *args)
     1357 +static struct fib6_node *fib6_node_lookup_1(struct fib6_node *root,
     1358 +					    struct lookup_args *args)
1366 1359 {
1367 1360 	struct fib6_node *fn;
1368 1361 	__be32 dir;
···
1407 1400 #ifdef CONFIG_IPV6_SUBTREES
1408 1401 		if (subtree) {
1409 1402 			struct fib6_node *sfn;
1410      -			sfn = fib6_lookup_1(subtree, args + 1);
     1403 +			sfn = fib6_node_lookup_1(subtree,
     1404 +						 args + 1);
1411 1405 			if (!sfn)
1412 1406 				goto backtrack;
1413 1407 			fn = sfn;
···
1430 1422 
1431 1423 /* called with rcu_read_lock() held
1432 1424  */
1433      -struct fib6_node *fib6_lookup(struct fib6_node *root, const struct in6_addr *daddr,
1434      -			      const struct in6_addr *saddr)
     1425 +struct fib6_node *fib6_node_lookup(struct fib6_node *root,
     1426 +				   const struct in6_addr *daddr,
     1427 +				   const struct in6_addr *saddr)
1435 1428 {
1436 1429 	struct fib6_node *fn;
1437 1430 	struct lookup_args args[] = {
···
1451 1442 		}
1452 1443 	};
1453 1444 
1454      -	fn = fib6_lookup_1(root, daddr ? args : args + 1);
     1445 +	fn = fib6_node_lookup_1(root, daddr ? args : args + 1);
1455 1446 	if (!fn || fn->fn_flags & RTN_TL_ROOT)
1456 1447 		fn = root;
1457 1448 
+43 -33
net/ipv6/route.c
···
 419  419 	return false;
 420  420 }
 421  421 
 422      -static struct fib6_info *rt6_multipath_select(const struct net *net,
 423      -					      struct fib6_info *match,
 424      -					      struct flowi6 *fl6, int oif,
 425      -					      const struct sk_buff *skb,
 426      -					      int strict)
      422 +struct fib6_info *fib6_multipath_select(const struct net *net,
      423 +					struct fib6_info *match,
      424 +					struct flowi6 *fl6, int oif,
      425 +					const struct sk_buff *skb,
      426 +					int strict)
 427  427 {
 428  428 	struct fib6_info *sibling, *next_sibling;
 429  429 
···
1006 1006 		pn = rcu_dereference(fn->parent);
1007 1007 		sn = FIB6_SUBTREE(pn);
1008 1008 		if (sn && sn != fn)
1009      -			fn = fib6_lookup(sn, NULL, saddr);
     1009 +			fn = fib6_node_lookup(sn, NULL, saddr);
1010 1010 		else
1011 1011 			fn = pn;
1012 1012 		if (fn->fn_flags & RTN_RTINFO)
···
1059 1059 		flags &= ~RT6_LOOKUP_F_IFACE;
1060 1060 
1061 1061 	rcu_read_lock();
1062      -	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
     1062 +	fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
1063 1063 restart:
1064 1064 	f6i = rcu_dereference(fn->leaf);
1065 1065 	if (!f6i) {
···
1068 1068 		f6i = rt6_device_match(net, f6i, &fl6->saddr,
1069 1069 				       fl6->flowi6_oif, flags);
1070 1070 		if (f6i->fib6_nsiblings && fl6->flowi6_oif == 0)
1071      -			f6i = rt6_multipath_select(net, f6i, fl6,
1072      -						   fl6->flowi6_oif, skb, flags);
     1071 +			f6i = fib6_multipath_select(net, f6i, fl6,
     1072 +						    fl6->flowi6_oif, skb,
     1073 +						    flags);
1073 1074 	}
1074 1075 	if (f6i == net->ipv6.fib6_null_entry) {
1075 1076 		fn = fib6_backtrack(fn, &fl6->saddr);
1076 1077 		if (fn)
1077 1078 			goto restart;
1078 1079 	}
     1080 +
     1081 +	trace_fib6_table_lookup(net, f6i, table, fl6);
1079 1082 
1080 1083 	/* Search through exception table */
1081 1084 	rt = rt6_find_cached_rt(f6i, &fl6->daddr, &fl6->saddr);
···
1097 1094 	}
1098 1095 
1099 1096 	rcu_read_unlock();
1100      -
1101      -	trace_fib6_table_lookup(net, rt, table, fl6);
1102 1097 
1103 1098 	return rt;
1104 1099 }
···
1800 1799 	rcu_read_unlock_bh();
1801 1800 }
1802 1801 
1803      -struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
1804      -			       int oif, struct flowi6 *fl6,
1805      -			       const struct sk_buff *skb, int flags)
     1802 +/* must be called with rcu lock held */
     1803 +struct fib6_info *fib6_table_lookup(struct net *net, struct fib6_table *table,
     1804 +				    int oif, struct flowi6 *fl6, int strict)
1806 1805 {
1807 1806 	struct fib6_node *fn, *saved_fn;
1808 1807 	struct fib6_info *f6i;
1809      -	struct rt6_info *rt;
1810      -	int strict = 0;
1811 1808 
1812      -	strict |= flags & RT6_LOOKUP_F_IFACE;
1813      -	strict |= flags & RT6_LOOKUP_F_IGNORE_LINKSTATE;
1814      -	if (net->ipv6.devconf_all->forwarding == 0)
1815      -		strict |= RT6_LOOKUP_F_REACHABLE;
1816      -
1817      -	rcu_read_lock();
1818      -
1819      -	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
     1809 +	fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
1820 1810 	saved_fn = fn;
1821 1811 
1822 1812 	if (fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF)
···
1815 1823 
1816 1824 redo_rt6_select:
1817 1825 	f6i = rt6_select(net, fn, oif, strict);
1818      -	if (f6i->fib6_nsiblings)
1819      -		f6i = rt6_multipath_select(net, f6i, fl6, oif, skb, strict);
1820 1826 	if (f6i == net->ipv6.fib6_null_entry) {
1821 1827 		fn = fib6_backtrack(fn, &fl6->saddr);
1822 1828 		if (fn)
···
1827 1837 		}
1828 1838 	}
1829 1839 
     1840 +	trace_fib6_table_lookup(net, f6i, table, fl6);
     1841 +
     1842 +	return f6i;
     1843 +}
     1844 +
     1845 +struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
     1846 +			       int oif, struct flowi6 *fl6,
     1847 +			       const struct sk_buff *skb, int flags)
     1848 +{
     1849 +	struct fib6_info *f6i;
     1850 +	struct rt6_info *rt;
     1851 +	int strict = 0;
     1852 +
     1853 +	strict |= flags & RT6_LOOKUP_F_IFACE;
     1854 +	strict |= flags & RT6_LOOKUP_F_IGNORE_LINKSTATE;
     1855 +	if (net->ipv6.devconf_all->forwarding == 0)
     1856 +		strict |= RT6_LOOKUP_F_REACHABLE;
     1857 +
     1858 +	rcu_read_lock();
     1859 +
     1860 +	f6i = fib6_table_lookup(net, table, oif, fl6, strict);
     1861 +	if (f6i->fib6_nsiblings)
     1862 +		f6i = fib6_multipath_select(net, f6i, fl6, oif, skb, strict);
     1863 +
1830 1864 	if (f6i == net->ipv6.fib6_null_entry) {
1831 1865 		rt = net->ipv6.ip6_null_entry;
1832 1866 		rcu_read_unlock();
1833 1867 		dst_hold(&rt->dst);
1834      -		trace_fib6_table_lookup(net, rt, table, fl6);
1835 1868 		return rt;
1836 1869 	}
1837 1870 
···
1865 1852 		dst_use_noref(&rt->dst, jiffies);
1866 1853 
1867 1854 		rcu_read_unlock();
1868      -		trace_fib6_table_lookup(net, rt, table, fl6);
1869 1855 		return rt;
1870 1856 	} else if (unlikely((fl6->flowi6_flags & FLOWI_FLAG_KNOWN_NH) &&
1871 1857 			    !(f6i->fib6_flags & RTF_GATEWAY))) {
···
1890 1878 			dst_hold(&uncached_rt->dst);
1891 1879 		}
1892 1880 
1893      -		trace_fib6_table_lookup(net, uncached_rt, table, fl6);
1894 1881 		return uncached_rt;
1895      -
1896 1882 	} else {
1897 1883 		/* Get a percpu copy */
1898 1884 
···
1904 1894 
1905 1895 		local_bh_enable();
1906 1896 		rcu_read_unlock();
1907      -		trace_fib6_table_lookup(net, pcpu_rt, table, fl6);
     1897 +
1908 1898 		return pcpu_rt;
1909 1899 	}
1910 1900 }
···
2435 2425 	 */
2436 2426 
2437 2427 	rcu_read_lock();
2438      -	fn = fib6_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
     2428 +	fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
2439 2429 restart:
2440 2430 	for_each_fib6_node_rt_rcu(fn) {
2441 2431 		if (rt->fib6_nh.nh_flags & RTNH_F_DEAD)
···
2489 2479 
2490 2480 	rcu_read_unlock();
2491 2481 
2492      -	trace_fib6_table_lookup(net, ret, table, fl6);
     2482 +	trace_fib6_table_lookup(net, rt, table, fl6);
2493 2483 	return ret;
2494 2484 };
+1 -1
net/xdp/xdp_umem.c
···
209 209 	if ((addr + size) < addr)
210 210 		return -EINVAL;
211 211 
212     -	nframes = size / frame_size;
    212 +	nframes = (unsigned int)div_u64(size, frame_size);
213 213 	if (nframes == 0 || nframes > UINT_MAX)
214 214 		return -EINVAL;
215 215 
+75 -91
samples/bpf/Makefile
···
  1   1 # SPDX-License-Identifier: GPL-2.0
      2 +
      3 +BPF_SAMPLES_PATH ?= $(abspath $(srctree)/$(src))
      4 +TOOLS_PATH := $(BPF_SAMPLES_PATH)/../../tools
      5 +
  2   6 # List of programs to build
  3   7 hostprogs-y := test_lru_dist
  4   8 hostprogs-y += sock_example
···
 50  46 hostprogs-y += cpustat
 51  47 hostprogs-y += xdp_adjust_tail
 52  48 hostprogs-y += xdpsock
     49 +hostprogs-y += xdp_fwd
 53  50 
 54  51 # Libbpf dependencies
 55     -LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
     52 +LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
     53 +
 56  54 CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
 57  55 TRACE_HELPERS := ../../tools/testing/selftests/bpf/trace_helpers.o
 58  56 
 59     -test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
 60     -sock_example-objs := sock_example.o $(LIBBPF)
 61     -fds_example-objs := bpf_load.o $(LIBBPF) fds_example.o
 62     -sockex1-objs := bpf_load.o $(LIBBPF) sockex1_user.o
 63     -sockex2-objs := bpf_load.o $(LIBBPF) sockex2_user.o
 64     -sockex3-objs := bpf_load.o $(LIBBPF) sockex3_user.o
 65     -tracex1-objs := bpf_load.o $(LIBBPF) tracex1_user.o
 66     -tracex2-objs := bpf_load.o $(LIBBPF) tracex2_user.o
 67     -tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
 68     -tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
 69     -tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
 70     -tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
 71     -tracex7-objs := bpf_load.o $(LIBBPF) tracex7_user.o
 72     -load_sock_ops-objs := bpf_load.o $(LIBBPF) load_sock_ops.o
 73     -test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
 74     -trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o $(TRACE_HELPERS)
 75     -lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o
 76     -offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o $(TRACE_HELPERS)
 77     -spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o $(TRACE_HELPERS)
 78     -map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o
 79     -test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o
 80     -test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o
 81     -test_cgrp2_attach-objs := $(LIBBPF) test_cgrp2_attach.o
 82     -test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o $(CGROUP_HELPERS)
 83     -test_cgrp2_sock-objs := $(LIBBPF) test_cgrp2_sock.o
 84     -test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o
 85     -xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
     57 +fds_example-objs := bpf_load.o fds_example.o
     58 +sockex1-objs := bpf_load.o sockex1_user.o
     59 +sockex2-objs := bpf_load.o sockex2_user.o
     60 +sockex3-objs := bpf_load.o sockex3_user.o
     61 +tracex1-objs := bpf_load.o tracex1_user.o
     62 +tracex2-objs := bpf_load.o tracex2_user.o
     63 +tracex3-objs := bpf_load.o tracex3_user.o
     64 +tracex4-objs := bpf_load.o tracex4_user.o
     65 +tracex5-objs := bpf_load.o tracex5_user.o
     66 +tracex6-objs := bpf_load.o tracex6_user.o
     67 +tracex7-objs := bpf_load.o tracex7_user.o
     68 +load_sock_ops-objs := bpf_load.o load_sock_ops.o
     69 +test_probe_write_user-objs := bpf_load.o test_probe_write_user_user.o
     70 +trace_output-objs := bpf_load.o trace_output_user.o $(TRACE_HELPERS)
     71 +lathist-objs := bpf_load.o lathist_user.o
     72 +offwaketime-objs := bpf_load.o offwaketime_user.o $(TRACE_HELPERS)
     73 +spintest-objs := bpf_load.o spintest_user.o $(TRACE_HELPERS)
     74 +map_perf_test-objs := bpf_load.o map_perf_test_user.o
     75 +test_overhead-objs := bpf_load.o test_overhead_user.o
     76 +test_cgrp2_array_pin-objs := test_cgrp2_array_pin.o
     77 +test_cgrp2_attach-objs := test_cgrp2_attach.o
     78 +test_cgrp2_attach2-objs := test_cgrp2_attach2.o $(CGROUP_HELPERS)
     79 +test_cgrp2_sock-objs := test_cgrp2_sock.o
     80 +test_cgrp2_sock2-objs := bpf_load.o test_cgrp2_sock2.o
     81 +xdp1-objs := xdp1_user.o
 86  82 # reuse xdp1 source intentionally
 87     -xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o
 88     -xdp_router_ipv4-objs := bpf_load.o $(LIBBPF) xdp_router_ipv4_user.o
 89     -test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) $(CGROUP_HELPERS) \
     83 +xdp2-objs := xdp1_user.o
     84 +xdp_router_ipv4-objs := bpf_load.o xdp_router_ipv4_user.o
     85 +test_current_task_under_cgroup-objs := bpf_load.o $(CGROUP_HELPERS) \
 90  86 				       test_current_task_under_cgroup_user.o
 91     -trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o $(TRACE_HELPERS)
 92     -sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o $(TRACE_HELPERS)
 93     -tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o
 94     -lwt_len_hist-objs := bpf_load.o $(LIBBPF) lwt_len_hist_user.o
 95     -xdp_tx_iptunnel-objs := bpf_load.o $(LIBBPF) xdp_tx_iptunnel_user.o
 96     -test_map_in_map-objs := bpf_load.o $(LIBBPF) test_map_in_map_user.o
 97     -per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o
 98     -xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o
 99     -xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
100     -xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
101     -xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
102     -xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
103     -syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
104     -cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
105     -xdp_adjust_tail-objs := bpf_load.o $(LIBBPF) xdp_adjust_tail_user.o
106     -xdpsock-objs := bpf_load.o $(LIBBPF) xdpsock_user.o
     87 +trace_event-objs := bpf_load.o trace_event_user.o $(TRACE_HELPERS)
     88 +sampleip-objs := bpf_load.o sampleip_user.o $(TRACE_HELPERS)
     89 +tc_l2_redirect-objs := bpf_load.o tc_l2_redirect_user.o
     90 +lwt_len_hist-objs := bpf_load.o lwt_len_hist_user.o
     91 +xdp_tx_iptunnel-objs := bpf_load.o xdp_tx_iptunnel_user.o
     92 +test_map_in_map-objs := bpf_load.o test_map_in_map_user.o
     93 +per_socket_stats_example-objs := cookie_uid_helper_example.o
     94 +xdp_redirect-objs := bpf_load.o xdp_redirect_user.o
     95 +xdp_redirect_map-objs := bpf_load.o xdp_redirect_map_user.o
     96 +xdp_redirect_cpu-objs := bpf_load.o xdp_redirect_cpu_user.o
     97 +xdp_monitor-objs := bpf_load.o xdp_monitor_user.o
     98 +xdp_rxq_info-objs := xdp_rxq_info_user.o
     99 +syscall_tp-objs := bpf_load.o syscall_tp_user.o
    100 +cpustat-objs := bpf_load.o cpustat_user.o
    101 +xdp_adjust_tail-objs := xdp_adjust_tail_user.o
    102 +xdpsock-objs := bpf_load.o xdpsock_user.o
    103 +xdp_fwd-objs := bpf_load.o xdp_fwd_user.o
107 104 
108 105 # Tell kbuild to always build the programs
109 106 always := $(hostprogs-y)
···
159 154 always += cpustat_kern.o
160 155 always += xdp_adjust_tail_kern.o
161 156 always += xdpsock_kern.o
    157 +always += xdp_fwd_kern.o
162 158 
163 159 HOSTCFLAGS += -I$(objtree)/usr/include
164 160 HOSTCFLAGS += -I$(srctree)/tools/lib/
···
168 162 HOSTCFLAGS += -I$(srctree)/tools/perf
169 163 
170 164 HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
171     -HOSTLOADLIBES_fds_example += -lelf
172     -HOSTLOADLIBES_sockex1 += -lelf
173     -HOSTLOADLIBES_sockex2 += -lelf
174     -HOSTLOADLIBES_sockex3 += -lelf
175     -HOSTLOADLIBES_tracex1 += -lelf
176     -HOSTLOADLIBES_tracex2 += -lelf
177     -HOSTLOADLIBES_tracex3 += -lelf
178     -HOSTLOADLIBES_tracex4 += -lelf -lrt
179     -HOSTLOADLIBES_tracex5 += -lelf
180     -HOSTLOADLIBES_tracex6 += -lelf
181     -HOSTLOADLIBES_tracex7 += -lelf
182     -HOSTLOADLIBES_test_cgrp2_sock2 += -lelf
183     -HOSTLOADLIBES_load_sock_ops += -lelf
184     -HOSTLOADLIBES_test_probe_write_user += -lelf
185     -HOSTLOADLIBES_trace_output += -lelf -lrt
186     -HOSTLOADLIBES_lathist += -lelf
187     -HOSTLOADLIBES_offwaketime += -lelf
188     -HOSTLOADLIBES_spintest += -lelf
189     -HOSTLOADLIBES_map_perf_test += -lelf -lrt
190     -HOSTLOADLIBES_test_overhead += -lelf -lrt
191     -HOSTLOADLIBES_xdp1 += -lelf
192     -HOSTLOADLIBES_xdp2 += -lelf
193     -HOSTLOADLIBES_xdp_router_ipv4 += -lelf
194     -HOSTLOADLIBES_test_current_task_under_cgroup += -lelf
195     -HOSTLOADLIBES_trace_event += -lelf
196     -HOSTLOADLIBES_sampleip += -lelf
197     -HOSTLOADLIBES_tc_l2_redirect += -l elf
198     -HOSTLOADLIBES_lwt_len_hist += -l elf
199     -HOSTLOADLIBES_xdp_tx_iptunnel += -lelf
200     -HOSTLOADLIBES_test_map_in_map += -lelf
201     -HOSTLOADLIBES_xdp_redirect += -lelf
202     -HOSTLOADLIBES_xdp_redirect_map += -lelf
203     -HOSTLOADLIBES_xdp_redirect_cpu += -lelf
204     -HOSTLOADLIBES_xdp_monitor += -lelf
205     -HOSTLOADLIBES_xdp_rxq_info += -lelf
206     -HOSTLOADLIBES_syscall_tp += -lelf
207     -HOSTLOADLIBES_cpustat += -lelf
208     -HOSTLOADLIBES_xdp_adjust_tail += -lelf
209     -HOSTLOADLIBES_xdpsock += -lelf -pthread
    165 +HOSTCFLAGS_trace_helpers.o += -I$(srctree)/tools/lib/bpf/
    166 +
    167 +HOSTCFLAGS_trace_output_user.o += -I$(srctree)/tools/lib/bpf/
    168 +HOSTCFLAGS_offwaketime_user.o += -I$(srctree)/tools/lib/bpf/
    169 +HOSTCFLAGS_spintest_user.o += -I$(srctree)/tools/lib/bpf/
    170 +HOSTCFLAGS_trace_event_user.o += -I$(srctree)/tools/lib/bpf/
    171 +HOSTCFLAGS_sampleip_user.o += -I$(srctree)/tools/lib/bpf/
    172 +
    173 +HOST_LOADLIBES += $(LIBBPF) -lelf
    174 +HOSTLOADLIBES_tracex4 += -lrt
    175 +HOSTLOADLIBES_trace_output += -lrt
    176 +HOSTLOADLIBES_map_perf_test += -lrt
    177 +HOSTLOADLIBES_test_overhead += -lrt
    178 +HOSTLOADLIBES_xdpsock += -pthread
210 179 
211 180 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
212 181 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
···
195 214 endif
196 215 
197 216 # Trick to allow make to be run from this directory
198     -all: $(LIBBPF)
199     -	$(MAKE) -C ../../ $(CURDIR)/
    217 +all:
    218 +	$(MAKE) -C ../../ $(CURDIR)/ BPF_SAMPLES_PATH=$(CURDIR)
200 219 
201 220 clean:
202 221 	$(MAKE) -C ../../ M=$(CURDIR) clean
203 222 	@rm -f *~
204 223 
205 224 $(LIBBPF): FORCE
206     -	$(MAKE) -C $(dir $@) $(notdir $@)
    225 +# Fix up variables inherited from Kbuild that tools/ build system won't like
    226 +	$(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(BPF_SAMPLES_PATH)/../../ O=
207 227 
208 228 $(obj)/syscall_nrs.s: $(src)/syscall_nrs.c
209 229 	$(call if_changed_dep,cc_s_c)
···
235 253 		exit 2; \
236 254 	else true; fi
237 255 
238     -$(src)/*.c: verify_target_bpf
    256 +$(BPF_SAMPLES_PATH)/*.c: verify_target_bpf $(LIBBPF)
    257 +$(src)/*.c: verify_target_bpf $(LIBBPF)
239 258 
240 259 $(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h
241 260 
···
244 261 # But, there is no easy way to fix it, so just exclude it since it is
245 262 # useless for BPF samples.
246 263 $(obj)/%.o: $(src)/%.c
247     -	$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
    264 +	@echo "  CLANG-bpf " $@
    265 +	$(Q)$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
248 266 		-I$(srctree)/tools/testing/selftests/bpf/ \
249 267 		-D__KERNEL__ -Wno-unused-value -Wno-pointer-sign \
250 268 		-D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
+6 -6
samples/bpf/bpf_load.c
···
 24  24 #include <poll.h>
 25  25 #include <ctype.h>
 26  26 #include <assert.h>
 27     -#include "libbpf.h"
     27 +#include <bpf/bpf.h>
 28  28 #include "bpf_load.h"
 29  29 #include "perf-sys.h"
 30  30 
···
420 420 
421 421 	/* Keeping compatible with ELF maps section changes
422 422 	 * ------------------------------------------------
423     -	 * The program size of struct bpf_map_def is known by loader
    423 +	 * The program size of struct bpf_load_map_def is known by loader
424 424 	 * code, but struct stored in ELF file can be different.
425 425 	 *
426 426 	 * Unfortunately sym[i].st_size is zero.  To calculate the
···
429 429 	 * symbols.
430 430 	 */
431 431 	map_sz_elf = data_maps->d_size / nr_maps;
432     -	map_sz_copy = sizeof(struct bpf_map_def);
    432 +	map_sz_copy = sizeof(struct bpf_load_map_def);
433 433 	if (map_sz_elf < map_sz_copy) {
434 434 		/*
435 435 		 * Backward compat, loading older ELF file with
···
448 448 
449 449 	/* Memcpy relevant part of ELF maps data to loader maps */
450 450 	for (i = 0; i < nr_maps; i++) {
    451 +		struct bpf_load_map_def *def;
451 452 		unsigned char *addr, *end;
452     -		struct bpf_map_def *def;
453 453 		const char *map_name;
454 454 		size_t offset;
455 455 
···
464 464 
465 465 		/* Symbol value is offset into ELF maps section data area */
466 466 		offset = sym[i].st_value;
467     -		def = (struct bpf_map_def *)(data_maps->d_buf + offset);
    467 +		def = (struct bpf_load_map_def *)(data_maps->d_buf + offset);
468 468 		maps[i].elf_offset = offset;
469     -		memset(&maps[i].def, 0, sizeof(struct bpf_map_def));
    469 +		memset(&maps[i].def, 0, sizeof(struct bpf_load_map_def));
470 470 		memcpy(&maps[i].def, def, map_sz_copy);
471 471 
472 472 		/* Verify no newer features were requested */
+3 -3
samples/bpf/bpf_load.h
···
 2  2 #ifndef __BPF_LOAD_H
 3  3 #define __BPF_LOAD_H
 4  4 
 5    -#include "libbpf.h"
    5 +#include <bpf/bpf.h>
 6  6 
 7  7 #define MAX_MAPS 32
 8  8 #define MAX_PROGS 32
 9  9 
10    -struct bpf_map_def {
   10 +struct bpf_load_map_def {
11 11 	unsigned int type;
12 12 	unsigned int key_size;
13 13 	unsigned int value_size;
···
21 21 	int fd;
22 22 	char *name;
23 23 	size_t elf_offset;
24    -	struct bpf_map_def def;
   24 +	struct bpf_load_map_def def;
25 25 };
26 26 
27 27 typedef void (*fixup_map_cb)(struct bpf_map_data *map, int idx);
+1 -1
samples/bpf/cookie_uid_helper_example.c
···
51 51 #include <sys/types.h>
52 52 #include <unistd.h>
53 53 #include <bpf/bpf.h>
54    -#include "libbpf.h"
   54 +#include "bpf_insn.h"
55 55 
56 56 #define PORT 8888
57 57 
+1 -1
samples/bpf/cpustat_user.c
···
17 17 #include <sys/resource.h>
18 18 #include <sys/wait.h>
19 19 
20    -#include "libbpf.h"
   20 +#include <bpf/bpf.h>
21 21 #include "bpf_load.h"
22 22 
23 23 #define MAX_CPU 8
+3 -1
samples/bpf/fds_example.c
···
12 12 #include <sys/types.h>
13 13 #include <sys/socket.h>
14 14 
   15 +#include <bpf/bpf.h>
   16 +
   17 +#include "bpf_insn.h"
15 18 #include "bpf_load.h"
16    -#include "libbpf.h"
17 19 #include "sock_example.h"
18 20 
19 21 #define BPF_F_PIN	(1 << 0)
+1 -1
samples/bpf/lathist_user.c
···
10 10 #include <stdlib.h>
11 11 #include <signal.h>
12 12 #include <linux/bpf.h>
13    -#include "libbpf.h"
   13 +#include <bpf/bpf.h>
14 14 #include "bpf_load.h"
15 15 
16 16 #define MAX_ENTRIES	20
+3 -5
samples/bpf/libbpf.h samples/bpf/bpf_insn.h
···
1 1 /* SPDX-License-Identifier: GPL-2.0 */
2   -/* eBPF mini library */
3   -#ifndef __LIBBPF_H
4   -#define __LIBBPF_H
5   -
6   -#include <bpf/bpf.h>
  2 +/* eBPF instruction mini library */
  3 +#ifndef __BPF_INSN_H
  4 +#define __BPF_INSN_H
7 5 
8 6 struct bpf_insn;
9 7 
+1 -1
samples/bpf/load_sock_ops.c
···
 8  8 #include <stdlib.h>
 9  9 #include <string.h>
10 10 #include <linux/bpf.h>
11    -#include "libbpf.h"
   11 +#include <bpf/bpf.h>
12 12 #include "bpf_load.h"
13 13 #include <unistd.h>
14 14 #include <errno.h>
+1 -1
samples/bpf/lwt_len_hist_user.c
···
 9  9 #include <errno.h>
10 10 #include <arpa/inet.h>
11 11 
12    -#include "libbpf.h"
   12 +#include <bpf/bpf.h>
13 13 #include "bpf_util.h"
14 14 
15 15 #define MAX_INDEX 64
+1 -1
samples/bpf/map_perf_test_user.c
···
21 21 #include <arpa/inet.h>
22 22 #include <errno.h>
23 23 
24    -#include "libbpf.h"
   24 +#include <bpf/bpf.h>
25 25 #include "bpf_load.h"
26 26 
27 27 #define TEST_BIT(t) (1U << (t))
+2 -1
samples/bpf/sock_example.c
···
26 26 #include <linux/if_ether.h>
27 27 #include <linux/ip.h>
28 28 #include <stddef.h>
29    -#include "libbpf.h"
   29 +#include <bpf/bpf.h>
   30 +#include "bpf_insn.h"
30 31 #include "sock_example.h"
31 32 
32 33 char bpf_log_buf[BPF_LOG_BUF_SIZE];
-1
samples/bpf/sock_example.h
···
 9  9 #include <net/if.h>
10 10 #include <linux/if_packet.h>
11 11 #include <arpa/inet.h>
12    -#include "libbpf.h"
13 12 
14 13 static inline int open_raw_sock(const char *name)
15 14 {
+1 -1
samples/bpf/sockex1_user.c
···
2 2 #include <stdio.h>
3 3 #include <assert.h>
4 4 #include <linux/bpf.h>
5   -#include "libbpf.h"
  5 +#include <bpf/bpf.h>
6 6 #include "bpf_load.h"
7 7 #include "sock_example.h"
8 8 #include <unistd.h>
+1 -1
samples/bpf/sockex2_user.c
··· 2 2 #include <stdio.h> 3 3 #include <assert.h> 4 4 #include <linux/bpf.h> 5 - #include "libbpf.h" 5 + #include <bpf/bpf.h> 6 6 #include "bpf_load.h" 7 7 #include "sock_example.h" 8 8 #include <unistd.h>
+1 -1
samples/bpf/sockex3_user.c
··· 2 2 #include <stdio.h> 3 3 #include <assert.h> 4 4 #include <linux/bpf.h> 5 - #include "libbpf.h" 5 + #include <bpf/bpf.h> 6 6 #include "bpf_load.h" 7 7 #include "sock_example.h" 8 8 #include <unistd.h>
+1 -1
samples/bpf/syscall_tp_user.c
··· 16 16 #include <assert.h> 17 17 #include <stdbool.h> 18 18 #include <sys/resource.h> 19 - #include "libbpf.h" 19 + #include <bpf/bpf.h> 20 20 #include "bpf_load.h" 21 21 22 22 /* This program verifies bpf attachment to tracepoint sys_enter_* and sys_exit_*.
+1 -1
samples/bpf/tc_l2_redirect_user.c
··· 13 13 #include <string.h> 14 14 #include <errno.h> 15 15 16 - #include "libbpf.h" 16 + #include <bpf/bpf.h> 17 17 18 18 static void usage(void) 19 19 {
+1 -1
samples/bpf/test_cgrp2_array_pin.c
··· 14 14 #include <errno.h> 15 15 #include <fcntl.h> 16 16 17 - #include "libbpf.h" 17 + #include <bpf/bpf.h> 18 18 19 19 static void usage(void) 20 20 {
+2 -1
samples/bpf/test_cgrp2_attach.c
··· 28 28 #include <fcntl.h> 29 29 30 30 #include <linux/bpf.h> 31 + #include <bpf/bpf.h> 31 32 32 - #include "libbpf.h" 33 + #include "bpf_insn.h" 33 34 34 35 enum { 35 36 MAP_KEY_PACKETS,
+2 -1
samples/bpf/test_cgrp2_attach2.c
··· 24 24 #include <unistd.h> 25 25 26 26 #include <linux/bpf.h> 27 + #include <bpf/bpf.h> 27 28 28 - #include "libbpf.h" 29 + #include "bpf_insn.h" 29 30 #include "cgroup_helpers.h" 30 31 31 32 #define FOO "/foo"
+2 -1
samples/bpf/test_cgrp2_sock.c
··· 21 21 #include <net/if.h> 22 22 #include <inttypes.h> 23 23 #include <linux/bpf.h> 24 + #include <bpf/bpf.h> 24 25 25 - #include "libbpf.h" 26 + #include "bpf_insn.h" 26 27 27 28 char bpf_log_buf[BPF_LOG_BUF_SIZE]; 28 29
+2 -1
samples/bpf/test_cgrp2_sock2.c
··· 19 19 #include <fcntl.h> 20 20 #include <net/if.h> 21 21 #include <linux/bpf.h> 22 + #include <bpf/bpf.h> 22 23 23 - #include "libbpf.h" 24 + #include "bpf_insn.h" 24 25 #include "bpf_load.h" 25 26 26 27 static int usage(const char *argv0)
+1 -1
samples/bpf/test_current_task_under_cgroup_user.c
··· 9 9 #include <stdio.h> 10 10 #include <linux/bpf.h> 11 11 #include <unistd.h> 12 - #include "libbpf.h" 12 + #include <bpf/bpf.h> 13 13 #include "bpf_load.h" 14 14 #include <linux/bpf.h> 15 15 #include "cgroup_helpers.h"
+1 -1
samples/bpf/test_lru_dist.c
··· 21 21 #include <stdlib.h> 22 22 #include <time.h> 23 23 24 - #include "libbpf.h" 24 + #include <bpf/bpf.h> 25 25 #include "bpf_util.h" 26 26 27 27 #define min(a, b) ((a) < (b) ? (a) : (b))
+1 -1
samples/bpf/test_map_in_map_user.c
··· 13 13 #include <errno.h> 14 14 #include <stdlib.h> 15 15 #include <stdio.h> 16 - #include "libbpf.h" 16 + #include <bpf/bpf.h> 17 17 #include "bpf_load.h" 18 18 19 19 #define PORT_A (map_fd[0])
+1 -1
samples/bpf/test_overhead_user.c
··· 19 19 #include <string.h> 20 20 #include <time.h> 21 21 #include <sys/resource.h> 22 - #include "libbpf.h" 22 + #include <bpf/bpf.h> 23 23 #include "bpf_load.h" 24 24 25 25 #define MAX_CNT 1000000
+1 -1
samples/bpf/test_probe_write_user_user.c
··· 3 3 #include <assert.h> 4 4 #include <linux/bpf.h> 5 5 #include <unistd.h> 6 - #include "libbpf.h" 6 + #include <bpf/bpf.h> 7 7 #include "bpf_load.h" 8 8 #include <sys/socket.h> 9 9 #include <string.h>
+4 -4
samples/bpf/trace_output_user.c
··· 18 18 #include <sys/mman.h> 19 19 #include <time.h> 20 20 #include <signal.h> 21 - #include "libbpf.h" 21 + #include <libbpf.h> 22 22 #include "bpf_load.h" 23 23 #include "perf-sys.h" 24 24 #include "trace_helpers.h" ··· 48 48 if (e->cookie != 0x12345678) { 49 49 printf("BUG pid %llx cookie %llx sized %d\n", 50 50 e->pid, e->cookie, size); 51 - return PERF_EVENT_ERROR; 51 + return LIBBPF_PERF_EVENT_ERROR; 52 52 } 53 53 54 54 cnt++; ··· 56 56 if (cnt == MAX_CNT) { 57 57 printf("recv %lld events per sec\n", 58 58 MAX_CNT * 1000000000ll / (time_get_ns() - start_time)); 59 - return PERF_EVENT_DONE; 59 + return LIBBPF_PERF_EVENT_DONE; 60 60 } 61 61 62 - return PERF_EVENT_CONT; 62 + return LIBBPF_PERF_EVENT_CONT; 63 63 } 64 64 65 65 static void test_bpf_perf_event(void)
+1 -1
samples/bpf/tracex1_user.c
··· 2 2 #include <stdio.h> 3 3 #include <linux/bpf.h> 4 4 #include <unistd.h> 5 - #include "libbpf.h" 5 + #include <bpf/bpf.h> 6 6 #include "bpf_load.h" 7 7 8 8 int main(int ac, char **argv)
+1 -1
samples/bpf/tracex2_user.c
··· 7 7 #include <string.h> 8 8 #include <sys/resource.h> 9 9 10 - #include "libbpf.h" 10 + #include <bpf/bpf.h> 11 11 #include "bpf_load.h" 12 12 #include "bpf_util.h" 13 13
+1 -1
samples/bpf/tracex3_user.c
··· 13 13 #include <linux/bpf.h> 14 14 #include <sys/resource.h> 15 15 16 - #include "libbpf.h" 16 + #include <bpf/bpf.h> 17 17 #include "bpf_load.h" 18 18 #include "bpf_util.h" 19 19
+1 -1
samples/bpf/tracex4_user.c
··· 14 14 #include <linux/bpf.h> 15 15 #include <sys/resource.h> 16 16 17 - #include "libbpf.h" 17 + #include <bpf/bpf.h> 18 18 #include "bpf_load.h" 19 19 20 20 struct pair {
+1 -1
samples/bpf/tracex5_user.c
··· 5 5 #include <linux/filter.h> 6 6 #include <linux/seccomp.h> 7 7 #include <sys/prctl.h> 8 - #include "libbpf.h" 8 + #include <bpf/bpf.h> 9 9 #include "bpf_load.h" 10 10 #include <sys/resource.h> 11 11
+1 -1
samples/bpf/tracex6_user.c
··· 16 16 #include <unistd.h> 17 17 18 18 #include "bpf_load.h" 19 - #include "libbpf.h" 19 + #include <bpf/bpf.h> 20 20 #include "perf-sys.h" 21 21 22 22 #define SAMPLE_PERIOD 0x7fffffffffffffffULL
+1 -1
samples/bpf/tracex7_user.c
··· 3 3 #include <stdio.h> 4 4 #include <linux/bpf.h> 5 5 #include <unistd.h> 6 - #include "libbpf.h" 6 + #include <bpf/bpf.h> 7 7 #include "bpf_load.h" 8 8 9 9 int main(int argc, char **argv)
+21 -10
samples/bpf/xdp1_user.c
··· 16 16 #include <libgen.h> 17 17 #include <sys/resource.h> 18 18 19 - #include "bpf_load.h" 20 19 #include "bpf_util.h" 21 - #include "libbpf.h" 20 + #include "bpf/bpf.h" 21 + #include "bpf/libbpf.h" 22 22 23 23 static int ifindex; 24 24 static __u32 xdp_flags; ··· 31 31 32 32 /* simple per-protocol drop counter 33 33 */ 34 - static void poll_stats(int interval) 34 + static void poll_stats(int map_fd, int interval) 35 35 { 36 36 unsigned int nr_cpus = bpf_num_possible_cpus(); 37 37 const unsigned int nr_keys = 256; ··· 47 47 for (key = 0; key < nr_keys; key++) { 48 48 __u64 sum = 0; 49 49 50 - assert(bpf_map_lookup_elem(map_fd[0], &key, values) == 0); 50 + assert(bpf_map_lookup_elem(map_fd, &key, values) == 0); 51 51 for (i = 0; i < nr_cpus; i++) 52 52 sum += (values[i] - prev[key][i]); 53 53 if (sum) ··· 71 71 int main(int argc, char **argv) 72 72 { 73 73 struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; 74 + struct bpf_prog_load_attr prog_load_attr = { 75 + .prog_type = BPF_PROG_TYPE_XDP, 76 + }; 74 77 const char *optstr = "SN"; 78 + int prog_fd, map_fd, opt; 79 + struct bpf_object *obj; 80 + struct bpf_map *map; 75 81 char filename[256]; 76 - int opt; 77 82 78 83 while ((opt = getopt(argc, argv, optstr)) != -1) { 79 84 switch (opt) { ··· 107 102 ifindex = strtoul(argv[optind], NULL, 0); 108 103 109 104 snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); 105 + prog_load_attr.file = filename; 110 106 111 - if (load_bpf_file(filename)) { 112 - printf("%s", bpf_log_buf); 107 + if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd)) 108 + return 1; 109 + 110 + map = bpf_map__next(NULL, obj); 111 + if (!map) { 112 + printf("finding a map in obj file failed\n"); 113 113 return 1; 114 114 } 115 + map_fd = bpf_map__fd(map); 115 116 116 - if (!prog_fd[0]) { 117 + if (!prog_fd) { 117 118 printf("load_bpf_file: %s\n", strerror(errno)); 118 119 return 1; 119 120 } ··· 127 116 signal(SIGINT, int_exit); 128 117 signal(SIGTERM, int_exit); 129 118 130 - if 
(bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) { 119 + if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) { 131 120 printf("link set xdp fd failed\n"); 132 121 return 1; 133 122 } 134 123 135 - poll_stats(2); 124 + poll_stats(map_fd, 2); 136 125 137 126 return 0; 138 127 }
+22 -14
samples/bpf/xdp_adjust_tail_user.c
··· 18 18 #include <netinet/ether.h> 19 19 #include <unistd.h> 20 20 #include <time.h> 21 - #include "bpf_load.h" 22 - #include "libbpf.h" 23 - #include "bpf_util.h" 21 + #include "bpf/bpf.h" 22 + #include "bpf/libbpf.h" 24 23 25 24 #define STATS_INTERVAL_S 2U 26 25 ··· 35 36 36 37 /* simple "icmp packet too big sent" counter 37 38 */ 38 - static void poll_stats(unsigned int kill_after_s) 39 + static void poll_stats(unsigned int map_fd, unsigned int kill_after_s) 39 40 { 40 41 time_t started_at = time(NULL); 41 42 __u64 value = 0; ··· 45 46 while (!kill_after_s || time(NULL) - started_at <= kill_after_s) { 46 47 sleep(STATS_INTERVAL_S); 47 48 48 - assert(bpf_map_lookup_elem(map_fd[0], &key, &value) == 0); 49 + assert(bpf_map_lookup_elem(map_fd, &key, &value) == 0); 49 50 50 51 printf("icmp \"packet too big\" sent: %10llu pkts\n", value); 51 52 } ··· 65 66 66 67 int main(int argc, char **argv) 67 68 { 69 + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; 70 + struct bpf_prog_load_attr prog_load_attr = { 71 + .prog_type = BPF_PROG_TYPE_XDP, 72 + }; 68 73 unsigned char opt_flags[256] = {}; 69 74 unsigned int kill_after_s = 0; 70 75 const char *optstr = "i:T:SNh"; 71 - struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; 76 + int i, prog_fd, map_fd, opt; 77 + struct bpf_object *obj; 78 + struct bpf_map *map; 72 79 char filename[256]; 73 - int opt; 74 - int i; 75 - 76 80 77 81 for (i = 0; i < strlen(optstr); i++) 78 82 if (optstr[i] != 'h' && 'a' <= optstr[i] && optstr[i] <= 'z') ··· 117 115 } 118 116 119 117 snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); 118 + prog_load_attr.file = filename; 120 119 121 - if (load_bpf_file(filename)) { 122 - printf("%s", bpf_log_buf); 120 + if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd)) 121 + return 1; 122 + 123 + map = bpf_map__next(NULL, obj); 124 + if (!map) { 125 + printf("finding a map in obj file failed\n"); 123 126 return 1; 124 127 } 128 + map_fd = bpf_map__fd(map); 125 129 126 - if (!prog_fd[0]) { 130 + 
if (!prog_fd) { 127 131 printf("load_bpf_file: %s\n", strerror(errno)); 128 132 return 1; 129 133 } ··· 137 129 signal(SIGINT, int_exit); 138 130 signal(SIGTERM, int_exit); 139 131 140 - if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) { 132 + if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) { 141 133 printf("link set xdp fd failed\n"); 142 134 return 1; 143 135 } 144 136 145 - poll_stats(kill_after_s); 137 + poll_stats(map_fd, kill_after_s); 146 138 147 139 bpf_set_link_xdp_fd(ifindex, -1, xdp_flags); 148 140
+138
samples/bpf/xdp_fwd_kern.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2017-18 David Ahern <dsahern@gmail.com> 3 + * 4 + * This program is free software; you can redistribute it and/or 5 + * modify it under the terms of version 2 of the GNU General Public 6 + * License as published by the Free Software Foundation. 7 + * 8 + * This program is distributed in the hope that it will be useful, but 9 + * WITHOUT ANY WARRANTY; without even the implied warranty of 10 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 11 + * General Public License for more details. 12 + */ 13 + #define KBUILD_MODNAME "foo" 14 + #include <uapi/linux/bpf.h> 15 + #include <linux/in.h> 16 + #include <linux/if_ether.h> 17 + #include <linux/if_packet.h> 18 + #include <linux/if_vlan.h> 19 + #include <linux/ip.h> 20 + #include <linux/ipv6.h> 21 + 22 + #include "bpf_helpers.h" 23 + 24 + #define IPV6_FLOWINFO_MASK cpu_to_be32(0x0FFFFFFF) 25 + 26 + struct bpf_map_def SEC("maps") tx_port = { 27 + .type = BPF_MAP_TYPE_DEVMAP, 28 + .key_size = sizeof(int), 29 + .value_size = sizeof(int), 30 + .max_entries = 64, 31 + }; 32 + 33 + /* from include/net/ip.h */ 34 + static __always_inline int ip_decrease_ttl(struct iphdr *iph) 35 + { 36 + u32 check = (__force u32)iph->check; 37 + 38 + check += (__force u32)htons(0x0100); 39 + iph->check = (__force __sum16)(check + (check >= 0xFFFF)); 40 + return --iph->ttl; 41 + } 42 + 43 + static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags) 44 + { 45 + void *data_end = (void *)(long)ctx->data_end; 46 + void *data = (void *)(long)ctx->data; 47 + struct bpf_fib_lookup fib_params; 48 + struct ethhdr *eth = data; 49 + struct ipv6hdr *ip6h; 50 + struct iphdr *iph; 51 + int out_index; 52 + u16 h_proto; 53 + u64 nh_off; 54 + 55 + nh_off = sizeof(*eth); 56 + if (data + nh_off > data_end) 57 + return XDP_DROP; 58 + 59 + __builtin_memset(&fib_params, 0, sizeof(fib_params)); 60 + 61 + h_proto = eth->h_proto; 62 + if (h_proto == htons(ETH_P_IP)) { 63 + 
iph = data + nh_off; 64 + 65 + if (iph + 1 > data_end) 66 + return XDP_DROP; 67 + 68 + if (iph->ttl <= 1) 69 + return XDP_PASS; 70 + 71 + fib_params.family = AF_INET; 72 + fib_params.tos = iph->tos; 73 + fib_params.l4_protocol = iph->protocol; 74 + fib_params.sport = 0; 75 + fib_params.dport = 0; 76 + fib_params.tot_len = ntohs(iph->tot_len); 77 + fib_params.ipv4_src = iph->saddr; 78 + fib_params.ipv4_dst = iph->daddr; 79 + } else if (h_proto == htons(ETH_P_IPV6)) { 80 + struct in6_addr *src = (struct in6_addr *) fib_params.ipv6_src; 81 + struct in6_addr *dst = (struct in6_addr *) fib_params.ipv6_dst; 82 + 83 + ip6h = data + nh_off; 84 + if (ip6h + 1 > data_end) 85 + return XDP_DROP; 86 + 87 + if (ip6h->hop_limit <= 1) 88 + return XDP_PASS; 89 + 90 + fib_params.family = AF_INET6; 91 + fib_params.flowlabel = *(__be32 *)ip6h & IPV6_FLOWINFO_MASK; 92 + fib_params.l4_protocol = ip6h->nexthdr; 93 + fib_params.sport = 0; 94 + fib_params.dport = 0; 95 + fib_params.tot_len = ntohs(ip6h->payload_len); 96 + *src = ip6h->saddr; 97 + *dst = ip6h->daddr; 98 + } else { 99 + return XDP_PASS; 100 + } 101 + 102 + fib_params.ifindex = ctx->ingress_ifindex; 103 + 104 + out_index = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags); 105 + 106 + /* verify egress index has xdp support 107 + * TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with 108 + * cannot pass map_type 14 into func bpf_map_lookup_elem#1: 109 + * NOTE: without verification that egress index supports XDP 110 + * forwarding packets are dropped. 
111 + */ 112 + if (out_index > 0) { 113 + if (h_proto == htons(ETH_P_IP)) 114 + ip_decrease_ttl(iph); 115 + else if (h_proto == htons(ETH_P_IPV6)) 116 + ip6h->hop_limit--; 117 + 118 + memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN); 119 + memcpy(eth->h_source, fib_params.smac, ETH_ALEN); 120 + return bpf_redirect_map(&tx_port, out_index, 0); 121 + } 122 + 123 + return XDP_PASS; 124 + } 125 + 126 + SEC("xdp_fwd") 127 + int xdp_fwd_prog(struct xdp_md *ctx) 128 + { 129 + return xdp_fwd_flags(ctx, 0); 130 + } 131 + 132 + SEC("xdp_fwd_direct") 133 + int xdp_fwd_direct_prog(struct xdp_md *ctx) 134 + { 135 + return xdp_fwd_flags(ctx, BPF_FIB_LOOKUP_DIRECT); 136 + } 137 + 138 + char _license[] SEC("license") = "GPL";
+136
samples/bpf/xdp_fwd_user.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2017-18 David Ahern <dsahern@gmail.com> 3 + * 4 + * This program is free software; you can redistribute it and/or 5 + * modify it under the terms of version 2 of the GNU General Public 6 + * License as published by the Free Software Foundation. 7 + * 8 + * This program is distributed in the hope that it will be useful, but 9 + * WITHOUT ANY WARRANTY; without even the implied warranty of 10 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 11 + * General Public License for more details. 12 + */ 13 + 14 + #include <linux/bpf.h> 15 + #include <linux/if_link.h> 16 + #include <linux/limits.h> 17 + #include <net/if.h> 18 + #include <errno.h> 19 + #include <stdio.h> 20 + #include <stdlib.h> 21 + #include <stdbool.h> 22 + #include <string.h> 23 + #include <unistd.h> 24 + #include <fcntl.h> 25 + #include <libgen.h> 26 + 27 + #include "bpf_load.h" 28 + #include "bpf_util.h" 29 + #include <bpf/bpf.h> 30 + 31 + 32 + static int do_attach(int idx, int fd, const char *name) 33 + { 34 + int err; 35 + 36 + err = bpf_set_link_xdp_fd(idx, fd, 0); 37 + if (err < 0) 38 + printf("ERROR: failed to attach program to %s\n", name); 39 + 40 + return err; 41 + } 42 + 43 + static int do_detach(int idx, const char *name) 44 + { 45 + int err; 46 + 47 + err = bpf_set_link_xdp_fd(idx, -1, 0); 48 + if (err < 0) 49 + printf("ERROR: failed to detach program from %s\n", name); 50 + 51 + return err; 52 + } 53 + 54 + static void usage(const char *prog) 55 + { 56 + fprintf(stderr, 57 + "usage: %s [OPTS] interface-list\n" 58 + "\nOPTS:\n" 59 + " -d detach program\n" 60 + " -D direct table lookups (skip fib rules)\n", 61 + prog); 62 + } 63 + 64 + int main(int argc, char **argv) 65 + { 66 + char filename[PATH_MAX]; 67 + int opt, i, idx, err; 68 + int prog_id = 0; 69 + int attach = 1; 70 + int ret = 0; 71 + 72 + while ((opt = getopt(argc, argv, ":dD")) != -1) { 73 + switch (opt) { 74 + case 'd': 75 + attach = 0; 76 + break; 
77 + case 'D': 78 + prog_id = 1; 79 + break; 80 + default: 81 + usage(basename(argv[0])); 82 + return 1; 83 + } 84 + } 85 + 86 + if (optind == argc) { 87 + usage(basename(argv[0])); 88 + return 1; 89 + } 90 + 91 + if (attach) { 92 + snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); 93 + 94 + if (access(filename, O_RDONLY) < 0) { 95 + printf("error accessing file %s: %s\n", 96 + filename, strerror(errno)); 97 + return 1; 98 + } 99 + 100 + if (load_bpf_file(filename)) { 101 + printf("%s", bpf_log_buf); 102 + return 1; 103 + } 104 + 105 + if (!prog_fd[prog_id]) { 106 + printf("load_bpf_file: %s\n", strerror(errno)); 107 + return 1; 108 + } 109 + } 110 + if (attach) { 111 + for (i = 1; i < 64; ++i) 112 + bpf_map_update_elem(map_fd[0], &i, &i, 0); 113 + } 114 + 115 + for (i = optind; i < argc; ++i) { 116 + idx = if_nametoindex(argv[i]); 117 + if (!idx) 118 + idx = strtoul(argv[i], NULL, 0); 119 + 120 + if (!idx) { 121 + fprintf(stderr, "Invalid arg\n"); 122 + return 1; 123 + } 124 + if (!attach) { 125 + err = do_detach(idx, argv[i]); 126 + if (err) 127 + ret = err; 128 + } else { 129 + err = do_attach(idx, prog_fd[prog_id], argv[i]); 130 + if (err) 131 + ret = err; 132 + } 133 + } 134 + 135 + return ret; 136 + }
+3 -3
samples/bpf/xdp_monitor_user.c
··· 26 26 #include <net/if.h> 27 27 #include <time.h> 28 28 29 - #include "libbpf.h" 29 + #include <bpf/bpf.h> 30 30 #include "bpf_load.h" 31 31 #include "bpf_util.h" 32 32 ··· 58 58 printf(" flag (internal value:%d)", 59 59 *long_options[i].flag); 60 60 else 61 - printf("(internal short-option: -%c)", 61 + printf("short-option: -%c", 62 62 long_options[i].val); 63 63 printf("\n"); 64 64 } ··· 594 594 snprintf(bpf_obj_file, sizeof(bpf_obj_file), "%s_kern.o", argv[0]); 595 595 596 596 /* Parse commands line args */ 597 - while ((opt = getopt_long(argc, argv, "h", 597 + while ((opt = getopt_long(argc, argv, "hDSs:", 598 598 long_options, &longindex)) != -1) { 599 599 switch (opt) { 600 600 case 'D':
+1 -1
samples/bpf/xdp_redirect_cpu_user.c
··· 28 28 * use bpf/libbpf.h), but cannot as (currently) needed for XDP 29 29 * attaching to a device via bpf_set_link_xdp_fd() 30 30 */ 31 - #include "libbpf.h" 31 + #include <bpf/bpf.h> 32 32 #include "bpf_load.h" 33 33 34 34 #include "bpf_util.h"
+1 -1
samples/bpf/xdp_redirect_map_user.c
··· 24 24 25 25 #include "bpf_load.h" 26 26 #include "bpf_util.h" 27 - #include "libbpf.h" 27 + #include <bpf/bpf.h> 28 28 29 29 static int ifindex_in; 30 30 static int ifindex_out;
+1 -1
samples/bpf/xdp_redirect_user.c
··· 24 24 25 25 #include "bpf_load.h" 26 26 #include "bpf_util.h" 27 - #include "libbpf.h" 27 + #include <bpf/bpf.h> 28 28 29 29 static int ifindex_in; 30 30 static int ifindex_out;
+1 -1
samples/bpf/xdp_router_ipv4_user.c
··· 16 16 #include <sys/socket.h> 17 17 #include <unistd.h> 18 18 #include "bpf_load.h" 19 - #include "libbpf.h" 19 + #include <bpf/bpf.h> 20 20 #include <arpa/inet.h> 21 21 #include <fcntl.h> 22 22 #include <poll.h>
+31 -15
samples/bpf/xdp_rxq_info_user.c
··· 22 22 #include <arpa/inet.h> 23 23 #include <linux/if_link.h> 24 24 25 - #include "libbpf.h" 26 - #include "bpf_load.h" 25 + #include "bpf/bpf.h" 26 + #include "bpf/libbpf.h" 27 27 #include "bpf_util.h" 28 28 29 29 static int ifindex = -1; ··· 31 31 static char *ifname; 32 32 33 33 static __u32 xdp_flags; 34 + 35 + static struct bpf_map *stats_global_map; 36 + static struct bpf_map *rx_queue_index_map; 34 37 35 38 /* Exit return codes */ 36 39 #define EXIT_OK 0 ··· 177 174 178 175 static struct record *alloc_record_per_rxq(void) 179 176 { 180 - unsigned int nr_rxqs = map_data[2].def.max_entries; 177 + unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries; 181 178 struct record *array; 182 179 size_t size; 183 180 ··· 193 190 194 191 static struct stats_record *alloc_stats_record(void) 195 192 { 196 - unsigned int nr_rxqs = map_data[2].def.max_entries; 193 + unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries; 197 194 struct stats_record *rec; 198 195 int i; 199 196 ··· 213 210 214 211 static void free_stats_record(struct stats_record *r) 215 212 { 216 - unsigned int nr_rxqs = map_data[2].def.max_entries; 213 + unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries; 217 214 int i; 218 215 219 216 for (i = 0; i < nr_rxqs; i++) ··· 257 254 { 258 255 int fd, i, max_rxqs; 259 256 260 - fd = map_data[1].fd; /* map: stats_global_map */ 257 + fd = bpf_map__fd(stats_global_map); 261 258 map_collect_percpu(fd, 0, &rec->stats); 262 259 263 - fd = map_data[2].fd; /* map: rx_queue_index_map */ 264 - max_rxqs = map_data[2].def.max_entries; 260 + fd = bpf_map__fd(rx_queue_index_map); 261 + max_rxqs = bpf_map__def(rx_queue_index_map)->max_entries; 265 262 for (i = 0; i < max_rxqs; i++) 266 263 map_collect_percpu(fd, i, &rec->rxq[i]); 267 264 } ··· 307 304 struct stats_record *stats_prev, 308 305 int action) 309 306 { 307 + unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries; 310 308 unsigned int nr_cpus = 
bpf_num_possible_cpus(); 311 - unsigned int nr_rxqs = map_data[2].def.max_entries; 312 309 double pps = 0, err = 0; 313 310 struct record *rec, *prev; 314 311 double t; ··· 422 419 int main(int argc, char **argv) 423 420 { 424 421 struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY}; 422 + struct bpf_prog_load_attr prog_load_attr = { 423 + .prog_type = BPF_PROG_TYPE_XDP, 424 + }; 425 + int prog_fd, map_fd, opt, err; 425 426 bool use_separators = true; 426 427 struct config cfg = { 0 }; 428 + struct bpf_object *obj; 429 + struct bpf_map *map; 427 430 char filename[256]; 428 431 int longindex = 0; 429 432 int interval = 2; 430 433 __u32 key = 0; 431 - int opt, err; 432 434 433 435 char action_str_buf[XDP_ACTION_MAX_STRLEN + 1 /* for \0 */] = { 0 }; 434 436 int action = XDP_PASS; /* Default action */ 435 437 char *action_str = NULL; 436 438 437 439 snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); 440 + prog_load_attr.file = filename; 438 441 439 442 if (setrlimit(RLIMIT_MEMLOCK, &r)) { 440 443 perror("setrlimit(RLIMIT_MEMLOCK)"); 441 444 return 1; 442 445 } 443 446 444 - if (load_bpf_file(filename)) { 445 - fprintf(stderr, "ERR in load_bpf_file(): %s", bpf_log_buf); 447 + if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd)) 448 + return EXIT_FAIL; 449 + 450 + map = bpf_map__next(NULL, obj); 451 + stats_global_map = bpf_map__next(map, obj); 452 + rx_queue_index_map = bpf_map__next(stats_global_map, obj); 453 + if (!map || !stats_global_map || !rx_queue_index_map) { 454 + printf("finding a map in obj file failed\n"); 446 455 return EXIT_FAIL; 447 456 } 457 + map_fd = bpf_map__fd(map); 448 458 449 - if (!prog_fd[0]) { 459 + if (!prog_fd) { 450 460 fprintf(stderr, "ERR: load_bpf_file: %s\n", strerror(errno)); 451 461 return EXIT_FAIL; 452 462 } ··· 528 512 setlocale(LC_NUMERIC, "en_US"); 529 513 530 514 /* User-side setup ifindex in config_map */ 531 - err = bpf_map_update_elem(map_fd[0], &key, &cfg, 0); 515 + err = bpf_map_update_elem(map_fd, &key, &cfg, 
0); 532 516 if (err) { 533 517 fprintf(stderr, "Store config failed (err:%d)\n", err); 534 518 exit(EXIT_FAIL_BPF); ··· 537 521 /* Remove XDP program when program is interrupted */ 538 522 signal(SIGINT, int_exit); 539 523 540 - if (bpf_set_link_xdp_fd(ifindex, prog_fd[0], xdp_flags) < 0) { 524 + if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) { 541 525 fprintf(stderr, "link set xdp fd failed\n"); 542 526 return EXIT_FAIL_XDP; 543 527 }
+1 -1
samples/bpf/xdp_tx_iptunnel_user.c
··· 18 18 #include <unistd.h> 19 19 #include <time.h> 20 20 #include "bpf_load.h" 21 - #include "libbpf.h" 21 + #include <bpf/bpf.h> 22 22 #include "bpf_util.h" 23 23 #include "xdp_tx_iptunnel_common.h" 24 24
+1 -1
samples/bpf/xdpsock_user.c
··· 38 38 39 39 #include "bpf_load.h" 40 40 #include "bpf_util.h" 41 - #include "libbpf.h" 41 + #include <bpf/bpf.h> 42 42 43 43 #include "xdpsock.h" 44 44
+3
tools/bpf/bpftool/.gitignore
··· 1 + *.d 2 + bpftool 3 + FEATURE-DUMP.bpftool
+1
tools/bpf/bpftool/map.c
··· 66 66 [BPF_MAP_TYPE_DEVMAP] = "devmap", 67 67 [BPF_MAP_TYPE_SOCKMAP] = "sockmap", 68 68 [BPF_MAP_TYPE_CPUMAP] = "cpumap", 69 + [BPF_MAP_TYPE_SOCKHASH] = "sockhash", 69 70 }; 70 71 71 72 static bool map_is_per_cpu(__u32 type)
+20 -61
tools/bpf/bpftool/map_perf_ring.c
··· 39 39 40 40 struct perf_event_sample { 41 41 struct perf_event_header header; 42 + u64 time; 42 43 __u32 size; 43 44 unsigned char data[]; 44 45 }; ··· 50 49 stop = true; 51 50 } 52 51 53 - static void 54 - print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e) 52 + static enum bpf_perf_event_ret print_bpf_output(void *event, void *priv) 55 53 { 54 + struct event_ring_info *ring = priv; 55 + struct perf_event_sample *e = event; 56 56 struct { 57 57 struct perf_event_header header; 58 58 __u64 id; 59 59 __u64 lost; 60 - } *lost = (void *)e; 61 - struct timespec ts; 62 - 63 - if (clock_gettime(CLOCK_MONOTONIC, &ts)) { 64 - perror("Can't read clock for timestamp"); 65 - return; 66 - } 60 + } *lost = event; 67 61 68 62 if (json_output) { 69 63 jsonw_start_object(json_wtr); 70 - jsonw_name(json_wtr, "timestamp"); 71 - jsonw_uint(json_wtr, ts.tv_sec * 1000000000ull + ts.tv_nsec); 72 64 jsonw_name(json_wtr, "type"); 73 65 jsonw_uint(json_wtr, e->header.type); 74 66 jsonw_name(json_wtr, "cpu"); ··· 69 75 jsonw_name(json_wtr, "index"); 70 76 jsonw_uint(json_wtr, ring->key); 71 77 if (e->header.type == PERF_RECORD_SAMPLE) { 78 + jsonw_name(json_wtr, "timestamp"); 79 + jsonw_uint(json_wtr, e->time); 72 80 jsonw_name(json_wtr, "data"); 73 81 print_data_json(e->data, e->size); 74 82 } else if (e->header.type == PERF_RECORD_LOST) { ··· 85 89 jsonw_end_object(json_wtr); 86 90 } else { 87 91 if (e->header.type == PERF_RECORD_SAMPLE) { 88 - printf("== @%ld.%ld CPU: %d index: %d =====\n", 89 - (long)ts.tv_sec, ts.tv_nsec, 92 + printf("== @%lld.%09lld CPU: %d index: %d =====\n", 93 + e->time / 1000000000ULL, e->time % 1000000000ULL, 90 94 ring->cpu, ring->key); 91 95 fprint_hex(stdout, e->data, e->size, " "); 92 96 printf("\n"); ··· 97 101 e->header.type, e->header.size); 98 102 } 99 103 } 104 + 105 + return LIBBPF_PERF_EVENT_CONT; 100 106 } 101 107 102 108 static void 103 109 perf_event_read(struct event_ring_info *ring, void **buf, size_t *buf_len) 104 110 
{ 105 - volatile struct perf_event_mmap_page *header = ring->mem; 106 - __u64 buffer_size = MMAP_PAGE_CNT * get_page_size(); 107 - __u64 data_tail = header->data_tail; 108 - __u64 data_head = header->data_head; 109 - void *base, *begin, *end; 111 + enum bpf_perf_event_ret ret; 110 112 111 - asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */ 112 - if (data_head == data_tail) 113 - return; 114 - 115 - base = ((char *)header) + get_page_size(); 116 - 117 - begin = base + data_tail % buffer_size; 118 - end = base + data_head % buffer_size; 119 - 120 - while (begin != end) { 121 - struct perf_event_sample *e; 122 - 123 - e = begin; 124 - if (begin + e->header.size > base + buffer_size) { 125 - long len = base + buffer_size - begin; 126 - 127 - if (*buf_len < e->header.size) { 128 - free(*buf); 129 - *buf = malloc(e->header.size); 130 - if (!*buf) { 131 - fprintf(stderr, 132 - "can't allocate memory"); 133 - stop = true; 134 - return; 135 - } 136 - *buf_len = e->header.size; 137 - } 138 - 139 - memcpy(*buf, begin, len); 140 - memcpy(*buf + len, base, e->header.size - len); 141 - e = (void *)*buf; 142 - begin = base + e->header.size - len; 143 - } else if (begin + e->header.size == base + buffer_size) { 144 - begin = base; 145 - } else { 146 - begin += e->header.size; 147 - } 148 - 149 - print_bpf_output(ring, e); 113 + ret = bpf_perf_event_read_simple(ring->mem, 114 + MMAP_PAGE_CNT * get_page_size(), 115 + get_page_size(), buf, buf_len, 116 + print_bpf_output, ring); 117 + if (ret != LIBBPF_PERF_EVENT_CONT) { 118 + fprintf(stderr, "perf read loop failed with %d\n", ret); 119 + stop = true; 150 120 } 151 - 152 - __sync_synchronize(); /* smp_mb() */ 153 - header->data_tail = data_head; 154 121 } 155 122 156 123 static int perf_mmap_size(void) ··· 144 185 static int bpf_perf_event_open(int map_fd, int key, int cpu) 145 186 { 146 187 struct perf_event_attr attr = { 147 - .sample_type = PERF_SAMPLE_RAW, 188 + .sample_type = PERF_SAMPLE_RAW | 
PERF_SAMPLE_TIME, 148 189 .type = PERF_TYPE_SOFTWARE, 149 190 .config = PERF_COUNT_SW_BPF_OUTPUT, 150 191 };
+18
tools/include/uapi/asm/bitsperlong.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #if defined(__i386__) || defined(__x86_64__) 3 + #include "../../arch/x86/include/uapi/asm/bitsperlong.h" 4 + #elif defined(__aarch64__) 5 + #include "../../arch/arm64/include/uapi/asm/bitsperlong.h" 6 + #elif defined(__powerpc__) 7 + #include "../../arch/powerpc/include/uapi/asm/bitsperlong.h" 8 + #elif defined(__s390__) 9 + #include "../../arch/s390/include/uapi/asm/bitsperlong.h" 10 + #elif defined(__sparc__) 11 + #include "../../arch/sparc/include/uapi/asm/bitsperlong.h" 12 + #elif defined(__mips__) 13 + #include "../../arch/mips/include/uapi/asm/bitsperlong.h" 14 + #elif defined(__ia64__) 15 + #include "../../arch/ia64/include/uapi/asm/bitsperlong.h" 16 + #else 17 + #include <asm-generic/bitsperlong.h> 18 + #endif
+18
tools/include/uapi/asm/errno.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #if defined(__i386__) || defined(__x86_64__) 3 + #include "../../arch/x86/include/uapi/asm/errno.h" 4 + #elif defined(__powerpc__) 5 + #include "../../arch/powerpc/include/uapi/asm/errno.h" 6 + #elif defined(__sparc__) 7 + #include "../../arch/sparc/include/uapi/asm/errno.h" 8 + #elif defined(__alpha__) 9 + #include "../../arch/alpha/include/uapi/asm/errno.h" 10 + #elif defined(__mips__) 11 + #include "../../arch/mips/include/uapi/asm/errno.h" 12 + #elif defined(__ia64__) 13 + #include "../../arch/ia64/include/uapi/asm/errno.h" 14 + #elif defined(__xtensa__) 15 + #include "../../arch/xtensa/include/uapi/asm/errno.h" 16 + #else 17 + #include <asm-generic/errno.h> 18 + #endif
+142 -1
tools/include/uapi/linux/bpf.h
··· 96 96 BPF_PROG_QUERY, 97 97 BPF_RAW_TRACEPOINT_OPEN, 98 98 BPF_BTF_LOAD, 99 + BPF_BTF_GET_FD_BY_ID, 99 100 }; 100 101 101 102 enum bpf_map_type { ··· 117 116 BPF_MAP_TYPE_DEVMAP, 118 117 BPF_MAP_TYPE_SOCKMAP, 119 118 BPF_MAP_TYPE_CPUMAP, 119 + BPF_MAP_TYPE_XSKMAP, 120 + BPF_MAP_TYPE_SOCKHASH, 120 121 }; 121 122 122 123 enum bpf_prog_type { ··· 346 343 __u32 start_id; 347 344 __u32 prog_id; 348 345 __u32 map_id; 346 + __u32 btf_id; 349 347 }; 350 348 __u32 next_id; 351 349 __u32 open_flags; ··· 1829 1825 * Return 1830 1826 * 0 on success, or a negative error in case of failure. 1831 1827 * 1828 + * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 flags) 1829 + * Description 1830 + * Do FIB lookup in kernel tables using parameters in *params*. 1831 + * If lookup is successful and result shows packet is to be 1832 + * forwarded, the neighbor tables are searched for the nexthop. 1833 + * If successful (ie., FIB lookup shows forwarding and nexthop 1834 + * is resolved), the nexthop address is returned in ipv4_dst, 1835 + * ipv6_dst or mpls_out based on family, smac is set to mac 1836 + * address of egress device, dmac is set to nexthop mac address, 1837 + * rt_metric is set to metric from route. 1838 + * 1839 + * *plen* argument is the size of the passed in struct. 1840 + * *flags* argument can be one or more BPF_FIB_LOOKUP_ flags: 1841 + * 1842 + * **BPF_FIB_LOOKUP_DIRECT** means do a direct table lookup vs 1843 + * full lookup using FIB rules 1844 + * **BPF_FIB_LOOKUP_OUTPUT** means do lookup from an egress 1845 + * perspective (default is ingress) 1846 + * 1847 + * *ctx* is either **struct xdp_md** for XDP programs or 1848 + * **struct sk_buff** tc cls_act programs. 1849 + * 1850 + * Return 1851 + * Egress device index on success, 0 if packet needs to continue 1852 + * up the stack for further processing or a negative error in case 1853 + * of failure. 
1854 + * 1855 + * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags) 1856 + * Description 1857 + * Add an entry to, or update a sockhash *map* referencing sockets. 1858 + * The *skops* is used as a new value for the entry associated to 1859 + * *key*. *flags* is one of: 1860 + * 1861 + * **BPF_NOEXIST** 1862 + * The entry for *key* must not exist in the map. 1863 + * **BPF_EXIST** 1864 + * The entry for *key* must already exist in the map. 1865 + * **BPF_ANY** 1866 + * No condition on the existence of the entry for *key*. 1867 + * 1868 + * If the *map* has eBPF programs (parser and verdict), those will 1869 + * be inherited by the socket being added. If the socket is 1870 + * already attached to eBPF programs, this results in an error. 1871 + * Return 1872 + * 0 on success, or a negative error in case of failure. 1873 + * 1874 + * int bpf_msg_redirect_hash(struct sk_msg_buff *msg, struct bpf_map *map, void *key, u64 flags) 1875 + * Description 1876 + * This helper is used in programs implementing policies at the 1877 + * socket level. If the message *msg* is allowed to pass (i.e. if 1878 + * the verdict eBPF program returns **SK_PASS**), redirect it to 1879 + * the socket referenced by *map* (of type 1880 + * **BPF_MAP_TYPE_SOCKHASH**) using hash *key*. Both ingress and 1881 + * egress interfaces can be used for redirection. The 1882 + * **BPF_F_INGRESS** value in *flags* is used to make the 1883 + * distinction (ingress path is selected if the flag is present, 1884 + * egress path otherwise). This is the only flag supported for now. 1885 + * Return 1886 + * **SK_PASS** on success, or **SK_DROP** on error. 1887 + * 1888 + * int bpf_sk_redirect_hash(struct sk_buff *skb, struct bpf_map *map, void *key, u64 flags) 1889 + * Description 1890 + * This helper is used in programs implementing policies at the 1891 + * skb socket level. If the sk_buff *skb* is allowed to pass (i.e. 
1892 + * if the verdict eBPF program returns **SK_PASS**), redirect it 1893 + * to the socket referenced by *map* (of type 1894 + * **BPF_MAP_TYPE_SOCKHASH**) using hash *key*. Both ingress and 1895 + * egress interfaces can be used for redirection. The 1896 + * **BPF_F_INGRESS** value in *flags* is used to make the 1897 + * distinction (ingress path is selected if the flag is present, 1898 + * egress otherwise). This is the only flag supported for now. 1899 + * Return 1900 + * **SK_PASS** on success, or **SK_DROP** on error. 1832 1901 */ 1833 1902 #define __BPF_FUNC_MAPPER(FN) \ 1834 1903 FN(unspec), \ ··· 1972 1895 FN(xdp_adjust_tail), \ 1973 1896 FN(skb_get_xfrm_state), \ 1974 1897 FN(get_stack), \ 1975 - FN(skb_load_bytes_relative), 1898 + FN(skb_load_bytes_relative), \ 1899 + FN(fib_lookup), \ 1900 + FN(sock_hash_update), \ 1901 + FN(msg_redirect_hash), \ 1902 + FN(sk_redirect_hash), 1976 1903 1977 1904 /* integer value in 'imm' field of BPF_CALL instruction selects which helper 1978 1905 * function eBPF program intends to call ··· 2210 2129 __u32 ifindex; 2211 2130 __u64 netns_dev; 2212 2131 __u64 netns_ino; 2132 + __u32 btf_id; 2133 + __u32 btf_key_id; 2134 + __u32 btf_value_id; 2135 + } __attribute__((aligned(8))); 2136 + 2137 + struct bpf_btf_info { 2138 + __aligned_u64 btf; 2139 + __u32 btf_size; 2140 + __u32 id; 2213 2141 } __attribute__((aligned(8))); 2214 2142 2215 2143 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed ··· 2397 2307 2398 2308 struct bpf_raw_tracepoint_args { 2399 2309 __u64 args[0]; 2310 + }; 2311 + 2312 + /* DIRECT: Skip the FIB rules and go to FIB table associated with device 2313 + * OUTPUT: Do lookup from egress perspective; default is ingress 2314 + */ 2315 + #define BPF_FIB_LOOKUP_DIRECT BIT(0) 2316 + #define BPF_FIB_LOOKUP_OUTPUT BIT(1) 2317 + 2318 + struct bpf_fib_lookup { 2319 + /* input */ 2320 + __u8 family; /* network family, AF_INET, AF_INET6, AF_MPLS */ 2321 + 2322 + /* set if lookup is to 
consider L4 data - e.g., FIB rules */ 2323 + __u8 l4_protocol; 2324 + __be16 sport; 2325 + __be16 dport; 2326 + 2327 + /* total length of packet from network header - used for MTU check */ 2328 + __u16 tot_len; 2329 + __u32 ifindex; /* L3 device index for lookup */ 2330 + 2331 + union { 2332 + /* inputs to lookup */ 2333 + __u8 tos; /* AF_INET */ 2334 + __be32 flowlabel; /* AF_INET6 */ 2335 + 2336 + /* output: metric of fib result */ 2337 + __u32 rt_metric; 2338 + }; 2339 + 2340 + union { 2341 + __be32 mpls_in; 2342 + __be32 ipv4_src; 2343 + __u32 ipv6_src[4]; /* in6_addr; network order */ 2344 + }; 2345 + 2346 + /* input to bpf_fib_lookup, *dst is destination address. 2347 + * output: bpf_fib_lookup sets to gateway address 2348 + */ 2349 + union { 2350 + /* return for MPLS lookups */ 2351 + __be32 mpls_out[4]; /* support up to 4 labels */ 2352 + __be32 ipv4_dst; 2353 + __u32 ipv6_dst[4]; /* in6_addr; network order */ 2354 + }; 2355 + 2356 + /* output */ 2357 + __be16 h_vlan_proto; 2358 + __be16 h_vlan_TCI; 2359 + __u8 smac[6]; /* ETH_ALEN */ 2360 + __u8 dmac[6]; /* ETH_ALEN */ 2400 2361 }; 2401 2362 2402 2363 #endif /* _UAPI__LINUX_BPF_H__ */
+1 -1
tools/lib/bpf/Makefile
··· 69 69 FEATURE_TESTS = libelf libelf-getphdrnum libelf-mmap bpf 70 70 FEATURE_DISPLAY = libelf bpf 71 71 72 - INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi 72 + INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(ARCH)/include/uapi -I$(srctree)/tools/include/uapi -I$(srctree)/tools/perf 73 73 FEATURE_CHECK_CFLAGS-bpf = $(INCLUDES) 74 74 75 75 check_feat := 1
+12
tools/lib/bpf/bpf.c
··· 91 91 attr.btf_fd = create_attr->btf_fd; 92 92 attr.btf_key_id = create_attr->btf_key_id; 93 93 attr.btf_value_id = create_attr->btf_value_id; 94 + attr.map_ifindex = create_attr->map_ifindex; 94 95 95 96 return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr)); 96 97 } ··· 202 201 attr.log_size = 0; 203 202 attr.log_level = 0; 204 203 attr.kern_version = load_attr->kern_version; 204 + attr.prog_ifindex = load_attr->prog_ifindex; 205 205 memcpy(attr.prog_name, load_attr->name, 206 206 min(name_len, BPF_OBJ_NAME_LEN - 1)); 207 207 ··· 458 456 attr.map_id = id; 459 457 460 458 return sys_bpf(BPF_MAP_GET_FD_BY_ID, &attr, sizeof(attr)); 459 + } 460 + 461 + int bpf_btf_get_fd_by_id(__u32 id) 462 + { 463 + union bpf_attr attr; 464 + 465 + bzero(&attr, sizeof(attr)); 466 + attr.btf_id = id; 467 + 468 + return sys_bpf(BPF_BTF_GET_FD_BY_ID, &attr, sizeof(attr)); 461 469 } 462 470 463 471 int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
+3
tools/lib/bpf/bpf.h
··· 38 38 __u32 btf_fd; 39 39 __u32 btf_key_id; 40 40 __u32 btf_value_id; 41 + __u32 map_ifindex; 41 42 }; 42 43 43 44 int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr); ··· 65 64 size_t insns_cnt; 66 65 const char *license; 67 66 __u32 kern_version; 67 + __u32 prog_ifindex; 68 68 }; 69 69 70 70 /* Recommend log buffer size */ ··· 100 98 int bpf_map_get_next_id(__u32 start_id, __u32 *next_id); 101 99 int bpf_prog_get_fd_by_id(__u32 id); 102 100 int bpf_map_get_fd_by_id(__u32 id); 101 + int bpf_btf_get_fd_by_id(__u32 id); 103 102 int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len); 104 103 int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags, 105 104 __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt);
+115 -10
tools/lib/bpf/libbpf.c
··· 31 31 #include <unistd.h> 32 32 #include <fcntl.h> 33 33 #include <errno.h> 34 + #include <perf-sys.h> 34 35 #include <asm/unistd.h> 35 36 #include <linux/err.h> 36 37 #include <linux/kernel.h> ··· 178 177 /* Index in elf obj file, for relocation use. */ 179 178 int idx; 180 179 char *name; 180 + int prog_ifindex; 181 181 char *section_name; 182 182 struct bpf_insn *insns; 183 183 size_t insns_cnt, main_prog_cnt; ··· 214 212 int fd; 215 213 char *name; 216 214 size_t offset; 215 + int map_ifindex; 217 216 struct bpf_map_def def; 218 217 uint32_t btf_key_id; 219 218 uint32_t btf_value_id; ··· 1093 1090 int *pfd = &map->fd; 1094 1091 1095 1092 create_attr.name = map->name; 1093 + create_attr.map_ifindex = map->map_ifindex; 1096 1094 create_attr.map_type = def->type; 1097 1095 create_attr.map_flags = def->map_flags; 1098 1096 create_attr.key_size = def->key_size; ··· 1276 1272 static int 1277 1273 load_program(enum bpf_prog_type type, enum bpf_attach_type expected_attach_type, 1278 1274 const char *name, struct bpf_insn *insns, int insns_cnt, 1279 - char *license, u32 kern_version, int *pfd) 1275 + char *license, u32 kern_version, int *pfd, int prog_ifindex) 1280 1276 { 1281 1277 struct bpf_load_program_attr load_attr; 1282 1278 char *log_buf; ··· 1290 1286 load_attr.insns_cnt = insns_cnt; 1291 1287 load_attr.license = license; 1292 1288 load_attr.kern_version = kern_version; 1289 + load_attr.prog_ifindex = prog_ifindex; 1293 1290 1294 1291 if (!load_attr.insns || !load_attr.insns_cnt) 1295 1292 return -EINVAL; ··· 1372 1367 } 1373 1368 err = load_program(prog->type, prog->expected_attach_type, 1374 1369 prog->name, prog->insns, prog->insns_cnt, 1375 - license, kern_version, &fd); 1370 + license, kern_version, &fd, 1371 + prog->prog_ifindex); 1376 1372 if (!err) 1377 1373 prog->instances.fds[0] = fd; 1378 1374 goto out; ··· 1404 1398 err = load_program(prog->type, prog->expected_attach_type, 1405 1399 prog->name, result.new_insn_ptr, 1406 1400 result.new_insn_cnt, 
1407 - license, kern_version, &fd); 1401 + license, kern_version, &fd, 1402 + prog->prog_ifindex); 1408 1403 1409 1404 if (err) { 1410 1405 pr_warning("Loading the %dth instance of program '%s' failed\n", ··· 1444 1437 return 0; 1445 1438 } 1446 1439 1447 - static int bpf_object__validate(struct bpf_object *obj) 1440 + static bool bpf_prog_type__needs_kver(enum bpf_prog_type type) 1448 1441 { 1449 - if (obj->kern_version == 0) { 1442 + switch (type) { 1443 + case BPF_PROG_TYPE_SOCKET_FILTER: 1444 + case BPF_PROG_TYPE_SCHED_CLS: 1445 + case BPF_PROG_TYPE_SCHED_ACT: 1446 + case BPF_PROG_TYPE_XDP: 1447 + case BPF_PROG_TYPE_CGROUP_SKB: 1448 + case BPF_PROG_TYPE_CGROUP_SOCK: 1449 + case BPF_PROG_TYPE_LWT_IN: 1450 + case BPF_PROG_TYPE_LWT_OUT: 1451 + case BPF_PROG_TYPE_LWT_XMIT: 1452 + case BPF_PROG_TYPE_SOCK_OPS: 1453 + case BPF_PROG_TYPE_SK_SKB: 1454 + case BPF_PROG_TYPE_CGROUP_DEVICE: 1455 + case BPF_PROG_TYPE_SK_MSG: 1456 + case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: 1457 + return false; 1458 + case BPF_PROG_TYPE_UNSPEC: 1459 + case BPF_PROG_TYPE_KPROBE: 1460 + case BPF_PROG_TYPE_TRACEPOINT: 1461 + case BPF_PROG_TYPE_PERF_EVENT: 1462 + case BPF_PROG_TYPE_RAW_TRACEPOINT: 1463 + default: 1464 + return true; 1465 + } 1466 + } 1467 + 1468 + static int bpf_object__validate(struct bpf_object *obj, bool needs_kver) 1469 + { 1470 + if (needs_kver && obj->kern_version == 0) { 1450 1471 pr_warning("%s doesn't provide kernel version\n", 1451 1472 obj->path); 1452 1473 return -LIBBPF_ERRNO__KVERSION; ··· 1483 1448 } 1484 1449 1485 1450 static struct bpf_object * 1486 - __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz) 1451 + __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz, 1452 + bool needs_kver) 1487 1453 { 1488 1454 struct bpf_object *obj; 1489 1455 int err; ··· 1502 1466 CHECK_ERR(bpf_object__check_endianness(obj), err, out); 1503 1467 CHECK_ERR(bpf_object__elf_collect(obj), err, out); 1504 1468 CHECK_ERR(bpf_object__collect_reloc(obj), 
err, out); 1505 - CHECK_ERR(bpf_object__validate(obj), err, out); 1469 + CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out); 1506 1470 1507 1471 bpf_object__elf_finish(obj); 1508 1472 return obj; ··· 1519 1483 1520 1484 pr_debug("loading %s\n", path); 1521 1485 1522 - return __bpf_object__open(path, NULL, 0); 1486 + return __bpf_object__open(path, NULL, 0, true); 1523 1487 } 1524 1488 1525 1489 struct bpf_object *bpf_object__open_buffer(void *obj_buf, ··· 1542 1506 pr_debug("loading object '%s' from buffer\n", 1543 1507 name); 1544 1508 1545 - return __bpf_object__open(name, obj_buf, obj_buf_sz); 1509 + return __bpf_object__open(name, obj_buf, obj_buf_sz, true); 1546 1510 } 1547 1511 1548 1512 int bpf_object__unload(struct bpf_object *obj) ··· 2194 2158 enum bpf_attach_type expected_attach_type; 2195 2159 enum bpf_prog_type prog_type; 2196 2160 struct bpf_object *obj; 2161 + struct bpf_map *map; 2197 2162 int section_idx; 2198 2163 int err; 2199 2164 2200 2165 if (!attr) 2201 2166 return -EINVAL; 2167 + if (!attr->file) 2168 + return -EINVAL; 2202 2169 2203 - obj = bpf_object__open(attr->file); 2170 + obj = __bpf_object__open(attr->file, NULL, 0, 2171 + bpf_prog_type__needs_kver(attr->prog_type)); 2204 2172 if (IS_ERR(obj)) 2205 2173 return -ENOENT; 2206 2174 ··· 2214 2174 * section name. 
2215 2175 */ 2216 2176 prog_type = attr->prog_type; 2177 + prog->prog_ifindex = attr->ifindex; 2217 2178 expected_attach_type = attr->expected_attach_type; 2218 2179 if (prog_type == BPF_PROG_TYPE_UNSPEC) { 2219 2180 section_idx = bpf_program__identify_section(prog); ··· 2235 2194 first_prog = prog; 2236 2195 } 2237 2196 2197 + bpf_map__for_each(map, obj) { 2198 + map->map_ifindex = attr->ifindex; 2199 + } 2200 + 2238 2201 if (!first_prog) { 2239 2202 pr_warning("object file doesn't contain bpf program\n"); 2240 2203 bpf_object__close(obj); ··· 2254 2209 *pobj = obj; 2255 2210 *prog_fd = bpf_program__fd(first_prog); 2256 2211 return 0; 2212 + } 2213 + 2214 + enum bpf_perf_event_ret 2215 + bpf_perf_event_read_simple(void *mem, unsigned long size, 2216 + unsigned long page_size, void **buf, size_t *buf_len, 2217 + bpf_perf_event_print_t fn, void *priv) 2218 + { 2219 + volatile struct perf_event_mmap_page *header = mem; 2220 + __u64 data_tail = header->data_tail; 2221 + __u64 data_head = header->data_head; 2222 + void *base, *begin, *end; 2223 + int ret; 2224 + 2225 + asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */ 2226 + if (data_head == data_tail) 2227 + return LIBBPF_PERF_EVENT_CONT; 2228 + 2229 + base = ((char *)header) + page_size; 2230 + 2231 + begin = base + data_tail % size; 2232 + end = base + data_head % size; 2233 + 2234 + while (begin != end) { 2235 + struct perf_event_header *ehdr; 2236 + 2237 + ehdr = begin; 2238 + if (begin + ehdr->size > base + size) { 2239 + long len = base + size - begin; 2240 + 2241 + if (*buf_len < ehdr->size) { 2242 + free(*buf); 2243 + *buf = malloc(ehdr->size); 2244 + if (!*buf) { 2245 + ret = LIBBPF_PERF_EVENT_ERROR; 2246 + break; 2247 + } 2248 + *buf_len = ehdr->size; 2249 + } 2250 + 2251 + memcpy(*buf, begin, len); 2252 + memcpy(*buf + len, base, ehdr->size - len); 2253 + ehdr = (void *)*buf; 2254 + begin = base + ehdr->size - len; 2255 + } else if (begin + ehdr->size == base + size) { 2256 + begin = 
base; 2257 + } else { 2258 + begin += ehdr->size; 2259 + } 2260 + 2261 + ret = fn(ehdr, priv); 2262 + if (ret != LIBBPF_PERF_EVENT_CONT) 2263 + break; 2264 + 2265 + data_tail += ehdr->size; 2266 + } 2267 + 2268 + __sync_synchronize(); /* smp_mb() */ 2269 + header->data_tail = data_tail; 2270 + 2271 + return ret; 2257 2272 }
+38 -24
tools/lib/bpf/libbpf.h
··· 52 52 int libbpf_strerror(int err, char *buf, size_t size); 53 53 54 54 /* 55 - * In include/linux/compiler-gcc.h, __printf is defined. However 56 - * it should be better if libbpf.h doesn't depend on Linux header file. 55 + * __printf is defined in include/linux/compiler-gcc.h. However, 56 + * it would be better if libbpf.h didn't depend on Linux header files. 57 57 * So instead of __printf, here we use gcc attribute directly. 58 58 */ 59 59 typedef int (*libbpf_print_fn_t)(const char *, ...) ··· 92 92 bpf_object_clear_priv_t clear_priv); 93 93 void *bpf_object__priv(struct bpf_object *prog); 94 94 95 - /* Accessors of bpf_program. */ 95 + /* Accessors of bpf_program */ 96 96 struct bpf_program; 97 97 struct bpf_program *bpf_program__next(struct bpf_program *prog, 98 98 struct bpf_object *obj); ··· 121 121 122 122 /* 123 123 * Libbpf allows callers to adjust BPF programs before being loaded 124 - * into kernel. One program in an object file can be transform into 125 - * multiple variants to be attached to different code. 124 + * into kernel. One program in an object file can be transformed into 125 + * multiple variants to be attached to different hooks. 126 126 * 127 127 * bpf_program_prep_t, bpf_program__set_prep and bpf_program__nth_fd 128 - * are APIs for this propose. 128 + * form an API for this purpose. 129 129 * 130 130 * - bpf_program_prep_t: 131 - * It defines 'preprocessor', which is a caller defined function 131 + * Defines a 'preprocessor', which is a caller defined function 132 132 * passed to libbpf through bpf_program__set_prep(), and will be 133 133 * called before program is loaded. The processor should adjust 134 - * the program one time for each instances according to the number 134 + * the program one time for each instance according to the instance id 135 135 * passed to it. 136 136 * 137 137 * - bpf_program__set_prep: 138 - * Attachs a preprocessor to a BPF program. 
The number of instances 139 - * whould be created is also passed through this function. 138 + * Attaches a preprocessor to a BPF program. The number of instances 139 + * that should be created is also passed through this function. 140 140 * 141 141 * - bpf_program__nth_fd: 142 - * After the program is loaded, get resuling fds from bpf program for 143 - * each instances. 142 + * After the program is loaded, get resulting FD of a given instance 143 + * of the BPF program. 144 144 * 145 - * If bpf_program__set_prep() is not used, the program whould be loaded 145 + * If bpf_program__set_prep() is not used, the program would be loaded 146 146 * without adjustment during bpf_object__load(). The program has only 147 147 * one instance. In this case bpf_program__fd(prog) is equal to 148 148 * bpf_program__nth_fd(prog, 0). ··· 156 156 struct bpf_insn *new_insn_ptr; 157 157 int new_insn_cnt; 158 158 159 - /* If not NULL, result fd is set to it */ 159 + /* If not NULL, result FD is written to it. */ 160 160 int *pfd; 161 161 }; 162 162 ··· 169 169 * - res: Output parameter, result of transformation. 170 170 * 171 171 * Return value: 172 - * - Zero: pre-processing success. 173 - * - Non-zero: pre-processing, stop loading. 172 + * - Zero: pre-processing success. 173 + * - Non-zero: pre-processing error, stop loading. 174 174 */ 175 175 typedef int (*bpf_program_prep_t)(struct bpf_program *prog, int n, 176 176 struct bpf_insn *insns, int insns_cnt, ··· 182 182 int bpf_program__nth_fd(struct bpf_program *prog, int n); 183 183 184 184 /* 185 - * Adjust type of bpf program. Default is kprobe. 185 + * Adjust type of BPF program. Default is kprobe. 
186 186 */ 187 187 int bpf_program__set_socket_filter(struct bpf_program *prog); 188 188 int bpf_program__set_tracepoint(struct bpf_program *prog); ··· 206 206 bool bpf_program__is_perf_event(struct bpf_program *prog); 207 207 208 208 /* 209 - * We don't need __attribute__((packed)) now since it is 210 - * unnecessary for 'bpf_map_def' because they are all aligned. 211 - * In addition, using it will trigger -Wpacked warning message, 212 - * and will be treated as an error due to -Werror. 209 + * No need for __attribute__((packed)), all members of 'bpf_map_def' 210 + * are all aligned. In addition, using __attribute__((packed)) 211 + * would trigger a -Wpacked warning message, and lead to an error 212 + * if -Werror is set. 213 213 */ 214 214 struct bpf_map_def { 215 215 unsigned int type; ··· 220 220 }; 221 221 222 222 /* 223 - * There is another 'struct bpf_map' in include/linux/map.h. However, 224 - * it is not a uapi header so no need to consider name clash. 223 + * The 'struct bpf_map' in include/linux/bpf.h is internal to the kernel, 224 + * so no need to worry about a name clash. 225 225 */ 226 226 struct bpf_map; 227 227 struct bpf_map * ··· 229 229 230 230 /* 231 231 * Get bpf_map through the offset of corresponding struct bpf_map_def 232 - * in the bpf object file. 232 + * in the BPF object file. 
233 233 */ 234 234 struct bpf_map * 235 235 bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset); ··· 259 259 const char *file; 260 260 enum bpf_prog_type prog_type; 261 261 enum bpf_attach_type expected_attach_type; 262 + int ifindex; 262 263 }; 263 264 264 265 int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr, ··· 268 267 struct bpf_object **pobj, int *prog_fd); 269 268 270 269 int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags); 270 + 271 + enum bpf_perf_event_ret { 272 + LIBBPF_PERF_EVENT_DONE = 0, 273 + LIBBPF_PERF_EVENT_ERROR = -1, 274 + LIBBPF_PERF_EVENT_CONT = -2, 275 + }; 276 + 277 + typedef enum bpf_perf_event_ret (*bpf_perf_event_print_t)(void *event, 278 + void *priv); 279 + int bpf_perf_event_read_simple(void *mem, unsigned long size, 280 + unsigned long page_size, 281 + void **buf, size_t *buf_len, 282 + bpf_perf_event_print_t fn, void *priv); 271 283 #endif
+1
tools/testing/selftests/bpf/.gitignore
··· 16 16 test_sock_addr 17 17 urandom_read 18 18 test_btf 19 + test_sockmap
+6 -6
tools/testing/selftests/bpf/Makefile
··· 10 10 GENFLAGS := -DHAVE_GENHDR 11 11 endif 12 12 13 - CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include 13 + CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(BPFDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include 14 14 LDLIBS += -lcap -lelf -lrt -lpthread 15 15 16 16 TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read ··· 19 19 $(TEST_CUSTOM_PROGS): urandom_read 20 20 21 21 urandom_read: urandom_read.c 22 - $(CC) -o $(TEST_CUSTOM_PROGS) -static $< 22 + $(CC) -o $(TEST_CUSTOM_PROGS) -static $< -Wl,--build-id 23 23 24 24 # Order correspond to 'make run_tests' order 25 25 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \ ··· 33 33 sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \ 34 34 sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \ 35 35 test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \ 36 - test_get_stack_rawtp.o 36 + test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o 37 37 38 38 # Order correspond to 'make run_tests' order 39 39 TEST_PROGS := test_kmod.sh \ ··· 90 90 $(OUTPUT)/test_l4lb_noinline.o: CLANG_FLAGS += -fno-inline 91 91 $(OUTPUT)/test_xdp_noinline.o: CLANG_FLAGS += -fno-inline 92 92 93 - BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help |& grep dwarfris) 94 - BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help |& grep BTF) 95 - BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --version |& grep LLVM) 93 + BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris) 94 + BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF) 95 + BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --version 2>&1 | grep LLVM) 96 96 97 97 ifneq ($(BTF_LLC_PROBE),) 98 98 ifneq ($(BTF_PAHOLE_PROBE),)
+11
tools/testing/selftests/bpf/bpf_helpers.h
··· 75 75 (void *) BPF_FUNC_sock_ops_cb_flags_set; 76 76 static int (*bpf_sk_redirect_map)(void *ctx, void *map, int key, int flags) = 77 77 (void *) BPF_FUNC_sk_redirect_map; 78 + static int (*bpf_sk_redirect_hash)(void *ctx, void *map, void *key, int flags) = 79 + (void *) BPF_FUNC_sk_redirect_hash; 78 80 static int (*bpf_sock_map_update)(void *map, void *key, void *value, 79 81 unsigned long long flags) = 80 82 (void *) BPF_FUNC_sock_map_update; 83 + static int (*bpf_sock_hash_update)(void *map, void *key, void *value, 84 + unsigned long long flags) = 85 + (void *) BPF_FUNC_sock_hash_update; 81 86 static int (*bpf_perf_event_read_value)(void *map, unsigned long long flags, 82 87 void *buf, unsigned int buf_size) = 83 88 (void *) BPF_FUNC_perf_event_read_value; ··· 93 88 (void *) BPF_FUNC_override_return; 94 89 static int (*bpf_msg_redirect_map)(void *ctx, void *map, int key, int flags) = 95 90 (void *) BPF_FUNC_msg_redirect_map; 91 + static int (*bpf_msg_redirect_hash)(void *ctx, 92 + void *map, void *key, int flags) = 93 + (void *) BPF_FUNC_msg_redirect_hash; 96 94 static int (*bpf_msg_apply_bytes)(void *ctx, int len) = 97 95 (void *) BPF_FUNC_msg_apply_bytes; 98 96 static int (*bpf_msg_cork_bytes)(void *ctx, int len) = ··· 111 103 (void *) BPF_FUNC_skb_get_xfrm_state; 112 104 static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) = 113 105 (void *) BPF_FUNC_get_stack; 106 + static int (*bpf_fib_lookup)(void *ctx, struct bpf_fib_lookup *params, 107 + int plen, __u32 flags) = 108 + (void *) BPF_FUNC_fib_lookup; 114 109 115 110 /* llvm builtin functions that eBPF C program may use to 116 111 * emit BPF_LD_ABS and BPF_LD_IND instructions
+80
tools/testing/selftests/bpf/bpf_rand.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef __BPF_RAND__ 3 + #define __BPF_RAND__ 4 + 5 + #include <stdint.h> 6 + #include <stdlib.h> 7 + #include <time.h> 8 + 9 + static inline uint64_t bpf_rand_mask(uint64_t mask) 10 + { 11 + return (((uint64_t)(uint32_t)rand()) | 12 + ((uint64_t)(uint32_t)rand() << 32)) & mask; 13 + } 14 + 15 + #define bpf_rand_ux(x, m) \ 16 + static inline uint64_t bpf_rand_u##x(int shift) \ 17 + { \ 18 + return bpf_rand_mask((m)) << shift; \ 19 + } 20 + 21 + bpf_rand_ux( 8, 0xffULL) 22 + bpf_rand_ux(16, 0xffffULL) 23 + bpf_rand_ux(24, 0xffffffULL) 24 + bpf_rand_ux(32, 0xffffffffULL) 25 + bpf_rand_ux(40, 0xffffffffffULL) 26 + bpf_rand_ux(48, 0xffffffffffffULL) 27 + bpf_rand_ux(56, 0xffffffffffffffULL) 28 + bpf_rand_ux(64, 0xffffffffffffffffULL) 29 + 30 + static inline void bpf_semi_rand_init(void) 31 + { 32 + srand(time(NULL)); 33 + } 34 + 35 + static inline uint64_t bpf_semi_rand_get(void) 36 + { 37 + switch (rand() % 39) { 38 + case 0: return 0x000000ff00000000ULL | bpf_rand_u8(0); 39 + case 1: return 0xffffffff00000000ULL | bpf_rand_u16(0); 40 + case 2: return 0x00000000ffff0000ULL | bpf_rand_u16(0); 41 + case 3: return 0x8000000000000000ULL | bpf_rand_u32(0); 42 + case 4: return 0x00000000f0000000ULL | bpf_rand_u32(0); 43 + case 5: return 0x0000000100000000ULL | bpf_rand_u24(0); 44 + case 6: return 0x800ff00000000000ULL | bpf_rand_u32(0); 45 + case 7: return 0x7fffffff00000000ULL | bpf_rand_u32(0); 46 + case 8: return 0xffffffffffffff00ULL ^ bpf_rand_u32(24); 47 + case 9: return 0xffffffffffffff00ULL | bpf_rand_u8(0); 48 + case 10: return 0x0000000010000000ULL | bpf_rand_u32(0); 49 + case 11: return 0xf000000000000000ULL | bpf_rand_u8(0); 50 + case 12: return 0x0000f00000000000ULL | bpf_rand_u8(8); 51 + case 13: return 0x000000000f000000ULL | bpf_rand_u8(16); 52 + case 14: return 0x0000000000000f00ULL | bpf_rand_u8(32); 53 + case 15: return 0x00fff00000000f00ULL | bpf_rand_u8(48); 54 + case 16: return 0x00007fffffffffffULL ^ 
bpf_rand_u32(1); 55 + case 17: return 0xffff800000000000ULL | bpf_rand_u8(4); 56 + case 18: return 0xffff800000000000ULL | bpf_rand_u8(20); 57 + case 19: return (0xffffffc000000000ULL + 0x80000ULL) | bpf_rand_u32(0); 58 + case 20: return (0xffffffc000000000ULL - 0x04000000ULL) | bpf_rand_u32(0); 59 + case 21: return 0x0000000000000000ULL | bpf_rand_u8(55) | bpf_rand_u32(20); 60 + case 22: return 0xffffffffffffffffULL ^ bpf_rand_u8(3) ^ bpf_rand_u32(40); 61 + case 23: return 0x0000000000000000ULL | bpf_rand_u8(bpf_rand_u8(0) % 64); 62 + case 24: return 0x0000000000000000ULL | bpf_rand_u16(bpf_rand_u8(0) % 64); 63 + case 25: return 0xffffffffffffffffULL ^ bpf_rand_u8(bpf_rand_u8(0) % 64); 64 + case 26: return 0xffffffffffffffffULL ^ bpf_rand_u40(bpf_rand_u8(0) % 64); 65 + case 27: return 0x0000800000000000ULL; 66 + case 28: return 0x8000000000000000ULL; 67 + case 29: return 0x0000000000000000ULL; 68 + case 30: return 0xffffffffffffffffULL; 69 + case 31: return bpf_rand_u16(bpf_rand_u8(0) % 64); 70 + case 32: return bpf_rand_u24(bpf_rand_u8(0) % 64); 71 + case 33: return bpf_rand_u32(bpf_rand_u8(0) % 64); 72 + case 34: return bpf_rand_u40(bpf_rand_u8(0) % 64); 73 + case 35: return bpf_rand_u48(bpf_rand_u8(0) % 64); 74 + case 36: return bpf_rand_u56(bpf_rand_u8(0) % 64); 75 + case 37: return bpf_rand_u64(bpf_rand_u8(0) % 64); 76 + default: return bpf_rand_u64(0); 77 + } 78 + } 79 + 80 + #endif /* __BPF_RAND__ */
+372 -112
tools/testing/selftests/bpf/test_btf.c
··· 20 20 21 21 #include "bpf_rlimit.h" 22 22 23 + static uint32_t pass_cnt; 24 + static uint32_t error_cnt; 25 + static uint32_t skip_cnt; 26 + 27 + #define CHECK(condition, format...) ({ \ 28 + int __ret = !!(condition); \ 29 + if (__ret) { \ 30 + fprintf(stderr, "%s:%d:FAIL ", __func__, __LINE__); \ 31 + fprintf(stderr, format); \ 32 + } \ 33 + __ret; \ 34 + }) 35 + 36 + static int count_result(int err) 37 + { 38 + if (err) 39 + error_cnt++; 40 + else 41 + pass_cnt++; 42 + 43 + fprintf(stderr, "\n"); 44 + return err; 45 + } 46 + 23 47 #define min(a, b) ((a) < (b) ? (a) : (b)) 24 48 #define __printf(a, b) __attribute__((format(printf, a, b))) 25 49 ··· 918 894 void *raw_btf; 919 895 920 896 type_sec_size = get_type_sec_size(raw_types); 921 - if (type_sec_size < 0) { 922 - fprintf(stderr, "Cannot get nr_raw_types\n"); 897 + if (CHECK(type_sec_size < 0, "Cannot get nr_raw_types")) 923 898 return NULL; 924 - } 925 899 926 900 size_needed = sizeof(*hdr) + type_sec_size + str_sec_size; 927 901 raw_btf = malloc(size_needed); 928 - if (!raw_btf) { 929 - fprintf(stderr, "Cannot allocate memory for raw_btf\n"); 902 + if (CHECK(!raw_btf, "Cannot allocate memory for raw_btf")) 930 903 return NULL; 931 - } 932 904 933 905 /* Copy header */ 934 906 memcpy(raw_btf, hdr, sizeof(*hdr)); ··· 935 915 for (i = 0; i < type_sec_size / sizeof(raw_types[0]); i++) { 936 916 if (raw_types[i] == NAME_TBD) { 937 917 next_str = get_next_str(next_str, end_str); 938 - if (!next_str) { 939 - fprintf(stderr, "Error in getting next_str\n"); 918 + if (CHECK(!next_str, "Error in getting next_str")) { 940 919 free(raw_btf); 941 920 return NULL; 942 921 } ··· 992 973 free(raw_btf); 993 974 994 975 err = ((btf_fd == -1) != test->btf_load_err); 995 - if (err) 996 - fprintf(stderr, "btf_load_err:%d btf_fd:%d\n", 997 - test->btf_load_err, btf_fd); 976 + CHECK(err, "btf_fd:%d test->btf_load_err:%u", 977 + btf_fd, test->btf_load_err); 998 978 999 979 if (err || btf_fd == -1) 1000 980 goto done; ··· 1010 
992 map_fd = bpf_create_map_xattr(&create_attr); 1011 993 1012 994 err = ((map_fd == -1) != test->map_create_err); 1013 - if (err) 1014 - fprintf(stderr, "map_create_err:%d map_fd:%d\n", 1015 - test->map_create_err, map_fd); 995 + CHECK(err, "map_fd:%d test->map_create_err:%u", 996 + map_fd, test->map_create_err); 1016 997 1017 998 done: 1018 999 if (!err) 1019 - fprintf(stderr, "OK\n"); 1000 + fprintf(stderr, "OK"); 1020 1001 1021 1002 if (*btf_log_buf && (err || args.always_log)) 1022 - fprintf(stderr, "%s\n", btf_log_buf); 1003 + fprintf(stderr, "\n%s", btf_log_buf); 1023 1004 1024 1005 if (btf_fd != -1) 1025 1006 close(btf_fd); ··· 1034 1017 int err = 0; 1035 1018 1036 1019 if (args.raw_test_num) 1037 - return do_test_raw(args.raw_test_num); 1020 + return count_result(do_test_raw(args.raw_test_num)); 1038 1021 1039 1022 for (i = 1; i <= ARRAY_SIZE(raw_tests); i++) 1040 - err |= do_test_raw(i); 1023 + err |= count_result(do_test_raw(i)); 1041 1024 1042 1025 return err; 1043 1026 } ··· 1047 1030 const char *str_sec; 1048 1031 __u32 raw_types[MAX_NR_RAW_TYPES]; 1049 1032 __u32 str_sec_size; 1050 - int info_size_delta; 1033 + int btf_size_delta; 1034 + int (*special_test)(unsigned int test_num); 1051 1035 }; 1036 + 1037 + static int test_big_btf_info(unsigned int test_num); 1038 + static int test_btf_id(unsigned int test_num); 1052 1039 1053 1040 const struct btf_get_info_test get_info_tests[] = { 1054 1041 { ··· 1064 1043 }, 1065 1044 .str_sec = "", 1066 1045 .str_sec_size = sizeof(""), 1067 - .info_size_delta = 1, 1046 + .btf_size_delta = 1, 1068 1047 }, 1069 1048 { 1070 1049 .descr = "== raw_btf_size-3", ··· 1075 1054 }, 1076 1055 .str_sec = "", 1077 1056 .str_sec_size = sizeof(""), 1078 - .info_size_delta = -3, 1057 + .btf_size_delta = -3, 1058 + }, 1059 + { 1060 + .descr = "Large bpf_btf_info", 1061 + .raw_types = { 1062 + /* int */ /* [1] */ 1063 + BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), 1064 + BTF_END_RAW, 1065 + }, 1066 + .str_sec = "", 1067 + 
.str_sec_size = sizeof(""), 1068 + .special_test = test_big_btf_info, 1069 + }, 1070 + { 1071 + .descr = "BTF ID", 1072 + .raw_types = { 1073 + /* int */ /* [1] */ 1074 + BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), 1075 + /* unsigned int */ /* [2] */ 1076 + BTF_TYPE_INT_ENC(0, 0, 0, 32, 4), 1077 + BTF_END_RAW, 1078 + }, 1079 + .str_sec = "", 1080 + .str_sec_size = sizeof(""), 1081 + .special_test = test_btf_id, 1079 1082 }, 1080 1083 }; 1081 1084 1082 - static int do_test_get_info(unsigned int test_num) 1085 + static inline __u64 ptr_to_u64(const void *ptr) 1086 + { 1087 + return (__u64)(unsigned long)ptr; 1088 + } 1089 + 1090 + static int test_big_btf_info(unsigned int test_num) 1083 1091 { 1084 1092 const struct btf_get_info_test *test = &get_info_tests[test_num - 1]; 1085 - unsigned int raw_btf_size, user_btf_size, expected_nbytes; 1086 1093 uint8_t *raw_btf = NULL, *user_btf = NULL; 1094 + unsigned int raw_btf_size; 1095 + struct { 1096 + struct bpf_btf_info info; 1097 + uint64_t garbage; 1098 + } info_garbage; 1099 + struct bpf_btf_info *info; 1087 1100 int btf_fd = -1, err; 1088 - 1089 - fprintf(stderr, "BTF GET_INFO_BY_ID test[%u] (%s): ", 1090 - test_num, test->descr); 1101 + uint32_t info_len; 1091 1102 1092 1103 raw_btf = btf_raw_create(&hdr_tmpl, 1093 1104 test->raw_types, ··· 1133 1080 *btf_log_buf = '\0'; 1134 1081 1135 1082 user_btf = malloc(raw_btf_size); 1136 - if (!user_btf) { 1137 - fprintf(stderr, "Cannot allocate memory for user_btf\n"); 1083 + if (CHECK(!user_btf, "!user_btf")) { 1138 1084 err = -1; 1139 1085 goto done; 1140 1086 } ··· 1141 1089 btf_fd = bpf_load_btf(raw_btf, raw_btf_size, 1142 1090 btf_log_buf, BTF_LOG_BUF_SIZE, 1143 1091 args.always_log); 1144 - if (btf_fd == -1) { 1145 - fprintf(stderr, "bpf_load_btf:%s(%d)\n", 1146 - strerror(errno), errno); 1092 + if (CHECK(btf_fd == -1, "errno:%d", errno)) { 1147 1093 err = -1; 1148 1094 goto done; 1149 1095 } 1150 1096 1151 - user_btf_size = (int)raw_btf_size + test->info_size_delta; 
1097 + /*
1098 + * GET_INFO should error out if the userspace info
1099 + * has non-zero trailing bytes.
1100 + */
1101 + info = &info_garbage.info;
1102 + memset(info, 0, sizeof(*info));
1103 + info_garbage.garbage = 0xdeadbeef;
1104 + info_len = sizeof(info_garbage);
1105 + info->btf = ptr_to_u64(user_btf);
1106 + info->btf_size = raw_btf_size;
1107 +
1108 + err = bpf_obj_get_info_by_fd(btf_fd, info, &info_len);
1109 + if (CHECK(!err, "!err")) {
1110 + err = -1;
1111 + goto done;
1112 + }
1113 +
1114 + /*
1115 + * GET_INFO should succeed even if info_len is larger than
1116 + * what the kernel supports, as long as the trailing bytes
1117 + * are zero. The kernel-supported info length should also
1118 + * be returned to userspace.
1119 + */
1120 + info_garbage.garbage = 0;
1121 + err = bpf_obj_get_info_by_fd(btf_fd, info, &info_len);
1122 + if (CHECK(err || info_len != sizeof(*info),
1123 + "err:%d errno:%d info_len:%u sizeof(*info):%lu",
1124 + err, errno, info_len, sizeof(*info))) {
1125 + err = -1;
1126 + goto done;
1127 + }
1128 +
1129 + fprintf(stderr, "OK");
1130 +
1131 + done:
1132 + if (*btf_log_buf && (err || args.always_log))
1133 + fprintf(stderr, "\n%s", btf_log_buf);
1134 +
1135 + free(raw_btf);
1136 + free(user_btf);
1137 +
1138 + if (btf_fd != -1)
1139 + close(btf_fd);
1140 +
1141 + return err;
1142 + }
1143 +
1144 + static int test_btf_id(unsigned int test_num)
1145 + {
1146 + const struct btf_get_info_test *test = &get_info_tests[test_num - 1];
1147 + struct bpf_create_map_attr create_attr = {};
1148 + uint8_t *raw_btf = NULL, *user_btf[2] = {};
1149 + int btf_fd[2] = {-1, -1}, map_fd = -1;
1150 + struct bpf_map_info map_info = {};
1151 + struct bpf_btf_info info[2] = {};
1152 + unsigned int raw_btf_size;
1153 + uint32_t info_len;
1154 + int err, i, ret;
1155 +
1156 + raw_btf = btf_raw_create(&hdr_tmpl,
1157 + test->raw_types,
1158 + test->str_sec,
1159 + test->str_sec_size,
1160 + &raw_btf_size);
1161 +
1162 + if (!raw_btf)
1163 + return -1;
1164 +
1165 +
*btf_log_buf = '\0'; 1166 + 1167 + for (i = 0; i < 2; i++) { 1168 + user_btf[i] = malloc(raw_btf_size); 1169 + if (CHECK(!user_btf[i], "!user_btf[%d]", i)) { 1170 + err = -1; 1171 + goto done; 1172 + } 1173 + info[i].btf = ptr_to_u64(user_btf[i]); 1174 + info[i].btf_size = raw_btf_size; 1175 + } 1176 + 1177 + btf_fd[0] = bpf_load_btf(raw_btf, raw_btf_size, 1178 + btf_log_buf, BTF_LOG_BUF_SIZE, 1179 + args.always_log); 1180 + if (CHECK(btf_fd[0] == -1, "errno:%d", errno)) { 1181 + err = -1; 1182 + goto done; 1183 + } 1184 + 1185 + /* Test BPF_OBJ_GET_INFO_BY_ID on btf_id */ 1186 + info_len = sizeof(info[0]); 1187 + err = bpf_obj_get_info_by_fd(btf_fd[0], &info[0], &info_len); 1188 + if (CHECK(err, "errno:%d", errno)) { 1189 + err = -1; 1190 + goto done; 1191 + } 1192 + 1193 + btf_fd[1] = bpf_btf_get_fd_by_id(info[0].id); 1194 + if (CHECK(btf_fd[1] == -1, "errno:%d", errno)) { 1195 + err = -1; 1196 + goto done; 1197 + } 1198 + 1199 + ret = 0; 1200 + err = bpf_obj_get_info_by_fd(btf_fd[1], &info[1], &info_len); 1201 + if (CHECK(err || info[0].id != info[1].id || 1202 + info[0].btf_size != info[1].btf_size || 1203 + (ret = memcmp(user_btf[0], user_btf[1], info[0].btf_size)), 1204 + "err:%d errno:%d id0:%u id1:%u btf_size0:%u btf_size1:%u memcmp:%d", 1205 + err, errno, info[0].id, info[1].id, 1206 + info[0].btf_size, info[1].btf_size, ret)) { 1207 + err = -1; 1208 + goto done; 1209 + } 1210 + 1211 + /* Test btf members in struct bpf_map_info */ 1212 + create_attr.name = "test_btf_id"; 1213 + create_attr.map_type = BPF_MAP_TYPE_ARRAY; 1214 + create_attr.key_size = sizeof(int); 1215 + create_attr.value_size = sizeof(unsigned int); 1216 + create_attr.max_entries = 4; 1217 + create_attr.btf_fd = btf_fd[0]; 1218 + create_attr.btf_key_id = 1; 1219 + create_attr.btf_value_id = 2; 1220 + 1221 + map_fd = bpf_create_map_xattr(&create_attr); 1222 + if (CHECK(map_fd == -1, "errno:%d", errno)) { 1223 + err = -1; 1224 + goto done; 1225 + } 1226 + 1227 + info_len = sizeof(map_info); 
1228 + err = bpf_obj_get_info_by_fd(map_fd, &map_info, &info_len); 1229 + if (CHECK(err || map_info.btf_id != info[0].id || 1230 + map_info.btf_key_id != 1 || map_info.btf_value_id != 2, 1231 + "err:%d errno:%d info.id:%u btf_id:%u btf_key_id:%u btf_value_id:%u", 1232 + err, errno, info[0].id, map_info.btf_id, map_info.btf_key_id, 1233 + map_info.btf_value_id)) { 1234 + err = -1; 1235 + goto done; 1236 + } 1237 + 1238 + for (i = 0; i < 2; i++) { 1239 + close(btf_fd[i]); 1240 + btf_fd[i] = -1; 1241 + } 1242 + 1243 + /* Test BTF ID is removed from the kernel */ 1244 + btf_fd[0] = bpf_btf_get_fd_by_id(map_info.btf_id); 1245 + if (CHECK(btf_fd[0] == -1, "errno:%d", errno)) { 1246 + err = -1; 1247 + goto done; 1248 + } 1249 + close(btf_fd[0]); 1250 + btf_fd[0] = -1; 1251 + 1252 + /* The map holds the last ref to BTF and its btf_id */ 1253 + close(map_fd); 1254 + map_fd = -1; 1255 + btf_fd[0] = bpf_btf_get_fd_by_id(map_info.btf_id); 1256 + if (CHECK(btf_fd[0] != -1, "BTF lingers")) { 1257 + err = -1; 1258 + goto done; 1259 + } 1260 + 1261 + fprintf(stderr, "OK"); 1262 + 1263 + done: 1264 + if (*btf_log_buf && (err || args.always_log)) 1265 + fprintf(stderr, "\n%s", btf_log_buf); 1266 + 1267 + free(raw_btf); 1268 + if (map_fd != -1) 1269 + close(map_fd); 1270 + for (i = 0; i < 2; i++) { 1271 + free(user_btf[i]); 1272 + if (btf_fd[i] != -1) 1273 + close(btf_fd[i]); 1274 + } 1275 + 1276 + return err; 1277 + } 1278 + 1279 + static int do_test_get_info(unsigned int test_num) 1280 + { 1281 + const struct btf_get_info_test *test = &get_info_tests[test_num - 1]; 1282 + unsigned int raw_btf_size, user_btf_size, expected_nbytes; 1283 + uint8_t *raw_btf = NULL, *user_btf = NULL; 1284 + struct bpf_btf_info info = {}; 1285 + int btf_fd = -1, err, ret; 1286 + uint32_t info_len; 1287 + 1288 + fprintf(stderr, "BTF GET_INFO test[%u] (%s): ", 1289 + test_num, test->descr); 1290 + 1291 + if (test->special_test) 1292 + return test->special_test(test_num); 1293 + 1294 + raw_btf = 
btf_raw_create(&hdr_tmpl, 1295 + test->raw_types, 1296 + test->str_sec, 1297 + test->str_sec_size, 1298 + &raw_btf_size); 1299 + 1300 + if (!raw_btf) 1301 + return -1; 1302 + 1303 + *btf_log_buf = '\0'; 1304 + 1305 + user_btf = malloc(raw_btf_size); 1306 + if (CHECK(!user_btf, "!user_btf")) { 1307 + err = -1; 1308 + goto done; 1309 + } 1310 + 1311 + btf_fd = bpf_load_btf(raw_btf, raw_btf_size, 1312 + btf_log_buf, BTF_LOG_BUF_SIZE, 1313 + args.always_log); 1314 + if (CHECK(btf_fd == -1, "errno:%d", errno)) { 1315 + err = -1; 1316 + goto done; 1317 + } 1318 + 1319 + user_btf_size = (int)raw_btf_size + test->btf_size_delta; 1152 1320 expected_nbytes = min(raw_btf_size, user_btf_size); 1153 1321 if (raw_btf_size > expected_nbytes) 1154 1322 memset(user_btf + expected_nbytes, 0xff, 1155 1323 raw_btf_size - expected_nbytes); 1156 1324 1157 - err = bpf_obj_get_info_by_fd(btf_fd, user_btf, &user_btf_size); 1158 - if (err || user_btf_size != raw_btf_size || 1159 - memcmp(raw_btf, user_btf, expected_nbytes)) { 1160 - fprintf(stderr, 1161 - "err:%d(errno:%d) raw_btf_size:%u user_btf_size:%u expected_nbytes:%u memcmp:%d\n", 1162 - err, errno, 1163 - raw_btf_size, user_btf_size, expected_nbytes, 1164 - memcmp(raw_btf, user_btf, expected_nbytes)); 1325 + info_len = sizeof(info); 1326 + info.btf = ptr_to_u64(user_btf); 1327 + info.btf_size = user_btf_size; 1328 + 1329 + ret = 0; 1330 + err = bpf_obj_get_info_by_fd(btf_fd, &info, &info_len); 1331 + if (CHECK(err || !info.id || info_len != sizeof(info) || 1332 + info.btf_size != raw_btf_size || 1333 + (ret = memcmp(raw_btf, user_btf, expected_nbytes)), 1334 + "err:%d errno:%d info.id:%u info_len:%u sizeof(info):%lu raw_btf_size:%u info.btf_size:%u expected_nbytes:%u memcmp:%d", 1335 + err, errno, info.id, info_len, sizeof(info), 1336 + raw_btf_size, info.btf_size, expected_nbytes, ret)) { 1165 1337 err = -1; 1166 1338 goto done; 1167 1339 } 1168 1340 1169 1341 while (expected_nbytes < raw_btf_size) { 1170 1342 fprintf(stderr, 
"%u...", expected_nbytes); 1171 - if (user_btf[expected_nbytes++] != 0xff) { 1172 - fprintf(stderr, "!= 0xff\n"); 1343 + if (CHECK(user_btf[expected_nbytes++] != 0xff, 1344 + "user_btf[%u]:%x != 0xff", expected_nbytes - 1, 1345 + user_btf[expected_nbytes - 1])) { 1173 1346 err = -1; 1174 1347 goto done; 1175 1348 } 1176 1349 } 1177 1350 1178 - fprintf(stderr, "OK\n"); 1351 + fprintf(stderr, "OK"); 1179 1352 1180 1353 done: 1181 1354 if (*btf_log_buf && (err || args.always_log)) 1182 - fprintf(stderr, "%s\n", btf_log_buf); 1355 + fprintf(stderr, "\n%s", btf_log_buf); 1183 1356 1184 1357 free(raw_btf); 1185 1358 free(user_btf); ··· 1421 1144 int err = 0; 1422 1145 1423 1146 if (args.get_info_test_num) 1424 - return do_test_get_info(args.get_info_test_num); 1147 + return count_result(do_test_get_info(args.get_info_test_num)); 1425 1148 1426 1149 for (i = 1; i <= ARRAY_SIZE(get_info_tests); i++) 1427 - err |= do_test_get_info(i); 1150 + err |= count_result(do_test_get_info(i)); 1428 1151 1429 1152 return err; 1430 1153 } ··· 1452 1175 Elf *elf; 1453 1176 int ret; 1454 1177 1455 - if (elf_version(EV_CURRENT) == EV_NONE) { 1456 - fprintf(stderr, "Failed to init libelf\n"); 1178 + if (CHECK(elf_version(EV_CURRENT) == EV_NONE, 1179 + "elf_version(EV_CURRENT) == EV_NONE")) 1457 1180 return -1; 1458 - } 1459 1181 1460 1182 elf_fd = open(fn, O_RDONLY); 1461 - if (elf_fd == -1) { 1462 - fprintf(stderr, "Cannot open file %s: %s(%d)\n", 1463 - fn, strerror(errno), errno); 1183 + if (CHECK(elf_fd == -1, "open(%s): errno:%d", fn, errno)) 1464 1184 return -1; 1465 - } 1466 1185 1467 1186 elf = elf_begin(elf_fd, ELF_C_READ, NULL); 1468 - if (!elf) { 1469 - fprintf(stderr, "Failed to read ELF from %s. 
%s\n", fn, 1470 - elf_errmsg(elf_errno())); 1187 + if (CHECK(!elf, "elf_begin(%s): %s", fn, elf_errmsg(elf_errno()))) { 1471 1188 ret = -1; 1472 1189 goto done; 1473 1190 } 1474 1191 1475 - if (!gelf_getehdr(elf, &ehdr)) { 1476 - fprintf(stderr, "Failed to get EHDR from %s\n", fn); 1192 + if (CHECK(!gelf_getehdr(elf, &ehdr), "!gelf_getehdr(%s)", fn)) { 1477 1193 ret = -1; 1478 1194 goto done; 1479 1195 } ··· 1475 1205 const char *sh_name; 1476 1206 GElf_Shdr sh; 1477 1207 1478 - if (gelf_getshdr(scn, &sh) != &sh) { 1479 - fprintf(stderr, 1480 - "Failed to get section header from %s\n", fn); 1208 + if (CHECK(gelf_getshdr(scn, &sh) != &sh, 1209 + "file:%s gelf_getshdr != &sh", fn)) { 1481 1210 ret = -1; 1482 1211 goto done; 1483 1212 } ··· 1512 1243 return err; 1513 1244 1514 1245 if (err == 0) { 1515 - fprintf(stderr, "SKIP. No ELF %s found\n", BTF_ELF_SEC); 1246 + fprintf(stderr, "SKIP. No ELF %s found", BTF_ELF_SEC); 1247 + skip_cnt++; 1516 1248 return 0; 1517 1249 } 1518 1250 1519 1251 obj = bpf_object__open(test->file); 1520 - if (IS_ERR(obj)) 1252 + if (CHECK(IS_ERR(obj), "obj: %ld", PTR_ERR(obj))) 1521 1253 return PTR_ERR(obj); 1522 1254 1523 1255 err = bpf_object__btf_fd(obj); 1524 - if (err == -1) { 1525 - fprintf(stderr, "bpf_object__btf_fd: -1\n"); 1256 + if (CHECK(err == -1, "bpf_object__btf_fd: -1")) 1526 1257 goto done; 1527 - } 1528 1258 1529 1259 prog = bpf_program__next(NULL, obj); 1530 - if (!prog) { 1531 - fprintf(stderr, "Cannot find bpf_prog\n"); 1260 + if (CHECK(!prog, "Cannot find bpf_prog")) { 1532 1261 err = -1; 1533 1262 goto done; 1534 1263 } 1535 1264 1536 1265 bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT); 1537 1266 err = bpf_object__load(obj); 1538 - if (err < 0) { 1539 - fprintf(stderr, "bpf_object__load: %d\n", err); 1267 + if (CHECK(err < 0, "bpf_object__load: %d", err)) 1540 1268 goto done; 1541 - } 1542 1269 1543 1270 map = bpf_object__find_map_by_name(obj, "btf_map"); 1544 - if (!map) { 1545 - fprintf(stderr, "btf_map not 
found\n"); 1271 + if (CHECK(!map, "btf_map not found")) { 1546 1272 err = -1; 1547 1273 goto done; 1548 1274 } 1549 1275 1550 1276 err = (bpf_map__btf_key_id(map) == 0 || bpf_map__btf_value_id(map) == 0) 1551 1277 != test->btf_kv_notfound; 1552 - if (err) { 1553 - fprintf(stderr, 1554 - "btf_kv_notfound:%u btf_key_id:%u btf_value_id:%u\n", 1555 - test->btf_kv_notfound, 1556 - bpf_map__btf_key_id(map), 1557 - bpf_map__btf_value_id(map)); 1278 + if (CHECK(err, "btf_key_id:%u btf_value_id:%u test->btf_kv_notfound:%u", 1279 + bpf_map__btf_key_id(map), bpf_map__btf_value_id(map), 1280 + test->btf_kv_notfound)) 1558 1281 goto done; 1559 - } 1560 1282 1561 - fprintf(stderr, "OK\n"); 1283 + fprintf(stderr, "OK"); 1562 1284 1563 1285 done: 1564 1286 bpf_object__close(obj); ··· 1562 1302 int err = 0; 1563 1303 1564 1304 if (args.file_test_num) 1565 - return do_test_file(args.file_test_num); 1305 + return count_result(do_test_file(args.file_test_num)); 1566 1306 1567 1307 for (i = 1; i <= ARRAY_SIZE(file_tests); i++) 1568 - err |= do_test_file(i); 1308 + err |= count_result(do_test_file(i)); 1569 1309 1570 1310 return err; 1571 1311 } ··· 1685 1425 unsigned int key; 1686 1426 uint8_t *raw_btf; 1687 1427 ssize_t nread; 1688 - int err; 1428 + int err, ret; 1689 1429 1690 1430 fprintf(stderr, "%s......", test->descr); 1691 1431 raw_btf = btf_raw_create(&hdr_tmpl, test->raw_types, ··· 1701 1441 args.always_log); 1702 1442 free(raw_btf); 1703 1443 1704 - if (btf_fd == -1) { 1444 + if (CHECK(btf_fd == -1, "errno:%d", errno)) { 1705 1445 err = -1; 1706 - fprintf(stderr, "bpf_load_btf: %s(%d)\n", 1707 - strerror(errno), errno); 1708 1446 goto done; 1709 1447 } 1710 1448 ··· 1716 1458 create_attr.btf_value_id = test->value_id; 1717 1459 1718 1460 map_fd = bpf_create_map_xattr(&create_attr); 1719 - if (map_fd == -1) { 1461 + if (CHECK(map_fd == -1, "errno:%d", errno)) { 1720 1462 err = -1; 1721 - fprintf(stderr, "bpf_creat_map_btf: %s(%d)\n", 1722 - strerror(errno), errno); 1723 1463 
goto done; 1724 1464 } 1725 1465 1726 - if (snprintf(pin_path, sizeof(pin_path), "%s/%s", 1727 - "/sys/fs/bpf", test->map_name) == sizeof(pin_path)) { 1466 + ret = snprintf(pin_path, sizeof(pin_path), "%s/%s", 1467 + "/sys/fs/bpf", test->map_name); 1468 + 1469 + if (CHECK(ret == sizeof(pin_path), "pin_path %s/%s is too long", 1470 + "/sys/fs/bpf", test->map_name)) { 1728 1471 err = -1; 1729 - fprintf(stderr, "pin_path is too long\n"); 1730 1472 goto done; 1731 1473 } 1732 1474 1733 1475 err = bpf_obj_pin(map_fd, pin_path); 1734 - if (err) { 1735 - fprintf(stderr, "Cannot pin to %s. %s(%d).\n", pin_path, 1736 - strerror(errno), errno); 1476 + if (CHECK(err, "bpf_obj_pin(%s): errno:%d.", pin_path, errno)) 1737 1477 goto done; 1738 - } 1739 1478 1740 1479 for (key = 0; key < test->max_entries; key++) { 1741 1480 set_pprint_mapv(&mapv, key); ··· 1740 1485 } 1741 1486 1742 1487 pin_file = fopen(pin_path, "r"); 1743 - if (!pin_file) { 1488 + if (CHECK(!pin_file, "fopen(%s): errno:%d", pin_path, errno)) { 1744 1489 err = -1; 1745 - fprintf(stderr, "fopen(%s): %s(%d)\n", pin_path, 1746 - strerror(errno), errno); 1747 1490 goto done; 1748 1491 } 1749 1492 ··· 1750 1497 *line == '#') 1751 1498 ; 1752 1499 1753 - if (nread <= 0) { 1500 + if (CHECK(nread <= 0, "Unexpected EOF")) { 1754 1501 err = -1; 1755 - fprintf(stderr, "Unexpected EOF\n"); 1756 1502 goto done; 1757 1503 } 1758 1504 ··· 1770 1518 mapv.ui8a[4], mapv.ui8a[5], mapv.ui8a[6], mapv.ui8a[7], 1771 1519 pprint_enum_str[mapv.aenum]); 1772 1520 1773 - if (nexpected_line == sizeof(expected_line)) { 1521 + if (CHECK(nexpected_line == sizeof(expected_line), 1522 + "expected_line is too long")) { 1774 1523 err = -1; 1775 - fprintf(stderr, "expected_line is too long\n"); 1776 1524 goto done; 1777 1525 } 1778 1526 ··· 1787 1535 nread = getline(&line, &line_len, pin_file); 1788 1536 } while (++key < test->max_entries && nread > 0); 1789 1537 1790 - if (key < test->max_entries) { 1538 + if (CHECK(key < test->max_entries, 1539 
+ "Unexpected EOF. key:%u test->max_entries:%u", 1540 + key, test->max_entries)) { 1791 1541 err = -1; 1792 - fprintf(stderr, "Unexpected EOF\n"); 1793 1542 goto done; 1794 1543 } 1795 1544 1796 - if (nread > 0) { 1545 + if (CHECK(nread > 0, "Unexpected extra pprint output: %s", line)) { 1797 1546 err = -1; 1798 - fprintf(stderr, "Unexpected extra pprint output: %s\n", line); 1799 1547 goto done; 1800 1548 } 1801 1549 ··· 1803 1551 1804 1552 done: 1805 1553 if (!err) 1806 - fprintf(stderr, "OK\n"); 1554 + fprintf(stderr, "OK"); 1807 1555 if (*btf_log_buf && (err || args.always_log)) 1808 - fprintf(stderr, "%s\n", btf_log_buf); 1556 + fprintf(stderr, "\n%s", btf_log_buf); 1809 1557 if (btf_fd != -1) 1810 1558 close(btf_fd); 1811 1559 if (map_fd != -1) ··· 1886 1634 return 0; 1887 1635 } 1888 1636 1637 + static void print_summary(void) 1638 + { 1639 + fprintf(stderr, "PASS:%u SKIP:%u FAIL:%u\n", 1640 + pass_cnt - skip_cnt, skip_cnt, error_cnt); 1641 + } 1642 + 1889 1643 int main(int argc, char **argv) 1890 1644 { 1891 1645 int err = 0; ··· 1913 1655 err |= test_file(); 1914 1656 1915 1657 if (args.pprint_test) 1916 - err |= test_pprint(); 1658 + err |= count_result(test_pprint()); 1917 1659 1918 1660 if (args.raw_test || args.get_info_test || args.file_test || 1919 1661 args.pprint_test) 1920 - return err; 1662 + goto done; 1921 1663 1922 1664 err |= test_raw(); 1923 1665 err |= test_get_info(); 1924 1666 err |= test_file(); 1925 1667 1668 + done: 1669 + print_summary(); 1926 1670 return err; 1927 1671 }
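The CHECK()/count_result() pattern threaded through test_btf.c above can be exercised on its own. Below is a minimal sketch of the two helpers as the patch defines them; do_demo_test() is a hypothetical stand-in for a real test case, not part of the series:

```c
#include <stdint.h>
#include <stdio.h>

/* Pared-down copy of the patch's CHECK()/count_result() helpers. */
static uint32_t pass_cnt;
static uint32_t error_cnt;

#define CHECK(condition, format...) ({				\
	int __ret = !!(condition);				\
	if (__ret) {						\
		fprintf(stderr, "%s:%d:FAIL ", __func__, __LINE__); \
		fprintf(stderr, format);			\
	}							\
	__ret;							\
})

static int count_result(int err)
{
	if (err)
		error_cnt++;
	else
		pass_cnt++;

	fprintf(stderr, "\n");
	return err;
}

/* Hypothetical test case: CHECK() is an expression, so it can gate
 * an early exit while also printing the failure location. */
static int do_demo_test(int should_fail)
{
	if (CHECK(should_fail, "should_fail:%d", should_fail))
		return -1;

	fprintf(stderr, "OK");
	return 0;
}
```

Each test prints either "OK" or a FAIL line on stderr, count_result() tallies it into the pass/error counters, and the caller can OR the returns together exactly as test_raw() and test_get_info() do above.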
+137 -3
tools/testing/selftests/bpf/test_progs.c
··· 1272 1272 return; 1273 1273 } 1274 1274 1275 + static void test_stacktrace_build_id_nmi(void) 1276 + { 1277 + int control_map_fd, stackid_hmap_fd, stackmap_fd, stack_amap_fd; 1278 + const char *file = "./test_stacktrace_build_id.o"; 1279 + int err, pmu_fd, prog_fd; 1280 + struct perf_event_attr attr = { 1281 + .sample_freq = 5000, 1282 + .freq = 1, 1283 + .type = PERF_TYPE_HARDWARE, 1284 + .config = PERF_COUNT_HW_CPU_CYCLES, 1285 + }; 1286 + __u32 key, previous_key, val, duration = 0; 1287 + struct bpf_object *obj; 1288 + char buf[256]; 1289 + int i, j; 1290 + struct bpf_stack_build_id id_offs[PERF_MAX_STACK_DEPTH]; 1291 + int build_id_matches = 0; 1292 + 1293 + err = bpf_prog_load(file, BPF_PROG_TYPE_PERF_EVENT, &obj, &prog_fd); 1294 + if (CHECK(err, "prog_load", "err %d errno %d\n", err, errno)) 1295 + return; 1296 + 1297 + pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */, 1298 + 0 /* cpu 0 */, -1 /* group id */, 1299 + 0 /* flags */); 1300 + if (CHECK(pmu_fd < 0, "perf_event_open", 1301 + "err %d errno %d. 
Does the test host support PERF_COUNT_HW_CPU_CYCLES?\n", 1302 + pmu_fd, errno)) 1303 + goto close_prog; 1304 + 1305 + err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0); 1306 + if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", 1307 + err, errno)) 1308 + goto close_pmu; 1309 + 1310 + err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); 1311 + if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", 1312 + err, errno)) 1313 + goto disable_pmu; 1314 + 1315 + /* find map fds */ 1316 + control_map_fd = bpf_find_map(__func__, obj, "control_map"); 1317 + if (CHECK(control_map_fd < 0, "bpf_find_map control_map", 1318 + "err %d errno %d\n", err, errno)) 1319 + goto disable_pmu; 1320 + 1321 + stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap"); 1322 + if (CHECK(stackid_hmap_fd < 0, "bpf_find_map stackid_hmap", 1323 + "err %d errno %d\n", err, errno)) 1324 + goto disable_pmu; 1325 + 1326 + stackmap_fd = bpf_find_map(__func__, obj, "stackmap"); 1327 + if (CHECK(stackmap_fd < 0, "bpf_find_map stackmap", "err %d errno %d\n", 1328 + err, errno)) 1329 + goto disable_pmu; 1330 + 1331 + stack_amap_fd = bpf_find_map(__func__, obj, "stack_amap"); 1332 + if (CHECK(stack_amap_fd < 0, "bpf_find_map stack_amap", 1333 + "err %d errno %d\n", err, errno)) 1334 + goto disable_pmu; 1335 + 1336 + assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null") 1337 + == 0); 1338 + assert(system("taskset 0x1 ./urandom_read 100000") == 0); 1339 + /* disable stack trace collection */ 1340 + key = 0; 1341 + val = 1; 1342 + bpf_map_update_elem(control_map_fd, &key, &val, 0); 1343 + 1344 + /* for every element in stackid_hmap, we can find a corresponding one 1345 + * in stackmap, and vise versa. 1346 + */ 1347 + err = compare_map_keys(stackid_hmap_fd, stackmap_fd); 1348 + if (CHECK(err, "compare_map_keys stackid_hmap vs. 
stackmap", 1349 + "err %d errno %d\n", err, errno)) 1350 + goto disable_pmu; 1351 + 1352 + err = compare_map_keys(stackmap_fd, stackid_hmap_fd); 1353 + if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap", 1354 + "err %d errno %d\n", err, errno)) 1355 + goto disable_pmu; 1356 + 1357 + err = extract_build_id(buf, 256); 1358 + 1359 + if (CHECK(err, "get build_id with readelf", 1360 + "err %d errno %d\n", err, errno)) 1361 + goto disable_pmu; 1362 + 1363 + err = bpf_map_get_next_key(stackmap_fd, NULL, &key); 1364 + if (CHECK(err, "get_next_key from stackmap", 1365 + "err %d, errno %d\n", err, errno)) 1366 + goto disable_pmu; 1367 + 1368 + do { 1369 + char build_id[64]; 1370 + 1371 + err = bpf_map_lookup_elem(stackmap_fd, &key, id_offs); 1372 + if (CHECK(err, "lookup_elem from stackmap", 1373 + "err %d, errno %d\n", err, errno)) 1374 + goto disable_pmu; 1375 + for (i = 0; i < PERF_MAX_STACK_DEPTH; ++i) 1376 + if (id_offs[i].status == BPF_STACK_BUILD_ID_VALID && 1377 + id_offs[i].offset != 0) { 1378 + for (j = 0; j < 20; ++j) 1379 + sprintf(build_id + 2 * j, "%02x", 1380 + id_offs[i].build_id[j] & 0xff); 1381 + if (strstr(buf, build_id) != NULL) 1382 + build_id_matches = 1; 1383 + } 1384 + previous_key = key; 1385 + } while (bpf_map_get_next_key(stackmap_fd, &previous_key, &key) == 0); 1386 + 1387 + if (CHECK(build_id_matches < 1, "build id match", 1388 + "Didn't find expected build ID from the map\n")) 1389 + goto disable_pmu; 1390 + 1391 + /* 1392 + * We intentionally skip compare_stack_ips(). 
This is because we 1393 + * only support one in_nmi() ips-to-build_id translation per cpu 1394 + * at any time, thus stack_amap here will always fallback to 1395 + * BPF_STACK_BUILD_ID_IP; 1396 + */ 1397 + 1398 + disable_pmu: 1399 + ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE); 1400 + 1401 + close_pmu: 1402 + close(pmu_fd); 1403 + 1404 + close_prog: 1405 + bpf_object__close(obj); 1406 + } 1407 + 1275 1408 #define MAX_CNT_RAWTP 10ull 1276 1409 #define MAX_STACK_RAWTP 100 1277 1410 struct get_stack_trace_t { ··· 1470 1337 good_user_stack = true; 1471 1338 } 1472 1339 if (!good_kern_stack || !good_user_stack) 1473 - return PERF_EVENT_ERROR; 1340 + return LIBBPF_PERF_EVENT_ERROR; 1474 1341 1475 1342 if (cnt == MAX_CNT_RAWTP) 1476 - return PERF_EVENT_DONE; 1343 + return LIBBPF_PERF_EVENT_DONE; 1477 1344 1478 - return PERF_EVENT_CONT; 1345 + return LIBBPF_PERF_EVENT_CONT; 1479 1346 } 1480 1347 1481 1348 static void test_get_stack_raw_tp(void) ··· 1558 1425 test_tp_attach_query(); 1559 1426 test_stacktrace_map(); 1560 1427 test_stacktrace_build_id(); 1428 + test_stacktrace_build_id_nmi(); 1561 1429 test_stacktrace_map_raw_tp(); 1562 1430 test_get_stack_raw_tp(); 1563 1431
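Inside test_stacktrace_build_id_nmi() above, the inner loop renders the 20 raw build-id bytes as lowercase hex before strstr()-matching them against the readelf output held in buf. That loop can be factored out and checked in isolation; render_build_id() is an invented helper name for this sketch only:

```c
#include <stdio.h>

/* Render a 20-byte GNU build ID as lowercase hex, as the NMI
 * stackmap test does with its sprintf("%02x") loop over
 * id_offs[i].build_id. render_build_id() is a hypothetical
 * refactoring of the patch's inline loop. */
#define BUILD_ID_SIZE 20

static void render_build_id(const unsigned char *id,
			    char out[2 * BUILD_ID_SIZE + 1])
{
	int j;

	/* Each sprintf() writes two hex digits plus a NUL, and the
	 * next iteration overwrites that NUL, so out ends up as one
	 * 40-character NUL-terminated string. */
	for (j = 0; j < BUILD_ID_SIZE; ++j)
		sprintf(out + 2 * j, "%02x", id[j] & 0xff);
}
```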
+5
tools/testing/selftests/bpf/test_sockhash_kern.c
···
1 + // SPDX-License-Identifier: GPL-2.0
2 + // Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
3 + #undef SOCKMAP
4 + #define TEST_MAP_TYPE BPF_MAP_TYPE_SOCKHASH
5 + #include "./test_sockmap_kern.h"
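The five-line file above works because the shared test_sockmap_kern.h template (not shown in this diff) keys its map definition off the SOCKMAP guard and TEST_MAP_TYPE. The compile-time selection can be sketched as follows; the enum values mirror the 4.17-era enum bpf_map_type UAPI, and template_map_type() is an illustrative stand-in for the header's SEC("maps") definition:

```c
/* Mimic test_sockhash_kern.c: drop the sockmap default and pick the
 * map type before pulling in the shared template. */
enum bpf_map_type_sketch {
	BPF_MAP_TYPE_SOCKMAP = 15,
	BPF_MAP_TYPE_SOCKHASH = 18,
};

#undef SOCKMAP
#define TEST_MAP_TYPE BPF_MAP_TYPE_SOCKHASH

/* In the real header this macro parameterizes a
 * struct bpf_map_def SEC("maps") definition; a function stands in
 * for it here so the selection is observable. */
static int template_map_type(void)
{
	return TEST_MAP_TYPE;
}
```

The same template compiled without the #undef/#define pair yields the sockmap variant, which is how one header serves both test_sockmap_kern.c and test_sockhash_kern.c.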
+20 -7
tools/testing/selftests/bpf/test_sockmap.c
··· 47 47 #define S1_PORT 10000
48 48 #define S2_PORT 10001
49 49
50 - #define BPF_FILENAME "test_sockmap_kern.o"
50 + #define BPF_SOCKMAP_FILENAME "test_sockmap_kern.o"
51 + #define BPF_SOCKHASH_FILENAME "test_sockhash_kern.o"
51 52 #define CG_PATH "/sockmap"
52 53
53 54 /* global sockets */
··· 1261 1260 BPF_PROG_TYPE_SK_MSG,
1262 1261 };
1263 1262
1264 - static int populate_progs(void)
1263 + static int populate_progs(char *bpf_file)
1265 1264 {
1266 - char *bpf_file = BPF_FILENAME;
1267 1265 struct bpf_program *prog;
1268 1266 struct bpf_object *obj;
1269 1267 int i = 0;
··· 1306 1306 return 0;
1307 1307 }
1308 1308
1309 - static int test_suite(void)
1309 + static int __test_suite(char *bpf_file)
1310 1310 {
1311 1311 int cg_fd, err;
1312 1312
1313 - err = populate_progs();
1313 + err = populate_progs(bpf_file);
1314 1314 if (err < 0) {
1315 1315 fprintf(stderr, "ERROR: (%i) load bpf failed\n", err);
1316 1316 return err;
··· 1347 1347
1348 1348 out:
1349 1349 printf("Summary: %i PASSED %i FAILED\n", passed, failed);
1350 + cleanup_cgroup_environment();
1350 1351 close(cg_fd);
1352 + return err;
1353 + }
1354 +
1355 + static int test_suite(void)
1356 + {
1357 + int err;
1358 +
1359 + err = __test_suite(BPF_SOCKMAP_FILENAME);
1360 + if (err)
1361 + goto out;
1362 + err = __test_suite(BPF_SOCKHASH_FILENAME);
1363 + out:
1351 1364 return err;
1352 1365 }
1353 1366
··· 1370 1357 int iov_count = 1, length = 1024, rate = 1;
1371 1358 struct sockmap_options options = {0};
1372 1359 int opt, longindex, err, cg_fd = 0;
1373 - char *bpf_file = BPF_FILENAME;
1360 + char *bpf_file = BPF_SOCKMAP_FILENAME;
1374 1361 int test = PING_PONG;
1375 1362
1376 1363 if (setrlimit(RLIMIT_MEMLOCK, &r)) {
··· 1451 1438 return -1;
1452 1439 }
1453 1440
1454 - err = populate_progs();
1441 + err = populate_progs(bpf_file);
1455 1442 if (err) {
1456 1443 fprintf(stderr, "populate program: (%s) %s\n",
1457 1444 bpf_file, strerror(errno));
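Note that the reworked test_suite() in test_sockmap.c short-circuits: a failure in the sockmap pass skips the sockhash pass entirely. A minimal model of that control flow, where run_one() and the fail_sockmap knob are invented for this sketch and stand in for __test_suite() and a real load failure:

```c
#include <string.h>

/* Model of the two-pass driver: the suite runs once per BPF object
 * file, and the first failure wins. */
static int fail_sockmap;	/* test knob, not in the patch */

static int run_one(const char *bpf_file)
{
	/* Pretend the sockmap object fails to load when asked to. */
	if (fail_sockmap && strstr(bpf_file, "sockmap_kern"))
		return -1;
	return 0;		/* pretend the suite passed */
}

static int run_all(void)
{
	int err;

	err = run_one("test_sockmap_kern.o");
	if (err)
		goto out;	/* sockmap failure skips the sockhash pass */
	err = run_one("test_sockhash_kern.o");
out:
	return err;
}
```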
+4 -339
tools/testing/selftests/bpf/test_sockmap_kern.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 - // Copyright (c) 2017-2018 Covalent IO, Inc. http://covalent.io 3 - #include <stddef.h> 4 - #include <string.h> 5 - #include <linux/bpf.h> 6 - #include <linux/if_ether.h> 7 - #include <linux/if_packet.h> 8 - #include <linux/ip.h> 9 - #include <linux/ipv6.h> 10 - #include <linux/in.h> 11 - #include <linux/udp.h> 12 - #include <linux/tcp.h> 13 - #include <linux/pkt_cls.h> 14 - #include <sys/socket.h> 15 - #include "bpf_helpers.h" 16 - #include "bpf_endian.h" 17 - 18 - /* Sockmap sample program connects a client and a backend together 19 - * using cgroups. 20 - * 21 - * client:X <---> frontend:80 client:X <---> backend:80 22 - * 23 - * For simplicity we hard code values here and bind 1:1. The hard 24 - * coded values are part of the setup in sockmap.sh script that 25 - * is associated with this BPF program. 26 - * 27 - * The bpf_printk is verbose and prints information as connections 28 - * are established and verdicts are decided. 29 - */ 30 - 31 - #define bpf_printk(fmt, ...) 
\ 32 - ({ \ 33 - char ____fmt[] = fmt; \ 34 - bpf_trace_printk(____fmt, sizeof(____fmt), \ 35 - ##__VA_ARGS__); \ 36 - }) 37 - 38 - struct bpf_map_def SEC("maps") sock_map = { 39 - .type = BPF_MAP_TYPE_SOCKMAP, 40 - .key_size = sizeof(int), 41 - .value_size = sizeof(int), 42 - .max_entries = 20, 43 - }; 44 - 45 - struct bpf_map_def SEC("maps") sock_map_txmsg = { 46 - .type = BPF_MAP_TYPE_SOCKMAP, 47 - .key_size = sizeof(int), 48 - .value_size = sizeof(int), 49 - .max_entries = 20, 50 - }; 51 - 52 - struct bpf_map_def SEC("maps") sock_map_redir = { 53 - .type = BPF_MAP_TYPE_SOCKMAP, 54 - .key_size = sizeof(int), 55 - .value_size = sizeof(int), 56 - .max_entries = 20, 57 - }; 58 - 59 - struct bpf_map_def SEC("maps") sock_apply_bytes = { 60 - .type = BPF_MAP_TYPE_ARRAY, 61 - .key_size = sizeof(int), 62 - .value_size = sizeof(int), 63 - .max_entries = 1 64 - }; 65 - 66 - struct bpf_map_def SEC("maps") sock_cork_bytes = { 67 - .type = BPF_MAP_TYPE_ARRAY, 68 - .key_size = sizeof(int), 69 - .value_size = sizeof(int), 70 - .max_entries = 1 71 - }; 72 - 73 - struct bpf_map_def SEC("maps") sock_pull_bytes = { 74 - .type = BPF_MAP_TYPE_ARRAY, 75 - .key_size = sizeof(int), 76 - .value_size = sizeof(int), 77 - .max_entries = 2 78 - }; 79 - 80 - struct bpf_map_def SEC("maps") sock_redir_flags = { 81 - .type = BPF_MAP_TYPE_ARRAY, 82 - .key_size = sizeof(int), 83 - .value_size = sizeof(int), 84 - .max_entries = 1 85 - }; 86 - 87 - struct bpf_map_def SEC("maps") sock_skb_opts = { 88 - .type = BPF_MAP_TYPE_ARRAY, 89 - .key_size = sizeof(int), 90 - .value_size = sizeof(int), 91 - .max_entries = 1 92 - }; 93 - 94 - SEC("sk_skb1") 95 - int bpf_prog1(struct __sk_buff *skb) 96 - { 97 - return skb->len; 98 - } 99 - 100 - SEC("sk_skb2") 101 - int bpf_prog2(struct __sk_buff *skb) 102 - { 103 - __u32 lport = skb->local_port; 104 - __u32 rport = skb->remote_port; 105 - int len, *f, ret, zero = 0; 106 - __u64 flags = 0; 107 - 108 - if (lport == 10000) 109 - ret = 10; 110 - else 111 - ret = 1; 
112 - 113 - len = (__u32)skb->data_end - (__u32)skb->data; 114 - f = bpf_map_lookup_elem(&sock_skb_opts, &zero); 115 - if (f && *f) { 116 - ret = 3; 117 - flags = *f; 118 - } 119 - 120 - bpf_printk("sk_skb2: redirect(%iB) flags=%i\n", 121 - len, flags); 122 - return bpf_sk_redirect_map(skb, &sock_map, ret, flags); 123 - } 124 - 125 - SEC("sockops") 126 - int bpf_sockmap(struct bpf_sock_ops *skops) 127 - { 128 - __u32 lport, rport; 129 - int op, err = 0, index, key, ret; 130 - 131 - 132 - op = (int) skops->op; 133 - 134 - switch (op) { 135 - case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: 136 - lport = skops->local_port; 137 - rport = skops->remote_port; 138 - 139 - if (lport == 10000) { 140 - ret = 1; 141 - err = bpf_sock_map_update(skops, &sock_map, &ret, 142 - BPF_NOEXIST); 143 - bpf_printk("passive(%i -> %i) map ctx update err: %d\n", 144 - lport, bpf_ntohl(rport), err); 145 - } 146 - break; 147 - case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: 148 - lport = skops->local_port; 149 - rport = skops->remote_port; 150 - 151 - if (bpf_ntohl(rport) == 10001) { 152 - ret = 10; 153 - err = bpf_sock_map_update(skops, &sock_map, &ret, 154 - BPF_NOEXIST); 155 - bpf_printk("active(%i -> %i) map ctx update err: %d\n", 156 - lport, bpf_ntohl(rport), err); 157 - } 158 - break; 159 - default: 160 - break; 161 - } 162 - 163 - return 0; 164 - } 165 - 166 - SEC("sk_msg1") 167 - int bpf_prog4(struct sk_msg_md *msg) 168 - { 169 - int *bytes, zero = 0, one = 1; 170 - int *start, *end; 171 - 172 - bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 173 - if (bytes) 174 - bpf_msg_apply_bytes(msg, *bytes); 175 - bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero); 176 - if (bytes) 177 - bpf_msg_cork_bytes(msg, *bytes); 178 - start = bpf_map_lookup_elem(&sock_pull_bytes, &zero); 179 - end = bpf_map_lookup_elem(&sock_pull_bytes, &one); 180 - if (start && end) 181 - bpf_msg_pull_data(msg, *start, *end, 0); 182 - return SK_PASS; 183 - } 184 - 185 - SEC("sk_msg2") 186 - int bpf_prog5(struct sk_msg_md 
*msg) 187 - { 188 - int err1 = -1, err2 = -1, zero = 0, one = 1; 189 - int *bytes, *start, *end, len1, len2; 190 - 191 - bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 192 - if (bytes) 193 - err1 = bpf_msg_apply_bytes(msg, *bytes); 194 - bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero); 195 - if (bytes) 196 - err2 = bpf_msg_cork_bytes(msg, *bytes); 197 - len1 = (__u64)msg->data_end - (__u64)msg->data; 198 - start = bpf_map_lookup_elem(&sock_pull_bytes, &zero); 199 - end = bpf_map_lookup_elem(&sock_pull_bytes, &one); 200 - if (start && end) { 201 - int err; 202 - 203 - bpf_printk("sk_msg2: pull(%i:%i)\n", 204 - start ? *start : 0, end ? *end : 0); 205 - err = bpf_msg_pull_data(msg, *start, *end, 0); 206 - if (err) 207 - bpf_printk("sk_msg2: pull_data err %i\n", 208 - err); 209 - len2 = (__u64)msg->data_end - (__u64)msg->data; 210 - bpf_printk("sk_msg2: length update %i->%i\n", 211 - len1, len2); 212 - } 213 - bpf_printk("sk_msg2: data length %i err1 %i err2 %i\n", 214 - len1, err1, err2); 215 - return SK_PASS; 216 - } 217 - 218 - SEC("sk_msg3") 219 - int bpf_prog6(struct sk_msg_md *msg) 220 - { 221 - int *bytes, zero = 0, one = 1, key = 0; 222 - int *start, *end, *f; 223 - __u64 flags = 0; 224 - 225 - bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 226 - if (bytes) 227 - bpf_msg_apply_bytes(msg, *bytes); 228 - bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero); 229 - if (bytes) 230 - bpf_msg_cork_bytes(msg, *bytes); 231 - start = bpf_map_lookup_elem(&sock_pull_bytes, &zero); 232 - end = bpf_map_lookup_elem(&sock_pull_bytes, &one); 233 - if (start && end) 234 - bpf_msg_pull_data(msg, *start, *end, 0); 235 - f = bpf_map_lookup_elem(&sock_redir_flags, &zero); 236 - if (f && *f) { 237 - key = 2; 238 - flags = *f; 239 - } 240 - return bpf_msg_redirect_map(msg, &sock_map_redir, key, flags); 241 - } 242 - 243 - SEC("sk_msg4") 244 - int bpf_prog7(struct sk_msg_md *msg) 245 - { 246 - int err1 = 0, err2 = 0, zero = 0, one = 1, key = 0; 247 - int *f, 
*bytes, *start, *end, len1, len2; 248 - __u64 flags = 0; 249 - 250 - int err; 251 - bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 252 - if (bytes) 253 - err1 = bpf_msg_apply_bytes(msg, *bytes); 254 - bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero); 255 - if (bytes) 256 - err2 = bpf_msg_cork_bytes(msg, *bytes); 257 - len1 = (__u64)msg->data_end - (__u64)msg->data; 258 - start = bpf_map_lookup_elem(&sock_pull_bytes, &zero); 259 - end = bpf_map_lookup_elem(&sock_pull_bytes, &one); 260 - if (start && end) { 261 - 262 - bpf_printk("sk_msg2: pull(%i:%i)\n", 263 - start ? *start : 0, end ? *end : 0); 264 - err = bpf_msg_pull_data(msg, *start, *end, 0); 265 - if (err) 266 - bpf_printk("sk_msg2: pull_data err %i\n", 267 - err); 268 - len2 = (__u64)msg->data_end - (__u64)msg->data; 269 - bpf_printk("sk_msg2: length update %i->%i\n", 270 - len1, len2); 271 - } 272 - f = bpf_map_lookup_elem(&sock_redir_flags, &zero); 273 - if (f && *f) { 274 - key = 2; 275 - flags = *f; 276 - } 277 - bpf_printk("sk_msg3: redirect(%iB) flags=%i err=%i\n", 278 - len1, flags, err1 ? 
err1 : err2); 279 - err = bpf_msg_redirect_map(msg, &sock_map_redir, key, flags); 280 - bpf_printk("sk_msg3: err %i\n", err); 281 - return err; 282 - } 283 - 284 - SEC("sk_msg5") 285 - int bpf_prog8(struct sk_msg_md *msg) 286 - { 287 - void *data_end = (void *)(long) msg->data_end; 288 - void *data = (void *)(long) msg->data; 289 - int ret = 0, *bytes, zero = 0; 290 - 291 - bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 292 - if (bytes) { 293 - ret = bpf_msg_apply_bytes(msg, *bytes); 294 - if (ret) 295 - return SK_DROP; 296 - } else { 297 - return SK_DROP; 298 - } 299 - return SK_PASS; 300 - } 301 - SEC("sk_msg6") 302 - int bpf_prog9(struct sk_msg_md *msg) 303 - { 304 - void *data_end = (void *)(long) msg->data_end; 305 - void *data = (void *)(long) msg->data; 306 - int ret = 0, *bytes, zero = 0; 307 - 308 - bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero); 309 - if (bytes) { 310 - if (((__u64)data_end - (__u64)data) >= *bytes) 311 - return SK_PASS; 312 - ret = bpf_msg_cork_bytes(msg, *bytes); 313 - if (ret) 314 - return SK_DROP; 315 - } 316 - return SK_PASS; 317 - } 318 - 319 - SEC("sk_msg7") 320 - int bpf_prog10(struct sk_msg_md *msg) 321 - { 322 - int *bytes, zero = 0, one = 1; 323 - int *start, *end; 324 - 325 - bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 326 - if (bytes) 327 - bpf_msg_apply_bytes(msg, *bytes); 328 - bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero); 329 - if (bytes) 330 - bpf_msg_cork_bytes(msg, *bytes); 331 - start = bpf_map_lookup_elem(&sock_pull_bytes, &zero); 332 - end = bpf_map_lookup_elem(&sock_pull_bytes, &one); 333 - if (start && end) 334 - bpf_msg_pull_data(msg, *start, *end, 0); 335 - 336 - return SK_DROP; 337 - } 338 - 339 - int _version SEC("version") = 1; 340 - char _license[] SEC("license") = "GPL"; 2 + // Copyright (c) 2018 Covalent IO, Inc. http://covalent.io 3 + #define SOCKMAP 4 + #define TEST_MAP_TYPE BPF_MAP_TYPE_SOCKMAP 5 + #include "./test_sockmap_kern.h"
tools/testing/selftests/bpf/test_sockmap_kern.h (+363)
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* Copyright (c) 2017-2018 Covalent IO, Inc. http://covalent.io */ 3 + #include <stddef.h> 4 + #include <string.h> 5 + #include <linux/bpf.h> 6 + #include <linux/if_ether.h> 7 + #include <linux/if_packet.h> 8 + #include <linux/ip.h> 9 + #include <linux/ipv6.h> 10 + #include <linux/in.h> 11 + #include <linux/udp.h> 12 + #include <linux/tcp.h> 13 + #include <linux/pkt_cls.h> 14 + #include <sys/socket.h> 15 + #include "bpf_helpers.h" 16 + #include "bpf_endian.h" 17 + 18 + /* Sockmap sample program connects a client and a backend together 19 + * using cgroups. 20 + * 21 + * client:X <---> frontend:80 client:X <---> backend:80 22 + * 23 + * For simplicity we hard code values here and bind 1:1. The hard 24 + * coded values are part of the setup in sockmap.sh script that 25 + * is associated with this BPF program. 26 + * 27 + * The bpf_printk is verbose and prints information as connections 28 + * are established and verdicts are decided. 29 + */ 30 + 31 + #define bpf_printk(fmt, ...) 
\ 32 + ({ \ 33 + char ____fmt[] = fmt; \ 34 + bpf_trace_printk(____fmt, sizeof(____fmt), \ 35 + ##__VA_ARGS__); \ 36 + }) 37 + 38 + struct bpf_map_def SEC("maps") sock_map = { 39 + .type = TEST_MAP_TYPE, 40 + .key_size = sizeof(int), 41 + .value_size = sizeof(int), 42 + .max_entries = 20, 43 + }; 44 + 45 + struct bpf_map_def SEC("maps") sock_map_txmsg = { 46 + .type = TEST_MAP_TYPE, 47 + .key_size = sizeof(int), 48 + .value_size = sizeof(int), 49 + .max_entries = 20, 50 + }; 51 + 52 + struct bpf_map_def SEC("maps") sock_map_redir = { 53 + .type = TEST_MAP_TYPE, 54 + .key_size = sizeof(int), 55 + .value_size = sizeof(int), 56 + .max_entries = 20, 57 + }; 58 + 59 + struct bpf_map_def SEC("maps") sock_apply_bytes = { 60 + .type = BPF_MAP_TYPE_ARRAY, 61 + .key_size = sizeof(int), 62 + .value_size = sizeof(int), 63 + .max_entries = 1 64 + }; 65 + 66 + struct bpf_map_def SEC("maps") sock_cork_bytes = { 67 + .type = BPF_MAP_TYPE_ARRAY, 68 + .key_size = sizeof(int), 69 + .value_size = sizeof(int), 70 + .max_entries = 1 71 + }; 72 + 73 + struct bpf_map_def SEC("maps") sock_pull_bytes = { 74 + .type = BPF_MAP_TYPE_ARRAY, 75 + .key_size = sizeof(int), 76 + .value_size = sizeof(int), 77 + .max_entries = 2 78 + }; 79 + 80 + struct bpf_map_def SEC("maps") sock_redir_flags = { 81 + .type = BPF_MAP_TYPE_ARRAY, 82 + .key_size = sizeof(int), 83 + .value_size = sizeof(int), 84 + .max_entries = 1 85 + }; 86 + 87 + struct bpf_map_def SEC("maps") sock_skb_opts = { 88 + .type = BPF_MAP_TYPE_ARRAY, 89 + .key_size = sizeof(int), 90 + .value_size = sizeof(int), 91 + .max_entries = 1 92 + }; 93 + 94 + SEC("sk_skb1") 95 + int bpf_prog1(struct __sk_buff *skb) 96 + { 97 + return skb->len; 98 + } 99 + 100 + SEC("sk_skb2") 101 + int bpf_prog2(struct __sk_buff *skb) 102 + { 103 + __u32 lport = skb->local_port; 104 + __u32 rport = skb->remote_port; 105 + int len, *f, ret, zero = 0; 106 + __u64 flags = 0; 107 + 108 + if (lport == 10000) 109 + ret = 10; 110 + else 111 + ret = 1; 112 + 113 + len = 
(__u32)skb->data_end - (__u32)skb->data; 114 + f = bpf_map_lookup_elem(&sock_skb_opts, &zero); 115 + if (f && *f) { 116 + ret = 3; 117 + flags = *f; 118 + } 119 + 120 + bpf_printk("sk_skb2: redirect(%iB) flags=%i\n", 121 + len, flags); 122 + #ifdef SOCKMAP 123 + return bpf_sk_redirect_map(skb, &sock_map, ret, flags); 124 + #else 125 + return bpf_sk_redirect_hash(skb, &sock_map, &ret, flags); 126 + #endif 127 + 128 + } 129 + 130 + SEC("sockops") 131 + int bpf_sockmap(struct bpf_sock_ops *skops) 132 + { 133 + __u32 lport, rport; 134 + int op, err = 0, index, key, ret; 135 + 136 + 137 + op = (int) skops->op; 138 + 139 + switch (op) { 140 + case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: 141 + lport = skops->local_port; 142 + rport = skops->remote_port; 143 + 144 + if (lport == 10000) { 145 + ret = 1; 146 + #ifdef SOCKMAP 147 + err = bpf_sock_map_update(skops, &sock_map, &ret, 148 + BPF_NOEXIST); 149 + #else 150 + err = bpf_sock_hash_update(skops, &sock_map, &ret, 151 + BPF_NOEXIST); 152 + #endif 153 + bpf_printk("passive(%i -> %i) map ctx update err: %d\n", 154 + lport, bpf_ntohl(rport), err); 155 + } 156 + break; 157 + case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: 158 + lport = skops->local_port; 159 + rport = skops->remote_port; 160 + 161 + if (bpf_ntohl(rport) == 10001) { 162 + ret = 10; 163 + #ifdef SOCKMAP 164 + err = bpf_sock_map_update(skops, &sock_map, &ret, 165 + BPF_NOEXIST); 166 + #else 167 + err = bpf_sock_hash_update(skops, &sock_map, &ret, 168 + BPF_NOEXIST); 169 + #endif 170 + bpf_printk("active(%i -> %i) map ctx update err: %d\n", 171 + lport, bpf_ntohl(rport), err); 172 + } 173 + break; 174 + default: 175 + break; 176 + } 177 + 178 + return 0; 179 + } 180 + 181 + SEC("sk_msg1") 182 + int bpf_prog4(struct sk_msg_md *msg) 183 + { 184 + int *bytes, zero = 0, one = 1; 185 + int *start, *end; 186 + 187 + bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 188 + if (bytes) 189 + bpf_msg_apply_bytes(msg, *bytes); 190 + bytes = bpf_map_lookup_elem(&sock_cork_bytes, 
&zero); 191 + if (bytes) 192 + bpf_msg_cork_bytes(msg, *bytes); 193 + start = bpf_map_lookup_elem(&sock_pull_bytes, &zero); 194 + end = bpf_map_lookup_elem(&sock_pull_bytes, &one); 195 + if (start && end) 196 + bpf_msg_pull_data(msg, *start, *end, 0); 197 + return SK_PASS; 198 + } 199 + 200 + SEC("sk_msg2") 201 + int bpf_prog5(struct sk_msg_md *msg) 202 + { 203 + int err1 = -1, err2 = -1, zero = 0, one = 1; 204 + int *bytes, *start, *end, len1, len2; 205 + 206 + bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 207 + if (bytes) 208 + err1 = bpf_msg_apply_bytes(msg, *bytes); 209 + bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero); 210 + if (bytes) 211 + err2 = bpf_msg_cork_bytes(msg, *bytes); 212 + len1 = (__u64)msg->data_end - (__u64)msg->data; 213 + start = bpf_map_lookup_elem(&sock_pull_bytes, &zero); 214 + end = bpf_map_lookup_elem(&sock_pull_bytes, &one); 215 + if (start && end) { 216 + int err; 217 + 218 + bpf_printk("sk_msg2: pull(%i:%i)\n", 219 + start ? *start : 0, end ? 
*end : 0); 220 + err = bpf_msg_pull_data(msg, *start, *end, 0); 221 + if (err) 222 + bpf_printk("sk_msg2: pull_data err %i\n", 223 + err); 224 + len2 = (__u64)msg->data_end - (__u64)msg->data; 225 + bpf_printk("sk_msg2: length update %i->%i\n", 226 + len1, len2); 227 + } 228 + bpf_printk("sk_msg2: data length %i err1 %i err2 %i\n", 229 + len1, err1, err2); 230 + return SK_PASS; 231 + } 232 + 233 + SEC("sk_msg3") 234 + int bpf_prog6(struct sk_msg_md *msg) 235 + { 236 + int *bytes, zero = 0, one = 1, key = 0; 237 + int *start, *end, *f; 238 + __u64 flags = 0; 239 + 240 + bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 241 + if (bytes) 242 + bpf_msg_apply_bytes(msg, *bytes); 243 + bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero); 244 + if (bytes) 245 + bpf_msg_cork_bytes(msg, *bytes); 246 + start = bpf_map_lookup_elem(&sock_pull_bytes, &zero); 247 + end = bpf_map_lookup_elem(&sock_pull_bytes, &one); 248 + if (start && end) 249 + bpf_msg_pull_data(msg, *start, *end, 0); 250 + f = bpf_map_lookup_elem(&sock_redir_flags, &zero); 251 + if (f && *f) { 252 + key = 2; 253 + flags = *f; 254 + } 255 + #ifdef SOCKMAP 256 + return bpf_msg_redirect_map(msg, &sock_map_redir, key, flags); 257 + #else 258 + return bpf_msg_redirect_hash(msg, &sock_map_redir, &key, flags); 259 + #endif 260 + } 261 + 262 + SEC("sk_msg4") 263 + int bpf_prog7(struct sk_msg_md *msg) 264 + { 265 + int err1 = 0, err2 = 0, zero = 0, one = 1, key = 0; 266 + int *f, *bytes, *start, *end, len1, len2; 267 + __u64 flags = 0; 268 + 269 + int err; 270 + bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 271 + if (bytes) 272 + err1 = bpf_msg_apply_bytes(msg, *bytes); 273 + bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero); 274 + if (bytes) 275 + err2 = bpf_msg_cork_bytes(msg, *bytes); 276 + len1 = (__u64)msg->data_end - (__u64)msg->data; 277 + start = bpf_map_lookup_elem(&sock_pull_bytes, &zero); 278 + end = bpf_map_lookup_elem(&sock_pull_bytes, &one); 279 + if (start && end) { 280 + 281 + 
bpf_printk("sk_msg2: pull(%i:%i)\n", 282 + start ? *start : 0, end ? *end : 0); 283 + err = bpf_msg_pull_data(msg, *start, *end, 0); 284 + if (err) 285 + bpf_printk("sk_msg2: pull_data err %i\n", 286 + err); 287 + len2 = (__u64)msg->data_end - (__u64)msg->data; 288 + bpf_printk("sk_msg2: length update %i->%i\n", 289 + len1, len2); 290 + } 291 + f = bpf_map_lookup_elem(&sock_redir_flags, &zero); 292 + if (f && *f) { 293 + key = 2; 294 + flags = *f; 295 + } 296 + bpf_printk("sk_msg3: redirect(%iB) flags=%i err=%i\n", 297 + len1, flags, err1 ? err1 : err2); 298 + #ifdef SOCKMAP 299 + err = bpf_msg_redirect_map(msg, &sock_map_redir, key, flags); 300 + #else 301 + err = bpf_msg_redirect_hash(msg, &sock_map_redir, &key, flags); 302 + #endif 303 + bpf_printk("sk_msg3: err %i\n", err); 304 + return err; 305 + } 306 + 307 + SEC("sk_msg5") 308 + int bpf_prog8(struct sk_msg_md *msg) 309 + { 310 + void *data_end = (void *)(long) msg->data_end; 311 + void *data = (void *)(long) msg->data; 312 + int ret = 0, *bytes, zero = 0; 313 + 314 + bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 315 + if (bytes) { 316 + ret = bpf_msg_apply_bytes(msg, *bytes); 317 + if (ret) 318 + return SK_DROP; 319 + } else { 320 + return SK_DROP; 321 + } 322 + return SK_PASS; 323 + } 324 + SEC("sk_msg6") 325 + int bpf_prog9(struct sk_msg_md *msg) 326 + { 327 + void *data_end = (void *)(long) msg->data_end; 328 + void *data = (void *)(long) msg->data; 329 + int ret = 0, *bytes, zero = 0; 330 + 331 + bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero); 332 + if (bytes) { 333 + if (((__u64)data_end - (__u64)data) >= *bytes) 334 + return SK_PASS; 335 + ret = bpf_msg_cork_bytes(msg, *bytes); 336 + if (ret) 337 + return SK_DROP; 338 + } 339 + return SK_PASS; 340 + } 341 + 342 + SEC("sk_msg7") 343 + int bpf_prog10(struct sk_msg_md *msg) 344 + { 345 + int *bytes, zero = 0, one = 1; 346 + int *start, *end; 347 + 348 + bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero); 349 + if (bytes) 350 + 
bpf_msg_apply_bytes(msg, *bytes); 351 + bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero); 352 + if (bytes) 353 + bpf_msg_cork_bytes(msg, *bytes); 354 + start = bpf_map_lookup_elem(&sock_pull_bytes, &zero); 355 + end = bpf_map_lookup_elem(&sock_pull_bytes, &one); 356 + if (start && end) 357 + bpf_msg_pull_data(msg, *start, *end, 0); 358 + 359 + return SK_DROP; 360 + } 361 + 362 + int _version SEC("version") = 1; 363 + char _license[] SEC("license") = "GPL";
tools/testing/selftests/bpf/test_verifier.c (+62)
··· 41 41 # endif 42 42 #endif 43 43 #include "bpf_rlimit.h" 44 + #include "bpf_rand.h" 44 45 #include "../../../include/linux/filter.h" 45 46 46 47 #ifndef ARRAY_SIZE ··· 151 150 while (i < len - 1) 152 151 insn[i++] = BPF_LD_ABS(BPF_B, 1); 153 152 insn[i] = BPF_EXIT_INSN(); 153 + } 154 + 155 + static void bpf_fill_rand_ld_dw(struct bpf_test *self) 156 + { 157 + struct bpf_insn *insn = self->insns; 158 + uint64_t res = 0; 159 + int i = 0; 160 + 161 + insn[i++] = BPF_MOV32_IMM(BPF_REG_0, 0); 162 + while (i < self->retval) { 163 + uint64_t val = bpf_semi_rand_get(); 164 + struct bpf_insn tmp[2] = { BPF_LD_IMM64(BPF_REG_1, val) }; 165 + 166 + res ^= val; 167 + insn[i++] = tmp[0]; 168 + insn[i++] = tmp[1]; 169 + insn[i++] = BPF_ALU64_REG(BPF_XOR, BPF_REG_0, BPF_REG_1); 170 + } 171 + insn[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_0); 172 + insn[i++] = BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 32); 173 + insn[i++] = BPF_ALU64_REG(BPF_XOR, BPF_REG_0, BPF_REG_1); 174 + insn[i] = BPF_EXIT_INSN(); 175 + res ^= (res >> 32); 176 + self->retval = (uint32_t)res; 154 177 } 155 178 156 179 static struct bpf_test tests[] = { ··· 11999 11974 .result = ACCEPT, 12000 11975 .retval = 10, 12001 11976 }, 11977 + { 11978 + "ld_dw: xor semi-random 64 bit imms, test 1", 11979 + .insns = { }, 11980 + .data = { }, 11981 + .fill_helper = bpf_fill_rand_ld_dw, 11982 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 11983 + .result = ACCEPT, 11984 + .retval = 4090, 11985 + }, 11986 + { 11987 + "ld_dw: xor semi-random 64 bit imms, test 2", 11988 + .insns = { }, 11989 + .data = { }, 11990 + .fill_helper = bpf_fill_rand_ld_dw, 11991 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 11992 + .result = ACCEPT, 11993 + .retval = 2047, 11994 + }, 11995 + { 11996 + "ld_dw: xor semi-random 64 bit imms, test 3", 11997 + .insns = { }, 11998 + .data = { }, 11999 + .fill_helper = bpf_fill_rand_ld_dw, 12000 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12001 + .result = ACCEPT, 12002 + .retval = 511, 12003 + }, 12004 + { 12005 + "ld_dw: xor 
semi-random 64 bit imms, test 4", 12006 + .insns = { }, 12007 + .data = { }, 12008 + .fill_helper = bpf_fill_rand_ld_dw, 12009 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12010 + .result = ACCEPT, 12011 + .retval = 5, 12012 + }, 12002 12013 }; 12003 12014 12004 12015 static int probe_filter_length(const struct bpf_insn *fp) ··· 12407 12346 return EXIT_FAILURE; 12408 12347 } 12409 12348 12349 + bpf_semi_rand_init(); 12410 12350 return do_test(unpriv, from, to); 12411 12351 }
tools/testing/selftests/bpf/trace_helpers.c (+30, -57)
··· 74 74 75 75 static int page_size; 76 76 static int page_cnt = 8; 77 - static volatile struct perf_event_mmap_page *header; 77 + static struct perf_event_mmap_page *header; 78 78 79 79 int perf_event_mmap(int fd) 80 80 { ··· 107 107 char data[]; 108 108 }; 109 109 110 - static int perf_event_read(perf_event_print_fn fn) 110 + static enum bpf_perf_event_ret bpf_perf_event_print(void *event, void *priv) 111 111 { 112 - __u64 data_tail = header->data_tail; 113 - __u64 data_head = header->data_head; 114 - __u64 buffer_size = page_cnt * page_size; 115 - void *base, *begin, *end; 116 - char buf[256]; 112 + struct perf_event_sample *e = event; 113 + perf_event_print_fn fn = priv; 117 114 int ret; 118 115 119 - asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */ 120 - if (data_head == data_tail) 121 - return PERF_EVENT_CONT; 122 - 123 - base = ((char *)header) + page_size; 124 - 125 - begin = base + data_tail % buffer_size; 126 - end = base + data_head % buffer_size; 127 - 128 - while (begin != end) { 129 - struct perf_event_sample *e; 130 - 131 - e = begin; 132 - if (begin + e->header.size > base + buffer_size) { 133 - long len = base + buffer_size - begin; 134 - 135 - assert(len < e->header.size); 136 - memcpy(buf, begin, len); 137 - memcpy(buf + len, base, e->header.size - len); 138 - e = (void *) buf; 139 - begin = base + e->header.size - len; 140 - } else if (begin + e->header.size == base + buffer_size) { 141 - begin = base; 142 - } else { 143 - begin += e->header.size; 144 - } 145 - 146 - if (e->header.type == PERF_RECORD_SAMPLE) { 147 - ret = fn(e->data, e->size); 148 - if (ret != PERF_EVENT_CONT) 149 - return ret; 150 - } else if (e->header.type == PERF_RECORD_LOST) { 151 - struct { 152 - struct perf_event_header header; 153 - __u64 id; 154 - __u64 lost; 155 - } *lost = (void *) e; 156 - printf("lost %lld events\n", lost->lost); 157 - } else { 158 - printf("unknown event type=%d size=%d\n", 159 - e->header.type, e->header.size); 160 - } 116 
+ if (e->header.type == PERF_RECORD_SAMPLE) { 117 + ret = fn(e->data, e->size); 118 + if (ret != LIBBPF_PERF_EVENT_CONT) 119 + return ret; 120 + } else if (e->header.type == PERF_RECORD_LOST) { 121 + struct { 122 + struct perf_event_header header; 123 + __u64 id; 124 + __u64 lost; 125 + } *lost = (void *) e; 126 + printf("lost %lld events\n", lost->lost); 127 + } else { 128 + printf("unknown event type=%d size=%d\n", 129 + e->header.type, e->header.size); 161 130 } 162 131 163 - __sync_synchronize(); /* smp_mb() */ 164 - header->data_tail = data_head; 165 - return PERF_EVENT_CONT; 132 + return LIBBPF_PERF_EVENT_CONT; 166 133 } 167 134 168 135 int perf_event_poller(int fd, perf_event_print_fn output_fn) 169 136 { 170 - int ret; 137 + enum bpf_perf_event_ret ret; 138 + void *buf = NULL; 139 + size_t len = 0; 171 140 172 141 for (;;) { 173 142 perf_event_poll(fd); 174 - ret = perf_event_read(output_fn); 175 - if (ret != PERF_EVENT_CONT) 176 - return ret; 143 + ret = bpf_perf_event_read_simple(header, page_cnt * page_size, 144 + page_size, &buf, &len, 145 + bpf_perf_event_print, 146 + output_fn); 147 + if (ret != LIBBPF_PERF_EVENT_CONT) 148 + break; 177 149 } 150 + free(buf); 178 151 179 - return PERF_EVENT_DONE; 152 + return ret; 180 153 }
tools/testing/selftests/bpf/trace_helpers.h (+4, -7)
··· 2 2 #ifndef __TRACE_HELPER_H 3 3 #define __TRACE_HELPER_H 4 4 5 + #include <libbpf.h> 6 + 5 7 struct ksym { 6 8 long addr; 7 9 char *name; ··· 12 10 int load_kallsyms(void); 13 11 struct ksym *ksym_search(long key); 14 12 15 - typedef int (*perf_event_print_fn)(void *data, int size); 16 - 17 - /* return code for perf_event_print_fn */ 18 - #define PERF_EVENT_DONE 0 19 - #define PERF_EVENT_ERROR -1 20 - #define PERF_EVENT_CONT -2 13 + typedef enum bpf_perf_event_ret (*perf_event_print_fn)(void *data, int size); 21 14 22 15 int perf_event_mmap(int fd); 23 - /* return PERF_EVENT_DONE or PERF_EVENT_ERROR */ 16 + /* return LIBBPF_PERF_EVENT_DONE or LIBBPF_PERF_EVENT_ERROR */ 24 17 int perf_event_poller(int fd, perf_event_print_fn output_fn); 25 18 #endif
tools/testing/selftests/bpf/urandom_read.c (+8, -2)
··· 6 6 #include <stdlib.h> 7 7 8 8 #define BUF_SIZE 256 9 - int main(void) 9 + 10 + int main(int argc, char *argv[]) 10 11 { 11 12 int fd = open("/dev/urandom", O_RDONLY); 12 13 int i; 13 14 char buf[BUF_SIZE]; 15 + int count = 4; 14 16 15 17 if (fd < 0) 16 18 return 1; 17 - for (i = 0; i < 4; ++i) 19 + 20 + if (argc == 2) 21 + count = atoi(argv[1]); 22 + 23 + for (i = 0; i < count; ++i) 18 24 read(fd, buf, BUF_SIZE); 19 25 20 26 close(fd);