Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bpf: doc: update answer for 32-bit subregister question

There has been quite a lot of progress on the two steps mentioned in the
answer to the following question:

Q: BPF 32-bit subregister requirements

This patch updates the answer to reflect what has been done.

v2:
- Add missing full stop. (Song Liu)
- Minor tweak on one sentence. (Song Liu)

v1:
- Integrated rephrase from Quentin and Jakub

Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Authored by Jiong Wang and committed by Alexei Starovoitov
c231c22a d168286d

+25 -5
Documentation/bpf/bpf_design_QA.rst
@@ -172,11 +172,31 @@
 CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
 be added to BPF in the future?
 
-A: NO. The first thing to improve performance on 32-bit archs is to teach
-LLVM to generate code that uses 32-bit subregisters. Then second step
-is to teach verifier to mark operations where zero-ing upper bits
-is unnecessary. Then JITs can take advantage of those markings and
-drastically reduce size of generated code and improve performance.
+A: NO.
+
+But some optimizations on zero-ing the upper 32 bits for BPF registers are
+available, and can be leveraged to improve the performance of JITed BPF
+programs for 32-bit architectures.
+
+Starting with version 7, LLVM is able to generate instructions that operate
+on 32-bit subregisters, provided the option -mattr=+alu32 is passed for
+compiling a program. Furthermore, the verifier can now mark the
+instructions for which zero-ing the upper bits of the destination register
+is required, and insert an explicit zero-extension (zext) instruction
+(a mov32 variant). This means that for architectures without zext hardware
+support, the JIT back-ends do not need to clear the upper bits for
+subregisters written by alu32 instructions or narrow loads. Instead, the
+back-ends simply need to support code generation for that mov32 variant,
+and to overwrite bpf_jit_needs_zext() to make it return "true" (in order to
+enable zext insertion in the verifier).
+
+Note that it is possible for a JIT back-end to have partial hardware
+support for zext. In that case, if verifier zext insertion is enabled,
+it could lead to the insertion of unnecessary zext instructions. Such
+instructions could be removed by creating a simple peephole inside the JIT
+back-end: if one instruction has hardware support for zext and if the next
+instruction is an explicit zext, then the latter can be skipped when doing
+the code generation.
 
 Q: Does BPF have a stable ABI?
 ------------------------------
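
The peephole described in the last added paragraph could look roughly like
the sketch below. It is a userspace illustration only, not kernel code: the
instruction type, the K_* kinds, hw_zero_extends() and jit_peephole() are all
hypothetical names, and the choice that only narrow loads zero-extend in
hardware is an assumption made to model "partial hardware support".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction kinds (not the real BPF
 * opcode encoding). */
enum insn_kind {
    K_ALU32,        /* 32-bit ALU op writing a subregister */
    K_NARROW_LOAD,  /* load narrower than 64 bits */
    K_ALU64,        /* full 64-bit ALU op */
    K_ZEXT          /* explicit zero-extension inserted by the verifier */
};

/* Hypothetical stand-in for a BPF instruction: kind plus destination reg. */
struct insn {
    enum insn_kind kind;
    int dst;
};

/* Assumption for this sketch: the target clears the upper 32 bits in
 * hardware for narrow loads only, so 32-bit ALU ops still need the
 * explicit zext. */
static bool hw_zero_extends(const struct insn *i)
{
    return i->kind == K_NARROW_LOAD;
}

/* Copy prog[0..n) into out, skipping an explicit zext when the previous
 * instruction already zero-extended the same destination register in
 * hardware. out must have room for n entries. Returns the number kept. */
static size_t jit_peephole(const struct insn *prog, size_t n,
                           struct insn *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        bool redundant = prog[i].kind == K_ZEXT && i > 0 &&
                         hw_zero_extends(&prog[i - 1]) &&
                         prog[i - 1].dst == prog[i].dst;

        if (redundant)
            continue;       /* hardware already did the zext */
        out[kept++] = prog[i];
    }
    return kept;
}
```

With a program of a narrow load into r1, a zext of r1, a 32-bit ALU op on r2,
a zext of r2 and a 64-bit ALU op on r3, this sketch drops only the zext that
follows the narrow load. A real back-end would make the same decision while
emitting native code rather than by copying instructions into a second array.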