Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

docs/bpf: Add documentation for new instructions

Add documentation in instruction-set.rst for the new instruction
encodings and their corresponding operations. Also remove the question
about 'no BPF_SDIV' from bpf_design_QA.rst, since a BPF_SDIV insn
now exists.

Cc: bpf@ietf.org
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011342.3724411-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Authored by Yonghong Song and committed by Alexei Starovoitov
245d4c40 0c606571

+79 -41
-5
Documentation/bpf/bpf_design_QA.rst
@@ -140,11 +140,6 @@
 it more complicated to support on arm64 and other archs. Also it
 needs div-by-zero runtime check.
 
-Q: Why there is no BPF_SDIV for signed divide operation?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-A: Because it would be rarely used. llvm errors in such case and
-prints a suggestion to use unsigned divide instead.
-
 Q: Why BPF has implicit prologue and epilogue?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A: Because architectures like sparc have register windows and in general
+79 -36
Documentation/bpf/standardization/instruction-set.rst
@@ -154,24 +154,27 @@
 The 'code' field encodes the operation as below, where 'src' and 'dst' refer
 to the values of the source and destination registers, respectively.
 
-========  =====  ==========================================================
-code      value  description
-========  =====  ==========================================================
-BPF_ADD   0x00   dst += src
-BPF_SUB   0x10   dst -= src
-BPF_MUL   0x20   dst \*= src
-BPF_DIV   0x30   dst = (src != 0) ? (dst / src) : 0
-BPF_OR    0x40   dst \|= src
-BPF_AND   0x50   dst &= src
-BPF_LSH   0x60   dst <<= (src & mask)
-BPF_RSH   0x70   dst >>= (src & mask)
-BPF_NEG   0x80   dst = -dst
-BPF_MOD   0x90   dst = (src != 0) ? (dst % src) : dst
-BPF_XOR   0xa0   dst ^= src
-BPF_MOV   0xb0   dst = src
-BPF_ARSH  0xc0   sign extending dst >>= (src & mask)
-BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
-========  =====  ==========================================================
+========  =====  =======  ==========================================================
+code      value  offset   description
+========  =====  =======  ==========================================================
+BPF_ADD   0x00   0        dst += src
+BPF_SUB   0x10   0        dst -= src
+BPF_MUL   0x20   0        dst \*= src
+BPF_DIV   0x30   0        dst = (src != 0) ? (dst / src) : 0
+BPF_SDIV  0x30   1        dst = (src != 0) ? (dst s/ src) : 0
+BPF_OR    0x40   0        dst \|= src
+BPF_AND   0x50   0        dst &= src
+BPF_LSH   0x60   0        dst <<= (src & mask)
+BPF_RSH   0x70   0        dst >>= (src & mask)
+BPF_NEG   0x80   0        dst = -dst
+BPF_MOD   0x90   0        dst = (src != 0) ? (dst % src) : dst
+BPF_SMOD  0x90   1        dst = (src != 0) ? (dst s% src) : dst
+BPF_XOR   0xa0   0        dst ^= src
+BPF_MOV   0xb0   0        dst = src
+BPF_MOVSX 0xb0   8/16/32  dst = (s8,s16,s32)src
+BPF_ARSH  0xc0   0        sign extending dst >>= (src & mask)
+BPF_END   0xd0   0        byte swap operations (see `Byte swap instructions`_ below)
+========  =====  ============  ==========================================================
 
 Underflow and overflow are allowed during arithmetic operations, meaning
 the 64-bit or 32-bit value will wrap. If eBPF program execution would
@@ -198,11 +201,20 @@
 
   dst = dst ^ imm32
 
-Also note that the division and modulo operations are unsigned. Thus, for
-``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
-for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result
-interpreted as an unsigned 64-bit value. There are no instructions for
-signed division or modulo.
+Note that most instructions have instruction offset of 0. But three instructions
+(BPF_SDIV, BPF_SMOD, BPF_MOVSX) have non-zero offset.
+
+The devision and modulo operations support both unsigned and signed flavors.
+For unsigned operation (BPF_DIV and BPF_MOD), for ``BPF_ALU``, 'imm' is first
+interpreted as an unsigned 32-bit value, whereas for ``BPF_ALU64``, 'imm' is
+first sign extended to 64 bits and the result interpreted as an unsigned 64-bit
+value. For signed operation (BPF_SDIV and BPF_SMOD), for ``BPF_ALU``, 'imm' is
+interpreted as a signed value. For ``BPF_ALU64``, the 'imm' is sign extended
+from 32 to 64 and interpreted as a signed 64-bit value.
+
+Instruction BPF_MOVSX does move operation with sign extension.
+``BPF_ALU | MOVSX`` sign extendes 8-bit and 16-bit into 32-bit and upper 32-bit are zeroed.
+``BPF_ALU64 | MOVSX`` sign extends 8-bit, 16-bit and 32-bit into 64-bit.
 
 Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
 for 32-bit operations.
@@ -210,21 +222,23 @@
 Byte swap instructions
 ~~~~~~~~~~~~~~~~~~~~~~
 
-The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
-'code' field of ``BPF_END``.
+The byte swap instructions use instruction classes of ``BPF_ALU`` and ``BPF_ALU64``
+and a 4-bit 'code' field of ``BPF_END``.
 
 The byte swap instructions operate on the destination register
 only and do not use a separate source register or immediate value.
 
-The 1-bit source operand field in the opcode is used to select what byte
-order the operation convert from or to:
+For ``BPF_ALU``, the 1-bit source operand field in the opcode is used to select what byte
+order the operation convert from or to. For ``BPF_ALU64``, the 1-bit source operand
+field in the opcode is not used and must be 0.
 
-=========  =====  =================================================
-source     value  description
-=========  =====  =================================================
-BPF_TO_LE  0x00   convert between host byte order and little endian
-BPF_TO_BE  0x08   convert between host byte order and big endian
-=========  =====  =================================================
+=========  =========  =====  =================================================
+class      source     value  description
+=========  =========  =====  =================================================
+BPF_ALU    BPF_TO_LE  0x00   convert between host byte order and little endian
+BPF_ALU    BPF_TO_BE  0x08   convert between host byte order and big endian
+BPF_ALU64  BPF_TO_LE  0x00   do byte swap unconditionally
+=========  =========  =====  =================================================
 
 The 'imm' field encodes the width of the swap operations. The following widths
 are supported: 16, 32 and 64.
@@ -239,6 +253,12 @@
 
   dst = htobe64(dst)
 
+``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::
+
+  dst = bswap16 dst
+  dst = bswap32 dst
+  dst = bswap64 dst
+
 Jump instructions
 -----------------
 
@@ -249,7 +269,8 @@
 ========  =====  ===  ===========================================  =========================================
 code      value  src  description                                  notes
 ========  =====  ===  ===========================================  =========================================
-BPF_JA    0x0    0x0  PC += offset                                 BPF_JMP only
+BPF_JA    0x0    0x0  PC += offset                                 BPF_JMP class
+BPF_JA    0x0    0x0  PC += imm                                    BPF_JMP32 class
 BPF_JEQ   0x1    any  PC += offset if dst == src
 BPF_JGT   0x2    any  PC += offset if dst > src                    unsigned
 BPF_JGE   0x3    any  PC += offset if dst >= src                   unsigned
@@ -277,6 +298,16 @@
   if (s32)dst s>= (s32)src goto +offset
 
 where 's>=' indicates a signed '>=' comparison.
+
+``BPF_JA | BPF_K | BPF_JMP32`` (0x06) means::
+
+  gotol +imm
+
+where 'imm' means the branch offset comes from insn 'imm' field.
+
+Note there are two flavors of BPF_JA instrions. BPF_JMP class permits 16-bit jump offset while
+BPF_JMP32 permits 32-bit jump offset. A >16bit conditional jmp can be converted to a <16bit
+conditional jmp plus a 32-bit unconditional jump.
 
 Helper functions
 ~~~~~~~~~~~~~~~~
@@ -320,6 +351,7 @@
 BPF_ABS        0x20   legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
 BPF_IND        0x40   legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
 BPF_MEM        0x60   regular load and store operations     `Regular load and store operations`_
+BPF_MEMSX      0x80   sign-extension load operations        `Sign-extension load operations`_
 BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
 =============  =====  ====================================  =============
 
@@ -350,9 +382,20 @@
 
 ``BPF_MEM | <size> | BPF_LDX`` means::
 
-  dst = *(size *) (src + offset)
+  dst = *(unsigned size *) (src + offset)
 
-Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.
+Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW`` and
+'unsigned size' is one of u8, u16, u32 and u64.
+
+The ``BPF_MEMSX`` mode modifier is used to encode sign-extension load
+instructions that transfer data between a register and memory.
+
+``BPF_MEMSX | <size> | BPF_LDX`` means::
+
+  dst = *(signed size *) (src + offset)
+
+Where size is one of: ``BPF_B``, ``BPF_H`` or ``BPF_W``, and
+'signed size' is one of s8, s16 and s32.
 
 Atomic operations
 -----------------
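
Illustrative sketch (not part of the commit): the semantics that the patch documents for BPF_SDIV, BPF_SMOD, BPF_MOVSX and the ``BPF_ALU64 | BPF_END`` byte swap can be modeled in plain C roughly as below. All function names are hypothetical and exist only for this example; this is not kernel, verifier, or JIT code. Two behaviors are assumptions that the patch text does not state: how INT64_MIN divided by -1 is handled, and that the 16-bit and 32-bit swaps zero the upper bits of the result.

```c
#include <assert.h>
#include <stdint.h>

/* BPF_ALU64 | BPF_SDIV (offset 1): dst = (src != 0) ? (dst s/ src) : 0 */
static inline int64_t bpf_sdiv64(int64_t dst, int64_t src)
{
    if (src == 0)
        return 0;
    /* Assumption: INT64_MIN / -1 yields INT64_MIN; in C the division
     * itself would be undefined behavior, so it is special-cased. */
    if (dst == INT64_MIN && src == -1)
        return INT64_MIN;
    return dst / src;   /* C division truncates toward zero */
}

/* BPF_ALU64 | BPF_SMOD (offset 1): dst = (src != 0) ? (dst s% src) : dst */
static inline int64_t bpf_smod64(int64_t dst, int64_t src)
{
    if (src == 0)
        return dst;
    if (dst == INT64_MIN && src == -1)  /* assumption, avoids C UB */
        return 0;
    return dst % src;
}

/* BPF_ALU64 | BPF_MOVSX (offset 8/16/32): sign extend into 64 bits.
 * The narrowing casts rely on the usual two's-complement wraparound. */
static inline uint64_t bpf_movsx64(uint64_t src, int width)
{
    switch (width) {
    case 8:  return (uint64_t)(int64_t)(int8_t)src;
    case 16: return (uint64_t)(int64_t)(int16_t)src;
    case 32: return (uint64_t)(int64_t)(int32_t)src;
    default: return src;    /* not a documented width; left unchanged */
    }
}

/* BPF_ALU | BPF_MOVSX (offset 8/16): sign extend into 32 bits,
 * upper 32 bits of the destination are zeroed. */
static inline uint64_t bpf_movsx32(uint64_t src, int width)
{
    switch (width) {
    case 8:  return (uint64_t)(uint32_t)(int32_t)(int8_t)src;
    case 16: return (uint64_t)(uint32_t)(int32_t)(int16_t)src;
    default: return (uint64_t)(uint32_t)src;    /* not a documented width */
    }
}

/* BPF_ALU64 | BPF_TO_LE | BPF_END with imm = 16/32/64: unconditional swap.
 * Assumption: bits above 'width' are zero in the result. */
static inline uint64_t bpf_bswap(uint64_t dst, int width)
{
    uint64_t out = 0;
    for (int i = 0; i < width / 8; i++)
        out = (out << 8) | ((dst >> (8 * i)) & 0xff);
    return out;
}

/* Small self-check exercising each helper once. */
static inline void bpf_model_selfcheck(void)
{
    assert(bpf_sdiv64(-7, 2) == -3);
    assert(bpf_smod64(-7, 2) == -1);
    assert(bpf_movsx64(0x80, 8) == 0xffffffffffffff80ULL);
    assert(bpf_movsx32(0x80, 8) == 0xffffff80ULL);
    assert(bpf_bswap(0x1234, 16) == 0x3412);
}
```

The division-by-zero branches follow the table rows directly: signed division by zero gives 0, while signed modulo by zero leaves dst unchanged, the same convention as the unsigned BPF_DIV and BPF_MOD.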