Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
.. contents::
.. sectnum::

========================================
eBPF Instruction Set Specification, v1.0
========================================

This document specifies version 1.0 of the eBPF instruction set.

Documentation conventions
=========================

For brevity, this document uses the type notation "u64", "u32", etc.
to mean an unsigned integer whose width is the specified number of bits,
and "s32", etc. to mean a signed integer of the specified number of bits.

Registers and calling convention
================================

eBPF has 10 general purpose registers and a read-only frame pointer register,
all of which are 64 bits wide.

The eBPF calling convention is defined as:

* R0: return value from function calls, and exit value for eBPF programs
* R1 - R5: arguments for function calls
* R6 - R9: callee saved registers that function calls will preserve
* R10: read-only frame pointer to access stack

R0 - R5 are scratch registers and eBPF programs need to spill/fill them if
necessary across calls.

Instruction encoding
====================

eBPF has two instruction encodings:

* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
  constant) value after the basic instruction for a total of 128 bits.

The fields that make up an encoded basic instruction are stored in the
following order::

  opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
  opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.


**imm**
  signed integer immediate value

**offset**
  signed integer offset used with pointer arithmetic

**src_reg**
  the source register number (0-10), except where otherwise specified
  (`64-bit immediate instructions`_ reuse this field for other purposes)

**dst_reg**
  destination register number (0-10)

**opcode**
  operation to perform

Note that the contents of multi-byte fields ('imm' and 'offset') are
stored using big-endian byte ordering in big-endian BPF and
little-endian byte ordering in little-endian BPF.

For example::

  opcode                  offset imm          assembly
         src_reg dst_reg
  07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
         dst_reg src_reg
  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.

As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
instruction uses a 64-bit immediate value that is constructed as follows.
The 64 bits following the basic instruction contain a pseudo instruction
using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
and imm containing the high 32 bits of the immediate value.

This is depicted in the following figure::

        basic_instruction
  .-----------------------------.
  |                             |
  code:8 regs:8 offset:16 imm:32 unused:32 imm:32
                                 |              |
                                 '--------------'
                                pseudo instruction

Thus the 64-bit immediate value is constructed as follows:

  imm64 = (next_imm << 32) | imm

where 'next_imm' refers to the imm value of the pseudo instruction
following the basic instruction.  The unused bytes in the pseudo
instruction are reserved and shall be cleared to zero.

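As a minimal sketch of the encoding described above, the following C fragment
decodes one basic instruction stored in little-endian BPF byte order and
builds a 64-bit immediate from two 'imm' fields. The names ``struct insn``,
``decode_le`` and ``make_imm64`` are hypothetical and not part of this
specification. The sketch assumes that the register fields are listed
most-significant nibble first, which matches the byte-level example given
earlier in this section.

```c
#include <stdint.h>

/* Hypothetical decoded view of a basic instruction (illustrative only). */
struct insn {
    uint8_t opcode;
    uint8_t dst_reg;
    uint8_t src_reg;
    int16_t offset;
    int32_t imm;
};

/* Decode 8 bytes stored in little-endian BPF byte order.
 * Assumption: in the register byte, src_reg is the high nibble and
 * dst_reg is the low nibble (src_reg:4 dst_reg:4, MSB first). */
struct insn decode_le(const uint8_t b[8])
{
    struct insn i;

    i.opcode  = b[0];
    i.src_reg = b[1] >> 4;
    i.dst_reg = b[1] & 0x0f;
    /* Multi-byte fields are little-endian: least significant byte first. */
    i.offset  = (int16_t)(uint16_t)(b[2] | (b[3] << 8));
    i.imm     = (int32_t)((uint32_t)b[4] | ((uint32_t)b[5] << 8) |
                          ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24));
    return i;
}

/* imm64 = (next_imm << 32) | imm, with both halves treated as unsigned
 * 32-bit values so that a negative 'imm' does not sign-extend into the
 * upper half. */
uint64_t make_imm64(int32_t imm, int32_t next_imm)
{
    return ((uint64_t)(uint32_t)next_imm << 32) | (uint32_t)imm;
}
```

Passing the bytes ``07 01 00 00 44 33 22 11`` to ``decode_le`` yields
opcode 0x07, dst_reg 1, src_reg 0, offset 0 and imm 0x11223344, which
corresponds to ``r1 += 0x11223344``.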

Instruction classes
-------------------

The three least significant bits of the 'opcode' field store the instruction class:

=========  =====  ===============================  ===================================
class      value  description                      reference
=========  =====  ===============================  ===================================
BPF_LD     0x00   non-standard load operations     `Load and store instructions`_
BPF_LDX    0x01   load into register operations    `Load and store instructions`_
BPF_ST     0x02   store from immediate operations  `Load and store instructions`_
BPF_STX    0x03   store from register operations   `Load and store instructions`_
BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump instructions`_
BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump instructions`_
BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump instructions`_
BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump instructions`_
=========  =====  ===============================  ===================================

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:

==============  ======  =================
4 bits (MSB)    1 bit   3 bits (LSB)
==============  ======  =================
code            source  instruction class
==============  ======  =================

**code**
  the operation code, whose meaning varies by instruction class

**source**
  the source operand location, which unless otherwise specified is one of:

  ======  =====  ==============================================
  source  value  description
  ======  =====  ==============================================
  BPF_K   0x00   use 32-bit 'imm' value as source operand
  BPF_X   0x08   use 'src_reg' register value as source operand
  ======  =====  ==============================================

**instruction class**
  the instruction class (see `Instruction classes`_)

Arithmetic instructions
-----------------------

``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
otherwise identical operations.
The 'code' field encodes the operation as below, where 'src' and 'dst' refer
to the values of the source and destination registers, respectively.

========  =====  ==========================================================
code      value  description
========  =====  ==========================================================
BPF_ADD   0x00   dst += src
BPF_SUB   0x10   dst -= src
BPF_MUL   0x20   dst \*= src
BPF_DIV   0x30   dst = (src != 0) ? (dst / src) : 0
BPF_OR    0x40   dst \|= src
BPF_AND   0x50   dst &= src
BPF_LSH   0x60   dst <<= src
BPF_RSH   0x70   dst >>= src
BPF_NEG   0x80   dst = -dst
BPF_MOD   0x90   dst = (src != 0) ? (dst % src) : dst
BPF_XOR   0xa0   dst ^= src
BPF_MOV   0xb0   dst = src
BPF_ARSH  0xc0   sign extending shift right
BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
========  =====  ==========================================================

Underflow and overflow are allowed during arithmetic operations, meaning
the 64-bit or 32-bit value will wrap. If eBPF program execution would
result in division by zero, the destination register is instead set to zero.
If execution would result in modulo by zero, for ``BPF_ALU64`` the value of
the destination register is unchanged whereas for ``BPF_ALU`` the upper
32 bits of the destination register are zeroed.

``BPF_ADD | BPF_X | BPF_ALU`` means::

  dst = (u32) ((u32) dst + (u32) src)

where '(u32)' indicates that the upper 32 bits are zeroed.
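
The wrapping, zeroing and divide-by-zero rules above can be sketched in C as
follows. This is an illustrative model only: the function names are
hypothetical, registers are modeled as ``uint64_t`` values, and the functions
return the new value of the destination register.

```c
#include <stdint.h>

/* BPF_ADD | BPF_X | BPF_ALU: 32-bit add that wraps, upper 32 bits zeroed. */
uint64_t alu32_add(uint64_t dst, uint64_t src)
{
    return (uint32_t)((uint32_t)dst + (uint32_t)src);
}

/* BPF_ADD | BPF_X | BPF_ALU64: full 64-bit add that wraps. */
uint64_t alu64_add(uint64_t dst, uint64_t src)
{
    return dst + src;
}

/* BPF_DIV | BPF_X | BPF_ALU64: unsigned division, zero on divide by zero. */
uint64_t alu64_div(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst / src : 0;
}

/* BPF_MOD | BPF_X | BPF_ALU64: dst is unchanged on modulo by zero. */
uint64_t alu64_mod(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst % src : dst;
}

/* BPF_MOD | BPF_X | BPF_ALU: on modulo by zero the low 32 bits of dst are
 * kept and the upper 32 bits are zeroed. */
uint64_t alu32_mod(uint64_t dst, uint64_t src)
{
    uint32_t d = (uint32_t)dst;
    uint32_t s = (uint32_t)src;

    return s != 0 ? d % s : d;
}
```

For instance, ``alu32_add(0xffffffff, 1)`` wraps to 0, while
``alu64_add(0xffffffff, 1)`` gives 0x100000000.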

``BPF_ADD | BPF_X | BPF_ALU64`` means::

  dst = dst + src

``BPF_XOR | BPF_K | BPF_ALU`` means::

  dst = (u32) dst ^ (u32) imm32

``BPF_XOR | BPF_K | BPF_ALU64`` means::

  dst = dst ^ imm32

Also note that the division and modulo operations are unsigned. Thus, for
``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result
interpreted as an unsigned 64-bit value. There are no instructions for
signed division or modulo.

Byte swap instructions
~~~~~~~~~~~~~~~~~~~~~~

The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
'code' field of ``BPF_END``.

The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.

The 1-bit source operand field in the opcode is used to select what byte
order the operation converts from or to:

=========  =====  =================================================
source     value  description
=========  =====  =================================================
BPF_TO_LE  0x00   convert between host byte order and little endian
BPF_TO_BE  0x08   convert between host byte order and big endian
=========  =====  =================================================

The 'imm' field encodes the width of the swap operations.  The following widths
are supported: 16, 32 and 64.

Examples:

``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::

  dst = htole16(dst)

``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::

  dst = htobe64(dst)

Jump instructions
-----------------

``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
otherwise identical operations.
The 'code' field encodes the operation as below:

========  =====  ===  ===========================================  =========================================
code      value  src  description                                  notes
========  =====  ===  ===========================================  =========================================
BPF_JA    0x0    0x0  PC += offset                                 BPF_JMP only
BPF_JEQ   0x1    any  PC += offset if dst == src
BPF_JGT   0x2    any  PC += offset if dst > src                    unsigned
BPF_JGE   0x3    any  PC += offset if dst >= src                   unsigned
BPF_JSET  0x4    any  PC += offset if dst & src
BPF_JNE   0x5    any  PC += offset if dst != src
BPF_JSGT  0x6    any  PC += offset if dst > src                    signed
BPF_JSGE  0x7    any  PC += offset if dst >= src                   signed
BPF_CALL  0x8    0x0  call helper function by address              see `Helper functions`_
BPF_CALL  0x8    0x1  call PC += offset                            see `Program-local functions`_
BPF_CALL  0x8    0x2  call helper function by BTF ID               see `Helper functions`_
BPF_EXIT  0x9    0x0  return                                       BPF_JMP only
BPF_JLT   0xa    any  PC += offset if dst < src                    unsigned
BPF_JLE   0xb    any  PC += offset if dst <= src                   unsigned
BPF_JSLT  0xc    any  PC += offset if dst < src                    signed
BPF_JSLE  0xd    any  PC += offset if dst <= src                   signed
========  =====  ===  ===========================================  =========================================

The 'value' column lists the 4-bit code itself. Within the 'opcode' byte the
code occupies the upper four bits, so ``BPF_JSGE`` contributes 0x70 to the
opcode.

The eBPF program needs to store the return value into register R0 before doing a
``BPF_EXIT``.

Example:

``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means::

  if (s32)dst s>= (s32)src goto +offset

where 's>=' indicates a signed '>=' comparison.

Helper functions
~~~~~~~~~~~~~~~~

Helper functions are a concept whereby BPF programs can call into a
set of function calls exposed by the underlying platform.

Historically, each helper function was identified by an address
encoded in the imm field.  The available helper functions may differ
for each program type, but address values are unique across all program types.

Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the imm field, where the BTF ID
identifies the helper name and type.

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~

Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to
``BPF_JA``.  A ``BPF_EXIT`` within the program-local function will return to
the caller.

Load and store instructions
===========================

For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
8-bit 'opcode' field is divided as:

============  ======  =================
3 bits (MSB)  2 bits  3 bits (LSB)
============  ======  =================
mode          size    instruction class
============  ======  =================

The mode modifier is one of:

  =============  =====  ====================================  ========================================
  mode modifier  value  description                           reference
  =============  =====  ====================================  ========================================
  BPF_IMM        0x00   64-bit immediate instructions         `64-bit immediate instructions`_
  BPF_ABS        0x20   legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
  BPF_IND        0x40   legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
  BPF_MEM        0x60   regular load and store operations     `Regular load and store operations`_
  BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
  =============  =====  ====================================  ========================================

The size modifier is one of:

  =============  =====  =====================
  size modifier  value  description
  =============  =====  =====================
  BPF_W          0x00   word        (4 bytes)
  BPF_H          0x08   half word   (2 bytes)
  BPF_B          0x10   byte
  BPF_DW         0x18   double word (8 bytes)
  =============  =====  =====================

Regular load and store operations
---------------------------------

The ``BPF_MEM`` mode modifier is used to encode regular load and store
instructions that transfer data between a register and memory.

``BPF_MEM | <size> | BPF_STX`` means::

  *(size *) (dst + offset) = src

``BPF_MEM | <size> | BPF_ST`` means::

  *(size *) (dst + offset) = imm32

``BPF_MEM | <size> | BPF_LDX`` means::

  dst = *(size *) (src + offset)

Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.

Atomic operations
-----------------

Atomic operations are operations that operate on memory and cannot be
interrupted or corrupted by other access to the same memory region
by other eBPF programs or means outside of this specification.

All atomic operations supported by eBPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:

* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
* 8-bit and 16-bit wide atomic operations are not supported.

The 'imm' field is used to encode the actual atomic operation.
Simple atomic operations use a subset of the values defined to encode
arithmetic operations in the 'imm' field to encode the atomic operation:

========  =====  ===========
imm       value  description
========  =====  ===========
BPF_ADD   0x00   atomic add
BPF_OR    0x40   atomic or
BPF_AND   0x50   atomic and
BPF_XOR   0xa0   atomic xor
========  =====  ===========


``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u32 *)(dst + offset) += src

``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u64 *)(dst + offset) += src

In addition to the simple atomic operations, there is also a modifier and
two complex atomic operations:

===========  ================  ===========================
imm          value             description
===========  ================  ===========================
BPF_FETCH    0x01              modifier: return old value
BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
===========  ================  ===========================

The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
is set, then the operation also overwrites ``src`` with the value that
was in memory before it was modified.

The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
addressed by ``dst + offset``.

The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
``dst + offset`` with ``R0``.  If they match, the value addressed by
``dst + offset`` is replaced with ``src``.  In either case, the
value that was at ``dst + offset`` before the operation is zero-extended
and loaded back to ``R0``.

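The semantics of ``BPF_FETCH``, ``BPF_XCHG`` and ``BPF_CMPXCHG`` described
above can be modeled with C11 atomics, as in the sketch below for the 64-bit
(``BPF_DW``) case. The function names are hypothetical, and the mapping onto
``<stdatomic.h>`` is an illustration of the described behavior rather than
how any particular runtime implements it. Each function returns the value
that the instruction would write back into a register.

```c
#include <stdatomic.h>
#include <stdint.h>

/* imm = BPF_ADD | BPF_FETCH: add src to memory; the returned old value is
 * what the instruction would write back into the src register. */
uint64_t bpf_fetch_add64(_Atomic uint64_t *addr, uint64_t src)
{
    return atomic_fetch_add(addr, src);
}

/* imm = BPF_XCHG: store src in memory; the returned old value is what the
 * instruction would write back into the src register. */
uint64_t bpf_xchg64(_Atomic uint64_t *addr, uint64_t src)
{
    return atomic_exchange(addr, src);
}

/* imm = BPF_CMPXCHG: if *addr equals r0, replace it with src. In either
 * case the returned value is the old memory content, i.e. the new R0. */
uint64_t bpf_cmpxchg64(_Atomic uint64_t *addr, uint64_t r0, uint64_t src)
{
    uint64_t expected = r0;

    /* On failure, 'expected' is overwritten with the current value of
     * *addr; on success it still equals the old value. */
    atomic_compare_exchange_strong(addr, &expected, src);
    return expected;
}
```

A failed compare leaves memory untouched but still reports the current
value, which is why the result is returned unconditionally.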

64-bit immediate instructions
-----------------------------

Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
encoding defined in `Instruction encoding`_, and use the 'src' field of the
basic instruction to hold an opcode subtype.

The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions
with opcode subtypes in the 'src' field, using new terms such as "map"
defined further below:

=========================  ======  ===  =========================================  ===========  ==============
opcode construction        opcode  src  pseudocode                                 imm type     dst type
=========================  ======  ===  =========================================  ===========  ==============
BPF_IMM | BPF_DW | BPF_LD  0x18    0x0  dst = imm64                                integer      integer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x1  dst = map_by_fd(imm)                       map fd       map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x2  dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x4  dst = code_addr(imm)                       integer      code pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x5  dst = map_by_idx(imm)                      map index    map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x6  dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
=========================  ======  ===  =========================================  ===========  ==============

where

* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
* map_by_idx(imm) means to convert a 32-bit index into an address of a map
* map_val(map) gets the address of the first value in a given map
* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
* the 'imm type' can be used by disassemblers for display
* the 'dst type' can be used for verification and JIT compilation purposes

Maps
~~~~

Maps are shared memory regions accessible by eBPF programs on some platforms.
A map can have various semantics as defined in a separate document, and may or
may not have a single contiguous memory region, but the 'map_val(map)' is
currently only defined for maps that do have a single contiguous memory region.

Each map can have a file descriptor (fd) if supported by the platform, where
'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
BPF program can also be defined to use a set of maps associated with the
program at load time, and 'map_by_idx(imm)' means to get the map with the given
index in the set associated with the BPF program containing the instruction.

Platform Variables
~~~~~~~~~~~~~~~~~~

Platform variables are memory regions, identified by integer ids, exposed by
the runtime and accessible by BPF programs on some platforms.  The
'var_addr(imm)' operation means to get the address of the memory region
identified by the given id.

Legacy BPF Packet access instructions
-------------------------------------

eBPF previously introduced special instructions for access to packet data that were
carried over from classic BPF. However, these instructions are
deprecated and should no longer be used.