Documentation/bpf/instruction-set.rst at v5.17

====================
eBPF Instruction Set
====================

Registers and calling convention
================================

eBPF has 10 general purpose registers and a read-only frame pointer register,
all of which are 64 bits wide.

The eBPF calling convention is defined as:

 * R0: return value from function calls, and exit value for eBPF programs
 * R1 - R5: arguments for function calls
 * R6 - R9: callee saved registers that function calls will preserve
 * R10: read-only frame pointer to access stack

R0 - R5 are scratch registers and eBPF programs need to spill/fill them if
necessary across calls.

Instruction encoding
====================

eBPF uses 64-bit instructions with the following encoding:

 =============  =======  ===============  ====================  ============
 32 bits (MSB)  16 bits  4 bits           4 bits                8 bits (LSB)
 =============  =======  ===============  ====================  ============
 immediate      offset   source register  destination register  opcode
 =============  =======  ===============  ====================  ============

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
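
As an illustration of the layout in the table above, the following sketch
packs the five fields into a single 64-bit word and extracts them again. The
helper names (``bpf_pack``, ``bpf_opcode`` and so on) are hypothetical and
exist only for this example; the kernel itself describes an instruction with
``struct bpf_insn`` rather than a packed integer.

```c
#include <stdint.h>

/* Hypothetical helper: pack the fields with the opcode in the least
 * significant byte and the immediate in the most significant 32 bits,
 * following the encoding table. Unused fields are passed as zero. */
static uint64_t bpf_pack(uint8_t opcode, uint8_t dst_reg, uint8_t src_reg,
                         int16_t off, int32_t imm)
{
    return (uint64_t)opcode
         | ((uint64_t)(dst_reg & 0x0f) << 8)
         | ((uint64_t)(src_reg & 0x0f) << 12)
         | ((uint64_t)(uint16_t)off << 16)
         | ((uint64_t)(uint32_t)imm << 32);
}

/* Field accessors, the inverse of bpf_pack(). */
static uint8_t bpf_opcode(uint64_t insn) { return (uint8_t)(insn & 0xff); }
static uint8_t bpf_dst(uint64_t insn)    { return (uint8_t)((insn >> 8) & 0x0f); }
static uint8_t bpf_src(uint64_t insn)    { return (uint8_t)((insn >> 12) & 0x0f); }
static int16_t bpf_off(uint64_t insn)    { return (int16_t)(uint16_t)(insn >> 16); }
static int32_t bpf_imm(uint64_t insn)    { return (int32_t)(uint32_t)(insn >> 32); }
```

For instance, ``bpf_pack(0xb7, 0, 0, 0, 1)`` yields ``0x00000001000000b7``:
opcode ``0xb7`` in the low byte, destination register R0, and an immediate of 1.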

Instruction classes
-------------------

The three LSB bits of the 'opcode' field store the instruction class:

  =========  =====  ===============================
  class      value  description
  =========  =====  ===============================
  BPF_LD     0x00   non-standard load operations
  BPF_LDX    0x01   load into register operations
  BPF_ST     0x02   store from immediate operations
  BPF_STX    0x03   store from register operations
  BPF_ALU    0x04   32-bit arithmetic operations
  BPF_JMP    0x05   64-bit jump operations
  BPF_JMP32  0x06   32-bit jump operations
  BPF_ALU64  0x07   64-bit arithmetic operations
  =========  =====  ===============================

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and
BPF_JMP32), the 8-bit 'opcode' field is divided into three parts:

  ==============  ======  =================
  4 bits (MSB)    1 bit   3 bits (LSB)
  ==============  ======  =================
  operation code  source  instruction class
  ==============  ======  =================

The 4th bit encodes the source operand:

  ======  =====  ========================================
  source  value  description
  ======  =====  ========================================
  BPF_K   0x00   use 32-bit immediate as source operand
  BPF_X   0x08   use 'src_reg' register as source operand
  ======  =====  ========================================

The four MSB bits store the operation code.
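
The three-way split of the opcode byte can be sketched with a few masks. The
macro names mirror ones found in the kernel's UAPI headers, but they are
defined locally here so the example stands alone; ``is_alu_or_jmp`` is a
hypothetical helper.

```c
#include <stdint.h>

/* Local definitions for illustration. */
#define BPF_CLASS(opcode) ((opcode) & 0x07) /* 3 LSB bits: instruction class */
#define BPF_SRC(opcode)   ((opcode) & 0x08) /* 4th bit: BPF_K or BPF_X */
#define BPF_OP(opcode)    ((opcode) & 0xf0) /* 4 MSB bits: operation code */

enum { BPF_ALU = 0x04, BPF_JMP = 0x05, BPF_JMP32 = 0x06, BPF_ALU64 = 0x07 };
enum { BPF_K = 0x00, BPF_X = 0x08 };

/* Hypothetical helper: classes 0x04..0x07 are the arithmetic and jump
 * classes; 0x00..0x03 are the load and store classes. */
static int is_alu_or_jmp(uint8_t opcode)
{
    return BPF_CLASS(opcode) >= BPF_ALU;
}
```

With these definitions, opcode ``0x0f`` splits into operation ``0x00``, source
``BPF_X`` and class ``BPF_ALU64``.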


Arithmetic instructions
-----------------------

BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:

  ========  =====  ==========================
  code      value  description
  ========  =====  ==========================
  BPF_ADD   0x00   dst += src
  BPF_SUB   0x10   dst -= src
  BPF_MUL   0x20   dst \*= src
  BPF_DIV   0x30   dst /= src
  BPF_OR    0x40   dst \|= src
  BPF_AND   0x50   dst &= src
  BPF_LSH   0x60   dst <<= src
  BPF_RSH   0x70   dst >>= src
  BPF_NEG   0x80   dst = -dst
  BPF_MOD   0x90   dst %= src
  BPF_XOR   0xa0   dst ^= src
  BPF_MOV   0xb0   dst = src
  BPF_ARSH  0xc0   sign extending shift right
  BPF_END   0xd0   endianness conversion
  ========  =====  ==========================

BPF_ADD | BPF_X | BPF_ALU means::

  dst_reg = (u32) dst_reg + (u32) src_reg

BPF_ADD | BPF_X | BPF_ALU64 means::

  dst_reg = dst_reg + src_reg

BPF_XOR | BPF_K | BPF_ALU means::

  dst_reg = (u32) dst_reg ^ (u32) imm32

BPF_XOR | BPF_K | BPF_ALU64 means::

  dst_reg = dst_reg ^ imm32
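
The difference between the two operand widths can be made concrete with a
small C sketch. The function names are hypothetical. Two details are
assumptions of this sketch rather than statements from this section: the
32-bit result is zero-extended into the 64-bit register, and ``imm32`` is
sign-extended before a 64-bit operation.

```c
#include <stdint.h>

/* BPF_ADD | BPF_X | BPF_ALU: both operands truncated to 32 bits.
 * Assumption: the upper 32 bits of the result are zero. */
static uint64_t alu32_add(uint64_t dst, uint64_t src)
{
    return (uint32_t)((uint32_t)dst + (uint32_t)src);
}

/* BPF_ADD | BPF_X | BPF_ALU64: full 64-bit addition. */
static uint64_t alu64_add(uint64_t dst, uint64_t src)
{
    return dst + src;
}

/* BPF_XOR | BPF_K | BPF_ALU: 32-bit XOR with the immediate. */
static uint64_t alu32_xor_imm(uint64_t dst, int32_t imm32)
{
    return (uint32_t)dst ^ (uint32_t)imm32;
}

/* BPF_XOR | BPF_K | BPF_ALU64.
 * Assumption: imm32 is sign-extended to 64 bits first. */
static uint64_t alu64_xor_imm(uint64_t dst, int32_t imm32)
{
    return dst ^ (uint64_t)(int64_t)imm32;
}
```

Under these assumptions, adding 1 to ``0xffffffff`` wraps to 0 with
``alu32_add`` but gives ``0x100000000`` with ``alu64_add``.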


Jump instructions
-----------------

BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for
otherwise identical operations.
The code field encodes the operation as below:

  ========  =====  =========================  ============
  code      value  description                notes
  ========  =====  =========================  ============
  BPF_JA    0x00   PC += off                  BPF_JMP only
  BPF_JEQ   0x10   PC += off if dst == src
  BPF_JGT   0x20   PC += off if dst > src     unsigned
  BPF_JGE   0x30   PC += off if dst >= src    unsigned
  BPF_JSET  0x40   PC += off if dst & src
  BPF_JNE   0x50   PC += off if dst != src
  BPF_JSGT  0x60   PC += off if dst > src     signed
  BPF_JSGE  0x70   PC += off if dst >= src    signed
  BPF_CALL  0x80   function call
  BPF_EXIT  0x90   function / program return  BPF_JMP only
  BPF_JLT   0xa0   PC += off if dst < src     unsigned
  BPF_JLE   0xb0   PC += off if dst <= src    unsigned
  BPF_JSLT  0xc0   PC += off if dst < src     signed
  BPF_JSLE  0xd0   PC += off if dst <= src    signed
  ========  =====  =========================  ============

The eBPF program needs to store the return value into register R0 before doing a
BPF_EXIT.
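
The signed and unsigned variants in the table differ only in how the operands
are interpreted. The sketch below evaluates a branch condition for either
operand width. ``jmp_taken`` is a hypothetical helper, and the enum values are
copied from the table; ``BPF_CALL`` and ``BPF_EXIT`` are not conditional
branches, so the helper reports them as not taken.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    BPF_JA = 0x00, BPF_JEQ = 0x10, BPF_JGT = 0x20, BPF_JGE = 0x30,
    BPF_JSET = 0x40, BPF_JNE = 0x50, BPF_JSGT = 0x60, BPF_JSGE = 0x70,
    BPF_JLT = 0xa0, BPF_JLE = 0xb0, BPF_JSLT = 0xc0, BPF_JSLE = 0xd0
};

/* Returns true when the branch condition holds, i.e. PC += off applies. */
static bool jmp_taken(uint8_t op, bool is_jmp32, uint64_t dst, uint64_t src)
{
    int64_t sdst, ssrc;

    if (is_jmp32) {
        /* BPF_JMP32 compares only the low 32 bits of each operand. */
        dst = (uint32_t)dst;
        src = (uint32_t)src;
        sdst = (int32_t)(uint32_t)dst;
        ssrc = (int32_t)(uint32_t)src;
    } else {
        sdst = (int64_t)dst;
        ssrc = (int64_t)src;
    }

    switch (op) {
    case BPF_JA:   return true;
    case BPF_JEQ:  return dst == src;
    case BPF_JNE:  return dst != src;
    case BPF_JSET: return (dst & src) != 0;
    case BPF_JGT:  return dst > src;
    case BPF_JGE:  return dst >= src;
    case BPF_JLT:  return dst < src;
    case BPF_JLE:  return dst <= src;
    case BPF_JSGT: return sdst > ssrc;
    case BPF_JSGE: return sdst >= ssrc;
    case BPF_JSLT: return sdst < ssrc;
    case BPF_JSLE: return sdst <= ssrc;
    default:       return false; /* BPF_CALL, BPF_EXIT, unknown codes */
    }
}
```

With ``dst`` holding all ones and ``src`` holding 1, ``BPF_JGT`` is taken
(unsigned maximum versus 1) while ``BPF_JSGT`` is not (-1 versus 1).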


Load and store instructions
===========================

For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the
8-bit 'opcode' field is divided as:

  ============  ======  =================
  3 bits (MSB)  2 bits  3 bits (LSB)
  ============  ======  =================
  mode          size    instruction class
  ============  ======  =================

The size modifier is one of:

  =============  =====  =====================
  size modifier  value  description
  =============  =====  =====================
  BPF_W          0x00   word        (4 bytes)
  BPF_H          0x08   half word   (2 bytes)
  BPF_B          0x10   byte
  BPF_DW         0x18   double word (8 bytes)
  =============  =====  =====================

The mode modifier is one of:

  =============  =====  ====================================
  mode modifier  value  description
  =============  =====  ====================================
  BPF_IMM        0x00   used for 64-bit mov
  BPF_ABS        0x20   legacy BPF packet access
  BPF_IND        0x40   legacy BPF packet access
  BPF_MEM        0x60   all normal load and store operations
  BPF_ATOMIC     0xc0   atomic operations
  =============  =====  ====================================

BPF_MEM | <size> | BPF_STX means::

  *(size *) (dst_reg + off) = src_reg

BPF_MEM | <size> | BPF_ST means::

  *(size *) (dst_reg + off) = imm32

BPF_MEM | <size> | BPF_LDX means::

  dst_reg = *(size *) (src_reg + off)

Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW.
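
A minimal sketch of how the size bits select the access width for
``BPF_MEM`` loads and stores. The helpers ``size_bytes``, ``stx_mem`` and
``ldx_mem`` are hypothetical, the base pointer stands in for the register that
holds the address, and zero-extension of narrow loads is an assumption of this
sketch.

```c
#include <stdint.h>
#include <string.h>

enum { BPF_W = 0x00, BPF_H = 0x08, BPF_B = 0x10, BPF_DW = 0x18 };

#define BPF_SIZE(opcode) ((opcode) & 0x18) /* 2 size bits */
#define BPF_MODE(opcode) ((opcode) & 0xe0) /* 3 MSB bits: mode */

/* Access width in bytes for a load/store opcode. */
static unsigned size_bytes(uint8_t opcode)
{
    switch (BPF_SIZE(opcode)) {
    case BPF_B:  return 1;
    case BPF_H:  return 2;
    case BPF_W:  return 4;
    default:     return 8; /* BPF_DW */
    }
}

/* BPF_MEM | <size> | BPF_STX: *(size *)(dst_reg + off) = src_reg */
static void stx_mem(uint8_t *dst_reg, int16_t off, uint64_t src_reg,
                    uint8_t opcode)
{
    uint8_t  b = (uint8_t)src_reg;
    uint16_t h = (uint16_t)src_reg;
    uint32_t w = (uint32_t)src_reg;

    switch (size_bytes(opcode)) {
    case 1:  memcpy(dst_reg + off, &b, 1); break;
    case 2:  memcpy(dst_reg + off, &h, 2); break;
    case 4:  memcpy(dst_reg + off, &w, 4); break;
    default: memcpy(dst_reg + off, &src_reg, 8); break;
    }
}

/* BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *)(src_reg + off).
 * Assumption: narrow loads are zero-extended to 64 bits. */
static uint64_t ldx_mem(const uint8_t *src_reg, int16_t off, uint8_t opcode)
{
    uint8_t b; uint16_t h; uint32_t w; uint64_t dw;

    switch (size_bytes(opcode)) {
    case 1:  memcpy(&b, src_reg + off, 1); return b;
    case 2:  memcpy(&h, src_reg + off, 2); return h;
    case 4:  memcpy(&w, src_reg + off, 4); return w;
    default: memcpy(&dw, src_reg + off, 8); return dw;
    }
}
```

A typical use is spilling a register to the stack: with a frame pointer at the
end of a buffer, ``stx_mem(fp, -8, value, 0x7b)`` stores a double word at
``fp - 8``, where ``0x7b`` is ``BPF_MEM | BPF_DW | BPF_STX``.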

Atomic operations
-----------------

eBPF includes atomic operations, which use the immediate field for extra
encoding::

   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W  | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg

The basic atomic operations supported are::

    BPF_ADD
    BPF_AND
    BPF_OR
    BPF_XOR

Each has semantics equivalent to the ``BPF_ADD`` example, that is: the
memory location addressed by ``dst_reg + off`` is atomically modified, with
``src_reg`` as the other operand. If the ``BPF_FETCH`` flag is set in the
immediate, then these operations also overwrite ``src_reg`` with the
value that was in memory before it was modified.

The more special operations are::

    BPF_XCHG

This atomically exchanges ``src_reg`` with the value addressed by ``dst_reg +
off``. ::

    BPF_CMPXCHG

This atomically compares the value addressed by ``dst_reg + off`` with
``R0``. If they match, it is replaced with ``src_reg``. In either case, the
value that was there before is zero-extended and loaded back to ``R0``.

Note that 1 and 2 byte atomic operations are not supported.

Clang can generate atomic instructions by default when ``-mcpu=v3`` is
enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
the atomics features, while keeping a lower ``-mcpu`` version, you can use
``-Xclang -target-feature -Xclang +alu32``.

You may encounter ``BPF_XADD`` - this is a legacy name for ``BPF_ATOMIC``,
referring to the exclusive-add operation encoded when the immediate field is
zero.
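
The semantics described in this section map closely onto C11 atomics. The
sketch below models the 64-bit ``BPF_ADD | BPF_FETCH``, ``BPF_XCHG`` and
``BPF_CMPXCHG`` operations with ``<stdatomic.h>``. The function names are
hypothetical, each return value stands for the register that the instruction
would overwrite, and nothing here reflects how the kernel or a JIT implements
these instructions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* BPF_ADD | BPF_FETCH: atomically add src_reg to memory.
 * Returns the old memory value, which would overwrite src_reg. */
static uint64_t atomic_fetch_add64(_Atomic uint64_t *addr, uint64_t src_reg)
{
    return atomic_fetch_add(addr, src_reg);
}

/* BPF_XCHG: atomically swap src_reg with memory.
 * Returns the old memory value, which would overwrite src_reg. */
static uint64_t atomic_xchg64(_Atomic uint64_t *addr, uint64_t src_reg)
{
    return atomic_exchange(addr, src_reg);
}

/* BPF_CMPXCHG: if memory equals r0, replace it with src_reg.
 * Returns the old memory value, which is loaded back into R0
 * whether or not the comparison matched. */
static uint64_t atomic_cmpxchg64(_Atomic uint64_t *addr, uint64_t r0,
                                 uint64_t src_reg)
{
    /* On failure the C11 call writes the current value into r0;
     * on success r0 already equals the old value. */
    atomic_compare_exchange_strong(addr, &r0, src_reg);
    return r0;
}
```

A caller can tell whether ``atomic_cmpxchg64`` succeeded by comparing the
returned value with the ``r0`` it passed in, which is the same check an eBPF
program performs on ``R0`` after ``BPF_CMPXCHG``.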

16-byte instructions
--------------------

eBPF has one 16-byte instruction: ``BPF_LD | BPF_DW | BPF_IMM``, which consists
of two consecutive ``struct bpf_insn`` 8-byte blocks and is interpreted as a
single instruction that loads a 64-bit immediate value into dst_reg.
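
Since each 8-byte block carries only a 32-bit immediate, the 64-bit value has
to be split across the two blocks. The sketch below assumes that the first
block holds the low 32 bits and the second block holds the high 32 bits; this
section does not state the order, so treat it as an assumption. The type and
function names are hypothetical.

```c
#include <stdint.h>

/* The two 32-bit immediate fields of the consecutive 8-byte blocks. */
struct imm64_pair {
    int32_t first_imm;  /* assumed: low 32 bits of the value */
    int32_t second_imm; /* assumed: high 32 bits of the value */
};

/* Split a 64-bit immediate into the two per-block immediates. */
static struct imm64_pair split_imm64(uint64_t value)
{
    struct imm64_pair p;

    p.first_imm = (int32_t)(uint32_t)value;
    p.second_imm = (int32_t)(uint32_t)(value >> 32);
    return p;
}

/* Reassemble the value that would be loaded into dst_reg. */
static uint64_t join_imm64(struct imm64_pair p)
{
    return (uint64_t)(uint32_t)p.first_imm
         | ((uint64_t)(uint32_t)p.second_imm << 32);
}
```

The casts through ``uint32_t`` in ``join_imm64`` matter: without them a
negative ``first_imm`` would sign-extend and corrupt the upper half.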

Packet access instructions
--------------------------

eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
(BPF_IND | <size> | BPF_LD) which are used to access packet data.

They were carried over from classic BPF to preserve the performance of
socket filters running in the eBPF interpreter. These instructions can only
be used when the interpreter context is a pointer to ``struct sk_buff``, and
they have seven implicit operands. Register R6 is an implicit input that must
contain a pointer to the sk_buff. Register R0 is an implicit output which
contains the data fetched from the packet. Registers R1-R5 are scratch
registers and must not be used to store data across BPF_ABS | BPF_LD or
BPF_IND | BPF_LD instructions.

These instructions have an implicit program exit condition as well. When an
eBPF program tries to access data beyond the packet boundary, the
interpreter aborts the execution of the program. JIT compilers
therefore must preserve this property. The src_reg and imm32 fields are
explicit inputs to these instructions.

For example, BPF_IND | BPF_W | BPF_LD means::

  R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))

and R1 - R5 are clobbered.
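
The combination of a network-order load and an implicit bounds check can be
sketched as follows. ``struct pkt`` is a hypothetical stand-in for the parts
of ``struct sk_buff`` that matter here, and ``ld_ind_w`` is a hypothetical
helper that returns ``false`` where the real instruction would abort the
program. This sketch truncates ``src_reg`` to 32 bits and rejects negative
positions, which is a simplification and not a description of kernel
behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the packet data reachable through R6. */
struct pkt {
    const uint8_t *data;
    size_t len;
};

/* Models BPF_IND | BPF_W | BPF_LD. On success, stores the 32-bit
 * big-endian word at data + src_reg + imm32 into *r0 and returns true.
 * Returns false when the access falls outside the packet. */
static bool ld_ind_w(const struct pkt *skb, uint64_t src_reg, int32_t imm32,
                     uint64_t *r0)
{
    /* Simplification: use only the low 32 bits of src_reg. */
    int64_t pos = (int64_t)(int32_t)(uint32_t)src_reg + imm32;
    const uint8_t *p;

    if (pos < 0 || (uint64_t)pos + 4 > skb->len)
        return false; /* beyond the packet boundary */

    p = skb->data + pos;
    /* Equivalent to ntohl() on the loaded word, written byte by byte
     * so the result does not depend on host endianness. */
    *r0 = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
        | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return true;
}
```

For a 5-byte packet, positions 0 and 1 can be read as words, while position 2
fails the bounds check because the word would extend past the last byte.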