Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

Merge branch 'libbpf-fix-usdt-sib-argument-handling-causing-unrecognized-register-error'

Jiawei Zhao says:

====================
libbpf: fix USDT SIB argument handling causing unrecognized register error

When using GCC on x86-64 to compile an usdt prog with -O1 or higher
optimization, the compiler will generate SIB addressing mode for global
array, e.g. "1@-96(%rbp,%rax,8)".

The current USDT implementation in libbpf cannot parse these two formats,
causing `bpf_program__attach_usdt()` to fail with -ENOENT
(unrecognized register).

This patch series adds support for SIB addressing mode in USDT probes.
The main changes include:
- add correct handling logic for SIB-addressed arguments in
`parse_usdt_arg`.
- add an usdt_o2 test case to cover SIB addressing mode.

Testing shows that the SIB probe correctly generates 8@(%rcx,%rax,8)
argument spec and passes all validation checks.

The modification history of this patch series:
Change since v1:
- refactor the code to make it more readable
- modify the commit message to explain why and how

Change since v2:
- fix the `scale` uninitialized error

Change since v3:
- force -O2 optimization for usdt.test.o to generate SIB addressing usdt
and pass all test cases.

Change since v4:
- split the patch into two parts, one for the fix and the other for the
test

Change since v5:
- Only enable optimization for x86 architecture to generate SIB addressing
usdt argument spec.

Change since v6:
- Add an usdt_o2 test case to cover SIB addressing mode.
- Reinstate the usdt.c test case.

Change since v7:
- Refactor modifications to __bpf_usdt_arg_spec to avoid increasing its size,
achieving better compatibility
- Fix some minor code style issues
- Refactor the usdt_o2 test case, removing semaphore and adding GCC attribute
to force -O2 optimization

Change since v8:
- Refactor the usdt_o2 test case, using assembly to force SIB addressing mode.

Change since v9:
- Only enable the usdt_o2 test case on x86_64 and i386 architectures since the
SIB addressing mode is only supported on x86_64 and i386.

Change since v10:
- Replace `__attribute__((optimize("O2")))` with `#pragma GCC optimize("O1")`
to fix the issue where the optimized compilation condition works improperly.
- Renamed test case usdt_o2 and relevant files name to usdt_o1 in that O1
level optimization is enough to generate SIB addressing usdt argument spec.

Change since v11:
- Replace `STAP_PROBE1` with `STAP_PROBE_ASM`
- Use bit fields instead of bit shifting operations
- Merge the usdt_o1 test case into the usdt test case

Change since v12:
- This patch is same with the v12 but with a new version number.

Change since v13(resolve some review comments):
- https://lore.kernel.org/bpf/CAEf4BzZWd2zUC=U6uGJFF3EMZ7zWGLweQAG3CJWTeHy-5yFEPw@mail.gmail.com/
- https://lore.kernel.org/bpf/CAEf4Bzbs3hV_Q47+d93tTX13WkrpkpOb4=U04mZCjHyZg4aVdw@mail.gmail.com/

Change since v14:
- fix a typo in __bpf_usdt_arg_spec

Change since v15(resolve some review comments):
- https://lore.kernel.org/bpf/CAEf4BzaxuYijEfQMDFZ+CQdjxLuDZiesUXNA-SiopS+5+VxRaA@mail.gmail.com/
- https://lore.kernel.org/bpf/CAEf4BzaHi5kpuJ6OVvDU62LT5g0qHbWYMfb_FBQ3iuvvUF9fag@mail.gmail.com/
- https://lore.kernel.org/bpf/d438bf3a-a9c9-4d34-b814-63f2e9bb3a85@linux.dev/
====================

Link: https://patch.msgid.link/20250827053128.1301287-1-phoenix500526@163.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

+211 -9
+42 -2
tools/lib/bpf/usdt.bpf.h
··· 34 34 BPF_USDT_ARG_CONST, 35 35 BPF_USDT_ARG_REG, 36 36 BPF_USDT_ARG_REG_DEREF, 37 + BPF_USDT_ARG_SIB, 37 38 }; 38 39 40 + /* 41 + * This struct layout is designed specifically to be backwards/forward 42 + * compatible between libbpf versions for ARG_CONST, ARG_REG, and 43 + * ARG_REG_DEREF modes. ARG_SIB requires libbpf v1.7+. 44 + */ 39 45 struct __bpf_usdt_arg_spec { 40 46 /* u64 scalar interpreted depending on arg_type, see below */ 41 47 __u64 val_off; 48 + #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ 42 49 /* arg location case, see bpf_usdt_arg() for details */ 43 - enum __bpf_usdt_arg_type arg_type; 50 + enum __bpf_usdt_arg_type arg_type: 8; 51 + /* index register offset within struct pt_regs */ 52 + __u16 idx_reg_off: 12; 53 + /* scale factor for index register (1, 2, 4, or 8) */ 54 + __u16 scale_bitshift: 4; 55 + /* reserved for future use, keeps reg_off offset stable */ 56 + __u8 __reserved: 8; 57 + #else 58 + __u8 __reserved: 8; 59 + __u16 idx_reg_off: 12; 60 + __u16 scale_bitshift: 4; 61 + enum __bpf_usdt_arg_type arg_type: 8; 62 + #endif 44 63 /* offset of referenced register within struct pt_regs */ 45 64 short reg_off; 46 65 /* whether arg should be interpreted as signed value */ ··· 168 149 { 169 150 struct __bpf_usdt_spec *spec; 170 151 struct __bpf_usdt_arg_spec *arg_spec; 171 - unsigned long val; 152 + unsigned long val, idx; 172 153 int err, spec_id; 173 154 174 155 *res = 0; ··· 217 198 if (err) 218 199 return err; 219 200 err = bpf_probe_read_user(&val, sizeof(val), (void *)val + arg_spec->val_off); 201 + if (err) 202 + return err; 203 + #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ 204 + val >>= arg_spec->arg_bitshift; 205 + #endif 206 + break; 207 + case BPF_USDT_ARG_SIB: 208 + /* Arg is in memory addressed by SIB (Scale-Index-Base) mode 209 + * (e.g., "-1@-96(%rbp,%rax,8)" in USDT arg spec). We first 210 + * fetch the base register contents and the index register 211 + * contents from pt_regs. Then we calculate the final address 212 + * as base + (index * scale) + offset, and do a user-space 213 + * probe read to fetch the argument value. 214 + */ 215 + err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off); 216 + if (err) 217 + return err; 218 + err = bpf_probe_read_kernel(&idx, sizeof(idx), (void *)ctx + arg_spec->idx_reg_off); 219 + if (err) 220 + return err; 221 + err = bpf_probe_read_user(&val, sizeof(val), (void *)(val + (idx << arg_spec->scale_bitshift) + arg_spec->val_off)); 220 222 if (err) 221 223 return err; 222 224 #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+57 -5
tools/lib/bpf/usdt.c
··· 200 200 USDT_ARG_CONST, 201 201 USDT_ARG_REG, 202 202 USDT_ARG_REG_DEREF, 203 + USDT_ARG_SIB, 203 204 }; 204 205 205 206 /* should match exactly struct __bpf_usdt_arg_spec from usdt.bpf.h */ 206 207 struct usdt_arg_spec { 207 208 __u64 val_off; 208 - enum usdt_arg_type arg_type; 209 + #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ 210 + enum usdt_arg_type arg_type: 8; 211 + __u16 idx_reg_off: 12; 212 + __u16 scale_bitshift: 4; 213 + __u8 __reserved: 8; /* keep reg_off offset stable */ 214 + #else 215 + __u8 __reserved: 8; /* keep reg_off offset stable */ 216 + __u16 idx_reg_off: 12; 217 + __u16 scale_bitshift: 4; 218 + enum usdt_arg_type arg_type: 8; 219 + #endif 209 220 short reg_off; 210 221 bool arg_signed; 211 222 char arg_bitshift; ··· 1294 1283 1295 1284 static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg, int *arg_sz) 1296 1285 { 1297 - char reg_name[16]; 1298 - int len, reg_off; 1299 - long off; 1286 + char reg_name[16] = {0}, idx_reg_name[16] = {0}; 1287 + int len, reg_off, idx_reg_off, scale = 1; 1288 + long off = 0; 1300 1289 1301 - if (sscanf(arg_str, " %d @ %ld ( %%%15[^)] ) %n", arg_sz, &off, reg_name, &len) == 3) { 1290 + if (sscanf(arg_str, " %d @ %ld ( %%%15[^,] , %%%15[^,] , %d ) %n", 1291 + arg_sz, &off, reg_name, idx_reg_name, &scale, &len) == 5 || 1292 + sscanf(arg_str, " %d @ ( %%%15[^,] , %%%15[^,] , %d ) %n", 1293 + arg_sz, reg_name, idx_reg_name, &scale, &len) == 4 || 1294 + sscanf(arg_str, " %d @ %ld ( %%%15[^,] , %%%15[^)] ) %n", 1295 + arg_sz, &off, reg_name, idx_reg_name, &len) == 4 || 1296 + sscanf(arg_str, " %d @ ( %%%15[^,] , %%%15[^)] ) %n", 1297 + arg_sz, reg_name, idx_reg_name, &len) == 3 1298 + ) { 1299 + /* 1300 + * Scale Index Base case: 1301 + * 1@-96(%rbp,%rax,8) 1302 + * 1@(%rbp,%rax,8) 1303 + * 1@-96(%rbp,%rax) 1304 + * 1@(%rbp,%rax) 1305 + */ 1306 + arg->arg_type = USDT_ARG_SIB; 1307 + arg->val_off = off; 1308 + 1309 + reg_off = calc_pt_regs_off(reg_name); 1310 + if (reg_off < 0) 1311 + return reg_off; 1312 + arg->reg_off = reg_off; 1313 + 1314 + idx_reg_off = calc_pt_regs_off(idx_reg_name); 1315 + if (idx_reg_off < 0) 1316 + return idx_reg_off; 1317 + arg->idx_reg_off = idx_reg_off; 1318 + 1319 + /* validate scale factor and set fields directly */ 1320 + switch (scale) { 1321 + case 1: arg->scale_bitshift = 0; break; 1322 + case 2: arg->scale_bitshift = 1; break; 1323 + case 4: arg->scale_bitshift = 2; break; 1324 + case 8: arg->scale_bitshift = 3; break; 1325 + default: 1326 + pr_warn("usdt: invalid SIB scale %d, expected 1, 2, 4, 8\n", scale); 1327 + return -EINVAL; 1328 + } 1329 + } else if (sscanf(arg_str, " %d @ %ld ( %%%15[^)] ) %n", 1330 + arg_sz, &off, reg_name, &len) == 3) { 1302 1331 /* Memory dereference case, e.g., -4@-20(%rbp) */ 1303 1332 arg->arg_type = USDT_ARG_REG_DEREF; 1304 1333 arg->val_off = off; ··· 1357 1306 } else if (sscanf(arg_str, " %d @ %%%15s %n", arg_sz, reg_name, &len) == 2) { 1358 1307 /* Register read case, e.g., -4@%eax */ 1359 1308 arg->arg_type = USDT_ARG_REG; 1309 + /* register read has no memory offset */ 1360 1310 arg->val_off = 0; 1361 1311 1362 1312 reg_off = calc_pt_regs_off(reg_name);
+81 -2
tools/testing/selftests/bpf/prog_tests/usdt.c
··· 40 40 } 41 41 } 42 42 43 + #if defined(__x86_64__) || defined(__i386__) 44 + /* 45 + * SIB (Scale-Index-Base) addressing format: "size@(base_reg, index_reg, scale)" 46 + * - 'size' is the size in bytes of the array element, and its sign indicates 47 + * whether the type is signed (negative) or unsigned (positive). 48 + * - 'base_reg' is the register holding the base address, normally rdx or edx 49 + * - 'index_reg' is the register holding the index, normally rax or eax 50 + * - 'scale' is the scaling factor (typically 1, 2, 4, or 8), which matches the 51 + * size of the element type. 52 + * 53 + * For example, for an array of 'short' (signed 2-byte elements), the SIB spec would be: 54 + * - size: -2 (negative because 'short' is signed) 55 + * - scale: 2 (since sizeof(short) == 2) 56 + * 57 + * The resulting SIB format: "-2@(%%rdx,%%rax,2)" for x86_64, "-2@(%%edx,%%eax,2)" for i386 58 + */ 59 + static volatile short array[] = {-1, -2, -3, -4}; 60 + 61 + #if defined(__x86_64__) 62 + #define USDT_SIB_ARG_SPEC -2@(%%rdx,%%rax,2) 63 + #else 64 + #define USDT_SIB_ARG_SPEC -2@(%%edx,%%eax,2) 65 + #endif 66 + 67 + unsigned short test_usdt_sib_semaphore SEC(".probes"); 68 + 69 + static void trigger_sib_spec(void) 70 + { 71 + /* 72 + * Force SIB addressing with inline assembly. 73 + * 74 + * You must compile with -std=gnu99 or -std=c99 to use the 75 + * STAP_PROBE_ASM macro. 76 + * 77 + * The STAP_PROBE_ASM macro generates a quoted string that gets 78 + * inserted between the surrounding assembly instructions. In this 79 + * case, USDT_SIB_ARG_SPEC is embedded directly into the instruction 80 + * stream, creating a probe point between the asm statement boundaries. 81 + * It works fine with gcc/clang. 82 + * 83 + * Register constraints: 84 + * - "d"(array): Binds the 'array' variable to %rdx or %edx register 85 + * - "a"(0): Binds the constant 0 to %rax or %eax register 86 + * These ensure that when USDT_SIB_ARG_SPEC references %%rdx(%edx) and 87 + * %%rax(%eax), they contain the expected values for SIB addressing. 88 + * 89 + * The "memory" clobber prevents the compiler from reordering memory 90 + * accesses around the probe point, ensuring that the probe behavior 91 + * is predictable and consistent. 92 + */ 93 + asm volatile( 94 + STAP_PROBE_ASM(test, usdt_sib, USDT_SIB_ARG_SPEC) 95 + : 96 + : "d"(array), "a"(0) 97 + : "memory" 98 + ); 99 + } 100 + #endif 101 + 43 102 static void subtest_basic_usdt(void) 44 103 { 45 104 LIBBPF_OPTS(bpf_usdt_opts, opts); 46 105 struct test_usdt *skel; 47 106 struct test_usdt__bss *bss; 48 107 int err, i; 108 + const __u64 expected_cookie = 0xcafedeadbeeffeed; 49 109 50 110 skel = test_usdt__open_and_load(); 51 111 if (!ASSERT_OK_PTR(skel, "skel_open")) ··· 119 59 goto cleanup; 120 60 121 61 /* usdt0 won't be auto-attached */ 122 - opts.usdt_cookie = 0xcafedeadbeeffeed; 62 + opts.usdt_cookie = expected_cookie; 123 63 skel->links.usdt0 = bpf_program__attach_usdt(skel->progs.usdt0, 124 64 0 /*self*/, "/proc/self/exe", 125 65 "test", "usdt0", &opts); 126 66 if (!ASSERT_OK_PTR(skel->links.usdt0, "usdt0_link")) 127 67 goto cleanup; 68 + 69 + #if defined(__x86_64__) || defined(__i386__) 70 + opts.usdt_cookie = expected_cookie; 71 + skel->links.usdt_sib = bpf_program__attach_usdt(skel->progs.usdt_sib, 72 + 0 /*self*/, "/proc/self/exe", 73 + "test", "usdt_sib", &opts); 74 + if (!ASSERT_OK_PTR(skel->links.usdt_sib, "usdt_sib_link")) 75 + goto cleanup; 76 + #endif 128 77 129 78 trigger_func(1); 130 79 ··· 141 72 ASSERT_EQ(bss->usdt3_called, 1, "usdt3_called"); 142 73 ASSERT_EQ(bss->usdt12_called, 1, "usdt12_called"); 143 74 144 - ASSERT_EQ(bss->usdt0_cookie, 0xcafedeadbeeffeed, "usdt0_cookie"); 75 + ASSERT_EQ(bss->usdt0_cookie, expected_cookie, "usdt0_cookie"); 145 76 ASSERT_EQ(bss->usdt0_arg_cnt, 0, "usdt0_arg_cnt"); 146 77 ASSERT_EQ(bss->usdt0_arg_ret, -ENOENT, "usdt0_arg_ret"); 147 78 ASSERT_EQ(bss->usdt0_arg_size, -ENOENT, "usdt0_arg_size"); ··· 224 155 ASSERT_EQ(bss->usdt3_args[0], 3, "usdt3_arg1"); 225 156 ASSERT_EQ(bss->usdt3_args[1], 42, "usdt3_arg2"); 226 157 ASSERT_EQ(bss->usdt3_args[2], (uintptr_t)&bla, "usdt3_arg3"); 158 + 159 + #if defined(__x86_64__) || defined(__i386__) 160 + trigger_sib_spec(); 161 + ASSERT_EQ(bss->usdt_sib_called, 1, "usdt_sib_called"); 162 + ASSERT_EQ(bss->usdt_sib_cookie, expected_cookie, "usdt_sib_cookie"); 163 + ASSERT_EQ(bss->usdt_sib_arg_cnt, 1, "usdt_sib_arg_cnt"); 164 + ASSERT_EQ(bss->usdt_sib_arg, nums[0], "usdt_sib_arg"); 165 + ASSERT_EQ(bss->usdt_sib_arg_ret, 0, "usdt_sib_arg_ret"); 166 + ASSERT_EQ(bss->usdt_sib_arg_size, sizeof(nums[0]), "usdt_sib_arg_size"); 167 + #endif 227 168 228 169 cleanup: 229 170 test_usdt__destroy(skel);
+31
tools/testing/selftests/bpf/progs/test_usdt.c
··· 107 107 return 0; 108 108 } 109 109 110 + int usdt_sib_called; 111 + u64 usdt_sib_cookie; 112 + int usdt_sib_arg_cnt; 113 + int usdt_sib_arg_ret; 114 + short usdt_sib_arg; 115 + int usdt_sib_arg_size; 116 + 117 + /* 118 + * usdt_sib is only tested on x86-related architectures, so it requires 119 + * manual attach since auto-attach will panic tests under other architectures 120 + */ 121 + SEC("usdt") 122 + int usdt_sib(struct pt_regs *ctx) 123 + { 124 + long tmp; 125 + 126 + if (my_pid != (bpf_get_current_pid_tgid() >> 32)) 127 + return 0; 128 + 129 + __sync_fetch_and_add(&usdt_sib_called, 1); 130 + 131 + usdt_sib_cookie = bpf_usdt_cookie(ctx); 132 + usdt_sib_arg_cnt = bpf_usdt_arg_cnt(ctx); 133 + 134 + usdt_sib_arg_ret = bpf_usdt_arg(ctx, 0, &tmp); 135 + usdt_sib_arg = (short)tmp; 136 + usdt_sib_arg_size = bpf_usdt_arg_size(ctx, 0); 137 + 138 + return 0; 139 + } 140 + 110 141 char _license[] SEC("license") = "GPL";