Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf script: Minimize "not reaching sample" for '-F +brstackinsn'

In some situations 'perf script -F +brstackinsn' sees a lot of "not
reaching sample" messages.

This happens when the last LBR block before the sample contains a branch
that is not in the LBR, and the instruction dumping stops.

$ perf record -b emacs -Q --batch '()'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.396 MB perf.data (443 samples) ]
$ perf script -F +brstackinsn
...
00007f0ab2d171a4 insn: 41 0f 94 c0
00007f0ab2d171a8 insn: 83 fa 01
00007f0ab2d171ab insn: 74 d3 # PRED 6 cycles [313] 1.00 IPC
00007f0ab2d17180 insn: 45 84 c0
00007f0ab2d17183 insn: 74 28
... not reaching sample ...

$ perf script -F +brstackinsn | grep -c reach
136
$

This is a problem for further analysis that wants to see the full code
up to the sample.

There are two common cases where the message is bogus:

- The LBR logs only taken branches, so the branch in question may be a
conditional branch that was not taken (this is actually the most common
case).

- The LBR sampling may use a filter that ignores some branch types,
while the perf script check looked for all branches.

This patch fixes both conditions: the walk now stops only on
unconditional branches, and the perf_event_attr's branch filter
attributes are taken into account.

For the test case above it fixes all the messages:

$ ./perf script -F +brstackinsn | grep -c reach
0

Note that there are still conditions in which the message is hit --
sometimes there can be an unconditional branch that misses the LBR
update before the sample -- but they are much rarer now.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20240229161828.386397-1-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Andi Kleen, committed by Arnaldo Carvalho de Melo
(bf0db8c7, 8b3b1bb3)

+9 -6

tools/perf/builtin-script.c (+4 -2)

@@ -1428,7 +1428,7 @@
 	 * Due to pipeline delays the LBRs might be missing a branch
 	 * or two, which can result in very large or negative blocks
 	 * between final branch and sample. When this happens just
-	 * continue walking after the last TO until we hit a branch.
+	 * continue walking after the last TO.
 	 */
 	start = entries[0].to;
 	end = sample->ip;
@@ -1463,7 +1463,9 @@
 			printed += fprintf(fp, "\n");
 			if (ilen == 0)
 				break;
-			if (arch_is_branch(buffer + off, len - off, x.is64bit) && start + off != sample->ip) {
+			if ((attr->branch_sample_type == 0 || attr->branch_sample_type & PERF_SAMPLE_BRANCH_ANY)
+			    && arch_is_uncond_branch(buffer + off, len - off, x.is64bit)
+			    && start + off != sample->ip) {
 				/*
 				 * Hit a missing branch. Just stop.
 				 */
tools/perf/util/dump-insn.c (+1 -1)

@@ -15,7 +15,7 @@
 }

 __weak
-int arch_is_branch(const unsigned char *buf __maybe_unused,
+int arch_is_uncond_branch(const unsigned char *buf __maybe_unused,
		   size_t len __maybe_unused,
		   int x86_64 __maybe_unused)
 {
tools/perf/util/dump-insn.h (+1 -1)

@@ -21,6 +21,6 @@

 const char *dump_insn(struct perf_insn *x, u64 ip,
		      u8 *inbuf, int inlen, int *lenp);
-int arch_is_branch(const unsigned char *buf, size_t len, int x86_64);
+int arch_is_uncond_branch(const unsigned char *buf, size_t len, int x86_64);

 #endif
tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c (+3 -2)

@@ -209,12 +209,13 @@
 	return 0;
 }

-int arch_is_branch(const unsigned char *buf, size_t len, int x86_64)
+int arch_is_uncond_branch(const unsigned char *buf, size_t len, int x86_64)
 {
	struct intel_pt_insn in;
	if (intel_pt_get_insn(buf, len, x86_64, &in) < 0)
		return -1;
-	return in.branch != INTEL_PT_BR_NO_BRANCH;
+	return in.branch == INTEL_PT_BR_UNCONDITIONAL ||
+	       in.branch == INTEL_PT_BR_INDIRECT;
 }

 const char *dump_insn(struct perf_insn *x, uint64_t ip __maybe_unused,