Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf intel-pt: Support itrace A option to approximate IPC

Normally, for cycle-accurate mode, IPC values are an exact number of
instructions and cycles. Due to the granularity of timestamps, that happens
only when a CYC packet correlates to the event.

Support the itrace 'A' option, which instead uses the number of cycles
associated with the current timestamp. This provides IPC information for
every change of timestamp, but at the expense of accuracy. Due to the
granularity of timestamps, the actual number of cycles increases even
though the number of cycles reported does not. The number of instructions
is known, but if IPC is reported, cycles can be too low and so IPC is too
high. Note that the inaccuracy decreases as the sampling period increases,
i.e. if the number of cycles is too low by a small amount, that becomes
less significant if the number of cycles is large.
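
The accuracy trade-off above can be illustrated with a small model. This is a sketch only, not perf code: the struct, its field names and the helper functions are all hypothetical, and it simply assumes the reported cycle count lags the true count by some number of "stale" cycles.

```c
#include <stdint.h>

/* Hypothetical model of one sampling period (not a perf data structure). */
struct ipc_sample {
	uint64_t insn_cnt;	/* instructions in the period (known exactly) */
	uint64_t true_cyc;	/* cycles that actually elapsed */
	uint64_t stale_cyc;	/* elapsed cycles not yet reflected in the reported count */
};

/* IPC computed from the true cycle count. */
static double exact_ipc(const struct ipc_sample *s)
{
	return s->true_cyc ? (double)s->insn_cnt / (double)s->true_cyc : 0.0;
}

/* IPC computed from the reported cycle count, which is too low by stale_cyc. */
static double approx_ipc(const struct ipc_sample *s)
{
	uint64_t reported = s->true_cyc > s->stale_cyc ?
			    s->true_cyc - s->stale_cyc : 0;

	return reported ? (double)s->insn_cnt / (double)reported : 0.0;
}

/* Relative overestimate of IPC: approx / exact - 1. */
static double relative_error(const struct ipc_sample *s)
{
	double exact = exact_ipc(s);

	return exact ? approx_ipc(s) / exact - 1.0 : 0.0;
}
```

With 200 instructions over 100 true cycles and 20 stale cycles, the model gives an IPC of 2.5 instead of 2.0. With the same 20 stale cycles over 10000 true cycles, the overestimate is only about 0.2%.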

Furthermore, it can be used in conjunction with dlfilter-show-cycles.so
to provide higher granularity cycle information.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/r/20211027080334.365596-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Adrian Hunter and committed by Arnaldo Carvalho de Melo
f2b91386 b6778fe1

+24 -4
+10
tools/perf/Documentation/perf-intel-pt.txt
···
 the average IPC since the last IPC for that event type. Note IPC for "branches"
 events is calculated separately from IPC for "instructions" events.
 
+Even with the 'cyc' config term, it is possible to produce IPC information for
+every change of timestamp, but at the expense of accuracy. That is selected by
+specifying the itrace 'A' option. Due to the granularity of timestamps, the
+actual number of cycles increases even though the cycles reported does not.
+The number of instructions is known, but if IPC is reported, cycles can be too
+low and so IPC is too high. Note that inaccuracy decreases as the period of
+sampling increases i.e. if the number of cycles is too low by a small amount,
+that becomes less significant if the number of cycles is large.
+
 Also note that the IPC instruction count may or may not include the current
 instruction. If the cycle count is associated with an asynchronous branch
 (e.g. page fault or interrupt), then the instruction count does not include the
···
 	L	synthesize last branch entries on existing event records
 	s	skip initial number of events
 	q	quicker (less detailed) decoding
+	A	approximate IPC
 	Z	prefer to ignore timestamps (so-called "timeless" decoding)
 
 "Instructions" events look like they were recorded by "perf record -e
+1
tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
···
 {
 	decoder->sample_timestamp = decoder->timestamp;
 	decoder->sample_insn_cnt = decoder->timestamp_insn_cnt;
+	decoder->state.cycles = decoder->tot_cyc_cnt;
 }
 
 static void intel_pt_reposition(struct intel_pt_decoder *decoder)
+1
tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
···
 	uint64_t to_ip;
 	uint64_t tot_insn_cnt;
 	uint64_t tot_cyc_cnt;
+	uint64_t cycles;
 	uint64_t timestamp;
 	uint64_t est_timestamp;
 	uint64_t trace_nr;
+12 -4
tools/perf/util/intel-pt.c
···
 	bool step_through_buffers;
 	bool use_buffer_pid_tid;
 	bool sync_switch;
+	bool sample_ipc;
 	pid_t pid, tid;
 	int cpu;
 	int switch_state;
···
 		sample.branch_stack = (struct branch_stack *)&dummy_bs;
 	}
 
-	if (ptq->state->flags & INTEL_PT_SAMPLE_IPC)
+	if (ptq->sample_ipc)
 		sample.cyc_cnt = ptq->ipc_cyc_cnt - ptq->last_br_cyc_cnt;
 	if (sample.cyc_cnt) {
 		sample.insn_cnt = ptq->ipc_insn_cnt - ptq->last_br_insn_cnt;
···
 	else
 		sample.period = ptq->state->tot_insn_cnt - ptq->last_insn_cnt;
 
-	if (ptq->state->flags & INTEL_PT_SAMPLE_IPC)
+	if (ptq->sample_ipc)
 		sample.cyc_cnt = ptq->ipc_cyc_cnt - ptq->last_in_cyc_cnt;
 	if (sample.cyc_cnt) {
 		sample.insn_cnt = ptq->ipc_insn_cnt - ptq->last_in_insn_cnt;
···
 
 	ptq->have_sample = false;
 
-	ptq->ipc_insn_cnt = ptq->state->tot_insn_cnt;
-	ptq->ipc_cyc_cnt = ptq->state->tot_cyc_cnt;
+	if (pt->synth_opts.approx_ipc) {
+		ptq->ipc_insn_cnt = ptq->state->tot_insn_cnt;
+		ptq->ipc_cyc_cnt = ptq->state->cycles;
+		ptq->sample_ipc = true;
+	} else {
+		ptq->ipc_insn_cnt = ptq->state->tot_insn_cnt;
+		ptq->ipc_cyc_cnt = ptq->state->tot_cyc_cnt;
+		ptq->sample_ipc = ptq->state->flags & INTEL_PT_SAMPLE_IPC;
+	}
 
 	/*
 	 * Do PEBS first to allow for the possibility that the PEBS timestamp
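
The selection logic in the final hunk can be modelled in isolation. The following is a minimal sketch, not the perf implementation: `model_state`, `model_queue`, `model_select_ipc` and `MODEL_SAMPLE_IPC` are hypothetical stand-ins that reuse the patch's field names, and the flag value is illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for INTEL_PT_SAMPLE_IPC; the value here is illustrative. */
#define MODEL_SAMPLE_IPC 0x1

/* Simplified stand-in for the decoder state fields used by the patch. */
struct model_state {
	uint32_t flags;
	uint64_t tot_insn_cnt;
	uint64_t tot_cyc_cnt;	/* running total cycle count */
	uint64_t cycles;	/* tot_cyc_cnt as copied when the sample timestamp was updated */
};

/* Simplified stand-in for the per-queue IPC bookkeeping. */
struct model_queue {
	uint64_t ipc_insn_cnt;
	uint64_t ipc_cyc_cnt;
	bool sample_ipc;
};

/*
 * Choose the cycle count used for IPC. With approx_ipc set, always sample
 * IPC using the timestamp-associated cycle count; otherwise use the total
 * cycle count and sample only when the decoder flags say so.
 */
static void model_select_ipc(struct model_queue *q,
			     const struct model_state *st, bool approx_ipc)
{
	q->ipc_insn_cnt = st->tot_insn_cnt;
	if (approx_ipc) {
		q->ipc_cyc_cnt = st->cycles;
		q->sample_ipc = true;
	} else {
		q->ipc_cyc_cnt = st->tot_cyc_cnt;
		q->sample_ipc = (st->flags & MODEL_SAMPLE_IPC) != 0;
	}
}
```

In this model the instruction count is the same in both modes. Only the cycle source and the condition for reporting IPC differ, which matches the shape of the change in the diff.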