Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf Document: Add TPEBS (Timed PEBS(Precise Event-Based Sampling)) to Documents

TPEBS (Timed PEBS (Precise Event-Based Sampling)) is a new Intel PMU
feature, available starting with the Granite Rapids microarchitecture.

It will be used in new TMA (Top-Down Microarchitecture Analysis)
releases.

Add an introduction to the documentation alongside the new code that
supports it in 'perf stat'.

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-8-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Weilin Wang and committed by Arnaldo Carvalho de Melo
169f18fd d546e3ac

2 files changed, 31 insertions(+)

tools/perf/Documentation/perf-list.txt (+1)

@@ -72,6 +72,7 @@
 W - group is weak and will fallback to non-group if not schedulable,
 e - group or event are exclusive and do not share the PMU
 b - use BPF aggregration (see perf stat --bpf-counters)
+R - retire latency value of the event

 The 'p' modifier can be used for specifying how precise the instruction
 address should be. The 'p' modifier can be specified multiple times:
tools/perf/Documentation/topdown.txt (+30)

@@ -325,6 +325,36 @@
 Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
 Core_Bound = Backend_Bound - Memory_Bound

+TPEBS in TopDown
+================
+
+TPEBS (Timed PEBS) is one of the new Intel PMU features provided since Granite
+Rapids microarchitecture. The TPEBS feature adds a 16 bit retire_latency field
+in the Basic Info group of the PEBS record. It records the Core cycles since the
+retirement of the previous instruction to the retirement of current instruction.
+Please refer to Section 8.4.1 of "Intel® Architecture Instruction Set Extensions
+Programming Reference" for more details about this feature. Because this feature
+extends PEBS record, sampling with weight option is required to get the
+retire_latency value.
+
+perf record -e event_name -W ...
+
+In the most recent release of TMA, the metrics begin to use event retire_latency
+values in some of the metrics’ formulas on processors that support TPEBS feature.
+For previous generations that do not support TPEBS, the values are static and
+predefined per processor family by the hardware architects. Due to the diversity
+of workloads in execution environments, retire_latency values measured at real
+time are more accurate. Therefore, new TMA metrics that use TPEBS will provide
+more accurate performance analysis results.
+
+To support TPEBS in TMA metrics, a new modifier :R on event is added. Perf would
+capture retire_latency value of required events(event with :R in metric formula)
+with perf record. The retire_latency value would be used in metric calculation.
+Currently, this feature is supported through perf stat
+
+perf stat -M metric_name --record-tpebs ...
+
+

 [1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
 [2] https://sites.google.com/site/analysismethods/yasin-pubs
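
The added topdown.txt text says that a measured retire_latency value replaces a static, predefined per-family value in metric formulas. A minimal sketch of that substitution follows. It is not perf's implementation: the function names, the use of a plain mean over samples, and the metric formula are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def retire_latency(samples, static_default):
    """Pick the latency value to plug into a metric formula.

    samples: retire_latency values (core cycles) collected for one event.
    static_default: predefined value used when no samples are available,
    as on processors without TPEBS.
    Averaging the samples is an assumption made for this sketch.
    """
    return mean(samples) if samples else static_default


def example_metric(event_count, latency, cycles):
    """Hypothetical metric: share of cycles attributed to one event."""
    return event_count * latency / cycles


# With samples, the measured mean is used; without, the static value.
measured = retire_latency([4, 6, 8], static_default=7)
fallback = retire_latency([], static_default=7)
```

With these inputs the measured path yields a latency of 6 cycles and the fallback path yields the static value of 7, so the same formula produces different results depending on whether sampled data is available.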