Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf doc: Refresh topdown documentation

perf stat now supports --topdown for any platform with the TopdownL1
metric group including Intel before Icelake. Tweak the documentation
to reflect this.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-43-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Ian Rogers and committed by Arnaldo Carvalho de Melo
20cb10ea 7b86475f

+43 -52
+14 -11
tools/perf/Documentation/perf-stat.txt
···
 Do not aggregate counts across all monitored CPUs.
 
 --topdown::
-Print complete top-down metrics supported by the CPU. This allows to
-determine bottle necks in the CPU pipeline for CPU bound workloads,
-by breaking the cycles consumed down into frontend bound, backend bound,
-bad speculation and retiring.
+Print top-down metrics supported by the CPU. This allows to determine
+bottle necks in the CPU pipeline for CPU bound workloads, by breaking
+the cycles consumed down into frontend bound, backend bound, bad
+speculation and retiring.
 
 Frontend bound means that the CPU cannot fetch and decode instructions fast
 enough. Backend bound means that computation or memory access is the bottle
···
 taskset.
 
 --td-level::
-Print the top-down statistics that equal to or lower than the input level.
-It allows users to print the interested top-down metrics level instead of
-the complete top-down metrics.
+Print the top-down statistics that equal the input level. It allows
+users to print the interested top-down metrics level instead of the
+level 1 top-down metrics.
 
-The availability of the top-down metrics level depends on the hardware. For
-example, Ice Lake only supports L1 top-down metrics. The Sapphire Rapids
-supports both L1 and L2 top-down metrics.
+As the higher levels gather more metrics and use more counters they
+will be less accurate. By convention a metric can be examined by
+appending '_group' to it and this will increase accuracy compared to
+gathering all metrics for a level. For example, level 1 analysis may
+highlight 'tma_frontend_bound'. This metric may be drilled into with
+'tma_frontend_bound_group' with
+'perf stat -M tma_frontend_bound_group...'.
 
-Default: 0 means the max level that the current hardware support.
 Error out if the input is higher than the supported max level.
 
 --no-merge::
+29 -41
tools/perf/Documentation/topdown.txt
···
-Using TopDown metrics in user space
------------------------------------
+Using TopDown metrics
+---------------------
 
-Intel CPUs (since Sandy Bridge and Silvermont) support a TopDown
-methodology to break down CPU pipeline execution into 4 bottlenecks:
-frontend bound, backend bound, bad speculation, retiring.
+TopDown metrics break apart performance bottlenecks. Starting at level
+1 it is typical to get metrics on retiring, bad speculation, frontend
+bound, and backend bound. Higher levels provide more detail in to the
+level 1 bottlenecks, such as at level 2: core bound, memory bound,
+heavy operations, light operations, branch mispredicts, machine
+clears, fetch latency and fetch bandwidth. For more details see [1][2][3].
 
-For more details on Topdown see [1][5]
+perf stat --topdown implements this using available metrics that vary
+per architecture.
 
-Traditionally this was implemented by events in generic counters
-and specific formulas to compute the bottlenecks.
+% perf stat -a --topdown -I1000
+#           time  % tma_retiring  % tma_backend_bound  % tma_frontend_bound  % tma_bad_speculation
+     1.001141351            11.5                 34.9                  46.9                    6.7
+     2.006141972            13.4                 28.1                  50.4                    8.1
+     3.010162040            12.9                 28.1                  51.1                    8.0
+     4.014009311            12.5                 28.6                  51.8                    7.2
+     5.017838554            11.8                 33.0                  48.0                    7.2
+     5.704818971            14.0                 27.5                  51.3                    7.3
+...
 
-perf stat --topdown implements this.
-
-Full Top Down includes more levels that can break down the
-bottlenecks further. This is not directly implemented in perf,
-but available in other tools that can run on top of perf,
-such as toplev[2] or vtune[3]
-
-New Topdown features in Ice Lake
-===============================
+New Topdown features in Intel Ice Lake
+======================================
 
 With Ice Lake CPUs the TopDown metrics are directly available as
 fixed counters and do not require generic counters. This allows
 to collect TopDown always in addition to other events.
 
-% perf stat -a --topdown -I1000
-#           time  retiring  bad speculation  frontend bound  backend bound
-     1.001281330     23.0%            15.3%           29.6%          32.1%
-     2.003009005      5.0%             6.8%           46.6%          41.6%
-     3.004646182      6.7%             6.7%           46.0%          40.6%
-     4.006326375      5.0%             6.4%           47.6%          41.0%
-     5.007991804      5.1%             6.3%           46.3%          42.3%
-     6.009626773      6.2%             7.1%           47.3%          39.3%
-     7.011296356      4.7%             6.7%           46.2%          42.4%
-     8.012951831      4.7%             6.7%           47.5%          41.1%
-...
-
-This also enables measuring TopDown per thread/process instead
-of only per core.
-
-Using TopDown through RDPMC in applications on Ice Lake
-======================================================
+Using TopDown through RDPMC in applications on Intel Ice Lake
+=============================================================
 
 For more fine grained measurements it can be useful to
 access the new directly from user space. This is more complicated,
···
 A program using RDPMC for TopDown should schedule such a reset
 regularly, as in every few seconds.
 
-Limits on Ice Lake
-==================
+Limits on Intel Ice Lake
+========================
 
 Four pseudo TopDown metric events are exposed for the end-users,
 topdown-retiring, topdown-bad-spec, topdown-fe-bound and topdown-be-bound.
···
 group, the second event of the group is the sampling event.
 For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'
 
-Extension on Sapphire Rapids Server
-===================================
+Extension on Intel Sapphire Rapids Server
+=========================================
 The metrics counter is extended to support TMA method level 2 metrics.
 The lower half of the register is the TMA level 1 metrics (legacy).
 The upper half is also divided into four 8-bit fields for the new level 2
···
 
 
 [1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
-[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
-[3] https://software.intel.com/en-us/intel-vtune-amplifier-xe
+[2] https://sites.google.com/site/analysismethods/yasin-pubs
+[3] https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
 [4] https://github.com/andikleen/pmu-tools/tree/master/jevents
-[5] https://sites.google.com/site/analysismethods/yasin-pubs
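
Illustration (not part of this patch): the level 1 interval output added to
topdown.txt can be post-processed to decide which metric to drill into. The
sketch below is only an assumption about how one might do that. It parses text
laid out like the sample in the patch, averages each level 1 metric, and builds
the '_group' name that the new --td-level text says can be passed to
'perf stat -M'. The helper names parse_topdown and dominant_metric are
hypothetical, and the column layout of real perf output may differ by version
and architecture.

```python
import re

# Sample copied from the new topdown.txt text in this patch. Real perf
# output is assumed (not guaranteed) to use the same layout.
SAMPLE = """\
#           time  % tma_retiring  % tma_backend_bound  % tma_frontend_bound  % tma_bad_speculation
     1.001141351            11.5                 34.9                  46.9                    6.7
     2.006141972            13.4                 28.1                  50.4                    8.1
     3.010162040            12.9                 28.1                  51.1                    8.0
     4.014009311            12.5                 28.6                  51.8                    7.2
     5.017838554            11.8                 33.0                  48.0                    7.2
     5.704818971            14.0                 27.5                  51.3                    7.3
...
"""


def parse_topdown(text):
    """Return a list of (timestamp, {metric: percent}) tuples."""
    metrics = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header: every metric name follows a '%' sign.
            metrics = re.findall(r"%\s*(\w+)", line)
            continue
        fields = line.split()
        # Skip anything that is not "time + one value per metric".
        if metrics is None or len(fields) != len(metrics) + 1:
            continue
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue
        rows.append((values[0], dict(zip(metrics, values[1:]))))
    return rows


def dominant_metric(rows):
    """Average each metric over all intervals; return (name, means)."""
    if not rows:
        raise ValueError("no topdown rows parsed")
    totals = {}
    for _, sample in rows:
        for name, pct in sample.items():
            totals[name] = totals.get(name, 0.0) + pct
    means = {name: total / len(rows) for name, total in totals.items()}
    return max(means, key=means.get), means


rows = parse_topdown(SAMPLE)
top, means = dominant_metric(rows)
# Per the convention described in the patch, append '_group' to drill in.
group = top + "_group"
print(f"largest level 1 metric: {top} ({means[top]:.1f}% on average)")
print(f"candidate drill-down: perf stat -M {group}")
```

Averaging over intervals is a design choice made here for simplicity; a
per-interval maximum or a threshold would be equally reasonable starting
points.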