Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf stat: Combine the -A/--no-aggr and --no-merge options

The -A or --no-aggr option disables aggregation of core events:

  $ perf stat -A -e cycles,data_total -a true

   Performance counter stats for 'system wide':

  CPU0     1,287,665      cycles
  CPU1     1,831,681      cycles
  CPU2    27,345,998      cycles
  CPU3     1,964,799      cycles
  CPU4       236,174      cycles
  CPU5     3,302,825      cycles
  CPU6     9,201,446      cycles
  CPU7     1,403,043      cycles
  CPU0        110.90 MiB  data_total

         0.008961761 seconds time elapsed

The --no-merge option disables the aggregation of uncore events:

  $ perf stat --no-merge -e cycles,data_total -a true

   Performance counter stats for 'system wide':

        38,482,778      cycles
             15.04 MiB  data_total [uncore_imc_free_running_1]
             15.00 MiB  data_total [uncore_imc_free_running_0]

         0.005915155 seconds time elapsed

Having two options confuses users, who generally aren't aware of the
difference between core and uncore PMUs. Keep all the options, but make
each of them disable aggregation of both core and uncore events:

  $ perf stat -A -e cycles,data_total -a true

   Performance counter stats for 'system wide':

  CPU0        85,878      cycles
  CPU1        88,179      cycles
  CPU2        60,872      cycles
  CPU3     3,265,567      cycles
  CPU4        82,357      cycles
  CPU5        83,383      cycles
  CPU6        84,156      cycles
  CPU7       220,803      cycles
  CPU0          2.38 MiB  data_total [uncore_imc_free_running_0]
  CPU0          2.38 MiB  data_total [uncore_imc_free_running_1]

         0.001397205 seconds time elapsed

Update the relevant 'perf stat' man page information.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kaige Ye <ye@kaige.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231214060256.2094017-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Ian Rogers and committed by Arnaldo Carvalho de Melo
6f33e6fa 1bc479d6

+33 -29
+28 -24
tools/perf/Documentation/perf-stat.txt
@@ -422,7 +422,34 @@
 
 -A::
 --no-aggr::
-Do not aggregate counts across all monitored CPUs.
+--no-merge::
+Do not aggregate/merge counts across monitored CPUs or PMUs.
+
+When multiple events are created from a single event specification,
+stat will, by default, aggregate the event counts and show the result
+in a single row. This option disables that behavior and shows the
+individual events and counts.
+
+Multiple events are created from a single event specification when:
+
+1. PID monitoring isn't requested and the system has more than one
+   CPU. For example, a system with 8 SMT threads will have one event
+   opened on each thread and aggregation is performed across them.
+
+2. Prefix or glob wildcard matching is used for the PMU name. For
+   example, multiple memory controller PMUs may exist typically with a
+   suffix of _0, _1, etc. By default the event counts will all be
+   combined if the PMU is specified without the suffix such as
+   uncore_imc rather than uncore_imc_0.
+
+3. Aliases, which are listed immediately after the Kernel PMU events
+   by perf list, are used.
+
+--hybrid-merge::
+Merge core event counts from all core PMUs. In hybrid or big.LITTLE
+systems by default each core PMU will report its count
+separately. This option forces core PMU counts to be combined to give
+a behavior closer to having a single CPU type in the system.
 
 --topdown::
 Print top-down metrics supported by the CPU. This allows to determine
@@ -501,29 +474,6 @@
 'perf stat -M tma_frontend_bound_group...'.
 
 Error out if the input is higher than the supported max level.
-
---no-merge::
-Do not merge results from same PMUs.
-
-When multiple events are created from a single event specification,
-stat will, by default, aggregate the event counts and show the result
-in a single row. This option disables that behavior and shows
-the individual events and counts.
-
-Multiple events are created from a single event specification when:
-1. Prefix or glob matching is used for the PMU name.
-2. Aliases, which are listed immediately after the Kernel PMU events
-   by perf list, are used.
-
---hybrid-merge::
-Merge the hybrid event counts from all PMUs.
-
-For hybrid events, by default, the stat aggregates and reports the event
-counts per PMU. But sometimes, it's also useful to aggregate event counts
-from all PMUs. This option enables that behavior and reports the counts
-without PMUs.
-
-For non-hybrid events, it should be no effect.
 
 --smi-cost::
 Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.
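The documented behavior can be illustrated with a small C sketch. It is not perf's implementation: the names sketch_aggr_mode and sketch_rows are hypothetical, and plain summation is a simplifying assumption for how the per-CPU or per-PMU counts are combined. By default the counts for one event specification collapse into a single row; with -A, --no-aggr or --no-merge each opened event keeps its own row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; these are not perf's real types. */
enum sketch_aggr_mode { SKETCH_AGGR_GLOBAL, SKETCH_AGGR_NONE };

/*
 * Fill `rows` with the values that would be displayed for one event
 * specification and return how many rows were written.
 * `counts` holds one value per opened event (one per CPU or PMU instance);
 * `rows` must have room for at least `n` entries.
 */
static size_t sketch_rows(enum sketch_aggr_mode mode,
			  const uint64_t *counts, size_t n,
			  uint64_t *rows)
{
	size_t i;

	if (n == 0)
		return 0;

	assert(counts != NULL && rows != NULL);

	if (mode == SKETCH_AGGR_NONE) {
		/* -A / --no-aggr / --no-merge: one row per opened event. */
		for (i = 0; i < n; i++)
			rows[i] = counts[i];
		return n;
	}

	/* Default: combine everything into a single row (assumed: a sum). */
	rows[0] = 0;
	for (i = 0; i < n; i++)
		rows[0] += counts[i];
	return 1;
}
```

Calling sketch_rows() with SKETCH_AGGR_GLOBAL on three counts yields one combined row, while SKETCH_AGGR_NONE yields three rows, mirroring the difference between the default output and the -A output in the commit message.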
+3 -2
tools/perf/builtin-stat.c
@@ -1204,8 +1204,9 @@
 	OPT_STRING('C', "cpu", &target.cpu_list, "cpu",
 		    "list of cpus to monitor in system-wide"),
 	OPT_SET_UINT('A', "no-aggr", &stat_config.aggr_mode,
-		    "disable CPU count aggregation", AGGR_NONE),
-	OPT_BOOLEAN(0, "no-merge", &stat_config.no_merge, "Do not merge identical named events"),
+		    "disable aggregation across CPUs or PMUs", AGGR_NONE),
+	OPT_SET_UINT(0, "no-merge", &stat_config.aggr_mode,
+		    "disable aggregation the same as -A or -no-aggr", AGGR_NONE),
 	OPT_BOOLEAN(0, "hybrid-merge", &stat_config.hybrid_merge,
 		    "Merge identical named hybrid events"),
 	OPT_STRING('x', "field-separator", &stat_config.csv_sep, "separator",
+1 -1
tools/perf/util/stat-display.c
@@ -898,7 +898,7 @@
 
 static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
 {
-	if (config->no_merge || hybrid_uniquify(counter, config))
+	if (config->aggr_mode == AGGR_NONE || hybrid_uniquify(counter, config))
 		uniquify_event_name(counter);
 }
 
+1 -1
tools/perf/util/stat.c
@@ -592,7 +592,7 @@
 {
 	struct evsel *evsel;
 
-	if (config->no_merge)
+	if (config->aggr_mode == AGGR_NONE)
 		return;
 
 	evlist__for_each_entry(evlist, evsel)
-1
tools/perf/util/stat.h
@@ -76,7 +76,6 @@
 	bool null_run;
 	bool ru_display;
 	bool big_num;
-	bool no_merge;
 	bool hybrid_merge;
 	bool walltime_run_table;
 	bool all_kernel;
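Taken together, the hunks above replace a separate boolean with a second writer to the same aggr_mode field. The following self-contained C sketch shows that shape under stated assumptions: it is a miniature model, not perf's option parser, and sketch_mode, sketch_config, sketch_apply_flag and sketch_skip_merge are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical miniature of the option handling; not perf's parser. */
enum sketch_mode { SKETCH_GLOBAL = 0, SKETCH_NONE };

struct sketch_config {
	enum sketch_mode aggr_mode;	/* single source of truth, no no_merge flag */
};

/* Apply one command-line flag; returns true if the flag was recognised. */
static bool sketch_apply_flag(struct sketch_config *config, const char *flag)
{
	assert(config != NULL && flag != NULL);

	/* All three spellings write the same value to the same field. */
	if (strcmp(flag, "-A") == 0 ||
	    strcmp(flag, "--no-aggr") == 0 ||
	    strcmp(flag, "--no-merge") == 0) {
		config->aggr_mode = SKETCH_NONE;
		return true;
	}
	return false;
}

/* Same shape as the new checks: one field decides whether merging is skipped. */
static bool sketch_skip_merge(const struct sketch_config *config)
{
	return config->aggr_mode == SKETCH_NONE;
}
```

Because every consumer tests the one field, the three flags cannot drift apart in behavior, which is the point of removing no_merge from the configuration struct.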