
perf mem/c2c: Add load store event mappings for AMD

The 'perf mem' and 'perf c2c' tools are wrappers around 'perf record'
with mem load/store events. IBS tagged load/store samples provide most
of the information needed for these tools. Wire in the "ibs_op//" event
as the mem-ldst event for AMD.

There are some limitations though: Only load/store micro-ops provide
mem/c2c information, but IBS has no way to choose a particular type of
micro-op to tag. This results in many non-LS micro-ops being tagged,
which appear as N/A in the perf report. IBS, being an uncore pmu from
the kernel's point of view[1], does not support per-process monitoring.
Thus, perf mem/c2c on AMD is currently supported in per-cpu mode only.

Example:

$ sudo perf mem record -- -c 10000
^C[ perf record: Woken up 227 times to write data ]
[ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

$ sudo perf mem report -F mem,sample,snoop
Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
Memory access                  Samples  Snoop
N/A                             700620  N/A
L1 hit                          126675  N/A
L2 hit                             424  N/A
L3 hit                             664  HitM
L3 hit                              10  N/A
Local RAM hit                        2  N/A
Remote RAM (1 hop) hit            8558  N/A
Remote Cache (1 hop) hit             3  N/A
Remote Cache (1 hop) hit             2  HitM
Remote Cache (2 hops) hit           10  HitM
Remote Cache (2 hops) hit            6  N/A
Uncached hit                         4  N/A
$

[1]: https://lore.kernel.org/lkml/20220829113347.295-1-ravi.bangoria@amd.com

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-6-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Ravi Bangoria and committed by Arnaldo Carvalho de Melo
f7b58cbd 4173cc05

+41 -7
+10 -4
tools/perf/Documentation/perf-c2c.txt
···
 The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
 you to track down the cacheline contentions.

-On x86, the tool is based on load latency and precise store facility events
+On Intel, the tool is based on load latency and precise store facility events
 provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
-with thresholding feature.
+with thresholding feature. On AMD, the tool uses IBS op pmu (due to hardware
+limitations, perf c2c is not supported on Zen3 cpus).

 These events provide:
   - memory address of the access
···

 -l::
 --ldlat::
-	Configure mem-loads latency. (x86 only)
+	Configure mem-loads latency. Supported on Intel and Arm64 processors
+	only. Ignored on other archs.

 -k::
 --all-kernel::
···
   -W,-d,--phys-data,--sample-cpu

 Unless specified otherwise with '-e' option, following events are monitored by
-default on x86:
+default on Intel:

   cpu/mem-loads,ldlat=30/P
   cpu/mem-stores/P
+
+following on AMD:
+
+  ibs_op//

 and following on PowerPC:

+2 -1
tools/perf/Documentation/perf-mem.txt
···
 	Be more verbose (show counter open errors, etc)

 --ldlat <n>::
-	Specify desired latency for loads event. (x86 only)
+	Specify desired latency for loads event. Supported on Intel and Arm64
+	processors only. Ignored on other archs.

 In addition, for report all perf report options are valid, and for record
 all perf record options.
+29 -2
tools/perf/arch/x86/util/mem-events.c
···
 // SPDX-License-Identifier: GPL-2.0
 #include "util/pmu.h"
+#include "util/env.h"
 #include "map_symbol.h"
 #include "mem-events.h"
+#include "linux/string.h"

 static char mem_loads_name[100];
 static bool mem_loads_name__init;
···

 #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }

-static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
 	E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "%s/events/mem-loads"),
 	E("ldlat-stores", "%s/mem-stores/P", "%s/events/mem-stores"),
 	E(NULL, NULL, NULL),
 };

+static struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
+	E(NULL, NULL, NULL),
+	E(NULL, NULL, NULL),
+	E("mem-ldst", "ibs_op//", "ibs_op"),
+};
+
+static int perf_mem_is_amd_cpu(void)
+{
+	struct perf_env env = { .total_mem = 0, };
+
+	perf_env__cpuid(&env);
+	if (env.cpuid && strstarts(env.cpuid, "AuthenticAMD"))
+		return 1;
+	return -1;
+}
+
 struct perf_mem_event *perf_mem_events__ptr(int i)
 {
+	/* 0: Uninitialized, 1: Yes, -1: No */
+	static int is_amd;
+
 	if (i >= PERF_MEM_EVENTS__MAX)
 		return NULL;

-	return &perf_mem_events[i];
+	if (!is_amd)
+		is_amd = perf_mem_is_amd_cpu();
+
+	if (is_amd == 1)
+		return &perf_mem_events_amd[i];
+
+	return &perf_mem_events_intel[i];
 }

 bool is_mem_loads_aux_event(struct evsel *leader)