Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf topology: Add core_wide

It is possible to optimize metrics when all SMT threads (CPUs) on a
core are measuring events in system wide mode. For example, the TMA
metrics define CORE_CLKS for Sandy Bridge as:

  if SMT is disabled:
    CPU_CLK_UNHALTED.THREAD
  if SMT is enabled and recording on all SMT threads:
    CPU_CLK_UNHALTED.THREAD_ANY / 2
  if SMT is enabled and not recording on all SMT threads:
    (CPU_CLK_UNHALTED.THREAD / 2) *
    (1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK)

That is, two more events are necessary when not gathering counts on all
SMT threads. To distinguish all SMT threads on a core from system wide
(all CPUs), call the new property core wide. Add a core wide test that
determines the property from the user requested CPUs, the topology and
system wide. System wide is required because other processes running on
an SMT thread will change the counts.
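
The three CORE_CLKS cases can be written as a single selection function. The following is only a sketch of that selection, not code from this patch: `struct clk_counts` and `core_clks()` are hypothetical names, and the counts are assumed to have been read already and converted to doubles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for raw counts; each field mirrors one event. */
struct clk_counts {
	double thread;            /* CPU_CLK_UNHALTED.THREAD */
	double thread_any;        /* CPU_CLK_UNHALTED.THREAD_ANY */
	double one_thread_active; /* CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE */
	double ref_xclk;          /* CPU_CLK_UNHALTED.REF_XCLK */
};

/* Pick the CORE_CLKS formula from the SMT state and the core wide property. */
static double core_clks(const struct clk_counts *c, bool smt_on, bool core_wide)
{
	if (!smt_on)
		return c->thread;

	/* All SMT threads of each core are counted: one event is enough. */
	if (core_wide)
		return c->thread_any / 2;

	/* Fallback needs the two extra events; the divisor must be non-zero. */
	assert(c->ref_xclk != 0);
	return (c->thread / 2) * (1 + c->one_thread_active / c->ref_xclk);
}
```

Only the last branch reads `one_thread_active` and `ref_xclk`, which is why knowing the core wide property up front lets a tool avoid programming those two events.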

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220831174926.579643-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Ian Rogers and committed by Arnaldo Carvalho de Melo
cc2c4e26 09b73fe9

+70
+46
tools/perf/util/cputopo.c
@@ -172,6 +172,52 @@
 	return false;
 }
 
+bool cpu_topology__core_wide(const struct cpu_topology *topology,
+			     const char *user_requested_cpu_list)
+{
+	struct perf_cpu_map *user_requested_cpus;
+
+	/*
+	 * If user_requested_cpu_list is empty then all CPUs are recorded and so
+	 * core_wide is true.
+	 */
+	if (!user_requested_cpu_list)
+		return true;
+
+	user_requested_cpus = perf_cpu_map__new(user_requested_cpu_list);
+	/* Check that every user requested CPU is the complete set of SMT threads on a core. */
+	for (u32 i = 0; i < topology->core_cpus_lists; i++) {
+		const char *core_cpu_list = topology->core_cpus_list[i];
+		struct perf_cpu_map *core_cpus = perf_cpu_map__new(core_cpu_list);
+		struct perf_cpu cpu;
+		int idx;
+		bool has_first, first = true;
+
+		perf_cpu_map__for_each_cpu(cpu, idx, core_cpus) {
+			if (first) {
+				has_first = perf_cpu_map__has(user_requested_cpus, cpu);
+				first = false;
+			} else {
+				/*
+				 * If the first core CPU is user requested then
+				 * all subsequent CPUs in the core must be user
+				 * requested too. If the first CPU isn't user
+				 * requested then none of the others must be
+				 * too.
+				 */
+				if (perf_cpu_map__has(user_requested_cpus, cpu) != has_first) {
+					perf_cpu_map__put(core_cpus);
+					perf_cpu_map__put(user_requested_cpus);
+					return false;
+				}
+			}
+		}
+		perf_cpu_map__put(core_cpus);
+	}
+	perf_cpu_map__put(user_requested_cpus);
+	return true;
+}
+
 static bool has_die_topology(void)
 {
 	char filename[MAXPATHLEN];
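
The all-or-none rule in the hunk above can be illustrated without perf's CPU map API. The following is a minimal sketch, assuming fewer than 64 CPUs and that each core's SMT siblings are available as a bitmask; `core_wide_masks()` and its parameters are hypothetical and are not part of perf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true when every core is either fully contained in the requested
 * set or absent from it. Bit n of a mask stands for CPU n.
 */
static bool core_wide_masks(const uint64_t *core_masks, size_t nr_cores,
			    uint64_t requested)
{
	assert(core_masks != NULL || nr_cores == 0);

	for (size_t i = 0; i < nr_cores; i++) {
		uint64_t overlap = core_masks[i] & requested;

		/* A partially requested core breaks the core wide property. */
		if (overlap != 0 && overlap != core_masks[i])
			return false;
	}
	return true;
}
```

With four cores whose siblings are CPU pairs (0,4), (1,5), (2,6) and (3,7), requesting CPUs 0, 1, 4 and 5 satisfies the rule, while requesting only CPUs 0 and 1 does not, because each of those cores has one sibling left out.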
+3
tools/perf/util/cputopo.h
@@ -60,6 +60,9 @@
 void cpu_topology__delete(struct cpu_topology *tp);
 /* Determine from the core list whether SMT was enabled. */
 bool cpu_topology__smt_on(const struct cpu_topology *topology);
+/* Are the sets of SMT siblings all enabled or all disabled in user_requested_cpus. */
+bool cpu_topology__core_wide(const struct cpu_topology *topology,
+			     const char *user_requested_cpu_list);
 
 struct numa_topology *numa_topology__new(void);
 void numa_topology__delete(struct numa_topology *tp);
+14
tools/perf/util/smt.c
@@ -21,3 +21,17 @@
 	cached = true;
 	return cached_result;
 }
+
+bool core_wide(bool system_wide, const char *user_requested_cpu_list,
+	       const struct cpu_topology *topology)
+{
+	/* If not everything running on a core is being recorded then we can't use core_wide. */
+	if (!system_wide)
+		return false;
+
+	/* Cheap case that SMT is disabled and therefore we're inherently core_wide. */
+	if (!smt_on(topology))
+		return true;
+
+	return cpu_topology__core_wide(topology, user_requested_cpu_list);
+}
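
The order of the checks matters: the two cheap tests run first, and the topology walk happens only when both pass. This ordering can be sketched in isolation with the walk replaced by a callback. Everything named here (`siblings_check_fn`, `core_wide_sketch()`, `siblings_never_match()`) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the topology walk over SMT siblings. */
typedef bool (*siblings_check_fn)(const char *cpu_list);

/* Stub check that always reports a partially requested core. */
static bool siblings_never_match(const char *cpu_list)
{
	(void)cpu_list;
	return false;
}

static bool core_wide_sketch(bool system_wide, bool smt_enabled,
			     const char *cpu_list, siblings_check_fn check)
{
	/* Other processes on the core would disturb the counts. */
	if (!system_wide)
		return false;

	/* One thread per core: every core is trivially fully recorded. */
	if (!smt_enabled)
		return true;

	/* Only now pay for the per-core sibling comparison. */
	assert(check != NULL);
	return check(cpu_list);
}
```

With the stub passed as `check`, the sketch returns true only when system wide is set and SMT is off, which shows that the callback is consulted only in the remaining case.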
+7
tools/perf/util/smt.h
@@ -7,4 +7,11 @@
 /* Returns true if SMT (aka hyperthreading) is enabled. */
 bool smt_on(const struct cpu_topology *topology);
 
+/*
+ * Returns true when system wide and all SMT threads for a core are in the
+ * user_requested_cpus map.
+ */
+bool core_wide(bool system_wide, const char *user_requested_cpu_list,
+	       const struct cpu_topology *topology);
+
 #endif /* __SMT_H */