Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf Documentation: Document intel-hybrid support

Add some words and examples to help understanding of
Intel hybrid perf support.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427070139.25256-27-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Jin Yao and committed by Arnaldo Carvalho de Melo
2750ce1d a37f3b88

+217
+214
tools/perf/Documentation/intel-hybrid.txt
+Intel hybrid support
+--------------------
+Support for Intel hybrid events within perf tools.
+
+Some Intel platforms, such as AlderLake, are hybrid platforms that
+consist of atom cpus and core cpus. Each cpu type has a dedicated event
+list. Some events are available only on core cpus, some only on atom
+cpus, and some on both.
+
+The kernel exports two new cpu pmus via sysfs:
+/sys/devices/cpu_core
+/sys/devices/cpu_atom
+
+A 'cpus' file is created under each directory. For example,
+
+cat /sys/devices/cpu_core/cpus
+0-15
+
+cat /sys/devices/cpu_atom/cpus
+16-23
+
+This indicates that cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.
+
+Quickstart
+
+List hybrid event
+-----------------
+
+As before, use perf-list to list the symbolic events.
+
+perf list
+
+inst_retired.any
+        [Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
+inst_retired.any
+        [Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]
+
+The 'Unit: xxx' is added to the brief description to indicate which pmu
+the event belongs to. The same event name can be supported on
+different pmus.
+
+Enable hybrid event with a specific pmu
+---------------------------------------
+
+To enable a core-only or atom-only event, the following syntax is supported:
+
+        cpu_core/<event name>/
+or
+        cpu_atom/<event name>/
+
+For example, count the 'cycles' event on core cpus.
+
+        perf stat -e cpu_core/cycles/
+
+Create two events for one hardware event automatically
+------------------------------------------------------
+
+When an event is created and it is available on both atom and core,
+two events are created automatically: one for atom and one for core.
+Most hardware events and cache events are available on both
+cpu_core and cpu_atom.
+
+Hardware events have pre-defined configs (e.g. 0 for cycles).
+But on a hybrid platform, the kernel needs to know where the event comes
+from (atom or core). The original perf event type PERF_TYPE_HARDWARE
+can't carry pmu information, so this type is extended to be a PMU-aware
+type. The PMU type ID is stored in attr.config[63:32].
+
+The PMU type ID is retrieved from sysfs:
+/sys/devices/cpu_atom/type
+/sys/devices/cpu_core/type
+
+The new attr.config layout for PERF_TYPE_HARDWARE:
+
+PERF_TYPE_HARDWARE:                 0xEEEEEEEE000000AA
+                                    AA: hardware event ID
+                                    EEEEEEEE: PMU type ID
+
+Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to be
+a PMU-aware type. The PMU type ID is stored in attr.config[63:32].
+
+The new attr.config layout for PERF_TYPE_HW_CACHE:
+
+PERF_TYPE_HW_CACHE:                 0xEEEEEEEE00DDCCBB
+                                    BB: hardware cache ID
+                                    CC: hardware cache op ID
+                                    DD: hardware cache op result ID
+                                    EEEEEEEE: PMU type ID
+
+When enabling a hardware event without a specified pmu, such as
+perf stat -e cycles -a (system-wide in this example), two events
+are created automatically.
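
Editor's note: the two attr.config layouts described above can be sketched in a few lines of Python. This is only an illustration of the bit positions stated in the text (PMU type ID in bits 63:32, event or cache IDs in the low bytes). The helper names `hw_config`, `hw_cache_config`, `split_config` and `read_pmu_type` are hypothetical and are not part of perf or the kernel; the sysfs path is the one given in the document.

```python
# Illustrative sketch of the PMU-aware attr.config layout.
# All helper names here are hypothetical, not perf or kernel APIs.

PMU_TYPE_SHIFT = 32        # PMU type ID occupies config[63:32]
LOW32_MASK = 0xFFFFFFFF    # event-specific part occupies config[31:0]


def hw_config(pmu_type, event_id):
    """PERF_TYPE_HARDWARE layout: 0xEEEEEEEE000000AA."""
    return ((pmu_type & LOW32_MASK) << PMU_TYPE_SHIFT) | (event_id & 0xFF)


def hw_cache_config(pmu_type, cache_id, op_id, result_id):
    """PERF_TYPE_HW_CACHE layout: 0xEEEEEEEE00DDCCBB."""
    low = ((result_id & 0xFF) << 16) | ((op_id & 0xFF) << 8) | (cache_id & 0xFF)
    return ((pmu_type & LOW32_MASK) << PMU_TYPE_SHIFT) | low


def split_config(config):
    """Return (pmu_type_id, low_32_bits) for a PMU-aware config value."""
    return config >> PMU_TYPE_SHIFT, config & LOW32_MASK


def read_pmu_type(pmu_name):
    """Read the PMU type ID from sysfs, e.g. pmu_name='cpu_core'."""
    with open(f"/sys/devices/{pmu_name}/type") as f:
        return int(f.read().strip())


# With the type IDs used in this document's example (4 and 8),
# 'cycles' (hardware event ID 0) encodes as follows.
print(hex(hw_config(4, 0)))   # 0x400000000
print(hex(hw_config(8, 0)))   # 0x800000000
```

On a real system the type IDs would come from `read_pmu_type("cpu_core")` and `read_pmu_type("cpu_atom")` rather than the literal 4 and 8 used here.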
+
+  ------------------------------------------------------------
+  perf_event_attr:
+    size                             120
+    config                           0x400000000
+    sample_type                      IDENTIFIER
+    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+    disabled                         1
+    inherit                          1
+    exclude_guest                    1
+  ------------------------------------------------------------
+
+and
+
+  ------------------------------------------------------------
+  perf_event_attr:
+    size                             120
+    config                           0x800000000
+    sample_type                      IDENTIFIER
+    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+    disabled                         1
+    inherit                          1
+    exclude_guest                    1
+  ------------------------------------------------------------
+
+type 0 is PERF_TYPE_HARDWARE.
+0x4 in 0x400000000 indicates the cpu_core pmu.
+0x8 in 0x800000000 indicates the cpu_atom pmu (the atom pmu type id is not fixed).
+
+The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus),
+and creates 'cycles' (0x800000000) on cpu16-cpu23 (atom cpus).
+
+The perf-stat result displays two events:
+
+ Performance counter stats for 'system wide':
+
+         6,744,979      cpu_core/cycles/
+         1,965,552      cpu_atom/cycles/
+
+The first 'cycles' is the core event, the second 'cycles' is the atom event.
+
+Thread mode example:
+--------------------
+
+perf-stat reports the scaled counts for hybrid events, with a percentage
+displayed. The percentage is the event's running time / enabled time.
+
+In this example, 'triad_loop' runs on cpu16 (an atom cpu), and we can see
+the scaled value for core cycles is 233,066,666 and the percentage is 0.43%.
+
+perf stat -e cycles -- taskset -c 16 ./triad_loop
+
+As before, two events are created.
+
+  ------------------------------------------------------------
+  perf_event_attr:
+    size                             120
+    config                           0x400000000
+    sample_type                      IDENTIFIER
+    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+    disabled                         1
+    inherit                          1
+    enable_on_exec                   1
+    exclude_guest                    1
+  ------------------------------------------------------------
+
+and
+
+  ------------------------------------------------------------
+  perf_event_attr:
+    size                             120
+    config                           0x800000000
+    sample_type                      IDENTIFIER
+    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+    disabled                         1
+    inherit                          1
+    enable_on_exec                   1
+    exclude_guest                    1
+  ------------------------------------------------------------
+
+ Performance counter stats for 'taskset -c 16 ./triad_loop':
+
+       233,066,666      cpu_core/cycles/                (0.43%)
+       604,097,080      cpu_atom/cycles/                (99.57%)
+
+perf-record:
+------------
+
+If no '-e' is specified in perf record on a hybrid platform,
+it creates two default 'cycles' events and adds them to the event
+list. One is for core, the other is for atom.
+
+perf-stat:
+----------
+
+If no '-e' is specified in perf stat on a hybrid platform, then
+besides the software events, the following events are created and
+added to the event list in order.
+
+cpu_core/cycles/,
+cpu_atom/cycles/,
+cpu_core/instructions/,
+cpu_atom/instructions/,
+cpu_core/branches/,
+cpu_atom/branches/,
+cpu_core/branch-misses/,
+cpu_atom/branch-misses/
+
+Of course, both perf-stat and perf-record support enabling a
+hybrid event with a specific pmu.
+
+e.g.
+perf stat -e cpu_core/cycles/
+perf stat -e cpu_atom/cycles/
+perf stat -e cpu_core/r1a/
+perf stat -e cpu_atom/L1-icache-loads/
+perf stat -e cpu_core/cycles/,cpu_atom/instructions/
+perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'
+
+But '{cpu_core/cycles/,cpu_atom/instructions/}' will return a
+warning and disable grouping, because the pmus in the group do
+not match (cpu_core vs. cpu_atom).
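
Editor's note: the scaled counts and percentages shown in the thread mode example can be sketched as follows. This assumes the usual perf convention that a raw count is extrapolated by time_enabled / time_running (the two values requested by the TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING read_format in the attr dumps) and that the percentage is running time over enabled time, as the document states. The function name `scale_count` and the input numbers are hypothetical, and the rounding used by perf itself may differ.

```python
# Illustrative sketch of count scaling for a counter that was only
# scheduled for part of the time it was enabled.
# scale_count is a hypothetical helper, not a perf API.


def scale_count(raw_count, time_enabled, time_running):
    """Return (scaled_count, running_percent) for one counter reading."""
    if time_running == 0:
        # The event never ran, so there is nothing to extrapolate from.
        return 0, 0.0
    scaled = round(raw_count * time_enabled / time_running)
    percent = 100.0 * time_running / time_enabled
    return scaled, percent


# Made-up reading: the event ran for half of the enabled time.
scaled, percent = scale_count(1000, 200, 100)
print(scaled, f"({percent:.2f}%)")   # 2000 (50.00%)
```

A small percentage therefore means the displayed count was extrapolated from a short measurement window and should be read with care.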
+1
tools/perf/Documentation/perf-record.txt
···
 wait -n ${perf_pid}
 exit $?

+include::intel-hybrid.txt[]

 SEE ALSO
 --------
+2
tools/perf/Documentation/perf-stat.txt
···

 Additional metrics may be printed with all earlier fields being empty.

+include::intel-hybrid.txt[]
+
 SEE ALSO
 --------
 linkperf:perf-top[1], linkperf:perf-list[1]