Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf docs: arm-spe: Document new SPE filtering features

FEAT_SPE_EFT, FEAT_SPE_FDS, etc. have new user-facing format attributes,
so document them. Also document existing 'event_filter' bits that were
missing from the doc and the fact that latency values are stored in the
weight field.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Authored by James Clark and committed by Namhyung Kim
5accdaec 14a84c70

+94 -8
tools/perf/Documentation/perf-arm-spe.txt
···
 These are placed between the // in the event and comma separated. For example '-e
 arm_spe/load_filter=1,min_latency=10/'
 
-branch_filter=1 - collect branches only (PMSFCR.B)
-event_filter=<mask> - filter on specific events (PMSEVFR) - see bitfield description below
+event_filter=<mask> - logical AND filter on specific events (PMSEVFR) - see bitfield description below
+inv_event_filter=<mask> - logical OR to filter out specific events (PMSNEVFR, FEAT_SPEv1p2) - see bitfield description below
 jitter=1 - use jitter to avoid resonance when sampling (PMSIRR.RND)
-load_filter=1 - collect loads only (PMSFCR.LD)
 min_latency=<n> - collect only samples with this latency or higher* (PMSLATFR)
 pa_enable=1 - collect physical address (as well as VA) of loads/stores (PMSCR.PA) - requires privilege
 pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
-store_filter=1 - collect stores only (PMSFCR.ST)
 ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
 discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
+inv_data_src_filter=<mask> - mask to filter from 0-63 possible data sources (PMSDSFR, FEAT_SPE_FDS) - See 'Data source filtering'
 
 +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
 than only the execution latency.
 
-Only some events can be filtered on; these include:
+Only some events can be filtered on using 'event_filter' bits. The overall
+filter is the logical AND of these bits, for example if bits 3 and 5 are set
+only samples that have both 'L1D cache refill' AND 'TLB walk' are recorded. When
+FEAT_SPEv1p2 is implemented 'inv_event_filter' can also be used to exclude
+events that have any (OR) of the filter's bits set. For example setting bits 3
+and 5 in 'inv_event_filter' will exclude any events that are either L1D cache
+refill OR TLB walk. If the same bit is set in both filters it's UNPREDICTABLE
+whether the sample is included or excluded. Filter bits for both event_filter
+and inv_event_filter are:
 
-bit 1 - instruction retired (i.e. omit speculative instructions)
+bit 1 - Instruction retired (i.e. omit speculative instructions)
+bit 2 - L1D access (FEAT_SPEv1p4)
 bit 3 - L1D refill
+bit 4 - TLB access (FEAT_SPEv1p4)
 bit 5 - TLB refill
-bit 7 - mispredict
-bit 11 - misaligned access
+bit 6 - Not taken event (FEAT_SPEv1p2)
+bit 7 - Mispredict
+bit 8 - Last level cache access (FEAT_SPEv1p4)
+bit 9 - Last level cache miss (FEAT_SPEv1p4)
+bit 10 - Remote access (FEAT_SPEv1p4)
+bit 11 - Misaligned access (FEAT_SPEv1p1)
+bit 12-15 - IMPLEMENTATION DEFINED events (when implemented)
+bit 16 - Transaction (FEAT_TME)
+bit 17 - Partial or empty SME or SVE predicate (FEAT_SPEv1p1)
+bit 18 - Empty SME or SVE predicate (FEAT_SPEv1p1)
+bit 19 - L2D access (FEAT_SPEv1p4)
+bit 20 - L2D miss (FEAT_SPEv1p4)
+bit 21 - Cache data modified (FEAT_SPEv1p4)
+bit 22 - Recently fetched (FEAT_SPEv1p4)
+bit 23 - Data snooped (FEAT_SPEv1p4)
+bit 24 - Streaming SVE mode event (when FEAT_SPE_SME is implemented), or
+IMPLEMENTATION DEFINED event 24 (when implemented, only versions
+less than FEAT_SPEv1p4)
+bit 25 - SMCU or external coprocessor operation event when FEAT_SPE_SME is
+implemented, or IMPLEMENTATION DEFINED event 25 (when implemented,
+only versions less than FEAT_SPEv1p4)
+bit 26-31 - IMPLEMENTATION DEFINED events (only versions less than FEAT_SPEv1p4)
+bit 48-63 - IMPLEMENTATION DEFINED events (when implemented)
+
+For IMPLEMENTATION DEFINED bits, refer to the CPU TRM if these bits are
+implemented.
+
+The driver will reject events if requested filter bits require unimplemented SPE
+versions, but will not reject filter bits for unimplemented IMPDEF bits or when
+their related feature is not present (e.g. SME). For example, if FEAT_SPEv1p2 is
+not implemented, filtering on "Not taken event" (bit 6) will be rejected.
 
 So to sample just retired instructions:
 
···
 or just mispredicted branches:
 
 perf record -e arm_spe/event_filter=0x80/ -- ./mybench
+
+When set, the following filters can be used to select samples that match any of
+the operation types (OR filtering). If only one is set then only samples of that
+type are collected:
+
+branch_filter=1 - Collect branches (PMSFCR.B)
+load_filter=1 - Collect loads (PMSFCR.LD)
+store_filter=1 - Collect stores (PMSFCR.ST)
+
+When extended filtering is supported (FEAT_SPE_EFT), SIMD and float
+pointer operations can also be selected:
+
+simd_filter=1 - Collect SIMD loads, stores and operations (PMSFCR.SIMD)
+float_filter=1 - Collect floating point loads, stores and operations (PMSFCR.FP)
+
+When extended filtering is supported (FEAT_SPE_EFT), operation type filters can
+be changed to AND using _mask fields. For example samples could be selected if
+they are store AND SIMD by setting 'store_filter=1,simd_filter=1,
+store_filter_mask=1,simd_filter_mask=1'. The new masks are as follows:
+
+branch_filter_mask=1 - Change branch filter behavior from OR to AND (PMSFCR.Bm)
+load_filter_mask=1 - Change load filter behavior from OR to AND (PMSFCR.LDm)
+store_filter_mask=1 - Change store filter behavior from OR to AND (PMSFCR.STm)
+simd_filter_mask=1 - Change SIMD filter behavior from OR to AND (PMSFCR.SIMDm)
+float_filter_mask=1 - Change floating point filter behavior from OR to AND (PMSFCR.FPm)
 
 Viewing the data
 ~~~~~~~~~~~~~~~~~
···
 Memory access details are also stored on the samples and this can be viewed with:
 
 perf report --mem-mode
+
+The latency value from the SPE sample is stored in the 'weight' field of the
+Perf samples and can be displayed in Perf script and report outputs by enabling
+its display from the command line.
 
 Common errors
 ~~~~~~~~~~~~~
···
 
 perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
 perf stat -e SAMPLE_FEED_LD
+
+Data source filtering
+~~~~~~~~~~~~~~~~~~~~~
+
+When FEAT_SPE_FDS is present, 'inv_data_src_filter' can be used as a mask to
+filter on a subset (0 - 63) of possible data source IDs. The full range of data
+sources is 0 - 65535 although these are unlikely to be used in practice. Data
+sources are IMPDEF so refer to the TRM for the mappings. Each bit N of the
+filter maps to data source N. The filter is an OR of all the bits, and the value
+provided inv_data_src_filter is inverted before writing to PMSDSFR_EL1 so that
+set bits exclude that data source and cleared bits include that data source.
+Therefore the default value of 0 is equivalent to no filtering (all data sources
+included).
+
+For example, to include only data sources 0 and 3, clear bits 0 and 3
+(0xFFFFFFFFFFFFFFF6)
+
+When 'inv_data_src_filter' is set to 0xFFFFFFFFFFFFFFFF, any samples with any
+data source set are excluded.
 
 SEE ALSO
 --------