perf-mem(1)
===========

NAME
----
perf-mem - Profile memory accesses

SYNOPSIS
--------
[verse]
'perf mem' [<options>] (record [<command>] | report)

DESCRIPTION
-----------
"perf mem record" runs a command and gathers memory operation data
from it, into perf.data. Perf record options are accepted and are passed through.

"perf mem report" displays the result. It invokes perf report with the
right set of options to display a memory access profile. By default, loads
and stores are sampled. Use the -t option to limit to loads or stores.

Note that on Intel systems the memory latency reported is the use-latency,
not the pure load (or store) latency. Use-latency includes any pipeline
queuing delays in addition to the memory subsystem latency.

On Arm64 this uses SPE to sample load and store operations, therefore hardware
and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
Due to the statistical nature of SPE sampling, not every memory operation will
be sampled.

On AMD this uses the IBS Op PMU to sample load and store operations.

COMMON OPTIONS
--------------
-f::
--force::
	Don't do ownership validation

-t::
--type=<type>::
	Select the memory operation type: load or store (default: load,store)

-v::
--verbose::
	Be more verbose (show counter open errors, etc)

-p::
--phys-data::
	Record/Report sample physical addresses

--data-page-size::
	Record/Report sample data address page size

RECORD OPTIONS
--------------
<command>...::
	Any command you can specify in a shell.

-e::
--event <event>::
	Event selector. Use 'perf mem record -e list' to list available events.

-K::
--all-kernel::
	Configure all used events to run in kernel space.

-U::
--all-user::
	Configure all used events to run in user space.

--ldlat <n>::
	Specify the desired latency for load events. Supported on Intel, Arm64 and
	some AMD processors. Ignored on other archs.

	On supported AMD processors:
	- The file /sys/bus/event_source/devices/ibs_op/caps/ldlat contains '1'.
	- Supported latency values are 128 to 2048 (both inclusive).
	- A latency value that is a multiple of 128 incurs slightly less profiling
	  overhead than other values.
	- Load latency filtering is disabled by default.
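The AMD constraints on --ldlat listed above can be checked before starting a recording. The following is only a minimal sketch: the helper names `amd_ldlat_supported` and `check_amd_ldlat` are hypothetical and not part of perf, and the only facts taken from this page are the sysfs path, the 128 to 2048 range and the multiple-of-128 hint.

```python
from pathlib import Path
from typing import Tuple

# Capability file named in the --ldlat description above.
IBS_LDLAT_CAP = Path("/sys/bus/event_source/devices/ibs_op/caps/ldlat")

LDLAT_MIN = 128   # lowest supported value (inclusive)
LDLAT_MAX = 2048  # highest supported value (inclusive)
LDLAT_STEP = 128  # multiples of this incur slightly less overhead


def amd_ldlat_supported(cap_file: Path = IBS_LDLAT_CAP) -> bool:
    """Hypothetical helper: True if the capability file contains '1'."""
    try:
        return cap_file.read_text().strip() == "1"
    except OSError:
        # Missing or unreadable file: treat as unsupported.
        return False


def check_amd_ldlat(value: int) -> Tuple[bool, bool]:
    """Hypothetical helper: return (in_range, low_overhead) for a value."""
    in_range = LDLAT_MIN <= value <= LDLAT_MAX
    low_overhead = in_range and value % LDLAT_STEP == 0
    return in_range, low_overhead


if __name__ == "__main__":
    print("ldlat filtering available:", amd_ldlat_supported())
    for candidate in (64, 128, 200, 2048):
        print(candidate, check_amd_ldlat(candidate))
```

A value such as 200 is accepted but falls outside the lower-overhead set, while 64 lies below the supported range.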

REPORT OPTIONS
--------------
-i::
--input=<file>::
	Input file name.

-C::
--cpu=<cpu>::
	Monitor only on the list of CPUs provided. Multiple CPUs can be provided as a
	comma-separated list with no space: 0,1. Ranges of CPUs are specified with -
	like 0-2. Default is to monitor all CPUs.

-D::
--dump-raw-samples::
	Dump the raw decoded samples on the screen in a format that is easy to parse,
	with one sample per line.

-s::
--sort=<key>::
	Group result by given key(s) - multiple keys can be specified
	in CSV format. The keys specific to memory samples are:
	symbol_daddr, symbol_iaddr, dso_daddr, locked, tlb, mem, snoop,
	dcacheline, phys_daddr, data_page_size, blocked.

	- symbol_daddr: name of data symbol being executed on at the time of sample
	- symbol_iaddr: name of code symbol being executed on at the time of sample
	- dso_daddr: name of library or module containing the data being executed
	  on at the time of the sample
	- locked: whether the bus was locked at the time of the sample
	- tlb: type of tlb access for the data at the time of the sample
	- mem: type of memory access for the data at the time of the sample
	- snoop: type of snoop (if any) for the data at the time of the sample
	- dcacheline: the cacheline the data address is on at the time of the sample
	- phys_daddr: physical address of data being executed on at the time of sample
	- data_page_size: the data page size of data being executed on at the time of sample
	- blocked: reason for blocked load access for the data at the time of the sample

	The default sort keys are changed to local_weight, mem, sym, dso,
	symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.

-F::
--fields=::
	Specify output field - multiple keys can be specified in CSV format.
	Please see linkperf:perf-report[1] for details.

	In addition to the default fields, 'perf mem report' will provide the
	following fields to break down sample periods.

	- op: operation in the sample instruction (load, store, prefetch, ...)
	- cache: location in CPU cache (L1, L2, ...) where the sample hit
	- mem: location in memory or other places the sample hit
	- dtlb: location in Data TLB (L1, L2) where the sample hit
	- snoop: snoop result for the sampled data access

	Please take a look at the OUTPUT FIELD SELECTION section for caveats.

-T::
--type-profile::
	Show data-type profile result instead of code symbols. This requires
	the debug information and it will change the default sort keys to:
	mem, snoop, tlb, type.

-U::
--hide-unresolved::
	Only display entries resolved to a symbol.

-x::
--field-separator=<separator>::
	Specify the field separator used when dumping raw samples (-D option). By
	default, the separator is the space character.

In addition, for report all perf report options are valid, and for record
all perf record options.
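The CPU list syntax described for -C/--cpu above (comma-separated CPUs with no spaces, and ranges written with '-') can be illustrated with a small parser. This is only a sketch of the syntax as documented here, not perf's own parser: the function name `parse_cpu_list` is hypothetical, and it assumes single CPUs and ranges may be mixed in one list.

```python
from typing import List


def parse_cpu_list(spec: str) -> List[int]:
    """Hypothetical helper: expand a list such as '0,1' or '0-2'."""
    if any(ch.isspace() for ch in spec):
        # The option text asks for a list with no spaces.
        raise ValueError("CPU list must not contain whitespace")
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            low, high = int(first), int(last)
            if low > high:
                raise ValueError("invalid CPU range: " + part)
            cpus.update(range(low, high + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    print(parse_cpu_list("0,1"))  # [0, 1]
    print(parse_cpu_list("0-2"))  # [0, 1, 2]
```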

OVERHEAD CALCULATION
--------------------
Unlike linkperf:perf-report[1], which calculates overhead from the actual
sample period, perf-mem overhead is calculated using sample weight. For
example, suppose the perf.data file contains two samples, both with the same
sample period, but one sample with weight 180 and the other with weight 20:

  $ perf script -F period,data_src,weight,ip,sym
  100000    629080842 |OP LOAD|LVL L3 hit|...     20       7e69b93ca524 strcmp
  100000   1a29081042 |OP LOAD|LVL RAM hit|...   180   ffffffff82429168 memcpy

  $ perf report -F overhead,symbol
  50%   [.] strcmp
  50%   [k] memcpy

  $ perf mem report -F overhead,symbol
  90%   [k] memcpy
  10%   [.] strcmp
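The arithmetic behind the two reports above can be reproduced by hand. The sketch below uses only the period and weight values from the example; the `overhead_by` helper and the dictionary layout are illustrative assumptions, not perf internals.

```python
# The two samples from the example above.
samples = [
    {"symbol": "strcmp", "period": 100000, "weight": 20},
    {"symbol": "memcpy", "period": 100000, "weight": 180},
]


def overhead_by(samples, key):
    """Hypothetical helper: percentage share per symbol, summed over `key`."""
    total = sum(sample[key] for sample in samples)
    shares = {}
    for sample in samples:
        symbol = sample["symbol"]
        shares[symbol] = shares.get(symbol, 0) + sample[key]
    return {symbol: 100 * value / total for symbol, value in shares.items()}


if __name__ == "__main__":
    # Period-based overhead, as in 'perf report'.
    print(overhead_by(samples, "period"))  # {'strcmp': 50.0, 'memcpy': 50.0}
    # Weight-based overhead, as in 'perf mem report'.
    print(overhead_by(samples, "weight"))  # {'strcmp': 10.0, 'memcpy': 90.0}
```

Because both samples share the same period, the period-based split is even; the weights 20 and 180 out of a total of 200 give the 10% and 90% shown by 'perf mem report'.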

OUTPUT FIELD SELECTION
----------------------
"perf mem report" adds a number of new output fields specific to data source
information in the sample. Some of them have the same names as existing
sort keys ("mem" and "snoop"). Unlike other fields and sort keys, these two
behave differently depending on whether they are used with -F/--fields or
-s/--sort.

Using those two as output fields aggregates all samples together and shows a
breakdown.

  $ perf mem report -F mem,snoop
  ...
  # ------ Memory -------  --- Snoop ----
  #    RAM  Uncach   Other    HitM   Other
  # .....................  ..............
  #
       3.5%    0.0%   96.5%   25.1%   74.9%

Using the same names as sort keys instead aggregates samples for each type
separately.

  $ perf mem report -s mem,snoop
  # Overhead       Samples  Memory access                            Snoop
  # ........  ............  .......................................  ............
  #
      47.99%          1509  L2 hit                                   N/A
      25.08%           338  core, same node Any cache hit            HitM
      10.24%         54374  N/A                                      N/A
       6.77%         35938  L1 hit                                   N/A
       6.39%           101  core, same node Any cache hit            N/A
       3.50%            69  RAM hit                                  N/A
       0.03%           158  LFB/MAB hit                              N/A
       0.00%             2  Uncached hit                             N/A
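The two outputs above are consistent with each other: summing the per-row overheads of the -s output by column category gives the single breakdown line of the -F output. The sketch below shows that arithmetic. The mapping from "Memory access" strings to the RAM, Uncach and Other columns is an assumption inferred from the numbers shown here, not taken from the perf source, and all helper names are hypothetical.

```python
# (overhead in hundredths of a percent, memory access, snoop), copied from
# the '-s mem,snoop' output above.
rows = [
    (4799, "L2 hit", "N/A"),
    (2508, "core, same node Any cache hit", "HitM"),
    (1024, "N/A", "N/A"),
    (677, "L1 hit", "N/A"),
    (639, "core, same node Any cache hit", "N/A"),
    (350, "RAM hit", "N/A"),
    (3, "LFB/MAB hit", "N/A"),
    (0, "Uncached hit", "N/A"),
]


def mem_column(access):
    """Assumed mapping from a memory access string to a '-F mem' column."""
    if access == "RAM hit":
        return "RAM"
    if access == "Uncached hit":
        return "Uncach"
    return "Other"


def snoop_column(snoop):
    """Assumed mapping from a snoop string to a '-F snoop' column."""
    return "HitM" if snoop == "HitM" else "Other"


def breakdown(rows, index, to_column):
    """Sum overhead per column and return percentages with one decimal."""
    totals = {}
    for row in rows:
        column = to_column(row[index])
        totals[column] = totals.get(column, 0) + row[0]
    return {column: round(value / 100, 1) for column, value in totals.items()}


if __name__ == "__main__":
    print(breakdown(rows, 1, mem_column))
    print(breakdown(rows, 2, snoop_column))
```

Under these assumptions the results match the -F line: 3.5%, 0.0% and 96.5% for the memory columns, and 25.1% and 74.9% for the snoop columns.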

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]