Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf Documentation: Add some more hints to tips.txt

Add some (hopefully useful) hints to tips.txt
Also some minor corrections.

It would probably be good to make it a reviewer rule that if generally
useful options are added, the patch must add an example to tips.txt.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240131021352.151440-1-ak@linux.intel.com

Authored by Andi Kleen and committed by Namhyung Kim
366fb5f5 8f95b29c

+26 -5
tools/perf/Documentation/tips.txt
···
 Sample related events with: perf record -e '{cycles,instructions}:S'
 Compare performance results with: perf diff [<old file> <new file>]
 Boolean options have negative forms, e.g.: perf report --no-children
+To not accumulate CPU time of children symbols add --no-children
 Customize output of perf script with: perf script -F event,ip,sym
 Generate a script for your data: perf script -g <lang>
 Save output of perf stat using: perf stat record <target workload>
···
 To see list of saved events and attributes: perf evlist -v
 Use --symfs <dir> if your symbol files are in non-standard locations
 To see callchains in a more compact form: perf report -g folded
+To see call chains by final symbol taking CPU time (bottom up) use perf report -G
 Show individual samples with: perf script
 Limit to show entries above 5% only: perf report --percent-limit 5
 Profiling branch (mis)predictions with: perf record -b / perf report
-To show assembler sample contexts use perf record -b / perf script -F +brstackinsn --xed
-Treat branches as callchains: perf report --branch-history
-To count events in every 1000 msec: perf stat -I 1000
-Print event counts in CSV format with: perf stat -x,
+To show assembler sample context control flow use perf record -b / perf report --samples 10 and then browse context
+To adjust path to source files to local file system use perf report --prefix=... --prefix-strip=...
+Treat branches as callchains: perf record -b ... ; perf report --branch-history
+Show estimate cycles per function and IPC in annotate use perf record -b ... ; perf report --total-cycles
+To count events every 1000 msec: perf stat -I 1000
+Print event counts in machine readable CSV format with: perf stat -x\;
 If you have debuginfo enabled, try: perf report -s sym,srcline
 For memory address profiling, try: perf mem record / perf mem report
 For tracepoint events, try: perf report -s trace_fields
 To record callchains for each sample: perf record -g
+If call chains don't work try perf record --call-graph dwarf or --call-graph lbr
 To record every process run by a user: perf record -u <user>
+To show inline functions in call traces add --inline to perf report
+To not record events from perf itself add --exclude-perf
 Skip collecting build-id when recording: perf record -B
 To change sampling frequency to 100 Hz: perf record -F 100
+To show information about system the samples were collected on use perf report --header
+To only collect call graph on one event use perf record -e cpu/cpu-cycles,callgraph=1/,branches ; perf report --show-ref-call-graph
+To set sampling period of individual events use perf record -e cpu/cpu-cycles,period=100001/,cpu/branches,period=10001/ ...
+To group events which need to be collected together for accuracy use {}: perf record -e {cycles,branches}' ...
+To compute metrics for samples use perf record -e '{cycles,instructions}' ... ; perf script -F +metric
 See assembly instructions with percentage: perf annotate <symbol>
 If you prefer Intel style assembly, try: perf annotate -M intel
+When collecting LBR backtraces use --stitch-lbr to handle more than 32 deep entries: perf record --call-graph lbr ; perf report --stitch-lbr
 For hierarchical output, try: perf report --hierarchy
 Order by the overhead of source file name and line number: perf report -s srcline
 System-wide collection from all CPUs: perf record -a
 Show current config key-value pairs: perf config --list
+To collect Processor Trace with samples use perf record -e '{intel_pt//,cycles}' ; perf script --call-trace or --insn-trace --xed -F +ipc (remove --xed if no xed)
+To trace calls using Processor Trace use perf record -e intel_pt// ... ; perf script --call-trace. Then use perf script --time A-B --insn-trace to look at region of interest.
+To measure approximate function latency with Processor Trace use perf record -e intel_pt// ... ; perf script --call-ret-trace
+To trace only single function with Processor Trace use perf record --filter 'filter func @ program' -e intel_pt//u ./program ; perf script --insn-trace
 Show user configuration overrides: perf config --user --list
 To add Node.js USDT(User-Level Statically Defined Tracing): perf buildid-cache --add `which node`
-To report cacheline events from previous recording: perf c2c report
+To analyze cache line scalability issues use perf c2c record ... ; perf c2c report
 To browse sample contexts use perf report --sample 10 and select in context menu
 To separate samples by time use perf report --sort time,overhead,sym
+To filter subset of samples with report or script add --time X-Y or --cpu A,B,C or --socket-filter ...
 To set sample time separation other than 100ms with --sort time use --time-quantum
 Add -I to perf record to sample register values, which will be visible in perf report sample context.
 To show IPC for sampling periods use perf record -e '{cycles,instructions}:S' and then browse context
 To show context switches in perf report sample context add --switch-events to perf record.
+To show time in nanoseconds in record/report add --ns
+To compare hot regions in two workloads use perf record -b -o file ... ; perf diff --stream file1 file2
+To compare scalability of two workload samples use perf diff -c ratio file1 file2
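
As an illustration of the "machine readable CSV" tip (perf stat -x\;), here is a minimal sketch of reading such output in a shell script. It assumes the field order described in perf-stat(1) for a run without -I and without per-CPU aggregation (counter value, unit, event name, then run time and further fields). The helper name parse_perf_csv and the sample numbers are made up for this example and are not part of perf or of this commit.

```shell
#!/bin/sh
# Sketch: extract "event value" pairs from `perf stat -x\;` output.
# Assumption: no -I and no per-CPU aggregation, so each data line looks like
#   value;unit;event;run-time;percent-running;...
# parse_perf_csv is a hypothetical helper name, not something perf provides.
parse_perf_csv() {
    # Skip blank lines and '#' comment lines; print event name, then value.
    awk -F';' 'NF >= 3 && $1 !~ /^#/ { print $3, $1 }'
}

# Illustrative input with made-up numbers, standing in for a file written by:
#   perf stat -x\; -e cycles,instructions -o out.csv <target workload>
printf '%s\n' \
    '1000;;cycles;2000;100.00;;' \
    '2500;;instructions;2000;100.00;2.50;insn per cycle' | parse_perf_csv
```

The semicolon is escaped as \; on the command line so that the shell passes it to perf instead of treating it as a command separator. A semicolon is also less likely than a comma to collide with event names, which may contain commas.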