Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf tools: Document --children option in more detail

As the --children option changes the output of perf report (and perf
top), it sometimes confuses users. Add more words and examples to help
users understand the option's behavior - and how to disable it ;-).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1429684425-14987-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Namhyung Kim and committed by Arnaldo Carvalho de Melo
dd309207 c4fa0d9c

+114 -1
+108
tools/perf/Documentation/callchain-overhead-calculation.txt
+Overhead calculation
+--------------------
+The overhead can be shown in two columns as 'Children' and 'Self' when
+perf collects callchains. The 'self' overhead is simply calculated by
+adding all period values of the entry - usually a function (symbol).
+This is the value that perf shows traditionally, and the sum of all the
+'self' overhead values should be 100%.
+
+The 'children' overhead is calculated by adding the period values of
+the entry itself and all of its child functions, so that it can show
+the total overhead of higher-level functions even if they don't
+directly execute much. 'Children' here means functions that are
+called from another (parent) function.
+
+It might be confusing that the sum of all the 'children' overhead
+values exceeds 100% since each of them is already an accumulation of
+the 'self' overhead of its child functions. But with this enabled, users
+can find which function has the most overhead even if samples are
+spread over the children.
+
+Consider the following example with three functions:
+
+-----------------------
+void foo(void) {
+    /* do something */
+}
+
+void bar(void) {
+    /* do something */
+    foo();
+}
+
+int main(void) {
+    bar();
+    return 0;
+}
+-----------------------
+
+In this case 'foo' is a child of 'bar', and 'bar' is an immediate
+child of 'main', so 'foo' is also a child of 'main'. In other words,
+'main' is a parent of 'foo' and 'bar', and 'bar' is a parent of 'foo'.
+
+Suppose all samples are recorded in 'foo' and 'bar' only. When they are
+recorded with callchains, the usual (self-overhead-only) output of
+perf report will show something like this:
+
+----------------------------------
+Overhead  Symbol
+........  .....................
+  60.00%  foo
+          |
+          --- foo
+              bar
+              main
+              __libc_start_main
+
+  40.00%  bar
+          |
+          --- bar
+              main
+              __libc_start_main
+----------------------------------
+
+When the --children option is enabled, the 'self' overhead values of
+child functions (i.e. 'foo' and 'bar') are added to the parents to
+calculate the 'children' overhead. In this case the report could be
+displayed as:
+
+-------------------------------------------
+Children      Self  Symbol
+........  ........  ....................
+ 100.00%     0.00%  __libc_start_main
+          |
+          --- __libc_start_main
+
+ 100.00%     0.00%  main
+          |
+          --- main
+              __libc_start_main
+
+ 100.00%    40.00%  bar
+          |
+          --- bar
+              main
+              __libc_start_main
+
+  60.00%    60.00%  foo
+          |
+          --- foo
+              bar
+              main
+              __libc_start_main
+-------------------------------------------
+
+In the above output, the 'self' overhead of 'foo' (60%) was added to the
+'children' overhead of 'bar', 'main' and '\_\_libc_start_main'.
+Likewise, the 'self' overhead of 'bar' (40%) was added to the
+'children' overhead of 'main' and '\_\_libc_start_main'.
+
+So '\_\_libc_start_main' and 'main' are shown first since they have the
+same (100%) 'children' overhead (even though they have zero 'self'
+overhead) and they are the parents of 'foo' and 'bar'.
+
+Since v3.16, the 'children' overhead is shown by default and the output
+is sorted by its values. The 'children' overhead can be disabled by
+specifying the --no-children option on the command line or by adding
+'report.children = false' or 'top.children = false' to the perf config
+file.
+4
tools/perf/Documentation/perf-report.txt
···
     Accumulate callchain of children to parent entry so that they can
     show up in the output. The output will have a new "Children" column
     and will be sorted on the data. It requires callchains to be recorded.
+    See the `overhead calculation' section for more details.
 
 --max-stack::
     Set the stack depth limit when parsing the callchain, anything
···
 
 --header-only::
     Show only perf.data header (forces --stdio).
+
+
+include::callchain-overhead-calculation.txt[]
 
 SEE ALSO
 --------
+2 -1
tools/perf/Documentation/perf-top.txt
···
     Accumulate callchain of children to parent entry so that they can
     show up in the output. The output will have a new "Children" column
     and will be sorted on the data. It requires the -g/--call-graph option
-    enabled.
+    enabled. See the `overhead calculation' section for more details.
 
 --max-stack::
     Set the stack depth limit when parsing the callchain, anything
···
 
 Pressing any unmapped key displays a menu, and prompts for input.
 
+include::callchain-overhead-calculation.txt[]
 
 SEE ALSO
 --------