Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

perf callchain: Fix incorrect ordering of entries

The existing implementation of thread__resolve_callchain, under certain
circumstances, can assemble callchain entries in the incorrect order.

The callchain entries are resolved incorrectly for a sample when all of
the following conditions are met:

1. callchain_param.order is set to ORDER_CALLER

2. thread__resolve_callchain_sample is able to resolve callchain entries
for the sample.

3. unwind__get_entries is also able to resolve callchain entries for the
sample.

The fix is accomplished by reversing the order in which
thread__resolve_callchain_sample and unwind__get_entries are called when
callchain_param.order is set to ORDER_CALLER.

Unwind specific code from thread__resolve_callchain is also moved into a
new static function to improve readability of the fix.

How to Reproduce the Existing Bug:

Modifying perf script to print call trees in the opposite order or
applying the remaining patches from this series and comparing the
results output from export-to-postgtresql.py are the easiest ways to see
the bug, however it can still be seen in current builds using perf
report.

Here is how i can reproduce the bug using perf report:

# perf record --call-graph=dwarf stress -c 1 -t 5

when i run this command:

# perf report --call-graph=flat,0,0,callee

This callchain, containing kernel (handle_irq_event, etc) and userspace
samples (__libc_start_main, etc) is contained in the output, which looks
correct (callee order):

gen8_irq_handler
handle_irq_event_percpu
handle_irq_event
handle_edge_irq
handle_irq
do_IRQ
ret_from_intr
__random
rand
0x558f2a04dded
0x558f2a04c774
__libc_start_main
0x558f2a04dcd9

Now run this command using caller order:

# perf report --call-graph=flat,0,0,caller

It is expected to see the exact reverse of the above when using caller
order (with "0x558f2a04dcd9" at the top and "gen8_irq_handler" at the
bottom) in the output, but it is nowhere to be found.

instead you see this:

ret_from_intr
do_IRQ
handle_irq
handle_edge_irq
handle_irq_event
handle_irq_event_percpu
gen8_irq_handler
0x558f2a04dcd9
__libc_start_main
0x558f2a04c774
0x558f2a04dded
rand
__random

Notice how internally the kernel symbols are reversed and the user space
symbols are reversed, but the kernel symbols still appear above the user
space symbols.

if this patch is applied and perf script is re-run, you will see the
expected output (with "0x558f2a04dcd9" at the top and "gen8_irq_handler"
at the bottom):

0x558f2a04dcd9
__libc_start_main
0x558f2a04c774
0x558f2a04dded
rand
__random
ret_from_intr
do_IRQ
handle_irq
handle_edge_irq
handle_irq_event
handle_irq_event_percpu
gen8_irq_handler

Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1461831551-12213-2-git-send-email-cphlipot0@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

authored by

Chris Phlipot and committed by
Arnaldo Carvalho de Melo
9919a65e 4c4d6e51

+41 -15
+41 -15
tools/perf/util/machine.c
··· 1817 1817 int skip_idx = -1; 1818 1818 int first_call = 0; 1819 1819 1820 - callchain_cursor_reset(cursor); 1821 - 1822 1820 if (perf_evsel__has_branch_callstack(evsel)) { 1823 1821 err = resolve_lbr_callchain_sample(thread, cursor, sample, parent, 1824 1822 root_al, max_stack); ··· 1927 1929 entry->map, entry->sym); 1928 1930 } 1929 1931 1930 - int thread__resolve_callchain(struct thread *thread, 1931 - struct callchain_cursor *cursor, 1932 - struct perf_evsel *evsel, 1933 - struct perf_sample *sample, 1934 - struct symbol **parent, 1935 - struct addr_location *root_al, 1936 - int max_stack) 1932 + static int thread__resolve_callchain_unwind(struct thread *thread, 1933 + struct callchain_cursor *cursor, 1934 + struct perf_evsel *evsel, 1935 + struct perf_sample *sample, 1936 + int max_stack) 1937 1937 { 1938 - int ret = thread__resolve_callchain_sample(thread, cursor, evsel, 1939 - sample, parent, 1940 - root_al, max_stack); 1941 - if (ret) 1942 - return ret; 1943 - 1944 1938 /* Can we do dwarf post unwind? */ 1945 1939 if (!((evsel->attr.sample_type & PERF_SAMPLE_REGS_USER) && 1946 1940 (evsel->attr.sample_type & PERF_SAMPLE_STACK_USER))) ··· 1945 1955 1946 1956 return unwind__get_entries(unwind_entry, cursor, 1947 1957 thread, sample, max_stack); 1958 + } 1948 1959 1960 + int thread__resolve_callchain(struct thread *thread, 1961 + struct callchain_cursor *cursor, 1962 + struct perf_evsel *evsel, 1963 + struct perf_sample *sample, 1964 + struct symbol **parent, 1965 + struct addr_location *root_al, 1966 + int max_stack) 1967 + { 1968 + int ret = 0; 1969 + 1970 + callchain_cursor_reset(&callchain_cursor); 1971 + 1972 + if (callchain_param.order == ORDER_CALLEE) { 1973 + ret = thread__resolve_callchain_sample(thread, cursor, 1974 + evsel, sample, 1975 + parent, root_al, 1976 + max_stack); 1977 + if (ret) 1978 + return ret; 1979 + ret = thread__resolve_callchain_unwind(thread, cursor, 1980 + evsel, sample, 1981 + max_stack); 1982 + } else { 1983 + ret = thread__resolve_callchain_unwind(thread, cursor, 1984 + evsel, sample, 1985 + max_stack); 1986 + if (ret) 1987 + return ret; 1988 + ret = thread__resolve_callchain_sample(thread, cursor, 1989 + evsel, sample, 1990 + parent, root_al, 1991 + max_stack); 1992 + } 1993 + 1994 + return ret; 1949 1995 } 1950 1996 1951 1997 int machine__for_each_thread(struct machine *machine,