Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf trace: Allow selecting the use of the ordered_events code

I was trigger happy on this one: using ordered_events, as implemented by
Jiri for the --block code under discussion on lkml, incurs a delay,
since processing has to form batches that then get ordered and only
then printed.

With 'perf trace' we want to process the events as they arrive, without
that delay, and doing it that way works well for the common case, which
is tracing a thread or a workload started by 'perf trace' itself.

So revert back to not using ordered_events by default, but add an option
to select that mode, so that users can experiment with their particular
use case and see if it works better, i.e. if the added delay is not a
problem and the ordering helps.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-8ki7sld6rusnjhhtaly26i5o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

+25 -6
tools/perf/Documentation/perf-trace.txt  +6 -0
···
 because the file may be huge. A time out is needed in such cases.
 This option sets the time out limit. The default value is 500 ms.
 
+--sort-events::
+	Do sorting on batches of events, use when noticing out of order events that
+	may happen, for instance, when a thread gets migrated to a different CPU
+	while processing a syscall.
+
+
 PAGEFAULTS
 ----------
tools/perf/builtin-trace.c  +19 -6
···
 	} stats;
 	unsigned int		max_stack;
 	unsigned int		min_stack;
+	bool			sort_events;
 	bool			raw_augmented_syscalls;
 	bool			not_ev_qualifier;
 	bool			live;
···
 	return 0;
 }
 
-static int trace__flush_events(struct trace *trace)
+static int __trace__flush_events(struct trace *trace)
 {
 	u64 first = ordered_events__first_time(&trace->oe.data);
 	u64 flush = trace->oe.last - NSEC_PER_SEC;
···
 	return 0;
 }
 
+static int trace__flush_events(struct trace *trace)
+{
+	return !trace->sort_events ? 0 : __trace__flush_events(trace);
+}
+
 static int trace__deliver_event(struct trace *trace, union perf_event *event)
 {
-	struct perf_evlist *evlist = trace->evlist;
 	int err;
 
-	err = perf_evlist__parse_sample_timestamp(evlist, event, &trace->oe.last);
+	if (!trace->sort_events)
+		return __trace__deliver_event(trace, event);
+
+	err = perf_evlist__parse_sample_timestamp(trace->evlist, event, &trace->oe.last);
 	if (err && err != -1)
 		return err;
···
 
 	perf_evlist__disable(evlist);
 
-	ordered_events__flush(&trace->oe.data, OE_FLUSH__FINAL);
+	if (trace->sort_events)
+		ordered_events__flush(&trace->oe.data, OE_FLUSH__FINAL);
 
 	if (!err) {
 		if (trace->summary)
···
 		    "Set the maximum stack depth when parsing the callchain, "
 		    "anything beyond the specified depth will be ignored. "
 		    "Default: kernel.perf_event_max_stack or " __stringify(PERF_MAX_STACK_DEPTH)),
+	OPT_BOOLEAN(0, "sort-events", &trace.sort_events,
+		    "Sort batch of events before processing, use if getting out of order events"),
 	OPT_BOOLEAN(0, "print-sample", &trace.print_sample,
 		    "print the PERF_RECORD_SAMPLE PERF_SAMPLE_ info, for debugging"),
 	OPT_UINTEGER(0, "proc-map-timeout", &proc_map_timeout,
···
 		}
 	}
 
-	ordered_events__init(&trace.oe.data, ordered_events__deliver_event, &trace);
-	ordered_events__set_copy_on_queue(&trace.oe.data, true);
+	if (trace.sort_events) {
+		ordered_events__init(&trace.oe.data, ordered_events__deliver_event, &trace);
+		ordered_events__set_copy_on_queue(&trace.oe.data, true);
+	}
 
 	/*
 	 * If we are augmenting syscalls, then combine what we put in the