Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf: Add ITRACE_START record to indicate that tracing has started

For counters that generate AUX data that is bound to the context of a
running task, such as instruction tracing, the decoder needs to know
exactly which task is running when the event is first scheduled in,
before the first sched_switch. The decoder needs this because decoding
an instruction flow trace almost always requires the program's object
code to reconstruct the flow, and locating that code requires at least
the task's pid/tid in the perf stream.

To single out such instruction tracing pmus, this patch introduces the
ITRACE PMU capability (PERF_PMU_CAP_ITRACE). The reason this is not part
of the RECORD_AUX record is that not all pmus capable of generating AUX
data need this, and the opposite is *probably* also true.

While sched_switch covers most cases, there are two problems with it:
the consumer will need to process events out of order (that is, having
found RECORD_AUX, it will have to skip forward to the nearest sched_switch
to figure out which task it was, then go back to the actual trace to
decode it), and it completely misses the case when tracing is enabled
and disabled before sched_switch, for example, via PERF_EVENT_IOC_DISABLE.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-15-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
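
The ordering problem described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_record`, `toy_decoder` and `toy_decoder_feed` are hypothetical names that do not exist in perf, and real perf records are variable-length rather than fixed structs. The two record type values are the ones given in include/uapi/linux/perf_event.h by this patch. The point is that once an ITRACE_START record precedes the AUX data, a consumer can attribute that data to a task in a single forward pass.

```c
#include <assert.h>
#include <stdint.h>

/* Record type values as listed in include/uapi/linux/perf_event.h. */
enum { REC_AUX = 11, REC_ITRACE_START = 12 };

/* Hypothetical, simplified record; real perf records are variable-length. */
struct toy_record {
	uint32_t type;
	uint32_t pid;	/* only meaningful for REC_ITRACE_START */
	uint32_t tid;	/* only meaningful for REC_ITRACE_START */
};

/* Hypothetical decoder state: which task owns the trace data seen next. */
struct toy_decoder {
	int have_task;		/* set once an ITRACE_START record was seen */
	uint32_t pid, tid;
	unsigned attributed;	/* AUX chunks attributed with no look-ahead */
	unsigned unattributed;	/* AUX chunks seen before any task was known */
};

/* Consume one record strictly in stream order. */
static void toy_decoder_feed(struct toy_decoder *d, const struct toy_record *r)
{
	assert(d && r);

	switch (r->type) {
	case REC_ITRACE_START:
		d->have_task = 1;
		d->pid = r->pid;
		d->tid = r->tid;
		break;
	case REC_AUX:
		if (d->have_task)
			d->attributed++;
		else
			d->unattributed++;	/* would need a later sched_switch */
		break;
	default:
		break;			/* other record types are ignored here */
	}
}
```

Feeding an ITRACE_START record followed by an AUX record leaves `attributed` at 1 and `unattributed` at 0. Feeding the AUX record first leaves it unattributed, which is the situation the commit message describes for a stream that relies on sched_switch alone.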

Authored by Alexander Shishkin and committed by Ingo Molnar
ec0d7729 1a594131

+56
+4
include/linux/perf_event.h
···
 		struct list_head cqm_groups_entry;
 		struct list_head cqm_group_entry;
 	};
+	struct { /* itrace */
+		int itrace_started;
+	};
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 	struct { /* breakpoint */
 		/*
···
 #define PERF_PMU_CAP_AUX_NO_SG 0x04
 #define PERF_PMU_CAP_AUX_SW_DOUBLEBUF 0x08
 #define PERF_PMU_CAP_EXCLUSIVE 0x10
+#define PERF_PMU_CAP_ITRACE 0x20

 /**
  * struct pmu - generic performance monitoring unit
+11
include/uapi/linux/perf_event.h
···
 	 */
 	PERF_RECORD_AUX = 11,

+	/*
+	 * Indicates that instruction trace has started
+	 *
+	 * struct {
+	 *	struct perf_event_header header;
+	 *	u32 pid;
+	 *	u32 tid;
+	 * };
+	 */
+	PERF_RECORD_ITRACE_START = 12,
+
 	PERF_RECORD_MAX, /* non-ABI */
 };

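
As an illustration of the record layout documented in the comment above, here is a minimal userspace sketch of pulling pid/tid out of a raw record. It makes several assumptions: the `sketch_*` names are hypothetical, the structs are local mirrors rather than the uapi definitions, the header is taken to be the usual `u32 type, u16 misc, u16 size` of `struct perf_event_header`, and the buffer is in host byte order. Because the kernel side may append sample-id fields after the fixed part, the sketch only requires `header.size` to be at least the size of the fixed part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RECORD_ITRACE_START 12	/* PERF_RECORD_ITRACE_START in this patch */

/* Local mirror of struct perf_event_header (assumed layout). */
struct sketch_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total record size, including any trailing sample id */
};

/* Local mirror of the fixed part of the ITRACE_START record. */
struct sketch_itrace_start {
	struct sketch_header header;
	uint32_t pid;
	uint32_t tid;
};

/*
 * Copy the fixed part of an ITRACE_START record out of buf.
 * Returns 0 on success, -1 if the record has another type or is truncated.
 */
static int sketch_parse_itrace_start(const void *buf, size_t len,
				     struct sketch_itrace_start *out)
{
	struct sketch_header hdr;

	assert(buf && out);

	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));	/* avoid unaligned access */

	if (hdr.type != SKETCH_RECORD_ITRACE_START)
		return -1;
	if (hdr.size < sizeof(*out) || hdr.size > len)
		return -1;

	memcpy(out, buf, sizeof(*out));
	return 0;
}
```

A caller walking a ring buffer would advance by `header.size` after each record, whether or not the parse succeeded, so that any trailing sample-id bytes are skipped.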
+41
kernel/events/core.c
···
 #define MAX_INTERRUPTS (~0ULL)

 static void perf_log_throttle(struct perf_event *event, int enable);
+static void perf_log_itrace_start(struct perf_event *event);

 static int
 event_sched_in(struct perf_event *event,
···
 	event->tstamp_running += tstamp - event->tstamp_stopped;

 	perf_set_shadow_time(event, ctx, tstamp);
+
+	perf_log_itrace_start(event);

 	if (event->pmu->add(event, PERF_EF_START)) {
 		event->state = PERF_EVENT_STATE_INACTIVE;
···

 	perf_output_put(&handle, throttle_event);
 	perf_event__output_id_sample(event, &handle, &sample);
+	perf_output_end(&handle);
+}
+
+static void perf_log_itrace_start(struct perf_event *event)
+{
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	struct perf_aux_event {
+		struct perf_event_header header;
+		u32 pid;
+		u32 tid;
+	} rec;
+	int ret;
+
+	if (event->parent)
+		event = event->parent;
+
+	if (!(event->pmu->capabilities & PERF_PMU_CAP_ITRACE) ||
+	    event->hw.itrace_started)
+		return;
+
+	event->hw.itrace_started = 1;
+
+	rec.header.type = PERF_RECORD_ITRACE_START;
+	rec.header.misc = 0;
+	rec.header.size = sizeof(rec);
+	rec.pid = perf_event_pid(event, current);
+	rec.tid = perf_event_tid(event, current);
+
+	perf_event_header__init_id(&rec.header, &sample, event);
+	ret = perf_output_begin(&handle, event, rec.header.size);
+
+	if (ret)
+		return;
+
+	perf_output_put(&handle, rec);
+	perf_event__output_id_sample(event, &handle, &sample);
+
 	perf_output_end(&handle);
 }

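
The gating at the top of perf_log_itrace_start() can be modelled in userspace to make its effect easier to see. This is a sketch, not kernel code: `model_event` and `model_log_itrace_start` are hypothetical names, the struct keeps only the three fields the check reads, and the return value stands in for "a record would be written". The capability value is the one this patch assigns to PERF_PMU_CAP_ITRACE.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PMU_CAP_ITRACE 0x20	/* PERF_PMU_CAP_ITRACE in this patch */

/* Hypothetical stand-in for the fields of struct perf_event used by the check. */
struct model_event {
	struct model_event *parent;	/* NULL for a top-level event */
	unsigned int pmu_capabilities;	/* models event->pmu->capabilities */
	int itrace_started;		/* models event->hw.itrace_started */
};

/* Returns 1 if this call would log an ITRACE_START record, 0 if it is skipped. */
static int model_log_itrace_start(struct model_event *event)
{
	assert(event);

	/* Inherited events report through their parent, as in the patch. */
	if (event->parent)
		event = event->parent;

	if (!(event->pmu_capabilities & MODEL_PMU_CAP_ITRACE) ||
	    event->itrace_started)
		return 0;

	/* Mark first, so later schedule-ins of this event stay silent. */
	event->itrace_started = 1;
	return 1;
}
```

With this model, the first schedule-in of a child event logs once on behalf of its parent, and every later call for the parent or the child returns 0. In the patch the flag is likewise set before perf_output_begin(), so a failed buffer reservation is not retried on the next schedule-in.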