Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU

Documentation for vpa-dtl (Virtual Processor Area - Dispatch Trace Log)
PMU interface. And how it can be used to collect the distrace trace log
entries in perf data, how to process/report as part of perf report/perf
script.

Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-by: Tejas Manhas <tejas05@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250915102947.26681-8-atrajeev@linux.ibm.com

authored by

Athira Rajeev and committed by
Madhavan Srinivasan
4a774b39 b5e71caf

+157
+1
Documentation/arch/powerpc/index.rst
··· 37 37 vas-api 38 38 vcpudispatch_stats 39 39 vmemmap_dedup 40 + vpa-dtl 40 41 41 42 features 42 43
+156
Documentation/arch/powerpc/vpa-dtl.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + .. _vpa-dtl: 3 + 4 + =================================== 5 + DTL (Dispatch Trace Log) 6 + =================================== 7 + 8 + Athira Rajeev, 19 April 2025 9 + 10 + .. contents:: 11 + :depth: 3 12 + 13 + 14 + Basic overview 15 + ============== 16 + 17 + The pseries Shared Processor Logical Partition(SPLPAR) machines can 18 + retrieve a log of dispatch and preempt events from the hypervisor 19 + using data from Disptach Trace Log(DTL) buffer. With this information, 20 + user can retrieve when and why each dispatch & preempt has occurred. 21 + The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters 22 + via perf. 23 + 24 + Infrastructure used 25 + =================== 26 + 27 + The VPA DTL PMU counters do not interrupt on overflow or generate any 28 + PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer 29 + nterval can be provided by user via sample_period field in nano seconds. 30 + vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch 31 + Trace Log) contains information about dispatch/preempt, enqueue time etc. 32 + We directly copy the DTL buffer data as part of auxiliary buffer and it 33 + will be processed later. This will avoid time taken to create samples 34 + in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL) 35 + entries makes use of AUX support in perf infrastructure. On the tools side, 36 + this data is made available as PERF_RECORD_AUXTRACE records. 37 + 38 + To correlate each DTL entry with other events across CPU's, an auxtrace_queue 39 + is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers. 40 + All auxtrace queues is maintained in auxtrace heap. The queues are sorted 41 + based on timestamp. When the different PERF_RECORD_XX records are processed, 42 + compare the timestamp of perf record with timestamp of top element in the 43 + auxtrace heap so that DTL events can be co-related with other events 44 + Process the auxtrace queue if the timestamp of element from heap is 45 + lower than timestamp from entry in perf record. Sometimes it could happen that 46 + one buffer is only partially processed. if the timestamp of occurrence of 47 + another event is more than currently processed element in the queue, it will 48 + move on to next perf record. So keep track of position of buffer to continue 49 + processing next time. Update the timestamp of the auxtrace heap with the timestamp 50 + of last processed entry from the auxtrace buffer. 51 + 52 + This infrastructure ensures dispatch trace log entries can be correlated 53 + and presented along with other events like sched. 54 + 55 + vpa-dtl PMU example usage 56 + ========================= 57 + 58 + .. code-block:: sh 59 + 60 + # ls /sys/devices/vpa_dtl/ 61 + events format perf_event_mux_interval_ms power subsystem type uevent 62 + 63 + 64 + To capture the DTL data using perf record: 65 + .. code-block:: sh 66 + 67 + # ./perf record -a -e sched:\*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1 68 + 69 + The result can be interpreted using perf record. Snippet of perf report -D 70 + 71 + .. code-block:: sh 72 + 73 + # ./perf report -D 74 + 75 + There are different PERF_RECORD_XX records. In that records corresponding to 76 + auxtrace buffers includes: 77 + 78 + 1. PERF_RECORD_AUX 79 + Conveys that new data is available in AUX area 80 + 81 + 2. PERF_RECORD_AUXTRACE_INFO 82 + Describes offset and size of auxtrace data in the buffers 83 + 84 + 3. PERF_RECORD_AUXTRACE 85 + This is the record that defines the auxtrace data which here in case of 86 + vpa-dtl pmu is dispatch trace log data. 87 + 88 + Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump 89 + 90 + .. code-block:: sh 91 + 92 + 0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0 93 + . 94 + . ... VPA DTL PMU data: size 1680 bytes, entries is 35 95 + . 00000000: boot_tb: 21349649546353231, tb_freq: 512000000 96 + . 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773 97 + . 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437 98 + . 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709 99 + . 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243 100 + . 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648 101 + . 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446 102 + . 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126 103 + . 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665 104 + 105 + Above is representation of dtl entry of below format: 106 + 107 + struct dtl_entry { 108 + u8 dispatch_reason; 109 + u8 preempt_reason; 110 + u16 processor_id; 111 + u32 enqueue_to_dispatch_time; 112 + u32 ready_to_enqueue_time; 113 + u32 waiting_to_ready_time; 114 + u64 timebase; 115 + u64 fault_addr; 116 + u64 srr0; 117 + u64 srr1; 118 + 119 + }; 120 + 121 + First two fields represent the dispatch reason and preempt reason. The post 122 + processing of PERF_RECORD_AUXTRACE records will translate to meaningful data 123 + for user to consume. 124 + 125 + Visualize the dispatch trace log entries with perf report 126 + ========================================================= 127 + 128 + .. code-block:: sh 129 + 130 + # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1 131 + [ perf record: Woken up 1 times to write data ] 132 + [ perf record: Captured and wrote 0.300 MB perf.data ] 133 + 134 + # ./perf report 135 + # Samples: 321 of event 'vpa-dtl' 136 + # Event count (approx.): 321 137 + # 138 + # Children Self Command Shared Object Symbol 139 + # ........ ........ ....... ................. .............................. 140 + # 141 + 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace 142 + 143 + Visualize the dispatch trace log entries with perf script 144 + ========================================================= 145 + 146 + .. code-block:: sh 147 + 148 + # ./perf script 149 + migration/9 67 [009] 105373.359903: sched:sched_waking: comm=perf pid=13418 prio=120 target_cpu=009 150 + migration/9 67 [009] 105373.359904: sched:sched_migrate_task: comm=perf pid=13418 prio=120 orig_cpu=9 dest_cpu=10 151 + migration/9 67 [009] 105373.359907: sched:sched_stat_runtime: comm=migration/9 pid=67 runtime=4050 [ns] 152 + migration/9 67 [009] 105373.359908: sched:sched_switch: prev_comm=migration/9 prev_pid=67 prev_prio=0 prev_state=S ==> next_comm=swapper/9 next_pid=0 next_prio=120 153 + :256 256 [016] 105373.359913: vpa-dtl: timebase: 21403600706628832 dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4854, ready_to_enqueue_time:139, waiting_to_ready_time:511842115 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms]) 154 + :256 256 [017] 105373.360012: vpa-dtl: timebase: 21403600706679454 dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:236, ready_to_enqueue_time:0, waiting_to_ready_time:133864583 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms]) 155 + perf 13418 [010] 105373.360048: sched:sched_stat_runtime: comm=perf pid=13418 runtime=139748 [ns] 156 + perf 13418 [010] 105373.360052: sched:sched_waking: comm=migration/10 pid=72 prio=0 target_cpu=010