================
Delay accounting
================

Tasks encounter delays in execution when they wait
for some kernel resource to become available, e.g. a
runnable task may wait for a free CPU to run on.

The per-task delay accounting functionality measures
the delays experienced by a task while

a) waiting for a CPU (while being runnable)
b) waiting for completion of synchronous block I/O initiated by the task
c) swapping in pages
d) memory reclaim
e) thrashing
f) direct compaction
g) write-protect copy
h) IRQ/SOFTIRQ

and makes these statistics available to userspace through
the taskstats interface.

Such delays provide feedback for setting a task's CPU priority,
I/O priority and RSS limit values appropriately. Long delays for
an important task could be a trigger for raising its corresponding priority.

The functionality, through its use of the taskstats interface, also provides
delay statistics aggregated for all tasks (or threads) belonging to a
thread group (corresponding to a traditional Unix process). This is a commonly
needed aggregation that is more efficiently done by the kernel.

Userspace utilities, particularly resource management applications, can also
aggregate delay statistics into arbitrary groups. To enable this, delay
statistics of a task are available both during its lifetime and on its
exit, ensuring that continuous and complete monitoring can be done.
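
As an illustration of grouping in userspace, the following is a minimal
sketch that sums per-task counters into caller-defined groups. The pid-to-group
mapping, the group names and the sample values are hypothetical; only the idea
of summing cumulative counters per group comes from the text above.

```python
from collections import defaultdict


def aggregate_by_group(task_delays, group_of):
    """Sum per-task delay counters into arbitrary, caller-defined groups.

    task_delays maps pid -> {counter name: cumulative delay}.
    group_of maps pid -> group label (hypothetical grouping policy).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for pid, counters in task_delays.items():
        group = group_of.get(pid, "ungrouped")
        for name, value in counters.items():
            totals[group][name] += value
    return {group: dict(counters) for group, counters in totals.items()}


# Hypothetical per-task readings, in nanoseconds.
task_delays = {
    10: {"cpu_delay_total": 1000, "blkio_delay_total": 200},
    11: {"cpu_delay_total": 3000, "blkio_delay_total": 0},
    12: {"cpu_delay_total": 500, "blkio_delay_total": 50},
}
group_of = {10: "batch", 11: "batch", 12: "interactive"}

groups = aggregate_by_group(task_delays, group_of)
print(groups["batch"])
```

Keeping the grouping policy in a plain mapping lets a resource manager
regroup tasks without touching the summation logic.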


Interface
---------

Delay accounting uses the taskstats interface, which is described
in detail in a separate document in this directory. Taskstats returns a
generic data structure to userspace corresponding to per-pid and per-tgid
statistics. The delay accounting functionality populates specific fields of
this structure. See

        include/uapi/linux/taskstats.h

for a description of the fields pertaining to delay accounting.
These fields are generally counters returning the cumulative
delay seen for CPU, sync block I/O, swapin, memory reclaim, page cache
thrashing, direct compaction, write-protect copy, IRQ/SOFTIRQ etc.

Taking the difference of two successive readings of a given
counter (say cpu_delay_total) for a task will give the delay
experienced by the task waiting for the corresponding resource
in that interval.
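
The subtraction described above can be sketched as follows. This is only an
illustration: the readings are hypothetical values in nanoseconds, and the
dictionaries stand in for whatever structure a real taskstats client fills in.

```python
def interval_delays(previous, current):
    """Return the delay accumulated per counter between two readings."""
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous
    }


# Hypothetical cumulative readings for one task, taken some time apart.
first = {"cpu_delay_total": 2_111_069, "blkio_delay_total": 0}
second = {"cpu_delay_total": 2_611_069, "blkio_delay_total": 150_000}

delta = interval_delays(first, second)
print(delta)  # {'cpu_delay_total': 500000, 'blkio_delay_total': 150000}
```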

When a task exits, records containing the per-task statistics
are sent to userspace without requiring a command. If it is the last exiting
task of a thread group, the per-tgid statistics are also sent. More details
are given in the taskstats interface description.

The getdelays.c userspace utility in the tools/accounting directory allows simple
commands to be run and the corresponding delay statistics to be displayed. It
also serves as an example of using the taskstats interface.

Usage
-----

Compile the kernel with::

        CONFIG_TASK_DELAY_ACCT=y
        CONFIG_TASKSTATS=y

Delay accounting is disabled by default at boot up.
To enable, add::

        delayacct

to the kernel boot options. The rest of the instructions below assume this has
been done. Alternatively, use sysctl kernel.task_delayacct to switch the state
at runtime. Note however that only tasks started after enabling it will have
delayacct information.
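
A monitoring script may want to check the runtime state before trusting the
counters. The sketch below assumes the sysctl is exposed at the usual procfs
location for kernel.* sysctls, /proc/sys/kernel/task_delayacct, and that it
reads as 1 when enabled; the helper name is hypothetical.

```python
from pathlib import Path

# Assumed procfs path corresponding to the kernel.task_delayacct sysctl.
SYSCTL_PATH = Path("/proc/sys/kernel/task_delayacct")


def delayacct_state(path=SYSCTL_PATH):
    """Return True if enabled, False if disabled, None if the file is unavailable."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # Missing file (option not built in, older kernel) or no permission.
        return None


print(delayacct_state())
```

Returning None rather than raising keeps the "not available" case distinct
from "available but disabled".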

After the system has booted up, use a utility
similar to getdelays.c to access the delays
seen by a given task or a task group (tgid).
The utility also allows a given command to be
executed and the corresponding delays to be
seen.

General format of the getdelays command::

        getdelays [-dilv] [-t tgid] [-p pid]

Get delays, since system boot, for pid 10::

        # ./getdelays -d -p 10
        (output similar to next case)

Get sum and peak of delays, since system boot, for all pids with tgid 242::

        bash-4.4# ./getdelays -d -t 242
        print delayacct stats ON
        TGID    242


        CPU         count     real total  virtual total    delay total  delay average      delay max      delay min
                       39      156000000      156576579        2111069        0.054ms     0.212296ms     0.031307ms
        IO          count    delay total  delay average      delay max      delay min
                        0              0        0.000ms     0.000000ms     0.000000ms
        SWAP        count    delay total  delay average      delay max      delay min
                        0              0        0.000ms     0.000000ms     0.000000ms
        RECLAIM     count    delay total  delay average      delay max      delay min
                        0              0        0.000ms     0.000000ms     0.000000ms
        THRASHING   count    delay total  delay average      delay max      delay min
                        0              0        0.000ms     0.000000ms     0.000000ms
        COMPACT     count    delay total  delay average      delay max      delay min
                        0              0        0.000ms     0.000000ms     0.000000ms
        WPCOPY      count    delay total  delay average      delay max      delay min
                      156       11215873        0.072ms     0.207403ms     0.033913ms
        IRQ         count    delay total  delay average      delay max      delay min
                        0              0        0.000ms     0.000000ms     0.000000ms
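
The "delay average" column in the sample above is consistent with dividing
"delay total" by "count" and converting to milliseconds. The check below is a
sketch under the assumption that "delay total" is reported in nanoseconds; the
helper name is hypothetical and is not taken from getdelays.c.

```python
def average_delay_ms(delay_total_ns, count):
    """Average delay per event in milliseconds; 0.0 when nothing was counted."""
    if count == 0:
        return 0.0
    return delay_total_ns / count / 1_000_000


# CPU and WPCOPY rows from the sample output above.
print(f"{average_delay_ms(2111069, 39):.3f}ms")     # 0.054ms
print(f"{average_delay_ms(11215873, 156):.3f}ms")   # 0.072ms
```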

Get IO accounting for pid 1 (this works only with -p)::

        # ./getdelays -i -p 1
        printing IO accounting
        linuxrc: read=65536, write=0, cancelled_write=0

The above command can be used with -v to get more debug information.

After the system starts, use `delaytop` to get the system-wide delay information,
which includes system-wide PSI information and Top-N high-latency tasks.
Note: PSI support requires `CONFIG_PSI=y` and `psi=1` for full functionality.

`delaytop` is an interactive tool for monitoring system pressure and task delays.
It supports multiple sorting options, display modes, and real-time keyboard controls.

Basic usage with default settings (sorts by CPU delay, shows top 20 tasks, refreshes every 2 seconds)::

        bash# ./delaytop
        System Pressure Information: (avg10/avg60/avg300/total)
        CPU some:       0.0%/    0.0%/    0.0%/    106137(ms)
        CPU full:       0.0%/    0.0%/    0.0%/         0(ms)
        Memory full:    0.0%/    0.0%/    0.0%/         0(ms)
        Memory some:    0.0%/    0.0%/    0.0%/         0(ms)
        IO full:        0.0%/    0.0%/    0.0%/      2240(ms)
        IO some:        0.0%/    0.0%/    0.0%/      2783(ms)
        IRQ full:       0.0%/    0.0%/    0.0%/         0(ms)
        [o]sort [M]memverbose [q]quit
        Top 20 processes (sorted by cpu delay):
           PID   TGID  COMMAND             CPU(ms)    IO(ms)   IRQ(ms)   MEM(ms)
        ------------------------------------------------------------------------
           110    110  kworker/15:0H-s       27.91      0.00      0.00      0.00
            57     57  cpuhp/7                3.18      0.00      0.00      0.00
            99     99  cpuhp/14               2.97      0.00      0.00      0.00
            51     51  cpuhp/6                0.90      0.00      0.00      0.00
            44     44  kworker/4:0H-sy        0.80      0.00      0.00      0.00
            60     60  ksoftirqd/7            0.74      0.00      0.00      0.00
            76     76  idle_inject/10         0.31      0.00      0.00      0.00
           100    100  idle_inject/14         0.30      0.00      0.00      0.00
          1309   1309  systemsettings         0.29      0.00      0.00      0.00
            45     45  cpuhp/5                0.22      0.00      0.00      0.00
            63     63  cpuhp/8                0.20      0.00      0.00      0.00
            87     87  cpuhp/12               0.18      0.00      0.00      0.00
            93     93  cpuhp/13               0.17      0.00      0.00      0.00
          1265   1265  acpid                  0.17      0.00      0.00      0.00
          1552   1552  sshd                   0.17      0.00      0.00      0.00
          2584   2584  sddm-helper            0.16      0.00      0.00      0.00
          1284   1284  rtkit-daemon           0.15      0.00      0.00      0.00
          1326   1326  nde-netfilter          0.14      0.00      0.00      0.00
            27     27  cpuhp/2                0.13      0.00      0.00      0.00
           631    631  kworker/11:2-rc        0.11      0.00      0.00      0.00
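
The pressure lines in the sample above resemble the PSI files under
/proc/pressure, whose lines have the form
"some avg10=0.00 avg60=0.00 avg300=0.00 total=0" with the total in
microseconds. Whether delaytop reads exactly these files is an assumption
here; the sketch only shows how one such line could be parsed and its total
converted to milliseconds. The sample line is made up to match the CPU row.

```python
def parse_psi_line(line):
    """Parse one PSI line into averages (percent) and total stall time in ms."""
    kind, *fields = line.split()
    values = dict(field.split("=", 1) for field in fields)
    return {
        "kind": kind,                             # "some" or "full"
        "avg10": float(values["avg10"]),
        "avg60": float(values["avg60"]),
        "avg300": float(values["avg300"]),
        "total_ms": int(values["total"]) / 1000,  # microseconds -> milliseconds
    }


# Hypothetical line, chosen so the total matches the "CPU some" row above.
sample = "some avg10=0.00 avg60=0.00 avg300=0.00 total=106137000"
parsed = parse_psi_line(sample)
print(parsed["kind"], parsed["total_ms"])
```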

Interactive keyboard controls during runtime::

        o - Select sort field (CPU, IO, IRQ, Memory, etc.)
        M - Toggle display mode (Default/Memory Verbose)
        q - Quit

Available sort fields (use -s/--sort or interactive command)::

        cpu(c)       - CPU delay
        blkio(i)     - I/O delay
        irq(q)       - IRQ delay
        mem(m)       - Total memory delay
        swapin(s)    - Swapin delay (memory verbose mode only)
        freepages(r) - Freepages reclaim delay (memory verbose mode only)
        thrashing(t) - Thrashing delay (memory verbose mode only)
        compact(p)   - Compaction delay (memory verbose mode only)
        wpcopy(w)    - Write-protect copy delay (memory verbose mode only)
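
Conceptually, choosing a sort field and a task limit amounts to ordering the
per-task records by one delay column and keeping the first N. The sketch below
is not delaytop's implementation; the record layout and the function name are
hypothetical, and the rows reuse a few values from the sample output above.

```python
def top_tasks(tasks, sort_field="cpu", limit=20):
    """Return up to `limit` task records, largest `sort_field` delay first."""
    return sorted(tasks, key=lambda task: task[sort_field], reverse=True)[:limit]


# Hypothetical records; delays are in milliseconds.
tasks = [
    {"pid": 57, "command": "cpuhp/7", "cpu": 3.18, "blkio": 0.0, "irq": 0.0, "mem": 0.0},
    {"pid": 110, "command": "kworker/15:0H-s", "cpu": 27.91, "blkio": 0.0, "irq": 0.0, "mem": 0.0},
    {"pid": 1552, "command": "sshd", "cpu": 0.17, "blkio": 0.0, "irq": 0.0, "mem": 0.0},
]

for task in top_tasks(tasks, sort_field="cpu", limit=2):
    print(task["pid"], task["command"], task["cpu"])
```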

Advanced usage examples::

        # ./delaytop -s blkio
        Sorted by IO delay

        # ./delaytop -s mem -M
        Sorted by memory delay in memory verbose mode

        # ./delaytop -p pid
        Print delayacct stats for the given pid

        # ./delaytop -P num
        Display the top num tasks

        # ./delaytop -n num
        Set the number of delaytop refreshes (num times)

        # ./delaytop -d secs
        Specify refresh interval as secs