
Merge tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
"Along with the usual minor fixes and clean ups there are a few major
changes with this pull request.

1) Multiple buffers for the ftrace facility

This feature has been requested by many people over the last few
years. I even heard that Google was about to implement it themselves.
I finally had time and cleaned up the code such that you can now
create multiple instances of the ftrace buffer and have different
events go to different buffers. This way, a low frequency event will
not be lost in the noise of a high frequency event.

Note, currently only events can go to different buffers; the tracers
(i.e. function, function_graph, and the latency tracers) still write
only to the main buffer.
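As a sketch of the multiple-buffer interface (the instance name "sched" is a placeholder for illustration; the layout follows the instances/ directory this series adds, and requires root with debugfs mounted):

```shell
# Sketch only: route low-frequency sched events into their own buffer
# so they are not drowned out by the main buffer.
cd /sys/kernel/debug/tracing

# Creating a directory under instances/ allocates a new ring buffer.
mkdir instances/sched

# Enable an event in the new instance only; the main buffer is untouched.
echo 1 > instances/sched/events/sched/sched_switch/enable

# Read the instance's own trace file.
cat instances/sched/trace

# Removing the directory frees the buffer again.
rmdir instances/sched
```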

2) The function tracer triggers have now been extended.

The function tracer had two triggers: one to enable tracing when a
function is hit, and one to disable tracing. Now you can record a
stack trace on a single function (or many functions), take a snapshot
of the buffer (copy it to the snapshot buffer), and enable or disable
an event to be traced when a function is hit.
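Hedged examples of the extended trigger syntax, written through set_ftrace_filter (the function names here are placeholders chosen for illustration):

```shell
cd /sys/kernel/debug/tracing

# Record a stack trace every time schedule() is hit.
echo 'schedule:stacktrace' >> set_ftrace_filter

# Take one snapshot of the buffer the first time kfree() is hit
# (the trailing :1 limits the trigger to a single firing).
echo 'kfree:snapshot:1' >> set_ftrace_filter

# Enable the sched:sched_switch event when wake_up_process() is hit.
echo 'wake_up_process:enable_event:sched:sched_switch' >> set_ftrace_filter
```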

3) A perf clock has been added.

A "perf" clock can be chosen to be used when tracing. This will cause
ftrace to use the same clock as perf uses, and hopefully this will
make it easier to interleave the perf and ftrace data for analysis."
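A minimal sketch of selecting the new clock (assumes debugfs is mounted at the usual location):

```shell
cd /sys/kernel/debug/tracing
cat trace_clock          # the clock shown in [brackets] is the one in use
echo perf > trace_clock  # timestamp events with the same clock perf uses
```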

* tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (82 commits)
tracepoints: Prevent null probe from being added
tracing: Compare to 1 instead of zero for is_signed_type()
tracing: Remove obsolete macro guard _TRACE_PROFILE_INIT
ftrace: Get rid of ftrace_profile_bits
tracing: Check return value of tracing_init_dentry()
tracing: Get rid of unneeded key calculation in ftrace_hash_move()
tracing: Reset ftrace_graph_filter_enabled if count is zero
tracing: Fix off-by-one on allocating stat->pages
kernel: tracing: Use strlcpy instead of strncpy
tracing: Update debugfs README file
tracing: Fix ftrace_dump()
tracing: Rename trace_event_mutex to trace_event_sem
tracing: Fix comment about prefix in arch_syscall_match_sym_name()
tracing: Convert trace_destroy_fields() to static
tracing: Move find_event_field() into trace_events.c
tracing: Use TRACE_MAX_PRINT instead of constant
tracing: Use pr_warn_once instead of open coded implementation
ring-buffer: Add ring buffer startup selftest
tracing: Bring Documentation/trace/ftrace.txt up to date
tracing: Add "perf" trace_clock
...

Conflicts:
kernel/trace/ftrace.c
kernel/trace/trace.c

+5692 -1928 (total)

Documentation/kernel-parameters.txt (+7)
···
  			on: enable for both 32- and 64-bit processes
  			off: disable for both 32- and 64-bit processes

+ 	alloc_snapshot	[FTRACE]
+ 			Allocate the ftrace snapshot buffer on boot up when the
+ 			main buffer is allocated. This is handy if debugging
+ 			and you need to use tracing_snapshot() on boot up, and
+ 			do not want to use tracing_snapshot_alloc() as it needs
+ 			to be done where GFP_KERNEL allocations are allowed.
+
  	amd_iommu=	[HW,X86-64]
  			Pass parameters to the AMD IOMMU driver in the system.
  			Possible values are:
Documentation/trace/ftrace.txt (+1473 -610)
··· 8 8 Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, 9 9 John Kacur, and David Teigland. 10 10 Written for: 2.6.28-rc2 11 + Updated for: 3.10 11 12 12 13 Introduction 13 14 ------------ ··· 18 17 It can be used for debugging or analyzing latencies and 19 18 performance issues that take place outside of user-space. 20 19 21 - Although ftrace is the function tracer, it also includes an 22 - infrastructure that allows for other types of tracing. Some of 23 - the tracers that are currently in ftrace include a tracer to 24 - trace context switches, the time it takes for a high priority 25 - task to run after it was woken up, the time interrupts are 26 - disabled, and more (ftrace allows for tracer plugins, which 27 - means that the list of tracers can always grow). 20 + Although ftrace is typically considered the function tracer, it 21 + is really a frame work of several assorted tracing utilities. 22 + There's latency tracing to examine what occurs between interrupts 23 + disabled and enabled, as well as for preemption and from a time 24 + a task is woken to the task is actually scheduled in. 25 + 26 + One of the most common uses of ftrace is the event tracing. 27 + Through out the kernel is hundreds of static event points that 28 + can be enabled via the debugfs file system to see what is 29 + going on in certain parts of the kernel. 28 30 29 31 30 32 Implementation Details ··· 65 61 66 62 That's it! (assuming that you have ftrace configured into your kernel) 67 63 68 - After mounting the debugfs, you can see a directory called 64 + After mounting debugfs, you can see a directory called 69 65 "tracing". This directory contains the control and output files 70 66 of ftrace. Here is a list of some of the key files: 71 67 ··· 88 84 89 85 This sets or displays whether writing to the trace 90 86 ring buffer is enabled. Echo 0 into this file to disable 91 - the tracer or 1 to enable it. 87 + the tracer or 1 to enable it. 
Note, this only disables 88 + writing to the ring buffer, the tracing overhead may 89 + still be occurring. 92 90 93 91 trace: 94 92 ··· 115 109 116 110 This file lets the user control the amount of data 117 111 that is displayed in one of the above output 118 - files. 112 + files. Options also exist to modify how a tracer 113 + or events work (stack traces, timestamps, etc). 114 + 115 + options: 116 + 117 + This is a directory that has a file for every available 118 + trace option (also in trace_options). Options may also be set 119 + or cleared by writing a "1" or "0" respectively into the 120 + corresponding file with the option name. 119 121 120 122 tracing_max_latency: 121 123 ··· 135 121 latency is greater than the value in this 136 122 file. (in microseconds) 137 123 124 + tracing_thresh: 125 + 126 + Some latency tracers will record a trace whenever the 127 + latency is greater than the number in this file. 128 + Only active when the file contains a number greater than 0. 129 + (in microseconds) 130 + 138 131 buffer_size_kb: 139 132 140 133 This sets or displays the number of kilobytes each CPU 141 - buffer can hold. The tracer buffers are the same size 134 + buffer holds. By default, the trace buffers are the same size 142 135 for each CPU. The displayed number is the size of the 143 136 CPU buffer and not total size of all buffers. The 144 137 trace buffers are allocated in pages (blocks of memory ··· 154 133 than requested, the rest of the page will be used, 155 134 making the actual allocation bigger than requested. 156 135 ( Note, the size may not be a multiple of the page size 157 - due to buffer management overhead. ) 136 + due to buffer management meta-data. ) 158 137 159 - This can only be updated when the current_tracer 160 - is set to "nop". 138 + buffer_total_size_kb: 139 + 140 + This displays the total combined size of all the trace buffers. 
141 + 142 + free_buffer: 143 + 144 + If a process is performing the tracing, and the ring buffer 145 + should be shrunk "freed" when the process is finished, even 146 + if it were to be killed by a signal, this file can be used 147 + for that purpose. On close of this file, the ring buffer will 148 + be resized to its minimum size. Having a process that is tracing 149 + also open this file, when the process exits its file descriptor 150 + for this file will be closed, and in doing so, the ring buffer 151 + will be "freed". 152 + 153 + It may also stop tracing if disable_on_free option is set. 161 154 162 155 tracing_cpumask: 163 156 164 157 This is a mask that lets the user only trace 165 - on specified CPUS. The format is a hex string 166 - representing the CPUS. 158 + on specified CPUs. The format is a hex string 159 + representing the CPUs. 167 160 168 161 set_ftrace_filter: 169 162 ··· 218 183 "set_ftrace_notrace". (See the section "dynamic ftrace" 219 184 below for more details.) 220 185 186 + enabled_functions: 187 + 188 + This file is more for debugging ftrace, but can also be useful 189 + in seeing if any function has a callback attached to it. 190 + Not only does the trace infrastructure use ftrace function 191 + trace utility, but other subsystems might too. This file 192 + displays all functions that have a callback attached to them 193 + as well as the number of callbacks that have been attached. 194 + Note, a callback may also call multiple functions which will 195 + not be listed in this count. 196 + 197 + If the callback registered to be traced by a function with 198 + the "save regs" attribute (thus even more overhead), a 'R' 199 + will be displayed on the same line as the function that 200 + is returning registers. 201 + 202 + function_profile_enabled: 203 + 204 + When set it will enable all functions with either the function 205 + tracer, or if enabled, the function graph tracer. 
It will 206 + keep a histogram of the number of functions that were called 207 + and if run with the function graph tracer, it will also keep 208 + track of the time spent in those functions. The histogram 209 + content can be displayed in the files: 210 + 211 + trace_stats/function<cpu> ( function0, function1, etc). 212 + 213 + trace_stats: 214 + 215 + A directory that holds different tracing stats. 216 + 217 + kprobe_events: 218 + 219 + Enable dynamic trace points. See kprobetrace.txt. 220 + 221 + kprobe_profile: 222 + 223 + Dynamic trace points stats. See kprobetrace.txt. 224 + 225 + max_graph_depth: 226 + 227 + Used with the function graph tracer. This is the max depth 228 + it will trace into a function. Setting this to a value of 229 + one will show only the first kernel function that is called 230 + from user space. 231 + 232 + printk_formats: 233 + 234 + This is for tools that read the raw format files. If an event in 235 + the ring buffer references a string (currently only trace_printk() 236 + does this), only a pointer to the string is recorded into the buffer 237 + and not the string itself. This prevents tools from knowing what 238 + that string was. This file displays the string and address for 239 + the string allowing tools to map the pointers to what the 240 + strings were. 241 + 242 + saved_cmdlines: 243 + 244 + Only the pid of the task is recorded in a trace event unless 245 + the event specifically saves the task comm as well. Ftrace 246 + makes a cache of pid mappings to comms to try to display 247 + comms for events. If a pid for a comm is not listed, then 248 + "<...>" is displayed in the output. 249 + 250 + snapshot: 251 + 252 + This displays the "snapshot" buffer and also lets the user 253 + take a snapshot of the current running trace. 254 + See the "Snapshot" section below for more details. 255 + 256 + stack_max_size: 257 + 258 + When the stack tracer is activated, this will display the 259 + maximum stack size it has encountered. 
260 + See the "Stack Trace" section below. 261 + 262 + stack_trace: 263 + 264 + This displays the stack back trace of the largest stack 265 + that was encountered when the stack tracer is activated. 266 + See the "Stack Trace" section below. 267 + 268 + stack_trace_filter: 269 + 270 + This is similar to "set_ftrace_filter" but it limits what 271 + functions the stack tracer will check. 272 + 273 + trace_clock: 274 + 275 + Whenever an event is recorded into the ring buffer, a 276 + "timestamp" is added. This stamp comes from a specified 277 + clock. By default, ftrace uses the "local" clock. This 278 + clock is very fast and strictly per cpu, but on some 279 + systems it may not be monotonic with respect to other 280 + CPUs. In other words, the local clocks may not be in sync 281 + with local clocks on other CPUs. 282 + 283 + Usual clocks for tracing: 284 + 285 + # cat trace_clock 286 + [local] global counter x86-tsc 287 + 288 + local: Default clock, but may not be in sync across CPUs 289 + 290 + global: This clock is in sync with all CPUs but may 291 + be a bit slower than the local clock. 292 + 293 + counter: This is not a clock at all, but literally an atomic 294 + counter. It counts up one by one, but is in sync 295 + with all CPUs. This is useful when you need to 296 + know exactly the order events occurred with respect to 297 + each other on different CPUs. 298 + 299 + uptime: This uses the jiffies counter and the time stamp 300 + is relative to the time since boot up. 301 + 302 + perf: This makes ftrace use the same clock that perf uses. 303 + Eventually perf will be able to read ftrace buffers 304 + and this will help out in interleaving the data. 305 + 306 + x86-tsc: Architectures may define their own clocks. For 307 + example, x86 uses its own TSC cycle clock here. 308 + 309 + To set a clock, simply echo the clock name into this file. 
310 + 311 + echo global > trace_clock 312 + 313 + trace_marker: 314 + 315 + This is a very useful file for synchronizing user space 316 + with events happening in the kernel. Writing strings into 317 + this file will be written into the ftrace buffer. 318 + 319 + It is useful in applications to open this file at the start 320 + of the application and just reference the file descriptor 321 + for the file. 322 + 323 + void trace_write(const char *fmt, ...) 324 + { 325 + va_list ap; 326 + char buf[256]; 327 + int n; 328 + 329 + if (trace_fd < 0) 330 + return; 331 + 332 + va_start(ap, fmt); 333 + n = vsnprintf(buf, 256, fmt, ap); 334 + va_end(ap); 335 + 336 + write(trace_fd, buf, n); 337 + } 338 + 339 + start: 340 + 341 + trace_fd = open("trace_marker", WR_ONLY); 342 + 343 + uprobe_events: 344 + 345 + Add dynamic tracepoints in programs. 346 + See uprobetracer.txt 347 + 348 + uprobe_profile: 349 + 350 + Uprobe statistics. See uprobetrace.txt 351 + 352 + instances: 353 + 354 + This is a way to make multiple trace buffers where different 355 + events can be recorded in different buffers. 356 + See "Instances" section below. 357 + 358 + events: 359 + 360 + This is the trace event directory. It holds event tracepoints 361 + (also known as static tracepoints) that have been compiled 362 + into the kernel. It shows what event tracepoints exist 363 + and how they are grouped by system. There are "enable" 364 + files at various levels that can enable the tracepoints 365 + when a "1" is written to them. 366 + 367 + See events.txt for more information. 368 + 369 + per_cpu: 370 + 371 + This is a directory that contains the trace per_cpu information. 372 + 373 + per_cpu/cpu0/buffer_size_kb: 374 + 375 + The ftrace buffer is defined per_cpu. That is, there's a separate 376 + buffer for each CPU to allow writes to be done atomically, 377 + and free from cache bouncing. These buffers may have different 378 + size buffers. 
This file is similar to the buffer_size_kb 379 + file, but it only displays or sets the buffer size for the 380 + specific CPU. (here cpu0). 381 + 382 + per_cpu/cpu0/trace: 383 + 384 + This is similar to the "trace" file, but it will only display 385 + the data specific for the CPU. If written to, it only clears 386 + the specific CPU buffer. 387 + 388 + per_cpu/cpu0/trace_pipe 389 + 390 + This is similar to the "trace_pipe" file, and is a consuming 391 + read, but it will only display (and consume) the data specific 392 + for the CPU. 393 + 394 + per_cpu/cpu0/trace_pipe_raw 395 + 396 + For tools that can parse the ftrace ring buffer binary format, 397 + the trace_pipe_raw file can be used to extract the data 398 + from the ring buffer directly. With the use of the splice() 399 + system call, the buffer data can be quickly transferred to 400 + a file or to the network where a server is collecting the 401 + data. 402 + 403 + Like trace_pipe, this is a consuming reader, where multiple 404 + reads will always produce different data. 405 + 406 + per_cpu/cpu0/snapshot: 407 + 408 + This is similar to the main "snapshot" file, but will only 409 + snapshot the current CPU (if supported). It only displays 410 + the content of the snapshot for a given CPU, and if 411 + written to, only clears this CPU buffer. 412 + 413 + per_cpu/cpu0/snapshot_raw: 414 + 415 + Similar to the trace_pipe_raw, but will read the binary format 416 + from the snapshot buffer for the given CPU. 417 + 418 + per_cpu/cpu0/stats: 419 + 420 + This displays certain stats about the ring buffer: 421 + 422 + entries: The number of events that are still in the buffer. 423 + 424 + overrun: The number of lost events due to overwriting when 425 + the buffer was full. 426 + 427 + commit overrun: Should always be zero. 428 + This gets set if so many events happened within a nested 429 + event (ring buffer is re-entrant), that it fills the 430 + buffer and starts dropping events. 
431 + 432 + bytes: Bytes actually read (not overwritten). 433 + 434 + oldest event ts: The oldest timestamp in the buffer 435 + 436 + now ts: The current timestamp 437 + 438 + dropped events: Events lost due to overwrite option being off. 439 + 440 + read events: The number of events read. 221 441 222 442 The Tracers 223 443 ----------- ··· 524 234 RT tasks (as the current "wakeup" does). This is useful 525 235 for those interested in wake up timings of RT tasks. 526 236 527 - "hw-branch-tracer" 528 - 529 - Uses the BTS CPU feature on x86 CPUs to traces all 530 - branches executed. 531 - 532 237 "nop" 533 238 534 239 This is the "trace nothing" tracer. To remove all ··· 546 261 -------- 547 262 # tracer: function 548 263 # 549 - # TASK-PID CPU# TIMESTAMP FUNCTION 550 - # | | | | | 551 - bash-4251 [01] 10152.583854: path_put <-path_walk 552 - bash-4251 [01] 10152.583855: dput <-path_put 553 - bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput 264 + # entries-in-buffer/entries-written: 140080/250280 #P:4 265 + # 266 + # _-----=> irqs-off 267 + # / _----=> need-resched 268 + # | / _---=> hardirq/softirq 269 + # || / _--=> preempt-depth 270 + # ||| / delay 271 + # TASK-PID CPU# |||| TIMESTAMP FUNCTION 272 + # | | | |||| | | 273 + bash-1977 [000] .... 17284.993652: sys_close <-system_call_fastpath 274 + bash-1977 [000] .... 17284.993653: __close_fd <-sys_close 275 + bash-1977 [000] .... 17284.993653: _raw_spin_lock <-__close_fd 276 + sshd-1974 [003] .... 17284.993653: __srcu_read_unlock <-fsnotify 277 + bash-1977 [000] .... 17284.993654: add_preempt_count <-_raw_spin_lock 278 + bash-1977 [000] ...1 17284.993655: _raw_spin_unlock <-__close_fd 279 + bash-1977 [000] ...1 17284.993656: sub_preempt_count <-_raw_spin_unlock 280 + bash-1977 [000] .... 17284.993657: filp_close <-__close_fd 281 + bash-1977 [000] .... 17284.993657: dnotify_flush <-filp_close 282 + sshd-1974 [003] .... 
17284.993658: sys_select <-system_call_fastpath 554 283 -------- 555 284 556 285 A header is printed with the tracer name that is represented by 557 - the trace. In this case the tracer is "function". Then a header 558 - showing the format. Task name "bash", the task PID "4251", the 559 - CPU that it was running on "01", the timestamp in <secs>.<usecs> 560 - format, the function name that was traced "path_put" and the 561 - parent function that called this function "path_walk". The 562 - timestamp is the time at which the function was entered. 286 + the trace. In this case the tracer is "function". Then it shows the 287 + number of events in the buffer as well as the total number of entries 288 + that were written. The difference is the number of entries that were 289 + lost due to the buffer filling up (250280 - 140080 = 110200 events 290 + lost). 291 + 292 + The header explains the content of the events. Task name "bash", the task 293 + PID "1977", the CPU that it was running on "000", the latency format 294 + (explained below), the timestamp in <secs>.<usecs> format, the 295 + function name that was traced "sys_close" and the parent function that 296 + called this function "system_call_fastpath". The timestamp is the time 297 + at which the function was entered. 563 298 564 299 Latency trace format 565 300 -------------------- 566 301 567 - When the latency-format option is enabled, the trace file gives 568 - somewhat more information to see why a latency happened. 569 - Here is a typical trace. 302 + When the latency-format option is enabled or when one of the latency 303 + tracers is set, the trace file gives somewhat more information to see 304 + why a latency happened. Here is a typical trace. 
570 305 571 306 # tracer: irqsoff 572 307 # 573 - irqsoff latency trace v1.1.5 on 2.6.26-rc8 574 - -------------------------------------------------------------------- 575 - latency: 97 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) 576 - ----------------- 577 - | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0) 578 - ----------------- 579 - => started at: apic_timer_interrupt 580 - => ended at: do_softirq 581 - 582 - # _------=> CPU# 583 - # / _-----=> irqs-off 584 - # | / _----=> need-resched 585 - # || / _---=> hardirq/softirq 586 - # ||| / _--=> preempt-depth 587 - # |||| / 588 - # ||||| delay 589 - # cmd pid ||||| time | caller 590 - # \ / ||||| \ | / 591 - <idle>-0 0d..1 0us+: trace_hardirqs_off_thunk (apic_timer_interrupt) 592 - <idle>-0 0d.s. 97us : __do_softirq (do_softirq) 593 - <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq) 308 + # irqsoff latency trace v1.1.5 on 3.8.0-test+ 309 + # -------------------------------------------------------------------- 310 + # latency: 259 us, #4/4, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) 311 + # ----------------- 312 + # | task: ps-6143 (uid:0 nice:0 policy:0 rt_prio:0) 313 + # ----------------- 314 + # => started at: __lock_task_sighand 315 + # => ended at: _raw_spin_unlock_irqrestore 316 + # 317 + # 318 + # _------=> CPU# 319 + # / _-----=> irqs-off 320 + # | / _----=> need-resched 321 + # || / _---=> hardirq/softirq 322 + # ||| / _--=> preempt-depth 323 + # |||| / delay 324 + # cmd pid ||||| time | caller 325 + # \ / ||||| \ | / 326 + ps-6143 2d... 
0us!: trace_hardirqs_off <-__lock_task_sighand 327 + ps-6143 2d..1 259us+: trace_hardirqs_on <-_raw_spin_unlock_irqrestore 328 + ps-6143 2d..1 263us+: time_hardirqs_on <-_raw_spin_unlock_irqrestore 329 + ps-6143 2d..1 306us : <stack trace> 330 + => trace_hardirqs_on_caller 331 + => trace_hardirqs_on 332 + => _raw_spin_unlock_irqrestore 333 + => do_task_stat 334 + => proc_tgid_stat 335 + => proc_single_show 336 + => seq_read 337 + => vfs_read 338 + => sys_read 339 + => system_call_fastpath 594 340 595 341 596 342 This shows that the current tracer is "irqsoff" tracing the time 597 - for which interrupts were disabled. It gives the trace version 598 - and the version of the kernel upon which this was executed on 599 - (2.6.26-rc8). Then it displays the max latency in microsecs (97 600 - us). The number of trace entries displayed and the total number 601 - recorded (both are three: #3/3). The type of preemption that was 602 - used (PREEMPT). VP, KP, SP, and HP are always zero and are 603 - reserved for later use. #P is the number of online CPUS (#P:2). 343 + for which interrupts were disabled. It gives the trace version (which 344 + never changes) and the version of the kernel upon which this was executed on 345 + (3.10). Then it displays the max latency in microseconds (259 us). The number 346 + of trace entries displayed and the total number (both are four: #4/4). 347 + VP, KP, SP, and HP are always zero and are reserved for later use. 348 + #P is the number of online CPUs (#P:4). 604 349 605 350 The task is the process that was running when the latency 606 - occurred. (swapper pid: 0). 351 + occurred. (ps pid: 6143). 607 352 608 353 The start and stop (the functions in which the interrupts were 609 354 disabled and enabled respectively) that caused the latencies: 610 355 611 - apic_timer_interrupt is where the interrupts were disabled. 612 - do_softirq is where they were enabled again. 356 + __lock_task_sighand is where the interrupts were disabled. 
357 + _raw_spin_unlock_irqrestore is where they were enabled again. 613 358 614 359 The next lines after the header are the trace itself. The header 615 360 explains which is which. ··· 682 367 683 368 The rest is the same as the 'trace' file. 684 369 370 + Note, the latency tracers will usually end with a back trace 371 + to easily find where the latency occurred. 685 372 686 373 trace_options 687 374 ------------- 688 375 689 - The trace_options file is used to control what gets printed in 690 - the trace output. To see what is available, simply cat the file: 376 + The trace_options file (or the options directory) is used to control 377 + what gets printed in the trace output, or manipulate the tracers. 378 + To see what is available, simply cat the file: 691 379 692 380 cat trace_options 693 - print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ 694 - noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj 381 + print-parent 382 + nosym-offset 383 + nosym-addr 384 + noverbose 385 + noraw 386 + nohex 387 + nobin 388 + noblock 389 + nostacktrace 390 + trace_printk 391 + noftrace_preempt 392 + nobranch 393 + annotate 394 + nouserstacktrace 395 + nosym-userobj 396 + noprintk-msg-only 397 + context-info 398 + latency-format 399 + sleep-time 400 + graph-time 401 + record-cmd 402 + overwrite 403 + nodisable_on_free 404 + irq-info 405 + markers 406 + function-trace 695 407 696 408 To disable one of the options, echo in the option prepended with 697 409 "no". ··· 770 428 771 429 bin - This will print out the formats in raw binary. 772 430 773 - block - TBD (needs update) 431 + block - When set, reading trace_pipe will not block when polled. 774 432 775 433 stacktrace - This is one of the options that changes the trace 776 434 itself. When a trace is recorded, so is the stack 777 435 of functions. This allows for back traces of 778 436 trace sites. 437 + 438 + trace_printk - Can disable trace_printk() from writing into the buffer. 
439 + 440 + branch - Enable branch tracing with the tracer. 441 + 442 + annotate - It is sometimes confusing when the CPU buffers are full 443 + and one CPU buffer had a lot of events recently, thus 444 + a shorter time frame, were another CPU may have only had 445 + a few events, which lets it have older events. When 446 + the trace is reported, it shows the oldest events first, 447 + and it may look like only one CPU ran (the one with the 448 + oldest events). When the annotate option is set, it will 449 + display when a new CPU buffer started: 450 + 451 + <idle>-0 [001] dNs4 21169.031481: wake_up_idle_cpu <-add_timer_on 452 + <idle>-0 [001] dNs4 21169.031482: _raw_spin_unlock_irqrestore <-add_timer_on 453 + <idle>-0 [001] .Ns4 21169.031484: sub_preempt_count <-_raw_spin_unlock_irqrestore 454 + ##### CPU 2 buffer started #### 455 + <idle>-0 [002] .N.1 21169.031484: rcu_idle_exit <-cpu_idle 456 + <idle>-0 [001] .Ns3 21169.031484: _raw_spin_unlock <-clocksource_watchdog 457 + <idle>-0 [001] .Ns3 21169.031485: sub_preempt_count <-_raw_spin_unlock 779 458 780 459 userstacktrace - This option changes the trace. It records a 781 460 stacktrace of the current userspace thread. ··· 814 451 a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0 815 452 x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6] 816 453 817 - sched-tree - trace all tasks that are on the runqueue, at 818 - every scheduling event. Will add overhead if 819 - there's a lot of tasks running at once. 454 + 455 + printk-msg-only - When set, trace_printk()s will only show the format 456 + and not their parameters (if trace_bprintk() or 457 + trace_bputs() was used to save the trace_printk()). 458 + 459 + context-info - Show only the event data. Hides the comm, PID, 460 + timestamp, CPU, and other useful data. 820 461 821 462 latency-format - This option changes the trace. 
When 822 463 it is enabled, the trace displays ··· 828 461 latencies, as described in "Latency 829 462 trace format". 830 463 464 + sleep-time - When running function graph tracer, to include 465 + the time a task schedules out in its function. 466 + When enabled, it will account time the task has been 467 + scheduled out as part of the function call. 468 + 469 + graph-time - When running function graph tracer, to include the 470 + time to call nested functions. When this is not set, 471 + the time reported for the function will only include 472 + the time the function itself executed for, not the time 473 + for functions that it called. 474 + 475 + record-cmd - When any event or tracer is enabled, a hook is enabled 476 + in the sched_switch trace point to fill comm cache 477 + with mapped pids and comms. But this may cause some 478 + overhead, and if you only care about pids, and not the 479 + name of the task, disabling this option can lower the 480 + impact of tracing. 481 + 831 482 overwrite - This controls what happens when the trace buffer is 832 483 full. If "1" (default), the oldest events are 833 484 discarded and overwritten. If "0", then the newest 834 485 events are discarded. 486 + (see per_cpu/cpu0/stats for overrun and dropped) 835 487 836 - ftrace_enabled 837 - -------------- 488 + disable_on_free - When the free_buffer is closed, tracing will 489 + stop (tracing_on set to 0). 838 490 839 - The following tracers (listed below) give different output 840 - depending on whether or not the sysctl ftrace_enabled is set. To 841 - set ftrace_enabled, one can either use the sysctl function or 842 - set it via the proc file system interface. 491 + irq-info - Shows the interrupt, preempt count, need resched data. 
492 + When disabled, the trace looks like: 843 493 844 - sysctl kernel.ftrace_enabled=1 494 + # tracer: function 495 + # 496 + # entries-in-buffer/entries-written: 144405/9452052 #P:4 497 + # 498 + # TASK-PID CPU# TIMESTAMP FUNCTION 499 + # | | | | | 500 + <idle>-0 [002] 23636.756054: ttwu_do_activate.constprop.89 <-try_to_wake_up 501 + <idle>-0 [002] 23636.756054: activate_task <-ttwu_do_activate.constprop.89 502 + <idle>-0 [002] 23636.756055: enqueue_task <-activate_task 845 503 846 - or 847 504 848 - echo 1 > /proc/sys/kernel/ftrace_enabled 505 + markers - When set, the trace_marker is writable (only by root). 506 + When disabled, the trace_marker will error with EINVAL 507 + on write. 849 508 850 - To disable ftrace_enabled simply replace the '1' with '0' in the 851 - above commands. 852 509 853 - When ftrace_enabled is set the tracers will also record the 854 - functions that are within the trace. The descriptions of the 855 - tracers will also show an example with ftrace enabled. 510 + function-trace - The latency tracers will enable function tracing 511 + if this option is enabled (default it is). When 512 + it is disabled, the latency tracers do not trace 513 + functions. This keeps the overhead of the tracer down 514 + when performing latency tests. 515 + 516 + Note: Some tracers have their own options. They only appear 517 + when the tracer is active. 518 + 856 519 857 520 858 521 irqsoff ··· 903 506 To reset the maximum, echo 0 into tracing_max_latency. Here is 904 507 an example: 905 508 509 + # echo 0 > options/function-trace 906 510 # echo irqsoff > current_tracer 907 - # echo latency-format > trace_options 908 - # echo 0 > tracing_max_latency 909 511 # echo 1 > tracing_on 512 + # echo 0 > tracing_max_latency 910 513 # ls -ltr 911 514 [...] 
   # echo 0 > tracing_on
   # cat trace
  # tracer: irqsoff
  #
- irqsoff latency trace v1.1.5 on 2.6.26
- --------------------------------------------------------------------
- latency: 12 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
-    -----------------
-    | task: bash-3730 (uid:0 nice:0 policy:0 rt_prio:0)
-    -----------------
-  => started at: sys_setpgid
-  => ended at:   sys_setpgid
+ # irqsoff latency trace v1.1.5 on 3.8.0-test+
+ # --------------------------------------------------------------------
+ # latency: 16 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+ #    -----------------
+ #    | task: swapper/0-0 (uid:0 nice:0 policy:0 rt_prio:0)
+ #    -----------------
+ #  => started at: run_timer_softirq
+ #  => ended at:   run_timer_softirq
+ #
+ #
+ #                  _------=> CPU#
+ #                 / _-----=> irqs-off
+ #                | / _----=> need-resched
+ #                || / _---=> hardirq/softirq
+ #                ||| / _--=> preempt-depth
+ #                |||| /     delay
+ #  cmd     pid   ||||| time  |   caller
+ #     \   /      |||||  \    |   /
+   <idle>-0       0d.s2    0us+: _raw_spin_lock_irq <-run_timer_softirq
+   <idle>-0       0dNs3   17us : _raw_spin_unlock_irq <-run_timer_softirq
+   <idle>-0       0dNs3   17us+: trace_hardirqs_on <-run_timer_softirq
+   <idle>-0       0dNs3   25us : <stack trace>
+  => _raw_spin_unlock_irq
+  => run_timer_softirq
+  => __do_softirq
+  => call_softirq
+  => do_softirq
+  => irq_exit
+  => smp_apic_timer_interrupt
+  => apic_timer_interrupt
+  => rcu_idle_exit
+  => cpu_idle
+  => rest_init
+  => start_kernel
+  => x86_64_start_reservations
+  => x86_64_start_kernel

- #                _------=> CPU#
- #               / _-----=> irqs-off
- #              | / _----=> need-resched
- #              || / _---=> hardirq/softirq
- #              ||| / _--=> preempt-depth
- #              |||| /
- #              |||||     delay
- #  cmd     pid ||||| time  |   caller
- #     \   /    |||||   \   |   /
-   bash-3730  1d...    0us : _write_lock_irq (sys_setpgid)
-   bash-3730  1d..1    1us+: _write_unlock_irq (sys_setpgid)
-   bash-3730  1d..2   14us : trace_hardirqs_on (sys_setpgid)
-
-
- Here we see that that we had a latency of 12 microsecs (which is
- very good). The _write_lock_irq in sys_setpgid disabled
- interrupts. The difference between the 12 and the displayed
- timestamp 14us occurred because the clock was incremented
+ Here we see that we had a latency of 16 microseconds (which is
+ very good). The _raw_spin_lock_irq in run_timer_softirq disabled
+ interrupts. The difference between the 16 and the displayed
+ timestamp 25us occurred because the clock was incremented
  between the time of recording the max latency and the time of
  recording the function that had that latency.

- Note the above example had ftrace_enabled not set. If we set the
- ftrace_enabled, we get a much larger output:
+ Note the above example had function-trace not set. If we set
+ function-trace, we get a much larger output:
+
+  with echo 1 > options/function-trace

  # tracer: irqsoff
  #
- irqsoff latency trace v1.1.5 on 2.6.26-rc8
- --------------------------------------------------------------------
- latency: 50 us, #101/101, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
-    -----------------
-    | task: ls-4339 (uid:0 nice:0 policy:0 rt_prio:0)
-    -----------------
-  => started at: __alloc_pages_internal
-  => ended at:   __alloc_pages_internal
-
- #                _------=> CPU#
- #               / _-----=> irqs-off
- #              | / _----=> need-resched
- #              || / _---=> hardirq/softirq
- #              ||| / _--=> preempt-depth
- #              |||| /
- #              |||||     delay
- #  cmd     pid ||||| time  |   caller
- #     \   /    |||||   \   |   /
-   ls-4339  0...1    0us+: get_page_from_freelist (__alloc_pages_internal)
-   ls-4339  0d..1    3us : rmqueue_bulk (get_page_from_freelist)
-   ls-4339  0d..1    3us : _spin_lock (rmqueue_bulk)
-   ls-4339  0d..1    4us : add_preempt_count (_spin_lock)
-   ls-4339  0d..2    4us : __rmqueue (rmqueue_bulk)
-   ls-4339  0d..2    5us : __rmqueue_smallest (__rmqueue)
-   ls-4339  0d..2    5us : __mod_zone_page_state (__rmqueue_smallest)
-   ls-4339  0d..2    6us : __rmqueue (rmqueue_bulk)
-   ls-4339  0d..2    6us : __rmqueue_smallest (__rmqueue)
-   ls-4339  0d..2    7us : __mod_zone_page_state (__rmqueue_smallest)
-   ls-4339  0d..2    7us : __rmqueue (rmqueue_bulk)
-   ls-4339  0d..2    8us : __rmqueue_smallest (__rmqueue)
+ # irqsoff latency trace v1.1.5 on 3.8.0-test+
+ # --------------------------------------------------------------------
+ # latency: 71 us, #168/168, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+ #    -----------------
+ #    | task: bash-2042 (uid:0 nice:0 policy:0 rt_prio:0)
+ #    -----------------
+ #  => started at: ata_scsi_queuecmd
+ #  => ended at:   ata_scsi_queuecmd
+ #
+ #
+ #                  _------=> CPU#
+ #                 / _-----=> irqs-off
+ #                | / _----=> need-resched
+ #                || / _---=> hardirq/softirq
+ #                ||| / _--=> preempt-depth
+ #                |||| /     delay
+ #  cmd     pid   ||||| time  |   caller
+ #     \   /      |||||  \    |   /
+   bash-2042   3d...    0us : _raw_spin_lock_irqsave <-ata_scsi_queuecmd
+   bash-2042   3d...    0us : add_preempt_count <-_raw_spin_lock_irqsave
+   bash-2042   3d..1    1us : ata_scsi_find_dev <-ata_scsi_queuecmd
+   bash-2042   3d..1    1us : __ata_scsi_find_dev <-ata_scsi_find_dev
+   bash-2042   3d..1    2us : ata_find_dev.part.14 <-__ata_scsi_find_dev
+   bash-2042   3d..1    2us : ata_qc_new_init <-__ata_scsi_queuecmd
+   bash-2042   3d..1    3us : ata_sg_init <-__ata_scsi_queuecmd
+   bash-2042   3d..1    4us : ata_scsi_rw_xlat <-__ata_scsi_queuecmd
+   bash-2042   3d..1    4us : ata_build_rw_tf <-ata_scsi_rw_xlat
  [...]
-   ls-4339  0d..2   46us : __rmqueue_smallest (__rmqueue)
-   ls-4339  0d..2   47us : __mod_zone_page_state (__rmqueue_smallest)
-   ls-4339  0d..2   47us : __rmqueue (rmqueue_bulk)
-   ls-4339  0d..2   48us : __rmqueue_smallest (__rmqueue)
-   ls-4339  0d..2   48us : __mod_zone_page_state (__rmqueue_smallest)
-   ls-4339  0d..2   49us : _spin_unlock (rmqueue_bulk)
-   ls-4339  0d..2   49us : sub_preempt_count (_spin_unlock)
-   ls-4339  0d..1   50us : get_page_from_freelist (__alloc_pages_internal)
-   ls-4339  0d..2   51us : trace_hardirqs_on (__alloc_pages_internal)
+   bash-2042   3d..1   67us : delay_tsc <-__delay
+   bash-2042   3d..1   67us : add_preempt_count <-delay_tsc
+   bash-2042   3d..2   67us : sub_preempt_count <-delay_tsc
+   bash-2042   3d..1   67us : add_preempt_count <-delay_tsc
+   bash-2042   3d..2   68us : sub_preempt_count <-delay_tsc
+   bash-2042   3d..1   68us+: ata_bmdma_start <-ata_bmdma_qc_issue
+   bash-2042   3d..1   71us : _raw_spin_unlock_irqrestore <-ata_scsi_queuecmd
+   bash-2042   3d..1   71us : _raw_spin_unlock_irqrestore <-ata_scsi_queuecmd
+   bash-2042   3d..1   72us+: trace_hardirqs_on <-ata_scsi_queuecmd
+   bash-2042   3d..1  120us : <stack trace>
+  => _raw_spin_unlock_irqrestore
+  => ata_scsi_queuecmd
+  => scsi_dispatch_cmd
+  => scsi_request_fn
+  => __blk_run_queue_uncond
+  => __blk_run_queue
+  => blk_queue_bio
+  => generic_make_request
+  => submit_bio
+  => submit_bh
+  => __ext3_get_inode_loc
+  => ext3_iget
+  => ext3_lookup
+  => lookup_real
+  => __lookup_hash
+  => walk_component
+  => lookup_last
+  => path_lookupat
+  => filename_lookup
+  => user_path_at_empty
+  => user_path_at
+  => vfs_fstatat
+  => vfs_stat
+  => sys_newstat
+  => system_call_fastpath


-
- Here we traced a 50 microsecond latency. But we also see all the
+ Here we traced a 71 microsecond latency. But we also see all the
  functions that were called during that time. Note that by
  enabling function tracing, we incur an added overhead. This
  overhead may extend the latency times. But nevertheless, this
···
  which preemption was disabled. The control of preemptoff tracer
  is much like the irqsoff tracer.

+  # echo 0 > options/function-trace
   # echo preemptoff > current_tracer
-  # echo latency-format > trace_options
-  # echo 0 > tracing_max_latency
   # echo 1 > tracing_on
+  # echo 0 > tracing_max_latency
   # ls -ltr
  [...]
   # echo 0 > tracing_on
   # cat trace
  # tracer: preemptoff
  #
- preemptoff latency trace v1.1.5 on 2.6.26-rc8
- --------------------------------------------------------------------
- latency: 29 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
-    -----------------
-    | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
-    -----------------
-  => started at: do_IRQ
-  => ended at:   __do_softirq
-
- #                _------=> CPU#
- #               / _-----=> irqs-off
- #              | / _----=> need-resched
- #              || / _---=> hardirq/softirq
- #              ||| / _--=> preempt-depth
- #              |||| /
- #              |||||     delay
- #  cmd     pid ||||| time  |   caller
- #     \   /    |||||   \   |   /
-   sshd-4261  0d.h.    0us+: irq_enter (do_IRQ)
-   sshd-4261  0d.s.   29us : _local_bh_enable (__do_softirq)
-   sshd-4261  0d.s1   30us : trace_preempt_on (__do_softirq)
+ # preemptoff latency trace v1.1.5 on 3.8.0-test+
+ # --------------------------------------------------------------------
+ # latency: 46 us, #4/4, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+ #    -----------------
+ #    | task: sshd-1991 (uid:0 nice:0 policy:0 rt_prio:0)
+ #    -----------------
+ #  => started at: do_IRQ
+ #  => ended at:   do_IRQ
+ #
+ #
+ #                  _------=> CPU#
+ #                 / _-----=> irqs-off
+ #                | / _----=> need-resched
+ #                || / _---=> hardirq/softirq
+ #                ||| / _--=> preempt-depth
+ #                |||| /     delay
+ #  cmd     pid   ||||| time  |   caller
+ #     \   /      |||||  \    |   /
+   sshd-1991   1d.h.    0us+: irq_enter <-do_IRQ
+   sshd-1991   1d..1   46us : irq_exit <-do_IRQ
+   sshd-1991   1d..1   47us+: trace_preempt_on <-do_IRQ
+   sshd-1991   1d..1   52us : <stack trace>
+  => sub_preempt_count
+  => irq_exit
+  => do_IRQ
+  => ret_from_intr


  This has some more changes. Preemption was disabled when an
- interrupt came in (notice the 'h'), and was enabled while doing
- a softirq. (notice the 's'). But we also see that interrupts
- have been disabled when entering the preempt off section and
- leaving it (the 'd'). We do not know if interrupts were enabled
- in the mean time.
+ interrupt came in (notice the 'h'), and was enabled on exit.
+ But we also see that interrupts have been disabled when entering
+ the preempt off section and leaving it (the 'd'). We do not know if
+ interrupts were enabled in the mean time or shortly after this
+ was over.

  # tracer: preemptoff
  #
- preemptoff latency trace v1.1.5 on 2.6.26-rc8
- --------------------------------------------------------------------
- latency: 63 us, #87/87, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
-    -----------------
-    | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
-    -----------------
-  => started at: remove_wait_queue
-  => ended at:   __do_softirq
-
- #                _------=> CPU#
- #               / _-----=> irqs-off
- #              | / _----=> need-resched
- #              || / _---=> hardirq/softirq
- #              ||| / _--=> preempt-depth
- #              |||| /
- #              |||||     delay
- #  cmd     pid ||||| time  |   caller
- #     \   /    |||||   \   |   /
-   sshd-4261  0d..1    0us : _spin_lock_irqsave (remove_wait_queue)
-   sshd-4261  0d..1    1us : _spin_unlock_irqrestore (remove_wait_queue)
-   sshd-4261  0d..1    2us : do_IRQ (common_interrupt)
-   sshd-4261  0d..1    2us : irq_enter (do_IRQ)
-   sshd-4261  0d..1    2us : idle_cpu (irq_enter)
-   sshd-4261  0d..1    3us : add_preempt_count (irq_enter)
-   sshd-4261  0d.h1    3us : idle_cpu (irq_enter)
-   sshd-4261  0d.h.    4us : handle_fasteoi_irq (do_IRQ)
+ # preemptoff latency trace v1.1.5 on 3.8.0-test+
+ # --------------------------------------------------------------------
+ # latency: 83 us, #241/241, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+ #    -----------------
+ #    | task: bash-1994 (uid:0 nice:0 policy:0 rt_prio:0)
+ #    -----------------
+ #  => started at: wake_up_new_task
+ #  => ended at:   task_rq_unlock
+ #
+ #
+ #                  _------=> CPU#
+ #                 / _-----=> irqs-off
+ #                | / _----=> need-resched
+ #                || / _---=> hardirq/softirq
+ #                ||| / _--=> preempt-depth
+ #                |||| /     delay
+ #  cmd     pid   ||||| time  |   caller
+ #     \   /      |||||  \    |   /
+   bash-1994   1d..1    0us : _raw_spin_lock_irqsave <-wake_up_new_task
+   bash-1994   1d..1    0us : select_task_rq_fair <-select_task_rq
+   bash-1994   1d..1    1us : __rcu_read_lock <-select_task_rq_fair
+   bash-1994   1d..1    1us : source_load <-select_task_rq_fair
+   bash-1994   1d..1    1us : source_load <-select_task_rq_fair
  [...]
-   sshd-4261  0d.h.   12us : add_preempt_count (_spin_lock)
-   sshd-4261  0d.h1   12us : ack_ioapic_quirk_irq (handle_fasteoi_irq)
-   sshd-4261  0d.h1   13us : move_native_irq (ack_ioapic_quirk_irq)
-   sshd-4261  0d.h1   13us : _spin_unlock (handle_fasteoi_irq)
-   sshd-4261  0d.h1   14us : sub_preempt_count (_spin_unlock)
-   sshd-4261  0d.h1   14us : irq_exit (do_IRQ)
-   sshd-4261  0d.h1   15us : sub_preempt_count (irq_exit)
-   sshd-4261  0d..2   15us : do_softirq (irq_exit)
-   sshd-4261  0d...   15us : __do_softirq (do_softirq)
-   sshd-4261  0d...   16us : __local_bh_disable (__do_softirq)
-   sshd-4261  0d...   16us+: add_preempt_count (__local_bh_disable)
-   sshd-4261  0d.s4   20us : add_preempt_count (__local_bh_disable)
-   sshd-4261  0d.s4   21us : sub_preempt_count (local_bh_enable)
-   sshd-4261  0d.s5   21us : sub_preempt_count (local_bh_enable)
+   bash-1994   1d..1   12us : irq_enter <-smp_apic_timer_interrupt
+   bash-1994   1d..1   12us : rcu_irq_enter <-irq_enter
+   bash-1994   1d..1   13us : add_preempt_count <-irq_enter
+   bash-1994   1d.h1   13us : exit_idle <-smp_apic_timer_interrupt
+   bash-1994   1d.h1   13us : hrtimer_interrupt <-smp_apic_timer_interrupt
+   bash-1994   1d.h1   13us : _raw_spin_lock <-hrtimer_interrupt
+   bash-1994   1d.h1   14us : add_preempt_count <-_raw_spin_lock
+   bash-1994   1d.h2   14us : ktime_get_update_offsets <-hrtimer_interrupt
  [...]
-   sshd-4261  0d.s6   41us : add_preempt_count (__local_bh_disable)
-   sshd-4261  0d.s6   42us : sub_preempt_count (local_bh_enable)
-   sshd-4261  0d.s7   42us : sub_preempt_count (local_bh_enable)
-   sshd-4261  0d.s5   43us : add_preempt_count (__local_bh_disable)
-   sshd-4261  0d.s5   43us : sub_preempt_count (local_bh_enable_ip)
-   sshd-4261  0d.s6   44us : sub_preempt_count (local_bh_enable_ip)
-   sshd-4261  0d.s5   44us : add_preempt_count (__local_bh_disable)
-   sshd-4261  0d.s5   45us : sub_preempt_count (local_bh_enable)
+   bash-1994   1d.h1   35us : lapic_next_event <-clockevents_program_event
+   bash-1994   1d.h1   35us : irq_exit <-smp_apic_timer_interrupt
+   bash-1994   1d.h1   36us : sub_preempt_count <-irq_exit
+   bash-1994   1d..2   36us : do_softirq <-irq_exit
+   bash-1994   1d..2   36us : __do_softirq <-call_softirq
+   bash-1994   1d..2   36us : __local_bh_disable <-__do_softirq
+   bash-1994   1d.s2   37us : add_preempt_count <-_raw_spin_lock_irq
+   bash-1994   1d.s3   38us : _raw_spin_unlock <-run_timer_softirq
+   bash-1994   1d.s3   39us : sub_preempt_count <-_raw_spin_unlock
+   bash-1994   1d.s2   39us : call_timer_fn <-run_timer_softirq
  [...]
-   sshd-4261  0d.s.   63us : _local_bh_enable (__do_softirq)
-   sshd-4261  0d.s1   64us : trace_preempt_on (__do_softirq)
+   bash-1994   1dNs2   81us : cpu_needs_another_gp <-rcu_process_callbacks
+   bash-1994   1dNs2   82us : __local_bh_enable <-__do_softirq
+   bash-1994   1dNs2   82us : sub_preempt_count <-__local_bh_enable
+   bash-1994   1dN.2   82us : idle_cpu <-irq_exit
+   bash-1994   1dN.2   83us : rcu_irq_exit <-irq_exit
+   bash-1994   1dN.2   83us : sub_preempt_count <-irq_exit
+   bash-1994   1.N.1   84us : _raw_spin_unlock_irqrestore <-task_rq_unlock
+   bash-1994   1.N.1   84us+: trace_preempt_on <-task_rq_unlock
+   bash-1994   1.N.1  104us : <stack trace>
+  => sub_preempt_count
+  => _raw_spin_unlock_irqrestore
+  => task_rq_unlock
+  => wake_up_new_task
+  => do_fork
+  => sys_clone
+  => stub_clone


  The above is an example of the preemptoff trace with
- ftrace_enabled set. Here we see that interrupts were disabled
+ function-trace set. Here we see that interrupts were not disabled
  the entire time. The irq_enter code lets us know that we entered
  an interrupt 'h'. Before that, the functions being traced still
  show that it is not in an interrupt, but we can see from the
  functions themselves that this is not the case.
-
- Notice that __do_softirq when called does not have a
- preempt_count. It may seem that we missed a preempt enabling.
- What really happened is that the preempt count is held on the
- thread's stack and we switched to the softirq stack (4K stacks
- in effect). The code does not copy the preempt count, but
- because interrupts are disabled, we do not need to worry about
- it. Having a tracer like this is good for letting people know
- what really happens inside the kernel.
-

 preemptirqsoff
 --------------
···
  Again, using this trace is much like the irqsoff and preemptoff
  tracers.
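The five-character latency field these examples keep referring to ('d' for irqs-off, 'N' for need-resched, '.', 's', 'h' or 'H' for the softirq/hardirq state, and a digit for preempt-depth, preceded by the CPU number) packs a lot of state into very little space. As a quick sanity check, here is a small Python sketch that decodes it; the helper name and dictionary layout are mine, not anything ftrace itself provides, and it assumes the single-digit, single-CPU form shown in these examples:

```python
# Hypothetical decoder (not part of ftrace) for the latency-format field,
# e.g. "1d.s2" from a line like:
#   bash-1994  1d.s2  37us : add_preempt_count <-_raw_spin_lock_irq
def decode_latency_flags(field):
    cpu, irq, need, ctx, depth = field[0], field[1], field[2], field[3], field[4]
    return {
        "cpu": int(cpu),
        "irqs_off": irq == "d",        # 'd' = interrupts disabled
        "need_resched": need == "N",   # 'N' = need_resched is set
        # '.' = normal context, 's' = softirq, 'h' = hardirq,
        # 'H' = hardirq running inside a softirq
        "context": {".": "normal", "s": "softirq",
                    "h": "hardirq", "H": "hardirq-in-softirq"}[ctx],
        "preempt_depth": int(depth) if depth != "." else 0,
    }

print(decode_latency_flags("1d.s2"))
print(decode_latency_flags("3d.H5"))
```

Reading the preemptoff output above with this in hand, "1d.s2" says: CPU 1, interrupts off, no reschedule pending, inside a softirq, preempt count of 2.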

+  # echo 0 > options/function-trace
   # echo preemptirqsoff > current_tracer
-  # echo latency-format > trace_options
-  # echo 0 > tracing_max_latency
   # echo 1 > tracing_on
+  # echo 0 > tracing_max_latency
   # ls -ltr
  [...]
   # echo 0 > tracing_on
   # cat trace
  # tracer: preemptirqsoff
  #
- preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
- --------------------------------------------------------------------
- latency: 293 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
-    -----------------
-    | task: ls-4860 (uid:0 nice:0 policy:0 rt_prio:0)
-    -----------------
-  => started at: apic_timer_interrupt
-  => ended at:   __do_softirq
-
- #                _------=> CPU#
- #               / _-----=> irqs-off
- #              | / _----=> need-resched
- #              || / _---=> hardirq/softirq
- #              ||| / _--=> preempt-depth
- #              |||| /
- #              |||||     delay
- #  cmd     pid ||||| time  |   caller
- #     \   /    |||||   \   |   /
-   ls-4860  0d...    0us!: trace_hardirqs_off_thunk (apic_timer_interrupt)
-   ls-4860  0d.s.  294us : _local_bh_enable (__do_softirq)
-   ls-4860  0d.s1  294us : trace_preempt_on (__do_softirq)
-
+ # preemptirqsoff latency trace v1.1.5 on 3.8.0-test+
+ # --------------------------------------------------------------------
+ # latency: 100 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+ #    -----------------
+ #    | task: ls-2230 (uid:0 nice:0 policy:0 rt_prio:0)
+ #    -----------------
+ #  => started at: ata_scsi_queuecmd
+ #  => ended at:   ata_scsi_queuecmd
+ #
+ #
+ #                  _------=> CPU#
+ #                 / _-----=> irqs-off
+ #                | / _----=> need-resched
+ #                || / _---=> hardirq/softirq
+ #                ||| / _--=> preempt-depth
+ #                |||| /     delay
+ #  cmd     pid   ||||| time  |   caller
+ #     \   /      |||||  \    |   /
+   ls-2230   3d...    0us+: _raw_spin_lock_irqsave <-ata_scsi_queuecmd
+   ls-2230   3...1  100us : _raw_spin_unlock_irqrestore <-ata_scsi_queuecmd
+   ls-2230   3...1  101us+: trace_preempt_on <-ata_scsi_queuecmd
+   ls-2230   3...1  111us : <stack trace>
+  => sub_preempt_count
+  => _raw_spin_unlock_irqrestore
+  => ata_scsi_queuecmd
+  => scsi_dispatch_cmd
+  => scsi_request_fn
+  => __blk_run_queue_uncond
+  => __blk_run_queue
+  => blk_queue_bio
+  => generic_make_request
+  => submit_bio
+  => submit_bh
+  => ext3_bread
+  => ext3_dir_bread
+  => htree_dirblock_to_tree
+  => ext3_htree_fill_tree
+  => ext3_readdir
+  => vfs_readdir
+  => sys_getdents
+  => system_call_fastpath


  The trace_hardirqs_off_thunk is called from assembly on x86 when
···
  within the preemption points. We do see that it started with
  preemption enabled.

- Here is a trace with ftrace_enabled set:
-
+ Here is a trace with function-trace set:

  # tracer: preemptirqsoff
  #
- preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
- --------------------------------------------------------------------
- latency: 105 us, #183/183, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
-    -----------------
-    | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
-    -----------------
-  => started at: write_chan
-  => ended at:   __do_softirq
+ # preemptirqsoff latency trace v1.1.5 on 3.8.0-test+
+ # --------------------------------------------------------------------
+ # latency: 161 us, #339/339, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
+ #    -----------------
+ #    | task: ls-2269 (uid:0 nice:0 policy:0 rt_prio:0)
+ #    -----------------
+ #  => started at: schedule
+ #  => ended at:   mutex_unlock
+ #
+ #
+ #                  _------=> CPU#
+ #                 / _-----=> irqs-off
+ #                | / _----=> need-resched
+ #                || / _---=> hardirq/softirq
+ #                ||| / _--=> preempt-depth
+ #                |||| /     delay
+ #  cmd     pid   ||||| time  |   caller
+ #     \   /      |||||  \    |   /
+   kworker/-59   3...1    0us : __schedule <-schedule
+   kworker/-59   3d..1    0us : rcu_preempt_qs <-rcu_note_context_switch
+   kworker/-59   3d..1    1us : add_preempt_count <-_raw_spin_lock_irq
+   kworker/-59   3d..2    1us : deactivate_task <-__schedule
+   kworker/-59   3d..2    1us : dequeue_task <-deactivate_task
+   kworker/-59   3d..2    2us : update_rq_clock <-dequeue_task
+   kworker/-59   3d..2    2us : dequeue_task_fair <-dequeue_task
+   kworker/-59   3d..2    2us : update_curr <-dequeue_task_fair
+   kworker/-59   3d..2    2us : update_min_vruntime <-update_curr
+   kworker/-59   3d..2    3us : cpuacct_charge <-update_curr
+   kworker/-59   3d..2    3us : __rcu_read_lock <-cpuacct_charge
+   kworker/-59   3d..2    3us : __rcu_read_unlock <-cpuacct_charge
+   kworker/-59   3d..2    3us : update_cfs_rq_blocked_load <-dequeue_task_fair
+   kworker/-59   3d..2    4us : clear_buddies <-dequeue_task_fair
+   kworker/-59   3d..2    4us : account_entity_dequeue <-dequeue_task_fair
+   kworker/-59   3d..2    4us : update_min_vruntime <-dequeue_task_fair
+   kworker/-59   3d..2    4us : update_cfs_shares <-dequeue_task_fair
+   kworker/-59   3d..2    5us : hrtick_update <-dequeue_task_fair
+   kworker/-59   3d..2    5us : wq_worker_sleeping <-__schedule
+   kworker/-59   3d..2    5us : kthread_data <-wq_worker_sleeping
+   kworker/-59   3d..2    5us : put_prev_task_fair <-__schedule
+   kworker/-59   3d..2    6us : pick_next_task_fair <-pick_next_task
+   kworker/-59   3d..2    6us : clear_buddies <-pick_next_task_fair
+   kworker/-59   3d..2    6us : set_next_entity <-pick_next_task_fair
+   kworker/-59   3d..2    6us : update_stats_wait_end <-set_next_entity
+   ls-2269   3d..2    7us : finish_task_switch <-__schedule
+   ls-2269   3d..2    7us : _raw_spin_unlock_irq <-finish_task_switch
+   ls-2269   3d..2    8us : do_IRQ <-ret_from_intr
+   ls-2269   3d..2    8us : irq_enter <-do_IRQ
+   ls-2269   3d..2    8us : rcu_irq_enter <-irq_enter
+   ls-2269   3d..2    9us : add_preempt_count <-irq_enter
+   ls-2269   3d.h2    9us : exit_idle <-do_IRQ
+ [...]
+   ls-2269   3d.h3   20us : sub_preempt_count <-_raw_spin_unlock
+   ls-2269   3d.h2   20us : irq_exit <-do_IRQ
+   ls-2269   3d.h2   21us : sub_preempt_count <-irq_exit
+   ls-2269   3d..3   21us : do_softirq <-irq_exit
+   ls-2269   3d..3   21us : __do_softirq <-call_softirq
+   ls-2269   3d..3   21us+: __local_bh_disable <-__do_softirq
+   ls-2269   3d.s4   29us : sub_preempt_count <-_local_bh_enable_ip
+   ls-2269   3d.s5   29us : sub_preempt_count <-_local_bh_enable_ip
+   ls-2269   3d.s5   31us : do_IRQ <-ret_from_intr
+   ls-2269   3d.s5   31us : irq_enter <-do_IRQ
+   ls-2269   3d.s5   31us : rcu_irq_enter <-irq_enter
+ [...]
+   ls-2269   3d.s5   31us : rcu_irq_enter <-irq_enter
+   ls-2269   3d.s5   32us : add_preempt_count <-irq_enter
+   ls-2269   3d.H5   32us : exit_idle <-do_IRQ
+   ls-2269   3d.H5   32us : handle_irq <-do_IRQ
+   ls-2269   3d.H5   32us : irq_to_desc <-handle_irq
+   ls-2269   3d.H5   33us : handle_fasteoi_irq <-handle_irq
+ [...]
+   ls-2269   3d.s5  158us : _raw_spin_unlock_irqrestore <-rtl8139_poll
+   ls-2269   3d.s3  158us : net_rps_action_and_irq_enable.isra.65 <-net_rx_action
+   ls-2269   3d.s3  159us : __local_bh_enable <-__do_softirq
+   ls-2269   3d.s3  159us : sub_preempt_count <-__local_bh_enable
+   ls-2269   3d..3  159us : idle_cpu <-irq_exit
+   ls-2269   3d..3  159us : rcu_irq_exit <-irq_exit
+   ls-2269   3d..3  160us : sub_preempt_count <-irq_exit
+   ls-2269   3d...  161us : __mutex_unlock_slowpath <-mutex_unlock
+   ls-2269   3d...  162us+: trace_hardirqs_on <-mutex_unlock
+   ls-2269   3d...  186us : <stack trace>
+  => __mutex_unlock_slowpath
+  => mutex_unlock
+  => process_output
+  => n_tty_write
+  => tty_write
+  => vfs_write
+  => sys_write
+  => system_call_fastpath

- #                _------=> CPU#
- #               / _-----=> irqs-off
- #              | / _----=> need-resched
- #              || / _---=> hardirq/softirq
- #              ||| / _--=> preempt-depth
- #              |||| /
- #              |||||     delay
- #  cmd     pid ||||| time  |   caller
- #     \   /    |||||   \   |   /
-   ls-4473  0.N..    0us : preempt_schedule (write_chan)
-   ls-4473  0dN.1    1us : _spin_lock (schedule)
-   ls-4473  0dN.1    2us : add_preempt_count (_spin_lock)
-   ls-4473  0d..2    2us : put_prev_task_fair (schedule)
- [...]
-   ls-4473  0d..2   13us : set_normalized_timespec (ktime_get_ts)
-   ls-4473  0d..2   13us : __switch_to (schedule)
-   sshd-4261  0d..2   14us : finish_task_switch (schedule)
-   sshd-4261  0d..2   14us : _spin_unlock_irq (finish_task_switch)
-   sshd-4261  0d..1   15us : add_preempt_count (_spin_lock_irqsave)
-   sshd-4261  0d..2   16us : _spin_unlock_irqrestore (hrtick_set)
-   sshd-4261  0d..2   16us : do_IRQ (common_interrupt)
-   sshd-4261  0d..2   17us : irq_enter (do_IRQ)
-   sshd-4261  0d..2   17us : idle_cpu (irq_enter)
-   sshd-4261  0d..2   18us : add_preempt_count (irq_enter)
-   sshd-4261  0d.h2   18us : idle_cpu (irq_enter)
-   sshd-4261  0d.h.   18us : handle_fasteoi_irq (do_IRQ)
-   sshd-4261  0d.h.   19us : _spin_lock (handle_fasteoi_irq)
-   sshd-4261  0d.h.   19us : add_preempt_count (_spin_lock)
-   sshd-4261  0d.h1   20us : _spin_unlock (handle_fasteoi_irq)
-   sshd-4261  0d.h1   20us : sub_preempt_count (_spin_unlock)
- [...]
-   sshd-4261  0d.h1   28us : _spin_unlock (handle_fasteoi_irq)
-   sshd-4261  0d.h1   29us : sub_preempt_count (_spin_unlock)
-   sshd-4261  0d.h2   29us : irq_exit (do_IRQ)
-   sshd-4261  0d.h2   29us : sub_preempt_count (irq_exit)
-   sshd-4261  0d..3   30us : do_softirq (irq_exit)
-   sshd-4261  0d...   30us : __do_softirq (do_softirq)
-   sshd-4261  0d...   31us : __local_bh_disable (__do_softirq)
-   sshd-4261  0d...   31us+: add_preempt_count (__local_bh_disable)
-   sshd-4261  0d.s4   34us : add_preempt_count (__local_bh_disable)
- [...]
-   sshd-4261  0d.s3   43us : sub_preempt_count (local_bh_enable_ip)
-   sshd-4261  0d.s4   44us : sub_preempt_count (local_bh_enable_ip)
-   sshd-4261  0d.s3   44us : smp_apic_timer_interrupt (apic_timer_interrupt)
-   sshd-4261  0d.s3   45us : irq_enter (smp_apic_timer_interrupt)
-   sshd-4261  0d.s3   45us : idle_cpu (irq_enter)
-   sshd-4261  0d.s3   46us : add_preempt_count (irq_enter)
-   sshd-4261  0d.H3   46us : idle_cpu (irq_enter)
-   sshd-4261  0d.H3   47us : hrtimer_interrupt (smp_apic_timer_interrupt)
-   sshd-4261  0d.H3   47us : ktime_get (hrtimer_interrupt)
- [...]
-   sshd-4261  0d.H3   81us : tick_program_event (hrtimer_interrupt)
-   sshd-4261  0d.H3   82us : ktime_get (tick_program_event)
-   sshd-4261  0d.H3   82us : ktime_get_ts (ktime_get)
-   sshd-4261  0d.H3   83us : getnstimeofday (ktime_get_ts)
-   sshd-4261  0d.H3   83us : set_normalized_timespec (ktime_get_ts)
-   sshd-4261  0d.H3   84us : clockevents_program_event (tick_program_event)
-   sshd-4261  0d.H3   84us : lapic_next_event (clockevents_program_event)
-   sshd-4261  0d.H3   85us : irq_exit (smp_apic_timer_interrupt)
-   sshd-4261  0d.H3   85us : sub_preempt_count (irq_exit)
-   sshd-4261  0d.s4   86us : sub_preempt_count (irq_exit)
-   sshd-4261  0d.s3   86us : add_preempt_count (__local_bh_disable)
- [...]
-   sshd-4261  0d.s1   98us : sub_preempt_count (net_rx_action)
-   sshd-4261  0d.s.   99us : add_preempt_count (_spin_lock_irq)
-   sshd-4261  0d.s1   99us+: _spin_unlock_irq (run_timer_softirq)
-   sshd-4261  0d.s.  104us : _local_bh_enable (__do_softirq)
-   sshd-4261  0d.s.  104us : sub_preempt_count (_local_bh_enable)
-   sshd-4261  0d.s.  105us : _local_bh_enable (__do_softirq)
-   sshd-4261  0d.s1  105us : trace_preempt_on (__do_softirq)
-
-
- This is a very interesting trace. It started with the preemption
- of the ls task. We see that the task had the "need_resched" bit
- set via the 'N' in the trace. Interrupts were disabled before
- the spin_lock at the beginning of the trace. We see that a
- schedule took place to run sshd. When the interrupts were
- enabled, we took an interrupt. On return from the interrupt
- handler, the softirq ran. We took another interrupt while
- running the softirq as we see from the capital 'H'.
+ This is an interesting trace. It started with kworker running and
+ scheduling out and ls taking over. But as soon as ls released the
+ rq lock and enabled interrupts (but not preemption) an interrupt
+ triggered. When the interrupt finished, it started running softirqs.
+ But while the softirq was running, another interrupt triggered.
+ When an interrupt is running inside a softirq, the annotation is 'H'.


 wakeup
 ------
+
+ One common case that people are interested in tracing is the
+ time it takes for a task that is woken to actually wake up.
+ Now for non Real-Time tasks, this can be arbitrary. But tracing
+ it none the less can be interesting.
913 + 914 + Without function tracing: 915 + 916 + # echo 0 > options/function-trace 917 + # echo wakeup > current_tracer 918 + # echo 1 > tracing_on 919 + # echo 0 > tracing_max_latency 920 + # chrt -f 5 sleep 1 921 + # echo 0 > tracing_on 922 + # cat trace 923 + # tracer: wakeup 924 + # 925 + # wakeup latency trace v1.1.5 on 3.8.0-test+ 926 + # -------------------------------------------------------------------- 927 + # latency: 15 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) 928 + # ----------------- 929 + # | task: kworker/3:1H-312 (uid:0 nice:-20 policy:0 rt_prio:0) 930 + # ----------------- 931 + # 932 + # _------=> CPU# 933 + # / _-----=> irqs-off 934 + # | / _----=> need-resched 935 + # || / _---=> hardirq/softirq 936 + # ||| / _--=> preempt-depth 937 + # |||| / delay 938 + # cmd pid ||||| time | caller 939 + # \ / ||||| \ | / 940 + <idle>-0 3dNs7 0us : 0:120:R + [003] 312:100:R kworker/3:1H 941 + <idle>-0 3dNs7 1us+: ttwu_do_activate.constprop.87 <-try_to_wake_up 942 + <idle>-0 3d..3 15us : __schedule <-schedule 943 + <idle>-0 3d..3 15us : 0:120:R ==> [003] 312:100:R kworker/3:1H 944 + 945 + The tracer only traces the highest priority task in the system 946 + to avoid tracing the normal circumstances. Here we see that 947 + the kworker with a nice priority of -20 (not very nice), took 948 + just 15 microseconds from the time it woke up, to the time it 949 + ran. 950 + 951 + Non Real-Time tasks are not that interesting. A more interesting 952 + trace is to concentrate only on Real-Time tasks. 953 + 954 + wakeup_rt 955 + --------- 1359 956 1360 957 In a Real-Time environment it is very important to know the 1361 958 wakeup time it takes for the highest priority task that is woken ··· 1423 914 That is the longest latency it takes for something to happen, 1424 915 and not the average. We can have a very fast scheduler that may 1425 916 only have a large latency once in a while, but that would not 1426 - work well with Real-Time tasks. 
The wakeup_rt tracer was designed
to record the worst case wakeups of RT tasks. Non-RT tasks are
not recorded because the tracer only records one worst case and
tracing non-RT tasks that are unpredictable will overwrite the
worst case latency of RT tasks (just run the normal wakeup
tracer for a while to see that effect).

Since this tracer only deals with RT tasks, we will run this
slightly differently than we did with the previous tracers.
Instead of performing an 'ls', we will run 'sleep 1' under
'chrt' which changes the priority of the task.

  # echo 0 > options/function-trace
  # echo wakeup_rt > current_tracer
  # echo 1 > tracing_on
  # echo 0 > tracing_max_latency
  # chrt -f 5 sleep 1
  # echo 0 > tracing_on
  # cat trace
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 5 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: sleep-2389 (uid:0 nice:0 policy:1 rt_prio:5)
#    -----------------
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||  \    |   /
  <idle>-0       3d.h4    0us :      0:120:R   + [003]  2389: 94:R sleep
  <idle>-0       3d.h4    1us+: ttwu_do_activate.constprop.87 <-try_to_wake_up
  <idle>-0       3d..3    5us : __schedule <-schedule
  <idle>-0       3d..3    5us :      0:120:R ==> [003]  2389: 94:R sleep

Running this on an idle system, we see that it only took 5 microseconds
to perform the task switch. Note, since the trace point in the schedule
is before the actual "switch", we stop the tracing when the recorded task
is about to schedule in. This may change if we add a new marker at the
end of the scheduler.

Notice that the recorded task is 'sleep' with the PID of 2389
and it has an rt_prio of 5. This priority is user-space priority
and not the internal kernel priority. The policy is 1 for
SCHED_FIFO and 2 for SCHED_RR.

Note that the trace data shows the internal priority (99 - rtprio).
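The mapping between user-space priorities and the internal values shown
in the trace can be checked with plain shell arithmetic (a sketch of the
arithmetic only; the kernel computes these values internally):

```shell
# RT tasks: the trace shows 99 - rt_prio, so rt_prio 5 appears as 94.
rt_prio=5
rt_internal=$((99 - rt_prio))
echo "sleep rt_prio $rt_prio -> internal $rt_internal"

# Normal tasks: the trace shows 120 + nice, so nice 0 appears as 120.
nice=0
normal_internal=$((120 + nice))
echo "idle  nice $nice     -> internal $normal_internal"
```

These are exactly the 94 and 120 seen in the trace line below.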

  <idle>-0       3d..3    5us :      0:120:R ==> [003]  2389: 94:R sleep

The 0:120:R means idle was running with a nice priority of 0 (120 - 120)
and in the running state 'R'. The sleep task was scheduled in with
2389: 94:R. That is, the priority is the kernel rtprio (99 - 5 = 94)
and it too is in the running state.

Doing the same with chrt -f 5 and function-trace set.

  # echo 1 > options/function-trace

# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 29 us, #85/85, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: sleep-2448 (uid:0 nice:0 policy:1 rt_prio:5)
#    -----------------
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||  \    |   /
  <idle>-0       3d.h4    1us+:      0:120:R   + [003]  2448: 94:R sleep
  <idle>-0       3d.h4    2us : ttwu_do_activate.constprop.87 <-try_to_wake_up
  <idle>-0       3d.h3    3us : check_preempt_curr <-ttwu_do_wakeup
  <idle>-0       3d.h3    3us : resched_task <-check_preempt_curr
  <idle>-0       3dNh3    4us : task_woken_rt <-ttwu_do_wakeup
  <idle>-0       3dNh3    4us : _raw_spin_unlock <-try_to_wake_up
  <idle>-0       3dNh3    4us : sub_preempt_count <-_raw_spin_unlock
  <idle>-0       3dNh2    5us : ttwu_stat <-try_to_wake_up
  <idle>-0       3dNh2    5us : _raw_spin_unlock_irqrestore <-try_to_wake_up
  <idle>-0       3dNh2    6us : sub_preempt_count <-_raw_spin_unlock_irqrestore
  <idle>-0       3dNh1    6us : _raw_spin_lock <-__run_hrtimer
  <idle>-0       3dNh1    6us : add_preempt_count <-_raw_spin_lock
  <idle>-0       3dNh2    7us : _raw_spin_unlock <-hrtimer_interrupt
  <idle>-0       3dNh2    7us : sub_preempt_count <-_raw_spin_unlock
  <idle>-0       3dNh1    7us : tick_program_event <-hrtimer_interrupt
  <idle>-0       3dNh1    7us : clockevents_program_event <-tick_program_event
  <idle>-0       3dNh1    8us : ktime_get <-clockevents_program_event
  <idle>-0       3dNh1    8us : lapic_next_event <-clockevents_program_event
  <idle>-0       3dNh1    8us : irq_exit <-smp_apic_timer_interrupt
  <idle>-0       3dNh1    9us : sub_preempt_count <-irq_exit
  <idle>-0       3dN.2    9us : idle_cpu <-irq_exit
  <idle>-0       3dN.2    9us : rcu_irq_exit <-irq_exit
  <idle>-0       3dN.2   10us : rcu_eqs_enter_common.isra.45 <-rcu_irq_exit
  <idle>-0       3dN.2   10us : sub_preempt_count <-irq_exit
  <idle>-0       3.N.1   11us : rcu_idle_exit <-cpu_idle
  <idle>-0       3dN.1   11us : rcu_eqs_exit_common.isra.43 <-rcu_idle_exit
  <idle>-0       3.N.1   11us : tick_nohz_idle_exit <-cpu_idle
  <idle>-0       3dN.1   12us : menu_hrtimer_cancel <-tick_nohz_idle_exit
  <idle>-0       3dN.1   12us : ktime_get <-tick_nohz_idle_exit
  <idle>-0       3dN.1   12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
  <idle>-0       3dN.1   13us : update_cpu_load_nohz <-tick_nohz_idle_exit
  <idle>-0       3dN.1   13us : _raw_spin_lock <-update_cpu_load_nohz
  <idle>-0       3dN.1   13us : add_preempt_count <-_raw_spin_lock
  <idle>-0       3dN.2   13us : __update_cpu_load <-update_cpu_load_nohz
  <idle>-0       3dN.2   14us : sched_avg_update <-__update_cpu_load
  <idle>-0       3dN.2   14us : _raw_spin_unlock <-update_cpu_load_nohz
  <idle>-0       3dN.2   14us : sub_preempt_count <-_raw_spin_unlock
  <idle>-0       3dN.1   15us : calc_load_exit_idle <-tick_nohz_idle_exit
  <idle>-0       3dN.1   15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
  <idle>-0       3dN.1   15us : hrtimer_cancel <-tick_nohz_idle_exit
  <idle>-0       3dN.1   15us : hrtimer_try_to_cancel <-hrtimer_cancel
  <idle>-0       3dN.1   16us : lock_hrtimer_base.isra.18 <-hrtimer_try_to_cancel
  <idle>-0       3dN.1   16us : _raw_spin_lock_irqsave <-lock_hrtimer_base.isra.18
  <idle>-0       3dN.1   16us : add_preempt_count <-_raw_spin_lock_irqsave
  <idle>-0       3dN.2   17us : __remove_hrtimer <-remove_hrtimer.part.16
  <idle>-0       3dN.2   17us : hrtimer_force_reprogram <-__remove_hrtimer
  <idle>-0       3dN.2   17us : tick_program_event <-hrtimer_force_reprogram
  <idle>-0       3dN.2   18us : clockevents_program_event <-tick_program_event
  <idle>-0       3dN.2   18us : ktime_get <-clockevents_program_event
  <idle>-0       3dN.2   18us : lapic_next_event <-clockevents_program_event
  <idle>-0       3dN.2   19us : _raw_spin_unlock_irqrestore <-hrtimer_try_to_cancel
  <idle>-0       3dN.2   19us : sub_preempt_count <-_raw_spin_unlock_irqrestore
  <idle>-0       3dN.1   19us : hrtimer_forward <-tick_nohz_idle_exit
  <idle>-0       3dN.1   20us : ktime_add_safe <-hrtimer_forward
  <idle>-0       3dN.1   20us : ktime_add_safe <-hrtimer_forward
  <idle>-0       3dN.1   20us : hrtimer_start_range_ns <-hrtimer_start_expires.constprop.11
  <idle>-0       3dN.1   20us : __hrtimer_start_range_ns <-hrtimer_start_range_ns
  <idle>-0       3dN.1   21us : lock_hrtimer_base.isra.18 <-__hrtimer_start_range_ns
  <idle>-0       3dN.1   21us : _raw_spin_lock_irqsave <-lock_hrtimer_base.isra.18
  <idle>-0       3dN.1   21us : add_preempt_count <-_raw_spin_lock_irqsave
  <idle>-0       3dN.2   22us : ktime_add_safe <-__hrtimer_start_range_ns
  <idle>-0       3dN.2   22us : enqueue_hrtimer <-__hrtimer_start_range_ns
  <idle>-0       3dN.2   22us : tick_program_event <-__hrtimer_start_range_ns
  <idle>-0       3dN.2   23us : clockevents_program_event <-tick_program_event
  <idle>-0       3dN.2   23us : ktime_get <-clockevents_program_event
  <idle>-0       3dN.2   23us : lapic_next_event <-clockevents_program_event
  <idle>-0       3dN.2   24us : _raw_spin_unlock_irqrestore <-__hrtimer_start_range_ns
  <idle>-0       3dN.2   24us : sub_preempt_count <-_raw_spin_unlock_irqrestore
  <idle>-0       3dN.1   24us : account_idle_ticks <-tick_nohz_idle_exit
  <idle>-0       3dN.1   24us : account_idle_time <-account_idle_ticks
  <idle>-0       3.N.1   25us : sub_preempt_count <-cpu_idle
  <idle>-0       3.N..   25us : schedule <-cpu_idle
  <idle>-0       3.N..   25us : __schedule <-preempt_schedule
  <idle>-0       3.N..   26us : add_preempt_count <-__schedule
  <idle>-0       3.N.1   26us : rcu_note_context_switch <-__schedule
  <idle>-0       3.N.1   26us : rcu_sched_qs <-rcu_note_context_switch
  <idle>-0       3dN.1   27us : rcu_preempt_qs <-rcu_note_context_switch
  <idle>-0       3.N.1   27us : _raw_spin_lock_irq <-__schedule
  <idle>-0       3dN.1   27us : add_preempt_count <-_raw_spin_lock_irq
  <idle>-0       3dN.2   28us : put_prev_task_idle <-__schedule
  <idle>-0       3dN.2   28us : pick_next_task_stop <-pick_next_task
  <idle>-0       3dN.2   28us : pick_next_task_rt <-pick_next_task
  <idle>-0       3dN.2   29us : dequeue_pushable_task <-pick_next_task_rt
  <idle>-0       3d..3   29us : __schedule <-preempt_schedule
  <idle>-0       3d..3   30us :      0:120:R ==> [003]  2448: 94:R sleep

This isn't that big of a trace, even with function tracing enabled,
so I included the entire trace.

The interrupt went off while the system was idle. Somewhere before
task_woken_rt() was called, the NEED_RESCHED flag was set. This is
indicated by the first occurrence of the 'N' flag.

Latency tracing and events
--------------------------
Function tracing can induce a much larger latency, but without
seeing what happens within the latency, it is hard to know what
caused it. There is a middle ground: enabling events.

  # echo 0 > options/function-trace
  # echo wakeup_rt > current_tracer
  # echo 1 > events/enable
  # echo 1 > tracing_on
  # echo 0 > tracing_max_latency
  # chrt -f 5 sleep 1
  # echo 0 > tracing_on
  # cat trace
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 6 us, #12/12, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: sleep-5882 (uid:0 nice:0 policy:1 rt_prio:5)
#    -----------------
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||  \    |   /
  <idle>-0       2d.h4    0us :      0:120:R   + [002]  5882: 94:R sleep
  <idle>-0       2d.h4    0us : ttwu_do_activate.constprop.87 <-try_to_wake_up
  <idle>-0       2d.h4    1us : sched_wakeup: comm=sleep pid=5882 prio=94 success=1 target_cpu=002
  <idle>-0       2dNh2    1us : hrtimer_expire_exit: hrtimer=ffff88007796feb8
  <idle>-0       2.N.2    2us : power_end: cpu_id=2
  <idle>-0       2.N.2    3us : cpu_idle: state=4294967295 cpu_id=2
  <idle>-0       2dN.3    4us : hrtimer_cancel: hrtimer=ffff88007d50d5e0
  <idle>-0       2dN.3    4us : hrtimer_start: hrtimer=ffff88007d50d5e0 function=tick_sched_timer expires=34311211000000 softexpires=34311211000000
  <idle>-0       2.N.2    5us : rcu_utilization: Start context switch
  <idle>-0       2.N.2    5us : rcu_utilization: End context switch
  <idle>-0       2d..3    6us : __schedule <-schedule
  <idle>-0       2d..3    6us :      0:120:R ==> [002]  5882: 94:R sleep


function
--------
···
This tracer is the function tracer. Enabling the function tracer
can be done from the debug file system.
Make sure the
ftrace_enabled is set; otherwise this tracer is a nop.
See the "ftrace_enabled" section below.

  # sysctl kernel.ftrace_enabled=1
  # echo function > current_tracer
···
  # cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 24799/24799   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
            bash-1994  [002] ....  3082.063030: mutex_unlock <-rb_simple_write
            bash-1994  [002] ....  3082.063031: __mutex_unlock_slowpath <-mutex_unlock
            bash-1994  [002] ....  3082.063031: __fsnotify_parent <-fsnotify_modify
            bash-1994  [002] ....  3082.063032: fsnotify <-fsnotify_modify
            bash-1994  [002] ....  3082.063032: __srcu_read_lock <-fsnotify
            bash-1994  [002] ....  3082.063032: add_preempt_count <-__srcu_read_lock
            bash-1994  [002] ...1  3082.063032: sub_preempt_count <-__srcu_read_lock
            bash-1994  [002] ....  3082.063033: __srcu_read_unlock <-fsnotify
[...]

···
        return 0;
}

Or this simple script!

------
#!/bin/bash

debugfs=`sed -ne 's/^debugfs \(.*\) debugfs.*/\1/p' /proc/mounts`
echo nop > $debugfs/tracing/current_tracer
echo 0 > $debugfs/tracing/tracing_on
echo $$ > $debugfs/tracing/set_ftrace_pid
echo function > $debugfs/tracing/current_tracer
echo 1 > $debugfs/tracing/tracing_on
exec "$@"
------


function graph tracer
···
include the -pg switch in the compiling of the kernel.)

At compile time every C file object is run through the
recordmcount program (located in the scripts directory). This
program will parse the ELF headers in the C object to find all
the locations in the .text section that call mcount. (Note, only
whitelisted .text sections are processed, since processing other
sections like .init.text may cause races due to those sections
being freed unexpectedly.)

A new section called "__mcount_loc" is created that holds
references to all the mcount call sites in the .text section.
The recordmcount program re-links this section back into the
original object. The final linking stage of the kernel will add all
these references into a single table.

On boot up, before SMP is initialized, the dynamic ftrace code
scans this table and updates all the locations into nops. It
···
list. This is automatic in the module unload code, and the
module author does not need to worry about it.

When tracing is enabled, the process of modifying the function
tracepoints is dependent on architecture. The old method is to use
kstop_machine to prevent races with the CPUs executing code being
modified (which can cause the CPU to do undesirable things, especially
if the modified code crosses cache (or page) boundaries), and the nops
are patched back to calls. But this time, they do not call mcount
(which is just a function stub). They now call into the ftrace
infrastructure.

The new method of modifying the function tracepoints is to place
a breakpoint at the location to be modified, sync all CPUs, and
modify the rest of the instruction not covered by the breakpoint.
Sync all CPUs again, and then replace the breakpoint with the
finished version of the instruction, completing the ftrace call site.

Some archs do not even need to monkey around with the synchronization,
and can just slap the new code on top of the old without any
problems with other CPUs executing it at the same time.
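A loose user-space analogy of this call-site patching may help (nothing
here is kernel code; every name below is invented for illustration):
each instrumented function enters through a per-site slot that is a
no-op until it is "patched" to point at the tracing code.

```shell
# Invented illustration: each "call site" dispatches through a slot.
nop_hook()   { :; }                       # the patched-in nop
hits=0
trace_hook() { hits=$((hits + 1)); echo "hit: $1"; }

site_func_a=nop_hook                      # both sites start as nops
site_func_b=nop_hook

func_a() { "$site_func_a" func_a; }       # the entry-hook call
func_b() { "$site_func_b" func_b; }

func_a; func_b                            # tracing disabled: no output
site_func_b=trace_hook                    # "patch" func_b's site only
func_a; func_b                            # only func_b is traced now
```

Because each site is patched individually, an arbitrary subset of
functions can be traced, which is what the next section exploits.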

One special side-effect to the recording of the functions being
traced is that we can now selectively choose which functions we
···
If I am only interested in sys_nanosleep and hrtimer_interrupt:

  # echo sys_nanosleep hrtimer_interrupt > set_ftrace_filter
  # echo function > current_tracer
  # echo 1 > tracing_on
  # usleep 1
  # echo 0 > tracing_on
  # cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 5/5   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          usleep-2665  [001] ....  4186.475355: sys_nanosleep <-system_call_fastpath
          <idle>-0     [001] d.h1  4186.475409: hrtimer_interrupt <-smp_apic_timer_interrupt
          usleep-2665  [001] d.h1  4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
          <idle>-0     [003] d.h1  4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
          <idle>-0     [002] d.h1  4186.475427: hrtimer_interrupt <-smp_apic_timer_interrupt

To see which functions are being traced, you can cat the file:
···
Produces:

# tracer: function
#
# entries-in-buffer/entries-written: 897/897   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          <idle>-0     [003] dN.1  4228.547803: hrtimer_cancel <-tick_nohz_idle_exit
          <idle>-0     [003] dN.1  4228.547804: hrtimer_try_to_cancel <-hrtimer_cancel
          <idle>-0     [003] dN.2  4228.547805: hrtimer_force_reprogram <-__remove_hrtimer
          <idle>-0     [003] dN.1  4228.547805: hrtimer_forward <-tick_nohz_idle_exit
          <idle>-0     [003] dN.1  4228.547805: hrtimer_start_range_ns <-hrtimer_start_expires.constprop.11
          <idle>-0     [003] d..1  4228.547858: hrtimer_get_next_event <-get_next_timer_interrupt
          <idle>-0     [003] d..1  4228.547859: hrtimer_start <-__tick_nohz_idle_enter
          <idle>-0     [003] d..2  4228.547860: hrtimer_force_reprogram <-__remove_hrtimer

Notice that we lost the sys_nanosleep.
···
Produces:

# tracer: function
#
# entries-in-buffer/entries-written: 39608/39608   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
            bash-1994  [000] ....  4342.324896: file_ra_state_init <-do_dentry_open
            bash-1994  [000] ....  4342.324897: open_check_o_direct <-do_last
            bash-1994  [000] ....  4342.324897: ima_file_check <-do_last
            bash-1994  [000] ....  4342.324898: process_measurement <-ima_file_check
            bash-1994  [000] ....  4342.324898: ima_get_action <-process_measurement
            bash-1994  [000] ....  4342.324898: ima_match_policy <-ima_get_action
            bash-1994  [000] ....  4342.324899: do_truncate <-do_last
            bash-1994  [000] ....  4342.324899: should_remove_suid <-do_truncate
            bash-1994  [000] ....  4342.324899: notify_change <-do_truncate
            bash-1994  [000] ....  4342.324900: current_fs_time <-notify_change
            bash-1994  [000] ....  4342.324900: current_kernel_time <-current_fs_time
            bash-1994  [000] ....  4342.324900: timespec_trunc <-current_fs_time

We can see that there's no more lock or preempt tracing.
···
  echo > set_graph_function


ftrace_enabled
--------------

Note, the proc sysctl ftrace_enabled is a big on/off switch for the
function tracer. By default it is enabled (when function tracing is
enabled in the kernel). If it is disabled, all function tracing is
disabled. This includes not only the function tracers for ftrace, but
also for any other uses (perf, kprobes, stack tracing, profiling, etc).

Please disable this with care.

It can be disabled (and enabled) with:

  sysctl kernel.ftrace_enabled=0
  sysctl kernel.ftrace_enabled=1

or

  echo 0 > /proc/sys/kernel/ftrace_enabled
  echo 1 > /proc/sys/kernel/ftrace_enabled


Filter commands
---------------
···
  echo '__schedule_bug:traceoff:5' > set_ftrace_filter

To always disable tracing when __schedule_bug is hit:

  echo '__schedule_bug:traceoff' > set_ftrace_filter

These commands are cumulative whether or not they are appended
to set_ftrace_filter. To remove a command, prepend it by '!'
and drop the parameter:

  echo '!__schedule_bug:traceoff:0' > set_ftrace_filter

The above removes the traceoff command for __schedule_bug
that has a counter.
To remove commands without counters: 1778 + 2384 1779 echo '!__schedule_bug:traceoff' > set_ftrace_filter 2385 1780 1781 + - snapshot 1782 + Will cause a snapshot to be triggered when the function is hit. 1783 + 1784 + echo 'native_flush_tlb_others:snapshot' > set_ftrace_filter 1785 + 1786 + To only snapshot once: 1787 + 1788 + echo 'native_flush_tlb_others:snapshot:1' > set_ftrace_filter 1789 + 1790 + To remove the above commands: 1791 + 1792 + echo '!native_flush_tlb_others:snapshot' > set_ftrace_filter 1793 + echo '!native_flush_tlb_others:snapshot:0' > set_ftrace_filter 1794 + 1795 + - enable_event/disable_event 1796 + These commands can enable or disable a trace event. Note, because 1797 + function tracing callbacks are very sensitive, when these commands 1798 + are registered, the trace point is activated, but disabled in 1799 + a "soft" mode. That is, the tracepoint will be called, but 1800 + just will not be traced. The event tracepoint stays in this mode 1801 + as long as there's a command that triggers it. 
   echo 'try_to_wake_up:enable_event:sched:sched_switch:2' > \
   	 set_ftrace_filter

  The format is:

   <function>:enable_event:<system>:<event>[:count]
   <function>:disable_event:<system>:<event>[:count]

  To remove the event commands:

   echo '!try_to_wake_up:enable_event:sched:sched_switch:0' > \
   	 set_ftrace_filter
   echo '!schedule:disable_event:sched:sched_switch' > \
   	 set_ftrace_filter

trace_pipe
----------

···

 # cat trace
 # tracer: function
 #
 # entries-in-buffer/entries-written: 0/0   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |

 #
 # cat /tmp/trace.out
            bash-1994  [000] ....  5281.568961: mutex_unlock <-rb_simple_write
            bash-1994  [000] ....  5281.568963: __mutex_unlock_slowpath <-mutex_unlock
            bash-1994  [000] ....  5281.568963: __fsnotify_parent <-fsnotify_modify
            bash-1994  [000] ....  5281.568964: fsnotify <-fsnotify_modify
            bash-1994  [000] ....  5281.568964: __srcu_read_lock <-fsnotify
            bash-1994  [000] ....  5281.568964: add_preempt_count <-__srcu_read_lock
            bash-1994  [000] ...1  5281.568965: sub_preempt_count <-__srcu_read_lock
            bash-1994  [000] ....  5281.568965: __srcu_read_unlock <-fsnotify
            bash-1994  [000] ....  5281.568967: sys_dup2 <-system_call_fastpath


Note, reading the trace_pipe file will block until more input is
added.

trace entries
-------------

···

diagnosing an issue in the kernel. The file buffer_size_kb is
used to modify the size of the internal trace buffers. The
number listed is the number of entries that can be recorded per
CPU. To know the full size, multiply the number of possible CPUs
with the number of entries.

 # cat buffer_size_kb
1408 (units kilobytes)

Or simply read buffer_total_size_kb:

 # cat buffer_total_size_kb
5632

To modify the buffer, simply echo in a number (in 1024 byte segments):

 # echo 10000 > buffer_size_kb
 # cat buffer_size_kb
10000 (units kilobytes)

It will try to allocate as much as possible. If you allocate too
much, it can cause an Out-Of-Memory condition to trigger.
 # echo 1000000000000 > buffer_size_kb
-bash: echo: write error: Cannot allocate memory
 # cat buffer_size_kb
85

The per_cpu buffers can be changed individually as well:

 # echo 10000 > per_cpu/cpu0/buffer_size_kb
 # echo 100 > per_cpu/cpu1/buffer_size_kb

When the per_cpu buffers are not the same, the buffer_size_kb
at the top level will just show an X:

 # cat buffer_size_kb
X

This is where the buffer_total_size_kb is useful:

 # cat buffer_total_size_kb
12916

Writing to the top level buffer_size_kb will reset all the buffers
to be the same again.

Snapshot
--------

···

 # cat snapshot
cat: snapshot: Device or resource busy


Instances
---------
In the debugfs tracing directory is a directory called "instances".
New directories can be created inside it with mkdir, and removed
with rmdir. A directory created with mkdir here will already contain
files and other directories after it is created.

 # mkdir instances/foo
 # ls instances/foo
buffer_size_kb  buffer_total_size_kb  events  free_buffer  per_cpu
set_event  snapshot  trace  trace_clock  trace_marker  trace_options
trace_pipe  tracing_on

As you can see, the new directory looks similar to the tracing directory
itself. In fact, it is very similar, except that the buffer and
events are agnostic from the main directory, or from any other
instances that are created.

The files in the new directory work just like the files with the
same name in the tracing directory except the buffer that is used
is a separate and new buffer.
The files affect that buffer but do not affect the main buffer, with
the exception of trace_options. Currently, the trace_options affect
all instances and the top level buffer the same, but this may change
in future releases. That is, options may become specific to the
instance they reside in.

Notice that none of the function tracer files are there, nor is
current_tracer and available_tracers. This is because the buffers
can currently only have events enabled for them.

 # mkdir instances/foo
 # mkdir instances/bar
 # mkdir instances/zoot
 # echo 100000 > buffer_size_kb
 # echo 1000 > instances/foo/buffer_size_kb
 # echo 5000 > instances/bar/per_cpu/cpu1/buffer_size_kb
 # echo function > current_trace
 # echo 1 > instances/foo/events/sched/sched_wakeup/enable
 # echo 1 > instances/foo/events/sched/sched_wakeup_new/enable
 # echo 1 > instances/foo/events/sched/sched_switch/enable
 # echo 1 > instances/bar/events/irq/enable
 # echo 1 > instances/zoot/events/syscalls/enable
 # cat trace_pipe
CPU:2 [LOST 11745 EVENTS]
            bash-2044  [002] .... 10594.481032: _raw_spin_lock_irqsave <-get_page_from_freelist
            bash-2044  [002] d... 10594.481032: add_preempt_count <-_raw_spin_lock_irqsave
            bash-2044  [002] d..1 10594.481032: __rmqueue <-get_page_from_freelist
            bash-2044  [002] d..1 10594.481033: _raw_spin_unlock <-get_page_from_freelist
            bash-2044  [002] d..1 10594.481033: sub_preempt_count <-_raw_spin_unlock
            bash-2044  [002] d... 10594.481033: get_pageblock_flags_group <-get_pageblock_migratetype
            bash-2044  [002] d... 10594.481034: __mod_zone_page_state <-get_page_from_freelist
            bash-2044  [002] d... 10594.481034: zone_statistics <-get_page_from_freelist
            bash-2044  [002] d... 10594.481034: __inc_zone_state <-zone_statistics
            bash-2044  [002] d... 10594.481034: __inc_zone_state <-zone_statistics
            bash-2044  [002] .... 10594.481035: arch_dup_task_struct <-copy_process
[...]

 # cat instances/foo/trace_pipe
            bash-1998  [000] d..4   136.676759: sched_wakeup: comm=kworker/0:1 pid=59 prio=120 success=1 target_cpu=000
            bash-1998  [000] dN.4   136.676760: sched_wakeup: comm=bash pid=1998 prio=120 success=1 target_cpu=000
          <idle>-0     [003] d.h3   136.676906: sched_wakeup: comm=rcu_preempt pid=9 prio=120 success=1 target_cpu=003
          <idle>-0     [003] d..3   136.676909: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=9 next_prio=120
     rcu_preempt-9     [003] d..3   136.676916: sched_switch: prev_comm=rcu_preempt prev_pid=9 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
            bash-1998  [000] d..4   136.677014: sched_wakeup: comm=kworker/0:1 pid=59 prio=120 success=1 target_cpu=000
            bash-1998  [000] dN.4   136.677016: sched_wakeup: comm=bash pid=1998 prio=120 success=1 target_cpu=000
            bash-1998  [000] d..3   136.677018: sched_switch: prev_comm=bash prev_pid=1998 prev_prio=120 prev_state=R+ ==> next_comm=kworker/0:1 next_pid=59 next_prio=120
     kworker/0:1-59    [000] d..4   136.677022: sched_wakeup: comm=sshd pid=1995 prio=120 success=1 target_cpu=001
     kworker/0:1-59    [000] d..3   136.677025: sched_switch: prev_comm=kworker/0:1 prev_pid=59 prev_prio=120 prev_state=S ==> next_comm=bash next_pid=1998 next_prio=120
[...]
 # cat instances/bar/trace_pipe
     migration/1-14    [001] d.h3   138.732674: softirq_raise: vec=3 [action=NET_RX]
          <idle>-0     [001] dNh3   138.732725: softirq_raise: vec=3 [action=NET_RX]
            bash-1998  [000] d.h1   138.733101: softirq_raise: vec=1 [action=TIMER]
            bash-1998  [000] d.h1   138.733102: softirq_raise: vec=9 [action=RCU]
            bash-1998  [000] ..s2   138.733105: softirq_entry: vec=1 [action=TIMER]
            bash-1998  [000] ..s2   138.733106: softirq_exit: vec=1 [action=TIMER]
            bash-1998  [000] ..s2   138.733106: softirq_entry: vec=9 [action=RCU]
            bash-1998  [000] ..s2   138.733109: softirq_exit: vec=9 [action=RCU]
            sshd-1995  [001] d.h1   138.733278: irq_handler_entry: irq=21 name=uhci_hcd:usb4
            sshd-1995  [001] d.h1   138.733280: irq_handler_exit: irq=21 ret=unhandled
            sshd-1995  [001] d.h1   138.733281: irq_handler_entry: irq=21 name=eth0
            sshd-1995  [001] d.h1   138.733283: irq_handler_exit: irq=21 ret=handled
[...]

 # cat instances/zoot/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 18996/18996   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
            bash-1998  [000] d...   140.733501: sys_write -> 0x2
            bash-1998  [000] d...   140.733504: sys_dup2(oldfd: a, newfd: 1)
            bash-1998  [000] d...   140.733506: sys_dup2 -> 0x1
            bash-1998  [000] d...   140.733508: sys_fcntl(fd: a, cmd: 1, arg: 0)
            bash-1998  [000] d...   140.733509: sys_fcntl -> 0x1
            bash-1998  [000] d...   140.733510: sys_close(fd: a)
            bash-1998  [000] d...   140.733510: sys_close -> 0x0
            bash-1998  [000] d...   140.733514: sys_rt_sigprocmask(how: 0, nset: 0, oset: 6e2768, sigsetsize: 8)
            bash-1998  [000] d...   140.733515: sys_rt_sigprocmask -> 0x0
            bash-1998  [000] d...   140.733516: sys_rt_sigaction(sig: 2, act: 7fff718846f0, oact: 7fff71884650, sigsetsize: 8)
            bash-1998  [000] d...   140.733516: sys_rt_sigaction -> 0x0

You can see that the trace of the top most trace buffer shows only
the function tracing. The foo instance displays wakeups and task
switches.

To remove the instances, simply delete their directories:

 # rmdir instances/foo
 # rmdir instances/bar
 # rmdir instances/zoot

Note, if a process has a trace file open in one of the instance
directories, the rmdir will fail with EBUSY.


Stack trace
-----------
Since the kernel has a fixed sized stack, it is important not to
waste it in functions. A kernel developer must be conscious of
what they allocate on the stack. If they add too much, the system
can be in danger of a stack overflow, and corruption will occur,
usually leading to a system panic.

There are some tools that check this, usually with interrupts
periodically checking usage. But if you can perform a check
at every function call that will become very useful. As ftrace provides
a function tracer, it makes it convenient to check the stack size
at every function call. This is enabled via the stack tracer.

CONFIG_STACK_TRACER enables the ftrace stack tracing functionality.
To enable it, write a '1' into /proc/sys/kernel/stack_tracer_enabled.

 # echo 1 > /proc/sys/kernel/stack_tracer_enabled

You can also enable it from the kernel command line to trace
the stack size of the kernel during boot up, by adding "stacktrace"
to the kernel command line parameter.
After running it for a few minutes, the output looks like:

 # cat stack_max_size
2928

 # cat stack_trace
        Depth    Size   Location    (18 entries)
        -----    ----   --------
  0)     2928     224   update_sd_lb_stats+0xbc/0x4ac
  1)     2704     160   find_busiest_group+0x31/0x1f1
  2)     2544     256   load_balance+0xd9/0x662
  3)     2288      80   idle_balance+0xbb/0x130
  4)     2208     128   __schedule+0x26e/0x5b9
  5)     2080      16   schedule+0x64/0x66
  6)     2064     128   schedule_timeout+0x34/0xe0
  7)     1936     112   wait_for_common+0x97/0xf1
  8)     1824      16   wait_for_completion+0x1d/0x1f
  9)     1808     128   flush_work+0xfe/0x119
 10)     1680      16   tty_flush_to_ldisc+0x1e/0x20
 11)     1664      48   input_available_p+0x1d/0x5c
 12)     1616      48   n_tty_poll+0x6d/0x134
 13)     1568      64   tty_poll+0x64/0x7f
 14)     1504     880   do_select+0x31e/0x511
 15)      624     400   core_sys_select+0x177/0x216
 16)      224      96   sys_select+0x91/0xb9
 17)      128     128   system_call_fastpath+0x16/0x1b

Note, if -mfentry is being used by gcc, functions get traced before
they set up the stack frame. This means that leaf level functions
are not tested by the stack tracer when -mfentry is used.

Currently, -mfentry is used by gcc 4.6.0 and above on x86 only.

---------

More details can be found in the source code, in the
kernel/trace/*.c files.
include/linux/ftrace.h | +4 -2

···
 	void			(*func)(unsigned long ip,
 					unsigned long parent_ip,
 					void **data);
-	int			(*callback)(unsigned long ip, void **data);
-	void			(*free)(void **data);
+	int			(*init)(struct ftrace_probe_ops *ops,
+					unsigned long ip, void **data);
+	void			(*free)(struct ftrace_probe_ops *ops,
+					unsigned long ip, void **data);
 	int			(*print)(struct seq_file *m,
 					 unsigned long ip,
 					 struct ftrace_probe_ops *ops,
include/linux/ftrace_event.h | +85 -26

···
 #include <linux/perf_event.h>

 struct trace_array;
+struct trace_buffer;
 struct tracer;
 struct dentry;
···
 const char *ftrace_print_hex_seq(struct trace_seq *p,
 				 const unsigned char *buf, int len);

+struct trace_iterator;
+struct trace_event;
+
+int ftrace_raw_output_prep(struct trace_iterator *iter,
+			   struct trace_event *event);
+
 /*
  * The trace entry - the most basic unit of tracing. This is what
  * is printed in the end as a single line in the trace output, such as:
···
 struct trace_iterator {
 	struct trace_array	*tr;
 	struct tracer		*trace;
+	struct trace_buffer	*trace_buffer;
 	void			*private;
 	int			cpu_file;
 	struct mutex		mutex;
···
 };

-
-struct trace_event;
-
 typedef enum print_line_t (*trace_print_func)(struct trace_iterator *iter,
 				      int flags, struct trace_event *event);
···
 void tracing_generic_entry_update(struct trace_entry *entry,
 				  unsigned long flags,
 				  int pc);
+struct ftrace_event_file;
+
+struct ring_buffer_event *
+trace_event_buffer_lock_reserve(struct ring_buffer **current_buffer,
+				struct ftrace_event_file *ftrace_file,
+				int type, unsigned long len,
+				unsigned long flags, int pc);
 struct ring_buffer_event *
 trace_current_buffer_lock_reserve(struct ring_buffer **current_buffer,
 				  int type, unsigned long len,
···
 		      enum trace_reg type, void *data);

 enum {
-	TRACE_EVENT_FL_ENABLED_BIT,
 	TRACE_EVENT_FL_FILTERED_BIT,
-	TRACE_EVENT_FL_RECORDED_CMD_BIT,
 	TRACE_EVENT_FL_CAP_ANY_BIT,
 	TRACE_EVENT_FL_NO_SET_FILTER_BIT,
 	TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
+	TRACE_EVENT_FL_WAS_ENABLED_BIT,
 };

+/*
+ * Event flags:
+ *  FILTERED	  - The event has a filter attached
+ *  CAP_ANY	  - Any user can enable for perf
+ *  NO_SET_FILTER - Set when filter has error and is to be ignored
+ *  IGNORE_ENABLE - For ftrace internal events, do not enable with debugfs file
+ *  WAS_ENABLED   - Set and stays set when an event was ever enabled
+ *                  (used for module unloading, if a module event is enabled,
+ *                   it is best to clear the buffers that used it).
+ */
 enum {
-	TRACE_EVENT_FL_ENABLED		= (1 << TRACE_EVENT_FL_ENABLED_BIT),
 	TRACE_EVENT_FL_FILTERED		= (1 << TRACE_EVENT_FL_FILTERED_BIT),
-	TRACE_EVENT_FL_RECORDED_CMD	= (1 << TRACE_EVENT_FL_RECORDED_CMD_BIT),
 	TRACE_EVENT_FL_CAP_ANY		= (1 << TRACE_EVENT_FL_CAP_ANY_BIT),
 	TRACE_EVENT_FL_NO_SET_FILTER	= (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT),
 	TRACE_EVENT_FL_IGNORE_ENABLE	= (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT),
+	TRACE_EVENT_FL_WAS_ENABLED	= (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
 };

 struct ftrace_event_call {
 	struct list_head	list;
 	struct ftrace_event_class *class;
 	char			*name;
-	struct dentry		*dir;
 	struct trace_event	event;
 	const char		*print_fmt;
 	struct event_filter	*filter;
+	struct list_head	*files;
 	void			*mod;
 	void			*data;
-
 	/*
-	 * 32 bit flags:
-	 *   bit 1:		enabled
-	 *   bit 2:		filter_active
-	 *   bit 3:		enabled cmd record
-	 *   bit 4:		allow trace by non root (cap any)
-	 *   bit 5:		failed to apply filter
-	 *   bit 6:		ftrace internal event (do not enable)
-	 *
-	 * Changes to flags must hold the event_mutex.
-	 *
-	 * Note: Reads of flags do not hold the event_mutex since
-	 * they occur in critical sections. But the way flags
-	 * is currently used, these changes do no affect the code
-	 * except that when a change is made, it may have a slight
-	 * delay in propagating the changes to other CPUs due to
-	 * caching and such.
+	 *   bit 0:		filter_active
+	 *   bit 1:		allow trace by non root (cap any)
+	 *   bit 2:		failed to apply filter
+	 *   bit 3:		ftrace internal event (do not enable)
+	 *   bit 4:		Event was enabled by module
 	 */
-	unsigned int		flags;
+	int			flags; /* static flags of different events */

 #ifdef CONFIG_PERF_EVENTS
 	int			perf_refcount;
 	struct hlist_head __percpu *perf_events;
 #endif
+};
+
+struct trace_array;
+struct ftrace_subsystem_dir;
+
+enum {
+	FTRACE_EVENT_FL_ENABLED_BIT,
+	FTRACE_EVENT_FL_RECORDED_CMD_BIT,
+	FTRACE_EVENT_FL_SOFT_MODE_BIT,
+	FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
+};
+
+/*
+ * Ftrace event file flags:
+ *  ENABLED	  - The event is enabled
+ *  RECORDED_CMD  - The comms should be recorded at sched_switch
+ *  SOFT_MODE     - The event is enabled/disabled by SOFT_DISABLED
+ *  SOFT_DISABLED - When set, do not trace the event (even though its
+ *                  tracepoint may be enabled)
+ */
+enum {
+	FTRACE_EVENT_FL_ENABLED		= (1 << FTRACE_EVENT_FL_ENABLED_BIT),
+	FTRACE_EVENT_FL_RECORDED_CMD	= (1 << FTRACE_EVENT_FL_RECORDED_CMD_BIT),
+	FTRACE_EVENT_FL_SOFT_MODE	= (1 << FTRACE_EVENT_FL_SOFT_MODE_BIT),
+	FTRACE_EVENT_FL_SOFT_DISABLED	= (1 << FTRACE_EVENT_FL_SOFT_DISABLED_BIT),
+};
+
+struct ftrace_event_file {
+	struct list_head		list;
+	struct ftrace_event_call	*event_call;
+	struct dentry			*dir;
+	struct trace_array		*tr;
+	struct ftrace_subsystem_dir	*system;
+
+	/*
+	 * 32 bit flags:
+	 *   bit 0:		enabled
+	 *   bit 1:		enabled cmd record
+	 *   bit 2:		enable/disable with the soft disable bit
+	 *   bit 3:		soft disabled
+	 *
+	 * Note: The bits must be set atomically to prevent races
+	 * from other writers. Reads of flags do not need to be in
+	 * sync as they occur in critical sections. But the way flags
+	 * is currently used, these changes do not affect the code
+	 * except that when a change is made, it may have a slight
+	 * delay in propagating the changes to other CPUs due to
+	 * caching and such. Which is mostly OK ;-)
+	 */
+	unsigned long		flags;
 };

 #define __TRACE_EVENT_FLAGS(name, value)				\
···
 extern int trace_add_event_call(struct ftrace_event_call *call);
 extern void trace_remove_event_call(struct ftrace_event_call *call);

-#define is_signed_type(type)	(((type)(-1)) < (type)0)
+#define is_signed_type(type)	(((type)(-1)) < (type)1)

 int trace_set_clr_event(const char *system, const char *event, int set);
include/linux/kernel.h | +67 -3

···
 void tracing_on(void);
 void tracing_off(void);
 int tracing_is_on(void);
+void tracing_snapshot(void);
+void tracing_snapshot_alloc(void);

 extern void tracing_start(void);
 extern void tracing_stop(void);
···
  *
  * This is intended as a debugging tool for the developer only.
  * Please refrain from leaving trace_printks scattered around in
- * your code.
+ * your code. (Extra memory is used for special buffers that are
+ * allocated when trace_printk() is used)
+ *
+ * A little optization trick is done here. If there's only one
+ * argument, there's no need to scan the string for printf formats.
+ * The trace_puts() will suffice. But how can we take advantage of
+ * using trace_puts() when trace_printk() has only one argument?
+ * By stringifying the args and checking the size we can tell
+ * whether or not there are args. __stringify((__VA_ARGS__)) will
+ * turn into "()\0" with a size of 3 when there are no args, anything
+ * else will be bigger. All we need to do is define a string to this,
+ * and then take its size and compare to 3. If it's bigger, use
+ * do_trace_printk() otherwise, optimize it to trace_puts(). Then just
+ * let gcc optimize the rest.
  */

-#define trace_printk(fmt, args...)					\
+#define trace_printk(fmt, ...)				\
+do {							\
+	char _______STR[] = __stringify((__VA_ARGS__));	\
+	if (sizeof(_______STR) > 3)			\
+		do_trace_printk(fmt, ##__VA_ARGS__);	\
+	else						\
+		trace_puts(fmt);			\
+} while (0)
+
+#define do_trace_printk(fmt, args...)					\
 do {									\
 	static const char *trace_printk_fmt				\
 		__attribute__((section("__trace_printk_fmt"))) =	\
···
 extern __printf(2, 3)
 int __trace_printk(unsigned long ip, const char *fmt, ...);

-extern void trace_dump_stack(void);
+/**
+ * trace_puts - write a string into the ftrace buffer
+ * @str: the string to record
+ *
+ * Note: __trace_bputs is an internal function for trace_puts and
+ *       the @ip is passed in via the trace_puts macro.
+ *
+ * This is similar to trace_printk() but is made for those really fast
+ * paths that a developer wants the least amount of "Heisenbug" affects,
+ * where the processing of the print format is still too much.
+ *
+ * This function allows a kernel developer to debug fast path sections
+ * that printk is not appropriate for. By scattering in various
+ * printk like tracing in the code, a developer can quickly see
+ * where problems are occurring.
+ *
+ * This is intended as a debugging tool for the developer only.
+ * Please refrain from leaving trace_puts scattered around in
+ * your code. (Extra memory is used for special buffers that are
+ * allocated when trace_puts() is used)
+ *
+ * Returns: 0 if nothing was written, positive # if string was.
+ *  (1 when __trace_bputs is used, strlen(str) when __trace_puts is used)
+ */
+
+extern int __trace_bputs(unsigned long ip, const char *str);
+extern int __trace_puts(unsigned long ip, const char *str, int size);
+#define trace_puts(str) ({						\
+	static const char *trace_printk_fmt				\
+		__attribute__((section("__trace_printk_fmt"))) =	\
+		__builtin_constant_p(str) ? str : NULL;			\
+									\
+	if (__builtin_constant_p(str))					\
+		__trace_bputs(_THIS_IP_, trace_printk_fmt);		\
+	else								\
+		__trace_puts(_THIS_IP_, str, strlen(str));		\
+})
+
+extern void trace_dump_stack(int skip);

 /*
  * The double __builtin_constant_p is because gcc will give us an error
···
 static inline void tracing_on(void) { }
 static inline void tracing_off(void) { }
 static inline int tracing_is_on(void) { return 0; }
+static inline void tracing_snapshot(void) { }
+static inline void tracing_snapshot_alloc(void) { }

 static inline __printf(1, 2)
 int trace_printk(const char *fmt, ...)
include/linux/ring_buffer.h | +6

···
 #include <linux/kmemcheck.h>
 #include <linux/mm.h>
 #include <linux/seq_file.h>
+#include <linux/poll.h>

 struct ring_buffer;
 struct ring_buffer_iter;
···
 	static struct lock_class_key __key;			\
 	__ring_buffer_alloc((size), (flags), &__key);		\
 })
+
+void ring_buffer_wait(struct ring_buffer *buffer, int cpu);
+int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu,
+			  struct file *filp, poll_table *poll_table);
+

 #define RING_BUFFER_ALL_CPUS -1
include/linux/trace_clock.h | +1

···
 extern u64 notrace trace_clock_local(void);
 extern u64 notrace trace_clock(void);
+extern u64 notrace trace_clock_jiffies(void);
 extern u64 notrace trace_clock_global(void);
 extern u64 notrace trace_clock_counter(void);
include/trace/ftrace.h | +23 -26

···
 ftrace_raw_output_##call(struct trace_iterator *iter, int flags,	\
 			 struct trace_event *trace_event)		\
 {									\
-	struct ftrace_event_call *event;				\
 	struct trace_seq *s = &iter->seq;				\
+	struct trace_seq __maybe_unused *p = &iter->tmp_seq;		\
 	struct ftrace_raw_##call *field;				\
-	struct trace_entry *entry;					\
-	struct trace_seq *p = &iter->tmp_seq;				\
 	int ret;							\
 									\
-	event = container_of(trace_event, struct ftrace_event_call,	\
-			     event);					\
+	field = (typeof(field))iter->ent;				\
 									\
-	entry = iter->ent;						\
-									\
-	if (entry->type != event->event.type) {				\
-		WARN_ON_ONCE(1);					\
-		return TRACE_TYPE_UNHANDLED;				\
-	}								\
-									\
-	field = (typeof(field))entry;					\
-									\
-	trace_seq_init(p);						\
-	ret = trace_seq_printf(s, "%s: ", event->name);			\
+	ret = ftrace_raw_output_prep(iter, trace_event);		\
 	if (ret)							\
-		ret = trace_seq_printf(s, print);			\
+		return ret;						\
+									\
+	ret = trace_seq_printf(s, print);				\
 	if (!ret)							\
 		return TRACE_TYPE_PARTIAL_LINE;				\
 									\
···
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, func, print)	\
-static int notrace							\
+static int notrace __init						\
 ftrace_define_fields_##call(struct ftrace_event_call *event_call)	\
 {									\
 	struct ftrace_raw_##call field;					\
···
  *
  * static void ftrace_raw_event_<call>(void *__data, proto)
  * {
- *	struct ftrace_event_call *event_call = __data;
+ *	struct ftrace_event_file *ftrace_file = __data;
+ *	struct ftrace_event_call *event_call = ftrace_file->event_call;
  *	struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
  *	struct ring_buffer_event *event;
  *	struct ftrace_raw_<call> *entry; <-- defined in stage 1
···
  *	int __data_size;
  *	int pc;
  *
+ *	if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
+ *		     &ftrace_file->flags))
+ *		return;
+ *
  *	local_save_flags(irq_flags);
  *	pc = preempt_count();
  *
  *	__data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
  *
- *	event = trace_current_buffer_lock_reserve(&buffer,
+ *	event = trace_event_buffer_lock_reserve(&buffer, ftrace_file,
  *				  event_<call>->event.type,
  *				  sizeof(*entry) + __data_size,
  *				  irq_flags, pc);
···
  *	__array macros.
  *
  *	if (!filter_current_check_discard(buffer, event_call, entry, event))
- *		trace_current_buffer_unlock_commit(buffer,
+ *		trace_nowake_buffer_unlock_commit(buffer,
  *						   event, irq_flags, pc);
  * }
  *
···
 static notrace void							\
 ftrace_raw_event_##call(void *__data, proto)				\
 {									\
-	struct ftrace_event_call *event_call = __data;			\
+	struct ftrace_event_file *ftrace_file = __data;			\
+	struct ftrace_event_call *event_call = ftrace_file->event_call;	\
 	struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
 	struct ring_buffer_event *event;				\
 	struct ftrace_raw_##call *entry;				\
···
 	int __data_size;						\
 	int pc;								\
 									\
+	if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,			\
+		     &ftrace_file->flags))				\
+		return;							\
+									\
 	local_save_flags(irq_flags);					\
 	pc = preempt_count();						\
 									\
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
 									\
-	event = trace_current_buffer_lock_reserve(&buffer,		\
+	event = trace_event_buffer_lock_reserve(&buffer, ftrace_file,	\
 				 event_call->event.type,		\
 				 sizeof(*entry) + __data_size,		\
 				 irq_flags, pc);			\
···
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
 _TRACE_PERF_PROTO(call, PARAMS(proto));					\
 static const char print_fmt_##call[] = print;				\
-static struct ftrace_event_class __used event_class_##call = {		\
+static struct ftrace_event_class __used __refdata event_class_##call = { \
 	.system			= __stringify(TRACE_SYSTEM),		\
 	.define_fields		= ftrace_define_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_##call.fields),\
···

 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 #endif /* CONFIG_PERF_EVENTS */
-
-#undef _TRACE_PROFILE_INIT
+49
kernel/trace/Kconfig
···
 	select GENERIC_TRACER
 	select TRACER_MAX_TRACE
 	select RING_BUFFER_ALLOW_SWAP
+	select TRACER_SNAPSHOT
+	select TRACER_SNAPSHOT_PER_CPU_SWAP
 	help
 	  This option measures the time spent in irqs-off critical
 	  sections, with microsecond accuracy.
···
 	select GENERIC_TRACER
 	select TRACER_MAX_TRACE
 	select RING_BUFFER_ALLOW_SWAP
+	select TRACER_SNAPSHOT
+	select TRACER_SNAPSHOT_PER_CPU_SWAP
 	help
 	  This option measures the time spent in preemption-off critical
 	  sections, with microsecond accuracy.
···
 	select GENERIC_TRACER
 	select CONTEXT_SWITCH_TRACER
 	select TRACER_MAX_TRACE
+	select TRACER_SNAPSHOT
 	help
 	  This tracer tracks the latency of the highest priority task
 	  to be scheduled in, starting from the point it has woken up.
···
 	      echo 1 > /sys/kernel/debug/tracing/snapshot
 	      cat snapshot
+
+config TRACER_SNAPSHOT_PER_CPU_SWAP
+	bool "Allow snapshot to swap per CPU"
+	depends on TRACER_SNAPSHOT
+	select RING_BUFFER_ALLOW_SWAP
+	help
+	  Allow doing a snapshot of a single CPU buffer instead of a
+	  full swap (all buffers). If this is set, then the following is
+	  allowed:
+
+	      echo 1 > /sys/kernel/debug/tracing/per_cpu/cpu2/snapshot
+
+	  After which, only the tracing buffer for CPU 2 is swapped with
+	  the main tracing buffer; the other CPU buffers remain the same.
+
+	  When this is enabled, it adds a little more overhead to
+	  trace recording, as some checks are needed to synchronize
+	  recording with swaps. But this does not affect the performance
+	  of the overall system. This is enabled by default when the preempt
+	  or irq latency tracers are enabled, as those need to swap as well
+	  and already add the overhead (plus a lot more).
···
 config TRACE_BRANCH_PROFILING
 	bool
···
 	  affected by processes that are running.

 	  If unsure, say N.
+
+config RING_BUFFER_STARTUP_TEST
+	bool "Ring buffer startup self test"
+	depends on RING_BUFFER
+	help
+	  Run a simple self test on the ring buffer on boot up. Late in the
+	  kernel boot sequence, the test starts by kicking off
+	  a thread per cpu. Each thread writes various-sized events
+	  into the ring buffer. Another thread is created to send IPIs
+	  to each of the threads, where the IPI handler will also write
+	  to the ring buffer, to test/stress the nesting ability.
+	  If any anomalies are discovered, a warning is displayed
+	  and all ring buffers are disabled.
+
+	  The test runs for 10 seconds. This will slow your boot time
+	  by at least 10 more seconds.
+
+	  At the end of the test, statistics and further checks are done.
+	  It will output the stats of each per-cpu buffer: what
+	  was written, the sizes, what was read, what was lost, and
+	  other similar details.
+
+	  If unsure, say N.

 endif # FTRACE
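The per-CPU swap described in the TRACER_SNAPSHOT_PER_CPU_SWAP help text can be pictured as a plain pointer exchange. Below is an illustrative userspace model, not kernel code (all `toy_*` names are invented for this sketch): a full snapshot swaps every CPU's live/snapshot buffer pair at once, while the per-CPU variant exchanges only one slot.

```c
#include <stddef.h>

#define NCPUS 4

/* Toy model: one "live" buffer and one "snapshot" buffer per CPU. */
struct toy_trace {
	int *live[NCPUS];
	int *snap[NCPUS];
};

/* Full swap: what a plain TRACER_SNAPSHOT does (all CPUs at once). */
static void toy_snapshot_all(struct toy_trace *t)
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		int *tmp = t->live[cpu];
		t->live[cpu] = t->snap[cpu];
		t->snap[cpu] = tmp;
	}
}

/* Per-CPU swap: what TRACER_SNAPSHOT_PER_CPU_SWAP permits for one CPU. */
static void toy_snapshot_cpu(struct toy_trace *t, int cpu)
{
	int *tmp = t->live[cpu];
	t->live[cpu] = t->snap[cpu];
	t->snap[cpu] = tmp;
}

/* Returns 1 if only @cpu's live buffer changed after a per-cpu swap. */
static int toy_demo(int cpu)
{
	static int a[NCPUS], b[NCPUS];
	struct toy_trace t;

	for (int i = 0; i < NCPUS; i++) {
		t.live[i] = &a[i];
		t.snap[i] = &b[i];
	}
	toy_snapshot_cpu(&t, cpu);
	for (int i = 0; i < NCPUS; i++) {
		int swapped = (t.live[i] == &b[i]);
		if (swapped != (i == cpu))
			return 0;
	}
	return 1;
}

/* A double full swap restores the original assignment. */
static int toy_demo_all(void)
{
	static int a[NCPUS], b[NCPUS];
	struct toy_trace t;

	for (int i = 0; i < NCPUS; i++) {
		t.live[i] = &a[i];
		t.snap[i] = &b[i];
	}
	toy_snapshot_all(&t);
	toy_snapshot_all(&t);
	for (int i = 0; i < NCPUS; i++)
		if (t.live[i] != &a[i])
			return 0;
	return 1;
}
```

The overhead the help text mentions comes from synchronizing writers with such swaps, which the toy model does not attempt to show.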
kernel/trace/blktrace.c | +2 -2
···
 	bool blk_tracer = blk_tracer_enabled;

 	if (blk_tracer) {
-		buffer = blk_tr->buffer;
+		buffer = blk_tr->trace_buffer.buffer;
 		pc = preempt_count();
 		event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
 						  sizeof(*t) + len,
···
 	if (blk_tracer) {
 		tracing_record_cmdline(current);

-		buffer = blk_tr->buffer;
+		buffer = blk_tr->trace_buffer.buffer;
 		pc = preempt_count();
 		event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
 						  sizeof(*t) + pdu_len,
kernel/trace/ftrace.c | +69 -29
···
 #define PROFILES_PER_PAGE					\
 	(PROFILE_RECORDS_SIZE / sizeof(struct ftrace_profile))

-static int ftrace_profile_bits __read_mostly;
 static int ftrace_profile_enabled __read_mostly;

 /* ftrace_profile_lock - synchronize the enable and disable of the profiler */
···
 static DEFINE_PER_CPU(struct ftrace_profile_stat, ftrace_profile_stats);

-#define FTRACE_PROFILE_HASH_SIZE 1024 /* must be power of 2 */
+#define FTRACE_PROFILE_HASH_BITS 10
+#define FTRACE_PROFILE_HASH_SIZE (1 << FTRACE_PROFILE_HASH_BITS)

 static void *
 function_stat_next(void *v, int idx)
···
 	pages = DIV_ROUND_UP(functions, PROFILES_PER_PAGE);

-	for (i = 0; i < pages; i++) {
+	for (i = 1; i < pages; i++) {
 		pg->next = (void *)get_zeroed_page(GFP_KERNEL);
 		if (!pg->next)
 			goto out_free;
···
 	if (!stat->hash)
 		return -ENOMEM;

-	if (!ftrace_profile_bits) {
-		size--;
-
-		for (; size; size >>= 1)
-			ftrace_profile_bits++;
-	}
-
 	/* Preallocate the function profiling pages */
 	if (ftrace_profile_pages_init(stat) < 0) {
 		kfree(stat->hash);
···
 	struct hlist_head *hhd;
 	unsigned long key;

-	key = hash_long(ip, ftrace_profile_bits);
+	key = hash_long(ip, FTRACE_PROFILE_HASH_BITS);
 	hhd = &stat->hash[key];

 	if (hlist_empty(hhd))
···
 {
 	unsigned long key;

-	key = hash_long(rec->ip, ftrace_profile_bits);
+	key = hash_long(rec->ip, FTRACE_PROFILE_HASH_BITS);
 	hlist_add_head_rcu(&rec->node, &stat->hash[key]);
 }
···
 	unsigned long flags;
 	unsigned long ip;
 	void *data;
-	struct rcu_head rcu;
+	struct list_head free_list;
 };

 struct ftrace_func_entry {
···
 	struct hlist_head *hhd;
 	struct ftrace_hash *old_hash;
 	struct ftrace_hash *new_hash;
-	unsigned long key;
 	int size = src->count;
 	int bits = 0;
 	int ret;
···
 	for (i = 0; i < size; i++) {
 		hhd = &src->buckets[i];
 		hlist_for_each_entry_safe(entry, tn, hhd, hlist) {
-			if (bits > 0)
-				key = hash_long(entry->ip, bits);
-			else
-				key = 0;
 			remove_hash_entry(src, entry);
 			__add_hash_entry(new_hash, entry);
 		}
 	}
···
 }

-static void ftrace_free_entry_rcu(struct rcu_head *rhp)
+static void ftrace_free_entry(struct ftrace_func_probe *entry)
 {
-	struct ftrace_func_probe *entry =
-		container_of(rhp, struct ftrace_func_probe, rcu);
-
 	if (entry->ops->free)
-		entry->ops->free(&entry->data);
+		entry->ops->free(entry->ops, entry->ip, &entry->data);
 	kfree(entry);
 }
-

 int
 register_ftrace_function_probe(char *glob, struct ftrace_probe_ops *ops,
 			       void *data)
 {
 	struct ftrace_func_probe *entry;
+	struct ftrace_hash **orig_hash = &trace_probe_ops.filter_hash;
+	struct ftrace_hash *hash;
 	struct ftrace_page *pg;
 	struct dyn_ftrace *rec;
 	int type, len, not;
 	unsigned long key;
 	int count = 0;
 	char *search;
+	int ret;

 	type = filter_parse_regex(glob, strlen(glob), &search, &not);
 	len = strlen(search);
···
 	mutex_lock(&ftrace_lock);

-	if (unlikely(ftrace_disabled))
+	hash = alloc_and_copy_ftrace_hash(FTRACE_HASH_DEFAULT_BITS, *orig_hash);
+	if (!hash) {
+		count = -ENOMEM;
 		goto out_unlock;
+	}
+
+	if (unlikely(ftrace_disabled)) {
+		count = -ENODEV;
+		goto out_unlock;
+	}

 	do_for_each_ftrace_rec(pg, rec) {
···
 		 * for each function we find. We call the callback
 		 * to give the caller an opportunity to do so.
 		 */
-		if (ops->callback) {
-			if (ops->callback(rec->ip, &entry->data) < 0) {
+		if (ops->init) {
+			if (ops->init(ops, rec->ip, &entry->data) < 0) {
 				/* caller does not like this func */
 				kfree(entry);
 				continue;
 			}
+		}
+
+		ret = enter_record(hash, rec, 0);
+		if (ret < 0) {
+			kfree(entry);
+			count = ret;
+			goto out_unlock;
 		}

 		entry->ops = ops;
···
 		hlist_add_head_rcu(&entry->node, &ftrace_func_hash[key]);

 	} while_for_each_ftrace_rec();
+
+	ret = ftrace_hash_move(&trace_probe_ops, 1, orig_hash, hash);
+	if (ret < 0)
+		count = ret;
+
 	__enable_ftrace_function_probe();

  out_unlock:
 	mutex_unlock(&ftrace_lock);
+	free_ftrace_hash(hash);

 	return count;
 }
···
 __unregister_ftrace_function_probe(char *glob, struct ftrace_probe_ops *ops,
 				   void *data, int flags)
 {
+	struct ftrace_func_entry *rec_entry;
 	struct ftrace_func_probe *entry;
+	struct ftrace_func_probe *p;
+	struct ftrace_hash **orig_hash = &trace_probe_ops.filter_hash;
+	struct list_head free_list;
+	struct ftrace_hash *hash;
 	struct hlist_node *tmp;
 	char str[KSYM_SYMBOL_LEN];
 	int type = MATCH_FULL;
···
 	}

 	mutex_lock(&ftrace_lock);
+
+	hash = alloc_and_copy_ftrace_hash(FTRACE_HASH_DEFAULT_BITS, *orig_hash);
+	if (!hash)
+		/* Hmm, should report this somehow */
+		goto out_unlock;
+
+	INIT_LIST_HEAD(&free_list);
+
 	for (i = 0; i < FTRACE_FUNC_HASHSIZE; i++) {
 		struct hlist_head *hhd = &ftrace_func_hash[i];
···
 				continue;
 			}

+			rec_entry = ftrace_lookup_ip(hash, entry->ip);
+			/* It is possible more than one entry had this ip */
+			if (rec_entry)
+				free_hash_entry(hash, rec_entry);
+
 			hlist_del_rcu(&entry->node);
-			call_rcu_sched(&entry->rcu, ftrace_free_entry_rcu);
+			list_add(&entry->free_list, &free_list);
 		}
 	}
 	__disable_ftrace_function_probe();
+	/*
+	 * Remove after the disable is called. Otherwise, if the last
+	 * probe is removed, a null hash means *all enabled*.
+	 */
+	ftrace_hash_move(&trace_probe_ops, 1, orig_hash, hash);
+	synchronize_sched();
+	list_for_each_entry_safe(entry, p, &free_list, free_list) {
+		list_del(&entry->free_list);
+		ftrace_free_entry(entry);
+	}
+
+ out_unlock:
 	mutex_unlock(&ftrace_lock);
+	free_ftrace_hash(hash);
 }

 void
···
 	if (fail)
 		return -EINVAL;

-	ftrace_graph_filter_enabled = 1;
+	ftrace_graph_filter_enabled = !!(*idx);
+
 	return 0;
 }
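The removal of `ftrace_profile_bits` above works because the profile hash table size is now fixed at compile time, so the shift count fed to `hash_long()` can be the constant `FTRACE_PROFILE_HASH_BITS` instead of a value derived at runtime from the table size. A minimal userspace sketch of the equivalence, assuming a `hash_long`-style multiplicative hash (the multiplier mirrors the kernel's 64-bit golden-ratio constant; the `toy_*` and `bits_from_size` names are invented for this sketch):

```c
#include <stdint.h>

#define PROFILE_HASH_BITS 10
#define PROFILE_HASH_SIZE (1u << PROFILE_HASH_BITS)

/*
 * Multiplicative hash in the spirit of the kernel's hash_long():
 * multiply by a large odd constant, keep the top @bits bits, so the
 * result always indexes a table of (1 << bits) buckets.
 */
static uint32_t toy_hash_long(unsigned long ip, unsigned int bits)
{
	uint64_t hash = (uint64_t)ip * 0x9e37fffffffc0001ULL;

	return (uint32_t)(hash >> (64 - bits));
}

/*
 * The deleted runtime computation: derive the bit count from a
 * power-of-two table size (1024 -> 10), exactly what the loop over
 * ftrace_profile_bits used to do on first use.
 */
static unsigned int bits_from_size(unsigned int size)
{
	unsigned int bits = 0;

	size--;
	for (; size; size >>= 1)
		bits++;
	return bits;
}
```

Since `bits_from_size(1024)` is always 10, hashing with the compile-time constant produces identical bucket indexes, with one less global and no lazy-init branch.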
kernel/trace/ring_buffer.c | +494 -6
···
 #include <linux/trace_clock.h>
 #include <linux/trace_seq.h>
 #include <linux/spinlock.h>
+#include <linux/irq_work.h>
 #include <linux/debugfs.h>
 #include <linux/uaccess.h>
 #include <linux/hardirq.h>
+#include <linux/kthread.h>	/* for self test */
 #include <linux/kmemcheck.h>
 #include <linux/module.h>
 #include <linux/percpu.h>
 #include <linux/mutex.h>
+#include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/hash.h>
···
 	return ret;
 }

+struct rb_irq_work {
+	struct irq_work			work;
+	wait_queue_head_t		waiters;
+	bool				waiters_pending;
+};
+
 /*
  * head_page == tail_page && head == tail then buffer is empty.
  */
···
 	struct list_head		new_pages; /* new pages to add */
 	struct work_struct		update_pages_work;
 	struct completion		update_done;
+
+	struct rb_irq_work		irq_work;
 };

 struct ring_buffer {
···
 	struct notifier_block		cpu_notify;
 #endif
 	u64				(*clock)(void);
+
+	struct rb_irq_work		irq_work;
 };

 struct ring_buffer_iter {
···
 	unsigned long			cache_read;
 	u64				read_stamp;
 };
+
+/*
+ * rb_wake_up_waiters - wake up tasks waiting for ring buffer input
+ *
+ * Schedules a delayed work to wake up any task that is blocked on the
+ * ring buffer waiters queue.
+ */
+static void rb_wake_up_waiters(struct irq_work *work)
+{
+	struct rb_irq_work *rbwork = container_of(work, struct rb_irq_work, work);
+
+	wake_up_all(&rbwork->waiters);
+}
+
+/**
+ * ring_buffer_wait - wait for input to the ring buffer
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ */
+void ring_buffer_wait(struct ring_buffer *buffer, int cpu)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+	DEFINE_WAIT(wait);
+	struct rb_irq_work *work;
+
+	/*
+	 * Depending on what the caller is waiting for, either any
+	 * data in any cpu buffer, or a specific buffer, put the
+	 * caller on the appropriate wait queue.
+	 */
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		work = &buffer->irq_work;
+	else {
+		cpu_buffer = buffer->buffers[cpu];
+		work = &cpu_buffer->irq_work;
+	}
+
+	prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+
+	/*
+	 * The events can happen in critical sections where
+	 * checking a work queue can cause deadlocks.
+	 * After adding a task to the queue, this flag is set
+	 * only to notify events to try to wake up the queue
+	 * using irq_work.
+	 *
+	 * We don't clear it even if the buffer is no longer
+	 * empty. The flag only causes the next event to run
+	 * irq_work to do the work queue wake up. The worst
+	 * that can happen if we race with !trace_empty() is that
+	 * an event will cause an irq_work to try to wake up
+	 * an empty queue.
+	 *
+	 * There's no reason to protect this flag either, as
+	 * the work queue and irq_work logic will do the necessary
+	 * synchronization for the wake ups. The only thing
+	 * that is necessary is that the wake up happens after
+	 * a task has been queued. It's OK for spurious wake ups.
+	 */
+	work->waiters_pending = true;
+
+	if ((cpu == RING_BUFFER_ALL_CPUS && ring_buffer_empty(buffer)) ||
+	    (cpu != RING_BUFFER_ALL_CPUS && ring_buffer_empty_cpu(buffer, cpu)))
+		schedule();
+
+	finish_wait(&work->waiters, &wait);
+}
+
+/**
+ * ring_buffer_poll_wait - poll on buffer input
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @filp: the file descriptor
+ * @poll_table: The poll descriptor
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ *
+ * Returns POLLIN | POLLRDNORM if data exists in the buffers,
+ * zero otherwise.
+ */
+int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu,
+			  struct file *filp, poll_table *poll_table)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+	struct rb_irq_work *work;
+
+	if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
+	    (cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
+		return POLLIN | POLLRDNORM;
+
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		work = &buffer->irq_work;
+	else {
+		cpu_buffer = buffer->buffers[cpu];
+		work = &cpu_buffer->irq_work;
+	}
+
+	work->waiters_pending = true;
+	poll_wait(filp, &work->waiters, poll_table);
+
+	if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
+	    (cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
+		return POLLIN | POLLRDNORM;
+	return 0;
+}

 /* buffer may be either ring_buffer or ring_buffer_per_cpu */
 #define RB_WARN_ON(b, cond)						\
···
 	cpu_buffer->lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
 	INIT_WORK(&cpu_buffer->update_pages_work, update_pages_handler);
 	init_completion(&cpu_buffer->update_done);
+	init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
+	init_waitqueue_head(&cpu_buffer->irq_work.waiters);

 	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
 			    GFP_KERNEL, cpu_to_node(cpu));
···
 	buffer->flags = flags;
 	buffer->clock = trace_clock_local;
 	buffer->reader_lock_key = key;
+
+	init_irq_work(&buffer->irq_work.work, rb_wake_up_waiters);
+	init_waitqueue_head(&buffer->irq_work.waiters);

 	/* need at least two pages */
 	if (nr_pages < 2)
···
 		if (!cpu_buffer->nr_pages_to_update)
 			continue;

-		if (cpu_online(cpu))
+		/* The update must run on the CPU that is being updated. */
+		preempt_disable();
+		if (cpu == smp_processor_id() || !cpu_online(cpu)) {
+			rb_update_pages(cpu_buffer);
+			cpu_buffer->nr_pages_to_update = 0;
+		} else {
+			/*
+			 * Can not disable preemption for schedule_work_on()
+			 * on PREEMPT_RT.
+			 */
+			preempt_enable();
 			schedule_work_on(cpu,
 					 &cpu_buffer->update_pages_work);
-		else
-			rb_update_pages(cpu_buffer);
+			preempt_disable();
+		}
+		preempt_enable();
 	}

 	/* wait for all the updates to complete */
···
 	get_online_cpus();

-	if (cpu_online(cpu_id)) {
+	preempt_disable();
+	/* The update must run on the CPU that is being updated. */
+	if (cpu_id == smp_processor_id() || !cpu_online(cpu_id))
+		rb_update_pages(cpu_buffer);
+	else {
+		/*
+		 * Can not disable preemption for schedule_work_on()
+		 * on PREEMPT_RT.
+		 */
+		preempt_enable();
 		schedule_work_on(cpu_id,
 				 &cpu_buffer->update_pages_work);
 		wait_for_completion(&cpu_buffer->update_done);
-	} else
-		rb_update_pages(cpu_buffer);
+		preempt_disable();
+	}
+	preempt_enable();

 	cpu_buffer->nr_pages_to_update = 0;
 	put_online_cpus();
···
 	rb_end_commit(cpu_buffer);
 }

+static __always_inline void
+rb_wakeups(struct ring_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
+{
+	if (buffer->irq_work.waiters_pending) {
+		buffer->irq_work.waiters_pending = false;
+		/* irq_work_queue() supplies its own memory barriers */
+		irq_work_queue(&buffer->irq_work.work);
+	}
+
+	if (cpu_buffer->irq_work.waiters_pending) {
+		cpu_buffer->irq_work.waiters_pending = false;
+		/* irq_work_queue() supplies its own memory barriers */
+		irq_work_queue(&cpu_buffer->irq_work.work);
+	}
+}
+
 /**
  * ring_buffer_unlock_commit - commit a reserved
  * @buffer: The buffer to commit to
···
 	cpu_buffer = buffer->buffers[cpu];

 	rb_commit(cpu_buffer, event);
+
+	rb_wakeups(buffer, cpu_buffer);

 	trace_recursive_unlock();
···
 	memcpy(body, data, length);

 	rb_commit(cpu_buffer, event);
+
+	rb_wakeups(buffer, cpu_buffer);

 	ret = 0;
  out:
···
 	return NOTIFY_OK;
 }
 #endif
+
+#ifdef CONFIG_RING_BUFFER_STARTUP_TEST
+/*
+ * This is a basic integrity check of the ring buffer.
+ * Late in the boot cycle this test will run when configured in.
+ * It will kick off a thread per CPU that will go into a loop
+ * writing to the per cpu ring buffer various sizes of data.
+ * Some of the data will be large items, some small.
+ *
+ * Another thread is created that goes into a spin, sending out
+ * IPIs to the other CPUs to also write into the ring buffer.
+ * This is to test the nesting ability of the buffer.
+ *
+ * Basic stats are recorded and reported. If something in the
+ * ring buffer should happen that's not expected, a big warning
+ * is displayed and all ring buffers are disabled.
+ */
+static struct task_struct *rb_threads[NR_CPUS] __initdata;
+
+struct rb_test_data {
+	struct ring_buffer	*buffer;
+	unsigned long		events;
+	unsigned long		bytes_written;
+	unsigned long		bytes_alloc;
+	unsigned long		bytes_dropped;
+	unsigned long		events_nested;
+	unsigned long		bytes_written_nested;
+	unsigned long		bytes_alloc_nested;
+	unsigned long		bytes_dropped_nested;
+	int			min_size_nested;
+	int			max_size_nested;
+	int			max_size;
+	int			min_size;
+	int			cpu;
+	int			cnt;
+};
+
+static struct rb_test_data rb_data[NR_CPUS] __initdata;
+
+/* 1 meg per cpu */
+#define RB_TEST_BUFFER_SIZE	1048576
+
+static char rb_string[] __initdata =
+	"abcdefghijklmnopqrstuvwxyz1234567890!@#$%^&*()?+\\"
+	"?+|:';\",.<>/?abcdefghijklmnopqrstuvwxyz1234567890"
+	"!@#$%^&*()?+\\?+|:';\",.<>/?abcdefghijklmnopqrstuv";
+
+static bool rb_test_started __initdata;
+
+struct rb_item {
+	int size;
+	char str[];
+};
+
+static __init int rb_write_something(struct rb_test_data *data, bool nested)
+{
+	struct ring_buffer_event *event;
+	struct rb_item *item;
+	bool started;
+	int event_len;
+	int size;
+	int len;
+	int cnt;
+
+	/* Have nested writes different than what is written */
+	cnt = data->cnt + (nested ? 27 : 0);
+
+	/* Multiply cnt by ~e, to make some unique increment */
+	size = (data->cnt * 68 / 25) % (sizeof(rb_string) - 1);
+
+	len = size + sizeof(struct rb_item);
+
+	started = rb_test_started;
+	/* read rb_test_started before checking buffer enabled */
+	smp_rmb();
+
+	event = ring_buffer_lock_reserve(data->buffer, len);
+	if (!event) {
+		/* Ignore dropped events before test starts. */
+		if (started) {
+			if (nested)
+				data->bytes_dropped += len;
+			else
+				data->bytes_dropped_nested += len;
+		}
+		return len;
+	}
+
+	event_len = ring_buffer_event_length(event);
+
+	if (RB_WARN_ON(data->buffer, event_len < len))
+		goto out;
+
+	item = ring_buffer_event_data(event);
+	item->size = size;
+	memcpy(item->str, rb_string, size);
+
+	if (nested) {
+		data->bytes_alloc_nested += event_len;
+		data->bytes_written_nested += len;
+		data->events_nested++;
+		if (!data->min_size_nested || len < data->min_size_nested)
+			data->min_size_nested = len;
+		if (len > data->max_size_nested)
+			data->max_size_nested = len;
+	} else {
+		data->bytes_alloc += event_len;
+		data->bytes_written += len;
+		data->events++;
+		if (!data->min_size || len < data->min_size)
+			data->min_size = len;
+		if (len > data->max_size)
+			data->max_size = len;
+	}
+
+ out:
+	ring_buffer_unlock_commit(data->buffer, event);
+
+	return 0;
+}
+
+static __init int rb_test(void *arg)
+{
+	struct rb_test_data *data = arg;
+
+	while (!kthread_should_stop()) {
+		rb_write_something(data, false);
+		data->cnt++;
+
+		set_current_state(TASK_INTERRUPTIBLE);
+		/* Now sleep between a min of 100-300us and a max of 1ms */
+		usleep_range(((data->cnt % 3) + 1) * 100, 1000);
+	}
+
+	return 0;
+}
+
+static __init void rb_ipi(void *ignore)
+{
+	struct rb_test_data *data;
+	int cpu = smp_processor_id();
+
+	data = &rb_data[cpu];
+	rb_write_something(data, true);
+}
+
+static __init int rb_hammer_test(void *arg)
+{
+	while (!kthread_should_stop()) {
+
+		/* Send an IPI to all cpus to write data! */
+		smp_call_function(rb_ipi, NULL, 1);
+		/* No sleep, but for non preempt, let others run */
+		schedule();
+	}
+
+	return 0;
+}
+
+static __init int test_ringbuffer(void)
+{
+	struct task_struct *rb_hammer;
+	struct ring_buffer *buffer;
+	int cpu;
+	int ret = 0;
+
+	pr_info("Running ring buffer tests...\n");
+
+	buffer = ring_buffer_alloc(RB_TEST_BUFFER_SIZE, RB_FL_OVERWRITE);
+	if (WARN_ON(!buffer))
+		return 0;
+
+	/* Disable buffer so that threads can't write to it yet */
+	ring_buffer_record_off(buffer);
+
+	for_each_online_cpu(cpu) {
+		rb_data[cpu].buffer = buffer;
+		rb_data[cpu].cpu = cpu;
+		rb_data[cpu].cnt = cpu;
+		rb_threads[cpu] = kthread_create(rb_test, &rb_data[cpu],
+						 "rbtester/%d", cpu);
+		if (WARN_ON(!rb_threads[cpu])) {
+			pr_cont("FAILED\n");
+			ret = -1;
+			goto out_free;
+		}
+
+		kthread_bind(rb_threads[cpu], cpu);
+		wake_up_process(rb_threads[cpu]);
+	}
+
+	/* Now create the rb hammer! */
+	rb_hammer = kthread_run(rb_hammer_test, NULL, "rbhammer");
+	if (WARN_ON(!rb_hammer)) {
+		pr_cont("FAILED\n");
+		ret = -1;
+		goto out_free;
+	}
+
+	ring_buffer_record_on(buffer);
+	/*
+	 * Show buffer is enabled before setting rb_test_started.
+	 * Yes there's a small race window where events could be
+	 * dropped and the thread won't catch it. But when a ring
+	 * buffer gets enabled, there will always be some kind of
+	 * delay before other CPUs see it. Thus, we don't care about
+	 * those dropped events. We care about events dropped after
+	 * the threads see that the buffer is active.
+	 */
+	smp_wmb();
+	rb_test_started = true;
+
+	set_current_state(TASK_INTERRUPTIBLE);
+	/* Just run for 10 seconds */;
+	schedule_timeout(10 * HZ);
+
+	kthread_stop(rb_hammer);
+
+ out_free:
+	for_each_online_cpu(cpu) {
+		if (!rb_threads[cpu])
+			break;
+		kthread_stop(rb_threads[cpu]);
+	}
+	if (ret) {
+		ring_buffer_free(buffer);
+		return ret;
+	}
+
+	/* Report! */
+	pr_info("finished\n");
+	for_each_online_cpu(cpu) {
+		struct ring_buffer_event *event;
+		struct rb_test_data *data = &rb_data[cpu];
+		struct rb_item *item;
+		unsigned long total_events;
+		unsigned long total_dropped;
+		unsigned long total_written;
+		unsigned long total_alloc;
+		unsigned long total_read = 0;
+		unsigned long total_size = 0;
+		unsigned long total_len = 0;
+		unsigned long total_lost = 0;
+		unsigned long lost;
+		int big_event_size;
+		int small_event_size;
+
+		ret = -1;
+
+		total_events = data->events + data->events_nested;
+		total_written = data->bytes_written + data->bytes_written_nested;
+		total_alloc = data->bytes_alloc + data->bytes_alloc_nested;
+		total_dropped = data->bytes_dropped + data->bytes_dropped_nested;
+
+		big_event_size = data->max_size + data->max_size_nested;
+		small_event_size = data->min_size + data->min_size_nested;
+
+		pr_info("CPU %d:\n", cpu);
+		pr_info("          events:    %ld\n", total_events);
+		pr_info("   dropped bytes:    %ld\n", total_dropped);
+		pr_info("   alloced bytes:    %ld\n", total_alloc);
+		pr_info("   written bytes:    %ld\n", total_written);
+		pr_info("   biggest event:    %d\n", big_event_size);
+		pr_info("  smallest event:    %d\n", small_event_size);
+
+		if (RB_WARN_ON(buffer, total_dropped))
+			break;
+
+		ret = 0;
+
+		while ((event = ring_buffer_consume(buffer, cpu, NULL, &lost))) {
+			total_lost += lost;
+			item = ring_buffer_event_data(event);
+			total_len += ring_buffer_event_length(event);
+			total_size += item->size + sizeof(struct rb_item);
+			if (memcmp(&item->str[0], rb_string, item->size) != 0) {
+				pr_info("FAILED!\n");
+				pr_info("buffer had: %.*s\n", item->size, item->str);
+				pr_info("expected:   %.*s\n", item->size, rb_string);
+				RB_WARN_ON(buffer, 1);
+				ret = -1;
+				break;
+			}
+			total_read++;
+		}
+		if (ret)
+			break;
+
+		ret = -1;
+
+		pr_info("         read events:   %ld\n", total_read);
+		pr_info("         lost events:   %ld\n", total_lost);
+		pr_info("        total events:   %ld\n", total_lost + total_read);
+		pr_info("  recorded len bytes:   %ld\n", total_len);
+		pr_info(" recorded size bytes:   %ld\n", total_size);
+		if (total_lost)
+			pr_info(" With dropped events, record len and size may not match\n"
+				" alloced and written from above\n");
+		if (!total_lost) {
+			if (RB_WARN_ON(buffer, total_len != total_alloc ||
+				       total_size != total_written))
+				break;
+		}
+		if (RB_WARN_ON(buffer, total_lost + total_read != total_events))
+			break;
+
+		ret = 0;
+	}
+	if (!ret)
+		pr_info("Ring buffer PASSED!\n");
+
+	ring_buffer_free(buffer);
+	return 0;
+}
+
+late_initcall(test_ringbuffer);
+#endif /* CONFIG_RING_BUFFER_STARTUP_TEST */
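The `waiters_pending` handshake used by `ring_buffer_wait()` and `rb_wakeups()` above can be approximated in userspace. This is only a rough analogue, assuming pthreads in place of wait queues and irq_work (all `toy_*` names are invented for this sketch): the reader announces itself before sleeping, and the writer only pays the wakeup cost when the flag was set.

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Userspace analogue of rb_irq_work: a flag the writer checks plus a
 * wait queue. In the kernel the wakeup is deferred to irq_work because
 * events may be recorded in contexts where waking a task directly would
 * deadlock; here a condition variable stands in for that machinery.
 */
struct toy_waiter {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool waiters_pending;
	int data;
};

static void *toy_reader(void *arg)
{
	struct toy_waiter *w = arg;

	pthread_mutex_lock(&w->lock);
	w->waiters_pending = true;	/* announce we are waiting */
	while (!w->data)		/* tolerate spurious wakeups */
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Writer side of rb_wakeups(): signal only when a waiter announced itself. */
static void toy_write(struct toy_waiter *w, int value)
{
	pthread_mutex_lock(&w->lock);
	w->data = value;
	if (w->waiters_pending) {
		w->waiters_pending = false;
		pthread_cond_broadcast(&w->cond);
	}
	pthread_mutex_unlock(&w->lock);
}

/* Run one reader/writer round; returns the value the reader saw. */
static int toy_demo(void)
{
	struct toy_waiter w = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.cond = PTHREAD_COND_INITIALIZER,
	};
	pthread_t reader;

	pthread_create(&reader, NULL, toy_reader, &w);
	toy_write(&w, 42);
	pthread_join(&reader, NULL);
	return w.data;
}
```

Unlike the kernel code, this sketch protects the flag with a lock; the real `waiters_pending` is deliberately unlocked because, as the comment in `ring_buffer_wait()` explains, the worst a race can cause is a spurious wakeup.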
kernel/trace/trace.c | +1546 -664
···
 /*
  * ring buffer based function tracer
  *
- * Copyright (C) 2007-2008 Steven Rostedt <srostedt@redhat.com>
+ * Copyright (C) 2007-2012 Steven Rostedt <srostedt@redhat.com>
  * Copyright (C) 2008 Ingo Molnar <mingo@redhat.com>
  *
  * Originally taken from the RT patch by:
···
 #include <linux/seq_file.h>
 #include <linux/notifier.h>
 #include <linux/irqflags.h>
-#include <linux/irq_work.h>
 #include <linux/debugfs.h>
 #include <linux/pagemap.h>
 #include <linux/hardirq.h>
···
  * On boot up, the ring buffer is set to the minimum size, so that
  * we do not waste memory on systems that are not using tracing.
  */
-int ring_buffer_expanded;
+bool ring_buffer_expanded;

 /*
  * We need to change this state when a selftest is running.
···
 static DEFINE_PER_CPU(bool, trace_cmdline_save);

 /*
- * When a reader is waiting for data, then this variable is
- * set to true.
- */
-static bool trace_wakeup_needed;
-
-static struct irq_work trace_work_wakeup;
-
-/*
  * Kill all tracing for good (never come back).
  * It is initialized to 1 but will turn to zero if the initialization
  * of the tracer is successful. But that is the only place that sets
···
 static char bootup_tracer_buf[MAX_TRACER_SIZE] __initdata;
 static char *default_bootup_tracer;

+static bool allocate_snapshot;
+
 static int __init set_cmdline_ftrace(char *str)
 {
 	strlcpy(bootup_tracer_buf, str, MAX_TRACER_SIZE);
 	default_bootup_tracer = bootup_tracer_buf;
 	/* We are using ftrace early, expand it */
-	ring_buffer_expanded = 1;
+	ring_buffer_expanded = true;
 	return 1;
 }
 __setup("ftrace=", set_cmdline_ftrace);
···
 	return 0;
 }
 __setup("ftrace_dump_on_oops", set_ftrace_dump_on_oops);
+
+static int __init boot_alloc_snapshot(char *str)
+{
+	allocate_snapshot = true;
+	/* We also need the main ring buffer expanded */
+	ring_buffer_expanded = true;
+	return 1;
+}
+__setup("alloc_snapshot", boot_alloc_snapshot);


 static char trace_boot_options_buf[MAX_TRACER_SIZE] __initdata;
···
  */
 static struct trace_array	global_trace;

-static DEFINE_PER_CPU(struct trace_array_cpu, global_trace_cpu);
+LIST_HEAD(ftrace_trace_arrays);

 int filter_current_check_discard(struct ring_buffer *buffer,
 				 struct ftrace_event_call *call, void *rec,
···
 	u64 ts;

 	/* Early boot up does not have a buffer yet */
-	if (!global_trace.buffer)
+	if (!global_trace.trace_buffer.buffer)
 		return trace_clock_local();

-	ts = ring_buffer_time_stamp(global_trace.buffer, cpu);
-	ring_buffer_normalize_time_stamp(global_trace.buffer, cpu, &ts);
+	ts = ring_buffer_time_stamp(global_trace.trace_buffer.buffer, cpu);
+	ring_buffer_normalize_time_stamp(global_trace.trace_buffer.buffer, cpu, &ts);

 	return ts;
 }
-
-/*
- * The max_tr is used to snapshot the global_trace when a maximum
- * latency is reached. Some tracers will use this to store a maximum
- * trace while it continues examining live traces.
- *
- * The buffers for the max_tr are set up the same as the global_trace.
- * When a snapshot is taken, the link list of the max_tr is swapped
- * with the link list of the global_trace and the buffers are reset for
- * the global_trace so the tracing can continue.
- */
-static struct trace_array	max_tr;
-
-static DEFINE_PER_CPU(struct trace_array_cpu, max_tr_data);

 int tracing_is_enabled(void)
 {
···
 /* trace_types holds a link list of available tracers. */
 static struct tracer		*trace_types __read_mostly;

-/* current_trace points to the tracer that is currently active */
-static struct tracer		*current_trace __read_mostly = &nop_trace;

 /*
  * trace_types_lock is used to protect the trace_types list.
···
 static inline void trace_access_lock(int cpu)
 {
-	if (cpu == TRACE_PIPE_ALL_CPU) {
+	if (cpu == RING_BUFFER_ALL_CPUS) {
 		/* gain it for accessing the whole ring buffer. */
 		down_write(&all_cpu_access_lock);
 	} else {
 		/* gain it for accessing a cpu ring buffer. */

-		/* Firstly block other trace_access_lock(TRACE_PIPE_ALL_CPU). */
+		/* Firstly block other trace_access_lock(RING_BUFFER_ALL_CPUS). */
 		down_read(&all_cpu_access_lock);

 		/* Secondly block other access to this @cpu ring buffer. */
···
 static inline void trace_access_unlock(int cpu)
 {
-	if (cpu == TRACE_PIPE_ALL_CPU) {
+	if (cpu == RING_BUFFER_ALL_CPUS) {
 		up_write(&all_cpu_access_lock);
 	} else {
 		mutex_unlock(&per_cpu(cpu_access_lock, cpu));
···
 #endif

-/* trace_wait is a waitqueue for tasks blocked on trace_poll */
-static DECLARE_WAIT_QUEUE_HEAD(trace_wait);
-
 /* trace_flags holds trace_options default values */
 unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
 	TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO | TRACE_ITER_SLEEP_TIME |
 	TRACE_ITER_GRAPH_TIME | TRACE_ITER_RECORD_CMD | TRACE_ITER_OVERWRITE |
-	TRACE_ITER_IRQ_INFO | TRACE_ITER_MARKERS;
-
-static int trace_stop_count;
-static DEFINE_RAW_SPINLOCK(tracing_start_lock);
-
-/**
- * trace_wake_up - wake up tasks waiting for trace input
- *
- * Schedules a delayed work to wake up any task that is blocked on the
- * trace_wait queue. These is used with trace_poll for tasks polling the
- * trace.
- */
-static void trace_wake_up(struct irq_work *work)
-{
-	wake_up_all(&trace_wait);
-
-}
+	TRACE_ITER_IRQ_INFO | TRACE_ITER_MARKERS | TRACE_ITER_FUNCTION;

 /**
  * tracing_on - enable tracing buffers
···
  */
 void tracing_on(void)
 {
-	if (global_trace.buffer)
-		ring_buffer_record_on(global_trace.buffer);
+	if (global_trace.trace_buffer.buffer)
+		ring_buffer_record_on(global_trace.trace_buffer.buffer);
 	/*
 	 * This flag is only looked at when buffers haven't been
 	 * allocated yet. We don't really care about the race
···
 EXPORT_SYMBOL_GPL(tracing_on);

 /**
+ * __trace_puts - write a constant string into the trace buffer.
+ * @ip:	   The address of the caller
+ * @str:   The constant string to write
+ * @size:  The size of the string.
+ */
+int __trace_puts(unsigned long ip, const char *str, int size)
+{
+	struct ring_buffer_event *event;
+	struct ring_buffer *buffer;
+	struct print_entry *entry;
+	unsigned long irq_flags;
+	int alloc;
+
+	alloc = sizeof(*entry) + size + 2; /* possible \n added */
+
+	local_save_flags(irq_flags);
+	buffer = global_trace.trace_buffer.buffer;
+	event = trace_buffer_lock_reserve(buffer, TRACE_PRINT, alloc,
+					  irq_flags, preempt_count());
+	if (!event)
+		return 0;
+
+	entry = ring_buffer_event_data(event);
+	entry->ip = ip;
+
+	memcpy(&entry->buf, str, size);
+
+	/* Add a newline if necessary */
+	if (entry->buf[size - 1] != '\n') {
+		entry->buf[size] = '\n';
+		entry->buf[size + 1] = '\0';
+	} else
+		entry->buf[size] = '\0';
+
+	__buffer_unlock_commit(buffer, event);
+
+	return size;
+}
+EXPORT_SYMBOL_GPL(__trace_puts);
+
+/**
+ * __trace_bputs - write the pointer to a constant string into trace buffer
+ * @ip:	   The address of the caller
+ * @str:   The constant string to write to the buffer to
+ */
+int __trace_bputs(unsigned long ip, const char *str)
+{
+	struct ring_buffer_event *event;
+	struct ring_buffer *buffer;
+	struct bputs_entry *entry;
+	unsigned long irq_flags;
+	int size = sizeof(struct bputs_entry);
+
+	local_save_flags(irq_flags);
+	buffer = global_trace.trace_buffer.buffer;
+	event = trace_buffer_lock_reserve(buffer, TRACE_BPUTS, size,
+					  irq_flags, preempt_count());
+	if (!event)
+		return 0;
+
+	entry = ring_buffer_event_data(event);
+	entry->ip  = ip;
+	entry->str = str;
+
+	__buffer_unlock_commit(buffer, event);
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(__trace_bputs);
458 + #ifdef CONFIG_TRACER_SNAPSHOT
459 + /**
460 + * tracing_snapshot - take a snapshot of the current buffer.
461 + *
462 + * This causes a swap between the snapshot buffer and the current live
463 + * tracing buffer. You can use this to take snapshots of the live
464 + * trace when some condition is triggered, but continue to trace.
465 + *
466 + * Note, make sure to allocate the snapshot with either
467 + * tracing_snapshot_alloc(), or by doing it manually
468 + * with: echo 1 > /sys/kernel/debug/tracing/snapshot
469 + *
470 + * If the snapshot buffer is not allocated, it will stop tracing,
471 + * basically making a permanent snapshot.
472 + */
473 + void tracing_snapshot(void)
474 + {
475 + struct trace_array *tr = &global_trace;
476 + struct tracer *tracer = tr->current_trace;
477 + unsigned long flags;
478 +
479 + if (in_nmi()) {
480 + internal_trace_puts("*** SNAPSHOT CALLED FROM NMI CONTEXT ***\n");
481 + internal_trace_puts("*** snapshot is being ignored ***\n");
482 + return;
483 + }
484 +
485 + if (!tr->allocated_snapshot) {
486 + internal_trace_puts("*** SNAPSHOT NOT ALLOCATED ***\n");
487 + internal_trace_puts("*** stopping trace here! ***\n");
488 + tracing_off();
489 + return;
490 + }
491 +
492 + /* Note, the snapshot cannot be used while the tracer itself uses it */
493 + if (tracer->use_max_tr) {
494 + internal_trace_puts("*** LATENCY TRACER ACTIVE ***\n");
495 + internal_trace_puts("*** Can not use snapshot (sorry) ***\n");
496 + return;
497 + }
498 +
499 + local_irq_save(flags);
500 + update_max_tr(tr, current, smp_processor_id());
501 + local_irq_restore(flags);
502 + }
503 + EXPORT_SYMBOL_GPL(tracing_snapshot);
504 +
505 + static int resize_buffer_duplicate_size(struct trace_buffer *trace_buf,
506 + struct trace_buffer *size_buf, int cpu_id);
507 + static void set_buffer_entries(struct trace_buffer *buf, unsigned long val);
508 +
509 + static int alloc_snapshot(struct trace_array *tr)
510 + {
511 + int ret;
512 +
513 + if (!tr->allocated_snapshot) {
514 +
515 + /* allocate spare buffer */
516 + ret = resize_buffer_duplicate_size(&tr->max_buffer,
517 + &tr->trace_buffer, RING_BUFFER_ALL_CPUS);
518 + if (ret < 0)
519 + return ret;
520 +
521 + tr->allocated_snapshot = true;
522 + }
523 +
524 + return 0;
525 + }
526 +
527 + void free_snapshot(struct trace_array *tr)
528 + {
529 + /*
530 + * We don't free the ring buffer; instead, we resize it, because
531 + * the max_tr ring buffer has some state (e.g. ring->clock) and
532 + * we want to preserve it.
533 + */
534 + ring_buffer_resize(tr->max_buffer.buffer, 1, RING_BUFFER_ALL_CPUS);
535 + set_buffer_entries(&tr->max_buffer, 1);
536 + tracing_reset_online_cpus(&tr->max_buffer);
537 + tr->allocated_snapshot = false;
538 + }
539 +
540 + /**
541 + * tracing_snapshot_alloc - allocate and take a snapshot of the current buffer.
542 + *
543 + * This is similar to tracing_snapshot(), but it will allocate the
544 + * snapshot buffer if it isn't already allocated. Use this only
545 + * where it is safe to sleep, as the allocation may sleep.
546 + *
547 + * This causes a swap between the snapshot buffer and the current live
548 + * tracing buffer.
You can use this to take snapshots of the live 549 + * trace when some condition is triggered, but continue to trace. 550 + */ 551 + void tracing_snapshot_alloc(void) 552 + { 553 + struct trace_array *tr = &global_trace; 554 + int ret; 555 + 556 + ret = alloc_snapshot(tr); 557 + if (WARN_ON(ret < 0)) 558 + return; 559 + 560 + tracing_snapshot(); 561 + } 562 + EXPORT_SYMBOL_GPL(tracing_snapshot_alloc); 563 + #else 564 + void tracing_snapshot(void) 565 + { 566 + WARN_ONCE(1, "Snapshot feature not enabled, but internal snapshot used"); 567 + } 568 + EXPORT_SYMBOL_GPL(tracing_snapshot); 569 + void tracing_snapshot_alloc(void) 570 + { 571 + /* Give warning */ 572 + tracing_snapshot(); 573 + } 574 + EXPORT_SYMBOL_GPL(tracing_snapshot_alloc); 575 + #endif /* CONFIG_TRACER_SNAPSHOT */ 576 + 577 + /** 354 578 * tracing_off - turn off tracing buffers 355 579 * 356 580 * This function stops the tracing buffers from recording data. ··· 550 394 */ 551 395 void tracing_off(void) 552 396 { 553 - if (global_trace.buffer) 554 - ring_buffer_record_off(global_trace.buffer); 397 + if (global_trace.trace_buffer.buffer) 398 + ring_buffer_record_off(global_trace.trace_buffer.buffer); 555 399 /* 556 400 * This flag is only looked at when buffers haven't been 557 401 * allocated yet. 
We don't really care about the race ··· 567 411 */ 568 412 int tracing_is_on(void) 569 413 { 570 - if (global_trace.buffer) 571 - return ring_buffer_record_is_on(global_trace.buffer); 414 + if (global_trace.trace_buffer.buffer) 415 + return ring_buffer_record_is_on(global_trace.trace_buffer.buffer); 572 416 return !global_trace.buffer_disabled; 573 417 } 574 418 EXPORT_SYMBOL_GPL(tracing_is_on); ··· 635 479 "disable_on_free", 636 480 "irq-info", 637 481 "markers", 482 + "function-trace", 638 483 NULL 639 484 }; 640 485 ··· 647 490 { trace_clock_local, "local", 1 }, 648 491 { trace_clock_global, "global", 1 }, 649 492 { trace_clock_counter, "counter", 0 }, 493 + { trace_clock_jiffies, "uptime", 1 }, 494 + { trace_clock, "perf", 1 }, 650 495 ARCH_TRACE_CLOCKS 651 496 }; 652 497 ··· 829 670 static void 830 671 __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu) 831 672 { 832 - struct trace_array_cpu *data = tr->data[cpu]; 833 - struct trace_array_cpu *max_data; 673 + struct trace_buffer *trace_buf = &tr->trace_buffer; 674 + struct trace_buffer *max_buf = &tr->max_buffer; 675 + struct trace_array_cpu *data = per_cpu_ptr(trace_buf->data, cpu); 676 + struct trace_array_cpu *max_data = per_cpu_ptr(max_buf->data, cpu); 834 677 835 - max_tr.cpu = cpu; 836 - max_tr.time_start = data->preempt_timestamp; 678 + max_buf->cpu = cpu; 679 + max_buf->time_start = data->preempt_timestamp; 837 680 838 - max_data = max_tr.data[cpu]; 839 681 max_data->saved_latency = tracing_max_latency; 840 682 max_data->critical_start = data->critical_start; 841 683 max_data->critical_end = data->critical_end; ··· 866 706 { 867 707 struct ring_buffer *buf; 868 708 869 - if (trace_stop_count) 709 + if (tr->stop_count) 870 710 return; 871 711 872 712 WARN_ON_ONCE(!irqs_disabled()); 873 713 874 - if (!current_trace->allocated_snapshot) { 714 + if (!tr->allocated_snapshot) { 875 715 /* Only the nop tracer should hit this when disabling */ 876 - WARN_ON_ONCE(current_trace != 
&nop_trace);
716 + WARN_ON_ONCE(tr->current_trace != &nop_trace);
877 717 return;
878 718 }
879 719
880 720 arch_spin_lock(&ftrace_max_lock);
881 721
882 - buf = tr->buffer;
883 - tr->buffer = max_tr.buffer;
884 - max_tr.buffer = buf;
722 + buf = tr->trace_buffer.buffer;
723 + tr->trace_buffer.buffer = tr->max_buffer.buffer;
724 + tr->max_buffer.buffer = buf;
885 725
886 726 __update_max_tr(tr, tsk, cpu);
887 727 arch_spin_unlock(&ftrace_max_lock);
···
900 740 {
901 741 int ret;
902 742
903 - if (trace_stop_count)
743 + if (tr->stop_count)
904 744 return;
905 745
906 746 WARN_ON_ONCE(!irqs_disabled());
907 - if (!current_trace->allocated_snapshot) {
747 + if (!tr->allocated_snapshot) {
908 748 /* Only the nop tracer should hit this when disabling */
909 - WARN_ON_ONCE(current_trace != &nop_trace);
749 + WARN_ON_ONCE(tr->current_trace != &nop_trace);
910 750 return;
911 751 }
912 752
913 753 arch_spin_lock(&ftrace_max_lock);
914 754
915 - ret = ring_buffer_swap_cpu(max_tr.buffer, tr->buffer, cpu);
755 + ret = ring_buffer_swap_cpu(tr->max_buffer.buffer, tr->trace_buffer.buffer, cpu);
916 756
917 757 if (ret == -EBUSY) {
918 758 /*
···
921 761 * the max trace buffer (no one writes directly to it)
922 762 * and flag that it failed.
923 763 */ 924 - trace_array_printk(&max_tr, _THIS_IP_, 764 + trace_array_printk_buf(tr->max_buffer.buffer, _THIS_IP_, 925 765 "Failed to swap buffers due to commit in progress\n"); 926 766 } 927 767 ··· 934 774 935 775 static void default_wait_pipe(struct trace_iterator *iter) 936 776 { 937 - DEFINE_WAIT(wait); 777 + /* Iterators are static, they should be filled or empty */ 778 + if (trace_buffer_iter(iter, iter->cpu_file)) 779 + return; 938 780 939 - prepare_to_wait(&trace_wait, &wait, TASK_INTERRUPTIBLE); 781 + ring_buffer_wait(iter->trace_buffer->buffer, iter->cpu_file); 782 + } 783 + 784 + #ifdef CONFIG_FTRACE_STARTUP_TEST 785 + static int run_tracer_selftest(struct tracer *type) 786 + { 787 + struct trace_array *tr = &global_trace; 788 + struct tracer *saved_tracer = tr->current_trace; 789 + int ret; 790 + 791 + if (!type->selftest || tracing_selftest_disabled) 792 + return 0; 940 793 941 794 /* 942 - * The events can happen in critical sections where 943 - * checking a work queue can cause deadlocks. 944 - * After adding a task to the queue, this flag is set 945 - * only to notify events to try to wake up the queue 946 - * using irq_work. 947 - * 948 - * We don't clear it even if the buffer is no longer 949 - * empty. The flag only causes the next event to run 950 - * irq_work to do the work queue wake up. The worse 951 - * that can happen if we race with !trace_empty() is that 952 - * an event will cause an irq_work to try to wake up 953 - * an empty queue. 954 - * 955 - * There's no reason to protect this flag either, as 956 - * the work queue and irq_work logic will do the necessary 957 - * synchronization for the wake ups. The only thing 958 - * that is necessary is that the wake up happens after 959 - * a task has been queued. It's OK for spurious wake ups. 795 + * Run a selftest on this tracer. 796 + * Here we reset the trace buffer, and set the current 797 + * tracer to be this tracer. 
The tracer can then run some 798 + * internal tracing to verify that everything is in order. 799 + * If we fail, we do not register this tracer. 960 800 */ 961 - trace_wakeup_needed = true; 801 + tracing_reset_online_cpus(&tr->trace_buffer); 962 802 963 - if (trace_empty(iter)) 964 - schedule(); 803 + tr->current_trace = type; 965 804 966 - finish_wait(&trace_wait, &wait); 805 + #ifdef CONFIG_TRACER_MAX_TRACE 806 + if (type->use_max_tr) { 807 + /* If we expanded the buffers, make sure the max is expanded too */ 808 + if (ring_buffer_expanded) 809 + ring_buffer_resize(tr->max_buffer.buffer, trace_buf_size, 810 + RING_BUFFER_ALL_CPUS); 811 + tr->allocated_snapshot = true; 812 + } 813 + #endif 814 + 815 + /* the test is responsible for initializing and enabling */ 816 + pr_info("Testing tracer %s: ", type->name); 817 + ret = type->selftest(type, tr); 818 + /* the test is responsible for resetting too */ 819 + tr->current_trace = saved_tracer; 820 + if (ret) { 821 + printk(KERN_CONT "FAILED!\n"); 822 + /* Add the warning after printing 'FAILED' */ 823 + WARN_ON(1); 824 + return -1; 825 + } 826 + /* Only reset on passing, to avoid touching corrupted buffers */ 827 + tracing_reset_online_cpus(&tr->trace_buffer); 828 + 829 + #ifdef CONFIG_TRACER_MAX_TRACE 830 + if (type->use_max_tr) { 831 + tr->allocated_snapshot = false; 832 + 833 + /* Shrink the max buffer again */ 834 + if (ring_buffer_expanded) 835 + ring_buffer_resize(tr->max_buffer.buffer, 1, 836 + RING_BUFFER_ALL_CPUS); 837 + } 838 + #endif 839 + 840 + printk(KERN_CONT "PASSED\n"); 841 + return 0; 967 842 } 843 + #else 844 + static inline int run_tracer_selftest(struct tracer *type) 845 + { 846 + return 0; 847 + } 848 + #endif /* CONFIG_FTRACE_STARTUP_TEST */ 968 849 969 850 /** 970 851 * register_tracer - register a tracer with the ftrace system. 
··· 1052 851 if (!type->wait_pipe) 1053 852 type->wait_pipe = default_wait_pipe; 1054 853 1055 - 1056 - #ifdef CONFIG_FTRACE_STARTUP_TEST 1057 - if (type->selftest && !tracing_selftest_disabled) { 1058 - struct tracer *saved_tracer = current_trace; 1059 - struct trace_array *tr = &global_trace; 1060 - 1061 - /* 1062 - * Run a selftest on this tracer. 1063 - * Here we reset the trace buffer, and set the current 1064 - * tracer to be this tracer. The tracer can then run some 1065 - * internal tracing to verify that everything is in order. 1066 - * If we fail, we do not register this tracer. 1067 - */ 1068 - tracing_reset_online_cpus(tr); 1069 - 1070 - current_trace = type; 1071 - 1072 - if (type->use_max_tr) { 1073 - /* If we expanded the buffers, make sure the max is expanded too */ 1074 - if (ring_buffer_expanded) 1075 - ring_buffer_resize(max_tr.buffer, trace_buf_size, 1076 - RING_BUFFER_ALL_CPUS); 1077 - type->allocated_snapshot = true; 1078 - } 1079 - 1080 - /* the test is responsible for initializing and enabling */ 1081 - pr_info("Testing tracer %s: ", type->name); 1082 - ret = type->selftest(type, tr); 1083 - /* the test is responsible for resetting too */ 1084 - current_trace = saved_tracer; 1085 - if (ret) { 1086 - printk(KERN_CONT "FAILED!\n"); 1087 - /* Add the warning after printing 'FAILED' */ 1088 - WARN_ON(1); 1089 - goto out; 1090 - } 1091 - /* Only reset on passing, to avoid touching corrupted buffers */ 1092 - tracing_reset_online_cpus(tr); 1093 - 1094 - if (type->use_max_tr) { 1095 - type->allocated_snapshot = false; 1096 - 1097 - /* Shrink the max buffer again */ 1098 - if (ring_buffer_expanded) 1099 - ring_buffer_resize(max_tr.buffer, 1, 1100 - RING_BUFFER_ALL_CPUS); 1101 - } 1102 - 1103 - printk(KERN_CONT "PASSED\n"); 1104 - } 1105 - #endif 854 + ret = run_tracer_selftest(type); 855 + if (ret < 0) 856 + goto out; 1106 857 1107 858 type->next = trace_types; 1108 859 trace_types = type; ··· 1074 921 tracing_set_tracer(type->name); 1075 922 
default_bootup_tracer = NULL; 1076 923 /* disable other selftests, since this will break it. */ 1077 - tracing_selftest_disabled = 1; 924 + tracing_selftest_disabled = true; 1078 925 #ifdef CONFIG_FTRACE_STARTUP_TEST 1079 926 printk(KERN_INFO "Disabling FTRACE selftests due to running tracer '%s'\n", 1080 927 type->name); ··· 1084 931 return ret; 1085 932 } 1086 933 1087 - void tracing_reset(struct trace_array *tr, int cpu) 934 + void tracing_reset(struct trace_buffer *buf, int cpu) 1088 935 { 1089 - struct ring_buffer *buffer = tr->buffer; 936 + struct ring_buffer *buffer = buf->buffer; 1090 937 1091 938 if (!buffer) 1092 939 return; ··· 1100 947 ring_buffer_record_enable(buffer); 1101 948 } 1102 949 1103 - void tracing_reset_online_cpus(struct trace_array *tr) 950 + void tracing_reset_online_cpus(struct trace_buffer *buf) 1104 951 { 1105 - struct ring_buffer *buffer = tr->buffer; 952 + struct ring_buffer *buffer = buf->buffer; 1106 953 int cpu; 1107 954 1108 955 if (!buffer) ··· 1113 960 /* Make sure all commits have finished */ 1114 961 synchronize_sched(); 1115 962 1116 - tr->time_start = ftrace_now(tr->cpu); 963 + buf->time_start = ftrace_now(buf->cpu); 1117 964 1118 965 for_each_online_cpu(cpu) 1119 966 ring_buffer_reset_cpu(buffer, cpu); ··· 1123 970 1124 971 void tracing_reset_current(int cpu) 1125 972 { 1126 - tracing_reset(&global_trace, cpu); 973 + tracing_reset(&global_trace.trace_buffer, cpu); 1127 974 } 1128 975 1129 - void tracing_reset_current_online_cpus(void) 976 + void tracing_reset_all_online_cpus(void) 1130 977 { 1131 - tracing_reset_online_cpus(&global_trace); 978 + struct trace_array *tr; 979 + 980 + mutex_lock(&trace_types_lock); 981 + list_for_each_entry(tr, &ftrace_trace_arrays, list) { 982 + tracing_reset_online_cpus(&tr->trace_buffer); 983 + #ifdef CONFIG_TRACER_MAX_TRACE 984 + tracing_reset_online_cpus(&tr->max_buffer); 985 + #endif 986 + } 987 + mutex_unlock(&trace_types_lock); 1132 988 } 1133 989 1134 990 #define SAVED_CMDLINES 128 
··· 1160 998 1161 999 int is_tracing_stopped(void) 1162 1000 { 1163 - return trace_stop_count; 1001 + return global_trace.stop_count; 1164 1002 } 1165 1003 1166 1004 /** ··· 1192 1030 if (tracing_disabled) 1193 1031 return; 1194 1032 1195 - raw_spin_lock_irqsave(&tracing_start_lock, flags); 1196 - if (--trace_stop_count) { 1197 - if (trace_stop_count < 0) { 1033 + raw_spin_lock_irqsave(&global_trace.start_lock, flags); 1034 + if (--global_trace.stop_count) { 1035 + if (global_trace.stop_count < 0) { 1198 1036 /* Someone screwed up their debugging */ 1199 1037 WARN_ON_ONCE(1); 1200 - trace_stop_count = 0; 1038 + global_trace.stop_count = 0; 1201 1039 } 1202 1040 goto out; 1203 1041 } ··· 1205 1043 /* Prevent the buffers from switching */ 1206 1044 arch_spin_lock(&ftrace_max_lock); 1207 1045 1208 - buffer = global_trace.buffer; 1046 + buffer = global_trace.trace_buffer.buffer; 1209 1047 if (buffer) 1210 1048 ring_buffer_record_enable(buffer); 1211 1049 1212 - buffer = max_tr.buffer; 1050 + #ifdef CONFIG_TRACER_MAX_TRACE 1051 + buffer = global_trace.max_buffer.buffer; 1213 1052 if (buffer) 1214 1053 ring_buffer_record_enable(buffer); 1054 + #endif 1215 1055 1216 1056 arch_spin_unlock(&ftrace_max_lock); 1217 1057 1218 1058 ftrace_start(); 1219 1059 out: 1220 - raw_spin_unlock_irqrestore(&tracing_start_lock, flags); 1060 + raw_spin_unlock_irqrestore(&global_trace.start_lock, flags); 1061 + } 1062 + 1063 + static void tracing_start_tr(struct trace_array *tr) 1064 + { 1065 + struct ring_buffer *buffer; 1066 + unsigned long flags; 1067 + 1068 + if (tracing_disabled) 1069 + return; 1070 + 1071 + /* If global, we need to also start the max tracer */ 1072 + if (tr->flags & TRACE_ARRAY_FL_GLOBAL) 1073 + return tracing_start(); 1074 + 1075 + raw_spin_lock_irqsave(&tr->start_lock, flags); 1076 + 1077 + if (--tr->stop_count) { 1078 + if (tr->stop_count < 0) { 1079 + /* Someone screwed up their debugging */ 1080 + WARN_ON_ONCE(1); 1081 + tr->stop_count = 0; 1082 + } 1083 + goto 
out; 1084 + } 1085 + 1086 + buffer = tr->trace_buffer.buffer; 1087 + if (buffer) 1088 + ring_buffer_record_enable(buffer); 1089 + 1090 + out: 1091 + raw_spin_unlock_irqrestore(&tr->start_lock, flags); 1221 1092 } 1222 1093 1223 1094 /** ··· 1265 1070 unsigned long flags; 1266 1071 1267 1072 ftrace_stop(); 1268 - raw_spin_lock_irqsave(&tracing_start_lock, flags); 1269 - if (trace_stop_count++) 1073 + raw_spin_lock_irqsave(&global_trace.start_lock, flags); 1074 + if (global_trace.stop_count++) 1270 1075 goto out; 1271 1076 1272 1077 /* Prevent the buffers from switching */ 1273 1078 arch_spin_lock(&ftrace_max_lock); 1274 1079 1275 - buffer = global_trace.buffer; 1080 + buffer = global_trace.trace_buffer.buffer; 1276 1081 if (buffer) 1277 1082 ring_buffer_record_disable(buffer); 1278 1083 1279 - buffer = max_tr.buffer; 1084 + #ifdef CONFIG_TRACER_MAX_TRACE 1085 + buffer = global_trace.max_buffer.buffer; 1280 1086 if (buffer) 1281 1087 ring_buffer_record_disable(buffer); 1088 + #endif 1282 1089 1283 1090 arch_spin_unlock(&ftrace_max_lock); 1284 1091 1285 1092 out: 1286 - raw_spin_unlock_irqrestore(&tracing_start_lock, flags); 1093 + raw_spin_unlock_irqrestore(&global_trace.start_lock, flags); 1094 + } 1095 + 1096 + static void tracing_stop_tr(struct trace_array *tr) 1097 + { 1098 + struct ring_buffer *buffer; 1099 + unsigned long flags; 1100 + 1101 + /* If global, we need to also stop the max tracer */ 1102 + if (tr->flags & TRACE_ARRAY_FL_GLOBAL) 1103 + return tracing_stop(); 1104 + 1105 + raw_spin_lock_irqsave(&tr->start_lock, flags); 1106 + if (tr->stop_count++) 1107 + goto out; 1108 + 1109 + buffer = tr->trace_buffer.buffer; 1110 + if (buffer) 1111 + ring_buffer_record_disable(buffer); 1112 + 1113 + out: 1114 + raw_spin_unlock_irqrestore(&tr->start_lock, flags); 1287 1115 } 1288 1116 1289 1117 void trace_stop_cmdline_recording(void); ··· 1439 1221 __buffer_unlock_commit(struct ring_buffer *buffer, struct ring_buffer_event *event) 1440 1222 { 1441 1223 
__this_cpu_write(trace_cmdline_save, true); 1442 - if (trace_wakeup_needed) { 1443 - trace_wakeup_needed = false; 1444 - /* irq_work_queue() supplies it's own memory barriers */ 1445 - irq_work_queue(&trace_work_wakeup); 1446 - } 1447 1224 ring_buffer_unlock_commit(buffer, event); 1448 1225 } 1449 1226 ··· 1462 1249 EXPORT_SYMBOL_GPL(trace_buffer_unlock_commit); 1463 1250 1464 1251 struct ring_buffer_event * 1252 + trace_event_buffer_lock_reserve(struct ring_buffer **current_rb, 1253 + struct ftrace_event_file *ftrace_file, 1254 + int type, unsigned long len, 1255 + unsigned long flags, int pc) 1256 + { 1257 + *current_rb = ftrace_file->tr->trace_buffer.buffer; 1258 + return trace_buffer_lock_reserve(*current_rb, 1259 + type, len, flags, pc); 1260 + } 1261 + EXPORT_SYMBOL_GPL(trace_event_buffer_lock_reserve); 1262 + 1263 + struct ring_buffer_event * 1465 1264 trace_current_buffer_lock_reserve(struct ring_buffer **current_rb, 1466 1265 int type, unsigned long len, 1467 1266 unsigned long flags, int pc) 1468 1267 { 1469 - *current_rb = global_trace.buffer; 1268 + *current_rb = global_trace.trace_buffer.buffer; 1470 1269 return trace_buffer_lock_reserve(*current_rb, 1471 1270 type, len, flags, pc); 1472 1271 } ··· 1517 1292 int pc) 1518 1293 { 1519 1294 struct ftrace_event_call *call = &event_function; 1520 - struct ring_buffer *buffer = tr->buffer; 1295 + struct ring_buffer *buffer = tr->trace_buffer.buffer; 1521 1296 struct ring_buffer_event *event; 1522 1297 struct ftrace_entry *entry; 1523 1298 ··· 1658 1433 void __trace_stack(struct trace_array *tr, unsigned long flags, int skip, 1659 1434 int pc) 1660 1435 { 1661 - __ftrace_trace_stack(tr->buffer, flags, skip, pc, NULL); 1436 + __ftrace_trace_stack(tr->trace_buffer.buffer, flags, skip, pc, NULL); 1662 1437 } 1663 1438 1664 1439 /** 1665 1440 * trace_dump_stack - record a stack back trace in the trace buffer 1441 + * @skip: Number of functions to skip (helper handlers) 1666 1442 */ 1667 - void 
trace_dump_stack(void) 1443 + void trace_dump_stack(int skip) 1668 1444 { 1669 1445 unsigned long flags; 1670 1446 ··· 1674 1448 1675 1449 local_save_flags(flags); 1676 1450 1677 - /* skipping 3 traces, seems to get us at the caller of this function */ 1678 - __ftrace_trace_stack(global_trace.buffer, flags, 3, preempt_count(), NULL); 1451 + /* 1452 + * Skip 3 more, seems to get us at the caller of 1453 + * this function. 1454 + */ 1455 + skip += 3; 1456 + __ftrace_trace_stack(global_trace.trace_buffer.buffer, 1457 + flags, skip, preempt_count(), NULL); 1679 1458 } 1680 1459 1681 1460 static DEFINE_PER_CPU(int, user_stack_count); ··· 1850 1619 * directly here. If the global_trace.buffer is already 1851 1620 * allocated here, then this was called by module code. 1852 1621 */ 1853 - if (global_trace.buffer) 1622 + if (global_trace.trace_buffer.buffer) 1854 1623 tracing_start_cmdline_record(); 1855 1624 } 1856 1625 ··· 1910 1679 1911 1680 local_save_flags(flags); 1912 1681 size = sizeof(*entry) + sizeof(u32) * len; 1913 - buffer = tr->buffer; 1682 + buffer = tr->trace_buffer.buffer; 1914 1683 event = trace_buffer_lock_reserve(buffer, TRACE_BPRINT, size, 1915 1684 flags, pc); 1916 1685 if (!event) ··· 1933 1702 } 1934 1703 EXPORT_SYMBOL_GPL(trace_vbprintk); 1935 1704 1936 - int trace_array_printk(struct trace_array *tr, 1937 - unsigned long ip, const char *fmt, ...) 
1938 - { 1939 - int ret; 1940 - va_list ap; 1941 - 1942 - if (!(trace_flags & TRACE_ITER_PRINTK)) 1943 - return 0; 1944 - 1945 - va_start(ap, fmt); 1946 - ret = trace_array_vprintk(tr, ip, fmt, ap); 1947 - va_end(ap); 1948 - return ret; 1949 - } 1950 - 1951 - int trace_array_vprintk(struct trace_array *tr, 1952 - unsigned long ip, const char *fmt, va_list args) 1705 + static int 1706 + __trace_array_vprintk(struct ring_buffer *buffer, 1707 + unsigned long ip, const char *fmt, va_list args) 1953 1708 { 1954 1709 struct ftrace_event_call *call = &event_print; 1955 1710 struct ring_buffer_event *event; 1956 - struct ring_buffer *buffer; 1957 1711 int len = 0, size, pc; 1958 1712 struct print_entry *entry; 1959 1713 unsigned long flags; ··· 1966 1750 1967 1751 local_save_flags(flags); 1968 1752 size = sizeof(*entry) + len + 1; 1969 - buffer = tr->buffer; 1970 1753 event = trace_buffer_lock_reserve(buffer, TRACE_PRINT, size, 1971 1754 flags, pc); 1972 1755 if (!event) ··· 1984 1769 unpause_graph_tracing(); 1985 1770 1986 1771 return len; 1772 + } 1773 + 1774 + int trace_array_vprintk(struct trace_array *tr, 1775 + unsigned long ip, const char *fmt, va_list args) 1776 + { 1777 + return __trace_array_vprintk(tr->trace_buffer.buffer, ip, fmt, args); 1778 + } 1779 + 1780 + int trace_array_printk(struct trace_array *tr, 1781 + unsigned long ip, const char *fmt, ...) 1782 + { 1783 + int ret; 1784 + va_list ap; 1785 + 1786 + if (!(trace_flags & TRACE_ITER_PRINTK)) 1787 + return 0; 1788 + 1789 + va_start(ap, fmt); 1790 + ret = trace_array_vprintk(tr, ip, fmt, ap); 1791 + va_end(ap); 1792 + return ret; 1793 + } 1794 + 1795 + int trace_array_printk_buf(struct ring_buffer *buffer, 1796 + unsigned long ip, const char *fmt, ...) 
1797 + { 1798 + int ret; 1799 + va_list ap; 1800 + 1801 + if (!(trace_flags & TRACE_ITER_PRINTK)) 1802 + return 0; 1803 + 1804 + va_start(ap, fmt); 1805 + ret = __trace_array_vprintk(buffer, ip, fmt, ap); 1806 + va_end(ap); 1807 + return ret; 1987 1808 } 1988 1809 1989 1810 int trace_vprintk(unsigned long ip, const char *fmt, va_list args) ··· 2047 1796 if (buf_iter) 2048 1797 event = ring_buffer_iter_peek(buf_iter, ts); 2049 1798 else 2050 - event = ring_buffer_peek(iter->tr->buffer, cpu, ts, 1799 + event = ring_buffer_peek(iter->trace_buffer->buffer, cpu, ts, 2051 1800 lost_events); 2052 1801 2053 1802 if (event) { ··· 2062 1811 __find_next_entry(struct trace_iterator *iter, int *ent_cpu, 2063 1812 unsigned long *missing_events, u64 *ent_ts) 2064 1813 { 2065 - struct ring_buffer *buffer = iter->tr->buffer; 1814 + struct ring_buffer *buffer = iter->trace_buffer->buffer; 2066 1815 struct trace_entry *ent, *next = NULL; 2067 1816 unsigned long lost_events = 0, next_lost = 0; 2068 1817 int cpu_file = iter->cpu_file; ··· 2075 1824 * If we are in a per_cpu trace file, don't bother by iterating over 2076 1825 * all cpu and peek directly. 
2077 1826 */ 2078 - if (cpu_file > TRACE_PIPE_ALL_CPU) { 1827 + if (cpu_file > RING_BUFFER_ALL_CPUS) { 2079 1828 if (ring_buffer_empty_cpu(buffer, cpu_file)) 2080 1829 return NULL; 2081 1830 ent = peek_next_entry(iter, cpu_file, ent_ts, missing_events); ··· 2139 1888 2140 1889 static void trace_consume(struct trace_iterator *iter) 2141 1890 { 2142 - ring_buffer_consume(iter->tr->buffer, iter->cpu, &iter->ts, 1891 + ring_buffer_consume(iter->trace_buffer->buffer, iter->cpu, &iter->ts, 2143 1892 &iter->lost_events); 2144 1893 } 2145 1894 ··· 2172 1921 2173 1922 void tracing_iter_reset(struct trace_iterator *iter, int cpu) 2174 1923 { 2175 - struct trace_array *tr = iter->tr; 2176 1924 struct ring_buffer_event *event; 2177 1925 struct ring_buffer_iter *buf_iter; 2178 1926 unsigned long entries = 0; 2179 1927 u64 ts; 2180 1928 2181 - tr->data[cpu]->skipped_entries = 0; 1929 + per_cpu_ptr(iter->trace_buffer->data, cpu)->skipped_entries = 0; 2182 1930 2183 1931 buf_iter = trace_buffer_iter(iter, cpu); 2184 1932 if (!buf_iter) ··· 2191 1941 * by the timestamp being before the start of the buffer. 2192 1942 */ 2193 1943 while ((event = ring_buffer_iter_peek(buf_iter, &ts))) { 2194 - if (ts >= iter->tr->time_start) 1944 + if (ts >= iter->trace_buffer->time_start) 2195 1945 break; 2196 1946 entries++; 2197 1947 ring_buffer_read(buf_iter, NULL); 2198 1948 } 2199 1949 2200 - tr->data[cpu]->skipped_entries = entries; 1950 + per_cpu_ptr(iter->trace_buffer->data, cpu)->skipped_entries = entries; 2201 1951 } 2202 1952 2203 1953 /* ··· 2207 1957 static void *s_start(struct seq_file *m, loff_t *pos) 2208 1958 { 2209 1959 struct trace_iterator *iter = m->private; 1960 + struct trace_array *tr = iter->tr; 2210 1961 int cpu_file = iter->cpu_file; 2211 1962 void *p = NULL; 2212 1963 loff_t l = 0; ··· 2220 1969 * will point to the same string as current_trace->name. 
2221 1970 */ 2222 1971 mutex_lock(&trace_types_lock); 2223 - if (unlikely(current_trace && iter->trace->name != current_trace->name)) 2224 - *iter->trace = *current_trace; 1972 + if (unlikely(tr->current_trace && iter->trace->name != tr->current_trace->name)) 1973 + *iter->trace = *tr->current_trace; 2225 1974 mutex_unlock(&trace_types_lock); 2226 1975 1976 + #ifdef CONFIG_TRACER_MAX_TRACE 2227 1977 if (iter->snapshot && iter->trace->use_max_tr) 2228 1978 return ERR_PTR(-EBUSY); 1979 + #endif 2229 1980 2230 1981 if (!iter->snapshot) 2231 1982 atomic_inc(&trace_record_cmdline_disabled); ··· 2237 1984 iter->cpu = 0; 2238 1985 iter->idx = -1; 2239 1986 2240 - if (cpu_file == TRACE_PIPE_ALL_CPU) { 1987 + if (cpu_file == RING_BUFFER_ALL_CPUS) { 2241 1988 for_each_tracing_cpu(cpu) 2242 1989 tracing_iter_reset(iter, cpu); 2243 1990 } else ··· 2269 2016 { 2270 2017 struct trace_iterator *iter = m->private; 2271 2018 2019 + #ifdef CONFIG_TRACER_MAX_TRACE 2272 2020 if (iter->snapshot && iter->trace->use_max_tr) 2273 2021 return; 2022 + #endif 2274 2023 2275 2024 if (!iter->snapshot) 2276 2025 atomic_dec(&trace_record_cmdline_disabled); 2026 + 2277 2027 trace_access_unlock(iter->cpu_file); 2278 2028 trace_event_read_unlock(); 2279 2029 } 2280 2030 2281 2031 static void 2282 - get_total_entries(struct trace_array *tr, unsigned long *total, unsigned long *entries) 2032 + get_total_entries(struct trace_buffer *buf, 2033 + unsigned long *total, unsigned long *entries) 2283 2034 { 2284 2035 unsigned long count; 2285 2036 int cpu; ··· 2292 2035 *entries = 0; 2293 2036 2294 2037 for_each_tracing_cpu(cpu) { 2295 - count = ring_buffer_entries_cpu(tr->buffer, cpu); 2038 + count = ring_buffer_entries_cpu(buf->buffer, cpu); 2296 2039 /* 2297 2040 * If this buffer has skipped entries, then we hold all 2298 2041 * entries for the trace and we need to ignore the 2299 2042 * ones before the time stamp. 
 		 */
-		if (tr->data[cpu]->skipped_entries) {
-			count -= tr->data[cpu]->skipped_entries;
+		if (per_cpu_ptr(buf->data, cpu)->skipped_entries) {
+			count -= per_cpu_ptr(buf->data, cpu)->skipped_entries;
 			/* total is the same as the entries */
 			*total += count;
 		} else
 			*total += count +
-				ring_buffer_overrun_cpu(tr->buffer, cpu);
+				ring_buffer_overrun_cpu(buf->buffer, cpu);
 		*entries += count;
 	}
 }
···
 	seq_puts(m, "# \\ / ||||| \\ | / \n");
 }
 
-static void print_event_info(struct trace_array *tr, struct seq_file *m)
+static void print_event_info(struct trace_buffer *buf, struct seq_file *m)
 {
 	unsigned long total;
 	unsigned long entries;
 
-	get_total_entries(tr, &total, &entries);
+	get_total_entries(buf, &total, &entries);
 	seq_printf(m, "# entries-in-buffer/entries-written: %lu/%lu #P:%d\n",
 		   entries, total, num_online_cpus());
 	seq_puts(m, "#\n");
 }
 
-static void print_func_help_header(struct trace_array *tr, struct seq_file *m)
+static void print_func_help_header(struct trace_buffer *buf, struct seq_file *m)
 {
-	print_event_info(tr, m);
+	print_event_info(buf, m);
 	seq_puts(m, "# TASK-PID CPU# TIMESTAMP FUNCTION\n");
 	seq_puts(m, "# | | | | |\n");
 }
 
-static void print_func_help_header_irq(struct trace_array *tr, struct seq_file *m)
+static void print_func_help_header_irq(struct trace_buffer *buf, struct seq_file *m)
 {
-	print_event_info(tr, m);
+	print_event_info(buf, m);
 	seq_puts(m, "# _-----=> irqs-off\n");
 	seq_puts(m, "# / _----=> need-resched\n");
 	seq_puts(m, "# | / _---=> hardirq/softirq\n");
···
 print_trace_header(struct seq_file *m, struct trace_iterator *iter)
 {
 	unsigned long sym_flags = (trace_flags & TRACE_ITER_SYM_MASK);
-	struct trace_array *tr = iter->tr;
-	struct trace_array_cpu *data = tr->data[tr->cpu];
-	struct tracer *type = current_trace;
+	struct trace_buffer *buf = iter->trace_buffer;
+	struct trace_array_cpu *data = per_cpu_ptr(buf->data, buf->cpu);
+	struct tracer *type = iter->trace;
 	unsigned long entries;
 	unsigned long total;
 	const char *name = "preemption";
 
 	name = type->name;
 
-	get_total_entries(tr, &total, &entries);
+	get_total_entries(buf, &total, &entries);
 
 	seq_printf(m, "# %s latency trace v1.1.5 on %s\n",
 		   name, UTS_RELEASE);
···
 		   nsecs_to_usecs(data->saved_latency),
 		   entries,
 		   total,
-		   tr->cpu,
+		   buf->cpu,
 #if defined(CONFIG_PREEMPT_NONE)
 		   "server",
 #elif defined(CONFIG_PREEMPT_VOLUNTARY)
···
 	if (cpumask_test_cpu(iter->cpu, iter->started))
 		return;
 
-	if (iter->tr->data[iter->cpu]->skipped_entries)
+	if (per_cpu_ptr(iter->trace_buffer->data, iter->cpu)->skipped_entries)
 		return;
 
 	cpumask_set_cpu(iter->cpu, iter->started);
···
 	int cpu;
 
 	/* If we are looking at one CPU buffer, only check that one */
-	if (iter->cpu_file != TRACE_PIPE_ALL_CPU) {
+	if (iter->cpu_file != RING_BUFFER_ALL_CPUS) {
 		cpu = iter->cpu_file;
 		buf_iter = trace_buffer_iter(iter, cpu);
 		if (buf_iter) {
 			if (!ring_buffer_iter_empty(buf_iter))
 				return 0;
 		} else {
-			if (!ring_buffer_empty_cpu(iter->tr->buffer, cpu))
+			if (!ring_buffer_empty_cpu(iter->trace_buffer->buffer, cpu))
 				return 0;
 		}
 		return 1;
···
 			if (!ring_buffer_iter_empty(buf_iter))
 				return 0;
 		} else {
-			if (!ring_buffer_empty_cpu(iter->tr->buffer, cpu))
+			if (!ring_buffer_empty_cpu(iter->trace_buffer->buffer, cpu))
 				return 0;
 		}
 	}
···
 		if (ret != TRACE_TYPE_UNHANDLED)
 			return ret;
 	}
+
+	if (iter->ent->type == TRACE_BPUTS &&
+	    trace_flags & TRACE_ITER_PRINTK &&
+	    trace_flags & TRACE_ITER_PRINTK_MSGONLY)
+		return trace_print_bputs_msg_only(iter);
 
 	if (iter->ent->type == TRACE_BPRINT &&
 	    trace_flags & TRACE_ITER_PRINTK &&
···
 	} else {
 		if (!(trace_flags & TRACE_ITER_VERBOSE)) {
 			if (trace_flags & TRACE_ITER_IRQ_INFO)
-				print_func_help_header_irq(iter->tr, m);
+				print_func_help_header_irq(iter->trace_buffer, m);
 			else
-				print_func_help_header(iter->tr, m);
+				print_func_help_header(iter->trace_buffer, m);
 		}
 	}
 }
···
 }
 
 #ifdef CONFIG_TRACER_MAX_TRACE
-static void print_snapshot_help(struct seq_file *m, struct trace_iterator *iter)
+static void show_snapshot_main_help(struct seq_file *m)
 {
-	if (iter->trace->allocated_snapshot)
-		seq_printf(m, "#\n# * Snapshot is allocated *\n#\n");
-	else
-		seq_printf(m, "#\n# * Snapshot is freed *\n#\n");
-
-	seq_printf(m, "# Snapshot commands:\n");
 	seq_printf(m, "# echo 0 > snapshot : Clears and frees snapshot buffer\n");
 	seq_printf(m, "# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.\n");
 	seq_printf(m, "# Takes a snapshot of the main buffer.\n");
 	seq_printf(m, "# echo 2 > snapshot : Clears snapshot buffer (but does not allocate)\n");
 	seq_printf(m, "# (Doesn't have to be '2' works with any number that\n");
 	seq_printf(m, "# is not a '0' or '1')\n");
+}
+
+static void show_snapshot_percpu_help(struct seq_file *m)
+{
+	seq_printf(m, "# echo 0 > snapshot : Invalid for per_cpu snapshot file.\n");
+#ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
+	seq_printf(m, "# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.\n");
+	seq_printf(m, "# Takes a snapshot of the main buffer for this cpu.\n");
+#else
+	seq_printf(m, "# echo 1 > snapshot : Not supported with this kernel.\n");
+	seq_printf(m, "# Must use main snapshot file to allocate.\n");
+#endif
+	seq_printf(m, "# echo 2 > snapshot : Clears this cpu's snapshot buffer (but does not allocate)\n");
+	seq_printf(m, "# (Doesn't have to be '2' works with any number that\n");
+	seq_printf(m, "# is not a '0' or '1')\n");
+}
+
+static void print_snapshot_help(struct seq_file *m, struct trace_iterator *iter)
+{
+	if (iter->tr->allocated_snapshot)
+		seq_printf(m, "#\n# * Snapshot is allocated *\n#\n");
+	else
+		seq_printf(m, "#\n# * Snapshot is freed *\n#\n");
+
+	seq_printf(m, "# Snapshot commands:\n");
+	if (iter->cpu_file == RING_BUFFER_ALL_CPUS)
+		show_snapshot_main_help(m);
+	else
+		show_snapshot_percpu_help(m);
 }
 #else
 /* Should never be called */
···
 static struct trace_iterator *
 __tracing_open(struct inode *inode, struct file *file, bool snapshot)
 {
-	long cpu_file = (long) inode->i_private;
+	struct trace_cpu *tc = inode->i_private;
+	struct trace_array *tr = tc->tr;
 	struct trace_iterator *iter;
 	int cpu;
 
···
 	if (!iter->trace)
 		goto fail;
 
-	*iter->trace = *current_trace;
+	*iter->trace = *tr->current_trace;
 
 	if (!zalloc_cpumask_var(&iter->started, GFP_KERNEL))
 		goto fail;
 
-	if (current_trace->print_max || snapshot)
-		iter->tr = &max_tr;
+	iter->tr = tr;
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+	/* Currently only the top directory has a snapshot */
+	if (tr->current_trace->print_max || snapshot)
+		iter->trace_buffer = &tr->max_buffer;
 	else
-		iter->tr = &global_trace;
+#endif
+		iter->trace_buffer = &tr->trace_buffer;
 	iter->snapshot = snapshot;
 	iter->pos = -1;
 	mutex_init(&iter->mutex);
-	iter->cpu_file = cpu_file;
+	iter->cpu_file = tc->cpu;
 
 	/* Notify the tracer early; before we stop tracing. */
 	if (iter->trace && iter->trace->open)
 		iter->trace->open(iter);
 
 	/* Annotate start of buffers if we had overruns */
-	if (ring_buffer_overruns(iter->tr->buffer))
+	if (ring_buffer_overruns(iter->trace_buffer->buffer))
 		iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
 	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
···
 
 	/* stop the trace while dumping if we are not opening "snapshot" */
 	if (!iter->snapshot)
-		tracing_stop();
+		tracing_stop_tr(tr);
 
-	if (iter->cpu_file == TRACE_PIPE_ALL_CPU) {
+	if (iter->cpu_file == RING_BUFFER_ALL_CPUS) {
 		for_each_tracing_cpu(cpu) {
 			iter->buffer_iter[cpu] =
-				ring_buffer_read_prepare(iter->tr->buffer, cpu);
+				ring_buffer_read_prepare(iter->trace_buffer->buffer, cpu);
 		}
 		ring_buffer_read_prepare_sync();
 		for_each_tracing_cpu(cpu) {
···
 	} else {
 		cpu = iter->cpu_file;
 		iter->buffer_iter[cpu] =
-			ring_buffer_read_prepare(iter->tr->buffer, cpu);
+			ring_buffer_read_prepare(iter->trace_buffer->buffer, cpu);
 		ring_buffer_read_prepare_sync();
 		ring_buffer_read_start(iter->buffer_iter[cpu]);
 		tracing_iter_reset(iter, cpu);
 	}
+
+	tr->ref++;
 
 	mutex_unlock(&trace_types_lock);
 
···
 {
 	struct seq_file *m = file->private_data;
 	struct trace_iterator *iter;
+	struct trace_array *tr;
 	int cpu;
 
 	if (!(file->f_mode & FMODE_READ))
 		return 0;
 
 	iter = m->private;
+	tr = iter->tr;
 
 	mutex_lock(&trace_types_lock);
+
+	WARN_ON(!tr->ref);
+	tr->ref--;
+
 	for_each_tracing_cpu(cpu) {
 		if (iter->buffer_iter[cpu])
 			ring_buffer_read_finish(iter->buffer_iter[cpu]);
···
 
 	if (!iter->snapshot)
 		/* reenable tracing if it was previously enabled */
-		tracing_start();
+		tracing_start_tr(tr);
 	mutex_unlock(&trace_types_lock);
 
 	mutex_destroy(&iter->mutex);
···
 	/* If this file was open for write, then erase contents */
 	if ((file->f_mode & FMODE_WRITE) &&
 	    (file->f_flags & O_TRUNC)) {
-		long cpu = (long) inode->i_private;
+		struct trace_cpu *tc = inode->i_private;
+		struct trace_array *tr = tc->tr;
 
-		if (cpu == TRACE_PIPE_ALL_CPU)
-			tracing_reset_online_cpus(&global_trace);
+		if (tc->cpu == RING_BUFFER_ALL_CPUS)
+			tracing_reset_online_cpus(&tr->trace_buffer);
 		else
-			tracing_reset(&global_trace, cpu);
+			tracing_reset(&tr->trace_buffer, tc->cpu);
 	}
 
 	if (file->f_mode & FMODE_READ) {
···
 tracing_cpumask_write(struct file *filp, const char __user *ubuf,
 		      size_t count, loff_t *ppos)
 {
-	int err, cpu;
+	struct trace_array *tr = filp->private_data;
 	cpumask_var_t tracing_cpumask_new;
+	int err, cpu;
 
 	if (!alloc_cpumask_var(&tracing_cpumask_new, GFP_KERNEL))
 		return -ENOMEM;
···
 		 */
 		if (cpumask_test_cpu(cpu, tracing_cpumask) &&
 		    !cpumask_test_cpu(cpu, tracing_cpumask_new)) {
-			atomic_inc(&global_trace.data[cpu]->disabled);
-			ring_buffer_record_disable_cpu(global_trace.buffer, cpu);
+			atomic_inc(&per_cpu_ptr(tr->trace_buffer.data, cpu)->disabled);
+			ring_buffer_record_disable_cpu(tr->trace_buffer.buffer, cpu);
 		}
 		if (!cpumask_test_cpu(cpu, tracing_cpumask) &&
 		    cpumask_test_cpu(cpu, tracing_cpumask_new)) {
-			atomic_dec(&global_trace.data[cpu]->disabled);
-			ring_buffer_record_enable_cpu(global_trace.buffer, cpu);
+			atomic_dec(&per_cpu_ptr(tr->trace_buffer.data, cpu)->disabled);
+			ring_buffer_record_enable_cpu(tr->trace_buffer.buffer, cpu);
 		}
 	}
 	arch_spin_unlock(&ftrace_max_lock);
···
 static int tracing_trace_options_show(struct seq_file *m, void *v)
 {
 	struct tracer_opt *trace_opts;
+	struct trace_array *tr = m->private;
 	u32 tracer_flags;
 	int i;
 
 	mutex_lock(&trace_types_lock);
-	tracer_flags = current_trace->flags->val;
-	trace_opts = current_trace->flags->opts;
+	tracer_flags = tr->current_trace->flags->val;
+	trace_opts = tr->current_trace->flags->opts;
 
 	for (i = 0; trace_options[i]; i++) {
 		if (trace_flags & (1 << i))
···
 	return 0;
 }
 
-int set_tracer_flag(unsigned int mask, int enabled)
+int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
 {
 	/* do nothing if flag is already set */
 	if (!!(trace_flags & mask) == !!enabled)
 		return 0;
 
 	/* Give the tracer a chance to approve the change */
-	if (current_trace->flag_changed)
-		if (current_trace->flag_changed(current_trace, mask, !!enabled))
+	if (tr->current_trace->flag_changed)
+		if (tr->current_trace->flag_changed(tr->current_trace, mask, !!enabled))
 			return -EINVAL;
 
 	if (enabled)
···
 		trace_event_enable_cmd_record(enabled);
 
 	if (mask == TRACE_ITER_OVERWRITE) {
-		ring_buffer_change_overwrite(global_trace.buffer, enabled);
+		ring_buffer_change_overwrite(tr->trace_buffer.buffer, enabled);
 #ifdef CONFIG_TRACER_MAX_TRACE
-		ring_buffer_change_overwrite(max_tr.buffer, enabled);
+		ring_buffer_change_overwrite(tr->max_buffer.buffer, enabled);
 #endif
 	}
 
···
 	return 0;
 }
 
-static int trace_set_options(char *option)
+static int trace_set_options(struct trace_array *tr, char *option)
 {
 	char *cmp;
 	int neg = 0;
···
 
 	for (i = 0; trace_options[i]; i++) {
 		if (strcmp(cmp, trace_options[i]) == 0) {
-			ret = set_tracer_flag(1 << i, !neg);
+			ret = set_tracer_flag(tr, 1 << i, !neg);
 			break;
 		}
 	}
 
 	/* If no option could be set, test the specific tracer options */
 	if (!trace_options[i])
-		ret = set_tracer_option(current_trace, cmp, neg);
+		ret = set_tracer_option(tr->current_trace, cmp, neg);
 
 	mutex_unlock(&trace_types_lock);
 
···
 tracing_trace_options_write(struct file *filp, const char __user *ubuf,
 			    size_t cnt, loff_t *ppos)
 {
+	struct seq_file *m = filp->private_data;
+	struct trace_array *tr = m->private;
 	char buf[64];
 	int ret;
 
···
 
 	buf[cnt] = 0;
 
-	ret = trace_set_options(buf);
+	ret = trace_set_options(tr, buf);
 	if (ret < 0)
 		return ret;
 
···
 {
 	if (tracing_disabled)
 		return -ENODEV;
-	return single_open(file, tracing_trace_options_show, NULL);
+
+	return single_open(file, tracing_trace_options_show, inode->i_private);
 }
 
 static const struct file_operations tracing_iter_fops = {
···
 
 static const char readme_msg[] =
 	"tracing mini-HOWTO:\n\n"
-	"# mount -t debugfs nodev /sys/kernel/debug\n\n"
-	"# cat /sys/kernel/debug/tracing/available_tracers\n"
-	"wakeup wakeup_rt preemptirqsoff preemptoff irqsoff function nop\n\n"
-	"# cat /sys/kernel/debug/tracing/current_tracer\n"
-	"nop\n"
-	"# echo wakeup > /sys/kernel/debug/tracing/current_tracer\n"
-	"# cat /sys/kernel/debug/tracing/current_tracer\n"
-	"wakeup\n"
-	"# cat /sys/kernel/debug/tracing/trace_options\n"
-	"noprint-parent nosym-offset nosym-addr noverbose\n"
-	"# echo print-parent > /sys/kernel/debug/tracing/trace_options\n"
-	"# echo 1 > /sys/kernel/debug/tracing/tracing_on\n"
-	"# cat /sys/kernel/debug/tracing/trace > /tmp/trace.txt\n"
-	"# echo 0 > /sys/kernel/debug/tracing/tracing_on\n"
+	"# echo 0 > tracing_on : quick way to disable tracing\n"
+	"# echo 1 > tracing_on : quick way to re-enable tracing\n\n"
+	" Important files:\n"
+	" trace\t\t\t- The static contents of the buffer\n"
+	"\t\t\t To clear the buffer write into this file: echo > trace\n"
+	" trace_pipe\t\t- A consuming read to see the contents of the buffer\n"
+	" current_tracer\t- function and latency tracers\n"
+	" available_tracers\t- list of configured tracers for current_tracer\n"
+	" buffer_size_kb\t- view and modify size of per cpu buffer\n"
+	" buffer_total_size_kb - view total size of all cpu buffers\n\n"
+	" trace_clock\t\t-change the clock used to order events\n"
+	" local: Per cpu clock but may not be synced across CPUs\n"
+	" global: Synced across CPUs but slows tracing down.\n"
+	" counter: Not a clock, but just an increment\n"
+	" uptime: Jiffy counter from time of boot\n"
+	" perf: Same clock that perf events use\n"
+#ifdef CONFIG_X86_64
+	" x86-tsc: TSC cycle counter\n"
+#endif
+	"\n trace_marker\t\t- Writes into this file writes into the kernel buffer\n"
+	" tracing_cpumask\t- Limit which CPUs to trace\n"
+	" instances\t\t- Make sub-buffers with: mkdir instances/foo\n"
+	"\t\t\t Remove sub-buffer with rmdir\n"
+	" trace_options\t\t- Set format or modify how tracing happens\n"
+	"\t\t\t Disable an option by adding a suffix 'no' to the option name\n"
+#ifdef CONFIG_DYNAMIC_FTRACE
+	"\n available_filter_functions - list of functions that can be filtered on\n"
+	" set_ftrace_filter\t- echo function name in here to only trace these functions\n"
+	" accepts: func_full_name, *func_end, func_begin*, *func_middle*\n"
+	" modules: Can select a group via module\n"
+	" Format: :mod:<module-name>\n"
+	" example: echo :mod:ext3 > set_ftrace_filter\n"
+	" triggers: a command to perform when function is hit\n"
+	" Format: <function>:<trigger>[:count]\n"
+	" trigger: traceon, traceoff\n"
+	" enable_event:<system>:<event>\n"
+	" disable_event:<system>:<event>\n"
+#ifdef CONFIG_STACKTRACE
+	" stacktrace\n"
+#endif
+#ifdef CONFIG_TRACER_SNAPSHOT
+	" snapshot\n"
+#endif
+	" example: echo do_fault:traceoff > set_ftrace_filter\n"
+	" echo do_trap:traceoff:3 > set_ftrace_filter\n"
+	" The first one will disable tracing every time do_fault is hit\n"
+	" The second will disable tracing at most 3 times when do_trap is hit\n"
+	" The first time do trap is hit and it disables tracing, the counter\n"
+	" will decrement to 2. If tracing is already disabled, the counter\n"
+	" will not decrement. It only decrements when the trigger did work\n"
+	" To remove trigger without count:\n"
+	" echo '!<function>:<trigger> > set_ftrace_filter\n"
+	" To remove trigger with a count:\n"
+	" echo '!<function>:<trigger>:0 > set_ftrace_filter\n"
+	" set_ftrace_notrace\t- echo function name in here to never trace.\n"
+	" accepts: func_full_name, *func_end, func_begin*, *func_middle*\n"
+	" modules: Can select a group via module command :mod:\n"
+	" Does not accept triggers\n"
+#endif /* CONFIG_DYNAMIC_FTRACE */
+#ifdef CONFIG_FUNCTION_TRACER
+	" set_ftrace_pid\t- Write pid(s) to only function trace those pids (function)\n"
+#endif
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+	" set_graph_function\t- Trace the nested calls of a function (function_graph)\n"
+	" max_graph_depth\t- Trace a limited depth of nested calls (0 is unlimited)\n"
+#endif
+#ifdef CONFIG_TRACER_SNAPSHOT
+	"\n snapshot\t\t- Like 'trace' but shows the content of the static snapshot buffer\n"
+	"\t\t\t Read the contents for more information\n"
+#endif
+#ifdef CONFIG_STACKTRACE
+	" stack_trace\t\t- Shows the max stack trace when active\n"
+	" stack_max_size\t- Shows current max stack size that was traced\n"
+	"\t\t\t Write into this file to reset the max size (trigger a new trace)\n"
+#ifdef CONFIG_DYNAMIC_FTRACE
+	" stack_trace_filter\t- Like set_ftrace_filter but limits what stack_trace traces\n"
+#endif
+#endif /* CONFIG_STACKTRACE */
 ;
 
 static ssize_t
···
 tracing_set_trace_read(struct file *filp, char __user *ubuf,
 		       size_t cnt, loff_t *ppos)
 {
+	struct trace_array *tr = filp->private_data;
 	char buf[MAX_TRACER_SIZE+2];
 	int r;
 
 	mutex_lock(&trace_types_lock);
-	r = sprintf(buf, "%s\n", current_trace->name);
+	r = sprintf(buf, "%s\n", tr->current_trace->name);
 	mutex_unlock(&trace_types_lock);
 
 	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
···
 
 int tracer_init(struct tracer *t, struct trace_array *tr)
 {
-	tracing_reset_online_cpus(tr);
+	tracing_reset_online_cpus(&tr->trace_buffer);
 	return t->init(tr);
 }
 
-static void set_buffer_entries(struct trace_array *tr, unsigned long val)
+static void set_buffer_entries(struct trace_buffer *buf, unsigned long val)
 {
 	int cpu;
+
 	for_each_tracing_cpu(cpu)
-		tr->data[cpu]->entries = val;
+		per_cpu_ptr(buf->data, cpu)->entries = val;
 }
 
+#ifdef CONFIG_TRACER_MAX_TRACE
 /* resize @tr's buffer to the size of @size_tr's entries */
-static int resize_buffer_duplicate_size(struct trace_array *tr,
-					struct trace_array *size_tr, int cpu_id)
+static int resize_buffer_duplicate_size(struct trace_buffer *trace_buf,
+					struct trace_buffer *size_buf, int cpu_id)
 {
 	int cpu, ret = 0;
 
 	if (cpu_id == RING_BUFFER_ALL_CPUS) {
 		for_each_tracing_cpu(cpu) {
-			ret = ring_buffer_resize(tr->buffer,
-					size_tr->data[cpu]->entries, cpu);
+			ret = ring_buffer_resize(trace_buf->buffer,
+				 per_cpu_ptr(size_buf->data, cpu)->entries, cpu);
 			if (ret < 0)
 				break;
-			tr->data[cpu]->entries = size_tr->data[cpu]->entries;
+			per_cpu_ptr(trace_buf->data, cpu)->entries =
+				per_cpu_ptr(size_buf->data, cpu)->entries;
 		}
 	} else {
-		ret = ring_buffer_resize(tr->buffer,
-					size_tr->data[cpu_id]->entries, cpu_id);
+		ret = ring_buffer_resize(trace_buf->buffer,
+				 per_cpu_ptr(size_buf->data, cpu_id)->entries, cpu_id);
 		if (ret == 0)
-			tr->data[cpu_id]->entries =
-				size_tr->data[cpu_id]->entries;
+			per_cpu_ptr(trace_buf->data, cpu_id)->entries =
+				per_cpu_ptr(size_buf->data, cpu_id)->entries;
 	}
 
 	return ret;
 }
+#endif /* CONFIG_TRACER_MAX_TRACE */
 
-static int __tracing_resize_ring_buffer(unsigned long size, int cpu)
+static int __tracing_resize_ring_buffer(struct trace_array *tr,
+					unsigned long size, int cpu)
 {
 	int ret;
 
···
 	 * we use the size that was given, and we can forget about
 	 * expanding it later.
 	 */
-	ring_buffer_expanded = 1;
+	ring_buffer_expanded = true;
 
 	/* May be called before buffers are initialized */
-	if (!global_trace.buffer)
+	if (!tr->trace_buffer.buffer)
 		return 0;
 
-	ret = ring_buffer_resize(global_trace.buffer, size, cpu);
+	ret = ring_buffer_resize(tr->trace_buffer.buffer, size, cpu);
 	if (ret < 0)
 		return ret;
 
-	if (!current_trace->use_max_tr)
+#ifdef CONFIG_TRACER_MAX_TRACE
+	if (!(tr->flags & TRACE_ARRAY_FL_GLOBAL) ||
+	    !tr->current_trace->use_max_tr)
 		goto out;
 
-	ret = ring_buffer_resize(max_tr.buffer, size, cpu);
+	ret = ring_buffer_resize(tr->max_buffer.buffer, size, cpu);
 	if (ret < 0) {
-		int r = resize_buffer_duplicate_size(&global_trace,
-						     &global_trace, cpu);
+		int r = resize_buffer_duplicate_size(&tr->trace_buffer,
+						     &tr->trace_buffer, cpu);
 		if (r < 0) {
 			/*
 			 * AARGH! We are left with different
···
 	}
 
 	if (cpu == RING_BUFFER_ALL_CPUS)
-		set_buffer_entries(&max_tr, size);
+		set_buffer_entries(&tr->max_buffer, size);
 	else
-		max_tr.data[cpu]->entries = size;
+		per_cpu_ptr(tr->max_buffer.data, cpu)->entries = size;
 
 out:
+#endif /* CONFIG_TRACER_MAX_TRACE */
+
 	if (cpu == RING_BUFFER_ALL_CPUS)
-		set_buffer_entries(&global_trace, size);
+		set_buffer_entries(&tr->trace_buffer, size);
 	else
-		global_trace.data[cpu]->entries = size;
+		per_cpu_ptr(tr->trace_buffer.data, cpu)->entries = size;
 
 	return ret;
 }
 
-static ssize_t tracing_resize_ring_buffer(unsigned long size, int cpu_id)
+static ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
+					  unsigned long size, int cpu_id)
 {
 	int ret = size;
 
···
 		}
 	}
 
-	ret = __tracing_resize_ring_buffer(size, cpu_id);
+	ret = __tracing_resize_ring_buffer(tr, size, cpu_id);
 	if (ret < 0)
 		ret = -ENOMEM;
 
···
 
 	mutex_lock(&trace_types_lock);
 	if (!ring_buffer_expanded)
-		ret = __tracing_resize_ring_buffer(trace_buf_size,
+		ret = __tracing_resize_ring_buffer(&global_trace, trace_buf_size,
 						RING_BUFFER_ALL_CPUS);
 	mutex_unlock(&trace_types_lock);
 
···
 struct trace_option_dentry;
 
 static struct trace_option_dentry *
-create_trace_option_files(struct tracer *tracer);
+create_trace_option_files(struct trace_array *tr, struct tracer *tracer);
 
 static void
 destroy_trace_option_files(struct trace_option_dentry *topts);
···
 	static struct trace_option_dentry *topts;
 	struct trace_array *tr = &global_trace;
 	struct tracer *t;
+#ifdef CONFIG_TRACER_MAX_TRACE
 	bool had_max_tr;
+#endif
 	int ret = 0;
 
 	mutex_lock(&trace_types_lock);
 
 	if (!ring_buffer_expanded) {
-		ret = __tracing_resize_ring_buffer(trace_buf_size,
+		ret = __tracing_resize_ring_buffer(tr, trace_buf_size,
 						RING_BUFFER_ALL_CPUS);
 		if (ret < 0)
 			goto out;
···
 		ret = -EINVAL;
 		goto out;
 	}
-	if (t == current_trace)
+	if (t == tr->current_trace)
 		goto out;
 
 	trace_branch_disable();
 
-	current_trace->enabled = false;
+	tr->current_trace->enabled = false;
 
-	if (current_trace->reset)
-		current_trace->reset(tr);
+	if (tr->current_trace->reset)
+		tr->current_trace->reset(tr);
 
-	had_max_tr = current_trace->allocated_snapshot;
-	current_trace = &nop_trace;
+	/* Current trace needs to be nop_trace before synchronize_sched */
+	tr->current_trace = &nop_trace;
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+	had_max_tr = tr->allocated_snapshot;
 
 	if (had_max_tr && !t->use_max_tr) {
 		/*
···
 		 * so a synchronized_sched() is sufficient.
 		 */
 		synchronize_sched();
-		/*
-		 * We don't free the ring buffer. instead, resize it because
-		 * The max_tr ring buffer has some state (e.g. ring->clock) and
-		 * we want preserve it.
-		 */
-		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
-		set_buffer_entries(&max_tr, 1);
-		tracing_reset_online_cpus(&max_tr);
-		current_trace->allocated_snapshot = false;
+		free_snapshot(tr);
 	}
+#endif
 	destroy_trace_option_files(topts);
 
-	topts = create_trace_option_files(t);
+	topts = create_trace_option_files(tr, t);
+
+#ifdef CONFIG_TRACER_MAX_TRACE
 	if (t->use_max_tr && !had_max_tr) {
-		/* we need to make per cpu buffer sizes equivalent */
-		ret = resize_buffer_duplicate_size(&max_tr, &global_trace,
-						   RING_BUFFER_ALL_CPUS);
+		ret = alloc_snapshot(tr);
 		if (ret < 0)
 			goto out;
-		t->allocated_snapshot = true;
 	}
+#endif
 
 	if (t->init) {
 		ret = tracer_init(t, tr);
···
 		goto out;
 	}
 
-	current_trace = t;
-	current_trace->enabled = true;
+	tr->current_trace = t;
+	tr->current_trace->enabled = true;
 	trace_branch_enable(tr);
 out:
 	mutex_unlock(&trace_types_lock);
···
 
 static int tracing_open_pipe(struct inode *inode, struct file *filp)
 {
-	long cpu_file = (long) inode->i_private;
+	struct trace_cpu *tc = inode->i_private;
+	struct trace_array *tr = tc->tr;
 	struct trace_iterator *iter;
 	int ret = 0;
 
···
 		ret = -ENOMEM;
 		goto fail;
 	}
-	*iter->trace = *current_trace;
+	*iter->trace = *tr->current_trace;
 
 	if (!alloc_cpumask_var(&iter->started, GFP_KERNEL)) {
 		ret = -ENOMEM;
···
 	if (trace_clocks[trace_clock_id].in_ns)
 		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
 
-	iter->cpu_file = cpu_file;
-	iter->tr = &global_trace;
+	iter->cpu_file = tc->cpu;
+	iter->tr = tc->tr;
+	iter->trace_buffer = &tc->tr->trace_buffer;
 	mutex_init(&iter->mutex);
 	filp->private_data = iter;
 
···
 }
 
 static unsigned int
-tracing_poll_pipe(struct file *filp, poll_table *poll_table)
+trace_poll(struct trace_iterator *iter, struct file *filp, poll_table *poll_table)
 {
-	struct trace_iterator *iter = filp->private_data;
+	/* Iterators are static, they should be filled or empty */
+	if (trace_buffer_iter(iter, iter->cpu_file))
+		return POLLIN | POLLRDNORM;
 
-	if (trace_flags & TRACE_ITER_BLOCK) {
+	if (trace_flags & TRACE_ITER_BLOCK)
 		/*
 		 * Always select as readable when in blocking mode
 		 */
 		return POLLIN | POLLRDNORM;
-	} else {
-		if (!trace_empty(iter))
-			return POLLIN | POLLRDNORM;
-		poll_wait(filp, &trace_wait, poll_table);
-		if (!trace_empty(iter))
-			return POLLIN | POLLRDNORM;
+	else
+		return ring_buffer_poll_wait(iter->trace_buffer->buffer, iter->cpu_file,
+					     filp, poll_table);
+}
 
-		return 0;
-	}
+static unsigned int
+tracing_poll_pipe(struct file *filp, poll_table *poll_table)
+{
+	struct trace_iterator *iter = filp->private_data;
+
+	return trace_poll(iter, filp, poll_table);
 }
 
 /*
···
 		  size_t cnt, loff_t *ppos)
 {
 	struct trace_iterator *iter = filp->private_data;
+	struct trace_array *tr = iter->tr;
 	ssize_t sret;
 
 	/* return any leftover data */
···
 
 	/* copy the tracer to avoid using a global lock all around */
 	mutex_lock(&trace_types_lock);
-	if (unlikely(iter->trace->name != current_trace->name))
-		*iter->trace = *current_trace;
+	if (unlikely(iter->trace->name != tr->current_trace->name))
+		*iter->trace = *tr->current_trace;
 	mutex_unlock(&trace_types_lock);
 
 	/*
···
 		.ops		= &tracing_pipe_buf_ops,
 		.spd_release	= tracing_spd_release_pipe,
 	};
+	struct trace_array *tr = iter->tr;
 	ssize_t ret;
 	size_t rem;
 	unsigned int i;
···
 
 	/* copy the tracer to avoid using a global lock all around */
 	mutex_lock(&trace_types_lock);
-	if (unlikely(iter->trace->name != current_trace->name))
-		*iter->trace = *current_trace;
+	if (unlikely(iter->trace->name != tr->current_trace->name))
+		*iter->trace = *tr->current_trace;
 	mutex_unlock(&trace_types_lock);
 
 	mutex_lock(&iter->mutex);
···
 		goto out;
 	}
 
-struct ftrace_entries_info {
-	struct trace_array	*tr;
-	int			cpu;
-};
-
-static int tracing_entries_open(struct inode *inode, struct file *filp)
-{
-	struct ftrace_entries_info *info;
-
-	if (tracing_disabled)
-		return -ENODEV;
-
-	info = kzalloc(sizeof(*info), GFP_KERNEL);
-	if (!info)
-		return -ENOMEM;
-
-	info->tr = &global_trace;
-	info->cpu = (unsigned long)inode->i_private;
-
-	filp->private_data = info;
-
-	return 0;
-}
-
 static ssize_t
 tracing_entries_read(struct file *filp, char __user *ubuf,
 		     size_t cnt, loff_t *ppos)
 {
-	struct ftrace_entries_info *info = filp->private_data;
-	struct trace_array *tr = info->tr;
+	struct trace_cpu *tc = filp->private_data;
+	struct trace_array *tr = tc->tr;
 	char buf[64];
 	int r = 0;
 	ssize_t ret;
 
 	mutex_lock(&trace_types_lock);
 
-	if (info->cpu == RING_BUFFER_ALL_CPUS) {
+	if (tc->cpu == RING_BUFFER_ALL_CPUS) {
 		int cpu, buf_size_same;
 		unsigned long size;
 
···
 		for_each_tracing_cpu(cpu) {
 			/* fill in the size from first enabled cpu */
 			if (size == 0)
-				size = tr->data[cpu]->entries;
-			if (size != tr->data[cpu]->entries) {
+				size = per_cpu_ptr(tr->trace_buffer.data, cpu)->entries;
+			if (size != per_cpu_ptr(tr->trace_buffer.data, cpu)->entries) {
 				buf_size_same = 0;
 				break;
 			}
···
 		} else
 			r = sprintf(buf, "X\n");
 	} else
-		r = sprintf(buf, "%lu\n", tr->data[info->cpu]->entries >> 10);
+		r = sprintf(buf, "%lu\n", per_cpu_ptr(tr->trace_buffer.data, tc->cpu)->entries >> 10);
 
 	mutex_unlock(&trace_types_lock);
···
 tracing_entries_write(struct file *filp, const char __user *ubuf,
 		      size_t cnt, loff_t *ppos)
 {
-	struct ftrace_entries_info *info = filp->private_data;
+	struct trace_cpu *tc = filp->private_data;
 	unsigned long val;
 	int ret;
···
 	/* value is in KB */
 	val <<= 10;
 
-	ret = tracing_resize_ring_buffer(val, info->cpu);
+	ret = tracing_resize_ring_buffer(tc->tr, val, tc->cpu);
 	if (ret < 0)
 		return ret;
 
 	*ppos += cnt;
 
 	return cnt;
-}
-
-static int
-tracing_entries_release(struct inode *inode, struct file *filp)
-{
-	struct ftrace_entries_info *info = filp->private_data;
-
-	kfree(info);
-
-	return 0;
 }
 
 static ssize_t
···
 
 	mutex_lock(&trace_types_lock);
 	for_each_tracing_cpu(cpu) {
-		size += tr->data[cpu]->entries >> 10;
+		size += per_cpu_ptr(tr->trace_buffer.data, cpu)->entries >> 10;
 		if (!ring_buffer_expanded)
 			expanded_size += trace_buf_size >> 10;
 	}
···
 static int
 tracing_free_buffer_release(struct inode *inode, struct file *filp)
 {
+	struct trace_array *tr = inode->i_private;
+
 	/* disable tracing ?
*/ 4310 3960 if (trace_flags & TRACE_ITER_STOP_ON_FREE) 4311 3961 tracing_off(); 4312 3962 /* resize the ring buffer to 0 */ 4313 - tracing_resize_ring_buffer(0, RING_BUFFER_ALL_CPUS); 3963 + tracing_resize_ring_buffer(tr, 0, RING_BUFFER_ALL_CPUS); 4314 3964 4315 3965 return 0; 4316 3966 } ··· 4381 4027 4382 4028 local_save_flags(irq_flags); 4383 4029 size = sizeof(*entry) + cnt + 2; /* possible \n added */ 4384 - buffer = global_trace.buffer; 4030 + buffer = global_trace.trace_buffer.buffer; 4385 4031 event = trace_buffer_lock_reserve(buffer, TRACE_PRINT, size, 4386 4032 irq_flags, preempt_count()); 4387 4033 if (!event) { ··· 4423 4069 4424 4070 static int tracing_clock_show(struct seq_file *m, void *v) 4425 4071 { 4072 + struct trace_array *tr = m->private; 4426 4073 int i; 4427 4074 4428 4075 for (i = 0; i < ARRAY_SIZE(trace_clocks); i++) 4429 4076 seq_printf(m, 4430 4077 "%s%s%s%s", i ? " " : "", 4431 - i == trace_clock_id ? "[" : "", trace_clocks[i].name, 4432 - i == trace_clock_id ? "]" : ""); 4078 + i == tr->clock_id ? "[" : "", trace_clocks[i].name, 4079 + i == tr->clock_id ? 
"]" : ""); 4433 4080 seq_putc(m, '\n'); 4434 4081 4435 4082 return 0; ··· 4439 4084 static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf, 4440 4085 size_t cnt, loff_t *fpos) 4441 4086 { 4087 + struct seq_file *m = filp->private_data; 4088 + struct trace_array *tr = m->private; 4442 4089 char buf[64]; 4443 4090 const char *clockstr; 4444 4091 int i; ··· 4462 4105 if (i == ARRAY_SIZE(trace_clocks)) 4463 4106 return -EINVAL; 4464 4107 4465 - trace_clock_id = i; 4466 - 4467 4108 mutex_lock(&trace_types_lock); 4468 4109 4469 - ring_buffer_set_clock(global_trace.buffer, trace_clocks[i].func); 4470 - if (max_tr.buffer) 4471 - ring_buffer_set_clock(max_tr.buffer, trace_clocks[i].func); 4110 + tr->clock_id = i; 4111 + 4112 + ring_buffer_set_clock(tr->trace_buffer.buffer, trace_clocks[i].func); 4472 4113 4473 4114 /* 4474 4115 * New clock may not be consistent with the previous clock. 4475 4116 * Reset the buffer so that it doesn't have incomparable timestamps. 4476 4117 */ 4477 - tracing_reset_online_cpus(&global_trace); 4478 - tracing_reset_online_cpus(&max_tr); 4118 + tracing_reset_online_cpus(&global_trace.trace_buffer); 4119 + 4120 + #ifdef CONFIG_TRACER_MAX_TRACE 4121 + if (tr->flags & TRACE_ARRAY_FL_GLOBAL && tr->max_buffer.buffer) 4122 + ring_buffer_set_clock(tr->max_buffer.buffer, trace_clocks[i].func); 4123 + tracing_reset_online_cpus(&global_trace.max_buffer); 4124 + #endif 4479 4125 4480 4126 mutex_unlock(&trace_types_lock); 4481 4127 ··· 4491 4131 { 4492 4132 if (tracing_disabled) 4493 4133 return -ENODEV; 4494 - return single_open(file, tracing_clock_show, NULL); 4134 + 4135 + return single_open(file, tracing_clock_show, inode->i_private); 4495 4136 } 4137 + 4138 + struct ftrace_buffer_info { 4139 + struct trace_iterator iter; 4140 + void *spare; 4141 + unsigned int read; 4142 + }; 4496 4143 4497 4144 #ifdef CONFIG_TRACER_SNAPSHOT 4498 4145 static int tracing_snapshot_open(struct inode *inode, struct file *file) 4499 4146 { 4147 + 
struct trace_cpu *tc = inode->i_private; 4500 4148 struct trace_iterator *iter; 4149 + struct seq_file *m; 4501 4150 int ret = 0; 4502 4151 4503 4152 if (file->f_mode & FMODE_READ) { 4504 4153 iter = __tracing_open(inode, file, true); 4505 4154 if (IS_ERR(iter)) 4506 4155 ret = PTR_ERR(iter); 4156 + } else { 4157 + /* Writes still need the seq_file to hold the private data */ 4158 + m = kzalloc(sizeof(*m), GFP_KERNEL); 4159 + if (!m) 4160 + return -ENOMEM; 4161 + iter = kzalloc(sizeof(*iter), GFP_KERNEL); 4162 + if (!iter) { 4163 + kfree(m); 4164 + return -ENOMEM; 4165 + } 4166 + iter->tr = tc->tr; 4167 + iter->trace_buffer = &tc->tr->max_buffer; 4168 + iter->cpu_file = tc->cpu; 4169 + m->private = iter; 4170 + file->private_data = m; 4507 4171 } 4172 + 4508 4173 return ret; 4509 4174 } 4510 4175 ··· 4537 4152 tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt, 4538 4153 loff_t *ppos) 4539 4154 { 4155 + struct seq_file *m = filp->private_data; 4156 + struct trace_iterator *iter = m->private; 4157 + struct trace_array *tr = iter->tr; 4540 4158 unsigned long val; 4541 4159 int ret; 4542 4160 ··· 4553 4165 4554 4166 mutex_lock(&trace_types_lock); 4555 4167 4556 - if (current_trace->use_max_tr) { 4168 + if (tr->current_trace->use_max_tr) { 4557 4169 ret = -EBUSY; 4558 4170 goto out; 4559 4171 } 4560 4172 4561 4173 switch (val) { 4562 4174 case 0: 4563 - if (current_trace->allocated_snapshot) { 4564 - /* free spare buffer */ 4565 - ring_buffer_resize(max_tr.buffer, 1, 4566 - RING_BUFFER_ALL_CPUS); 4567 - set_buffer_entries(&max_tr, 1); 4568 - tracing_reset_online_cpus(&max_tr); 4569 - current_trace->allocated_snapshot = false; 4175 + if (iter->cpu_file != RING_BUFFER_ALL_CPUS) { 4176 + ret = -EINVAL; 4177 + break; 4570 4178 } 4179 + if (tr->allocated_snapshot) 4180 + free_snapshot(tr); 4571 4181 break; 4572 4182 case 1: 4573 - if (!current_trace->allocated_snapshot) { 4574 - /* allocate spare buffer */ 4575 - ret = 
resize_buffer_duplicate_size(&max_tr, 4576 - &global_trace, RING_BUFFER_ALL_CPUS); 4183 + /* Only allow per-cpu swap if the ring buffer supports it */ 4184 + #ifndef CONFIG_RING_BUFFER_ALLOW_SWAP 4185 + if (iter->cpu_file != RING_BUFFER_ALL_CPUS) { 4186 + ret = -EINVAL; 4187 + break; 4188 + } 4189 + #endif 4190 + if (!tr->allocated_snapshot) { 4191 + ret = alloc_snapshot(tr); 4577 4192 if (ret < 0) 4578 4193 break; 4579 - current_trace->allocated_snapshot = true; 4580 4194 } 4581 - 4582 4195 local_irq_disable(); 4583 4196 /* Now, we're going to swap */ 4584 - update_max_tr(&global_trace, current, smp_processor_id()); 4197 + if (iter->cpu_file == RING_BUFFER_ALL_CPUS) 4198 + update_max_tr(tr, current, smp_processor_id()); 4199 + else 4200 + update_max_tr_single(tr, current, iter->cpu_file); 4585 4201 local_irq_enable(); 4586 4202 break; 4587 4203 default: 4588 - if (current_trace->allocated_snapshot) 4589 - tracing_reset_online_cpus(&max_tr); 4204 + if (tr->allocated_snapshot) { 4205 + if (iter->cpu_file == RING_BUFFER_ALL_CPUS) 4206 + tracing_reset_online_cpus(&tr->max_buffer); 4207 + else 4208 + tracing_reset(&tr->max_buffer, iter->cpu_file); 4209 + } 4590 4210 break; 4591 4211 } 4592 4212 ··· 4606 4210 mutex_unlock(&trace_types_lock); 4607 4211 return ret; 4608 4212 } 4213 + 4214 + static int tracing_snapshot_release(struct inode *inode, struct file *file) 4215 + { 4216 + struct seq_file *m = file->private_data; 4217 + 4218 + if (file->f_mode & FMODE_READ) 4219 + return tracing_release(inode, file); 4220 + 4221 + /* If write only, the seq_file is just a stub */ 4222 + if (m) 4223 + kfree(m->private); 4224 + kfree(m); 4225 + 4226 + return 0; 4227 + } 4228 + 4229 + static int tracing_buffers_open(struct inode *inode, struct file *filp); 4230 + static ssize_t tracing_buffers_read(struct file *filp, char __user *ubuf, 4231 + size_t count, loff_t *ppos); 4232 + static int tracing_buffers_release(struct inode *inode, struct file *file); 4233 + static ssize_t 
tracing_buffers_splice_read(struct file *file, loff_t *ppos, 4234 + struct pipe_inode_info *pipe, size_t len, unsigned int flags); 4235 + 4236 + static int snapshot_raw_open(struct inode *inode, struct file *filp) 4237 + { 4238 + struct ftrace_buffer_info *info; 4239 + int ret; 4240 + 4241 + ret = tracing_buffers_open(inode, filp); 4242 + if (ret < 0) 4243 + return ret; 4244 + 4245 + info = filp->private_data; 4246 + 4247 + if (info->iter.trace->use_max_tr) { 4248 + tracing_buffers_release(inode, filp); 4249 + return -EBUSY; 4250 + } 4251 + 4252 + info->iter.snapshot = true; 4253 + info->iter.trace_buffer = &info->iter.tr->max_buffer; 4254 + 4255 + return ret; 4256 + } 4257 + 4609 4258 #endif /* CONFIG_TRACER_SNAPSHOT */ 4610 4259 4611 4260 ··· 4678 4237 }; 4679 4238 4680 4239 static const struct file_operations tracing_entries_fops = { 4681 - .open = tracing_entries_open, 4240 + .open = tracing_open_generic, 4682 4241 .read = tracing_entries_read, 4683 4242 .write = tracing_entries_write, 4684 - .release = tracing_entries_release, 4685 4243 .llseek = generic_file_llseek, 4686 4244 }; 4687 4245 ··· 4715 4275 .read = seq_read, 4716 4276 .write = tracing_snapshot_write, 4717 4277 .llseek = tracing_seek, 4718 - .release = tracing_release, 4278 + .release = tracing_snapshot_release, 4719 4279 }; 4720 - #endif /* CONFIG_TRACER_SNAPSHOT */ 4721 4280 4722 - struct ftrace_buffer_info { 4723 - struct trace_array *tr; 4724 - void *spare; 4725 - int cpu; 4726 - unsigned int read; 4281 + static const struct file_operations snapshot_raw_fops = { 4282 + .open = snapshot_raw_open, 4283 + .read = tracing_buffers_read, 4284 + .release = tracing_buffers_release, 4285 + .splice_read = tracing_buffers_splice_read, 4286 + .llseek = no_llseek, 4727 4287 }; 4288 + 4289 + #endif /* CONFIG_TRACER_SNAPSHOT */ 4728 4290 4729 4291 static int tracing_buffers_open(struct inode *inode, struct file *filp) 4730 4292 { 4731 - int cpu = (int)(long)inode->i_private; 4293 + struct trace_cpu *tc = 
inode->i_private; 4294 + struct trace_array *tr = tc->tr; 4732 4295 struct ftrace_buffer_info *info; 4733 4296 4734 4297 if (tracing_disabled) ··· 4741 4298 if (!info) 4742 4299 return -ENOMEM; 4743 4300 4744 - info->tr = &global_trace; 4745 - info->cpu = cpu; 4746 - info->spare = NULL; 4301 + mutex_lock(&trace_types_lock); 4302 + 4303 + tr->ref++; 4304 + 4305 + info->iter.tr = tr; 4306 + info->iter.cpu_file = tc->cpu; 4307 + info->iter.trace = tr->current_trace; 4308 + info->iter.trace_buffer = &tr->trace_buffer; 4309 + info->spare = NULL; 4747 4310 /* Force reading ring buffer for first read */ 4748 - info->read = (unsigned int)-1; 4311 + info->read = (unsigned int)-1; 4749 4312 4750 4313 filp->private_data = info; 4751 4314 4315 + mutex_unlock(&trace_types_lock); 4316 + 4752 4317 return nonseekable_open(inode, filp); 4318 + } 4319 + 4320 + static unsigned int 4321 + tracing_buffers_poll(struct file *filp, poll_table *poll_table) 4322 + { 4323 + struct ftrace_buffer_info *info = filp->private_data; 4324 + struct trace_iterator *iter = &info->iter; 4325 + 4326 + return trace_poll(iter, filp, poll_table); 4753 4327 } 4754 4328 4755 4329 static ssize_t ··· 4774 4314 size_t count, loff_t *ppos) 4775 4315 { 4776 4316 struct ftrace_buffer_info *info = filp->private_data; 4317 + struct trace_iterator *iter = &info->iter; 4777 4318 ssize_t ret; 4778 - size_t size; 4319 + ssize_t size; 4779 4320 4780 4321 if (!count) 4781 4322 return 0; 4782 4323 4324 + mutex_lock(&trace_types_lock); 4325 + 4326 + #ifdef CONFIG_TRACER_MAX_TRACE 4327 + if (iter->snapshot && iter->tr->current_trace->use_max_tr) { 4328 + size = -EBUSY; 4329 + goto out_unlock; 4330 + } 4331 + #endif 4332 + 4783 4333 if (!info->spare) 4784 - info->spare = ring_buffer_alloc_read_page(info->tr->buffer, info->cpu); 4334 + info->spare = ring_buffer_alloc_read_page(iter->trace_buffer->buffer, 4335 + iter->cpu_file); 4336 + size = -ENOMEM; 4785 4337 if (!info->spare) 4786 - return -ENOMEM; 4338 + goto out_unlock; 
 
 	/* Do we have previous read data to read? */
 	if (info->read < PAGE_SIZE)
 		goto read;
 
-	trace_access_lock(info->cpu);
-	ret = ring_buffer_read_page(info->tr->buffer,
+ again:
+	trace_access_lock(iter->cpu_file);
+	ret = ring_buffer_read_page(iter->trace_buffer->buffer,
 				    &info->spare,
 				    count,
-				    info->cpu, 0);
-	trace_access_unlock(info->cpu);
-	if (ret < 0)
-		return 0;
+				    iter->cpu_file, 0);
+	trace_access_unlock(iter->cpu_file);
+
+	if (ret < 0) {
+		if (trace_empty(iter)) {
+			if ((filp->f_flags & O_NONBLOCK)) {
+				size = -EAGAIN;
+				goto out_unlock;
+			}
+			mutex_unlock(&trace_types_lock);
+			iter->trace->wait_pipe(iter);
+			mutex_lock(&trace_types_lock);
+			if (signal_pending(current)) {
+				size = -EINTR;
+				goto out_unlock;
+			}
+			goto again;
+		}
+		size = 0;
+		goto out_unlock;
+	}
 
 	info->read = 0;
-
- read:
+ read:
 	size = PAGE_SIZE - info->read;
 	if (size > count)
 		size = count;
 
 	ret = copy_to_user(ubuf, info->spare + info->read, size);
-	if (ret == size)
-		return -EFAULT;
+	if (ret == size) {
+		size = -EFAULT;
+		goto out_unlock;
+	}
 	size -= ret;
 
 	*ppos += size;
 	info->read += size;
+
+ out_unlock:
+	mutex_unlock(&trace_types_lock);
 
 	return size;
 }
···
 static int tracing_buffers_release(struct inode *inode, struct file *file)
 {
 	struct ftrace_buffer_info *info = file->private_data;
+	struct trace_iterator *iter = &info->iter;
+
+	mutex_lock(&trace_types_lock);
+
+	WARN_ON(!iter->tr->ref);
+	iter->tr->ref--;
 
 	if (info->spare)
-		ring_buffer_free_read_page(info->tr->buffer, info->spare);
+		ring_buffer_free_read_page(iter->trace_buffer->buffer, info->spare);
 	kfree(info);
+
+	mutex_unlock(&trace_types_lock);
 
 	return 0;
 }
···
 			    unsigned int flags)
 {
 	struct ftrace_buffer_info *info = file->private_data;
+	struct trace_iterator *iter = &info->iter;
 	struct partial_page partial_def[PIPE_DEF_BUFFERS];
 	struct page *pages_def[PIPE_DEF_BUFFERS];
 	struct splice_pipe_desc spd = {
···
 	};
 	struct buffer_ref *ref;
 	int entries, size, i;
-	size_t ret;
+	ssize_t ret;
 
-	if (splice_grow_spd(pipe, &spd))
-		return -ENOMEM;
+	mutex_lock(&trace_types_lock);
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+	if (iter->snapshot && iter->tr->current_trace->use_max_tr) {
+		ret = -EBUSY;
+		goto out;
+	}
+#endif
+
+	if (splice_grow_spd(pipe, &spd)) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	if (*ppos & (PAGE_SIZE - 1)) {
 		ret = -EINVAL;
···
 		len &= PAGE_MASK;
 	}
 
-	trace_access_lock(info->cpu);
-	entries = ring_buffer_entries_cpu(info->tr->buffer, info->cpu);
+ again:
+	trace_access_lock(iter->cpu_file);
+	entries = ring_buffer_entries_cpu(iter->trace_buffer->buffer, iter->cpu_file);
 
 	for (i = 0; i < pipe->buffers && len && entries; i++, len -= PAGE_SIZE) {
 		struct page *page;
···
 			break;
 
 		ref->ref = 1;
-		ref->buffer = info->tr->buffer;
-		ref->page = ring_buffer_alloc_read_page(ref->buffer, info->cpu);
+		ref->buffer = iter->trace_buffer->buffer;
+		ref->page = ring_buffer_alloc_read_page(ref->buffer, iter->cpu_file);
 		if (!ref->page) {
 			kfree(ref);
 			break;
 		}
 
 		r = ring_buffer_read_page(ref->buffer, &ref->page,
-					  len, info->cpu, 1);
+					  len, iter->cpu_file, 1);
 		if (r < 0) {
 			ring_buffer_free_read_page(ref->buffer, ref->page);
 			kfree(ref);
···
 		spd.nr_pages++;
 		*ppos += PAGE_SIZE;
 
-		entries = ring_buffer_entries_cpu(info->tr->buffer, info->cpu);
+		entries = ring_buffer_entries_cpu(iter->trace_buffer->buffer, iter->cpu_file);
 	}
 
-	trace_access_unlock(info->cpu);
+	trace_access_unlock(iter->cpu_file);
 	spd.nr_pages = i;
 
 	/* did we read anything? */
 	if (!spd.nr_pages) {
-		if (flags & SPLICE_F_NONBLOCK)
+		if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK)) {
 			ret = -EAGAIN;
-		else
-			ret = 0;
-		/* TODO: block */
-		goto out;
+			goto out;
+		}
+		mutex_unlock(&trace_types_lock);
+		iter->trace->wait_pipe(iter);
+		mutex_lock(&trace_types_lock);
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			goto out;
+		}
+		goto again;
 	}
 
 	ret = splice_to_pipe(pipe, &spd);
 	splice_shrink_spd(&spd);
 out:
+	mutex_unlock(&trace_types_lock);
+
 	return ret;
 }
 
 static const struct file_operations tracing_buffers_fops = {
 	.open		= tracing_buffers_open,
 	.read		= tracing_buffers_read,
+	.poll		= tracing_buffers_poll,
 	.release	= tracing_buffers_release,
 	.splice_read	= tracing_buffers_splice_read,
 	.llseek		= no_llseek,
···
 tracing_stats_read(struct file *filp, char __user *ubuf,
 		   size_t count, loff_t *ppos)
 {
-	unsigned long cpu = (unsigned long)filp->private_data;
-	struct trace_array *tr = &global_trace;
+	struct trace_cpu *tc = filp->private_data;
+	struct trace_array *tr = tc->tr;
+	struct trace_buffer *trace_buf = &tr->trace_buffer;
 	struct trace_seq *s;
 	unsigned long cnt;
 	unsigned long long t;
 	unsigned long usec_rem;
+	int cpu = tc->cpu;
 
 	s = kmalloc(sizeof(*s), GFP_KERNEL);
 	if (!s)
···
 
 	trace_seq_init(s);
 
-	cnt = ring_buffer_entries_cpu(tr->buffer, cpu);
+	cnt = ring_buffer_entries_cpu(trace_buf->buffer, cpu);
 	trace_seq_printf(s, "entries: %ld\n", cnt);
 
-	cnt = ring_buffer_overrun_cpu(tr->buffer, cpu);
+	cnt = ring_buffer_overrun_cpu(trace_buf->buffer, cpu);
 	trace_seq_printf(s, "overrun: %ld\n", cnt);
 
-	cnt = ring_buffer_commit_overrun_cpu(tr->buffer, cpu);
+	cnt = ring_buffer_commit_overrun_cpu(trace_buf->buffer, cpu);
 	trace_seq_printf(s, "commit overrun: %ld\n", cnt);
 
-	cnt = ring_buffer_bytes_cpu(tr->buffer, cpu);
+	cnt = ring_buffer_bytes_cpu(trace_buf->buffer, cpu);
 	trace_seq_printf(s, "bytes: %ld\n", cnt);
 
 	if (trace_clocks[trace_clock_id].in_ns) {
 		/* local or global for trace_clock */
-		t = ns2usecs(ring_buffer_oldest_event_ts(tr->buffer, cpu));
+		t = ns2usecs(ring_buffer_oldest_event_ts(trace_buf->buffer, cpu));
 		usec_rem = do_div(t, USEC_PER_SEC);
 		trace_seq_printf(s, "oldest event ts: %5llu.%06lu\n",
 				 t, usec_rem);
 
-		t = ns2usecs(ring_buffer_time_stamp(tr->buffer, cpu));
+		t = ns2usecs(ring_buffer_time_stamp(trace_buf->buffer, cpu));
 		usec_rem = do_div(t, USEC_PER_SEC);
 		trace_seq_printf(s, "now ts: %5llu.%06lu\n", t, usec_rem);
 	} else {
 		/* counter or tsc mode for trace_clock */
 		trace_seq_printf(s, "oldest event ts: %llu\n",
-				 ring_buffer_oldest_event_ts(tr->buffer, cpu));
+				 ring_buffer_oldest_event_ts(trace_buf->buffer, cpu));
 
 		trace_seq_printf(s, "now ts: %llu\n",
-				 ring_buffer_time_stamp(tr->buffer, cpu));
+				 ring_buffer_time_stamp(trace_buf->buffer, cpu));
 	}
 
-	cnt = ring_buffer_dropped_events_cpu(tr->buffer, cpu);
+	cnt = ring_buffer_dropped_events_cpu(trace_buf->buffer, cpu);
 	trace_seq_printf(s, "dropped events: %ld\n", cnt);
 
-	cnt = ring_buffer_read_events_cpu(tr->buffer, cpu);
+	cnt = ring_buffer_read_events_cpu(trace_buf->buffer, cpu);
 	trace_seq_printf(s, "read events: %ld\n", cnt);
 
 	count = simple_read_from_buffer(ubuf, count, ppos, s->buffer, s->len);
···
 	.read		= tracing_read_dyn_info,
 	.llseek		= generic_file_llseek,
 };
-#endif
+#endif /* CONFIG_DYNAMIC_FTRACE */
 
-static struct dentry *d_tracer;
-
-struct dentry *tracing_init_dentry(void)
+#if defined(CONFIG_TRACER_SNAPSHOT) && defined(CONFIG_DYNAMIC_FTRACE)
+static void
+ftrace_snapshot(unsigned long ip, unsigned long parent_ip, void **data)
 {
-	static int once;
+	tracing_snapshot();
+}
 
-	if (d_tracer)
-		return d_tracer;
+static void
+ftrace_count_snapshot(unsigned long ip, unsigned long parent_ip, void **data)
+{
+	unsigned long *count = (long *)data;
+
+	if (!*count)
+		return;
+
+	if (*count != -1)
+		(*count)--;
+
+	tracing_snapshot();
+}
+
+static int
+ftrace_snapshot_print(struct seq_file *m, unsigned long ip,
+		      struct ftrace_probe_ops *ops, void *data)
+{
+	long count = (long)data;
+
+	seq_printf(m, "%ps:", (void *)ip);
+
+	seq_printf(m, "snapshot");
+
+	if (count == -1)
+		seq_printf(m, ":unlimited\n");
+	else
+		seq_printf(m, ":count=%ld\n", count);
+
+	return 0;
+}
+
+static struct ftrace_probe_ops snapshot_probe_ops = {
+	.func			= ftrace_snapshot,
+	.print			= ftrace_snapshot_print,
+};
+
+static struct ftrace_probe_ops snapshot_count_probe_ops = {
+	.func			= ftrace_count_snapshot,
+	.print			= ftrace_snapshot_print,
+};
+
+static int
+ftrace_trace_snapshot_callback(struct ftrace_hash *hash,
+			       char *glob, char *cmd, char *param, int enable)
+{
+	struct ftrace_probe_ops *ops;
+	void *count = (void *)-1;
+	char *number;
+	int ret;
+
+	/* hash funcs only work with set_ftrace_filter */
+	if (!enable)
+		return -EINVAL;
+
+	ops = param ? &snapshot_count_probe_ops : &snapshot_probe_ops;
+
+	if (glob[0] == '!') {
+		unregister_ftrace_function_probe_func(glob+1, ops);
+		return 0;
+	}
+
+	if (!param)
+		goto out_reg;
+
+	number = strsep(&param, ":");
+
+	if (!strlen(number))
+		goto out_reg;
+
+	/*
+	 * We use the callback data field (which is a pointer)
+	 * as our counter.
+	 */
+	ret = kstrtoul(number, 0, (unsigned long *)&count);
+	if (ret)
+		return ret;
+
+ out_reg:
+	ret = register_ftrace_function_probe(glob, ops, count);
+
+	if (ret >= 0)
+		alloc_snapshot(&global_trace);
+
+	return ret < 0 ? ret : 0;
+}
+
+static struct ftrace_func_command ftrace_snapshot_cmd = {
+	.name			= "snapshot",
+	.func			= ftrace_trace_snapshot_callback,
+};
+
+static int register_snapshot_cmd(void)
+{
+	return register_ftrace_command(&ftrace_snapshot_cmd);
+}
+#else
+static inline int register_snapshot_cmd(void) { return 0; }
+#endif /* defined(CONFIG_TRACER_SNAPSHOT) && defined(CONFIG_DYNAMIC_FTRACE) */
+
+struct dentry *tracing_init_dentry_tr(struct trace_array *tr)
+{
+	if (tr->dir)
+		return tr->dir;
 
 	if (!debugfs_initialized())
 		return NULL;
 
-	d_tracer = debugfs_create_dir("tracing", NULL);
+	if (tr->flags & TRACE_ARRAY_FL_GLOBAL)
+		tr->dir = debugfs_create_dir("tracing", NULL);
 
-	if (!d_tracer && !once) {
-		once = 1;
-		pr_warning("Could not create debugfs directory 'tracing'\n");
-		return NULL;
-	}
+	if (!tr->dir)
+		pr_warn_once("Could not create debugfs directory 'tracing'\n");
 
-	return d_tracer;
+	return tr->dir;
 }
 
-static struct dentry *d_percpu;
-
-static struct dentry *tracing_dentry_percpu(void)
+struct dentry *tracing_init_dentry(void)
 {
-	static int once;
+	return tracing_init_dentry_tr(&global_trace);
+}
+
+static struct dentry *tracing_dentry_percpu(struct trace_array *tr, int cpu)
+{
 	struct dentry *d_tracer;
 
-	if (d_percpu)
-		return d_percpu;
+	if (tr->percpu_dir)
+		return tr->percpu_dir;
 
-	d_tracer = tracing_init_dentry();
-
+	d_tracer = tracing_init_dentry_tr(tr);
 	if (!d_tracer)
 		return NULL;
 
-	d_percpu = debugfs_create_dir("per_cpu", d_tracer);
+	tr->percpu_dir = debugfs_create_dir("per_cpu", d_tracer);
 
-	if (!d_percpu && !once) {
-		once = 1;
-		pr_warning("Could not create debugfs directory 'per_cpu'\n");
-		return NULL;
-	}
+	WARN_ONCE(!tr->percpu_dir,
+		  "Could not create debugfs directory 'per_cpu/%d'\n", cpu);
 
-	return d_percpu;
+	return tr->percpu_dir;
 }
 
-static void tracing_init_debugfs_percpu(long cpu)
+static void
+tracing_init_debugfs_percpu(struct trace_array *tr, long cpu)
 {
-	struct dentry *d_percpu = tracing_dentry_percpu();
+	struct trace_array_cpu *data = per_cpu_ptr(tr->trace_buffer.data, cpu);
+	struct dentry *d_percpu = tracing_dentry_percpu(tr, cpu);
 	struct dentry *d_cpu;
 	char cpu_dir[30]; /* 30 characters should be more than enough */
···
 
 	/* per cpu trace_pipe */
 	trace_create_file("trace_pipe", 0444, d_cpu,
-			(void *) cpu, &tracing_pipe_fops);
+			(void *)&data->trace_cpu, &tracing_pipe_fops);
 
 	/* per cpu trace */
 	trace_create_file("trace", 0644, d_cpu,
-			(void *) cpu, &tracing_fops);
+			(void *)&data->trace_cpu, &tracing_fops);
 
 	trace_create_file("trace_pipe_raw", 0444, d_cpu,
-			(void *) cpu, &tracing_buffers_fops);
+			(void *)&data->trace_cpu, &tracing_buffers_fops);
 
 	trace_create_file("stats", 0444, d_cpu,
-			(void *) cpu, &tracing_stats_fops);
+			(void *)&data->trace_cpu, &tracing_stats_fops);
 
 	trace_create_file("buffer_size_kb", 0444, d_cpu,
-			(void *) cpu, &tracing_entries_fops);
+			(void *)&data->trace_cpu, &tracing_entries_fops);
+
+#ifdef CONFIG_TRACER_SNAPSHOT
+	trace_create_file("snapshot", 0644, d_cpu,
+			  (void *)&data->trace_cpu, &snapshot_fops);
+
+	trace_create_file("snapshot_raw", 0444, d_cpu,
+			(void *)&data->trace_cpu, &snapshot_raw_fops);
+#endif
 }
 
 #ifdef CONFIG_FTRACE_SELFTEST
···
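The snapshot function-probe hunks above parse an optional `:count` parameter from `set_ftrace_filter` commands like `func:snapshot:5`, storing `-1` for "unlimited", and `ftrace_count_snapshot()` counts that value down on each hit. A minimal userspace sketch of those two pieces of logic (this is illustrative C, not kernel code; `parse_snapshot_count()` and `count_snapshot()` are invented helper names, with `strtol()` standing in for `kstrtoul()`):

```c
#define _GNU_SOURCE		/* for strsep() on glibc */
#include <stdlib.h>
#include <string.h>

/*
 * Sketch of the parameter handling in ftrace_trace_snapshot_callback():
 * no parameter, or an empty one, means "unlimited" (encoded as -1),
 * otherwise the first ':'-separated field is parsed as a number.
 * parse_snapshot_count() is a hypothetical helper, not a kernel API.
 */
static long parse_snapshot_count(char *param)
{
	char *number;

	if (!param)
		return -1;	/* like (void *)-1 in the kernel code */

	number = strsep(&param, ":");
	if (!number || !strlen(number))
		return -1;

	return strtol(number, NULL, 0);
}

/*
 * Mirror of ftrace_count_snapshot(): fire while the counter is non-zero,
 * decrementing unless it is -1 (unlimited). Returns 1 when a snapshot
 * would have been taken, 0 once the budget is exhausted.
 */
static int count_snapshot(long *count)
{
	if (!*count)
		return 0;
	if (*count != -1)
		(*count)--;
	return 1;
}
```

This matches the user-visible behavior: `echo 'func:snapshot:2' > set_ftrace_filter` takes at most two snapshots, while `func:snapshot` keeps firing indefinitely.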
 struct trace_option_dentry {
 	struct tracer_opt		*opt;
 	struct tracer_flags		*flags;
+	struct trace_array		*tr;
 	struct dentry			*entry;
 };
···
 
 	if (!!(topt->flags->val & topt->opt->bit) != val) {
 		mutex_lock(&trace_types_lock);
-		ret = __set_tracer_option(current_trace, topt->flags,
+		ret = __set_tracer_option(topt->tr->current_trace, topt->flags,
 					  topt->opt, !val);
 		mutex_unlock(&trace_types_lock);
 		if (ret)
···
 trace_options_core_write(struct file *filp, const char __user *ubuf, size_t cnt,
 			 loff_t *ppos)
 {
+	struct trace_array *tr = &global_trace;
 	long index = (long)filp->private_data;
 	unsigned long val;
 	int ret;
···
 		return -EINVAL;
 
 	mutex_lock(&trace_types_lock);
-	ret = set_tracer_flag(1 << index, val);
+	ret = set_tracer_flag(tr, 1 << index, val);
 	mutex_unlock(&trace_types_lock);
 
 	if (ret < 0)
···
 }
 
 
-static struct dentry *trace_options_init_dentry(void)
+static struct dentry *trace_options_init_dentry(struct trace_array *tr)
 {
 	struct dentry *d_tracer;
-	static struct dentry *t_options;
 
-	if (t_options)
-		return t_options;
+	if (tr->options)
+		return tr->options;
 
-	d_tracer = tracing_init_dentry();
+	d_tracer = tracing_init_dentry_tr(tr);
 	if (!d_tracer)
 		return NULL;
 
-	t_options = debugfs_create_dir("options", d_tracer);
-	if (!t_options) {
+	tr->options = debugfs_create_dir("options", d_tracer);
+	if (!tr->options) {
 		pr_warning("Could not create debugfs directory 'options'\n");
 		return NULL;
 	}
 
-	return t_options;
+	return tr->options;
 }
 
 static void
-create_trace_option_file(struct trace_option_dentry *topt,
+create_trace_option_file(struct trace_array *tr,
+			 struct trace_option_dentry *topt,
 			 struct tracer_flags *flags,
 			 struct tracer_opt *opt)
 {
 	struct dentry *t_options;
 
-	t_options = trace_options_init_dentry();
+	t_options = trace_options_init_dentry(tr);
 	if (!t_options)
 		return;
 
 	topt->flags = flags;
 	topt->opt = opt;
+	topt->tr = tr;
 
 	topt->entry = trace_create_file(opt->name, 0644, t_options, topt,
 					&trace_options_fops);
···
 }
 
 static struct trace_option_dentry *
-create_trace_option_files(struct tracer *tracer)
+create_trace_option_files(struct trace_array *tr, struct tracer *tracer)
 {
 	struct trace_option_dentry *topts;
 	struct tracer_flags *flags;
···
 		return NULL;
 
 	for (cnt = 0; opts[cnt].name; cnt++)
-		create_trace_option_file(&topts[cnt], flags,
+		create_trace_option_file(tr, &topts[cnt], flags,
 					 &opts[cnt]);
 
 	return topts;
···
 }
 
 static struct dentry *
-create_trace_option_core_file(const char *option, long index)
+create_trace_option_core_file(struct trace_array *tr,
+			      const char *option, long index)
 {
 	struct dentry *t_options;
 
-	t_options = trace_options_init_dentry();
+	t_options = trace_options_init_dentry(tr);
 	if (!t_options)
 		return NULL;
 
···
 		&trace_options_core_fops);
 }
 
-static __init void create_trace_options_dir(void)
+static __init void create_trace_options_dir(struct trace_array *tr)
 {
 	struct dentry *t_options;
 	int i;
 
-	t_options = trace_options_init_dentry();
+	t_options = trace_options_init_dentry(tr);
5597 4958 if (!t_options) 5598 4959 return; 5599 4960 5600 4961 for (i = 0; trace_options[i]; i++) 5601 - create_trace_option_core_file(trace_options[i], i); 4962 + create_trace_option_core_file(tr, trace_options[i], i); 5602 4963 } 5603 4964 5604 4965 static ssize_t ··· 5606 4967 size_t cnt, loff_t *ppos) 5607 4968 { 5608 4969 struct trace_array *tr = filp->private_data; 5609 - struct ring_buffer *buffer = tr->buffer; 4970 + struct ring_buffer *buffer = tr->trace_buffer.buffer; 5610 4971 char buf[64]; 5611 4972 int r; 5612 4973 ··· 5625 4986 size_t cnt, loff_t *ppos) 5626 4987 { 5627 4988 struct trace_array *tr = filp->private_data; 5628 - struct ring_buffer *buffer = tr->buffer; 4989 + struct ring_buffer *buffer = tr->trace_buffer.buffer; 5629 4990 unsigned long val; 5630 4991 int ret; 5631 4992 ··· 5637 4998 mutex_lock(&trace_types_lock); 5638 4999 if (val) { 5639 5000 ring_buffer_record_on(buffer); 5640 - if (current_trace->start) 5641 - current_trace->start(tr); 5001 + if (tr->current_trace->start) 5002 + tr->current_trace->start(tr); 5642 5003 } else { 5643 5004 ring_buffer_record_off(buffer); 5644 - if (current_trace->stop) 5645 - current_trace->stop(tr); 5005 + if (tr->current_trace->stop) 5006 + tr->current_trace->stop(tr); 5646 5007 } 5647 5008 mutex_unlock(&trace_types_lock); 5648 5009 } ··· 5659 5020 .llseek = default_llseek, 5660 5021 }; 5661 5022 5023 + struct dentry *trace_instance_dir; 5024 + 5025 + static void 5026 + init_tracer_debugfs(struct trace_array *tr, struct dentry *d_tracer); 5027 + 5028 + static void init_trace_buffers(struct trace_array *tr, struct trace_buffer *buf) 5029 + { 5030 + int cpu; 5031 + 5032 + for_each_tracing_cpu(cpu) { 5033 + memset(per_cpu_ptr(buf->data, cpu), 0, sizeof(struct trace_array_cpu)); 5034 + per_cpu_ptr(buf->data, cpu)->trace_cpu.cpu = cpu; 5035 + per_cpu_ptr(buf->data, cpu)->trace_cpu.tr = tr; 5036 + } 5037 + } 5038 + 5039 + static int 5040 + allocate_trace_buffer(struct trace_array *tr, struct trace_buffer 
*buf, int size) 5041 + { 5042 + enum ring_buffer_flags rb_flags; 5043 + 5044 + rb_flags = trace_flags & TRACE_ITER_OVERWRITE ? RB_FL_OVERWRITE : 0; 5045 + 5046 + buf->buffer = ring_buffer_alloc(size, rb_flags); 5047 + if (!buf->buffer) 5048 + return -ENOMEM; 5049 + 5050 + buf->data = alloc_percpu(struct trace_array_cpu); 5051 + if (!buf->data) { 5052 + ring_buffer_free(buf->buffer); 5053 + return -ENOMEM; 5054 + } 5055 + 5056 + init_trace_buffers(tr, buf); 5057 + 5058 + /* Allocate the first page for all buffers */ 5059 + set_buffer_entries(&tr->trace_buffer, 5060 + ring_buffer_size(tr->trace_buffer.buffer, 0)); 5061 + 5062 + return 0; 5063 + } 5064 + 5065 + static int allocate_trace_buffers(struct trace_array *tr, int size) 5066 + { 5067 + int ret; 5068 + 5069 + ret = allocate_trace_buffer(tr, &tr->trace_buffer, size); 5070 + if (ret) 5071 + return ret; 5072 + 5073 + #ifdef CONFIG_TRACER_MAX_TRACE 5074 + ret = allocate_trace_buffer(tr, &tr->max_buffer, 5075 + allocate_snapshot ? size : 1); 5076 + if (WARN_ON(ret)) { 5077 + ring_buffer_free(tr->trace_buffer.buffer); 5078 + free_percpu(tr->trace_buffer.data); 5079 + return -ENOMEM; 5080 + } 5081 + tr->allocated_snapshot = allocate_snapshot; 5082 + 5083 + /* 5084 + * Only the top level trace array gets its snapshot allocated 5085 + * from the kernel command line. 
5086 + */ 5087 + allocate_snapshot = false; 5088 + #endif 5089 + return 0; 5090 + } 5091 + 5092 + static int new_instance_create(const char *name) 5093 + { 5094 + struct trace_array *tr; 5095 + int ret; 5096 + 5097 + mutex_lock(&trace_types_lock); 5098 + 5099 + ret = -EEXIST; 5100 + list_for_each_entry(tr, &ftrace_trace_arrays, list) { 5101 + if (tr->name && strcmp(tr->name, name) == 0) 5102 + goto out_unlock; 5103 + } 5104 + 5105 + ret = -ENOMEM; 5106 + tr = kzalloc(sizeof(*tr), GFP_KERNEL); 5107 + if (!tr) 5108 + goto out_unlock; 5109 + 5110 + tr->name = kstrdup(name, GFP_KERNEL); 5111 + if (!tr->name) 5112 + goto out_free_tr; 5113 + 5114 + raw_spin_lock_init(&tr->start_lock); 5115 + 5116 + tr->current_trace = &nop_trace; 5117 + 5118 + INIT_LIST_HEAD(&tr->systems); 5119 + INIT_LIST_HEAD(&tr->events); 5120 + 5121 + if (allocate_trace_buffers(tr, trace_buf_size) < 0) 5122 + goto out_free_tr; 5123 + 5124 + /* Holder for file callbacks */ 5125 + tr->trace_cpu.cpu = RING_BUFFER_ALL_CPUS; 5126 + tr->trace_cpu.tr = tr; 5127 + 5128 + tr->dir = debugfs_create_dir(name, trace_instance_dir); 5129 + if (!tr->dir) 5130 + goto out_free_tr; 5131 + 5132 + ret = event_trace_add_tracer(tr->dir, tr); 5133 + if (ret) 5134 + goto out_free_tr; 5135 + 5136 + init_tracer_debugfs(tr, tr->dir); 5137 + 5138 + list_add(&tr->list, &ftrace_trace_arrays); 5139 + 5140 + mutex_unlock(&trace_types_lock); 5141 + 5142 + return 0; 5143 + 5144 + out_free_tr: 5145 + if (tr->trace_buffer.buffer) 5146 + ring_buffer_free(tr->trace_buffer.buffer); 5147 + kfree(tr->name); 5148 + kfree(tr); 5149 + 5150 + out_unlock: 5151 + mutex_unlock(&trace_types_lock); 5152 + 5153 + return ret; 5154 + 5155 + } 5156 + 5157 + static int instance_delete(const char *name) 5158 + { 5159 + struct trace_array *tr; 5160 + int found = 0; 5161 + int ret; 5162 + 5163 + mutex_lock(&trace_types_lock); 5164 + 5165 + ret = -ENODEV; 5166 + list_for_each_entry(tr, &ftrace_trace_arrays, list) { 5167 + if (tr->name && strcmp(tr->name, 
name) == 0) { 5168 + found = 1; 5169 + break; 5170 + } 5171 + } 5172 + if (!found) 5173 + goto out_unlock; 5174 + 5175 + ret = -EBUSY; 5176 + if (tr->ref) 5177 + goto out_unlock; 5178 + 5179 + list_del(&tr->list); 5180 + 5181 + event_trace_del_tracer(tr); 5182 + debugfs_remove_recursive(tr->dir); 5183 + free_percpu(tr->trace_buffer.data); 5184 + ring_buffer_free(tr->trace_buffer.buffer); 5185 + 5186 + kfree(tr->name); 5187 + kfree(tr); 5188 + 5189 + ret = 0; 5190 + 5191 + out_unlock: 5192 + mutex_unlock(&trace_types_lock); 5193 + 5194 + return ret; 5195 + } 5196 + 5197 + static int instance_mkdir (struct inode *inode, struct dentry *dentry, umode_t mode) 5198 + { 5199 + struct dentry *parent; 5200 + int ret; 5201 + 5202 + /* Paranoid: Make sure the parent is the "instances" directory */ 5203 + parent = hlist_entry(inode->i_dentry.first, struct dentry, d_alias); 5204 + if (WARN_ON_ONCE(parent != trace_instance_dir)) 5205 + return -ENOENT; 5206 + 5207 + /* 5208 + * The inode mutex is locked, but debugfs_create_dir() will also 5209 + * take the mutex. As the instances directory can not be destroyed 5210 + * or changed in any other way, it is safe to unlock it, and 5211 + * let the dentry try. If two users try to make the same dir at 5212 + * the same time, then the new_instance_create() will determine the 5213 + * winner. 
5214 + */ 5215 + mutex_unlock(&inode->i_mutex); 5216 + 5217 + ret = new_instance_create(dentry->d_iname); 5218 + 5219 + mutex_lock(&inode->i_mutex); 5220 + 5221 + return ret; 5222 + } 5223 + 5224 + static int instance_rmdir(struct inode *inode, struct dentry *dentry) 5225 + { 5226 + struct dentry *parent; 5227 + int ret; 5228 + 5229 + /* Paranoid: Make sure the parent is the "instances" directory */ 5230 + parent = hlist_entry(inode->i_dentry.first, struct dentry, d_alias); 5231 + if (WARN_ON_ONCE(parent != trace_instance_dir)) 5232 + return -ENOENT; 5233 + 5234 + /* The caller did a dget() on dentry */ 5235 + mutex_unlock(&dentry->d_inode->i_mutex); 5236 + 5237 + /* 5238 + * The inode mutex is locked, but debugfs_create_dir() will also 5239 + * take the mutex. As the instances directory can not be destroyed 5240 + * or changed in any other way, it is safe to unlock it, and 5241 + * let the dentry try. If two users try to make the same dir at 5242 + * the same time, then the instance_delete() will determine the 5243 + * winner. 
5244 + */ 5245 + mutex_unlock(&inode->i_mutex); 5246 + 5247 + ret = instance_delete(dentry->d_iname); 5248 + 5249 + mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT); 5250 + mutex_lock(&dentry->d_inode->i_mutex); 5251 + 5252 + return ret; 5253 + } 5254 + 5255 + static const struct inode_operations instance_dir_inode_operations = { 5256 + .lookup = simple_lookup, 5257 + .mkdir = instance_mkdir, 5258 + .rmdir = instance_rmdir, 5259 + }; 5260 + 5261 + static __init void create_trace_instances(struct dentry *d_tracer) 5262 + { 5263 + trace_instance_dir = debugfs_create_dir("instances", d_tracer); 5264 + if (WARN_ON(!trace_instance_dir)) 5265 + return; 5266 + 5267 + /* Hijack the dir inode operations, to allow mkdir */ 5268 + trace_instance_dir->d_inode->i_op = &instance_dir_inode_operations; 5269 + } 5270 + 5271 + static void 5272 + init_tracer_debugfs(struct trace_array *tr, struct dentry *d_tracer) 5273 + { 5274 + int cpu; 5275 + 5276 + trace_create_file("trace_options", 0644, d_tracer, 5277 + tr, &tracing_iter_fops); 5278 + 5279 + trace_create_file("trace", 0644, d_tracer, 5280 + (void *)&tr->trace_cpu, &tracing_fops); 5281 + 5282 + trace_create_file("trace_pipe", 0444, d_tracer, 5283 + (void *)&tr->trace_cpu, &tracing_pipe_fops); 5284 + 5285 + trace_create_file("buffer_size_kb", 0644, d_tracer, 5286 + (void *)&tr->trace_cpu, &tracing_entries_fops); 5287 + 5288 + trace_create_file("buffer_total_size_kb", 0444, d_tracer, 5289 + tr, &tracing_total_entries_fops); 5290 + 5291 + trace_create_file("free_buffer", 0644, d_tracer, 5292 + tr, &tracing_free_buffer_fops); 5293 + 5294 + trace_create_file("trace_marker", 0220, d_tracer, 5295 + tr, &tracing_mark_fops); 5296 + 5297 + trace_create_file("trace_clock", 0644, d_tracer, tr, 5298 + &trace_clock_fops); 5299 + 5300 + trace_create_file("tracing_on", 0644, d_tracer, 5301 + tr, &rb_simple_fops); 5302 + 5303 + #ifdef CONFIG_TRACER_SNAPSHOT 5304 + trace_create_file("snapshot", 0644, d_tracer, 5305 + (void *)&tr->trace_cpu, 
&snapshot_fops); 5306 + #endif 5307 + 5308 + for_each_tracing_cpu(cpu) 5309 + tracing_init_debugfs_percpu(tr, cpu); 5310 + 5311 + } 5312 + 5662 5313 static __init int tracer_init_debugfs(void) 5663 5314 { 5664 5315 struct dentry *d_tracer; 5665 - int cpu; 5666 5316 5667 5317 trace_access_lock_init(); 5668 5318 5669 5319 d_tracer = tracing_init_dentry(); 5320 + if (!d_tracer) 5321 + return 0; 5670 5322 5671 - trace_create_file("trace_options", 0644, d_tracer, 5672 - NULL, &tracing_iter_fops); 5323 + init_tracer_debugfs(&global_trace, d_tracer); 5673 5324 5674 5325 trace_create_file("tracing_cpumask", 0644, d_tracer, 5675 - NULL, &tracing_cpumask_fops); 5676 - 5677 - trace_create_file("trace", 0644, d_tracer, 5678 - (void *) TRACE_PIPE_ALL_CPU, &tracing_fops); 5326 + &global_trace, &tracing_cpumask_fops); 5679 5327 5680 5328 trace_create_file("available_tracers", 0444, d_tracer, 5681 5329 &global_trace, &show_traces_fops); ··· 5981 5055 trace_create_file("README", 0444, d_tracer, 5982 5056 NULL, &tracing_readme_fops); 5983 5057 5984 - trace_create_file("trace_pipe", 0444, d_tracer, 5985 - (void *) TRACE_PIPE_ALL_CPU, &tracing_pipe_fops); 5986 - 5987 - trace_create_file("buffer_size_kb", 0644, d_tracer, 5988 - (void *) RING_BUFFER_ALL_CPUS, &tracing_entries_fops); 5989 - 5990 - trace_create_file("buffer_total_size_kb", 0444, d_tracer, 5991 - &global_trace, &tracing_total_entries_fops); 5992 - 5993 - trace_create_file("free_buffer", 0644, d_tracer, 5994 - &global_trace, &tracing_free_buffer_fops); 5995 - 5996 - trace_create_file("trace_marker", 0220, d_tracer, 5997 - NULL, &tracing_mark_fops); 5998 - 5999 5058 trace_create_file("saved_cmdlines", 0444, d_tracer, 6000 5059 NULL, &tracing_saved_cmdlines_fops); 6001 - 6002 - trace_create_file("trace_clock", 0644, d_tracer, NULL, 6003 - &trace_clock_fops); 6004 - 6005 - trace_create_file("tracing_on", 0644, d_tracer, 6006 - &global_trace, &rb_simple_fops); 6007 5060 6008 5061 #ifdef CONFIG_DYNAMIC_FTRACE 6009 5062 
trace_create_file("dyn_ftrace_total_info", 0444, d_tracer, 6010 5063 &ftrace_update_tot_cnt, &tracing_dyn_info_fops); 6011 5064 #endif 6012 5065 6013 - #ifdef CONFIG_TRACER_SNAPSHOT 6014 - trace_create_file("snapshot", 0644, d_tracer, 6015 - (void *) TRACE_PIPE_ALL_CPU, &snapshot_fops); 6016 - #endif 5066 + create_trace_instances(d_tracer); 6017 5067 6018 - create_trace_options_dir(); 6019 - 6020 - for_each_tracing_cpu(cpu) 6021 - tracing_init_debugfs_percpu(cpu); 5068 + create_trace_options_dir(&global_trace); 6022 5069 6023 5070 return 0; 6024 5071 } ··· 6047 5148 trace_printk_seq(struct trace_seq *s) 6048 5149 { 6049 5150 /* Probably should print a warning here. */ 6050 - if (s->len >= 1000) 6051 - s->len = 1000; 5151 + if (s->len >= TRACE_MAX_PRINT) 5152 + s->len = TRACE_MAX_PRINT; 6052 5153 6053 5154 /* should be zero ended, but we are paranoid. */ 6054 5155 s->buffer[s->len] = 0; ··· 6061 5162 void trace_init_global_iter(struct trace_iterator *iter) 6062 5163 { 6063 5164 iter->tr = &global_trace; 6064 - iter->trace = current_trace; 6065 - iter->cpu_file = TRACE_PIPE_ALL_CPU; 5165 + iter->trace = iter->tr->current_trace; 5166 + iter->cpu_file = RING_BUFFER_ALL_CPUS; 5167 + iter->trace_buffer = &global_trace.trace_buffer; 6066 5168 } 6067 5169 6068 - static void 6069 - __ftrace_dump(bool disable_tracing, enum ftrace_dump_mode oops_dump_mode) 5170 + void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) 6070 5171 { 6071 - static arch_spinlock_t ftrace_dump_lock = 6072 - (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; 6073 5172 /* use static because iter can be a bit big for the stack */ 6074 5173 static struct trace_iterator iter; 5174 + static atomic_t dump_running; 6075 5175 unsigned int old_userobj; 6076 - static int dump_ran; 6077 5176 unsigned long flags; 6078 5177 int cnt = 0, cpu; 6079 5178 6080 - /* only one dump */ 6081 - local_irq_save(flags); 6082 - arch_spin_lock(&ftrace_dump_lock); 6083 - if (dump_ran) 6084 - goto out; 6085 - 6086 - dump_ran = 1; 6087 - 
6088 - tracing_off(); 6089 - 6090 - /* Did function tracer already get disabled? */ 6091 - if (ftrace_is_dead()) { 6092 - printk("# WARNING: FUNCTION TRACING IS CORRUPTED\n"); 6093 - printk("# MAY BE MISSING FUNCTION EVENTS\n"); 5179 + /* Only allow one dump user at a time. */ 5180 + if (atomic_inc_return(&dump_running) != 1) { 5181 + atomic_dec(&dump_running); 5182 + return; 6094 5183 } 6095 5184 6096 - if (disable_tracing) 6097 - ftrace_kill(); 5185 + /* 5186 + * Always turn off tracing when we dump. 5187 + * We don't need to show trace output of what happens 5188 + * between multiple crashes. 5189 + * 5190 + * If the user does a sysrq-z, then they can re-enable 5191 + * tracing with echo 1 > tracing_on. 5192 + */ 5193 + tracing_off(); 5194 + 5195 + local_irq_save(flags); 6098 5196 6099 5197 /* Simulate the iterator */ 6100 5198 trace_init_global_iter(&iter); 6101 5199 6102 5200 for_each_tracing_cpu(cpu) { 6103 - atomic_inc(&iter.tr->data[cpu]->disabled); 5201 + atomic_inc(&per_cpu_ptr(iter.tr->trace_buffer.data, cpu)->disabled); 6104 5202 } 6105 5203 6106 5204 old_userobj = trace_flags & TRACE_ITER_SYM_USEROBJ; ··· 6107 5211 6108 5212 switch (oops_dump_mode) { 6109 5213 case DUMP_ALL: 6110 - iter.cpu_file = TRACE_PIPE_ALL_CPU; 5214 + iter.cpu_file = RING_BUFFER_ALL_CPUS; 6111 5215 break; 6112 5216 case DUMP_ORIG: 6113 5217 iter.cpu_file = raw_smp_processor_id(); ··· 6116 5220 goto out_enable; 6117 5221 default: 6118 5222 printk(KERN_TRACE "Bad dumping mode, switching to all CPUs dump\n"); 6119 - iter.cpu_file = TRACE_PIPE_ALL_CPU; 5223 + iter.cpu_file = RING_BUFFER_ALL_CPUS; 6120 5224 } 6121 5225 6122 5226 printk(KERN_TRACE "Dumping ftrace buffer:\n"); 5227 + 5228 + /* Did function tracer already get disabled? 
*/ 5229 + if (ftrace_is_dead()) { 5230 + printk("# WARNING: FUNCTION TRACING IS CORRUPTED\n"); 5231 + printk("# MAY BE MISSING FUNCTION EVENTS\n"); 5232 + } 6123 5233 6124 5234 /* 6125 5235 * We need to stop all tracing on all CPUS to read the ··· 6166 5264 printk(KERN_TRACE "---------------------------------\n"); 6167 5265 6168 5266 out_enable: 6169 - /* Re-enable tracing if requested */ 6170 - if (!disable_tracing) { 6171 - trace_flags |= old_userobj; 5267 + trace_flags |= old_userobj; 6172 5268 6173 - for_each_tracing_cpu(cpu) { 6174 - atomic_dec(&iter.tr->data[cpu]->disabled); 6175 - } 6176 - tracing_on(); 5269 + for_each_tracing_cpu(cpu) { 5270 + atomic_dec(&per_cpu_ptr(iter.trace_buffer->data, cpu)->disabled); 6177 5271 } 6178 - 6179 - out: 6180 - arch_spin_unlock(&ftrace_dump_lock); 5272 + atomic_dec(&dump_running); 6181 5273 local_irq_restore(flags); 6182 - } 6183 - 6184 - /* By default: disable tracing after the dump */ 6185 - void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) 6186 - { 6187 - __ftrace_dump(true, oops_dump_mode); 6188 5274 } 6189 5275 EXPORT_SYMBOL_GPL(ftrace_dump); 6190 5276 6191 5277 __init static int tracer_alloc_buffers(void) 6192 5278 { 6193 5279 int ring_buf_size; 6194 - enum ring_buffer_flags rb_flags; 6195 - int i; 6196 5280 int ret = -ENOMEM; 6197 5281 6198 5282 ··· 6199 5311 else 6200 5312 ring_buf_size = 1; 6201 5313 6202 - rb_flags = trace_flags & TRACE_ITER_OVERWRITE ? 
RB_FL_OVERWRITE : 0; 6203 - 6204 5314 cpumask_copy(tracing_buffer_mask, cpu_possible_mask); 6205 5315 cpumask_copy(tracing_cpumask, cpu_all_mask); 6206 5316 5317 + raw_spin_lock_init(&global_trace.start_lock); 5318 + 6207 5319 /* TODO: make the number of buffers hot pluggable with CPUS */ 6208 - global_trace.buffer = ring_buffer_alloc(ring_buf_size, rb_flags); 6209 - if (!global_trace.buffer) { 5320 + if (allocate_trace_buffers(&global_trace, ring_buf_size) < 0) { 6210 5321 printk(KERN_ERR "tracer: failed to allocate ring buffer!\n"); 6211 5322 WARN_ON(1); 6212 5323 goto out_free_cpumask; 6213 5324 } 5325 + 6214 5326 if (global_trace.buffer_disabled) 6215 5327 tracing_off(); 6216 5328 6217 - 6218 - #ifdef CONFIG_TRACER_MAX_TRACE 6219 - max_tr.buffer = ring_buffer_alloc(1, rb_flags); 6220 - if (!max_tr.buffer) { 6221 - printk(KERN_ERR "tracer: failed to allocate max ring buffer!\n"); 6222 - WARN_ON(1); 6223 - ring_buffer_free(global_trace.buffer); 6224 - goto out_free_cpumask; 6225 - } 6226 - #endif 6227 - 6228 - /* Allocate the first page for all buffers */ 6229 - for_each_tracing_cpu(i) { 6230 - global_trace.data[i] = &per_cpu(global_trace_cpu, i); 6231 - max_tr.data[i] = &per_cpu(max_tr_data, i); 6232 - } 6233 - 6234 - set_buffer_entries(&global_trace, 6235 - ring_buffer_size(global_trace.buffer, 0)); 6236 - #ifdef CONFIG_TRACER_MAX_TRACE 6237 - set_buffer_entries(&max_tr, 1); 6238 - #endif 6239 - 6240 5329 trace_init_cmdlines(); 6241 - init_irq_work(&trace_work_wakeup, trace_wake_up); 6242 5330 6243 5331 register_tracer(&nop_trace); 5332 + 5333 + global_trace.current_trace = &nop_trace; 6244 5334 6245 5335 /* All seems OK, enable tracing */ 6246 5336 tracing_disabled = 0; ··· 6228 5362 6229 5363 register_die_notifier(&trace_die_notifier); 6230 5364 5365 + global_trace.flags = TRACE_ARRAY_FL_GLOBAL; 5366 + 5367 + /* Holder for file callbacks */ 5368 + global_trace.trace_cpu.cpu = RING_BUFFER_ALL_CPUS; 5369 + global_trace.trace_cpu.tr = &global_trace; 5370 + 5371 
+ INIT_LIST_HEAD(&global_trace.systems); 5372 + INIT_LIST_HEAD(&global_trace.events); 5373 + list_add(&global_trace.list, &ftrace_trace_arrays); 5374 + 6231 5375 while (trace_boot_options) { 6232 5376 char *option; 6233 5377 6234 5378 option = strsep(&trace_boot_options, ","); 6235 - trace_set_options(option); 5379 + trace_set_options(&global_trace, option); 6236 5380 } 5381 + 5382 + register_snapshot_cmd(); 6237 5383 6238 5384 return 0; 6239 5385 6240 5386 out_free_cpumask: 5387 + free_percpu(global_trace.trace_buffer.data); 5388 + #ifdef CONFIG_TRACER_MAX_TRACE 5389 + free_percpu(global_trace.max_buffer.data); 5390 + #endif 6241 5391 free_cpumask_var(tracing_cpumask); 6242 5392 out_free_buffer_mask: 6243 5393 free_cpumask_var(tracing_buffer_mask);
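The mkdir/rmdir-driven instance lifecycle above (`new_instance_create()` / `instance_delete()`) can be modeled in plain userspace C. This is a sketch only: the singly linked list, fixed-size name, and `recorded` bookkeeping are simplified stand-ins for the kernel's `trace_array` list and locking, but the error semantics (`-EEXIST` on a duplicate name, `-ENODEV` for an unknown instance, `-EBUSY` while a reference is held) mirror the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a kernel trace_array instance. */
struct instance {
	char name[32];
	int ref;		/* like trace_array::ref, held by open readers */
	struct instance *next;
};

static struct instance *instances;

static struct instance *find_instance(const char *name)
{
	struct instance *tr;

	for (tr = instances; tr; tr = tr->next)
		if (strcmp(tr->name, name) == 0)
			return tr;
	return NULL;
}

/* Mirrors new_instance_create(): duplicate names fail with -EEXIST. */
int new_instance_create(const char *name)
{
	struct instance *tr;

	if (find_instance(name))
		return -EEXIST;
	tr = calloc(1, sizeof(*tr));
	if (!tr)
		return -ENOMEM;
	strncpy(tr->name, name, sizeof(tr->name) - 1);
	tr->next = instances;
	instances = tr;
	return 0;
}

/* Mirrors instance_delete(): -ENODEV if absent, -EBUSY if referenced. */
int instance_delete(const char *name)
{
	struct instance **p, *tr;

	for (p = &instances; (tr = *p); p = &tr->next) {
		if (strcmp(tr->name, name) != 0)
			continue;
		if (tr->ref)
			return -EBUSY;	/* still opened by a reader */
		*p = tr->next;
		free(tr);
		return 0;
	}
	return -ENODEV;
}
```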
+122 -22
kernel/trace/trace.h
···
 #include <linux/trace_seq.h>
 #include <linux/ftrace_event.h>
 
+#ifdef CONFIG_FTRACE_SYSCALLS
+#include <asm/unistd.h>		/* For NR_SYSCALLS	     */
+#include <asm/syscall.h>	/* some archs define it here */
+#endif
+
 enum trace_type {
 	__TRACE_FIRST_TYPE = 0,
 
···
 	TRACE_GRAPH_ENT,
 	TRACE_USER_STACK,
 	TRACE_BLK,
+	TRACE_BPUTS,
 
 	__TRACE_LAST_TYPE,
 };
···
 
 #define TRACE_BUF_SIZE		1024
 
+struct trace_array;
+
+struct trace_cpu {
+	struct trace_array	*tr;
+	struct dentry		*dir;
+	int			cpu;
+};
+
 /*
  * The CPU trace array - it consists of thousands of trace entries
  * plus some other descriptor data: (for example which task started
  * the trace, etc.)
  */
 struct trace_array_cpu {
+	struct trace_cpu	trace_cpu;
 	atomic_t		disabled;
 	void			*buffer_page;	/* ring buffer spare */
···
 	char			comm[TASK_COMM_LEN];
 };
 
+struct tracer;
+
+struct trace_buffer {
+	struct trace_array		*tr;
+	struct ring_buffer		*buffer;
+	struct trace_array_cpu __percpu	*data;
+	cycle_t				time_start;
+	int				cpu;
+};
+
 /*
  * The trace array - an array of per-CPU trace arrays. This is the
  * highest level data structure that individual tracers deal with.
  * They have on/off state as well:
  */
 struct trace_array {
-	struct ring_buffer	*buffer;
-	int			cpu;
+	struct list_head	list;
+	char			*name;
+	struct trace_buffer	trace_buffer;
+#ifdef CONFIG_TRACER_MAX_TRACE
+	/*
+	 * The max_buffer is used to snapshot the trace when a maximum
+	 * latency is reached, or when the user initiates a snapshot.
+	 * Some tracers will use this to store a maximum trace while
+	 * it continues examining live traces.
+	 *
+	 * The buffers for the max_buffer are set up the same as the trace_buffer
+	 * When a snapshot is taken, the buffer of the max_buffer is swapped
+	 * with the buffer of the trace_buffer and the buffers are reset for
+	 * the trace_buffer so the tracing can continue.
+	 */
+	struct trace_buffer	max_buffer;
+	bool			allocated_snapshot;
+#endif
 	int			buffer_disabled;
-	cycle_t			time_start;
+	struct trace_cpu	trace_cpu;	/* place holder */
+#ifdef CONFIG_FTRACE_SYSCALLS
+	int			sys_refcount_enter;
+	int			sys_refcount_exit;
+	DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
+	DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
+#endif
+	int			stop_count;
+	int			clock_id;
+	struct tracer		*current_trace;
+	unsigned int		flags;
+	raw_spinlock_t		start_lock;
+	struct dentry		*dir;
+	struct dentry		*options;
+	struct dentry		*percpu_dir;
+	struct dentry		*event_dir;
+	struct list_head	systems;
+	struct list_head	events;
 	struct task_struct	*waiter;
-	struct trace_array_cpu	*data[NR_CPUS];
+	int			ref;
 };
+
+enum {
+	TRACE_ARRAY_FL_GLOBAL	= (1 << 0)
+};
+
+extern struct list_head ftrace_trace_arrays;
+
+/*
+ * The global tracer (top) should be the first trace array added,
+ * but we check the flag anyway.
+ */
+static inline struct trace_array *top_trace_array(void)
+{
+	struct trace_array *tr;
+
+	tr = list_entry(ftrace_trace_arrays.prev,
+			typeof(*tr), list);
+	WARN_ON(!(tr->flags & TRACE_ARRAY_FL_GLOBAL));
+	return tr;
+}
 
 #define FTRACE_CMP_TYPE(var, type) \
 	__builtin_types_compatible_p(typeof(var), type *)
···
 		IF_ASSIGN(var, ent, struct userstack_entry, TRACE_USER_STACK);\
 		IF_ASSIGN(var, ent, struct print_entry, TRACE_PRINT);	\
 		IF_ASSIGN(var, ent, struct bprint_entry, TRACE_BPRINT);	\
+		IF_ASSIGN(var, ent, struct bputs_entry, TRACE_BPUTS);	\
 		IF_ASSIGN(var, ent, struct trace_mmiotrace_rw,		\
 			  TRACE_MMIO_RW);				\
 		IF_ASSIGN(var, ent, struct trace_mmiotrace_map,		\
···
 	struct tracer		*next;
 	struct tracer_flags	*flags;
 	bool			print_max;
-	bool			use_max_tr;
-	bool			allocated_snapshot;
 	bool			enabled;
+#ifdef CONFIG_TRACER_MAX_TRACE
+	bool			use_max_tr;
+#endif
 };
 
···
 	current->trace_recursion = val;
 }
 
-#define TRACE_PIPE_ALL_CPU	-1
-
 static inline struct ring_buffer_iter *
 trace_buffer_iter(struct trace_iterator *iter, int cpu)
 {
···
 
 int tracer_init(struct tracer *t, struct trace_array *tr);
 int tracing_is_enabled(void);
-void tracing_reset(struct trace_array *tr, int cpu);
-void tracing_reset_online_cpus(struct trace_array *tr);
+void tracing_reset(struct trace_buffer *buf, int cpu);
+void tracing_reset_online_cpus(struct trace_buffer *buf);
 void tracing_reset_current(int cpu);
-void tracing_reset_current_online_cpus(void);
+void tracing_reset_all_online_cpus(void);
 int tracing_open_generic(struct inode *inode, struct file *filp);
 struct dentry *trace_create_file(const char *name,
 				 umode_t mode,
···
 				 void *data,
 				 const struct file_operations *fops);
 
+struct dentry *tracing_init_dentry_tr(struct trace_array *tr);
 struct dentry *tracing_init_dentry(void);
 
 struct ring_buffer_event;
···
 #define DYN_FTRACE_TEST_NAME2 trace_selftest_dynamic_test_func2
 extern int DYN_FTRACE_TEST_NAME2(void);
 
-extern int ring_buffer_expanded;
+extern bool ring_buffer_expanded;
 extern bool tracing_selftest_disabled;
 DECLARE_PER_CPU(int, ftrace_cpu_disabled);
···
 		unsigned long ip, const char *fmt, va_list args);
 int trace_array_printk(struct trace_array *tr,
 		       unsigned long ip, const char *fmt, ...);
+int trace_array_printk_buf(struct ring_buffer *buffer,
+			   unsigned long ip, const char *fmt, ...);
 void trace_printk_seq(struct trace_seq *s);
 enum print_line_t print_trace_line(struct trace_iterator *iter);
···
 	TRACE_ITER_STOP_ON_FREE		= 0x400000,
 	TRACE_ITER_IRQ_INFO		= 0x800000,
 	TRACE_ITER_MARKERS		= 0x1000000,
+	TRACE_ITER_FUNCTION		= 0x2000000,
 };
 
 /*
···
 
 struct ftrace_event_field {
 	struct list_head	link;
-	char			*name;
-	char			*type;
+	const char		*name;
+	const char		*type;
 	int			filter_type;
 	int			offset;
 	int			size;
···
 struct event_subsystem {
 	struct list_head	list;
 	const char		*name;
-	struct dentry		*entry;
 	struct event_filter	*filter;
-	int			nr_events;
 	int			ref_count;
+};
+
+struct ftrace_subsystem_dir {
+	struct list_head	list;
+	struct event_subsystem	*subsystem;
+	struct trace_array	*tr;
+	struct dentry		*entry;
+	int			ref_count;
+	int			nr_events;
 };
 
 #define FILTER_PRED_INVALID ((unsigned short)-1)
···
 	unsigned short		right;
 };
 
-extern struct list_head ftrace_common_fields;
-
 extern enum regex_type
 filter_parse_regex(char *buff, int len, char **search, int *not);
 extern void print_event_filter(struct ftrace_event_call *call,
 			       struct trace_seq *s);
 extern int apply_event_filter(struct ftrace_event_call *call,
 			      char *filter_string);
-extern int apply_subsystem_event_filter(struct event_subsystem *system,
+extern int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
 					char *filter_string);
 extern void print_subsystem_event_filter(struct event_subsystem *system,
 					 struct trace_seq *s);
 extern int filter_assign_type(const char *type);
 
-struct list_head *
-trace_get_fields(struct ftrace_event_call *event_call);
+struct ftrace_event_field *
+trace_find_event_field(struct ftrace_event_call *call, char *name);
 
 static inline int
 filter_check_discard(struct ftrace_event_call *call, void *rec,
···
 }
 
 extern void trace_event_enable_cmd_record(bool enable);
+extern int event_trace_add_tracer(struct dentry *parent, struct trace_array *tr);
+extern int event_trace_del_tracer(struct trace_array *tr);
 
 extern struct mutex event_mutex;
 extern struct list_head ftrace_events;
···
 void trace_printk_init_buffers(void);
 void trace_printk_start_comm(void);
 int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set);
-int set_tracer_flag(unsigned int mask, int enabled);
+int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled);
+
+/*
+ * Normal trace_printk() and friends allocates special buffers
+ * to do the manipulation, as well as saves the print formats
+ * into sections to display. But the trace infrastructure wants
+ * to use these without the added overhead at the price of being
+ * a bit slower (used mainly for warnings, where we don't care
+ * about performance). The internal_trace_puts() is for such
+ * a purpose.
+ */
+#define internal_trace_puts(str) __trace_puts(_THIS_IP_, str, strlen(str))
 
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(call, struct_name, id, tstruct, print, filter)	\
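The `max_buffer` comment above describes the snapshot mechanism as a pointer swap between the live `trace_buffer` and the `max_buffer`. A minimal userspace model of just that swap, with all structures stripped down to the fields involved (the reset of the now-live buffer is noted but not modeled):

```c
#include <assert.h>

/* Toy ring buffer: only an id, so we can observe the swap. */
struct ring_buffer { int id; };

struct trace_buffer {
	struct ring_buffer *buffer;
};

struct trace_array {
	struct trace_buffer trace_buffer;	/* live buffer */
	struct trace_buffer max_buffer;		/* snapshot buffer */
};

/* Model of the swap described in the max_buffer comment: taking a
 * snapshot exchanges the two ring buffer pointers; the kernel then
 * resets the (now live) trace_buffer so tracing can continue. */
static void snapshot_swap(struct trace_array *tr)
{
	struct ring_buffer *tmp = tr->trace_buffer.buffer;

	tr->trace_buffer.buffer = tr->max_buffer.buffer;
	tr->max_buffer.buffer = tmp;
}
```

The swap is O(1) regardless of buffer size, which is why the snapshot can be taken from hot paths such as function triggers.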
+5 -3
kernel/trace/trace_branch.c
···
 {
 	struct ftrace_event_call *call = &event_branch;
 	struct trace_array *tr = branch_tracer;
+	struct trace_array_cpu *data;
 	struct ring_buffer_event *event;
 	struct trace_branch *entry;
 	struct ring_buffer *buffer;
···
 
 	local_irq_save(flags);
 	cpu = raw_smp_processor_id();
-	if (atomic_inc_return(&tr->data[cpu]->disabled) != 1)
+	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
+	if (atomic_inc_return(&data->disabled) != 1)
 		goto out;
 
 	pc = preempt_count();
-	buffer = tr->buffer;
+	buffer = tr->trace_buffer.buffer;
 	event = trace_buffer_lock_reserve(buffer, TRACE_BRANCH,
 					  sizeof(*entry), flags, pc);
 	if (!event)
···
 	__buffer_unlock_commit(buffer, event);
 
  out:
-	atomic_dec(&data->disabled);
+	atomic_dec(&data->disabled);
 	local_irq_restore(flags);
 }
 
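The `atomic_inc_return(&data->disabled) != 1` pattern in the probe above is a per-CPU recursion guard: the event is recorded only when this CPU's `disabled` counter goes from 0 to 1. A single-threaded sketch of the same logic (plain ints stand in for atomics, and a counter stands in for the reserve/commit of a real event):

```c
#include <assert.h>

/* Stripped-down stand-in for the per-CPU trace data. */
struct trace_array_cpu { int disabled; };

static int recorded;

/* Record only when we are not nested inside another tracing path on
 * this CPU: the increment must take the counter from 0 to 1. */
static void probe(struct trace_array_cpu *data)
{
	if (++data->disabled != 1)
		goto out;

	recorded++;	/* the real probe reserves and commits an event here */
 out:
	data->disabled--;
}
```

Because the counter is incremented before the check and decremented on every exit, a nested hit sees a value greater than 1 and is silently dropped instead of recursing.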
+10
kernel/trace/trace_clock.c
··· 57 57 return local_clock(); 58 58 } 59 59 60 + /* 61 + * trace_jiffy_clock(): Simply use jiffies as a clock counter. 62 + */ 63 + u64 notrace trace_clock_jiffies(void) 64 + { 65 + u64 jiffy = jiffies - INITIAL_JIFFIES; 66 + 67 + /* Return nsecs */ 68 + return (u64)jiffies_to_usecs(jiffy) * 1000ULL; 69 + } 60 70 61 71 /* 62 72 * trace_clock_global(): special globally coherent trace clock
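The new trace_clock_jiffies() scales a jiffies delta to nanoseconds so its readings line up with the other trace clocks. A userspace sketch, assuming the common HZ=1000 configuration — the real kernel's jiffies_to_usecs() handles any HZ value, and INITIAL_JIFFIES is the kernel's boot-time offset placed near the wrap point:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed HZ=1000 for illustration: 1 jiffy == 1000 us. */
#define HZ 1000
/* The kernel boots jiffies ~5 minutes before the 32-bit wrap
 * (-300*HZ as an unsigned int) to flush out wrap bugs early. */
#define INITIAL_JIFFIES 0xfffb6c20ULL

static uint64_t jiffies_to_usecs(uint64_t j)
{
	return j * (1000000 / HZ);
}

static uint64_t trace_clock_jiffies(uint64_t jiffies)
{
	uint64_t jiffy = jiffies - INITIAL_JIFFIES;

	/* Return nsecs, matching the units of the other trace clocks. */
	return jiffies_to_usecs(jiffy) * 1000ULL;
}
```

Subtracting INITIAL_JIFFIES first means the clock starts near zero at boot instead of near the 32-bit wrap value.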
+19 -4
kernel/trace/trace_entries.h
··· 223 223 __dynamic_array( u32, buf ) 224 224 ), 225 225 226 - F_printk("%08lx fmt:%p", 227 - __entry->ip, __entry->fmt), 226 + F_printk("%pf: %s", 227 + (void *)__entry->ip, __entry->fmt), 228 228 229 229 FILTER_OTHER 230 230 ); ··· 238 238 __dynamic_array( char, buf ) 239 239 ), 240 240 241 - F_printk("%08lx %s", 242 - __entry->ip, __entry->buf), 241 + F_printk("%pf: %s", 242 + (void *)__entry->ip, __entry->buf), 243 + 244 + FILTER_OTHER 245 + ); 246 + 247 + FTRACE_ENTRY(bputs, bputs_entry, 248 + 249 + TRACE_BPUTS, 250 + 251 + F_STRUCT( 252 + __field( unsigned long, ip ) 253 + __field( const char *, str ) 254 + ), 255 + 256 + F_printk("%pf: %s", 257 + (void *)__entry->ip, __entry->str), 243 258 244 259 FILTER_OTHER 245 260 );
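The new bputs entry above differs from the existing print entry in what it stores: a bare const char * to a string that must outlive the trace (e.g. a string literal), rather than text copied into a dynamic array. A rough userspace illustration of that distinction — the struct layouts are simplified stand-ins, not the kernel's exact FTRACE_ENTRY definitions:

```c
#include <assert.h>
#include <string.h>

/* bputs keeps only a pointer (the string must be persistent),
 * while the print entry copies the text into the buffer. */
struct bputs_entry {
	unsigned long ip;
	const char *str;	/* pointer only; no copy made */
};

struct print_entry {
	unsigned long ip;
	char buf[64];		/* the kernel uses a dynamic array here */
};

static struct bputs_entry record_bputs(unsigned long ip, const char *str)
{
	struct bputs_entry e = { ip, str };
	return e;
}

static struct print_entry record_print(unsigned long ip, const char *str)
{
	struct print_entry e = { .ip = ip };

	strncpy(e.buf, str, sizeof(e.buf) - 1);
	e.buf[sizeof(e.buf) - 1] = '\0';
	return e;
}
```

Storing only a pointer makes the bputs fast path cheap, at the cost of only being safe for strings that never go away.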
+1121 -280
kernel/trace/trace_events.c
··· 34 34 EXPORT_SYMBOL_GPL(event_storage); 35 35 36 36 LIST_HEAD(ftrace_events); 37 - LIST_HEAD(ftrace_common_fields); 37 + static LIST_HEAD(ftrace_common_fields); 38 38 39 - struct list_head * 39 + #define GFP_TRACE (GFP_KERNEL | __GFP_ZERO) 40 + 41 + static struct kmem_cache *field_cachep; 42 + static struct kmem_cache *file_cachep; 43 + 44 + /* Double loops, do not use break, only goto's work */ 45 + #define do_for_each_event_file(tr, file) \ 46 + list_for_each_entry(tr, &ftrace_trace_arrays, list) { \ 47 + list_for_each_entry(file, &tr->events, list) 48 + 49 + #define do_for_each_event_file_safe(tr, file) \ 50 + list_for_each_entry(tr, &ftrace_trace_arrays, list) { \ 51 + struct ftrace_event_file *___n; \ 52 + list_for_each_entry_safe(file, ___n, &tr->events, list) 53 + 54 + #define while_for_each_event_file() \ 55 + } 56 + 57 + static struct list_head * 40 58 trace_get_fields(struct ftrace_event_call *event_call) 41 59 { 42 60 if (!event_call->class->get_fields) 43 61 return &event_call->class->fields; 44 62 return event_call->class->get_fields(event_call); 63 + } 64 + 65 + static struct ftrace_event_field * 66 + __find_event_field(struct list_head *head, char *name) 67 + { 68 + struct ftrace_event_field *field; 69 + 70 + list_for_each_entry(field, head, link) { 71 + if (!strcmp(field->name, name)) 72 + return field; 73 + } 74 + 75 + return NULL; 76 + } 77 + 78 + struct ftrace_event_field * 79 + trace_find_event_field(struct ftrace_event_call *call, char *name) 80 + { 81 + struct ftrace_event_field *field; 82 + struct list_head *head; 83 + 84 + field = __find_event_field(&ftrace_common_fields, name); 85 + if (field) 86 + return field; 87 + 88 + head = trace_get_fields(call); 89 + return __find_event_field(head, name); 45 90 } 46 91 47 92 static int __trace_define_field(struct list_head *head, const char *type, ··· 95 50 { 96 51 struct ftrace_event_field *field; 97 52 98 - field = kzalloc(sizeof(*field), GFP_KERNEL); 53 + field = kmem_cache_alloc(field_cachep, 
GFP_TRACE); 99 54 if (!field) 100 55 goto err; 101 56 102 - field->name = kstrdup(name, GFP_KERNEL); 103 - if (!field->name) 104 - goto err; 105 - 106 - field->type = kstrdup(type, GFP_KERNEL); 107 - if (!field->type) 108 - goto err; 57 + field->name = name; 58 + field->type = type; 109 59 110 60 if (filter_type == FILTER_OTHER) 111 61 field->filter_type = filter_assign_type(type); ··· 116 76 return 0; 117 77 118 78 err: 119 - if (field) 120 - kfree(field->name); 121 - kfree(field); 79 + kmem_cache_free(field_cachep, field); 122 80 123 81 return -ENOMEM; 124 82 } ··· 158 120 return ret; 159 121 } 160 122 161 - void trace_destroy_fields(struct ftrace_event_call *call) 123 + static void trace_destroy_fields(struct ftrace_event_call *call) 162 124 { 163 125 struct ftrace_event_field *field, *next; 164 126 struct list_head *head; ··· 166 128 head = trace_get_fields(call); 167 129 list_for_each_entry_safe(field, next, head, link) { 168 130 list_del(&field->link); 169 - kfree(field->type); 170 - kfree(field->name); 171 - kfree(field); 131 + kmem_cache_free(field_cachep, field); 172 132 } 173 133 } 174 134 ··· 185 149 int ftrace_event_reg(struct ftrace_event_call *call, 186 150 enum trace_reg type, void *data) 187 151 { 152 + struct ftrace_event_file *file = data; 153 + 188 154 switch (type) { 189 155 case TRACE_REG_REGISTER: 190 156 return tracepoint_probe_register(call->name, 191 157 call->class->probe, 192 - call); 158 + file); 193 159 case TRACE_REG_UNREGISTER: 194 160 tracepoint_probe_unregister(call->name, 195 161 call->class->probe, 196 - call); 162 + file); 197 163 return 0; 198 164 199 165 #ifdef CONFIG_PERF_EVENTS ··· 221 183 222 184 void trace_event_enable_cmd_record(bool enable) 223 185 { 224 - struct ftrace_event_call *call; 186 + struct ftrace_event_file *file; 187 + struct trace_array *tr; 225 188 226 189 mutex_lock(&event_mutex); 227 - list_for_each_entry(call, &ftrace_events, list) { 228 - if (!(call->flags & TRACE_EVENT_FL_ENABLED)) 190 + 
do_for_each_event_file(tr, file) { 191 + 192 + if (!(file->flags & FTRACE_EVENT_FL_ENABLED)) 229 193 continue; 230 194 231 195 if (enable) { 232 196 tracing_start_cmdline_record(); 233 - call->flags |= TRACE_EVENT_FL_RECORDED_CMD; 197 + set_bit(FTRACE_EVENT_FL_RECORDED_CMD_BIT, &file->flags); 234 198 } else { 235 199 tracing_stop_cmdline_record(); 236 - call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD; 200 + clear_bit(FTRACE_EVENT_FL_RECORDED_CMD_BIT, &file->flags); 237 201 } 238 - } 202 + } while_for_each_event_file(); 239 203 mutex_unlock(&event_mutex); 240 204 } 241 205 242 - static int ftrace_event_enable_disable(struct ftrace_event_call *call, 243 - int enable) 206 + static int __ftrace_event_enable_disable(struct ftrace_event_file *file, 207 + int enable, int soft_disable) 244 208 { 209 + struct ftrace_event_call *call = file->event_call; 245 210 int ret = 0; 211 + int disable; 246 212 247 213 switch (enable) { 248 214 case 0: 249 - if (call->flags & TRACE_EVENT_FL_ENABLED) { 250 - call->flags &= ~TRACE_EVENT_FL_ENABLED; 251 - if (call->flags & TRACE_EVENT_FL_RECORDED_CMD) { 215 + /* 216 + * When soft_disable is set and enable is cleared, we want 217 + * to clear the SOFT_DISABLED flag but leave the event in the 218 + * state that it was. That is, if the event was enabled and 219 + * SOFT_DISABLED isn't set, then do nothing. But if SOFT_DISABLED 220 + * is set we do not want the event to be enabled before we 221 + * clear the bit. 222 + * 223 + * When soft_disable is not set but the SOFT_MODE flag is, 224 + * we do nothing. Do not disable the tracepoint, otherwise 225 + * "soft enable"s (clearing the SOFT_DISABLED bit) wont work. 
226 + */ 227 + if (soft_disable) { 228 + disable = file->flags & FTRACE_EVENT_FL_SOFT_DISABLED; 229 + clear_bit(FTRACE_EVENT_FL_SOFT_MODE_BIT, &file->flags); 230 + } else 231 + disable = !(file->flags & FTRACE_EVENT_FL_SOFT_MODE); 232 + 233 + if (disable && (file->flags & FTRACE_EVENT_FL_ENABLED)) { 234 + clear_bit(FTRACE_EVENT_FL_ENABLED_BIT, &file->flags); 235 + if (file->flags & FTRACE_EVENT_FL_RECORDED_CMD) { 252 236 tracing_stop_cmdline_record(); 253 - call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD; 237 + clear_bit(FTRACE_EVENT_FL_RECORDED_CMD_BIT, &file->flags); 254 238 } 255 - call->class->reg(call, TRACE_REG_UNREGISTER, NULL); 239 + call->class->reg(call, TRACE_REG_UNREGISTER, file); 256 240 } 241 + /* If in SOFT_MODE, just set the SOFT_DISABLE_BIT */ 242 + if (file->flags & FTRACE_EVENT_FL_SOFT_MODE) 243 + set_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &file->flags); 257 244 break; 258 245 case 1: 259 - if (!(call->flags & TRACE_EVENT_FL_ENABLED)) { 246 + /* 247 + * When soft_disable is set and enable is set, we want to 248 + * register the tracepoint for the event, but leave the event 249 + * as is. That means, if the event was already enabled, we do 250 + * nothing (but set SOFT_MODE). If the event is disabled, we 251 + * set SOFT_DISABLED before enabling the event tracepoint, so 252 + * it still seems to be disabled. 253 + */ 254 + if (!soft_disable) 255 + clear_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &file->flags); 256 + else 257 + set_bit(FTRACE_EVENT_FL_SOFT_MODE_BIT, &file->flags); 258 + 259 + if (!(file->flags & FTRACE_EVENT_FL_ENABLED)) { 260 + 261 + /* Keep the event disabled, when going to SOFT_MODE. 
*/ 262 + if (soft_disable) 263 + set_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &file->flags); 264 + 260 265 if (trace_flags & TRACE_ITER_RECORD_CMD) { 261 266 tracing_start_cmdline_record(); 262 - call->flags |= TRACE_EVENT_FL_RECORDED_CMD; 267 + set_bit(FTRACE_EVENT_FL_RECORDED_CMD_BIT, &file->flags); 263 268 } 264 - ret = call->class->reg(call, TRACE_REG_REGISTER, NULL); 269 + ret = call->class->reg(call, TRACE_REG_REGISTER, file); 265 270 if (ret) { 266 271 tracing_stop_cmdline_record(); 267 272 pr_info("event trace: Could not enable event " 268 273 "%s\n", call->name); 269 274 break; 270 275 } 271 - call->flags |= TRACE_EVENT_FL_ENABLED; 276 + set_bit(FTRACE_EVENT_FL_ENABLED_BIT, &file->flags); 277 + 278 + /* WAS_ENABLED gets set but never cleared. */ 279 + call->flags |= TRACE_EVENT_FL_WAS_ENABLED; 272 280 } 273 281 break; 274 282 } ··· 322 238 return ret; 323 239 } 324 240 325 - static void ftrace_clear_events(void) 241 + static int ftrace_event_enable_disable(struct ftrace_event_file *file, 242 + int enable) 326 243 { 327 - struct ftrace_event_call *call; 244 + return __ftrace_event_enable_disable(file, enable, 0); 245 + } 246 + 247 + static void ftrace_clear_events(struct trace_array *tr) 248 + { 249 + struct ftrace_event_file *file; 328 250 329 251 mutex_lock(&event_mutex); 330 - list_for_each_entry(call, &ftrace_events, list) { 331 - ftrace_event_enable_disable(call, 0); 252 + list_for_each_entry(file, &tr->events, list) { 253 + ftrace_event_enable_disable(file, 0); 332 254 } 333 255 mutex_unlock(&event_mutex); 334 256 } ··· 347 257 if (--system->ref_count) 348 258 return; 349 259 260 + list_del(&system->list); 261 + 350 262 if (filter) { 351 263 kfree(filter->filter_string); 352 264 kfree(filter); 353 265 } 354 - kfree(system->name); 355 266 kfree(system); 356 267 } 357 268 ··· 362 271 system->ref_count++; 363 272 } 364 273 365 - static void put_system(struct event_subsystem *system) 274 + static void __get_system_dir(struct ftrace_subsystem_dir *dir) 275 + 
{ 276 + WARN_ON_ONCE(dir->ref_count == 0); 277 + dir->ref_count++; 278 + __get_system(dir->subsystem); 279 + } 280 + 281 + static void __put_system_dir(struct ftrace_subsystem_dir *dir) 282 + { 283 + WARN_ON_ONCE(dir->ref_count == 0); 284 + /* If the subsystem is about to be freed, the dir must be too */ 285 + WARN_ON_ONCE(dir->subsystem->ref_count == 1 && dir->ref_count != 1); 286 + 287 + __put_system(dir->subsystem); 288 + if (!--dir->ref_count) 289 + kfree(dir); 290 + } 291 + 292 + static void put_system(struct ftrace_subsystem_dir *dir) 366 293 { 367 294 mutex_lock(&event_mutex); 368 - __put_system(system); 295 + __put_system_dir(dir); 369 296 mutex_unlock(&event_mutex); 370 297 } 371 298 372 299 /* 373 300 * __ftrace_set_clr_event(NULL, NULL, NULL, set) will set/unset all events. 374 301 */ 375 - static int __ftrace_set_clr_event(const char *match, const char *sub, 376 - const char *event, int set) 302 + static int __ftrace_set_clr_event(struct trace_array *tr, const char *match, 303 + const char *sub, const char *event, int set) 377 304 { 305 + struct ftrace_event_file *file; 378 306 struct ftrace_event_call *call; 379 307 int ret = -EINVAL; 380 308 381 309 mutex_lock(&event_mutex); 382 - list_for_each_entry(call, &ftrace_events, list) { 310 + list_for_each_entry(file, &tr->events, list) { 311 + 312 + call = file->event_call; 383 313 384 314 if (!call->name || !call->class || !call->class->reg) 385 315 continue; ··· 419 307 if (event && strcmp(event, call->name) != 0) 420 308 continue; 421 309 422 - ftrace_event_enable_disable(call, set); 310 + ftrace_event_enable_disable(file, set); 423 311 424 312 ret = 0; 425 313 } ··· 428 316 return ret; 429 317 } 430 318 431 - static int ftrace_set_clr_event(char *buf, int set) 319 + static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set) 432 320 { 433 321 char *event = NULL, *sub = NULL, *match; 434 322 ··· 456 344 event = NULL; 457 345 } 458 346 459 - return __ftrace_set_clr_event(match, sub, event, 
set); 347 + return __ftrace_set_clr_event(tr, match, sub, event, set); 460 348 } 461 349 462 350 /** ··· 473 361 */ 474 362 int trace_set_clr_event(const char *system, const char *event, int set) 475 363 { 476 - return __ftrace_set_clr_event(NULL, system, event, set); 364 + struct trace_array *tr = top_trace_array(); 365 + 366 + return __ftrace_set_clr_event(tr, NULL, system, event, set); 477 367 } 478 368 EXPORT_SYMBOL_GPL(trace_set_clr_event); 479 369 ··· 487 373 size_t cnt, loff_t *ppos) 488 374 { 489 375 struct trace_parser parser; 376 + struct seq_file *m = file->private_data; 377 + struct trace_array *tr = m->private; 490 378 ssize_t read, ret; 491 379 492 380 if (!cnt) ··· 511 395 512 396 parser.buffer[parser.idx] = 0; 513 397 514 - ret = ftrace_set_clr_event(parser.buffer + !set, set); 398 + ret = ftrace_set_clr_event(tr, parser.buffer + !set, set); 515 399 if (ret) 516 400 goto out_put; 517 401 } ··· 527 411 static void * 528 412 t_next(struct seq_file *m, void *v, loff_t *pos) 529 413 { 530 - struct ftrace_event_call *call = v; 414 + struct ftrace_event_file *file = v; 415 + struct ftrace_event_call *call; 416 + struct trace_array *tr = m->private; 531 417 532 418 (*pos)++; 533 419 534 - list_for_each_entry_continue(call, &ftrace_events, list) { 420 + list_for_each_entry_continue(file, &tr->events, list) { 421 + call = file->event_call; 535 422 /* 536 423 * The ftrace subsystem is for showing formats only. 537 424 * They can not be enabled or disabled via the event files. 
538 425 */ 539 426 if (call->class && call->class->reg) 540 - return call; 427 + return file; 541 428 } 542 429 543 430 return NULL; ··· 548 429 549 430 static void *t_start(struct seq_file *m, loff_t *pos) 550 431 { 551 - struct ftrace_event_call *call; 432 + struct ftrace_event_file *file; 433 + struct trace_array *tr = m->private; 552 434 loff_t l; 553 435 554 436 mutex_lock(&event_mutex); 555 437 556 - call = list_entry(&ftrace_events, struct ftrace_event_call, list); 438 + file = list_entry(&tr->events, struct ftrace_event_file, list); 557 439 for (l = 0; l <= *pos; ) { 558 - call = t_next(m, call, &l); 559 - if (!call) 440 + file = t_next(m, file, &l); 441 + if (!file) 560 442 break; 561 443 } 562 - return call; 444 + return file; 563 445 } 564 446 565 447 static void * 566 448 s_next(struct seq_file *m, void *v, loff_t *pos) 567 449 { 568 - struct ftrace_event_call *call = v; 450 + struct ftrace_event_file *file = v; 451 + struct trace_array *tr = m->private; 569 452 570 453 (*pos)++; 571 454 572 - list_for_each_entry_continue(call, &ftrace_events, list) { 573 - if (call->flags & TRACE_EVENT_FL_ENABLED) 574 - return call; 455 + list_for_each_entry_continue(file, &tr->events, list) { 456 + if (file->flags & FTRACE_EVENT_FL_ENABLED) 457 + return file; 575 458 } 576 459 577 460 return NULL; ··· 581 460 582 461 static void *s_start(struct seq_file *m, loff_t *pos) 583 462 { 584 - struct ftrace_event_call *call; 463 + struct ftrace_event_file *file; 464 + struct trace_array *tr = m->private; 585 465 loff_t l; 586 466 587 467 mutex_lock(&event_mutex); 588 468 589 - call = list_entry(&ftrace_events, struct ftrace_event_call, list); 469 + file = list_entry(&tr->events, struct ftrace_event_file, list); 590 470 for (l = 0; l <= *pos; ) { 591 - call = s_next(m, call, &l); 592 - if (!call) 471 + file = s_next(m, file, &l); 472 + if (!file) 593 473 break; 594 474 } 595 - return call; 475 + return file; 596 476 } 597 477 598 478 static int t_show(struct seq_file *m, void 
*v) 599 479 { 600 - struct ftrace_event_call *call = v; 480 + struct ftrace_event_file *file = v; 481 + struct ftrace_event_call *call = file->event_call; 601 482 602 483 if (strcmp(call->class->system, TRACE_SYSTEM) != 0) 603 484 seq_printf(m, "%s:", call->class->system); ··· 617 494 event_enable_read(struct file *filp, char __user *ubuf, size_t cnt, 618 495 loff_t *ppos) 619 496 { 620 - struct ftrace_event_call *call = filp->private_data; 497 + struct ftrace_event_file *file = filp->private_data; 621 498 char *buf; 622 499 623 - if (call->flags & TRACE_EVENT_FL_ENABLED) 624 - buf = "1\n"; 625 - else 500 + if (file->flags & FTRACE_EVENT_FL_ENABLED) { 501 + if (file->flags & FTRACE_EVENT_FL_SOFT_DISABLED) 502 + buf = "0*\n"; 503 + else 504 + buf = "1\n"; 505 + } else 626 506 buf = "0\n"; 627 507 628 - return simple_read_from_buffer(ubuf, cnt, ppos, buf, 2); 508 + return simple_read_from_buffer(ubuf, cnt, ppos, buf, strlen(buf)); 629 509 } 630 510 631 511 static ssize_t 632 512 event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt, 633 513 loff_t *ppos) 634 514 { 635 - struct ftrace_event_call *call = filp->private_data; 515 + struct ftrace_event_file *file = filp->private_data; 636 516 unsigned long val; 637 517 int ret; 518 + 519 + if (!file) 520 + return -EINVAL; 638 521 639 522 ret = kstrtoul_from_user(ubuf, cnt, 10, &val); 640 523 if (ret) ··· 654 525 case 0: 655 526 case 1: 656 527 mutex_lock(&event_mutex); 657 - ret = ftrace_event_enable_disable(call, val); 528 + ret = ftrace_event_enable_disable(file, val); 658 529 mutex_unlock(&event_mutex); 659 530 break; 660 531 ··· 672 543 loff_t *ppos) 673 544 { 674 545 const char set_to_char[4] = { '?', '0', '1', 'X' }; 675 - struct event_subsystem *system = filp->private_data; 546 + struct ftrace_subsystem_dir *dir = filp->private_data; 547 + struct event_subsystem *system = dir->subsystem; 676 548 struct ftrace_event_call *call; 549 + struct ftrace_event_file *file; 550 + struct trace_array *tr = 
dir->tr; 677 551 char buf[2]; 678 552 int set = 0; 679 553 int ret; 680 554 681 555 mutex_lock(&event_mutex); 682 - list_for_each_entry(call, &ftrace_events, list) { 556 + list_for_each_entry(file, &tr->events, list) { 557 + call = file->event_call; 683 558 if (!call->name || !call->class || !call->class->reg) 684 559 continue; 685 560 ··· 695 562 * or if all events or cleared, or if we have 696 563 * a mixture. 697 564 */ 698 - set |= (1 << !!(call->flags & TRACE_EVENT_FL_ENABLED)); 565 + set |= (1 << !!(file->flags & FTRACE_EVENT_FL_ENABLED)); 699 566 700 567 /* 701 568 * If we have a mixture, no need to look further. ··· 717 584 system_enable_write(struct file *filp, const char __user *ubuf, size_t cnt, 718 585 loff_t *ppos) 719 586 { 720 - struct event_subsystem *system = filp->private_data; 587 + struct ftrace_subsystem_dir *dir = filp->private_data; 588 + struct event_subsystem *system = dir->subsystem; 721 589 const char *name = NULL; 722 590 unsigned long val; 723 591 ssize_t ret; ··· 741 607 if (system) 742 608 name = system->name; 743 609 744 - ret = __ftrace_set_clr_event(NULL, name, NULL, val); 610 + ret = __ftrace_set_clr_event(dir->tr, NULL, name, NULL, val); 745 611 if (ret) 746 612 goto out; 747 613 ··· 979 845 static int subsystem_open(struct inode *inode, struct file *filp) 980 846 { 981 847 struct event_subsystem *system = NULL; 848 + struct ftrace_subsystem_dir *dir = NULL; /* Initialize for gcc */ 849 + struct trace_array *tr; 982 850 int ret; 983 - 984 - if (!inode->i_private) 985 - goto skip_search; 986 851 987 852 /* Make sure the system still exists */ 988 853 mutex_lock(&event_mutex); 989 - list_for_each_entry(system, &event_subsystems, list) { 990 - if (system == inode->i_private) { 991 - /* Don't open systems with no events */ 992 - if (!system->nr_events) { 993 - system = NULL; 994 - break; 854 + list_for_each_entry(tr, &ftrace_trace_arrays, list) { 855 + list_for_each_entry(dir, &tr->systems, list) { 856 + if (dir == inode->i_private) 
{ 857 + /* Don't open systems with no events */ 858 + if (dir->nr_events) { 859 + __get_system_dir(dir); 860 + system = dir->subsystem; 861 + } 862 + goto exit_loop; 995 863 } 996 - __get_system(system); 997 - break; 998 864 } 999 865 } 866 + exit_loop: 1000 867 mutex_unlock(&event_mutex); 1001 868 1002 - if (system != inode->i_private) 869 + if (!system) 1003 870 return -ENODEV; 1004 871 1005 - skip_search: 872 + /* Some versions of gcc think dir can be uninitialized here */ 873 + WARN_ON(!dir); 874 + 1006 875 ret = tracing_open_generic(inode, filp); 1007 - if (ret < 0 && system) 1008 - put_system(system); 876 + if (ret < 0) 877 + put_system(dir); 878 + 879 + return ret; 880 + } 881 + 882 + static int system_tr_open(struct inode *inode, struct file *filp) 883 + { 884 + struct ftrace_subsystem_dir *dir; 885 + struct trace_array *tr = inode->i_private; 886 + int ret; 887 + 888 + /* Make a temporary dir that has no system but points to tr */ 889 + dir = kzalloc(sizeof(*dir), GFP_KERNEL); 890 + if (!dir) 891 + return -ENOMEM; 892 + 893 + dir->tr = tr; 894 + 895 + ret = tracing_open_generic(inode, filp); 896 + if (ret < 0) 897 + kfree(dir); 898 + 899 + filp->private_data = dir; 1009 900 1010 901 return ret; 1011 902 } 1012 903 1013 904 static int subsystem_release(struct inode *inode, struct file *file) 1014 905 { 1015 - struct event_subsystem *system = inode->i_private; 906 + struct ftrace_subsystem_dir *dir = file->private_data; 1016 907 1017 - if (system) 1018 - put_system(system); 908 + /* 909 + * If dir->subsystem is NULL, then this is a temporary 910 + * descriptor that was made for a trace_array to enable 911 + * all subsystems. 
912 + */ 913 + if (dir->subsystem) 914 + put_system(dir); 915 + else 916 + kfree(dir); 1019 917 1020 918 return 0; 1021 919 } ··· 1056 890 subsystem_filter_read(struct file *filp, char __user *ubuf, size_t cnt, 1057 891 loff_t *ppos) 1058 892 { 1059 - struct event_subsystem *system = filp->private_data; 893 + struct ftrace_subsystem_dir *dir = filp->private_data; 894 + struct event_subsystem *system = dir->subsystem; 1060 895 struct trace_seq *s; 1061 896 int r; 1062 897 ··· 1082 915 subsystem_filter_write(struct file *filp, const char __user *ubuf, size_t cnt, 1083 916 loff_t *ppos) 1084 917 { 1085 - struct event_subsystem *system = filp->private_data; 918 + struct ftrace_subsystem_dir *dir = filp->private_data; 1086 919 char *buf; 1087 920 int err; 1088 921 ··· 1099 932 } 1100 933 buf[cnt] = '\0'; 1101 934 1102 - err = apply_subsystem_event_filter(system, buf); 935 + err = apply_subsystem_event_filter(dir, buf); 1103 936 free_page((unsigned long) buf); 1104 937 if (err < 0) 1105 938 return err; ··· 1208 1041 .release = subsystem_release, 1209 1042 }; 1210 1043 1044 + static const struct file_operations ftrace_tr_enable_fops = { 1045 + .open = system_tr_open, 1046 + .read = system_enable_read, 1047 + .write = system_enable_write, 1048 + .llseek = default_llseek, 1049 + .release = subsystem_release, 1050 + }; 1051 + 1211 1052 static const struct file_operations ftrace_show_header_fops = { 1212 1053 .open = tracing_open_generic, 1213 1054 .read = show_header, 1214 1055 .llseek = default_llseek, 1215 1056 }; 1216 1057 1217 - static struct dentry *event_trace_events_dir(void) 1058 + static int 1059 + ftrace_event_open(struct inode *inode, struct file *file, 1060 + const struct seq_operations *seq_ops) 1218 1061 { 1219 - static struct dentry *d_tracer; 1220 - static struct dentry *d_events; 1062 + struct seq_file *m; 1063 + int ret; 1221 1064 1222 - if (d_events) 1223 - return d_events; 1065 + ret = seq_open(file, seq_ops); 1066 + if (ret < 0) 1067 + return ret; 1068 + 
m = file->private_data; 1069 + /* copy tr over to seq ops */ 1070 + m->private = inode->i_private; 1224 1071 1225 - d_tracer = tracing_init_dentry(); 1226 - if (!d_tracer) 1227 - return NULL; 1228 - 1229 - d_events = debugfs_create_dir("events", d_tracer); 1230 - if (!d_events) 1231 - pr_warning("Could not create debugfs " 1232 - "'events' directory\n"); 1233 - 1234 - return d_events; 1072 + return ret; 1235 1073 } 1236 1074 1237 1075 static int ··· 1244 1072 { 1245 1073 const struct seq_operations *seq_ops = &show_event_seq_ops; 1246 1074 1247 - return seq_open(file, seq_ops); 1075 + return ftrace_event_open(inode, file, seq_ops); 1248 1076 } 1249 1077 1250 1078 static int 1251 1079 ftrace_event_set_open(struct inode *inode, struct file *file) 1252 1080 { 1253 1081 const struct seq_operations *seq_ops = &show_set_event_seq_ops; 1082 + struct trace_array *tr = inode->i_private; 1254 1083 1255 1084 if ((file->f_mode & FMODE_WRITE) && 1256 1085 (file->f_flags & O_TRUNC)) 1257 - ftrace_clear_events(); 1086 + ftrace_clear_events(tr); 1258 1087 1259 - return seq_open(file, seq_ops); 1088 + return ftrace_event_open(inode, file, seq_ops); 1260 1089 } 1261 1090 1262 - static struct dentry * 1263 - event_subsystem_dir(const char *name, struct dentry *d_events) 1091 + static struct event_subsystem * 1092 + create_new_subsystem(const char *name) 1264 1093 { 1265 1094 struct event_subsystem *system; 1266 - struct dentry *entry; 1267 - 1268 - /* First see if we did not already create this dir */ 1269 - list_for_each_entry(system, &event_subsystems, list) { 1270 - if (strcmp(system->name, name) == 0) { 1271 - system->nr_events++; 1272 - return system->entry; 1273 - } 1274 - } 1275 1095 1276 1096 /* need to create new entry */ 1277 1097 system = kmalloc(sizeof(*system), GFP_KERNEL); 1278 - if (!system) { 1279 - pr_warning("No memory to create event subsystem %s\n", 1280 - name); 1281 - return d_events; 1282 - } 1098 + if (!system) 1099 + return NULL; 1283 1100 1284 - 
system->entry = debugfs_create_dir(name, d_events); 1285 - if (!system->entry) { 1286 - pr_warning("Could not create event subsystem %s\n", 1287 - name); 1288 - kfree(system); 1289 - return d_events; 1290 - } 1291 - 1292 - system->nr_events = 1; 1293 1101 system->ref_count = 1; 1294 - system->name = kstrdup(name, GFP_KERNEL); 1295 - if (!system->name) { 1296 - debugfs_remove(system->entry); 1297 - kfree(system); 1298 - return d_events; 1299 - } 1300 - 1301 - list_add(&system->list, &event_subsystems); 1102 + system->name = name; 1302 1103 1303 1104 system->filter = NULL; 1304 1105 1305 1106 system->filter = kzalloc(sizeof(struct event_filter), GFP_KERNEL); 1306 - if (!system->filter) { 1307 - pr_warning("Could not allocate filter for subsystem " 1308 - "'%s'\n", name); 1309 - return system->entry; 1107 + if (!system->filter) 1108 + goto out_free; 1109 + 1110 + list_add(&system->list, &event_subsystems); 1111 + 1112 + return system; 1113 + 1114 + out_free: 1115 + kfree(system); 1116 + return NULL; 1117 + } 1118 + 1119 + static struct dentry * 1120 + event_subsystem_dir(struct trace_array *tr, const char *name, 1121 + struct ftrace_event_file *file, struct dentry *parent) 1122 + { 1123 + struct ftrace_subsystem_dir *dir; 1124 + struct event_subsystem *system; 1125 + struct dentry *entry; 1126 + 1127 + /* First see if we did not already create this dir */ 1128 + list_for_each_entry(dir, &tr->systems, list) { 1129 + system = dir->subsystem; 1130 + if (strcmp(system->name, name) == 0) { 1131 + dir->nr_events++; 1132 + file->system = dir; 1133 + return dir->entry; 1134 + } 1310 1135 } 1311 1136 1312 - entry = debugfs_create_file("filter", 0644, system->entry, system, 1137 + /* Now see if the system itself exists. 
*/ 1138 + list_for_each_entry(system, &event_subsystems, list) { 1139 + if (strcmp(system->name, name) == 0) 1140 + break; 1141 + } 1142 + /* Reset system variable when not found */ 1143 + if (&system->list == &event_subsystems) 1144 + system = NULL; 1145 + 1146 + dir = kmalloc(sizeof(*dir), GFP_KERNEL); 1147 + if (!dir) 1148 + goto out_fail; 1149 + 1150 + if (!system) { 1151 + system = create_new_subsystem(name); 1152 + if (!system) 1153 + goto out_free; 1154 + } else 1155 + __get_system(system); 1156 + 1157 + dir->entry = debugfs_create_dir(name, parent); 1158 + if (!dir->entry) { 1159 + pr_warning("Failed to create system directory %s\n", name); 1160 + __put_system(system); 1161 + goto out_free; 1162 + } 1163 + 1164 + dir->tr = tr; 1165 + dir->ref_count = 1; 1166 + dir->nr_events = 1; 1167 + dir->subsystem = system; 1168 + file->system = dir; 1169 + 1170 + entry = debugfs_create_file("filter", 0644, dir->entry, dir, 1313 1171 &ftrace_subsystem_filter_fops); 1314 1172 if (!entry) { 1315 1173 kfree(system->filter); 1316 1174 system->filter = NULL; 1317 - pr_warning("Could not create debugfs " 1318 - "'%s/filter' entry\n", name); 1175 + pr_warning("Could not create debugfs '%s/filter' entry\n", name); 1319 1176 } 1320 1177 1321 - trace_create_file("enable", 0644, system->entry, system, 1178 + trace_create_file("enable", 0644, dir->entry, dir, 1322 1179 &ftrace_system_enable_fops); 1323 1180 1324 - return system->entry; 1181 + list_add(&dir->list, &tr->systems); 1182 + 1183 + return dir->entry; 1184 + 1185 + out_free: 1186 + kfree(dir); 1187 + out_fail: 1188 + /* Only print this message if failed on memory allocation */ 1189 + if (!dir || !system) 1190 + pr_warning("No memory to create event subsystem %s\n", 1191 + name); 1192 + return NULL; 1325 1193 } 1326 1194 1327 1195 static int 1328 - event_create_dir(struct ftrace_event_call *call, struct dentry *d_events, 1196 + event_create_dir(struct dentry *parent, 1197 + struct ftrace_event_file *file, 1329 1198 const 
			     struct file_operations *id,
			     const struct file_operations *enable,
			     const struct file_operations *filter,
			     const struct file_operations *format)
 {
+	struct ftrace_event_call *call = file->event_call;
+	struct trace_array *tr = file->tr;
 	struct list_head *head;
+	struct dentry *d_events;
 	int ret;
 
 	/*
 	 * If the trace point header did not define TRACE_SYSTEM
 	 * then the system would be called "TRACE_SYSTEM".
 	 */
-	if (strcmp(call->class->system, TRACE_SYSTEM) != 0)
-		d_events = event_subsystem_dir(call->class->system, d_events);
+	if (strcmp(call->class->system, TRACE_SYSTEM) != 0) {
+		d_events = event_subsystem_dir(tr, call->class->system, file, parent);
+		if (!d_events)
+			return -ENOMEM;
+	} else
+		d_events = parent;
 
-	call->dir = debugfs_create_dir(call->name, d_events);
-	if (!call->dir) {
-		pr_warning("Could not create debugfs "
-			   "'%s' directory\n", call->name);
+	file->dir = debugfs_create_dir(call->name, d_events);
+	if (!file->dir) {
+		pr_warning("Could not create debugfs '%s' directory\n",
+			   call->name);
 		return -1;
 	}
 
 	if (call->class->reg && !(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE))
-		trace_create_file("enable", 0644, call->dir, call,
+		trace_create_file("enable", 0644, file->dir, file,
 				  enable);
 
 #ifdef CONFIG_PERF_EVENTS
 	if (call->event.type && call->class->reg)
-		trace_create_file("id", 0444, call->dir, call,
+		trace_create_file("id", 0444, file->dir, call,
 				  id);
 #endif
···
 		if (ret < 0) {
 			pr_warning("Could not initialize trace point"
 				   " events/%s\n", call->name);
-			return ret;
+			return -1;
 		}
 	}
-	trace_create_file("filter", 0644, call->dir, call,
+	trace_create_file("filter", 0644, file->dir, call,
 			  filter);
 
-	trace_create_file("format", 0444, call->dir, call,
+	trace_create_file("format", 0444, file->dir, call,
 			  format);
 
 	return 0;
 }
 
+static void remove_subsystem(struct ftrace_subsystem_dir *dir)
+{
+	if (!dir)
+		return;
+
+	if (!--dir->nr_events) {
+		debugfs_remove_recursive(dir->entry);
+		list_del(&dir->list);
+		__put_system_dir(dir);
+	}
+}
+
+static void remove_event_from_tracers(struct ftrace_event_call *call)
+{
+	struct ftrace_event_file *file;
+	struct trace_array *tr;
+
+	do_for_each_event_file_safe(tr, file) {
+
+		if (file->event_call != call)
+			continue;
+
+		list_del(&file->list);
+		debugfs_remove_recursive(file->dir);
+		remove_subsystem(file->system);
+		kmem_cache_free(file_cachep, file);
+
+		/*
+		 * The do_for_each_event_file_safe() is
+		 * a double loop. After finding the call for this
+		 * trace_array, we use break to jump to the next
+		 * trace_array.
+		 */
+		break;
+	} while_for_each_event_file();
+}
+
 static void event_remove(struct ftrace_event_call *call)
 {
-	ftrace_event_enable_disable(call, 0);
+	struct trace_array *tr;
+	struct ftrace_event_file *file;
+
+	do_for_each_event_file(tr, file) {
+		if (file->event_call != call)
+			continue;
+		ftrace_event_enable_disable(file, 0);
+		/*
+		 * The do_for_each_event_file() is
+		 * a double loop. After finding the call for this
+		 * trace_array, we use break to jump to the next
+		 * trace_array.
+		 */
+		break;
+	} while_for_each_event_file();
+
 	if (call->event.funcs)
 		__unregister_ftrace_event(&call->event);
+	remove_event_from_tracers(call);
 	list_del(&call->list);
 }
···
 }
 
 static int
-__trace_add_event_call(struct ftrace_event_call *call, struct module *mod,
-		       const struct file_operations *id,
-		       const struct file_operations *enable,
-		       const struct file_operations *filter,
-		       const struct file_operations *format)
+__register_event(struct ftrace_event_call *call, struct module *mod)
 {
-	struct dentry *d_events;
 	int ret;
 
 	ret = event_init(call);
 	if (ret < 0)
 		return ret;
 
-	d_events = event_trace_events_dir();
-	if (!d_events)
-		return -ENOENT;
-
-	ret = event_create_dir(call, d_events, id, enable, filter, format);
-	if (!ret)
-		list_add(&call->list, &ftrace_events);
+	list_add(&call->list, &ftrace_events);
 	call->mod = mod;
 
-	return ret;
+	return 0;
 }
+
+/* Add an event to a trace directory */
+static int
+__trace_add_new_event(struct ftrace_event_call *call,
+		      struct trace_array *tr,
+		      const struct file_operations *id,
+		      const struct file_operations *enable,
+		      const struct file_operations *filter,
+		      const struct file_operations *format)
+{
+	struct ftrace_event_file *file;
+
+	file = kmem_cache_alloc(file_cachep, GFP_TRACE);
+	if (!file)
+		return -ENOMEM;
+
+	file->event_call = call;
+	file->tr = tr;
+	list_add(&file->list, &tr->events);
+
+	return event_create_dir(tr->event_dir, file, id, enable, filter, format);
+}
+
+/*
+ * Just create a descriptor for early init. A descriptor is required
+ * for enabling events at boot. We want to enable events before
+ * the filesystem is initialized.
+ */
+static __init int
+__trace_early_add_new_event(struct ftrace_event_call *call,
+			    struct trace_array *tr)
+{
+	struct ftrace_event_file *file;
+
+	file = kmem_cache_alloc(file_cachep, GFP_TRACE);
+	if (!file)
+		return -ENOMEM;
+
+	file->event_call = call;
+	file->tr = tr;
+	list_add(&file->list, &tr->events);
+
+	return 0;
+}
+
+struct ftrace_module_file_ops;
+static void __add_event_to_tracers(struct ftrace_event_call *call,
+				   struct ftrace_module_file_ops *file_ops);
 
 /* Add an additional event_call dynamically */
 int trace_add_event_call(struct ftrace_event_call *call)
 {
 	int ret;
 	mutex_lock(&event_mutex);
-	ret = __trace_add_event_call(call, NULL, &ftrace_event_id_fops,
-				     &ftrace_enable_fops,
-				     &ftrace_event_filter_fops,
-				     &ftrace_event_format_fops);
+
+	ret = __register_event(call, NULL);
+	if (ret >= 0)
+		__add_event_to_tracers(call, NULL);
+
 	mutex_unlock(&event_mutex);
 	return ret;
 }
 
-static void remove_subsystem_dir(const char *name)
-{
-	struct event_subsystem *system;
-
-	if (strcmp(name, TRACE_SYSTEM) == 0)
-		return;
-
-	list_for_each_entry(system, &event_subsystems, list) {
-		if (strcmp(system->name, name) == 0) {
-			if (!--system->nr_events) {
-				debugfs_remove_recursive(system->entry);
-				list_del(&system->list);
-				__put_system(system);
-			}
-			break;
-		}
-	}
-}
-
 /*
- * Must be called under locking both of event_mutex and trace_event_mutex.
+ * Must be called under locking both of event_mutex and trace_event_sem.
  */
 static void __trace_remove_event_call(struct ftrace_event_call *call)
 {
 	event_remove(call);
 	trace_destroy_fields(call);
 	destroy_preds(call);
-	debugfs_remove_recursive(call->dir);
-	remove_subsystem_dir(call->class->system);
 }
 
 /* Remove an event_call */
 void trace_remove_event_call(struct ftrace_event_call *call)
 {
 	mutex_lock(&event_mutex);
-	down_write(&trace_event_mutex);
+	down_write(&trace_event_sem);
 	__trace_remove_event_call(call);
-	up_write(&trace_event_mutex);
+	up_write(&trace_event_sem);
 	mutex_unlock(&event_mutex);
 }
···
 	struct file_operations format;
 	struct file_operations filter;
 };
+
+static struct ftrace_module_file_ops *
+find_ftrace_file_ops(struct ftrace_module_file_ops *file_ops, struct module *mod)
+{
+	/*
+	 * As event_calls are added in groups by module,
+	 * when we find one file_ops, we don't need to search for
+	 * each call in that module, as the rest should be the
+	 * same. Only search for a new one if the last one did
+	 * not match.
+	 */
+	if (file_ops && mod == file_ops->mod)
+		return file_ops;
+
+	list_for_each_entry(file_ops, &ftrace_module_file_list, list) {
+		if (file_ops->mod == mod)
+			return file_ops;
+	}
+	return NULL;
+}
 
 static struct ftrace_module_file_ops *
 trace_create_file_ops(struct module *mod)
···
 		return;
 
 	for_each_event(call, start, end) {
-		__trace_add_event_call(*call, mod,
-				       &file_ops->id, &file_ops->enable,
-				       &file_ops->filter, &file_ops->format);
+		__register_event(*call, mod);
+		__add_event_to_tracers(*call, file_ops);
 	}
 }
···
 {
 	struct ftrace_module_file_ops *file_ops;
 	struct ftrace_event_call *call, *p;
-	bool found = false;
+	bool clear_trace = false;
 
-	down_write(&trace_event_mutex);
+	down_write(&trace_event_sem);
 	list_for_each_entry_safe(call, p, &ftrace_events, list) {
 		if (call->mod == mod) {
-			found = true;
+			if (call->flags & TRACE_EVENT_FL_WAS_ENABLED)
+				clear_trace = true;
 			__trace_remove_event_call(call);
 		}
 	}
···
 		list_del(&file_ops->list);
 		kfree(file_ops);
 	}
+	up_write(&trace_event_sem);
 
 	/*
 	 * It is safest to reset the ring buffer if the module being unloaded
-	 * registered any events.
+	 * registered any events that were used. The only worry is if
+	 * a new module gets loaded, and takes on the same id as the events
+	 * of this module. When printing out the buffer, traced events left
+	 * over from this module may be passed to the new module events and
+	 * unexpected results may occur.
 	 */
-	if (found)
-		tracing_reset_current_online_cpus();
-	up_write(&trace_event_mutex);
+	if (clear_trace)
+		tracing_reset_all_online_cpus();
 }
 
 static int trace_module_notify(struct notifier_block *self,
···
 
 	return 0;
 }
+
+static int
+__trace_add_new_mod_event(struct ftrace_event_call *call,
+			  struct trace_array *tr,
+			  struct ftrace_module_file_ops *file_ops)
+{
+	return __trace_add_new_event(call, tr,
+				     &file_ops->id, &file_ops->enable,
+				     &file_ops->filter, &file_ops->format);
+}
+
 #else
-static int trace_module_notify(struct notifier_block *self,
-			       unsigned long val, void *data)
+static inline struct ftrace_module_file_ops *
+find_ftrace_file_ops(struct ftrace_module_file_ops *file_ops, struct module *mod)
+{
+	return NULL;
+}
+static inline int trace_module_notify(struct notifier_block *self,
+				      unsigned long val, void *data)
 {
 	return 0;
 }
+static inline int
+__trace_add_new_mod_event(struct ftrace_event_call *call,
+			  struct trace_array *tr,
+			  struct ftrace_module_file_ops *file_ops)
+{
+	return -ENODEV;
+}
 #endif /* CONFIG_MODULES */
+
+/* Create a new event directory structure for a trace directory. */
+static void
+__trace_add_event_dirs(struct trace_array *tr)
+{
+	struct ftrace_module_file_ops *file_ops = NULL;
+	struct ftrace_event_call *call;
+	int ret;
+
+	list_for_each_entry(call, &ftrace_events, list) {
+		if (call->mod) {
+			/*
+			 * Directories for events by modules need to
+			 * keep module ref counts when opened (as we don't
+			 * want the module to disappear when reading one
+			 * of these files). The file_ops keep account of
+			 * the module ref count.
+			 */
+			file_ops = find_ftrace_file_ops(file_ops, call->mod);
+			if (!file_ops)
+				continue; /* Warn? */
+			ret = __trace_add_new_mod_event(call, tr, file_ops);
+			if (ret < 0)
+				pr_warning("Could not create directory for event %s\n",
+					   call->name);
+			continue;
+		}
+		ret = __trace_add_new_event(call, tr,
+					    &ftrace_event_id_fops,
+					    &ftrace_enable_fops,
+					    &ftrace_event_filter_fops,
+					    &ftrace_event_format_fops);
+		if (ret < 0)
+			pr_warning("Could not create directory for event %s\n",
+				   call->name);
+	}
+}
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+
+/* Avoid typos */
+#define ENABLE_EVENT_STR	"enable_event"
+#define DISABLE_EVENT_STR	"disable_event"
+
+struct event_probe_data {
+	struct ftrace_event_file	*file;
+	unsigned long			count;
+	int				ref;
+	bool				enable;
+};
+
+static struct ftrace_event_file *
+find_event_file(struct trace_array *tr, const char *system, const char *event)
+{
+	struct ftrace_event_file *file;
+	struct ftrace_event_call *call;
+
+	list_for_each_entry(file, &tr->events, list) {
+
+		call = file->event_call;
+
+		if (!call->name || !call->class || !call->class->reg)
+			continue;
+
+		if (call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)
+			continue;
+
+		if (strcmp(event, call->name) == 0 &&
+		    strcmp(system, call->class->system) == 0)
+			return file;
+	}
+	return NULL;
+}
+
+static void
+event_enable_probe(unsigned long ip, unsigned long parent_ip, void **_data)
+{
+	struct event_probe_data **pdata = (struct event_probe_data **)_data;
+	struct event_probe_data *data = *pdata;
+
+	if (!data)
+		return;
+
+	if (data->enable)
+		clear_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &data->file->flags);
+	else
+		set_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &data->file->flags);
+}
+
+static void
+event_enable_count_probe(unsigned long ip, unsigned long parent_ip, void **_data)
+{
+	struct event_probe_data **pdata = (struct event_probe_data **)_data;
+	struct event_probe_data *data = *pdata;
+
+	if (!data)
+		return;
+
+	if (!data->count)
+		return;
+
+	/* Skip if the event is in a state we want to switch to */
+	if (data->enable == !(data->file->flags & FTRACE_EVENT_FL_SOFT_DISABLED))
+		return;
+
+	if (data->count != -1)
+		(data->count)--;
+
+	event_enable_probe(ip, parent_ip, _data);
+}
+
+static int
+event_enable_print(struct seq_file *m, unsigned long ip,
+		   struct ftrace_probe_ops *ops, void *_data)
+{
+	struct event_probe_data *data = _data;
+
+	seq_printf(m, "%ps:", (void *)ip);
+
+	seq_printf(m, "%s:%s:%s",
+		   data->enable ? ENABLE_EVENT_STR : DISABLE_EVENT_STR,
+		   data->file->event_call->class->system,
+		   data->file->event_call->name);
+
+	if (data->count == -1)
+		seq_printf(m, ":unlimited\n");
+	else
+		seq_printf(m, ":count=%ld\n", data->count);
+
+	return 0;
+}
+
+static int
+event_enable_init(struct ftrace_probe_ops *ops, unsigned long ip,
+		  void **_data)
+{
+	struct event_probe_data **pdata = (struct event_probe_data **)_data;
+	struct event_probe_data *data = *pdata;
+
+	data->ref++;
+	return 0;
+}
+
+static void
+event_enable_free(struct ftrace_probe_ops *ops, unsigned long ip,
+		  void **_data)
+{
+	struct event_probe_data **pdata = (struct event_probe_data **)_data;
+	struct event_probe_data *data = *pdata;
+
+	if (WARN_ON_ONCE(data->ref <= 0))
+		return;
+
+	data->ref--;
+	if (!data->ref) {
+		/* Remove the SOFT_MODE flag */
+		__ftrace_event_enable_disable(data->file, 0, 1);
+		module_put(data->file->event_call->mod);
+		kfree(data);
+	}
+	*pdata = NULL;
+}
+
+static struct ftrace_probe_ops event_enable_probe_ops = {
+	.func			= event_enable_probe,
+	.print			= event_enable_print,
+	.init			= event_enable_init,
+	.free			= event_enable_free,
+};
+
+static struct ftrace_probe_ops event_enable_count_probe_ops = {
+	.func			= event_enable_count_probe,
+	.print			= event_enable_print,
+	.init			= event_enable_init,
+	.free			= event_enable_free,
+};
+
+static struct ftrace_probe_ops event_disable_probe_ops = {
+	.func			= event_enable_probe,
+	.print			= event_enable_print,
+	.init			= event_enable_init,
+	.free			= event_enable_free,
+};
+
+static struct ftrace_probe_ops event_disable_count_probe_ops = {
+	.func			= event_enable_count_probe,
+	.print			= event_enable_print,
+	.init			= event_enable_init,
+	.free			= event_enable_free,
+};
+
+static int
+event_enable_func(struct ftrace_hash *hash,
+		  char *glob, char *cmd, char *param, int enabled)
+{
+	struct trace_array *tr = top_trace_array();
+	struct ftrace_event_file *file;
+	struct ftrace_probe_ops *ops;
+	struct event_probe_data *data;
+	const char *system;
+	const char *event;
+	char *number;
+	bool enable;
+	int ret;
+
+	/* hash funcs only work with set_ftrace_filter */
+	if (!enabled)
+		return -EINVAL;
+
+	if (!param)
+		return -EINVAL;
+
+	system = strsep(&param, ":");
+	if (!param)
+		return -EINVAL;
+
+	event = strsep(&param, ":");
+
+	mutex_lock(&event_mutex);
+
+	ret = -EINVAL;
+	file = find_event_file(tr, system, event);
+	if (!file)
+		goto out;
+
+	enable = strcmp(cmd, ENABLE_EVENT_STR) == 0;
+
+	if (enable)
+		ops = param ? &event_enable_count_probe_ops : &event_enable_probe_ops;
+	else
+		ops = param ? &event_disable_count_probe_ops : &event_disable_probe_ops;
+
+	if (glob[0] == '!') {
+		unregister_ftrace_function_probe_func(glob+1, ops);
+		ret = 0;
+		goto out;
+	}
+
+	ret = -ENOMEM;
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		goto out;
+
+	data->enable = enable;
+	data->count = -1;
+	data->file = file;
+
+	if (!param)
+		goto out_reg;
+
+	number = strsep(&param, ":");
+
+	ret = -EINVAL;
+	if (!strlen(number))
+		goto out_free;
+
+	/*
+	 * We use the callback data field (which is a pointer)
+	 * as our counter.
+	 */
+	ret = kstrtoul(number, 0, &data->count);
+	if (ret)
+		goto out_free;
+
+ out_reg:
+	/* Don't let event modules unload while probe registered */
+	ret = try_module_get(file->event_call->mod);
+	if (!ret)
+		goto out_free;
+
+	ret = __ftrace_event_enable_disable(file, 1, 1);
+	if (ret < 0)
+		goto out_put;
+	ret = register_ftrace_function_probe(glob, ops, data);
+	if (!ret)
+		goto out_disable;
+ out:
+	mutex_unlock(&event_mutex);
+	return ret;
+
+ out_disable:
+	__ftrace_event_enable_disable(file, 0, 1);
+ out_put:
+	module_put(file->event_call->mod);
+ out_free:
+	kfree(data);
+	goto out;
+}
+
+static struct ftrace_func_command event_enable_cmd = {
+	.name			= ENABLE_EVENT_STR,
+	.func			= event_enable_func,
+};
+
+static struct ftrace_func_command event_disable_cmd = {
+	.name			= DISABLE_EVENT_STR,
+	.func			= event_enable_func,
+};
+
+static __init int register_event_cmds(void)
+{
+	int ret;
+
+	ret = register_ftrace_command(&event_enable_cmd);
+	if (WARN_ON(ret < 0))
+		return ret;
+	ret = register_ftrace_command(&event_disable_cmd);
+	if (WARN_ON(ret < 0))
+		unregister_ftrace_command(&event_enable_cmd);
+	return ret;
+}
+#else
+static inline int register_event_cmds(void) { return 0; }
+#endif /* CONFIG_DYNAMIC_FTRACE */
+
+/*
+ * The top level array has already had its ftrace_event_file
+ * descriptors created in order to allow for early events to
+ * be recorded. This function is called after the debugfs has been
+ * initialized, and we now have to create the files associated
+ * to the events.
+ */
+static __init void
+__trace_early_add_event_dirs(struct trace_array *tr)
+{
+	struct ftrace_event_file *file;
+	int ret;
+
+	list_for_each_entry(file, &tr->events, list) {
+		ret = event_create_dir(tr->event_dir, file,
+				       &ftrace_event_id_fops,
+				       &ftrace_enable_fops,
+				       &ftrace_event_filter_fops,
+				       &ftrace_event_format_fops);
+		if (ret < 0)
+			pr_warning("Could not create directory for event %s\n",
+				   file->event_call->name);
+	}
+}
+
+/*
+ * For early boot up, the top trace array requires to have
+ * a list of events that can be enabled. This must be done before
+ * the filesystem is set up in order to allow events to be traced
+ * early.
+ */
+static __init void
+__trace_early_add_events(struct trace_array *tr)
+{
+	struct ftrace_event_call *call;
+	int ret;
+
+	list_for_each_entry(call, &ftrace_events, list) {
+		/* Early boot up should not have any modules loaded */
+		if (WARN_ON_ONCE(call->mod))
+			continue;
+
+		ret = __trace_early_add_new_event(call, tr);
+		if (ret < 0)
+			pr_warning("Could not create early event %s\n",
+				   call->name);
+	}
+}
+
+/* Remove the event directory structure for a trace directory. */
+static void
+__trace_remove_event_dirs(struct trace_array *tr)
+{
+	struct ftrace_event_file *file, *next;
+
+	list_for_each_entry_safe(file, next, &tr->events, list) {
+		list_del(&file->list);
+		debugfs_remove_recursive(file->dir);
+		remove_subsystem(file->system);
+		kmem_cache_free(file_cachep, file);
+	}
+}
+
+static void
+__add_event_to_tracers(struct ftrace_event_call *call,
+		       struct ftrace_module_file_ops *file_ops)
+{
+	struct trace_array *tr;
+
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+		if (file_ops)
+			__trace_add_new_mod_event(call, tr, file_ops);
+		else
+			__trace_add_new_event(call, tr,
+					      &ftrace_event_id_fops,
+					      &ftrace_enable_fops,
+					      &ftrace_event_filter_fops,
+					      &ftrace_event_format_fops);
+	}
+}
 
 static struct notifier_block trace_module_nb = {
 	.notifier_call = trace_module_notify,
···
 static __init int setup_trace_event(char *str)
 {
 	strlcpy(bootup_event_buf, str, COMMAND_LINE_SIZE);
-	ring_buffer_expanded = 1;
-	tracing_selftest_disabled = 1;
+	ring_buffer_expanded = true;
+	tracing_selftest_disabled = true;
 
 	return 1;
 }
 __setup("trace_event=", setup_trace_event);
 
+/* Expects to have event_mutex held when called */
+static int
+create_event_toplevel_files(struct dentry *parent, struct trace_array *tr)
+{
+	struct dentry *d_events;
+	struct dentry *entry;
+
+	entry = debugfs_create_file("set_event", 0644, parent,
+				    tr, &ftrace_set_event_fops);
+	if (!entry) {
+		pr_warning("Could not create debugfs 'set_event' entry\n");
+		return -ENOMEM;
+	}
+
+	d_events = debugfs_create_dir("events", parent);
+	if (!d_events) {
+		pr_warning("Could not create debugfs 'events' directory\n");
+		return -ENOMEM;
+	}
+
+	/* ring buffer internal formats */
+	trace_create_file("header_page", 0444, d_events,
+			  ring_buffer_print_page_header,
+			  &ftrace_show_header_fops);
+
+	trace_create_file("header_event", 0444, d_events,
+			  ring_buffer_print_entry_header,
+			  &ftrace_show_header_fops);
+
+	trace_create_file("enable", 0644, d_events,
+			  tr, &ftrace_tr_enable_fops);
+
+	tr->event_dir = d_events;
+
+	return 0;
+}
+
+/**
+ * event_trace_add_tracer - add an instance of a trace_array to events
+ * @parent: The parent dentry to place the files/directories for events in
+ * @tr: The trace array associated with these events
+ *
+ * When a new instance is created, it needs to set up its events
+ * directory, as well as other files associated with events. It also
+ * creates the event hierarchy in the @parent/events directory.
+ *
+ * Returns 0 on success.
+ */
+int event_trace_add_tracer(struct dentry *parent, struct trace_array *tr)
+{
+	int ret;
+
+	mutex_lock(&event_mutex);
+
+	ret = create_event_toplevel_files(parent, tr);
+	if (ret)
+		goto out_unlock;
+
+	down_write(&trace_event_sem);
+	__trace_add_event_dirs(tr);
+	up_write(&trace_event_sem);
+
+ out_unlock:
+	mutex_unlock(&event_mutex);
+
+	return ret;
+}
+
+/*
+ * The top trace array already had its file descriptors created.
+ * Now the files themselves need to be created.
+ */
+static __init int
+early_event_add_tracer(struct dentry *parent, struct trace_array *tr)
+{
+	int ret;
+
+	mutex_lock(&event_mutex);
+
+	ret = create_event_toplevel_files(parent, tr);
+	if (ret)
+		goto out_unlock;
+
+	down_write(&trace_event_sem);
+	__trace_early_add_event_dirs(tr);
+	up_write(&trace_event_sem);
+
+ out_unlock:
+	mutex_unlock(&event_mutex);
+
+	return ret;
+}
+
+int event_trace_del_tracer(struct trace_array *tr)
+{
+	/* Disable any running events */
+	__ftrace_set_clr_event(tr, NULL, NULL, NULL, 0);
+
+	mutex_lock(&event_mutex);
+
+	down_write(&trace_event_sem);
+	__trace_remove_event_dirs(tr);
+	debugfs_remove_recursive(tr->event_dir);
+	up_write(&trace_event_sem);
+
+	tr->event_dir = NULL;
+
+	mutex_unlock(&event_mutex);
+
+	return 0;
+}
+
+static __init int event_trace_memsetup(void)
+{
+	field_cachep = KMEM_CACHE(ftrace_event_field, SLAB_PANIC);
+	file_cachep = KMEM_CACHE(ftrace_event_file, SLAB_PANIC);
+	return 0;
+}
+
 static __init int event_trace_enable(void)
 {
+	struct trace_array *tr = top_trace_array();
 	struct ftrace_event_call **iter, *call;
 	char *buf = bootup_event_buf;
 	char *token;
···
 		list_add(&call->list, &ftrace_events);
 	}
 
+	/*
+	 * We need the top trace array to have a working set of trace
+	 * points at early init, before the debug files and directories
+	 * are created. Create the file entries now, and attach them
+	 * to the actual file dentries later.
+	 */
+	__trace_early_add_events(tr);
+
 	while (true) {
 		token = strsep(&buf, ",");
···
 		if (!*token)
 			continue;
 
-		ret = ftrace_set_clr_event(token, 1);
+		ret = ftrace_set_clr_event(tr, token, 1);
 		if (ret)
 			pr_warn("Failed to enable trace event: %s\n", token);
 	}
 
 	trace_printk_start_comm();
 
+	register_event_cmds();
+
 	return 0;
 }
 
 static __init int event_trace_init(void)
 {
-	struct ftrace_event_call *call;
+	struct trace_array *tr;
 	struct dentry *d_tracer;
 	struct dentry *entry;
-	struct dentry *d_events;
 	int ret;
+
+	tr = top_trace_array();
 
 	d_tracer = tracing_init_dentry();
 	if (!d_tracer)
 		return 0;
 
 	entry = debugfs_create_file("available_events", 0444, d_tracer,
-				    NULL, &ftrace_avail_fops);
+				    tr, &ftrace_avail_fops);
 	if (!entry)
 		pr_warning("Could not create debugfs "
 			   "'available_events' entry\n");
 
-	entry = debugfs_create_file("set_event", 0644, d_tracer,
-				    NULL, &ftrace_set_event_fops);
-	if (!entry)
-		pr_warning("Could not create debugfs "
-			   "'set_event' entry\n");
-
-	d_events = event_trace_events_dir();
-	if (!d_events)
-		return 0;
-
-	/* ring buffer internal formats */
-	trace_create_file("header_page", 0444, d_events,
-			  ring_buffer_print_page_header,
-			  &ftrace_show_header_fops);
-
-	trace_create_file("header_event", 0444, d_events,
-			  ring_buffer_print_entry_header,
-			  &ftrace_show_header_fops);
-
-	trace_create_file("enable", 0644, d_events,
-			  NULL, &ftrace_system_enable_fops);
-
 	if (trace_define_common_fields())
 		pr_warning("tracing: Failed to allocate common fields");
 
-	/*
-	 * Early initialization already enabled ftrace event.
-	 * Now it's only necessary to create the event directory.
-	 */
-	list_for_each_entry(call, &ftrace_events, list) {
-
-		ret = event_create_dir(call, d_events,
-				       &ftrace_event_id_fops,
-				       &ftrace_enable_fops,
-				       &ftrace_event_filter_fops,
-				       &ftrace_event_format_fops);
-		if (ret < 0)
-			event_remove(call);
-	}
+	ret = early_event_add_tracer(d_tracer, tr);
+	if (ret)
+		return ret;
 
 	ret = register_module_notifier(&trace_module_nb);
 	if (ret)
···
 
 	return 0;
 }
+early_initcall(event_trace_memsetup);
 core_initcall(event_trace_enable);
 fs_initcall(event_trace_init);
···
  */
 static __init void event_trace_self_tests(void)
 {
+	struct ftrace_subsystem_dir *dir;
+	struct ftrace_event_file *file;
 	struct ftrace_event_call *call;
 	struct event_subsystem *system;
+	struct trace_array *tr;
 	int ret;
+
+	tr = top_trace_array();
 
 	pr_info("Running tests on trace events:\n");
 
-	list_for_each_entry(call, &ftrace_events, list) {
+	list_for_each_entry(file, &tr->events, list) {
+
+		call = file->event_call;
 
 		/* Only test those that have a probe */
 		if (!call->class || !call->class->probe)
···
 		 * If an event is already enabled, someone is using
 		 * it and the self test should not be on.
 		 */
-		if (call->flags & TRACE_EVENT_FL_ENABLED) {
+		if (file->flags & FTRACE_EVENT_FL_ENABLED) {
 			pr_warning("Enabled event during self test!\n");
 			WARN_ON_ONCE(1);
 			continue;
 		}
 
-		ftrace_event_enable_disable(call, 1);
+		ftrace_event_enable_disable(file, 1);
 		event_test_stuff();
-		ftrace_event_enable_disable(call, 0);
+		ftrace_event_enable_disable(file, 0);
 
 		pr_cont("OK\n");
 	}
···
 
 	pr_info("Running tests on trace event systems:\n");
 
-	list_for_each_entry(system, &event_subsystems, list) {
+	list_for_each_entry(dir, &tr->systems, list) {
+
+		system = dir->subsystem;
 
 		/* the ftrace system is special, skip it */
 		if (strcmp(system->name, "ftrace") == 0)
···
 		pr_info("Testing event system %s: ", system->name);
 
-		ret = __ftrace_set_clr_event(NULL, system->name, NULL, 1);
+		ret = __ftrace_set_clr_event(tr, NULL, system->name, NULL, 1);
 		if (WARN_ON_ONCE(ret)) {
 			pr_warning("error enabling system %s\n",
 				   system->name);
···
 		event_test_stuff();
 
-		ret = __ftrace_set_clr_event(NULL, system->name, NULL, 0);
+		ret = __ftrace_set_clr_event(tr, NULL, system->name, NULL, 0);
 		if (WARN_ON_ONCE(ret)) {
 			pr_warning("error disabling system %s\n",
 				   system->name);
···
 	pr_info("Running tests on all trace events:\n");
 	pr_info("Testing all events: ");
 
-	ret = __ftrace_set_clr_event(NULL, NULL, NULL, 1);
+	ret = __ftrace_set_clr_event(tr, NULL, NULL, NULL, 1);
 	if (WARN_ON_ONCE(ret)) {
 		pr_warning("error enabling all events\n");
 		return;
···
 	event_test_stuff();
 
 	/* reset sysname */
-	ret = __ftrace_set_clr_event(NULL, NULL, NULL, 0);
+	ret = __ftrace_set_clr_event(tr, NULL, NULL, NULL, 0);
 	if (WARN_ON_ONCE(ret)) {
 		pr_warning("error disabling all events\n");
 		return;
+4 -30
kernel/trace/trace_events_filter.c
···
 	mutex_unlock(&event_mutex);
 }
 
-static struct ftrace_event_field *
-__find_event_field(struct list_head *head, char *name)
-{
-	struct ftrace_event_field *field;
-
-	list_for_each_entry(field, head, link) {
-		if (!strcmp(field->name, name))
-			return field;
-	}
-
-	return NULL;
-}
-
-static struct ftrace_event_field *
-find_event_field(struct ftrace_event_call *call, char *name)
-{
-	struct ftrace_event_field *field;
-	struct list_head *head;
-
-	field = __find_event_field(&ftrace_common_fields, name);
-	if (field)
-		return field;
-
-	head = trace_get_fields(call);
-	return __find_event_field(head, name);
-}
-
 static int __alloc_pred_stack(struct pred_stack *stack, int n_preds)
 {
 	stack->preds = kcalloc(n_preds + 1, sizeof(*stack->preds), GFP_KERNEL);
···
 		return NULL;
 	}
 
-	field = find_event_field(call, operand1);
+	field = trace_find_event_field(call, operand1);
 	if (!field) {
 		parse_error(ps, FILT_ERR_FIELD_NOT_FOUND, 0);
 		return NULL;
···
 	return err;
 }
 
-int apply_subsystem_event_filter(struct event_subsystem *system,
+int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
 				 char *filter_string)
 {
+	struct event_subsystem *system = dir->subsystem;
 	struct event_filter *filter;
 	int err = 0;
 
 	mutex_lock(&event_mutex);
 
 	/* Make sure the system still has events */
-	if (!system->nr_events) {
+	if (!dir->nr_events) {
 		err = -ENODEV;
 		goto out_unlock;
 	}
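The helpers deleted here are not lost: the two-stage lookup (fields common to every event first, then the event's own fields) moves behind `trace_find_event_field()` so other files can share it. The shape of that lookup can be sketched in plain C, with arrays standing in for the kernel's linked lists; the names below are illustrative stand-ins, not the kernel structures:

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct ftrace_event_field. */
struct field {
	const char *name;
	const char *type;
};

/* Linear scan of one field list; mirrors __find_event_field(). */
static const struct field *
find_in(const struct field *fields, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(fields[i].name, name) == 0)
			return &fields[i];
	return NULL;
}

/*
 * Mirrors the find_event_field() being removed: fields shared by every
 * event (common_pid, common_flags, ...) are consulted before the fields
 * specific to one event, so a filter like "common_pid == 42" resolves
 * for any event without per-event duplication.
 */
static const struct field *
lookup_event_field(const struct field *common, size_t ncommon,
		   const struct field *own, size_t nown, const char *name)
{
	const struct field *f = find_in(common, ncommon, name);
	return f ? f : find_in(own, nown, name);
}
```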
+2 -2
kernel/trace/trace_export.c
···
 
 #undef FTRACE_ENTRY
 #define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter)	\
-int									\
+static int __init							\
 ftrace_define_fields_##name(struct ftrace_event_call *event_call)	\
 {									\
 	struct struct_name field;					\
···
 #define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, filter,\
 			 regfn)						\
 									\
-struct ftrace_event_class event_class_ftrace_##call = {			\
+struct ftrace_event_class __refdata event_class_ftrace_##call = {	\
 	.system			= __stringify(TRACE_SYSTEM),		\
 	.define_fields		= ftrace_define_fields_##call,		\
 	.fields			= LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
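The `FTRACE_ENTRY`/`FTRACE_ENTRY_REG` macros touched here are an X-macro pattern: one list of entry definitions is expanded several times with different macro bodies to stamp out a `define_fields` function and an event class per built-in entry type. A minimal userspace rendition of that trick, with made-up entry names rather than the kernel's, looks like this:

```c
#include <string.h>

/* One X-macro list of "entries"; each expansion of ENTRY(name, id)
 * below re-interprets the same list with a different body. */
#define ENTRY_LIST	\
	ENTRY(fn, 1)	\
	ENTRY(ctx, 2)

/* First expansion: stamp out one id-lookup function per entry,
 * analogous to the per-entry ftrace_define_fields_##name functions. */
#define ENTRY(name, id)				\
	static int entry_id_##name(void)	\
	{					\
		return id;			\
	}
ENTRY_LIST
#undef ENTRY

/* Second expansion: stamp out a registry of names and ids,
 * analogous to the generated event_class_ftrace_##call objects. */
struct entry_desc {
	const char *name;
	int id;
};
#define ENTRY(name, id) { #name, id },
static const struct entry_desc entries[] = {
	ENTRY_LIST
};
#undef ENTRY

/* Look an entry up by name in the generated table. */
static int entry_id_by_name(const char *name)
{
	for (unsigned i = 0; i < sizeof(entries) / sizeof(entries[0]); i++)
		if (strcmp(entries[i].name, name) == 0)
			return entries[i].id;
	return -1;
}
```

The patch itself only changes the stamped-out annotations: the generated field definers become `static __init` (discarded after boot), so the classes that reference them need `__refdata` to silence section-mismatch warnings.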
+157 -64
kernel/trace/trace_functions.c
···
 static int function_trace_init(struct trace_array *tr)
 {
 	func_trace = tr;
-	tr->cpu = get_cpu();
+	tr->trace_buffer.cpu = get_cpu();
 	put_cpu();
 
 	tracing_start_cmdline_record();
···
 
 static void function_trace_start(struct trace_array *tr)
 {
-	tracing_reset_online_cpus(tr);
+	tracing_reset_online_cpus(&tr->trace_buffer);
 }
 
 /* Our option */
···
 		goto out;
 
 	cpu = smp_processor_id();
-	data = tr->data[cpu];
+	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
 	if (!atomic_read(&data->disabled)) {
 		local_save_flags(flags);
 		trace_function(tr, ip, parent_ip, flags, pc);
···
 	 */
 	local_irq_save(flags);
 	cpu = raw_smp_processor_id();
-	data = tr->data[cpu];
+	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
 	disabled = atomic_inc_return(&data->disabled);
 
 	if (likely(disabled == 1)) {
···
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-static void
-ftrace_traceon(unsigned long ip, unsigned long parent_ip, void **data)
+static int update_count(void **data)
 {
-	long *count = (long *)data;
-
-	if (tracing_is_on())
-		return;
+	unsigned long *count = (long *)data;
 
 	if (!*count)
-		return;
+		return 0;
 
 	if (*count != -1)
 		(*count)--;
+
+	return 1;
+}
+
+static void
+ftrace_traceon_count(unsigned long ip, unsigned long parent_ip, void **data)
+{
+	if (tracing_is_on())
+		return;
+
+	if (update_count(data))
+		tracing_on();
+}
+
+static void
+ftrace_traceoff_count(unsigned long ip, unsigned long parent_ip, void **data)
+{
+	if (!tracing_is_on())
+		return;
+
+	if (update_count(data))
+		tracing_off();
+}
+
+static void
+ftrace_traceon(unsigned long ip, unsigned long parent_ip, void **data)
+{
+	if (tracing_is_on())
+		return;
 
 	tracing_on();
 }
···
 static void
 ftrace_traceoff(unsigned long ip, unsigned long parent_ip, void **data)
 {
-	long *count = (long *)data;
-
 	if (!tracing_is_on())
 		return;
-
-	if (!*count)
-		return;
-
-	if (*count != -1)
-		(*count)--;
 
 	tracing_off();
 }
 
+/*
+ * Skip 4:
+ *   ftrace_stacktrace()
+ *   function_trace_probe_call()
+ *   ftrace_ops_list_func()
+ *   ftrace_call()
+ */
+#define STACK_SKIP 4
+
+static void
+ftrace_stacktrace(unsigned long ip, unsigned long parent_ip, void **data)
+{
+	trace_dump_stack(STACK_SKIP);
+}
+
+static void
+ftrace_stacktrace_count(unsigned long ip, unsigned long parent_ip, void **data)
+{
+	if (!tracing_is_on())
+		return;
+
+	if (update_count(data))
+		trace_dump_stack(STACK_SKIP);
+}
+
 static int
-ftrace_trace_onoff_print(struct seq_file *m, unsigned long ip,
-			 struct ftrace_probe_ops *ops, void *data);
-
-static struct ftrace_probe_ops traceon_probe_ops = {
-	.func			= ftrace_traceon,
-	.print			= ftrace_trace_onoff_print,
-};
-
-static struct ftrace_probe_ops traceoff_probe_ops = {
-	.func			= ftrace_traceoff,
-	.print			= ftrace_trace_onoff_print,
-};
-
-static int
-ftrace_trace_onoff_print(struct seq_file *m, unsigned long ip,
-			 struct ftrace_probe_ops *ops, void *data)
+ftrace_probe_print(const char *name, struct seq_file *m,
+		   unsigned long ip, void *data)
 {
 	long count = (long)data;
 
-	seq_printf(m, "%ps:", (void *)ip);
-
-	if (ops == &traceon_probe_ops)
-		seq_printf(m, "traceon");
-	else
-		seq_printf(m, "traceoff");
+	seq_printf(m, "%ps:%s", (void *)ip, name);
 
 	if (count == -1)
 		seq_printf(m, ":unlimited\n");
···
 }
 
 static int
-ftrace_trace_onoff_unreg(char *glob, char *cmd, char *param)
+ftrace_traceon_print(struct seq_file *m, unsigned long ip,
+		     struct ftrace_probe_ops *ops, void *data)
 {
-	struct ftrace_probe_ops *ops;
-
-	/* we register both traceon and traceoff to this callback */
-	if (strcmp(cmd, "traceon") == 0)
-		ops = &traceon_probe_ops;
-	else
-		ops = &traceoff_probe_ops;
-
-	unregister_ftrace_function_probe_func(glob, ops);
-
-	return 0;
+	return ftrace_probe_print("traceon", m, ip, data);
 }
 
 static int
-ftrace_trace_onoff_callback(struct ftrace_hash *hash,
-			    char *glob, char *cmd, char *param, int enable)
+ftrace_traceoff_print(struct seq_file *m, unsigned long ip,
+		      struct ftrace_probe_ops *ops, void *data)
 {
-	struct ftrace_probe_ops *ops;
+	return ftrace_probe_print("traceoff", m, ip, data);
+}
+
+static int
+ftrace_stacktrace_print(struct seq_file *m, unsigned long ip,
+			struct ftrace_probe_ops *ops, void *data)
+{
+	return ftrace_probe_print("stacktrace", m, ip, data);
+}
+
+static struct ftrace_probe_ops traceon_count_probe_ops = {
+	.func			= ftrace_traceon_count,
+	.print			= ftrace_traceon_print,
+};
+
+static struct ftrace_probe_ops traceoff_count_probe_ops = {
+	.func			= ftrace_traceoff_count,
+	.print			= ftrace_traceoff_print,
+};
+
+static struct ftrace_probe_ops stacktrace_count_probe_ops = {
+	.func			= ftrace_stacktrace_count,
+	.print			= ftrace_stacktrace_print,
+};
+
+static struct ftrace_probe_ops traceon_probe_ops = {
+	.func			= ftrace_traceon,
+	.print			= ftrace_traceon_print,
+};
+
+static struct ftrace_probe_ops traceoff_probe_ops = {
+	.func			= ftrace_traceoff,
+	.print			= ftrace_traceoff_print,
+};
+
+static struct ftrace_probe_ops stacktrace_probe_ops = {
+	.func			= ftrace_stacktrace,
+	.print			= ftrace_stacktrace_print,
+};
+
+static int
+ftrace_trace_probe_callback(struct ftrace_probe_ops *ops,
+			    struct ftrace_hash *hash, char *glob,
+			    char *cmd, char *param, int enable)
+{
 	void *count = (void *)-1;
 	char *number;
 	int ret;
···
 	if (!enable)
 		return -EINVAL;
 
-	if (glob[0] == '!')
-		return ftrace_trace_onoff_unreg(glob+1, cmd, param);
-
-	/* we register both traceon and traceoff to this callback */
-	if (strcmp(cmd, "traceon") == 0)
-		ops = &traceon_probe_ops;
-	else
-		ops = &traceoff_probe_ops;
+	if (glob[0] == '!') {
+		unregister_ftrace_function_probe_func(glob+1, ops);
+		return 0;
+	}
 
 	if (!param)
 		goto out_reg;
···
 	return ret < 0 ? ret : 0;
 }
 
+static int
+ftrace_trace_onoff_callback(struct ftrace_hash *hash,
+			    char *glob, char *cmd, char *param, int enable)
+{
+	struct ftrace_probe_ops *ops;
+
+	/* we register both traceon and traceoff to this callback */
+	if (strcmp(cmd, "traceon") == 0)
+		ops = param ? &traceon_count_probe_ops : &traceon_probe_ops;
+	else
+		ops = param ? &traceoff_count_probe_ops : &traceoff_probe_ops;
+
+	return ftrace_trace_probe_callback(ops, hash, glob, cmd,
+					   param, enable);
+}
+
+static int
+ftrace_stacktrace_callback(struct ftrace_hash *hash,
+			   char *glob, char *cmd, char *param, int enable)
+{
+	struct ftrace_probe_ops *ops;
+
+	ops = param ? &stacktrace_count_probe_ops : &stacktrace_probe_ops;
+
+	return ftrace_trace_probe_callback(ops, hash, glob, cmd,
+					   param, enable);
+}
+
 static struct ftrace_func_command ftrace_traceon_cmd = {
 	.name			= "traceon",
 	.func			= ftrace_trace_onoff_callback,
···
 static struct ftrace_func_command ftrace_traceoff_cmd = {
 	.name			= "traceoff",
 	.func			= ftrace_trace_onoff_callback,
+};
+
+static struct ftrace_func_command ftrace_stacktrace_cmd = {
+	.name			= "stacktrace",
+	.func			= ftrace_stacktrace_callback,
 };
 
 static int __init init_func_cmd_traceon(void)
···
 	ret = register_ftrace_command(&ftrace_traceon_cmd);
 	if (ret)
 		unregister_ftrace_command(&ftrace_traceoff_cmd);
+
+	ret = register_ftrace_command(&ftrace_stacktrace_cmd);
+	if (ret) {
+		unregister_ftrace_command(&ftrace_traceoff_cmd);
+		unregister_ftrace_command(&ftrace_traceon_cmd);
+	}
 	return ret;
 }
 #else
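The trigger callbacks registered above are driven by writing `function:command[:count]` strings into `set_ftrace_filter`. A hedged usage sketch, assuming root and tracefs mounted at `/sys/kernel/debug/tracing` (`schedule` is used here only as a convenient always-hit function):

```shell
cd /sys/kernel/debug/tracing
# Dump a kernel stack trace the first 5 times schedule() is hit:
echo 'schedule:stacktrace:5' > set_ftrace_filter
# Stop tracing whenever schedule() is hit (no count = unlimited):
echo 'schedule:traceoff' > set_ftrace_filter
# A leading '!' unregisters a previously installed trigger:
echo '!schedule:stacktrace:5' >> set_ftrace_filter
```

Note that the presence of a `:count` parameter is what selects the `*_count_probe_ops` variants in `ftrace_trace_onoff_callback()` and `ftrace_stacktrace_callback()`.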
+6 -6
kernel/trace/trace_functions_graph.c
···
 {
 	struct ftrace_event_call *call = &event_funcgraph_entry;
 	struct ring_buffer_event *event;
-	struct ring_buffer *buffer = tr->buffer;
+	struct ring_buffer *buffer = tr->trace_buffer.buffer;
 	struct ftrace_graph_ent_entry *entry;
 
 	if (unlikely(__this_cpu_read(ftrace_cpu_disabled)))
···
 
 	local_irq_save(flags);
 	cpu = raw_smp_processor_id();
-	data = tr->data[cpu];
+	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
 	disabled = atomic_inc_return(&data->disabled);
 	if (likely(disabled == 1)) {
 		pc = preempt_count();
···
 {
 	struct ftrace_event_call *call = &event_funcgraph_exit;
 	struct ring_buffer_event *event;
-	struct ring_buffer *buffer = tr->buffer;
+	struct ring_buffer *buffer = tr->trace_buffer.buffer;
 	struct ftrace_graph_ret_entry *entry;
 
 	if (unlikely(__this_cpu_read(ftrace_cpu_disabled)))
···
 
 	local_irq_save(flags);
 	cpu = raw_smp_processor_id();
-	data = tr->data[cpu];
+	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
 	disabled = atomic_inc_return(&data->disabled);
 	if (likely(disabled == 1)) {
 		pc = preempt_count();
···
 		 * We need to consume the current entry to see
 		 * the next one.
 		 */
-		ring_buffer_consume(iter->tr->buffer, iter->cpu,
+		ring_buffer_consume(iter->trace_buffer->buffer, iter->cpu,
 				    NULL, NULL);
-		event = ring_buffer_peek(iter->tr->buffer, iter->cpu,
+		event = ring_buffer_peek(iter->trace_buffer->buffer, iter->cpu,
 					 NULL, NULL);
 	}
 
+64 -21
kernel/trace/trace_irqsoff.c
···
 static int trace_type __read_mostly;
 
 static int save_flags;
+static bool function_enabled;
 
 static void stop_irqsoff_tracer(struct trace_array *tr, int graph);
 static int start_irqsoff_tracer(struct trace_array *tr, int graph);
···
 	if (!irqs_disabled_flags(*flags))
 		return 0;
 
-	*data = tr->data[cpu];
+	*data = per_cpu_ptr(tr->trace_buffer.data, cpu);
 	disabled = atomic_inc_return(&(*data)->disabled);
 
 	if (likely(disabled == 1))
···
 		per_cpu(tracing_cpu, cpu) = 0;
 
 	tracing_max_latency = 0;
-	tracing_reset_online_cpus(irqsoff_trace);
+	tracing_reset_online_cpus(&irqsoff_trace->trace_buffer);
 
 	return start_irqsoff_tracer(irqsoff_trace, set);
 }
···
 	if (per_cpu(tracing_cpu, cpu))
 		return;
 
-	data = tr->data[cpu];
+	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
 
 	if (unlikely(!data) || atomic_read(&data->disabled))
 		return;
···
 	if (!tracer_enabled)
 		return;
 
-	data = tr->data[cpu];
+	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
 
 	if (unlikely(!data) ||
 	    !data->critical_start || atomic_read(&data->disabled))
···
 }
 #endif /* CONFIG_PREEMPT_TRACER */
 
-static int start_irqsoff_tracer(struct trace_array *tr, int graph)
+static int register_irqsoff_function(int graph, int set)
 {
-	int ret = 0;
+	int ret;
 
-	if (!graph)
-		ret = register_ftrace_function(&trace_ops);
-	else
+	/* 'set' is set if TRACE_ITER_FUNCTION is about to be set */
+	if (function_enabled || (!set && !(trace_flags & TRACE_ITER_FUNCTION)))
+		return 0;
+
+	if (graph)
 		ret = register_ftrace_graph(&irqsoff_graph_return,
 					    &irqsoff_graph_entry);
+	else
+		ret = register_ftrace_function(&trace_ops);
+
+	if (!ret)
+		function_enabled = true;
+
+	return ret;
+}
+
+static void unregister_irqsoff_function(int graph)
+{
+	if (!function_enabled)
+		return;
+
+	if (graph)
+		unregister_ftrace_graph();
+	else
+		unregister_ftrace_function(&trace_ops);
+
+	function_enabled = false;
+}
+
+static void irqsoff_function_set(int set)
+{
+	if (set)
+		register_irqsoff_function(is_graph(), 1);
+	else
+		unregister_irqsoff_function(is_graph());
+}
+
+static int irqsoff_flag_changed(struct tracer *tracer, u32 mask, int set)
+{
+	if (mask & TRACE_ITER_FUNCTION)
+		irqsoff_function_set(set);
+
+	return trace_keep_overwrite(tracer, mask, set);
+}
+
+static int start_irqsoff_tracer(struct trace_array *tr, int graph)
+{
+	int ret;
+
+	ret = register_irqsoff_function(graph, 0);
 
 	if (!ret && tracing_is_enabled())
 		tracer_enabled = 1;
···
 {
 	tracer_enabled = 0;
 
-	if (!graph)
-		unregister_ftrace_function(&trace_ops);
-	else
-		unregister_ftrace_graph();
+	unregister_irqsoff_function(graph);
 }
 
 static void __irqsoff_tracer_init(struct trace_array *tr)
···
 	save_flags = trace_flags;
 
 	/* non overwrite screws up the latency tracers */
-	set_tracer_flag(TRACE_ITER_OVERWRITE, 1);
-	set_tracer_flag(TRACE_ITER_LATENCY_FMT, 1);
+	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, 1);
+	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, 1);
 
 	tracing_max_latency = 0;
 	irqsoff_trace = tr;
 	/* make sure that the tracer is visible */
 	smp_wmb();
-	tracing_reset_online_cpus(tr);
+	tracing_reset_online_cpus(&tr->trace_buffer);
 
 	if (start_irqsoff_tracer(tr, is_graph()))
 		printk(KERN_ERR "failed to start irqsoff tracer\n");
···
 
 	stop_irqsoff_tracer(tr, is_graph());
 
-	set_tracer_flag(TRACE_ITER_LATENCY_FMT, lat_flag);
-	set_tracer_flag(TRACE_ITER_OVERWRITE, overwrite_flag);
+	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, lat_flag);
+	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, overwrite_flag);
 }
 
 static void irqsoff_tracer_start(struct trace_array *tr)
···
 	.print_line	= irqsoff_print_line,
 	.flags		= &tracer_flags,
 	.set_flag	= irqsoff_set_flag,
-	.flag_changed	= trace_keep_overwrite,
+	.flag_changed	= irqsoff_flag_changed,
 #ifdef CONFIG_FTRACE_SELFTEST
 	.selftest	= trace_selftest_startup_irqsoff,
 #endif
···
 	.print_line	= irqsoff_print_line,
 	.flags		= &tracer_flags,
 	.set_flag	= irqsoff_set_flag,
-	.flag_changed	= trace_keep_overwrite,
+	.flag_changed	= irqsoff_flag_changed,
 #ifdef CONFIG_FTRACE_SELFTEST
 	.selftest	= trace_selftest_startup_preemptoff,
 #endif
···
 	.print_line	= irqsoff_print_line,
 	.flags		= &tracer_flags,
 	.set_flag	= irqsoff_set_flag,
-	.flag_changed	= trace_keep_overwrite,
+	.flag_changed	= irqsoff_flag_changed,
 #ifdef CONFIG_FTRACE_SELFTEST
 	.selftest	= trace_selftest_startup_preemptirqsoff,
 #endif
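The new `flag_changed` hooks above let per-function recording be toggled while a latency tracer is active, via the `function-trace` option. A hedged sketch, assuming root, tracefs at `/sys/kernel/debug/tracing`, and a kernel built with the irqsoff tracer:

```shell
cd /sys/kernel/debug/tracing
echo irqsoff > current_tracer
# Record only the latency measurement points, not every function:
echo 0 > options/function-trace
# Re-enable per-function records for the latency trace:
echo 1 > options/function-trace
```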
+6 -6
kernel/trace/trace_kdb.c
···
 	trace_init_global_iter(&iter);
 
 	for_each_tracing_cpu(cpu) {
-		atomic_inc(&iter.tr->data[cpu]->disabled);
+		atomic_inc(&per_cpu_ptr(iter.trace_buffer->data, cpu)->disabled);
 	}
 
 	old_userobj = trace_flags;
···
 	iter.iter_flags |= TRACE_FILE_LAT_FMT;
 	iter.pos = -1;
 
-	if (cpu_file == TRACE_PIPE_ALL_CPU) {
+	if (cpu_file == RING_BUFFER_ALL_CPUS) {
 		for_each_tracing_cpu(cpu) {
 			iter.buffer_iter[cpu] =
-			ring_buffer_read_prepare(iter.tr->buffer, cpu);
+			ring_buffer_read_prepare(iter.trace_buffer->buffer, cpu);
 			ring_buffer_read_start(iter.buffer_iter[cpu]);
 			tracing_iter_reset(&iter, cpu);
 		}
 	} else {
 		iter.cpu_file = cpu_file;
 		iter.buffer_iter[cpu_file] =
-			ring_buffer_read_prepare(iter.tr->buffer, cpu_file);
+			ring_buffer_read_prepare(iter.trace_buffer->buffer, cpu_file);
 		ring_buffer_read_start(iter.buffer_iter[cpu_file]);
 		tracing_iter_reset(&iter, cpu_file);
 	}
···
 	trace_flags = old_userobj;
 
 	for_each_tracing_cpu(cpu) {
-		atomic_dec(&iter.tr->data[cpu]->disabled);
+		atomic_dec(&per_cpu_ptr(iter.trace_buffer->data, cpu)->disabled);
 	}
 
 	for_each_tracing_cpu(cpu)
···
 		    !cpu_online(cpu_file))
 			return KDB_BADINT;
 	} else {
-		cpu_file = TRACE_PIPE_ALL_CPU;
+		cpu_file = RING_BUFFER_ALL_CPUS;
 	}
 
 	kdb_trap_printk++;
+6 -6
kernel/trace/trace_mmiotrace.c
···
 	overrun_detected = false;
 	prev_overruns = 0;
 
-	tracing_reset_online_cpus(tr);
+	tracing_reset_online_cpus(&tr->trace_buffer);
 }
 
 static int mmio_trace_init(struct trace_array *tr)
···
 static unsigned long count_overruns(struct trace_iterator *iter)
 {
 	unsigned long cnt = atomic_xchg(&dropped_count, 0);
-	unsigned long over = ring_buffer_overruns(iter->tr->buffer);
+	unsigned long over = ring_buffer_overruns(iter->trace_buffer->buffer);
 
 	if (over > prev_overruns)
 		cnt += over - prev_overruns;
···
 				struct mmiotrace_rw *rw)
 {
 	struct ftrace_event_call *call = &event_mmiotrace_rw;
-	struct ring_buffer *buffer = tr->buffer;
+	struct ring_buffer *buffer = tr->trace_buffer.buffer;
 	struct ring_buffer_event *event;
 	struct trace_mmiotrace_rw *entry;
 	int pc = preempt_count();
···
 void mmio_trace_rw(struct mmiotrace_rw *rw)
 {
 	struct trace_array *tr = mmio_trace_array;
-	struct trace_array_cpu *data = tr->data[smp_processor_id()];
+	struct trace_array_cpu *data = per_cpu_ptr(tr->trace_buffer.data, smp_processor_id());
 	__trace_mmiotrace_rw(tr, data, rw);
 }
 
···
 				struct mmiotrace_map *map)
 {
 	struct ftrace_event_call *call = &event_mmiotrace_map;
-	struct ring_buffer *buffer = tr->buffer;
+	struct ring_buffer *buffer = tr->trace_buffer.buffer;
 	struct ring_buffer_event *event;
 	struct trace_mmiotrace_map *entry;
 	int pc = preempt_count();
···
 	struct trace_array_cpu *data;
 
 	preempt_disable();
-	data = tr->data[smp_processor_id()];
+	data = per_cpu_ptr(tr->trace_buffer.data, smp_processor_id());
 	__trace_mmiotrace_map(tr, data, map);
 	preempt_enable();
 }
+110 -9
kernel/trace/trace_output.c
···
 /* must be a power of 2 */
 #define EVENT_HASHSIZE	128
 
-DECLARE_RWSEM(trace_event_mutex);
+DECLARE_RWSEM(trace_event_sem);
 
 static struct hlist_head event_hash[EVENT_HASHSIZE] __read_mostly;
 
···
 	trace_seq_init(s);
 
 	return ret;
+}
+
+enum print_line_t trace_print_bputs_msg_only(struct trace_iterator *iter)
+{
+	struct trace_seq *s = &iter->seq;
+	struct trace_entry *entry = iter->ent;
+	struct bputs_entry *field;
+	int ret;
+
+	trace_assign_type(field, entry);
+
+	ret = trace_seq_puts(s, field->str);
+	if (!ret)
+		return TRACE_TYPE_PARTIAL_LINE;
+
+	return TRACE_TYPE_HANDLED;
 }
 
 enum print_line_t trace_print_bprintk_msg_only(struct trace_iterator *iter)
···
 }
 EXPORT_SYMBOL(ftrace_print_hex_seq);
 
+int ftrace_raw_output_prep(struct trace_iterator *iter,
+			   struct trace_event *trace_event)
+{
+	struct ftrace_event_call *event;
+	struct trace_seq *s = &iter->seq;
+	struct trace_seq *p = &iter->tmp_seq;
+	struct trace_entry *entry;
+	int ret;
+
+	event = container_of(trace_event, struct ftrace_event_call, event);
+	entry = iter->ent;
+
+	if (entry->type != event->event.type) {
+		WARN_ON_ONCE(1);
+		return TRACE_TYPE_UNHANDLED;
+	}
+
+	trace_seq_init(p);
+	ret = trace_seq_printf(s, "%s: ", event->name);
+	if (!ret)
+		return TRACE_TYPE_PARTIAL_LINE;
+
+	return 0;
+}
+EXPORT_SYMBOL(ftrace_raw_output_prep);
+
 #ifdef CONFIG_KRETPROBES
 static inline const char *kretprobed(const char *name)
 {
···
 {
 	unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
 	unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
-	unsigned long long abs_ts = iter->ts - iter->tr->time_start;
+	unsigned long long abs_ts = iter->ts - iter->trace_buffer->time_start;
 	unsigned long long rel_ts = next_ts - iter->ts;
 	struct trace_seq *s = &iter->seq;
···
 
 void trace_event_read_lock(void)
 {
-	down_read(&trace_event_mutex);
+	down_read(&trace_event_sem);
 }
 
 void trace_event_read_unlock(void)
 {
-	up_read(&trace_event_mutex);
+	up_read(&trace_event_sem);
 }
 
 /**
···
 	unsigned key;
 	int ret = 0;
 
-	down_write(&trace_event_mutex);
+	down_write(&trace_event_sem);
 
 	if (WARN_ON(!event))
 		goto out;
···
 
 	ret = event->type;
  out:
-	up_write(&trace_event_mutex);
+	up_write(&trace_event_sem);
 
 	return ret;
 }
 EXPORT_SYMBOL_GPL(register_ftrace_event);
 
 /*
- * Used by module code with the trace_event_mutex held for write.
+ * Used by module code with the trace_event_sem held for write.
  */
 int __unregister_ftrace_event(struct trace_event *event)
 {
···
  */
 int unregister_ftrace_event(struct trace_event *event)
 {
-	down_write(&trace_event_mutex);
+	down_write(&trace_event_sem);
 	__unregister_ftrace_event(event);
-	up_write(&trace_event_mutex);
+	up_write(&trace_event_sem);
 
 	return 0;
 }
···
 	.funcs		= &trace_user_stack_funcs,
 };
 
+/* TRACE_BPUTS */
+static enum print_line_t
+trace_bputs_print(struct trace_iterator *iter, int flags,
+		  struct trace_event *event)
+{
+	struct trace_entry *entry = iter->ent;
+	struct trace_seq *s = &iter->seq;
+	struct bputs_entry *field;
+
+	trace_assign_type(field, entry);
+
+	if (!seq_print_ip_sym(s, field->ip, flags))
+		goto partial;
+
+	if (!trace_seq_puts(s, ": "))
+		goto partial;
+
+	if (!trace_seq_puts(s, field->str))
+		goto partial;
+
+	return TRACE_TYPE_HANDLED;
+
+ partial:
+	return TRACE_TYPE_PARTIAL_LINE;
+}
+
+
+static enum print_line_t
+trace_bputs_raw(struct trace_iterator *iter, int flags,
+		struct trace_event *event)
+{
+	struct bputs_entry *field;
+	struct trace_seq *s = &iter->seq;
+
+	trace_assign_type(field, iter->ent);
+
+	if (!trace_seq_printf(s, ": %lx : ", field->ip))
+		goto partial;
+
+	if (!trace_seq_puts(s, field->str))
+		goto partial;
+
+	return TRACE_TYPE_HANDLED;
+
+ partial:
+	return TRACE_TYPE_PARTIAL_LINE;
+}
+
+static struct trace_event_functions trace_bputs_funcs = {
+	.trace		= trace_bputs_print,
+	.raw		= trace_bputs_raw,
+};
+
+static struct trace_event trace_bputs_event = {
+	.type		= TRACE_BPUTS,
+	.funcs		= &trace_bputs_funcs,
+};
+
 /* TRACE_BPRINT */
 static enum print_line_t
 trace_bprint_print(struct trace_iterator *iter, int flags,
···
 	&trace_wake_event,
 	&trace_stack_event,
 	&trace_user_stack_event,
+	&trace_bputs_event,
 	&trace_bprint_event,
 	&trace_print_event,
 	NULL
+3 -1
kernel/trace/trace_output.h
···
 #include "trace.h"
 
 extern enum print_line_t
+trace_print_bputs_msg_only(struct trace_iterator *iter);
+extern enum print_line_t
 trace_print_bprintk_msg_only(struct trace_iterator *iter);
 extern enum print_line_t
 trace_print_printk_msg_only(struct trace_iterator *iter);
···
 
 /* used by module unregistering */
 extern int __unregister_ftrace_event(struct trace_event *event);
-extern struct rw_semaphore trace_event_mutex;
+extern struct rw_semaphore trace_event_sem;
 
 #define MAX_MEMHEX_BYTES	8
 #define HEX_CHARS		(MAX_MEMHEX_BYTES*2 + 1)
+4 -4
kernel/trace/trace_sched_switch.c
···
 			   unsigned long flags, int pc)
 {
 	struct ftrace_event_call *call = &event_context_switch;
-	struct ring_buffer *buffer = tr->buffer;
+	struct ring_buffer *buffer = tr->trace_buffer.buffer;
 	struct ring_buffer_event *event;
 	struct ctx_switch_entry *entry;
···
 	pc = preempt_count();
 	local_irq_save(flags);
 	cpu = raw_smp_processor_id();
-	data = ctx_trace->data[cpu];
+	data = per_cpu_ptr(ctx_trace->trace_buffer.data, cpu);
 
 	if (likely(!atomic_read(&data->disabled)))
 		tracing_sched_switch_trace(ctx_trace, prev, next, flags, pc);
···
 	struct ftrace_event_call *call = &event_wakeup;
 	struct ring_buffer_event *event;
 	struct ctx_switch_entry *entry;
-	struct ring_buffer *buffer = tr->buffer;
+	struct ring_buffer *buffer = tr->trace_buffer.buffer;
 
 	event = trace_buffer_lock_reserve(buffer, TRACE_WAKE,
 					  sizeof(*entry), flags, pc);
···
 	pc = preempt_count();
 	local_irq_save(flags);
 	cpu = raw_smp_processor_id();
-	data = ctx_trace->data[cpu];
+	data = per_cpu_ptr(ctx_trace->trace_buffer.data, cpu);
 
 	if (likely(!atomic_read(&data->disabled)))
 		tracing_sched_wakeup_trace(ctx_trace, wakee, current,
+66 -23
kernel/trace/trace_sched_wakeup.c
···
 static void wakeup_graph_return(struct ftrace_graph_ret *trace);
 
 static int save_flags;
+static bool function_enabled;
 
 #define TRACE_DISPLAY_GRAPH	1
 
···
 	if (cpu != wakeup_current_cpu)
 		goto out_enable;
 
-	*data = tr->data[cpu];
+	*data = per_cpu_ptr(tr->trace_buffer.data, cpu);
 	disabled = atomic_inc_return(&(*data)->disabled);
 	if (unlikely(disabled != 1))
 		goto out;
···
 };
 #endif /* CONFIG_FUNCTION_TRACER */
 
+static int register_wakeup_function(int graph, int set)
+{
+	int ret;
+
+	/* 'set' is set if TRACE_ITER_FUNCTION is about to be set */
+	if (function_enabled || (!set && !(trace_flags & TRACE_ITER_FUNCTION)))
+		return 0;
+
+	if (graph)
+		ret = register_ftrace_graph(&wakeup_graph_return,
+					    &wakeup_graph_entry);
+	else
+		ret = register_ftrace_function(&trace_ops);
+
+	if (!ret)
+		function_enabled = true;
+
+	return ret;
+}
+
+static void unregister_wakeup_function(int graph)
+{
+	if (!function_enabled)
+		return;
+
+	if (graph)
+		unregister_ftrace_graph();
+	else
+		unregister_ftrace_function(&trace_ops);
+
+	function_enabled = false;
+}
+
+static void wakeup_function_set(int set)
+{
+	if (set)
+		register_wakeup_function(is_graph(), 1);
+	else
+		unregister_wakeup_function(is_graph());
+}
+
+static int wakeup_flag_changed(struct tracer *tracer, u32 mask, int set)
+{
+	if (mask & TRACE_ITER_FUNCTION)
+		wakeup_function_set(set);
+
+	return trace_keep_overwrite(tracer, mask, set);
+}
+
 static int start_func_tracer(int graph)
 {
 	int ret;
 
-	if (!graph)
-		ret = register_ftrace_function(&trace_ops);
-	else
-		ret = register_ftrace_graph(&wakeup_graph_return,
-					    &wakeup_graph_entry);
+	ret = register_wakeup_function(graph, 0);
 
 	if (!ret && tracing_is_enabled())
 		tracer_enabled = 1;
···
 {
 	tracer_enabled = 0;
 
-	if (!graph)
-		unregister_ftrace_function(&trace_ops);
-	else
-		unregister_ftrace_graph();
+	unregister_wakeup_function(graph);
 }
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
···
 
 	/* disable local data, not wakeup_cpu data */
 	cpu = raw_smp_processor_id();
-	disabled = atomic_inc_return(&wakeup_trace->data[cpu]->disabled);
+	disabled = atomic_inc_return(&per_cpu_ptr(wakeup_trace->trace_buffer.data, cpu)->disabled);
 	if (likely(disabled != 1))
 		goto out;
···
 		goto out_unlock;
 
 	/* The task we are waiting for is waking up */
-	data = wakeup_trace->data[wakeup_cpu];
+	data = per_cpu_ptr(wakeup_trace->trace_buffer.data, wakeup_cpu);
 
 	__trace_function(wakeup_trace, CALLER_ADDR0, CALLER_ADDR1, flags, pc);
 	tracing_sched_switch_trace(wakeup_trace, prev, next, flags, pc);
···
 	arch_spin_unlock(&wakeup_lock);
 	local_irq_restore(flags);
 out:
-	atomic_dec(&wakeup_trace->data[cpu]->disabled);
+	atomic_dec(&per_cpu_ptr(wakeup_trace->trace_buffer.data, cpu)->disabled);
 }
 
 static void __wakeup_reset(struct trace_array *tr)
···
 {
 	unsigned long flags;
 
-	tracing_reset_online_cpus(tr);
+	tracing_reset_online_cpus(&tr->trace_buffer);
 
 	local_irq_save(flags);
 	arch_spin_lock(&wakeup_lock);
···
 		return;
 
 	pc = preempt_count();
-	disabled = atomic_inc_return(&wakeup_trace->data[cpu]->disabled);
+	disabled = atomic_inc_return(&per_cpu_ptr(wakeup_trace->trace_buffer.data, cpu)->disabled);
 	if (unlikely(disabled != 1))
 		goto out;
···
 
 	local_save_flags(flags);
 
-	data = wakeup_trace->data[wakeup_cpu];
+	data = per_cpu_ptr(wakeup_trace->trace_buffer.data, wakeup_cpu);
 	data->preempt_timestamp = ftrace_now(cpu);
 	tracing_sched_wakeup_trace(wakeup_trace, p, current, flags, pc);
···
 out_locked:
 	arch_spin_unlock(&wakeup_lock);
 out:
-	atomic_dec(&wakeup_trace->data[cpu]->disabled);
+	atomic_dec(&per_cpu_ptr(wakeup_trace->trace_buffer.data, cpu)->disabled);
 }
 
 static void start_wakeup_tracer(struct trace_array *tr)
···
 	save_flags = trace_flags;
 
 	/* non overwrite screws up the latency tracers */
-	set_tracer_flag(TRACE_ITER_OVERWRITE, 1);
-	set_tracer_flag(TRACE_ITER_LATENCY_FMT, 1);
+	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, 1);
+	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, 1);
 
 	tracing_max_latency = 0;
 	wakeup_trace = tr;
···
 	/* make sure we put back any tasks we are tracing */
 	wakeup_reset(tr);
 
-	set_tracer_flag(TRACE_ITER_LATENCY_FMT, lat_flag);
-	set_tracer_flag(TRACE_ITER_OVERWRITE, overwrite_flag);
+	set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, lat_flag);
+	set_tracer_flag(tr, TRACE_ITER_OVERWRITE, overwrite_flag);
 }
 
 static void wakeup_tracer_start(struct trace_array *tr)
···
 	.print_line	= wakeup_print_line,
 	.flags		= &tracer_flags,
 	.set_flag	= wakeup_set_flag,
-	.flag_changed	= trace_keep_overwrite,
+	.flag_changed	= wakeup_flag_changed,
 #ifdef CONFIG_FTRACE_SELFTEST
 	.selftest	= trace_selftest_startup_wakeup,
 #endif
···
 	.print_line	= wakeup_print_line,
 	.flags		= &tracer_flags,
 	.set_flag	= wakeup_set_flag,
-	.flag_changed	= trace_keep_overwrite,
+	.flag_changed	= wakeup_flag_changed,
 #ifdef CONFIG_FTRACE_SELFTEST
 	.selftest	= trace_selftest_startup_wakeup,
 #endif
+26 -25
kernel/trace/trace_selftest.c
···
 	return 0;
 }

-static int trace_test_buffer_cpu(struct trace_array *tr, int cpu)
+static int trace_test_buffer_cpu(struct trace_buffer *buf, int cpu)
 {
 	struct ring_buffer_event *event;
 	struct trace_entry *entry;
 	unsigned int loops = 0;

-	while ((event = ring_buffer_consume(tr->buffer, cpu, NULL, NULL))) {
+	while ((event = ring_buffer_consume(buf->buffer, cpu, NULL, NULL))) {
 		entry = ring_buffer_event_data(event);

 		/*
···
 * Test the trace buffer to see if all the elements
 * are still sane.
 */
-static int trace_test_buffer(struct trace_array *tr, unsigned long *count)
+static int trace_test_buffer(struct trace_buffer *buf, unsigned long *count)
 {
 	unsigned long flags, cnt = 0;
 	int cpu, ret = 0;
···
 	local_irq_save(flags);
 	arch_spin_lock(&ftrace_max_lock);

-	cnt = ring_buffer_entries(tr->buffer);
+	cnt = ring_buffer_entries(buf->buffer);

 	/*
 	 * The trace_test_buffer_cpu runs a while loop to consume all data.
···
 	 */
 	tracing_off();
 	for_each_possible_cpu(cpu) {
-		ret = trace_test_buffer_cpu(tr, cpu);
+		ret = trace_test_buffer_cpu(buf, cpu);
 		if (ret)
 			break;
 	}
···
 	msleep(100);

 	/* we should have nothing in the buffer */
-	ret = trace_test_buffer(tr, &count);
+	ret = trace_test_buffer(&tr->trace_buffer, &count);
 	if (ret)
 		goto out;
···
 	ftrace_enabled = 0;

 	/* check the trace buffer */
-	ret = trace_test_buffer(tr, &count);
+	ret = trace_test_buffer(&tr->trace_buffer, &count);
 	tracing_start();

 	/* we should only have one item */
···
 	ftrace_enabled = 0;

 	/* check the trace buffer */
-	ret = trace_test_buffer(tr, &count);
+	ret = trace_test_buffer(&tr->trace_buffer, &count);
 	trace->reset(tr);
 	tracing_start();
···
 /* Maximum number of functions to trace before diagnosing a hang */
 #define GRAPH_MAX_FUNC_TEST	100000000

-static void
-__ftrace_dump(bool disable_tracing, enum ftrace_dump_mode oops_dump_mode);
 static unsigned int graph_hang_thresh;

 /* Wrap the real function entry probe to avoid possible hanging */
···
 	if (unlikely(++graph_hang_thresh > GRAPH_MAX_FUNC_TEST)) {
 		ftrace_graph_stop();
 		printk(KERN_WARNING "BUG: Function graph tracer hang!\n");
-		if (ftrace_dump_on_oops)
-			__ftrace_dump(false, DUMP_ALL);
+		if (ftrace_dump_on_oops) {
+			ftrace_dump(DUMP_ALL);
+			/* ftrace_dump() disables tracing */
+			tracing_on();
+		}
 		return 0;
 	}
···
 	 * Simulate the init() callback but we attach a watchdog callback
 	 * to detect and recover from possible hangs
 	 */
-	tracing_reset_online_cpus(tr);
+	tracing_reset_online_cpus(&tr->trace_buffer);
 	set_graph_array(tr);
 	ret = register_ftrace_graph(&trace_graph_return,
 				    &trace_graph_entry_watchdog);
···
 	tracing_stop();

 	/* check the trace buffer */
-	ret = trace_test_buffer(tr, &count);
+	ret = trace_test_buffer(&tr->trace_buffer, &count);

 	trace->reset(tr);
 	tracing_start();
···
 	/* stop the tracing. */
 	tracing_stop();
 	/* check both trace buffers */
-	ret = trace_test_buffer(tr, NULL);
+	ret = trace_test_buffer(&tr->trace_buffer, NULL);
 	if (!ret)
-		ret = trace_test_buffer(&max_tr, &count);
+		ret = trace_test_buffer(&tr->max_buffer, &count);
 	trace->reset(tr);
 	tracing_start();
···
 	/* stop the tracing. */
 	tracing_stop();
 	/* check both trace buffers */
-	ret = trace_test_buffer(tr, NULL);
+	ret = trace_test_buffer(&tr->trace_buffer, NULL);
 	if (!ret)
-		ret = trace_test_buffer(&max_tr, &count);
+		ret = trace_test_buffer(&tr->max_buffer, &count);
 	trace->reset(tr);
 	tracing_start();
···
 	/* stop the tracing. */
 	tracing_stop();
 	/* check both trace buffers */
-	ret = trace_test_buffer(tr, NULL);
+	ret = trace_test_buffer(&tr->trace_buffer, NULL);
 	if (ret)
 		goto out;

-	ret = trace_test_buffer(&max_tr, &count);
+	ret = trace_test_buffer(&tr->max_buffer, &count);
 	if (ret)
 		goto out;
···
 	/* stop the tracing. */
 	tracing_stop();
 	/* check both trace buffers */
-	ret = trace_test_buffer(tr, NULL);
+	ret = trace_test_buffer(&tr->trace_buffer, NULL);
 	if (ret)
 		goto out;

-	ret = trace_test_buffer(&max_tr, &count);
+	ret = trace_test_buffer(&tr->max_buffer, &count);

 	if (!ret && !count) {
 		printk(KERN_CONT ".. no entries found ..");
···
 	/* stop the tracing. */
 	tracing_stop();
 	/* check both trace buffers */
-	ret = trace_test_buffer(tr, NULL);
+	ret = trace_test_buffer(&tr->trace_buffer, NULL);
 	printk("ret = %d\n", ret);
 	if (!ret)
-		ret = trace_test_buffer(&max_tr, &count);
+		ret = trace_test_buffer(&tr->max_buffer, &count);

 	trace->reset(tr);
···
 	/* stop the tracing. */
 	tracing_stop();
 	/* check the trace buffer */
-	ret = trace_test_buffer(tr, &count);
+	ret = trace_test_buffer(&tr->trace_buffer, &count);
 	trace->reset(tr);
 	tracing_start();
+69 -7
kernel/trace/trace_stack.c
···
 #define STACK_TRACE_ENTRIES 500

+#ifdef CC_USING_FENTRY
+# define fentry		1
+#else
+# define fentry		0
+#endif
+
 static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
 	 { [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };
 static unsigned stack_dump_index[STACK_TRACE_ENTRIES];

+/*
+ * Reserve one entry for the passed in ip. This will allow
+ * us to remove most or all of the stack size overhead
+ * added by the stack tracer itself.
+ */
 static struct stack_trace max_stack_trace = {
-	.max_entries		= STACK_TRACE_ENTRIES,
-	.entries		= stack_dump_trace,
+	.max_entries		= STACK_TRACE_ENTRIES - 1,
+	.entries		= &stack_dump_trace[1],
 };

 static unsigned long max_stack_size;
···
 int stack_tracer_enabled;
 static int last_stack_tracer_enabled;

-static inline void check_stack(void)
+static inline void
+check_stack(unsigned long ip, unsigned long *stack)
 {
 	unsigned long this_size, flags;
 	unsigned long *p, *top, *start;
+	static int tracer_frame;
+	int frame_size = ACCESS_ONCE(tracer_frame);
 	int i;

-	this_size = ((unsigned long)&this_size) & (THREAD_SIZE-1);
+	this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
 	this_size = THREAD_SIZE - this_size;
+	/* Remove the frame of the tracer */
+	this_size -= frame_size;

 	if (this_size <= max_stack_size)
 		return;

 	/* we do not handle interrupt stacks yet */
-	if (!object_is_on_stack(&this_size))
+	if (!object_is_on_stack(stack))
 		return;

 	local_irq_save(flags);
 	arch_spin_lock(&max_stack_lock);

+	/* In case another CPU set the tracer_frame on us */
+	if (unlikely(!frame_size))
+		this_size -= tracer_frame;
+
 	/* a race could have already updated it */
 	if (this_size <= max_stack_size)
···
 	save_stack_trace(&max_stack_trace);

 	/*
+	 * Add the passed in ip from the function tracer.
+	 * Searching for this on the stack will skip over
+	 * most of the overhead from the stack tracer itself.
+	 */
+	stack_dump_trace[0] = ip;
+	max_stack_trace.nr_entries++;
+
+	/*
 	 * Now find where in the stack these are.
 	 */
 	i = 0;
-	start = &this_size;
+	start = stack;
 	top = (unsigned long *)
 		(((unsigned long)start & ~(THREAD_SIZE-1)) + THREAD_SIZE);
···
 			found = 1;
 			/* Start the search from here */
 			start = p + 1;
+			/*
+			 * We do not want to show the overhead
+			 * of the stack tracer stack in the
+			 * max stack. If we haven't figured
+			 * out what that is, then figure it out
+			 * now.
+			 */
+			if (unlikely(!tracer_frame) && i == 1) {
+				tracer_frame = (p - stack) *
+					sizeof(unsigned long);
+				max_stack_size -= tracer_frame;
+			}
 		}
 	}
···
 stack_trace_call(unsigned long ip, unsigned long parent_ip,
 		 struct ftrace_ops *op, struct pt_regs *pt_regs)
 {
+	unsigned long stack;
 	int cpu;

 	preempt_disable_notrace();
···
 	if (per_cpu(trace_active, cpu)++ != 0)
 		goto out;

-	check_stack();
+	/*
+	 * When fentry is used, the traced function does not get
+	 * its stack frame set up, and we lose the parent.
+	 * The ip is pretty useless because the function tracer
+	 * was called before that function set up its stack frame.
+	 * In this case, we use the parent ip.
+	 *
+	 * By adding the return address of either the parent ip
+	 * or the current ip we can disregard most of the stack usage
+	 * caused by the stack tracer itself.
+	 *
+	 * The function tracer always reports the address of where the
+	 * mcount call was, but the stack will hold the return address.
+	 */
+	if (fentry)
+		ip = parent_ip;
+	else
+		ip += MCOUNT_INSN_SIZE;
+
+	check_stack(ip, &stack);

 out:
 	per_cpu(trace_active, cpu)--;
···
 	struct dentry *d_tracer;

 	d_tracer = tracing_init_dentry();
+	if (!d_tracer)
+		return 0;

 	trace_create_file("stack_max_size", 0644, d_tracer,
 			&max_stack_size, &stack_max_size_fops);
+2
kernel/trace/trace_stat.c
···
 	struct dentry *d_tracing;

 	d_tracing = tracing_init_dentry();
+	if (!d_tracing)
+		return 0;

 	stat_dir = debugfs_create_dir("trace_stat", d_tracing);
 	if (!stat_dir)
+51 -39
kernel/trace/trace_syscalls.c
···
 #include "trace.h"

 static DEFINE_MUTEX(syscall_trace_lock);
-static int sys_refcount_enter;
-static int sys_refcount_exit;
-static DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
-static DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);

 static int syscall_enter_register(struct ftrace_event_call *event,
 				  enum trace_reg type, void *data);
···
 	/*
 	 * Only compare after the "sys" prefix. Archs that use
 	 * syscall wrappers may have syscalls symbols aliases prefixed
-	 * with "SyS" instead of "sys", leading to an unwanted
+	 * with ".SyS" or ".sys" instead of "sys", leading to an unwanted
 	 * mismatch.
 	 */
 	return !strcmp(sym + 3, name + 3);
···
 	kfree(call->print_fmt);
 }

-static int syscall_enter_define_fields(struct ftrace_event_call *call)
+static int __init syscall_enter_define_fields(struct ftrace_event_call *call)
 {
 	struct syscall_trace_enter trace;
 	struct syscall_metadata *meta = call->data;
···
 	return ret;
 }

-static int syscall_exit_define_fields(struct ftrace_event_call *call)
+static int __init syscall_exit_define_fields(struct ftrace_event_call *call)
 {
 	struct syscall_trace_exit trace;
 	int ret;
···
 	return ret;
 }

-static void ftrace_syscall_enter(void *ignore, struct pt_regs *regs, long id)
+static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 {
+	struct trace_array *tr = data;
 	struct syscall_trace_enter *entry;
 	struct syscall_metadata *sys_data;
 	struct ring_buffer_event *event;
···
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0)
 		return;
-	if (!test_bit(syscall_nr, enabled_enter_syscalls))
+	if (!test_bit(syscall_nr, tr->enabled_enter_syscalls))
 		return;

 	sys_data = syscall_nr_to_meta(syscall_nr);
···
 	size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;

-	event = trace_current_buffer_lock_reserve(&buffer,
+	buffer = tr->trace_buffer.buffer;
+	event = trace_buffer_lock_reserve(buffer,
 			sys_data->enter_event->event.type, size, 0, 0);
 	if (!event)
 		return;
···
 	trace_current_buffer_unlock_commit(buffer, event, 0, 0);
 }

-static void ftrace_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
+static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 {
+	struct trace_array *tr = data;
 	struct syscall_trace_exit *entry;
 	struct syscall_metadata *sys_data;
 	struct ring_buffer_event *event;
···
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0)
 		return;
-	if (!test_bit(syscall_nr, enabled_exit_syscalls))
+	if (!test_bit(syscall_nr, tr->enabled_exit_syscalls))
 		return;

 	sys_data = syscall_nr_to_meta(syscall_nr);
 	if (!sys_data)
 		return;

-	event = trace_current_buffer_lock_reserve(&buffer,
+	buffer = tr->trace_buffer.buffer;
+	event = trace_buffer_lock_reserve(buffer,
 			sys_data->exit_event->event.type, sizeof(*entry), 0, 0);
 	if (!event)
 		return;
···
 	trace_current_buffer_unlock_commit(buffer, event, 0, 0);
 }

-static int reg_event_syscall_enter(struct ftrace_event_call *call)
+static int reg_event_syscall_enter(struct ftrace_event_file *file,
+				   struct ftrace_event_call *call)
 {
+	struct trace_array *tr = file->tr;
 	int ret = 0;
 	int num;
···
 	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
-	if (!sys_refcount_enter)
-		ret = register_trace_sys_enter(ftrace_syscall_enter, NULL);
+	if (!tr->sys_refcount_enter)
+		ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
 	if (!ret) {
-		set_bit(num, enabled_enter_syscalls);
-		sys_refcount_enter++;
+		set_bit(num, tr->enabled_enter_syscalls);
+		tr->sys_refcount_enter++;
 	}
 	mutex_unlock(&syscall_trace_lock);
 	return ret;
 }

-static void unreg_event_syscall_enter(struct ftrace_event_call *call)
+static void unreg_event_syscall_enter(struct ftrace_event_file *file,
+				      struct ftrace_event_call *call)
 {
+	struct trace_array *tr = file->tr;
 	int num;

 	num = ((struct syscall_metadata *)call->data)->syscall_nr;
 	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return;
 	mutex_lock(&syscall_trace_lock);
-	sys_refcount_enter--;
-	clear_bit(num, enabled_enter_syscalls);
-	if (!sys_refcount_enter)
-		unregister_trace_sys_enter(ftrace_syscall_enter, NULL);
+	tr->sys_refcount_enter--;
+	clear_bit(num, tr->enabled_enter_syscalls);
+	if (!tr->sys_refcount_enter)
+		unregister_trace_sys_enter(ftrace_syscall_enter, tr);
 	mutex_unlock(&syscall_trace_lock);
 }

-static int reg_event_syscall_exit(struct ftrace_event_call *call)
+static int reg_event_syscall_exit(struct ftrace_event_file *file,
+				  struct ftrace_event_call *call)
 {
+	struct trace_array *tr = file->tr;
 	int ret = 0;
 	int num;
···
 	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
-	if (!sys_refcount_exit)
-		ret = register_trace_sys_exit(ftrace_syscall_exit, NULL);
+	if (!tr->sys_refcount_exit)
+		ret = register_trace_sys_exit(ftrace_syscall_exit, tr);
 	if (!ret) {
-		set_bit(num, enabled_exit_syscalls);
-		sys_refcount_exit++;
+		set_bit(num, tr->enabled_exit_syscalls);
+		tr->sys_refcount_exit++;
 	}
 	mutex_unlock(&syscall_trace_lock);
 	return ret;
 }

-static void unreg_event_syscall_exit(struct ftrace_event_call *call)
+static void unreg_event_syscall_exit(struct ftrace_event_file *file,
+				     struct ftrace_event_call *call)
 {
+	struct trace_array *tr = file->tr;
 	int num;

 	num = ((struct syscall_metadata *)call->data)->syscall_nr;
 	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return;
 	mutex_lock(&syscall_trace_lock);
-	sys_refcount_exit--;
-	clear_bit(num, enabled_exit_syscalls);
-	if (!sys_refcount_exit)
-		unregister_trace_sys_exit(ftrace_syscall_exit, NULL);
+	tr->sys_refcount_exit--;
+	clear_bit(num, tr->enabled_exit_syscalls);
+	if (!tr->sys_refcount_exit)
+		unregister_trace_sys_exit(ftrace_syscall_exit, tr);
 	mutex_unlock(&syscall_trace_lock);
 }
···
 	.trace		= print_syscall_exit,
 };

-struct ftrace_event_class event_class_syscall_enter = {
+struct ftrace_event_class __refdata event_class_syscall_enter = {
 	.system		= "syscalls",
 	.reg		= syscall_enter_register,
 	.define_fields	= syscall_enter_define_fields,
···
 	.raw_init	= init_syscall_trace,
 };

-struct ftrace_event_class event_class_syscall_exit = {
+struct ftrace_event_class __refdata event_class_syscall_exit = {
 	.system		= "syscalls",
 	.reg		= syscall_exit_register,
 	.define_fields	= syscall_exit_define_fields,
···
 static int syscall_enter_register(struct ftrace_event_call *event,
 				  enum trace_reg type, void *data)
 {
+	struct ftrace_event_file *file = data;
+
 	switch (type) {
 	case TRACE_REG_REGISTER:
-		return reg_event_syscall_enter(event);
+		return reg_event_syscall_enter(file, event);
 	case TRACE_REG_UNREGISTER:
-		unreg_event_syscall_enter(event);
+		unreg_event_syscall_enter(file, event);
 		return 0;

 #ifdef CONFIG_PERF_EVENTS
···
 static int syscall_exit_register(struct ftrace_event_call *event,
 				  enum trace_reg type, void *data)
 {
+	struct ftrace_event_file *file = data;
+
 	switch (type) {
 	case TRACE_REG_REGISTER:
-		return reg_event_syscall_exit(event);
+		return reg_event_syscall_exit(file, event);
 	case TRACE_REG_UNREGISTER:
-		unreg_event_syscall_exit(event);
+		unreg_event_syscall_exit(file, event);
 		return 0;

 #ifdef CONFIG_PERF_EVENTS
+13 -8
kernel/tracepoint.c
···
 	int nr_probes = 0;
 	struct tracepoint_func *old, *new;

-	WARN_ON(!probe);
+	if (WARN_ON(!probe))
+		return ERR_PTR(-EINVAL);

 	debug_print_probes(entry);
 	old = entry->funcs;
···
 	debug_print_probes(entry);
 	/* (N -> M), (N > 1, M >= 0) probes */
-	for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
-		if (!probe ||
-		    (old[nr_probes].func == probe &&
-		     old[nr_probes].data == data))
-			nr_del++;
+	if (probe) {
+		for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
+			if (old[nr_probes].func == probe &&
+			    old[nr_probes].data == data)
+				nr_del++;
+		}
 	}

+	/*
+	 * If probe is NULL, then nr_probes = nr_del = 0, and then the
+	 * entire entry will be removed.
+	 */
 	if (nr_probes - nr_del == 0) {
 		/* N -> 0, (N > 1) */
 		entry->funcs = NULL;
···
 		if (new == NULL)
 			return ERR_PTR(-ENOMEM);
 		for (i = 0; old[i].func; i++)
-			if (probe &&
-			    (old[i].func != probe || old[i].data != data))
+			if (old[i].func != probe || old[i].data != data)
 				new[j++] = old[i];
 		new[nr_probes - nr_del].func = NULL;
 		entry->refcount = nr_probes - nr_del;