Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'trace-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:
"A few more minor updates:

- Show the tgid mappings for user space trace tools to use

- Fix and optimize the comm and tgid cache recording

- Sanitize derived kprobe names

- Ftrace selftest updates

- trace file header fix

- Update of Documentation/trace/ftrace.txt

- Compiler warning fixes

- Fix possible uninitialized variable"

* tag 'trace-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix uninitialized variable in match_records()
ftrace: Remove an unneeded NULL check
ftrace: Hide cached module code for !CONFIG_MODULES
tracing: Do note expose stack_trace_filter without DYNAMIC_FTRACE
tracing: Update Documentation/trace/ftrace.txt
tracing: Fixup trace file header alignment
selftests/ftrace: Add a testcase for kprobe event naming
selftests/ftrace: Add a test to probe module functions
selftests/ftrace: Update multiple kprobes test for powerpc
trace/kprobes: Sanitize derived event names
tracing: Attempt to record other information even if some fail
tracing: Treat recording tgid for idle task as a success
tracing: Treat recording comm for idle task as a success
tracing: Add saved_tgids file to show cached pid to tgid mappings

+603 -132
+395 -111
Documentation/trace/ftrace.txt
··· 5 5 Author: Steven Rostedt <srostedt@redhat.com> 6 6 License: The GNU Free Documentation License, Version 1.2 7 7 (dual licensed under the GPL v2) 8 - Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, 9 - John Kacur, and David Teigland. 8 + Original Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, 9 + John Kacur, and David Teigland. 10 10 Written for: 2.6.28-rc2 11 11 Updated for: 3.10 12 + Updated for: 4.13 - Copyright 2017 VMware Inc. Steven Rostedt 12 13 13 14 Introduction 14 15 ------------ ··· 27 26 28 27 One of the most common uses of ftrace is the event tracing. 29 28 Through out the kernel is hundreds of static event points that 30 - can be enabled via the debugfs file system to see what is 29 + can be enabled via the tracefs file system to see what is 31 30 going on in certain parts of the kernel. 31 + 32 + See events.txt for more information. 32 33 33 34 34 35 Implementation Details ··· 42 39 The File System 43 40 --------------- 44 41 45 - Ftrace uses the debugfs file system to hold the control files as 42 + Ftrace uses the tracefs file system to hold the control files as 46 43 well as the files to display output. 47 44 48 - When debugfs is configured into the kernel (which selecting any ftrace 49 - option will do) the directory /sys/kernel/debug will be created. To mount 45 + When tracefs is configured into the kernel (which selecting any ftrace 46 + option will do) the directory /sys/kernel/tracing will be created. 
To mount 50 47 this directory, you can add to your /etc/fstab file: 51 48 52 - debugfs /sys/kernel/debug debugfs defaults 0 0 49 + tracefs /sys/kernel/tracing tracefs defaults 0 0 53 50 54 51 Or you can mount it at run time with: 55 52 56 - mount -t debugfs nodev /sys/kernel/debug 53 + mount -t tracefs nodev /sys/kernel/tracing 57 54 58 55 For quicker access to that directory you may want to make a soft link to 59 56 it: 60 57 61 - ln -s /sys/kernel/debug /debug 58 + ln -s /sys/kernel/tracing /tracing 62 59 63 - Any selected ftrace option will also create a directory called tracing 64 - within the debugfs. The rest of the document will assume that you are in 65 - the ftrace directory (cd /sys/kernel/debug/tracing) and will only concentrate 66 - on the files within that directory and not distract from the content with 67 - the extended "/sys/kernel/debug/tracing" path name. 60 + *** NOTICE *** 61 + 62 + Before 4.1, all ftrace tracing control files were within the debugfs 63 + file system, which is typically located at /sys/kernel/debug/tracing. 64 + For backward compatibility, when mounting the debugfs file system, 65 + the tracefs file system will be automatically mounted at: 66 + 67 + /sys/kernel/debug/tracing 68 + 69 + All files located in the tracefs file system will be located in that 70 + debugfs file system directory as well. 71 + 72 + *** NOTICE *** 73 + 74 + Any selected ftrace option will also create the tracefs file system. 75 + The rest of the document will assume that you are in the ftrace directory 76 + (cd /sys/kernel/tracing) and will only concentrate on the files within that 77 + directory and not distract from the content with the extended 78 + "/sys/kernel/tracing" path name. 68 79 69 80 That's it! (assuming that you have ftrace configured into your kernel) 70 81 71 - After mounting debugfs, you can see a directory called 72 - "tracing". 
This directory contains the control and output files 82 + After mounting tracefs you will have access to the control and output files 73 83 of ftrace. Here is a list of some of the key files: 74 84 75 85 ··· 108 92 writing to the ring buffer, the tracing overhead may 109 93 still be occurring. 110 94 95 + The kernel function tracing_off() can be used within the 96 + kernel to disable writing to the ring buffer, which will 97 + set this file to "0". User space can re-enable tracing by 98 + echoing "1" into the file. 99 + 100 + Note, the function and event trigger "traceoff" will also 101 + set this file to zero and stop tracing. Which can also 102 + be re-enabled by user space using this file. 103 + 111 104 trace: 112 105 113 106 This file holds the output of the trace in a human 114 - readable format (described below). 107 + readable format (described below). Note, tracing is temporarily 108 + disabled while this file is being read (opened). 115 109 116 110 trace_pipe: 117 111 ··· 135 109 will not be read again with a sequential read. The 136 110 "trace" file is static, and if the tracer is not 137 111 adding more data, it will display the same 138 - information every time it is read. 112 + information every time it is read. This file will not 113 + disable tracing while being read. 139 114 140 115 trace_options: 141 116 ··· 155 128 tracing_max_latency: 156 129 157 130 Some of the tracers record the max latency. 158 - For example, the time interrupts are disabled. 159 - This time is saved in this file. The max trace 160 - will also be stored, and displayed by "trace". 161 - A new max trace will only be recorded if the 162 - latency is greater than the value in this 163 - file. (in microseconds) 131 + For example, the maximum time that interrupts are disabled. 132 + The maximum time is saved in this file. The max trace will also be 133 + stored, and displayed by "trace". 
A new max trace will only be 134 + recorded if the latency is greater than the value in this file 135 + (in microseconds). 136 + 137 + By echoing in a time into this file, no latency will be recorded 138 + unless it is greater than the time in this file. 164 139 165 140 tracing_thresh: 166 141 ··· 181 152 that the kernel uses for allocation, usually 4 KB in size). 182 153 If the last page allocated has room for more bytes 183 154 than requested, the rest of the page will be used, 184 - making the actual allocation bigger than requested. 155 + making the actual allocation bigger than requested or shown. 185 156 ( Note, the size may not be a multiple of the page size 186 157 due to buffer management meta-data. ) 158 + 159 + Buffer sizes for individual CPUs may vary 160 + (see "per_cpu/cpu0/buffer_size_kb" below), and if they do 161 + this file will show "X". 187 162 188 163 buffer_total_size_kb: 189 164 ··· 195 162 196 163 free_buffer: 197 164 198 - If a process is performing the tracing, and the ring buffer 199 - should be shrunk "freed" when the process is finished, even 200 - if it were to be killed by a signal, this file can be used 201 - for that purpose. On close of this file, the ring buffer will 202 - be resized to its minimum size. Having a process that is tracing 203 - also open this file, when the process exits its file descriptor 204 - for this file will be closed, and in doing so, the ring buffer 205 - will be "freed". 165 + If a process is performing tracing, and the ring buffer should be 166 + shrunk "freed" when the process is finished, even if it were to be 167 + killed by a signal, this file can be used for that purpose. On close 168 + of this file, the ring buffer will be resized to its minimum size. 169 + Having a process that is tracing also open this file, when the process 170 + exits its file descriptor for this file will be closed, and in doing so, 171 + the ring buffer will be "freed". 
206 172 207 173 It may also stop tracing if disable_on_free option is set. 208 174 209 175 tracing_cpumask: 210 176 211 - This is a mask that lets the user only trace 212 - on specified CPUs. The format is a hex string 213 - representing the CPUs. 177 + This is a mask that lets the user only trace on specified CPUs. 178 + The format is a hex string representing the CPUs. 214 179 215 180 set_ftrace_filter: 216 181 ··· 220 189 has a side effect of enabling or disabling specific functions 221 190 to be traced. Echoing names of functions into this file 222 191 will limit the trace to only those functions. 192 + 193 + The functions listed in "available_filter_functions" are what 194 + can be written into this file. 223 195 224 196 This interface also allows for commands to be used. See the 225 197 "Filter commands" section for more details. ··· 236 202 237 203 set_ftrace_pid: 238 204 239 - Have the function tracer only trace a single thread. 205 + Have the function tracer only trace the threads whose PID are 206 + listed in this file. 207 + 208 + If the "function-fork" option is set, then when a task whose 209 + PID is listed in this file forks, the child's PID will 210 + automatically be added to this file, and the child will be 211 + traced by the function tracer as well. This option will also 212 + cause PIDs of tasks that exit to be removed from the file. 240 213 241 214 set_event_pid: 242 215 ··· 258 217 259 218 set_graph_function: 260 219 261 - Set a "trigger" function where tracing should start 262 - with the function graph tracer (See the section 263 - "dynamic ftrace" for more details). 220 + Functions listed in this file will cause the function graph 221 + tracer to only trace these functions and the functions that 222 + they call. (See the section "dynamic ftrace" for more details). 223 + 224 + set_graph_notrace: 225 + 226 + Similar to set_graph_function, but will disable function graph 227 + tracing when the function is hit until it exits the function. 
228 + This makes it possible to ignore tracing functions that are called 229 + by a specific function. 264 230 265 231 available_filter_functions: 266 232 267 - This lists the functions that ftrace 268 - has processed and can trace. These are the function 269 - names that you can pass to "set_ftrace_filter" or 270 - "set_ftrace_notrace". (See the section "dynamic ftrace" 271 - below for more details.) 233 + This lists the functions that ftrace has processed and can trace. 234 + These are the function names that you can pass to 235 + "set_ftrace_filter" or "set_ftrace_notrace". 236 + (See the section "dynamic ftrace" below for more details.) 237 + 238 + dyn_ftrace_total_info: 239 + 240 + This file is for debugging purposes. The number of functions that 241 + have been converted to nops and are available to be traced. 272 242 273 243 enabled_functions: 274 244 ··· 302 250 an 'I' will be displayed on the same line as the function that 303 251 can be overridden. 304 252 253 + If the architecture supports it, it will also show what callback 254 + is being directly called by the function. If the count is greater 255 + than 1 it most likely will be ftrace_ops_list_func(). 256 + 257 + If the callback of the function jumps to a trampoline that is 258 + specific to a the callback and not the standard trampoline, 259 + its address will be printed as well as the function that the 260 + trampoline calls. 261 + 305 262 function_profile_enabled: 306 263 307 264 When set it will enable all functions with either the function 308 - tracer, or if enabled, the function graph tracer. It will 265 + tracer, or if configured, the function graph tracer. It will 309 266 keep a histogram of the number of functions that were called 310 - and if run with the function graph tracer, it will also keep 267 + and if the function graph tracer was configured, it will also keep 311 268 track of the time spent in those functions. 
The histogram 312 269 content can be displayed in the files: 313 270 ··· 344 283 printk_formats: 345 284 346 285 This is for tools that read the raw format files. If an event in 347 - the ring buffer references a string (currently only trace_printk() 348 - does this), only a pointer to the string is recorded into the buffer 349 - and not the string itself. This prevents tools from knowing what 350 - that string was. This file displays the string and address for 351 - the string allowing tools to map the pointers to what the 352 - strings were. 286 + the ring buffer references a string, only a pointer to the string 287 + is recorded into the buffer and not the string itself. This prevents 288 + tools from knowing what that string was. This file displays the string 289 + and address for the string allowing tools to map the pointers to what 290 + the strings were. 353 291 354 292 saved_cmdlines: 355 293 ··· 357 297 makes a cache of pid mappings to comms to try to display 358 298 comms for events. If a pid for a comm is not listed, then 359 299 "<...>" is displayed in the output. 300 + 301 + If the option "record-cmd" is set to "0", then comms of tasks 302 + will not be saved during recording. By default, it is enabled. 303 + 304 + saved_cmdlines_size: 305 + 306 + By default, 128 comms are saved (see "saved_cmdlines" above). To 307 + increase or decrease the amount of comms that are cached, echo 308 + in a the number of comms to cache, into this file. 309 + 310 + saved_tgids: 311 + 312 + If the option "record-tgid" is set, on each scheduling context switch 313 + the Task Group ID of a task is saved in a table mapping the PID of 314 + the thread to its TGID. By default, the "record-tgid" option is 315 + disabled. 360 316 361 317 snapshot: 362 318 ··· 411 335 412 336 # cat trace_clock 413 337 [local] global counter x86-tsc 338 + 339 + The clock with the square brackets around it is the one 340 + in effect. 
414 341 415 342 local: Default clock, but may not be in sync across CPUs 416 343 ··· 527 448 528 449 See events.txt for more information. 529 450 451 + set_event: 452 + 453 + By echoing in the event into this file, will enable that event. 454 + 455 + See events.txt for more information. 456 + 457 + available_events: 458 + 459 + A list of events that can be enabled in tracing. 460 + 461 + See events.txt for more information. 462 + 463 + hwlat_detector: 464 + 465 + Directory for the Hardware Latency Detector. 466 + See "Hardware Latency Detector" section below. 467 + 530 468 per_cpu: 531 469 532 470 This is a directory that contains the trace per_cpu information. ··· 635 539 to draw a graph of function calls similar to C code 636 540 source. 637 541 542 + "blk" 543 + 544 + The block tracer. The tracer used by the blktrace user 545 + application. 546 + 547 + "hwlat" 548 + 549 + The Hardware Latency tracer is used to detect if the hardware 550 + produces any latency. See "Hardware Latency Detector" section 551 + below. 552 + 638 553 "irqsoff" 639 554 640 555 Traces the areas that disable interrupts and saves 641 556 the trace with the longest max latency. 642 557 See tracing_max_latency. When a new max is recorded, 643 558 it replaces the old trace. It is best to view this 644 - trace with the latency-format option enabled. 559 + trace with the latency-format option enabled, which 560 + happens automatically when the tracer is selected. 645 561 646 562 "preemptoff" 647 563 ··· 679 571 RT tasks (as the current "wakeup" does). This is useful 680 572 for those interested in wake up timings of RT tasks. 681 573 574 + "wakeup_dl" 575 + 576 + Traces and records the max latency that it takes for 577 + a SCHED_DEADLINE task to be woken (as the "wakeup" and 578 + "wakeup_rt" does). 579 + 580 + "mmiotrace" 581 + 582 + A special tracer that is used to trace binary module. 583 + It will trace all the calls that a module makes to the 584 + hardware. 
Everything it writes and reads from the I/O 585 + as well. 586 + 587 + "branch" 588 + 589 + This tracer can be configured when tracing likely/unlikely 590 + calls within the kernel. It will trace when a likely and 591 + unlikely branch is hit and if it was correct in its prediction 592 + of being correct. 593 + 682 594 "nop" 683 595 684 596 This is the "trace nothing" tracer. To remove all ··· 710 582 ---------------------------- 711 583 712 584 Here are typical examples of using the tracers when controlling 713 - them only with the debugfs interface (without using any 585 + them only with the tracefs interface (without using any 714 586 user-land utilities). 715 587 716 588 Output format: ··· 802 674 This shows that the current tracer is "irqsoff" tracing the time 803 675 for which interrupts were disabled. It gives the trace version (which 804 676 never changes) and the version of the kernel upon which this was executed on 805 - (3.10). Then it displays the max latency in microseconds (259 us). The number 677 + (3.8). Then it displays the max latency in microseconds (259 us). The number 806 678 of trace entries displayed and the total number (both are four: #4/4). 807 679 VP, KP, SP, and HP are always zero and are reserved for later use. 808 680 #P is the number of online CPUs (#P:4). ··· 837 709 '.' otherwise. 838 710 839 711 hardirq/softirq: 712 + 'Z' - NMI occurred inside a hardirq 713 + 'z' - NMI is running 840 714 'H' - hard irq occurred inside a softirq. 
841 715 'h' - hard irq is running 842 716 's' - soft irq is running ··· 887 757 nobin 888 758 noblock 889 759 trace_printk 890 - nobranch 891 760 annotate 892 761 nouserstacktrace 893 762 nosym-userobj 894 763 noprintk-msg-only 895 764 context-info 896 765 nolatency-format 897 - sleep-time 898 - graph-time 899 766 record-cmd 767 + norecord-tgid 900 768 overwrite 901 769 nodisable_on_free 902 770 irq-info 903 771 markers 904 772 noevent-fork 905 773 function-trace 774 + nofunction-fork 906 775 nodisplay-graph 907 776 nostacktrace 777 + nobranch 908 778 909 779 To disable one of the options, echo in the option prepended with 910 780 "no". ··· 960 830 961 831 trace_printk - Can disable trace_printk() from writing into the buffer. 962 832 963 - branch - Enable branch tracing with the tracer. 964 - 965 833 annotate - It is sometimes confusing when the CPU buffers are full 966 834 and one CPU buffer had a lot of events recently, thus 967 835 a shorter time frame, were another CPU may have only had ··· 978 850 <idle>-0 [001] .Ns3 21169.031485: sub_preempt_count <-_raw_spin_unlock 979 851 980 852 userstacktrace - This option changes the trace. It records a 981 - stacktrace of the current userspace thread. 853 + stacktrace of the current user space thread after 854 + each trace event. 982 855 983 856 sym-userobj - when user stacktrace are enabled, look up which 984 857 object the address belongs to, and print a ··· 1002 873 context-info - Show only the event data. Hides the comm, PID, 1003 874 timestamp, CPU, and other useful data. 1004 875 1005 - latency-format - This option changes the trace. When 1006 - it is enabled, the trace displays 1007 - additional information about the 1008 - latencies, as described in "Latency 1009 - trace format". 1010 - 1011 - sleep-time - When running function graph tracer, to include 1012 - the time a task schedules out in its function. 
1013 - When enabled, it will account time the task has been 1014 - scheduled out as part of the function call. 1015 - 1016 - graph-time - When running function profiler with function graph tracer, 1017 - to include the time to call nested functions. When this is 1018 - not set, the time reported for the function will only 1019 - include the time the function itself executed for, not the 1020 - time for functions that it called. 876 + latency-format - This option changes the trace output. When it is enabled, 877 + the trace displays additional information about the 878 + latency, as described in "Latency trace format". 1021 879 1022 880 record-cmd - When any event or tracer is enabled, a hook is enabled 1023 - in the sched_switch trace point to fill comm cache 881 + in the sched_switch trace point to fill comm cache 1024 882 with mapped pids and comms. But this may cause some 1025 883 overhead, and if you only care about pids, and not the 1026 884 name of the task, disabling this option can lower the 1027 - impact of tracing. 885 + impact of tracing. See "saved_cmdlines". 886 + 887 + record-tgid - When any event or tracer is enabled, a hook is enabled 888 + in the sched_switch trace point to fill the cache of 889 + mapped Thread Group IDs (TGID) mapping to pids. See 890 + "saved_tgids". 1028 891 1029 892 overwrite - This controls what happens when the trace buffer is 1030 893 full. If "1" (default), the oldest events are ··· 1056 935 functions. This keeps the overhead of the tracer down 1057 936 when performing latency tests. 1058 937 938 + function-fork - When set, tasks with PIDs listed in set_ftrace_pid will 939 + have the PIDs of their children added to set_ftrace_pid 940 + when those tasks fork. Also, when tasks with PIDs in 941 + set_ftrace_pid exit, their PIDs will be removed from the 942 + file. 943 + 1059 944 display-graph - When set, the latency tracers (irqsoff, wakeup, etc) will 1060 945 use function graph tracing instead of function tracing. 
1061 946 1062 - stacktrace - This is one of the options that changes the trace 1063 - itself. When a trace is recorded, so is the stack 1064 - of functions. This allows for back traces of 1065 - trace sites. 947 + stacktrace - When set, a stack trace is recorded after any trace event 948 + is recorded. 949 + 950 + branch - Enable branch tracing with the tracer. This enables branch 951 + tracer along with the currently set tracer. Enabling this 952 + with the "nop" tracer is the same as just enabling the 953 + "branch" tracer. 1066 954 1067 955 Note: Some tracers have their own options. They only appear in this 1068 956 file when the tracer is active. They always appear in the 1069 957 options directory. 1070 958 959 + 960 + Here are the per tracer options: 961 + 962 + Options for function tracer: 963 + 964 + func_stack_trace - When set, a stack trace is recorded after every 965 + function that is recorded. NOTE! Limit the functions 966 + that are recorded before enabling this, with 967 + "set_ftrace_filter" otherwise the system performance 968 + will be critically degraded. Remember to disable 969 + this option before clearing the function filter. 970 + 971 + Options for function_graph tracer: 972 + 973 + Since the function_graph tracer has a slightly different output 974 + it has its own options to control what is displayed. 975 + 976 + funcgraph-overrun - When set, the "overrun" of the graph stack is 977 + displayed after each function traced. The 978 + overrun, is when the stack depth of the calls 979 + is greater than what is reserved for each task. 980 + Each task has a fixed array of functions to 981 + trace in the call graph. If the depth of the 982 + calls exceeds that, the function is not traced. 983 + The overrun is the number of functions missed 984 + due to exceeding this array. 985 + 986 + funcgraph-cpu - When set, the CPU number of the CPU where the trace 987 + occurred is displayed. 
988 + 989 + funcgraph-overhead - When set, if the function takes longer than 990 + A certain amount, then a delay marker is 991 + displayed. See "delay" above, under the 992 + header description. 993 + 994 + funcgraph-proc - Unlike other tracers, the process' command line 995 + is not displayed by default, but instead only 996 + when a task is traced in and out during a context 997 + switch. Enabling this options has the command 998 + of each process displayed at every line. 999 + 1000 + funcgraph-duration - At the end of each function (the return) 1001 + the duration of the amount of time in the 1002 + function is displayed in microseconds. 1003 + 1004 + funcgraph-abstime - When set, the timestamp is displayed at each 1005 + line. 1006 + 1007 + funcgraph-irqs - When disabled, functions that happen inside an 1008 + interrupt will not be traced. 1009 + 1010 + funcgraph-tail - When set, the return event will include the function 1011 + that it represents. By default this is off, and 1012 + only a closing curly bracket "}" is displayed for 1013 + the return of a function. 1014 + 1015 + sleep-time - When running function graph tracer, to include 1016 + the time a task schedules out in its function. 1017 + When enabled, it will account time the task has been 1018 + scheduled out as part of the function call. 1019 + 1020 + graph-time - When running function profiler with function graph tracer, 1021 + to include the time to call nested functions. When this is 1022 + not set, the time reported for the function will only 1023 + include the time the function itself executed for, not the 1024 + time for functions that it called. 1025 + 1026 + Options for blk tracer: 1027 + 1028 + blk_classic - Shows a more minimalistic output. 
1071 1029 1072 1030 1073 1031 irqsoff ··· 1911 1711 <idle>-0 2d..3 6us : 0:120:R ==> [002] 5882: 94:R sleep 1912 1712 1913 1713 1714 + Hardware Latency Detector 1715 + ------------------------- 1716 + 1717 + The hardware latency detector is executed by enabling the "hwlat" tracer. 1718 + 1719 + NOTE, this tracer will affect the performance of the system as it will 1720 + periodically make a CPU constantly busy with interrupts disabled. 1721 + 1722 + # echo hwlat > current_tracer 1723 + # sleep 100 1724 + # cat trace 1725 + # tracer: hwlat 1726 + # 1727 + # _-----=> irqs-off 1728 + # / _----=> need-resched 1729 + # | / _---=> hardirq/softirq 1730 + # || / _--=> preempt-depth 1731 + # ||| / delay 1732 + # TASK-PID CPU# |||| TIMESTAMP FUNCTION 1733 + # | | | |||| | | 1734 + <...>-3638 [001] d... 19452.055471: #1 inner/outer(us): 12/14 ts:1499801089.066141940 1735 + <...>-3638 [003] d... 19454.071354: #2 inner/outer(us): 11/9 ts:1499801091.082164365 1736 + <...>-3638 [002] dn.. 19461.126852: #3 inner/outer(us): 12/9 ts:1499801098.138150062 1737 + <...>-3638 [001] d... 19488.340960: #4 inner/outer(us): 8/12 ts:1499801125.354139633 1738 + <...>-3638 [003] d... 19494.388553: #5 inner/outer(us): 8/12 ts:1499801131.402150961 1739 + <...>-3638 [003] d... 19501.283419: #6 inner/outer(us): 0/12 ts:1499801138.297435289 nmi-total:4 nmi-count:1 1740 + 1741 + 1742 + The above output is somewhat the same in the header. All events will have 1743 + interrupts disabled 'd'. Under the FUNCTION title there is: 1744 + 1745 + #1 - This is the count of events recorded that were greater than the 1746 + tracing_threshold (See below). 1747 + 1748 + inner/outer(us): 12/14 1749 + 1750 + This shows two numbers as "inner latency" and "outer latency". The test 1751 + runs in a loop checking a timestamp twice. 
The latency detected within 1752 + the two timestamps is the "inner latency" and the latency detected 1753 + after the previous timestamp and the next timestamp in the loop is 1754 + the "outer latency". 1755 + 1756 + ts:1499801089.066141940 1757 + 1758 + The absolute timestamp that the event happened. 1759 + 1760 + nmi-total:4 nmi-count:1 1761 + 1762 + On architectures that support it, if an NMI comes in during the 1763 + test, the time spent in NMI is reported in "nmi-total" (in 1764 + microseconds). 1765 + 1766 + All architectures that have NMIs will show the "nmi-count" if an 1767 + NMI comes in during the test. 1768 + 1769 + hwlat files: 1770 + 1771 + tracing_threshold - This gets automatically set to "10" to represent 10 1772 + microseconds. This is the threshold of latency that 1773 + needs to be detected before the trace will be recorded. 1774 + 1775 + Note, when hwlat tracer is finished (another tracer is 1776 + written into "current_tracer"), the original value for 1777 + tracing_threshold is placed back into this file. 1778 + 1779 + hwlat_detector/width - The length of time the test runs with interrupts 1780 + disabled. 1781 + 1782 + hwlat_detector/window - The length of time of the window which the test 1783 + runs. That is, the test will run for "width" 1784 + microseconds per "window" microseconds 1785 + 1786 + tracing_cpumask - When the test is started. A kernel thread is created that 1787 + runs the test. This thread will alternate between CPUs 1788 + listed in the tracing_cpumask between each period 1789 + (one "window"). To limit the test to specific CPUs 1790 + set the mask in this file to only the CPUs that the test 1791 + should run on. 
1792 + 1914 1793 function 1915 1794 -------- 1916 1795 ··· 2100 1821 #define STR(x) _STR(x) 2101 1822 #define MAX_PATH 256 2102 1823 2103 - const char *find_debugfs(void) 1824 + const char *find_tracefs(void) 2104 1825 { 2105 - static char debugfs[MAX_PATH+1]; 2106 - static int debugfs_found; 1826 + static char tracefs[MAX_PATH+1]; 1827 + static int tracefs_found; 2107 1828 char type[100]; 2108 1829 FILE *fp; 2109 1830 2110 - if (debugfs_found) 2111 - return debugfs; 1831 + if (tracefs_found) 1832 + return tracefs; 2112 1833 2113 1834 if ((fp = fopen("/proc/mounts","r")) == NULL) { 2114 1835 perror("/proc/mounts"); ··· 2118 1839 while (fscanf(fp, "%*s %" 2119 1840 STR(MAX_PATH) 2120 1841 "s %99s %*s %*d %*d\n", 2121 - debugfs, type) == 2) { 2122 - if (strcmp(type, "debugfs") == 0) 1842 + tracefs, type) == 2) { 1843 + if (strcmp(type, "tracefs") == 0) 2123 1844 break; 2124 1845 } 2125 1846 fclose(fp); 2126 1847 2127 - if (strcmp(type, "debugfs") != 0) { 2128 - fprintf(stderr, "debugfs not mounted"); 1848 + if (strcmp(type, "tracefs") != 0) { 1849 + fprintf(stderr, "tracefs not mounted"); 2129 1850 return NULL; 2130 1851 } 2131 1852 2132 - strcat(debugfs, "/tracing/"); 2133 - debugfs_found = 1; 1853 + strcat(tracefs, "/tracing/"); 1854 + tracefs_found = 1; 2134 1855 2135 - return debugfs; 1856 + return tracefs; 2136 1857 } 2137 1858 2138 1859 const char *tracing_file(const char *file_name) 2139 1860 { 2140 1861 static char trace_file[MAX_PATH+1]; 2141 - snprintf(trace_file, MAX_PATH, "%s/%s", find_debugfs(), file_name); 1862 + snprintf(trace_file, MAX_PATH, "%s/%s", find_tracefs(), file_name); 2142 1863 return trace_file; 2143 1864 } 2144 1865 ··· 2177 1898 ------ 2178 1899 #!/bin/bash 2179 1900 2180 - debugfs=`sed -ne 's/^debugfs \(.*\) debugfs.*/\1/p' /proc/mounts` 2181 - echo nop > $debugfs/tracing/current_tracer 2182 - echo 0 > $debugfs/tracing/tracing_on 2183 - echo $$ > $debugfs/tracing/set_ftrace_pid 2184 - echo function > $debugfs/tracing/current_tracer 2185 
- echo 1 > $debugfs/tracing/tracing_on 1901 + tracefs=`sed -ne 's/^tracefs \(.*\) tracefs.*/\1/p' /proc/mounts` 1902 + echo nop > $tracefs/tracing/current_tracer 1903 + echo 0 > $tracefs/tracing/tracing_on 1904 + echo $$ > $tracefs/tracing/set_ftrace_pid 1905 + echo function > $tracefs/tracing/current_tracer 1906 + echo 1 > $tracefs/tracing/tracing_on 2186 1907 exec "$@" 2187 1908 ------ 2188 1909 ··· 2424 2145 At compile time every C file object is run through the 2425 2146 recordmcount program (located in the scripts directory). This 2426 2147 program will parse the ELF headers in the C object to find all 2427 - the locations in the .text section that call mcount. (Note, only 2428 - white listed .text sections are processed, since processing other 2429 - sections like .init.text may cause races due to those sections 2430 - being freed unexpectedly). 2148 + the locations in the .text section that call mcount. Starting 2149 + with gcc verson 4.6, the -mfentry has been added for x86, which 2150 + calls "__fentry__" instead of "mcount". Which is called before 2151 + the creation of the stack frame. 2431 2152 2432 - A new section called "__mcount_loc" is created that holds 2433 - references to all the mcount call sites in the .text section. 2153 + Note, not all sections are traced. They may be prevented by either 2154 + a notrace, or blocked another way and all inline functions are not 2155 + traced. Check the "available_filter_functions" file to see what functions 2156 + can be traced. 2157 + 2158 + A section called "__mcount_loc" is created that holds 2159 + references to all the mcount/fentry call sites in the .text section. 2434 2160 The recordmcount program re-links this section back into the 2435 2161 original object. The final linking stage of the kernel will add all these 2436 2162 references into a single table. ··· 2963 2679 buffer with a spare buffer, and tracing continues in the new 2964 2680 current (=previous spare) buffer. 
2965 2681 2966 - The following debugfs files in "tracing" are related to this 2682 + The following tracefs files in "tracing" are related to this 2967 2683 feature: 2968 2684 2969 2685 snapshot: ··· 3036 2752 3037 2753 Instances 3038 2754 --------- 3039 - In the debugfs tracing directory is a directory called "instances". 2755 + In the tracefs tracing directory is a directory called "instances". 3040 2756 This directory can have new directories created inside of it using 3041 2757 mkdir, and removing directories with rmdir. The directory created 3042 2758 with mkdir in this directory will already contain files and other
+4 -2
kernel/trace/ftrace.c
··· 3816 3816 int exclude_mod = 0; 3817 3817 int found = 0; 3818 3818 int ret; 3819 - int clear_filter; 3819 + int clear_filter = 0; 3820 3820 3821 3821 if (func) { 3822 3822 func_g.type = filter_parse_regex(func, len, &func_g.search, ··· 3950 3950 continue; 3951 3951 3952 3952 /* no func matches all */ 3953 - if (!func || strcmp(func, "*") == 0 || 3953 + if (strcmp(func, "*") == 0 || 3954 3954 (ftrace_mod->func && 3955 3955 strcmp(ftrace_mod->func, func) == 0)) { 3956 3956 ret = 0; ··· 3978 3978 ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len, 3979 3979 int reset, int enable); 3980 3980 3981 + #ifdef CONFIG_MODULES 3981 3982 static void process_mod_list(struct list_head *head, struct ftrace_ops *ops, 3982 3983 char *mod, bool enable) 3983 3984 { ··· 4069 4068 4070 4069 kfree(mod); 4071 4070 } 4071 + #endif 4072 4072 4073 4073 /* 4074 4074 * We register the module command as a template to show others how
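The "no func matches all" hunk above drops a redundant NULL check because, at that call site, func is guaranteed non-NULL. A minimal userspace sketch of the resulting matching rule for cached module filters (the helper name is hypothetical, not a kernel API):

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical userspace sketch (not kernel code) of the cached-module
 * matching rule after the cleanup: `func` is known to be non-NULL, so a
 * cached entry matches when the pattern is "*" (match all) or when the
 * entry's function name is set and compares equal.
 */
static int cached_mod_func_matches(const char *ftrace_mod_func, const char *func)
{
	/* no func matches all */
	if (strcmp(func, "*") == 0)
		return 1;
	return ftrace_mod_func && strcmp(ftrace_mod_func, func) == 0;
}
```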
+123 -17
kernel/trace/trace.c
··· 1916 1916 { 1917 1917 unsigned pid, idx; 1918 1918 1919 - if (!tsk->pid || unlikely(tsk->pid > PID_MAX_DEFAULT)) 1919 + /* treat recording of idle task as a success */ 1920 + if (!tsk->pid) 1921 + return 1; 1922 + 1923 + if (unlikely(tsk->pid > PID_MAX_DEFAULT)) 1920 1924 return 0; 1921 1925 1922 1926 /* ··· 2006 2002 2007 2003 static int trace_save_tgid(struct task_struct *tsk) 2008 2004 { 2009 - if (unlikely(!tgid_map || !tsk->pid || tsk->pid > PID_MAX_DEFAULT)) 2005 + /* treat recording of idle task as a success */ 2006 + if (!tsk->pid) 2007 + return 1; 2008 + 2009 + if (unlikely(!tgid_map || tsk->pid > PID_MAX_DEFAULT)) 2010 2010 return 0; 2011 2011 2012 2012 tgid_map[tsk->pid] = tsk->tgid; ··· 2037 2029 */ 2038 2030 void tracing_record_taskinfo(struct task_struct *task, int flags) 2039 2031 { 2032 + bool done; 2033 + 2040 2034 if (tracing_record_taskinfo_skip(flags)) 2041 2035 return; 2042 - if ((flags & TRACE_RECORD_CMDLINE) && !trace_save_cmdline(task)) 2043 - return; 2044 - if ((flags & TRACE_RECORD_TGID) && !trace_save_tgid(task)) 2036 + 2037 + /* 2038 + * Record as much task information as possible. If some fail, continue 2039 + * to try to record the others. 2040 + */ 2041 + done = !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(task); 2042 + done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(task); 2043 + 2044 + /* If recording any information failed, retry again soon. */ 2045 + if (!done) 2045 2046 return; 2046 2047 2047 2048 __this_cpu_write(trace_taskinfo_save, false); ··· 2067 2050 void tracing_record_taskinfo_sched_switch(struct task_struct *prev, 2068 2051 struct task_struct *next, int flags) 2069 2052 { 2053 + bool done; 2054 + 2070 2055 if (tracing_record_taskinfo_skip(flags)) 2071 2056 return; 2072 2057 2073 - if ((flags & TRACE_RECORD_CMDLINE) && 2074 - (!trace_save_cmdline(prev) || !trace_save_cmdline(next))) 2075 - return; 2058 + /* 2059 + * Record as much task information as possible. 
If some fail, continue 2060 + * to try to record the others. 2061 + */ 2062 + done = !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(prev); 2063 + done &= !(flags & TRACE_RECORD_CMDLINE) || trace_save_cmdline(next); 2064 + done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(prev); 2065 + done &= !(flags & TRACE_RECORD_TGID) || trace_save_tgid(next); 2076 2066 2077 - if ((flags & TRACE_RECORD_TGID) && 2078 - (!trace_save_tgid(prev) || !trace_save_tgid(next))) 2067 + /* If recording any information failed, retry again soon. */ 2068 + if (!done) 2079 2069 return; 2080 2070 2081 2071 __this_cpu_write(trace_taskinfo_save, false); ··· 3358 3334 unsigned int flags) 3359 3335 { 3360 3336 bool tgid = flags & TRACE_ITER_RECORD_TGID; 3337 + const char tgid_space[] = " "; 3338 + const char space[] = " "; 3361 3339 3362 - seq_printf(m, "# %s _-----=> irqs-off\n", tgid ? " " : ""); 3363 - seq_printf(m, "# %s / _----=> need-resched\n", tgid ? " " : ""); 3364 - seq_printf(m, "# %s| / _---=> hardirq/softirq\n", tgid ? " " : ""); 3365 - seq_printf(m, "# %s|| / _--=> preempt-depth\n", tgid ? " " : ""); 3366 - seq_printf(m, "# %s||| / delay\n", tgid ? " " : ""); 3367 - seq_printf(m, "# TASK-PID CPU#%s|||| TIMESTAMP FUNCTION\n", tgid ? " TGID " : ""); 3368 - seq_printf(m, "# | | | %s|||| | |\n", tgid ? " | " : ""); 3340 + seq_printf(m, "# %s _-----=> irqs-off\n", 3341 + tgid ? tgid_space : space); 3342 + seq_printf(m, "# %s / _----=> need-resched\n", 3343 + tgid ? tgid_space : space); 3344 + seq_printf(m, "# %s| / _---=> hardirq/softirq\n", 3345 + tgid ? tgid_space : space); 3346 + seq_printf(m, "# %s|| / _--=> preempt-depth\n", 3347 + tgid ? tgid_space : space); 3348 + seq_printf(m, "# %s||| / delay\n", 3349 + tgid ? tgid_space : space); 3350 + seq_printf(m, "# TASK-PID CPU#%s|||| TIMESTAMP FUNCTION\n", 3351 + tgid ? " TGID " : space); 3352 + seq_printf(m, "# | | | %s|||| | |\n", 3353 + tgid ? 
" | " : space); 3369 3354 } 3370 3355 3371 3356 void ··· 4720 4687 .open = tracing_open_generic, 4721 4688 .read = tracing_readme_read, 4722 4689 .llseek = generic_file_llseek, 4690 + }; 4691 + 4692 + static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos) 4693 + { 4694 + int *ptr = v; 4695 + 4696 + if (*pos || m->count) 4697 + ptr++; 4698 + 4699 + (*pos)++; 4700 + 4701 + for (; ptr <= &tgid_map[PID_MAX_DEFAULT]; ptr++) { 4702 + if (trace_find_tgid(*ptr)) 4703 + return ptr; 4704 + } 4705 + 4706 + return NULL; 4707 + } 4708 + 4709 + static void *saved_tgids_start(struct seq_file *m, loff_t *pos) 4710 + { 4711 + void *v; 4712 + loff_t l = 0; 4713 + 4714 + if (!tgid_map) 4715 + return NULL; 4716 + 4717 + v = &tgid_map[0]; 4718 + while (l <= *pos) { 4719 + v = saved_tgids_next(m, v, &l); 4720 + if (!v) 4721 + return NULL; 4722 + } 4723 + 4724 + return v; 4725 + } 4726 + 4727 + static void saved_tgids_stop(struct seq_file *m, void *v) 4728 + { 4729 + } 4730 + 4731 + static int saved_tgids_show(struct seq_file *m, void *v) 4732 + { 4733 + int pid = (int *)v - tgid_map; 4734 + 4735 + seq_printf(m, "%d %d\n", pid, trace_find_tgid(pid)); 4736 + return 0; 4737 + } 4738 + 4739 + static const struct seq_operations tracing_saved_tgids_seq_ops = { 4740 + .start = saved_tgids_start, 4741 + .stop = saved_tgids_stop, 4742 + .next = saved_tgids_next, 4743 + .show = saved_tgids_show, 4744 + }; 4745 + 4746 + static int tracing_saved_tgids_open(struct inode *inode, struct file *filp) 4747 + { 4748 + if (tracing_disabled) 4749 + return -ENODEV; 4750 + 4751 + return seq_open(filp, &tracing_saved_tgids_seq_ops); 4752 + } 4753 + 4754 + 4755 + static const struct file_operations tracing_saved_tgids_fops = { 4756 + .open = tracing_saved_tgids_open, 4757 + .read = seq_read, 4758 + .llseek = seq_lseek, 4759 + .release = seq_release, 4723 4760 }; 4724 4761 4725 4762 static void *saved_cmdlines_next(struct seq_file *m, void *v, loff_t *pos) ··· 8023 7920 8024 7921 
trace_create_file("saved_cmdlines_size", 0644, d_tracer, 8025 7922 NULL, &tracing_saved_cmdlines_size_fops); 7923 + 7924 + trace_create_file("saved_tgids", 0444, d_tracer, 7925 + NULL, &tracing_saved_tgids_fops); 8026 7926 8027 7927 trace_eval_init(); 8028 7928
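The hunks above change the comm/tgid recording so that the idle task counts as a success and a single failure no longer aborts the remaining records. A minimal userspace sketch of that policy (the standalone map and function name are illustrative, not the kernel's):

```c
#include <assert.h>

#define PID_MAX_DEFAULT 0x8000	/* 32768, matching the kernel default */

static int tgid_map[PID_MAX_DEFAULT + 1];

/*
 * Sketch of the recording policy: pid 0 (the idle task) has nothing to
 * cache but is reported as a success, so it no longer masks a successful
 * comm/tgid record for the other task of a sched_switch; an out-of-range
 * pid still returns failure so the caller retries on a later event.
 */
static int save_tgid(int pid, int tgid)
{
	if (!pid)	/* treat recording of idle task as a success */
		return 1;
	if (pid > PID_MAX_DEFAULT)
		return 0;
	tgid_map[pid] = tgid;
	return 1;
}
```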
+9
kernel/trace/trace_kprobe.c
··· 598 598 .priority = 1 /* Invoked after kprobe module callback */ 599 599 }; 600 600 601 + /* Convert certain expected symbols into '_' when generating event names */ 602 + static inline void sanitize_event_name(char *name) 603 + { 604 + while (*name++ != '\0') 605 + if (*name == ':' || *name == '.') 606 + *name = '_'; 607 + } 608 + 601 609 static int create_trace_kprobe(int argc, char **argv) 602 610 { 603 611 /* ··· 744 736 else 745 737 snprintf(buf, MAX_EVENT_NAME_LEN, "%c_0x%p", 746 738 is_return ? 'r' : 'p', addr); 739 + sanitize_event_name(buf); 747 740 event = buf; 748 741 } 749 742 tk = alloc_trace_kprobe(group, event, addr, symbol, offset, maxactive,
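The sanitize_event_name() helper added above can be exercised verbatim in userspace. Note the post-increment: the character at index 0 is never inspected, which is harmless because derived event names always begin with "p_" or "r_".

```c
#include <assert.h>
#include <string.h>

/*
 * Userspace copy of the helper added above: ':' (from "module:function"
 * probe points) and '.' (from compiler-generated symbols such as
 * "func.isra.1") are replaced with '_', since neither character is legal
 * in an event name.
 */
static void sanitize_event_name(char *name)
{
	while (*name++ != '\0')
		if (*name == ':' || *name == '.')
			*name = '_';
}
```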
+6
kernel/trace/trace_stack.c
··· 406 406 .release = seq_release, 407 407 }; 408 408 409 + #ifdef CONFIG_DYNAMIC_FTRACE 410 + 409 411 static int 410 412 stack_trace_filter_open(struct inode *inode, struct file *file) 411 413 { ··· 424 422 .llseek = tracing_lseek, 425 423 .release = ftrace_regex_release, 426 424 }; 425 + 426 + #endif /* CONFIG_DYNAMIC_FTRACE */ 427 427 428 428 int 429 429 stack_trace_sysctl(struct ctl_table *table, int write, ··· 481 477 trace_create_file("stack_trace", 0444, d_tracer, 482 478 NULL, &stack_trace_fops); 483 479 480 + #ifdef CONFIG_DYNAMIC_FTRACE 484 481 trace_create_file("stack_trace_filter", 0444, d_tracer, 485 482 &trace_ops, &stack_trace_filter_fops); 483 + #endif 486 484 487 485 if (stack_trace_filter_buf[0]) 488 486 ftrace_set_early_filter(&trace_ops, stack_trace_filter_buf, 1);
+36
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc
··· 1 + #!/bin/sh 2 + # description: Kprobe event auto/manual naming 3 + 4 + [ -f kprobe_events ] || exit_unsupported # this is configurable 5 + 6 + disable_events 7 + echo > kprobe_events 8 + 9 + :;: "Add an event on function without name" ;: 10 + 11 + FUNC=`grep " [tT] .*vfs_read$" /proc/kallsyms | tail -n 1 | cut -f 3 -d " "` 12 + [ "x" != "x$FUNC" ] || exit_unresolved 13 + echo "p $FUNC" > kprobe_events 14 + PROBE_NAME=`echo $FUNC | tr ".:" "_"` 15 + test -d events/kprobes/p_${PROBE_NAME}_0 || exit_failure 16 + 17 + :;: "Add an event on function with new name" ;: 18 + 19 + echo "p:event1 $FUNC" > kprobe_events 20 + test -d events/kprobes/event1 || exit_failure 21 + 22 + :;: "Add an event on function with new name and group" ;: 23 + 24 + echo "p:kprobes2/event2 $FUNC" > kprobe_events 25 + test -d events/kprobes2/event2 || exit_failure 26 + 27 + :;: "Add an event on dot function without name" ;: 28 + 29 + FUNC=`grep -m 10 " [tT] .*\.isra\..*$" /proc/kallsyms | tail -n 1 | cut -f 3 -d " "` 30 + [ "x" != "x$FUNC" ] || exit_unresolved 31 + echo "p $FUNC" > kprobe_events 32 + EVENT=`grep $FUNC kprobe_events | cut -f 1 -d " " | cut -f 2 -d:` 33 + [ "x" != "x$EVENT" ] || exit_failure 34 + test -d events/$EVENT || exit_failure 35 + 36 + echo > kprobe_events
+28
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc
··· 1 + #!/bin/sh 2 + # description: Kprobe dynamic event - probing module 3 + 4 + [ -f kprobe_events ] || exit_unsupported # this is configurable 5 + 6 + disable_events 7 + echo > kprobe_events 8 + 9 + :;: "Add an event on a module function without specifying event name" ;: 10 + 11 + MOD=`lsmod | head -n 2 | tail -n 1 | cut -f1 -d" "` 12 + FUNC=`grep -m 1 ".* t .*\\[$MOD\\]" /proc/kallsyms | xargs | cut -f3 -d" "` 13 + [ "x" != "x$MOD" -a "y" != "y$FUNC" ] || exit_unresolved 14 + echo "p $MOD:$FUNC" > kprobe_events 15 + PROBE_NAME=`echo $MOD:$FUNC | tr ".:" "_"` 16 + test -d events/kprobes/p_${PROBE_NAME}_0 || exit_failure 17 + 18 + :;: "Add an event on a module function with new event name" ;: 19 + 20 + echo "p:event1 $MOD:$FUNC" > kprobe_events 21 + test -d events/kprobes/event1 || exit_failure 22 + 23 + :;: "Add an event on a module function with new event and group name" ;: 24 + 25 + echo "p:kprobes1/event1 $MOD:$FUNC" > kprobe_events 26 + test -d events/kprobes1/event1 || exit_failure 27 + 28 + echo > kprobe_events
+2 -2
tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
2 2 # description: Register/unregister many kprobe events 3 3 4 4 # ftrace fentry skip size depends on the machine architecture. 5 - # Currently HAVE_KPROBES_ON_FTRACE defined on x86 and powerpc 5 + # Currently HAVE_KPROBES_ON_FTRACE is defined on x86 and powerpc64le 6 6 case `uname -m` in 7 7 x86_64|i[3456]86) OFFS=5;; 8 - ppc*) OFFS=4;; 8 + ppc64le) OFFS=8;; 9 9 *) OFFS=0;; 10 10 esac 11 11
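The case statement above maps the machine architecture to the fentry skip size. A hypothetical C mirror of that table (the offset values come from the script itself; the helper function is purely illustrative):

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical mirror of the selftest's `uname -m` case table: how many
 * bytes past a symbol a kprobe must be placed to land after the
 * fentry/mcount sequence.  0 means HAVE_KPROBES_ON_FTRACE is not expected
 * on that architecture, so no offset is applied.
 */
static int kprobe_ftrace_offset(const char *machine)
{
	if (strcmp(machine, "x86_64") == 0 ||
	    (strlen(machine) == 4 && machine[0] == 'i' &&
	     strcmp(machine + 2, "86") == 0))
		return 5;	/* x86_64|i[3456]86: 5-byte call __fentry__ */
	if (strcmp(machine, "ppc64le") == 0)
		return 8;	/* updated from 4 by the patch above */
	return 0;
}
```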