Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ring-buffer: prevent infinite looping on time stamping
ftrace: disable tracing on resize
ftrace: fix breakage in bin_fmt results
ftrace: ftrace.txt version update
ftrace: update txt document

+90 -100
+72 -97
Documentation/ftrace.txt
··· 8 8 Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, 9 9 John Kacur, and David Teigland. 10 10 11 - Written for: 2.6.27-rc1 11 + Written for: 2.6.28-rc2 12 12 13 13 Introduction 14 14 ------------ ··· 50 50 51 51 Note: all time values are in microseconds. 52 52 53 - current_tracer : This is used to set or display the current tracer 53 + current_tracer: This is used to set or display the current tracer 54 54 that is configured. 55 55 56 - available_tracers : This holds the different types of tracers that 56 + available_tracers: This holds the different types of tracers that 57 57 have been compiled into the kernel. The tracers 58 58 listed here can be configured by echoing their name 59 59 into current_tracer. 60 60 61 - tracing_enabled : This sets or displays whether the current_tracer 61 + tracing_enabled: This sets or displays whether the current_tracer 62 62 is activated and tracing or not. Echo 0 into this 63 63 file to disable the tracer or 1 to enable it. 64 64 65 - trace : This file holds the output of the trace in a human readable 65 + trace: This file holds the output of the trace in a human readable 66 66 format (described below). 67 67 68 - latency_trace : This file shows the same trace but the information 68 + latency_trace: This file shows the same trace but the information 69 69 is organized more to display possible latencies 70 70 in the system (described below). 71 71 72 - trace_pipe : The output is the same as the "trace" file but this 72 + trace_pipe: The output is the same as the "trace" file but this 73 73 file is meant to be streamed with live tracing. 74 74 Reads from this file will block until new data 75 75 is retrieved. Unlike the "trace" and "latency_trace" ··· 82 82 tracer is not adding more data, they will display 83 83 the same information every time they are read. 84 84 85 - iter_ctrl : This file lets the user control the amount of data 85 + iter_ctrl: This file lets the user control the amount of data 86 86 that is displayed in one of the above output 87 87 files. 88 88 89 - trace_max_latency : Some of the tracers record the max latency. 89 + trace_max_latency: Some of the tracers record the max latency. 90 90 For example, the time interrupts are disabled. 91 91 This time is saved in this file. The max trace 92 92 will also be stored, and displayed by either ··· 94 94 only be recorded if the latency is greater than 95 95 the value in this file. (in microseconds) 96 96 97 - trace_entries : This sets or displays the number of trace 98 - entries each CPU buffer can hold. The tracer buffers 99 - are the same size for each CPU. The displayed number 100 - is the size of the CPU buffer and not total size. The 97 + trace_entries: This sets or displays the number of bytes each CPU 98 + buffer can hold. The tracer buffers are the same size 99 + for each CPU. The displayed number is the size of the 100 + CPU buffer and not total size of all buffers. The 101 101 trace buffers are allocated in pages (blocks of memory 102 102 that the kernel uses for allocation, usually 4 KB in size). 103 - Since each entry is smaller than a page, if the last 104 - allocated page has room for more entries than were 105 - requested, the rest of the page is used to allocate 106 - entries. 103 + If the last page allocated has room for more bytes 104 + than requested, the rest of the page will be used, 105 + making the actual allocation bigger than requested. 106 + (Note, the size may not be a multiple of the page size due 107 + to buffer managment overhead.) 107 108 108 109 This can only be updated when the current_tracer 109 - is set to "none". 110 + is set to "nop". 110 111 111 - NOTE: It is planned on changing the allocated buffers 112 - from being the number of possible CPUS to 113 - the number of online CPUS. 114 - 115 - tracing_cpumask : This is a mask that lets the user only trace 112 + tracing_cpumask: This is a mask that lets the user only trace 116 113 on specified CPUS. The format is a hex string 117 114 representing the CPUS. 118 115 119 - set_ftrace_filter : When dynamic ftrace is configured in (see the 116 + set_ftrace_filter: When dynamic ftrace is configured in (see the 120 117 section below "dynamic ftrace"), the code is dynamically 121 118 modified (code text rewrite) to disable calling of the 122 119 function profiler (mcount). This lets tracing be configured ··· 127 130 be traced. If a function exists in both set_ftrace_filter 128 131 and set_ftrace_notrace, the function will _not_ be traced. 129 132 130 - available_filter_functions : When a function is encountered the first 131 - time by the dynamic tracer, it is recorded and 132 - later the call is converted into a nop. This file 133 - lists the functions that have been recorded 134 - by the dynamic tracer and these functions can 135 - be used to set the ftrace filter by the above 136 - "set_ftrace_filter" file. (See the section "dynamic ftrace" 137 - below for more details). 133 + available_filter_functions: This lists the functions that ftrace 134 + has processed and can trace. These are the function 135 + names that you can pass to "set_ftrace_filter" or 136 + "set_ftrace_notrace". (See the section "dynamic ftrace" 137 + below for more details.) 138 138 139 139 140 140 The Tracers ··· 139 145 140 146 Here is the list of current tracers that may be configured. 141 147 142 - ftrace - function tracer that uses mcount to trace all functions. 148 + function - function tracer that uses mcount to trace all functions. 143 149 144 150 sched_switch - traces the context switches between tasks. 145 151 ··· 160 166 the highest priority task to get scheduled after 161 167 it has been woken up. 162 168 163 - none - This is not a tracer. To remove all tracers from tracing 164 - simply echo "none" into current_tracer. 169 + nop - This is not a tracer. To remove all tracers from tracing 170 + simply echo "nop" into current_tracer. 165 171 166 172 167 173 Examples of using the tracer ··· 176 182 Here is an example of the output format of the file "trace" 177 183 178 184 -------- 179 - # tracer: ftrace 185 + # tracer: function 180 186 # 181 187 # TASK-PID CPU# TIMESTAMP FUNCTION 182 188 # | | | | | ··· 186 192 -------- 187 193 188 194 A header is printed with the tracer name that is represented by the trace. 189 - In this case the tracer is "ftrace". Then a header showing the format. Task 195 + In this case the tracer is "function". Then a header showing the format. Task 190 196 name "bash", the task PID "4251", the CPU that it was running on 191 197 "01", the timestamp in <secs>.<usecs> format, the function name that was 192 198 traced "path_put" and the parent function that called this function ··· 997 1003 has been set. We do not see the 'N' until we switch back to the task's 998 1004 assigned stack. 999 1005 1000 - ftrace 1001 - ------ 1006 + function 1007 + -------- 1002 1008 1003 - ftrace is not only the name of the tracing infrastructure, but it 1004 - is also a name of one of the tracers. The tracer is the function 1005 - tracer. Enabling the function tracer can be done from the 1006 - debug file system. Make sure the ftrace_enabled is set otherwise 1007 - this tracer is a nop. 1009 + This tracer is the function tracer. Enabling the function tracer 1010 + can be done from the debug file system. Make sure the ftrace_enabled is 1011 + set; otherwise this tracer is a nop. 1008 1012 1009 1013 # sysctl kernel.ftrace_enabled=1 1010 - # echo ftrace > /debug/tracing/current_tracer 1014 + # echo function > /debug/tracing/current_tracer 1011 1015 # echo 1 > /debug/tracing/tracing_enabled 1012 1016 # usleep 1 1013 1017 # echo 0 > /debug/tracing/tracing_enabled 1014 1018 # cat /debug/tracing/trace 1015 - # tracer: ftrace 1019 + # tracer: function 1016 1020 # 1017 1021 # TASK-PID CPU# TIMESTAMP FUNCTION 1018 1022 # | | | | | ··· 1032 1040 [...] 1033 1041 1034 1042 1035 - Note: ftrace uses ring buffers to store the above entries. The newest data 1036 - may overwrite the oldest data. Sometimes using echo to stop the trace 1037 - is not sufficient because the tracing could have overwritten the data 1038 - that you wanted to record. For this reason, it is sometimes better to 1043 + Note: function tracer uses ring buffers to store the above entries. 1044 + The newest data may overwrite the oldest data. Sometimes using echo to 1045 + stop the trace is not sufficient because the tracing could have overwritten 1046 + the data that you wanted to record. For this reason, it is sometimes better to 1039 1047 disable tracing directly from a program. This allows you to stop the 1040 1048 tracing at the point that you hit the part that you are interested in. 1041 1049 To disable the tracing directly from a C program, something like following ··· 1069 1077 of pointing to a simple return. (Enabling FTRACE will include the 1070 1078 -pg switch in the compiling of the kernel.) 1071 1079 1072 - When dynamic ftrace is initialized, it calls kstop_machine to make 1073 - the machine act like a uniprocessor so that it can freely modify code 1074 - without worrying about other processors executing that same code. At 1075 - initialization, the mcount calls are changed to call a "record_ip" 1076 - function. After this, the first time a kernel function is called, 1077 - it has the calling address saved in a hash table. 1080 + At compile time every C file object is run through the 1081 + recordmcount.pl script (located in the scripts directory). This 1082 + script will process the C object using objdump to find all the 1083 + locations in the .text section that call mcount. (Note, only 1084 + the .text section is processed, since processing other sections 1085 + like .init.text may cause races due to those sections being freed). 1078 1086 1079 - Later on the ftraced kernel thread is awoken and will again call 1080 - kstop_machine if new functions have been recorded. The ftraced thread 1081 - will change all calls to mcount to "nop". Just calling mcount 1082 - and having mcount return has shown a 10% overhead. By converting 1083 - it to a nop, there is no measurable overhead to the system. 1087 + A new section called "__mcount_loc" is created that holds references 1088 + to all the mcount call sites in the .text section. This section is 1089 + compiled back into the original object. The final linker will add 1090 + all these references into a single table. 1091 + 1092 + On boot up, before SMP is initialized, the dynamic ftrace code 1093 + scans this table and updates all the locations into nops. It also 1094 + records the locations, which are added to the available_filter_functions 1095 + list. Modules are processed as they are loaded and before they are 1096 + executed. When a module is unloaded, it also removes its functions from 1097 + the ftrace function list. This is automatic in the module unload 1098 + code, and the module author does not need to worry about it. 1099 + 1100 + When tracing is enabled, kstop_machine is called to prevent races 1101 + with the CPUS executing code being modified (which can cause the 1102 + CPU to do undesireable things), and the nops are patched back 1103 + to calls. But this time, they do not call mcount (which is just 1104 + a function stub). They now call into the ftrace infrastructure. 1084 1105 1085 1106 One special side-effect to the recording of the functions being 1086 1107 traced is that we can now selectively choose which functions we ··· 1256 1251 1257 1252 We can see that there's no more lock or preempt tracing. 1258 1253 1259 - ftraced 1260 - ------- 1261 - 1262 - As mentioned above, when dynamic ftrace is configured in, a kernel 1263 - thread wakes up once a second and checks to see if there are mcount 1264 - calls that need to be converted into nops. If there are not any, then 1265 - it simply goes back to sleep. But if there are some, it will call 1266 - kstop_machine to convert the calls to nops. 1267 - 1268 - There may be a case in which you do not want this added latency. 1269 - Perhaps you are doing some audio recording and this activity might 1270 - cause skips in the playback. There is an interface to disable 1271 - and enable the "ftraced" kernel thread. 1272 - 1273 - # echo 0 > /debug/tracing/ftraced_enabled 1274 - 1275 - This will disable the calling of kstop_machine to update the 1276 - mcount calls to nops. Remember that there is a large overhead 1277 - to calling mcount. Without this kernel thread, that overhead will 1278 - exist. 1279 - 1280 - If there are recorded calls to mcount, any write to the ftraced_enabled 1281 - file will cause the kstop_machine to run. This means that a 1282 - user can manually perform the updates when they want to by simply 1283 - echoing a '0' into the ftraced_enabled file. 1284 - 1285 - The updates are also done at the beginning of enabling a tracer 1286 - that uses ftrace function recording. 1287 - 1288 - 1289 1254 trace_pipe 1290 1255 ---------- 1291 1256 ··· 1264 1289 This means that subsequent reads will be different. The trace 1265 1290 is live. 1266 1291 1267 - # echo ftrace > /debug/tracing/current_tracer 1292 + # echo function > /debug/tracing/current_tracer 1268 1293 # cat /debug/tracing/trace_pipe > /tmp/trace.out & 1269 1294 [1] 4153 1270 1295 # echo 1 > /debug/tracing/tracing_enabled 1271 1296 # usleep 1 1272 1297 # echo 0 > /debug/tracing/tracing_enabled 1273 1298 # cat /debug/tracing/trace 1274 - # tracer: ftrace 1299 + # tracer: function 1275 1300 # 1276 1301 # TASK-PID CPU# TIMESTAMP FUNCTION 1277 1302 # | | | | | ··· 1292 1317 1293 1318 Note, reading the trace_pipe file will block until more input is added. 1294 1319 By changing the tracer, trace_pipe will issue an EOF. We needed 1295 - to set the ftrace tracer _before_ cating the trace_pipe file. 1320 + to set the function tracer _before_ we "cat" the trace_pipe file. 1296 1321 1297 1322 1298 1323 trace entries ··· 1309 1334 65620 1310 1335 1311 1336 Note, to modify this, you must have tracing completely disabled. To do that, 1312 - echo "none" into the current_tracer. If the current_tracer is not set 1313 - to "none", an EINVAL error will be returned. 1337 + echo "nop" into the current_tracer. If the current_tracer is not set 1338 + to "nop", an EINVAL error will be returned. 1314 1339 1315 - # echo none > /debug/tracing/current_tracer 1340 + # echo nop > /debug/tracing/current_tracer 1316 1341 # echo 100000 > /debug/tracing/trace_entries 1317 1342 # cat /debug/tracing/trace_entries 1318 1343 100045
+1 -1
kernel/trace/ring_buffer.c
··· 1060 1060 1061 1061 /* Did the write stamp get updated already? */ 1062 1062 if (unlikely(ts < cpu_buffer->write_stamp)) 1063 - goto again; 1063 + delta = 0; 1064 1064 1065 1065 if (test_time_stamp(delta)) { 1066 1066
+17 -2
kernel/trace/trace.c
··· 1755 1755 return TRACE_TYPE_HANDLED; 1756 1756 1757 1757 SEQ_PUT_FIELD_RET(s, entry->pid); 1758 - SEQ_PUT_FIELD_RET(s, iter->cpu); 1758 + SEQ_PUT_FIELD_RET(s, entry->cpu); 1759 1759 SEQ_PUT_FIELD_RET(s, iter->ts); 1760 1760 1761 1761 switch (entry->type) { ··· 2676 2676 { 2677 2677 unsigned long val; 2678 2678 char buf[64]; 2679 - int ret; 2679 + int ret, cpu; 2680 2680 struct trace_array *tr = filp->private_data; 2681 2681 2682 2682 if (cnt >= sizeof(buf)) ··· 2702 2702 pr_info("ftrace: please disable tracing" 2703 2703 " before modifying buffer size\n"); 2704 2704 goto out; 2705 + } 2706 + 2707 + /* disable all cpu buffers */ 2708 + for_each_tracing_cpu(cpu) { 2709 + if (global_trace.data[cpu]) 2710 + atomic_inc(&global_trace.data[cpu]->disabled); 2711 + if (max_tr.data[cpu]) 2712 + atomic_inc(&max_tr.data[cpu]->disabled); 2705 2713 } 2706 2714 2707 2715 if (val != global_trace.entries) { ··· 2743 2735 if (tracing_disabled) 2744 2736 cnt = -ENOMEM; 2745 2737 out: 2738 + for_each_tracing_cpu(cpu) { 2739 + if (global_trace.data[cpu]) 2740 + atomic_dec(&global_trace.data[cpu]->disabled); 2741 + if (max_tr.data[cpu]) 2742 + atomic_dec(&max_tr.data[cpu]->disabled); 2743 + } 2744 + 2746 2745 max_tr.entries = global_trace.entries; 2747 2746 mutex_unlock(&trace_types_lock); 2748 2747