Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ring-buffer: prevent infinite looping on time stamping
ftrace: disable tracing on resize
ftrace: fix breakage in bin_fmt results
ftrace: ftrace.txt version update
ftrace: update txt document

+90 -100
+72 -97
Documentation/ftrace.txt
··· 8 Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, 9 John Kacur, and David Teigland. 10 11 - Written for: 2.6.27-rc1 12 13 Introduction 14 ------------ ··· 50 51 Note: all time values are in microseconds. 52 53 - current_tracer : This is used to set or display the current tracer 54 that is configured. 55 56 - available_tracers : This holds the different types of tracers that 57 have been compiled into the kernel. The tracers 58 listed here can be configured by echoing their name 59 into current_tracer. 60 61 - tracing_enabled : This sets or displays whether the current_tracer 62 is activated and tracing or not. Echo 0 into this 63 file to disable the tracer or 1 to enable it. 64 65 - trace : This file holds the output of the trace in a human readable 66 format (described below). 67 68 - latency_trace : This file shows the same trace but the information 69 is organized more to display possible latencies 70 in the system (described below). 71 72 - trace_pipe : The output is the same as the "trace" file but this 73 file is meant to be streamed with live tracing. 74 Reads from this file will block until new data 75 is retrieved. Unlike the "trace" and "latency_trace" ··· 82 tracer is not adding more data, they will display 83 the same information every time they are read. 84 85 - iter_ctrl : This file lets the user control the amount of data 86 that is displayed in one of the above output 87 files. 88 89 - trace_max_latency : Some of the tracers record the max latency. 90 For example, the time interrupts are disabled. 91 This time is saved in this file. The max trace 92 will also be stored, and displayed by either ··· 94 only be recorded if the latency is greater than 95 the value in this file. (in microseconds) 96 97 - trace_entries : This sets or displays the number of trace 98 - entries each CPU buffer can hold. The tracer buffers 99 - are the same size for each CPU. The displayed number 100 - is the size of the CPU buffer and not total size. The 101 trace buffers are allocated in pages (blocks of memory 102 that the kernel uses for allocation, usually 4 KB in size). 103 - Since each entry is smaller than a page, if the last 104 - allocated page has room for more entries than were 105 - requested, the rest of the page is used to allocate 106 - entries. 107 108 This can only be updated when the current_tracer 109 - is set to "none". 110 111 - NOTE: It is planned on changing the allocated buffers 112 - from being the number of possible CPUS to 113 - the number of online CPUS. 114 - 115 - tracing_cpumask : This is a mask that lets the user only trace 116 on specified CPUS. The format is a hex string 117 representing the CPUS. 118 119 - set_ftrace_filter : When dynamic ftrace is configured in (see the 120 section below "dynamic ftrace"), the code is dynamically 121 modified (code text rewrite) to disable calling of the 122 function profiler (mcount). This lets tracing be configured ··· 127 be traced. If a function exists in both set_ftrace_filter 128 and set_ftrace_notrace, the function will _not_ be traced. 129 130 - available_filter_functions : When a function is encountered the first 131 - time by the dynamic tracer, it is recorded and 132 - later the call is converted into a nop. This file 133 - lists the functions that have been recorded 134 - by the dynamic tracer and these functions can 135 - be used to set the ftrace filter by the above 136 - "set_ftrace_filter" file. (See the section "dynamic ftrace" 137 - below for more details). 138 139 140 The Tracers ··· 139 140 Here is the list of current tracers that may be configured. 141 142 - ftrace - function tracer that uses mcount to trace all functions. 143 144 sched_switch - traces the context switches between tasks. 145 ··· 160 the highest priority task to get scheduled after 161 it has been woken up. 162 163 - none - This is not a tracer. To remove all tracers from tracing 164 - simply echo "none" into current_tracer. 165 166 167 Examples of using the tracer ··· 176 Here is an example of the output format of the file "trace" 177 178 -------- 179 - # tracer: ftrace 180 # 181 # TASK-PID CPU# TIMESTAMP FUNCTION 182 # | | | | | ··· 186 -------- 187 188 A header is printed with the tracer name that is represented by the trace. 189 - In this case the tracer is "ftrace". Then a header showing the format. Task 190 name "bash", the task PID "4251", the CPU that it was running on 191 "01", the timestamp in <secs>.<usecs> format, the function name that was 192 traced "path_put" and the parent function that called this function ··· 997 has been set. We do not see the 'N' until we switch back to the task's 998 assigned stack. 999 1000 - ftrace 1001 - ------ 1002 1003 - ftrace is not only the name of the tracing infrastructure, but it 1004 - is also a name of one of the tracers. The tracer is the function 1005 - tracer. Enabling the function tracer can be done from the 1006 - debug file system. Make sure the ftrace_enabled is set otherwise 1007 - this tracer is a nop. 1008 1009 # sysctl kernel.ftrace_enabled=1 1010 - # echo ftrace > /debug/tracing/current_tracer 1011 # echo 1 > /debug/tracing/tracing_enabled 1012 # usleep 1 1013 # echo 0 > /debug/tracing/tracing_enabled 1014 # cat /debug/tracing/trace 1015 - # tracer: ftrace 1016 # 1017 # TASK-PID CPU# TIMESTAMP FUNCTION 1018 # | | | | | ··· 1032 [...] 1033 1034 1035 - Note: ftrace uses ring buffers to store the above entries. The newest data 1036 - may overwrite the oldest data. Sometimes using echo to stop the trace 1037 - is not sufficient because the tracing could have overwritten the data 1038 - that you wanted to record. For this reason, it is sometimes better to 1039 disable tracing directly from a program. This allows you to stop the 1040 tracing at the point that you hit the part that you are interested in. 1041 To disable the tracing directly from a C program, something like following ··· 1069 of pointing to a simple return. (Enabling FTRACE will include the 1070 -pg switch in the compiling of the kernel.) 1071 1072 - When dynamic ftrace is initialized, it calls kstop_machine to make 1073 - the machine act like a uniprocessor so that it can freely modify code 1074 - without worrying about other processors executing that same code. At 1075 - initialization, the mcount calls are changed to call a "record_ip" 1076 - function. After this, the first time a kernel function is called, 1077 - it has the calling address saved in a hash table. 1078 1079 - Later on the ftraced kernel thread is awoken and will again call 1080 - kstop_machine if new functions have been recorded. The ftraced thread 1081 - will change all calls to mcount to "nop". Just calling mcount 1082 - and having mcount return has shown a 10% overhead. By converting 1083 - it to a nop, there is no measurable overhead to the system. 1084 1085 One special side-effect to the recording of the functions being 1086 traced is that we can now selectively choose which functions we ··· 1256 1257 We can see that there's no more lock or preempt tracing. 1258 1259 - ftraced 1260 - ------- 1261 - 1262 - As mentioned above, when dynamic ftrace is configured in, a kernel 1263 - thread wakes up once a second and checks to see if there are mcount 1264 - calls that need to be converted into nops. If there are not any, then 1265 - it simply goes back to sleep. But if there are some, it will call 1266 - kstop_machine to convert the calls to nops. 1267 - 1268 - There may be a case in which you do not want this added latency. 1269 - Perhaps you are doing some audio recording and this activity might 1270 - cause skips in the playback. There is an interface to disable 1271 - and enable the "ftraced" kernel thread. 1272 - 1273 - # echo 0 > /debug/tracing/ftraced_enabled 1274 - 1275 - This will disable the calling of kstop_machine to update the 1276 - mcount calls to nops. Remember that there is a large overhead 1277 - to calling mcount. Without this kernel thread, that overhead will 1278 - exist. 1279 - 1280 - If there are recorded calls to mcount, any write to the ftraced_enabled 1281 - file will cause the kstop_machine to run. This means that a 1282 - user can manually perform the updates when they want to by simply 1283 - echoing a '0' into the ftraced_enabled file. 1284 - 1285 - The updates are also done at the beginning of enabling a tracer 1286 - that uses ftrace function recording. 1287 - 1288 - 1289 trace_pipe 1290 ---------- 1291 ··· 1264 This means that subsequent reads will be different. The trace 1265 is live. 1266 1267 - # echo ftrace > /debug/tracing/current_tracer 1268 # cat /debug/tracing/trace_pipe > /tmp/trace.out & 1269 [1] 4153 1270 # echo 1 > /debug/tracing/tracing_enabled 1271 # usleep 1 1272 # echo 0 > /debug/tracing/tracing_enabled 1273 # cat /debug/tracing/trace 1274 - # tracer: ftrace 1275 # 1276 # TASK-PID CPU# TIMESTAMP FUNCTION 1277 # | | | | | ··· 1292 1293 Note, reading the trace_pipe file will block until more input is added. 1294 By changing the tracer, trace_pipe will issue an EOF. We needed 1295 - to set the ftrace tracer _before_ cating the trace_pipe file. 1296 1297 1298 trace entries ··· 1309 65620 1310 1311 Note, to modify this, you must have tracing completely disabled. To do that, 1312 - echo "none" into the current_tracer. If the current_tracer is not set 1313 - to "none", an EINVAL error will be returned. 1314 1315 - # echo none > /debug/tracing/current_tracer 1316 # echo 100000 > /debug/tracing/trace_entries 1317 # cat /debug/tracing/trace_entries 1318 100045
··· 8 Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton, 9 John Kacur, and David Teigland. 10 11 + Written for: 2.6.28-rc2 12 13 Introduction 14 ------------ ··· 50 51 Note: all time values are in microseconds. 52 53 + current_tracer: This is used to set or display the current tracer 54 that is configured. 55 56 + available_tracers: This holds the different types of tracers that 57 have been compiled into the kernel. The tracers 58 listed here can be configured by echoing their name 59 into current_tracer. 60 61 + tracing_enabled: This sets or displays whether the current_tracer 62 is activated and tracing or not. Echo 0 into this 63 file to disable the tracer or 1 to enable it. 64 65 + trace: This file holds the output of the trace in a human readable 66 format (described below). 67 68 + latency_trace: This file shows the same trace but the information 69 is organized more to display possible latencies 70 in the system (described below). 71 72 + trace_pipe: The output is the same as the "trace" file but this 73 file is meant to be streamed with live tracing. 74 Reads from this file will block until new data 75 is retrieved. Unlike the "trace" and "latency_trace" ··· 82 tracer is not adding more data, they will display 83 the same information every time they are read. 84 85 + iter_ctrl: This file lets the user control the amount of data 86 that is displayed in one of the above output 87 files. 88 89 + trace_max_latency: Some of the tracers record the max latency. 90 For example, the time interrupts are disabled. 91 This time is saved in this file. The max trace 92 will also be stored, and displayed by either ··· 94 only be recorded if the latency is greater than 95 the value in this file. (in microseconds) 96 97 + trace_entries: This sets or displays the number of bytes each CPU 98 + buffer can hold. The tracer buffers are the same size 99 + for each CPU. The displayed number is the size of the 100 + CPU buffer and not total size of all buffers. The 101 trace buffers are allocated in pages (blocks of memory 102 that the kernel uses for allocation, usually 4 KB in size). 103 + If the last page allocated has room for more bytes 104 + than requested, the rest of the page will be used, 105 + making the actual allocation bigger than requested. 106 + (Note, the size may not be a multiple of the page size due 107 + to buffer managment overhead.) 108 109 This can only be updated when the current_tracer 110 + is set to "nop". 111 112 + tracing_cpumask: This is a mask that lets the user only trace 113 on specified CPUS. The format is a hex string 114 representing the CPUS. 115 116 + set_ftrace_filter: When dynamic ftrace is configured in (see the 117 section below "dynamic ftrace"), the code is dynamically 118 modified (code text rewrite) to disable calling of the 119 function profiler (mcount). This lets tracing be configured ··· 130 be traced. If a function exists in both set_ftrace_filter 131 and set_ftrace_notrace, the function will _not_ be traced. 132 133 + available_filter_functions: This lists the functions that ftrace 134 + has processed and can trace. These are the function 135 + names that you can pass to "set_ftrace_filter" or 136 + "set_ftrace_notrace". (See the section "dynamic ftrace" 137 + below for more details.) 138 139 140 The Tracers ··· 145 146 Here is the list of current tracers that may be configured. 147 148 + function - function tracer that uses mcount to trace all functions. 149 150 sched_switch - traces the context switches between tasks. 151 ··· 166 the highest priority task to get scheduled after 167 it has been woken up. 168 169 + nop - This is not a tracer. To remove all tracers from tracing 170 + simply echo "nop" into current_tracer. 171 172 173 Examples of using the tracer ··· 182 Here is an example of the output format of the file "trace" 183 184 -------- 185 + # tracer: function 186 # 187 # TASK-PID CPU# TIMESTAMP FUNCTION 188 # | | | | | ··· 192 -------- 193 194 A header is printed with the tracer name that is represented by the trace. 195 + In this case the tracer is "function". Then a header showing the format. Task 196 name "bash", the task PID "4251", the CPU that it was running on 197 "01", the timestamp in <secs>.<usecs> format, the function name that was 198 traced "path_put" and the parent function that called this function ··· 1003 has been set. We do not see the 'N' until we switch back to the task's 1004 assigned stack. 1005 1006 + function 1007 + -------- 1008 1009 + This tracer is the function tracer. Enabling the function tracer 1010 + can be done from the debug file system. Make sure the ftrace_enabled is 1011 + set; otherwise this tracer is a nop. 1012 1013 # sysctl kernel.ftrace_enabled=1 1014 + # echo function > /debug/tracing/current_tracer 1015 # echo 1 > /debug/tracing/tracing_enabled 1016 # usleep 1 1017 # echo 0 > /debug/tracing/tracing_enabled 1018 # cat /debug/tracing/trace 1019 + # tracer: function 1020 # 1021 # TASK-PID CPU# TIMESTAMP FUNCTION 1022 # | | | | | ··· 1040 [...] 1041 1042 1043 + Note: function tracer uses ring buffers to store the above entries. 1044 + The newest data may overwrite the oldest data. Sometimes using echo to 1045 + stop the trace is not sufficient because the tracing could have overwritten 1046 + the data that you wanted to record. For this reason, it is sometimes better to 1047 disable tracing directly from a program. This allows you to stop the 1048 tracing at the point that you hit the part that you are interested in. 1049 To disable the tracing directly from a C program, something like following ··· 1077 of pointing to a simple return. (Enabling FTRACE will include the 1078 -pg switch in the compiling of the kernel.) 1079 1080 + At compile time every C file object is run through the 1081 + recordmcount.pl script (located in the scripts directory). This 1082 + script will process the C object using objdump to find all the 1083 + locations in the .text section that call mcount. (Note, only 1084 + the .text section is processed, since processing other sections 1085 + like .init.text may cause races due to those sections being freed). 1086 1087 + A new section called "__mcount_loc" is created that holds references 1088 + to all the mcount call sites in the .text section. This section is 1089 + compiled back into the original object. The final linker will add 1090 + all these references into a single table. 1091 + 1092 + On boot up, before SMP is initialized, the dynamic ftrace code 1093 + scans this table and updates all the locations into nops. It also 1094 + records the locations, which are added to the available_filter_functions 1095 + list. Modules are processed as they are loaded and before they are 1096 + executed. When a module is unloaded, it also removes its functions from 1097 + the ftrace function list. This is automatic in the module unload 1098 + code, and the module author does not need to worry about it. 1099 + 1100 + When tracing is enabled, kstop_machine is called to prevent races 1101 + with the CPUS executing code being modified (which can cause the 1102 + CPU to do undesireable things), and the nops are patched back 1103 + to calls. But this time, they do not call mcount (which is just 1104 + a function stub). They now call into the ftrace infrastructure. 1105 1106 One special side-effect to the recording of the functions being 1107 traced is that we can now selectively choose which functions we ··· 1251 1252 We can see that there's no more lock or preempt tracing. 1253 1254 trace_pipe 1255 ---------- 1256 ··· 1289 This means that subsequent reads will be different. The trace 1290 is live. 1291 1292 + # echo function > /debug/tracing/current_tracer 1293 # cat /debug/tracing/trace_pipe > /tmp/trace.out & 1294 [1] 4153 1295 # echo 1 > /debug/tracing/tracing_enabled 1296 # usleep 1 1297 # echo 0 > /debug/tracing/tracing_enabled 1298 # cat /debug/tracing/trace 1299 + # tracer: function 1300 # 1301 # TASK-PID CPU# TIMESTAMP FUNCTION 1302 # | | | | | ··· 1317 1318 Note, reading the trace_pipe file will block until more input is added. 1319 By changing the tracer, trace_pipe will issue an EOF. We needed 1320 + to set the function tracer _before_ we "cat" the trace_pipe file. 1321 1322 1323 trace entries ··· 1334 65620 1335 1336 Note, to modify this, you must have tracing completely disabled. To do that, 1337 + echo "nop" into the current_tracer. If the current_tracer is not set 1338 + to "nop", an EINVAL error will be returned. 1339 1340 + # echo nop > /debug/tracing/current_tracer 1341 # echo 100000 > /debug/tracing/trace_entries 1342 # cat /debug/tracing/trace_entries 1343 100045
+1 -1
kernel/trace/ring_buffer.c
··· 1060 1061 /* Did the write stamp get updated already? */ 1062 if (unlikely(ts < cpu_buffer->write_stamp)) 1063 - goto again; 1064 1065 if (test_time_stamp(delta)) { 1066
··· 1060 1061 /* Did the write stamp get updated already? */ 1062 if (unlikely(ts < cpu_buffer->write_stamp)) 1063 + delta = 0; 1064 1065 if (test_time_stamp(delta)) { 1066
+17 -2
kernel/trace/trace.c
··· 1755 return TRACE_TYPE_HANDLED; 1756 1757 SEQ_PUT_FIELD_RET(s, entry->pid); 1758 - SEQ_PUT_FIELD_RET(s, iter->cpu); 1759 SEQ_PUT_FIELD_RET(s, iter->ts); 1760 1761 switch (entry->type) { ··· 2676 { 2677 unsigned long val; 2678 char buf[64]; 2679 - int ret; 2680 struct trace_array *tr = filp->private_data; 2681 2682 if (cnt >= sizeof(buf)) ··· 2702 pr_info("ftrace: please disable tracing" 2703 " before modifying buffer size\n"); 2704 goto out; 2705 } 2706 2707 if (val != global_trace.entries) { ··· 2743 if (tracing_disabled) 2744 cnt = -ENOMEM; 2745 out: 2746 max_tr.entries = global_trace.entries; 2747 mutex_unlock(&trace_types_lock); 2748
··· 1755 return TRACE_TYPE_HANDLED; 1756 1757 SEQ_PUT_FIELD_RET(s, entry->pid); 1758 + SEQ_PUT_FIELD_RET(s, entry->cpu); 1759 SEQ_PUT_FIELD_RET(s, iter->ts); 1760 1761 switch (entry->type) { ··· 2676 { 2677 unsigned long val; 2678 char buf[64]; 2679 + int ret, cpu; 2680 struct trace_array *tr = filp->private_data; 2681 2682 if (cnt >= sizeof(buf)) ··· 2702 pr_info("ftrace: please disable tracing" 2703 " before modifying buffer size\n"); 2704 goto out; 2705 + } 2706 + 2707 + /* disable all cpu buffers */ 2708 + for_each_tracing_cpu(cpu) { 2709 + if (global_trace.data[cpu]) 2710 + atomic_inc(&global_trace.data[cpu]->disabled); 2711 + if (max_tr.data[cpu]) 2712 + atomic_inc(&max_tr.data[cpu]->disabled); 2713 } 2714 2715 if (val != global_trace.entries) { ··· 2735 if (tracing_disabled) 2736 cnt = -ENOMEM; 2737 out: 2738 + for_each_tracing_cpu(cpu) { 2739 + if (global_trace.data[cpu]) 2740 + atomic_dec(&global_trace.data[cpu]->disabled); 2741 + if (max_tr.data[cpu]) 2742 + atomic_dec(&max_tr.data[cpu]->disabled); 2743 + } 2744 + 2745 max_tr.entries = global_trace.entries; 2746 mutex_unlock(&trace_types_lock); 2747