Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core

+74 -197
+7
Documentation/trace/ftrace-design.txt
···
 - Support the TIF_SYSCALL_TRACEPOINT thread flags.
 - Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace
   in the ptrace syscalls tracing path.
+- If the system call table on this arch is more complicated than a simple array
+  of addresses of the system calls, implement an arch_syscall_addr to return
+  the address of a given system call.
+- If the symbol names of the system calls do not match the function names on
+  this arch, define ARCH_HAS_SYSCALL_MATCH_SYM_NAME in asm/ftrace.h and
+  implement arch_syscall_match_sym_name with the appropriate logic to return
+  true if the function name corresponds with the symbol name.
 - Tag this arch as HAVE_SYSCALL_TRACEPOINTS.
 
 
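The second added bullet can be illustrated with a small user-space sketch. It assumes a hypothetical architecture whose symbol table prefixes function symbols with a dot (for example ".sys_read"). The function name example_arch_syscall_match_sym_name and the dot convention are assumptions made for illustration and are not part of this patch; an actual override would presumably sit behind ARCH_HAS_SYSCALL_MATCH_SYM_NAME in the arch's asm/ftrace.h.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical arch override (illustration only): symbols carry a
 * leading '.', so strip it before comparing. The first three
 * characters are then skipped so that "sys" and "SyS" prefixes
 * compare equal. Both strings are assumed to have at least three
 * characters after any leading dot.
 */
static bool example_arch_syscall_match_sym_name(const char *sym,
						const char *name)
{
	if (sym[0] == '.')
		sym++;
	return strcmp(sym + 3, name + 3) == 0;
}
```

The only arch-specific step here is dropping the dot; the rest mirrors the generic comparison that the patch adds to kernel/trace/trace_syscalls.c.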
+19 -129
Documentation/trace/ftrace.txt
···
 	tracers listed here can be configured by
 	echoing their name into current_tracer.
 
-  tracing_enabled:
+  tracing_on:
 
-	This sets or displays whether the current_tracer
-	is activated and tracing or not. Echo 0 into this
-	file to disable the tracer or 1 to enable it.
+	This sets or displays whether writing to the trace
+	ring buffer is enabled. Echo 0 into this file to disable
+	the tracer or 1 to enable it.
 
   trace:
 
···
 	to draw a graph of function calls similar to C code
 	source.
 
-  "sched_switch"
-
-	Traces the context switches and wakeups between tasks.
-
   "irqsoff"
 
 	Traces the areas that disable interrupts and saves
···
 format, the function name that was traced "path_put" and the
 parent function that called this function "path_walk". The
 timestamp is the time at which the function was entered.
-
-The sched_switch tracer also includes tracing of task wakeups
-and context switches.
-
-     ksoftirqd/1-7     [01]  1453.070013:      7:115:R   +  2916:115:S
-     ksoftirqd/1-7     [01]  1453.070013:      7:115:R   +    10:115:S
-     ksoftirqd/1-7     [01]  1453.070013:      7:115:R ==>    10:115:R
-        events/1-10    [01]  1453.070013:     10:115:S ==>  2916:115:R
-     kondemand/1-2916  [01]  1453.070013:   2916:115:S ==>     7:115:R
-     ksoftirqd/1-7     [01]  1453.070013:      7:115:S ==>     0:140:R
-
-Wake ups are represented by a "+" and the context switches are
-shown as "==>".  The format is:
-
- Context switches:
-
-       Previous task              Next Task
-
-  <pid>:<prio>:<state>  ==>  <pid>:<prio>:<state>
-
- Wake ups:
-
-       Current task               Task waking up
-
-  <pid>:<prio>:<state>    +  <pid>:<prio>:<state>
-
-The prio is the internal kernel priority, which is the inverse
-of the priority that is usually displayed by user-space tools.
-Zero represents the highest priority (99). Prio 100 starts the
-"nice" priorities with 100 being equal to nice -20 and 139 being
-nice 19. The prio "140" is reserved for the idle task which is
-the lowest priority thread (pid 0).
-
 
 Latency trace format
 --------------------
···
                   latencies, as described in "Latency
                   trace format".
 
-sched_switch
-------------
-
-This tracer simply records schedule switches. Here is an example
-of how to use it.
-
- # echo sched_switch > current_tracer
- # echo 1 > tracing_enabled
- # sleep 1
- # echo 0 > tracing_enabled
- # cat trace
-
-# tracer: sched_switch
-#
-#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
-#              | |      |          |         |
-            bash-3997  [01]   240.132281:   3997:120:R   +  4055:120:R
-            bash-3997  [01]   240.132284:   3997:120:R ==>  4055:120:R
-           sleep-4055  [01]   240.132371:   4055:120:S ==>  3997:120:R
-            bash-3997  [01]   240.132454:   3997:120:R   +  4055:120:S
-            bash-3997  [01]   240.132457:   3997:120:R ==>  4055:120:R
-           sleep-4055  [01]   240.132460:   4055:120:D ==>  3997:120:R
-            bash-3997  [01]   240.132463:   3997:120:R   +  4055:120:D
-            bash-3997  [01]   240.132465:   3997:120:R ==>  4055:120:R
-          <idle>-0     [00]   240.132589:      0:140:R   +     4:115:S
-          <idle>-0     [00]   240.132591:      0:140:R ==>     4:115:R
-     ksoftirqd/0-4     [00]   240.132595:      4:115:S ==>     0:140:R
-          <idle>-0     [00]   240.132598:      0:140:R   +     4:115:S
-          <idle>-0     [00]   240.132599:      0:140:R ==>     4:115:R
-     ksoftirqd/0-4     [00]   240.132603:      4:115:S ==>     0:140:R
-           sleep-4055  [01]   240.133058:   4055:120:S ==>  3997:120:R
- [...]
-
-
-As we have discussed previously about this format, the header
-shows the name of the trace and points to the options. The
-"FUNCTION" is a misnomer since here it represents the wake ups
-and context switches.
-
-The sched_switch file only lists the wake ups (represented with
-'+') and context switches ('==>') with the previous task or
-current task first followed by the next task or task waking up.
-The format for both of these is PID:KERNEL-PRIO:TASK-STATE.
-Remember that the KERNEL-PRIO is the inverse of the actual
-priority with zero (0) being the highest priority and the nice
-values starting at 100 (nice -20). Below is a quick chart to map
-the kernel priority to user land priorities.
-
-   Kernel Space                     User Space
- ===============================================================
-   0(high) to  98(low)     user RT priority 99(high) to 1(low)
-                           with SCHED_RR or SCHED_FIFO
- ---------------------------------------------------------------
-  99                       sched_priority is not used in scheduling
-                           decisions(it must be specified as 0)
- ---------------------------------------------------------------
- 100(high) to 139(low)     user nice -20(high) to 19(low)
- ---------------------------------------------------------------
- 140                       idle task priority
- ---------------------------------------------------------------
-
-The task states are:
-
- R - running : wants to run, may not actually be running
- S - sleep   : process is waiting to be woken up (handles signals)
- D - disk sleep (uninterruptible sleep) : process must be woken up
-					(ignores signals)
- T - stopped : process suspended
- t - traced  : process is being traced (with something like gdb)
- Z - zombie  : process waiting to be cleaned up
- X - unknown
-
-
 ftrace_enabled
 --------------
 
···
  # echo irqsoff > current_tracer
  # echo latency-format > trace_options
  # echo 0 > tracing_max_latency
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
  # ls -ltr
  [...]
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
  # cat trace
 # tracer: irqsoff
 #
···
  # echo preemptoff > current_tracer
  # echo latency-format > trace_options
  # echo 0 > tracing_max_latency
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
  # ls -ltr
  [...]
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
  # cat trace
 # tracer: preemptoff
 #
···
  # echo preemptirqsoff > current_tracer
  # echo latency-format > trace_options
  # echo 0 > tracing_max_latency
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
  # ls -ltr
  [...]
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
  # cat trace
 # tracer: preemptirqsoff
 #
···
  # echo wakeup > current_tracer
  # echo latency-format > trace_options
  # echo 0 > tracing_max_latency
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
  # chrt -f 5 sleep 1
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
  # cat trace
 # tracer: wakeup
 #
···
 
  # sysctl kernel.ftrace_enabled=1
  # echo function > current_tracer
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
  # usleep 1
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
  # cat trace
 # tracer: function
 #
···
 [...]
 int main(int argc, char *argv[]) {
 	[...]
-	trace_fd = open(tracing_file("tracing_enabled"), O_WRONLY);
+	trace_fd = open(tracing_file("tracing_on"), O_WRONLY);
 	[...]
 	if (condition_hit()) {
 		write(trace_fd, "0", 1);
···
  # echo sys_nanosleep hrtimer_interrupt \
 		> set_ftrace_filter
  # echo function > current_tracer
- # echo 1 > tracing_enabled
+ # echo 1 > tracing_on
  # usleep 1
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
  # cat trace
 # tracer: ftrace
 #
···
  # echo function > current_tracer
  # cat trace_pipe > /tmp/trace.out &
 [1] 4153
- # echo 1 > tracing_on
+ # echo 1 > tracing_on
  # usleep 1
- # echo 0 > tracing_enabled
+ # echo 0 > tracing_on
  # cat trace
 # tracer: function
 #
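The documentation hunks above replace writes to tracing_enabled with writes to tracing_on. A minimal user-space sketch of the same operation follows. The helper name set_tracing_on is hypothetical, and the caller is assumed to pass the full path to the control file (for example a tracing_on file under /sys/kernel/debug/tracing, if debugfs is mounted there).

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): write "1" or "0" to a
 * tracing_on-style control file. Returns 0 on success, -1 if the
 * file cannot be opened or the single byte cannot be written.
 */
static int set_tracing_on(const char *path, int on)
{
	ssize_t n;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	n = write(fd, on ? "1" : "0", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}
```

A caller would enable tracing before the code of interest and disable it right after, in the same way as the echo 1 / echo 0 pairs in the shell examples.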
+6 -4
include/linux/syscalls.h
···
 		.class = &event_class_syscall_enter,	\
 		.event.funcs = &enter_syscall_print_funcs,	\
 		.data = (void *)&__syscall_meta_##sname,\
+		.flags = TRACE_EVENT_FL_CAP_ANY,	\
 	};	\
 	static struct ftrace_event_call __used	\
 	  __attribute__((section("_ftrace_events")))	\
-	 *__event_enter_##sname = &event_enter_##sname;	\
-	__TRACE_EVENT_FLAGS(enter_##sname, TRACE_EVENT_FL_CAP_ANY)
+	 *__event_enter_##sname = &event_enter_##sname;
 
 #define SYSCALL_TRACE_EXIT_EVENT(sname)	\
 	static struct syscall_metadata __syscall_meta_##sname;	\
···
 		.class = &event_class_syscall_exit,	\
 		.event.funcs = &exit_syscall_print_funcs,	\
 		.data = (void *)&__syscall_meta_##sname,\
+		.flags = TRACE_EVENT_FL_CAP_ANY,	\
 	};	\
 	static struct ftrace_event_call __used	\
 	  __attribute__((section("_ftrace_events")))	\
-	*__event_exit_##sname = &event_exit_##sname;	\
-	__TRACE_EVENT_FLAGS(exit_##sname, TRACE_EVENT_FL_CAP_ANY)
+	*__event_exit_##sname = &event_exit_##sname;
 
 #define SYSCALL_METADATA(sname, nb)	\
 	SYSCALL_TRACE_ENTER_EVENT(sname);	\
···
 	static struct syscall_metadata __used	\
 	  __syscall_meta_##sname = {	\
 		.name = "sys"#sname,	\
+		.syscall_nr = -1, /* Filled in at boot */	\
 		.nb_args = nb,	\
 		.types = types_##sname,	\
 		.args = args_##sname,	\
···
 	static struct syscall_metadata __used	\
 	  __syscall_meta__##sname = {	\
 		.name = "sys_"#sname,	\
+		.syscall_nr = -1, /* Filled in at boot */	\
 		.nb_args = 0,	\
 		.enter_event = &event_enter__##sname,	\
 		.exit_event = &event_exit__##sname,	\
+6 -2
kernel/trace/ring_buffer.c
···
 		delta = diff;
 		if (unlikely(test_time_stamp(delta))) {
 			WARN_ONCE(delta > (1ULL << 59),
-				  KERN_WARNING "Delta way too big! %llu ts=%llu write stamp = %llu\n",
+				  KERN_WARNING "Delta way too big! %llu ts=%llu write stamp = %llu\n%s",
 				  (unsigned long long)delta,
 				  (unsigned long long)ts,
-				  (unsigned long long)cpu_buffer->write_stamp);
+				  (unsigned long long)cpu_buffer->write_stamp,
+				  sched_clock_stable ? "" :
+				  "If you just came from a suspend/resume,\n"
+				  "please switch to the trace global clock:\n"
+				  "  echo global > /sys/kernel/debug/tracing/trace_clock\n");
 			add_timestamp = 1;
 		}
 	}
+4
kernel/trace/trace.c
···
 
 	mutex_lock(&trace_types_lock);
 	if (tracer_enabled ^ val) {
+
+		/* Only need to warn if this is used to change the state */
+		WARN_ONCE(1, "tracing_enabled is deprecated. Use tracing_on");
+
 		if (val) {
 			tracer_enabled = 1;
 			if (current_trace->start)
-48
kernel/trace/trace_sched_switch.c
···
 	ctx_trace = tr;
 }
 
-static void stop_sched_trace(struct trace_array *tr)
-{
-	tracing_stop_sched_switch_record();
-}
-
-static int sched_switch_trace_init(struct trace_array *tr)
-{
-	ctx_trace = tr;
-	tracing_reset_online_cpus(tr);
-	tracing_start_sched_switch_record();
-	return 0;
-}
-
-static void sched_switch_trace_reset(struct trace_array *tr)
-{
-	if (sched_ref)
-		stop_sched_trace(tr);
-}
-
-static void sched_switch_trace_start(struct trace_array *tr)
-{
-	sched_stopped = 0;
-}
-
-static void sched_switch_trace_stop(struct trace_array *tr)
-{
-	sched_stopped = 1;
-}
-
-static struct tracer sched_switch_trace __read_mostly =
-{
-	.name = "sched_switch",
-	.init = sched_switch_trace_init,
-	.reset = sched_switch_trace_reset,
-	.start = sched_switch_trace_start,
-	.stop = sched_switch_trace_stop,
-	.wait_pipe = poll_wait_pipe,
-#ifdef CONFIG_FTRACE_SELFTEST
-	.selftest = trace_selftest_startup_sched_switch,
-#endif
-};
-
-__init static int init_sched_switch_trace(void)
-{
-	return register_tracer(&sched_switch_trace);
-}
-device_initcall(init_sched_switch_trace);
-
+30 -12
kernel/trace/trace_syscalls.c
···
 
 static struct syscall_metadata **syscalls_metadata;
 
+#ifndef ARCH_HAS_SYSCALL_MATCH_SYM_NAME
+static inline bool arch_syscall_match_sym_name(const char *sym, const char *name)
+{
+	/*
+	 * Only compare after the "sys" prefix. Archs that use
+	 * syscall wrappers may have syscalls symbols aliases prefixed
+	 * with "SyS" instead of "sys", leading to an unwanted
+	 * mismatch.
+	 */
+	return !strcmp(sym + 3, name + 3);
+}
+#endif
+
 static __init struct syscall_metadata *
 find_syscall_meta(unsigned long syscall)
 {
···
 	stop = __stop_syscalls_metadata;
 	kallsyms_lookup(syscall, NULL, NULL, NULL, str);
 
+	if (arch_syscall_match_sym_name(str, "sys_ni_syscall"))
+		return NULL;
+
 	for ( ; start < stop; start++) {
-		/*
-		 * Only compare after the "sys" prefix. Archs that use
-		 * syscall wrappers may have syscalls symbols aliases prefixed
-		 * with "SyS" instead of "sys", leading to an unwanted
-		 * mismatch.
-		 */
-		if ((*start)->name && !strcmp((*start)->name + 3, str + 3))
+		if ((*start)->name && arch_syscall_match_sym_name(str, (*start)->name))
 			return *start;
 	}
 	return NULL;
···
 	int num;
 
 	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (num < 0 || num >= NR_syscalls)
+	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
 	if (!sys_refcount_enter)
···
 	int num;
 
 	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (num < 0 || num >= NR_syscalls)
+	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return;
 	mutex_lock(&syscall_trace_lock);
 	sys_refcount_enter--;
···
 	int num;
 
 	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (num < 0 || num >= NR_syscalls)
+	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
 	if (!sys_refcount_exit)
···
 	int num;
 
 	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (num < 0 || num >= NR_syscalls)
+	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return;
 	mutex_lock(&syscall_trace_lock);
 	sys_refcount_exit--;
···
 int init_syscall_trace(struct ftrace_event_call *call)
 {
 	int id;
+	int num;
+
+	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	if (num < 0 || num >= NR_syscalls) {
+		pr_debug("syscall %s metadata not mapped, disabling ftrace event\n",
+				((struct syscall_metadata *)call->data)->name);
+		return -ENOSYS;
+	}
 
 	if (set_syscall_print_fmt(call) < 0)
 		return -ENOMEM;
···
 	return id;
 }
 
-unsigned long __init arch_syscall_addr(int nr)
+unsigned long __init __weak arch_syscall_addr(int nr)
 {
 	return (unsigned long)sys_call_table[nr];
 }
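The default arch_syscall_match_sym_name added above skips the first three characters of both strings before comparing. A user-space sketch of that comparison is shown here so its behaviour on "sys" and "SyS" prefixed symbols can be checked outside the kernel. The name match_sym_name is an assumption for illustration, and both inputs are assumed to be at least three characters long.

```c
#include <stdbool.h>
#include <string.h>

/*
 * User-space copy of the default comparison (illustration only):
 * ignore the first three characters ("sys" or "SyS") of each name
 * and compare the remainder, including the underscore.
 */
static bool match_sym_name(const char *sym, const char *name)
{
	return strcmp(sym + 3, name + 3) == 0;
}
```

Because only the prefix is ignored, a wrapper alias such as "SyS_read" compares equal to the metadata name "sys_read", while names for different system calls still differ.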
+1 -1
scripts/kconfig/streamline_config.pl
···
 #!/usr/bin/perl -w
 #
-# Copywrite 2005-2009 - Steven Rostedt
+# Copyright 2005-2009 - Steven Rostedt
 # Licensed under the terms of the GNU GPL License version 2
 #
 # It's simple enough to figure out how this works.
+1 -1
tools/testing/ktest/ktest.pl
···
 #!/usr/bin/perl -w
 #
-# Copywrite 2010 - Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
+# Copyright 2010 - Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
 # Licensed under the terms of the GNU GPL License version 2
 #
 