Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

- Add function names as a way to filter function addresses

- Add sample module to test ftrace ops and dynamic trampolines

- Allow stack traces to be passed from a beginning event to an end event for
  synthetic events. This makes it possible to record the stack trace of when a
  task is scheduled out and report it when the task gets scheduled back in.

- Add trace event helper __get_buf() to use as a temporary buffer when
printing out trace event output.

- Add kernel command line to create trace instances on boot up.

- Add enabling of events to instances created at boot up.

- Add trace_array_puts() to write into instances.

- Allow boot instances to take a snapshot at the end of boot up.

- Allow live patch modules to include trace events

- Minor fixes and clean ups

* tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
tracing: Remove unnecessary NULL assignment
tracepoint: Allow livepatch module add trace event
tracing: Always use canonical ftrace path
tracing/histogram: Fix stacktrace histogram Documentation
tracing/histogram: Fix stacktrace key
tracing/histogram: Fix a few problems with stacktrace variable printing
tracing: Add BUILD_BUG() to make sure stacktrace fits in strings
tracing/histogram: Don't use strlen to find length of stacktrace variables
tracing: Allow boot instances to have snapshot buffers
tracing: Add trace_array_puts() to write into instance
tracing: Add enabling of events to boot instances
tracing: Add creation of instances at boot command line
tracing: Fix trace_event_raw_event_synth() if else statement
samples: ftrace: Make some global variables static
ftrace: sample: avoid open-coded 64-bit division
samples: ftrace: Include the nospec-branch.h only for x86
tracing: Acquire buffer from temporary trace sequence
tracing/histogram: Wrap remaining shell snippets in code blocks
tracing/osnoise: No need for schedule_hrtimeout range
bpf/tracing: Use stage6 of tracing to not duplicate macros
...

+1112 -220
+29
Documentation/admin-guide/kernel-parameters.txt
··· 1509
 			boot up that is likely to be overridden by user space
 			start up functionality.

+			Optionally, the snapshot can also be defined for a tracing
+			instance that was created by the trace_instance= command
+			line parameter.
+
+			trace_instance=foo,sched_switch ftrace_boot_snapshot=foo
+
+			The above will cause the "foo" tracing instance to trigger
+			a snapshot at the end of boot up.
+
 	ftrace_dump_on_oops[=orig_cpu]
 			[FTRACE] will dump the trace buffers on oops.
 			If no parameter is passed, ftrace will dump
··· 6282
 			to facilitate early boot debugging. The event-list is a
 			comma-separated list of trace events to enable. See
 			also Documentation/trace/events.rst

+	trace_instance=[instance-info]
+			[FTRACE] Create a ring buffer instance early in boot up.
+			This will be listed in:
+
+			/sys/kernel/tracing/instances
+
+			Events can be enabled at the time the instance is created
+			via:
+
+			trace_instance=<name>,<system1>:<event1>,<system2>:<event2>
+
+			Note, the "<system*>:" portion is optional if the event is
+			unique.
+
+			trace_instance=foo,sched:sched_switch,irq_handler_entry,initcall
+
+			will enable the "sched_switch" event (note, the "sched:" is
+			optional, and the same thing would happen if it was left off),
+			the irq_handler_entry event, and all events under the
+			"initcall" system.

 	trace_options=[option-list]
 			[FTRACE] Enable or disable tracer options at boot.
+12
Documentation/trace/events.rst
··· 207
 As the kernel will have to know how to retrieve the memory that the pointer
 is at from user space.

+You can convert any long type to a function address and search by function name::
+
+  call_site.function == security_prepare_creds
+
+The above will filter when the field "call_site" falls on the address within
+"security_prepare_creds". That is, it will compare the value of "call_site" and
+the filter will return true if it is greater than or equal to the start of
+the function "security_prepare_creds" and less than the end of that function.
+
+The ".function" postfix can only be attached to values of size long, and can only
+be compared with "==" or "!=".
+
 5.2 Setting filters
 -------------------
+198 -44
Documentation/trace/histogram.rst
··· 81
 	.usecs        display a common_timestamp in microseconds
 	.percent      display a number of percentage value
 	.graph        display a bar-graph of a value
+	.stacktrace   display as a stacktrace (must be a long[] type)
 	============= =================================================

 Note that in general the semantics of a given field aren't
··· 1786
 	# echo 'hist:keys=next_pid:us_per_sec=1000000 ...' >> event/trigger
 	# echo 'hist:keys=next_pid:timestamp_secs=common_timestamp/$us_per_sec ...' >> event/trigger

+Variables can even hold stacktraces, which are useful with synthetic events.
+
 2.2.2 Synthetic Events
 ----------------------
··· 1861
 The above shows the latency "lat" in a power of 2 grouping.

 Like any other event, once a histogram is enabled for the event, the
-output can be displayed by reading the event's 'hist' file.
+output can be displayed by reading the event's 'hist' file::

 	# cat /sys/kernel/tracing/events/synthetic/wakeup_latency/hist
··· 1908


 The latency values can also be grouped linearly by a given size with
-the ".buckets" modifier and specify a size (in this case groups of 10).
+the ".buckets" modifier and specify a size (in this case groups of 10)::

 	# echo 'hist:keys=pid,prio,lat.buckets=10:sort=lat' >> \
 		/sys/kernel/tracing/events/synthetic/wakeup_latency/trigger
··· 1939
 	Hits: 2112
 	Entries: 16
 	Dropped: 0
+
+To save stacktraces, create a synthetic event with a field of type "unsigned long[]"
+or even just "long[]". For example, to see how long a task is blocked in an
+uninterruptible state::
+
+  # cd /sys/kernel/tracing
+  # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
+  # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 2' >> events/sched/sched_switch/trigger
+  # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
+  # echo 1 > events/synthetic/block_lat/enable
+  # cat trace
+
+  # tracer: nop
+  #
+  # entries-in-buffer/entries-written: 2/2   #P:8
+  #
+  #                                _-----=> irqs-off/BH-disabled
+  #                               / _----=> need-resched
+  #                              | / _---=> hardirq/softirq
+  #                              || / _--=> preempt-depth
+  #                              ||| / _-=> migrate-disable
+  #                              |||| /     delay
+  #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
+  #              | |         |   |||||     |         |
+          <idle>-0       [005] d..4.   521.164922: block_lat: pid=0 delta=8322 stack=STACK:
+  => __schedule+0x448/0x7b0
+  => schedule+0x5a/0xb0
+  => io_schedule+0x42/0x70
+  => bit_wait_io+0xd/0x60
+  => __wait_on_bit+0x4b/0x140
+  => out_of_line_wait_on_bit+0x91/0xb0
+  => jbd2_journal_commit_transaction+0x1679/0x1a70
+  => kjournald2+0xa9/0x280
+  => kthread+0xe9/0x110
+  => ret_from_fork+0x2c/0x50
+
+           <...>-2       [004] d..4.   525.184257: block_lat: pid=2 delta=76 stack=STACK:
+  => __schedule+0x448/0x7b0
+  => schedule+0x5a/0xb0
+  => schedule_timeout+0x11a/0x150
+  => wait_for_completion_killable+0x144/0x1f0
+  => __kthread_create_on_node+0xe7/0x1e0
+  => kthread_create_on_node+0x51/0x70
+  => create_worker+0xcc/0x1a0
+  => worker_thread+0x2ad/0x380
+  => kthread+0xe9/0x110
+  => ret_from_fork+0x2c/0x50
+
+A synthetic event that has a stacktrace field may use it as a key in
+a histogram::
+
+  # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > events/synthetic/block_lat/trigger
+  # cat events/synthetic/block_lat/hist
+
+  # event histogram
+  #
+  # trigger info: hist:keys=delta.buckets=100,stack.stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]
+  #
+  { delta: ~ 0-99, stack.stacktrace         __schedule+0xa19/0x1520
+      schedule+0x6b/0x110
+      io_schedule+0x46/0x80
+      bit_wait_io+0x11/0x80
+      __wait_on_bit+0x4e/0x120
+      out_of_line_wait_on_bit+0x8d/0xb0
+      __wait_on_buffer+0x33/0x40
+      jbd2_journal_commit_transaction+0x155a/0x19b0
+      kjournald2+0xab/0x270
+      kthread+0xfa/0x130
+      ret_from_fork+0x29/0x50
+  } hitcount:          1
+  { delta: ~ 0-99, stack.stacktrace         __schedule+0xa19/0x1520
+      schedule+0x6b/0x110
+      io_schedule+0x46/0x80
+      rq_qos_wait+0xd0/0x170
+      wbt_wait+0x9e/0xf0
+      __rq_qos_throttle+0x25/0x40
+      blk_mq_submit_bio+0x2c3/0x5b0
+      __submit_bio+0xff/0x190
+      submit_bio_noacct_nocheck+0x25b/0x2b0
+      submit_bio_noacct+0x20b/0x600
+      submit_bio+0x28/0x90
+      ext4_bio_write_page+0x1e0/0x8c0
+      mpage_submit_page+0x60/0x80
+      mpage_process_page_bufs+0x16c/0x180
+      mpage_prepare_extent_to_map+0x23f/0x530
+  } hitcount:          1
+  { delta: ~ 0-99, stack.stacktrace         __schedule+0xa19/0x1520
+      schedule+0x6b/0x110
+      schedule_hrtimeout_range_clock+0x97/0x110
+      schedule_hrtimeout_range+0x13/0x20
+      usleep_range_state+0x65/0x90
+      __intel_wait_for_register+0x1c1/0x230 [i915]
+      intel_psr_wait_for_idle_locked+0x171/0x2a0 [i915]
+      intel_pipe_update_start+0x169/0x360 [i915]
+      intel_update_crtc+0x112/0x490 [i915]
+      skl_commit_modeset_enables+0x199/0x600 [i915]
+      intel_atomic_commit_tail+0x7c4/0x1080 [i915]
+      intel_atomic_commit_work+0x12/0x20 [i915]
+      process_one_work+0x21c/0x3f0
+      worker_thread+0x50/0x3e0
+      kthread+0xfa/0x130
+  } hitcount:          3
+  { delta: ~ 0-99, stack.stacktrace         __schedule+0xa19/0x1520
+      schedule+0x6b/0x110
+      schedule_timeout+0x11e/0x160
+      __wait_for_common+0x8f/0x190
+      wait_for_completion+0x24/0x30
+      __flush_work.isra.0+0x1cc/0x360
+      flush_work+0xe/0x20
+      drm_mode_rmfb+0x18b/0x1d0 [drm]
+      drm_mode_rmfb_ioctl+0x10/0x20 [drm]
+      drm_ioctl_kernel+0xb8/0x150 [drm]
+      drm_ioctl+0x243/0x560 [drm]
+      __x64_sys_ioctl+0x92/0xd0
+      do_syscall_64+0x59/0x90
+      entry_SYSCALL_64_after_hwframe+0x72/0xdc
+  } hitcount:          1
+  { delta: ~ 0-99, stack.stacktrace         __schedule+0xa19/0x1520
+      schedule+0x6b/0x110
+      schedule_timeout+0x87/0x160
+      __wait_for_common+0x8f/0x190
+      wait_for_completion_timeout+0x1d/0x30
+      drm_atomic_helper_wait_for_flip_done+0x57/0x90 [drm_kms_helper]
+      intel_atomic_commit_tail+0x8ce/0x1080 [i915]
+      intel_atomic_commit_work+0x12/0x20 [i915]
+      process_one_work+0x21c/0x3f0
+      worker_thread+0x50/0x3e0
+      kthread+0xfa/0x130
+      ret_from_fork+0x29/0x50
+  } hitcount:          1
+  { delta: ~ 100-199, stack.stacktrace         __schedule+0xa19/0x1520
+      schedule+0x6b/0x110
+      schedule_hrtimeout_range_clock+0x97/0x110
+      schedule_hrtimeout_range+0x13/0x20
+      usleep_range_state+0x65/0x90
+      pci_set_low_power_state+0x17f/0x1f0
+      pci_set_power_state+0x49/0x250
+      pci_finish_runtime_suspend+0x4a/0x90
+      pci_pm_runtime_suspend+0xcb/0x1b0
+      __rpm_callback+0x48/0x120
+      rpm_callback+0x67/0x70
+      rpm_suspend+0x167/0x780
+      rpm_idle+0x25a/0x380
+      pm_runtime_work+0x93/0xc0
+      process_one_work+0x21c/0x3f0
+  } hitcount:          1
+
+  Totals:
+      Hits: 10
+      Entries: 7
+      Dropped: 0

 2.2.3 Hist trigger 'handlers' and 'actions'
 -------------------------------------------
··· 2054
 	            wakeup_new_test($testpid) if comm=="cyclictest"' >> \
 	            /sys/kernel/tracing/events/sched/sched_wakeup_new/trigger

-    Or, equivalently, using the 'trace' keyword syntax:
+    Or, equivalently, using the 'trace' keyword syntax::

-    # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\
-            trace(wakeup_new_test,$testpid) if comm=="cyclictest"' >> \
-            /sys/kernel/tracing/events/sched/sched_wakeup_new/trigger
+      # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\
+              trace(wakeup_new_test,$testpid) if comm=="cyclictest"' >> \
+              /sys/kernel/tracing/events/sched/sched_wakeup_new/trigger

 Creating and displaying a histogram based on those events is now
 just a matter of using the fields and new synthetic event in the
··· 2191
    resulting latency, stored in wakeup_lat, exceeds the current
    maximum latency, a snapshot is taken. As part of the setup, all
    the scheduler events are also enabled, which are the events that
-   will show up in the snapshot when it is taken at some point:
+   will show up in the snapshot when it is taken at some point::

-   # echo 1 > /sys/kernel/tracing/events/sched/enable
+     # echo 1 > /sys/kernel/tracing/events/sched/enable

-   # echo 'hist:keys=pid:ts0=common_timestamp.usecs \
-       if comm=="cyclictest"' >> \
-       /sys/kernel/tracing/events/sched/sched_waking/trigger
+     # echo 'hist:keys=pid:ts0=common_timestamp.usecs \
+         if comm=="cyclictest"' >> \
+         /sys/kernel/tracing/events/sched/sched_waking/trigger

-   # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
-       onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio, \
-       prev_comm):onmax($wakeup_lat).snapshot() \
-       if next_comm=="cyclictest"' >> \
-       /sys/kernel/tracing/events/sched/sched_switch/trigger
+     # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \
+         onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio, \
+         prev_comm):onmax($wakeup_lat).snapshot() \
+         if next_comm=="cyclictest"' >> \
+         /sys/kernel/tracing/events/sched/sched_switch/trigger

    When the histogram is displayed, for each bucket the max value
    and the saved values corresponding to the max are displayed
    following the rest of the fields.


    If a snapshot was taken, there is also a message indicating that,
-   along with the value and event that triggered the global maximum:
+   along with the value and event that triggered the global maximum::

-   # cat /sys/kernel/tracing/events/sched/sched_switch/hist
-     { next_pid: 2101 } hitcount: 200
-       max: 52  next_prio: 120  next_comm: cyclictest \
-       prev_pid: 0  prev_prio: 120  prev_comm: swapper/6
+     # cat /sys/kernel/tracing/events/sched/sched_switch/hist
+       { next_pid: 2101 } hitcount: 200
+	 max: 52  next_prio: 120  next_comm: cyclictest \
+	 prev_pid: 0  prev_prio: 120  prev_comm: swapper/6

-     { next_pid: 2103 } hitcount: 1326
-       max: 572  next_prio: 19  next_comm: cyclictest \
-       prev_pid: 0  prev_prio: 120  prev_comm: swapper/1
+       { next_pid: 2103 } hitcount: 1326
+	 max: 572  next_prio: 19  next_comm: cyclictest \
+	 prev_pid: 0  prev_prio: 120  prev_comm: swapper/1

-     { next_pid: 2102 } hitcount: 1982 \
-       max: 74  next_prio: 19  next_comm: cyclictest \
-       prev_pid: 0  prev_prio: 120  prev_comm: swapper/5
+       { next_pid: 2102 } hitcount: 1982 \
+	 max: 74  next_prio: 19  next_comm: cyclictest \
+	 prev_pid: 0  prev_prio: 120  prev_comm: swapper/5

-     Snapshot taken (see tracing/snapshot). Details:
-       triggering value { onmax($wakeup_lat) }: 572 \
-       triggered by event with key: { next_pid: 2103 }
+       Snapshot taken (see tracing/snapshot). Details:
+	 triggering value { onmax($wakeup_lat) }: 572 \
+	 triggered by event with key: { next_pid: 2103 }

-     Totals:
-       Hits: 3508
-       Entries: 3
-       Dropped: 0
+       Totals:
+	 Hits: 3508
+	 Entries: 3
+	 Dropped: 0

    In the above case, the event that triggered the global maximum has
    the key with next_pid == 2103. If you look at the bucket that has
··· 2310
    $cwnd variable. If the value has changed, a snapshot is taken.
    As part of the setup, all the scheduler and tcp events are also
    enabled, which are the events that will show up in the snapshot
-   when it is taken at some point:
+   when it is taken at some point::

-   # echo 1 > /sys/kernel/tracing/events/sched/enable
-   # echo 1 > /sys/kernel/tracing/events/tcp/enable
+     # echo 1 > /sys/kernel/tracing/events/sched/enable
+     # echo 1 > /sys/kernel/tracing/events/tcp/enable

-   # echo 'hist:keys=dport:cwnd=snd_cwnd: \
-       onchange($cwnd).save(snd_wnd,srtt,rcv_wnd): \
-       onchange($cwnd).snapshot()' >> \
-       /sys/kernel/tracing/events/tcp/tcp_probe/trigger
+     # echo 'hist:keys=dport:cwnd=snd_cwnd: \
+         onchange($cwnd).save(snd_wnd,srtt,rcv_wnd): \
+         onchange($cwnd).snapshot()' >> \
+         /sys/kernel/tracing/events/tcp/tcp_probe/trigger

    When the histogram is displayed, for each bucket the tracked value
    and the saved values corresponding to that value are displayed
··· 2341
      { dport: 443 } hitcount: 211
        changed: 10  snd_wnd: 26960  srtt: 17379  rcv_wnd: 28800

-    Snapshot taken (see tracing/snapshot). Details::
+    Snapshot taken (see tracing/snapshot). Details:

-        triggering value { onchange($cwnd) }: 10
-        triggered by event with key: { dport: 80 }
+      triggering value { onchange($cwnd) }: 10
+      triggered by event with key: { dport: 80 }

    Totals:
        Hits: 414
+1 -1
include/linux/kernel.h
··· 297
  *
  * Use tracing_on/tracing_off when you want to quickly turn on or off
  * tracing. It simply enables or disables the recording of the trace events.
- * This also corresponds to the user space /sys/kernel/debug/tracing/tracing_on
+ * This also corresponds to the user space /sys/kernel/tracing/tracing_on
  * file, which gives a means for the kernel and userspace to interact.
  * Place a tracing_off() in the kernel where you want tracing to end.
  * From user space, examine the trace, and then echo 1 > tracing_on
+12
include/linux/trace.h
··· 33
 int register_ftrace_export(struct trace_export *export);
 int unregister_ftrace_export(struct trace_export *export);

+/**
+ * trace_array_puts - write a constant string into the trace buffer.
+ * @tr:    The trace array to write to
+ * @str:   The constant string to write
+ */
+#define trace_array_puts(tr, str)					\
+	({								\
+		str ? __trace_array_puts(tr, _THIS_IP_, str, strlen(str)) : -1;	\
+	})
+int __trace_array_puts(struct trace_array *tr, unsigned long ip,
+		       const char *str, int size);
+
 void trace_printk_init_buffers(void);
 __printf(3, 4)
 int trace_array_printk(struct trace_array *tr, unsigned long ip,
+5
include/linux/trace_seq.h
··· 95
 extern int trace_seq_hex_dump(struct trace_seq *s, const char *prefix_str,
 			      int prefix_type, int rowsize, int groupsize,
 			      const void *buf, size_t len, bool ascii);
+char *trace_seq_acquire(struct trace_seq *s, unsigned int len);

 #else /* CONFIG_TRACING */
 static inline __printf(2, 3)
··· 138
 static inline int trace_seq_path(struct trace_seq *s, const struct path *path)
 {
 	return 0;
 }
+static inline char *trace_seq_acquire(struct trace_seq *s, unsigned int len)
+{
+	return NULL;
+}
 #endif /* CONFIG_TRACING */
+2 -2
include/linux/tracepoint.h
··· 482
  *	 * This is how the trace record is structured and will
  *	 * be saved into the ring buffer. These are the fields
  *	 * that will be exposed to user-space in
- *	 * /sys/kernel/debug/tracing/events/<*>/format.
+ *	 * /sys/kernel/tracing/events/<*>/format.
  *	 *
  *	 * The declared 'local variable' is called '__entry'
  *	 *
··· 542
  * tracepoint callback (this is used by programmatic plugins and
  * can also by used by generic instrumentation like SystemTap), and
  * it is also used to expose a structured trace record in
- * /sys/kernel/debug/tracing/events/.
+ * /sys/kernel/tracing/events/.
  *
  * A set of (un)registration functions can be passed to the variant
  * TRACE_EVENT_FN to perform any (un)registration work.
+1 -44
include/trace/bpf_probe.h
··· 4

 #ifdef CONFIG_BPF_EVENTS

-#undef __entry
-#define __entry entry
-
-#undef __get_dynamic_array
-#define __get_dynamic_array(field)	\
-		((void *)__entry + (__entry->__data_loc_##field & 0xffff))
-
-#undef __get_dynamic_array_len
-#define __get_dynamic_array_len(field)	\
-		((__entry->__data_loc_##field >> 16) & 0xffff)
-
-#undef __get_str
-#define __get_str(field) ((char *)__get_dynamic_array(field))
-
-#undef __get_bitmask
-#define __get_bitmask(field) (char *)__get_dynamic_array(field)
-
-#undef __get_cpumask
-#define __get_cpumask(field) (char *)__get_dynamic_array(field)
-
-#undef __get_sockaddr
-#define __get_sockaddr(field) ((struct sockaddr *)__get_dynamic_array(field))
-
-#undef __get_rel_dynamic_array
-#define __get_rel_dynamic_array(field)	\
-		((void *)(&__entry->__rel_loc_##field) +	\
-		 sizeof(__entry->__rel_loc_##field) +		\
-		 (__entry->__rel_loc_##field & 0xffff))
-
-#undef __get_rel_dynamic_array_len
-#define __get_rel_dynamic_array_len(field)	\
-		((__entry->__rel_loc_##field >> 16) & 0xffff)
-
-#undef __get_rel_str
-#define __get_rel_str(field) ((char *)__get_rel_dynamic_array(field))
-
-#undef __get_rel_bitmask
-#define __get_rel_bitmask(field) (char *)__get_rel_dynamic_array(field)
-
-#undef __get_rel_cpumask
-#define __get_rel_cpumask(field) (char *)__get_rel_dynamic_array(field)
-
-#undef __get_rel_sockaddr
-#define __get_rel_sockaddr(field) ((struct sockaddr *)__get_rel_dynamic_array(field))
+#include "stages/stage6_event_callback.h"

 #undef __perf_count
 #define __perf_count(c)	(c)
+1 -45
include/trace/perf.h
··· 4

 #ifdef CONFIG_PERF_EVENTS

-#undef __entry
-#define __entry entry
-
-#undef __get_dynamic_array
-#define __get_dynamic_array(field)	\
-		((void *)__entry + (__entry->__data_loc_##field & 0xffff))
-
-#undef __get_dynamic_array_len
-#define __get_dynamic_array_len(field)	\
-		((__entry->__data_loc_##field >> 16) & 0xffff)
-
-#undef __get_str
-#define __get_str(field) ((char *)__get_dynamic_array(field))
-
-#undef __get_bitmask
-#define __get_bitmask(field) (char *)__get_dynamic_array(field)
-
-#undef __get_cpumask
-#define __get_cpumask(field) (char *)__get_dynamic_array(field)
-
-#undef __get_sockaddr
-#define __get_sockaddr(field) ((struct sockaddr *)__get_dynamic_array(field))
-
-#undef __get_rel_dynamic_array
-#define __get_rel_dynamic_array(field)	\
-		((void *)__entry +					\
-		 offsetof(typeof(*__entry), __rel_loc_##field) +	\
-		 sizeof(__entry->__rel_loc_##field) +			\
-		 (__entry->__rel_loc_##field & 0xffff))
-
-#undef __get_rel_dynamic_array_len
-#define __get_rel_dynamic_array_len(field)	\
-		((__entry->__rel_loc_##field >> 16) & 0xffff)
-
-#undef __get_rel_str
-#define __get_rel_str(field) ((char *)__get_rel_dynamic_array(field))
-
-#undef __get_rel_bitmask
-#define __get_rel_bitmask(field) (char *)__get_rel_dynamic_array(field)
-
-#undef __get_rel_cpumask
-#define __get_rel_cpumask(field) (char *)__get_rel_dynamic_array(field)
-
-#undef __get_rel_sockaddr
-#define __get_rel_sockaddr(field) ((struct sockaddr *)__get_rel_dynamic_array(field))
+#include "stages/stage6_event_callback.h"

 #undef __perf_count
 #define __perf_count(c)	(__count = (c))
+3
include/trace/stages/stage3_trace_output.h
··· 139
 		u64 ____val = (u64)(value);			\
 		(u32) do_div(____val, NSEC_PER_SEC);		\
 	})
+
+#undef __get_buf
+#define __get_buf(len)		trace_seq_acquire(p, (len))
+3
include/trace/stages/stage6_event_callback.h
··· 2

 /* Stage 6 definitions for creating trace events */

+/* Reuse some of the stage 3 macros */
+#include "stage3_trace_output.h"
+
 #undef __entry
 #define __entry entry
+1
include/trace/stages/stage7_class_define.h
··· 23
 #undef __get_rel_sockaddr
 #undef __print_array
 #undef __print_hex_dump
+#undef __get_buf

 /*
  * The below is not executed in the kernel. It is only what is
+10 -10
kernel/trace/Kconfig
··· 242
 	  enabled, and the functions not enabled will not affect
 	  performance of the system.

-	  See the files in /sys/kernel/debug/tracing:
+	  See the files in /sys/kernel/tracing:
 	    available_filter_functions
 	    set_ftrace_filter
 	    set_ftrace_notrace
··· 306
 	select KALLSYMS
 	help
 	  This special tracer records the maximum stack footprint of the
-	  kernel and displays it in /sys/kernel/debug/tracing/stack_trace.
+	  kernel and displays it in /sys/kernel/tracing/stack_trace.

 	  This tracer works by hooking into every function call that the
 	  kernel executes, and keeping a maximum stack depth value and
··· 346
 	  disabled by default and can be runtime (re-)started
 	  via:

-	      echo 0 > /sys/kernel/debug/tracing/tracing_max_latency
+	      echo 0 > /sys/kernel/tracing/tracing_max_latency

 	  (Note that kernel size and overhead increase with this option
 	  enabled. This option and the preempt-off timing option can be
··· 370
 	  disabled by default and can be runtime (re-)started
 	  via:

-	      echo 0 > /sys/kernel/debug/tracing/tracing_max_latency
+	      echo 0 > /sys/kernel/tracing/tracing_max_latency

 	  (Note that kernel size and overhead increase with this option
 	  enabled. This option and the irqs-off timing option can be
··· 522
 	  Allow tracing users to take snapshot of the current buffer using the
 	  ftrace interface, e.g.:

-	      echo 1 > /sys/kernel/debug/tracing/snapshot
+	      echo 1 > /sys/kernel/tracing/snapshot
 	      cat snapshot

 config TRACER_SNAPSHOT_PER_CPU_SWAP
··· 534
 	  full swap (all buffers). If this is set, then the following is
 	  allowed:

-	      echo 1 > /sys/kernel/debug/tracing/per_cpu/cpu2/snapshot
+	      echo 1 > /sys/kernel/tracing/per_cpu/cpu2/snapshot

 	  After which, only the tracing buffer for CPU 2 was swapped with
 	  the main tracing buffer, and the other CPU buffers remain the same.
··· 581
 	  This tracer profiles all likely and unlikely macros
 	  in the kernel. It will display the results in:

-	  /sys/kernel/debug/tracing/trace_stat/branch_annotated
+	  /sys/kernel/tracing/trace_stat/branch_annotated

 	  Note: this will add a significant overhead; only turn this
 	  on if you need to profile the system's use of these macros.
··· 594
 	  taken in the kernel is recorded whether it hit or miss.
 	  The results will be displayed in:

-	  /sys/kernel/debug/tracing/trace_stat/branch_all
+	  /sys/kernel/tracing/trace_stat/branch_all

 	  This option also enables the likely/unlikely profiler.
··· 645
 	  Tracing also is possible using the ftrace interface, e.g.:

 	    echo 1 > /sys/block/sda/sda1/trace/enable
-	    echo blk > /sys/kernel/debug/tracing/current_tracer
-	    cat /sys/kernel/debug/tracing/trace_pipe
+	    echo blk > /sys/kernel/tracing/current_tracer
+	    cat /sys/kernel/tracing/trace_pipe

 	  If unsure, say N.
+1 -1
kernel/trace/kprobe_event_gen_test.c
··· 21
  * Then:
  *
  * # insmod kernel/trace/kprobe_event_gen_test.ko
- * # cat /sys/kernel/debug/tracing/trace
+ * # cat /sys/kernel/tracing/trace
  *
  * You should see many instances of the "gen_kprobe_test" and
  * "gen_kretprobe_test" events in the trace buffer.
+7 -2
kernel/trace/ring_buffer.c
··· 2864
 		  sched_clock_stable() ? "" :
 		  "If you just came from a suspend/resume,\n"
 		  "please switch to the trace global clock:\n"
-		  "  echo global > /sys/kernel/debug/tracing/trace_clock\n"
+		  "  echo global > /sys/kernel/tracing/trace_clock\n"
 		  "or add trace_clock=global to the kernel command line\n");
 }
··· 5604
  */
 void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu, void *data)
 {
-	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_data_page *bpage = data;
 	struct page *page = virt_to_page(bpage);
 	unsigned long flags;
+
+	if (!buffer || !buffer->buffers || !buffer->buffers[cpu])
+		return;
+
+	cpu_buffer = buffer->buffers[cpu];

 	/* If the page is still in use someplace else, we can't reuse it */
 	if (page_ref_count(page) > 1)
+1 -1
kernel/trace/synth_event_gen_test.c
··· 22
  * Then:
  *
  * # insmod kernel/trace/synth_event_gen_test.ko
- * # cat /sys/kernel/debug/tracing/trace
+ * # cat /sys/kernel/tracing/trace
  *
  * You should see several events in the trace buffer -
  * "create_synth_test", "empty_synth_test", and several instances of
+145 -19
kernel/trace/trace.c
··· 49
 #include <linux/irq_work.h>
 #include <linux/workqueue.h>

+#include <asm/setup.h> /* COMMAND_LINE_SIZE */
+
 #include "trace.h"
 #include "trace_output.h"
··· 186
 static bool allocate_snapshot;
 static bool snapshot_at_boot;

+static char boot_instance_info[COMMAND_LINE_SIZE] __initdata;
+static int boot_instance_index;
+
+static char boot_snapshot_info[COMMAND_LINE_SIZE] __initdata;
+static int boot_snapshot_index;
+
 static int __init set_cmdline_ftrace(char *str)
 {
 	strlcpy(bootup_tracer_buf, str, MAX_TRACER_SIZE);
··· 222

 static int __init boot_alloc_snapshot(char *str)
 {
-	allocate_snapshot = true;
-	/* We also need the main ring buffer expanded */
-	ring_buffer_expanded = true;
+	char *slot = boot_snapshot_info + boot_snapshot_index;
+	int left = sizeof(boot_snapshot_info) - boot_snapshot_index;
+	int ret;
+
+	if (str[0] == '=') {
+		str++;
+		if (strlen(str) >= left)
+			return -1;
+
+		ret = snprintf(slot, left, "%s\t", str);
+		boot_snapshot_index += ret;
+	} else {
+		allocate_snapshot = true;
+		/* We also need the main ring buffer expanded */
+		ring_buffer_expanded = true;
+	}
 	return 1;
 }
 __setup("alloc_snapshot", boot_alloc_snapshot);
··· 237
 	return 1;
 }
 __setup("ftrace_boot_snapshot", boot_snapshot);
+
+
+static int __init boot_instance(char *str)
+{
+	char *slot = boot_instance_info + boot_instance_index;
+	int left = sizeof(boot_instance_info) - boot_instance_index;
+	int ret;
+
+	if (strlen(str) >= left)
+		return -1;
+
+	ret = snprintf(slot, left, "%s\t", str);
+	boot_instance_index += ret;
+
+	return 1;
+}
+__setup("trace_instance=", boot_instance);


 static char trace_boot_options_buf[MAX_TRACER_SIZE] __initdata;
··· 1001
 	ring_buffer_unlock_commit(buffer);
 }

-/**
- * __trace_puts - write a constant string into the trace buffer.
- * @ip:	   The address of the caller
- * @str:   The constant string to write
- * @size:  The size of the string.
- */
-int __trace_puts(unsigned long ip, const char *str, int size)
+int __trace_array_puts(struct trace_array *tr, unsigned long ip,
+		       const char *str, int size)
 {
 	struct ring_buffer_event *event;
 	struct trace_buffer *buffer;
··· 1015
 	unsigned int trace_ctx;
 	int alloc;

-	if (!(global_trace.trace_flags & TRACE_ITER_PRINTK))
+	if (!(tr->trace_flags & TRACE_ITER_PRINTK))
 		return 0;

 	if (unlikely(tracing_selftest_running || tracing_disabled))
··· 1024
 	alloc = sizeof(*entry) + size + 2; /* possible \n added */

 	trace_ctx = tracing_gen_ctx();
-	buffer = global_trace.array_buffer.buffer;
+	buffer = tr->array_buffer.buffer;
 	ring_buffer_nest_start(buffer);
 	event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, alloc,
 					    trace_ctx);
··· 1046
 	entry->buf[size] = '\0';

 	__buffer_unlock_commit(buffer, event);
-	ftrace_trace_stack(&global_trace, buffer, trace_ctx, 4, NULL);
+	ftrace_trace_stack(tr, buffer, trace_ctx, 4, NULL);
  out:
 	ring_buffer_nest_end(buffer);
 	return size;
+}
+EXPORT_SYMBOL_GPL(__trace_array_puts);
+
+/**
+ * __trace_puts - write a constant string into the trace buffer.
+ * @ip:	   The address of the caller
+ * @str:   The constant string to write
+ * @size:  The size of the string.
+ */
+int __trace_puts(unsigned long ip, const char *str, int size)
+{
+	return __trace_array_puts(&global_trace, ip, str, size);
 }
 EXPORT_SYMBOL_GPL(__trace_puts);
··· 1142
  *
  * Note, make sure to allocate the snapshot with either
  * a tracing_snapshot_alloc(), or by doing it manually
- * with: echo 1 > /sys/kernel/debug/tracing/snapshot
+ * with: echo 1 > /sys/kernel/tracing/snapshot
  *
  * If the snapshot buffer is not allocated, it will stop tracing.
  * Basically making a permanent snapshot.
··· 5760
 #ifdef CONFIG_SYNTH_EVENTS
 	"  events/synthetic_events\t- Create/append/remove/show synthetic events\n"
 	"\t  Write into this file to define/undefine new synthetic events.\n"
-	"\t     example: echo 'myevent u64 lat; char name[]' >> synthetic_events\n"
+	"\t     example: echo 'myevent u64 lat; char name[]; long[] stack' >> synthetic_events\n"
 #endif
 #endif
 ;
··· 9225
 	}
 	tr->allocated_snapshot = allocate_snapshot;

-	/*
-	 * Only the top level trace array gets its snapshot allocated
-	 * from the kernel command line.
9276 - */ 9277 9228 allocate_snapshot = false; 9278 9229 #endif 9279 9230 ··· 10185 10144 return ret; 10186 10145 } 10187 10146 10147 + #ifdef CONFIG_TRACER_MAX_TRACE 10148 + __init static bool tr_needs_alloc_snapshot(const char *name) 10149 + { 10150 + char *test; 10151 + int len = strlen(name); 10152 + bool ret; 10153 + 10154 + if (!boot_snapshot_index) 10155 + return false; 10156 + 10157 + if (strncmp(name, boot_snapshot_info, len) == 0 && 10158 + boot_snapshot_info[len] == '\t') 10159 + return true; 10160 + 10161 + test = kmalloc(strlen(name) + 3, GFP_KERNEL); 10162 + if (!test) 10163 + return false; 10164 + 10165 + sprintf(test, "\t%s\t", name); 10166 + ret = strstr(boot_snapshot_info, test) != NULL; 10167 + kfree(test); 10168 + return ret; 10169 + } 10170 + 10171 + __init static void do_allocate_snapshot(const char *name) 10172 + { 10173 + if (!tr_needs_alloc_snapshot(name)) 10174 + return; 10175 + 10176 + /* 10177 + * When allocate_snapshot is set, the next call to 10178 + * allocate_trace_buffers() (called by trace_array_get_by_name()) 10179 + * will allocate the snapshot buffer. That will also clear 10180 + * this flag.
10181 + */ 10182 + allocate_snapshot = true; 10183 + } 10184 + #else 10185 + static inline void do_allocate_snapshot(const char *name) { } 10186 + #endif 10187 + 10188 + __init static void enable_instances(void) 10189 + { 10190 + struct trace_array *tr; 10191 + char *curr_str; 10192 + char *str; 10193 + char *tok; 10194 + 10195 + /* A tab is always appended */ 10196 + boot_instance_info[boot_instance_index - 1] = '\0'; 10197 + str = boot_instance_info; 10198 + 10199 + while ((curr_str = strsep(&str, "\t"))) { 10200 + 10201 + tok = strsep(&curr_str, ","); 10202 + 10203 + if (IS_ENABLED(CONFIG_TRACER_MAX_TRACE)) 10204 + do_allocate_snapshot(tok); 10205 + 10206 + tr = trace_array_get_by_name(tok); 10207 + if (!tr) { 10208 + pr_warn("Failed to create instance buffer %s\n", curr_str); 10209 + continue; 10210 + } 10211 + /* Allow user space to delete it */ 10212 + trace_array_put(tr); 10213 + 10214 + while ((tok = strsep(&curr_str, ","))) { 10215 + early_enable_events(tr, tok, true); 10216 + } 10217 + } 10218 + } 10219 + 10188 10220 __init static int tracer_alloc_buffers(void) 10189 10221 { 10190 10222 int ring_buf_size; ··· 10391 10277 10392 10278 void __init ftrace_boot_snapshot(void) 10393 10279 { 10280 + struct trace_array *tr; 10281 + 10394 10282 if (snapshot_at_boot) { 10395 10283 tracing_snapshot(); 10396 10284 internal_trace_puts("** Boot snapshot taken **\n"); 10285 + } 10286 + 10287 + list_for_each_entry(tr, &ftrace_trace_arrays, list) { 10288 + if (tr == &global_trace) 10289 + continue; 10290 + trace_array_puts(tr, "** Boot snapshot taken **\n"); 10291 + tracing_snapshot_instance(tr); 10397 10292 } 10398 10293 } 10399 10294 ··· 10425 10302 void __init trace_init(void) 10426 10303 { 10427 10304 trace_event_init(); 10305 + 10306 + if (boot_instance_index) 10307 + enable_instances(); 10428 10308 } 10429 10309 10430 10310 __init static void clear_boot_tracer(void)
+6
kernel/trace/trace.h
··· 113 113 #define MEM_FAIL(condition, fmt, ...) \ 114 114 DO_ONCE_LITE_IF(condition, pr_err, "ERROR: " fmt, ##__VA_ARGS__) 115 115 116 + #define HIST_STACKTRACE_DEPTH 16 117 + #define HIST_STACKTRACE_SIZE (HIST_STACKTRACE_DEPTH * sizeof(unsigned long)) 118 + #define HIST_STACKTRACE_SKIP 5 119 + 116 120 /* 117 121 * syscalls are special, and need special handling, this is why 118 122 * they are not included in trace_entries.h ··· 1334 1330 DECLARE_PER_CPU(int, trace_buffered_event_cnt); 1335 1331 void trace_buffered_event_disable(void); 1336 1332 void trace_buffered_event_enable(void); 1333 + 1334 + void early_enable_events(struct trace_array *tr, char *buf, bool disable_first); 1337 1335 1338 1336 static inline void 1339 1337 __trace_event_discard_commit(struct trace_buffer *buffer,
+5 -8
kernel/trace/trace_events.c
··· 2281 2281 if (!system->name) 2282 2282 goto out_free; 2283 2283 2284 - system->filter = NULL; 2285 - 2286 2284 system->filter = kzalloc(sizeof(struct event_filter), GFP_KERNEL); 2287 2285 if (!system->filter) 2288 2286 goto out_free; ··· 2841 2843 if (!trigger) 2842 2844 break; 2843 2845 bootup_triggers[i].event = strsep(&trigger, "."); 2844 - bootup_triggers[i].trigger = strsep(&trigger, "."); 2846 + bootup_triggers[i].trigger = trigger; 2845 2847 if (!bootup_triggers[i].trigger) 2846 2848 break; 2847 2849 } ··· 3769 3771 return 0; 3770 3772 } 3771 3773 3772 - static __init void 3773 - early_enable_events(struct trace_array *tr, bool disable_first) 3774 + __init void 3775 + early_enable_events(struct trace_array *tr, char *buf, bool disable_first) 3774 3776 { 3775 - char *buf = bootup_event_buf; 3776 3777 char *token; 3777 3778 int ret; 3778 3779 ··· 3824 3827 */ 3825 3828 __trace_early_add_events(tr); 3826 3829 3827 - early_enable_events(tr, false); 3830 + early_enable_events(tr, bootup_event_buf, false); 3828 3831 3829 3832 trace_printk_start_comm(); 3830 3833 ··· 3852 3855 if (!tr) 3853 3856 return -ENODEV; 3854 3857 3855 - early_enable_events(tr, true); 3858 + early_enable_events(tr, bootup_event_buf, true); 3856 3859 3857 3860 return 0; 3858 3861 }
+92 -1
kernel/trace/trace_events_filter.c
··· 64 64 FILTER_PRED_FN_PCHAR_USER, 65 65 FILTER_PRED_FN_PCHAR, 66 66 FILTER_PRED_FN_CPU, 67 + FILTER_PRED_FN_FUNCTION, 67 68 FILTER_PRED_FN_, 68 69 FILTER_PRED_TEST_VISITED, 69 70 }; ··· 72 71 struct filter_pred { 73 72 enum filter_pred_fn fn_num; 74 73 u64 val; 74 + u64 val2; 75 75 struct regex regex; 76 76 unsigned short *ops; 77 77 struct ftrace_event_field *field; ··· 105 103 C(INVALID_FILTER, "Meaningless filter expression"), \ 106 104 C(IP_FIELD_ONLY, "Only 'ip' field is supported for function trace"), \ 107 105 C(INVALID_VALUE, "Invalid value (did you forget quotes)?"), \ 106 + C(NO_FUNCTION, "Function not found"), \ 108 107 C(ERRNO, "Error"), \ 109 108 C(NO_FILTER, "No filter found") 110 109 ··· 879 876 return cmp ^ pred->not; 880 877 } 881 878 879 + /* Filter predicate for functions. */ 880 + static int filter_pred_function(struct filter_pred *pred, void *event) 881 + { 882 + unsigned long *addr = (unsigned long *)(event + pred->offset); 883 + unsigned long start = (unsigned long)pred->val; 884 + unsigned long end = (unsigned long)pred->val2; 885 + int ret = *addr >= start && *addr < end; 886 + 887 + return pred->op == OP_EQ ? 
ret : !ret; 888 + } 889 + 882 890 /* 883 891 * regex_match_foo - Basic regex callbacks 884 892 * ··· 1349 1335 return filter_pred_pchar(pred, event); 1350 1336 case FILTER_PRED_FN_CPU: 1351 1337 return filter_pred_cpu(pred, event); 1338 + case FILTER_PRED_FN_FUNCTION: 1339 + return filter_pred_function(pred, event); 1352 1340 case FILTER_PRED_TEST_VISITED: 1353 1341 return test_pred_visited_fn(pred, event); 1354 1342 default: ··· 1366 1350 struct trace_event_call *call = data; 1367 1351 struct ftrace_event_field *field; 1368 1352 struct filter_pred *pred = NULL; 1353 + unsigned long offset; 1354 + unsigned long size; 1355 + unsigned long ip; 1369 1356 char num_buf[24]; /* Big enough to hold an address */ 1370 1357 char *field_name; 1358 + char *name; 1359 + bool function = false; 1371 1360 bool ustring = false; 1372 1361 char q; 1373 1362 u64 val; ··· 1414 1393 i += len; 1415 1394 } 1416 1395 1396 + /* See if the field is a kernel function name */ 1397 + if ((len = str_has_prefix(str + i, ".function"))) { 1398 + function = true; 1399 + i += len; 1400 + } 1401 + 1417 1402 while (isspace(str[i])) 1418 1403 i++; 1419 1404 ··· 1450 1423 pred->offset = field->offset; 1451 1424 pred->op = op; 1452 1425 1453 - if (ftrace_event_is_function(call)) { 1426 + if (function) { 1427 + /* The field must be the same size as long */ 1428 + if (field->size != sizeof(long)) { 1429 + parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i); 1430 + goto err_free; 1431 + } 1432 + 1433 + /* Function only works with '==' or '!=' and an unquoted string */ 1434 + switch (op) { 1435 + case OP_NE: 1436 + case OP_EQ: 1437 + break; 1438 + default: 1439 + parse_error(pe, FILT_ERR_INVALID_OP, pos + i); 1440 + goto err_free; 1441 + } 1442 + 1443 + if (isdigit(str[i])) { 1444 + /* We allow 0xDEADBEEF */ 1445 + while (isalnum(str[i])) 1446 + i++; 1447 + 1448 + len = i - s; 1449 + /* 0xfeedfacedeadbeef is 18 chars max */ 1450 + if (len >= sizeof(num_buf)) { 1451 + parse_error(pe, FILT_ERR_OPERAND_TOO_LONG, 
pos + i); 1452 + goto err_free; 1453 + } 1454 + 1455 + strncpy(num_buf, str + s, len); 1456 + num_buf[len] = 0; 1457 + 1458 + ret = kstrtoul(num_buf, 0, &ip); 1459 + if (ret) { 1460 + parse_error(pe, FILT_ERR_INVALID_VALUE, pos + i); 1461 + goto err_free; 1462 + } 1463 + } else { 1464 + s = i; 1465 + for (; str[i] && !isspace(str[i]); i++) 1466 + ; 1467 + 1468 + len = i - s; 1469 + name = kmemdup_nul(str + s, len, GFP_KERNEL); 1470 + if (!name) 1471 + goto err_mem; 1472 + ip = kallsyms_lookup_name(name); 1473 + kfree(name); 1474 + if (!ip) { 1475 + parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i); 1476 + goto err_free; 1477 + } 1478 + } 1479 + 1480 + /* Now find the function start and end address */ 1481 + if (!kallsyms_lookup_size_offset(ip, &size, &offset)) { 1482 + parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i); 1483 + goto err_free; 1484 + } 1485 + 1486 + pred->fn_num = FILTER_PRED_FN_FUNCTION; 1487 + pred->val = ip - offset; 1488 + pred->val2 = pred->val + size; 1489 + 1490 + } else if (ftrace_event_is_function(call)) { 1454 1491 /* 1455 1492 * Perf does things different with function events. 1456 1493 * It only allows an "ip" field, and expects a string.
+102 -22
kernel/trace/trace_events_hist.c
··· 135 135 HIST_FIELD_FN_DIV_NOT_POWER2, 136 136 HIST_FIELD_FN_DIV_MULT_SHIFT, 137 137 HIST_FIELD_FN_EXECNAME, 138 + HIST_FIELD_FN_STACK, 138 139 }; 139 140 140 141 /* ··· 480 479 481 480 #define for_each_hist_key_field(i, hist_data) \ 482 481 for ((i) = (hist_data)->n_vals; (i) < (hist_data)->n_fields; (i)++) 483 - 484 - #define HIST_STACKTRACE_DEPTH 16 485 - #define HIST_STACKTRACE_SIZE (HIST_STACKTRACE_DEPTH * sizeof(unsigned long)) 486 - #define HIST_STACKTRACE_SKIP 5 487 482 488 483 #define HITCOUNT_IDX 0 489 484 #define HIST_KEY_SIZE_MAX (MAX_FILTER_STR_VAL + HIST_STACKTRACE_SIZE) ··· 1357 1360 field_name = field->name; 1358 1361 } else if (field->flags & HIST_FIELD_FL_TIMESTAMP) 1359 1362 field_name = "common_timestamp"; 1360 - else if (field->flags & HIST_FIELD_FL_HITCOUNT) 1363 + else if (field->flags & HIST_FIELD_FL_STACKTRACE) { 1364 + if (field->field) 1365 + field_name = field->field->name; 1366 + else 1367 + field_name = "stacktrace"; 1368 + } else if (field->flags & HIST_FIELD_FL_HITCOUNT) 1361 1369 field_name = "hitcount"; 1362 1370 1363 1371 if (field_name == NULL) ··· 1720 1718 flags_str = "percent"; 1721 1719 else if (hist_field->flags & HIST_FIELD_FL_GRAPH) 1722 1720 flags_str = "graph"; 1721 + else if (hist_field->flags & HIST_FIELD_FL_STACKTRACE) 1722 + flags_str = "stacktrace"; 1723 1723 1724 1724 return flags_str; 1725 1725 } ··· 1983 1979 } 1984 1980 1985 1981 if (flags & HIST_FIELD_FL_STACKTRACE) { 1986 - hist_field->fn_num = HIST_FIELD_FN_NOP; 1982 + if (field) 1983 + hist_field->fn_num = HIST_FIELD_FN_STACK; 1984 + else 1985 + hist_field->fn_num = HIST_FIELD_FN_NOP; 1986 + hist_field->size = HIST_STACKTRACE_SIZE; 1987 + hist_field->type = kstrdup_const("unsigned long[]", GFP_KERNEL); 1988 + if (!hist_field->type) 1989 + goto free; 1987 1990 goto out; 1988 1991 } 1989 1992 ··· 2323 2312 *flags |= HIST_FIELD_FL_EXECNAME; 2324 2313 else if (strcmp(modifier, "syscall") == 0) 2325 2314 *flags |= HIST_FIELD_FL_SYSCALL; 2315 + else if 
(strcmp(modifier, "stacktrace") == 0) 2316 + *flags |= HIST_FIELD_FL_STACKTRACE; 2326 2317 else if (strcmp(modifier, "log2") == 0) 2327 2318 *flags |= HIST_FIELD_FL_LOG2; 2328 2319 else if (strcmp(modifier, "usecs") == 0) ··· 2364 2351 hist_data->enable_timestamps = true; 2365 2352 if (*flags & HIST_FIELD_FL_TIMESTAMP_USECS) 2366 2353 hist_data->attrs->ts_in_usecs = true; 2354 + } else if (strcmp(field_name, "stacktrace") == 0) { 2355 + *flags |= HIST_FIELD_FL_STACKTRACE; 2367 2356 } else if (strcmp(field_name, "common_cpu") == 0) 2368 2357 *flags |= HIST_FIELD_FL_CPU; 2369 2358 else if (strcmp(field_name, "hitcount") == 0) ··· 3126 3111 unsigned int i, j, var_idx; 3127 3112 u64 var_val; 3128 3113 3114 + /* Make sure stacktrace can fit in the string variable length */ 3115 + BUILD_BUG_ON((HIST_STACKTRACE_DEPTH + 1) * sizeof(long) >= STR_VAR_LEN_MAX); 3116 + 3129 3117 for (i = 0, j = field_var_str_start; i < n_field_vars; i++) { 3130 3118 struct field_var *field_var = field_vars[i]; 3131 3119 struct hist_field *var = field_var->var; ··· 3137 3119 var_val = hist_fn_call(val, elt, buffer, rbe, rec); 3138 3120 var_idx = var->var.idx; 3139 3121 3140 - if (val->flags & HIST_FIELD_FL_STRING) { 3122 + if (val->flags & (HIST_FIELD_FL_STRING | 3123 + HIST_FIELD_FL_STACKTRACE)) { 3141 3124 char *str = elt_data->field_var_str[j++]; 3142 3125 char *val_str = (char *)(uintptr_t)var_val; 3143 3126 unsigned int size; 3144 3127 3145 - size = min(val->size, STR_VAR_LEN_MAX); 3146 - strscpy(str, val_str, size); 3128 + if (val->flags & HIST_FIELD_FL_STRING) { 3129 + size = min(val->size, STR_VAR_LEN_MAX); 3130 + strscpy(str, val_str, size); 3131 + } else { 3132 + char *stack_start = str + sizeof(unsigned long); 3133 + int e; 3134 + 3135 + e = stack_trace_save((void *)stack_start, 3136 + HIST_STACKTRACE_DEPTH, 3137 + HIST_STACKTRACE_SKIP); 3138 + if (e < HIST_STACKTRACE_DEPTH - 1) 3139 + ((unsigned long *)stack_start)[e] = 0; 3140 + *((unsigned long *)str) = e; 3141 + } 3147 3142 
var_val = (u64)(uintptr_t)str; 3148 3143 } 3149 3144 tracing_map_set_var(elt, var_idx, var_val); ··· 3855 3824 { 3856 3825 hist_data->field_vars[hist_data->n_field_vars++] = field_var; 3857 3826 3858 - if (field_var->val->flags & HIST_FIELD_FL_STRING) 3827 + /* Stack traces are saved in the string storage too */ 3828 + if (field_var->val->flags & (HIST_FIELD_FL_STRING | HIST_FIELD_FL_STACKTRACE)) 3859 3829 hist_data->n_field_var_str++; 3860 3830 } 3861 3831 ··· 3879 3847 */ 3880 3848 if (strstr(hist_field->type, "char[") && field->is_string 3881 3849 && field->is_dynamic) 3850 + return 0; 3851 + 3852 + if (strstr(hist_field->type, "long[") && field->is_stack) 3882 3853 return 0; 3883 3854 3884 3855 if (strcmp(field->type, hist_field->type) != 0) { ··· 4138 4103 } 4139 4104 4140 4105 hist_data->save_vars[hist_data->n_save_vars++] = field_var; 4141 - if (field_var->val->flags & HIST_FIELD_FL_STRING) 4106 + if (field_var->val->flags & 4107 + (HIST_FIELD_FL_STRING | HIST_FIELD_FL_STACKTRACE)) 4142 4108 hist_data->n_save_var_str++; 4143 4109 kfree(param); 4144 4110 } ··· 4278 4242 return (u64)(unsigned long)(elt_data->comm); 4279 4243 } 4280 4244 4245 + static u64 hist_field_stack(struct hist_field *hist_field, 4246 + struct tracing_map_elt *elt, 4247 + struct trace_buffer *buffer, 4248 + struct ring_buffer_event *rbe, 4249 + void *event) 4250 + { 4251 + u32 str_item = *(u32 *)(event + hist_field->field->offset); 4252 + int str_loc = str_item & 0xffff; 4253 + char *addr = (char *)(event + str_loc); 4254 + 4255 + return (u64)(unsigned long)addr; 4256 + } 4257 + 4281 4258 static u64 hist_fn_call(struct hist_field *hist_field, 4282 4259 struct tracing_map_elt *elt, 4283 4260 struct trace_buffer *buffer, ··· 4354 4305 return div_by_mult_and_shift(hist_field, elt, buffer, rbe, event); 4355 4306 case HIST_FIELD_FN_EXECNAME: 4356 4307 return hist_field_execname(hist_field, elt, buffer, rbe, event); 4308 + case HIST_FIELD_FN_STACK: 4309 + return hist_field_stack(hist_field, 
elt, buffer, rbe, event); 4357 4310 default: 4358 4311 return 0; 4359 4312 } ··· 4402 4351 if (!ret && hist_data->fields[val_idx]->flags & HIST_FIELD_FL_EXECNAME) 4403 4352 update_var_execname(hist_data->fields[val_idx]); 4404 4353 4405 - if (!ret && hist_data->fields[val_idx]->flags & HIST_FIELD_FL_STRING) 4354 + if (!ret && hist_data->fields[val_idx]->flags & 4355 + (HIST_FIELD_FL_STRING | HIST_FIELD_FL_STACKTRACE)) 4406 4356 hist_data->fields[val_idx]->var_str_idx = hist_data->n_var_str++; 4407 4357 4408 4358 return ret; ··· 5144 5092 if (hist_field->flags & HIST_FIELD_FL_VAR) { 5145 5093 var_idx = hist_field->var.idx; 5146 5094 5147 - if (hist_field->flags & HIST_FIELD_FL_STRING) { 5095 + if (hist_field->flags & 5096 + (HIST_FIELD_FL_STRING | HIST_FIELD_FL_STACKTRACE)) { 5148 5097 unsigned int str_start, var_str_idx, idx; 5149 5098 char *str, *val_str; 5150 5099 unsigned int size; ··· 5158 5105 str = elt_data->field_var_str[idx]; 5159 5106 val_str = (char *)(uintptr_t)hist_val; 5160 5107 5161 - size = min(hist_field->size, STR_VAR_LEN_MAX); 5162 - strscpy(str, val_str, size); 5108 + if (hist_field->flags & HIST_FIELD_FL_STRING) { 5109 + size = min(hist_field->size, STR_VAR_LEN_MAX); 5110 + strscpy(str, val_str, size); 5111 + } else { 5112 + char *stack_start = str + sizeof(unsigned long); 5113 + int e; 5163 5114 5115 + e = stack_trace_save((void *)stack_start, 5116 + HIST_STACKTRACE_DEPTH, 5117 + HIST_STACKTRACE_SKIP); 5118 + if (e < HIST_STACKTRACE_DEPTH - 1) 5119 + ((unsigned long *)stack_start)[e] = 0; 5120 + *((unsigned long *)str) = e; 5121 + } 5164 5122 hist_val = (u64)(uintptr_t)str; 5165 5123 } 5166 5124 tracing_map_set_var(elt, var_idx, hist_val); ··· 5257 5193 5258 5194 if (key_field->flags & HIST_FIELD_FL_STACKTRACE) { 5259 5195 memset(entries, 0, HIST_STACKTRACE_SIZE); 5260 - stack_trace_save(entries, HIST_STACKTRACE_DEPTH, 5261 - HIST_STACKTRACE_SKIP); 5196 + if (key_field->field) { 5197 + unsigned long *stack, n_entries; 5198 + 5199 + 
field_contents = hist_fn_call(key_field, elt, buffer, rbe, rec); 5200 + stack = (unsigned long *)(long)field_contents; 5201 + n_entries = *stack; 5202 + memcpy(entries, ++stack, n_entries * sizeof(unsigned long)); 5203 + } else { 5204 + stack_trace_save(entries, HIST_STACKTRACE_DEPTH, 5205 + HIST_STACKTRACE_SKIP); 5206 + } 5262 5207 key = entries; 5263 5208 } else { 5264 5209 field_contents = hist_fn_call(key_field, elt, buffer, rbe, rec); ··· 5370 5297 seq_printf(m, "%s: %-30s[%3llu]", field_name, 5371 5298 syscall_name, uval); 5372 5299 } else if (key_field->flags & HIST_FIELD_FL_STACKTRACE) { 5373 - seq_puts(m, "stacktrace:\n"); 5300 + if (key_field->field) 5301 + seq_printf(m, "%s.stacktrace", key_field->field->name); 5302 + else 5303 + seq_puts(m, "stacktrace:\n"); 5374 5304 hist_trigger_stacktrace_print(m, 5375 5305 key + key_field->offset, 5376 5306 HIST_STACKTRACE_DEPTH); ··· 5918 5842 5919 5843 if (hist_field->flags) { 5920 5844 if (!(hist_field->flags & HIST_FIELD_FL_VAR_REF) && 5921 - !(hist_field->flags & HIST_FIELD_FL_EXPR)) { 5845 + !(hist_field->flags & HIST_FIELD_FL_EXPR) && 5846 + !(hist_field->flags & HIST_FIELD_FL_STACKTRACE)) { 5922 5847 const char *flags = get_hist_field_flags(hist_field); 5923 5848 5924 5849 if (flags) ··· 5952 5875 if (i > hist_data->n_vals) 5953 5876 seq_puts(m, ","); 5954 5877 5955 - if (field->flags & HIST_FIELD_FL_STACKTRACE) 5956 - seq_puts(m, "stacktrace"); 5957 - else 5878 + if (field->flags & HIST_FIELD_FL_STACKTRACE) { 5879 + if (field->field) 5880 + seq_printf(m, "%s.stacktrace", field->field->name); 5881 + else 5882 + seq_puts(m, "stacktrace"); 5883 + } else 5958 5884 hist_field_print(m, field); 5959 5885 } 5960 5886
+86 -4
kernel/trace/trace_events_synth.c
··· 173 173 return false; 174 174 } 175 175 176 + static int synth_field_is_stack(char *type) 177 + { 178 + if (strstr(type, "long[") != NULL) 179 + return true; 180 + 181 + return false; 182 + } 183 + 176 184 static int synth_field_string_size(char *type) 177 185 { 178 186 char buf[4], *end, *start; ··· 256 248 size = sizeof(gfp_t); 257 249 else if (synth_field_is_string(type)) 258 250 size = synth_field_string_size(type); 251 + else if (synth_field_is_stack(type)) 252 + size = 0; 259 253 260 254 return size; 261 255 } ··· 302 292 fmt = "%x"; 303 293 else if (synth_field_is_string(type)) 304 294 fmt = "%.*s"; 295 + else if (synth_field_is_stack(type)) 296 + fmt = "%s"; 305 297 306 298 return fmt; 307 299 } ··· 383 371 i == se->n_fields - 1 ? "" : " "); 384 372 n_u64 += STR_VAR_LEN_MAX / sizeof(u64); 385 373 } 374 + } else if (se->fields[i]->is_stack) { 375 + u32 offset, data_offset, len; 376 + unsigned long *p, *end; 377 + 378 + offset = (u32)entry->fields[n_u64]; 379 + data_offset = offset & 0xffff; 380 + len = offset >> 16; 381 + 382 + p = (void *)entry + data_offset; 383 + end = (void *)p + len - (sizeof(long) - 1); 384 + 385 + trace_seq_printf(s, "%s=STACK:\n", se->fields[i]->name); 386 + 387 + for (; *p && p < end; p++) 388 + trace_seq_printf(s, "=> %pS\n", (void *)*p); 389 + n_u64++; 390 + 386 391 } else { 387 392 struct trace_print_flags __flags[] = { 388 393 __def_gfpflag_names, {-1, NULL} }; ··· 445 416 if (is_dynamic) { 446 417 u32 data_offset; 447 418 448 - data_offset = offsetof(typeof(*entry), fields); 449 - data_offset += event->n_u64 * sizeof(u64); 419 + data_offset = struct_size(entry, fields, event->n_u64); 450 420 data_offset += data_size; 451 421 452 422 len = kern_fetch_store_strlen((unsigned long)str_val); ··· 471 443 472 444 (*n_u64) += STR_VAR_LEN_MAX / sizeof(u64); 473 445 } 446 + 447 + return len; 448 + } 449 + 450 + static unsigned int trace_stack(struct synth_trace_event *entry, 451 + struct synth_event *event, 452 + long *stack, 453 + 
unsigned int data_size, 454 + unsigned int *n_u64) 455 + { 456 + unsigned int len; 457 + u32 data_offset; 458 + void *data_loc; 459 + 460 + data_offset = struct_size(entry, fields, event->n_u64); 461 + data_offset += data_size; 462 + 463 + for (len = 0; len < HIST_STACKTRACE_DEPTH; len++) { 464 + if (!stack[len]) 465 + break; 466 + } 467 + 468 + /* Include the zero'd element if it fits */ 469 + if (len < HIST_STACKTRACE_DEPTH) 470 + len++; 471 + 472 + len *= sizeof(long); 473 + 474 + /* Find the dynamic section to copy the stack into. */ 475 + data_loc = (void *)entry + data_offset; 476 + memcpy(data_loc, stack, len); 477 + 478 + /* Fill in the field that holds the offset/len combo */ 479 + data_offset |= len << 16; 480 + *(u32 *)&entry->fields[*n_u64] = data_offset; 481 + 482 + (*n_u64)++; 474 483 475 484 return len; 476 485 } ··· 538 473 val_idx = var_ref_idx[field_pos]; 539 474 str_val = (char *)(long)var_ref_vals[val_idx]; 540 475 541 - len = kern_fetch_store_strlen((unsigned long)str_val); 476 + if (event->dynamic_fields[i]->is_stack) { 477 + len = *((unsigned long *)str_val); 478 + len *= sizeof(unsigned long); 479 + } else { 480 + len = kern_fetch_store_strlen((unsigned long)str_val); 481 + } 542 482 543 483 fields_size += len; 544 484 } ··· 569 499 event->fields[i]->is_dynamic, 570 500 data_size, &n_u64); 571 501 data_size += len; /* only dynamic string increments */ 502 + } else if (event->fields[i]->is_stack) { 503 + long *stack = (long *)(long)var_ref_vals[val_idx]; 504 + 505 + len = trace_stack(entry, event, stack, 506 + data_size, &n_u64); 507 + data_size += len; 572 508 } else { 573 509 struct synth_field *field = event->fields[i]; 574 510 u64 val = var_ref_vals[val_idx]; ··· 637 561 event->fields[i]->is_dynamic) 638 562 pos += snprintf(buf + pos, LEN_OR_ZERO, 639 563 ", __get_str(%s)", event->fields[i]->name); 564 + else if (event->fields[i]->is_stack) 565 + pos += snprintf(buf + pos, LEN_OR_ZERO, 566 + ", __get_stacktrace(%s)", 
event->fields[i]->name); 640 567 else 641 568 pos += snprintf(buf + pos, LEN_OR_ZERO, 642 569 ", REC->%s", event->fields[i]->name); ··· 776 697 ret = -EINVAL; 777 698 goto free; 778 699 } else if (size == 0) { 779 - if (synth_field_is_string(field->type)) { 700 + if (synth_field_is_string(field->type) || 701 + synth_field_is_stack(field->type)) { 780 702 char *type; 781 703 782 704 len = sizeof("__data_loc ") + strlen(field->type) + 1; ··· 808 728 809 729 if (synth_field_is_string(field->type)) 810 730 field->is_string = true; 731 + else if (synth_field_is_stack(field->type)) 732 + field->is_stack = true; 811 733 812 734 field->is_signed = synth_field_signed(field->type); 813 735 out:
+1 -1
kernel/trace/trace_osnoise.c
··· 1539 1539 wake_time = ktime_add_us(ktime_get(), interval); 1540 1540 __set_current_state(TASK_INTERRUPTIBLE); 1541 1541 1542 - while (schedule_hrtimeout_range(&wake_time, 0, HRTIMER_MODE_ABS)) { 1542 + while (schedule_hrtimeout(&wake_time, HRTIMER_MODE_ABS)) { 1543 1543 if (kthread_should_stop()) 1544 1544 break; 1545 1545 }
+23
kernel/trace/trace_seq.c
··· 403 403 return 1; 404 404 } 405 405 EXPORT_SYMBOL(trace_seq_hex_dump); 406 + 407 + /* 408 + * trace_seq_acquire - acquire seq buffer with size len 409 + * @s: trace sequence descriptor 410 + * @len: size of buffer to be acquired 411 + * 412 + * Acquire a buffer of size @len from the trace_seq for output use; 413 + * the caller can fill a string into that buffer. 414 + * 415 + * Returns the start address of the acquired buffer. 416 + * 417 + * This allows multiple acquisitions in one trace output function call. 418 + */ 419 + char *trace_seq_acquire(struct trace_seq *s, unsigned int len) 420 + { 421 + char *ret = trace_seq_buffer_ptr(s); 422 + 423 + if (!WARN_ON_ONCE(seq_buf_buffer_left(&s->seq) < len)) 424 + seq_buf_commit(&s->seq, len); 425 + 426 + return ret; 427 + } 428 + EXPORT_SYMBOL(trace_seq_acquire);
+1
kernel/trace/trace_synth.h
··· 18 18 bool is_signed; 19 19 bool is_string; 20 20 bool is_dynamic; 21 + bool is_stack; 21 22 }; 22 23 23 24 struct synth_event {
+2 -2
kernel/tracepoint.c
··· 571 571 bool trace_module_has_bad_taint(struct module *mod) 572 572 { 573 573 return mod->taints & ~((1 << TAINT_OOT_MODULE) | (1 << TAINT_CRAP) | 574 - (1 << TAINT_UNSIGNED_MODULE) | 575 - (1 << TAINT_TEST)); 574 + (1 << TAINT_UNSIGNED_MODULE) | (1 << TAINT_TEST) | 575 + (1 << TAINT_LIVEPATCH)); 576 576 } 577 577 578 578 static BLOCKING_NOTIFIER_HEAD(tracepoint_notify_list);
+7
samples/Kconfig
··· 46 46 that hooks to wake_up_process and schedule, and prints 47 47 the function addresses. 48 48 49 + config SAMPLE_FTRACE_OPS 50 + tristate "Build custom ftrace ops example" 51 + depends on FUNCTION_TRACER 52 + help 53 + This builds an ftrace ops example that hooks two functions and 54 + measures the time taken to invoke one function a number of times. 55 + 49 56 config SAMPLE_TRACE_ARRAY 50 57 tristate "Build sample module for kernel access to Ftrace instances" 51 58 depends on EVENT_TRACING && m
+1
samples/Makefile
··· 24 24 obj-$(CONFIG_SAMPLE_TRACE_PRINTK) += trace_printk/ 25 25 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT) += ftrace/ 26 26 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT_MULTI) += ftrace/ 27 + obj-$(CONFIG_SAMPLE_FTRACE_OPS) += ftrace/ 27 28 obj-$(CONFIG_SAMPLE_TRACE_ARRAY) += ftrace/ 28 29 subdir-$(CONFIG_SAMPLE_UHID) += uhid 29 30 obj-$(CONFIG_VIDEO_PCI_SKELETON) += v4l/
+1
samples/ftrace/Makefile
··· 5 5 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT) += ftrace-direct-modify.o 6 6 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT_MULTI) += ftrace-direct-multi.o 7 7 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT_MULTI) += ftrace-direct-multi-modify.o 8 + obj-$(CONFIG_SAMPLE_FTRACE_OPS) += ftrace-ops.o 8 9 9 10 CFLAGS_sample-trace-array.o := -I$(src) 10 11 obj-$(CONFIG_SAMPLE_TRACE_ARRAY) += sample-trace-array.o
+1 -1
samples/ftrace/ftrace-direct-modify.c
··· 3 3 #include <linux/kthread.h> 4 4 #include <linux/ftrace.h> 5 5 #include <asm/asm-offsets.h> 6 - #include <asm/nospec-branch.h> 7 6 8 7 extern void my_direct_func1(void); 9 8 extern void my_direct_func2(void); ··· 25 26 #ifdef CONFIG_X86_64 26 27 27 28 #include <asm/ibt.h> 29 + #include <asm/nospec-branch.h> 28 30 29 31 asm ( 30 32 " .pushsection .text, \"ax\", @progbits\n"
+1 -1
samples/ftrace/ftrace-direct-multi-modify.c
··· 3 3 #include <linux/kthread.h> 4 4 #include <linux/ftrace.h> 5 5 #include <asm/asm-offsets.h> 6 - #include <asm/nospec-branch.h> 7 6 8 7 extern void my_direct_func1(unsigned long ip); 9 8 extern void my_direct_func2(unsigned long ip); ··· 23 24 #ifdef CONFIG_X86_64 24 25 25 26 #include <asm/ibt.h> 27 + #include <asm/nospec-branch.h> 26 28 27 29 asm ( 28 30 " .pushsection .text, \"ax\", @progbits\n"
+1 -1
samples/ftrace/ftrace-direct-multi.c
··· 5 5 #include <linux/ftrace.h> 6 6 #include <linux/sched/stat.h> 7 7 #include <asm/asm-offsets.h> 8 - #include <asm/nospec-branch.h> 9 8 10 9 extern void my_direct_func(unsigned long ip); 11 10 ··· 18 19 #ifdef CONFIG_X86_64 19 20 20 21 #include <asm/ibt.h> 22 + #include <asm/nospec-branch.h> 21 23 22 24 asm ( 23 25 " .pushsection .text, \"ax\", @progbits\n"
+1 -1
samples/ftrace/ftrace-direct-too.c
··· 4 4 #include <linux/mm.h> /* for handle_mm_fault() */ 5 5 #include <linux/ftrace.h> 6 6 #include <asm/asm-offsets.h> 7 - #include <asm/nospec-branch.h> 8 7 9 8 extern void my_direct_func(struct vm_area_struct *vma, 10 9 unsigned long address, unsigned int flags); ··· 20 21 #ifdef CONFIG_X86_64 21 22 22 23 #include <asm/ibt.h> 24 + #include <asm/nospec-branch.h> 23 25 24 26 asm ( 25 27 " .pushsection .text, \"ax\", @progbits\n"
+1 -1
samples/ftrace/ftrace-direct.c
··· 4 4 #include <linux/sched.h> /* for wake_up_process() */ 5 5 #include <linux/ftrace.h> 6 6 #include <asm/asm-offsets.h> 7 - #include <asm/nospec-branch.h> 8 7 9 8 extern void my_direct_func(struct task_struct *p); 10 9 ··· 17 18 #ifdef CONFIG_X86_64 18 19 19 20 #include <asm/ibt.h> 21 + #include <asm/nospec-branch.h> 20 22 21 23 asm ( 22 24 " .pushsection .text, \"ax\", @progbits\n"
+252
samples/ftrace/ftrace-ops.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + 3 + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 4 + 5 + #include <linux/ftrace.h> 6 + #include <linux/ktime.h> 7 + #include <linux/module.h> 8 + 9 + #include <asm/barrier.h> 10 + 11 + /* 12 + * Arbitrary large value chosen to be sufficiently large to minimize noise but 13 + * sufficiently small to complete quickly. 14 + */ 15 + static unsigned int nr_function_calls = 100000; 16 + module_param(nr_function_calls, uint, 0); 17 + MODULE_PARM_DESC(nr_function_calls, "How many times to call the relevant tracee"); 18 + 19 + /* 20 + * The number of ops associated with a call site affects whether a tracer can 21 + * be called directly or whether it's necessary to go via the list func, which 22 + * can be significantly more expensive. 23 + */ 24 + static unsigned int nr_ops_relevant = 1; 25 + module_param(nr_ops_relevant, uint, 0); 26 + MODULE_PARM_DESC(nr_ops_relevant, "How many ftrace_ops to associate with the relevant tracee"); 27 + 28 + /* 29 + * On architectures where all call sites share the same trampoline, having 30 + * tracers enabled for distinct functions can force the use of the list func 31 + * and incur overhead for all call sites. 32 + */ 33 + static unsigned int nr_ops_irrelevant; 34 + module_param(nr_ops_irrelevant, uint, 0); 35 + MODULE_PARM_DESC(nr_ops_irrelevant, "How many ftrace_ops to associate with the irrelevant tracee"); 36 + 37 + /* 38 + * On architectures with DYNAMIC_FTRACE_WITH_REGS, saving the full pt_regs can 39 + * be more expensive than only saving the minimal necessary regs. 
40 + */ 41 + static bool save_regs; 42 + module_param(save_regs, bool, 0); 43 + MODULE_PARM_DESC(save_regs, "Register ops with FTRACE_OPS_FL_SAVE_REGS (save all registers in the trampoline)"); 44 + 45 + static bool assist_recursion; 46 + module_param(assist_recursion, bool, 0); 47 + MODULE_PARM_DESC(assist_recursion, "Register ops with FTRACE_OPS_FL_RECURSION"); 48 + 49 + static bool assist_rcu; 50 + module_param(assist_rcu, bool, 0); 51 + MODULE_PARM_DESC(assist_rcu, "Register ops with FTRACE_OPS_FL_RCU"); 52 + 53 + /* 54 + * By default, a trivial tracer is used which immediately returns to minimize 55 + * overhead. Sometimes a consistency check using a more expensive tracer is 56 + * desirable. 57 + */ 58 + static bool check_count; 59 + module_param(check_count, bool, 0); 60 + MODULE_PARM_DESC(check_count, "Check that tracers are called the expected number of times"); 61 + 62 + /* 63 + * Usually it's not interesting to leave the ops registered after the test 64 + * runs, but sometimes it can be useful to leave them registered so that they 65 + * can be inspected through the tracefs 'enabled_functions' file. 66 + */ 67 + static bool persist; 68 + module_param(persist, bool, 0); 69 + MODULE_PARM_DESC(persist, "Successfully load module and leave ftrace ops registered after test completes"); 70 + 71 + /* 72 + * Marked as noinline to ensure that an out-of-line traceable copy is 73 + * generated by the compiler. 74 + * 75 + * The barrier() ensures the compiler won't elide calls by determining there 76 + * are no side-effects. 77 + */ 78 + static noinline void tracee_relevant(void) 79 + { 80 + barrier(); 81 + } 82 + 83 + /* 84 + * Marked as noinline to ensure that an out-of-line traceable copy is 85 + * generated by the compiler. 86 + * 87 + * The barrier() ensures the compiler won't elide calls by determining there 88 + * are no side-effects.
89 + */ 90 + static noinline void tracee_irrelevant(void) 91 + { 92 + barrier(); 93 + } 94 + 95 + struct sample_ops { 96 + struct ftrace_ops ops; 97 + unsigned int count; 98 + }; 99 + 100 + static void ops_func_nop(unsigned long ip, unsigned long parent_ip, 101 + struct ftrace_ops *op, 102 + struct ftrace_regs *fregs) 103 + { 104 + /* do nothing */ 105 + } 106 + 107 + static void ops_func_count(unsigned long ip, unsigned long parent_ip, 108 + struct ftrace_ops *op, 109 + struct ftrace_regs *fregs) 110 + { 111 + struct sample_ops *self; 112 + 113 + self = container_of(op, struct sample_ops, ops); 114 + self->count++; 115 + } 116 + 117 + static struct sample_ops *ops_relevant; 118 + static struct sample_ops *ops_irrelevant; 119 + 120 + static struct sample_ops *ops_alloc_init(void *tracee, ftrace_func_t func, 121 + unsigned long flags, int nr) 122 + { 123 + struct sample_ops *ops; 124 + 125 + ops = kcalloc(nr, sizeof(*ops), GFP_KERNEL); 126 + if (WARN_ON_ONCE(!ops)) 127 + return NULL; 128 + 129 + for (unsigned int i = 0; i < nr; i++) { 130 + ops[i].ops.func = func; 131 + ops[i].ops.flags = flags; 132 + WARN_ON_ONCE(ftrace_set_filter_ip(&ops[i].ops, (unsigned long)tracee, 0, 0)); 133 + WARN_ON_ONCE(register_ftrace_function(&ops[i].ops)); 134 + } 135 + 136 + return ops; 137 + } 138 + 139 + static void ops_destroy(struct sample_ops *ops, int nr) 140 + { 141 + if (!ops) 142 + return; 143 + 144 + for (unsigned int i = 0; i < nr; i++) { 145 + WARN_ON_ONCE(unregister_ftrace_function(&ops[i].ops)); 146 + ftrace_free_filter(&ops[i].ops); 147 + } 148 + 149 + kfree(ops); 150 + } 151 + 152 + static void ops_check(struct sample_ops *ops, int nr, 153 + unsigned int expected_count) 154 + { 155 + if (!ops || !check_count) 156 + return; 157 + 158 + for (unsigned int i = 0; i < nr; i++) { 159 + if (ops[i].count == expected_count) 160 + continue; 161 + pr_warn("Counter called %u times (expected %u)\n", 162 + ops[i].count, expected_count); 163 + } 164 + } 165 + 166 + static ftrace_func_t
tracer_relevant = ops_func_nop; 167 + static ftrace_func_t tracer_irrelevant = ops_func_nop; 168 + 169 + static int __init ftrace_ops_sample_init(void) 170 + { 171 + unsigned long flags = 0; 172 + ktime_t start, end; 173 + u64 period; 174 + 175 + if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) && save_regs) { 176 + pr_info("this kernel does not support saving registers\n"); 177 + save_regs = false; 178 + } else if (save_regs) { 179 + flags |= FTRACE_OPS_FL_SAVE_REGS; 180 + } 181 + 182 + if (assist_recursion) 183 + flags |= FTRACE_OPS_FL_RECURSION; 184 + 185 + if (assist_rcu) 186 + flags |= FTRACE_OPS_FL_RCU; 187 + 188 + if (check_count) { 189 + tracer_relevant = ops_func_count; 190 + tracer_irrelevant = ops_func_count; 191 + } 192 + 193 + pr_info("registering:\n" 194 + " relevant ops: %u\n" 195 + " tracee: %ps\n" 196 + " tracer: %ps\n" 197 + " irrelevant ops: %u\n" 198 + " tracee: %ps\n" 199 + " tracer: %ps\n" 200 + " saving registers: %s\n" 201 + " assist recursion: %s\n" 202 + " assist RCU: %s\n", 203 + nr_ops_relevant, tracee_relevant, tracer_relevant, 204 + nr_ops_irrelevant, tracee_irrelevant, tracer_irrelevant, 205 + save_regs ? "YES" : "NO", 206 + assist_recursion ? "YES" : "NO", 207 + assist_rcu ? 
"YES" : "NO"); 208 + 209 + ops_relevant = ops_alloc_init(tracee_relevant, tracer_relevant, 210 + flags, nr_ops_relevant); 211 + ops_irrelevant = ops_alloc_init(tracee_irrelevant, tracer_irrelevant, 212 + flags, nr_ops_irrelevant); 213 + 214 + start = ktime_get(); 215 + for (unsigned int i = 0; i < nr_function_calls; i++) 216 + tracee_relevant(); 217 + end = ktime_get(); 218 + 219 + ops_check(ops_relevant, nr_ops_relevant, nr_function_calls); 220 + ops_check(ops_irrelevant, nr_ops_irrelevant, 0); 221 + 222 + period = ktime_to_ns(ktime_sub(end, start)); 223 + 224 + pr_info("Attempted %u calls to %ps in %lluns (%lluns / call)\n", 225 + nr_function_calls, tracee_relevant, 226 + period, div_u64(period, nr_function_calls)); 227 + 228 + if (persist) 229 + return 0; 230 + 231 + ops_destroy(ops_relevant, nr_ops_relevant); 232 + ops_destroy(ops_irrelevant, nr_ops_irrelevant); 233 + 234 + /* 235 + * The benchmark completed sucessfully, but there's no reason to keep 236 + * the module around. Return an error do the user doesn't have to 237 + * manually unload the module. 238 + */ 239 + return -EINVAL; 240 + } 241 + module_init(ftrace_ops_sample_init); 242 + 243 + static void __exit ftrace_ops_sample_exit(void) 244 + { 245 + ops_destroy(ops_relevant, nr_ops_relevant); 246 + ops_destroy(ops_irrelevant, nr_ops_irrelevant); 247 + } 248 + module_exit(ftrace_ops_sample_exit); 249 + 250 + MODULE_AUTHOR("Mark Rutland"); 251 + MODULE_DESCRIPTION("Example of using custom ftrace_ops"); 252 + MODULE_LICENSE("GPL");
+2 -2
samples/user_events/example.c
··· 23 23 #endif 24 24 25 25 /* Assumes debugfs is mounted */ 26 - const char *data_file = "/sys/kernel/debug/tracing/user_events_data"; 27 - const char *status_file = "/sys/kernel/debug/tracing/user_events_status"; 26 + const char *data_file = "/sys/kernel/tracing/user_events_data"; 27 + const char *status_file = "/sys/kernel/tracing/user_events_status"; 28 28 29 29 static int event_status(long **status) 30 30 {
+3 -3
scripts/tracing/draw_functrace.py
··· 12 12 13 13 Usage: 14 14 Be sure that you have CONFIG_FUNCTION_TRACER 15 - # mount -t debugfs nodev /sys/kernel/debug 16 - # echo function > /sys/kernel/debug/tracing/current_tracer 17 - $ cat /sys/kernel/debug/tracing/trace_pipe > ~/raw_trace_func 15 + # mount -t tracefs nodev /sys/kernel/tracing 16 + # echo function > /sys/kernel/tracing/current_tracer 17 + $ cat /sys/kernel/tracing/trace_pipe > ~/raw_trace_func 18 18 Wait some times but not too much, the script is a bit slow. 19 19 Break the pipe (Ctrl + Z) 20 20 $ scripts/tracing/draw_functrace.py < ~/raw_trace_func > draw_functrace
+2 -2
tools/lib/api/fs/tracing_path.c
··· 14 14 #include "tracing_path.h" 15 15 16 16 static char tracing_mnt[PATH_MAX] = "/sys/kernel/debug"; 17 - static char tracing_path[PATH_MAX] = "/sys/kernel/debug/tracing"; 18 - static char tracing_events_path[PATH_MAX] = "/sys/kernel/debug/tracing/events"; 17 + static char tracing_path[PATH_MAX] = "/sys/kernel/tracing"; 18 + static char tracing_events_path[PATH_MAX] = "/sys/kernel/tracing/events"; 19 19 20 20 static void __tracing_path_set(const char *tracing, const char *mountpoint) 21 21 {
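The same preference order that tracing_path.c now encodes in C — try the canonical tracefs mount first, then the legacy debugfs location — can be sketched in shell; the helper name here is illustrative, not part of the tool:

```shell
#!/bin/sh
# first_existing_dir DIR...: print the first argument that is an
# existing directory, mirroring the canonical-before-legacy probe.
first_existing_dir() {
	for d in "$@"; do
		if [ -d "$d" ]; then
			printf '%s\n' "$d"
			return 0
		fi
	done
	return 1
}

# Prefer /sys/kernel/tracing; fall back to /sys/kernel/debug/tracing.
TRACEFS=$(first_existing_dir /sys/kernel/tracing /sys/kernel/debug/tracing)
echo "using tracefs at: ${TRACEFS:-<not mounted>}"
```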
+58
tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
··· 1 + #!/bin/sh 2 + # SPDX-License-Identifier: GPL-2.0 3 + # description: event filter function - test event filtering on functions 4 + # requires: set_event events/kmem/kmem_cache_free/filter 5 + # flags: instance 6 + 7 + fail() { #msg 8 + echo $1 9 + exit_fail 10 + } 11 + 12 + echo "Test event filter function name" 13 + echo 0 > tracing_on 14 + echo 0 > events/enable 15 + echo > trace 16 + echo 'call_site.function == exit_mmap' > events/kmem/kmem_cache_free/filter 17 + echo 1 > events/kmem/kmem_cache_free/enable 18 + echo 1 > tracing_on 19 + ls > /dev/null 20 + echo 0 > events/kmem/kmem_cache_free/enable 21 + 22 + hitcnt=`grep kmem_cache_free trace| grep exit_mmap | wc -l` 23 + misscnt=`grep kmem_cache_free trace| grep -v exit_mmap | wc -l` 24 + 25 + if [ $hitcnt -eq 0 ]; then 26 + exit_fail 27 + fi 28 + 29 + if [ $misscnt -gt 0 ]; then 30 + exit_fail 31 + fi 32 + 33 + address=`grep ' exit_mmap$' /proc/kallsyms | cut -d' ' -f1` 34 + 35 + echo "Test event filter function address" 36 + echo 0 > tracing_on 37 + echo 0 > events/enable 38 + echo > trace 39 + echo "call_site.function == 0x$address" > events/kmem/kmem_cache_free/filter 40 + echo 1 > events/kmem/kmem_cache_free/enable 41 + echo 1 > tracing_on 42 + sleep 1 43 + echo 0 > events/kmem/kmem_cache_free/enable 44 + 45 + hitcnt=`grep kmem_cache_free trace| grep exit_mmap | wc -l` 46 + misscnt=`grep kmem_cache_free trace| grep -v exit_mmap | wc -l` 47 + 48 + if [ $hitcnt -eq 0 ]; then 49 + exit_fail 50 + fi 51 + 52 + if [ $misscnt -gt 0 ]; then 53 + exit_fail 54 + fi 55 + 56 + reset_events_filter 57 + 58 + exit 0
+24
tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack.tc
··· 1 + #!/bin/sh 2 + # SPDX-License-Identifier: GPL-2.0 3 + # description: event trigger - test inter-event histogram trigger trace action with a stacktrace 4 + # requires: set_event synthetic_events events/sched/sched_process_exec/hist "long[]' >> synthetic_events":README 5 + 6 + fail() { #msg 7 + echo $1 8 + exit_fail 9 + } 10 + 11 + echo "Test create synthetic event with stack" 12 + 13 + 14 + echo 's:wake_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events 15 + echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 1||prev_state == 2' >> events/sched/sched_switch/trigger 16 + echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(wake_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger 17 + echo 1 > events/synthetic/wake_lat/enable 18 + sleep 1 19 + 20 + if ! grep -q "=>.*sched" trace; then 21 + fail "Failed to create synthetic event with stack" 22 + fi 23 + 24 + exit 0
+6
tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-syntax.tc
··· 70 70 echo "myevent char var[10]" > synthetic_events 71 71 grep "myevent[[:space:]]char\[10\] var" synthetic_events 72 72 73 + if grep -q 'long\[\]' README; then 74 + # test stacktrace type 75 + echo "myevent unsigned long[] var" > synthetic_events 76 + grep "myevent[[:space:]]unsigned long\[\] var" synthetic_events 77 + fi 78 + 73 79 do_reset 74 80 75 81 exit 0
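The `grep -q 'long\[\]' README` guard added above is a standard ftrace selftest idiom: probe the tracefs README pseudo-file for a marker string before exercising a newer interface, so the test degrades gracefully on older kernels. A minimal sketch of the idiom, with the file path taken as a parameter so the helper is not tied to tracefs (helper name is illustrative):

```shell
#!/bin/sh
# readme_has FILE PATTERN: succeed iff FILE advertises PATTERN.
# In the hunk above, FILE is the tracefs README and PATTERN the
# 'long[]' marker for stacktrace-typed synthetic event fields.
readme_has() {
	grep -q -- "$2" "$1"
}

# Illustrative gating (tracefs path assumed, command not run here):
# readme_has /sys/kernel/tracing/README 'long\[\]' && run_stack_tests
```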
+1 -1
tools/tracing/latency/latency-collector.c
··· 1584 1584 /* 1585 1585 * Toss a coin to decide if we want to sleep before printing 1586 1586 * out the backtrace. The reason for this is that opening 1587 - * /sys/kernel/debug/tracing/trace will cause a blackout of 1587 + * /sys/kernel/tracing/trace will cause a blackout of 1588 1588 * hundreds of ms, where no latencies will be noted by the 1589 1589 * latency tracer. Thus by randomly sleeping we try to avoid 1590 1590 * missing traces systematically due to this. With this option