
Merge tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

- Added an option for per-CPU threads to the hwlat tracer

- Have the hwlat tracer handle hotplug CPUs

- New tracer: osnoise, which detects latency caused by interrupts,
  softirqs, and the scheduling of other tasks.

- Added the timerlat tracer, which creates a thread and measures in
  detail the sources of its wakeup latency.

- Removed the "success" field of the sched_wakeup trace event. It has
  been hardcoded as "1" since 2015, so no tooling should be looking at
  it now. If one exists, we can revert this commit, fix that tool, and
  try to remove it again in the future.

- tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids.

- New boot command line option "tp_printk_stop_on_boot": tp_printk
  causes trace events to write to the console, and once user space
  starts this can easily live-lock the system. Having a boot option
  that stops the printing just after boot-up is useful to prevent that
  from happening.

- Have the ftrace_dump_on_oops boot command line option take numbers
  that match the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops.

- Bootconfig clean ups, fixes and enhancements.

- New ktest script that tests bootconfig options.

- Add tracepoint_probe_register_may_exist() to register a tracepoint
without triggering a WARN*() if it already exists. BPF has a path
from user space that can do this. All other paths are considered a
bug.

- Small clean ups and fixes
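The hwlat/osnoise sampling idea summarized above (a busy loop reading the clock, where any gap between two consecutive reads above a threshold counts as noise) can be sketched in user space. This is an illustrative model only, not the kernel implementation; the function name and parameters are made up for the sketch:

```python
import time

def sample_noise(runtime_s=0.01, threshold_ns=5000):
    """Busy-loop reading the clock for `runtime_s` seconds; any gap
    between two consecutive reads larger than `threshold_ns` counts as
    noise. (In the kernel, hwlat does this with interrupts disabled,
    osnoise with preemption, IRQs, and SoftIRQs enabled.)"""
    start = last = time.monotonic_ns()
    total_noise = 0
    max_single = 0
    while last - start < runtime_s * 1e9:
        now = time.monotonic_ns()
        gap = now - last
        if gap > threshold_ns:
            total_noise += gap
            max_single = max(max_single, gap)
        last = now
    runtime = last - start
    # "% OF CPU AVAILABLE" in the osnoise output is computed this way
    available = 100.0 * (runtime - total_noise) / runtime
    return total_noise, max_single, available
```

The three return values correspond to the NOISE IN US, MAX SINGLE NOISE, and % OF CPU AVAILABLE columns of the osnoise tracer output.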

* tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (49 commits)
tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
tracing: Simplify & fix saved_tgids logic
treewide: Add missing semicolons to __assign_str uses
tracing: Change variable type as bool for clean-up
trace/timerlat: Fix indentation on timerlat_main()
trace/osnoise: Make 'noise' variable s64 in run_osnoise()
tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing
tracing: Fix spelling in osnoise tracer "interferences" -> "interference"
Documentation: Fix a typo on trace/osnoise-tracer
trace/osnoise: Fix return value on osnoise_init_hotplug_support
trace/osnoise: Make interval u64 on osnoise_main
trace/osnoise: Fix 'no previous prototype' warnings
tracing: Have osnoise_main() add a quiescent state for task rcu
seq_buf: Make trace_seq_putmem_hex() support data longer than 8
seq_buf: Fix overflow in seq_buf_putmem_hex()
trace/osnoise: Support hotplug operations
trace/hwlat: Support hotplug operations
trace/hwlat: Protect kdata->kthread with get/put_online_cpus
trace: Add timerlat tracer
trace: Add osnoise tracer
...

+4464 -394
+26 -4
Documentation/admin-guide/bootconfig.rst
 In this case, the key ``foo`` has ``bar``, ``baz`` and ``qux``.
 
-However, a sub-key and a value can not co-exist under a parent key.
-For example, following config is NOT allowed.::
+Moreover, sub-keys and a value can coexist under a parent key.
+For example, following config is allowed.::
 
     foo = value1
-    foo.bar = value2 # !ERROR! subkey "bar" and value "value1" can NOT co-exist
-    foo.bar := value2 # !ERROR! even with the override operator, this is NOT allowed.
+    foo.bar = value2
+    foo := value3 # This will update foo's value.
 
+Note, since there is no syntax to put a raw value directly under a
+structured key, you have to define it outside of the brace. For example::
+
+    foo {
+        bar = value1
+        bar {
+            baz = value2
+            qux = value3
+        }
+    }
+
+Also, the order of the value node under a key is fixed. If there
+are a value and subkeys, the value is always the first child node
+of the key. Thus if user specifies subkeys first, e.g.::
+
+    foo.bar = value1
+    foo = value2
+
+In the program (and /proc/bootconfig), it will be shown as below::
+
+    foo = value2
+    foo.bar = value1
 
 Comments
 --------
+13
Documentation/admin-guide/kernel-parameters.txt
 			Note, echoing 1 into this file without the
 			tracepoint_printk kernel cmdline option has no effect.
 
+			The tp_printk_stop_on_boot (see below) can also be used
+			to stop the printing of events to console at
+			late_initcall_sync.
+
 			** CAUTION **
 
 			Having tracepoints sent to printk() and activating high
 			frequency tracepoints such as irq or sched, can cause
 			the system to live lock.
+
+	tp_printk_stop_on_boot
+			[FTRACE]
+			When tp_printk (above) is set, it can cause a lot of noise
+			on the console. It may be useful to only include the
+			printing of events during boot up, as user space may
+			make the system inoperable.
+
+			This command line option will stop the printing of events
+			to console at the late_initcall_sync() time frame.
 
 	traceoff_on_warning
 			[FTRACE] enable this option to disable tracing when a
+6
Documentation/trace/boottime-trace.rst
 ftrace.[instance.INSTANCE.]event.GROUP.EVENT.enable
    Enable GROUP:EVENT tracing.
 
+ftrace.[instance.INSTANCE.]event.GROUP.enable
+   Enable all event tracing within GROUP.
+
+ftrace.[instance.INSTANCE.]event.enable
+   Enable all event tracing.
+
 ftrace.[instance.INSTANCE.]event.GROUP.EVENT.filter = FILTER
    Set FILTER rule to the GROUP:EVENT.
+9 -4
Documentation/trace/hwlat_detector.rst
 - tracing_cpumask	- the CPUs to move the hwlat thread across
 - hwlat_detector/width	- specified amount of time to spin within window (usecs)
 - hwlat_detector/window	- amount of time between (width) runs (usecs)
+- hwlat_detector/mode	- the thread mode
 
-The hwlat detector's kernel thread will migrate across each CPU specified in
-tracing_cpumask between each window. To limit the migration, either modify
-tracing_cpumask, or modify the hwlat kernel thread (named [hwlatd]) CPU
-affinity directly, and the migration will stop.
+By default, one hwlat detector's kernel thread will migrate across each CPU
+specified in cpumask at the beginning of a new window, in a round-robin
+fashion. This behavior can be changed by changing the thread mode,
+the available options are:
+
+- none:        do not force migration
+- round-robin: migrate across each CPU specified in cpumask [default]
+- per-cpu:     create one thread for each cpu in tracing_cpumask
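The round-robin thread mode described in the hwlat documentation above can be modeled in a few lines: one thread, one CPU per sampling window, cycling through the allowed mask. This is an illustrative sketch (the function name and list-based cpumask are assumptions for the example), not the kernel's implementation:

```python
from itertools import cycle

def hwlat_round_robin(cpumask, windows):
    """Return the CPU the single hwlatd thread would occupy for each
    of `windows` consecutive sampling windows: the thread migrates to
    the next CPU in the (sorted) mask at the start of every window."""
    cpus = cycle(sorted(cpumask))
    return [next(cpus) for _ in range(windows)]
```

In contrast, per-cpu mode simply pins one thread to every CPU in tracing_cpumask, and none leaves the thread wherever it is.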
+2
Documentation/trace/index.rst
    histogram-design
    boottime-trace
    hwlat_detector
+   osnoise-tracer
+   timerlat-tracer
    intel_th
    ring-buffer-design
    stm
+152
Documentation/trace/osnoise-tracer.rst
+==============
+OSNOISE Tracer
+==============
+
+In the context of high-performance computing (HPC), the Operating System
+Noise (*osnoise*) refers to the interference experienced by an application
+due to activities inside the operating system. In the context of Linux,
+NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the
+system. Moreover, hardware-related jobs can also cause noise, for example,
+via SMIs.
+
+hwlat_detector is one of the tools used to identify the most complex
+source of noise: *hardware noise*.
+
+In a nutshell, the hwlat_detector creates a thread that runs
+periodically for a given period. At the beginning of a period, the thread
+disables interrupts and starts sampling. While running, the hwlatd
+thread reads the time in a loop. As interrupts are disabled, threads,
+IRQs, and SoftIRQs cannot interfere with the hwlatd thread. Hence, the
+cause of any gap between two different reads of the time roots either in
+an NMI or in the hardware itself. At the end of the period, hwlatd enables
+interrupts and reports the max observed gap between the reads. It also
+prints an NMI occurrence counter. If the output does not report NMI
+executions, the user can conclude that the hardware is the culprit for
+the latency. hwlat detects the NMI execution by observing
+the entry and exit of an NMI.
+
+The osnoise tracer leverages the hwlat_detector by running a
+similar loop with preemption, SoftIRQs and IRQs enabled, thus allowing
+all the sources of *osnoise* during its execution. Using the same approach
+as hwlat, osnoise takes note of the entry and exit point of any
+source of interference, increasing a per-cpu interference counter. The
+osnoise tracer also saves an interference counter for each source of
+interference. The interference counter for NMI, IRQs, SoftIRQs, and
+threads is increased anytime the tool observes these interferences' entry
+events. When a noise happens without any interference from the operating
+system level, the hardware noise counter increases, pointing to a
+hardware-related noise. In this way, osnoise can account for any
+source of interference. At the end of the period, the osnoise tracer
+prints the sum of all noise, the max single noise, the percentage of CPU
+available for the thread, and the counters for the noise sources.
+
+Usage
+-----
+
+Write the ASCII text "osnoise" into the current_tracer file of the
+tracing system (generally mounted at /sys/kernel/tracing).
+
+For example::
+
+  [root@f32 ~]# cd /sys/kernel/tracing/
+  [root@f32 tracing]# echo osnoise > current_tracer
+
+It is possible to follow the trace by reading the trace file::
+
+  [root@f32 tracing]# cat trace
+  # tracer: osnoise
+  #
+  #                              _-----=> irqs-off
+  #                             / _----=> need-resched
+  #                            | / _---=> hardirq/softirq
+  #                            || / _--=> preempt-depth                       MAX
+  #                            || /                                        SINGLE     Interference counters:
+  #                            ||||  RUNTIME              NOISE  % OF CPU   NOISE   +-----------------------------+
+  #         TASK-PID      CPU# ||||  TIMESTAMP   IN US    IN US  AVAILABLE  IN US     HW  NMI  IRQ SIRQ THREAD
+  #            | |          |  ||||     |          |        |        |        |        |    |    |    |     |
+             <...>-859   [000] ....  81.637220: 1000000     190  99.98100     9       18    0 1007   18     1
+             <...>-860   [001] ....  81.638154: 1000000     656  99.93440    74       23    0 1006   16     3
+             <...>-861   [002] ....  81.638193: 1000000    5675  99.43250   202        6    0 1013   25    21
+             <...>-862   [003] ....  81.638242: 1000000     125  99.98750    45        1    0 1011   23     0
+             <...>-863   [004] ....  81.638260: 1000000    1721  99.82790   168        7    0 1002   49    41
+             <...>-864   [005] ....  81.638286: 1000000     263  99.97370    57        6    0 1006   26     2
+             <...>-865   [006] ....  81.638302: 1000000     109  99.98910    21        3    0 1006   18     1
+             <...>-866   [007] ....  81.638326: 1000000    7816  99.21840   107        8    0 1016   39    19
+
+In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the
+tracer prints a message at the end of each period for each CPU that is
+running an osnoise/ thread. The osnoise specific fields report:
+
+ - The RUNTIME IN US reports the amount of time in microseconds that
+   the osnoise thread kept looping reading the time.
+ - The NOISE IN US reports the sum of noise in microseconds observed
+   by the osnoise tracer during the associated runtime.
+ - The % OF CPU AVAILABLE reports the percentage of CPU available for
+   the osnoise thread during the runtime window.
+ - The MAX SINGLE NOISE IN US reports the maximum single noise observed
+   during the runtime window.
+ - The Interference counters display how many times each of the
+   respective interferences happened during the runtime window.
+
+Note that the example above shows a high number of HW noise samples.
+The reason is that this sample was taken on a virtual machine,
+and the host interference is detected as hardware interference.
+
+Tracer options
+--------------
+
+The tracer has a set of options inside the osnoise directory, they are:
+
+ - osnoise/cpus: CPUs at which an osnoise thread will execute.
+ - osnoise/period_us: the period of the osnoise thread.
+ - osnoise/runtime_us: how long an osnoise thread will look for noise.
+ - osnoise/stop_tracing_us: stop the system tracing if a single noise
+   higher than the configured value happens. Writing 0 disables this
+   option.
+ - osnoise/stop_tracing_total_us: stop the system tracing if total noise
+   higher than the configured value happens. Writing 0 disables this
+   option.
+ - tracing_threshold: the minimum delta between two time() reads to be
+   considered as noise, in us. When set to 0, the default value will
+   be used, which is currently 5 us.
+
+Additional Tracing
+------------------
+
+In addition to the tracer, a set of tracepoints were added to
+facilitate the identification of the osnoise source.
+
+ - osnoise:sample_threshold: printed anytime a noise is higher than
+   the configurable tolerance_ns.
+ - osnoise:nmi_noise: noise from NMI, including the duration.
+ - osnoise:irq_noise: noise from an IRQ, including the duration.
+ - osnoise:softirq_noise: noise from a SoftIRQ, including the
+   duration.
+ - osnoise:thread_noise: noise from a thread, including the duration.
+
+Note that all the values are *net values*. For example, if while osnoise
+is running, another thread preempts the osnoise thread, it will start a
+thread_noise duration at the start. Then, an IRQ takes place, preempting
+the thread_noise, starting an irq_noise. When the IRQ ends its execution,
+it will compute its duration, and this duration will be subtracted from
+the thread_noise, in such a way as to avoid the double accounting of the
+IRQ execution. This logic is valid for all sources of noise.
+
+Here is one example of the usage of these tracepoints::
+
+  osnoise/8-961    [008] d.h.  5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns
+  osnoise/8-961    [008] dNh.  5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns
+  migration/8-54   [008] d...  5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns
+  osnoise/8-961    [008] ....  5789.858413: sample_threshold: start 5789.858404555 duration 8812 ns interferences 2
+
+In this example, a noise sample of 8 microseconds was reported in the last
+line, pointing to two interferences. Looking backward in the trace, the
+two previous entries were about the migration thread running after a
+timer IRQ execution. The first event is not part of the noise because
+it took place one millisecond before.
+
+It is worth noticing that the sum of the duration reported in the
+tracepoints is smaller than the eight us reported in the sample_threshold.
+The reason roots in the overhead of the entry and exit code that happens
+before and after any interference execution. This justifies the dual
+approach: measuring thread and tracing.
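The *net value* rule described in the osnoise documentation above (a nested IRQ's duration is subtracted from the enclosing thread_noise, so nothing is double-counted) can be sketched as follows. This is an illustrative model with made-up names, assuming properly nested intervals and unique event names, not the tracer's code:

```python
def net_durations(events):
    """events: list of (name, start_ns, end_ns) with unique names,
    nested like a call stack (an IRQ preempting a thread lies inside
    the thread's window). The net duration of each event is its gross
    duration minus the gross duration of its *direct* children only:
    a nested NMI is already inside its IRQ's gross duration."""
    nets = {}
    for name, s, e in events:
        gross = e - s
        children = 0
        for n2, s2, e2 in events:
            if (s2, e2) == (s, e) or not (s <= s2 and e2 <= e):
                continue  # itself, or not nested inside (s, e)
            # direct child iff no other event encloses it more tightly
            direct = not any(
                (s3, e3) != (s2, e2) and (s3, e3) != (s, e)
                and s <= s3 and e3 <= e and s3 <= s2 and e2 <= e3
                for n3, s3, e3 in events)
            if direct:
                children += e2 - s2
        nets[name] = gross - children
    return nets
```

For a thread window containing an IRQ that itself contains an NMI, the thread's net noise subtracts only the IRQ's gross duration, matching the subtraction osnoise performs at each interference exit.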
+181
Documentation/trace/timerlat-tracer.rst
+###############
+Timerlat tracer
+###############
+
+The timerlat tracer aims to help the preemptive kernel developers to
+find sources of wakeup latencies of real-time threads. Like cyclictest,
+the tracer sets a periodic timer that wakes up a thread. The thread then
+computes a *wakeup latency* value as the difference between the *current
+time* and the *absolute time* that the timer was set to expire. The main
+goal of timerlat is tracing in such a way to help kernel developers.
+
+Usage
+-----
+
+Write the ASCII text "timerlat" into the current_tracer file of the
+tracing system (generally mounted at /sys/kernel/tracing).
+
+For example::
+
+  [root@f32 ~]# cd /sys/kernel/tracing/
+  [root@f32 tracing]# echo timerlat > current_tracer
+
+It is possible to follow the trace by reading the trace file::
+
+  [root@f32 tracing]# cat trace
+  # tracer: timerlat
+  #
+  #                              _-----=> irqs-off
+  #                             / _----=> need-resched
+  #                            | / _---=> hardirq/softirq
+  #                            || / _--=> preempt-depth
+  #                            || /
+  #                            ||||             ACTIVATION
+  #         TASK-PID     CPU# ||||   TIMESTAMP    ID           CONTEXT               LATENCY
+  #            | |         |  ||||      |          |                 |                     |
+         <idle>-0       [000] d.h1    54.029328: #1     context    irq timer_latency    932 ns
+          <...>-867     [000] ....    54.029339: #1     context thread timer_latency  11700 ns
+         <idle>-0       [001] dNh1    54.029346: #1     context    irq timer_latency   2833 ns
+          <...>-868     [001] ....    54.029353: #1     context thread timer_latency   9820 ns
+         <idle>-0       [000] d.h1    54.030328: #2     context    irq timer_latency    769 ns
+          <...>-867     [000] ....    54.030330: #2     context thread timer_latency   3070 ns
+         <idle>-0       [001] d.h1    54.030344: #2     context    irq timer_latency    935 ns
+          <...>-868     [001] ....    54.030347: #2     context thread timer_latency   4351 ns
+
+The tracer creates a per-cpu kernel thread with real-time priority that
+prints two lines at every activation. The first is the *timer latency*
+observed at the *hardirq* context before the activation of the thread.
+The second is the *timer latency* observed by the thread. The ACTIVATION
+ID field serves to relate the *irq* execution to its respective *thread*
+execution.
+
+The *irq*/*thread* splitting is important to clarify in which context
+the unexpected high value is coming from. The *irq* context can be
+delayed by hardware-related actions, such as SMIs, NMIs, IRQs,
+or by a thread masking interrupts. Once the timer happens, the delay
+can also be influenced by blocking caused by threads. For example, by
+postponing the scheduler execution via preempt_disable(), by the
+scheduler execution, or by masking interrupts. Threads can
+also be delayed by the interference from other threads and IRQs.
+
+Tracer options
+--------------
+
+The timerlat tracer is built on top of the osnoise tracer.
+So its configuration is also done in the osnoise/ config
+directory. The timerlat configs are:
+
+ - cpus: CPUs at which a timerlat thread will execute.
+ - timerlat_period_us: the period of the timerlat thread.
+ - osnoise/stop_tracing_us: stop the system tracing if a
+   timer latency at the *irq* context higher than the configured
+   value happens. Writing 0 disables this option.
+ - stop_tracing_total_us: stop the system tracing if a
+   timer latency at the *thread* context higher than the configured
+   value happens. Writing 0 disables this option.
+ - print_stack: save the stack of the IRQ occurrence, and print
+   it after the *thread context* event.
+
+timerlat and osnoise
+--------------------
+
+The timerlat can also take advantage of the osnoise: trace events.
+For example::
+
+  [root@f32 ~]# cd /sys/kernel/tracing/
+  [root@f32 tracing]# echo timerlat > current_tracer
+  [root@f32 tracing]# echo 1 > events/osnoise/enable
+  [root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
+  [root@f32 tracing]# tail -10 trace
+           cc1-87882   [005] d..h...   548.771078: #402268 context    irq timer_latency 13585 ns
+           cc1-87882   [005] dNLh1..   548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns
+           cc1-87882   [005] dNLh2..   548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns
+           cc1-87882   [005] d...3..   548.771102: thread_noise: cc1:87882 start 548.771078243 duration 9909 ns
+    timerlat/5-1035    [005] .......   548.771104: #402268 context thread timer_latency 39960 ns
+
+In this case, the root cause of the timer latency does not point to a
+single cause, but to multiple ones. Firstly, the timer IRQ was delayed
+for 13 us, which may point to a long IRQ disabled section (see IRQ
+stacktrace section). Then the timer interrupt that wakes up the timerlat
+thread took 7597 ns, and the qxl:21 device IRQ took 7139 ns. Finally,
+the cc1 thread noise took 9909 ns of time before the context switch.
+Such pieces of evidence are useful for the developer to use other
+tracing methods to figure out how to debug and optimize the system.
+
+It is worth mentioning that the *duration* values reported
+by the osnoise: events are *net* values. For example, the
+thread_noise does not include the duration of the overhead caused
+by the IRQ execution (which indeed accounted for 12736 ns). But
+the values reported by the timerlat tracer (timerlat_latency)
+are *gross* values.
+
+The art below illustrates a CPU timeline and how the timerlat tracer
+observes it at the top and the osnoise: events at the bottom. Each "-"
+in the timelines means circa 1 us, and the time moves ==>::
+
+  External     timer irq                   thread
+   clock        latency                    latency
+   event        13585 ns                   39960 ns
+     |             ^                          ^
+     v             |                          |
+     |-------------|                          |
+     |-------------+-------------------------|
+                   ^                          ^
+  ========================================================================
+              [tmr irq]       [dev irq]
+  [another thread...^      v..^       v.......][timerlat/ thread]  <-- CPU timeline
+  =========================================================================
+                   |-------|  |-------|
+           |--^           v-------|
+           |  |           |
+           |  |           + thread_noise: 9909 ns
+           |  +-> irq_noise: 6139 ns
+           +-> irq_noise: 7597 ns
+
+IRQ stacktrace
+--------------
+
+The osnoise/print_stack option is helpful for the cases in which a thread
+noise causes the major factor for the timer latency, because of preempt or
+irq disabled. For example::
+
+  [root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us
+  [root@f32 tracing]# echo 500 > osnoise/print_stack
+  [root@f32 tracing]# echo timerlat > current_tracer
+  [root@f32 tracing]# tail -21 per_cpu/cpu7/trace
+        insmod-1026    [007] dN.h1..   200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns
+        insmod-1026    [007] d..h1..   200.202587: #29800 context    irq timer_latency 1616 ns
+        insmod-1026    [007] dN.h2..   200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns
+        insmod-1026    [007] dN.h3..   200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns
+        insmod-1026    [007] d...3..   200.203444: thread_noise: insmod:1026 start 200.202586933 duration 838681 ns
+    timerlat/7-1001    [007] .......   200.203445: #29800 context thread timer_latency 859978 ns
+    timerlat/7-1001    [007] ....1..   200.203446: <stack trace>
+  => timerlat_irq
+  => __hrtimer_run_queues
+  => hrtimer_interrupt
+  => __sysvec_apic_timer_interrupt
+  => asm_call_irq_on_stack
+  => sysvec_apic_timer_interrupt
+  => asm_sysvec_apic_timer_interrupt
+  => delay_tsc
+  => dummy_load_1ms_pd_init
+  => do_one_initcall
+  => do_init_module
+  => __do_sys_finit_module
+  => do_syscall_64
+  => entry_SYSCALL_64_after_hwframe
+
+In this case, it is possible to see that the thread added the highest
+contribution to the *timer latency* and the stack trace, saved during
+the timerlat IRQ handler, points to a function named
+dummy_load_1ms_pd_init, which had the following code (on purpose)::
+
+  static int __init dummy_load_1ms_pd_init(void)
+  {
+          preempt_disable();
+          mdelay(1);
+          preempt_enable();
+          return 0;
+  }
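The latency definition in the timerlat documentation above is plain arithmetic: the wakeup latency is the current time minus the absolute time the periodic timer was set to expire. A minimal sketch (illustrative only; the function name is an assumption, not the tracer's code):

```python
def timer_latency(expected_expiry_ns, now_ns):
    """timerlat's *wakeup latency*: the difference between the time
    the handler or thread actually runs (now_ns) and the absolute
    time the periodic timer was programmed to expire."""
    return now_ns - expected_expiry_ns

# The "irq" value is measured in the timer IRQ handler and the "thread"
# value after the wakeup; both are gross values against the same expiry,
# which is why the thread latency is always the larger of the two.
```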
+1
arch/x86/kernel/Makefile
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER)	+= ftrace.o
 obj-$(CONFIG_FTRACE_SYSCALLS)	+= ftrace.o
 obj-$(CONFIG_X86_TSC)		+= trace_clock.o
+obj-$(CONFIG_TRACING)		+= trace.o
 obj-$(CONFIG_CRASH_CORE)	+= crash_core_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)	+= machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)	+= relocate_kernel_$(BITS).o crash.o
+234
arch/x86/kernel/trace.c
+#include <asm/trace/irq_vectors.h>
+#include <linux/trace.h>
+
+#if defined(CONFIG_OSNOISE_TRACER) && defined(CONFIG_X86_LOCAL_APIC)
+/*
+ * trace_intel_irq_entry - record intel specific IRQ entry
+ */
+static void trace_intel_irq_entry(void *data, int vector)
+{
+	osnoise_trace_irq_entry(vector);
+}
+
+/*
+ * trace_intel_irq_exit - record intel specific IRQ exit
+ */
+static void trace_intel_irq_exit(void *data, int vector)
+{
+	char *vector_desc = (char *) data;
+
+	osnoise_trace_irq_exit(vector, vector_desc);
+}
+
+/*
+ * register_intel_irq_tp - Register intel specific IRQ entry tracepoints
+ */
+int osnoise_arch_register(void)
+{
+	int ret;
+
+	ret = register_trace_local_timer_entry(trace_intel_irq_entry, NULL);
+	if (ret)
+		goto out_err;
+
+	ret = register_trace_local_timer_exit(trace_intel_irq_exit, "local_timer");
+	if (ret)
+		goto out_timer_entry;
+
+#ifdef CONFIG_X86_THERMAL_VECTOR
+	ret = register_trace_thermal_apic_entry(trace_intel_irq_entry, NULL);
+	if (ret)
+		goto out_timer_exit;
+
+	ret = register_trace_thermal_apic_exit(trace_intel_irq_exit, "thermal_apic");
+	if (ret)
+		goto out_thermal_entry;
+#endif /* CONFIG_X86_THERMAL_VECTOR */
+
+#ifdef CONFIG_X86_MCE_AMD
+	ret = register_trace_deferred_error_apic_entry(trace_intel_irq_entry, NULL);
+	if (ret)
+		goto out_thermal_exit;
+
+	ret = register_trace_deferred_error_apic_exit(trace_intel_irq_exit, "deferred_error");
+	if (ret)
+		goto out_deferred_entry;
+#endif
+
+#ifdef CONFIG_X86_MCE_THRESHOLD
+	ret = register_trace_threshold_apic_entry(trace_intel_irq_entry, NULL);
+	if (ret)
+		goto out_deferred_exit;
+
+	ret = register_trace_threshold_apic_exit(trace_intel_irq_exit, "threshold_apic");
+	if (ret)
+		goto out_threshold_entry;
+#endif /* CONFIG_X86_MCE_THRESHOLD */
+
+#ifdef CONFIG_SMP
+	ret = register_trace_call_function_single_entry(trace_intel_irq_entry, NULL);
+	if (ret)
+		goto out_threshold_exit;
+
+	ret = register_trace_call_function_single_exit(trace_intel_irq_exit,
+						       "call_function_single");
+	if (ret)
+		goto out_call_function_single_entry;
+
+	ret = register_trace_call_function_entry(trace_intel_irq_entry, NULL);
+	if (ret)
+		goto out_call_function_single_exit;
+
+	ret = register_trace_call_function_exit(trace_intel_irq_exit, "call_function");
+	if (ret)
+		goto out_call_function_entry;
+
+	ret = register_trace_reschedule_entry(trace_intel_irq_entry, NULL);
+	if (ret)
+		goto out_call_function_exit;
+
+	ret = register_trace_reschedule_exit(trace_intel_irq_exit, "reschedule");
+	if (ret)
+		goto out_reschedule_entry;
+#endif /* CONFIG_SMP */
+
+#ifdef CONFIG_IRQ_WORK
+	ret = register_trace_irq_work_entry(trace_intel_irq_entry, NULL);
+	if (ret)
+		goto out_reschedule_exit;
+
+	ret = register_trace_irq_work_exit(trace_intel_irq_exit, "irq_work");
+	if (ret)
+		goto out_irq_work_entry;
+#endif
+
+	ret = register_trace_x86_platform_ipi_entry(trace_intel_irq_entry, NULL);
+	if (ret)
+		goto out_irq_work_exit;
+
+	ret = register_trace_x86_platform_ipi_exit(trace_intel_irq_exit, "x86_platform_ipi");
+	if (ret)
+		goto out_x86_ipi_entry;
+
+	ret = register_trace_error_apic_entry(trace_intel_irq_entry, NULL);
+	if (ret)
+		goto out_x86_ipi_exit;
+
+	ret = register_trace_error_apic_exit(trace_intel_irq_exit, "error_apic");
+	if (ret)
+		goto out_error_apic_entry;
+
+	ret = register_trace_spurious_apic_entry(trace_intel_irq_entry, NULL);
+	if (ret)
+		goto out_error_apic_exit;
+
+	ret = register_trace_spurious_apic_exit(trace_intel_irq_exit, "spurious_apic");
+	if (ret)
+		goto out_spurious_apic_entry;
+
+	return 0;
+
+out_spurious_apic_entry:
+	unregister_trace_spurious_apic_entry(trace_intel_irq_entry, NULL);
+out_error_apic_exit:
+	unregister_trace_error_apic_exit(trace_intel_irq_exit, "error_apic");
+out_error_apic_entry:
+	unregister_trace_error_apic_entry(trace_intel_irq_entry, NULL);
+out_x86_ipi_exit:
+	unregister_trace_x86_platform_ipi_exit(trace_intel_irq_exit, "x86_platform_ipi");
+out_x86_ipi_entry:
+	unregister_trace_x86_platform_ipi_entry(trace_intel_irq_entry, NULL);
+out_irq_work_exit:
+
+#ifdef CONFIG_IRQ_WORK
+	unregister_trace_irq_work_exit(trace_intel_irq_exit, "irq_work");
+out_irq_work_entry:
+	unregister_trace_irq_work_entry(trace_intel_irq_entry, NULL);
+out_reschedule_exit:
+#endif
+
+#ifdef CONFIG_SMP
+	unregister_trace_reschedule_exit(trace_intel_irq_exit, "reschedule");
+out_reschedule_entry:
+	unregister_trace_reschedule_entry(trace_intel_irq_entry, NULL);
+out_call_function_exit:
+	unregister_trace_call_function_exit(trace_intel_irq_exit, "call_function");
+out_call_function_entry:
+	unregister_trace_call_function_entry(trace_intel_irq_entry, NULL);
+out_call_function_single_exit:
+	unregister_trace_call_function_single_exit(trace_intel_irq_exit, "call_function_single");
+out_call_function_single_entry:
+	unregister_trace_call_function_single_entry(trace_intel_irq_entry, NULL);
+out_threshold_exit:
+#endif
+
+#ifdef CONFIG_X86_MCE_THRESHOLD
+	unregister_trace_threshold_apic_exit(trace_intel_irq_exit, "threshold_apic");
+out_threshold_entry:
+	unregister_trace_threshold_apic_entry(trace_intel_irq_entry, NULL);
+out_deferred_exit:
+#endif
+
+#ifdef CONFIG_X86_MCE_AMD
+	unregister_trace_deferred_error_apic_exit(trace_intel_irq_exit, "deferred_error");
+out_deferred_entry:
+	unregister_trace_deferred_error_apic_entry(trace_intel_irq_entry, NULL);
+out_thermal_exit:
+#endif /* CONFIG_X86_MCE_AMD */
+
+#ifdef CONFIG_X86_THERMAL_VECTOR
+	unregister_trace_thermal_apic_exit(trace_intel_irq_exit, "thermal_apic");
+out_thermal_entry:
+	unregister_trace_thermal_apic_entry(trace_intel_irq_entry, NULL);
+out_timer_exit:
+#endif /* CONFIG_X86_THERMAL_VECTOR */
+
+	unregister_trace_local_timer_exit(trace_intel_irq_exit, "local_timer");
+out_timer_entry:
+	unregister_trace_local_timer_entry(trace_intel_irq_entry, NULL);
+out_err:
+	return -EINVAL;
+}
+
+void osnoise_arch_unregister(void)
+{
+	unregister_trace_spurious_apic_exit(trace_intel_irq_exit, "spurious_apic");
+	unregister_trace_spurious_apic_entry(trace_intel_irq_entry, NULL);
+	unregister_trace_error_apic_exit(trace_intel_irq_exit, "error_apic");
+	unregister_trace_error_apic_entry(trace_intel_irq_entry, NULL);
+	unregister_trace_x86_platform_ipi_exit(trace_intel_irq_exit, "x86_platform_ipi");
+	unregister_trace_x86_platform_ipi_entry(trace_intel_irq_entry, NULL);
+
+#ifdef CONFIG_IRQ_WORK
+	unregister_trace_irq_work_exit(trace_intel_irq_exit, "irq_work");
+	unregister_trace_irq_work_entry(trace_intel_irq_entry, NULL);
+#endif
+
+#ifdef CONFIG_SMP
+	unregister_trace_reschedule_exit(trace_intel_irq_exit, "reschedule");
+	unregister_trace_reschedule_entry(trace_intel_irq_entry, NULL);
+	unregister_trace_call_function_exit(trace_intel_irq_exit, "call_function");
+	unregister_trace_call_function_entry(trace_intel_irq_entry, NULL);
+	unregister_trace_call_function_single_exit(trace_intel_irq_exit, "call_function_single");
+	unregister_trace_call_function_single_entry(trace_intel_irq_entry, NULL);
+#endif
+
+#ifdef CONFIG_X86_MCE_THRESHOLD
+	unregister_trace_threshold_apic_exit(trace_intel_irq_exit, "threshold_apic");
+	unregister_trace_threshold_apic_entry(trace_intel_irq_entry, NULL);
+#endif
+
+#ifdef CONFIG_X86_MCE_AMD
+	unregister_trace_deferred_error_apic_exit(trace_intel_irq_exit, "deferred_error");
+	unregister_trace_deferred_error_apic_entry(trace_intel_irq_entry, NULL);
+#endif
+
+#ifdef CONFIG_X86_THERMAL_VECTOR
+	unregister_trace_thermal_apic_exit(trace_intel_irq_exit, "thermal_apic");
+	unregister_trace_thermal_apic_entry(trace_intel_irq_entry, NULL);
+#endif /* CONFIG_X86_THERMAL_VECTOR */
+
+	unregister_trace_local_timer_exit(trace_intel_irq_exit, "local_timer");
+	unregister_trace_local_timer_entry(trace_intel_irq_entry, NULL);
+}
+#endif /* CONFIG_OSNOISE_TRACER && CONFIG_X86_LOCAL_APIC */
+7 -7
drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
··· 176 176 177 177 TP_fast_assign( 178 178 __entry->sched_job_id = job->base.id; 179 - __assign_str(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job)) 179 + __assign_str(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job)); 180 180 __entry->context = job->base.s_fence->finished.context; 181 181 __entry->seqno = job->base.s_fence->finished.seqno; 182 - __assign_str(ring, to_amdgpu_ring(job->base.sched)->name) 182 + __assign_str(ring, to_amdgpu_ring(job->base.sched)->name); 183 183 __entry->num_ibs = job->num_ibs; 184 184 ), 185 185 TP_printk("sched_job=%llu, timeline=%s, context=%u, seqno=%u, ring_name=%s, num_ibs=%u", ··· 201 201 202 202 TP_fast_assign( 203 203 __entry->sched_job_id = job->base.id; 204 - __assign_str(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job)) 204 + __assign_str(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job)); 205 205 __entry->context = job->base.s_fence->finished.context; 206 206 __entry->seqno = job->base.s_fence->finished.seqno; 207 - __assign_str(ring, to_amdgpu_ring(job->base.sched)->name) 207 + __assign_str(ring, to_amdgpu_ring(job->base.sched)->name); 208 208 __entry->num_ibs = job->num_ibs; 209 209 ), 210 210 TP_printk("sched_job=%llu, timeline=%s, context=%u, seqno=%u, ring_name=%s, num_ibs=%u", ··· 229 229 230 230 TP_fast_assign( 231 231 __entry->pasid = vm->pasid; 232 - __assign_str(ring, ring->name) 232 + __assign_str(ring, ring->name); 233 233 __entry->vmid = job->vmid; 234 234 __entry->vm_hub = ring->funcs->vmhub, 235 235 __entry->pd_addr = job->vm_pd_addr; ··· 424 424 ), 425 425 426 426 TP_fast_assign( 427 - __assign_str(ring, ring->name) 427 + __assign_str(ring, ring->name); 428 428 __entry->vmid = vmid; 429 429 __entry->vm_hub = ring->funcs->vmhub; 430 430 __entry->pd_addr = pd_addr; ··· 525 525 ), 526 526 527 527 TP_fast_assign( 528 - __assign_str(ring, sched_job->base.sched->name) 528 + __assign_str(ring, sched_job->base.sched->name); 529 529 __entry->id = sched_job->base.id; 530 530 __entry->fence = fence; 531 531 __entry->ctx = fence->context;
+1 -1
drivers/gpu/drm/lima/lima_trace.h
··· 24 24 __entry->task_id = task->base.id; 25 25 __entry->context = task->base.s_fence->finished.context; 26 26 __entry->seqno = task->base.s_fence->finished.seqno; 27 - __assign_str(pipe, task->base.sched->name) 27 + __assign_str(pipe, task->base.sched->name); 28 28 ), 29 29 30 30 TP_printk("task=%llu, context=%u seqno=%u pipe=%s",
+2 -2
drivers/infiniband/hw/hfi1/trace_misc.h
··· 63 63 __array(char, buf, 64) 64 64 __field(int, src) 65 65 ), 66 - TP_fast_assign(DD_DEV_ASSIGN(dd) 66 + TP_fast_assign(DD_DEV_ASSIGN(dd); 67 67 is_entry->is_name(__entry->buf, 64, 68 68 src - is_entry->start); 69 69 __entry->src = src; ··· 100 100 __field(u32, qpn) 101 101 __field(u8, opcode) 102 102 ), 103 - TP_fast_assign(DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)) 103 + TP_fast_assign(DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)); 104 104 __entry->qpn = qp->ibqp.qp_num; 105 105 __entry->opcode = opcode; 106 106 ),
+2 -2
drivers/infiniband/hw/hfi1/trace_rc.h
··· 70 70 __field(u32, r_psn) 71 71 ), 72 72 TP_fast_assign( 73 - DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)) 73 + DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)); 74 74 __entry->qpn = qp->ibqp.qp_num; 75 75 __entry->s_flags = qp->s_flags; 76 76 __entry->psn = psn; ··· 130 130 __field(u32, lpsn) 131 131 ), 132 132 TP_fast_assign(/* assign */ 133 - DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)) 133 + DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)); 134 134 __entry->qpn = qp->ibqp.qp_num; 135 135 __entry->aeth = aeth; 136 136 __entry->psn = psn;
+3 -3
drivers/infiniband/hw/hfi1/trace_tid.h
··· 886 886 __field(u8, s_retry) 887 887 ), 888 888 TP_fast_assign(/* assign */ 889 - DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)) 889 + DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)); 890 890 __entry->qpn = qp->ibqp.qp_num; 891 891 __entry->state = qp->state; 892 892 __entry->s_cur = qp->s_cur; ··· 1285 1285 __field(int, diff) 1286 1286 ), 1287 1287 TP_fast_assign(/* assign */ 1288 - DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)) 1288 + DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)); 1289 1289 __entry->qpn = qp->ibqp.qp_num; 1290 1290 __entry->s_flags = qp->s_flags; 1291 1291 __entry->state = qp->state; ··· 1574 1574 __field(u32, resync_psn) 1575 1575 ), 1576 1576 TP_fast_assign(/* assign */ 1577 - DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)) 1577 + DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)); 1578 1578 __entry->qpn = qp->ibqp.qp_num; 1579 1579 __entry->aeth = aeth; 1580 1580 __entry->psn = psn;
+4 -4
drivers/infiniband/hw/hfi1/trace_tx.h
··· 120 120 __field(unsigned long, iow_flags) 121 121 ), 122 122 TP_fast_assign( 123 - DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)) 123 + DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)); 124 124 __entry->flags = flags; 125 125 __entry->qpn = qp->ibqp.qp_num; 126 126 __entry->s_flags = qp->s_flags; ··· 868 868 __field(int, send_flags) 869 869 ), 870 870 TP_fast_assign( 871 - DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)) 871 + DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)); 872 872 __entry->wqe = wqe; 873 873 __entry->wr_id = wqe->wr.wr_id; 874 874 __entry->qpn = qp->ibqp.qp_num; ··· 904 904 __field(bool, flag) 905 905 ), 906 906 TP_fast_assign( 907 - DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)) 907 + DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device)); 908 908 __entry->qpn = qp->ibqp.qp_num; 909 909 __entry->flag = flag; 910 910 ), ··· 952 952 __field(u8, stopped) 953 953 ), 954 954 TP_fast_assign(/* assign */ 955 - DD_DEV_ASSIGN(txq->priv->dd) 955 + DD_DEV_ASSIGN(txq->priv->dd); 956 956 __entry->txq = txq; 957 957 __entry->sde = txq->sde; 958 958 __entry->head = txq->tx_ring.head;
+2 -2
drivers/infiniband/sw/rdmavt/trace_cq.h
··· 85 85 __field(int, comp_vector_cpu) 86 86 __field(u32, flags) 87 87 ), 88 - TP_fast_assign(RDI_DEV_ASSIGN(cq->rdi) 88 + TP_fast_assign(RDI_DEV_ASSIGN(cq->rdi); 89 89 __entry->ip = cq->ip; 90 90 __entry->cqe = attr->cqe; 91 91 __entry->comp_vector = attr->comp_vector; ··· 123 123 __field(u32, imm) 124 124 ), 125 125 TP_fast_assign( 126 - RDI_DEV_ASSIGN(cq->rdi) 126 + RDI_DEV_ASSIGN(cq->rdi); 127 127 __entry->wr_id = wc->wr_id; 128 128 __entry->status = wc->status; 129 129 __entry->opcode = wc->opcode;
+1 -1
drivers/infiniband/sw/rdmavt/trace_mr.h
··· 195 195 __field(uint, sg_offset) 196 196 ), 197 197 TP_fast_assign( 198 - RDI_DEV_ASSIGN(ib_to_rvt(to_imr(ibmr)->mr.pd->device)) 198 + RDI_DEV_ASSIGN(ib_to_rvt(to_imr(ibmr)->mr.pd->device)); 199 199 __entry->ibmr_iova = ibmr->iova; 200 200 __entry->iova = to_imr(ibmr)->mr.iova; 201 201 __entry->user_base = to_imr(ibmr)->mr.user_base;
+2 -2
drivers/infiniband/sw/rdmavt/trace_qp.h
··· 65 65 __field(u32, bucket) 66 66 ), 67 67 TP_fast_assign( 68 - RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device)) 68 + RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device)); 69 69 __entry->qpn = qp->ibqp.qp_num; 70 70 __entry->bucket = bucket; 71 71 ), ··· 97 97 __field(u32, to) 98 98 ), 99 99 TP_fast_assign( 100 - RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device)) 100 + RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device)); 101 101 __entry->qpn = qp->ibqp.qp_num; 102 102 __entry->hrtimer = &qp->s_rnr_timer; 103 103 __entry->s_flags = qp->s_flags;
+1 -1
drivers/infiniband/sw/rdmavt/trace_rc.h
··· 71 71 __field(u32, r_psn) 72 72 ), 73 73 TP_fast_assign( 74 - RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device)) 74 + RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device)); 75 75 __entry->qpn = qp->ibqp.qp_num; 76 76 __entry->s_flags = qp->s_flags; 77 77 __entry->psn = psn;
+2 -2
drivers/infiniband/sw/rdmavt/trace_tx.h
··· 111 111 __field(int, wr_num_sge) 112 112 ), 113 113 TP_fast_assign( 114 - RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device)) 114 + RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device)); 115 115 __entry->wqe = wqe; 116 116 __entry->wr_id = wqe->wr.wr_id; 117 117 __entry->qpn = qp->ibqp.qp_num; ··· 170 170 __field(int, send_flags) 171 171 ), 172 172 TP_fast_assign( 173 - RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device)) 173 + RDI_DEV_ASSIGN(ib_to_rvt(qp->ibqp.device)); 174 174 __entry->wqe = wqe; 175 175 __entry->wr_id = wqe->wr.wr_id; 176 176 __entry->qpn = qp->ibqp.qp_num;
+3 -3
drivers/misc/mei/mei-trace.h
··· 26 26 __field(u32, val) 27 27 ), 28 28 TP_fast_assign( 29 - __assign_str(dev, dev_name(dev)) 29 + __assign_str(dev, dev_name(dev)); 30 30 __entry->reg = reg; 31 31 __entry->offs = offs; 32 32 __entry->val = val; ··· 45 45 __field(u32, val) 46 46 ), 47 47 TP_fast_assign( 48 - __assign_str(dev, dev_name(dev)) 48 + __assign_str(dev, dev_name(dev)); 49 49 __entry->reg = reg; 50 50 __entry->offs = offs; 51 51 __entry->val = val; ··· 64 64 __field(u32, val) 65 65 ), 66 66 TP_fast_assign( 67 - __assign_str(dev, dev_name(dev)) 67 + __assign_str(dev, dev_name(dev)); 68 68 __entry->reg = reg; 69 69 __entry->offs = offs; 70 70 __entry->val = val;
+6 -6
drivers/net/ethernet/marvell/octeontx2/af/rvu_trace.h
··· 21 21 __field(u16, id) 22 22 __field(u64, size) 23 23 ), 24 - TP_fast_assign(__assign_str(dev, pci_name(pdev)) 24 + TP_fast_assign(__assign_str(dev, pci_name(pdev)); 25 25 __entry->id = id; 26 26 __entry->size = size; 27 27 ), ··· 36 36 __field(u16, num_msgs) 37 37 __field(u64, msg_size) 38 38 ), 39 - TP_fast_assign(__assign_str(dev, pci_name(pdev)) 39 + TP_fast_assign(__assign_str(dev, pci_name(pdev)); 40 40 __entry->num_msgs = num_msgs; 41 41 __entry->msg_size = msg_size; 42 42 ), ··· 52 52 __field(u16, rspid) 53 53 __field(int, rc) 54 54 ), 55 - TP_fast_assign(__assign_str(dev, pci_name(pdev)) 55 + TP_fast_assign(__assign_str(dev, pci_name(pdev)); 56 56 __entry->reqid = reqid; 57 57 __entry->rspid = rspid; 58 58 __entry->rc = rc; ··· 69 69 __string(str, msg) 70 70 __field(u64, intr) 71 71 ), 72 - TP_fast_assign(__assign_str(dev, pci_name(pdev)) 73 - __assign_str(str, msg) 72 + TP_fast_assign(__assign_str(dev, pci_name(pdev)); 73 + __assign_str(str, msg); 74 74 __entry->intr = intr; 75 75 ), 76 76 TP_printk("[%s] mbox interrupt %s (0x%llx)\n", __get_str(dev), ··· 84 84 __field(u16, id) 85 85 __field(int, err) 86 86 ), 87 - TP_fast_assign(__assign_str(dev, pci_name(pdev)) 87 + TP_fast_assign(__assign_str(dev, pci_name(pdev)); 88 88 __entry->id = id; 89 89 __entry->err = err; 90 90 ),
+2 -2
drivers/net/fjes/fjes_trace.h
··· 232 232 __string(err, err) 233 233 ), 234 234 TP_fast_assign( 235 - __assign_str(err, err) 235 + __assign_str(err, err); 236 236 ), 237 237 TP_printk("%s", __get_str(err)) 238 238 ); ··· 258 258 __string(err, err) 259 259 ), 260 260 TP_fast_assign( 261 - __assign_str(err, err) 261 + __assign_str(err, err); 262 262 ), 263 263 TP_printk("%s", __get_str(err)) 264 264 );
+1 -1
drivers/usb/cdns3/cdnsp-trace.h
··· 138 138 __string(text, msg) 139 139 ), 140 140 TP_fast_assign( 141 - __assign_str(text, msg) 141 + __assign_str(text, msg); 142 142 ), 143 143 TP_printk("%s", __get_str(text)) 144 144 );
+3 -3
fs/nfs/nfs4trace.h
··· 625 625 626 626 TP_fast_assign( 627 627 __entry->state = clp->cl_state; 628 - __assign_str(hostname, clp->cl_hostname) 628 + __assign_str(hostname, clp->cl_hostname); 629 629 ), 630 630 631 631 TP_printk( ··· 1637 1637 __entry->fileid = 0; 1638 1638 __entry->dev = 0; 1639 1639 } 1640 - __assign_str(dstaddr, clp ? clp->cl_hostname : "unknown") 1640 + __assign_str(dstaddr, clp ? clp->cl_hostname : "unknown"); 1641 1641 ), 1642 1642 1643 1643 TP_printk( ··· 1694 1694 __entry->fileid = 0; 1695 1695 __entry->dev = 0; 1696 1696 } 1697 - __assign_str(dstaddr, clp ? clp->cl_hostname : "unknown") 1697 + __assign_str(dstaddr, clp ? clp->cl_hostname : "unknown"); 1698 1698 __entry->stateid_seq = 1699 1699 be32_to_cpu(stateid->seqid); 1700 1700 __entry->stateid_hash =
+2 -2
fs/nfs/nfstrace.h
··· 1427 1427 __entry->version = task->tk_client->cl_vers; 1428 1428 __entry->error = error; 1429 1429 __assign_str(program, 1430 - task->tk_client->cl_program->name) 1431 - __assign_str(procedure, task->tk_msg.rpc_proc->p_name) 1430 + task->tk_client->cl_program->name); 1431 + __assign_str(procedure, task->tk_msg.rpc_proc->p_name); 1432 1432 ), 1433 1433 1434 1434 TP_printk(
+1 -1
fs/proc/bootconfig.c
··· 49 49 else 50 50 q = '"'; 51 51 ret = snprintf(dst, rest(dst, end), "%c%s%c%s", 52 - q, val, q, vnode->next ? ", " : "\n"); 52 + q, val, q, xbc_node_is_array(vnode) ? ", " : "\n"); 53 53 if (ret < 0) 54 54 goto out; 55 55 dst += ret;
+55 -3
include/linux/bootconfig.h
··· 16 16 #define BOOTCONFIG_ALIGN (1 << BOOTCONFIG_ALIGN_SHIFT) 17 17 #define BOOTCONFIG_ALIGN_MASK (BOOTCONFIG_ALIGN - 1) 18 18 19 + /** 20 + * xbc_calc_checksum() - Calculate checksum of bootconfig 21 + * @data: Bootconfig data. 22 + * @size: The size of the bootconfig data. 23 + * 24 + * Calculate the checksum value of the bootconfig data. 25 + * The checksum will be used with the BOOTCONFIG_MAGIC and the size for 26 + * embedding the bootconfig in the initrd image. 27 + */ 28 + static inline __init u32 xbc_calc_checksum(void *data, u32 size) 29 + { 30 + unsigned char *p = data; 31 + u32 ret = 0; 32 + 33 + while (size--) 34 + ret += *p++; 35 + 36 + return ret; 37 + } 38 + 19 39 /* XBC tree node */ 20 40 struct xbc_node { 21 41 u16 next; ··· 91 71 */ 92 72 static inline __init bool xbc_node_is_array(struct xbc_node *node) 93 73 { 94 - return xbc_node_is_value(node) && node->next != 0; 74 + return xbc_node_is_value(node) && node->child != 0; 95 75 } 96 76 97 77 /** ··· 100 80 * 101 81 * Test the @node is a leaf key node which is a key node and has a value node 102 82 * or no child. Returns true if it is a leaf node, or false if not. 83 + * Note that the leaf node can have subkey nodes in addition to the 84 + * value node. 103 85 */ 104 86 static inline __init bool xbc_node_is_leaf(struct xbc_node *node) 105 87 { ··· 152 130 } 153 131 154 132 /** 133 + * xbc_node_get_subkey() - Return the first subkey node if exists 134 + * @node: Parent node 135 + * 136 + * Return the first subkey node of the @node. If the @node has no child 137 + * or only value node, this will return NULL. 
138 + */ 139 + static inline struct xbc_node * __init xbc_node_get_subkey(struct xbc_node *node) 140 + { 141 + struct xbc_node *child = xbc_node_get_child(node); 142 + 143 + if (child && xbc_node_is_value(child)) 144 + return xbc_node_get_next(child); 145 + else 146 + return child; 147 + } 148 + 149 + /** 155 150 * xbc_array_for_each_value() - Iterate value nodes on an array 156 151 * @anode: An XBC arraied value node 157 152 * @value: A value ··· 179 140 */ 180 141 #define xbc_array_for_each_value(anode, value) \ 181 142 for (value = xbc_node_get_data(anode); anode != NULL ; \ 182 - anode = xbc_node_get_next(anode), \ 143 + anode = xbc_node_get_child(anode), \ 183 144 value = anode ? xbc_node_get_data(anode) : NULL) 184 145 185 146 /** ··· 188 149 * @child: Iterated XBC node. 189 150 * 190 151 * Iterate child nodes of @parent. Each child nodes are stored to @child. 152 + * The @child can be mixture of a value node and subkey nodes. 191 153 */ 192 154 #define xbc_node_for_each_child(parent, child) \ 193 155 for (child = xbc_node_get_child(parent); child != NULL ; \ 156 + child = xbc_node_get_next(child)) 157 + 158 + /** 159 + * xbc_node_for_each_subkey() - Iterate child subkey nodes 160 + * @parent: An XBC node. 161 + * @child: Iterated XBC node. 162 + * 163 + * Iterate subkey nodes of @parent. Each child nodes are stored to @child. 164 + * The @child is only the subkey node. 165 + */ 166 + #define xbc_node_for_each_subkey(parent, child) \ 167 + for (child = xbc_node_get_subkey(parent); child != NULL ; \ 194 168 child = xbc_node_get_next(child)) 195 169 196 170 /** ··· 223 171 */ 224 172 #define xbc_node_for_each_array_value(node, key, anode, value) \ 225 173 for (value = xbc_node_find_value(node, key, &anode); value != NULL; \ 226 - anode = xbc_node_get_next(anode), \ 174 + anode = xbc_node_get_child(anode), \ 227 175 value = anode ? xbc_node_get_data(anode) : NULL) 228 176 229 177 /**
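The bootconfig hunks above change how array values are linked: an array is now a value node whose `child` field points at the next element (rather than `next`), which frees `next` for sibling subkeys under the same leaf. A minimal userspace sketch of that linkage; the `node` struct, the fixed pool, and `build_sample()` are illustrative stand-ins, not the kernel's `struct xbc_node` allocator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct xbc_node: links are 1-based
 * indices into a pool, 0 means "no node", as with the u16 links. */
struct node {
	uint16_t next;   /* sibling node */
	uint16_t child;  /* for a value node: next array element */
	const char *data;
	int is_value;
};

static struct node pool[8];

static struct node *get(uint16_t idx) { return idx ? &pool[idx] : NULL; }

/* After this change, an array is a value node with a child set,
 * mirroring the updated xbc_node_is_array(). */
static int node_is_array(const struct node *n)
{
	return n->is_value && n->child != 0;
}

/* Walk an array by following child links, as the updated
 * xbc_array_for_each_value() iteration does. */
static int count_array_values(struct node *anode)
{
	int count = 0;

	while (anode) {
		count++;
		anode = get(anode->child);
	}
	return count;
}

/* Build the three-element array "a, b, c" for demonstration. */
static struct node *build_sample(void)
{
	pool[1] = (struct node){ .child = 2, .data = "a", .is_value = 1 };
	pool[2] = (struct node){ .child = 3, .data = "b", .is_value = 1 };
	pool[3] = (struct node){ .child = 0, .data = "c", .is_value = 1 };
	return &pool[1];
}
```

With this layout the `fs/proc/bootconfig.c` fix above follows naturally: whether to print `", "` or `"\n"` after a value is decided by `xbc_node_is_array()` instead of peeking at `next`.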
+13
include/linux/ftrace_irq.h
··· 7 7 extern void trace_hwlat_callback(bool enter); 8 8 #endif 9 9 10 + #ifdef CONFIG_OSNOISE_TRACER 11 + extern bool trace_osnoise_callback_enabled; 12 + extern void trace_osnoise_callback(bool enter); 13 + #endif 14 + 10 15 static inline void ftrace_nmi_enter(void) 11 16 { 12 17 #ifdef CONFIG_HWLAT_TRACER 13 18 if (trace_hwlat_callback_enabled) 14 19 trace_hwlat_callback(true); 20 + #endif 21 + #ifdef CONFIG_OSNOISE_TRACER 22 + if (trace_osnoise_callback_enabled) 23 + trace_osnoise_callback(true); 15 24 #endif 16 25 } 17 26 ··· 29 20 #ifdef CONFIG_HWLAT_TRACER 30 21 if (trace_hwlat_callback_enabled) 31 22 trace_hwlat_callback(false); 23 + #endif 24 + #ifdef CONFIG_OSNOISE_TRACER 25 + if (trace_osnoise_callback_enabled) 26 + trace_osnoise_callback(false); 32 27 #endif 33 28 } 34 29
+7
include/linux/trace.h
··· 41 41 void trace_array_put(struct trace_array *tr); 42 42 struct trace_array *trace_array_get_by_name(const char *name); 43 43 int trace_array_destroy(struct trace_array *tr); 44 + 45 + /* For osnoise tracer */ 46 + int osnoise_arch_register(void); 47 + void osnoise_arch_unregister(void); 48 + void osnoise_trace_irq_entry(int id); 49 + void osnoise_trace_irq_exit(int id, const char *desc); 50 + 44 51 #endif /* CONFIG_TRACING */ 45 52 46 53 #endif /* _LINUX_TRACE_H */
+10
include/linux/tracepoint.h
··· 41 41 tracepoint_probe_register_prio(struct tracepoint *tp, void *probe, void *data, 42 42 int prio); 43 43 extern int 44 + tracepoint_probe_register_prio_may_exist(struct tracepoint *tp, void *probe, void *data, 45 + int prio); 46 + extern int 44 47 tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data); 48 + static inline int 49 + tracepoint_probe_register_may_exist(struct tracepoint *tp, void *probe, 50 + void *data) 51 + { 52 + return tracepoint_probe_register_prio_may_exist(tp, probe, data, 53 + TRACEPOINT_DEFAULT_PRIO); 54 + } 45 55 extern void 46 56 for_each_kernel_tracepoint(void (*fct)(struct tracepoint *tp, void *priv), 47 57 void *priv);
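The new `tracepoint_probe_register_may_exist()` lets a caller (BPF, per the merge message) attempt registration without triggering a WARN*() when the probe is already attached; a duplicate is simply reported as `-EEXIST`. A userspace sketch of that contract; the array-backed registry and `warn_count` are illustrative, not the kernel's funcs-array implementation:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void (*probe_fn)(void *data);

#define MAX_PROBES 8
static probe_fn probes[MAX_PROBES];
static int warn_count;	/* stands in for the kernel's WARN_ON_ONCE() */

/* may_exist = 0: a duplicate registration is considered a bug and
 * warns; may_exist = 1: it is tolerated. Both report -EEXIST. */
static int probe_register(probe_fn fn, int may_exist)
{
	size_t i;

	for (i = 0; i < MAX_PROBES; i++)
		if (probes[i] == fn) {
			if (!may_exist)
				warn_count++;
			return -EEXIST;
		}
	for (i = 0; i < MAX_PROBES; i++)
		if (!probes[i]) {
			probes[i] = fn;
			return 0;
		}
	return -ENOMEM;
}

static void sample_probe(void *data) { (void)data; }
```

The distinction matters because user space can race two attachments of the same BPF program; only paths that cannot legitimately collide should keep the warning.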
+1 -1
include/trace/events/btrfs.h
··· 1092 1092 __entry->flags = flags; 1093 1093 __entry->bytes = bytes; 1094 1094 __entry->flush = flush; 1095 - __assign_str(reason, reason) 1095 + __assign_str(reason, reason); 1096 1096 ), 1097 1097 1098 1098 TP_printk_btrfs("%s: flush=%d(%s) flags=%llu(%s) bytes=%llu",
+2 -2
include/trace/events/dma_fence.h
··· 23 23 ), 24 24 25 25 TP_fast_assign( 26 - __assign_str(driver, fence->ops->get_driver_name(fence)) 27 - __assign_str(timeline, fence->ops->get_timeline_name(fence)) 26 + __assign_str(driver, fence->ops->get_driver_name(fence)); 27 + __assign_str(timeline, fence->ops->get_timeline_name(fence)); 28 28 __entry->context = fence->context; 29 29 __entry->seqno = fence->seqno; 30 30 ),
+142
include/trace/events/osnoise.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #undef TRACE_SYSTEM 3 + #define TRACE_SYSTEM osnoise 4 + 5 + #if !defined(_OSNOISE_TRACE_H) || defined(TRACE_HEADER_MULTI_READ) 6 + #define _OSNOISE_TRACE_H 7 + 8 + #include <linux/tracepoint.h> 9 + TRACE_EVENT(thread_noise, 10 + 11 + TP_PROTO(struct task_struct *t, u64 start, u64 duration), 12 + 13 + TP_ARGS(t, start, duration), 14 + 15 + TP_STRUCT__entry( 16 + __array( char, comm, TASK_COMM_LEN) 17 + __field( u64, start ) 18 + __field( u64, duration) 19 + __field( pid_t, pid ) 20 + ), 21 + 22 + TP_fast_assign( 23 + memcpy(__entry->comm, t->comm, TASK_COMM_LEN); 24 + __entry->pid = t->pid; 25 + __entry->start = start; 26 + __entry->duration = duration; 27 + ), 28 + 29 + TP_printk("%8s:%d start %llu.%09u duration %llu ns", 30 + __entry->comm, 31 + __entry->pid, 32 + __print_ns_to_secs(__entry->start), 33 + __print_ns_without_secs(__entry->start), 34 + __entry->duration) 35 + ); 36 + 37 + TRACE_EVENT(softirq_noise, 38 + 39 + TP_PROTO(int vector, u64 start, u64 duration), 40 + 41 + TP_ARGS(vector, start, duration), 42 + 43 + TP_STRUCT__entry( 44 + __field( u64, start ) 45 + __field( u64, duration) 46 + __field( int, vector ) 47 + ), 48 + 49 + TP_fast_assign( 50 + __entry->vector = vector; 51 + __entry->start = start; 52 + __entry->duration = duration; 53 + ), 54 + 55 + TP_printk("%8s:%d start %llu.%09u duration %llu ns", 56 + show_softirq_name(__entry->vector), 57 + __entry->vector, 58 + __print_ns_to_secs(__entry->start), 59 + __print_ns_without_secs(__entry->start), 60 + __entry->duration) 61 + ); 62 + 63 + TRACE_EVENT(irq_noise, 64 + 65 + TP_PROTO(int vector, const char *desc, u64 start, u64 duration), 66 + 67 + TP_ARGS(vector, desc, start, duration), 68 + 69 + TP_STRUCT__entry( 70 + __field( u64, start ) 71 + __field( u64, duration) 72 + __string( desc, desc ) 73 + __field( int, vector ) 74 + 75 + ), 76 + 77 + TP_fast_assign( 78 + __assign_str(desc, desc); 79 + __entry->vector = vector; 80 + __entry->start = 
start; 81 + __entry->duration = duration; 82 + ), 83 + 84 + TP_printk("%s:%d start %llu.%09u duration %llu ns", 85 + __get_str(desc), 86 + __entry->vector, 87 + __print_ns_to_secs(__entry->start), 88 + __print_ns_without_secs(__entry->start), 89 + __entry->duration) 90 + ); 91 + 92 + TRACE_EVENT(nmi_noise, 93 + 94 + TP_PROTO(u64 start, u64 duration), 95 + 96 + TP_ARGS(start, duration), 97 + 98 + TP_STRUCT__entry( 99 + __field( u64, start ) 100 + __field( u64, duration) 101 + ), 102 + 103 + TP_fast_assign( 104 + __entry->start = start; 105 + __entry->duration = duration; 106 + ), 107 + 108 + TP_printk("start %llu.%09u duration %llu ns", 109 + __print_ns_to_secs(__entry->start), 110 + __print_ns_without_secs(__entry->start), 111 + __entry->duration) 112 + ); 113 + 114 + TRACE_EVENT(sample_threshold, 115 + 116 + TP_PROTO(u64 start, u64 duration, u64 interference), 117 + 118 + TP_ARGS(start, duration, interference), 119 + 120 + TP_STRUCT__entry( 121 + __field( u64, start ) 122 + __field( u64, duration) 123 + __field( u64, interference) 124 + ), 125 + 126 + TP_fast_assign( 127 + __entry->start = start; 128 + __entry->duration = duration; 129 + __entry->interference = interference; 130 + ), 131 + 132 + TP_printk("start %llu.%09u duration %llu ns interference %llu", 133 + __print_ns_to_secs(__entry->start), 134 + __print_ns_without_secs(__entry->start), 135 + __entry->duration, 136 + __entry->interference) 137 + ); 138 + 139 + #endif /* _TRACE_OSNOISE_H */ 140 + 141 + /* This part must be outside protection */ 142 + #include <trace/define_trace.h>

+2 -2
include/trace/events/rpcgss.h
··· 152 152 TP_fast_assign( 153 153 __entry->cred = gc; 154 154 __entry->service = gc->gc_service; 155 - __assign_str(principal, gc->gc_principal) 155 + __assign_str(principal, gc->gc_principal); 156 156 ), 157 157 158 158 TP_printk("cred=%p service=%s principal='%s'", ··· 535 535 ), 536 536 537 537 TP_fast_assign( 538 - __assign_str(msg, buf) 538 + __assign_str(msg, buf); 539 539 ), 540 540 541 541 TP_printk("msg='%s'", __get_str(msg))
-2
include/trace/events/sched.h
··· 148 148 __array( char, comm, TASK_COMM_LEN ) 149 149 __field( pid_t, pid ) 150 150 __field( int, prio ) 151 - __field( int, success ) 152 151 __field( int, target_cpu ) 153 152 ), 154 153 ··· 155 156 memcpy(__entry->comm, p->comm, TASK_COMM_LEN); 156 157 __entry->pid = p->pid; 157 158 __entry->prio = p->prio; /* XXX SCHED_DEADLINE */ 158 - __entry->success = 1; /* rudiment, kill when possible */ 159 159 __entry->target_cpu = task_cpu(p); 160 160 ), 161 161
+20 -20
include/trace/events/sunrpc.h
··· 154 154 __entry->client_id = clnt->cl_clid; 155 155 __assign_str(addr, xprt->address_strings[RPC_DISPLAY_ADDR]); 156 156 __assign_str(port, xprt->address_strings[RPC_DISPLAY_PORT]); 157 - __assign_str(program, program) 158 - __assign_str(server, server) 157 + __assign_str(program, program); 158 + __assign_str(server, server); 159 159 ), 160 160 161 161 TP_printk("client=%u peer=[%s]:%s program=%s server=%s", ··· 180 180 181 181 TP_fast_assign( 182 182 __entry->error = error; 183 - __assign_str(program, program) 184 - __assign_str(server, server) 183 + __assign_str(program, program); 184 + __assign_str(server, server); 185 185 ), 186 186 187 187 TP_printk("program=%s server=%s error=%d", ··· 284 284 __entry->client_id = task->tk_client->cl_clid; 285 285 __entry->version = task->tk_client->cl_vers; 286 286 __entry->async = RPC_IS_ASYNC(task); 287 - __assign_str(progname, task->tk_client->cl_program->name) 288 - __assign_str(procname, rpc_proc_name(task)) 287 + __assign_str(progname, task->tk_client->cl_program->name); 288 + __assign_str(procname, rpc_proc_name(task)); 289 289 ), 290 290 291 291 TP_printk("task:%u@%u %sv%d %s (%ssync)", ··· 494 494 __entry->task_id = task->tk_pid; 495 495 __entry->client_id = task->tk_client->cl_clid; 496 496 __entry->xid = be32_to_cpu(task->tk_rqstp->rq_xid); 497 - __assign_str(progname, task->tk_client->cl_program->name) 497 + __assign_str(progname, task->tk_client->cl_program->name); 498 498 __entry->version = task->tk_client->cl_vers; 499 - __assign_str(procname, rpc_proc_name(task)) 500 - __assign_str(servername, task->tk_xprt->servername) 499 + __assign_str(procname, rpc_proc_name(task)); 500 + __assign_str(servername, task->tk_xprt->servername); 501 501 ), 502 502 503 503 TP_printk("task:%u@%d server=%s xid=0x%08x %sv%d %s", ··· 622 622 __entry->task_id = task->tk_pid; 623 623 __entry->xid = be32_to_cpu(task->tk_rqstp->rq_xid); 624 624 __entry->version = task->tk_client->cl_vers; 625 - __assign_str(progname, 
task->tk_client->cl_program->name) 626 - __assign_str(procname, rpc_proc_name(task)) 625 + __assign_str(progname, task->tk_client->cl_program->name); 626 + __assign_str(procname, rpc_proc_name(task)); 627 627 __entry->backlog = ktime_to_us(backlog); 628 628 __entry->rtt = ktime_to_us(rtt); 629 629 __entry->execute = ktime_to_us(execute); ··· 669 669 __entry->task_id = task->tk_pid; 670 670 __entry->client_id = task->tk_client->cl_clid; 671 671 __assign_str(progname, 672 - task->tk_client->cl_program->name) 672 + task->tk_client->cl_program->name); 673 673 __entry->version = task->tk_client->cl_vers; 674 - __assign_str(procedure, task->tk_msg.rpc_proc->p_name) 674 + __assign_str(procedure, task->tk_msg.rpc_proc->p_name); 675 675 } else { 676 676 __entry->task_id = 0; 677 677 __entry->client_id = 0; 678 - __assign_str(progname, "unknown") 678 + __assign_str(progname, "unknown"); 679 679 __entry->version = 0; 680 - __assign_str(procedure, "unknown") 680 + __assign_str(procedure, "unknown"); 681 681 } 682 682 __entry->requested = requested; 683 683 __entry->end = xdr->end; ··· 735 735 __entry->task_id = task->tk_pid; 736 736 __entry->client_id = task->tk_client->cl_clid; 737 737 __assign_str(progname, 738 - task->tk_client->cl_program->name) 738 + task->tk_client->cl_program->name); 739 739 __entry->version = task->tk_client->cl_vers; 740 - __assign_str(procedure, task->tk_msg.rpc_proc->p_name) 740 + __assign_str(procedure, task->tk_msg.rpc_proc->p_name); 741 741 742 742 __entry->offset = offset; 743 743 __entry->copied = copied; ··· 1107 1107 __entry->xid = be32_to_cpu(rqst->rq_xid); 1108 1108 __entry->ntrans = rqst->rq_ntrans; 1109 1109 __assign_str(progname, 1110 - task->tk_client->cl_program->name) 1110 + task->tk_client->cl_program->name); 1111 1111 __entry->version = task->tk_client->cl_vers; 1112 - __assign_str(procedure, task->tk_msg.rpc_proc->p_name) 1112 + __assign_str(procedure, task->tk_msg.rpc_proc->p_name); 1113 1113 ), 1114 1114 1115 1115 TP_printk( ··· 
1842 1842 1843 1843 TP_fast_assign( 1844 1844 __assign_str(addr, xprt->xpt_remotebuf); 1845 - __assign_str(protocol, xprt->xpt_class->xcl_name) 1845 + __assign_str(protocol, xprt->xpt_class->xcl_name); 1846 1846 __assign_str(service, service); 1847 1847 ), 1848 1848
+2 -1
include/trace/events/writeback.h
··· 36 36 EM( WB_REASON_PERIODIC, "periodic") \ 37 37 EM( WB_REASON_LAPTOP_TIMER, "laptop_timer") \ 38 38 EM( WB_REASON_FS_FREE_SPACE, "fs_free_space") \ 39 - EMe(WB_REASON_FORKER_THREAD, "forker_thread") 39 + EM( WB_REASON_FORKER_THREAD, "forker_thread") \ 40 + EMe(WB_REASON_FOREIGN_FLUSH, "foreign_flush") 40 41 41 42 WB_WORK_REASON 42 43
+25
include/trace/trace_events.h
··· 358 358 trace_print_hex_dump_seq(p, prefix_str, prefix_type, \ 359 359 rowsize, groupsize, buf, len, ascii) 360 360 361 + #undef __print_ns_to_secs 362 + #define __print_ns_to_secs(value) \ 363 + ({ \ 364 + u64 ____val = (u64)(value); \ 365 + do_div(____val, NSEC_PER_SEC); \ 366 + ____val; \ 367 + }) 368 + 369 + #undef __print_ns_without_secs 370 + #define __print_ns_without_secs(value) \ 371 + ({ \ 372 + u64 ____val = (u64)(value); \ 373 + (u32) do_div(____val, NSEC_PER_SEC); \ 374 + }) 375 + 361 376 #undef DECLARE_EVENT_CLASS 362 377 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \ 363 378 static notrace enum print_line_t \ ··· 750 735 #undef __get_bitmask 751 736 #undef __print_array 752 737 #undef __print_hex_dump 738 + 739 + /* 740 + * The below is not executed in the kernel. It is only what is 741 + * displayed in the print format for userspace to parse. 742 + */ 743 + #undef __print_ns_to_secs 744 + #define __print_ns_to_secs(val) (val) / 1000000000UL 745 + 746 + #undef __print_ns_without_secs 747 + #define __print_ns_without_secs(val) (val) % 1000000000UL 753 748 754 749 #undef TP_printk 755 750 #define TP_printk(fmt, args...) "\"" fmt "\", " __stringify(args)
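The two new helpers above feed the `%llu.%09u` timestamp format used by the osnoise events: `__print_ns_to_secs()` yields the whole seconds and `__print_ns_without_secs()` the sub-second remainder (`do_div()` divides in place and returns the remainder). The same arithmetic in plain userspace C:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Whole seconds, as __print_ns_to_secs() evaluates to. */
static uint64_t ns_to_secs(uint64_t ns)
{
	return ns / NSEC_PER_SEC;
}

/* Sub-second remainder, the do_div() return value used by
 * __print_ns_without_secs(). */
static uint32_t ns_without_secs(uint64_t ns)
{
	return (uint32_t)(ns % NSEC_PER_SEC);
}
```

So a `start` of 1500000000 ns prints as `1.500000000`, matching the `%llu.%09u` pair in the osnoise TP_printk() strings.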
+1 -11
init/main.c
··· 386 386 return new_cmdline; 387 387 } 388 388 389 - static u32 boot_config_checksum(unsigned char *p, u32 size) 390 - { 391 - u32 ret = 0; 392 - 393 - while (size--) 394 - ret += *p++; 395 - 396 - return ret; 397 - } 398 - 399 389 static int __init bootconfig_params(char *param, char *val, 400 390 const char *unused, void *arg) 401 391 { ··· 429 439 return; 430 440 } 431 441 432 - if (boot_config_checksum((unsigned char *)data, size) != csum) { 442 + if (xbc_calc_checksum(data, size) != csum) { 433 443 pr_err("bootconfig checksum failed\n"); 434 444 return; 435 445 }
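With this hunk, `init/main.c` drops its private `boot_config_checksum()` in favor of the shared `xbc_calc_checksum()` added to `include/linux/bootconfig.h` above. The helper is a plain byte-wise sum, easy to reproduce in userspace:

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise sum over the buffer, as xbc_calc_checksum() computes
 * it for the bootconfig blob appended to the initrd image. */
static uint32_t calc_checksum(const void *data, uint32_t size)
{
	const unsigned char *p = data;
	uint32_t ret = 0;

	while (size--)
		ret += *p++;

	return ret;
}
```

Sharing one definition means the tool that embeds the bootconfig and the kernel code that verifies it at boot can no longer drift apart.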
+62
kernel/trace/Kconfig
··· 356 356 file. Every time a latency is greater than tracing_thresh, it will 357 357 be recorded into the ring buffer. 358 358 359 + config OSNOISE_TRACER 360 + bool "OS Noise tracer" 361 + select GENERIC_TRACER 362 + help 363 + In the context of high-performance computing (HPC), the Operating 364 + System Noise (osnoise) refers to the interference experienced by an 365 + application due to activities inside the operating system. In the 366 + context of Linux, NMIs, IRQs, SoftIRQs, and any other system thread 367 + can cause noise to the system. Moreover, hardware-related jobs can 368 + also cause noise, for example, via SMIs. 369 + 370 + The osnoise tracer leverages the hwlat_detector by running a similar 371 + loop with preemption, SoftIRQs and IRQs enabled, thus allowing all 372 + the sources of osnoise during its execution. The osnoise tracer takes 373 + note of the entry and exit point of any source of interferences, 374 + increasing a per-cpu interference counter. It saves an interference 375 + counter for each source of interference. The interference counter for 376 + NMI, IRQs, SoftIRQs, and threads is increased anytime the tool 377 + observes these interferences' entry events. When a noise happens 378 + without any interference from the operating system level, the 379 + hardware noise counter increases, pointing to a hardware-related 380 + noise. In this way, osnoise can account for any source of 381 + interference. At the end of the period, the osnoise tracer prints 382 + the sum of all noise, the max single noise, the percentage of CPU 383 + available for the thread, and the counters for the noise sources. 384 + 385 + In addition to the tracer, a set of tracepoints were added to 386 + facilitate the identification of the osnoise source. 387 + 388 + The output will appear in the trace and trace_pipe files. 389 + 390 + To enable this tracer, echo in "osnoise" into the current_tracer 391 + file. 
392 + 393 + config TIMERLAT_TRACER 394 + bool "Timerlat tracer" 395 + select OSNOISE_TRACER 396 + select GENERIC_TRACER 397 + help 398 + The timerlat tracer aims to help the preemptive kernel developers 399 + to find sources of wakeup latencies of real-time threads. 400 + 401 + The tracer creates a per-cpu kernel thread with real-time priority. 402 + The tracer thread sets a periodic timer to wakeup itself, and goes 403 + to sleep waiting for the timer to fire. At the wakeup, the thread 404 + then computes a wakeup latency value as the difference between 405 + the current time and the absolute time that the timer was set 406 + to expire. 407 + 408 + The tracer prints two lines at every activation. The first is the 409 + timer latency observed at the hardirq context before the 410 + activation of the thread. The second is the timer latency observed 411 + by the thread, which is the same level that cyclictest reports. The 412 + ACTIVATION ID field serves to relate the irq execution to its 413 + respective thread execution. 414 + 415 + The tracer is built on top of the osnoise tracer, and the osnoise: 416 + events can be used to trace the source of interference from NMI, 417 + IRQs and other threads. It also enables the capture of the 418 + stacktrace at the IRQ context, which helps to identify the code 419 + path that can cause thread delay. 420 + 359 421 config MMIOTRACE 360 422 bool "Memory mapped IO tracing" 361 423 depends on HAVE_MMIOTRACE_SUPPORT && PCI
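The osnoise help text above says the tracer reports, at the end of each period, the summed noise and the percentage of CPU left for the measuring thread. A hedged sketch of that last piece of bookkeeping; the function name and the fixed-point hundredths representation are illustrative, not the tracer's actual struct or output format:

```c
#include <assert.h>
#include <stdint.h>

/* Given the sampling runtime and the total observed noise (both in
 * nanoseconds), return the CPU time available to the workload as a
 * percentage scaled by 100 (e.g. 9990 == 99.90%). */
static uint64_t cpu_available_pct100(uint64_t runtime_ns, uint64_t noise_ns)
{
	if (noise_ns >= runtime_ns)
		return 0;
	return (runtime_ns - noise_ns) * 10000ULL / runtime_ns;
}
```

For example, 1 ms of accumulated noise in a 1 s sampling window leaves 99.90% of the CPU for the thread; the per-source counters described above then tell you whether that noise came from NMIs, IRQs, softirqs, other threads, or (by elimination) hardware.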
+1
kernel/trace/Makefile
··· 58 58 obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o 59 59 obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o 60 60 obj-$(CONFIG_HWLAT_TRACER) += trace_hwlat.o 61 + obj-$(CONFIG_OSNOISE_TRACER) += trace_osnoise.o 61 62 obj-$(CONFIG_NOP_TRACER) += trace_nop.o 62 63 obj-$(CONFIG_STACK_TRACER) += trace_stack.o 63 64 obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
+2 -1
kernel/trace/bpf_trace.c
··· 1842 1842 if (prog->aux->max_tp_access > btp->writable_size) 1843 1843 return -EINVAL; 1844 1844 1845 - return tracepoint_probe_register(tp, (void *)btp->bpf_func, prog); 1845 + return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func, 1846 + prog); 1846 1847 } 1847 1848 1848 1849 int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
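The point of tracepoint_probe_register_may_exist() is that a duplicate registration returns an error quietly instead of tripping a WARN*(), since BPF has a legitimate path from user space that can attempt it. A toy userspace model of that contract (all names here are hypothetical, not the kernel API):

```c
#include <assert.h>
#include <errno.h>

#define MAX_PROBES 8

/* Toy probe registry modelling the two registration flavors:
 * register_probe() treats a duplicate as a bug (models WARN_ON_ONCE),
 * register_probe_may_exist() returns -EEXIST quietly. */
static void *probes[MAX_PROBES];
static int nr_probes;
static int warned;		/* stands in for the WARN*() splat */
static int dummy_probe;		/* target used by the self-test */

static int probe_exists(void *fn)
{
	for (int i = 0; i < nr_probes; i++)
		if (probes[i] == fn)
			return 1;
	return 0;
}

static int register_probe_may_exist(void *fn)
{
	if (probe_exists(fn))
		return -EEXIST;	/* silent: this path can race legally */
	if (nr_probes >= MAX_PROBES)
		return -ENOSPC;
	probes[nr_probes++] = fn;
	return 0;
}

static int register_probe(void *fn)
{
	if (probe_exists(fn))
		warned = 1;	/* a duplicate via this path is a bug */
	return register_probe_may_exist(fn);
}
```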
+1 -1
kernel/trace/ring_buffer.c
··· 3391 3391 case RINGBUF_TYPE_PADDING: 3392 3392 if (event->time_delta == 1) 3393 3393 break; 3394 - /* fall through */ 3394 + fallthrough; 3395 3395 case RINGBUF_TYPE_DATA: 3396 3396 ts += event->time_delta; 3397 3397 break;
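The hunk above swaps a `/* fall through */` comment for the kernel's `fallthrough` pseudo-keyword. A standalone sketch of the same switch shape, with a local stand-in for the macro since this is not kernel code:

```c
#include <assert.h>

/* The kernel defines "fallthrough" via this attribute where the
 * compiler supports it; define a local stand-in for this sketch. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

enum rb_type { TYPE_PADDING, TYPE_DATA };

/* Mirrors the ring-buffer logic: a padding event with a real
 * time_delta deliberately falls through and advances the timestamp
 * just like a data event. */
static unsigned long advance_ts(unsigned long ts, enum rb_type type,
				unsigned int time_delta)
{
	switch (type) {
	case TYPE_PADDING:
		if (time_delta == 1)
			break;		/* discarded event, delta unused */
		fallthrough;
	case TYPE_DATA:
		ts += time_delta;
		break;
	}
	return ts;
}
```

Unlike the comment form, the attribute lets -Wimplicit-fallthrough distinguish intentional fall-through from a missing break.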
+211 -54
kernel/trace/trace.c
··· 87 87 /* Pipe tracepoints to printk */ 88 88 struct trace_iterator *tracepoint_print_iter; 89 89 int tracepoint_printk; 90 + static bool tracepoint_printk_stop_on_boot __initdata; 90 91 static DEFINE_STATIC_KEY_FALSE(tracepoint_printk_key); 91 92 92 93 /* For tracers that don't implement custom flags */ ··· 198 197 199 198 static int __init set_ftrace_dump_on_oops(char *str) 200 199 { 201 - if (*str++ != '=' || !*str) { 200 + if (*str++ != '=' || !*str || !strcmp("1", str)) { 202 201 ftrace_dump_on_oops = DUMP_ALL; 203 202 return 1; 204 203 } 205 204 206 - if (!strcmp("orig_cpu", str)) { 205 + if (!strcmp("orig_cpu", str) || !strcmp("2", str)) { 207 206 ftrace_dump_on_oops = DUMP_ORIG; 208 207 return 1; 209 208 } ··· 257 256 return 1; 258 257 } 259 258 __setup("tp_printk", set_tracepoint_printk); 259 + 260 + static int __init set_tracepoint_printk_stop(char *str) 261 + { 262 + tracepoint_printk_stop_on_boot = true; 263 + return 1; 264 + } 265 + __setup("tp_printk_stop_on_boot", set_tracepoint_printk_stop); 260 266 261 267 unsigned long long ns2usecs(u64 nsec) 262 268 { ··· 1691 1683 unsigned long __read_mostly tracing_thresh; 1692 1684 static const struct file_operations tracing_max_lat_fops; 1693 1685 1694 - #if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ 1695 - defined(CONFIG_FSNOTIFY) 1686 + #ifdef LATENCY_FS_NOTIFY 1696 1687 1697 1688 static struct workqueue_struct *fsnotify_wq; 1698 1689 ··· 2192 2185 } 2193 2186 } 2194 2187 2188 + /* 2189 + * The tgid_map array maps from pid to tgid; i.e. the value stored at index i 2190 + * is the tgid last observed corresponding to pid=i. 2191 + */ 2195 2192 static int *tgid_map; 2193 + 2194 + /* The maximum valid index into tgid_map. 
*/ 2195 + static size_t tgid_map_max; 2196 2196 2197 2197 #define SAVED_CMDLINES_DEFAULT 128 2198 2198 #define NO_CMDLINE_MAP UINT_MAX ··· 2473 2459 preempt_enable(); 2474 2460 } 2475 2461 2462 + static int *trace_find_tgid_ptr(int pid) 2463 + { 2464 + /* 2465 + * Pairs with the smp_store_release in set_tracer_flag() to ensure that 2466 + * if we observe a non-NULL tgid_map then we also observe the correct 2467 + * tgid_map_max. 2468 + */ 2469 + int *map = smp_load_acquire(&tgid_map); 2470 + 2471 + if (unlikely(!map || pid > tgid_map_max)) 2472 + return NULL; 2473 + 2474 + return &map[pid]; 2475 + } 2476 + 2476 2477 int trace_find_tgid(int pid) 2477 2478 { 2478 - if (unlikely(!tgid_map || !pid || pid > PID_MAX_DEFAULT)) 2479 - return 0; 2479 + int *ptr = trace_find_tgid_ptr(pid); 2480 2480 2481 - return tgid_map[pid]; 2481 + return ptr ? *ptr : 0; 2482 2482 } 2483 2483 2484 2484 static int trace_save_tgid(struct task_struct *tsk) 2485 2485 { 2486 + int *ptr; 2487 + 2486 2488 /* treat recording of idle task as a success */ 2487 2489 if (!tsk->pid) 2488 2490 return 1; 2489 2491 2490 - if (unlikely(!tgid_map || tsk->pid > PID_MAX_DEFAULT)) 2492 + ptr = trace_find_tgid_ptr(tsk->pid); 2493 + if (!ptr) 2491 2494 return 0; 2492 2495 2493 - tgid_map[tsk->pid] = tsk->tgid; 2496 + *ptr = tsk->tgid; 2494 2497 return 1; 2495 2498 } 2496 2499 ··· 2761 2730 if (!tr->no_filter_buffering_ref && 2762 2731 (trace_file->flags & (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) && 2763 2732 (entry = this_cpu_read(trace_buffered_event))) { 2764 - /* Try to use the per cpu buffer first */ 2733 + /* 2734 + * Filtering is on, so try to use the per cpu buffer first. 2735 + * This buffer will simulate a ring_buffer_event, 2736 + * where the type_len is zero and the array[0] will 2737 + * hold the full length. 2738 + * (see include/linux/ring-buffer.h for details on 2739 + * how the ring_buffer_event is structured). 
2740 + * 2741 + * Using a temp buffer during filtering and copying it 2742 + * on a matched filter is quicker than writing directly 2743 + * into the ring buffer and then discarding it when 2744 + * it doesn't match. That is because the discard 2745 + * requires several atomic operations to get right. 2746 + * Copying on match and doing nothing on a failed match 2747 + * is still quicker than no copy on match, but having 2748 + * to discard out of the ring buffer on a failed match. 2749 + */ 2750 + int max_len = PAGE_SIZE - struct_size(entry, array, 1); 2751 + 2765 2752 val = this_cpu_inc_return(trace_buffered_event_cnt); 2766 - if ((len < (PAGE_SIZE - sizeof(*entry) - sizeof(entry->array[0]))) && val == 1) { 2753 + 2754 + /* 2755 + * Preemption is disabled, but interrupts and NMIs 2756 + * can still come in now. If that happens after 2757 + * the above increment, then it will have to go 2758 + * back to the old method of allocating the event 2759 + * on the ring buffer, and if the filter fails, it 2760 + * will have to call ring_buffer_discard_commit() 2761 + * to remove it. 2762 + * 2763 + * Need to also check the unlikely case that the 2764 + * length is bigger than the temp buffer size. 2765 + * If that happens, then the reserve is pretty much 2766 + * guaranteed to fail, as the ring buffer currently 2767 + * only allows events less than a page. But that may 2768 + * change in the future, so let the ring buffer reserve 2769 + * handle the failure in that case. 
2770 + */ 2771 + if (val == 1 && likely(len <= max_len)) { 2767 2772 trace_event_setup(entry, type, trace_ctx); 2768 2773 entry->array[0] = len; 2769 2774 return entry; ··· 5239 5172 5240 5173 int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled) 5241 5174 { 5175 + int *map; 5176 + 5242 5177 if ((mask == TRACE_ITER_RECORD_TGID) || 5243 5178 (mask == TRACE_ITER_RECORD_CMD)) 5244 5179 lockdep_assert_held(&event_mutex); ··· 5263 5194 trace_event_enable_cmd_record(enabled); 5264 5195 5265 5196 if (mask == TRACE_ITER_RECORD_TGID) { 5266 - if (!tgid_map) 5267 - tgid_map = kvcalloc(PID_MAX_DEFAULT + 1, 5268 - sizeof(*tgid_map), 5269 - GFP_KERNEL); 5197 + if (!tgid_map) { 5198 + tgid_map_max = pid_max; 5199 + map = kvcalloc(tgid_map_max + 1, sizeof(*tgid_map), 5200 + GFP_KERNEL); 5201 + 5202 + /* 5203 + * Pairs with smp_load_acquire() in 5204 + * trace_find_tgid_ptr() to ensure that if it observes 5205 + * the tgid_map we just allocated then it also observes 5206 + * the corresponding tgid_map_max value. 
5207 + */ 5208 + smp_store_release(&tgid_map, map); 5209 + } 5270 5210 if (!tgid_map) { 5271 5211 tr->trace_flags &= ~TRACE_ITER_RECORD_TGID; 5272 5212 return -ENOMEM; ··· 5687 5609 5688 5610 static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos) 5689 5611 { 5690 - int *ptr = v; 5612 + int pid = ++(*pos); 5691 5613 5692 - if (*pos || m->count) 5693 - ptr++; 5694 - 5695 - (*pos)++; 5696 - 5697 - for (; ptr <= &tgid_map[PID_MAX_DEFAULT]; ptr++) { 5698 - if (trace_find_tgid(*ptr)) 5699 - return ptr; 5700 - } 5701 - 5702 - return NULL; 5614 + return trace_find_tgid_ptr(pid); 5703 5615 } 5704 5616 5705 5617 static void *saved_tgids_start(struct seq_file *m, loff_t *pos) 5706 5618 { 5707 - void *v; 5708 - loff_t l = 0; 5619 + int pid = *pos; 5709 5620 5710 - if (!tgid_map) 5711 - return NULL; 5712 - 5713 - v = &tgid_map[0]; 5714 - while (l <= *pos) { 5715 - v = saved_tgids_next(m, v, &l); 5716 - if (!v) 5717 - return NULL; 5718 - } 5719 - 5720 - return v; 5621 + return trace_find_tgid_ptr(pid); 5721 5622 } 5722 5623 5723 5624 static void saved_tgids_stop(struct seq_file *m, void *v) ··· 5705 5648 5706 5649 static int saved_tgids_show(struct seq_file *m, void *v) 5707 5650 { 5708 - int pid = (int *)v - tgid_map; 5651 + int *entry = (int *)v; 5652 + int pid = entry - tgid_map; 5653 + int tgid = *entry; 5709 5654 5710 - seq_printf(m, "%d %d\n", pid, trace_find_tgid(pid)); 5655 + if (tgid == 0) 5656 + return SEQ_SKIP; 5657 + 5658 + seq_printf(m, "%d %d\n", pid, tgid); 5711 5659 return 0; 5712 5660 } 5713 5661 ··· 6197 6135 ssize_t tracing_resize_ring_buffer(struct trace_array *tr, 6198 6136 unsigned long size, int cpu_id) 6199 6137 { 6200 - int ret = size; 6138 + int ret; 6201 6139 6202 6140 mutex_lock(&trace_types_lock); 6203 6141 ··· 7590 7528 }; 7591 7529 7592 7530 #endif /* CONFIG_TRACER_SNAPSHOT */ 7531 + 7532 + /* 7533 + * trace_min_max_write - Write a u64 value to a trace_min_max_param struct 7534 + * @filp: The active open file structure 7535 + * 
@ubuf: The userspace provided buffer to read value into 7536 + * @cnt: The maximum number of bytes to read 7537 + * @ppos: The current "file" position 7538 + * 7539 + * This function implements the write interface for a struct trace_min_max_param. 7540 + * The filp->private_data must point to a trace_min_max_param structure that 7541 + * defines where to write the value, the min and the max acceptable values, 7542 + * and a lock to protect the write. 7543 + */ 7544 + static ssize_t 7545 + trace_min_max_write(struct file *filp, const char __user *ubuf, size_t cnt, loff_t *ppos) 7546 + { 7547 + struct trace_min_max_param *param = filp->private_data; 7548 + u64 val; 7549 + int err; 7550 + 7551 + if (!param) 7552 + return -EFAULT; 7553 + 7554 + err = kstrtoull_from_user(ubuf, cnt, 10, &val); 7555 + if (err) 7556 + return err; 7557 + 7558 + if (param->lock) 7559 + mutex_lock(param->lock); 7560 + 7561 + if (param->min && val < *param->min) 7562 + err = -EINVAL; 7563 + 7564 + if (param->max && val > *param->max) 7565 + err = -EINVAL; 7566 + 7567 + if (!err) 7568 + *param->val = val; 7569 + 7570 + if (param->lock) 7571 + mutex_unlock(param->lock); 7572 + 7573 + if (err) 7574 + return err; 7575 + 7576 + return cnt; 7577 + } 7578 + 7579 + /* 7580 + * trace_min_max_read - Read a u64 value from a trace_min_max_param struct 7581 + * @filp: The active open file structure 7582 + * @ubuf: The userspace provided buffer to read value into 7583 + * @cnt: The maximum number of bytes to read 7584 + * @ppos: The current "file" position 7585 + * 7586 + * This function implements the read interface for a struct trace_min_max_param. 7587 + * The filp->private_data must point to a trace_min_max_param struct with valid 7588 + * data. 
7589 + */ 7590 + static ssize_t 7591 + trace_min_max_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos) 7592 + { 7593 + struct trace_min_max_param *param = filp->private_data; 7594 + char buf[U64_STR_SIZE]; 7595 + int len; 7596 + u64 val; 7597 + 7598 + if (!param) 7599 + return -EFAULT; 7600 + 7601 + val = *param->val; 7602 + 7603 + if (cnt > sizeof(buf)) 7604 + cnt = sizeof(buf); 7605 + 7606 + len = snprintf(buf, sizeof(buf), "%llu\n", val); 7607 + 7608 + return simple_read_from_buffer(ubuf, cnt, ppos, buf, len); 7609 + } 7610 + 7611 + const struct file_operations trace_min_max_fops = { 7612 + .open = tracing_open_generic, 7613 + .read = trace_min_max_read, 7614 + .write = trace_min_max_write, 7615 + }; 7593 7616 7594 7617 #define TRACING_LOG_ERRS_MAX 8 7595 7618 #define TRACING_LOG_LOC_MAX 128 ··· 9679 9532 return 0; 9680 9533 } 9681 9534 9535 + fs_initcall(tracer_init_tracefs); 9536 + 9682 9537 static int trace_panic_handler(struct notifier_block *this, 9683 9538 unsigned long event, void *unused) 9684 9539 { ··· 10101 9952 trace_event_init(); 10102 9953 } 10103 9954 10104 - __init static int clear_boot_tracer(void) 9955 + __init static void clear_boot_tracer(void) 10105 9956 { 10106 9957 /* 10107 9958 * The default tracer at boot buffer is an init section. ··· 10111 9962 * about to be freed. 
10112 9963 */ 10113 9964 if (!default_bootup_tracer) 10114 - return 0; 9965 + return; 10115 9966 10116 9967 printk(KERN_INFO "ftrace bootup tracer '%s' not registered.\n", 10117 9968 default_bootup_tracer); 10118 9969 default_bootup_tracer = NULL; 10119 - 10120 - return 0; 10121 9970 } 10122 9971 10123 - fs_initcall(tracer_init_tracefs); 10124 - late_initcall_sync(clear_boot_tracer); 10125 - 10126 9972 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK 10127 - __init static int tracing_set_default_clock(void) 9973 + __init static void tracing_set_default_clock(void) 10128 9974 { 10129 9975 /* sched_clock_stable() is determined in late_initcall */ 10130 9976 if (!trace_boot_clock && !sched_clock_stable()) { 10131 9977 if (security_locked_down(LOCKDOWN_TRACEFS)) { 10132 9978 pr_warn("Can not set tracing clock due to lockdown\n"); 10133 - return -EPERM; 9979 + return; 10134 9980 } 10135 9981 10136 9982 printk(KERN_WARNING ··· 10135 9991 "on the kernel command line\n"); 10136 9992 tracing_set_clock(&global_trace, "global"); 10137 9993 } 9994 + } 9995 + #else 9996 + static inline void tracing_set_default_clock(void) { } 9997 + #endif 10138 9998 9999 + __init static int late_trace_init(void) 10000 + { 10001 + if (tracepoint_printk && tracepoint_printk_stop_on_boot) { 10002 + static_key_disable(&tracepoint_printk_key.key); 10003 + tracepoint_printk = 0; 10004 + } 10005 + 10006 + tracing_set_default_clock(); 10007 + clear_boot_tracer(); 10139 10008 return 0; 10140 10009 } 10141 - late_initcall_sync(tracing_set_default_clock); 10142 - #endif 10010 + 10011 + late_initcall_sync(late_trace_init);
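Among the trace.c changes above, tgid_map is now sized from pid_max and published with smp_store_release(), paired with smp_load_acquire() in trace_find_tgid_ptr(), so a reader that observes the map pointer also observes the matching tgid_map_max. A userspace sketch of the same pairing in C11 atomics (illustrative names, not the kernel primitives; error handling omitted):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* A reader that sees a non-NULL tgid_map must also see the matching
 * tgid_map_max, so the pointer store is a release and the pointer
 * load an acquire. */
static size_t tgid_map_max;
static _Atomic(int *) tgid_map;

static void publish_tgid_map(size_t max)
{
	int *map = calloc(max + 1, sizeof(*map));

	tgid_map_max = max;	/* plain store, ordered by the release below */
	atomic_store_explicit(&tgid_map, map, memory_order_release);
}

static int *find_tgid_ptr(int pid)
{
	int *map = atomic_load_explicit(&tgid_map, memory_order_acquire);

	if (!map || pid < 0 || (size_t)pid > tgid_map_max)
		return NULL;
	return &map[pid];
}

/* Publish a map, record one pid->tgid pair, read it back. */
static int tgid_selftest(void)
{
	int *p;

	publish_tgid_map(32768);
	p = find_tgid_ptr(1234);
	if (!p)
		return -1;
	*p = 4321;
	return *find_tgid_ptr(1234);
}
```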
+29 -6
kernel/trace/trace.h
··· 45 45 TRACE_BLK, 46 46 TRACE_BPUTS, 47 47 TRACE_HWLAT, 48 + TRACE_OSNOISE, 49 + TRACE_TIMERLAT, 48 50 TRACE_RAW_DATA, 49 51 TRACE_FUNC_REPEATS, 50 52 ··· 292 290 struct array_buffer max_buffer; 293 291 bool allocated_snapshot; 294 292 #endif 295 - #if defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER) 293 + #if defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER) \ 294 + || defined(CONFIG_OSNOISE_TRACER) 296 295 unsigned long max_latency; 297 296 #ifdef CONFIG_FSNOTIFY 298 297 struct dentry *d_max_latency; ··· 441 438 IF_ASSIGN(var, ent, struct bprint_entry, TRACE_BPRINT); \ 442 439 IF_ASSIGN(var, ent, struct bputs_entry, TRACE_BPUTS); \ 443 440 IF_ASSIGN(var, ent, struct hwlat_entry, TRACE_HWLAT); \ 441 + IF_ASSIGN(var, ent, struct osnoise_entry, TRACE_OSNOISE);\ 442 + IF_ASSIGN(var, ent, struct timerlat_entry, TRACE_TIMERLAT);\ 444 443 IF_ASSIGN(var, ent, struct raw_data_entry, TRACE_RAW_DATA);\ 445 444 IF_ASSIGN(var, ent, struct trace_mmiotrace_rw, \ 446 445 TRACE_MMIO_RW); \ ··· 673 668 struct task_struct *tsk, int cpu); 674 669 #endif /* CONFIG_TRACER_MAX_TRACE */ 675 670 676 - #if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER)) && \ 677 - defined(CONFIG_FSNOTIFY) 671 + #if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER) \ 672 + || defined(CONFIG_OSNOISE_TRACER)) && defined(CONFIG_FSNOTIFY) 673 + #define LATENCY_FS_NOTIFY 674 + #endif 678 675 676 + #ifdef LATENCY_FS_NOTIFY 679 677 void latency_fsnotify(struct trace_array *tr); 680 - 681 678 #else 682 - 683 679 static inline void latency_fsnotify(struct trace_array *tr) { } 684 - 685 680 #endif 686 681 687 682 #ifdef CONFIG_STACKTRACE ··· 1949 1944 } 1950 1945 return true; 1951 1946 } 1947 + 1948 + /* 1949 + * This is a generic way to read and write a u64 value from a file in tracefs. 1950 + * 1951 + * The value is stored in the variable pointed to by *val. The value needs 1952 + * to be at least *min and at most *max. 
The write is protected by an 1953 + * existing *lock. 1954 + */ 1955 + struct trace_min_max_param { 1956 + struct mutex *lock; 1957 + u64 *val; 1958 + u64 *min; 1959 + u64 *max; 1960 + }; 1961 + 1962 + #define U64_STR_SIZE 24 /* 20 digits max */ 1963 + 1964 + extern const struct file_operations trace_min_max_fops; 1952 1965 1953 1966 #endif /* _LINUX_KERNEL_TRACE_H */
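The new struct trace_min_max_param gives tracers a generic bounded-u64 tracefs knob. Its write-side validation reduces to a range check, sketched here in userspace C (the struct mirrors the kernel one minus the mutex; the code itself is illustrative):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace mirror of struct trace_min_max_param, minus the mutex:
 * min and max are optional bounds (NULL means unbounded). */
struct min_max_param {
	uint64_t *val;
	uint64_t *min;
	uint64_t *max;
};

/* Core of the trace_min_max_write() validation: reject values
 * outside [*min, *max], otherwise store into *val. */
static int min_max_set(struct min_max_param *p, uint64_t v)
{
	if (p->min && v < *p->min)
		return -EINVAL;
	if (p->max && v > *p->max)
		return -EINVAL;
	*p->val = v;
	return 0;
}

/* Example knob bounded to [10, 100]. */
static uint64_t knob_val, knob_min = 10, knob_max = 100;
static struct min_max_param knob = { &knob_val, &knob_min, &knob_max };
```

A rejected write leaves the previously stored value untouched, which is what tracefs users expect from a failed echo.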
+25 -2
kernel/trace/trace_boot.c
··· 225 225 trace_boot_init_events(struct trace_array *tr, struct xbc_node *node) 226 226 { 227 227 struct xbc_node *gnode, *enode; 228 + bool enable, enable_all = false; 229 + const char *data; 228 230 229 231 node = xbc_node_find_child(node, "event"); 230 232 if (!node) 231 233 return; 232 234 /* per-event key starts with "event.GROUP.EVENT" */ 233 - xbc_node_for_each_child(node, gnode) 234 - xbc_node_for_each_child(gnode, enode) 235 + xbc_node_for_each_child(node, gnode) { 236 + data = xbc_node_get_data(gnode); 237 + if (!strcmp(data, "enable")) { 238 + enable_all = true; 239 + continue; 240 + } 241 + enable = false; 242 + xbc_node_for_each_child(gnode, enode) { 243 + data = xbc_node_get_data(enode); 244 + if (!strcmp(data, "enable")) { 245 + enable = true; 246 + continue; 247 + } 235 248 trace_boot_init_one_event(tr, gnode, enode); 249 + } 250 + /* Event enablement must be done after event settings */ 251 + if (enable) { 252 + data = xbc_node_get_data(gnode); 253 + trace_array_set_clr_event(tr, data, NULL, true); 254 + } 255 + } 256 + /* Ditto */ 257 + if (enable_all) 258 + trace_array_set_clr_event(tr, NULL, NULL, true); 236 259 } 237 260 #else 238 261 #define trace_boot_enable_events(tr, node) do {} while (0)
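trace_boot_init_events() now recognizes an "enable" node directly under "event" and under each group, and defers the actual enabling until after the per-event settings have been applied. The ordering rule can be sketched as follows (hypothetical helpers, not the xbc API):

```c
#include <assert.h>
#include <string.h>

/* Sketch of the ordering rule: per-event settings are applied as
 * they are parsed, while an "enable" key is only noted and acted on
 * after all settings of the group (or of all events, for the
 * top-level "event.enable") have been applied. */
struct boot_group {
	int enable;		/* deferred enable request */
	int settings_applied;	/* settings applied immediately */
};

static void process_group_child(struct boot_group *g, const char *node)
{
	if (!strcmp(node, "enable")) {
		g->enable = 1;	/* defer: enable after the settings */
		return;
	}
	g->settings_applied++;
}

/* Parse a group carrying one setting plus an enable key. */
static int boot_group_demo(void)
{
	struct boot_group g = {0};

	process_group_child(&g, "filter");
	process_group_child(&g, "enable");
	return g.settings_applied * 10 + g.enable;
}
```

Enabling last matters because turning an event on before its filter or trigger is configured could emit unwanted records during boot.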
+41
kernel/trace/trace_entries.h
··· 360 360 __entry->count, 361 361 FUNC_REPEATS_GET_DELTA_TS(__entry)) 362 362 ); 363 + 364 + FTRACE_ENTRY(osnoise, osnoise_entry, 365 + 366 + TRACE_OSNOISE, 367 + 368 + F_STRUCT( 369 + __field( u64, noise ) 370 + __field( u64, runtime ) 371 + __field( u64, max_sample ) 372 + __field( unsigned int, hw_count ) 373 + __field( unsigned int, nmi_count ) 374 + __field( unsigned int, irq_count ) 375 + __field( unsigned int, softirq_count ) 376 + __field( unsigned int, thread_count ) 377 + ), 378 + 379 + F_printk("noise:%llu\tmax_sample:%llu\thw:%u\tnmi:%u\tirq:%u\tsoftirq:%u\tthread:%u\n", 380 + __entry->noise, 381 + __entry->max_sample, 382 + __entry->hw_count, 383 + __entry->nmi_count, 384 + __entry->irq_count, 385 + __entry->softirq_count, 386 + __entry->thread_count) 387 + ); 388 + 389 + FTRACE_ENTRY(timerlat, timerlat_entry, 390 + 391 + TRACE_TIMERLAT, 392 + 393 + F_STRUCT( 394 + __field( unsigned int, seqnum ) 395 + __field( int, context ) 396 + __field( u64, timer_latency ) 397 + ), 398 + 399 + F_printk("seq:%u\tcontext:%d\ttimer_latency:%llu\n", 400 + __entry->seqnum, 401 + __entry->context, 402 + __entry->timer_latency) 403 + );
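The new osnoise_entry fields and their F_printk() layout can be mirrored in userspace to show what a rendered entry looks like. This is a sketch: the struct copies the F_STRUCT fields, the sample values are made up, and the trailing newline of the kernel format is omitted:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Userspace mirror of the osnoise_entry F_STRUCT fields. */
struct osnoise_sample {
	uint64_t noise, runtime, max_sample;
	unsigned int hw_count, nmi_count, irq_count;
	unsigned int softirq_count, thread_count;
};

/* Render one sample with the same tab-separated layout as the
 * F_printk() format in trace_entries.h. */
static int osnoise_format(char *buf, size_t len,
			  const struct osnoise_sample *s)
{
	return snprintf(buf, len,
		"noise:%" PRIu64 "\tmax_sample:%" PRIu64
		"\thw:%u\tnmi:%u\tirq:%u\tsoftirq:%u\tthread:%u",
		s->noise, s->max_sample, s->hw_count, s->nmi_count,
		s->irq_count, s->softirq_count, s->thread_count);
}

static const char *osnoise_demo(void)
{
	static char buf[128];
	static const struct osnoise_sample s = {
		.noise = 5, .max_sample = 9, .hw_count = 1, .nmi_count = 2,
		.irq_count = 3, .softirq_count = 4, .thread_count = 6,
	};

	osnoise_format(buf, sizeof(buf), &s);
	return buf;
}
```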
+2 -1
kernel/trace/trace_events_hist.c
··· 2434 2434 char *subsys_name, char *event_name, char *field_name) 2435 2435 { 2436 2436 struct trace_array *tr = target_hist_data->event_file->tr; 2437 - struct hist_field *event_var = ERR_PTR(-EINVAL); 2438 2437 struct hist_trigger_data *hist_data; 2439 2438 unsigned int i, n, first = true; 2440 2439 struct field_var_hist *var_hist; 2441 2440 struct trace_event_file *file; 2442 2441 struct hist_field *key_field; 2442 + struct hist_field *event_var; 2443 2443 char *saved_filter; 2444 2444 char *cmd; 2445 2445 int ret; ··· 5232 5232 cmd = hist_data->field_var_hists[i]->cmd; 5233 5233 ret = event_hist_trigger_func(&trigger_hist_cmd, file, 5234 5234 "!hist", "hist", cmd); 5235 + WARN_ON_ONCE(ret < 0); 5235 5236 } 5236 5237 } 5237 5238
+2 -1
kernel/trace/trace_events_trigger.c
··· 916 916 917 917 /** 918 918 * set_named_trigger_data - Associate common named trigger data 919 - * @data: The trigger data of a named trigger to unpause 919 + * @data: The trigger data to associate 920 + * @named_data: The common named trigger to be associated 920 921 * 921 922 * Named triggers are sets of triggers that share a common set of 922 923 * trigger data. The first named trigger registered with a given name
+417 -167
kernel/trace/trace_hwlat.c
··· 34 34 * Copyright (C) 2008-2009 Jon Masters, Red Hat, Inc. <jcm@redhat.com> 35 35 * Copyright (C) 2013-2016 Steven Rostedt, Red Hat, Inc. <srostedt@redhat.com> 36 36 * 37 - * Includes useful feedback from Clark Williams <clark@redhat.com> 37 + * Includes useful feedback from Clark Williams <williams@redhat.com> 38 38 * 39 39 */ 40 40 #include <linux/kthread.h> ··· 54 54 #define DEFAULT_SAMPLE_WIDTH 500000 /* 0.5s */ 55 55 #define DEFAULT_LAT_THRESHOLD 10 /* 10us */ 56 56 57 - /* sampling thread*/ 58 - static struct task_struct *hwlat_kthread; 59 - 60 57 static struct dentry *hwlat_sample_width; /* sample width us */ 61 58 static struct dentry *hwlat_sample_window; /* sample window us */ 59 + static struct dentry *hwlat_thread_mode; /* hwlat thread mode */ 60 + 61 + enum { 62 + MODE_NONE = 0, 63 + MODE_ROUND_ROBIN, 64 + MODE_PER_CPU, 65 + MODE_MAX 66 + }; 67 + static char *thread_mode_str[] = { "none", "round-robin", "per-cpu" }; 62 68 63 69 /* Save the previous tracing_thresh value */ 64 70 static unsigned long save_tracing_thresh; 65 71 66 - /* NMI timestamp counters */ 67 - static u64 nmi_ts_start; 68 - static u64 nmi_total_ts; 69 - static int nmi_count; 70 - static int nmi_cpu; 72 + /* runtime kthread data */ 73 + struct hwlat_kthread_data { 74 + struct task_struct *kthread; 75 + /* NMI timestamp counters */ 76 + u64 nmi_ts_start; 77 + u64 nmi_total_ts; 78 + int nmi_count; 79 + int nmi_cpu; 80 + }; 81 + 82 + struct hwlat_kthread_data hwlat_single_cpu_data; 83 + DEFINE_PER_CPU(struct hwlat_kthread_data, hwlat_per_cpu_data); 71 84 72 85 /* Tells NMIs to call back to the hwlat tracer to record timestamps */ 73 86 bool trace_hwlat_callback_enabled; ··· 109 96 u64 sample_window; /* total sampling window (on+off) */ 110 97 u64 sample_width; /* active sampling portion of window */ 111 98 99 + int thread_mode; /* thread mode */ 100 + 112 101 } hwlat_data = { 113 102 .sample_window = DEFAULT_SAMPLE_WINDOW, 114 103 .sample_width = DEFAULT_SAMPLE_WIDTH, 104 + 
.thread_mode = MODE_ROUND_ROBIN 115 105 }; 106 + 107 + static struct hwlat_kthread_data *get_cpu_data(void) 108 + { 109 + if (hwlat_data.thread_mode == MODE_PER_CPU) 110 + return this_cpu_ptr(&hwlat_per_cpu_data); 111 + else 112 + return &hwlat_single_cpu_data; 113 + } 114 + 115 + static bool hwlat_busy; 116 116 117 117 static void trace_hwlat_sample(struct hwlat_sample *sample) 118 118 { ··· 162 136 163 137 void trace_hwlat_callback(bool enter) 164 138 { 165 - if (smp_processor_id() != nmi_cpu) 139 + struct hwlat_kthread_data *kdata = get_cpu_data(); 140 + 141 + if (!kdata->kthread) 166 142 return; 167 143 168 144 /* ··· 173 145 */ 174 146 if (!IS_ENABLED(CONFIG_GENERIC_SCHED_CLOCK)) { 175 147 if (enter) 176 - nmi_ts_start = time_get(); 148 + kdata->nmi_ts_start = time_get(); 177 149 else 178 - nmi_total_ts += time_get() - nmi_ts_start; 150 + kdata->nmi_total_ts += time_get() - kdata->nmi_ts_start; 179 151 } 180 152 181 153 if (enter) 182 - nmi_count++; 154 + kdata->nmi_count++; 183 155 } 156 + 157 + /* 158 + * hwlat_err - report a hwlat error. 
159 + */ 160 + #define hwlat_err(msg) ({ \ 161 + struct trace_array *tr = hwlat_trace; \ 162 + \ 163 + trace_array_printk_buf(tr->array_buffer.buffer, _THIS_IP_, msg); \ 164 + }) 184 165 185 166 /** 186 167 * get_sample - sample the CPU TSC and look for likely hardware latencies ··· 200 163 */ 201 164 static int get_sample(void) 202 165 { 166 + struct hwlat_kthread_data *kdata = get_cpu_data(); 203 167 struct trace_array *tr = hwlat_trace; 204 168 struct hwlat_sample s; 205 169 time_type start, t1, t2, last_t2; ··· 213 175 214 176 do_div(thresh, NSEC_PER_USEC); /* modifies interval value */ 215 177 216 - nmi_cpu = smp_processor_id(); 217 - nmi_total_ts = 0; 218 - nmi_count = 0; 178 + kdata->nmi_total_ts = 0; 179 + kdata->nmi_count = 0; 219 180 /* Make sure NMIs see this first */ 220 181 barrier(); 221 182 ··· 234 197 outer_diff = time_to_us(time_sub(t1, last_t2)); 235 198 /* This shouldn't happen */ 236 199 if (outer_diff < 0) { 237 - pr_err(BANNER "time running backwards\n"); 200 + hwlat_err(BANNER "time running backwards\n"); 238 201 goto out; 239 202 } 240 203 if (outer_diff > outer_sample) ··· 246 209 247 210 /* Check for possible overflows */ 248 211 if (total < last_total) { 249 - pr_err("Time total overflowed\n"); 212 + hwlat_err("Time total overflowed\n"); 250 213 break; 251 214 } 252 215 last_total = total; ··· 262 225 263 226 /* This shouldn't happen */ 264 227 if (diff < 0) { 265 - pr_err(BANNER "time running backwards\n"); 228 + hwlat_err(BANNER "time running backwards\n"); 266 229 goto out; 267 230 } 268 231 ··· 284 247 ret = 1; 285 248 286 249 /* We read in microseconds */ 287 - if (nmi_total_ts) 288 - do_div(nmi_total_ts, NSEC_PER_USEC); 250 + if (kdata->nmi_total_ts) 251 + do_div(kdata->nmi_total_ts, NSEC_PER_USEC); 289 252 290 253 hwlat_data.count++; 291 254 s.seqnum = hwlat_data.count; 292 255 s.duration = sample; 293 256 s.outer_duration = outer_sample; 294 - s.nmi_total_ts = nmi_total_ts; 295 - s.nmi_count = nmi_count; 257 + s.nmi_total_ts = 
kdata->nmi_total_ts; 258 + s.nmi_count = kdata->nmi_count; 296 259 s.count = count; 297 260 trace_hwlat_sample(&s); 298 261 ··· 310 273 } 311 274 312 275 static struct cpumask save_cpumask; 313 - static bool disable_migrate; 314 276 315 277 static void move_to_next_cpu(void) 316 278 { ··· 317 281 struct trace_array *tr = hwlat_trace; 318 282 int next_cpu; 319 283 320 - if (disable_migrate) 321 - return; 322 284 /* 323 285 * If for some reason the user modifies the CPU affinity 324 286 * of this thread, then stop migrating for the duration 325 287 * of the current test. 326 288 */ 327 289 if (!cpumask_equal(current_mask, current->cpus_ptr)) 328 - goto disable; 290 + goto change_mode; 329 291 get_online_cpus(); 330 292 cpumask_and(current_mask, cpu_online_mask, tr->tracing_cpumask); ··· 334 300 next_cpu = cpumask_first(current_mask); 335 301 336 302 if (next_cpu >= nr_cpu_ids) /* Shouldn't happen! */ 337 - goto disable; 303 + goto change_mode; 338 304 339 305 cpumask_clear(current_mask); 340 306 cpumask_set_cpu(next_cpu, current_mask); ··· 342 308 sched_setaffinity(0, current_mask); 343 309 return; 344 310 345 - disable: 346 - disable_migrate = true; 311 + change_mode: 312 + hwlat_data.thread_mode = MODE_NONE; 313 + pr_info(BANNER "cpumask changed while in round-robin mode, switching to mode none\n"); 347 314 } 348 315 349 316 /* ··· 363 328 364 329 while (!kthread_should_stop()) { 365 330 366 - move_to_next_cpu(); 331 + if (hwlat_data.thread_mode == MODE_ROUND_ROBIN) 332 + move_to_next_cpu(); 367 333 368 334 local_irq_disable(); 369 335 get_sample(); ··· 387 351 return 0; 388 352 } 389 353 390 - /** 391 - * start_kthread - Kick off the hardware latency sampling/detector kthread 354 + /* 355 + * stop_single_kthread - Inform the hardware latency sampling/detector kthread to stop 356 + * 357 + * This kicks the running hardware latency sampling/detector kernel thread and 358 + * tells it to stop sampling now. Use this on unload and at system shutdown. 
359 + */ 360 + static void stop_single_kthread(void) 361 + { 362 + struct hwlat_kthread_data *kdata = get_cpu_data(); 363 + struct task_struct *kthread; 364 + 365 + get_online_cpus(); 366 + kthread = kdata->kthread; 367 + 368 + if (!kthread) 369 + goto out_put_cpus; 370 + 371 + kthread_stop(kthread); 372 + kdata->kthread = NULL; 373 + 374 + out_put_cpus: 375 + put_online_cpus(); 376 + } 377 + 378 + 379 + /* 380 + * start_single_kthread - Kick off the hardware latency sampling/detector kthread 392 381 * 393 382 * This starts the kernel thread that will sit and sample the CPU timestamp 394 383 * counter (TSC or similar) and look for potential hardware latencies. 395 384 */ 396 - static int start_kthread(struct trace_array *tr) 385 + static int start_single_kthread(struct trace_array *tr) 397 386 { 387 + struct hwlat_kthread_data *kdata = get_cpu_data(); 398 388 struct cpumask *current_mask = &save_cpumask; 399 389 struct task_struct *kthread; 400 390 int next_cpu; 401 391 402 - if (hwlat_kthread) 403 - return 0; 404 - 405 - /* Just pick the first CPU on first iteration */ 406 392 get_online_cpus(); 407 - cpumask_and(current_mask, cpu_online_mask, tr->tracing_cpumask); 408 - put_online_cpus(); 409 - next_cpu = cpumask_first(current_mask); 393 + if (kdata->kthread) 394 + goto out_put_cpus; 410 395 411 396 kthread = kthread_create(kthread_fn, NULL, "hwlatd"); 397 + if (IS_ERR(kthread)) { 398 + pr_err(BANNER "could not start sampling thread\n"); 399 + put_online_cpus(); 400 + return -ENOMEM; 401 + } 402 + 403 + /* Just pick the first CPU on first iteration */ 404 + cpumask_and(current_mask, cpu_online_mask, tr->tracing_cpumask); 405 + 406 + if (hwlat_data.thread_mode == MODE_ROUND_ROBIN) { 407 + next_cpu = cpumask_first(current_mask); 408 + cpumask_clear(current_mask); 409 + cpumask_set_cpu(next_cpu, current_mask); 410 + 411 + } 412 + 413 + sched_setaffinity(kthread->pid, current_mask); 414 + 415 + kdata->kthread = kthread; 416 + wake_up_process(kthread); 417 + 418 + 
 out_put_cpus:
+	put_online_cpus();
+	return 0;
+}
+
+/*
+ * stop_cpu_kthread - Stop a hwlat cpu kthread
+ */
+static void stop_cpu_kthread(unsigned int cpu)
+{
+	struct task_struct *kthread;
+
+	kthread = per_cpu(hwlat_per_cpu_data, cpu).kthread;
+	if (kthread)
+		kthread_stop(kthread);
+	per_cpu(hwlat_per_cpu_data, cpu).kthread = NULL;
+}
+
+/*
+ * stop_per_cpu_kthreads - Inform the hardware latency sampling/detector kthreads to stop
+ *
+ * This kicks the running hardware latency sampling/detector kernel threads and
+ * tells them to stop sampling now. Use this on unload and at system shutdown.
+ */
+static void stop_per_cpu_kthreads(void)
+{
+	unsigned int cpu;
+
+	get_online_cpus();
+	for_each_online_cpu(cpu)
+		stop_cpu_kthread(cpu);
+	put_online_cpus();
+}
+
+/*
+ * start_cpu_kthread - Start a hwlat cpu kthread
+ */
+static int start_cpu_kthread(unsigned int cpu)
+{
+	struct task_struct *kthread;
+	char comm[24];
+
+	snprintf(comm, 24, "hwlatd/%d", cpu);
+
+	kthread = kthread_create_on_cpu(kthread_fn, NULL, cpu, comm);
 	if (IS_ERR(kthread)) {
 		pr_err(BANNER "could not start sampling thread\n");
 		return -ENOMEM;
 	}
 
-	cpumask_clear(current_mask);
-	cpumask_set_cpu(next_cpu, current_mask);
-	sched_setaffinity(kthread->pid, current_mask);
-
-	hwlat_kthread = kthread;
+	per_cpu(hwlat_per_cpu_data, cpu).kthread = kthread;
 	wake_up_process(kthread);
 
 	return 0;
 }
 
-/**
- * stop_kthread - Inform the hardware latency sampling/detector kthread to stop
- *
- * This kicks the running hardware latency sampling/detector kernel thread and
- * tells it to stop sampling now. Use this on unload and at system shutdown.
- */
-static void stop_kthread(void)
+#ifdef CONFIG_HOTPLUG_CPU
+static void hwlat_hotplug_workfn(struct work_struct *dummy)
 {
-	if (!hwlat_kthread)
-		return;
-	kthread_stop(hwlat_kthread);
-	hwlat_kthread = NULL;
+	struct trace_array *tr = hwlat_trace;
+	unsigned int cpu = smp_processor_id();
+
+	mutex_lock(&trace_types_lock);
+	mutex_lock(&hwlat_data.lock);
+	get_online_cpus();
+
+	if (!hwlat_busy || hwlat_data.thread_mode != MODE_PER_CPU)
+		goto out_unlock;
+
+	if (!cpumask_test_cpu(cpu, tr->tracing_cpumask))
+		goto out_unlock;
+
+	start_cpu_kthread(cpu);
+
+out_unlock:
+	put_online_cpus();
+	mutex_unlock(&hwlat_data.lock);
+	mutex_unlock(&trace_types_lock);
+}
+
+static DECLARE_WORK(hwlat_hotplug_work, hwlat_hotplug_workfn);
+
+/*
+ * hwlat_cpu_init - CPU hotplug online callback function
+ */
+static int hwlat_cpu_init(unsigned int cpu)
+{
+	schedule_work_on(cpu, &hwlat_hotplug_work);
+	return 0;
 }
 
 /*
- * hwlat_read - Wrapper read function for reading both window and width
- * @filp: The active open file structure
- * @ubuf: The userspace provided buffer to read value into
- * @cnt: The maximum number of bytes to read
- * @ppos: The current "file" position
- *
- * This function provides a generic read implementation for the global state
- * "hwlat_data" structure filesystem entries.
+ * hwlat_cpu_die - CPU hotplug offline callback function
  */
-static ssize_t hwlat_read(struct file *filp, char __user *ubuf,
-			  size_t cnt, loff_t *ppos)
+static int hwlat_cpu_die(unsigned int cpu)
 {
-	char buf[U64STR_SIZE];
-	u64 *entry = filp->private_data;
-	u64 val;
-	int len;
+	stop_cpu_kthread(cpu);
+	return 0;
+}
 
-	if (!entry)
+static void hwlat_init_hotplug_support(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "trace/hwlat:online",
+				hwlat_cpu_init, hwlat_cpu_die);
+	if (ret < 0)
+		pr_warn(BANNER "Error initializing CPU hotplug support\n");
+
+	return;
+}
+#else /* CONFIG_HOTPLUG_CPU */
+static void hwlat_init_hotplug_support(void)
+{
+	return;
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
+/*
+ * start_per_cpu_kthreads - Kick off the hardware latency sampling/detector kthreads
+ *
+ * This starts the kernel threads that will sit on potentially all cpus and
+ * sample the CPU timestamp counter (TSC or similar) and look for potential
+ * hardware latencies.
+ */
+static int start_per_cpu_kthreads(struct trace_array *tr)
+{
+	struct cpumask *current_mask = &save_cpumask;
+	unsigned int cpu;
+	int retval;
+
+	get_online_cpus();
+	/*
+	 * Run only on CPUs in which hwlat is allowed to run.
+	 */
+	cpumask_and(current_mask, cpu_online_mask, tr->tracing_cpumask);
+
+	for_each_online_cpu(cpu)
+		per_cpu(hwlat_per_cpu_data, cpu).kthread = NULL;
+
+	for_each_cpu(cpu, current_mask) {
+		retval = start_cpu_kthread(cpu);
+		if (retval)
+			goto out_error;
+	}
+	put_online_cpus();
+
+	return 0;
+
+out_error:
+	put_online_cpus();
+	stop_per_cpu_kthreads();
+	return retval;
+}
+
+static void *s_mode_start(struct seq_file *s, loff_t *pos)
+{
+	int mode = *pos;
+
+	mutex_lock(&hwlat_data.lock);
+
+	if (mode >= MODE_MAX)
+		return NULL;
+
+	return pos;
+}
+
+static void *s_mode_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	int mode = ++(*pos);
+
+	if (mode >= MODE_MAX)
+		return NULL;
+
+	return pos;
+}
+
+static int s_mode_show(struct seq_file *s, void *v)
+{
+	loff_t *pos = v;
+	int mode = *pos;
+
+	if (mode == hwlat_data.thread_mode)
+		seq_printf(s, "[%s]", thread_mode_str[mode]);
+	else
+		seq_printf(s, "%s", thread_mode_str[mode]);
+
+	if (mode != MODE_MAX)
+		seq_puts(s, " ");
+
+	return 0;
+}
+
+static void s_mode_stop(struct seq_file *s, void *v)
+{
+	seq_puts(s, "\n");
+	mutex_unlock(&hwlat_data.lock);
+}
+
+static const struct seq_operations thread_mode_seq_ops = {
+	.start	= s_mode_start,
+	.next	= s_mode_next,
+	.show	= s_mode_show,
+	.stop	= s_mode_stop
+};
+
+static int hwlat_mode_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &thread_mode_seq_ops);
+};
+
+static void hwlat_tracer_start(struct trace_array *tr);
+static void hwlat_tracer_stop(struct trace_array *tr);
+
+/**
+ * hwlat_mode_write - Write function for "mode" entry
+ * @filp: The active open file structure
+ * @ubuf: The user buffer that contains the value to write
+ * @cnt: The maximum number of bytes to write to "file"
+ * @ppos: The current position in @file
+ *
+ * This function provides a write implementation for the "mode" interface
+ * to the hardware latency detector. hwlatd has different operation modes.
+ * The "none" mode sets the allowed cpumask for a single hwlatd thread at
+ * startup and lets the scheduler handle the migration. The default mode is
+ * the "round-robin" one, in which a single hwlatd thread runs, migrating
+ * among the allowed CPUs in a round-robin fashion. The "per-cpu" mode
+ * creates one hwlatd thread per allowed CPU.
+ */
+static ssize_t hwlat_mode_write(struct file *filp, const char __user *ubuf,
+				size_t cnt, loff_t *ppos)
+{
+	struct trace_array *tr = hwlat_trace;
+	const char *mode;
+	char buf[64];
+	int ret, i;
+
+	if (cnt >= sizeof(buf))
+		return -EINVAL;
+
+	if (copy_from_user(buf, ubuf, cnt))
 		return -EFAULT;
 
-	if (cnt > sizeof(buf))
-		cnt = sizeof(buf);
+	buf[cnt] = 0;
 
-	val = *entry;
+	mode = strstrip(buf);
 
-	len = snprintf(buf, sizeof(buf), "%llu\n", val);
+	ret = -EINVAL;
 
-	return simple_read_from_buffer(ubuf, cnt, ppos, buf, len);
-}
-
-/**
- * hwlat_width_write - Write function for "width" entry
- * @filp: The active open file structure
- * @ubuf: The user buffer that contains the value to write
- * @cnt: The maximum number of bytes to write to "file"
- * @ppos: The current position in @file
- *
- * This function provides a write implementation for the "width" interface
- * to the hardware latency detector. It can be used to configure
- * for how many us of the total window us we will actively sample for any
- * hardware-induced latency periods. Obviously, it is not possible to
- * sample constantly and have the system respond to a sample reader, or,
- * worse, without having the system appear to have gone out to lunch. It
- * is enforced that width is less that the total window size.
- */
-static ssize_t
-hwlat_width_write(struct file *filp, const char __user *ubuf,
-		  size_t cnt, loff_t *ppos)
-{
-	u64 val;
-	int err;
-
-	err = kstrtoull_from_user(ubuf, cnt, 10, &val);
-	if (err)
-		return err;
+	/*
+	 * trace_types_lock is taken to avoid concurrency on start/stop
+	 * and hwlat_busy.
+	 */
+	mutex_lock(&trace_types_lock);
+	if (hwlat_busy)
+		hwlat_tracer_stop(tr);
 
 	mutex_lock(&hwlat_data.lock);
-	if (val < hwlat_data.sample_window)
-		hwlat_data.sample_width = val;
-	else
-		err = -EINVAL;
+
+	for (i = 0; i < MODE_MAX; i++) {
+		if (strcmp(mode, thread_mode_str[i]) == 0) {
+			hwlat_data.thread_mode = i;
+			ret = cnt;
+		}
+	}
+
 	mutex_unlock(&hwlat_data.lock);
 
-	if (err)
-		return err;
+	if (hwlat_busy)
+		hwlat_tracer_start(tr);
+	mutex_unlock(&trace_types_lock);
 
-	return cnt;
+	*ppos += cnt;
+
+	return ret;
 }
 
-/**
- * hwlat_window_write - Write function for "window" entry
- * @filp: The active open file structure
- * @ubuf: The user buffer that contains the value to write
- * @cnt: The maximum number of bytes to write to "file"
- * @ppos: The current position in @file
- *
- * This function provides a write implementation for the "window" interface
- * to the hardware latency detector. The window is the total time
- * in us that will be considered one sample period. Conceptually, windows
- * occur back-to-back and contain a sample width period during which
- * actual sampling occurs. Can be used to write a new total window size. It
- * is enforced that any value written must be greater than the sample width
- * size, or an error results.
+/*
+ * The width parameter is read/write using the generic trace_min_max_param
+ * method. The *val is protected by the hwlat_data lock and is upper
+ * bounded by the window parameter.
  */
-static ssize_t
-hwlat_window_write(struct file *filp, const char __user *ubuf,
-		   size_t cnt, loff_t *ppos)
-{
-	u64 val;
-	int err;
-
-	err = kstrtoull_from_user(ubuf, cnt, 10, &val);
-	if (err)
-		return err;
-
-	mutex_lock(&hwlat_data.lock);
-	if (hwlat_data.sample_width < val)
-		hwlat_data.sample_window = val;
-	else
-		err = -EINVAL;
-	mutex_unlock(&hwlat_data.lock);
-
-	if (err)
-		return err;
-
-	return cnt;
-}
-
-static const struct file_operations width_fops = {
-	.open = tracing_open_generic,
-	.read = hwlat_read,
-	.write = hwlat_width_write,
+static struct trace_min_max_param hwlat_width = {
+	.lock	= &hwlat_data.lock,
+	.val	= &hwlat_data.sample_width,
+	.max	= &hwlat_data.sample_window,
+	.min	= NULL,
 };
 
-static const struct file_operations window_fops = {
-	.open = tracing_open_generic,
-	.read = hwlat_read,
-	.write = hwlat_window_write,
+/*
+ * The window parameter is read/write using the generic trace_min_max_param
+ * method. The *val is protected by the hwlat_data lock and is lower
+ * bounded by the width parameter.
+ */
+static struct trace_min_max_param hwlat_window = {
+	.lock	= &hwlat_data.lock,
+	.val	= &hwlat_data.sample_window,
+	.max	= NULL,
+	.min	= &hwlat_data.sample_width,
 };
 
+static const struct file_operations thread_mode_fops = {
+	.open		= hwlat_mode_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+	.write		= hwlat_mode_write
+};
+
 /**
  * init_tracefs - A function to initialize the tracefs interface files
  *
···
 
 	hwlat_sample_window = tracefs_create_file("window", 0640,
 						  top_dir,
-						  &hwlat_data.sample_window,
-						  &window_fops);
+						  &hwlat_window,
+						  &trace_min_max_fops);
 	if (!hwlat_sample_window)
 		goto err;
 
 	hwlat_sample_width = tracefs_create_file("width", 0644,
 						 top_dir,
-						 &hwlat_data.sample_width,
-						 &width_fops);
+						 &hwlat_width,
+						 &trace_min_max_fops);
 	if (!hwlat_sample_width)
+		goto err;
+
+	hwlat_thread_mode = trace_create_file("mode", 0644,
+					      top_dir,
+					      NULL,
+					      &thread_mode_fops);
+	if (!hwlat_thread_mode)
 		goto err;
 
 	return 0;
···
 {
 	int err;
 
-	err = start_kthread(tr);
+	if (hwlat_data.thread_mode == MODE_PER_CPU)
+		err = start_per_cpu_kthreads(tr);
+	else
+		err = start_single_kthread(tr);
 	if (err)
 		pr_err(BANNER "Cannot start hwlat kthread\n");
 }
 
 static void hwlat_tracer_stop(struct trace_array *tr)
 {
-	stop_kthread();
+	if (hwlat_data.thread_mode == MODE_PER_CPU)
+		stop_per_cpu_kthreads();
+	else
+		stop_single_kthread();
 }
-
-static bool hwlat_busy;
 
 static int hwlat_tracer_init(struct trace_array *tr)
 {
···
 
 	hwlat_trace = tr;
 
-	disable_migrate = false;
 	hwlat_data.count = 0;
 	tr->max_latency = 0;
 	save_tracing_thresh = tracing_thresh;
···
 
 static void hwlat_tracer_reset(struct trace_array *tr)
 {
-	stop_kthread();
+	hwlat_tracer_stop(tr);
 
 	/* the tracing threshold is static between runs */
 	last_tracing_thresh = tracing_thresh;
···
 	ret = register_tracer(&hwlat_tracer);
 	if (ret)
 		return ret;
+
+	hwlat_init_hotplug_support();
 
 	init_tracefs();
kernel/trace/trace_osnoise.c (new file, +2059 lines)
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * OS Noise Tracer: computes the OS Noise suffered by a running thread.
+ * Timerlat Tracer: measures the wakeup latency of a timer triggered IRQ and thread.
+ *
+ * Based on "hwlat_detector" tracer by:
+ *   Copyright (C) 2008-2009 Jon Masters, Red Hat, Inc. <jcm@redhat.com>
+ *   Copyright (C) 2013-2016 Steven Rostedt, Red Hat, Inc. <srostedt@redhat.com>
+ *   With feedback from Clark Williams <williams@redhat.com>
+ *
+ * And also based on the rtsl tracer presented on:
+ *   DE OLIVEIRA, Daniel Bristot, et al. Demystifying the real-time linux
+ *   scheduling latency. In: 32nd Euromicro Conference on Real-Time Systems
+ *   (ECRTS 2020). Schloss Dagstuhl-Leibniz-Zentrum fur Informatik, 2020.
+ *
+ * Copyright (C) 2021 Daniel Bristot de Oliveira, Red Hat, Inc. <bristot@redhat.com>
+ */
+
+#include <linux/kthread.h>
+#include <linux/tracefs.h>
+#include <linux/uaccess.h>
+#include <linux/cpumask.h>
+#include <linux/delay.h>
+#include <linux/sched/clock.h>
+#include <uapi/linux/sched/types.h>
+#include <linux/sched.h>
+#include "trace.h"
+
+#ifdef CONFIG_X86_LOCAL_APIC
+#include <asm/trace/irq_vectors.h>
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+#endif /* CONFIG_X86_LOCAL_APIC */
+
+#include <trace/events/irq.h>
+#include <trace/events/sched.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/osnoise.h>
+
+static struct trace_array	*osnoise_trace;
+
+/*
+ * Default values.
+ */
+#define BANNER			"osnoise: "
+#define DEFAULT_SAMPLE_PERIOD	1000000		/* 1s */
+#define DEFAULT_SAMPLE_RUNTIME	1000000		/* 1s */
+
+#define DEFAULT_TIMERLAT_PERIOD	1000		/* 1ms */
+#define DEFAULT_TIMERLAT_PRIO	95		/* FIFO 95 */
+
+/*
+ * NMI runtime info.
+ */
+struct osn_nmi {
+	u64	count;
+	u64	delta_start;
+};
+
+/*
+ * IRQ runtime info.
+ */
+struct osn_irq {
+	u64	count;
+	u64	arrival_time;
+	u64	delta_start;
+};
+
+#define IRQ_CONTEXT	0
+#define THREAD_CONTEXT	1
+
+/*
+ * softirq runtime info.
+ */
+struct osn_softirq {
+	u64	count;
+	u64	arrival_time;
+	u64	delta_start;
+};
+
+/*
+ * Thread runtime info.
+ */
+struct osn_thread {
+	u64	count;
+	u64	arrival_time;
+	u64	delta_start;
+};
+
+/*
+ * Runtime information: this structure saves the runtime information used by
+ * one sampling thread.
+ */
+struct osnoise_variables {
+	struct task_struct	*kthread;
+	bool			sampling;
+	pid_t			pid;
+	struct osn_nmi		nmi;
+	struct osn_irq		irq;
+	struct osn_softirq	softirq;
+	struct osn_thread	thread;
+	local_t			int_counter;
+};
+
+/*
+ * Per-cpu runtime information.
+ */
+DEFINE_PER_CPU(struct osnoise_variables, per_cpu_osnoise_var);
+
+/*
+ * this_cpu_osn_var - Return the per-cpu osnoise_variables on its relative CPU
+ */
+static inline struct osnoise_variables *this_cpu_osn_var(void)
+{
+	return this_cpu_ptr(&per_cpu_osnoise_var);
+}
+
+#ifdef CONFIG_TIMERLAT_TRACER
+/*
+ * Runtime information for the timer mode.
+ */
+struct timerlat_variables {
+	struct task_struct	*kthread;
+	struct hrtimer		timer;
+	u64			rel_period;
+	u64			abs_period;
+	bool			tracing_thread;
+	u64			count;
+};
+
+DEFINE_PER_CPU(struct timerlat_variables, per_cpu_timerlat_var);
+
+/*
+ * this_cpu_tmr_var - Return the per-cpu timerlat_variables on its relative CPU
+ */
+static inline struct timerlat_variables *this_cpu_tmr_var(void)
+{
+	return this_cpu_ptr(&per_cpu_timerlat_var);
+}
+
+/*
+ * tlat_var_reset - Reset the values of the given timerlat_variables
+ */
+static inline void tlat_var_reset(void)
+{
+	struct timerlat_variables *tlat_var;
+	int cpu;
+	/*
+	 * So far, all the values are initialized as 0, so
+	 * zeroing the structure is perfect.
+	 */
+	for_each_cpu(cpu, cpu_online_mask) {
+		tlat_var = per_cpu_ptr(&per_cpu_timerlat_var, cpu);
+		memset(tlat_var, 0, sizeof(*tlat_var));
+	}
+}
+#else /* CONFIG_TIMERLAT_TRACER */
+#define tlat_var_reset()	do {} while (0)
+#endif /* CONFIG_TIMERLAT_TRACER */
+
+/*
+ * osn_var_reset - Reset the values of the given osnoise_variables
+ */
+static inline void osn_var_reset(void)
+{
+	struct osnoise_variables *osn_var;
+	int cpu;
+
+	/*
+	 * So far, all the values are initialized as 0, so
+	 * zeroing the structure is perfect.
+	 */
+	for_each_cpu(cpu, cpu_online_mask) {
+		osn_var = per_cpu_ptr(&per_cpu_osnoise_var, cpu);
+		memset(osn_var, 0, sizeof(*osn_var));
+	}
+}
+
+/*
+ * osn_var_reset_all - Reset the value of all per-cpu osnoise_variables
+ */
+static inline void osn_var_reset_all(void)
+{
+	osn_var_reset();
+	tlat_var_reset();
+}
+
+/*
+ * Tells NMIs to call back to the osnoise tracer to record timestamps.
+ */
+bool trace_osnoise_callback_enabled;
+
+/*
+ * osnoise sample structure definition. Used to store the statistics of a
+ * sample run.
+ */
+struct osnoise_sample {
+	u64		runtime;	/* runtime */
+	u64		noise;		/* noise */
+	u64		max_sample;	/* max single noise sample */
+	int		hw_count;	/* # HW (incl. hypervisor) interference */
+	int		nmi_count;	/* # NMIs during this sample */
+	int		irq_count;	/* # IRQs during this sample */
+	int		softirq_count;	/* # softirqs during this sample */
+	int		thread_count;	/* # threads during this sample */
+};
+
+#ifdef CONFIG_TIMERLAT_TRACER
+/*
+ * timerlat sample structure definition. Used to store the statistics of
+ * a sample run.
+ */
+struct timerlat_sample {
+	u64		timer_latency;	/* timer_latency */
+	unsigned int	seqnum;		/* unique sequence */
+	int		context;	/* timer context */
+};
+#endif
+
+/*
+ * Protect the interface.
+ */
+struct mutex interface_lock;
+
+/*
+ * Tracer data.
+ */
+static struct osnoise_data {
+	u64	sample_period;		/* total sampling period */
+	u64	sample_runtime;		/* active sampling portion of period */
+	u64	stop_tracing;		/* stop trace in the internal operation (loop/irq) */
+	u64	stop_tracing_total;	/* stop trace in the final operation (report/thread) */
+#ifdef CONFIG_TIMERLAT_TRACER
+	u64	timerlat_period;	/* timerlat period */
+	u64	print_stack;		/* print IRQ stack if total > */
+	int	timerlat_tracer;	/* timerlat tracer */
+#endif
+	bool	tainted;		/* inform users and developers about a problem */
+} osnoise_data = {
+	.sample_period		= DEFAULT_SAMPLE_PERIOD,
+	.sample_runtime		= DEFAULT_SAMPLE_RUNTIME,
+	.stop_tracing		= 0,
+	.stop_tracing_total	= 0,
+#ifdef CONFIG_TIMERLAT_TRACER
+	.print_stack		= 0,
+	.timerlat_period	= DEFAULT_TIMERLAT_PERIOD,
+	.timerlat_tracer	= 0,
+#endif
+};
+
+/*
+ * Boolean variable used to inform that the tracer is currently sampling.
+ */
+static bool osnoise_busy;
+
+/*
+ * Print the osnoise header info.
+ */
+static void print_osnoise_headers(struct seq_file *s)
+{
+	if (osnoise_data.tainted)
+		seq_puts(s, "# osnoise is tainted!\n");
+
+	seq_puts(s, "#                                _-----=> irqs-off\n");
+	seq_puts(s, "#                               / _----=> need-resched\n");
+	seq_puts(s, "#                              | / _---=> hardirq/softirq\n");
+	seq_puts(s, "#                              || / _--=> preempt-depth     ");
+	seq_puts(s, "                       MAX\n");
+	seq_puts(s, "#                              || /                         ");
+	seq_puts(s, "                    SINGLE     Interference counters:\n");
+	seq_puts(s, "#                              ||||               RUNTIME   ");
+	seq_puts(s, "    NOISE  %% OF CPU  NOISE    +-----------------------------+\n");
+	seq_puts(s, "#           TASK-PID      CPU# ||||   TIMESTAMP    IN US    ");
+	seq_puts(s, "    IN US  AVAILABLE  IN US     HW    NMI    IRQ   SIRQ  THREAD\n");
+	seq_puts(s, "#              | |         |   ||||      |           |      ");
+	seq_puts(s, "       |    |            |      |      |      |      |      |\n");
+}
+
+/*
+ * osnoise_taint - report an osnoise error.
+ */
+#define osnoise_taint(msg) ({							\
+	struct trace_array *tr = osnoise_trace;					\
+										\
+	trace_array_printk_buf(tr->array_buffer.buffer, _THIS_IP_, msg);	\
+	osnoise_data.tainted = true;						\
+})
+
+/*
+ * Record an osnoise_sample into the tracer buffer.
+ */
+static void trace_osnoise_sample(struct osnoise_sample *sample)
+{
+	struct trace_array *tr = osnoise_trace;
+	struct trace_buffer *buffer = tr->array_buffer.buffer;
+	struct trace_event_call *call = &event_osnoise;
+	struct ring_buffer_event *event;
+	struct osnoise_entry *entry;
+
+	event = trace_buffer_lock_reserve(buffer, TRACE_OSNOISE, sizeof(*entry),
+					  tracing_gen_ctx());
+	if (!event)
+		return;
+	entry = ring_buffer_event_data(event);
+	entry->runtime		= sample->runtime;
+	entry->noise		= sample->noise;
+	entry->max_sample	= sample->max_sample;
+	entry->hw_count		= sample->hw_count;
+	entry->nmi_count	= sample->nmi_count;
+	entry->irq_count	= sample->irq_count;
+	entry->softirq_count	= sample->softirq_count;
+	entry->thread_count	= sample->thread_count;
+
+	if (!call_filter_check_discard(call, entry, buffer, event))
+		trace_buffer_unlock_commit_nostack(buffer, event);
+}
+
+#ifdef CONFIG_TIMERLAT_TRACER
+/*
+ * Print the timerlat header info.
+ */
+static void print_timerlat_headers(struct seq_file *s)
+{
+	seq_puts(s, "#                                _-----=> irqs-off\n");
+	seq_puts(s, "#                               / _----=> need-resched\n");
+	seq_puts(s, "#                              | / _---=> hardirq/softirq\n");
+	seq_puts(s, "#                              || / _--=> preempt-depth\n");
+	seq_puts(s, "#                              || /\n");
+	seq_puts(s, "#                              ||||             ACTIVATION\n");
+	seq_puts(s, "#           TASK-PID      CPU# ||||   TIMESTAMP    ID     ");
+	seq_puts(s, "       CONTEXT                LATENCY\n");
+	seq_puts(s, "#              | |         |   ||||      |         |      ");
+	seq_puts(s, "            |                       |\n");
+}
+
+/*
+ * Record a timerlat_sample into the tracer buffer.
+ */
+static void trace_timerlat_sample(struct timerlat_sample *sample)
+{
+	struct trace_array *tr = osnoise_trace;
+	struct trace_event_call *call = &event_osnoise;
+	struct trace_buffer *buffer = tr->array_buffer.buffer;
+	struct ring_buffer_event *event;
+	struct timerlat_entry *entry;
+
+	event = trace_buffer_lock_reserve(buffer, TRACE_TIMERLAT, sizeof(*entry),
+					  tracing_gen_ctx());
+	if (!event)
+		return;
+	entry = ring_buffer_event_data(event);
+	entry->seqnum		= sample->seqnum;
+	entry->context		= sample->context;
+	entry->timer_latency	= sample->timer_latency;
+
+	if (!call_filter_check_discard(call, entry, buffer, event))
+		trace_buffer_unlock_commit_nostack(buffer, event);
+}
+
+#ifdef CONFIG_STACKTRACE
+
+#define MAX_CALLS	256
+
+/*
+ * Stack trace will take place only at IRQ level, so, no need
+ * to control nesting here.
+ */
+struct trace_stack {
+	int		stack_size;
+	int		nr_entries;
+	unsigned long	calls[MAX_CALLS];
+};
+
+static DEFINE_PER_CPU(struct trace_stack, trace_stack);
+
+/*
+ * timerlat_save_stack - save a stack trace without printing
+ *
+ * Save the current stack trace without printing. The
+ * stack will be printed later, after the end of the measurement.
+ */
+static void timerlat_save_stack(int skip)
+{
+	unsigned int size, nr_entries;
+	struct trace_stack *fstack;
+
+	fstack = this_cpu_ptr(&trace_stack);
+
+	size = ARRAY_SIZE(fstack->calls);
+
+	nr_entries = stack_trace_save(fstack->calls, size, skip);
+
+	fstack->stack_size = nr_entries * sizeof(unsigned long);
+	fstack->nr_entries = nr_entries;
+
+	return;
+}
+
+/*
+ * timerlat_dump_stack - dump a stack trace previously saved
+ *
+ * Dump a saved stack trace into the trace buffer.
+ */
+static void timerlat_dump_stack(void)
+{
+	struct trace_event_call *call = &event_osnoise;
+	struct trace_array *tr = osnoise_trace;
+	struct trace_buffer *buffer = tr->array_buffer.buffer;
+	struct ring_buffer_event *event;
+	struct trace_stack *fstack;
+	struct stack_entry *entry;
+	unsigned int size;
+
+	preempt_disable_notrace();
+	fstack = this_cpu_ptr(&trace_stack);
+	size = fstack->stack_size;
+
+	event = trace_buffer_lock_reserve(buffer, TRACE_STACK, sizeof(*entry) + size,
+					  tracing_gen_ctx());
+	if (!event)
+		goto out;
+
+	entry = ring_buffer_event_data(event);
+
+	memcpy(&entry->caller, fstack->calls, size);
+	entry->size = fstack->nr_entries;
+
+	if (!call_filter_check_discard(call, entry, buffer, event))
+		trace_buffer_unlock_commit_nostack(buffer, event);
+
+out:
+	preempt_enable_notrace();
+}
+#else
+#define timerlat_dump_stack()	do {} while (0)
+#define timerlat_save_stack(a)	do {} while (0)
+#endif /* CONFIG_STACKTRACE */
+#endif /* CONFIG_TIMERLAT_TRACER */
+
+/*
+ * Macros to encapsulate the time capturing infrastructure.
+ */
+#define time_get()	trace_clock_local()
+#define time_to_us(x)	div_u64(x, 1000)
+#define time_sub(a, b)	((a) - (b))
+
+/*
+ * cond_move_irq_delta_start - Forward the delta_start of a running IRQ
+ *
+ * If an IRQ is preempted by an NMI, its delta_start is pushed forward
+ * to discount the NMI interference.
+ *
+ * See get_int_safe_duration().
+ */
+static inline void
+cond_move_irq_delta_start(struct osnoise_variables *osn_var, u64 duration)
+{
+	if (osn_var->irq.delta_start)
+		osn_var->irq.delta_start += duration;
+}
+
+#ifndef CONFIG_PREEMPT_RT
+/*
+ * cond_move_softirq_delta_start - Forward the delta_start of a running softirq.
+ *
+ * If a softirq is preempted by an IRQ or NMI, its delta_start is pushed
+ * forward to discount the interference.
+ *
+ * See get_int_safe_duration().
+ */
+static inline void
+cond_move_softirq_delta_start(struct osnoise_variables *osn_var, u64 duration)
+{
+	if (osn_var->softirq.delta_start)
+		osn_var->softirq.delta_start += duration;
+}
+#else /* CONFIG_PREEMPT_RT */
+#define cond_move_softirq_delta_start(osn_var, duration) do {} while (0)
+#endif
+
+/*
+ * cond_move_thread_delta_start - Forward the delta_start of a running thread
+ *
+ * If a noisy thread is preempted by a softirq, IRQ or NMI, its delta_start
+ * is pushed forward to discount the interference.
+ *
+ * See get_int_safe_duration().
+ */
+static inline void
+cond_move_thread_delta_start(struct osnoise_variables *osn_var, u64 duration)
+{
+	if (osn_var->thread.delta_start)
+		osn_var->thread.delta_start += duration;
+}
+
+/*
+ * get_int_safe_duration - Get the duration of a window
+ *
+ * The irq, softirq and thread variables need to have their duration without
+ * the interference from higher priority interrupts. Instead of keeping a
+ * variable to discount the interrupt interference from these variables, the
+ * starting time of these variables is pushed forward with the interrupt's
+ * duration. In this way, a single variable is used to:
+ *
+ *   - Know if a given window is being measured.
+ *   - Account its duration.
+ *   - Discount the interference.
+ *
+ * To avoid getting inconsistent values, e.g.,:
+ *
+ *	now = time_get()
+ *		--->	interrupt!
+ *			delta_start -= int duration;
+ *		<---
+ *	duration = now - delta_start;
+ *
+ *	result: negative duration if the variable duration before the
+ *	interrupt was smaller than the interrupt execution.
+ *
+ * A counter of interrupts is used. If the counter increased, try
+ * to capture an interference safe duration.
+ */
+static inline s64
+get_int_safe_duration(struct osnoise_variables *osn_var, u64 *delta_start)
+{
+	u64 int_counter, now;
+	s64 duration;
+
+	do {
+		int_counter = local_read(&osn_var->int_counter);
+		/* synchronize with interrupts */
+		barrier();
+
+		now = time_get();
+		duration = (now - *delta_start);
+
+		/* synchronize with interrupts */
+		barrier();
+	} while (int_counter != local_read(&osn_var->int_counter));
+
+	/*
+	 * This is evidence of race conditions that cause
+	 * a value to be "discounted" too much.
+	 */
+	if (duration < 0)
+		osnoise_taint("Negative duration!\n");
+
+	*delta_start = 0;
+
+	return duration;
+}
+
+/*
+ * set_int_safe_time - Save the current time on *time, aware of interference
+ *
+ * Get the time, taking into consideration a possible interference from
+ * higher priority interrupts.
+ *
+ * See get_int_safe_duration() for an explanation.
565 + */ 566 + static u64 567 + set_int_safe_time(struct osnoise_variables *osn_var, u64 *time) 568 + { 569 + u64 int_counter; 570 + 571 + do { 572 + int_counter = local_read(&osn_var->int_counter); 573 + /* synchronize with interrupts */ 574 + barrier(); 575 + 576 + *time = time_get(); 577 + 578 + /* synchronize with interrupts */ 579 + barrier(); 580 + } while (int_counter != local_read(&osn_var->int_counter)); 581 + 582 + return int_counter; 583 + } 584 + 585 + #ifdef CONFIG_TIMERLAT_TRACER 586 + /* 587 + * copy_int_safe_time - Copy *src into *desc aware of interference 588 + */ 589 + static u64 590 + copy_int_safe_time(struct osnoise_variables *osn_var, u64 *dst, u64 *src) 591 + { 592 + u64 int_counter; 593 + 594 + do { 595 + int_counter = local_read(&osn_var->int_counter); 596 + /* synchronize with interrupts */ 597 + barrier(); 598 + 599 + *dst = *src; 600 + 601 + /* synchronize with interrupts */ 602 + barrier(); 603 + } while (int_counter != local_read(&osn_var->int_counter)); 604 + 605 + return int_counter; 606 + } 607 + #endif /* CONFIG_TIMERLAT_TRACER */ 608 + 609 + /* 610 + * trace_osnoise_callback - NMI entry/exit callback 611 + * 612 + * This function is called at the entry and exit NMI code. The bool enter 613 + * distinguishes between either case. This function is used to note a NMI 614 + * occurrence, compute the noise caused by the NMI, and to remove the noise 615 + * it is potentially causing on other interference variables. 616 + */ 617 + void trace_osnoise_callback(bool enter) 618 + { 619 + struct osnoise_variables *osn_var = this_cpu_osn_var(); 620 + u64 duration; 621 + 622 + if (!osn_var->sampling) 623 + return; 624 + 625 + /* 626 + * Currently trace_clock_local() calls sched_clock() and the 627 + * generic version is not NMI safe. 
628 + */ 629 + if (!IS_ENABLED(CONFIG_GENERIC_SCHED_CLOCK)) { 630 + if (enter) { 631 + osn_var->nmi.delta_start = time_get(); 632 + local_inc(&osn_var->int_counter); 633 + } else { 634 + duration = time_get() - osn_var->nmi.delta_start; 635 + 636 + trace_nmi_noise(osn_var->nmi.delta_start, duration); 637 + 638 + cond_move_irq_delta_start(osn_var, duration); 639 + cond_move_softirq_delta_start(osn_var, duration); 640 + cond_move_thread_delta_start(osn_var, duration); 641 + } 642 + } 643 + 644 + if (enter) 645 + osn_var->nmi.count++; 646 + } 647 + 648 + /* 649 + * osnoise_trace_irq_entry - Note the starting of an IRQ 650 + * 651 + * Save the starting time of an IRQ. As IRQs are non-preemptive to other IRQs, 652 + * it is safe to use a single variable (ons_var->irq) to save the statistics. 653 + * The arrival_time is used to report... the arrival time. The delta_start 654 + * is used to compute the duration at the IRQ exit handler. See 655 + * cond_move_irq_delta_start(). 656 + */ 657 + void osnoise_trace_irq_entry(int id) 658 + { 659 + struct osnoise_variables *osn_var = this_cpu_osn_var(); 660 + 661 + if (!osn_var->sampling) 662 + return; 663 + /* 664 + * This value will be used in the report, but not to compute 665 + * the execution time, so it is safe to get it unsafe. 666 + */ 667 + osn_var->irq.arrival_time = time_get(); 668 + set_int_safe_time(osn_var, &osn_var->irq.delta_start); 669 + osn_var->irq.count++; 670 + 671 + local_inc(&osn_var->int_counter); 672 + } 673 + 674 + /* 675 + * osnoise_irq_exit - Note the end of an IRQ, sava data and trace 676 + * 677 + * Computes the duration of the IRQ noise, and trace it. Also discounts the 678 + * interference from other sources of noise could be currently being accounted. 
679 + */ 680 + void osnoise_trace_irq_exit(int id, const char *desc) 681 + { 682 + struct osnoise_variables *osn_var = this_cpu_osn_var(); 683 + int duration; 684 + 685 + if (!osn_var->sampling) 686 + return; 687 + 688 + duration = get_int_safe_duration(osn_var, &osn_var->irq.delta_start); 689 + trace_irq_noise(id, desc, osn_var->irq.arrival_time, duration); 690 + osn_var->irq.arrival_time = 0; 691 + cond_move_softirq_delta_start(osn_var, duration); 692 + cond_move_thread_delta_start(osn_var, duration); 693 + } 694 + 695 + /* 696 + * trace_irqentry_callback - Callback to the irq:irq_handler_entry trace event 697 + * 698 + * Used to note the starting of an IRQ occurrence. 699 + */ 700 + static void trace_irqentry_callback(void *data, int irq, 701 + struct irqaction *action) 702 + { 703 + osnoise_trace_irq_entry(irq); 704 + } 705 + 706 + /* 707 + * trace_irqexit_callback - Callback to the irq:irq_handler_exit trace event 708 + * 709 + * Used to note the end of an IRQ occurrence. 710 + */ 711 + static void trace_irqexit_callback(void *data, int irq, 712 + struct irqaction *action, int ret) 713 + { 714 + osnoise_trace_irq_exit(irq, action->name); 715 + } 716 + 717 + /* 718 + * arch specific register function. 719 + */ 720 + int __weak osnoise_arch_register(void) 721 + { 722 + return 0; 723 + } 724 + 725 + /* 726 + * arch specific unregister function. 727 + */ 728 + void __weak osnoise_arch_unregister(void) 729 + { 730 + return; 731 + } 732 + 733 + /* 734 + * hook_irq_events - Hook IRQ handling events 735 + * 736 + * This function hooks the IRQ related callbacks to the respective trace 737 + * events.
738 + */ 739 + static int hook_irq_events(void) 740 + { 741 + int ret; 742 + 743 + ret = register_trace_irq_handler_entry(trace_irqentry_callback, NULL); 744 + if (ret) 745 + goto out_err; 746 + 747 + ret = register_trace_irq_handler_exit(trace_irqexit_callback, NULL); 748 + if (ret) 749 + goto out_unregister_entry; 750 + 751 + ret = osnoise_arch_register(); 752 + if (ret) 753 + goto out_irq_exit; 754 + 755 + return 0; 756 + 757 + out_irq_exit: 758 + unregister_trace_irq_handler_exit(trace_irqexit_callback, NULL); 759 + out_unregister_entry: 760 + unregister_trace_irq_handler_entry(trace_irqentry_callback, NULL); 761 + out_err: 762 + return -EINVAL; 763 + } 764 + 765 + /* 766 + * unhook_irq_events - Unhook IRQ handling events 767 + * 768 + * This function unhooks the IRQ related callbacks from the respective trace 769 + * events. 770 + */ 771 + static void unhook_irq_events(void) 772 + { 773 + osnoise_arch_unregister(); 774 + unregister_trace_irq_handler_exit(trace_irqexit_callback, NULL); 775 + unregister_trace_irq_handler_entry(trace_irqentry_callback, NULL); 776 + } 777 + 778 + #ifndef CONFIG_PREEMPT_RT 779 + /* 780 + * trace_softirq_entry_callback - Note the starting of a softirq 781 + * 782 + * Save the starting time of a softirq. As softirqs are non-preemptive to 783 + * other softirqs, it is safe to use a single variable (osn_var->softirq) 784 + * to save the statistics. The arrival_time is used to report... the 785 + * arrival time. The delta_start is used to compute the duration at the 786 + * softirq exit handler. See cond_move_softirq_delta_start(). 787 + */ 788 + static void trace_softirq_entry_callback(void *data, unsigned int vec_nr) 789 + { 790 + struct osnoise_variables *osn_var = this_cpu_osn_var(); 791 + 792 + if (!osn_var->sampling) 793 + return; 794 + /* 795 + * This value will be used in the report, but not to compute 796 + * the execution time, so it is safe to get it unsafe.
797 + */ 798 + osn_var->softirq.arrival_time = time_get(); 799 + set_int_safe_time(osn_var, &osn_var->softirq.delta_start); 800 + osn_var->softirq.count++; 801 + 802 + local_inc(&osn_var->int_counter); 803 + } 804 + 805 + /* 806 + * trace_softirq_exit_callback - Note the end of a softirq 807 + * 808 + * Computes the duration of the softirq noise and traces it. It also discounts 809 + * the interference from other sources of noise that could currently be accounted. 810 + */ 811 + static void trace_softirq_exit_callback(void *data, unsigned int vec_nr) 812 + { 813 + struct osnoise_variables *osn_var = this_cpu_osn_var(); 814 + int duration; 815 + 816 + if (!osn_var->sampling) 817 + return; 818 + 819 + #ifdef CONFIG_TIMERLAT_TRACER 820 + /* 821 + * If timerlat is enabled, but the timer IRQ that enables 822 + * timerlat's tracing_thread has not run yet, do not trace. 823 + */ 824 + if (unlikely(osnoise_data.timerlat_tracer)) { 825 + struct timerlat_variables *tlat_var; 826 + tlat_var = this_cpu_tmr_var(); 827 + if (!tlat_var->tracing_thread) { 828 + osn_var->softirq.arrival_time = 0; 829 + osn_var->softirq.delta_start = 0; 830 + return; 831 + } 832 + } 833 + #endif 834 + 835 + duration = get_int_safe_duration(osn_var, &osn_var->softirq.delta_start); 836 + trace_softirq_noise(vec_nr, osn_var->softirq.arrival_time, duration); 837 + cond_move_thread_delta_start(osn_var, duration); 838 + osn_var->softirq.arrival_time = 0; 839 + } 840 + 841 + /* 842 + * hook_softirq_events - Hook softirq handling events 843 + * 844 + * This function hooks the softirq related callbacks to the respective trace 845 + * events.
846 + */ 847 + static int hook_softirq_events(void) 848 + { 849 + int ret; 850 + 851 + ret = register_trace_softirq_entry(trace_softirq_entry_callback, NULL); 852 + if (ret) 853 + goto out_err; 854 + 855 + ret = register_trace_softirq_exit(trace_softirq_exit_callback, NULL); 856 + if (ret) 857 + goto out_unreg_entry; 858 + 859 + return 0; 860 + 861 + out_unreg_entry: 862 + unregister_trace_softirq_entry(trace_softirq_entry_callback, NULL); 863 + out_err: 864 + return -EINVAL; 865 + } 866 + 867 + /* 868 + * unhook_softirq_events - Unhook softirq handling events 869 + * 870 + * This function unhooks the softirq related callbacks from the respective 871 + * trace events. 872 + */ 873 + static void unhook_softirq_events(void) 874 + { 875 + unregister_trace_softirq_entry(trace_softirq_entry_callback, NULL); 876 + unregister_trace_softirq_exit(trace_softirq_exit_callback, NULL); 877 + } 878 + #else /* CONFIG_PREEMPT_RT */ 879 + /* 880 + * softirqs run as threads in PREEMPT_RT mode. 881 + */ 882 + static int hook_softirq_events(void) 883 + { 884 + return 0; 885 + } 886 + static void unhook_softirq_events(void) 887 + { 888 + } 889 + #endif 890 + 891 + /* 892 + * thread_entry - Record the starting of a thread noise window 893 + * 894 + * It saves the context switch time for a noisy thread, and increments 895 + * the interference counters. 896 + */ 897 + static void 898 + thread_entry(struct osnoise_variables *osn_var, struct task_struct *t) 899 + { 900 + if (!osn_var->sampling) 901 + return; 902 + /* 903 + * The arrival time will be used in the report, but not to compute 904 + * the execution time, so it is safe to get it unsafe.
905 + */ 906 + osn_var->thread.arrival_time = time_get(); 907 + 908 + set_int_safe_time(osn_var, &osn_var->thread.delta_start); 909 + 910 + osn_var->thread.count++; 911 + local_inc(&osn_var->int_counter); 912 + } 913 + 914 + /* 915 + * thread_exit - Report the end of a thread noise window 916 + * 917 + * It computes the total noise from a thread, tracing if needed. 918 + */ 919 + static void 920 + thread_exit(struct osnoise_variables *osn_var, struct task_struct *t) 921 + { 922 + int duration; 923 + 924 + if (!osn_var->sampling) 925 + return; 926 + 927 + #ifdef CONFIG_TIMERLAT_TRACER 928 + if (osnoise_data.timerlat_tracer) { 929 + struct timerlat_variables *tlat_var; 930 + tlat_var = this_cpu_tmr_var(); 931 + if (!tlat_var->tracing_thread) { 932 + osn_var->thread.delta_start = 0; 933 + osn_var->thread.arrival_time = 0; 934 + return; 935 + } 936 + } 937 + #endif 938 + 939 + duration = get_int_safe_duration(osn_var, &osn_var->thread.delta_start); 940 + 941 + trace_thread_noise(t, osn_var->thread.arrival_time, duration); 942 + 943 + osn_var->thread.arrival_time = 0; 944 + } 945 + 946 + /* 947 + * trace_sched_switch_callback - sched:sched_switch trace event handler 948 + * 949 + * This function is hooked to the sched:sched_switch trace event, and it is 950 + * used to record the beginning and to report the end of a thread noise window. 951 + */ 952 + static void 953 + trace_sched_switch_callback(void *data, bool preempt, struct task_struct *p, 954 + struct task_struct *n) 955 + { 956 + struct osnoise_variables *osn_var = this_cpu_osn_var(); 957 + 958 + if (p->pid != osn_var->pid) 959 + thread_exit(osn_var, p); 960 + 961 + if (n->pid != osn_var->pid) 962 + thread_entry(osn_var, n); 963 + } 964 + 965 + /* 966 + * hook_thread_events - Hook the instrumentation for thread noise 967 + * 968 + * Hook the osnoise tracer callbacks to handle the noise from other 969 + * threads on the necessary kernel events.
970 + */ 971 + static int hook_thread_events(void) 972 + { 973 + int ret; 974 + 975 + ret = register_trace_sched_switch(trace_sched_switch_callback, NULL); 976 + if (ret) 977 + return -EINVAL; 978 + 979 + return 0; 980 + } 981 + 982 + /* 983 + * unhook_thread_events - Unhook the instrumentation for thread noise 984 + * 985 + * Unhook the osnoise tracer callbacks to handle the noise from other 986 + * threads on the necessary kernel events. 987 + */ 988 + static void unhook_thread_events(void) 989 + { 990 + unregister_trace_sched_switch(trace_sched_switch_callback, NULL); 991 + } 992 + 993 + /* 994 + * save_osn_sample_stats - Save the osnoise_sample statistics 995 + * 996 + * Save the osnoise_sample statistics before the sampling phase. These 997 + * values will be used later to compute the diff between the statistics 998 + * before and after the osnoise sampling. 999 + */ 1000 + static void 1001 + save_osn_sample_stats(struct osnoise_variables *osn_var, struct osnoise_sample *s) 1002 + { 1003 + s->nmi_count = osn_var->nmi.count; 1004 + s->irq_count = osn_var->irq.count; 1005 + s->softirq_count = osn_var->softirq.count; 1006 + s->thread_count = osn_var->thread.count; 1007 + } 1008 + 1009 + /* 1010 + * diff_osn_sample_stats - Compute the osnoise_sample statistics 1011 + * 1012 + * After a sample period, compute the difference on the osnoise_sample 1013 + * statistics. The struct osnoise_sample *s contains the statistics saved via 1014 + * save_osn_sample_stats() before the osnoise sampling. 1015 + */ 1016 + static void 1017 + diff_osn_sample_stats(struct osnoise_variables *osn_var, struct osnoise_sample *s) 1018 + { 1019 + s->nmi_count = osn_var->nmi.count - s->nmi_count; 1020 + s->irq_count = osn_var->irq.count - s->irq_count; 1021 + s->softirq_count = osn_var->softirq.count - s->softirq_count; 1022 + s->thread_count = osn_var->thread.count - s->thread_count; 1023 + } 1024 + 1025 + /* 1026 + * osnoise_stop_tracing - Stop tracing and the tracer.
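save_osn_sample_stats()/diff_osn_sample_stats() above follow a simple save-then-subtract pattern: snapshot the absolute interference counters before the sampling window, then turn them into per-window deltas afterwards. A hedged userspace sketch with simplified stand-in structs (these are not the kernel's osnoise_sample fields):

```c
#include <stdint.h>

/* Illustrative stand-ins for the per-CPU counters and the sample. */
struct sketch_counters { uint64_t nmi, irq, softirq, thread; };
struct sketch_sample   { uint64_t nmi, irq, softirq, thread; };

/* Before the window: snapshot the absolute counters. */
static void sketch_save_stats(const struct sketch_counters *c,
			      struct sketch_sample *s)
{
	s->nmi = c->nmi;
	s->irq = c->irq;
	s->softirq = c->softirq;
	s->thread = c->thread;
}

/* After the window: counters minus snapshot = events in this window. */
static void sketch_diff_stats(const struct sketch_counters *c,
			      struct sketch_sample *s)
{
	s->nmi = c->nmi - s->nmi;
	s->irq = c->irq - s->irq;
	s->softirq = c->softirq - s->softirq;
	s->thread = c->thread - s->thread;
}
```

The counters are never reset; only the snapshot-and-subtract gives per-window values, which keeps the hot paths free of extra writes.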
1027 + */ 1028 + static void osnoise_stop_tracing(void) 1029 + { 1030 + struct trace_array *tr = osnoise_trace; 1031 + tracer_tracing_off(tr); 1032 + } 1033 + 1034 + /* 1035 + * run_osnoise - Sample the time and look for osnoise 1036 + * 1037 + * Used to capture the time, looking for potential osnoise latency repeatedly. 1038 + * Different from hwlat_detector, it is called with preemption and interrupts 1039 + * enabled. This allows irqs, softirqs and threads to run, interfering with the 1040 + * osnoise sampling thread, as they would do with a regular thread. 1041 + */ 1042 + static int run_osnoise(void) 1043 + { 1044 + struct osnoise_variables *osn_var = this_cpu_osn_var(); 1045 + struct trace_array *tr = osnoise_trace; 1046 + u64 start, sample, last_sample; 1047 + u64 last_int_count, int_count; 1048 + s64 noise = 0, max_noise = 0; 1049 + s64 total, last_total = 0; 1050 + struct osnoise_sample s; 1051 + unsigned int threshold; 1052 + u64 runtime, stop_in; 1053 + u64 sum_noise = 0; 1054 + int hw_count = 0; 1055 + int ret = -1; 1056 + 1057 + /* 1058 + * Considers the current thread as the workload. 1059 + */ 1060 + osn_var->pid = current->pid; 1061 + 1062 + /* 1063 + * Save the current stats for the diff 1064 + */ 1065 + save_osn_sample_stats(osn_var, &s); 1066 + 1067 + /* 1068 + * if threshold is 0, use the default value of 5 us. 1069 + */ 1070 + threshold = tracing_thresh ? : 5000; 1071 + 1072 + /* 1073 + * Make sure NMIs see sampling first 1074 + */ 1075 + osn_var->sampling = true; 1076 + barrier(); 1077 + 1078 + /* 1079 + * Transform the *_us config to nanoseconds to avoid the 1080 + * division on the main loop. 1081 + */ 1082 + runtime = osnoise_data.sample_runtime * NSEC_PER_USEC; 1083 + stop_in = osnoise_data.stop_tracing * NSEC_PER_USEC; 1084 + 1085 + /* 1086 + * Start timestamp 1087 + */ 1088 + start = time_get(); 1089 + 1090 + /* 1091 + * "previous" loop.
1092 + */ 1093 + last_int_count = set_int_safe_time(osn_var, &last_sample); 1094 + 1095 + do { 1096 + /* 1097 + * Get sample! 1098 + */ 1099 + int_count = set_int_safe_time(osn_var, &sample); 1100 + 1101 + noise = time_sub(sample, last_sample); 1102 + 1103 + /* 1104 + * This shouldn't happen. 1105 + */ 1106 + if (noise < 0) { 1107 + osnoise_taint("negative noise!"); 1108 + goto out; 1109 + } 1110 + 1111 + /* 1112 + * Sample runtime. 1113 + */ 1114 + total = time_sub(sample, start); 1115 + 1116 + /* 1117 + * Check for possible overflows. 1118 + */ 1119 + if (total < last_total) { 1120 + osnoise_taint("total overflow!"); 1121 + break; 1122 + } 1123 + 1124 + last_total = total; 1125 + 1126 + if (noise >= threshold) { 1127 + int interference = int_count - last_int_count; 1128 + 1129 + if (noise > max_noise) 1130 + max_noise = noise; 1131 + 1132 + if (!interference) 1133 + hw_count++; 1134 + 1135 + sum_noise += noise; 1136 + 1137 + trace_sample_threshold(last_sample, noise, interference); 1138 + 1139 + if (osnoise_data.stop_tracing) 1140 + if (noise > stop_in) 1141 + osnoise_stop_tracing(); 1142 + } 1143 + 1144 + /* 1145 + * For the non-preemptive kernel config: let threads run, if 1146 + * they so wish. 1147 + */ 1148 + cond_resched(); 1149 + 1150 + last_sample = sample; 1151 + last_int_count = int_count; 1152 + 1153 + } while (total < runtime && !kthread_should_stop()); 1154 + 1155 + /* 1156 + * Make sure interrupts see the above loop as finished. 1157 + */ 1158 + barrier(); 1159 + 1160 + osn_var->sampling = false; 1161 + 1162 + /* 1163 + * Make sure sampling data is no longer updated. 1164 + */ 1165 + barrier(); 1166 + 1167 + /* 1168 + * Save noise info.
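The loop above only records samples at or above the threshold, and a noisy sample with zero interrupt interference is counted in hw_count as likely hardware noise. A userspace replay of that aggregation over synthetic (noise, interference) pairs (names and values are illustrative only):

```c
#include <stdint.h>

struct sketch_result {
	int64_t max_noise;
	int64_t sum_noise;
	int hw_count;
};

/*
 * Aggregate the samples the way run_osnoise() does: ignore samples
 * below the threshold; otherwise track the max, accumulate the sum,
 * and count samples with no recorded interrupt interference.
 */
static struct sketch_result
sketch_aggregate(const int64_t *noise, const int *interference,
		 int n, int64_t threshold)
{
	struct sketch_result r = { 0, 0, 0 };

	for (int i = 0; i < n; i++) {
		if (noise[i] < threshold)
			continue;

		if (noise[i] > r.max_noise)
			r.max_noise = noise[i];

		/* noise with no interrupts: likely hardware-induced */
		if (!interference[i])
			r.hw_count++;

		r.sum_noise += noise[i];
	}
	return r;
}
```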
1169 + */ 1170 + s.noise = time_to_us(sum_noise); 1171 + s.runtime = time_to_us(total); 1172 + s.max_sample = time_to_us(max_noise); 1173 + s.hw_count = hw_count; 1174 + 1175 + /* Save interference stats info */ 1176 + diff_osn_sample_stats(osn_var, &s); 1177 + 1178 + trace_osnoise_sample(&s); 1179 + 1180 + /* Keep a running maximum ever recorded osnoise "latency" */ 1181 + if (max_noise > tr->max_latency) { 1182 + tr->max_latency = max_noise; 1183 + latency_fsnotify(tr); 1184 + } 1185 + 1186 + if (osnoise_data.stop_tracing_total) 1187 + if (s.noise > osnoise_data.stop_tracing_total) 1188 + osnoise_stop_tracing(); 1189 + 1190 + return 0; 1191 + out: 1192 + return ret; 1193 + } 1194 + 1195 + static struct cpumask osnoise_cpumask; 1196 + static struct cpumask save_cpumask; 1197 + 1198 + /* 1199 + * osnoise_main - The osnoise detection kernel thread 1200 + * 1201 + * Calls run_osnoise() function to measure the osnoise for the configured runtime, 1202 + * every period. 1203 + */ 1204 + static int osnoise_main(void *data) 1205 + { 1206 + u64 interval; 1207 + 1208 + while (!kthread_should_stop()) { 1209 + 1210 + run_osnoise(); 1211 + 1212 + mutex_lock(&interface_lock); 1213 + interval = osnoise_data.sample_period - osnoise_data.sample_runtime; 1214 + mutex_unlock(&interface_lock); 1215 + 1216 + do_div(interval, USEC_PER_MSEC); 1217 + 1218 + /* 1219 + * differently from hwlat_detector, the osnoise tracer can run 1220 + * without a pause because preemption is on. 1221 + */ 1222 + if (interval < 1) { 1223 + /* Let synchronize_rcu_tasks() make progress */ 1224 + cond_resched_tasks_rcu_qs(); 1225 + continue; 1226 + } 1227 + 1228 + if (msleep_interruptible(interval)) 1229 + break; 1230 + } 1231 + 1232 + return 0; 1233 + } 1234 + 1235 + #ifdef CONFIG_TIMERLAT_TRACER 1236 + /* 1237 + * timerlat_irq - hrtimer handler for timerlat. 
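In osnoise_main() above, the sleep between sampling windows is the period minus the runtime, converted from microseconds to milliseconds for msleep_interruptible(); intervals under 1 ms skip the sleep and only yield. The arithmetic, sketched with example values (no kernel defaults are assumed here):

```c
#include <stdint.h>

/*
 * Illustrative version of the interval computed by osnoise_main():
 * the gap between the sampling runtime and the sampling period,
 * converted from microseconds to milliseconds.
 */
static uint64_t sketch_sleep_interval_ms(uint64_t period_us,
					 uint64_t runtime_us)
{
	uint64_t interval = period_us - runtime_us;

	return interval / 1000; /* mirrors do_div(interval, USEC_PER_MSEC) */
}
```

For example, a 1 s period with a 600 ms runtime leaves a 400 ms sleep; a gap under 1000 us rounds down to 0 and the thread keeps sampling back to back.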
1238 + */ 1239 + static enum hrtimer_restart timerlat_irq(struct hrtimer *timer) 1240 + { 1241 + struct osnoise_variables *osn_var = this_cpu_osn_var(); 1242 + struct trace_array *tr = osnoise_trace; 1243 + struct timerlat_variables *tlat; 1244 + struct timerlat_sample s; 1245 + u64 now; 1246 + u64 diff; 1247 + 1248 + /* 1249 + * I am not sure if the timer was armed for this CPU. So, get 1250 + * the timerlat struct from the timer itself, not from this 1251 + * CPU. 1252 + */ 1253 + tlat = container_of(timer, struct timerlat_variables, timer); 1254 + 1255 + now = ktime_to_ns(hrtimer_cb_get_time(&tlat->timer)); 1256 + 1257 + /* 1258 + * Enable the osnoise: events for thread and softirq. 1259 + */ 1260 + tlat->tracing_thread = true; 1261 + 1262 + osn_var->thread.arrival_time = time_get(); 1263 + 1264 + /* 1265 + * A hardirq is running: the timer IRQ. It is for sure preempting 1266 + * a thread, and potentially preempting a softirq. 1267 + * 1268 + * At this point, it is not interesting to know the duration of the 1269 + * preempted thread (and maybe softirq), but how much time they will 1270 + * delay the beginning of the execution of the timer thread. 1271 + * 1272 + * To get the correct (net) delay added by the softirq, its delta_start 1273 + * is set as the IRQ one. In this way, at the return of the IRQ, the delta 1274 + * start of the softirq will be zeroed, accounting then only the time 1275 + * after that. 1276 + * 1277 + * The thread follows the same principle. However, if a softirq is 1278 + * running, the thread needs to receive the softirq delta_start. The 1279 + * reason is that the softirq will be the last to be unfolded, 1280 + * resetting the thread delay to zero.
1281 + */ 1282 + #ifndef CONFIG_PREEMPT_RT 1283 + if (osn_var->softirq.delta_start) { 1284 + copy_int_safe_time(osn_var, &osn_var->thread.delta_start, 1285 + &osn_var->softirq.delta_start); 1286 + 1287 + copy_int_safe_time(osn_var, &osn_var->softirq.delta_start, 1288 + &osn_var->irq.delta_start); 1289 + } else { 1290 + copy_int_safe_time(osn_var, &osn_var->thread.delta_start, 1291 + &osn_var->irq.delta_start); 1292 + } 1293 + #else /* CONFIG_PREEMPT_RT */ 1294 + /* 1295 + * The softirqs run as threads on RT, so there is no need 1296 + * to keep track of them. 1297 + */ 1298 + copy_int_safe_time(osn_var, &osn_var->thread.delta_start, &osn_var->irq.delta_start); 1299 + #endif /* CONFIG_PREEMPT_RT */ 1300 + 1301 + /* 1302 + * Compare the current time with the expected time. 1303 + */ 1304 + diff = now - tlat->abs_period; 1305 + 1306 + tlat->count++; 1307 + s.seqnum = tlat->count; 1308 + s.timer_latency = diff; 1309 + s.context = IRQ_CONTEXT; 1310 + 1311 + trace_timerlat_sample(&s); 1312 + 1313 + /* Keep a running maximum ever recorded os noise "latency" */ 1314 + if (diff > tr->max_latency) { 1315 + tr->max_latency = diff; 1316 + latency_fsnotify(tr); 1317 + } 1318 + 1319 + if (osnoise_data.stop_tracing) 1320 + if (time_to_us(diff) >= osnoise_data.stop_tracing) 1321 + osnoise_stop_tracing(); 1322 + 1323 + wake_up_process(tlat->kthread); 1324 + 1325 + if (osnoise_data.print_stack) 1326 + timerlat_save_stack(0); 1327 + 1328 + return HRTIMER_NORESTART; 1329 + } 1330 + 1331 + /* 1332 + * wait_next_period - Wait for the next period for timerlat 1333 + */ 1334 + static int wait_next_period(struct timerlat_variables *tlat) 1335 + { 1336 + ktime_t next_abs_period, now; 1337 + u64 rel_period = osnoise_data.timerlat_period * 1000; 1338 + 1339 + now = hrtimer_cb_get_time(&tlat->timer); 1340 + next_abs_period = ns_to_ktime(tlat->abs_period + rel_period); 1341 + 1342 + /* 1343 + * Save the next abs_period.
1344 + */ 1345 + tlat->abs_period = (u64) ktime_to_ns(next_abs_period); 1346 + 1347 + /* 1348 + * If the new abs_period is in the past, skip the activation. 1349 + */ 1350 + while (ktime_compare(now, next_abs_period) > 0) { 1351 + next_abs_period = ns_to_ktime(tlat->abs_period + rel_period); 1352 + tlat->abs_period = (u64) ktime_to_ns(next_abs_period); 1353 + } 1354 + 1355 + set_current_state(TASK_INTERRUPTIBLE); 1356 + 1357 + hrtimer_start(&tlat->timer, next_abs_period, HRTIMER_MODE_ABS_PINNED_HARD); 1358 + schedule(); 1359 + return 1; 1360 + } 1361 + 1362 + /* 1363 + * timerlat_main - Timerlat main 1364 + */ 1365 + static int timerlat_main(void *data) 1366 + { 1367 + struct osnoise_variables *osn_var = this_cpu_osn_var(); 1368 + struct timerlat_variables *tlat = this_cpu_tmr_var(); 1369 + struct timerlat_sample s; 1370 + struct sched_param sp; 1371 + u64 now, diff; 1372 + 1373 + /* 1374 + * Make the thread RT, that is how cyclictest is usually used. 1375 + */ 1376 + sp.sched_priority = DEFAULT_TIMERLAT_PRIO; 1377 + sched_setscheduler_nocheck(current, SCHED_FIFO, &sp); 1378 + 1379 + tlat->count = 0; 1380 + tlat->tracing_thread = false; 1381 + 1382 + hrtimer_init(&tlat->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED_HARD); 1383 + tlat->timer.function = timerlat_irq; 1384 + tlat->kthread = current; 1385 + osn_var->pid = current->pid; 1386 + /* 1387 + * Annotate the arrival time.
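The catch-up loop in wait_next_period() above keeps advancing the absolute expiry by one period until it lands in the future, so a late wakeup skips expired activations instead of firing them back to back. A plain-nanosecond sketch of that computation (no ktime helpers; names are illustrative):

```c
#include <stdint.h>

/*
 * Advance abs_period by rel_period; if the resulting expiry is already
 * in the past relative to `now`, keep skipping expired periods.
 */
static uint64_t sketch_next_abs_period(uint64_t abs_period,
				       uint64_t rel_period,
				       uint64_t now)
{
	uint64_t next = abs_period + rel_period;

	/* If the new abs_period is in the past, skip the activation. */
	while (now > next)
		next += rel_period;

	return next;
}
```

Anchoring each expiry to the previous absolute period (rather than to "now") keeps the timer phase-stable, the same reason the kernel code stores abs_period instead of rearming relative to the wakeup time.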
1388 + */ 1389 + tlat->abs_period = hrtimer_cb_get_time(&tlat->timer); 1390 + 1391 + wait_next_period(tlat); 1392 + 1393 + osn_var->sampling = 1; 1394 + 1395 + while (!kthread_should_stop()) { 1396 + now = ktime_to_ns(hrtimer_cb_get_time(&tlat->timer)); 1397 + diff = now - tlat->abs_period; 1398 + 1399 + s.seqnum = tlat->count; 1400 + s.timer_latency = diff; 1401 + s.context = THREAD_CONTEXT; 1402 + 1403 + trace_timerlat_sample(&s); 1404 + 1405 + #ifdef CONFIG_STACKTRACE 1406 + if (osnoise_data.print_stack) 1407 + if (osnoise_data.print_stack <= time_to_us(diff)) 1408 + timerlat_dump_stack(); 1409 + #endif /* CONFIG_STACKTRACE */ 1410 + 1411 + tlat->tracing_thread = false; 1412 + if (osnoise_data.stop_tracing_total) 1413 + if (time_to_us(diff) >= osnoise_data.stop_tracing_total) 1414 + osnoise_stop_tracing(); 1415 + 1416 + wait_next_period(tlat); 1417 + } 1418 + 1419 + hrtimer_cancel(&tlat->timer); 1420 + return 0; 1421 + } 1422 + #endif /* CONFIG_TIMERLAT_TRACER */ 1423 + 1424 + /* 1425 + * stop_kthread - stop a workload thread 1426 + */ 1427 + static void stop_kthread(unsigned int cpu) 1428 + { 1429 + struct task_struct *kthread; 1430 + 1431 + kthread = per_cpu(per_cpu_osnoise_var, cpu).kthread; 1432 + if (kthread) 1433 + kthread_stop(kthread); 1434 + per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL; 1435 + } 1436 + 1437 + /* 1438 + * stop_per_cpu_kthreads - Stop per-cpu threads 1439 + * 1440 + * Stop the osnoise sampling threads. Use this on unload and at system 1441 + * shutdown.
1442 + */ 1443 + static void stop_per_cpu_kthreads(void) 1444 + { 1445 + int cpu; 1446 + 1447 + get_online_cpus(); 1448 + 1449 + for_each_online_cpu(cpu) 1450 + stop_kthread(cpu); 1451 + 1452 + put_online_cpus(); 1453 + } 1454 + 1455 + /* 1456 + * start_kthread - Start a workload thread 1457 + */ 1458 + static int start_kthread(unsigned int cpu) 1459 + { 1460 + struct task_struct *kthread; 1461 + void *main = osnoise_main; 1462 + char comm[24]; 1463 + 1464 + #ifdef CONFIG_TIMERLAT_TRACER 1465 + if (osnoise_data.timerlat_tracer) { 1466 + snprintf(comm, 24, "timerlat/%d", cpu); 1467 + main = timerlat_main; 1468 + } else { 1469 + snprintf(comm, 24, "osnoise/%d", cpu); 1470 + } 1471 + #else 1472 + snprintf(comm, 24, "osnoise/%d", cpu); 1473 + #endif 1474 + kthread = kthread_create_on_cpu(main, NULL, cpu, comm); 1475 + 1476 + if (IS_ERR(kthread)) { 1477 + pr_err(BANNER "could not start sampling thread\n"); 1478 + stop_per_cpu_kthreads(); 1479 + return -ENOMEM; 1480 + } 1481 + 1482 + per_cpu(per_cpu_osnoise_var, cpu).kthread = kthread; 1483 + wake_up_process(kthread); 1484 + 1485 + return 0; 1486 + } 1487 + 1488 + /* 1489 + * start_per_cpu_kthreads - Kick off per-cpu osnoise sampling kthreads 1490 + * 1491 + * This starts the kernel threads that will look for osnoise on many 1492 + * cpus. 1493 + */ 1494 + static int start_per_cpu_kthreads(struct trace_array *tr) 1495 + { 1496 + struct cpumask *current_mask = &save_cpumask; 1497 + int retval; 1498 + int cpu; 1499 + 1500 + get_online_cpus(); 1501 + /* 1502 + * Run only on CPUs in which trace and osnoise are allowed to run. 1503 + */ 1504 + cpumask_and(current_mask, tr->tracing_cpumask, &osnoise_cpumask); 1505 + /* 1506 + * And the CPU is online.
1507 + */ 1508 + cpumask_and(current_mask, cpu_online_mask, current_mask); 1509 + 1510 + for_each_possible_cpu(cpu) 1511 + per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL; 1512 + 1513 + for_each_cpu(cpu, current_mask) { 1514 + retval = start_kthread(cpu); 1515 + if (retval) { 1516 + stop_per_cpu_kthreads(); 1517 + return retval; 1518 + } 1519 + } 1520 + 1521 + put_online_cpus(); 1522 + 1523 + return 0; 1524 + } 1525 + 1526 + #ifdef CONFIG_HOTPLUG_CPU 1527 + static void osnoise_hotplug_workfn(struct work_struct *dummy) 1528 + { 1529 + struct trace_array *tr = osnoise_trace; 1530 + unsigned int cpu = smp_processor_id(); 1531 + 1532 + 1533 + mutex_lock(&trace_types_lock); 1534 + 1535 + if (!osnoise_busy) 1536 + goto out_unlock_trace; 1537 + 1538 + mutex_lock(&interface_lock); 1539 + get_online_cpus(); 1540 + 1541 + if (!cpumask_test_cpu(cpu, &osnoise_cpumask)) 1542 + goto out_unlock; 1543 + 1544 + if (!cpumask_test_cpu(cpu, tr->tracing_cpumask)) 1545 + goto out_unlock; 1546 + 1547 + start_kthread(cpu); 1548 + 1549 + out_unlock: 1550 + put_online_cpus(); 1551 + mutex_unlock(&interface_lock); 1552 + out_unlock_trace: 1553 + mutex_unlock(&trace_types_lock); 1554 + } 1555 + 1556 + static DECLARE_WORK(osnoise_hotplug_work, osnoise_hotplug_workfn); 1557 + 1558 + /* 1559 + * osnoise_cpu_init - CPU hotplug online callback function 1560 + */ 1561 + static int osnoise_cpu_init(unsigned int cpu) 1562 + { 1563 + schedule_work_on(cpu, &osnoise_hotplug_work); 1564 + return 0; 1565 + } 1566 + 1567 + /* 1568 + * osnoise_cpu_die - CPU hotplug offline callback function 1569 + */ 1570 + static int osnoise_cpu_die(unsigned int cpu) 1571 + { 1572 + stop_kthread(cpu); 1573 + return 0; 1574 + } 1575 + 1576 + static void osnoise_init_hotplug_support(void) 1577 + { 1578 + int ret; 1579 + 1580 + ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "trace/osnoise:online", 1581 + osnoise_cpu_init, osnoise_cpu_die); 1582 + if (ret < 0) 1583 + pr_warn(BANNER "Failed to init CPU hotplug support\n"); 1584 +
1585 + return; 1586 + } 1587 + #else /* CONFIG_HOTPLUG_CPU */ 1588 + static void osnoise_init_hotplug_support(void) 1589 + { 1590 + return; 1591 + } 1592 + #endif /* CONFIG_HOTPLUG_CPU */ 1593 + 1594 + /* 1595 + * osnoise_cpus_read - Read function for reading the "cpus" file 1596 + * @filp: The active open file structure 1597 + * @ubuf: The userspace provided buffer to read value into 1598 + * @count: The maximum number of bytes to read 1599 + * @ppos: The current "file" position 1600 + * 1601 + * Prints the "cpus" output into the user-provided buffer. 1602 + */ 1603 + static ssize_t 1604 + osnoise_cpus_read(struct file *filp, char __user *ubuf, size_t count, 1605 + loff_t *ppos) 1606 + { 1607 + char *mask_str; 1608 + int len; 1609 + 1610 + mutex_lock(&interface_lock); 1611 + 1612 + len = snprintf(NULL, 0, "%*pbl\n", cpumask_pr_args(&osnoise_cpumask)) + 1; 1613 + mask_str = kmalloc(len, GFP_KERNEL); 1614 + if (!mask_str) { 1615 + count = -ENOMEM; 1616 + goto out_unlock; 1617 + } 1618 + 1619 + len = snprintf(mask_str, len, "%*pbl\n", cpumask_pr_args(&osnoise_cpumask)); 1620 + if (len >= count) { 1621 + count = -EINVAL; 1622 + goto out_free; 1623 + } 1624 + 1625 + count = simple_read_from_buffer(ubuf, count, ppos, mask_str, len); 1626 + 1627 + out_free: 1628 + kfree(mask_str); 1629 + out_unlock: 1630 + mutex_unlock(&interface_lock); 1631 + 1632 + return count; 1633 + } 1634 + 1635 + static void osnoise_tracer_start(struct trace_array *tr); 1636 + static void osnoise_tracer_stop(struct trace_array *tr); 1637 + 1638 + /* 1639 + * osnoise_cpus_write - Write function for "cpus" entry 1640 + * @filp: The active open file structure 1641 + * @ubuf: The user buffer that contains the value to write 1642 + * @count: The maximum number of bytes to write to "file" 1643 + * @ppos: The current position in @file 1644 + * 1645 + * This function provides a write implementation for the "cpus" 1646 + * interface to the osnoise trace.
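The "cpus" file described below is written as a CPU list (e.g. "0,2-4") and parsed with cpulist_parse(). As a rough userspace analogue of the accepted syntax (a toy parser for illustration only; the kernel routine also validates input, supports strides like "0-7:2", and handles masks wider than 64 bits):

```c
#include <stdint.h>
#include <stdlib.h>

/* Turn a list such as "0,2-4" into a 64-bit mask. Hypothetical helper,
 * not a kernel API. */
static uint64_t sketch_cpulist_parse(const char *s)
{
	uint64_t mask = 0;

	while (*s) {
		char *end;
		long a = strtol(s, &end, 10);
		long b = a;

		/* "a-b" range: parse the upper bound. */
		if (*end == '-')
			b = strtol(end + 1, &end, 10);

		for (long cpu = a; cpu <= b; cpu++)
			mask |= 1ULL << cpu;

		/* skip the comma separating entries, if any */
		s = (*end == ',') ? end + 1 : end;
	}
	return mask;
}
```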
By default, it lists all CPUs, 1647 + * allowing osnoise threads to run on any online CPU 1648 + * of the system. Writing a set of CPUs via this interface restricts 1649 + * the execution of osnoise to them. Note that osnoise also 1650 + * respects the "tracing_cpumask." Hence, osnoise threads will run only 1651 + * on the set of CPUs allowed here AND on "tracing_cpumask." Why not 1652 + * have just "tracing_cpumask?" Because the user might be interested 1653 + * in tracing what is running on other CPUs. For instance, one might 1654 + * run osnoise in one HT CPU while observing what is running on the 1655 + * sibling HT CPU. 1656 + */ 1657 + static ssize_t 1658 + osnoise_cpus_write(struct file *filp, const char __user *ubuf, size_t count, 1659 + loff_t *ppos) 1660 + { 1661 + struct trace_array *tr = osnoise_trace; 1662 + cpumask_var_t osnoise_cpumask_new; 1663 + int running, err; 1664 + char buf[256]; 1665 + 1666 + if (count >= 256) 1667 + return -EINVAL; 1668 + 1669 + if (copy_from_user(buf, ubuf, count)) 1670 + return -EFAULT; 1671 + 1672 + if (!zalloc_cpumask_var(&osnoise_cpumask_new, GFP_KERNEL)) 1673 + return -ENOMEM; 1674 + 1675 + err = cpulist_parse(buf, osnoise_cpumask_new); 1676 + if (err) 1677 + goto err_free; 1678 + 1679 + /* 1680 + * trace_types_lock is taken to avoid concurrency on start/stop 1681 + * and osnoise_busy. 1682 + */ 1683 + mutex_lock(&trace_types_lock); 1684 + running = osnoise_busy; 1685 + if (running) 1686 + osnoise_tracer_stop(tr); 1687 + 1688 + mutex_lock(&interface_lock); 1689 + /* 1690 + * osnoise_cpumask is read by CPU hotplug operations.
1691 + */ 1692 + get_online_cpus(); 1693 + 1694 + cpumask_copy(&osnoise_cpumask, osnoise_cpumask_new); 1695 + 1696 + put_online_cpus(); 1697 + mutex_unlock(&interface_lock); 1698 + 1699 + if (running) 1700 + osnoise_tracer_start(tr); 1701 + mutex_unlock(&trace_types_lock); 1702 + 1703 + free_cpumask_var(osnoise_cpumask_new); 1704 + return count; 1705 + 1706 + err_free: 1707 + free_cpumask_var(osnoise_cpumask_new); 1708 + 1709 + return err; 1710 + } 1711 + 1712 + /* 1713 + * osnoise/runtime_us: cannot be greater than the period. 1714 + */ 1715 + static struct trace_min_max_param osnoise_runtime = { 1716 + .lock = &interface_lock, 1717 + .val = &osnoise_data.sample_runtime, 1718 + .max = &osnoise_data.sample_period, 1719 + .min = NULL, 1720 + }; 1721 + 1722 + /* 1723 + * osnoise/period_us: cannot be smaller than the runtime. 1724 + */ 1725 + static struct trace_min_max_param osnoise_period = { 1726 + .lock = &interface_lock, 1727 + .val = &osnoise_data.sample_period, 1728 + .max = NULL, 1729 + .min = &osnoise_data.sample_runtime, 1730 + }; 1731 + 1732 + /* 1733 + * osnoise/stop_tracing_us: no limit. 1734 + */ 1735 + static struct trace_min_max_param osnoise_stop_tracing_in = { 1736 + .lock = &interface_lock, 1737 + .val = &osnoise_data.stop_tracing, 1738 + .max = NULL, 1739 + .min = NULL, 1740 + }; 1741 + 1742 + /* 1743 + * osnoise/stop_tracing_total_us: no limit. 1744 + */ 1745 + static struct trace_min_max_param osnoise_stop_tracing_total = { 1746 + .lock = &interface_lock, 1747 + .val = &osnoise_data.stop_tracing_total, 1748 + .max = NULL, 1749 + .min = NULL, 1750 + }; 1751 + 1752 + #ifdef CONFIG_TIMERLAT_TRACER 1753 + /* 1754 + * osnoise/print_stack: print the stacktrace of the IRQ handler if the total 1755 + * latency is higher than val. 
1756 + */ 1757 + static struct trace_min_max_param osnoise_print_stack = { 1758 + .lock = &interface_lock, 1759 + .val = &osnoise_data.print_stack, 1760 + .max = NULL, 1761 + .min = NULL, 1762 + }; 1763 + 1764 + /* 1765 + * osnoise/timerlat_period: min 100 us, max 1 s 1766 + */ 1767 + u64 timerlat_min_period = 100; 1768 + u64 timerlat_max_period = 1000000; 1769 + static struct trace_min_max_param timerlat_period = { 1770 + .lock = &interface_lock, 1771 + .val = &osnoise_data.timerlat_period, 1772 + .max = &timerlat_max_period, 1773 + .min = &timerlat_min_period, 1774 + }; 1775 + #endif 1776 + 1777 + static const struct file_operations cpus_fops = { 1778 + .open = tracing_open_generic, 1779 + .read = osnoise_cpus_read, 1780 + .write = osnoise_cpus_write, 1781 + .llseek = generic_file_llseek, 1782 + }; 1783 + 1784 + /* 1785 + * init_tracefs - A function to initialize the tracefs interface files 1786 + * 1787 + * This function creates entries in tracefs for "osnoise" and "timerlat". 1788 + * It creates these directories in the tracing directory, and within that 1789 + * directory the user can change and view the configs.
1790 + */ 1791 + static int init_tracefs(void) 1792 + { 1793 + struct dentry *top_dir; 1794 + struct dentry *tmp; 1795 + int ret; 1796 + 1797 + ret = tracing_init_dentry(); 1798 + if (ret) 1799 + return -ENOMEM; 1800 + 1801 + top_dir = tracefs_create_dir("osnoise", NULL); 1802 + if (!top_dir) 1803 + return 0; 1804 + 1805 + tmp = tracefs_create_file("period_us", 0640, top_dir, 1806 + &osnoise_period, &trace_min_max_fops); 1807 + if (!tmp) 1808 + goto err; 1809 + 1810 + tmp = tracefs_create_file("runtime_us", 0644, top_dir, 1811 + &osnoise_runtime, &trace_min_max_fops); 1812 + if (!tmp) 1813 + goto err; 1814 + 1815 + tmp = tracefs_create_file("stop_tracing_us", 0640, top_dir, 1816 + &osnoise_stop_tracing_in, &trace_min_max_fops); 1817 + if (!tmp) 1818 + goto err; 1819 + 1820 + tmp = tracefs_create_file("stop_tracing_total_us", 0640, top_dir, 1821 + &osnoise_stop_tracing_total, &trace_min_max_fops); 1822 + if (!tmp) 1823 + goto err; 1824 + 1825 + tmp = trace_create_file("cpus", 0644, top_dir, NULL, &cpus_fops); 1826 + if (!tmp) 1827 + goto err; 1828 + #ifdef CONFIG_TIMERLAT_TRACER 1829 + #ifdef CONFIG_STACKTRACE 1830 + tmp = tracefs_create_file("print_stack", 0640, top_dir, 1831 + &osnoise_print_stack, &trace_min_max_fops); 1832 + if (!tmp) 1833 + goto err; 1834 + #endif 1835 + 1836 + tmp = tracefs_create_file("timerlat_period_us", 0640, top_dir, 1837 + &timerlat_period, &trace_min_max_fops); 1838 + if (!tmp) 1839 + goto err; 1840 + #endif 1841 + 1842 + return 0; 1843 + 1844 + err: 1845 + tracefs_remove(top_dir); 1846 + return -ENOMEM; 1847 + } 1848 + 1849 + static int osnoise_hook_events(void) 1850 + { 1851 + int retval; 1852 + 1853 + /* 1854 + * Trace is already hooked, we are re-enabling from 1855 + * a stop_tracing_*. 
1856 + */ 1857 + if (trace_osnoise_callback_enabled) 1858 + return 0; 1859 + 1860 + retval = hook_irq_events(); 1861 + if (retval) 1862 + return -EINVAL; 1863 + 1864 + retval = hook_softirq_events(); 1865 + if (retval) 1866 + goto out_unhook_irq; 1867 + 1868 + retval = hook_thread_events(); 1869 + /* 1870 + * All fine! 1871 + */ 1872 + if (!retval) 1873 + return 0; 1874 + 1875 + unhook_softirq_events(); 1876 + out_unhook_irq: 1877 + unhook_irq_events(); 1878 + return -EINVAL; 1879 + } 1880 + 1881 + static int __osnoise_tracer_start(struct trace_array *tr) 1882 + { 1883 + int retval; 1884 + 1885 + osn_var_reset_all(); 1886 + 1887 + retval = osnoise_hook_events(); 1888 + if (retval) 1889 + return retval; 1890 + /* 1891 + * Make sure NMIs see the reset values. 1892 + */ 1893 + barrier(); 1894 + trace_osnoise_callback_enabled = true; 1895 + 1896 + retval = start_per_cpu_kthreads(tr); 1897 + if (retval) { 1898 + unhook_irq_events(); 1899 + return retval; 1900 + } 1901 + 1902 + osnoise_busy = true; 1903 + 1904 + return 0; 1905 + } 1906 + 1907 + static void osnoise_tracer_start(struct trace_array *tr) 1908 + { 1909 + int retval; 1910 + 1911 + if (osnoise_busy) 1912 + return; 1913 + 1914 + retval = __osnoise_tracer_start(tr); 1915 + if (retval) 1916 + pr_err(BANNER "Error starting osnoise tracer\n"); 1917 + 1918 + } 1919 + 1920 + static void osnoise_tracer_stop(struct trace_array *tr) 1921 + { 1922 + if (!osnoise_busy) 1923 + return; 1924 + 1925 + trace_osnoise_callback_enabled = false; 1926 + barrier(); 1927 + 1928 + stop_per_cpu_kthreads(); 1929 + 1930 + unhook_irq_events(); 1931 + unhook_softirq_events(); 1932 + unhook_thread_events(); 1933 + 1934 + osnoise_busy = false; 1935 + } 1936 + 1937 + static int osnoise_tracer_init(struct trace_array *tr) 1938 + { 1939 + 1940 + /* Only allow one instance to enable this */ 1941 + if (osnoise_busy) 1942 + return -EBUSY; 1943 + 1944 + osnoise_trace = tr; 1945 + tr->max_latency = 0; 1946 + 1947 + osnoise_tracer_start(tr); 1948 + 1949 
+ return 0; 1950 + } 1951 + 1952 + static void osnoise_tracer_reset(struct trace_array *tr) 1953 + { 1954 + osnoise_tracer_stop(tr); 1955 + } 1956 + 1957 + static struct tracer osnoise_tracer __read_mostly = { 1958 + .name = "osnoise", 1959 + .init = osnoise_tracer_init, 1960 + .reset = osnoise_tracer_reset, 1961 + .start = osnoise_tracer_start, 1962 + .stop = osnoise_tracer_stop, 1963 + .print_header = print_osnoise_headers, 1964 + .allow_instances = true, 1965 + }; 1966 + 1967 + #ifdef CONFIG_TIMERLAT_TRACER 1968 + static void timerlat_tracer_start(struct trace_array *tr) 1969 + { 1970 + int retval; 1971 + 1972 + if (osnoise_busy) 1973 + return; 1974 + 1975 + osnoise_data.timerlat_tracer = 1; 1976 + 1977 + retval = __osnoise_tracer_start(tr); 1978 + if (retval) 1979 + goto out_err; 1980 + 1981 + return; 1982 + out_err: 1983 + pr_err(BANNER "Error starting timerlat tracer\n"); 1984 + } 1985 + 1986 + static void timerlat_tracer_stop(struct trace_array *tr) 1987 + { 1988 + int cpu; 1989 + 1990 + if (!osnoise_busy) 1991 + return; 1992 + 1993 + for_each_online_cpu(cpu) 1994 + per_cpu(per_cpu_osnoise_var, cpu).sampling = 0; 1995 + 1996 + osnoise_tracer_stop(tr); 1997 + 1998 + osnoise_data.timerlat_tracer = 0; 1999 + } 2000 + 2001 + static int timerlat_tracer_init(struct trace_array *tr) 2002 + { 2003 + /* Only allow one instance to enable this */ 2004 + if (osnoise_busy) 2005 + return -EBUSY; 2006 + 2007 + osnoise_trace = tr; 2008 + 2009 + tr->max_latency = 0; 2010 + 2011 + timerlat_tracer_start(tr); 2012 + 2013 + return 0; 2014 + } 2015 + 2016 + static void timerlat_tracer_reset(struct trace_array *tr) 2017 + { 2018 + timerlat_tracer_stop(tr); 2019 + } 2020 + 2021 + static struct tracer timerlat_tracer __read_mostly = { 2022 + .name = "timerlat", 2023 + .init = timerlat_tracer_init, 2024 + .reset = timerlat_tracer_reset, 2025 + .start = timerlat_tracer_start, 2026 + .stop = timerlat_tracer_stop, 2027 + .print_header = print_timerlat_headers, 2028 + .allow_instances = 
true, 2029 + }; 2030 + #endif /* CONFIG_TIMERLAT_TRACER */ 2031 + 2032 + __init static int init_osnoise_tracer(void) 2033 + { 2034 + int ret; 2035 + 2036 + mutex_init(&interface_lock); 2037 + 2038 + cpumask_copy(&osnoise_cpumask, cpu_all_mask); 2039 + 2040 + ret = register_tracer(&osnoise_tracer); 2041 + if (ret) { 2042 + pr_err(BANNER "Error registering osnoise!\n"); 2043 + return ret; 2044 + } 2045 + 2046 + #ifdef CONFIG_TIMERLAT_TRACER 2047 + ret = register_tracer(&timerlat_tracer); 2048 + if (ret) { 2049 + pr_err(BANNER "Error registering timerlat\n"); 2050 + return ret; 2051 + } 2052 + #endif 2053 + osnoise_init_hotplug_support(); 2054 + 2055 + init_tracefs(); 2056 + 2057 + return 0; 2058 + } 2059 + late_initcall(init_osnoise_tracer);
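The runtime_us/period_us files above are coupled through trace_min_max_param: runtime's ->max points at the period value, period's ->min points at the runtime value, so a write can never make the runtime exceed the period. A minimal user-space sketch of that idea (the struct layout and param_write() helper are illustrative, not the kernel interface):

```c
#include <assert.h>
#include <stddef.h>

/* Coupled bounds: a parameter write is accepted only if it stays within
 * the optional limits pointed to by ->min/->max, which may themselves be
 * other parameters' values. */
struct min_max_param {
	unsigned long long *val;
	unsigned long long *max;	/* NULL: no upper bound */
	unsigned long long *min;	/* NULL: no lower bound */
};

static int param_write(struct min_max_param *p, unsigned long long new_val)
{
	if (p->max && new_val > *p->max)
		return -1;	/* e.g. runtime_us would exceed period_us */
	if (p->min && new_val < *p->min)
		return -1;	/* e.g. period_us would drop below runtime_us */
	*p->val = new_val;
	return 0;
}
```

With both values starting at 1000000 (assumed here purely for illustration), raising the runtime first fails; raising the period first then allows it.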
+118 -1
kernel/trace/trace_output.c
··· 1202 1202 return trace_handle_return(s); 1203 1203 } 1204 1204 1205 - 1206 1205 static enum print_line_t 1207 1206 trace_hwlat_raw(struct trace_iterator *iter, int flags, 1208 1207 struct trace_event *event) ··· 1229 1230 static struct trace_event trace_hwlat_event = { 1230 1231 .type = TRACE_HWLAT, 1231 1232 .funcs = &trace_hwlat_funcs, 1233 + }; 1234 + 1235 + /* TRACE_OSNOISE */ 1236 + static enum print_line_t 1237 + trace_osnoise_print(struct trace_iterator *iter, int flags, 1238 + struct trace_event *event) 1239 + { 1240 + struct trace_entry *entry = iter->ent; 1241 + struct trace_seq *s = &iter->seq; 1242 + struct osnoise_entry *field; 1243 + u64 ratio, ratio_dec; 1244 + u64 net_runtime; 1245 + 1246 + trace_assign_type(field, entry); 1247 + 1248 + /* 1249 + * compute the available % of cpu time. 1250 + */ 1251 + net_runtime = field->runtime - field->noise; 1252 + ratio = net_runtime * 10000000; 1253 + do_div(ratio, field->runtime); 1254 + ratio_dec = do_div(ratio, 100000); 1255 + 1256 + trace_seq_printf(s, "%llu %10llu %3llu.%05llu %7llu", 1257 + field->runtime, 1258 + field->noise, 1259 + ratio, ratio_dec, 1260 + field->max_sample); 1261 + 1262 + trace_seq_printf(s, " %6u", field->hw_count); 1263 + trace_seq_printf(s, " %6u", field->nmi_count); 1264 + trace_seq_printf(s, " %6u", field->irq_count); 1265 + trace_seq_printf(s, " %6u", field->softirq_count); 1266 + trace_seq_printf(s, " %6u", field->thread_count); 1267 + 1268 + trace_seq_putc(s, '\n'); 1269 + 1270 + return trace_handle_return(s); 1271 + } 1272 + 1273 + static enum print_line_t 1274 + trace_osnoise_raw(struct trace_iterator *iter, int flags, 1275 + struct trace_event *event) 1276 + { 1277 + struct osnoise_entry *field; 1278 + struct trace_seq *s = &iter->seq; 1279 + 1280 + trace_assign_type(field, iter->ent); 1281 + 1282 + trace_seq_printf(s, "%lld %llu %llu %u %u %u %u %u\n", 1283 + field->runtime, 1284 + field->noise, 1285 + field->max_sample, 1286 + field->hw_count, 1287 + field->nmi_count, 
1288 + field->irq_count, 1289 + field->softirq_count, 1290 + field->thread_count); 1291 + 1292 + return trace_handle_return(s); 1293 + } 1294 + 1295 + static struct trace_event_functions trace_osnoise_funcs = { 1296 + .trace = trace_osnoise_print, 1297 + .raw = trace_osnoise_raw, 1298 + }; 1299 + 1300 + static struct trace_event trace_osnoise_event = { 1301 + .type = TRACE_OSNOISE, 1302 + .funcs = &trace_osnoise_funcs, 1303 + }; 1304 + 1305 + /* TRACE_TIMERLAT */ 1306 + static enum print_line_t 1307 + trace_timerlat_print(struct trace_iterator *iter, int flags, 1308 + struct trace_event *event) 1309 + { 1310 + struct trace_entry *entry = iter->ent; 1311 + struct trace_seq *s = &iter->seq; 1312 + struct timerlat_entry *field; 1313 + 1314 + trace_assign_type(field, entry); 1315 + 1316 + trace_seq_printf(s, "#%-5u context %6s timer_latency %9llu ns\n", 1317 + field->seqnum, 1318 + field->context ? "thread" : "irq", 1319 + field->timer_latency); 1320 + 1321 + return trace_handle_return(s); 1322 + } 1323 + 1324 + static enum print_line_t 1325 + trace_timerlat_raw(struct trace_iterator *iter, int flags, 1326 + struct trace_event *event) 1327 + { 1328 + struct timerlat_entry *field; 1329 + struct trace_seq *s = &iter->seq; 1330 + 1331 + trace_assign_type(field, iter->ent); 1332 + 1333 + trace_seq_printf(s, "%u %d %llu\n", 1334 + field->seqnum, 1335 + field->context, 1336 + field->timer_latency); 1337 + 1338 + return trace_handle_return(s); 1339 + } 1340 + 1341 + static struct trace_event_functions trace_timerlat_funcs = { 1342 + .trace = trace_timerlat_print, 1343 + .raw = trace_timerlat_raw, 1344 + }; 1345 + 1346 + static struct trace_event trace_timerlat_event = { 1347 + .type = TRACE_TIMERLAT, 1348 + .funcs = &trace_timerlat_funcs, 1232 1349 }; 1233 1350 1234 1351 /* TRACE_BPUTS */ ··· 1557 1442 &trace_bprint_event, 1558 1443 &trace_print_event, 1559 1444 &trace_hwlat_event, 1445 + &trace_osnoise_event, 1446 + &trace_timerlat_event, 1560 1447 &trace_raw_data_event, 
1561 1448 &trace_func_repeats_event, 1562 1449 NULL
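The "available % of CPU" printed by trace_osnoise_print() above is an integer-only computation: scale net runtime by 10^7, divide by the total runtime, then split off five decimal digits. The kernel's do_div() divides in place and returns the remainder; a user-space rework with plain 64-bit arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Split the available-CPU percentage into an integer part (ratio) and
 * five decimal digits (ratio_dec), matching the %3llu.%05llu format in
 * the hunk above. */
static void osnoise_ratio(uint64_t runtime, uint64_t noise,
			  uint64_t *ratio, uint64_t *ratio_dec)
{
	uint64_t net_runtime = runtime - noise;
	uint64_t r = net_runtime * 10000000;

	r /= runtime;			/* do_div(ratio, field->runtime) */
	*ratio_dec = r % 100000;	/* remainder of do_div(ratio, 100000) */
	*ratio = r / 100000;
}
```

For example, a 1,000,000 ns sample with 5,000 ns of noise yields 99.50000.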
+12 -12
kernel/trace/trace_sched_wakeup.c
··· 26 26 static int wakeup_cpu; 27 27 static int wakeup_current_cpu; 28 28 static unsigned wakeup_prio = -1; 29 - static int wakeup_rt; 30 - static int wakeup_dl; 31 - static int tracing_dl = 0; 29 + static bool wakeup_rt; 30 + static bool wakeup_dl; 31 + static bool tracing_dl; 32 32 33 33 static arch_spinlock_t wakeup_lock = 34 34 (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; ··· 498 498 { 499 499 wakeup_cpu = -1; 500 500 wakeup_prio = -1; 501 - tracing_dl = 0; 501 + tracing_dl = false; 502 502 503 503 if (wakeup_task) 504 504 put_task_struct(wakeup_task); ··· 572 572 * another task until the first one wakes up. 573 573 */ 574 574 if (dl_task(p)) 575 - tracing_dl = 1; 575 + tracing_dl = true; 576 576 else 577 - tracing_dl = 0; 577 + tracing_dl = false; 578 578 579 579 wakeup_task = get_task_struct(p); 580 580 ··· 685 685 if (wakeup_busy) 686 686 return -EBUSY; 687 687 688 - wakeup_dl = 0; 689 - wakeup_rt = 0; 688 + wakeup_dl = false; 689 + wakeup_rt = false; 690 690 return __wakeup_tracer_init(tr); 691 691 } 692 692 ··· 695 695 if (wakeup_busy) 696 696 return -EBUSY; 697 697 698 - wakeup_dl = 0; 699 - wakeup_rt = 1; 698 + wakeup_dl = false; 699 + wakeup_rt = true; 700 700 return __wakeup_tracer_init(tr); 701 701 } 702 702 ··· 705 705 if (wakeup_busy) 706 706 return -EBUSY; 707 707 708 - wakeup_dl = 1; 709 - wakeup_rt = 0; 708 + wakeup_dl = true; 709 + wakeup_rt = false; 710 710 return __wakeup_tracer_init(tr); 711 711 } 712 712
+30 -3
kernel/tracepoint.c
··· 273 273 * Add the probe function to a tracepoint. 274 274 */ 275 275 static int tracepoint_add_func(struct tracepoint *tp, 276 - struct tracepoint_func *func, int prio) 276 + struct tracepoint_func *func, int prio, 277 + bool warn) 277 278 { 278 279 struct tracepoint_func *old, *tp_funcs; 279 280 int ret; ··· 289 288 lockdep_is_held(&tracepoints_mutex)); 290 289 old = func_add(&tp_funcs, func, prio); 291 290 if (IS_ERR(old)) { 292 - WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM); 291 + WARN_ON_ONCE(warn && PTR_ERR(old) != -ENOMEM); 293 292 return PTR_ERR(old); 294 293 } 295 294 ··· 345 344 } 346 345 347 346 /** 347 + * tracepoint_probe_register_prio_may_exist - Connect a probe to a tracepoint with priority 348 + * @tp: tracepoint 349 + * @probe: probe handler 350 + * @data: tracepoint data 351 + * @prio: priority of this function over other registered functions 352 + * 353 + * Same as tracepoint_probe_register_prio() except that it will not warn 354 + * if the tracepoint is already registered. 355 + */ 356 + int tracepoint_probe_register_prio_may_exist(struct tracepoint *tp, void *probe, 357 + void *data, int prio) 358 + { 359 + struct tracepoint_func tp_func; 360 + int ret; 361 + 362 + mutex_lock(&tracepoints_mutex); 363 + tp_func.func = probe; 364 + tp_func.data = data; 365 + tp_func.prio = prio; 366 + ret = tracepoint_add_func(tp, &tp_func, prio, false); 367 + mutex_unlock(&tracepoints_mutex); 368 + return ret; 369 + } 370 + EXPORT_SYMBOL_GPL(tracepoint_probe_register_prio_may_exist); 371 + 372 + /** 348 373 * tracepoint_probe_register_prio - Connect a probe to a tracepoint with priority 349 374 * @tp: tracepoint 350 375 * @probe: probe handler ··· 393 366 tp_func.func = probe; 394 367 tp_func.data = data; 395 368 tp_func.prio = prio; 396 - ret = tracepoint_add_func(tp, &tp_func, prio); 369 + ret = tracepoint_add_func(tp, &tp_func, prio, true); 397 370 mutex_unlock(&tracepoints_mutex); 398 371 return ret; 399 372 }
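The 'warn' flag threaded through tracepoint_add_func() above changes only who complains about a duplicate probe: the normal register path still WARNs on any error other than -ENOMEM, while the new *_may_exist variant passes warn=false so the -EEXIST that BPF can legally trigger from user space stays silent. A toy model of that behavior (the probe table and helper are illustrative, not the kernel's tracepoint_func handling):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct probe { void *func; void *data; };

static bool warned;	/* stands in for WARN_ON_ONCE() firing */

/* Add (func, data) unless it is already present; duplicates fail with
 * -EEXIST and only warn when the caller asked for it. */
static int probe_add(struct probe *tab, int *nr, struct probe p, bool warn)
{
	for (int i = 0; i < *nr; i++) {
		if (tab[i].func == p.func && tab[i].data == p.data) {
			if (warn)	/* WARN_ON_ONCE(warn && err != -ENOMEM) */
				warned = true;
			return -EEXIST;
		}
	}
	tab[(*nr)++] = p;
	return 0;
}
```

A duplicate add with warn=false returns -EEXIST without tripping the warning; the same add with warn=true would.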
+58 -18
lib/bootconfig.c
··· 156 156 struct xbc_node *node; 157 157 158 158 if (parent) 159 - node = xbc_node_get_child(parent); 159 + node = xbc_node_get_subkey(parent); 160 160 else 161 161 node = xbc_root_node(); 162 162 ··· 164 164 if (!xbc_node_match_prefix(node, &key)) 165 165 node = xbc_node_get_next(node); 166 166 else if (*key != '\0') 167 - node = xbc_node_get_child(node); 167 + node = xbc_node_get_subkey(node); 168 168 else 169 169 break; 170 170 } ··· 274 274 struct xbc_node * __init xbc_node_find_next_leaf(struct xbc_node *root, 275 275 struct xbc_node *node) 276 276 { 277 + struct xbc_node *next; 278 + 277 279 if (unlikely(!xbc_data)) 278 280 return NULL; 279 281 ··· 284 282 if (!node) 285 283 node = xbc_nodes; 286 284 } else { 285 + /* Leaf node may have a subkey */ 286 + next = xbc_node_get_subkey(node); 287 + if (next) { 288 + node = next; 289 + goto found; 290 + } 291 + 287 292 if (node == root) /* @root was a leaf, no child node. */ 288 293 return NULL; 289 294 ··· 305 296 node = xbc_node_get_next(node); 306 297 } 307 298 299 + found: 308 300 while (node && !xbc_node_is_leaf(node)) 309 301 node = xbc_node_get_child(node); 310 302 ··· 377 367 return node; 378 368 } 379 369 380 - static struct xbc_node * __init xbc_add_sibling(char *data, u32 flag) 370 + static inline __init struct xbc_node *xbc_last_child(struct xbc_node *node) 371 + { 372 + while (node->child) 373 + node = xbc_node_get_child(node); 374 + 375 + return node; 376 + } 377 + 378 + static struct xbc_node * __init __xbc_add_sibling(char *data, u32 flag, bool head) 381 379 { 382 380 struct xbc_node *sib, *node = xbc_add_node(data, flag); 383 381 384 382 if (node) { 385 383 if (!last_parent) { 384 + /* Ignore @head in this case */ 386 385 node->parent = XBC_NODE_MAX; 387 386 sib = xbc_last_sibling(xbc_nodes); 388 387 sib->next = xbc_node_index(node); 389 388 } else { 390 389 node->parent = xbc_node_index(last_parent); 391 - if (!last_parent->child) { 390 + if (!last_parent->child || head) { 391 + node->next = 
last_parent->child; 392 392 last_parent->child = xbc_node_index(node); 393 393 } else { 394 394 sib = xbc_node_get_child(last_parent); ··· 410 390 xbc_parse_error("Too many nodes", data); 411 391 412 392 return node; 393 + } 394 + 395 + static inline struct xbc_node * __init xbc_add_sibling(char *data, u32 flag) 396 + { 397 + return __xbc_add_sibling(data, flag, false); 398 + } 399 + 400 + static inline struct xbc_node * __init xbc_add_head_sibling(char *data, u32 flag) 401 + { 402 + return __xbc_add_sibling(data, flag, true); 413 403 } 414 404 415 405 static inline __init struct xbc_node *xbc_add_child(char *data, u32 flag) ··· 547 517 char *next; 548 518 int c = 0; 549 519 520 + if (last_parent->child) 521 + last_parent = xbc_node_get_child(last_parent); 522 + 550 523 do { 551 524 c = __xbc_parse_value(__v, &next); 552 525 if (c < 0) 553 526 return c; 554 527 555 - node = xbc_add_sibling(*__v, XBC_VALUE); 528 + node = xbc_add_child(*__v, XBC_VALUE); 556 529 if (!node) 557 530 return -ENOMEM; 558 531 *__v = next; 559 532 } while (c == ','); 560 - node->next = 0; 533 + node->child = 0; 561 534 562 535 return c; 563 536 } ··· 590 557 node = find_match_node(xbc_nodes, k); 591 558 else { 592 559 child = xbc_node_get_child(last_parent); 560 + /* Since the value node is the first child, skip it. 
*/ 593 561 if (child && xbc_node_is_value(child)) 594 - return xbc_parse_error("Subkey is mixed with value", k); 562 + child = xbc_node_get_next(child); 595 563 node = find_match_node(child, k); 596 564 } 597 565 ··· 635 601 if (ret) 636 602 return ret; 637 603 638 - child = xbc_node_get_child(last_parent); 639 - if (child) { 640 - if (xbc_node_is_key(child)) 641 - return xbc_parse_error("Value is mixed with subkey", v); 642 - else if (op == '=') 643 - return xbc_parse_error("Value is redefined", v); 644 - } 645 - 646 604 c = __xbc_parse_value(&v, &next); 647 605 if (c < 0) 648 606 return c; 649 607 650 - if (op == ':' && child) { 651 - xbc_init_node(child, v, XBC_VALUE); 652 - } else if (!xbc_add_sibling(v, XBC_VALUE)) 608 + child = xbc_node_get_child(last_parent); 609 + if (child && xbc_node_is_value(child)) { 610 + if (op == '=') 611 + return xbc_parse_error("Value is redefined", v); 612 + if (op == ':') { 613 + unsigned short nidx = child->next; 614 + 615 + xbc_init_node(child, v, XBC_VALUE); 616 + child->next = nidx; /* keep subkeys */ 617 + goto array; 618 + } 619 + /* op must be '+' */ 620 + last_parent = xbc_last_child(child); 621 + } 622 + /* The value node should always be the first child */ 623 + if (!xbc_add_head_sibling(v, XBC_VALUE)) 653 624 return -ENOMEM; 654 625 626 + array: 655 627 if (c == ',') { /* Array */ 656 628 c = xbc_parse_array(&next); 657 629 if (c < 0)
+6 -2
lib/seq_buf.c
··· 229 229 230 230 WARN_ON(s->size == 0); 231 231 232 + BUILD_BUG_ON(MAX_MEMHEX_BYTES * 2 >= HEX_CHARS); 233 + 232 234 while (len) { 233 - start_len = min(len, HEX_CHARS - 1); 235 + start_len = min(len, MAX_MEMHEX_BYTES); 234 236 #ifdef __BIG_ENDIAN 235 237 for (i = 0, j = 0; i < start_len; i++) { 236 238 #else ··· 245 243 break; 246 244 247 245 /* j increments twice per loop */ 248 - len -= j / 2; 249 246 hex[j++] = ' '; 250 247 251 248 seq_buf_putmem(s, hex, j); 252 249 if (seq_buf_has_overflowed(s)) 253 250 return -1; 251 + 252 + len -= start_len; 253 + data += start_len; 254 254 } 255 255 return 0; 256 256 }
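The seq_buf fix above replaces `len -= j / 2` with an advance by the bytes actually consumed: the output index j also counts the trailing space, so deriving the input advance from it walked long buffers incorrectly, and `data` was never advanced at all. A user-space sketch of the corrected loop (byte-order handling, i.e. the __BIG_ENDIAN branch, is elided):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MEMHEX_BYTES 8

/* Hex-encode a buffer in MAX_MEMHEX_BYTES chunks, one space after each
 * chunk, advancing len and data by the bytes consumed per iteration. */
static void putmem_hex(char *out, const unsigned char *data, size_t len)
{
	while (len) {
		size_t start_len = len < MAX_MEMHEX_BYTES ? len : MAX_MEMHEX_BYTES;

		for (size_t i = 0; i < start_len; i++)
			out += sprintf(out, "%02x", data[i]);
		*out++ = ' ';		/* one separator per chunk */

		len -= start_len;	/* advance by bytes consumed... */
		data += start_len;	/* ...not by characters emitted */
	}
	*out = '\0';
}
```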
+40 -24
tools/bootconfig/main.c
··· 27 27 q = '\''; 28 28 else 29 29 q = '"'; 30 - printf("%c%s%c%s", q, val, q, node->next ? ", " : eol); 30 + printf("%c%s%c%s", q, val, q, xbc_node_is_array(node) ? ", " : eol); 31 31 i++; 32 32 } 33 33 return i; ··· 35 35 36 36 static void xbc_show_compact_tree(void) 37 37 { 38 - struct xbc_node *node, *cnode; 38 + struct xbc_node *node, *cnode = NULL, *vnode; 39 39 int depth = 0, i; 40 40 41 41 node = xbc_root_node(); 42 42 while (node && xbc_node_is_key(node)) { 43 43 for (i = 0; i < depth; i++) 44 44 printf("\t"); 45 - cnode = xbc_node_get_child(node); 45 + if (!cnode) 46 + cnode = xbc_node_get_child(node); 46 47 while (cnode && xbc_node_is_key(cnode) && !cnode->next) { 48 + vnode = xbc_node_get_child(cnode); 49 + /* 50 + * If @cnode has value and subkeys, this 51 + * should show it as below. 52 + * 53 + * key(@node) { 54 + * key(@cnode) = value; 55 + * key(@cnode) { 56 + * subkeys; 57 + * } 58 + * } 59 + */ 60 + if (vnode && xbc_node_is_value(vnode) && vnode->next) 61 + break; 47 62 printf("%s.", xbc_node_get_data(node)); 48 63 node = cnode; 49 - cnode = xbc_node_get_child(node); 64 + cnode = vnode; 50 65 } 51 66 if (cnode && xbc_node_is_key(cnode)) { 52 67 printf("%s {\n", xbc_node_get_data(node)); 53 68 depth++; 54 69 node = cnode; 70 + cnode = NULL; 55 71 continue; 56 72 } else if (cnode && xbc_node_is_value(cnode)) { 57 73 printf("%s = ", xbc_node_get_data(node)); 58 74 xbc_show_value(cnode, true); 75 + /* 76 + * If @node has value and subkeys, continue 77 + * looping on subkeys with same node. 
78 + */ 79 + if (cnode->next) { 80 + cnode = xbc_node_get_next(cnode); 81 + continue; 82 + } 59 83 } else { 60 84 printf("%s;\n", xbc_node_get_data(node)); 61 85 } 86 + cnode = NULL; 62 87 63 88 if (node->next) { 64 89 node = xbc_node_get_next(node); ··· 95 70 return; 96 71 if (!xbc_node_get_child(node)->next) 97 72 continue; 98 - depth--; 99 - for (i = 0; i < depth; i++) 100 - printf("\t"); 101 - printf("}\n"); 73 + if (depth) { 74 + depth--; 75 + for (i = 0; i < depth; i++) 76 + printf("\t"); 77 + printf("}\n"); 78 + } 102 79 } 103 80 node = xbc_node_get_next(node); 104 81 } ··· 111 84 char key[XBC_KEYLEN_MAX]; 112 85 struct xbc_node *leaf; 113 86 const char *val; 114 87 int ret; 115 88 116 89 xbc_for_each_key_value(leaf, val) { 117 90 ret = xbc_node_compose_key(leaf, key, XBC_KEYLEN_MAX); 118 - if (ret < 0) 91 + if (ret < 0) { 92 + fprintf(stderr, "Failed to compose key %d\n", ret); 119 93 break; 94 + } 120 95 printf("%s = ", key); 121 96 if (!val || val[0] == '\0') { 122 97 printf("\"\"\n"); ··· 124 99 } 125 100 xbc_show_value(xbc_node_get_child(leaf), false); 126 101 } 127 - } 128 - 129 - /* Simple real checksum */ 130 - static int checksum(unsigned char *buf, int len) 131 - { 132 - int i, sum = 0; 133 - 134 - for (i = 0; i < len; i++) 135 - sum += buf[i]; 136 - 137 - return sum; 138 102 } 139 103 140 104 #define PAGE_SIZE 4096 ··· 221 205 return ret; 222 206 223 207 /* Wrong Checksum */ 224 - rcsum = checksum((unsigned char *)*buf, size); 208 + rcsum = xbc_calc_checksum(*buf, size); 225 209 if (csum != rcsum) { 226 210 pr_err("checksum error: %d != %d\n", csum, rcsum); 227 211 return -EINVAL; ··· 370 354 return ret; 371 355 } 372 356 size = strlen(buf) + 1; 373 - csum = checksum((unsigned char *)buf, size); 357 + csum = xbc_calc_checksum(buf, size); 374 358 375 359 /* Backup the bootconfig data */ 376 360 data = calloc(size + BOOTCONFIG_ALIGN +
tools/bootconfig/samples/bad-mixed-kv1.bconf tools/bootconfig/samples/good-mixed-kv1.bconf
tools/bootconfig/samples/bad-mixed-kv2.bconf tools/bootconfig/samples/good-mixed-kv2.bconf
-3
tools/bootconfig/samples/bad-override.bconf
··· 1 - key.subkey = value 2 - # We can not override pre-defined subkeys with value 3 - key := value
-3
tools/bootconfig/samples/bad-override2.bconf
··· 1 - key = value 2 - # We can not override pre-defined value with subkey 3 - key.subkey := value
+4
tools/bootconfig/samples/good-mixed-append.bconf
··· 1 + key = foo 2 + keyx.subkey = value 3 + key += bar 4 +
+6
tools/bootconfig/samples/good-mixed-kv3.bconf
··· 1 + # mixed key and subkeys with braces 2 + key = value 3 + key { 4 + subkey1 5 + subkey2 = foo 6 + }
+4
tools/bootconfig/samples/good-mixed-override.bconf
··· 1 + key.foo = bar 2 + key = value 3 + # mixed key value can be overridden 4 + key := value2
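The sample files above exercise the three bootconfig assignment operators that the lib/bootconfig.c changes in this pull make consistent: `=` fails if the key already has a value ("Value is redefined"), `:=` overrides unconditionally, and `+=` appends. An illustrative single-key model of those semantics in C (a toy, not the xbc parser):

```c
#include <assert.h>
#include <string.h>

enum kv_op { OP_ASSIGN, OP_OVERRIDE, OP_APPEND };	/* '=', ':=', '+=' */

/* Apply one assignment to a value buffer of capacity cap; appended
 * entries are joined with ", " the way bootconfig arrays print. */
static int kv_apply(char *val, size_t cap, enum kv_op op, const char *new_val)
{
	switch (op) {
	case OP_ASSIGN:
		if (val[0] != '\0')
			return -1;	/* "Value is redefined" */
		break;
	case OP_OVERRIDE:
		val[0] = '\0';		/* drop the old value */
		break;
	case OP_APPEND:
		break;			/* keep the old entries */
	}
	if (val[0] != '\0')
		strncat(val, ", ", cap - strlen(val) - 1);
	strncat(val, new_val, cap - strlen(val) - 1);
	return 0;
}
```

Applied to good-mixed-override.bconf above: `key = value` succeeds, a second `=` would fail, and `key := value2` replaces the value.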
+49
tools/testing/ktest/examples/bootconfigs/boottrace.bconf
··· 1 + ftrace.event { 2 + task.task_newtask { 3 + filter = "pid < 128" 4 + enable 5 + } 6 + kprobes.vfs_read { 7 + probes = "vfs_read $arg1 $arg2" 8 + filter = "common_pid < 200" 9 + enable 10 + } 11 + synthetic.initcall_latency { 12 + fields = "unsigned long func", "u64 lat" 13 + actions = "hist:keys=func.sym,lat:vals=lat:sort=lat" 14 + } 15 + initcall.initcall_start { 16 + actions = "hist:keys=func:ts0=common_timestamp.usecs" 17 + } 18 + initcall.initcall_finish { 19 + actions = "hist:keys=func:lat=common_timestamp.usecs-$ts0:onmatch(initcall.initcall_start).initcall_latency(func,$lat)" 20 + } 21 + } 22 + 23 + ftrace.instance { 24 + foo { 25 + tracer = "function" 26 + ftrace.filters = "user_*" 27 + cpumask = 1 28 + options = nosym-addr 29 + buffer_size = 512KB 30 + trace_clock = mono 31 + event.signal.signal_deliver.actions=snapshot 32 + } 33 + bar { 34 + tracer = "function" 35 + ftrace.filters = "kernel_*" 36 + cpumask = 2 37 + trace_clock = x86-tsc 38 + } 39 + } 40 + 41 + ftrace.alloc_snapshot 42 + 43 + kernel { 44 + trace_options = sym-addr 45 + trace_event = "initcall:*" 46 + trace_buf_size = 1M 47 + ftrace = function 48 + ftrace_filter = "vfs*" 49 + }
+1
tools/testing/ktest/examples/bootconfigs/config-bootconfig
··· 1 + CONFIG_CMDLINE="bootconfig"
+15
tools/testing/ktest/examples/bootconfigs/functiongraph.bconf
··· 1 + ftrace { 2 + tracing_on = 0 # off by default 3 + tracer = function_graph 4 + event.kprobes { 5 + start_event { 6 + probes = "pci_proc_init" 7 + actions = "traceon" 8 + } 9 + end_event { 10 + probes = "pci_proc_init%return" 11 + actions = "traceoff" 12 + } 13 + } 14 + } 15 +
+33
tools/testing/ktest/examples/bootconfigs/tracing.bconf
··· 1 + ftrace { 2 + tracer = function_graph; 3 + options = event-fork, sym-addr, stacktrace; 4 + buffer_size = 1M; 5 + alloc_snapshot; 6 + trace_clock = global; 7 + events = "task:task_newtask", "initcall:*"; 8 + event.sched.sched_process_exec { 9 + filter = "pid < 128"; 10 + } 11 + instance.bar { 12 + event.kprobes { 13 + myevent { 14 + probes = "vfs_read $arg2 $arg3"; 15 + } 16 + myevent2 { 17 + probes = "vfs_write $arg2 +0($arg2):ustring $arg3"; 18 + } 19 + myevent3 { 20 + probes = "initrd_load"; 21 + } 22 + enable 23 + } 24 + } 25 + instance.foo { 26 + tracer = function; 27 + tracing_on = false; 28 + }; 29 + } 30 + kernel { 31 + ftrace_dump_on_oops = "orig_cpu" 32 + traceoff_on_warning 33 + }
+84
tools/testing/ktest/examples/bootconfigs/verify-boottrace.sh
··· 1 + #!/bin/sh 2 + 3 + cd /sys/kernel/tracing 4 + 5 + compare_file() { 6 + file="$1" 7 + val="$2" 8 + content=`cat $file` 9 + if [ "$content" != "$val" ]; then 10 + echo "FAILED: $file has '$content', expected '$val'" 11 + exit 1 12 + fi 13 + } 14 + 15 + compare_file_partial() { 16 + file="$1" 17 + val="$2" 18 + content=`cat $file | sed -ne "/^$val/p"` 19 + if [ -z "$content" ]; then 20 + echo "FAILED: $file does not contain '$val'" 21 + cat $file 22 + exit 1 23 + fi 24 + } 25 + 26 + file_contains() { 27 + file=$1 28 + val="$2" 29 + 30 + if ! grep -q "$val" $file ; then 31 + echo "FAILED: $file does not contain $val" 32 + cat $file 33 + exit 1 34 + fi 35 + } 36 + 37 + compare_mask() { 38 + file=$1 39 + val="$2" 40 + 41 + content=`cat $file | sed -ne "/^[0 ]*$val/p"` 42 + if [ -z "$content" ]; then 43 + echo "FAILED: $file does not have mask '$val'" 44 + cat $file 45 + exit 1 46 + fi 47 + } 48 + 49 + compare_file "events/task/task_newtask/filter" "pid < 128" 50 + compare_file "events/task/task_newtask/enable" "1" 51 + 52 + compare_file "events/kprobes/vfs_read/filter" "common_pid < 200" 53 + compare_file "events/kprobes/vfs_read/enable" "1" 54 + 55 + compare_file_partial "events/synthetic/initcall_latency/trigger" "hist:keys=func.sym,lat:vals=hitcount,lat:sort=lat" 56 + compare_file_partial "events/synthetic/initcall_latency/enable" "0" 57 + 58 + compare_file_partial "events/initcall/initcall_start/trigger" "hist:keys=func:vals=hitcount:ts0=common_timestamp.usecs" 59 + compare_file_partial "events/initcall/initcall_start/enable" "1" 60 + 61 + compare_file_partial "events/initcall/initcall_finish/trigger" 'hist:keys=func:vals=hitcount:lat=common_timestamp.usecs-\$ts0:sort=hitcount:size=2048:clock=global:onmatch(initcall.initcall_start).initcall_latency(func,\$lat)' 62 + compare_file_partial "events/initcall/initcall_finish/enable" "1" 63 + 64 + compare_file "instances/foo/current_tracer" "function" 65 + file_contains "instances/foo/set_ftrace_filter" "^user" 66 + 
compare_file "instances/foo/buffer_size_kb" "512" 67 + compare_mask "instances/foo/tracing_cpumask" "1" 68 + compare_file "instances/foo/options/sym-addr" "0" 69 + file_contains "instances/foo/trace_clock" '\[mono\]' 70 + compare_file_partial "instances/foo/events/signal/signal_deliver/trigger" "snapshot" 71 + 72 + compare_file "instances/bar/current_tracer" "function" 73 + file_contains "instances/bar/set_ftrace_filter" "^kernel" 74 + compare_mask "instances/bar/tracing_cpumask" "2" 75 + file_contains "instances/bar/trace_clock" '\[x86-tsc\]' 76 + 77 + file_contains "snapshot" "Snapshot is allocated" 78 + compare_file "options/sym-addr" "1" 79 + compare_file "events/initcall/enable" "1" 80 + compare_file "buffer_size_kb" "1027" 81 + compare_file "current_tracer" "function" 82 + file_contains "set_ftrace_filter" '^vfs' 83 + 84 + exit 0
+61
tools/testing/ktest/examples/bootconfigs/verify-functiongraph.sh
··· 1 + #!/bin/sh 2 + 3 + cd /sys/kernel/tracing 4 + 5 + compare_file() { 6 + file="$1" 7 + val="$2" 8 + content=`cat $file` 9 + if [ "$content" != "$val" ]; then 10 + echo "FAILED: $file has '$content', expected '$val'" 11 + exit 1 12 + fi 13 + } 14 + 15 + compare_file_partial() { 16 + file="$1" 17 + val="$2" 18 + content=`cat $file | sed -ne "/^$val/p"` 19 + if [ -z "$content" ]; then 20 + echo "FAILED: $file does not contain '$val'" 21 + cat $file 22 + exit 1 23 + fi 24 + } 25 + 26 + file_contains() { 27 + file=$1 28 + val="$2" 29 + 30 + if ! grep -q "$val" $file ; then 31 + echo "FAILED: $file does not contain $val" 32 + cat $file 33 + exit 1 34 + fi 35 + } 36 + 37 + compare_mask() { 38 + file=$1 39 + val="$2" 40 + 41 + content=`cat $file | sed -ne "/^[0 ]*$val/p"` 42 + if [ -z "$content" ]; then 43 + echo "FAILED: $file does not have mask '$val'" 44 + cat $file 45 + exit 1 46 + fi 47 + } 48 + 49 + 50 + compare_file "tracing_on" "0" 51 + compare_file "current_tracer" "function_graph" 52 + 53 + compare_file_partial "events/kprobes/start_event/enable" "1" 54 + compare_file_partial "events/kprobes/start_event/trigger" "traceon" 55 + file_contains "kprobe_events" 'start_event.*pci_proc_init' 56 + 57 + compare_file_partial "events/kprobes/end_event/enable" "1" 58 + compare_file_partial "events/kprobes/end_event/trigger" "traceoff" 59 + file_contains "kprobe_events" '^r.*end_event.*pci_proc_init' 60 + 61 + exit 0
+72
tools/testing/ktest/examples/bootconfigs/verify-tracing.sh
··· 1 + #!/bin/sh 2 + 3 + cd /sys/kernel/tracing 4 + 5 + compare_file() { 6 + file="$1" 7 + val="$2" 8 + content=`cat $file` 9 + if [ "$content" != "$val" ]; then 10 + echo "FAILED: $file has '$content', expected '$val'" 11 + exit 1 12 + fi 13 + } 14 + 15 + compare_file_partial() { 16 + file="$1" 17 + val="$2" 18 + content=`cat $file | sed -ne "/^$val/p"` 19 + if [ -z "$content" ]; then 20 + echo "FAILED: $file does not contain '$val'" 21 + cat $file 22 + exit 1 23 + fi 24 + } 25 + 26 + file_contains() { 27 + file=$1 28 + val="$2" 29 + 30 + if ! grep -q "$val" $file ; then 31 + echo "FAILED: $file does not contain $val" 32 + cat $file 33 + exit 1 34 + fi 35 + } 36 + 37 + compare_mask() { 38 + file=$1 39 + val="$2" 40 + 41 + content=`cat $file | sed -ne "/^[0 ]*$val/p"` 42 + if [ -z "$content" ]; then 43 + echo "FAILED: $file does not have mask '$val'" 44 + cat $file 45 + exit 1 46 + fi 47 + } 48 + 49 + compare_file "current_tracer" "function_graph" 50 + compare_file "options/event-fork" "1" 51 + compare_file "options/sym-addr" "1" 52 + compare_file "options/stacktrace" "1" 53 + compare_file "buffer_size_kb" "1024" 54 + file_contains "snapshot" "Snapshot is allocated" 55 + file_contains "trace_clock" '\[global\]' 56 + 57 + compare_file "events/initcall/enable" "1" 58 + compare_file "events/task/task_newtask/enable" "1" 59 + compare_file "events/sched/sched_process_exec/filter" "pid < 128" 60 + compare_file "events/kprobes/enable" "1" 61 + 62 + compare_file "instances/bar/events/kprobes/myevent/enable" "1" 63 + compare_file "instances/bar/events/kprobes/myevent2/enable" "1" 64 + compare_file "instances/bar/events/kprobes/myevent3/enable" "1" 65 + 66 + compare_file "instances/foo/current_tracer" "function" 67 + compare_file "instances/foo/tracing_on" "0" 68 + 69 + compare_file "/proc/sys/kernel/ftrace_dump_on_oops" "2" 70 + compare_file "/proc/sys/kernel/traceoff_on_warning" "1" 71 + 72 + exit 0
tools/testing/ktest/examples/include/bootconfig.conf
# bootconfig.conf
#
# Tests for some bootconfig scripts

# List where on the target machine the initrd is used
INITRD := /boot/initramfs-test.img

# Install bootconfig on the target machine and define the path here.
BOOTCONFIG := /usr/bin/bootconfig

# Currently we just build the .config in the BUILD_DIR
BUILD_TYPE := oldconfig

# Helper macro to run bootconfig on the target
# SSH is defined in include/defaults.conf
ADD_BOOTCONFIG := ${SSH} "${BOOTCONFIG} -d ${INITRD} && ${BOOTCONFIG} -a /tmp/${BOOTCONFIG_FILE} ${INITRD}"

# This copies a bootconfig script to the target and then will
# add it to the initrd. SSH_USER is defined in include/defaults.conf
# and MACHINE is defined in the example configs.
BOOTCONFIG_TEST_PREP = scp ${BOOTCONFIG_PATH}${BOOTCONFIG_FILE} ${SSH_USER}@${MACHINE}:/tmp && ${ADD_BOOTCONFIG}

# When a test is complete, remove the bootconfig from the initrd.
CLEAR_BOOTCONFIG := ${SSH} "${BOOTCONFIG} -d ${INITRD}"

# Run a verifier on the target after it has booted, to make sure that the
# bootconfig script did what it was expected to do
DO_TEST = scp ${BOOTCONFIG_PATH}${BOOTCONFIG_VERIFY} ${SSH_USER}@${MACHINE}:/tmp && ${SSH} /tmp/${BOOTCONFIG_VERIFY}

# Comment this out to not run the boot configs
RUN_BOOTCONFIG := 1

TEST_START IF DEFINED RUN_BOOTCONFIG
TEST_TYPE = test
TEST_NAME = bootconfig boottrace
# Just testing the bootconfig on initrd, no need to build the kernel
BUILD_TYPE = nobuild
BOOTCONFIG_FILE = boottrace.bconf
BOOTCONFIG_VERIFY = verify-boottrace.sh
ADD_CONFIG = ${ADD_CONFIG} ${BOOTCONFIG_PATH}/config-bootconfig
PRE_TEST = ${BOOTCONFIG_TEST_PREP}
PRE_TEST_DIE = 1
TEST = ${DO_TEST}
POST_TEST = ${CLEAR_BOOTCONFIG}

TEST_START IF DEFINED RUN_BOOTCONFIG
TEST_TYPE = test
TEST_NAME = bootconfig function graph
BUILD_TYPE = nobuild
BOOTCONFIG_FILE = functiongraph.bconf
BOOTCONFIG_VERIFY = verify-functiongraph.sh
ADD_CONFIG = ${ADD_CONFIG} ${BOOTCONFIG_PATH}/config-bootconfig
PRE_TEST = ${BOOTCONFIG_TEST_PREP}
PRE_TEST_DIE = 1
TEST = ${DO_TEST}
POST_TEST = ${CLEAR_BOOTCONFIG}

TEST_START IF DEFINED RUN_BOOTCONFIG
TEST_TYPE = test
TEST_NAME = bootconfig tracing
BUILD_TYPE = nobuild
BOOTCONFIG_FILE = tracing.bconf
BOOTCONFIG_VERIFY = verify-tracing.sh
ADD_CONFIG = ${ADD_CONFIG} ${BOOTCONFIG_PATH}/config-bootconfig
PRE_TEST = ${BOOTCONFIG_TEST_PREP}
PRE_TEST_DIE = 1
TEST = ${DO_TEST}
POST_TEST = ${CLEAR_BOOTCONFIG}
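For reference, here is roughly what ktest's BOOTCONFIG_TEST_PREP macro expands to once its variables are substituted. The user, host, and bconf name below are hypothetical stand-ins; the snippet only builds and prints the command string, it does not contact a target:

```shell
#!/bin/sh
# Sketch of the BOOTCONFIG_TEST_PREP expansion with stand-in values.
SSH_USER=root
MACHINE=testbox
INITRD=/boot/initramfs-test.img
BOOTCONFIG=/usr/bin/bootconfig
BOOTCONFIG_FILE=boottrace.bconf

# scp the bconf to the target, then detach any old bootconfig from the
# initrd (-d) and attach the new one (-a), mirroring ADD_BOOTCONFIG.
prep="scp $BOOTCONFIG_FILE $SSH_USER@$MACHINE:/tmp && \
ssh $SSH_USER@$MACHINE \"$BOOTCONFIG -d $INITRD && $BOOTCONFIG -a /tmp/$BOOTCONFIG_FILE $INITRD\""
echo "$prep"
```

The detach-then-attach order matters: `bootconfig -a` refuses to stack a second config, so CLEAR_BOOTCONFIG runs the same `-d` step in POST_TEST to leave the initrd clean for the next test.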
tools/testing/ktest/examples/kvm.conf
  INCLUDE include/tests.conf
  INCLUDE include/bisect.conf
  INCLUDE include/min-config.conf
+ INCLUDE include/bootconfig.conf