Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

- kprobes: Restructured stack unwinder to show properly on x86 when a
stack dump happens from a kretprobe callback.

- Fix to bootconfig parsing

- Have tracefs allow owner and group permissions by default (only
denying others). There's been pressure to allow non-root access to
tracefs in a controlled fashion, and using groups is probably the
safest approach.

- Bootconfig memory management updates.

- Bootconfig clean up to have the tools directory be less dependent on
changes in the kernel tree.

- Allow perf to be traced by the function tracer.

- Rewrite of function graph tracer to be a callback from the function
tracer instead of having its own trampoline (this change will happen
on an arch-by-arch basis, and currently only x86_64 implements it).

- Allow multiple direct trampolines (bpf hooks to functions) to be
batched together in one synchronization.

- Allow histogram triggers to add variables that can perform
calculations against the event's fields.

- Use the linker to determine architecture callbacks from the ftrace
trampoline to allow for proper parameter prototypes and prevent
warnings from the compiler.

- Extend histogram triggers to key off of variables.

- Have trace recursion use bit magic to determine preempt context
instead of if branches.

- Have trace recursion disable preemption as all use cases do anyway.

- Added testing for verification of tracing utilities.

- Various small clean ups and fixes.

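One controlled way to use the new group-based tracefs defaults is a plain chgrp/chmod on the mount point. This is only a sketch: the "tracing" group name is hypothetical and site-specific, and the tracefs mount point may differ on your system:

```
# Grant a dedicated group read access to the tracefs hierarchy
# ("tracing" is a hypothetical group; create it first with groupadd).
chgrp -R tracing /sys/kernel/tracing
chmod -R g+rX /sys/kernel/tracing
```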
* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
tracing/histogram: Fix semicolon.cocci warnings
tracing/histogram: Fix documentation inline emphasis warning
tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
tracing: Show size of requested perf buffer
bootconfig: Initialize ret in xbc_parse_tree()
ftrace: do CPU checking after preemption disabled
ftrace: disable preemption when recursion locked
tracing/histogram: Document expression arithmetic and constants
tracing/histogram: Optimize division by a power of 2
tracing/histogram: Covert expr to const if both operands are constants
tracing/histogram: Simplify handling of .sym-offset in expressions
tracing: Fix operator precedence for hist triggers expression
tracing: Add division and multiplication support for hist triggers
tracing: Add support for creating hist trigger variables from literal
selftests/ftrace: Stop tracing while reading the trace file by default
MAINTAINERS: Update KPROBES and TRACING entries
test_kprobes: Move it from kernel/ to lib/
docs, kprobes: Remove invalid URL and add new reference
samples/kretprobes: Fix return value if register_kretprobe() failed
lib/bootconfig: Fix the xbc_get_info kerneldoc
...

+3012 -1529
+14
Documentation/trace/histogram.rst
··· 1763 1763 1764 1764 # echo 'hist:key=pid:wakeupswitch_lat=$wakeup_lat+$switchtime_lat ...' >> event3/trigger 1765 1765 1766 + Expressions support the use of addition, subtraction, multiplication and 1767 + division operators (+-\*/). 1768 + 1769 + Note that division by zero always returns -1. 1770 + 1771 + Numeric constants can also be used directly in an expression:: 1772 + 1773 + # echo 'hist:keys=next_pid:timestamp_secs=common_timestamp/1000000 ...' >> event/trigger 1774 + 1775 + or assigned to a variable and referenced in a subsequent expression:: 1776 + 1777 + # echo 'hist:keys=next_pid:us_per_sec=1000000 ...' >> event/trigger 1778 + # echo 'hist:keys=next_pid:timestamp_secs=common_timestamp/$us_per_sec ...' >> event/trigger 1779 + 1766 1780 2.2.2 Synthetic Events 1767 1781 ---------------------- 1768 1782
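The expression syntax documented in the hunk above can be exercised from a root shell. The event and paths below are illustrative (they assume tracefs mounted at /sys/kernel/tracing and the sched_switch event, as in the documentation's own examples):

```
# Convert the event timestamp to seconds with a constant divisor
echo 'hist:keys=next_pid:timestamp_secs=common_timestamp/1000000' \
    > /sys/kernel/tracing/events/sched/sched_switch/trigger

# Or assign the constant to a variable and reference it
echo 'hist:keys=next_pid:us_per_sec=1000000' \
    > /sys/kernel/tracing/events/sched/sched_switch/trigger
echo 'hist:keys=next_pid:timestamp_secs=common_timestamp/$us_per_sec' \
    > /sys/kernel/tracing/events/sched/sched_switch/trigger

cat /sys/kernel/tracing/events/sched/sched_switch/hist
```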
+1 -1
Documentation/trace/kprobes.rst
··· 784 784 785 785 For additional information on Kprobes, refer to the following URLs: 786 786 787 - - https://www.ibm.com/developerworks/library/l-kprobes/index.html 787 + - https://lwn.net/Articles/132196/ 788 788 - https://www.kernel.org/doc/ols/2006/ols2006v2-pages-109-124.pdf 789 789
+12 -12
Documentation/trace/timerlat-tracer.rst
··· 3 3 ############### 4 4 5 5 The timerlat tracer aims to help the preemptive kernel developers to 6 - find souces of wakeup latencies of real-time threads. Like cyclictest, 6 + find sources of wakeup latencies of real-time threads. Like cyclictest, 7 7 the tracer sets a periodic timer that wakes up a thread. The thread then 8 8 computes a *wakeup latency* value as the difference between the *current 9 9 time* and the *absolute time* that the timer was set to expire. The main ··· 50 50 ID field serves to relate the *irq* execution to its respective *thread* 51 51 execution. 52 52 53 - The *irq*/*thread* splitting is important to clarify at which context 53 + The *irq*/*thread* splitting is important to clarify in which context 54 54 the unexpected high value is coming from. The *irq* context can be 55 - delayed by hardware related actions, such as SMIs, NMIs, IRQs 56 - or by a thread masking interrupts. Once the timer happens, the delay 55 + delayed by hardware-related actions, such as SMIs, NMIs, IRQs, 56 + or by thread masking interrupts. Once the timer happens, the delay 57 57 can also be influenced by blocking caused by threads. For example, by 58 - postponing the scheduler execution via preempt_disable(), by the 59 - scheduler execution, or by masking interrupts. Threads can 60 - also be delayed by the interference from other threads and IRQs. 58 + postponing the scheduler execution via preempt_disable(), scheduler 59 + execution, or masking interrupts. Threads can also be delayed by the 60 + interference from other threads and IRQs. 61 61 62 62 Tracer options 63 63 --------------------- ··· 68 68 69 69 - cpus: CPUs at which a timerlat thread will execute. 70 70 - timerlat_period_us: the period of the timerlat thread. 71 - - osnoise/stop_tracing_us: stop the system tracing if a 71 + - stop_tracing_us: stop the system tracing if a 72 72 timer latency at the *irq* context higher than the configured 73 73 value happens. Writing 0 disables this option. 
74 74 - stop_tracing_total_us: stop the system tracing if a 75 - timer latency at the *thread* context higher than the configured 75 + timer latency at the *thread* context is higher than the configured 76 76 value happens. Writing 0 disables this option. 77 - - print_stack: save the stack of the IRQ ocurrence, and print 78 - it afte the *thread context* event". 77 + - print_stack: save the stack of the IRQ occurrence, and print 78 + it after the *thread context* event". 79 79 80 80 timerlat and osnoise 81 81 ---------------------------- ··· 95 95 timerlat/5-1035 [005] ....... 548.771104: #402268 context thread timer_latency 39960 ns 96 96 97 97 In this case, the root cause of the timer latency does not point to a 98 - single cause, but to multiple ones. Firstly, the timer IRQ was delayed 98 + single cause but to multiple ones. Firstly, the timer IRQ was delayed 99 99 for 13 us, which may point to a long IRQ disabled section (see IRQ 100 100 stacktrace section). Then the timer interrupt that wakes up the timerlat 101 101 thread took 7597 ns, and the qxl:21 device IRQ took 7139 ns. Finally,
+4 -1
MAINTAINERS
··· 10482 10482 M: "David S. Miller" <davem@davemloft.net> 10483 10483 M: Masami Hiramatsu <mhiramat@kernel.org> 10484 10484 S: Maintained 10485 + T: git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git 10485 10486 F: Documentation/trace/kprobes.rst 10486 10487 F: include/asm-generic/kprobes.h 10487 10488 F: include/linux/kprobes.h 10488 10489 F: kernel/kprobes.c 10490 + F: lib/test_kprobes.c 10491 + F: samples/kprobes 10489 10492 10490 10493 KS0108 LCD CONTROLLER DRIVER 10491 10494 M: Miguel Ojeda <ojeda@kernel.org> ··· 19029 19026 M: Steven Rostedt <rostedt@goodmis.org> 19030 19027 M: Ingo Molnar <mingo@redhat.com> 19031 19028 S: Maintained 19032 - T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core 19029 + T: git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git 19033 19030 F: Documentation/trace/ftrace.rst 19034 19031 F: arch/*/*/*/ftrace.h 19035 19032 F: arch/*/kernel/ftrace.c
+8
arch/Kconfig
··· 191 191 config HAVE_KPROBES_ON_FTRACE 192 192 bool 193 193 194 + config ARCH_CORRECT_STACKTRACE_ON_KRETPROBE 195 + bool 196 + help 197 + Since kretprobes modifies return address on the stack, the 198 + stacktrace may see the kretprobe trampoline address instead 199 + of correct one. If the architecture stacktrace code and 200 + unwinder can adjust such entries, select this configuration. 201 + 194 202 config HAVE_FUNCTION_ERROR_INJECTION 195 203 bool 196 204
+1 -1
arch/arc/include/asm/kprobes.h
··· 46 46 }; 47 47 48 48 int kprobe_fault_handler(struct pt_regs *regs, unsigned long cause); 49 - void kretprobe_trampoline(void); 49 + void __kretprobe_trampoline(void); 50 50 void trap_is_kprobe(unsigned long address, struct pt_regs *regs); 51 51 #else 52 52 #define trap_is_kprobe(address, regs)
+5
arch/arc/include/asm/ptrace.h
··· 149 149 return (long)regs->r0; 150 150 } 151 151 152 + static inline void instruction_pointer_set(struct pt_regs *regs, 153 + unsigned long val) 154 + { 155 + instruction_pointer(regs) = val; 156 + } 152 157 #endif /* !__ASSEMBLY__ */ 153 158 154 159 #endif /* __ASM_PTRACE_H */
+7 -6
arch/arc/kernel/kprobes.c
··· 363 363 364 364 static void __used kretprobe_trampoline_holder(void) 365 365 { 366 - __asm__ __volatile__(".global kretprobe_trampoline\n" 367 - "kretprobe_trampoline:\n" "nop\n"); 366 + __asm__ __volatile__(".global __kretprobe_trampoline\n" 367 + "__kretprobe_trampoline:\n" 368 + "nop\n"); 368 369 } 369 370 370 371 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri, ··· 376 375 ri->fp = NULL; 377 376 378 377 /* Replace the return addr with trampoline addr */ 379 - regs->blink = (unsigned long)&kretprobe_trampoline; 378 + regs->blink = (unsigned long)&__kretprobe_trampoline; 380 379 } 381 380 382 381 static int __kprobes trampoline_probe_handler(struct kprobe *p, 383 382 struct pt_regs *regs) 384 383 { 385 - regs->ret = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL); 384 + regs->ret = __kretprobe_trampoline_handler(regs, NULL); 386 385 387 386 /* By returning a non zero value, we are telling the kprobe handler 388 387 * that we don't want the post_handler to run ··· 391 390 } 392 391 393 392 static struct kprobe trampoline_p = { 394 - .addr = (kprobe_opcode_t *) &kretprobe_trampoline, 393 + .addr = (kprobe_opcode_t *) &__kretprobe_trampoline, 395 394 .pre_handler = trampoline_probe_handler 396 395 }; 397 396 ··· 403 402 404 403 int __kprobes arch_trampoline_kprobe(struct kprobe *p) 405 404 { 406 - if (p->addr == (kprobe_opcode_t *) &kretprobe_trampoline) 405 + if (p->addr == (kprobe_opcode_t *) &__kretprobe_trampoline) 407 406 return 1; 408 407 409 408 return 0;
+1
arch/arm/Kconfig
··· 3 3 bool 4 4 default y 5 5 select ARCH_32BIT_OFF_T 6 + select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE if HAVE_KRETPROBES && FRAME_POINTER && !ARM_UNWIND 6 7 select ARCH_HAS_BINFMT_FLAT 7 8 select ARCH_HAS_DEBUG_VIRTUAL if MMU 8 9 select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
+9
arch/arm/include/asm/stacktrace.h
··· 3 3 #define __ASM_STACKTRACE_H 4 4 5 5 #include <asm/ptrace.h> 6 + #include <linux/llist.h> 6 7 7 8 struct stackframe { 8 9 /* ··· 14 13 unsigned long sp; 15 14 unsigned long lr; 16 15 unsigned long pc; 16 + #ifdef CONFIG_KRETPROBES 17 + struct llist_node *kr_cur; 18 + struct task_struct *tsk; 19 + #endif 17 20 }; 18 21 19 22 static __always_inline ··· 27 22 frame->sp = regs->ARM_sp; 28 23 frame->lr = regs->ARM_lr; 29 24 frame->pc = regs->ARM_pc; 25 + #ifdef CONFIG_KRETPROBES 26 + frame->kr_cur = NULL; 27 + frame->tsk = current; 28 + #endif 30 29 } 31 30 32 31 extern int unwind_frame(struct stackframe *frame);
-5
arch/arm/kernel/ftrace.c
··· 193 193 194 194 return ret; 195 195 } 196 - 197 - int __init ftrace_dyn_arch_init(void) 198 - { 199 - return 0; 200 - } 201 196 #endif /* CONFIG_DYNAMIC_FTRACE */ 202 197 203 198 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
+4
arch/arm/kernel/return_address.c
··· 42 42 frame.sp = current_stack_pointer; 43 43 frame.lr = (unsigned long)__builtin_return_address(0); 44 44 frame.pc = (unsigned long)return_address; 45 + #ifdef CONFIG_KRETPROBES 46 + frame.kr_cur = NULL; 47 + frame.tsk = current; 48 + #endif 45 49 46 50 walk_stackframe(&frame, save_return_addr, &data); 47 51
+15 -2
arch/arm/kernel/stacktrace.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0-only 2 2 #include <linux/export.h> 3 + #include <linux/kprobes.h> 3 4 #include <linux/sched.h> 4 5 #include <linux/sched/debug.h> 5 6 #include <linux/stacktrace.h> ··· 55 54 56 55 frame->sp = frame->fp; 57 56 frame->fp = *(unsigned long *)(fp); 58 - frame->pc = frame->lr; 59 - frame->lr = *(unsigned long *)(fp + 4); 57 + frame->pc = *(unsigned long *)(fp + 4); 60 58 #else 61 59 /* check current frame pointer is within bounds */ 62 60 if (fp < low + 12 || fp > high - 4) ··· 65 65 frame->fp = *(unsigned long *)(fp - 12); 66 66 frame->sp = *(unsigned long *)(fp - 8); 67 67 frame->pc = *(unsigned long *)(fp - 4); 68 + #endif 69 + #ifdef CONFIG_KRETPROBES 70 + if (is_kretprobe_trampoline(frame->pc)) 71 + frame->pc = kretprobe_find_ret_addr(frame->tsk, 72 + (void *)frame->fp, &frame->kr_cur); 68 73 #endif 69 74 70 75 return 0; ··· 162 157 frame.lr = (unsigned long)__builtin_return_address(0); 163 158 frame.pc = (unsigned long)__save_stack_trace; 164 159 } 160 + #ifdef CONFIG_KRETPROBES 161 + frame.kr_cur = NULL; 162 + frame.tsk = tsk; 163 + #endif 165 164 166 165 walk_stackframe(&frame, save_trace, &data); 167 166 } ··· 183 174 frame.sp = regs->ARM_sp; 184 175 frame.lr = regs->ARM_lr; 185 176 frame.pc = regs->ARM_pc; 177 + #ifdef CONFIG_KRETPROBES 178 + frame.kr_cur = NULL; 179 + frame.tsk = current; 180 + #endif 186 181 187 182 walk_stackframe(&frame, save_trace, &data); 188 183 }
+33 -10
arch/arm/probes/kprobes/core.c
··· 11 11 * Copyright (C) 2007 Marvell Ltd. 12 12 */ 13 13 14 + #define pr_fmt(fmt) "kprobes: " fmt 15 + 14 16 #include <linux/kernel.h> 15 17 #include <linux/kprobes.h> 16 18 #include <linux/module.h> ··· 280 278 break; 281 279 case KPROBE_REENTER: 282 280 /* A nested probe was hit in FIQ, it is a BUG */ 283 - pr_warn("Unrecoverable kprobe detected.\n"); 281 + pr_warn("Failed to recover from reentered kprobes.\n"); 284 282 dump_kprobe(p); 285 283 fallthrough; 286 284 default: ··· 368 366 /* 369 367 * When a retprobed function returns, trampoline_handler() is called, 370 368 * calling the kretprobe's handler. We construct a struct pt_regs to 371 - * give a view of registers r0-r11 to the user return-handler. This is 372 - * not a complete pt_regs structure, but that should be plenty sufficient 373 - * for kretprobe handlers which should normally be interested in r0 only 374 - * anyway. 369 + * give a view of registers r0-r11, sp, lr, and pc to the user 370 + * return-handler. This is not a complete pt_regs structure, but that 371 + * should be enough for stacktrace from the return handler with or 372 + * without pt_regs. 375 373 */ 376 - void __naked __kprobes kretprobe_trampoline(void) 374 + void __naked __kprobes __kretprobe_trampoline(void) 377 375 { 378 376 __asm__ __volatile__ ( 377 + #ifdef CONFIG_FRAME_POINTER 378 + "ldr lr, =__kretprobe_trampoline \n\t" 379 + /* __kretprobe_trampoline makes a framepointer on pt_regs. */ 380 + #ifdef CONFIG_CC_IS_CLANG 381 + "stmdb sp, {sp, lr, pc} \n\t" 382 + "sub sp, sp, #12 \n\t" 383 + /* In clang case, pt_regs->ip = lr. */ 384 + "stmdb sp!, {r0 - r11, lr} \n\t" 385 + /* fp points regs->r11 (fp) */ 386 + "add fp, sp, #44 \n\t" 387 + #else /* !CONFIG_CC_IS_CLANG */ 388 + /* In gcc case, pt_regs->ip = fp. 
*/ 389 + "stmdb sp, {fp, sp, lr, pc} \n\t" 390 + "sub sp, sp, #16 \n\t" 379 391 "stmdb sp!, {r0 - r11} \n\t" 392 + /* fp points regs->r15 (pc) */ 393 + "add fp, sp, #60 \n\t" 394 + #endif /* CONFIG_CC_IS_CLANG */ 395 + #else /* !CONFIG_FRAME_POINTER */ 396 + "sub sp, sp, #16 \n\t" 397 + "stmdb sp!, {r0 - r11} \n\t" 398 + #endif /* CONFIG_FRAME_POINTER */ 380 399 "mov r0, sp \n\t" 381 400 "bl trampoline_handler \n\t" 382 401 "mov lr, r0 \n\t" 383 402 "ldmia sp!, {r0 - r11} \n\t" 403 + "add sp, sp, #16 \n\t" 384 404 #ifdef CONFIG_THUMB2_KERNEL 385 405 "bx lr \n\t" 386 406 #else ··· 411 387 : : : "memory"); 412 388 } 413 389 414 - /* Called from kretprobe_trampoline */ 390 + /* Called from __kretprobe_trampoline */ 415 391 static __used __kprobes void *trampoline_handler(struct pt_regs *regs) 416 392 { 417 - return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, 418 - (void *)regs->ARM_fp); 393 + return (void *)kretprobe_trampoline_handler(regs, (void *)regs->ARM_fp); 419 394 } 420 395 421 396 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri, ··· 424 401 ri->fp = (void *)regs->ARM_fp; 425 402 426 403 /* Replace the return addr with trampoline addr. */ 427 - regs->ARM_lr = (unsigned long)&kretprobe_trampoline; 404 + regs->ARM_lr = (unsigned long)&__kretprobe_trampoline; 428 405 } 429 406 430 407 int __kprobes arch_trampoline_kprobe(struct kprobe *p)
+4 -3
arch/arm/probes/kprobes/opt-arm.c
··· 347 347 } 348 348 349 349 int arch_within_optimized_kprobe(struct optimized_kprobe *op, 350 - unsigned long addr) 350 + kprobe_opcode_t *addr) 351 351 { 352 - return ((unsigned long)op->kp.addr <= addr && 353 - (unsigned long)op->kp.addr + RELATIVEJUMP_SIZE > addr); 352 + return (op->kp.addr <= addr && 353 + op->kp.addr + (RELATIVEJUMP_SIZE / sizeof(kprobe_opcode_t)) > addr); 354 + 354 355 } 355 356 356 357 void arch_remove_optimized_kprobe(struct optimized_kprobe *op)
+1
arch/arm64/Kconfig
··· 11 11 select ACPI_PPTT if ACPI 12 12 select ARCH_HAS_DEBUG_WX 13 13 select ARCH_BINFMT_ELF_STATE 14 + select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE 14 15 select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION 15 16 select ARCH_ENABLE_MEMORY_HOTPLUG 16 17 select ARCH_ENABLE_MEMORY_HOTREMOVE
+1 -1
arch/arm64/include/asm/kprobes.h
··· 39 39 int kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr); 40 40 int kprobe_exceptions_notify(struct notifier_block *self, 41 41 unsigned long val, void *data); 42 - void kretprobe_trampoline(void); 42 + void __kretprobe_trampoline(void); 43 43 void __kprobes *trampoline_probe_handler(struct pt_regs *regs); 44 44 45 45 #endif /* CONFIG_KPROBES */
+4
arch/arm64/include/asm/stacktrace.h
··· 9 9 #include <linux/sched.h> 10 10 #include <linux/sched/task_stack.h> 11 11 #include <linux/types.h> 12 + #include <linux/llist.h> 12 13 13 14 #include <asm/memory.h> 14 15 #include <asm/ptrace.h> ··· 59 58 enum stack_type prev_type; 60 59 #ifdef CONFIG_FUNCTION_GRAPH_TRACER 61 60 int graph; 61 + #endif 62 + #ifdef CONFIG_KRETPROBES 63 + struct llist_node *kr_cur; 62 64 #endif 63 65 }; 64 66
-5
arch/arm64/kernel/ftrace.c
··· 236 236 command |= FTRACE_MAY_SLEEP; 237 237 ftrace_modify_all_code(command); 238 238 } 239 - 240 - int __init ftrace_dyn_arch_init(void) 241 - { 242 - return 0; 243 - } 244 239 #endif /* CONFIG_DYNAMIC_FTRACE */ 245 240 246 241 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
+7 -5
arch/arm64/kernel/probes/kprobes.c
··· 7 7 * Copyright (C) 2013 Linaro Limited. 8 8 * Author: Sandeepa Prabhu <sandeepa.prabhu@linaro.org> 9 9 */ 10 + 11 + #define pr_fmt(fmt) "kprobes: " fmt 12 + 10 13 #include <linux/extable.h> 11 14 #include <linux/kasan.h> 12 15 #include <linux/kernel.h> ··· 221 218 break; 222 219 case KPROBE_HIT_SS: 223 220 case KPROBE_REENTER: 224 - pr_warn("Unrecoverable kprobe detected.\n"); 221 + pr_warn("Failed to recover from reentered kprobes.\n"); 225 222 dump_kprobe(p); 226 223 BUG(); 227 224 break; ··· 401 398 402 399 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs) 403 400 { 404 - return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, 405 - (void *)kernel_stack_pointer(regs)); 401 + return (void *)kretprobe_trampoline_handler(regs, (void *)regs->regs[29]); 406 402 } 407 403 408 404 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri, 409 405 struct pt_regs *regs) 410 406 { 411 407 ri->ret_addr = (kprobe_opcode_t *)regs->regs[30]; 412 - ri->fp = (void *)kernel_stack_pointer(regs); 408 + ri->fp = (void *)regs->regs[29]; 413 409 414 410 /* replace return addr (x30) with trampoline */ 415 - regs->regs[30] = (long)&kretprobe_trampoline; 411 + regs->regs[30] = (long)&__kretprobe_trampoline; 416 412 } 417 413 418 414 int __kprobes arch_trampoline_kprobe(struct kprobe *p)
+6 -2
arch/arm64/kernel/probes/kprobes_trampoline.S
··· 61 61 ldp x28, x29, [sp, #S_X28] 62 62 .endm 63 63 64 - SYM_CODE_START(kretprobe_trampoline) 64 + SYM_CODE_START(__kretprobe_trampoline) 65 65 sub sp, sp, #PT_REGS_SIZE 66 66 67 67 save_all_base_regs 68 + 69 + /* Setup a frame pointer. */ 70 + add x29, sp, #S_FP 68 71 69 72 mov x0, sp 70 73 bl trampoline_probe_handler ··· 77 74 */ 78 75 mov lr, x0 79 76 77 + /* The frame pointer (x29) is restored with other registers. */ 80 78 restore_all_base_regs 81 79 82 80 add sp, sp, #PT_REGS_SIZE 83 81 ret 84 82 85 - SYM_CODE_END(kretprobe_trampoline) 83 + SYM_CODE_END(__kretprobe_trampoline)
+7
arch/arm64/kernel/stacktrace.c
··· 41 41 #ifdef CONFIG_FUNCTION_GRAPH_TRACER 42 42 frame->graph = 0; 43 43 #endif 44 + #ifdef CONFIG_KRETPROBES 45 + frame->kr_cur = NULL; 46 + #endif 44 47 45 48 /* 46 49 * Prime the first unwind. ··· 132 129 frame->pc = ret_stack->ret; 133 130 } 134 131 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */ 132 + #ifdef CONFIG_KRETPROBES 133 + if (is_kretprobe_trampoline(frame->pc)) 134 + frame->pc = kretprobe_find_ret_addr(tsk, (void *)frame->fp, &frame->kr_cur); 135 + #endif 135 136 136 137 frame->pc = ptrauth_strip_insn_pac(frame->pc); 137 138
+1 -1
arch/csky/include/asm/kprobes.h
··· 41 41 int kprobe_fault_handler(struct pt_regs *regs, unsigned int trapnr); 42 42 int kprobe_breakpoint_handler(struct pt_regs *regs); 43 43 int kprobe_single_step_handler(struct pt_regs *regs); 44 - void kretprobe_trampoline(void); 44 + void __kretprobe_trampoline(void); 45 45 void __kprobes *trampoline_probe_handler(struct pt_regs *regs); 46 46 47 47 #endif /* CONFIG_KPROBES */
-5
arch/csky/kernel/ftrace.c
··· 133 133 (unsigned long)func, true, true); 134 134 return ret; 135 135 } 136 - 137 - int __init ftrace_dyn_arch_init(void) 138 - { 139 - return 0; 140 - } 141 136 #endif /* CONFIG_DYNAMIC_FTRACE */ 142 137 143 138 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
-9
arch/csky/kernel/probes/ftrace.c
··· 2 2 3 3 #include <linux/kprobes.h> 4 4 5 - int arch_check_ftrace_location(struct kprobe *p) 6 - { 7 - if (ftrace_location((unsigned long)p->addr)) 8 - p->flags |= KPROBE_FLAG_FTRACE; 9 - return 0; 10 - } 11 - 12 5 /* Ftrace callback handler for kprobes -- called under preepmt disabled */ 13 6 void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip, 14 7 struct ftrace_ops *ops, struct ftrace_regs *fregs) ··· 17 24 return; 18 25 19 26 regs = ftrace_get_regs(fregs); 20 - preempt_disable_notrace(); 21 27 p = get_kprobe((kprobe_opcode_t *)ip); 22 28 if (!p) { 23 29 p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE)); ··· 56 64 __this_cpu_write(current_kprobe, NULL); 57 65 } 58 66 out: 59 - preempt_enable_notrace(); 60 67 ftrace_test_recursion_unlock(bit); 61 68 } 62 69 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
+7 -7
arch/csky/kernel/probes/kprobes.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0+ 2 2 3 + #define pr_fmt(fmt) "kprobes: " fmt 4 + 3 5 #include <linux/kprobes.h> 4 6 #include <linux/extable.h> 5 7 #include <linux/slab.h> ··· 79 77 { 80 78 unsigned long probe_addr = (unsigned long)p->addr; 81 79 82 - if (probe_addr & 0x1) { 83 - pr_warn("Address not aligned.\n"); 84 - return -EINVAL; 85 - } 80 + if (probe_addr & 0x1) 81 + return -EILSEQ; 86 82 87 83 /* copy instruction */ 88 84 p->opcode = le32_to_cpu(*p->addr); ··· 225 225 break; 226 226 case KPROBE_HIT_SS: 227 227 case KPROBE_REENTER: 228 - pr_warn("Unrecoverable kprobe detected.\n"); 228 + pr_warn("Failed to recover from reentered kprobes.\n"); 229 229 dump_kprobe(p); 230 230 BUG(); 231 231 break; ··· 386 386 387 387 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs) 388 388 { 389 - return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL); 389 + return (void *)kretprobe_trampoline_handler(regs, NULL); 390 390 } 391 391 392 392 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri, ··· 394 394 { 395 395 ri->ret_addr = (kprobe_opcode_t *)regs->lr; 396 396 ri->fp = NULL; 397 - regs->lr = (unsigned long) &kretprobe_trampoline; 397 + regs->lr = (unsigned long) &__kretprobe_trampoline; 398 398 } 399 399 400 400 int __kprobes arch_trampoline_kprobe(struct kprobe *p)
+2 -2
arch/csky/kernel/probes/kprobes_trampoline.S
··· 4 4 5 5 #include <abi/entry.h> 6 6 7 - ENTRY(kretprobe_trampoline) 7 + ENTRY(__kretprobe_trampoline) 8 8 SAVE_REGS_FTRACE 9 9 10 10 mov a0, sp /* pt_regs */ ··· 16 16 17 17 RESTORE_REGS_FTRACE 18 18 rts 19 - ENDPROC(kretprobe_trampoline) 19 + ENDPROC(__kretprobe_trampoline)
+5
arch/ia64/include/asm/ptrace.h
··· 51 51 * the canonical representation by adding to instruction pointer. 52 52 */ 53 53 # define instruction_pointer(regs) ((regs)->cr_iip + ia64_psr(regs)->ri) 54 + # define instruction_pointer_set(regs, val) \ 55 + ({ \ 56 + ia64_psr(regs)->ri = (val & 0xf); \ 57 + regs->cr_iip = (val & ~0xfULL); \ 58 + }) 54 59 55 60 static inline unsigned long user_stack_pointer(struct pt_regs *regs) 56 61 {
-6
arch/ia64/kernel/ftrace.c
··· 194 194 flush_icache_range(addr, addr + 16); 195 195 return 0; 196 196 } 197 - 198 - /* run from kstop_machine */ 199 - int __init ftrace_dyn_arch_init(void) 200 - { 201 - return 0; 202 - }
+5 -10
arch/ia64/kernel/kprobes.c
··· 392 392 __this_cpu_write(current_kprobe, p); 393 393 } 394 394 395 - static void kretprobe_trampoline(void) 395 + void __kretprobe_trampoline(void) 396 396 { 397 397 } 398 398 399 399 int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs) 400 400 { 401 - regs->cr_iip = __kretprobe_trampoline_handler(regs, kretprobe_trampoline, NULL); 401 + regs->cr_iip = __kretprobe_trampoline_handler(regs, NULL); 402 402 /* 403 403 * By returning a non-zero value, we are telling 404 404 * kprobe_handler() that we don't want the post_handler ··· 414 414 ri->fp = NULL; 415 415 416 416 /* Replace the return addr with trampoline addr */ 417 - regs->b0 = ((struct fnptr *)kretprobe_trampoline)->ip; 417 + regs->b0 = (unsigned long)dereference_function_descriptor(__kretprobe_trampoline); 418 418 } 419 419 420 420 /* Check the instruction in the slot is break */ ··· 890 890 return ret; 891 891 } 892 892 893 - unsigned long arch_deref_entry_point(void *entry) 894 - { 895 - return ((struct fnptr *)entry)->ip; 896 - } 897 - 898 893 static struct kprobe trampoline_p = { 899 894 .pre_handler = trampoline_probe_handler 900 895 }; ··· 897 902 int __init arch_init_kprobes(void) 898 903 { 899 904 trampoline_p.addr = 900 - (kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip; 905 + dereference_function_descriptor(__kretprobe_trampoline); 901 906 return register_kprobe(&trampoline_p); 902 907 } 903 908 904 909 int __kprobes arch_trampoline_kprobe(struct kprobe *p) 905 910 { 906 911 if (p->addr == 907 - (kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip) 912 + dereference_function_descriptor(__kretprobe_trampoline)) 908 913 return 1; 909 914 910 915 return 0;
-5
arch/microblaze/kernel/ftrace.c
··· 163 163 return ret; 164 164 } 165 165 166 - int __init ftrace_dyn_arch_init(void) 167 - { 168 - return 0; 169 - } 170 - 171 166 int ftrace_update_ftrace_func(ftrace_func_t func) 172 167 { 173 168 unsigned long ip = (unsigned long)(&ftrace_call);
+12 -14
arch/mips/kernel/kprobes.c
··· 11 11 * Copyright (C) IBM Corporation, 2002, 2004 12 12 */ 13 13 14 + #define pr_fmt(fmt) "kprobes: " fmt 15 + 14 16 #include <linux/kprobes.h> 15 17 #include <linux/preempt.h> 16 18 #include <linux/uaccess.h> ··· 82 80 insn = p->addr[0]; 83 81 84 82 if (insn_has_ll_or_sc(insn)) { 85 - pr_notice("Kprobes for ll and sc instructions are not" 86 - "supported\n"); 83 + pr_notice("Kprobes for ll and sc instructions are not supported\n"); 87 84 ret = -EINVAL; 88 85 goto out; 89 86 } ··· 220 219 return 0; 221 220 222 221 unaligned: 223 - pr_notice("%s: unaligned epc - sending SIGBUS.\n", current->comm); 222 + pr_notice("Failed to emulate branch instruction because of unaligned epc - sending SIGBUS to %s.\n", current->comm); 224 223 force_sig(SIGBUS); 225 224 return -EFAULT; 226 225 ··· 239 238 regs->cp0_epc = (unsigned long)p->addr; 240 239 else if (insn_has_delayslot(p->opcode)) { 241 240 ret = evaluate_branch_instruction(p, regs, kcb); 242 - if (ret < 0) { 243 - pr_notice("Kprobes: Error in evaluating branch\n"); 241 + if (ret < 0) 244 242 return; 245 - } 246 243 } 247 244 regs->cp0_epc = (unsigned long)&p->ainsn.insn[0]; 248 245 } ··· 460 461 /* Keep the assembler from reordering and placing JR here. 
*/ 461 462 ".set noreorder\n\t" 462 463 "nop\n\t" 463 - ".global kretprobe_trampoline\n" 464 - "kretprobe_trampoline:\n\t" 464 + ".global __kretprobe_trampoline\n" 465 + "__kretprobe_trampoline:\n\t" 465 466 "nop\n\t" 466 467 ".set pop" 467 468 : : : "memory"); 468 469 } 469 470 470 - void kretprobe_trampoline(void); 471 + void __kretprobe_trampoline(void); 471 472 472 473 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri, 473 474 struct pt_regs *regs) ··· 476 477 ri->fp = NULL; 477 478 478 479 /* Replace the return addr with trampoline addr */ 479 - regs->regs[31] = (unsigned long)kretprobe_trampoline; 480 + regs->regs[31] = (unsigned long)__kretprobe_trampoline; 480 481 } 481 482 482 483 /* ··· 485 486 static int __kprobes trampoline_probe_handler(struct kprobe *p, 486 487 struct pt_regs *regs) 487 488 { 488 - instruction_pointer(regs) = __kretprobe_trampoline_handler(regs, 489 - kretprobe_trampoline, NULL); 489 + instruction_pointer(regs) = __kretprobe_trampoline_handler(regs, NULL); 490 490 /* 491 491 * By returning a non-zero value, we are telling 492 492 * kprobe_handler() that we don't want the post_handler ··· 496 498 497 499 int __kprobes arch_trampoline_kprobe(struct kprobe *p) 498 500 { 499 - if (p->addr == (kprobe_opcode_t *)kretprobe_trampoline) 501 + if (p->addr == (kprobe_opcode_t *)__kretprobe_trampoline) 500 502 return 1; 501 503 502 504 return 0; 503 505 } 504 506 505 507 static struct kprobe trampoline_p = { 506 - .addr = (kprobe_opcode_t *)kretprobe_trampoline, 508 + .addr = (kprobe_opcode_t *)__kretprobe_trampoline, 507 509 .pre_handler = trampoline_probe_handler 508 510 }; 509 511
-5
arch/nds32/kernel/ftrace.c
··· 84 84 /* restore all state needed by the compiler epilogue */ 85 85 } 86 86 87 - int __init ftrace_dyn_arch_init(void) 88 - { 89 - return 0; 90 - } 91 - 92 87 static unsigned long gen_sethi_insn(unsigned long addr) 93 88 { 94 89 unsigned long opcode = 0x46000000;
-8
arch/parisc/kernel/ftrace.c
··· 93 93 #endif 94 94 95 95 #ifdef CONFIG_DYNAMIC_FTRACE 96 - 97 - int __init ftrace_dyn_arch_init(void) 98 - { 99 - return 0; 100 - } 101 - 102 96 int ftrace_update_ftrace_func(ftrace_func_t func) 103 97 { 104 98 ftrace_func = func; ··· 211 217 return; 212 218 213 219 regs = ftrace_get_regs(fregs); 214 - preempt_disable_notrace(); 215 220 p = get_kprobe((kprobe_opcode_t *)ip); 216 221 if (unlikely(!p) || kprobe_disabled(p)) 217 222 goto out; ··· 239 246 } 240 247 __this_cpu_write(current_kprobe, NULL); 241 248 out: 242 - preempt_enable_notrace(); 243 249 ftrace_test_recursion_unlock(bit); 244 250 } 245 251 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
+3 -3
arch/parisc/kernel/kprobes.c
··· 175 175 return 1; 176 176 } 177 177 178 - static inline void kretprobe_trampoline(void) 178 + void __kretprobe_trampoline(void) 179 179 { 180 180 asm volatile("nop"); 181 181 asm volatile("nop"); ··· 193 193 { 194 194 unsigned long orig_ret_address; 195 195 196 - orig_ret_address = __kretprobe_trampoline_handler(regs, trampoline_p.addr, NULL); 196 + orig_ret_address = __kretprobe_trampoline_handler(regs, NULL); 197 197 instruction_pointer_set(regs, orig_ret_address); 198 198 199 199 return 1; ··· 217 217 int __init arch_init_kprobes(void) 218 218 { 219 219 trampoline_p.addr = (kprobe_opcode_t *) 220 - dereference_function_descriptor(kretprobe_trampoline); 220 + dereference_function_descriptor(__kretprobe_trampoline); 221 221 return register_kprobe(&trampoline_p); 222 222 }
+1 -1
arch/powerpc/include/asm/kprobes.h
··· 51 51 #define flush_insn_slot(p) do { } while (0) 52 52 #define kretprobe_blacklist_size 0 53 53 54 - void kretprobe_trampoline(void); 54 + void __kretprobe_trampoline(void); 55 55 extern void arch_remove_kprobe(struct kprobe *p); 56 56 57 57 /* Architecture specific copy of original instruction */
-2
arch/powerpc/kernel/kprobes-ftrace.c
··· 26 26 return; 27 27 28 28 regs = ftrace_get_regs(fregs); 29 - preempt_disable_notrace(); 30 29 p = get_kprobe((kprobe_opcode_t *)nip); 31 30 if (unlikely(!p) || kprobe_disabled(p)) 32 31 goto out; ··· 60 61 __this_cpu_write(current_kprobe, NULL); 61 62 } 62 63 out: 63 - preempt_enable_notrace(); 64 64 ftrace_test_recursion_unlock(bit); 65 65 } 66 66 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
+9 -20
arch/powerpc/kernel/kprobes.c
··· 237 237 ri->fp = NULL; 238 238 239 239 /* Replace the return addr with trampoline addr */ 240 - regs->link = (unsigned long)kretprobe_trampoline; 240 + regs->link = (unsigned long)__kretprobe_trampoline; 241 241 } 242 242 NOKPROBE_SYMBOL(arch_prepare_kretprobe); 243 243 ··· 403 403 * - When the probed function returns, this probe 404 404 * causes the handlers to fire 405 405 */ 406 - asm(".global kretprobe_trampoline\n" 407 - ".type kretprobe_trampoline, @function\n" 408 - "kretprobe_trampoline:\n" 406 + asm(".global __kretprobe_trampoline\n" 407 + ".type __kretprobe_trampoline, @function\n" 408 + "__kretprobe_trampoline:\n" 409 409 "nop\n" 410 410 "blr\n" 411 - ".size kretprobe_trampoline, .-kretprobe_trampoline\n"); 411 + ".size __kretprobe_trampoline, .-__kretprobe_trampoline\n"); 412 412 413 413 /* 414 414 * Called when the probe at kretprobe trampoline is hit ··· 417 417 { 418 418 unsigned long orig_ret_address; 419 419 420 - orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL); 420 + orig_ret_address = __kretprobe_trampoline_handler(regs, NULL); 421 421 /* 422 422 * We get here through one of two paths: 423 423 * 1. by taking a trap -> kprobe_handler() -> here ··· 427 427 * as it is used to determine the return address from the trap. 428 428 * For (2), since nip is not honoured with optprobes, we instead setup 429 429 * the link register properly so that the subsequent 'blr' in 430 - * kretprobe_trampoline jumps back to the right instruction. 430 + * __kretprobe_trampoline jumps back to the right instruction. 
431 431 * 432 432 * For nip, we should set the address to the previous instruction since 433 433 * we end up emulating it in kprobe_handler(), which increments the nip ··· 542 542 } 543 543 NOKPROBE_SYMBOL(kprobe_fault_handler); 544 544 545 - unsigned long arch_deref_entry_point(void *entry) 546 - { 547 - #ifdef PPC64_ELF_ABI_v1 548 - if (!kernel_text_address((unsigned long)entry)) 549 - return ppc_global_function_entry(entry); 550 - else 551 - #endif 552 - return (unsigned long)entry; 553 - } 554 - NOKPROBE_SYMBOL(arch_deref_entry_point); 555 - 556 545 static struct kprobe trampoline_p = { 557 - .addr = (kprobe_opcode_t *) &kretprobe_trampoline, 546 + .addr = (kprobe_opcode_t *) &__kretprobe_trampoline, 558 547 .pre_handler = trampoline_probe_handler 559 548 }; 560 549 ··· 554 565 555 566 int arch_trampoline_kprobe(struct kprobe *p) 556 567 { 557 - if (p->addr == (kprobe_opcode_t *)&kretprobe_trampoline) 568 + if (p->addr == (kprobe_opcode_t *)&__kretprobe_trampoline) 558 569 return 1; 559 570 560 571 return 0;
+4 -4
arch/powerpc/kernel/optprobes.c
··· 56 56 * has a 'nop' instruction, which can be emulated. 57 57 * So further checks can be skipped. 58 58 */ 59 - if (p->addr == (kprobe_opcode_t *)&kretprobe_trampoline) 59 + if (p->addr == (kprobe_opcode_t *)&__kretprobe_trampoline) 60 60 return addr + sizeof(kprobe_opcode_t); 61 61 62 62 /* ··· 301 301 } 302 302 } 303 303 304 - int arch_within_optimized_kprobe(struct optimized_kprobe *op, unsigned long addr) 304 + int arch_within_optimized_kprobe(struct optimized_kprobe *op, kprobe_opcode_t *addr) 305 305 { 306 - return ((unsigned long)op->kp.addr <= addr && 307 - (unsigned long)op->kp.addr + RELATIVEJUMP_SIZE > addr); 306 + return (op->kp.addr <= addr && 307 + op->kp.addr + (RELATIVEJUMP_SIZE / sizeof(kprobe_opcode_t)) > addr); 308 308 }
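The arch_within_optimized_kprobe() change above swaps raw unsigned long arithmetic for typed pointer arithmetic, so the byte size of the patched region must be scaled to opcode units. A hedged standalone sketch of the same comparison (opcode_t and the 4-byte slot size are illustrative assumptions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t opcode_t;      /* stand-in for kprobe_opcode_t */
#define JUMP_SIZE 4             /* assumed size of the patched area, bytes */

/* base <= addr < base + (JUMP_SIZE in opcode units): with typed
 * pointers, "+ 1" advances one opcode, not one byte, hence the
 * division by sizeof(opcode_t). */
static int within_patched_range(opcode_t *base, opcode_t *addr)
{
    return base <= addr &&
           base + (JUMP_SIZE / sizeof(opcode_t)) > addr;
}
```

The division is the subtle part: forgetting it would make the typed version cover sizeof(opcode_t) times too many instructions.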
+1 -1
arch/powerpc/kernel/stacktrace.c
··· 155 155 * Mark stacktraces with kretprobed functions on them 156 156 * as unreliable. 157 157 */ 158 - if (ip == (unsigned long)kretprobe_trampoline) 158 + if (ip == (unsigned long)__kretprobe_trampoline) 159 159 return -EINVAL; 160 160 #endif 161 161
+1 -1
arch/riscv/include/asm/kprobes.h
··· 40 40 int kprobe_fault_handler(struct pt_regs *regs, unsigned int trapnr); 41 41 bool kprobe_breakpoint_handler(struct pt_regs *regs); 42 42 bool kprobe_single_step_handler(struct pt_regs *regs); 43 - void kretprobe_trampoline(void); 43 + void __kretprobe_trampoline(void); 44 44 void __kprobes *trampoline_probe_handler(struct pt_regs *regs); 45 45 46 46 #endif /* CONFIG_KPROBES */
-5
arch/riscv/kernel/ftrace.c
··· 154 154 155 155 return ret; 156 156 } 157 - 158 - int __init ftrace_dyn_arch_init(void) 159 - { 160 - return 0; 161 - } 162 157 #endif 163 158 164 159 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
-2
arch/riscv/kernel/probes/ftrace.c
··· 15 15 if (bit < 0) 16 16 return; 17 17 18 - preempt_disable_notrace(); 19 18 p = get_kprobe((kprobe_opcode_t *)ip); 20 19 if (unlikely(!p) || kprobe_disabled(p)) 21 20 goto out; ··· 51 52 __this_cpu_write(current_kprobe, NULL); 52 53 } 53 54 out: 54 - preempt_enable_notrace(); 55 55 ftrace_test_recursion_unlock(bit); 56 56 } 57 57 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
+7 -8
arch/riscv/kernel/probes/kprobes.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0+ 2 2 3 + #define pr_fmt(fmt) "kprobes: " fmt 4 + 3 5 #include <linux/kprobes.h> 4 6 #include <linux/extable.h> 5 7 #include <linux/slab.h> ··· 52 50 { 53 51 unsigned long probe_addr = (unsigned long)p->addr; 54 52 55 - if (probe_addr & 0x1) { 56 - pr_warn("Address not aligned.\n"); 57 - 58 - return -EINVAL; 59 - } 53 + if (probe_addr & 0x1) 54 + return -EILSEQ; 60 55 61 56 /* copy instruction */ 62 57 p->opcode = *p->addr; ··· 190 191 break; 191 192 case KPROBE_HIT_SS: 192 193 case KPROBE_REENTER: 193 - pr_warn("Unrecoverable kprobe detected.\n"); 194 + pr_warn("Failed to recover from reentered kprobes.\n"); 194 195 dump_kprobe(p); 195 196 BUG(); 196 197 break; ··· 347 348 348 349 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs) 349 350 { 350 - return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL); 351 + return (void *)kretprobe_trampoline_handler(regs, NULL); 351 352 } 352 353 353 354 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri, ··· 355 356 { 356 357 ri->ret_addr = (kprobe_opcode_t *)regs->ra; 357 358 ri->fp = NULL; 358 - regs->ra = (unsigned long) &kretprobe_trampoline; 359 + regs->ra = (unsigned long) &__kretprobe_trampoline; 359 360 } 360 361 361 362 int __kprobes arch_trampoline_kprobe(struct kprobe *p)
+2 -2
arch/riscv/kernel/probes/kprobes_trampoline.S
··· 75 75 REG_L x31, PT_T6(sp) 76 76 .endm 77 77 78 - ENTRY(kretprobe_trampoline) 78 + ENTRY(__kretprobe_trampoline) 79 79 addi sp, sp, -(PT_SIZE_ON_STACK) 80 80 save_all_base_regs 81 81 ··· 90 90 addi sp, sp, PT_SIZE_ON_STACK 91 91 92 92 ret 93 - ENDPROC(kretprobe_trampoline) 93 + ENDPROC(__kretprobe_trampoline)
+1 -1
arch/s390/include/asm/kprobes.h
··· 70 70 }; 71 71 72 72 void arch_remove_kprobe(struct kprobe *p); 73 - void kretprobe_trampoline(void); 73 + void __kretprobe_trampoline(void); 74 74 75 75 int kprobe_fault_handler(struct pt_regs *regs, int trapnr); 76 76 int kprobe_exceptions_notify(struct notifier_block *self,
-5
arch/s390/kernel/ftrace.c
··· 262 262 return 0; 263 263 } 264 264 265 - int __init ftrace_dyn_arch_init(void) 266 - { 267 - return 0; 268 - } 269 - 270 265 void arch_ftrace_update_code(int command) 271 266 { 272 267 if (ftrace_shared_hotpatch_trampoline(NULL))
+9 -7
arch/s390/kernel/kprobes.c
··· 7 7 * s390 port, used ppc64 as template. Mike Grundy <grundym@us.ibm.com> 8 8 */ 9 9 10 + #define pr_fmt(fmt) "kprobes: " fmt 11 + 10 12 #include <linux/moduleloader.h> 11 13 #include <linux/kprobes.h> 12 14 #include <linux/ptrace.h> ··· 242 240 ri->fp = NULL; 243 241 244 242 /* Replace the return addr with trampoline addr */ 245 - regs->gprs[14] = (unsigned long) &kretprobe_trampoline; 243 + regs->gprs[14] = (unsigned long) &__kretprobe_trampoline; 246 244 } 247 245 NOKPROBE_SYMBOL(arch_prepare_kretprobe); 248 246 ··· 261 259 * is a BUG. The code path resides in the .kprobes.text 262 260 * section and is executed with interrupts disabled. 263 261 */ 264 - pr_err("Invalid kprobe detected.\n"); 262 + pr_err("Failed to recover from reentered kprobes.\n"); 265 263 dump_kprobe(p); 266 264 BUG(); 267 265 } ··· 334 332 */ 335 333 static void __used kretprobe_trampoline_holder(void) 336 334 { 337 - asm volatile(".global kretprobe_trampoline\n" 338 - "kretprobe_trampoline: bcr 0,0\n"); 335 + asm volatile(".global __kretprobe_trampoline\n" 336 + "__kretprobe_trampoline: bcr 0,0\n"); 339 337 } 340 338 341 339 /* ··· 343 341 */ 344 342 static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs) 345 343 { 346 - regs->psw.addr = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL); 344 + regs->psw.addr = __kretprobe_trampoline_handler(regs, NULL); 347 345 /* 348 346 * By returning a non-zero value, we are telling 349 347 * kprobe_handler() that we don't want the post_handler ··· 509 507 NOKPROBE_SYMBOL(kprobe_exceptions_notify); 510 508 511 509 static struct kprobe trampoline = { 512 - .addr = (kprobe_opcode_t *) &kretprobe_trampoline, 510 + .addr = (kprobe_opcode_t *) &__kretprobe_trampoline, 513 511 .pre_handler = trampoline_probe_handler 514 512 }; 515 513 ··· 520 518 521 519 int arch_trampoline_kprobe(struct kprobe *p) 522 520 { 523 - return p->addr == (kprobe_opcode_t *) &kretprobe_trampoline; 521 + return p->addr == (kprobe_opcode_t *) &__kretprobe_trampoline; 524 522 } 525 523 NOKPROBE_SYMBOL(arch_trampoline_kprobe);
+1 -1
arch/s390/kernel/stacktrace.c
··· 46 46 * Mark stacktraces with kretprobed functions on them 47 47 * as unreliable. 48 48 */ 49 - if (state.ip == (unsigned long)kretprobe_trampoline) 49 + if (state.ip == (unsigned long)__kretprobe_trampoline) 50 50 return -EINVAL; 51 51 #endif 52 52
+3
arch/sh/boot/compressed/misc.c
··· 115 115 void ftrace_stub(void) 116 116 { 117 117 } 118 + void arch_ftrace_ops_list_func(void) 119 + { 120 + } 118 121 119 122 #define stackalign 4 120 123
+1 -1
arch/sh/include/asm/kprobes.h
··· 26 26 struct kprobe; 27 27 28 28 void arch_remove_kprobe(struct kprobe *); 29 - void kretprobe_trampoline(void); 29 + void __kretprobe_trampoline(void); 30 30 31 31 /* Architecture specific copy of original instruction*/ 32 32 struct arch_specific_insn {
-5
arch/sh/kernel/ftrace.c
··· 252 252 253 253 return ftrace_modify_code(rec->ip, old, new); 254 254 } 255 - 256 - int __init ftrace_dyn_arch_init(void) 257 - { 258 - return 0; 259 - } 260 255 #endif /* CONFIG_DYNAMIC_FTRACE */ 261 256 262 257 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
+6 -6
arch/sh/kernel/kprobes.c
··· 207 207 ri->fp = NULL; 208 208 209 209 /* Replace the return addr with trampoline addr */ 210 - regs->pr = (unsigned long)kretprobe_trampoline; 210 + regs->pr = (unsigned long)__kretprobe_trampoline; 211 211 } 212 212 213 213 static int __kprobes kprobe_handler(struct pt_regs *regs) ··· 293 293 */ 294 294 static void __used kretprobe_trampoline_holder(void) 295 295 { 296 - asm volatile (".globl kretprobe_trampoline\n" 297 - "kretprobe_trampoline:\n\t" 296 + asm volatile (".globl __kretprobe_trampoline\n" 297 + "__kretprobe_trampoline:\n\t" 298 298 "nop\n"); 299 299 } 300 300 301 301 /* 302 - * Called when we hit the probe point at kretprobe_trampoline 302 + * Called when we hit the probe point at __kretprobe_trampoline 303 303 */ 304 304 int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs) 305 305 { 306 - regs->pc = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL); 306 + regs->pc = __kretprobe_trampoline_handler(regs, NULL); 307 307 308 308 return 1; 309 309 } ··· 442 442 } 443 443 444 444 static struct kprobe trampoline_p = { 445 - .addr = (kprobe_opcode_t *)&kretprobe_trampoline, 445 + .addr = (kprobe_opcode_t *)&__kretprobe_trampoline, 446 446 .pre_handler = trampoline_probe_handler 447 447 }; 448 448
+1 -1
arch/sparc/include/asm/kprobes.h
··· 24 24 flushi(&(p)->ainsn.insn[1]); \ 25 25 } while (0) 26 26 27 - void kretprobe_trampoline(void); 27 + void __kretprobe_trampoline(void); 28 28 29 29 /* Architecture specific copy of original instruction*/ 30 30 struct arch_specific_insn {
-5
arch/sparc/kernel/ftrace.c
··· 82 82 new = ftrace_call_replace(ip, (unsigned long)func); 83 83 return ftrace_modify_code(ip, old, new); 84 84 } 85 - 86 - int __init ftrace_dyn_arch_init(void) 87 - { 88 - return 0; 89 - } 90 85 #endif 91 86 92 87 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
+6 -6
arch/sparc/kernel/kprobes.c
··· 440 440 441 441 /* Replace the return addr with trampoline addr */ 442 442 regs->u_regs[UREG_RETPC] = 443 - ((unsigned long)kretprobe_trampoline) - 8; 443 + ((unsigned long)__kretprobe_trampoline) - 8; 444 444 } 445 445 446 446 /* ··· 451 451 { 452 452 unsigned long orig_ret_address = 0; 453 453 454 - orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL); 454 + orig_ret_address = __kretprobe_trampoline_handler(regs, NULL); 455 455 regs->tpc = orig_ret_address; 456 456 regs->tnpc = orig_ret_address + 4; 457 457 ··· 465 465 466 466 static void __used kretprobe_trampoline_holder(void) 467 467 { 468 - asm volatile(".global kretprobe_trampoline\n" 469 - "kretprobe_trampoline:\n" 468 + asm volatile(".global __kretprobe_trampoline\n" 469 + "__kretprobe_trampoline:\n" 470 470 "\tnop\n" 471 471 "\tnop\n"); 472 472 } 473 473 static struct kprobe trampoline_p = { 474 - .addr = (kprobe_opcode_t *) &kretprobe_trampoline, 474 + .addr = (kprobe_opcode_t *) &__kretprobe_trampoline, 475 475 .pre_handler = trampoline_probe_handler 476 476 }; 477 477 ··· 482 482 483 483 int __kprobes arch_trampoline_kprobe(struct kprobe *p) 484 484 { 485 - if (p->addr == (kprobe_opcode_t *)&kretprobe_trampoline) 485 + if (p->addr == (kprobe_opcode_t *)&__kretprobe_trampoline) 486 486 return 1; 487 487 488 488 return 0;
+2 -1
arch/x86/Kconfig
··· 61 61 select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI 62 62 select ARCH_32BIT_OFF_T if X86_32 63 63 select ARCH_CLOCKSOURCE_INIT 64 + select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE 64 65 select ARCH_ENABLE_HUGEPAGE_MIGRATION if X86_64 && HUGETLB_PAGE && MIGRATION 65 66 select ARCH_ENABLE_MEMORY_HOTPLUG if X86_64 || (X86_32 && HIGHMEM) 66 67 select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG ··· 199 198 select HAVE_FAST_GUP 200 199 select HAVE_FENTRY if X86_64 || DYNAMIC_FTRACE 201 200 select HAVE_FTRACE_MCOUNT_RECORD 202 - select HAVE_FUNCTION_GRAPH_TRACER 201 + select HAVE_FUNCTION_GRAPH_TRACER if X86_32 || (X86_64 && DYNAMIC_FTRACE) 203 202 select HAVE_FUNCTION_TRACER 204 203 select HAVE_GCC_PLUGINS 205 204 select HAVE_HW_BREAKPOINT
+7 -2
arch/x86/include/asm/ftrace.h
··· 57 57 58 58 #define ftrace_instruction_pointer_set(fregs, _ip) \ 59 59 do { (fregs)->regs.ip = (_ip); } while (0) 60 + 61 + struct ftrace_ops; 62 + #define ftrace_graph_func ftrace_graph_func 63 + void ftrace_graph_func(unsigned long ip, unsigned long parent_ip, 64 + struct ftrace_ops *op, struct ftrace_regs *fregs); 65 + #else 66 + #define FTRACE_GRAPH_TRAMP_ADDR FTRACE_GRAPH_ADDR 60 67 #endif 61 68 62 69 #ifdef CONFIG_DYNAMIC_FTRACE ··· 71 64 struct dyn_arch_ftrace { 72 65 /* No extra data needed for x86 */ 73 66 }; 74 - 75 - #define FTRACE_GRAPH_TRAMP_ADDR FTRACE_GRAPH_ADDR 76 67 77 68 #endif /* CONFIG_DYNAMIC_FTRACE */ 78 69 #endif /* __ASSEMBLY__ */
-1
arch/x86/include/asm/kprobes.h
··· 49 49 extern const int kretprobe_blacklist_size; 50 50 51 51 void arch_remove_kprobe(struct kprobe *p); 52 - asmlinkage void kretprobe_trampoline(void); 53 52 54 53 extern void arch_kprobe_override_function(struct pt_regs *regs); 55 54
+29
arch/x86/include/asm/unwind.h
··· 4 4 5 5 #include <linux/sched.h> 6 6 #include <linux/ftrace.h> 7 + #include <linux/kprobes.h> 7 8 #include <asm/ptrace.h> 8 9 #include <asm/stacktrace.h> 9 10 ··· 16 15 unsigned long stack_mask; 17 16 struct task_struct *task; 18 17 int graph_idx; 18 + #ifdef CONFIG_KRETPROBES 19 + struct llist_node *kr_cur; 20 + #endif 19 21 bool error; 20 22 #if defined(CONFIG_UNWINDER_ORC) 21 23 bool signal, full_regs; ··· 102 98 void unwind_module_init(struct module *mod, void *orc_ip, size_t orc_ip_size, 103 99 void *orc, size_t orc_size) {} 104 100 #endif 101 + 102 + static inline 103 + unsigned long unwind_recover_kretprobe(struct unwind_state *state, 104 + unsigned long addr, unsigned long *addr_p) 105 + { 106 + #ifdef CONFIG_KRETPROBES 107 + return is_kretprobe_trampoline(addr) ? 108 + kretprobe_find_ret_addr(state->task, addr_p, &state->kr_cur) : 109 + addr; 110 + #else 111 + return addr; 112 + #endif 113 + } 114 + 115 + /* Recover the return address modified by kretprobe and ftrace_graph. */ 116 + static inline 117 + unsigned long unwind_recover_ret_addr(struct unwind_state *state, 118 + unsigned long addr, unsigned long *addr_p) 119 + { 120 + unsigned long ret; 121 + 122 + ret = ftrace_graph_ret_addr(state->task, &state->graph_idx, 123 + addr, addr_p); 124 + return unwind_recover_kretprobe(state, ret, addr_p); 125 + } 105 126 106 127 /* 107 128 * This disables KASAN checking when reading a value from another task's stack,
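unwind_recover_kretprobe() above maps the trampoline address back to the return address that the kretprobe displaced, using a per-unwind cursor (kr_cur) so nested kretprobes resolve in order. A simplified user-space model of that lookup (hypothetical structures; the kernel walks an llist of kretprobe instances):

```c
#include <assert.h>
#include <stddef.h>

#define TRAMPOLINE_ADDR 0xdeadbeefUL   /* assumed marker address */

/* One displaced return address, as a kretprobe instance would hold. */
struct ret_node {
    unsigned long real_ret;
    struct ret_node *next;
};

/* If 'addr' is the trampoline, return the saved real address and
 * advance the cursor so the next (outer) frame resolves correctly;
 * otherwise return 'addr' unchanged. */
static unsigned long recover_ret_addr(unsigned long addr,
                                      struct ret_node **cur)
{
    struct ret_node *n = *cur;

    if (addr != TRAMPOLINE_ADDR || !n)
        return addr;
    *cur = n->next;
    return n->real_ret;
}
```

The cursor is why the unwind_state grows a kr_cur field: without it, every trampoline hit on the stack would resolve to the same (innermost) saved return address.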
+5
arch/x86/include/asm/unwind_hints.h
··· 52 52 UNWIND_HINT sp_reg=ORC_REG_SP sp_offset=8 type=UNWIND_HINT_TYPE_FUNC 53 53 .endm 54 54 55 + #else 56 + 57 + #define UNWIND_HINT_FUNC \ 58 + UNWIND_HINT(ORC_REG_SP, 8, UNWIND_HINT_TYPE_FUNC, 0) 59 + 55 60 #endif /* __ASSEMBLY__ */ 56 61 57 62 #endif /* _ASM_X86_UNWIND_HINTS_H */
+35 -41
arch/x86/kernel/ftrace.c
··· 252 252 ftrace_modify_all_code(command); 253 253 } 254 254 255 - int __init ftrace_dyn_arch_init(void) 256 - { 257 - return 0; 258 - } 259 - 260 255 /* Currently only x86_64 supports dynamic trampolines */ 261 256 #ifdef CONFIG_X86_64 262 257 ··· 522 527 return ptr + CALL_INSN_SIZE + call.disp; 523 528 } 524 529 525 - void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent, 530 + void prepare_ftrace_return(unsigned long ip, unsigned long *parent, 526 531 unsigned long frame_pointer); 527 532 528 533 /* ··· 536 541 void *ptr; 537 542 538 543 if (ops && ops->trampoline) { 539 - #ifdef CONFIG_FUNCTION_GRAPH_TRACER 544 + #if !defined(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS) && \ 545 + defined(CONFIG_FUNCTION_GRAPH_TRACER) 540 546 /* 541 547 * We only know about function graph tracer setting as static 542 548 * trampoline. ··· 585 589 #ifdef CONFIG_FUNCTION_GRAPH_TRACER 586 590 587 591 #ifdef CONFIG_DYNAMIC_FTRACE 588 - extern void ftrace_graph_call(void); 589 592 593 + #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS 594 + extern void ftrace_graph_call(void); 590 595 static const char *ftrace_jmp_replace(unsigned long ip, unsigned long addr) 591 596 { 592 597 return text_gen_insn(JMP32_INSN_OPCODE, (void *)ip, (void *)addr); ··· 615 618 616 619 return ftrace_mod_jmp(ip, &ftrace_stub); 617 620 } 621 + #else /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */ 622 + int ftrace_enable_ftrace_graph_caller(void) 623 + { 624 + return 0; 625 + } 618 626 627 + int ftrace_disable_ftrace_graph_caller(void) 628 + { 629 + return 0; 630 + } 631 + #endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */ 619 632 #endif /* !CONFIG_DYNAMIC_FTRACE */ 620 633 621 634 /* 622 635 * Hook the return address and push it in the stack of return addrs 623 636 * in current thread info. 
624 637 */ 625 - void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent, 638 + void prepare_ftrace_return(unsigned long ip, unsigned long *parent, 626 639 unsigned long frame_pointer) 627 640 { 628 641 unsigned long return_hooker = (unsigned long)&return_to_handler; 629 - unsigned long old; 630 - int faulted; 642 + int bit; 631 643 632 644 /* 633 645 * When resuming from suspend-to-ram, this function can be indirectly ··· 656 650 if (unlikely(atomic_read(&current->tracing_graph_pause))) 657 651 return; 658 652 659 - /* 660 - * Protect against fault, even if it shouldn't 661 - * happen. This tool is too much intrusive to 662 - * ignore such a protection. 663 - */ 664 - asm volatile( 665 - "1: " _ASM_MOV " (%[parent]), %[old]\n" 666 - "2: " _ASM_MOV " %[return_hooker], (%[parent])\n" 667 - " movl $0, %[faulted]\n" 668 - "3:\n" 669 - 670 - ".section .fixup, \"ax\"\n" 671 - "4: movl $1, %[faulted]\n" 672 - " jmp 3b\n" 673 - ".previous\n" 674 - 675 - _ASM_EXTABLE(1b, 4b) 676 - _ASM_EXTABLE(2b, 4b) 677 - 678 - : [old] "=&r" (old), [faulted] "=r" (faulted) 679 - : [parent] "r" (parent), [return_hooker] "r" (return_hooker) 680 - : "memory" 681 - ); 682 - 683 - if (unlikely(faulted)) { 684 - ftrace_graph_stop(); 685 - WARN_ON(1); 653 + bit = ftrace_test_recursion_trylock(ip, *parent); 654 + if (bit < 0) 686 655 return; 687 - } 688 656 689 - if (function_graph_enter(old, self_addr, frame_pointer, parent)) 690 - *parent = old; 657 + if (!function_graph_enter(*parent, ip, frame_pointer, parent)) 658 + *parent = return_hooker; 659 + 660 + ftrace_test_recursion_unlock(bit); 691 661 } 662 + 663 + #ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS 664 + void ftrace_graph_func(unsigned long ip, unsigned long parent_ip, 665 + struct ftrace_ops *op, struct ftrace_regs *fregs) 666 + { 667 + struct pt_regs *regs = &fregs->regs; 668 + unsigned long *stack = (unsigned long *)kernel_stack_pointer(regs); 669 + 670 + prepare_ftrace_return(ip, (unsigned long *)stack, 0); 671 + } 
672 + #endif 673 + 692 674 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
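The rewritten prepare_ftrace_return() replaces the old fault-guarded inline asm with a plain read-modify-write of the parent return slot under the recursion lock. A user-space sketch of the redirection itself (the shadow stack and hooker address are illustrative, not the kernel's data structures):

```c
#include <assert.h>
#include <stddef.h>

#define RETURN_HOOKER 0xfeedUL  /* assumed address of return_to_handler */
#define SHADOW_DEPTH 16

static unsigned long shadow[SHADOW_DEPTH]; /* displaced return addresses */
static int shadow_top;

/* Record the caller's original return address and redirect the slot
 * to the hooker, as function_graph entry does; on failure, leave the
 * slot untouched (mirrors function_graph_enter() returning nonzero). */
static int hook_return(unsigned long *parent)
{
    if (shadow_top >= SHADOW_DEPTH)
        return -1;
    shadow[shadow_top++] = *parent;
    *parent = RETURN_HOOKER;
    return 0;
}

/* Undo on exit: restore the most recently displaced address. */
static void unhook_return(unsigned long *parent)
{
    if (shadow_top > 0)
        *parent = shadow[--shadow_top];
}
```

Note the inversion in the hunk: the old code wrote the hooker first and rolled back on failure, while the new code only writes *parent once function_graph_enter() has succeeded.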
+1 -29
arch/x86/kernel/ftrace_64.S
··· 174 174 SYM_FUNC_END(ftrace_caller); 175 175 176 176 SYM_FUNC_START(ftrace_epilogue) 177 - #ifdef CONFIG_FUNCTION_GRAPH_TRACER 178 - SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL) 179 - jmp ftrace_stub 180 - #endif 181 - 182 177 /* 183 178 * This is weak to keep gas from relaxing the jumps. 184 179 * It is also used to copy the retq for trampolines. ··· 246 251 * If ORIG_RAX is anything but zero, make this a call to that. 247 252 * See arch_ftrace_set_direct_caller(). 248 253 */ 249 - movq ORIG_RAX(%rsp), %rax 250 254 testq %rax, %rax 251 255 SYM_INNER_LABEL(ftrace_regs_caller_jmp, SYM_L_GLOBAL) 252 256 jnz 1f ··· 283 289 cmpq $ftrace_stub, ftrace_trace_function 284 290 jnz trace 285 291 286 - fgraph_trace: 287 - #ifdef CONFIG_FUNCTION_GRAPH_TRACER 288 - cmpq $ftrace_stub, ftrace_graph_return 289 - jnz ftrace_graph_caller 290 - 291 - cmpq $ftrace_graph_entry_stub, ftrace_graph_entry 292 - jnz ftrace_graph_caller 293 - #endif 294 - 295 292 SYM_INNER_LABEL(ftrace_stub, SYM_L_GLOBAL) 296 293 retq 297 294 ··· 300 315 CALL_NOSPEC r8 301 316 restore_mcount_regs 302 317 303 - jmp fgraph_trace 318 + jmp ftrace_stub 304 319 SYM_FUNC_END(__fentry__) 305 320 EXPORT_SYMBOL(__fentry__) 306 321 #endif /* CONFIG_DYNAMIC_FTRACE */ 307 322 308 323 #ifdef CONFIG_FUNCTION_GRAPH_TRACER 309 - SYM_FUNC_START(ftrace_graph_caller) 310 - /* Saves rbp into %rdx and fills first parameter */ 311 - save_mcount_regs 312 - 313 - leaq MCOUNT_REG_SIZE+8(%rsp), %rsi 314 - movq $0, %rdx /* No framepointers needed */ 315 - call prepare_ftrace_return 316 - 317 - restore_mcount_regs 318 - 319 - retq 320 - SYM_FUNC_END(ftrace_graph_caller) 321 - 322 324 SYM_FUNC_START(return_to_handler) 323 325 subq $24, %rsp 324 326
+55 -16
arch/x86/kernel/kprobes/core.c
··· 809 809 ri->fp = sara; 810 810 811 811 /* Replace the return addr with trampoline addr */ 812 - *sara = (unsigned long) &kretprobe_trampoline; 812 + *sara = (unsigned long) &__kretprobe_trampoline; 813 813 } 814 814 NOKPROBE_SYMBOL(arch_prepare_kretprobe); 815 815 ··· 1019 1019 */ 1020 1020 asm( 1021 1021 ".text\n" 1022 - ".global kretprobe_trampoline\n" 1023 - ".type kretprobe_trampoline, @function\n" 1024 - "kretprobe_trampoline:\n" 1025 - /* We don't bother saving the ss register */ 1022 + ".global __kretprobe_trampoline\n" 1023 + ".type __kretprobe_trampoline, @function\n" 1024 + "__kretprobe_trampoline:\n" 1026 1025 #ifdef CONFIG_X86_64 1026 + /* Push a fake return address to tell the unwinder it's a kretprobe. */ 1027 + " pushq $__kretprobe_trampoline\n" 1028 + UNWIND_HINT_FUNC 1029 + /* Save the 'sp - 8', this will be fixed later. */ 1027 1030 " pushq %rsp\n" 1028 1031 " pushfq\n" 1029 1032 SAVE_REGS_STRING 1030 1033 " movq %rsp, %rdi\n" 1031 1034 " call trampoline_handler\n" 1032 - /* Replace saved sp with true return address. */ 1033 - " movq %rax, 19*8(%rsp)\n" 1034 1035 RESTORE_REGS_STRING 1036 + /* In trampoline_handler(), 'regs->flags' is copied to 'regs->sp'. */ 1037 + " addq $8, %rsp\n" 1035 1038 " popfq\n" 1036 1039 #else 1040 + /* Push a fake return address to tell the unwinder it's a kretprobe. */ 1041 + " pushl $__kretprobe_trampoline\n" 1042 + UNWIND_HINT_FUNC 1043 + /* Save the 'sp - 4', this will be fixed later. */ 1037 1044 " pushl %esp\n" 1038 1045 " pushfl\n" 1039 1046 SAVE_REGS_STRING 1040 1047 " movl %esp, %eax\n" 1041 1048 " call trampoline_handler\n" 1042 - /* Replace saved sp with true return address. */ 1043 - " movl %eax, 15*4(%esp)\n" 1044 1049 RESTORE_REGS_STRING 1050 + /* In trampoline_handler(), 'regs->flags' is copied to 'regs->sp'. 
*/ 1051 + " addl $4, %esp\n" 1045 1052 " popfl\n" 1046 1053 #endif 1047 1054 " ret\n" 1048 - ".size kretprobe_trampoline, .-kretprobe_trampoline\n" 1055 + ".size __kretprobe_trampoline, .-__kretprobe_trampoline\n" 1049 1056 ); 1050 - NOKPROBE_SYMBOL(kretprobe_trampoline); 1051 - STACK_FRAME_NON_STANDARD(kretprobe_trampoline); 1057 + NOKPROBE_SYMBOL(__kretprobe_trampoline); 1058 + /* 1059 + * __kretprobe_trampoline() skips updating frame pointer. The frame pointer 1060 + * saved in trampoline_handler() points to the real caller function's 1061 + * frame pointer. Thus the __kretprobe_trampoline() doesn't have a 1062 + * standard stack frame with CONFIG_FRAME_POINTER=y. 1063 + * Let's mark it non-standard function. Anyway, FP unwinder can correctly 1064 + * unwind without the hint. 1065 + */ 1066 + STACK_FRAME_NON_STANDARD_FP(__kretprobe_trampoline); 1052 1067 1068 + /* This is called from kretprobe_trampoline_handler(). */ 1069 + void arch_kretprobe_fixup_return(struct pt_regs *regs, 1070 + kprobe_opcode_t *correct_ret_addr) 1071 + { 1072 + unsigned long *frame_pointer = &regs->sp + 1; 1073 + 1074 + /* Replace fake return address with real one. 
*/ 1075 + *frame_pointer = (unsigned long)correct_ret_addr; 1076 + } 1053 1077 1054 1078 /* 1055 - * Called from kretprobe_trampoline 1079 + * Called from __kretprobe_trampoline 1056 1080 */ 1057 - __used __visible void *trampoline_handler(struct pt_regs *regs) 1081 + __used __visible void trampoline_handler(struct pt_regs *regs) 1058 1082 { 1083 + unsigned long *frame_pointer; 1084 + 1059 1085 /* fixup registers */ 1060 1086 regs->cs = __KERNEL_CS; 1061 1087 #ifdef CONFIG_X86_32 1062 1088 regs->gs = 0; 1063 1089 #endif 1064 - regs->ip = (unsigned long)&kretprobe_trampoline; 1090 + regs->ip = (unsigned long)&__kretprobe_trampoline; 1065 1091 regs->orig_ax = ~0UL; 1092 + regs->sp += sizeof(long); 1093 + frame_pointer = &regs->sp + 1; 1066 1094 1067 - return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, &regs->sp); 1095 + /* 1096 + * The return address at 'frame_pointer' is recovered by the 1097 + * arch_kretprobe_fixup_return() which is called from the 1098 + * kretprobe_trampoline_handler(). 1099 + */ 1100 + kretprobe_trampoline_handler(regs, frame_pointer); 1101 + 1102 + /* 1103 + * Copy FLAGS to 'pt_regs::sp' so that __kretprobe_trampoline() 1104 + * can do RET right after POPF. 1105 + */ 1106 + regs->sp = regs->flags; 1068 1107 } 1069 1108 NOKPROBE_SYMBOL(trampoline_handler); 1070 1109
-2
arch/x86/kernel/kprobes/ftrace.c
··· 25 25 if (bit < 0) 26 26 return; 27 27 28 - preempt_disable_notrace(); 29 28 p = get_kprobe((kprobe_opcode_t *)ip); 30 29 if (unlikely(!p) || kprobe_disabled(p)) 31 30 goto out; ··· 58 59 __this_cpu_write(current_kprobe, NULL); 59 60 } 60 61 out: 61 - preempt_enable_notrace(); 62 62 ftrace_test_recursion_unlock(bit); 63 63 } 64 64 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
+3 -3
arch/x86/kernel/kprobes/opt.c
··· 367 367 368 368 /* Check the addr is within the optimized instructions. */ 369 369 int arch_within_optimized_kprobe(struct optimized_kprobe *op, 370 - unsigned long addr) 370 + kprobe_opcode_t *addr) 371 371 { 372 - return ((unsigned long)op->kp.addr <= addr && 373 - (unsigned long)op->kp.addr + op->optinsn.size > addr); 372 + return (op->kp.addr <= addr && 373 + op->kp.addr + op->optinsn.size > addr); 374 374 } 375 375 376 376 /* Free optimized instruction slot */
+1 -1
arch/x86/kernel/trace.c
··· 231 231 unregister_trace_local_timer_exit(trace_intel_irq_exit, "local_timer"); 232 232 unregister_trace_local_timer_entry(trace_intel_irq_entry, NULL); 233 233 } 234 - #endif /* CONFIG_OSNOISE_TRAECR && CONFIG_X86_LOCAL_APIC */ 234 + #endif /* CONFIG_OSNOISE_TRACER && CONFIG_X86_LOCAL_APIC */
+1 -2
arch/x86/kernel/unwind_frame.c
··· 240 240 else { 241 241 addr_p = unwind_get_return_address_ptr(state); 242 242 addr = READ_ONCE_TASK_STACK(state->task, *addr_p); 243 - state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx, 244 - addr, addr_p); 243 + state->ip = unwind_recover_ret_addr(state, addr, addr_p); 245 244 } 246 245 247 246 /* Save the original stack pointer for unwind_dump(): */
+1 -2
arch/x86/kernel/unwind_guess.c
··· 15 15 16 16 addr = READ_ONCE_NOCHECK(*state->sp); 17 17 18 - return ftrace_graph_ret_addr(state->task, &state->graph_idx, 19 - addr, state->sp); 18 + return unwind_recover_ret_addr(state, addr, state->sp); 20 19 } 21 20 EXPORT_SYMBOL_GPL(unwind_get_return_address); 22 21
+17 -4
arch/x86/kernel/unwind_orc.c
··· 534 534 if (!deref_stack_reg(state, ip_p, &state->ip)) 535 535 goto err; 536 536 537 - state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx, 538 - state->ip, (void *)ip_p); 539 - 537 + state->ip = unwind_recover_ret_addr(state, state->ip, 538 + (unsigned long *)ip_p); 540 539 state->sp = sp; 541 540 state->regs = NULL; 542 541 state->prev_regs = NULL; ··· 548 549 (void *)orig_ip); 549 550 goto err; 550 551 } 551 - 552 + /* 553 + * There is a small chance to interrupt at the entry of 554 + * __kretprobe_trampoline() where the ORC info doesn't exist. 555 + * That point is right after the RET to __kretprobe_trampoline() 556 + * which was modified return address. 557 + * At that point, the @addr_p of the unwind_recover_kretprobe() 558 + * (this has to point the address of the stack entry storing 559 + * the modified return address) must be "SP - (a stack entry)" 560 + * because SP is incremented by the RET. 561 + */ 562 + state->ip = unwind_recover_kretprobe(state, state->ip, 563 + (unsigned long *)(state->sp - sizeof(long))); 552 564 state->regs = (struct pt_regs *)sp; 553 565 state->prev_regs = NULL; 554 566 state->full_regs = true; ··· 572 562 (void *)orig_ip); 573 563 goto err; 574 564 } 565 + /* See UNWIND_HINT_TYPE_REGS case comment. */ 566 + state->ip = unwind_recover_kretprobe(state, state->ip, 567 + (unsigned long *)(state->sp - sizeof(long))); 575 568 576 569 if (state->full_regs) 577 570 state->prev_regs = state->regs;
+2 -1
fs/tracefs/inode.c
··· 432 432 if (unlikely(!inode)) 433 433 return failed_creating(dentry); 434 434 435 - inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; 435 + /* Do not set bits for OTH */ 436 + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUSR | S_IRGRP | S_IXUSR | S_IXGRP; 436 437 inode->i_op = ops; 437 438 inode->i_fop = &simple_dir_operations; 438 439
+8 -2
include/asm-generic/vmlinux.lds.h
··· 164 164 * Need to also make ftrace_stub_graph point to ftrace_stub 165 165 * so that the same stub location may have different protocols 166 166 * and not mess up with C verifiers. 167 + * 168 + * ftrace_ops_list_func will be defined as arch_ftrace_ops_list_func 169 + * as some archs will have a different prototype for that function 170 + * but ftrace_ops_list_func() will have a single prototype. 167 171 */ 168 172 #define MCOUNT_REC() . = ALIGN(8); \ 169 173 __start_mcount_loc = .; \ 170 174 KEEP(*(__mcount_loc)) \ 171 175 KEEP(*(__patchable_function_entries)) \ 172 176 __stop_mcount_loc = .; \ 173 - ftrace_stub_graph = ftrace_stub; 177 + ftrace_stub_graph = ftrace_stub; \ 178 + ftrace_ops_list_func = arch_ftrace_ops_list_func; 174 179 #else 175 180 # ifdef CONFIG_FUNCTION_TRACER 176 - # define MCOUNT_REC() ftrace_stub_graph = ftrace_stub; 181 + # define MCOUNT_REC() ftrace_stub_graph = ftrace_stub; \ 182 + ftrace_ops_list_func = arch_ftrace_ops_list_func; 177 183 # else 178 184 # define MCOUNT_REC() 179 185 # endif
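The linker-script assignment `ftrace_ops_list_func = arch_ftrace_ops_list_func` makes the generic symbol resolve to the arch-specific definition, so each architecture can declare its own prototype while ftrace_ops_list_func() keeps a single one. The same symbol binding can be sketched in C with the GCC/Clang ELF `alias` attribute (symbol names here are hypothetical, not the kernel's):

```c
#include <assert.h>

int alias_calls;                /* counts invocations of the real body */

/* The arch-specific implementation that actually runs. */
void arch_list_func(void)
{
    alias_calls++;
}

/* Generic name bound to the arch symbol at link time, mimicking the
 * linker-script assignment; both names refer to the same code. */
void list_func(void) __attribute__((alias("arch_list_func")));
```

Calling either name increments the same counter, since the two symbols share one definition.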
+20 -11
include/linux/bootconfig.h
···
  7   7	 * Author: Masami Hiramatsu <mhiramat@kernel.org>
  8   8	 */
  9   9
 10   +	#ifdef __KERNEL__
 10  11	#include <linux/kernel.h>
 11  12	#include <linux/types.h>
 13   +	#else /* !__KERNEL__ */
 14   +	/*
 15   +	 * NOTE: This is only for tools/bootconfig, because tools/bootconfig will
 16   +	 * run the parser sanity test.
 17   +	 * This does NOT mean linux/bootconfig.h is available in the user space.
 18   +	 * However, if you change this file, please make sure the tools/bootconfig
 19   +	 * has no issue on building and running.
 20   +	 */
 21   +	#endif
 12  22
 13  23	#define BOOTCONFIG_MAGIC	"#BOOTCONFIG\n"
 14  24	#define BOOTCONFIG_MAGIC_LEN	12
···
 35  25	 * The checksum will be used with the BOOTCONFIG_MAGIC and the size for
 36  26	 * embedding the bootconfig in the initrd image.
 37  27	 */
 38   -	static inline __init u32 xbc_calc_checksum(void *data, u32 size)
 28   +	static inline __init uint32_t xbc_calc_checksum(void *data, uint32_t size)
 39  29	{
 40  30		unsigned char *p = data;
 41   -		u32 ret = 0;
 31   +		uint32_t ret = 0;
 42  32
 43  33		while (size--)
 44  34			ret += *p++;
···
 48  38
 49  39	/* XBC tree node */
 50  40	struct xbc_node {
 51   -		u16 next;
 52   -		u16 child;
 53   -		u16 parent;
 54   -		u16 data;
 41   +		uint16_t next;
 42   +		uint16_t child;
 43   +		uint16_t parent;
 44   +		uint16_t data;
 55  45	} __attribute__ ((__packed__));
 56  46
 57  47	#define XBC_KEY		0
···
281 271	}
282 272
283 273	/* XBC node initializer */
284   -	int __init xbc_init(char *buf, const char **emsg, int *epos);
274   +	int __init xbc_init(const char *buf, size_t size, const char **emsg, int *epos);
285 275
276   +	/* XBC node and size information */
277   +	int __init xbc_get_info(int *node_size, size_t *data_size);
286 278
287 279	/* XBC cleanup data structures */
288   -	void __init xbc_destroy_all(void);
289   -
290   -	/* Debug dump functions */
291   -	void __init xbc_debug_dump(void);
280   +	void __init xbc_exit(void);
292 281
293 282	#endif
+36 -2
include/linux/ftrace.h
···
 30  30	#define ARCH_SUPPORTS_FTRACE_OPS 0
 31  31	#endif
 32  32
 33   +	#ifdef CONFIG_FUNCTION_TRACER
 34   +	struct ftrace_ops;
 35   +	struct ftrace_regs;
 33  36	/*
 34  37	 * If the arch's mcount caller does not support all of ftrace's
 35  38	 * features, then it must call an indirect function that
 36  39	 * does. Or at least does enough to prevent any unwelcome side effects.
 40   +	 *
 41   +	 * Also define the function prototype that these architectures use
 42   +	 * to call the ftrace_ops_list_func().
 37  43	 */
 38  44	#if !ARCH_SUPPORTS_FTRACE_OPS
 39  45	# define FTRACE_FORCE_LIST_FUNC 1
 46   +	void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
 40  47	#else
 41  48	# define FTRACE_FORCE_LIST_FUNC 0
 49   +	void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 50   +			       struct ftrace_ops *op, struct ftrace_regs *fregs);
 42  51	#endif
 52   +	#endif /* CONFIG_FUNCTION_TRACER */
 43  53
···
 97  87	extern int
 98  88	ftrace_enable_sysctl(struct ctl_table *table, int write,
 99  89			     void *buffer, size_t *lenp, loff_t *ppos);
100   -
101   -	struct ftrace_ops;
102  90
103  91	#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
104  92
···
324 316				unsigned long old_addr,
325 317				unsigned long new_addr);
326 318	unsigned long ftrace_find_rec_direct(unsigned long ip);
319   +	int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
320   +	int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
321   +	int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr);
322   +
327 323	#else
324   +	struct ftrace_ops;
328 325	# define ftrace_direct_func_count 0
329 326	static inline int register_ftrace_direct(unsigned long ip, unsigned long addr)
330 327	{
···
358 345	static inline unsigned long ftrace_find_rec_direct(unsigned long ip)
359 346	{
360 347		return 0;
348   +	}
349   +	static inline int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
350   +	{
351   +		return -ENODEV;
352   +	}
353   +	static inline int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
354   +	{
355   +		return -ENODEV;
356   +	}
357   +	static inline int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr)
358   +	{
359   +		return -ENODEV;
361 360	}
362 361	#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
363 362
···
819 794		return false;
820 795	}
821 796	#endif /* CONFIG_DYNAMIC_FTRACE */
797   +
798   +	#ifdef CONFIG_FUNCTION_GRAPH_TRACER
799   +	#ifndef ftrace_graph_func
800   +	#define ftrace_graph_func ftrace_stub
801   +	#define FTRACE_OPS_GRAPH_STUB FTRACE_OPS_FL_STUB
802   +	#else
803   +	#define FTRACE_OPS_GRAPH_STUB 0
804   +	#endif
805   +	#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
822 806
823 807	/* totally disable ftrace - can not re-enable after this */
824 808	void ftrace_kill(void);
+71 -42
include/linux/kprobes.h
···
  3   3	#define _LINUX_KPROBES_H
  4   4	/*
  5   5	 * Kernel Probes (KProbes)
  6   -	 *  include/linux/kprobes.h
  7   6	 *
  8   7	 * Copyright (C) IBM Corporation, 2002, 2004
  9   8	 *
···
 38  39	#define KPROBE_REENTER		0x00000004
 39  40	#define KPROBE_HIT_SSDONE	0x00000008
 40  41
 41   -	#else /* CONFIG_KPROBES */
 42   +	#else /* !CONFIG_KPROBES */
 42  43	#include <asm-generic/kprobes.h>
 43  44	typedef int kprobe_opcode_t;
 44  45	struct arch_specific_insn {
···
104 105	#define KPROBE_FLAG_FTRACE	8 /* probe is using ftrace */
105 106
106 107	/* Has this kprobe gone ? */
107   -	static inline int kprobe_gone(struct kprobe *p)
108   +	static inline bool kprobe_gone(struct kprobe *p)
108 109	{
109 110		return p->flags & KPROBE_FLAG_GONE;
110 111	}
111 112
112 113	/* Is this kprobe disabled ? */
113   -	static inline int kprobe_disabled(struct kprobe *p)
114   +	static inline bool kprobe_disabled(struct kprobe *p)
114 115	{
115 116		return p->flags & (KPROBE_FLAG_DISABLED | KPROBE_FLAG_GONE);
116 117	}
117 118
118 119	/* Is this kprobe really running optimized path ? */
119   -	static inline int kprobe_optimized(struct kprobe *p)
120   +	static inline bool kprobe_optimized(struct kprobe *p)
120 121	{
121 122		return p->flags & KPROBE_FLAG_OPTIMIZED;
122 123	}
123 124
124 125	/* Is this kprobe uses ftrace ? */
125   -	static inline int kprobe_ftrace(struct kprobe *p)
126   +	static inline bool kprobe_ftrace(struct kprobe *p)
126 127	{
127 128		return p->flags & KPROBE_FLAG_FTRACE;
128 129	}
···
180 181	DECLARE_PER_CPU(struct kprobe *, current_kprobe);
181 182	DECLARE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
182 183
183   -	/*
184   -	 * For #ifdef avoidance:
185   -	 */
186   -	static inline int kprobes_built_in(void)
187   -	{
188   -		return 1;
189   -	}
190   -
191 184	extern void kprobe_busy_begin(void);
192 185	extern void kprobe_busy_end(void);
193 186
···
188 197				struct pt_regs *regs);
189 198	extern int arch_trampoline_kprobe(struct kprobe *p);
190 199
200   +	void arch_kretprobe_fixup_return(struct pt_regs *regs,
201   +					 kprobe_opcode_t *correct_ret_addr);
202   +
203   +	void __kretprobe_trampoline(void);
204   +	/*
205   +	 * Since some architecture uses structured function pointer,
206   +	 * use dereference_function_descriptor() to get real function address.
207   +	 */
208   +	static nokprobe_inline void *kretprobe_trampoline_addr(void)
209   +	{
210   +		return dereference_kernel_function_descriptor(__kretprobe_trampoline);
211   +	}
212   +
191 213	/* If the trampoline handler called from a kprobe, use this version */
192 214	unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
193   -					     void *trampoline_address,
194   -					     void *frame_pointer);
215   +					     void *frame_pointer);
195 216
196 217	static nokprobe_inline
197 218	unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
198   -					   void *trampoline_address,
199   -					   void *frame_pointer)
219   +					   void *frame_pointer)
200 220	{
201 221		unsigned long ret;
202 222		/*
···
216 214		 * be running at this point.
217 215		 */
218 216		kprobe_busy_begin();
219   -	ret = __kretprobe_trampoline_handler(regs, trampoline_address, frame_pointer);
217   +	ret = __kretprobe_trampoline_handler(regs, frame_pointer);
220 218		kprobe_busy_end();
221 219
222 220		return ret;
···
230 228		return READ_ONCE(ri->rph->rp);
231 229	}
232 230
233   -	#else /* CONFIG_KRETPROBES */
231   +	#else /* !CONFIG_KRETPROBES */
234 232	static inline void arch_prepare_kretprobe(struct kretprobe *rp,
235 233						struct pt_regs *regs)
236 234	{
···
241 239	}
242 240	#endif /* CONFIG_KRETPROBES */
243 241
242   +	/* Markers of '_kprobe_blacklist' section */
243   +	extern unsigned long __start_kprobe_blacklist[];
244   +	extern unsigned long __stop_kprobe_blacklist[];
245   +
244 246	extern struct kretprobe_blackpoint kretprobe_blacklist[];
245 247
246 248	#ifdef CONFIG_KPROBES_SANITY_TEST
247 249	extern int init_test_probes(void);
248   -	#else
250   +	#else /* !CONFIG_KPROBES_SANITY_TEST */
249 251	static inline int init_test_probes(void)
250 252	{
251 253		return 0;
···
309 303	#define KPROBE_OPTINSN_PAGE_SYM		"kprobe_optinsn_page"
310 304	int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum,
311 305			     unsigned long *value, char *type, char *sym);
312   -	#else /* __ARCH_WANT_KPROBES_INSN_SLOT */
306   +	#else /* !__ARCH_WANT_KPROBES_INSN_SLOT */
313 307	#define DEFINE_INSN_CACHE_OPS(__name)				\
314 308	static inline bool is_kprobe_##__name##_slot(unsigned long addr)	\
315 309	{								\
···
340 334					  struct list_head *done_list);
341 335	extern void arch_unoptimize_kprobe(struct optimized_kprobe *op);
342 336	extern int arch_within_optimized_kprobe(struct optimized_kprobe *op,
343   -					unsigned long addr);
337   +					kprobe_opcode_t *addr);
344 338
345 339	extern void opt_pre_handler(struct kprobe *p, struct pt_regs *regs);
346 340
···
351 345	extern int proc_kprobes_optimization_handler(struct ctl_table *table,
352 346					     int write, void *buffer,
353 347					     size_t *length, loff_t *ppos);
354   -	#endif
348   +	#endif /* CONFIG_SYSCTL */
355 349	extern void wait_for_kprobe_optimizer(void);
356   -	#else
350   +	#else /* !CONFIG_OPTPROBES */
357 351	static inline void wait_for_kprobe_optimizer(void) { }
358 352	#endif /* CONFIG_OPTPROBES */
353   +
359 354	#ifdef CONFIG_KPROBES_ON_FTRACE
360 355	extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
361 356				  struct ftrace_ops *ops, struct ftrace_regs *fregs);
362 357	extern int arch_prepare_kprobe_ftrace(struct kprobe *p);
363   -	#endif
364   -
365   -	int arch_check_ftrace_location(struct kprobe *p);
358   +	#else
359   +	static inline int arch_prepare_kprobe_ftrace(struct kprobe *p)
360   +	{
361   +		return -EINVAL;
362   +	}
363   +	#endif /* CONFIG_KPROBES_ON_FTRACE */
366 364
367 365	/* Get the kprobe at this addr (if any) - called with preemption disabled */
368 366	struct kprobe *get_kprobe(void *addr);
···
374 364	/* kprobe_running() will just return the current_kprobe on this CPU */
375 365	static inline struct kprobe *kprobe_running(void)
376 366	{
377   -		return (__this_cpu_read(current_kprobe));
367   +		return __this_cpu_read(current_kprobe);
378 368	}
379 369
380 370	static inline void reset_current_kprobe(void)
···
392 382	void unregister_kprobe(struct kprobe *p);
393 383	int register_kprobes(struct kprobe **kps, int num);
394 384	void unregister_kprobes(struct kprobe **kps, int num);
395   -	unsigned long arch_deref_entry_point(void *);
396 385
397 386	int register_kretprobe(struct kretprobe *rp);
398 387	void unregister_kretprobe(struct kretprobe *rp);
···
419 410			  char *type, char *sym);
420 411	#else /* !CONFIG_KPROBES: */
421 412
422   -	static inline int kprobes_built_in(void)
423   -	{
424   -		return 0;
425   -	}
426 413	static inline int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
427 414	{
428 415		return 0;
···
433 428	}
434 429	static inline int register_kprobe(struct kprobe *p)
435 430	{
436   -		return -ENOSYS;
431   +		return -EOPNOTSUPP;
437 432	}
438 433	static inline int register_kprobes(struct kprobe **kps, int num)
439 434	{
440   -		return -ENOSYS;
435   +		return -EOPNOTSUPP;
441 436	}
442 437	static inline void unregister_kprobe(struct kprobe *p)
443 438	{
···
447 442	}
448 443	static inline int register_kretprobe(struct kretprobe *rp)
449 444	{
450   -		return -ENOSYS;
445   +		return -EOPNOTSUPP;
451 446	}
452 447	static inline int register_kretprobes(struct kretprobe **rps, int num)
453 448	{
454   -		return -ENOSYS;
449   +		return -EOPNOTSUPP;
455 450	}
456 451	static inline void unregister_kretprobe(struct kretprobe *rp)
457 452	{
···
467 462	}
468 463	static inline int disable_kprobe(struct kprobe *kp)
469 464	{
470   -		return -ENOSYS;
465   +		return -EOPNOTSUPP;
471 466	}
472 467	static inline int enable_kprobe(struct kprobe *kp)
473 468	{
474   -		return -ENOSYS;
469   +		return -EOPNOTSUPP;
475 470	}
476 471
477 472	static inline bool within_kprobe_blacklist(unsigned long addr)
···
484 479		return -ERANGE;
485 480	}
486 481	#endif /* CONFIG_KPROBES */
482   +
487 483	static inline int disable_kretprobe(struct kretprobe *rp)
488 484	{
489 485		return disable_kprobe(&rp->kp);
···
499 493	{
500 494		return false;
501 495	}
502   -	#endif
496   +	#endif /* !CONFIG_KPROBES */
497   +
503 498	#ifndef CONFIG_OPTPROBES
504 499	static inline bool is_kprobe_optinsn_slot(unsigned long addr)
505 500	{
506 501		return false;
502   +	}
503   +	#endif /* !CONFIG_OPTPROBES */
504   +
505   +	#ifdef CONFIG_KRETPROBES
506   +	static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
507   +	{
508   +		return (void *)addr == kretprobe_trampoline_addr();
509   +	}
510   +
511   +	unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
512   +				      struct llist_node **cur);
513   +	#else
514   +	static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
515   +	{
516   +		return false;
517   +	}
518   +
519   +	static nokprobe_inline
520   +	unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
521   +				      struct llist_node **cur)
522   +	{
523   +		return 0;
507 524	}
508 525	#endif
509 526
···
534 505	static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
535 506					      unsigned int trap)
536 507	{
537   -		if (!kprobes_built_in())
508   +		if (!IS_ENABLED(CONFIG_KPROBES))
538 509			return false;
539 510		if (user_mode(regs))
540 511			return false;
+12
include/linux/objtool.h
···
 66  66	static void __used __section(".discard.func_stack_frame_non_standard") \
 67  67		*__func_stack_frame_non_standard_##func = func
 68  68
 69   +	/*
 70   +	 * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
 71   +	 * for the case where a function is intentionally missing frame pointer setup,
 72   +	 * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
 73   +	 */
 74   +	#ifdef CONFIG_FRAME_POINTER
 75   +	#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
 76   +	#else
 77   +	#define STACK_FRAME_NON_STANDARD_FP(func)
 78   +	#endif
 79   +
 69  80	#else /* __ASSEMBLY__ */
 70  81
 71  82	/*
···
138 127	#define UNWIND_HINT(sp_reg, sp_offset, type, end)	\
139 128		"\n\t"
140 129	#define STACK_FRAME_NON_STANDARD(func)
130   +	#define STACK_FRAME_NON_STANDARD_FP(func)
141 131	#else
142 132	#define ANNOTATE_INTRA_FUNCTION_CALL
143 133	.macro UNWIND_HINT sp_reg:req sp_offset=0 type:req end=0
+21
include/linux/preempt.h
···
 77  77	/* preempt_count() and related functions, depends on PREEMPT_NEED_RESCHED */
 78  78	#include <asm/preempt.h>
 79  79
 80   +	/**
 81   +	 * interrupt_context_level - return interrupt context level
 82   +	 *
 83   +	 * Returns the current interrupt context level.
 84   +	 *  0 - normal context
 85   +	 *  1 - softirq context
 86   +	 *  2 - hardirq context
 87   +	 *  3 - NMI context
 88   +	 */
 89   +	static __always_inline unsigned char interrupt_context_level(void)
 90   +	{
 91   +		unsigned long pc = preempt_count();
 92   +		unsigned char level = 0;
 93   +
 94   +		level += !!(pc & (NMI_MASK));
 95   +		level += !!(pc & (NMI_MASK | HARDIRQ_MASK));
 96   +		level += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));
 97   +
 98   +		return level;
 99   +	}
100   +
 80 101	#define nmi_count()	(preempt_count() & NMI_MASK)
 81 102	#define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
 82 103	#ifdef CONFIG_PREEMPT_RT
+1 -1
include/linux/trace_events.h
···
671 671	}								\
672 672	early_initcall(trace_init_perf_perm_##name);
673 673
674   -	#define PERF_MAX_TRACE_SIZE	2048
674   +	#define PERF_MAX_TRACE_SIZE	8192
675 675
676 676	#define MAX_FILTER_STR_VAL	256	/* Should handle KSYM_SYMBOL_LEN */
677 677
+18 -9
include/linux/trace_recursion.h
···
116 116
117 117	static __always_inline int trace_get_context_bit(void)
118 118	{
119   -		unsigned long pc = preempt_count();
119   +		unsigned char bit = interrupt_context_level();
120 120
121   -		if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
122   -			return TRACE_CTX_NORMAL;
123   -		else
124   -			return pc & NMI_MASK ? TRACE_CTX_NMI :
125   -				pc & HARDIRQ_MASK ? TRACE_CTX_IRQ : TRACE_CTX_SOFTIRQ;
121   +		return TRACE_CTX_NORMAL - bit;
126 122	}
127 123
128 124	#ifdef CONFIG_FTRACE_RECORD_RECURSION
···
135 139	# define do_ftrace_record_recursion(ip, pip)	do { } while (0)
136 140	#endif
137 141
142   +	/*
143   +	 * Preemption is promised to be disabled when return bit >= 0.
144   +	 */
138 145	static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
139 146							int start)
140 147	{
···
147 148		bit = trace_get_context_bit() + start;
148 149		if (unlikely(val & (1 << bit))) {
149 150		/*
150   -		 * It could be that preempt_count has not been updated during
151   -		 * a switch between contexts. Allow for a single recursion.
151   +		 * If an interrupt occurs during a trace, and another trace
152   +		 * happens in that interrupt but before the preempt_count is
153   +		 * updated to reflect the new interrupt context, then this
154   +		 * will think a recursion occurred, and the event will be dropped.
155   +		 * Let a single instance happen via the TRANSITION_BIT to
156   +		 * not drop those events.
152 157		 */
153 158		bit = TRACE_CTX_TRANSITION + start;
154 159		if (val & (1 << bit)) {
···
165 162		current->trace_recursion = val;
166 163		barrier();
167 164
165   +		preempt_disable_notrace();
166   +
168 167		return bit;
169 168	}
170 169
170   +	/*
171   +	 * Preemption will be enabled (if it was previously enabled).
172   +	 */
171 173	static __always_inline void trace_clear_recursion(int bit)
172 174	{
175   +		preempt_enable_notrace();
173 176		barrier();
174 177		trace_recursion_clear(bit);
175 178	}
···
187 178	 * tracing recursed in the same context (normal vs interrupt),
188 179	 *
189 180	 * Returns: -1 if a recursion happened.
190   -	 *           >= 0 if no recursion
181   +	 *           >= 0 if no recursion.
191 182	static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
192 183							 unsigned long parent_ip)
+4 -12
init/main.c
···
409 409		const char *msg;
410 410		int pos;
411 411		u32 size, csum;
412   -		char *data, *copy, *err;
412   +		char *data, *err;
413 413		int ret;
414 414
415 415		/* Cut out the bootconfig data even if we have no bootconfig option */
···
442 442			return;
443 443		}
444 444
445   -		copy = memblock_alloc(size + 1, SMP_CACHE_BYTES);
446   -		if (!copy) {
447   -			pr_err("Failed to allocate memory for bootconfig\n");
448   -			return;
449   -		}
450   -
451   -		memcpy(copy, data, size);
452   -		copy[size] = '\0';
453   -
454   -		ret = xbc_init(copy, &msg, &pos);
445   +		ret = xbc_init(data, size, &msg, &pos);
455 446		if (ret < 0) {
456 447			if (pos < 0)
457 448				pr_err("Failed to init bootconfig: %s.\n", msg);
···
450 459				pr_err("Failed to parse bootconfig: %s at %d.\n",
451 460					msg, pos);
452 461		} else {
462   +			xbc_get_info(&ret, NULL);
453 463			pr_info("Load bootconfig: %d bytes %d nodes\n", size, ret);
454 464			/* keys starting with "kernel." are passed via cmdline */
455 465			extra_command_line = xbc_make_cmdline("kernel");
···
462 470
463 471	static void __init exit_boot_config(void)
464 472	{
465   -		xbc_destroy_all();
473   +		xbc_exit();
466 474	}
467 475
468 476	#else	/* !CONFIG_BOOT_CONFIG */
-1
kernel/Makefile
···
 85  85	obj-$(CONFIG_IKCONFIG) += configs.o
 86  86	obj-$(CONFIG_IKHEADERS) += kheaders.o
 87  87	obj-$(CONFIG_SMP) += stop_machine.o
 88   -	obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
 89  88	obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
 90  89	obj-$(CONFIG_AUDITSYSCALL) += auditsc.o audit_watch.o audit_fsnotify.o audit_tree.o
 91  90	obj-$(CONFIG_GCOV_KERNEL) += gcov/
+1 -6
kernel/events/internal.h
···
205 205
206 206	static inline int get_recursion_context(int *recursion)
207 207	{
208   -		unsigned int pc = preempt_count();
209   -		unsigned char rctx = 0;
210   -
211   -		rctx += !!(pc & (NMI_MASK));
212   -		rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
213   -		rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));
208   +		unsigned char rctx = interrupt_context_level();
214 209
215 210		if (recursion[rctx])
216 211			return -1;
+282 -229
kernel/kprobes.c
···
  1   1	// SPDX-License-Identifier: GPL-2.0-or-later
  2   2	/*
  3   3	 * Kernel Probes (KProbes)
  4   -	 * kernel/kprobes.c
  5   4	 *
  6   5	 * Copyright (C) IBM Corporation, 2002, 2004
  7   6	 *
···
 17  18	 *		<jkenisto@us.ibm.com> and Prasanna S Panchamukhi
 18  19	 *		<prasanna@in.ibm.com> added function-return probes.
 19  20	 */
 21   +
 22   +	#define pr_fmt(fmt) "kprobes: " fmt
 23   +
 20  24	#include <linux/kprobes.h>
 21  25	#include <linux/hash.h>
 22  26	#include <linux/init.h>
···
 51  49
 52  50	static int kprobes_initialized;
 53  51	/* kprobe_table can be accessed by
 54   -	 * - Normal hlist traversal and RCU add/del under kprobe_mutex is held.
 52   +	 * - Normal hlist traversal and RCU add/del under 'kprobe_mutex' is held.
 55  53	 * Or
 56  54	 * - RCU hlist traversal under disabling preempt (breakpoint handlers)
 57  55	 */
 58  56	static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
 59  57
 60   -	/* NOTE: change this value only with kprobe_mutex held */
 58   +	/* NOTE: change this value only with 'kprobe_mutex' held */
 61  59	static bool kprobes_all_disarmed;
 62  60
 63   -	/* This protects kprobe_table and optimizing_list */
 61   +	/* This protects 'kprobe_table' and 'optimizing_list' */
 64  62	static DEFINE_MUTEX(kprobe_mutex);
 65   -	static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
 63   +	static DEFINE_PER_CPU(struct kprobe *, kprobe_instance);
 66  64
 67  65	kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
 68  66					unsigned int __unused)
···
 70  68		return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
 71  69	}
 72  70
 73   -	/* Blacklist -- list of struct kprobe_blacklist_entry */
 71   +	/*
 72   +	 * Blacklist -- list of 'struct kprobe_blacklist_entry' to store info where
 73   +	 * kprobes can not probe.
 74   +	 */
 74  75	static LIST_HEAD(kprobe_blacklist);
 75  76
 76  77	#ifdef __ARCH_WANT_KPROBES_INSN_SLOT
 77  78	/*
 78   -	 * kprobe->ainsn.insn points to the copy of the instruction to be
 79   +	 * 'kprobe::ainsn.insn' points to the copy of the instruction to be
 79  80	 * single-stepped. x86_64, POWER4 and above have no-exec support and
 80  81	 * stepping on the instruction on a vmalloced/kmalloced/data page
 81  82	 * is a recipe for disaster
···
109 104
110 105	void __weak *alloc_insn_page(void)
111 106	{
107   +	/*
108   +	 * Use module_alloc() so this page is within +/- 2GB of where the
109   +	 * kernel image and loaded module images reside. This is required
110   +	 * for most of the architectures.
111   +	 * (e.g. x86-64 needs this to handle the %rip-relative fixups.)
112   +	 */
112 113		return module_alloc(PAGE_SIZE);
113 114	}
114 115
···
150 139		list_for_each_entry_rcu(kip, &c->pages, list) {
151 140			if (kip->nused < slots_per_page(c)) {
152 141				int i;
142   +
153 143				for (i = 0; i < slots_per_page(c); i++) {
154 144					if (kip->slot_used[i] == SLOT_CLEAN) {
155 145						kip->slot_used[i] = SLOT_USED;
···
176 164		if (!kip)
177 165			goto out;
178 166
179   -	/*
180   -	 * Use module_alloc so this page is within +/- 2GB of where the
181   -	 * kernel image and loaded module images reside. This is required
182   -	 * so x86_64 can correctly handle the %rip-relative fixups.
183   -	 */
184 167		kip->insns = c->alloc();
185 168		if (!kip->insns) {
186 169			kfree(kip);
···
198 191		return slot;
199 192	}
200 193
201   -	/* Return 1 if all garbages are collected, otherwise 0. */
202   -	static int collect_one_slot(struct kprobe_insn_page *kip, int idx)
194   +	/* Return true if all garbages are collected, otherwise false. */
195   +	static bool collect_one_slot(struct kprobe_insn_page *kip, int idx)
203 196	{
204 197		kip->slot_used[idx] = SLOT_CLEAN;
205 198		kip->nused--;
···
223 216			kip->cache->free(kip->insns);
224 217			kfree(kip);
225 218		}
226   -		return 1;
219   +		return true;
227 220	}
228   -	return 0;
221   +	return false;
229 222	}
230 223
231 224	static int collect_garbage_slots(struct kprobe_insn_cache *c)
···
237 230
238 231		list_for_each_entry_safe(kip, next, &c->pages, list) {
239 232			int i;
233   +
240 234			if (kip->ngarbage == 0)
241 235				continue;
242 236			kip->ngarbage = 0;	/* we will collect all garbages */
···
318 310		list_for_each_entry_rcu(kip, &c->pages, list) {
319 311			if ((*symnum)--)
320 312				continue;
321   -			strlcpy(sym, c->sym, KSYM_NAME_LEN);
313   +			strscpy(sym, c->sym, KSYM_NAME_LEN);
322 314			*type = 't';
323 315			*value = (unsigned long)kip->insns;
324 316			ret = 0;
···
366 358
367 359	/*
368 360	 * This routine is called either:
369   -	 * 	- under the kprobe_mutex - during kprobe_[un]register()
370   -	 * 				OR
371   -	 * 	- with preemption disabled - from arch/xxx/kernel/kprobes.c
361   +	 *	- under the 'kprobe_mutex' - during kprobe_[un]register().
362   +	 *				OR
363   +	 *	- with preemption disabled - from architecture specific code.
372 364	 */
373 365	struct kprobe *get_kprobe(void *addr)
374 366	{
···
388 380
389 381	static int aggr_pre_handler(struct kprobe *p, struct pt_regs *regs);
390 382
391   -	/* Return true if the kprobe is an aggregator */
392   -	static inline int kprobe_aggrprobe(struct kprobe *p)
383   +	/* Return true if 'p' is an aggregator */
384   +	static inline bool kprobe_aggrprobe(struct kprobe *p)
393 385	{
394 386		return p->pre_handler == aggr_pre_handler;
395 387	}
396 388
397   -	/* Return true(!0) if the kprobe is unused */
398   -	static inline int kprobe_unused(struct kprobe *p)
389   +	/* Return true if 'p' is unused */
390   +	static inline bool kprobe_unused(struct kprobe *p)
399 391	{
400 392		return kprobe_aggrprobe(p) && kprobe_disabled(p) &&
401 393		       list_empty(&p->list);
402 394	}
403 395
404   -	/*
405   -	 * Keep all fields in the kprobe consistent
406   -	 */
396   +	/* Keep all fields in the kprobe consistent. */
407 397	static inline void copy_kprobe(struct kprobe *ap, struct kprobe *p)
408 398	{
409 399		memcpy(&p->opcode, &ap->opcode, sizeof(kprobe_opcode_t));
···
409 403	}
410 404
411 405	#ifdef CONFIG_OPTPROBES
412   -	/* NOTE: change this value only with kprobe_mutex held */
406   +	/* NOTE: This is protected by 'kprobe_mutex'. */
413 407	static bool kprobes_allow_optimization;
414 408
415 409	/*
416   -	 * Call all pre_handler on the list, but ignores its return value.
410   +	 * Call all 'kprobe::pre_handler' on the list, but ignores its return value.
417 411	 * This must be called from arch-dep optimized caller.
418 412	 */
419 413	void opt_pre_handler(struct kprobe *p, struct pt_regs *regs)
···
441 435		kfree(op);
442 436	}
443 437
444   -	/* Return true(!0) if the kprobe is ready for optimization. */
438   +	/* Return true if the kprobe is ready for optimization. */
445 439	static inline int kprobe_optready(struct kprobe *p)
446 440	{
447 441		struct optimized_kprobe *op;
···
454 448		return 0;
455 449	}
456 450
457   -	/* Return true(!0) if the kprobe is disarmed. Note: p must be on hash list */
458   -	static inline int kprobe_disarmed(struct kprobe *p)
451   +	/* Return true if the kprobe is disarmed. Note: p must be on hash list */
452   +	static inline bool kprobe_disarmed(struct kprobe *p)
459 453	{
460 454		struct optimized_kprobe *op;
461 455
···
468 462		return kprobe_disabled(p) && list_empty(&op->list);
469 463	}
470 464
471   -	/* Return true(!0) if the probe is queued on (un)optimizing lists */
472   -	static int kprobe_queued(struct kprobe *p)
465   +	/* Return true if the probe is queued on (un)optimizing lists */
466   +	static bool kprobe_queued(struct kprobe *p)
473 467	{
474 468		struct optimized_kprobe *op;
475 469
476 470		if (kprobe_aggrprobe(p)) {
477 471			op = container_of(p, struct optimized_kprobe, kp);
478 472			if (!list_empty(&op->list))
479   -				return 1;
473   +				return true;
480 474		}
481   -		return 0;
475   +		return false;
482 476	}
483 477
484 478	/*
485 479	 * Return an optimized kprobe whose optimizing code replaces
486   -	 * instructions including addr (exclude breakpoint).
480   +	 * instructions including 'addr' (exclude breakpoint).
487 481	 */
488   -	static struct kprobe *get_optimized_kprobe(unsigned long addr)
482   +	static struct kprobe *get_optimized_kprobe(kprobe_opcode_t *addr)
489 483	{
490 484		int i;
491 485		struct kprobe *p = NULL;
492 486		struct optimized_kprobe *op;
493 487
494 488		/* Don't check i == 0, since that is a breakpoint case. */
495   -		for (i = 1; !p && i < MAX_OPTIMIZED_LENGTH; i++)
496   -			p = get_kprobe((void *)(addr - i));
489   +		for (i = 1; !p && i < MAX_OPTIMIZED_LENGTH / sizeof(kprobe_opcode_t); i++)
490   +			p = get_kprobe(addr - i);
497 491
498 492		if (p && kprobe_optready(p)) {
499 493			op = container_of(p, struct optimized_kprobe, kp);
···
504 498		return NULL;
505 499	}
506 500
507   -	/* Optimization staging list, protected by kprobe_mutex */
501   +	/* Optimization staging list, protected by 'kprobe_mutex' */
508 502	static LIST_HEAD(optimizing_list);
509 503	static LIST_HEAD(unoptimizing_list);
510 504	static LIST_HEAD(freeing_list);
···
515 509
516 510	/*
517 511	 * Optimize (replace a breakpoint with a jump) kprobes listed on
518   -	 * optimizing_list.
512   +	 * 'optimizing_list'.
519 513	 */
520 514	static void do_optimize_kprobes(void)
521 515	{
522 516		lockdep_assert_held(&text_mutex);
523 517		/*
524   -	 * The optimization/unoptimization refers online_cpus via
525   -	 * stop_machine() and cpu-hotplug modifies online_cpus.
526   -	 * And same time, text_mutex will be held in cpu-hotplug and here.
527   -	 * This combination can cause a deadlock (cpu-hotplug try to lock
528   -	 * text_mutex but stop_machine can not be done because online_cpus
529   -	 * has been changed)
530   -	 * To avoid this deadlock, caller must have locked cpu hotplug
531   -	 * for preventing cpu-hotplug outside of text_mutex locking.
518   +	 * The optimization/unoptimization refers 'online_cpus' via
519   +	 * stop_machine() and cpu-hotplug modifies the 'online_cpus'.
520   +	 * And same time, 'text_mutex' will be held in cpu-hotplug and here.
521   +	 * This combination can cause a deadlock (cpu-hotplug tries to lock
522   +	 * 'text_mutex' but stop_machine() can not be done because
523   +	 * the 'online_cpus' has been changed)
524   +	 * To avoid this deadlock, caller must have locked cpu-hotplug
525   +	 * for preventing cpu-hotplug outside of 'text_mutex' locking.
532 526		 */
533 527		lockdep_assert_cpus_held();
534 528
···
542 536
543 537	/*
544 538	 * Unoptimize (replace a jump with a breakpoint and remove the breakpoint
545   -	 * if need) kprobes listed on unoptimizing_list.
539   +	 * if need) kprobes listed on 'unoptimizing_list'.
546 540	 */
547 541	static void do_unoptimize_kprobes(void)
548 542	{
···
557 551			return;
558 552
559 553		arch_unoptimize_kprobes(&unoptimizing_list, &freeing_list);
560   -	/* Loop free_list for disarming */
554   +	/* Loop on 'freeing_list' for disarming */
561 555		list_for_each_entry_safe(op, tmp, &freeing_list, list) {
562 556			/* Switching from detour code to origin */
563 557			op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
···
568 562			/*
569 563			 * Remove unused probes from hash list. After waiting
570 564			 * for synchronization, these probes are reclaimed.
571   -			 * (reclaiming is done by do_free_cleaned_kprobes.)
565   +			 * (reclaiming is done by do_free_cleaned_kprobes().)
572 566			 */
573 567			hlist_del_rcu(&op->kp.hlist);
574 568		} else
···
576 570		}
577 571	}
578 572
579   -	/* Reclaim all kprobes on the free_list */
573   +	/* Reclaim all kprobes on the 'freeing_list' */
580 574	static void do_free_cleaned_kprobes(void)
581 575	{
582 576		struct optimized_kprobe *op, *tmp;
···
648 642		while (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list)) {
649 643			mutex_unlock(&kprobe_mutex);
650 644
651   -		/* this will also make optimizing_work execute immmediately */
645   +		/* This will also make 'optimizing_work' execute immmediately */
652 646			flush_delayed_work(&optimizing_work);
653   -		/* @optimizing_work might not have been queued yet, relax */
647   +		/* 'optimizing_work' might not have been queued yet, relax */
654 648			cpu_relax();
655 649
656 650			mutex_lock(&kprobe_mutex);
···
681 675		    (kprobe_disabled(p) || kprobes_all_disarmed))
682 676			return;
683 677
684   -	/* kprobes with post_handler can not be optimized */
678   +	/* kprobes with 'post_handler' can not be optimized */
685 679		if (p->post_handler)
686 680			return;
687 681
···
701 695		}
702 696		op->kp.flags |= KPROBE_FLAG_OPTIMIZED;
703 697
704   -	/* On unoptimizing/optimizing_list, op must have OPTIMIZED flag */
698   +	/*
699   +	 * On the 'unoptimizing_list' and 'optimizing_list',
700   +	 * 'op' must have OPTIMIZED flag
701   +	 */
705 702		if (WARN_ON_ONCE(!list_empty(&op->list)))
706 703			return;
707 704
···
774 765		WARN_ON_ONCE(list_empty(&op->list));
775 766		/* Enable the probe again */
776 767		ap->flags &= ~KPROBE_FLAG_DISABLED;
777   -	/* Optimize it again (remove from op->list) */
768   +	/* Optimize it again. (remove from 'op->list') */
778 769		if (!kprobe_optready(ap))
779 770			return -EINVAL;
780 771
···
824 815		__prepare_optimized_kprobe(op, p);
825 816	}
826 817
827   -	/* Allocate new optimized_kprobe and try to prepare optimized instructions */
818   +	/* Allocate new optimized_kprobe and try to prepare optimized instructions. */
828 819	static struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
829 820	{
830 821		struct optimized_kprobe *op;
···
843 834	static void init_aggr_kprobe(struct kprobe *ap, struct kprobe *p);
844 835
845 836	/*
846   -	 * Prepare an optimized_kprobe and optimize it
847   -	 * NOTE: p must be a normal registered kprobe
837   +	 * Prepare an optimized_kprobe and optimize it.
838   +	 * NOTE: 'p' must be a normal registered kprobe.
848 839	 */
849 840	static void try_to_optimize_kprobe(struct kprobe *p)
850 841	{
851 842		struct kprobe *ap;
852 843		struct optimized_kprobe *op;
853 844
854   -	/* Impossible to optimize ftrace-based kprobe */
845   +	/* Impossible to optimize ftrace-based kprobe. */
855 846		if (kprobe_ftrace(p))
856 847			return;
857 848
858   -	/* For preparing optimization, jump_label_text_reserved() is called */
849   +	/* For preparing optimization, jump_label_text_reserved() is called. */
859 850		cpus_read_lock();
860 851		jump_label_lock();
861 852		mutex_lock(&text_mutex);
···
866 857
867 858		op = container_of(ap, struct optimized_kprobe, kp);
868 859		if (!arch_prepared_optinsn(&op->optinsn)) {
869   -		/* If failed to setup optimizing, fallback to kprobe */
860   +		/* If failed to setup optimizing, fallback to kprobe. */
870 861			arch_remove_optimized_kprobe(op);
871 862			kfree(op);
872 863			goto out;
873 864		}
874 865
875 866		init_aggr_kprobe(ap, p);
876   -	optimize_kprobe(ap);	/* This just kicks optimizer thread */
867   +	optimize_kprobe(ap);	/* This just kicks optimizer thread. */
877 868
878 869	out:
879 870		mutex_unlock(&text_mutex);
···
888 879		unsigned int i;
889 880
890 881		mutex_lock(&kprobe_mutex);
891   -	/* If optimization is already allowed, just return */
882   +	/* If optimization is already allowed, just return. */
892 883		if (kprobes_allow_optimization)
893 884			goto out;
894 885
···
901 892				optimize_kprobe(p);
902 893		}
903 894		cpus_read_unlock();
904   -	printk(KERN_INFO "Kprobes globally optimized\n");
895   +	pr_info("kprobe jump-optimization is enabled. All kprobes are optimized if possible.\n");
905 896	out:
906 897		mutex_unlock(&kprobe_mutex);
907 898	}
···
914 905		unsigned int i;
915 906
916 907		mutex_lock(&kprobe_mutex);
917   -	/* If optimization is already prohibited, just return */
908   +	/* If optimization is already prohibited, just return. */
918 909		if (!kprobes_allow_optimization) {
919 910			mutex_unlock(&kprobe_mutex);
920 911			return;
···
932 923		cpus_read_unlock();
933 924		mutex_unlock(&kprobe_mutex);
934 925
935   -	/* Wait for unoptimizing completion */
926   +	/* Wait for unoptimizing completion. */
936 927		wait_for_kprobe_optimizer();
937   -	printk(KERN_INFO "Kprobes globally unoptimized\n");
928   +	pr_info("kprobe jump-optimization is disabled.
All kprobes are based on software breakpoint.\n"); 938 929 } 939 930 940 931 static DEFINE_MUTEX(kprobe_sysctl_mutex); ··· 959 950 } 960 951 #endif /* CONFIG_SYSCTL */ 961 952 962 - /* Put a breakpoint for a probe. Must be called with text_mutex locked */ 953 + /* Put a breakpoint for a probe. */ 963 954 static void __arm_kprobe(struct kprobe *p) 964 955 { 965 956 struct kprobe *_p; 966 957 967 - /* Check collision with other optimized kprobes */ 968 - _p = get_optimized_kprobe((unsigned long)p->addr); 958 + lockdep_assert_held(&text_mutex); 959 + 960 + /* Find the overlapping optimized kprobes. */ 961 + _p = get_optimized_kprobe(p->addr); 969 962 if (unlikely(_p)) 970 963 /* Fallback to unoptimized kprobe */ 971 964 unoptimize_kprobe(_p, true); ··· 976 965 optimize_kprobe(p); /* Try to optimize (add kprobe to a list) */ 977 966 } 978 967 979 - /* Remove the breakpoint of a probe. Must be called with text_mutex locked */ 968 + /* Remove the breakpoint of a probe. */ 980 969 static void __disarm_kprobe(struct kprobe *p, bool reopt) 981 970 { 982 971 struct kprobe *_p; 972 + 973 + lockdep_assert_held(&text_mutex); 983 974 984 975 /* Try to unoptimize */ 985 976 unoptimize_kprobe(p, kprobes_all_disarmed); 986 977 987 978 if (!kprobe_queued(p)) { 988 979 arch_disarm_kprobe(p); 989 - /* If another kprobe was blocked, optimize it. */ 990 - _p = get_optimized_kprobe((unsigned long)p->addr); 980 + /* If another kprobe was blocked, re-optimize it. */ 981 + _p = get_optimized_kprobe(p->addr); 991 982 if (unlikely(_p) && reopt) 992 983 optimize_kprobe(_p); 993 984 } 994 - /* TODO: reoptimize others after unoptimized this probe */ 985 + /* 986 + * TODO: Since unoptimization and real disarming will be done by 987 + * the worker thread, we can not check whether another probe are 988 + * unoptimized because of this probe here. It should be re-optimized 989 + * by the worker thread. 990 + */ 995 991 } 996 992 997 993 #else /* !CONFIG_OPTPROBES */ ··· 1021 1003 * unregistered. 
1022 1004 * Thus there should be no chance to reuse unused kprobe. 1023 1005 */ 1024 - printk(KERN_ERR "Error: There should be no unused kprobe here.\n"); 1006 + WARN_ON_ONCE(1); 1025 1007 return -EINVAL; 1026 1008 } 1027 1009 ··· 1051 1033 static int kprobe_ipmodify_enabled; 1052 1034 static int kprobe_ftrace_enabled; 1053 1035 1054 - /* Must ensure p->addr is really on ftrace */ 1055 - static int prepare_kprobe(struct kprobe *p) 1056 - { 1057 - if (!kprobe_ftrace(p)) 1058 - return arch_prepare_kprobe(p); 1059 - 1060 - return arch_prepare_kprobe_ftrace(p); 1061 - } 1062 - 1063 - /* Caller must lock kprobe_mutex */ 1064 1036 static int __arm_kprobe_ftrace(struct kprobe *p, struct ftrace_ops *ops, 1065 1037 int *cnt) 1066 1038 { 1067 1039 int ret = 0; 1068 1040 1041 + lockdep_assert_held(&kprobe_mutex); 1042 + 1069 1043 ret = ftrace_set_filter_ip(ops, (unsigned long)p->addr, 0, 0); 1070 - if (ret) { 1071 - pr_debug("Failed to arm kprobe-ftrace at %pS (%d)\n", 1072 - p->addr, ret); 1044 + if (WARN_ONCE(ret < 0, "Failed to arm kprobe-ftrace at %pS (error %d)\n", p->addr, ret)) 1073 1045 return ret; 1074 - } 1075 1046 1076 1047 if (*cnt == 0) { 1077 1048 ret = register_ftrace_function(ops); 1078 - if (ret) { 1079 - pr_debug("Failed to init kprobe-ftrace (%d)\n", ret); 1049 + if (WARN(ret < 0, "Failed to register kprobe-ftrace (error %d)\n", ret)) 1080 1050 goto err_ftrace; 1081 - } 1082 1051 } 1083 1052 1084 1053 (*cnt)++; ··· 1089 1084 ipmodify ? 
&kprobe_ipmodify_enabled : &kprobe_ftrace_enabled); 1090 1085 } 1091 1086 1092 - /* Caller must lock kprobe_mutex */ 1093 1087 static int __disarm_kprobe_ftrace(struct kprobe *p, struct ftrace_ops *ops, 1094 1088 int *cnt) 1095 1089 { 1096 1090 int ret = 0; 1097 1091 1092 + lockdep_assert_held(&kprobe_mutex); 1093 + 1098 1094 if (*cnt == 1) { 1099 1095 ret = unregister_ftrace_function(ops); 1100 - if (WARN(ret < 0, "Failed to unregister kprobe-ftrace (%d)\n", ret)) 1096 + if (WARN(ret < 0, "Failed to unregister kprobe-ftrace (error %d)\n", ret)) 1101 1097 return ret; 1102 1098 } 1103 1099 1104 1100 (*cnt)--; 1105 1101 1106 1102 ret = ftrace_set_filter_ip(ops, (unsigned long)p->addr, 1, 0); 1107 - WARN_ONCE(ret < 0, "Failed to disarm kprobe-ftrace at %pS (%d)\n", 1103 + WARN_ONCE(ret < 0, "Failed to disarm kprobe-ftrace at %pS (error %d)\n", 1108 1104 p->addr, ret); 1109 1105 return ret; 1110 1106 } ··· 1119 1113 ipmodify ? &kprobe_ipmodify_enabled : &kprobe_ftrace_enabled); 1120 1114 } 1121 1115 #else /* !CONFIG_KPROBES_ON_FTRACE */ 1122 - static inline int prepare_kprobe(struct kprobe *p) 1123 - { 1124 - return arch_prepare_kprobe(p); 1125 - } 1126 - 1127 1116 static inline int arm_kprobe_ftrace(struct kprobe *p) 1128 1117 { 1129 1118 return -ENODEV; ··· 1130 1129 } 1131 1130 #endif 1132 1131 1133 - /* Arm a kprobe with text_mutex */ 1132 + static int prepare_kprobe(struct kprobe *p) 1133 + { 1134 + /* Must ensure p->addr is really on ftrace */ 1135 + if (kprobe_ftrace(p)) 1136 + return arch_prepare_kprobe_ftrace(p); 1137 + 1138 + return arch_prepare_kprobe(p); 1139 + } 1140 + 1134 1141 static int arm_kprobe(struct kprobe *kp) 1135 1142 { 1136 1143 if (unlikely(kprobe_ftrace(kp))) ··· 1153 1144 return 0; 1154 1145 } 1155 1146 1156 - /* Disarm a kprobe with text_mutex */ 1157 1147 static int disarm_kprobe(struct kprobe *kp, bool reopt) 1158 1148 { 1159 1149 if (unlikely(kprobe_ftrace(kp))) ··· 1202 1194 } 1203 1195 NOKPROBE_SYMBOL(aggr_post_handler); 1204 1196 1205 
- /* Walks the list and increments nmissed count for multiprobe case */ 1197 + /* Walks the list and increments 'nmissed' if 'p' has child probes. */ 1206 1198 void kprobes_inc_nmissed_count(struct kprobe *p) 1207 1199 { 1208 1200 struct kprobe *kp; 1201 + 1209 1202 if (!kprobe_aggrprobe(p)) { 1210 1203 p->nmissed++; 1211 1204 } else { 1212 1205 list_for_each_entry_rcu(kp, &p->list, list) 1213 1206 kp->nmissed++; 1214 1207 } 1215 - return; 1216 1208 } 1217 1209 NOKPROBE_SYMBOL(kprobes_inc_nmissed_count); 1218 1210 ··· 1230 1222 { 1231 1223 struct kretprobe *rp = get_kretprobe(ri); 1232 1224 1233 - if (likely(rp)) { 1225 + if (likely(rp)) 1234 1226 freelist_add(&ri->freelist, &rp->freelist); 1235 - } else 1227 + else 1236 1228 call_rcu(&ri->rcu, free_rp_inst_rcu); 1237 1229 } 1238 1230 NOKPROBE_SYMBOL(recycle_rp_inst); ··· 1259 1251 1260 1252 /* 1261 1253 * This function is called from delayed_put_task_struct() when a task is 1262 - * dead and cleaned up to recycle any function-return probe instances 1263 - * associated with this task. These left over instances represent probed 1264 - * functions that have been called but will never return. 1254 + * dead and cleaned up to recycle any kretprobe instances associated with 1255 + * this task. These left over instances represent probed functions that 1256 + * have been called but will never return. 1265 1257 */ 1266 1258 void kprobe_flush_task(struct task_struct *tk) 1267 1259 { ··· 1307 1299 } 1308 1300 } 1309 1301 1310 - /* Add the new probe to ap->list */ 1302 + /* Add the new probe to 'ap->list'. */ 1311 1303 static int add_new_kprobe(struct kprobe *ap, struct kprobe *p) 1312 1304 { 1313 1305 if (p->post_handler) ··· 1321 1313 } 1322 1314 1323 1315 /* 1324 - * Fill in the required fields of the "manager kprobe". Replace the 1325 - * earlier kprobe in the hlist with the manager kprobe 1316 + * Fill in the required fields of the aggregator kprobe. 
Replace the 1317 + * earlier kprobe in the hlist with the aggregator kprobe. 1326 1318 */ 1327 1319 static void init_aggr_kprobe(struct kprobe *ap, struct kprobe *p) 1328 1320 { 1329 - /* Copy p's insn slot to ap */ 1321 + /* Copy the insn slot of 'p' to 'ap'. */ 1330 1322 copy_kprobe(p, ap); 1331 1323 flush_insn_slot(ap); 1332 1324 ap->addr = p->addr; ··· 1344 1336 } 1345 1337 1346 1338 /* 1347 - * This is the second or subsequent kprobe at the address - handle 1348 - * the intricacies 1339 + * This registers the second or subsequent kprobe at the same address. 1349 1340 */ 1350 1341 static int register_aggr_kprobe(struct kprobe *orig_p, struct kprobe *p) 1351 1342 { ··· 1358 1351 mutex_lock(&text_mutex); 1359 1352 1360 1353 if (!kprobe_aggrprobe(orig_p)) { 1361 - /* If orig_p is not an aggr_kprobe, create new aggr_kprobe. */ 1354 + /* If 'orig_p' is not an 'aggr_kprobe', create new one. */ 1362 1355 ap = alloc_aggr_kprobe(orig_p); 1363 1356 if (!ap) { 1364 1357 ret = -ENOMEM; ··· 1383 1376 if (ret) 1384 1377 /* 1385 1378 * Even if fail to allocate new slot, don't need to 1386 - * free aggr_probe. It will be used next time, or 1387 - * freed by unregister_kprobe. 1379 + * free the 'ap'. It will be used next time, or 1380 + * freed by unregister_kprobe(). 1388 1381 */ 1389 1382 goto out; 1390 1383 ··· 1399 1392 | KPROBE_FLAG_DISABLED; 1400 1393 } 1401 1394 1402 - /* Copy ap's insn slot to p */ 1395 + /* Copy the insn slot of 'p' to 'ap'. */ 1403 1396 copy_kprobe(ap, p); 1404 1397 ret = add_new_kprobe(ap, p); 1405 1398 ··· 1425 1418 1426 1419 bool __weak arch_within_kprobe_blacklist(unsigned long addr) 1427 1420 { 1428 - /* The __kprobes marked functions and entry code must not be probed */ 1421 + /* The '__kprobes' functions and entry code must not be probed. 
*/ 1429 1422 return addr >= (unsigned long)__kprobes_text_start && 1430 1423 addr < (unsigned long)__kprobes_text_end; 1431 1424 } ··· 1437 1430 if (arch_within_kprobe_blacklist(addr)) 1438 1431 return true; 1439 1432 /* 1440 - * If there exists a kprobe_blacklist, verify and 1441 - * fail any probe registration in the prohibited area 1433 + * If 'kprobe_blacklist' is defined, check the address and 1434 + * reject any probe registration in the prohibited area. 1442 1435 */ 1443 1436 list_for_each_entry(ent, &kprobe_blacklist, list) { 1444 1437 if (addr >= ent->start_addr && addr < ent->end_addr) ··· 1468 1461 } 1469 1462 1470 1463 /* 1471 - * If we have a symbol_name argument, look it up and add the offset field 1464 + * If 'symbol_name' is specified, look it up and add the 'offset' 1472 1465 * to it. This way, we can specify a relative address to a symbol. 1473 1466 * This returns encoded errors if it fails to look up symbol or invalid 1474 1467 * combination of parameters. ··· 1498 1491 return _kprobe_addr(p->addr, p->symbol_name, p->offset); 1499 1492 } 1500 1493 1501 - /* Check passed kprobe is valid and return kprobe in kprobe_table. */ 1494 + /* 1495 + * Check the 'p' is valid and return the aggregator kprobe 1496 + * at the same address. 1497 + */ 1502 1498 static struct kprobe *__get_valid_kprobe(struct kprobe *p) 1503 1499 { 1504 1500 struct kprobe *ap, *list_p; ··· 1539 1529 return ret; 1540 1530 } 1541 1531 1542 - int __weak arch_check_ftrace_location(struct kprobe *p) 1532 + static int check_ftrace_location(struct kprobe *p) 1543 1533 { 1544 1534 unsigned long ftrace_addr; 1545 1535 ··· 1562 1552 { 1563 1553 int ret; 1564 1554 1565 - ret = arch_check_ftrace_location(p); 1555 + ret = check_ftrace_location(p); 1566 1556 if (ret) 1567 1557 return ret; 1568 1558 jump_label_lock(); ··· 1578 1568 goto out; 1579 1569 } 1580 1570 1581 - /* Check if are we probing a module */ 1571 + /* Check if 'p' is probing a module. 
*/ 1582 1572 *probed_mod = __module_text_address((unsigned long) p->addr); 1583 1573 if (*probed_mod) { 1584 1574 /* ··· 1591 1581 } 1592 1582 1593 1583 /* 1594 - * If the module freed .init.text, we couldn't insert 1584 + * If the module freed '.init.text', we couldn't insert 1595 1585 * kprobes in there. 1596 1586 */ 1597 1587 if (within_module_init((unsigned long)p->addr, *probed_mod) && ··· 1638 1628 1639 1629 old_p = get_kprobe(p->addr); 1640 1630 if (old_p) { 1641 - /* Since this may unoptimize old_p, locking text_mutex. */ 1631 + /* Since this may unoptimize 'old_p', locking 'text_mutex'. */ 1642 1632 ret = register_aggr_kprobe(old_p, p); 1643 1633 goto out; 1644 1634 } ··· 1677 1667 } 1678 1668 EXPORT_SYMBOL_GPL(register_kprobe); 1679 1669 1680 - /* Check if all probes on the aggrprobe are disabled */ 1681 - static int aggr_kprobe_disabled(struct kprobe *ap) 1670 + /* Check if all probes on the 'ap' are disabled. */ 1671 + static bool aggr_kprobe_disabled(struct kprobe *ap) 1682 1672 { 1683 1673 struct kprobe *kp; 1684 1674 ··· 1687 1677 list_for_each_entry(kp, &ap->list, list) 1688 1678 if (!kprobe_disabled(kp)) 1689 1679 /* 1690 - * There is an active probe on the list. 1691 - * We can't disable this ap. 1680 + * Since there is an active probe on the list, 1681 + * we can't disable this 'ap'. 
1692 1682 */ 1693 - return 0; 1683 + return false; 1694 1684 1695 - return 1; 1685 + return true; 1696 1686 } 1697 1687 1698 - /* Disable one kprobe: Make sure called under kprobe_mutex is locked */ 1699 1688 static struct kprobe *__disable_kprobe(struct kprobe *p) 1700 1689 { 1701 1690 struct kprobe *orig_p; 1702 1691 int ret; 1692 + 1693 + lockdep_assert_held(&kprobe_mutex); 1703 1694 1704 1695 /* Get an original kprobe for return */ 1705 1696 orig_p = __get_valid_kprobe(p); ··· 1715 1704 /* Try to disarm and disable this/parent probe */ 1716 1705 if (p == orig_p || aggr_kprobe_disabled(orig_p)) { 1717 1706 /* 1718 - * If kprobes_all_disarmed is set, orig_p 1707 + * If 'kprobes_all_disarmed' is set, 'orig_p' 1719 1708 * should have already been disarmed, so 1720 1709 * skip the unneeded disarming process. 1721 1710 */ ··· 1861 1850 .priority = 0x7fffffff /* we need to be notified first */ 1862 1851 }; 1863 1852 1864 - unsigned long __weak arch_deref_entry_point(void *entry) 1865 - { 1866 - return (unsigned long)entry; 1867 - } 1868 - 1869 1853 #ifdef CONFIG_KRETPROBES 1870 1854 1855 + /* This assumes the 'tsk' is the current task or the 'tsk' is not running. */ 1856 + static kprobe_opcode_t *__kretprobe_find_ret_addr(struct task_struct *tsk, 1857 + struct llist_node **cur) 1858 + { 1859 + struct kretprobe_instance *ri = NULL; 1860 + struct llist_node *node = *cur; 1861 + 1862 + if (!node) 1863 + node = tsk->kretprobe_instances.first; 1864 + else 1865 + node = node->next; 1866 + 1867 + while (node) { 1868 + ri = container_of(node, struct kretprobe_instance, llist); 1869 + if (ri->ret_addr != kretprobe_trampoline_addr()) { 1870 + *cur = node; 1871 + return ri->ret_addr; 1872 + } 1873 + node = node->next; 1874 + } 1875 + return NULL; 1876 + } 1877 + NOKPROBE_SYMBOL(__kretprobe_find_ret_addr); 1878 + 1879 + /** 1880 + * kretprobe_find_ret_addr -- Find correct return address modified by kretprobe 1881 + * @tsk: Target task 1882 + * @fp: A frame pointer 1883 + * @cur: a storage of the loop cursor llist_node pointer for next call 1884 + * 1885 + * Find the correct return address modified by a kretprobe on @tsk in unsigned 1886 + * long type. If it finds the return address, this returns that address value, 1887 + * otherwise this returns 0. 1888 + * The @tsk must be 'current' or a task which is not running. @fp is a hint 1889 + * to get the correct return address - which is compared with the 1890 + * kretprobe_instance::fp field. The @cur is a loop cursor for searching the 1891 + * kretprobe return addresses on the @tsk. The '*@cur' should be NULL at the 1892 + * first call, but '@cur' itself must NOT be NULL.
1893 + */ 1894 + unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp, 1895 + struct llist_node **cur) 1896 + { 1897 + struct kretprobe_instance *ri = NULL; 1898 + kprobe_opcode_t *ret; 1899 + 1900 + if (WARN_ON_ONCE(!cur)) 1901 + return 0; 1902 + 1903 + do { 1904 + ret = __kretprobe_find_ret_addr(tsk, cur); 1905 + if (!ret) 1906 + break; 1907 + ri = container_of(*cur, struct kretprobe_instance, llist); 1908 + } while (ri->fp != fp); 1909 + 1910 + return (unsigned long)ret; 1911 + } 1912 + NOKPROBE_SYMBOL(kretprobe_find_ret_addr); 1913 + 1914 + void __weak arch_kretprobe_fixup_return(struct pt_regs *regs, 1915 + kprobe_opcode_t *correct_ret_addr) 1916 + { 1917 + /* 1918 + * Do nothing by default. Please fill this to update the fake return 1919 + * address on the stack with the correct one on each arch if possible. 1920 + */ 1921 + } 1922 + 1871 1923 unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs, 1872 - void *trampoline_address, 1873 1924 void *frame_pointer) 1874 1925 { 1875 1926 kprobe_opcode_t *correct_ret_addr = NULL; 1876 1927 struct kretprobe_instance *ri = NULL; 1877 - struct llist_node *first, *node; 1928 + struct llist_node *first, *node = NULL; 1878 1929 struct kretprobe *rp; 1879 1930 1880 - /* Find all nodes for this frame. */ 1881 - first = node = current->kretprobe_instances.first; 1882 - while (node) { 1883 - ri = container_of(node, struct kretprobe_instance, llist); 1884 - 1885 - BUG_ON(ri->fp != frame_pointer); 1886 - 1887 - if (ri->ret_addr != trampoline_address) { 1888 - correct_ret_addr = ri->ret_addr; 1889 - /* 1890 - * This is the real return address. Any other 1891 - * instances associated with this task are for 1892 - * other calls deeper on the call stack 1893 - */ 1894 - goto found; 1895 - } 1896 - 1897 - node = node->next; 1931 + /* Find correct address and all nodes for this frame. 
*/ 1932 + correct_ret_addr = __kretprobe_find_ret_addr(current, &node); 1933 + if (!correct_ret_addr) { 1934 + pr_err("kretprobe: Return address not found, not execute handler. Maybe there is a bug in the kernel.\n"); 1935 + BUG_ON(1); 1898 1936 } 1899 - pr_err("Oops! Kretprobe fails to find correct return address.\n"); 1900 - BUG_ON(1); 1901 1937 1902 - found: 1903 - /* Unlink all nodes for this frame. */ 1904 - current->kretprobe_instances.first = node->next; 1905 - node->next = NULL; 1938 + /* 1939 + * Set the return address as the instruction pointer, because if the 1940 + * user handler calls stack_trace_save_regs() with this 'regs', 1941 + * the stack trace will start from the instruction pointer. 1942 + */ 1943 + instruction_pointer_set(regs, (unsigned long)correct_ret_addr); 1906 1944 1907 - /* Run them.. */ 1945 + /* Run the user handler of the nodes. */ 1946 + first = current->kretprobe_instances.first; 1908 1947 while (first) { 1909 1948 ri = container_of(first, struct kretprobe_instance, llist); 1910 - first = first->next; 1949 + 1950 + if (WARN_ON_ONCE(ri->fp != frame_pointer)) 1951 + break; 1911 1952 1912 1953 rp = get_kretprobe(ri); 1913 1954 if (rp && rp->handler) { ··· 1970 1907 rp->handler(ri, regs); 1971 1908 __this_cpu_write(current_kprobe, prev); 1972 1909 } 1910 + if (first == node) 1911 + break; 1912 + 1913 + first = first->next; 1914 + } 1915 + 1916 + arch_kretprobe_fixup_return(regs, correct_ret_addr); 1917 + 1918 + /* Unlink all nodes for this frame. */ 1919 + first = current->kretprobe_instances.first; 1920 + current->kretprobe_instances.first = node->next; 1921 + node->next = NULL; 1922 + 1923 + /* Recycle free instances. 
*/ 1924 + while (first) { 1925 + ri = container_of(first, struct kretprobe_instance, llist); 1926 + first = first->next; 1973 1927 1974 1928 recycle_rp_inst(ri); 1975 1929 } ··· 2071 1991 if (ret) 2072 1992 return ret; 2073 1993 2074 - /* If only rp->kp.addr is specified, check reregistering kprobes */ 1994 + /* If only 'rp->kp.addr' is specified, check reregistering kprobes */ 2075 1995 if (rp->kp.addr && warn_kprobe_rereg(&rp->kp)) 2076 1996 return -EINVAL; 2077 1997 ··· 2176 2096 #else /* CONFIG_KRETPROBES */ 2177 2097 int register_kretprobe(struct kretprobe *rp) 2178 2098 { 2179 - return -ENOSYS; 2099 + return -EOPNOTSUPP; 2180 2100 } 2181 2101 EXPORT_SYMBOL_GPL(register_kretprobe); 2182 2102 2183 2103 int register_kretprobes(struct kretprobe **rps, int num) 2184 2104 { 2185 - return -ENOSYS; 2105 + return -EOPNOTSUPP; 2186 2106 } 2187 2107 EXPORT_SYMBOL_GPL(register_kretprobes); 2188 2108 ··· 2231 2151 /* 2232 2152 * The module is going away. We should disarm the kprobe which 2233 2153 * is using ftrace, because ftrace framework is still available at 2234 - * MODULE_STATE_GOING notification. 2154 + * 'MODULE_STATE_GOING' notification. 2235 2155 */ 2236 2156 if (kprobe_ftrace(p) && !kprobe_disabled(p) && !kprobes_all_disarmed) 2237 2157 disarm_kprobe_ftrace(p); ··· 2294 2214 /* Caller must NOT call this in usual path. 
This is only for critical case */ 2295 2215 void dump_kprobe(struct kprobe *kp) 2296 2216 { 2297 - pr_err("Dumping kprobe:\n"); 2298 - pr_err("Name: %s\nOffset: %x\nAddress: %pS\n", 2217 + pr_err("Dump kprobe:\n.symbol_name = %s, .offset = %x, .addr = %pS\n", 2299 2218 kp->symbol_name, kp->offset, kp->addr); 2300 2219 } 2301 2220 NOKPROBE_SYMBOL(dump_kprobe); ··· 2396 2317 int ret; 2397 2318 2398 2319 for (iter = start; iter < end; iter++) { 2399 - entry = arch_deref_entry_point((void *)*iter); 2320 + entry = (unsigned long)dereference_symbol_descriptor((void *)*iter); 2400 2321 ret = kprobe_add_ksym_blacklist(entry); 2401 2322 if (ret == -EINVAL) 2402 2323 continue; ··· 2404 2325 return ret; 2405 2326 } 2406 2327 2407 - /* Symbols in __kprobes_text are blacklisted */ 2328 + /* Symbols in '__kprobes_text' are blacklisted */ 2408 2329 ret = kprobe_add_area_blacklist((unsigned long)__kprobes_text_start, 2409 2330 (unsigned long)__kprobes_text_end); 2410 2331 if (ret) 2411 2332 return ret; 2412 2333 2413 - /* Symbols in noinstr section are blacklisted */ 2334 + /* Symbols in 'noinstr' section are blacklisted */ 2414 2335 ret = kprobe_add_area_blacklist((unsigned long)__noinstr_text_start, 2415 2336 (unsigned long)__noinstr_text_end); 2416 2337 ··· 2482 2403 return NOTIFY_DONE; 2483 2404 2484 2405 /* 2485 - * When MODULE_STATE_GOING was notified, both of module .text and 2486 - * .init.text sections would be freed. When MODULE_STATE_LIVE was 2487 - * notified, only .init.text section would be freed. We need to 2406 + * When 'MODULE_STATE_GOING' was notified, both of module '.text' and 2407 + * '.init.text' sections would be freed. When 'MODULE_STATE_LIVE' was 2408 + * notified, only '.init.text' section would be freed. We need to 2488 2409 * disable kprobes which have been inserted in the sections. 
2489 2410 */ 2490 2411 mutex_lock(&kprobe_mutex); ··· 2501 2422 * 2502 2423 * Note, this will also move any optimized probes 2503 2424 * that are pending to be removed from their 2504 - * corresponding lists to the freeing_list and 2425 + * corresponding lists to the 'freeing_list' and 2505 2426 * will not be touched by the delayed 2506 - * kprobe_optimizer work handler. 2427 + * kprobe_optimizer() work handler. 2507 2428 */ 2508 2429 kill_kprobe(p); 2509 2430 } ··· 2519 2440 .priority = 0 2520 2441 }; 2521 2442 2522 - /* Markers of _kprobe_blacklist section */ 2523 - extern unsigned long __start_kprobe_blacklist[]; 2524 - extern unsigned long __stop_kprobe_blacklist[]; 2525 - 2526 2443 void kprobe_free_init_mem(void) 2527 2444 { 2528 2445 void *start = (void *)(&__init_begin); ··· 2529 2454 2530 2455 mutex_lock(&kprobe_mutex); 2531 2456 2532 - /* Kill all kprobes on initmem */ 2457 + /* Kill all kprobes on initmem because the target code has been freed. */ 2533 2458 for (i = 0; i < KPROBE_TABLE_SIZE; i++) { 2534 2459 head = &kprobe_table[i]; 2535 2460 hlist_for_each_entry(p, head, hlist) { ··· 2552 2477 2553 2478 err = populate_kprobe_blacklist(__start_kprobe_blacklist, 2554 2479 __stop_kprobe_blacklist); 2555 - if (err) { 2556 - pr_err("kprobes: failed to populate blacklist: %d\n", err); 2557 - pr_err("Please take care of using kprobes.\n"); 2558 - } 2480 + if (err) 2481 + pr_err("Failed to populate blacklist (error %d), kprobes not restricted, be careful using them!\n", err); 2559 2482 2560 2483 if (kretprobe_blacklist_size) { 2561 2484 /* lookup the function address from its name */ ··· 2561 2488 kretprobe_blacklist[i].addr = 2562 2489 kprobe_lookup_name(kretprobe_blacklist[i].name, 0); 2563 2490 if (!kretprobe_blacklist[i].addr) 2564 - printk("kretprobe: lookup failed: %s\n", 2491 + pr_err("Failed to lookup symbol '%s' for kretprobe blacklist. 
Maybe the target function is removed or renamed.\n", 2565 2492 kretprobe_blacklist[i].name); 2566 2493 } 2567 2494 } ··· 2570 2497 kprobes_all_disarmed = false; 2571 2498 2572 2499 #if defined(CONFIG_OPTPROBES) && defined(__ARCH_WANT_KPROBES_INSN_SLOT) 2573 - /* Init kprobe_optinsn_slots for allocation */ 2500 + /* Init 'kprobe_optinsn_slots' for allocation */ 2574 2501 kprobe_optinsn_slots.insn_size = MAX_OPTINSN_SIZE; 2575 2502 #endif 2576 2503 ··· 2581 2508 err = register_module_notifier(&kprobe_module_nb); 2582 2509 2583 2510 kprobes_initialized = (err == 0); 2584 - 2585 - if (!err) 2586 - init_test_probes(); 2587 2511 return err; 2588 2512 } 2589 2513 early_initcall(init_kprobes); ··· 2701 2631 list_entry(v, struct kprobe_blacklist_entry, list); 2702 2632 2703 2633 /* 2704 - * If /proc/kallsyms is not showing kernel address, we won't 2634 + * If '/proc/kallsyms' is not showing kernel address, we won't 2705 2635 * show them here either. 2706 2636 */ 2707 2637 if (!kallsyms_show_value(m->file->f_cred)) ··· 2762 2692 } 2763 2693 2764 2694 if (errors) 2765 - pr_warn("Kprobes globally enabled, but failed to arm %d out of %d probes\n", 2695 + pr_warn("Kprobes globally enabled, but failed to enable %d out of %d probes. Please check which kprobes are kept disabled via debugfs.\n", 2766 2696 errors, total); 2767 2697 else 2768 2698 pr_info("Kprobes globally enabled\n"); ··· 2805 2735 } 2806 2736 2807 2737 if (errors) 2808 - pr_warn("Kprobes globally disabled, but failed to disarm %d out of %d probes\n", 2738 + pr_warn("Kprobes globally disabled, but failed to disable %d out of %d probes. 
Please check which kprobes are kept enabled via debugfs.\n", 2809 2739 errors, total); 2810 2740 else 2811 2741 pr_info("Kprobes globally disabled\n"); ··· 2840 2770 static ssize_t write_enabled_file_bool(struct file *file, 2841 2771 const char __user *user_buf, size_t count, loff_t *ppos) 2842 2772 { 2843 - char buf[32]; 2844 - size_t buf_size; 2845 - int ret = 0; 2773 + bool enable; 2774 + int ret; 2846 2775 2847 - buf_size = min(count, (sizeof(buf)-1)); 2848 - if (copy_from_user(buf, user_buf, buf_size)) 2849 - return -EFAULT; 2776 + ret = kstrtobool_from_user(user_buf, count, &enable); 2777 + if (ret) 2778 + return ret; 2850 2779 2851 - buf[buf_size] = '\0'; 2852 - switch (buf[0]) { 2853 - case 'y': 2854 - case 'Y': 2855 - case '1': 2856 - ret = arm_all_kprobes(); 2857 - break; 2858 - case 'n': 2859 - case 'N': 2860 - case '0': 2861 - ret = disarm_all_kprobes(); 2862 - break; 2863 - default: 2864 - return -EINVAL; 2865 - } 2866 - 2780 + ret = enable ? arm_all_kprobes() : disarm_all_kprobes(); 2867 2781 if (ret) 2868 2782 return ret; 2869 2783 ··· 2863 2809 static int __init debugfs_kprobe_init(void) 2864 2810 { 2865 2811 struct dentry *dir; 2866 - unsigned int value = 1; 2867 2812 2868 2813 dir = debugfs_create_dir("kprobes", NULL); 2869 2814 2870 2815 debugfs_create_file("list", 0400, dir, NULL, &kprobes_fops); 2871 2816 2872 - debugfs_create_file("enabled", 0600, dir, &value, &fops_kp); 2817 + debugfs_create_file("enabled", 0600, dir, NULL, &fops_kp); 2873 2818 2874 2819 debugfs_create_file("blacklist", 0400, dir, NULL, 2875 2820 &kprobe_blacklist_fops);
+6 -6
kernel/livepatch/patch.c
··· 49 49 50 50 ops = container_of(fops, struct klp_ops, fops); 51 51 52 + /* 53 + * The ftrace_test_recursion_trylock() will disable preemption, 54 + * which is required for the variant of synchronize_rcu() that is 55 + * used to allow patching functions where RCU is not watching. 56 + * See klp_synchronize_transition() for more details. 57 + */ 52 58 bit = ftrace_test_recursion_trylock(ip, parent_ip); 53 59 if (WARN_ON_ONCE(bit < 0)) 54 60 return; 55 - /* 56 - * A variant of synchronize_rcu() is used to allow patching functions 57 - * where RCU is not watching, see klp_synchronize_transition(). 58 - */ 59 - preempt_disable_notrace(); 60 61 61 62 func = list_first_or_null_rcu(&ops->func_stack, struct klp_func, 62 63 stack_node); ··· 121 120 klp_arch_set_pc(fregs, (unsigned long)func->new_func); 122 121 123 122 unlock: 124 - preempt_enable_notrace(); 125 123 ftrace_test_recursion_unlock(bit); 126 124 } 127 125
-313
kernel/test_kprobes.c
··· 1 - // SPDX-License-Identifier: GPL-2.0-or-later 2 - /* 3 - * test_kprobes.c - simple sanity test for *probes 4 - * 5 - * Copyright IBM Corp. 2008 6 - */ 7 - 8 - #define pr_fmt(fmt) "Kprobe smoke test: " fmt 9 - 10 - #include <linux/kernel.h> 11 - #include <linux/kprobes.h> 12 - #include <linux/random.h> 13 - 14 - #define div_factor 3 15 - 16 - static u32 rand1, preh_val, posth_val; 17 - static int errors, handler_errors, num_tests; 18 - static u32 (*target)(u32 value); 19 - static u32 (*target2)(u32 value); 20 - 21 - static noinline u32 kprobe_target(u32 value) 22 - { 23 - return (value / div_factor); 24 - } 25 - 26 - static int kp_pre_handler(struct kprobe *p, struct pt_regs *regs) 27 - { 28 - if (preemptible()) { 29 - handler_errors++; 30 - pr_err("pre-handler is preemptible\n"); 31 - } 32 - preh_val = (rand1 / div_factor); 33 - return 0; 34 - } 35 - 36 - static void kp_post_handler(struct kprobe *p, struct pt_regs *regs, 37 - unsigned long flags) 38 - { 39 - if (preemptible()) { 40 - handler_errors++; 41 - pr_err("post-handler is preemptible\n"); 42 - } 43 - if (preh_val != (rand1 / div_factor)) { 44 - handler_errors++; 45 - pr_err("incorrect value in post_handler\n"); 46 - } 47 - posth_val = preh_val + div_factor; 48 - } 49 - 50 - static struct kprobe kp = { 51 - .symbol_name = "kprobe_target", 52 - .pre_handler = kp_pre_handler, 53 - .post_handler = kp_post_handler 54 - }; 55 - 56 - static int test_kprobe(void) 57 - { 58 - int ret; 59 - 60 - ret = register_kprobe(&kp); 61 - if (ret < 0) { 62 - pr_err("register_kprobe returned %d\n", ret); 63 - return ret; 64 - } 65 - 66 - ret = target(rand1); 67 - unregister_kprobe(&kp); 68 - 69 - if (preh_val == 0) { 70 - pr_err("kprobe pre_handler not called\n"); 71 - handler_errors++; 72 - } 73 - 74 - if (posth_val == 0) { 75 - pr_err("kprobe post_handler not called\n"); 76 - handler_errors++; 77 - } 78 - 79 - return 0; 80 - } 81 - 82 - static noinline u32 kprobe_target2(u32 value) 83 - { 84 - return (value / 
div_factor) + 1; 85 - } 86 - 87 - static int kp_pre_handler2(struct kprobe *p, struct pt_regs *regs) 88 - { 89 - preh_val = (rand1 / div_factor) + 1; 90 - return 0; 91 - } 92 - 93 - static void kp_post_handler2(struct kprobe *p, struct pt_regs *regs, 94 - unsigned long flags) 95 - { 96 - if (preh_val != (rand1 / div_factor) + 1) { 97 - handler_errors++; 98 - pr_err("incorrect value in post_handler2\n"); 99 - } 100 - posth_val = preh_val + div_factor; 101 - } 102 - 103 - static struct kprobe kp2 = { 104 - .symbol_name = "kprobe_target2", 105 - .pre_handler = kp_pre_handler2, 106 - .post_handler = kp_post_handler2 107 - }; 108 - 109 - static int test_kprobes(void) 110 - { 111 - int ret; 112 - struct kprobe *kps[2] = {&kp, &kp2}; 113 - 114 - /* addr and flags should be cleared for reusing kprobe. */ 115 - kp.addr = NULL; 116 - kp.flags = 0; 117 - ret = register_kprobes(kps, 2); 118 - if (ret < 0) { 119 - pr_err("register_kprobes returned %d\n", ret); 120 - return ret; 121 - } 122 - 123 - preh_val = 0; 124 - posth_val = 0; 125 - ret = target(rand1); 126 - 127 - if (preh_val == 0) { 128 - pr_err("kprobe pre_handler not called\n"); 129 - handler_errors++; 130 - } 131 - 132 - if (posth_val == 0) { 133 - pr_err("kprobe post_handler not called\n"); 134 - handler_errors++; 135 - } 136 - 137 - preh_val = 0; 138 - posth_val = 0; 139 - ret = target2(rand1); 140 - 141 - if (preh_val == 0) { 142 - pr_err("kprobe pre_handler2 not called\n"); 143 - handler_errors++; 144 - } 145 - 146 - if (posth_val == 0) { 147 - pr_err("kprobe post_handler2 not called\n"); 148 - handler_errors++; 149 - } 150 - 151 - unregister_kprobes(kps, 2); 152 - return 0; 153 - 154 - } 155 - 156 - #ifdef CONFIG_KRETPROBES 157 - static u32 krph_val; 158 - 159 - static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs) 160 - { 161 - if (preemptible()) { 162 - handler_errors++; 163 - pr_err("kretprobe entry handler is preemptible\n"); 164 - } 165 - krph_val = (rand1 / div_factor); 166 - return
0; 167 - } 168 - 169 - static int return_handler(struct kretprobe_instance *ri, struct pt_regs *regs) 170 - { 171 - unsigned long ret = regs_return_value(regs); 172 - 173 - if (preemptible()) { 174 - handler_errors++; 175 - pr_err("kretprobe return handler is preemptible\n"); 176 - } 177 - if (ret != (rand1 / div_factor)) { 178 - handler_errors++; 179 - pr_err("incorrect value in kretprobe handler\n"); 180 - } 181 - if (krph_val == 0) { 182 - handler_errors++; 183 - pr_err("call to kretprobe entry handler failed\n"); 184 - } 185 - 186 - krph_val = rand1; 187 - return 0; 188 - } 189 - 190 - static struct kretprobe rp = { 191 - .handler = return_handler, 192 - .entry_handler = entry_handler, 193 - .kp.symbol_name = "kprobe_target" 194 - }; 195 - 196 - static int test_kretprobe(void) 197 - { 198 - int ret; 199 - 200 - ret = register_kretprobe(&rp); 201 - if (ret < 0) { 202 - pr_err("register_kretprobe returned %d\n", ret); 203 - return ret; 204 - } 205 - 206 - ret = target(rand1); 207 - unregister_kretprobe(&rp); 208 - if (krph_val != rand1) { 209 - pr_err("kretprobe handler not called\n"); 210 - handler_errors++; 211 - } 212 - 213 - return 0; 214 - } 215 - 216 - static int return_handler2(struct kretprobe_instance *ri, struct pt_regs *regs) 217 - { 218 - unsigned long ret = regs_return_value(regs); 219 - 220 - if (ret != (rand1 / div_factor) + 1) { 221 - handler_errors++; 222 - pr_err("incorrect value in kretprobe handler2\n"); 223 - } 224 - if (krph_val == 0) { 225 - handler_errors++; 226 - pr_err("call to kretprobe entry handler failed\n"); 227 - } 228 - 229 - krph_val = rand1; 230 - return 0; 231 - } 232 - 233 - static struct kretprobe rp2 = { 234 - .handler = return_handler2, 235 - .entry_handler = entry_handler, 236 - .kp.symbol_name = "kprobe_target2" 237 - }; 238 - 239 - static int test_kretprobes(void) 240 - { 241 - int ret; 242 - struct kretprobe *rps[2] = {&rp, &rp2}; 243 - 244 - /* addr and flags should be cleared for reusing kprobe. 
*/ 245 - rp.kp.addr = NULL; 246 - rp.kp.flags = 0; 247 - ret = register_kretprobes(rps, 2); 248 - if (ret < 0) { 249 - pr_err("register_kretprobe returned %d\n", ret); 250 - return ret; 251 - } 252 - 253 - krph_val = 0; 254 - ret = target(rand1); 255 - if (krph_val != rand1) { 256 - pr_err("kretprobe handler not called\n"); 257 - handler_errors++; 258 - } 259 - 260 - krph_val = 0; 261 - ret = target2(rand1); 262 - if (krph_val != rand1) { 263 - pr_err("kretprobe handler2 not called\n"); 264 - handler_errors++; 265 - } 266 - unregister_kretprobes(rps, 2); 267 - return 0; 268 - } 269 - #endif /* CONFIG_KRETPROBES */ 270 - 271 - int init_test_probes(void) 272 - { 273 - int ret; 274 - 275 - target = kprobe_target; 276 - target2 = kprobe_target2; 277 - 278 - do { 279 - rand1 = prandom_u32(); 280 - } while (rand1 <= div_factor); 281 - 282 - pr_info("started\n"); 283 - num_tests++; 284 - ret = test_kprobe(); 285 - if (ret < 0) 286 - errors++; 287 - 288 - num_tests++; 289 - ret = test_kprobes(); 290 - if (ret < 0) 291 - errors++; 292 - 293 - #ifdef CONFIG_KRETPROBES 294 - num_tests++; 295 - ret = test_kretprobe(); 296 - if (ret < 0) 297 - errors++; 298 - 299 - num_tests++; 300 - ret = test_kretprobes(); 301 - if (ret < 0) 302 - errors++; 303 - #endif /* CONFIG_KRETPROBES */ 304 - 305 - if (errors) 306 - pr_err("BUG: %d out of %d tests failed\n", errors, num_tests); 307 - else if (handler_errors) 308 - pr_err("BUG: %d error(s) running handlers\n", handler_errors); 309 - else 310 - pr_info("passed successfully\n"); 311 - 312 - return 0; 313 - }
+1
kernel/trace/Makefile
··· 47 47 obj-$(CONFIG_TRACING) += trace_seq.o 48 48 obj-$(CONFIG_TRACING) += trace_stat.o 49 49 obj-$(CONFIG_TRACING) += trace_printk.o 50 + obj-$(CONFIG_TRACING) += pid_list.o 50 51 obj-$(CONFIG_TRACING_MAP) += tracing_map.o 51 52 obj-$(CONFIG_PREEMPTIRQ_DELAY_TEST) += preemptirq_delay_test.o 52 53 obj-$(CONFIG_SYNTH_EVENT_GEN_TEST) += synth_event_gen_test.o
+4 -2
kernel/trace/fgraph.c
··· 115 115 { 116 116 struct ftrace_graph_ent trace; 117 117 118 + #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS 118 119 /* 119 120 * Skip graph tracing if the return location is served by direct trampoline, 120 121 * since call sequence and return addresses are unpredictable anyway. ··· 125 124 if (ftrace_direct_func_count && 126 125 ftrace_find_rec_direct(ret - MCOUNT_INSN_SIZE)) 127 126 return -EBUSY; 127 + #endif 128 128 trace.func = func; 129 129 trace.depth = ++current->curr_ret_depth; 130 130 ··· 335 333 #endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */ 336 334 337 335 static struct ftrace_ops graph_ops = { 338 - .func = ftrace_stub, 336 + .func = ftrace_graph_func, 339 337 .flags = FTRACE_OPS_FL_INITIALIZED | 340 338 FTRACE_OPS_FL_PID | 341 - FTRACE_OPS_FL_STUB, 339 + FTRACE_OPS_GRAPH_STUB, 342 340 #ifdef FTRACE_GRAPH_TRAMP_ADDR 343 341 .trampoline = FTRACE_GRAPH_TRAMP_ADDR, 344 342 /* trampoline_size is only needed for dynamically allocated tramps */
+283 -65
kernel/trace/ftrace.c
··· 119 119 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub; 120 120 struct ftrace_ops global_ops; 121 121 122 - #if ARCH_SUPPORTS_FTRACE_OPS 123 - static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip, 124 - struct ftrace_ops *op, struct ftrace_regs *fregs); 125 - #else 126 - /* See comment below, where ftrace_ops_list_func is defined */ 127 - static void ftrace_ops_no_ops(unsigned long ip, unsigned long parent_ip); 128 - #define ftrace_ops_list_func ((ftrace_func_t)ftrace_ops_no_ops) 129 - #endif 122 + /* Defined by vmlinux.lds.h; see the comment above arch_ftrace_ops_list_func for details */ 123 + void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip, 124 + struct ftrace_ops *op, struct ftrace_regs *fregs); 130 125 131 126 static inline void ftrace_ops_init(struct ftrace_ops *ops) 132 127 { ··· 576 581 FTRACE_PROFILE_HASH_SIZE * sizeof(struct hlist_head)); 577 582 } 578 583 579 - int ftrace_profile_pages_init(struct ftrace_profile_stat *stat) 584 + static int ftrace_profile_pages_init(struct ftrace_profile_stat *stat) 580 585 { 581 586 struct ftrace_profile_page *pg; 582 587 int functions; ··· 983 988 } 984 989 } 985 990 986 - entry = tracefs_create_file("function_profile_enabled", 0644, 987 - d_tracer, NULL, &ftrace_profile_fops); 991 + entry = tracefs_create_file("function_profile_enabled", 992 + TRACE_MODE_WRITE, d_tracer, NULL, 993 + &ftrace_profile_fops); 988 994 if (!entry) 989 995 pr_warn("Could not create tracefs 'function_profile_enabled' entry\n"); 990 996 } ··· 2388 2392 return 0; 2389 2393 2390 2394 return entry->direct; 2395 + } 2396 + 2397 + static struct ftrace_func_entry* 2398 + ftrace_add_rec_direct(unsigned long ip, unsigned long addr, 2399 + struct ftrace_hash **free_hash) 2400 + { 2401 + struct ftrace_func_entry *entry; 2402 + 2403 + if (ftrace_hash_empty(direct_functions) || 2404 + direct_functions->count > 2 * (1 << direct_functions->size_bits)) { 2405 + struct ftrace_hash *new_hash; 2406 + 
int size = ftrace_hash_empty(direct_functions) ? 0 : 2407 + direct_functions->count + 1; 2408 + 2409 + if (size < 32) 2410 + size = 32; 2411 + 2412 + new_hash = dup_hash(direct_functions, size); 2413 + if (!new_hash) 2414 + return NULL; 2415 + 2416 + *free_hash = direct_functions; 2417 + direct_functions = new_hash; 2418 + } 2419 + 2420 + entry = kmalloc(sizeof(*entry), GFP_KERNEL); 2421 + if (!entry) 2422 + return NULL; 2423 + 2424 + entry->ip = ip; 2425 + entry->direct = addr; 2426 + __add_hash_entry(direct_functions, entry); 2427 + return entry; 2391 2428 } 2392 2429 2393 2430 static void call_direct_funcs(unsigned long ip, unsigned long pip, ··· 5139 5110 } 5140 5111 5141 5112 ret = -ENOMEM; 5142 - if (ftrace_hash_empty(direct_functions) || 5143 - direct_functions->count > 2 * (1 << direct_functions->size_bits)) { 5144 - struct ftrace_hash *new_hash; 5145 - int size = ftrace_hash_empty(direct_functions) ? 0 : 5146 - direct_functions->count + 1; 5147 - 5148 - if (size < 32) 5149 - size = 32; 5150 - 5151 - new_hash = dup_hash(direct_functions, size); 5152 - if (!new_hash) 5153 - goto out_unlock; 5154 - 5155 - free_hash = direct_functions; 5156 - direct_functions = new_hash; 5157 - } 5158 - 5159 - entry = kmalloc(sizeof(*entry), GFP_KERNEL); 5160 - if (!entry) 5161 - goto out_unlock; 5162 - 5163 5113 direct = ftrace_find_direct_func(addr); 5164 5114 if (!direct) { 5165 5115 direct = ftrace_alloc_direct_func(addr); 5166 - if (!direct) { 5167 - kfree(entry); 5116 + if (!direct) 5168 5117 goto out_unlock; 5169 - } 5170 5118 } 5171 5119 5172 - entry->ip = ip; 5173 - entry->direct = addr; 5174 - __add_hash_entry(direct_functions, entry); 5120 + entry = ftrace_add_rec_direct(ip, addr, &free_hash); 5121 + if (!entry) 5122 + goto out_unlock; 5175 5123 5176 5124 ret = ftrace_set_filter_ip(&direct_ops, ip, 0, 0); 5177 5125 if (ret) ··· 5401 5395 return ret; 5402 5396 } 5403 5397 EXPORT_SYMBOL_GPL(modify_ftrace_direct); 5398 + 5399 + #define MULTI_FLAGS 
(FTRACE_OPS_FL_IPMODIFY | FTRACE_OPS_FL_DIRECT | \ 5400 + FTRACE_OPS_FL_SAVE_REGS) 5401 + 5402 + static int check_direct_multi(struct ftrace_ops *ops) 5403 + { 5404 + if (!(ops->flags & FTRACE_OPS_FL_INITIALIZED)) 5405 + return -EINVAL; 5406 + if ((ops->flags & MULTI_FLAGS) != MULTI_FLAGS) 5407 + return -EINVAL; 5408 + return 0; 5409 + } 5410 + 5411 + static void remove_direct_functions_hash(struct ftrace_hash *hash, unsigned long addr) 5412 + { 5413 + struct ftrace_func_entry *entry, *del; 5414 + int size, i; 5415 + 5416 + size = 1 << hash->size_bits; 5417 + for (i = 0; i < size; i++) { 5418 + hlist_for_each_entry(entry, &hash->buckets[i], hlist) { 5419 + del = __ftrace_lookup_ip(direct_functions, entry->ip); 5420 + if (del && del->direct == addr) { 5421 + remove_hash_entry(direct_functions, del); 5422 + kfree(del); 5423 + } 5424 + } 5425 + } 5426 + } 5427 + 5428 + /** 5429 + * register_ftrace_direct_multi - Call a custom trampoline directly 5430 + * for multiple functions registered in @ops 5431 + * @ops: The address of the struct ftrace_ops object 5432 + * @addr: The address of the trampoline to call at @ops functions 5433 + * 5434 + * This is used to connect direct calls to @addr from the nop locations 5435 + * of the functions registered in @ops (as set by the ftrace_set_filter_ip() 5436 + * function). 5437 + * 5438 + * The location that it calls (@addr) must be able to handle a direct call, 5439 + * and save the parameters of the function being traced, and restore them 5440 + * (or inject new ones if needed), before returning. 5441 + * 5442 + * Returns: 5443 + * 0 on success 5444 + * -EINVAL - The @ops object was already registered with this call or 5445 + * when there are no functions in the @ops object. 5446 + * -EBUSY - Another direct function is already attached (there can be only one) 5447 + * -ENODEV - @ip does not point to a ftrace nop location (or not supported) 5448 + * -ENOMEM - There was an allocation failure. 
5449 + */ 5450 + int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) 5451 + { 5452 + struct ftrace_hash *hash, *free_hash = NULL; 5453 + struct ftrace_func_entry *entry, *new; 5454 + int err = -EBUSY, size, i; 5455 + 5456 + if (ops->func || ops->trampoline) 5457 + return -EINVAL; 5458 + if (!(ops->flags & FTRACE_OPS_FL_INITIALIZED)) 5459 + return -EINVAL; 5460 + if (ops->flags & FTRACE_OPS_FL_ENABLED) 5461 + return -EINVAL; 5462 + 5463 + hash = ops->func_hash->filter_hash; 5464 + if (ftrace_hash_empty(hash)) 5465 + return -EINVAL; 5466 + 5467 + mutex_lock(&direct_mutex); 5468 + 5469 + /* Make sure requested entries are not already registered.. */ 5470 + size = 1 << hash->size_bits; 5471 + for (i = 0; i < size; i++) { 5472 + hlist_for_each_entry(entry, &hash->buckets[i], hlist) { 5473 + if (ftrace_find_rec_direct(entry->ip)) 5474 + goto out_unlock; 5475 + } 5476 + } 5477 + 5478 + /* ... and insert them to direct_functions hash. */ 5479 + err = -ENOMEM; 5480 + for (i = 0; i < size; i++) { 5481 + hlist_for_each_entry(entry, &hash->buckets[i], hlist) { 5482 + new = ftrace_add_rec_direct(entry->ip, addr, &free_hash); 5483 + if (!new) 5484 + goto out_remove; 5485 + entry->direct = addr; 5486 + } 5487 + } 5488 + 5489 + ops->func = call_direct_funcs; 5490 + ops->flags = MULTI_FLAGS; 5491 + ops->trampoline = FTRACE_REGS_ADDR; 5492 + 5493 + err = register_ftrace_function(ops); 5494 + 5495 + out_remove: 5496 + if (err) 5497 + remove_direct_functions_hash(hash, addr); 5498 + 5499 + out_unlock: 5500 + mutex_unlock(&direct_mutex); 5501 + 5502 + if (free_hash) { 5503 + synchronize_rcu_tasks(); 5504 + free_ftrace_hash(free_hash); 5505 + } 5506 + return err; 5507 + } 5508 + EXPORT_SYMBOL_GPL(register_ftrace_direct_multi); 5509 + 5510 + /** 5511 + * unregister_ftrace_direct_multi - Remove calls to custom trampoline 5512 + * previously registered by register_ftrace_direct_multi for @ops object. 
5513 + * @ops: The address of the struct ftrace_ops object 5514 + * 5515 + * This is used to remove direct calls to @addr from the nop locations 5516 + * of the functions registered in @ops (as set by the ftrace_set_filter_ip() 5517 + * function). 5518 + * 5519 + * Returns: 5520 + * 0 on success 5521 + * -EINVAL - The @ops object was not properly registered. 5522 + */ 5523 + int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) 5524 + { 5525 + struct ftrace_hash *hash = ops->func_hash->filter_hash; 5526 + int err; 5527 + 5528 + if (check_direct_multi(ops)) 5529 + return -EINVAL; 5530 + if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) 5531 + return -EINVAL; 5532 + 5533 + mutex_lock(&direct_mutex); 5534 + err = unregister_ftrace_function(ops); 5535 + remove_direct_functions_hash(hash, addr); 5536 + mutex_unlock(&direct_mutex); 5537 + return err; 5538 + } 5539 + EXPORT_SYMBOL_GPL(unregister_ftrace_direct_multi); 5540 + 5541 + /** 5542 + * modify_ftrace_direct_multi - Modify an existing direct 'multi' call 5543 + * to call something else 5544 + * @ops: The address of the struct ftrace_ops object 5545 + * @addr: The address of the new trampoline to call at @ops functions 5546 + * 5547 + * This is used to unregister the currently registered direct caller and 5548 + * register a new one, @addr, on the functions registered in the @ops object. 5549 + * 5550 + * Note there's a window between ftrace_shutdown and ftrace_startup calls 5551 + * where there will be no callbacks called. 5552 + * 5553 + * Returns: zero on success. Non-zero on error, which includes: 5554 + * -EINVAL - The @ops object was not properly registered. 
5555 + */ 5556 + int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr) 5557 + { 5558 + struct ftrace_hash *hash; 5559 + struct ftrace_func_entry *entry, *iter; 5560 + static struct ftrace_ops tmp_ops = { 5561 + .func = ftrace_stub, 5562 + .flags = FTRACE_OPS_FL_STUB, 5563 + }; 5564 + int i, size; 5565 + int err; 5566 + 5567 + if (check_direct_multi(ops)) 5568 + return -EINVAL; 5569 + if (!(ops->flags & FTRACE_OPS_FL_ENABLED)) 5570 + return -EINVAL; 5571 + 5572 + mutex_lock(&direct_mutex); 5573 + 5574 + /* Enable the tmp_ops to have the same functions as the direct ops */ 5575 + ftrace_ops_init(&tmp_ops); 5576 + tmp_ops.func_hash = ops->func_hash; 5577 + 5578 + err = register_ftrace_function(&tmp_ops); 5579 + if (err) 5580 + goto out_direct; 5581 + 5582 + /* 5583 + * Now the ftrace_ops_list_func() is called to do the direct callers. 5584 + * We can safely change the direct functions attached to each entry. 5585 + */ 5586 + mutex_lock(&ftrace_lock); 5587 + 5588 + hash = ops->func_hash->filter_hash; 5589 + size = 1 << hash->size_bits; 5590 + for (i = 0; i < size; i++) { 5591 + hlist_for_each_entry(iter, &hash->buckets[i], hlist) { 5592 + entry = __ftrace_lookup_ip(direct_functions, iter->ip); 5593 + if (!entry) 5594 + continue; 5595 + entry->direct = addr; 5596 + } 5597 + } 5598 + 5599 + /* Removing the tmp_ops will add the updated direct callers to the functions */ 5600 + unregister_ftrace_function(&tmp_ops); 5601 + 5602 + mutex_unlock(&ftrace_lock); 5603 + out_direct: 5604 + mutex_unlock(&direct_mutex); 5605 + return err; 5606 + } 5607 + EXPORT_SYMBOL_GPL(modify_ftrace_direct_multi); 5404 5608 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */ 5405 5609 5406 5610 /** ··· 6325 6109 struct dentry *parent) 6326 6110 { 6327 6111 6328 - trace_create_file("set_ftrace_filter", 0644, parent, 6112 + trace_create_file("set_ftrace_filter", TRACE_MODE_WRITE, parent, 6329 6113 ops, &ftrace_filter_fops); 6330 6114 6331 - 
trace_create_file("set_ftrace_notrace", 0644, parent, 6115 + trace_create_file("set_ftrace_notrace", TRACE_MODE_WRITE, parent, 6332 6116 ops, &ftrace_notrace_fops); 6333 6117 } 6334 6118 ··· 6355 6139 static __init int ftrace_init_dyn_tracefs(struct dentry *d_tracer) 6356 6140 { 6357 6141 6358 - trace_create_file("available_filter_functions", 0444, 6142 + trace_create_file("available_filter_functions", TRACE_MODE_READ, 6359 6143 d_tracer, NULL, &ftrace_avail_fops); 6360 6144 6361 - trace_create_file("enabled_functions", 0444, 6145 + trace_create_file("enabled_functions", TRACE_MODE_READ, 6362 6146 d_tracer, NULL, &ftrace_enabled_fops); 6363 6147 6364 6148 ftrace_create_filter_files(&global_ops, d_tracer); 6365 6149 6366 6150 #ifdef CONFIG_FUNCTION_GRAPH_TRACER 6367 - trace_create_file("set_graph_function", 0644, d_tracer, 6151 + trace_create_file("set_graph_function", TRACE_MODE_WRITE, d_tracer, 6368 6152 NULL, 6369 6153 &ftrace_graph_fops); 6370 - trace_create_file("set_graph_notrace", 0644, d_tracer, 6154 + trace_create_file("set_graph_notrace", TRACE_MODE_WRITE, d_tracer, 6371 6155 NULL, 6372 6156 &ftrace_graph_notrace_fops); 6373 6157 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */ ··· 7062 6846 ftrace_free_mem(NULL, start, end); 7063 6847 } 7064 6848 6849 + int __init __weak ftrace_dyn_arch_init(void) 6850 + { 6851 + return 0; 6852 + } 6853 + 7065 6854 void __init ftrace_init(void) 7066 6855 { 7067 6856 extern unsigned long __start_mcount_loc[]; ··· 7198 6977 struct ftrace_ops *op; 7199 6978 int bit; 7200 6979 6980 + /* 6981 + * The ftrace_test_and_set_recursion() will disable preemption, 6982 + * which is required since some of the ops may be dynamically 6983 + * allocated, they must be freed after a synchronize_rcu(). 
6984 + */ 7201 6985 bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_LIST_START); 7202 6986 if (bit < 0) 7203 6987 return; 7204 - 7205 - /* 7206 - * Some of the ops may be dynamically allocated, 7207 - * they must be freed after a synchronize_rcu(). 7208 - */ 7209 - preempt_disable_notrace(); 7210 6988 7211 6989 do_for_each_ftrace_op(op, ftrace_ops_list) { 7212 6990 /* Stub functions don't need to be called nor tested */ ··· 7230 7010 } 7231 7011 } while_for_each_ftrace_op(op); 7232 7012 out: 7233 - preempt_enable_notrace(); 7234 7013 trace_clear_recursion(bit); 7235 7014 } 7236 7015 ··· 7245 7026 * Note, CONFIG_DYNAMIC_FTRACE_WITH_REGS expects a full regs to be saved. 7246 7027 * An architecture can pass partial regs with ftrace_ops and still 7247 7028 * set the ARCH_SUPPORTS_FTRACE_OPS. 7029 + * 7030 + * In vmlinux.lds.h, ftrace_ops_list_func() is defined to be 7031 + * arch_ftrace_ops_list_func. 7248 7032 */ 7249 7033 #if ARCH_SUPPORTS_FTRACE_OPS 7250 - static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip, 7251 - struct ftrace_ops *op, struct ftrace_regs *fregs) 7034 + void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip, 7035 + struct ftrace_ops *op, struct ftrace_regs *fregs) 7252 7036 { 7253 7037 __ftrace_ops_list_func(ip, parent_ip, NULL, fregs); 7254 7038 } 7255 - NOKPROBE_SYMBOL(ftrace_ops_list_func); 7256 7039 #else 7257 - static void ftrace_ops_no_ops(unsigned long ip, unsigned long parent_ip) 7040 + void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip) 7258 7041 { 7259 7042 __ftrace_ops_list_func(ip, parent_ip, NULL, NULL); 7260 7043 } 7261 - NOKPROBE_SYMBOL(ftrace_ops_no_ops); 7262 7044 #endif 7045 + NOKPROBE_SYMBOL(arch_ftrace_ops_list_func); 7263 7046 7264 7047 /* 7265 7048 * If there's only one function registered but it does not support ··· 7277 7056 if (bit < 0) 7278 7057 return; 7279 7058 7280 - preempt_disable_notrace(); 7281 - 7282 7059 if (!(op->flags & FTRACE_OPS_FL_RCU) || 
rcu_is_watching()) 7283 7060 op->func(ip, parent_ip, op, fregs); 7284 7061 7285 - preempt_enable_notrace(); 7286 7062 trace_clear_recursion(bit); 7287 7063 } 7288 7064 NOKPROBE_SYMBOL(ftrace_ops_assist_func); ··· 7402 7184 synchronize_rcu(); 7403 7185 7404 7186 if ((type & TRACE_PIDS) && pid_list) 7405 - trace_free_pid_list(pid_list); 7187 + trace_pid_list_free(pid_list); 7406 7188 7407 7189 if ((type & TRACE_NO_PIDS) && no_pid_list) 7408 - trace_free_pid_list(no_pid_list); 7190 + trace_pid_list_free(no_pid_list); 7409 7191 } 7410 7192 7411 7193 void ftrace_clear_pids(struct trace_array *tr) ··· 7646 7428 7647 7429 if (filtered_pids) { 7648 7430 synchronize_rcu(); 7649 - trace_free_pid_list(filtered_pids); 7431 + trace_pid_list_free(filtered_pids); 7650 7432 } else if (pid_list && !other_pids) { 7651 7433 /* Register a probe to set whether to ignore the tracing of a task */ 7652 7434 register_trace_sched_switch(ftrace_filter_pid_sched_switch_probe, tr); ··· 7712 7494 7713 7495 void ftrace_init_tracefs(struct trace_array *tr, struct dentry *d_tracer) 7714 7496 { 7715 - trace_create_file("set_ftrace_pid", 0644, d_tracer, 7497 + trace_create_file("set_ftrace_pid", TRACE_MODE_WRITE, d_tracer, 7716 7498 tr, &ftrace_pid_fops); 7717 - trace_create_file("set_ftrace_notrace_pid", 0644, d_tracer, 7718 - tr, &ftrace_no_pid_fops); 7499 + trace_create_file("set_ftrace_notrace_pid", TRACE_MODE_WRITE, 7500 + d_tracer, tr, &ftrace_no_pid_fops); 7719 7501 } 7720 7502 7721 7503 void __init ftrace_init_tracefs_toplevel(struct trace_array *tr,
+495
kernel/trace/pid_list.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Copyright (C) 2021 VMware Inc, Steven Rostedt <rostedt@goodmis.org> 4 + */ 5 + #include <linux/spinlock.h> 6 + #include <linux/irq_work.h> 7 + #include <linux/slab.h> 8 + #include "trace.h" 9 + 10 + /* See pid_list.h for details */ 11 + 12 + static inline union lower_chunk *get_lower_chunk(struct trace_pid_list *pid_list) 13 + { 14 + union lower_chunk *chunk; 15 + 16 + lockdep_assert_held(&pid_list->lock); 17 + 18 + if (!pid_list->lower_list) 19 + return NULL; 20 + 21 + chunk = pid_list->lower_list; 22 + pid_list->lower_list = chunk->next; 23 + pid_list->free_lower_chunks--; 24 + WARN_ON_ONCE(pid_list->free_lower_chunks < 0); 25 + chunk->next = NULL; 26 + /* 27 + * If a refill needs to happen, it can not happen here 28 + * as the scheduler run queue locks are held. 29 + */ 30 + if (pid_list->free_lower_chunks <= CHUNK_REALLOC) 31 + irq_work_queue(&pid_list->refill_irqwork); 32 + 33 + return chunk; 34 + } 35 + 36 + static inline union upper_chunk *get_upper_chunk(struct trace_pid_list *pid_list) 37 + { 38 + union upper_chunk *chunk; 39 + 40 + lockdep_assert_held(&pid_list->lock); 41 + 42 + if (!pid_list->upper_list) 43 + return NULL; 44 + 45 + chunk = pid_list->upper_list; 46 + pid_list->upper_list = chunk->next; 47 + pid_list->free_upper_chunks--; 48 + WARN_ON_ONCE(pid_list->free_upper_chunks < 0); 49 + chunk->next = NULL; 50 + /* 51 + * If a refill needs to happen, it can not happen here 52 + * as the scheduler run queue locks are held. 
53 + */ 54 + if (pid_list->free_upper_chunks <= CHUNK_REALLOC) 55 + irq_work_queue(&pid_list->refill_irqwork); 56 + 57 + return chunk; 58 + } 59 + 60 + static inline void put_lower_chunk(struct trace_pid_list *pid_list, 61 + union lower_chunk *chunk) 62 + { 63 + lockdep_assert_held(&pid_list->lock); 64 + 65 + chunk->next = pid_list->lower_list; 66 + pid_list->lower_list = chunk; 67 + pid_list->free_lower_chunks++; 68 + } 69 + 70 + static inline void put_upper_chunk(struct trace_pid_list *pid_list, 71 + union upper_chunk *chunk) 72 + { 73 + lockdep_assert_held(&pid_list->lock); 74 + 75 + chunk->next = pid_list->upper_list; 76 + pid_list->upper_list = chunk; 77 + pid_list->free_upper_chunks++; 78 + } 79 + 80 + static inline bool upper_empty(union upper_chunk *chunk) 81 + { 82 + /* 83 + * If chunk->data has no lower chunks, it will be the same 84 + * as a zeroed bitmask. Use find_first_bit() to test it 85 + * and if it doesn't find any bits set, then the array 86 + * is empty. 87 + */ 88 + int bit = find_first_bit((unsigned long *)chunk->data, 89 + sizeof(chunk->data) * 8); 90 + return bit >= sizeof(chunk->data) * 8; 91 + } 92 + 93 + static inline int pid_split(unsigned int pid, unsigned int *upper1, 94 + unsigned int *upper2, unsigned int *lower) 95 + { 96 + /* MAX_PID should cover all pids */ 97 + BUILD_BUG_ON(MAX_PID < PID_MAX_LIMIT); 98 + 99 + /* In case a bad pid is passed in, then fail */ 100 + if (unlikely(pid >= MAX_PID)) 101 + return -1; 102 + 103 + *upper1 = (pid >> UPPER1_SHIFT) & UPPER_MASK; 104 + *upper2 = (pid >> UPPER2_SHIFT) & UPPER_MASK; 105 + *lower = pid & LOWER_MASK; 106 + 107 + return 0; 108 + } 109 + 110 + static inline unsigned int pid_join(unsigned int upper1, 111 + unsigned int upper2, unsigned int lower) 112 + { 113 + return ((upper1 & UPPER_MASK) << UPPER1_SHIFT) | 114 + ((upper2 & UPPER_MASK) << UPPER2_SHIFT) | 115 + (lower & LOWER_MASK); 116 + } 117 + 118 + /** 119 + * trace_pid_list_is_set - test if the pid is set in the list 120 + * 
@pid_list: The pid list to test 121 + * @pid: The pid to check for in the list. 122 + * 123 + * Tests if @pid is set in @pid_list. This is usually called 124 + * from the scheduler when a task is scheduled. Its pid is checked 125 + * to see if it should be traced or not. 126 + * 127 + * Return true if the pid is in the list, false otherwise. 128 + */ 129 + bool trace_pid_list_is_set(struct trace_pid_list *pid_list, unsigned int pid) 130 + { 131 + union upper_chunk *upper_chunk; 132 + union lower_chunk *lower_chunk; 133 + unsigned long flags; 134 + unsigned int upper1; 135 + unsigned int upper2; 136 + unsigned int lower; 137 + bool ret = false; 138 + 139 + if (!pid_list) 140 + return false; 141 + 142 + if (pid_split(pid, &upper1, &upper2, &lower) < 0) 143 + return false; 144 + 145 + raw_spin_lock_irqsave(&pid_list->lock, flags); 146 + upper_chunk = pid_list->upper[upper1]; 147 + if (upper_chunk) { 148 + lower_chunk = upper_chunk->data[upper2]; 149 + if (lower_chunk) 150 + ret = test_bit(lower, lower_chunk->data); 151 + } 152 + raw_spin_unlock_irqrestore(&pid_list->lock, flags); 153 + 154 + return ret; 155 + } 156 + 157 + /** 158 + * trace_pid_list_set - add a pid to the list 159 + * @pid_list: The pid list to add the @pid to. 160 + * @pid: The pid to add. 161 + * 162 + * Adds @pid to @pid_list. This is usually done explicitly by a user 163 + * adding a task to be traced, or indirectly by the fork function 164 + * when children should be traced and a task's pid is in the list. 165 + * 166 + * Return 0 on success, negative otherwise. 
```diff
+ */
+int trace_pid_list_set(struct trace_pid_list *pid_list, unsigned int pid)
+{
+	union upper_chunk *upper_chunk;
+	union lower_chunk *lower_chunk;
+	unsigned long flags;
+	unsigned int upper1;
+	unsigned int upper2;
+	unsigned int lower;
+	int ret;
+
+	if (!pid_list)
+		return -ENODEV;
+
+	if (pid_split(pid, &upper1, &upper2, &lower) < 0)
+		return -EINVAL;
+
+	raw_spin_lock_irqsave(&pid_list->lock, flags);
+	upper_chunk = pid_list->upper[upper1];
+	if (!upper_chunk) {
+		upper_chunk = get_upper_chunk(pid_list);
+		if (!upper_chunk) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		pid_list->upper[upper1] = upper_chunk;
+	}
+	lower_chunk = upper_chunk->data[upper2];
+	if (!lower_chunk) {
+		lower_chunk = get_lower_chunk(pid_list);
+		if (!lower_chunk) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		upper_chunk->data[upper2] = lower_chunk;
+	}
+	set_bit(lower, lower_chunk->data);
+	ret = 0;
+ out:
+	raw_spin_unlock_irqrestore(&pid_list->lock, flags);
+	return ret;
+}
+
+/**
+ * trace_pid_list_clear - remove a pid from the list
+ * @pid_list: The pid list to remove the @pid from.
+ * @pid: The pid to remove.
+ *
+ * Removes @pid from @pid_list. This is usually done explicitly by a user
+ * removing tasks from tracing, or indirectly by the exit function
+ * when a task that is set to be traced exits.
+ *
+ * Return 0 on success, negative otherwise.
+ */
+int trace_pid_list_clear(struct trace_pid_list *pid_list, unsigned int pid)
+{
+	union upper_chunk *upper_chunk;
+	union lower_chunk *lower_chunk;
+	unsigned long flags;
+	unsigned int upper1;
+	unsigned int upper2;
+	unsigned int lower;
+
+	if (!pid_list)
+		return -ENODEV;
+
+	if (pid_split(pid, &upper1, &upper2, &lower) < 0)
+		return -EINVAL;
+
+	raw_spin_lock_irqsave(&pid_list->lock, flags);
+	upper_chunk = pid_list->upper[upper1];
+	if (!upper_chunk)
+		goto out;
+
+	lower_chunk = upper_chunk->data[upper2];
+	if (!lower_chunk)
+		goto out;
+
+	clear_bit(lower, lower_chunk->data);
+
+	/* if there's no more bits set, add it to the free list */
+	if (find_first_bit(lower_chunk->data, LOWER_MAX) >= LOWER_MAX) {
+		put_lower_chunk(pid_list, lower_chunk);
+		upper_chunk->data[upper2] = NULL;
+		if (upper_empty(upper_chunk)) {
+			put_upper_chunk(pid_list, upper_chunk);
+			pid_list->upper[upper1] = NULL;
+		}
+	}
+ out:
+	raw_spin_unlock_irqrestore(&pid_list->lock, flags);
+	return 0;
+}
+
+/**
+ * trace_pid_list_next - return the next pid in the list
+ * @pid_list: The pid list to examine.
+ * @pid: The pid to start from
+ * @next: The pointer to place the pid that is set starting from @pid.
+ *
+ * Looks for the next consecutive pid that is in @pid_list starting
+ * at the pid specified by @pid. If one is set (including @pid), then
+ * that pid is placed into @next.
+ *
+ * Return 0 when a pid is found, -1 if there are no more pids included.
+ */
+int trace_pid_list_next(struct trace_pid_list *pid_list, unsigned int pid,
+			unsigned int *next)
+{
+	union upper_chunk *upper_chunk;
+	union lower_chunk *lower_chunk;
+	unsigned long flags;
+	unsigned int upper1;
+	unsigned int upper2;
+	unsigned int lower;
+
+	if (!pid_list)
+		return -ENODEV;
+
+	if (pid_split(pid, &upper1, &upper2, &lower) < 0)
+		return -EINVAL;
+
+	raw_spin_lock_irqsave(&pid_list->lock, flags);
+	for (; upper1 <= UPPER_MASK; upper1++, upper2 = 0) {
+		upper_chunk = pid_list->upper[upper1];
+
+		if (!upper_chunk)
+			continue;
+
+		for (; upper2 <= UPPER_MASK; upper2++, lower = 0) {
+			lower_chunk = upper_chunk->data[upper2];
+			if (!lower_chunk)
+				continue;
+
+			lower = find_next_bit(lower_chunk->data, LOWER_MAX,
+					      lower);
+			if (lower < LOWER_MAX)
+				goto found;
+		}
+	}
+
+ found:
+	raw_spin_unlock_irqrestore(&pid_list->lock, flags);
+	if (upper1 > UPPER_MASK)
+		return -1;
+
+	*next = pid_join(upper1, upper2, lower);
+	return 0;
+}
+
+/**
+ * trace_pid_list_first - return the first pid in the list
+ * @pid_list: The pid list to examine.
+ * @pid: The pointer to place the first pid that is found set.
+ *
+ * Looks for the first pid that is set in @pid_list, and places it
+ * into @pid if found.
+ *
+ * Return 0 when a pid is found, -1 if there are no pids set.
+ */
+int trace_pid_list_first(struct trace_pid_list *pid_list, unsigned int *pid)
+{
+	return trace_pid_list_next(pid_list, 0, pid);
+}
+
+static void pid_list_refill_irq(struct irq_work *iwork)
+{
+	struct trace_pid_list *pid_list = container_of(iwork, struct trace_pid_list,
+						       refill_irqwork);
+	union upper_chunk *upper = NULL;
+	union lower_chunk *lower = NULL;
+	union upper_chunk **upper_next = &upper;
+	union lower_chunk **lower_next = &lower;
+	int upper_count;
+	int lower_count;
+	int ucnt = 0;
+	int lcnt = 0;
+
+ again:
+	raw_spin_lock(&pid_list->lock);
+	upper_count = CHUNK_ALLOC - pid_list->free_upper_chunks;
+	lower_count = CHUNK_ALLOC - pid_list->free_lower_chunks;
+	raw_spin_unlock(&pid_list->lock);
+
+	if (upper_count <= 0 && lower_count <= 0)
+		return;
+
+	while (upper_count-- > 0) {
+		union upper_chunk *chunk;
+
+		chunk = kzalloc(sizeof(*chunk), GFP_KERNEL);
+		if (!chunk)
+			break;
+		*upper_next = chunk;
+		upper_next = &chunk->next;
+		ucnt++;
+	}
+
+	while (lower_count-- > 0) {
+		union lower_chunk *chunk;
+
+		chunk = kzalloc(sizeof(*chunk), GFP_KERNEL);
+		if (!chunk)
+			break;
+		*lower_next = chunk;
+		lower_next = &chunk->next;
+		lcnt++;
+	}
+
+	raw_spin_lock(&pid_list->lock);
+	if (upper) {
+		*upper_next = pid_list->upper_list;
+		pid_list->upper_list = upper;
+		pid_list->free_upper_chunks += ucnt;
+	}
+	if (lower) {
+		*lower_next = pid_list->lower_list;
+		pid_list->lower_list = lower;
+		pid_list->free_lower_chunks += lcnt;
+	}
+	raw_spin_unlock(&pid_list->lock);
+
+	/*
+	 * On success of allocating all the chunks, both counters
+	 * will be less than zero. If they are not, then an allocation
+	 * failed, and we should not try again.
+	 */
+	if (upper_count >= 0 || lower_count >= 0)
+		return;
+	/*
+	 * When the locks were released, free chunks could have
+	 * been used and allocation needs to be done again. Might as
+	 * well allocate it now.
+	 */
+	goto again;
+}
+
+/**
+ * trace_pid_list_alloc - create a new pid_list
+ *
+ * Allocates a new pid_list to store pids into.
+ *
+ * Returns the pid_list on success, NULL otherwise.
+ */
+struct trace_pid_list *trace_pid_list_alloc(void)
+{
+	struct trace_pid_list *pid_list;
+	int i;
+
+	/* According to linux/thread.h, pids can be no bigger than 30 bits */
+	WARN_ON_ONCE(pid_max > (1 << 30));
+
+	pid_list = kzalloc(sizeof(*pid_list), GFP_KERNEL);
+	if (!pid_list)
+		return NULL;
+
+	init_irq_work(&pid_list->refill_irqwork, pid_list_refill_irq);
+
+	raw_spin_lock_init(&pid_list->lock);
+
+	for (i = 0; i < CHUNK_ALLOC; i++) {
+		union upper_chunk *chunk;
+
+		chunk = kzalloc(sizeof(*chunk), GFP_KERNEL);
+		if (!chunk)
+			break;
+		chunk->next = pid_list->upper_list;
+		pid_list->upper_list = chunk;
+		pid_list->free_upper_chunks++;
+	}
+
+	for (i = 0; i < CHUNK_ALLOC; i++) {
+		union lower_chunk *chunk;
+
+		chunk = kzalloc(sizeof(*chunk), GFP_KERNEL);
+		if (!chunk)
+			break;
+		chunk->next = pid_list->lower_list;
+		pid_list->lower_list = chunk;
+		pid_list->free_lower_chunks++;
+	}
+
+	return pid_list;
+}
+
+/**
+ * trace_pid_list_free - Frees an allocated pid_list.
+ *
+ * Frees the memory for a pid_list that was allocated.
+ */
+void trace_pid_list_free(struct trace_pid_list *pid_list)
+{
+	union upper_chunk *upper;
+	union lower_chunk *lower;
+	int i, j;
+
+	if (!pid_list)
+		return;
+
+	irq_work_sync(&pid_list->refill_irqwork);
+
+	while (pid_list->lower_list) {
+		union lower_chunk *chunk;
+
+		chunk = pid_list->lower_list;
+		pid_list->lower_list = pid_list->lower_list->next;
+		kfree(chunk);
+	}
+
+	while (pid_list->upper_list) {
+		union upper_chunk *chunk;
+
+		chunk = pid_list->upper_list;
+		pid_list->upper_list = pid_list->upper_list->next;
+		kfree(chunk);
+	}
+
+	for (i = 0; i < UPPER1_SIZE; i++) {
+		upper = pid_list->upper[i];
+		if (upper) {
+			for (j = 0; j < UPPER2_SIZE; j++) {
+				lower = upper->data[j];
+				kfree(lower);
+			}
+			kfree(upper);
+		}
+	}
+	kfree(pid_list);
+}
```
kernel/trace/pid_list.h (+88)

```diff
+// SPDX-License-Identifier: GPL-2.0
+
+/* Do not include this file directly. */
+
+#ifndef _TRACE_INTERNAL_PID_LIST_H
+#define _TRACE_INTERNAL_PID_LIST_H
+
+/*
+ * In order to keep track of what pids to trace, a tree is created much
+ * like page tables are used. This creates a sparse bit map, where
+ * the tree is filled in when needed. A PID is at most 30 bits (see
+ * linux/thread.h), and is broken up into 3 sections based on the bit map
+ * of the bits. The 8 MSB is the "upper1" section. The next 8 MSB is the
+ * "upper2" section and the 14 LSB is the "lower" section.
+ *
+ * A trace_pid_list structure holds the "upper1" section, in an
+ * array of 256 pointers (1 or 2K in size) to "upper_chunk" unions, where
+ * each has an array of 256 pointers (1 or 2K in size) to the "lower_chunk"
+ * structures, where each has an array of size 2K bytes representing a bitmask
+ * of the 14 LSB of the PID (256 * 8 = 2048)
+ *
+ * When a trace_pid_list is allocated, it includes the 256 pointer array
+ * of the upper1 unions. Then a "cache" of upper and lower is allocated
+ * where these will be assigned as needed.
+ *
+ * When a bit is set in the pid_list bitmask, the pid to use has
+ * the 8 MSB masked, and this is used to index the array in the
+ * pid_list to find the next upper union. If the element is NULL,
+ * then one is retrieved from the upper_list cache. If none is
+ * available, then -ENOMEM is returned.
+ *
+ * The next 8 MSB is used to index into the "upper2" section. If this
+ * element is NULL, then it is retrieved from the lower_list cache.
+ * Again, if one is not available -ENOMEM is returned.
+ *
+ * Finally the 14 LSB of the PID is used to set the bit in the 16384
+ * bitmask (made up of 2K bytes).
+ *
+ * When the second upper section or the lower section has their last
+ * bit cleared, they are added back to the free list to be reused
+ * when needed.
+ */
+
+#define UPPER_BITS	8
+#define UPPER_MAX	(1 << UPPER_BITS)
+#define UPPER1_SIZE	(1 << UPPER_BITS)
+#define UPPER2_SIZE	(1 << UPPER_BITS)
+
+#define LOWER_BITS	14
+#define LOWER_MAX	(1 << LOWER_BITS)
+#define LOWER_SIZE	(LOWER_MAX / BITS_PER_LONG)
+
+#define UPPER1_SHIFT	(LOWER_BITS + UPPER_BITS)
+#define UPPER2_SHIFT	LOWER_BITS
+#define LOWER_MASK	(LOWER_MAX - 1)
+
+#define UPPER_MASK	(UPPER_MAX - 1)
+
+/* According to linux/thread.h pids can not be bigger than or equal to 1 << 30 */
+#define MAX_PID		(1 << 30)
+
+/* Just keep 6 chunks of both upper and lower in the cache on alloc */
+#define CHUNK_ALLOC 6
+
+/* Have 2 chunks free, trigger a refill of the cache */
+#define CHUNK_REALLOC 2
+
+union lower_chunk {
+	union lower_chunk	*next;
+	unsigned long		data[LOWER_SIZE]; // 2K in size
+};
+
+union upper_chunk {
+	union upper_chunk	*next;
+	union lower_chunk	*data[UPPER2_SIZE]; // 1 or 2K in size
+};
+
+struct trace_pid_list {
+	raw_spinlock_t		lock;
+	struct irq_work		refill_irqwork;
+	union upper_chunk	*upper[UPPER1_SIZE]; // 1 or 2K in size
+	union upper_chunk	*upper_list;
+	union lower_chunk	*lower_list;
+	int			free_upper_chunks;
+	int			free_lower_chunks;
+};
+
+#endif /* _TRACE_INTERNAL_PID_LIST_H */
```
kernel/trace/ring_buffer.c (+2 -7)

```diff
 trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	unsigned int val = cpu_buffer->current_context;
-	unsigned long pc = preempt_count();
-	int bit;
+	int bit = interrupt_context_level();
 
-	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
-		bit = RB_CTX_NORMAL;
-	else
-		bit = pc & NMI_MASK ? RB_CTX_NMI :
-			pc & HARDIRQ_MASK ? RB_CTX_IRQ : RB_CTX_SOFTIRQ;
+	bit = RB_CTX_NORMAL - bit;
 
 	if (unlikely(val & (1 << (bit + cpu_buffer->nest)))) {
 		/*
```
kernel/trace/trace.c (+64 -87)

```diff
 	return 0;
 }
 
-void trace_free_pid_list(struct trace_pid_list *pid_list)
-{
-	vfree(pid_list->pids);
-	kfree(pid_list);
-}
-
 /**
  * trace_find_filtered_pid - check if a pid exists in a filtered_pid list
  * @filtered_pids: The list of pids to check
···
 bool
 trace_find_filtered_pid(struct trace_pid_list *filtered_pids, pid_t search_pid)
 {
-	/*
-	 * If pid_max changed after filtered_pids was created, we
-	 * by default ignore all pids greater than the previous pid_max.
-	 */
-	if (search_pid >= filtered_pids->pid_max)
-		return false;
-
-	return test_bit(search_pid, filtered_pids->pids);
+	return trace_pid_list_is_set(filtered_pids, search_pid);
 }
 
 /**
···
 		return;
 	}
 
-	/* Sorry, but we don't support pid_max changing after setting */
-	if (task->pid >= pid_list->pid_max)
-		return;
-
 	/* "self" is set for forks, and NULL for exits */
 	if (self)
-		set_bit(task->pid, pid_list->pids);
+		trace_pid_list_set(pid_list, task->pid);
 	else
-		clear_bit(task->pid, pid_list->pids);
+		trace_pid_list_clear(pid_list, task->pid);
 }
 
 /**
···
  */
 void *trace_pid_next(struct trace_pid_list *pid_list, void *v, loff_t *pos)
 {
-	unsigned long pid = (unsigned long)v;
+	long pid = (unsigned long)v;
+	unsigned int next;
 
 	(*pos)++;
 
 	/* pid already is +1 of the actual previous bit */
-	pid = find_next_bit(pid_list->pids, pid_list->pid_max, pid);
+	if (trace_pid_list_next(pid_list, pid, &next) < 0)
+		return NULL;
+
+	pid = next;
 
 	/* Return pid + 1 to allow zero to be represented */
-	if (pid < pid_list->pid_max)
-		return (void *)(pid + 1);
-
-	return NULL;
+	return (void *)(pid + 1);
 }
 
 /**
···
 void *trace_pid_start(struct trace_pid_list *pid_list, loff_t *pos)
 {
 	unsigned long pid;
+	unsigned int first;
 	loff_t l = 0;
 
-	pid = find_first_bit(pid_list->pids, pid_list->pid_max);
-	if (pid >= pid_list->pid_max)
+	if (trace_pid_list_first(pid_list, &first) < 0)
 		return NULL;
+
+	pid = first;
 
 	/* Return pid + 1 so that zero can be the exit value */
 	for (pid++; pid && l < *pos;
···
 	unsigned long val;
 	int nr_pids = 0;
 	ssize_t read = 0;
-	ssize_t ret = 0;
+	ssize_t ret;
 	loff_t pos;
 	pid_t pid;
···
 	 * the user. If the operation fails, then the current list is
 	 * not modified.
 	 */
-	pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL);
+	pid_list = trace_pid_list_alloc();
 	if (!pid_list) {
 		trace_parser_put(&parser);
 		return -ENOMEM;
 	}
 
-	pid_list->pid_max = READ_ONCE(pid_max);
-
-	/* Only truncating will shrink pid_max */
-	if (filtered_pids && filtered_pids->pid_max > pid_list->pid_max)
-		pid_list->pid_max = filtered_pids->pid_max;
-
-	pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
-	if (!pid_list->pids) {
-		trace_parser_put(&parser);
-		kfree(pid_list);
-		return -ENOMEM;
-	}
-
 	if (filtered_pids) {
 		/* copy the current bits to the new max */
-		for_each_set_bit(pid, filtered_pids->pids,
-				 filtered_pids->pid_max) {
-			set_bit(pid, pid_list->pids);
+		ret = trace_pid_list_first(filtered_pids, &pid);
+		while (!ret) {
+			trace_pid_list_set(pid_list, pid);
+			ret = trace_pid_list_next(filtered_pids, pid + 1, &pid);
 			nr_pids++;
 		}
 	}
 
+	ret = 0;
 	while (cnt > 0) {
 
 		pos = 0;
···
 		ret = -EINVAL;
 		if (kstrtoul(parser.buffer, 0, &val))
 			break;
-		if (val >= pid_list->pid_max)
-			break;
 
 		pid = (pid_t)val;
 
-		set_bit(pid, pid_list->pids);
+		if (trace_pid_list_set(pid_list, pid) < 0) {
+			ret = -1;
+			break;
+		}
 		nr_pids++;
 
 		trace_parser_clear(&parser);
···
 	trace_parser_put(&parser);
 
 	if (ret < 0) {
-		trace_free_pid_list(pid_list);
+		trace_pid_list_free(pid_list);
 		return ret;
 	}
 
 	if (!nr_pids) {
 		/* Cleared the list of pids */
-		trace_free_pid_list(pid_list);
+		trace_pid_list_free(pid_list);
 		read = ret;
 		pid_list = NULL;
 	}
···
 {
 	INIT_WORK(&tr->fsnotify_work, latency_fsnotify_workfn);
 	init_irq_work(&tr->fsnotify_irqwork, latency_fsnotify_workfn_irq);
-	tr->d_max_latency = trace_create_file("tracing_max_latency", 0644,
+	tr->d_max_latency = trace_create_file("tracing_max_latency",
+					      TRACE_MODE_WRITE,
 					      d_tracer, &tr->max_latency,
 					      &tracing_max_lat_fops);
 }
···
 	|| defined(CONFIG_OSNOISE_TRACER)
 
 #define trace_create_maxlat_file(tr, d_tracer)				\
-	trace_create_file("tracing_max_latency", 0644, d_tracer,	\
-			  &tr->max_latency, &tracing_max_lat_fops)
+	trace_create_file("tracing_max_latency", TRACE_MODE_WRITE,	\
+			  d_tracer, &tr->max_latency, &tracing_max_lat_fops)
 
 #else
 #define trace_create_maxlat_file(tr, d_tracer)	 do { } while (0)
···
 
 static void trace_create_eval_file(struct dentry *d_tracer)
 {
-	trace_create_file("eval_map", 0444, d_tracer,
+	trace_create_file("eval_map", TRACE_MODE_READ, d_tracer,
 			  NULL, &tracing_eval_map_fops);
 }
···
 	}
 
 	/* per cpu trace_pipe */
-	trace_create_cpu_file("trace_pipe", 0444, d_cpu,
+	trace_create_cpu_file("trace_pipe", TRACE_MODE_READ, d_cpu,
 				tr, cpu, &tracing_pipe_fops);
 
 	/* per cpu trace */
-	trace_create_cpu_file("trace", 0644, d_cpu,
+	trace_create_cpu_file("trace", TRACE_MODE_WRITE, d_cpu,
 				tr, cpu, &tracing_fops);
 
-	trace_create_cpu_file("trace_pipe_raw", 0444, d_cpu,
+	trace_create_cpu_file("trace_pipe_raw", TRACE_MODE_READ, d_cpu,
 				tr, cpu, &tracing_buffers_fops);
 
-	trace_create_cpu_file("stats", 0444, d_cpu,
+	trace_create_cpu_file("stats", TRACE_MODE_READ, d_cpu,
 				tr, cpu, &tracing_stats_fops);
 
-	trace_create_cpu_file("buffer_size_kb", 0444, d_cpu,
+	trace_create_cpu_file("buffer_size_kb", TRACE_MODE_READ, d_cpu,
 				tr, cpu, &tracing_entries_fops);
 
 #ifdef CONFIG_TRACER_SNAPSHOT
-	trace_create_cpu_file("snapshot", 0644, d_cpu,
+	trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu,
 				tr, cpu, &snapshot_fops);
 
-	trace_create_cpu_file("snapshot_raw", 0444, d_cpu,
+	trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu,
 				tr, cpu, &snapshot_raw_fops);
 #endif
 }
···
 	topt->opt = opt;
 	topt->tr = tr;
 
-	topt->entry = trace_create_file(opt->name, 0644, t_options, topt,
-					&trace_options_fops);
+	topt->entry = trace_create_file(opt->name, TRACE_MODE_WRITE,
+					t_options, topt, &trace_options_fops);
 
 }
···
 	if (!t_options)
 		return NULL;
 
-	return trace_create_file(option, 0644, t_options,
+	return trace_create_file(option, TRACE_MODE_WRITE, t_options,
 				 (void *)&tr->trace_flags_index[index],
 				 &trace_options_core_fops);
 }
···
 	struct trace_event_file *file;
 	int cpu;
 
-	trace_create_file("available_tracers", 0444, d_tracer,
+	trace_create_file("available_tracers", TRACE_MODE_READ, d_tracer,
 			tr, &show_traces_fops);
 
-	trace_create_file("current_tracer", 0644, d_tracer,
+	trace_create_file("current_tracer", TRACE_MODE_WRITE, d_tracer,
 			tr, &set_tracer_fops);
 
-	trace_create_file("tracing_cpumask", 0644, d_tracer,
+	trace_create_file("tracing_cpumask", TRACE_MODE_WRITE, d_tracer,
 			tr, &tracing_cpumask_fops);
 
-	trace_create_file("trace_options", 0644, d_tracer,
+	trace_create_file("trace_options", TRACE_MODE_WRITE, d_tracer,
 			tr, &tracing_iter_fops);
 
-	trace_create_file("trace", 0644, d_tracer,
+	trace_create_file("trace", TRACE_MODE_WRITE, d_tracer,
 			tr, &tracing_fops);
 
-	trace_create_file("trace_pipe", 0444, d_tracer,
+	trace_create_file("trace_pipe", TRACE_MODE_READ, d_tracer,
 			tr, &tracing_pipe_fops);
 
-	trace_create_file("buffer_size_kb", 0644, d_tracer,
+	trace_create_file("buffer_size_kb", TRACE_MODE_WRITE, d_tracer,
 			tr, &tracing_entries_fops);
 
-	trace_create_file("buffer_total_size_kb", 0444, d_tracer,
+	trace_create_file("buffer_total_size_kb", TRACE_MODE_READ, d_tracer,
 			tr, &tracing_total_entries_fops);
 
 	trace_create_file("free_buffer", 0200, d_tracer,
···
 
 	file = __find_event_file(tr, "ftrace", "print");
 	if (file && file->dir)
-		trace_create_file("trigger", 0644, file->dir, file,
-				  &event_trigger_fops);
+		trace_create_file("trigger", TRACE_MODE_WRITE, file->dir,
+				  file, &event_trigger_fops);
 	tr->trace_marker_file = file;
 
 	trace_create_file("trace_marker_raw", 0220, d_tracer,
 			  tr, &tracing_mark_raw_fops);
 
-	trace_create_file("trace_clock", 0644, d_tracer, tr,
+	trace_create_file("trace_clock", TRACE_MODE_WRITE, d_tracer, tr,
 			  &trace_clock_fops);
 
-	trace_create_file("tracing_on", 0644, d_tracer,
+	trace_create_file("tracing_on", TRACE_MODE_WRITE, d_tracer,
 			  tr, &rb_simple_fops);
 
-	trace_create_file("timestamp_mode", 0444, d_tracer, tr,
+	trace_create_file("timestamp_mode", TRACE_MODE_READ, d_tracer, tr,
 			  &trace_time_stamp_mode_fops);
 
 	tr->buffer_percent = 50;
 
-	trace_create_file("buffer_percent", 0444, d_tracer,
+	trace_create_file("buffer_percent", TRACE_MODE_READ, d_tracer,
 			tr, &buffer_percent_fops);
 
 	create_trace_options_dir(tr);
···
 		MEM_FAIL(1, "Could not allocate function filter files");
 
 #ifdef CONFIG_TRACER_SNAPSHOT
-	trace_create_file("snapshot", 0644, d_tracer,
+	trace_create_file("snapshot", TRACE_MODE_WRITE, d_tracer,
 			  tr, &snapshot_fops);
 #endif
 
-	trace_create_file("error_log", 0644, d_tracer,
+	trace_create_file("error_log", TRACE_MODE_WRITE, d_tracer,
 			  tr, &tracing_err_log_fops);
 
 	for_each_tracing_cpu(cpu)
···
 	init_tracer_tracefs(&global_trace, NULL);
 	ftrace_init_tracefs_toplevel(&global_trace, NULL);
 
-	trace_create_file("tracing_thresh", 0644, NULL,
+	trace_create_file("tracing_thresh", TRACE_MODE_WRITE, NULL,
 			&global_trace, &tracing_thresh_fops);
 
-	trace_create_file("README", 0444, NULL,
+	trace_create_file("README", TRACE_MODE_READ, NULL,
 			NULL, &tracing_readme_fops);
 
-	trace_create_file("saved_cmdlines", 0444, NULL,
+	trace_create_file("saved_cmdlines", TRACE_MODE_READ, NULL,
 			NULL, &tracing_saved_cmdlines_fops);
 
-	trace_create_file("saved_cmdlines_size", 0644, NULL,
+	trace_create_file("saved_cmdlines_size", TRACE_MODE_WRITE, NULL,
 			  NULL, &tracing_saved_cmdlines_size_fops);
 
-	trace_create_file("saved_tgids", 0444, NULL,
+	trace_create_file("saved_tgids", TRACE_MODE_READ, NULL,
 			NULL, &tracing_saved_tgids_fops);
 
 	trace_eval_init();
···
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-	trace_create_file("dyn_ftrace_total_info", 0444, NULL,
+	trace_create_file("dyn_ftrace_total_info", TRACE_MODE_READ, NULL,
 			NULL, &tracing_dyn_info_fops);
 #endif
```
kernel/trace/trace.h (+14 -5)

```diff
 #include <linux/ctype.h>
 #include <linux/once_lite.h>
 
+#include "pid_list.h"
+
 #ifdef CONFIG_FTRACE_SYSCALLS
 #include <asm/unistd.h>		/* For NR_SYSCALLS	     */
 #include <asm/syscall.h>	/* some archs define it here */
 #endif
+
+#define TRACE_MODE_WRITE	0640
+#define TRACE_MODE_READ		0440
 
 enum trace_type {
 	__TRACE_FIRST_TYPE = 0,
···
 	struct trace_option_dentry *topts;
 };
 
-struct trace_pid_list {
-	int				pid_max;
-	unsigned long			*pids;
-};
+struct trace_pid_list *trace_pid_list_alloc(void);
+void trace_pid_list_free(struct trace_pid_list *pid_list);
+bool trace_pid_list_is_set(struct trace_pid_list *pid_list, unsigned int pid);
+int trace_pid_list_set(struct trace_pid_list *pid_list, unsigned int pid);
+int trace_pid_list_clear(struct trace_pid_list *pid_list, unsigned int pid);
+int trace_pid_list_first(struct trace_pid_list *pid_list, unsigned int *pid);
+int trace_pid_list_next(struct trace_pid_list *pid_list, unsigned int pid,
+			unsigned int *next);
 
 enum {
 	TRACE_PIDS		= BIT(0),
···
 	 * is set, and called by an interrupt handler, we still
 	 * want to trace it.
 	 */
-	if (in_irq())
+	if (in_hardirq())
 		trace_recursion_set(TRACE_IRQ_BIT);
 	else
 		trace_recursion_clear(TRACE_IRQ_BIT);
```
kernel/trace/trace_boot.c (+4)

```diff
 	/* All digit started node should be instances. */
 	if (trace_boot_compose_hist_cmd(node, buf, size) == 0) {
 		tmp = kstrdup(buf, GFP_KERNEL);
+		if (!tmp)
+			return;
 		if (trigger_process_regex(file, buf) < 0)
 			pr_err("Failed to apply hist trigger: %s\n", tmp);
 		kfree(tmp);
···
 	if (xbc_node_find_subkey(hnode, "keys")) {
 		if (trace_boot_compose_hist_cmd(hnode, buf, size) == 0) {
 			tmp = kstrdup(buf, GFP_KERNEL);
+			if (!tmp)
+				return;
 			if (trigger_process_regex(file, buf) < 0)
 				pr_err("Failed to apply hist trigger: %s\n", tmp);
 			kfree(tmp);
```
kernel/trace/trace_dynevent.c (+1 -1)

```diff
 	if (ret)
 		return 0;
 
-	entry = tracefs_create_file("dynamic_events", 0644, NULL,
+	entry = tracefs_create_file("dynamic_events", TRACE_MODE_WRITE, NULL,
 				    NULL, &dynamic_events_ops);
 
 	/* Event list interface */
```
kernel/trace/trace_event_perf.c (+5 -4)

```diff
 	BUILD_BUG_ON(PERF_MAX_TRACE_SIZE % sizeof(unsigned long));
 
 	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE,
-		      "perf buffer not large enough"))
+		      "perf buffer not large enough, wanted %d, have %d",
+		      size, PERF_MAX_TRACE_SIZE))
 		return NULL;
 
 	*rctxp = rctx = perf_swevent_get_recursion_context();
···
 	if (!rcu_is_watching())
 		return;
 
-	if ((unsigned long)ops->private != smp_processor_id())
-		return;
-
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
+
+	if ((unsigned long)ops->private != smp_processor_id())
+		goto out;
 
 	event = container_of(ops, struct perf_event, ftrace_ops);
```
kernel/trace/trace_events.c (+25 -23)

```diff
 	tracepoint_synchronize_unregister();
 
 	if ((type & TRACE_PIDS) && pid_list)
-		trace_free_pid_list(pid_list);
+		trace_pid_list_free(pid_list);
 
 	if ((type & TRACE_NO_PIDS) && no_pid_list)
-		trace_free_pid_list(no_pid_list);
+		trace_pid_list_free(no_pid_list);
 }
 
 static void ftrace_clear_event_pids(struct trace_array *tr, int type)
···
 
 	if (filtered_pids) {
 		tracepoint_synchronize_unregister();
-		trace_free_pid_list(filtered_pids);
+		trace_pid_list_free(filtered_pids);
 	} else if (pid_list && !other_pids) {
 		register_pid_events(tr);
 	}
···
 	/* the ftrace system is special, do not create enable or filter files */
 	if (strcmp(name, "ftrace") != 0) {
 
-		entry = tracefs_create_file("filter", 0644, dir->entry, dir,
+		entry = tracefs_create_file("filter", TRACE_MODE_WRITE,
+					    dir->entry, dir,
 					    &ftrace_subsystem_filter_fops);
 		if (!entry) {
 			kfree(system->filter);
···
 			pr_warn("Could not create tracefs '%s/filter' entry\n", name);
 	}
 
-	trace_create_file("enable", 0644, dir->entry, dir,
+	trace_create_file("enable", TRACE_MODE_WRITE, dir->entry, dir,
 			  &ftrace_system_enable_fops);
 }
···
 	}
 
 	if (call->class->reg && !(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE))
-		trace_create_file("enable", 0644, file->dir, file,
+		trace_create_file("enable", TRACE_MODE_WRITE, file->dir, file,
 				  &ftrace_enable_fops);
 
 #ifdef CONFIG_PERF_EVENTS
 	if (call->event.type && call->class->reg)
-		trace_create_file("id", 0444, file->dir,
+		trace_create_file("id", TRACE_MODE_READ, file->dir,
 				  (void *)(long)call->event.type,
 				  &ftrace_event_id_fops);
 #endif
···
 	 * triggers or filters.
 	 */
 	if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) {
-		trace_create_file("filter", 0644, file->dir, file,
-				  &ftrace_event_filter_fops);
+		trace_create_file("filter", TRACE_MODE_WRITE, file->dir,
+				  file, &ftrace_event_filter_fops);
 
-		trace_create_file("trigger", 0644, file->dir, file,
-				  &event_trigger_fops);
+		trace_create_file("trigger", TRACE_MODE_WRITE, file->dir,
+				  file, &event_trigger_fops);
 	}
 
 #ifdef CONFIG_HIST_TRIGGERS
-	trace_create_file("hist", 0444, file->dir, file,
+	trace_create_file("hist", TRACE_MODE_READ, file->dir, file,
 			  &event_hist_fops);
 #endif
 #ifdef CONFIG_HIST_TRIGGERS_DEBUG
-	trace_create_file("hist_debug", 0444, file->dir, file,
+	trace_create_file("hist_debug", TRACE_MODE_READ, file->dir, file,
 			  &event_hist_debug_fops);
 #endif
-	trace_create_file("format", 0444, file->dir, call,
+	trace_create_file("format", TRACE_MODE_READ, file->dir, call,
 			  &ftrace_event_format_fops);
 
 #ifdef CONFIG_TRACE_EVENT_INJECT
···
 	struct dentry *d_events;
 	struct dentry *entry;
 
-	entry = tracefs_create_file("set_event", 0644, parent,
+	entry = tracefs_create_file("set_event", TRACE_MODE_WRITE, parent,
 				    tr, &ftrace_set_event_fops);
 	if (!entry) {
 		pr_warn("Could not create tracefs 'set_event' entry\n");
···
 		return -ENOMEM;
 	}
 
-	entry = trace_create_file("enable", 0644, d_events,
+	entry = trace_create_file("enable", TRACE_MODE_WRITE, d_events,
 				  tr, &ftrace_tr_enable_fops);
 	if (!entry) {
 		pr_warn("Could not create tracefs 'enable' entry\n");
···
 
 	/* These are not as crucial, just warn if they are not created */
 
-	entry = tracefs_create_file("set_event_pid", 0644, parent,
+	entry = tracefs_create_file("set_event_pid", TRACE_MODE_WRITE, parent,
 				    tr, &ftrace_set_event_pid_fops);
 	if (!entry)
 		pr_warn("Could not create tracefs 'set_event_pid' entry\n");
 
-	entry = tracefs_create_file("set_event_notrace_pid", 0644, parent,
-				    tr, &ftrace_set_event_notrace_pid_fops);
+	entry = tracefs_create_file("set_event_notrace_pid",
+				    TRACE_MODE_WRITE, parent, tr,
+				    &ftrace_set_event_notrace_pid_fops);
 	if (!entry)
 		pr_warn("Could not create tracefs 'set_event_notrace_pid' entry\n");
 
 	/* ring buffer internal formats */
-	entry = trace_create_file("header_page", 0444, d_events,
+	entry = trace_create_file("header_page", TRACE_MODE_READ, d_events,
 				  ring_buffer_print_page_header,
 				  &ftrace_show_header_fops);
 	if (!entry)
 		pr_warn("Could not create tracefs 'header_page' entry\n");
 
-	entry = trace_create_file("header_event", 0444, d_events,
+	entry = trace_create_file("header_event", TRACE_MODE_READ, d_events,
 				  ring_buffer_print_entry_header,
 				  &ftrace_show_header_fops);
 	if (!entry)
···
 	if (!tr)
 		return -ENODEV;
 
-	entry = tracefs_create_file("available_events", 0444, NULL,
-				    tr, &ftrace_avail_fops);
+	entry = tracefs_create_file("available_events", TRACE_MODE_READ,
+				    NULL, tr, &ftrace_avail_fops);
 	if (!entry)
 		pr_warn("Could not create tracefs 'available_events' entry\n");
```
kernel/trace/trace_events_hist.c (+329 -86, truncated below)

```diff
 	C(EMPTY_SORT_FIELD,	"Empty sort field"),			\
 	C(TOO_MANY_SORT_FIELDS,	"Too many sort fields (Max = 2)"),	\
 	C(INVALID_SORT_FIELD,	"Sort field must be a key or a val"),	\
-	C(INVALID_STR_OPERAND,	"String type can not be an operand in expression"),
+	C(INVALID_STR_OPERAND,	"String type can not be an operand in expression"), \
+	C(EXPECT_NUMBER,	"Expecting numeric literal"),		\
+	C(UNARY_MINUS_SUBEXPR,	"Unary minus not supported in sub-expressions"),
 
 #undef C
 #define C(a, b)		HIST_ERR_##a
···
 #define HIST_FIELD_OPERANDS_MAX	2
 #define HIST_FIELDS_MAX		(TRACING_MAP_FIELDS_MAX + TRACING_MAP_VARS_MAX)
 #define HIST_ACTIONS_MAX	8
+#define HIST_CONST_DIGITS_MAX	21
 
 enum field_op_id {
 	FIELD_OP_NONE,
 	FIELD_OP_PLUS,
 	FIELD_OP_MINUS,
 	FIELD_OP_UNARY_MINUS,
+	FIELD_OP_DIV,
+	FIELD_OP_MULT,
 };
 
 /*
···
 	bool			read_once;
 
 	unsigned int		var_str_idx;
+
+	/* Numeric literals are represented as u64 */
+	u64			constant;
 };
 
 static u64 hist_field_none(struct hist_field *field,
···
 			   void *event)
 {
 	return 0;
+}
+
+static u64 hist_field_const(struct hist_field *field,
+			   struct tracing_map_elt *elt,
+			   struct trace_buffer *buffer,
+			   struct ring_buffer_event *rbe,
+			   void *event)
+{
+	return field->constant;
 }
 
 static u64 hist_field_counter(struct hist_field *field,
···
 	return val1 - val2;
 }
 
+static u64 hist_field_div(struct hist_field *hist_field,
+			   struct tracing_map_elt *elt,
+			   struct trace_buffer *buffer,
+			   struct ring_buffer_event *rbe,
+			   void *event)
+{
+	struct hist_field *operand1 = hist_field->operands[0];
+	struct hist_field *operand2 = hist_field->operands[1];
+
+	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
+	u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+
+	/* Return -1 for the undefined case */
+	if (!val2)
+		return -1;
+
+	/* Use shift if the divisor is a power of 2 */
+	if (!(val2 & (val2 - 1)))
+		return val1 >> __ffs64(val2);
+
+	return div64_u64(val1, val2);
+}
+
+static u64 hist_field_mult(struct hist_field *hist_field,
+			   struct tracing_map_elt *elt,
+			   struct trace_buffer *buffer,
+			   struct ring_buffer_event *rbe,
+			   void *event)
+{
+	struct hist_field *operand1 = hist_field->operands[0];
+	struct hist_field *operand2 = hist_field->operands[1];
+
+	u64 val1 = operand1->fn(operand1, elt, buffer, rbe, event);
+	u64 val2 = operand2->fn(operand2, elt, buffer, rbe, event);
+
+	return val1 * val2;
+}
+
 static u64 hist_field_unary_minus(struct hist_field *hist_field,
 				  struct tracing_map_elt *elt,
 				  struct trace_buffer *buffer,
···
 	HIST_FIELD_FL_CPU		= 1 << 15,
 	HIST_FIELD_FL_ALIAS		= 1 << 16,
 	HIST_FIELD_FL_BUCKET		= 1 << 17,
+	HIST_FIELD_FL_CONST		= 1 << 18,
 };
 
 struct var_defs {
···
 {
 	if (field->flags & HIST_FIELD_FL_VAR_REF)
 		strcat(expr, "$");
+	else if (field->flags & HIST_FIELD_FL_CONST) {
+		char str[HIST_CONST_DIGITS_MAX];
+
+		snprintf(str, HIST_CONST_DIGITS_MAX, "%llu", field->constant);
+		strcat(expr, str);
+	}
 
 	strcat(expr, hist_field_name(field, 0));
 
···
 	case FIELD_OP_PLUS:
 		strcat(expr, "+");
 		break;
+	case FIELD_OP_DIV:
+		strcat(expr, "/");
+		break;
+	case FIELD_OP_MULT:
+		strcat(expr, "*");
+		break;
 	default:
 		kfree(expr);
 		return NULL;
···
 	return expr;
 }
 
-static int contains_operator(char *str)
+/*
+ * If field_op != FIELD_OP_NONE, *sep points to the root operator
```

(The diff is truncated here in this excerpt, mid-hunk.)
1586 + * of the expression tree to be evaluated. 1587 + */ 1588 + static int contains_operator(char *str, char **sep) 1653 1589 { 1654 1590 enum field_op_id field_op = FIELD_OP_NONE; 1655 - char *op; 1591 + char *minus_op, *plus_op, *div_op, *mult_op; 1656 1592 1657 - op = strpbrk(str, "+-"); 1658 - if (!op) 1659 - return FIELD_OP_NONE; 1660 1593 1661 - switch (*op) { 1662 - case '-': 1594 + /* 1595 + * Report the last occurrence of the operators first, so that the 1596 + * expression is evaluated left to right. This is important since 1597 + * subtraction and division are not associative. 1598 + * 1599 + * e.g 1600 + * 64/8/4/2 is 1, i.e 64/8/4/2 = ((64/8)/4)/2 1601 + * 14-7-5-2 is 0, i.e 14-7-5-2 = ((14-7)-5)-2 1602 + */ 1603 + 1604 + /* 1605 + * First, find lower precedence addition and subtraction 1606 + * since the expression will be evaluated recursively. 1607 + */ 1608 + minus_op = strrchr(str, '-'); 1609 + if (minus_op) { 1663 1610 /* 1664 - * Unfortunately, the modifier ".sym-offset" 1665 - * can confuse things. 1611 + * Unary minus is not supported in sub-expressions. If 1612 + * present, it is always the next root operator. 1666 1613 */ 1667 - if (op - str >= 4 && !strncmp(op - 4, ".sym-offset", 11)) 1668 - return FIELD_OP_NONE; 1669 - 1670 - if (*str == '-') 1614 + if (minus_op == str) { 1671 1615 field_op = FIELD_OP_UNARY_MINUS; 1672 - else 1673 - field_op = FIELD_OP_MINUS; 1674 - break; 1675 - case '+': 1676 - field_op = FIELD_OP_PLUS; 1677 - break; 1678 - default: 1679 - break; 1616 + goto out; 1617 + } 1618 + 1619 + field_op = FIELD_OP_MINUS; 1620 + } 1621 + 1622 + plus_op = strrchr(str, '+'); 1623 + if (plus_op || minus_op) { 1624 + /* 1625 + * For operators of the same precedence use to rightmost as the 1626 + * root, so that the expression is evaluated left to right. 
1627 + */ 1628 + if (plus_op > minus_op) 1629 + field_op = FIELD_OP_PLUS; 1630 + goto out; 1631 + } 1632 + 1633 + /* 1634 + * Multiplication and division have higher precedence than addition and 1635 + * subtraction. 1636 + */ 1637 + div_op = strrchr(str, '/'); 1638 + if (div_op) 1639 + field_op = FIELD_OP_DIV; 1640 + 1641 + mult_op = strrchr(str, '*'); 1642 + /* 1643 + * For operators of the same precedence use to rightmost as the 1644 + * root, so that the expression is evaluated left to right. 1645 + */ 1646 + if (mult_op > div_op) 1647 + field_op = FIELD_OP_MULT; 1648 + 1649 + out: 1650 + if (sep) { 1651 + switch (field_op) { 1652 + case FIELD_OP_UNARY_MINUS: 1653 + case FIELD_OP_MINUS: 1654 + *sep = minus_op; 1655 + break; 1656 + case FIELD_OP_PLUS: 1657 + *sep = plus_op; 1658 + break; 1659 + case FIELD_OP_DIV: 1660 + *sep = div_op; 1661 + break; 1662 + case FIELD_OP_MULT: 1663 + *sep = mult_op; 1664 + break; 1665 + case FIELD_OP_NONE: 1666 + default: 1667 + *sep = NULL; 1668 + break; 1669 + } 1680 1670 } 1681 1671 1682 1672 return field_op; ··· 1812 1686 hist_field->fn = hist_field_counter; 1813 1687 hist_field->size = sizeof(u64); 1814 1688 hist_field->type = "u64"; 1689 + goto out; 1690 + } 1691 + 1692 + if (flags & HIST_FIELD_FL_CONST) { 1693 + hist_field->fn = hist_field_const; 1694 + hist_field->size = sizeof(u64); 1695 + hist_field->type = kstrdup("u64", GFP_KERNEL); 1696 + if (!hist_field->type) 1697 + goto free; 1815 1698 goto out; 1816 1699 } 1817 1700 ··· 2060 1925 2061 1926 if (strcmp(var_name, name) == 0) { 2062 1927 field = hist_data->attrs->var_defs.expr[i]; 2063 - if (contains_operator(field) || is_var_ref(field)) 1928 + if (contains_operator(field, NULL) || is_var_ref(field)) 2064 1929 continue; 2065 1930 return field; 2066 1931 } ··· 2137 2002 *flags |= HIST_FIELD_FL_HEX; 2138 2003 else if (strcmp(modifier, "sym") == 0) 2139 2004 *flags |= HIST_FIELD_FL_SYM; 2140 - else if (strcmp(modifier, "sym-offset") == 0) 2005 + /* 2006 + * 'sym-offset' 
occurrences in the trigger string are modified 2007 + * to 'symXoffset' to simplify arithmetic expression parsing. 2008 + */ 2009 + else if (strcmp(modifier, "symXoffset") == 0) 2141 2010 *flags |= HIST_FIELD_FL_SYM_OFFSET; 2142 2011 else if ((strcmp(modifier, "execname") == 0) && 2143 2012 (strcmp(field_name, "common_pid") == 0)) ··· 2229 2090 return alias; 2230 2091 } 2231 2092 2093 + static struct hist_field *parse_const(struct hist_trigger_data *hist_data, 2094 + char *str, char *var_name, 2095 + unsigned long *flags) 2096 + { 2097 + struct trace_array *tr = hist_data->event_file->tr; 2098 + struct hist_field *field = NULL; 2099 + u64 constant; 2100 + 2101 + if (kstrtoull(str, 0, &constant)) { 2102 + hist_err(tr, HIST_ERR_EXPECT_NUMBER, errpos(str)); 2103 + return NULL; 2104 + } 2105 + 2106 + *flags |= HIST_FIELD_FL_CONST; 2107 + field = create_hist_field(hist_data, NULL, *flags, var_name); 2108 + if (!field) 2109 + return NULL; 2110 + 2111 + field->constant = constant; 2112 + 2113 + return field; 2114 + } 2115 + 2232 2116 static struct hist_field *parse_atom(struct hist_trigger_data *hist_data, 2233 2117 struct trace_event_file *file, char *str, 2234 2118 unsigned long *flags, char *var_name) ··· 2261 2099 struct hist_field *hist_field = NULL; 2262 2100 unsigned long buckets = 0; 2263 2101 int ret = 0; 2102 + 2103 + if (isdigit(str[0])) { 2104 + hist_field = parse_const(hist_data, str, var_name, flags); 2105 + if (!hist_field) { 2106 + ret = -EINVAL; 2107 + goto out; 2108 + } 2109 + return hist_field; 2110 + } 2264 2111 2265 2112 s = strchr(str, '.'); 2266 2113 if (s) { ··· 2327 2156 static struct hist_field *parse_expr(struct hist_trigger_data *hist_data, 2328 2157 struct trace_event_file *file, 2329 2158 char *str, unsigned long flags, 2330 - char *var_name, unsigned int level); 2159 + char *var_name, unsigned int *n_subexprs); 2331 2160 2332 2161 static struct hist_field *parse_unary(struct hist_trigger_data *hist_data, 2333 2162 struct trace_event_file 
*file, 2334 2163 char *str, unsigned long flags, 2335 - char *var_name, unsigned int level) 2164 + char *var_name, unsigned int *n_subexprs) 2336 2165 { 2337 2166 struct hist_field *operand1, *expr = NULL; 2338 2167 unsigned long operand_flags; 2339 2168 int ret = 0; 2340 2169 char *s; 2341 2170 2171 + /* Unary minus operator, increment n_subexprs */ 2172 + ++*n_subexprs; 2173 + 2342 2174 /* we support only -(xxx) i.e. explicit parens required */ 2343 2175 2344 - if (level > 3) { 2176 + if (*n_subexprs > 3) { 2345 2177 hist_err(file->tr, HIST_ERR_TOO_MANY_SUBEXPR, errpos(str)); 2346 2178 ret = -EINVAL; 2347 2179 goto free; ··· 2361 2187 } 2362 2188 2363 2189 s = strrchr(str, ')'); 2364 - if (s) 2190 + if (s) { 2191 + /* unary minus not supported in sub-expressions */ 2192 + if (*(s+1) != '\0') { 2193 + hist_err(file->tr, HIST_ERR_UNARY_MINUS_SUBEXPR, 2194 + errpos(str)); 2195 + ret = -EINVAL; 2196 + goto free; 2197 + } 2365 2198 *s = '\0'; 2199 + } 2366 2200 else { 2367 2201 ret = -EINVAL; /* no closing ')' */ 2368 2202 goto free; ··· 2384 2202 } 2385 2203 2386 2204 operand_flags = 0; 2387 - operand1 = parse_expr(hist_data, file, str, operand_flags, NULL, ++level); 2205 + operand1 = parse_expr(hist_data, file, str, operand_flags, NULL, n_subexprs); 2388 2206 if (IS_ERR(operand1)) { 2389 2207 ret = PTR_ERR(operand1); 2390 2208 goto free; ··· 2415 2233 return ERR_PTR(ret); 2416 2234 } 2417 2235 2236 + /* 2237 + * If the operands are var refs, return pointers the 2238 + * variable(s) referenced in var1 and var2, else NULL. 
2239 + */ 2418 2240 static int check_expr_operands(struct trace_array *tr, 2419 2241 struct hist_field *operand1, 2420 - struct hist_field *operand2) 2242 + struct hist_field *operand2, 2243 + struct hist_field **var1, 2244 + struct hist_field **var2) 2421 2245 { 2422 2246 unsigned long operand1_flags = operand1->flags; 2423 2247 unsigned long operand2_flags = operand2->flags; ··· 2436 2248 if (!var) 2437 2249 return -EINVAL; 2438 2250 operand1_flags = var->flags; 2251 + *var1 = var; 2439 2252 } 2440 2253 2441 2254 if ((operand2_flags & HIST_FIELD_FL_VAR_REF) || ··· 2447 2258 if (!var) 2448 2259 return -EINVAL; 2449 2260 operand2_flags = var->flags; 2261 + *var2 = var; 2450 2262 } 2451 2263 2452 2264 if ((operand1_flags & HIST_FIELD_FL_TIMESTAMP_USECS) != ··· 2462 2272 static struct hist_field *parse_expr(struct hist_trigger_data *hist_data, 2463 2273 struct trace_event_file *file, 2464 2274 char *str, unsigned long flags, 2465 - char *var_name, unsigned int level) 2275 + char *var_name, unsigned int *n_subexprs) 2466 2276 { 2467 2277 struct hist_field *operand1 = NULL, *operand2 = NULL, *expr = NULL; 2468 - unsigned long operand_flags; 2278 + struct hist_field *var1 = NULL, *var2 = NULL; 2279 + unsigned long operand_flags, operand2_flags; 2469 2280 int field_op, ret = -EINVAL; 2470 2281 char *sep, *operand1_str; 2282 + hist_field_fn_t op_fn; 2283 + bool combine_consts; 2471 2284 2472 - if (level > 3) { 2285 + if (*n_subexprs > 3) { 2473 2286 hist_err(file->tr, HIST_ERR_TOO_MANY_SUBEXPR, errpos(str)); 2474 2287 return ERR_PTR(-EINVAL); 2475 2288 } 2476 2289 2477 - field_op = contains_operator(str); 2290 + field_op = contains_operator(str, &sep); 2478 2291 2479 2292 if (field_op == FIELD_OP_NONE) 2480 2293 return parse_atom(hist_data, file, str, &flags, var_name); 2481 2294 2482 2295 if (field_op == FIELD_OP_UNARY_MINUS) 2483 - return parse_unary(hist_data, file, str, flags, var_name, ++level); 2296 + return parse_unary(hist_data, file, str, flags, var_name, 
n_subexprs); 2484 2297 2485 - switch (field_op) { 2486 - case FIELD_OP_MINUS: 2487 - sep = "-"; 2488 - break; 2489 - case FIELD_OP_PLUS: 2490 - sep = "+"; 2491 - break; 2492 - default: 2298 + /* Binary operator found, increment n_subexprs */ 2299 + ++*n_subexprs; 2300 + 2301 + /* Split the expression string at the root operator */ 2302 + if (!sep) 2493 2303 goto free; 2494 - } 2304 + *sep = '\0'; 2305 + operand1_str = str; 2306 + str = sep+1; 2495 2307 2496 - operand1_str = strsep(&str, sep); 2497 2308 if (!operand1_str || !str) 2498 2309 goto free; 2499 2310 2500 2311 operand_flags = 0; 2501 - operand1 = parse_atom(hist_data, file, operand1_str, 2502 - &operand_flags, NULL); 2312 + 2313 + /* LHS of string is an expression e.g. a+b in a+b+c */ 2314 + operand1 = parse_expr(hist_data, file, operand1_str, operand_flags, NULL, n_subexprs); 2503 2315 if (IS_ERR(operand1)) { 2504 2316 ret = PTR_ERR(operand1); 2505 2317 operand1 = NULL; ··· 2513 2321 goto free; 2514 2322 } 2515 2323 2516 - /* rest of string could be another expression e.g. b+c in a+b+c */ 2324 + /* RHS of string is another expression e.g. 
c in a+b+c */ 2517 2325 operand_flags = 0; 2518 - operand2 = parse_expr(hist_data, file, str, operand_flags, NULL, ++level); 2326 + operand2 = parse_expr(hist_data, file, str, operand_flags, NULL, n_subexprs); 2519 2327 if (IS_ERR(operand2)) { 2520 2328 ret = PTR_ERR(operand2); 2521 2329 operand2 = NULL; ··· 2527 2335 goto free; 2528 2336 } 2529 2337 2530 - ret = check_expr_operands(file->tr, operand1, operand2); 2338 + switch (field_op) { 2339 + case FIELD_OP_MINUS: 2340 + op_fn = hist_field_minus; 2341 + break; 2342 + case FIELD_OP_PLUS: 2343 + op_fn = hist_field_plus; 2344 + break; 2345 + case FIELD_OP_DIV: 2346 + op_fn = hist_field_div; 2347 + break; 2348 + case FIELD_OP_MULT: 2349 + op_fn = hist_field_mult; 2350 + break; 2351 + default: 2352 + ret = -EINVAL; 2353 + goto free; 2354 + } 2355 + 2356 + ret = check_expr_operands(file->tr, operand1, operand2, &var1, &var2); 2531 2357 if (ret) 2532 2358 goto free; 2533 2359 2534 - flags |= HIST_FIELD_FL_EXPR; 2360 + operand_flags = var1 ? var1->flags : operand1->flags; 2361 + operand2_flags = var2 ? var2->flags : operand2->flags; 2362 + 2363 + /* 2364 + * If both operands are constant, the expression can be 2365 + * collapsed to a single constant. 2366 + */ 2367 + combine_consts = operand_flags & operand2_flags & HIST_FIELD_FL_CONST; 2368 + 2369 + flags |= combine_consts ? 
HIST_FIELD_FL_CONST : HIST_FIELD_FL_EXPR; 2535 2370 2536 2371 flags |= operand1->flags & 2537 2372 (HIST_FIELD_FL_TIMESTAMP | HIST_FIELD_FL_TIMESTAMP_USECS); ··· 2575 2356 expr->operands[0] = operand1; 2576 2357 expr->operands[1] = operand2; 2577 2358 2578 - /* The operand sizes should be the same, so just pick one */ 2579 - expr->size = operand1->size; 2359 + if (combine_consts) { 2360 + if (var1) 2361 + expr->operands[0] = var1; 2362 + if (var2) 2363 + expr->operands[1] = var2; 2580 2364 2581 - expr->operator = field_op; 2582 - expr->name = expr_str(expr, 0); 2583 - expr->type = kstrdup_const(operand1->type, GFP_KERNEL); 2584 - if (!expr->type) { 2585 - ret = -ENOMEM; 2586 - goto free; 2587 - } 2365 + expr->constant = op_fn(expr, NULL, NULL, NULL, NULL); 2588 2366 2589 - switch (field_op) { 2590 - case FIELD_OP_MINUS: 2591 - expr->fn = hist_field_minus; 2592 - break; 2593 - case FIELD_OP_PLUS: 2594 - expr->fn = hist_field_plus; 2595 - break; 2596 - default: 2597 - ret = -EINVAL; 2598 - goto free; 2367 + expr->operands[0] = NULL; 2368 + expr->operands[1] = NULL; 2369 + 2370 + /* 2371 + * var refs won't be destroyed immediately 2372 + * See: destroy_hist_field() 2373 + */ 2374 + destroy_hist_field(operand2, 0); 2375 + destroy_hist_field(operand1, 0); 2376 + 2377 + expr->name = expr_str(expr, 0); 2378 + } else { 2379 + expr->fn = op_fn; 2380 + 2381 + /* The operand sizes should be the same, so just pick one */ 2382 + expr->size = operand1->size; 2383 + 2384 + expr->operator = field_op; 2385 + expr->type = kstrdup_const(operand1->type, GFP_KERNEL); 2386 + if (!expr->type) { 2387 + ret = -ENOMEM; 2388 + goto free; 2389 + } 2390 + 2391 + expr->name = expr_str(expr, 0); 2599 2392 } 2600 2393 2601 2394 return expr; 2602 - free: 2395 + free: 2603 2396 destroy_hist_field(operand1, 0); 2604 2397 destroy_hist_field(operand2, 0); 2605 2398 destroy_hist_field(expr, 0); ··· 3982 3751 unsigned long flags) 3983 3752 { 3984 3753 struct hist_field *hist_field; 3985 - int ret = 0; 
3754 + int ret = 0, n_subexprs = 0; 3986 3755 3987 - hist_field = parse_expr(hist_data, file, field_str, flags, var_name, 0); 3756 + hist_field = parse_expr(hist_data, file, field_str, flags, var_name, &n_subexprs); 3988 3757 if (IS_ERR(hist_field)) { 3989 3758 ret = PTR_ERR(hist_field); 3990 3759 goto out; ··· 4125 3894 struct hist_field *hist_field = NULL; 4126 3895 unsigned long flags = 0; 4127 3896 unsigned int key_size; 4128 - int ret = 0; 3897 + int ret = 0, n_subexprs = 0; 4129 3898 4130 3899 if (WARN_ON(key_idx >= HIST_FIELDS_MAX)) 4131 3900 return -EINVAL; ··· 4138 3907 hist_field = create_hist_field(hist_data, NULL, flags, NULL); 4139 3908 } else { 4140 3909 hist_field = parse_expr(hist_data, file, field_str, flags, 4141 - NULL, 0); 3910 + NULL, &n_subexprs); 4142 3911 if (IS_ERR(hist_field)) { 4143 3912 ret = PTR_ERR(hist_field); 4144 3913 goto out; ··· 4937 4706 unsigned long *stacktrace_entries, 4938 4707 unsigned int max_entries) 4939 4708 { 4940 - char str[KSYM_SYMBOL_LEN]; 4941 4709 unsigned int spaces = 8; 4942 4710 unsigned int i; 4943 4711 ··· 4945 4715 return; 4946 4716 4947 4717 seq_printf(m, "%*c", 1 + spaces, ' '); 4948 - sprint_symbol(str, stacktrace_entries[i]); 4949 - seq_printf(m, "%s\n", str); 4718 + seq_printf(m, "%pS\n", (void*)stacktrace_entries[i]); 4950 4719 } 4951 4720 } 4952 4721 ··· 4955 4726 struct tracing_map_elt *elt) 4956 4727 { 4957 4728 struct hist_field *key_field; 4958 - char str[KSYM_SYMBOL_LEN]; 4959 4729 bool multiline = false; 4960 4730 const char *field_name; 4961 4731 unsigned int i; ··· 4975 4747 seq_printf(m, "%s: %llx", field_name, uval); 4976 4748 } else if (key_field->flags & HIST_FIELD_FL_SYM) { 4977 4749 uval = *(u64 *)(key + key_field->offset); 4978 - sprint_symbol_no_offset(str, uval); 4979 - seq_printf(m, "%s: [%llx] %-45s", field_name, 4980 - uval, str); 4750 + seq_printf(m, "%s: [%llx] %-45ps", field_name, 4751 + uval, (void *)(uintptr_t)uval); 4981 4752 } else if (key_field->flags & 
HIST_FIELD_FL_SYM_OFFSET) { 4982 4753 uval = *(u64 *)(key + key_field->offset); 4983 - sprint_symbol(str, uval); 4984 - seq_printf(m, "%s: [%llx] %-55s", field_name, 4985 - uval, str); 4754 + seq_printf(m, "%s: [%llx] %-55pS", field_name, 4755 + uval, (void *)(uintptr_t)uval); 4986 4756 } else if (key_field->flags & HIST_FIELD_FL_EXECNAME) { 4987 4757 struct hist_elt_data *elt_data = elt->private_data; 4988 4758 char *comm; ··· 5176 4950 5177 4951 if (flags & HIST_FIELD_FL_ALIAS) 5178 4952 seq_puts(m, " HIST_FIELD_FL_ALIAS\n"); 4953 + else if (flags & HIST_FIELD_FL_CONST) 4954 + seq_puts(m, " HIST_FIELD_FL_CONST\n"); 5179 4955 } 5180 4956 5181 4957 static int hist_field_debug_show(struct seq_file *m, ··· 5198 4970 seq_printf(m, " var.idx (into tracing_map_elt.vars[]): %u\n", 5199 4971 field->var.idx); 5200 4972 } 4973 + 4974 + if (field->flags & HIST_FIELD_FL_CONST) 4975 + seq_printf(m, " constant: %llu\n", field->constant); 5201 4976 5202 4977 if (field->flags & HIST_FIELD_FL_ALIAS) 5203 4978 seq_printf(m, " var_ref_idx (into hist_data->var_refs[]): %u\n", ··· 5444 5213 5445 5214 if (hist_field->flags & HIST_FIELD_FL_CPU) 5446 5215 seq_puts(m, "common_cpu"); 5216 + else if (hist_field->flags & HIST_FIELD_FL_CONST) 5217 + seq_printf(m, "%llu", hist_field->constant); 5447 5218 else if (field_name) { 5448 5219 if (hist_field->flags & HIST_FIELD_FL_VAR_REF || 5449 5220 hist_field->flags & HIST_FIELD_FL_ALIAS) ··· 6028 5795 struct synth_event *se; 6029 5796 const char *se_name; 6030 5797 bool remove = false; 6031 - char *trigger, *p; 5798 + char *trigger, *p, *start; 6032 5799 int ret = 0; 6033 5800 6034 5801 lockdep_assert_held(&event_mutex); ··· 6074 5841 *(p - 1) = '\0'; 6075 5842 param = strstrip(p); 6076 5843 trigger = strstrip(trigger); 5844 + } 5845 + 5846 + /* 5847 + * To simplify arithmetic expression parsing, replace occurrences of 5848 + * '.sym-offset' modifier with '.symXoffset' 5849 + */ 5850 + start = strstr(trigger, ".sym-offset"); 5851 + while (start) 
{ 5852 + *(start + 4) = 'X'; 5853 + start = strstr(start + 11, ".sym-offset"); 6077 5854 } 6078 5855 6079 5856 attrs = parse_hist_trigger_attrs(file->tr, trigger);
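The trace_events_hist.c changes above add `*` and `/` to histogram expressions by splitting the string at the *rightmost* lowest-precedence operator, so recursion evaluates left to right (subtraction and division are not associative), folding all-constant sub-expressions, and turning division by a power of two into a shift. The kernel code also caps sub-expression depth and rewrites `.sym-offset` to `.symXoffset` so the `-` scan is not confused. A minimal userspace sketch of just the split-and-evaluate strategy (names and the decimal-constants-only atom parser are assumptions, not kernel code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: split at the RIGHTMOST '+'/'-' first (lowest
 * precedence), then the rightmost '*'-'/'; recursing on the halves
 * evaluates the expression left to right. Only binary operators over
 * unsigned decimal constants are handled here. */
static uint64_t eval(const char *s, size_t len)
{
	long split = -1;
	char op = 0;

	/* Position 0 cannot be a binary operator. */
	for (long i = (long)len - 1; i > 0; i--)
		if (s[i] == '+' || s[i] == '-') { split = i; op = s[i]; break; }
	if (split < 0)
		for (long i = (long)len - 1; i > 0; i--)
			if (s[i] == '*' || s[i] == '/') { split = i; op = s[i]; break; }

	if (split < 0) {			/* atom: decimal constant */
		uint64_t v = 0;
		for (size_t i = 0; i < len; i++)
			v = v * 10 + (uint64_t)(s[i] - '0');
		return v;
	}

	uint64_t l = eval(s, (size_t)split);
	uint64_t r = eval(s + split + 1, len - (size_t)split - 1);

	switch (op) {
	case '+': return l + r;
	case '-': return l - r;
	case '*': return l * r;
	case '/':
		if (!r)
			return (uint64_t)-1;	/* undefined case, as in the patch */
		if (!(r & (r - 1)))		/* power of two: shift instead */
			return l >> __builtin_ctzll(r);
		return l / r;
	}
	return 0;
}

static uint64_t eval_str(const char *s) { return eval(s, strlen(s)); }
```

With this split rule, `64/8/4/2` evaluates as `((64/8)/4)/2` and `14-7-5-2` as `((14-7)-5)-2`, matching the comments in the hunk above.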
+2 -2
kernel/trace/trace_events_synth.c
··· 2227 2227 if (err) 2228 2228 goto err; 2229 2229 2230 - entry = tracefs_create_file("synthetic_events", 0644, NULL, 2231 - NULL, &synth_events_fops); 2230 + entry = tracefs_create_file("synthetic_events", TRACE_MODE_WRITE, 2231 + NULL, NULL, &synth_events_fops); 2232 2232 if (!entry) { 2233 2233 err = -ENODEV; 2234 2234 goto err;
-5
kernel/trace/trace_functions.c
··· 186 186 return; 187 187 188 188 trace_ctx = tracing_gen_ctx(); 189 - preempt_disable_notrace(); 190 189 191 190 cpu = smp_processor_id(); 192 191 data = per_cpu_ptr(tr->array_buffer.data, cpu); ··· 193 194 trace_function(tr, ip, parent_ip, trace_ctx); 194 195 195 196 ftrace_test_recursion_unlock(bit); 196 - preempt_enable_notrace(); 197 197 } 198 198 199 199 #ifdef CONFIG_UNWINDER_ORC ··· 296 298 if (bit < 0) 297 299 return; 298 300 299 - preempt_disable_notrace(); 300 - 301 301 cpu = smp_processor_id(); 302 302 data = per_cpu_ptr(tr->array_buffer.data, cpu); 303 303 if (atomic_read(&data->disabled)) ··· 320 324 321 325 out: 322 326 ftrace_test_recursion_unlock(bit); 323 - preempt_enable_notrace(); 324 327 } 325 328 326 329 static void
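The trace_functions.c hunk deletes the explicit `preempt_disable_notrace()`/`preempt_enable_notrace()` pairs because, after this series, `ftrace_test_recursion_lock()` disables preemption itself (every caller did it anyway, per the merge summary). A toy single-threaded model of that contract, with hypothetical names and a counter standing in for the real preempt count:

```c
#include <assert.h>

/* Hypothetical model: taking the recursion lock now implies
 * preempt_disable, and releasing it implies preempt_enable, so
 * callers no longer wrap the lock in their own preemption guards. */
static int preempt_count;
static unsigned long recursion_bits;

static int recursion_try_lock(int bit)
{
	if (recursion_bits & (1UL << bit))
		return -1;		/* already inside: recursion detected */
	recursion_bits |= 1UL << bit;
	preempt_count++;		/* lock implies preempt_disable */
	return bit;
}

static void recursion_unlock(int bit)
{
	recursion_bits &= ~(1UL << bit);
	preempt_count--;		/* unlock implies preempt_enable */
}
```

The callback body then runs between try-lock and unlock with preemption off, exactly as it did before the hunk removed the redundant guards.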
+2 -2
kernel/trace/trace_functions_graph.c
··· 120 120 if (!ftrace_graph_skip_irqs || trace_recursion_test(TRACE_IRQ_BIT)) 121 121 return 0; 122 122 123 - return in_irq(); 123 + return in_hardirq(); 124 124 } 125 125 126 126 int trace_graph_entry(struct ftrace_graph_ent *trace) ··· 1340 1340 if (ret) 1341 1341 return 0; 1342 1342 1343 - trace_create_file("max_graph_depth", 0644, NULL, 1343 + trace_create_file("max_graph_depth", TRACE_MODE_WRITE, NULL, 1344 1344 NULL, &graph_depth_fops); 1345 1345 1346 1346 return 0;
+5 -5
kernel/trace/trace_hwlat.c
··· 79 79 int nmi_cpu; 80 80 }; 81 81 82 - struct hwlat_kthread_data hwlat_single_cpu_data; 83 - DEFINE_PER_CPU(struct hwlat_kthread_data, hwlat_per_cpu_data); 82 + static struct hwlat_kthread_data hwlat_single_cpu_data; 83 + static DEFINE_PER_CPU(struct hwlat_kthread_data, hwlat_per_cpu_data); 84 84 85 85 /* Tells NMIs to call back to the hwlat tracer to record timestamps */ 86 86 bool trace_hwlat_callback_enabled; ··· 782 782 if (!top_dir) 783 783 return -ENOMEM; 784 784 785 - hwlat_sample_window = tracefs_create_file("window", 0640, 785 + hwlat_sample_window = tracefs_create_file("window", TRACE_MODE_WRITE, 786 786 top_dir, 787 787 &hwlat_window, 788 788 &trace_min_max_fops); 789 789 if (!hwlat_sample_window) 790 790 goto err; 791 791 792 - hwlat_sample_width = tracefs_create_file("width", 0644, 792 + hwlat_sample_width = tracefs_create_file("width", TRACE_MODE_WRITE, 793 793 top_dir, 794 794 &hwlat_width, 795 795 &trace_min_max_fops); 796 796 if (!hwlat_sample_width) 797 797 goto err; 798 798 799 - hwlat_thread_mode = trace_create_file("mode", 0644, 799 + hwlat_thread_mode = trace_create_file("mode", TRACE_MODE_WRITE, 800 800 top_dir, 801 801 NULL, 802 802 &thread_mode_fops);
+5 -5
kernel/trace/trace_kprobe.c
··· 97 97 98 98 static nokprobe_inline bool trace_kprobe_has_gone(struct trace_kprobe *tk) 99 99 { 100 - return !!(kprobe_gone(&tk->rp.kp)); 100 + return kprobe_gone(&tk->rp.kp); 101 101 } 102 102 103 103 static nokprobe_inline bool trace_kprobe_within_module(struct trace_kprobe *tk, ··· 1925 1925 if (ret) 1926 1926 return 0; 1927 1927 1928 - entry = tracefs_create_file("kprobe_events", 0644, NULL, 1929 - NULL, &kprobe_events_ops); 1928 + entry = tracefs_create_file("kprobe_events", TRACE_MODE_WRITE, 1929 + NULL, NULL, &kprobe_events_ops); 1930 1930 1931 1931 /* Event list interface */ 1932 1932 if (!entry) 1933 1933 pr_warn("Could not create tracefs 'kprobe_events' entry\n"); 1934 1934 1935 1935 /* Profile interface */ 1936 - entry = tracefs_create_file("kprobe_profile", 0444, NULL, 1937 - NULL, &kprobe_profile_ops); 1936 + entry = tracefs_create_file("kprobe_profile", TRACE_MODE_READ, 1937 + NULL, NULL, &kprobe_profile_ops); 1938 1938 1939 1939 if (!entry) 1940 1940 pr_warn("Could not create tracefs 'kprobe_profile' entry\n");
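This file, like the hwlat, stack, stat, and uprobe hunks, replaces literal octal modes (`0644`, `0444`) with `TRACE_MODE_WRITE`/`TRACE_MODE_READ`, so the "deny others, allow owner and group" policy from the merge summary lives in one place. The constant values below mirror that intent but are an assumption here, not copied from the kernel headers:

```c
#include <assert.h>

/* Assumed values: classic modes with the "other" permission bits
 * stripped, matching the policy described in the merge summary. */
#define TRACE_MODE_READ  0440
#define TRACE_MODE_WRITE 0640

/* Dropping the "other" bits from the old literal modes yields the
 * named constants. */
static int deny_other(int mode)
{
	return mode & ~0007;
}
```

Centralizing the mode in a named constant means a future policy change (e.g. group-writable tracefs) touches one definition instead of every `tracefs_create_file()` call site.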
+20 -19
kernel/trace/trace_osnoise.c
··· 294 294 seq_puts(s, "# _-----=> irqs-off\n"); 295 295 seq_puts(s, "# / _----=> need-resched\n"); 296 296 seq_puts(s, "# | / _---=> hardirq/softirq\n"); 297 - seq_puts(s, "# || / _--=> preempt-depth "); 298 - seq_puts(s, " MAX\n"); 299 - 300 - seq_puts(s, "# || / "); 297 + seq_puts(s, "# || / _--=> preempt-depth\n"); 298 + seq_puts(s, "# ||| / _-=> migrate-disable "); 299 + seq_puts(s, " MAX\n"); 300 + seq_puts(s, "# |||| / delay "); 301 301 seq_puts(s, " SINGLE Interference counters:\n"); 302 302 303 - seq_puts(s, "# |||| RUNTIME "); 303 + seq_puts(s, "# ||||| RUNTIME "); 304 304 seq_puts(s, " NOISE %% OF CPU NOISE +-----------------------------+\n"); 305 305 306 - seq_puts(s, "# TASK-PID CPU# |||| TIMESTAMP IN US "); 306 + seq_puts(s, "# TASK-PID CPU# ||||| TIMESTAMP IN US "); 307 307 seq_puts(s, " IN US AVAILABLE IN US HW NMI IRQ SIRQ THREAD\n"); 308 308 309 - seq_puts(s, "# | | | |||| | | "); 309 + seq_puts(s, "# | | | ||||| | | "); 310 310 seq_puts(s, " | | | | | | | |\n"); 311 311 } 312 312 #endif /* CONFIG_PREEMPT_RT */ ··· 378 378 seq_puts(s, "# / _----=> need-resched\n"); 379 379 seq_puts(s, "# | / _---=> hardirq/softirq\n"); 380 380 seq_puts(s, "# || / _--=> preempt-depth\n"); 381 - seq_puts(s, "# || /\n"); 382 - seq_puts(s, "# |||| ACTIVATION\n"); 383 - seq_puts(s, "# TASK-PID CPU# |||| TIMESTAMP ID "); 384 - seq_puts(s, " CONTEXT LATENCY\n"); 385 - seq_puts(s, "# | | | |||| | | "); 381 + seq_puts(s, "# ||| / _-=> migrate-disable\n"); 382 + seq_puts(s, "# |||| / delay\n"); 383 + seq_puts(s, "# ||||| ACTIVATION\n"); 384 + seq_puts(s, "# TASK-PID CPU# ||||| TIMESTAMP ID "); 385 + seq_puts(s, " CONTEXT LATENCY\n"); 386 + seq_puts(s, "# | | | ||||| | | "); 386 387 seq_puts(s, " | |\n"); 387 388 } 388 389 #endif /* CONFIG_PREEMPT_RT */ ··· 1857 1856 if (!top_dir) 1858 1857 return 0; 1859 1858 1860 - tmp = tracefs_create_file("period_us", 0640, top_dir, 1859 + tmp = tracefs_create_file("period_us", TRACE_MODE_WRITE, top_dir, 1861 1860 &osnoise_period, 
&trace_min_max_fops); 1862 1861 if (!tmp) 1863 1862 goto err; 1864 1863 1865 - tmp = tracefs_create_file("runtime_us", 0644, top_dir, 1864 + tmp = tracefs_create_file("runtime_us", TRACE_MODE_WRITE, top_dir, 1866 1865 &osnoise_runtime, &trace_min_max_fops); 1867 1866 if (!tmp) 1868 1867 goto err; 1869 1868 1870 - tmp = tracefs_create_file("stop_tracing_us", 0640, top_dir, 1869 + tmp = tracefs_create_file("stop_tracing_us", TRACE_MODE_WRITE, top_dir, 1871 1870 &osnoise_stop_tracing_in, &trace_min_max_fops); 1872 1871 if (!tmp) 1873 1872 goto err; 1874 1873 1875 - tmp = tracefs_create_file("stop_tracing_total_us", 0640, top_dir, 1874 + tmp = tracefs_create_file("stop_tracing_total_us", TRACE_MODE_WRITE, top_dir, 1876 1875 &osnoise_stop_tracing_total, &trace_min_max_fops); 1877 1876 if (!tmp) 1878 1877 goto err; 1879 1878 1880 - tmp = trace_create_file("cpus", 0644, top_dir, NULL, &cpus_fops); 1879 + tmp = trace_create_file("cpus", TRACE_MODE_WRITE, top_dir, NULL, &cpus_fops); 1881 1880 if (!tmp) 1882 1881 goto err; 1883 1882 #ifdef CONFIG_TIMERLAT_TRACER 1884 1883 #ifdef CONFIG_STACKTRACE 1885 - tmp = tracefs_create_file("print_stack", 0640, top_dir, 1884 + tmp = tracefs_create_file("print_stack", TRACE_MODE_WRITE, top_dir, 1886 1885 &osnoise_print_stack, &trace_min_max_fops); 1887 1886 if (!tmp) 1888 1887 goto err; 1889 1888 #endif 1890 1889 1891 - tmp = tracefs_create_file("timerlat_period_us", 0640, top_dir, 1890 + tmp = tracefs_create_file("timerlat_period_us", TRACE_MODE_WRITE, top_dir, 1892 1891 &timerlat_period, &trace_min_max_fops); 1893 1892 if (!tmp) 1894 1893 goto err;
+4 -13
kernel/trace/trace_output.c
··· 8 8 #include <linux/module.h> 9 9 #include <linux/mutex.h> 10 10 #include <linux/ftrace.h> 11 + #include <linux/kprobes.h> 11 12 #include <linux/sched/clock.h> 12 13 #include <linux/sched/mm.h> 13 14 ··· 347 346 } 348 347 EXPORT_SYMBOL_GPL(trace_output_call); 349 348 350 - #ifdef CONFIG_KRETPROBES 351 - static inline const char *kretprobed(const char *name) 349 + static inline const char *kretprobed(const char *name, unsigned long addr) 352 350 { 353 - static const char tramp_name[] = "kretprobe_trampoline"; 354 - int size = sizeof(tramp_name); 355 - 356 - if (strncmp(tramp_name, name, size) == 0) 351 + if (is_kretprobe_trampoline(addr)) 357 352 return "[unknown/kretprobe'd]"; 358 353 return name; 359 354 } 360 - #else 361 - static inline const char *kretprobed(const char *name) 362 - { 363 - return name; 364 - } 365 - #endif /* CONFIG_KRETPROBES */ 366 355 367 356 void 368 357 trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset) ··· 365 374 sprint_symbol(str, address); 366 375 else 367 376 kallsyms_lookup(address, NULL, NULL, NULL, str); 368 - name = kretprobed(str); 377 + name = kretprobed(str, address); 369 378 370 379 if (name && strlen(name)) { 371 380 trace_seq_puts(s, name);
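The trace_output.c hunk stops recognizing kretprobe'd frames by string-comparing the resolved symbol name and instead asks `is_kretprobe_trampoline(addr)` whether the address falls in the trampoline, which works even when the name lookup is unavailable or ambiguous and drops the `CONFIG_KRETPROBES` ifdef. A userspace sketch of the address-range idea (the range bounds and helper name here are made up for illustration):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trampoline bounds standing in for the real symbol range. */
static unsigned long tramp_start = 0x1000, tramp_end = 0x1040;

/* Model of is_kretprobe_trampoline(): a range check on the address,
 * rather than a strncmp() against the resolved symbol name. */
static int is_kretprobe_trampoline_model(unsigned long addr)
{
	return addr >= tramp_start && addr < tramp_end;
}

static const char *kretprobed(const char *name, unsigned long addr)
{
	if (is_kretprobe_trampoline_model(addr))
		return "[unknown/kretprobe'd]";
	return name;
}
```

Comparing by address also ties into the merge's kretprobe stack-unwinder rework: the unwinder deals in return addresses, not names, so both consumers can share one predicate.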
+1 -1
kernel/trace/trace_printk.c
··· 384 384 if (ret) 385 385 return 0; 386 386 387 - trace_create_file("printk_formats", 0444, NULL, 387 + trace_create_file("printk_formats", TRACE_MODE_READ, NULL, 388 388 NULL, &ftrace_formats_fops); 389 389 390 390 return 0;
+2 -2
kernel/trace/trace_recursion_record.c
··· 226 226 { 227 227 struct dentry *dentry; 228 228 229 - dentry = trace_create_file("recursed_functions", 0644, NULL, NULL, 230 - &recursed_functions_fops); 229 + dentry = trace_create_file("recursed_functions", TRACE_MODE_WRITE, 230 + NULL, NULL, &recursed_functions_fops); 231 231 if (!dentry) 232 232 pr_warn("WARNING: Failed to create recursed_functions\n"); 233 233 return 0;
+91 -1
kernel/trace/trace_selftest.c
··· 287 287 if (trace_selftest_test_probe3_cnt != 4) 288 288 goto out_free; 289 289 290 + /* Remove trace function from probe 3 */ 291 + func1_name = "!" __stringify(DYN_FTRACE_TEST_NAME); 292 + len1 = strlen(func1_name); 293 + 294 + ftrace_set_filter(&test_probe3, func1_name, len1, 0); 295 + 296 + DYN_FTRACE_TEST_NAME(); 297 + 298 + print_counts(); 299 + 300 + if (trace_selftest_test_probe1_cnt != 3) 301 + goto out_free; 302 + if (trace_selftest_test_probe2_cnt != 2) 303 + goto out_free; 304 + if (trace_selftest_test_probe3_cnt != 4) 305 + goto out_free; 306 + if (cnt > 1) { 307 + if (trace_selftest_test_global_cnt == 0) 308 + goto out_free; 309 + } 310 + if (trace_selftest_test_dyn_cnt == 0) 311 + goto out_free; 312 + 313 + DYN_FTRACE_TEST_NAME2(); 314 + 315 + print_counts(); 316 + 317 + if (trace_selftest_test_probe1_cnt != 3) 318 + goto out_free; 319 + if (trace_selftest_test_probe2_cnt != 3) 320 + goto out_free; 321 + if (trace_selftest_test_probe3_cnt != 5) 322 + goto out_free; 323 + 290 324 ret = 0; 291 325 out_free: 292 326 unregister_ftrace_function(dyn_ops); ··· 784 750 .retfunc = &trace_graph_return, 785 751 }; 786 752 753 + #if defined(CONFIG_DYNAMIC_FTRACE) && \ 754 + defined(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS) 755 + #define TEST_DIRECT_TRAMP 756 + noinline __noclone static void trace_direct_tramp(void) { } 757 + #endif 758 + 787 759 /* 788 760 * Pretty much the same than for the function tracer from which the selftest 789 761 * has been borrowed. 
··· 800 760 { 801 761 int ret; 802 762 unsigned long count; 763 + char *func_name __maybe_unused; 803 764 804 765 #ifdef CONFIG_DYNAMIC_FTRACE 805 766 if (ftrace_filter_param) { ··· 849 808 goto out; 850 809 } 851 810 852 - /* Don't test dynamic tracing, the function tracer already did */ 811 + #ifdef TEST_DIRECT_TRAMP 812 + tracing_reset_online_cpus(&tr->array_buffer); 813 + set_graph_array(tr); 853 814 815 + /* 816 + * Some archs *cough*PowerPC*cough* add characters to the 817 + * start of the function names. We simply put a '*' to 818 + * accommodate them. 819 + */ 820 + func_name = "*" __stringify(DYN_FTRACE_TEST_NAME); 821 + ftrace_set_global_filter(func_name, strlen(func_name), 1); 822 + 823 + /* 824 + * Register direct function together with graph tracer 825 + * and make sure we get graph trace. 826 + */ 827 + ret = register_ftrace_direct((unsigned long) DYN_FTRACE_TEST_NAME, 828 + (unsigned long) trace_direct_tramp); 829 + if (ret) 830 + goto out; 831 + 832 + ret = register_ftrace_graph(&fgraph_ops); 833 + if (ret) { 834 + warn_failed_init_tracer(trace, ret); 835 + goto out; 836 + } 837 + 838 + DYN_FTRACE_TEST_NAME(); 839 + 840 + count = 0; 841 + 842 + tracing_stop(); 843 + /* check the trace buffer */ 844 + ret = trace_test_buffer(&tr->array_buffer, &count); 845 + 846 + unregister_ftrace_graph(&fgraph_ops); 847 + 848 + ret = unregister_ftrace_direct((unsigned long) DYN_FTRACE_TEST_NAME, 849 + (unsigned long) trace_direct_tramp); 850 + if (ret) 851 + goto out; 852 + 853 + tracing_start(); 854 + 855 + if (!ret && !count) { 856 + ret = -1; 857 + goto out; 858 + } 859 + #endif 860 + 861 + /* Don't test dynamic tracing, the function tracer already did */ 854 862 out: 855 863 /* Stop it if we failed */ 856 864 if (ret)
+3 -3
kernel/trace/trace_stack.c
··· 559 559 if (ret) 560 560 return 0; 561 561 562 - trace_create_file("stack_max_size", 0644, NULL, 562 + trace_create_file("stack_max_size", TRACE_MODE_WRITE, NULL, 563 563 &stack_trace_max_size, &stack_max_size_fops); 564 564 565 - trace_create_file("stack_trace", 0444, NULL, 565 + trace_create_file("stack_trace", TRACE_MODE_READ, NULL, 566 566 NULL, &stack_trace_fops); 567 567 568 568 #ifdef CONFIG_DYNAMIC_FTRACE 569 - trace_create_file("stack_trace_filter", 0644, NULL, 569 + trace_create_file("stack_trace_filter", TRACE_MODE_WRITE, NULL, 570 570 &trace_ops, &stack_trace_filter_fops); 571 571 #endif 572 572
+3 -3
kernel/trace/trace_stat.c
··· 297 297 if (!stat_dir && (ret = tracing_stat_init())) 298 298 return ret; 299 299 300 - session->file = tracefs_create_file(session->ts->name, 0644, 301 - stat_dir, 302 - session, &tracing_stat_fops); 300 + session->file = tracefs_create_file(session->ts->name, TRACE_MODE_WRITE, 301 + stat_dir, session, 302 + &tracing_stat_fops); 303 303 if (!session->file) 304 304 return -ENOMEM; 305 305 return 0;
+2 -2
kernel/trace/trace_uprobe.c
··· 1655 1655 if (ret) 1656 1656 return 0; 1657 1657 1658 - trace_create_file("uprobe_events", 0644, NULL, 1658 + trace_create_file("uprobe_events", TRACE_MODE_WRITE, NULL, 1659 1659 NULL, &uprobe_events_ops); 1660 1660 /* Profile interface */ 1661 - trace_create_file("uprobe_profile", 0444, NULL, 1661 + trace_create_file("uprobe_profile", TRACE_MODE_READ, NULL, 1662 1662 NULL, &uprobe_profile_ops); 1663 1663 return 0; 1664 1664 }
+23 -17
kernel/trace/tracing_map.c
··· 834 834 return err; 835 835 } 836 836 837 - static int cmp_entries_dup(const struct tracing_map_sort_entry **a, 838 - const struct tracing_map_sort_entry **b) 837 + static int cmp_entries_dup(const void *A, const void *B) 839 838 { 839 + const struct tracing_map_sort_entry *a, *b; 840 840 int ret = 0; 841 841 842 - if (memcmp((*a)->key, (*b)->key, (*a)->elt->map->key_size)) 842 + a = *(const struct tracing_map_sort_entry **)A; 843 + b = *(const struct tracing_map_sort_entry **)B; 844 + 845 + if (memcmp(a->key, b->key, a->elt->map->key_size)) 843 846 ret = 1; 844 847 845 848 return ret; 846 849 } 847 850 848 - static int cmp_entries_sum(const struct tracing_map_sort_entry **a, 849 - const struct tracing_map_sort_entry **b) 851 + static int cmp_entries_sum(const void *A, const void *B) 850 852 { 851 853 const struct tracing_map_elt *elt_a, *elt_b; 854 + const struct tracing_map_sort_entry *a, *b; 852 855 struct tracing_map_sort_key *sort_key; 853 856 struct tracing_map_field *field; 854 857 tracing_map_cmp_fn_t cmp_fn; 855 858 void *val_a, *val_b; 856 859 int ret = 0; 857 860 858 - elt_a = (*a)->elt; 859 - elt_b = (*b)->elt; 861 + a = *(const struct tracing_map_sort_entry **)A; 862 + b = *(const struct tracing_map_sort_entry **)B; 863 + 864 + elt_a = a->elt; 865 + elt_b = b->elt; 860 866 861 867 sort_key = &elt_a->map->sort_key; 862 868 ··· 879 873 return ret; 880 874 } 881 875 882 - static int cmp_entries_key(const struct tracing_map_sort_entry **a, 883 - const struct tracing_map_sort_entry **b) 876 + static int cmp_entries_key(const void *A, const void *B) 884 877 { 885 878 const struct tracing_map_elt *elt_a, *elt_b; 879 + const struct tracing_map_sort_entry *a, *b; 886 880 struct tracing_map_sort_key *sort_key; 887 881 struct tracing_map_field *field; 888 882 tracing_map_cmp_fn_t cmp_fn; 889 883 void *val_a, *val_b; 890 884 int ret = 0; 891 885 892 - elt_a = (*a)->elt; 893 - elt_b = (*b)->elt; 886 + a = *(const struct tracing_map_sort_entry **)A; 887 + b = *(const struct tracing_map_sort_entry **)B; 888 +
889 + elt_a = a->elt; 890 + elt_b = b->elt; 894 891 895 892 sort_key = &elt_a->map->sort_key; 896 893 ··· 998 989 struct tracing_map_sort_key *primary_key, 999 990 struct tracing_map_sort_key *secondary_key) 1000 991 { 1001 - int (*primary_fn)(const struct tracing_map_sort_entry **, 1002 - const struct tracing_map_sort_entry **); 1003 - int (*secondary_fn)(const struct tracing_map_sort_entry **, 1004 - const struct tracing_map_sort_entry **); 992 + int (*primary_fn)(const void *, const void *); 993 + int (*secondary_fn)(const void *, const void *); 1005 994 unsigned i, start = 0, n_sub = 1; 1006 995 1007 996 if (is_key(map, primary_key->field_idx)) ··· 1068 1061 unsigned int n_sort_keys, 1069 1062 struct tracing_map_sort_entry ***sort_entries) 1070 1063 { 1071 - int (*cmp_entries_fn)(const struct tracing_map_sort_entry **, 1072 - const struct tracing_map_sort_entry **); 1064 + int (*cmp_entries_fn)(const void *, const void *); 1073 1065 struct tracing_map_sort_entry *sort_entry, **entries; 1074 1066 int i, n_entries, ret; 1075 1067
+2 -1
lib/Kconfig.debug
··· 2080 2080 If unsure, say N. 2081 2081 2082 2082 config KPROBES_SANITY_TEST 2083 - bool "Kprobes sanity tests" 2083 + tristate "Kprobes sanity tests" 2084 2084 depends on DEBUG_KERNEL 2085 2085 depends on KPROBES 2086 + depends on KUNIT 2086 2087 help 2087 2088 This option provides for testing basic kprobes functionality on 2088 2089 boot. Samples of kprobe and kretprobe are inserted and
+1
lib/Makefile
··· 100 100 obj-$(CONFIG_TEST_LOCKUP) += test_lockup.o 101 101 obj-$(CONFIG_TEST_HMM) += test_hmm.o 102 102 obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o 103 + obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o 103 104 104 105 # 105 106 # CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
+143 -92
lib/bootconfig.c
··· 4 4 * Masami Hiramatsu <mhiramat@kernel.org> 5 5 */ 6 6 7 - #define pr_fmt(fmt) "bootconfig: " fmt 8 - 7 + #ifdef __KERNEL__ 9 8 #include <linux/bootconfig.h> 10 9 #include <linux/bug.h> 11 10 #include <linux/ctype.h> 12 11 #include <linux/errno.h> 13 12 #include <linux/kernel.h> 14 13 #include <linux/memblock.h> 15 - #include <linux/printk.h> 16 14 #include <linux/string.h> 15 + #else /* !__KERNEL__ */ 16 + /* 17 + * NOTE: This is only for tools/bootconfig, because tools/bootconfig will 18 + * run the parser sanity test. 19 + * This does NOT mean lib/bootconfig.c is available in the user space. 20 + * However, if you change this file, please make sure the tools/bootconfig 21 + * has no issue on building and running. 22 + */ 23 + #include <linux/bootconfig.h> 24 + #endif 17 25 18 26 /* 19 27 * Extra Boot Config (XBC) is given as tree-structured ascii text of ··· 41 33 static int xbc_err_pos __initdata; 42 34 static int open_brace[XBC_DEPTH_MAX] __initdata; 43 35 static int brace_index __initdata; 36 + 37 + #ifdef __KERNEL__ 38 + static inline void * __init xbc_alloc_mem(size_t size) 39 + { 40 + return memblock_alloc(size, SMP_CACHE_BYTES); 41 + } 42 + 43 + static inline void __init xbc_free_mem(void *addr, size_t size) 44 + { 45 + memblock_free_ptr(addr, size); 46 + } 47 + 48 + #else /* !__KERNEL__ */ 49 + 50 + static inline void *xbc_alloc_mem(size_t size) 51 + { 52 + return malloc(size); 53 + } 54 + 55 + static inline void xbc_free_mem(void *addr, size_t size) 56 + { 57 + free(addr); 58 + } 59 + #endif 60 + /** 61 + * xbc_get_info() - Get the information of loaded boot config 62 + * @node_size: A pointer to store the number of nodes. 63 + * @data_size: A pointer to store the size of bootconfig data. 64 + * 65 + * Get the number of used nodes in @node_size if it is not NULL, 66 + * and the size of bootconfig data in @data_size if it is not NULL. 67 + * Return 0 if the boot config is initialized, or return -ENODEV. 
68 + */ 69 + int __init xbc_get_info(int *node_size, size_t *data_size) 70 + { 71 + if (!xbc_data) 72 + return -ENODEV; 73 + 74 + if (node_size) 75 + *node_size = xbc_node_num; 76 + if (data_size) 77 + *data_size = xbc_data_size; 78 + return 0; 79 + } 44 80 45 81 static int __init xbc_parse_error(const char *msg, const char *p) 46 82 { ··· 278 226 struct xbc_node *node, 279 227 char *buf, size_t size) 280 228 { 281 - u16 keys[XBC_DEPTH_MAX]; 229 + uint16_t keys[XBC_DEPTH_MAX]; 282 230 int depth = 0, ret = 0, total = 0; 283 231 284 232 if (!node || node == root) ··· 393 341 394 342 /* XBC parse and tree build */ 395 343 396 - static int __init xbc_init_node(struct xbc_node *node, char *data, u32 flag) 344 + static int __init xbc_init_node(struct xbc_node *node, char *data, uint32_t flag) 397 345 { 398 346 unsigned long offset = data - xbc_data; 399 347 400 348 if (WARN_ON(offset >= XBC_DATA_MAX)) 401 349 return -EINVAL; 402 350 403 - node->data = (u16)offset | flag; 351 + node->data = (uint16_t)offset | flag; 404 352 node->child = 0; 405 353 node->next = 0; 406 354 407 355 return 0; 408 356 } 409 357 410 - static struct xbc_node * __init xbc_add_node(char *data, u32 flag) 358 + static struct xbc_node * __init xbc_add_node(char *data, uint32_t flag) 411 359 { 412 360 struct xbc_node *node; 413 361 ··· 437 385 return node; 438 386 } 439 387 440 - static struct xbc_node * __init __xbc_add_sibling(char *data, u32 flag, bool head) 388 + static struct xbc_node * __init __xbc_add_sibling(char *data, uint32_t flag, bool head) 441 389 { 442 390 struct xbc_node *sib, *node = xbc_add_node(data, flag); 443 391 ··· 464 412 return node; 465 413 } 466 414 467 - static inline struct xbc_node * __init xbc_add_sibling(char *data, u32 flag) 415 + static inline struct xbc_node * __init xbc_add_sibling(char *data, uint32_t flag) 468 416 { 469 417 return __xbc_add_sibling(data, flag, false); 470 418 } 471 419 472 - static inline struct xbc_node * __init xbc_add_head_sibling(char *data, 
u32 flag) 420 + static inline struct xbc_node * __init xbc_add_head_sibling(char *data, uint32_t flag) 473 421 { 474 422 return __xbc_add_sibling(data, flag, true); 475 423 } 476 424 477 - static inline __init struct xbc_node *xbc_add_child(char *data, u32 flag) 425 + static inline __init struct xbc_node *xbc_add_child(char *data, uint32_t flag) 478 426 { 479 427 struct xbc_node *node = xbc_add_sibling(data, flag); 480 428 ··· 832 780 return 0; 833 781 } 834 782 835 - /** 836 - * xbc_destroy_all() - Clean up all parsed bootconfig 837 - * 838 - * This clears all data structures of parsed bootconfig on memory. 839 - * If you need to reuse xbc_init() with new boot config, you can 840 - * use this. 841 - */ 842 - void __init xbc_destroy_all(void) 843 - { 844 - xbc_data = NULL; 845 - xbc_data_size = 0; 846 - xbc_node_num = 0; 847 - memblock_free_ptr(xbc_nodes, sizeof(struct xbc_node) * XBC_NODE_MAX); 848 - xbc_nodes = NULL; 849 - brace_index = 0; 850 - } 851 - 852 - /** 853 - * xbc_init() - Parse given XBC file and build XBC internal tree 854 - * @buf: boot config text 855 - * @emsg: A pointer of const char * to store the error message 856 - * @epos: A pointer of int to store the error position 857 - * 858 - * This parses the boot config text in @buf. @buf must be a 859 - * null terminated string and smaller than XBC_DATA_MAX. 860 - * Return the number of stored nodes (>0) if succeeded, or -errno 861 - * if there is any error. 862 - * In error cases, @emsg will be updated with an error message and 863 - * @epos will be updated with the error position which is the byte offset 864 - * of @buf. If the error is not a parser error, @epos will be -1. 865 - */ 866 - int __init xbc_init(char *buf, const char **emsg, int *epos) 783 + /* Need to setup xbc_data and xbc_nodes before call this. 
*/ 784 + static int __init xbc_parse_tree(void) 867 785 { 868 786 char *p, *q; 869 - int ret, c; 787 + int ret = 0, c; 870 788 871 - if (epos) 872 - *epos = -1; 873 - 874 - if (xbc_data) { 875 - if (emsg) 876 - *emsg = "Bootconfig is already initialized"; 877 - return -EBUSY; 878 - } 879 - 880 - ret = strlen(buf); 881 - if (ret > XBC_DATA_MAX - 1 || ret == 0) { 882 - if (emsg) 883 - *emsg = ret ? "Config data is too big" : 884 - "Config data is empty"; 885 - return -ERANGE; 886 - } 887 - 888 - xbc_nodes = memblock_alloc(sizeof(struct xbc_node) * XBC_NODE_MAX, 889 - SMP_CACHE_BYTES); 890 - if (!xbc_nodes) { 891 - if (emsg) 892 - *emsg = "Failed to allocate bootconfig nodes"; 893 - return -ENOMEM; 894 - } 895 - memset(xbc_nodes, 0, sizeof(struct xbc_node) * XBC_NODE_MAX); 896 - xbc_data = buf; 897 - xbc_data_size = ret + 1; 898 789 last_parent = NULL; 899 - 900 - p = buf; 790 + p = xbc_data; 901 791 do { 902 792 q = strpbrk(p, "{}=+;:\n#"); 903 793 if (!q) { ··· 881 887 } 882 888 } while (!ret); 883 889 890 + return ret; 891 + } 892 + 893 + /** 894 + * xbc_exit() - Clean up all parsed bootconfig 895 + * 896 + * This clears all data structures of parsed bootconfig on memory. 897 + * If you need to reuse xbc_init() with new boot config, you can 898 + * use this. 899 + */ 900 + void __init xbc_exit(void) 901 + { 902 + xbc_free_mem(xbc_data, xbc_data_size); 903 + xbc_data = NULL; 904 + xbc_data_size = 0; 905 + xbc_node_num = 0; 906 + xbc_free_mem(xbc_nodes, sizeof(struct xbc_node) * XBC_NODE_MAX); 907 + xbc_nodes = NULL; 908 + brace_index = 0; 909 + } 910 + 911 + /** 912 + * xbc_init() - Parse given XBC file and build XBC internal tree 913 + * @data: The boot config text original data 914 + * @size: The size of @data 915 + * @emsg: A pointer of const char * to store the error message 916 + * @epos: A pointer of int to store the error position 917 + * 918 + * This parses the boot config text in @data. @size must be smaller 919 + * than XBC_DATA_MAX. 
920 + * Return the number of stored nodes (>0) if succeeded, or -errno 921 + * if there is any error. 922 + * In error cases, @emsg will be updated with an error message and 923 + * @epos will be updated with the error position which is the byte offset 924 + * of @buf. If the error is not a parser error, @epos will be -1. 925 + */ 926 + int __init xbc_init(const char *data, size_t size, const char **emsg, int *epos) 927 + { 928 + int ret; 929 + 930 + if (epos) 931 + *epos = -1; 932 + 933 + if (xbc_data) { 934 + if (emsg) 935 + *emsg = "Bootconfig is already initialized"; 936 + return -EBUSY; 937 + } 938 + if (size > XBC_DATA_MAX || size == 0) { 939 + if (emsg) 940 + *emsg = size ? "Config data is too big" : 941 + "Config data is empty"; 942 + return -ERANGE; 943 + } 944 + 945 + xbc_data = xbc_alloc_mem(size + 1); 946 + if (!xbc_data) { 947 + if (emsg) 948 + *emsg = "Failed to allocate bootconfig data"; 949 + return -ENOMEM; 950 + } 951 + memcpy(xbc_data, data, size); 952 + xbc_data[size] = '\0'; 953 + xbc_data_size = size + 1; 954 + 955 + xbc_nodes = xbc_alloc_mem(sizeof(struct xbc_node) * XBC_NODE_MAX); 956 + if (!xbc_nodes) { 957 + if (emsg) 958 + *emsg = "Failed to allocate bootconfig nodes"; 959 + xbc_exit(); 960 + return -ENOMEM; 961 + } 962 + memset(xbc_nodes, 0, sizeof(struct xbc_node) * XBC_NODE_MAX); 963 + 964 + ret = xbc_parse_tree(); 884 965 if (!ret) 885 966 ret = xbc_verify_tree(); 886 967 ··· 964 895 *epos = xbc_err_pos; 965 896 if (emsg) 966 897 *emsg = xbc_err_msg; 967 - xbc_destroy_all(); 898 + xbc_exit(); 968 899 } else 969 900 ret = xbc_node_num; 970 901 971 902 return ret; 972 - } 973 - 974 - /** 975 - * xbc_debug_dump() - Dump current XBC node list 976 - * 977 - * Dump the current XBC node list on printk buffer for debug. 
978 - */ 979 - void __init xbc_debug_dump(void) 980 - { 981 - int i; 982 - 983 - for (i = 0; i < xbc_node_num; i++) { 984 - pr_debug("[%d] %s (%s) .next=%d, .child=%d .parent=%d\n", i, 985 - xbc_node_get_data(xbc_nodes + i), 986 - xbc_node_is_value(xbc_nodes + i) ? "value" : "key", 987 - xbc_nodes[i].next, xbc_nodes[i].child, 988 - xbc_nodes[i].parent); 989 - } 990 903 }
+2 -1
lib/error-inject.c
··· 8 8 #include <linux/mutex.h> 9 9 #include <linux/list.h> 10 10 #include <linux/slab.h> 11 + #include <asm/sections.h> 11 12 12 13 /* Whitelist of symbols that can be overridden for error injection. */ 13 14 static LIST_HEAD(error_injection_list); ··· 65 64 66 65 mutex_lock(&ei_mutex); 67 66 for (iter = start; iter < end; iter++) { 68 - entry = arch_deref_entry_point((void *)iter->addr); 67 + entry = (unsigned long)dereference_symbol_descriptor((void *)iter->addr); 69 68 70 69 if (!kernel_text_address(entry) || 71 70 !kallsyms_lookup_size_offset(entry, &size, &offset)) {
+371
lib/test_kprobes.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * test_kprobes.c - simple sanity test for *probes 4 + * 5 + * Copyright IBM Corp. 2008 6 + */ 7 + 8 + #include <linux/kernel.h> 9 + #include <linux/kprobes.h> 10 + #include <linux/random.h> 11 + #include <kunit/test.h> 12 + 13 + #define div_factor 3 14 + 15 + static u32 rand1, preh_val, posth_val; 16 + static u32 (*target)(u32 value); 17 + static u32 (*target2)(u32 value); 18 + static struct kunit *current_test; 19 + 20 + static unsigned long (*internal_target)(void); 21 + static unsigned long (*stacktrace_target)(void); 22 + static unsigned long (*stacktrace_driver)(void); 23 + static unsigned long target_return_address[2]; 24 + 25 + static noinline u32 kprobe_target(u32 value) 26 + { 27 + return (value / div_factor); 28 + } 29 + 30 + static int kp_pre_handler(struct kprobe *p, struct pt_regs *regs) 31 + { 32 + KUNIT_EXPECT_FALSE(current_test, preemptible()); 33 + preh_val = (rand1 / div_factor); 34 + return 0; 35 + } 36 + 37 + static void kp_post_handler(struct kprobe *p, struct pt_regs *regs, 38 + unsigned long flags) 39 + { 40 + KUNIT_EXPECT_FALSE(current_test, preemptible()); 41 + KUNIT_EXPECT_EQ(current_test, preh_val, (rand1 / div_factor)); 42 + posth_val = preh_val + div_factor; 43 + } 44 + 45 + static struct kprobe kp = { 46 + .symbol_name = "kprobe_target", 47 + .pre_handler = kp_pre_handler, 48 + .post_handler = kp_post_handler 49 + }; 50 + 51 + static void test_kprobe(struct kunit *test) 52 + { 53 + current_test = test; 54 + KUNIT_EXPECT_EQ(test, 0, register_kprobe(&kp)); 55 + target(rand1); 56 + unregister_kprobe(&kp); 57 + KUNIT_EXPECT_NE(test, 0, preh_val); 58 + KUNIT_EXPECT_NE(test, 0, posth_val); 59 + } 60 + 61 + static noinline u32 kprobe_target2(u32 value) 62 + { 63 + return (value / div_factor) + 1; 64 + } 65 + 66 + static noinline unsigned long kprobe_stacktrace_internal_target(void) 67 + { 68 + if (!target_return_address[0]) 69 + target_return_address[0] = (unsigned long)__builtin_return_address(0); 70 + return target_return_address[0]; 71 + } 72 +
73 + static noinline unsigned long kprobe_stacktrace_target(void) 74 + { 75 + if (!target_return_address[1]) 76 + target_return_address[1] = (unsigned long)__builtin_return_address(0); 77 + 78 + if (internal_target) 79 + internal_target(); 80 + 81 + return target_return_address[1]; 82 + } 83 + 84 + static noinline unsigned long kprobe_stacktrace_driver(void) 85 + { 86 + if (stacktrace_target) 87 + stacktrace_target(); 88 + 89 + /* This is for preventing inlining the function */ 90 + return (unsigned long)__builtin_return_address(0); 91 + } 92 + 93 + static int kp_pre_handler2(struct kprobe *p, struct pt_regs *regs) 94 + { 95 + preh_val = (rand1 / div_factor) + 1; 96 + return 0; 97 + } 98 + 99 + static void kp_post_handler2(struct kprobe *p, struct pt_regs *regs, 100 + unsigned long flags) 101 + { 102 + KUNIT_EXPECT_EQ(current_test, preh_val, (rand1 / div_factor) + 1); 103 + posth_val = preh_val + div_factor; 104 + } 105 + 106 + static struct kprobe kp2 = { 107 + .symbol_name = "kprobe_target2", 108 + .pre_handler = kp_pre_handler2, 109 + .post_handler = kp_post_handler2 110 + }; 111 + 112 + static void test_kprobes(struct kunit *test) 113 + { 114 + struct kprobe *kps[2] = {&kp, &kp2}; 115 + 116 + current_test = test; 117 + 118 + /* addr and flags should be cleard for reusing kprobe. 
*/ 119 + kp.addr = NULL; 120 + kp.flags = 0; 121 + 122 + KUNIT_EXPECT_EQ(test, 0, register_kprobes(kps, 2)); 123 + preh_val = 0; 124 + posth_val = 0; 125 + target(rand1); 126 + 127 + KUNIT_EXPECT_NE(test, 0, preh_val); 128 + KUNIT_EXPECT_NE(test, 0, posth_val); 129 + 130 + preh_val = 0; 131 + posth_val = 0; 132 + target2(rand1); 133 + 134 + KUNIT_EXPECT_NE(test, 0, preh_val); 135 + KUNIT_EXPECT_NE(test, 0, posth_val); 136 + unregister_kprobes(kps, 2); 137 + } 138 + 139 + #ifdef CONFIG_KRETPROBES 140 + static u32 krph_val; 141 + 142 + static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs) 143 + { 144 + KUNIT_EXPECT_FALSE(current_test, preemptible()); 145 + krph_val = (rand1 / div_factor); 146 + return 0; 147 + } 148 + 149 + static int return_handler(struct kretprobe_instance *ri, struct pt_regs *regs) 150 + { 151 + unsigned long ret = regs_return_value(regs); 152 + 153 + KUNIT_EXPECT_FALSE(current_test, preemptible()); 154 + KUNIT_EXPECT_EQ(current_test, ret, rand1 / div_factor); 155 + KUNIT_EXPECT_NE(current_test, krph_val, 0); 156 + krph_val = rand1; 157 + return 0; 158 + } 159 + 160 + static struct kretprobe rp = { 161 + .handler = return_handler, 162 + .entry_handler = entry_handler, 163 + .kp.symbol_name = "kprobe_target" 164 + }; 165 + 166 + static void test_kretprobe(struct kunit *test) 167 + { 168 + current_test = test; 169 + KUNIT_EXPECT_EQ(test, 0, register_kretprobe(&rp)); 170 + target(rand1); 171 + unregister_kretprobe(&rp); 172 + KUNIT_EXPECT_EQ(test, krph_val, rand1); 173 + } 174 + 175 + static int return_handler2(struct kretprobe_instance *ri, struct pt_regs *regs) 176 + { 177 + unsigned long ret = regs_return_value(regs); 178 + 179 + KUNIT_EXPECT_EQ(current_test, ret, (rand1 / div_factor) + 1); 180 + KUNIT_EXPECT_NE(current_test, krph_val, 0); 181 + krph_val = rand1; 182 + return 0; 183 + } 184 + 185 + static struct kretprobe rp2 = { 186 + .handler = return_handler2, 187 + .entry_handler = entry_handler, 188 + .kp.symbol_name = "kprobe_target2" 189 + };
190 + 191 + static void test_kretprobes(struct kunit *test) 192 + { 193 + struct kretprobe *rps[2] = {&rp, &rp2}; 194 + 195 + current_test = test; 196 + /* addr and flags should be cleard for reusing kprobe. */ 197 + rp.kp.addr = NULL; 198 + rp.kp.flags = 0; 199 + KUNIT_EXPECT_EQ(test, 0, register_kretprobes(rps, 2)); 200 + 201 + krph_val = 0; 202 + target(rand1); 203 + KUNIT_EXPECT_EQ(test, krph_val, rand1); 204 + 205 + krph_val = 0; 206 + target2(rand1); 207 + KUNIT_EXPECT_EQ(test, krph_val, rand1); 208 + unregister_kretprobes(rps, 2); 209 + } 210 + 211 + #ifdef CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE 212 + #define STACK_BUF_SIZE 16 213 + static unsigned long stack_buf[STACK_BUF_SIZE]; 214 + 215 + static int stacktrace_return_handler(struct kretprobe_instance *ri, struct pt_regs *regs) 216 + { 217 + unsigned long retval = regs_return_value(regs); 218 + int i, ret; 219 + 220 + KUNIT_EXPECT_FALSE(current_test, preemptible()); 221 + KUNIT_EXPECT_EQ(current_test, retval, target_return_address[1]); 222 + 223 + /* 224 + * Test stacktrace inside the kretprobe handler, this will involves 225 + * kretprobe trampoline, but must include correct return address 226 + * of the target function. 227 + */ 228 + ret = stack_trace_save(stack_buf, STACK_BUF_SIZE, 0); 229 + KUNIT_EXPECT_NE(current_test, ret, 0); 230 + 231 + for (i = 0; i < ret; i++) { 232 + if (stack_buf[i] == target_return_address[1]) 233 + break; 234 + } 235 + KUNIT_EXPECT_NE(current_test, i, ret); 236 + 237 + #if !IS_MODULE(CONFIG_KPROBES_SANITY_TEST) 238 + /* 239 + * Test stacktrace from pt_regs at the return address. Thus the stack 240 + * trace must start from the target return address. 
241 + */ 242 + ret = stack_trace_save_regs(regs, stack_buf, STACK_BUF_SIZE, 0); 243 + KUNIT_EXPECT_NE(current_test, ret, 0); 244 + KUNIT_EXPECT_EQ(current_test, stack_buf[0], target_return_address[1]); 245 + #endif 246 + 247 + return 0; 248 + } 249 + 250 + static struct kretprobe rp3 = { 251 + .handler = stacktrace_return_handler, 252 + .kp.symbol_name = "kprobe_stacktrace_target" 253 + }; 254 + 255 + static void test_stacktrace_on_kretprobe(struct kunit *test) 256 + { 257 + unsigned long myretaddr = (unsigned long)__builtin_return_address(0); 258 + 259 + current_test = test; 260 + rp3.kp.addr = NULL; 261 + rp3.kp.flags = 0; 262 + 263 + /* 264 + * Run the stacktrace_driver() to record correct return address in 265 + * stacktrace_target() and ensure stacktrace_driver() call is not 266 + * inlined by checking the return address of stacktrace_driver() 267 + * and the return address of this function is different. 268 + */ 269 + KUNIT_ASSERT_NE(test, myretaddr, stacktrace_driver()); 270 + 271 + KUNIT_ASSERT_EQ(test, 0, register_kretprobe(&rp3)); 272 + KUNIT_ASSERT_NE(test, myretaddr, stacktrace_driver()); 273 + unregister_kretprobe(&rp3); 274 + } 275 + 276 + static int stacktrace_internal_return_handler(struct kretprobe_instance *ri, struct pt_regs *regs) 277 + { 278 + unsigned long retval = regs_return_value(regs); 279 + int i, ret; 280 + 281 + KUNIT_EXPECT_FALSE(current_test, preemptible()); 282 + KUNIT_EXPECT_EQ(current_test, retval, target_return_address[0]); 283 + 284 + /* 285 + * Test stacktrace inside the kretprobe handler for nested case. 286 + * The unwinder will find the kretprobe_trampoline address on the 287 + * return address, and kretprobe must solve that. 
 288 + */ 289 + ret = stack_trace_save(stack_buf, STACK_BUF_SIZE, 0); 290 + KUNIT_EXPECT_NE(current_test, ret, 0); 291 + 292 + for (i = 0; i < ret - 1; i++) { 293 + if (stack_buf[i] == target_return_address[0]) { 294 + KUNIT_EXPECT_EQ(current_test, stack_buf[i + 1], target_return_address[1]); 295 + break; 296 + } 297 + } 298 + KUNIT_EXPECT_NE(current_test, i, ret); 299 + 300 + #if !IS_MODULE(CONFIG_KPROBES_SANITY_TEST) 301 + /* Ditto for the regs version. */ 302 + ret = stack_trace_save_regs(regs, stack_buf, STACK_BUF_SIZE, 0); 303 + KUNIT_EXPECT_NE(current_test, ret, 0); 304 + KUNIT_EXPECT_EQ(current_test, stack_buf[0], target_return_address[0]); 305 + KUNIT_EXPECT_EQ(current_test, stack_buf[1], target_return_address[1]); 306 + #endif 307 + 308 + return 0; 309 + } 310 + 311 + static struct kretprobe rp4 = { 312 + .handler = stacktrace_internal_return_handler, 313 + .kp.symbol_name = "kprobe_stacktrace_internal_target" 314 + }; 315 + 316 + static void test_stacktrace_on_nested_kretprobe(struct kunit *test) 317 + { 318 + unsigned long myretaddr = (unsigned long)__builtin_return_address(0); 319 + struct kretprobe *rps[2] = {&rp3, &rp4}; 320 + 321 + current_test = test; 322 + rp3.kp.addr = NULL; 323 + rp3.kp.flags = 0; 324 + 325 + //KUNIT_ASSERT_NE(test, myretaddr, stacktrace_driver()); 326 + 327 + KUNIT_ASSERT_EQ(test, 0, register_kretprobes(rps, 2)); 328 + KUNIT_ASSERT_NE(test, myretaddr, stacktrace_driver()); 329 + unregister_kretprobes(rps, 2); 330 + } 331 + #endif /* CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE */ 332 + 333 + #endif /* CONFIG_KRETPROBES */ 334 + 335 + static int kprobes_test_init(struct kunit *test) 336 + { 337 + target = kprobe_target; 338 + target2 = kprobe_target2; 339 + stacktrace_target = kprobe_stacktrace_target; 340 + internal_target = kprobe_stacktrace_internal_target; 341 + stacktrace_driver = kprobe_stacktrace_driver; 342 + 343 + do { 344 + rand1 = prandom_u32(); 345 + } while (rand1 <= div_factor); 346 + return 0; 347 + } 348 +
349 + static struct kunit_case kprobes_testcases[] = { 350 + KUNIT_CASE(test_kprobe), 351 + KUNIT_CASE(test_kprobes), 352 + #ifdef CONFIG_KRETPROBES 353 + KUNIT_CASE(test_kretprobe), 354 + KUNIT_CASE(test_kretprobes), 355 + #ifdef CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE 356 + KUNIT_CASE(test_stacktrace_on_kretprobe), 357 + KUNIT_CASE(test_stacktrace_on_nested_kretprobe), 358 + #endif 359 + #endif 360 + {} 361 + }; 362 + 363 + static struct kunit_suite kprobes_test_suite = { 364 + .name = "kprobes_test", 365 + .init = kprobes_test_init, 366 + .test_cases = kprobes_testcases, 367 + }; 368 + 369 + kunit_test_suites(&kprobes_test_suite); 370 + 371 + MODULE_LICENSE("GPL");
+1
samples/ftrace/Makefile
··· 3 3 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT) += ftrace-direct.o 4 4 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT) += ftrace-direct-too.o 5 5 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT) += ftrace-direct-modify.o 6 + obj-$(CONFIG_SAMPLE_FTRACE_DIRECT) += ftrace-direct-multi.o 6 7 7 8 CFLAGS_sample-trace-array.o := -I$(src) 8 9 obj-$(CONFIG_SAMPLE_TRACE_ARRAY) += sample-trace-array.o
+52
samples/ftrace/ftrace-direct-multi.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + #include <linux/module.h> 3 + 4 + #include <linux/mm.h> /* for handle_mm_fault() */ 5 + #include <linux/ftrace.h> 6 + #include <linux/sched/stat.h> 7 + 8 + void my_direct_func(unsigned long ip) 9 + { 10 + trace_printk("ip %lx\n", ip); 11 + } 12 + 13 + extern void my_tramp(void *); 14 + 15 + asm ( 16 + " .pushsection .text, \"ax\", @progbits\n" 17 + " .type my_tramp, @function\n" 18 + " .globl my_tramp\n" 19 + " my_tramp:" 20 + " pushq %rbp\n" 21 + " movq %rsp, %rbp\n" 22 + " pushq %rdi\n" 23 + " movq 8(%rbp), %rdi\n" 24 + " call my_direct_func\n" 25 + " popq %rdi\n" 26 + " leave\n" 27 + " ret\n" 28 + " .size my_tramp, .-my_tramp\n" 29 + " .popsection\n" 30 + ); 31 + 32 + static struct ftrace_ops direct; 33 + 34 + static int __init ftrace_direct_multi_init(void) 35 + { 36 + ftrace_set_filter_ip(&direct, (unsigned long) wake_up_process, 0, 0); 37 + ftrace_set_filter_ip(&direct, (unsigned long) schedule, 0, 0); 38 + 39 + return register_ftrace_direct_multi(&direct, (unsigned long) my_tramp); 40 + } 41 + 42 + static void __exit ftrace_direct_multi_exit(void) 43 + { 44 + unregister_ftrace_direct_multi(&direct, (unsigned long) my_tramp); 45 + } 46 + 47 + module_init(ftrace_direct_multi_init); 48 + module_exit(ftrace_direct_multi_exit); 49 + 50 + MODULE_AUTHOR("Jiri Olsa"); 51 + MODULE_DESCRIPTION("Example use case of using register_ftrace_direct_multi()"); 52 + MODULE_LICENSE("GPL");
+1 -1
samples/kprobes/kretprobe_example.c
··· 86 86 ret = register_kretprobe(&my_kretprobe); 87 87 if (ret < 0) { 88 88 pr_err("register_kretprobe failed, returned %d\n", ret); 89 - return -1; 89 + return ret; 90 90 } 91 91 pr_info("Planted return probe at %s: %p\n", 92 92 my_kretprobe.kp.symbol_name, my_kretprobe.kp.addr);
+2 -2
tools/bootconfig/Makefile
··· 15 15 ALL_TARGETS := bootconfig 16 16 ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS)) 17 17 18 - all: $(ALL_PROGRAMS) 18 + all: $(ALL_PROGRAMS) test 19 19 20 - $(OUTPUT)bootconfig: main.c $(LIBSRC) 20 + $(OUTPUT)bootconfig: main.c include/linux/bootconfig.h $(LIBSRC) 21 21 $(CC) $(filter %.c,$^) $(CFLAGS) -o $@ 22 22 23 23 test: $(ALL_PROGRAMS) test-bootconfig.sh
+44 -1
tools/bootconfig/include/linux/bootconfig.h
··· 2 2 #ifndef _BOOTCONFIG_LINUX_BOOTCONFIG_H 3 3 #define _BOOTCONFIG_LINUX_BOOTCONFIG_H 4 4 5 - #include "../../../../include/linux/bootconfig.h" 5 + #include <stdio.h> 6 + #include <stdlib.h> 7 + #include <stdint.h> 8 + #include <stdbool.h> 9 + #include <ctype.h> 10 + #include <errno.h> 11 + #include <string.h> 12 + 6 13 7 14 #ifndef fallthrough 8 15 # define fallthrough 9 16 #endif 17 + 18 + #define WARN_ON(cond) \ 19 + ((cond) ? printf("Internal warning(%s:%d, %s): %s\n", \ 20 + __FILE__, __LINE__, __func__, #cond) : 0) 21 + 22 + #define unlikely(cond) (cond) 23 + 24 + /* Copied from lib/string.c */ 25 + static inline char *skip_spaces(const char *str) 26 + { 27 + while (isspace(*str)) 28 + ++str; 29 + return (char *)str; 30 + } 31 + 32 + static inline char *strim(char *s) 33 + { 34 + size_t size; 35 + char *end; 36 + 37 + size = strlen(s); 38 + if (!size) 39 + return s; 40 + 41 + end = s + size - 1; 42 + while (end >= s && isspace(*end)) 43 + end--; 44 + *(end + 1) = '\0'; 45 + 46 + return skip_spaces(s); 47 + } 48 + 49 + #define __init 50 + #define __initdata 51 + 52 + #include "../../../../include/linux/bootconfig.h" 10 53 11 54 #endif
-12
tools/bootconfig/include/linux/bug.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 */ 2 - #ifndef _SKC_LINUX_BUG_H 3 - #define _SKC_LINUX_BUG_H 4 - 5 - #include <stdio.h> 6 - #include <stdlib.h> 7 - 8 - #define WARN_ON(cond) \ 9 - ((cond) ? printf("Internal warning(%s:%d, %s): %s\n", \ 10 - __FILE__, __LINE__, __func__, #cond) : 0) 11 - 12 - #endif
-7
tools/bootconfig/include/linux/ctype.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 */ 2 - #ifndef _SKC_LINUX_CTYPE_H 3 - #define _SKC_LINUX_CTYPE_H 4 - 5 - #include <ctype.h> 6 - 7 - #endif
-7
tools/bootconfig/include/linux/errno.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 */ 2 - #ifndef _SKC_LINUX_ERRNO_H 3 - #define _SKC_LINUX_ERRNO_H 4 - 5 - #include <asm/errno.h> 6 - 7 - #endif
-18
tools/bootconfig/include/linux/kernel.h
···
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _SKC_LINUX_KERNEL_H
-#define _SKC_LINUX_KERNEL_H
-
-#include <stdlib.h>
-#include <stdbool.h>
-
-#include <linux/printk.h>
-
-typedef unsigned short u16;
-typedef unsigned int u32;
-
-#define unlikely(cond)	(cond)
-
-#define __init
-#define __initdata
-
-#endif
-11
tools/bootconfig/include/linux/memblock.h
···
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _XBC_LINUX_MEMBLOCK_H
-#define _XBC_LINUX_MEMBLOCK_H
-
-#include <stdlib.h>
-
-#define SMP_CACHE_BYTES	0
-#define memblock_alloc(size, align)	malloc(size)
-#define memblock_free_ptr(paddr, size)	free(paddr)
-
-#endif
-14
tools/bootconfig/include/linux/printk.h
···
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _SKC_LINUX_PRINTK_H
-#define _SKC_LINUX_PRINTK_H
-
-#include <stdio.h>
-
-#define printk(fmt, ...) printf(fmt, ##__VA_ARGS__)
-
-#define pr_err printk
-#define pr_warn printk
-#define pr_info printk
-#define pr_debug printk
-
-#endif
-32
tools/bootconfig/include/linux/string.h
···
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _SKC_LINUX_STRING_H
-#define _SKC_LINUX_STRING_H
-
-#include <string.h>
-
-/* Copied from lib/string.c */
-static inline char *skip_spaces(const char *str)
-{
-	while (isspace(*str))
-		++str;
-	return (char *)str;
-}
-
-static inline char *strim(char *s)
-{
-	size_t size;
-	char *end;
-
-	size = strlen(s);
-	if (!size)
-		return s;
-
-	end = s + size - 1;
-	while (end >= s && isspace(*end))
-		end--;
-	*(end + 1) = '\0';
-
-	return skip_spaces(s);
-}
-
-#endif
+17 -15
tools/bootconfig/main.c
···
 #include <errno.h>
 #include <endian.h>

-#include <linux/kernel.h>
 #include <linux/bootconfig.h>
+
+#define pr_err(fmt, ...) fprintf(stderr, fmt, ##__VA_ARGS__)

 static int xbc_show_value(struct xbc_node *node, bool semicolon)
 {
···
 {
	struct stat stat;
	int ret;
-	u32 size = 0, csum = 0, rcsum;
+	uint32_t size = 0, csum = 0, rcsum;
	char magic[BOOTCONFIG_MAGIC_LEN];
	const char *msg;
···
	if (lseek(fd, -(8 + BOOTCONFIG_MAGIC_LEN), SEEK_END) < 0)
		return pr_errno("Failed to lseek for size", -errno);

-	if (read(fd, &size, sizeof(u32)) < 0)
+	if (read(fd, &size, sizeof(uint32_t)) < 0)
		return pr_errno("Failed to read size", -errno);
	size = le32toh(size);

-	if (read(fd, &csum, sizeof(u32)) < 0)
+	if (read(fd, &csum, sizeof(uint32_t)) < 0)
		return pr_errno("Failed to read checksum", -errno);
	csum = le32toh(csum);
···
		return -EINVAL;
	}

-	ret = xbc_init(*buf, &msg, NULL);
+	ret = xbc_init(*buf, size, &msg, NULL);
	/* Wrong data */
	if (ret < 0) {
		pr_err("parse error: %s.\n", msg);
···
	if (!copy)
		return -ENOMEM;

-	ret = xbc_init(buf, &msg, &pos);
+	ret = xbc_init(buf, len, &msg, &pos);
	if (ret < 0)
		show_xbc_error(copy, msg, pos);
	free(copy);
···
	size_t total_size;
	struct stat stat;
	const char *msg;
-	u32 size, csum;
+	uint32_t size, csum;
	int pos, pad;
	int ret, fd;
···
	/* Backup the bootconfig data */
	data = calloc(size + BOOTCONFIG_ALIGN +
-		      sizeof(u32) + sizeof(u32) + BOOTCONFIG_MAGIC_LEN, 1);
+		      sizeof(uint32_t) + sizeof(uint32_t) + BOOTCONFIG_MAGIC_LEN, 1);
	if (!data)
		return -ENOMEM;
	memcpy(data, buf, size);

	/* Check the data format */
-	ret = xbc_init(buf, &msg, &pos);
+	ret = xbc_init(buf, size, &msg, &pos);
	if (ret < 0) {
		show_xbc_error(data, msg, pos);
		free(data);
···
		return ret;
	}
	printf("Apply %s to %s\n", xbc_path, path);
+	xbc_get_info(&ret, NULL);
	printf("\tNumber of nodes: %d\n", ret);
	printf("\tSize: %u bytes\n", (unsigned int)size);
	printf("\tChecksum: %d\n", (unsigned int)csum);

	/* TODO: Check the options by schema */
-	xbc_destroy_all();
+	xbc_exit();
	free(buf);

	/* Remove old boot config if exists */
···
	}

	/* To align up the total size to BOOTCONFIG_ALIGN, get padding size */
-	total_size = stat.st_size + size + sizeof(u32) * 2 + BOOTCONFIG_MAGIC_LEN;
+	total_size = stat.st_size + size + sizeof(uint32_t) * 2 + BOOTCONFIG_MAGIC_LEN;
	pad = ((total_size + BOOTCONFIG_ALIGN - 1) & (~BOOTCONFIG_ALIGN_MASK)) - total_size;
	size += pad;

	/* Add a footer */
	p = data + size;
-	*(u32 *)p = htole32(size);
-	p += sizeof(u32);
+	*(uint32_t *)p = htole32(size);
+	p += sizeof(uint32_t);

-	*(u32 *)p = htole32(csum);
-	p += sizeof(u32);
+	*(uint32_t *)p = htole32(csum);
+	p += sizeof(uint32_t);

	memcpy(p, BOOTCONFIG_MAGIC, BOOTCONFIG_MAGIC_LEN);
	p += BOOTCONFIG_MAGIC_LEN;
+12
tools/include/linux/objtool.h
···
 static void __used __section(".discard.func_stack_frame_non_standard") \
	*__func_stack_frame_non_standard_##func = func

+/*
+ * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
+ * for the case where a function is intentionally missing frame pointer setup,
+ * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
+ */
+#ifdef CONFIG_FRAME_POINTER
+#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
+#else
+#define STACK_FRAME_NON_STANDARD_FP(func)
+#endif
+
 #else /* __ASSEMBLY__ */

 /*
···
 #define UNWIND_HINT(sp_reg, sp_offset, type, end) \
	"\n\t"
 #define STACK_FRAME_NON_STANDARD(func)
+#define STACK_FRAME_NON_STANDARD_FP(func)
 #else
 #define ANNOTATE_INTRA_FUNCTION_CALL
 .macro UNWIND_HINT sp_reg:req sp_offset=0 type:req end=0
+1 -1
tools/objtool/check.c
···
	}

	while (&insn->list != &file->insn_list && (!sec || insn->sec == sec)) {
-		if (insn->hint && !insn->visited) {
+		if (insn->hint && !insn->visited && !insn->ignore) {
			ret = validate_branch(file, insn->func, insn, state);
			if (ret && backtrace)
				BT_FUNC("<=== (hint)", insn);
+1 -1
tools/testing/selftests/ftrace/ftracetest
···
		exit 1
	fi
 done
-(cd $TRACING_DIR; initialize_ftrace) # for cleanup
+(cd $TRACING_DIR; finish_ftrace) # for cleanup

 prlog ""
 prlog "# of passed: " `echo $PASSED_CASES | wc -w`
+12
tools/testing/selftests/ftrace/test.d/functions
···
	[ -f uprobe_events ] && echo > uprobe_events
	[ -f synthetic_events ] && echo > synthetic_events
	[ -f snapshot ] && echo 0 > snapshot
+
+	# Stop tracing while reading the trace file by default, to prevent
+	# the test results while checking it and to avoid taking a long time
+	# to check the result.
+	[ -f options/pause-on-trace ] && echo 1 > options/pause-on-trace
+
	clear_trace
	enable_tracing
 }
+
+finish_ftrace() {
+	initialize_ftrace
+	# And recover it to default.
+	[ -f options/pause-on-trace ] && echo 0 > options/pause-on-trace
+}

 check_requires() { # Check required files and tracers
+1 -1
tools/tracing/latency/latency-collector.c
···
	mutex_lock(&print_mtx);
	check_signals();
	write_or_die(fd_stdout, queue_full_warning,
-		     sizeof(queue_full_warning));
+		     strlen(queue_full_warning));
	mutex_unlock(&print_mtx);
 }
 modified--;