Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tracing: Allow system call tracepoints to handle page faults

Use Tasks Trace RCU to protect iteration of system call enter/exit
tracepoint probes to allow those probes to handle page faults.

In preparation for this change, all tracers registering to system call
enter/exit tracepoints should expect those to be called with preemption
enabled.

This allows tracers to fault-in userspace system call arguments such as
path strings within their probe callbacks.

Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/20241009010718.2050182-6-mathieu.desnoyers@efficios.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
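
To illustrate the kind of probe this change is aimed at, here is a minimal user-space sketch of a system call enter probe that copies a path string argument with a bound. Every name in it (model_copy_user_string, model_sys_enter_probe, struct path_record, MODEL_PATH_MAX) is hypothetical and not part of this commit. The copy helper only imitates the usual kernel convention of returning the string length, the byte count on truncation, or -EFAULT on a bad pointer. It does not model an actual page fault, which in the kernel could only be serviced because the probe now runs with preemption enabled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PATH_MAX 16	/* hypothetical record size, kept small for the example */

/*
 * Hypothetical stand-in for a kernel user-copy helper. Copies at most
 * `count` bytes from `usrc`. Returns the string length (without the
 * trailing NUL) on success, `count` if the string was truncated, or
 * -EFAULT for a NULL source, which models an inaccessible user pointer.
 */
static long model_copy_user_string(char *dst, const char *usrc, long count)
{
	long i;

	if (!usrc || count <= 0)
		return -EFAULT;
	for (i = 0; i < count; i++) {
		dst[i] = usrc[i];
		if (!usrc[i])
			return i;
	}
	return count;	/* truncated: dst is not NUL-terminated yet */
}

struct path_record {
	char path[MODEL_PATH_MAX];
	long len;	/* copied length, or a negative errno */
};

/* Hypothetical probe body for an openat-like system call enter event. */
static void model_sys_enter_probe(struct path_record *rec, const char *upath)
{
	assert(rec != NULL);

	/* Leave one byte free so the record can always be terminated. */
	rec->len = model_copy_user_string(rec->path, upath, MODEL_PATH_MAX - 1);
	if (rec->len < 0) {
		rec->path[0] = '\0';
		return;
	}
	rec->path[rec->len] = '\0';
}
```

A caller would pass the probe a record and the path pointer taken from the system call arguments, then check rec.len for a negative value before using rec.path.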

Authored by Mathieu Desnoyers and committed by Steven Rostedt (Google)
a363d27c 4aadde89

+17 -2
+16 -2
include/linux/tracepoint.h
···
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/rcupdate.h>
+#include <linux/rcupdate_trace.h>
 #include <linux/tracepoint-defs.h>
 #include <linux/static_call.h>
 
···
 #ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_rcu_tasks_trace();
 	synchronize_rcu();
 }
 #else
···
 /*
  * it_func[0] is never NULL because there is at least one element in the array
  * when the array itself is non NULL.
+ *
+ * With @syscall=0, the tracepoint callback array dereference is
+ * protected by disabling preemption.
+ * With @syscall=1, the tracepoint callback array dereference is
+ * protected by Tasks Trace RCU, which allows probes to handle page
+ * faults.
  */
 #define __DO_TRACE(name, args, cond, syscall)				\
 	do {								\
···
 		if (!(cond))						\
 			return;						\
 									\
-		preempt_disable_notrace();				\
+		if (syscall)						\
+			rcu_read_lock_trace();				\
+		else							\
+			preempt_disable_notrace();			\
 									\
 		__DO_TRACE_CALL(name, TP_ARGS(args));			\
 									\
-		preempt_enable_notrace();				\
+		if (syscall)						\
+			rcu_read_unlock_trace();			\
+		else							\
+			preempt_enable_notrace();			\
 	} while (0)
 
 /*
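
The branch added to __DO_TRACE can be modelled outside the kernel to make the two protection schemes easier to see. The following is a user-space sketch only: the model_* functions are hypothetical stand-ins that just count nesting, and MODEL_DO_TRACE is a simplified imitation of the macro, not the kernel code (it uses break where the kernel macro uses return, and calls a single probe instead of iterating the callback array).

```c
#include <assert.h>

/* Hypothetical counters standing in for kernel state. */
static int model_preempt_count;		/* >0 models preemption disabled */
static int model_trace_rcu_nesting;	/* >0 models a Tasks Trace RCU read section */

static void model_preempt_disable(void)       { model_preempt_count++; }
static void model_preempt_enable(void)        { model_preempt_count--; }
static void model_rcu_read_lock_trace(void)   { model_trace_rcu_nesting++; }
static void model_rcu_read_unlock_trace(void) { model_trace_rcu_nesting--; }

/* What the probe observed about its calling context. */
struct model_probe_ctx {
	int calls;
	int preempt_count;
	int trace_rcu_nesting;
};

static void model_probe(struct model_probe_ctx *ctx)
{
	ctx->calls++;
	ctx->preempt_count = model_preempt_count;
	ctx->trace_rcu_nesting = model_trace_rcu_nesting;
}

/* Simplified imitation of the patched macro's control flow. */
#define MODEL_DO_TRACE(probe, ctx, cond, syscall)		\
	do {							\
		if (!(cond))					\
			break;					\
		if (syscall)					\
			model_rcu_read_lock_trace();		\
		else						\
			model_preempt_disable();		\
		(probe)(ctx);					\
		if (syscall)					\
			model_rcu_read_unlock_trace();		\
		else						\
			model_preempt_enable();			\
	} while (0)

/* Fire the modelled tracepoint once and report what the probe saw. */
static struct model_probe_ctx model_fire(int cond, int syscall)
{
	struct model_probe_ctx ctx = { 0, 0, 0 };

	MODEL_DO_TRACE(model_probe, &ctx, cond, syscall);
	/* Both schemes must be balanced once the macro completes. */
	assert(model_preempt_count == 0 && model_trace_rcu_nesting == 0);
	return ctx;
}
```

With syscall=0 the modelled probe sees preemption disabled. With syscall=1 it sees only the Tasks Trace RCU read section, which is the property that lets a real probe sleep on a page fault.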
+1
init/Kconfig
···
 #
 config TRACEPOINTS
 	bool
+	select TASKS_TRACE_RCU
 
 source "kernel/Kconfig.kexec"
 