Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tracing: Add sched_prepare_exec tracepoint

Add "sched_prepare_exec" tracepoint, which is run right after the point
of no return but before the current task assumes its new exec identity.

Unlike the tracepoint "sched_process_exec", the "sched_prepare_exec"
tracepoint runs before flushing the old exec, i.e. while the task still
has the original state (such as original MM), but when the new exec
either succeeds or crashes (but never returns to the original exec).

Being able to trace this event can be helpful in a number of use cases:

* allowing tracing eBPF programs access to the original MM on exec,
before current->mm is replaced;
* counting exec in the original task (via perf event);
* profiling flush time ("sched_prepare_exec" to "sched_process_exec").
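
The flush-time use case above could be approached in userspace by pairing
each "sched_prepare_exec" timestamp with the following "sched_process_exec"
timestamp of the same task. The following is only a sketch under stated
assumptions, not part of this patch: the names flush_tracker,
tracker_on_prepare and tracker_on_exec are hypothetical, the fixed-size table
is arbitrary, and the caller is assumed to pass the old_pid field of
"sched_process_exec" as the key, since the pid field may differ from the one
seen at prepare time when a non-leader thread calls exec.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 64	/* arbitrary capacity chosen for this sketch */

struct pending_exec {
	int pid;		/* pid reported by sched_prepare_exec */
	double prepare_ts;	/* timestamp of that event, in seconds */
	int in_use;
};

/* Must be zero-initialized before use, e.g. struct flush_tracker t = {0}; */
struct flush_tracker {
	struct pending_exec slots[MAX_PENDING];
};

/* Remember the timestamp of a sched_prepare_exec event for @pid. */
void tracker_on_prepare(struct flush_tracker *t, int pid, double ts)
{
	struct pending_exec *free_slot = NULL;
	size_t i;

	assert(t != NULL);
	for (i = 0; i < MAX_PENDING; i++) {
		struct pending_exec *p = &t->slots[i];

		if (p->in_use && p->pid == pid) {
			/* Stale entry (earlier exec never completed): overwrite. */
			p->prepare_ts = ts;
			return;
		}
		if (!p->in_use && !free_slot)
			free_slot = p;
	}
	if (free_slot) {
		free_slot->pid = pid;
		free_slot->prepare_ts = ts;
		free_slot->in_use = 1;
	}
	/* If the table is full, the event is silently dropped. */
}

/*
 * Handle a sched_process_exec event. Returns the elapsed time in seconds
 * since the matching sched_prepare_exec, or a negative value if none was
 * recorded for @old_pid.
 */
double tracker_on_exec(struct flush_tracker *t, int old_pid, double ts)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < MAX_PENDING; i++) {
		struct pending_exec *p = &t->slots[i];

		if (p->in_use && p->pid == old_pid) {
			p->in_use = 0;
			return ts - p->prepare_ts;
		}
	}
	return -1.0;
}
```

An exec that fails after the point of no return terminates the task and never
emits "sched_process_exec", so a longer-running tool would also need to expire
entries that are never matched.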

Example of tracing output:

$ cat /sys/kernel/debug/tracing/trace_pipe
<...>-379 [003] ..... 179.626921: sched_prepare_exec: interp=/usr/bin/sshd filename=/usr/bin/sshd pid=379 comm=sshd
<...>-381 [002] ..... 180.048580: sched_prepare_exec: interp=/bin/bash filename=/bin/bash pid=381 comm=sshd
<...>-385 [001] ..... 180.068277: sched_prepare_exec: interp=/usr/bin/tty filename=/usr/bin/tty pid=385 comm=bash
<...>-389 [006] ..... 192.020147: sched_prepare_exec: interp=/usr/bin/dmesg filename=/usr/bin/dmesg pid=389 comm=bash
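
For post-processing output like the lines above, a record can be split into
its fields with the C standard library. This is a minimal sketch rather than
anything shipped with the patch: struct prepare_exec_event and
parse_prepare_exec are hypothetical names, the buffer sizes are arbitrary, and
the parser assumes that neither the paths nor comm contain whitespace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one sched_prepare_exec record. */
struct prepare_exec_event {
	double timestamp;	/* timestamp column as printed in the trace */
	char interp[256];
	char filename[256];
	int pid;
	char comm[16];
};

/*
 * Parse one trace_pipe line. Returns 0 on success, or -1 if the line is not
 * a well-formed sched_prepare_exec record.
 */
int parse_prepare_exec(const char *line, struct prepare_exec_event *ev)
{
	static const char marker[] = ": sched_prepare_exec: ";
	const char *payload, *ts;

	assert(line != NULL && ev != NULL);

	payload = strstr(line, marker);
	if (!payload)
		return -1;

	/* The timestamp is the token directly before the marker. */
	ts = payload;
	while (ts > line && ts[-1] != ' ')
		ts--;
	if (sscanf(ts, "%lf", &ev->timestamp) != 1)
		return -1;

	/* Field layout follows the TP_printk() format of the event. */
	payload += sizeof(marker) - 1;
	if (sscanf(payload, "interp=%255s filename=%255s pid=%d comm=%15s",
		   ev->interp, ev->filename, &ev->pid, ev->comm) != 4)
		return -1;

	return 0;
}
```

A caller would read trace_pipe line by line (for example with fgets()) and
pass each line to parse_prepare_exec(), skipping lines for which it returns -1.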

Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240411102158.1272267-1-elver@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>

Authored by Marco Elver and committed by Kees Cook
c8238994 39cd87c4

43 insertions(+) in total

fs/exec.c (+8)

@@ -1268,6 +1268,14 @@
 		return retval;
 
 	/*
+	 * This tracepoint marks the point before flushing the old exec where
+	 * the current task is still unchanged, but errors are fatal (point of
+	 * no return). The later "sched_process_exec" tracepoint is called after
+	 * the current task has successfully switched to the new exec.
+	 */
+	trace_sched_prepare_exec(current, bprm);
+
+	/*
 	 * Ensure all future errors are fatal.
 	 */
 	bprm->point_of_no_return = true;

include/trace/events/sched.h (+35)

@@ -420,6 +420,41 @@
 		  __entry->pid, __entry->old_pid)
 );
 
+/**
+ * sched_prepare_exec - called before setting up new exec
+ * @task:	pointer to the current task
+ * @bprm:	pointer to linux_binprm used for new exec
+ *
+ * Called before flushing the old exec, where @task is still unchanged, but at
+ * the point of no return during switching to the new exec. At the point it is
+ * called the exec will either succeed, or on failure terminate the task. Also
+ * see the "sched_process_exec" tracepoint, which is called right after @task
+ * has successfully switched to the new exec.
+ */
+TRACE_EVENT(sched_prepare_exec,
+
+	TP_PROTO(struct task_struct *task, struct linux_binprm *bprm),
+
+	TP_ARGS(task, bprm),
+
+	TP_STRUCT__entry(
+		__string(	interp,		bprm->interp	)
+		__string(	filename,	bprm->filename	)
+		__field(	pid_t,		pid		)
+		__string(	comm,		task->comm	)
+	),
+
+	TP_fast_assign(
+		__assign_str(interp, bprm->interp);
+		__assign_str(filename, bprm->filename);
+		__entry->pid = task->pid;
+		__assign_str(comm, task->comm);
+	),
+
+	TP_printk("interp=%s filename=%s pid=%d comm=%s",
+		  __get_str(interp), __get_str(filename),
+		  __entry->pid, __get_str(comm))
+);
 
 #ifdef CONFIG_SCHEDSTATS
 #define DEFINE_EVENT_SCHEDSTAT DEFINE_EVENT