perf/core: Fix missing read event generation on task exit

For events with inherit_stat enabled, a "read" event is generated on
task exit to collect the per-task event counts.

The call chain is as follows:

do_exit
-> perf_event_exit_task
-> perf_event_exit_task_context
-> perf_event_exit_event
-> perf_remove_from_context
-> perf_child_detach
-> sync_child_event
-> perf_event_read_event

However, the child event context is detached from the task too early in
perf_event_exit_task_context(): by the time sync_child_event() runs,
child_event->ctx->task has already been set to TASK_TOMBSTONE, so the
read event is never generated in this case. Fix that by moving the
context-lock section backward, ensuring ctx->task is not yet set to
TASK_TOMBSTONE when the read event is generated.

Note that perf_event_free_task() also calls
perf_event_exit_task_context(), with exit = false, to tear down all
child events from the context of a task that never lived; on that path,
accessing the task PID can lead to a use-after-free.

To fix that, let sync_child_event() take the task as an argument and
move the call to the only place it should be triggered, avoiding the
effect of ctx->task being set to TASK_TOMBSTONE, and add a task
parameter to perf_event_exit_event() so that sync_child_event() is
triggered properly when needed.

This bug can be reproduced by running "perf record -s" on any program
that generates perf events in its child tasks. Checking the result with
"perf report -T", the report ends with an empty per-task table (just the
"# PID TID" header), which by design should contain the per-task event
counts.

Fixes: ef54c1a476ae ("perf: Rework perf_event_exit_event()")
Signed-off-by: Thaumy Cheng <thaumy.love@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linux-perf-users@vger.kernel.org
Link: https://patch.msgid.link/20251209041600.963586-1-thaumy.love@gmail.com

authored by Thaumy Cheng and committed by Ingo Molnar c418d8b4 01439286

kernel/events/core.c: +12 -10
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
···
 	perf_event__header_size(leader);
 }
 
-static void sync_child_event(struct perf_event *child_event);
-
 static void perf_child_detach(struct perf_event *event)
 {
 	struct perf_event *parent_event = event->parent;
···
 	lockdep_assert_held(&parent_event->child_mutex);
 	 */
 
-	sync_child_event(event);
 	list_del_init(&event->child_list);
 }
···
 static void perf_remove_from_owner(struct perf_event *event);
 static void perf_event_exit_event(struct perf_event *event,
 				  struct perf_event_context *ctx,
+				  struct task_struct *task,
 				  bool revoke);
 
 /*
···
 
 		modified = true;
 
-		perf_event_exit_event(event, ctx, false);
+		perf_event_exit_event(event, ctx, ctx->task, false);
 	}
 
 	raw_spin_lock_irqsave(&ctx->lock, flags);
···
 	/*
 	 * De-schedule the event and mark it REVOKED.
 	 */
-	perf_event_exit_event(event, ctx, true);
+	perf_event_exit_event(event, ctx, ctx->task, true);
 
 	/*
 	 * All _free_event() bits that rely on event->pmu:
···
 }
 EXPORT_SYMBOL_GPL(perf_pmu_migrate_context);
 
-static void sync_child_event(struct perf_event *child_event)
+static void sync_child_event(struct perf_event *child_event,
+			     struct task_struct *task)
 {
 	struct perf_event *parent_event = child_event->parent;
 	u64 child_val;
 
 	if (child_event->attr.inherit_stat) {
-		struct task_struct *task = child_event->ctx->task;
-
 		if (task && task != TASK_TOMBSTONE)
 			perf_event_read_event(child_event, task);
 	}
···
 
 static void
 perf_event_exit_event(struct perf_event *event,
-		      struct perf_event_context *ctx, bool revoke)
+		      struct perf_event_context *ctx,
+		      struct task_struct *task,
+		      bool revoke)
 {
 	struct perf_event *parent_event = event->parent;
 	unsigned long detach_flags = DETACH_EXIT;
···
 		mutex_lock(&parent_event->child_mutex);
 		/* PERF_ATTACH_ITRACE might be set concurrently */
 		attach_state = READ_ONCE(event->attach_state);
+
+		if (attach_state & PERF_ATTACH_CHILD)
+			sync_child_event(event, task);
 	}
 
 	if (revoke)
···
 	perf_event_task(task, ctx, 0);
 
 	list_for_each_entry_safe(child_event, next, &ctx->event_list, event_entry)
-		perf_event_exit_event(child_event, ctx, false);
+		perf_event_exit_event(child_event, ctx, exit ? task : NULL, false);
 
 	mutex_unlock(&ctx->mutex);