Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tracing/user_events: Document user_event_mm one-shot list usage

During 6.4 development it became clear that the one-shot list used by
the user_event_mm's next field was confusing to others. It is not clear
how this list is protected or what the next field is used for unless
you are familiar with the code.

Add comments to the user_event_mm struct indicating the lock requirement
and usage. Also document, via comments in both user_event_enabler_update()
and user_event_mm_get_all(), how and why this approach was used and the
rules for using it properly.

Link: https://lkml.kernel.org/r/20230519230741.669-5-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wicngggxVpbnrYHjRTwGE0WYscPRM+L2HO2BF8ia1EXgQ@mail.gmail.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

authored by Beau Belgrave and committed by Steven Rostedt (Google)
ff9e1632 dcbd1ac2

+23 -1

include/linux/user_events.h (+1)

@@ struct user_event_mm @@
 	struct list_head mms_link;
 	struct list_head enablers;
 	struct mm_struct *mm;
+	/* Used for one-shot lists, protected by event_mutex */
 	struct user_event_mm *next;
 	refcount_t refcnt;
 	refcount_t tasks;

kernel/trace/trace_events_user.c (+22 -1)

@@ user_event_enabler_update() @@
 static void user_event_enabler_update(struct user_event *user)
 {
 	struct user_event_enabler *enabler;
-	struct user_event_mm *mm = user_event_mm_get_all(user);
 	struct user_event_mm *next;
+	struct user_event_mm *mm;
 	int attempt;
 
 	lockdep_assert_held(&event_mutex);
+
+	/*
+	 * We need to build a one-shot list of all the mms that have an
+	 * enabler for the user_event passed in. This list is only valid
+	 * while holding the event_mutex. The only reason for this is due
+	 * to the global mm list being RCU protected and we use methods
+	 * which can wait (mmap_read_lock and pin_user_pages_remote).
+	 *
+	 * NOTE: user_event_mm_get_all() increments the ref count of each
+	 * mm that is added to the list to prevent removal timing windows.
+	 * We must always put each mm after they are used, which may wait.
+	 */
+	mm = user_event_mm_get_all(user);
 
 	while (mm) {
 		next = mm->next;

@@ user_event_mm_get_all() @@
 	struct user_event_mm *found = NULL;
 	struct user_event_enabler *enabler;
 	struct user_event_mm *mm;
+
+	/*
+	 * We use the mm->next field to build a one-shot list from the global
+	 * RCU protected list. To build this list the event_mutex must be held.
+	 * This lets us build a list without requiring allocs that could fail
+	 * when user based events are most wanted for diagnostics.
+	 */
+	lockdep_assert_held(&event_mutex);
 
 	/*
 	 * We do not want to block fork/exec while enablements are being