Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

libperf evlist: Fix polling of system-wide events

Originally, (refer commit f90d194a867a5a1d ("perf evlist: Do not poll
events that use the system_wide flag") there wasn't much reason to poll
system-wide events because:

1. The mmaps get "merged" via set-output anyway (the per-cpu case)
2. perf reads all mmaps when any event is woken
3. system-wide mmaps do not fill up as fast as the mmaps for user
selected events

But there was 1 reason not to poll which was that it prevented correct
termination due to POLLHUP on all user selected events. That issue is
now easily resolved by using fdarray_flag__nonfilterable.

With the advent of commit ae4f8ae16a078964 ("libperf evlist: Allow
mixing per-thread and per-cpu mmaps"), system-wide mmaps can be used
also in the per-thread case where reason 1 does not apply.

Fix the omission of system-wide events from polling by using the
fdarray_flag__nonfilterable flag.

Example:

Before:

$ perf record --no-bpf-event -vvv -e intel_pt// --per-thread uname 2>err.txt
Linux
$ grep 'sys_perf_event_open.*=\|pollfd' err.txt
sys_perf_event_open: pid 155076 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 155076 cpu -1 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 14
sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 15
thread_data[0x55fb43c29e80]: pollfd[0] <- event_fd=5
thread_data[0x55fb43c29e80]: pollfd[1] <- event_fd=6
thread_data[0x55fb43c29e80]: pollfd[2] <- non_perf_event fd=4

After:

$ perf record --no-bpf-event -vvv -e intel_pt// --per-thread uname 2>err.txt
Linux
$ grep 'sys_perf_event_open.*=\|pollfd' err.txt
sys_perf_event_open: pid 156316 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 156316 cpu -1 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 14
sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 15
thread_data[0x55cc19e58e80]: pollfd[0] <- event_fd=5
thread_data[0x55cc19e58e80]: pollfd[1] <- event_fd=6
thread_data[0x55cc19e58e80]: pollfd[2] <- event_fd=7
thread_data[0x55cc19e58e80]: pollfd[3] <- event_fd=9
thread_data[0x55cc19e58e80]: pollfd[4] <- event_fd=10
thread_data[0x55cc19e58e80]: pollfd[5] <- event_fd=11
thread_data[0x55cc19e58e80]: pollfd[6] <- event_fd=12
thread_data[0x55cc19e58e80]: pollfd[7] <- event_fd=13
thread_data[0x55cc19e58e80]: pollfd[8] <- event_fd=14
thread_data[0x55cc19e58e80]: pollfd[9] <- event_fd=15
thread_data[0x55cc19e58e80]: pollfd[10] <- non_perf_event fd=4

Fixes: ae4f8ae16a078964 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220915122612.81738-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

authored by

Adrian Hunter and committed by
Arnaldo Carvalho de Melo
6cc44796 ca76d7d2

+3 -2
+3 -2
tools/lib/perf/evlist.c
··· 441 441 442 442 perf_evlist__for_each_entry(evlist, evsel) { 443 443 bool overwrite = evsel->attr.write_backward; 444 + enum fdarray_flags flgs; 444 445 struct perf_mmap *map; 445 446 int *output, fd, cpu; 446 447 ··· 505 504 506 505 revent = !overwrite ? POLLIN : 0; 507 506 508 - if (!evsel->system_wide && 509 - perf_evlist__add_pollfd(evlist, fd, map, revent, fdarray_flag__default) < 0) { 507 + flgs = evsel->system_wide ? fdarray_flag__nonfilterable : fdarray_flag__default; 508 + if (perf_evlist__add_pollfd(evlist, fd, map, revent, flgs) < 0) { 510 509 perf_mmap__put(map); 511 510 return -1; 512 511 }