Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

libbpf: perfbuf: Add API to get the ring buffer

Add support for writing a custom event reader by exposing the ring
buffer.

With the new API perf_buffer__buffer() you get access to the
raw mmap()'ed per-CPU underlying memory of the ring buffer.

This region contains both the perf buffer data and the header
(struct perf_event_mmap_page), which manages the ring buffer
state (head/tail positions; when accessing the head/tail positions
it is important to account for SMP ordering).
With this kind of low-level access one can implement different types of
consumers; here are a few simple cases where this API helps:

1. perf_event_read_simple allocates with malloc; perhaps you want
to handle the wrap-around in some other way.
2. Since the perf buffer is per-CPU, the order of the events is not
guaranteed, for example:
Given 3 events where each event has a timestamp t0 < t1 < t2,
and the events are spread across more than 1 CPU, we can end
up with the following state in the ring buffers:
CPU[0] => [t0, t2]
CPU[1] => [t1]
When you consume the events from CPU[0], you can tell that t1 is
missing (assuming there are no drops and your event data
contains a sequential index).
So one can simply do the following: for CPU[0], store
the addresses of t0 and t2 in an array (without moving the tail, so
the data is not overwritten), then move on to CPU[1] and store the
address of t1 in the same array.
You end up with something like:
void *arr[] = {&t0, &t1, &t2}; now you can consume the events in
order and move the tails as you process them.
3. Assuming there are multiple CPUs and we want to start draining the
messages from them, we can "pick" which one to start with
according to the remaining free space in each ring buffer.

Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220715181122.149224-1-arilou@gmail.com

Authored by Jon Doron and committed by Andrii Nakryiko
9ff5efde 8eab0a09

3 changed files, +33 lines

tools/lib/bpf/libbpf.c (+16)
@@ -11734,6 +11734,22 @@
 	return cpu_buf->fd;
 }
 
+int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf, size_t *buf_size)
+{
+	struct perf_cpu_buf *cpu_buf;
+
+	if (buf_idx >= pb->cpu_cnt)
+		return libbpf_err(-EINVAL);
+
+	cpu_buf = pb->cpu_bufs[buf_idx];
+	if (!cpu_buf)
+		return libbpf_err(-ENOENT);
+
+	*buf = cpu_buf->base;
+	*buf_size = pb->mmap_size;
+	return 0;
+}
+
 /*
  * Consume data from perf ring buffer corresponding to slot *buf_idx* in
  * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
tools/lib/bpf/libbpf.h (+16)

@@ -1053,6 +1053,22 @@
 LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
 LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
 LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
+/**
+ * @brief **perf_buffer__buffer()** returns the per-cpu raw mmap()'ed underlying
+ * memory region of the ring buffer.
+ * This ring buffer can be used to implement a custom events consumer.
+ * The ring buffer starts with the *struct perf_event_mmap_page*, which
+ * holds the ring buffer management fields; when accessing the header
+ * structure it's important to be SMP aware.
+ * You can refer to *perf_event_read_simple* for a simple example.
+ * @param pb the perf buffer structure
+ * @param buf_idx the buffer index to retrieve
+ * @param buf (out) gets the base pointer of the mmap()'ed memory
+ * @param buf_size (out) gets the size of the mmap()'ed region
+ * @return 0 on success, negative error code for failure
+ */
+LIBBPF_API int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void **buf,
+				   size_t *buf_size);
 
 struct bpf_prog_linfo;
 struct bpf_prog_info;
tools/lib/bpf/libbpf.map (+1)

@@ -362,4 +362,5 @@
 		libbpf_bpf_link_type_str;
 		libbpf_bpf_map_type_str;
 		libbpf_bpf_prog_type_str;
+		perf_buffer__buffer;
 };