Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

libbpf: make RINGBUF map size adjustments more eagerly

Make libbpf adjust the RINGBUF map size (rounding it up to the closest
power-of-2 multiple of page_size) more eagerly: during the open phase when
initializing the map, and on explicit calls to bpf_map__set_max_entries().

This approach allows the user to check the actual size of the BPF ringbuf
even before it is created in the kernel, and it also prevents various edge
cases where the BPF ringbuf size could get out of sync with what it would
be in the kernel. One of them (reported in [0]) occurs during an attempt
to pin/reuse a BPF ringbuf.
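The rounding rule described above can be sketched in plain C. This is a
standalone illustration with a hypothetical helper name
(`round_up_ringbuf_sz`) that mirrors the logic of libbpf's internal
adjust_ringbuf_sz(); it is not part of the libbpf API:

```c
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sketch: round sz up to the closest power-of-2 multiple of
 * the page size, mirroring what libbpf does for RINGBUF max_entries.
 */
static size_t round_up_ringbuf_sz(size_t sz)
{
	size_t page_sz = (size_t)sysconf(_SC_PAGE_SIZE);
	size_t mul;

	/* size 0 is passed through so the kernel reports the error */
	if (sz == 0)
		return 0;

	/* already a power-of-2 multiple of page size: pass through */
	if ((sz % page_sz) == 0 &&
	    ((sz / page_sz) & ((sz / page_sz) - 1)) == 0)
		return sz;

	/* otherwise pick the smallest page_sz * 2^n that is bigger */
	for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) {
		if (mul * page_sz > sz)
			return mul * page_sz;
	}

	/* impossible to satisfy (sz too close to UINT_MAX): leave as-is
	 * and let the kernel reject it
	 */
	return sz;
}
```

For example, with 4096-byte pages a requested size of 1 becomes 4096, and
5000 becomes 8192. Because this adjustment now happens at open time, those
are the values a user would read back before load.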

Move adjust_ringbuf_sz() helper closer to its first actual use. The
implementation of the helper is unchanged.

Also make the detection of whether a bpf_object is already loaded more
robust by checking obj->loaded explicitly, given that map->fd can be < 0
even after the bpf_object is loaded, because map creation can be disabled
with bpf_map__set_autocreate(map, false).
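Why the fd check is insufficient can be illustrated with a toy model (the
structs below are simplified and hypothetical, not libbpf's actual
internals): after load, a map whose autocreate was disabled still has
fd < 0, so fd alone cannot tell whether the object was loaded.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of the relevant libbpf state */
struct toy_obj { bool loaded; };
struct toy_map { struct toy_obj *obj; int fd; bool autocreate; };

/* old-style check: treats a loaded-but-not-created map as still mutable */
static bool busy_by_fd(const struct toy_map *m)
{
	return m->fd >= 0;
}

/* new-style check: mutability is tied to the object's load state */
static bool busy_by_loaded(const struct toy_map *m)
{
	return m->obj->loaded;
}
```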

[0] Closes: https://github.com/libbpf/libbpf/pull/530

Fixes: 0087a681fa8c ("libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220715230952.2219271-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Authored by Andrii Nakryiko, committed by Alexei Starovoitov
597fbc46 bdb2bc75

+42 -35
tools/lib/bpf/libbpf.c
···
 	return 0;
 }
 
+static size_t adjust_ringbuf_sz(size_t sz)
+{
+	__u32 page_sz = sysconf(_SC_PAGE_SIZE);
+	__u32 mul;
+
+	/* if user forgot to set any size, make sure they see error */
+	if (sz == 0)
+		return 0;
+	/* Kernel expects BPF_MAP_TYPE_RINGBUF's max_entries to be
+	 * a power-of-2 multiple of kernel's page size. If user diligently
+	 * satisified these conditions, pass the size through.
+	 */
+	if ((sz % page_sz) == 0 && is_pow_of_2(sz / page_sz))
+		return sz;
+
+	/* Otherwise find closest (page_sz * power_of_2) product bigger than
+	 * user-set size to satisfy both user size request and kernel
+	 * requirements and substitute correct max_entries for map creation.
+	 */
+	for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) {
+		if (mul * page_sz > sz)
+			return mul * page_sz;
+	}
+
+	/* if it's impossible to satisfy the conditions (i.e., user size is
+	 * very close to UINT_MAX but is not a power-of-2 multiple of
+	 * page_size) then just return original size and let kernel reject it
+	 */
+	return sz;
+}
+
 static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def)
 {
 	map->def.type = def->map_type;
···
 	map->numa_node = def->numa_node;
 	map->btf_key_type_id = def->key_type_id;
 	map->btf_value_type_id = def->value_type_id;
+
+	/* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
+	if (map->def.type == BPF_MAP_TYPE_RINGBUF)
+		map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
 
 	if (def->parts & MAP_DEF_MAP_TYPE)
 		pr_debug("map '%s': found type = %u.\n", map->name, def->map_type);
···
 
 int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries)
 {
-	if (map->fd >= 0)
+	if (map->obj->loaded)
 		return libbpf_err(-EBUSY);
+
 	map->def.max_entries = max_entries;
+
+	/* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
+	if (map->def.type == BPF_MAP_TYPE_RINGBUF)
+		map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
+
 	return 0;
 }
···
 
 static void bpf_map__destroy(struct bpf_map *map);
 
-static size_t adjust_ringbuf_sz(size_t sz)
-{
-	__u32 page_sz = sysconf(_SC_PAGE_SIZE);
-	__u32 mul;
-
-	/* if user forgot to set any size, make sure they see error */
-	if (sz == 0)
-		return 0;
-	/* Kernel expects BPF_MAP_TYPE_RINGBUF's max_entries to be
-	 * a power-of-2 multiple of kernel's page size. If user diligently
-	 * satisified these conditions, pass the size through.
-	 */
-	if ((sz % page_sz) == 0 && is_pow_of_2(sz / page_sz))
-		return sz;
-
-	/* Otherwise find closest (page_sz * power_of_2) product bigger than
-	 * user-set size to satisfy both user size request and kernel
-	 * requirements and substitute correct max_entries for map creation.
-	 */
-	for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) {
-		if (mul * page_sz > sz)
-			return mul * page_sz;
-	}
-
-	/* if it's impossible to satisfy the conditions (i.e., user size is
-	 * very close to UINT_MAX but is not a power-of-2 multiple of
-	 * page_size) then just return original size and let kernel reject it
-	 */
-	return sz;
-}
-
 static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, bool is_inner)
 {
 	LIBBPF_OPTS(bpf_map_create_opts, create_attr);
···
 	}
 
 	switch (def->type) {
-	case BPF_MAP_TYPE_RINGBUF:
-		map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
-		/* fallthrough */
 	case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
 	case BPF_MAP_TYPE_CGROUP_ARRAY:
 	case BPF_MAP_TYPE_STACK_TRACE: