Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bpf: Add bpf_user_ringbuf_drain() helper

In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which
allows user-space applications to publish messages to a ring buffer that is
consumed by a BPF program in kernel-space. For this map type to be useful,
BPF programs need a helper function they can invoke to drain samples from
the ring buffer, running a callback on each sample. This change adds that
capability via a new BPF helper function:

long bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn,
                            void *ctx, u64 flags);

BPF programs may invoke this function to run callback_fn() on a series of
samples in the ring buffer. callback_fn() has the following signature:

long callback_fn(struct bpf_dynptr *dynptr, void *context);

Samples are provided to the callback as struct bpf_dynptr *'s, which the
program can read using the existing BPF helper functions for querying a
struct bpf_dynptr (e.g. bpf_dynptr_read()).
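On the BPF side, usage might look like the following sketch (illustrative
only: the map name, sample layout, and attach point are invented, and the
snippet assumes the usual vmlinux.h / bpf_helpers.h includes rather than
being a drop-in program):

```c
/* Hypothetical user-producer ring buffer; the name and size are invented. */
struct {
	__uint(type, BPF_MAP_TYPE_USER_RINGBUF);
	__uint(max_entries, 256 * 1024);
} user_ringbuf SEC(".maps");

struct user_sample {
	__u64 id;	/* hypothetical payload published by user-space */
};

static long handle_sample(struct bpf_dynptr *dynptr, void *ctx)
{
	struct user_sample sample;

	/* Copy the payload out of the dynptr handed to us by the helper. */
	if (bpf_dynptr_read(&sample, sizeof(sample), dynptr, 0, 0))
		return 0;	/* short sample: keep draining */

	/* ... act on sample.id ... */
	return 0;		/* 0 = drain the next sample, 1 = stop early */
}

SEC("fentry/do_sys_open")	/* hypothetical attach point */
int drain_user_samples(void *ctx)
{
	long num_drained;

	num_drained = bpf_user_ringbuf_drain(&user_ringbuf, handle_sample,
					     NULL, 0);
	/* num_drained is the count of consumed samples, or a negative error. */
	return 0;
}
```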

In order to support bpf_user_ringbuf_drain(), a new PTR_TO_DYNPTR register
type is added to the verifier to reflect a dynptr that was allocated by
a helper function and passed to a BPF program. Unlike PTR_TO_STACK
dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR
dynptrs need not use reference tracking, as the BPF helper is trusted to
properly free the dynptr before returning. The verifier currently only
supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL.

Note that while the corresponding user-space libbpf logic will be added
in a subsequent patch, this patch does contain an implementation of the
.map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This
.map_poll() callback guarantees that an epoll-waiting user-space
producer will receive at least one event notification whenever at least
one sample is drained in an invocation of bpf_user_ringbuf_drain(),
provided that the function is not invoked with the BPF_RB_NO_WAKEUP
flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup
notification is sent even if no sample was drained.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com

Authored by David Vernet, committed by Andrii Nakryiko
20571567 583c1f42

6 files changed, 320 insertions(+), 11 deletions(-)
include/linux/bpf.h (+9 -2)
···
 	/* DYNPTR points to memory local to the bpf program. */
 	DYNPTR_TYPE_LOCAL	= BIT(8 + BPF_BASE_TYPE_BITS),

-	/* DYNPTR points to a ringbuf record. */
+	/* DYNPTR points to a kernel-produced ringbuf record. */
 	DYNPTR_TYPE_RINGBUF	= BIT(9 + BPF_BASE_TYPE_BITS),

 	/* Size is known at compile time. */
···
 	PTR_TO_MEM,		/* reg points to valid memory region */
 	PTR_TO_BUF,		/* reg points to a read/write buffer */
 	PTR_TO_FUNC,		/* reg points to a bpf program function */
+	PTR_TO_DYNPTR,		/* reg points to a dynptr */
 	__BPF_REG_TYPE_MAX,

 	/* Extended reg_types. */
···
 #define BPF_MAP_CAN_READ  BIT(0)
 #define BPF_MAP_CAN_WRITE BIT(1)
+
+/* Maximum number of user-producer ring buffer samples that can be drained in
+ * a call to bpf_user_ringbuf_drain().
+ */
+#define BPF_MAX_USER_RINGBUF_SAMPLES (128 * 1024)

 static inline u32 bpf_map_flags_to_cap(struct bpf_map *map)
 {
···
 extern const struct bpf_func_proto bpf_copy_from_user_task_proto;
 extern const struct bpf_func_proto bpf_set_retval_proto;
 extern const struct bpf_func_proto bpf_get_retval_proto;
+extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;

 const struct bpf_func_proto *tracing_prog_func_proto(
 	enum bpf_func_id func_id, const struct bpf_prog *prog);
···
 	BPF_DYNPTR_TYPE_INVALID,
 	/* Points to memory that is local to the bpf program */
 	BPF_DYNPTR_TYPE_LOCAL,
-	/* Underlying data is a ringbuf record */
+	/* Underlying data is a kernel-produced ringbuf record */
 	BPF_DYNPTR_TYPE_RINGBUF,
 };
include/uapi/linux/bpf.h (+38)
···
  * Return
  *	Current *ktime*.
  *
+ * long bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags)
+ *	Description
+ *		Drain samples from the specified user ring buffer, and invoke
+ *		the provided callback for each such sample:
+ *
+ *		long (\*callback_fn)(struct bpf_dynptr \*dynptr, void \*ctx);
+ *
+ *		If **callback_fn** returns 0, the helper will continue to try
+ *		and drain the next sample, up to a maximum of
+ *		BPF_MAX_USER_RINGBUF_SAMPLES samples. If the return value is 1,
+ *		the helper will skip the rest of the samples and return. Other
+ *		return values are not used now, and will be rejected by the
+ *		verifier.
+ *	Return
+ *		The number of drained samples if no error was encountered while
+ *		draining samples, or 0 if no samples were present in the ring
+ *		buffer. If a user-space producer was epoll-waiting on this map,
+ *		and at least one sample was drained, they will receive an event
+ *		notification notifying them of available space in the ring
+ *		buffer. If the BPF_RB_NO_WAKEUP flag is passed to this
+ *		function, no wakeup notification will be sent. If the
+ *		BPF_RB_FORCE_WAKEUP flag is passed, a wakeup notification will
+ *		be sent even if no sample was drained.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EBUSY** if the ring buffer is contended, and another calling
+ *		context was concurrently draining the ring buffer.
+ *
+ *		**-EINVAL** if user-space is not properly tracking the ring
+ *		buffer due to the producer position not being aligned to 8
+ *		bytes, a sample not being aligned to 8 bytes, or the producer
+ *		position not matching the advertised length of a sample.
+ *
+ *		**-E2BIG** if user-space has tried to publish a sample which is
+ *		larger than the size of the ring buffer, or which cannot fit
+ *		within a struct bpf_dynptr.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
···
 	FN(tcp_raw_check_syncookie_ipv4),	\
 	FN(tcp_raw_check_syncookie_ipv6),	\
 	FN(ktime_get_tai_ns),		\
+	FN(user_ringbuf_drain),		\
 	/* */

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
kernel/bpf/helpers.c (+2)
···
 		return &bpf_for_each_map_elem_proto;
 	case BPF_FUNC_loop:
 		return &bpf_loop_proto;
+	case BPF_FUNC_user_ringbuf_drain:
+		return &bpf_user_ringbuf_drain_proto;
 	default:
 		break;
 	}
kernel/bpf/ringbuf.c (+176 -5)
···
 	struct page **pages;
 	int nr_pages;
 	spinlock_t spinlock ____cacheline_aligned_in_smp;
+	/* For user-space producer ring buffers, an atomic_t busy bit is used
+	 * to synchronize access to the ring buffers in the kernel, rather than
+	 * the spinlock that is used for kernel-producer ring buffers. This is
+	 * done because the ring buffer must hold a lock across a BPF program's
+	 * callback:
+	 *
+	 *    __bpf_user_ringbuf_peek() // lock acquired
+	 * -> program callback_fn()
+	 * -> __bpf_user_ringbuf_sample_release() // lock released
+	 *
+	 * It is unsafe and incorrect to hold an IRQ spinlock across what could
+	 * be a long execution window, so we instead simply disallow concurrent
+	 * access to the ring buffer by kernel consumers, and return -EBUSY from
+	 * __bpf_user_ringbuf_peek() if the busy bit is held by another task.
+	 */
+	atomic_t busy ____cacheline_aligned_in_smp;
 	/* Consumer and producer counters are put into separate pages to
 	 * allow each position to be mapped with different permissions.
 	 * This prevents a user-space application from modifying the
···
 		return NULL;

 	spin_lock_init(&rb->spinlock);
+	atomic_set(&rb->busy, 0);
 	init_waitqueue_head(&rb->waitq);
 	init_irq_work(&rb->work, bpf_ringbuf_notify);
···
 	return prod_pos - cons_pos;
 }

-static __poll_t ringbuf_map_poll(struct bpf_map *map, struct file *filp,
-				 struct poll_table_struct *pts)
+static u32 ringbuf_total_data_sz(const struct bpf_ringbuf *rb)
+{
+	return rb->mask + 1;
+}
+
+static __poll_t ringbuf_map_poll_kern(struct bpf_map *map, struct file *filp,
+				      struct poll_table_struct *pts)
 {
 	struct bpf_ringbuf_map *rb_map;
···
 	return 0;
 }

+static __poll_t ringbuf_map_poll_user(struct bpf_map *map, struct file *filp,
+				      struct poll_table_struct *pts)
+{
+	struct bpf_ringbuf_map *rb_map;
+
+	rb_map = container_of(map, struct bpf_ringbuf_map, map);
+	poll_wait(filp, &rb_map->rb->waitq, pts);
+
+	if (ringbuf_avail_data_sz(rb_map->rb) < ringbuf_total_data_sz(rb_map->rb))
+		return EPOLLOUT | EPOLLWRNORM;
+	return 0;
+}
+
 BTF_ID_LIST_SINGLE(ringbuf_map_btf_ids, struct, bpf_ringbuf_map)
 const struct bpf_map_ops ringbuf_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc = ringbuf_map_alloc,
 	.map_free = ringbuf_map_free,
 	.map_mmap = ringbuf_map_mmap_kern,
-	.map_poll = ringbuf_map_poll,
+	.map_poll = ringbuf_map_poll_kern,
 	.map_lookup_elem = ringbuf_map_lookup_elem,
 	.map_update_elem = ringbuf_map_update_elem,
 	.map_delete_elem = ringbuf_map_delete_elem,
···
 	.map_alloc = ringbuf_map_alloc,
 	.map_free = ringbuf_map_free,
 	.map_mmap = ringbuf_map_mmap_user,
+	.map_poll = ringbuf_map_poll_user,
 	.map_lookup_elem = ringbuf_map_lookup_elem,
 	.map_update_elem = ringbuf_map_update_elem,
 	.map_delete_elem = ringbuf_map_delete_elem,
···
 		return NULL;

 	len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
-	if (len > rb->mask + 1)
+	if (len > ringbuf_total_data_sz(rb))
 		return NULL;

 	cons_pos = smp_load_acquire(&rb->consumer_pos);
···
 	case BPF_RB_AVAIL_DATA:
 		return ringbuf_avail_data_sz(rb);
 	case BPF_RB_RING_SIZE:
-		return rb->mask + 1;
+		return ringbuf_total_data_sz(rb);
 	case BPF_RB_CONS_POS:
 		return smp_load_acquire(&rb->consumer_pos);
 	case BPF_RB_PROD_POS:
···
 	.ret_type = RET_VOID,
 	.arg1_type = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
 	.arg2_type = ARG_ANYTHING,
 };
+
+static int __bpf_user_ringbuf_peek(struct bpf_ringbuf *rb, void **sample, u32 *size)
+{
+	int err;
+	u32 hdr_len, sample_len, total_len, flags, *hdr;
+	u64 cons_pos, prod_pos;
+
+	/* Synchronizes with smp_store_release() in user-space producer. */
+	prod_pos = smp_load_acquire(&rb->producer_pos);
+	if (prod_pos % 8)
+		return -EINVAL;
+
+	/* Synchronizes with smp_store_release() in __bpf_user_ringbuf_sample_release() */
+	cons_pos = smp_load_acquire(&rb->consumer_pos);
+	if (cons_pos >= prod_pos)
+		return -ENODATA;
+
+	hdr = (u32 *)((uintptr_t)rb->data + (uintptr_t)(cons_pos & rb->mask));
+	/* Synchronizes with smp_store_release() in user-space producer. */
+	hdr_len = smp_load_acquire(hdr);
+	flags = hdr_len & (BPF_RINGBUF_BUSY_BIT | BPF_RINGBUF_DISCARD_BIT);
+	sample_len = hdr_len & ~flags;
+	total_len = round_up(sample_len + BPF_RINGBUF_HDR_SZ, 8);
+
+	/* The sample must fit within the region advertised by the producer position. */
+	if (total_len > prod_pos - cons_pos)
+		return -EINVAL;
+
+	/* The sample must fit within the data region of the ring buffer. */
+	if (total_len > ringbuf_total_data_sz(rb))
+		return -E2BIG;
+
+	/* The sample must fit into a struct bpf_dynptr. */
+	err = bpf_dynptr_check_size(sample_len);
+	if (err)
+		return -E2BIG;
+
+	if (flags & BPF_RINGBUF_DISCARD_BIT) {
+		/* If the discard bit is set, the sample should be skipped.
+		 *
+		 * Update the consumer pos, and return -EAGAIN so the caller
+		 * knows to skip this sample and try to read the next one.
+		 */
+		smp_store_release(&rb->consumer_pos, cons_pos + total_len);
+		return -EAGAIN;
+	}
+
+	if (flags & BPF_RINGBUF_BUSY_BIT)
+		return -ENODATA;
+
+	*sample = (void *)((uintptr_t)rb->data +
+			   (uintptr_t)((cons_pos + BPF_RINGBUF_HDR_SZ) & rb->mask));
+	*size = sample_len;
+	return 0;
+}
+
+static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags)
+{
+	u64 consumer_pos;
+	u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
+
+	/* Using smp_load_acquire() is unnecessary here, as the busy-bit
+	 * prevents another task from writing to consumer_pos after it was read
+	 * by this task with smp_load_acquire() in __bpf_user_ringbuf_peek().
+	 */
+	consumer_pos = rb->consumer_pos;
+	/* Synchronizes with smp_load_acquire() in user-space producer. */
+	smp_store_release(&rb->consumer_pos, consumer_pos + rounded_size);
+}
+
+BPF_CALL_4(bpf_user_ringbuf_drain, struct bpf_map *, map,
+	   void *, callback_fn, void *, callback_ctx, u64, flags)
+{
+	struct bpf_ringbuf *rb;
+	long samples, discarded_samples = 0, ret = 0;
+	bpf_callback_t callback = (bpf_callback_t)callback_fn;
+	u64 wakeup_flags = BPF_RB_NO_WAKEUP | BPF_RB_FORCE_WAKEUP;
+	int busy = 0;
+
+	if (unlikely(flags & ~wakeup_flags))
+		return -EINVAL;
+
+	rb = container_of(map, struct bpf_ringbuf_map, map)->rb;
+
+	/* If another consumer is already consuming a sample, wait for them to finish. */
+	if (!atomic_try_cmpxchg(&rb->busy, &busy, 1))
+		return -EBUSY;
+
+	for (samples = 0; samples < BPF_MAX_USER_RINGBUF_SAMPLES && ret == 0; samples++) {
+		int err;
+		u32 size;
+		void *sample;
+		struct bpf_dynptr_kern dynptr;
+
+		err = __bpf_user_ringbuf_peek(rb, &sample, &size);
+		if (err) {
+			if (err == -ENODATA) {
+				break;
+			} else if (err == -EAGAIN) {
+				discarded_samples++;
+				continue;
+			} else {
+				ret = err;
+				goto schedule_work_return;
+			}
+		}
+
+		bpf_dynptr_init(&dynptr, sample, BPF_DYNPTR_TYPE_LOCAL, 0, size);
+		ret = callback((uintptr_t)&dynptr, (uintptr_t)callback_ctx, 0, 0, 0);
+		__bpf_user_ringbuf_sample_release(rb, size, flags);
+	}
+	ret = samples - discarded_samples;
+
+schedule_work_return:
+	/* Prevent the clearing of the busy-bit from being reordered before the
+	 * storing of any rb consumer or producer positions.
+	 */
+	smp_mb__before_atomic();
+	atomic_set(&rb->busy, 0);
+
+	if (flags & BPF_RB_FORCE_WAKEUP)
+		irq_work_queue(&rb->work);
+	else if (!(flags & BPF_RB_NO_WAKEUP) && samples > 0)
+		irq_work_queue(&rb->work);
+	return ret;
+}
+
+const struct bpf_func_proto bpf_user_ringbuf_drain_proto = {
+	.func		= bpf_user_ringbuf_drain,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_FUNC,
+	.arg3_type	= ARG_PTR_TO_STACK_OR_NULL,
+	.arg4_type	= ARG_ANYTHING,
 };
kernel/bpf/verifier.c (+57 -4)
···
 	[PTR_TO_BUF]		= "buf",
 	[PTR_TO_FUNC]		= "func",
 	[PTR_TO_MAP_KEY]	= "map_key",
+	[PTR_TO_DYNPTR]		= "dynptr_ptr",
 	};

 	if (type & PTR_MAYBE_NULL) {
···
 static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
+static const struct bpf_reg_types dynptr_types = {
+	.types = {
+		PTR_TO_STACK,
+		PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL,
+	}
+};

 static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_MAP_KEY]	= &map_key_value_types,
···
 	[ARG_PTR_TO_CONST_STR]	= &const_str_ptr_types,
 	[ARG_PTR_TO_TIMER]	= &timer_types,
 	[ARG_PTR_TO_KPTR]	= &kptr_types,
-	[ARG_PTR_TO_DYNPTR]	= &stack_ptr_types,
+	[ARG_PTR_TO_DYNPTR]	= &dynptr_types,
 };

 static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
···
 		err = check_mem_size_reg(env, reg, regno, true, meta);
 		break;
 	case ARG_PTR_TO_DYNPTR:
+		/* We only need to check for initialized / uninitialized helper
+		 * dynptr args if the dynptr is not PTR_TO_DYNPTR, as the
+		 * assumption is that if it is, that a helper function
+		 * initialized the dynptr on behalf of the BPF program.
+		 */
+		if (base_type(reg->type) == PTR_TO_DYNPTR)
+			break;
 		if (arg_type & MEM_UNINIT) {
 			if (!is_dynptr_reg_valid_uninit(env, reg)) {
 				verbose(env, "Dynptr has to be an uninitialized dynptr\n");
···
 			goto error;
 		break;
 	case BPF_MAP_TYPE_USER_RINGBUF:
-		goto error;
+		if (func_id != BPF_FUNC_user_ringbuf_drain)
+			goto error;
+		break;
 	case BPF_MAP_TYPE_STACK_TRACE:
 		if (func_id != BPF_FUNC_get_stackid)
 			goto error;
···
 	case BPF_FUNC_ringbuf_submit_dynptr:
 	case BPF_FUNC_ringbuf_discard_dynptr:
 		if (map->map_type != BPF_MAP_TYPE_RINGBUF)
 			goto error;
 		break;
+	case BPF_FUNC_user_ringbuf_drain:
+		if (map->map_type != BPF_MAP_TYPE_USER_RINGBUF)
+			goto error;
+		break;
 	case BPF_FUNC_get_stackid:
···
 	return 0;
 }

+static int set_user_ringbuf_callback_state(struct bpf_verifier_env *env,
+					   struct bpf_func_state *caller,
+					   struct bpf_func_state *callee,
+					   int insn_idx)
+{
+	/* bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn,
+	 *			  void *callback_ctx, u64 flags);
+	 * callback_fn(struct bpf_dynptr_t *dynptr, void *callback_ctx);
+	 */
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_0]);
+	callee->regs[BPF_REG_1].type = PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL;
+	__mark_reg_known_zero(&callee->regs[BPF_REG_1]);
+	callee->regs[BPF_REG_2] = caller->regs[BPF_REG_3];
+
+	/* unused */
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_3]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
+
+	callee->in_callback_fn = true;
+	return 0;
+}
+
 static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 {
 	struct bpf_verifier_state *state = env->cur_state;
···
 	case BPF_FUNC_dynptr_data:
 		for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
 			if (arg_type_is_dynptr(fn->arg_type[i])) {
+				struct bpf_reg_state *reg = &regs[BPF_REG_1 + i];
+
 				if (meta.ref_obj_id) {
 					verbose(env, "verifier internal error: meta.ref_obj_id already set\n");
 					return -EFAULT;
 				}
-				/* Find the id of the dynptr we're tracking the reference of */
-				meta.ref_obj_id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]);
+
+				if (base_type(reg->type) != PTR_TO_DYNPTR)
+					/* Find the id of the dynptr we're
+					 * tracking the reference of
+					 */
+					meta.ref_obj_id = stack_slot_get_id(env, reg);
 				break;
 			}
 		}
···
 			verbose(env, "verifier internal error: no dynptr in bpf_dynptr_data()\n");
 			return -EFAULT;
 		}
+		break;
+	case BPF_FUNC_user_ringbuf_drain:
+		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+					set_user_ringbuf_callback_state);
 		break;
 	}
tools/include/uapi/linux/bpf.h (+38)
···
  * Return
  *	Current *ktime*.
  *
+ * long bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags)
+ *	Description
+ *		Drain samples from the specified user ring buffer, and invoke
+ *		the provided callback for each such sample:
+ *
+ *		long (\*callback_fn)(struct bpf_dynptr \*dynptr, void \*ctx);
+ *
+ *		If **callback_fn** returns 0, the helper will continue to try
+ *		and drain the next sample, up to a maximum of
+ *		BPF_MAX_USER_RINGBUF_SAMPLES samples. If the return value is 1,
+ *		the helper will skip the rest of the samples and return. Other
+ *		return values are not used now, and will be rejected by the
+ *		verifier.
+ *	Return
+ *		The number of drained samples if no error was encountered while
+ *		draining samples, or 0 if no samples were present in the ring
+ *		buffer. If a user-space producer was epoll-waiting on this map,
+ *		and at least one sample was drained, they will receive an event
+ *		notification notifying them of available space in the ring
+ *		buffer. If the BPF_RB_NO_WAKEUP flag is passed to this
+ *		function, no wakeup notification will be sent. If the
+ *		BPF_RB_FORCE_WAKEUP flag is passed, a wakeup notification will
+ *		be sent even if no sample was drained.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EBUSY** if the ring buffer is contended, and another calling
+ *		context was concurrently draining the ring buffer.
+ *
+ *		**-EINVAL** if user-space is not properly tracking the ring
+ *		buffer due to the producer position not being aligned to 8
+ *		bytes, a sample not being aligned to 8 bytes, or the producer
+ *		position not matching the advertised length of a sample.
+ *
+ *		**-E2BIG** if user-space has tried to publish a sample which is
+ *		larger than the size of the ring buffer, or which cannot fit
+ *		within a struct bpf_dynptr.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
···
 	FN(tcp_raw_check_syncookie_ipv4),	\
 	FN(tcp_raw_check_syncookie_ipv6),	\
 	FN(ktime_get_tai_ns),		\
+	FN(user_ringbuf_drain),		\
 	/* */

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper