Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bpf: Enable IRQ after irq_work_raise() completes in unit_free{_rcu}()

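Both unit_free() and unit_free_rcu() invoke irq_work_raise() to return
freed objects to slab. IRQs have already been re-enabled at that point,
so the invocation may be preempted partway through, after
irq_work_claim() has marked the irq work as pending. If the preempting
task calls unit_alloc(), unit_alloc() may return NULL unexpectedly, as
shown in the following case: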

  task A                          task B

  unit_free()
    // high_watermark = 48
    // free_cnt = 49 after free
    irq_work_raise()
      // mark irq work as IRQ_WORK_PENDING
      irq_work_claim()

                                  // task B preempts task A
                                  unit_alloc()
                                  // free_cnt = 48 after alloc

                                  // does unit_alloc() 32 times
                                  ......
                                  // free_cnt = 16

                                  unit_alloc()
                                  // free_cnt = 15 after alloc
                                  // irq work is already PENDING,
                                  // so just return
                                  irq_work_raise()

                                  // does unit_alloc() 15 times
                                  ......
                                  // free_cnt = 0

                                  unit_alloc()
                                  // free_cnt = 0 before alloc
                                  return NULL

Fix it by re-enabling IRQs only after irq_work_raise() completes, so
that the irq work has been queued before the current task can be
preempted.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230901111954.1804721-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Authored by Hou Tao and committed by Alexei Starovoitov
62cf51cb 566f6de3

+7 -2
kernel/bpf/memalloc.c
···
 		llist_add(llnode, &c->free_llist_extra);
 	}
 	local_dec(&c->active);
-	local_irq_restore(flags);
 
 	if (cnt > c->high_watermark)
 		/* free few objects from current cpu into global kmalloc pool */
 		irq_work_raise(c);
+	/* Enable IRQ after irq_work_raise() completes, otherwise when current
+	 * task is preempted by task which does unit_alloc(), unit_alloc() may
+	 * return NULL unexpectedly because irq work is already pending but can
+	 * not been triggered and free_llist can not be refilled timely.
+	 */
+	local_irq_restore(flags);
 }
 
 static void notrace unit_free_rcu(struct bpf_mem_cache *c, void *ptr)
···
 		llist_add(llnode, &c->free_llist_extra_rcu);
 	}
 	local_dec(&c->active);
-	local_irq_restore(flags);
 
 	if (!atomic_read(&c->call_rcu_in_progress))
 		irq_work_raise(c);
+	local_irq_restore(flags);
 }
 
 /* Called from BPF program or from sys_bpf syscall.