Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

io_uring: free fixed_file_data after RCU grace period

The percpu refcount protects this structure, and we can have an atomic
switch in progress when exiting. This makes it unsafe to just free the
struct normally, and can trigger the following KASAN warning:

BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
Read of size 1 at addr ffff888181a19a30 by task swapper/0/0

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc4+ #5747
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x76/0xa0
print_address_description.constprop.0+0x3b/0x60
? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
__kasan_report.cold+0x1a/0x3d
? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
rcu_core+0x370/0x830
? percpu_ref_exit+0x50/0x50
? rcu_note_context_switch+0x7b0/0x7b0
? run_rebalance_domains+0x11d/0x140
__do_softirq+0x10a/0x3e9
irq_exit+0xd5/0xe0
smp_apic_timer_interrupt+0x86/0x200
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:default_idle+0x26/0x1f0

Fix this by punting the final exit and free of the struct to RCU, so that
it only happens once we know it is safe. Jann suggested using a double
RCU callback to achieve this. The nested call_rcu() callback is important:
with a single callback, the free could be ordered before the atomic
switch, even if the switch was already queued.

Reported-by: syzbot+e017e49c39ab484ac87a@syzkaller.appspotmail.com
Suggested-by: Jann Horn <jannh@google.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

+22 -2
fs/io_uring.c
···
 	struct llist_head put_llist;
 	struct work_struct ref_work;
 	struct completion done;
+	struct rcu_head rcu;
 };
 
 struct io_ring_ctx {
···
 	complete(&data->done);
 }
 
+static void __io_file_ref_exit_and_free(struct rcu_head *rcu)
+{
+	struct fixed_file_data *data = container_of(rcu, struct fixed_file_data,
+						rcu);
+	percpu_ref_exit(&data->refs);
+	kfree(data);
+}
+
+static void io_file_ref_exit_and_free(struct rcu_head *rcu)
+{
+	/*
+	 * We need to order our exit+free call against the potentially
+	 * existing call_rcu() for switching to atomic. One way to do that
+	 * is to have this rcu callback queue the final put and free, as we
+	 * could otherwise have a pre-existing atomic switch complete _after_
+	 * the free callback we queued.
+	 */
+	call_rcu(rcu, __io_file_ref_exit_and_free);
+}
+
 static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
 {
 	struct fixed_file_data *data = ctx->file_data;
···
 	flush_work(&data->ref_work);
 	wait_for_completion(&data->done);
 	io_ring_file_ref_flush(data);
-	percpu_ref_exit(&data->refs);
 
 	__io_sqe_files_unregister(ctx);
 	nr_tables = DIV_ROUND_UP(ctx->nr_user_files, IORING_MAX_FILES_TABLE);
 	for (i = 0; i < nr_tables; i++)
 		kfree(data->table[i].files);
 	kfree(data->table);
-	kfree(data);
+	call_rcu(&data->rcu, io_file_ref_exit_and_free);
 	ctx->file_data = NULL;
 	ctx->nr_user_files = 0;
 	return 0;