Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

fs: annotate suspected data race between poll_schedule_timeout() and pollwake()

When running almost any select()/poll() workload intense enough,
KCSAN is likely to report data races around using 'triggered' flag
of 'struct poll_wqueues'. For example, running 'find /' on a tty
console may trigger the following:

BUG: KCSAN: data-race in poll_schedule_timeout / pollwake

write to 0xffffc900030cfb90 of 4 bytes by task 97 on cpu 5:
pollwake+0xd1/0x130
__wake_up_common_lock+0x7f/0xd0
n_tty_receive_buf_common+0x776/0xc30
n_tty_receive_buf2+0x3d/0x60
tty_ldisc_receive_buf+0x6b/0x100
tty_port_default_receive_buf+0x63/0xa0
flush_to_ldisc+0x169/0x3c0
process_scheduled_works+0x6fe/0xf40
worker_thread+0x53b/0x7b0
kthread+0x4f8/0x590
ret_from_fork+0x28c/0x450
ret_from_fork_asm+0x1a/0x30

read to 0xffffc900030cfb90 of 4 bytes by task 5802 on cpu 4:
poll_schedule_timeout+0x96/0x160
do_sys_poll+0x966/0xb30
__se_sys_ppoll+0x1c3/0x210
__x64_sys_ppoll+0x71/0x90
x64_sys_call+0x3079/0x32b0
do_syscall_64+0xfa/0x3b0
entry_SYSCALL_64_after_hwframe+0x77/0x7f

According to Jan, "there's no practical issue here because it is hard
to imagine how the compiler could compile the above code using some
intermediate values stored into 'triggered' or multiple fetches from
'triggered'". Nevertheless, silence KCSAN by using WRITE_ONCE() in
__pollwake() and READ_ONCE() in poll_schedule_timeout(), respectively.

Link: https://lore.kernel.org/linux-fsdevel/bwx72orsztfjx6aoftzzkl7wle3hi4syvusuwc7x36nw6t235e@bjwrosehblty
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://lore.kernel.org/20250620063059.1800689-1-dmantipov@yandex.ru
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

authored by

Dmitry Antipov and committed by
Christian Brauner
2b7c9664 2773d282

+2 -2
+2 -2
fs/select.c
··· 192 192 * and is paired with smp_store_mb() in poll_schedule_timeout. 193 193 */ 194 194 smp_wmb(); 195 - pwq->triggered = 1; 195 + WRITE_ONCE(pwq->triggered, 1); 196 196 197 197 /* 198 198 * Perform the default wake up operation using a dummy ··· 237 237 int rc = -EINTR; 238 238 239 239 set_current_state(state); 240 - if (!pwq->triggered) 240 + if (!READ_ONCE(pwq->triggered)) 241 241 rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS); 242 242 __set_current_state(TASK_RUNNING); 243 243