Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

io-wq: fix potential race of acct->nr_workers

Suppose max_worker is 1 and the only running worker is in the middle of
exiting. The following race is then possible:
  io_wqe_enqueue                      worker1
                                      no work there and timeout
                                      unlock(wqe->lock)
  ->insert work
                                      -->io_worker_exit
  lock(wqe->lock)
  ->if(!nr_workers) //it's still 1
  unlock(wqe->lock)
  goto run_cancel
                                      lock(wqe->lock)
                                      nr_workers--
                                      ->dec_running
                                      ->worker creation fails
                                      unlock(wqe->lock)

One work item has been enqueued but no worker is left to run it, which
causes a hang.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
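
The interleaving described in the commit message above can be replayed deterministically in a small user-space model. This is only a sketch under stated assumptions: `struct acct_model`, `replay_race`, and the `dec_at_decision` flag are hypothetical names introduced for illustration, not kernel code. The model runs on a single thread, ignores locking, and does not reproduce the exact exit condition in the patch. It only shows how the point at which `nr_workers` is decremented changes what the enqueue side observes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the accounting state. */
struct acct_model {
	int nr_workers;   /* workers still counted as alive */
	int pending_work; /* work items queued but not yet run */
	bool cancelled;   /* enqueue side took the run_cancel path */
};

/*
 * Replay the interleaving from the commit message on a single thread.
 * dec_at_decision == false: the count drops late, in the exit path.
 * dec_at_decision == true:  the count drops when the worker decides to exit.
 * Returns true if work is left queued with no worker and no cancel (a hang).
 */
static bool replay_race(bool dec_at_decision)
{
	struct acct_model a = { .nr_workers = 1, .pending_work = 0, .cancelled = false };

	/* worker1: no work, timeout, decides to exit */
	if (dec_at_decision)
		a.nr_workers--;

	/* enqueue side: insert work; creating a new worker is assumed to fail */
	a.pending_work++;
	if (a.nr_workers == 0) {
		/* no worker left: cancel the work instead of leaving it queued */
		a.pending_work--;
		a.cancelled = true;
	}

	/* worker1: the exit path runs only after the enqueue-side check */
	if (!dec_at_decision)
		a.nr_workers--;

	return a.pending_work > 0 && a.nr_workers == 0 && !a.cancelled;
}
```

In this model, calling `replay_race(false)` reports the stranded work item, while `replay_race(true)` lets the enqueue side see that no worker remains and cancel the work.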

authored by Hao Xu and committed by Jens Axboe
767a65e9 7a842fb5

+1 -2
fs/io-wq.c
···
 static void io_worker_exit(struct io_worker *worker)
 {
 	struct io_wqe *wqe = worker->wqe;
-	struct io_wqe_acct *acct = io_wqe_get_acct(worker);

 	if (refcount_dec_and_test(&worker->ref))
 		complete(&worker->ref_done);
···
 	if (worker->flags & IO_WORKER_F_FREE)
 		hlist_nulls_del_rcu(&worker->nulls_node);
 	list_del_rcu(&worker->all_list);
-	acct->nr_workers--;
 	preempt_disable();
 	io_wqe_dec_running(worker);
 	worker->flags = 0;
···
 		}
 		/* timed out, exit unless we're the last worker */
 		if (last_timeout && acct->nr_workers > 1) {
+			acct->nr_workers--;
 			raw_spin_unlock(&wqe->lock);
 			__set_current_state(TASK_RUNNING);
 			break;
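
The pattern in the last hunk, checking the worker count and decrementing it inside the same critical section, can be sketched in user space as follows. This is an illustrative model, not the kernel implementation: `struct acct_model`, `timeout_should_exit`, and `workers_left_after_timeouts` are hypothetical names, and a pthread mutex is assumed as a stand-in for `wqe->lock`.

```c
#include <assert.h>
#include <stdbool.h>
#include <pthread.h>

/* Hypothetical user-space stand-in; the mutex plays the role of wqe->lock. */
struct acct_model {
	pthread_mutex_t lock;
	int nr_workers;
};

/*
 * Called by a worker whose idle wait timed out. Returns true if the
 * worker should exit. The check and the decrement share one critical
 * section, so the count never includes a worker that has already
 * decided to leave.
 */
static bool timeout_should_exit(struct acct_model *a)
{
	bool exit_now = false;

	pthread_mutex_lock(&a->lock);
	if (a->nr_workers > 1) {
		a->nr_workers--;
		exit_now = true;
	}
	pthread_mutex_unlock(&a->lock);
	return exit_now;
}

/* Helper for experiments: n workers time out one after another. */
static int workers_left_after_timeouts(int n)
{
	struct acct_model a;
	int left;

	pthread_mutex_init(&a.lock, NULL);
	a.nr_workers = n;
	for (int i = 0; i < n; i++)
		timeout_should_exit(&a);
	left = a.nr_workers;
	pthread_mutex_destroy(&a.lock);
	return left;
}
```

Because each call is atomic under the lock, running the timeouts sequentially covers the same outcomes as running them concurrently: however many workers time out, the model always keeps one. If the decrement instead happened later, outside this critical section, two workers could both observe a count above 1 and both exit.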