Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

RDMAVT: Fix synchronization around percpu_ref

rvt_mregion uses percpu_ref for reference counting and RCU to protect
accesses from lkey_table. When a rvt_mregion needs to be freed, it
first gets unregistered from lkey_table and then rvt_check_refs() is
called to wait for in-flight usages before the rvt_mregion is freed.

rvt_check_refs() seems to have a couple of issues.

* It has a fast exit path which tests percpu_ref_is_zero(). However,
a percpu_ref reading zero doesn't mean that the object can be
released. In fact, the ->release() callback might not even have
started executing yet. Proceeding with freeing can lead to
use-after-free.

* lkey_table is RCU protected but there is no RCU grace period in the
free path. percpu_ref uses RCU internally but it's sched-RCU whose
grace periods are different from regular RCU. Also, it generally
isn't a good idea to depend on internal behaviors like this.

To address the above issues, this patch removes the fast exit and adds
an explicit synchronize_rcu().
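
The first issue can be illustrated with a small userspace model. This is
only a sketch under stated assumptions: it uses pthreads and C11 atomics
rather than percpu_ref, every model_* name is hypothetical, and it does not
model the RCU grace period at all. It shows only why teardown should wait
for the release step to signal completion instead of testing whether the
count reads zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a refcounted memory region. */
struct model_mr {
	atomic_int refcount;	/* stands in for the percpu_ref */
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool released;		/* stands in for the completion */
};

static void model_mr_init(struct model_mr *mr)
{
	atomic_init(&mr->refcount, 1);	/* initial reference held by the owner */
	pthread_mutex_init(&mr->lock, NULL);
	pthread_cond_init(&mr->cond, NULL);
	mr->released = false;
}

static void model_mr_get(struct model_mr *mr)
{
	atomic_fetch_add(&mr->refcount, 1);
}

static void model_mr_put(struct model_mr *mr)
{
	if (atomic_fetch_sub(&mr->refcount, 1) != 1)
		return;
	/*
	 * The count already reads zero here, but the release step below has
	 * not run yet. A teardown path that only tests for zero could free
	 * the object inside this window.
	 */
	pthread_mutex_lock(&mr->lock);
	mr->released = true;
	pthread_cond_signal(&mr->cond);
	pthread_mutex_unlock(&mr->lock);
}

/* Wait for the release step itself, not for the count to read zero. */
static void model_mr_wait_released(struct model_mr *mr)
{
	pthread_mutex_lock(&mr->lock);
	while (!mr->released)
		pthread_cond_wait(&mr->cond, &mr->lock);
	pthread_mutex_unlock(&mr->lock);
}

static void *model_user(void *arg)
{
	model_mr_put(arg);	/* drop the in-flight reference */
	return NULL;
}

/* Returns 1 once teardown has observed the release step. */
int model_teardown_demo(void)
{
	struct model_mr mr;
	pthread_t user;
	int ok;

	model_mr_init(&mr);
	model_mr_get(&mr);		/* in-flight user takes a reference */
	if (pthread_create(&user, NULL, model_user, &mr))
		return 0;
	model_mr_put(&mr);		/* owner drops the initial reference */
	model_mr_wait_released(&mr);	/* only now is teardown safe */
	pthread_join(user, NULL);

	assert(atomic_load(&mr.refcount) == 0);
	ok = mr.released ? 1 : 0;
	pthread_cond_destroy(&mr.cond);
	pthread_mutex_destroy(&mr.lock);
	return ok;
}
```

In this model, model_mr_wait_released() plays the role that
wait_for_completion_timeout() plays in rvt_check_refs(). The removed fast
exit would correspond to returning as soon as refcount reads zero.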

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: linux-rdma@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>

+6 -4
drivers/infiniband/sw/rdmavt/mr.c
···
 	unsigned long timeout;
 	struct rvt_dev_info *rdi = ib_to_rvt(mr->pd->device);
 
-	if (percpu_ref_is_zero(&mr->refcount))
-		return 0;
-	/* avoid dma mr */
-	if (mr->lkey)
+	if (mr->lkey) {
+		/* avoid dma mr */
 		rvt_dereg_clean_qps(mr);
+		/* @mr was indexed on rcu protected @lkey_table */
+		synchronize_rcu();
+	}
+
 	timeout = wait_for_completion_timeout(&mr->comp, 5 * HZ);
 	if (!timeout) {
 		rvt_pr_err(rdi,