Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

svcrdma: Release transport resources synchronously

NFSD has always supported adding network listeners. The new netlink
protocol now also enables the removal of listeners.

Olga noticed that if an RDMA listener is removed and immediately
re-added, the deferred __svc_rdma_free() function might not have
run yet, so some or all of the old listener's RDMA resources
linger, which prevents a new listener on the same address from
being created.
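
The remove-then-re-add race can be sketched with a small userspace model. This is only an illustration: every toy_* name is hypothetical, and the "bound" flag stands in for whatever RDMA resources keep the address busy. It contrasts a deferred release, where the address stays held until the queued work runs, with a synchronous one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct toy_listener {
	int  port;          /* address the listener is bound to */
	bool bound;         /* true while resources still hold the address */
	bool free_pending;  /* deferred teardown queued but not yet run */
};

#define TOY_MAX 4
static struct toy_listener toy_table[TOY_MAX];

/* Returns the new listener, or NULL if the port is still held. */
static struct toy_listener *toy_listen(int port)
{
	struct toy_listener *slot = NULL;

	for (size_t i = 0; i < TOY_MAX; i++) {
		if (toy_table[i].bound && toy_table[i].port == port)
			return NULL;	/* old resources linger: address in use */
		if (!toy_table[i].bound && !slot)
			slot = &toy_table[i];
	}
	if (slot) {
		slot->port = port;
		slot->bound = true;
		slot->free_pending = false;
	}
	return slot;
}

/* Old behaviour (modelled): only queue the teardown; the port stays held. */
static void toy_free_deferred(struct toy_listener *l)
{
	assert(l != NULL);
	l->free_pending = true;
}

/* Stand-in for the workqueue getting around to the queued teardown. */
static void toy_run_pending(void)
{
	for (size_t i = 0; i < TOY_MAX; i++) {
		if (toy_table[i].free_pending) {
			toy_table[i].bound = false;
			toy_table[i].free_pending = false;
		}
	}
}

/* New behaviour (modelled): the port is released before returning. */
static void toy_free_sync(struct toy_listener *l)
{
	assert(l != NULL);
	l->bound = false;
	l->free_pending = false;
}
```

In this model, calling toy_listen() on the same port right after toy_free_deferred() fails until toy_run_pending() has run, whereas it succeeds immediately after toy_free_sync().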

Also, svc_xprt_free() does a module_put() just after calling
->xpo_free(). That means if there is deferred work going on, the
module could be unloaded before that work is even started,
resulting in a UAF.
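
The ordering problem can be illustrated with another toy model, again using hypothetical names (toy_module, toy_xprt and friends are not kernel symbols). It assumes the transport holds the last module reference, and records whether the free routine ran after the modelled module was already gone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; these names are illustrative, not kernel symbols. */
struct toy_module {
	int  refcount;  /* stand-in for the module reference count */
	bool loaded;    /* false once the last reference is dropped */
};

struct toy_xprt {
	struct toy_module *owner;
	bool work_queued;       /* deferred free scheduled, not yet run */
	bool ran_after_unload;  /* set if the free ran with the module gone */
};

static void toy_module_put(struct toy_module *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0)
		m->loaded = false;	/* module may now be unloaded */
}

/* The actual teardown; in the real case this is module code. */
static void toy_do_free(struct toy_xprt *x)
{
	if (!x->owner->loaded)
		x->ran_after_unload = true;	/* models the use-after-free */
	x->work_queued = false;
}

/* Deferred variant: the free is only queued, then the reference is dropped. */
static void toy_xprt_free_deferred(struct toy_xprt *x)
{
	x->work_queued = true;
	toy_module_put(x->owner);
}

/* Stand-in for the workqueue running the queued free later. */
static void toy_run_work(struct toy_xprt *x)
{
	if (x->work_queued)
		toy_do_free(x);
}

/* Synchronous variant: teardown completes before the reference is dropped. */
static void toy_xprt_free_sync(struct toy_xprt *x)
{
	toy_do_free(x);
	toy_module_put(x->owner);
}
```

With a refcount of one, toy_xprt_free_deferred() followed by toy_run_work() sets ran_after_unload, while toy_xprt_free_sync() never does.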

Neil asks:
> What particular part of __svc_rdma_free() needs to run in order for a
> subsequent registration to succeed?
> Can that bit be run directly from svc_rdma_free() rather than be
> delayed?
> (I know almost nothing about rdma so forgive me if the answers to these
> questions seem obvious)

The reasons I can recall are:

- Some of the transport tear-down work can sleep
- Releasing a cm_id is tricky and can deadlock

We might be able to mitigate the second issue with judicious
application of transport reference counting.

Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Closes: https://lore.kernel.org/linux-nfs/20250821204328.89218-1-okorniev@redhat.com/
Suggested-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

+8 -11
net/sunrpc/xprtrdma/svc_rdma_transport.c
···
 	rdma_disconnect(rdma->sc_cm_id);
 }
 
-static void __svc_rdma_free(struct work_struct *work)
+/**
+ * svc_rdma_free - Release class-specific transport resources
+ * @xprt: Generic svc transport object
+ */
+static void svc_rdma_free(struct svc_xprt *xprt)
 {
 	struct svcxprt_rdma *rdma =
-		container_of(work, struct svcxprt_rdma, sc_work);
+		container_of(xprt, struct svcxprt_rdma, sc_xprt);
 	struct ib_device *device = rdma->sc_cm_id->device;
+
+	might_sleep();
 
 	/* This blocks until the Completion Queues are empty */
 	if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))
···
 	if (!test_bit(XPT_LISTENER, &rdma->sc_xprt.xpt_flags))
 		rpcrdma_rn_unregister(device, &rdma->sc_rn);
 	kfree(rdma);
-}
-
-static void svc_rdma_free(struct svc_xprt *xprt)
-{
-	struct svcxprt_rdma *rdma =
-		container_of(xprt, struct svcxprt_rdma, sc_xprt);
-
-	INIT_WORK(&rdma->sc_work, __svc_rdma_free);
-	schedule_work(&rdma->sc_work);
 }
 
 static int svc_rdma_has_wspace(struct svc_xprt *xprt)