
net: page_pool: allow enabling recycling late, fix false positive warning

Page pool can have pages "directly" (locklessly) recycled to it,
if the NAPI that owns the page pool is scheduled to run on the same CPU.
To make this safe we check that the NAPI is disabled while we destroy
the page pool. In most cases NAPI and page pool lifetimes are tied
together so this happens naturally.

The queue API expects the following order of calls:
-> mem_alloc
alloc new pp
-> stop
napi_disable
-> start
napi_enable
-> mem_free
free old pp

Here we allocate the page pool in ->mem_alloc and free in ->mem_free.
But the NAPIs are only stopped between ->stop and ->start. We created
page_pool_disable_direct_recycling() to safely shut down the recycling
in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't
have to worry about recycling any more.
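The ->stop half of that contract can be sketched as below. This is illustrative only: my_queue_stop(), struct my_rxq and my_rxq_from_qmem() are invented names, and only napi_disable() and page_pool_disable_direct_recycling() are the real API:

```c
/* Hypothetical queue ->stop callback (sketch, not from the patch).
 * Recycling is shut down here, while the NAPI is being disabled,
 * so that the page_pool_destroy() later in ->mem_free does not race
 * with the (by then re-enabled) NAPI.
 */
static int my_queue_stop(struct net_device *dev, void *qmem, int idx)
{
	struct my_rxq *rxq = my_rxq_from_qmem(qmem);	/* invented helper */

	napi_disable(&rxq->napi);
	/* The old pool lives on until ->mem_free, but can no longer
	 * be recycled to directly from NAPI context.
	 */
	page_pool_disable_direct_recycling(rxq->page_pool);
	return 0;
}
```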

Unfortunately, page_pool_disable_direct_recycling() alone is not enough
to deal with failures which necessitate freeing the _new_ page pool.
If we hit a failure in ->mem_alloc or ->stop the new page pool has
to be freed while the NAPI is active (assuming driver attaches the
page pool to an existing NAPI instance and doesn't reallocate NAPIs).

Freeing the new page pool is technically safe, because it hasn't been
used for any packets yet, so there can be no recycling. But the check
in napi_assert_will_not_race() has no way of knowing that. We could
check whether the page pool is empty, but that would make the check
much less likely to trigger during development.

Add page_pool_enable_direct_recycling(), pairing with
page_pool_disable_direct_recycling(). It will allow us to create the new
page pools in "disabled" state and only enable recycling when we know
the reconfig operation will not fail.

Coincidentally it will also let us re-enable the recycling for the old
pool, if the reconfig failed:

-> mem_alloc (new)
-> stop (old)
# disables direct recycling for old
-> start (new)
# fail!!
-> start (old)
# go back to old pp but direct recycling is lost :(
-> mem_free (new)

The new helper is idempotent, to make life easier for drivers which
can operate in HDS mode and support zero-copy Rx. Such a driver can
call the helper twice whether it has two pools or multiple references
to a single pool.

Fixes: 40eca00ae605 ("bnxt_en: unlink page pool when stopping Rx queue")
Tested-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250805003654.2944974-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |  9 ++++++++-
 include/net/page_pool/types.h             |  2 ++
 net/core/page_pool.c                      | 29 +++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3819,7 +3819,6 @@
 	if (BNXT_RX_PAGE_MODE(bp))
 		pp.pool_size += bp->rx_ring_size / rx_size_fac;
 	pp.nid = numa_node;
-	pp.napi = &rxr->bnapi->napi;
 	pp.netdev = bp->dev;
 	pp.dev = &bp->pdev->dev;
 	pp.dma_dir = bp->rx_dir;
@@ -3848,6 +3849,12 @@
 	page_pool_destroy(rxr->page_pool);
 	rxr->page_pool = NULL;
 	return PTR_ERR(pool);
+}
+
+static void bnxt_enable_rx_page_pool(struct bnxt_rx_ring_info *rxr)
+{
+	page_pool_enable_direct_recycling(rxr->head_pool, &rxr->bnapi->napi);
+	page_pool_enable_direct_recycling(rxr->page_pool, &rxr->bnapi->napi);
 }
 
 static int bnxt_alloc_rx_agg_bmap(struct bnxt *bp, struct bnxt_rx_ring_info *rxr)
@@ -3894,6 +3889,7 @@
 		rc = bnxt_alloc_rx_page_pool(bp, rxr, cpu_node);
 		if (rc)
 			return rc;
+		bnxt_enable_rx_page_pool(rxr);
 
 		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i, 0);
 		if (rc < 0)
@@ -16037,6 +16031,7 @@
 		goto err_reset;
 	}
 
+	bnxt_enable_rx_page_pool(rxr);
 	napi_enable_locked(&bnapi->napi);
 	bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
 
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -265,6 +265,8 @@
 struct xdp_mem_info;
 
 #ifdef CONFIG_PAGE_POOL
+void page_pool_enable_direct_recycling(struct page_pool *pool,
+				       struct napi_struct *napi);
 void page_pool_disable_direct_recycling(struct page_pool *pool);
 void page_pool_destroy(struct page_pool *pool);
 void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1201,6 +1201,35 @@
 	pool->xdp_mem_id = mem->id;
 }
 
+/**
+ * page_pool_enable_direct_recycling() - mark page pool as owned by NAPI
+ * @pool: page pool to modify
+ * @napi: NAPI instance to associate the page pool with
+ *
+ * Associate a page pool with a NAPI instance for lockless page recycling.
+ * This is useful when a new page pool has to be added to a NAPI instance
+ * without disabling that NAPI instance, to mark the point at which control
+ * path "hands over" the page pool to the NAPI instance. In most cases driver
+ * can simply set the @napi field in struct page_pool_params, and does not
+ * have to call this helper.
+ *
+ * The function is idempotent, but does not implement any refcounting.
+ * Single page_pool_disable_direct_recycling() will disable recycling,
+ * no matter how many times enable was called.
+ */
+void page_pool_enable_direct_recycling(struct page_pool *pool,
+				       struct napi_struct *napi)
+{
+	if (READ_ONCE(pool->p.napi) == napi)
+		return;
+	WARN_ON(!napi || pool->p.napi);
+
+	mutex_lock(&page_pools_lock);
+	WRITE_ONCE(pool->p.napi, napi);
+	mutex_unlock(&page_pools_lock);
+}
+EXPORT_SYMBOL(page_pool_enable_direct_recycling);
+
 void page_pool_disable_direct_recycling(struct page_pool *pool)
 {
 	/* Disable direct recycling based on pool->cpuid.