
Merge branch 'net-mlx5e-use-multiple-doorbells'

Tariq Toukan says:

====================
net/mlx5e: Use multiple doorbells

mlx5e uses a single MMIO-mapped doorbell per netdevice for all send and
receive operations. Writes to the doorbell go over the PCIe bus directly
to the device, which then services the indicated queues.

On certain architectures and with sufficiently high volume of doorbell
ringing (many cores, many active channels, small MTU, no GSO, etc.), the
MMIO-mapped doorbell address can become contended, leading to delays in
servicing writes to that address and a global slowdown of all traffic
for that netdevice.

mlx5 NICs have supported multiple doorbells for many years; the mlx5_ib
driver for the same hardware has traditionally used multiple doorbells.

This patch series extends the mlx5 Ethernet driver to also use multiple
doorbells to solve the MMIO contention issues. By allocating and using
more doorbells for all channel queues (TX and RX), the MMIO contention
on any particular doorbell address is reduced significantly.

The first patches are cleanups:
net/mlx5: Fix typo of MLX5_EQ_DOORBEL_OFFSET
net/mlx5: Remove unused 'offset' field from struct mlx5_sq_bfreg
net/mlx5e: Remove unused 'xsk' param of mlx5e_build_xdpsq_param

The next patch separates the global doorbell from Ethernet-specific
resources:
net/mlx5: Store the global doorbell in mlx5_priv

Next, plumbing to allow a different doorbell to be used for channel TX
and RX queues:
net/mlx5e: Prepare for using multiple TX doorbells
net/mlx5e: Prepare for using different CQ doorbells

Then, enable using multiple doorbells for channel queues:
net/mlx5e: Use multiple TX doorbells
net/mlx5e: Use multiple CQ doorbells

Finally, introduce a devlink parameter to control this:
devlink: Add a 'num_doorbells' driverinit param
net/mlx5e: Use the 'num_doorbells' devlink param
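Once applied, the new parameter is set through the standard devlink
driverinit flow. A sketch of the expected usage (the PCI address below is
illustrative, not taken from this series):

```shell
# Select 16 channel doorbells (device address is an example).
devlink dev param set pci/0000:08:00.0 \
        name num_doorbells value 16 cmode driverinit

# driverinit parameters take effect on the next devlink reload.
devlink dev reload pci/0000:08:00.0

# Read back the configured value.
devlink dev param show pci/0000:08:00.0 name num_doorbells
```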

Some performance results, measured with the Linux pktgen script, running
back-to-back (b2b) over ConnectX-8 NICs:
samples/pktgen/pktgen_sample02_multiqueue.sh -i $NIC -s 64 -d $DST_IP \
-m $MAC -t 64

Baseline (1 doorbell): 9 Mpps
This series (8 doorbells): 56 Mpps

Note that pktgen without 'burst' rings the doorbell after every packet,
while real packet TX using NAPI usually batches multiple pending packets
with the xmit_more mechanism. So this is in essence a micro-benchmark
showcasing the improvement of using multiple doorbells on platforms
affected by MMIO contention. Real-life traffic usually sees little
movement either way.
====================

Link: https://patch.msgid.link/1758031904-634231-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+163 -59
Documentation/networking/devlink/mlx5.rst (+9)
···
     echo 1 >/sys/bus/pci/rescan
     grep ^ /sys/bus/pci/devices/0000:01:00.0/sriov_*

+   * - ``num_doorbells``
+     - driverinit
+     - This controls the number of channel doorbells used by the netdev. In all
+       cases, an additional doorbell is allocated and used for non-channel
+       communication (e.g. for PTP, HWS, etc.). Supported values are:
+
+       - 0: No channel-specific doorbells, use the global one for everything.
+       - [1, max_num_channels]: Spread netdev channels equally across these
+         doorbells.

 The ``mlx5`` driver also implements the following driver-specific
 parameters.
drivers/infiniband/hw/mlx5/cq.c (+2 -2)
···
 {
         struct mlx5_core_dev *mdev = to_mdev(ibcq->device)->mdev;
         struct mlx5_ib_cq *cq = to_mcq(ibcq);
-        void __iomem *uar_page = mdev->priv.uar->map;
+        void __iomem *uar_page = mdev->priv.bfreg.up->map;
         unsigned long irq_flags;
         int ret = 0;
···
                 cq->buf.frag_buf.page_shift -
                 MLX5_ADAPTER_PAGE_SHIFT);

-        *index = dev->mdev->priv.uar->index;
+        *index = dev->mdev->priv.bfreg.up->index;

         return 0;
drivers/net/ethernet/mellanox/mlx5/core/cq.c (-1)
···
         mlx5_core_dbg(dev, "failed adding CP 0x%x to debug file system\n",
                       cq->cqn);

-        cq->uar = dev->priv.uar;
         cq->irqn = eq->core.irqn;

         return 0;
drivers/net/ethernet/mellanox/mlx5/core/devlink.c (+26)
···
         return 0;
 }

+static int mlx5_devlink_num_doorbells_validate(struct devlink *devlink, u32 id,
+                                               union devlink_param_value val,
+                                               struct netlink_ext_ack *extack)
+{
+        struct mlx5_core_dev *mdev = devlink_priv(devlink);
+        u32 val32 = val.vu32;
+        u32 max_num_channels;
+
+        max_num_channels = mlx5e_get_max_num_channels(mdev);
+        if (val32 > max_num_channels) {
+                NL_SET_ERR_MSG_FMT_MOD(extack,
+                                       "Requested num_doorbells (%u) exceeds maximum number of channels (%u)",
+                                       val32, max_num_channels);
+                return -EINVAL;
+        }
+
+        return 0;
+}
+
 static void mlx5_devlink_hairpin_params_init_values(struct devlink *devlink)
 {
         struct mlx5_core_dev *dev = devlink_priv(devlink);
···
                              "hairpin_queue_size", DEVLINK_PARAM_TYPE_U32,
                              BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL,
                              mlx5_devlink_hairpin_queue_size_validate),
+        DEVLINK_PARAM_GENERIC(NUM_DOORBELLS,
+                              BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL,
+                              mlx5_devlink_num_doorbells_validate),
 };

 static int mlx5_devlink_eth_params_register(struct devlink *devlink)
···
         mlx5_devlink_hairpin_params_init_values(devlink);

+        value.vu32 = MLX5_DEFAULT_NUM_DOORBELLS;
+        devl_param_driverinit_value_set(devlink,
+                                        DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS,
+                                        value);
         return 0;
 }
drivers/net/ethernet/mellanox/mlx5/core/en.h (+3)
···
         /* data path - accessed per napi poll */
         u16 event_ctr;
         struct napi_struct *napi;
+        struct mlx5_uars_page *uar;
         struct mlx5_core_cq mcq;
         struct mlx5e_ch_stats *ch_stats;
···
         int vec_ix;
         int sd_ix;
         int cpu;
+        struct mlx5_sq_bfreg *bfreg;
         /* Sync between icosq recovery and XSK enable/disable. */
         struct mutex icosq_recovery_lock;
···
         struct mlx5e_ch_stats *ch_stats;
         int node;
         int ix;
+        struct mlx5_uars_page *uar;
 };

 struct mlx5e_cq_param;
drivers/net/ethernet/mellanox/mlx5/core/en/params.c (+3 -3)
···
                 .ch_stats = c->stats,
                 .node = cpu_to_node(c->cpu),
                 .ix = c->vec_ix,
+                .uar = c->bfreg->up,
         };
 }
···
 {
         void *cqc = param->cqc;

-        MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index);
+        MLX5_SET(cqc, cqc, uar_page, mdev->priv.bfreg.up->index);
         if (MLX5_CAP_GEN(mdev, cqe_128_always) && cache_line_size() >= 128)
                 MLX5_SET(cqc, cqc, cqe_sz, CQE_STRIDE_128_PAD);
 }
···
 void mlx5e_build_xdpsq_param(struct mlx5_core_dev *mdev,
                              struct mlx5e_params *params,
-                             struct mlx5e_xsk_param *xsk,
                              struct mlx5e_sq_param *param)
 {
         void *sqc = param->sqc;
···
         async_icosq_log_wq_sz = mlx5e_build_async_icosq_log_wq_sz(mdev);

         mlx5e_build_sq_param(mdev, params, &cparam->txq_sq);
-        mlx5e_build_xdpsq_param(mdev, params, NULL, &cparam->xdp_sq);
+        mlx5e_build_xdpsq_param(mdev, params, &cparam->xdp_sq);
         mlx5e_build_icosq_param(mdev, icosq_log_wq_sz, &cparam->icosq);
         mlx5e_build_async_icosq_param(mdev, async_icosq_log_wq_sz, &cparam->async_icosq);
drivers/net/ethernet/mellanox/mlx5/core/en/params.h (+1 -1)
···
         u32 tisn;
         u8 tis_lst_sz;
         u8 min_inline_mode;
+        u32 uar_page;
 };

 /* Striding RQ dynamic parameters */
···
                        struct mlx5e_cq_param *param);
 void mlx5e_build_xdpsq_param(struct mlx5_core_dev *mdev,
                              struct mlx5e_params *params,
-                             struct mlx5e_xsk_param *xsk,
                              struct mlx5e_sq_param *param);
 int mlx5e_build_channel_param(struct mlx5_core_dev *mdev,
                               struct mlx5e_params *params,
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c (+5 -1)
···
         sq->mdev = mdev;
         sq->ch_ix = MLX5E_PTP_CHANNEL_IX;
         sq->txq_ix = txq_ix;
-        sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map;
+        sq->uar_map = c->bfreg->map;
         sq->min_inline_mode = params->tx_min_inline_mode;
         sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu);
         sq->stats = &c->priv->ptp_stats.sq[tc];
···
         csp.wq_ctrl = &txqsq->wq_ctrl;
         csp.min_inline_mode = txqsq->min_inline_mode;
         csp.ts_cqe_to_dest_cqn = ptpsq->ts_cq.mcq.cqn;
+        csp.uar_page = c->bfreg->index;

         err = mlx5e_create_sq_rdy(c->mdev, sqp, &csp, 0, &txqsq->sqn);
         if (err)
···
         ccp.ch_stats = c->stats;
         ccp.napi = &c->napi;
         ccp.ix = MLX5E_PTP_CHANNEL_IX;
+        ccp.uar = c->bfreg->up;

         cq_param = &cparams->txq_sq_param.cqp;
···
         ccp.ch_stats = c->stats;
         ccp.napi = &c->napi;
         ccp.ix = MLX5E_PTP_CHANNEL_IX;
+        ccp.uar = c->bfreg->up;

         cq_param = &cparams->rq_param.cqp;
···
         c->num_tc = mlx5e_get_dcb_num_tc(params);
         c->stats = &priv->ptp_stats.ch;
         c->lag_port = lag_port;
+        c->bfreg = &mdev->priv.bfreg;

         err = mlx5e_ptp_set_state(c, params);
         if (err)
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h (+1)
···
         struct mlx5_core_dev *mdev;
         struct hwtstamp_config *tstamp;
         DECLARE_BITMAP(state, MLX5E_PTP_STATE_NUM_STATES);
+        struct mlx5_sq_bfreg *bfreg;
 };

 static inline bool mlx5e_use_ptpsq(struct sk_buff *skb)
drivers/net/ethernet/mellanox/mlx5/core/en/trap.c (+1)
···
         ccp.ch_stats = t->stats;
         ccp.napi = &t->napi;
         ccp.ix = 0;
+        ccp.uar = mdev->priv.bfreg.up;
         err = mlx5e_open_cq(priv->mdev, trap_moder, &rq_param->cqp, &ccp, &rq->cq);
         if (err)
                 return err;
drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h (+1 -4)
···
 static inline void mlx5e_cq_arm(struct mlx5e_cq *cq)
 {
-        struct mlx5_core_cq *mcq;
-
-        mcq = &cq->mcq;
-        mlx5_cq_arm(mcq, MLX5_CQ_DB_REQ_NOT, mcq->uar->map, cq->wq.cc);
+        mlx5_cq_arm(&cq->mcq, MLX5_CQ_DB_REQ_NOT, cq->uar->map, cq->wq.cc);
 }

 static inline struct mlx5e_sq_dma *
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c (+1 -1)
···
                            struct mlx5e_channel_param *cparam)
 {
         mlx5e_build_rq_param(mdev, params, xsk, &cparam->rq);
-        mlx5e_build_xdpsq_param(mdev, params, xsk, &cparam->xdp_sq);
+        mlx5e_build_xdpsq_param(mdev, params, &cparam->xdp_sq);
 }

 static int mlx5e_init_xsk_rq(struct mlx5e_channel *c,
drivers/net/ethernet/mellanox/mlx5/core/en_common.c (+38 -7)
···
  * SOFTWARE.
  */

+#include "devlink.h"
 #include "en.h"
 #include "lib/crypto.h"
···
         return err;
 }

+static unsigned int
+mlx5e_get_devlink_param_num_doorbells(struct mlx5_core_dev *dev)
+{
+        const u32 param_id = DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS;
+        struct devlink *devlink = priv_to_devlink(dev);
+        union devlink_param_value val;
+        int err;
+
+        err = devl_param_driverinit_value_get(devlink, param_id, &val);
+        return err ? MLX5_DEFAULT_NUM_DOORBELLS : val.vu32;
+}
+
 int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises)
 {
         struct mlx5e_hw_objs *res = &mdev->mlx5e_res.hw_objs;
+        unsigned int num_doorbells, i;
         int err;

         err = mlx5_core_alloc_pd(mdev, &res->pdn);
···
                 goto err_dealloc_transport_domain;
         }

-        err = mlx5_alloc_bfreg(mdev, &res->bfreg, false, false);
-        if (err) {
-                mlx5_core_err(mdev, "alloc bfreg failed, %d\n", err);
+        num_doorbells = min(mlx5e_get_devlink_param_num_doorbells(mdev),
+                            mlx5e_get_max_num_channels(mdev));
+        res->bfregs = kcalloc(num_doorbells, sizeof(*res->bfregs), GFP_KERNEL);
+        if (!res->bfregs) {
+                err = -ENOMEM;
                 goto err_destroy_mkey;
         }
+
+        for (i = 0; i < num_doorbells; i++) {
+                err = mlx5_alloc_bfreg(mdev, res->bfregs + i, false, false);
+                if (err) {
+                        mlx5_core_warn(mdev,
+                                       "could only allocate %d/%d doorbells, err %d.\n",
+                                       i, num_doorbells, err);
+                        break;
+                }
+        }
+        res->num_bfregs = i;

         if (create_tises) {
                 err = mlx5e_create_tises(mdev, res->tisn);
                 if (err) {
                         mlx5_core_err(mdev, "alloc tises failed, %d\n", err);
-                        goto err_destroy_bfreg;
+                        goto err_destroy_bfregs;
                 }
                 res->tisn_valid = true;
         }
···
         return 0;

-err_destroy_bfreg:
-        mlx5_free_bfreg(mdev, &res->bfreg);
+err_destroy_bfregs:
+        for (i = 0; i < res->num_bfregs; i++)
+                mlx5_free_bfreg(mdev, res->bfregs + i);
+        kfree(res->bfregs);
 err_destroy_mkey:
         mlx5_core_destroy_mkey(mdev, res->mkey);
 err_dealloc_transport_domain:
···
         mdev->mlx5e_res.dek_priv = NULL;
         if (res->tisn_valid)
                 mlx5e_destroy_tises(mdev, res->tisn);
-        mlx5_free_bfreg(mdev, &res->bfreg);
+        for (unsigned int i = 0; i < res->num_bfregs; i++)
+                mlx5_free_bfreg(mdev, res->bfregs + i);
+        kfree(res->bfregs);
         mlx5_core_destroy_mkey(mdev, res->mkey);
         mlx5_core_dealloc_transport_domain(mdev, res->td.tdn);
         mlx5_core_dealloc_pd(mdev, res->pdn);
drivers/net/ethernet/mellanox/mlx5/core/en_main.c (+30 -7)
···
         sq->pdev = c->pdev;
         sq->mkey_be = c->mkey_be;
         sq->channel = c;
-        sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map;
+        sq->uar_map = c->bfreg->map;
         sq->min_inline_mode = params->tx_min_inline_mode;
         sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu) - ETH_FCS_LEN;
         sq->xsk_pool = xsk_pool;
···
         int err;

         sq->channel = c;
-        sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map;
+        sq->uar_map = c->bfreg->map;
         sq->reserved_room = param->stop_room;

         param->wq.db_numa_node = cpu_to_node(c->cpu);
···
         sq->priv = c->priv;
         sq->ch_ix = c->ix;
         sq->txq_ix = txq_ix;
-        sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map;
+        sq->uar_map = c->bfreg->map;
         sq->min_inline_mode = params->tx_min_inline_mode;
         sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu);
         sq->max_sq_mpw_wqebbs = mlx5e_get_max_sq_aligned_wqebbs(mdev);
···
         MLX5_SET(sqc, sqc, flush_in_error_en, 1);

         MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC);
-        MLX5_SET(wq, wq, uar_page, mdev->mlx5e_res.hw_objs.bfreg.index);
+        MLX5_SET(wq, wq, uar_page, csp->uar_page);
         MLX5_SET(wq, wq, log_wq_pg_sz, csp->wq_ctrl->buf.page_shift -
                                        MLX5_ADAPTER_PAGE_SHIFT);
         MLX5_SET64(wq, wq, dbr_addr, csp->wq_ctrl->db.dma);
···
         csp.cqn = sq->cq.mcq.cqn;
         csp.wq_ctrl = &sq->wq_ctrl;
         csp.min_inline_mode = sq->min_inline_mode;
+        csp.uar_page = c->bfreg->index;
         err = mlx5e_create_sq_rdy(c->mdev, param, &csp, qos_queue_group_id, &sq->sqn);
         if (err)
                 goto err_free_txqsq;
···
         csp.cqn = sq->cq.mcq.cqn;
         csp.wq_ctrl = &sq->wq_ctrl;
         csp.min_inline_mode = params->tx_min_inline_mode;
+        csp.uar_page = c->bfreg->index;
         err = mlx5e_create_sq_rdy(c->mdev, param, &csp, 0, &sq->sqn);
         if (err)
                 goto err_free_icosq;
···
         csp.cqn = sq->cq.mcq.cqn;
         csp.wq_ctrl = &sq->wq_ctrl;
         csp.min_inline_mode = sq->min_inline_mode;
+        csp.uar_page = c->bfreg->index;
         set_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);

         err = mlx5e_create_sq_rdy(c->mdev, param, &csp, 0, &sq->sqn);
···
 static int mlx5e_alloc_cq_common(struct mlx5_core_dev *mdev,
                                  struct net_device *netdev,
                                  struct workqueue_struct *workqueue,
+                                 struct mlx5_uars_page *uar,
                                  struct mlx5e_cq_param *param,
                                  struct mlx5e_cq *cq)
 {
···
         cq->mdev = mdev;
         cq->netdev = netdev;
         cq->workqueue = workqueue;
+        cq->uar = uar;

         return 0;
 }
···
         param->wq.db_numa_node = ccp->node;
         param->eq_ix = ccp->ix;

-        err = mlx5e_alloc_cq_common(mdev, ccp->netdev, ccp->wq, param, cq);
+        err = mlx5e_alloc_cq_common(mdev, ccp->netdev, ccp->wq,
+                                    ccp->uar, param, cq);

         cq->napi = ccp->napi;
         cq->ch_stats = ccp->ch_stats;
···
         MLX5_SET(cqc, cqc, cq_period_mode, mlx5e_cq_period_mode(param->cq_period_mode));

         MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn);
-        MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index);
+        MLX5_SET(cqc, cqc, uar_page, cq->uar->index);
         MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift -
                                           MLX5_ADAPTER_PAGE_SHIFT);
         MLX5_SET64(cqc, cqc, dbr_addr, cq->wq_ctrl.db.dma);
···
         local_bh_enable();
 }

+static void mlx5e_channel_pick_doorbell(struct mlx5e_channel *c)
+{
+        struct mlx5e_hw_objs *hw_objs = &c->mdev->mlx5e_res.hw_objs;
+
+        /* No dedicated Ethernet doorbells, use the global one. */
+        if (hw_objs->num_bfregs == 0) {
+                c->bfreg = &c->mdev->priv.bfreg;
+                return;
+        }
+
+        /* Round-robin between doorbells. */
+        c->bfreg = hw_objs->bfregs + c->vec_ix % hw_objs->num_bfregs;
+}
+
 static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
                               struct mlx5e_params *params,
                               struct xsk_buff_pool *xsk_pool,
···
         c->stats = &priv->channel_stats[ix]->ch;
         c->aff_mask = irq_get_effective_affinity_mask(irq);
         c->lag_port = mlx5e_enumerate_lag_port(mdev, ix);
+
+        mlx5e_channel_pick_doorbell(c);

         netif_napi_add_config_locked(netdev, &c->napi, mlx5e_napi_poll, ix);
         netif_napi_set_irq_locked(&c->napi, irq);
···
         param->wq.buf_numa_node = dev_to_node(mlx5_core_dma_dev(mdev));
         param->wq.db_numa_node = dev_to_node(mlx5_core_dma_dev(mdev));

-        return mlx5e_alloc_cq_common(priv->mdev, priv->netdev, priv->wq, param, cq);
+        return mlx5e_alloc_cq_common(priv->mdev, priv->netdev, priv->wq,
+                                     mdev->priv.bfreg.up, param, cq);
 }

 int mlx5e_open_drop_rq(struct mlx5e_priv *priv,
drivers/net/ethernet/mellanox/mlx5/core/eq.c (+3 -5)
···
         MLX5_EQ_STATE_ALWAYS_ARMED = 0xb,
 };

-enum {
-        MLX5_EQ_DOORBEL_OFFSET = 0x40,
-};
+#define MLX5_EQ_DOORBELL_OFFSET 0x40

 /* budget must be smaller than MLX5_NUM_SPARE_EQE to guarantee that we update
  * the ci before we polled all the entries in the EQ. MLX5_NUM_SPARE_EQE is
···
         eqc = MLX5_ADDR_OF(create_eq_in, in, eq_context_entry);
         MLX5_SET(eqc, eqc, log_eq_size, eq->fbc.log_sz);
-        MLX5_SET(eqc, eqc, uar_page, priv->uar->index);
+        MLX5_SET(eqc, eqc, uar_page, priv->bfreg.up->index);
         MLX5_SET(eqc, eqc, intr, vecidx);
         MLX5_SET(eqc, eqc, log_page_size,
                  eq->frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
···
         eq->eqn = MLX5_GET(create_eq_out, out, eq_number);
         eq->irqn = pci_irq_vector(dev->pdev, vecidx);
         eq->dev = dev;
-        eq->doorbell = priv->uar->map + MLX5_EQ_DOORBEL_OFFSET;
+        eq->doorbell = priv->bfreg.up->map + MLX5_EQ_DOORBELL_OFFSET;

         err = mlx5_debug_eq_add(dev, eq);
         if (err)
drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c (-1)
···
         *conn->cq.mcq.arm_db = 0;
         conn->cq.mcq.vector = 0;
         conn->cq.mcq.comp = mlx5_fpga_conn_cq_complete;
-        conn->cq.mcq.uar = fdev->conn_res.uar;
         tasklet_setup(&conn->cq.tasklet, mlx5_fpga_conn_cq_tasklet);

         mlx5_fpga_dbg(fdev, "Created CQ #0x%x\n", conn->cq.mcq.cqn);
drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c (+4 -4)
···

         MLX5_SET(cqc, cqc, cq_period_mode, MLX5_CQ_PERIOD_MODE_START_FROM_EQE);
         MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn);
-        MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index);
+        MLX5_SET(cqc, cqc, uar_page, mdev->priv.bfreg.up->index);
         MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift -
                  MLX5_ADAPTER_PAGE_SHIFT);
         MLX5_SET64(cqc, cqc, dbr_addr, cq->wq_ctrl.db.dma);
···
                 return -ENOMEM;

         MLX5_SET(cqc, cqc_data, log_cq_size, 1);
-        MLX5_SET(cqc, cqc_data, uar_page, mdev->priv.uar->index);
+        MLX5_SET(cqc, cqc_data, uar_page, mdev->priv.bfreg.up->index);
         if (MLX5_CAP_GEN(mdev, cqe_128_always) && cache_line_size() >= 128)
                 MLX5_SET(cqc, cqc_data, cqe_sz, CQE_STRIDE_128_PAD);
···
         struct mlx5_wq_param param;
         int err;

-        sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map;
+        sq->uar_map = mdev->priv.bfreg.map;

         param.db_numa_node = numa_node;
         param.buf_numa_node = numa_node;
···
         MLX5_SET(sqc, sqc, ts_format, ts_format);

         MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC);
-        MLX5_SET(wq, wq, uar_page, mdev->mlx5e_res.hw_objs.bfreg.index);
+        MLX5_SET(wq, wq, uar_page, mdev->priv.bfreg.index);
         MLX5_SET(wq, wq, log_wq_pg_sz, sq->wq_ctrl.buf.page_shift -
                  MLX5_ADAPTER_PAGE_SHIFT);
         MLX5_SET64(wq, wq, dbr_addr, sq->wq_ctrl.db.dma);
drivers/net/ethernet/mellanox/mlx5/core/main.c (+5 -6)
···
 {
         int err;

-        dev->priv.uar = mlx5_get_uars_page(dev);
-        if (IS_ERR(dev->priv.uar)) {
-                mlx5_core_err(dev, "Failed allocating uar, aborting\n");
-                err = PTR_ERR(dev->priv.uar);
+        err = mlx5_alloc_bfreg(dev, &dev->priv.bfreg, false, false);
+        if (err) {
+                mlx5_core_err(dev, "Failed allocating bfreg, %d\n", err);
                 return err;
         }
···
 err_irq_table:
         mlx5_pagealloc_stop(dev);
         mlx5_events_stop(dev);
-        mlx5_put_uars_page(dev, dev->priv.uar);
+        mlx5_free_bfreg(dev, &dev->priv.bfreg);
         return err;
 }
···
         mlx5_irq_table_destroy(dev);
         mlx5_pagealloc_stop(dev);
         mlx5_events_stop(dev);
-        mlx5_put_uars_page(dev, dev->priv.uar);
+        mlx5_free_bfreg(dev, &dev->priv.bfreg);
 }

 int mlx5_init_one_devl_locked(struct mlx5_core_dev *dev)
drivers/net/ethernet/mellanox/mlx5/core/steering/hws/send.c (+4 -4)
···
         size_t buf_sz;
         int err;

-        sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map;
+        sq->uar_map = mdev->priv.bfreg.map;
         sq->mdev = mdev;

         param.db_numa_node = numa_node;
···
         MLX5_SET(sqc, sqc, ts_format, ts_format);

         MLX5_SET(wq, wq, wq_type, MLX5_WQ_TYPE_CYCLIC);
-        MLX5_SET(wq, wq, uar_page, mdev->mlx5e_res.hw_objs.bfreg.index);
+        MLX5_SET(wq, wq, uar_page, mdev->priv.bfreg.index);
         MLX5_SET(wq, wq, log_wq_pg_sz, sq->wq_ctrl.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
         MLX5_SET64(wq, wq, dbr_addr, sq->wq_ctrl.db.dma);
···
                       (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas));

         MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn);
-        MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index);
+        MLX5_SET(cqc, cqc, uar_page, mdev->priv.bfreg.up->index);
         MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
         MLX5_SET64(cqc, cqc, dbr_addr, cq->wq_ctrl.db.dma);
···
         if (!cqc_data)
                 return -ENOMEM;

-        MLX5_SET(cqc, cqc_data, uar_page, mdev->priv.uar->index);
+        MLX5_SET(cqc, cqc_data, uar_page, mdev->priv.bfreg.up->index);
         MLX5_SET(cqc, cqc_data, log_cq_size, ilog2(queue->num_entries));

         err = hws_send_ring_alloc_cq(mdev, numa_node, queue, cqc_data, cq);
drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_send.c (-1)
···
         *cq->mcq.arm_db = cpu_to_be32(2 << 28);

         cq->mcq.vector = 0;
-        cq->mcq.uar = uar;
         cq->mdev = mdev;

         return cq;
drivers/net/ethernet/mellanox/mlx5/core/wc.c (+9 -7)
···

         MLX5_SET(cqc, cqc, cq_period_mode, MLX5_CQ_PERIOD_MODE_START_FROM_EQE);
         MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn);
-        MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index);
+        MLX5_SET(cqc, cqc, uar_page, mdev->priv.bfreg.up->index);
         MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift -
                  MLX5_ADAPTER_PAGE_SHIFT);
         MLX5_SET64(cqc, cqc, dbr_addr, cq->wq_ctrl.db.dma);
···
                 return -ENOMEM;

         MLX5_SET(cqc, cqc, log_cq_size, TEST_WC_LOG_CQ_SZ);
-        MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index);
+        MLX5_SET(cqc, cqc, uar_page, mdev->priv.bfreg.up->index);
         if (MLX5_CAP_GEN(mdev, cqe_128_always) && cache_line_size() >= 128)
                 MLX5_SET(cqc, cqc, cqe_sz, CQE_STRIDE_128_PAD);
···
         mlx5_wq_destroy(&sq->wq_ctrl);
 }

-static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, bool signaled)
+static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, unsigned int *offset,
+                             bool signaled)
 {
         int buf_size = (1 << MLX5_CAP_GEN(sq->cq.mdev, log_bf_reg_size)) / 2;
         struct mlx5_wqe_ctrl_seg *ctrl;
···
          */
         wmb();

-        __iowrite64_copy(sq->bfreg.map + sq->bfreg.offset, mmio_wqe,
+        __iowrite64_copy(sq->bfreg.map + *offset, mmio_wqe,
                          sizeof(mmio_wqe) / 8);

-        sq->bfreg.offset ^= buf_size;
+        *offset ^= buf_size;
 }

 static int mlx5_wc_poll_cq(struct mlx5_wc_sq *sq)
···
 static void mlx5_core_test_wc(struct mlx5_core_dev *mdev)
 {
+        unsigned int offset = 0;
         unsigned long expires;
         struct mlx5_wc_sq *sq;
         int i, err;
···
                 goto err_create_sq;

         for (i = 0; i < TEST_WC_NUM_WQES - 1; i++)
-                mlx5_wc_post_nop(sq, false);
+                mlx5_wc_post_nop(sq, &offset, false);

-        mlx5_wc_post_nop(sq, true);
+        mlx5_wc_post_nop(sq, &offset, true);

         expires = jiffies + TEST_WC_POLLING_MAX_TIME_JIFFIES;
         do {
include/linux/mlx5/cq.h (-1)
···
         int cqe_sz;
         __be32 *set_ci_db;
         __be32 *arm_db;
-        struct mlx5_uars_page *uar;
         refcount_t refcount;
         struct completion free;
         unsigned vector;
include/linux/mlx5/driver.h (+5 -3)
···
         struct mlx5_uars_page *up;
         bool wc;
         u32 index;
-        unsigned int offset;
 };

 struct mlx5_core_health {
···
         struct mlx5_ft_pool *ft_pool;

         struct mlx5_bfreg_data bfregs;
-        struct mlx5_uars_page *uar;
+        struct mlx5_sq_bfreg bfreg;
 #ifdef CONFIG_MLX5_SF
         struct mlx5_vhca_state_notifier *vhca_state_notifier;
         struct mlx5_sf_dev_table *sf_dev_table;
···
         u32 pdn;
         struct mlx5_td td;
         u32 mkey;
-        struct mlx5_sq_bfreg bfreg;
+        struct mlx5_sq_bfreg *bfregs;
+        unsigned int num_bfregs;
 #define MLX5_MAX_NUM_TC 8
         u32 tisn[MLX5_MAX_PORTS][MLX5_MAX_NUM_TC];
         bool tisn_valid;
···
         dma_addr_t dma;
         int index;
 };
+
+#define MLX5_DEFAULT_NUM_DOORBELLS 8

 enum {
         MLX5_COMP_EQ_SIZE = 1024,
include/net/devlink.h (+4)
···
         DEVLINK_PARAM_GENERIC_ID_ENABLE_PHC,
         DEVLINK_PARAM_GENERIC_ID_CLOCK_ID,
         DEVLINK_PARAM_GENERIC_ID_TOTAL_VFS,
+        DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS,

         /* add new param generic ids above here*/
         __DEVLINK_PARAM_GENERIC_ID_MAX,
···
 #define DEVLINK_PARAM_GENERIC_TOTAL_VFS_NAME "total_vfs"
 #define DEVLINK_PARAM_GENERIC_TOTAL_VFS_TYPE DEVLINK_PARAM_TYPE_U32
+
+#define DEVLINK_PARAM_GENERIC_NUM_DOORBELLS_NAME "num_doorbells"
+#define DEVLINK_PARAM_GENERIC_NUM_DOORBELLS_TYPE DEVLINK_PARAM_TYPE_U32

 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \
 { \
net/devlink/param.c (+5)
···
                 .name = DEVLINK_PARAM_GENERIC_TOTAL_VFS_NAME,
                 .type = DEVLINK_PARAM_GENERIC_TOTAL_VFS_TYPE,
         },
+        {
+                .id = DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS,
+                .name = DEVLINK_PARAM_GENERIC_NUM_DOORBELLS_NAME,
+                .type = DEVLINK_PARAM_GENERIC_NUM_DOORBELLS_TYPE,
+        },
 };

 static int devlink_param_generic_verify(const struct devlink_param *param)