
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:

- Usual minor updates and fixes for bnxt_re, hfi1, rxe, mana, iser,
mlx5, vmw_pvrdma, hns

- Make rxe work on tun devices

- mana gains more standard verbs as it moves toward supporting
in-kernel verbs

- DMABUF support for mana

- Fix page size calculations when memory registration exceeds 4G

- On Demand Paging support for rxe

- mlx5 support for RDMA TRANSPORT flow tables and a new ucap mechanism
to control access to them

- Optional RDMA_TX/RX counters per QP in mlx5

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (73 commits)
IB/mad: Check available slots before posting receive WRs
RDMA/mana_ib: Fix integer overflow during queue creation
RDMA/mlx5: Fix calculation of total invalidated pages
RDMA/mlx5: Fix mlx5_poll_one() cur_qp update flow
RDMA/mlx5: Fix page_size variable overflow
RDMA/mlx5: Drop access_flags from _mlx5_mr_cache_alloc()
RDMA/mlx5: Fix cache entry update on dereg error
RDMA/mlx5: Fix MR cache initialization error flow
RDMA/mlx5: Support optional-counters binding for QPs
RDMA/mlx5: Compile fs.c regardless of INFINIBAND_USER_ACCESS config
RDMA/core: Pass port to counter bind/unbind operations
RDMA/core: Add support to optional-counters binding configuration
RDMA/core: Create and destroy rdma_counter using rdma_zalloc_drv_obj()
RDMA/mlx5: Add optional counters for RDMA_TX/RX_packets/bytes
RDMA/core: Fix use-after-free when rename device name
RDMA/bnxt_re: Support perf management counters
RDMA/rxe: Fix incorrect return value of rxe_odp_atomic_op()
RDMA/uverbs: Propagate errors from rdma_lookup_get_uobject()
RDMA/mana_ib: Handle net event for pointing to the current netdev
net: mana: Change the function signature of mana_get_primary_netdev_rcu
...

+4056 -660
+1
Documentation/infiniband/index.rst
···
    opa_vnic
    sysfs
    tag_matching
+   ucaps
    user_mad
    user_verbs
+71
Documentation/infiniband/ucaps.rst
···
+=================================
+Infiniband Userspace Capabilities
+=================================
+
+User CAPabilities (UCAPs) provide fine-grained control over specific
+firmware features in Infiniband (IB) devices. This approach offers
+more granular capabilities than the existing Linux capabilities,
+which may be too generic for certain FW features.
+
+Each user capability is represented as a character device with root
+read-write access. Root processes can grant users special privileges
+by allowing access to these character devices (e.g., using chown).
+
+Usage
+=====
+
+UCAPs allow control over specific features of an IB device using file
+descriptors of UCAP character devices. Here is how a user enables
+specific features of an IB device:
+
+* A root process grants the user access to the UCAP files that
+  represent the capabilities (e.g., using chown).
+* The user opens the UCAP files, obtaining file descriptors.
+* When opening an IB device, include an array of the UCAP file
+  descriptors as an attribute.
+* The ib_uverbs driver recognizes the UCAP file descriptors and enables
+  the corresponding capabilities for the IB device.
+
+Creating UCAPs
+==============
+
+To create a new UCAP, drivers must first define a type in the
+rdma_user_cap enum in rdma/ib_ucaps.h. The name of the UCAP character
+device should be added to the ucap_names array in
+drivers/infiniband/core/ucaps.c. Then, the driver can create the UCAP
+character device by calling the ib_create_ucap API with the UCAP
+type.
+
+A reference count is stored for each UCAP to track creations and
+removals of the UCAP device. If multiple creation calls are made with
+the same type (e.g., for two IB devices), the UCAP character device
+is created during the first call and subsequent calls increment the
+reference count.
+
+The UCAP character device is created under /dev/infiniband, and its
+permissions are set to allow root read and write access only.
+
+Removing UCAPs
+==============
+
+Each removal decrements the reference count of the UCAP. The UCAP
+character device is removed from the filesystem only when the
+reference count is decreased to 0.
+
+/dev and /sys/class files
+=========================
+
+The class::
+
+  /sys/class/infiniband_ucaps
+
+is created when the first UCAP character device is created.
+
+The UCAP character device is created under /dev/infiniband.
+
+For example, if mlx5_ib adds the rdma_user_cap
+RDMA_UCAP_MLX5_CTRL_LOCAL with name "mlx5_perm_ctrl_local", this will
+create the device node::
+
+  /dev/infiniband/mlx5_perm_ctrl_local
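The driver-facing half of this flow is small. Below is a minimal sketch of how a provider might create and later remove its UCAP character device, assuming the RDMA_UCAP_MLX5_CTRL_LOCAL type already defined in rdma/ib_ucaps.h; the example_* wrappers are hypothetical and only illustrate the call pattern:

    #include <rdma/ib_ucaps.h>

    /* Hypothetical driver init path: ib_create_ucap() is refcounted, so
     * a second device of the same type only bumps the reference count
     * and the /dev/infiniband node is created once. */
    static int example_enable_local_ctrl(void)
    {
            return ib_create_ucap(RDMA_UCAP_MLX5_CTRL_LOCAL);
    }

    /* Teardown path: the character device disappears from the
     * filesystem only when the last creator calls ib_remove_ucap(). */
    static void example_disable_local_ctrl(void)
    {
            ib_remove_ucap(RDMA_UCAP_MLX5_CTRL_LOCAL);
    }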
+2 -1
drivers/infiniband/core/Makefile
···
 				uverbs_std_types_async_fd.o \
 				uverbs_std_types_srq.o \
 				uverbs_std_types_wq.o \
-				uverbs_std_types_qp.o
+				uverbs_std_types_qp.o \
+				ucaps.o
 ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
 ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
+6
drivers/infiniband/core/cache.c
···
 		device->port_data[port].cache.pkey = pkey_cache;
 	}
 	device->port_data[port].cache.lmc = tprops->lmc;
+
+	if (device->port_data[port].cache.port_state != IB_PORT_NOP &&
+	    device->port_data[port].cache.port_state != tprops->state)
+		ibdev_info(device, "Port: %d Link %s\n", port,
+			   ib_port_state_to_str(tprops->state));
+
 	device->port_data[port].cache.port_state = tprops->state;

 	device->port_data[port].cache.subnet_prefix = tprops->subnet_prefix;
+19 -5
drivers/infiniband/core/cma.c
···
 		goto out;
 	}

-	if (dev_type == ARPHRD_ETHER && rdma_protocol_roce(device, port)) {
-		ndev = dev_get_by_index(dev_addr->net, bound_if_index);
-		if (!ndev)
-			goto out;
+	/*
+	 * For a RXE device, it should work with TUN device and normal ethernet
+	 * devices. Use driver_id to check if a device is a RXE device or not.
+	 * ARPHRD_NONE means a TUN device.
+	 */
+	if (device->ops.driver_id == RDMA_DRIVER_RXE) {
+		if ((dev_type == ARPHRD_NONE || dev_type == ARPHRD_ETHER)
+		    && rdma_protocol_roce(device, port)) {
+			ndev = dev_get_by_index(dev_addr->net, bound_if_index);
+			if (!ndev)
+				goto out;
+		}
 	} else {
-		gid_type = IB_GID_TYPE_IB;
+		if (dev_type == ARPHRD_ETHER && rdma_protocol_roce(device, port)) {
+			ndev = dev_get_by_index(dev_addr->net, bound_if_index);
+			if (!ndev)
+				goto out;
+		} else {
+			gid_type = IB_GID_TYPE_IB;
+		}
 	}

 	sgid_attr = rdma_find_gid_by_port(device, gid, gid_type, port, ndev);
+32 -20
drivers/infiniband/core/counters.c
···
 static int __counter_set_mode(struct rdma_port_counter *port_counter,
 			      enum rdma_nl_counter_mode new_mode,
-			      enum rdma_nl_counter_mask new_mask)
+			      enum rdma_nl_counter_mask new_mask,
+			      bool bind_opcnt)
 {
 	if (new_mode == RDMA_COUNTER_MODE_AUTO) {
 		if (new_mask & (~ALL_AUTO_MODE_MASKS))
···
 	port_counter->mode.mode = new_mode;
 	port_counter->mode.mask = new_mask;
+	port_counter->mode.bind_opcnt = bind_opcnt;
 	return 0;
 }
···
  */
 int rdma_counter_set_auto_mode(struct ib_device *dev, u32 port,
 			       enum rdma_nl_counter_mask mask,
+			       bool bind_opcnt,
 			       struct netlink_ext_ack *extack)
 {
 	struct rdma_port_counter *port_counter;
···
 		       RDMA_COUNTER_MODE_NONE;

 	if (port_counter->mode.mode == mode &&
-	    port_counter->mode.mask == mask) {
+	    port_counter->mode.mask == mask &&
+	    port_counter->mode.bind_opcnt == bind_opcnt) {
 		ret = 0;
 		goto out;
 	}

-	ret = __counter_set_mode(port_counter, mode, mask);
+	ret = __counter_set_mode(port_counter, mode, mask, bind_opcnt);

 out:
 	mutex_unlock(&port_counter->lock);
···
 }

 static int __rdma_counter_bind_qp(struct rdma_counter *counter,
-				  struct ib_qp *qp)
+				  struct ib_qp *qp, u32 port)
 {
 	int ret;
···
 		return -EOPNOTSUPP;

 	mutex_lock(&counter->lock);
-	ret = qp->device->ops.counter_bind_qp(counter, qp);
+	ret = qp->device->ops.counter_bind_qp(counter, qp, port);
 	mutex_unlock(&counter->lock);

 	return ret;
···
 static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port,
 					   struct ib_qp *qp,
-					   enum rdma_nl_counter_mode mode)
+					   enum rdma_nl_counter_mode mode,
+					   bool bind_opcnt)
 {
 	struct rdma_port_counter *port_counter;
 	struct rdma_counter *counter;
···
 	if (!dev->ops.counter_dealloc || !dev->ops.counter_alloc_stats)
 		return NULL;

-	counter = kzalloc(sizeof(*counter), GFP_KERNEL);
+	counter = rdma_zalloc_drv_obj(dev, rdma_counter);
 	if (!counter)
 		return NULL;

 	counter->device = dev;
 	counter->port = port;
+
+	dev->ops.counter_init(counter);

 	rdma_restrack_new(&counter->res, RDMA_RESTRACK_COUNTER);
 	counter->stats = dev->ops.counter_alloc_stats(counter);
···
 	switch (mode) {
 	case RDMA_COUNTER_MODE_MANUAL:
 		ret = __counter_set_mode(port_counter, RDMA_COUNTER_MODE_MANUAL,
-					 0);
+					 0, bind_opcnt);
 		if (ret) {
 			mutex_unlock(&port_counter->lock);
 			goto err_mode;
···
 	mutex_unlock(&port_counter->lock);

 	counter->mode.mode = mode;
+	counter->mode.bind_opcnt = bind_opcnt;
 	kref_init(&counter->kref);
 	mutex_init(&counter->lock);

-	ret = __rdma_counter_bind_qp(counter, qp);
+	ret = __rdma_counter_bind_qp(counter, qp, port);
 	if (ret)
 		goto err_mode;
···
 	port_counter->num_counters--;
 	if (!port_counter->num_counters &&
 	    (port_counter->mode.mode == RDMA_COUNTER_MODE_MANUAL))
-		__counter_set_mode(port_counter, RDMA_COUNTER_MODE_NONE, 0);
+		__counter_set_mode(port_counter, RDMA_COUNTER_MODE_NONE, 0,
+				   false);

 	mutex_unlock(&port_counter->lock);
···
 	return match;
 }

-static int __rdma_counter_unbind_qp(struct ib_qp *qp)
+static int __rdma_counter_unbind_qp(struct ib_qp *qp, u32 port)
 {
 	struct rdma_counter *counter = qp->counter;
 	int ret;
···
 		return -EOPNOTSUPP;

 	mutex_lock(&counter->lock);
-	ret = qp->device->ops.counter_unbind_qp(qp);
+	ret = qp->device->ops.counter_unbind_qp(qp, port);
 	mutex_unlock(&counter->lock);

 	return ret;
···
 	counter = rdma_get_counter_auto_mode(qp, port);
 	if (counter) {
-		ret = __rdma_counter_bind_qp(counter, qp);
+		ret = __rdma_counter_bind_qp(counter, qp, port);
 		if (ret) {
 			kref_put(&counter->kref, counter_release);
 			return ret;
 		}
 	} else {
-		counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_AUTO);
+		counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_AUTO,
+					 port_counter->mode.bind_opcnt);
 		if (!counter)
 			return -ENOMEM;
 	}
···
  * @force:
  *   true - Decrease the counter ref-count anyway (e.g., qp destroy)
  */
-int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
+int rdma_counter_unbind_qp(struct ib_qp *qp, u32 port, bool force)
 {
 	struct rdma_counter *counter = qp->counter;
 	int ret;
···
 	if (!counter)
 		return -EINVAL;

-	ret = __rdma_counter_unbind_qp(qp);
+	ret = __rdma_counter_unbind_qp(qp, port);
 	if (ret && !force)
 		return ret;
···
 		goto err_task;
 	}

-	ret = __rdma_counter_bind_qp(counter, qp);
+	ret = __rdma_counter_bind_qp(counter, qp, port);
 	if (ret)
 		goto err_task;
···
 		goto err;
 	}

-	counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_MANUAL);
+	counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_MANUAL, true);
 	if (!counter) {
 		ret = -ENOMEM;
 		goto err;
···
 		goto out;
 	}

-	ret = rdma_counter_unbind_qp(qp, false);
+	ret = rdma_counter_unbind_qp(qp, port, false);

 out:
 	rdma_restrack_put(&qp->res);
···
 int rdma_counter_get_mode(struct ib_device *dev, u32 port,
 			  enum rdma_nl_counter_mode *mode,
-			  enum rdma_nl_counter_mask *mask)
+			  enum rdma_nl_counter_mask *mask,
+			  bool *opcnt)
 {
 	struct rdma_port_counter *port_counter;

 	port_counter = &dev->port_data[port].port_counter;
 	*mode = port_counter->mode.mode;
 	*mask = port_counter->mode.mask;
+	*opcnt = port_counter->mode.bind_opcnt;

 	return 0;
 }
+18 -2
drivers/infiniband/core/device.c
···
 static void rdma_init_coredev(struct ib_core_device *coredev,
 			      struct ib_device *dev, struct net *net)
 {
+	bool is_full_dev = &dev->coredev == coredev;
+
 	/* This BUILD_BUG_ON is intended to catch layout change
 	 * of union of ib_core_device and device.
 	 * dev must be the first element as ib_core and providers
···
 	coredev->dev.class = &ib_class;
 	coredev->dev.groups = dev->groups;
+
+	/*
+	 * Don't expose hw counters outside of the init namespace.
+	 */
+	if (!is_full_dev && dev->hw_stats_attr_index)
+		coredev->dev.groups[dev->hw_stats_attr_index] = NULL;
+
 	device_initialize(&coredev->dev);
 	coredev->owner = dev;
 	INIT_LIST_HEAD(&coredev->port_list);
···
 	u32 port;
 	int ret;

+	down_read(&devices_rwsem);
+
 	ret = rdma_nl_notify_event(device, 0, RDMA_REGISTER_EVENT);
 	if (ret)
-		return;
+		goto out;

 	rdma_for_each_port(device, port) {
 		netdev = ib_device_get_netdev(device, port);
···
 					    RDMA_NETDEV_ATTACH_EVENT);
 		dev_put(netdev);
 		if (ret)
-			return;
+			goto out;
 	}
+
+out:
+	up_read(&devices_rwsem);
 }

 /**
···
 	SET_DEVICE_OP(dev_ops, counter_alloc_stats);
 	SET_DEVICE_OP(dev_ops, counter_bind_qp);
 	SET_DEVICE_OP(dev_ops, counter_dealloc);
+	SET_DEVICE_OP(dev_ops, counter_init);
 	SET_DEVICE_OP(dev_ops, counter_unbind_qp);
 	SET_DEVICE_OP(dev_ops, counter_update_stats);
 	SET_DEVICE_OP(dev_ops, create_ah);
···
 	SET_OBJ_SIZE(dev_ops, ib_srq);
 	SET_OBJ_SIZE(dev_ops, ib_ucontext);
 	SET_OBJ_SIZE(dev_ops, ib_xrcd);
+	SET_OBJ_SIZE(dev_ops, rdma_counter);
 }
 EXPORT_SYMBOL(ib_set_device_ops);
+3 -1
drivers/infiniband/core/iwcm.c
···
 		.data		= &default_backlog,
 		.maxlen		= sizeof(default_backlog),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_INT_MAX,
 	},
 };
+20 -18
drivers/infiniband/core/mad.c
···
 			   struct ib_mad_private *mad)
 {
 	unsigned long flags;
-	int post, ret;
 	struct ib_mad_private *mad_priv;
 	struct ib_sge sg_list;
 	struct ib_recv_wr recv_wr;
 	struct ib_mad_queue *recv_queue = &qp_info->recv_queue;
+	int ret = 0;

 	/* Initialize common scatter list fields */
 	sg_list.lkey = qp_info->port_priv->pd->local_dma_lkey;
···
 	recv_wr.sg_list = &sg_list;
 	recv_wr.num_sge = 1;

-	do {
+	while (true) {
 		/* Allocate and map receive buffer */
 		if (mad) {
 			mad_priv = mad;
···
 		} else {
 			mad_priv = alloc_mad_private(port_mad_size(qp_info->port_priv),
 						     GFP_ATOMIC);
-			if (!mad_priv) {
-				ret = -ENOMEM;
-				break;
-			}
+			if (!mad_priv)
+				return -ENOMEM;
 		}
 		sg_list.length = mad_priv_dma_size(mad_priv);
 		sg_list.addr = ib_dma_map_single(qp_info->port_priv->device,
···
 						 DMA_FROM_DEVICE);
 		if (unlikely(ib_dma_mapping_error(qp_info->port_priv->device,
 						  sg_list.addr))) {
-			kfree(mad_priv);
 			ret = -ENOMEM;
-			break;
+			goto free_mad_priv;
 		}
 		mad_priv->header.mapping = sg_list.addr;
 		mad_priv->header.mad_list.mad_queue = recv_queue;
 		mad_priv->header.mad_list.cqe.done = ib_mad_recv_done;
 		recv_wr.wr_cqe = &mad_priv->header.mad_list.cqe;
-
-		/* Post receive WR */
 		spin_lock_irqsave(&recv_queue->lock, flags);
-		post = (++recv_queue->count < recv_queue->max_active);
-		list_add_tail(&mad_priv->header.mad_list.list, &recv_queue->list);
+		if (recv_queue->count >= recv_queue->max_active) {
+			/* Fully populated the receive queue */
+			spin_unlock_irqrestore(&recv_queue->lock, flags);
+			break;
+		}
+		recv_queue->count++;
+		list_add_tail(&mad_priv->header.mad_list.list,
+			      &recv_queue->list);
 		spin_unlock_irqrestore(&recv_queue->lock, flags);
+
 		ret = ib_post_recv(qp_info->qp, &recv_wr, NULL);
 		if (ret) {
 			spin_lock_irqsave(&recv_queue->lock, flags);
 			list_del(&mad_priv->header.mad_list.list);
 			recv_queue->count--;
 			spin_unlock_irqrestore(&recv_queue->lock, flags);
-			ib_dma_unmap_single(qp_info->port_priv->device,
-					    mad_priv->header.mapping,
-					    mad_priv_dma_size(mad_priv),
-					    DMA_FROM_DEVICE);
-			kfree(mad_priv);
 			dev_err(&qp_info->port_priv->device->dev,
 				"ib_post_recv failed: %d\n", ret);
 			break;
 		}
-	} while (post);
+	}

+	ib_dma_unmap_single(qp_info->port_priv->device,
+			    mad_priv->header.mapping,
+			    mad_priv_dma_size(mad_priv), DMA_FROM_DEVICE);
+free_mad_priv:
+	kfree(mad_priv);
 	return ret;
 }
+16 -2
drivers/infiniband/core/nldev.c
···
 	[RDMA_NLDEV_ATTR_PARENT_NAME]		= { .type = NLA_NUL_STRING },
 	[RDMA_NLDEV_ATTR_NAME_ASSIGN_TYPE]	= { .type = NLA_U8 },
 	[RDMA_NLDEV_ATTR_EVENT_TYPE]		= { .type = NLA_U8 },
+	[RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED] = { .type = NLA_U8 },
 };

 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
···
 				   struct ib_device *device, u32 port)
 {
 	u32 mode, mask = 0, qpn, cntn = 0;
+	bool opcnt = false;
 	int ret;

 	/* Currently only counter for QP is supported */
···
 	    nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_RES]) != RDMA_NLDEV_ATTR_RES_QP)
 		return -EINVAL;

+	if (tb[RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED])
+		opcnt = !!nla_get_u8(
+			tb[RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED]);
+
 	mode = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_MODE]);
 	if (mode == RDMA_COUNTER_MODE_AUTO) {
 		if (tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK])
 			mask = nla_get_u32(
 				tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK]);
-		return rdma_counter_set_auto_mode(device, port, mask, extack);
+		return rdma_counter_set_auto_mode(device, port, mask, opcnt,
+						  extack);
 	}

 	if (!tb[RDMA_NLDEV_ATTR_RES_LQPN])
···
 	struct ib_device *device;
 	struct sk_buff *msg;
 	u32 index, port;
+	bool opcnt;
 	int ret;

 	if (tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID])
···
 		goto err_msg;
 	}

-	ret = rdma_counter_get_mode(device, port, &mode, &mask);
+	ret = rdma_counter_get_mode(device, port, &mode, &mask, &opcnt);
 	if (ret)
 		goto err_msg;
···
 	if ((mode == RDMA_COUNTER_MODE_AUTO) &&
 	    nla_put_u32(msg, RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK, mask)) {
+		ret = -EMSGSIZE;
+		goto err_msg;
+	}
+
+	if ((mode == RDMA_COUNTER_MODE_AUTO) &&
+	    nla_put_u8(msg, RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED, opcnt)) {
 		ret = -EMSGSIZE;
 		goto err_msg;
 	}
+2 -13
drivers/infiniband/core/sysfs.c
···
 	struct ib_port_attr attr;
 	ssize_t ret;

-	static const char *state_name[] = {
-		[IB_PORT_NOP]		= "NOP",
-		[IB_PORT_DOWN]		= "DOWN",
-		[IB_PORT_INIT]		= "INIT",
-		[IB_PORT_ARMED]		= "ARMED",
-		[IB_PORT_ACTIVE]	= "ACTIVE",
-		[IB_PORT_ACTIVE_DEFER]	= "ACTIVE_DEFER"
-	};
-
 	ret = ib_query_port(ibdev, port_num, &attr);
 	if (ret)
 		return ret;

 	return sysfs_emit(buf, "%d: %s\n", attr.state,
-			  attr.state >= 0 &&
-			  attr.state < ARRAY_SIZE(state_name) ?
-			  state_name[attr.state] :
-			  "UNKNOWN");
+			  ib_port_state_to_str(attr.state));
 }

 static ssize_t lid_show(struct ib_device *ibdev, u32 port_num,
···
 	for (i = 0; i != ARRAY_SIZE(ibdev->groups); i++)
 		if (!ibdev->groups[i]) {
 			ibdev->groups[i] = &data->group;
+			ibdev->hw_stats_attr_index = i;
 			return 0;
 		}
 	WARN(true, "struct ib_device->groups is too small");
+267
drivers/infiniband/core/ucaps.c
···
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include <linux/kref.h>
+#include <linux/cdev.h>
+#include <linux/mutex.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <rdma/ib_ucaps.h>
+
+#define RDMA_UCAP_FIRST RDMA_UCAP_MLX5_CTRL_LOCAL
+
+static DEFINE_MUTEX(ucaps_mutex);
+static struct ib_ucap *ucaps_list[RDMA_UCAP_MAX];
+static bool ucaps_class_is_registered;
+static dev_t ucaps_base_dev;
+
+struct ib_ucap {
+	struct cdev cdev;
+	struct device dev;
+	struct kref ref;
+};
+
+static const char *ucap_names[RDMA_UCAP_MAX] = {
+	[RDMA_UCAP_MLX5_CTRL_LOCAL] = "mlx5_perm_ctrl_local",
+	[RDMA_UCAP_MLX5_CTRL_OTHER_VHCA] = "mlx5_perm_ctrl_other_vhca"
+};
+
+static char *ucaps_devnode(const struct device *dev, umode_t *mode)
+{
+	if (mode)
+		*mode = 0600;
+
+	return kasprintf(GFP_KERNEL, "infiniband/%s", dev_name(dev));
+}
+
+static const struct class ucaps_class = {
+	.name = "infiniband_ucaps",
+	.devnode = ucaps_devnode,
+};
+
+static const struct file_operations ucaps_cdev_fops = {
+	.owner = THIS_MODULE,
+	.open = simple_open,
+};
+
+/**
+ * ib_cleanup_ucaps - cleanup all API resources and class.
+ *
+ * This is called once, when removing the ib_uverbs module.
+ */
+void ib_cleanup_ucaps(void)
+{
+	mutex_lock(&ucaps_mutex);
+	if (!ucaps_class_is_registered) {
+		mutex_unlock(&ucaps_mutex);
+		return;
+	}
+
+	for (int i = RDMA_UCAP_FIRST; i < RDMA_UCAP_MAX; i++)
+		WARN_ON(ucaps_list[i]);
+
+	class_unregister(&ucaps_class);
+	ucaps_class_is_registered = false;
+	unregister_chrdev_region(ucaps_base_dev, RDMA_UCAP_MAX);
+	mutex_unlock(&ucaps_mutex);
+}
+
+static int get_ucap_from_devt(dev_t devt, u64 *idx_mask)
+{
+	for (int type = RDMA_UCAP_FIRST; type < RDMA_UCAP_MAX; type++) {
+		if (ucaps_list[type] && ucaps_list[type]->dev.devt == devt) {
+			*idx_mask |= 1 << type;
+			return 0;
+		}
+	}
+
+	return -EINVAL;
+}
+
+static int get_devt_from_fd(unsigned int fd, dev_t *ret_dev)
+{
+	struct file *file;
+
+	file = fget(fd);
+	if (!file)
+		return -EBADF;
+
+	*ret_dev = file_inode(file)->i_rdev;
+	fput(file);
+	return 0;
+}
+
+/**
+ * ib_ucaps_init - Initialization required before ucap creation.
+ *
+ * Return: 0 on success, or a negative errno value on failure
+ */
+static int ib_ucaps_init(void)
+{
+	int ret = 0;
+
+	if (ucaps_class_is_registered)
+		return ret;
+
+	ret = class_register(&ucaps_class);
+	if (ret)
+		return ret;
+
+	ret = alloc_chrdev_region(&ucaps_base_dev, 0, RDMA_UCAP_MAX,
+				  ucaps_class.name);
+	if (ret < 0) {
+		class_unregister(&ucaps_class);
+		return ret;
+	}
+
+	ucaps_class_is_registered = true;
+
+	return 0;
+}
+
+static void ucap_dev_release(struct device *device)
+{
+	struct ib_ucap *ucap = container_of(device, struct ib_ucap, dev);
+
+	kfree(ucap);
+}
+
+/**
+ * ib_create_ucap - Add a ucap character device
+ * @type: UCAP type
+ *
+ * Creates a ucap character device in the /dev/infiniband directory. By default,
+ * the device has root-only read-write access.
+ *
+ * A driver may call this multiple times with the same UCAP type. A reference
+ * count tracks creations and deletions.
+ *
+ * Return: 0 on success, or a negative errno value on failure
+ */
+int ib_create_ucap(enum rdma_user_cap type)
+{
+	struct ib_ucap *ucap;
+	int ret;
+
+	if (type >= RDMA_UCAP_MAX)
+		return -EINVAL;
+
+	mutex_lock(&ucaps_mutex);
+	ret = ib_ucaps_init();
+	if (ret)
+		goto unlock;
+
+	ucap = ucaps_list[type];
+	if (ucap) {
+		kref_get(&ucap->ref);
+		mutex_unlock(&ucaps_mutex);
+		return 0;
+	}
+
+	ucap = kzalloc(sizeof(*ucap), GFP_KERNEL);
+	if (!ucap) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	device_initialize(&ucap->dev);
+	ucap->dev.class = &ucaps_class;
+	ucap->dev.devt = MKDEV(MAJOR(ucaps_base_dev), type);
+	ucap->dev.release = ucap_dev_release;
+	ret = dev_set_name(&ucap->dev, ucap_names[type]);
+	if (ret)
+		goto err_device;
+
+	cdev_init(&ucap->cdev, &ucaps_cdev_fops);
+	ucap->cdev.owner = THIS_MODULE;
+
+	ret = cdev_device_add(&ucap->cdev, &ucap->dev);
+	if (ret)
+		goto err_device;
+
+	kref_init(&ucap->ref);
+	ucaps_list[type] = ucap;
+	mutex_unlock(&ucaps_mutex);
+
+	return 0;
+
+err_device:
+	put_device(&ucap->dev);
+unlock:
+	mutex_unlock(&ucaps_mutex);
+	return ret;
+}
+EXPORT_SYMBOL(ib_create_ucap);
+
+static void ib_release_ucap(struct kref *ref)
+{
+	struct ib_ucap *ucap = container_of(ref, struct ib_ucap, ref);
+	enum rdma_user_cap type;
+
+	for (type = RDMA_UCAP_FIRST; type < RDMA_UCAP_MAX; type++) {
+		if (ucaps_list[type] == ucap)
+			break;
+	}
+	WARN_ON(type == RDMA_UCAP_MAX);
+
+	ucaps_list[type] = NULL;
+	cdev_device_del(&ucap->cdev, &ucap->dev);
+	put_device(&ucap->dev);
+}
+
+/**
+ * ib_remove_ucap - Remove a ucap character device
+ * @type: User cap type
+ *
+ * Removes the ucap character device according to type. The device is completely
+ * removed from the filesystem when its reference count reaches 0.
+ */
+void ib_remove_ucap(enum rdma_user_cap type)
+{
+	struct ib_ucap *ucap;
+
+	mutex_lock(&ucaps_mutex);
+	ucap = ucaps_list[type];
+	if (WARN_ON(!ucap))
+		goto end;
+
+	kref_put(&ucap->ref, ib_release_ucap);
+end:
+	mutex_unlock(&ucaps_mutex);
+}
+EXPORT_SYMBOL(ib_remove_ucap);
+
+/**
+ * ib_get_ucaps - Get bitmask of ucap types from file descriptors
+ * @fds: Array of file descriptors
+ * @fd_count: Number of file descriptors in the array
+ * @idx_mask: Bitmask to be updated based on the ucaps in the fd list
+ *
+ * Given an array of file descriptors, this function returns a bitmask of
+ * the ucaps where a bit is set if an FD for that ucap type was in the array.
+ *
+ * Return: 0 on success, or a negative errno value on failure
+ */
+int ib_get_ucaps(int *fds, int fd_count, uint64_t *idx_mask)
+{
+	int ret = 0;
+	dev_t dev;
+
+	*idx_mask = 0;
+	mutex_lock(&ucaps_mutex);
+	for (int i = 0; i < fd_count; i++) {
+		ret = get_devt_from_fd(fds[i], &dev);
+		if (ret)
+			goto end;
+
+		ret = get_ucap_from_devt(dev, idx_mask);
+		if (ret)
+			goto end;
+	}
+
+end:
+	mutex_unlock(&ucaps_mutex);
+	return ret;
+}
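On the userspace side the capability handle is just a file descriptor: the fd array is forwarded to the kernel when the device is opened, and ib_get_ucaps() maps each fd back to a capability bit through the character device's dev_t. A hedged sketch of obtaining the fd with plain POSIX open(); the hand-off to the device-open path happens in the provider library (rdma-core), which is not part of this series:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* Works only after root granted access, e.g. with chown. */
            int fd = open("/dev/infiniband/mlx5_perm_ctrl_local", O_RDWR);

            if (fd < 0) {
                    perror("open ucap");
                    return 1;
            }

            /* ... hand fd to the device-open path via the provider ... */

            close(fd);
            return 0;
    }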
+3 -1
drivers/infiniband/core/ucma.c
···
 		.data		= &max_backlog,
 		.maxlen		= sizeof max_backlog,
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_INT_MAX,
 	},
 };
+26 -10
drivers/infiniband/core/umem.c
···
 				     unsigned long pgsz_bitmap,
 				     unsigned long virt)
 {
-	struct scatterlist *sg;
+	unsigned long curr_len = 0;
+	dma_addr_t curr_base = ~0;
 	unsigned long va, pgoff;
+	struct scatterlist *sg;
 	dma_addr_t mask;
+	dma_addr_t end;
 	int i;

 	umem->iova = va = virt;
···
 	pgoff = umem->address & ~PAGE_MASK;

 	for_each_sgtable_dma_sg(&umem->sgt_append.sgt, sg, i) {
-		/* Walk SGL and reduce max page size if VA/PA bits differ
-		 * for any address.
+		/* If the current entry is physically contiguous with the previous
+		 * one, no need to take its start addresses into consideration.
 		 */
-		mask |= (sg_dma_address(sg) + pgoff) ^ va;
+		if (check_add_overflow(curr_base, curr_len, &end) ||
+		    end != sg_dma_address(sg)) {
+
+			curr_base = sg_dma_address(sg);
+			curr_len = 0;
+
+			/* Reduce max page size if VA/PA bits differ */
+			mask |= (curr_base + pgoff) ^ va;
+
+			/* The alignment of any VA matching a discontinuity point
+			 * in the physical memory sets the maximum possible page
+			 * size as this must be a starting point of a new page that
+			 * needs to be aligned.
+			 */
+			if (i != 0)
+				mask |= va;
+		}
+
+		curr_len += sg_dma_len(sg);
 		va += sg_dma_len(sg) - pgoff;
-		/* Except for the last entry, the ending iova alignment sets
-		 * the maximum possible page size as the low bits of the iova
-		 * must be zero when starting the next chunk.
-		 */
-		if (i != (umem->sgt_append.sgt.nents - 1))
-			mask |= va;
+
 		pgoff = 0;
 	}
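To see why the coalescing matters, here is a hedged stand-alone model of the mask computation above; pgoff and the overflow guard are simplified away, and best_pgsz_mask() is illustrative, not a kernel API. Only physical discontinuities constrain the page size, so a contiguous range larger than 4G no longer shrinks the result:

    #include <stdint.h>
    #include <stdio.h>

    /* Each discontinuity XORs the new chunk start against the running
     * VA (and, past the first chunk, the VA itself); the lowest set bit
     * of the returned mask bounds the usable page size. Physically
     * contiguous entries are merged and add no constraint. */
    static uint64_t best_pgsz_mask(const uint64_t *addr, const uint64_t *len,
                                   int n, uint64_t virt)
    {
            uint64_t mask = 0, va = virt;
            uint64_t curr_base = ~0ULL, curr_len = 0;

            for (int i = 0; i < n; i++) {
                    if (curr_base + curr_len != addr[i]) {
                            curr_base = addr[i];    /* discontinuity */
                            curr_len = 0;
                            mask |= curr_base ^ va;
                            if (i != 0)
                                    mask |= va;
                    }
                    curr_len += len[i];
                    va += len[i];
            }
            return mask;
    }

    int main(void)
    {
            /* Two physically contiguous 4K entries: the mask stays 0,
             * so 8K and larger page sizes remain selectable. */
            uint64_t addr[] = { 0x10000000, 0x10001000 };
            uint64_t len[]  = { 0x1000, 0x1000 };

            printf("mask=%#llx\n",
                   (unsigned long long)best_pgsz_mask(addr, len, 2, addr[0]));
            return 0;
    }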
+95 -68
drivers/infiniband/core/uverbs_cmd.c
···
 #include <rdma/uverbs_types.h>
 #include <rdma/uverbs_std_types.h>
+#include <rdma/ib_ucaps.h>
 #include "rdma_core.h"

 #include "uverbs.h"
···
 {
 	struct ib_ucontext *ucontext = attrs->context;
 	struct ib_uverbs_file *file = attrs->ufile;
+	int *fd_array;
+	int fd_count;
 	int ret;

 	if (!down_read_trylock(&file->hw_destroy_rwsem))
···
 				   RDMACG_RESOURCE_HCA_HANDLE);
 	if (ret)
 		goto err;
+
+	if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_GET_CONTEXT_FD_ARR)) {
+		fd_count = uverbs_attr_ptr_get_array_size(attrs,
+							  UVERBS_ATTR_GET_CONTEXT_FD_ARR,
+							  sizeof(int));
+		if (fd_count < 0) {
+			ret = fd_count;
+			goto err_uncharge;
+		}
+
+		fd_array = uverbs_attr_get_alloced_ptr(attrs,
+						       UVERBS_ATTR_GET_CONTEXT_FD_ARR);
+		ret = ib_get_ucaps(fd_array, fd_count, &ucontext->enabled_caps);
+		if (ret)
+			goto err_uncharge;
+	}

 	ret = ucontext->device->ops.alloc_ucontext(ucontext,
 						   &attrs->driver_udata);
···
 		goto err_free;

 	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
-	if (!pd) {
-		ret = -EINVAL;
+	if (IS_ERR(pd)) {
+		ret = PTR_ERR(pd);
 		goto err_free;
 	}
···
 	if (cmd.flags & IB_MR_REREG_PD) {
 		new_pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle,
 					   attrs);
-		if (!new_pd) {
-			ret = -EINVAL;
+		if (IS_ERR(new_pd)) {
+			ret = PTR_ERR(new_pd);
 			goto put_uobjs;
 		}
 	} else {
···
 		return PTR_ERR(uobj);

 	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
-	if (!pd) {
-		ret = -EINVAL;
+	if (IS_ERR(pd)) {
+		ret = PTR_ERR(pd);
 		goto err_free;
 	}
···
 		return ret;

 	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
-	if (!cq)
-		return -EINVAL;
+	if (IS_ERR(cq))
+		return PTR_ERR(cq);

 	ret = cq->device->ops.resize_cq(cq, cmd.cqe, &attrs->driver_udata);
 	if (ret)
···
 		return ret;

 	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
-	if (!cq)
-		return -EINVAL;
+	if (IS_ERR(cq))
+		return PTR_ERR(cq);

 	/* we copy a struct ib_uverbs_poll_cq_resp to user space */
 	header_ptr = attrs->ucore.outbuf;
···
 		return ret;

 	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
-	if (!cq)
-		return -EINVAL;
+	if (IS_ERR(cq))
+		return PTR_ERR(cq);

 	ib_req_notify_cq(cq, cmd.solicited_only ?
 			 IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
···
 		ind_tbl = uobj_get_obj_read(rwq_ind_table,
 					    UVERBS_OBJECT_RWQ_IND_TBL,
 					    cmd->rwq_ind_tbl_handle, attrs);
-		if (!ind_tbl) {
-			ret = -EINVAL;
+		if (IS_ERR(ind_tbl)) {
+			ret = PTR_ERR(ind_tbl);
 			goto err_put;
 		}
···
 		if (cmd->is_srq) {
 			srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ,
 						cmd->srq_handle, attrs);
-			if (!srq || srq->srq_type == IB_SRQT_XRC) {
-				ret = -EINVAL;
+			if (IS_ERR(srq) ||
+			    srq->srq_type == IB_SRQT_XRC) {
+				ret = IS_ERR(srq) ? PTR_ERR(srq) :
+						    -EINVAL;
 				goto err_put;
 			}
 		}
···
 			rcq = uobj_get_obj_read(
 				cq, UVERBS_OBJECT_CQ,
 				cmd->recv_cq_handle, attrs);
-			if (!rcq) {
-				ret = -EINVAL;
+			if (IS_ERR(rcq)) {
+				ret = PTR_ERR(rcq);
 				goto err_put;
 			}
 		}
 	}
 }

-	if (has_sq)
+	if (has_sq) {
 		scq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ,
 					cmd->send_cq_handle, attrs);
+		if (IS_ERR(scq)) {
+			ret = PTR_ERR(scq);
+			goto err_put;
+		}
+	}
+
 	if (!ind_tbl && cmd->qp_type != IB_QPT_XRC_INI)
 		rcq = rcq ?: scq;
 	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd->pd_handle,
 			       attrs);
-	if (!pd || (!scq && has_sq)) {
-		ret = -EINVAL;
+	if (IS_ERR(pd)) {
+		ret = PTR_ERR(pd);
 		goto err_put;
 	}
···
 err_put:
 	if (!IS_ERR(xrcd_uobj))
 		uobj_put_read(xrcd_uobj);
-	if (pd)
+	if (!IS_ERR_OR_NULL(pd))
 		uobj_put_obj_read(pd);
-	if (scq)
+	if (!IS_ERR_OR_NULL(scq))
 		rdma_lookup_put_uobject(&scq->uobject->uevent.uobject,
 					UVERBS_LOOKUP_READ);
-	if (rcq && rcq != scq)
+	if (!IS_ERR_OR_NULL(rcq) && rcq != scq)
 		rdma_lookup_put_uobject(&rcq->uobject->uevent.uobject,
 					UVERBS_LOOKUP_READ);
-	if (srq)
+	if (!IS_ERR_OR_NULL(srq))
 		rdma_lookup_put_uobject(&srq->uobject->uevent.uobject,
 					UVERBS_LOOKUP_READ);
-	if (ind_tbl)
+	if (!IS_ERR_OR_NULL(ind_tbl))
 		uobj_put_obj_read(ind_tbl);

 	uobj_alloc_abort(&obj->uevent.uobject, attrs);
···
 	}

 	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
-	if (!qp) {
-		ret = -EINVAL;
+	if (IS_ERR(qp)) {
+		ret = PTR_ERR(qp);
 		goto out;
 	}
···
 	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd->base.qp_handle,
 			       attrs);
-	if (!qp) {
-		ret = -EINVAL;
+	if (IS_ERR(qp)) {
+		ret = PTR_ERR(qp);
 		goto out;
 	}
···
 		return -ENOMEM;

 	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
-	if (!qp) {
-		ret = -EINVAL;
+	if (IS_ERR(qp)) {
+		ret = PTR_ERR(qp);
 		goto out;
 	}
···
 			ud->ah = uobj_get_obj_read(ah, UVERBS_OBJECT_AH,
 						   user_wr->wr.ud.ah, attrs);
-			if (!ud->ah) {
+			if (IS_ERR(ud->ah)) {
+				ret = PTR_ERR(ud->ah);
 				kfree(ud);
-				ret = -EINVAL;
 				goto out_put;
 			}
 			ud->remote_qpn = user_wr->wr.ud.remote_qpn;
···
 		return PTR_ERR(wr);

 	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
-	if (!qp) {
-		ret = -EINVAL;
+	if (IS_ERR(qp)) {
+		ret = PTR_ERR(qp);
 		goto out;
 	}
···
 		return PTR_ERR(wr);

 	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs);
-	if (!srq) {
-		ret = -EINVAL;
+	if (IS_ERR(srq)) {
+		ret = PTR_ERR(srq);
 		goto out;
 	}
···
 	}

 	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
-	if (!pd) {
-		ret = -EINVAL;
+	if (IS_ERR(pd)) {
+		ret = PTR_ERR(pd);
 		goto err;
 	}
···
 		return ret;

 	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
-	if (!qp)
-		return -EINVAL;
+	if (IS_ERR(qp))
+		return PTR_ERR(qp);

 	obj = qp->uobject;
···
 		return ret;

 	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
-	if (!qp)
-		return -EINVAL;
+	if (IS_ERR(qp))
+		return PTR_ERR(qp);

 	obj = qp->uobject;
 	mutex_lock(&obj->mcast_lock);
···
 			UVERBS_OBJECT_FLOW_ACTION,
 			kern_spec->action.handle,
 			attrs);
-		if (!ib_spec->action.act)
-			return -EINVAL;
+		if (IS_ERR(ib_spec->action.act))
+			return PTR_ERR(ib_spec->action.act);
 		ib_spec->action.size =
 			sizeof(struct ib_flow_spec_action_handle);
 		flow_resources_add(uflow_res,
···
 			UVERBS_OBJECT_COUNTERS,
 			kern_spec->flow_count.handle,
 			attrs);
-		if (!ib_spec->flow_count.counters)
-			return -EINVAL;
+		if (IS_ERR(ib_spec->flow_count.counters))
+			return PTR_ERR(ib_spec->flow_count.counters);
 		ib_spec->flow_count.size =
 			sizeof(struct ib_flow_spec_action_count);
 		flow_resources_add(uflow_res,
···
 		return PTR_ERR(obj);

 	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd.pd_handle, attrs);
-	if (!pd) {
-		err = -EINVAL;
+	if (IS_ERR(pd)) {
+		err = PTR_ERR(pd);
 		goto err_uobj;
 	}

 	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
-	if (!cq) {
-		err = -EINVAL;
+	if (IS_ERR(cq)) {
+		err = PTR_ERR(cq);
 		goto err_put_pd;
 	}
···
 		return -EINVAL;

 	wq = uobj_get_obj_read(wq, UVERBS_OBJECT_WQ, cmd.wq_handle, attrs);
-	if (!wq)
-		return -EINVAL;
+	if (IS_ERR(wq))
+		return PTR_ERR(wq);

 	if (cmd.attr_mask & IB_WQ_FLAGS) {
 		wq_attr.flags = cmd.flags;
···
 	     num_read_wqs++) {
 		wq = uobj_get_obj_read(wq, UVERBS_OBJECT_WQ,
 				       wqs_handles[num_read_wqs], attrs);
-		if (!wq) {
-			err = -EINVAL;
+		if (IS_ERR(wq)) {
+			err = PTR_ERR(wq);
 			goto put_wqs;
 		}
···
 	}

 	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
-	if (!qp) {
-		err = -EINVAL;
+	if (IS_ERR(qp)) {
+		err = PTR_ERR(qp);
 		goto err_uobj;
 	}
···
 	if (ib_srq_has_cq(cmd->srq_type)) {
 		attr.ext.cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ,
 						cmd->cq_handle, attrs);
-		if (!attr.ext.cq) {
-			ret = -EINVAL;
+		if (IS_ERR(attr.ext.cq)) {
+			ret = PTR_ERR(attr.ext.cq);
 			goto err_put_xrcd;
 		}
 	}

 	pd = uobj_get_obj_read(pd, UVERBS_OBJECT_PD, cmd->pd_handle, attrs);
-	if (!pd) {
-		ret = -EINVAL;
+	if (IS_ERR(pd)) {
+		ret = PTR_ERR(pd);
 		goto err_put_cq;
 	}
···
 		return ret;

 	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs);
-	if (!srq)
-		return -EINVAL;
+	if (IS_ERR(srq))
+		return PTR_ERR(srq);

 	attr.max_wr = cmd.max_wr;
 	attr.srq_limit = cmd.srq_limit;
···
 		return ret;

 	srq = uobj_get_obj_read(srq, UVERBS_OBJECT_SRQ, cmd.srq_handle, attrs);
-	if (!srq)
-		return -EINVAL;
+	if (IS_ERR(srq))
+		return PTR_ERR(srq);

 	ret = ib_query_srq(srq, &attr);
···
 		return -EOPNOTSUPP;

 	cq = uobj_get_obj_read(cq, UVERBS_OBJECT_CQ, cmd.cq_handle, attrs);
-	if (!cq)
-		return -EINVAL;
+	if (IS_ERR(cq))
+		return PTR_ERR(cq);

 	ret = rdma_set_cq_moderation(cq, cmd.attr.cq_count, cmd.attr.cq_period);
+2
drivers/infiniband/core/uverbs_main.c
··· 52 52 #include <rdma/ib.h> 53 53 #include <rdma/uverbs_std_types.h> 54 54 #include <rdma/rdma_netlink.h> 55 + #include <rdma/ib_ucaps.h> 55 56 56 57 #include "uverbs.h" 57 58 #include "core_priv.h" ··· 1346 1345 IB_UVERBS_NUM_FIXED_MINOR); 1347 1346 unregister_chrdev_region(dynamic_uverbs_dev, 1348 1347 IB_UVERBS_NUM_DYNAMIC_MINOR); 1348 + ib_cleanup_ucaps(); 1349 1349 mmu_notifier_synchronize(); 1350 1350 } 1351 1351
+4
drivers/infiniband/core/uverbs_std_types_device.c
···
 			   UVERBS_ATTR_TYPE(u32), UA_OPTIONAL),
 	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_GET_CONTEXT_CORE_SUPPORT,
 			    UVERBS_ATTR_TYPE(u64), UA_OPTIONAL),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_GET_CONTEXT_FD_ARR,
+			   UVERBS_ATTR_MIN_SIZE(sizeof(int)),
+			   UA_OPTIONAL,
+			   UA_ALLOC_AND_COPY),
 	UVERBS_ATTR_UHW());

 DECLARE_UVERBS_NAMED_METHOD(
+7 -6
drivers/infiniband/core/verbs.c
···
 	if (!qp->uobject)
 		rdma_rw_cleanup_mrs(qp);

-	rdma_counter_unbind_qp(qp, true);
+	rdma_counter_unbind_qp(qp, qp->port, true);
 	ret = qp->device->ops.destroy_qp(qp, udata);
 	if (ret) {
 		if (sec)
···
 bool __rdma_block_iter_next(struct ib_block_iter *biter)
 {
 	unsigned int block_offset;
-	unsigned int sg_delta;
+	unsigned int delta;

 	if (!biter->__sg_nents || !biter->__sg)
 		return false;

 	biter->__dma_addr = sg_dma_address(biter->__sg) + biter->__sg_advance;
 	block_offset = biter->__dma_addr & (BIT_ULL(biter->__pg_bit) - 1);
-	sg_delta = BIT_ULL(biter->__pg_bit) - block_offset;
+	delta = BIT_ULL(biter->__pg_bit) - block_offset;

-	if (sg_dma_len(biter->__sg) - biter->__sg_advance > sg_delta) {
-		biter->__sg_advance += sg_delta;
-	} else {
+	while (biter->__sg_nents && biter->__sg &&
+	       sg_dma_len(biter->__sg) - biter->__sg_advance <= delta) {
+		delta -= sg_dma_len(biter->__sg) - biter->__sg_advance;
 		biter->__sg_advance = 0;
 		biter->__sg = sg_next(biter->__sg);
 		biter->__sg_nents--;
 	}
+	biter->__sg_advance += delta;

 	return true;
 }
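The loop change is easiest to see in isolation: previously an iteration could advance through at most one scatterlist entry, so entries smaller than the selected block size broke the walk. A hedged stand-alone model of the reworked iterator (struct ent and next_block() are illustrative, not the kernel API):

    #include <stdint.h>
    #include <stdio.h>

    struct ent { uint64_t addr; uint64_t len; };

    /* Returns the start of the next aligned block, consuming as many
     * physically contiguous entries as end inside that block. */
    static int next_block(const struct ent *sg, int nents, int *idx,
                          uint64_t *advance, unsigned int pg_bit,
                          uint64_t *out)
    {
            if (*idx >= nents)
                    return 0;

            *out = sg[*idx].addr + *advance;

            /* bytes from *out to the end of the current block */
            uint64_t delta = (1ULL << pg_bit) -
                             (*out & ((1ULL << pg_bit) - 1));

            while (*idx < nents && sg[*idx].len - *advance <= delta) {
                    delta -= sg[*idx].len - *advance;
                    *advance = 0;
                    (*idx)++;
            }
            if (*idx < nents)
                    *advance += delta;
            return 1;
    }

    int main(void)
    {
            /* two contiguous 4K entries forming a single 8K block */
            struct ent sg[] = { { 0x10000000, 0x1000 },
                                { 0x10001000, 0x1000 } };
            uint64_t advance = 0, blk;
            int idx = 0;

            while (next_block(sg, 2, &idx, &advance, 13, &blk))
                    printf("block at %#llx\n", (unsigned long long)blk);
            return 0;
    }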
+6
drivers/infiniband/hw/bnxt_re/bnxt_re.h
···
 	unsigned long			event_bitmap;
 	struct bnxt_qplib_cc_param	cc_param;
 	struct workqueue_struct		*dcb_wq;
+	struct dentry			*cc_config;
+	struct bnxt_re_dbg_cc_config_params *cc_config_params;
 };

 #define to_bnxt_re_dev(ptr, member)	\
···
 #define BNXT_RE_CHECK_RC(x) ((x) && ((x) != -ETIMEDOUT))
 void bnxt_re_pacing_alert(struct bnxt_re_dev *rdev);
+
+int bnxt_re_assign_pma_port_counters(struct bnxt_re_dev *rdev, struct ib_mad *out_mad);
+int bnxt_re_assign_pma_port_ext_counters(struct bnxt_re_dev *rdev,
+					 struct ib_mad *out_mad);

 static inline struct device *rdev_to_dev(struct bnxt_re_dev *rdev)
 {
+214 -1
drivers/infiniband/hw/bnxt_re/debugfs.c
···
 static struct dentry *bnxt_re_debugfs_root;

+static const char * const bnxt_re_cc_gen0_name[] = {
+	"enable_cc",
+	"run_avg_weight_g",
+	"num_phase_per_state",
+	"init_cr",
+	"init_tr",
+	"tos_ecn",
+	"tos_dscp",
+	"alt_vlan_pcp",
+	"alt_vlan_dscp",
+	"rtt",
+	"cc_mode",
+	"tcp_cp",
+	"tx_queue",
+	"inactivity_cp",
+};
+
 static inline const char *bnxt_re_qp_state_str(u8 state)
 {
 	switch (state) {
···
 	debugfs_remove(qp->dentry);
 }

+static int map_cc_config_offset_gen0_ext0(u32 offset, struct bnxt_qplib_cc_param *ccparam, u32 *val)
+{
+	u64 map_offset;
+
+	map_offset = BIT(offset);
+
+	switch (map_offset) {
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ENABLE_CC:
+		*val = ccparam->enable;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_G:
+		*val = ccparam->g;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_NUMPHASEPERSTATE:
+		*val = ccparam->nph_per_state;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_CR:
+		*val = ccparam->init_cr;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_TR:
+		*val = ccparam->init_tr;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_ECN:
+		*val = ccparam->tos_ecn;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_DSCP:
+		*val = ccparam->tos_dscp;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_VLAN_PCP:
+		*val = ccparam->alt_vlan_pcp;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_TOS_DSCP:
+		*val = ccparam->alt_tos_dscp;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_RTT:
+		*val = ccparam->rtt;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_CC_MODE:
+		*val = ccparam->cc_mode;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TCP_CP:
+		*val = ccparam->tcp_cp;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static ssize_t bnxt_re_cc_config_get(struct file *filp, char __user *buffer,
+				     size_t usr_buf_len, loff_t *ppos)
+{
+	struct bnxt_re_cc_param *dbg_cc_param = filp->private_data;
+	struct bnxt_re_dev *rdev = dbg_cc_param->rdev;
+	struct bnxt_qplib_cc_param ccparam = {};
+	u32 offset = dbg_cc_param->offset;
+	char buf[16];
+	u32 val;
+	int rc;
+
+	rc = bnxt_qplib_query_cc_param(&rdev->qplib_res, &ccparam);
+	if (rc)
+		return rc;
+
+	rc = map_cc_config_offset_gen0_ext0(offset, &ccparam, &val);
+	if (rc)
+		return rc;
+
+	rc = snprintf(buf, sizeof(buf), "%d\n", val);
+	if (rc < 0)
+		return rc;
+
+	return simple_read_from_buffer(buffer, usr_buf_len, ppos, (u8 *)(buf), rc);
+}
+
+static void bnxt_re_fill_gen0_ext0(struct bnxt_qplib_cc_param *ccparam, u32 offset, u32 val)
+{
+	u32 modify_mask;
+
+	modify_mask = BIT(offset);
+
+	switch (modify_mask) {
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ENABLE_CC:
+		ccparam->enable = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_G:
+		ccparam->g = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_NUMPHASEPERSTATE:
+		ccparam->nph_per_state = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_CR:
+		ccparam->init_cr = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_TR:
+		ccparam->init_tr = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_ECN:
+		ccparam->tos_ecn = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_DSCP:
+		ccparam->tos_dscp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_VLAN_PCP:
+		ccparam->alt_vlan_pcp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_TOS_DSCP:
+		ccparam->alt_tos_dscp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_RTT:
+		ccparam->rtt = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_CC_MODE:
+		ccparam->cc_mode = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TCP_CP:
+		ccparam->tcp_cp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TX_QUEUE:
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INACTIVITY_CP:
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TIME_PER_PHASE:
+		ccparam->time_pph = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_PKTS_PER_PHASE:
+		ccparam->pkts_pph = val;
+		break;
+	}
+
+	ccparam->mask = modify_mask;
+}
+
+static int bnxt_re_configure_cc(struct bnxt_re_dev *rdev, u32 gen_ext, u32 offset, u32 val)
+{
+	struct bnxt_qplib_cc_param ccparam = { };
+
+	/* Supporting only Gen 0 now */
+	if (gen_ext == CC_CONFIG_GEN0_EXT0)
+		bnxt_re_fill_gen0_ext0(&ccparam, offset, val);
+	else
+		return -EINVAL;
+
+	bnxt_qplib_modify_cc(&rdev->qplib_res, &ccparam);
+	return 0;
+}
+
+static ssize_t bnxt_re_cc_config_set(struct file *filp, const char __user *buffer,
+				     size_t count, loff_t *ppos)
+{
+	struct bnxt_re_cc_param *dbg_cc_param = filp->private_data;
+	struct bnxt_re_dev *rdev = dbg_cc_param->rdev;
+	u32 offset = dbg_cc_param->offset;
+	u8 cc_gen = dbg_cc_param->cc_gen;
+	char buf[16];
+	u32 val;
+	int rc;
+
+	if (count >= sizeof(buf))
+		return -EINVAL;
+
+	if (copy_from_user(buf, buffer, count))
+		return -EFAULT;
+
+	buf[count] = '\0';
+	if (kstrtou32(buf, 0, &val))
+		return -EINVAL;
+
+	rc = bnxt_re_configure_cc(rdev, cc_gen, offset, val);
+	return rc ? rc : count;
+}
+
+static const struct file_operations bnxt_re_cc_config_ops = {
+	.owner = THIS_MODULE,
+	.open = simple_open,
+	.read = bnxt_re_cc_config_get,
+	.write = bnxt_re_cc_config_set,
+};
+
 void bnxt_re_debugfs_add_pdev(struct bnxt_re_dev *rdev)
 {
 	struct pci_dev *pdev = rdev->en_dev->pdev;
+	struct bnxt_re_dbg_cc_config_params *cc_params;
+	int i;

 	rdev->dbg_root = debugfs_create_dir(dev_name(&pdev->dev), bnxt_re_debugfs_root);

 	rdev->qp_debugfs = debugfs_create_dir("QPs", rdev->dbg_root);
+	rdev->cc_config = debugfs_create_dir("cc_config", rdev->dbg_root);
+
+	rdev->cc_config_params = kzalloc(sizeof(*cc_params), GFP_KERNEL);
+
+	for (i = 0; i < BNXT_RE_CC_PARAM_GEN0; i++) {
+		struct bnxt_re_cc_param *tmp_params = &rdev->cc_config_params->gen0_parms[i];
+
+		tmp_params->rdev = rdev;
+		tmp_params->offset = i;
+		tmp_params->cc_gen = CC_CONFIG_GEN0_EXT0;
+		tmp_params->dentry = debugfs_create_file(bnxt_re_cc_gen0_name[i], 0400,
+							 rdev->cc_config, tmp_params,
+							 &bnxt_re_cc_config_ops);
+	}
 }

 void bnxt_re_debugfs_rem_pdev(struct bnxt_re_dev *rdev)
 {
 	debugfs_remove_recursive(rdev->qp_debugfs);
-
+	debugfs_remove_recursive(rdev->cc_config);
+	kfree(rdev->cc_config_params);
 	debugfs_remove_recursive(rdev->dbg_root);
 	rdev->dbg_root = NULL;
 }
+15
drivers/infiniband/hw/bnxt_re/debugfs.h
···
 void bnxt_re_register_debugfs(void);
 void bnxt_re_unregister_debugfs(void);

+#define CC_CONFIG_GEN_EXT(x, y) (((x) << 16) | (y))
+#define CC_CONFIG_GEN0_EXT0 CC_CONFIG_GEN_EXT(0, 0)
+
+#define BNXT_RE_CC_PARAM_GEN0 14
+
+struct bnxt_re_cc_param {
+	struct bnxt_re_dev *rdev;
+	struct dentry *dentry;
+	u32 offset;
+	u8 cc_gen;
+};
+
+struct bnxt_re_dbg_cc_config_params {
+	struct bnxt_re_cc_param gen0_parms[BNXT_RE_CC_PARAM_GEN0];
+};
 #endif
+92
drivers/infiniband/hw/bnxt_re/hw_counters.c
···
 #include <linux/types.h>
 #include <linux/pci.h>
+#include <rdma/ib_mad.h>
+#include <rdma/ib_pma.h>

 #include "roce_hsi.h"
 #include "qplib_res.h"
···
 	stats->value[BNXT_RE_PACING_ALERT] = pacing_s->alerts;
 	stats->value[BNXT_RE_DB_FIFO_REG] =
 		readl(rdev->en_dev->bar0 + rdev->pacing.dbr_db_fifo_reg_off);
+}
+
+int bnxt_re_assign_pma_port_ext_counters(struct bnxt_re_dev *rdev, struct ib_mad *out_mad)
+{
+	struct ib_pma_portcounters_ext *pma_cnt_ext;
+	struct bnxt_qplib_ext_stat *estat = &rdev->stats.rstat.ext_stat;
+	struct ctx_hw_stats *hw_stats = NULL;
+	int rc;
+
+	hw_stats = rdev->qplib_ctx.stats.dma;
+
+	pma_cnt_ext = (struct ib_pma_portcounters_ext *)(out_mad->data + 40);
+	if (_is_ext_stats_supported(rdev->dev_attr->dev_cap_flags)) {
+		u32 fid = PCI_FUNC(rdev->en_dev->pdev->devfn);
+
+		rc = bnxt_qplib_qext_stat(&rdev->rcfw, fid, estat);
+		if (rc)
+			return rc;
+	}
+
+	pma_cnt_ext = (struct ib_pma_portcounters_ext *)(out_mad->data + 40);
+	if ((bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx) && rdev->is_virtfn) ||
+	    !bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx)) {
+		pma_cnt_ext->port_xmit_data =
+			cpu_to_be64(le64_to_cpu(hw_stats->tx_ucast_bytes) / 4);
+		pma_cnt_ext->port_rcv_data =
+			cpu_to_be64(le64_to_cpu(hw_stats->rx_ucast_bytes) / 4);
+		pma_cnt_ext->port_xmit_packets =
+			cpu_to_be64(le64_to_cpu(hw_stats->tx_ucast_pkts));
+		pma_cnt_ext->port_rcv_packets =
+			cpu_to_be64(le64_to_cpu(hw_stats->rx_ucast_pkts));
+		pma_cnt_ext->port_unicast_rcv_packets =
+			cpu_to_be64(le64_to_cpu(hw_stats->rx_ucast_pkts));
+		pma_cnt_ext->port_unicast_xmit_packets =
+			cpu_to_be64(le64_to_cpu(hw_stats->tx_ucast_pkts));
+
+	} else {
+		pma_cnt_ext->port_rcv_packets = cpu_to_be64(estat->rx_roce_good_pkts);
+		pma_cnt_ext->port_rcv_data = cpu_to_be64(estat->rx_roce_good_bytes / 4);
+		pma_cnt_ext->port_xmit_packets = cpu_to_be64(estat->tx_roce_pkts);
+		pma_cnt_ext->port_xmit_data = cpu_to_be64(estat->tx_roce_bytes / 4);
+		pma_cnt_ext->port_unicast_rcv_packets = cpu_to_be64(estat->rx_roce_good_pkts);
+		pma_cnt_ext->port_unicast_xmit_packets = cpu_to_be64(estat->tx_roce_pkts);
+	}
+	return 0;
+}
+
+int bnxt_re_assign_pma_port_counters(struct bnxt_re_dev *rdev, struct ib_mad *out_mad)
+{
+	struct bnxt_qplib_ext_stat *estat = &rdev->stats.rstat.ext_stat;
+	struct ib_pma_portcounters *pma_cnt;
+	struct ctx_hw_stats *hw_stats = NULL;
+	int rc;
+
+	hw_stats = rdev->qplib_ctx.stats.dma;
+
+	pma_cnt = (struct ib_pma_portcounters *)(out_mad->data + 40);
+	if (_is_ext_stats_supported(rdev->dev_attr->dev_cap_flags)) {
+		u32 fid = PCI_FUNC(rdev->en_dev->pdev->devfn);
+
+		rc = bnxt_qplib_qext_stat(&rdev->rcfw, fid, estat);
+		if (rc)
+			return rc;
+	}
+	if ((bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx) && rdev->is_virtfn) ||
+	    !bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx)) {
+		pma_cnt->port_rcv_packets =
+			cpu_to_be32((u32)(le64_to_cpu(hw_stats->rx_ucast_pkts)) & 0xFFFFFFFF);
+		pma_cnt->port_rcv_data =
+			cpu_to_be32((u32)((le64_to_cpu(hw_stats->rx_ucast_bytes) &
+					   0xFFFFFFFF) / 4));
+		pma_cnt->port_xmit_packets =
+			cpu_to_be32((u32)(le64_to_cpu(hw_stats->tx_ucast_pkts)) & 0xFFFFFFFF);
+		pma_cnt->port_xmit_data =
+			cpu_to_be32((u32)((le64_to_cpu(hw_stats->tx_ucast_bytes)
+					   & 0xFFFFFFFF) / 4));
+	} else {
+		pma_cnt->port_rcv_packets = cpu_to_be32(estat->rx_roce_good_pkts);
+		pma_cnt->port_rcv_data = cpu_to_be32((estat->rx_roce_good_bytes / 4));
+		pma_cnt->port_xmit_packets = cpu_to_be32(estat->tx_roce_pkts);
+		pma_cnt->port_xmit_data = cpu_to_be32((estat->tx_roce_bytes / 4));
+	}
+	pma_cnt->port_rcv_constraint_errors = (u8)(le64_to_cpu(hw_stats->rx_discard_pkts) & 0xFF);
+	pma_cnt->port_rcv_errors = cpu_to_be16((u16)(le64_to_cpu(hw_stats->rx_error_pkts)
+						     & 0xFFFF));
+	pma_cnt->port_xmit_constraint_errors = (u8)(le64_to_cpu(hw_stats->tx_error_pkts) & 0xFF);
+	pma_cnt->port_xmit_discards = cpu_to_be16((u16)(le64_to_cpu(hw_stats->tx_discard_pkts)
+							& 0xFFFF));
+
+	return 0;
 }

 int bnxt_re_ib_get_hw_stats(struct ib_device *ibdev,
+36
drivers/infiniband/hw/bnxt_re/ib_verbs.c
··· 49 49 #include <rdma/ib_addr.h> 50 50 #include <rdma/ib_mad.h> 51 51 #include <rdma/ib_cache.h> 52 + #include <rdma/ib_pma.h> 52 53 #include <rdma/uverbs_ioctl.h> 53 54 #include <linux/hashtable.h> 54 55 ··· 4490 4489 rdma_entry); 4491 4490 4492 4491 kfree(bnxt_entry); 4492 + } 4493 + 4494 + int bnxt_re_process_mad(struct ib_device *ibdev, int mad_flags, 4495 + u32 port_num, const struct ib_wc *in_wc, 4496 + const struct ib_grh *in_grh, 4497 + const struct ib_mad *in_mad, struct ib_mad *out_mad, 4498 + size_t *out_mad_size, u16 *out_mad_pkey_index) 4499 + { 4500 + struct bnxt_re_dev *rdev = to_bnxt_re_dev(ibdev, ibdev); 4501 + struct ib_class_port_info cpi = {}; 4502 + int ret = IB_MAD_RESULT_SUCCESS; 4503 + int rc = 0; 4504 + 4505 + if (in_mad->mad_hdr.mgmt_class != IB_MGMT_CLASS_PERF_MGMT) 4506 + return ret; 4507 + 4508 + switch (in_mad->mad_hdr.attr_id) { 4509 + case IB_PMA_CLASS_PORT_INFO: 4510 + cpi.capability_mask = IB_PMA_CLASS_CAP_EXT_WIDTH; 4511 + memcpy((out_mad->data + 40), &cpi, sizeof(cpi)); 4512 + break; 4513 + case IB_PMA_PORT_COUNTERS_EXT: 4514 + rc = bnxt_re_assign_pma_port_ext_counters(rdev, out_mad); 4515 + break; 4516 + case IB_PMA_PORT_COUNTERS: 4517 + rc = bnxt_re_assign_pma_port_counters(rdev, out_mad); 4518 + break; 4519 + default: 4520 + rc = -EINVAL; 4521 + break; 4522 + } 4523 + if (rc) 4524 + return IB_MAD_RESULT_FAILURE; 4525 + ret |= IB_MAD_RESULT_REPLY; 4526 + return ret; 4493 4527 } 4494 4528 4495 4529 static int UVERBS_HANDLER(BNXT_RE_METHOD_NOTIFY_DRV)(struct uverbs_attr_bundle *attrs)
+6
drivers/infiniband/hw/bnxt_re/ib_verbs.h
··· 268 268 int bnxt_re_mmap(struct ib_ucontext *context, struct vm_area_struct *vma); 269 269 void bnxt_re_mmap_free(struct rdma_user_mmap_entry *rdma_entry); 270 270 271 + int bnxt_re_process_mad(struct ib_device *device, int process_mad_flags, 272 + u32 port_num, const struct ib_wc *in_wc, 273 + const struct ib_grh *in_grh, 274 + const struct ib_mad *in_mad, struct ib_mad *out_mad, 275 + size_t *out_mad_size, u16 *out_mad_pkey_index); 276 + 271 277 static inline u32 __to_ib_port_num(u16 port_id) 272 278 { 273 279 return (u32)port_id + 1;
+1
drivers/infiniband/hw/bnxt_re/main.c
··· 1285 1285 .post_recv = bnxt_re_post_recv, 1286 1286 .post_send = bnxt_re_post_send, 1287 1287 .post_srq_recv = bnxt_re_post_srq_recv, 1288 + .process_mad = bnxt_re_process_mad, 1288 1289 .query_ah = bnxt_re_query_ah, 1289 1290 .query_device = bnxt_re_query_device, 1290 1291 .modify_device = bnxt_re_modify_device,
-1
drivers/infiniband/hw/erdma/erdma_cm.c
··· 709 709 erdma_cancel_mpatimer(new_cep); 710 710 711 711 erdma_cep_put(new_cep); 712 - new_cep->sock = NULL; 713 712 } 714 713 715 714 if (new_s) {
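The line removed above wrote to new_cep after erdma_cep_put() had already dropped the reference, so the store could land in freed memory when that put was the last one. The safe ordering, sketched generically (obj and obj_release are placeholders, not erdma symbols):

obj->sock = NULL;			/* detach while a reference is still held */
kref_put(&obj->ref, obj_release);	/* may free obj; do not touch it afterwards */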
-18
drivers/infiniband/hw/hfi1/chip.c
··· 12882 12882 } 12883 12883 } 12884 12884 12885 - /* return the OPA port logical state name */ 12886 - const char *opa_lstate_name(u32 lstate) 12887 - { 12888 - static const char * const port_logical_names[] = { 12889 - "PORT_NOP", 12890 - "PORT_DOWN", 12891 - "PORT_INIT", 12892 - "PORT_ARMED", 12893 - "PORT_ACTIVE", 12894 - "PORT_ACTIVE_DEFER", 12895 - }; 12896 - if (lstate < ARRAY_SIZE(port_logical_names)) 12897 - return port_logical_names[lstate]; 12898 - return "unknown"; 12899 - } 12900 - 12901 12885 /* return the OPA port physical state name */ 12902 12886 const char *opa_pstate_name(u32 pstate) 12903 12887 { ··· 12940 12956 break; 12941 12957 } 12942 12958 } 12943 - dd_dev_info(ppd->dd, "logical state changed to %s (0x%x)\n", 12944 - opa_lstate_name(state), state); 12945 12959 } 12946 12960 12947 12961 /**
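The removed opa_lstate_name() duplicated name strings the core already provides; the hfi1 hunks below switch to ib_port_state_to_str() from include/rdma/ib_verbs.h, which maps the shared IB_PORT_* logical states to names. Usage sketch:

#include <rdma/ib_verbs.h>

pr_info("link is %s\n", ib_port_state_to_str(IB_PORT_ACTIVE));	/* e.g. "ACTIVE" */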
-1
drivers/infiniband/hw/hfi1/chip.h
··· 771 771 bool is_urg_masked(struct hfi1_ctxtdata *rcd); 772 772 u32 read_physical_state(struct hfi1_devdata *dd); 773 773 u32 chip_to_opa_pstate(struct hfi1_devdata *dd, u32 chip_pstate); 774 - const char *opa_lstate_name(u32 lstate); 775 774 const char *opa_pstate_name(u32 pstate); 776 775 u32 driver_pstate(struct hfi1_pportdata *ppd); 777 776 u32 driver_lstate(struct hfi1_pportdata *ppd);
+1 -1
drivers/infiniband/hw/hfi1/driver.c
··· 968 968 if (hwstate != IB_PORT_ACTIVE) { 969 969 dd_dev_info(packet->rcd->dd, 970 970 "Unexpected link state %s\n", 971 - opa_lstate_name(hwstate)); 971 + ib_port_state_to_str(hwstate)); 972 972 return false; 973 973 } 974 974
+2 -2
drivers/infiniband/hw/hfi1/mad.c
··· 1160 1160 if (ret == HFI_TRANSITION_DISALLOWED || 1161 1161 ret == HFI_TRANSITION_UNDEFINED) { 1162 1162 pr_warn("invalid logical state transition %s -> %s\n", 1163 - opa_lstate_name(logical_old), 1164 - opa_lstate_name(logical_new)); 1163 + ib_port_state_to_str(logical_old), 1164 + ib_port_state_to_str(logical_new)); 1165 1165 return ret; 1166 1166 } 1167 1167
-20
drivers/infiniband/hw/hfi1/qsfp.c
··· 405 405 } 406 406 407 407 /* 408 - * Perform a stand-alone single QSFP write. Acquire the resource, do the 409 - * write, then release the resource. 410 - */ 411 - int one_qsfp_write(struct hfi1_pportdata *ppd, u32 target, int addr, void *bp, 412 - int len) 413 - { 414 - struct hfi1_devdata *dd = ppd->dd; 415 - u32 resource = qsfp_resource(dd); 416 - int ret; 417 - 418 - ret = acquire_chip_resource(dd, resource, QSFP_WAIT); 419 - if (ret) 420 - return ret; 421 - ret = qsfp_write(ppd, target, addr, bp, len); 422 - release_chip_resource(dd, resource); 423 - 424 - return ret; 425 - } 426 - 427 - /* 428 408 * Access page n, offset m of QSFP memory as defined by SFF 8636 429 409 * by reading @addr = ((256 * n) + m) 430 410 *
-2
drivers/infiniband/hw/hfi1/qsfp.h
··· 195 195 int len); 196 196 int qsfp_read(struct hfi1_pportdata *ppd, u32 target, int addr, void *bp, 197 197 int len); 198 - int one_qsfp_write(struct hfi1_pportdata *ppd, u32 target, int addr, void *bp, 199 - int len); 200 198 int one_qsfp_read(struct hfi1_pportdata *ppd, u32 target, int addr, void *bp, 201 199 int len); 202 200 struct hfi1_asic_data;
+1 -1
drivers/infiniband/hw/hns/hns_roce_mr.c
··· 998 998 if (attr->region_count > ARRAY_SIZE(attr->region) || 999 999 attr->region_count < 1 || attr->page_shift < HNS_HW_PAGE_SHIFT) { 1000 1000 ibdev_err(ibdev, 1001 - "invalid buf attr, region count %d, page shift %u.\n", 1001 + "invalid buf attr, region count %u, page shift %u.\n", 1002 1002 attr->region_count, attr->page_shift); 1003 1003 return false; 1004 1004 }
+1 -1
drivers/infiniband/hw/hns/hns_roce_qp.c
··· 1320 1320 1321 1321 ret = hns_roce_create_qp_common(hr_dev, init_attr, udata, hr_qp); 1322 1322 if (ret) 1323 - ibdev_err(ibdev, "create QP type 0x%x failed(%d)\n", 1323 + ibdev_err(ibdev, "create QP type %d failed(%d)\n", 1324 1324 init_attr->qp_type, ret); 1325 1325 1326 1326 err_out:
+1 -1
drivers/infiniband/hw/hns/hns_roce_srq.c
··· 51 51 break; 52 52 default: 53 53 dev_err(hr_dev->dev, 54 - "hns_roce:Unexpected event type 0x%x on SRQ %06lx\n", 54 + "hns_roce:Unexpected event type %d on SRQ %06lx\n", 55 55 event_type, srq->srqn); 56 56 return; 57 57 }
+1
drivers/infiniband/hw/irdma/Kconfig
··· 7 7 depends on ICE && I40E 8 8 select GENERIC_ALLOCATOR 9 9 select AUXILIARY_BUS 10 + select CRC32 10 11 help 11 12 This is an Intel(R) Ethernet Protocol Driver for RDMA driver 12 13 that support E810 (iWARP/RoCE) and X722 (iWARP) network devices.
-1
drivers/infiniband/hw/irdma/main.h
··· 30 30 #endif 31 31 #include <linux/auxiliary_bus.h> 32 32 #include <linux/net/intel/iidc.h> 33 - #include <crypto/hash.h> 34 33 #include <rdma/ib_smi.h> 35 34 #include <rdma/ib_verbs.h> 36 35 #include <rdma/ib_pack.h>
+1 -5
drivers/infiniband/hw/irdma/osdep.h
··· 6 6 #include <linux/pci.h> 7 7 #include <linux/bitfield.h> 8 8 #include <linux/net/intel/iidc.h> 9 - #include <crypto/hash.h> 10 9 #include <rdma/ib_verbs.h> 11 10 12 11 #define STATS_TIMER_DELAY 60000 ··· 42 43 bool irdma_vf_clear_to_send(struct irdma_sc_dev *dev); 43 44 void irdma_add_dev_ref(struct irdma_sc_dev *dev); 44 45 void irdma_put_dev_ref(struct irdma_sc_dev *dev); 45 - int irdma_ieq_check_mpacrc(struct shash_desc *desc, void *addr, u32 len, 46 - u32 val); 46 + int irdma_ieq_check_mpacrc(const void *addr, u32 len, u32 val); 47 47 struct irdma_sc_qp *irdma_ieq_get_qp(struct irdma_sc_dev *dev, 48 48 struct irdma_puda_buf *buf); 49 49 void irdma_send_ieq_ack(struct irdma_sc_qp *qp); 50 50 void irdma_ieq_update_tcpip_info(struct irdma_puda_buf *buf, u16 len, 51 51 u32 seqnum); 52 - void irdma_free_hash_desc(struct shash_desc *hash_desc); 53 - int irdma_init_hash_desc(struct shash_desc **hash_desc); 54 52 int irdma_puda_get_tcpip_info(struct irdma_puda_cmpl_info *info, 55 53 struct irdma_puda_buf *buf); 56 54 int irdma_cqp_sds_cmd(struct irdma_sc_dev *dev,
+7 -12
drivers/infiniband/hw/irdma/puda.c
··· 923 923 924 924 switch (rsrc->cmpl) { 925 925 case PUDA_HASH_CRC_COMPLETE: 926 - irdma_free_hash_desc(rsrc->hash_desc); 927 - fallthrough; 928 926 case PUDA_QP_CREATED: 929 927 irdma_qp_rem_qos(&rsrc->qp); 930 928 ··· 1093 1095 goto error; 1094 1096 1095 1097 if (info->type == IRDMA_PUDA_RSRC_TYPE_IEQ) { 1096 - if (!irdma_init_hash_desc(&rsrc->hash_desc)) { 1097 - rsrc->check_crc = true; 1098 - rsrc->cmpl = PUDA_HASH_CRC_COMPLETE; 1099 - ret = 0; 1100 - } 1098 + rsrc->check_crc = true; 1099 + rsrc->cmpl = PUDA_HASH_CRC_COMPLETE; 1101 1100 } 1102 1101 1103 1102 irdma_sc_ccq_arm(&rsrc->cq); 1104 - return ret; 1103 + return 0; 1105 1104 1106 1105 error: 1107 1106 irdma_puda_dele_rsrc(vsi, info->type, false); ··· 1391 1396 crcptr = txbuf->data + fpdu_len - 4; 1392 1397 mpacrc = *(u32 *)crcptr; 1393 1398 if (ieq->check_crc) { 1394 - status = irdma_ieq_check_mpacrc(ieq->hash_desc, txbuf->data, 1395 - (fpdu_len - 4), mpacrc); 1399 + status = irdma_ieq_check_mpacrc(txbuf->data, fpdu_len - 4, 1400 + mpacrc); 1396 1401 if (status) { 1397 1402 ibdev_dbg(to_ibdev(ieq->dev), "IEQ: error bad crc\n"); 1398 1403 goto error; ··· 1460 1465 crcptr = datap + fpdu_len - 4; 1461 1466 mpacrc = *(u32 *)crcptr; 1462 1467 if (ieq->check_crc) 1463 - ret = irdma_ieq_check_mpacrc(ieq->hash_desc, datap, 1464 - fpdu_len - 4, mpacrc); 1468 + ret = irdma_ieq_check_mpacrc(datap, fpdu_len - 4, 1469 + mpacrc); 1465 1470 if (ret) { 1466 1471 list_add(&buf->list, rxlist); 1467 1472 ibdev_dbg(to_ibdev(ieq->dev),
+1 -4
drivers/infiniband/hw/irdma/puda.h
··· 119 119 u32 rx_wqe_idx; 120 120 u32 rxq_invalid_cnt; 121 121 u32 tx_wqe_avail_cnt; 122 - struct shash_desc *hash_desc; 123 122 struct list_head txpend; 124 123 struct list_head bufpool; /* free buffers pool list for recv and xmit */ 125 124 u32 alloc_buf_count; ··· 162 163 struct irdma_puda_buf *buf); 163 164 int irdma_puda_get_tcpip_info(struct irdma_puda_cmpl_info *info, 164 165 struct irdma_puda_buf *buf); 165 - int irdma_ieq_check_mpacrc(struct shash_desc *desc, void *addr, u32 len, u32 val); 166 - int irdma_init_hash_desc(struct shash_desc **desc); 166 + int irdma_ieq_check_mpacrc(const void *addr, u32 len, u32 val); 167 167 void irdma_ieq_mpa_crc_ae(struct irdma_sc_dev *dev, struct irdma_sc_qp *qp); 168 - void irdma_free_hash_desc(struct shash_desc *desc); 169 168 void irdma_ieq_update_tcpip_info(struct irdma_puda_buf *buf, u16 len, u32 seqnum); 170 169 int irdma_cqp_qp_create_cmd(struct irdma_sc_dev *dev, struct irdma_sc_qp *qp); 171 170 int irdma_cqp_cq_create_cmd(struct irdma_sc_dev *dev, struct irdma_sc_cq *cq);
+2 -45
drivers/infiniband/hw/irdma/utils.c
··· 1274 1274 } 1275 1275 1276 1276 /** 1277 - * irdma_init_hash_desc - initialize hash for crc calculation 1278 - * @desc: cryption type 1279 - */ 1280 - int irdma_init_hash_desc(struct shash_desc **desc) 1281 - { 1282 - struct crypto_shash *tfm; 1283 - struct shash_desc *tdesc; 1284 - 1285 - tfm = crypto_alloc_shash("crc32c", 0, 0); 1286 - if (IS_ERR(tfm)) 1287 - return -EINVAL; 1288 - 1289 - tdesc = kzalloc(sizeof(*tdesc) + crypto_shash_descsize(tfm), 1290 - GFP_KERNEL); 1291 - if (!tdesc) { 1292 - crypto_free_shash(tfm); 1293 - return -EINVAL; 1294 - } 1295 - 1296 - tdesc->tfm = tfm; 1297 - *desc = tdesc; 1298 - 1299 - return 0; 1300 - } 1301 - 1302 - /** 1303 - * irdma_free_hash_desc - free hash desc 1304 - * @desc: to be freed 1305 - */ 1306 - void irdma_free_hash_desc(struct shash_desc *desc) 1307 - { 1308 - if (desc) { 1309 - crypto_free_shash(desc->tfm); 1310 - kfree(desc); 1311 - } 1312 - } 1313 - 1314 - /** 1315 1277 * irdma_ieq_check_mpacrc - check if mpa crc is OK 1316 - * @desc: desc for hash 1317 1278 * @addr: address of buffer for crc 1318 1279 * @len: length of buffer 1319 1280 * @val: value to be compared 1320 1281 */ 1321 - int irdma_ieq_check_mpacrc(struct shash_desc *desc, void *addr, u32 len, 1322 - u32 val) 1282 + int irdma_ieq_check_mpacrc(const void *addr, u32 len, u32 val) 1323 1283 { 1324 - u32 crc = 0; 1325 - 1326 - crypto_shash_digest(desc, addr, len, (u8 *)&crc); 1327 - if (crc != val) 1284 + if ((__force u32)cpu_to_le32(~crc32c(~0, addr, len)) != val) 1328 1285 return -EINVAL; 1329 1286 1330 1287 return 0;
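With the crypto_shash plumbing gone, the MPA CRC check reduces to the synchronous crc32c() library helper, which is why the irdma Kconfig above gained "select CRC32". A self-contained sketch of the same check (the header name shown is the common one, but may differ across kernel versions):

#include <linux/crc32c.h>

/* MPA uses CRC32C seeded with all-ones and inverted at the end;
 * the received CRC is compared in wire (little-endian) order. */
static int mpa_crc_check(const void *data, u32 len, u32 wire_crc)
{
	u32 crc = ~crc32c(~0, data, len);

	return (__force u32)cpu_to_le32(crc) == wire_crc ? 0 : -EINVAL;
}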
+1 -1
drivers/infiniband/hw/mana/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0-only 2 2 obj-$(CONFIG_MANA_INFINIBAND) += mana_ib.o 3 3 4 - mana_ib-y := device.o main.o wq.o qp.o cq.o mr.o 4 + mana_ib-y := device.o main.o wq.o qp.o cq.o mr.o ah.o wr.o counters.o
+58
drivers/infiniband/hw/mana/ah.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (c) 2024, Microsoft Corporation. All rights reserved. 4 + */ 5 + 6 + #include "mana_ib.h" 7 + 8 + int mana_ib_create_ah(struct ib_ah *ibah, struct rdma_ah_init_attr *attr, 9 + struct ib_udata *udata) 10 + { 11 + struct mana_ib_dev *mdev = container_of(ibah->device, struct mana_ib_dev, ib_dev); 12 + struct mana_ib_ah *ah = container_of(ibah, struct mana_ib_ah, ibah); 13 + struct rdma_ah_attr *ah_attr = attr->ah_attr; 14 + const struct ib_global_route *grh; 15 + enum rdma_network_type ntype; 16 + 17 + if (ah_attr->type != RDMA_AH_ATTR_TYPE_ROCE || 18 + !(rdma_ah_get_ah_flags(ah_attr) & IB_AH_GRH)) 19 + return -EINVAL; 20 + 21 + if (udata) 22 + return -EINVAL; 23 + 24 + ah->av = dma_pool_zalloc(mdev->av_pool, GFP_ATOMIC, &ah->dma_handle); 25 + if (!ah->av) 26 + return -ENOMEM; 27 + 28 + grh = rdma_ah_read_grh(ah_attr); 29 + ntype = rdma_gid_attr_network_type(grh->sgid_attr); 30 + 31 + copy_in_reverse(ah->av->dest_mac, ah_attr->roce.dmac, ETH_ALEN); 32 + ah->av->udp_src_port = rdma_flow_label_to_udp_sport(grh->flow_label); 33 + ah->av->hop_limit = grh->hop_limit; 34 + ah->av->dscp = (grh->traffic_class >> 2) & 0x3f; 35 + ah->av->is_ipv6 = (ntype == RDMA_NETWORK_IPV6); 36 + 37 + if (ah->av->is_ipv6) { 38 + copy_in_reverse(ah->av->dest_ip, grh->dgid.raw, 16); 39 + copy_in_reverse(ah->av->src_ip, grh->sgid_attr->gid.raw, 16); 40 + } else { 41 + ah->av->dest_ip[10] = 0xFF; 42 + ah->av->dest_ip[11] = 0xFF; 43 + copy_in_reverse(&ah->av->dest_ip[12], &grh->dgid.raw[12], 4); 44 + copy_in_reverse(&ah->av->src_ip[12], &grh->sgid_attr->gid.raw[12], 4); 45 + } 46 + 47 + return 0; 48 + } 49 + 50 + int mana_ib_destroy_ah(struct ib_ah *ibah, u32 flags) 51 + { 52 + struct mana_ib_dev *mdev = container_of(ibah->device, struct mana_ib_dev, ib_dev); 53 + struct mana_ib_ah *ah = container_of(ibah, struct mana_ib_ah, ibah); 54 + 55 + dma_pool_free(mdev->av_pool, ah->av, ah->dma_handle); 56 + 57 + return 0; 58 + }
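copy_in_reverse() above is an existing mana_ib helper (defined in mana_ib.h) that byte-reverses buffers into the AV fields, since the hardware expects MACs, GIDs and IP addresses in reversed byte order. A minimal equivalent, shown only to clarify the layout:

static inline void copy_in_reverse(u8 *dst, const u8 *src, u32 size)
{
	u32 i;

	for (i = 0; i < size; i++)
		dst[size - 1 - i] = src[i];
}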
+105
drivers/infiniband/hw/mana/counters.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (c) 2024, Microsoft Corporation. All rights reserved. 4 + */ 5 + 6 + #include "counters.h" 7 + 8 + static const struct rdma_stat_desc mana_ib_port_stats_desc[] = { 9 + [MANA_IB_REQUESTER_TIMEOUT].name = "requester_timeout", 10 + [MANA_IB_REQUESTER_OOS_NAK].name = "requester_oos_nak", 11 + [MANA_IB_REQUESTER_RNR_NAK].name = "requester_rnr_nak", 12 + [MANA_IB_RESPONDER_RNR_NAK].name = "responder_rnr_nak", 13 + [MANA_IB_RESPONDER_OOS].name = "responder_oos", 14 + [MANA_IB_RESPONDER_DUP_REQUEST].name = "responder_dup_request", 15 + [MANA_IB_REQUESTER_IMPLICIT_NAK].name = "requester_implicit_nak", 16 + [MANA_IB_REQUESTER_READRESP_PSN_MISMATCH].name = "requester_readresp_psn_mismatch", 17 + [MANA_IB_NAK_INV_REQ].name = "nak_inv_req", 18 + [MANA_IB_NAK_ACCESS_ERR].name = "nak_access_error", 19 + [MANA_IB_NAK_OPP_ERR].name = "nak_opp_error", 20 + [MANA_IB_NAK_INV_READ].name = "nak_inv_read", 21 + [MANA_IB_RESPONDER_LOCAL_LEN_ERR].name = "responder_local_len_error", 22 + [MANA_IB_REQUESTOR_LOCAL_PROT_ERR].name = "requestor_local_prot_error", 23 + [MANA_IB_RESPONDER_REM_ACCESS_ERR].name = "responder_rem_access_error", 24 + [MANA_IB_RESPONDER_LOCAL_QP_ERR].name = "responder_local_qp_error", 25 + [MANA_IB_RESPONDER_MALFORMED_WQE].name = "responder_malformed_wqe", 26 + [MANA_IB_GENERAL_HW_ERR].name = "general_hw_error", 27 + [MANA_IB_REQUESTER_RNR_NAK_RETRIES_EXCEEDED].name = "requester_rnr_nak_retries_exceeded", 28 + [MANA_IB_REQUESTER_RETRIES_EXCEEDED].name = "requester_retries_exceeded", 29 + [MANA_IB_TOTAL_FATAL_ERR].name = "total_fatal_error", 30 + [MANA_IB_RECEIVED_CNPS].name = "received_cnps", 31 + [MANA_IB_NUM_QPS_CONGESTED].name = "num_qps_congested", 32 + [MANA_IB_RATE_INC_EVENTS].name = "rate_inc_events", 33 + [MANA_IB_NUM_QPS_RECOVERED].name = "num_qps_recovered", 34 + [MANA_IB_CURRENT_RATE].name = "current_rate", 35 + }; 36 + 37 + struct rdma_hw_stats *mana_ib_alloc_hw_port_stats(struct ib_device *ibdev, 38 + u32 port_num) 39 + { 40 + return rdma_alloc_hw_stats_struct(mana_ib_port_stats_desc, 41 + ARRAY_SIZE(mana_ib_port_stats_desc), 42 + RDMA_HW_STATS_DEFAULT_LIFESPAN); 43 + } 44 + 45 + int mana_ib_get_hw_stats(struct ib_device *ibdev, struct rdma_hw_stats *stats, 46 + u32 port_num, int index) 47 + { 48 + struct mana_ib_dev *mdev = container_of(ibdev, struct mana_ib_dev, 49 + ib_dev); 50 + struct mana_rnic_query_vf_cntrs_resp resp = {}; 51 + struct mana_rnic_query_vf_cntrs_req req = {}; 52 + int err; 53 + 54 + mana_gd_init_req_hdr(&req.hdr, MANA_IB_QUERY_VF_COUNTERS, 55 + sizeof(req), sizeof(resp)); 56 + req.hdr.dev_id = mdev->gdma_dev->dev_id; 57 + req.adapter = mdev->adapter_handle; 58 + 59 + err = mana_gd_send_request(mdev_to_gc(mdev), sizeof(req), &req, 60 + sizeof(resp), &resp); 61 + if (err) { 62 + ibdev_err(&mdev->ib_dev, "Failed to query vf counters err %d", 63 + err); 64 + return err; 65 + } 66 + 67 + stats->value[MANA_IB_REQUESTER_TIMEOUT] = resp.requester_timeout; 68 + stats->value[MANA_IB_REQUESTER_OOS_NAK] = resp.requester_oos_nak; 69 + stats->value[MANA_IB_REQUESTER_RNR_NAK] = resp.requester_rnr_nak; 70 + stats->value[MANA_IB_RESPONDER_RNR_NAK] = resp.responder_rnr_nak; 71 + stats->value[MANA_IB_RESPONDER_OOS] = resp.responder_oos; 72 + stats->value[MANA_IB_RESPONDER_DUP_REQUEST] = resp.responder_dup_request; 73 + stats->value[MANA_IB_REQUESTER_IMPLICIT_NAK] = 74 + resp.requester_implicit_nak; 75 + stats->value[MANA_IB_REQUESTER_READRESP_PSN_MISMATCH] = 76 + resp.requester_readresp_psn_mismatch; 77 + 
stats->value[MANA_IB_NAK_INV_REQ] = resp.nak_inv_req; 78 + stats->value[MANA_IB_NAK_ACCESS_ERR] = resp.nak_access_err; 79 + stats->value[MANA_IB_NAK_OPP_ERR] = resp.nak_opp_err; 80 + stats->value[MANA_IB_NAK_INV_READ] = resp.nak_inv_read; 81 + stats->value[MANA_IB_RESPONDER_LOCAL_LEN_ERR] = 82 + resp.responder_local_len_err; 83 + stats->value[MANA_IB_REQUESTOR_LOCAL_PROT_ERR] = 84 + resp.requestor_local_prot_err; 85 + stats->value[MANA_IB_RESPONDER_REM_ACCESS_ERR] = 86 + resp.responder_rem_access_err; 87 + stats->value[MANA_IB_RESPONDER_LOCAL_QP_ERR] = 88 + resp.responder_local_qp_err; 89 + stats->value[MANA_IB_RESPONDER_MALFORMED_WQE] = 90 + resp.responder_malformed_wqe; 91 + stats->value[MANA_IB_GENERAL_HW_ERR] = resp.general_hw_err; 92 + stats->value[MANA_IB_REQUESTER_RNR_NAK_RETRIES_EXCEEDED] = 93 + resp.requester_rnr_nak_retries_exceeded; 94 + stats->value[MANA_IB_REQUESTER_RETRIES_EXCEEDED] = 95 + resp.requester_retries_exceeded; 96 + stats->value[MANA_IB_TOTAL_FATAL_ERR] = resp.total_fatal_err; 97 + 98 + stats->value[MANA_IB_RECEIVED_CNPS] = resp.received_cnps; 99 + stats->value[MANA_IB_NUM_QPS_CONGESTED] = resp.num_qps_congested; 100 + stats->value[MANA_IB_RATE_INC_EVENTS] = resp.rate_inc_events; 101 + stats->value[MANA_IB_NUM_QPS_RECOVERED] = resp.num_qps_recovered; 102 + stats->value[MANA_IB_CURRENT_RATE] = resp.current_rate; 103 + 104 + return ARRAY_SIZE(mana_ib_port_stats_desc); 105 + }
+44
drivers/infiniband/hw/mana/counters.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * Copyright (c) 2024 Microsoft Corporation. All rights reserved. 4 + */ 5 + 6 + #ifndef _COUNTERS_H_ 7 + #define _COUNTERS_H_ 8 + 9 + #include "mana_ib.h" 10 + 11 + enum mana_ib_port_counters { 12 + MANA_IB_REQUESTER_TIMEOUT, 13 + MANA_IB_REQUESTER_OOS_NAK, 14 + MANA_IB_REQUESTER_RNR_NAK, 15 + MANA_IB_RESPONDER_RNR_NAK, 16 + MANA_IB_RESPONDER_OOS, 17 + MANA_IB_RESPONDER_DUP_REQUEST, 18 + MANA_IB_REQUESTER_IMPLICIT_NAK, 19 + MANA_IB_REQUESTER_READRESP_PSN_MISMATCH, 20 + MANA_IB_NAK_INV_REQ, 21 + MANA_IB_NAK_ACCESS_ERR, 22 + MANA_IB_NAK_OPP_ERR, 23 + MANA_IB_NAK_INV_READ, 24 + MANA_IB_RESPONDER_LOCAL_LEN_ERR, 25 + MANA_IB_REQUESTOR_LOCAL_PROT_ERR, 26 + MANA_IB_RESPONDER_REM_ACCESS_ERR, 27 + MANA_IB_RESPONDER_LOCAL_QP_ERR, 28 + MANA_IB_RESPONDER_MALFORMED_WQE, 29 + MANA_IB_GENERAL_HW_ERR, 30 + MANA_IB_REQUESTER_RNR_NAK_RETRIES_EXCEEDED, 31 + MANA_IB_REQUESTER_RETRIES_EXCEEDED, 32 + MANA_IB_TOTAL_FATAL_ERR, 33 + MANA_IB_RECEIVED_CNPS, 34 + MANA_IB_NUM_QPS_CONGESTED, 35 + MANA_IB_RATE_INC_EVENTS, 36 + MANA_IB_NUM_QPS_RECOVERED, 37 + MANA_IB_CURRENT_RATE, 38 + }; 39 + 40 + struct rdma_hw_stats *mana_ib_alloc_hw_port_stats(struct ib_device *ibdev, 41 + u32 port_num); 42 + int mana_ib_get_hw_stats(struct ib_device *ibdev, struct rdma_hw_stats *stats, 43 + u32 port_num, int index); 44 + #endif /* _COUNTERS_H_ */
+203 -31
drivers/infiniband/hw/mana/cq.c
··· 15 15 struct ib_device *ibdev = ibcq->device; 16 16 struct mana_ib_create_cq ucmd = {}; 17 17 struct mana_ib_dev *mdev; 18 + struct gdma_context *gc; 18 19 bool is_rnic_cq; 19 20 u32 doorbell; 21 + u32 buf_size; 20 22 int err; 21 23 22 24 mdev = container_of(ibdev, struct mana_ib_dev, ib_dev); 25 + gc = mdev_to_gc(mdev); 23 26 24 27 cq->comp_vector = attr->comp_vector % ibdev->num_comp_vectors; 25 28 cq->cq_handle = INVALID_MANA_HANDLE; 26 29 27 - if (udata->inlen < offsetof(struct mana_ib_create_cq, flags)) 28 - return -EINVAL; 30 + if (udata) { 31 + if (udata->inlen < offsetof(struct mana_ib_create_cq, flags)) 32 + return -EINVAL; 29 33 30 - err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen)); 31 - if (err) { 32 - ibdev_dbg(ibdev, 33 - "Failed to copy from udata for create cq, %d\n", err); 34 - return err; 34 + err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen)); 35 + if (err) { 36 + ibdev_dbg(ibdev, "Failed to copy from udata for create cq, %d\n", err); 37 + return err; 38 + } 39 + 40 + is_rnic_cq = !!(ucmd.flags & MANA_IB_CREATE_RNIC_CQ); 41 + 42 + if ((!is_rnic_cq && attr->cqe > mdev->adapter_caps.max_qp_wr) || 43 + attr->cqe > U32_MAX / COMP_ENTRY_SIZE) { 44 + ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr->cqe); 45 + return -EINVAL; 46 + } 47 + 48 + cq->cqe = attr->cqe; 49 + err = mana_ib_create_queue(mdev, ucmd.buf_addr, cq->cqe * COMP_ENTRY_SIZE, 50 + &cq->queue); 51 + if (err) { 52 + ibdev_dbg(ibdev, "Failed to create queue for create cq, %d\n", err); 53 + return err; 54 + } 55 + 56 + mana_ucontext = rdma_udata_to_drv_context(udata, struct mana_ib_ucontext, 57 + ibucontext); 58 + doorbell = mana_ucontext->doorbell; 59 + } else { 60 + is_rnic_cq = true; 61 + buf_size = MANA_PAGE_ALIGN(roundup_pow_of_two(attr->cqe * COMP_ENTRY_SIZE)); 62 + cq->cqe = buf_size / COMP_ENTRY_SIZE; 63 + err = mana_ib_create_kernel_queue(mdev, buf_size, GDMA_CQ, &cq->queue); 64 + if (err) { 65 + ibdev_dbg(ibdev, "Failed to create kernel queue for create cq, %d\n", err); 66 + return err; 67 + } 68 + doorbell = gc->mana_ib.doorbell; 35 69 } 36 - 37 - is_rnic_cq = !!(ucmd.flags & MANA_IB_CREATE_RNIC_CQ); 38 - 39 - if (!is_rnic_cq && attr->cqe > mdev->adapter_caps.max_qp_wr) { 40 - ibdev_dbg(ibdev, "CQE %d exceeding limit\n", attr->cqe); 41 - return -EINVAL; 42 - } 43 - 44 - cq->cqe = attr->cqe; 45 - err = mana_ib_create_queue(mdev, ucmd.buf_addr, cq->cqe * COMP_ENTRY_SIZE, &cq->queue); 46 - if (err) { 47 - ibdev_dbg(ibdev, "Failed to create queue for create cq, %d\n", err); 48 - return err; 49 - } 50 - 51 - mana_ucontext = rdma_udata_to_drv_context(udata, struct mana_ib_ucontext, 52 - ibucontext); 53 - doorbell = mana_ucontext->doorbell; 54 70 55 71 if (is_rnic_cq) { 56 72 err = mana_ib_gd_create_cq(mdev, cq, doorbell); ··· 82 66 } 83 67 } 84 68 85 - resp.cqid = cq->queue.id; 86 - err = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen)); 87 - if (err) { 88 - ibdev_dbg(&mdev->ib_dev, "Failed to copy to udata, %d\n", err); 89 - goto err_remove_cq_cb; 69 + if (udata) { 70 + resp.cqid = cq->queue.id; 71 + err = ib_copy_to_udata(udata, &resp, min(sizeof(resp), udata->outlen)); 72 + if (err) { 73 + ibdev_dbg(&mdev->ib_dev, "Failed to copy to udata, %d\n", err); 74 + goto err_remove_cq_cb; 75 + } 90 76 } 77 + 78 + spin_lock_init(&cq->cq_lock); 79 + INIT_LIST_HEAD(&cq->list_send_qp); 80 + INIT_LIST_HEAD(&cq->list_recv_qp); 91 81 92 82 return 0; 93 83 ··· 144 122 return -EINVAL; 145 123 /* Create CQ table entry */ 146 124 
WARN_ON(gc->cq_table[cq->queue.id]); 147 - gdma_cq = kzalloc(sizeof(*gdma_cq), GFP_KERNEL); 125 + if (cq->queue.kmem) 126 + gdma_cq = cq->queue.kmem; 127 + else 128 + gdma_cq = kzalloc(sizeof(*gdma_cq), GFP_KERNEL); 148 129 if (!gdma_cq) 149 130 return -ENOMEM; 150 131 ··· 166 141 if (cq->queue.id >= gc->max_num_cqs || cq->queue.id == INVALID_QUEUE_ID) 167 142 return; 168 143 144 + if (cq->queue.kmem) 145 + /* Then it will be cleaned and removed by the mana */ 146 + return; 147 + 169 148 kfree(gc->cq_table[cq->queue.id]); 170 149 gc->cq_table[cq->queue.id] = NULL; 150 + } 151 + 152 + int mana_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags) 153 + { 154 + struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq); 155 + struct gdma_queue *gdma_cq = cq->queue.kmem; 156 + 157 + if (!gdma_cq) 158 + return -EINVAL; 159 + 160 + mana_gd_ring_cq(gdma_cq, SET_ARM_BIT); 161 + return 0; 162 + } 163 + 164 + static inline void handle_ud_sq_cqe(struct mana_ib_qp *qp, struct gdma_comp *cqe) 165 + { 166 + struct mana_rdma_cqe *rdma_cqe = (struct mana_rdma_cqe *)cqe->cqe_data; 167 + struct gdma_queue *wq = qp->ud_qp.queues[MANA_UD_SEND_QUEUE].kmem; 168 + struct ud_sq_shadow_wqe *shadow_wqe; 169 + 170 + shadow_wqe = shadow_queue_get_next_to_complete(&qp->shadow_sq); 171 + if (!shadow_wqe) 172 + return; 173 + 174 + shadow_wqe->header.error_code = rdma_cqe->ud_send.vendor_error; 175 + 176 + wq->tail += shadow_wqe->header.posted_wqe_size; 177 + shadow_queue_advance_next_to_complete(&qp->shadow_sq); 178 + } 179 + 180 + static inline void handle_ud_rq_cqe(struct mana_ib_qp *qp, struct gdma_comp *cqe) 181 + { 182 + struct mana_rdma_cqe *rdma_cqe = (struct mana_rdma_cqe *)cqe->cqe_data; 183 + struct gdma_queue *wq = qp->ud_qp.queues[MANA_UD_RECV_QUEUE].kmem; 184 + struct ud_rq_shadow_wqe *shadow_wqe; 185 + 186 + shadow_wqe = shadow_queue_get_next_to_complete(&qp->shadow_rq); 187 + if (!shadow_wqe) 188 + return; 189 + 190 + shadow_wqe->byte_len = rdma_cqe->ud_recv.msg_len; 191 + shadow_wqe->src_qpn = rdma_cqe->ud_recv.src_qpn; 192 + shadow_wqe->header.error_code = IB_WC_SUCCESS; 193 + 194 + wq->tail += shadow_wqe->header.posted_wqe_size; 195 + shadow_queue_advance_next_to_complete(&qp->shadow_rq); 196 + } 197 + 198 + static void mana_handle_cqe(struct mana_ib_dev *mdev, struct gdma_comp *cqe) 199 + { 200 + struct mana_ib_qp *qp = mana_get_qp_ref(mdev, cqe->wq_num, cqe->is_sq); 201 + 202 + if (!qp) 203 + return; 204 + 205 + if (qp->ibqp.qp_type == IB_QPT_GSI || qp->ibqp.qp_type == IB_QPT_UD) { 206 + if (cqe->is_sq) 207 + handle_ud_sq_cqe(qp, cqe); 208 + else 209 + handle_ud_rq_cqe(qp, cqe); 210 + } 211 + 212 + mana_put_qp_ref(qp); 213 + } 214 + 215 + static void fill_verbs_from_shadow_wqe(struct mana_ib_qp *qp, struct ib_wc *wc, 216 + const struct shadow_wqe_header *shadow_wqe) 217 + { 218 + const struct ud_rq_shadow_wqe *ud_wqe = (const struct ud_rq_shadow_wqe *)shadow_wqe; 219 + 220 + wc->wr_id = shadow_wqe->wr_id; 221 + wc->status = shadow_wqe->error_code; 222 + wc->opcode = shadow_wqe->opcode; 223 + wc->vendor_err = shadow_wqe->error_code; 224 + wc->wc_flags = 0; 225 + wc->qp = &qp->ibqp; 226 + wc->pkey_index = 0; 227 + 228 + if (shadow_wqe->opcode == IB_WC_RECV) { 229 + wc->byte_len = ud_wqe->byte_len; 230 + wc->src_qp = ud_wqe->src_qpn; 231 + wc->wc_flags |= IB_WC_GRH; 232 + } 233 + } 234 + 235 + static int mana_process_completions(struct mana_ib_cq *cq, int nwc, struct ib_wc *wc) 236 + { 237 + struct shadow_wqe_header *shadow_wqe; 238 + struct mana_ib_qp *qp; 239 + int wc_index = 0; 240 + 
241 + /* process send shadow queue completions */ 242 + list_for_each_entry(qp, &cq->list_send_qp, cq_send_list) { 243 + while ((shadow_wqe = shadow_queue_get_next_to_consume(&qp->shadow_sq)) 244 + != NULL) { 245 + if (wc_index >= nwc) 246 + goto out; 247 + 248 + fill_verbs_from_shadow_wqe(qp, &wc[wc_index], shadow_wqe); 249 + shadow_queue_advance_consumer(&qp->shadow_sq); 250 + wc_index++; 251 + } 252 + } 253 + 254 + /* process recv shadow queue completions */ 255 + list_for_each_entry(qp, &cq->list_recv_qp, cq_recv_list) { 256 + while ((shadow_wqe = shadow_queue_get_next_to_consume(&qp->shadow_rq)) 257 + != NULL) { 258 + if (wc_index >= nwc) 259 + goto out; 260 + 261 + fill_verbs_from_shadow_wqe(qp, &wc[wc_index], shadow_wqe); 262 + shadow_queue_advance_consumer(&qp->shadow_rq); 263 + wc_index++; 264 + } 265 + } 266 + 267 + out: 268 + return wc_index; 269 + } 270 + 271 + int mana_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc) 272 + { 273 + struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq); 274 + struct mana_ib_dev *mdev = container_of(ibcq->device, struct mana_ib_dev, ib_dev); 275 + struct gdma_queue *queue = cq->queue.kmem; 276 + struct gdma_comp gdma_cqe; 277 + unsigned long flags; 278 + int num_polled = 0; 279 + int comp_read, i; 280 + 281 + spin_lock_irqsave(&cq->cq_lock, flags); 282 + for (i = 0; i < num_entries; i++) { 283 + comp_read = mana_gd_poll_cq(queue, &gdma_cqe, 1); 284 + if (comp_read < 1) 285 + break; 286 + mana_handle_cqe(mdev, &gdma_cqe); 287 + } 288 + 289 + num_polled = mana_process_completions(cq, num_entries, wc); 290 + spin_unlock_irqrestore(&cq->cq_lock, flags); 291 + 292 + return num_polled; 171 293 }
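The poll path above first drains hardware CQEs into per-QP shadow queues (mana_handle_cqe), then materializes ib_wc entries from the shadows, so completions are not lost when the caller's wc array is smaller than the burst. A typical in-kernel consumer loop over these verbs (sketch; handle_wc is a placeholder):

struct ib_wc wc;

while (ib_poll_cq(cq, 1, &wc) > 0)
	handle_wc(&wc);			/* placeholder consumer */

ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);	/* re-arm; ends up in mana_ib_arm_cq() */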
+76 -8
drivers/infiniband/hw/mana/device.c
··· 19 19 .add_gid = mana_ib_gd_add_gid, 20 20 .alloc_pd = mana_ib_alloc_pd, 21 21 .alloc_ucontext = mana_ib_alloc_ucontext, 22 + .create_ah = mana_ib_create_ah, 22 23 .create_cq = mana_ib_create_cq, 23 24 .create_qp = mana_ib_create_qp, 24 25 .create_rwq_ind_table = mana_ib_create_rwq_ind_table, ··· 28 27 .dealloc_ucontext = mana_ib_dealloc_ucontext, 29 28 .del_gid = mana_ib_gd_del_gid, 30 29 .dereg_mr = mana_ib_dereg_mr, 30 + .destroy_ah = mana_ib_destroy_ah, 31 31 .destroy_cq = mana_ib_destroy_cq, 32 32 .destroy_qp = mana_ib_destroy_qp, 33 33 .destroy_rwq_ind_table = mana_ib_destroy_rwq_ind_table, 34 34 .destroy_wq = mana_ib_destroy_wq, 35 35 .disassociate_ucontext = mana_ib_disassociate_ucontext, 36 + .get_dma_mr = mana_ib_get_dma_mr, 36 37 .get_link_layer = mana_ib_get_link_layer, 37 38 .get_port_immutable = mana_ib_get_port_immutable, 38 39 .mmap = mana_ib_mmap, 39 40 .modify_qp = mana_ib_modify_qp, 40 41 .modify_wq = mana_ib_modify_wq, 42 + .poll_cq = mana_ib_poll_cq, 43 + .post_recv = mana_ib_post_recv, 44 + .post_send = mana_ib_post_send, 41 45 .query_device = mana_ib_query_device, 42 46 .query_gid = mana_ib_query_gid, 43 47 .query_pkey = mana_ib_query_pkey, 44 48 .query_port = mana_ib_query_port, 45 49 .reg_user_mr = mana_ib_reg_user_mr, 50 + .reg_user_mr_dmabuf = mana_ib_reg_user_mr_dmabuf, 51 + .req_notify_cq = mana_ib_arm_cq, 46 52 53 + INIT_RDMA_OBJ_SIZE(ib_ah, mana_ib_ah, ibah), 47 54 INIT_RDMA_OBJ_SIZE(ib_cq, mana_ib_cq, ibcq), 48 55 INIT_RDMA_OBJ_SIZE(ib_pd, mana_ib_pd, ibpd), 49 56 INIT_RDMA_OBJ_SIZE(ib_qp, mana_ib_qp, ibqp), ··· 59 50 INIT_RDMA_OBJ_SIZE(ib_rwq_ind_table, mana_ib_rwq_ind_table, 60 51 ib_ind_table), 61 52 }; 53 + 54 + static const struct ib_device_ops mana_ib_stats_ops = { 55 + .alloc_hw_port_stats = mana_ib_alloc_hw_port_stats, 56 + .get_hw_stats = mana_ib_get_hw_stats, 57 + }; 58 + 59 + static int mana_ib_netdev_event(struct notifier_block *this, 60 + unsigned long event, void *ptr) 61 + { 62 + struct mana_ib_dev *dev = container_of(this, struct mana_ib_dev, nb); 63 + struct net_device *event_dev = netdev_notifier_info_to_dev(ptr); 64 + struct gdma_context *gc = dev->gdma_dev->gdma_context; 65 + struct mana_context *mc = gc->mana.driver_data; 66 + struct net_device *ndev; 67 + 68 + /* Only process events from our parent device */ 69 + if (event_dev != mc->ports[0]) 70 + return NOTIFY_DONE; 71 + 72 + switch (event) { 73 + case NETDEV_CHANGEUPPER: 74 + ndev = mana_get_primary_netdev(mc, 0, &dev->dev_tracker); 75 + /* 76 + * RDMA core will setup GID based on updated netdev. 77 + * It's not possible to race with the core as rtnl lock is being 78 + * held. 
79 + */ 80 + ib_device_set_netdev(&dev->ib_dev, ndev, 1); 81 + 82 + /* mana_get_primary_netdev() returns ndev with refcount held */ 83 + netdev_put(ndev, &dev->dev_tracker); 84 + 85 + return NOTIFY_OK; 86 + default: 87 + return NOTIFY_DONE; 88 + } 89 + } 62 90 63 91 static int mana_ib_probe(struct auxiliary_device *adev, 64 92 const struct auxiliary_device_id *id) ··· 130 84 dev->ib_dev.num_comp_vectors = mdev->gdma_context->max_num_queues; 131 85 dev->ib_dev.dev.parent = mdev->gdma_context->dev; 132 86 133 - rcu_read_lock(); /* required to get primary netdev */ 134 - ndev = mana_get_primary_netdev_rcu(mc, 0); 87 + ndev = mana_get_primary_netdev(mc, 0, &dev->dev_tracker); 135 88 if (!ndev) { 136 - rcu_read_unlock(); 137 89 ret = -ENODEV; 138 90 ibdev_err(&dev->ib_dev, "Failed to get netdev for IB port 1"); 139 91 goto free_ib_device; ··· 139 95 ether_addr_copy(mac_addr, ndev->dev_addr); 140 96 addrconf_addr_eui48((u8 *)&dev->ib_dev.node_guid, ndev->dev_addr); 141 97 ret = ib_device_set_netdev(&dev->ib_dev, ndev, 1); 142 - rcu_read_unlock(); 98 + /* mana_get_primary_netdev() returns ndev with refcount held */ 99 + netdev_put(ndev, &dev->dev_tracker); 143 100 if (ret) { 144 101 ibdev_err(&dev->ib_dev, "Failed to set ib netdev, ret %d", ret); 145 102 goto free_ib_device; ··· 154 109 } 155 110 dev->gdma_dev = &mdev->gdma_context->mana_ib; 156 111 157 - ret = mana_ib_gd_query_adapter_caps(dev); 112 + dev->nb.notifier_call = mana_ib_netdev_event; 113 + ret = register_netdevice_notifier(&dev->nb); 158 114 if (ret) { 159 - ibdev_err(&dev->ib_dev, "Failed to query device caps, ret %d", 115 + ibdev_err(&dev->ib_dev, "Failed to register net notifier, %d", 160 116 ret); 161 117 goto deregister_device; 162 118 } 163 119 120 + ret = mana_ib_gd_query_adapter_caps(dev); 121 + if (ret) { 122 + ibdev_err(&dev->ib_dev, "Failed to query device caps, ret %d", 123 + ret); 124 + goto deregister_net_notifier; 125 + } 126 + 127 + ib_set_device_ops(&dev->ib_dev, &mana_ib_stats_ops); 128 + 164 129 ret = mana_ib_create_eqs(dev); 165 130 if (ret) { 166 131 ibdev_err(&dev->ib_dev, "Failed to create EQs, ret %d", ret); 167 - goto deregister_device; 132 + goto deregister_net_notifier; 168 133 } 169 134 170 135 ret = mana_ib_gd_create_rnic_adapter(dev); ··· 189 134 goto destroy_rnic; 190 135 } 191 136 137 + dev->av_pool = dma_pool_create("mana_ib_av", mdev->gdma_context->dev, 138 + MANA_AV_BUFFER_SIZE, MANA_AV_BUFFER_SIZE, 0); 139 + if (!dev->av_pool) { 140 + ret = -ENOMEM; 141 + goto destroy_rnic; 142 + } 143 + 192 144 ret = ib_register_device(&dev->ib_dev, "mana_%d", 193 145 mdev->gdma_context->dev); 194 146 if (ret) 195 - goto destroy_rnic; 147 + goto deallocate_pool; 196 148 197 149 dev_set_drvdata(&adev->dev, dev); 198 150 199 151 return 0; 200 152 153 + deallocate_pool: 154 + dma_pool_destroy(dev->av_pool); 201 155 destroy_rnic: 202 156 xa_destroy(&dev->qp_table_wq); 203 157 mana_ib_gd_destroy_rnic_adapter(dev); 204 158 destroy_eqs: 205 159 mana_ib_destroy_eqs(dev); 160 + deregister_net_notifier: 161 + unregister_netdevice_notifier(&dev->nb); 206 162 deregister_device: 207 163 mana_gd_deregister_device(dev->gdma_dev); 208 164 free_ib_device: ··· 226 160 struct mana_ib_dev *dev = dev_get_drvdata(&adev->dev); 227 161 228 162 ib_unregister_device(&dev->ib_dev); 163 + dma_pool_destroy(dev->av_pool); 229 164 xa_destroy(&dev->qp_table_wq); 230 165 mana_ib_gd_destroy_rnic_adapter(dev); 231 166 mana_ib_destroy_eqs(dev); 167 + unregister_netdevice_notifier(&dev->nb); 232 168 mana_gd_deregister_device(dev->gdma_dev); 233 169 
ib_dealloc_device(&dev->ib_dev); 234 170 }
+98 -5
drivers/infiniband/hw/mana/main.c
··· 82 82 mana_gd_init_req_hdr(&req.hdr, GDMA_CREATE_PD, sizeof(req), 83 83 sizeof(resp)); 84 84 85 + if (!udata) 86 + flags |= GDMA_PD_FLAG_ALLOW_GPA_MR; 87 + 85 88 req.flags = flags; 86 89 err = mana_gd_send_request(gc, sizeof(req), &req, 87 90 sizeof(resp), &resp); ··· 240 237 ibdev_dbg(ibdev, "Failed to destroy doorbell page %d\n", ret); 241 238 } 242 239 240 + int mana_ib_create_kernel_queue(struct mana_ib_dev *mdev, u32 size, enum gdma_queue_type type, 241 + struct mana_ib_queue *queue) 242 + { 243 + struct gdma_context *gc = mdev_to_gc(mdev); 244 + struct gdma_queue_spec spec = {}; 245 + int err; 246 + 247 + queue->id = INVALID_QUEUE_ID; 248 + queue->gdma_region = GDMA_INVALID_DMA_REGION; 249 + spec.type = type; 250 + spec.monitor_avl_buf = false; 251 + spec.queue_size = size; 252 + err = mana_gd_create_mana_wq_cq(&gc->mana_ib, &spec, &queue->kmem); 253 + if (err) 254 + return err; 255 + /* take ownership into mana_ib from mana */ 256 + queue->gdma_region = queue->kmem->mem_info.dma_region_handle; 257 + queue->kmem->mem_info.dma_region_handle = GDMA_INVALID_DMA_REGION; 258 + return 0; 259 + } 260 + 243 261 int mana_ib_create_queue(struct mana_ib_dev *mdev, u64 addr, u32 size, 244 262 struct mana_ib_queue *queue) 245 263 { ··· 300 276 */ 301 277 mana_ib_gd_destroy_dma_region(mdev, queue->gdma_region); 302 278 ib_umem_release(queue->umem); 279 + if (queue->kmem) 280 + mana_gd_destroy_queue(mdev_to_gc(mdev), queue->kmem); 303 281 } 304 282 305 283 static int ··· 384 358 unsigned int tail = 0; 385 359 u64 *page_addr_list; 386 360 void *request_buf; 387 - int err; 361 + int err = 0; 388 362 389 363 gc = mdev_to_gc(dev); 390 364 hwc = gc->hwc.driver_data; ··· 561 535 immutable->pkey_tbl_len = attr.pkey_tbl_len; 562 536 immutable->gid_tbl_len = attr.gid_tbl_len; 563 537 immutable->core_cap_flags = RDMA_CORE_PORT_RAW_PACKET; 564 - if (port_num == 1) 538 + if (port_num == 1) { 565 539 immutable->core_cap_flags |= RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP; 540 + immutable->max_mad_size = IB_MGMT_MAD_SIZE; 541 + } 566 542 567 543 return 0; 568 544 } ··· 623 595 props->active_width = IB_WIDTH_4X; 624 596 props->active_speed = IB_SPEED_EDR; 625 597 props->pkey_tbl_len = 1; 626 - if (port == 1) 598 + if (port == 1) { 627 599 props->gid_tbl_len = 16; 600 + props->port_cap_flags = IB_PORT_CM_SUP; 601 + props->ip_gids = true; 602 + } 628 603 629 604 return 0; 630 605 } ··· 665 634 666 635 mana_gd_init_req_hdr(&req.hdr, MANA_IB_GET_ADAPTER_CAP, sizeof(req), 667 636 sizeof(resp)); 668 - req.hdr.resp.msg_version = GDMA_MESSAGE_V3; 637 + req.hdr.resp.msg_version = GDMA_MESSAGE_V4; 669 638 req.hdr.dev_id = dev->gdma_dev->dev_id; 670 639 671 640 err = mana_gd_send_request(mdev_to_gc(dev), sizeof(req), ··· 694 663 caps->max_inline_data_size = resp.max_inline_data_size; 695 664 caps->max_send_sge_count = resp.max_send_sge_count; 696 665 caps->max_recv_sge_count = resp.max_recv_sge_count; 666 + caps->feature_flags = resp.feature_flags; 697 667 698 668 return 0; 699 669 } ··· 710 678 switch (event->type) { 711 679 case GDMA_EQE_RNIC_QP_FATAL: 712 680 qpn = event->details[0]; 713 - qp = mana_get_qp_ref(mdev, qpn); 681 + qp = mana_get_qp_ref(mdev, qpn, false); 714 682 if (!qp) 715 683 break; 716 684 if (qp->ibqp.event_handler) { ··· 793 761 req.hdr.req.msg_version = GDMA_MESSAGE_V2; 794 762 req.hdr.dev_id = gc->mana_ib.dev_id; 795 763 req.notify_eq_id = mdev->fatal_err_eq->id; 764 + 765 + if (mdev->adapter_caps.feature_flags & MANA_IB_FEATURE_CLIENT_ERROR_CQE_SUPPORT) 766 + req.feature_flags |= 
MANA_IB_FEATURE_CLIENT_ERROR_CQE_REQUEST; 796 767 797 768 err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 798 769 if (err) { ··· 1018 983 err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 1019 984 if (err) { 1020 985 ibdev_err(&mdev->ib_dev, "Failed to destroy rc qp err %d", err); 986 + return err; 987 + } 988 + return 0; 989 + } 990 + 991 + int mana_ib_gd_create_ud_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp, 992 + struct ib_qp_init_attr *attr, u32 doorbell, u32 type) 993 + { 994 + struct mana_ib_cq *send_cq = container_of(qp->ibqp.send_cq, struct mana_ib_cq, ibcq); 995 + struct mana_ib_cq *recv_cq = container_of(qp->ibqp.recv_cq, struct mana_ib_cq, ibcq); 996 + struct mana_ib_pd *pd = container_of(qp->ibqp.pd, struct mana_ib_pd, ibpd); 997 + struct gdma_context *gc = mdev_to_gc(mdev); 998 + struct mana_rnic_create_udqp_resp resp = {}; 999 + struct mana_rnic_create_udqp_req req = {}; 1000 + int err, i; 1001 + 1002 + mana_gd_init_req_hdr(&req.hdr, MANA_IB_CREATE_UD_QP, sizeof(req), sizeof(resp)); 1003 + req.hdr.dev_id = gc->mana_ib.dev_id; 1004 + req.adapter = mdev->adapter_handle; 1005 + req.pd_handle = pd->pd_handle; 1006 + req.send_cq_handle = send_cq->cq_handle; 1007 + req.recv_cq_handle = recv_cq->cq_handle; 1008 + for (i = 0; i < MANA_UD_QUEUE_TYPE_MAX; i++) 1009 + req.dma_region[i] = qp->ud_qp.queues[i].gdma_region; 1010 + req.doorbell_page = doorbell; 1011 + req.max_send_wr = attr->cap.max_send_wr; 1012 + req.max_recv_wr = attr->cap.max_recv_wr; 1013 + req.max_send_sge = attr->cap.max_send_sge; 1014 + req.max_recv_sge = attr->cap.max_recv_sge; 1015 + req.qp_type = type; 1016 + err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 1017 + if (err) { 1018 + ibdev_err(&mdev->ib_dev, "Failed to create ud qp err %d", err); 1019 + return err; 1020 + } 1021 + qp->qp_handle = resp.qp_handle; 1022 + for (i = 0; i < MANA_UD_QUEUE_TYPE_MAX; i++) { 1023 + qp->ud_qp.queues[i].id = resp.queue_ids[i]; 1024 + /* The GDMA regions are now owned by the RNIC QP handle */ 1025 + qp->ud_qp.queues[i].gdma_region = GDMA_INVALID_DMA_REGION; 1026 + } 1027 + return 0; 1028 + } 1029 + 1030 + int mana_ib_gd_destroy_ud_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp) 1031 + { 1032 + struct mana_rnic_destroy_udqp_resp resp = {0}; 1033 + struct mana_rnic_destroy_udqp_req req = {0}; 1034 + struct gdma_context *gc = mdev_to_gc(mdev); 1035 + int err; 1036 + 1037 + mana_gd_init_req_hdr(&req.hdr, MANA_IB_DESTROY_UD_QP, sizeof(req), sizeof(resp)); 1038 + req.hdr.dev_id = gc->mana_ib.dev_id; 1039 + req.adapter = mdev->adapter_handle; 1040 + req.qp_handle = qp->qp_handle; 1041 + err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp); 1042 + if (err) { 1043 + ibdev_err(&mdev->ib_dev, "Failed to destroy ud qp err %d", err); 1021 1044 return err; 1022 1045 } 1023 1046 return 0;
+209 -1
drivers/infiniband/hw/mana/mana_ib.h
··· 11 11 #include <rdma/ib_umem.h> 12 12 #include <rdma/mana-abi.h> 13 13 #include <rdma/uverbs_ioctl.h> 14 + #include <linux/dmapool.h> 14 15 15 16 #include <net/mana/mana.h> 17 + #include "shadow_queue.h" 18 + #include "counters.h" 16 19 17 20 #define PAGE_SZ_BM \ 18 21 (SZ_4K | SZ_8K | SZ_16K | SZ_32K | SZ_64K | SZ_128K | SZ_256K | \ ··· 23 20 24 21 /* MANA doesn't have any limit for MR size */ 25 22 #define MANA_IB_MAX_MR_SIZE U64_MAX 23 + 24 + /* Send queue ID mask */ 25 + #define MANA_SENDQ_MASK BIT(31) 26 26 27 27 /* 28 28 * The hardware limit of number of MRs is greater than maximum number of MRs ··· 37 31 * The CA timeout is approx. 260ms (4us * 2^(DELAY)) 38 32 */ 39 33 #define MANA_CA_ACK_DELAY 16 34 + 35 + /* 36 + * The buffer used for writing AV 37 + */ 38 + #define MANA_AV_BUFFER_SIZE 64 40 39 41 40 struct mana_ib_adapter_caps { 42 41 u32 max_sq_id; ··· 59 48 u32 max_send_sge_count; 60 49 u32 max_recv_sge_count; 61 50 u32 max_inline_data_size; 51 + u64 feature_flags; 62 52 }; 63 53 64 54 struct mana_ib_queue { 65 55 struct ib_umem *umem; 56 + struct gdma_queue *kmem; 66 57 u64 gdma_region; 67 58 u64 id; 68 59 }; ··· 77 64 struct gdma_queue **eqs; 78 65 struct xarray qp_table_wq; 79 66 struct mana_ib_adapter_caps adapter_caps; 67 + struct dma_pool *av_pool; 68 + netdevice_tracker dev_tracker; 69 + struct notifier_block nb; 80 70 }; 81 71 82 72 struct mana_ib_wq { ··· 103 87 u32 tx_vp_offset; 104 88 }; 105 89 90 + struct mana_ib_av { 91 + u8 dest_ip[16]; 92 + u8 dest_mac[ETH_ALEN]; 93 + u16 udp_src_port; 94 + u8 src_ip[16]; 95 + u32 hop_limit : 8; 96 + u32 reserved1 : 12; 97 + u32 dscp : 6; 98 + u32 reserved2 : 5; 99 + u32 is_ipv6 : 1; 100 + u32 reserved3 : 32; 101 + }; 102 + 103 + struct mana_ib_ah { 104 + struct ib_ah ibah; 105 + struct mana_ib_av *av; 106 + dma_addr_t dma_handle; 107 + }; 108 + 106 109 struct mana_ib_mr { 107 110 struct ib_mr ibmr; 108 111 struct ib_umem *umem; ··· 131 96 struct mana_ib_cq { 132 97 struct ib_cq ibcq; 133 98 struct mana_ib_queue queue; 99 + /* protects CQ polling */ 100 + spinlock_t cq_lock; 101 + struct list_head list_send_qp; 102 + struct list_head list_recv_qp; 134 103 int cqe; 135 104 u32 comp_vector; 136 105 mana_handle_t cq_handle; ··· 153 114 struct mana_ib_queue queues[MANA_RC_QUEUE_TYPE_MAX]; 154 115 }; 155 116 117 + enum mana_ud_queue_type { 118 + MANA_UD_SEND_QUEUE = 0, 119 + MANA_UD_RECV_QUEUE, 120 + MANA_UD_QUEUE_TYPE_MAX, 121 + }; 122 + 123 + struct mana_ib_ud_qp { 124 + struct mana_ib_queue queues[MANA_UD_QUEUE_TYPE_MAX]; 125 + u32 sq_psn; 126 + }; 127 + 156 128 struct mana_ib_qp { 157 129 struct ib_qp ibqp; 158 130 ··· 171 121 union { 172 122 struct mana_ib_queue raw_sq; 173 123 struct mana_ib_rc_qp rc_qp; 124 + struct mana_ib_ud_qp ud_qp; 174 125 }; 175 126 176 127 /* The port on the IB device, starting with 1 */ 177 128 u32 port; 129 + 130 + struct list_head cq_send_list; 131 + struct list_head cq_recv_list; 132 + struct shadow_queue shadow_rq; 133 + struct shadow_queue shadow_sq; 178 134 179 135 refcount_t refcount; 180 136 struct completion free; ··· 201 145 MANA_IB_DESTROY_ADAPTER = 0x30003, 202 146 MANA_IB_CONFIG_IP_ADDR = 0x30004, 203 147 MANA_IB_CONFIG_MAC_ADDR = 0x30005, 148 + MANA_IB_CREATE_UD_QP = 0x30006, 149 + MANA_IB_DESTROY_UD_QP = 0x30007, 204 150 MANA_IB_CREATE_CQ = 0x30008, 205 151 MANA_IB_DESTROY_CQ = 0x30009, 206 152 MANA_IB_CREATE_RC_QP = 0x3000a, 207 153 MANA_IB_DESTROY_RC_QP = 0x3000b, 208 154 MANA_IB_SET_QP_STATE = 0x3000d, 155 + MANA_IB_QUERY_VF_COUNTERS = 0x30022, 209 156 }; 210 157 211 158 struct 
mana_ib_query_adapter_caps_req { 212 159 struct gdma_req_hdr hdr; 213 160 }; /*HW Data */ 161 + 162 + enum mana_ib_adapter_features { 163 + MANA_IB_FEATURE_CLIENT_ERROR_CQE_SUPPORT = BIT(4), 164 + }; 214 165 215 166 struct mana_ib_query_adapter_caps_resp { 216 167 struct gdma_resp_hdr hdr; ··· 239 176 u32 max_send_sge_count; 240 177 u32 max_recv_sge_count; 241 178 u32 max_inline_data_size; 179 + u64 feature_flags; 242 180 }; /* HW Data */ 181 + 182 + enum mana_ib_adapter_features_request { 183 + MANA_IB_FEATURE_CLIENT_ERROR_CQE_REQUEST = BIT(1), 184 + }; /*HW Data */ 243 185 244 186 struct mana_rnic_create_adapter_req { 245 187 struct gdma_req_hdr hdr; ··· 364 296 struct gdma_resp_hdr hdr; 365 297 }; /* HW Data */ 366 298 299 + struct mana_rnic_create_udqp_req { 300 + struct gdma_req_hdr hdr; 301 + mana_handle_t adapter; 302 + mana_handle_t pd_handle; 303 + mana_handle_t send_cq_handle; 304 + mana_handle_t recv_cq_handle; 305 + u64 dma_region[MANA_UD_QUEUE_TYPE_MAX]; 306 + u32 qp_type; 307 + u32 doorbell_page; 308 + u32 max_send_wr; 309 + u32 max_recv_wr; 310 + u32 max_send_sge; 311 + u32 max_recv_sge; 312 + }; /* HW Data */ 313 + 314 + struct mana_rnic_create_udqp_resp { 315 + struct gdma_resp_hdr hdr; 316 + mana_handle_t qp_handle; 317 + u32 queue_ids[MANA_UD_QUEUE_TYPE_MAX]; 318 + }; /* HW Data*/ 319 + 320 + struct mana_rnic_destroy_udqp_req { 321 + struct gdma_req_hdr hdr; 322 + mana_handle_t adapter; 323 + mana_handle_t qp_handle; 324 + }; /* HW Data */ 325 + 326 + struct mana_rnic_destroy_udqp_resp { 327 + struct gdma_resp_hdr hdr; 328 + }; /* HW Data */ 329 + 367 330 struct mana_ib_ah_attr { 368 331 u8 src_addr[16]; 369 332 u8 dest_addr[16]; ··· 431 332 struct gdma_resp_hdr hdr; 432 333 }; /* HW Data */ 433 334 335 + enum WQE_OPCODE_TYPES { 336 + WQE_TYPE_UD_SEND = 0, 337 + WQE_TYPE_UD_RECV = 8, 338 + }; /* HW DATA */ 339 + 340 + struct rdma_send_oob { 341 + u32 wqe_type : 5; 342 + u32 fence : 1; 343 + u32 signaled : 1; 344 + u32 solicited : 1; 345 + u32 psn : 24; 346 + 347 + u32 ssn_or_rqpn : 24; 348 + u32 reserved1 : 8; 349 + union { 350 + struct { 351 + u32 remote_qkey; 352 + u32 immediate; 353 + u32 reserved1; 354 + u32 reserved2; 355 + } ud_send; 356 + }; 357 + }; /* HW DATA */ 358 + 359 + struct mana_rdma_cqe { 360 + union { 361 + struct { 362 + u8 cqe_type; 363 + u8 data[GDMA_COMP_DATA_SIZE - 1]; 364 + }; 365 + struct { 366 + u32 cqe_type : 8; 367 + u32 vendor_error : 9; 368 + u32 reserved1 : 15; 369 + u32 sge_offset : 5; 370 + u32 tx_wqe_offset : 27; 371 + } ud_send; 372 + struct { 373 + u32 cqe_type : 8; 374 + u32 reserved1 : 24; 375 + u32 msg_len; 376 + u32 src_qpn : 24; 377 + u32 reserved2 : 8; 378 + u32 imm_data; 379 + u32 rx_wqe_offset; 380 + } ud_recv; 381 + }; 382 + }; /* HW DATA */ 383 + 384 + struct mana_rnic_query_vf_cntrs_req { 385 + struct gdma_req_hdr hdr; 386 + mana_handle_t adapter; 387 + }; /* HW Data */ 388 + 389 + struct mana_rnic_query_vf_cntrs_resp { 390 + struct gdma_resp_hdr hdr; 391 + u64 requester_timeout; 392 + u64 requester_oos_nak; 393 + u64 requester_rnr_nak; 394 + u64 responder_rnr_nak; 395 + u64 responder_oos; 396 + u64 responder_dup_request; 397 + u64 requester_implicit_nak; 398 + u64 requester_readresp_psn_mismatch; 399 + u64 nak_inv_req; 400 + u64 nak_access_err; 401 + u64 nak_opp_err; 402 + u64 nak_inv_read; 403 + u64 responder_local_len_err; 404 + u64 requestor_local_prot_err; 405 + u64 responder_rem_access_err; 406 + u64 responder_local_qp_err; 407 + u64 responder_malformed_wqe; 408 + u64 general_hw_err; 409 + u64 
requester_rnr_nak_retries_exceeded; 410 + u64 requester_retries_exceeded; 411 + u64 total_fatal_err; 412 + u64 received_cnps; 413 + u64 num_qps_congested; 414 + u64 rate_inc_events; 415 + u64 num_qps_recovered; 416 + u64 current_rate; 417 + }; /* HW Data */ 418 + 434 419 static inline struct gdma_context *mdev_to_gc(struct mana_ib_dev *mdev) 435 420 { 436 421 return mdev->gdma_dev->gdma_context; 437 422 } 438 423 439 424 static inline struct mana_ib_qp *mana_get_qp_ref(struct mana_ib_dev *mdev, 440 - uint32_t qid) 425 + u32 qid, bool is_sq) 441 426 { 442 427 struct mana_ib_qp *qp; 443 428 unsigned long flag; 429 + 430 + if (is_sq) 431 + qid |= MANA_SENDQ_MASK; 444 432 445 433 xa_lock_irqsave(&mdev->qp_table_wq, flag); 446 434 qp = xa_load(&mdev->qp_table_wq, qid); ··· 574 388 int mana_ib_gd_destroy_dma_region(struct mana_ib_dev *dev, 575 389 mana_handle_t gdma_region); 576 390 391 + int mana_ib_create_kernel_queue(struct mana_ib_dev *mdev, u32 size, enum gdma_queue_type type, 392 + struct mana_ib_queue *queue); 577 393 int mana_ib_create_queue(struct mana_ib_dev *mdev, u64 addr, u32 size, 578 394 struct mana_ib_queue *queue); 579 395 void mana_ib_destroy_queue(struct mana_ib_dev *mdev, struct mana_ib_queue *queue); ··· 668 480 int mana_ib_gd_create_rc_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp, 669 481 struct ib_qp_init_attr *attr, u32 doorbell, u64 flags); 670 482 int mana_ib_gd_destroy_rc_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp); 483 + 484 + int mana_ib_gd_create_ud_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp, 485 + struct ib_qp_init_attr *attr, u32 doorbell, u32 type); 486 + int mana_ib_gd_destroy_ud_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp); 487 + 488 + int mana_ib_create_ah(struct ib_ah *ibah, struct rdma_ah_init_attr *init_attr, 489 + struct ib_udata *udata); 490 + int mana_ib_destroy_ah(struct ib_ah *ah, u32 flags); 491 + 492 + int mana_ib_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr, 493 + const struct ib_recv_wr **bad_wr); 494 + int mana_ib_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr, 495 + const struct ib_send_wr **bad_wr); 496 + 497 + int mana_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc); 498 + int mana_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags); 499 + 500 + struct ib_mr *mana_ib_reg_user_mr_dmabuf(struct ib_pd *ibpd, u64 start, u64 length, 501 + u64 iova, int fd, int mr_access_flags, 502 + struct uverbs_attr_bundle *attrs); 671 503 #endif
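One detail worth noting in the header above: send- and receive-queue ids share the single qp_table_wq xarray, with bit 31 (MANA_SENDQ_MASK) set on send-queue keys so that a completion's wq_num plus its is_sq flag resolves to a unique QP. The lookup in mana_get_qp_ref() boils down to:

u32 key = is_sq ? (qid | MANA_SENDQ_MASK) : qid;

qp = xa_load(&mdev->qp_table_wq, key);	/* under xa_lock_irqsave() */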
+105
drivers/infiniband/hw/mana/mr.c
··· 8 8 #define VALID_MR_FLAGS \ 9 9 (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ) 10 10 11 + #define VALID_DMA_MR_FLAGS (IB_ACCESS_LOCAL_WRITE) 12 + 11 13 static enum gdma_mr_access_flags 12 14 mana_ib_verbs_to_gdma_access_flags(int access_flags) 13 15 { ··· 41 39 req.mr_type = mr_params->mr_type; 42 40 43 41 switch (mr_params->mr_type) { 42 + case GDMA_MR_TYPE_GPA: 43 + break; 44 44 case GDMA_MR_TYPE_GVA: 45 45 req.gva.dma_region_handle = mr_params->gva.dma_region_handle; 46 46 req.gva.virtual_address = mr_params->gva.virtual_address; ··· 167 163 168 164 err_umem: 169 165 ib_umem_release(mr->umem); 166 + 167 + err_free: 168 + kfree(mr); 169 + return ERR_PTR(err); 170 + } 171 + 172 + struct ib_mr *mana_ib_reg_user_mr_dmabuf(struct ib_pd *ibpd, u64 start, u64 length, 173 + u64 iova, int fd, int access_flags, 174 + struct uverbs_attr_bundle *attrs) 175 + { 176 + struct mana_ib_pd *pd = container_of(ibpd, struct mana_ib_pd, ibpd); 177 + struct gdma_create_mr_params mr_params = {}; 178 + struct ib_device *ibdev = ibpd->device; 179 + struct ib_umem_dmabuf *umem_dmabuf; 180 + struct mana_ib_dev *dev; 181 + struct mana_ib_mr *mr; 182 + u64 dma_region_handle; 183 + int err; 184 + 185 + dev = container_of(ibdev, struct mana_ib_dev, ib_dev); 186 + 187 + access_flags &= ~IB_ACCESS_OPTIONAL; 188 + if (access_flags & ~VALID_MR_FLAGS) 189 + return ERR_PTR(-EOPNOTSUPP); 190 + 191 + mr = kzalloc(sizeof(*mr), GFP_KERNEL); 192 + if (!mr) 193 + return ERR_PTR(-ENOMEM); 194 + 195 + umem_dmabuf = ib_umem_dmabuf_get_pinned(ibdev, start, length, fd, access_flags); 196 + if (IS_ERR(umem_dmabuf)) { 197 + err = PTR_ERR(umem_dmabuf); 198 + ibdev_dbg(ibdev, "Failed to get dmabuf umem, %d\n", err); 199 + goto err_free; 200 + } 201 + 202 + mr->umem = &umem_dmabuf->umem; 203 + 204 + err = mana_ib_create_dma_region(dev, mr->umem, &dma_region_handle, iova); 205 + if (err) { 206 + ibdev_dbg(ibdev, "Failed create dma region for user-mr, %d\n", 207 + err); 208 + goto err_umem; 209 + } 210 + 211 + mr_params.pd_handle = pd->pd_handle; 212 + mr_params.mr_type = GDMA_MR_TYPE_GVA; 213 + mr_params.gva.dma_region_handle = dma_region_handle; 214 + mr_params.gva.virtual_address = iova; 215 + mr_params.gva.access_flags = 216 + mana_ib_verbs_to_gdma_access_flags(access_flags); 217 + 218 + err = mana_ib_gd_create_mr(dev, mr, &mr_params); 219 + if (err) 220 + goto err_dma_region; 221 + 222 + /* 223 + * There is no need to keep track of dma_region_handle after MR is 224 + * successfully created. The dma_region_handle is tracked in the PF 225 + * as part of the lifecycle of this MR. 
226 + */ 227 + 228 + return &mr->ibmr; 229 + 230 + err_dma_region: 231 + mana_gd_destroy_dma_region(mdev_to_gc(dev), dma_region_handle); 232 + 233 + err_umem: 234 + ib_umem_release(mr->umem); 235 + 236 + err_free: 237 + kfree(mr); 238 + return ERR_PTR(err); 239 + } 240 + 241 + struct ib_mr *mana_ib_get_dma_mr(struct ib_pd *ibpd, int access_flags) 242 + { 243 + struct mana_ib_pd *pd = container_of(ibpd, struct mana_ib_pd, ibpd); 244 + struct gdma_create_mr_params mr_params = {}; 245 + struct ib_device *ibdev = ibpd->device; 246 + struct mana_ib_dev *dev; 247 + struct mana_ib_mr *mr; 248 + int err; 249 + 250 + dev = container_of(ibdev, struct mana_ib_dev, ib_dev); 251 + 252 + if (access_flags & ~VALID_DMA_MR_FLAGS) 253 + return ERR_PTR(-EINVAL); 254 + 255 + mr = kzalloc(sizeof(*mr), GFP_KERNEL); 256 + if (!mr) 257 + return ERR_PTR(-ENOMEM); 258 + 259 + mr_params.pd_handle = pd->pd_handle; 260 + mr_params.mr_type = GDMA_MR_TYPE_GPA; 261 + 262 + err = mana_ib_gd_create_mr(dev, mr, &mr_params); 263 + if (err) 264 + goto err_free; 265 + 266 + return &mr->ibmr; 170 267 171 268 err_free: 172 269 kfree(mr);
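For context, reg_user_mr_dmabuf is the entry point that rdma-core's ibv_reg_dmabuf_mr() lands on. A userspace sketch, assuming pd, length, iova and dmabuf_fd were set up by the caller:

#include <infiniband/verbs.h>

struct ibv_mr *mr;

mr = ibv_reg_dmabuf_mr(pd, 0 /* offset */, length, iova, dmabuf_fd,
		       IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ |
		       IBV_ACCESS_REMOTE_WRITE);
if (!mr)
	perror("ibv_reg_dmabuf_mr");	/* errno carries the kernel error */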
+242 -3
drivers/infiniband/hw/mana/qp.c
··· 398 398 return err; 399 399 } 400 400 401 + static u32 mana_ib_wqe_size(u32 sge, u32 oob_size) 402 + { 403 + u32 wqe_size = sge * sizeof(struct gdma_sge) + sizeof(struct gdma_wqe) + oob_size; 404 + 405 + return ALIGN(wqe_size, GDMA_WQE_BU_SIZE); 406 + } 407 + 408 + static u32 mana_ib_queue_size(struct ib_qp_init_attr *attr, u32 queue_type) 409 + { 410 + u32 queue_size; 411 + 412 + switch (attr->qp_type) { 413 + case IB_QPT_UD: 414 + case IB_QPT_GSI: 415 + if (queue_type == MANA_UD_SEND_QUEUE) 416 + queue_size = attr->cap.max_send_wr * 417 + mana_ib_wqe_size(attr->cap.max_send_sge, INLINE_OOB_LARGE_SIZE); 418 + else 419 + queue_size = attr->cap.max_recv_wr * 420 + mana_ib_wqe_size(attr->cap.max_recv_sge, INLINE_OOB_SMALL_SIZE); 421 + break; 422 + default: 423 + return 0; 424 + } 425 + 426 + return MANA_PAGE_ALIGN(roundup_pow_of_two(queue_size)); 427 + } 428 + 429 + static enum gdma_queue_type mana_ib_queue_type(struct ib_qp_init_attr *attr, u32 queue_type) 430 + { 431 + enum gdma_queue_type type; 432 + 433 + switch (attr->qp_type) { 434 + case IB_QPT_UD: 435 + case IB_QPT_GSI: 436 + if (queue_type == MANA_UD_SEND_QUEUE) 437 + type = GDMA_SQ; 438 + else 439 + type = GDMA_RQ; 440 + break; 441 + default: 442 + type = GDMA_INVALID_QUEUE; 443 + } 444 + return type; 445 + } 446 + 447 + static int mana_table_store_rc_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp) 448 + { 449 + return xa_insert_irq(&mdev->qp_table_wq, qp->ibqp.qp_num, qp, 450 + GFP_KERNEL); 451 + } 452 + 453 + static void mana_table_remove_rc_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp) 454 + { 455 + xa_erase_irq(&mdev->qp_table_wq, qp->ibqp.qp_num); 456 + } 457 + 458 + static int mana_table_store_ud_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp) 459 + { 460 + u32 qids = qp->ud_qp.queues[MANA_UD_SEND_QUEUE].id | MANA_SENDQ_MASK; 461 + u32 qidr = qp->ud_qp.queues[MANA_UD_RECV_QUEUE].id; 462 + int err; 463 + 464 + err = xa_insert_irq(&mdev->qp_table_wq, qids, qp, GFP_KERNEL); 465 + if (err) 466 + return err; 467 + 468 + err = xa_insert_irq(&mdev->qp_table_wq, qidr, qp, GFP_KERNEL); 469 + if (err) 470 + goto remove_sq; 471 + 472 + return 0; 473 + 474 + remove_sq: 475 + xa_erase_irq(&mdev->qp_table_wq, qids); 476 + return err; 477 + } 478 + 479 + static void mana_table_remove_ud_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp) 480 + { 481 + u32 qids = qp->ud_qp.queues[MANA_UD_SEND_QUEUE].id | MANA_SENDQ_MASK; 482 + u32 qidr = qp->ud_qp.queues[MANA_UD_RECV_QUEUE].id; 483 + 484 + xa_erase_irq(&mdev->qp_table_wq, qids); 485 + xa_erase_irq(&mdev->qp_table_wq, qidr); 486 + } 487 + 401 488 static int mana_table_store_qp(struct mana_ib_dev *mdev, struct mana_ib_qp *qp) 402 489 { 403 490 refcount_set(&qp->refcount, 1); 404 491 init_completion(&qp->free); 405 - return xa_insert_irq(&mdev->qp_table_wq, qp->ibqp.qp_num, qp, 406 - GFP_KERNEL); 492 + 493 + switch (qp->ibqp.qp_type) { 494 + case IB_QPT_RC: 495 + return mana_table_store_rc_qp(mdev, qp); 496 + case IB_QPT_UD: 497 + case IB_QPT_GSI: 498 + return mana_table_store_ud_qp(mdev, qp); 499 + default: 500 + ibdev_dbg(&mdev->ib_dev, "Unknown QP type for storing in mana table, %d\n", 501 + qp->ibqp.qp_type); 502 + } 503 + 504 + return -EINVAL; 407 505 } 408 506 409 507 static void mana_table_remove_qp(struct mana_ib_dev *mdev, 410 508 struct mana_ib_qp *qp) 411 509 { 412 - xa_erase_irq(&mdev->qp_table_wq, qp->ibqp.qp_num); 510 + switch (qp->ibqp.qp_type) { 511 + case IB_QPT_RC: 512 + mana_table_remove_rc_qp(mdev, qp); 513 + break; 514 + case IB_QPT_UD: 515 + case 
IB_QPT_GSI: 516 + mana_table_remove_ud_qp(mdev, qp); 517 + break; 518 + default: 519 + ibdev_dbg(&mdev->ib_dev, "Unknown QP type for removing from mana table, %d\n", 520 + qp->ibqp.qp_type); 521 + return; 522 + } 413 523 mana_put_qp_ref(qp); 414 524 wait_for_completion(&qp->free); 415 525 } ··· 600 490 return err; 601 491 } 602 492 493 + static void mana_add_qp_to_cqs(struct mana_ib_qp *qp) 494 + { 495 + struct mana_ib_cq *send_cq = container_of(qp->ibqp.send_cq, struct mana_ib_cq, ibcq); 496 + struct mana_ib_cq *recv_cq = container_of(qp->ibqp.recv_cq, struct mana_ib_cq, ibcq); 497 + unsigned long flags; 498 + 499 + spin_lock_irqsave(&send_cq->cq_lock, flags); 500 + list_add_tail(&qp->cq_send_list, &send_cq->list_send_qp); 501 + spin_unlock_irqrestore(&send_cq->cq_lock, flags); 502 + 503 + spin_lock_irqsave(&recv_cq->cq_lock, flags); 504 + list_add_tail(&qp->cq_recv_list, &recv_cq->list_recv_qp); 505 + spin_unlock_irqrestore(&recv_cq->cq_lock, flags); 506 + } 507 + 508 + static void mana_remove_qp_from_cqs(struct mana_ib_qp *qp) 509 + { 510 + struct mana_ib_cq *send_cq = container_of(qp->ibqp.send_cq, struct mana_ib_cq, ibcq); 511 + struct mana_ib_cq *recv_cq = container_of(qp->ibqp.recv_cq, struct mana_ib_cq, ibcq); 512 + unsigned long flags; 513 + 514 + spin_lock_irqsave(&send_cq->cq_lock, flags); 515 + list_del(&qp->cq_send_list); 516 + spin_unlock_irqrestore(&send_cq->cq_lock, flags); 517 + 518 + spin_lock_irqsave(&recv_cq->cq_lock, flags); 519 + list_del(&qp->cq_recv_list); 520 + spin_unlock_irqrestore(&recv_cq->cq_lock, flags); 521 + } 522 + 523 + static int mana_ib_create_ud_qp(struct ib_qp *ibqp, struct ib_pd *ibpd, 524 + struct ib_qp_init_attr *attr, struct ib_udata *udata) 525 + { 526 + struct mana_ib_dev *mdev = container_of(ibpd->device, struct mana_ib_dev, ib_dev); 527 + struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp); 528 + struct gdma_context *gc = mdev_to_gc(mdev); 529 + u32 doorbell, queue_size; 530 + int i, err; 531 + 532 + if (udata) { 533 + ibdev_dbg(&mdev->ib_dev, "User-level UD QPs are not supported\n"); 534 + return -EOPNOTSUPP; 535 + } 536 + 537 + for (i = 0; i < MANA_UD_QUEUE_TYPE_MAX; ++i) { 538 + queue_size = mana_ib_queue_size(attr, i); 539 + err = mana_ib_create_kernel_queue(mdev, queue_size, mana_ib_queue_type(attr, i), 540 + &qp->ud_qp.queues[i]); 541 + if (err) { 542 + ibdev_err(&mdev->ib_dev, "Failed to create queue %d, err %d\n", 543 + i, err); 544 + goto destroy_queues; 545 + } 546 + } 547 + doorbell = gc->mana_ib.doorbell; 548 + 549 + err = create_shadow_queue(&qp->shadow_rq, attr->cap.max_recv_wr, 550 + sizeof(struct ud_rq_shadow_wqe)); 551 + if (err) { 552 + ibdev_err(&mdev->ib_dev, "Failed to create shadow rq err %d\n", err); 553 + goto destroy_queues; 554 + } 555 + err = create_shadow_queue(&qp->shadow_sq, attr->cap.max_send_wr, 556 + sizeof(struct ud_sq_shadow_wqe)); 557 + if (err) { 558 + ibdev_err(&mdev->ib_dev, "Failed to create shadow sq err %d\n", err); 559 + goto destroy_shadow_queues; 560 + } 561 + 562 + err = mana_ib_gd_create_ud_qp(mdev, qp, attr, doorbell, attr->qp_type); 563 + if (err) { 564 + ibdev_err(&mdev->ib_dev, "Failed to create ud qp %d\n", err); 565 + goto destroy_shadow_queues; 566 + } 567 + qp->ibqp.qp_num = qp->ud_qp.queues[MANA_UD_RECV_QUEUE].id; 568 + qp->port = attr->port_num; 569 + 570 + for (i = 0; i < MANA_UD_QUEUE_TYPE_MAX; ++i) 571 + qp->ud_qp.queues[i].kmem->id = qp->ud_qp.queues[i].id; 572 + 573 + err = mana_table_store_qp(mdev, qp); 574 + if (err) 575 + goto destroy_qp; 576 + 577 + 
mana_add_qp_to_cqs(qp); 578 + 579 + return 0; 580 + 581 + destroy_qp: 582 + mana_ib_gd_destroy_ud_qp(mdev, qp); 583 + destroy_shadow_queues: 584 + destroy_shadow_queue(&qp->shadow_rq); 585 + destroy_shadow_queue(&qp->shadow_sq); 586 + destroy_queues: 587 + while (i-- > 0) 588 + mana_ib_destroy_queue(mdev, &qp->ud_qp.queues[i]); 589 + return err; 590 + } 591 + 603 592 int mana_ib_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *attr, 604 593 struct ib_udata *udata) 605 594 { ··· 712 503 return mana_ib_create_qp_raw(ibqp, ibqp->pd, attr, udata); 713 504 case IB_QPT_RC: 714 505 return mana_ib_create_rc_qp(ibqp, ibqp->pd, attr, udata); 506 + case IB_QPT_UD: 507 + case IB_QPT_GSI: 508 + return mana_ib_create_ud_qp(ibqp, ibqp->pd, attr, udata); 715 509 default: 716 510 ibdev_dbg(ibqp->device, "Creating QP type %u not supported\n", 717 511 attr->qp_type); ··· 791 579 { 792 580 switch (ibqp->qp_type) { 793 581 case IB_QPT_RC: 582 + case IB_QPT_UD: 583 + case IB_QPT_GSI: 794 584 return mana_ib_gd_modify_qp(ibqp, attr, attr_mask, udata); 795 585 default: 796 586 ibdev_dbg(ibqp->device, "Modify QP type %u not supported", ibqp->qp_type); ··· 866 652 return 0; 867 653 } 868 654 655 + static int mana_ib_destroy_ud_qp(struct mana_ib_qp *qp, struct ib_udata *udata) 656 + { 657 + struct mana_ib_dev *mdev = 658 + container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev); 659 + int i; 660 + 661 + mana_remove_qp_from_cqs(qp); 662 + mana_table_remove_qp(mdev, qp); 663 + 664 + destroy_shadow_queue(&qp->shadow_rq); 665 + destroy_shadow_queue(&qp->shadow_sq); 666 + 667 + /* Ignore return code as there is not much we can do about it. 668 + * The error message is printed inside. 669 + */ 670 + mana_ib_gd_destroy_ud_qp(mdev, qp); 671 + for (i = 0; i < MANA_UD_QUEUE_TYPE_MAX; ++i) 672 + mana_ib_destroy_queue(mdev, &qp->ud_qp.queues[i]); 673 + 674 + return 0; 675 + } 676 + 869 677 int mana_ib_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata) 870 678 { 871 679 struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp); ··· 901 665 return mana_ib_destroy_qp_raw(qp, udata); 902 666 case IB_QPT_RC: 903 667 return mana_ib_destroy_rc_qp(qp, udata); 668 + case IB_QPT_UD: 669 + case IB_QPT_GSI: 670 + return mana_ib_destroy_ud_qp(qp, udata); 904 671 default: 905 672 ibdev_dbg(ibqp->device, "Unexpected QP type %u\n", 906 673 ibqp->qp_type);
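[Editor's note] Taken together, mana_ib_wqe_size() and mana_ib_queue_size() above are pure arithmetic: bytes per WQE = SGE array + WQE header + inline OOB (large for the send side, small for receive), rounded up to the hardware basic unit; total queue bytes = max outstanding WRs times that, rounded up to a power of two and page-aligned. A standalone sketch of the same math; the struct sizes and constants below are invented stand-ins, not the real GDMA layouts:

        #include <stdint.h>
        #include <stdio.h>

        #define BU_SIZE  32u    /* stand-in for GDMA_WQE_BU_SIZE */
        #define PAGE_SZ  4096u  /* stand-in for MANA_PAGE_ALIGN granularity */

        #define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((a) - 1))

        /* Smallest power of two >= v (v > 0), like roundup_pow_of_two() */
        static uint32_t pow2_up(uint32_t v)
        {
                uint32_t p = 1;

                while (p < v)
                        p <<= 1;
                return p;
        }

        /* Mirrors mana_ib_wqe_size(): SGE array + WQE header + inline OOB */
        static uint32_t wqe_size(uint32_t sge, uint32_t sge_sz,
                                 uint32_t hdr_sz, uint32_t oob_sz)
        {
                return ALIGN_UP(sge * sge_sz + hdr_sz + oob_sz, BU_SIZE);
        }

        int main(void)
        {
                /* e.g. a send queue sized for 64 WRs of 2 SGEs each */
                uint32_t bytes = 64 * wqe_size(2, 16, 8, 24);

                printf("queue bytes: %u\n", ALIGN_UP(pow2_up(bytes), PAGE_SZ));
                return 0;
        }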
+115
drivers/infiniband/hw/mana/shadow_queue.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ 2 + /* 3 + * Copyright (c) 2024, Microsoft Corporation. All rights reserved. 4 + */ 5 + 6 + #ifndef _MANA_SHADOW_QUEUE_H_ 7 + #define _MANA_SHADOW_QUEUE_H_ 8 + 9 + struct shadow_wqe_header { 10 + u16 opcode; 11 + u16 error_code; 12 + u32 posted_wqe_size; 13 + u64 wr_id; 14 + }; 15 + 16 + struct ud_rq_shadow_wqe { 17 + struct shadow_wqe_header header; 18 + u32 byte_len; 19 + u32 src_qpn; 20 + }; 21 + 22 + struct ud_sq_shadow_wqe { 23 + struct shadow_wqe_header header; 24 + }; 25 + 26 + struct shadow_queue { 27 + /* Unmasked producer index, Incremented on wqe posting */ 28 + u64 prod_idx; 29 + /* Unmasked consumer index, Incremented on cq polling */ 30 + u64 cons_idx; 31 + /* Unmasked index of next-to-complete (from HW) shadow WQE */ 32 + u64 next_to_complete_idx; 33 + /* queue size in wqes */ 34 + u32 length; 35 + /* distance between elements in bytes */ 36 + u32 stride; 37 + /* ring buffer holding wqes */ 38 + void *buffer; 39 + }; 40 + 41 + static inline int create_shadow_queue(struct shadow_queue *queue, uint32_t length, uint32_t stride) 42 + { 43 + queue->buffer = kvmalloc_array(length, stride, GFP_KERNEL); 44 + if (!queue->buffer) 45 + return -ENOMEM; 46 + 47 + queue->length = length; 48 + queue->stride = stride; 49 + 50 + return 0; 51 + } 52 + 53 + static inline void destroy_shadow_queue(struct shadow_queue *queue) 54 + { 55 + kvfree(queue->buffer); 56 + } 57 + 58 + static inline bool shadow_queue_full(struct shadow_queue *queue) 59 + { 60 + return (queue->prod_idx - queue->cons_idx) >= queue->length; 61 + } 62 + 63 + static inline bool shadow_queue_empty(struct shadow_queue *queue) 64 + { 65 + return queue->prod_idx == queue->cons_idx; 66 + } 67 + 68 + static inline void * 69 + shadow_queue_get_element(const struct shadow_queue *queue, u64 unmasked_index) 70 + { 71 + u32 index = unmasked_index % queue->length; 72 + 73 + return ((u8 *)queue->buffer + index * queue->stride); 74 + } 75 + 76 + static inline void * 77 + shadow_queue_producer_entry(struct shadow_queue *queue) 78 + { 79 + return shadow_queue_get_element(queue, queue->prod_idx); 80 + } 81 + 82 + static inline void * 83 + shadow_queue_get_next_to_consume(const struct shadow_queue *queue) 84 + { 85 + if (queue->cons_idx == queue->next_to_complete_idx) 86 + return NULL; 87 + 88 + return shadow_queue_get_element(queue, queue->cons_idx); 89 + } 90 + 91 + static inline void * 92 + shadow_queue_get_next_to_complete(struct shadow_queue *queue) 93 + { 94 + if (queue->next_to_complete_idx == queue->prod_idx) 95 + return NULL; 96 + 97 + return shadow_queue_get_element(queue, queue->next_to_complete_idx); 98 + } 99 + 100 + static inline void shadow_queue_advance_producer(struct shadow_queue *queue) 101 + { 102 + queue->prod_idx++; 103 + } 104 + 105 + static inline void shadow_queue_advance_consumer(struct shadow_queue *queue) 106 + { 107 + queue->cons_idx++; 108 + } 109 + 110 + static inline void shadow_queue_advance_next_to_complete(struct shadow_queue *queue) 111 + { 112 + queue->next_to_complete_idx++; 113 + } 114 + 115 + #endif
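[Editor's note] Note the index discipline in this header: prod_idx, cons_idx and next_to_complete_idx are never masked on increment, only at element access, so the full/empty tests stay plain subtractions even across wrap-around. A minimal user-space toy of the same scheme (toy element type, not the real shadow WQEs):

        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>

        struct toy_queue {
                uint64_t prod, cons;    /* unmasked, increment-only */
                uint32_t length;        /* number of elements */
                uint32_t *buf;
        };

        static uint32_t *elem(struct toy_queue *q, uint64_t unmasked)
        {
                return &q->buf[unmasked % q->length];   /* mask here only */
        }

        int main(void)
        {
                struct toy_queue q = { .length = 4 };
                uint64_t i;

                q.buf = calloc(q.length, sizeof(*q.buf));
                if (!q.buf)
                        return 1;

                for (i = 0; i < 6; i++) {
                        if (q.prod - q.cons >= q.length) {      /* shadow_queue_full() */
                                printf("full: consuming %u\n", *elem(&q, q.cons));
                                q.cons++;       /* advance_consumer() */
                        }
                        *elem(&q, q.prod) = (uint32_t)i;
                        q.prod++;               /* advance_producer() */
                }
                free(q.buf);
                return 0;
        }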
+168
drivers/infiniband/hw/mana/wr.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (c) 2024, Microsoft Corporation. All rights reserved. 4 + */ 5 + 6 + #include "mana_ib.h" 7 + 8 + #define MAX_WR_SGL_NUM (2) 9 + 10 + static int mana_ib_post_recv_ud(struct mana_ib_qp *qp, const struct ib_recv_wr *wr) 11 + { 12 + struct mana_ib_dev *mdev = container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev); 13 + struct gdma_queue *queue = qp->ud_qp.queues[MANA_UD_RECV_QUEUE].kmem; 14 + struct gdma_posted_wqe_info wqe_info = {0}; 15 + struct gdma_sge gdma_sgl[MAX_WR_SGL_NUM]; 16 + struct gdma_wqe_request wqe_req = {0}; 17 + struct ud_rq_shadow_wqe *shadow_wqe; 18 + int err, i; 19 + 20 + if (shadow_queue_full(&qp->shadow_rq)) 21 + return -EINVAL; 22 + 23 + if (wr->num_sge > MAX_WR_SGL_NUM) 24 + return -EINVAL; 25 + 26 + for (i = 0; i < wr->num_sge; ++i) { 27 + gdma_sgl[i].address = wr->sg_list[i].addr; 28 + gdma_sgl[i].mem_key = wr->sg_list[i].lkey; 29 + gdma_sgl[i].size = wr->sg_list[i].length; 30 + } 31 + wqe_req.num_sge = wr->num_sge; 32 + wqe_req.sgl = gdma_sgl; 33 + 34 + err = mana_gd_post_work_request(queue, &wqe_req, &wqe_info); 35 + if (err) 36 + return err; 37 + 38 + shadow_wqe = shadow_queue_producer_entry(&qp->shadow_rq); 39 + memset(shadow_wqe, 0, sizeof(*shadow_wqe)); 40 + shadow_wqe->header.opcode = IB_WC_RECV; 41 + shadow_wqe->header.wr_id = wr->wr_id; 42 + shadow_wqe->header.posted_wqe_size = wqe_info.wqe_size_in_bu; 43 + shadow_queue_advance_producer(&qp->shadow_rq); 44 + 45 + mana_gd_wq_ring_doorbell(mdev_to_gc(mdev), queue); 46 + return 0; 47 + } 48 + 49 + int mana_ib_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr, 50 + const struct ib_recv_wr **bad_wr) 51 + { 52 + struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp); 53 + int err = 0; 54 + 55 + for (; wr; wr = wr->next) { 56 + switch (ibqp->qp_type) { 57 + case IB_QPT_UD: 58 + case IB_QPT_GSI: 59 + err = mana_ib_post_recv_ud(qp, wr); 60 + if (unlikely(err)) { 61 + *bad_wr = wr; 62 + return err; 63 + } 64 + break; 65 + default: 66 + ibdev_dbg(ibqp->device, "Posting recv wr on qp type %u is not supported\n", 67 + ibqp->qp_type); 68 + return -EINVAL; 69 + } 70 + } 71 + 72 + return err; 73 + } 74 + 75 + static int mana_ib_post_send_ud(struct mana_ib_qp *qp, const struct ib_ud_wr *wr) 76 + { 77 + struct mana_ib_dev *mdev = container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev); 78 + struct mana_ib_ah *ah = container_of(wr->ah, struct mana_ib_ah, ibah); 79 + struct net_device *ndev = mana_ib_get_netdev(&mdev->ib_dev, qp->port); 80 + struct gdma_queue *queue = qp->ud_qp.queues[MANA_UD_SEND_QUEUE].kmem; 81 + struct gdma_sge gdma_sgl[MAX_WR_SGL_NUM + 1]; 82 + struct gdma_posted_wqe_info wqe_info = {0}; 83 + struct gdma_wqe_request wqe_req = {0}; 84 + struct rdma_send_oob send_oob = {0}; 85 + struct ud_sq_shadow_wqe *shadow_wqe; 86 + int err, i; 87 + 88 + if (!ndev) { 89 + ibdev_dbg(&mdev->ib_dev, "Invalid port %u in QP %u\n", 90 + qp->port, qp->ibqp.qp_num); 91 + return -EINVAL; 92 + } 93 + 94 + if (wr->wr.opcode != IB_WR_SEND) 95 + return -EINVAL; 96 + 97 + if (shadow_queue_full(&qp->shadow_sq)) 98 + return -EINVAL; 99 + 100 + if (wr->wr.num_sge > MAX_WR_SGL_NUM) 101 + return -EINVAL; 102 + 103 + gdma_sgl[0].address = ah->dma_handle; 104 + gdma_sgl[0].mem_key = qp->ibqp.pd->local_dma_lkey; 105 + gdma_sgl[0].size = sizeof(struct mana_ib_av); 106 + for (i = 0; i < wr->wr.num_sge; ++i) { 107 + gdma_sgl[i + 1].address = wr->wr.sg_list[i].addr; 108 + gdma_sgl[i + 1].mem_key = wr->wr.sg_list[i].lkey; 109 + gdma_sgl[i + 1].size = 
wr->wr.sg_list[i].length; 110 + } 111 + 112 + wqe_req.num_sge = wr->wr.num_sge + 1; 113 + wqe_req.sgl = gdma_sgl; 114 + wqe_req.inline_oob_size = sizeof(struct rdma_send_oob); 115 + wqe_req.inline_oob_data = &send_oob; 116 + wqe_req.flags = GDMA_WR_OOB_IN_SGL; 117 + wqe_req.client_data_unit = ib_mtu_enum_to_int(ib_mtu_int_to_enum(ndev->mtu)); 118 + 119 + send_oob.wqe_type = WQE_TYPE_UD_SEND; 120 + send_oob.fence = !!(wr->wr.send_flags & IB_SEND_FENCE); 121 + send_oob.signaled = !!(wr->wr.send_flags & IB_SEND_SIGNALED); 122 + send_oob.solicited = !!(wr->wr.send_flags & IB_SEND_SOLICITED); 123 + send_oob.psn = qp->ud_qp.sq_psn; 124 + send_oob.ssn_or_rqpn = wr->remote_qpn; 125 + send_oob.ud_send.remote_qkey = 126 + qp->ibqp.qp_type == IB_QPT_GSI ? IB_QP1_QKEY : wr->remote_qkey; 127 + 128 + err = mana_gd_post_work_request(queue, &wqe_req, &wqe_info); 129 + if (err) 130 + return err; 131 + 132 + qp->ud_qp.sq_psn++; 133 + shadow_wqe = shadow_queue_producer_entry(&qp->shadow_sq); 134 + memset(shadow_wqe, 0, sizeof(*shadow_wqe)); 135 + shadow_wqe->header.opcode = IB_WC_SEND; 136 + shadow_wqe->header.wr_id = wr->wr.wr_id; 137 + shadow_wqe->header.posted_wqe_size = wqe_info.wqe_size_in_bu; 138 + shadow_queue_advance_producer(&qp->shadow_sq); 139 + 140 + mana_gd_wq_ring_doorbell(mdev_to_gc(mdev), queue); 141 + return 0; 142 + } 143 + 144 + int mana_ib_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr, 145 + const struct ib_send_wr **bad_wr) 146 + { 147 + int err; 148 + struct mana_ib_qp *qp = container_of(ibqp, struct mana_ib_qp, ibqp); 149 + 150 + for (; wr; wr = wr->next) { 151 + switch (ibqp->qp_type) { 152 + case IB_QPT_UD: 153 + case IB_QPT_GSI: 154 + err = mana_ib_post_send_ud(qp, ud_wr(wr)); 155 + if (unlikely(err)) { 156 + *bad_wr = wr; 157 + return err; 158 + } 159 + break; 160 + default: 161 + ibdev_dbg(ibqp->device, "Posting send wr on qp type %u is not supported\n", 162 + ibqp->qp_type); 163 + return -EINVAL; 164 + } 165 + } 166 + 167 + return err; 168 + }
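[Editor's note] Since mana_ib_create_ud_qp() rejects udata, these post paths are reached only by in-kernel consumers (GSI/QP1 MAD traffic being the obvious one) through the standard verbs. A hedged sketch of such a caller; the DMA address and lkey are assumed to come from MR/DMA-mapping setup not shown here:

        #include <rdma/ib_verbs.h>

        /* Post one single-SGE receive buffer; lands in mana_ib_post_recv()
         * above when 'qp' is a mana UD/GSI QP. */
        static int post_one_recv(struct ib_qp *qp, u64 dma_addr, u32 len,
                                 u32 lkey)
        {
                const struct ib_recv_wr *bad_wr;
                struct ib_sge sge = {
                        .addr   = dma_addr,
                        .length = len,
                        .lkey   = lkey,
                };
                struct ib_recv_wr wr = {
                        .wr_id   = dma_addr,    /* cookie echoed in the CQE */
                        .sg_list = &sge,
                        .num_sge = 1,   /* wr.c caps this at MAX_WR_SGL_NUM */
                };

                return ib_post_recv(qp, &wr, &bad_wr);
        }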
+1 -1
drivers/infiniband/hw/mlx5/Makefile
··· 9 9 data_direct.o \ 10 10 dm.o \ 11 11 doorbell.o \ 12 + fs.o \ 12 13 gsi.o \ 13 14 ib_virt.o \ 14 15 mad.o \ ··· 27 26 mlx5_ib-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += odp.o 28 27 mlx5_ib-$(CONFIG_MLX5_ESWITCH) += ib_rep.o 29 28 mlx5_ib-$(CONFIG_INFINIBAND_USER_ACCESS) += devx.o \ 30 - fs.o \ 31 29 qos.o \ 32 30 std_types.o 33 31 mlx5_ib-$(CONFIG_MLX5_MACSEC) += macsec.o
+182 -13
drivers/infiniband/hw/mlx5/counters.c
··· 140 140 INIT_OP_COUNTER(cc_tx_cnp_pkts, CC_TX_CNP_PKTS), 141 141 }; 142 142 143 + static const struct mlx5_ib_counter packets_op_cnts[] = { 144 + INIT_OP_COUNTER(rdma_tx_packets, RDMA_TX_PACKETS), 145 + INIT_OP_COUNTER(rdma_tx_bytes, RDMA_TX_BYTES), 146 + INIT_OP_COUNTER(rdma_rx_packets, RDMA_RX_PACKETS), 147 + INIT_OP_COUNTER(rdma_rx_bytes, RDMA_RX_BYTES), 148 + }; 149 + 143 150 static int mlx5_ib_read_counters(struct ib_counters *counters, 144 151 struct ib_counters_read_attr *read_attr, 145 152 struct uverbs_attr_bundle *attrs) ··· 434 427 return num_counters; 435 428 } 436 429 430 + static bool is_rdma_bytes_counter(u32 type) 431 + { 432 + if (type == MLX5_IB_OPCOUNTER_RDMA_TX_BYTES || 433 + type == MLX5_IB_OPCOUNTER_RDMA_RX_BYTES || 434 + type == MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP || 435 + type == MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP) 436 + return true; 437 + 438 + return false; 439 + } 440 + 441 + static int do_per_qp_get_op_stat(struct rdma_counter *counter) 442 + { 443 + struct mlx5_ib_dev *dev = to_mdev(counter->device); 444 + const struct mlx5_ib_counters *cnts = get_counters(dev, counter->port); 445 + struct mlx5_rdma_counter *mcounter = to_mcounter(counter); 446 + int i, ret, index, num_hw_counters; 447 + u64 packets = 0, bytes = 0; 448 + 449 + for (i = MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP; 450 + i <= MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP; i++) { 451 + if (!mcounter->fc[i]) 452 + continue; 453 + 454 + ret = mlx5_fc_query(dev->mdev, mcounter->fc[i], 455 + &packets, &bytes); 456 + if (ret) 457 + return ret; 458 + 459 + num_hw_counters = cnts->num_q_counters + 460 + cnts->num_cong_counters + 461 + cnts->num_ext_ppcnt_counters; 462 + 463 + index = i - MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP + 464 + num_hw_counters; 465 + 466 + if (is_rdma_bytes_counter(i)) 467 + counter->stats->value[index] = bytes; 468 + else 469 + counter->stats->value[index] = packets; 470 + 471 + clear_bit(index, counter->stats->is_disabled); 472 + } 473 + return 0; 474 + } 475 + 437 476 static int do_get_op_stat(struct ib_device *ibdev, 438 477 struct rdma_hw_stats *stats, 439 478 u32 port_num, int index) ··· 487 434 struct mlx5_ib_dev *dev = to_mdev(ibdev); 488 435 const struct mlx5_ib_counters *cnts; 489 436 const struct mlx5_ib_op_fc *opfcs; 490 - u64 packets = 0, bytes; 437 + u64 packets, bytes; 491 438 u32 type; 492 439 int ret; 493 440 ··· 506 453 if (ret) 507 454 return ret; 508 455 456 + if (is_rdma_bytes_counter(type)) 457 + stats->value[index] = bytes; 458 + else 459 + stats->value[index] = packets; 509 460 out: 510 - stats->value[index] = packets; 511 461 return index; 512 462 } 513 463 ··· 579 523 { 580 524 struct mlx5_ib_dev *dev = to_mdev(counter->device); 581 525 const struct mlx5_ib_counters *cnts = get_counters(dev, counter->port); 526 + int ret; 582 527 583 - return mlx5_ib_query_q_counters(dev->mdev, cnts, 584 - counter->stats, counter->id); 528 + ret = mlx5_ib_query_q_counters(dev->mdev, cnts, counter->stats, 529 + counter->id); 530 + if (ret) 531 + return ret; 532 + 533 + if (!counter->mode.bind_opcnt) 534 + return 0; 535 + 536 + return do_per_qp_get_op_stat(counter); 585 537 } 586 538 587 539 static int mlx5_ib_counter_dealloc(struct rdma_counter *counter) 588 540 { 541 + struct mlx5_rdma_counter *mcounter = to_mcounter(counter); 589 542 struct mlx5_ib_dev *dev = to_mdev(counter->device); 590 543 u32 in[MLX5_ST_SZ_DW(dealloc_q_counter_in)] = {}; 591 544 592 545 if (!counter->id) 593 546 return 0; 594 547 548 + WARN_ON(!xa_empty(&mcounter->qpn_opfc_xa)); 549 + 
mlx5r_fs_destroy_fcs(dev, counter); 595 550 MLX5_SET(dealloc_q_counter_in, in, opcode, 596 551 MLX5_CMD_OP_DEALLOC_Q_COUNTER); 597 552 MLX5_SET(dealloc_q_counter_in, in, counter_set_id, counter->id); ··· 610 543 } 611 544 612 545 static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter, 613 - struct ib_qp *qp) 546 + struct ib_qp *qp, u32 port) 614 547 { 615 548 struct mlx5_ib_dev *dev = to_mdev(qp->device); 616 549 bool new = false; ··· 635 568 if (err) 636 569 goto fail_set_counter; 637 570 571 + err = mlx5r_fs_bind_op_fc(qp, counter, port); 572 + if (err) 573 + goto fail_bind_op_fc; 574 + 638 575 return 0; 639 576 577 + fail_bind_op_fc: 578 + mlx5_ib_qp_set_counter(qp, NULL); 640 579 fail_set_counter: 641 580 if (new) { 642 581 mlx5_ib_counter_dealloc(counter); ··· 652 579 return err; 653 580 } 654 581 655 - static int mlx5_ib_counter_unbind_qp(struct ib_qp *qp) 582 + static int mlx5_ib_counter_unbind_qp(struct ib_qp *qp, u32 port) 656 583 { 657 - return mlx5_ib_qp_set_counter(qp, NULL); 584 + struct rdma_counter *counter = qp->counter; 585 + int err; 586 + 587 + mlx5r_fs_unbind_op_fc(qp, counter); 588 + 589 + err = mlx5_ib_qp_set_counter(qp, NULL); 590 + if (err) 591 + goto fail_set_counter; 592 + 593 + return 0; 594 + 595 + fail_set_counter: 596 + mlx5r_fs_bind_op_fc(qp, counter, port); 597 + return err; 658 598 } 659 599 660 600 static void mlx5_ib_fill_counters(struct mlx5_ib_dev *dev, ··· 767 681 descs[j].priv = &rdmatx_cnp_op_cnts[i].type; 768 682 } 769 683 } 684 + 685 + for (i = 0; i < ARRAY_SIZE(packets_op_cnts); i++, j++) { 686 + descs[j].name = packets_op_cnts[i].name; 687 + descs[j].flags |= IB_STAT_FLAG_OPTIONAL; 688 + descs[j].priv = &packets_op_cnts[i].type; 689 + } 770 690 } 771 691 772 692 ··· 823 731 824 732 num_op_counters = ARRAY_SIZE(basic_op_cnts); 825 733 734 + num_op_counters += ARRAY_SIZE(packets_op_cnts); 735 + 826 736 if (MLX5_CAP_FLOWTABLE(dev->mdev, 827 737 ft_field_support_2_nic_receive_rdma.bth_opcode)) 828 738 num_op_counters += ARRAY_SIZE(rdmarx_cnp_op_cnts); ··· 854 760 return -ENOMEM; 855 761 } 856 762 763 + /* 764 + * Checks if the given flow counter type should be sharing the same flow counter 765 + * with another type and if it should, checks if that other type flow counter 766 + * was already created, if both conditions are met return true and the counter 767 + * else return false. 
768 + */ 769 + bool mlx5r_is_opfc_shared_and_in_use(struct mlx5_ib_op_fc *opfcs, u32 type, 770 + struct mlx5_ib_op_fc **opfc) 771 + { 772 + u32 shared_fc_type; 773 + 774 + switch (type) { 775 + case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS: 776 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_BYTES; 777 + break; 778 + case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES: 779 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS; 780 + break; 781 + case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS: 782 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_BYTES; 783 + break; 784 + case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES: 785 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS; 786 + break; 787 + case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP: 788 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP; 789 + break; 790 + case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP: 791 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP; 792 + break; 793 + case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP: 794 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP; 795 + break; 796 + case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP: 797 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP; 798 + break; 799 + default: 800 + return false; 801 + } 802 + 803 + *opfc = &opfcs[shared_fc_type]; 804 + if (!(*opfc)->fc) 805 + return false; 806 + 807 + return true; 808 + } 809 + 857 810 static void mlx5_ib_dealloc_counters(struct mlx5_ib_dev *dev) 858 811 { 859 812 u32 in[MLX5_ST_SZ_DW(dealloc_q_counter_in)] = {}; 860 813 int num_cnt_ports = dev->num_ports; 814 + struct mlx5_ib_op_fc *in_use_opfc; 861 815 int i, j; 862 816 863 817 if (is_mdev_switchdev_mode(dev->mdev)) ··· 927 785 if (!dev->port[i].cnts.opfcs[j].fc) 928 786 continue; 929 787 930 - if (IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)) 931 - mlx5_ib_fs_remove_op_fc(dev, 932 - &dev->port[i].cnts.opfcs[j], j); 788 + if (mlx5r_is_opfc_shared_and_in_use( 789 + dev->port[i].cnts.opfcs, j, &in_use_opfc)) 790 + goto skip; 791 + 792 + mlx5_ib_fs_remove_op_fc(dev, 793 + &dev->port[i].cnts.opfcs[j], j); 933 794 mlx5_fc_destroy(dev->mdev, 934 795 dev->port[i].cnts.opfcs[j].fc); 796 + skip: 935 797 dev->port[i].cnts.opfcs[j].fc = NULL; 936 798 } 937 799 } ··· 1129 983 unsigned int index, bool enable) 1130 984 { 1131 985 struct mlx5_ib_dev *dev = to_mdev(device); 986 + struct mlx5_ib_op_fc *opfc, *in_use_opfc; 1132 987 struct mlx5_ib_counters *cnts; 1133 - struct mlx5_ib_op_fc *opfc; 1134 988 u32 num_hw_counters, type; 1135 989 int ret; 1136 990 ··· 1154 1008 if (opfc->fc) 1155 1009 return -EEXIST; 1156 1010 1011 + if (mlx5r_is_opfc_shared_and_in_use(cnts->opfcs, type, 1012 + &in_use_opfc)) { 1013 + opfc->fc = in_use_opfc->fc; 1014 + opfc->rule[0] = in_use_opfc->rule[0]; 1015 + return 0; 1016 + } 1017 + 1157 1018 opfc->fc = mlx5_fc_create(dev->mdev, false); 1158 1019 if (IS_ERR(opfc->fc)) 1159 1020 return PTR_ERR(opfc->fc); ··· 1176 1023 if (!opfc->fc) 1177 1024 return -EINVAL; 1178 1025 1026 + if (mlx5r_is_opfc_shared_and_in_use(cnts->opfcs, type, &in_use_opfc)) 1027 + goto out; 1028 + 1179 1029 mlx5_ib_fs_remove_op_fc(dev, opfc, type); 1180 1030 mlx5_fc_destroy(dev->mdev, opfc->fc); 1031 + out: 1181 1032 opfc->fc = NULL; 1182 1033 return 0; 1034 + } 1035 + 1036 + static void mlx5_ib_counter_init(struct rdma_counter *counter) 1037 + { 1038 + struct mlx5_rdma_counter *mcounter = to_mcounter(counter); 1039 + 1040 + xa_init(&mcounter->qpn_opfc_xa); 1183 1041 } 1184 1042 1185 1043 static const struct ib_device_ops hw_stats_ops = { ··· 1201 1037 .counter_dealloc = mlx5_ib_counter_dealloc, 1202 1038 
.counter_alloc_stats = mlx5_ib_counter_alloc_stats, 1203 1039 .counter_update_stats = mlx5_ib_counter_update_stats, 1204 - .modify_hw_stat = IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS) ? 1205 - mlx5_ib_modify_stat : NULL, 1040 + .modify_hw_stat = mlx5_ib_modify_stat, 1041 + .counter_init = mlx5_ib_counter_init, 1042 + 1043 + INIT_RDMA_OBJ_SIZE(rdma_counter, mlx5_rdma_counter, rdma_counter), 1206 1044 }; 1207 1045 1208 1046 static const struct ib_device_ops hw_switchdev_vport_op = { ··· 1219 1053 .counter_dealloc = mlx5_ib_counter_dealloc, 1220 1054 .counter_alloc_stats = mlx5_ib_counter_alloc_stats, 1221 1055 .counter_update_stats = mlx5_ib_counter_update_stats, 1056 + .counter_init = mlx5_ib_counter_init, 1057 + 1058 + INIT_RDMA_OBJ_SIZE(rdma_counter, mlx5_rdma_counter, rdma_counter), 1222 1059 }; 1223 1060 1224 1061 static const struct ib_device_ops counters_ops = {
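[Editor's note] The recurring idea in this hunk is that one hardware flow counter accumulates packets and bytes together, so the *_PACKETS and *_BYTES optional counters can share a single mlx5_fc and differ only in which field is reported (is_rdma_bytes_counter() makes that choice). A standalone toy of the same split, with invented types:

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Stand-in for one flow counter: both values always accumulate */
        struct flow_counter { uint64_t packets, bytes; };

        enum opcount { RDMA_TX_PACKETS, RDMA_TX_BYTES };

        static uint64_t read_stat(const struct flow_counter *fc, enum opcount t)
        {
                /* cf. the bytes-vs-packets selection in do_get_op_stat() */
                return t == RDMA_TX_BYTES ? fc->bytes : fc->packets;
        }

        int main(void)
        {
                struct flow_counter fc = { .packets = 3, .bytes = 4500 };

                printf("tx_packets=%llu tx_bytes=%llu\n",
                       (unsigned long long)read_stat(&fc, RDMA_TX_PACKETS),
                       (unsigned long long)read_stat(&fc, RDMA_TX_BYTES));
                return 0;
        }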
+15
drivers/infiniband/hw/mlx5/counters.h
··· 8 8 9 9 #include "mlx5_ib.h" 10 10 11 + struct mlx5_rdma_counter { 12 + struct rdma_counter rdma_counter; 13 + 14 + struct mlx5_fc *fc[MLX5_IB_OPCOUNTER_MAX]; 15 + struct xarray qpn_opfc_xa; 16 + }; 17 + 18 + static inline struct mlx5_rdma_counter * 19 + to_mcounter(struct rdma_counter *counter) 20 + { 21 + return container_of(counter, struct mlx5_rdma_counter, rdma_counter); 22 + } 23 + 11 24 int mlx5_ib_counters_init(struct mlx5_ib_dev *dev); 12 25 void mlx5_ib_counters_cleanup(struct mlx5_ib_dev *dev); 13 26 void mlx5_ib_counters_clear_description(struct ib_counters *counters); 14 27 int mlx5_ib_flow_counters_set_data(struct ib_counters *ibcounters, 15 28 struct mlx5_ib_create_flow *ucmd); 16 29 u16 mlx5_ib_get_counters_id(struct mlx5_ib_dev *dev, u32 port_num); 30 + bool mlx5r_is_opfc_shared_and_in_use(struct mlx5_ib_op_fc *opfcs, u32 type, 31 + struct mlx5_ib_op_fc **opfc); 17 32 #endif /* _MLX5_IB_COUNTERS_H */
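[Editor's note] to_mcounter() is the usual embedded-struct downcast: the driver wraps the core rdma_counter (sized via INIT_RDMA_OBJ_SIZE in the hunk above) and recovers its private struct by subtracting the member offset. A freestanding illustration of the pattern, with a local container_of so it compiles outside the kernel:

        #include <stddef.h>
        #include <stdio.h>

        #define container_of(ptr, type, member) \
                ((type *)((char *)(ptr) - offsetof(type, member)))

        struct rdma_counter { int id; };        /* core object */

        struct mlx5_rdma_counter {              /* driver wrapper */
                struct rdma_counter rdma_counter;
                int private_state;
        };

        int main(void)
        {
                struct mlx5_rdma_counter mc = { .private_state = 42 };
                struct rdma_counter *core = &mc.rdma_counter;

                printf("%d\n", container_of(core, struct mlx5_rdma_counter,
                                            rdma_counter)->private_state);
                return 0;
        }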
+1 -1
drivers/infiniband/hw/mlx5/cq.c
··· 490 490 } 491 491 492 492 qpn = ntohl(cqe64->sop_drop_qpn) & 0xffffff; 493 - if (!*cur_qp || (qpn != (*cur_qp)->ibqp.qp_num)) { 493 + if (!*cur_qp || (qpn != (*cur_qp)->trans_qp.base.mqp.qpn)) { 494 494 /* We do not have to take the QP table lock here, 495 495 * because CQs will be locked while QPs are removed 496 496 * from the table.
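[Editor's note] The one-line change matches the CQE's QPN against the cached QP's hardware QPN (mqp.qpn) rather than its user-visible qp_num. The cur_qp caching pattern itself is simple; a toy rendering, with an invented table in place of the real lookup:

        #include <stdint.h>
        #include <stdio.h>

        struct qp { uint32_t hw_qpn; };

        static struct qp qp_table[] = { { 0x11 }, { 0x2a } };

        static struct qp *lookup_qp(uint32_t qpn)       /* the "slow" path */
        {
                unsigned int i;

                for (i = 0; i < sizeof(qp_table) / sizeof(qp_table[0]); i++)
                        if (qp_table[i].hw_qpn == qpn)
                                return &qp_table[i];
                return NULL;
        }

        /* Only hit the table when the CQE's QPN differs from the cache,
         * cf. the cur_qp check in the cq.c hunk above. */
        static struct qp *get_qp(struct qp **cur_qp, uint32_t cqe_qpn)
        {
                if (!*cur_qp || (*cur_qp)->hw_qpn != cqe_qpn)
                        *cur_qp = lookup_qp(cqe_qpn);
                return *cur_qp;
        }

        int main(void)
        {
                uint32_t cqes[] = { 0x11, 0x11, 0x2a };
                struct qp *cur = NULL;
                unsigned int i;

                for (i = 0; i < 3; i++)
                        printf("cqe %u -> qp %p\n", i, (void *)get_qp(&cur, cqes[i]));
                return 0;
        }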
+35 -6
drivers/infiniband/hw/mlx5/devx.c
··· 13 13 #include <rdma/uverbs_std_types.h> 14 14 #include <linux/mlx5/driver.h> 15 15 #include <linux/mlx5/fs.h> 16 + #include <rdma/ib_ucaps.h> 16 17 #include "mlx5_ib.h" 17 18 #include "devx.h" 18 19 #include "qp.h" ··· 123 122 return to_mucontext(ib_uverbs_get_ucontext(attrs)); 124 123 } 125 124 126 - int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user) 125 + static int set_uctx_ucaps(struct mlx5_ib_dev *dev, u64 req_ucaps, u32 *cap) 126 + { 127 + if (UCAP_ENABLED(req_ucaps, RDMA_UCAP_MLX5_CTRL_LOCAL)) { 128 + if (MLX5_CAP_GEN(dev->mdev, uctx_cap) & MLX5_UCTX_CAP_RDMA_CTRL) 129 + *cap |= MLX5_UCTX_CAP_RDMA_CTRL; 130 + else 131 + return -EOPNOTSUPP; 132 + } 133 + 134 + if (UCAP_ENABLED(req_ucaps, RDMA_UCAP_MLX5_CTRL_OTHER_VHCA)) { 135 + if (MLX5_CAP_GEN(dev->mdev, uctx_cap) & 136 + MLX5_UCTX_CAP_RDMA_CTRL_OTHER_VHCA) 137 + *cap |= MLX5_UCTX_CAP_RDMA_CTRL_OTHER_VHCA; 138 + else 139 + return -EOPNOTSUPP; 140 + } 141 + 142 + return 0; 143 + } 144 + 145 + int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user, u64 req_ucaps) 127 146 { 128 147 u32 in[MLX5_ST_SZ_DW(create_uctx_in)] = {}; 129 148 u32 out[MLX5_ST_SZ_DW(create_uctx_out)] = {}; ··· 157 136 return -EINVAL; 158 137 159 138 uctx = MLX5_ADDR_OF(create_uctx_in, in, uctx); 160 - if (is_user && capable(CAP_NET_RAW) && 161 - (MLX5_CAP_GEN(dev->mdev, uctx_cap) & MLX5_UCTX_CAP_RAW_TX)) 139 + if (is_user && 140 + (MLX5_CAP_GEN(dev->mdev, uctx_cap) & MLX5_UCTX_CAP_RAW_TX) && 141 + capable(CAP_NET_RAW)) 162 142 cap |= MLX5_UCTX_CAP_RAW_TX; 163 - if (is_user && capable(CAP_SYS_RAWIO) && 143 + if (is_user && 164 144 (MLX5_CAP_GEN(dev->mdev, uctx_cap) & 165 - MLX5_UCTX_CAP_INTERNAL_DEV_RES)) 145 + MLX5_UCTX_CAP_INTERNAL_DEV_RES) && 146 + capable(CAP_SYS_RAWIO)) 166 147 cap |= MLX5_UCTX_CAP_INTERNAL_DEV_RES; 148 + 149 + if (req_ucaps) { 150 + err = set_uctx_ucaps(dev, req_ucaps, &cap); 151 + if (err) 152 + return err; 153 + } 167 154 168 155 MLX5_SET(create_uctx_in, in, opcode, MLX5_CMD_OP_CREATE_UCTX); 169 156 MLX5_SET(uctx, uctx, cap, cap); ··· 2602 2573 struct mlx5_devx_event_table *table = &dev->devx_event_table; 2603 2574 int uid; 2604 2575 2605 - uid = mlx5_ib_devx_create(dev, false); 2576 + uid = mlx5_ib_devx_create(dev, false, 0); 2606 2577 if (uid > 0) { 2607 2578 dev->devx_whitelist_uid = uid; 2608 2579 xa_init(&table->event_xa);
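[Editor's note] set_uctx_ucaps() follows an all-or-nothing translate pattern: each requested user capability must be backed by the matching firmware uctx capability, otherwise the whole context creation fails. A standalone sketch with invented bit values (the real ones are the RDMA_UCAP_MLX5_* and MLX5_UCTX_CAP_* definitions):

        #include <errno.h>
        #include <stdint.h>
        #include <stdio.h>

        #define REQ_CTRL_LOCAL              (1ULL << 0)
        #define REQ_CTRL_OTHER_VHCA         (1ULL << 1)
        #define FW_CAP_RDMA_CTRL            (1u << 0)
        #define FW_CAP_RDMA_CTRL_OTHER_VHCA (1u << 1)

        static int translate_ucaps(uint64_t req, uint32_t fw_caps, uint32_t *out)
        {
                if (req & REQ_CTRL_LOCAL) {
                        if (!(fw_caps & FW_CAP_RDMA_CTRL))
                                return -EOPNOTSUPP;     /* all-or-nothing */
                        *out |= FW_CAP_RDMA_CTRL;
                }
                if (req & REQ_CTRL_OTHER_VHCA) {
                        if (!(fw_caps & FW_CAP_RDMA_CTRL_OTHER_VHCA))
                                return -EOPNOTSUPP;
                        *out |= FW_CAP_RDMA_CTRL_OTHER_VHCA;
                }
                return 0;
        }

        int main(void)
        {
                uint32_t cap = 0;
                int err = translate_ucaps(REQ_CTRL_LOCAL, FW_CAP_RDMA_CTRL, &cap);

                printf("err=%d cap=0x%x\n", err, cap);
                return 0;
        }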
+3 -2
drivers/infiniband/hw/mlx5/devx.h
··· 24 24 struct list_head event_sub; /* holds devx_event_subscription entries */ 25 25 }; 26 26 #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS) 27 - int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user); 27 + int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user, u64 req_ucaps); 28 28 void mlx5_ib_devx_destroy(struct mlx5_ib_dev *dev, u16 uid); 29 29 int mlx5_ib_devx_init(struct mlx5_ib_dev *dev); 30 30 void mlx5_ib_devx_cleanup(struct mlx5_ib_dev *dev); 31 31 void mlx5_ib_ufile_hw_cleanup(struct ib_uverbs_file *ufile); 32 32 #else 33 - static inline int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user) 33 + static inline int mlx5_ib_devx_create(struct mlx5_ib_dev *dev, bool is_user, 34 + u64 req_ucaps) 34 35 { 35 36 return -EOPNOTSUPP; 36 37 }
+620 -17
drivers/infiniband/hw/mlx5/fs.c
··· 12 12 #include <rdma/mlx5_user_ioctl_verbs.h> 13 13 #include <rdma/ib_hdrs.h> 14 14 #include <rdma/ib_umem.h> 15 + #include <rdma/ib_ucaps.h> 15 16 #include <linux/mlx5/driver.h> 16 17 #include <linux/mlx5/fs.h> 17 18 #include <linux/mlx5/fs_helpers.h> ··· 31 30 MATCH_CRITERIA_ENABLE_MISC_BIT, 32 31 MATCH_CRITERIA_ENABLE_INNER_BIT, 33 32 MATCH_CRITERIA_ENABLE_MISC2_BIT 33 + }; 34 + 35 + 36 + struct mlx5_per_qp_opfc { 37 + struct mlx5_ib_op_fc opfcs[MLX5_IB_OPCOUNTER_MAX]; 34 38 }; 35 39 36 40 #define HEADER_IS_ZERO(match_criteria, headers) \ ··· 684 678 #define MLX5_FS_MAX_TYPES 6 685 679 #define MLX5_FS_MAX_ENTRIES BIT(16) 686 680 687 - static bool mlx5_ib_shared_ft_allowed(struct ib_device *device) 681 + static bool __maybe_unused mlx5_ib_shared_ft_allowed(struct ib_device *device) 688 682 { 689 683 struct mlx5_ib_dev *dev = to_mdev(device); 690 684 ··· 696 690 struct mlx5_ib_flow_prio *prio, 697 691 int priority, 698 692 int num_entries, int num_groups, 699 - u32 flags) 693 + u32 flags, u16 vport) 700 694 { 701 695 struct mlx5_flow_table_attr ft_attr = {}; 702 696 struct mlx5_flow_table *ft; ··· 704 698 ft_attr.prio = priority; 705 699 ft_attr.max_fte = num_entries; 706 700 ft_attr.flags = flags; 701 + ft_attr.vport = vport; 707 702 ft_attr.autogroup.max_num_groups = num_groups; 708 703 ft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr); 709 704 if (IS_ERR(ft)) ··· 799 792 ft = prio->flow_table; 800 793 if (!ft) 801 794 return _get_prio(dev, ns, prio, priority, max_table_size, 802 - num_groups, flags); 795 + num_groups, flags, 0); 803 796 804 797 return prio; 805 798 } 806 799 807 800 enum { 801 + RDMA_RX_ECN_OPCOUNTER_PER_QP_PRIO, 802 + RDMA_RX_CNP_OPCOUNTER_PER_QP_PRIO, 803 + RDMA_RX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO, 808 804 RDMA_RX_ECN_OPCOUNTER_PRIO, 809 805 RDMA_RX_CNP_OPCOUNTER_PRIO, 806 + RDMA_RX_PKTS_BYTES_OPCOUNTER_PRIO, 810 807 }; 811 808 812 809 enum { 810 + RDMA_TX_CNP_OPCOUNTER_PER_QP_PRIO, 811 + RDMA_TX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO, 813 812 RDMA_TX_CNP_OPCOUNTER_PRIO, 813 + RDMA_TX_PKTS_BYTES_OPCOUNTER_PRIO, 814 814 }; 815 815 816 816 static int set_vhca_port_spec(struct mlx5_ib_dev *dev, u32 port_num, ··· 881 867 return 0; 882 868 } 883 869 870 + /* Returns the prio we should use for the given optional counter type, 871 + * whereas for bytes type we use the packet type, since they share the same 872 + * resources. 
873 + */ 874 + static struct mlx5_ib_flow_prio *get_opfc_prio(struct mlx5_ib_dev *dev, 875 + u32 type) 876 + { 877 + u32 prio_type; 878 + 879 + switch (type) { 880 + case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES: 881 + prio_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS; 882 + break; 883 + case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES: 884 + prio_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS; 885 + break; 886 + case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP: 887 + prio_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP; 888 + break; 889 + case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP: 890 + prio_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP; 891 + break; 892 + default: 893 + prio_type = type; 894 + } 895 + 896 + return &dev->flow_db->opfcs[prio_type]; 897 + } 898 + 899 + static void put_per_qp_prio(struct mlx5_ib_dev *dev, 900 + enum mlx5_ib_optional_counter_type type) 901 + { 902 + enum mlx5_ib_optional_counter_type per_qp_type; 903 + struct mlx5_ib_flow_prio *prio; 904 + 905 + switch (type) { 906 + case MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS: 907 + per_qp_type = MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP; 908 + break; 909 + case MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS: 910 + per_qp_type = MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS_PER_QP; 911 + break; 912 + case MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS: 913 + per_qp_type = MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS_PER_QP; 914 + break; 915 + case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS: 916 + per_qp_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP; 917 + break; 918 + case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES: 919 + per_qp_type = MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP; 920 + break; 921 + case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS: 922 + per_qp_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP; 923 + break; 924 + case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES: 925 + per_qp_type = MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP; 926 + break; 927 + default: 928 + return; 929 + } 930 + 931 + prio = get_opfc_prio(dev, per_qp_type); 932 + put_flow_table(dev, prio, true); 933 + } 934 + 935 + static int get_per_qp_prio(struct mlx5_ib_dev *dev, 936 + enum mlx5_ib_optional_counter_type type) 937 + { 938 + enum mlx5_ib_optional_counter_type per_qp_type; 939 + enum mlx5_flow_namespace_type fn_type; 940 + struct mlx5_flow_namespace *ns; 941 + struct mlx5_ib_flow_prio *prio; 942 + int priority; 943 + 944 + switch (type) { 945 + case MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS: 946 + fn_type = MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS; 947 + priority = RDMA_RX_ECN_OPCOUNTER_PER_QP_PRIO; 948 + per_qp_type = MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP; 949 + break; 950 + case MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS: 951 + fn_type = MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS; 952 + priority = RDMA_RX_CNP_OPCOUNTER_PER_QP_PRIO; 953 + per_qp_type = MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS_PER_QP; 954 + break; 955 + case MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS: 956 + fn_type = MLX5_FLOW_NAMESPACE_RDMA_TX_COUNTERS; 957 + priority = RDMA_TX_CNP_OPCOUNTER_PER_QP_PRIO; 958 + per_qp_type = MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS_PER_QP; 959 + break; 960 + case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS: 961 + fn_type = MLX5_FLOW_NAMESPACE_RDMA_TX_COUNTERS; 962 + priority = RDMA_TX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO; 963 + per_qp_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP; 964 + break; 965 + case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES: 966 + fn_type = MLX5_FLOW_NAMESPACE_RDMA_TX_COUNTERS; 967 + priority = RDMA_TX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO; 968 + per_qp_type = MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP; 969 + break; 970 + case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS: 971 + fn_type = MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS; 972 + priority = 
RDMA_RX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO; 973 + per_qp_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP; 974 + break; 975 + case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES: 976 + fn_type = MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS; 977 + priority = RDMA_RX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO; 978 + per_qp_type = MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP; 979 + break; 980 + default: 981 + return -EINVAL; 982 + } 983 + 984 + ns = mlx5_get_flow_namespace(dev->mdev, fn_type); 985 + if (!ns) 986 + return -EOPNOTSUPP; 987 + 988 + prio = get_opfc_prio(dev, per_qp_type); 989 + if (prio->flow_table) 990 + return 0; 991 + 992 + prio = _get_prio(dev, ns, prio, priority, MLX5_FS_MAX_POOL_SIZE, 1, 0, 0); 993 + if (IS_ERR(prio)) 994 + return PTR_ERR(prio); 995 + 996 + prio->refcount = 1; 997 + 998 + return 0; 999 + } 1000 + 1001 + static struct mlx5_per_qp_opfc * 1002 + get_per_qp_opfc(struct mlx5_rdma_counter *mcounter, u32 qp_num, bool *new) 1003 + { 1004 + struct mlx5_per_qp_opfc *per_qp_opfc; 1005 + 1006 + *new = false; 1007 + 1008 + per_qp_opfc = xa_load(&mcounter->qpn_opfc_xa, qp_num); 1009 + if (per_qp_opfc) 1010 + return per_qp_opfc; 1011 + per_qp_opfc = kzalloc(sizeof(*per_qp_opfc), GFP_KERNEL); 1012 + 1013 + if (!per_qp_opfc) 1014 + return NULL; 1015 + 1016 + *new = true; 1017 + return per_qp_opfc; 1018 + } 1019 + 1020 + static int add_op_fc_rules(struct mlx5_ib_dev *dev, 1021 + struct mlx5_rdma_counter *mcounter, 1022 + struct mlx5_per_qp_opfc *per_qp_opfc, 1023 + struct mlx5_ib_flow_prio *prio, 1024 + enum mlx5_ib_optional_counter_type type, 1025 + u32 qp_num, u32 port_num) 1026 + { 1027 + struct mlx5_ib_op_fc *opfc = &per_qp_opfc->opfcs[type], *in_use_opfc; 1028 + struct mlx5_flow_act flow_act = {}; 1029 + struct mlx5_flow_destination dst; 1030 + struct mlx5_flow_spec *spec; 1031 + int i, err, spec_num; 1032 + bool is_tx; 1033 + 1034 + if (opfc->fc) 1035 + return -EEXIST; 1036 + 1037 + if (mlx5r_is_opfc_shared_and_in_use(per_qp_opfc->opfcs, type, 1038 + &in_use_opfc)) { 1039 + opfc->fc = in_use_opfc->fc; 1040 + opfc->rule[0] = in_use_opfc->rule[0]; 1041 + return 0; 1042 + } 1043 + 1044 + opfc->fc = mcounter->fc[type]; 1045 + 1046 + spec = kcalloc(MAX_OPFC_RULES, sizeof(*spec), GFP_KERNEL); 1047 + if (!spec) { 1048 + err = -ENOMEM; 1049 + goto null_fc; 1050 + } 1051 + 1052 + switch (type) { 1053 + case MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP: 1054 + if (set_ecn_ce_spec(dev, port_num, &spec[0], 1055 + MLX5_FS_IPV4_VERSION) || 1056 + set_ecn_ce_spec(dev, port_num, &spec[1], 1057 + MLX5_FS_IPV6_VERSION)) { 1058 + err = -EOPNOTSUPP; 1059 + goto free_spec; 1060 + } 1061 + spec_num = 2; 1062 + is_tx = false; 1063 + 1064 + MLX5_SET_TO_ONES(fte_match_param, spec[1].match_criteria, 1065 + misc_parameters.bth_dst_qp); 1066 + MLX5_SET(fte_match_param, spec[1].match_value, 1067 + misc_parameters.bth_dst_qp, qp_num); 1068 + spec[1].match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS; 1069 + break; 1070 + case MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS_PER_QP: 1071 + if (!MLX5_CAP_FLOWTABLE( 1072 + dev->mdev, 1073 + ft_field_support_2_nic_receive_rdma.bth_opcode) || 1074 + set_cnp_spec(dev, port_num, &spec[0])) { 1075 + err = -EOPNOTSUPP; 1076 + goto free_spec; 1077 + } 1078 + spec_num = 1; 1079 + is_tx = false; 1080 + break; 1081 + case MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS_PER_QP: 1082 + if (!MLX5_CAP_FLOWTABLE( 1083 + dev->mdev, 1084 + ft_field_support_2_nic_transmit_rdma.bth_opcode) || 1085 + set_cnp_spec(dev, port_num, &spec[0])) { 1086 + err = -EOPNOTSUPP; 1087 + goto free_spec; 1088 + } 1089 + spec_num = 1; 1090 + is_tx = true; 1091 + 
break; 1092 + case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP: 1093 + case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP: 1094 + spec_num = 1; 1095 + is_tx = true; 1096 + break; 1097 + case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP: 1098 + case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP: 1099 + spec_num = 1; 1100 + is_tx = false; 1101 + break; 1102 + default: 1103 + err = -EINVAL; 1104 + goto free_spec; 1105 + } 1106 + 1107 + if (is_tx) { 1108 + MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, 1109 + misc_parameters.source_sqn); 1110 + MLX5_SET(fte_match_param, spec->match_value, 1111 + misc_parameters.source_sqn, qp_num); 1112 + } else { 1113 + MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, 1114 + misc_parameters.bth_dst_qp); 1115 + MLX5_SET(fte_match_param, spec->match_value, 1116 + misc_parameters.bth_dst_qp, qp_num); 1117 + } 1118 + 1119 + spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS; 1120 + 1121 + dst.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; 1122 + dst.counter = opfc->fc; 1123 + 1124 + flow_act.action = 1125 + MLX5_FLOW_CONTEXT_ACTION_COUNT | MLX5_FLOW_CONTEXT_ACTION_ALLOW; 1126 + 1127 + for (i = 0; i < spec_num; i++) { 1128 + opfc->rule[i] = mlx5_add_flow_rules(prio->flow_table, &spec[i], 1129 + &flow_act, &dst, 1); 1130 + if (IS_ERR(opfc->rule[i])) { 1131 + err = PTR_ERR(opfc->rule[i]); 1132 + goto del_rules; 1133 + } 1134 + } 1135 + prio->refcount += spec_num; 1136 + 1137 + err = xa_err(xa_store(&mcounter->qpn_opfc_xa, qp_num, per_qp_opfc, 1138 + GFP_KERNEL)); 1139 + if (err) 1140 + goto del_rules; 1141 + 1142 + kfree(spec); 1143 + 1144 + return 0; 1145 + 1146 + del_rules: 1147 + while (i--) 1148 + mlx5_del_flow_rules(opfc->rule[i]); 1149 + put_flow_table(dev, prio, false); 1150 + free_spec: 1151 + kfree(spec); 1152 + null_fc: 1153 + opfc->fc = NULL; 1154 + return err; 1155 + } 1156 + 1157 + static bool is_fc_shared_and_in_use(struct mlx5_rdma_counter *mcounter, 1158 + u32 type, struct mlx5_fc **fc) 1159 + { 1160 + u32 shared_fc_type; 1161 + 1162 + switch (type) { 1163 + case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP: 1164 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP; 1165 + break; 1166 + case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP: 1167 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP; 1168 + break; 1169 + case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP: 1170 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP; 1171 + break; 1172 + case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP: 1173 + shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP; 1174 + break; 1175 + default: 1176 + return false; 1177 + } 1178 + 1179 + *fc = mcounter->fc[shared_fc_type]; 1180 + if (!(*fc)) 1181 + return false; 1182 + 1183 + return true; 1184 + } 1185 + 1186 + void mlx5r_fs_destroy_fcs(struct mlx5_ib_dev *dev, 1187 + struct rdma_counter *counter) 1188 + { 1189 + struct mlx5_rdma_counter *mcounter = to_mcounter(counter); 1190 + struct mlx5_fc *in_use_fc; 1191 + int i; 1192 + 1193 + for (i = MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP; 1194 + i <= MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP; i++) { 1195 + if (!mcounter->fc[i]) 1196 + continue; 1197 + 1198 + if (is_fc_shared_and_in_use(mcounter, i, &in_use_fc)) { 1199 + mcounter->fc[i] = NULL; 1200 + continue; 1201 + } 1202 + 1203 + mlx5_fc_destroy(dev->mdev, mcounter->fc[i]); 1204 + mcounter->fc[i] = NULL; 1205 + } 1206 + } 1207 + 884 1208 int mlx5_ib_fs_add_op_fc(struct mlx5_ib_dev *dev, u32 port_num, 885 1209 struct mlx5_ib_op_fc *opfc, 886 1210 enum mlx5_ib_optional_counter_type type) ··· 1273 921 priority = 
RDMA_TX_CNP_OPCOUNTER_PRIO; 1274 922 break; 1275 923 924 + case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS: 925 + case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES: 926 + spec_num = 1; 927 + fn_type = MLX5_FLOW_NAMESPACE_RDMA_TX_COUNTERS; 928 + priority = RDMA_TX_PKTS_BYTES_OPCOUNTER_PRIO; 929 + break; 930 + 931 + case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS: 932 + case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES: 933 + spec_num = 1; 934 + fn_type = MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS; 935 + priority = RDMA_RX_PKTS_BYTES_OPCOUNTER_PRIO; 936 + break; 937 + 1276 938 default: 1277 939 err = -EOPNOTSUPP; 1278 940 goto free; ··· 1298 932 goto free; 1299 933 } 1300 934 1301 - prio = &dev->flow_db->opfcs[type]; 935 + prio = get_opfc_prio(dev, type); 1302 936 if (!prio->flow_table) { 937 + err = get_per_qp_prio(dev, type); 938 + if (err) 939 + goto free; 940 + 1303 941 prio = _get_prio(dev, ns, prio, priority, 1304 - dev->num_ports * MAX_OPFC_RULES, 1, 0); 942 + dev->num_ports * MAX_OPFC_RULES, 1, 0, 0); 1305 943 if (IS_ERR(prio)) { 1306 944 err = PTR_ERR(prio); 1307 - goto free; 945 + goto put_prio; 1308 946 } 1309 947 } 1310 948 ··· 1335 965 for (i -= 1; i >= 0; i--) 1336 966 mlx5_del_flow_rules(opfc->rule[i]); 1337 967 put_flow_table(dev, prio, false); 968 + put_prio: 969 + put_per_qp_prio(dev, type); 1338 970 free: 1339 971 kfree(spec); 1340 972 return err; ··· 1346 974 struct mlx5_ib_op_fc *opfc, 1347 975 enum mlx5_ib_optional_counter_type type) 1348 976 { 977 + struct mlx5_ib_flow_prio *prio; 1349 978 int i; 979 + 980 + prio = get_opfc_prio(dev, type); 1350 981 1351 982 for (i = 0; i < MAX_OPFC_RULES && opfc->rule[i]; i++) { 1352 983 mlx5_del_flow_rules(opfc->rule[i]); 1353 - put_flow_table(dev, &dev->flow_db->opfcs[type], true); 984 + put_flow_table(dev, prio, true); 1354 985 } 986 + 987 + put_per_qp_prio(dev, type); 988 + } 989 + 990 + void mlx5r_fs_unbind_op_fc(struct ib_qp *qp, struct rdma_counter *counter) 991 + { 992 + struct mlx5_rdma_counter *mcounter = to_mcounter(counter); 993 + struct mlx5_ib_dev *dev = to_mdev(counter->device); 994 + struct mlx5_per_qp_opfc *per_qp_opfc; 995 + struct mlx5_ib_op_fc *in_use_opfc; 996 + struct mlx5_ib_flow_prio *prio; 997 + int i, j; 998 + 999 + per_qp_opfc = xa_load(&mcounter->qpn_opfc_xa, qp->qp_num); 1000 + if (!per_qp_opfc) 1001 + return; 1002 + 1003 + for (i = MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP; 1004 + i <= MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP; i++) { 1005 + if (!per_qp_opfc->opfcs[i].fc) 1006 + continue; 1007 + 1008 + if (mlx5r_is_opfc_shared_and_in_use(per_qp_opfc->opfcs, i, 1009 + &in_use_opfc)) { 1010 + per_qp_opfc->opfcs[i].fc = NULL; 1011 + continue; 1012 + } 1013 + 1014 + for (j = 0; j < MAX_OPFC_RULES; j++) { 1015 + if (!per_qp_opfc->opfcs[i].rule[j]) 1016 + continue; 1017 + mlx5_del_flow_rules(per_qp_opfc->opfcs[i].rule[j]); 1018 + prio = get_opfc_prio(dev, i); 1019 + put_flow_table(dev, prio, true); 1020 + } 1021 + per_qp_opfc->opfcs[i].fc = NULL; 1022 + } 1023 + 1024 + kfree(per_qp_opfc); 1025 + xa_erase(&mcounter->qpn_opfc_xa, qp->qp_num); 1026 + } 1027 + 1028 + int mlx5r_fs_bind_op_fc(struct ib_qp *qp, struct rdma_counter *counter, 1029 + u32 port) 1030 + { 1031 + struct mlx5_rdma_counter *mcounter = to_mcounter(counter); 1032 + struct mlx5_ib_dev *dev = to_mdev(qp->device); 1033 + struct mlx5_per_qp_opfc *per_qp_opfc; 1034 + struct mlx5_ib_flow_prio *prio; 1035 + struct mlx5_ib_counters *cnts; 1036 + struct mlx5_ib_op_fc *opfc; 1037 + struct mlx5_fc *in_use_fc; 1038 + int i, err, per_qp_type; 1039 + bool new; 1040 + 1041 + if (!counter->mode.bind_opcnt) 1042 + 
return 0; 1043 + 1044 + cnts = &dev->port[port - 1].cnts; 1045 + 1046 + for (i = 0; i <= MLX5_IB_OPCOUNTER_RDMA_RX_BYTES; i++) { 1047 + opfc = &cnts->opfcs[i]; 1048 + if (!opfc->fc) 1049 + continue; 1050 + 1051 + per_qp_type = i + MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP; 1052 + prio = get_opfc_prio(dev, per_qp_type); 1053 + WARN_ON(!prio->flow_table); 1054 + 1055 + if (is_fc_shared_and_in_use(mcounter, per_qp_type, &in_use_fc)) 1056 + mcounter->fc[per_qp_type] = in_use_fc; 1057 + 1058 + if (!mcounter->fc[per_qp_type]) { 1059 + mcounter->fc[per_qp_type] = mlx5_fc_create(dev->mdev, 1060 + false); 1061 + if (IS_ERR(mcounter->fc[per_qp_type])) 1062 + return PTR_ERR(mcounter->fc[per_qp_type]); 1063 + } 1064 + 1065 + per_qp_opfc = get_per_qp_opfc(mcounter, qp->qp_num, &new); 1066 + if (!per_qp_opfc) { 1067 + err = -ENOMEM; 1068 + goto free_fc; 1069 + } 1070 + err = add_op_fc_rules(dev, mcounter, per_qp_opfc, prio, 1071 + per_qp_type, qp->qp_num, port); 1072 + if (err) 1073 + goto del_rules; 1074 + } 1075 + 1076 + return 0; 1077 + 1078 + del_rules: 1079 + mlx5r_fs_unbind_op_fc(qp, counter); 1080 + if (new) 1081 + kfree(per_qp_opfc); 1082 + free_fc: 1083 + if (xa_empty(&mcounter->qpn_opfc_xa)) 1084 + mlx5r_fs_destroy_fcs(dev, counter); 1085 + return err; 1355 1086 } 1356 1087 1357 1088 static void set_underlay_qp(struct mlx5_ib_dev *dev, ··· 1888 1413 return ERR_PTR(err); 1889 1414 } 1890 1415 1416 + static int mlx5_ib_fill_transport_ns_info(struct mlx5_ib_dev *dev, 1417 + enum mlx5_flow_namespace_type type, 1418 + u32 *flags, u16 *vport_idx, 1419 + u16 *vport, 1420 + struct mlx5_core_dev **ft_mdev, 1421 + u32 ib_port) 1422 + { 1423 + struct mlx5_core_dev *esw_mdev; 1424 + 1425 + if (!is_mdev_switchdev_mode(dev->mdev)) 1426 + return 0; 1427 + 1428 + if (!MLX5_CAP_ADV_RDMA(dev->mdev, rdma_transport_manager)) 1429 + return -EOPNOTSUPP; 1430 + 1431 + if (!dev->port[ib_port - 1].rep) 1432 + return -EINVAL; 1433 + 1434 + esw_mdev = mlx5_eswitch_get_core_dev(dev->port[ib_port - 1].rep->esw); 1435 + if (esw_mdev != dev->mdev) 1436 + return -EOPNOTSUPP; 1437 + 1438 + *flags |= MLX5_FLOW_TABLE_OTHER_VPORT; 1439 + *ft_mdev = esw_mdev; 1440 + *vport = dev->port[ib_port - 1].rep->vport; 1441 + *vport_idx = dev->port[ib_port - 1].rep->vport_index; 1442 + 1443 + return 0; 1444 + } 1445 + 1891 1446 static struct mlx5_ib_flow_prio * 1892 1447 _get_flow_table(struct mlx5_ib_dev *dev, u16 user_priority, 1893 1448 enum mlx5_flow_namespace_type ns_type, 1894 - bool mcast) 1449 + bool mcast, u32 ib_port) 1895 1450 { 1451 + struct mlx5_core_dev *ft_mdev = dev->mdev; 1896 1452 struct mlx5_flow_namespace *ns = NULL; 1897 1453 struct mlx5_ib_flow_prio *prio = NULL; 1898 1454 int max_table_size = 0; 1455 + u16 vport_idx = 0; 1899 1456 bool esw_encap; 1900 1457 u32 flags = 0; 1458 + u16 vport = 0; 1901 1459 int priority; 1460 + int ret; 1902 1461 1903 1462 if (mcast) 1904 1463 priority = MLX5_IB_FLOW_MCAST_PRIO; ··· 1980 1471 MLX5_CAP_FLOWTABLE_RDMA_TX(dev->mdev, log_max_ft_size)); 1981 1472 priority = user_priority; 1982 1473 break; 1474 + case MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX: 1475 + case MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_TX: 1476 + if (ib_port == 0 || user_priority > MLX5_RDMA_TRANSPORT_BYPASS_PRIO) 1477 + return ERR_PTR(-EINVAL); 1478 + ret = mlx5_ib_fill_transport_ns_info(dev, ns_type, &flags, 1479 + &vport_idx, &vport, 1480 + &ft_mdev, ib_port); 1481 + if (ret) 1482 + return ERR_PTR(ret); 1483 + 1484 + if (ns_type == MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX) 1485 + max_table_size = 1486 + 
BIT(MLX5_CAP_FLOWTABLE_RDMA_TRANSPORT_RX( 1487 + ft_mdev, log_max_ft_size)); 1488 + else 1489 + max_table_size = 1490 + BIT(MLX5_CAP_FLOWTABLE_RDMA_TRANSPORT_TX( 1491 + ft_mdev, log_max_ft_size)); 1492 + priority = user_priority; 1493 + break; 1983 1494 default: 1984 1495 break; 1985 1496 } 1986 1497 1987 1498 max_table_size = min_t(int, max_table_size, MLX5_FS_MAX_ENTRIES); 1988 1499 1989 - ns = mlx5_get_flow_namespace(dev->mdev, ns_type); 1500 + if (ns_type == MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX || 1501 + ns_type == MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_TX) 1502 + ns = mlx5_get_flow_vport_namespace(ft_mdev, ns_type, vport_idx); 1503 + else 1504 + ns = mlx5_get_flow_namespace(ft_mdev, ns_type); 1505 + 1990 1506 if (!ns) 1991 1507 return ERR_PTR(-EOPNOTSUPP); 1992 1508 ··· 2031 1497 case MLX5_FLOW_NAMESPACE_RDMA_TX: 2032 1498 prio = &dev->flow_db->rdma_tx[priority]; 2033 1499 break; 1500 + case MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX: 1501 + prio = &dev->flow_db->rdma_transport_rx[ib_port - 1]; 1502 + break; 1503 + case MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_TX: 1504 + prio = &dev->flow_db->rdma_transport_tx[ib_port - 1]; 1505 + break; 2034 1506 default: return ERR_PTR(-EINVAL); 2035 1507 } 2036 1508 ··· 2047 1507 return prio; 2048 1508 2049 1509 return _get_prio(dev, ns, prio, priority, max_table_size, 2050 - MLX5_FS_MAX_TYPES, flags); 1510 + MLX5_FS_MAX_TYPES, flags, vport); 2051 1511 } 2052 1512 2053 1513 static struct mlx5_ib_flow_handler * ··· 2166 1626 mutex_lock(&dev->flow_db->lock); 2167 1627 2168 1628 ft_prio = _get_flow_table(dev, fs_matcher->priority, 2169 - fs_matcher->ns_type, mcast); 1629 + fs_matcher->ns_type, mcast, 1630 + fs_matcher->ib_port); 2170 1631 if (IS_ERR(ft_prio)) { 2171 1632 err = PTR_ERR(ft_prio); 2172 1633 goto unlock; ··· 2283 1742 case MLX5_IB_UAPI_FLOW_TABLE_TYPE_RDMA_TX: 2284 1743 *namespace = MLX5_FLOW_NAMESPACE_RDMA_TX; 2285 1744 break; 1745 + case MLX5_IB_UAPI_FLOW_TABLE_TYPE_RDMA_TRANSPORT_RX: 1746 + *namespace = MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX; 1747 + break; 1748 + case MLX5_IB_UAPI_FLOW_TABLE_TYPE_RDMA_TRANSPORT_TX: 1749 + *namespace = MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_TX; 1750 + break; 2286 1751 default: 2287 1752 return -EINVAL; 2288 1753 } ··· 2378 1831 return -EINVAL; 2379 1832 2380 1833 /* Allow only DEVX object or QP as dest when inserting to RDMA_RX */ 2381 - if ((fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_RX) && 1834 + if ((fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_RX || 1835 + fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX) && 2382 1836 ((!dest_devx && !dest_qp) || (dest_devx && dest_qp))) 2383 1837 return -EINVAL; 2384 1838 ··· 2396 1848 return -EINVAL; 2397 1849 /* Allow only flow table as dest when inserting to FDB or RDMA_RX */ 2398 1850 if ((fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_FDB_BYPASS || 2399 - fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_RX) && 1851 + fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_RX || 1852 + fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX) && 2400 1853 *dest_type != MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE) 2401 1854 return -EINVAL; 2402 1855 } else if (dest_qp) { ··· 2418 1869 *dest_id = mqp->raw_packet_qp.rq.tirn; 2419 1870 *dest_type = MLX5_FLOW_DESTINATION_TYPE_TIR; 2420 1871 } else if ((fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_EGRESS || 2421 - fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_TX) && 1872 + fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_TX || 1873 + fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_TX) && 2422 1874 !(*flags & 
MLX5_IB_ATTR_CREATE_FLOW_FLAGS_DROP)) { 2423 1875 *dest_type = MLX5_FLOW_DESTINATION_TYPE_PORT; 2424 1876 } 2425 1877 2426 1878 if (*dest_type == MLX5_FLOW_DESTINATION_TYPE_TIR && 2427 1879 (fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_EGRESS || 2428 - fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_TX)) 1880 + fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_TX || 1881 + fs_matcher->ns_type == MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_TX)) 2429 1882 return -EINVAL; 2430 1883 2431 1884 return 0; ··· 2904 2353 return 0; 2905 2354 } 2906 2355 2356 + static bool verify_context_caps(struct mlx5_ib_dev *dev, u64 enabled_caps) 2357 + { 2358 + if (is_mdev_switchdev_mode(dev->mdev)) 2359 + return UCAP_ENABLED(enabled_caps, 2360 + RDMA_UCAP_MLX5_CTRL_OTHER_VHCA); 2361 + 2362 + return UCAP_ENABLED(enabled_caps, RDMA_UCAP_MLX5_CTRL_LOCAL); 2363 + } 2364 + 2907 2365 static int UVERBS_HANDLER(MLX5_IB_METHOD_FLOW_MATCHER_CREATE)( 2908 2366 struct uverbs_attr_bundle *attrs) 2909 2367 { ··· 2959 2399 mlx5_eswitch_mode(dev->mdev) != MLX5_ESWITCH_OFFLOADS) { 2960 2400 err = -EINVAL; 2961 2401 goto end; 2402 + } 2403 + 2404 + if (uverbs_attr_is_valid(attrs, MLX5_IB_ATTR_FLOW_MATCHER_IB_PORT)) { 2405 + err = uverbs_copy_from(&obj->ib_port, attrs, 2406 + MLX5_IB_ATTR_FLOW_MATCHER_IB_PORT); 2407 + if (err) 2408 + goto end; 2409 + if (!rdma_is_port_valid(&dev->ib_dev, obj->ib_port)) { 2410 + err = -EINVAL; 2411 + goto end; 2412 + } 2413 + if (obj->ns_type != MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX && 2414 + obj->ns_type != MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_TX) { 2415 + err = -EINVAL; 2416 + goto end; 2417 + } 2418 + if (!verify_context_caps(dev, uobj->context->enabled_caps)) { 2419 + err = -EOPNOTSUPP; 2420 + goto end; 2421 + } 2962 2422 } 2963 2423 2964 2424 uobj->object = obj; ··· 3028 2448 3029 2449 mutex_lock(&dev->flow_db->lock); 3030 2450 3031 - ft_prio = _get_flow_table(dev, priority, ns_type, 0); 2451 + ft_prio = _get_flow_table(dev, priority, ns_type, 0, 0); 3032 2452 if (IS_ERR(ft_prio)) { 3033 2453 err = PTR_ERR(ft_prio); 3034 2454 goto free_obj; ··· 3414 2834 UA_OPTIONAL), 3415 2835 UVERBS_ATTR_CONST_IN(MLX5_IB_ATTR_FLOW_MATCHER_FT_TYPE, 3416 2836 enum mlx5_ib_uapi_flow_table_type, 3417 - UA_OPTIONAL)); 2837 + UA_OPTIONAL), 2838 + UVERBS_ATTR_PTR_IN(MLX5_IB_ATTR_FLOW_MATCHER_IB_PORT, 2839 + UVERBS_ATTR_TYPE(u32), 2840 + UA_OPTIONAL)); 3418 2841 3419 2842 DECLARE_UVERBS_NAMED_METHOD_DESTROY( 3420 2843 MLX5_IB_METHOD_FLOW_MATCHER_DESTROY, ··· 3461 2878 &UVERBS_METHOD(MLX5_IB_METHOD_STEERING_ANCHOR_DESTROY)); 3462 2879 3463 2880 const struct uapi_definition mlx5_ib_flow_defs[] = { 2881 + #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS) 3464 2882 UAPI_DEF_CHAIN_OBJ_TREE_NAMED( 3465 2883 MLX5_IB_OBJECT_FLOW_MATCHER), 3466 2884 UAPI_DEF_CHAIN_OBJ_TREE( ··· 3472 2888 UAPI_DEF_CHAIN_OBJ_TREE_NAMED( 3473 2889 MLX5_IB_OBJECT_STEERING_ANCHOR, 3474 2890 UAPI_DEF_IS_OBJ_SUPPORTED(mlx5_ib_shared_ft_allowed)), 2891 + #endif 3475 2892 {}, 3476 2893 }; 3477 2894 ··· 3489 2904 if (!dev->flow_db) 3490 2905 return -ENOMEM; 3491 2906 2907 + dev->flow_db->rdma_transport_rx = kcalloc(dev->num_ports, 2908 + sizeof(struct mlx5_ib_flow_prio), 2909 + GFP_KERNEL); 2910 + if (!dev->flow_db->rdma_transport_rx) 2911 + goto free_flow_db; 2912 + 2913 + dev->flow_db->rdma_transport_tx = kcalloc(dev->num_ports, 2914 + sizeof(struct mlx5_ib_flow_prio), 2915 + GFP_KERNEL); 2916 + if (!dev->flow_db->rdma_transport_tx) 2917 + goto free_rdma_transport_rx; 2918 + 3492 2919 mutex_init(&dev->flow_db->lock); 3493 2920 3494 2921 
ib_set_device_ops(&dev->ib_dev, &flow_ops); 3495 2922 return 0; 2923 + 2924 + free_rdma_transport_rx: 2925 + kfree(dev->flow_db->rdma_transport_rx); 2926 + free_flow_db: 2927 + kfree(dev->flow_db); 2928 + return -ENOMEM; 3496 2929 }
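The RDMA_TRANSPORT additions above make mlx5_ib_fs_init() size the flow-db at runtime, one mlx5_ib_flow_prio slot per port and direction, with a goto-based unwind when the second kcalloc fails. A minimal standalone C sketch of that allocate-and-unwind idiom (the struct and function names here are illustrative, not the driver's):

#include <stdlib.h>

struct prio { int dummy; };
struct flow_db { struct prio *rx; struct prio *tx; };

/* Each successful allocation adds exactly one label to the failure
 * path, so cleanup always runs in reverse order of construction. */
static struct flow_db *flow_db_alloc(unsigned int num_ports)
{
	struct flow_db *db = calloc(1, sizeof(*db));

	if (!db)
		return NULL;
	db->rx = calloc(num_ports, sizeof(*db->rx));
	if (!db->rx)
		goto free_db;
	db->tx = calloc(num_ports, sizeof(*db->tx));
	if (!db->tx)
		goto free_rx;
	return db;

free_rx:
	free(db->rx);
free_db:
	free(db);
	return NULL;
}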
+2 -15
drivers/infiniband/hw/mlx5/fs.h
··· 8 8 9 9 #include "mlx5_ib.h" 10 10 11 - #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS) 12 11 int mlx5_ib_fs_init(struct mlx5_ib_dev *dev); 13 12 void mlx5_ib_fs_cleanup_anchor(struct mlx5_ib_dev *dev); 14 - #else 15 - static inline int mlx5_ib_fs_init(struct mlx5_ib_dev *dev) 16 - { 17 - dev->flow_db = kzalloc(sizeof(*dev->flow_db), GFP_KERNEL); 18 - 19 - if (!dev->flow_db) 20 - return -ENOMEM; 21 - 22 - mutex_init(&dev->flow_db->lock); 23 - return 0; 24 - } 25 - 26 - inline void mlx5_ib_fs_cleanup_anchor(struct mlx5_ib_dev *dev) {} 27 - #endif 28 13 29 14 static inline void mlx5_ib_fs_cleanup(struct mlx5_ib_dev *dev) 30 15 { ··· 25 40 * is a safe assumption that all references are gone. 26 41 */ 27 42 mlx5_ib_fs_cleanup_anchor(dev); 43 + kfree(dev->flow_db->rdma_transport_tx); 44 + kfree(dev->flow_db->rdma_transport_rx); 28 45 kfree(dev->flow_db); 29 46 } 30 47 #endif /* _MLX5_IB_FS_H */
+73 -4
drivers/infiniband/hw/mlx5/main.c
··· 47 47 #include <rdma/uverbs_ioctl.h> 48 48 #include <rdma/mlx5_user_ioctl_verbs.h> 49 49 #include <rdma/mlx5_user_ioctl_cmds.h> 50 + #include <rdma/ib_ucaps.h> 50 51 #include "macsec.h" 51 52 #include "data_direct.h" 52 53 ··· 1935 1934 return 0; 1936 1935 } 1937 1936 1937 + static bool uctx_rdma_ctrl_is_enabled(u64 enabled_caps) 1938 + { 1939 + return UCAP_ENABLED(enabled_caps, RDMA_UCAP_MLX5_CTRL_LOCAL) || 1940 + UCAP_ENABLED(enabled_caps, RDMA_UCAP_MLX5_CTRL_OTHER_VHCA); 1941 + } 1942 + 1938 1943 static int mlx5_ib_alloc_ucontext(struct ib_ucontext *uctx, 1939 1944 struct ib_udata *udata) 1940 1945 { ··· 1983 1976 return -EINVAL; 1984 1977 1985 1978 if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX) { 1986 - err = mlx5_ib_devx_create(dev, true); 1979 + err = mlx5_ib_devx_create(dev, true, uctx->enabled_caps); 1987 1980 if (err < 0) 1988 1981 goto out_ctx; 1989 1982 context->devx_uid = err; 1983 + 1984 + if (uctx_rdma_ctrl_is_enabled(uctx->enabled_caps)) { 1985 + err = mlx5_cmd_add_privileged_uid(dev->mdev, 1986 + context->devx_uid); 1987 + if (err) 1988 + goto out_devx; 1989 + } 1990 1990 } 1991 1991 1992 1992 lib_uar_4k = req.lib_caps & MLX5_LIB_CAP_4K_UAR; ··· 2008 1994 /* updates req->total_num_bfregs */ 2009 1995 err = calc_total_bfregs(dev, lib_uar_4k, &req, bfregi); 2010 1996 if (err) 2011 - goto out_devx; 1997 + goto out_ucap; 2012 1998 2013 1999 mutex_init(&bfregi->lock); 2014 2000 bfregi->lib_uar_4k = lib_uar_4k; ··· 2016 2002 GFP_KERNEL); 2017 2003 if (!bfregi->count) { 2018 2004 err = -ENOMEM; 2019 - goto out_devx; 2005 + goto out_ucap; 2020 2006 } 2021 2007 2022 2008 bfregi->sys_pages = kcalloc(bfregi->num_sys_pages, ··· 2080 2066 out_count: 2081 2067 kfree(bfregi->count); 2082 2068 2069 + out_ucap: 2070 + if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX && 2071 + uctx_rdma_ctrl_is_enabled(uctx->enabled_caps)) 2072 + mlx5_cmd_remove_privileged_uid(dev->mdev, context->devx_uid); 2073 + 2083 2074 out_devx: 2084 2075 if (req.flags & MLX5_IB_ALLOC_UCTX_DEVX) 2085 2076 mlx5_ib_devx_destroy(dev, context->devx_uid); ··· 2129 2110 kfree(bfregi->sys_pages); 2130 2111 kfree(bfregi->count); 2131 2112 2132 - if (context->devx_uid) 2113 + if (context->devx_uid) { 2114 + if (uctx_rdma_ctrl_is_enabled(ibcontext->enabled_caps)) 2115 + mlx5_cmd_remove_privileged_uid(dev->mdev, 2116 + context->devx_uid); 2133 2117 mlx5_ib_devx_destroy(dev, context->devx_uid); 2118 + } 2134 2119 } 2135 2120 2136 2121 static phys_addr_t uar_index2pfn(struct mlx5_ib_dev *dev, ··· 4224 4201 return (var_table->bitmap) ? 
0 : -ENOMEM; 4225 4202 } 4226 4203 4204 + static void mlx5_ib_cleanup_ucaps(struct mlx5_ib_dev *dev) 4205 + { 4206 + if (MLX5_CAP_GEN(dev->mdev, uctx_cap) & MLX5_UCTX_CAP_RDMA_CTRL) 4207 + ib_remove_ucap(RDMA_UCAP_MLX5_CTRL_LOCAL); 4208 + 4209 + if (MLX5_CAP_GEN(dev->mdev, uctx_cap) & 4210 + MLX5_UCTX_CAP_RDMA_CTRL_OTHER_VHCA) 4211 + ib_remove_ucap(RDMA_UCAP_MLX5_CTRL_OTHER_VHCA); 4212 + } 4213 + 4214 + static int mlx5_ib_init_ucaps(struct mlx5_ib_dev *dev) 4215 + { 4216 + int ret; 4217 + 4218 + if (MLX5_CAP_GEN(dev->mdev, uctx_cap) & MLX5_UCTX_CAP_RDMA_CTRL) { 4219 + ret = ib_create_ucap(RDMA_UCAP_MLX5_CTRL_LOCAL); 4220 + if (ret) 4221 + return ret; 4222 + } 4223 + 4224 + if (MLX5_CAP_GEN(dev->mdev, uctx_cap) & 4225 + MLX5_UCTX_CAP_RDMA_CTRL_OTHER_VHCA) { 4226 + ret = ib_create_ucap(RDMA_UCAP_MLX5_CTRL_OTHER_VHCA); 4227 + if (ret) 4228 + goto remove_local; 4229 + } 4230 + 4231 + return 0; 4232 + 4233 + remove_local: 4234 + if (MLX5_CAP_GEN(dev->mdev, uctx_cap) & MLX5_UCTX_CAP_RDMA_CTRL) 4235 + ib_remove_ucap(RDMA_UCAP_MLX5_CTRL_LOCAL); 4236 + return ret; 4237 + } 4238 + 4227 4239 static void mlx5_ib_stage_caps_cleanup(struct mlx5_ib_dev *dev) 4228 4240 { 4241 + if (MLX5_CAP_GEN_2_64(dev->mdev, general_obj_types_127_64) & 4242 + MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL) 4243 + mlx5_ib_cleanup_ucaps(dev); 4244 + 4229 4245 bitmap_free(dev->var_table.bitmap); 4230 4246 } 4231 4247 ··· 4311 4249 if (MLX5_CAP_GEN_64(dev->mdev, general_obj_types) & 4312 4250 MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) { 4313 4251 err = mlx5_ib_init_var_table(dev); 4252 + if (err) 4253 + return err; 4254 + } 4255 + 4256 + if (MLX5_CAP_GEN_2_64(dev->mdev, general_obj_types_127_64) & 4257 + MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL) { 4258 + err = mlx5_ib_init_ucaps(dev); 4314 4259 if (err) 4315 4260 return err; 4316 4261 }
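Both uctx_rdma_ctrl_is_enabled() above and verify_context_caps() in fs.c gate on bits of the ucontext's enabled_caps word via UCAP_ENABLED() from the new include/rdma/ib_ucaps.h. A standalone sketch of that bit test (the kernel macro shifts 1U; 1ULL is used here only so the toy mask can be declared uint64_t):

#include <stdio.h>
#include <stdint.h>

#define UCAP_ENABLED(ucaps, type) (!!((ucaps) & (1ULL << (type))))

enum { CTRL_LOCAL, CTRL_OTHER_VHCA };	/* mirrors rdma_user_cap order */

int main(void)
{
	uint64_t caps = 1ULL << CTRL_OTHER_VHCA;

	/* prints "local=0 other_vhca=1" */
	printf("local=%d other_vhca=%d\n",
	       UCAP_ENABLED(caps, CTRL_LOCAL),
	       UCAP_ENABLED(caps, CTRL_OTHER_VHCA));
	return 0;
}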
+23
drivers/infiniband/hw/mlx5/mlx5_ib.h
··· 276 276 struct mlx5_core_dev *mdev; 277 277 atomic_t usecnt; 278 278 u8 match_criteria_enable; 279 + u32 ib_port; 279 280 }; 280 281 281 282 struct mlx5_ib_steering_anchor { ··· 294 293 MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS, 295 294 MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS, 296 295 MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS, 296 + MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS, 297 + MLX5_IB_OPCOUNTER_RDMA_TX_BYTES, 298 + MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS, 299 + MLX5_IB_OPCOUNTER_RDMA_RX_BYTES, 300 + 301 + MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP, 302 + MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS_PER_QP, 303 + MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS_PER_QP, 304 + MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP, 305 + MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP, 306 + MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP, 307 + MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP, 297 308 298 309 MLX5_IB_OPCOUNTER_MAX, 299 310 }; ··· 320 307 struct mlx5_ib_flow_prio rdma_tx[MLX5_IB_NUM_FLOW_FT]; 321 308 struct mlx5_ib_flow_prio opfcs[MLX5_IB_OPCOUNTER_MAX]; 322 309 struct mlx5_flow_table *lag_demux_ft; 310 + struct mlx5_ib_flow_prio *rdma_transport_rx; 311 + struct mlx5_ib_flow_prio *rdma_transport_tx; 323 312 /* Protect flow steering bypass flow tables 324 313 * when add/del flow rules. 325 314 * only single add/removal of flow steering rule could be done ··· 897 882 void mlx5_ib_fs_remove_op_fc(struct mlx5_ib_dev *dev, 898 883 struct mlx5_ib_op_fc *opfc, 899 884 enum mlx5_ib_optional_counter_type type); 885 + 886 + int mlx5r_fs_bind_op_fc(struct ib_qp *qp, struct rdma_counter *counter, 887 + u32 port); 888 + 889 + void mlx5r_fs_unbind_op_fc(struct ib_qp *qp, struct rdma_counter *counter); 890 + 891 + void mlx5r_fs_destroy_fcs(struct mlx5_ib_dev *dev, 892 + struct rdma_counter *counter); 900 893 901 894 struct mlx5_ib_multiport_info; 902 895
+32 -20
drivers/infiniband/hw/mlx5/mr.c
··· 56 56 create_mkey_callback(int status, struct mlx5_async_work *context); 57 57 static struct mlx5_ib_mr *reg_create(struct ib_pd *pd, struct ib_umem *umem, 58 58 u64 iova, int access_flags, 59 - unsigned int page_size, bool populate, 59 + unsigned long page_size, bool populate, 60 60 int access_mode); 61 61 static int __mlx5_ib_dereg_mr(struct ib_mr *ibmr); 62 62 ··· 718 718 } 719 719 720 720 static struct mlx5_ib_mr *_mlx5_mr_cache_alloc(struct mlx5_ib_dev *dev, 721 - struct mlx5_cache_ent *ent, 722 - int access_flags) 721 + struct mlx5_cache_ent *ent) 723 722 { 724 723 struct mlx5_ib_mr *mr; 725 724 int err; ··· 793 794 if (!ent) 794 795 return ERR_PTR(-EOPNOTSUPP); 795 796 796 - return _mlx5_mr_cache_alloc(dev, ent, access_flags); 797 + return _mlx5_mr_cache_alloc(dev, ent); 797 798 } 798 799 799 800 static void mlx5_mkey_cache_debugfs_cleanup(struct mlx5_ib_dev *dev) ··· 918 919 return ERR_PTR(ret); 919 920 } 920 921 922 + static void mlx5r_destroy_cache_entries(struct mlx5_ib_dev *dev) 923 + { 924 + struct rb_root *root = &dev->cache.rb_root; 925 + struct mlx5_cache_ent *ent; 926 + struct rb_node *node; 927 + 928 + mutex_lock(&dev->cache.rb_lock); 929 + node = rb_first(root); 930 + while (node) { 931 + ent = rb_entry(node, struct mlx5_cache_ent, node); 932 + node = rb_next(node); 933 + clean_keys(dev, ent); 934 + rb_erase(&ent->node, root); 935 + mlx5r_mkeys_uninit(ent); 936 + kfree(ent); 937 + } 938 + mutex_unlock(&dev->cache.rb_lock); 939 + } 940 + 921 941 int mlx5_mkey_cache_init(struct mlx5_ib_dev *dev) 922 942 { 923 943 struct mlx5_mkey_cache *cache = &dev->cache; ··· 988 970 err: 989 971 mutex_unlock(&cache->rb_lock); 990 972 mlx5_mkey_cache_debugfs_cleanup(dev); 973 + mlx5r_destroy_cache_entries(dev); 974 + destroy_workqueue(cache->wq); 991 975 mlx5_ib_warn(dev, "failed to create mkey cache entry\n"); 992 976 return ret; 993 977 } ··· 1023 1003 mlx5_cmd_cleanup_async_ctx(&dev->async_ctx); 1024 1004 1025 1005 /* At this point all entries are disabled and have no concurrent work. 
*/ 1026 - mutex_lock(&dev->cache.rb_lock); 1027 - node = rb_first(root); 1028 - while (node) { 1029 - ent = rb_entry(node, struct mlx5_cache_ent, node); 1030 - node = rb_next(node); 1031 - clean_keys(dev, ent); 1032 - rb_erase(&ent->node, root); 1033 - mlx5r_mkeys_uninit(ent); 1034 - kfree(ent); 1035 - } 1036 - mutex_unlock(&dev->cache.rb_lock); 1006 + mlx5r_destroy_cache_entries(dev); 1037 1007 1038 1008 destroy_workqueue(dev->cache.wq); 1039 1009 del_timer_sync(&dev->delay_timer); ··· 1125 1115 struct mlx5r_cache_rb_key rb_key = {}; 1126 1116 struct mlx5_cache_ent *ent; 1127 1117 struct mlx5_ib_mr *mr; 1128 - unsigned int page_size; 1118 + unsigned long page_size; 1129 1119 1130 1120 if (umem->is_dmabuf) 1131 1121 page_size = mlx5_umem_dmabuf_default_pgsz(umem, iova); ··· 1154 1144 return mr; 1155 1145 } 1156 1146 1157 - mr = _mlx5_mr_cache_alloc(dev, ent, access_flags); 1147 + mr = _mlx5_mr_cache_alloc(dev, ent); 1158 1148 if (IS_ERR(mr)) 1159 1149 return mr; 1160 1150 ··· 1229 1219 */ 1230 1220 static struct mlx5_ib_mr *reg_create(struct ib_pd *pd, struct ib_umem *umem, 1231 1221 u64 iova, int access_flags, 1232 - unsigned int page_size, bool populate, 1222 + unsigned long page_size, bool populate, 1233 1223 int access_mode) 1234 1224 { 1235 1225 struct mlx5_ib_dev *dev = to_mdev(pd->device); ··· 1435 1425 mr = alloc_cacheable_mr(pd, umem, iova, access_flags, 1436 1426 MLX5_MKC_ACCESS_MODE_MTT); 1437 1427 } else { 1438 - unsigned int page_size = 1428 + unsigned long page_size = 1439 1429 mlx5_umem_mkc_find_best_pgsz(dev, umem, iova); 1440 1430 1441 1431 mutex_lock(&dev->slow_path_mutex); ··· 1967 1957 1968 1958 if (mr->mmkey.cache_ent) { 1969 1959 spin_lock_irq(&mr->mmkey.cache_ent->mkeys_queue.lock); 1970 - mr->mmkey.cache_ent->in_use--; 1971 1960 goto end; 1972 1961 } 1973 1962 ··· 2034 2025 bool is_odp = is_odp_mr(mr); 2035 2026 bool is_odp_dma_buf = is_dmabuf_mr(mr) && 2036 2027 !to_ib_umem_dmabuf(mr->umem)->pinned; 2028 + bool from_cache = !!ent; 2037 2029 int ret = 0; 2038 2030 2039 2031 if (is_odp) ··· 2047 2037 ent = mr->mmkey.cache_ent; 2048 2038 /* upon storing to a clean temp entry - schedule its cleanup */ 2049 2039 spin_lock_irq(&ent->mkeys_queue.lock); 2040 + if (from_cache) 2041 + ent->in_use--; 2050 2042 if (ent->is_tmp && !ent->tmp_cleanup_scheduled) { 2051 2043 mod_delayed_work(ent->dev->cache.wq, &ent->dwork, 2052 2044 msecs_to_jiffies(30 * 1000));
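The page_size type change from unsigned int to unsigned long is the heart of the overflow fix: once a single mkey page can legitimately be 2^32 bytes or larger, a 32-bit variable silently keeps only the low 32 bits. A standalone demonstration of the truncation (assumes an LP64 host where unsigned long is 64-bit, as in the kernel builds this targets; the 16 GiB value is illustrative):

#include <stdio.h>

int main(void)
{
	unsigned long page_size = 1UL << 34;		  /* 16 GiB page */
	unsigned int truncated = (unsigned int)page_size; /* low 32 bits: 0 */

	printf("unsigned long: %lu\n", page_size);
	printf("unsigned int:  %u\n", truncated);
	return 0;
}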
+6 -4
drivers/infiniband/hw/mlx5/odp.c
··· 309 309 blk_start_idx = idx; 310 310 in_block = 1; 311 311 } 312 - 313 - /* Count page invalidations */ 314 - invalidations += idx - blk_start_idx + 1; 315 312 } else { 316 313 u64 umr_offset = idx & umr_block_mask; 317 314 ··· 318 321 MLX5_IB_UPD_XLT_ZAP | 319 322 MLX5_IB_UPD_XLT_ATOMIC); 320 323 in_block = 0; 324 + /* Count page invalidations */ 325 + invalidations += idx - blk_start_idx + 1; 321 326 } 322 327 } 323 328 } 324 - if (in_block) 329 + if (in_block) { 325 330 mlx5r_umr_update_xlt(mr, blk_start_idx, 326 331 idx - blk_start_idx + 1, 0, 327 332 MLX5_IB_UPD_XLT_ZAP | 328 333 MLX5_IB_UPD_XLT_ATOMIC); 334 + /* Count page invalidations */ 335 + invalidations += idx - blk_start_idx + 1; 336 + } 329 337 330 338 mlx5_update_odp_stats_with_handled(mr, invalidations, invalidations); 331 339
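The invalidation-count fix above moves the accounting out of the per-page accumulation branch and into the two places where a contiguous block is actually flushed, so each invalidated page is counted exactly once, including the trailing block handled after the loop. A standalone sketch of the corrected run-length pattern:

#include <stdio.h>

/* Account for a run only when it is flushed: once inside the loop when
 * the run ends, and once after the loop for a trailing run. */
static int count_flushed(const int *hit, int n)
{
	int start = 0, in_run = 0, total = 0;

	for (int i = 0; i < n; i++) {
		if (hit[i]) {
			if (!in_run) {
				start = i;
				in_run = 1;
			}
		} else if (in_run) {
			total += i - start;	/* run covered [start, i) */
			in_run = 0;
		}
	}
	if (in_run)
		total += n - start;		/* trailing run */
	return total;
}

int main(void)
{
	int hit[] = { 1, 1, 0, 1, 1, 1 };

	printf("%d\n", count_flushed(hit, 6));	/* prints 5 */
	return 0;
}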
-28
drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c
··· 237 237 return IB_LINK_LAYER_ETHERNET; 238 238 } 239 239 240 - int pvrdma_modify_device(struct ib_device *ibdev, int mask, 241 - struct ib_device_modify *props) 242 - { 243 - unsigned long flags; 244 - 245 - if (mask & ~(IB_DEVICE_MODIFY_SYS_IMAGE_GUID | 246 - IB_DEVICE_MODIFY_NODE_DESC)) { 247 - dev_warn(&to_vdev(ibdev)->pdev->dev, 248 - "unsupported device modify mask %#x\n", mask); 249 - return -EOPNOTSUPP; 250 - } 251 - 252 - if (mask & IB_DEVICE_MODIFY_NODE_DESC) { 253 - spin_lock_irqsave(&to_vdev(ibdev)->desc_lock, flags); 254 - memcpy(ibdev->node_desc, props->node_desc, 64); 255 - spin_unlock_irqrestore(&to_vdev(ibdev)->desc_lock, flags); 256 - } 257 - 258 - if (mask & IB_DEVICE_MODIFY_SYS_IMAGE_GUID) { 259 - mutex_lock(&to_vdev(ibdev)->port_mutex); 260 - to_vdev(ibdev)->sys_image_guid = 261 - cpu_to_be64(props->sys_image_guid); 262 - mutex_unlock(&to_vdev(ibdev)->port_mutex); 263 - } 264 - 265 - return 0; 266 - } 267 - 268 240 /** 269 241 * pvrdma_modify_port - modify device port attributes 270 242 * @ibdev: the device to modify
-2
drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
··· 356 356 u16 index, u16 *pkey); 357 357 enum rdma_link_layer pvrdma_port_link_layer(struct ib_device *ibdev, 358 358 u32 port); 359 - int pvrdma_modify_device(struct ib_device *ibdev, int mask, 360 - struct ib_device_modify *props); 361 359 int pvrdma_modify_port(struct ib_device *ibdev, u32 port, 362 360 int mask, struct ib_port_modify *props); 363 361 int pvrdma_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
+1 -2
drivers/infiniband/sw/rxe/Kconfig
··· 4 4 depends on INET && PCI && INFINIBAND 5 5 depends on INFINIBAND_VIRT_DMA 6 6 select NET_UDP_TUNNEL 7 - select CRYPTO 8 - select CRYPTO_CRC32 7 + select CRC32 9 8 help 10 9 This driver implements the InfiniBand RDMA transport over 11 10 the Linux network stack. It enables a system with a
+2
drivers/infiniband/sw/rxe/Makefile
··· 23 23 rxe_task.o \ 24 24 rxe_net.o \ 25 25 rxe_hw_counters.o 26 + 27 + rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
+31 -5
drivers/infiniband/sw/rxe/rxe.c
··· 31 31
32 32 WARN_ON(!RB_EMPTY_ROOT(&rxe->mcg_tree));
33 33
34 - if (rxe->tfm)
35 - crypto_free_shash(rxe->tfm);
36 -
37 34 mutex_destroy(&rxe->usdev_lock);
38 35 }
39 36
··· 69 72 rxe->attr.max_pkeys = RXE_MAX_PKEYS;
70 73 rxe->attr.local_ca_ack_delay = RXE_LOCAL_CA_ACK_DELAY;
71 74
75 + if (ndev->addr_len) {
76 + memcpy(rxe->raw_gid, ndev->dev_addr,
77 + min_t(unsigned int, ndev->addr_len, ETH_ALEN));
78 + } else {
79 + /*
80 + * This device does not have a HW address, but
81 + * connection management requires a unique gid.
82 + */
83 + eth_random_addr(rxe->raw_gid);
84 + }
85 +
72 86 addrconf_addr_eui48((unsigned char *)&rxe->attr.sys_image_guid,
73 - ndev->dev_addr);
87 + rxe->raw_gid);
74 88
75 89 rxe->max_ucontext = RXE_MAX_UCONTEXT;
90 +
91 + if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) {
92 + rxe->attr.kernel_cap_flags |= IBK_ON_DEMAND_PAGING;
93 +
94 + /* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */
95 + rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT;
96 +
97 + rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SEND;
98 + rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_RECV;
99 + rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
100 +
101 + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SEND;
102 + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
103 + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
104 + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
105 + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
106 + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
107 + }
76 108 }
77 109
78 110 /* initialize port attributes */
··· 139 113
140 114 rxe_init_port_param(port);
141 115 addrconf_addr_eui48((unsigned char *)&port->port_guid,
142 - ndev->dev_addr);
116 + rxe->raw_gid);
143 117 spin_lock_init(&port->port_lock);
144 118 }
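For netdevs that expose no link-layer address (addr_len == 0, as on tun), rxe now falls back to a random Ethernet-style address as its GID base. A userspace sketch of the two bit operations eth_random_addr() guarantees on its random bytes (rand() stands in for the kernel's get_random_bytes()):

#include <stdio.h>
#include <stdlib.h>

static void random_ether_addr_sketch(unsigned char addr[6])
{
	for (int i = 0; i < 6; i++)
		addr[i] = rand() & 0xff;
	addr[0] &= 0xfe;	/* clear the multicast bit */
	addr[0] |= 0x02;	/* set the locally-administered bit */
}

int main(void)
{
	unsigned char a[6];

	random_ether_addr_sketch(a);
	printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
	       a[0], a[1], a[2], a[3], a[4], a[5]);
	return 0;
}

The result is always a locally-administered unicast address, which is what makes it usable as a unique GID base.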
-38
drivers/infiniband/sw/rxe/rxe.h
··· 21 21 #include <rdma/ib_umem.h> 22 22 #include <rdma/ib_cache.h> 23 23 #include <rdma/ib_addr.h> 24 - #include <crypto/hash.h> 25 24 26 25 #include "rxe_net.h" 27 26 #include "rxe_opcode.h" ··· 98 99 "mr#%d %s: " fmt, (mr)->elem.index, __func__, ##__VA_ARGS__) 99 100 #define rxe_info_mw(mw, fmt, ...) ibdev_info_ratelimited((mw)->ibmw.device, \ 100 101 "mw#%d %s: " fmt, (mw)->elem.index, __func__, ##__VA_ARGS__) 101 - 102 - /* responder states */ 103 - enum resp_states { 104 - RESPST_NONE, 105 - RESPST_GET_REQ, 106 - RESPST_CHK_PSN, 107 - RESPST_CHK_OP_SEQ, 108 - RESPST_CHK_OP_VALID, 109 - RESPST_CHK_RESOURCE, 110 - RESPST_CHK_LENGTH, 111 - RESPST_CHK_RKEY, 112 - RESPST_EXECUTE, 113 - RESPST_READ_REPLY, 114 - RESPST_ATOMIC_REPLY, 115 - RESPST_ATOMIC_WRITE_REPLY, 116 - RESPST_PROCESS_FLUSH, 117 - RESPST_COMPLETE, 118 - RESPST_ACKNOWLEDGE, 119 - RESPST_CLEANUP, 120 - RESPST_DUPLICATE_REQUEST, 121 - RESPST_ERR_MALFORMED_WQE, 122 - RESPST_ERR_UNSUPPORTED_OPCODE, 123 - RESPST_ERR_MISALIGNED_ATOMIC, 124 - RESPST_ERR_PSN_OUT_OF_SEQ, 125 - RESPST_ERR_MISSING_OPCODE_FIRST, 126 - RESPST_ERR_MISSING_OPCODE_LAST_C, 127 - RESPST_ERR_MISSING_OPCODE_LAST_D1E, 128 - RESPST_ERR_TOO_MANY_RDMA_ATM_REQ, 129 - RESPST_ERR_RNR, 130 - RESPST_ERR_RKEY_VIOLATION, 131 - RESPST_ERR_INVALIDATE_RKEY, 132 - RESPST_ERR_LENGTH, 133 - RESPST_ERR_CQ_OVERFLOW, 134 - RESPST_ERROR, 135 - RESPST_DONE, 136 - RESPST_EXIT, 137 - }; 138 102 139 103 void rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu); 140 104
+1 -39
drivers/infiniband/sw/rxe/rxe_icrc.c
··· 10 10 #include "rxe_loc.h" 11 11 12 12 /** 13 - * rxe_icrc_init() - Initialize crypto function for computing crc32 14 - * @rxe: rdma_rxe device object 15 - * 16 - * Return: 0 on success else an error 17 - */ 18 - int rxe_icrc_init(struct rxe_dev *rxe) 19 - { 20 - struct crypto_shash *tfm; 21 - 22 - tfm = crypto_alloc_shash("crc32", 0, 0); 23 - if (IS_ERR(tfm)) { 24 - rxe_dbg_dev(rxe, "failed to init crc32 algorithm err: %ld\n", 25 - PTR_ERR(tfm)); 26 - return PTR_ERR(tfm); 27 - } 28 - 29 - rxe->tfm = tfm; 30 - 31 - return 0; 32 - } 33 - 34 - /** 35 13 * rxe_crc32() - Compute cumulative crc32 for a contiguous segment 36 14 * @rxe: rdma_rxe device object 37 15 * @crc: starting crc32 value from previous segments ··· 20 42 */ 21 43 static __be32 rxe_crc32(struct rxe_dev *rxe, __be32 crc, void *next, size_t len) 22 44 { 23 - __be32 icrc; 24 - int err; 25 - 26 - SHASH_DESC_ON_STACK(shash, rxe->tfm); 27 - 28 - shash->tfm = rxe->tfm; 29 - *(__be32 *)shash_desc_ctx(shash) = crc; 30 - err = crypto_shash_update(shash, next, len); 31 - if (unlikely(err)) { 32 - rxe_dbg_dev(rxe, "failed crc calculation, err: %d\n", err); 33 - return (__force __be32)crc32_le((__force u32)crc, next, len); 34 - } 35 - 36 - icrc = *(__be32 *)shash_desc_ctx(shash); 37 - barrier_data(shash_desc_ctx(shash)); 38 - 39 - return icrc; 45 + return (__force __be32)crc32_le((__force u32)crc, next, len); 40 46 } 41 47 42 48 /**
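With the crypto_shash plumbing gone, the hot path is a direct library call: rxe_crc32() reduces to crc32_le() and needs no per-device state or error handling. A kernel-context sketch of a cumulative CRC over a header plus payload in the same style (the helper name is illustrative):

#include <linux/crc32.h>

/* Illustrative helper: fold two buffers into one running CRC, the
 * pattern rxe now uses. crc32_le() takes and returns a host-order u32;
 * byte-order handling stays with the caller. */
static u32 icrc_fold(u32 crc, const void *hdr, size_t hlen,
		     const void *payload, size_t plen)
{
	crc = crc32_le(crc, hdr, hlen);
	return crc32_le(crc, payload, plen);
}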
+34 -1
drivers/infiniband/sw/rxe/rxe_loc.h
··· 58 58 59 59 /* rxe_mr.c */ 60 60 u8 rxe_get_next_key(u32 last_key); 61 + void rxe_mr_init(int access, struct rxe_mr *mr); 61 62 void rxe_mr_init_dma(int access, struct rxe_mr *mr); 62 63 int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, 63 64 int access, struct rxe_mr *mr); ··· 80 79 int rxe_invalidate_mr(struct rxe_qp *qp, u32 key); 81 80 int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe); 82 81 void rxe_mr_cleanup(struct rxe_pool_elem *elem); 82 + 83 + /* defined in rxe_mr.c; used in rxe_mr.c and rxe_odp.c */ 84 + extern spinlock_t atomic_ops_lock; 83 85 84 86 /* rxe_mw.c */ 85 87 int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata); ··· 172 168 int rxe_receiver(struct rxe_qp *qp); 173 169 174 170 /* rxe_icrc.c */ 175 - int rxe_icrc_init(struct rxe_dev *rxe); 176 171 int rxe_icrc_check(struct sk_buff *skb, struct rxe_pkt_info *pkt); 177 172 void rxe_icrc_generate(struct sk_buff *skb, struct rxe_pkt_info *pkt); 178 173 ··· 183 180 { 184 181 return rxe_wr_opcode_info[opcode].mask[qp->ibqp.qp_type]; 185 182 } 183 + 184 + /* rxe_odp.c */ 185 + extern const struct mmu_interval_notifier_ops rxe_mn_ops; 186 + 187 + #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING 188 + int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, 189 + u64 iova, int access_flags, struct rxe_mr *mr); 190 + int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, 191 + enum rxe_mr_copy_dir dir); 192 + int rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, 193 + u64 compare, u64 swap_add, u64 *orig_val); 194 + #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ 195 + static inline int 196 + rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, 197 + int access_flags, struct rxe_mr *mr) 198 + { 199 + return -EOPNOTSUPP; 200 + } 201 + static inline int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, 202 + int length, enum rxe_mr_copy_dir dir) 203 + { 204 + return -EOPNOTSUPP; 205 + } 206 + static inline int 207 + rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, 208 + u64 compare, u64 swap_add, u64 *orig_val) 209 + { 210 + return RESPST_ERR_UNSUPPORTED_OPCODE; 211 + } 212 + #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ 186 213 187 214 #endif /* RXE_LOC_H */
+10 -3
drivers/infiniband/sw/rxe/rxe_mr.c
··· 45 45 } 46 46 } 47 47 48 - static void rxe_mr_init(int access, struct rxe_mr *mr) 48 + void rxe_mr_init(int access, struct rxe_mr *mr) 49 49 { 50 50 u32 key = mr->elem.index << 8 | rxe_get_next_key(-1); 51 51 ··· 323 323 return err; 324 324 } 325 325 326 - return rxe_mr_copy_xarray(mr, iova, addr, length, dir); 326 + if (mr->umem->is_odp) 327 + return rxe_odp_mr_copy(mr, iova, addr, length, dir); 328 + else 329 + return rxe_mr_copy_xarray(mr, iova, addr, length, dir); 327 330 } 328 331 329 332 /* copy data in or out of a wqe, i.e. sg list ··· 469 466 } 470 467 471 468 /* Guarantee atomicity of atomic operations at the machine level. */ 472 - static DEFINE_SPINLOCK(atomic_ops_lock); 469 + DEFINE_SPINLOCK(atomic_ops_lock); 473 470 474 471 int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, 475 472 u64 compare, u64 swap_add, u64 *orig_val) ··· 534 531 unsigned int page_offset; 535 532 struct page *page; 536 533 u64 *va; 534 + 535 + /* ODP is not supported right now. WIP. */ 536 + if (mr->umem->is_odp) 537 + return RESPST_ERR_UNSUPPORTED_OPCODE; 537 538 538 539 /* See IBA oA19-28 */ 539 540 if (unlikely(mr->state != RXE_MR_STATE_VALID)) {
+326
drivers/infiniband/sw/rxe/rxe_odp.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + /* 3 + * Copyright (c) 2022-2023 Fujitsu Ltd. All rights reserved. 4 + */ 5 + 6 + #include <linux/hmm.h> 7 + 8 + #include <rdma/ib_umem_odp.h> 9 + 10 + #include "rxe.h" 11 + 12 + static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni, 13 + const struct mmu_notifier_range *range, 14 + unsigned long cur_seq) 15 + { 16 + struct ib_umem_odp *umem_odp = 17 + container_of(mni, struct ib_umem_odp, notifier); 18 + unsigned long start, end; 19 + 20 + if (!mmu_notifier_range_blockable(range)) 21 + return false; 22 + 23 + mutex_lock(&umem_odp->umem_mutex); 24 + mmu_interval_set_seq(mni, cur_seq); 25 + 26 + start = max_t(u64, ib_umem_start(umem_odp), range->start); 27 + end = min_t(u64, ib_umem_end(umem_odp), range->end); 28 + 29 + /* update umem_odp->dma_list */ 30 + ib_umem_odp_unmap_dma_pages(umem_odp, start, end); 31 + 32 + mutex_unlock(&umem_odp->umem_mutex); 33 + return true; 34 + } 35 + 36 + const struct mmu_interval_notifier_ops rxe_mn_ops = { 37 + .invalidate = rxe_ib_invalidate_range, 38 + }; 39 + 40 + #define RXE_PAGEFAULT_DEFAULT 0 41 + #define RXE_PAGEFAULT_RDONLY BIT(0) 42 + #define RXE_PAGEFAULT_SNAPSHOT BIT(1) 43 + static int rxe_odp_do_pagefault_and_lock(struct rxe_mr *mr, u64 user_va, int bcnt, u32 flags) 44 + { 45 + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); 46 + bool fault = !(flags & RXE_PAGEFAULT_SNAPSHOT); 47 + u64 access_mask; 48 + int np; 49 + 50 + access_mask = ODP_READ_ALLOWED_BIT; 51 + if (umem_odp->umem.writable && !(flags & RXE_PAGEFAULT_RDONLY)) 52 + access_mask |= ODP_WRITE_ALLOWED_BIT; 53 + 54 + /* 55 + * ib_umem_odp_map_dma_and_lock() locks umem_mutex on success. 56 + * Callers must release the lock later to let invalidation handler 57 + * do its work again. 58 + */ 59 + np = ib_umem_odp_map_dma_and_lock(umem_odp, user_va, bcnt, 60 + access_mask, fault); 61 + return np; 62 + } 63 + 64 + static int rxe_odp_init_pages(struct rxe_mr *mr) 65 + { 66 + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); 67 + int ret; 68 + 69 + ret = rxe_odp_do_pagefault_and_lock(mr, mr->umem->address, 70 + mr->umem->length, 71 + RXE_PAGEFAULT_SNAPSHOT); 72 + 73 + if (ret >= 0) 74 + mutex_unlock(&umem_odp->umem_mutex); 75 + 76 + return ret >= 0 ? 0 : ret; 77 + } 78 + 79 + int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, 80 + u64 iova, int access_flags, struct rxe_mr *mr) 81 + { 82 + struct ib_umem_odp *umem_odp; 83 + int err; 84 + 85 + if (!IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) 86 + return -EOPNOTSUPP; 87 + 88 + rxe_mr_init(access_flags, mr); 89 + 90 + if (!start && length == U64_MAX) { 91 + if (iova != 0) 92 + return -EINVAL; 93 + if (!(rxe->attr.odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT)) 94 + return -EINVAL; 95 + 96 + /* Never reach here, for implicit ODP is not implemented. 
*/ 97 + } 98 + 99 + umem_odp = ib_umem_odp_get(&rxe->ib_dev, start, length, access_flags, 100 + &rxe_mn_ops); 101 + if (IS_ERR(umem_odp)) { 102 + rxe_dbg_mr(mr, "Unable to create umem_odp err = %d\n", 103 + (int)PTR_ERR(umem_odp)); 104 + return PTR_ERR(umem_odp); 105 + } 106 + 107 + umem_odp->private = mr; 108 + 109 + mr->umem = &umem_odp->umem; 110 + mr->access = access_flags; 111 + mr->ibmr.length = length; 112 + mr->ibmr.iova = iova; 113 + mr->page_offset = ib_umem_offset(&umem_odp->umem); 114 + 115 + err = rxe_odp_init_pages(mr); 116 + if (err) { 117 + ib_umem_odp_release(umem_odp); 118 + return err; 119 + } 120 + 121 + mr->state = RXE_MR_STATE_VALID; 122 + mr->ibmr.type = IB_MR_TYPE_USER; 123 + 124 + return err; 125 + } 126 + 127 + static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp, 128 + u64 iova, int length, u32 perm) 129 + { 130 + bool need_fault = false; 131 + u64 addr; 132 + int idx; 133 + 134 + addr = iova & (~(BIT(umem_odp->page_shift) - 1)); 135 + 136 + /* Skim through all pages that are to be accessed. */ 137 + while (addr < iova + length) { 138 + idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift; 139 + 140 + if (!(umem_odp->dma_list[idx] & perm)) { 141 + need_fault = true; 142 + break; 143 + } 144 + 145 + addr += BIT(umem_odp->page_shift); 146 + } 147 + return need_fault; 148 + } 149 + 150 + static int rxe_odp_map_range_and_lock(struct rxe_mr *mr, u64 iova, int length, u32 flags) 151 + { 152 + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); 153 + bool need_fault; 154 + u64 perm; 155 + int err; 156 + 157 + if (unlikely(length < 1)) 158 + return -EINVAL; 159 + 160 + perm = ODP_READ_ALLOWED_BIT; 161 + if (!(flags & RXE_PAGEFAULT_RDONLY)) 162 + perm |= ODP_WRITE_ALLOWED_BIT; 163 + 164 + mutex_lock(&umem_odp->umem_mutex); 165 + 166 + need_fault = rxe_check_pagefault(umem_odp, iova, length, perm); 167 + if (need_fault) { 168 + mutex_unlock(&umem_odp->umem_mutex); 169 + 170 + /* umem_mutex is locked on success. */ 171 + err = rxe_odp_do_pagefault_and_lock(mr, iova, length, 172 + flags); 173 + if (err < 0) 174 + return err; 175 + 176 + need_fault = rxe_check_pagefault(umem_odp, iova, length, perm); 177 + if (need_fault) 178 + return -EFAULT; 179 + } 180 + 181 + return 0; 182 + } 183 + 184 + static int __rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, 185 + int length, enum rxe_mr_copy_dir dir) 186 + { 187 + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); 188 + struct page *page; 189 + int idx, bytes; 190 + size_t offset; 191 + u8 *user_va; 192 + 193 + idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift; 194 + offset = iova & (BIT(umem_odp->page_shift) - 1); 195 + 196 + while (length > 0) { 197 + u8 *src, *dest; 198 + 199 + page = hmm_pfn_to_page(umem_odp->pfn_list[idx]); 200 + user_va = kmap_local_page(page); 201 + if (!user_va) 202 + return -EFAULT; 203 + 204 + src = (dir == RXE_TO_MR_OBJ) ? addr : user_va; 205 + dest = (dir == RXE_TO_MR_OBJ) ? 
user_va : addr; 206 + 207 + bytes = BIT(umem_odp->page_shift) - offset; 208 + if (bytes > length) 209 + bytes = length; 210 + 211 + memcpy(dest, src, bytes); 212 + kunmap_local(user_va); 213 + 214 + length -= bytes; 215 + idx++; 216 + offset = 0; 217 + } 218 + 219 + return 0; 220 + } 221 + 222 + int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, 223 + enum rxe_mr_copy_dir dir) 224 + { 225 + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); 226 + u32 flags = RXE_PAGEFAULT_DEFAULT; 227 + int err; 228 + 229 + if (length == 0) 230 + return 0; 231 + 232 + if (unlikely(!mr->umem->is_odp)) 233 + return -EOPNOTSUPP; 234 + 235 + switch (dir) { 236 + case RXE_TO_MR_OBJ: 237 + break; 238 + 239 + case RXE_FROM_MR_OBJ: 240 + flags |= RXE_PAGEFAULT_RDONLY; 241 + break; 242 + 243 + default: 244 + return -EINVAL; 245 + } 246 + 247 + err = rxe_odp_map_range_and_lock(mr, iova, length, flags); 248 + if (err) 249 + return err; 250 + 251 + err = __rxe_odp_mr_copy(mr, iova, addr, length, dir); 252 + 253 + mutex_unlock(&umem_odp->umem_mutex); 254 + 255 + return err; 256 + } 257 + 258 + static int rxe_odp_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, 259 + u64 compare, u64 swap_add, u64 *orig_val) 260 + { 261 + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); 262 + unsigned int page_offset; 263 + struct page *page; 264 + unsigned int idx; 265 + u64 value; 266 + u64 *va; 267 + int err; 268 + 269 + if (unlikely(mr->state != RXE_MR_STATE_VALID)) { 270 + rxe_dbg_mr(mr, "mr not in valid state\n"); 271 + return RESPST_ERR_RKEY_VIOLATION; 272 + } 273 + 274 + err = mr_check_range(mr, iova, sizeof(value)); 275 + if (err) { 276 + rxe_dbg_mr(mr, "iova out of range\n"); 277 + return RESPST_ERR_RKEY_VIOLATION; 278 + } 279 + 280 + idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift; 281 + page_offset = iova & (BIT(umem_odp->page_shift) - 1); 282 + page = hmm_pfn_to_page(umem_odp->pfn_list[idx]); 283 + if (!page) 284 + return RESPST_ERR_RKEY_VIOLATION; 285 + 286 + if (unlikely(page_offset & 0x7)) { 287 + rxe_dbg_mr(mr, "iova not aligned\n"); 288 + return RESPST_ERR_MISALIGNED_ATOMIC; 289 + } 290 + 291 + va = kmap_local_page(page); 292 + 293 + spin_lock_bh(&atomic_ops_lock); 294 + value = *orig_val = va[page_offset >> 3]; 295 + 296 + if (opcode == IB_OPCODE_RC_COMPARE_SWAP) { 297 + if (value == compare) 298 + va[page_offset >> 3] = swap_add; 299 + } else { 300 + value += swap_add; 301 + va[page_offset >> 3] = value; 302 + } 303 + spin_unlock_bh(&atomic_ops_lock); 304 + 305 + kunmap_local(va); 306 + 307 + return 0; 308 + } 309 + 310 + int rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, 311 + u64 compare, u64 swap_add, u64 *orig_val) 312 + { 313 + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); 314 + int err; 315 + 316 + err = rxe_odp_map_range_and_lock(mr, iova, sizeof(char), 317 + RXE_PAGEFAULT_DEFAULT); 318 + if (err < 0) 319 + return RESPST_ERR_RKEY_VIOLATION; 320 + 321 + err = rxe_odp_do_atomic_op(mr, iova, opcode, compare, swap_add, 322 + orig_val); 323 + mutex_unlock(&umem_odp->umem_mutex); 324 + 325 + return err; 326 + }
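rxe_odp_map_range_and_lock() above follows a check/fault/re-check shape: test residency under umem_mutex; if pages are missing, drop the lock, fault them in (which re-acquires the lock on success), then test once more because an invalidation can race in the unlocked window. A standalone toy of that shape built on a pthread mutex (simplified, and unlike the driver it drops the lock on every failure path):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_map {
	pthread_mutex_t lock;
	bool resident;	/* stands in for the dma_list permission bits */
};

static int fault_in_and_lock(struct toy_map *m)
{
	pthread_mutex_lock(&m->lock);	/* returns with the lock held */
	m->resident = true;
	return 0;
}

static int map_range_and_lock(struct toy_map *m)
{
	pthread_mutex_lock(&m->lock);
	if (m->resident)
		return 0;			/* success: lock still held */
	pthread_mutex_unlock(&m->lock);

	if (fault_in_and_lock(m) < 0)
		return -1;
	if (!m->resident) {			/* invalidation raced in */
		pthread_mutex_unlock(&m->lock);
		return -1;
	}
	return 0;				/* success: lock still held */
}

int main(void)
{
	struct toy_map m = { PTHREAD_MUTEX_INITIALIZER, false };

	if (map_range_and_lock(&m) == 0) {
		puts("range mapped, lock held");
		pthread_mutex_unlock(&m.lock);
	}
	return 0;
}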
-1
drivers/infiniband/sw/rxe/rxe_req.c
··· 5 5 */ 6 6 7 7 #include <linux/skbuff.h> 8 - #include <crypto/hash.h> 9 8 10 9 #include "rxe.h" 11 10 #include "rxe_loc.h"
+14 -4
drivers/infiniband/sw/rxe/rxe_resp.c
··· 649 649 struct rxe_mr *mr = qp->resp.mr; 650 650 struct resp_res *res = qp->resp.res; 651 651 652 + /* ODP is not supported right now. WIP. */ 653 + if (mr->umem->is_odp) 654 + return RESPST_ERR_UNSUPPORTED_OPCODE; 655 + 652 656 /* oA19-14, oA19-15 */ 653 657 if (res && res->replay) 654 658 return RESPST_ACKNOWLEDGE; ··· 706 702 if (!res->replay) { 707 703 u64 iova = qp->resp.va + qp->resp.offset; 708 704 709 - err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode, 710 - atmeth_comp(pkt), 711 - atmeth_swap_add(pkt), 712 - &res->atomic.orig_val); 705 + if (mr->umem->is_odp) 706 + err = rxe_odp_atomic_op(mr, iova, pkt->opcode, 707 + atmeth_comp(pkt), 708 + atmeth_swap_add(pkt), 709 + &res->atomic.orig_val); 710 + else 711 + err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode, 712 + atmeth_comp(pkt), 713 + atmeth_swap_add(pkt), 714 + &res->atomic.orig_val); 713 715 if (err) 714 716 return err; 715 717
+18 -6
drivers/infiniband/sw/rxe/rxe_verbs.c
··· 80 80 return err; 81 81 } 82 82 83 + static int rxe_query_gid(struct ib_device *ibdev, u32 port, int idx, 84 + union ib_gid *gid) 85 + { 86 + struct rxe_dev *rxe = to_rdev(ibdev); 87 + 88 + /* subnet_prefix == interface_id == 0; */ 89 + memset(gid, 0, sizeof(*gid)); 90 + memcpy(gid->raw, rxe->raw_gid, ETH_ALEN); 91 + 92 + return 0; 93 + } 94 + 83 95 static int rxe_query_pkey(struct ib_device *ibdev, 84 96 u32 port_num, u16 index, u16 *pkey) 85 97 { ··· 1298 1286 mr->ibmr.pd = ibpd; 1299 1287 mr->ibmr.device = ibpd->device; 1300 1288 1301 - err = rxe_mr_init_user(rxe, start, length, access, mr); 1289 + if (access & IB_ACCESS_ON_DEMAND) 1290 + err = rxe_odp_mr_init_user(rxe, start, length, iova, access, mr); 1291 + else 1292 + err = rxe_mr_init_user(rxe, start, length, access, mr); 1302 1293 if (err) { 1303 1294 rxe_dbg_mr(mr, "reg_user_mr failed, err = %d\n", err); 1304 1295 goto err_cleanup; ··· 1508 1493 .query_ah = rxe_query_ah, 1509 1494 .query_device = rxe_query_device, 1510 1495 .query_pkey = rxe_query_pkey, 1496 + .query_gid = rxe_query_gid, 1511 1497 .query_port = rxe_query_port, 1512 1498 .query_qp = rxe_query_qp, 1513 1499 .query_srq = rxe_query_srq, ··· 1539 1523 dev->num_comp_vectors = num_possible_cpus(); 1540 1524 dev->local_dma_lkey = 0; 1541 1525 addrconf_addr_eui48((unsigned char *)&dev->node_guid, 1542 - ndev->dev_addr); 1526 + rxe->raw_gid); 1543 1527 1544 1528 dev->uverbs_cmd_mask |= BIT_ULL(IB_USER_VERBS_CMD_POST_SEND) | 1545 1529 BIT_ULL(IB_USER_VERBS_CMD_REQ_NOTIFY_CQ); 1546 1530 1547 1531 ib_set_device_ops(dev, &rxe_dev_ops); 1548 1532 err = ib_device_set_netdev(&rxe->ib_dev, ndev, 1); 1549 - if (err) 1550 - return err; 1551 - 1552 - err = rxe_icrc_init(rxe); 1553 1533 if (err) 1554 1534 return err; 1555 1535
+40 -2
drivers/infiniband/sw/rxe/rxe_verbs.h
··· 126 126 u32 rnr_retry; 127 127 }; 128 128 129 + /* responder states */ 130 + enum resp_states { 131 + RESPST_NONE, 132 + RESPST_GET_REQ, 133 + RESPST_CHK_PSN, 134 + RESPST_CHK_OP_SEQ, 135 + RESPST_CHK_OP_VALID, 136 + RESPST_CHK_RESOURCE, 137 + RESPST_CHK_LENGTH, 138 + RESPST_CHK_RKEY, 139 + RESPST_EXECUTE, 140 + RESPST_READ_REPLY, 141 + RESPST_ATOMIC_REPLY, 142 + RESPST_ATOMIC_WRITE_REPLY, 143 + RESPST_PROCESS_FLUSH, 144 + RESPST_COMPLETE, 145 + RESPST_ACKNOWLEDGE, 146 + RESPST_CLEANUP, 147 + RESPST_DUPLICATE_REQUEST, 148 + RESPST_ERR_MALFORMED_WQE, 149 + RESPST_ERR_UNSUPPORTED_OPCODE, 150 + RESPST_ERR_MISALIGNED_ATOMIC, 151 + RESPST_ERR_PSN_OUT_OF_SEQ, 152 + RESPST_ERR_MISSING_OPCODE_FIRST, 153 + RESPST_ERR_MISSING_OPCODE_LAST_C, 154 + RESPST_ERR_MISSING_OPCODE_LAST_D1E, 155 + RESPST_ERR_TOO_MANY_RDMA_ATM_REQ, 156 + RESPST_ERR_RNR, 157 + RESPST_ERR_RKEY_VIOLATION, 158 + RESPST_ERR_INVALIDATE_RKEY, 159 + RESPST_ERR_LENGTH, 160 + RESPST_ERR_CQ_OVERFLOW, 161 + RESPST_ERROR, 162 + RESPST_DONE, 163 + RESPST_EXIT, 164 + }; 165 + 129 166 enum rdatm_res_state { 130 167 rdatm_res_state_next, 131 168 rdatm_res_state_new, ··· 413 376 struct ib_device_attr attr; 414 377 int max_ucontext; 415 378 int max_inline_data; 416 - struct mutex usdev_lock; 379 + struct mutex usdev_lock; 380 + 381 + char raw_gid[ETH_ALEN]; 417 382 418 383 struct rxe_pool uc_pool; 419 384 struct rxe_pool pd_pool; ··· 441 402 atomic64_t stats_counters[RXE_NUM_OF_COUNTERS]; 442 403 443 404 struct rxe_port port; 444 - struct crypto_shash *tfm; 445 405 }; 446 406 447 407 static inline struct net_device *rxe_ib_device_get_netdev(struct ib_device *dev)
+1 -3
drivers/infiniband/sw/siw/Kconfig
··· 2 2 tristate "Software RDMA over TCP/IP (iWARP) driver" 3 3 depends on INET && INFINIBAND 4 4 depends on INFINIBAND_VIRT_DMA 5 - select LIBCRC32C 6 - select CRYPTO 7 - select CRYPTO_CRC32C 5 + select CRC32 8 6 help 9 7 This driver implements the iWARP RDMA transport over 10 8 the Linux TCP/IP network stack. It enables a system with a
+31 -6
drivers/infiniband/sw/siw/siw.h
··· 10 10 #include <rdma/restrack.h>
11 11 #include <linux/socket.h>
12 12 #include <linux/skbuff.h>
13 - #include <crypto/hash.h>
14 13 #include <linux/crc32.h>
15 14 #include <linux/crc32c.h>
15 + #include <linux/unaligned.h>
16 16
17 17 #include <rdma/siw-abi.h>
18 18 #include "iwarp.h"
··· 289 289
290 290 union iwarp_hdr hdr;
291 291 struct mpa_trailer trailer;
292 - struct shash_desc *mpa_crc_hd;
292 + u32 mpa_crc;
293 + bool mpa_crc_enabled;
293 294
294 295 /*
295 296 * For each FPDU, main RX loop runs through 3 stages:
··· 391 390 int burst;
392 391 int bytes_unsent; /* ddp payload bytes */
393 392
394 - struct shash_desc *mpa_crc_hd;
393 + u32 mpa_crc;
394 + bool mpa_crc_enabled;
395 395
396 396 u8 do_crc : 1; /* do crc for segment */
397 397 u8 use_sendpage : 1; /* send w/o copy */
··· 498 496 extern const bool peer_to_peer;
499 497 extern struct task_struct *siw_tx_thread[];
500 498
501 - extern struct crypto_shash *siw_crypto_shash;
502 499 extern struct iwarp_msg_info iwarp_pktinfo[RDMAP_TERMINATE + 1];
503 500
504 501 /* QP general functions */
··· 669 668 return NULL;
670 669 }
671 670
671 + static inline void siw_crc_init(u32 *crc)
672 + {
673 + *crc = ~0;
674 + }
675 +
676 + static inline void siw_crc_update(u32 *crc, const void *data, size_t len)
677 + {
678 + *crc = crc32c(*crc, data, len);
679 + }
680 +
681 + static inline void siw_crc_final(u32 *crc, u8 out[4])
682 + {
683 + put_unaligned_le32(~*crc, out);
684 + }
685 +
686 + static inline void siw_crc_oneshot(const void *data, size_t len, u8 out[4])
687 + {
688 + u32 crc;
689 +
690 + siw_crc_init(&crc);
691 + siw_crc_update(&crc, data, len);
692 + siw_crc_final(&crc, out);
693 + }
694 +
672 695 static inline __wsum siw_csum_update(const void *buff, int len, __wsum sum)
673 696 {
674 697 return (__force __wsum)crc32c((__force __u32)sum, buff, len);
··· 711 686 .update = siw_csum_update,
712 687 .combine = siw_csum_combine,
713 688 };
714 - __wsum crc = *(u32 *)shash_desc_ctx(srx->mpa_crc_hd);
689 + __wsum crc = (__force __wsum)srx->mpa_crc;
715 690
716 691 crc = __skb_checksum(srx->skb, srx->skb_offset, len, crc,
717 692 &siw_cs_ops);
718 - *(u32 *)shash_desc_ctx(srx->mpa_crc_hd) = crc;
693 + srx->mpa_crc = (__force u32)crc;
719 694
720 695 #define siw_dbg(ibdev, fmt, ...) \
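The new siw_crc_* helpers pin down the MPA CRC convention in one place: seed with ~0, update through the crc32c() library, complement on completion, and store the digest little-endian. A standalone bitwise CRC32C model of that convention (the driver itself calls the kernel's crc32c(), not this loop):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82f63b78). */
static uint32_t crc32c_sw(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82f63b78 & -(crc & 1));
	}
	return crc;
}

int main(void)
{
	uint32_t crc = ~0u;			/* siw_crc_init() */
	uint8_t out[4];

	crc = crc32c_sw(crc, "123456789", 9);	/* siw_crc_update() */
	crc = ~crc;				/* siw_crc_final() complement */
	memcpy(out, &crc, 4);	/* little-endian store on LE hosts */

	printf("%08x\n", crc);	/* e3069283, the CRC32C check value */
	return 0;
}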
+1 -21
drivers/infiniband/sw/siw/siw_main.c
··· 59 59 const bool peer_to_peer; 60 60 61 61 struct task_struct *siw_tx_thread[NR_CPUS]; 62 - struct crypto_shash *siw_crypto_shash; 63 62 64 63 static int siw_device_register(struct siw_device *sdev, const char *name) 65 64 { ··· 466 467 rv = -ENOMEM; 467 468 goto out_error; 468 469 } 469 - /* 470 - * Locate CRC32 algorithm. If unsuccessful, fail 471 - * loading siw only, if CRC is required. 472 - */ 473 - siw_crypto_shash = crypto_alloc_shash("crc32c", 0, 0); 474 - if (IS_ERR(siw_crypto_shash)) { 475 - pr_info("siw: Loading CRC32c failed: %ld\n", 476 - PTR_ERR(siw_crypto_shash)); 477 - siw_crypto_shash = NULL; 478 - if (mpa_crc_required) { 479 - rv = -EOPNOTSUPP; 480 - goto out_error; 481 - } 482 - } 470 + 483 471 rv = register_netdevice_notifier(&siw_netdev_nb); 484 472 if (rv) 485 473 goto out_error; ··· 478 492 479 493 out_error: 480 494 siw_stop_tx_threads(); 481 - 482 - if (siw_crypto_shash) 483 - crypto_free_shash(siw_crypto_shash); 484 495 485 496 pr_info("SoftIWARP attach failed. Error: %d\n", rv); 486 497 ··· 498 515 siw_cm_exit(); 499 516 500 517 siw_destroy_cpulist(siw_cpu_info.num_nodes); 501 - 502 - if (siw_crypto_shash) 503 - crypto_free_shash(siw_crypto_shash); 504 518 505 519 pr_info("SoftiWARP detached\n"); 506 520 }
+11 -43
drivers/infiniband/sw/siw/siw_qp.c
··· 226 226 return 0; 227 227 } 228 228 229 - static int siw_qp_enable_crc(struct siw_qp *qp) 230 - { 231 - struct siw_rx_stream *c_rx = &qp->rx_stream; 232 - struct siw_iwarp_tx *c_tx = &qp->tx_ctx; 233 - int size; 234 - 235 - if (siw_crypto_shash == NULL) 236 - return -ENOENT; 237 - 238 - size = crypto_shash_descsize(siw_crypto_shash) + 239 - sizeof(struct shash_desc); 240 - 241 - c_tx->mpa_crc_hd = kzalloc(size, GFP_KERNEL); 242 - c_rx->mpa_crc_hd = kzalloc(size, GFP_KERNEL); 243 - if (!c_tx->mpa_crc_hd || !c_rx->mpa_crc_hd) { 244 - kfree(c_tx->mpa_crc_hd); 245 - kfree(c_rx->mpa_crc_hd); 246 - c_tx->mpa_crc_hd = NULL; 247 - c_rx->mpa_crc_hd = NULL; 248 - return -ENOMEM; 249 - } 250 - c_tx->mpa_crc_hd->tfm = siw_crypto_shash; 251 - c_rx->mpa_crc_hd->tfm = siw_crypto_shash; 252 - 253 - return 0; 254 - } 255 - 256 229 /* 257 230 * Send a non signalled READ or WRITE to peer side as negotiated 258 231 * with MPAv2 P2P setup protocol. The work request is only created ··· 556 583 557 584 term->ctrl.mpa_len = 558 585 cpu_to_be16(len_terminate - (MPA_HDR_SIZE + MPA_CRC_SIZE)); 559 - if (qp->tx_ctx.mpa_crc_hd) { 560 - crypto_shash_init(qp->tx_ctx.mpa_crc_hd); 561 - if (crypto_shash_update(qp->tx_ctx.mpa_crc_hd, 562 - (u8 *)iov[0].iov_base, 563 - iov[0].iov_len)) 564 - goto out; 565 - 586 + if (qp->tx_ctx.mpa_crc_enabled) { 587 + siw_crc_init(&qp->tx_ctx.mpa_crc); 588 + siw_crc_update(&qp->tx_ctx.mpa_crc, 589 + iov[0].iov_base, iov[0].iov_len); 566 590 if (num_frags == 3) { 567 - if (crypto_shash_update(qp->tx_ctx.mpa_crc_hd, 568 - (u8 *)iov[1].iov_base, 569 - iov[1].iov_len)) 570 - goto out; 591 + siw_crc_update(&qp->tx_ctx.mpa_crc, 592 + iov[1].iov_base, iov[1].iov_len); 571 593 } 572 - crypto_shash_final(qp->tx_ctx.mpa_crc_hd, (u8 *)&crc); 594 + siw_crc_final(&qp->tx_ctx.mpa_crc, (u8 *)&crc); 573 595 } 574 596 575 597 rv = kernel_sendmsg(s, &msg, iov, num_frags, len_terminate); ··· 572 604 rv == len_terminate ? "success" : "failure", 573 605 __rdmap_term_layer(term), __rdmap_term_etype(term), 574 606 __rdmap_term_ecode(term), rv); 575 - out: 576 607 kfree(term); 577 608 kfree(err_hdr); 578 609 } ··· 610 643 switch (attrs->state) { 611 644 case SIW_QP_STATE_RTS: 612 645 if (attrs->flags & SIW_MPA_CRC) { 613 - rv = siw_qp_enable_crc(qp); 614 - if (rv) 615 - break; 646 + siw_crc_init(&qp->tx_ctx.mpa_crc); 647 + qp->tx_ctx.mpa_crc_enabled = true; 648 + siw_crc_init(&qp->rx_stream.mpa_crc); 649 + qp->rx_stream.mpa_crc_enabled = true; 616 650 } 617 651 if (!(mask & SIW_QP_ATTR_LLP_HANDLE)) { 618 652 siw_dbg_qp(qp, "no socket\n");
+11 -12
drivers/infiniband/sw/siw/siw_qp_rx.c
··· 67 67 68 68 return -EFAULT; 69 69 } 70 - if (srx->mpa_crc_hd) { 70 + if (srx->mpa_crc_enabled) { 71 71 if (rdma_is_kernel_res(&rx_qp(srx)->base_qp.res)) { 72 - crypto_shash_update(srx->mpa_crc_hd, 73 - (u8 *)(dest + pg_off), bytes); 72 + siw_crc_update(&srx->mpa_crc, dest + pg_off, 73 + bytes); 74 74 kunmap_atomic(dest); 75 75 } else { 76 76 kunmap_atomic(dest); ··· 114 114 115 115 return rv; 116 116 } 117 - if (srx->mpa_crc_hd) 118 - crypto_shash_update(srx->mpa_crc_hd, (u8 *)kva, len); 117 + if (srx->mpa_crc_enabled) 118 + siw_crc_update(&srx->mpa_crc, kva, len); 119 119 120 120 srx->skb_offset += len; 121 121 srx->skb_copied += len; ··· 966 966 if (srx->fpdu_part_rem) 967 967 return -EAGAIN; 968 968 969 - if (!srx->mpa_crc_hd) 969 + if (!srx->mpa_crc_enabled) 970 970 return 0; 971 971 972 972 if (srx->pad) 973 - crypto_shash_update(srx->mpa_crc_hd, tbuf, srx->pad); 973 + siw_crc_update(&srx->mpa_crc, tbuf, srx->pad); 974 974 /* 975 975 * CRC32 is computed, transmitted and received directly in NBO, 976 976 * so there's never a reason to convert byte order. 977 977 */ 978 - crypto_shash_final(srx->mpa_crc_hd, (u8 *)&crc_own); 978 + siw_crc_final(&srx->mpa_crc, (u8 *)&crc_own); 979 979 crc_in = (__force __wsum)srx->trailer.crc; 980 980 981 981 if (unlikely(crc_in != crc_own)) { ··· 1093 1093 * (tagged/untagged). E.g., a WRITE can get intersected by a SEND, 1094 1094 * but not by a READ RESPONSE etc. 1095 1095 */ 1096 - if (srx->mpa_crc_hd) { 1096 + if (srx->mpa_crc_enabled) { 1097 1097 /* 1098 1098 * Restart CRC computation 1099 1099 */ 1100 - crypto_shash_init(srx->mpa_crc_hd); 1101 - crypto_shash_update(srx->mpa_crc_hd, (u8 *)c_hdr, 1102 - srx->fpdu_part_rcvd); 1100 + siw_crc_init(&srx->mpa_crc); 1101 + siw_crc_update(&srx->mpa_crc, c_hdr, srx->fpdu_part_rcvd); 1103 1102 } 1104 1103 if (frx->more_ddp_segs) { 1105 1104 frx->first_ddp_seg = 0;
+19 -25
drivers/infiniband/sw/siw/siw_qp_tx.c
··· 248 248 /* 249 249 * Do complete CRC if enabled and short packet 250 250 */ 251 - if (c_tx->mpa_crc_hd && 252 - crypto_shash_digest(c_tx->mpa_crc_hd, (u8 *)&c_tx->pkt, 253 - c_tx->ctrl_len, (u8 *)crc) != 0) 254 - return -EINVAL; 251 + if (c_tx->mpa_crc_enabled) 252 + siw_crc_oneshot(&c_tx->pkt, c_tx->ctrl_len, (u8 *)crc); 255 253 c_tx->ctrl_len += MPA_CRC_SIZE; 256 254 257 255 return PKT_COMPLETE; ··· 480 482 iov[seg].iov_len = sge_len; 481 483 482 484 if (do_crc) 483 - crypto_shash_update(c_tx->mpa_crc_hd, 484 - iov[seg].iov_base, 485 - sge_len); 485 + siw_crc_update(&c_tx->mpa_crc, 486 + iov[seg].iov_base, sge_len); 486 487 sge_off += sge_len; 487 488 data_len -= sge_len; 488 489 seg++; ··· 513 516 iov[seg].iov_len = plen; 514 517 515 518 if (do_crc) 516 - crypto_shash_update( 517 - c_tx->mpa_crc_hd, 519 + siw_crc_update( 520 + &c_tx->mpa_crc, 518 521 iov[seg].iov_base, 519 522 plen); 520 523 } else if (do_crc) { 521 524 kaddr = kmap_local_page(p); 522 - crypto_shash_update(c_tx->mpa_crc_hd, 523 - kaddr + fp_off, 524 - plen); 525 + siw_crc_update(&c_tx->mpa_crc, 526 + kaddr + fp_off, plen); 525 527 kunmap_local(kaddr); 526 528 } 527 529 } else { ··· 532 536 533 537 page_array[seg] = ib_virt_dma_to_page(va); 534 538 if (do_crc) 535 - crypto_shash_update( 536 - c_tx->mpa_crc_hd, 537 - ib_virt_dma_to_ptr(va), 538 - plen); 539 + siw_crc_update(&c_tx->mpa_crc, 540 + ib_virt_dma_to_ptr(va), 541 + plen); 539 542 } 540 543 541 544 sge_len -= plen; ··· 571 576 if (c_tx->pad) { 572 577 *(u32 *)c_tx->trailer.pad = 0; 573 578 if (do_crc) 574 - crypto_shash_update(c_tx->mpa_crc_hd, 575 - (u8 *)&c_tx->trailer.crc - c_tx->pad, 576 - c_tx->pad); 579 + siw_crc_update(&c_tx->mpa_crc, 580 + (u8 *)&c_tx->trailer.crc - c_tx->pad, 581 + c_tx->pad); 577 582 } 578 - if (!c_tx->mpa_crc_hd) 583 + if (!c_tx->mpa_crc_enabled) 579 584 c_tx->trailer.crc = 0; 580 585 else if (do_crc) 581 - crypto_shash_final(c_tx->mpa_crc_hd, (u8 *)&c_tx->trailer.crc); 586 + siw_crc_final(&c_tx->mpa_crc, (u8 *)&c_tx->trailer.crc); 582 587 583 588 data_len = c_tx->bytes_unsent; 584 589 ··· 731 736 /* 732 737 * Init MPA CRC computation 733 738 */ 734 - if (c_tx->mpa_crc_hd) { 735 - crypto_shash_init(c_tx->mpa_crc_hd); 736 - crypto_shash_update(c_tx->mpa_crc_hd, (u8 *)&c_tx->pkt, 737 - c_tx->ctrl_len); 739 + if (c_tx->mpa_crc_enabled) { 740 + siw_crc_init(&c_tx->mpa_crc); 741 + siw_crc_update(&c_tx->mpa_crc, &c_tx->pkt, c_tx->ctrl_len); 738 742 c_tx->do_crc = 1; 739 743 } 740 744 }
-3
drivers/infiniband/sw/siw/siw_verbs.c
··· 631 631 } 632 632 up_write(&qp->state_lock); 633 633 634 - kfree(qp->tx_ctx.mpa_crc_hd); 635 - kfree(qp->rx_stream.mpa_crc_hd); 636 - 637 634 qp->scq = qp->rcq = NULL; 638 635 639 636 siw_qp_put(qp);
+4 -4
drivers/infiniband/ulp/iser/iscsi_iser.c
··· 393 393 * @task: iscsi task
394 394 * @sector: error sector if exists (output)
395 395 *
396 - * Return: zero if no data-integrity errors have occured
397 - * 0x1: data-integrity error occured in the guard-block
398 - * 0x2: data-integrity error occured in the reference tag
399 - * 0x3: data-integrity error occured in the application tag
396 + * Return: zero if no data-integrity errors have occurred
397 + * 0x1: data-integrity error occurred in the guard-block
398 + * 0x2: data-integrity error occurred in the reference tag
399 + * 0x3: data-integrity error occurred in the application tag
400 400 *
401 401 * In addition the error sector is marked.
402 402 */
+6 -1
drivers/net/ethernet/microsoft/mana/gdma_main.c
··· 337 337 mana_gd_ring_doorbell(gc, queue->gdma_dev->doorbell, queue->type, 338 338 queue->id, queue->head * GDMA_WQE_BU_SIZE, 0); 339 339 } 340 + EXPORT_SYMBOL_NS(mana_gd_wq_ring_doorbell, "NET_MANA"); 340 341 341 342 void mana_gd_ring_cq(struct gdma_queue *cq, u8 arm_bit) 342 343 { ··· 350 349 mana_gd_ring_doorbell(gc, cq->gdma_dev->doorbell, cq->type, cq->id, 351 350 head, arm_bit); 352 351 } 352 + EXPORT_SYMBOL_NS(mana_gd_ring_cq, "NET_MANA"); 353 353 354 354 static void mana_gd_process_eqe(struct gdma_queue *eq) 355 355 { ··· 896 894 kfree(queue); 897 895 return err; 898 896 } 897 + EXPORT_SYMBOL_NS(mana_gd_create_mana_wq_cq, "NET_MANA"); 899 898 900 899 void mana_gd_destroy_queue(struct gdma_context *gc, struct gdma_queue *queue) 901 900 { ··· 1071 1068 header->inline_oob_size_div4 = client_oob_size / sizeof(u32); 1072 1069 1073 1070 if (oob_in_sgl) { 1074 - WARN_ON_ONCE(!pad_data || wqe_req->num_sge < 2); 1071 + WARN_ON_ONCE(wqe_req->num_sge < 2); 1075 1072 1076 1073 header->client_oob_in_sgl = 1; 1077 1074 ··· 1178 1175 1179 1176 return 0; 1180 1177 } 1178 + EXPORT_SYMBOL_NS(mana_gd_post_work_request, "NET_MANA"); 1181 1179 1182 1180 int mana_gd_post_and_ring(struct gdma_queue *queue, 1183 1181 const struct gdma_wqe_request *wqe_req, ··· 1252 1248 1253 1249 return cqe_idx; 1254 1250 } 1251 + EXPORT_SYMBOL_NS(mana_gd_poll_cq, "NET_MANA"); 1255 1252 1256 1253 static irqreturn_t mana_gd_intr(int irq, void *arg) 1257 1254 {
+14 -8
drivers/net/ethernet/microsoft/mana/mana_en.c
··· 3179 3179 dev_dbg(dev, "%s succeeded\n", __func__); 3180 3180 } 3181 3181 3182 - struct net_device *mana_get_primary_netdev_rcu(struct mana_context *ac, u32 port_index) 3182 + struct net_device *mana_get_primary_netdev(struct mana_context *ac, 3183 + u32 port_index, 3184 + netdevice_tracker *tracker) 3183 3185 { 3184 3186 struct net_device *ndev; 3185 3187 3186 - RCU_LOCKDEP_WARN(!rcu_read_lock_held(), 3187 - "Taking primary netdev without holding the RCU read lock"); 3188 3188 if (port_index >= ac->num_ports) 3189 3189 return NULL; 3190 3190 3191 - /* When mana is used in netvsc, the upper netdevice should be returned. */ 3192 - if (ac->ports[port_index]->flags & IFF_SLAVE) 3193 - ndev = netdev_master_upper_dev_get_rcu(ac->ports[port_index]); 3194 - else 3191 + rcu_read_lock(); 3192 + 3193 + /* If mana is used in netvsc, the upper netdevice should be returned. */ 3194 + ndev = netdev_master_upper_dev_get_rcu(ac->ports[port_index]); 3195 + 3196 + /* If there is no upper device, use the parent Ethernet device */ 3197 + if (!ndev) 3195 3198 ndev = ac->ports[port_index]; 3199 + 3200 + netdev_hold(ndev, tracker, GFP_ATOMIC); 3201 + rcu_read_unlock(); 3196 3202 3197 3203 return ndev; 3198 3204 } 3199 - EXPORT_SYMBOL_NS(mana_get_primary_netdev_rcu, "NET_MANA"); 3205 + EXPORT_SYMBOL_NS(mana_get_primary_netdev, "NET_MANA");
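The renamed mana_get_primary_netdev() now takes its own reference with netdev_hold() inside an internal RCU section, so callers no longer wrap the call in rcu_read_lock() but must release the netdev when finished. A hypothetical kernel-context caller sketch (the function name and error choice are illustrative):

/* Hypothetical caller: pair the tracked reference with netdev_put(). */
static int use_primary_netdev(struct mana_context *ac)
{
	netdevice_tracker tracker;
	struct net_device *ndev;

	ndev = mana_get_primary_netdev(ac, 0, &tracker);
	if (!ndev)
		return -ENODEV;

	/* ... ndev is safe to use here without the RCU read lock ... */

	netdev_put(ndev, &tracker);
	return 0;
}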
+2 -2
include/linux/mlx5/device.h
··· 1533 1533 return MLX5_MIN_PKEY_TABLE_SIZE << pkey_sz; 1534 1534 } 1535 1535 1536 - #define MLX5_RDMA_RX_NUM_COUNTERS_PRIOS 2 1537 - #define MLX5_RDMA_TX_NUM_COUNTERS_PRIOS 1 1536 + #define MLX5_RDMA_RX_NUM_COUNTERS_PRIOS 6 1537 + #define MLX5_RDMA_TX_NUM_COUNTERS_PRIOS 4 1538 1538 #define MLX5_BY_PASS_NUM_REGULAR_PRIOS 16 1539 1539 #define MLX5_BY_PASS_NUM_DONT_TRAP_PRIOS 16 1540 1540 #define MLX5_BY_PASS_NUM_MULTICAST_PRIOS 1
+7
include/net/mana/gdma.h
··· 152 152 #define GDMA_MESSAGE_V1 1 153 153 #define GDMA_MESSAGE_V2 2 154 154 #define GDMA_MESSAGE_V3 3 155 + #define GDMA_MESSAGE_V4 4 155 156 156 157 struct gdma_general_resp { 157 158 struct gdma_resp_hdr hdr; ··· 779 778 780 779 enum gdma_pd_flags { 781 780 GDMA_PD_FLAG_INVALID = 0, 781 + GDMA_PD_FLAG_ALLOW_GPA_MR = 1, 782 782 }; 783 783 784 784 struct gdma_create_pd_req { ··· 805 803 };/* HW DATA */ 806 804 807 805 enum gdma_mr_type { 806 + /* 807 + * Guest Physical Address - MRs of this type allow access 808 + * to any DMA-mapped memory using bus-logical address 809 + */ 810 + GDMA_MR_TYPE_GPA = 1, 808 811 /* Guest Virtual Address - MRs of this type allow access 809 812 * to memory mapped by PTEs associated with this MR using a virtual 810 813 * address that is set up in the MST
+3 -1
include/net/mana/mana.h
··· 827 827 u32 doorbell_pg_id); 828 828 void mana_uncfg_vport(struct mana_port_context *apc); 829 829 830 - struct net_device *mana_get_primary_netdev_rcu(struct mana_context *ac, u32 port_index); 830 + struct net_device *mana_get_primary_netdev(struct mana_context *ac, 831 + u32 port_index, 832 + netdevice_tracker *tracker); 831 833 #endif /* _MANA_H */
+30
include/rdma/ib_ucaps.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ 2 + /* 3 + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved 4 + */ 5 + 6 + #ifndef _IB_UCAPS_H_ 7 + #define _IB_UCAPS_H_ 8 + 9 + #define UCAP_ENABLED(ucaps, type) (!!((ucaps) & (1U << (type)))) 10 + 11 + enum rdma_user_cap { 12 + RDMA_UCAP_MLX5_CTRL_LOCAL, 13 + RDMA_UCAP_MLX5_CTRL_OTHER_VHCA, 14 + RDMA_UCAP_MAX 15 + }; 16 + 17 + void ib_cleanup_ucaps(void); 18 + int ib_get_ucaps(int *fds, int fd_count, uint64_t *idx_mask); 19 + #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS) 20 + int ib_create_ucap(enum rdma_user_cap type); 21 + void ib_remove_ucap(enum rdma_user_cap type); 22 + #else 23 + static inline int ib_create_ucap(enum rdma_user_cap type) 24 + { 25 + return -EOPNOTSUPP; 26 + } 27 + static inline void ib_remove_ucap(enum rdma_user_cap type) {} 28 + #endif /* CONFIG_INFINIBAND_USER_ACCESS */ 29 + 30 + #endif /* _IB_UCAPS_H_ */
+28 -2
include/rdma/ib_verbs.h
··· 519 519 IB_PORT_ACTIVE_DEFER = 5 520 520 }; 521 521 522 + static inline const char *__attribute_const__ 523 + ib_port_state_to_str(enum ib_port_state state) 524 + { 525 + const char * const states[] = { 526 + [IB_PORT_NOP] = "NOP", 527 + [IB_PORT_DOWN] = "DOWN", 528 + [IB_PORT_INIT] = "INIT", 529 + [IB_PORT_ARMED] = "ARMED", 530 + [IB_PORT_ACTIVE] = "ACTIVE", 531 + [IB_PORT_ACTIVE_DEFER] = "ACTIVE_DEFER", 532 + }; 533 + 534 + if (state < ARRAY_SIZE(states)) 535 + return states[state]; 536 + return "UNKNOWN"; 537 + } 538 + 522 539 enum ib_port_phys_state { 523 540 IB_PORT_PHYS_STATE_SLEEP = 1, 524 541 IB_PORT_PHYS_STATE_POLLING = 2, ··· 1530 1513 struct ib_uverbs_file *ufile; 1531 1514 1532 1515 struct ib_rdmacg_object cg_obj; 1516 + u64 enabled_caps; 1533 1517 /* 1534 1518 * Implementation details of the RDMA core, don't use in drivers: 1535 1519 */ ··· 2644 2626 * @counter - The counter to be bound. If counter->id is zero then 2645 2627 * the driver needs to allocate a new counter and set counter->id 2646 2628 */ 2647 - int (*counter_bind_qp)(struct rdma_counter *counter, struct ib_qp *qp); 2629 + int (*counter_bind_qp)(struct rdma_counter *counter, struct ib_qp *qp, 2630 + u32 port); 2648 2631 /** 2649 2632 * counter_unbind_qp - Unbind the qp from the dynamically-allocated 2650 2633 * counter and bind it onto the default one 2651 2634 */ 2652 - int (*counter_unbind_qp)(struct ib_qp *qp); 2635 + int (*counter_unbind_qp)(struct ib_qp *qp, u32 port); 2653 2636 /** 2654 2637 * counter_dealloc -De-allocate the hw counter 2655 2638 */ ··· 2665 2646 * counter_update_stats - Query the stats value of this counter 2666 2647 */ 2667 2648 int (*counter_update_stats)(struct rdma_counter *counter); 2649 + 2650 + /** 2651 + * counter_init - Initialize the driver specific rdma counter struct. 2652 + */ 2653 + void (*counter_init)(struct rdma_counter *counter); 2668 2654 2669 2655 /** 2670 2656 * Allows rdma drivers to add their own restrack attributes ··· 2722 2698 DECLARE_RDMA_OBJ_SIZE(ib_srq); 2723 2699 DECLARE_RDMA_OBJ_SIZE(ib_ucontext); 2724 2700 DECLARE_RDMA_OBJ_SIZE(ib_xrcd); 2701 + DECLARE_RDMA_OBJ_SIZE(rdma_counter); 2725 2702 }; 2726 2703 2727 2704 struct ib_core_device { ··· 2775 2750 * It is a NULL terminated array. 2776 2751 */ 2777 2752 const struct attribute_group *groups[4]; 2753 + u8 hw_stats_attr_index; 2778 2754 2779 2755 u64 uverbs_cmd_mask; 2780 2756
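ib_port_state_to_str() above uses the kernel's usual designated-initializer string table plus a bounds check. A userspace copy of the idiom (this version also NULL-checks the slot, an extra guard against gaps if the enum ever becomes sparse):

#include <stdio.h>

enum port_state { NOP, DOWN, INIT, ARMED, ACTIVE, ACTIVE_DEFER };

static const char *port_state_str(unsigned int s)
{
	static const char * const names[] = {
		[NOP] = "NOP",		[DOWN] = "DOWN",
		[INIT] = "INIT",	[ARMED] = "ARMED",
		[ACTIVE] = "ACTIVE",	[ACTIVE_DEFER] = "ACTIVE_DEFER",
	};

	if (s < sizeof(names) / sizeof(names[0]) && names[s])
		return names[s];
	return "UNKNOWN";
}

int main(void)
{
	/* prints "ACTIVE UNKNOWN" */
	printf("%s %s\n", port_state_str(ACTIVE), port_state_str(42));
	return 0;
}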
+5 -2
include/rdma/rdma_counter.h
··· 23 23 enum rdma_nl_counter_mode mode; 24 24 enum rdma_nl_counter_mask mask; 25 25 struct auto_mode_param param; 26 + bool bind_opcnt; 26 27 }; 27 28 28 29 struct rdma_port_counter { ··· 48 47 void rdma_counter_release(struct ib_device *dev); 49 48 int rdma_counter_set_auto_mode(struct ib_device *dev, u32 port, 50 49 enum rdma_nl_counter_mask mask, 50 + bool bind_opcnt, 51 51 struct netlink_ext_ack *extack); 52 52 int rdma_counter_bind_qp_auto(struct ib_qp *qp, u32 port); 53 - int rdma_counter_unbind_qp(struct ib_qp *qp, bool force); 53 + int rdma_counter_unbind_qp(struct ib_qp *qp, u32 port, bool force); 54 54 55 55 int rdma_counter_query_stats(struct rdma_counter *counter); 56 56 u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u32 port, u32 index); ··· 63 61 u32 qp_num, u32 counter_id); 64 62 int rdma_counter_get_mode(struct ib_device *dev, u32 port, 65 63 enum rdma_nl_counter_mode *mode, 66 - enum rdma_nl_counter_mask *mask); 64 + enum rdma_nl_counter_mask *mask, 65 + bool *opcnt); 67 66 68 67 int rdma_counter_modify(struct ib_device *dev, u32 port, 69 68 unsigned int index, bool enable);
+1 -1
include/rdma/uverbs_std_types.h
··· 34 34 static inline void *_uobj_get_obj_read(struct ib_uobject *uobj) 35 35 { 36 36 if (IS_ERR(uobj)) 37 - return NULL; 37 + return ERR_CAST(uobj); 38 38 return uobj->object; 39 39 } 40 40 #define uobj_get_obj_read(_object, _type, _id, _attrs) \
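Returning ERR_CAST(uobj) instead of NULL in _uobj_get_obj_read() keeps the errno encoded by rdma_lookup_get_uobject() alive, so callers can tell, say, -ENOENT from -EBUSY. A userspace model of the ERR_PTR pointer-encoding convention this relies on:

#include <stdio.h>

#define MAX_ERRNO	4095

/* Encode a small negative errno in the pointer value itself; any
 * pointer in the top MAX_ERRNO bytes of the address space is an error. */
static inline void *ERR_PTR(long err)	  { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (unsigned long)p >= (unsigned long)-MAX_ERRNO;
}

int main(void)
{
	void *obj = ERR_PTR(-2);	/* -ENOENT from a failed lookup */

	if (IS_ERR(obj))
		printf("propagated errno: %ld\n", PTR_ERR(obj));
	return 0;
}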
+1
include/uapi/rdma/ib_user_ioctl_cmds.h
··· 88 88 enum uverbs_attrs_get_context_attr_ids { 89 89 UVERBS_ATTR_GET_CONTEXT_NUM_COMP_VECTORS, 90 90 UVERBS_ATTR_GET_CONTEXT_CORE_SUPPORT, 91 + UVERBS_ATTR_GET_CONTEXT_FD_ARR, 91 92 }; 92 93 93 94 enum uverbs_attrs_query_context_attr_ids {
+1
include/uapi/rdma/mlx5_user_ioctl_cmds.h
··· 239 239 MLX5_IB_ATTR_FLOW_MATCHER_MATCH_CRITERIA, 240 240 MLX5_IB_ATTR_FLOW_MATCHER_FLOW_FLAGS, 241 241 MLX5_IB_ATTR_FLOW_MATCHER_FT_TYPE, 242 + MLX5_IB_ATTR_FLOW_MATCHER_IB_PORT, 242 243 }; 243 244 244 245 enum mlx5_ib_flow_matcher_destroy_attrs {
+2
include/uapi/rdma/mlx5_user_ioctl_verbs.h
··· 45 45 MLX5_IB_UAPI_FLOW_TABLE_TYPE_FDB = 0x2, 46 46 MLX5_IB_UAPI_FLOW_TABLE_TYPE_RDMA_RX = 0x3, 47 47 MLX5_IB_UAPI_FLOW_TABLE_TYPE_RDMA_TX = 0x4, 48 + MLX5_IB_UAPI_FLOW_TABLE_TYPE_RDMA_TRANSPORT_RX = 0x5, 49 + MLX5_IB_UAPI_FLOW_TABLE_TYPE_RDMA_TRANSPORT_TX = 0x6, 48 50 }; 49 51 50 52 enum mlx5_ib_uapi_flow_action_packet_reformat_type {
+2
include/uapi/rdma/rdma_netlink.h
··· 580 580 RDMA_NLDEV_ATTR_EVENT_TYPE, /* u8 */ 581 581 582 582 RDMA_NLDEV_SYS_ATTR_MONITOR_MODE, /* u8 */ 583 + 584 + RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED, /* u8 */ 583 585 /* 584 586 * Always the end 585 587 */