Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'support-direct-read-from-region'

Jacob Keller says:

====================
support direct read from region

A long time ago when initially implementing devlink regions in ice I
proposed the ability to allow reading from a region without taking a
snapshot [1]. I eventually dropped this work from the original series due to
its size, and then lost track of submitting the follow-up.

This can be useful when interacting with a region that has definitive
"contents" from which snapshots are made. For example, the ice driver has
regions representing the contents of the device flash.

If userspace wants to read the contents today, it must first take a snapshot
and then read from that snapshot. This makes sense if you want to read a
large portion of data or you want to be sure reads are consistently from the
same recording of the flash.

However, if userspace only wants to read a small chunk, it must first
generate a snapshot of the entire contents, perform a read from the
snapshot, and then delete the snapshot after reading.

For such a use case, a direct read from the region makes more sense. This
can be achieved by allowing the devlink region read command to work without
a snapshot. Instead the portion to be read can be forwarded directly to the
driver via a new .read callback.

This avoids the need to read the entire region contents into memory first
and avoids the software overhead of creating a snapshot and then deleting
it.

This series implements such behavior and hooks up the ice NVM and shadow RAM
regions to allow it.

[1] https://lore.kernel.org/netdev/20200130225913.1671982-1-jacob.e.keller@intel.com/
====================

Link: https://lore.kernel.org/r/20221128203647.1198669-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+215 -80
+11 -2
Documentation/networking/devlink/ice.rst
···
     * - ``nvm-flash``
       - The contents of the entire flash chip, sometimes referred to as
         the device's Non Volatile Memory.
+    * - ``shadow-ram``
+      - The contents of the Shadow RAM, which is loaded from the beginning
+        of the flash. Although the contents are primarily from the flash,
+        this area also contains data generated during device boot which is
+        not stored in flash.
     * - ``device-caps``
       - The contents of the device firmware's capabilities buffer. Useful to
         determine the current state and configuration of the device.

-Users can request an immediate capture of a snapshot via the
-``DEVLINK_CMD_REGION_NEW``
+Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a
+snapshot. The ``device-caps`` region requires a snapshot as the contents are
+sent by firmware and can't be split into separate reads.
+
+Users can request an immediate capture of a snapshot for all three regions
+via the ``DEVLINK_CMD_REGION_NEW`` command.

 .. code:: shell

+66 -46
drivers/net/ethernet/intel/ice/ice_devlink.c
···

 #define ICE_DEVLINK_READ_BLK_SIZE (1024 * 1024)

+static const struct devlink_region_ops ice_nvm_region_ops;
+static const struct devlink_region_ops ice_sram_region_ops;
+
 /**
  * ice_devlink_nvm_snapshot - Capture a snapshot of the NVM flash contents
  * @devlink: the devlink instance
- * @ops: the devlink region being snapshotted
+ * @ops: the devlink region to snapshot
  * @extack: extended ACK response structure
  * @data: on exit points to snapshot data buffer
  *
- * This function is called in response to the DEVLINK_CMD_REGION_TRIGGER for
- * the nvm-flash devlink region. It captures a snapshot of the full NVM flash
- * contents, including both banks of flash. This snapshot can later be viewed
- * via the devlink-region interface.
+ * This function is called in response to a DEVLINK_CMD_REGION_NEW for either
+ * the nvm-flash or shadow-ram region.
  *
- * It captures the flash using the FLASH_ONLY bit set when reading via
- * firmware, so it does not read the current Shadow RAM contents. For that,
- * use the shadow-ram region.
+ * It captures a snapshot of the NVM or Shadow RAM flash contents. This
+ * snapshot can then later be viewed via the DEVLINK_CMD_REGION_READ netlink
+ * interface.
  *
  * @returns zero on success, and updates the data pointer. Returns a non-zero
  * error code on failure.
···
 	struct ice_pf *pf = devlink_priv(devlink);
 	struct device *dev = ice_pf_to_dev(pf);
 	struct ice_hw *hw = &pf->hw;
+	bool read_shadow_ram;
 	u8 *nvm_data, *tmp, i;
 	u32 nvm_size, left;
 	s8 num_blks;
 	int status;

-	nvm_size = hw->flash.flash_size;
+	if (ops == &ice_nvm_region_ops) {
+		read_shadow_ram = false;
+		nvm_size = hw->flash.flash_size;
+	} else if (ops == &ice_sram_region_ops) {
+		read_shadow_ram = true;
+		nvm_size = hw->flash.sr_words * 2u;
+	} else {
+		NL_SET_ERR_MSG_MOD(extack, "Unexpected region in snapshot function");
+		return -EOPNOTSUPP;
+	}
+
 	nvm_data = vzalloc(nvm_size);
 	if (!nvm_data)
 		return -ENOMEM;
-

 	num_blks = DIV_ROUND_UP(nvm_size, ICE_DEVLINK_READ_BLK_SIZE);
 	tmp = nvm_data;
···
 		}

 		status = ice_read_flat_nvm(hw, i * ICE_DEVLINK_READ_BLK_SIZE,
-					   &read_sz, tmp, false);
+					   &read_sz, tmp, read_shadow_ram);
 		if (status) {
 			dev_dbg(dev, "ice_read_flat_nvm failed after reading %u bytes, err %d aq_err %d\n",
 				read_sz, status, hw->adminq.sq_last_status);
···
 }

 /**
- * ice_devlink_sram_snapshot - Capture a snapshot of the Shadow RAM contents
+ * ice_devlink_nvm_read - Read a portion of NVM flash contents
  * @devlink: the devlink instance
- * @ops: the devlink region being snapshotted
+ * @ops: the devlink region to snapshot
  * @extack: extended ACK response structure
- * @data: on exit points to snapshot data buffer
+ * @offset: the offset to start at
+ * @size: the amount to read
+ * @data: the data buffer to read into
  *
- * This function is called in response to the DEVLINK_CMD_REGION_TRIGGER for
- * the shadow-ram devlink region. It captures a snapshot of the shadow ram
- * contents. This snapshot can later be viewed via the devlink-region
- * interface.
+ * This function is called in response to DEVLINK_CMD_REGION_READ to directly
+ * read a section of the NVM contents.
+ *
+ * It reads from either the nvm-flash or shadow-ram region contents.
  *
  * @returns zero on success, and updates the data pointer. Returns a non-zero
  * error code on failure.
  */
-static int
-ice_devlink_sram_snapshot(struct devlink *devlink,
-			  const struct devlink_region_ops __always_unused *ops,
-			  struct netlink_ext_ack *extack, u8 **data)
+static int ice_devlink_nvm_read(struct devlink *devlink,
+				const struct devlink_region_ops *ops,
+				struct netlink_ext_ack *extack,
+				u64 offset, u32 size, u8 *data)
 {
 	struct ice_pf *pf = devlink_priv(devlink);
 	struct device *dev = ice_pf_to_dev(pf);
 	struct ice_hw *hw = &pf->hw;
-	u8 *sram_data;
-	u32 sram_size;
-	int err;
+	bool read_shadow_ram;
+	u64 nvm_size;
+	int status;

-	sram_size = hw->flash.sr_words * 2u;
-	sram_data = vzalloc(sram_size);
-	if (!sram_data)
-		return -ENOMEM;
+	if (ops == &ice_nvm_region_ops) {
+		read_shadow_ram = false;
+		nvm_size = hw->flash.flash_size;
+	} else if (ops == &ice_sram_region_ops) {
+		read_shadow_ram = true;
+		nvm_size = hw->flash.sr_words * 2u;
+	} else {
+		NL_SET_ERR_MSG_MOD(extack, "Unexpected region in snapshot function");
+		return -EOPNOTSUPP;
+	}

-	err = ice_acquire_nvm(hw, ICE_RES_READ);
-	if (err) {
+	if (offset + size >= nvm_size) {
+		NL_SET_ERR_MSG_MOD(extack, "Cannot read beyond the region size");
+		return -ERANGE;
+	}
+
+	status = ice_acquire_nvm(hw, ICE_RES_READ);
+	if (status) {
 		dev_dbg(dev, "ice_acquire_nvm failed, err %d aq_err %d\n",
-			err, hw->adminq.sq_last_status);
+			status, hw->adminq.sq_last_status);
 		NL_SET_ERR_MSG_MOD(extack, "Failed to acquire NVM semaphore");
-		vfree(sram_data);
-		return err;
+		return -EIO;
 	}

-	/* Read from the Shadow RAM, rather than directly from NVM */
-	err = ice_read_flat_nvm(hw, 0, &sram_size, sram_data, true);
-	if (err) {
+	status = ice_read_flat_nvm(hw, (u32)offset, &size, data,
+				   read_shadow_ram);
+	if (status) {
 		dev_dbg(dev, "ice_read_flat_nvm failed after reading %u bytes, err %d aq_err %d\n",
-			sram_size, err, hw->adminq.sq_last_status);
-		NL_SET_ERR_MSG_MOD(extack,
-				   "Failed to read Shadow RAM contents");
+			size, status, hw->adminq.sq_last_status);
+		NL_SET_ERR_MSG_MOD(extack, "Failed to read NVM contents");
 		ice_release_nvm(hw);
-		vfree(sram_data);
-		return err;
+		return -EIO;
 	}
-
 	ice_release_nvm(hw);
-
-	*data = sram_data;

 	return 0;
 }
···
 	.name = "nvm-flash",
 	.destructor = vfree,
 	.snapshot = ice_devlink_nvm_snapshot,
+	.read = ice_devlink_nvm_read,
 };

 static const struct devlink_region_ops ice_sram_region_ops = {
 	.name = "shadow-ram",
 	.destructor = vfree,
-	.snapshot = ice_devlink_sram_snapshot,
+	.snapshot = ice_devlink_nvm_snapshot,
+	.read = ice_devlink_nvm_read,
 };

 static const struct devlink_region_ops ice_devcaps_region_ops = {
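The size selection and range check in ice_devlink_nvm_read can be modelled in plain userspace C. The sketch below is not driver code: `struct flash_info`, `enum region_kind`, `region_size()` and `check_range()` are hypothetical names, and only the field names `flash_size` and `sr_words` and the `>=` comparison are taken from the diff above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two regions served by the read callback. */
enum region_kind { REGION_NVM_FLASH, REGION_SHADOW_RAM };

/* Hypothetical flash description; field names follow the patch. */
struct flash_info {
	uint32_t flash_size; /* full flash size in bytes */
	uint16_t sr_words;   /* Shadow RAM size in 16-bit words */
};

/* Region size in bytes, computed the same way as in the patch. */
static uint64_t region_size(const struct flash_info *f, enum region_kind kind)
{
	assert(f != NULL);
	if (kind == REGION_SHADOW_RAM)
		return f->sr_words * 2u;
	return f->flash_size;
}

/*
 * Same comparison as the patch: returns 0 when the request is accepted
 * and -ERANGE when offset + size >= region_bytes.
 */
static int check_range(uint64_t offset, uint32_t size, uint64_t region_bytes)
{
	if (offset + size >= region_bytes)
		return -ERANGE;
	return 0;
}
```

With a 128-byte region, `check_range(0, 16, 128)` is accepted, while `check_range(112, 16, 128)` returns `-ERANGE` because the request ends exactly at the region boundary. The commit text does not say whether rejecting that final-chunk case is intended.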
+16
include/net/devlink.h
···
  *            the data variable must be updated to point to the snapshot data.
  *            The function will be called while the devlink instance lock is
  *            held.
+ * @read: callback to directly read a portion of the region. On success,
+ *        the data pointer will be updated with the contents of the
+ *        requested portion of the region. The function will be called
+ *        while the devlink instance lock is held.
  * @priv: Pointer to driver private data for the region operation
  */
 struct devlink_region_ops {
···
 			const struct devlink_region_ops *ops,
 			struct netlink_ext_ack *extack,
 			u8 **data);
+	int (*read)(struct devlink *devlink,
+		    const struct devlink_region_ops *ops,
+		    struct netlink_ext_ack *extack,
+		    u64 offset, u32 size, u8 *data);
 	void *priv;
 };

···
  *            the data variable must be updated to point to the snapshot data.
  *            The function will be called while the devlink instance lock is
  *            held.
+ * @read: callback to directly read a portion of the region. On success,
+ *        the data pointer will be updated with the contents of the
+ *        requested portion of the region. The function will be called
+ *        while the devlink instance lock is held.
  * @priv: Pointer to driver private data for the region operation
  */
 struct devlink_port_region_ops {
···
 			const struct devlink_port_region_ops *ops,
 			struct netlink_ext_ack *extack,
 			u8 **data);
+	int (*read)(struct devlink_port *port,
+		    const struct devlink_port_region_ops *ops,
+		    struct netlink_ext_ack *extack,
+		    u64 offset, u32 size, u8 *data);
 	void *priv;
 };

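To illustrate the shape of the new callback, here is a minimal userspace model of an ops table with an optional read hook. Every name in it (`model_region_ops`, `model_device`, `model_read`, `model_direct_read`) is hypothetical, and the kernel types (`struct devlink`, `struct netlink_ext_ack`) are left out. It only shows the contract described in the kernel-doc above: the caller supplies the buffer, and the callback fills `size` bytes starting at `offset`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of a region ops table; not the kernel structure. */
struct model_region_ops {
	const char *name;
	/* Fill data with size bytes starting at offset; 0 on success. */
	int (*read)(void *priv, uint64_t offset, uint32_t size, uint8_t *data);
	void *priv;
};

/* Hypothetical backing store standing in for device memory. */
struct model_device {
	const uint8_t *contents;
	uint64_t len;
};

/* Example read hook: copy the requested window out of the backing store. */
static int model_read(void *priv, uint64_t offset, uint32_t size, uint8_t *data)
{
	const struct model_device *dev = priv;

	assert(dev != NULL && data != NULL);
	if (offset > dev->len || size > dev->len - offset)
		return -ERANGE;
	memcpy(data, dev->contents + offset, size);
	return 0;
}

/* Core-side helper: refuse direct reads when no hook is registered. */
static int model_direct_read(const struct model_region_ops *ops,
			     uint64_t offset, uint32_t size, uint8_t *data)
{
	assert(ops != NULL);
	if (!ops->read)
		return -EOPNOTSUPP;
	return ops->read(ops->priv, offset, size, data);
}
```

A region that leaves `read` unset, as the ice `device-caps` region does according to the documentation change in this merge, would get `-EOPNOTSUPP` from `model_direct_read()` and could still be served through snapshots.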
+2
include/uapi/linux/devlink.h
···
 	DEVLINK_ATTR_RATE_TX_PRIORITY,		/* u32 */
 	DEVLINK_ATTR_RATE_TX_WEIGHT,		/* u32 */

+	DEVLINK_ATTR_REGION_DIRECT,		/* flag */
+
 	/* add new attributes above here, update the policy in devlink.c */

 	__DEVLINK_ATTR_MAX,
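The new flag attribute interacts with the existing snapshot id attribute in the region read handler changed by this merge. A rough model of that decision, using a hypothetical `pick_read_mode()` function and `enum read_mode` (neither exists in the kernel) and leaving out the lookup of the snapshot itself, could look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum read_mode { READ_FROM_SNAPSHOT, READ_DIRECT };

/*
 * Model of the attribute checks in the region read handler.
 * Returns 0 and sets *mode, or a negative errno value.
 */
static int pick_read_mode(bool has_snapshot_id, bool direct_flag,
			  bool region_has_read_cb, enum read_mode *mode)
{
	assert(mode != NULL);

	if (!has_snapshot_id) {
		if (!direct_flag)
			return -EINVAL;     /* no snapshot id provided */
		if (!region_has_read_cb)
			return -EOPNOTSUPP; /* region has no direct read */
		*mode = READ_DIRECT;
		return 0;
	}

	if (direct_flag)
		return -EINVAL;             /* direct read takes no snapshot */

	*mode = READ_FROM_SNAPSHOT;
	return 0;
}
```

The two attributes are mutually exclusive in this model: a request must carry exactly one of the snapshot id or the direct flag, which matches the error messages added to the handler.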
+107 -32
net/core/devlink.c
···
 }

 static int devlink_nl_cmd_region_read_chunk_fill(struct sk_buff *msg,
-						 struct devlink *devlink,
 						 u8 *chunk, u32 chunk_size,
 						 u64 addr)
 {
···

 #define DEVLINK_REGION_READ_CHUNK_SIZE 256

-static int devlink_nl_region_read_snapshot_fill(struct sk_buff *skb,
-						struct devlink *devlink,
-						struct devlink_region *region,
-						struct nlattr **attrs,
-						u64 start_offset,
-						u64 end_offset,
-						u64 *new_offset)
+typedef int devlink_chunk_fill_t(void *cb_priv, u8 *chunk, u32 chunk_size,
+				 u64 curr_offset,
+				 struct netlink_ext_ack *extack);
+
+static int
+devlink_nl_region_read_fill(struct sk_buff *skb, devlink_chunk_fill_t *cb,
+			    void *cb_priv, u64 start_offset, u64 end_offset,
+			    u64 *new_offset, struct netlink_ext_ack *extack)
 {
-	struct devlink_snapshot *snapshot;
 	u64 curr_offset = start_offset;
-	u32 snapshot_id;
 	int err = 0;
+	u8 *data;
+
+	/* Allocate and re-use a single buffer */
+	data = kmalloc(DEVLINK_REGION_READ_CHUNK_SIZE, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;

 	*new_offset = start_offset;

-	snapshot_id = nla_get_u32(attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]);
-	snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
-	if (!snapshot)
-		return -EINVAL;
-
 	while (curr_offset < end_offset) {
 		u32 data_size;
-		u8 *data;

-		if (end_offset - curr_offset < DEVLINK_REGION_READ_CHUNK_SIZE)
-			data_size = end_offset - curr_offset;
-		else
-			data_size = DEVLINK_REGION_READ_CHUNK_SIZE;
+		data_size = min_t(u32, end_offset - curr_offset,
+				  DEVLINK_REGION_READ_CHUNK_SIZE);

-		data = &snapshot->data[curr_offset];
-		err = devlink_nl_cmd_region_read_chunk_fill(skb, devlink,
-							    data, data_size,
-							    curr_offset);
+		err = cb(cb_priv, data, data_size, curr_offset, extack);
+		if (err)
+			break;
+
+		err = devlink_nl_cmd_region_read_chunk_fill(skb, data, data_size, curr_offset);
 		if (err)
 			break;

···
 	}
 	*new_offset = curr_offset;

+	kfree(data);
+
 	return err;
+}
+
+static int
+devlink_region_snapshot_fill(void *cb_priv, u8 *chunk, u32 chunk_size,
+			     u64 curr_offset,
+			     struct netlink_ext_ack __always_unused *extack)
+{
+	struct devlink_snapshot *snapshot = cb_priv;
+
+	memcpy(chunk, &snapshot->data[curr_offset], chunk_size);
+
+	return 0;
+}
+
+static int
+devlink_region_port_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size,
+				u64 curr_offset, struct netlink_ext_ack *extack)
+{
+	struct devlink_region *region = cb_priv;
+
+	return region->port_ops->read(region->port, region->port_ops, extack,
+				      curr_offset, chunk_size, chunk);
+}
+
+static int
+devlink_region_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size,
+			   u64 curr_offset, struct netlink_ext_ack *extack)
+{
+	struct devlink_region *region = cb_priv;
+
+	return region->ops->read(region->devlink, region->ops, extack,
+				 curr_offset, chunk_size, chunk);
 }

 static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
 					     struct netlink_callback *cb)
 {
 	const struct genl_dumpit_info *info = genl_dumpit_info(cb);
+	struct nlattr *chunks_attr, *region_attr, *snapshot_attr;
 	u64 ret_offset, start_offset, end_offset = U64_MAX;
 	struct nlattr **attrs = info->attrs;
 	struct devlink_port *port = NULL;
+	devlink_chunk_fill_t *region_cb;
 	struct devlink_region *region;
-	struct nlattr *chunks_attr;
 	const char *region_name;
 	struct devlink *devlink;
 	unsigned int index;
+	void *region_cb_priv;
 	void *hdr;
 	int err;

···

 	devl_lock(devlink);

-	if (!attrs[DEVLINK_ATTR_REGION_NAME] ||
-	    !attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) {
+	if (!attrs[DEVLINK_ATTR_REGION_NAME]) {
+		NL_SET_ERR_MSG(cb->extack, "No region name provided");
 		err = -EINVAL;
 		goto out_unlock;
 	}
···
 		}
 	}

-	region_name = nla_data(attrs[DEVLINK_ATTR_REGION_NAME]);
+	region_attr = attrs[DEVLINK_ATTR_REGION_NAME];
+	region_name = nla_data(region_attr);

 	if (port)
 		region = devlink_port_region_get_by_name(port, region_name);
···
 		region = devlink_region_get_by_name(devlink, region_name);

 	if (!region) {
+		NL_SET_ERR_MSG_ATTR(cb->extack, region_attr, "Requested region does not exist");
 		err = -EINVAL;
 		goto out_unlock;
+	}
+
+	snapshot_attr = attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID];
+	if (!snapshot_attr) {
+		if (!nla_get_flag(attrs[DEVLINK_ATTR_REGION_DIRECT])) {
+			NL_SET_ERR_MSG(cb->extack, "No snapshot id provided");
+			err = -EINVAL;
+			goto out_unlock;
+		}
+
+		if (!region->ops->read) {
+			NL_SET_ERR_MSG(cb->extack, "Requested region does not support direct read");
+			err = -EOPNOTSUPP;
+			goto out_unlock;
+		}
+
+		if (port)
+			region_cb = &devlink_region_port_direct_fill;
+		else
+			region_cb = &devlink_region_direct_fill;
+		region_cb_priv = region;
+	} else {
+		struct devlink_snapshot *snapshot;
+		u32 snapshot_id;
+
+		if (nla_get_flag(attrs[DEVLINK_ATTR_REGION_DIRECT])) {
+			NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "Direct region read does not use snapshot");
+			err = -EINVAL;
+			goto out_unlock;
+		}
+
+		snapshot_id = nla_get_u32(snapshot_attr);
+		snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id);
+		if (!snapshot) {
+			NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "Requested snapshot does not exist");
+			err = -EINVAL;
+			goto out_unlock;
+		}
+		region_cb = &devlink_region_snapshot_fill;
+		region_cb_priv = snapshot;
 	}

 	if (attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR] &&
···
 		goto nla_put_failure;
 	}

-	err = devlink_nl_region_read_snapshot_fill(skb, devlink,
-						   region, attrs,
-						   start_offset,
-						   end_offset, &ret_offset);
+	err = devlink_nl_region_read_fill(skb, region_cb, region_cb_priv,
+					  start_offset, end_offset, &ret_offset,
+					  cb->extack);

 	if (err && err != -EMSGSIZE)
 		goto nla_put_failure;
···
 	[DEVLINK_ATTR_SELFTESTS] = { .type = NLA_NESTED },
 	[DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32 },
 	[DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32 },
+	[DEVLINK_ATTR_REGION_DIRECT] = { .type = NLA_FLAG },
 };

 static const struct genl_small_ops devlink_nl_ops[] = {
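The refactored fill loop walks the requested range in fixed-size chunks and delegates the source of each chunk to a callback. The following userspace sketch models that pattern under some simplifying assumptions: `chunk_fill_t`, `snapshot_fill()` and `read_fill()` are hypothetical names, the `extack` argument is dropped, the reusable buffer lives on the stack instead of coming from kmalloc, and the `out` array stands in for the netlink message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 256u /* same value as DEVLINK_REGION_READ_CHUNK_SIZE */

/* Callback type modelled on devlink_chunk_fill_t, without extack. */
typedef int chunk_fill_t(void *cb_priv, uint8_t *chunk, uint32_t chunk_size,
			 uint64_t curr_offset);

/* Snapshot-style fill: copy from a buffer that already holds the data. */
static int snapshot_fill(void *cb_priv, uint8_t *chunk, uint32_t chunk_size,
			 uint64_t curr_offset)
{
	const uint8_t *snapshot = cb_priv;

	memcpy(chunk, snapshot + curr_offset, chunk_size);
	return 0;
}

/*
 * Walk [start, end) in CHUNK_SIZE pieces through one reusable buffer.
 * On return *new_offset holds the first offset that was not emitted.
 */
static int read_fill(chunk_fill_t *cb, void *cb_priv, uint64_t start,
		     uint64_t end, uint8_t *out, uint64_t *new_offset)
{
	uint8_t buf[CHUNK_SIZE];
	uint64_t curr = start;
	int err = 0;

	assert(cb != NULL && out != NULL && new_offset != NULL);
	*new_offset = start;

	while (curr < end) {
		uint64_t left = end - curr;
		uint32_t n = left < CHUNK_SIZE ? (uint32_t)left : CHUNK_SIZE;

		err = cb(cb_priv, buf, n, curr); /* source-specific step */
		if (err)
			break;

		memcpy(out + (curr - start), buf, n); /* "emit" the chunk */
		curr += n;
	}
	*new_offset = curr;

	return err;
}
```

Because the loop only sees the callback, a direct-read source can be plugged in by passing a different `chunk_fill_t` that fetches each chunk on demand, which is the role `devlink_region_direct_fill` plays in the diff above.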