Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cxl/region: Add inject and clear poison by region offset

Add CXL region debugfs attributes to inject and clear poison based
on an offset into the region. These new interfaces allow users to
operate on poison at the region level without needing to resolve
Device Physical Addresses (DPA) or target individual memdevs.

The implementation uses a new helper, region_offset_to_dpa_result(),
that applies decoder interleave logic, including XOR-based address
decoding when applicable. XOR decodes rely on driver-internal
xormaps that are not exposed to userspace, so this support does more
than simplify poison operations that could already be done with the
existing per-memdev interfaces: it also enables poison inject and
clear for XOR-interleaved regions for the first time.
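
As a rough illustration of the interleave decode step, the sketch below
models plain modulo interleave in userspace C. It is not the kernel
helper: decode_region_offset(), struct decode_result, and the parameters
ways and gran are hypothetical names, XOR decoding and the extended
linear cache are ignored, and the returned DPA offset is relative to the
endpoint decoder's DPA base rather than an absolute DPA.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical result type: which interleave position owns the offset,
 * and the offset into that device's share of the region.
 */
struct decode_result {
	unsigned int pos;
	uint64_t dpa_offset;
};

/*
 * Simplified modulo interleave decode (assumption: no XOR math).
 * The region is striped across `ways` devices in chunks of `gran` bytes.
 */
struct decode_result decode_region_offset(uint64_t offset, unsigned int ways,
					  uint64_t gran)
{
	struct decode_result res;
	uint64_t chunk;

	assert(ways > 0 && gran > 0);

	chunk = offset / gran;			/* index of the gran-sized chunk */
	res.pos = (unsigned int)(chunk % ways);	/* device that holds the chunk */
	/* Chunks before this one on the same device, plus the byte within it. */
	res.dpa_offset = (chunk / ways) * gran + offset % gran;
	return res;
}
```

With two devices and a 256-byte granularity, region offset 0x300 falls
in the fourth chunk, which this model places on position 1 at device
offset 0x100.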

New debugfs attributes are added in /sys/kernel/debug/cxl/regionX/:
inject_poison and clear_poison. These are only exposed if all memdevs
participating in the region support both inject and clear commands,
ensuring consistent and reliable behavior across multi-device regions.
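
A test harness might drive these attributes along the lines of the
sketch below. The helper names region_poison_path() and
write_region_offset() are hypothetical, and the sketch assumes the
attribute accepts a 0x-prefixed hexadecimal offset, that debugfs is
mounted at /sys/kernel/debug, and that the caller has permission to
write there.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build "/sys/kernel/debug/cxl/<region>/<attr>".
 * Returns 0 on success, -1 if the buffer is too small.
 */
int region_poison_path(char *buf, size_t len, const char *region,
		       const char *attr)
{
	int n;

	assert(buf && region && attr);
	n = snprintf(buf, len, "/sys/kernel/debug/cxl/%s/%s", region, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a region offset as 0x-prefixed hex to the given attribute path.
 * Returns 0 on success, -1 on failure.
 */
int write_region_offset(const char *path, uint64_t offset)
{
	FILE *f = fopen(path, "w");
	int rc;

	if (!f)
		return -1;
	rc = fprintf(f, "0x%" PRIx64 "\n", offset) < 0 ? -1 : 0;
	if (fclose(f) != 0)	/* a rejected write may only surface on flush */
		rc = -1;
	return rc;
}
```

A caller would build the path for, say, "region0" and "inject_poison",
then pass it to write_region_offset() with the offset to poison. As the
ABI entries in this commit stress, this belongs on test systems only.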

If tracing is enabled, these operations are logged as cxl_poison
events in /sys/kernel/tracing/trace.

The ABI documentation warns users of the significant risks that
come with using these capabilities.

A CXL Maturity Map update shows this user flow is now supported.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/f3fd8628ab57ea79704fb2d645902cd499c066af.1754290144.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Authored by Alison Schofield and committed by Dave Jiang
c3dd6768 25a02078

+228 -4
+87
Documentation/ABI/testing/debugfs-cxl
···
 		is returned to the user. The inject_poison attribute is only
 		visible for devices supporting the capability.
 
+		TEST-ONLY INTERFACE: This interface is intended for testing
+		and validation purposes only. It is not a data repair mechanism
+		and should never be used on production systems or live data.
+
+		DATA LOSS RISK: For CXL persistent memory (PMEM) devices,
+		poison injection can result in permanent data loss. Injected
+		poison may render data permanently inaccessible even after
+		clearing, as the clear operation writes zeros and does not
+		recover original data.
+
+		SYSTEM STABILITY RISK: For volatile memory, poison injection
+		can cause kernel crashes, system instability, or unpredictable
+		behavior if the poisoned addresses are accessed by running code
+		or critical kernel structures.
 
 What:		/sys/kernel/debug/cxl/memX/clear_poison
 Date:		April, 2023
···
 		device cannot clear poison from the address, -ENXIO is returned.
 		The clear_poison attribute is only visible for devices
 		supporting the capability.
+
+		TEST-ONLY INTERFACE: This interface is intended for testing
+		and validation purposes only. It is not a data repair mechanism
+		and should never be used on production systems or live data.
+
+		CLEAR IS NOT DATA RECOVERY: This operation writes zeros to the
+		specified address range and removes the address from the poison
+		list. It does NOT recover or restore original data that may have
+		been present before poison injection. Any original data at the
+		cleared address is permanently lost and replaced with zeros.
+
+		CLEAR IS NOT A REPAIR MECHANISM: This interface is for testing
+		purposes only and should not be used as a data repair tool.
+		Clearing poison is fundamentally different from data recovery
+		or error correction.
+
+What:		/sys/kernel/debug/cxl/regionX/inject_poison
+Date:		August, 2025
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Host Physical Address (HPA) is written to this
+		attribute, the region driver translates it to a Device
+		Physical Address (DPA) and identifies the corresponding
+		memdev. It then sends an inject poison command to that memdev
+		at the translated DPA. Refer to the memdev ABI entry at:
+		/sys/kernel/debug/cxl/memX/inject_poison for the detailed
+		behavior. This attribute is only visible if all memdevs
+		participating in the region support both inject and clear
+		poison commands.
+
+		TEST-ONLY INTERFACE: This interface is intended for testing
+		and validation purposes only. It is not a data repair mechanism
+		and should never be used on production systems or live data.
+
+		DATA LOSS RISK: For CXL persistent memory (PMEM) devices,
+		poison injection can result in permanent data loss. Injected
+		poison may render data permanently inaccessible even after
+		clearing, as the clear operation writes zeros and does not
+		recover original data.
+
+		SYSTEM STABILITY RISK: For volatile memory, poison injection
+		can cause kernel crashes, system instability, or unpredictable
+		behavior if the poisoned addresses are accessed by running code
+		or critical kernel structures.
+
+What:		/sys/kernel/debug/cxl/regionX/clear_poison
+Date:		August, 2025
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Host Physical Address (HPA) is written to this
+		attribute, the region driver translates it to a Device
+		Physical Address (DPA) and identifies the corresponding
+		memdev. It then sends a clear poison command to that memdev
+		at the translated DPA. Refer to the memdev ABI entry at:
+		/sys/kernel/debug/cxl/memX/clear_poison for the detailed
+		behavior. This attribute is only visible if all memdevs
+		participating in the region support both inject and clear
+		poison commands.
+
+		TEST-ONLY INTERFACE: This interface is intended for testing
+		and validation purposes only. It is not a data repair mechanism
+		and should never be used on production systems or live data.
+
+		CLEAR IS NOT DATA RECOVERY: This operation writes zeros to the
+		specified address range and removes the address from the poison
+		list. It does NOT recover or restore original data that may have
+		been present before poison injection. Any original data at the
+		cleared address is permanently lost and replaced with zeros.
+
+		CLEAR IS NOT A REPAIR MECHANISM: This interface is for testing
+		purposes only and should not be used as a data repair tool.
+		Clearing poison is fundamentally different from data recovery
+		or error correction.
 
 What:		/sys/kernel/debug/cxl/einj_types
 Date:		January, 2024
+1 -1
Documentation/driver-api/cxl/maturity-map.rst
···
 User Flow Support
 -----------------
 
-* [0] Inject & clear poison by HPA
+* [2] Inject & clear poison by region offset
 
 Details
 =======
+4
drivers/cxl/core/core.h
···
 	CXL_POISON_TRACE_CLEAR,
 };
 
+enum poison_cmd_enabled_bits;
+bool cxl_memdev_has_poison_cmd(struct cxl_memdev *cxlmd,
+			       enum poison_cmd_enabled_bits cmd);
+
 long cxl_pci_get_latency(struct pci_dev *pdev);
 int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c);
 int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr,
+8
drivers/cxl/core/memdev.c
···
 static struct device_attribute dev_attr_security_erase =
 	__ATTR(erase, 0200, NULL, security_erase_store);
 
+bool cxl_memdev_has_poison_cmd(struct cxl_memdev *cxlmd,
+			       enum poison_cmd_enabled_bits cmd)
+{
+	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+
+	return test_bit(cmd, mds->poison.enabled_cmds);
+}
+
 static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
 {
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+128 -3
drivers/cxl/core/region.c
···
 /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
 #include <linux/memregion.h>
 #include <linux/genalloc.h>
+#include <linux/debugfs.h>
 #include <linux/device.h>
 #include <linux/module.h>
 #include <linux/memory.h>
···
 	u64 dpa;
 };
 
-static int __maybe_unused region_offset_to_dpa_result(struct cxl_region *cxlr,
-						      u64 offset,
-						      struct dpa_result *result)
+static int region_offset_to_dpa_result(struct cxl_region *cxlr, u64 offset,
+				       struct dpa_result *result)
 {
 	struct cxl_region_params *p = &cxlr->params;
 	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
···
 	unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
 }
 
+static void remove_debugfs(void *dentry)
+{
+	debugfs_remove_recursive(dentry);
+}
+
+static int validate_region_offset(struct cxl_region *cxlr, u64 offset)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	resource_size_t region_size;
+	u64 hpa;
+
+	if (offset < p->cache_size) {
+		dev_err(&cxlr->dev,
+			"Offset %#llx is within extended linear cache %#llx\n",
+			offset, p->cache_size);
+		return -EINVAL;
+	}
+
+	region_size = resource_size(p->res);
+	if (offset >= region_size) {
+		dev_err(&cxlr->dev, "Offset %#llx exceeds region size %#llx\n",
+			offset, region_size);
+		return -EINVAL;
+	}
+
+	hpa = p->res->start + offset;
+	if (hpa < p->res->start || hpa > p->res->end) {
+		dev_err(&cxlr->dev, "HPA %#llx not in region %pr\n", hpa,
+			p->res);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int cxl_region_debugfs_poison_inject(void *data, u64 offset)
+{
+	struct dpa_result result = { .dpa = ULLONG_MAX, .cxlmd = NULL };
+	struct cxl_region *cxlr = data;
+	int rc;
+
+	ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
+	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &region_rwsem)))
+		return rc;
+
+	ACQUIRE(rwsem_read_intr, dpa_rwsem)(&cxl_rwsem.dpa);
+	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &dpa_rwsem)))
+		return rc;
+
+	if (validate_region_offset(cxlr, offset))
+		return -EINVAL;
+
+	rc = region_offset_to_dpa_result(cxlr, offset, &result);
+	if (rc || !result.cxlmd || result.dpa == ULLONG_MAX) {
+		dev_dbg(&cxlr->dev,
+			"Failed to resolve DPA for region offset %#llx rc %d\n",
+			offset, rc);
+
+		return rc ? rc : -EINVAL;
+	}
+
+	return cxl_inject_poison_locked(result.cxlmd, result.dpa);
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_inject_fops, NULL,
+			 cxl_region_debugfs_poison_inject, "%llx\n");
+
+static int cxl_region_debugfs_poison_clear(void *data, u64 offset)
+{
+	struct dpa_result result = { .dpa = ULLONG_MAX, .cxlmd = NULL };
+	struct cxl_region *cxlr = data;
+	int rc;
+
+	ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
+	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &region_rwsem)))
+		return rc;
+
+	ACQUIRE(rwsem_read_intr, dpa_rwsem)(&cxl_rwsem.dpa);
+	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &dpa_rwsem)))
+		return rc;
+
+	if (validate_region_offset(cxlr, offset))
+		return -EINVAL;
+
+	rc = region_offset_to_dpa_result(cxlr, offset, &result);
+	if (rc || !result.cxlmd || result.dpa == ULLONG_MAX) {
+		dev_dbg(&cxlr->dev,
+			"Failed to resolve DPA for region offset %#llx rc %d\n",
+			offset, rc);
+
+		return rc ? rc : -EINVAL;
+	}
+
+	return cxl_clear_poison_locked(result.cxlmd, result.dpa);
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
+			 cxl_region_debugfs_poison_clear, "%llx\n");
+
 static int cxl_region_can_probe(struct cxl_region *cxlr)
 {
 	struct cxl_region_params *p = &cxlr->params;
···
 {
 	struct cxl_region *cxlr = to_cxl_region(dev);
 	struct cxl_region_params *p = &cxlr->params;
+	bool poison_supported = true;
 	int rc;
 
 	rc = cxl_region_can_probe(cxlr);
···
 	rc = devm_add_action_or_reset(&cxlr->dev, shutdown_notifiers, cxlr);
 	if (rc)
 		return rc;
+
+	/* Create poison attributes if all memdevs support the capabilities */
+	for (int i = 0; i < p->nr_targets; i++) {
+		struct cxl_endpoint_decoder *cxled = p->targets[i];
+		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+
+		if (!cxl_memdev_has_poison_cmd(cxlmd, CXL_POISON_ENABLED_INJECT) ||
+		    !cxl_memdev_has_poison_cmd(cxlmd, CXL_POISON_ENABLED_CLEAR)) {
+			poison_supported = false;
+			break;
+		}
+	}
+
+	if (poison_supported) {
+		struct dentry *dentry;
+
+		dentry = cxl_debugfs_create_dir(dev_name(dev));
+		debugfs_create_file("inject_poison", 0200, dentry, cxlr,
+				    &cxl_poison_inject_fops);
+		debugfs_create_file("clear_poison", 0200, dentry, cxlr,
+				    &cxl_poison_clear_fops);
+		rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
+		if (rc)
+			return rc;
+	}
 
 	switch (cxlr->mode) {
 	case CXL_PARTMODE_PMEM:
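
To see which offsets the bounds checks in validate_region_offset()
would turn away, the userspace model below may help. It is only a
sketch: struct region_model and region_offset_ok() are hypothetical
stand-ins for the kernel's region parameters and resource, and the
model returns -1 where the kernel code returns -EINVAL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the region fields the checks read. */
struct region_model {
	uint64_t start;		/* first HPA of the region */
	uint64_t size;		/* region size in bytes */
	uint64_t cache_size;	/* extended linear cache size, 0 if none */
};

/* Returns 0 if the offset may be translated, -1 otherwise. */
int region_offset_ok(const struct region_model *r, uint64_t offset)
{
	assert(r && r->size > 0);

	if (offset < r->cache_size)	/* lands in the extended linear cache */
		return -1;
	if (offset >= r->size)		/* past the end of the region */
		return -1;
	if (r->start + offset < r->start)	/* HPA wrapped around */
		return -1;
	return 0;
}
```

For a 1 GiB region fronted by a 256 MiB extended linear cache, this
model rejects offsets below 0x10000000 and at or above 0x40000000, and
accepts everything in between.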