Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'cxl-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull CXL updates from Dan Williams:
"CXL has mechanisms to enumerate the performance characteristics of
memory devices. Those mechanisms allow Linux to build the equivalent
of ACPI SRAT, SLIT, and HMAT tables dynamically at runtime. That
capability is necessary because static ACPI cannot represent dynamic
CXL configurations (and reconfigurations).

So, building on the v6.8 work to add "Quality of Service" enumeration,
this update plumbs CXL "access coordinates" (read/write access latency
and bandwidth) in all the same places that ACPI HMAT feeds similar
data. Follow-on patches from the -mm side can then use that data to
feed mechanisms like mm/memory-tiers.c. Greg has acked the touch to
drivers/base/.

The other feature update this cycle is support for CXL error injection
via the ACPI EINJ module. That facility enables injection of bus
protocol errors provided the user knows the magic address values to
insert in the interface. To hide that magic, and make this easier to
use, new error injection attributes were added to CXL debugfs. That
interface injects the errors relative to a CXL object rather than
requiring user tooling to know how to look up and inject RCRB (Root
Complex Register Block) addresses into the raw EINJ debugfs interface.
It received some helpful review comments from Tony, but no explicit
acks from the ACPI side. The primary user visible change for existing
EINJ users is that they may find that einj.ko was already loaded by
cxl_core.ko. Previously, einj.ko was only loaded on demand.

The usual collection of miscellaneous cleanups is also present this
cycle.

Summary:

- Supplement ACPI HMAT reported memory performance with native CXL
memory performance enumeration

- Add support for CXL error injection via the ACPI EINJ mechanism

- Cleanup CXL DOE and CDAT integration

- Miscellaneous cleanups and fixes"

* tag 'cxl-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits)
Documentation/ABI/testing/debugfs-cxl: Fix "Unexpected indentation"
lib/firmware_table: Provide buffer length argument to cdat_table_parse()
cxl/pci: Get rid of pointer arithmetic reading CDAT table
cxl/pci: Rename DOE mailbox handle to doe_mb
cxl: Fix the incorrect assignment of SSLBIS entry pointer initial location
cxl/core: Add CXL EINJ debugfs files
EINJ, Documentation: Update EINJ kernel doc
EINJ: Add CXL error type support
EINJ: Migrate to a platform driver
cxl/region: Deal with numa nodes not enumerated by SRAT
cxl/region: Add memory hotplug notifier for cxl region
cxl/region: Add sysfs attribute for locality attributes of CXL regions
cxl/region: Calculate performance data for a region
cxl: Set cxlmd->endpoint before adding port device
cxl: Move QoS class to be calculated from the nearest CPU
cxl: Split out host bridge access coordinates
cxl: Split out combine_coordinates() for common shared usage
ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access classes
ACPI: HMAT: Introduce 2 levels of generic port access class
base/node / ACPI: Enumerate node access class for 'struct access_coordinate'
...

+1004 -148
+34
Documentation/ABI/testing/debugfs-cxl
···
 device cannot clear poison from the address, -ENXIO is returned.
 The clear_poison attribute is only visible for devices
 supporting the capability.
+
+What:		/sys/kernel/debug/cxl/einj_types
+Date:		January, 2024
+KernelVersion:	v6.9
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) Prints the CXL protocol error types made available by
+		the platform in the format:
+
+			0x<error number>	<error type>
+
+		The possible error types are (as of ACPI v6.5):
+
+			0x1000	CXL.cache Protocol Correctable
+			0x2000	CXL.cache Protocol Uncorrectable non-fatal
+			0x4000	CXL.cache Protocol Uncorrectable fatal
+			0x8000	CXL.mem Protocol Correctable
+			0x10000	CXL.mem Protocol Uncorrectable non-fatal
+			0x20000	CXL.mem Protocol Uncorrectable fatal
+
+		The <error number> can be written to einj_inject to inject
+		<error type> into a chosen dport.
+
+What:		/sys/kernel/debug/cxl/$dport_dev/einj_inject
+Date:		January, 2024
+KernelVersion:	v6.9
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) Writing an integer to this file injects the corresponding
+		CXL protocol error into $dport_dev ($dport_dev will be a device
+		name from /sys/bus/pci/devices). The integer to type mapping for
+		injection can be found by reading from einj_types. If the dport
+		was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
+		a CXL 2.0 error is injected.
+34
Documentation/ABI/testing/sysfs-bus-cxl
···
 attribute is only visible for devices supporting the
 capability. The retrieved errors are logged as kernel
 events when cxl_poison event tracing is enabled.
+
+
+What:		/sys/bus/cxl/devices/regionZ/accessY/read_bandwidth
+		/sys/bus/cxl/devices/regionZ/accessY/write_bandwidth
+Date:		Jan, 2024
+KernelVersion:	v6.9
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) The aggregated read or write bandwidth of the region. The
+		number is the accumulated read or write bandwidth of all CXL
+		memory devices that contribute to the region, in MB/s. It is
+		identical to the data that should appear in
+		/sys/devices/system/node/nodeX/accessY/initiators/read_bandwidth or
+		/sys/devices/system/node/nodeX/accessY/initiators/write_bandwidth.
+		See Documentation/ABI/stable/sysfs-devices-node. access0 provides
+		the number to the closest initiator and access1 provides the
+		number to the closest CPU.
+
+
+What:		/sys/bus/cxl/devices/regionZ/accessY/read_latency
+		/sys/bus/cxl/devices/regionZ/accessY/write_latency
+Date:		Jan, 2024
+KernelVersion:	v6.9
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) The read or write latency of the region. The number is
+		the worst read or write latency of all CXL memory devices that
+		contribute to the region, in nanoseconds. It is identical to
+		the data that should appear in
+		/sys/devices/system/node/nodeX/accessY/initiators/read_latency or
+		/sys/devices/system/node/nodeX/accessY/initiators/write_latency.
+		See Documentation/ABI/stable/sysfs-devices-node. access0 provides
+		the number to the closest initiator and access1 provides the
+		number to the closest CPU.
+34
Documentation/firmware-guide/acpi/apei/einj.rst
···
 CONFIG_ACPI_APEI
 CONFIG_ACPI_APEI_EINJ

+...and to (optionally) enable CXL protocol error injection set::
+
+	CONFIG_ACPI_APEI_EINJ_CXL
+
 The EINJ user interface is in <debugfs mount point>/apei/einj.

 The following files belong to it:
···
 this actually works depends on what operations the BIOS actually
 includes in the trigger phase.

+CXL error types are supported from ACPI 6.5 onwards (given a CXL port
+is present). The EINJ user interface for CXL error types is at
+<debugfs mount point>/cxl. The following files belong to it:
+
+- einj_types:
+
+  Provides the same functionality as available_error_types above, but
+  for CXL error types
+
+- $dport_dev/einj_inject:
+
+  Injects a CXL error type into the CXL port represented by $dport_dev,
+  where $dport_dev is the name of the CXL port (usually a PCIe device name).
+  Error injections targeting a CXL 2.0+ port can use the legacy interface
+  under <debugfs mount point>/apei/einj, while CXL 1.1/1.0 port injections
+  must use this file.
+
+
 BIOS versions based on the ACPI 4.0 specification have limited options
 in controlling where the errors are injected. Your BIOS may support an
 extension (enabled with the param_extension=1 module parameter, or boot
···
 [22715.834759] EDAC sbridge MC3: ADDR 12345000 EDAC sbridge MC3: MISC 144780c86
 [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
 [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
+
+A CXL error injection example with $dport_dev=0000:e0:01.1::
+
+	# cd /sys/kernel/debug/cxl/
+	# ls
+	0000:e0:01.1 0000:0c:00.0
+	# cat einj_types		# See which errors can be injected
+	0x00008000	CXL.mem Protocol Correctable
+	0x00010000	CXL.mem Protocol Uncorrectable non-fatal
+	0x00020000	CXL.mem Protocol Uncorrectable fatal
+	# cd 0000:e0:01.1		# Navigate to dport to inject into
+	# echo 0x8000 > einj_inject	# Inject error

 Special notes for injection into SGX enclaves:
+1
MAINTAINERS
···
 L:	linux-cxl@vger.kernel.org
 S:	Maintained
 F:	drivers/cxl/
+F:	include/linux/cxl-einj.h
 F:	include/linux/cxl-event.h
 F:	include/uapi/linux/cxl_mem.h
 F:	tools/testing/cxl/
+13
drivers/acpi/apei/Kconfig
···
 mainly used for debugging and testing the other parts of
 APEI and some other RAS features.

+config ACPI_APEI_EINJ_CXL
+	bool "CXL Error INJection Support"
+	default ACPI_APEI_EINJ
+	depends on ACPI_APEI_EINJ
+	depends on CXL_BUS && CXL_BUS <= ACPI_APEI_EINJ
+	help
+	  Support for CXL protocol Error INJection through debugfs/cxl.
+	  Availability and which errors are supported is dependent on
+	  the host platform. Look to ACPI v6.5 section 18.6.4 and kernel
+	  EINJ documentation for more information.
+
+	  If unsure say 'n'
+
 config ACPI_APEI_ERST_DEBUG
 	tristate "APEI Error Record Serialization Table (ERST) Debug Support"
 	depends on ACPI_APEI
+2
drivers/acpi/apei/Makefile
···
 obj-$(CONFIG_ACPI_APEI)		+= apei.o
 obj-$(CONFIG_ACPI_APEI_GHES)	+= ghes.o
 obj-$(CONFIG_ACPI_APEI_EINJ)	+= einj.o
+einj-y := einj-core.o
+einj-$(CONFIG_ACPI_APEI_EINJ_CXL) += einj-cxl.o
 obj-$(CONFIG_ACPI_APEI_ERST_DEBUG) += erst-dbg.o

 apei-y := apei-base.o hest.o erst.o bert.o
+18
drivers/acpi/apei/apei-internal.h
···
 }

 int apei_osc_setup(void);
+
+int einj_get_available_error_type(u32 *type);
+int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3,
+		      u64 param4);
+int einj_cxl_rch_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
+			      u64 param3, u64 param4);
+bool einj_is_cxl_error_type(u64 type);
+int einj_validate_error_type(u64 type);
+
+#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE
+#define ACPI_EINJ_CXL_CACHE_CORRECTABLE		BIT(12)
+#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE	BIT(13)
+#define ACPI_EINJ_CXL_CACHE_FATAL		BIT(14)
+#define ACPI_EINJ_CXL_MEM_CORRECTABLE		BIT(15)
+#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE		BIT(16)
+#define ACPI_EINJ_CXL_MEM_FATAL			BIT(17)
+#endif
+
 #endif
+113
drivers/acpi/apei/einj-cxl.c
···
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * CXL Error INJection support. Used by CXL core to inject
+ * protocol errors into CXL ports.
+ *
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ *
+ * Author: Ben Cheatham <benjamin.cheatham@amd.com>
+ */
+#include <linux/einj-cxl.h>
+#include <linux/seq_file.h>
+#include <linux/pci.h>
+
+#include "apei-internal.h"
+
+/* Defined in einj-core.c */
+extern bool einj_initialized;
+
+static struct { u32 mask; const char *str; } const einj_cxl_error_type_string[] = {
+	{ ACPI_EINJ_CXL_CACHE_CORRECTABLE,	"CXL.cache Protocol Correctable" },
+	{ ACPI_EINJ_CXL_CACHE_UNCORRECTABLE,	"CXL.cache Protocol Uncorrectable non-fatal" },
+	{ ACPI_EINJ_CXL_CACHE_FATAL,		"CXL.cache Protocol Uncorrectable fatal" },
+	{ ACPI_EINJ_CXL_MEM_CORRECTABLE,	"CXL.mem Protocol Correctable" },
+	{ ACPI_EINJ_CXL_MEM_UNCORRECTABLE,	"CXL.mem Protocol Uncorrectable non-fatal" },
+	{ ACPI_EINJ_CXL_MEM_FATAL,		"CXL.mem Protocol Uncorrectable fatal" },
+};
+
+int einj_cxl_available_error_type_show(struct seq_file *m, void *v)
+{
+	int cxl_err, rc;
+	u32 available_error_type = 0;
+
+	rc = einj_get_available_error_type(&available_error_type);
+	if (rc)
+		return rc;
+
+	for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
+		cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
+
+		if (available_error_type & cxl_err)
+			seq_printf(m, "0x%08x\t%s\n",
+				   einj_cxl_error_type_string[pos].mask,
+				   einj_cxl_error_type_string[pos].str);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(einj_cxl_available_error_type_show, CXL);
+
+static int cxl_dport_get_sbdf(struct pci_dev *dport_dev, u64 *sbdf)
+{
+	struct pci_bus *pbus;
+	struct pci_host_bridge *bridge;
+	u64 seg = 0, bus;
+
+	pbus = dport_dev->bus;
+	bridge = pci_find_host_bridge(pbus);
+
+	if (!bridge)
+		return -ENODEV;
+
+	if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
+		seg = bridge->domain_nr;
+
+	bus = pbus->number;
+	*sbdf = (seg << 24) | (bus << 16) | dport_dev->devfn;
+
+	return 0;
+}
+
+int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
+{
+	int rc;
+
+	/* Only CXL error types can be specified */
+	if (!einj_is_cxl_error_type(type))
+		return -EINVAL;
+
+	rc = einj_validate_error_type(type);
+	if (rc)
+		return rc;
+
+	return einj_cxl_rch_error_inject(type, 0x2, rcrb, GENMASK_ULL(63, 0),
+					 0, 0);
+}
+EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_rch_error, CXL);
+
+int einj_cxl_inject_error(struct pci_dev *dport, u64 type)
+{
+	u64 param4 = 0;
+	int rc;
+
+	/* Only CXL error types can be specified */
+	if (!einj_is_cxl_error_type(type))
+		return -EINVAL;
+
+	rc = einj_validate_error_type(type);
+	if (rc)
+		return rc;
+
+	rc = cxl_dport_get_sbdf(dport, &param4);
+	if (rc)
+		return rc;
+
+	return einj_error_inject(type, 0x4, 0, 0, 0, param4);
+}
+EXPORT_SYMBOL_NS_GPL(einj_cxl_inject_error, CXL);
+
+bool einj_cxl_is_initialized(void)
+{
+	return einj_initialized;
+}
+EXPORT_SYMBOL_NS_GPL(einj_cxl_is_initialized, CXL);
+101 -21
drivers/acpi/apei/einj.c → drivers/acpi/apei/einj-core.c
···
 #include <linux/nmi.h>
 #include <linux/delay.h>
 #include <linux/mm.h>
+#include <linux/platform_device.h>
 #include <asm/unaligned.h>

 #include "apei-internal.h"
···
 #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \
			ACPI_EINJ_MEMORY_UNCORRECTABLE | \
			ACPI_EINJ_MEMORY_FATAL)
+#define CXL_ERROR_MASK (ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
+			ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
+			ACPI_EINJ_CXL_CACHE_FATAL | \
+			ACPI_EINJ_CXL_MEM_CORRECTABLE | \
+			ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
+			ACPI_EINJ_CXL_MEM_FATAL)

 /*
  * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
···
  */
 static DEFINE_MUTEX(einj_mutex);

+/*
+ * Exported APIs use this flag to exit early if einj_probe() failed.
+ */
+bool einj_initialized __ro_after_init;
+
 static void *einj_param;

 static void einj_exec_ctx_init(struct apei_exec_context *ctx)
···
 /* Get error injection capabilities of the platform */
-static int einj_get_available_error_type(u32 *type)
+int einj_get_available_error_type(u32 *type)
 {
	int rc;
···
 /* Inject the specified hardware error */
-static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
-			     u64 param3, u64 param4)
+int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3,
+		      u64 param4)
 {
	int rc;
	u64 base_addr, size;
···
	if (type & ACPI5_VENDOR_BIT) {
		if (vendor_flags != SETWA_FLAGS_MEM)
			goto inject;
-	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
+	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
		goto inject;
+	}
+
+	/*
+	 * Injections targeting a CXL 1.0/1.1 port have to be injected
+	 * via the einj_cxl_rch_error_inject() path as that does the proper
+	 * validation of the given RCRB base (MMIO) address.
+	 */
+	if (einj_is_cxl_error_type(type) && (flags & SETWA_FLAGS_MEM))
+		return -EINVAL;

	/*
	 * Disallow crazy address masks that give BIOS leeway to pick
···
	return rc;
 }

+int einj_cxl_rch_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
+			      u64 param3, u64 param4)
+{
+	int rc;
+
+	if (!(einj_is_cxl_error_type(type) && (flags & SETWA_FLAGS_MEM)))
+		return -EINVAL;
+
+	mutex_lock(&einj_mutex);
+	rc = __einj_error_inject(type, flags, param1, param2, param3, param4);
+	mutex_unlock(&einj_mutex);
+
+	return rc;
+}
+
 static u32 error_type;
 static u32 error_flags;
 static u64 error_param1;
···
	{ BIT(9), "Platform Correctable" },
	{ BIT(10), "Platform Uncorrectable non-fatal" },
	{ BIT(11), "Platform Uncorrectable fatal"},
-	{ BIT(12), "CXL.cache Protocol Correctable" },
-	{ BIT(13), "CXL.cache Protocol Uncorrectable non-fatal" },
-	{ BIT(14), "CXL.cache Protocol Uncorrectable fatal" },
-	{ BIT(15), "CXL.mem Protocol Correctable" },
-	{ BIT(16), "CXL.mem Protocol Uncorrectable non-fatal" },
-	{ BIT(17), "CXL.mem Protocol Uncorrectable fatal" },
	{ BIT(31), "Vendor Defined Error Types" },
 };
···
	return 0;
 }

-static int error_type_set(void *data, u64 val)
+bool einj_is_cxl_error_type(u64 type)
 {
+	return (type & CXL_ERROR_MASK) && (!(type & ACPI5_VENDOR_BIT));
+}
+
+int einj_validate_error_type(u64 type)
+{
+	u32 tval, vendor, available_error_type = 0;
	int rc;
-	u32 available_error_type = 0;
-	u32 tval, vendor;

	/* Only low 32 bits for error type are valid */
-	if (val & GENMASK_ULL(63, 32))
+	if (type & GENMASK_ULL(63, 32))
		return -EINVAL;

	/*
	 * Vendor defined types have 0x80000000 bit set, and
	 * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
	 */
-	vendor = val & ACPI5_VENDOR_BIT;
-	tval = val & 0x7fffffff;
+	vendor = type & ACPI5_VENDOR_BIT;
+	tval = type & GENMASK(30, 0);

	/* Only one error type can be specified */
	if (tval & (tval - 1))
···
	rc = einj_get_available_error_type(&available_error_type);
	if (rc)
		return rc;
-	if (!(val & available_error_type))
+	if (!(type & available_error_type))
		return -EINVAL;
	}
+
+	return 0;
+}
+
+static int error_type_set(void *data, u64 val)
+{
+	int rc;
+
+	rc = einj_validate_error_type(val);
+	if (rc)
+		return rc;
+
	error_type = val;

	return 0;
···
	return 0;
 }

-static int __init einj_init(void)
+static int __init einj_probe(struct platform_device *pdev)
 {
	int rc;
	acpi_status status;
	struct apei_exec_context ctx;

	if (acpi_disabled) {
-		pr_info("ACPI disabled.\n");
+		pr_debug("ACPI disabled.\n");
		return -ENODEV;
	}

	status = acpi_get_table(ACPI_SIG_EINJ, 0,
				(struct acpi_table_header **)&einj_tab);
	if (status == AE_NOT_FOUND) {
-		pr_warn("EINJ table not found.\n");
+		pr_debug("EINJ table not found.\n");
		return -ENODEV;
	} else if (ACPI_FAILURE(status)) {
		pr_err("Failed to get EINJ table: %s\n",
···
	return rc;
 }

-static void __exit einj_exit(void)
+static void __exit einj_remove(struct platform_device *pdev)
 {
	struct apei_exec_context ctx;
···
	apei_resources_fini(&einj_resources);
	debugfs_remove_recursive(einj_debug_dir);
	acpi_put_table((struct acpi_table_header *)einj_tab);
+}
+
+static struct platform_device *einj_dev;
+static struct platform_driver einj_driver = {
+	.remove_new = einj_remove,
+	.driver = {
+		.name = "acpi-einj",
+	},
+};
+
+static int __init einj_init(void)
+{
+	struct platform_device_info einj_dev_info = {
+		.name = "acpi-einj",
+		.id = -1,
+	};
+	int rc;
+
+	einj_dev = platform_device_register_full(&einj_dev_info);
+	if (IS_ERR(einj_dev))
+		return PTR_ERR(einj_dev);
+
+	rc = platform_driver_probe(&einj_driver, einj_probe);
+	einj_initialized = rc == 0;
+
+	return 0;
+}
+
+static void __exit einj_exit(void)
+{
+	if (einj_initialized)
+		platform_driver_unregister(&einj_driver);
+
+	platform_device_del(einj_dev);
 }

 module_init(einj_init);
+64 -19
drivers/acpi/numa/hmat.c
···
 };

 enum {
-	NODE_ACCESS_CLASS_0 = 0,
-	NODE_ACCESS_CLASS_1,
-	NODE_ACCESS_CLASS_GENPORT_SINK,
+	NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL = ACCESS_COORDINATE_MAX,
+	NODE_ACCESS_CLASS_GENPORT_SINK_CPU,
	NODE_ACCESS_CLASS_MAX,
 };
···
	struct node_cache_attrs cache_attrs;
	u8 gen_port_device_handle[ACPI_SRAT_DEVICE_HANDLE_SIZE];
	bool registered;
+	bool ext_updated;	/* externally updated */
 };

 struct memory_initiator {
···
 /**
  * acpi_get_genport_coordinates - Retrieve the access coordinates for a generic port
  * @uid: ACPI unique id
- * @coord: The access coordinates written back out for the generic port
+ * @coord: The access coordinates written back out for the generic port.
+ *	   Expect 2 levels array.
  *
  * Return: 0 on success. Errno on failure.
  *
···
	if (!target)
		return -ENOENT;

-	*coord = target->coord[NODE_ACCESS_CLASS_GENPORT_SINK];
+	coord[ACCESS_COORDINATE_LOCAL] =
+		target->coord[NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL];
+	coord[ACCESS_COORDINATE_CPU] =
+		target->coord[NODE_ACCESS_CLASS_GENPORT_SINK_CPU];

	return 0;
 }
···
	}
 }

+int hmat_update_target_coordinates(int nid, struct access_coordinate *coord,
+				   enum access_coordinate_class access)
+{
+	struct memory_target *target;
+	int pxm;
+
+	if (nid == NUMA_NO_NODE)
+		return -EINVAL;
+
+	pxm = node_to_pxm(nid);
+	guard(mutex)(&target_lock);
+	target = find_mem_target(pxm);
+	if (!target)
+		return -ENODEV;
+
+	hmat_update_target_access(target, ACPI_HMAT_READ_LATENCY,
+				  coord->read_latency, access);
+	hmat_update_target_access(target, ACPI_HMAT_WRITE_LATENCY,
+				  coord->write_latency, access);
+	hmat_update_target_access(target, ACPI_HMAT_READ_BANDWIDTH,
+				  coord->read_bandwidth, access);
+	hmat_update_target_access(target, ACPI_HMAT_WRITE_BANDWIDTH,
+				  coord->write_bandwidth, access);
+	target->ext_updated = true;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(hmat_update_target_coordinates);
+
 static __init void hmat_add_locality(struct acpi_hmat_locality *hmat_loc)
 {
	struct memory_locality *loc;
···
	if (target && target->processor_pxm == init_pxm) {
		hmat_update_target_access(target, type, value,
-					  NODE_ACCESS_CLASS_0);
+					  ACCESS_COORDINATE_LOCAL);
		/* If the node has a CPU, update access 1 */
		if (node_state(pxm_to_node(init_pxm), N_CPU))
			hmat_update_target_access(target, type, value,
-						  NODE_ACCESS_CLASS_1);
+						  ACCESS_COORDINATE_CPU);
	}
 }
···
	u32 best = 0;
	int i;

+	/* Don't update if an external agent has changed the data. */
+	if (target->ext_updated)
+		return;
+
	/* Don't update for generic port if there's no device handle */
-	if (access == NODE_ACCESS_CLASS_GENPORT_SINK &&
+	if ((access == NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL ||
+	     access == NODE_ACCESS_CLASS_GENPORT_SINK_CPU) &&
	    !(*(u16 *)target->gen_port_device_handle))
		return;
···
	 */
	if (target->processor_pxm != PXM_INVAL) {
		cpu_nid = pxm_to_node(target->processor_pxm);
-		if (access == 0 || node_state(cpu_nid, N_CPU)) {
+		if (access == ACCESS_COORDINATE_LOCAL ||
+		    node_state(cpu_nid, N_CPU)) {
			set_bit(target->processor_pxm, p_nodes);
			return;
		}
···
	list_for_each_entry(initiator, &initiators, node) {
		u32 value;

-		if (access == 1 && !initiator->has_cpu) {
+		if ((access == ACCESS_COORDINATE_CPU ||
+		     access == NODE_ACCESS_CLASS_GENPORT_SINK_CPU) &&
+		    !initiator->has_cpu) {
			clear_bit(initiator->processor_pxm, p_nodes);
			continue;
		}
···
	}
 }

-static void hmat_register_generic_target_initiators(struct memory_target *target)
+static void hmat_update_generic_target(struct memory_target *target)
 {
	static DECLARE_BITMAP(p_nodes, MAX_NUMNODES);

-	__hmat_register_target_initiators(target, p_nodes,
-					  NODE_ACCESS_CLASS_GENPORT_SINK);
+	hmat_update_target_attrs(target, p_nodes,
+				 NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL);
+	hmat_update_target_attrs(target, p_nodes,
+				 NODE_ACCESS_CLASS_GENPORT_SINK_CPU);
 }

 static void hmat_register_target_initiators(struct memory_target *target)
 {
	static DECLARE_BITMAP(p_nodes, MAX_NUMNODES);

-	__hmat_register_target_initiators(target, p_nodes, 0);
-	__hmat_register_target_initiators(target, p_nodes, 1);
+	__hmat_register_target_initiators(target, p_nodes,
+					  ACCESS_COORDINATE_LOCAL);
+	__hmat_register_target_initiators(target, p_nodes,
+					  ACCESS_COORDINATE_CPU);
 }

 static void hmat_register_target_cache(struct memory_target *target)
···
	 */
	mutex_lock(&target_lock);
	if (*(u16 *)target->gen_port_device_handle) {
-		hmat_register_generic_target_initiators(target);
+		hmat_update_generic_target(target);
		target->registered = true;
	}
	mutex_unlock(&target_lock);
···
	if (!target->registered) {
		hmat_register_target_initiators(target);
		hmat_register_target_cache(target);
-		hmat_register_target_perf(target, NODE_ACCESS_CLASS_0);
-		hmat_register_target_perf(target, NODE_ACCESS_CLASS_1);
+		hmat_register_target_perf(target, ACCESS_COORDINATE_LOCAL);
+		hmat_register_target_perf(target, ACCESS_COORDINATE_CPU);
		target->registered = true;
	}
	mutex_unlock(&target_lock);
···
		return NOTIFY_OK;

	mutex_lock(&target_lock);
-	hmat_update_target_attrs(target, p_nodes, 1);
+	hmat_update_target_attrs(target, p_nodes, ACCESS_COORDINATE_CPU);
	mutex_unlock(&target_lock);

	perf = &target->coord[1];
+11
drivers/acpi/numa/srat.c
···
 unsigned char acpi_srat_revision __initdata;
 static int acpi_numa __initdata;

+static int last_real_pxm;
+
 void __init disable_srat(void)
 {
	acpi_numa = -1;
···
		if (node_to_pxm_map[i] > fake_pxm)
			fake_pxm = node_to_pxm_map[i];
	}
+	last_real_pxm = fake_pxm;
	fake_pxm++;
	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
			      &fake_pxm);
···
		return -ENOENT;
	return 0;
 }
+
+bool acpi_node_backed_by_real_pxm(int nid)
+{
+	int pxm = node_to_pxm(nid);
+
+	return pxm <= last_real_pxm;
+}
+EXPORT_SYMBOL_GPL(acpi_node_backed_by_real_pxm);

 static int acpi_get_pxm(acpi_handle h)
 {
+1 -1
drivers/acpi/tables.c
···
	count = acpi_parse_entries_array(id, table_size,
					(union fw_table_header *)table_header,
-					proc, proc_num, max_entries);
+					0, proc, proc_num, max_entries);

	acpi_put_table(table_header);
	return count;
+4 -3
drivers/base/node.c
···
 }

 static struct node_access_nodes *node_init_node_access(struct node *node,
-						       unsigned int access)
+						       enum access_coordinate_class access)
 {
	struct node_access_nodes *access_node;
	struct device *dev;
···
  * @access: The access class the for the given attributes
  */
 void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord,
-			 unsigned int access)
+			 enum access_coordinate_class access)
 {
	struct node_access_nodes *c;
	struct node *node;
···
		}
	}
 }
+EXPORT_SYMBOL_GPL(node_set_perf_attrs);

 /**
  * struct node_cache_info - Internal tracking for memory node caches
···
  */
 int register_memory_node_under_compute_node(unsigned int mem_nid,
					    unsigned int cpu_nid,
-					    unsigned int access)
+					    enum access_coordinate_class access)
 {
	struct node *init_node, *targ_node;
	struct node_access_nodes *initiator, *target;
+5 -3
drivers/cxl/acpi.c
···
	if (kstrtou32(acpi_device_uid(hb), 0, &uid))
		return -EINVAL;

-	rc = acpi_get_genport_coordinates(uid, &dport->hb_coord);
+	rc = acpi_get_genport_coordinates(uid, dport->hb_coord);
	if (rc < 0)
		return rc;

	/* Adjust back to picoseconds from nanoseconds */
-	dport->hb_coord.read_latency *= 1000;
-	dport->hb_coord.write_latency *= 1000;
+	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
+		dport->hb_coord[i].read_latency *= 1000;
+		dport->hb_coord[i].write_latency *= 1000;
+	}

	return 0;
 }
+138 -32
drivers/cxl/core/cdat.c
···
 #include "cxlmem.h"
 #include "core.h"
 #include "cxl.h"
+#include "core.h"

 struct dsmas_entry {
	struct range dpa_range;
···
	int rc;

	rc = cdat_table_parse(ACPI_CDAT_TYPE_DSMAS, cdat_dsmas_handler,
-			      dsmas_xa, port->cdat.table);
+			      dsmas_xa, port->cdat.table, port->cdat.length);
	rc = cdat_table_parse_output(rc);
	if (rc)
		return rc;

	rc = cdat_table_parse(ACPI_CDAT_TYPE_DSLBIS, cdat_dslbis_handler,
-			      dsmas_xa, port->cdat.table);
+			      dsmas_xa, port->cdat.table, port->cdat.length);
	return cdat_table_parse_output(rc);
 }

 static int cxl_port_perf_data_calculate(struct cxl_port *port,
					struct xarray *dsmas_xa)
 {
-	struct access_coordinate c;
+	struct access_coordinate ep_c;
+	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
	struct dsmas_entry *dent;
	int valid_entries = 0;
	unsigned long index;
	int rc;

-	rc = cxl_endpoint_get_perf_coordinates(port, &c);
+	rc = cxl_endpoint_get_perf_coordinates(port, &ep_c);
	if (rc) {
-		dev_dbg(&port->dev, "Failed to retrieve perf coordinates.\n");
+		dev_dbg(&port->dev, "Failed to retrieve ep perf coordinates.\n");
+		return rc;
+	}
+
+	rc = cxl_hb_get_perf_coordinates(port, coord);
+	if (rc) {
+		dev_dbg(&port->dev, "Failed to retrieve hb perf coordinates.\n");
		return rc;
	}
···
	xa_for_each(dsmas_xa, index, dent) {
		int qos_class;

-		dent->coord.read_latency = dent->coord.read_latency +
-					   c.read_latency;
-		dent->coord.write_latency = dent->coord.write_latency +
-					    c.write_latency;
-		dent->coord.read_bandwidth = min_t(int, c.read_bandwidth,
-						   dent->coord.read_bandwidth);
-		dent->coord.write_bandwidth = min_t(int, c.write_bandwidth,
-						    dent->coord.write_bandwidth);
-
+		cxl_coordinates_combine(&dent->coord, &dent->coord, &ep_c);
+		/*
+		 * Keeping the host bridge coordinates separate from the dsmas
+		 * coordinates in order to allow calculation of access class
+		 * 0 and 1 for region later.
+		 */
+		cxl_coordinates_combine(&coord[ACCESS_COORDINATE_CPU],
+					&coord[ACCESS_COORDINATE_CPU],
+					&dent->coord);
		dent->entries = 1;
-		rc = cxl_root->ops->qos_class(cxl_root, &dent->coord, 1,
-					      &qos_class);
+		rc = cxl_root->ops->qos_class(cxl_root,
+					      &coord[ACCESS_COORDINATE_CPU],
+					      1, &qos_class);
		if (rc != 1)
			continue;
···
 static int cdat_sslbis_handler(union acpi_subtable_headers *header, void *arg,
			       const unsigned long end)
 {
+	struct acpi_cdat_sslbis_table {
+		struct acpi_cdat_header header;
+		struct acpi_cdat_sslbis sslbis_header;
+		struct acpi_cdat_sslbe entries[];
+	} *tbl = (struct acpi_cdat_sslbis_table *)header;
+	int size = sizeof(header->cdat) + sizeof(tbl->sslbis_header);
	struct acpi_cdat_sslbis *sslbis;
-	int size = sizeof(header->cdat) + sizeof(*sslbis);
	struct cxl_port *port = arg;
	struct device *dev = &port->dev;
-	struct acpi_cdat_sslbe *entry;
	int remain, entries, i;
	u16 len;

	len = le16_to_cpu((__force __le16)header->cdat.length);
	remain = len - size;
-	if (!remain || remain % sizeof(*entry) ||
+	if (!remain || remain % sizeof(tbl->entries[0]) ||
	    (unsigned long)header + len > end) {
		dev_warn(dev, "Malformed SSLBIS table length: (%u)\n", len);
		return -EINVAL;
	}

-	/* Skip common header */
-	sslbis = (struct acpi_cdat_sslbis *)((unsigned long)header +
-					     sizeof(header->cdat));
-
+	sslbis = &tbl->sslbis_header;
	/* Unrecognized data type, we can skip */
	if (sslbis->data_type > ACPI_HMAT_WRITE_BANDWIDTH)
		return 0;

-	entries = remain / sizeof(*entry);
-	entry = (struct acpi_cdat_sslbe *)((unsigned long)header + sizeof(*sslbis));
+	entries = remain / sizeof(tbl->entries[0]);
+	if (struct_size(tbl, entries, entries) != len)
+		return -EINVAL;

	for (i = 0; i < entries; i++) {
-		u16 x = le16_to_cpu((__force __le16)entry->portx_id);
-		u16 y = le16_to_cpu((__force __le16)entry->porty_id);
+		u16 x = le16_to_cpu((__force __le16)tbl->entries[i].portx_id);
+		u16 y = le16_to_cpu((__force __le16)tbl->entries[i].porty_id);
		__le64 le_base;
		__le16 le_val;
		struct cxl_dport *dport;
···
			break;
		}

-		le_base = (__force __le64)sslbis->entry_base_unit;
-		le_val = (__force __le16)entry->latency_or_bandwidth;
+		le_base = (__force __le64)tbl->sslbis_header.entry_base_unit;
+		le_val = (__force __le16)tbl->entries[i].latency_or_bandwidth;

		if (check_mul_overflow(le64_to_cpu(le_base),
				       le16_to_cpu(le_val), &val))
···
					  sslbis->data_type,
					  val);
		}
-
-		entry++;
	}

	return 0;
···
		return;

	rc = cdat_table_parse(ACPI_CDAT_TYPE_SSLBIS, cdat_sslbis_handler,
-			      port, port->cdat.table);
+			      port, port->cdat.table, port->cdat.length);
	rc = cdat_table_parse_output(rc);
	if (rc)
		dev_dbg(&port->dev, "Failed to parse SSLBIS: %d\n", rc);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_switch_parse_cdat, CXL);

+/**
+ * cxl_coordinates_combine - Combine the two input coordinates
+ *
+ * @out: Output coordinate of c1 and c2 combined
+ * @c1: input coordinates
+ * @c2: input coordinates
+ */
+void cxl_coordinates_combine(struct access_coordinate *out,
+			     struct access_coordinate *c1,
+			     struct access_coordinate *c2)
+{
+	if (c1->write_bandwidth && c2->write_bandwidth)
+		out->write_bandwidth = min(c1->write_bandwidth,
out->write_latency = c1->write_latency + c2->write_latency; 502 + 503 + if (c1->read_bandwidth && c2->read_bandwidth) 504 + out->read_bandwidth = min(c1->read_bandwidth, 505 + c2->read_bandwidth); 506 + out->read_latency = c1->read_latency + c2->read_latency; 507 + } 508 + 496 509 MODULE_IMPORT_NS(CXL); 510 + 511 + void cxl_region_perf_data_calculate(struct cxl_region *cxlr, 512 + struct cxl_endpoint_decoder *cxled) 513 + { 514 + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); 515 + struct cxl_port *port = cxlmd->endpoint; 516 + struct cxl_dev_state *cxlds = cxlmd->cxlds; 517 + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds); 518 + struct access_coordinate hb_coord[ACCESS_COORDINATE_MAX]; 519 + struct access_coordinate coord; 520 + struct range dpa = { 521 + .start = cxled->dpa_res->start, 522 + .end = cxled->dpa_res->end, 523 + }; 524 + struct cxl_dpa_perf *perf; 525 + int rc; 526 + 527 + switch (cxlr->mode) { 528 + case CXL_DECODER_RAM: 529 + perf = &mds->ram_perf; 530 + break; 531 + case CXL_DECODER_PMEM: 532 + perf = &mds->pmem_perf; 533 + break; 534 + default: 535 + return; 536 + } 537 + 538 + lockdep_assert_held(&cxl_dpa_rwsem); 539 + 540 + if (!range_contains(&perf->dpa_range, &dpa)) 541 + return; 542 + 543 + rc = cxl_hb_get_perf_coordinates(port, hb_coord); 544 + if (rc) { 545 + dev_dbg(&port->dev, "Failed to retrieve hb perf coordinates.\n"); 546 + return; 547 + } 548 + 549 + for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) { 550 + /* Pickup the host bridge coords */ 551 + cxl_coordinates_combine(&coord, &hb_coord[i], &perf->coord); 552 + 553 + /* Get total bandwidth and the worst latency for the cxl region */ 554 + cxlr->coord[i].read_latency = max_t(unsigned int, 555 + cxlr->coord[i].read_latency, 556 + coord.read_latency); 557 + cxlr->coord[i].write_latency = max_t(unsigned int, 558 + cxlr->coord[i].write_latency, 559 + coord.write_latency); 560 + cxlr->coord[i].read_bandwidth += coord.read_bandwidth; 561 + cxlr->coord[i].write_bandwidth += 
coord.write_bandwidth; 562 + 563 + /* 564 + * Convert latency to nanosec from picosec to be consistent 565 + * with the resulting latency coordinates computed by the 566 + * HMAT_REPORTING code. 567 + */ 568 + cxlr->coord[i].read_latency = 569 + DIV_ROUND_UP(cxlr->coord[i].read_latency, 1000); 570 + cxlr->coord[i].write_latency = 571 + DIV_ROUND_UP(cxlr->coord[i].write_latency, 1000); 572 + } 573 + } 574 + 575 + int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr, 576 + enum access_coordinate_class access) 577 + { 578 + return hmat_update_target_coordinates(nid, &cxlr->coord[access], access); 579 + } 580 + 581 + bool cxl_need_node_perf_attrs_update(int nid) 582 + { 583 + return !acpi_node_backed_by_real_pxm(nid); 584 + }
+4
drivers/cxl/core/core.h
···
 long cxl_pci_get_latency(struct pci_dev *pdev);
 
+int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr,
+				       enum access_coordinate_class access);
+bool cxl_need_node_perf_attrs_update(int nid);
+
 #endif /* __CXL_CORE_H__ */
+55 -44
drivers/cxl/core/pci.c
··· 518 518 FIELD_PREP(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, (entry_handle))) 519 519 520 520 static int cxl_cdat_get_length(struct device *dev, 521 - struct pci_doe_mb *cdat_doe, 521 + struct pci_doe_mb *doe_mb, 522 522 size_t *length) 523 523 { 524 524 __le32 request = CDAT_DOE_REQ(0); 525 525 __le32 response[2]; 526 526 int rc; 527 527 528 - rc = pci_doe(cdat_doe, PCI_DVSEC_VENDOR_ID_CXL, 528 + rc = pci_doe(doe_mb, PCI_DVSEC_VENDOR_ID_CXL, 529 529 CXL_DOE_PROTOCOL_TABLE_ACCESS, 530 530 &request, sizeof(request), 531 531 &response, sizeof(response)); ··· 543 543 } 544 544 545 545 static int cxl_cdat_read_table(struct device *dev, 546 - struct pci_doe_mb *cdat_doe, 547 - void *cdat_table, size_t *cdat_length) 546 + struct pci_doe_mb *doe_mb, 547 + struct cdat_doe_rsp *rsp, size_t *length) 548 548 { 549 - size_t length = *cdat_length + sizeof(__le32); 550 - __le32 *data = cdat_table; 551 - int entry_handle = 0; 549 + size_t received, remaining = *length; 550 + unsigned int entry_handle = 0; 551 + union cdat_data *data; 552 552 __le32 saved_dw = 0; 553 553 554 554 do { 555 555 __le32 request = CDAT_DOE_REQ(entry_handle); 556 - struct cdat_entry_header *entry; 557 - size_t entry_dw; 558 556 int rc; 559 557 560 - rc = pci_doe(cdat_doe, PCI_DVSEC_VENDOR_ID_CXL, 558 + rc = pci_doe(doe_mb, PCI_DVSEC_VENDOR_ID_CXL, 561 559 CXL_DOE_PROTOCOL_TABLE_ACCESS, 562 560 &request, sizeof(request), 563 - data, length); 561 + rsp, sizeof(*rsp) + remaining); 564 562 if (rc < 0) { 565 563 dev_err(dev, "DOE failed: %d", rc); 566 564 return rc; 567 565 } 568 566 569 - /* 1 DW Table Access Response Header + CDAT entry */ 570 - entry = (struct cdat_entry_header *)(data + 1); 571 - if ((entry_handle == 0 && 572 - rc != sizeof(__le32) + sizeof(struct cdat_header)) || 573 - (entry_handle > 0 && 574 - (rc < sizeof(__le32) + sizeof(*entry) || 575 - rc != sizeof(__le32) + le16_to_cpu(entry->length)))) 567 + if (rc < sizeof(*rsp)) 576 568 return -EIO; 569 + 570 + data = (union cdat_data *)rsp->data; 
571 + received = rc - sizeof(*rsp); 572 + 573 + if (entry_handle == 0) { 574 + if (received != sizeof(data->header)) 575 + return -EIO; 576 + } else { 577 + if (received < sizeof(data->entry) || 578 + received != le16_to_cpu(data->entry.length)) 579 + return -EIO; 580 + } 577 581 578 582 /* Get the CXL table access header entry handle */ 579 583 entry_handle = FIELD_GET(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, 580 - le32_to_cpu(data[0])); 581 - entry_dw = rc / sizeof(__le32); 582 - /* Skip Header */ 583 - entry_dw -= 1; 584 + le32_to_cpu(rsp->doe_header)); 585 + 584 586 /* 585 587 * Table Access Response Header overwrote the last DW of 586 588 * previous entry, so restore that DW 587 589 */ 588 - *data = saved_dw; 589 - length -= entry_dw * sizeof(__le32); 590 - data += entry_dw; 591 - saved_dw = *data; 590 + rsp->doe_header = saved_dw; 591 + remaining -= received; 592 + rsp = (void *)rsp + received; 593 + saved_dw = rsp->doe_header; 592 594 } while (entry_handle != CXL_DOE_TABLE_ACCESS_LAST_ENTRY); 593 595 594 596 /* Length in CDAT header may exceed concatenation of CDAT entries */ 595 - *cdat_length -= length - sizeof(__le32); 597 + *length -= remaining; 596 598 597 599 return 0; 598 600 } ··· 619 617 { 620 618 struct device *uport = port->uport_dev; 621 619 struct device *dev = &port->dev; 622 - struct pci_doe_mb *cdat_doe; 620 + struct pci_doe_mb *doe_mb; 623 621 struct pci_dev *pdev = NULL; 624 622 struct cxl_memdev *cxlmd; 625 - size_t cdat_length; 626 - void *cdat_table, *cdat_buf; 623 + struct cdat_doe_rsp *buf; 624 + size_t table_length, length; 627 625 int rc; 628 626 629 627 if (is_cxl_memdev(uport)) { ··· 640 638 if (!pdev) 641 639 return; 642 640 643 - cdat_doe = pci_find_doe_mailbox(pdev, PCI_DVSEC_VENDOR_ID_CXL, 644 - CXL_DOE_PROTOCOL_TABLE_ACCESS); 645 - if (!cdat_doe) { 641 + doe_mb = pci_find_doe_mailbox(pdev, PCI_DVSEC_VENDOR_ID_CXL, 642 + CXL_DOE_PROTOCOL_TABLE_ACCESS); 643 + if (!doe_mb) { 646 644 dev_dbg(dev, "No CDAT mailbox\n"); 647 645 return; 
648 646 } 649 647 650 648 port->cdat_available = true; 651 649 652 - if (cxl_cdat_get_length(dev, cdat_doe, &cdat_length)) { 650 + if (cxl_cdat_get_length(dev, doe_mb, &length)) { 653 651 dev_dbg(dev, "No CDAT length\n"); 654 652 return; 655 653 } 656 654 657 - cdat_buf = devm_kzalloc(dev, cdat_length + sizeof(__le32), GFP_KERNEL); 658 - if (!cdat_buf) 659 - return; 655 + /* 656 + * The begin of the CDAT buffer needs space for additional 4 657 + * bytes for the DOE header. Table data starts afterwards. 658 + */ 659 + buf = devm_kzalloc(dev, sizeof(*buf) + length, GFP_KERNEL); 660 + if (!buf) 661 + goto err; 660 662 661 - rc = cxl_cdat_read_table(dev, cdat_doe, cdat_buf, &cdat_length); 663 + table_length = length; 664 + 665 + rc = cxl_cdat_read_table(dev, doe_mb, buf, &length); 662 666 if (rc) 663 667 goto err; 664 668 665 - cdat_table = cdat_buf + sizeof(__le32); 666 - if (cdat_checksum(cdat_table, cdat_length)) 669 + if (table_length != length) 670 + dev_warn(dev, "Malformed CDAT table length (%zu:%zu), discarding trailing data\n", 671 + table_length, length); 672 + 673 + if (cdat_checksum(buf->data, length)) 667 674 goto err; 668 675 669 - port->cdat.table = cdat_table; 670 - port->cdat.length = cdat_length; 671 - return; 676 + port->cdat.table = buf->data; 677 + port->cdat.length = length; 672 678 679 + return; 673 680 err: 674 681 /* Don't leave table data allocated on error */ 675 - devm_kfree(dev, cdat_buf); 682 + devm_kfree(dev, buf); 676 683 dev_err(dev, "Failed to read/validate CDAT.\n"); 677 684 } 678 685 EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL);
+71 -15
drivers/cxl/core/port.c
··· 3 3 #include <linux/platform_device.h> 4 4 #include <linux/memregion.h> 5 5 #include <linux/workqueue.h> 6 + #include <linux/einj-cxl.h> 6 7 #include <linux/debugfs.h> 7 8 #include <linux/device.h> 8 9 #include <linux/module.h> ··· 794 793 return rc; 795 794 } 796 795 796 + DEFINE_SHOW_ATTRIBUTE(einj_cxl_available_error_type); 797 + 798 + static int cxl_einj_inject(void *data, u64 type) 799 + { 800 + struct cxl_dport *dport = data; 801 + 802 + if (dport->rch) 803 + return einj_cxl_inject_rch_error(dport->rcrb.base, type); 804 + 805 + return einj_cxl_inject_error(to_pci_dev(dport->dport_dev), type); 806 + } 807 + DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, 808 + "0x%llx\n"); 809 + 810 + static void cxl_debugfs_create_dport_dir(struct cxl_dport *dport) 811 + { 812 + struct dentry *dir; 813 + 814 + if (!einj_cxl_is_initialized()) 815 + return; 816 + 817 + /* 818 + * dport_dev needs to be a PCIe port for CXL 2.0+ ports because 819 + * EINJ expects a dport SBDF to be specified for 2.0 error injection. 
820 + */ 821 + if (!dport->rch && !dev_is_pci(dport->dport_dev)) 822 + return; 823 + 824 + dir = cxl_debugfs_create_dir(dev_name(dport->dport_dev)); 825 + 826 + debugfs_create_file("einj_inject", 0200, dir, dport, 827 + &cxl_einj_inject_fops); 828 + } 829 + 797 830 static struct cxl_port *__devm_cxl_add_port(struct device *host, 798 831 struct device *uport_dev, 799 832 resource_size_t component_reg_phys, ··· 857 822 */ 858 823 port->reg_map = cxlds->reg_map; 859 824 port->reg_map.host = &port->dev; 825 + cxlmd->endpoint = port; 860 826 } else if (parent_dport) { 861 827 rc = dev_set_name(dev, "port%d", port->id); 862 828 if (rc) ··· 1185 1149 if (dev_is_pci(dport_dev)) 1186 1150 dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev)); 1187 1151 1152 + cxl_debugfs_create_dport_dir(dport); 1153 + 1188 1154 return dport; 1189 1155 } 1190 1156 ··· 1412 1374 1413 1375 get_device(host); 1414 1376 get_device(&endpoint->dev); 1415 - cxlmd->endpoint = endpoint; 1416 1377 cxlmd->depth = endpoint->depth; 1417 1378 return devm_add_action_or_reset(dev, delete_endpoint, cxlmd); 1418 1379 } ··· 2133 2096 } 2134 2097 EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, CXL); 2135 2098 2136 - static void combine_coordinates(struct access_coordinate *c1, 2137 - struct access_coordinate *c2) 2099 + /** 2100 + * cxl_hb_get_perf_coordinates - Retrieve performance numbers between initiator 2101 + * and host bridge 2102 + * 2103 + * @port: endpoint cxl_port 2104 + * @coord: output access coordinates 2105 + * 2106 + * Return: errno on failure, 0 on success. 
2107 + */ 2108 + int cxl_hb_get_perf_coordinates(struct cxl_port *port, 2109 + struct access_coordinate *coord) 2138 2110 { 2139 - if (c2->write_bandwidth) 2140 - c1->write_bandwidth = min(c1->write_bandwidth, 2141 - c2->write_bandwidth); 2142 - c1->write_latency += c2->write_latency; 2111 + struct cxl_port *iter = port; 2112 + struct cxl_dport *dport; 2143 2113 2144 - if (c2->read_bandwidth) 2145 - c1->read_bandwidth = min(c1->read_bandwidth, 2146 - c2->read_bandwidth); 2147 - c1->read_latency += c2->read_latency; 2114 + if (!is_cxl_endpoint(port)) 2115 + return -EINVAL; 2116 + 2117 + dport = iter->parent_dport; 2118 + while (iter && !is_cxl_root(to_cxl_port(iter->dev.parent))) { 2119 + iter = to_cxl_port(iter->dev.parent); 2120 + dport = iter->parent_dport; 2121 + } 2122 + 2123 + coord[ACCESS_COORDINATE_LOCAL] = 2124 + dport->hb_coord[ACCESS_COORDINATE_LOCAL]; 2125 + coord[ACCESS_COORDINATE_CPU] = 2126 + dport->hb_coord[ACCESS_COORDINATE_CPU]; 2127 + 2128 + return 0; 2148 2129 } 2149 2130 2150 2131 /** ··· 2198 2143 * nothing to gather. 2199 2144 */ 2200 2145 while (iter && !is_cxl_root(to_cxl_port(iter->dev.parent))) { 2201 - combine_coordinates(&c, &dport->sw_coord); 2146 + cxl_coordinates_combine(&c, &c, &dport->sw_coord); 2202 2147 c.write_latency += dport->link_latency; 2203 2148 c.read_latency += dport->link_latency; 2204 2149 2205 2150 iter = to_cxl_port(iter->dev.parent); 2206 2151 dport = iter->parent_dport; 2207 2152 } 2208 - 2209 - /* Augment with the generic port (host bridge) perf data */ 2210 - combine_coordinates(&c, &dport->hb_coord); 2211 2153 2212 2154 /* Get the calculated PCI paths bandwidth */ 2213 2155 pdev = to_pci_dev(port->uport_dev->parent); ··· 2272 2220 int rc; 2273 2221 2274 2222 cxl_debugfs = debugfs_create_dir("cxl", NULL); 2223 + 2224 + if (einj_cxl_is_initialized()) 2225 + debugfs_create_file("einj_types", 0400, cxl_debugfs, NULL, 2226 + &einj_cxl_available_error_type_fops); 2275 2227 2276 2228 cxl_mbox_init(); 2277 2229
+169
drivers/cxl/core/region.c
··· 4 4 #include <linux/genalloc.h> 5 5 #include <linux/device.h> 6 6 #include <linux/module.h> 7 + #include <linux/memory.h> 7 8 #include <linux/slab.h> 8 9 #include <linux/uuid.h> 9 10 #include <linux/sort.h> ··· 30 29 */ 31 30 32 31 static struct cxl_region *to_cxl_region(struct device *dev); 32 + 33 + #define __ACCESS_ATTR_RO(_level, _name) { \ 34 + .attr = { .name = __stringify(_name), .mode = 0444 }, \ 35 + .show = _name##_access##_level##_show, \ 36 + } 37 + 38 + #define ACCESS_DEVICE_ATTR_RO(level, name) \ 39 + struct device_attribute dev_attr_access##level##_##name = __ACCESS_ATTR_RO(level, name) 40 + 41 + #define ACCESS_ATTR_RO(level, attrib) \ 42 + static ssize_t attrib##_access##level##_show(struct device *dev, \ 43 + struct device_attribute *attr, \ 44 + char *buf) \ 45 + { \ 46 + struct cxl_region *cxlr = to_cxl_region(dev); \ 47 + \ 48 + if (cxlr->coord[level].attrib == 0) \ 49 + return -ENOENT; \ 50 + \ 51 + return sysfs_emit(buf, "%u\n", cxlr->coord[level].attrib); \ 52 + } \ 53 + static ACCESS_DEVICE_ATTR_RO(level, attrib) 54 + 55 + ACCESS_ATTR_RO(0, read_bandwidth); 56 + ACCESS_ATTR_RO(0, read_latency); 57 + ACCESS_ATTR_RO(0, write_bandwidth); 58 + ACCESS_ATTR_RO(0, write_latency); 59 + 60 + #define ACCESS_ATTR_DECLARE(level, attrib) \ 61 + (&dev_attr_access##level##_##attrib.attr) 62 + 63 + static struct attribute *access0_coordinate_attrs[] = { 64 + ACCESS_ATTR_DECLARE(0, read_bandwidth), 65 + ACCESS_ATTR_DECLARE(0, write_bandwidth), 66 + ACCESS_ATTR_DECLARE(0, read_latency), 67 + ACCESS_ATTR_DECLARE(0, write_latency), 68 + NULL 69 + }; 70 + 71 + ACCESS_ATTR_RO(1, read_bandwidth); 72 + ACCESS_ATTR_RO(1, read_latency); 73 + ACCESS_ATTR_RO(1, write_bandwidth); 74 + ACCESS_ATTR_RO(1, write_latency); 75 + 76 + static struct attribute *access1_coordinate_attrs[] = { 77 + ACCESS_ATTR_DECLARE(1, read_bandwidth), 78 + ACCESS_ATTR_DECLARE(1, write_bandwidth), 79 + ACCESS_ATTR_DECLARE(1, read_latency), 80 + ACCESS_ATTR_DECLARE(1, write_latency), 81 + 
NULL 82 + }; 83 + 84 + #define ACCESS_VISIBLE(level) \ 85 + static umode_t cxl_region_access##level##_coordinate_visible( \ 86 + struct kobject *kobj, struct attribute *a, int n) \ 87 + { \ 88 + struct device *dev = kobj_to_dev(kobj); \ 89 + struct cxl_region *cxlr = to_cxl_region(dev); \ 90 + \ 91 + if (a == &dev_attr_access##level##_read_latency.attr && \ 92 + cxlr->coord[level].read_latency == 0) \ 93 + return 0; \ 94 + \ 95 + if (a == &dev_attr_access##level##_write_latency.attr && \ 96 + cxlr->coord[level].write_latency == 0) \ 97 + return 0; \ 98 + \ 99 + if (a == &dev_attr_access##level##_read_bandwidth.attr && \ 100 + cxlr->coord[level].read_bandwidth == 0) \ 101 + return 0; \ 102 + \ 103 + if (a == &dev_attr_access##level##_write_bandwidth.attr && \ 104 + cxlr->coord[level].write_bandwidth == 0) \ 105 + return 0; \ 106 + \ 107 + return a->mode; \ 108 + } 109 + 110 + ACCESS_VISIBLE(0); 111 + ACCESS_VISIBLE(1); 112 + 113 + static const struct attribute_group cxl_region_access0_coordinate_group = { 114 + .name = "access0", 115 + .attrs = access0_coordinate_attrs, 116 + .is_visible = cxl_region_access0_coordinate_visible, 117 + }; 118 + 119 + static const struct attribute_group *get_cxl_region_access0_group(void) 120 + { 121 + return &cxl_region_access0_coordinate_group; 122 + } 123 + 124 + static const struct attribute_group cxl_region_access1_coordinate_group = { 125 + .name = "access1", 126 + .attrs = access1_coordinate_attrs, 127 + .is_visible = cxl_region_access1_coordinate_visible, 128 + }; 129 + 130 + static const struct attribute_group *get_cxl_region_access1_group(void) 131 + { 132 + return &cxl_region_access1_coordinate_group; 133 + } 33 134 34 135 static ssize_t uuid_show(struct device *dev, struct device_attribute *attr, 35 136 char *buf) ··· 1855 1752 return -EINVAL; 1856 1753 } 1857 1754 1755 + cxl_region_perf_data_calculate(cxlr, cxled); 1756 + 1858 1757 if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) { 1859 1758 int i; 1860 1759 ··· 2172 2067 
&cxl_base_attribute_group, 2173 2068 &cxl_region_group, 2174 2069 &cxl_region_target_group, 2070 + &cxl_region_access0_coordinate_group, 2071 + &cxl_region_access1_coordinate_group, 2175 2072 NULL, 2176 2073 }; 2177 2074 ··· 2227 2120 struct cxl_region_params *p = &cxlr->params; 2228 2121 int i; 2229 2122 2123 + unregister_memory_notifier(&cxlr->memory_notifier); 2230 2124 device_del(&cxlr->dev); 2231 2125 2232 2126 /* ··· 2270 2162 cxlr->id = id; 2271 2163 2272 2164 return cxlr; 2165 + } 2166 + 2167 + static bool cxl_region_update_coordinates(struct cxl_region *cxlr, int nid) 2168 + { 2169 + int cset = 0; 2170 + int rc; 2171 + 2172 + for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) { 2173 + if (cxlr->coord[i].read_bandwidth) { 2174 + rc = 0; 2175 + if (cxl_need_node_perf_attrs_update(nid)) 2176 + node_set_perf_attrs(nid, &cxlr->coord[i], i); 2177 + else 2178 + rc = cxl_update_hmat_access_coordinates(nid, cxlr, i); 2179 + 2180 + if (rc == 0) 2181 + cset++; 2182 + } 2183 + } 2184 + 2185 + if (!cset) 2186 + return false; 2187 + 2188 + rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_access0_group()); 2189 + if (rc) 2190 + dev_dbg(&cxlr->dev, "Failed to update access0 group\n"); 2191 + 2192 + rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_access1_group()); 2193 + if (rc) 2194 + dev_dbg(&cxlr->dev, "Failed to update access1 group\n"); 2195 + 2196 + return true; 2197 + } 2198 + 2199 + static int cxl_region_perf_attrs_callback(struct notifier_block *nb, 2200 + unsigned long action, void *arg) 2201 + { 2202 + struct cxl_region *cxlr = container_of(nb, struct cxl_region, 2203 + memory_notifier); 2204 + struct cxl_region_params *p = &cxlr->params; 2205 + struct cxl_endpoint_decoder *cxled = p->targets[0]; 2206 + struct cxl_decoder *cxld = &cxled->cxld; 2207 + struct memory_notify *mnb = arg; 2208 + int nid = mnb->status_change_nid; 2209 + int region_nid; 2210 + 2211 + if (nid == NUMA_NO_NODE || action != MEM_ONLINE) 2212 + return NOTIFY_DONE; 2213 + 2214 + 
region_nid = phys_to_target_node(cxld->hpa_range.start); 2215 + if (nid != region_nid) 2216 + return NOTIFY_DONE; 2217 + 2218 + if (!cxl_region_update_coordinates(cxlr, nid)) 2219 + return NOTIFY_DONE; 2220 + 2221 + return NOTIFY_OK; 2273 2222 } 2274 2223 2275 2224 /** ··· 2375 2210 rc = device_add(dev); 2376 2211 if (rc) 2377 2212 goto err; 2213 + 2214 + cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback; 2215 + cxlr->memory_notifier.priority = CXL_CALLBACK_PRI; 2216 + register_memory_notifier(&cxlr->memory_notifier); 2378 2217 2379 2218 rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr); 2380 2219 if (rc)
+14 -1
drivers/cxl/cxl.h
···
 #include <linux/libnvdimm.h>
 #include <linux/bitfield.h>
+#include <linux/notifier.h>
 #include <linux/bitops.h>
 #include <linux/log2.h>
 #include <linux/node.h>
···
  * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge
  * @flags: Region state flags
  * @params: active + config params for the region
+ * @coord: QoS access coordinates for the region
+ * @memory_notifier: notifier for setting the access coordinates to node
  */
 struct cxl_region {
 	struct device dev;
···
 	struct cxl_pmem_region *cxlr_pmem;
 	unsigned long flags;
 	struct cxl_region_params params;
+	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
+	struct notifier_block memory_notifier;
 };
 
 struct cxl_nvdimm_bridge {
···
 	struct cxl_port *port;
 	struct cxl_regs regs;
 	struct access_coordinate sw_coord;
-	struct access_coordinate hb_coord;
+	struct access_coordinate hb_coord[ACCESS_COORDINATE_MAX];
 	long link_latency;
 };
···
 
 int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
 				      struct access_coordinate *coord);
+int cxl_hb_get_perf_coordinates(struct cxl_port *port,
+				struct access_coordinate *coord);
+void cxl_region_perf_data_calculate(struct cxl_region *cxlr,
+				    struct cxl_endpoint_decoder *cxled);
 
 void cxl_memdev_update_perf(struct cxl_memdev *cxlmd);
+
+void cxl_coordinates_combine(struct access_coordinate *out,
+			     struct access_coordinate *c1,
+			     struct access_coordinate *c2);
 
 /*
  * Unit test builds overrides this to __weak, find the 'strong' version
+24
drivers/cxl/cxlpci.h
···
 	CXL_REGLOC_RBI_TYPES
 };
 
+/*
+ * Table Access DOE, CDAT Read Entry Response
+ *
+ * Spec refs:
+ *
+ * CXL 3.1 8.1.11, Table 8-14: Read Entry Response
+ * CDAT Specification 1.03: 2 CDAT Data Structures
+ */
+
 struct cdat_header {
 	__le32 length;
 	u8 revision;
···
 	u8 type;
 	u8 reserved;
 	__le16 length;
 } __packed;
+
+/*
+ * The DOE CDAT read response contains a CDAT read entry (either the
+ * CDAT header or a structure).
+ */
+union cdat_data {
+	struct cdat_header header;
+	struct cdat_entry_header entry;
+} __packed;
+
+/* There is an additional CDAT response header of 4 bytes. */
+struct cdat_doe_rsp {
+	__le32 doe_header;
+	u8 data[];
+} __packed;
 
 /*
+21
include/linux/acpi.h
···
 	ACPI_COMPANION_SET(dev, ACPI_COMPANION(dev->parent));
 }
 
+#ifdef CONFIG_ACPI_HMAT
+int hmat_update_target_coordinates(int nid, struct access_coordinate *coord,
+				   enum access_coordinate_class access);
+#else
+static inline int hmat_update_target_coordinates(int nid,
+						 struct access_coordinate *coord,
+						 enum access_coordinate_class access)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
+#ifdef CONFIG_ACPI_NUMA
+bool acpi_node_backed_by_real_pxm(int nid);
+#else
+static inline bool acpi_node_backed_by_real_pxm(int nid)
+{
+	return false;
+}
+#endif
+
 #endif /*_LINUX_ACPI_H*/
+44
include/linux/einj-cxl.h
···
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * CXL protocol Error INJection support.
+ *
+ * Copyright (c) 2023 Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Ben Cheatham <benjamin.cheatham@amd.com>
+ */
+#ifndef EINJ_CXL_H
+#define EINJ_CXL_H
+
+#include <linux/errno.h>
+#include <linux/types.h>
+
+struct pci_dev;
+struct seq_file;
+
+#if IS_ENABLED(CONFIG_ACPI_APEI_EINJ_CXL)
+int einj_cxl_available_error_type_show(struct seq_file *m, void *v);
+int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type);
+int einj_cxl_inject_rch_error(u64 rcrb, u64 type);
+bool einj_cxl_is_initialized(void);
+#else /* !IS_ENABLED(CONFIG_ACPI_APEI_EINJ_CXL) */
+static inline int einj_cxl_available_error_type_show(struct seq_file *m,
+						     void *v)
+{
+	return -ENXIO;
+}
+
+static inline int einj_cxl_inject_error(struct pci_dev *dport_dev, u64 type)
+{
+	return -ENXIO;
+}
+
+static inline int einj_cxl_inject_rch_error(u64 rcrb, u64 type)
+{
+	return -ENXIO;
+}
+
+static inline bool einj_cxl_is_initialized(void) { return false; }
+#endif /* CONFIG_ACPI_APEI_EINJ_CXL */
+
+#endif /* EINJ_CXL_H */
+3 -1
include/linux/fw_table.h
···
 int acpi_parse_entries_array(char *id, unsigned long table_size,
 			     union fw_table_header *table_header,
+			     unsigned long max_length,
 			     struct acpi_subtable_proc *proc,
 			     int proc_num, unsigned int max_entries);
 
 int cdat_table_parse(enum acpi_cdat_type type,
 		     acpi_tbl_entry_handler_arg handler_arg, void *arg,
-		     struct acpi_table_cdat *table_header);
+		     struct acpi_table_cdat *table_header,
+		     unsigned long length);
 
 /* CXL is the only non-ACPI consumer of the FIRMWARE_TABLE library */
 #if IS_ENABLED(CONFIG_ACPI) && !IS_ENABLED(CONFIG_CXL_BUS)
+1
include/linux/memory.h
···
 #define DEFAULT_CALLBACK_PRI	0
 #define SLAB_CALLBACK_PRI	1
 #define HMAT_CALLBACK_PRI	2
+#define CXL_CALLBACK_PRI	5
 #define MM_COMPUTE_BATCH_PRI	10
 #define CPUSET_CALLBACK_PRI	10
 #define MEMTIER_HOTPLUG_PRI	100
+15 -3
include/linux/node.h
···
 	unsigned int write_latency;
 };
 
+/*
+ * ACCESS_COORDINATE_LOCAL correlates to ACCESS CLASS 0
+ *	- access_coordinate between target node and nearest initiator node
+ * ACCESS_COORDINATE_CPU correlates to ACCESS CLASS 1
+ *	- access_coordinate between target node and nearest CPU node
+ */
+enum access_coordinate_class {
+	ACCESS_COORDINATE_LOCAL,
+	ACCESS_COORDINATE_CPU,
+	ACCESS_COORDINATE_MAX
+};
+
 enum cache_indexing {
 	NODE_CACHE_DIRECT_MAP,
 	NODE_CACHE_INDEXED,
···
 #ifdef CONFIG_HMEM_REPORTING
 void node_add_cache(unsigned int nid, struct node_cache_attrs *cache_attrs);
 void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord,
-			 unsigned access);
+			 enum access_coordinate_class access);
 #else
 static inline void node_add_cache(unsigned int nid,
 				  struct node_cache_attrs *cache_attrs)
 {
 }
 
 static inline void node_set_perf_attrs(unsigned int nid,
 				       struct access_coordinate *coord,
-				       unsigned access)
+				       enum access_coordinate_class access)
 {
 }
 #endif
···
 
 extern int register_memory_node_under_compute_node(unsigned int mem_nid,
 						   unsigned int cpu_nid,
-						   unsigned access);
+						   enum access_coordinate_class access);
 #else
 static inline void node_dev_init(void)
 {
+10 -5
lib/fw_table.c
···
  *
  * @id: table id (for debugging purposes)
  * @table_size: size of the root table
+ * @max_length: maximum size of the table (ignore if 0)
  * @table_header: where does the table start?
  * @proc: array of acpi_subtable_proc struct containing entry id
  *        and associated handler with it
···
 int __init_or_fwtbl_lib
 acpi_parse_entries_array(char *id, unsigned long table_size,
 			 union fw_table_header *table_header,
+			 unsigned long max_length,
 			 struct acpi_subtable_proc *proc,
 			 int proc_num, unsigned int max_entries)
 {
-	unsigned long table_end, subtable_len, entry_len;
+	unsigned long table_len, table_end, subtable_len, entry_len;
 	struct acpi_subtable_entry entry;
 	enum acpi_subtable_type type;
 	int count = 0;
 	int i;
 
 	type = acpi_get_subtable_type(id);
-	table_end = (unsigned long)table_header +
-		    acpi_table_get_length(type, table_header);
+	table_len = acpi_table_get_length(type, table_header);
+	if (max_length && max_length < table_len)
+		table_len = max_length;
+	table_end = (unsigned long)table_header + table_len;
 
 	/* Parse all entries looking for a match. */
···
 cdat_table_parse(enum acpi_cdat_type type,
 		 acpi_tbl_entry_handler_arg handler_arg,
 		 void *arg,
-		 struct acpi_table_cdat *table_header)
+		 struct acpi_table_cdat *table_header,
+		 unsigned long length)
 {
 	struct acpi_subtable_proc proc = {
 		.id = type,
···
 	return acpi_parse_entries_array(ACPI_SIG_CDAT,
 					sizeof(struct acpi_table_cdat),
 					(union fw_table_header *)table_header,
-					&proc, 1, 0);
+					length, &proc, 1, 0);
 }
 EXPORT_SYMBOL_FWTBL_LIB(cdat_table_parse);