Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

PM: CXL: Disable suspend

The CXL specification claims S3 support at a hardware level, but at a
system software level there are some missing pieces. Section 9.4 (CXL
2.0) rightly claims that "CXL mem adapters may need aux power to retain
memory context across S3", but there is no enumeration mechanism for the
OS to determine if a given adapter has that support. Moreover, the saved
state and resume image for the system may inadvertently end up in a CXL
device that must itself be restored before that saved state can be
recovered, i.e. a circular dependency that cannot be resolved without a
third-party save area.

Arrange for the cxl_mem driver to fail S3 attempts. This still nominally
allows suspend, but requires unbinding all CXL memory devices before the
suspend so that the typical DRAM flow is taken. The cxl_mem unbind flow
is also intended to tear down all CXL memory regions associated with a
given cxl_memdev.
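Unbinding can be done from userspace through the driver core's sysfs interface. The sketch below assumes the standard driver-core layout, where devices bound to the cxl_mem driver appear as memN entries under /sys/bus/cxl/drivers/cxl_mem and writing a device name to that directory's unbind file detaches it. The helper cxl_unbind_all_memdevs() is hypothetical, and on a real system it needs root privileges.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: unbind every memdev bound to the driver whose
 * sysfs directory is drv_dir (assumed to be /sys/bus/cxl/drivers/cxl_mem
 * on a real system). Returns the number of devices written to the
 * driver's "unbind" file, or -1 on error.
 */
static int cxl_unbind_all_memdevs(const char *drv_dir)
{
	char unbind_path[4096];
	struct dirent *de;
	int count = 0;
	DIR *dir;

	assert(drv_dir != NULL);
	dir = opendir(drv_dir);
	if (!dir)
		return -1;
	snprintf(unbind_path, sizeof(unbind_path), "%s/unbind", drv_dir);

	while ((de = readdir(dir)) != NULL) {
		FILE *f;
		int failed;

		/* Bound devices show up as "memN"; skip bind, unbind, uevent... */
		if (strncmp(de->d_name, "mem", 3) != 0)
			continue;

		/* One device name per open/write/close. */
		f = fopen(unbind_path, "w");
		if (!f) {
			count = -1;
			break;
		}
		failed = fputs(de->d_name, f) == EOF;
		/* With stdio buffering, the actual write() happens in fclose(). */
		failed |= fclose(f) != 0;
		if (failed) {
			count = -1;
			break;
		}
		count++;
	}
	closedir(dir);
	return count;
}
```

A caller would run cxl_unbind_all_memdevs("/sys/bus/cxl/drivers/cxl_mem") before requesting S3; once the last memdev is unbound, the suspend restriction described here no longer applies.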

It is reasonable to assume that any device participating in a System RAM
range published in the EFI memory map is covered by aux power and by a
save area outside the device itself. So this restriction can be relaxed
in the future once support for enumerating pre-existing regions arrives,
perhaps together with a spec update clarifying whether the EFI memory map
is sufficient for determining which devices platform firmware manages for
S3 support.

Per Rafael, if the CXL configuration prevents suspend, the attempt should
fail early, before tasks are frozen, and mem_sleep should stop showing
'mem' as an option [1]. Effectively CXL augments the platform suspend
->valid() op since, for example, the ACPI ops are not aware of the CXL /
PCI dependencies. Given the split roles of platform-firmware-provisioned
versus OS-provisioned CXL memory, it is up to the cxl_mem driver to
determine whether the CXL configuration has elements that platform
firmware may not be prepared to restore.
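The early veto can be modelled in userspace. In this sketch, platform_valid[] stands in for the platform ->valid() op, cxl_active for cxl_mem_active(), and model_enter_state() for the suspend entry path; all three are hypothetical stand-ins, not kernel APIs. What it illustrates is ordering: an unsupported state is rejected with -EINVAL before the (fake) freeze step runs, while s2idle is never vetoed, presumably because it does not rely on platform firmware restoring CXL state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Values mirror enum suspend_state_t in include/linux/suspend.h. */
enum { PM_SUSPEND_TO_IDLE = 1, PM_SUSPEND_STANDBY = 2, PM_SUSPEND_MEM = 3 };

/* Hypothetical inputs: platform ->valid() results and CXL activity. */
static bool platform_valid[4] = { false, true, true, true };
static bool cxl_active;
static int tasks_frozen; /* how often the fake freeze step ran */

/* Same shape as sleep_state_supported() after this change. */
static bool state_supported(int state)
{
	return state == PM_SUSPEND_TO_IDLE ||
	       (platform_valid[state] && !cxl_active);
}

/* Model of the entry path: validate first, freeze only afterwards. */
static int model_enter_state(int state)
{
	if (!state_supported(state))
		return -EINVAL; /* fail before any task is frozen */
	tasks_frozen++;
	return 0;
}
```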

Link: https://lore.kernel.org/r/CAJZ5v0hGVN_=3iU8OLpHY3Ak35T5+JcBM-qs8SbojKrpd0VXsA@mail.gmail.com [1]
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <len.brown@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/165066828317.3907920.5690432272182042556.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

+79 -6
+1 -1
drivers/Makefile
···
 obj-y += base/ block/ misc/ mfd/ nfc/
 obj-$(CONFIG_LIBNVDIMM) += nvdimm/
 obj-$(CONFIG_DAX) += dax/
-obj-$(CONFIG_CXL_BUS) += cxl/
 obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/
 obj-$(CONFIG_NUBUS) += nubus/
+obj-y += cxl/
 obj-y += macintosh/
 obj-y += scsi/
 obj-y += nvme/
+4
drivers/cxl/Kconfig
···
 	default CXL_BUS
 	tristate
 
+config CXL_SUSPEND
+	def_bool y
+	depends on SUSPEND && CXL_MEM
+
 endif
+1 -1
drivers/cxl/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_CXL_BUS) += core/
+obj-y += core/
 obj-$(CONFIG_CXL_PCI) += cxl_pci.o
 obj-$(CONFIG_CXL_MEM) += cxl_mem.o
 obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
+1
drivers/cxl/core/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_CXL_BUS) += cxl_core.o
+obj-$(CONFIG_CXL_SUSPEND) += suspend.o
 
 ccflags-y += -I$(srctree)/drivers/cxl
 cxl_core-y := port.o
+24
drivers/cxl/core/suspend.c
···
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+#include <linux/atomic.h>
+#include <linux/export.h>
+#include "cxlmem.h"
+
+static atomic_t mem_active;
+
+bool cxl_mem_active(void)
+{
+	return atomic_read(&mem_active) != 0;
+}
+
+void cxl_mem_active_inc(void)
+{
+	atomic_inc(&mem_active);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, CXL);
+
+void cxl_mem_active_dec(void)
+{
+	atomic_dec(&mem_active);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, CXL);
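The counter in suspend.c can be exercised outside the kernel with C11 atomics. This is a userspace model: atomic_int stands in for atomic_t, and the underflow assert is an addition for illustration, not part of the commit. A counter rather than a flag is needed because several memdevs may be bound at once, and suspend must stay blocked until the last of them is unbound.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model: atomic_int stands in for the kernel's atomic_t. */
static atomic_int mem_active;

static bool cxl_mem_active(void)
{
	return atomic_load(&mem_active) != 0;
}

static void cxl_mem_active_inc(void)
{
	atomic_fetch_add(&mem_active, 1);
}

static void cxl_mem_active_dec(void)
{
	int prev = atomic_fetch_sub(&mem_active, 1);

	/* Not in the original: every dec must pair with an earlier inc. */
	assert(prev > 0);
	(void)prev;
}
```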
+11
drivers/cxl/cxlmem.h
···
 struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
 void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
 void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
+#ifdef CONFIG_CXL_SUSPEND
+void cxl_mem_active_inc(void);
+void cxl_mem_active_dec(void);
+#else
+static inline void cxl_mem_active_inc(void)
+{
+}
+static inline void cxl_mem_active_dec(void)
+{
+}
+#endif
 
 struct cxl_hdm {
 	struct cxl_component_regs regs;
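The cxlmem.h and pm.h hunks use the usual Kconfig stub idiom: when CONFIG_CXL_SUSPEND is off, the functions become empty static inlines, so callers in cxl_mem and kernel/power need no #ifdef of their own. The self-contained sketch below models both configurations in one file. For brevity the enabled branch is also static inline here, whereas the commit implements it out of line in drivers/cxl/core/suspend.c; deep_sleep_allowed() is a hypothetical caller. Since some callers live in always-built-in kernel/power code, the enabled implementation presumably has to be built in too, which would explain the obj-y changes in the Makefile hunks.

```c
#include <assert.h>
#include <stdbool.h>

/* Remove this #define to model a build with CONFIG_CXL_SUSPEND unset. */
#define CONFIG_CXL_SUSPEND 1

#ifdef CONFIG_CXL_SUSPEND
static int mem_active;
static inline void cxl_mem_active_inc(void) { mem_active++; }
static inline void cxl_mem_active_dec(void) { mem_active--; }
static inline bool cxl_mem_active(void) { return mem_active != 0; }
#else
/* Empty stubs: the calls compile away and callers need no #ifdef. */
static inline void cxl_mem_active_inc(void) { }
static inline void cxl_mem_active_dec(void) { }
static inline bool cxl_mem_active(void) { return false; }
#endif

/* Hypothetical caller, written once for both configurations. */
static bool deep_sleep_allowed(void)
{
	return !cxl_mem_active();
}
```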
+21 -1
drivers/cxl/mem.c
···
 	return retval;
 }
 
+static void enable_suspend(void *data)
+{
+	cxl_mem_active_dec();
+}
+
 static int cxl_mem_probe(struct device *dev)
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
···
 out:
 	cxl_device_unlock(&parent_port->dev);
 	put_device(&parent_port->dev);
-	return rc;
+
+	/*
+	 * The kernel may be operating out of CXL memory on this device,
+	 * there is no spec defined way to determine whether this device
+	 * preserves contents over suspend, and there is no simple way
+	 * to arrange for the suspend image to avoid CXL memory which
+	 * would setup a circular dependency between PCI resume and save
+	 * state restoration.
+	 *
+	 * TODO: support suspend when all the regions this device is
+	 * hosting are locked and covered by the system address map,
+	 * i.e. platform firmware owns restoring the HDM configuration
+	 * that it locked.
+	 */
+	cxl_mem_active_inc();
+	return devm_add_action_or_reset(dev, enable_suspend, NULL);
 }
 
 static struct cxl_driver cxl_mem_driver = {
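The mem.c hunks pair cxl_mem_active_inc() in probe with cxl_mem_active_dec() in a devm release action, so the count drops automatically when the driver unbinds. The model below imitates that with a hypothetical fake_dev action list, not the kernel's devres API; fake_add_action_or_reset() follows the devm_add_action_or_reset() contract of running the action immediately if registration fails, which keeps the count balanced. One deliberate difference: this sketch returns a nonzero rc before taking the reference, whereas in the hunk as shown the rc value reaching the out: label is no longer returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the devres machinery and the counter. */
typedef void (*release_fn)(void *data);

#define MAX_ACTIONS 8

struct fake_dev {
	release_fn fn[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	int n;
};

static int active; /* models the mem_active counter */

static void enable_suspend(void *data)
{
	(void)data;
	assert(active > 0);
	active--;
}

/* Like devm_add_action_or_reset(): if registration fails, run fn now. */
static int fake_add_action_or_reset(struct fake_dev *dev, release_fn fn,
				    void *data)
{
	if (dev->n == MAX_ACTIONS) {
		fn(data);
		return -ENOMEM;
	}
	dev->fn[dev->n] = fn;
	dev->data[dev->n] = data;
	dev->n++;
	return 0;
}

/* Driver unbind: run registered actions in reverse order. */
static void fake_release_all(struct fake_dev *dev)
{
	while (dev->n > 0) {
		dev->n--;
		dev->fn[dev->n](dev->data[dev->n]);
	}
}

/* Probe tail: only a successfully probed device blocks suspend. */
static int fake_probe_tail(struct fake_dev *dev, int rc)
{
	if (rc)
		return rc;
	active++;
	return fake_add_action_or_reset(dev, enable_suspend, NULL);
}
```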
+9
include/linux/pm.h
···
 }
 #endif /* CONFIG_VT_CONSOLE_SLEEP */
 
+#ifdef CONFIG_CXL_SUSPEND
+bool cxl_mem_active(void);
+#else
+static inline bool cxl_mem_active(void)
+{
+	return false;
+}
+#endif
+
 /*
  * Device power management
  */
+1 -1
kernel/power/hibernate.c
···
 {
 	return nohibernate == 0 &&
 		!security_locked_down(LOCKDOWN_HIBERNATION) &&
-		!secretmem_active();
+		!secretmem_active() && !cxl_mem_active();
 }
 
 /**
+4 -1
kernel/power/main.c
···
 	char *s = buf;
 	suspend_state_t i;
 
-	for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++)
+	for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++) {
+		if (i >= PM_SUSPEND_MEM && cxl_mem_active())
+			continue;
 		if (mem_sleep_states[i]) {
 			const char *label = mem_sleep_states[i];
 
···
 			else
 				s += sprintf(s, "%s ", label);
 		}
+	}
 
 	/* Convert the last space to a newline if needed. */
 	if (s != buf)
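The effect of the mem_sleep_show() change can be checked with a userspace model. The enum values and the s2idle/shallow/deep labels follow include/linux/suspend.h and kernel/power; model_mem_sleep_show() and its parameters are hypothetical, and the branch that brackets the current state is written from the familiar /sys/power/mem_sleep output format rather than copied from the hunk, whose middle lines are elided. Only states at or above PM_SUSPEND_MEM are skipped, so "shallow" (standby) is still listed while CXL memory is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum { PM_SUSPEND_TO_IDLE = 1, PM_SUSPEND_STANDBY = 2,
       PM_SUSPEND_MEM = 3, PM_SUSPEND_MAX = 4 };
#define PM_SUSPEND_MIN PM_SUSPEND_TO_IDLE

/* Model: every state is supported by the (imaginary) platform. */
static const char *mem_sleep_states[PM_SUSPEND_MAX] = {
	[PM_SUSPEND_TO_IDLE] = "s2idle",
	[PM_SUSPEND_STANDBY] = "shallow",
	[PM_SUSPEND_MEM] = "deep",
};

/* Build the listing, hiding "deep" while CXL memory is active. */
static size_t model_mem_sleep_show(char *buf, int cur, bool cxl_active)
{
	char *s = buf;
	int i;

	assert(cur >= PM_SUSPEND_MIN && cur < PM_SUSPEND_MAX);
	for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++) {
		if (i >= PM_SUSPEND_MEM && cxl_active)
			continue;
		if (mem_sleep_states[i]) {
			const char *label = mem_sleep_states[i];

			if (cur == i)
				s += sprintf(s, "[%s] ", label);
			else
				s += sprintf(s, "%s ", label);
		}
	}
	/* Convert the last space to a newline, as in the hunk. */
	if (s != buf)
		*(s - 1) = '\n';
	return (size_t)(s - buf);
}
```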
+2 -1
kernel/power/suspend.c
···
 
 static bool sleep_state_supported(suspend_state_t state)
 {
-	return state == PM_SUSPEND_TO_IDLE || valid_state(state);
+	return state == PM_SUSPEND_TO_IDLE ||
+	       (valid_state(state) && !cxl_mem_active());
 }
 
 static int platform_suspend_prepare(suspend_state_t state)