Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'for-6.3/cxl-ram-region' into cxl/next

Include the support for enumerating and provisioning ram regions for
v6.3. This also includes a default policy change for ram / volatile
device-dax instances to assign them to the dax_kmem driver by default.

+1491 -320
Documentation/ABI/testing/sysfs-bus-cxl (+38 -26)
···
 What: /sys/bus/cxl/devices/endpointX/CDAT
 Date: July, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RO) If this sysfs entry is not present no DOE mailbox was
···
 What: /sys/bus/cxl/devices/decoderX.Y/mode
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
···
 What: /sys/bus/cxl/devices/decoderX.Y/dpa_resource
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RO) When a CXL decoder is of devtype "cxl_decoder_endpoint",
···
 What: /sys/bus/cxl/devices/decoderX.Y/dpa_size
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
···
 What: /sys/bus/cxl/devices/decoderX.Y/interleave_ways
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RO) The number of targets across which this decoder's host
···
 What: /sys/bus/cxl/devices/decoderX.Y/interleave_granularity
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RO) The number of consecutive bytes of host physical address
···
         interleave_granularity).
···
-What: /sys/bus/cxl/devices/decoderX.Y/create_pmem_region
-Date: May, 2022
-KernelVersion: v5.20
+What: /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
+Date: May, 2022, January, 2023
+KernelVersion: v6.0 (pmem), v6.3 (ram)
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RW) Write a string in the form 'regionZ' to start the process
-        of defining a new persistent memory region (interleave-set)
-        within the decode range bounded by root decoder 'decoderX.Y'.
-        The value written must match the current value returned from
-        reading this attribute. An atomic compare exchange operation is
-        done on write to assign the requested id to a region and
-        allocate the region-id for the next creation attempt. EBUSY is
-        returned if the region name written does not match the current
-        cached value.
+        of defining a new persistent, or volatile memory region
+        (interleave-set) within the decode range bounded by root decoder
+        'decoderX.Y'. The value written must match the current value
+        returned from reading this attribute. An atomic compare exchange
+        operation is done on write to assign the requested id to a
+        region and allocate the region-id for the next creation attempt.
+        EBUSY is returned if the region name written does not match the
+        current cached value.
···
 What: /sys/bus/cxl/devices/decoderX.Y/delete_region
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (WO) Write a string in the form 'regionZ' to delete that region,
···
 What: /sys/bus/cxl/devices/regionZ/uuid
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RW) Write a unique identifier for the region. This field must
         be set for persistent regions and it must not conflict with the
-        UUID of another region.
+        UUID of another region. For volatile ram regions this
+        attribute is a read-only empty string.
···
 What: /sys/bus/cxl/devices/regionZ/interleave_granularity
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RW) Set the number of consecutive bytes each device in the
···
 What: /sys/bus/cxl/devices/regionZ/interleave_ways
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RW) Configures the number of devices participating in the
···
 What: /sys/bus/cxl/devices/regionZ/size
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RW) System physical address space to be consumed by the region.
···
         results in the same address being allocated.


+What: /sys/bus/cxl/devices/regionZ/mode
+Date: January, 2023
+KernelVersion: v6.3
+Contact: linux-cxl@vger.kernel.org
+Description:
+        (RO) The mode of a region is established at region creation time
+        and dictates the mode of the endpoint decoder that comprise the
+        region. For more details on the possible modes see
+        /sys/bus/cxl/devices/decoderX.Y/mode


 What: /sys/bus/cxl/devices/regionZ/resource
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RO) A region is a contiguous partition of a CXL root decoder
···
 What: /sys/bus/cxl/devices/regionZ/target[0..N]
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RW) Write an endpoint decoder object name to 'targetX' where X
···
 What: /sys/bus/cxl/devices/regionZ/commit
 Date: May, 2022
-KernelVersion: v5.20
+KernelVersion: v6.0
 Contact: linux-cxl@vger.kernel.org
 Description:
         (RW) Write a boolean 'true' string value to this attribute to
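The create/delete handshake documented in the ABI above (read the next region name, write the same token back) can be exercised from a shell. A minimal sketch, assuming a CXL-capable system and "decoder0.0" as a hypothetical root-decoder name; it skips cleanly when the attribute is absent:

```shell
#!/bin/sh
# Sketch of the region-creation handshake: the attribute returns the next
# available region name, and writing that same token back claims it.
# Writing a stale token returns EBUSY per the documentation above.
DEC=/sys/bus/cxl/devices/decoder0.0
if [ -f "$DEC/create_ram_region" ]; then
    region=$(cat "$DEC/create_ram_region")
    echo "$region" > "$DEC/create_ram_region"
    echo "created $region"
else
    echo "no CXL root decoder with ram support present; skipping"
fi
```

The read-then-write-back dance is what makes region-id allocation race-free between competing tools.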
MAINTAINERS (+1)
···
 M: Vishal Verma <vishal.l.verma@intel.com>
 M: Dave Jiang <dave.jiang@intel.com>
 L: nvdimm@lists.linux.dev
+L: linux-cxl@vger.kernel.org
 S: Supported
 F: drivers/dax/
drivers/acpi/numa/hmat.c (+2 -2)
···
 	for (res = target->memregions.child; res; res = res->sibling) {
 		int target_nid = pxm_to_node(target->memory_pxm);

-		hmem_register_device(target_nid, res);
+		hmem_register_resource(target_nid, res);
 	}
 }
···
 	acpi_put_table(tbl);
 	return 0;
 }
-device_initcall(hmat_init);
+subsys_initcall(hmat_init);
drivers/cxl/Kconfig (+11 -1)
···
 	depends on SUSPEND && CXL_MEM

 config CXL_REGION
-	bool
+	bool "CXL: Region Support"
 	default CXL_BUS
 	# For MAX_PHYSMEM_BITS
 	depends on SPARSEMEM
 	select MEMREGION
 	select GET_FREE_REGION
+	help
+	  Enable the CXL core to enumerate and provision CXL regions. A CXL
+	  region is defined by one or more CXL expanders that decode a given
+	  system-physical address range. For CXL regions established by
+	  platform-firmware this option enables memory error handling to
+	  identify the devices participating in a given interleaved memory
+	  range. Otherwise, platform-firmware managed CXL is enabled by being
+	  placed in the system address map and does not need a driver.
+
+	  If unsure say 'y'

 config CXL_REGION_INVALIDATION_TEST
 	bool "CXL: Region Cache Management Bypass (TEST)"
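Since CXL_REGION is now a visible, default-on option, whether a given kernel was built with it can be checked from userspace. A sketch, assuming the common (but distro-dependent) /boot/config-$(uname -r) location:

```shell
#!/bin/sh
# Check whether the running kernel was built with CXL region support.
# /boot/config-$(uname -r) is a conventional location; some distros
# expose the config via /proc/config.gz instead.
cfg=/boot/config-$(uname -r)
if [ -r "$cfg" ] && grep -q '^CONFIG_CXL_REGION=y' "$cfg"; then
    echo "CXL region support: enabled"
else
    echo "CXL region support: not detected"
fi
```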
drivers/cxl/acpi.c (+2 -1)
···
 	cxl_bus_drain();
 }

-module_init(cxl_acpi_init);
+/* load before dax_hmem sees 'Soft Reserved' CXL ranges */
+subsys_initcall(cxl_acpi_init);
 module_exit(cxl_acpi_exit);
 MODULE_LICENSE("GPL v2");
 MODULE_IMPORT_NS(CXL);
drivers/cxl/core/core.h (+4 -3)
···
 #ifdef CONFIG_CXL_REGION
 extern struct device_attribute dev_attr_create_pmem_region;
+extern struct device_attribute dev_attr_create_ram_region;
 extern struct device_attribute dev_attr_delete_region;
 extern struct device_attribute dev_attr_region;
 extern const struct device_type cxl_pmem_region_type;
+extern const struct device_type cxl_dax_region_type;
 extern const struct device_type cxl_region_type;
 void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled);
 #define CXL_REGION_ATTR(x) (&dev_attr_##x.attr)
 #define CXL_REGION_TYPE(x) (&cxl_region_type)
 #define SET_CXL_REGION_ATTR(x) (&dev_attr_##x.attr),
 #define CXL_PMEM_REGION_TYPE(x) (&cxl_pmem_region_type)
+#define CXL_DAX_REGION_TYPE(x) (&cxl_dax_region_type)
 int cxl_region_init(void);
 void cxl_region_exit(void);
 #else
···
 #define CXL_REGION_TYPE(x) NULL
 #define SET_CXL_REGION_ATTR(x)
 #define CXL_PMEM_REGION_TYPE(x) NULL
+#define CXL_DAX_REGION_TYPE(x) NULL
 #endif

 struct cxl_send_command;
···
 resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled);
 extern struct rw_semaphore cxl_dpa_rwsem;
-
-bool is_switch_decoder(struct device *dev);
-struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);

 int cxl_memdev_init(void);
 void cxl_memdev_exit(void);
drivers/cxl/core/hdm.c (+21 -4)
···
 	return 0;
 }

-static int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
+int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 				resource_size_t base, resource_size_t len,
 				resource_size_t skipped)
 {
···

 	return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
 }
+EXPORT_SYMBOL_NS_GPL(devm_cxl_dpa_reserve, CXL);

 resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled)
 {
···
 	port->commit_end--;
 	cxld->flags &= ~CXL_DECODER_F_ENABLE;

+	/* Userspace is now responsible for reconfiguring this decoder */
+	if (is_endpoint_decoder(&cxld->dev)) {
+		struct cxl_endpoint_decoder *cxled;
+
+		cxled = to_cxl_endpoint_decoder(&cxld->dev);
+		cxled->state = CXL_DECODER_STATE_MANUAL;
+	}
+
 	return 0;
 }
···
 		return rc;
 	}
 	*dpa_base += dpa_size + skip;
+
+	cxled->state = CXL_DECODER_STATE_AUTO;
+
 	return 0;
 }
···
 		cxled = cxl_endpoint_decoder_alloc(port);
 		if (IS_ERR(cxled)) {
 			dev_warn(&port->dev,
-				 "Failed to allocate the decoder\n");
+				 "Failed to allocate decoder%d.%d\n",
+				 port->id, i);
 			return PTR_ERR(cxled);
 		}
 		cxld = &cxled->cxld;
···
 		cxlsd = cxl_switch_decoder_alloc(port, target_count);
 		if (IS_ERR(cxlsd)) {
 			dev_warn(&port->dev,
-				 "Failed to allocate the decoder\n");
+				 "Failed to allocate decoder%d.%d\n",
+				 port->id, i);
 			return PTR_ERR(cxlsd);
 		}
 		cxld = &cxlsd->cxld;
···
 	rc = init_hdm_decoder(port, cxld, target_map, hdm, i, &dpa_base);
 	if (rc) {
+		dev_warn(&port->dev,
+			 "Failed to initialize decoder%d.%d\n",
+			 port->id, i);
 		put_device(&cxld->dev);
 		return rc;
 	}
 	rc = add_hdm_decoder(port, cxld, target_map);
 	if (rc) {
 		dev_warn(&port->dev,
-			 "Failed to add decoder to port\n");
+			 "Failed to add decoder%d.%d\n", port->id, i);
 		return rc;
 	}
 }
drivers/cxl/core/memdev.c (+1)
···
 	if (rc < 0)
 		goto err;
 	cxlmd->id = rc;
+	cxlmd->depth = -1;

 	dev = &cxlmd->dev;
 	device_initialize(dev);
drivers/cxl/core/pci.c (-5)
···
 	return devm_add_action_or_reset(host, clear_mem_enable, cxlds);
 }

-static bool range_contains(struct range *r1, struct range *r2)
-{
-	return r1->start <= r2->start && r1->end >= r2->end;
-}
-
 /* require dvsec ranges to be covered by a locked platform window */
 static int dvsec_range_allowed(struct device *dev, void *arg)
 {
drivers/cxl/core/port.c (+53 -39)
···
 		return CXL_DEVICE_NVDIMM;
 	if (dev->type == CXL_PMEM_REGION_TYPE())
 		return CXL_DEVICE_PMEM_REGION;
+	if (dev->type == CXL_DAX_REGION_TYPE())
+		return CXL_DEVICE_DAX_REGION;
 	if (is_cxl_port(dev)) {
 		if (is_cxl_root(to_cxl_port(dev)))
 			return CXL_DEVICE_ROOT;
···
 {
 	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);

-	switch (cxled->mode) {
-	case CXL_DECODER_RAM:
-		return sysfs_emit(buf, "ram\n");
-	case CXL_DECODER_PMEM:
-		return sysfs_emit(buf, "pmem\n");
-	case CXL_DECODER_NONE:
-		return sysfs_emit(buf, "none\n");
-	case CXL_DECODER_MIXED:
-	default:
-		return sysfs_emit(buf, "mixed\n");
-	}
+	return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode));
 }

 static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
···
 	&dev_attr_cap_type3.attr,
 	&dev_attr_target_list.attr,
 	SET_CXL_REGION_ATTR(create_pmem_region)
+	SET_CXL_REGION_ATTR(create_ram_region)
 	SET_CXL_REGION_ATTR(delete_region)
 	NULL,
 };
···
 static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
 {
 	unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_PMEM;
+
+	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
+}
+
+static bool can_create_ram(struct cxl_root_decoder *cxlrd)
+{
+	unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;

 	return (cxlrd->cxlsd.cxld.flags & flags) == flags;
 }
···
 	if (a == CXL_REGION_ATTR(create_pmem_region) && !can_create_pmem(cxlrd))
 		return 0;

-	if (a == CXL_REGION_ATTR(delete_region) && !can_create_pmem(cxlrd))
+	if (a == CXL_REGION_ATTR(create_ram_region) && !can_create_ram(cxlrd))
+		return 0;
+
+	if (a == CXL_REGION_ATTR(delete_region) &&
+	    !(can_create_pmem(cxlrd) || can_create_ram(cxlrd)))
 		return 0;

 	return a->mode;
···
 {
 	return dev->type == &cxl_decoder_endpoint_type;
 }
+EXPORT_SYMBOL_NS_GPL(is_endpoint_decoder, CXL);

 bool is_root_decoder(struct device *dev)
 {
···
 {
 	return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
 }
+EXPORT_SYMBOL_NS_GPL(is_switch_decoder, CXL);

 struct cxl_decoder *to_cxl_decoder(struct device *dev)
 {
···
 		return NULL;
 	return container_of(dev, struct cxl_switch_decoder, cxld.dev);
 }
+EXPORT_SYMBOL_NS_GPL(to_cxl_switch_decoder, CXL);

 static void cxl_ep_release(struct cxl_ep *ep)
 {
···
 	get_device(&endpoint->dev);
 	dev_set_drvdata(dev, endpoint);
+	cxlmd->depth = endpoint->depth;
 	return devm_add_action_or_reset(dev, delete_endpoint, cxlmd);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
···
 	}
 }

+struct detach_ctx {
+	struct cxl_memdev *cxlmd;
+	int depth;
+};
+
+static int port_has_memdev(struct device *dev, const void *data)
+{
+	const struct detach_ctx *ctx = data;
+	struct cxl_port *port;
+
+	if (!is_cxl_port(dev))
+		return 0;
+
+	port = to_cxl_port(dev);
+	if (port->depth != ctx->depth)
+		return 0;
+
+	return !!cxl_ep_load(port, ctx->cxlmd);
+}
+
 static void cxl_detach_ep(void *data)
 {
 	struct cxl_memdev *cxlmd = data;
-	struct device *iter;

-	for (iter = &cxlmd->dev; iter; iter = grandparent(iter)) {
-		struct device *dport_dev = grandparent(iter);
+	for (int i = cxlmd->depth - 1; i >= 1; i--) {
 		struct cxl_port *port, *parent_port;
+		struct detach_ctx ctx = {
+			.cxlmd = cxlmd,
+			.depth = i,
+		};
+		struct device *dev;
 		struct cxl_ep *ep;
 		bool died = false;

-		if (!dport_dev)
-			break;
-
-		port = find_cxl_port(dport_dev, NULL);
-		if (!port)
+		dev = bus_find_device(&cxl_bus_type, NULL, &ctx,
+				      port_has_memdev);
+		if (!dev)
 			continue;
-
-		if (is_cxl_root(port)) {
-			put_device(&port->dev);
-			continue;
-		}
+		port = to_cxl_port(dev);

 		parent_port = to_cxl_port(port->dev.parent);
 		device_lock(&parent_port->dev);
-		if (!parent_port->dev.driver) {
-			/*
-			 * The bottom-up race to delete the port lost to a
-			 * top-down port disable, give up here, because the
-			 * parent_port ->remove() will have cleaned up all
-			 * descendants.
-			 */
-			device_unlock(&parent_port->dev);
-			put_device(&port->dev);
-			continue;
-		}
-
 		device_lock(&port->dev);
 		ep = cxl_ep_load(port, cxlmd);
 		dev_dbg(&cxlmd->dev, "disconnect %s from %s\n",
 			ep ? dev_name(ep->ep) : "", dev_name(&port->dev));
 		cxl_ep_remove(port, ep);
 		if (ep && !port->dead && xa_empty(&port->endpoints) &&
-		    !is_cxl_root(parent_port)) {
+		    !is_cxl_root(parent_port) && parent_port->dev.driver) {
 			/*
 			 * This was the last ep attached to a dynamically
 			 * enumerated port. Block new cxl_add_ep() and garbage
···
 	}

 	cxlrd->calc_hb = calc_hb;
+	mutex_init(&cxlrd->range_lock);

 	cxld = &cxlsd->cxld;
 	cxld->dev.type = &cxl_decoder_root_type;
···
 	debugfs_remove_recursive(cxl_debugfs);
 }

-module_init(cxl_core_init);
+subsys_initcall(cxl_core_init);
 module_exit(cxl_core_exit);
 MODULE_LICENSE("GPL v2");
drivers/cxl/core/region.c (+778 -80)
···
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/uuid.h>
+#include <linux/sort.h>
 #include <linux/idr.h>
 #include <cxlmem.h>
 #include <cxl.h>
···
 	rc = down_read_interruptible(&cxl_region_rwsem);
 	if (rc)
 		return rc;
-	rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
+	if (cxlr->mode != CXL_DECODER_PMEM)
+		rc = sysfs_emit(buf, "\n");
+	else
+		rc = sysfs_emit(buf, "%pUb\n", &p->uuid);
 	up_read(&cxl_region_rwsem);

 	return rc;
···
 	struct device *dev = kobj_to_dev(kobj);
 	struct cxl_region *cxlr = to_cxl_region(dev);

+	/*
+	 * Support tooling that expects to find a 'uuid' attribute for all
+	 * regions regardless of mode.
+	 */
 	if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
-		return 0;
+		return 0444;
 	return a->mode;
 }
···
 }
 static DEVICE_ATTR_RO(resource);

+static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+
+	return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode));
+}
+static DEVICE_ATTR_RO(mode);
+
 static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
 {
 	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
···
 	if (device_is_registered(&cxlr->dev))
 		lockdep_assert_held_write(&cxl_region_rwsem);
 	if (p->res) {
-		remove_resource(p->res);
+		/*
+		 * Autodiscovered regions may not have been able to insert their
+		 * resource.
+		 */
+		if (p->res->parent)
+			remove_resource(p->res);
 		kfree(p->res);
 		p->res = NULL;
 	}
···
 	&dev_attr_interleave_granularity.attr,
 	&dev_attr_resource.attr,
 	&dev_attr_size.attr,
+	&dev_attr_mode.attr,
 	NULL,
 };
···
 		return rc;
 	}

-	cxld->interleave_ways = iw;
-	cxld->interleave_granularity = ig;
-	cxld->hpa_range = (struct range) {
-		.start = p->res->start,
-		.end = p->res->end,
-	};
+	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
+		if (cxld->interleave_ways != iw ||
+		    cxld->interleave_granularity != ig ||
+		    cxld->hpa_range.start != p->res->start ||
+		    cxld->hpa_range.end != p->res->end ||
+		    ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
+			dev_err(&cxlr->dev,
+				"%s:%s %s expected iw: %d ig: %d %pr\n",
+				dev_name(port->uport), dev_name(&port->dev),
+				__func__, iw, ig, p->res);
+			dev_err(&cxlr->dev,
+				"%s:%s %s got iw: %d ig: %d state: %s %#llx:%#llx\n",
+				dev_name(port->uport), dev_name(&port->dev),
+				__func__, cxld->interleave_ways,
+				cxld->interleave_granularity,
+				(cxld->flags & CXL_DECODER_F_ENABLE) ?
+					"enabled" :
+					"disabled",
+				cxld->hpa_range.start, cxld->hpa_range.end);
+			return -ENXIO;
+		}
+	} else {
+		cxld->interleave_ways = iw;
+		cxld->interleave_granularity = ig;
+		cxld->hpa_range = (struct range) {
+			.start = p->res->start,
+			.end = p->res->end,
+		};
+	}
 	dev_dbg(&cxlr->dev, "%s:%s iw: %d ig: %d\n", dev_name(port->uport),
 		dev_name(&port->dev), iw, ig);
 add_target:
···
 		dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos);
 		return -ENXIO;
 	}
-	cxlsd->target[cxl_rr->nr_targets_set] = ep->dport;
+	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
+		if (cxlsd->target[cxl_rr->nr_targets_set] != ep->dport) {
+			dev_dbg(&cxlr->dev, "%s:%s: %s expected %s at %d\n",
+				dev_name(port->uport), dev_name(&port->dev),
+				dev_name(&cxlsd->cxld.dev),
+				dev_name(ep->dport->dport),
+				cxl_rr->nr_targets_set);
+			return -ENXIO;
+		}
+	} else
+		cxlsd->target[cxl_rr->nr_targets_set] = ep->dport;
 	inc = 1;
 out_target_set:
 	cxl_rr->nr_targets_set += inc;
···
 	struct cxl_ep *ep;
 	int i;

+	/*
+	 * In the auto-discovery case skip automatic teardown since the
+	 * address space is already active
+	 */
+	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
+		return;
+
 	for (i = 0; i < p->nr_targets; i++) {
 		cxled = p->targets[i];
 		cxlmd = cxled_to_memdev(cxled);
···
 		iter = to_cxl_port(iter->dev.parent);

 	/*
-	 * Descend the topology tree programming targets while
-	 * looking for conflicts.
+	 * Descend the topology tree programming / validating
+	 * targets while looking for conflicts.
 	 */
 	for (ep = cxl_ep_load(iter, cxlmd); iter;
 	     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
···
 	return 0;
 }

-static int cxl_region_attach(struct cxl_region *cxlr,
-			     struct cxl_endpoint_decoder *cxled, int pos)
+static int cxl_region_validate_position(struct cxl_region *cxlr,
+					struct cxl_endpoint_decoder *cxled,
+					int pos)
 {
-	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
-	struct cxl_port *ep_port, *root_port, *iter;
 	struct cxl_region_params *p = &cxlr->params;
-	struct cxl_dport *dport;
-	int i, rc = -ENXIO;
-
-	if (cxled->mode == CXL_DECODER_DEAD) {
-		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
-		return -ENODEV;
-	}
-
-	/* all full of members, or interleave config not established? */
-	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
-		dev_dbg(&cxlr->dev, "region already active\n");
-		return -EBUSY;
-	} else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
-		dev_dbg(&cxlr->dev, "interleave config missing\n");
-		return -ENXIO;
-	}
+	int i;

 	if (pos < 0 || pos >= p->interleave_ways) {
 		dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos,
···
 		}
 	}

+	return 0;
+}
+
+static int cxl_region_attach_position(struct cxl_region *cxlr,
+				      struct cxl_root_decoder *cxlrd,
+				      struct cxl_endpoint_decoder *cxled,
+				      const struct cxl_dport *dport, int pos)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_port *iter;
+	int rc;
+
+	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
+		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
+			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
+			dev_name(&cxlrd->cxlsd.cxld.dev));
+		return -ENXIO;
+	}
+
+	for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
+	     iter = to_cxl_port(iter->dev.parent)) {
+		rc = cxl_port_attach_region(iter, cxlr, cxled, pos);
+		if (rc)
+			goto err;
+	}
+
+	return 0;
+
+err:
+	for (iter = cxled_to_port(cxled); !is_cxl_root(iter);
+	     iter = to_cxl_port(iter->dev.parent))
+		cxl_port_detach_region(iter, cxlr, cxled);
+	return rc;
+}
+
+static int cxl_region_attach_auto(struct cxl_region *cxlr,
+				  struct cxl_endpoint_decoder *cxled, int pos)
+{
+	struct cxl_region_params *p = &cxlr->params;
+
+	if (cxled->state != CXL_DECODER_STATE_AUTO) {
+		dev_err(&cxlr->dev,
+			"%s: unable to add decoder to autodetected region\n",
+			dev_name(&cxled->cxld.dev));
+		return -EINVAL;
+	}
+
+	if (pos >= 0) {
+		dev_dbg(&cxlr->dev, "%s: expected auto position, not %d\n",
+			dev_name(&cxled->cxld.dev), pos);
+		return -EINVAL;
+	}
+
+	if (p->nr_targets >= p->interleave_ways) {
+		dev_err(&cxlr->dev, "%s: no more target slots available\n",
+			dev_name(&cxled->cxld.dev));
+		return -ENXIO;
+	}
+
+	/*
+	 * Temporarily record the endpoint decoder into the target array. Yes,
+	 * this means that userspace can view devices in the wrong position
+	 * before the region activates, and must be careful to understand when
+	 * it might be racing region autodiscovery.
+	 */
+	pos = p->nr_targets;
+	p->targets[pos] = cxled;
+	cxled->pos = pos;
+	p->nr_targets++;
+
+	return 0;
+}
+
+static struct cxl_port *next_port(struct cxl_port *port)
+{
+	if (!port->parent_dport)
+		return NULL;
+	return port->parent_dport->port;
+}
+
+static int decoder_match_range(struct device *dev, void *data)
+{
+	struct cxl_endpoint_decoder *cxled = data;
+	struct cxl_switch_decoder *cxlsd;
+
+	if (!is_switch_decoder(dev))
+		return 0;
+
+	cxlsd = to_cxl_switch_decoder(dev);
+	return range_contains(&cxlsd->cxld.hpa_range, &cxled->cxld.hpa_range);
+}
+
+static void find_positions(const struct cxl_switch_decoder *cxlsd,
+			   const struct cxl_port *iter_a,
+			   const struct cxl_port *iter_b, int *a_pos,
+			   int *b_pos)
+{
+	int i;
+
+	for (i = 0, *a_pos = -1, *b_pos = -1; i < cxlsd->nr_targets; i++) {
+		if (cxlsd->target[i] == iter_a->parent_dport)
+			*a_pos = i;
+		else if (cxlsd->target[i] == iter_b->parent_dport)
+			*b_pos = i;
+		if (*a_pos >= 0 && *b_pos >= 0)
+			break;
+	}
+}
+
+static int cmp_decode_pos(const void *a, const void *b)
+{
+	struct cxl_endpoint_decoder *cxled_a = *(typeof(cxled_a) *)a;
+	struct cxl_endpoint_decoder *cxled_b = *(typeof(cxled_b) *)b;
+	struct cxl_memdev *cxlmd_a = cxled_to_memdev(cxled_a);
+	struct cxl_memdev *cxlmd_b = cxled_to_memdev(cxled_b);
+	struct cxl_port *port_a = cxled_to_port(cxled_a);
+	struct cxl_port *port_b = cxled_to_port(cxled_b);
+	struct cxl_port *iter_a, *iter_b, *port = NULL;
+	struct cxl_switch_decoder *cxlsd;
+	struct device *dev;
+	int a_pos, b_pos;
+	unsigned int seq;
+
+	/* Exit early if any prior sorting failed */
+	if (cxled_a->pos < 0 || cxled_b->pos < 0)
+		return 0;
+
+	/*
+	 * Walk up the hierarchy to find a shared port, find the decoder that
+	 * maps the range, compare the relative position of those dport
+	 * mappings.
+	 */
+	for (iter_a = port_a; iter_a; iter_a = next_port(iter_a)) {
+		struct cxl_port *next_a, *next_b;
+
+		next_a = next_port(iter_a);
+		if (!next_a)
+			break;
+
+		for (iter_b = port_b; iter_b; iter_b = next_port(iter_b)) {
+			next_b = next_port(iter_b);
+			if (next_a != next_b)
+				continue;
+			port = next_a;
+			break;
+		}
+
+		if (port)
+			break;
+	}
+
+	if (!port) {
+		dev_err(cxlmd_a->dev.parent,
+			"failed to find shared port with %s\n",
+			dev_name(cxlmd_b->dev.parent));
+		goto err;
+	}
+
+	dev = device_find_child(&port->dev, cxled_a, decoder_match_range);
+	if (!dev) {
+		struct range *range = &cxled_a->cxld.hpa_range;
+
+		dev_err(port->uport,
+			"failed to find decoder that maps %#llx-%#llx\n",
+			range->start, range->end);
+		goto err;
+	}
+
+	cxlsd = to_cxl_switch_decoder(dev);
+	do {
+		seq = read_seqbegin(&cxlsd->target_lock);
+		find_positions(cxlsd, iter_a, iter_b, &a_pos, &b_pos);
+	} while (read_seqretry(&cxlsd->target_lock, seq));
+
+	put_device(dev);
+
+	if (a_pos < 0 || b_pos < 0) {
+		dev_err(port->uport,
+			"failed to find shared decoder for %s and %s\n",
+			dev_name(cxlmd_a->dev.parent),
+			dev_name(cxlmd_b->dev.parent));
+		goto err;
+	}
+
+	dev_dbg(port->uport, "%s comes %s %s\n", dev_name(cxlmd_a->dev.parent),
+		a_pos - b_pos < 0 ? "before" : "after",
+		dev_name(cxlmd_b->dev.parent));
+
+	return a_pos - b_pos;
+err:
+	cxled_a->pos = -1;
+	return 0;
+}
+
+static int cxl_region_sort_targets(struct cxl_region *cxlr)
+{
+	struct cxl_region_params *p = &cxlr->params;
+	int i, rc = 0;
+
+	sort(p->targets, p->nr_targets, sizeof(p->targets[0]), cmp_decode_pos,
+	     NULL);
+
+	for (i = 0; i < p->nr_targets; i++) {
+		struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+		/*
+		 * Record that sorting failed, but still continue to restore
+		 * cxled->pos with its ->targets[] position so that follow-on
+		 * code paths can reliably do p->targets[cxled->pos] to
+		 * self-reference their entry.
+		 */
+		if (cxled->pos < 0)
+			rc = -ENXIO;
+		cxled->pos = i;
+	}
+
+	dev_dbg(&cxlr->dev, "region sort %s\n", rc ? "failed" : "successful");
+	return rc;
+}
+
+static int cxl_region_attach(struct cxl_region *cxlr,
+			     struct cxl_endpoint_decoder *cxled, int pos)
+{
+	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_port *ep_port, *root_port;
+	struct cxl_dport *dport;
+	int rc = -ENXIO;
+
+	if (cxled->mode != cxlr->mode) {
+		dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n",
+			dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode);
+		return -EINVAL;
+	}
+
+	if (cxled->mode == CXL_DECODER_DEAD) {
+		dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev));
+		return -ENODEV;
+	}
+
+	/* all full of members, or interleave config not established? */
+	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
+		dev_dbg(&cxlr->dev, "region already active\n");
+		return -EBUSY;
+	} else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) {
+		dev_dbg(&cxlr->dev, "interleave config missing\n");
+		return -ENXIO;
+	}
+
 	ep_port = cxled_to_port(cxled);
 	root_port = cxlrd_to_port(cxlrd);
 	dport = cxl_find_dport_by_dev(root_port, ep_port->host_bridge);
···
 		dev_dbg(&cxlr->dev, "%s:%s invalid target for %s\n",
 			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
 			dev_name(cxlr->dev.parent));
-		return -ENXIO;
-	}
-
-	if (cxlrd->calc_hb(cxlrd, pos) != dport) {
-		dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
-			dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
-			dev_name(&cxlrd->cxlsd.cxld.dev));
 		return -ENXIO;
 	}
···
 		return -EINVAL;
 	}

-	for (iter = ep_port; !is_cxl_root(iter);
-	     iter = to_cxl_port(iter->dev.parent)) {
-		rc = cxl_port_attach_region(iter, cxlr, cxled, pos);
+	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
+		int i;
+
+		rc = cxl_region_attach_auto(cxlr, cxled, pos);
 		if (rc)
-			goto err;
+			return rc;
+
+		/* await more targets to arrive... */
+		if (p->nr_targets < p->interleave_ways)
+			return 0;
+
+		/*
+		 * All targets are here, which implies all PCI enumeration that
+		 * affects this region has been completed. Walk the topology to
+		 * sort the devices into their relative region decode position.
+		 */
+		rc = cxl_region_sort_targets(cxlr);
+		if (rc)
+			return rc;
+
+		for (i = 0; i < p->nr_targets; i++) {
+			cxled = p->targets[i];
+			ep_port = cxled_to_port(cxled);
+			dport = cxl_find_dport_by_dev(root_port,
+						      ep_port->host_bridge);
+			rc = cxl_region_attach_position(cxlr, cxlrd, cxled,
+							dport, i);
+			if (rc)
+				return rc;
+		}
+
+		rc = cxl_region_setup_targets(cxlr);
+		if (rc)
+			return rc;
+
+		/*
+		 * If target setup succeeds in the autodiscovery case
+		 * then the region is already committed.
+		 */
+		p->state = CXL_CONFIG_COMMIT;
+
+		return 0;
 	}
+
+	rc = cxl_region_validate_position(cxlr, cxled, pos);
+	if (rc)
+		return rc;
+
+	rc = cxl_region_attach_position(cxlr, cxlrd, cxled, dport, pos);
+	if (rc)
+		return rc;

 	p->targets[pos] = cxled;
 	cxled->pos = pos;
···
 err_decrement:
 	p->nr_targets--;
-err:
-	for (iter = ep_port; !is_cxl_root(iter);
-	     iter = to_cxl_port(iter->dev.parent))
-		cxl_port_detach_region(iter, cxlr, cxled);
+	cxled->pos = -1;
+	p->targets[pos] = NULL;
 	return rc;
 }
···
 	up_write(&cxl_region_rwsem);
 }

-static int attach_target(struct cxl_region *cxlr, const char *decoder, int pos)
+static int attach_target(struct cxl_region *cxlr,
+			 struct cxl_endpoint_decoder *cxled, int pos,
+			 unsigned int state)
 {
-	struct device *dev;
-	int rc;
+	int rc = 0;

-	dev = bus_find_device_by_name(&cxl_bus_type, NULL, decoder);
-	if (!dev)
-		return -ENODEV;
-
-	if (!is_endpoint_decoder(dev)) {
-		put_device(dev);
-		return -EINVAL;
-	}
-
-	rc = down_write_killable(&cxl_region_rwsem);
+	if (state == TASK_INTERRUPTIBLE)
+		rc =
down_write_killable(&cxl_region_rwsem); 1412 + else 1413 + down_write(&cxl_region_rwsem); 1752 1414 if (rc) 1753 - goto out; 1415 + return rc; 1416 + 1754 1417 down_read(&cxl_dpa_rwsem); 1755 - rc = cxl_region_attach(cxlr, to_cxl_endpoint_decoder(dev), pos); 1418 + rc = cxl_region_attach(cxlr, cxled, pos); 1756 1419 if (rc == 0) 1757 1420 set_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags); 1758 1421 up_read(&cxl_dpa_rwsem); 1759 1422 up_write(&cxl_region_rwsem); 1760 - out: 1761 - put_device(dev); 1762 1423 return rc; 1763 1424 } 1764 1425 ··· 1790 1463 1791 1464 if (sysfs_streq(buf, "\n")) 1792 1465 rc = detach_target(cxlr, pos); 1793 - else 1794 - rc = attach_target(cxlr, buf, pos); 1466 + else { 1467 + struct device *dev; 1468 + 1469 + dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf); 1470 + if (!dev) 1471 + return -ENODEV; 1472 + 1473 + if (!is_endpoint_decoder(dev)) { 1474 + rc = -EINVAL; 1475 + goto out; 1476 + } 1477 + 1478 + rc = attach_target(cxlr, to_cxl_endpoint_decoder(dev), pos, 1479 + TASK_INTERRUPTIBLE); 1480 + out: 1481 + put_device(dev); 1482 + } 1795 1483 1796 1484 if (rc < 0) 1797 1485 return rc; ··· 2010 1668 struct device *dev; 2011 1669 int rc; 2012 1670 1671 + switch (mode) { 1672 + case CXL_DECODER_RAM: 1673 + case CXL_DECODER_PMEM: 1674 + break; 1675 + default: 1676 + dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode); 1677 + return ERR_PTR(-EINVAL); 1678 + } 1679 + 2013 1680 cxlr = cxl_region_alloc(cxlrd, id); 2014 1681 if (IS_ERR(cxlr)) 2015 1682 return cxlr; ··· 2047 1696 return ERR_PTR(rc); 2048 1697 } 2049 1698 1699 + static ssize_t __create_region_show(struct cxl_root_decoder *cxlrd, char *buf) 1700 + { 1701 + return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id)); 1702 + } 1703 + 2050 1704 static ssize_t create_pmem_region_show(struct device *dev, 2051 1705 struct device_attribute *attr, char *buf) 2052 1706 { 2053 - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); 1707 + return 
__create_region_show(to_cxl_root_decoder(dev), buf); 1708 + } 2054 1709 2055 - return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id)); 1710 + static ssize_t create_ram_region_show(struct device *dev, 1711 + struct device_attribute *attr, char *buf) 1712 + { 1713 + return __create_region_show(to_cxl_root_decoder(dev), buf); 1714 + } 1715 + 1716 + static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd, 1717 + enum cxl_decoder_mode mode, int id) 1718 + { 1719 + int rc; 1720 + 1721 + rc = memregion_alloc(GFP_KERNEL); 1722 + if (rc < 0) 1723 + return ERR_PTR(rc); 1724 + 1725 + if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) { 1726 + memregion_free(rc); 1727 + return ERR_PTR(-EBUSY); 1728 + } 1729 + 1730 + return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_EXPANDER); 2056 1731 } 2057 1732 2058 1733 static ssize_t create_pmem_region_store(struct device *dev, ··· 2087 1710 { 2088 1711 struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); 2089 1712 struct cxl_region *cxlr; 2090 - int id, rc; 1713 + int rc, id; 2091 1714 2092 1715 rc = sscanf(buf, "region%d\n", &id); 2093 1716 if (rc != 1) 2094 1717 return -EINVAL; 2095 1718 2096 - rc = memregion_alloc(GFP_KERNEL); 2097 - if (rc < 0) 2098 - return rc; 2099 - 2100 - if (atomic_cmpxchg(&cxlrd->region_id, id, rc) != id) { 2101 - memregion_free(rc); 2102 - return -EBUSY; 2103 - } 2104 - 2105 - cxlr = devm_cxl_add_region(cxlrd, id, CXL_DECODER_PMEM, 2106 - CXL_DECODER_EXPANDER); 1719 + cxlr = __create_region(cxlrd, CXL_DECODER_PMEM, id); 2107 1720 if (IS_ERR(cxlr)) 2108 1721 return PTR_ERR(cxlr); 2109 1722 2110 1723 return len; 2111 1724 } 2112 1725 DEVICE_ATTR_RW(create_pmem_region); 1726 + 1727 + static ssize_t create_ram_region_store(struct device *dev, 1728 + struct device_attribute *attr, 1729 + const char *buf, size_t len) 1730 + { 1731 + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev); 1732 + struct cxl_region *cxlr; 1733 + int rc, id; 1734 + 1735 + rc = sscanf(buf, 
"region%d\n", &id); 1736 + if (rc != 1) 1737 + return -EINVAL; 1738 + 1739 + cxlr = __create_region(cxlrd, CXL_DECODER_RAM, id); 1740 + if (IS_ERR(cxlr)) 1741 + return PTR_ERR(cxlr); 1742 + 1743 + return len; 1744 + } 1745 + DEVICE_ATTR_RW(create_ram_region); 2113 1746 2114 1747 static ssize_t region_show(struct device *dev, struct device_attribute *attr, 2115 1748 char *buf) ··· 2280 1893 return cxlr_pmem; 2281 1894 } 2282 1895 1896 + static void cxl_dax_region_release(struct device *dev) 1897 + { 1898 + struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev); 1899 + 1900 + kfree(cxlr_dax); 1901 + } 1902 + 1903 + static const struct attribute_group *cxl_dax_region_attribute_groups[] = { 1904 + &cxl_base_attribute_group, 1905 + NULL, 1906 + }; 1907 + 1908 + const struct device_type cxl_dax_region_type = { 1909 + .name = "cxl_dax_region", 1910 + .release = cxl_dax_region_release, 1911 + .groups = cxl_dax_region_attribute_groups, 1912 + }; 1913 + 1914 + static bool is_cxl_dax_region(struct device *dev) 1915 + { 1916 + return dev->type == &cxl_dax_region_type; 1917 + } 1918 + 1919 + struct cxl_dax_region *to_cxl_dax_region(struct device *dev) 1920 + { 1921 + if (dev_WARN_ONCE(dev, !is_cxl_dax_region(dev), 1922 + "not a cxl_dax_region device\n")) 1923 + return NULL; 1924 + return container_of(dev, struct cxl_dax_region, dev); 1925 + } 1926 + EXPORT_SYMBOL_NS_GPL(to_cxl_dax_region, CXL); 1927 + 1928 + static struct lock_class_key cxl_dax_region_key; 1929 + 1930 + static struct cxl_dax_region *cxl_dax_region_alloc(struct cxl_region *cxlr) 1931 + { 1932 + struct cxl_region_params *p = &cxlr->params; 1933 + struct cxl_dax_region *cxlr_dax; 1934 + struct device *dev; 1935 + 1936 + down_read(&cxl_region_rwsem); 1937 + if (p->state != CXL_CONFIG_COMMIT) { 1938 + cxlr_dax = ERR_PTR(-ENXIO); 1939 + goto out; 1940 + } 1941 + 1942 + cxlr_dax = kzalloc(sizeof(*cxlr_dax), GFP_KERNEL); 1943 + if (!cxlr_dax) { 1944 + cxlr_dax = ERR_PTR(-ENOMEM); 1945 + goto out; 1946 + } 1947 + 1948 
+ cxlr_dax->hpa_range.start = p->res->start; 1949 + cxlr_dax->hpa_range.end = p->res->end; 1950 + 1951 + dev = &cxlr_dax->dev; 1952 + cxlr_dax->cxlr = cxlr; 1953 + device_initialize(dev); 1954 + lockdep_set_class(&dev->mutex, &cxl_dax_region_key); 1955 + device_set_pm_not_required(dev); 1956 + dev->parent = &cxlr->dev; 1957 + dev->bus = &cxl_bus_type; 1958 + dev->type = &cxl_dax_region_type; 1959 + out: 1960 + up_read(&cxl_region_rwsem); 1961 + 1962 + return cxlr_dax; 1963 + } 1964 + 2283 1965 static void cxlr_pmem_unregister(void *_cxlr_pmem) 2284 1966 { 2285 1967 struct cxl_pmem_region *cxlr_pmem = _cxlr_pmem; ··· 2433 1977 return rc; 2434 1978 } 2435 1979 1980 + static void cxlr_dax_unregister(void *_cxlr_dax) 1981 + { 1982 + struct cxl_dax_region *cxlr_dax = _cxlr_dax; 1983 + 1984 + device_unregister(&cxlr_dax->dev); 1985 + } 1986 + 1987 + static int devm_cxl_add_dax_region(struct cxl_region *cxlr) 1988 + { 1989 + struct cxl_dax_region *cxlr_dax; 1990 + struct device *dev; 1991 + int rc; 1992 + 1993 + cxlr_dax = cxl_dax_region_alloc(cxlr); 1994 + if (IS_ERR(cxlr_dax)) 1995 + return PTR_ERR(cxlr_dax); 1996 + 1997 + dev = &cxlr_dax->dev; 1998 + rc = dev_set_name(dev, "dax_region%d", cxlr->id); 1999 + if (rc) 2000 + goto err; 2001 + 2002 + rc = device_add(dev); 2003 + if (rc) 2004 + goto err; 2005 + 2006 + dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent), 2007 + dev_name(dev)); 2008 + 2009 + return devm_add_action_or_reset(&cxlr->dev, cxlr_dax_unregister, 2010 + cxlr_dax); 2011 + err: 2012 + put_device(dev); 2013 + return rc; 2014 + } 2015 + 2016 + static int match_decoder_by_range(struct device *dev, void *data) 2017 + { 2018 + struct range *r1, *r2 = data; 2019 + struct cxl_root_decoder *cxlrd; 2020 + 2021 + if (!is_root_decoder(dev)) 2022 + return 0; 2023 + 2024 + cxlrd = to_cxl_root_decoder(dev); 2025 + r1 = &cxlrd->cxlsd.cxld.hpa_range; 2026 + return range_contains(r1, r2); 2027 + } 2028 + 2029 + static int match_region_by_range(struct device 
*dev, void *data) 2030 + { 2031 + struct cxl_region_params *p; 2032 + struct cxl_region *cxlr; 2033 + struct range *r = data; 2034 + int rc = 0; 2035 + 2036 + if (!is_cxl_region(dev)) 2037 + return 0; 2038 + 2039 + cxlr = to_cxl_region(dev); 2040 + p = &cxlr->params; 2041 + 2042 + down_read(&cxl_region_rwsem); 2043 + if (p->res && p->res->start == r->start && p->res->end == r->end) 2044 + rc = 1; 2045 + up_read(&cxl_region_rwsem); 2046 + 2047 + return rc; 2048 + } 2049 + 2050 + /* Establish an empty region covering the given HPA range */ 2051 + static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, 2052 + struct cxl_endpoint_decoder *cxled) 2053 + { 2054 + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); 2055 + struct cxl_port *port = cxlrd_to_port(cxlrd); 2056 + struct range *hpa = &cxled->cxld.hpa_range; 2057 + struct cxl_region_params *p; 2058 + struct cxl_region *cxlr; 2059 + struct resource *res; 2060 + int rc; 2061 + 2062 + do { 2063 + cxlr = __create_region(cxlrd, cxled->mode, 2064 + atomic_read(&cxlrd->region_id)); 2065 + } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY); 2066 + 2067 + if (IS_ERR(cxlr)) { 2068 + dev_err(cxlmd->dev.parent, 2069 + "%s:%s: %s failed assign region: %ld\n", 2070 + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), 2071 + __func__, PTR_ERR(cxlr)); 2072 + return cxlr; 2073 + } 2074 + 2075 + down_write(&cxl_region_rwsem); 2076 + p = &cxlr->params; 2077 + if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) { 2078 + dev_err(cxlmd->dev.parent, 2079 + "%s:%s: %s autodiscovery interrupted\n", 2080 + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), 2081 + __func__); 2082 + rc = -EBUSY; 2083 + goto err; 2084 + } 2085 + 2086 + set_bit(CXL_REGION_F_AUTO, &cxlr->flags); 2087 + 2088 + res = kmalloc(sizeof(*res), GFP_KERNEL); 2089 + if (!res) { 2090 + rc = -ENOMEM; 2091 + goto err; 2092 + } 2093 + 2094 + *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa), 2095 + dev_name(&cxlr->dev)); 2096 + rc = 
insert_resource(cxlrd->res, res); 2097 + if (rc) { 2098 + /* 2099 + * Platform-firmware may not have split resources like "System 2100 + * RAM" on CXL window boundaries, see cxl_region_iomem_release() 2101 + */ 2102 + dev_warn(cxlmd->dev.parent, 2103 + "%s:%s: %s %s cannot insert resource\n", 2104 + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), 2105 + __func__, dev_name(&cxlr->dev)); 2106 + } 2107 + 2108 + p->res = res; 2109 + p->interleave_ways = cxled->cxld.interleave_ways; 2110 + p->interleave_granularity = cxled->cxld.interleave_granularity; 2111 + p->state = CXL_CONFIG_INTERLEAVE_ACTIVE; 2112 + 2113 + rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group()); 2114 + if (rc) 2115 + goto err; 2116 + 2117 + dev_dbg(cxlmd->dev.parent, "%s:%s: %s %s res: %pr iw: %d ig: %d\n", 2118 + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), __func__, 2119 + dev_name(&cxlr->dev), p->res, p->interleave_ways, 2120 + p->interleave_granularity); 2121 + 2122 + /* ...to match put_device() in cxl_add_to_region() */ 2123 + get_device(&cxlr->dev); 2124 + up_write(&cxl_region_rwsem); 2125 + 2126 + return cxlr; 2127 + 2128 + err: 2129 + up_write(&cxl_region_rwsem); 2130 + devm_release_action(port->uport, unregister_region, cxlr); 2131 + return ERR_PTR(rc); 2132 + } 2133 + 2134 + int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled) 2135 + { 2136 + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); 2137 + struct range *hpa = &cxled->cxld.hpa_range; 2138 + struct cxl_decoder *cxld = &cxled->cxld; 2139 + struct device *cxlrd_dev, *region_dev; 2140 + struct cxl_root_decoder *cxlrd; 2141 + struct cxl_region_params *p; 2142 + struct cxl_region *cxlr; 2143 + bool attach = false; 2144 + int rc = 0; 2145 + 2146 + cxlrd_dev = device_find_child(&root->dev, &cxld->hpa_range, 2147 + match_decoder_by_range); 2148 + if (!cxlrd_dev) { 2149 + dev_err(cxlmd->dev.parent, 2150 + "%s:%s no CXL window for range %#llx:%#llx\n", 2151 + dev_name(&cxlmd->dev), 
dev_name(&cxld->dev), 2152 + cxld->hpa_range.start, cxld->hpa_range.end); 2153 + return -ENXIO; 2154 + } 2155 + 2156 + cxlrd = to_cxl_root_decoder(cxlrd_dev); 2157 + 2158 + /* 2159 + * Ensure that if multiple threads race to construct_region() for @hpa 2160 + * one does the construction and the others add to that. 2161 + */ 2162 + mutex_lock(&cxlrd->range_lock); 2163 + region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa, 2164 + match_region_by_range); 2165 + if (!region_dev) { 2166 + cxlr = construct_region(cxlrd, cxled); 2167 + region_dev = &cxlr->dev; 2168 + } else 2169 + cxlr = to_cxl_region(region_dev); 2170 + mutex_unlock(&cxlrd->range_lock); 2171 + 2172 + if (IS_ERR(cxlr)) { 2173 + rc = PTR_ERR(cxlr); 2174 + goto out; 2175 + } 2176 + 2177 + attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE); 2178 + 2179 + down_read(&cxl_region_rwsem); 2180 + p = &cxlr->params; 2181 + attach = p->state == CXL_CONFIG_COMMIT; 2182 + up_read(&cxl_region_rwsem); 2183 + 2184 + if (attach) { 2185 + /* 2186 + * If device_attach() fails the range may still be active via 2187 + * the platform-firmware memory map, otherwise the driver for 2188 + * regions is local to this file, so driver matching can't fail. 
2189 + */ 2190 + if (device_attach(&cxlr->dev) < 0) 2191 + dev_err(&cxlr->dev, "failed to enable, range: %pr\n", 2192 + p->res); 2193 + } 2194 + 2195 + put_device(region_dev); 2196 + out: 2197 + put_device(cxlrd_dev); 2198 + return rc; 2199 + } 2200 + EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, CXL); 2201 + 2436 2202 static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) 2437 2203 { 2438 2204 if (!test_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags)) ··· 2677 1999 cpu_cache_invalidate_memregion(IORES_DESC_CXL); 2678 2000 clear_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags); 2679 2001 return 0; 2002 + } 2003 + 2004 + static int is_system_ram(struct resource *res, void *arg) 2005 + { 2006 + struct cxl_region *cxlr = arg; 2007 + struct cxl_region_params *p = &cxlr->params; 2008 + 2009 + dev_dbg(&cxlr->dev, "%pr has System RAM: %pr\n", p->res, res); 2010 + return 1; 2680 2011 } 2681 2012 2682 2013 static int cxl_region_probe(struct device *dev) ··· 2721 2034 switch (cxlr->mode) { 2722 2035 case CXL_DECODER_PMEM: 2723 2036 return devm_cxl_add_pmem_region(cxlr); 2037 + case CXL_DECODER_RAM: 2038 + /* 2039 + * The region cannot be managed by CXL if any portion of 2040 + * it is already online as 'System RAM' 2041 + */ 2042 + if (walk_iomem_res_desc(IORES_DESC_NONE, 2043 + IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY, 2044 + p->res->start, p->res->end, cxlr, 2045 + is_system_ram) > 0) 2046 + return 0; 2047 + return devm_cxl_add_dax_region(cxlr); 2724 2048 default: 2725 2049 dev_dbg(&cxlr->dev, "unsupported region mode: %d\n", 2726 2050 cxlr->mode);
+57
drivers/cxl/cxl.h
··· 277 277 * cxl_decoder flags that define the type of memory / devices this 278 278 * decoder supports as well as configuration lock status See "CXL 2.0 279 279 * 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details. 280 + * Additionally indicate whether decoder settings were autodetected or 281 + * user customized. 280 282 */ 281 283 #define CXL_DECODER_F_RAM BIT(0) 282 284 #define CXL_DECODER_F_PMEM BIT(1) ··· 338 336 CXL_DECODER_DEAD, 339 337 }; 340 338 339 + static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode) 340 + { 341 + static const char * const names[] = { 342 + [CXL_DECODER_NONE] = "none", 343 + [CXL_DECODER_RAM] = "ram", 344 + [CXL_DECODER_PMEM] = "pmem", 345 + [CXL_DECODER_MIXED] = "mixed", 346 + }; 347 + 348 + if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED) 349 + return names[mode]; 350 + return "mixed"; 351 + } 352 + 353 + /* 354 + * Track whether this decoder is reserved for region autodiscovery, or 355 + * free for userspace provisioning. 
356 + */ 357 + enum cxl_decoder_state { 358 + CXL_DECODER_STATE_MANUAL, 359 + CXL_DECODER_STATE_AUTO, 360 + }; 361 + 341 362 /** 342 363 * struct cxl_endpoint_decoder - Endpoint / SPA to DPA decoder 343 364 * @cxld: base cxl_decoder_object 344 365 * @dpa_res: actively claimed DPA span of this decoder 345 366 * @skip: offset into @dpa_res where @cxld.hpa_range maps 346 367 * @mode: which memory type / access-mode-partition this decoder targets 368 + * @state: autodiscovery state 347 369 * @pos: interleave position in @cxld.region 348 370 */ 349 371 struct cxl_endpoint_decoder { ··· 375 349 struct resource *dpa_res; 376 350 resource_size_t skip; 377 351 enum cxl_decoder_mode mode; 352 + enum cxl_decoder_state state; 378 353 int pos; 379 354 }; 380 355 ··· 409 382 * @region_id: region id for next region provisioning event 410 383 * @calc_hb: which host bridge covers the n'th position by granularity 411 384 * @platform_data: platform specific configuration data 385 + * @range_lock: sync region autodiscovery by address range 412 386 * @cxlsd: base cxl switch decoder 413 387 */ 414 388 struct cxl_root_decoder { ··· 417 389 atomic_t region_id; 418 390 cxl_calc_hb_fn calc_hb; 419 391 void *platform_data; 392 + struct mutex range_lock; 420 393 struct cxl_switch_decoder cxlsd; 421 394 }; 422 395 ··· 466 437 * CPU cache state at region activation time. 467 438 */ 468 439 #define CXL_REGION_F_INCOHERENT 0 440 + 441 + /* 442 + * Indicate whether this region has been assembled by autodetection or 443 + * userspace assembly. Prevent endpoint decoders outside of automatic 444 + * detection from being added to the region. 
445 + */ 446 + #define CXL_REGION_F_AUTO 1 469 447 470 448 /** 471 449 * struct cxl_region - CXL region ··· 527 491 struct range hpa_range; 528 492 int nr_mappings; 529 493 struct cxl_pmem_region_mapping mapping[]; 494 + }; 495 + 496 + struct cxl_dax_region { 497 + struct device dev; 498 + struct cxl_region *cxlr; 499 + struct range hpa_range; 530 500 }; 531 501 532 502 /** ··· 675 633 676 634 struct cxl_decoder *to_cxl_decoder(struct device *dev); 677 635 struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev); 636 + struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev); 678 637 struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev); 679 638 bool is_root_decoder(struct device *dev); 639 + bool is_switch_decoder(struct device *dev); 680 640 bool is_endpoint_decoder(struct device *dev); 681 641 struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port, 682 642 unsigned int nr_targets, ··· 729 685 #define CXL_DEVICE_MEMORY_EXPANDER 5 730 686 #define CXL_DEVICE_REGION 6 731 687 #define CXL_DEVICE_PMEM_REGION 7 688 + #define CXL_DEVICE_DAX_REGION 8 732 689 733 690 #define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*") 734 691 #define CXL_MODALIAS_FMT "cxl:t%d" ··· 746 701 #ifdef CONFIG_CXL_REGION 747 702 bool is_cxl_pmem_region(struct device *dev); 748 703 struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev); 704 + int cxl_add_to_region(struct cxl_port *root, 705 + struct cxl_endpoint_decoder *cxled); 706 + struct cxl_dax_region *to_cxl_dax_region(struct device *dev); 749 707 #else 750 708 static inline bool is_cxl_pmem_region(struct device *dev) 751 709 { 752 710 return false; 753 711 } 754 712 static inline struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev) 713 + { 714 + return NULL; 715 + } 716 + static inline int cxl_add_to_region(struct cxl_port *root, 717 + struct cxl_endpoint_decoder *cxled) 718 + { 719 + return 0; 720 + } 721 + static inline struct cxl_dax_region 
*to_cxl_dax_region(struct device *dev) 755 722 { 756 723 return NULL; 757 724 }
+5
drivers/cxl/cxlmem.h
··· 39 39 * @cxl_nvb: coordinate removal of @cxl_nvd if present 40 40 * @cxl_nvd: optional bridge to an nvdimm if the device supports pmem 41 41 * @id: id number of this memdev instance. 42 + * @depth: endpoint port depth 42 43 */ 43 44 struct cxl_memdev { 44 45 struct device dev; ··· 49 48 struct cxl_nvdimm_bridge *cxl_nvb; 50 49 struct cxl_nvdimm *cxl_nvd; 51 50 int id; 51 + int depth; 52 52 }; 53 53 54 54 static inline struct cxl_memdev *to_cxl_memdev(struct device *dev) ··· 82 80 } 83 81 84 82 struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds); 83 + int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, 84 + resource_size_t base, resource_size_t len, 85 + resource_size_t skipped); 85 86 86 87 static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port, 87 88 struct cxl_memdev *cxlmd)
+83 -29
drivers/cxl/port.c
··· 30 30 schedule_cxl_memdev_detach(cxlmd); 31 31 } 32 32 33 - static int cxl_port_probe(struct device *dev) 33 + static int discover_region(struct device *dev, void *root) 34 34 { 35 - struct cxl_port *port = to_cxl_port(dev); 35 + struct cxl_endpoint_decoder *cxled; 36 + int rc; 37 + 38 + if (!is_endpoint_decoder(dev)) 39 + return 0; 40 + 41 + cxled = to_cxl_endpoint_decoder(dev); 42 + if ((cxled->cxld.flags & CXL_DECODER_F_ENABLE) == 0) 43 + return 0; 44 + 45 + if (cxled->state != CXL_DECODER_STATE_AUTO) 46 + return 0; 47 + 48 + /* 49 + * Region enumeration is opportunistic, if this add-event fails, 50 + * continue to the next endpoint decoder. 51 + */ 52 + rc = cxl_add_to_region(root, cxled); 53 + if (rc) 54 + dev_dbg(dev, "failed to add to region: %#llx-%#llx\n", 55 + cxled->cxld.hpa_range.start, cxled->cxld.hpa_range.end); 56 + 57 + return 0; 58 + } 59 + 60 + static int cxl_switch_port_probe(struct cxl_port *port) 61 + { 36 62 struct cxl_hdm *cxlhdm; 37 63 int rc; 38 64 65 + rc = devm_cxl_port_enumerate_dports(port); 66 + if (rc < 0) 67 + return rc; 39 68 40 - if (!is_cxl_endpoint(port)) { 41 - rc = devm_cxl_port_enumerate_dports(port); 42 - if (rc < 0) 43 - return rc; 44 - if (rc == 1) 45 - return devm_cxl_add_passthrough_decoder(port); 46 - } 69 + if (rc == 1) 70 + return devm_cxl_add_passthrough_decoder(port); 47 71 48 72 cxlhdm = devm_cxl_setup_hdm(port); 49 73 if (IS_ERR(cxlhdm)) 50 74 return PTR_ERR(cxlhdm); 51 75 52 - if (is_cxl_endpoint(port)) { 53 - struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport); 54 - struct cxl_dev_state *cxlds = cxlmd->cxlds; 76 + return devm_cxl_enumerate_decoders(cxlhdm); 77 + } 55 78 56 - /* Cache the data early to ensure is_visible() works */ 57 - read_cdat_data(port); 79 + static int cxl_endpoint_port_probe(struct cxl_port *port) 80 + { 81 + struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport); 82 + struct cxl_dev_state *cxlds = cxlmd->cxlds; 83 + struct cxl_hdm *cxlhdm; 84 + struct cxl_port *root; 85 + int rc; 58 86 
59 - get_device(&cxlmd->dev); 60 - rc = devm_add_action_or_reset(dev, schedule_detach, cxlmd); 61 - if (rc) 62 - return rc; 87 + cxlhdm = devm_cxl_setup_hdm(port); 88 + if (IS_ERR(cxlhdm)) 89 + return PTR_ERR(cxlhdm); 63 90 64 - rc = cxl_hdm_decode_init(cxlds, cxlhdm); 65 - if (rc) 66 - return rc; 91 + /* Cache the data early to ensure is_visible() works */ 92 + read_cdat_data(port); 67 93 68 - rc = cxl_await_media_ready(cxlds); 69 - if (rc) { 70 - dev_err(dev, "Media not active (%d)\n", rc); 71 - return rc; 72 - } 73 - } 94 + get_device(&cxlmd->dev); 95 + rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd); 96 + if (rc) 97 + return rc; 74 98 75 - rc = devm_cxl_enumerate_decoders(cxlhdm); 99 + rc = cxl_hdm_decode_init(cxlds, cxlhdm); 100 + if (rc) 101 + return rc; 102 + 103 + rc = cxl_await_media_ready(cxlds); 76 104 if (rc) { 77 - dev_err(dev, "Couldn't enumerate decoders (%d)\n", rc); 105 + dev_err(&port->dev, "Media not active (%d)\n", rc); 78 106 return rc; 79 107 } 80 108 109 + rc = devm_cxl_enumerate_decoders(cxlhdm); 110 + if (rc) 111 + return rc; 112 + 113 + /* 114 + * This can't fail in practice as CXL root exit unregisters all 115 + * descendant ports and that in turn synchronizes with cxl_port_probe() 116 + */ 117 + root = find_cxl_root(&cxlmd->dev); 118 + 119 + /* 120 + * Now that all endpoint decoders are successfully enumerated, try to 121 + * assemble regions from committed decoders 122 + */ 123 + device_for_each_child(&port->dev, root, discover_region); 124 + put_device(&root->dev); 125 + 81 126 return 0; 127 + } 128 + 129 + static int cxl_port_probe(struct device *dev) 130 + { 131 + struct cxl_port *port = to_cxl_port(dev); 132 + 133 + if (is_cxl_endpoint(port)) 134 + return cxl_endpoint_port_probe(port); 135 + return cxl_switch_port_probe(port); 82 136 } 83 137 84 138 static ssize_t CDAT_read(struct file *filp, struct kobject *kobj,
+15 -2
drivers/dax/Kconfig
··· 45 45 46 46 Say M if unsure. 47 47 48 + config DEV_DAX_CXL 49 + tristate "CXL DAX: direct access to CXL RAM regions" 50 + depends on CXL_REGION && DEV_DAX 51 + default CXL_REGION && DEV_DAX 52 + help 53 + CXL RAM regions are either mapped by platform-firmware 54 + and published in the initial system-memory map as "System RAM", mapped 55 + by platform-firmware as "Soft Reserved", or dynamically provisioned 56 + after boot by the CXL driver. In the latter two cases a device-dax 57 + instance is created to access that unmapped-by-default address range. 58 + As usual it can remain as dedicated access via a device interface, or 59 + be converted to "System RAM" via the dax_kmem facility. 60 + 48 61 config DEV_DAX_HMEM_DEVICES 49 - depends on DEV_DAX_HMEM && DAX=y 62 + depends on DEV_DAX_HMEM && DAX 50 63 def_bool y 51 64 52 65 config DEV_DAX_KMEM 53 - tristate "KMEM DAX: volatile-use of persistent memory" 66 + tristate "KMEM DAX: map dax-devices as System-RAM" 54 67 default DEV_DAX 55 68 depends on DEV_DAX 56 69 depends on MEMORY_HOTPLUG # for add_memory() and friends
+2
drivers/dax/Makefile
··· 3 3 obj-$(CONFIG_DEV_DAX) += device_dax.o 4 4 obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o 5 5 obj-$(CONFIG_DEV_DAX_PMEM) += dax_pmem.o 6 + obj-$(CONFIG_DEV_DAX_CXL) += dax_cxl.o 6 7 7 8 dax-y := super.o 8 9 dax-y += bus.o 9 10 device_dax-y := device.o 10 11 dax_pmem-y := pmem.o 12 + dax_cxl-y := cxl.o 11 13 12 14 obj-y += hmem/
+22 -31
drivers/dax/bus.c
··· 56 56 return match; 57 57 } 58 58 59 + static int dax_match_type(struct dax_device_driver *dax_drv, struct device *dev) 60 + { 61 + enum dax_driver_type type = DAXDRV_DEVICE_TYPE; 62 + struct dev_dax *dev_dax = to_dev_dax(dev); 63 + 64 + if (dev_dax->region->res.flags & IORESOURCE_DAX_KMEM) 65 + type = DAXDRV_KMEM_TYPE; 66 + 67 + if (dax_drv->type == type) 68 + return 1; 69 + 70 + /* default to device mode if dax_kmem is disabled */ 71 + if (dax_drv->type == DAXDRV_DEVICE_TYPE && 72 + !IS_ENABLED(CONFIG_DEV_DAX_KMEM)) 73 + return 1; 74 + 75 + return 0; 76 + } 77 + 59 78 enum id_action { 60 79 ID_REMOVE, 61 80 ID_ADD, ··· 235 216 { 236 217 struct dax_device_driver *dax_drv = to_dax_drv(drv); 237 218 238 - /* 239 - * All but the 'device-dax' driver, which has 'match_always' 240 - * set, requires an exact id match. 241 - */ 242 - if (dax_drv->match_always) 219 + if (dax_match_id(dax_drv, dev)) 243 220 return 1; 244 - 245 - return dax_match_id(dax_drv, dev); 221 + return dax_match_type(dax_drv, dev); 246 222 } 247 223 248 224 /* ··· 1427 1413 } 1428 1414 EXPORT_SYMBOL_GPL(devm_create_dev_dax); 1429 1415 1430 - static int match_always_count; 1431 - 1432 1416 int __dax_driver_register(struct dax_device_driver *dax_drv, 1433 1417 struct module *module, const char *mod_name) 1434 1418 { 1435 1419 struct device_driver *drv = &dax_drv->drv; 1436 - int rc = 0; 1437 1420 1438 1421 /* 1439 1422 * dax_bus_probe() calls dax_drv->probe() unconditionally. 
··· 1445 1434 drv->mod_name = mod_name; 1446 1435 drv->bus = &dax_bus_type; 1447 1436 1448 - /* there can only be one default driver */ 1449 - mutex_lock(&dax_bus_lock); 1450 - match_always_count += dax_drv->match_always; 1451 - if (match_always_count > 1) { 1452 - match_always_count--; 1453 - WARN_ON(1); 1454 - rc = -EINVAL; 1455 - } 1456 - mutex_unlock(&dax_bus_lock); 1457 - if (rc) 1458 - return rc; 1459 - 1460 - rc = driver_register(drv); 1461 - if (rc && dax_drv->match_always) { 1462 - mutex_lock(&dax_bus_lock); 1463 - match_always_count -= dax_drv->match_always; 1464 - mutex_unlock(&dax_bus_lock); 1465 - } 1466 - 1467 - return rc; 1437 + return driver_register(drv); 1468 1438 } 1469 1439 EXPORT_SYMBOL_GPL(__dax_driver_register); 1470 1440 ··· 1455 1463 struct dax_id *dax_id, *_id; 1456 1464 1457 1465 mutex_lock(&dax_bus_lock); 1458 - match_always_count -= dax_drv->match_always; 1459 1466 list_for_each_entry_safe(dax_id, _id, &dax_drv->ids, list) { 1460 1467 list_del(&dax_id->list); 1461 1468 kfree(dax_id);
+10 -2
drivers/dax/bus.h
··· 11 11 struct dax_region; 12 12 void dax_region_put(struct dax_region *dax_region); 13 13 14 - #define IORESOURCE_DAX_STATIC (1UL << 0) 14 + /* dax bus specific ioresource flags */ 15 + #define IORESOURCE_DAX_STATIC BIT(0) 16 + #define IORESOURCE_DAX_KMEM BIT(1) 17 + 15 18 struct dax_region *alloc_dax_region(struct device *parent, int region_id, 16 19 struct range *range, int target_node, unsigned int align, 17 20 unsigned long flags); ··· 28 25 29 26 struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data); 30 27 28 + enum dax_driver_type { 29 + DAXDRV_KMEM_TYPE, 30 + DAXDRV_DEVICE_TYPE, 31 + }; 32 + 31 33 struct dax_device_driver { 32 34 struct device_driver drv; 33 35 struct list_head ids; 34 - int match_always; 36 + enum dax_driver_type type; 35 37 int (*probe)(struct dev_dax *dev); 36 38 void (*remove)(struct dev_dax *dev); 37 39 };
+53
drivers/dax/cxl.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* Copyright(c) 2023 Intel Corporation. All rights reserved. */ 3 + #include <linux/module.h> 4 + #include <linux/dax.h> 5 + 6 + #include "../cxl/cxl.h" 7 + #include "bus.h" 8 + 9 + static int cxl_dax_region_probe(struct device *dev) 10 + { 11 + struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev); 12 + int nid = phys_to_target_node(cxlr_dax->hpa_range.start); 13 + struct cxl_region *cxlr = cxlr_dax->cxlr; 14 + struct dax_region *dax_region; 15 + struct dev_dax_data data; 16 + struct dev_dax *dev_dax; 17 + 18 + if (nid == NUMA_NO_NODE) 19 + nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start); 20 + 21 + dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid, 22 + PMD_SIZE, IORESOURCE_DAX_KMEM); 23 + if (!dax_region) 24 + return -ENOMEM; 25 + 26 + data = (struct dev_dax_data) { 27 + .dax_region = dax_region, 28 + .id = -1, 29 + .size = range_len(&cxlr_dax->hpa_range), 30 + }; 31 + dev_dax = devm_create_dev_dax(&data); 32 + if (IS_ERR(dev_dax)) 33 + return PTR_ERR(dev_dax); 34 + 35 + /* child dev_dax instances now own the lifetime of the dax_region */ 36 + dax_region_put(dax_region); 37 + return 0; 38 + } 39 + 40 + static struct cxl_driver cxl_dax_region_driver = { 41 + .name = "cxl_dax_region", 42 + .probe = cxl_dax_region_probe, 43 + .id = CXL_DEVICE_DAX_REGION, 44 + .drv = { 45 + .suppress_bind_attrs = true, 46 + }, 47 + }; 48 + 49 + module_cxl_driver(cxl_dax_region_driver); 50 + MODULE_ALIAS_CXL(CXL_DEVICE_DAX_REGION); 51 + MODULE_LICENSE("GPL"); 52 + MODULE_AUTHOR("Intel Corporation"); 53 + MODULE_IMPORT_NS(CXL);
+1 -2
drivers/dax/device.c
··· 475 475 476 476 static struct dax_device_driver device_dax_driver = { 477 477 .probe = dev_dax_probe, 478 - /* all probe actions are unwound by devm, so .remove isn't necessary */ 479 - .match_always = 1, 478 + .type = DAXDRV_DEVICE_TYPE, 480 479 }; 481 480 482 481 static int __init dax_init(void)
+2 -1
drivers/dax/hmem/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0 2 - obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o 2 + # device_hmem.o deliberately precedes dax_hmem.o for initcall ordering 3 3 obj-$(CONFIG_DEV_DAX_HMEM_DEVICES) += device_hmem.o 4 + obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o 4 5 5 6 device_hmem-y := device.o 6 7 dax_hmem-y := hmem.o
+47 -59
drivers/dax/hmem/device.c
··· 8 8 static bool nohmem; 9 9 module_param_named(disable, nohmem, bool, 0444); 10 10 11 + static bool platform_initialized; 12 + static DEFINE_MUTEX(hmem_resource_lock); 11 13 static struct resource hmem_active = { 12 14 .name = "HMEM devices", 13 15 .start = 0, ··· 17 15 .flags = IORESOURCE_MEM, 18 16 }; 19 17 20 - void hmem_register_device(int target_nid, struct resource *r) 18 + int walk_hmem_resources(struct device *host, walk_hmem_fn fn) 21 19 { 22 - /* define a clean / non-busy resource for the platform device */ 23 - struct resource res = { 24 - .start = r->start, 25 - .end = r->end, 26 - .flags = IORESOURCE_MEM, 27 - .desc = IORES_DESC_SOFT_RESERVED, 28 - }; 20 + struct resource *res; 21 + int rc = 0; 22 + 23 + mutex_lock(&hmem_resource_lock); 24 + for (res = hmem_active.child; res; res = res->sibling) { 25 + rc = fn(host, (int) res->desc, res); 26 + if (rc) 27 + break; 28 + } 29 + mutex_unlock(&hmem_resource_lock); 30 + return rc; 31 + } 32 + EXPORT_SYMBOL_GPL(walk_hmem_resources); 33 + 34 + static void __hmem_register_resource(int target_nid, struct resource *res) 35 + { 29 36 struct platform_device *pdev; 30 - struct memregion_info info; 31 - int rc, id; 37 + struct resource *new; 38 + int rc; 32 39 33 - if (nohmem) 34 - return; 35 - 36 - rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM, 37 - IORES_DESC_SOFT_RESERVED); 38 - if (rc != REGION_INTERSECTS) 39 - return; 40 - 41 - id = memregion_alloc(GFP_KERNEL); 42 - if (id < 0) { 43 - pr_err("memregion allocation failure for %pr\n", &res); 40 + new = __request_region(&hmem_active, res->start, resource_size(res), "", 41 + 0); 42 + if (!new) { 43 + pr_debug("hmem range %pr already active\n", res); 44 44 return; 45 45 } 46 46 47 - pdev = platform_device_alloc("hmem", id); 47 + new->desc = target_nid; 48 + 49 + if (platform_initialized) 50 + return; 51 + 52 + pdev = platform_device_alloc("hmem_platform", 0); 48 53 if (!pdev) { 49 - pr_err("hmem device allocation failure for %pr\n", &res); 
50 - goto out_pdev; 51 - } 52 - 53 - if (!__request_region(&hmem_active, res.start, resource_size(&res), 54 - dev_name(&pdev->dev), 0)) { 55 - dev_dbg(&pdev->dev, "hmem range %pr already active\n", &res); 56 - goto out_active; 57 - } 58 - 59 - pdev->dev.numa_node = numa_map_to_online_node(target_nid); 60 - info = (struct memregion_info) { 61 - .target_node = target_nid, 62 - }; 63 - rc = platform_device_add_data(pdev, &info, sizeof(info)); 64 - if (rc < 0) { 65 - pr_err("hmem memregion_info allocation failure for %pr\n", &res); 66 - goto out_resource; 67 - } 68 - 69 - rc = platform_device_add_resources(pdev, &res, 1); 70 - if (rc < 0) { 71 - pr_err("hmem resource allocation failure for %pr\n", &res); 72 - goto out_resource; 54 + pr_err_once("failed to register device-dax hmem_platform device\n"); 55 + return; 73 56 } 74 57 75 58 rc = platform_device_add(pdev); 76 - if (rc < 0) { 77 - dev_err(&pdev->dev, "device add failed for %pr\n", &res); 78 - goto out_resource; 79 - } 59 + if (rc) 60 + platform_device_put(pdev); 61 + else 62 + platform_initialized = true; 63 + } 80 64 81 - return; 65 + void hmem_register_resource(int target_nid, struct resource *res) 66 + { 67 + if (nohmem) 68 + return; 82 69 83 - out_resource: 84 - __release_region(&hmem_active, res.start, resource_size(&res)); 85 - out_active: 86 - platform_device_put(pdev); 87 - out_pdev: 88 - memregion_free(id); 70 + mutex_lock(&hmem_resource_lock); 71 + __hmem_register_resource(target_nid, res); 72 + mutex_unlock(&hmem_resource_lock); 89 73 } 90 74 91 75 static __init int hmem_register_one(struct resource *res, void *data) 92 76 { 93 - hmem_register_device(phys_to_target_node(res->start), res); 77 + hmem_register_resource(phys_to_target_node(res->start), res); 94 78 95 79 return 0; 96 80 } ··· 92 104 * As this is a fallback for address ranges unclaimed by the ACPI HMAT 93 105 * parsing it must be at an initcall level greater than hmat_init(). 
94 106 */ 95 - late_initcall(hmem_init); 107 + device_initcall(hmem_init);
+130 -18
drivers/dax/hmem/hmem.c
··· 3 3 #include <linux/memregion.h> 4 4 #include <linux/module.h> 5 5 #include <linux/pfn_t.h> 6 + #include <linux/dax.h> 6 7 #include "../bus.h" 7 8 8 9 static bool region_idle; ··· 11 10 12 11 static int dax_hmem_probe(struct platform_device *pdev) 13 12 { 13 + unsigned long flags = IORESOURCE_DAX_KMEM; 14 14 struct device *dev = &pdev->dev; 15 15 struct dax_region *dax_region; 16 16 struct memregion_info *mri; 17 17 struct dev_dax_data data; 18 18 struct dev_dax *dev_dax; 19 - struct resource *res; 20 - struct range range; 21 19 22 - res = platform_get_resource(pdev, IORESOURCE_MEM, 0); 23 - if (!res) 24 - return -ENOMEM; 20 + /* 21 + * @region_idle == true indicates that an administrative agent 22 + * wants to manipulate the range partitioning before the devices 23 + * are created, so do not send them to the dax_kmem driver by 24 + * default. 25 + */ 26 + if (region_idle) 27 + flags = 0; 25 28 26 29 mri = dev->platform_data; 27 - range.start = res->start; 28 - range.end = res->end; 29 - dax_region = alloc_dax_region(dev, pdev->id, &range, mri->target_node, 30 - PMD_SIZE, 0); 30 + dax_region = alloc_dax_region(dev, pdev->id, &mri->range, 31 + mri->target_node, PMD_SIZE, flags); 31 32 if (!dax_region) 32 33 return -ENOMEM; 33 34 34 35 data = (struct dev_dax_data) { 35 36 .dax_region = dax_region, 36 37 .id = -1, 37 - .size = region_idle ? 0 : resource_size(res), 38 + .size = region_idle ? 
0 : range_len(&mri->range), 38 39 }; 39 40 dev_dax = devm_create_dev_dax(&data); 40 41 if (IS_ERR(dev_dax)) ··· 47 44 return 0; 48 45 } 49 46 50 - static int dax_hmem_remove(struct platform_device *pdev) 51 - { 52 - /* devm handles teardown */ 53 - return 0; 54 - } 55 - 56 47 static struct platform_driver dax_hmem_driver = { 57 48 .probe = dax_hmem_probe, 58 - .remove = dax_hmem_remove, 59 49 .driver = { 60 50 .name = "hmem", 61 51 }, 62 52 }; 63 53 64 - module_platform_driver(dax_hmem_driver); 54 + static void release_memregion(void *data) 55 + { 56 + memregion_free((long) data); 57 + } 58 + 59 + static void release_hmem(void *pdev) 60 + { 61 + platform_device_unregister(pdev); 62 + } 63 + 64 + static int hmem_register_device(struct device *host, int target_nid, 65 + const struct resource *res) 66 + { 67 + struct platform_device *pdev; 68 + struct memregion_info info; 69 + long id; 70 + int rc; 71 + 72 + if (IS_ENABLED(CONFIG_CXL_REGION) && 73 + region_intersects(res->start, resource_size(res), IORESOURCE_MEM, 74 + IORES_DESC_CXL) != REGION_DISJOINT) { 75 + dev_dbg(host, "deferring range to CXL: %pr\n", res); 76 + return 0; 77 + } 78 + 79 + rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM, 80 + IORES_DESC_SOFT_RESERVED); 81 + if (rc != REGION_INTERSECTS) 82 + return 0; 83 + 84 + id = memregion_alloc(GFP_KERNEL); 85 + if (id < 0) { 86 + dev_err(host, "memregion allocation failure for %pr\n", res); 87 + return -ENOMEM; 88 + } 89 + rc = devm_add_action_or_reset(host, release_memregion, (void *) id); 90 + if (rc) 91 + return rc; 92 + 93 + pdev = platform_device_alloc("hmem", id); 94 + if (!pdev) { 95 + dev_err(host, "device allocation failure for %pr\n", res); 96 + return -ENOMEM; 97 + } 98 + 99 + pdev->dev.numa_node = numa_map_to_online_node(target_nid); 100 + info = (struct memregion_info) { 101 + .target_node = target_nid, 102 + .range = { 103 + .start = res->start, 104 + .end = res->end, 105 + }, 106 + }; 107 + rc = 
platform_device_add_data(pdev, &info, sizeof(info)); 108 + if (rc < 0) { 109 + dev_err(host, "memregion_info allocation failure for %pr\n", 110 + res); 111 + goto out_put; 112 + } 113 + 114 + rc = platform_device_add(pdev); 115 + if (rc < 0) { 116 + dev_err(host, "%s add failed for %pr\n", dev_name(&pdev->dev), 117 + res); 118 + goto out_put; 119 + } 120 + 121 + return devm_add_action_or_reset(host, release_hmem, pdev); 122 + 123 + out_put: 124 + platform_device_put(pdev); 125 + return rc; 126 + } 127 + 128 + static int dax_hmem_platform_probe(struct platform_device *pdev) 129 + { 130 + return walk_hmem_resources(&pdev->dev, hmem_register_device); 131 + } 132 + 133 + static struct platform_driver dax_hmem_platform_driver = { 134 + .probe = dax_hmem_platform_probe, 135 + .driver = { 136 + .name = "hmem_platform", 137 + }, 138 + }; 139 + 140 + static __init int dax_hmem_init(void) 141 + { 142 + int rc; 143 + 144 + rc = platform_driver_register(&dax_hmem_platform_driver); 145 + if (rc) 146 + return rc; 147 + 148 + rc = platform_driver_register(&dax_hmem_driver); 149 + if (rc) 150 + platform_driver_unregister(&dax_hmem_platform_driver); 151 + 152 + return rc; 153 + } 154 + 155 + static __exit void dax_hmem_exit(void) 156 + { 157 + platform_driver_unregister(&dax_hmem_driver); 158 + platform_driver_unregister(&dax_hmem_platform_driver); 159 + } 160 + 161 + module_init(dax_hmem_init); 162 + module_exit(dax_hmem_exit); 163 + 164 + /* Allow for CXL to define its own dax regions */ 165 + #if IS_ENABLED(CONFIG_CXL_REGION) 166 + #if IS_MODULE(CONFIG_CXL_ACPI) 167 + MODULE_SOFTDEP("pre: cxl_acpi"); 168 + #endif 169 + #endif 65 170 66 171 MODULE_ALIAS("platform:hmem*"); 172 + MODULE_ALIAS("platform:hmem_platform*"); 67 173 MODULE_LICENSE("GPL v2"); 68 174 MODULE_AUTHOR("Intel Corporation");
+1
drivers/dax/kmem.c
··· 239 239 static struct dax_device_driver device_dax_kmem_driver = { 240 240 .probe = dev_dax_kmem_probe, 241 241 .remove = dev_dax_kmem_remove, 242 + .type = DAXDRV_KMEM_TYPE, 242 243 }; 243 244 244 245 static int __init dax_kmem_init(void)
+5 -2
include/linux/dax.h
··· 262 262 } 263 263 264 264 #ifdef CONFIG_DEV_DAX_HMEM_DEVICES 265 - void hmem_register_device(int target_nid, struct resource *r); 265 + void hmem_register_resource(int target_nid, struct resource *r); 266 266 #else 267 - static inline void hmem_register_device(int target_nid, struct resource *r) 267 + static inline void hmem_register_resource(int target_nid, struct resource *r) 268 268 { 269 269 } 270 270 #endif 271 271 272 + typedef int (*walk_hmem_fn)(struct device *dev, int target_nid, 273 + const struct resource *res); 274 + int walk_hmem_resources(struct device *dev, walk_hmem_fn fn); 272 275 #endif
+2
include/linux/memregion.h
··· 3 3 #define _MEMREGION_H_ 4 4 #include <linux/types.h> 5 5 #include <linux/errno.h> 6 + #include <linux/range.h> 6 7 #include <linux/bug.h> 7 8 8 9 struct memregion_info { 9 10 int target_node; 11 + struct range range; 10 12 }; 11 13 12 14 #ifdef CONFIG_MEMREGION
+5
include/linux/range.h
··· 13 13 return range->end - range->start + 1; 14 14 } 15 15 16 + static inline bool range_contains(struct range *r1, struct range *r2) 17 + { 18 + return r1->start <= r2->start && r1->end >= r2->end; 19 + } 20 + 16 21 int add_range(struct range *range, int az, int nr_range, 17 22 u64 start, u64 end); 18 23
+3 -3
lib/stackinit_kunit.c
··· 31 31 static void *fill_start, *target_start; 32 32 static size_t fill_size, target_size; 33 33 34 - static bool range_contains(char *haystack_start, size_t haystack_size, 35 - char *needle_start, size_t needle_size) 34 + static bool stackinit_range_contains(char *haystack_start, size_t haystack_size, 35 + char *needle_start, size_t needle_size) 36 36 { 37 37 if (needle_start >= haystack_start && 38 38 needle_start + needle_size <= haystack_start + haystack_size) ··· 175 175 \ 176 176 /* Validate that compiler lined up fill and target. */ \ 177 177 KUNIT_ASSERT_TRUE_MSG(test, \ 178 - range_contains(fill_start, fill_size, \ 178 + stackinit_range_contains(fill_start, fill_size, \ 179 179 target_start, target_size), \ 180 180 "stack fill missed target!? " \ 181 181 "(fill %zu wide, target offset by %d)\n", \
+137 -10
tools/testing/cxl/test/cxl.c
··· 703 703 return 0; 704 704 } 705 705 706 + static void default_mock_decoder(struct cxl_decoder *cxld) 707 + { 708 + cxld->hpa_range = (struct range){ 709 + .start = 0, 710 + .end = -1, 711 + }; 712 + 713 + cxld->interleave_ways = 1; 714 + cxld->interleave_granularity = 256; 715 + cxld->target_type = CXL_DECODER_EXPANDER; 716 + cxld->commit = mock_decoder_commit; 717 + cxld->reset = mock_decoder_reset; 718 + } 719 + 720 + static int first_decoder(struct device *dev, void *data) 721 + { 722 + struct cxl_decoder *cxld; 723 + 724 + if (!is_switch_decoder(dev)) 725 + return 0; 726 + cxld = to_cxl_decoder(dev); 727 + if (cxld->id == 0) 728 + return 1; 729 + return 0; 730 + } 731 + 732 + static void mock_init_hdm_decoder(struct cxl_decoder *cxld) 733 + { 734 + struct acpi_cedt_cfmws *window = mock_cfmws[0]; 735 + struct platform_device *pdev = NULL; 736 + struct cxl_endpoint_decoder *cxled; 737 + struct cxl_switch_decoder *cxlsd; 738 + struct cxl_port *port, *iter; 739 + const int size = SZ_512M; 740 + struct cxl_memdev *cxlmd; 741 + struct cxl_dport *dport; 742 + struct device *dev; 743 + bool hb0 = false; 744 + u64 base; 745 + int i; 746 + 747 + if (is_endpoint_decoder(&cxld->dev)) { 748 + cxled = to_cxl_endpoint_decoder(&cxld->dev); 749 + cxlmd = cxled_to_memdev(cxled); 750 + WARN_ON(!dev_is_platform(cxlmd->dev.parent)); 751 + pdev = to_platform_device(cxlmd->dev.parent); 752 + 753 + /* check whether the endpoint is attached to host-bridge0 */ 754 + port = cxled_to_port(cxled); 755 + do { 756 + if (port->uport == &cxl_host_bridge[0]->dev) { 757 + hb0 = true; 758 + break; 759 + } 760 + if (is_cxl_port(port->dev.parent)) 761 + port = to_cxl_port(port->dev.parent); 762 + else 763 + port = NULL; 764 + } while (port); 765 + port = cxled_to_port(cxled); 766 + } 767 + 768 + /* 769 + * The first decoder on the first 2 devices on the first switch 770 + * attached to host-bridge0 mock a fake / static RAM region. All 771 + * other decoders are default disabled. 
Given the round robin 772 + * assignment those devices are named cxl_mem.0, and cxl_mem.4. 773 + * 774 + * See 'cxl list -BMPu -m cxl_mem.0,cxl_mem.4' 775 + */ 776 + if (!hb0 || pdev->id % 4 || pdev->id > 4 || cxld->id > 0) { 777 + default_mock_decoder(cxld); 778 + return; 779 + } 780 + 781 + base = window->base_hpa; 782 + cxld->hpa_range = (struct range) { 783 + .start = base, 784 + .end = base + size - 1, 785 + }; 786 + 787 + cxld->interleave_ways = 2; 788 + eig_to_granularity(window->granularity, &cxld->interleave_granularity); 789 + cxld->target_type = CXL_DECODER_EXPANDER; 790 + cxld->flags = CXL_DECODER_F_ENABLE; 791 + cxled->state = CXL_DECODER_STATE_AUTO; 792 + port->commit_end = cxld->id; 793 + devm_cxl_dpa_reserve(cxled, 0, size / cxld->interleave_ways, 0); 794 + cxld->commit = mock_decoder_commit; 795 + cxld->reset = mock_decoder_reset; 796 + 797 + /* 798 + * Now that endpoint decoder is set up, walk up the hierarchy 799 + * and setup the switch and root port decoders targeting @cxlmd. 800 + */ 801 + iter = port; 802 + for (i = 0; i < 2; i++) { 803 + dport = iter->parent_dport; 804 + iter = dport->port; 805 + dev = device_find_child(&iter->dev, NULL, first_decoder); 806 + /* 807 + * Ancestor ports are guaranteed to be enumerated before 808 + * @port, and all ports have at least one decoder. 
809 + */ 810 + if (WARN_ON(!dev)) 811 + continue; 812 + cxlsd = to_cxl_switch_decoder(dev); 813 + if (i == 0) { 814 + /* put cxl_mem.4 second in the decode order */ 815 + if (pdev->id == 4) 816 + cxlsd->target[1] = dport; 817 + else 818 + cxlsd->target[0] = dport; 819 + } else 820 + cxlsd->target[0] = dport; 821 + cxld = &cxlsd->cxld; 822 + cxld->target_type = CXL_DECODER_EXPANDER; 823 + cxld->flags = CXL_DECODER_F_ENABLE; 824 + iter->commit_end = 0; 825 + /* 826 + * Switch targets 2 endpoints, while host bridge targets 827 + * one root port 828 + */ 829 + if (i == 0) 830 + cxld->interleave_ways = 2; 831 + else 832 + cxld->interleave_ways = 1; 833 + cxld->interleave_granularity = 256; 834 + cxld->hpa_range = (struct range) { 835 + .start = base, 836 + .end = base + size - 1, 837 + }; 838 + put_device(dev); 839 + } 840 + } 841 + 706 842 static int mock_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm) 707 843 { 708 844 struct cxl_port *port = cxlhdm->port; ··· 884 748 cxld = &cxled->cxld; 885 749 } 886 750 887 - cxld->hpa_range = (struct range) { 888 - .start = 0, 889 - .end = -1, 890 - }; 891 - 892 - cxld->interleave_ways = min_not_zero(target_count, 1); 893 - cxld->interleave_granularity = SZ_4K; 894 - cxld->target_type = CXL_DECODER_EXPANDER; 895 - cxld->commit = mock_decoder_commit; 896 - cxld->reset = mock_decoder_reset; 751 + mock_init_hdm_decoder(cxld); 897 752 898 753 if (target_count) { 899 754 rc = device_for_each_child(port->uport, &ctx,