Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cxl/port: Fix use-after-free, permit out-of-order decoder shutdown

In support of investigating an initialization failure report [1],
cxl_test was updated to register mock memory-devices after the mock
root-port/bus device had been registered. That led to cxl_test crashing
with a use-after-free bug with the following signature:

cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
[..]
cxld_unregister: cxl decoder14.0:
cxl_region_decode_reset: cxl_region region3:
mock_decoder_reset: cxl_port port3: decoder3.0 reset
2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
cxl_endpoint_decoder_release: cxl decoder14.0:
[..]
cxld_unregister: cxl decoder7.0:
3) cxl_region_decode_reset: cxl_region region3:
Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
[..]
RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
[..]
Call Trace:
<TASK>
cxl_region_decode_reset+0x69/0x190 [cxl_core]
cxl_region_detach+0xe8/0x210 [cxl_core]
cxl_decoder_kill_region+0x27/0x40 [cxl_core]
cxld_unregister+0x5d/0x60 [cxl_core]

At 1) a region has been established with 2 endpoint decoders (7.0 and
14.0). Those endpoints share a common switch-decoder in the topology
(3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
the "out of order reset" case in the switch decoder. The effect, though,
is that region3 cleanup is aborted, leaving it intact and still
referencing decoder14.0. At 3) the second attempt to tear down region3
trips over the stale decoder14.0 object, which has long since been
deleted.

The fix here is to recognize that the CXL specification places no
mandate on in-order shutdown of switch-decoders, the driver enforces
in-order allocation, and hardware enforces in-order commit. So, rather
than fail and leave objects dangling, always remove them.

In support of making cxl_region_decode_reset() always succeed,
cxl_region_invalidate_memregion() failures are turned into warnings.
A loud warning (or even a crash under panic_on_warn) is acceptable there
since system integrity is at risk if caches cannot be managed around
physical address mutation events like CXL region destruction.

A new device_for_each_child_reverse_from() helper is added to clean up
port->commit_end after all dependent decoders have been disabled. In
other words if decoders are allocated 0->1->2 and disabled 1->2->0 then
port->commit_end only decrements from 2 after 2 has been disabled, and
it decrements all the way to zero since 1 was disabled previously.

Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
Cc: stable@vger.kernel.org
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Authored by Dan Williams; committed by Ira Weiny
101c268b 48f62d38

+100 -53
+35
drivers/base/core.c
 EXPORT_SYMBOL_GPL(device_for_each_child_reverse);

 /**
+ * device_for_each_child_reverse_from - device child iterator in reversed order.
+ * @parent: parent struct device.
+ * @from: optional starting point in child list
+ * @fn: function to be called for each device.
+ * @data: data for the callback.
+ *
+ * Iterate over @parent's child devices, starting at @from, and call @fn
+ * for each, passing it @data. This helper is identical to
+ * device_for_each_child_reverse() when @from is NULL.
+ *
+ * @fn is checked each iteration. If it returns anything other than 0,
+ * iteration stops and that value is returned to the caller of
+ * device_for_each_child_reverse_from().
+ */
+int device_for_each_child_reverse_from(struct device *parent,
+				       struct device *from, const void *data,
+				       int (*fn)(struct device *, const void *))
+{
+	struct klist_iter i;
+	struct device *child;
+	int error = 0;
+
+	if (!parent->p)
+		return 0;
+
+	klist_iter_init_node(&parent->p->klist_children, &i,
+			     (from ? &from->p->knode_parent : NULL));
+	while ((child = prev_device(&i)) && !error)
+		error = fn(child, data);
+	klist_iter_exit(&i);
+	return error;
+}
+EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from);
+
+/**
  * device_find_child - device iterator for locating a particular device.
  * @parent: parent struct device
  * @match: Callback function to check device
+42 -8
drivers/cxl/core/hdm.c
 	return 0;
 }

-static int cxl_decoder_reset(struct cxl_decoder *cxld)
+static int commit_reap(struct device *dev, const void *data)
+{
+	struct cxl_port *port = to_cxl_port(dev->parent);
+	struct cxl_decoder *cxld;
+
+	if (!is_switch_decoder(dev) && !is_endpoint_decoder(dev))
+		return 0;
+
+	cxld = to_cxl_decoder(dev);
+	if (port->commit_end == cxld->id &&
+	    ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
+		port->commit_end--;
+		dev_dbg(&port->dev, "reap: %s commit_end: %d\n",
+			dev_name(&cxld->dev), port->commit_end);
+	}
+
+	return 0;
+}
+
+void cxl_port_commit_reap(struct cxl_decoder *cxld)
+{
+	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+
+	lockdep_assert_held_write(&cxl_region_rwsem);
+
+	/*
+	 * Once the highest committed decoder is disabled, free any other
+	 * decoders that were pinned by out-of-order release.
+	 */
+	port->commit_end--;
+	dev_dbg(&port->dev, "reap: %s commit_end: %d\n", dev_name(&cxld->dev),
+		port->commit_end);
+	device_for_each_child_reverse_from(&port->dev, &cxld->dev, NULL,
+					   commit_reap);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_port_commit_reap, CXL);
+
+static void cxl_decoder_reset(struct cxl_decoder *cxld)
 {
 	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
 	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
···
 	u32 ctrl;

 	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
-		return 0;
+		return;

-	if (port->commit_end != id) {
+	if (port->commit_end == id)
+		cxl_port_commit_reap(cxld);
+	else
 		dev_dbg(&port->dev,
 			"%s: out of order reset, expected decoder%d.%d\n",
 			dev_name(&cxld->dev), port->id, port->commit_end);
-		return -EBUSY;
-	}

 	down_read(&cxl_dpa_rwsem);
 	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
···
 	writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
 	up_read(&cxl_dpa_rwsem);

-	port->commit_end--;
 	cxld->flags &= ~CXL_DECODER_F_ENABLE;

 	/* Userspace is now responsible for reconfiguring this decoder */
···
 		cxled = to_cxl_endpoint_decoder(&cxld->dev);
 		cxled->state = CXL_DECODER_STATE_MANUAL;
 	}
-
-	return 0;
 }

 static int cxl_setup_hdm_decoder_from_dvsec(
+13 -35
drivers/cxl/core/region.c
 			"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
 		return 0;
 	} else {
-		dev_err(&cxlr->dev,
-			"Failed to synchronize CPU cache state\n");
+		dev_WARN(&cxlr->dev,
+			 "Failed to synchronize CPU cache state\n");
 		return -ENXIO;
 	}
 }
···
 	return 0;
 }

-static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
+static void cxl_region_decode_reset(struct cxl_region *cxlr, int count)
 {
 	struct cxl_region_params *p = &cxlr->params;
-	int i, rc = 0;
+	int i;

 	/*
-	 * Before region teardown attempt to flush, and if the flush
-	 * fails cancel the region teardown for data consistency
-	 * concerns
+	 * Before region teardown attempt to flush, evict any data cached for
+	 * this region, or scream loudly about missing arch / platform support
+	 * for CXL teardown.
 	 */
-	rc = cxl_region_invalidate_memregion(cxlr);
-	if (rc)
-		return rc;
+	cxl_region_invalidate_memregion(cxlr);

 	for (i = count - 1; i >= 0; i--) {
 		struct cxl_endpoint_decoder *cxled = p->targets[i];
···
 			cxl_rr = cxl_rr_load(iter, cxlr);
 			cxld = cxl_rr->decoder;
 			if (cxld->reset)
-				rc = cxld->reset(cxld);
-			if (rc)
-				return rc;
+				cxld->reset(cxld);
 			set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
 		}

endpoint_reset:
-		rc = cxled->cxld.reset(&cxled->cxld);
-		if (rc)
-			return rc;
+		cxled->cxld.reset(&cxled->cxld);
 		set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
 	}

 	/* all decoders associated with this region have been torn down */
 	clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
-
-	return 0;
 }

 static int commit_decoder(struct cxl_decoder *cxld)
···
 	 * still pending.
 	 */
 	if (p->state == CXL_CONFIG_RESET_PENDING) {
-		rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
-		/*
-		 * Revert to committed since there may still be active
-		 * decoders associated with this region, or move forward
-		 * to active to mark the reset successful
-		 */
-		if (rc)
-			p->state = CXL_CONFIG_COMMIT;
-		else
-			p->state = CXL_CONFIG_ACTIVE;
+		cxl_region_decode_reset(cxlr, p->interleave_ways);
+		p->state = CXL_CONFIG_ACTIVE;
 	}
 }
···
 	get_device(&cxlr->dev);

 	if (p->state > CXL_CONFIG_ACTIVE) {
-		/*
-		 * TODO: tear down all impacted regions if a device is
-		 * removed out of order
-		 */
-		rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
-		if (rc)
-			goto out;
+		cxl_region_decode_reset(cxlr, p->interleave_ways);
 		p->state = CXL_CONFIG_ACTIVE;
 	}
+2 -1
drivers/cxl/cxl.h
 	struct cxl_region *region;
 	unsigned long flags;
 	int (*commit)(struct cxl_decoder *cxld);
-	int (*reset)(struct cxl_decoder *cxld);
+	void (*reset)(struct cxl_decoder *cxld);
 };

 /*
···
 int cxl_num_decoders_committed(struct cxl_port *port);
 bool is_cxl_port(const struct device *dev);
 struct cxl_port *to_cxl_port(const struct device *dev);
+void cxl_port_commit_reap(struct cxl_decoder *cxld);
 struct pci_bus;
 int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev,
 			      struct pci_bus *bus);
+3
include/linux/device.h
 		     int (*fn)(struct device *dev, void *data));
 int device_for_each_child_reverse(struct device *dev, void *data,
 				  int (*fn)(struct device *dev, void *data));
+int device_for_each_child_reverse_from(struct device *parent,
+				       struct device *from, const void *data,
+				       int (*fn)(struct device *, const void *));
 struct device *device_find_child(struct device *dev, void *data,
 				 int (*match)(struct device *dev, void *data));
 struct device *device_find_child_by_name(struct device *parent,
+5 -9
tools/testing/cxl/test/cxl.c
 	return 0;
 }

-static int mock_decoder_reset(struct cxl_decoder *cxld)
+static void mock_decoder_reset(struct cxl_decoder *cxld)
 {
 	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
 	int id = cxld->id;

 	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
-		return 0;
+		return;

 	dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev));
-	if (port->commit_end != id) {
+	if (port->commit_end == id)
+		cxl_port_commit_reap(cxld);
+	else
 		dev_dbg(&port->dev,
 			"%s: out of order reset, expected decoder%d.%d\n",
 			dev_name(&cxld->dev), port->id, port->commit_end);
-		return -EBUSY;
-	}
-
-	port->commit_end--;
 	cxld->flags &= ~CXL_DECODER_F_ENABLE;
-
-	return 0;
 }

 static void default_mock_decoder(struct cxl_decoder *cxld)