Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

device-dax: add memory via add_memory_driver_managed()

Currently, when adding memory, we create entries in /sys/firmware/memmap/
as "System RAM". This leads kexec-tools to add that memory to the
fixed-up initial memmap for a kexec kernel (loaded via kexec_load()). The
kexec'd kernel then considers the memory initial System RAM, and it can
no longer be reconfigured. This is not what happens during a real reboot.

Let's add our memory via add_memory_driver_managed() now, so that we no
longer create entries in /sys/firmware/memmap/ and so that the memory is
indicated as "System RAM (kmem)" in /proc/iomem. This allows everybody
(especially kexec-tools) to identify that this memory is special and has
to be treated differently from ordinary (hotplugged) System RAM.
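
As an illustration of how a userspace tool could tell the two kinds of
memory apart by resource name, here is a minimal sketch. The enum and the
helper classify_resource_name() are hypothetical names, not part of
kexec-tools or the kernel, and the general "System RAM (<driver>)" shape is
an assumption extrapolated from the single "System RAM (kmem)" name used
in this change.

```c
#include <string.h>

/* Hypothetical classification of a /proc/iomem resource name. */
enum ram_kind {
	RAM_NONE,           /* not System RAM at all */
	RAM_ORDINARY,       /* exactly "System RAM" */
	RAM_DRIVER_MANAGED, /* e.g. "System RAM (kmem)" */
};

static enum ram_kind classify_resource_name(const char *name)
{
	static const char prefix[] = "System RAM";
	size_t plen = sizeof(prefix) - 1;
	size_t len = strlen(name);

	if (strncmp(name, prefix, plen) != 0)
		return RAM_NONE;
	if (len == plen)
		return RAM_ORDINARY;
	/* Assumed shape: "System RAM (<driver>)" with a non-empty driver. */
	if (len > plen + 3 && name[plen] == ' ' && name[plen + 1] == '(' &&
	    name[len - 1] == ')')
		return RAM_DRIVER_MANAGED;
	return RAM_NONE;
}
```

A tool that only matches the exact string "System RAM" would skip the
driver-managed range without any further changes.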

Before configuring the namespace:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-33fffffff : namespace0.0
3280000000-32ffffffff : PCI Bus 0000:00

After configuring the namespace:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  148200000-33fffffff : dax0.0
3280000000-32ffffffff : PCI Bus 0000:00

After loading kmem before this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  150000000-33fffffff : dax0.0
    150000000-33fffffff : System RAM
3280000000-32ffffffff : PCI Bus 0000:00

After loading kmem after this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  150000000-33fffffff : dax0.0
    150000000-33fffffff : System RAM (kmem)
3280000000-32ffffffff : PCI Bus 0000:00

After a proper reboot:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  148200000-33fffffff : dax0.0
3280000000-32ffffffff : PCI Bus 0000:00

Within the kexec kernel before this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  150000000-33fffffff : System RAM
3280000000-32ffffffff : PCI Bus 0000:00

Within the kexec kernel after this change:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
  140000000-1481fffff : namespace0.0
  148200000-33fffffff : dax0.0
3280000000-32ffffffff : PCI Bus 0000:00

/sys/firmware/memmap/ before this change:
0000000000000000-000000000009fc00 (System RAM)
000000000009fc00-00000000000a0000 (Reserved)
00000000000f0000-0000000000100000 (Reserved)
0000000000100000-00000000bffdf000 (System RAM)
00000000bffdf000-00000000c0000000 (Reserved)
00000000feffc000-00000000ff000000 (Reserved)
00000000fffc0000-0000000100000000 (Reserved)
0000000100000000-0000000140000000 (System RAM)
0000000150000000-0000000340000000 (System RAM)

/sys/firmware/memmap/ after a proper reboot:
0000000000000000-000000000009fc00 (System RAM)
000000000009fc00-00000000000a0000 (Reserved)
00000000000f0000-0000000000100000 (Reserved)
0000000000100000-00000000bffdf000 (System RAM)
00000000bffdf000-00000000c0000000 (Reserved)
00000000feffc000-00000000ff000000 (Reserved)
00000000fffc0000-0000000100000000 (Reserved)
0000000100000000-0000000140000000 (System RAM)

/sys/firmware/memmap/ after this change:
0000000000000000-000000000009fc00 (System RAM)
000000000009fc00-00000000000a0000 (Reserved)
00000000000f0000-0000000000100000 (Reserved)
0000000000100000-00000000bffdf000 (System RAM)
00000000bffdf000-00000000c0000000 (Reserved)
00000000feffc000-00000000ff000000 (Reserved)
00000000fffc0000-0000000100000000 (Reserved)
0000000100000000-0000000140000000 (System RAM)

kexec-tools already seems to ignore any System RAM that is not at the top
level of /proc/iomem, both when searching for areas to place kexec images
and when determining crash areas to dump via kdump. Changing the resource
name therefore won't have an impact there.
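
To make the "top level" distinction concrete, here is a minimal sketch of
parsing one /proc/iomem line and checking whether it is a top-level
"System RAM" entry. It assumes the usual /proc/iomem layout in which each
nesting level is indented by two spaces. struct iomem_entry,
parse_iomem_line() and is_toplevel_system_ram() are hypothetical names for
illustration and do not reflect how kexec-tools is actually implemented.

```c
#include <stdio.h>
#include <string.h>

struct iomem_entry {
	int depth;                /* nesting level: leading spaces / 2 */
	unsigned long long start; /* first address of the range */
	unsigned long long end;   /* last address of the range (inclusive) */
	char name[64];            /* resource name, e.g. "System RAM" */
};

/* Returns 0 on success, -1 if the line is not a "start-end : name" entry. */
static int parse_iomem_line(const char *line, struct iomem_entry *e)
{
	int spaces = 0;

	while (line[spaces] == ' ')
		spaces++;
	e->depth = spaces / 2;

	if (sscanf(line + spaces, "%llx-%llx : %63[^\n]",
		   &e->start, &e->end, e->name) != 3)
		return -1;
	return 0;
}

/* Only unnested entries named exactly "System RAM" qualify. */
static int is_toplevel_system_ram(const struct iomem_entry *e)
{
	return e->depth == 0 && strcmp(e->name, "System RAM") == 0;
}
```

Under these assumptions, the "System RAM (kmem)" entry nested below dax0.0
fails both checks: it is not at depth 0 and its name does not match.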

Properly handle unloading of the driver after a failed memory hotremove by
duplicating the resource name string if necessary, so that the name stays
valid for memory that remains added.
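
The lifetime rule can be sketched in userspace as follows. This is only an
analogy under stated assumptions: malloc()/free() stand in for
kstrdup_const()/kfree_const(), and sketch_init(), sketch_remove() and
sketch_exit() are hypothetical stand-ins for the driver's init, remove and
exit paths.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static char *sketch_name;        /* stands in for kmem_name */
static bool sketch_any_failed;   /* stands in for any_hotremove_failed */

/* Allocate the name once at "module load". */
static int sketch_init(void)
{
	static const char src[] = "System RAM (kmem)";

	sketch_name = malloc(sizeof(src));
	if (!sketch_name)
		return -1;
	memcpy(sketch_name, src, sizeof(src));
	return 0;
}

/* Record a failed removal; the memory range keeps pointing at the name. */
static void sketch_remove(bool removal_succeeded)
{
	if (!removal_succeeded)
		sketch_any_failed = true;
}

/* Free the name at "module unload" only if nothing still references it. */
static void sketch_exit(void)
{
	if (!sketch_any_failed) {
		free(sketch_name);
		sketch_name = NULL;
	}
}
```

The string is deliberately left allocated on the failure path, because a
resource that could not be removed would otherwise point at freed memory.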

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Link: http://lkml.kernel.org/r/20200508084217.9160-5-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by David Hildenbrand and committed by Linus Torvalds
8a725e46 3fe4f499

+27 -2
+1
drivers/dax/dax-private.h
···
  * @dev - device core
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
  * @dax_mem_res: physical address range of hotadded DAX memory
+ * @dax_mem_name: name for hotadded DAX memory via add_memory_driver_managed()
  */
 struct dev_dax {
 	struct dax_region *region;
+26 -2
drivers/dax/kmem.c
···
 #include "dax-private.h"
 #include "bus.h"
 
+/* Memory resource name used for add_memory_driver_managed(). */
+static const char *kmem_name;
+/* Set if any memory will remain added when the driver will be unloaded. */
+static bool any_hotremove_failed;
+
 int dev_dax_kmem_probe(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
···
 	 */
 	new_res->flags = IORESOURCE_SYSTEM_RAM;
 
-	rc = add_memory(numa_node, new_res->start, resource_size(new_res));
+	/*
+	 * Ensure that future kexec'd kernels will not treat this as RAM
+	 * automatically.
+	 */
+	rc = add_memory_driver_managed(numa_node, new_res->start,
+				       resource_size(new_res), kmem_name);
 	if (rc) {
 		release_resource(new_res);
 		kfree(new_res);
···
 	 */
 	rc = remove_memory(dev_dax->target_node, kmem_start, kmem_size);
 	if (rc) {
+		any_hotremove_failed = true;
 		dev_err(dev,
 			"DAX region %pR cannot be hotremoved until the next reboot\n",
 			res);
···
 	 * permanently pinned as reserved by the unreleased
 	 * request_mem_region().
 	 */
+	any_hotremove_failed = true;
 	return 0;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
···
 
 static int __init dax_kmem_init(void)
 {
-	return dax_driver_register(&device_dax_kmem_driver);
+	int rc;
+
+	/* Resource name is permanently allocated if any hotremove fails. */
+	kmem_name = kstrdup_const("System RAM (kmem)", GFP_KERNEL);
+	if (!kmem_name)
+		return -ENOMEM;
+
+	rc = dax_driver_register(&device_dax_kmem_driver);
+	if (rc)
+		kfree_const(kmem_name);
+	return rc;
 }
 
 static void __exit dax_kmem_exit(void)
 {
 	dax_driver_unregister(&device_dax_kmem_driver);
+	if (!any_hotremove_failed)
+		kfree_const(kmem_name);
 }
 
 MODULE_AUTHOR("Intel Corporation");