Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/mm/hotplug: Modify PGD entry when removing memory

When memory is hot-added or hot-removed, sync_global_pgds() is
called to synchronize the kernel's reference PGD entries into the
PGD pages of all process MMs. For hot-removal, however,
sync_global_pgds() does not work correctly.

sync_global_pgds() first checks whether the reference PGD entry
is none and, if it is, skips that entry. After hot-removal,
though, the entry may be none precisely because free_pud_table()
cleared it. So when sync_global_pgds() is called after
hot-removing memory, it must not skip a none PGD entry; it must
instead clear the corresponding PGD entry in every process MM.
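
The skip described above can be illustrated with a small user-space
model. This is only a sketch, not kernel code: a PGD entry is modeled
as a plain integer where 0 stands for pgd_none(), and toy_pgd_t,
TOY_NR_MMS, toy_sync_old() and toy_count_stale() are hypothetical
names invented for the illustration.

```c
#include <stddef.h>

/* Toy model (assumption): a PGD entry is an integer, 0 means "none". */
typedef unsigned long toy_pgd_t;

/* Hypothetical number of process page tables in the model. */
#define TOY_NR_MMS 3

/* Pre-patch logic for a single PGD slot. */
static void toy_sync_old(toy_pgd_t ref, toy_pgd_t mm_pgd[TOY_NR_MMS])
{
	/* Mirrors "if (pgd_none(*pgd_ref)) continue;" in the old code. */
	if (ref == 0)
		return;

	for (size_t i = 0; i < TOY_NR_MMS; i++) {
		/* Mirrors "if (pgd_none(*pgd)) set_pgd(pgd, *pgd_ref);". */
		if (mm_pgd[i] == 0)
			mm_pgd[i] = ref;
	}
}

/* Counts per-process entries that no longer match the reference. */
static size_t toy_count_stale(toy_pgd_t ref,
			      const toy_pgd_t mm_pgd[TOY_NR_MMS])
{
	size_t stale = 0;

	for (size_t i = 0; i < TOY_NR_MMS; i++) {
		if (mm_pgd[i] != ref)
			stale++;
	}
	return stale;
}
```

In this model, syncing a populated reference entry fills every
per-process slot, but syncing again after the reference entry has been
cleared changes nothing, so every per-process slot is left stale.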

Currently sync_global_pgds() does not clear the per-process PGD
entries on hot-removal. As a result, hot-adding memory in the
same range as the memory that was just removed triggers the
following BUG and call trace:

kernel BUG at arch/x86/mm/init_64.c:206!
...
[<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2
[<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380
[<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0
[<ffffffff815d03d9>] add_memory+0xb9/0x1b0
[<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e
[<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0
[<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f
[<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
[<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
[<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5
[<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2
[<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e
[<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15
[<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32
[<ffffffff8107e02b>] process_one_work+0x17b/0x460
[<ffffffff8107edfb>] worker_thread+0x11b/0x400
[<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400
[<ffffffff81085aef>] kthread+0xcf/0xe0
[<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
[<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0
[<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140

This patch makes sync_global_pgds() clear the PGD entries of all
process MMs when it is called after hot-removing memory.
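
The patched decision logic for one PGD slot can be sketched in user
space as follows. This is an illustrative model under stated
assumptions, not the kernel implementation: entries are plain
integers with 0 standing for pgd_none(), the BUG_ON is modeled as a
-1 return value, and toy_pgd_t, TOY_NR_MMS and toy_sync_new() are
hypothetical names.

```c
#include <stddef.h>

/* Toy model (assumption): a PGD entry is an integer, 0 means "none". */
typedef unsigned long toy_pgd_t;

/* Hypothetical number of process page tables in the model. */
#define TOY_NR_MMS 3

/*
 * Patched logic for a single PGD slot. Returns -1 where the kernel
 * would hit the BUG_ON (both entries present but different), and 0
 * otherwise.
 */
static int toy_sync_new(toy_pgd_t ref, toy_pgd_t mm_pgd[TOY_NR_MMS],
			int removed)
{
	/* A none reference entry is skipped only when nothing was removed. */
	if (ref == 0 && !removed)
		return 0;

	for (size_t i = 0; i < TOY_NR_MMS; i++) {
		/* Models the BUG_ON consistency check. */
		if (ref != 0 && mm_pgd[i] != 0 && mm_pgd[i] != ref)
			return -1;

		if (removed) {
			/* Models pgd_clear(pgd) for a stale entry. */
			if (ref == 0 && mm_pgd[i] != 0)
				mm_pgd[i] = 0;
		} else {
			/* Models set_pgd(pgd, *pgd_ref). */
			if (mm_pgd[i] == 0)
				mm_pgd[i] = ref;
		}
	}
	return 0;
}
```

With removed set, stale per-process entries are cleared, so a later
sync with a new reference value fills them without a mismatch. If a
stale entry were left in place, the same sync would return -1, which
corresponds to the BUG shown in the trace above.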

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Authored by Yasuaki Ishimatsu and committed by Ingo Molnar
9661d5bc 5255e0a7

+22 -10
+2 -1
arch/x86/include/asm/pgtable_64.h
@@ -115,7 +115,8 @@
 	native_set_pgd(pgd, native_make_pgd(0));
 }
 
-extern void sync_global_pgds(unsigned long start, unsigned long end);
+extern void sync_global_pgds(unsigned long start, unsigned long end,
+			     int removed);
 
 /*
  * Conversion functions: convert a page and protection to a page entry,
+1 -1
arch/x86/mm/fault.c
@@ -350,7 +350,7 @@
 
 void vmalloc_sync_all(void)
 {
-	sync_global_pgds(VMALLOC_START & PGDIR_MASK, VMALLOC_END);
+	sync_global_pgds(VMALLOC_START & PGDIR_MASK, VMALLOC_END, 0);
 }
 
 /*
+19 -8
arch/x86/mm/init_64.c
@@ -178,7 +178,7 @@
  * When memory was added/removed make sure all the processes MM have
  * suitable PGD entries in the local PGD level page.
  */
-void sync_global_pgds(unsigned long start, unsigned long end)
+void sync_global_pgds(unsigned long start, unsigned long end, int removed)
 {
 	unsigned long address;
 
@@ -186,7 +186,12 @@
 		const pgd_t *pgd_ref = pgd_offset_k(address);
 		struct page *page;
 
-		if (pgd_none(*pgd_ref))
+		/*
+		 * When it is called after memory hot remove, pgd_none()
+		 * returns true. In this case (removed == 1), we must clear
+		 * the PGD entries in the local PGD level page.
+		 */
+		if (pgd_none(*pgd_ref) && !removed)
 			continue;
 
 		spin_lock(&pgd_lock);
@@ -199,11 +204,17 @@
 			pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
 			spin_lock(pgt_lock);
 
-			if (pgd_none(*pgd))
-				set_pgd(pgd, *pgd_ref);
-			else
+			if (!pgd_none(*pgd_ref) && !pgd_none(*pgd))
 				BUG_ON(pgd_page_vaddr(*pgd)
 				       != pgd_page_vaddr(*pgd_ref));
+
+			if (removed) {
+				if (pgd_none(*pgd_ref) && !pgd_none(*pgd))
+					pgd_clear(pgd);
+			} else {
+				if (pgd_none(*pgd))
+					set_pgd(pgd, *pgd_ref);
+			}
 
 			spin_unlock(pgt_lock);
 		}
@@ -633,7 +644,7 @@
 	}
 
 	if (pgd_changed)
-		sync_global_pgds(addr, end - 1);
+		sync_global_pgds(addr, end - 1, 0);
 
 	__flush_tlb_all();
 
@@ -995,7 +1006,7 @@
 	}
 
 	if (pgd_changed)
-		sync_global_pgds(start, end - 1);
+		sync_global_pgds(start, end - 1, 1);
 
 	flush_tlb_all();
 }
@@ -1342,7 +1353,7 @@
 	else
 		err = vmemmap_populate_basepages(start, end, node);
 	if (!err)
-		sync_global_pgds(start, end - 1);
+		sync_global_pgds(start, end - 1, 0);
 	return err;
 }
 