Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

xen/mmu: Call xen_cleanhighmap() with 4MB aligned for page tables mapping

When bootup a PVM guest with large memory(Ex.240GB), XEN provided initial
mapping overlaps with kernel module virtual space. When mapping in this space
is cleared by xen_cleanhighmap(), in certain case there could be an 2MB mapping
left. This is due to XEN initialize 4MB aligned mapping but xen_cleanhighmap()
finish at 2MB boundary.

When module loading is just on top of the 2MB space, got below warning:

WARNING: at mm/vmalloc.c:106 vmap_pte_range+0x14e/0x190()
Call Trace:
[<ffffffff81117083>] warn_alloc_failed+0xf3/0x160
[<ffffffff81146022>] __vmalloc_area_node+0x182/0x1c0
[<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
[<ffffffff81145df7>] __vmalloc_node_range+0xa7/0x110
[<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
[<ffffffff8103ca54>] module_alloc+0x64/0x70
[<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
[<ffffffff810ac91e>] module_alloc_update_bounds+0x1e/0x80
[<ffffffff810ac9a7>] move_module+0x27/0x150
[<ffffffff810aefa0>] layout_and_allocate+0x120/0x1b0
[<ffffffff810af0a8>] load_module+0x78/0x640
[<ffffffff811ff90b>] ? security_file_permission+0x8b/0x90
[<ffffffff810af6d2>] sys_init_module+0x62/0x1e0
[<ffffffff815154c2>] system_call_fastpath+0x16/0x1b

Then the mapping of 2MB is cleared, finally oops when the page in that space is
accessed.

BUG: unable to handle kernel paging request at ffff880022600000
IP: [<ffffffff81260877>] clear_page_c_e+0x7/0x10
PGD 1788067 PUD 178c067 PMD 22434067 PTE 0
Oops: 0002 [#1] SMP
Call Trace:
[<ffffffff81116ef7>] ? prep_new_page+0x127/0x1c0
[<ffffffff81117d42>] get_page_from_freelist+0x1e2/0x550
[<ffffffff81133010>] ? ii_iovec_copy_to_user+0x90/0x140
[<ffffffff81119c9d>] __alloc_pages_nodemask+0x12d/0x230
[<ffffffff81155516>] alloc_pages_vma+0xc6/0x1a0
[<ffffffff81006ffd>] ? pte_mfn_to_pfn+0x7d/0x100
[<ffffffff81134cfb>] do_anonymous_page+0x16b/0x350
[<ffffffff81139c34>] handle_pte_fault+0x1e4/0x200
[<ffffffff8100712e>] ? xen_pmd_val+0xe/0x10
[<ffffffff810052c9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[<ffffffff81139dab>] handle_mm_fault+0x15b/0x270
[<ffffffff81510c10>] do_page_fault+0x140/0x470
[<ffffffff8150d7d5>] page_fault+0x25/0x30

Call xen_cleanhighmap() with 4MB aligned for page tables mapping to fix it.
The unnecessory call of xen_cleanhighmap() in DEBUG mode is also removed.

-v2: add comment about XEN alignment from Juergen.

References: https://lists.xen.org/archives/html/xen-devel/2012-07/msg01562.html
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

[boris: added 'xen/mmu' tag to commit subject]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

authored by

Zhenzhong Duan and committed by
Boris Ostrovsky
0d805ee7 8c28ef3f

+4 -9
+4 -9
arch/x86/xen/mmu_pv.c
··· 1238 1238 * from _brk_limit way up to the max_pfn_mapped (which is the end of 1239 1239 * the ramdisk). We continue on, erasing PMD entries that point to page 1240 1240 * tables - do note that they are accessible at this stage via __va. 1241 - * For good measure we also round up to the PMD - which means that if 1241 + * As Xen is aligning the memory end to a 4MB boundary, for good 1242 + * measure we also round up to PMD_SIZE * 2 - which means that if 1242 1243 * anybody is using __ka address to the initial boot-stack - and try 1243 1244 * to use it - they are going to crash. The xen_start_info has been 1244 1245 * taken care of already in xen_setup_kernel_pagetable. */ 1245 1246 addr = xen_start_info->pt_base; 1246 - size = roundup(xen_start_info->nr_pt_frames * PAGE_SIZE, PMD_SIZE); 1247 + size = xen_start_info->nr_pt_frames * PAGE_SIZE; 1247 1248 1248 - xen_cleanhighmap(addr, addr + size); 1249 + xen_cleanhighmap(addr, roundup(addr + size, PMD_SIZE * 2)); 1249 1250 xen_start_info->pt_base = (unsigned long)__va(__pa(xen_start_info->pt_base)); 1250 - #ifdef DEBUG 1251 - /* This is superfluous and is not necessary, but you know what 1252 - * lets do it. The MODULES_VADDR -> MODULES_END should be clear of 1253 - * anything at this stage. */ 1254 - xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PUD_SIZE) - 1); 1255 - #endif 1256 1251 } 1257 1252 #endif 1258 1253