[PATCH] hugetlb: prepare_hugepage_range check offset too

(David:)

If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
because the given file offset is not hugepage aligned - then do_mmap_pgoff
will go to the unmap_and_free_vma backout path.

But at this stage the vma hasn't been marked as hugepage, and the backout path
will call unmap_region() on it. That will eventually call down to the
non-hugepage version of unmap_page_range(). On ppc64, at least, that will
cause serious problems if there are any existing hugepage pagetable entries in
the vicinity - for example if there are any other hugepage mappings under the
same PUD. unmap_page_range() will trigger a bad_pud() on the hugepage pud
entries. I suspect this will also cause bad problems on ia64, though I don't
have a machine to test it on.
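
By way of illustration only (not part of the patch), the failing case described above can be reached from userspace by mmap()ing a hugetlbfs file at an offset that is page aligned but not hugepage aligned. The mount point, sizes and hugepage size in this sketch are assumptions (16MB hugepages as on ppc64):

/*
 * Illustrative trigger sketch, not kernel code.  Before this patch the
 * bad offset was rejected only inside hugetlbfs_file_mmap(), i.e. after
 * do_mmap_pgoff() had already set up the vma, so the error unwind ran
 * the non-huge unmap path; after it, prepare_hugepage_range() rejects
 * the offset up front.  Paths and sizes below are placeholders.
 */
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>

#define MAP_LEN		(16UL * 1024 * 1024)	/* one 16MB hugepage (example) */
#define BAD_OFFSET	4096			/* page aligned, not hugepage aligned */

int main(void)
{
	int fd = open("/mnt/huge/file", O_CREAT | O_RDWR, 0600);
	void *p;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED,
		 fd, BAD_OFFSET);
	if (p == MAP_FAILED)
		perror("mmap");		/* expected: EINVAL */
	return 0;
}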

(Hugh:)

prepare_hugepage_range() should check file offset alignment when it checks
virtual address and length, to stop MAP_FIXED with a bad huge offset from
unmapping before it fails further down. PowerPC should apply the same
prepare_hugepage_range alignment checks as ia64 and all the others do.

Then none of the alignment checks in hugetlbfs_file_mmap are required (nor
is the check for too small a mapping); but even so, move up setting of
VM_HUGETLB and add a comment to warn of what David Gibson discovered - if
hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
when unwinding from error will go the non-huge way, which may cause bad
behaviour on architectures (powerpc and ia64) which segregate their huge
mappings into a separate region of the address space.
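
The check all the prepare_hugepage_range() variants now apply is pgoff & (~HPAGE_MASK >> PAGE_SHIFT): vm_pgoff counts base pages, and ~HPAGE_MASK >> PAGE_SHIFT is HPAGE_SIZE/PAGE_SIZE - 1, so the file offset must be a whole number of hugepages. A standalone sketch of the arithmetic, using example constants (4KB base pages, 16MB hugepages as on ppc64), not kernel code:

#include <stdio.h>

#define PAGE_SHIFT	12			/* 4KB base pages (example) */
#define HPAGE_SHIFT	24			/* 16MB hugepages (example) */
#define HPAGE_SIZE	(1UL << HPAGE_SHIFT)
#define HPAGE_MASK	(~(HPAGE_SIZE - 1))

static int hugepage_pgoff_ok(unsigned long pgoff)
{
	/* Same expression the patch adds to prepare_hugepage_range(). */
	return !(pgoff & (~HPAGE_MASK >> PAGE_SHIFT));
}

int main(void)
{
	/* 4096 base pages = exactly one 16MB hugepage boundary: accepted. */
	printf("pgoff 4096: %s\n", hugepage_pgoff_ok(4096) ? "ok" : "-EINVAL");
	/* 1 base page = 4KB into a hugepage: rejected. */
	printf("pgoff    1: %s\n", hugepage_pgoff_ok(1) ? "ok" : "-EINVAL");
	return 0;
}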

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Authored by Hugh Dickins and committed by Linus Torvalds (68589bc3, 69ae9e3e)

+25 -20
+3 -1
arch/ia64/mm/hugetlbpage.c
@@ -70,8 +70,10 @@
  * Don't actually need to do any preparation, but need to make sure
  * the address is in the right region.
  */
-int prepare_hugepage_range(unsigned long addr, unsigned long len)
+int prepare_hugepage_range(unsigned long addr, unsigned long len, pgoff_t pgoff)
 {
+	if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT))
+		return -EINVAL;
 	if (len & ~HPAGE_MASK)
 		return -EINVAL;
 	if (addr & ~HPAGE_MASK)
+6 -2
arch/powerpc/mm/hugetlbpage.c
@@ -491,11 +491,15 @@
 	return 0;
 }
 
-int prepare_hugepage_range(unsigned long addr, unsigned long len)
+int prepare_hugepage_range(unsigned long addr, unsigned long len, pgoff_t pgoff)
 {
 	int err = 0;
 
-	if ( (addr+len) < addr )
+	if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT))
+		return -EINVAL;
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
 		return -EINVAL;
 
 	if (addr < 0x100000000UL)
+8 -13
fs/hugetlbfs/inode.c
@@ -62,24 +62,19 @@
 	loff_t len, vma_len;
 	int ret;
 
-	if (vma->vm_pgoff & (HPAGE_SIZE / PAGE_SIZE - 1))
-		return -EINVAL;
-
-	if (vma->vm_start & ~HPAGE_MASK)
-		return -EINVAL;
-
-	if (vma->vm_end & ~HPAGE_MASK)
-		return -EINVAL;
-
-	if (vma->vm_end - vma->vm_start < HPAGE_SIZE)
-		return -EINVAL;
+	/*
+	 * vma alignment has already been checked by prepare_hugepage_range.
+	 * If you add any error returns here, do so after setting VM_HUGETLB,
+	 * so is_vm_hugetlb_page tests below unmap_region go the right way
+	 * when do_mmap_pgoff unwinds (may be important on powerpc and ia64).
+	 */
+	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
+	vma->vm_ops = &hugetlb_vm_ops;
 
 	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
 
 	mutex_lock(&inode->i_mutex);
 	file_accessed(file);
-	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
-	vma->vm_ops = &hugetlb_vm_ops;
 
 	ret = -ENOMEM;
 	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+7 -3
include/linux/hugetlb.h
@@ -60,8 +60,11 @@
  * If the arch doesn't supply something else, assume that hugepage
  * size aligned regions are ok without further preparation.
  */
-static inline int prepare_hugepage_range(unsigned long addr, unsigned long len)
+static inline int prepare_hugepage_range(unsigned long addr, unsigned long len,
+						pgoff_t pgoff)
 {
+	if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT))
+		return -EINVAL;
 	if (len & ~HPAGE_MASK)
 		return -EINVAL;
 	if (addr & ~HPAGE_MASK)
@@ -69,7 +72,8 @@
 	return 0;
 }
 #else
-int prepare_hugepage_range(unsigned long addr, unsigned long len);
+int prepare_hugepage_range(unsigned long addr, unsigned long len,
+						pgoff_t pgoff);
 #endif
 
 #ifndef ARCH_HAS_SETCLEAR_HUGE_PTE
@@ -107,7 +111,7 @@
 #define hugetlb_report_meminfo(buf)		0
 #define hugetlb_report_node_meminfo(n, buf)	0
 #define follow_huge_pmd(mm, addr, pmd, write)	NULL
-#define prepare_hugepage_range(addr, len)	(-EINVAL)
+#define prepare_hugepage_range(addr,len,pgoff)	(-EINVAL)
 #define pmd_huge(x)	0
 #define is_hugepage_only_range(mm, addr, len)	0
 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
+1 -1
mm/mmap.c
@@ -1379,7 +1379,7 @@
 		 * Check if the given range is hugepage aligned, and
 		 * can be made suitable for hugepages.
 		 */
-		ret = prepare_hugepage_range(addr, len);
+		ret = prepare_hugepage_range(addr, len, pgoff);
 	} else {
 		/*
 		 * Ensure that a normal request is not falling in a