Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: numa: do not trap faults on the huge zero page

Faults on the huge zero page are pointless and there is a BUG_ON to catch
them during fault time. This patch reintroduces a check that avoids
marking the zero page PAGE_NONE.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by Mel Gorman, committed by Linus Torvalds
commit e944fd67 (parent 21d9ee3e)

4 files changed, 27 insertions(+), 4 deletions(-)

include/linux/huge_mm.h (+2 -1)

@@ -31,7 +31,8 @@
 		unsigned long new_addr, unsigned long old_end,
 		pmd_t *old_pmd, pmd_t *new_pmd);
 extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-			unsigned long addr, pgprot_t newprot);
+			unsigned long addr, pgprot_t newprot,
+			int prot_numa);
 
 enum transparent_hugepage_flag {
 	TRANSPARENT_HUGEPAGE_FLAG,
mm/huge_memory.c (+12 -1)

@@ -1471,7 +1471,7 @@
  * - HPAGE_PMD_NR is protections changed and TLB flush necessary
  */
 int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-		unsigned long addr, pgprot_t newprot)
+		unsigned long addr, pgprot_t newprot, int prot_numa)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *ptl;
@@ -1479,6 +1479,17 @@
 
 	if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
 		pmd_t entry;
+
+		/*
+		 * Avoid trapping faults against the zero page. The read-only
+		 * data is likely to be read-cached on the local CPU and
+		 * local/remote hits to the zero page are not interesting.
+		 */
+		if (prot_numa && is_huge_zero_pmd(*pmd)) {
+			spin_unlock(ptl);
+			return 0;
+		}
+
 		ret = 1;
 		entry = pmdp_get_and_clear_notify(mm, addr, pmd);
 		entry = pmd_modify(entry, newprot);
mm/memory.c (-1)

@@ -3040,7 +3040,6 @@
 		pte_unmap_unlock(ptep, ptl);
 		return 0;
 	}
-	BUG_ON(is_zero_pfn(page_to_pfn(page)));
 
 	/*
 	 * Avoid grouping on DSO/COW pages in specific and RO pages
mm/mprotect.c (+13 -1)

@@ -76,5 +76,17 @@
 		if (pte_present(oldpte)) {
 			pte_t ptent;
 
+			/*
+			 * Avoid trapping faults against the zero or KSM
+			 * pages. See similar comment in change_huge_pmd.
+			 */
+			if (prot_numa) {
+				struct page *page;
+
+				page = vm_normal_page(vma, addr, oldpte);
+				if (!page || PageKsm(page))
+					continue;
+			}
+
 			ptent = ptep_modify_prot_start(mm, addr, pte);
 			ptent = pte_modify(ptent, newprot);
@@ -142,7 +154,7 @@
 			split_huge_page_pmd(vma, addr, pmd);
 		else {
 			int nr_ptes = change_huge_pmd(vma, pmd, addr,
-					newprot);
+					newprot, prot_numa);
 
 			if (nr_ptes) {
 				if (nr_ptes == HPAGE_PMD_NR) {