Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: thp: fix BUG on mm->nr_ptes

Dave Jones reports a few Fedora users hitting the BUG_ON(mm->nr_ptes...)
in exit_mmap() recently.

Quoting Hugh's discovery and explanation of the SMP race condition:

"mm->nr_ptes had unusual locking: down_read mmap_sem plus
page_table_lock when incrementing, down_write mmap_sem (or mm_users
0) when decrementing; whereas THP is careful to increment and
decrement it under page_table_lock.

Now most of those paths in THP also hold mmap_sem for read or write
(with appropriate checks on mm_users), but two do not: when
split_huge_page() is called by hwpoison_user_mappings(), and when
called by add_to_swap().

It's conceivable that the latter case is responsible for the
exit_mmap() BUG_ON mm->nr_ptes that has been reported on Fedora."

The simplest way to fix it without altering the locking is to make
split_huge_page() a noop in nr_ptes terms, by counting the preallocated
pagetables that exist for every mapped hugepage.  Not counting them was an
arbitrary choice, and neither way is right or wrong: the preallocated
pagetables are unused, but they are still allocated.

Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: <stable@vger.kernel.org> [3.0.x, 3.1.x, 3.2.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by Andrea Arcangeli, committed by Linus Torvalds
1c641e84 62aca403

+3 -3
mm/huge_memory.c
@@ -671,6 +671,7 @@
 	set_pmd_at(mm, haddr, pmd, entry);
 	prepare_pmd_huge_pte(pgtable, mm);
 	add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	mm->nr_ptes++;
 	spin_unlock(&mm->page_table_lock);
 }
 
@@ -790,6 +789,7 @@
 	pmd = pmd_mkold(pmd_wrprotect(pmd));
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 	prepare_pmd_huge_pte(pgtable, dst_mm);
+	dst_mm->nr_ptes++;
 
 	ret = 0;
 out_unlock:
@@ -889,7 +887,6 @@
 	}
 	kfree(pages);
 
-	mm->nr_ptes++;
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);
 	page_remove_rmap(page);
@@ -1048,6 +1047,7 @@
 	VM_BUG_ON(page_mapcount(page) < 0);
 	add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
 	VM_BUG_ON(!PageHead(page));
+	tlb->mm->nr_ptes--;
 	spin_unlock(&tlb->mm->page_table_lock);
 	tlb_remove_page(tlb, page);
 	pte_free(tlb->mm, pgtable);
@@ -1377,7 +1375,6 @@
 		pte_unmap(pte);
 	}
 
-	mm->nr_ptes++;
 	smp_wmb(); /* make pte visible before pmd */
 	/*
 	 * Up to this point the pmd is present and huge and
@@ -1989,7 +1988,6 @@
 	set_pmd_at(mm, address, pmd, _pmd);
 	update_mmu_cache(vma, address, _pmd);
 	prepare_pmd_huge_pte(pgtable, mm);
-	mm->nr_ptes--;
 	spin_unlock(&mm->page_table_lock);
 
 #ifndef CONFIG_NUMA