Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/khugepaged: fix GUP-fast interaction by sending IPI

Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP
collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to
ensure that the page table was not removed by khugepaged in between.

However, lockless_pages_from_mm() still requires that the page table is
not concurrently freed. Fix it by sending IPIs (if the architecture uses
semi-RCU-style page table freeing) before freeing/reusing page tables.

Link: https://lkml.kernel.org/r/20221129154730.2274278-2-jannh@google.com
Link: https://lkml.kernel.org/r/20221128180252.1684965-2-jannh@google.com
Link: https://lkml.kernel.org/r/20221125213714.4115729-2-jannh@google.com
Fixes: ba76149f47d8 ("thp: khugepaged")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Jann Horn and committed by Andrew Morton
2ba99c5e 8d3c106e

+7 -3
+4
include/asm-generic/tlb.h
···
 #define tlb_needs_table_invalidate() (true)
 #endif
 
+void tlb_remove_table_sync_one(void);
+
 #else
 
 #ifdef tlb_needs_table_invalidate
 #error tlb_needs_table_invalidate() requires MMU_GATHER_RCU_TABLE_FREE
 #endif
+
+static inline void tlb_remove_table_sync_one(void) { }
 
 #endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */
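
The header change follows a common kernel pattern: declare the real function when the feature is configured in, and provide an empty static inline stub otherwise, so that callers need no #ifdef of their own. A minimal userspace sketch of that pattern is below. TOY_RCU_TABLE_FREE, toy_table_sync_one() and toy_collapse_path() are hypothetical names made up for illustration; they only stand in for CONFIG_MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table_sync_one() and are not kernel code.

```c
/* Hypothetical switch standing in for CONFIG_MMU_GATHER_RCU_TABLE_FREE. */
#define TOY_RCU_TABLE_FREE 1

/* Counts how often the "real" synchronization routine ran. */
static int toy_sync_calls;

#if TOY_RCU_TABLE_FREE
/* Declaration, as a shared header would carry it. */
void toy_table_sync_one(void);

/* Definition, as the implementing .c file would carry it. */
void toy_table_sync_one(void)
{
	toy_sync_calls++;
}
#else
/* Stub: an empty inline function, so call sites compile unchanged. */
static inline void toy_table_sync_one(void) { }
#endif

/* A caller that is written identically for both configurations. */
static int toy_collapse_path(void)
{
	toy_table_sync_one();
	return toy_sync_calls;
}
```

With TOY_RCU_TABLE_FREE set to 1, each call to toy_collapse_path() bumps the counter; with it set to 0, the call compiles away and the counter stays at zero.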
+2
mm/khugepaged.c
···
 	_pmd = pmdp_collapse_flush(vma, address, pmd);
 	spin_unlock(pmd_ptl);
 	mmu_notifier_invalidate_range_end(&range);
+	tlb_remove_table_sync_one();
 
 	spin_lock(pte_ptl);
 	result = __collapse_huge_page_isolate(vma, address, pte, cc,
···
 	lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
 
 	pmd = pmdp_collapse_flush(vma, addr, pmdp);
+	tlb_remove_table_sync_one();
 	mm_dec_nr_ptes(mm);
 	page_table_check_pte_clear_range(mm, addr, pmd);
 	pte_free(mm, pmd_pgtable(pmd));
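
The ordering at both call sites is: unhook the page table from the pmd, synchronize with all other CPUs, and only then free or reuse the table. The sketch below is a single-threaded toy model of why that ordering protects a lockless walker that runs with interrupts disabled. It is an illustration under stated assumptions, not kernel code: every toy_* name is hypothetical, and where the real synchronous IPI would block until the other CPU re-enables interrupts, the model simply returns false.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 2

/* One simulated CPU: irqs_off is true while it runs a lockless walk. */
struct toy_cpu {
	bool irqs_off;
};

struct toy_mm {
	int *pmd;         /* stands in for the pmd entry pointing at a PTE table */
	bool table_freed; /* set once the PTE table has been handed back */
};

/*
 * Model of a synchronous IPI broadcast: it can complete only when no CPU
 * has interrupts disabled. Returns false where the real call would wait.
 */
static bool toy_sync_one(const struct toy_cpu *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cpus[i].irqs_off)
			return false;
	return true;
}

/* Collapse side, step 1: unhook the PTE table from the pmd. */
static void toy_unhook(struct toy_mm *mm)
{
	mm->pmd = NULL;
}

/* Collapse side, steps 2 and 3: synchronize, then free the table. */
static bool toy_sync_and_free(struct toy_mm *mm,
			      const struct toy_cpu *cpus, size_t n)
{
	if (!toy_sync_one(cpus, n))
		return false; /* a walker may still be using the table */
	mm->table_freed = true;
	return true;
}

/* Walker side: disable "interrupts" and sample the pmd. */
static int *toy_walk_begin(struct toy_cpu *cpu, const struct toy_mm *mm)
{
	cpu->irqs_off = true;
	return mm->pmd;
}

/* Walker side: recheck the pmd, then re-enable "interrupts". */
static bool toy_walk_end(struct toy_cpu *cpu, const struct toy_mm *mm,
			 const int *seen)
{
	bool unchanged = (mm->pmd == seen);

	cpu->irqs_off = false;
	return unchanged; /* false means the walker must discard its result */
}
```

In this model, a walker that has sampled the pmd before toy_unhook() keeps toy_sync_and_free() from freeing the table until toy_walk_end() runs. The pmd recheck then fails, so the walker backs off without having touched freed memory.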
+1 -3
mm/mmu_gather.c
···
 	/* Simply deliver the interrupt */
 }
 
-static void tlb_remove_table_sync_one(void)
+void tlb_remove_table_sync_one(void)
 {
 	/*
 	 * This isn't an RCU grace period and hence the page-tables cannot be
···
 }
 
 #else /* !CONFIG_MMU_GATHER_RCU_TABLE_FREE */
-
-static void tlb_remove_table_sync_one(void) { }
 
 static void tlb_remove_table_free(struct mmu_table_batch *batch)
 {