Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: generalise COW SMC TLB flushing race comment

I'm not sure if I'm completely missing something here, but AFAIKS the
reference to the mysterious "COW SMC race" confuses the issue. The
original changelog and mailing list thread didn't help me either.

This SMC race is where the problem was detected, but isn't the general
problem bigger and more obvious: that the new PTE could be picked up at
any time by any TLB while entries for the old PTE exist in other TLBs
before the TLB flush takes effect?

The case where the iTLB and dTLB of a CPU are pointing at different pages
is an interesting one but follows from the general problem.

The other (minor) point about the comment: I think it is a bit clearer to
say what the old code was doing (i.e., the new ordering avoids the race as
opposed to what?).

References: 4ce072f1faf29 ("mm: fix a race condition under SMC + COW")
Link: https://lkml.kernel.org/r/20201215121119.351650-1-npiggin@gmail.com
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Nicholas Piggin and committed by Linus Torvalds
111fe718 e05986ee

+5 -3
mm/memory.c
@@ -2892,11 +2892,13 @@
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		entry = pte_sw_mkyoung(entry);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+
 		/*
 		 * Clear the pte entry and flush it first, before updating the
-		 * pte with the new entry. This will avoid a race condition
-		 * seen in the presence of one thread doing SMC and another
-		 * thread doing COW.
+		 * pte with the new entry, to keep TLBs on different CPUs in
+		 * sync. This code used to set the new PTE then flush TLBs, but
+		 * that left a window where the new PTE could be loaded into
+		 * some TLBs while the old PTE remains in others.
 		 */
 		ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
 		page_add_new_anon_rmap(new_page, vma, vmf->address, false);
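
The ordering argument in the updated comment can be illustrated with a toy model. This is only a sketch, not kernel code: it assumes two CPUs, one PTE, and a per-CPU TLB slot, and every name in it (toy_mm, toy_access, toy_set_then_flush, and so on) is hypothetical. It compares "set the new PTE, then flush" against "clear and flush, then set the new PTE" and reports whether one TLB could hold the old translation while another holds the new one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names are hypothetical, none are kernel APIs. */
enum { PTE_NONE = 0, PTE_OLD = 1, PTE_NEW = 2 };
enum { NCPUS = 2 };

struct toy_mm {
	int pte;        /* the in-memory page table entry */
	int tlb[NCPUS]; /* each CPU's cached translation, PTE_NONE if empty */
};

/* A CPU touches the address: on a TLB miss it loads the current PTE.
 * Loading PTE_NONE models a fault, so nothing gets cached. */
static void toy_access(struct toy_mm *mm, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	if (mm->tlb[cpu] == PTE_NONE)
		mm->tlb[cpu] = mm->pte;
}

static void toy_flush_all(struct toy_mm *mm)
{
	for (int i = 0; i < NCPUS; i++)
		mm->tlb[i] = PTE_NONE;
}

/* True if one TLB maps the old page while another maps the new one. */
static bool toy_tlbs_disagree(const struct toy_mm *mm)
{
	bool has_old = false, has_new = false;

	for (int i = 0; i < NCPUS; i++) {
		if (mm->tlb[i] == PTE_OLD)
			has_old = true;
		if (mm->tlb[i] == PTE_NEW)
			has_new = true;
	}
	return has_old && has_new;
}

/* Old ordering: install the new PTE, then flush. */
static bool toy_set_then_flush(void)
{
	struct toy_mm mm = { .pte = PTE_OLD, .tlb = { PTE_OLD, PTE_NONE } };
	bool raced;

	mm.pte = PTE_NEW;
	toy_access(&mm, 0); /* hits the stale old entry */
	toy_access(&mm, 1); /* misses and loads the new entry */
	raced = toy_tlbs_disagree(&mm);
	toy_flush_all(&mm);
	return raced;
}

/* New ordering: clear the PTE and flush, then install the new PTE. */
static bool toy_clear_flush_then_set(void)
{
	struct toy_mm mm = { .pte = PTE_OLD, .tlb = { PTE_OLD, PTE_NONE } };
	bool raced;

	mm.pte = PTE_NONE;
	toy_access(&mm, 0); /* may still hit the old entry */
	toy_access(&mm, 1); /* misses and faults; nothing is cached */
	raced = toy_tlbs_disagree(&mm);
	toy_flush_all(&mm);
	mm.pte = PTE_NEW;
	toy_access(&mm, 0);
	toy_access(&mm, 1);
	return raced || toy_tlbs_disagree(&mm);
}
```

In this model toy_set_then_flush() observes the disagreement window and toy_clear_flush_then_set() does not, which is the property the new comment describes. Real hardware and the real flush primitives are considerably more involved than this sketch.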