Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/membarrier: Get rid of a dubious optimization

sync_core_before_usermode() had an incorrect optimization. If the kernel
returns from an interrupt, it can get to usermode without IRET. It just has
to schedule to a different task in the same mm and do SYSRET. Fortunately,
there were no callers of sync_core_before_usermode() that could have had
in_irq() or in_nmi() equal to true, because it's only ever called from the
scheduler.

While at it, clarify a related comment.

Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/5afc7632be1422f91eaf7611aaaa1b5b8580a086.1607058304.git.luto@kernel.org

Authored by Andy Lutomirski and committed by Thomas Gleixner
a493d1ca 0477e928

+13 -6
+5 -4
arch/x86/include/asm/sync_core.h
@@ -98,12 +98,13 @@
 	/* With PTI, we unconditionally serialize before running user code. */
 	if (static_cpu_has(X86_FEATURE_PTI))
 		return;
+
 	/*
-	 * Return from interrupt and NMI is done through iret, which is core
-	 * serializing.
+	 * Even if we're in an interrupt, we might reschedule before returning,
+	 * in which case we could switch to a different thread in the same mm
+	 * and return using SYSRET or SYSEXIT.  Instead of trying to keep
+	 * track of our need to sync the core, just sync right away.
 	 */
-	if (in_irq() || in_nmi())
-		return;
 	sync_core();
 }
 
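
The failure mode described in the commit message can be sketched as a small userspace model. This is only an illustration, not kernel code: `struct cpu_model`, `enum exit_path`, and every function below are hypothetical names invented for the sketch. It assumes, as the comments in the hunk state, that PTI and IRET serialize the core while SYSRET does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU; this is not a kernel structure. */
struct cpu_model {
	bool pti;          /* stands in for static_cpu_has(X86_FEATURE_PTI) */
	bool in_interrupt; /* stands in for in_irq() || in_nmi() */
	bool serialized;   /* core serialized since the sync was requested? */
};

enum exit_path { EXIT_IRET, EXIT_SYSRET };

/* Old logic: skip the sync in interrupt context, trusting a later IRET. */
static void old_sync_before_usermode(struct cpu_model *cpu)
{
	if (cpu->pti)
		return;
	if (cpu->in_interrupt)
		return;
	cpu->serialized = true; /* stands in for sync_core() */
}

/* New logic: sync right away unless PTI already guarantees it. */
static void new_sync_before_usermode(struct cpu_model *cpu)
{
	if (cpu->pti)
		return;
	cpu->serialized = true; /* stands in for sync_core() */
}

/* Returns true if the core was serialized before user code runs. */
static bool return_to_user(struct cpu_model *cpu, enum exit_path path)
{
	if (cpu->pti || path == EXIT_IRET)
		cpu->serialized = true; /* PTI exit path or IRET serializes */
	return cpu->serialized;
}

/*
 * Scenario from the commit message: the sync is requested while in an
 * interrupt, then the CPU reschedules to another thread in the same mm
 * that leaves the kernel through SYSRET instead of IRET.
 */
static bool irq_then_sysret(bool use_new_logic)
{
	struct cpu_model cpu = { .pti = false, .in_interrupt = true,
				 .serialized = false };

	if (use_new_logic)
		new_sync_before_usermode(&cpu);
	else
		old_sync_before_usermode(&cpu);

	cpu.in_interrupt = false; /* rescheduled to a task sitting in a syscall */
	return return_to_user(&cpu, EXIT_SYSRET);
}

/* Self-check: the old logic loses the sync in this scenario, the new one keeps it. */
static void check_scenario(void)
{
	assert(!irq_then_sysret(false));
	assert(irq_then_sysret(true));
}
```

In this model the old logic reaches user mode without serializing, which is the case the removed `in_irq() || in_nmi()` check failed to account for. The commit message notes that no real caller could hit it, because the function is only ever called from the scheduler.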
+8 -2
arch/x86/mm/tlb.c
@@ -474,8 +474,14 @@
 	/*
 	 * The membarrier system call requires a full memory barrier and
 	 * core serialization before returning to user-space, after
-	 * storing to rq->curr. Writing to CR3 provides that full
-	 * memory barrier and core serializing instruction.
+	 * storing to rq->curr, when changing mm. This is because
+	 * membarrier() sends IPIs to all CPUs that are in the target mm
+	 * to make them issue memory barriers. However, if another CPU
+	 * switches to/from the target mm concurrently with
+	 * membarrier(), it can cause that CPU not to receive an IPI
+	 * when it really should issue a memory barrier. Writing to CR3
+	 * provides that full memory barrier and core serializing
+	 * instruction.
 	 */
 	if (real_prev == next) {
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
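
For context on what relies on this guarantee, the following is a minimal userspace sketch of requesting core serialization through membarrier(), for example after a JIT rewrites code. The helper names `sys_membarrier` and `sync_core_all_threads` are hypothetical. The command constants come from `<linux/membarrier.h>`, and the sketch assumes kernel headers recent enough to define the `*_SYNC_CORE` commands. The call is made through syscall(2), on the assumption that the C library provides no membarrier() wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw system call (hypothetical helper name). */
static int sys_membarrier(int cmd, unsigned int flags)
{
	return (int)syscall(__NR_membarrier, cmd, flags, 0);
}

/*
 * Ask the kernel to make every thread of this process execute a core
 * serializing instruction before it next runs user code.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int sync_core_all_threads(void)
{
	/* MEMBARRIER_CMD_QUERY returns a bitmask of supported commands. */
	int supported = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);

	if (supported < 0)
		return -1;
	if (!(supported & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE)) {
		errno = EOPNOTSUPP;
		return -1;
	}

	/* A process must register before issuing the expedited command. */
	if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0) < 0)
		return -1;

	return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}
```

A caller would typically invoke `sync_core_all_threads()` once after modifying executable memory and before other threads jump into it. On kernels or sandboxes without membarrier support the function returns -1 and the caller needs a fallback.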