Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge patch series "membarrier: riscv: Core serializing command"

RISC-V was lacking a membarrier implementation for the store/fetch
ordering, which is a bit tricky because of the deferred icache flushing
we use in RISC-V.

* b4-shazam-merge:
membarrier: riscv: Provide core serializing command
locking: Introduce prepare_sync_core_cmd()
membarrier: Create Documentation/scheduler/membarrier.rst
membarrier: riscv: Add full memory barrier in switch_mm()

Link: https://lore.kernel.org/r/20240131144936.29190-1-parri.andrea@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

+185 -10
+17 -1
Documentation/features/sched/membarrier-sync-core/arch-support.txt
···
  # Rely on implicit context synchronization as a result of exception return
  # when returning from IPI handler, and when returning to user-space.
  #
+ # * riscv
+ #
+ # riscv uses xRET as return from interrupt and to return to user-space.
+ #
+ # Given that xRET is not core serializing, we rely on FENCE.I for providing
+ # core serialization:
+ #
+ #  - by calling sync_core_before_usermode() on return from interrupt (cf.
+ #    ipi_sync_core()),
+ #
+ #  - via switch_mm() and sync_core_before_usermode() (respectively, for
+ #    uthread->uthread and kthread->uthread transitions) before returning
+ #    to user-space.
+ #
+ # The serialization in switch_mm() is activated by prepare_sync_core_cmd().
+ #
  # * x86
  #
  # x86-32 uses IRET as return from interrupt, which takes care of the IPI.
···
  | openrisc: | TODO |
  |   parisc: | TODO |
  |  powerpc: |  ok  |
- |    riscv: | TODO |
+ |    riscv: |  ok  |
  |     s390: |  ok  |
  |       sh: | TODO |
  |    sparc: | TODO |
+1
Documentation/scheduler/index.rst
···

     completion
+    membarrier
     sched-arch
     sched-bwc
     sched-deadline
+39
Documentation/scheduler/membarrier.rst
···
+ .. SPDX-License-Identifier: GPL-2.0
+
+ ========================
+ membarrier() System Call
+ ========================
+
+ MEMBARRIER_CMD_{PRIVATE,GLOBAL}_EXPEDITED - Architecture requirements
+ =====================================================================
+
+ Memory barriers before updating rq->curr
+ ----------------------------------------
+
+ The commands MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_GLOBAL_EXPEDITED
+ require each architecture to have a full memory barrier after coming from
+ user-space, before updating rq->curr. This barrier is implied by the sequence
+ rq_lock(); smp_mb__after_spinlock() in __schedule(). The barrier matches a full
+ barrier in the proximity of the membarrier system call exit, cf.
+ membarrier_{private,global}_expedited().
+
+ Memory barriers after updating rq->curr
+ ---------------------------------------
+
+ The commands MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_GLOBAL_EXPEDITED
+ require each architecture to have a full memory barrier after updating rq->curr,
+ before returning to user-space. The schemes providing this barrier on the various
+ architectures are as follows.
+
+ - alpha, arc, arm, hexagon, mips rely on the full barrier implied by
+   spin_unlock() in finish_lock_switch().
+
+ - arm64 relies on the full barrier implied by switch_to().
+
+ - powerpc, riscv, s390, sparc, x86 rely on the full barrier implied by
+   switch_mm(), if mm is not NULL; they rely on the full barrier implied
+   by mmdrop(), otherwise. On powerpc and riscv, switch_mm() relies on
+   membarrier_arch_switch_mm().
+
+ The barrier matches a full barrier in the proximity of the membarrier system call
+ entry, cf. membarrier_{private,global}_expedited().
+3 -1
MAINTAINERS
···
  M:	"Paul E. McKenney" <paulmck@kernel.org>
  L:	linux-kernel@vger.kernel.org
  S:	Supported
- F:	arch/powerpc/include/asm/membarrier.h
+ F:	Documentation/scheduler/membarrier.rst
+ F:	arch/*/include/asm/membarrier.h
+ F:	arch/*/include/asm/sync_core.h
  F:	include/uapi/linux/membarrier.h
  F:	kernel/sched/membarrier.c
+4
arch/riscv/Kconfig
···
  	select ARCH_HAS_GCOV_PROFILE_ALL
  	select ARCH_HAS_GIGANTIC_PAGE
  	select ARCH_HAS_KCOV
+ 	select ARCH_HAS_MEMBARRIER_CALLBACKS
+ 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
  	select ARCH_HAS_MMIOWB
  	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
  	select ARCH_HAS_PMEM_API
+ 	select ARCH_HAS_PREPARE_SYNC_CORE_CMD
  	select ARCH_HAS_PTE_SPECIAL
  	select ARCH_HAS_SET_DIRECT_MAP if MMU
  	select ARCH_HAS_SET_MEMORY if MMU
  	select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
  	select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
+ 	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
  	select ARCH_HAS_SYSCALL_WRAPPER
  	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
  	select ARCH_HAS_UBSAN_SANITIZE_ALL
+50
arch/riscv/include/asm/membarrier.h
···
+ /* SPDX-License-Identifier: GPL-2.0-only */
+ #ifndef _ASM_RISCV_MEMBARRIER_H
+ #define _ASM_RISCV_MEMBARRIER_H
+
+ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+ 					     struct mm_struct *next,
+ 					     struct task_struct *tsk)
+ {
+ 	/*
+ 	 * Only need the full barrier when switching between processes.
+ 	 * Barrier when switching from kernel to userspace is not
+ 	 * required here, given that it is implied by mmdrop(). Barrier
+ 	 * when switching from userspace to kernel is not needed after
+ 	 * store to rq->curr.
+ 	 */
+ 	if (IS_ENABLED(CONFIG_SMP) &&
+ 	    likely(!(atomic_read(&next->membarrier_state) &
+ 		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
+ 		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
+ 		return;
+
+ 	/*
+ 	 * The membarrier system call requires a full memory barrier
+ 	 * after storing to rq->curr, before going back to user-space.
+ 	 *
+ 	 * This barrier is also needed for the SYNC_CORE command when
+ 	 * switching between processes; in particular, on a transition
+ 	 * from a thread belonging to another mm to a thread belonging
+ 	 * to the mm for which a membarrier SYNC_CORE is done on CPU0:
+ 	 *
+ 	 *   - [CPU0] sets all bits in the mm icache_stale_mask (in
+ 	 *     prepare_sync_core_cmd());
+ 	 *
+ 	 *   - [CPU1] stores to rq->curr (by the scheduler);
+ 	 *
+ 	 *   - [CPU0] loads rq->curr within membarrier and observes
+ 	 *     cpu_rq(1)->curr->mm != mm, so the IPI is skipped on
+ 	 *     CPU1; this means membarrier relies on switch_mm() to
+ 	 *     issue the sync-core;
+ 	 *
+ 	 *   - [CPU1] switch_mm() loads icache_stale_mask; if the bit
+ 	 *     is zero, switch_mm() may incorrectly skip the sync-core.
+ 	 *
+ 	 * Matches a full barrier in the proximity of the membarrier
+ 	 * system call entry.
+ 	 */
+ 	smp_mb();
+ }
+
+ #endif /* _ASM_RISCV_MEMBARRIER_H */
+29
arch/riscv/include/asm/sync_core.h
···
+ /* SPDX-License-Identifier: GPL-2.0 */
+ #ifndef _ASM_RISCV_SYNC_CORE_H
+ #define _ASM_RISCV_SYNC_CORE_H
+
+ /*
+  * RISC-V implements return to user-space through an xRET instruction,
+  * which is not core serializing.
+  */
+ static inline void sync_core_before_usermode(void)
+ {
+ 	asm volatile ("fence.i" ::: "memory");
+ }
+
+ #ifdef CONFIG_SMP
+ /*
+  * Ensure the next switch_mm() on every CPU issues a core serializing
+  * instruction for the given @mm.
+  */
+ static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+ {
+ 	cpumask_setall(&mm->context.icache_stale_mask);
+ }
+ #else
+ static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+ {
+ }
+ #endif /* CONFIG_SMP */
+
+ #endif /* _ASM_RISCV_SYNC_CORE_H */
+2
arch/riscv/mm/context.c
···
  	if (unlikely(prev == next))
  		return;

+ 	membarrier_arch_switch_mm(prev, next, task);
+
  	/*
  	 * Mark the current MM context as inactive, and the next as
  	 * active. This is at least used by the icache flushing
+15 -1
include/linux/sync_core.h
···
  }
  #endif

- #endif /* _LINUX_SYNC_CORE_H */
+ #ifdef CONFIG_ARCH_HAS_PREPARE_SYNC_CORE_CMD
+ #include <asm/sync_core.h>
+ #else
+ /*
+  * This is a dummy prepare_sync_core_cmd() implementation that can be used on
+  * all architectures which provide unconditional core serializing instructions
+  * in switch_mm().
+  * If your architecture doesn't provide such core serializing instructions in
+  * switch_mm(), you may need to write your own functions.
+  */
+ static inline void prepare_sync_core_cmd(struct mm_struct *mm)
+ {
+ }
+ #endif

+ #endif /* _LINUX_SYNC_CORE_H */
+3
init/Kconfig
···
  config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
  	bool

+ config ARCH_HAS_PREPARE_SYNC_CORE_CMD
+ 	bool
+
  config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
  	bool
+13 -3
kernel/sched/core.c
···
  	 * if (signal_pending_state())	    if (p->state & @state)
  	 *
  	 * Also, the membarrier system call requires a full memory barrier
- 	 * after coming from user-space, before storing to rq->curr.
+ 	 * after coming from user-space, before storing to rq->curr; this
+ 	 * barrier matches a full barrier in the proximity of the membarrier
+ 	 * system call exit.
  	 */
  	rq_lock(rq, &rf);
  	smp_mb__after_spinlock();
···
  		 *
  		 * Here are the schemes providing that barrier on the
  		 * various architectures:
- 		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
- 		 *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
+ 		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
+ 		 *   RISC-V. switch_mm() relies on membarrier_arch_switch_mm()
+ 		 *   on PowerPC and on RISC-V.
  		 * - finish_lock_switch() for weakly-ordered
  		 *   architectures where spin_unlock is a full barrier,
  		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
  		 *   is a RELEASE barrier),
+ 		 *
+ 		 * The barrier matches a full barrier in the proximity of
+ 		 * the membarrier system call entry.
+ 		 *
+ 		 * On RISC-V, this barrier pairing is also needed for the
+ 		 * SYNC_CORE command when switching between processes, cf.
+ 		 * the inline comments in membarrier_arch_switch_mm().
  		 */
  		++*switch_count;
+9 -4
kernel/sched/membarrier.c
···
  		return 0;

  	/*
- 	 * Matches memory barriers around rq->curr modification in
+ 	 * Matches memory barriers after rq->curr modification in
  	 * scheduler.
  	 */
  	smp_mb();	/* system call entry is not a mb. */
···

  	/*
  	 * Memory barrier on the caller thread _after_ we finished
- 	 * waiting for the last IPI. Matches memory barriers around
+ 	 * waiting for the last IPI. Matches memory barriers before
  	 * rq->curr modification in scheduler.
  	 */
  	smp_mb();	/* exit from system call is not a mb */
···
  		    MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
  			return -EPERM;
  		ipi_func = ipi_sync_core;
+ 		prepare_sync_core_cmd(mm);
  	} else if (flags == MEMBARRIER_FLAG_RSEQ) {
  		if (!IS_ENABLED(CONFIG_RSEQ))
  			return -EINVAL;
···
  		return 0;

  	/*
- 	 * Matches memory barriers around rq->curr modification in
+ 	 * Matches memory barriers after rq->curr modification in
  	 * scheduler.
+ 	 *
+ 	 * On RISC-V, this barrier pairing is also needed for the
+ 	 * SYNC_CORE command when switching between processes, cf.
+ 	 * the inline comments in membarrier_arch_switch_mm().
  	 */
  	smp_mb();	/* system call entry is not a mb. */
···

  	/*
  	 * Memory barrier on the caller thread _after_ we finished
- 	 * waiting for the last IPI. Matches memory barriers around
+ 	 * waiting for the last IPI. Matches memory barriers before
  	 * rq->curr modification in scheduler.
  	 */
  	smp_mb();	/* exit from system call is not a mb */