Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

membarrier: Provide core serializing command, *_SYNC_CORE

Provide a core serializing membarrier command to support memory reclaim
by JITs.

Each architecture needs to explicitly opt into that support by
documenting in their architecture code how they provide the core
serializing instructions required when returning from the membarrier
IPI, and after the scheduler has updated the curr->mm pointer (before
going back to user-space). They should then select
ARCH_HAS_MEMBARRIER_SYNC_CORE to enable support for that command on
their architecture.

Architectures selecting this feature need to either document that
they issue core serializing instructions when returning to user-space,
or implement their architecture-specific sync_core_before_usermode().

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: David Sehr <sehr@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maged Michael <maged.michael@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-9-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Authored by Mathieu Desnoyers, committed by Ingo Molnar
70216e18 ac1ab12a

+106 -18
include/linux/sched/mm.h (+18)
···
 #include <linux/sched.h>
 #include <linux/mm_types.h>
 #include <linux/gfp.h>
+#include <linux/sync_core.h>

 /*
  * Routines for handling mm_structs
···
 	MEMBARRIER_STATE_PRIVATE_EXPEDITED = (1U << 1),
 	MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY = (1U << 2),
 	MEMBARRIER_STATE_GLOBAL_EXPEDITED = (1U << 3),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY = (1U << 4),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE = (1U << 5),
+};
+
+enum {
+	MEMBARRIER_FLAG_SYNC_CORE = (1U << 0),
 };

 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
 #include <asm/membarrier.h>
 #endif
+
+static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
+{
+	if (likely(!(atomic_read(&mm->membarrier_state) &
+			MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
+		return;
+	sync_core_before_usermode();
+}

 static inline void membarrier_execve(struct task_struct *t)
 {
···
 }
 #endif
 static inline void membarrier_execve(struct task_struct *t)
+{
+}
+static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 {
 }
 #endif
include/uapi/linux/membarrier.h (+31 -1)
···
  *                          to and return from the system call
  *                          (non-running threads are de facto in such a
  *                          state). This only covers threads from the
- *                          same processes as the caller thread. This
+ *                          same process as the caller thread. This
  *                          command returns 0 on success. The
  *                          "expedited" commands complete faster than
  *                          the non-expedited ones, they never block,
···
  *                          Register the process intent to use
  *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED. Always
  *                          returns 0.
+ * @MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
+ *                          In addition to provide memory ordering
+ *                          guarantees described in
+ *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED, ensure
+ *                          the caller thread, upon return from system
+ *                          call, that all its running threads siblings
+ *                          have executed a core serializing
+ *                          instruction. (architectures are required to
+ *                          guarantee that non-running threads issue
+ *                          core serializing instructions before they
+ *                          resume user-space execution). This only
+ *                          covers threads from the same process as the
+ *                          caller thread. This command returns 0 on
+ *                          success. The "expedited" commands complete
+ *                          faster than the non-expedited ones, they
+ *                          never block, but have the downside of
+ *                          causing extra overhead. If this command is
+ *                          not implemented by an architecture, -EINVAL
+ *                          is returned. A process needs to register its
+ *                          intent to use the private expedited sync
+ *                          core command prior to using it, otherwise
+ *                          this command returns -EPERM.
+ * @MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
+ *                          Register the process intent to use
+ *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.
+ *                          If this command is not implemented by an
+ *                          architecture, -EINVAL is returned.
+ *                          Returns 0 on success.
  * @MEMBARRIER_CMD_SHARED:
  *                          Alias to MEMBARRIER_CMD_GLOBAL. Provided for
  *                          header backward compatibility.
···
 	MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED = (1 << 2),
 	MEMBARRIER_CMD_PRIVATE_EXPEDITED = (1 << 3),
 	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED = (1 << 4),
+	MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE = (1 << 5),
+	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE = (1 << 6),

 	/* Alias for header backward compatibility. */
 	MEMBARRIER_CMD_SHARED = MEMBARRIER_CMD_GLOBAL,
init/Kconfig (+3)
···
 config ARCH_HAS_MEMBARRIER_CALLBACKS
 	bool

+config ARCH_HAS_MEMBARRIER_SYNC_CORE
+	bool
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
kernel/sched/core.c (+13 -5)
···
 	fire_sched_in_preempt_notifiers(current);
 	/*
-	 * When transitioning from a kernel thread to a userspace
-	 * thread, mmdrop()'s implicit full barrier is required by the
-	 * membarrier system call, because the current ->active_mm can
-	 * become the current mm without going through switch_mm().
+	 * When switching through a kernel thread, the loop in
+	 * membarrier_{private,global}_expedited() may have observed that
+	 * kernel thread and not issued an IPI. It is therefore possible to
+	 * schedule between user->kernel->user threads without passing through
+	 * switch_mm(). Membarrier requires a barrier after storing to
+	 * rq->curr, before returning to userspace, so provide them here:
+	 *
+	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
+	 *   provided by mmdrop(),
+	 * - a sync_core for SYNC_CORE.
 	 */
-	if (mm)
+	if (mm) {
+		membarrier_mm_sync_core_before_usermode(mm);
 		mmdrop(mm);
+	}
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
kernel/sched/membarrier.c (+41 -12)
···
  * Bitmask made from a "or" of all commands within enum membarrier_cmd,
  * except MEMBARRIER_CMD_QUERY.
  */
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
+#define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK		\
+	(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE		\
+	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE)
+#else
+#define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK	0
+#endif
+
 #define MEMBARRIER_CMD_BITMASK	\
 	(MEMBARRIER_CMD_GLOBAL | MEMBARRIER_CMD_GLOBAL_EXPEDITED \
 	| MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED \
 	| MEMBARRIER_CMD_PRIVATE_EXPEDITED \
-	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
+	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED \
+	| MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK)

 static void ipi_mb(void *info)
 {
···
 	return 0;
 }

-static int membarrier_private_expedited(void)
+static int membarrier_private_expedited(int flags)
 {
 	int cpu;
 	bool fallback = false;
 	cpumask_var_t tmpmask;

-	if (!(atomic_read(&current->mm->membarrier_state)
-	    & MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
-		return -EPERM;
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
+			return -EINVAL;
+		if (!(atomic_read(&current->mm->membarrier_state) &
+		    MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
+			return -EPERM;
+	} else {
+		if (!(atomic_read(&current->mm->membarrier_state) &
+		    MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
+			return -EPERM;
+	}

 	if (num_online_cpus() == 1)
 		return 0;
···
 	return 0;
 }

-static int membarrier_register_private_expedited(void)
+static int membarrier_register_private_expedited(int flags)
 {
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
+	int state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY;
+
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
+			return -EINVAL;
+		state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
+	}

 	/*
 	 * We need to consider threads belonging to different thread
 	 * groups, which use the same mm. (CLONE_VM but not
 	 * CLONE_THREAD).
 	 */
-	if (atomic_read(&mm->membarrier_state)
-	    & MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
+	if (atomic_read(&mm->membarrier_state) & state)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE)
+		atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE,
+			  &mm->membarrier_state);
 	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
 		/*
 		 * Ensure all future scheduler executions will observe the
···
 		 */
 		synchronize_sched();
 	}
-	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
-		  &mm->membarrier_state);
+	atomic_or(state, &mm->membarrier_state);
 	return 0;
 }
···
 	case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
 		return membarrier_register_global_expedited();
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
-		return membarrier_private_expedited();
+		return membarrier_private_expedited(0);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
-		return membarrier_register_private_expedited();
+		return membarrier_register_private_expedited(0);
+	case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
+		return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
+		return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
 	default:
 		return -EINVAL;
 	}