Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

arch, locking: Ciao arch_mutex_cpu_relax()

The arch_mutex_cpu_relax() function, introduced by 34b133f, is
hacky and ugly. It was added a few years ago to address the fact
that common cpu_relax() calls include yielding on s390, and thus
impact the optimistic spinning functionality of mutexes. Nowadays
we use this function well beyond mutexes: rwsem, qrwlock, mcs and
lockref. Since the macro that defines the call lives in the mutex header,
any users must include mutex.h, and the naming is misleading as well.

This patch (i) renames the call to cpu_relax_lowlatency ("relax, but
only if you can do it with very low latency") and (ii) defines it in
each arch's asm/processor.h local header, just like for regular cpu_relax
functions. On all archs except s390, cpu_relax_lowlatency() is simply cpu_relax(),
and thus we can take it out of mutex.h. While this can seem redundant,
I believe it is a good choice as it allows us to move out arch specific
logic from generic locking primitives and enables future(?) archs to
transparently define it, similarly to System Z.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharat Bhushan <r65777@freescale.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Stratos Karafotis <stratosk@semaphore.gr>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux390@de.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-cris-kernel@axis.com
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux@lists.openrisc.net
Cc: linux-m32r-ja@ml.linux-m32r.org
Cc: linux-m32r@ml.linux-m32r.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-metag@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Authored by Davidlohr Bueso, committed by Ingo Molnar
3a6bfbc9 acf59377

Total: +51 -25
arch/alpha/include/asm/processor.h (+1)
@@ -57,6 +57,7 @@
 ((tsk) == current ? rdusp() : task_thread_info(tsk)->pcb.usp)
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
 #define ARCH_HAS_PREFETCHW
arch/arc/include/asm/processor.h (+2)
@@ -62,6 +62,8 @@
 #define cpu_relax() do { } while (0)
 #endif
 
+#define cpu_relax_lowlatency() cpu_relax()
+
 #define copy_segments(tsk, mm) do { } while (0)
 #define release_segments(mm) do { } while (0)
 
arch/arm/include/asm/processor.h (+2)
@@ -82,6 +82,8 @@
 #define cpu_relax() barrier()
 #endif
 
+#define cpu_relax_lowlatency() cpu_relax()
+
 #define task_pt_regs(p) \
 ((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
 
arch/arm64/include/asm/processor.h (+1)
@@ -129,6 +129,7 @@
 unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 /* Thread switching */
 extern struct task_struct *cpu_switch_to(struct task_struct *prev,
arch/avr32/include/asm/processor.h (+1)
@@ -92,6 +92,7 @@
 #define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 #define cpu_sync_pipeline() asm volatile("sub pc, -2" : : : "memory")
 
 struct cpu_context {
arch/blackfin/include/asm/processor.h (+1 -1)
@@ -99,7 +99,7 @@
 #define KSTK_ESP(tsk) ((tsk) == current ? rdusp() : (tsk)->thread.usp)
 
 #define cpu_relax() smp_mb()
-
+#define cpu_relax_lowlatency() cpu_relax()
 
 /* Get the Silicon Revision of the chip */
 static inline uint32_t __pure bfin_revid(void)
arch/c6x/include/asm/processor.h (+1)
@@ -121,6 +121,7 @@
 #define KSTK_ESP(task) (task_pt_regs(task)->sp)
 
 #define cpu_relax() do { } while (0)
+#define cpu_relax_lowlatency() cpu_relax()
 
 extern const struct seq_operations cpuinfo_op;
 
arch/cris/include/asm/processor.h (+1)
@@ -63,6 +63,7 @@
 #define init_stack (init_thread_union.stack)
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 
arch/hexagon/include/asm/processor.h (+1)
@@ -56,6 +56,7 @@
 }
 
 #define cpu_relax() __vmyield()
+#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Decides where the kernel will search for a free chunk of vm space during
arch/ia64/include/asm/processor.h (+1)
@@ -548,6 +548,7 @@
 }
 
 #define cpu_relax() ia64_hint(ia64_hint_pause)
+#define cpu_relax_lowlatency() cpu_relax()
 
 static inline int
 ia64_get_irr(unsigned int vector)
arch/m32r/include/asm/processor.h (+1)
@@ -133,5 +133,6 @@
 #define KSTK_ESP(tsk) ((tsk)->thread.sp)
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* _ASM_M32R_PROCESSOR_H */
arch/m68k/include/asm/processor.h (+1)
@@ -176,5 +176,6 @@
 #define task_pt_regs(tsk) ((struct pt_regs *) ((tsk)->thread.esp0))
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 #endif
arch/metag/include/asm/processor.h (+1)
@@ -155,6 +155,7 @@
 #define user_stack_pointer(regs) ((regs)->ctx.AX[0].U0)
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 extern void setup_priv(void);
 
arch/microblaze/include/asm/processor.h (+1)
@@ -22,6 +22,7 @@
 extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax() barrier()
+# define cpu_relax_lowlatency() cpu_relax()
 
 #define task_pt_regs(tsk) \
 (((struct pt_regs *)(THREAD_SIZE + task_stack_page(tsk))) - 1)
arch/mips/include/asm/processor.h (+1)
@@ -367,6 +367,7 @@
 #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * Return_address is a replacement for __builtin_return_address(count)
arch/mn10300/include/asm/processor.h (+2)
@@ -68,7 +68,9 @@
 extern void identify_cpu(struct mn10300_cpuinfo *);
 extern void print_cpu_info(struct mn10300_cpuinfo *);
 extern void dodgy_tsc(void);
+
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 /*
  * User space process size: 1.75GB (default).
arch/openrisc/include/asm/processor.h (+1)
@@ -101,6 +101,7 @@
 #define init_stack (init_thread_union.stack)
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ASM_OPENRISC_PROCESSOR_H */
arch/parisc/include/asm/processor.h (+1)
@@ -338,6 +338,7 @@
 #define KSTK_ESP(tsk) ((tsk)->thread.regs.gr[30])
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 /* Used as a macro to identify the combined VIPT/PIPT cached
  * CPUs which require a guarantee of coherency (no inequivalent
arch/powerpc/include/asm/processor.h (+2)
@@ -400,6 +400,8 @@
 #define cpu_relax() barrier()
 #endif
 
+#define cpu_relax_lowlatency() cpu_relax()
+
 /* Check that a certain kernel stack pointer is valid in task_struct p */
 int validate_sp(unsigned long sp, struct task_struct *p,
                 unsigned long nbytes);
arch/s390/include/asm/processor.h (+1 -1)
@@ -217,7 +217,7 @@
         barrier();
 }
 
-#define arch_mutex_cpu_relax() barrier()
+#define cpu_relax_lowlatency() barrier()
 
 static inline void psw_set_key(unsigned int key)
 {
arch/score/include/asm/processor.h (+1)
@@ -24,6 +24,7 @@
 #define current_text_addr() ({ __label__ _l; _l: &&_l; })
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 #define release_thread(thread) do {} while (0)
 
 /*
arch/sh/include/asm/processor.h (+1)
@@ -97,6 +97,7 @@
 
 #define cpu_sleep() __asm__ __volatile__ ("sleep" : : : "memory")
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 void default_idle(void);
 void stop_this_cpu(void *);
arch/sparc/include/asm/processor_32.h (+2)
@@ -119,6 +119,8 @@
 int do_mathemu(struct pt_regs *regs, struct task_struct *fpt);
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
+
 extern void (*sparc_idle)(void);
 
 #endif
arch/sparc/include/asm/processor_64.h (+1)
@@ -216,6 +216,7 @@
         "nop\n\t" \
         ".previous" \
         ::: "memory")
+#define cpu_relax_lowlatency() cpu_relax()
 
 /* Prefetch support. This is tuned for UltraSPARC-III and later.
  * UltraSPARC-I will treat these as nops, and UltraSPARC-II has
arch/tile/include/asm/processor.h (+2)
@@ -266,6 +266,8 @@
         barrier();
 }
 
+#define cpu_relax_lowlatency() cpu_relax()
+
 /* Info on this processor (see fs/proc/cpuinfo.c) */
 struct seq_operations;
 extern const struct seq_operations cpuinfo_op;
arch/unicore32/include/asm/processor.h (+1)
@@ -71,6 +71,7 @@
 unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 #define task_pt_regs(p) \
 ((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
arch/x86/include/asm/processor.h (+2)
@@ -696,6 +696,8 @@
         rep_nop();
 }
 
+#define cpu_relax_lowlatency() cpu_relax()
+
 /* Stop speculative execution and prefetching of modified code. */
 static inline void sync_core(void)
 {
arch/x86/um/asm/processor.h (+2 -1)
@@ -25,7 +25,8 @@
         __asm__ __volatile__("rep;nop": : :"memory");
 }
 
-#define cpu_relax() rep_nop()
+#define cpu_relax()            rep_nop()
+#define cpu_relax_lowlatency() cpu_relax()
 
 #include <asm/processor-generic.h>
 
arch/xtensa/include/asm/processor.h (+1)
@@ -182,6 +182,7 @@
 #define KSTK_ESP(tsk) (task_pt_regs(tsk)->areg[1])
 
 #define cpu_relax() barrier()
+#define cpu_relax_lowlatency() cpu_relax()
 
 /* Special register access. */
 
include/linux/mutex.h (-4)
@@ -176,8 +176,4 @@
 
 extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
 
-#ifndef arch_mutex_cpu_relax
-# define arch_mutex_cpu_relax() cpu_relax()
-#endif
-
 #endif /* __LINUX_MUTEX_H */
kernel/locking/mcs_spinlock.c (+3 -5)
@@ -1,6 +1,4 @@
-
 #include <linux/percpu.h>
-#include <linux/mutex.h>
 #include <linux/sched.h>
 #include "mcs_spinlock.h"
 
@@ -77,7 +79,7 @@
                         break;
                 }
 
-                arch_mutex_cpu_relax();
+                cpu_relax_lowlatency();
         }
 
         return next;
@@ -118,7 +120,7 @@
                 if (need_resched())
                         goto unqueue;
 
-                arch_mutex_cpu_relax();
+                cpu_relax_lowlatency();
         }
         return true;
 
@@ -144,7 +146,7 @@
         if (smp_load_acquire(&node->locked))
                 return true;
 
-        arch_mutex_cpu_relax();
+        cpu_relax_lowlatency();
 
         /*
          * Or we race against a concurrent unqueue()'s step-B, in which
kernel/locking/mcs_spinlock.h (+2 -2)
@@ -27,7 +27,7 @@
 #define arch_mcs_spin_lock_contended(l) \
 do { \
         while (!(smp_load_acquire(l))) \
-                arch_mutex_cpu_relax(); \
+                cpu_relax_lowlatency(); \
 } while (0)
 #endif
 
@@ -104,7 +104,7 @@
                 return;
         /* Wait until the next pointer is set */
         while (!(next = ACCESS_ONCE(node->next)))
-                arch_mutex_cpu_relax();
+                cpu_relax_lowlatency();
 }
 
 /* Pass lock to next waiter. */
kernel/locking/mutex.c (+2 -2)
@@ -146,7 +146,7 @@
                 if (need_resched())
                         break;
 
-                arch_mutex_cpu_relax();
+                cpu_relax_lowlatency();
         }
         rcu_read_unlock();
 
@@ -464,7 +464,7 @@
                  * memory barriers as we'll eventually observe the right
                  * values at the cost of a few extra spins.
                  */
-                arch_mutex_cpu_relax();
+                cpu_relax_lowlatency();
         }
         osq_unlock(&lock->osq);
 slowpath:
kernel/locking/qrwlock.c (+4 -5)
@@ -20,7 +20,6 @@
 #include <linux/cpumask.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
-#include <linux/mutex.h>
 #include <asm/qrwlock.h>
 
 /**
@@ -34,7 +35,7 @@
 rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
 {
         while ((cnts & _QW_WMASK) == _QW_LOCKED) {
-                arch_mutex_cpu_relax();
+                cpu_relax_lowlatency();
                 cnts = smp_load_acquire((u32 *)&lock->cnts);
         }
 }
@@ -74,7 +75,7 @@
          * to make sure that the write lock isn't taken.
          */
         while (atomic_read(&lock->cnts) & _QW_WMASK)
-                arch_mutex_cpu_relax();
+                cpu_relax_lowlatency();
 
         cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
         rspin_until_writer_unlock(lock, cnts);
@@ -113,7 +114,7 @@
                                     cnts | _QW_WAITING) == cnts))
                         break;
 
-                arch_mutex_cpu_relax();
+                cpu_relax_lowlatency();
         }
 
         /* When no more readers, set the locked flag */
@@ -124,7 +125,7 @@
                                     _QW_LOCKED) == _QW_WAITING))
                         break;
 
-                arch_mutex_cpu_relax();
+                cpu_relax_lowlatency();
         }
 unlock:
         arch_spin_unlock(&lock->lock);
kernel/locking/rwsem-xadd.c (+2 -2)
@@ -329,7 +329,7 @@
                 if (need_resched())
                         break;
 
-                arch_mutex_cpu_relax();
+                cpu_relax_lowlatency();
         }
         rcu_read_unlock();
 
@@ -381,7 +381,7 @@
                  * memory barriers as we'll eventually observe the right
                  * values at the cost of a few extra spins.
                  */
-                arch_mutex_cpu_relax();
+                cpu_relax_lowlatency();
         }
         osq_unlock(&sem->osq);
 done:
lib/lockref.c (+1 -2)
@@ -1,6 +1,5 @@
 #include <linux/export.h>
 #include <linux/lockref.h>
-#include <linux/mutex.h>
 
 #if USE_CMPXCHG_LOCKREF
 
@@ -28,7 +29,7 @@
                 if (likely(old.lock_count == prev.lock_count)) { \
                         SUCCESS; \
                 } \
-                arch_mutex_cpu_relax(); \
+                cpu_relax_lowlatency(); \
         } \
 } while (0)
 