Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

locking/qspinlock/x86: Micro-optimize virt_spin_lock()

Optimize virt_spin_lock() to use the simpler and faster:

atomic_try_cmpxchg(*ptr, &val, new)

instead of:

atomic_cmpxchg(*ptr, val, new) == val

The x86 CMPXCHG instruction returns success in the ZF flag, so
this change saves a compare after the CMPXCHG.

Also optimize the retry loop a bit. atomic_try_cmpxchg() fails iff
lock->val != 0, so there is no need to load and compare the
lock value again - cpu_relax() can be called unconditionally in
this case. This allows the compiler to generate the optimized:

1f: ba 01 00 00 00 mov $0x1,%edx
24: 8b 03 mov (%rbx),%eax
26: 85 c0 test %eax,%eax
28: 75 63 jne 8d <...>
2a: f0 0f b1 13 lock cmpxchg %edx,(%rbx)
2e: 75 5d jne 8d <...>
...
8d: f3 90 pause
8f: eb 93 jmp 24 <...>

instead of:

1f: ba 01 00 00 00 mov $0x1,%edx
24: 8b 03 mov (%rbx),%eax
26: 85 c0 test %eax,%eax
28: 75 13 jne 3d <...>
2a: f0 0f b1 13 lock cmpxchg %edx,(%rbx)
2e: 85 c0 test %eax,%eax
30: 75 f2 jne 24 <...>
...
3d: f3 90 pause
3f: eb e3 jmp 24 <...>

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240422120054.199092-1-ubizjak@gmail.com

Authored by Uros Bizjak, committed by Ingo Molnar
94af3a04 33eb8ab4

+9 -4
arch/x86/include/asm/qspinlock.h
 #define virt_spin_lock virt_spin_lock
 static inline bool virt_spin_lock(struct qspinlock *lock)
 {
+	int val;
+
 	if (!static_branch_likely(&virt_spin_lock_key))
 		return false;
···
 	 * horrible lock 'holder' preemption issues.
 	 */

-	do {
-		while (atomic_read(&lock->val) != 0)
-			cpu_relax();
-	} while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
+__retry:
+	val = atomic_read(&lock->val);
+
+	if (val || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
+		cpu_relax();
+		goto __retry;
+	}

 	return true;
 }