Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

powerpc/powernv: Fix race in updating core_idle_state

core_idle_state is maintained for each core. Bits 0-7 track whether a
thread in the core has entered fastsleep or winkle, and bit 8 is used as
a lock bit.

The lock bit is set in these two scenarios:
- A thread is the first in its subcore to wake up from sleep/winkle.
- A thread is the last in the core about to enter sleep/winkle.

While the lock bit is set, any other thread in the core that wakes up
loops until the lock bit is cleared before proceeding in the wakeup
path. This prevents races around the fastsleep workaround and keeps
threads from switching to process context before core/subcore resources
are restored.

But in the path to sleep/winkle entry, we currently don't check the
lock bit. This exposes us to the following race when running with
subcores on:

First thread in the subcore            Another thread in the same
waking up                              core entering sleep/winkle

lwarx   r15,0,r14
ori     r15,r15,PNV_CORE_IDLE_LOCK_BIT
stwcx.  r15,0,r14
[Code to restore subcore state]

                                       lwarx   r15,0,r14
                                       [clear thread bit]
                                       stwcx.  r15,0,r14

andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
stw     r15,0(r14)

Here, after the thread entering sleep clears its thread bit in
core_idle_state, the value is overwritten by the thread waking up.
When the core later enters fastsleep, the code mistakes an idle thread
for a running one. Because of this, the first thread waking up from
fastsleep, which is supposed to resync the timebase, skips doing so,
and we can end up with a core running on a stale timebase value.

This patch fixes the above race by looping on the lock bit even while
entering the idle states.

Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Fixes: 7b54e9f213f76 ("powernv/powerpc: Add winkle support for offline cpus")
Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Commit b32aadc1 (parent a8956a7b), authored by Shreyas B. Prabhu,
committed by Michael Ellerman.

+21 -10
arch/powerpc/kernel/idle_power7.S
@@ -52,6 +52,22 @@
 	.text
 
 /*
+ * Used by threads when the lock bit of core_idle_state is set.
+ * Threads will spin in HMT_LOW until the lock bit is cleared.
+ * r14 - pointer to core_idle_state
+ * r15 - used to load contents of core_idle_state
+ */
+
+core_idle_lock_held:
+	HMT_LOW
+3:	lwz	r15,0(r14)
+	andi.	r15,r15,PNV_CORE_IDLE_LOCK_BIT
+	bne	3b
+	HMT_MEDIUM
+	lwarx	r15,0,r14
+	blr
+
+/*
  * Pass requested state in r3:
  * r3 - PNV_THREAD_NAP/SLEEP/WINKLE
  *
@@ -150,6 +166,10 @@
 	ld	r14,PACA_CORE_IDLE_STATE_PTR(r13)
 lwarx_loop1:
 	lwarx	r15,0,r14
+
+	andi.	r9,r15,PNV_CORE_IDLE_LOCK_BIT
+	bnel	core_idle_lock_held
+
 	andc	r15,r15,r7			/* Clear thread bit */
 
 	andi.	r15,r15,PNV_CORE_IDLE_THREAD_BITS
@@ -294,7 +314,7 @@
 	 * workaround undo code or resyncing timebase or restoring context
 	 * In either case loop until the lock bit is cleared.
 	 */
-	bne	core_idle_lock_held
+	bnel	core_idle_lock_held
 
 	cmpwi	cr2,r15,0
 	lbz	r4,PACA_SUBCORE_SIBLING_MASK(r13)
@@ -318,15 +338,6 @@
 	bne-	lwarx_loop2
 	isync
 	b	common_exit
-
-core_idle_lock_held:
-	HMT_LOW
-core_idle_lock_loop:
-	lwz	r15,0(14)
-	andi.	r9,r15,PNV_CORE_IDLE_LOCK_BIT
-	bne	core_idle_lock_loop
-	HMT_MEDIUM
-	b	lwarx_loop2
 
 first_thread_in_subcore:
 	/* First thread in subcore to wakeup */