
MIPS: Loongson: Introduce and use loongson_llsc_mb()

On the Loongson-2G/2H/3A/3B there is a hardware flaw: ll/sc and
lld/scd are very weakly ordered. We should add sync instructions "before
each ll/lld" and "at the branch-target between ll/sc" to work around it.
Otherwise, this flaw will occasionally cause deadlock (e.g. when running
heavy load tests with LTP).

Below is the explanation from the CPU designer:

"For the Loongson 3 family, when a memory access instruction (load, store,
or prefetch) executes between the execution of LL and SC, the
success or failure of SC is not predictable. Although a programmer would
not insert memory access instructions between LL and SC, the memory
instructions before LL in program order may be dynamically executed
between the LL and SC, so a memory fence (SYNC) is needed
before LL/LLD to avoid this situation.

Since Loongson-3A R2 (3A2000), we have improved our hardware design to
handle this case. But we later deduced a rare circumstance in which some
memory instructions speculatively executed due to branch misprediction
between LL/SC still fall into the above case, so a memory fence (SYNC)
at the branch target (if its target is not between LL/SC) is needed for
Loongson 3A1000, 3B1500, 3A2000 and 3A3000.

Our processor is continually evolving and we aim to remove all these
workaround-SYNCs around LL/SC for upcoming processors."

Here is an example:

When cpu1 and cpu2 simultaneously run atomic_add by 1 on the same atomic
variable, this bug sometimes causes the 'sc' run by both cpus (in
atomic_add) to succeed at the same time ('sc' returns 1), yet the
variable is only *added by 1*, which is wrong and unacceptable (it
should be added by 2).

Why disable fix-loongson3-llsc in the compiler?
Because the compiler fix would cause problems in the kernel's __ex_table section.

This patch fixes all the cases in the kernel, but:

+. the fix at the end of futex_atomic_cmpxchg_inatomic is for the branch
target of 'bne'; in the other cases, such as atomic_sub_if_positive/
cmpxchg/xchg, smp_mb__before_llsc() and smp_llsc_mb() happen to cover
both the ll and the branch target already.

+. Loongson 3 does not support CONFIG_EDAC_ATOMIC_SCRUB, so there is no
need to touch edac.h

+. local_ops and cmpxchg_local should not be affected by this bug since
only the owner CPU can write.

+. mips_atomic_set in syscall.c is deprecated and rarely used, so just
let it go.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Huang Pei <huangpei@loongson.cn>
[paul.burton@mips.com:
- Simplify the addition of -mno-fix-loongson3-llsc to cflags, and add
a comment describing why it's there.
- Make loongson_llsc_mb() a no-op when
CONFIG_CPU_LOONGSON3_WORKAROUNDS=n, rather than a compiler memory
barrier.
- Add a comment describing the bug & how loongson_llsc_mb() helps
in asm/barrier.h.]
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: ambrosehua@gmail.com
Cc: Steven J . Hill <Steven.Hill@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Cc: Li Xuefeng <lixuefeng@loongson.cn>
Cc: Xu Chenghua <xuchenghua@loongson.cn>

Authored by Huacai Chen, committed by Paul Burton
e02e07e3 67fc5dc8

8 files changed, 100 insertions(+)

arch/mips/Kconfig (+15)
@@ -1403,6 +1403,21 @@
 	  please say 'N' here. If you want a high-performance kernel to run on
 	  new Loongson 3 machines only, please say 'Y' here.
 
+config CPU_LOONGSON3_WORKAROUNDS
+	bool "Old Loongson 3 LLSC Workarounds"
+	default y if SMP
+	depends on CPU_LOONGSON3
+	help
+	  Loongson 3 processors have the llsc issues which require workarounds.
+	  Without workarounds the system may hang unexpectedly.
+
+	  Newer Loongson 3 will fix these issues and no workarounds are needed.
+	  The workarounds have no significant side effect on them but may
+	  decrease the performance of the system so this option should be
+	  disabled unless the kernel is intended to be run on old systems.
+
+	  If unsure, please say Y.
+
 config CPU_LOONGSON2E
 	bool "Loongson 2E"
 	depends on SYS_HAS_CPU_LOONGSON2E
arch/mips/include/asm/atomic.h (+6)
@@ -58,6 +58,7 @@
 	if (kernel_uses_llsc) {						\
 		int temp;						\
									\
+		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	"MIPS_ISA_LEVEL"		\n"	\
@@ -85,6 +86,7 @@
 	if (kernel_uses_llsc) {						\
 		int temp;						\
									\
+		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	"MIPS_ISA_LEVEL"		\n"	\
@@ -118,6 +120,7 @@
 	if (kernel_uses_llsc) {						\
 		int temp;						\
									\
+		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	"MIPS_ISA_LEVEL"		\n"	\
@@ -256,6 +259,7 @@
 	if (kernel_uses_llsc) {						\
 		long temp;						\
									\
+		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	"MIPS_ISA_LEVEL"		\n"	\
@@ -283,6 +287,7 @@
 	if (kernel_uses_llsc) {						\
 		long temp;						\
									\
+		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	"MIPS_ISA_LEVEL"		\n"	\
@@ -316,6 +321,7 @@
 	if (kernel_uses_llsc) {						\
 		long temp;						\
									\
+		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	"MIPS_ISA_LEVEL"		\n"	\
arch/mips/include/asm/barrier.h (+36)
@@ -222,6 +222,42 @@
 #define __smp_mb__before_atomic()	__smp_mb__before_llsc()
 #define __smp_mb__after_atomic()	smp_llsc_mb()
 
+/*
+ * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
+ * store or pref) in between an ll & sc can cause the sc instruction to
+ * erroneously succeed, breaking atomicity. Whilst it's unusual to write code
+ * containing such sequences, this bug bites harder than we might otherwise
+ * expect due to reordering & speculation:
+ *
+ * 1) A memory access appearing prior to the ll in program order may actually
+ *    be executed after the ll - this is the reordering case.
+ *
+ *    In order to avoid this we need to place a memory barrier (ie. a sync
+ *    instruction) prior to every ll instruction, in between it & any earlier
+ *    memory access instructions. Many of these cases are already covered by
+ *    smp_mb__before_llsc() but for the remaining cases, typically ones in
+ *    which multiple CPUs may operate on a memory location but ordering is not
+ *    usually guaranteed, we use loongson_llsc_mb() below.
+ *
+ *    This reordering case is fixed by 3A R2 CPUs, ie. 3A2000 models and later.
+ *
+ * 2) If a conditional branch exists between an ll & sc with a target outside
+ *    of the ll-sc loop, for example an exit upon value mismatch in cmpxchg()
+ *    or similar, then misprediction of the branch may allow speculative
+ *    execution of memory accesses from outside of the ll-sc loop.
+ *
+ *    In order to avoid this we need a memory barrier (ie. a sync instruction)
+ *    at each affected branch target, for which we also use loongson_llsc_mb()
+ *    defined below.
+ *
+ *    This case affects all current Loongson 3 CPUs.
+ */
+#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS /* Loongson-3's LLSC workaround */
+#define loongson_llsc_mb()	__asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
+#else
+#define loongson_llsc_mb()	do { } while (0)
+#endif
+
 #include <asm-generic/barrier.h>
 
 #endif /* __ASM_BARRIER_H */
arch/mips/include/asm/bitops.h (+5)
@@ -69,6 +69,7 @@
 	: "ir" (1UL << bit), GCC_OFF_SMALL_ASM() (*m));
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
 	} else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	" __LL "%0, %1		# set_bit	\n"
@@ -79,6 +80,7 @@
 		} while (unlikely(!temp));
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 	} else if (kernel_uses_llsc) {
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	.set	push				\n"
@@ -123,6 +125,7 @@
 	: "ir" (~(1UL << bit)));
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
 	} else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	" __LL "%0, %1		# clear_bit	\n"
@@ -133,6 +136,7 @@
 		} while (unlikely(!temp));
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 	} else if (kernel_uses_llsc) {
+		loongson_llsc_mb();
 		do {
 			__asm__ __volatile__(
 			"	.set	push				\n"
@@ -193,6 +197,7 @@
 	unsigned long *m = ((unsigned long *) addr) + (nr >> SZLONG_LOG);
 	unsigned long temp;
 
+	loongson_llsc_mb();
 	do {
 		__asm__ __volatile__(
 		"	.set	push				\n"
arch/mips/include/asm/futex.h (+3)
@@ -50,6 +50,7 @@
 		"i" (-EFAULT)						\
 		: "memory");						\
 	} else if (cpu_has_llsc) {					\
+		loongson_llsc_mb();					\
 		__asm__ __volatile__(					\
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
@@ -163,6 +164,7 @@
 		"i" (-EFAULT)
 		: "memory");
 	} else if (cpu_has_llsc) {
+		loongson_llsc_mb();
 		__asm__ __volatile__(
 		"# futex_atomic_cmpxchg_inatomic			\n"
 		"	.set	push					\n"
@@ -192,6 +195,7 @@
 		: GCC_OFF_SMALL_ASM() (*uaddr), "Jr" (oldval), "Jr" (newval),
 		  "i" (-EFAULT)
 		: "memory");
+		loongson_llsc_mb();
 	} else
 		return -ENOSYS;
 
arch/mips/include/asm/pgtable.h (+2)
@@ -228,6 +228,7 @@
 		: [buddy] "+m" (buddy->pte), [tmp] "=&r" (tmp)
 		: [global] "r" (page_global));
 	} else if (kernel_uses_llsc) {
+		loongson_llsc_mb();
 		__asm__ __volatile__ (
 		"	.set	push					\n"
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"
@@ -242,6 +244,7 @@
 		"	.set	pop					\n"
 		: [buddy] "+m" (buddy->pte), [tmp] "=&r" (tmp)
 		: [global] "r" (page_global));
+		loongson_llsc_mb();
 	}
 #else /* !CONFIG_SMP */
 	if (pte_none(*buddy))
arch/mips/loongson64/Platform (+23)
@@ -23,6 +23,29 @@
 endif
 
 cflags-$(CONFIG_CPU_LOONGSON3)	+= -Wa,--trap
+
+#
+# Some versions of binutils, not currently mainline as of 2019/02/04, support
+# an -mfix-loongson3-llsc flag which emits a sync prior to each ll instruction
+# to work around a CPU bug (see loongson_llsc_mb() in asm/barrier.h for a
+# description).
+#
+# We disable this in order to prevent the assembler meddling with the
+# instruction that labels refer to, ie. if we label an ll instruction:
+#
+# 1: ll v0, 0(a0)
+#
+# ...then with the assembler fix applied the label may actually point at a sync
+# instruction inserted by the assembler, and if we were using the label in an
+# exception table the table would no longer contain the address of the ll
+# instruction.
+#
+# Avoid this by explicitly disabling that assembler behaviour. If upstream
+# binutils does not merge support for the flag then we can revisit & remove
+# this later - for now it ensures vendor toolchains don't cause problems.
+#
+cflags-$(CONFIG_CPU_LOONGSON3)	+= $(call as-option,-Wa$(comma)-mno-fix-loongson3-llsc,)
+
 #
 # binutils from v2.25 on and gcc starting from v4.9.0 treat -march=loongson3a
 # as MIPS64 R2; older versions as just R1. This leaves the possibility open
arch/mips/mm/tlbex.c (+10)
@@ -932,6 +932,8 @@
 		 * to mimic that here by taking a load/istream page
 		 * fault.
 		 */
+		if (IS_ENABLED(CONFIG_CPU_LOONGSON3_WORKAROUNDS))
+			uasm_i_sync(p, 0);
 		UASM_i_LA(p, ptr, (unsigned long)tlb_do_page_fault_0);
 		uasm_i_jr(p, ptr);
 
@@ -1646,6 +1648,8 @@
 iPTE_LW(u32 **p, unsigned int pte, unsigned int ptr)
 {
 #ifdef CONFIG_SMP
+	if (IS_ENABLED(CONFIG_CPU_LOONGSON3_WORKAROUNDS))
+		uasm_i_sync(p, 0);
 # ifdef CONFIG_PHYS_ADDR_T_64BIT
 	if (cpu_has_64bits)
 		uasm_i_lld(p, pte, 0, ptr);
@@ -2259,6 +2263,8 @@
 #endif
 
 	uasm_l_nopage_tlbl(&l, p);
+	if (IS_ENABLED(CONFIG_CPU_LOONGSON3_WORKAROUNDS))
+		uasm_i_sync(&p, 0);
 	build_restore_work_registers(&p);
 #ifdef CONFIG_CPU_MICROMIPS
 	if ((unsigned long)tlb_do_page_fault_0 & 1) {
@@ -2313,6 +2319,8 @@
 #endif
 
 	uasm_l_nopage_tlbs(&l, p);
+	if (IS_ENABLED(CONFIG_CPU_LOONGSON3_WORKAROUNDS))
+		uasm_i_sync(&p, 0);
 	build_restore_work_registers(&p);
 #ifdef CONFIG_CPU_MICROMIPS
 	if ((unsigned long)tlb_do_page_fault_1 & 1) {
@@ -2368,6 +2376,8 @@
 #endif
 
 	uasm_l_nopage_tlbm(&l, p);
+	if (IS_ENABLED(CONFIG_CPU_LOONGSON3_WORKAROUNDS))
+		uasm_i_sync(&p, 0);
 	build_restore_work_registers(&p);
 #ifdef CONFIG_CPU_MICROMIPS
 	if ((unsigned long)tlb_do_page_fault_1 & 1) {