Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

__arch_xprod_64(): make __always_inline when optimizing for performance

Recent gcc versions no longer systematically inline __arch_xprod_64(),
and that has performance implications. Give the compiler the freedom to
decide only when optimizing for size.

Here are some timing numbers from lib/math/test_div64.c:

Using __always_inline:

```
test_div64: Starting 64bit/32bit division and modulo test
test_div64: Completed 64bit/32bit division and modulo test, 0.048285584s elapsed
```

Without __always_inline:

```
test_div64: Starting 64bit/32bit division and modulo test
test_div64: Completed 64bit/32bit division and modulo test, 0.053023584s elapsed
```

Forcing a constant base through the non-constant base code path:

```
test_div64: Starting 64bit/32bit division and modulo test
test_div64: Completed 64bit/32bit division and modulo test, 0.103263776s elapsed
```

It is worth noting that test_div64 already performs half of its tests
with non-constant divisors, so the actual impact is greater than those
numbers show. For what it is worth, the numbers were obtained under
QEMU, with gcc version 14.1.0.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Authored by Nicolas Pitre, committed by Arnd Bergmann
d533cb2d 06508533

+12 -2
+6 -1 arch/arm/include/asm/div64.h

```diff
@@ -52,7 +52,12 @@
 
 #else
 
-static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
+#ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
+static __always_inline
+#else
+static inline
+#endif
+uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
 {
 	unsigned long long res;
 	register unsigned int tmp asm("ip") = 0;
```
+6 -1 include/asm-generic/div64.h

```diff
@@ -134,7 +134,12 @@
  * Hoping for compile-time optimization of conditional code.
  * Architectures may provide their own optimized assembly implementation.
  */
-static inline uint64_t __arch_xprod_64(const uint64_t m, uint64_t n, bool bias)
+#ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
+static __always_inline
+#else
+static inline
+#endif
+uint64_t __arch_xprod_64(const uint64_t m, uint64_t n, bool bias)
 {
 	uint32_t m_lo = m;
 	uint32_t m_hi = m >> 32;
```