Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

lib/math/gcd: use static key to select implementation at runtime

Patch series "Optimize GCD performance on RISC-V by selecting
implementation at runtime", v3.

The current implementation of gcd() selects between the binary GCD and the
odd-even GCD algorithm at compile time, depending on whether
CONFIG_CPU_NO_EFFICIENT_FFS is set. On platforms like RISC-V, however,
this compile-time decision can be misleading: even when the compiler emits
ctz instructions based on the assumption that they are efficient (as is
the case when CONFIG_RISCV_ISA_ZBB is enabled), the actual hardware may
lack support for the Zbb extension. In such cases, ffs() falls back to a
software implementation at runtime, making the binary GCD algorithm
significantly slower than the odd-even variant.

To address this, we introduce a static key to allow runtime selection
between the binary and odd-even GCD implementations. On RISC-V, the
kernel now checks for Zbb support during boot. If Zbb is unavailable, the
static key is disabled so that gcd() consistently uses the more efficient
odd-even algorithm in that scenario. Additionally, to further reduce code
size, we select CONFIG_CPU_NO_EFFICIENT_FFS automatically when
CONFIG_RISCV_ISA_ZBB is not enabled, avoiding compilation of the unused
binary GCD implementation entirely on systems where it would never be
executed.

This series ensures that the most efficient GCD algorithm is used in
practice, based on hardware capabilities and kernel configuration, and
avoids compiling code that would never be executed.


This patch (of 3):

On platforms like RISC-V, the compiler may generate hardware FFS
instructions even if the underlying CPU does not actually support them.
Currently, the GCD implementation is chosen at compile time based on
CONFIG_CPU_NO_EFFICIENT_FFS, which can result in suboptimal behavior on
such systems.

Introduce a static key, efficient_ffs_key, to enable runtime selection
between the binary GCD (using ffs) and the odd-even GCD implementation.
This allows the kernel to default to the faster binary GCD when FFS is
efficient, while retaining the ability to fall back when needed.
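
The runtime selection can be modeled in user space roughly as follows. This is only a sketch under stated assumptions: a plain bool stands in for the static key (the kernel patches the branch in place via jump labels, which this does not reproduce), __builtin_ctzl (GCC/Clang) stands in for __ffs(), and the names efficient_ffs, gcd_select_impl, odd_even_gcd, and gcd_sketch are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for efficient_ffs_key; defaults to true like the static key. */
static bool efficient_ffs = true;

/* Hypothetical boot-time hook: disable the fast path if ctz is slow. */
static void gcd_select_impl(bool cpu_has_fast_ffs)
{
	efficient_ffs = cpu_has_fast_ffs;
}

/* Binary GCD; both arguments must be non-zero. */
static unsigned long binary_gcd(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;
	unsigned long t;

	b >>= __builtin_ctzl(b);
	if (b == 1)
		return r & -r;

	for (;;) {
		a >>= __builtin_ctzl(a);
		if (a == 1)
			return r & -r;
		if (a == b)
			return a << __builtin_ctzl(r);
		if (a < b) {
			t = a;
			a = b;
			b = t;
		}
		a -= b;
	}
}

/* Odd-even GCD, no ctz needed; both arguments must be non-zero. */
static unsigned long odd_even_gcd(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;
	unsigned long t;

	r &= -r;	/* isolate the lowest set bit of a | b */

	while (!(b & r))
		b >>= 1;
	if (b == r)
		return r;

	for (;;) {
		while (!(a & r))
			a >>= 1;
		if (a == r)
			return r;
		if (a == b)
			return a;
		if (a < b) {
			t = a;
			a = b;
			b = t;
		}
		a -= b;
		a >>= 1;
		if (a & r)
			a += b;
		a >>= 1;
	}
}

/* Zero check is shared; the flag then picks the implementation. */
unsigned long gcd_sketch(unsigned long a, unsigned long b)
{
	if (!a || !b)
		return a | b;
	if (efficient_ffs)
		return binary_gcd(a, b);
	return odd_even_gcd(a, b);
}
```

Calling gcd_select_impl(false) once, before any gcd_sketch() call, mirrors the idea of disabling the key during boot when the hardware lacks an efficient FFS instruction. Both paths return the same value, so callers never see the difference.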

Link: https://lkml.kernel.org/r/20250606134758.1308400-1-visitorckw@gmail.com
Link: https://lkml.kernel.org/r/20250606134758.1308400-2-visitorckw@gmail.com
Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Kuan-Wei Chiu and committed by Andrew Morton
b3d5fd6f 08eabe4b

+18 -12
+3
include/linux/gcd.h
···
 #define _GCD_H

 #include <linux/compiler.h>
+#include <linux/jump_label.h>
+
+DECLARE_STATIC_KEY_TRUE(efficient_ffs_key);

 unsigned long gcd(unsigned long a, unsigned long b) __attribute_const__;

+15 -12
lib/math/gcd.c
···
  * has decent hardware division.
  */

+DEFINE_STATIC_KEY_TRUE(efficient_ffs_key);
+
 #if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)

 /* If __ffs is available, the even/odd algorithm benchmarks slower. */

-/**
- * gcd - calculate and return the greatest common divisor of 2 unsigned longs
- * @a: first value
- * @b: second value
- */
-unsigned long gcd(unsigned long a, unsigned long b)
+static unsigned long binary_gcd(unsigned long a, unsigned long b)
 {
 	unsigned long r = a | b;
-
-	if (!a || !b)
-		return r;

 	b >>= __ffs(b);
 	if (b == 1)
···
 	}
 }

-#else
+#endif

 /* If normalization is done by loops, the even/odd algorithm is a win. */
+
+/**
+ * gcd - calculate and return the greatest common divisor of 2 unsigned longs
+ * @a: first value
+ * @b: second value
+ */
 unsigned long gcd(unsigned long a, unsigned long b)
 {
 	unsigned long r = a | b;

 	if (!a || !b)
 		return r;
+
+#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
+	if (static_branch_likely(&efficient_ffs_key))
+		return binary_gcd(a, b);
+#endif

 	/* Isolate lsbit of r */
 	r &= -r;
···
 		a >>= 1;
 	}
 }
-
-#endif

 EXPORT_SYMBOL_GPL(gcd);