Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bpf: Improve the general precision of tnum_mul

Drop the value-mask decomposition technique and adopt straightforward
long-multiplication with a twist: when LSB(a) is uncertain, find the
two partial products (for LSB(a) = known 0 and LSB(a) = known 1) and
take a union.

Experiment shows that applying this technique in long multiplication
improves the precision in a significant number of cases (at the cost
of losing precision in a relatively lower number of cases).

Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Reviewed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250826034524.2159515-1-nandakumar@nandakumar.co.in
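
The precision claim in the commit message can be explored outside the kernel. What follows is a minimal userspace sketch, not the experiment the author ran. It ports the old and new tnum_mul bodies from this commit's diff, with uint64_t standing in for u64. The names tnum_mul_old, tnum_mul_new, tnum_subset, sound, enum_tnums, compare_mul and struct mul_stats are hypothetical and exist only for this illustration. The harness enumerates every well-formed pair of small tnums, checks that both results contain every concrete product, and tallies which result is more precise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct tnum: a bit is unknown when its
 * mask bit is set, otherwise it equals the corresponding value bit. */
struct tnum { uint64_t value, mask; };
#define TNUM(v, m) ((struct tnum){ (v), (m) })

static struct tnum tnum_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask, sv = a.value + b.value;
	uint64_t mu = ((sm + sv) ^ sv) | a.mask | b.mask;
	return TNUM(sv & ~mu, mu);
}

static struct tnum tnum_union(struct tnum a, struct tnum b)
{
	uint64_t mu = (a.value ^ b.value) | a.mask | b.mask;
	return TNUM(a.value & b.value & ~mu, mu);
}

/* Multiplication as removed by the commit (value-mask decomposition). */
static struct tnum tnum_mul_old(struct tnum a, struct tnum b)
{
	uint64_t acc_v = a.value * b.value;
	struct tnum acc_m = TNUM(0, 0);

	while (a.value || a.mask) {
		if (a.value & 1)
			acc_m = tnum_add(acc_m, TNUM(0, b.mask));
		else if (a.mask & 1)
			acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask));
		a = TNUM(a.value >> 1, a.mask >> 1);
		b = TNUM(b.value << 1, b.mask << 1);
	}
	return tnum_add(TNUM(acc_v, 0), acc_m);
}

/* Multiplication as added by the commit (long multiplication plus union). */
static struct tnum tnum_mul_new(struct tnum a, struct tnum b)
{
	struct tnum acc = TNUM(0, 0);

	while (a.value || a.mask) {
		if (a.value & 1)
			acc = tnum_add(acc, b);
		else if (a.mask & 1)
			acc = tnum_union(acc, tnum_add(acc, b));
		a = TNUM(a.value >> 1, a.mask >> 1);
		b = TNUM(b.value << 1, b.mask << 1);
	}
	return acc;
}

/* Nonzero when every concrete member of x is also a member of y. */
static int tnum_subset(struct tnum x, struct tnum y)
{
	return !(x.mask & ~y.mask) && (x.value & ~y.mask) == y.value;
}

/* Nonzero when r contains x * y for every concrete x in a and y in b. */
static int sound(struct tnum a, struct tnum b, struct tnum r, unsigned bits)
{
	for (uint64_t x = 0; x < (1ULL << bits); x++)
		for (uint64_t y = 0; y < (1ULL << bits); y++)
			if ((x & ~a.mask) == a.value && (y & ~b.mask) == b.value &&
			    ((x * y) & ~r.mask) != r.value)
				return 0;
	return 1;
}

/* Collect every well-formed tnum (value & mask == 0) of the given width. */
static size_t enum_tnums(unsigned bits, struct tnum *out)
{
	size_t n = 0;

	for (uint64_t v = 0; v < (1ULL << bits); v++)
		for (uint64_t m = 0; m < (1ULL << bits); m++)
			if (!(v & m))
				out[n++] = TNUM(v, m);
	return n;
}

struct mul_stats {
	uint64_t total, same, new_better, old_better, incomparable, unsound;
};

/* Compare both algorithms on all pairs of tnums that are `bits` wide. */
static struct mul_stats compare_mul(unsigned bits)
{
	struct tnum all[243];	/* 3^5 well-formed tnums at most */
	struct mul_stats s = {0};
	size_t n;

	assert(bits >= 1 && bits <= 5);	/* keep the enumeration small */
	n = enum_tnums(bits, all);
	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < n; j++) {
			struct tnum o = tnum_mul_old(all[i], all[j]);
			struct tnum w = tnum_mul_new(all[i], all[j]);
			int nw = tnum_subset(w, o), ow = tnum_subset(o, w);

			s.total++;
			s.unsound += !sound(all[i], all[j], o, bits) ||
				     !sound(all[i], all[j], w, bits);
			if (nw && ow)
				s.same++;
			else if (nw)
				s.new_better++;
			else if (ow)
				s.old_better++;
			else
				s.incomparable++;
		}
	}
	return s;
}
```

Calling compare_mul(4) from a small main and printing the counters gives a tally for 4-bit inputs. The counts are deliberately not quoted here, since they depend on the chosen width and say nothing certain about full 64-bit behaviour.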

Authored by Nandakumar Edamana and committed by Andrii Nakryiko
1df7dad4 2465bb83

+45 -13
+3
include/linux/tnum.h
···
 /* Return a tnum representing numbers satisfying both @a and @b */
 struct tnum tnum_intersect(struct tnum a, struct tnum b);

+/* Returns a tnum representing numbers satisfying either @a or @b */
+struct tnum tnum_union(struct tnum t1, struct tnum t2);
+
 /* Return @a with all but the lowest @size bytes cleared */
 struct tnum tnum_cast(struct tnum a, u8 size);

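
To make the "either @a or @b" contract concrete, here is a small userspace sketch of the union operation, assuming uint64_t in place of u64. The helpers tnum_make, tnum_contains and tnum_to_str are hypothetical and not part of the kernel API. The sketch shows that the result is a superset: the union of the constants 0011 and 1001 also admits 0001 and 1011.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct tnum. */
struct tnum { uint64_t value, mask; };

/* Hypothetical constructor that enforces the well-formedness invariant:
 * a bit cannot be both known-one and unknown. */
static struct tnum tnum_make(uint64_t value, uint64_t mask)
{
	assert((value & mask) == 0);
	return (struct tnum){ value, mask };
}

/* Same arithmetic as the tnum_union added by this commit. */
static struct tnum tnum_union(struct tnum a, struct tnum b)
{
	uint64_t v = a.value & b.value;
	uint64_t mu = (a.value ^ b.value) | a.mask | b.mask;

	return tnum_make(v & ~mu, mu);
}

/* True when the concrete number x is one of the values t can take. */
static bool tnum_contains(struct tnum t, uint64_t x)
{
	return (x & ~t.mask) == t.value;
}

/* Write the low `bits` bits of t into out as '0', '1' or 'x', most
 * significant bit first. out must hold bits + 1 characters. */
static void tnum_to_str(struct tnum t, unsigned bits, char *out)
{
	for (unsigned i = 0; i < bits; i++) {
		unsigned pos = bits - 1 - i;

		if ((t.mask >> pos) & 1)
			out[i] = 'x';
		else
			out[i] = ((t.value >> pos) & 1) ? '1' : '0';
	}
	out[bits] = '\0';
}
```

For tnum_union(tnum_make(0x3, 0), tnum_make(0x9, 0)) the result has value 0x1 and mask 0xA, which tnum_to_str renders as x0x1. Bits 1 and 3 are unknown because the two inputs disagree there.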
+42 -13
kernel/bpf/tnum.c
···
 	return TNUM(v & ~mu, mu);
 }

-/* Generate partial products by multiplying each bit in the multiplier (tnum a)
- * with the multiplicand (tnum b), and add the partial products after
- * appropriately bit-shifting them. Instead of directly performing tnum addition
- * on the generated partial products, equivalenty, decompose each partial
- * product into two tnums, consisting of the value-sum (acc_v) and the
- * mask-sum (acc_m) and then perform tnum addition on them. The following paper
- * explains the algorithm in more detail: https://arxiv.org/abs/2105.05398.
+/* Perform long multiplication, iterating through the bits in a using rshift:
+ * - if LSB(a) is a known 0, keep current accumulator
+ * - if LSB(a) is a known 1, add b to current accumulator
+ * - if LSB(a) is unknown, take a union of the above cases.
+ *
+ * For example:
+ *
+ *               acc_0:        acc_1:
+ *
+ *     11 *  ->      11 *  ->      11 *  -> union(0011, 1001) == x0x1
+ *     x1            01            11
+ * ------        ------        ------
+ *     11            11            11
+ *    xx            00            11
+ * ------        ------        ------
+ *   ????          0011          1001
  */
 struct tnum tnum_mul(struct tnum a, struct tnum b)
 {
-	u64 acc_v = a.value * b.value;
-	struct tnum acc_m = TNUM(0, 0);
+	struct tnum acc = TNUM(0, 0);

 	while (a.value || a.mask) {
 		/* LSB of tnum a is a certain 1 */
 		if (a.value & 1)
-			acc_m = tnum_add(acc_m, TNUM(0, b.mask));
+			acc = tnum_add(acc, b);
 		/* LSB of tnum a is uncertain */
-		else if (a.mask & 1)
-			acc_m = tnum_add(acc_m, TNUM(0, b.value | b.mask));
+		else if (a.mask & 1) {
+			/* acc = tnum_union(acc_0, acc_1), where acc_0 and
+			 * acc_1 are partial accumulators for cases
+			 * LSB(a) = certain 0 and LSB(a) = certain 1.
+			 * acc_0 = acc + 0 * b = acc.
+			 * acc_1 = acc + 1 * b = tnum_add(acc, b).
+			 */
+
+			acc = tnum_union(acc, tnum_add(acc, b));
+		}
 		/* Note: no case for LSB is certain 0 */
 		a = tnum_rshift(a, 1);
 		b = tnum_lshift(b, 1);
 	}
-	return tnum_add(TNUM(acc_v, 0), acc_m);
+	return acc;
 }

 bool tnum_overlap(struct tnum a, struct tnum b)
···

 	v = a.value | b.value;
 	mu = a.mask & b.mask;
+	return TNUM(v & ~mu, mu);
+}
+
+/* Returns a tnum with the uncertainty from both a and b, and in addition, new
+ * uncertainty at any position that a and b disagree. This represents a
+ * superset of the union of the concrete sets of both a and b. Despite the
+ * overapproximation, it is optimal.
+ */
+struct tnum tnum_union(struct tnum a, struct tnum b)
+{
+	u64 v = a.value & b.value;
+	u64 mu = (a.value ^ b.value) | a.mask | b.mask;
+
 	return TNUM(v & ~mu, mu);
 }

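
The worked example in the new tnum_mul comment (11 * x1) can be replayed step by step. The sketch below is a userspace port under stated assumptions: uint64_t replaces u64, the shifts are inlined instead of calling tnum_rshift and tnum_lshift, and tnum_mul_trace is a hypothetical tracing variant that does not exist in the kernel. It records the accumulator after each processed bit of a.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct tnum. */
struct tnum { uint64_t value, mask; };
#define TNUM(v, m) ((struct tnum){ (v), (m) })

static struct tnum tnum_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask, sv = a.value + b.value;
	uint64_t mu = ((sm + sv) ^ sv) | a.mask | b.mask;
	return TNUM(sv & ~mu, mu);
}

static struct tnum tnum_union(struct tnum a, struct tnum b)
{
	uint64_t mu = (a.value ^ b.value) | a.mask | b.mask;
	return TNUM(a.value & b.value & ~mu, mu);
}

/* Hypothetical tracing variant of the new tnum_mul. Stores the accumulator
 * after each processed bit of a into steps[] (at most cap entries), writes
 * the final product to *result, and returns the number of bits processed. */
static size_t tnum_mul_trace(struct tnum a, struct tnum b,
			     struct tnum *steps, size_t cap,
			     struct tnum *result)
{
	struct tnum acc = TNUM(0, 0);
	size_t n = 0;

	assert(steps != NULL && result != NULL);
	while (a.value || a.mask) {
		if (a.value & 1)		/* LSB(a) is a known 1 */
			acc = tnum_add(acc, b);
		else if (a.mask & 1)		/* LSB(a) is unknown */
			acc = tnum_union(acc, tnum_add(acc, b));
		/* LSB(a) is a known 0: accumulator is kept as is */
		if (n < cap)
			steps[n] = acc;
		n++;
		a = TNUM(a.value >> 1, a.mask >> 1);
		b = TNUM(b.value << 1, b.mask << 1);
	}
	*result = acc;
	return n;
}
```

With a = x1 (value 0x1, mask 0x2) and b = 11 (value 0x3, mask 0), the loop runs twice. After the known-1 bit the accumulator is the constant 0011. After the unknown bit it is union(0011, 1001), that is value 0x1 with mask 0xA, or x0x1.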