Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

at v2.6.30-rc7 68 lines 2.2 kB
.align 2
.global ___muldi3;
.type ___muldi3, STT_FUNC;

#ifdef CONFIG_ARITHMETIC_OPS_L1
.section .l1.text
#else
.text
#endif

/*
           R1:R0 * R3:R2
         = R1.h:R1.l:R0.h:R0.l * R3.h:R3.l:R2.h:R2.l
[X]      = (R1.h * R3.h) * 2^96
[X]        + (R1.h * R3.l + R1.l * R3.h) * 2^80
[X]        + (R1.h * R2.h + R1.l * R3.l + R3.h * R0.h) * 2^64
[T1]       + (R1.h * R2.l + R3.h * R0.l + R1.l * R2.h + R3.l * R0.h) * 2^48
[T2]       + (R1.l * R2.l + R3.l * R0.l + R0.h * R2.h) * 2^32
[T3]       + (R0.l * R2.h + R2.l * R0.h) * 2^16
[T4]       + (R0.l * R2.l)

        We can discard the first three lines marked "X" since we produce
        only a 64 bit result.  So, we need ten 16-bit multiplies.

        Individual mul-acc results:
[E1]     = R1.h * R2.l + R3.h * R0.l + R1.l * R2.h + R3.l * R0.h
[E2]     = R1.l * R2.l + R3.l * R0.l + R0.h * R2.h
[E3]     = R0.l * R2.h + R2.l * R0.h
[E4]     = R0.l * R2.l

        We also need to add high parts from lower-level results to higher ones:
        E[n]c = E[n] + (E[n+1]c >> 16), where E4c := E4

        One interesting property is that all parts of the result that depend
        on the sign of the multiplication are discarded.  Those would be the
        multiplications involving R1.h and R3.h, but only the top 16 bit of
        the 32 bit result depend on the sign, and since R1.h and R3.h only
        occur in E1, the top half of these results is cut off.
        So, we can just use FU mode for all of the 16-bit multiplies, and
        ignore questions of when to use mixed mode.  */

___muldi3:
        /* [SP] technically is part of the caller's frame, but we can
           use it as scratch space.  */
        A0 = R2.H * R1.L, A1 = R2.L * R1.H (FU) || R3 = [SP + 12];      /* E1 */
        A0 += R3.H * R0.L, A1 += R3.L * R0.H (FU) || [SP] = R4;         /* E1 */
        A0 += A1;                                                       /* E1 */
        R4 = A0.w;
        A0 = R0.l * R3.l (FU);                                          /* E2 */
        A0 += R2.l * R1.l (FU);                                         /* E2 */

        A1 = R2.L * R0.L (FU);                                          /* E4 */
        R3 = A1.w;
        A1 = A1 >> 16;                                                  /* E3c */
        A0 += R2.H * R0.H, A1 += R2.L * R0.H (FU);                      /* E2, E3c */
        A1 += R0.L * R2.H (FU);                                         /* E3c */
        R0 = A1.w;
        A1 = A1 >> 16;                                                  /* E2c */
        A0 += A1;                                                       /* E2c */
        R1 = A0.w;

        /* low(result) = low(E3c):low(E4) */
        R0 = PACK (R0.l, R3.l);
        /* high(result) = E2c + (E1 << 16) */
        R1.h = R1.h + R4.l (NS) || R4 = [SP];
        RTS;

.size ___muldi3, .-___muldi3
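
The decomposition described in the header comment can be cross-checked in portable C. The following is a minimal sketch, not part of the kernel source: the function name `muldi3_sketch` and the local variable names are hypothetical, and `uint64_t` temporaries stand in for the 40-bit accumulators A0 and A1. It forms the ten unsigned 16-bit partial products E1..E4, propagates the carries as E[n]c = E[n] + (E[n+1]c >> 16), and assembles the 64-bit result the same way the final `PACK` and add do.

```c
#include <stdint.h>

/* Hypothetical reference model of the E1..E4 scheme; a = R1:R0, b = R3:R2. */
static uint64_t muldi3_sketch(uint64_t a, uint64_t b)
{
    /* Split both operands into 16-bit halves, named after the registers. */
    uint64_t r0l = a & 0xffff,         r0h = (a >> 16) & 0xffff;
    uint64_t r1l = (a >> 32) & 0xffff, r1h = (a >> 48) & 0xffff;
    uint64_t r2l = b & 0xffff,         r2h = (b >> 16) & 0xffff;
    uint64_t r3l = (b >> 32) & 0xffff, r3h = (b >> 48) & 0xffff;

    /* Ten unsigned 16x16 multiplies, grouped by weight (2^48 .. 2^0). */
    uint64_t e1 = r1h * r2l + r3h * r0l + r1l * r2h + r3l * r0h;
    uint64_t e2 = r1l * r2l + r3l * r0l + r0h * r2h;
    uint64_t e3 = r0l * r2h + r2l * r0h;
    uint64_t e4 = r0l * r2l;

    /* Carry the high part of each lower-weight sum into the next one. */
    uint64_t e3c = e3 + (e4 >> 16);
    uint64_t e2c = e2 + (e3c >> 16);

    /* low(result) = low(E3c):low(E4); high(result) = E2c + (E1 << 16). */
    uint32_t lo = (uint32_t)(((e3c & 0xffff) << 16) | (e4 & 0xffff));
    uint32_t hi = (uint32_t)(e2c + (e1 << 16));

    return ((uint64_t)hi << 32) | lo;
}
```

Because every partial product is treated as unsigned and the result is truncated to 64 bits, the sketch agrees with two's-complement signed multiplication as well. For example, passing `(uint64_t)-3` and `7` yields `(uint64_t)-21`.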