# Loop vectorization

Burst uses [loop vectorization](https://llvm.org/docs/Vectorizers.html#loop-vectorizer) to improve the performance of your code. It uses this technique to loop over multiple values at the same time, rather than looping over single values at a time, which speeds up the performance of your code. For example:

``` c#
[MethodImpl(MethodImplOptions.NoInlining)]
private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
{
    for (var i = 0; i < count; i++)
    {
        a[i] += b[i];
    }
}

public static unsafe void Foo(int count)
{
    var a = stackalloc int[count];
    var b = stackalloc int[count];

    Bar(a, b, count);
}
```

Burst converts the scalar loop in `Bar` into a vectorized loop. Then, instead of looping over a single value at a time, it generates code that loops over multiple values at the same time, which produces faster code.

This is the `x64` assembly Burst generates for `AVX2` for the loop in `Bar` above:

```x86asm
.LBB1_4:
        vmovdqu ymm0, ymmword ptr [rdx + 4*rax]
        vmovdqu ymm1, ymmword ptr [rdx + 4*rax + 32]
        vmovdqu ymm2, ymmword ptr [rdx + 4*rax + 64]
        vmovdqu ymm3, ymmword ptr [rdx + 4*rax + 96]
        vpaddd ymm0, ymm0, ymmword ptr [rcx + 4*rax]
        vpaddd ymm1, ymm1, ymmword ptr [rcx + 4*rax + 32]
        vpaddd ymm2, ymm2, ymmword ptr [rcx + 4*rax + 64]
        vpaddd ymm3, ymm3, ymmword ptr [rcx + 4*rax + 96]
        vmovdqu ymmword ptr [rcx + 4*rax], ymm0
        vmovdqu ymmword ptr [rcx + 4*rax + 32], ymm1
        vmovdqu ymmword ptr [rcx + 4*rax + 64], ymm2
        vmovdqu ymmword ptr [rcx + 4*rax + 96], ymm3
        add rax, 32
        cmp r8, rax
        jne .LBB1_4
```

Burst has unrolled and vectorized the loop into four `vpaddd` instructions, which calculate eight integer additions each, for a total of 32 integer additions per loop iteration.
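
To make the idea of looping over multiple values at the same time more concrete, the following is a hand-written sketch of the same addition using `System.Numerics.Vector<int>` from .NET. It isn't Burst code and it isn't what Burst emits: Burst performs the transformation automatically, and the generated assembly is additionally unrolled four times. The class and method names (`VectorizationSketch`, `AddScalar`, `AddVectorized`) are hypothetical, and the sketch assumes `b` is at least as long as `a`.

``` c#
using System;
using System.Numerics;

public static class VectorizationSketch
{
    // Scalar version: one integer addition per iteration, like the loop in Bar.
    public static void AddScalar(int[] a, int[] b)
    {
        for (var i = 0; i < a.Length; i++)
        {
            a[i] += b[i];
        }
    }

    // Hand-vectorized version: Vector<int>.Count additions per iteration.
    public static void AddVectorized(int[] a, int[] b)
    {
        // Number of int lanes per vector (for example, 8 with 256-bit vectors).
        var width = Vector<int>.Count;
        var i = 0;

        // Main loop: load `width` elements from each array, add them, store back.
        for (; i <= a.Length - width; i += width)
        {
            var sum = new Vector<int>(a, i) + new Vector<int>(b, i);
            sum.CopyTo(a, i);
        }

        // Remainder loop: handle leftover elements one at a time.
        for (; i < a.Length; i++)
        {
            a[i] += b[i];
        }
    }
}
```

Called on arrays with the same contents, both methods leave the same values in `a`. The vectorized version needs fewer loop iterations to get there.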

## Loop vectorization intrinsics

Burst includes experimental intrinsics to express loop vectorization assumptions: `Loop.ExpectVectorized` and `Loop.ExpectNotVectorized`. Burst then validates the loop vectorization at compile-time. This is useful when a change to your code might break auto-vectorization. For example, if you introduce a branch to the code:

``` c#
[MethodImpl(MethodImplOptions.NoInlining)]
private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
{
    for (var i = 0; i < count; i++)
    {
        if (a[i] > b[i])
        {
            break;
        }

        a[i] += b[i];
    }
}
```

This changes the assembly to the following:

```x86asm
.LBB1_3:
        mov r9d, dword ptr [rcx + 4*r10]
        mov eax, dword ptr [rdx + 4*r10]
        cmp r9d, eax
        jg .LBB1_4
        add eax, r9d
        mov dword ptr [rcx + 4*r10], eax
        inc r10
        cmp r8, r10
        jne .LBB1_3
```

This isn't ideal because the loop is scalar and performs only one integer addition per loop iteration. It can be difficult to spot this happening in your code, which is why the `Loop.ExpectVectorized` and `Loop.ExpectNotVectorized` intrinsics exist: they let you state your assumption next to the loop so that Burst checks it for you.

Because the intrinsics are experimental, you need to use the `UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS` preprocessor define to enable them.

The following example shows the original `Bar` example with the `Loop.ExpectVectorized` intrinsic:

``` c#
[MethodImpl(MethodImplOptions.NoInlining)]
private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
{
    for (var i = 0; i < count; i++)
    {
        Unity.Burst.CompilerServices.Loop.ExpectVectorized();

        a[i] += b[i];
    }
}
```

Burst then validates at compile-time whether the loop is vectorized. If the loop isn't vectorized, Burst emits a compiler error.
The following example produces an error:

``` c#
[MethodImpl(MethodImplOptions.NoInlining)]
private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
{
    for (var i = 0; i < count; i++)
    {
        Unity.Burst.CompilerServices.Loop.ExpectVectorized();

        if (a[i] > b[i])
        {
            break;
        }

        a[i] += b[i];
    }
}
```

Burst emits the following error at compile-time:

>LoopIntrinsics.cs(6,9): Burst error BC1321: The loop is not vectorized where it was expected that it is vectorized.

>[!IMPORTANT]
>These intrinsics don't work inside `if` statements. Burst doesn't prevent this from happening, so you won't see a compile-time error for this.
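
The counterpart intrinsic, `Loop.ExpectNotVectorized`, has no example above, so here is a minimal sketch of how it might be applied to the branching loop. It assumes that `UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS` is already defined for the project, that the method is called from Burst-compiled code in the same way as `Bar`, and that Burst reports an error if the loop turns out to be vectorized after all. The class and method names (`LoopIntrinsicsSketch`, `AddUntilGreater`) are hypothetical.

``` c#
// Sketch only: compiles inside a Unity project with the Burst package installed.
#if UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS
using System.Runtime.CompilerServices;
using Unity.Burst;
using Unity.Burst.CompilerServices;

public static unsafe class LoopIntrinsicsSketch
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void AddUntilGreater([NoAlias] int* a, [NoAlias] int* b, int count)
    {
        for (var i = 0; i < count; i++)
        {
            // State the assumption directly in the loop body,
            // not inside the `if` block below.
            Loop.ExpectNotVectorized();

            if (a[i] > b[i])
            {
                break; // The early exit is what keeps this loop scalar.
            }

            a[i] += b[i];
        }
    }
}
#endif
```

Placing the call as the first statement of the loop body keeps it outside any `if` statement, where the intrinsics don't work.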