# Loop vectorization

Burst uses [loop vectorization](https://llvm.org/docs/Vectorizers.html#loop-vectorizer) to improve the performance of your code. With this technique, a loop processes multiple values at the same time instead of one value at a time. For example:

``` c#
[MethodImpl(MethodImplOptions.NoInlining)]
private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
{
    for (var i = 0; i < count; i++)
    {
        a[i] += b[i];
    }
}

public static unsafe void Foo(int count)
{
    var a = stackalloc int[count];
    var b = stackalloc int[count];

    Bar(a, b, count);
}
```

Burst converts the scalar loop in `Bar` into a vectorized loop, so the generated code processes multiple values per iteration instead of one.

This is the `x64` assembly Burst generates for `AVX2` for the loop in `Bar` above:

```x86asm
.LBB1_4:
    vmovdqu    ymm0, ymmword ptr [rdx + 4*rax]
    vmovdqu    ymm1, ymmword ptr [rdx + 4*rax + 32]
    vmovdqu    ymm2, ymmword ptr [rdx + 4*rax + 64]
    vmovdqu    ymm3, ymmword ptr [rdx + 4*rax + 96]
    vpaddd     ymm0, ymm0, ymmword ptr [rcx + 4*rax]
    vpaddd     ymm1, ymm1, ymmword ptr [rcx + 4*rax + 32]
    vpaddd     ymm2, ymm2, ymmword ptr [rcx + 4*rax + 64]
    vpaddd     ymm3, ymm3, ymmword ptr [rcx + 4*rax + 96]
    vmovdqu    ymmword ptr [rcx + 4*rax], ymm0
    vmovdqu    ymmword ptr [rcx + 4*rax + 32], ymm1
    vmovdqu    ymmword ptr [rcx + 4*rax + 64], ymm2
    vmovdqu    ymmword ptr [rcx + 4*rax + 96], ymm3
    add        rax, 32
    cmp        r8, rax
    jne        .LBB1_4
```

Burst has unrolled and vectorized the loop into four `vpaddd` instructions, each of which performs eight integer additions, for a total of 32 integer additions per loop iteration.

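To make that arithmetic concrete, the following plain C# sketch mimics the shape of the loop by hand: a 256-bit register holds eight 32-bit integers, and four registers per iteration cover 32 elements. This is an illustration only, not what Burst emits. `VectorizedShapeSketch`, `AddInPlace`, and the constants are hypothetical names, and the scalar remainder loop is an assumption about how leftover elements get handled, because the listing above shows only the main loop.

``` c#
using System;

public static class VectorizedShapeSketch
{
    // Assumption: 256-bit AVX2 registers and 32-bit ints, as in the listing above.
    public const int LanesPerRegister = 256 / 32;                            // 8 ints per ymm register
    public const int UnrollFactor = 4;                                       // four vpaddd per iteration
    public const int ElementsPerIteration = LanesPerRegister * UnrollFactor; // 32

    // Scalar emulation of the vectorized loop: same result as a[i] += b[i] for i in [0, count).
    public static void AddInPlace(int[] a, int[] b, int count)
    {
        var i = 0;

        // Main loop: each pass covers 32 elements, like one pass through .LBB1_4.
        for (; i + ElementsPerIteration <= count; i += ElementsPerIteration)
        {
            for (var lane = 0; lane < ElementsPerIteration; lane++)
            {
                a[i + lane] += b[i + lane];
            }
        }

        // Remainder loop (assumed): handles the final count % 32 elements one at a time.
        for (; i < count; i++)
        {
            a[i] += b[i];
        }
    }
}
```

For `count = 70`, the main loop in this sketch runs twice (64 elements) and the remainder loop handles the last six.
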
## Loop vectorization intrinsics

Burst includes experimental intrinsics that express loop vectorization assumptions: `Loop.ExpectVectorized` and `Loop.ExpectNotVectorized`. Burst validates these assumptions at compile time, which is useful when a change might silently break auto-vectorization. For example, if you introduce a branch into the loop:

``` c#
[MethodImpl(MethodImplOptions.NoInlining)]
private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
{
    for (var i = 0; i < count; i++)
    {
        if (a[i] > b[i])
        {
            break;
        }

        a[i] += b[i];
    }
}
```

This changes the assembly to the following:

```x86asm
.LBB1_3:
    mov        r9d, dword ptr [rcx + 4*r10]
    mov        eax, dword ptr [rdx + 4*r10]
    cmp        r9d, eax
    jg         .LBB1_4
    add        eax, r9d
    mov        dword ptr [rcx + 4*r10], eax
    inc        r10
    cmp        r8, r10
    jne        .LBB1_3
```

This isn't ideal because the loop is now scalar and performs only one integer addition per iteration. It can be difficult to spot this happening in your code, so use `Loop.ExpectVectorized` and `Loop.ExpectNotVectorized` to state your assumption about each loop and have Burst check it at compile time.

Because the intrinsics are experimental, you need to define the `UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS` preprocessor symbol to enable them.

The following example shows the original `Bar` example with the `Loop.ExpectVectorized` intrinsic:

``` c#
[MethodImpl(MethodImplOptions.NoInlining)]
private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
{
    for (var i = 0; i < count; i++)
    {
        Unity.Burst.CompilerServices.Loop.ExpectVectorized();

        a[i] += b[i];
    }
}
```

Burst checks at compile time whether the loop is vectorized and emits a compiler error if it isn't. The following example produces an error:

``` c#
[MethodImpl(MethodImplOptions.NoInlining)]
private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
{
    for (var i = 0; i < count; i++)
    {
        Unity.Burst.CompilerServices.Loop.ExpectVectorized();

        if (a[i] > b[i])
        {
            break;
        }

        a[i] += b[i];
    }
}
```

Burst emits the following error at compile time:

>LoopIntrinsics.cs(6,9): Burst error BC1321: The loop is not vectorized where it was expected that it is vectorized.

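The counterpart intrinsic, `Loop.ExpectNotVectorized`, states the opposite assumption. As a minimal sketch, assuming Burst still leaves the early-exit loop scalar as in the assembly shown earlier, the following variant should compile without a vectorization error. The class name `LoopIntrinsicsSketch` is hypothetical, and the code needs the Burst package with `UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS` defined, so it won't build outside a Unity project.

``` c#
using System.Runtime.CompilerServices;
using Unity.Burst;
using Unity.Burst.CompilerServices;

// Hypothetical container class, for illustration only.
public static class LoopIntrinsicsSketch
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static unsafe void Bar([NoAlias] int* a, [NoAlias] int* b, int count)
    {
        for (var i = 0; i < count; i++)
        {
            // States the assumption that this loop stays scalar.
            Loop.ExpectNotVectorized();

            // The early exit is the branch that breaks auto-vectorization.
            if (a[i] > b[i])
            {
                break;
            }

            a[i] += b[i];
        }
    }
}
```
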
>[!IMPORTANT]
>These intrinsics don't work inside `if` statements. Burst doesn't prevent you from placing them there, so you won't see a compile-time error if you do.