# Benchmark Framework

## Table of Contents

- [Overview and Features](#overview-and-features)
- [Using the Framework](#using-the-framework)
  - [Attribute Summary](#attribute-summary)
- [Example](#example)
  - [Glue Layer - Native Containers](#glue-layer---native-containers)
  - [Performance and Benchmark Tests - Native Containers](#performance-and-benchmark-tests---native-containers)
  - [Results](#results)

## Overview and Features
The Benchmark Framework is a complementary framework to the Performance Test Framework. It provides a means to write the code for a performance test *one time* for a given type, while providing the following benefits:

- Both benchmark comparisons and performance/regression testing from a single implementation
  - A managed execution path (JIT) from the same single implementation
  - A Burst compiled *with safety* path from the same single implementation
  - A Burst compiled *without safety* path from the same single implementation
- Automatically generate markdown formatted documentation for the Benchmark results
- Provide a simple means for running benchmarks through custom menu items, with easily trackable progress and the ability to cancel at any time

For the Benchmark Framework itself, tests can be designed to easily group together multiple variations for comparison. For example, the list above may apply to:
- An implementation for Native containers
- Another implementation for Unsafe containers
- And yet another implementation for the container types included in the .NET/Mono/IL2CPP Base Class Libraries

Finally, test implementations may be classified such that they:
- Only test for benchmarking, but not for performance/regression testing (such as managed BCL containers)
- Consider one implementation variation as the baseline, and compare all other implementation variations against it
- Include only a subset of implementations in case there is a gap in functionality (intentional or not) at this time

<br/>

---
## Using the Framework
To take advantage of the features above and write tests for the Benchmark Framework, three components are required:
1. The Benchmark Framework itself, which works alongside the Performance Test Framework
2. An intermediate 'glue' layer for a given benchmark comparison type, e.g. BenchmarkContainer or BenchmarkAllocator
3. The Performance Tests themselves, using the intermediate layer from #2 above

Because #1 is provided by the Framework here, the rest of this documentation gives an example of using it to create a 'glue' layer, and then a performance test which makes use of this example 'glue' layer.

### Attribute Summary
Most (but not *quite* all) interaction with the Benchmark Framework occurs through its attributes. These are all defined in the `Unity.PerformanceTesting.Benchmark` namespace. A summary is given here, but further details can be found in the inline code documentation. As mentioned, a small example demonstrating their use follows.

|Attribute|Description|
|---|---|
|**`[Benchmark]`**|Marks a class containing performance tests to be used in Benchmark Comparison report generation.|
|**`[BenchmarkComparison]`**|Marks an enum as defining the variants that will be generated, simultaneously covering both the Performance Test Framework tests and the Benchmark Framework tests. *Optionally, this can define the Benchmark baseline if the baseline is also a Performance Test Framework measurement.*|
|**`[BenchmarkComparisonExternal]`**|Used on the same enum definition, this associates non-enum values with the enum for Benchmark Framework tests which *are not* to be included in Performance Test Framework tests. *Optionally, this can define the Benchmark baseline if the baseline is not a Performance Test Framework measurement.* See the sketch following this summary.|
|**`[BenchmarkComparisonDisplay]`**|Also used on the same enum definition, this overrides the default measurement sample unit (millisecond, microsecond, etc.), the decimal places used in Benchmark report generation, and the ranking statistic used in Benchmark report generation (median, minimum, etc.).|
|**`[BenchmarkName]`**|Required with each enum value, this describes a formatting string for naming Benchmark result variations when a report is generated. For example, `[BenchmarkName("Native{0}")]`, used with a `[Benchmark]` attributed class such as `HashSet`, would generate the name "NativeHashSet".|
|**`[BenchmarkNameOverride]`**|Overrides the formatted name in case the class name doesn't precisely represent the name that should appear in reports.|
|**`[BenchmarkTestFootnote]`**|Generates a footnote in the Benchmark Comparison report for a given Performance Test method. When used, the footnote always includes a description of the method and its parameters. Optionally, user-defined footnote text may be specified as well.|

Generally, `[Benchmark]`, `[BenchmarkNameOverride]`, and `[BenchmarkTestFootnote]` will be used while writing tests. The rest are used solely in the 'glue' layer, so if you are writing tests on top of a pre-existing 'glue' layer, you are unlikely to need them.
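Of the attributes above, `[BenchmarkComparisonExternal]` is the only one not exercised by the example later in this document, so a rough, purely hypothetical sketch of it is shown below. The attribute signature is an assumption made by analogy with the (value, name format) baseline arguments of `[BenchmarkComparison]` used in the example; verify the exact form against the inline code documentation before relying on it.
```
    // Hypothetical sketch only - the [BenchmarkComparisonExternal] signature is assumed by
    // analogy with the (value, name format) baseline arguments of [BenchmarkComparison];
    // verify it against the inline code documentation.
    public static class MyHypotheticalConfig
    {
        public const int BCL = -1;          // reports-only baseline value
        public const int BclParallel = -2;  // a second reports-only value
    }

    [BenchmarkComparison(MyHypotheticalConfig.BCL, "{0} (BCL)")]
    [BenchmarkComparisonExternal(MyHypotheticalConfig.BclParallel, "{0} (BCL, parallel)")]
    public enum MyHypotheticalComparisonType : int
    {
        [BenchmarkName("Native{0}")] Managed,
        [BenchmarkName("Native{0} (B)")] BurstCompiled,
    }
```
A 'glue' layer built around such an enum would then map `BclParallel` to its own measurement path, in the same way the example below maps the external `MyExampleConfig.BCL` value.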
<br/>

---
## Example
### Glue Layer - Native Containers

This illustrates a simplified version of the com.unity.collections `BenchmarkContainer` implementation, as an example of creating an intermediate 'glue' layer between the Benchmark Framework and user-defined performance tests.

1. The first requirement is an `enum` type which defines the test variations that will be benchmarked. Values defined in the enum will also generate Performance Test Framework tests used in regression testing and performance analysis. Values defined through the `[BenchmarkComparison]` attribute will only appear in Benchmark reports.<br/><br/>
You'll notice two attributes used: `[BenchmarkComparison]` denotes that this `enum` will be used for benchmarking and indicates an externally defined comparison type (BCL) as the baseline to benchmark against, while `[BenchmarkComparisonDisplay]` overrides the default format for report generation and the statistic used for comparison.<br/><br/>
It's worth pointing out that the `{0}` in the name strings will be replaced with the name of the test group, such as `HashSet` or `List`. This also references a `MyExampleConfig` class, defined next, for convenience and consistency.
```
    [BenchmarkComparison(MyExampleConfig.BCL, "{0} (BCL)")]
    [BenchmarkComparisonDisplay(SampleUnit.Millisecond, 3, BenchmarkRankingStatistic.Median)]
    public enum MyExampleType : int
    {
        [BenchmarkName("Native{0}")] Managed,
        [BenchmarkName("Native{0} (B)")] BurstCompiled,
    }
```
2. The configuration class is not a requirement, but it is a recommended pattern for storing common data for all tests, as well as for providing the interface (in this case a menu item) for running benchmarks and generating the resulting markdown file.<br/><br/>
The main takeaway here is the call to `GenerateMarkdown`, which also runs the benchmark tests. Specifically, the argument `typeof(MyExampleType)` refers to the comparison `enum` defined above, and this call will find all the types with a `[Benchmark(typeof(MyExampleType))]` attribute, discover their methods marked with the combined `[Test]` and `[Performance]` attributes, and run those as benchmark tests. More on this later with the example performance tests which will be benchmarked.
```
    public static class MyExampleConfig
    {
        public const int BCL = -1;

        internal const int kCountWarmup = 5;
        internal const int kCountMeasure = 10;

#if UNITY_EDITOR
        [UnityEditor.MenuItem("Benchmark Example/Generate My Benchmarks")]
#endif
        static void RunBenchmarks() =>
            BenchmarkGenerator.GenerateMarkdown(
                "Containers Example",
                typeof(MyExampleType),
                "Temp/performance-comparison-example.md",
                $"Example benchmark - {kCountMeasure} runs after {kCountWarmup} warmup runs",
                "Legend",
                new string[]
                {
                    "`(B)` = Burst Compiled",
                    "`(BCL)` = Base Class Library implementation (such as provided by Mono or .NET)",
                });
    }
```

3. A 'glue' layer should define an `interface` which specifies the test setup, teardown, and measurement for each unique type that will be measured. For the sake of this example, a NativeContainer will be measured, and a managed C# Base Class Library container will be used as the baseline.<br/><br/>
**Notice** there is not a separate interface definition for the NativeContainer's managed code path versus its Burst compiled code path. This can be handled automatically by the final piece of the 'glue' layer, defined next.
```
    public interface IMyExampleBenchmark
    {
        // By convention in this example, a negative capacity indicates teardown;
        // otherwise the container is created with the given capacity.
        public void SetupTeardown(int capacity);
        public object SetupTeardownBCL(int capacity);

        public void Measure();
        public void MeasureBCL(object container);
    }
```

4. Finally, this brings all the individual 'glue' pieces together. Calling this method from a Performance Test Framework test implementation (with `[Test]` and `[Performance]` attributes) will ensure the proper code path is executed and measured. Some details worth noting:
    - `BenchmarkMeasure.Measure` handles selecting the code path for either the Performance Test Framework (run through the Test Runner in Unity) or the Benchmark Framework (run through the menu option defined above, for instance).
    - Setup and Teardown calls are *not* timed or measured.
    - Burst compiled (and any other) variants of a single test implementation aren't *entirely* automatic - rather, they are defined by the 'glue' layer and selected through the comparison `enum` value.
    - External comparison values such as `MyExampleConfig.BCL` will never be called by the Performance Test Framework. Only the Benchmark Framework will automatically generate measurement invocations with this value.
```
    [BurstCompile(CompileSynchronously = true)]
    public static class MyExampleRunner<T> where T : unmanaged, IMyExampleBenchmark
    {
        [BurstCompile(CompileSynchronously = true)]
        unsafe struct BurstCompiledJob : IJob
        {
            [NativeDisableUnsafePtrRestriction] public T* methods;
            public void Execute() => methods->Measure();
        }

        public static unsafe void Run(int capacity, MyExampleType type)
        {
            var methods = new T();

            switch (type)
            {
                // External BCL baseline - only ever invoked by the Benchmark Framework
                case (MyExampleType)(MyExampleConfig.BCL):
                    object container = null;
                    BenchmarkMeasure.Measure(
                        typeof(T),
                        MyExampleConfig.kCountWarmup,
                        MyExampleConfig.kCountMeasure,
                        () => methods.MeasureBCL(container),
                        () => container = methods.SetupTeardownBCL(capacity),
                        () => container = methods.SetupTeardownBCL(-1));
                    break;
                // Managed (JIT) path
                case MyExampleType.Managed:
                    BenchmarkMeasure.Measure(
                        typeof(T),
                        MyExampleConfig.kCountWarmup,
                        MyExampleConfig.kCountMeasure,
                        () => methods.Measure(),
                        () => methods.SetupTeardown(capacity),
                        () => methods.SetupTeardown(-1));
                    break;
                // Burst compiled path, run through a synchronously compiled job
                case MyExampleType.BurstCompiled:
                    BenchmarkMeasure.Measure(
                        typeof(T),
                        MyExampleConfig.kCountWarmup,
                        MyExampleConfig.kCountMeasure,
                        () => new BurstCompiledJob { methods = (T*)UnsafeUtility.AddressOf(ref methods) }.Run(),
                        () => methods.SetupTeardown(capacity),
                        () => methods.SetupTeardown(-1));
                    break;
            }
        }
    }
```
With these four pieces of the 'glue' layer in place, it is quite easy to write flexible, multipurpose performance and benchmark tests that cover any number of combinations with the minimum amount of code possible - meaning little to no code duplication.

There will still be *some* boilerplate involved, as each test must adhere to the contract set by the `IMyExampleBenchmark` interface, but the amount of code required to do this for tens or hundreds of performance tests is reduced by roughly an order of magnitude compared to doing it manually - and that is before even considering the generated benchmark comparisons and reports.

<br/>

---
## Example
### Performance and Benchmark Tests - Native Containers

Now that we have a 'glue' layer, it is straightforward to define as many performance and benchmark tests for the comparison types provided by that layer as we can imagine.

1. First, let's define a simple utility class to reduce boilerplate in each test. It centralizes the setup and teardown code, since we cannot use inheritance: the implementations must be `unmanaged` structs to satisfy the generic constraint of `MyExampleRunner<T>` in the 'glue' layer.
```
    static class ListUtil
    {
        public static void SetupTeardown(ref NativeList<int> container, int capacity, bool addValues)
        {
            if (capacity >= 0)
            {
                container = new NativeList<int>(capacity, Allocator.Persistent);
                if (addValues)
                {
                    for (int i = 0; i < capacity; i++)
                        container.Add(i);
                }
            }
            else
                container.Dispose();
        }
        public static object SetupTeardownBCL(int capacity, bool addValues)
        {
            if (capacity < 0)
                return null;
            var list = new System.Collections.Generic.List<int>(capacity);
            if (addValues)
            {
                for (int i = 0; i < capacity; i++)
                    list.Add(i);
            }
            return list;
        }
    }
```
2. Now we'll create an implementation of the 'glue' layer's `IMyExampleBenchmark` interface that grows a list. The code should be straightforward, and each type of container has its code implemented only once. Additionally, the measurement code really is just "the thing we want to measure".
```
    struct ListAddGrow : IMyExampleBenchmark
    {
        int toAdd;
        NativeList<int> nativeContainer;

        public void SetupTeardown(int capacity)
        {
            toAdd = capacity;
            // Start from an empty container on setup; dispose it on teardown (negative capacity)
            ListUtil.SetupTeardown(ref nativeContainer, capacity >= 0 ? 0 : -1, false);
        }
        public object SetupTeardownBCL(int capacity)
        {
            toAdd = capacity;
            return ListUtil.SetupTeardownBCL(capacity >= 0 ? 0 : -1, false);
        }

        public void Measure()
        {
            for (int i = 0; i < toAdd; i++)
                nativeContainer.Add(i);
        }
        public void MeasureBCL(object container)
        {
            var list = (System.Collections.Generic.List<int>)container;
            for (int i = 0; i < toAdd; i++)
                list.Add(i);
        }
    }
```

3. Let's make another implementation of `IMyExampleBenchmark`, this time testing the performance of a `foreach` over the list container types.<br/><br/>
Take special note of the `Volatile.Write`, which ensures the optimizer doesn't throw away the value and, with it, the loop altogether.
```
    struct ListForEach : IMyExampleBenchmark
    {
        NativeList<int> nativeContainer;

        public void SetupTeardown(int capacity) => ListUtil.SetupTeardown(ref nativeContainer, capacity, true);
        public object SetupTeardownBCL(int capacity) => ListUtil.SetupTeardownBCL(capacity, true);

        public void Measure()
        {
            int value = 0;
            foreach (var element in nativeContainer)
                Volatile.Write(ref value, element);
        }
        public void MeasureBCL(object container)
        {
            int value = 0;
            var list = (System.Collections.Generic.List<int>)container;
            foreach (var element in list)
                Volatile.Write(ref value, element);
        }
    }
```

4. As a final example, we'll implement a performance test for checking whether a list container is empty.<br/><br/>
*This time*, neither `Volatile.Read` nor `Volatile.Write` would help much, because optimization passes can determine that the result of checking for empty is constant through each loop iteration - i.e. there is no dependency within the loop itself when making this calculation. Because of this, we must turn off optimizations altogether with `[MethodImpl(MethodImplOptions.NoOptimization)]`.<br/><br/>
The best that could happen otherwise would be with a `Volatile.Write`: the optimizer would hoist the `IsEmpty` or `Count` check out of the loop, call it only once, and then write that pre-calculated value through `Volatile.Write` `kIterations` times within the loop. Naturally, this doesn't tell us much about the code we want to measure. (A short sketch of that transformation follows the code below.)
```
    struct ListIsEmpty100k : IMyExampleBenchmark
    {
        const int kIterations = 100_000;
        NativeList<int> nativeContainer;

        public void SetupTeardown(int capacity) => ListUtil.SetupTeardown(ref nativeContainer, capacity, true);
        public object SetupTeardownBCL(int capacity) => ListUtil.SetupTeardownBCL(capacity, true);

        [MethodImpl(MethodImplOptions.NoOptimization)]
        public void Measure()
        {
            for (int i = 0; i < kIterations; i++)
                _ = nativeContainer.IsEmpty;
        }
        [MethodImpl(MethodImplOptions.NoOptimization)]
        public void MeasureBCL(object container)
        {
            var list = (System.Collections.Generic.List<int>)container;
            for (int i = 0; i < kIterations; i++)
                _ = list.Count == 0;
        }
    }
```
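Before moving on, here is the transformation mentioned above, as an illustration only (hypothetical code, not part of the framework or of the tests above): with optimizations left on, the compiler is free to reduce a `Volatile.Write`-based `IsEmpty` loop to roughly the following, because nothing in the loop body depends on the iteration variable. This is why the measurement above disables optimizations instead.
```
    // Illustration only (not framework code): what the optimizer may effectively produce
    // from a Volatile.Write-based IsEmpty loop. IsEmpty is hoisted and evaluated once, so
    // the loop measures little more than the repeated volatile store.
    static void WhatTheOptimizerMayProduce(NativeList<int> list)
    {
        const int kIterations = 100_000;
        bool value = false;
        bool isEmpty = list.IsEmpty;              // hoisted out of the loop, evaluated once
        for (int i = 0; i < kIterations; i++)
            Volatile.Write(ref value, isEmpty);   // only this store remains per iteration
    }
```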
5. Now, take our measurement code and simply pass the `IMyExampleBenchmark` implementations into the `MyExampleRunner<T>` runner provided by the 'glue' layer. See the next section for the results of this work.<br/><br/>
Note that `[BenchmarkNameOverride]` is used so that the formatted name reads "NativeList" rather than "NativeMyListMeasurements" in benchmark reports.<br/><br/>
That may have seemed like a lot of code to get to this point, but keep in mind that once a 'glue' layer exists, it can be reused for as many cases as fit it. `com.unity.collections` has many, many performance and benchmark tests built around a single (albeit more involved) intermediate 'glue' layer.
```
    [Benchmark(typeof(MyExampleType))]
    [BenchmarkNameOverride("List")]
    class MyListMeasurements
    {
        [Test, Performance]
        [Category("Performance")]
        public unsafe void IsEmpty_x_100k(
            [Values(0, 100)] int capacity,
            [Values] MyExampleType type)
        {
            MyExampleRunner<ListIsEmpty100k>.Run(capacity, type);
        }

        [Test, Performance]
        [Category("Performance")]
        [BenchmarkTestFootnote("Incrementally reaching size of `growTo`")]
        public unsafe void AddGrow(
            [Values(65536, 1024 * 1024)] int growTo,
            [Values] MyExampleType type)
        {
            MyExampleRunner<ListAddGrow>.Run(growTo, type);
        }

        [Test, Performance]
        [Category("Performance")]
        public unsafe void Foreach(
            [Values(10000, 100000, 1000000)] int insertions,
            [Values] MyExampleType type)
        {
            MyExampleRunner<ListForEach>.Run(insertions, type);
        }
    }
```

<br/>

---
## Example
### Results

There are two clear results of the List performance tests implemented above:
1. The Test Runner in the Unity Editor will display the following Performance Test Framework tests. Note that with one implementation per type, both a Burst compiled path and a non-Burst compiled path are measured. One could easily add others (such as Burst compiled with safety checks on or off, or an UnsafeContainer variation of the same tests, though this would require a bit more 'glue' to integrate). Here is an example of the output:

![Performance Test Framework example](PerformanceTestFrameworkOutput.png)
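As a purely hypothetical illustration of that last point (not part of the framework), one way to add a Burst compiled *without safety* variant could look like the following. It assumes your Burst version supports the `DisableSafetyChecks` option on `[BurstCompile]`, and it is written as a separate runner only to keep the sketch self-contained; in practice you would more likely add a new value to `MyExampleType` and another `case` to `MyExampleRunner<T>.Run`.
```
    // Hypothetical sketch - not part of the framework. Assumes [BurstCompile] supports
    // DisableSafetyChecks in your Burst version; in practice this would be another enum
    // value and switch case in MyExampleRunner<T> rather than a separate class.
    [BurstCompile(CompileSynchronously = true)]
    public static class MyExampleNoSafetyRunner<T> where T : unmanaged, IMyExampleBenchmark
    {
        [BurstCompile(CompileSynchronously = true, DisableSafetyChecks = true)]
        unsafe struct BurstCompiledNoSafetyJob : IJob
        {
            [NativeDisableUnsafePtrRestriction] public T* methods;
            public void Execute() => methods->Measure();
        }

        public static unsafe void Run(int capacity)
        {
            var methods = new T();
            BenchmarkMeasure.Measure(
                typeof(T),
                MyExampleConfig.kCountWarmup,
                MyExampleConfig.kCountMeasure,
                () => new BurstCompiledNoSafetyJob { methods = (T*)UnsafeUtility.AddressOf(ref methods) }.Run(),
                () => methods.SetupTeardown(capacity),
                () => methods.SetupTeardown(-1));
        }
    }
```
A matching `[BenchmarkName]` entry on the comparison `enum` (for example `"Native{0} (B+NS)"`) would then give such a variant its own column in the generated report.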
2. Running the `Benchmark Example/Generate My Benchmarks` menu item implemented above will generate a markdown report, again running the same single code path per type. Each entry in the report shows the chosen ranking statistic (the median here) alongside its ratio to the baseline column (here *List (BCL)*): values above 1.0x are faster than the baseline, and values below it are slower. Here is the output:

> # Performance Comparison: Containers Example
>
> > **<span style="color:red">This file is auto-generated</span>**
> >
> > All measurements were taken on 12th Gen Intel(R) Core(TM) i9-12900K with 24 logical cores.<br/>
> > To regenerate this file locally use: **DOTS -> Unity.Collections -> Generate &ast;&ast;&ast;** menu.<br/>
>
> ## Table of Contents
>
> - [Benchmark Results](#Benchmark%20Results)
>   - [List](#List)
>
> ## Benchmark Results
>
> Example benchmark - 10 runs after 5 warmup runs<br/>
> <br/>
>
> > **Legend**
> >
> > `(B)` = Burst Compiled<br/>
> > `(BCL)` = Base Class Library implementation (such as provided by Mono or .NET)<br/>
>
> <br/>
>
> ### *List*
>
> | Functionality | NativeList | NativeList (B) | *List (BCL)* |
> |---|--:|--:|--:|
> | `IsEmpty_x_100k(0)`*¹* | 0.373ms <span style="color:red">(0.3x)</span>&nbsp;🟠 | 0.089ms <span style="color:green">(1.1x)</span>&nbsp;🟢 | *0.098ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
> | `IsEmpty_x_100k(100)`*¹* | 0.334ms <span style="color:red">(0.3x)</span>&nbsp;🟠 | 0.089ms <span style="color:green">(1.1x)</span>&nbsp;🟢 | *0.098ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
> | `AddGrow(65536)`*³* | 1.281ms <span style="color:red">(0.1x)</span>&nbsp;🟠 | 0.427ms <span style="color:red">(0.3x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *0.144ms <span style="color:grey">(1.0x)</span>*&nbsp;🟢 |
> | `AddGrow(1048576)`*³* | 21.435ms <span style="color:red">(0.1x)</span>&nbsp;🟠 | 7.471ms <span style="color:red">(0.3x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *2.274ms <span style="color:grey">(1.0x)</span>*&nbsp;🟢 |
> | `Foreach(10000)` | 0.042ms <span style="color:red">(0.4x)</span>&nbsp;🟠 | 0.003ms <span style="color:green">(6.6x)</span>&nbsp;🟢 | *0.018ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
> | `Foreach(100000)` | 0.452ms <span style="color:red">(0.4x)</span>&nbsp;🟠 | 0.025ms <span style="color:green">(7.4x)</span>&nbsp;🟢 | *0.184ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
> | `Foreach(1000000)` | 4.500ms <span style="color:red">(0.4x)</span>&nbsp;🟠 | 0.250ms <span style="color:green">(7.5x)</span>&nbsp;🟢 | *1.877ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
>
> *¹* Optimizations were disabled to perform this benchmark<br/>
> *³* AddGrow(growTo) -- Incrementally reaching size of `growTo`<br/>

*Happy Benchmarking!*