# Benchmark Framework

## Table of Contents

- [Overview and Features](#overview-and-features)
- [Using the Framework](#using-the-framework)
  - [Attribute Summary](#attribute-summary)
- [Example](#example)
  - [Glue Layer - Native Containers](#glue-layer---native-containers)
  - [Performance and Benchmark Tests - Native Containers](#performance-and-benchmark-tests---native-containers)
  - [Results](#results)

## Overview and Features
The Benchmark Framework is a complementary framework to the Performance Test Framework. It provides a means to write the code for a performance test *one time* for a given type, while providing the following benefits:

- Both benchmark comparisons and performance/regression testing from a single implementation
  - A managed execution path (JIT) from the same single implementation
  - A Burst compiled *with safety* path from the same single implementation
  - A Burst compiled *without safety* path from the same single implementation
- Automatically generate markdown formatted documentation for the Benchmark results
- Provide a simple means for running benchmarks through custom menu items, with easily trackable progress and the ability to cancel at any time

For the Benchmark Framework itself, tests can be designed to easily group together multiple variations for comparison. For example, the list above may apply to:
- An implementation for Native containers
- Another implementation for Unsafe containers
- And yet another implementation for the container types included in the .NET/Mono/IL2CPP Base Class Libraries

Finally, test implementations may be classified such that they:
- Only test for benchmarking, but not for performance/regression testing (such as managed BCL containers)
- Consider one implementation variation as the baseline, and compare all other implementation variations against it
- Include only a subset of implementations in case there is a gap in functionality (intentional or not) at this time

<br/>

---
## Using the Framework
To take advantage of the features above and write tests for the Benchmark Framework, three components are required:
1. The Benchmark Framework itself, which works alongside the Performance Test Framework
2. An intermediate 'glue' layer for a given benchmark comparison type, e.g. BenchmarkContainer or BenchmarkAllocator
3. The Performance Tests themselves, using the intermediate layer from #2 above

Because #1 is provided by the Framework here, the rest of this documentation gives an example of using it to create a 'glue' layer, and then a performance test which makes use of this example 'glue' layer.

### Attribute Summary
Most (but not *quite* all) interaction with the Benchmark Framework occurs through its attributes. These are all defined in the `Unity.PerformanceTesting.Benchmark` namespace. A summary is given here, but further details can be found in the inline code documentation. As mentioned, a small example demonstrating their use follows.

|Attribute|Description|
|---|---|
|**`[Benchmark]`**|Marks a class containing performance tests to be used in Benchmark Comparison report generation.|
|**`[BenchmarkComparison]`**|Marks an enum as defining the variants that will be generated, simultaneously covering both the Performance Test Framework tests and the Benchmark Framework tests. *Optionally, this can define the Benchmark baseline if the baseline is also a Performance Test Framework measurement.*|
|**`[BenchmarkComparisonExternal]`**|Used on the same enum definition, this associates non-enum values with the enum for Benchmark Framework tests which *are not* to be included in Performance Test Framework tests. *Optionally, this can define the Benchmark baseline if the baseline is not a Performance Test Framework measurement.* See the sketch following this summary.|
|**`[BenchmarkComparisonDisplay]`**|Also used on the same enum definition, this overrides the default measurement sample unit (millisecond, microsecond, etc.), the decimal places used in Benchmark report generation, and the ranking statistic used in Benchmark report generation (median, minimum, etc.).|
|**`[BenchmarkName]`**|Required with each enum value, this describes a formatting string for naming Benchmark result variations when a report is generated. For example, `[BenchmarkName("Native{0}")]`, used with a `[Benchmark]` attributed class such as `HashSet`, would generate the name "NativeHashSet".|
|**`[BenchmarkNameOverride]`**|Overrides the formatted name in case the class name doesn't precisely represent the name that should appear in reports.|
|**`[BenchmarkTestFootnote]`**|Generates a footnote in the Benchmark Comparison report for a given Performance Test method. When used, the footnote always includes a description of the method and its parameters. Optionally, user-defined footnote text may be specified as well.|

Generally, `[Benchmark]`, `[BenchmarkNameOverride]`, and `[BenchmarkTestFootnote]` will be used while writing tests. The rest are used solely in the 'glue' layer, so if you are writing tests on top of a pre-existing 'glue' layer, you are unlikely to need them.
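Of the attributes above, `[BenchmarkComparisonExternal]` is the only one not exercised by the example later in this document, so a rough, purely hypothetical sketch of it is shown below. The attribute signature is an assumption made by analogy with the (value, name format) baseline arguments of `[BenchmarkComparison]` used in the example; verify the exact form against the inline code documentation before relying on it.
```
    // Hypothetical sketch only - the [BenchmarkComparisonExternal] signature is assumed by
    // analogy with the (value, name format) baseline arguments of [BenchmarkComparison];
    // verify it against the inline code documentation.
    public static class MyHypotheticalConfig
    {
        public const int BCL = -1;          // reports-only baseline value
        public const int BclParallel = -2;  // a second reports-only value
    }

    [BenchmarkComparison(MyHypotheticalConfig.BCL, "{0} (BCL)")]
    [BenchmarkComparisonExternal(MyHypotheticalConfig.BclParallel, "{0} (BCL, parallel)")]
    public enum MyHypotheticalComparisonType : int
    {
        [BenchmarkName("Native{0}")] Managed,
        [BenchmarkName("Native{0} (B)")] BurstCompiled,
    }
```
A 'glue' layer built around such an enum would then map `BclParallel` to its own measurement path, in the same way the example below maps the external `MyExampleConfig.BCL` value.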
<br/>

---
## Example
### Glue Layer - Native Containers

This illustrates a simplified version of the com.unity.collections `BenchmarkContainer` implementation, as an example of creating an intermediate 'glue' layer between the Benchmark Framework and user-defined performance tests.

1. The first requirement is an `enum` type which defines the test variations that will be benchmarked. Values defined in the enum will also generate Performance Test Framework tests used in regression testing and performance analysis. Values defined through the `[BenchmarkComparison]` attribute will only appear in Benchmark reports.<br/><br/>
You'll notice two attributes used: `[BenchmarkComparison]` denotes that this `enum` will be used for benchmarking and indicates an externally defined comparison type (BCL) as the baseline to benchmark against, while `[BenchmarkComparisonDisplay]` overrides the default format for report generation and the statistic used for comparison.<br/><br/>
It's worth pointing out that the `{0}` in the name strings will be replaced with the name of the test group, such as `HashSet` or `List`. This also references a `MyExampleConfig` class, defined next, for convenience and consistency.
```
    [BenchmarkComparison(MyExampleConfig.BCL, "{0} (BCL)")]
    [BenchmarkComparisonDisplay(SampleUnit.Millisecond, 3, BenchmarkRankingStatistic.Median)]
    public enum MyExampleType : int
    {
        [BenchmarkName("Native{0}")] Managed,
        [BenchmarkName("Native{0} (B)")] BurstCompiled,
    }
```
2. The configuration class is not a requirement, but it is a recommended pattern for storing common data for all tests, as well as for providing the interface (in this case a menu item) for running benchmarks and generating the resulting markdown file.<br/><br/>
The main takeaway here is the call to `GenerateMarkdown`, which also runs the benchmark tests. Specifically, the argument `typeof(MyExampleType)` refers to the comparison `enum` defined above, and this call will find all the types with a `[Benchmark(typeof(MyExampleType))]` attribute, discover their methods marked with the combined `[Test]` and `[Performance]` attributes, and run those as benchmark tests. More on this later with the example performance tests which will be benchmarked.
```
    public static class MyExampleConfig
    {
        public const int BCL = -1;

        internal const int kCountWarmup = 5;
        internal const int kCountMeasure = 10;

#if UNITY_EDITOR
        [UnityEditor.MenuItem("Benchmark Example/Generate My Benchmarks")]
#endif
        static void RunBenchmarks() =>
            BenchmarkGenerator.GenerateMarkdown(
                "Containers Example",
                typeof(MyExampleType),
                "Temp/performance-comparison-example.md",
                $"Example benchmark - {kCountMeasure} runs after {kCountWarmup} warmup runs",
                "Legend",
                new string[]
                {
                    "`(B)` = Burst Compiled",
                    "`(BCL)` = Base Class Library implementation (such as provided by Mono or .NET)",
                });
    }
```

3. A 'glue' layer should define an `interface` which specifies the test setup, teardown, and measurement for each unique type that will be measured. For the sake of this example, a NativeContainer will be measured, and a managed C# Base Class Library container will be used as the baseline.<br/><br/>
**Notice** there is not a separate interface definition for the NativeContainer's managed code path versus its Burst compiled code path. This can be handled automatically by the final piece of the 'glue' layer, defined next.
```
    public interface IMyExampleBenchmark
    {
        // By convention in this example, a negative capacity indicates teardown;
        // otherwise the container is created with the given capacity.
        public void SetupTeardown(int capacity);
        public object SetupTeardownBCL(int capacity);

        public void Measure();
        public void MeasureBCL(object container);
    }
```

4. Finally, this brings all the individual 'glue' pieces together. Calling this method from a Performance Test Framework test implementation (with `[Test]` and `[Performance]` attributes) will ensure the proper code path is executed and measured. Some details worth noting:
    - `BenchmarkMeasure.Measure` handles selecting the code path for either the Performance Test Framework (run through the Test Runner in Unity) or the Benchmark Framework (run through the menu option defined above, for instance).
    - Setup and Teardown calls are *not* timed or measured.
    - Burst compiled (and any other) variants of a single test implementation aren't *entirely* automatic - rather, they are defined by the 'glue' layer and selected through the comparison `enum` value.
    - External comparison values such as `MyExampleConfig.BCL` will never be called by the Performance Test Framework. Only the Benchmark Framework will automatically generate measurement invocations with this value.
```
    [BurstCompile(CompileSynchronously = true)]
    public static class MyExampleRunner<T> where T : unmanaged, IMyExampleBenchmark
    {
        [BurstCompile(CompileSynchronously = true)]
        unsafe struct BurstCompiledJob : IJob
        {
            [NativeDisableUnsafePtrRestriction] public T* methods;
            public void Execute() => methods->Measure();
        }

        public static unsafe void Run(int capacity, MyExampleType type)
        {
            var methods = new T();

            switch (type)
            {
                // External BCL baseline - only ever invoked by the Benchmark Framework
                case (MyExampleType)(MyExampleConfig.BCL):
                    object container = null;
                    BenchmarkMeasure.Measure(
                        typeof(T),
                        MyExampleConfig.kCountWarmup,
                        MyExampleConfig.kCountMeasure,
                        () => methods.MeasureBCL(container),
                        () => container = methods.SetupTeardownBCL(capacity),
                        () => container = methods.SetupTeardownBCL(-1));
                    break;
                // Managed (JIT) path
                case MyExampleType.Managed:
                    BenchmarkMeasure.Measure(
                        typeof(T),
                        MyExampleConfig.kCountWarmup,
                        MyExampleConfig.kCountMeasure,
                        () => methods.Measure(),
                        () => methods.SetupTeardown(capacity),
                        () => methods.SetupTeardown(-1));
                    break;
                // Burst compiled path, run through a synchronously compiled job
                case MyExampleType.BurstCompiled:
                    BenchmarkMeasure.Measure(
                        typeof(T),
                        MyExampleConfig.kCountWarmup,
                        MyExampleConfig.kCountMeasure,
                        () => new BurstCompiledJob { methods = (T*)UnsafeUtility.AddressOf(ref methods) }.Run(),
                        () => methods.SetupTeardown(capacity),
                        () => methods.SetupTeardown(-1));
                    break;
            }
        }
    }
```
With these four pieces of the 'glue' layer in place, it is quite easy to write flexible, multipurpose performance and benchmark tests that cover any number of combinations with the minimum amount of code possible - meaning little to no code duplication.

There will still be *some* boilerplate involved, as each test must adhere to the contract set by the `IMyExampleBenchmark` interface, but the amount of code required to do this for tens or hundreds of performance tests is reduced by roughly an order of magnitude compared to doing it manually - and that is before even considering the generated benchmark comparisons and reports.

<br/>

---
## Example
### Performance and Benchmark Tests - Native Containers

Now that we have a 'glue' layer, it is straightforward to define as many performance and benchmark tests for the comparison types provided by that layer as we can imagine.

1. First, let's define a simple utility class to reduce boilerplate in each test. It centralizes the setup and teardown code, since we cannot use inheritance: the implementations must be `unmanaged` structs to satisfy the generic constraint of `MyExampleRunner<T>` in the 'glue' layer.
```
    static class ListUtil
    {
        public static void SetupTeardown(ref NativeList<int> container, int capacity, bool addValues)
        {
            if (capacity >= 0)
            {
                container = new NativeList<int>(capacity, Allocator.Persistent);
                if (addValues)
                {
                    for (int i = 0; i < capacity; i++)
                        container.Add(i);
                }
            }
            else
                container.Dispose();
        }
        public static object SetupTeardownBCL(int capacity, bool addValues)
        {
            if (capacity < 0)
                return null;
            var list = new System.Collections.Generic.List<int>(capacity);
            if (addValues)
            {
                for (int i = 0; i < capacity; i++)
                    list.Add(i);
            }
            return list;
        }
    }
```
2. Now we'll create an implementation of the 'glue' layer's `IMyExampleBenchmark` interface that grows a list. The code should be straightforward, and each type of container has its code implemented only once. Additionally, the measurement code really is just "the thing we want to measure".
```
    struct ListAddGrow : IMyExampleBenchmark
    {
        int toAdd;
        NativeList<int> nativeContainer;

        public void SetupTeardown(int capacity)
        {
            toAdd = capacity;
            // Start from an empty container on setup; dispose it on teardown (negative capacity)
            ListUtil.SetupTeardown(ref nativeContainer, capacity >= 0 ? 0 : -1, false);
        }
        public object SetupTeardownBCL(int capacity)
        {
            toAdd = capacity;
            return ListUtil.SetupTeardownBCL(capacity >= 0 ? 0 : -1, false);
        }

        public void Measure()
        {
            for (int i = 0; i < toAdd; i++)
                nativeContainer.Add(i);
        }
        public void MeasureBCL(object container)
        {
            var list = (System.Collections.Generic.List<int>)container;
            for (int i = 0; i < toAdd; i++)
                list.Add(i);
        }
    }
```

3. Let's make another implementation of `IMyExampleBenchmark`, this time testing the performance of a `foreach` over the list container types.<br/><br/>
Take special note of the `Volatile.Write`, which ensures the optimizer doesn't throw away the value and, with it, the loop altogether.
```
    struct ListForEach : IMyExampleBenchmark
    {
        NativeList<int> nativeContainer;

        public void SetupTeardown(int capacity) => ListUtil.SetupTeardown(ref nativeContainer, capacity, true);
        public object SetupTeardownBCL(int capacity) => ListUtil.SetupTeardownBCL(capacity, true);

        public void Measure()
        {
            int value = 0;
            foreach (var element in nativeContainer)
                Volatile.Write(ref value, element);
        }
        public void MeasureBCL(object container)
        {
            int value = 0;
            var list = (System.Collections.Generic.List<int>)container;
            foreach (var element in list)
                Volatile.Write(ref value, element);
        }
    }
```

4. As a final example, we'll implement a performance test for checking whether a list container is empty.<br/><br/>
*This time*, neither `Volatile.Read` nor `Volatile.Write` would help much, because optimization passes can determine that the result of checking for empty is constant through each loop iteration - i.e. there is no dependency within the loop itself when making this calculation. Because of this, we must turn off optimizations altogether with `[MethodImpl(MethodImplOptions.NoOptimization)]`.<br/><br/>
The best that could happen otherwise would be with a `Volatile.Write`: the optimizer would hoist the `IsEmpty` or `Count` check out of the loop, call it only once, and then write that pre-calculated value through `Volatile.Write` `kIterations` times within the loop. Naturally, this doesn't tell us much about the code we want to measure. (A short sketch of that transformation follows the code below.)
```
    struct ListIsEmpty100k : IMyExampleBenchmark
    {
        const int kIterations = 100_000;
        NativeList<int> nativeContainer;

        public void SetupTeardown(int capacity) => ListUtil.SetupTeardown(ref nativeContainer, capacity, true);
        public object SetupTeardownBCL(int capacity) => ListUtil.SetupTeardownBCL(capacity, true);

        [MethodImpl(MethodImplOptions.NoOptimization)]
        public void Measure()
        {
            for (int i = 0; i < kIterations; i++)
                _ = nativeContainer.IsEmpty;
        }
        [MethodImpl(MethodImplOptions.NoOptimization)]
        public void MeasureBCL(object container)
        {
            var list = (System.Collections.Generic.List<int>)container;
            for (int i = 0; i < kIterations; i++)
                _ = list.Count == 0;
        }
    }
```
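Before moving on, here is the transformation mentioned above, as an illustration only (hypothetical code, not part of the framework or of the tests above): with optimizations left on, the compiler is free to reduce a `Volatile.Write`-based `IsEmpty` loop to roughly the following, because nothing in the loop body depends on the iteration variable. This is why the measurement above disables optimizations instead.
```
    // Illustration only (not framework code): what the optimizer may effectively produce
    // from a Volatile.Write-based IsEmpty loop. IsEmpty is hoisted and evaluated once, so
    // the loop measures little more than the repeated volatile store.
    static void WhatTheOptimizerMayProduce(NativeList<int> list)
    {
        const int kIterations = 100_000;
        bool value = false;
        bool isEmpty = list.IsEmpty;              // hoisted out of the loop, evaluated once
        for (int i = 0; i < kIterations; i++)
            Volatile.Write(ref value, isEmpty);   // only this store remains per iteration
    }
```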
5. Now, take our measurement code and simply pass the `IMyExampleBenchmark` implementations into the `MyExampleRunner<T>` runner provided by the 'glue' layer. See the next section for the results of this work.<br/><br/>
Note that `[BenchmarkNameOverride]` is used so that the formatted name reads "NativeList" rather than "NativeMyListMeasurements" in benchmark reports.<br/><br/>
That may have seemed like a lot of code to get to this point, but keep in mind that once a 'glue' layer exists, it can be reused for as many cases as fit it. `com.unity.collections` has many, many performance and benchmark tests built around a single (albeit more involved) intermediate 'glue' layer.
```
    [Benchmark(typeof(MyExampleType))]
    [BenchmarkNameOverride("List")]
    class MyListMeasurements
    {
        [Test, Performance]
        [Category("Performance")]
        public unsafe void IsEmpty_x_100k(
            [Values(0, 100)] int capacity,
            [Values] MyExampleType type)
        {
            MyExampleRunner<ListIsEmpty100k>.Run(capacity, type);
        }

        [Test, Performance]
        [Category("Performance")]
        [BenchmarkTestFootnote("Incrementally reaching size of `growTo`")]
        public unsafe void AddGrow(
            [Values(65536, 1024 * 1024)] int growTo,
            [Values] MyExampleType type)
        {
            MyExampleRunner<ListAddGrow>.Run(growTo, type);
        }

        [Test, Performance]
        [Category("Performance")]
        public unsafe void Foreach(
            [Values(10000, 100000, 1000000)] int insertions,
            [Values] MyExampleType type)
        {
            MyExampleRunner<ListForEach>.Run(insertions, type);
        }
    }
```

<br/>

---
## Example
### Results

There are two clear results of the List performance tests implemented above:
1. The Test Runner in the Unity Editor will display the following Performance Test Framework tests. Note that with one implementation per type, both a Burst compiled path and a non-Burst compiled path are measured. One could easily add others (such as Burst compiled with safety checks on or off, or an UnsafeContainer variation of the same tests, though this would require a bit more 'glue' to integrate). Here is an example of the output:

![Performance Test Framework example](PerformanceTestFrameworkOutput.png)
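As a purely hypothetical illustration of that last point (not part of the framework), one way to add a Burst compiled *without safety* variant could look like the following. It assumes your Burst version supports the `DisableSafetyChecks` option on `[BurstCompile]`, and it is written as a separate runner only to keep the sketch self-contained; in practice you would more likely add a new value to `MyExampleType` and another `case` to `MyExampleRunner<T>.Run`.
```
    // Hypothetical sketch - not part of the framework. Assumes [BurstCompile] supports
    // DisableSafetyChecks in your Burst version; in practice this would be another enum
    // value and switch case in MyExampleRunner<T> rather than a separate class.
    [BurstCompile(CompileSynchronously = true)]
    public static class MyExampleNoSafetyRunner<T> where T : unmanaged, IMyExampleBenchmark
    {
        [BurstCompile(CompileSynchronously = true, DisableSafetyChecks = true)]
        unsafe struct BurstCompiledNoSafetyJob : IJob
        {
            [NativeDisableUnsafePtrRestriction] public T* methods;
            public void Execute() => methods->Measure();
        }

        public static unsafe void Run(int capacity)
        {
            var methods = new T();
            BenchmarkMeasure.Measure(
                typeof(T),
                MyExampleConfig.kCountWarmup,
                MyExampleConfig.kCountMeasure,
                () => new BurstCompiledNoSafetyJob { methods = (T*)UnsafeUtility.AddressOf(ref methods) }.Run(),
                () => methods.SetupTeardown(capacity),
                () => methods.SetupTeardown(-1));
        }
    }
```
A matching `[BenchmarkName]` entry on the comparison `enum` (for example `"Native{0} (B+NS)"`) would then give such a variant its own column in the generated report.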
2. Running the `Benchmark Example/Generate My Benchmarks` menu item implemented above will generate a markdown report, again running the same single code path per type. Each entry in the report shows the chosen ranking statistic (the median here) alongside its ratio to the baseline column (here *List (BCL)*): values above 1.0x are faster than the baseline, and values below it are slower. Here is the output:

> # Performance Comparison: Containers Example
>
> > **<span style="color:red">This file is auto-generated</span>**
> >
> > All measurements were taken on 12th Gen Intel(R) Core(TM) i9-12900K with 24 logical cores.<br/>
> > To regenerate this file locally use: **DOTS -> Unity.Collections -> Generate &ast;&ast;&ast;** menu.<br/>
>
> ## Table of Contents
>
> - [Benchmark Results](#Benchmark%20Results)
>   - [List](#List)
>
> ## Benchmark Results
>
> Example benchmark - 10 runs after 5 warmup runs<br/>
> <br/>
>
> > **Legend**
> >
> > `(B)` = Burst Compiled<br/>
> > `(BCL)` = Base Class Library implementation (such as provided by Mono or .NET)<br/>
>
> <br/>
>
> ### *List*
>
> | Functionality | NativeList | NativeList (B) | *List (BCL)* |
> |---|--:|--:|--:|
> | `IsEmpty_x_100k(0)`*¹* | 0.373ms <span style="color:red">(0.3x)</span>&nbsp;🟠 | 0.089ms <span style="color:green">(1.1x)</span>&nbsp;🟢 | *0.098ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
> | `IsEmpty_x_100k(100)`*¹* | 0.334ms <span style="color:red">(0.3x)</span>&nbsp;🟠 | 0.089ms <span style="color:green">(1.1x)</span>&nbsp;🟢 | *0.098ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
> | `AddGrow(65536)`*³* | 1.281ms <span style="color:red">(0.1x)</span>&nbsp;🟠 | 0.427ms <span style="color:red">(0.3x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *0.144ms <span style="color:grey">(1.0x)</span>*&nbsp;🟢 |
> | `AddGrow(1048576)`*³* | 21.435ms <span style="color:red">(0.1x)</span>&nbsp;🟠 | 7.471ms <span style="color:red">(0.3x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *2.274ms <span style="color:grey">(1.0x)</span>*&nbsp;🟢 |
> | `Foreach(10000)` | 0.042ms <span style="color:red">(0.4x)</span>&nbsp;🟠 | 0.003ms <span style="color:green">(6.6x)</span>&nbsp;🟢 | *0.018ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
> | `Foreach(100000)` | 0.452ms <span style="color:red">(0.4x)</span>&nbsp;🟠 | 0.025ms <span style="color:green">(7.4x)</span>&nbsp;🟢 | *0.184ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
> | `Foreach(1000000)` | 4.500ms <span style="color:red">(0.4x)</span>&nbsp;🟠 | 0.250ms <span style="color:green">(7.5x)</span>&nbsp;🟢 | *1.877ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
>
> *¹* Optimizations were disabled to perform this benchmark<br/>
> *³* AddGrow(growTo) -- Incrementally reaching size of `growTo`<br/>

*Happy Benchmarking!*