# Collections Benchmarking and Performance Tests

## Table of Contents

- [Overview](#overview)
  - [Containers](#containers)
  - [Allocators](#allocators)
- [Container Benchmarking and Performance Tests](#container-benchmarking-and-performance-tests)
  - [Example Code - List.Add](#example-code---listadd)
  - [Results - List.Add](#results---listadd)
- [Allocator Benchmarking and Performance Tests](#allocator-benchmarking-and-performance-tests)
  - [Example Code - RewindableAllocator.FixedSize](#example-code---rewindableallocatorfixedsize)
  - [Results - RewindableAllocator.FixedSize](#results---rewindableallocatorfixedsize)

## Overview
`com.unity.collections` provides pre-defined intermediate 'glue' layers on top of the Benchmark Framework. These make it relatively simple to create performance and benchmark tests for the wide variety of code paths which may be taken when using the collections package.

### Containers
Examples of provided benchmarking and performance testing include:
- NativeContainer code
- Burst compiled NativeContainer code with safety enabled
- Burst compiled NativeContainer code with safety disabled
- UnsafeContainer code
- Burst compiled UnsafeContainer code with safety enabled
- Burst compiled UnsafeContainer code with safety disabled

Combine those with:
- Container.ParallelWriter code going wide in any of the above mentioned situations
- Container.ReadOnly code going wide

and it is easy to visualize the vast number of possibilities for which we want to monitor and generate concrete performance data *and comparisons*.

Regarding comparisons, we also want to ensure that these Burst compatible containers are competitive with, or better than, similar containers in .NET/IL2CPP/Mono's base class library, and to have a way to validate and track improvements there as well. Examples include the containers found in:
- System.Collections.Generic
- System.Collections.Concurrent

### Allocators

Naturally, there is a similar story with the custom allocator types provided by the collections package. In this case we want to be able to compare:
- A provided IAllocator implementation in a managed code path
- The same in a Burst compiled code path with safety enabled
- Again the same in a Burst compiled code path with safety disabled

against:
- The UnityEngine built-in Allocator.Temp
- The UnityEngine built-in Allocator.TempJob
- The UnityEngine built-in Allocator.Persistent

---

## Container Benchmarking and Performance Tests

Container performance testing and benchmarks are built around a small handful of types.

|Type|Description|
|---|---|
|`BenchmarkContainerType`|This enum defines variations for Native and Unsafe containers with and without Burst compilation - with and without safety enabled. See the inline documentation for full details.|
|`IBenchmarkContainer`|Tests are written as implementations of this interface. It provides means for generic int parameters, allocation and disposal of Native, Unsafe, and C# Base Class Library containers, and measurement of the same.|
|`BenchmarkContainerRunner`|Easy-to-use API for running measurements in a single call. See inline documentation for full details, and see below for example usage.|
|`IBenchmarkContainerParallel`|Similar to `IBenchmarkContainer`, but designed to support tightly designed measurement code with Unity Job system workers in mind.|
|`BenchmarkContainerRunnerParallel`|Similar to `BenchmarkContainerRunner`, but designed to parameterize worker thread counts for performance testing and benchmarking parallel container implementations.|

---

### Example Code - List.Add

Here is a basic real-world example of implementing a performance test and benchmark comparison for lists. This measures the cost of simply adding elements to a list with the expected capacity pre-allocated.

```
    struct ListAdd : IBenchmarkContainer
    {
        int capacity;
        NativeList<int> nativeContainer;
        UnsafeList<int> unsafeContainer;

        // Store the number of elements that the Measure* methods will add.
        void IBenchmarkContainer.SetParams(int capacity, params int[] args) => this.capacity = capacity;

        // Allocate each container variant with the expected capacity up front.
        public void AllocNativeContainer(int capacity) => ListUtil.AllocInt(ref nativeContainer, capacity, false);
        public void AllocUnsafeContainer(int capacity) => ListUtil.AllocInt(ref unsafeContainer, capacity, false);
        public object AllocBclContainer(int capacity) => ListUtil.AllocBclContainer(capacity, false);

        // Measured code: add `capacity` elements to each container variant.
        public void MeasureNativeContainer()
        {
            for (int i = 0; i < capacity; i++)
                nativeContainer.Add(i);
        }
        public void MeasureUnsafeContainer()
        {
            for (int i = 0; i < capacity; i++)
                unsafeContainer.Add(i);
        }
        public void MeasureBclContainer(object container)
        {
            var bclContainer = (System.Collections.Generic.List<int>)container;
            for (int i = 0; i < capacity; i++)
                bclContainer.Add(i);
        }
    }
```
The calling code to run these measurements is quite simple. It generates a multitude of Performance Test Framework tests, which can be run from the Unity Test Runner as well as through CI regression checks. It also supports the code paths needed for benchmarking, so that performance comparisons can be made across all the variations *including* the BCL variation. Note that the BCL variation, `System.Collections.Generic.List`, will not appear as a Performance Test Framework test; it is considered for benchmarking only.
```
    [Benchmark(typeof(BenchmarkContainerType))]
    class List
    {
        ...
        [Test, Performance]
        [Category("Performance")]
        public unsafe void Add(
            [Values(10000, 100000, 1000000)] int insertions,
            [Values] BenchmarkContainerType type)
        {
            BenchmarkContainerRunner<ListAdd>.Run(insertions, type);
        }
        ...
    }
```
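
For intuition about what the BCL variation measures, here is a minimal, framework-free sketch that times `System.Collections.Generic.List<int>.Add` with the capacity pre-allocated, using `System.Diagnostics.Stopwatch`. It is not how `BenchmarkContainerRunner` takes its measurements: warm-up, sample counts, and reporting are handled by the framework and are not shown in this document. The `BclListAddSketch` class and its method names are hypothetical, and the sketch takes a single sample only, so its numbers will be noisy.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-alone sketch; not part of com.unity.collections.
public static class BclListAddSketch
{
    // Stands in for ListUtil.AllocBclContainer (implementation not shown in this README):
    // a list with the expected capacity reserved up front.
    public static List<int> Alloc(int capacity) => new List<int>(capacity);

    // Same loop as MeasureBclContainer: add `capacity` ints to the pre-allocated list.
    public static void Measure(List<int> list, int capacity)
    {
        for (int i = 0; i < capacity; i++)
            list.Add(i);
    }

    // Times one run in milliseconds. Allocation happens before the stopwatch starts,
    // so only the Add loop is timed.
    public static double TimeOnceMs(int capacity)
    {
        var list = Alloc(capacity);
        var stopwatch = Stopwatch.StartNew();
        Measure(list, capacity);
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        // Same insertion counts as the [Values] attribute in the test above.
        foreach (int insertions in new[] { 10000, 100000, 1000000 })
            Console.WriteLine($"Add({insertions}): {TimeOnceMs(insertions):F3}ms");
    }
}
```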

---

### Results - List.Add

The two code snippets above generate something like the following (notice that the BCL tests aren't generated):

![Performance Test Framework example](PerformanceTestFrameworkOutputListAdd.png)

Running the `DOTS/Unity.Collections/Generate Container Benchmarks` menu item will generate a markdown report, again running the same single code path per type. Here is a snippet of the full results showing only the output for `List.Add`:

> ### *List*
> 
> | Functionality | NativeList (S) | NativeList (S+B) | NativeList (B) | UnsafeList (S) | UnsafeList (S+B) | UnsafeList (B) | *List (BCL)* |
> |---|--:|--:|--:|--:|--:|--:|--:|
> | `Add(10000)` | 0.178ms <span style="color:red">(0.1x)</span>&nbsp;🟠 | 0.057ms  <span style="color:red">(0.3x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 0.018ms  <span style="color:red">(0.8x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 0.041ms <span style="color:red">(0.4x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 0.006ms  <span style="color:green">(2.3x)</span>&nbsp;🟢 | 0.014ms  <span style="color:green">(1.1x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *0.015ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
> | `Add(100000)` | 1.827ms <span style="color:red">(0.1x)</span>&nbsp;🟠 | 0.622ms  <span style="color:red">(0.2x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 0.180ms  <span style="color:red">(0.8x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 0.432ms <span style="color:red">(0.3x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 0.061ms  <span style="color:green">(2.4x)</span>&nbsp;🟢 | 0.139ms  <span style="color:green">(1.1x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *0.146ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
> | `Add(1000000)` | 18.910ms <span style="color:red">(0.1x)</span>&nbsp;🟠 | 6.443ms  <span style="color:red">(0.2x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 1.814ms  <span style="color:red">(0.8x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 4.136ms <span style="color:red">(0.4x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 0.586ms  <span style="color:green">(2.5x)</span>&nbsp;🟢 | 1.482ms  <span style="color:grey">(1.0x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *1.468ms <span style="color:grey">(1.0x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
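
The multipliers in parentheses appear to be the baseline time divided by the measured time, so values above 1.0x are faster than the BCL baseline and values below 1.0x are slower. This is an inference from the numbers in the table rather than something the report states, and because the printed times are rounded, recomputing a ratio from them can differ in the last digit. A minimal sketch of that arithmetic, using a hypothetical `SpeedRatio` helper and the `Add(1000000)` row:

```csharp
using System;
using System.Globalization;

// Hypothetical helper; illustrates the apparent "(N.Nx)" arithmetic only.
public static class SpeedRatio
{
    // How many times faster than the baseline a measurement is.
    public static double Compute(double baselineMs, double measuredMs) => baselineMs / measuredMs;

    // Formats the ratio in the style the report uses, with one decimal place.
    public static string Format(double baselineMs, double measuredMs) =>
        "(" + Compute(baselineMs, measuredMs).ToString("F1", CultureInfo.InvariantCulture) + "x)";

    public static void Main()
    {
        // List (BCL) 1.468ms against UnsafeList (S+B) 0.586ms.
        Console.WriteLine(Format(1.468, 0.586));   // prints "(2.5x)"
        // List (BCL) 1.468ms against NativeList (S) 18.910ms.
        Console.WriteLine(Format(1.468, 18.910));  // prints "(0.1x)"
    }
}
```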

---

## Allocator Benchmarking and Performance Tests

Allocator performance testing and benchmarks are built around a small handful of types.

|Type|Description|
|---|---|
|`BenchmarkAllocatorType`|This enum defines variations for allocators with and without Burst compilation - with and without safety enabled. See the inline documentation for full details.|
|`IBenchmarkAllocator`|Tests are written as implementations of this interface. It provides means for generic int parameters, creation and destruction of allocators, allocation and freeing of memory using these allocators as well as using Unity Engine's built-in allocators `Temp`, `TempJob`, and `Persistent`, and measurement of the same.|
|`BenchmarkAllocatorRunner`|Easy-to-use API for running measurements in a single call. See inline documentation for full details, and see below for example usage.|
|`BenchmarkAllocatorUtil`|Generalized API for simplifying common Setup and Teardown implementations of `IBenchmarkAllocator` derived test types.|

---

### Example Code - RewindableAllocator.FixedSize

The following example omits the definition of `RewindableAllocationInfo`, another utility type designed for RewindableAllocator. That type simplifies the setup, teardown, and `Rewind` functionality needed on a per-test-run basis. See [RewindableAllocatorPerformanceTests.cs](RewindableAllocatorPerformanceTests.cs) for reference.

```
    struct Rewindable_FixedSize : IBenchmarkAllocator
    {
        // Utility type that wraps allocator setup, teardown, and Rewind (definition omitted).
        RewindableAllocationInfo allocInfo;

        public void CreateAllocator(Allocator builtinOverride) => allocInfo.CreateAllocator(builtinOverride);
        public void DestroyAllocator() => allocInfo.DestroyAllocator();
        public void Setup(int workers, int size, int allocations) =>
            allocInfo.Setup(workers, size, 0, allocations);
        public void Teardown() => allocInfo.Teardown();
        // Measured code: perform the allocations for the given worker index.
        public void Measure(int workerI) => allocInfo.Allocate(workerI);
    }
```
The calling code to run these measurements is quite simple. It generates a multitude of Performance Test Framework tests, which can be run from the Unity Test Runner as well as through CI regression checks. It also supports the code paths needed for benchmarking, so that performance comparisons can be made across all the variations *including* the `Temp`, `TempJob`, and `Persistent` variations. Note that these Unity Engine built-in allocator variations will not appear as Performance Test Framework tests; they are considered for benchmarking only.
```
    [Benchmark(typeof(BenchmarkAllocatorType))]
    [BenchmarkNameOverride("RewindableAllocator")]
    class RewindableAllocatorBenchmark
    {
        ...
        [Test, Performance]
        [Category("Performance")]
        [BenchmarkTestFootnote]
        public void FixedSize(
            [Values(1, 2, 4, 8)] int workerThreads,
            [Values(1024, 1024 * 1024)] int allocSize,
            [Values] BenchmarkAllocatorType type)
        {
            BenchmarkAllocatorRunner<Rewindable_FixedSize>.Run(type, allocSize, workerThreads);
        }
        ...
    }
```
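
NUnit combines multiple `[Values]` parameters combinatorially by default, so the four `workerThreads` values and two `allocSize` values above yield eight parameter pairs for each `BenchmarkAllocatorType`. The sketch below enumerates those pairs using the `FixedSize(workerThreads, allocSize)` label format that appears in the results report further down. The `FixedSizeCases` class is hypothetical, and grouping the labels by allocation size is a choice made to match the report's row order, not something taken from the framework's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; enumerates the parameter pairs behind the FixedSize test.
public static class FixedSizeCases
{
    // Same values as the [Values] attributes on the FixedSize test.
    static readonly int[] WorkerThreads = { 1, 2, 4, 8 };
    static readonly int[] AllocSizes = { 1024, 1024 * 1024 };

    // One label per (workerThreads, allocSize) pair, grouped by allocation size.
    public static List<string> Labels()
    {
        var labels = new List<string>();
        foreach (int size in AllocSizes)
            foreach (int workers in WorkerThreads)
                labels.Add($"FixedSize({workers}, {size})");
        return labels;
    }

    public static void Main()
    {
        // Prints eight labels, from FixedSize(1, 1024) to FixedSize(8, 1048576).
        foreach (string label in Labels())
            Console.WriteLine(label);
    }
}
```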

---

### Results - RewindableAllocator.FixedSize

The two code snippets above generate something like the following (notice that tests for the built-in `Temp`, `TempJob`, and `Persistent` allocators aren't generated):

![Performance Test Framework example](PerformanceTestFrameworkOutputFixedSize.png)

Running the `DOTS/Unity.Collections/Generate Allocator Benchmarks` menu item will generate a markdown report, again running the same single code path per type. Here is a snippet of the full results showing only the output for `RewindableAllocator.FixedSize`:

> ### *RewindableAllocator*
> 
> | Functionality | RewindableAllocator (S) | RewindableAllocator (S+B) | RewindableAllocator (B) | *TempJob (E)* | *Temp (E)* | *Persistent (E)* |
> |---|--:|--:|--:|--:|--:|--:|
> | `FixedSize(1, 1024)`*³* | 11.4µs  <span style="color:green">(2.5x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 3.9µs   <span style="color:green">(7.3x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 3.6µs   <span style="color:green">(7.9x)</span>&nbsp;🟢 | *13.6µs  <span style="color:green">(2.1x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *10.2µs   <span style="color:green">(2.8x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *28.6µs <span style="color:grey">(1.0x)</span>*&nbsp;🟠 |
> | `FixedSize(2, 1024)`*²˒³* | 27.8µs  <span style="color:green">(2.5x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 17.7µs   <span style="color:green">(3.9x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 8.8µs   <span style="color:green">(7.9x)</span>&nbsp;🟢 | *19.3µs  <span style="color:green">(3.6x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *10.6µs   <span style="color:green">(6.5x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *69.1µs <span style="color:grey">(1.0x)</span>*&nbsp;🟠 |
> | `FixedSize(4, 1024)`*²˒³* | 65.3µs  <span style="color:green">(1.9x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 73.1µs   <span style="color:green">(1.7x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 66.8µs   <span style="color:green">(1.8x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *28.2µs  <span style="color:green">(4.3x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *11.8µs  <span style="color:green">(10.3x)</span>*&nbsp;🟢 | *121.8µs <span style="color:grey">(1.0x)</span>*&nbsp;🟠 |
> | `FixedSize(8, 1024)`*²˒³* | 141.5µs  <span style="color:green">(2.1x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 133.3µs   <span style="color:green">(2.3x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 158.5µs   <span style="color:green">(1.9x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *46.0µs  <span style="color:green">(6.6x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *11.6µs  <span style="color:green">(26.2x)</span>*&nbsp;🟢 | *303.9µs <span style="color:grey">(1.0x)</span>*&nbsp;🟠 |
> | `FixedSize(1, 1048576)`*³* | 12.3µs <span style="color:green">(16.5x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 4.6µs  <span style="color:green">(44.2x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 4.2µs  <span style="color:green">(48.4x)</span>&nbsp;🟢 | *17.3µs <span style="color:green">(11.8x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *10.5µs  <span style="color:green">(19.4x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *203.3µs <span style="color:grey">(1.0x)</span>*&nbsp;🟠 |
> | `FixedSize(2, 1048576)`*²˒³* | 24.7µs <span style="color:green">(12.1x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 14.9µs  <span style="color:green">(20.0x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 10.4µs  <span style="color:green">(28.7x)</span>&nbsp;🟢 | *27.7µs <span style="color:green">(10.8x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *11.3µs  <span style="color:green">(26.4x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *298.4µs <span style="color:grey">(1.0x)</span>*&nbsp;🟠 |
> | `FixedSize(4, 1048576)`*²˒³* | 70.8µs <span style="color:green">(12.4x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 77.5µs  <span style="color:green">(11.3x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 72.5µs  <span style="color:green">(12.1x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *199.5µs  <span style="color:green">(4.4x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *12.5µs  <span style="color:green">(70.2x)</span>*&nbsp;🟢 | *877.7µs <span style="color:grey">(1.0x)</span>*&nbsp;🟠 |
> | `FixedSize(8, 1048576)`*²˒³* | 152.0µs <span style="color:green">(14.5x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 155.2µs  <span style="color:green">(14.2x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 160.9µs  <span style="color:green">(13.7x)</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *1010.8µs  <span style="color:green">(2.2x)</span>*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | *12.4µs <span style="color:green">(177.2x)</span>*&nbsp;🟢 | *2197.7µs <span style="color:grey">(1.0x)</span>*&nbsp;🟠 |
>
> *²* Benchmark run on parallel job workers - results may vary<br/>
> *³* FixedSize(workerThreads, allocSize)<br/>