
drm/doc/rfc: i915 DG1 uAPI

Add an entry for the new uAPI needed for DG1. Also add the overall
upstream plan, including some notes for the TTM conversion.

v2(Daniel):
- include the overall upstreaming plan
- add a note for mmap, there are differences here for TTM vs i915
- bunch of other suggestions from Daniel
v3:
(Daniel)
- add a note for set/get caching stuff
- add some more docs for existing query and extensions stuff
- add an actual code example for regions query
- bunch of other stuff
(Jason)
- uAPI change(!):
- try a simpler design with the placements extension
- rather than have a generic setparam which can cover multiple
use cases, have each extension be responsible for one thing
only
v4:
(Daniel)
- add some more notes for ttm conversion
- bunch of other stuff
(Jason)
- uAPI change(!):
- drop all the extra rsvd members for the region_query and
region_info, just keep the bare minimum needed for padding
v5:
(Jason)
- for the upstream plan, add a requirement that we send the uAPI bits
again for final sign off before turning it on for real
- document how we intend to extend the rsvd bits for the region query
(Kenneth)
- improve the comment for the smem+lmem mmap mode and caching

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: mesa-dev@lists.freedesktop.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jon Bloomfield <jon.bloomfield@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210429103056.407067-1-matthew.auld@intel.com

---
 Documentation/gpu/rfc/i915_gem_lmem.h   | 237 ++++++++++++++++++++++++++
 Documentation/gpu/rfc/i915_gem_lmem.rst | 131 ++++++++++++++
 Documentation/gpu/rfc/index.rst         |   4 +
 3 files changed, 372 insertions(+)

Documentation/gpu/rfc/i915_gem_lmem.h

/**
 * enum drm_i915_gem_memory_class - Supported memory classes
 */
enum drm_i915_gem_memory_class {
        /** @I915_MEMORY_CLASS_SYSTEM: System memory */
        I915_MEMORY_CLASS_SYSTEM = 0,
        /** @I915_MEMORY_CLASS_DEVICE: Device local-memory */
        I915_MEMORY_CLASS_DEVICE,
};

/**
 * struct drm_i915_gem_memory_class_instance - Identify particular memory region
 */
struct drm_i915_gem_memory_class_instance {
        /** @memory_class: See enum drm_i915_gem_memory_class */
        __u16 memory_class;

        /** @memory_instance: Which instance */
        __u16 memory_instance;
};

/**
 * struct drm_i915_memory_region_info - Describes one region as known to the
 * driver.
 *
 * Note that we reserve some stuff here for potential future work. As an
 * example we might want to expose the capabilities for a given region, which
 * could include things like whether the region is CPU mappable/accessible,
 * what the supported mapping types are, etc.
 *
 * Note that to extend struct drm_i915_memory_region_info and struct
 * drm_i915_query_memory_regions in the future the plan is to do the following:
 *
 * .. code-block:: C
 *
 *      struct drm_i915_memory_region_info {
 *              struct drm_i915_gem_memory_class_instance region;
 *              union {
 *                      __u32 rsvd0;
 *                      __u32 new_thing1;
 *              };
 *              ...
 *              union {
 *                      __u64 rsvd1[8];
 *                      struct {
 *                              __u64 new_thing2;
 *                              __u64 new_thing3;
 *                              ...
 *                      };
 *              };
 *      };
 *
 * With this, things should remain source compatible between versions for
 * userspace, even as we add new fields.
 *
 * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
 * For this new query we are adding the new query id
 * DRM_I915_QUERY_MEMORY_REGIONS at &drm_i915_query_item.query_id.
 */
struct drm_i915_memory_region_info {
        /** @region: The class:instance pair encoding */
        struct drm_i915_gem_memory_class_instance region;

        /** @rsvd0: MBZ */
        __u32 rsvd0;

        /** @probed_size: Memory probed by the driver (-1 = unknown) */
        __u64 probed_size;

        /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
        __u64 unallocated_size;

        /** @rsvd1: MBZ */
        __u64 rsvd1[8];
};

/**
 * struct drm_i915_query_memory_regions
 *
 * The region info query enumerates all regions known to the driver by filling
 * in an array of struct drm_i915_memory_region_info structures.
 *
 * Example for getting the list of supported regions:
 *
 * .. code-block:: C
 *
 *      struct drm_i915_query_memory_regions *info;
 *      struct drm_i915_query_item item = {
 *              .query_id = DRM_I915_QUERY_MEMORY_REGIONS,
 *      };
 *      struct drm_i915_query query = {
 *              .num_items = 1,
 *              .items_ptr = (uintptr_t)&item,
 *      };
 *      int err, i;
 *
 *      // First query the size of the blob we need, this needs to be large
 *      // enough to hold our array of regions. The kernel will fill out the
 *      // item.length for us, which is the number of bytes we need.
 *      err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
 *      if (err) ...
 *
 *      info = calloc(1, item.length);
 *      // Now that we allocated the required number of bytes, we call the ioctl
 *      // again, this time with the data_ptr pointing to our newly allocated
 *      // blob, which the kernel can then populate with all the region info.
 *      item.data_ptr = (uintptr_t)info;
 *
 *      err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
 *      if (err) ...
 *
 *      // We can now access each region in the array
 *      for (i = 0; i < info->num_regions; i++) {
 *              struct drm_i915_memory_region_info mr = info->regions[i];
 *              __u16 class = mr.region.memory_class;
 *              __u16 instance = mr.region.memory_instance;
 *
 *              ....
 *      }
 *
 *      free(info);
 */
struct drm_i915_query_memory_regions {
        /** @num_regions: Number of supported regions */
        __u32 num_regions;

        /** @rsvd: MBZ */
        __u32 rsvd[3];

        /** @regions: Info about each supported region */
        struct drm_i915_memory_region_info regions[];
};

#define DRM_I915_GEM_CREATE_EXT         0xdeadbeaf
#define DRM_IOCTL_I915_GEM_CREATE_EXT   DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext)

/**
 * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added
 * extension support using struct i915_user_extension.
 *
 * Note that in the future we want to have our buffer flags here, at least for
 * the stuff that is immutable. Previously we would have two ioctls, one to
 * create the object with gem_create, and another to apply various parameters;
 * however this creates some ambiguity for the params which are considered
 * immutable. Also in general we're phasing out the various SET/GET ioctls.
 */
struct drm_i915_gem_create_ext {
        /**
         * @size: Requested size for the object.
         *
         * The (page-aligned) allocated size for the object will be returned.
         *
         * Note that for some devices we might have further minimum page-size
         * restrictions (larger than 4K), like for device local-memory.
         * However in general the final size here should always reflect any
         * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
         * extension to place the object in device local-memory.
         */
        __u64 size;
        /**
         * @handle: Returned handle for the object.
         *
         * Object handles are nonzero.
         */
        __u32 handle;
        /** @flags: MBZ */
        __u32 flags;
        /**
         * @extensions: The chain of extensions to apply to this object.
         *
         * This will be useful in the future when we need to support several
         * different extensions, and we need to apply more than one when
         * creating the object. See struct i915_user_extension.
         *
         * If we don't supply any extensions then we get the same old gem_create
         * behaviour.
         *
         * For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see
         * struct drm_i915_gem_create_ext_memory_regions.
         */
#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
        __u64 extensions;
};

/**
 * struct drm_i915_gem_create_ext_memory_regions - The
 * I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.
 *
 * Set the object with the desired set of placements/regions in priority
 * order. Each entry must be unique and supported by the device.
 *
 * This is provided as an array of struct drm_i915_gem_memory_class_instance, or
 * an equivalent layout of class:instance pair encodings. See struct
 * drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to
 * query the supported regions for a device.
 *
 * As an example, on discrete devices, if we wish to set the placement as
 * device local-memory we can do something like:
 *
 * .. code-block:: C
 *
 *      struct drm_i915_gem_memory_class_instance region_lmem = {
 *              .memory_class = I915_MEMORY_CLASS_DEVICE,
 *              .memory_instance = 0,
 *      };
 *      struct drm_i915_gem_create_ext_memory_regions regions = {
 *              .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
 *              .regions = (uintptr_t)&region_lmem,
 *              .num_regions = 1,
 *      };
 *      struct drm_i915_gem_create_ext create_ext = {
 *              .size = 16 * PAGE_SIZE,
 *              .extensions = (uintptr_t)&regions,
 *      };
 *
 *      int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
 *      if (err) ...
 *
 * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
 * along with the final object size in &drm_i915_gem_create_ext.size, which
 * should account for any rounding up, if required.
 */
struct drm_i915_gem_create_ext_memory_regions {
        /** @base: Extension link. See struct i915_user_extension. */
        struct i915_user_extension base;

        /** @pad: MBZ */
        __u32 pad;
        /** @num_regions: Number of elements in the @regions array. */
        __u32 num_regions;
        /**
         * @regions: The regions/placements array.
         *
         * An array of struct drm_i915_gem_memory_class_instance.
         */
        __u64 regions;
};
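
Editor's note: the following sketch is not part of the patch; it is added here
purely to show how the two new uAPI pieces fit together, assuming the
definitions from i915_gem_lmem.h above eventually land in drm/i915_drm.h. It
queries the supported regions, picks the first device local-memory instance,
then creates an object placed lmem-first with a smem fallback. Error handling
is trimmed, and create_lmem_bo is just a made-up helper name.

/*
 * Editor's sketch only, not actual i915 or Mesa code. Assumes the RFC
 * definitions above are available via an updated drm/i915_drm.h.
 */
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static uint32_t create_lmem_bo(int fd, uint64_t size)
{
        struct drm_i915_gem_memory_class_instance placements[2];
        struct drm_i915_query_memory_regions *info;
        struct drm_i915_query_item item = {
                .query_id = DRM_I915_QUERY_MEMORY_REGIONS,
        };
        struct drm_i915_query query = {
                .num_items = 1,
                .items_ptr = (uintptr_t)&item,
        };
        uint32_t num_placements = 0;
        uint32_t handle;
        __u32 i;

        /* First call sizes the blob, second call fills it in. */
        ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
        info = calloc(1, item.length);
        item.data_ptr = (uintptr_t)info;
        ioctl(fd, DRM_IOCTL_I915_QUERY, &query);

        /* Prefer the first device local-memory region, if any. */
        for (i = 0; i < info->num_regions; i++) {
                if (info->regions[i].region.memory_class == I915_MEMORY_CLASS_DEVICE) {
                        placements[num_placements++] = info->regions[i].region;
                        break;
                }
        }

        /* Always allow falling back to system memory. */
        placements[num_placements].memory_class = I915_MEMORY_CLASS_SYSTEM;
        placements[num_placements].memory_instance = 0;
        num_placements++;

        struct drm_i915_gem_create_ext_memory_regions regions = {
                .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
                .num_regions = num_placements,
                .regions = (uintptr_t)placements,
        };
        struct drm_i915_gem_create_ext create_ext = {
                .size = size,
                .extensions = (uintptr_t)&regions,
        };

        ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
        handle = create_ext.handle;

        free(info);
        return handle;
}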

Documentation/gpu/rfc/i915_gem_lmem.rst

=========================
I915 DG1/LMEM RFC Section
=========================

Upstream plan
=============
For upstream, the overall plan for landing all the DG1 stuff and turning it on
for real, with all the uAPI bits, is:

* Merge basic HW enabling of DG1 (still without pciid)
* Merge the uAPI bits behind a special CONFIG_BROKEN (or so) flag
    * At this point we can still make changes, but importantly this lets us
      start running IGTs which can utilize local-memory in CI
* Convert over to TTM, make sure it all keeps working. Some of the work items:
    * TTM shrinker for discrete
    * dma_resv_lockitem for full dma_resv_lock, i.e. not just trylock
    * Use TTM CPU pagefault handler
    * Route shmem backend over to TTM SYSTEM for discrete
    * TTM purgeable object support
    * Move i915 buddy allocator over to TTM
    * MMAP ioctl mode (see `I915 MMAP`_)
    * SET/GET ioctl caching (see `I915 SET/GET CACHING`_)
* Send RFC (with mesa-dev on cc) for final sign off on the uAPI
* Add pciid for DG1 and turn on uAPI for real

New object placement and region query uAPI
==========================================
Starting from DG1 we need to give userspace the ability to allocate buffers from
device local-memory. Currently the driver supports gem_create, which can place
buffers in system memory via shmem, and the usual assortment of other
interfaces, like dumb buffers and userptr.

To support this new capability, while also providing a uAPI which will work
beyond just DG1, we propose to offer three new bits of uAPI:

DRM_I915_QUERY_MEMORY_REGIONS
-----------------------------
New query ID which allows userspace to discover the list of supported memory
regions (like system-memory and local-memory) for a given device. We identify
each region with a class and instance pair, which should be unique. The class
here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
like DG1.

Side note: The class/instance design is borrowed from our existing engine uAPI,
where we describe every physical engine in terms of its class, and the
particular instance, since we can have more than one per class.

In the future we also want to expose more information which can further
describe the capabilities of a region.

.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
   :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions

GEM_CREATE_EXT
--------------
New ioctl which is basically just gem_create but now allows userspace to provide
a chain of possible extensions. Note that if we don't provide any extensions and
set flags=0 then we get the exact same behaviour as gem_create.

Side note: We also need to support PXP[1] in the near future, which is also
applicable to integrated platforms, and adds its own gem_create_ext extension,
which basically lets userspace mark a buffer as "protected".

.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
   :functions: drm_i915_gem_create_ext

I915_GEM_CREATE_EXT_MEMORY_REGIONS
----------------------------------
Implemented as an extension for gem_create_ext, we would now allow userspace to
optionally provide an immutable list of preferred placements at creation time,
in priority order, for a given buffer object. For the placements we expect each
to use the class/instance encoding, as per the output of the regions query.
Having the list in priority order will be useful in the future when placing an
object, say during eviction.

.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h
   :functions: drm_i915_gem_create_ext_memory_regions

One fair criticism here is that this seems a little over-engineered[2]. If we
just consider DG1 then yes, a simple gem_create.flags or something is totally
all that's needed to tell the kernel to allocate the buffer in local-memory or
whatever. However, looking to the future we need uAPI which can also support
the upcoming Xe HP multi-tile architecture in a sane way, where there can be
multiple local-memory instances for a given device, and so using both class and
instance in our uAPI to describe regions is desirable, although specifically
for DG1 it's uninteresting, since we only have a single local-memory instance.

Existing uAPI issues
====================
Some potential issues we still need to resolve.

I915 MMAP
---------
In i915 there are multiple ways to mmap a GEM object, including mapping the
same object using different mapping types (WC vs WB), i.e. multiple active
mmaps per object. TTM expects at most one mmap for the lifetime of the object.
If it turns out that we have to backpedal here, there might be some potential
userspace fallout.
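
To make the potential fallout concrete, here is a rough illustration (added for
this write-up, not part of the original RFC text) of what userspace can already
do today with the existing MMAP_OFFSET uAPI: mapping the same object twice with
different caching modes. Error handling is omitted; this is exactly the pattern
that a strict one-mmap-per-object rule would break.

.. code-block:: C

    /* Two simultaneous CPU mappings of one object, WC and WB. */
    struct drm_i915_gem_mmap_offset wc = {
        .handle = handle,
        .flags = I915_MMAP_OFFSET_WC,
    };
    struct drm_i915_gem_mmap_offset wb = {
        .handle = handle,
        .flags = I915_MMAP_OFFSET_WB,
    };
    void *ptr_wc, *ptr_wb;

    ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &wc);
    ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &wb);

    ptr_wc = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, wc.offset);
    ptr_wb = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, wb.offset);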
I915 SET/GET CACHING
--------------------
In i915 we have the set/get_caching ioctls. TTM doesn't let us change this, but
DG1 doesn't support non-snooped pcie transactions, so we can just always
allocate as WB for smem-only buffers. If/when our hw gains support for
non-snooped pcie transactions then we must fix this mode at allocation time as
a new GEM extension.

This is related to the mmap problem, because in general (meaning, when we're
not running on Intel CPUs) the CPU mmap must not, ever, be inconsistent with
the allocation mode.

A possible idea is to let the kernel pick the mmap mode for userspace from the
following table:

smem-only: WB. Userspace does not need to call clflush.

smem+lmem: We only ever allow a single mode, so simply allocate this as uncached
memory, and always give userspace a WC mapping. The GPU still does snooped
access here (assuming we can't turn it off like on DG1), which is a bit
inefficient.

lmem-only: always WC.

This means that on discrete you only get a single mmap mode; all others must be
rejected. That's probably going to be a new default mode or something like
that.

Links
=====
[1] https://patchwork.freedesktop.org/series/86798/

[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791
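
Editor's aside (not part of the patch): the mmap-mode table in the
`I915 SET/GET CACHING`_ section above boils down to a rule along these lines;
the helper name and the boolean inputs are made up purely for illustration.

#include <stdbool.h>

/* Illustration only: a single kernel-chosen CPU mmap mode per object. */
enum cpu_mmap_mode { CPU_MMAP_WB, CPU_MMAP_WC };

static enum cpu_mmap_mode pick_cpu_mmap_mode(bool can_place_in_smem,
                                             bool can_place_in_lmem)
{
        /* smem-only: WB, userspace never needs to clflush */
        if (can_place_in_smem && !can_place_in_lmem)
                return CPU_MMAP_WB;

        /* smem+lmem or lmem-only: always a single WC mapping */
        return CPU_MMAP_WC;
}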

Documentation/gpu/rfc/index.rst

@@ -15,3 +15,7 @@
 
 * Once the code has landed move all the documentation to the right places in
   the main core, helper or driver sections.
+
+.. toctree::
+
+   i915_gem_lmem.rst