Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'drm-intel-gt-next-2022-11-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

Driver Changes:

- Fix for #7306: [Arc A380] white flickering when using Arc as a
secondary GPU (Matt A)
- Add Wa_18017747507 for DG2 (Wayne)
- Avoid spurious WARN on DG1 due to incorrect cache_dirty flag
(Niranjana, Matt A)
- Corrections to CS timestamp support for Gen5 and earlier (Ville)

- Fix a build error seen with the clang compiler in hwmon (GG)
- Improvements to LMEM handling with RPM (Anshuman, Matt A)
- Cleanups in dmabuf code (Mike)

- Selftest improvements (Matt A)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y2N11wu175p6qeEN@jlahtine-mobl.ger.corp.intel.com

+6696 -2060
+75
Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
···
+ What:		/sys/devices/.../hwmon/hwmon<i>/in0_input
+ Date:		February 2023
+ KernelVersion:	6.2
+ Contact:	intel-gfx@lists.freedesktop.org
+ Description:	RO. Current Voltage in millivolt.
+
+ 		Only supported for particular Intel i915 graphics platforms.
+
+ What:		/sys/devices/.../hwmon/hwmon<i>/power1_max
+ Date:		February 2023
+ KernelVersion:	6.2
+ Contact:	intel-gfx@lists.freedesktop.org
+ Description:	RW. Card reactive sustained (PL1/Tau) power limit in microwatts.
+
+ 		The power controller will throttle the operating frequency
+ 		if the power averaged over a window (typically seconds)
+ 		exceeds this limit.
+
+ 		Only supported for particular Intel i915 graphics platforms.
+
+ What:		/sys/devices/.../hwmon/hwmon<i>/power1_rated_max
+ Date:		February 2023
+ KernelVersion:	6.2
+ Contact:	intel-gfx@lists.freedesktop.org
+ Description:	RO. Card default power limit (default TDP setting).
+
+ 		Only supported for particular Intel i915 graphics platforms.
+
+ What:		/sys/devices/.../hwmon/hwmon<i>/power1_max_interval
+ Date:		February 2023
+ KernelVersion:	6.2
+ Contact:	intel-gfx@lists.freedesktop.org
+ Description:	RW. Sustained power limit interval (Tau in PL1/Tau) in
+ 		milliseconds over which sustained power is averaged.
+
+ 		Only supported for particular Intel i915 graphics platforms.
+
+ What:		/sys/devices/.../hwmon/hwmon<i>/power1_crit
+ Date:		February 2023
+ KernelVersion:	6.2
+ Contact:	intel-gfx@lists.freedesktop.org
+ Description:	RW. Card reactive critical (I1) power limit in microwatts.
+
+ 		Card reactive critical (I1) power limit in microwatts is exposed
+ 		for client products. The power controller will throttle the
+ 		operating frequency if the power averaged over a window exceeds
+ 		this limit.
+
+ 		Only supported for particular Intel i915 graphics platforms.
+
+ What:		/sys/devices/.../hwmon/hwmon<i>/curr1_crit
+ Date:		February 2023
+ KernelVersion:	6.2
+ Contact:	intel-gfx@lists.freedesktop.org
+ Description:	RW. Card reactive critical (I1) power limit in milliamperes.
+
+ 		Card reactive critical (I1) power limit in milliamperes is
+ 		exposed for server products. The power controller will throttle
+ 		the operating frequency if the power averaged over a window
+ 		exceeds this limit.
+
+ 		Only supported for particular Intel i915 graphics platforms.
+
+ What:		/sys/devices/.../hwmon/hwmon<i>/energy1_input
+ Date:		February 2023
+ KernelVersion:	6.2
+ Contact:	intel-gfx@lists.freedesktop.org
+ Description:	RO. Energy input of device or gt in microjoules.
+
+ 		For i915 device level hwmon devices (name "i915") this
+ 		reflects energy input for the entire device. For gt level
+ 		hwmon devices (name "i915_gtN") this reflects energy input
+ 		for the gt.
+
+ 		Only supported for particular Intel i915 graphics platforms.
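The attributes above expose raw values in micro-units (microwatts, microjoules) or milli-units (millivolts, milliamperes), so a userspace consumer has to scale them. A minimal sketch of reading `power1_max`; the helper name and scaling wrapper are illustrative, not part of the i915 ABI:

```c
#include <assert.h>
#include <stdlib.h>

/* Convert a sysfs power reading (a decimal string in microwatts,
 * as read from e.g. power1_max) to watts. Hypothetical userspace
 * helper for illustration only. */
static double microwatts_to_watts(const char *sysfs_value)
{
	return strtoull(sysfs_value, NULL, 10) / 1e6;
}
```

In practice the string would come from reading `/sys/devices/.../hwmon/hwmon<i>/power1_max`, and writes back to the RW attributes take the same microwatt units.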
+1
MAINTAINERS
··· 10224 10224 B: https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs 10225 10225 C: irc://irc.oftc.net/intel-gfx 10226 10226 T: git git://anongit.freedesktop.org/drm-intel 10227 + F: Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon 10227 10228 F: Documentation/gpu/i915.rst 10228 10229 F: drivers/gpu/drm/i915/ 10229 10230 F: include/drm/i915*
+22 -4
drivers/gpu/drm/i915/Kconfig.profile
··· 57 57 default 640 # milliseconds 58 58 help 59 59 How long to wait (in milliseconds) for a preemption event to occur 60 - when submitting a new context via execlists. If the current context 61 - does not hit an arbitration point and yield to HW before the timer 62 - expires, the HW will be reset to allow the more important context 63 - to execute. 60 + when submitting a new context. If the current context does not hit 61 + an arbitration point and yield to HW before the timer expires, the 62 + HW will be reset to allow the more important context to execute. 63 + 64 + This is adjustable via 65 + /sys/class/drm/card?/engine/*/preempt_timeout_ms 66 + 67 + May be 0 to disable the timeout. 68 + 69 + The compiled in default may get overridden at driver probe time on 70 + certain platforms and certain engines which will be reflected in the 71 + sysfs control. 72 + 73 + config DRM_I915_PREEMPT_TIMEOUT_COMPUTE 74 + int "Preempt timeout for compute engines (ms, jiffy granularity)" 75 + default 7500 # milliseconds 76 + help 77 + How long to wait (in milliseconds) for a preemption event to occur 78 + when submitting a new context to a compute capable engine. If the 79 + current context does not hit an arbitration point and yield to HW 80 + before the timer expires, the HW will be reset to allow the more 81 + important context to execute. 64 82 65 83 This is adjustable via 66 84 /sys/class/drm/card?/engine/*/preempt_timeout_ms
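The "(ms, jiffy granularity)" note in the Kconfig title means the millisecond value is rounded up to whole scheduler ticks before use. A standalone sketch of that rounding, assuming HZ=250 for illustration (real kernels run at 100-1000 Hz, and the in-kernel `msecs_to_jiffies()` handles more cases):

```c
#include <assert.h>

#define HZ 250	/* assumed tick rate for this sketch; kernels vary */

/* Round a millisecond timeout up to whole jiffies, mimicking
 * msecs_to_jiffies() for the simple case where HZ divides 1000. */
static unsigned long msecs_to_jiffies_sketch(unsigned int ms)
{
	const unsigned int msec_per_jiffy = 1000 / HZ; /* 4 ms at HZ=250 */

	return (ms + msec_per_jiffy - 1) / msec_per_jiffy;
}
```

So the 640 ms default is exactly 160 ticks at HZ=250, while an odd value like 641 ms would round up to the next tick.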
+10 -4
drivers/gpu/drm/i915/Makefile
··· 209 209 # graphics system controller (GSC) support 210 210 i915-y += gt/intel_gsc.o 211 211 212 + # graphics hardware monitoring (HWMON) support 213 + i915-$(CONFIG_HWMON) += i915_hwmon.o 214 + 212 215 # modesetting core code 213 216 i915-y += \ 214 217 display/hsw_ips.o \ ··· 313 310 314 311 i915-y += i915_perf.o 315 312 316 - # Protected execution platform (PXP) support 317 - i915-$(CONFIG_DRM_I915_PXP) += \ 313 + # Protected execution platform (PXP) support. Base support is required for HuC 314 + i915-y += \ 318 315 pxp/intel_pxp.o \ 316 + pxp/intel_pxp_tee.o \ 317 + pxp/intel_pxp_huc.o 318 + 319 + i915-$(CONFIG_DRM_I915_PXP) += \ 319 320 pxp/intel_pxp_cmd.o \ 320 321 pxp/intel_pxp_debugfs.o \ 321 322 pxp/intel_pxp_irq.o \ 322 323 pxp/intel_pxp_pm.o \ 323 - pxp/intel_pxp_session.o \ 324 - pxp/intel_pxp_tee.o 324 + pxp/intel_pxp_session.o 325 325 326 326 # Post-mortem debug and GPU hang state capture 327 327 i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
+1
drivers/gpu/drm/i915/display/intel_dpt.c
··· 5 5 6 6 #include "gem/i915_gem_domain.h" 7 7 #include "gem/i915_gem_internal.h" 8 + #include "gem/i915_gem_lmem.h" 8 9 #include "gt/gen8_ppgtt.h" 9 10 10 11 #include "i915_drv.h"
-1
drivers/gpu/drm/i915/display/intel_fb_pin.c
··· 167 167 ret = i915_gem_object_attach_phys(obj, alignment); 168 168 else if (!ret && HAS_LMEM(dev_priv)) 169 169 ret = i915_gem_object_migrate(obj, &ww, INTEL_REGION_LMEM_0); 170 - /* TODO: Do we need to sync when migration becomes async? */ 171 170 if (!ret) 172 171 ret = i915_gem_object_pin_pages(obj); 173 172 if (ret)
+2 -2
drivers/gpu/drm/i915/display/intel_lpe_audio.c
··· 100 100 rsc[0].flags = IORESOURCE_IRQ; 101 101 rsc[0].name = "hdmi-lpe-audio-irq"; 102 102 103 - rsc[1].start = pci_resource_start(pdev, GTTMMADR_BAR) + 103 + rsc[1].start = pci_resource_start(pdev, GEN4_GTTMMADR_BAR) + 104 104 I915_HDMI_LPE_AUDIO_BASE; 105 - rsc[1].end = pci_resource_start(pdev, GTTMMADR_BAR) + 105 + rsc[1].end = pci_resource_start(pdev, GEN4_GTTMMADR_BAR) + 106 106 I915_HDMI_LPE_AUDIO_BASE + I915_HDMI_LPE_AUDIO_SIZE - 1; 107 107 rsc[1].flags = IORESOURCE_MEM; 108 108 rsc[1].name = "hdmi-lpe-audio-mmio";
+1 -2
drivers/gpu/drm/i915/gem/i915_gem_context.c
··· 1452 1452 int err; 1453 1453 1454 1454 /* serialises with execbuf */ 1455 - set_bit(CONTEXT_CLOSED_BIT, &ce->flags); 1455 + intel_context_close(ce); 1456 1456 if (!intel_context_pin_if_active(ce)) 1457 1457 continue; 1458 1458 ··· 2298 2298 } 2299 2299 2300 2300 args->ctx_id = id; 2301 - drm_dbg(&i915->drm, "HW context %d created\n", args->ctx_id); 2302 2301 2303 2302 return 0; 2304 2303
+26 -25
drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
··· 25 25 return to_intel_bo(buf->priv); 26 26 } 27 27 28 - static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachment, 28 + static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attach, 29 29 enum dma_data_direction dir) 30 30 { 31 - struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf); 32 - struct sg_table *st; 31 + struct drm_i915_gem_object *obj = dma_buf_to_obj(attach->dmabuf); 32 + struct sg_table *sgt; 33 33 struct scatterlist *src, *dst; 34 34 int ret, i; 35 35 36 - /* Copy sg so that we make an independent mapping */ 37 - st = kmalloc(sizeof(struct sg_table), GFP_KERNEL); 38 - if (st == NULL) { 36 + /* 37 + * Make a copy of the object's sgt, so that we can make an independent 38 + * mapping 39 + */ 40 + sgt = kmalloc(sizeof(*sgt), GFP_KERNEL); 41 + if (!sgt) { 39 42 ret = -ENOMEM; 40 43 goto err; 41 44 } 42 45 43 - ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL); 46 + ret = sg_alloc_table(sgt, obj->mm.pages->orig_nents, GFP_KERNEL); 44 47 if (ret) 45 48 goto err_free; 46 49 47 - src = obj->mm.pages->sgl; 48 - dst = st->sgl; 49 - for (i = 0; i < obj->mm.pages->nents; i++) { 50 + dst = sgt->sgl; 51 + for_each_sg(obj->mm.pages->sgl, src, obj->mm.pages->orig_nents, i) { 50 52 sg_set_page(dst, sg_page(src), src->length, 0); 51 53 dst = sg_next(dst); 52 - src = sg_next(src); 53 54 } 54 55 55 - ret = dma_map_sgtable(attachment->dev, st, dir, DMA_ATTR_SKIP_CPU_SYNC); 56 + ret = dma_map_sgtable(attach->dev, sgt, dir, DMA_ATTR_SKIP_CPU_SYNC); 56 57 if (ret) 57 58 goto err_free_sg; 58 59 59 - return st; 60 + return sgt; 60 61 61 62 err_free_sg: 62 - sg_free_table(st); 63 + sg_free_table(sgt); 63 64 err_free: 64 - kfree(st); 65 + kfree(sgt); 65 66 err: 66 67 return ERR_PTR(ret); 67 68 } ··· 237 236 static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj) 238 237 { 239 238 struct drm_i915_private *i915 = to_i915(obj->base.dev); 240 - struct sg_table *pages; 239 + struct sg_table 
*sgt; 241 240 unsigned int sg_page_sizes; 242 241 243 242 assert_object_held(obj); 244 243 245 - pages = dma_buf_map_attachment(obj->base.import_attach, 246 - DMA_BIDIRECTIONAL); 247 - if (IS_ERR(pages)) 248 - return PTR_ERR(pages); 244 + sgt = dma_buf_map_attachment(obj->base.import_attach, 245 + DMA_BIDIRECTIONAL); 246 + if (IS_ERR(sgt)) 247 + return PTR_ERR(sgt); 249 248 250 249 /* 251 250 * DG1 is special here since it still snoops transactions even with ··· 262 261 (!HAS_LLC(i915) && !IS_DG1(i915))) 263 262 wbinvd_on_all_cpus(); 264 263 265 - sg_page_sizes = i915_sg_dma_sizes(pages->sgl); 266 - __i915_gem_object_set_pages(obj, pages, sg_page_sizes); 264 + sg_page_sizes = i915_sg_dma_sizes(sgt->sgl); 265 + __i915_gem_object_set_pages(obj, sgt, sg_page_sizes); 267 266 268 267 return 0; 269 268 } 270 269 271 270 static void i915_gem_object_put_pages_dmabuf(struct drm_i915_gem_object *obj, 272 - struct sg_table *pages) 271 + struct sg_table *sgt) 273 272 { 274 - dma_buf_unmap_attachment(obj->base.import_attach, pages, 273 + dma_buf_unmap_attachment(obj->base.import_attach, sgt, 275 274 DMA_BIDIRECTIONAL); 276 275 } 277 276 ··· 314 313 get_dma_buf(dma_buf); 315 314 316 315 obj = i915_gem_object_alloc(); 317 - if (obj == NULL) { 316 + if (!obj) { 318 317 ret = -ENOMEM; 319 318 goto fail_detach; 320 319 }
-5
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
··· 2954 2954 int err; 2955 2955 2956 2956 for (n = 0; n < eb->num_fences; n++) { 2957 - struct drm_syncobj *syncobj; 2958 - unsigned int flags; 2959 - 2960 - syncobj = ptr_unpack_bits(eb->fences[n].syncobj, &flags, 2); 2961 - 2962 2957 if (!eb->fences[n].dma_fence) 2963 2958 continue; 2964 2959
+4 -15
drivers/gpu/drm/i915/gem/i915_gem_internal.c
··· 6 6 7 7 #include <linux/scatterlist.h> 8 8 #include <linux/slab.h> 9 - #include <linux/swiotlb.h> 10 9 11 10 #include "i915_drv.h" 12 11 #include "i915_gem.h" ··· 37 38 struct scatterlist *sg; 38 39 unsigned int sg_page_sizes; 39 40 unsigned int npages; 40 - int max_order; 41 + int max_order = MAX_ORDER; 42 + unsigned int max_segment; 41 43 gfp_t gfp; 42 44 43 - max_order = MAX_ORDER; 44 - #ifdef CONFIG_SWIOTLB 45 - if (is_swiotlb_active(obj->base.dev->dev)) { 46 - unsigned int max_segment; 47 - 48 - max_segment = swiotlb_max_segment(); 49 - if (max_segment) { 50 - max_segment = max_t(unsigned int, max_segment, 51 - PAGE_SIZE) >> PAGE_SHIFT; 52 - max_order = min(max_order, ilog2(max_segment)); 53 - } 54 - } 55 - #endif 45 + max_segment = i915_sg_segment_size(i915->drm.dev) >> PAGE_SHIFT; 46 + max_order = min(max_order, get_order(max_segment)); 56 47 57 48 gfp = GFP_KERNEL | __GFP_HIGHMEM | __GFP_RECLAIMABLE; 58 49 if (IS_I965GM(i915) || IS_I965G(i915)) {
+8 -13
drivers/gpu/drm/i915/gem/i915_gem_mman.c
··· 413 413 vma->mmo = mmo; 414 414 415 415 if (CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND) 416 - intel_wakeref_auto(&to_gt(i915)->userfault_wakeref, 416 + intel_wakeref_auto(&i915->runtime_pm.userfault_wakeref, 417 417 msecs_to_jiffies_timeout(CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND)); 418 418 419 419 if (write) { ··· 557 557 558 558 drm_vma_node_unmap(&bo->base.vma_node, bdev->dev_mapping); 559 559 560 - if (obj->userfault_count) { 561 - /* rpm wakeref provide exclusive access */ 562 - list_del(&obj->userfault_link); 563 - obj->userfault_count = 0; 564 - } 560 + /* 561 + * We have exclusive access here via runtime suspend. All other callers 562 + * must first grab the rpm wakeref. 563 + */ 564 + GEM_BUG_ON(!obj->userfault_count); 565 + list_del(&obj->userfault_link); 566 + obj->userfault_count = 0; 565 567 } 566 568 567 569 void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj) ··· 589 587 spin_lock(&obj->mmo.lock); 590 588 } 591 589 spin_unlock(&obj->mmo.lock); 592 - 593 - if (obj->userfault_count) { 594 - mutex_lock(&to_gt(to_i915(obj->base.dev))->lmem_userfault_lock); 595 - list_del(&obj->userfault_link); 596 - mutex_unlock(&to_gt(to_i915(obj->base.dev))->lmem_userfault_lock); 597 - obj->userfault_count = 0; 598 - } 599 590 } 600 591 601 592 static struct i915_mmap_offset *
+11 -1
drivers/gpu/drm/i915/gem/i915_gem_object.c
··· 458 458 io_mapping_unmap(src_map); 459 459 } 460 460 461 + static bool object_has_mappable_iomem(struct drm_i915_gem_object *obj) 462 + { 463 + GEM_BUG_ON(!i915_gem_object_has_iomem(obj)); 464 + 465 + if (IS_DGFX(to_i915(obj->base.dev))) 466 + return i915_ttm_resource_mappable(i915_gem_to_ttm(obj)->resource); 467 + 468 + return true; 469 + } 470 + 461 471 /** 462 472 * i915_gem_object_read_from_page - read data from the page of a GEM object 463 473 * @obj: GEM object to read from ··· 490 480 491 481 if (i915_gem_object_has_struct_page(obj)) 492 482 i915_gem_object_read_from_page_kmap(obj, offset, dst, size); 493 - else if (i915_gem_object_has_iomem(obj)) 483 + else if (i915_gem_object_has_iomem(obj) && object_has_mappable_iomem(obj)) 494 484 i915_gem_object_read_from_page_iomap(obj, offset, dst, size); 495 485 else 496 486 return -ENODEV;
+4
drivers/gpu/drm/i915/gem/i915_gem_object.h
··· 482 482 void *__must_check i915_gem_object_pin_map_unlocked(struct drm_i915_gem_object *obj, 483 483 enum i915_map_type type); 484 484 485 + enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915, 486 + struct drm_i915_gem_object *obj, 487 + bool always_coherent); 488 + 485 489 void __i915_gem_object_flush_map(struct drm_i915_gem_object *obj, 486 490 unsigned long offset, 487 491 unsigned long size);
+12
drivers/gpu/drm/i915/gem/i915_gem_pages.c
··· 466 466 return ret; 467 467 } 468 468 469 + enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915, 470 + struct drm_i915_gem_object *obj, 471 + bool always_coherent) 472 + { 473 + if (i915_gem_object_is_lmem(obj)) 474 + return I915_MAP_WC; 475 + if (HAS_LLC(i915) || always_coherent) 476 + return I915_MAP_WB; 477 + else 478 + return I915_MAP_WC; 479 + } 480 + 469 481 void __i915_gem_object_flush_map(struct drm_i915_gem_object *obj, 470 482 unsigned long offset, 471 483 unsigned long size)
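The `i915_coherent_map_type()` helper added above is a small decision table: local memory is always mapped write-combined, and otherwise write-back is used when the platform has an LLC or the caller requires coherency. The same policy, mirrored as a standalone userspace sketch (not the kernel function itself):

```c
#include <assert.h>
#include <stdbool.h>

enum map_type { MAP_WB, MAP_WC };

/* Mirrors the i915_coherent_map_type() policy from the hunk above:
 * lmem objects get write-combined mappings; otherwise prefer
 * write-back when an LLC is present or coherency is demanded. */
static enum map_type coherent_map_type(bool is_lmem, bool has_llc,
				       bool always_coherent)
{
	if (is_lmem)
		return MAP_WC;
	return (has_llc || always_coherent) ? MAP_WB : MAP_WC;
}
```

Note that `is_lmem` wins even when `always_coherent` is set, matching the early return in the kernel code.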
+30 -5
drivers/gpu/drm/i915/gem/i915_gem_pm.c
··· 22 22 23 23 void i915_gem_suspend(struct drm_i915_private *i915) 24 24 { 25 + struct intel_gt *gt; 26 + unsigned int i; 27 + 25 28 GEM_TRACE("%s\n", dev_name(i915->drm.dev)); 26 29 27 - intel_wakeref_auto(&to_gt(i915)->userfault_wakeref, 0); 30 + intel_wakeref_auto(&i915->runtime_pm.userfault_wakeref, 0); 28 31 flush_workqueue(i915->wq); 29 32 30 33 /* ··· 39 36 * state. Fortunately, the kernel_context is disposable and we do 40 37 * not rely on its state. 41 38 */ 42 - intel_gt_suspend_prepare(to_gt(i915)); 39 + for_each_gt(gt, i915, i) 40 + intel_gt_suspend_prepare(gt); 43 41 44 42 i915_gem_drain_freed_objects(i915); 45 43 } ··· 135 131 &i915->mm.purge_list, 136 132 NULL 137 133 }, **phase; 134 + struct intel_gt *gt; 138 135 unsigned long flags; 136 + unsigned int i; 139 137 bool flush = false; 140 138 141 139 /* ··· 160 154 * machine in an unusable condition. 161 155 */ 162 156 163 - intel_gt_suspend_late(to_gt(i915)); 157 + for_each_gt(gt, i915, i) 158 + intel_gt_suspend_late(gt); 164 159 165 160 spin_lock_irqsave(&i915->mm.obj_lock, flags); 166 161 for (phase = phases; *phase; phase++) { ··· 219 212 220 213 void i915_gem_resume(struct drm_i915_private *i915) 221 214 { 222 - int ret; 215 + struct intel_gt *gt; 216 + int ret, i, j; 223 217 224 218 GEM_TRACE("%s\n", dev_name(i915->drm.dev)); 225 219 ··· 232 224 * guarantee that the context image is complete. So let's just reset 233 225 * it and start again. 234 226 */ 235 - intel_gt_resume(to_gt(i915)); 227 + for_each_gt(gt, i915, i) 228 + if (intel_gt_resume(gt)) 229 + goto err_wedged; 236 230 237 231 ret = lmem_restore(i915, I915_TTM_BACKUP_ALLOW_GPU); 238 232 GEM_WARN_ON(ret); 233 + 234 + return; 235 + 236 + err_wedged: 237 + for_each_gt(gt, i915, j) { 238 + if (!intel_gt_is_wedged(gt)) { 239 + dev_err(i915->drm.dev, 240 + "Failed to re-initialize GPU[%u], declaring it wedged!\n", 241 + j); 242 + intel_gt_set_wedged(gt); 243 + } 244 + 245 + if (j == i) 246 + break; 247 + } 239 248 }
+3 -3
drivers/gpu/drm/i915/gem/i915_gem_shmem.c
··· 194 194 struct intel_memory_region *mem = obj->mm.region; 195 195 struct address_space *mapping = obj->base.filp->f_mapping; 196 196 const unsigned long page_count = obj->base.size / PAGE_SIZE; 197 - unsigned int max_segment = i915_sg_segment_size(); 197 + unsigned int max_segment = i915_sg_segment_size(i915->drm.dev); 198 198 struct sg_table *st; 199 199 struct sgt_iter sgt_iter; 200 200 struct page *page; ··· 369 369 370 370 __start_cpu_write(obj); 371 371 /* 372 - * On non-LLC platforms, force the flush-on-acquire if this is ever 372 + * On non-LLC igfx platforms, force the flush-on-acquire if this is ever 373 373 * swapped-in. Our async flush path is not trust worthy enough yet(and 374 374 * happens in the wrong order), and with some tricks it's conceivable 375 375 * for userspace to change the cache-level to I915_CACHE_NONE after the 376 376 * pages are swapped-in, and since execbuf binds the object before doing 377 377 * the async flush, we have a race window. 378 378 */ 379 - if (!HAS_LLC(i915)) 379 + if (!HAS_LLC(i915) && !IS_DGFX(i915)) 380 380 obj->cache_dirty = true; 381 381 } 382 382
+172 -91
drivers/gpu/drm/i915/gem/i915_gem_stolen.c
··· 77 77 mutex_unlock(&i915->mm.stolen_lock); 78 78 } 79 79 80 - static int i915_adjust_stolen(struct drm_i915_private *i915, 81 - struct resource *dsm) 80 + static bool valid_stolen_size(struct drm_i915_private *i915, struct resource *dsm) 81 + { 82 + return (dsm->start != 0 || HAS_LMEMBAR_SMEM_STOLEN(i915)) && dsm->end > dsm->start; 83 + } 84 + 85 + static int adjust_stolen(struct drm_i915_private *i915, 86 + struct resource *dsm) 82 87 { 83 88 struct i915_ggtt *ggtt = to_gt(i915)->ggtt; 84 89 struct intel_uncore *uncore = ggtt->vm.gt->uncore; 85 - struct resource *r; 86 90 87 - if (dsm->start == 0 || dsm->end <= dsm->start) 91 + if (!valid_stolen_size(i915, dsm)) 88 92 return -EINVAL; 89 93 90 94 /* 95 + * Make sure we don't clobber the GTT if it's within stolen memory 96 + * 91 97 * TODO: We have yet too encounter the case where the GTT wasn't at the 92 98 * end of stolen. With that assumption we could simplify this. 93 99 */ 94 - 95 - /* Make sure we don't clobber the GTT if it's within stolen memory */ 96 100 if (GRAPHICS_VER(i915) <= 4 && 97 101 !IS_G33(i915) && !IS_PINEVIEW(i915) && !IS_G4X(i915)) { 98 102 struct resource stolen[2] = {*dsm, *dsm}; ··· 135 131 } 136 132 } 137 133 134 + if (!valid_stolen_size(i915, dsm)) 135 + return -EINVAL; 136 + 137 + return 0; 138 + } 139 + 140 + static int request_smem_stolen(struct drm_i915_private *i915, 141 + struct resource *dsm) 142 + { 143 + struct resource *r; 144 + 138 145 /* 139 - * With stolen lmem, we don't need to check if the address range 140 - * overlaps with the non-stolen system memory range, since lmem is local 141 - * to the gpu. 146 + * With stolen lmem, we don't need to request system memory for the 147 + * address range since it's local to the gpu. 148 + * 149 + * Starting MTL, in IGFX devices the stolen memory is exposed via 150 + * LMEMBAR and shall be considered similar to stolen lmem. 
142 151 */ 143 - if (HAS_LMEM(i915)) 152 + if (HAS_LMEM(i915) || HAS_LMEMBAR_SMEM_STOLEN(i915)) 144 153 return 0; 145 154 146 155 /* ··· 388 371 389 372 drm_dbg(&i915->drm, "GEN6_STOLEN_RESERVED = 0x%016llx\n", reg_val); 390 373 391 - *base = reg_val & GEN11_STOLEN_RESERVED_ADDR_MASK; 392 - 393 374 switch (reg_val & GEN8_STOLEN_RESERVED_SIZE_MASK) { 394 375 case GEN8_STOLEN_RESERVED_1M: 395 376 *size = 1024 * 1024; ··· 405 390 *size = 8 * 1024 * 1024; 406 391 MISSING_CASE(reg_val & GEN8_STOLEN_RESERVED_SIZE_MASK); 407 392 } 393 + 394 + if (HAS_LMEMBAR_SMEM_STOLEN(i915)) 395 + /* the base is initialized to stolen top so subtract size to get base */ 396 + *base -= *size; 397 + else 398 + *base = reg_val & GEN11_STOLEN_RESERVED_ADDR_MASK; 408 399 } 409 400 410 - static int i915_gem_init_stolen(struct intel_memory_region *mem) 401 + /* 402 + * Initialize i915->dsm_reserved to contain the reserved space within the Data 403 + * Stolen Memory. This is a range on the top of DSM that is reserved, not to 404 + * be used by driver, so must be excluded from the region passed to the 405 + * allocator later. In the spec this is also called as WOPCM. 406 + * 407 + * Our expectation is that the reserved space is at the top of the stolen 408 + * region, as it has been the case for every platform, and *never* at the 409 + * bottom, so the calculation here can be simplified. 
410 + */ 411 + static int init_reserved_stolen(struct drm_i915_private *i915) 411 412 { 412 - struct drm_i915_private *i915 = mem->i915; 413 413 struct intel_uncore *uncore = &i915->uncore; 414 414 resource_size_t reserved_base, stolen_top; 415 - resource_size_t reserved_total, reserved_size; 416 - 417 - mutex_init(&i915->mm.stolen_lock); 418 - 419 - if (intel_vgpu_active(i915)) { 420 - drm_notice(&i915->drm, 421 - "%s, disabling use of stolen memory\n", 422 - "iGVT-g active"); 423 - return 0; 424 - } 425 - 426 - if (i915_vtd_active(i915) && GRAPHICS_VER(i915) < 8) { 427 - drm_notice(&i915->drm, 428 - "%s, disabling use of stolen memory\n", 429 - "DMAR active"); 430 - return 0; 431 - } 432 - 433 - if (resource_size(&mem->region) == 0) 434 - return 0; 435 - 436 - i915->dsm = mem->region; 437 - 438 - if (i915_adjust_stolen(i915, &i915->dsm)) 439 - return 0; 440 - 441 - GEM_BUG_ON(i915->dsm.start == 0); 442 - GEM_BUG_ON(i915->dsm.end <= i915->dsm.start); 415 + resource_size_t reserved_size; 416 + int ret = 0; 443 417 444 418 stolen_top = i915->dsm.end + 1; 445 419 reserved_base = stolen_top; ··· 459 455 &reserved_base, &reserved_size); 460 456 } 461 457 462 - /* 463 - * Our expectation is that the reserved space is at the top of the 464 - * stolen region and *never* at the bottom. If we see !reserved_base, 465 - * it likely means we failed to read the registers correctly. 
466 - */ 458 + /* No reserved stolen */ 459 + if (reserved_base == stolen_top) 460 + goto bail_out; 461 + 467 462 if (!reserved_base) { 468 463 drm_err(&i915->drm, 469 464 "inconsistent reservation %pa + %pa; ignoring\n", 470 465 &reserved_base, &reserved_size); 471 - reserved_base = stolen_top; 472 - reserved_size = 0; 466 + ret = -EINVAL; 467 + goto bail_out; 473 468 } 474 469 475 470 i915->dsm_reserved = ··· 478 475 drm_err(&i915->drm, 479 476 "Stolen reserved area %pR outside stolen memory %pR\n", 480 477 &i915->dsm_reserved, &i915->dsm); 481 - return 0; 478 + ret = -EINVAL; 479 + goto bail_out; 482 480 } 483 481 482 + return 0; 483 + 484 + bail_out: 485 + i915->dsm_reserved = 486 + (struct resource)DEFINE_RES_MEM(reserved_base, 0); 487 + 488 + return ret; 489 + } 490 + 491 + static int i915_gem_init_stolen(struct intel_memory_region *mem) 492 + { 493 + struct drm_i915_private *i915 = mem->i915; 494 + 495 + mutex_init(&i915->mm.stolen_lock); 496 + 497 + if (intel_vgpu_active(i915)) { 498 + drm_notice(&i915->drm, 499 + "%s, disabling use of stolen memory\n", 500 + "iGVT-g active"); 501 + return -ENOSPC; 502 + } 503 + 504 + if (i915_vtd_active(i915) && GRAPHICS_VER(i915) < 8) { 505 + drm_notice(&i915->drm, 506 + "%s, disabling use of stolen memory\n", 507 + "DMAR active"); 508 + return -ENOSPC; 509 + } 510 + 511 + if (adjust_stolen(i915, &mem->region)) 512 + return -ENOSPC; 513 + 514 + if (request_smem_stolen(i915, &mem->region)) 515 + return -ENOSPC; 516 + 517 + i915->dsm = mem->region; 518 + 519 + if (init_reserved_stolen(i915)) 520 + return -ENOSPC; 521 + 484 522 /* Exclude the reserved region from driver use */ 485 - mem->region.end = reserved_base - 1; 523 + mem->region.end = i915->dsm_reserved.start - 1; 486 524 mem->io_size = min(mem->io_size, resource_size(&mem->region)); 487 525 488 - /* It is possible for the reserved area to end before the end of stolen 489 - * memory, so just consider the start. 
*/ 490 - reserved_total = stolen_top - reserved_base; 491 - 492 - i915->stolen_usable_size = 493 - resource_size(&i915->dsm) - reserved_total; 526 + i915->stolen_usable_size = resource_size(&mem->region); 494 527 495 528 drm_dbg(&i915->drm, 496 529 "Memory reserved for graphics device: %lluK, usable: %lluK\n", ··· 534 495 (u64)i915->stolen_usable_size >> 10); 535 496 536 497 if (i915->stolen_usable_size == 0) 537 - return 0; 498 + return -ENOSPC; 538 499 539 500 /* Basic memrange allocator for stolen space. */ 540 501 drm_mm_init(&i915->mm.stolen, 0, i915->stolen_usable_size); ··· 772 733 773 734 static int init_stolen_smem(struct intel_memory_region *mem) 774 735 { 736 + int err; 737 + 775 738 /* 776 739 * Initialise stolen early so that we may reserve preallocated 777 740 * objects for the BIOS to KMS transition. 778 741 */ 779 - return i915_gem_init_stolen(mem); 742 + err = i915_gem_init_stolen(mem); 743 + if (err) 744 + drm_dbg(&mem->i915->drm, "Skip stolen region: failed to setup\n"); 745 + 746 + return 0; 780 747 } 781 748 782 749 static int release_stolen_smem(struct intel_memory_region *mem) ··· 799 754 800 755 static int init_stolen_lmem(struct intel_memory_region *mem) 801 756 { 757 + struct drm_i915_private *i915 = mem->i915; 802 758 int err; 803 759 804 760 if (GEM_WARN_ON(resource_size(&mem->region) == 0)) 805 - return -ENODEV; 761 + return 0; 806 762 807 - /* 808 - * TODO: For stolen lmem we mostly just care about populating the dsm 809 - * related bits and setting up the drm_mm allocator for the range. 810 - * Perhaps split up i915_gem_init_stolen() for this. 
811 - */ 812 763 err = i915_gem_init_stolen(mem); 813 - if (err) 814 - return err; 815 - 816 - if (mem->io_size && !io_mapping_init_wc(&mem->iomap, 817 - mem->io_start, 818 - mem->io_size)) { 819 - err = -EIO; 820 - goto err_cleanup; 764 + if (err) { 765 + drm_dbg(&mem->i915->drm, "Skip stolen region: failed to setup\n"); 766 + return 0; 821 767 } 768 + 769 + if (mem->io_size && 770 + !io_mapping_init_wc(&mem->iomap, mem->io_start, mem->io_size)) 771 + goto err_cleanup; 772 + 773 + drm_dbg(&i915->drm, "Stolen Local memory IO start: %pa\n", 774 + &mem->io_start); 775 + drm_dbg(&i915->drm, "Stolen Local DSM base: %pa\n", &mem->region.start); 822 776 823 777 return 0; 824 778 ··· 840 796 .init_object = _i915_gem_object_stolen_init, 841 797 }; 842 798 799 + static int mtl_get_gms_size(struct intel_uncore *uncore) 800 + { 801 + u16 ggc, gms; 802 + 803 + ggc = intel_uncore_read16(uncore, GGC); 804 + 805 + /* check GGMS, should be fixed 0x3 (8MB) */ 806 + if ((ggc & GGMS_MASK) != GGMS_MASK) 807 + return -EIO; 808 + 809 + /* return valid GMS value, -EIO if invalid */ 810 + gms = REG_FIELD_GET(GMS_MASK, ggc); 811 + switch (gms) { 812 + case 0x0 ... 0x04: 813 + return gms * 32; 814 + case 0xf0 ... 
0xfe: 815 + return (gms - 0xf0 + 1) * 4; 816 + default: 817 + MISSING_CASE(gms); 818 + return -EIO; 819 + } 820 + } 821 + 843 822 struct intel_memory_region * 844 823 i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type, 845 824 u16 instance) ··· 873 806 struct intel_memory_region *mem; 874 807 resource_size_t io_start, io_size; 875 808 resource_size_t min_page_size; 809 + int ret; 876 810 877 811 if (WARN_ON_ONCE(instance)) 878 812 return ERR_PTR(-ENODEV); ··· 881 813 if (!i915_pci_resource_valid(pdev, GEN12_LMEM_BAR)) 882 814 return ERR_PTR(-ENXIO); 883 815 884 - /* Use DSM base address instead for stolen memory */ 885 - dsm_base = intel_uncore_read64(uncore, GEN12_DSMBASE); 886 - if (IS_DG1(uncore->i915)) { 816 + if (HAS_LMEMBAR_SMEM_STOLEN(i915) || IS_DG1(i915)) { 887 817 lmem_size = pci_resource_len(pdev, GEN12_LMEM_BAR); 888 - if (WARN_ON(lmem_size < dsm_base)) 889 - return ERR_PTR(-ENODEV); 890 818 } else { 891 819 resource_size_t lmem_range; 892 820 ··· 891 827 lmem_size *= SZ_1G; 892 828 } 893 829 894 - dsm_size = lmem_size - dsm_base; 895 - if (pci_resource_len(pdev, GEN12_LMEM_BAR) < lmem_size) { 830 + if (HAS_LMEMBAR_SMEM_STOLEN(i915)) { 831 + /* 832 + * MTL dsm size is in GGC register. 833 + * Also MTL uses offset to DSMBASE in ptes, so i915 834 + * uses dsm_base = 0 to setup stolen region. 
835 + */ 836 + ret = mtl_get_gms_size(uncore); 837 + if (ret < 0) { 838 + drm_err(&i915->drm, "invalid MTL GGC register setting\n"); 839 + return ERR_PTR(ret); 840 + } 841 + 842 + dsm_base = 0; 843 + dsm_size = (resource_size_t)(ret * SZ_1M); 844 + 845 + GEM_BUG_ON(pci_resource_len(pdev, GEN12_LMEM_BAR) != SZ_256M); 846 + GEM_BUG_ON((dsm_size + SZ_8M) > lmem_size); 847 + } else { 848 + /* Use DSM base address instead for stolen memory */ 849 + dsm_base = intel_uncore_read64(uncore, GEN12_DSMBASE) & GEN12_BDSM_MASK; 850 + if (WARN_ON(lmem_size < dsm_base)) 851 + return ERR_PTR(-ENODEV); 852 + dsm_size = lmem_size - dsm_base; 853 + } 854 + 855 + io_size = dsm_size; 856 + if (HAS_LMEMBAR_SMEM_STOLEN(i915)) { 857 + io_start = pci_resource_start(pdev, GEN12_LMEM_BAR) + SZ_8M; 858 + } else if (pci_resource_len(pdev, GEN12_LMEM_BAR) < lmem_size) { 896 859 io_start = 0; 897 860 io_size = 0; 898 861 } else { 899 862 io_start = pci_resource_start(pdev, GEN12_LMEM_BAR) + dsm_base; 900 - io_size = dsm_size; 901 863 } 902 864 903 865 min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K : ··· 936 846 &i915_region_stolen_lmem_ops); 937 847 if (IS_ERR(mem)) 938 848 return mem; 939 - 940 - /* 941 - * TODO: consider creating common helper to just print all the 942 - * interesting stuff from intel_memory_region, which we can use for all 943 - * our probed regions. 944 - */ 945 - 946 - drm_dbg(&i915->drm, "Stolen Local memory IO start: %pa\n", 947 - &mem->io_start); 948 - drm_dbg(&i915->drm, "Stolen Local DSM base: %pa\n", &dsm_base); 949 849 950 850 intel_memory_region_set_name(mem, "stolen-local"); 951 851 ··· 961 881 intel_memory_region_set_name(mem, "stolen-system"); 962 882 963 883 mem->private = true; 884 + 964 885 return mem; 965 886 } 966 887
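The `mtl_get_gms_size()` helper above decodes the GMS field of the GGC register into a stolen-memory size: values 0x0-0x4 encode multiples of 32 MB and 0xf0-0xfe encode multiples of 4 MB, with everything else invalid. The same switch, exercised as a standalone sketch:

```c
#include <assert.h>

/* Decode the GMS field of the GGC register into MB of stolen memory,
 * following the two valid ranges handled by mtl_get_gms_size():
 * 0x0-0x4 are multiples of 32 MB, 0xf0-0xfe are multiples of 4 MB.
 * Returns -1 for invalid encodings (the kernel returns -EIO). */
static int gms_to_mb(unsigned int gms)
{
	if (gms <= 0x04)
		return gms * 32;
	if (gms >= 0xf0 && gms <= 0xfe)
		return (gms - 0xf0 + 1) * 4;
	return -1;
}
```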
+87 -28
drivers/gpu/drm/i915/gem/i915_gem_ttm.c
··· 189 189 struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev); 190 190 struct intel_memory_region *mr = i915->mm.regions[INTEL_MEMORY_SYSTEM]; 191 191 struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm); 192 - const unsigned int max_segment = i915_sg_segment_size(); 192 + const unsigned int max_segment = i915_sg_segment_size(i915->drm.dev); 193 193 const size_t size = (size_t)ttm->num_pages << PAGE_SHIFT; 194 194 struct file *filp = i915_tt->filp; 195 195 struct sgt_iter sgt_iter; ··· 279 279 struct i915_ttm_tt *i915_tt; 280 280 int ret; 281 281 282 - if (!obj) 282 + if (i915_ttm_is_ghost_object(bo)) 283 283 return NULL; 284 284 285 285 i915_tt = kzalloc(sizeof(*i915_tt), GFP_KERNEL); ··· 362 362 { 363 363 struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); 364 364 365 - if (!obj) 365 + if (i915_ttm_is_ghost_object(bo)) 366 366 return false; 367 367 368 368 /* ··· 509 509 static void i915_ttm_delete_mem_notify(struct ttm_buffer_object *bo) 510 510 { 511 511 struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); 512 - intel_wakeref_t wakeref = 0; 513 512 514 - if (bo->resource && likely(obj)) { 515 - /* ttm_bo_release() already has dma_resv_lock */ 516 - if (i915_ttm_cpu_maps_iomem(bo->resource)) 517 - wakeref = intel_runtime_pm_get(&to_i915(obj->base.dev)->runtime_pm); 518 - 513 + if (bo->resource && !i915_ttm_is_ghost_object(bo)) { 519 514 __i915_gem_object_pages_fini(obj); 520 - 521 - if (wakeref) 522 - intel_runtime_pm_put(&to_i915(obj->base.dev)->runtime_pm, wakeref); 523 - 524 515 i915_ttm_free_cached_io_rsgt(obj); 525 516 } 526 517 } ··· 529 538 ret = sg_alloc_table_from_pages_segment(st, 530 539 ttm->pages, ttm->num_pages, 531 540 0, (unsigned long)ttm->num_pages << PAGE_SHIFT, 532 - i915_sg_segment_size(), GFP_KERNEL); 541 + i915_sg_segment_size(i915_tt->dev), GFP_KERNEL); 533 542 if (ret) { 534 543 st->sgl = NULL; 535 544 return ERR_PTR(ret); ··· 615 624 struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); 616 
625 int ret; 617 626 618 - if (!obj) 627 + if (i915_ttm_is_ghost_object(bo)) 619 628 return; 620 629 621 630 ret = i915_ttm_move_notify(bo); ··· 648 657 struct drm_i915_gem_object *obj = i915_ttm_to_gem(mem->bo); 649 658 bool unknown_state; 650 659 651 - if (!obj) 660 + if (i915_ttm_is_ghost_object(mem->bo)) 652 661 return -EINVAL; 653 662 654 663 if (!kref_get_unless_zero(&obj->base.refcount)) ··· 681 690 unsigned long base; 682 691 unsigned int ofs; 683 692 684 - GEM_BUG_ON(!obj); 693 + GEM_BUG_ON(i915_ttm_is_ghost_object(bo)); 685 694 GEM_WARN_ON(bo->ttm); 686 695 687 696 base = obj->mm.region->iomap.base - obj->mm.region->region.start; 688 697 sg = __i915_gem_object_get_sg(obj, &obj->ttm.get_io_page, page_offset, &ofs, true); 689 698 690 699 return ((base + sg_dma_address(sg)) >> PAGE_SHIFT) + ofs; 700 + } 701 + 702 + static int i915_ttm_access_memory(struct ttm_buffer_object *bo, 703 + unsigned long offset, void *buf, 704 + int len, int write) 705 + { 706 + struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); 707 + resource_size_t iomap = obj->mm.region->iomap.base - 708 + obj->mm.region->region.start; 709 + unsigned long page = offset >> PAGE_SHIFT; 710 + unsigned long bytes_left = len; 711 + 712 + /* 713 + * TODO: For now just let it fail if the resource is non-mappable, 714 + * otherwise we need to perform the memcpy from the gpu here, without 715 + * interfering with the object (like moving the entire thing). 
716 + */ 717 + if (!i915_ttm_resource_mappable(bo->resource)) 718 + return -EIO; 719 + 720 + offset -= page << PAGE_SHIFT; 721 + do { 722 + unsigned long bytes = min(bytes_left, PAGE_SIZE - offset); 723 + void __iomem *ptr; 724 + dma_addr_t daddr; 725 + 726 + daddr = i915_gem_object_get_dma_address(obj, page); 727 + ptr = ioremap_wc(iomap + daddr + offset, bytes); 728 + if (!ptr) 729 + return -EIO; 730 + 731 + if (write) 732 + memcpy_toio(ptr, buf, bytes); 733 + else 734 + memcpy_fromio(buf, ptr, bytes); 735 + iounmap(ptr); 736 + 737 + page++; 738 + buf += bytes; 739 + bytes_left -= bytes; 740 + offset = 0; 741 + } while (bytes_left); 742 + 743 + return len; 691 744 } 692 745 693 746 /* ··· 750 715 .delete_mem_notify = i915_ttm_delete_mem_notify, 751 716 .io_mem_reserve = i915_ttm_io_mem_reserve, 752 717 .io_mem_pfn = i915_ttm_io_mem_pfn, 718 + .access_memory = i915_ttm_access_memory, 753 719 }; 754 720 755 721 /** ··· 1026 990 struct vm_area_struct *area = vmf->vma; 1027 991 struct ttm_buffer_object *bo = area->vm_private_data; 1028 992 struct drm_device *dev = bo->base.dev; 1029 - struct drm_i915_gem_object *obj; 993 + struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); 1030 994 intel_wakeref_t wakeref = 0; 1031 995 vm_fault_t ret; 1032 996 int idx; 1033 997 1034 - obj = i915_ttm_to_gem(bo); 1035 - if (!obj) 998 + if (i915_ttm_is_ghost_object(bo)) 1036 999 return VM_FAULT_SIGBUS; 1037 1000 1038 1001 /* Sanity check that we allow writing into this object */ ··· 1070 1035 } 1071 1036 1072 1037 if (err) { 1073 - drm_dbg(dev, "Unable to make resource CPU accessible\n"); 1038 + drm_dbg(dev, "Unable to make resource CPU accessible(err = %pe)\n", 1039 + ERR_PTR(err)); 1074 1040 dma_resv_unlock(bo->base.resv); 1075 1041 ret = VM_FAULT_SIGBUS; 1076 1042 goto out_rpm; ··· 1089 1053 if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) 1090 1054 goto out_rpm; 1091 1055 1092 - /* ttm_bo_vm_reserve() already has dma_resv_lock */ 1056 + /* 1057 + * 
ttm_bo_vm_reserve() already has dma_resv_lock. 1058 + * userfault_count is protected by dma_resv lock and rpm wakeref. 1059 + */ 1093 1060 if (ret == VM_FAULT_NOPAGE && wakeref && !obj->userfault_count) { 1094 1061 obj->userfault_count = 1; 1095 - mutex_lock(&to_gt(to_i915(obj->base.dev))->lmem_userfault_lock); 1096 - list_add(&obj->userfault_link, &to_gt(to_i915(obj->base.dev))->lmem_userfault_list); 1097 - mutex_unlock(&to_gt(to_i915(obj->base.dev))->lmem_userfault_lock); 1062 + spin_lock(&to_i915(obj->base.dev)->runtime_pm.lmem_userfault_lock); 1063 + list_add(&obj->userfault_link, &to_i915(obj->base.dev)->runtime_pm.lmem_userfault_list); 1064 + spin_unlock(&to_i915(obj->base.dev)->runtime_pm.lmem_userfault_lock); 1098 1065 } 1099 1066 1100 1067 if (wakeref & CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND) 1101 - intel_wakeref_auto(&to_gt(to_i915(obj->base.dev))->userfault_wakeref, 1068 + intel_wakeref_auto(&to_i915(obj->base.dev)->runtime_pm.userfault_wakeref, 1102 1069 msecs_to_jiffies_timeout(CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND)); 1103 1070 1104 1071 i915_ttm_adjust_lru(obj); ··· 1133 1094 struct drm_i915_gem_object *obj = 1134 1095 i915_ttm_to_gem(vma->vm_private_data); 1135 1096 1136 - GEM_BUG_ON(!obj); 1097 + GEM_BUG_ON(i915_ttm_is_ghost_object(vma->vm_private_data)); 1137 1098 i915_gem_object_get(obj); 1138 1099 } 1139 1100 ··· 1142 1103 struct drm_i915_gem_object *obj = 1143 1104 i915_ttm_to_gem(vma->vm_private_data); 1144 1105 1145 - GEM_BUG_ON(!obj); 1106 + GEM_BUG_ON(i915_ttm_is_ghost_object(vma->vm_private_data)); 1146 1107 i915_gem_object_put(obj); 1147 1108 } 1148 1109 ··· 1163 1124 1164 1125 static void i915_ttm_unmap_virtual(struct drm_i915_gem_object *obj) 1165 1126 { 1127 + struct ttm_buffer_object *bo = i915_gem_to_ttm(obj); 1128 + intel_wakeref_t wakeref = 0; 1129 + 1130 + assert_object_held_shared(obj); 1131 + 1132 + if (i915_ttm_cpu_maps_iomem(bo->resource)) { 1133 + wakeref = intel_runtime_pm_get(&to_i915(obj->base.dev)->runtime_pm); 1134 + 1135 
+ /* userfault_count is protected by obj lock and rpm wakeref. */ 1136 + if (obj->userfault_count) { 1137 + spin_lock(&to_i915(obj->base.dev)->runtime_pm.lmem_userfault_lock); 1138 + list_del(&obj->userfault_link); 1139 + spin_unlock(&to_i915(obj->base.dev)->runtime_pm.lmem_userfault_lock); 1140 + obj->userfault_count = 0; 1141 + } 1142 + } 1143 + 1166 1144 ttm_bo_unmap_virtual(i915_gem_to_ttm(obj)); 1145 + 1146 + if (wakeref) 1147 + intel_runtime_pm_put(&to_i915(obj->base.dev)->runtime_pm, wakeref); 1167 1148 } 1168 1149 1169 1150 static const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
+13 -5
drivers/gpu/drm/i915/gem/i915_gem_ttm.h
··· 28 28 void i915_ttm_bo_destroy(struct ttm_buffer_object *bo); 29 29 30 30 /** 31 + * i915_ttm_is_ghost_object - Check if the ttm bo is a ghost object. 32 + * @bo: Pointer to the ttm buffer object 33 + * 34 + * Return: True if the ttm bo is not a i915 object but a ghost ttm object, 35 + * False otherwise. 36 + */ 37 + static inline bool i915_ttm_is_ghost_object(struct ttm_buffer_object *bo) 38 + { 39 + return bo->destroy != i915_ttm_bo_destroy; 40 + } 41 + 42 + /** 31 43 * i915_ttm_to_gem - Convert a struct ttm_buffer_object to an embedding 32 44 * struct drm_i915_gem_object. 33 45 * 34 - * Return: Pointer to the embedding struct ttm_buffer_object, or NULL 35 - * if the object was not an i915 ttm object. 46 + * Return: Pointer to the embedding struct ttm_buffer_object. 36 47 */ 37 48 static inline struct drm_i915_gem_object * 38 49 i915_ttm_to_gem(struct ttm_buffer_object *bo) 39 50 { 40 - if (bo->destroy != i915_ttm_bo_destroy) 41 - return NULL; 42 - 43 51 return container_of(bo, struct drm_i915_gem_object, __do_not_access); 44 52 } 45 53
+1 -1
drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
··· 560 560 bool clear; 561 561 int ret; 562 562 563 - if (GEM_WARN_ON(!obj)) { 563 + if (GEM_WARN_ON(i915_ttm_is_ghost_object(bo))) { 564 564 ttm_bo_move_null(bo, dst_mem); 565 565 return 0; 566 566 }
+2 -3
drivers/gpu/drm/i915/gem/i915_gem_userptr.c
··· 129 129 static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj) 130 130 { 131 131 const unsigned long num_pages = obj->base.size >> PAGE_SHIFT; 132 - unsigned int max_segment = i915_sg_segment_size(); 132 + unsigned int max_segment = i915_sg_segment_size(obj->base.dev->dev); 133 133 struct sg_table *st; 134 134 unsigned int sg_page_sizes; 135 135 struct page **pvec; ··· 292 292 if (!i915_gem_object_is_readonly(obj)) 293 293 gup_flags |= FOLL_WRITE; 294 294 295 - pinned = ret = 0; 295 + pinned = 0; 296 296 while (pinned < num_pages) { 297 297 ret = pin_user_pages_fast(obj->userptr.ptr + pinned * PAGE_SIZE, 298 298 num_pages - pinned, gup_flags, ··· 302 302 303 303 pinned += ret; 304 304 } 305 - ret = 0; 306 305 307 306 ret = i915_gem_object_lock_interruptible(obj, NULL); 308 307 if (ret)
+156 -1
drivers/gpu/drm/i915/gem/selftests/huge_pages.c
··· 1161 1161 GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj)); 1162 1162 1163 1163 size = obj->base.size; 1164 - if (obj->mm.page_sizes.sg & I915_GTT_PAGE_SIZE_64K) 1164 + if (obj->mm.page_sizes.sg & I915_GTT_PAGE_SIZE_64K && 1165 + !HAS_64K_PAGES(i915)) 1165 1166 size = round_up(size, I915_GTT_PAGE_SIZE_2M); 1166 1167 1167 1168 n = 0; ··· 1215 1214 * size and ensure the vma offset is at the start of the pt 1216 1215 * boundary, however to improve coverage we opt for testing both 1217 1216 * aligned and unaligned offsets. 1217 + * 1218 + * With PS64 this is no longer the case, but to ensure we 1219 + * sometimes get the compact layout for smaller objects, apply 1220 + * the round_up anyway. 1218 1221 */ 1219 1222 if (obj->mm.page_sizes.sg & I915_GTT_PAGE_SIZE_64K) 1220 1223 offset_low = round_down(offset_low, ··· 1416 1411 { SZ_2M + SZ_4K, SZ_64K | SZ_4K }, 1417 1412 { SZ_2M + SZ_4K, SZ_2M | SZ_4K }, 1418 1413 { SZ_2M + SZ_64K, SZ_2M | SZ_64K }, 1414 + { SZ_2M + SZ_64K, SZ_64K }, 1419 1415 }; 1420 1416 int i, j; 1421 1417 int err; ··· 1543 1537 if (err == -ENOMEM) 1544 1538 err = 0; 1545 1539 1540 + return err; 1541 + } 1542 + 1543 + static int igt_ppgtt_mixed(void *arg) 1544 + { 1545 + struct drm_i915_private *i915 = arg; 1546 + const unsigned long flags = PIN_OFFSET_FIXED | PIN_USER; 1547 + struct drm_i915_gem_object *obj, *on; 1548 + struct i915_gem_engines *engines; 1549 + struct i915_gem_engines_iter it; 1550 + struct i915_address_space *vm; 1551 + struct i915_gem_context *ctx; 1552 + struct intel_context *ce; 1553 + struct file *file; 1554 + I915_RND_STATE(prng); 1555 + LIST_HEAD(objects); 1556 + struct intel_memory_region *mr; 1557 + struct i915_vma *vma; 1558 + unsigned int count; 1559 + u32 i, addr; 1560 + int *order; 1561 + int n, err; 1562 + 1563 + /* 1564 + * Sanity check mixing 4K and 64K pages within the same page-table via 1565 + * the new PS64 TLB hint. 
1566 + */ 1567 + 1568 + if (!HAS_64K_PAGES(i915)) { 1569 + pr_info("device lacks PS64, skipping\n"); 1570 + return 0; 1571 + } 1572 + 1573 + file = mock_file(i915); 1574 + if (IS_ERR(file)) 1575 + return PTR_ERR(file); 1576 + 1577 + ctx = hugepage_ctx(i915, file); 1578 + if (IS_ERR(ctx)) { 1579 + err = PTR_ERR(ctx); 1580 + goto out; 1581 + } 1582 + vm = i915_gem_context_get_eb_vm(ctx); 1583 + 1584 + i = 0; 1585 + addr = 0; 1586 + do { 1587 + u32 sz; 1588 + 1589 + sz = i915_prandom_u32_max_state(SZ_4M, &prng); 1590 + sz = max_t(u32, sz, SZ_4K); 1591 + 1592 + mr = i915->mm.regions[INTEL_REGION_LMEM_0]; 1593 + if (i & 1) 1594 + mr = i915->mm.regions[INTEL_REGION_SMEM]; 1595 + 1596 + obj = i915_gem_object_create_region(mr, sz, 0, 0); 1597 + if (IS_ERR(obj)) { 1598 + err = PTR_ERR(obj); 1599 + goto out_vm; 1600 + } 1601 + 1602 + list_add_tail(&obj->st_link, &objects); 1603 + 1604 + vma = i915_vma_instance(obj, vm, NULL); 1605 + if (IS_ERR(vma)) { 1606 + err = PTR_ERR(vma); 1607 + goto err_put; 1608 + } 1609 + 1610 + addr = round_up(addr, mr->min_page_size); 1611 + err = i915_vma_pin(vma, 0, 0, addr | flags); 1612 + if (err) 1613 + goto err_put; 1614 + 1615 + if (mr->type == INTEL_MEMORY_LOCAL && 1616 + (vma->resource->page_sizes_gtt & I915_GTT_PAGE_SIZE_4K)) { 1617 + err = -EINVAL; 1618 + goto err_put; 1619 + } 1620 + 1621 + addr += obj->base.size; 1622 + i++; 1623 + } while (addr <= SZ_16M); 1624 + 1625 + n = 0; 1626 + count = 0; 1627 + for_each_gem_engine(ce, i915_gem_context_lock_engines(ctx), it) { 1628 + count++; 1629 + if (!intel_engine_can_store_dword(ce->engine)) 1630 + continue; 1631 + 1632 + n++; 1633 + } 1634 + i915_gem_context_unlock_engines(ctx); 1635 + if (!n) 1636 + goto err_put; 1637 + 1638 + order = i915_random_order(count * count, &prng); 1639 + if (!order) { 1640 + err = -ENOMEM; 1641 + goto err_put; 1642 + } 1643 + 1644 + i = 0; 1645 + addr = 0; 1646 + engines = i915_gem_context_lock_engines(ctx); 1647 + list_for_each_entry(obj, &objects, st_link) { 
1648 + u32 rnd = i915_prandom_u32_max_state(UINT_MAX, &prng); 1649 + 1650 + addr = round_up(addr, obj->mm.region->min_page_size); 1651 + 1652 + ce = engines->engines[order[i] % engines->num_engines]; 1653 + i = (i + 1) % (count * count); 1654 + if (!ce || !intel_engine_can_store_dword(ce->engine)) 1655 + continue; 1656 + 1657 + err = __igt_write_huge(ce, obj, obj->base.size, addr, 0, rnd); 1658 + if (err) 1659 + break; 1660 + 1661 + err = __igt_write_huge(ce, obj, obj->base.size, addr, 1662 + offset_in_page(rnd) / sizeof(u32), rnd + 1); 1663 + if (err) 1664 + break; 1665 + 1666 + err = __igt_write_huge(ce, obj, obj->base.size, addr, 1667 + (PAGE_SIZE / sizeof(u32)) - 1, 1668 + rnd + 2); 1669 + if (err) 1670 + break; 1671 + 1672 + addr += obj->base.size; 1673 + 1674 + cond_resched(); 1675 + } 1676 + 1677 + i915_gem_context_unlock_engines(ctx); 1678 + kfree(order); 1679 + err_put: 1680 + list_for_each_entry_safe(obj, on, &objects, st_link) { 1681 + list_del(&obj->st_link); 1682 + i915_gem_object_put(obj); 1683 + } 1684 + out_vm: 1685 + i915_vm_put(vm); 1686 + out: 1687 + fput(file); 1546 1688 return err; 1547 1689 } 1548 1690 ··· 1957 1803 SUBTEST(igt_ppgtt_smoke_huge), 1958 1804 SUBTEST(igt_ppgtt_sanity_check), 1959 1805 SUBTEST(igt_ppgtt_compact), 1806 + SUBTEST(igt_ppgtt_mixed), 1960 1807 }; 1961 1808 1962 1809 if (!HAS_PPGTT(i915)) {
+66 -52
drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
··· 179 179 } 180 180 181 181 struct parallel_switch { 182 - struct task_struct *tsk; 182 + struct kthread_worker *worker; 183 + struct kthread_work work; 183 184 struct intel_context *ce[2]; 185 + int result; 184 186 }; 185 187 186 - static int __live_parallel_switch1(void *data) 188 + static void __live_parallel_switch1(struct kthread_work *work) 187 189 { 188 - struct parallel_switch *arg = data; 190 + struct parallel_switch *arg = 191 + container_of(work, typeof(*arg), work); 189 192 IGT_TIMEOUT(end_time); 190 193 unsigned long count; 191 194 192 195 count = 0; 196 + arg->result = 0; 193 197 do { 194 198 struct i915_request *rq = NULL; 195 - int err, n; 199 + int n; 196 200 197 - err = 0; 198 - for (n = 0; !err && n < ARRAY_SIZE(arg->ce); n++) { 201 + for (n = 0; !arg->result && n < ARRAY_SIZE(arg->ce); n++) { 199 202 struct i915_request *prev = rq; 200 203 201 204 rq = i915_request_create(arg->ce[n]); 202 205 if (IS_ERR(rq)) { 203 206 i915_request_put(prev); 204 - return PTR_ERR(rq); 207 + arg->result = PTR_ERR(rq); 208 + break; 205 209 } 206 210 207 211 i915_request_get(rq); 208 212 if (prev) { 209 - err = i915_request_await_dma_fence(rq, &prev->fence); 213 + arg->result = 214 + i915_request_await_dma_fence(rq, 215 + &prev->fence); 210 216 i915_request_put(prev); 211 217 } 212 218 213 219 i915_request_add(rq); 214 220 } 221 + 222 + if (IS_ERR_OR_NULL(rq)) 223 + break; 224 + 215 225 if (i915_request_wait(rq, 0, HZ) < 0) 216 - err = -ETIME; 226 + arg->result = -ETIME; 227 + 217 228 i915_request_put(rq); 218 - if (err) 219 - return err; 220 229 221 230 count++; 222 - } while (!__igt_timeout(end_time, NULL)); 231 + } while (!arg->result && !__igt_timeout(end_time, NULL)); 223 232 224 - pr_info("%s: %lu switches (sync)\n", arg->ce[0]->engine->name, count); 225 - return 0; 233 + pr_info("%s: %lu switches (sync) <%d>\n", 234 + arg->ce[0]->engine->name, count, arg->result); 226 235 } 227 236 228 - static int __live_parallel_switchN(void *data) 237 + static void 
__live_parallel_switchN(struct kthread_work *work) 229 238 { 230 - struct parallel_switch *arg = data; 239 + struct parallel_switch *arg = 240 + container_of(work, typeof(*arg), work); 231 241 struct i915_request *rq = NULL; 232 242 IGT_TIMEOUT(end_time); 233 243 unsigned long count; 234 244 int n; 235 245 236 246 count = 0; 247 + arg->result = 0; 237 248 do { 238 - for (n = 0; n < ARRAY_SIZE(arg->ce); n++) { 249 + for (n = 0; !arg->result && n < ARRAY_SIZE(arg->ce); n++) { 239 250 struct i915_request *prev = rq; 240 - int err = 0; 241 251 242 252 rq = i915_request_create(arg->ce[n]); 243 253 if (IS_ERR(rq)) { 244 254 i915_request_put(prev); 245 - return PTR_ERR(rq); 255 + arg->result = PTR_ERR(rq); 256 + break; 246 257 } 247 258 248 259 i915_request_get(rq); 249 260 if (prev) { 250 - err = i915_request_await_dma_fence(rq, &prev->fence); 261 + arg->result = 262 + i915_request_await_dma_fence(rq, 263 + &prev->fence); 251 264 i915_request_put(prev); 252 265 } 253 266 254 267 i915_request_add(rq); 255 - if (err) { 256 - i915_request_put(rq); 257 - return err; 258 - } 259 268 } 260 269 261 270 count++; 262 - } while (!__igt_timeout(end_time, NULL)); 263 - i915_request_put(rq); 271 + } while (!arg->result && !__igt_timeout(end_time, NULL)); 264 272 265 - pr_info("%s: %lu switches (many)\n", arg->ce[0]->engine->name, count); 266 - return 0; 273 + if (!IS_ERR_OR_NULL(rq)) 274 + i915_request_put(rq); 275 + 276 + pr_info("%s: %lu switches (many) <%d>\n", 277 + arg->ce[0]->engine->name, count, arg->result); 267 278 } 268 279 269 280 static int live_parallel_switch(void *arg) 270 281 { 271 282 struct drm_i915_private *i915 = arg; 272 - static int (* const func[])(void *arg) = { 283 + static void (* const func[])(struct kthread_work *) = { 273 284 __live_parallel_switch1, 274 285 __live_parallel_switchN, 275 286 NULL, ··· 288 277 struct parallel_switch *data = NULL; 289 278 struct i915_gem_engines *engines; 290 279 struct i915_gem_engines_iter it; 291 - int (* const *fn)(void 
*arg); 280 + void (* const *fn)(struct kthread_work *); 292 281 struct i915_gem_context *ctx; 293 282 struct intel_context *ce; 294 283 struct file *file; ··· 359 348 } 360 349 } 361 350 351 + for (n = 0; n < count; n++) { 352 + struct kthread_worker *worker; 353 + 354 + if (!data[n].ce[0]) 355 + continue; 356 + 357 + worker = kthread_create_worker(0, "igt/parallel:%s", 358 + data[n].ce[0]->engine->name); 359 + if (IS_ERR(worker)) 360 + goto out; 361 + 362 + data[n].worker = worker; 363 + } 364 + 362 365 for (fn = func; !err && *fn; fn++) { 363 366 struct igt_live_test t; 364 - int n; 365 367 366 368 err = igt_live_test_begin(&t, i915, __func__, ""); 367 369 if (err) ··· 384 360 if (!data[n].ce[0]) 385 361 continue; 386 362 387 - data[n].tsk = kthread_run(*fn, &data[n], 388 - "igt/parallel:%s", 389 - data[n].ce[0]->engine->name); 390 - if (IS_ERR(data[n].tsk)) { 391 - err = PTR_ERR(data[n].tsk); 392 - break; 393 - } 394 - get_task_struct(data[n].tsk); 363 + data[n].result = 0; 364 + kthread_init_work(&data[n].work, *fn); 365 + kthread_queue_work(data[n].worker, &data[n].work); 395 366 } 396 367 397 - yield(); /* start all threads before we kthread_stop() */ 398 - 399 368 for (n = 0; n < count; n++) { 400 - int status; 401 - 402 - if (IS_ERR_OR_NULL(data[n].tsk)) 403 - continue; 404 - 405 - status = kthread_stop(data[n].tsk); 406 - if (status && !err) 407 - err = status; 408 - 409 - put_task_struct(data[n].tsk); 410 - data[n].tsk = NULL; 369 + if (data[n].ce[0]) { 370 + kthread_flush_work(&data[n].work); 371 + if (data[n].result && !err) 372 + err = data[n].result; 373 + } 411 374 } 412 375 413 376 if (igt_live_test_end(&t)) ··· 410 399 intel_context_unpin(data[n].ce[m]); 411 400 intel_context_put(data[n].ce[m]); 412 401 } 402 + 403 + if (data[n].worker) 404 + kthread_destroy_worker(data[n].worker); 413 405 } 414 406 kfree(data); 415 407 out_file:
+78 -1
drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
··· 6 6 7 7 #include "i915_drv.h" 8 8 #include "i915_selftest.h" 9 + #include "gem/i915_gem_context.h" 9 10 11 + #include "mock_context.h" 10 12 #include "mock_dmabuf.h" 13 + #include "igt_gem_utils.h" 14 + #include "selftests/mock_drm.h" 11 15 #include "selftests/mock_gem_device.h" 12 16 13 17 static int igt_dmabuf_export(void *arg) ··· 144 140 return err; 145 141 } 146 142 143 + static int verify_access(struct drm_i915_private *i915, 144 + struct drm_i915_gem_object *native_obj, 145 + struct drm_i915_gem_object *import_obj) 146 + { 147 + struct i915_gem_engines_iter it; 148 + struct i915_gem_context *ctx; 149 + struct intel_context *ce; 150 + struct i915_vma *vma; 151 + struct file *file; 152 + u32 *vaddr; 153 + int err = 0, i; 154 + 155 + file = mock_file(i915); 156 + if (IS_ERR(file)) 157 + return PTR_ERR(file); 158 + 159 + ctx = live_context(i915, file); 160 + if (IS_ERR(ctx)) { 161 + err = PTR_ERR(ctx); 162 + goto out_file; 163 + } 164 + 165 + for_each_gem_engine(ce, i915_gem_context_lock_engines(ctx), it) { 166 + if (intel_engine_can_store_dword(ce->engine)) 167 + break; 168 + } 169 + i915_gem_context_unlock_engines(ctx); 170 + if (!ce) 171 + goto out_file; 172 + 173 + vma = i915_vma_instance(import_obj, ce->vm, NULL); 174 + if (IS_ERR(vma)) { 175 + err = PTR_ERR(vma); 176 + goto out_file; 177 + } 178 + 179 + err = i915_vma_pin(vma, 0, 0, PIN_USER); 180 + if (err) 181 + goto out_file; 182 + 183 + err = igt_gpu_fill_dw(ce, vma, 0, 184 + vma->size >> PAGE_SHIFT, 0xdeadbeaf); 185 + i915_vma_unpin(vma); 186 + if (err) 187 + goto out_file; 188 + 189 + err = i915_gem_object_wait(import_obj, 0, MAX_SCHEDULE_TIMEOUT); 190 + if (err) 191 + goto out_file; 192 + 193 + vaddr = i915_gem_object_pin_map_unlocked(native_obj, I915_MAP_WB); 194 + if (IS_ERR(vaddr)) { 195 + err = PTR_ERR(vaddr); 196 + goto out_file; 197 + } 198 + 199 + for (i = 0; i < native_obj->base.size / sizeof(u32); i += PAGE_SIZE / sizeof(u32)) { 200 + if (vaddr[i] != 0xdeadbeaf) { 201 + pr_err("Data 
mismatch [%d]=%u\n", i, vaddr[i]); 202 + err = -EINVAL; 203 + goto out_file; 204 + } 205 + } 206 + 207 + out_file: 208 + fput(file); 209 + return err; 210 + } 211 + 147 212 static int igt_dmabuf_import_same_driver(struct drm_i915_private *i915, 148 213 struct intel_memory_region **regions, 149 214 unsigned int num_regions) ··· 227 154 228 155 force_different_devices = true; 229 156 230 - obj = __i915_gem_object_create_user(i915, PAGE_SIZE, 157 + obj = __i915_gem_object_create_user(i915, SZ_8M, 231 158 regions, num_regions); 232 159 if (IS_ERR(obj)) { 233 160 pr_err("__i915_gem_object_create_user failed with err=%ld\n", ··· 278 205 } 279 206 280 207 i915_gem_object_unlock(import_obj); 208 + 209 + err = verify_access(i915, obj, import_obj); 210 + if (err) 211 + goto out_import; 281 212 282 213 /* Now try a fake an importer */ 283 214 import_attach = dma_buf_attach(dmabuf, obj->base.dev->dev);
+1
drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
··· 8 8 #include <linux/prime_numbers.h> 9 9 10 10 #include "gem/i915_gem_internal.h" 11 + #include "gem/i915_gem_lmem.h" 11 12 #include "gem/i915_gem_region.h" 12 13 #include "gem/i915_gem_ttm.h" 13 14 #include "gem/i915_gem_ttm_move.h"
+35 -20
drivers/gpu/drm/i915/gt/gen8_engine_cs.c
··· 396 396 return 0; 397 397 } 398 398 399 - static int __gen125_emit_bb_start(struct i915_request *rq, 400 - u64 offset, u32 len, 401 - const unsigned int flags, 402 - u32 arb) 399 + static int __xehp_emit_bb_start(struct i915_request *rq, 400 + u64 offset, u32 len, 401 + const unsigned int flags, 402 + u32 arb) 403 403 { 404 404 struct intel_context *ce = rq->context; 405 405 u32 wa_offset = lrc_indirect_bb(ce); 406 406 u32 *cs; 407 + 408 + GEM_BUG_ON(!ce->wa_bb_page); 407 409 408 410 cs = intel_ring_begin(rq, 12); 409 411 if (IS_ERR(cs)) ··· 437 435 return 0; 438 436 } 439 437 440 - int gen125_emit_bb_start_noarb(struct i915_request *rq, 441 - u64 offset, u32 len, 442 - const unsigned int flags) 438 + int xehp_emit_bb_start_noarb(struct i915_request *rq, 439 + u64 offset, u32 len, 440 + const unsigned int flags) 443 441 { 444 - return __gen125_emit_bb_start(rq, offset, len, flags, MI_ARB_DISABLE); 442 + return __xehp_emit_bb_start(rq, offset, len, flags, MI_ARB_DISABLE); 445 443 } 446 444 447 - int gen125_emit_bb_start(struct i915_request *rq, 448 - u64 offset, u32 len, 449 - const unsigned int flags) 445 + int xehp_emit_bb_start(struct i915_request *rq, 446 + u64 offset, u32 len, 447 + const unsigned int flags) 450 448 { 451 - return __gen125_emit_bb_start(rq, offset, len, flags, MI_ARB_ENABLE); 449 + return __xehp_emit_bb_start(rq, offset, len, flags, MI_ARB_ENABLE); 452 450 } 453 451 454 452 int gen8_emit_bb_start_noarb(struct i915_request *rq, ··· 585 583 u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) 586 584 { 587 585 cs = gen8_emit_pipe_control(cs, 586 + PIPE_CONTROL_CS_STALL | 587 + PIPE_CONTROL_TLB_INVALIDATE | 588 588 PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | 589 589 PIPE_CONTROL_DEPTH_CACHE_FLUSH | 590 590 PIPE_CONTROL_DC_FLUSH_ENABLE, ··· 604 600 605 601 u32 *gen11_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) 606 602 { 603 + cs = gen8_emit_pipe_control(cs, 604 + PIPE_CONTROL_CS_STALL | 605 + 
PIPE_CONTROL_TLB_INVALIDATE | 606 + PIPE_CONTROL_TILE_CACHE_FLUSH | 607 + PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | 608 + PIPE_CONTROL_DEPTH_CACHE_FLUSH | 609 + PIPE_CONTROL_DC_FLUSH_ENABLE, 610 + 0); 611 + 612 + /*XXX: Look at gen8_emit_fini_breadcrumb_rcs */ 607 613 cs = gen8_emit_ggtt_write_rcs(cs, 608 614 rq->fence.seqno, 609 615 hwsp_offset(rq), 610 - PIPE_CONTROL_CS_STALL | 611 - PIPE_CONTROL_TILE_CACHE_FLUSH | 612 - PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | 613 - PIPE_CONTROL_DEPTH_CACHE_FLUSH | 614 - PIPE_CONTROL_DC_FLUSH_ENABLE | 615 - PIPE_CONTROL_FLUSH_ENABLE); 616 + PIPE_CONTROL_FLUSH_ENABLE | 617 + PIPE_CONTROL_CS_STALL); 616 618 617 619 return gen8_emit_fini_breadcrumb_tail(rq, cs); 618 620 } ··· 725 715 { 726 716 struct drm_i915_private *i915 = rq->engine->i915; 727 717 u32 flags = (PIPE_CONTROL_CS_STALL | 718 + PIPE_CONTROL_TLB_INVALIDATE | 728 719 PIPE_CONTROL_TILE_CACHE_FLUSH | 729 720 PIPE_CONTROL_FLUSH_L3 | 730 721 PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | ··· 742 731 else if (rq->engine->class == COMPUTE_CLASS) 743 732 flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS; 744 733 734 + cs = gen12_emit_pipe_control(cs, PIPE_CONTROL0_HDC_PIPELINE_FLUSH, flags, 0); 735 + 736 + /*XXX: Look at gen8_emit_fini_breadcrumb_rcs */ 745 737 cs = gen12_emit_ggtt_write_rcs(cs, 746 738 rq->fence.seqno, 747 739 hwsp_offset(rq), 748 - PIPE_CONTROL0_HDC_PIPELINE_FLUSH, 749 - flags); 740 + 0, 741 + PIPE_CONTROL_FLUSH_ENABLE | 742 + PIPE_CONTROL_CS_STALL); 750 743 751 744 return gen12_emit_fini_breadcrumb_tail(rq, cs); 752 745 }
+6 -6
drivers/gpu/drm/i915/gt/gen8_engine_cs.h
··· 32 32 u64 offset, u32 len, 33 33 const unsigned int flags); 34 34 35 - int gen125_emit_bb_start_noarb(struct i915_request *rq, 36 - u64 offset, u32 len, 37 - const unsigned int flags); 38 - int gen125_emit_bb_start(struct i915_request *rq, 39 - u64 offset, u32 len, 40 - const unsigned int flags); 35 + int xehp_emit_bb_start_noarb(struct i915_request *rq, 36 + u64 offset, u32 len, 37 + const unsigned int flags); 38 + int xehp_emit_bb_start(struct i915_request *rq, 39 + u64 offset, u32 len, 40 + const unsigned int flags); 41 41 42 42 u32 *gen8_emit_fini_breadcrumb_xcs(struct i915_request *rq, u32 *cs); 43 43 u32 *gen12_emit_fini_breadcrumb_xcs(struct i915_request *rq, u32 *cs);
+48 -38
drivers/gpu/drm/i915/gt/gen8_ppgtt.c
··· 476 476 const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags); 477 477 unsigned int rem = sg_dma_len(iter->sg); 478 478 u64 start = vma_res->start; 479 + u64 end = start + vma_res->vma_size; 479 480 480 481 GEM_BUG_ON(!i915_vm_is_4lvl(vm)); 481 482 ··· 490 489 gen8_pte_t encode = pte_encode; 491 490 unsigned int page_size; 492 491 gen8_pte_t *vaddr; 493 - u16 index, max; 492 + u16 index, max, nent, i; 494 493 495 494 max = I915_PDES; 495 + nent = 1; 496 496 497 497 if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M && 498 498 IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) && ··· 505 503 506 504 vaddr = px_vaddr(pd); 507 505 } else { 508 - if (encode & GEN12_PPGTT_PTE_LM) { 509 - GEM_BUG_ON(__gen8_pte_index(start, 0) % 16); 510 - GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K); 511 - GEM_BUG_ON(!IS_ALIGNED(iter->dma, 512 - I915_GTT_PAGE_SIZE_64K)); 506 + index = __gen8_pte_index(start, 0); 507 + page_size = I915_GTT_PAGE_SIZE; 513 508 514 - index = __gen8_pte_index(start, 0) / 16; 515 - page_size = I915_GTT_PAGE_SIZE_64K; 509 + if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_64K) { 510 + /* 511 + * Device local-memory on these platforms should 512 + * always use 64K pages or larger (including GTT 513 + * alignment), therefore if we know the whole 514 + * page-table needs to be filled we can always 515 + * safely use the compact-layout. Otherwise fall 516 + * back to the TLB hint with PS64. If this is 517 + * system memory we only bother with PS64. 
518 + */ 519 + if ((encode & GEN12_PPGTT_PTE_LM) && 520 + end - start >= SZ_2M && !index) { 521 + index = __gen8_pte_index(start, 0) / 16; 522 + page_size = I915_GTT_PAGE_SIZE_64K; 516 523 517 - max /= 16; 524 + max /= 16; 518 525 519 - vaddr = px_vaddr(pd); 520 - vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K; 526 + vaddr = px_vaddr(pd); 527 + vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K; 521 528 522 - pt->is_compact = true; 523 - } else { 524 - GEM_BUG_ON(pt->is_compact); 525 - index = __gen8_pte_index(start, 0); 526 - page_size = I915_GTT_PAGE_SIZE; 529 + pt->is_compact = true; 530 + } else if (IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_64K) && 531 + rem >= I915_GTT_PAGE_SIZE_64K && 532 + !(index % 16)) { 533 + encode |= GEN12_PTE_PS64; 534 + page_size = I915_GTT_PAGE_SIZE_64K; 535 + nent = 16; 536 + } 527 537 } 528 538 529 539 vaddr = px_vaddr(pt); ··· 543 529 544 530 do { 545 531 GEM_BUG_ON(rem < page_size); 546 - vaddr[index++] = encode | iter->dma; 532 + 533 + for (i = 0; i < nent; i++) { 534 + vaddr[index++] = 535 + encode | (iter->dma + i * 536 + I915_GTT_PAGE_SIZE); 537 + } 547 538 548 539 start += page_size; 549 540 iter->dma += page_size; ··· 764 745 GEM_BUG_ON(!IS_ALIGNED(addr, SZ_64K)); 765 746 GEM_BUG_ON(!IS_ALIGNED(offset, SZ_64K)); 766 747 748 + /* XXX: we don't strictly need to use this layout */ 749 + 767 750 if (!pt->is_compact) { 768 751 vaddr = px_vaddr(pd); 769 752 vaddr[gen8_pd_index(idx, 1)] |= GEN12_PDE_64K; ··· 950 929 */ 951 930 ppgtt->vm.has_read_only = !IS_GRAPHICS_VER(gt->i915, 11, 12); 952 931 953 - if (HAS_LMEM(gt->i915)) { 932 + if (HAS_LMEM(gt->i915)) 954 933 ppgtt->vm.alloc_pt_dma = alloc_pt_lmem; 955 - 956 - /* 957 - * On some platforms the hw has dropped support for 4K GTT pages 958 - * when dealing with LMEM, and due to the design of 64K GTT 959 - * pages in the hw, we can only mark the *entire* page-table as 960 - * operating in 64K GTT mode, since the enable bit is still on 961 - * the pde, and not the pte. 
And since we still need to allow 962 - * 4K GTT pages for SMEM objects, we can't have a "normal" 4K 963 - * page-table with scratch pointing to LMEM, since that's 964 - * undefined from the hw pov. The simplest solution is to just 965 - * move the 64K scratch page to SMEM on such platforms and call 966 - * it a day, since that should work for all configurations. 967 - */ 968 - if (HAS_64K_PAGES(gt->i915)) 969 - ppgtt->vm.alloc_scratch_dma = alloc_pt_dma; 970 - else 971 - ppgtt->vm.alloc_scratch_dma = alloc_pt_lmem; 972 - } else { 934 + else 973 935 ppgtt->vm.alloc_pt_dma = alloc_pt_dma; 974 - ppgtt->vm.alloc_scratch_dma = alloc_pt_dma; 975 - } 936 + 937 + /* 938 + * Using SMEM here instead of LMEM has the advantage of not reserving 939 + * high performance memory for a "never" used filler page. It also 940 + * removes the device access that would be required to initialise the 941 + * scratch page, reducing pressure on an even scarcer resource. 942 + */ 943 + ppgtt->vm.alloc_scratch_dma = alloc_pt_dma; 976 944 977 945 ppgtt->vm.pte_encode = gen8_pte_encode; 978 946
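The insert-path change above picks between three PTE granularities: the compact 64K page-table layout (LMEM filling a whole page table), the PS64 TLB hint (16 consecutive 4K PTEs describing one 64K page), and plain 4K PTEs. A minimal userspace sketch of that decision, with `pick_page_size` and its parameters invented for illustration (`pte_lm` models `GEN12_PPGTT_PTE_LM`, `sg_64k` models `I915_GTT_PAGE_SIZE_64K` in `bi.page_sizes.sg`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K	0x1000ull
#define SZ_64K	0x10000ull
#define SZ_2M	0x200000ull

/* Hypothetical helper mirroring the new insert-path decision: return the
 * PTE granularity the patch would use for this portion of the mapping. */
static uint64_t pick_page_size(bool pte_lm, bool sg_64k,
			       uint64_t start, uint64_t end,
			       uint64_t dma, uint64_t rem,
			       unsigned int pte_index)
{
	if (!sg_64k)
		return SZ_4K;

	/* Device local-memory covering the whole page table: the compact
	 * 64K layout is always safe (64K minimum GTT alignment). */
	if (pte_lm && end - start >= SZ_2M && pte_index == 0)
		return SZ_64K;			/* compact PT */

	/* Otherwise fall back to the PS64 TLB hint: needs 64K-aligned DMA,
	 * at least 64K remaining, and a 16-aligned PTE index. */
	if ((dma % SZ_64K) == 0 && rem >= SZ_64K && (pte_index % 16) == 0)
		return SZ_64K;			/* PS64, nent = 16 */

	return SZ_4K;
}
```

The PS64 path writes 16 PTEs per 64K page (`nent = 16` in the patch), which is why the index must be 16-aligned.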
+8
drivers/gpu/drm/i915/gt/intel_context.h
··· 276 276 return test_bit(CONTEXT_BARRIER_BIT, &ce->flags); 277 277 } 278 278 279 + static inline void intel_context_close(struct intel_context *ce) 280 + { 281 + set_bit(CONTEXT_CLOSED_BIT, &ce->flags); 282 + 283 + if (ce->ops->close) 284 + ce->ops->close(ce); 285 + } 286 + 279 287 static inline bool intel_context_is_closed(const struct intel_context *ce) 280 288 { 281 289 return test_bit(CONTEXT_CLOSED_BIT, &ce->flags);
+7 -2
drivers/gpu/drm/i915/gt/intel_context_types.h
··· 43 43 void (*revoke)(struct intel_context *ce, struct i915_request *rq, 44 44 unsigned int preempt_timeout_ms); 45 45 46 + void (*close)(struct intel_context *ce); 47 + 46 48 int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr); 47 49 int (*pin)(struct intel_context *ce, void *vaddr); 48 50 void (*unpin)(struct intel_context *ce); ··· 199 197 * context's submissions is complete. 200 198 */ 201 199 struct i915_sw_fence blocked; 202 - /** @number_committed_requests: number of committed requests */ 203 - int number_committed_requests; 204 200 /** @requests: list of active requests on this context */ 205 201 struct list_head requests; 206 202 /** @prio: the context's current guc priority */ ··· 208 208 * each priority bucket 209 209 */ 210 210 u32 prio_count[GUC_CLIENT_PRIORITY_NUM]; 211 + /** 212 + * @sched_disable_delay_work: worker to disable scheduling on this 213 + * context 214 + */ 215 + struct delayed_work sched_disable_delay_work; 211 216 } guc_state; 212 217 213 218 struct {
+6
drivers/gpu/drm/i915/gt/intel_engine.h
··· 348 348 return engine->hung_ce; 349 349 } 350 350 351 + u64 intel_clamp_heartbeat_interval_ms(struct intel_engine_cs *engine, u64 value); 352 + u64 intel_clamp_max_busywait_duration_ns(struct intel_engine_cs *engine, u64 value); 353 + u64 intel_clamp_preempt_timeout_ms(struct intel_engine_cs *engine, u64 value); 354 + u64 intel_clamp_stop_timeout_ms(struct intel_engine_cs *engine, u64 value); 355 + u64 intel_clamp_timeslice_duration_ms(struct intel_engine_cs *engine, u64 value); 356 + 351 357 #endif /* _INTEL_RINGBUFFER_H_ */
+94 -15
drivers/gpu/drm/i915/gt/intel_engine_cs.c
··· 486 486 engine->logical_mask = BIT(logical_instance); 487 487 __sprint_engine_name(engine); 488 488 489 + if ((engine->class == COMPUTE_CLASS && !RCS_MASK(engine->gt) && 490 + __ffs(CCS_MASK(engine->gt)) == engine->instance) || 491 + engine->class == RENDER_CLASS) 492 + engine->flags |= I915_ENGINE_FIRST_RENDER_COMPUTE; 493 + 494 + /* features common between engines sharing EUs */ 495 + if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) { 496 + engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE; 497 + engine->flags |= I915_ENGINE_HAS_EU_PRIORITY; 498 + } 499 + 489 500 engine->props.heartbeat_interval_ms = 490 501 CONFIG_DRM_I915_HEARTBEAT_INTERVAL; 491 502 engine->props.max_busywait_duration_ns = ··· 508 497 engine->props.timeslice_duration_ms = 509 498 CONFIG_DRM_I915_TIMESLICE_DURATION; 510 499 511 - /* Override to uninterruptible for OpenCL workloads. */ 512 - if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) 513 - engine->props.preempt_timeout_ms = 0; 500 + /* 501 + * Mid-thread pre-emption is not available in Gen12. Unfortunately, 502 + * some compute workloads run quite long threads. That means they get 503 + * reset due to not pre-empting in a timely manner. So, bump the 504 + * pre-emption timeout value to be much higher for compute engines. 
505 + */ 506 + if (GRAPHICS_VER(i915) == 12 && (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)) 507 + engine->props.preempt_timeout_ms = CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE; 514 508 515 - if ((engine->class == COMPUTE_CLASS && !RCS_MASK(engine->gt) && 516 - __ffs(CCS_MASK(engine->gt)) == engine->instance) || 517 - engine->class == RENDER_CLASS) 518 - engine->flags |= I915_ENGINE_FIRST_RENDER_COMPUTE; 509 + /* Cap properties according to any system limits */ 510 + #define CLAMP_PROP(field) \ 511 + do { \ 512 + u64 clamp = intel_clamp_##field(engine, engine->props.field); \ 513 + if (clamp != engine->props.field) { \ 514 + drm_notice(&engine->i915->drm, \ 515 + "Warning, clamping %s to %lld to prevent overflow\n", \ 516 + #field, clamp); \ 517 + engine->props.field = clamp; \ 518 + } \ 519 + } while (0) 519 520 520 - /* features common between engines sharing EUs */ 521 - if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) { 522 - engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE; 523 - engine->flags |= I915_ENGINE_HAS_EU_PRIORITY; 524 - } 521 + CLAMP_PROP(heartbeat_interval_ms); 522 + CLAMP_PROP(max_busywait_duration_ns); 523 + CLAMP_PROP(preempt_timeout_ms); 524 + CLAMP_PROP(stop_timeout_ms); 525 + CLAMP_PROP(timeslice_duration_ms); 526 + 527 + #undef CLAMP_PROP 525 528 526 529 engine->defaults = engine->props; /* never to change again */ 527 530 ··· 557 532 gt->engine[id] = engine; 558 533 559 534 return 0; 535 + } 536 + 537 + u64 intel_clamp_heartbeat_interval_ms(struct intel_engine_cs *engine, u64 value) 538 + { 539 + value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)); 540 + 541 + return value; 542 + } 543 + 544 + u64 intel_clamp_max_busywait_duration_ns(struct intel_engine_cs *engine, u64 value) 545 + { 546 + value = min(value, jiffies_to_nsecs(2)); 547 + 548 + return value; 549 + } 550 + 551 + u64 intel_clamp_preempt_timeout_ms(struct intel_engine_cs *engine, u64 value) 552 + { 553 + /* 554 + * NB: The GuC API only supports 
32bit values. However, the limit is further 555 + * reduced due to internal calculations which would otherwise overflow. 556 + */ 557 + if (intel_guc_submission_is_wanted(&engine->gt->uc.guc)) 558 + value = min_t(u64, value, guc_policy_max_preempt_timeout_ms()); 559 + 560 + value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)); 561 + 562 + return value; 563 + } 564 + 565 + u64 intel_clamp_stop_timeout_ms(struct intel_engine_cs *engine, u64 value) 566 + { 567 + value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)); 568 + 569 + return value; 570 + } 571 + 572 + u64 intel_clamp_timeslice_duration_ms(struct intel_engine_cs *engine, u64 value) 573 + { 574 + /* 575 + * NB: The GuC API only supports 32bit values. However, the limit is further 576 + * reduced due to internal calculations which would otherwise overflow. 577 + */ 578 + if (intel_guc_submission_is_wanted(&engine->gt->uc.guc)) 579 + value = min_t(u64, value, guc_policy_max_exec_quantum_ms()); 580 + 581 + value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)); 582 + 583 + return value; 560 584 } 561 585 562 586 static void __setup_engine_capabilities(struct intel_engine_cs *engine) ··· 1348 1274 return err; 1349 1275 1350 1276 err = setup(engine); 1351 - if (err) 1277 + if (err) { 1278 + intel_engine_cleanup_common(engine); 1352 1279 return err; 1280 + } 1281 + 1282 + /* The backend should now be responsible for cleanup */ 1283 + GEM_BUG_ON(engine->release == NULL); 1353 1284 1354 1285 err = engine_init_common(engine); 1355 1286 if (err) ··· 1633 1554 for_each_ss_steering(iter, engine->gt, slice, subslice) { 1634 1555 instdone->sampler[slice][subslice] = 1635 1556 intel_gt_mcr_read(engine->gt, 1636 - GEN7_SAMPLER_INSTDONE, 1557 + GEN8_SAMPLER_INSTDONE, 1637 1558 slice, subslice); 1638 1559 instdone->row[slice][subslice] = 1639 1560 intel_gt_mcr_read(engine->gt, 1640 - GEN7_ROW_INSTDONE, 1561 + GEN8_ROW_INSTDONE, 1641 1562 slice, subslice); 1642 1563 } 1643 1564
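The `CLAMP_PROP()` macro above applies a per-field `intel_clamp_*()` helper and warns only when the stored property actually changed. A standalone sketch of that pattern, with the backend limit value made up for the example (the real limits come from `guc_policy_max_*()` and `jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)`):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for an intel_clamp_*() helper: cap a configured
 * value against a backend limit (e.g. the GuC API's reduced 32-bit range). */
static uint64_t clamp_timeout_ms(uint64_t value, uint64_t backend_max_ms)
{
	return value < backend_max_ms ? value : backend_max_ms;
}

/* Mirrors the CLAMP_PROP() pattern: store the clamped value back and
 * report when it differed. Returns 1 if clamping fired, 0 otherwise. */
static int apply_clamp(uint64_t *field, const char *name, uint64_t clamped)
{
	if (clamped != *field) {
		fprintf(stderr, "Warning, clamping %s to %llu\n",
			name, (unsigned long long)clamped);
		*field = clamped;
		return 1;
	}
	return 0;
}
```

Keeping the clamp in a named helper per property (rather than open-coding `min()` at each site) is what lets the sysfs/debugfs write paths reuse the exact same limits later.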
+39
drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
··· 22 22 23 23 static bool next_heartbeat(struct intel_engine_cs *engine) 24 24 { 25 + struct i915_request *rq; 25 26 long delay; 26 27 27 28 delay = READ_ONCE(engine->props.heartbeat_interval_ms); 29 + 30 + rq = engine->heartbeat.systole; 31 + 32 + /* 33 + * FIXME: The final period extension is disabled if the period has been 34 + * modified from the default. This is to prevent issues with certain 35 + * selftests which override the value and expect specific behaviour. 36 + * Once the selftests have been updated to either cope with variable 37 + * heartbeat periods (or to override the pre-emption timeout as well, 38 + * or just to add a selftest specific override of the extension), the 39 + * generic override can be removed. 40 + */ 41 + if (rq && rq->sched.attr.priority >= I915_PRIORITY_BARRIER && 42 + delay == engine->defaults.heartbeat_interval_ms) { 43 + long longer; 44 + 45 + /* 46 + * The final try is at the highest priority possible. Up until now 47 + * a pre-emption might not even have been attempted. So make sure 48 + * this last attempt allows enough time for a pre-emption to occur. 49 + */ 50 + longer = READ_ONCE(engine->props.preempt_timeout_ms) * 2; 51 + longer = intel_clamp_heartbeat_interval_ms(engine, longer); 52 + if (longer > delay) 53 + delay = longer; 54 + } 55 + 28 56 if (!delay) 29 57 return false; 30 58 ··· 315 287 316 288 if (!delay && !intel_engine_has_preempt_reset(engine)) 317 289 return -ENODEV; 290 + 291 + /* FIXME: Remove together with equally marked hack in next_heartbeat. 
*/ 292 + if (delay != engine->defaults.heartbeat_interval_ms && 293 + delay < 2 * engine->props.preempt_timeout_ms) { 294 + if (intel_engine_uses_guc(engine)) 295 + drm_notice(&engine->i915->drm, "%s heartbeat interval adjusted to a non-default value which may downgrade individual engine resets to full GPU resets!\n", 296 + engine->name); 297 + else 298 + drm_notice(&engine->i915->drm, "%s heartbeat interval adjusted to a non-default value which may cause engine resets to target innocent contexts!\n", 299 + engine->name); 300 + } 318 301 319 302 intel_engine_pm_get(engine); 320 303
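The heartbeat change above extends only the final, barrier-priority period, and only when the interval is still the default. A sketch of that computation, using illustrative numbers (i915's defaults are 2500 ms heartbeat and 640 ms preempt timeout, raised on compute engines by the new `CONFIG_DRM_I915_PREEMPT_TIMEOUT_COMPUTE`); `at_barrier_prio` models `rq->sched.attr.priority >= I915_PRIORITY_BARRIER`:

```c
#include <assert.h>

/* Sketch of next_heartbeat()'s final-period extension: give the last,
 * highest-priority heartbeat enough time for a pre-emption to occur. */
static long next_heartbeat_delay(long delay, long default_delay,
				 long preempt_timeout_ms, int at_barrier_prio)
{
	/* Only extend when the period is still the default, so selftests
	 * that override it keep deterministic timing (see the FIXME). */
	if (at_barrier_prio && delay == default_delay) {
		long longer = 2 * preempt_timeout_ms;

		if (longer > delay)
			delay = longer;
	}
	return delay;
}
```

This also explains the new `drm_notice` in the sysfs path: a user-set interval below twice the preempt timeout defeats the extension, so individual engine resets may escalate.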
+1
drivers/gpu/drm/i915/gt/intel_engine_regs.h
··· 201 201 #define RING_CONTEXT_STATUS_PTR(base) _MMIO((base) + 0x3a0) 202 202 #define RING_CTX_TIMESTAMP(base) _MMIO((base) + 0x3a8) /* gen8+ */ 203 203 #define RING_PREDICATE_RESULT(base) _MMIO((base) + 0x3b8) 204 + #define MI_PREDICATE_RESULT_2_ENGINE(base) _MMIO((base) + 0x3bc) 204 205 #define RING_FORCE_TO_NONPRIV(base, i) _MMIO(((base) + 0x4D0) + (i) * 4) 205 206 #define RING_FORCE_TO_NONPRIV_DENY REG_BIT(30) 206 207 #define RING_FORCE_TO_NONPRIV_ADDRESS_MASK REG_GENMASK(25, 2)
+2 -2
drivers/gpu/drm/i915/gt/intel_execlists_submission.c
··· 3471 3471 3472 3472 if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) { 3473 3473 if (intel_engine_has_preemption(engine)) 3474 - engine->emit_bb_start = gen125_emit_bb_start; 3474 + engine->emit_bb_start = xehp_emit_bb_start; 3475 3475 else 3476 - engine->emit_bb_start = gen125_emit_bb_start_noarb; 3476 + engine->emit_bb_start = xehp_emit_bb_start_noarb; 3477 3477 } else { 3478 3478 if (intel_engine_has_preemption(engine)) 3479 3479 engine->emit_bb_start = gen8_emit_bb_start;
+9 -9
drivers/gpu/drm/i915/gt/intel_ggtt.c
··· 871 871 u32 pte_flags; 872 872 int ret; 873 873 874 - GEM_WARN_ON(pci_resource_len(pdev, GTTMMADR_BAR) != gen6_gttmmadr_size(i915)); 875 - phys_addr = pci_resource_start(pdev, GTTMMADR_BAR) + gen6_gttadr_offset(i915); 874 + GEM_WARN_ON(pci_resource_len(pdev, GEN4_GTTMMADR_BAR) != gen6_gttmmadr_size(i915)); 875 + phys_addr = pci_resource_start(pdev, GEN4_GTTMMADR_BAR) + gen6_gttadr_offset(i915); 876 876 877 877 /* 878 878 * On BXT+/ICL+ writes larger than 64 bit to the GTT pagetable range ··· 931 931 unsigned int size; 932 932 u16 snb_gmch_ctl; 933 933 934 - if (!HAS_LMEM(i915)) { 935 - if (!i915_pci_resource_valid(pdev, GTT_APERTURE_BAR)) 934 + if (!HAS_LMEM(i915) && !HAS_LMEMBAR_SMEM_STOLEN(i915)) { 935 + if (!i915_pci_resource_valid(pdev, GEN4_GMADR_BAR)) 936 936 return -ENXIO; 937 937 938 - ggtt->gmadr = pci_resource(pdev, GTT_APERTURE_BAR); 938 + ggtt->gmadr = pci_resource(pdev, GEN4_GMADR_BAR); 939 939 ggtt->mappable_end = resource_size(&ggtt->gmadr); 940 940 } 941 941 ··· 986 986 987 987 ggtt->vm.pte_encode = gen8_ggtt_pte_encode; 988 988 989 - setup_private_pat(ggtt->vm.gt->uncore); 989 + setup_private_pat(ggtt->vm.gt); 990 990 991 991 return ggtt_probe_common(ggtt, size); 992 992 } ··· 1089 1089 unsigned int size; 1090 1090 u16 snb_gmch_ctl; 1091 1091 1092 - if (!i915_pci_resource_valid(pdev, GTT_APERTURE_BAR)) 1092 + if (!i915_pci_resource_valid(pdev, GEN4_GMADR_BAR)) 1093 1093 return -ENXIO; 1094 1094 1095 - ggtt->gmadr = pci_resource(pdev, GTT_APERTURE_BAR); 1095 + ggtt->gmadr = pci_resource(pdev, GEN4_GMADR_BAR); 1096 1096 ggtt->mappable_end = resource_size(&ggtt->gmadr); 1097 1097 1098 1098 /* ··· 1308 1308 wbinvd_on_all_cpus(); 1309 1309 1310 1310 if (GRAPHICS_VER(ggtt->vm.i915) >= 8) 1311 - setup_private_pat(ggtt->vm.gt->uncore); 1311 + setup_private_pat(ggtt->vm.gt); 1312 1312 1313 1313 intel_ggtt_restore_fences(ggtt); 1314 1314 }
+4
drivers/gpu/drm/i915/gt/intel_gpu_commands.h
··· 187 187 #define MI_BATCH_RESOURCE_STREAMER REG_BIT(10) 188 188 #define MI_BATCH_PREDICATE REG_BIT(15) /* HSW+ on RCS only*/ 189 189 190 + #define MI_OPCODE(x) (((x) >> 23) & 0x3f) 191 + #define IS_MI_LRI_CMD(x) (MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 0))) 192 + #define MI_LRI_LEN(x) (((x) & 0xff) + 1) 193 + 190 194 /* 191 195 * 3D instructions used by the kernel 192 196 */
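The three macros added here decode MI command headers: the opcode sits in bits 28:23 of the dword, and the LRI length field encodes (DWords - 2), so `MI_LRI_LEN()` yields the payload size. A quick userspace check, where `MI_INSTR()` is reproduced from the same kernel header and `lri_num_regs()` is a hypothetical helper counting the (offset, value) pairs:

```c
#include <assert.h>
#include <stdint.h>

#define MI_INSTR(opcode, flags)	(((opcode) << 23) | (flags))
#define MI_OPCODE(x)		(((x) >> 23) & 0x3f)
#define IS_MI_LRI_CMD(x)	(MI_OPCODE(x) == MI_OPCODE(MI_INSTR(0x22, 0)))
#define MI_LRI_LEN(x)		(((x) & 0xff) + 1)

/* An MI_LOAD_REGISTER_IMM writing N registers is MI_INSTR(0x22, 2*N - 1),
 * i.e. 2N payload dwords: N (register offset, value) pairs. */
static unsigned int lri_num_regs(uint32_t header)
{
	if (!IS_MI_LRI_CMD(header))
		return 0;
	return MI_LRI_LEN(header) / 2;
}
```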
+20 -3
drivers/gpu/drm/i915/gt/intel_gsc.c
··· 7 7 #include <linux/mei_aux.h> 8 8 #include "i915_drv.h" 9 9 #include "i915_reg.h" 10 + #include "gem/i915_gem_lmem.h" 10 11 #include "gem/i915_gem_region.h" 11 12 #include "gt/intel_gsc.h" 12 13 #include "gt/intel_gt.h" ··· 143 142 struct intel_gsc_intf *intf = &gsc->intf[intf_id]; 144 143 145 144 if (intf->adev) { 146 - auxiliary_device_delete(&intf->adev->aux_dev); 147 - auxiliary_device_uninit(&intf->adev->aux_dev); 145 + struct auxiliary_device *aux_dev = &intf->adev->aux_dev; 146 + 147 + if (intf_id == 0) 148 + intel_huc_unregister_gsc_notifier(&gsc_to_gt(gsc)->uc.huc, 149 + aux_dev->dev.bus); 150 + 151 + auxiliary_device_delete(aux_dev); 152 + auxiliary_device_uninit(aux_dev); 148 153 intf->adev = NULL; 149 154 } 150 155 ··· 249 242 goto fail; 250 243 } 251 244 245 + intf->adev = adev; /* needed by the notifier */ 246 + 247 + if (intf_id == 0) 248 + intel_huc_register_gsc_notifier(&gsc_to_gt(gsc)->uc.huc, 249 + aux_dev->dev.bus); 250 + 252 251 ret = auxiliary_device_add(aux_dev); 253 252 if (ret < 0) { 254 253 drm_err(&i915->drm, "gsc aux add failed %d\n", ret); 254 + if (intf_id == 0) 255 + intel_huc_unregister_gsc_notifier(&gsc_to_gt(gsc)->uc.huc, 256 + aux_dev->dev.bus); 257 + intf->adev = NULL; 258 + 255 259 /* adev will be freed with the put_device() and .release sequence */ 256 260 auxiliary_device_uninit(aux_dev); 257 261 goto fail; 258 262 } 259 - intf->adev = adev; 260 263 261 264 return; 262 265 fail:
+114 -25
drivers/gpu/drm/i915/gt/intel_gt.c
··· 40 40 { 41 41 spin_lock_init(gt->irq_lock); 42 42 43 - INIT_LIST_HEAD(&gt->lmem_userfault_list); 44 - mutex_init(&gt->lmem_userfault_lock); 45 43 INIT_LIST_HEAD(&gt->closed_vma); 46 44 spin_lock_init(&gt->closed_lock); 47 45 ··· 229 231 GEN6_RING_FAULT_REG_POSTING_READ(engine); 230 232 } 231 233 234 + i915_reg_t intel_gt_perf_limit_reasons_reg(struct intel_gt *gt) 235 + { 236 + /* GT0_PERF_LIMIT_REASONS is available only for Gen11+ */ 237 + if (GRAPHICS_VER(gt->i915) < 11) 238 + return INVALID_MMIO_REG; 239 + 240 + return gt->type == GT_MEDIA ? 241 + MTL_MEDIA_PERF_LIMIT_REASONS : GT0_PERF_LIMIT_REASONS; 242 + } 243 + 232 244 void 233 245 intel_gt_clear_error_registers(struct intel_gt *gt, 234 246 intel_engine_mask_t engine_mask) ··· 268 260 I915_MASTER_ERROR_INTERRUPT); 269 261 } 270 262 271 - if (GRAPHICS_VER(i915) >= 12) { 263 + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) { 264 + intel_gt_mcr_multicast_rmw(gt, XEHP_RING_FAULT_REG, 265 + RING_FAULT_VALID, 0); 266 + intel_gt_mcr_read_any(gt, XEHP_RING_FAULT_REG); 267 + } else if (GRAPHICS_VER(i915) >= 12) { 272 268 rmw_clear(uncore, GEN12_RING_FAULT_REG, RING_FAULT_VALID); 273 269 intel_uncore_posting_read(uncore, GEN12_RING_FAULT_REG); 274 270 } else if (GRAPHICS_VER(i915) >= 8) { ··· 307 295 RING_FAULT_SRCID(fault), 308 296 RING_FAULT_FAULT_TYPE(fault)); 309 297 } 298 + } 299 + } 300 + 301 + static void xehp_check_faults(struct intel_gt *gt) 302 + { 303 + u32 fault; 304 + 305 + /* 306 + * Although the fault register now lives in an MCR register range, 307 + * the GAM registers are special and we only truly need to read 308 + * the "primary" GAM instance rather than handling each instance 309 + * individually. intel_gt_mcr_read_any() will automatically steer 310 + * toward the primary instance. 
311 + */ 312 + fault = intel_gt_mcr_read_any(gt, XEHP_RING_FAULT_REG); 313 + if (fault & RING_FAULT_VALID) { 314 + u32 fault_data0, fault_data1; 315 + u64 fault_addr; 316 + 317 + fault_data0 = intel_gt_mcr_read_any(gt, XEHP_FAULT_TLB_DATA0); 318 + fault_data1 = intel_gt_mcr_read_any(gt, XEHP_FAULT_TLB_DATA1); 319 + 320 + fault_addr = ((u64)(fault_data1 & FAULT_VA_HIGH_BITS) << 44) | 321 + ((u64)fault_data0 << 12); 322 + 323 + drm_dbg(&gt->i915->drm, "Unexpected fault\n" 324 + "\tAddr: 0x%08x_%08x\n" 325 + "\tAddress space: %s\n" 326 + "\tEngine ID: %d\n" 327 + "\tSource ID: %d\n" 328 + "\tType: %d\n", 329 + upper_32_bits(fault_addr), lower_32_bits(fault_addr), 330 + fault_data1 & FAULT_GTT_SEL ? "GGTT" : "PPGTT", 331 + GEN8_RING_FAULT_ENGINE_ID(fault), 332 + RING_FAULT_SRCID(fault), 333 + RING_FAULT_FAULT_TYPE(fault)); 310 334 } 311 335 } 312 336 ··· 392 344 struct drm_i915_private *i915 = gt->i915; 393 345 394 346 /* From GEN8 onwards we only have one 'All Engine Fault Register' */ 395 - if (GRAPHICS_VER(i915) >= 8) 347 + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) 348 + xehp_check_faults(gt); 349 + else if (GRAPHICS_VER(i915) >= 8) 396 350 gen8_check_faults(gt); 397 351 else if (GRAPHICS_VER(i915) >= 6) 398 352 gen6_check_faults(gt); ··· 857 807 } 858 808 859 809 intel_uncore_init_early(gt->uncore, gt); 860 - intel_wakeref_auto_init(&gt->userfault_wakeref, gt->uncore->rpm); 861 810 862 811 ret = intel_uncore_setup_mmio(gt->uncore, phys_addr); 863 812 if (ret) ··· 877 828 unsigned int i; 878 829 int ret; 879 830 880 - mmio_bar = GRAPHICS_VER(i915) == 2 ? 
GEN2_GTTMMADR_BAR : GTTMMADR_BAR; 831 + mmio_bar = intel_mmio_bar(GRAPHICS_VER(i915)); 881 832 phys_addr = pci_resource_start(pdev, mmio_bar); 882 833 883 834 /* ··· 988 939 } 989 940 990 941 struct reg_and_bit { 991 - i915_reg_t reg; 942 + union { 943 + i915_reg_t reg; 944 + i915_mcr_reg_t mcr_reg; 945 + }; 992 946 u32 bit; 993 947 }; 994 948 ··· 1017 965 return rb; 1018 966 } 1019 967 968 + /* 969 + * HW architecture suggest typical invalidation time at 40us, 970 + * with pessimistic cases up to 100us and a recommendation to 971 + * cap at 1ms. We go a bit higher just in case. 972 + */ 973 + #define TLB_INVAL_TIMEOUT_US 100 974 + #define TLB_INVAL_TIMEOUT_MS 4 975 + 976 + /* 977 + * On Xe_HP the TLB invalidation registers are located at the same MMIO offsets 978 + * but are now considered MCR registers. Since they exist within a GAM range, 979 + * the primary instance of the register rolls up the status from each unit. 980 + */ 981 + static int wait_for_invalidate(struct intel_gt *gt, struct reg_and_bit rb) 982 + { 983 + if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50)) 984 + return intel_gt_mcr_wait_for_reg_fw(gt, rb.mcr_reg, rb.bit, 0, 985 + TLB_INVAL_TIMEOUT_US, 986 + TLB_INVAL_TIMEOUT_MS); 987 + else 988 + return __intel_wait_for_register_fw(gt->uncore, rb.reg, rb.bit, 0, 989 + TLB_INVAL_TIMEOUT_US, 990 + TLB_INVAL_TIMEOUT_MS, 991 + NULL); 992 + } 993 + 1020 994 static void mmio_invalidate_full(struct intel_gt *gt) 1021 995 { 1022 996 static const i915_reg_t gen8_regs[] = { ··· 1058 980 [COPY_ENGINE_CLASS] = GEN12_BLT_TLB_INV_CR, 1059 981 [COMPUTE_CLASS] = GEN12_COMPCTX_TLB_INV_CR, 1060 982 }; 983 + static const i915_mcr_reg_t xehp_regs[] = { 984 + [RENDER_CLASS] = XEHP_GFX_TLB_INV_CR, 985 + [VIDEO_DECODE_CLASS] = XEHP_VD_TLB_INV_CR, 986 + [VIDEO_ENHANCEMENT_CLASS] = XEHP_VE_TLB_INV_CR, 987 + [COPY_ENGINE_CLASS] = XEHP_BLT_TLB_INV_CR, 988 + [COMPUTE_CLASS] = XEHP_COMPCTX_TLB_INV_CR, 989 + }; 1061 990 struct drm_i915_private *i915 = gt->i915; 1062 991 struct 
intel_uncore *uncore = gt->uncore; 1063 992 struct intel_engine_cs *engine; ··· 1073 988 const i915_reg_t *regs; 1074 989 unsigned int num = 0; 1075 990 1076 - if (GRAPHICS_VER(i915) == 12) { 991 + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) { 992 + regs = NULL; 993 + num = ARRAY_SIZE(xehp_regs); 994 + } else if (GRAPHICS_VER(i915) == 12) { 1077 995 regs = gen12_regs; 1078 996 num = ARRAY_SIZE(gen12_regs); 1079 997 } else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) { ··· 1101 1013 if (!intel_engine_pm_is_awake(engine)) 1102 1014 continue; 1103 1015 1104 - rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); 1105 - if (!i915_mmio_reg_offset(rb.reg)) 1106 - continue; 1016 + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) { 1017 + intel_gt_mcr_multicast_write_fw(gt, 1018 + xehp_regs[engine->class], 1019 + BIT(engine->instance)); 1020 + } else { 1021 + rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); 1022 + if (!i915_mmio_reg_offset(rb.reg)) 1023 + continue; 1107 1024 1108 - intel_uncore_write_fw(uncore, rb.reg, rb.bit); 1025 + intel_uncore_write_fw(uncore, rb.reg, rb.bit); 1026 + } 1109 1027 awake |= engine->mask; 1110 1028 } 1111 1029 ··· 1131 1037 for_each_engine_masked(engine, gt, awake, tmp) { 1132 1038 struct reg_and_bit rb; 1133 1039 1134 - /* 1135 - * HW architecture suggest typical invalidation time at 40us, 1136 - * with pessimistic cases up to 100us and a recommendation to 1137 - * cap at 1ms. We go a bit higher just in case. 
1138 - */ 1139 - const unsigned int timeout_us = 100; 1140 - const unsigned int timeout_ms = 4; 1040 + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) { 1041 + rb.mcr_reg = xehp_regs[engine->class]; 1042 + rb.bit = BIT(engine->instance); 1043 + } else { 1044 + rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); 1045 + } 1141 1046 1142 - rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); 1143 - if (__intel_wait_for_register_fw(uncore, 1144 - rb.reg, rb.bit, 0, 1145 - timeout_us, timeout_ms, 1146 - NULL)) 1047 + if (wait_for_invalidate(gt, rb)) 1147 1048 drm_err_ratelimited(&gt->i915->drm, 1148 1049 "%s TLB invalidation did not complete in %ums!\n", 1149 - engine->name, timeout_ms); 1050 + engine->name, TLB_INVAL_TIMEOUT_MS); 1150 1051 } 1151 1052 1152 1053 /*
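The TLB flow above writes every awake engine's invalidation bit first and only then waits, so invalidations proceed in parallel rather than serially. A toy model of the wait step (`wait_for_invalidate()` in the patch), with the MMIO access abstracted behind a function pointer and a simple attempt counter standing in for the real us/ms timeout handling:

```c
#include <assert.h>
#include <stdint.h>

#define TLB_INVAL_TIMEOUT_US 100
#define TLB_INVAL_TIMEOUT_MS 4

/* Poll until the engine's invalidation bit reads back as zero or the
 * budget runs out; mirrors the write-then-wait structure of the patch. */
static int wait_for_bit_clear(uint32_t (*read)(void *ctx), void *ctx,
			      uint32_t bit, int attempts)
{
	while (attempts--) {
		if (!(read(ctx) & bit))
			return 0;		/* invalidation complete */
	}
	return -110;				/* -ETIMEDOUT */
}

/* A fake register that stays busy for a caller-chosen number of reads. */
static int toy_busy_reads;
static uint32_t toy_read(void *ctx)
{
	(void)ctx;
	return toy_busy_reads-- > 0 ? 0x1 : 0x0;
}
```

On Xe_HP the same pattern goes through `intel_gt_mcr_wait_for_reg_fw()` instead, since the primary GAM instance rolls up status from all units.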
+1
drivers/gpu/drm/i915/gt/intel_gt.h
··· 60 60 int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout); 61 61 62 62 void intel_gt_check_and_clear_faults(struct intel_gt *gt); 63 + i915_reg_t intel_gt_perf_limit_reasons_reg(struct intel_gt *gt); 63 64 void intel_gt_clear_error_registers(struct intel_gt *gt, 64 65 intel_engine_mask_t engine_mask); 65 66
+33 -5
drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
··· 107 107 return freq; 108 108 } 109 109 110 - static u32 gen5_read_clock_frequency(struct intel_uncore *uncore) 110 + static u32 gen6_read_clock_frequency(struct intel_uncore *uncore) 111 111 { 112 112 /* 113 113 * PRMs say: ··· 119 119 return 12500000; 120 120 } 121 121 122 - static u32 gen2_read_clock_frequency(struct intel_uncore *uncore) 122 + static u32 gen5_read_clock_frequency(struct intel_uncore *uncore) 123 + { 124 + /* 125 + * 63:32 increments every 1000 ns 126 + * 31:0 mbz 127 + */ 128 + return 1000000000 / 1000; 129 + } 130 + 131 + static u32 g4x_read_clock_frequency(struct intel_uncore *uncore) 132 + { 133 + /* 134 + * 63:20 increments every 1/4 ns 135 + * 19:0 mbz 136 + * 137 + * -> 63:32 increments every 1024 ns 138 + */ 139 + return 1000000000 / 1024; 140 + } 141 + 142 + static u32 gen4_read_clock_frequency(struct intel_uncore *uncore) 123 143 { 124 144 /* 125 145 * PRMs say: ··· 147 127 * "The value in this register increments once every 16 148 128 * hclks." (through the “Clocking Configuration” 149 129 * (“CLKCFG”) MCHBAR register) 130 + * 131 + * Testing on actual hardware has shown there is no /16. 
150 132 */ 151 - return RUNTIME_INFO(uncore->i915)->rawclk_freq * 1000 / 16; 133 + return RUNTIME_INFO(uncore->i915)->rawclk_freq * 1000; 152 134 } 153 135 154 136 static u32 read_clock_frequency(struct intel_uncore *uncore) ··· 159 137 return gen11_read_clock_frequency(uncore); 160 138 else if (GRAPHICS_VER(uncore->i915) >= 9) 161 139 return gen9_read_clock_frequency(uncore); 162 - else if (GRAPHICS_VER(uncore->i915) >= 5) 140 + else if (GRAPHICS_VER(uncore->i915) >= 6) 141 + return gen6_read_clock_frequency(uncore); 142 + else if (GRAPHICS_VER(uncore->i915) == 5) 163 143 return gen5_read_clock_frequency(uncore); 144 + else if (IS_G4X(uncore->i915)) 145 + return g4x_read_clock_frequency(uncore); 146 + else if (GRAPHICS_VER(uncore->i915) == 4) 147 + return gen4_read_clock_frequency(uncore); 164 148 else 165 - return gen2_read_clock_frequency(uncore); 149 + return 0; 166 150 } 167 151 168 152 void intel_gt_init_clock_frequency(struct intel_gt *gt)
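The per-platform constants above all follow from the same arithmetic: if the lowest implemented CS timestamp bit is `low_bit` and it increments every `period_ps` picoseconds, then bit 32 (the bottom of the readable upper-32 view) ticks every `period_ps << (32 - low_bit)` picoseconds. A small check of the g4x and gen5 cases, with `upper32_clock_hz` a helper invented for this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Frequency (Hz) of the 32-bit timestamp view, derived from the counter's
 * lowest implemented bit and that bit's increment period in picoseconds. */
static uint32_t upper32_clock_hz(unsigned int low_bit, uint64_t period_ps)
{
	uint64_t ps_per_tick = period_ps << (32 - low_bit);

	return 1000000000000ull / ps_per_tick;
}
```

g4x (bits 63:20, 1/4 ns = 250 ps per increment) gives 1e9 / 1024 Hz, matching `g4x_read_clock_frequency()`; gen5 (bits 63:32, 1000 ns per increment) gives 1e9 / 1000 Hz.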
+270 -35
drivers/gpu/drm/i915/gt/intel_gt_mcr.c
··· 40 40 "L3BANK", 41 41 "MSLICE", 42 42 "LNCF", 43 + "GAM", 44 + "DSS", 45 + "OADDRM", 43 46 "INSTANCE 0", 44 47 }; 45 48 ··· 51 48 {}, 52 49 }; 53 50 51 + /* 52 + * Although the bspec lists more "MSLICE" ranges than shown here, some of those 53 + * are of a "GAM" subclass that has special rules. Thus we use a separate 54 + * GAM table farther down for those. 55 + */ 54 56 static const struct intel_mmio_range xehpsdv_mslice_steering_table[] = { 55 - { 0x004000, 0x004AFF }, 56 - { 0x00C800, 0x00CFFF }, 57 57 { 0x00DD00, 0x00DDFF }, 58 58 { 0x00E900, 0x00FFFF }, /* 0xEA00 - OxEFFF is unused */ 59 + {}, 60 + }; 61 + 62 + static const struct intel_mmio_range xehpsdv_gam_steering_table[] = { 63 + { 0x004000, 0x004AFF }, 64 + { 0x00C800, 0x00CFFF }, 59 65 {}, 60 66 }; 61 67 ··· 101 89 {}, 102 90 }; 103 91 92 + static const struct intel_mmio_range xelpg_instance0_steering_table[] = { 93 + { 0x000B00, 0x000BFF }, /* SQIDI */ 94 + { 0x001000, 0x001FFF }, /* SQIDI */ 95 + { 0x004000, 0x0048FF }, /* GAM */ 96 + { 0x008700, 0x0087FF }, /* SQIDI */ 97 + { 0x00B000, 0x00B0FF }, /* NODE */ 98 + { 0x00C800, 0x00CFFF }, /* GAM */ 99 + { 0x00D880, 0x00D8FF }, /* NODE */ 100 + { 0x00DD00, 0x00DDFF }, /* OAAL2 */ 101 + {}, 102 + }; 103 + 104 + static const struct intel_mmio_range xelpg_l3bank_steering_table[] = { 105 + { 0x00B100, 0x00B3FF }, 106 + {}, 107 + }; 108 + 109 + /* DSS steering is used for SLICE ranges as well */ 110 + static const struct intel_mmio_range xelpg_dss_steering_table[] = { 111 + { 0x005200, 0x0052FF }, /* SLICE */ 112 + { 0x005500, 0x007FFF }, /* SLICE */ 113 + { 0x008140, 0x00815F }, /* SLICE (0x8140-0x814F), DSS (0x8150-0x815F) */ 114 + { 0x0094D0, 0x00955F }, /* SLICE (0x94D0-0x951F), DSS (0x9520-0x955F) */ 115 + { 0x009680, 0x0096FF }, /* DSS */ 116 + { 0x00D800, 0x00D87F }, /* SLICE */ 117 + { 0x00DC00, 0x00DCFF }, /* SLICE */ 118 + { 0x00DE80, 0x00E8FF }, /* DSS (0xE000-0xE0FF reserved) */ 119 + {}, 120 + }; 121 + 122 + static const struct 
intel_mmio_range xelpmp_oaddrm_steering_table[] = { 123 + { 0x393200, 0x39323F }, 124 + { 0x393400, 0x3934FF }, 125 + {}, 126 + }; 127 + 104 128 void intel_gt_mcr_init(struct intel_gt *gt) 105 129 { 106 130 struct drm_i915_private *i915 = gt->i915; 131 + unsigned long fuse; 132 + int i; 107 133 108 134 /* 109 135 * An mslice is unavailable only if both the meml3 for the slice is ··· 159 109 drm_warn(&i915->drm, "mslice mask all zero!\n"); 160 110 } 161 111 162 - if (IS_PONTEVECCHIO(i915)) { 112 + if (MEDIA_VER(i915) >= 13 && gt->type == GT_MEDIA) { 113 + gt->steering_table[OADDRM] = xelpmp_oaddrm_steering_table; 114 + } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)) { 115 + fuse = REG_FIELD_GET(GT_L3_EXC_MASK, 116 + intel_uncore_read(gt->uncore, XEHP_FUSE4)); 117 + 118 + /* 119 + * Despite the register field being named "exclude mask" the 120 + * bits actually represent enabled banks (two banks per bit). 121 + */ 122 + for_each_set_bit(i, &fuse, 3) 123 + gt->info.l3bank_mask |= 0x3 << 2 * i; 124 + 125 + gt->steering_table[INSTANCE0] = xelpg_instance0_steering_table; 126 + gt->steering_table[L3BANK] = xelpg_l3bank_steering_table; 127 + gt->steering_table[DSS] = xelpg_dss_steering_table; 128 + } else if (IS_PONTEVECCHIO(i915)) { 163 129 gt->steering_table[INSTANCE0] = pvc_instance0_steering_table; 164 130 } else if (IS_DG2(i915)) { 165 131 gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table; 166 132 gt->steering_table[LNCF] = dg2_lncf_steering_table; 133 + /* 134 + * No need to hook up the GAM table since it has a dedicated 135 + * steering control register on DG2 and can use implicit 136 + * steering. 
+ */
 } else if (IS_XEHPSDV(i915)) {
 	gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table;
 	gt->steering_table[LNCF] = xehpsdv_lncf_steering_table;
+	gt->steering_table[GAM] = xehpsdv_gam_steering_table;
 } else if (GRAPHICS_VER(i915) >= 11 &&
 	   GRAPHICS_VER_FULL(i915) < IP_VER(12, 50)) {
 	gt->steering_table[L3BANK] = icl_l3bank_steering_table;
···
 }

 /*
+ * Although the rest of the driver should use MCR-specific functions to
+ * read/write MCR registers, we still use the regular intel_uncore_* functions
+ * internally to implement those, so we need a way for the functions in this
+ * file to "cast" an i915_mcr_reg_t into an i915_reg_t.
+ */
+static i915_reg_t mcr_reg_cast(const i915_mcr_reg_t mcr)
+{
+	i915_reg_t r = { .reg = mcr.reg };
+
+	return r;
+}
+
+/*
  * rw_with_mcr_steering_fw - Access a register with specific MCR steering
  * @uncore: pointer to struct intel_uncore
  * @reg: register being accessed
···
  * Caller needs to make sure the relevant forcewake wells are up.
  */
 static u32 rw_with_mcr_steering_fw(struct intel_uncore *uncore,
-				   i915_reg_t reg, u8 rw_flag,
+				   i915_mcr_reg_t reg, u8 rw_flag,
 				   int group, int instance, u32 value)
 {
 	u32 mcr_mask, mcr_ss, mcr, old_mcr, val = 0;

 	lockdep_assert_held(&uncore->lock);

-	if (GRAPHICS_VER(uncore->i915) >= 11) {
+	if (GRAPHICS_VER_FULL(uncore->i915) >= IP_VER(12, 70)) {
+		/*
+		 * Always leave the hardware in multicast mode when doing reads
+		 * (see comment about Wa_22013088509 below) and only change it
+		 * to unicast mode when doing writes of a specific instance.
+		 *
+		 * No need to save old steering reg value.
+		 */
+		intel_uncore_write_fw(uncore, MTL_MCR_SELECTOR,
+				      REG_FIELD_PREP(MTL_MCR_GROUPID, group) |
+				      REG_FIELD_PREP(MTL_MCR_INSTANCEID, instance) |
+				      (rw_flag == FW_REG_READ ? GEN11_MCR_MULTICAST : 0));
+	} else if (GRAPHICS_VER(uncore->i915) >= 11) {
 		mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK;
 		mcr_ss = GEN11_MCR_SLICE(group) | GEN11_MCR_SUBSLICE(instance);
···
 		 */
 		if (rw_flag == FW_REG_WRITE)
 			mcr_mask |= GEN11_MCR_MULTICAST;
+
+		mcr = intel_uncore_read_fw(uncore, GEN8_MCR_SELECTOR);
+		old_mcr = mcr;
+
+		mcr &= ~mcr_mask;
+		mcr |= mcr_ss;
+		intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
 	} else {
 		mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK;
 		mcr_ss = GEN8_MCR_SLICE(group) | GEN8_MCR_SUBSLICE(instance);
+
+		mcr = intel_uncore_read_fw(uncore, GEN8_MCR_SELECTOR);
+		old_mcr = mcr;
+
+		mcr &= ~mcr_mask;
+		mcr |= mcr_ss;
+		intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
 	}

-	old_mcr = mcr = intel_uncore_read_fw(uncore, GEN8_MCR_SELECTOR);
-
-	mcr &= ~mcr_mask;
-	mcr |= mcr_ss;
-	intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
-
 	if (rw_flag == FW_REG_READ)
-		val = intel_uncore_read_fw(uncore, reg);
+		val = intel_uncore_read_fw(uncore, mcr_reg_cast(reg));
 	else
-		intel_uncore_write_fw(uncore, reg, value);
+		intel_uncore_write_fw(uncore, mcr_reg_cast(reg), value);

-	mcr &= ~mcr_mask;
-	mcr |= old_mcr & mcr_mask;
-
-	intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
+	/*
+	 * For pre-MTL platforms, we need to restore the old value of the
+	 * steering control register to ensure that implicit steering continues
+	 * to behave as expected. For MTL and beyond, we need only reinstate
+	 * the 'multicast' bit (and only if we did a write that cleared it).
+	 */
+	if (GRAPHICS_VER_FULL(uncore->i915) >= IP_VER(12, 70) && rw_flag == FW_REG_WRITE)
+		intel_uncore_write_fw(uncore, MTL_MCR_SELECTOR, GEN11_MCR_MULTICAST);
+	else if (GRAPHICS_VER_FULL(uncore->i915) < IP_VER(12, 70))
+		intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, old_mcr);

 	return val;
 }

 static u32 rw_with_mcr_steering(struct intel_uncore *uncore,
-				i915_reg_t reg, u8 rw_flag,
+				i915_mcr_reg_t reg, u8 rw_flag,
 				int group, int instance,
 				u32 value)
 {
 	enum forcewake_domains fw_domains;
 	u32 val;

-	fw_domains = intel_uncore_forcewake_for_reg(uncore, reg,
+	fw_domains = intel_uncore_forcewake_for_reg(uncore, mcr_reg_cast(reg),
 						    rw_flag);
 	fw_domains |= intel_uncore_forcewake_for_reg(uncore,
 						     GEN8_MCR_SELECTOR,
···
  * group/instance.
  */
 u32 intel_gt_mcr_read(struct intel_gt *gt,
-		      i915_reg_t reg,
+		      i915_mcr_reg_t reg,
 		      int group, int instance)
 {
 	return rw_with_mcr_steering(gt->uncore, reg, FW_REG_READ, group, instance, 0);
···
  * Write an MCR register in unicast mode after steering toward a specific
  * group/instance.
  */
-void intel_gt_mcr_unicast_write(struct intel_gt *gt, i915_reg_t reg, u32 value,
+void intel_gt_mcr_unicast_write(struct intel_gt *gt, i915_mcr_reg_t reg, u32 value,
 				int group, int instance)
 {
 	rw_with_mcr_steering(gt->uncore, reg, FW_REG_WRITE, group, instance, value);
···
  * Write an MCR register in multicast mode to update all instances.
  */
 void intel_gt_mcr_multicast_write(struct intel_gt *gt,
-				  i915_reg_t reg, u32 value)
+				  i915_mcr_reg_t reg, u32 value)
 {
-	intel_uncore_write(gt->uncore, reg, value);
+	/*
+	 * Ensure we have multicast behavior, just in case some non-i915 agent
+	 * left the hardware in unicast mode.
+	 */
+	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
+		intel_uncore_write_fw(gt->uncore, MTL_MCR_SELECTOR, GEN11_MCR_MULTICAST);
+
+	intel_uncore_write(gt->uncore, mcr_reg_cast(reg), value);
 }

 /**
···
  * domains; use intel_gt_mcr_multicast_write() in cases where forcewake should
  * be obtained automatically.
  */
-void intel_gt_mcr_multicast_write_fw(struct intel_gt *gt, i915_reg_t reg, u32 value)
+void intel_gt_mcr_multicast_write_fw(struct intel_gt *gt, i915_mcr_reg_t reg, u32 value)
 {
-	intel_uncore_write_fw(gt->uncore, reg, value);
+	/*
+	 * Ensure we have multicast behavior, just in case some non-i915 agent
+	 * left the hardware in unicast mode.
+	 */
+	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70))
+		intel_uncore_write_fw(gt->uncore, MTL_MCR_SELECTOR, GEN11_MCR_MULTICAST);
+
+	intel_uncore_write_fw(gt->uncore, mcr_reg_cast(reg), value);
+}
+
+/**
+ * intel_gt_mcr_multicast_rmw - Performs a multicast RMW operation
+ * @gt: GT structure
+ * @reg: the MCR register to read and write
+ * @clear: bits to clear during RMW
+ * @set: bits to set during RMW
+ *
+ * Performs a read-modify-write on an MCR register in a multicast manner.
+ * This operation only makes sense on MCR registers where all instances are
+ * expected to have the same value. The read will target any non-terminated
+ * instance and the write will be applied to all instances.
+ *
+ * This function obtains any necessary forcewake domains through the register
+ * accessors it calls; the caller does not need to hold forcewake explicitly.
+ *
+ * Returns the old (unmodified) value read.
+ */
+u32 intel_gt_mcr_multicast_rmw(struct intel_gt *gt, i915_mcr_reg_t reg,
+			       u32 clear, u32 set)
+{
+	u32 val = intel_gt_mcr_read_any(gt, reg);
+
+	intel_gt_mcr_multicast_write(gt, reg, (val & ~clear) | set);
+
+	return val;
 }

 /*
···
  * for @type steering too.
  */
 static bool reg_needs_read_steering(struct intel_gt *gt,
-				    i915_reg_t reg,
+				    i915_mcr_reg_t reg,
 				    enum intel_steering_type type)
 {
 	const u32 offset = i915_mmio_reg_offset(reg);
···
 			 enum intel_steering_type type,
 			 u8 *group, u8 *instance)
 {
+	u32 dss;
+
 	switch (type) {
 	case L3BANK:
 		*group = 0; /* unused */
···
 		*group = __ffs(gt->info.mslice_mask) << 1;
 		*instance = 0; /* unused */
 		break;
+	case GAM:
+		*group = IS_DG2(gt->i915) ? 1 : 0;
+		*instance = 0;
+		break;
+	case DSS:
+		dss = intel_sseu_find_first_xehp_dss(&gt->info.sseu, 0, 0);
+		*group = dss / GEN_DSS_PER_GSLICE;
+		*instance = dss % GEN_DSS_PER_GSLICE;
+		break;
 	case INSTANCE0:
 		/*
 		 * There are a lot of MCR types for which instance (0, 0)
 		 * will always provide a non-terminated value.
 		 */
 		*group = 0;
+		*instance = 0;
+		break;
+	case OADDRM:
+		if ((VDBOX_MASK(gt) | VEBOX_MASK(gt) | gt->info.sfc_mask) & BIT(0))
+			*group = 0;
+		else
+			*group = 1;
 		*instance = 0;
 		break;
 	default:
···
  * steering.
  */
 void intel_gt_mcr_get_nonterminated_steering(struct intel_gt *gt,
-					     i915_reg_t reg,
+					     i915_mcr_reg_t reg,
 					     u8 *group, u8 *instance)
 {
 	int type;
···
  *
  * Returns the value from a non-terminated instance of @reg.
  */
-u32 intel_gt_mcr_read_any_fw(struct intel_gt *gt, i915_reg_t reg)
+u32 intel_gt_mcr_read_any_fw(struct intel_gt *gt, i915_mcr_reg_t reg)
 {
 	int type;
 	u8 group, instance;
···
 		}
 	}

-	return intel_uncore_read_fw(gt->uncore, reg);
+	return intel_uncore_read_fw(gt->uncore, mcr_reg_cast(reg));
 }

 /**
···
  *
  * Returns the value from a non-terminated instance of @reg.
  */
-u32 intel_gt_mcr_read_any(struct intel_gt *gt, i915_reg_t reg)
+u32 intel_gt_mcr_read_any(struct intel_gt *gt, i915_mcr_reg_t reg)
 {
 	int type;
 	u8 group, instance;
···
 		}
 	}

-	return intel_uncore_read(gt->uncore, reg);
+	return intel_uncore_read(gt->uncore, mcr_reg_cast(reg));
 }

 static void report_steering_type(struct drm_printer *p,
···
 void intel_gt_mcr_report_steering(struct drm_printer *p, struct intel_gt *gt,
 				  bool dump_table)
 {
-	drm_printf(p, "Default steering: group=0x%x, instance=0x%x\n",
-		   gt->default_steering.groupid,
-		   gt->default_steering.instanceid);
+	/*
+	 * Starting with MTL we no longer have default steering;
+	 * all ranges are explicitly steered.
+	 */
+	if (GRAPHICS_VER_FULL(gt->i915) < IP_VER(12, 70))
+		drm_printf(p, "Default steering: group=0x%x, instance=0x%x\n",
+			   gt->default_steering.groupid,
+			   gt->default_steering.instanceid);

-	if (IS_PONTEVECCHIO(gt->i915)) {
+	if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 70)) {
+		for (int i = 0; i < NUM_STEERING_TYPES; i++)
+			if (gt->steering_table[i])
+				report_steering_type(p, gt, i, dump_table);
+	} else if (IS_PONTEVECCHIO(gt->i915)) {
 		report_steering_type(p, gt, INSTANCE0, dump_table);
 	} else if (HAS_MSLICE_STEERING(gt->i915)) {
 		report_steering_type(p, gt, MSLICE, dump_table);
···
 		*instance = dss % GEN_MAX_SS_PER_HSW_SLICE;
 		return;
 	}
 }

+/**
+ * intel_gt_mcr_wait_for_reg_fw - wait until MCR register matches expected state
+ * @gt: GT structure
+ * @reg: the register to read
+ * @mask: mask to apply to register value
+ * @value: value to wait for
+ * @fast_timeout_us: fast timeout in microseconds for atomic/tight wait
+ * @slow_timeout_ms: slow timeout in milliseconds
+ *
+ * This routine waits until the target register @reg contains the expected
+ * @value after applying the @mask, i.e. it waits until ::
+ *
+ *     (intel_gt_mcr_read_any_fw(gt, reg) & mask) == value
+ *
+ * Otherwise, the wait will time out after @slow_timeout_ms milliseconds.
+ * For atomic context @slow_timeout_ms must be zero and @fast_timeout_us
+ * must not be larger than 20,000 microseconds.
+ *
+ * This function is basically an MCR-friendly version of
+ * __intel_wait_for_register_fw(). Generally this function will only be used
+ * on GAM registers, which are a bit special -- although they're MCR
+ * registers, reads (e.g., waiting for status updates) are always directed
+ * to the primary instance.
+ *
+ * Note that this routine assumes the caller holds forcewake asserted; it is
+ * not suitable for very long waits.
+ *
+ * Return: 0 if the register matches the desired condition, or -ETIMEDOUT.
+ */
+int intel_gt_mcr_wait_for_reg_fw(struct intel_gt *gt,
+				 i915_mcr_reg_t reg,
+				 u32 mask,
+				 u32 value,
+				 unsigned int fast_timeout_us,
+				 unsigned int slow_timeout_ms)
+{
+	u32 reg_value = 0;
+#define done (((reg_value = intel_gt_mcr_read_any_fw(gt, reg)) & mask) == value)
+	int ret;
+
+	/* Catch any overuse of this function */
+	might_sleep_if(slow_timeout_ms);
+	GEM_BUG_ON(fast_timeout_us > 20000);
+	GEM_BUG_ON(!fast_timeout_us && !slow_timeout_ms);
+
+	ret = -ETIMEDOUT;
+	if (fast_timeout_us && fast_timeout_us <= 20000)
+		ret = _wait_for_atomic(done, fast_timeout_us, 0);
+	if (ret && slow_timeout_ms)
+		ret = wait_for(done, slow_timeout_ms);
+
+	return ret;
+#undef done
 }
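The new DSS steering case above maps a linear DSS index to a (group, instance) pair with a simple divide/modulo. A standalone sketch of just that mapping (the value 4 for GEN_DSS_PER_GSLICE and the helper name dss_steering are illustrative assumptions, not taken from the driver):

```c
#include <assert.h>

/*
 * GEN_DSS_PER_GSLICE is 4 here as an illustrative assumption; the real
 * value comes from the driver's topology headers.
 */
#define GEN_DSS_PER_GSLICE 4

/*
 * Hypothetical helper mirroring the DSS case added to
 * get_nonterminated_steering(): split a linear DSS index into the
 * (group, instance) steering pair.
 */
static void dss_steering(unsigned int dss, unsigned int *group,
			 unsigned int *instance)
{
	*group = dss / GEN_DSS_PER_GSLICE;
	*instance = dss % GEN_DSS_PER_GSLICE;
}
```

Under this assumption, DSS 5 steers to group 1, instance 1.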
+17 -7
drivers/gpu/drm/i915/gt/intel_gt_mcr.h
···
 void intel_gt_mcr_init(struct intel_gt *gt);

 u32 intel_gt_mcr_read(struct intel_gt *gt,
-		      i915_reg_t reg,
+		      i915_mcr_reg_t reg,
 		      int group, int instance);
-u32 intel_gt_mcr_read_any_fw(struct intel_gt *gt, i915_reg_t reg);
-u32 intel_gt_mcr_read_any(struct intel_gt *gt, i915_reg_t reg);
+u32 intel_gt_mcr_read_any_fw(struct intel_gt *gt, i915_mcr_reg_t reg);
+u32 intel_gt_mcr_read_any(struct intel_gt *gt, i915_mcr_reg_t reg);

 void intel_gt_mcr_unicast_write(struct intel_gt *gt,
-				i915_reg_t reg, u32 value,
+				i915_mcr_reg_t reg, u32 value,
 				int group, int instance);
 void intel_gt_mcr_multicast_write(struct intel_gt *gt,
-				  i915_reg_t reg, u32 value);
+				  i915_mcr_reg_t reg, u32 value);
 void intel_gt_mcr_multicast_write_fw(struct intel_gt *gt,
-				     i915_reg_t reg, u32 value);
+				     i915_mcr_reg_t reg, u32 value);
+
+u32 intel_gt_mcr_multicast_rmw(struct intel_gt *gt, i915_mcr_reg_t reg,
+			       u32 clear, u32 set);

 void intel_gt_mcr_get_nonterminated_steering(struct intel_gt *gt,
-					     i915_reg_t reg,
+					     i915_mcr_reg_t reg,
 					     u8 *group, u8 *instance);

 void intel_gt_mcr_report_steering(struct drm_printer *p, struct intel_gt *gt,
···

 void intel_gt_mcr_get_ss_steering(struct intel_gt *gt, unsigned int dss,
 				  unsigned int *group, unsigned int *instance);
+
+int intel_gt_mcr_wait_for_reg_fw(struct intel_gt *gt,
+				 i915_mcr_reg_t reg,
+				 u32 mask,
+				 u32 value,
+				 unsigned int fast_timeout_us,
+				 unsigned int slow_timeout_ms);

 /*
  * Helper for for_each_ss_steering loop. On pre-Xe_HP platforms, subslice
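The intel_gt_mcr_multicast_rmw() declaration above follows the common clear-then-set RMW convention: bits in @clear are dropped from the old value before bits in @set are ORed in. A minimal sketch of just that value computation (rmw_value is an illustrative name, not a driver function):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-in for the value computed inside
 * intel_gt_mcr_multicast_rmw(): clear bits first, then set bits.
 */
static uint32_t rmw_value(uint32_t old, uint32_t clear, uint32_t set)
{
	return (old & ~clear) | set;
}
```

Note that when a bit appears in both @clear and @set, set wins, which matches the (val & ~clear) | set ordering in the function body.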
+40 -156
drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
···
 		drm_printf(p, "efficient (RPe) frequency: %d MHz\n",
 			   intel_gpu_freq(rps, rps->efficient_freq));
 	} else if (GRAPHICS_VER(i915) >= 6) {
-		u32 rp_state_limits;
-		u32 gt_perf_status;
-		struct intel_rps_freq_caps caps;
-		u32 rpmodectl, rpinclimit, rpdeclimit;
-		u32 rpstat, cagf, reqf;
-		u32 rpcurupei, rpcurup, rpprevup;
-		u32 rpcurdownei, rpcurdown, rpprevdown;
-		u32 rpupei, rpupt, rpdownei, rpdownt;
-		u32 pm_ier, pm_imr, pm_isr, pm_iir, pm_mask;
-
-		rp_state_limits = intel_uncore_read(uncore, GEN6_RP_STATE_LIMITS);
-		gen6_rps_get_freq_caps(rps, &caps);
-		if (IS_GEN9_LP(i915))
-			gt_perf_status = intel_uncore_read(uncore, BXT_GT_PERF_STATUS);
-		else
-			gt_perf_status = intel_uncore_read(uncore, GEN6_GT_PERF_STATUS);
-
-		/* RPSTAT1 is in the GT power well */
-		intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
-
-		reqf = intel_uncore_read(uncore, GEN6_RPNSWREQ);
-		if (GRAPHICS_VER(i915) >= 9) {
-			reqf >>= 23;
-		} else {
-			reqf &= ~GEN6_TURBO_DISABLE;
-			if (IS_HASWELL(i915) || IS_BROADWELL(i915))
-				reqf >>= 24;
-			else
-				reqf >>= 25;
-		}
-		reqf = intel_gpu_freq(rps, reqf);
-
-		rpmodectl = intel_uncore_read(uncore, GEN6_RP_CONTROL);
-		rpinclimit = intel_uncore_read(uncore, GEN6_RP_UP_THRESHOLD);
-		rpdeclimit = intel_uncore_read(uncore, GEN6_RP_DOWN_THRESHOLD);
-
-		rpstat = intel_uncore_read(uncore, GEN6_RPSTAT1);
-		rpcurupei = intel_uncore_read(uncore, GEN6_RP_CUR_UP_EI) & GEN6_CURICONT_MASK;
-		rpcurup = intel_uncore_read(uncore, GEN6_RP_CUR_UP) & GEN6_CURBSYTAVG_MASK;
-		rpprevup = intel_uncore_read(uncore, GEN6_RP_PREV_UP) & GEN6_CURBSYTAVG_MASK;
-		rpcurdownei = intel_uncore_read(uncore, GEN6_RP_CUR_DOWN_EI) & GEN6_CURIAVG_MASK;
-		rpcurdown = intel_uncore_read(uncore, GEN6_RP_CUR_DOWN) & GEN6_CURBSYTAVG_MASK;
-		rpprevdown = intel_uncore_read(uncore, GEN6_RP_PREV_DOWN) & GEN6_CURBSYTAVG_MASK;
-
-		rpupei = intel_uncore_read(uncore, GEN6_RP_UP_EI);
-		rpupt = intel_uncore_read(uncore, GEN6_RP_UP_THRESHOLD);
-
-		rpdownei = intel_uncore_read(uncore, GEN6_RP_DOWN_EI);
-		rpdownt = intel_uncore_read(uncore, GEN6_RP_DOWN_THRESHOLD);
-
-		cagf = intel_rps_read_actual_frequency(rps);
-
-		intel_uncore_forcewake_put(uncore, FORCEWAKE_ALL);
-
-		if (GRAPHICS_VER(i915) >= 11) {
-			pm_ier = intel_uncore_read(uncore, GEN11_GPM_WGBOXPERF_INTR_ENABLE);
-			pm_imr = intel_uncore_read(uncore, GEN11_GPM_WGBOXPERF_INTR_MASK);
-			/*
-			 * The equivalent to the PM ISR & IIR cannot be read
-			 * without affecting the current state of the system
-			 */
-			pm_isr = 0;
-			pm_iir = 0;
-		} else if (GRAPHICS_VER(i915) >= 8) {
-			pm_ier = intel_uncore_read(uncore, GEN8_GT_IER(2));
-			pm_imr = intel_uncore_read(uncore, GEN8_GT_IMR(2));
-			pm_isr = intel_uncore_read(uncore, GEN8_GT_ISR(2));
-			pm_iir = intel_uncore_read(uncore, GEN8_GT_IIR(2));
-		} else {
-			pm_ier = intel_uncore_read(uncore, GEN6_PMIER);
-			pm_imr = intel_uncore_read(uncore, GEN6_PMIMR);
-			pm_isr = intel_uncore_read(uncore, GEN6_PMISR);
-			pm_iir = intel_uncore_read(uncore, GEN6_PMIIR);
-		}
-		pm_mask = intel_uncore_read(uncore, GEN6_PMINTRMSK);
-
-		drm_printf(p, "Video Turbo Mode: %s\n",
-			   str_yes_no(rpmodectl & GEN6_RP_MEDIA_TURBO));
-		drm_printf(p, "HW control enabled: %s\n",
-			   str_yes_no(rpmodectl & GEN6_RP_ENABLE));
-		drm_printf(p, "SW control enabled: %s\n",
-			   str_yes_no((rpmodectl & GEN6_RP_MEDIA_MODE_MASK) == GEN6_RP_MEDIA_SW_MODE));
-
-		drm_printf(p, "PM IER=0x%08x IMR=0x%08x, MASK=0x%08x\n",
-			   pm_ier, pm_imr, pm_mask);
-		if (GRAPHICS_VER(i915) <= 10)
-			drm_printf(p, "PM ISR=0x%08x IIR=0x%08x\n",
-				   pm_isr, pm_iir);
-		drm_printf(p, "pm_intrmsk_mbz: 0x%08x\n",
-			   rps->pm_intrmsk_mbz);
-		drm_printf(p, "GT_PERF_STATUS: 0x%08x\n", gt_perf_status);
-		drm_printf(p, "Render p-state ratio: %d\n",
-			   (gt_perf_status & (GRAPHICS_VER(i915) >= 9 ? 0x1ff00 : 0xff00)) >> 8);
-		drm_printf(p, "Render p-state VID: %d\n",
-			   gt_perf_status & 0xff);
-		drm_printf(p, "Render p-state limit: %d\n",
-			   rp_state_limits & 0xff);
-		drm_printf(p, "RPSTAT1: 0x%08x\n", rpstat);
-		drm_printf(p, "RPMODECTL: 0x%08x\n", rpmodectl);
-		drm_printf(p, "RPINCLIMIT: 0x%08x\n", rpinclimit);
-		drm_printf(p, "RPDECLIMIT: 0x%08x\n", rpdeclimit);
-		drm_printf(p, "RPNSWREQ: %dMHz\n", reqf);
-		drm_printf(p, "CAGF: %dMHz\n", cagf);
-		drm_printf(p, "RP CUR UP EI: %d (%lldns)\n",
-			   rpcurupei,
-			   intel_gt_pm_interval_to_ns(gt, rpcurupei));
-		drm_printf(p, "RP CUR UP: %d (%lldns)\n",
-			   rpcurup, intel_gt_pm_interval_to_ns(gt, rpcurup));
-		drm_printf(p, "RP PREV UP: %d (%lldns)\n",
-			   rpprevup, intel_gt_pm_interval_to_ns(gt, rpprevup));
-		drm_printf(p, "Up threshold: %d%%\n",
-			   rps->power.up_threshold);
-		drm_printf(p, "RP UP EI: %d (%lldns)\n",
-			   rpupei, intel_gt_pm_interval_to_ns(gt, rpupei));
-		drm_printf(p, "RP UP THRESHOLD: %d (%lldns)\n",
-			   rpupt, intel_gt_pm_interval_to_ns(gt, rpupt));
-
-		drm_printf(p, "RP CUR DOWN EI: %d (%lldns)\n",
-			   rpcurdownei,
-			   intel_gt_pm_interval_to_ns(gt, rpcurdownei));
-		drm_printf(p, "RP CUR DOWN: %d (%lldns)\n",
-			   rpcurdown,
-			   intel_gt_pm_interval_to_ns(gt, rpcurdown));
-		drm_printf(p, "RP PREV DOWN: %d (%lldns)\n",
-			   rpprevdown,
-			   intel_gt_pm_interval_to_ns(gt, rpprevdown));
-		drm_printf(p, "Down threshold: %d%%\n",
-			   rps->power.down_threshold);
-		drm_printf(p, "RP DOWN EI: %d (%lldns)\n",
-			   rpdownei, intel_gt_pm_interval_to_ns(gt, rpdownei));
-		drm_printf(p, "RP DOWN THRESHOLD: %d (%lldns)\n",
-			   rpdownt, intel_gt_pm_interval_to_ns(gt, rpdownt));
-
-		drm_printf(p, "Lowest (RPN) frequency: %dMHz\n",
-			   intel_gpu_freq(rps, caps.min_freq));
-		drm_printf(p, "Nominal (RP1) frequency: %dMHz\n",
-			   intel_gpu_freq(rps, caps.rp1_freq));
-		drm_printf(p, "Max non-overclocked (RP0) frequency: %dMHz\n",
-			   intel_gpu_freq(rps, caps.rp0_freq));
-		drm_printf(p, "Max overclocked frequency: %dMHz\n",
-			   intel_gpu_freq(rps, rps->max_freq));
-
-		drm_printf(p, "Current freq: %d MHz\n",
-			   intel_gpu_freq(rps, rps->cur_freq));
-		drm_printf(p, "Actual freq: %d MHz\n", cagf);
-		drm_printf(p, "Idle freq: %d MHz\n",
-			   intel_gpu_freq(rps, rps->idle_freq));
-		drm_printf(p, "Min freq: %d MHz\n",
-			   intel_gpu_freq(rps, rps->min_freq));
-		drm_printf(p, "Boost freq: %d MHz\n",
-			   intel_gpu_freq(rps, rps->boost_freq));
-		drm_printf(p, "Max freq: %d MHz\n",
-			   intel_gpu_freq(rps, rps->max_freq));
-		drm_printf(p,
-			   "efficient (RPe) frequency: %d MHz\n",
-			   intel_gpu_freq(rps, rps->efficient_freq));
+		gen6_rps_frequency_dump(rps, p);
 	} else {
 		drm_puts(p, "no P-state info available\n");
 	}
···

 DEFINE_INTEL_GT_DEBUGFS_ATTRIBUTE(rps_boost);

+static int perf_limit_reasons_get(void *data, u64 *val)
+{
+	struct intel_gt *gt = data;
+	intel_wakeref_t wakeref;
+
+	with_intel_runtime_pm(gt->uncore->rpm, wakeref)
+		*val = intel_uncore_read(gt->uncore, intel_gt_perf_limit_reasons_reg(gt));
+
+	return 0;
+}
+
+static int perf_limit_reasons_clear(void *data, u64 val)
+{
+	struct intel_gt *gt = data;
+	intel_wakeref_t wakeref;
+
+	/*
+	 * Clear the upper 16 "log" bits; the lower 16 "status" bits are
+	 * read-only. The upper 16 "log" bits are identical to the lower 16
+	 * "status" bits except that the "log" bits remain set until cleared.
+	 */
+	with_intel_runtime_pm(gt->uncore->rpm, wakeref)
+		intel_uncore_rmw(gt->uncore, intel_gt_perf_limit_reasons_reg(gt),
+				 GT0_PERF_LIMIT_REASONS_LOG_MASK, 0);
+
+	return 0;
+}
+
+static bool perf_limit_reasons_eval(void *data)
+{
+	struct intel_gt *gt = data;
+
+	return i915_mmio_reg_valid(intel_gt_perf_limit_reasons_reg(gt));
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(perf_limit_reasons_fops, perf_limit_reasons_get,
+			perf_limit_reasons_clear, "%llu\n");
+
 void intel_gt_pm_debugfs_register(struct intel_gt *gt, struct dentry *root)
 {
 	static const struct intel_gt_debugfs_file files[] = {
···
 		{ "forcewake_user", &forcewake_user_fops, NULL},
 		{ "llc", &llc_fops, llc_eval },
 		{ "rps_boost", &rps_boost_fops, rps_eval },
+		{ "perf_limit_reasons", &perf_limit_reasons_fops, perf_limit_reasons_eval },
 	};

 	intel_gt_debugfs_register_files(root, files, ARRAY_SIZE(files), gt);
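perf_limit_reasons_clear() above clears only the sticky upper 16 "log" bits; the live lower 16 "status" bits are read-only in hardware and unaffected. A standalone model of that masking (the 0xffff0000 mask merely stands in for GT0_PERF_LIMIT_REASONS_LOG_MASK and is an assumption here):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for GT0_PERF_LIMIT_REASONS_LOG_MASK; the exact mask value is
 * an assumption -- only "upper 16 bits" is taken from the comment above.
 */
#define PERF_LIMIT_REASONS_LOG_MASK 0xffff0000u

/*
 * Model of the rmw in perf_limit_reasons_clear(): drop the sticky "log"
 * bits while the live "status" bits in the low half survive.
 */
static uint32_t clear_log_bits(uint32_t reg)
{
	return reg & ~PERF_LIMIT_REASONS_LOG_MASK;
}
```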
+111 -70
drivers/gpu/drm/i915/gt/intel_gt_regs.h
···
 #include "i915_reg_defs.h"

+#define MCR_REG(offset)	((const i915_mcr_reg_t){ .reg = (offset) })
+
+/*
+ * The perf control registers are technically multicast registers, but the
+ * driver never needs to read/write them directly; we only use them to build
+ * lists of registers (where they're mixed in with other non-MCR registers)
+ * and then operate on the offset directly. For now we'll just define them
+ * as non-multicast so we can place them on the same list, but we may want
+ * to try to come up with a better way to handle heterogeneous lists of
+ * registers in the future.
+ */
+#define PERF_REG(offset)	_MMIO(offset)
+
 /* RPM unit config (Gen8+) */
 #define RPM_CONFIG0	_MMIO(0xd00)
 #define GEN9_RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_SHIFT	3
···
 #define FORCEWAKE_ACK_RENDER_GEN9	_MMIO(0xd84)
 #define FORCEWAKE_ACK_MEDIA_GEN9	_MMIO(0xd88)

+#define FORCEWAKE_ACK_GSC	_MMIO(0xdf8)
+#define FORCEWAKE_ACK_GT_MTL	_MMIO(0xdfc)
+
 #define GMD_ID_GRAPHICS	_MMIO(0xd8c)
 #define GMD_ID_MEDIA	_MMIO(MTL_MEDIA_GSI_BASE + 0xd8c)

 #define MCFG_MCR_SELECTOR	_MMIO(0xfd0)
+#define MTL_MCR_SELECTOR	_MMIO(0xfd4)
 #define SF_MCR_SELECTOR	_MMIO(0xfd8)
 #define GEN8_MCR_SELECTOR	_MMIO(0xfdc)
+#define GAM_MCR_SELECTOR	_MMIO(0xfe0)
 #define GEN8_MCR_SLICE(slice)	(((slice) & 3) << 26)
 #define GEN8_MCR_SLICE_MASK	GEN8_MCR_SLICE(3)
 #define GEN8_MCR_SUBSLICE(subslice)	(((subslice) & 3) << 24)
···
 #define GEN11_MCR_SLICE_MASK	GEN11_MCR_SLICE(0xf)
 #define GEN11_MCR_SUBSLICE(subslice)	(((subslice) & 0x7) << 24)
 #define GEN11_MCR_SUBSLICE_MASK	GEN11_MCR_SUBSLICE(0x7)
+#define MTL_MCR_GROUPID	REG_GENMASK(11, 8)
+#define MTL_MCR_INSTANCEID	REG_GENMASK(3, 0)

 #define IPEIR_I965	_MMIO(0x2064)
 #define IPEHR_I965	_MMIO(0x2068)
···
 #define GEN7_TLB_RD_ADDR	_MMIO(0x4700)

 #define GEN12_PAT_INDEX(index)	_MMIO(0x4800 + (index) * 4)
+#define XEHP_PAT_INDEX(index)	MCR_REG(0x4800 + (index) * 4)

-#define XEHP_TILE0_ADDR_RANGE	_MMIO(0x4900)
+#define XEHP_TILE0_ADDR_RANGE	MCR_REG(0x4900)
 #define XEHP_TILE_LMEM_RANGE_SHIFT	8

-#define XEHP_FLAT_CCS_BASE_ADDR	_MMIO(0x4910)
+#define XEHP_FLAT_CCS_BASE_ADDR	MCR_REG(0x4910)
 #define XEHP_CCS_BASE_SHIFT	8

 #define GAMTARBMODE	_MMIO(0x4a08)
···
 #define CHICKEN_RASTER_2	_MMIO(0x6208)
 #define TBIMR_FAST_CLIP	REG_BIT(5)

-#define VFLSKPD	_MMIO(0x62a8)
+#define VFLSKPD	MCR_REG(0x62a8)
 #define DIS_OVER_FETCH_CACHE	REG_BIT(1)
 #define DIS_MULT_MISS_RD_SQUASH	REG_BIT(0)

-#define FF_MODE2	_MMIO(0x6604)
+#define GEN12_FF_MODE2	_MMIO(0x6604)
+#define XEHP_FF_MODE2	MCR_REG(0x6604)
 #define FF_MODE2_GS_TIMER_MASK	REG_GENMASK(31, 24)
 #define FF_MODE2_GS_TIMER_224	REG_FIELD_PREP(FF_MODE2_GS_TIMER_MASK, 224)
 #define FF_MODE2_TDS_TIMER_MASK	REG_GENMASK(23, 16)
 #define FF_MODE2_TDS_TIMER_128	REG_FIELD_PREP(FF_MODE2_TDS_TIMER_MASK, 4)

-#define XEHPG_INSTDONE_GEOM_SVG	_MMIO(0x666c)
+#define XEHPG_INSTDONE_GEOM_SVG	MCR_REG(0x666c)

 #define CACHE_MODE_0_GEN7	_MMIO(0x7000) /* IVB+ */
 #define RC_OP_FLUSH_ENABLE	(1 << 0)
···
 #define HIZ_CHICKEN	_MMIO(0x7018)
 #define CHV_HZ_8X8_MODE_IN_1X	REG_BIT(15)
 #define DG1_HZ_READ_SUPPRESSION_OPTIMIZATION_DISABLE	REG_BIT(14)
+#define HZ_DEPTH_TEST_LE_GE_OPT_DISABLE	REG_BIT(13)
 #define BDW_HIZ_POWER_COMPILER_CLOCK_GATING_DISABLE	REG_BIT(3)

 #define GEN8_L3CNTLREG	_MMIO(0x7034)
···
 #define GEN8_HDC_CHICKEN1	_MMIO(0x7304)

 #define GEN11_COMMON_SLICE_CHICKEN3	_MMIO(0x7304)
+#define XEHP_COMMON_SLICE_CHICKEN3	MCR_REG(0x7304)
 #define DG1_FLOAT_POINT_BLEND_OPT_STRICT_MODE_EN	REG_BIT(12)
 #define XEHP_DUAL_SIMD8_SEQ_MERGE_DISABLE	REG_BIT(12)
 #define GEN11_BLEND_EMB_FIX_DISABLE_IN_RCC	REG_BIT(11)
 #define GEN12_DISABLE_CPS_AWARE_COLOR_PIPE	REG_BIT(9)

-/* GEN9 chicken */
-#define SLICE_ECO_CHICKEN0	_MMIO(0x7308)
-#define PIXEL_MASK_CAMMING_DISABLE	(1 << 14)
-
-#define GEN9_SLICE_COMMON_ECO_CHICKEN0	_MMIO(0x7308)
-#define DISABLE_PIXEL_MASK_CAMMING	(1 << 14)
-
 #define GEN9_SLICE_COMMON_ECO_CHICKEN1	_MMIO(0x731c)
-#define GEN11_STATE_CACHE_REDIRECT_TO_CS	(1 << 11)
-
-#define SLICE_COMMON_ECO_CHICKEN1	_MMIO(0x731c)
+#define XEHP_SLICE_COMMON_ECO_CHICKEN1	MCR_REG(0x731c)
 #define MSC_MSAA_REODER_BUF_BYPASS_DISABLE	REG_BIT(14)
+#define GEN11_STATE_CACHE_REDIRECT_TO_CS	(1 << 11)

 #define GEN9_SLICE_PGCTL_ACK(slice)	_MMIO(0x804c + (slice) * 0x4)
 #define GEN10_SLICE_PGCTL_ACK(slice)	_MMIO(0x804c + ((slice) / 3) * 0x34 + \
···
 #define VF_PREEMPTION	_MMIO(0x83a4)
 #define PREEMPTION_VERTEX_COUNT	REG_GENMASK(15, 0)

+#define VFG_PREEMPTION_CHICKEN	_MMIO(0x83b4)
+#define POLYGON_TRIFAN_LINELOOP_DISABLE	REG_BIT(4)
+
 #define GEN8_RC6_CTX_INFO	_MMIO(0x8504)

-#define GEN12_SQCM	_MMIO(0x8724)
+#define XEHP_SQCM	MCR_REG(0x8724)
 #define EN_32B_ACCESS	REG_BIT(30)

 #define HSW_IDICR	_MMIO(0x9008)
···
 #define GEN6_MBCTL_BOOT_FETCH_MECH	(1 << 0)

 /* Fuse readout registers for GT */
+#define XEHP_FUSE4	_MMIO(0x9114)
+#define GT_L3_EXC_MASK	REG_GENMASK(6, 4)
 #define GEN10_MIRROR_FUSE3	_MMIO(0x9118)
 #define GEN10_L3BANK_PAIR_COUNT	4
 #define GEN10_L3BANK_MASK	0x0F
···

 #define GEN7_MISCCPCTL	_MMIO(0x9424)
 #define GEN7_DOP_CLOCK_GATE_ENABLE	(1 << 0)
+
+#define GEN8_MISCCPCTL	MCR_REG(0x9424)
+#define GEN8_DOP_CLOCK_GATE_ENABLE	REG_BIT(0)
 #define GEN12_DOP_CLOCK_GATE_RENDER_ENABLE	REG_BIT(1)
 #define GEN8_DOP_CLOCK_GATE_CFCLK_ENABLE	(1 << 2)
 #define GEN8_DOP_CLOCK_GATE_GUC_ENABLE	(1 << 4)
···
 #define GAMTLBVEBOX0_CLKGATE_DIS	REG_BIT(16)
 #define LTCDD_CLKGATE_DIS	REG_BIT(10)

-#define SLICE_UNIT_LEVEL_CLKGATE	_MMIO(0x94d4)
+#define GEN11_SLICE_UNIT_LEVEL_CLKGATE	_MMIO(0x94d4)
+#define XEHP_SLICE_UNIT_LEVEL_CLKGATE	MCR_REG(0x94d4)
 #define SARBUNIT_CLKGATE_DIS	(1 << 5)
 #define RCCUNIT_CLKGATE_DIS	(1 << 7)
 #define MSCUNIT_CLKGATE_DIS	(1 << 10)
···
 #define L3_CLKGATE_DIS	REG_BIT(16)
 #define L3_CR2X_CLKGATE_DIS	REG_BIT(17)

-#define SCCGCTL94DC	_MMIO(0x94dc)
+#define SCCGCTL94DC	MCR_REG(0x94dc)
 #define CG3DDISURB	REG_BIT(14)

 #define UNSLICE_UNIT_LEVEL_CLKGATE2	_MMIO(0x94e4)
 #define VSUNIT_CLKGATE_DIS_TGL	REG_BIT(19)
 #define PSDUNIT_CLKGATE_DIS	REG_BIT(5)

-#define SUBSLICE_UNIT_LEVEL_CLKGATE	_MMIO(0x9524)
+#define GEN11_SUBSLICE_UNIT_LEVEL_CLKGATE	MCR_REG(0x9524)
 #define DSS_ROUTER_CLKGATE_DIS	REG_BIT(28)
 #define GWUNIT_CLKGATE_DIS	REG_BIT(16)

-#define SUBSLICE_UNIT_LEVEL_CLKGATE2	_MMIO(0x9528)
+#define SUBSLICE_UNIT_LEVEL_CLKGATE2	MCR_REG(0x9528)
 #define CPSSUNIT_CLKGATE_DIS	REG_BIT(9)

-#define SSMCGCTL9530	_MMIO(0x9530)
+#define SSMCGCTL9530	MCR_REG(0x9530)
 #define RTFUNIT_CLKGATE_DIS	REG_BIT(18)

-#define GEN10_DFR_RATIO_EN_AND_CHICKEN	_MMIO(0x9550)
+#define GEN10_DFR_RATIO_EN_AND_CHICKEN	MCR_REG(0x9550)
 #define DFR_DISABLE	(1 << 9)

-#define INF_UNIT_LEVEL_CLKGATE	_MMIO(0x9560)
+#define INF_UNIT_LEVEL_CLKGATE	MCR_REG(0x9560)
 #define CGPSF_CLKGATE_DIS	(1 << 3)

 #define MICRO_BP0_0	_MMIO(0x9800)
···
 #define FORCEWAKE_MEDIA_VDBOX_GEN11(n)	_MMIO(0xa540 + (n) * 4)
 #define FORCEWAKE_MEDIA_VEBOX_GEN11(n)	_MMIO(0xa560 + (n) * 4)

+#define FORCEWAKE_REQ_GSC	_MMIO(0xa618)
+
 #define CHV_POWER_SS0_SIG1	_MMIO(0xa720)
 #define CHV_POWER_SS0_SIG2	_MMIO(0xa724)
 #define CHV_POWER_SS1_SIG1	_MMIO(0xa728)
···

 /* MOCS (Memory Object Control State) registers */
 #define GEN9_LNCFCMOCS(i)	_MMIO(0xb020 + (i) * 4)	/* L3 Cache Control */
-#define GEN9_LNCFCMOCS_REG_COUNT	32
+#define XEHP_LNCFCMOCS(i)	MCR_REG(0xb020 + (i) * 4)
+#define LNCFCMOCS_REG_COUNT	32

 #define GEN7_L3CNTLREG3	_MMIO(0xb024)
···
 #define GEN7_L3LOG(slice, i)	_MMIO(0xb070 + (slice) * 0x200 + (i) * 4)
 #define GEN7_L3LOG_SIZE	0x80

-#define GEN10_SCRATCH_LNCF2	_MMIO(0xb0a0)
-#define PMFLUSHDONE_LNICRSDROP	(1 << 20)
-#define PMFLUSH_GAPL3UNBLOCK	(1 << 21)
-#define PMFLUSHDONE_LNEBLK	(1 << 22)
-
-#define XEHP_L3NODEARBCFG	_MMIO(0xb0b4)
+#define XEHP_L3NODEARBCFG	MCR_REG(0xb0b4)
 #define XEHP_LNESPARE	REG_BIT(19)

-#define GEN8_L3SQCREG1	_MMIO(0xb100)
+#define GEN8_L3SQCREG1	MCR_REG(0xb100)
 /*
  * Note that on CHV the following has an off-by-one error wrt. to BSpec.
  * Using the formula in BSpec leads to a hang, while the formula here works
···
 #define L3_HIGH_PRIO_CREDITS(x)	(((x) >> 1) << 14)
 #define L3_PRIO_CREDITS_MASK	((0x1f << 19) | (0x1f << 14))

-#define GEN10_L3_CHICKEN_MODE_REGISTER	_MMIO(0xb114)
-#define GEN11_I2M_WRITE_DISABLE	(1 << 28)
-
-#define GEN8_L3SQCREG4	_MMIO(0xb118)
+#define GEN8_L3SQCREG4	MCR_REG(0xb118)
 #define GEN11_LQSC_CLEAN_EVICT_DISABLE	(1 << 6)
 #define GEN8_LQSC_RO_PERF_DIS	(1 << 27)
 #define GEN8_LQSC_FLUSH_COHERENT_LINES	(1 << 21)
 #define GEN8_LQSQ_NONIA_COHERENT_ATOMICS_ENABLE	REG_BIT(22)

-#define GEN9_SCRATCH1	_MMIO(0xb11c)
+#define GEN9_SCRATCH1	MCR_REG(0xb11c)
 #define EVICTION_PERF_FIX_ENABLE	REG_BIT(8)

-#define BDW_SCRATCH1	_MMIO(0xb11c)
+#define BDW_SCRATCH1	MCR_REG(0xb11c)
 #define GEN9_LBS_SLA_RETRY_TIMER_DECREMENT_ENABLE	(1 << 2)

-#define GEN11_SCRATCH2	_MMIO(0xb140)
+#define GEN11_SCRATCH2	MCR_REG(0xb140)
 #define GEN11_COHERENT_PARTIAL_WRITE_MERGE_ENABLE	(1 << 19)

-#define GEN11_L3SQCREG5	_MMIO(0xb158)
+#define XEHP_L3SQCREG5	MCR_REG(0xb158)
 #define L3_PWM_TIMER_INIT_VAL_MASK	REG_GENMASK(9, 0)

-#define MLTICTXCTL	_MMIO(0xb170)
+#define MLTICTXCTL	MCR_REG(0xb170)
 #define TDONRENDER	REG_BIT(2)

-#define XEHP_L3SCQREG7	_MMIO(0xb188)
+#define XEHP_L3SCQREG7	MCR_REG(0xb188)
 #define BLEND_FILL_CACHING_OPT_DIS	REG_BIT(3)

 #define XEHPC_L3SCRUB	_MMIO(0xb18c)
···
 #define SCRUB_RATE_PER_BANK_MASK	REG_GENMASK(2, 0)
 #define SCRUB_RATE_4B_PER_CLK	REG_FIELD_PREP(SCRUB_RATE_PER_BANK_MASK, 0x6)

-#define L3SQCREG1_CCS0	_MMIO(0xb200)
+#define L3SQCREG1_CCS0	MCR_REG(0xb200)
 #define FLUSHALLNONCOH	REG_BIT(5)

 #define GEN11_GLBLINVL	_MMIO(0xb404)
···
 #define GEN9_BLT_MOCS(i)	_MMIO(__GEN9_BCS0_MOCS0 + (i) * 4)

 #define GEN12_FAULT_TLB_DATA0	_MMIO(0xceb8)
+#define XEHP_FAULT_TLB_DATA0	MCR_REG(0xceb8)
 #define GEN12_FAULT_TLB_DATA1	_MMIO(0xcebc)
+#define XEHP_FAULT_TLB_DATA1	MCR_REG(0xcebc)
 #define FAULT_VA_HIGH_BITS	(0xf << 0)
 #define FAULT_GTT_SEL	(1 << 4)

 #define GEN12_RING_FAULT_REG	_MMIO(0xcec4)
+#define XEHP_RING_FAULT_REG	MCR_REG(0xcec4)
 #define GEN8_RING_FAULT_ENGINE_ID(x)	(((x) >> 12) & 0x7)
 #define RING_FAULT_GTTSEL_MASK	(1 << 11)
 #define RING_FAULT_SRCID(x)	(((x) >> 3) & 0xff)
···
 #define RING_FAULT_VALID	(1 << 0)

 #define GEN12_GFX_TLB_INV_CR	_MMIO(0xced8)
+#define XEHP_GFX_TLB_INV_CR	MCR_REG(0xced8)
 #define GEN12_VD_TLB_INV_CR	_MMIO(0xcedc)
+#define XEHP_VD_TLB_INV_CR	MCR_REG(0xcedc)
 #define GEN12_VE_TLB_INV_CR	_MMIO(0xcee0)
+#define XEHP_VE_TLB_INV_CR	MCR_REG(0xcee0)
 #define GEN12_BLT_TLB_INV_CR	_MMIO(0xcee4)
+#define XEHP_BLT_TLB_INV_CR	MCR_REG(0xcee4)
 #define GEN12_COMPCTX_TLB_INV_CR	_MMIO(0xcf04)
+#define XEHP_COMPCTX_TLB_INV_CR	MCR_REG(0xcf04)

-#define GEN12_MERT_MOD_CTRL	_MMIO(0xcf28)
-#define RENDER_MOD_CTRL	_MMIO(0xcf2c)
-#define COMP_MOD_CTRL	_MMIO(0xcf30)
-#define VDBX_MOD_CTRL	_MMIO(0xcf34)
-#define VEBX_MOD_CTRL	_MMIO(0xcf38)
+#define XEHP_MERT_MOD_CTRL	MCR_REG(0xcf28)
+#define RENDER_MOD_CTRL	MCR_REG(0xcf2c)
+#define COMP_MOD_CTRL	MCR_REG(0xcf30)
+#define VDBX_MOD_CTRL	MCR_REG(0xcf34)
+#define VEBX_MOD_CTRL	MCR_REG(0xcf38)
 #define FORCE_MISS_FTLB	REG_BIT(3)

 #define GEN12_GAMSTLB_CTRL	_MMIO(0xcf4c)
···
 #define GEN12_GAM_DONE	_MMIO(0xcf68)

 #define GEN7_HALF_SLICE_CHICKEN1	_MMIO(0xe100) /* IVB GT1 + VLV */
+#define GEN8_HALF_SLICE_CHICKEN1	MCR_REG(0xe100)
 #define GEN7_MAX_PS_THREAD_DEP	(8 << 12)
 #define GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE	(1 << 10)
 #define GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE	(1 << 4)
 #define GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE	(1 << 3)

 #define GEN7_SAMPLER_INSTDONE	_MMIO(0xe160)
+#define GEN8_SAMPLER_INSTDONE	MCR_REG(0xe160)
 #define GEN7_ROW_INSTDONE	_MMIO(0xe164)
+#define GEN8_ROW_INSTDONE	MCR_REG(0xe164)

-#define HALF_SLICE_CHICKEN2	_MMIO(0xe180)
+#define HALF_SLICE_CHICKEN2	MCR_REG(0xe180)
 #define GEN8_ST_PO_DISABLE	(1 << 13)

-#define HALF_SLICE_CHICKEN3	_MMIO(0xe184)
+#define HSW_HALF_SLICE_CHICKEN3	_MMIO(0xe184)
+#define GEN8_HALF_SLICE_CHICKEN3	MCR_REG(0xe184)
 #define HSW_SAMPLE_C_PERFORMANCE	(1 << 9)
 #define GEN8_CENTROID_PIXEL_OPT_DIS	(1 << 8)
 #define GEN9_DISABLE_OCL_OOB_SUPPRESS_LOGIC	(1 << 5)
 #define GEN8_SAMPLER_POWER_BYPASS_DIS	(1 << 1)

-#define GEN9_HALF_SLICE_CHICKEN5	_MMIO(0xe188)
+#define GEN9_HALF_SLICE_CHICKEN5	MCR_REG(0xe188)
 #define GEN9_DG_MIRROR_FIX_ENABLE	(1 << 5)
 #define GEN9_CCS_TLB_PREFETCH_ENABLE	(1 << 3)

-#define GEN10_SAMPLER_MODE	_MMIO(0xe18c)
+#define GEN10_SAMPLER_MODE	MCR_REG(0xe18c)
 #define ENABLE_SMALLPL	REG_BIT(15)
 #define SC_DISABLE_POWER_OPTIMIZATION_EBB	REG_BIT(9)
 #define GEN11_SAMPLER_ENABLE_HEADLESS_MSG	REG_BIT(5)

-#define GEN9_HALF_SLICE_CHICKEN7	_MMIO(0xe194)
+#define GEN9_HALF_SLICE_CHICKEN7	MCR_REG(0xe194)
 #define DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA	REG_BIT(15)
 #define GEN9_SAMPLER_HASH_COMPRESSED_READ_ADDR	REG_BIT(8)
 #define GEN9_ENABLE_YV12_BUGFIX	REG_BIT(4)
 #define GEN9_ENABLE_GPGPU_PREEMPTION	REG_BIT(2)

-#define GEN10_CACHE_MODE_SS	_MMIO(0xe420)
+#define GEN10_CACHE_MODE_SS	MCR_REG(0xe420)
 #define ENABLE_EU_COUNT_FOR_TDL_FLUSH	REG_BIT(10)
 #define DISABLE_ECC	REG_BIT(5)
 #define FLOAT_BLEND_OPTIMIZATION_ENABLE	REG_BIT(4)
 #define ENABLE_PREFETCH_INTO_IC	REG_BIT(3)

-#define EU_PERF_CNTL0	_MMIO(0xe458)
-#define EU_PERF_CNTL4	_MMIO(0xe45c)
+#define EU_PERF_CNTL0	PERF_REG(0xe458)
+#define EU_PERF_CNTL4	PERF_REG(0xe45c)

-#define GEN9_ROW_CHICKEN4	_MMIO(0xe48c)
+#define GEN9_ROW_CHICKEN4	MCR_REG(0xe48c)
 #define GEN12_DISABLE_GRF_CLEAR	REG_BIT(13)
 #define XEHP_DIS_BBL_SYSPIPE	REG_BIT(11)
 #define GEN12_DISABLE_TDL_PUSH	REG_BIT(9)
···
 #define HSW_ROW_CHICKEN3	_MMIO(0xe49c)
 #define HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE	(1 << 6)

-#define GEN8_ROW_CHICKEN	_MMIO(0xe4f0)
+#define GEN8_ROW_CHICKEN	MCR_REG(0xe4f0)
 #define FLOW_CONTROL_ENABLE	REG_BIT(15)
 #define UGM_BACKUP_MODE	REG_BIT(13)
 #define MDQ_ARBITRATION_MODE	REG_BIT(12)
···
 #define DISABLE_EARLY_EOT	REG_BIT(1)

 #define GEN7_ROW_CHICKEN2	_MMIO(0xe4f4)
+
+#define GEN8_ROW_CHICKEN2	MCR_REG(0xe4f4)
 #define GEN12_DISABLE_READ_SUPPRESSION	REG_BIT(15)
 #define GEN12_DISABLE_EARLY_READ	REG_BIT(14)
 #define GEN12_ENABLE_LARGE_GRF_MODE	REG_BIT(12)
 #define GEN12_PUSH_CONST_DEREF_HOLD_DIS	REG_BIT(8)
+#define GEN12_DISABLE_DOP_GATING	REG_BIT(0)

-#define RT_CTRL	_MMIO(0xe530)
+#define RT_CTRL	MCR_REG(0xe530)
 #define DIS_NULL_QUERY	REG_BIT(10)
 #define STACKID_CTRL	REG_GENMASK(6, 5)
 #define STACKID_CTRL_512	REG_FIELD_PREP(STACKID_CTRL, 0x2)

-#define EU_PERF_CNTL1	_MMIO(0xe558)
-#define EU_PERF_CNTL5	_MMIO(0xe55c)
+#define EU_PERF_CNTL1	PERF_REG(0xe558)
+#define EU_PERF_CNTL5	PERF_REG(0xe55c)

-#define
GEN12_HDC_CHICKEN0 _MMIO(0xe5f0) 1149 + #define XEHP_HDC_CHICKEN0 MCR_REG(0xe5f0) 1179 1150 #define LSC_L1_FLUSH_CTL_3D_DATAPORT_FLUSH_EVENTS_MASK REG_GENMASK(13, 11) 1180 - #define ICL_HDC_MODE _MMIO(0xe5f4) 1151 + #define ICL_HDC_MODE MCR_REG(0xe5f4) 1181 1152 1182 - #define EU_PERF_CNTL2 _MMIO(0xe658) 1183 - #define EU_PERF_CNTL6 _MMIO(0xe65c) 1184 - #define EU_PERF_CNTL3 _MMIO(0xe758) 1153 + #define EU_PERF_CNTL2 PERF_REG(0xe658) 1154 + #define EU_PERF_CNTL6 PERF_REG(0xe65c) 1155 + #define EU_PERF_CNTL3 PERF_REG(0xe758) 1185 1156 1186 - #define LSC_CHICKEN_BIT_0 _MMIO(0xe7c8) 1157 + #define LSC_CHICKEN_BIT_0 MCR_REG(0xe7c8) 1187 1158 #define DISABLE_D8_D16_COASLESCE REG_BIT(30) 1188 1159 #define FORCE_1_SUB_MESSAGE_PER_FRAGMENT REG_BIT(15) 1189 - #define LSC_CHICKEN_BIT_0_UDW _MMIO(0xe7c8 + 4) 1160 + #define LSC_CHICKEN_BIT_0_UDW MCR_REG(0xe7c8 + 4) 1190 1161 #define DIS_CHAIN_2XSIMD8 REG_BIT(55 - 32) 1191 1162 #define FORCE_SLM_FENCE_SCOPE_TO_TILE REG_BIT(42 - 32) 1192 1163 #define FORCE_UGM_FENCE_SCOPE_TO_TILE REG_BIT(41 - 32) 1193 1164 #define MAXREQS_PER_BANK REG_GENMASK(39 - 32, 37 - 32) 1194 1165 #define DISABLE_128B_EVICTION_COMMAND_UDW REG_BIT(36 - 32) 1195 1166 1196 - #define SARB_CHICKEN1 _MMIO(0xe90c) 1167 + #define SARB_CHICKEN1 MCR_REG(0xe90c) 1197 1168 #define COMP_CKN_IN REG_GENMASK(30, 29) 1198 - 1199 - #define GEN7_HALF_SLICE_CHICKEN1_GT2 _MMIO(0xf100) 1200 1169 1201 1170 #define GEN7_ROW_CHICKEN2_GT2 _MMIO(0xf4f4) 1202 1171 #define DOP_CLOCK_GATING_DISABLE (1 << 0) ··· 1546 1513 #define VLV_RENDER_C0_COUNT _MMIO(0x138118) 1547 1514 #define VLV_MEDIA_C0_COUNT _MMIO(0x13811c) 1548 1515 1516 + #define GEN12_RPSTAT1 _MMIO(0x1381b4) 1517 + #define GEN12_VOLTAGE_MASK REG_GENMASK(10, 0) 1518 + 1549 1519 #define GEN11_GT_INTR_DW(x) _MMIO(0x190018 + ((x) * 4)) 1550 1520 #define GEN11_CSME (31) 1551 1521 #define GEN11_GUNIT (28) ··· 1618 1582 #define XEHPC_BCS7_BCS8_INTR_MASK _MMIO(0x19011c) 1619 1583 1620 1584 #define GEN12_SFC_DONE(n) _MMIO(0x1cc000 + 
(n) * 0x1000) 1585 + 1586 + #define GT0_PACKAGE_ENERGY_STATUS _MMIO(0x250004) 1587 + #define GT0_PACKAGE_RAPL_LIMIT _MMIO(0x250008) 1588 + #define GT0_PACKAGE_POWER_SKU_UNIT _MMIO(0x250068) 1589 + #define GT0_PLATFORM_ENERGY_STATUS _MMIO(0x25006c) 1621 1590 1622 1591 /* 1623 1592 * Standalone Media's non-engine GT registers are located at their regular GT
+7 -8
drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
··· 22 22 return !strncmp(kobj->name, "gt", 2); 23 23 } 24 24 25 - struct intel_gt *intel_gt_sysfs_get_drvdata(struct device *dev, 25 + struct intel_gt *intel_gt_sysfs_get_drvdata(struct kobject *kobj, 26 26 const char *name) 27 27 { 28 - struct kobject *kobj = &dev->kobj; 29 - 30 28 /* 31 29 * We are interested at knowing from where the interface 32 30 * has been called, whether it's called from gt/ or from ··· 36 38 * "struct drm_i915_private *" type. 37 39 */ 38 40 if (!is_object_gt(kobj)) { 41 + struct device *dev = kobj_to_dev(kobj); 39 42 struct drm_i915_private *i915 = kdev_minor_to_i915(dev); 40 43 41 44 return to_gt(i915); ··· 50 51 return &gt->i915->drm.primary->kdev->kobj; 51 52 } 52 53 53 - static ssize_t id_show(struct device *dev, 54 - struct device_attribute *attr, 54 + static ssize_t id_show(struct kobject *kobj, 55 + struct kobj_attribute *attr, 55 56 char *buf) 56 57 { 57 - struct intel_gt *gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name); 58 + struct intel_gt *gt = intel_gt_sysfs_get_drvdata(kobj, attr->attr.name); 58 59 59 60 return sysfs_emit(buf, "%u\n", gt->info.id); 60 61 } 61 - static DEVICE_ATTR_RO(id); 62 + static struct kobj_attribute attr_id = __ATTR_RO(id); 62 63 63 64 static struct attribute *id_attrs[] = { 64 - &dev_attr_id.attr, 65 + &attr_id.attr, 65 66 NULL, 66 67 }; 67 68 ATTRIBUTE_GROUPS(id);
+1 -6
drivers/gpu/drm/i915/gt/intel_gt_sysfs.h
··· 18 18 19 19 struct drm_i915_private *kobj_to_i915(struct kobject *kobj); 20 20 21 - struct kobject * 22 - intel_gt_create_kobj(struct intel_gt *gt, 23 - struct kobject *dir, 24 - const char *name); 25 - 26 21 static inline struct intel_gt *kobj_to_gt(struct kobject *kobj) 27 22 { 28 23 return container_of(kobj, struct intel_gt, sysfs_gt); ··· 25 30 26 31 void intel_gt_sysfs_register(struct intel_gt *gt); 27 32 void intel_gt_sysfs_unregister(struct intel_gt *gt); 28 - struct intel_gt *intel_gt_sysfs_get_drvdata(struct device *dev, 33 + struct intel_gt *intel_gt_sysfs_get_drvdata(struct kobject *kobj, 29 34 const char *name); 30 35 31 36 #endif /* SYSFS_GT_H */
+216 -253
drivers/gpu/drm/i915/gt/intel_gt_sysfs_pm.c
··· 24 24 }; 25 25 26 26 static int 27 - sysfs_gt_attribute_w_func(struct device *dev, struct device_attribute *attr, 27 + sysfs_gt_attribute_w_func(struct kobject *kobj, struct attribute *attr, 28 28 int (func)(struct intel_gt *gt, u32 val), u32 val) 29 29 { 30 30 struct intel_gt *gt; 31 31 int ret; 32 32 33 - if (!is_object_gt(&dev->kobj)) { 33 + if (!is_object_gt(kobj)) { 34 34 int i; 35 + struct device *dev = kobj_to_dev(kobj); 35 36 struct drm_i915_private *i915 = kdev_minor_to_i915(dev); 36 37 37 38 for_each_gt(gt, i915, i) { ··· 41 40 break; 42 41 } 43 42 } else { 44 - gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name); 43 + gt = intel_gt_sysfs_get_drvdata(kobj, attr->name); 45 44 ret = func(gt, val); 46 45 } 47 46 ··· 49 48 } 50 49 51 50 static u32 52 - sysfs_gt_attribute_r_func(struct device *dev, struct device_attribute *attr, 51 + sysfs_gt_attribute_r_func(struct kobject *kobj, struct attribute *attr, 53 52 u32 (func)(struct intel_gt *gt), 54 53 enum intel_gt_sysfs_op op) 55 54 { ··· 58 57 59 58 ret = (op == INTEL_GT_SYSFS_MAX) ? 
0 : (u32) -1; 60 59 61 - if (!is_object_gt(&dev->kobj)) { 60 + if (!is_object_gt(kobj)) { 62 61 int i; 62 + struct device *dev = kobj_to_dev(kobj); 63 63 struct drm_i915_private *i915 = kdev_minor_to_i915(dev); 64 64 65 65 for_each_gt(gt, i915, i) { ··· 79 77 } 80 78 } 81 79 } else { 82 - gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name); 80 + gt = intel_gt_sysfs_get_drvdata(kobj, attr->name); 83 81 ret = func(gt); 84 82 } 85 83 ··· 94 92 #define sysfs_gt_attribute_r_max_func(d, a, f) \ 95 93 sysfs_gt_attribute_r_func(d, a, f, INTEL_GT_SYSFS_MAX) 96 94 95 + #define INTEL_GT_SYSFS_SHOW(_name, _attr_type) \ 96 + static ssize_t _name##_show_common(struct kobject *kobj, \ 97 + struct attribute *attr, char *buff) \ 98 + { \ 99 + u32 val = sysfs_gt_attribute_r_##_attr_type##_func(kobj, attr, \ 100 + __##_name##_show); \ 101 + \ 102 + return sysfs_emit(buff, "%u\n", val); \ 103 + } \ 104 + static ssize_t _name##_show(struct kobject *kobj, \ 105 + struct kobj_attribute *attr, char *buff) \ 106 + { \ 107 + return _name ##_show_common(kobj, &attr->attr, buff); \ 108 + } \ 109 + static ssize_t _name##_dev_show(struct device *dev, \ 110 + struct device_attribute *attr, char *buff) \ 111 + { \ 112 + return _name##_show_common(&dev->kobj, &attr->attr, buff); \ 113 + } 114 + 115 + #define INTEL_GT_SYSFS_STORE(_name, _func) \ 116 + static ssize_t _name##_store_common(struct kobject *kobj, \ 117 + struct attribute *attr, \ 118 + const char *buff, size_t count) \ 119 + { \ 120 + int ret; \ 121 + u32 val; \ 122 + \ 123 + ret = kstrtou32(buff, 0, &val); \ 124 + if (ret) \ 125 + return ret; \ 126 + \ 127 + ret = sysfs_gt_attribute_w_func(kobj, attr, _func, val); \ 128 + \ 129 + return ret ?: count; \ 130 + } \ 131 + static ssize_t _name##_store(struct kobject *kobj, \ 132 + struct kobj_attribute *attr, const char *buff, \ 133 + size_t count) \ 134 + { \ 135 + return _name##_store_common(kobj, &attr->attr, buff, count); \ 136 + } \ 137 + static ssize_t _name##_dev_store(struct 
device *dev, \ 138 + struct device_attribute *attr, \ 139 + const char *buff, size_t count) \ 140 + { \ 141 + return _name##_store_common(&dev->kobj, &attr->attr, buff, count); \ 142 + } 143 + 144 + #define INTEL_GT_SYSFS_SHOW_MAX(_name) INTEL_GT_SYSFS_SHOW(_name, max) 145 + #define INTEL_GT_SYSFS_SHOW_MIN(_name) INTEL_GT_SYSFS_SHOW(_name, min) 146 + 147 + #define INTEL_GT_ATTR_RW(_name) \ 148 + static struct kobj_attribute attr_##_name = __ATTR_RW(_name) 149 + 150 + #define INTEL_GT_ATTR_RO(_name) \ 151 + static struct kobj_attribute attr_##_name = __ATTR_RO(_name) 152 + 153 + #define INTEL_GT_DUAL_ATTR_RW(_name) \ 154 + static struct device_attribute dev_attr_##_name = __ATTR(_name, 0644, \ 155 + _name##_dev_show, \ 156 + _name##_dev_store); \ 157 + INTEL_GT_ATTR_RW(_name) 158 + 159 + #define INTEL_GT_DUAL_ATTR_RO(_name) \ 160 + static struct device_attribute dev_attr_##_name = __ATTR(_name, 0444, \ 161 + _name##_dev_show, \ 162 + NULL); \ 163 + INTEL_GT_ATTR_RO(_name) 164 + 97 165 #ifdef CONFIG_PM 98 166 static u32 get_residency(struct intel_gt *gt, i915_reg_t reg) 99 167 { ··· 176 104 return DIV_ROUND_CLOSEST_ULL(res, 1000); 177 105 } 178 106 179 - static ssize_t rc6_enable_show(struct device *dev, 180 - struct device_attribute *attr, 181 - char *buff) 107 + static u8 get_rc6_mask(struct intel_gt *gt) 182 108 { 183 - struct intel_gt *gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name); 184 109 u8 mask = 0; 185 110 186 111 if (HAS_RC6(gt->i915)) ··· 187 118 if (HAS_RC6pp(gt->i915)) 188 119 mask |= BIT(2); 189 120 190 - return sysfs_emit(buff, "%x\n", mask); 121 + return mask; 122 + } 123 + 124 + static ssize_t rc6_enable_show(struct kobject *kobj, 125 + struct kobj_attribute *attr, 126 + char *buff) 127 + { 128 + struct intel_gt *gt = intel_gt_sysfs_get_drvdata(kobj, attr->attr.name); 129 + 130 + return sysfs_emit(buff, "%x\n", get_rc6_mask(gt)); 131 + } 132 + 133 + static ssize_t rc6_enable_dev_show(struct device *dev, 134 + struct device_attribute *attr, 135 
+ char *buff) 136 + { 137 + struct intel_gt *gt = intel_gt_sysfs_get_drvdata(&dev->kobj, attr->attr.name); 138 + 139 + return sysfs_emit(buff, "%x\n", get_rc6_mask(gt)); 191 140 } 192 141 193 142 static u32 __rc6_residency_ms_show(struct intel_gt *gt) ··· 213 126 return get_residency(gt, GEN6_GT_GFX_RC6); 214 127 } 215 128 216 - static ssize_t rc6_residency_ms_show(struct device *dev, 217 - struct device_attribute *attr, 218 - char *buff) 219 - { 220 - u32 rc6_residency = sysfs_gt_attribute_r_min_func(dev, attr, 221 - __rc6_residency_ms_show); 222 - 223 - return sysfs_emit(buff, "%u\n", rc6_residency); 224 - } 225 - 226 129 static u32 __rc6p_residency_ms_show(struct intel_gt *gt) 227 130 { 228 131 return get_residency(gt, GEN6_GT_GFX_RC6p); 229 - } 230 - 231 - static ssize_t rc6p_residency_ms_show(struct device *dev, 232 - struct device_attribute *attr, 233 - char *buff) 234 - { 235 - u32 rc6p_residency = sysfs_gt_attribute_r_min_func(dev, attr, 236 - __rc6p_residency_ms_show); 237 - 238 - return sysfs_emit(buff, "%u\n", rc6p_residency); 239 132 } 240 133 241 134 static u32 __rc6pp_residency_ms_show(struct intel_gt *gt) ··· 223 156 return get_residency(gt, GEN6_GT_GFX_RC6pp); 224 157 } 225 158 226 - static ssize_t rc6pp_residency_ms_show(struct device *dev, 227 - struct device_attribute *attr, 228 - char *buff) 229 - { 230 - u32 rc6pp_residency = sysfs_gt_attribute_r_min_func(dev, attr, 231 - __rc6pp_residency_ms_show); 232 - 233 - return sysfs_emit(buff, "%u\n", rc6pp_residency); 234 - } 235 - 236 159 static u32 __media_rc6_residency_ms_show(struct intel_gt *gt) 237 160 { 238 161 return get_residency(gt, VLV_GT_MEDIA_RC6); 239 162 } 240 163 241 - static ssize_t media_rc6_residency_ms_show(struct device *dev, 242 - struct device_attribute *attr, 243 - char *buff) 244 - { 245 - u32 rc6_residency = sysfs_gt_attribute_r_min_func(dev, attr, 246 - __media_rc6_residency_ms_show); 164 + INTEL_GT_SYSFS_SHOW_MIN(rc6_residency_ms); 165 + 
INTEL_GT_SYSFS_SHOW_MIN(rc6p_residency_ms); 166 + INTEL_GT_SYSFS_SHOW_MIN(rc6pp_residency_ms); 167 + INTEL_GT_SYSFS_SHOW_MIN(media_rc6_residency_ms); 247 168 248 - return sysfs_emit(buff, "%u\n", rc6_residency); 249 - } 250 - 251 - static DEVICE_ATTR_RO(rc6_enable); 252 - static DEVICE_ATTR_RO(rc6_residency_ms); 253 - static DEVICE_ATTR_RO(rc6p_residency_ms); 254 - static DEVICE_ATTR_RO(rc6pp_residency_ms); 255 - static DEVICE_ATTR_RO(media_rc6_residency_ms); 169 + INTEL_GT_DUAL_ATTR_RO(rc6_enable); 170 + INTEL_GT_DUAL_ATTR_RO(rc6_residency_ms); 171 + INTEL_GT_DUAL_ATTR_RO(rc6p_residency_ms); 172 + INTEL_GT_DUAL_ATTR_RO(rc6pp_residency_ms); 173 + INTEL_GT_DUAL_ATTR_RO(media_rc6_residency_ms); 256 174 257 175 static struct attribute *rc6_attrs[] = { 176 + &attr_rc6_enable.attr, 177 + &attr_rc6_residency_ms.attr, 178 + NULL 179 + }; 180 + 181 + static struct attribute *rc6p_attrs[] = { 182 + &attr_rc6p_residency_ms.attr, 183 + &attr_rc6pp_residency_ms.attr, 184 + NULL 185 + }; 186 + 187 + static struct attribute *media_rc6_attrs[] = { 188 + &attr_media_rc6_residency_ms.attr, 189 + NULL 190 + }; 191 + 192 + static struct attribute *rc6_dev_attrs[] = { 258 193 &dev_attr_rc6_enable.attr, 259 194 &dev_attr_rc6_residency_ms.attr, 260 195 NULL 261 196 }; 262 197 263 - static struct attribute *rc6p_attrs[] = { 198 + static struct attribute *rc6p_dev_attrs[] = { 264 199 &dev_attr_rc6p_residency_ms.attr, 265 200 &dev_attr_rc6pp_residency_ms.attr, 266 201 NULL 267 202 }; 268 203 269 - static struct attribute *media_rc6_attrs[] = { 204 + static struct attribute *media_rc6_dev_attrs[] = { 270 205 &dev_attr_media_rc6_residency_ms.attr, 271 206 NULL 272 207 }; 273 208 274 209 static const struct attribute_group rc6_attr_group[] = { 275 210 { .attrs = rc6_attrs, }, 276 - { .name = power_group_name, .attrs = rc6_attrs, }, 211 + { .name = power_group_name, .attrs = rc6_dev_attrs, }, 277 212 }; 278 213 279 214 static const struct attribute_group rc6p_attr_group[] = { 280 215 { .attrs 
= rc6p_attrs, }, 281 - { .name = power_group_name, .attrs = rc6p_attrs, }, 216 + { .name = power_group_name, .attrs = rc6p_dev_attrs, }, 282 217 }; 283 218 284 219 static const struct attribute_group media_rc6_attr_group[] = { 285 220 { .attrs = media_rc6_attrs, }, 286 - { .name = power_group_name, .attrs = media_rc6_attrs, }, 221 + { .name = power_group_name, .attrs = media_rc6_dev_attrs, }, 287 222 }; 288 223 289 224 static int __intel_gt_sysfs_create_group(struct kobject *kobj, ··· 340 271 return intel_rps_read_actual_frequency(&gt->rps); 341 272 } 342 273 343 - static ssize_t act_freq_mhz_show(struct device *dev, 344 - struct device_attribute *attr, char *buff) 345 - { 346 - u32 actual_freq = sysfs_gt_attribute_r_max_func(dev, attr, 347 - __act_freq_mhz_show); 348 - 349 - return sysfs_emit(buff, "%u\n", actual_freq); 350 - } 351 - 352 274 static u32 __cur_freq_mhz_show(struct intel_gt *gt) 353 275 { 354 276 return intel_rps_get_requested_frequency(&gt->rps); 355 - } 356 - 357 - static ssize_t cur_freq_mhz_show(struct device *dev, 358 - struct device_attribute *attr, char *buff) 359 - { 360 - u32 cur_freq = sysfs_gt_attribute_r_max_func(dev, attr, 361 - __cur_freq_mhz_show); 362 - 363 - return sysfs_emit(buff, "%u\n", cur_freq); 364 277 } 365 278 366 279 static u32 __boost_freq_mhz_show(struct intel_gt *gt) ··· 350 299 return intel_rps_get_boost_frequency(&gt->rps); 351 300 } 352 301 353 - static ssize_t boost_freq_mhz_show(struct device *dev, 354 - struct device_attribute *attr, 355 - char *buff) 356 - { 357 - u32 boost_freq = sysfs_gt_attribute_r_max_func(dev, attr, 358 - __boost_freq_mhz_show); 359 - 360 - return sysfs_emit(buff, "%u\n", boost_freq); 361 - } 362 - 363 302 static int __boost_freq_mhz_store(struct intel_gt *gt, u32 val) 364 303 { 365 304 return intel_rps_set_boost_frequency(&gt->rps, val); 366 305 } 367 306 368 - static ssize_t boost_freq_mhz_store(struct device *dev, 369 - struct device_attribute *attr, 370 - const char *buff, size_t count) 
371 - { 372 - ssize_t ret; 373 - u32 val; 374 - 375 - ret = kstrtou32(buff, 0, &val); 376 - if (ret) 377 - return ret; 378 - 379 - return sysfs_gt_attribute_w_func(dev, attr, 380 - __boost_freq_mhz_store, val) ?: count; 381 - } 382 - 383 - static u32 __rp0_freq_mhz_show(struct intel_gt *gt) 307 + static u32 __RP0_freq_mhz_show(struct intel_gt *gt) 384 308 { 385 309 return intel_rps_get_rp0_frequency(&gt->rps); 386 310 } 387 311 388 - static ssize_t RP0_freq_mhz_show(struct device *dev, 389 - struct device_attribute *attr, char *buff) 390 - { 391 - u32 rp0_freq = sysfs_gt_attribute_r_max_func(dev, attr, 392 - __rp0_freq_mhz_show); 393 - 394 - return sysfs_emit(buff, "%u\n", rp0_freq); 395 - } 396 - 397 - static u32 __rp1_freq_mhz_show(struct intel_gt *gt) 398 - { 399 - return intel_rps_get_rp1_frequency(&gt->rps); 400 - } 401 - 402 - static ssize_t RP1_freq_mhz_show(struct device *dev, 403 - struct device_attribute *attr, char *buff) 404 - { 405 - u32 rp1_freq = sysfs_gt_attribute_r_max_func(dev, attr, 406 - __rp1_freq_mhz_show); 407 - 408 - return sysfs_emit(buff, "%u\n", rp1_freq); 409 - } 410 - 411 - static u32 __rpn_freq_mhz_show(struct intel_gt *gt) 312 + static u32 __RPn_freq_mhz_show(struct intel_gt *gt) 412 313 { 413 314 return intel_rps_get_rpn_frequency(&gt->rps); 414 315 } 415 316 416 - static ssize_t RPn_freq_mhz_show(struct device *dev, 417 - struct device_attribute *attr, char *buff) 317 + static u32 __RP1_freq_mhz_show(struct intel_gt *gt) 418 318 { 419 - u32 rpn_freq = sysfs_gt_attribute_r_max_func(dev, attr, 420 - __rpn_freq_mhz_show); 421 - 422 - return sysfs_emit(buff, "%u\n", rpn_freq); 319 + return intel_rps_get_rp1_frequency(&gt->rps); 423 320 } 424 321 425 322 static u32 __max_freq_mhz_show(struct intel_gt *gt) ··· 375 376 return intel_rps_get_max_frequency(&gt->rps); 376 377 } 377 378 378 - static ssize_t max_freq_mhz_show(struct device *dev, 379 - struct device_attribute *attr, char *buff) 380 - { 381 - u32 max_freq = 
sysfs_gt_attribute_r_max_func(dev, attr, 382 - __max_freq_mhz_show); 383 - 384 - return sysfs_emit(buff, "%u\n", max_freq); 385 - } 386 - 387 379 static int __set_max_freq(struct intel_gt *gt, u32 val) 388 380 { 389 381 return intel_rps_set_max_frequency(&gt->rps, val); 390 - } 391 - 392 - static ssize_t max_freq_mhz_store(struct device *dev, 393 - struct device_attribute *attr, 394 - const char *buff, size_t count) 395 - { 396 - int ret; 397 - u32 val; 398 - 399 - ret = kstrtou32(buff, 0, &val); 400 - if (ret) 401 - return ret; 402 - 403 - ret = sysfs_gt_attribute_w_func(dev, attr, __set_max_freq, val); 404 - 405 - return ret ?: count; 406 382 } 407 383 408 384 static u32 __min_freq_mhz_show(struct intel_gt *gt) ··· 385 411 return intel_rps_get_min_frequency(&gt->rps); 386 412 } 387 413 388 - static ssize_t min_freq_mhz_show(struct device *dev, 389 - struct device_attribute *attr, char *buff) 390 - { 391 - u32 min_freq = sysfs_gt_attribute_r_min_func(dev, attr, 392 - __min_freq_mhz_show); 393 - 394 - return sysfs_emit(buff, "%u\n", min_freq); 395 - } 396 - 397 414 static int __set_min_freq(struct intel_gt *gt, u32 val) 398 415 { 399 416 return intel_rps_set_min_frequency(&gt->rps, val); 400 - } 401 - 402 - static ssize_t min_freq_mhz_store(struct device *dev, 403 - struct device_attribute *attr, 404 - const char *buff, size_t count) 405 - { 406 - int ret; 407 - u32 val; 408 - 409 - ret = kstrtou32(buff, 0, &val); 410 - if (ret) 411 - return ret; 412 - 413 - ret = sysfs_gt_attribute_w_func(dev, attr, __set_min_freq, val); 414 - 415 - return ret ?: count; 416 417 } 417 418 418 419 static u32 __vlv_rpe_freq_mhz_show(struct intel_gt *gt) ··· 397 448 return intel_gpu_freq(rps, rps->efficient_freq); 398 449 } 399 450 400 - static ssize_t vlv_rpe_freq_mhz_show(struct device *dev, 401 - struct device_attribute *attr, char *buff) 402 - { 403 - u32 rpe_freq = sysfs_gt_attribute_r_max_func(dev, attr, 404 - __vlv_rpe_freq_mhz_show); 451 + 
INTEL_GT_SYSFS_SHOW_MAX(act_freq_mhz); 452 + INTEL_GT_SYSFS_SHOW_MAX(boost_freq_mhz); 453 + INTEL_GT_SYSFS_SHOW_MAX(cur_freq_mhz); 454 + INTEL_GT_SYSFS_SHOW_MAX(RP0_freq_mhz); 455 + INTEL_GT_SYSFS_SHOW_MAX(RP1_freq_mhz); 456 + INTEL_GT_SYSFS_SHOW_MAX(RPn_freq_mhz); 457 + INTEL_GT_SYSFS_SHOW_MAX(max_freq_mhz); 458 + INTEL_GT_SYSFS_SHOW_MIN(min_freq_mhz); 459 + INTEL_GT_SYSFS_SHOW_MAX(vlv_rpe_freq_mhz); 460 + INTEL_GT_SYSFS_STORE(boost_freq_mhz, __boost_freq_mhz_store); 461 + INTEL_GT_SYSFS_STORE(max_freq_mhz, __set_max_freq); 462 + INTEL_GT_SYSFS_STORE(min_freq_mhz, __set_min_freq); 405 463 406 - return sysfs_emit(buff, "%u\n", rpe_freq); 407 - } 464 + #define INTEL_GT_RPS_SYSFS_ATTR(_name, _mode, _show, _store, _show_dev, _store_dev) \ 465 + static struct device_attribute dev_attr_gt_##_name = __ATTR(gt_##_name, _mode, \ 466 + _show_dev, _store_dev); \ 467 + static struct kobj_attribute attr_rps_##_name = __ATTR(rps_##_name, _mode, \ 468 + _show, _store) 408 469 409 - #define INTEL_GT_RPS_SYSFS_ATTR(_name, _mode, _show, _store) \ 410 - static struct device_attribute dev_attr_gt_##_name = __ATTR(gt_##_name, _mode, _show, _store); \ 411 - static struct device_attribute dev_attr_rps_##_name = __ATTR(rps_##_name, _mode, _show, _store) 412 - 413 - #define INTEL_GT_RPS_SYSFS_ATTR_RO(_name) \ 414 - INTEL_GT_RPS_SYSFS_ATTR(_name, 0444, _name##_show, NULL) 415 - #define INTEL_GT_RPS_SYSFS_ATTR_RW(_name) \ 416 - INTEL_GT_RPS_SYSFS_ATTR(_name, 0644, _name##_show, _name##_store) 470 + #define INTEL_GT_RPS_SYSFS_ATTR_RO(_name) \ 471 + INTEL_GT_RPS_SYSFS_ATTR(_name, 0444, _name##_show, NULL, \ 472 + _name##_dev_show, NULL) 473 + #define INTEL_GT_RPS_SYSFS_ATTR_RW(_name) \ 474 + INTEL_GT_RPS_SYSFS_ATTR(_name, 0644, _name##_show, _name##_store, \ 475 + _name##_dev_show, _name##_dev_store) 417 476 418 477 /* The below macros generate static structures */ 419 478 INTEL_GT_RPS_SYSFS_ATTR_RO(act_freq_mhz); ··· 432 475 INTEL_GT_RPS_SYSFS_ATTR_RO(RPn_freq_mhz); 433 476 
INTEL_GT_RPS_SYSFS_ATTR_RW(max_freq_mhz); 434 477 INTEL_GT_RPS_SYSFS_ATTR_RW(min_freq_mhz); 478 + INTEL_GT_RPS_SYSFS_ATTR_RO(vlv_rpe_freq_mhz); 435 479 436 - static DEVICE_ATTR_RO(vlv_rpe_freq_mhz); 437 - 438 - #define GEN6_ATTR(s) { \ 439 - &dev_attr_##s##_act_freq_mhz.attr, \ 440 - &dev_attr_##s##_cur_freq_mhz.attr, \ 441 - &dev_attr_##s##_boost_freq_mhz.attr, \ 442 - &dev_attr_##s##_max_freq_mhz.attr, \ 443 - &dev_attr_##s##_min_freq_mhz.attr, \ 444 - &dev_attr_##s##_RP0_freq_mhz.attr, \ 445 - &dev_attr_##s##_RP1_freq_mhz.attr, \ 446 - &dev_attr_##s##_RPn_freq_mhz.attr, \ 480 + #define GEN6_ATTR(p, s) { \ 481 + &p##attr_##s##_act_freq_mhz.attr, \ 482 + &p##attr_##s##_cur_freq_mhz.attr, \ 483 + &p##attr_##s##_boost_freq_mhz.attr, \ 484 + &p##attr_##s##_max_freq_mhz.attr, \ 485 + &p##attr_##s##_min_freq_mhz.attr, \ 486 + &p##attr_##s##_RP0_freq_mhz.attr, \ 487 + &p##attr_##s##_RP1_freq_mhz.attr, \ 488 + &p##attr_##s##_RPn_freq_mhz.attr, \ 447 489 NULL, \ 448 490 } 449 491 450 - #define GEN6_RPS_ATTR GEN6_ATTR(rps) 451 - #define GEN6_GT_ATTR GEN6_ATTR(gt) 492 + #define GEN6_RPS_ATTR GEN6_ATTR(, rps) 493 + #define GEN6_GT_ATTR GEN6_ATTR(dev_, gt) 452 494 453 495 static const struct attribute * const gen6_rps_attrs[] = GEN6_RPS_ATTR; 454 496 static const struct attribute * const gen6_gt_attrs[] = GEN6_GT_ATTR; 455 497 456 - static ssize_t punit_req_freq_mhz_show(struct device *dev, 457 - struct device_attribute *attr, 498 + static ssize_t punit_req_freq_mhz_show(struct kobject *kobj, 499 + struct kobj_attribute *attr, 458 500 char *buff) 459 501 { 460 - struct intel_gt *gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name); 502 + struct intel_gt *gt = intel_gt_sysfs_get_drvdata(kobj, attr->attr.name); 461 503 u32 preq = intel_rps_read_punit_req_frequency(&gt->rps); 462 504 463 505 return sysfs_emit(buff, "%u\n", preq); ··· 464 508 465 509 struct intel_gt_bool_throttle_attr { 466 510 struct attribute attr; 467 - ssize_t (*show)(struct device *dev, struct 
device_attribute *attr, 511 + ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr, 468 512 char *buf); 469 - i915_reg_t reg32; 513 + i915_reg_t (*reg32)(struct intel_gt *gt); 470 514 u32 mask; 471 515 }; 472 516 473 - static ssize_t throttle_reason_bool_show(struct device *dev, 474 - struct device_attribute *attr, 517 + static ssize_t throttle_reason_bool_show(struct kobject *kobj, 518 + struct kobj_attribute *attr, 475 519 char *buff) 476 520 { 477 - struct intel_gt *gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name); 521 + struct intel_gt *gt = intel_gt_sysfs_get_drvdata(kobj, attr->attr.name); 478 522 struct intel_gt_bool_throttle_attr *t_attr = 479 523 (struct intel_gt_bool_throttle_attr *) attr; 480 - bool val = rps_read_mask_mmio(&gt->rps, t_attr->reg32, t_attr->mask); 524 + bool val = rps_read_mask_mmio(&gt->rps, t_attr->reg32(gt), t_attr->mask); 481 525 482 526 return sysfs_emit(buff, "%u\n", val); 483 527 } ··· 486 530 struct intel_gt_bool_throttle_attr attr_##sysfs_func__ = { \ 487 531 .attr = { .name = __stringify(sysfs_func__), .mode = 0444 }, \ 488 532 .show = throttle_reason_bool_show, \ 489 - .reg32 = GT0_PERF_LIMIT_REASONS, \ 533 + .reg32 = intel_gt_perf_limit_reasons_reg, \ 490 534 .mask = mask__, \ 491 535 } 492 536 493 - static DEVICE_ATTR_RO(punit_req_freq_mhz); 537 + INTEL_GT_ATTR_RO(punit_req_freq_mhz); 494 538 static INTEL_GT_RPS_BOOL_ATTR_RO(throttle_reason_status, GT0_PERF_LIMIT_REASONS_MASK); 495 539 static INTEL_GT_RPS_BOOL_ATTR_RO(throttle_reason_pl1, POWER_LIMIT_1_MASK); 496 540 static INTEL_GT_RPS_BOOL_ATTR_RO(throttle_reason_pl2, POWER_LIMIT_2_MASK); ··· 553 597 #define U8_8_VAL_MASK 0xffff 554 598 #define U8_8_SCALE_TO_VALUE "0.00390625" 555 599 556 - static ssize_t freq_factor_scale_show(struct device *dev, 557 - struct device_attribute *attr, 600 + static ssize_t freq_factor_scale_show(struct kobject *kobj, 601 + struct kobj_attribute *attr, 558 602 char *buff) 559 603 { 560 604 return sysfs_emit(buff, "%s\n", 
U8_8_SCALE_TO_VALUE); ··· 566 610 return !mode ? mode : 256 / mode; 567 611 } 568 612 569 - static ssize_t media_freq_factor_show(struct device *dev, 570 - struct device_attribute *attr, 613 + static ssize_t media_freq_factor_show(struct kobject *kobj, 614 + struct kobj_attribute *attr, 571 615 char *buff) 572 616 { 573 - struct intel_gt *gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name); 617 + struct intel_gt *gt = intel_gt_sysfs_get_drvdata(kobj, attr->attr.name); 574 618 struct intel_guc_slpc *slpc = &gt->uc.guc.slpc; 575 619 intel_wakeref_t wakeref; 576 620 u32 mode; ··· 597 641 return sysfs_emit(buff, "%u\n", media_ratio_mode_to_factor(mode)); 598 642 } 599 643 600 - static ssize_t media_freq_factor_store(struct device *dev, 601 - struct device_attribute *attr, 644 + static ssize_t media_freq_factor_store(struct kobject *kobj, 645 + struct kobj_attribute *attr, 602 646 const char *buff, size_t count) 603 647 { 604 - struct intel_gt *gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name); 648 + struct intel_gt *gt = intel_gt_sysfs_get_drvdata(kobj, attr->attr.name); 605 649 struct intel_guc_slpc *slpc = &gt->uc.guc.slpc; 606 650 u32 factor, mode; 607 651 int err; ··· 626 670 return err ?: count; 627 671 } 628 672 629 - static ssize_t media_RP0_freq_mhz_show(struct device *dev, 630 - struct device_attribute *attr, 673 + static ssize_t media_RP0_freq_mhz_show(struct kobject *kobj, 674 + struct kobj_attribute *attr, 631 675 char *buff) 632 676 { 633 - struct intel_gt *gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name); 677 + struct intel_gt *gt = intel_gt_sysfs_get_drvdata(kobj, attr->attr.name); 634 678 u32 val; 635 679 int err; 636 680 ··· 647 691 return sysfs_emit(buff, "%u\n", val); 648 692 } 649 693 650 - static ssize_t media_RPn_freq_mhz_show(struct device *dev, 651 - struct device_attribute *attr, 694 + static ssize_t media_RPn_freq_mhz_show(struct kobject *kobj, 695 + struct kobj_attribute *attr, 652 696 char *buff) 653 697 { 654 - struct intel_gt 
*gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name); 698 + struct intel_gt *gt = intel_gt_sysfs_get_drvdata(kobj, attr->attr.name); 655 699 u32 val; 656 700 int err; 657 701 ··· 668 712 return sysfs_emit(buff, "%u\n", val); 669 713 } 670 714 671 - static DEVICE_ATTR_RW(media_freq_factor); 672 - static struct device_attribute dev_attr_media_freq_factor_scale = 715 + INTEL_GT_ATTR_RW(media_freq_factor); 716 + static struct kobj_attribute attr_media_freq_factor_scale = 673 717 __ATTR(media_freq_factor.scale, 0444, freq_factor_scale_show, NULL); 674 - static DEVICE_ATTR_RO(media_RP0_freq_mhz); 675 - static DEVICE_ATTR_RO(media_RPn_freq_mhz); 718 + INTEL_GT_ATTR_RO(media_RP0_freq_mhz); 719 + INTEL_GT_ATTR_RO(media_RPn_freq_mhz); 676 720 677 721 static const struct attribute *media_perf_power_attrs[] = { 678 - &dev_attr_media_freq_factor.attr, 679 - &dev_attr_media_freq_factor_scale.attr, 680 - &dev_attr_media_RP0_freq_mhz.attr, 681 - &dev_attr_media_RPn_freq_mhz.attr, 722 + &attr_media_freq_factor.attr, 723 + &attr_media_freq_factor_scale.attr, 724 + &attr_media_RP0_freq_mhz.attr, 725 + &attr_media_RPn_freq_mhz.attr, 682 726 NULL 683 727 }; 684 728 ··· 710 754 NULL 711 755 }; 712 756 713 - static int intel_sysfs_rps_init(struct intel_gt *gt, struct kobject *kobj, 714 - const struct attribute * const *attrs) 757 + static int intel_sysfs_rps_init(struct intel_gt *gt, struct kobject *kobj) 715 758 { 759 + const struct attribute * const *attrs; 760 + struct attribute *vlv_attr; 716 761 int ret; 717 762 718 763 if (GRAPHICS_VER(gt->i915) < 6) 719 764 return 0; 765 + 766 + if (is_object_gt(kobj)) { 767 + attrs = gen6_rps_attrs; 768 + vlv_attr = &attr_rps_vlv_rpe_freq_mhz.attr; 769 + } else { 770 + attrs = gen6_gt_attrs; 771 + vlv_attr = &dev_attr_gt_vlv_rpe_freq_mhz.attr; 772 + } 720 773 721 774 ret = sysfs_create_files(kobj, attrs); 722 775 if (ret) 723 776 return ret; 724 777 725 778 if (IS_VALLEYVIEW(gt->i915) || IS_CHERRYVIEW(gt->i915)) 726 - ret = 
sysfs_create_file(kobj, &dev_attr_vlv_rpe_freq_mhz.attr); 779 + ret = sysfs_create_file(kobj, vlv_attr); 727 780 728 781 return ret; 729 782 } ··· 743 778 744 779 intel_sysfs_rc6_init(gt, kobj); 745 780 746 - ret = is_object_gt(kobj) ? 747 - intel_sysfs_rps_init(gt, kobj, gen6_rps_attrs) : 748 - intel_sysfs_rps_init(gt, kobj, gen6_gt_attrs); 781 + ret = intel_sysfs_rps_init(gt, kobj); 749 782 if (ret) 750 783 drm_warn(&gt->i915->drm, 751 784 "failed to create gt%u RPS sysfs files (%pe)", ··· 753 790 if (!is_object_gt(kobj)) 754 791 return; 755 792 756 - ret = sysfs_create_file(kobj, &dev_attr_punit_req_freq_mhz.attr); 793 + ret = sysfs_create_file(kobj, &attr_punit_req_freq_mhz.attr); 757 794 if (ret) 758 795 drm_warn(&gt->i915->drm, 759 796 "failed to create gt%u punit_req_freq_mhz sysfs (%pe)", 760 797 gt->info.id, ERR_PTR(ret)); 761 798 762 - if (GRAPHICS_VER(gt->i915) >= 11) { 799 + if (i915_mmio_reg_valid(intel_gt_perf_limit_reasons_reg(gt))) { 763 800 ret = sysfs_create_files(kobj, throttle_reason_attrs); 764 801 if (ret) 765 802 drm_warn(&gt->i915->drm,
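The hunk above converts the per-gt sysfs callbacks from `device_attribute` to `kobj_attribute`: the per-gt directories are bare kobjects rather than full `struct device`s, so the device-attribute callback signatures no longer fit once the files can be created under either parent (`is_object_gt()` picks the attribute set). The frequency-factor value itself is unchanged; as a userspace model (not the kernel source), the mapping in `media_freq_factor_show()` is U8.8 fixed point, so 256 means a 1:1 media:base clock ratio:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the mapping used by media_freq_factor_show()
 * above: the sysfs value is U8.8 fixed point, so 256 is a 1:1
 * media:base clock ratio. SLPC ratio mode 0 is "dynamic" and is
 * reported as factor 0.
 */
static uint32_t media_ratio_mode_to_factor(uint32_t mode)
{
	/* 0 -> 0 (dynamic), 1 -> 256 (1:1), 2 -> 128 (1:2) */
	return !mode ? mode : 256 / mode;
}
```

The store path performs the inverse lookup, rejecting factors that do not correspond to a supported ratio mode.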
+6 -17
drivers/gpu/drm/i915/gt/intel_gt_types.h
··· 20 20 #include "intel_gsc.h" 21 21 22 22 #include "i915_vma.h" 23 + #include "i915_perf_types.h" 23 24 #include "intel_engine_types.h" 24 25 #include "intel_gt_buffer_pool_types.h" 25 26 #include "intel_hwconfig.h" ··· 60 59 L3BANK, 61 60 MSLICE, 62 61 LNCF, 62 + GAM, 63 + DSS, 64 + OADDRM, 63 65 64 66 /* 65 67 * On some platforms there are multiple types of MCR registers that ··· 145 141 struct intel_wakeref wakeref; 146 142 atomic_t user_wakeref; 147 143 148 - /** 149 - * Protects access to lmem usefault list. 150 - * It is required, if we are outside of the runtime suspend path, 151 - * access to @lmem_userfault_list requires always first grabbing the 152 - * runtime pm, to ensure we can't race against runtime suspend. 153 - * Once we have that we also need to grab @lmem_userfault_lock, 154 - * at which point we have exclusive access. 155 - * The runtime suspend path is special since it doesn't really hold any locks, 156 - * but instead has exclusive access by virtue of all other accesses requiring 157 - * holding the runtime pm wakeref. 158 - */ 159 - struct mutex lmem_userfault_lock; 160 - struct list_head lmem_userfault_list; 161 - 162 144 struct list_head closed_vma; 163 145 spinlock_t closed_lock; /* guards the list of closed_vma */ 164 146 ··· 159 169 * is a slight delay before we do so. 160 170 */ 161 171 intel_wakeref_t awake; 162 - 163 - /* Manual runtime pm autosuspend delay for user GGTT/lmem mmaps */ 164 - struct intel_wakeref_auto userfault_wakeref; 165 172 166 173 u32 clock_frequency; 167 174 u32 clock_period_ns; ··· 273 286 /* sysfs defaults per gt */ 274 287 struct gt_defaults defaults; 275 288 struct kobject *sysfs_defaults; 289 + 290 + struct i915_perf_gt perf; 276 291 }; 277 292 278 293 struct intel_gt_definition {
+22 -21
drivers/gpu/drm/i915/gt/intel_gtt.c
··· 15 15 #include "i915_trace.h" 16 16 #include "i915_utils.h" 17 17 #include "intel_gt.h" 18 + #include "intel_gt_mcr.h" 18 19 #include "intel_gt_regs.h" 19 20 #include "intel_gtt.h" 20 21 ··· 270 269 memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT, 271 270 ARRAY_SIZE(vm->min_alignment)); 272 271 273 - if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915) && 274 - subclass == VM_CLASS_PPGTT) { 275 - vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M; 276 - vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M; 277 - } else if (HAS_64K_PAGES(vm->i915)) { 272 + if (HAS_64K_PAGES(vm->i915)) { 278 273 vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K; 279 274 vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K; 280 275 } ··· 340 343 */ 341 344 size = I915_GTT_PAGE_SIZE_4K; 342 345 if (i915_vm_is_4lvl(vm) && 343 - HAS_PAGE_SIZES(vm->i915, I915_GTT_PAGE_SIZE_64K)) 346 + HAS_PAGE_SIZES(vm->i915, I915_GTT_PAGE_SIZE_64K) && 347 + !HAS_64K_PAGES(vm->i915)) 344 348 size = I915_GTT_PAGE_SIZE_64K; 345 349 346 350 do { ··· 381 383 i915_gem_object_put(obj); 382 384 skip: 383 385 if (size == I915_GTT_PAGE_SIZE_4K) 384 - return -ENOMEM; 385 - 386 - /* 387 - * If we need 64K minimum GTT pages for device local-memory, 388 - * like on XEHPSDV, then we need to fail the allocation here, 389 - * otherwise we can't safely support the insertion of 390 - * local-memory pages for this vm, since the HW expects the 391 - * correct physical alignment and size when the page-table is 392 - * operating in 64K GTT mode, which includes any scratch PTEs, 393 - * since userspace can still touch them. 
394 - */ 395 - if (HAS_64K_PAGES(vm->i915)) 396 386 return -ENOMEM; 397 387 398 388 size = I915_GTT_PAGE_SIZE_4K; ··· 479 493 intel_uncore_write(uncore, GEN12_PAT_INDEX(7), GEN8_PPAT_WB); 480 494 } 481 495 496 + static void xehp_setup_private_ppat(struct intel_gt *gt) 497 + { 498 + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(0), GEN8_PPAT_WB); 499 + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(1), GEN8_PPAT_WC); 500 + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(2), GEN8_PPAT_WT); 501 + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(3), GEN8_PPAT_UC); 502 + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(4), GEN8_PPAT_WB); 503 + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(5), GEN8_PPAT_WB); 504 + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(6), GEN8_PPAT_WB); 505 + intel_gt_mcr_multicast_write(gt, XEHP_PAT_INDEX(7), GEN8_PPAT_WB); 506 + } 507 + 482 508 static void icl_setup_private_ppat(struct intel_uncore *uncore) 483 509 { 484 510 intel_uncore_write(uncore, ··· 583 585 intel_uncore_write(uncore, GEN8_PRIVATE_PAT_HI, upper_32_bits(pat)); 584 586 } 585 587 586 - void setup_private_pat(struct intel_uncore *uncore) 588 + void setup_private_pat(struct intel_gt *gt) 587 589 { 588 - struct drm_i915_private *i915 = uncore->i915; 590 + struct intel_uncore *uncore = gt->uncore; 591 + struct drm_i915_private *i915 = gt->i915; 589 592 590 593 GEM_BUG_ON(GRAPHICS_VER(i915) < 8); 591 594 592 - if (GRAPHICS_VER(i915) >= 12) 595 + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) 596 + xehp_setup_private_ppat(gt); 597 + else if (GRAPHICS_VER(i915) >= 12) 593 598 tgl_setup_private_ppat(uncore); 594 599 else if (GRAPHICS_VER(i915) >= 11) 595 600 icl_setup_private_ppat(uncore);
+2 -1
drivers/gpu/drm/i915/gt/intel_gtt.h
··· 93 93 #define GEN12_GGTT_PTE_LM BIT_ULL(1) 94 94 95 95 #define GEN12_PDE_64K BIT(6) 96 + #define GEN12_PTE_PS64 BIT(8) 96 97 97 98 /* 98 99 * Cacheability Control is a 4-bit value. The low three bits are stored in bits ··· 668 667 669 668 void gtt_write_workarounds(struct intel_gt *gt); 670 669 671 - void setup_private_pat(struct intel_uncore *uncore); 670 + void setup_private_pat(struct intel_gt *gt); 672 671 673 672 int i915_vm_alloc_pt_stash(struct i915_address_space *vm, 674 673 struct i915_vm_pt_stash *stash,
+119 -24
drivers/gpu/drm/i915/gt/intel_lrc.c
··· 20 20 #include "intel_ring.h" 21 21 #include "shmem_utils.h" 22 22 23 + /* 24 + * The per-platform tables are u8-encoded in @data. Decode @data and set the 25 + * addresses' offset and commands in @regs. The following encoding is used 26 + * for each byte. There are 2 steps: decoding commands and decoding addresses. 27 + * 28 + * Commands: 29 + * [7]: create NOPs - number of NOPs are set in lower bits 30 + * [6]: When creating MI_LOAD_REGISTER_IMM command, allow to set 31 + * MI_LRI_FORCE_POSTED 32 + * [5:0]: Number of NOPs or registers to set values to in case of 33 + * MI_LOAD_REGISTER_IMM 34 + * 35 + * Addresses: these are decoded after a MI_LOAD_REGISTER_IMM command by "count" 36 + * number of registers. They are set by using the REG/REG16 macros: the former 37 + * is used for offsets smaller than 0x200 while the latter is for values bigger 38 + * than that. Those macros already set all the bits documented below correctly: 39 + * 40 + * [7]: When a register offset needs more than 6 bits, use additional bytes, to 41 + * follow, for the lower bits 42 + * [6:0]: Register offset, without considering the engine base. 43 + * 44 + * This function only tweaks the commands and register offsets. Values are not 45 + * filled out. 
46 + */ 23 47 static void set_offsets(u32 *regs, 24 48 const u8 *data, 25 49 const struct intel_engine_cs *engine, ··· 272 248 REG16(0x2b4), 273 249 REG(0x120), 274 250 REG(0x124), 251 + 252 + NOP(1), 253 + LRI(9, POSTED), 254 + REG16(0x3a8), 255 + REG16(0x28c), 256 + REG16(0x288), 257 + REG16(0x284), 258 + REG16(0x280), 259 + REG16(0x27c), 260 + REG16(0x278), 261 + REG16(0x274), 262 + REG16(0x270), 263 + 264 + END 265 + }; 266 + 267 + static const u8 mtl_xcs_offsets[] = { 268 + NOP(1), 269 + LRI(13, POSTED), 270 + REG16(0x244), 271 + REG(0x034), 272 + REG(0x030), 273 + REG(0x038), 274 + REG(0x03c), 275 + REG(0x168), 276 + REG(0x140), 277 + REG(0x110), 278 + REG(0x1c0), 279 + REG(0x1c4), 280 + REG(0x1c8), 281 + REG(0x180), 282 + REG16(0x2b4), 283 + NOP(4), 275 284 276 285 NOP(1), 277 286 LRI(9, POSTED), ··· 663 606 END 664 607 }; 665 608 609 + static const u8 mtl_rcs_offsets[] = { 610 + NOP(1), 611 + LRI(15, POSTED), 612 + REG16(0x244), 613 + REG(0x034), 614 + REG(0x030), 615 + REG(0x038), 616 + REG(0x03c), 617 + REG(0x168), 618 + REG(0x140), 619 + REG(0x110), 620 + REG(0x1c0), 621 + REG(0x1c4), 622 + REG(0x1c8), 623 + REG(0x180), 624 + REG16(0x2b4), 625 + REG(0x120), 626 + REG(0x124), 627 + 628 + NOP(1), 629 + LRI(9, POSTED), 630 + REG16(0x3a8), 631 + REG16(0x28c), 632 + REG16(0x288), 633 + REG16(0x284), 634 + REG16(0x280), 635 + REG16(0x27c), 636 + REG16(0x278), 637 + REG16(0x274), 638 + REG16(0x270), 639 + 640 + NOP(2), 641 + LRI(2, POSTED), 642 + REG16(0x5a8), 643 + REG16(0x5ac), 644 + 645 + NOP(6), 646 + LRI(1, 0), 647 + REG(0x0c8), 648 + 649 + END 650 + }; 651 + 666 652 #undef END 667 653 #undef REG16 668 654 #undef REG ··· 724 624 !intel_engine_has_relative_mmio(engine)); 725 625 726 626 if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) { 727 - if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55)) 627 + if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 70)) 628 + return mtl_rcs_offsets; 629 + else if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55)) 728 
630 return dg2_rcs_offsets; 729 631 else if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) 730 632 return xehp_rcs_offsets; ··· 739 637 else 740 638 return gen8_rcs_offsets; 741 639 } else { 742 - if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55)) 640 + if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 70)) 641 + return mtl_xcs_offsets; 642 + else if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55)) 743 643 return dg2_xcs_offsets; 744 644 else if (GRAPHICS_VER(engine->i915) >= 12) 745 645 return gen12_xcs_offsets; ··· 849 745 static u32 850 746 lrc_ring_indirect_offset_default(const struct intel_engine_cs *engine) 851 747 { 852 - switch (GRAPHICS_VER(engine->i915)) { 853 - default: 854 - MISSING_CASE(GRAPHICS_VER(engine->i915)); 855 - fallthrough; 856 - case 12: 748 + if (GRAPHICS_VER(engine->i915) >= 12) 857 749 return GEN12_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT; 858 - case 11: 750 + else if (GRAPHICS_VER(engine->i915) >= 11) 859 751 return GEN11_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT; 860 - case 9: 752 + else if (GRAPHICS_VER(engine->i915) >= 9) 861 753 return GEN9_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT; 862 - case 8: 754 + else if (GRAPHICS_VER(engine->i915) >= 8) 863 755 return GEN8_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT; 864 - } 756 + 757 + GEM_BUG_ON(GRAPHICS_VER(engine->i915) < 8); 758 + 759 + return 0; 865 760 } 866 761 867 762 static void ··· 1115 1012 if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 1116 1013 context_size += I915_GTT_PAGE_SIZE; /* for redzone */ 1117 1014 1118 - if (GRAPHICS_VER(engine->i915) == 12) { 1015 + if (GRAPHICS_VER(engine->i915) >= 12) { 1119 1016 ce->wa_bb_page = context_size / PAGE_SIZE; 1120 1017 context_size += PAGE_SIZE; 1121 1018 } ··· 1821 1718 unsigned int i; 1822 1719 int err; 1823 1720 1824 - if (!(engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)) 1721 + if (GRAPHICS_VER(engine->i915) >= 11 || 1722 + !(engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)) 1825 1723 return; 1826 1724 1827 - switch (GRAPHICS_VER(engine->i915)) { 1828 
- case 12: 1829 - case 11: 1830 - return; 1831 - case 9: 1725 + if (GRAPHICS_VER(engine->i915) == 9) { 1832 1726 wa_bb_fn[0] = gen9_init_indirectctx_bb; 1833 1727 wa_bb_fn[1] = NULL; 1834 - break; 1835 - case 8: 1728 + } else if (GRAPHICS_VER(engine->i915) == 8) { 1836 1729 wa_bb_fn[0] = gen8_init_indirectctx_bb; 1837 1730 wa_bb_fn[1] = NULL; 1838 - break; 1839 - default: 1840 - MISSING_CASE(GRAPHICS_VER(engine->i915)); 1841 - return; 1842 1731 } 1843 1732 1844 1733 err = lrc_create_wa_ctx(engine);
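The new block comment in intel_lrc.c documents the u8 encoding that `set_offsets()` consumes, and the MTL tables above are just new instances of it. The decoder below is a simplified userspace sketch reconstructed from that comment — the `NOP`/`LRI`/`REG`/`REG16` byte layouts are my reading of the description, not the kernel macros, and it collects register offsets only (no `MI_LOAD_REGISTER_IMM` emission, no slot skipping):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte encodings reconstructed from the comment above (sketch only). */
#define NOP(x)        (0x80 | (x))                 /* bit7 set: NOP run  */
#define LRI(n, post)  (((post) ? 0x40 : 0) | (n))  /* bit6: force posted */
#define REG(x)        ((x) >> 2)                   /* offsets < 0x200    */
#define REG16(x)      (((x) >> 9) | 0x80), (((x) >> 2) & 0x7f)
#define END           0

/* Decode a table into register offsets; returns how many were found. */
static size_t decode_offsets(const uint8_t *data, uint32_t *out, size_t max)
{
	size_t n = 0;

	while (*data) {
		uint8_t count;

		if (*data & 0x80) {	/* NOP run: nothing to collect */
			data++;
			continue;
		}

		count = *data++ & 0x3f;	/* LRI: bit6 (POSTED) ignored here */
		while (count--) {
			uint32_t offset = 0;
			uint8_t v;

			do {		/* bit7 chains extra low-bit bytes */
				v = *data++;
				offset = (offset << 7) | (v & 0x7f);
			} while (v & 0x80);

			if (n < max)
				out[n++] = offset << 2;
		}
	}
	return n;
}
```

For example, `REG16(0x2b4)` encodes as two bytes (`0x81, 0x2d`), which the inner loop reassembles back into offset `0x2b4`.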
+2
drivers/gpu/drm/i915/gt/intel_lrc.h
··· 110 110 #define XEHP_SW_CTX_ID_WIDTH 16 111 111 #define XEHP_SW_COUNTER_SHIFT 58 112 112 #define XEHP_SW_COUNTER_WIDTH 6 113 + #define GEN12_GUC_SW_CTX_ID_SHIFT 39 114 + #define GEN12_GUC_SW_CTX_ID_WIDTH 16 113 115 114 116 static inline void lrc_runtime_start(struct intel_context *ce) 115 117 {
+1
drivers/gpu/drm/i915/gt/intel_migrate.c
··· 10 10 #include "intel_gtt.h" 11 11 #include "intel_migrate.h" 12 12 #include "intel_ring.h" 13 + #include "gem/i915_gem_lmem.h" 13 14 14 15 struct insert_pte_data { 15 16 u64 offset;
+8 -4
drivers/gpu/drm/i915/gt/intel_mocs.c
··· 7 7 8 8 #include "intel_engine.h" 9 9 #include "intel_gt.h" 10 + #include "intel_gt_mcr.h" 10 11 #include "intel_gt_regs.h" 11 12 #include "intel_mocs.h" 12 13 #include "intel_ring.h" ··· 610 609 0; \ 611 610 i++) 612 611 613 - static void init_l3cc_table(struct intel_uncore *uncore, 612 + static void init_l3cc_table(struct intel_gt *gt, 614 613 const struct drm_i915_mocs_table *table) 615 614 { 616 615 unsigned int i; 617 616 u32 l3cc; 618 617 619 618 for_each_l3cc(l3cc, table, i) 620 - intel_uncore_write_fw(uncore, GEN9_LNCFCMOCS(i), l3cc); 619 + if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50)) 620 + intel_gt_mcr_multicast_write_fw(gt, XEHP_LNCFCMOCS(i), l3cc); 621 + else 622 + intel_uncore_write_fw(gt->uncore, GEN9_LNCFCMOCS(i), l3cc); 621 623 } 622 624 623 625 void intel_mocs_init_engine(struct intel_engine_cs *engine) ··· 640 636 init_mocs_table(engine, &table); 641 637 642 638 if (flags & HAS_RENDER_L3CC && engine->class == RENDER_CLASS) 643 - init_l3cc_table(engine->uncore, &table); 639 + init_l3cc_table(engine->gt, &table); 644 640 } 645 641 646 642 static u32 global_mocs_offset(void) ··· 676 672 * memory transactions including guc transactions 677 673 */ 678 674 if (flags & HAS_RENDER_L3CC) 679 - init_l3cc_table(gt->uncore, &table); 675 + init_l3cc_table(gt, &table); 680 676 } 681 677 682 678 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
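The mocs.c hunk only changes *how* the L3CC values are written — multicast MCR writes on Xe_HP-class hardware versus plain uncore writes before — not the packing that `for_each_l3cc` iterates over. As a sketch of that layout (mirroring the driver's `l3cc_combine()` helper in the same file): each LNCFCMOCS register packs two 16-bit L3 cacheability-control entries, table entry 2*i in bits [15:0] and entry 2*i+1 in bits [31:16].

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the value each LNCFCMOCS register receives: two 16-bit
 * L3 cacheability-control table entries, low entry in bits [15:0],
 * high entry in bits [31:16]. The example entry values are made up.
 */
static uint32_t l3cc_combine(uint16_t low, uint16_t high)
{
	return (uint32_t)low | ((uint32_t)high << 16);
}
```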
+1 -1
drivers/gpu/drm/i915/gt/intel_reset.c
··· 1278 1278 kobject_uevent_env(kobj, KOBJ_CHANGE, reset_event); 1279 1279 1280 1280 /* Use a watchdog to ensure that our reset completes */ 1281 - intel_wedge_on_timeout(&w, gt, 5 * HZ) { 1281 + intel_wedge_on_timeout(&w, gt, 60 * HZ) { 1282 1282 intel_display_prepare_reset(gt->i915); 1283 1283 1284 1284 intel_gt_reset(gt, engine_mask, reason);
+252 -13
drivers/gpu/drm/i915/gt/intel_rps.c
··· 625 625 rgvswctl = intel_uncore_read16(uncore, MEMSWCTL); 626 626 627 627 /* Ack interrupts, disable EFC interrupt */ 628 - intel_uncore_write(uncore, MEMINTREN, 629 - intel_uncore_read(uncore, MEMINTREN) & 630 - ~MEMINT_EVAL_CHG_EN); 628 + intel_uncore_rmw(uncore, MEMINTREN, MEMINT_EVAL_CHG_EN, 0); 631 629 intel_uncore_write(uncore, MEMINTRSTS, MEMINT_EVAL_CHG); 632 630 633 631 /* Go back to the starting frequency */ ··· 1014 1016 if (rps_uses_slpc(rps)) { 1015 1017 slpc = rps_to_slpc(rps); 1016 1018 1019 + if (slpc->min_freq_softlimit >= slpc->boost_freq) 1020 + return; 1021 + 1017 1022 /* Return if old value is non zero */ 1018 - if (!atomic_fetch_inc(&slpc->num_waiters)) 1023 + if (!atomic_fetch_inc(&slpc->num_waiters)) { 1024 + GT_TRACE(rps_to_gt(rps), "boost fence:%llx:%llx\n", 1025 + rq->fence.context, rq->fence.seqno); 1019 1026 schedule_work(&slpc->boost_work); 1027 + } 1020 1028 1021 1029 return; 1022 1030 } ··· 1089 1085 return intel_uncore_read(uncore, GEN6_RP_STATE_CAP); 1090 1086 } 1091 1087 1092 - /** 1093 - * gen6_rps_get_freq_caps - Get freq caps exposed by HW 1094 - * @rps: the intel_rps structure 1095 - * @caps: returned freq caps 1096 - * 1097 - * Returned "caps" frequencies should be converted to MHz using 1098 - * intel_gpu_freq() 1099 - */ 1100 - void gen6_rps_get_freq_caps(struct intel_rps *rps, struct intel_rps_freq_caps *caps) 1088 + static void 1089 + mtl_get_freq_caps(struct intel_rps *rps, struct intel_rps_freq_caps *caps) 1090 + { 1091 + struct intel_uncore *uncore = rps_to_uncore(rps); 1092 + u32 rp_state_cap = rps_to_gt(rps)->type == GT_MEDIA ? 1093 + intel_uncore_read(uncore, MTL_MEDIAP_STATE_CAP) : 1094 + intel_uncore_read(uncore, MTL_RP_STATE_CAP); 1095 + u32 rpe = rps_to_gt(rps)->type == GT_MEDIA ? 
1096 + intel_uncore_read(uncore, MTL_MPE_FREQUENCY) : 1097 + intel_uncore_read(uncore, MTL_GT_RPE_FREQUENCY); 1098 + 1099 + /* MTL values are in units of 16.67 MHz */ 1100 + caps->rp0_freq = REG_FIELD_GET(MTL_RP0_CAP_MASK, rp_state_cap); 1101 + caps->min_freq = REG_FIELD_GET(MTL_RPN_CAP_MASK, rp_state_cap); 1102 + caps->rp1_freq = REG_FIELD_GET(MTL_RPE_MASK, rpe); 1103 + } 1104 + 1105 + static void 1106 + __gen6_rps_get_freq_caps(struct intel_rps *rps, struct intel_rps_freq_caps *caps) 1101 1107 { 1102 1108 struct drm_i915_private *i915 = rps_to_i915(rps); 1103 1109 u32 rp_state_cap; ··· 1140 1126 caps->rp1_freq *= GEN9_FREQ_SCALER; 1141 1127 caps->min_freq *= GEN9_FREQ_SCALER; 1142 1128 } 1129 + } 1130 + 1131 + /** 1132 + * gen6_rps_get_freq_caps - Get freq caps exposed by HW 1133 + * @rps: the intel_rps structure 1134 + * @caps: returned freq caps 1135 + * 1136 + * Returned "caps" frequencies should be converted to MHz using 1137 + * intel_gpu_freq() 1138 + */ 1139 + void gen6_rps_get_freq_caps(struct intel_rps *rps, struct intel_rps_freq_caps *caps) 1140 + { 1141 + struct drm_i915_private *i915 = rps_to_i915(rps); 1142 + 1143 + if (IS_METEORLAKE(i915)) 1144 + return mtl_get_freq_caps(rps, caps); 1145 + else 1146 + return __gen6_rps_get_freq_caps(rps, caps); 1143 1147 } 1144 1148 1145 1149 static void gen6_rps_init(struct intel_rps *rps) ··· 2221 2189 return slpc->min_freq; 2222 2190 else 2223 2191 return intel_gpu_freq(rps, rps->min_freq); 2192 + } 2193 + 2194 + static void rps_frequency_dump(struct intel_rps *rps, struct drm_printer *p) 2195 + { 2196 + struct intel_gt *gt = rps_to_gt(rps); 2197 + struct drm_i915_private *i915 = gt->i915; 2198 + struct intel_uncore *uncore = gt->uncore; 2199 + struct intel_rps_freq_caps caps; 2200 + u32 rp_state_limits; 2201 + u32 gt_perf_status; 2202 + u32 rpmodectl, rpinclimit, rpdeclimit; 2203 + u32 rpstat, cagf, reqf; 2204 + u32 rpcurupei, rpcurup, rpprevup; 2205 + u32 rpcurdownei, rpcurdown, rpprevdown; 2206 + u32 rpupei, 
rpupt, rpdownei, rpdownt; 2207 + u32 pm_ier, pm_imr, pm_isr, pm_iir, pm_mask; 2208 + 2209 + rp_state_limits = intel_uncore_read(uncore, GEN6_RP_STATE_LIMITS); 2210 + gen6_rps_get_freq_caps(rps, &caps); 2211 + if (IS_GEN9_LP(i915)) 2212 + gt_perf_status = intel_uncore_read(uncore, BXT_GT_PERF_STATUS); 2213 + else 2214 + gt_perf_status = intel_uncore_read(uncore, GEN6_GT_PERF_STATUS); 2215 + 2216 + /* RPSTAT1 is in the GT power well */ 2217 + intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL); 2218 + 2219 + reqf = intel_uncore_read(uncore, GEN6_RPNSWREQ); 2220 + if (GRAPHICS_VER(i915) >= 9) { 2221 + reqf >>= 23; 2222 + } else { 2223 + reqf &= ~GEN6_TURBO_DISABLE; 2224 + if (IS_HASWELL(i915) || IS_BROADWELL(i915)) 2225 + reqf >>= 24; 2226 + else 2227 + reqf >>= 25; 2228 + } 2229 + reqf = intel_gpu_freq(rps, reqf); 2230 + 2231 + rpmodectl = intel_uncore_read(uncore, GEN6_RP_CONTROL); 2232 + rpinclimit = intel_uncore_read(uncore, GEN6_RP_UP_THRESHOLD); 2233 + rpdeclimit = intel_uncore_read(uncore, GEN6_RP_DOWN_THRESHOLD); 2234 + 2235 + rpstat = intel_uncore_read(uncore, GEN6_RPSTAT1); 2236 + rpcurupei = intel_uncore_read(uncore, GEN6_RP_CUR_UP_EI) & GEN6_CURICONT_MASK; 2237 + rpcurup = intel_uncore_read(uncore, GEN6_RP_CUR_UP) & GEN6_CURBSYTAVG_MASK; 2238 + rpprevup = intel_uncore_read(uncore, GEN6_RP_PREV_UP) & GEN6_CURBSYTAVG_MASK; 2239 + rpcurdownei = intel_uncore_read(uncore, GEN6_RP_CUR_DOWN_EI) & GEN6_CURIAVG_MASK; 2240 + rpcurdown = intel_uncore_read(uncore, GEN6_RP_CUR_DOWN) & GEN6_CURBSYTAVG_MASK; 2241 + rpprevdown = intel_uncore_read(uncore, GEN6_RP_PREV_DOWN) & GEN6_CURBSYTAVG_MASK; 2242 + 2243 + rpupei = intel_uncore_read(uncore, GEN6_RP_UP_EI); 2244 + rpupt = intel_uncore_read(uncore, GEN6_RP_UP_THRESHOLD); 2245 + 2246 + rpdownei = intel_uncore_read(uncore, GEN6_RP_DOWN_EI); 2247 + rpdownt = intel_uncore_read(uncore, GEN6_RP_DOWN_THRESHOLD); 2248 + 2249 + cagf = intel_rps_read_actual_frequency(rps); 2250 + 2251 + intel_uncore_forcewake_put(uncore, 
FORCEWAKE_ALL); 2252 + 2253 + if (GRAPHICS_VER(i915) >= 11) { 2254 + pm_ier = intel_uncore_read(uncore, GEN11_GPM_WGBOXPERF_INTR_ENABLE); 2255 + pm_imr = intel_uncore_read(uncore, GEN11_GPM_WGBOXPERF_INTR_MASK); 2256 + /* 2257 + * The equivalent to the PM ISR & IIR cannot be read 2258 + * without affecting the current state of the system 2259 + */ 2260 + pm_isr = 0; 2261 + pm_iir = 0; 2262 + } else if (GRAPHICS_VER(i915) >= 8) { 2263 + pm_ier = intel_uncore_read(uncore, GEN8_GT_IER(2)); 2264 + pm_imr = intel_uncore_read(uncore, GEN8_GT_IMR(2)); 2265 + pm_isr = intel_uncore_read(uncore, GEN8_GT_ISR(2)); 2266 + pm_iir = intel_uncore_read(uncore, GEN8_GT_IIR(2)); 2267 + } else { 2268 + pm_ier = intel_uncore_read(uncore, GEN6_PMIER); 2269 + pm_imr = intel_uncore_read(uncore, GEN6_PMIMR); 2270 + pm_isr = intel_uncore_read(uncore, GEN6_PMISR); 2271 + pm_iir = intel_uncore_read(uncore, GEN6_PMIIR); 2272 + } 2273 + pm_mask = intel_uncore_read(uncore, GEN6_PMINTRMSK); 2274 + 2275 + drm_printf(p, "Video Turbo Mode: %s\n", 2276 + str_yes_no(rpmodectl & GEN6_RP_MEDIA_TURBO)); 2277 + drm_printf(p, "HW control enabled: %s\n", 2278 + str_yes_no(rpmodectl & GEN6_RP_ENABLE)); 2279 + drm_printf(p, "SW control enabled: %s\n", 2280 + str_yes_no((rpmodectl & GEN6_RP_MEDIA_MODE_MASK) == GEN6_RP_MEDIA_SW_MODE)); 2281 + 2282 + drm_printf(p, "PM IER=0x%08x IMR=0x%08x, MASK=0x%08x\n", 2283 + pm_ier, pm_imr, pm_mask); 2284 + if (GRAPHICS_VER(i915) <= 10) 2285 + drm_printf(p, "PM ISR=0x%08x IIR=0x%08x\n", 2286 + pm_isr, pm_iir); 2287 + drm_printf(p, "pm_intrmsk_mbz: 0x%08x\n", 2288 + rps->pm_intrmsk_mbz); 2289 + drm_printf(p, "GT_PERF_STATUS: 0x%08x\n", gt_perf_status); 2290 + drm_printf(p, "Render p-state ratio: %d\n", 2291 + (gt_perf_status & (GRAPHICS_VER(i915) >= 9 ? 
0x1ff00 : 0xff00)) >> 8); 2292 + drm_printf(p, "Render p-state VID: %d\n", 2293 + gt_perf_status & 0xff); 2294 + drm_printf(p, "Render p-state limit: %d\n", 2295 + rp_state_limits & 0xff); 2296 + drm_printf(p, "RPSTAT1: 0x%08x\n", rpstat); 2297 + drm_printf(p, "RPMODECTL: 0x%08x\n", rpmodectl); 2298 + drm_printf(p, "RPINCLIMIT: 0x%08x\n", rpinclimit); 2299 + drm_printf(p, "RPDECLIMIT: 0x%08x\n", rpdeclimit); 2300 + drm_printf(p, "RPNSWREQ: %dMHz\n", reqf); 2301 + drm_printf(p, "CAGF: %dMHz\n", cagf); 2302 + drm_printf(p, "RP CUR UP EI: %d (%lldns)\n", 2303 + rpcurupei, 2304 + intel_gt_pm_interval_to_ns(gt, rpcurupei)); 2305 + drm_printf(p, "RP CUR UP: %d (%lldns)\n", 2306 + rpcurup, intel_gt_pm_interval_to_ns(gt, rpcurup)); 2307 + drm_printf(p, "RP PREV UP: %d (%lldns)\n", 2308 + rpprevup, intel_gt_pm_interval_to_ns(gt, rpprevup)); 2309 + drm_printf(p, "Up threshold: %d%%\n", 2310 + rps->power.up_threshold); 2311 + drm_printf(p, "RP UP EI: %d (%lldns)\n", 2312 + rpupei, intel_gt_pm_interval_to_ns(gt, rpupei)); 2313 + drm_printf(p, "RP UP THRESHOLD: %d (%lldns)\n", 2314 + rpupt, intel_gt_pm_interval_to_ns(gt, rpupt)); 2315 + 2316 + drm_printf(p, "RP CUR DOWN EI: %d (%lldns)\n", 2317 + rpcurdownei, 2318 + intel_gt_pm_interval_to_ns(gt, rpcurdownei)); 2319 + drm_printf(p, "RP CUR DOWN: %d (%lldns)\n", 2320 + rpcurdown, 2321 + intel_gt_pm_interval_to_ns(gt, rpcurdown)); 2322 + drm_printf(p, "RP PREV DOWN: %d (%lldns)\n", 2323 + rpprevdown, 2324 + intel_gt_pm_interval_to_ns(gt, rpprevdown)); 2325 + drm_printf(p, "Down threshold: %d%%\n", 2326 + rps->power.down_threshold); 2327 + drm_printf(p, "RP DOWN EI: %d (%lldns)\n", 2328 + rpdownei, intel_gt_pm_interval_to_ns(gt, rpdownei)); 2329 + drm_printf(p, "RP DOWN THRESHOLD: %d (%lldns)\n", 2330 + rpdownt, intel_gt_pm_interval_to_ns(gt, rpdownt)); 2331 + 2332 + drm_printf(p, "Lowest (RPN) frequency: %dMHz\n", 2333 + intel_gpu_freq(rps, caps.min_freq)); 2334 + drm_printf(p, "Nominal (RP1) frequency: %dMHz\n", 2335 + 
intel_gpu_freq(rps, caps.rp1_freq)); 2336 + drm_printf(p, "Max non-overclocked (RP0) frequency: %dMHz\n", 2337 + intel_gpu_freq(rps, caps.rp0_freq)); 2338 + drm_printf(p, "Max overclocked frequency: %dMHz\n", 2339 + intel_gpu_freq(rps, rps->max_freq)); 2340 + 2341 + drm_printf(p, "Current freq: %d MHz\n", 2342 + intel_gpu_freq(rps, rps->cur_freq)); 2343 + drm_printf(p, "Actual freq: %d MHz\n", cagf); 2344 + drm_printf(p, "Idle freq: %d MHz\n", 2345 + intel_gpu_freq(rps, rps->idle_freq)); 2346 + drm_printf(p, "Min freq: %d MHz\n", 2347 + intel_gpu_freq(rps, rps->min_freq)); 2348 + drm_printf(p, "Boost freq: %d MHz\n", 2349 + intel_gpu_freq(rps, rps->boost_freq)); 2350 + drm_printf(p, "Max freq: %d MHz\n", 2351 + intel_gpu_freq(rps, rps->max_freq)); 2352 + drm_printf(p, 2353 + "efficient (RPe) frequency: %d MHz\n", 2354 + intel_gpu_freq(rps, rps->efficient_freq)); 2355 + } 2356 + 2357 + static void slpc_frequency_dump(struct intel_rps *rps, struct drm_printer *p) 2358 + { 2359 + struct intel_gt *gt = rps_to_gt(rps); 2360 + struct intel_uncore *uncore = gt->uncore; 2361 + struct intel_rps_freq_caps caps; 2362 + u32 pm_mask; 2363 + 2364 + gen6_rps_get_freq_caps(rps, &caps); 2365 + pm_mask = intel_uncore_read(uncore, GEN6_PMINTRMSK); 2366 + 2367 + drm_printf(p, "PM MASK=0x%08x\n", pm_mask); 2368 + drm_printf(p, "pm_intrmsk_mbz: 0x%08x\n", 2369 + rps->pm_intrmsk_mbz); 2370 + drm_printf(p, "RPSTAT1: 0x%08x\n", intel_uncore_read(uncore, GEN6_RPSTAT1)); 2371 + drm_printf(p, "RPNSWREQ: %dMHz\n", intel_rps_get_requested_frequency(rps)); 2372 + drm_printf(p, "Lowest (RPN) frequency: %dMHz\n", 2373 + intel_gpu_freq(rps, caps.min_freq)); 2374 + drm_printf(p, "Nominal (RP1) frequency: %dMHz\n", 2375 + intel_gpu_freq(rps, caps.rp1_freq)); 2376 + drm_printf(p, "Max non-overclocked (RP0) frequency: %dMHz\n", 2377 + intel_gpu_freq(rps, caps.rp0_freq)); 2378 + drm_printf(p, "Current freq: %d MHz\n", 2379 + intel_rps_get_requested_frequency(rps)); 2380 + drm_printf(p, "Actual freq: %d 
MHz\n", 2381 + intel_rps_read_actual_frequency(rps)); 2382 + drm_printf(p, "Min freq: %d MHz\n", 2383 + intel_rps_get_min_frequency(rps)); 2384 + drm_printf(p, "Boost freq: %d MHz\n", 2385 + intel_rps_get_boost_frequency(rps)); 2386 + drm_printf(p, "Max freq: %d MHz\n", 2387 + intel_rps_get_max_frequency(rps)); 2388 + drm_printf(p, 2389 + "efficient (RPe) frequency: %d MHz\n", 2390 + intel_gpu_freq(rps, caps.rp1_freq)); 2391 + } 2392 + 2393 + void gen6_rps_frequency_dump(struct intel_rps *rps, struct drm_printer *p) 2394 + { 2395 + if (rps_uses_slpc(rps)) 2396 + return slpc_frequency_dump(rps, p); 2397 + else 2398 + return rps_frequency_dump(rps, p); 2224 2399 } 2225 2400 2226 2401 static int set_max_freq(struct intel_rps *rps, u32 val)
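The "MTL values are in units of 16.67 MHz" comment in `mtl_get_freq_caps()` implies the same integer conversion scheme i915 uses for Gen9+ elsewhere: 16.67 MHz is 50/3 MHz, so multiplying by 50 and dividing by 3 avoids floating point (compare `GT_FREQUENCY_MULTIPLIER` / `GEN9_FREQ_SCALER` in the driver). A sketch, with made-up cap values:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the unit conversion implied by the comment in
 * mtl_get_freq_caps(): register caps are in 16.67 MHz (= 50/3 MHz)
 * units, so MHz = val * 50 / 3 in integer maths. The example register
 * values below are hypothetical, not real MTL caps.
 */
static uint32_t units_16p67_to_mhz(uint32_t val)
{
	return val * 50 / 3;
}
```

So a hypothetical RP0 cap of 129 units would report as 2150 MHz through `intel_gpu_freq()`.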
+3
drivers/gpu/drm/i915/gt/intel_rps.h
··· 10 10 #include "i915_reg_defs.h" 11 11 12 12 struct i915_request; 13 + struct drm_printer; 13 14 14 15 void intel_rps_init_early(struct intel_rps *rps); 15 16 void intel_rps_init(struct intel_rps *rps); ··· 54 53 55 54 u32 intel_rps_read_throttle_reason(struct intel_rps *rps); 56 55 bool rps_read_mask_mmio(struct intel_rps *rps, i915_reg_t reg32, u32 mask); 56 + 57 + void gen6_rps_frequency_dump(struct intel_rps *rps, struct drm_printer *p); 57 58 58 59 void gen5_rps_irq_handler(struct intel_rps *rps); 59 60 void gen6_rps_irq_handler(struct intel_rps *rps, u32 pm_iir);
+2 -2
drivers/gpu/drm/i915/gt/intel_sseu.c
··· 677 677 * If i915/perf is active, we want a stable powergating configuration 678 678 * on the system. Use the configuration pinned by i915/perf. 679 679 */ 680 - if (i915->perf.exclusive_stream) 681 - req_sseu = &i915->perf.sseu; 680 + if (gt->perf.exclusive_stream) 681 + req_sseu = &gt->perf.sseu; 682 682 683 683 slices = hweight8(req_sseu->slice_mask); 684 684 subslices = hweight8(req_sseu->subslice_mask);
+388 -189
drivers/gpu/drm/i915/gt/intel_workarounds.c
··· 166 166 _wa_add(wal, &wa); 167 167 } 168 168 169 + static void wa_mcr_add(struct i915_wa_list *wal, i915_mcr_reg_t reg, 170 + u32 clear, u32 set, u32 read_mask, bool masked_reg) 171 + { 172 + struct i915_wa wa = { 173 + .mcr_reg = reg, 174 + .clr = clear, 175 + .set = set, 176 + .read = read_mask, 177 + .masked_reg = masked_reg, 178 + .is_mcr = 1, 179 + }; 180 + 181 + _wa_add(wal, &wa); 182 + } 183 + 169 184 static void 170 185 wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set) 171 186 { 172 187 wa_add(wal, reg, clear, set, clear, false); 188 + } 189 + 190 + static void 191 + wa_mcr_write_clr_set(struct i915_wa_list *wal, i915_mcr_reg_t reg, u32 clear, u32 set) 192 + { 193 + wa_mcr_add(wal, reg, clear, set, clear, false); 173 194 } 174 195 175 196 static void ··· 206 185 } 207 186 208 187 static void 188 + wa_mcr_write_or(struct i915_wa_list *wal, i915_mcr_reg_t reg, u32 set) 189 + { 190 + wa_mcr_write_clr_set(wal, reg, set, set); 191 + } 192 + 193 + static void 209 194 wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr) 210 195 { 211 196 wa_write_clr_set(wal, reg, clr, 0); 197 + } 198 + 199 + static void 200 + wa_mcr_write_clr(struct i915_wa_list *wal, i915_mcr_reg_t reg, u32 clr) 201 + { 202 + wa_mcr_write_clr_set(wal, reg, clr, 0); 212 203 } 213 204 214 205 /* ··· 241 208 } 242 209 243 210 static void 211 + wa_mcr_masked_en(struct i915_wa_list *wal, i915_mcr_reg_t reg, u32 val) 212 + { 213 + wa_mcr_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val, true); 214 + } 215 + 216 + static void 244 217 wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val) 245 218 { 246 219 wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true); 220 + } 221 + 222 + static void 223 + wa_mcr_masked_dis(struct i915_wa_list *wal, i915_mcr_reg_t reg, u32 val) 224 + { 225 + wa_mcr_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true); 247 226 } 248 227 249 228 static void ··· 263 218 u32 mask, u32 val) 264 219 { 265 220 wa_add(wal, reg, 0, 
_MASKED_FIELD(mask, val), mask, true); 221 + } 222 + 223 + static void 224 + wa_mcr_masked_field_set(struct i915_wa_list *wal, i915_mcr_reg_t reg, 225 + u32 mask, u32 val) 226 + { 227 + wa_mcr_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask, true); 266 228 } 267 229 268 230 static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine, ··· 293 241 wa_masked_en(wal, RING_MI_MODE(RENDER_RING_BASE), ASYNC_FLIP_PERF_DISABLE); 294 242 295 243 /* WaDisablePartialInstShootdown:bdw,chv */ 296 - wa_masked_en(wal, GEN8_ROW_CHICKEN, 297 - PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE); 244 + wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN, 245 + PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE); 298 246 299 247 /* Use Force Non-Coherent whenever executing a 3D context. This is a 300 248 * workaround for a possible hang in the unlikely event a TLB ··· 340 288 gen8_ctx_workarounds_init(engine, wal); 341 289 342 290 /* WaDisableThreadStallDopClockGating:bdw (pre-production) */ 343 - wa_masked_en(wal, GEN8_ROW_CHICKEN, STALL_DOP_GATING_DISABLE); 291 + wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN, STALL_DOP_GATING_DISABLE); 344 292 345 293 /* WaDisableDopClockGating:bdw 346 294 * 347 295 * Also see the related UCGTCL1 write in bdw_init_clock_gating() 348 296 * to disable EUTC clock gating. 
349 297 */ 350 - wa_masked_en(wal, GEN7_ROW_CHICKEN2, 351 - DOP_CLOCK_GATING_DISABLE); 298 + wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN2, 299 + DOP_CLOCK_GATING_DISABLE); 352 300 353 - wa_masked_en(wal, HALF_SLICE_CHICKEN3, 354 - GEN8_SAMPLER_POWER_BYPASS_DIS); 301 + wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN3, 302 + GEN8_SAMPLER_POWER_BYPASS_DIS); 355 303 356 304 wa_masked_en(wal, HDC_CHICKEN0, 357 305 /* WaForceContextSaveRestoreNonCoherent:bdw */ ··· 366 314 gen8_ctx_workarounds_init(engine, wal); 367 315 368 316 /* WaDisableThreadStallDopClockGating:chv */ 369 - wa_masked_en(wal, GEN8_ROW_CHICKEN, STALL_DOP_GATING_DISABLE); 317 + wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN, STALL_DOP_GATING_DISABLE); 370 318 371 319 /* Improve HiZ throughput on CHV. */ 372 320 wa_masked_en(wal, HIZ_CHICKEN, CHV_HZ_8X8_MODE_IN_1X); ··· 385 333 */ 386 334 wa_masked_en(wal, COMMON_SLICE_CHICKEN2, 387 335 GEN9_PBE_COMPRESSED_HASH_SELECTION); 388 - wa_masked_en(wal, GEN9_HALF_SLICE_CHICKEN7, 389 - GEN9_SAMPLER_HASH_COMPRESSED_READ_ADDR); 336 + wa_mcr_masked_en(wal, GEN9_HALF_SLICE_CHICKEN7, 337 + GEN9_SAMPLER_HASH_COMPRESSED_READ_ADDR); 390 338 } 391 339 392 340 /* WaClearFlowControlGpgpuContextSave:skl,bxt,kbl,glk,cfl */ 393 341 /* WaDisablePartialInstShootdown:skl,bxt,kbl,glk,cfl */ 394 - wa_masked_en(wal, GEN8_ROW_CHICKEN, 395 - FLOW_CONTROL_ENABLE | 396 - PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE); 342 + wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN, 343 + FLOW_CONTROL_ENABLE | 344 + PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE); 397 345 398 346 /* WaEnableYV12BugFixInHalfSliceChicken7:skl,bxt,kbl,glk,cfl */ 399 347 /* WaEnableSamplerGPGPUPreemptionSupport:skl,bxt,kbl,cfl */ 400 - wa_masked_en(wal, GEN9_HALF_SLICE_CHICKEN7, 401 - GEN9_ENABLE_YV12_BUGFIX | 402 - GEN9_ENABLE_GPGPU_PREEMPTION); 348 + wa_mcr_masked_en(wal, GEN9_HALF_SLICE_CHICKEN7, 349 + GEN9_ENABLE_YV12_BUGFIX | 350 + GEN9_ENABLE_GPGPU_PREEMPTION); 403 351 404 352 /* Wa4x4STCOptimizationDisable:skl,bxt,kbl,glk,cfl */ 405 353 /* 
WaDisablePartialResolveInVc:skl,bxt,kbl,cfl */ ··· 408 356 GEN9_PARTIAL_RESOLVE_IN_VC_DISABLE); 409 357 410 358 /* WaCcsTlbPrefetchDisable:skl,bxt,kbl,glk,cfl */ 411 - wa_masked_dis(wal, GEN9_HALF_SLICE_CHICKEN5, 412 - GEN9_CCS_TLB_PREFETCH_ENABLE); 359 + wa_mcr_masked_dis(wal, GEN9_HALF_SLICE_CHICKEN5, 360 + GEN9_CCS_TLB_PREFETCH_ENABLE); 413 361 414 362 /* WaForceContextSaveRestoreNonCoherent:skl,bxt,kbl,cfl */ 415 363 wa_masked_en(wal, HDC_CHICKEN0, ··· 438 386 IS_KABYLAKE(i915) || 439 387 IS_COFFEELAKE(i915) || 440 388 IS_COMETLAKE(i915)) 441 - wa_masked_en(wal, HALF_SLICE_CHICKEN3, 442 - GEN8_SAMPLER_POWER_BYPASS_DIS); 389 + wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN3, 390 + GEN8_SAMPLER_POWER_BYPASS_DIS); 443 391 444 392 /* WaDisableSTUnitPowerOptimization:skl,bxt,kbl,glk,cfl */ 445 - wa_masked_en(wal, HALF_SLICE_CHICKEN2, GEN8_ST_PO_DISABLE); 393 + wa_mcr_masked_en(wal, HALF_SLICE_CHICKEN2, GEN8_ST_PO_DISABLE); 446 394 447 395 /* 448 396 * Supporting preemption with fine-granularity requires changes in the ··· 521 469 gen9_ctx_workarounds_init(engine, wal); 522 470 523 471 /* WaDisableThreadStallDopClockGating:bxt */ 524 - wa_masked_en(wal, GEN8_ROW_CHICKEN, 525 - STALL_DOP_GATING_DISABLE); 472 + wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN, 473 + STALL_DOP_GATING_DISABLE); 526 474 527 475 /* WaToEnableHwFixForPushConstHWBug:bxt */ 528 476 wa_masked_en(wal, COMMON_SLICE_CHICKEN2, ··· 542 490 GEN8_SBE_DISABLE_REPLAY_BUF_OPTIMIZATION); 543 491 544 492 /* WaDisableSbeCacheDispatchPortSharing:kbl */ 545 - wa_masked_en(wal, GEN7_HALF_SLICE_CHICKEN1, 546 - GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE); 493 + wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN1, 494 + GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE); 547 495 } 548 496 549 497 static void glk_ctx_workarounds_init(struct intel_engine_cs *engine, ··· 566 514 GEN8_SBE_DISABLE_REPLAY_BUF_OPTIMIZATION); 567 515 568 516 /* WaDisableSbeCacheDispatchPortSharing:cfl */ 569 - wa_masked_en(wal, 
GEN7_HALF_SLICE_CHICKEN1, 570 - GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE); 517 + wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN1, 518 + GEN7_SBE_SS_CACHE_DISPATCH_PORT_SHARING_DISABLE); 571 519 } 572 520 573 521 static void icl_ctx_workarounds_init(struct intel_engine_cs *engine, ··· 586 534 * (the register is whitelisted in hardware now, so UMDs can opt in 587 535 * for coherency if they have a good reason). 588 536 */ 589 - wa_masked_en(wal, ICL_HDC_MODE, HDC_FORCE_NON_COHERENT); 537 + wa_mcr_masked_en(wal, ICL_HDC_MODE, HDC_FORCE_NON_COHERENT); 590 538 591 539 /* WaEnableFloatBlendOptimization:icl */ 592 - wa_add(wal, GEN10_CACHE_MODE_SS, 0, 593 - _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE), 594 - 0 /* write-only, so skip validation */, 595 - true); 540 + wa_mcr_add(wal, GEN10_CACHE_MODE_SS, 0, 541 + _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE), 542 + 0 /* write-only, so skip validation */, 543 + true); 596 544 597 545 /* WaDisableGPGPUMidThreadPreemption:icl */ 598 546 wa_masked_field_set(wal, GEN8_CS_CHICKEN1, ··· 600 548 GEN9_PREEMPT_GPGPU_THREAD_GROUP_LEVEL); 601 549 602 550 /* allow headerless messages for preemptible GPGPU context */ 603 - wa_masked_en(wal, GEN10_SAMPLER_MODE, 604 - GEN11_SAMPLER_ENABLE_HEADLESS_MSG); 551 + wa_mcr_masked_en(wal, GEN10_SAMPLER_MODE, 552 + GEN11_SAMPLER_ENABLE_HEADLESS_MSG); 605 553 606 554 /* Wa_1604278689:icl,ehl */ 607 555 wa_write(wal, IVB_FBC_RT_BASE, 0xFFFFFFFF & ~ILK_FBC_RT_VALID); ··· 610 558 0xFFFFFFFF); 611 559 612 560 /* Wa_1406306137:icl,ehl */ 613 - wa_masked_en(wal, GEN9_ROW_CHICKEN4, GEN11_DIS_PICK_2ND_EU); 561 + wa_mcr_masked_en(wal, GEN9_ROW_CHICKEN4, GEN11_DIS_PICK_2ND_EU); 614 562 } 615 563 616 564 /* ··· 621 569 struct i915_wa_list *wal) 622 570 { 623 571 wa_masked_en(wal, CHICKEN_RASTER_2, TBIMR_FAST_CLIP); 624 - wa_write_clr_set(wal, GEN11_L3SQCREG5, L3_PWM_TIMER_INIT_VAL_MASK, 625 - REG_FIELD_PREP(L3_PWM_TIMER_INIT_VAL_MASK, 0x7f)); 626 - wa_add(wal, 627 - FF_MODE2, 628 - 
FF_MODE2_TDS_TIMER_MASK, 629 - FF_MODE2_TDS_TIMER_128, 630 - 0, false); 572 + wa_mcr_write_clr_set(wal, XEHP_L3SQCREG5, L3_PWM_TIMER_INIT_VAL_MASK, 573 + REG_FIELD_PREP(L3_PWM_TIMER_INIT_VAL_MASK, 0x7f)); 574 + wa_mcr_add(wal, 575 + XEHP_FF_MODE2, 576 + FF_MODE2_TDS_TIMER_MASK, 577 + FF_MODE2_TDS_TIMER_128, 578 + 0, false); 631 579 } 632 580 633 581 /* ··· 651 599 * verification is ignored. 652 600 */ 653 601 wa_add(wal, 654 - FF_MODE2, 602 + GEN12_FF_MODE2, 655 603 FF_MODE2_TDS_TIMER_MASK, 656 604 FF_MODE2_TDS_TIMER_128, 657 605 0, false); ··· 660 608 static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine, 661 609 struct i915_wa_list *wal) 662 610 { 611 + struct drm_i915_private *i915 = engine->i915; 612 + 663 613 gen12_ctx_gt_tuning_init(engine, wal); 664 614 665 615 /* ··· 691 637 * to Wa_1608008084. 692 638 */ 693 639 wa_add(wal, 694 - FF_MODE2, 640 + GEN12_FF_MODE2, 695 641 FF_MODE2_GS_TIMER_MASK, 696 642 FF_MODE2_GS_TIMER_224, 697 643 0, false); 644 + 645 + if (!IS_DG1(i915)) 646 + /* Wa_1806527549 */ 647 + wa_masked_en(wal, HIZ_CHICKEN, HZ_DEPTH_TEST_LE_GE_OPT_DISABLE); 698 648 } 699 649 700 650 static void dg1_ctx_workarounds_init(struct intel_engine_cs *engine, ··· 722 664 723 665 /* Wa_16011186671:dg2_g11 */ 724 666 if (IS_DG2_GRAPHICS_STEP(engine->i915, G11, STEP_A0, STEP_B0)) { 725 - wa_masked_dis(wal, VFLSKPD, DIS_MULT_MISS_RD_SQUASH); 726 - wa_masked_en(wal, VFLSKPD, DIS_OVER_FETCH_CACHE); 667 + wa_mcr_masked_dis(wal, VFLSKPD, DIS_MULT_MISS_RD_SQUASH); 668 + wa_mcr_masked_en(wal, VFLSKPD, DIS_OVER_FETCH_CACHE); 727 669 } 728 670 729 671 if (IS_DG2_GRAPHICS_STEP(engine->i915, G10, STEP_A0, STEP_B0)) { 730 672 /* Wa_14010469329:dg2_g10 */ 731 - wa_masked_en(wal, GEN11_COMMON_SLICE_CHICKEN3, 732 - XEHP_DUAL_SIMD8_SEQ_MERGE_DISABLE); 673 + wa_mcr_masked_en(wal, XEHP_COMMON_SLICE_CHICKEN3, 674 + XEHP_DUAL_SIMD8_SEQ_MERGE_DISABLE); 733 675 734 676 /* 735 677 * Wa_22010465075:dg2_g10 736 678 * Wa_22010613112:dg2_g10 737 679 * 
Wa_14010698770:dg2_g10 738 680 */ 739 - wa_masked_en(wal, GEN11_COMMON_SLICE_CHICKEN3, 740 - GEN12_DISABLE_CPS_AWARE_COLOR_PIPE); 681 + wa_mcr_masked_en(wal, XEHP_COMMON_SLICE_CHICKEN3, 682 + GEN12_DISABLE_CPS_AWARE_COLOR_PIPE); 741 683 } 742 684 743 685 /* Wa_16013271637:dg2 */ 744 - wa_masked_en(wal, SLICE_COMMON_ECO_CHICKEN1, 745 - MSC_MSAA_REODER_BUF_BYPASS_DISABLE); 686 + wa_mcr_masked_en(wal, XEHP_SLICE_COMMON_ECO_CHICKEN1, 687 + MSC_MSAA_REODER_BUF_BYPASS_DISABLE); 746 688 747 689 /* Wa_14014947963:dg2 */ 748 690 if (IS_DG2_GRAPHICS_STEP(engine->i915, G10, STEP_B0, STEP_FOREVER) || ··· 1134 1076 wa_write_clr_set(wal, steering_reg, mcr_mask, mcr); 1135 1077 } 1136 1078 1137 - static void __add_mcr_wa(struct intel_gt *gt, struct i915_wa_list *wal, 1138 - unsigned int slice, unsigned int subslice) 1079 + static void debug_dump_steering(struct intel_gt *gt) 1139 1080 { 1140 1081 struct drm_printer p = drm_debug_printer("MCR Steering:"); 1141 1082 1083 + if (drm_debug_enabled(DRM_UT_DRIVER)) 1084 + intel_gt_mcr_report_steering(&p, gt, false); 1085 + } 1086 + 1087 + static void __add_mcr_wa(struct intel_gt *gt, struct i915_wa_list *wal, 1088 + unsigned int slice, unsigned int subslice) 1089 + { 1142 1090 __set_mcr_steering(wal, GEN8_MCR_SELECTOR, slice, subslice); 1143 1091 1144 1092 gt->default_steering.groupid = slice; 1145 1093 gt->default_steering.instanceid = subslice; 1146 1094 1147 - if (drm_debug_enabled(DRM_UT_DRIVER)) 1148 - intel_gt_mcr_report_steering(&p, gt, false); 1095 + debug_dump_steering(gt); 1149 1096 } 1150 1097 1151 1098 static void ··· 1244 1181 gt->steering_table[MSLICE] = NULL; 1245 1182 } 1246 1183 1184 + if (IS_XEHPSDV(gt->i915) && slice_mask & BIT(0)) 1185 + gt->steering_table[GAM] = NULL; 1186 + 1247 1187 slice = __ffs(slice_mask); 1248 1188 subslice = intel_sseu_find_first_xehp_dss(sseu, GEN_DSS_PER_GSLICE, slice) % 1249 1189 GEN_DSS_PER_GSLICE; ··· 1264 1198 */ 1265 1199 __set_mcr_steering(wal, MCFG_MCR_SELECTOR, 0, 2); 1266 1200 
__set_mcr_steering(wal, SF_MCR_SELECTOR, 0, 2); 1201 + 1202 + /* 1203 + * On DG2, GAM registers have a dedicated steering control register 1204 + * and must always be programmed to a hardcoded groupid of "1." 1205 + */ 1206 + if (IS_DG2(gt->i915)) 1207 + __set_mcr_steering(wal, GAM_MCR_SELECTOR, 1, 0); 1267 1208 } 1268 1209 1269 1210 static void ··· 1327 1254 PSDUNIT_CLKGATE_DIS); 1328 1255 1329 1256 /* Wa_1406680159:icl,ehl */ 1330 - wa_write_or(wal, 1331 - SUBSLICE_UNIT_LEVEL_CLKGATE, 1332 - GWUNIT_CLKGATE_DIS); 1257 + wa_mcr_write_or(wal, 1258 + GEN11_SUBSLICE_UNIT_LEVEL_CLKGATE, 1259 + GWUNIT_CLKGATE_DIS); 1333 1260 1334 1261 /* Wa_1607087056:icl,ehl,jsl */ 1335 1262 if (IS_ICELAKE(i915) || 1336 1263 IS_JSL_EHL_GRAPHICS_STEP(i915, STEP_A0, STEP_B0)) 1337 1264 wa_write_or(wal, 1338 - SLICE_UNIT_LEVEL_CLKGATE, 1265 + GEN11_SLICE_UNIT_LEVEL_CLKGATE, 1339 1266 L3_CLKGATE_DIS | L3_CR2X_CLKGATE_DIS); 1340 1267 1341 1268 /* 1342 1269 * This is not a documented workaround, but rather an optimization 1343 1270 * to reduce sampler power. 
1344 1271 */ 1345 - wa_write_clr(wal, GEN10_DFR_RATIO_EN_AND_CHICKEN, DFR_DISABLE); 1272 + wa_mcr_write_clr(wal, GEN10_DFR_RATIO_EN_AND_CHICKEN, DFR_DISABLE); 1346 1273 } 1347 1274 1348 1275 /* ··· 1376 1303 wa_14011060649(gt, wal); 1377 1304 1378 1305 /* Wa_14011059788:tgl,rkl,adl-s,dg1,adl-p */ 1379 - wa_write_or(wal, GEN10_DFR_RATIO_EN_AND_CHICKEN, DFR_DISABLE); 1306 + wa_mcr_write_or(wal, GEN10_DFR_RATIO_EN_AND_CHICKEN, DFR_DISABLE); 1380 1307 } 1381 1308 1382 1309 static void ··· 1388 1315 1389 1316 /* Wa_1409420604:tgl */ 1390 1317 if (IS_TGL_UY_GRAPHICS_STEP(i915, STEP_A0, STEP_B0)) 1391 - wa_write_or(wal, 1392 - SUBSLICE_UNIT_LEVEL_CLKGATE2, 1393 - CPSSUNIT_CLKGATE_DIS); 1318 + wa_mcr_write_or(wal, 1319 + SUBSLICE_UNIT_LEVEL_CLKGATE2, 1320 + CPSSUNIT_CLKGATE_DIS); 1394 1321 1395 1322 /* Wa_1607087056:tgl also known as BUG:1409180338 */ 1396 1323 if (IS_TGL_UY_GRAPHICS_STEP(i915, STEP_A0, STEP_B0)) 1397 1324 wa_write_or(wal, 1398 - SLICE_UNIT_LEVEL_CLKGATE, 1325 + GEN11_SLICE_UNIT_LEVEL_CLKGATE, 1399 1326 L3_CLKGATE_DIS | L3_CR2X_CLKGATE_DIS); 1400 1327 1401 1328 /* Wa_1408615072:tgl[a0] */ ··· 1414 1341 /* Wa_1607087056:dg1 */ 1415 1342 if (IS_DG1_GRAPHICS_STEP(i915, STEP_A0, STEP_B0)) 1416 1343 wa_write_or(wal, 1417 - SLICE_UNIT_LEVEL_CLKGATE, 1344 + GEN11_SLICE_UNIT_LEVEL_CLKGATE, 1418 1345 L3_CLKGATE_DIS | L3_CR2X_CLKGATE_DIS); 1419 1346 1420 1347 /* Wa_1409420604:dg1 */ 1421 1348 if (IS_DG1(i915)) 1422 - wa_write_or(wal, 1423 - SUBSLICE_UNIT_LEVEL_CLKGATE2, 1424 - CPSSUNIT_CLKGATE_DIS); 1349 + wa_mcr_write_or(wal, 1350 + SUBSLICE_UNIT_LEVEL_CLKGATE2, 1351 + CPSSUNIT_CLKGATE_DIS); 1425 1352 1426 1353 /* Wa_1408615072:dg1 */ 1427 1354 /* Empirical testing shows this register is unaffected by engine reset. 
*/ ··· 1438 1365 xehp_init_mcr(gt, wal); 1439 1366 1440 1367 /* Wa_1409757795:xehpsdv */ 1441 - wa_write_or(wal, SCCGCTL94DC, CG3DDISURB); 1368 + wa_mcr_write_or(wal, SCCGCTL94DC, CG3DDISURB); 1442 1369 1443 1370 /* Wa_16011155590:xehpsdv */ 1444 1371 if (IS_XEHPSDV_GRAPHICS_STEP(i915, STEP_A0, STEP_B0)) ··· 1518 1445 CG3DDISCFEG_CLKGATE_DIS); 1519 1446 1520 1447 /* Wa_14011006942:dg2 */ 1521 - wa_write_or(wal, SUBSLICE_UNIT_LEVEL_CLKGATE, 1522 - DSS_ROUTER_CLKGATE_DIS); 1448 + wa_mcr_write_or(wal, GEN11_SUBSLICE_UNIT_LEVEL_CLKGATE, 1449 + DSS_ROUTER_CLKGATE_DIS); 1523 1450 } 1524 1451 1525 1452 if (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_B0)) { ··· 1530 1457 wa_write_or(wal, UNSLCGCTL9444, LTCDD_CLKGATE_DIS); 1531 1458 1532 1459 /* Wa_14011371254:dg2_g10 */ 1533 - wa_write_or(wal, SLICE_UNIT_LEVEL_CLKGATE, NODEDSS_CLKGATE_DIS); 1460 + wa_mcr_write_or(wal, XEHP_SLICE_UNIT_LEVEL_CLKGATE, NODEDSS_CLKGATE_DIS); 1534 1461 1535 1462 /* Wa_14011431319:dg2_g10 */ 1536 1463 wa_write_or(wal, UNSLCGCTL9440, GAMTLBOACS_CLKGATE_DIS | ··· 1566 1493 GAMEDIA_CLKGATE_DIS); 1567 1494 1568 1495 /* Wa_14011028019:dg2_g10 */ 1569 - wa_write_or(wal, SSMCGCTL9530, RTFUNIT_CLKGATE_DIS); 1496 + wa_mcr_write_or(wal, SSMCGCTL9530, RTFUNIT_CLKGATE_DIS); 1570 1497 } 1571 1498 1572 1499 /* Wa_14014830051:dg2 */ 1573 - wa_write_clr(wal, SARB_CHICKEN1, COMP_CKN_IN); 1500 + wa_mcr_write_clr(wal, SARB_CHICKEN1, COMP_CKN_IN); 1574 1501 1575 1502 /* 1576 1503 * The following are not actually "workarounds" but rather 1577 1504 * recommended tuning settings documented in the bspec's 1578 1505 * performance guide section. 
1579 1506 */ 1580 - wa_write_or(wal, GEN12_SQCM, EN_32B_ACCESS); 1507 + wa_mcr_write_or(wal, XEHP_SQCM, EN_32B_ACCESS); 1581 1508 1582 1509 /* Wa_14015795083 */ 1583 - wa_write_clr(wal, GEN7_MISCCPCTL, GEN12_DOP_CLOCK_GATE_RENDER_ENABLE); 1510 + wa_mcr_write_clr(wal, GEN8_MISCCPCTL, GEN12_DOP_CLOCK_GATE_RENDER_ENABLE); 1584 1511 } 1585 1512 1586 1513 static void ··· 1589 1516 pvc_init_mcr(gt, wal); 1590 1517 1591 1518 /* Wa_14015795083 */ 1592 - wa_write_clr(wal, GEN7_MISCCPCTL, GEN12_DOP_CLOCK_GATE_RENDER_ENABLE); 1519 + wa_mcr_write_clr(wal, GEN8_MISCCPCTL, GEN12_DOP_CLOCK_GATE_RENDER_ENABLE); 1520 + } 1521 + 1522 + static void 1523 + xelpg_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) 1524 + { 1525 + /* FIXME: Actual workarounds will be added in future patch(es) */ 1526 + 1527 + /* 1528 + * Unlike older platforms, we no longer setup implicit steering here; 1529 + * all MCR accesses are explicitly steered. 1530 + */ 1531 + debug_dump_steering(gt); 1532 + } 1533 + 1534 + static void 1535 + xelpmp_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal) 1536 + { 1537 + /* FIXME: Actual workarounds will be added in future patch(es) */ 1538 + 1539 + debug_dump_steering(gt); 1593 1540 } 1594 1541 1595 1542 static void ··· 1617 1524 { 1618 1525 struct drm_i915_private *i915 = gt->i915; 1619 1526 1620 - if (IS_PONTEVECCHIO(i915)) 1527 + if (gt->type == GT_MEDIA) { 1528 + if (MEDIA_VER(i915) >= 13) 1529 + xelpmp_gt_workarounds_init(gt, wal); 1530 + else 1531 + MISSING_CASE(MEDIA_VER(i915)); 1532 + 1533 + return; 1534 + } 1535 + 1536 + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)) 1537 + xelpg_gt_workarounds_init(gt, wal); 1538 + else if (IS_PONTEVECCHIO(i915)) 1621 1539 pvc_gt_workarounds_init(gt, wal); 1622 1540 else if (IS_DG2(i915)) 1623 1541 dg2_gt_workarounds_init(gt, wal); ··· 1732 1628 u32 val, old = 0; 1733 1629 1734 1630 /* open-coded rmw due to steering */ 1735 - old = wa->clr ? 
intel_gt_mcr_read_any_fw(gt, wa->reg) : 0; 1631 + if (wa->clr) 1632 + old = wa->is_mcr ? 1633 + intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) : 1634 + intel_uncore_read_fw(uncore, wa->reg); 1736 1635 val = (old & ~wa->clr) | wa->set; 1737 - if (val != old || !wa->clr) 1738 - intel_uncore_write_fw(uncore, wa->reg, val); 1636 + if (val != old || !wa->clr) { 1637 + if (wa->is_mcr) 1638 + intel_gt_mcr_multicast_write_fw(gt, wa->mcr_reg, val); 1639 + else 1640 + intel_uncore_write_fw(uncore, wa->reg, val); 1641 + } 1739 1642 1740 - if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 1741 - wa_verify(wa, intel_gt_mcr_read_any_fw(gt, wa->reg), 1742 - wal->name, "application"); 1643 + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) { 1644 + u32 val = wa->is_mcr ? 1645 + intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) : 1646 + intel_uncore_read_fw(uncore, wa->reg); 1647 + 1648 + wa_verify(wa, val, wal->name, "application"); 1649 + } 1743 1650 } 1744 1651 1745 1652 intel_uncore_forcewake_put__locked(uncore, fw); ··· 1779 1664 intel_uncore_forcewake_get__locked(uncore, fw); 1780 1665 1781 1666 for (i = 0, wa = wal->list; i < wal->count; i++, wa++) 1782 - ok &= wa_verify(wa, 1783 - intel_gt_mcr_read_any_fw(gt, wa->reg), 1667 + ok &= wa_verify(wa, wa->is_mcr ? 
1668 + intel_gt_mcr_read_any_fw(gt, wa->mcr_reg) : 1669 + intel_uncore_read_fw(uncore, wa->reg), 1784 1670 wal->name, from); 1785 1671 1786 1672 intel_uncore_forcewake_put__locked(uncore, fw); ··· 1828 1712 } 1829 1713 1830 1714 static void 1715 + whitelist_mcr_reg_ext(struct i915_wa_list *wal, i915_mcr_reg_t reg, u32 flags) 1716 + { 1717 + struct i915_wa wa = { 1718 + .mcr_reg = reg, 1719 + .is_mcr = 1, 1720 + }; 1721 + 1722 + if (GEM_DEBUG_WARN_ON(wal->count >= RING_MAX_NONPRIV_SLOTS)) 1723 + return; 1724 + 1725 + if (GEM_DEBUG_WARN_ON(!is_nonpriv_flags_valid(flags))) 1726 + return; 1727 + 1728 + wa.mcr_reg.reg |= flags; 1729 + _wa_add(wal, &wa); 1730 + } 1731 + 1732 + static void 1831 1733 whitelist_reg(struct i915_wa_list *wal, i915_reg_t reg) 1832 1734 { 1833 1735 whitelist_reg_ext(wal, reg, RING_FORCE_TO_NONPRIV_ACCESS_RW); 1736 + } 1737 + 1738 + static void 1739 + whitelist_mcr_reg(struct i915_wa_list *wal, i915_mcr_reg_t reg) 1740 + { 1741 + whitelist_mcr_reg_ext(wal, reg, RING_FORCE_TO_NONPRIV_ACCESS_RW); 1834 1742 } 1835 1743 1836 1744 static void gen9_whitelist_build(struct i915_wa_list *w) ··· 1882 1742 gen9_whitelist_build(w); 1883 1743 1884 1744 /* WaDisableLSQCROPERFforOCL:skl */ 1885 - whitelist_reg(w, GEN8_L3SQCREG4); 1745 + whitelist_mcr_reg(w, GEN8_L3SQCREG4); 1886 1746 } 1887 1747 1888 1748 static void bxt_whitelist_build(struct intel_engine_cs *engine) ··· 1903 1763 gen9_whitelist_build(w); 1904 1764 1905 1765 /* WaDisableLSQCROPERFforOCL:kbl */ 1906 - whitelist_reg(w, GEN8_L3SQCREG4); 1766 + whitelist_mcr_reg(w, GEN8_L3SQCREG4); 1907 1767 } 1908 1768 1909 1769 static void glk_whitelist_build(struct intel_engine_cs *engine) ··· 1968 1828 switch (engine->class) { 1969 1829 case RENDER_CLASS: 1970 1830 /* WaAllowUMDToModifyHalfSliceChicken7:icl */ 1971 - whitelist_reg(w, GEN9_HALF_SLICE_CHICKEN7); 1831 + whitelist_mcr_reg(w, GEN9_HALF_SLICE_CHICKEN7); 1972 1832 1973 1833 /* WaAllowUMDToModifySamplerMode:icl */ 1974 - whitelist_reg(w, 
GEN10_SAMPLER_MODE); 1834 + whitelist_mcr_reg(w, GEN10_SAMPLER_MODE); 1975 1835 1976 1836 /* WaEnableStateCacheRedirectToCS:icl */ 1977 1837 whitelist_reg(w, GEN9_SLICE_COMMON_ECO_CHICKEN1); ··· 2247 2107 2248 2108 if (IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_B0)) { 2249 2109 /* Wa_14013392000:dg2_g11 */ 2250 - wa_masked_en(wal, GEN7_ROW_CHICKEN2, GEN12_ENABLE_LARGE_GRF_MODE); 2251 - 2252 - /* Wa_16011620976:dg2_g11 */ 2253 - wa_write_or(wal, LSC_CHICKEN_BIT_0_UDW, DIS_CHAIN_2XSIMD8); 2110 + wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN2, GEN12_ENABLE_LARGE_GRF_MODE); 2254 2111 } 2255 2112 2256 2113 if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_FOREVER) || 2257 2114 IS_DG2_G11(i915) || IS_DG2_G12(i915)) { 2258 2115 /* Wa_1509727124:dg2 */ 2259 - wa_masked_en(wal, GEN10_SAMPLER_MODE, 2260 - SC_DISABLE_POWER_OPTIMIZATION_EBB); 2116 + wa_mcr_masked_en(wal, GEN10_SAMPLER_MODE, 2117 + SC_DISABLE_POWER_OPTIMIZATION_EBB); 2261 2118 } 2262 2119 2263 2120 if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0) || 2264 2121 IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_B0)) { 2265 2122 /* Wa_14012419201:dg2 */ 2266 - wa_masked_en(wal, GEN9_ROW_CHICKEN4, 2267 - GEN12_DISABLE_HDR_PAST_PAYLOAD_HOLD_FIX); 2123 + wa_mcr_masked_en(wal, GEN9_ROW_CHICKEN4, 2124 + GEN12_DISABLE_HDR_PAST_PAYLOAD_HOLD_FIX); 2268 2125 } 2269 2126 2270 2127 if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_C0) || ··· 2270 2133 * Wa_22012826095:dg2 2271 2134 * Wa_22013059131:dg2 2272 2135 */ 2273 - wa_write_clr_set(wal, LSC_CHICKEN_BIT_0_UDW, 2274 - MAXREQS_PER_BANK, 2275 - REG_FIELD_PREP(MAXREQS_PER_BANK, 2)); 2136 + wa_mcr_write_clr_set(wal, LSC_CHICKEN_BIT_0_UDW, 2137 + MAXREQS_PER_BANK, 2138 + REG_FIELD_PREP(MAXREQS_PER_BANK, 2)); 2276 2139 2277 2140 /* Wa_22013059131:dg2 */ 2278 - wa_write_or(wal, LSC_CHICKEN_BIT_0, 2279 - FORCE_1_SUB_MESSAGE_PER_FRAGMENT); 2141 + wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0, 2142 + FORCE_1_SUB_MESSAGE_PER_FRAGMENT); 2280 2143 } 2281 2144 2282 2145 /* Wa_1308578152:dg2_g10 
when first gslice is fused off */ ··· 2289 2152 if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_FOREVER) || 2290 2153 IS_DG2_G11(i915) || IS_DG2_G12(i915)) { 2291 2154 /* Wa_22013037850:dg2 */ 2292 - wa_write_or(wal, LSC_CHICKEN_BIT_0_UDW, 2293 - DISABLE_128B_EVICTION_COMMAND_UDW); 2155 + wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, 2156 + DISABLE_128B_EVICTION_COMMAND_UDW); 2294 2157 2295 2158 /* Wa_22012856258:dg2 */ 2296 - wa_masked_en(wal, GEN7_ROW_CHICKEN2, 2297 - GEN12_DISABLE_READ_SUPPRESSION); 2159 + wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN2, 2160 + GEN12_DISABLE_READ_SUPPRESSION); 2298 2161 2299 2162 /* 2300 2163 * Wa_22010960976:dg2 2301 2164 * Wa_14013347512:dg2 2302 2165 */ 2303 - wa_masked_dis(wal, GEN12_HDC_CHICKEN0, 2304 - LSC_L1_FLUSH_CTL_3D_DATAPORT_FLUSH_EVENTS_MASK); 2166 + wa_mcr_masked_dis(wal, XEHP_HDC_CHICKEN0, 2167 + LSC_L1_FLUSH_CTL_3D_DATAPORT_FLUSH_EVENTS_MASK); 2305 2168 } 2306 2169 2307 2170 if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) { ··· 2309 2172 * Wa_1608949956:dg2_g10 2310 2173 * Wa_14010198302:dg2_g10 2311 2174 */ 2312 - wa_masked_en(wal, GEN8_ROW_CHICKEN, 2313 - MDQ_ARBITRATION_MODE | UGM_BACKUP_MODE); 2175 + wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN, 2176 + MDQ_ARBITRATION_MODE | UGM_BACKUP_MODE); 2314 2177 2315 2178 /* 2316 2179 * Wa_14010918519:dg2_g10 ··· 2318 2181 * LSC_CHICKEN_BIT_0 always reads back as 0 in this stepping, 2319 2182 * so ignoring verification. 
2320 2183 */ 2321 - wa_add(wal, LSC_CHICKEN_BIT_0_UDW, 0, 2322 - FORCE_SLM_FENCE_SCOPE_TO_TILE | FORCE_UGM_FENCE_SCOPE_TO_TILE, 2323 - 0, false); 2184 + wa_mcr_add(wal, LSC_CHICKEN_BIT_0_UDW, 0, 2185 + FORCE_SLM_FENCE_SCOPE_TO_TILE | FORCE_UGM_FENCE_SCOPE_TO_TILE, 2186 + 0, false); 2324 2187 } 2325 2188 2326 2189 if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) { 2327 2190 /* Wa_22010430635:dg2 */ 2328 - wa_masked_en(wal, 2329 - GEN9_ROW_CHICKEN4, 2330 - GEN12_DISABLE_GRF_CLEAR); 2191 + wa_mcr_masked_en(wal, 2192 + GEN9_ROW_CHICKEN4, 2193 + GEN12_DISABLE_GRF_CLEAR); 2331 2194 2332 2195 /* Wa_14010648519:dg2 */ 2333 - wa_write_or(wal, XEHP_L3NODEARBCFG, XEHP_LNESPARE); 2196 + wa_mcr_write_or(wal, XEHP_L3NODEARBCFG, XEHP_LNESPARE); 2334 2197 } 2335 2198 2336 2199 /* Wa_14013202645:dg2 */ 2337 2200 if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_B0, STEP_C0) || 2338 2201 IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_B0)) 2339 - wa_write_or(wal, RT_CTRL, DIS_NULL_QUERY); 2202 + wa_mcr_write_or(wal, RT_CTRL, DIS_NULL_QUERY); 2340 2203 2341 2204 /* Wa_22012532006:dg2 */ 2342 2205 if (IS_DG2_GRAPHICS_STEP(engine->i915, G10, STEP_A0, STEP_C0) || 2343 2206 IS_DG2_GRAPHICS_STEP(engine->i915, G11, STEP_A0, STEP_B0)) 2344 - wa_masked_en(wal, GEN9_HALF_SLICE_CHICKEN7, 2345 - DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA); 2207 + wa_mcr_masked_en(wal, GEN9_HALF_SLICE_CHICKEN7, 2208 + DG2_DISABLE_ROUND_ENABLE_ALLOW_FOR_SSLA); 2346 2209 2347 2210 if (IS_DG2_GRAPHICS_STEP(engine->i915, G10, STEP_A0, STEP_B0)) { 2348 2211 /* Wa_14010680813:dg2_g10 */ ··· 2353 2216 if (IS_DG2_GRAPHICS_STEP(engine->i915, G10, STEP_A0, STEP_B0) || 2354 2217 IS_DG2_GRAPHICS_STEP(engine->i915, G11, STEP_A0, STEP_B0)) { 2355 2218 /* Wa_14012362059:dg2 */ 2356 - wa_write_or(wal, GEN12_MERT_MOD_CTRL, FORCE_MISS_FTLB); 2219 + wa_mcr_write_or(wal, XEHP_MERT_MOD_CTRL, FORCE_MISS_FTLB); 2357 2220 } 2358 2221 2359 2222 if (IS_DG2_GRAPHICS_STEP(i915, G11, STEP_B0, STEP_FOREVER) || 2360 2223 IS_DG2_G10(i915)) { 2361 2224 
/* Wa_22014600077:dg2 */ 2362 - wa_add(wal, GEN10_CACHE_MODE_SS, 0, 2363 - _MASKED_BIT_ENABLE(ENABLE_EU_COUNT_FOR_TDL_FLUSH), 2364 - 0 /* Wa_14012342262 :write-only reg, so skip 2365 - verification */, 2366 - true); 2225 + wa_mcr_add(wal, GEN10_CACHE_MODE_SS, 0, 2226 + _MASKED_BIT_ENABLE(ENABLE_EU_COUNT_FOR_TDL_FLUSH), 2227 + 0 /* Wa_14012342262 write-only reg, so skip verification */, 2228 + true); 2367 2229 } 2368 2230 2369 2231 if (IS_DG1_GRAPHICS_STEP(i915, STEP_A0, STEP_B0) || ··· 2389 2253 if (IS_ALDERLAKE_P(i915) || IS_ALDERLAKE_S(i915) || IS_DG1(i915) || 2390 2254 IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) { 2391 2255 /* Wa_1606931601:tgl,rkl,dg1,adl-s,adl-p */ 2392 - wa_masked_en(wal, GEN7_ROW_CHICKEN2, GEN12_DISABLE_EARLY_READ); 2256 + wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN2, GEN12_DISABLE_EARLY_READ); 2393 2257 2394 2258 /* 2395 2259 * Wa_1407928979:tgl A* ··· 2418 2282 IS_DG1_GRAPHICS_STEP(i915, STEP_A0, STEP_B0) || 2419 2283 IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915)) { 2420 2284 /* Wa_1409804808:tgl,rkl,dg1[a0],adl-s,adl-p */ 2421 - wa_masked_en(wal, GEN7_ROW_CHICKEN2, 2422 - GEN12_PUSH_CONST_DEREF_HOLD_DIS); 2285 + wa_mcr_masked_en(wal, GEN8_ROW_CHICKEN2, 2286 + GEN12_PUSH_CONST_DEREF_HOLD_DIS); 2423 2287 2424 2288 /* 2425 2289 * Wa_1409085225:tgl 2426 2290 * Wa_14010229206:tgl,rkl,dg1[a0],adl-s,adl-p 2427 2291 */ 2428 - wa_masked_en(wal, GEN9_ROW_CHICKEN4, GEN12_DISABLE_TDL_PUSH); 2292 + wa_mcr_masked_en(wal, GEN9_ROW_CHICKEN4, GEN12_DISABLE_TDL_PUSH); 2429 2293 } 2430 2294 2431 2295 if (IS_DG1_GRAPHICS_STEP(i915, STEP_A0, STEP_B0) || ··· 2449 2313 if (IS_DG1(i915) || IS_ROCKETLAKE(i915) || IS_TIGERLAKE(i915) || 2450 2314 IS_ALDERLAKE_S(i915) || IS_ALDERLAKE_P(i915)) { 2451 2315 /* Wa_1406941453:tgl,rkl,dg1,adl-s,adl-p */ 2452 - wa_masked_en(wal, 2453 - GEN10_SAMPLER_MODE, 2454 - ENABLE_SMALLPL); 2316 + wa_mcr_masked_en(wal, 2317 + GEN10_SAMPLER_MODE, 2318 + ENABLE_SMALLPL); 2455 2319 } 2456 2320 2457 2321 if (GRAPHICS_VER(i915) == 11) { ··· 2485 
2349 * Wa_1405733216:icl 2486 2350 * Formerly known as WaDisableCleanEvicts 2487 2351 */ 2488 - wa_write_or(wal, 2489 - GEN8_L3SQCREG4, 2490 - GEN11_LQSC_CLEAN_EVICT_DISABLE); 2352 + wa_mcr_write_or(wal, 2353 + GEN8_L3SQCREG4, 2354 + GEN11_LQSC_CLEAN_EVICT_DISABLE); 2491 2355 2492 2356 /* Wa_1606682166:icl */ 2493 2357 wa_write_or(wal, ··· 2495 2359 GEN7_DISABLE_SAMPLER_PREFETCH); 2496 2360 2497 2361 /* Wa_1409178092:icl */ 2498 - wa_write_clr_set(wal, 2499 - GEN11_SCRATCH2, 2500 - GEN11_COHERENT_PARTIAL_WRITE_MERGE_ENABLE, 2501 - 0); 2362 + wa_mcr_write_clr_set(wal, 2363 + GEN11_SCRATCH2, 2364 + GEN11_COHERENT_PARTIAL_WRITE_MERGE_ENABLE, 2365 + 0); 2502 2366 2503 2367 /* WaEnable32PlaneMode:icl */ 2504 2368 wa_masked_en(wal, GEN9_CSFE_CHICKEN1_RCS, ··· 2525 2389 FF_DOP_CLOCK_GATE_DISABLE); 2526 2390 } 2527 2391 2528 - if (IS_GRAPHICS_VER(i915, 9, 12)) { 2529 - /* FtrPerCtxtPreemptionGranularityControl:skl,bxt,kbl,cfl,cnl,icl,tgl */ 2392 + /* 2393 + * Intel platforms that support fine-grained preemption (i.e., gen9 and 2394 + * beyond) allow the kernel-mode driver to choose between two different 2395 + * options for controlling preemption granularity and behavior. 2396 + * 2397 + * Option 1 (hardware default): 2398 + * Preemption settings are controlled in a global manner via 2399 + * kernel-only register CS_DEBUG_MODE1 (0x20EC). Any granularity 2400 + * and settings chosen by the kernel-mode driver will apply to all 2401 + * userspace clients. 2402 + * 2403 + * Option 2: 2404 + * Preemption settings are controlled on a per-context basis via 2405 + * register CS_CHICKEN1 (0x2580). CS_CHICKEN1 is saved/restored on 2406 + * context switch and is writable by userspace (e.g., via 2407 + * MI_LOAD_REGISTER_IMMEDIATE instructions placed in a batch buffer) 2408 + * which allows different userspace drivers/clients to select 2409 + * different settings, or to change those settings on the fly in 2410 + * response to runtime needs. 
This option was known by name 2411 + * "FtrPerCtxtPreemptionGranularityControl" at one time, although 2412 + * that name is somewhat misleading as other non-granularity 2413 + * preemption settings are also impacted by this decision. 2414 + * 2415 + * On Linux, our policy has always been to let userspace drivers 2416 + * control preemption granularity/settings (Option 2). This was 2417 + * originally mandatory on gen9 to prevent ABI breakage (old gen9 2418 + * userspace developed before object-level preemption was enabled would 2419 + * not behave well if i915 were to go with Option 1 and enable that 2420 + * preemption in a global manner). On gen9 each context would have 2421 + * object-level preemption disabled by default (see 2422 + * WaDisable3DMidCmdPreemption in gen9_ctx_workarounds_init), but 2423 + * userspace drivers could opt-in to object-level preemption as they 2424 + * saw fit. For post-gen9 platforms, we continue to utilize Option 2; 2425 + * even though it is no longer necessary for ABI compatibility when 2426 + * enabling a new platform, it does ensure that userspace will be able 2427 + * to implement any workarounds that show up requiring temporary 2428 + * adjustments to preemption behavior at runtime. 2429 + * 2430 + * Notes/Workarounds: 2431 + * - Wa_14015141709: On DG2 and early steppings of MTL, 2432 + * CS_CHICKEN1[0] does not disable object-level preemption as 2433 + * it is supposed to (nor does CS_DEBUG_MODE1[0] if we had been 2434 + * using Option 1). Effectively this means userspace is unable 2435 + * to disable object-level preemption on these platforms/steppings 2436 + * despite the setting here. 2437 + * 2438 + * - Wa_16013994831: May require that userspace program 2439 + * CS_CHICKEN1[10] when certain runtime conditions are true. 2440 + * Userspace requires Option 2 to be in effect for their update of 2441 + * CS_CHICKEN1[10] to be effective. 
2442 + * 2443 + * Other workarounds may appear in the future that will also require 2444 + * Option 2 behavior to allow proper userspace implementation. 2445 + */ 2446 + if (GRAPHICS_VER(i915) >= 9) 2530 2447 wa_masked_en(wal, 2531 2448 GEN7_FF_SLICE_CS_CHICKEN1, 2532 2449 GEN9_FFSC_PERCTX_PREEMPT_CTRL); 2533 - } 2534 2450 2535 2451 if (IS_SKYLAKE(i915) || 2536 2452 IS_KABYLAKE(i915) || ··· 2608 2420 GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE); 2609 2421 2610 2422 /* WaEnableLbsSlaRetryTimerDecrement:skl,bxt,kbl,glk,cfl */ 2611 - wa_write_or(wal, 2612 - BDW_SCRATCH1, 2613 - GEN9_LBS_SLA_RETRY_TIMER_DECREMENT_ENABLE); 2423 + wa_mcr_write_or(wal, 2424 + BDW_SCRATCH1, 2425 + GEN9_LBS_SLA_RETRY_TIMER_DECREMENT_ENABLE); 2614 2426 2615 2427 /* WaProgramL3SqcReg1DefaultForPerf:bxt,glk */ 2616 2428 if (IS_GEN9_LP(i915)) 2617 - wa_write_clr_set(wal, 2618 - GEN8_L3SQCREG1, 2619 - L3_PRIO_CREDITS_MASK, 2620 - L3_GENERAL_PRIO_CREDITS(62) | 2621 - L3_HIGH_PRIO_CREDITS(2)); 2429 + wa_mcr_write_clr_set(wal, 2430 + GEN8_L3SQCREG1, 2431 + L3_PRIO_CREDITS_MASK, 2432 + L3_GENERAL_PRIO_CREDITS(62) | 2433 + L3_HIGH_PRIO_CREDITS(2)); 2622 2434 2623 2435 /* WaOCLCoherentLineFlush:skl,bxt,kbl,cfl */ 2624 - wa_write_or(wal, 2625 - GEN8_L3SQCREG4, 2626 - GEN8_LQSC_FLUSH_COHERENT_LINES); 2436 + wa_mcr_write_or(wal, 2437 + GEN8_L3SQCREG4, 2438 + GEN8_LQSC_FLUSH_COHERENT_LINES); 2627 2439 2628 2440 /* Disable atomics in L3 to prevent unrecoverable hangs */ 2629 2441 wa_write_clr_set(wal, GEN9_SCRATCH_LNCF1, 2630 2442 GEN9_LNCF_NONIA_COHERENT_ATOMICS_ENABLE, 0); 2631 - wa_write_clr_set(wal, GEN8_L3SQCREG4, 2632 - GEN8_LQSQ_NONIA_COHERENT_ATOMICS_ENABLE, 0); 2633 - wa_write_clr_set(wal, GEN9_SCRATCH1, 2634 - EVICTION_PERF_FIX_ENABLE, 0); 2443 + wa_mcr_write_clr_set(wal, GEN8_L3SQCREG4, 2444 + GEN8_LQSQ_NONIA_COHERENT_ATOMICS_ENABLE, 0); 2445 + wa_mcr_write_clr_set(wal, GEN9_SCRATCH1, 2446 + EVICTION_PERF_FIX_ENABLE, 0); 2635 2447 } 2636 2448 2637 2449 if (IS_HASWELL(i915)) { 2638 2450 /* 
WaSampleCChickenBitEnable:hsw */ 2639 2451 wa_masked_en(wal, 2640 - HALF_SLICE_CHICKEN3, HSW_SAMPLE_C_PERFORMANCE); 2452 + HSW_HALF_SLICE_CHICKEN3, HSW_SAMPLE_C_PERFORMANCE); 2641 2453 2642 2454 wa_masked_dis(wal, 2643 2455 CACHE_MODE_0_GEN7, ··· 2845 2657 { 2846 2658 if (IS_PVC_CT_STEP(engine->i915, STEP_A0, STEP_C0)) { 2847 2659 /* Wa_14014999345:pvc */ 2848 - wa_masked_en(wal, GEN10_CACHE_MODE_SS, DISABLE_ECC); 2660 + wa_mcr_masked_en(wal, GEN10_CACHE_MODE_SS, DISABLE_ECC); 2849 2661 } 2850 2662 } 2851 2663 ··· 2871 2683 } 2872 2684 2873 2685 if (IS_DG2(i915)) { 2874 - wa_write_or(wal, XEHP_L3SCQREG7, BLEND_FILL_CACHING_OPT_DIS); 2875 - wa_write_clr_set(wal, RT_CTRL, STACKID_CTRL, STACKID_CTRL_512); 2686 + wa_mcr_write_or(wal, XEHP_L3SCQREG7, BLEND_FILL_CACHING_OPT_DIS); 2687 + wa_mcr_write_clr_set(wal, RT_CTRL, STACKID_CTRL, STACKID_CTRL_512); 2876 2688 2877 2689 /* 2878 2690 * This is also listed as Wa_22012654132 for certain DG2 ··· 2883 2695 * back for verification on DG2 (due to Wa_14012342262), so 2884 2696 * we need to explicitly skip the readback. 2885 2697 */ 2886 - wa_add(wal, GEN10_CACHE_MODE_SS, 0, 2887 - _MASKED_BIT_ENABLE(ENABLE_PREFETCH_INTO_IC), 2888 - 0 /* write-only, so skip validation */, 2889 - true); 2698 + wa_mcr_add(wal, GEN10_CACHE_MODE_SS, 0, 2699 + _MASKED_BIT_ENABLE(ENABLE_PREFETCH_INTO_IC), 2700 + 0 /* write-only, so skip validation */, 2701 + true); 2890 2702 } 2891 2703 2892 2704 /* ··· 2895 2707 * platforms. 
2896 2708 */ 2897 2709 if (INTEL_INFO(i915)->tuning_thread_rr_after_dep) 2898 - wa_masked_field_set(wal, GEN9_ROW_CHICKEN4, THREAD_EX_ARB_MODE, 2899 - THREAD_EX_ARB_MODE_RR_AFTER_DEP); 2710 + wa_mcr_masked_field_set(wal, GEN9_ROW_CHICKEN4, THREAD_EX_ARB_MODE, 2711 + THREAD_EX_ARB_MODE_RR_AFTER_DEP); 2900 2712 } 2901 2713 2902 2714 /* ··· 2922 2734 2923 2735 if (IS_XEHPSDV(i915)) { 2924 2736 /* Wa_1409954639 */ 2925 - wa_masked_en(wal, 2926 - GEN8_ROW_CHICKEN, 2927 - SYSTOLIC_DOP_CLOCK_GATING_DIS); 2737 + wa_mcr_masked_en(wal, 2738 + GEN8_ROW_CHICKEN, 2739 + SYSTOLIC_DOP_CLOCK_GATING_DIS); 2928 2740 2929 2741 /* Wa_1607196519 */ 2930 - wa_masked_en(wal, 2931 - GEN9_ROW_CHICKEN4, 2932 - GEN12_DISABLE_GRF_CLEAR); 2742 + wa_mcr_masked_en(wal, 2743 + GEN9_ROW_CHICKEN4, 2744 + GEN12_DISABLE_GRF_CLEAR); 2933 2745 2934 2746 /* Wa_14010670810:xehpsdv */ 2935 - wa_write_or(wal, XEHP_L3NODEARBCFG, XEHP_LNESPARE); 2747 + wa_mcr_write_or(wal, XEHP_L3NODEARBCFG, XEHP_LNESPARE); 2936 2748 2937 2749 /* Wa_14010449647:xehpsdv */ 2938 - wa_masked_en(wal, GEN7_HALF_SLICE_CHICKEN1, 2939 - GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE); 2750 + wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN1, 2751 + GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE); 2940 2752 2941 2753 /* Wa_18011725039:xehpsdv */ 2942 2754 if (IS_XEHPSDV_GRAPHICS_STEP(i915, STEP_A1, STEP_B0)) { 2943 - wa_masked_dis(wal, MLTICTXCTL, TDONRENDER); 2944 - wa_write_or(wal, L3SQCREG1_CCS0, FLUSHALLNONCOH); 2755 + wa_mcr_masked_dis(wal, MLTICTXCTL, TDONRENDER); 2756 + wa_mcr_write_or(wal, L3SQCREG1_CCS0, FLUSHALLNONCOH); 2945 2757 } 2946 2758 2947 2759 /* Wa_14012362059:xehpsdv */ 2948 - wa_write_or(wal, GEN12_MERT_MOD_CTRL, FORCE_MISS_FTLB); 2760 + wa_mcr_write_or(wal, XEHP_MERT_MOD_CTRL, FORCE_MISS_FTLB); 2949 2761 2950 2762 /* Wa_14014368820:xehpsdv */ 2951 2763 wa_write_or(wal, GEN12_GAMCNTRL_CTRL, INVALIDATION_BROADCAST_MODE_DIS | ··· 2954 2766 2955 2767 if (IS_DG2(i915) || IS_PONTEVECCHIO(i915)) { 2956 2768 /* Wa_14015227452:dg2,pvc */ 2957 
- wa_masked_en(wal, GEN9_ROW_CHICKEN4, XEHP_DIS_BBL_SYSPIPE); 2769 + wa_mcr_masked_en(wal, GEN9_ROW_CHICKEN4, XEHP_DIS_BBL_SYSPIPE); 2958 2770 2959 2771 /* Wa_22014226127:dg2,pvc */ 2960 - wa_write_or(wal, LSC_CHICKEN_BIT_0, DISABLE_D8_D16_COASLESCE); 2772 + wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0, DISABLE_D8_D16_COASLESCE); 2961 2773 2962 2774 /* Wa_16015675438:dg2,pvc */ 2963 2775 wa_masked_en(wal, FF_SLICE_CS_CHICKEN2, GEN12_PERF_FIX_BALANCING_CFE_DISABLE); 2964 2776 2965 2777 /* Wa_18018781329:dg2,pvc */ 2966 - wa_write_or(wal, RENDER_MOD_CTRL, FORCE_MISS_FTLB); 2967 - wa_write_or(wal, COMP_MOD_CTRL, FORCE_MISS_FTLB); 2968 - wa_write_or(wal, VDBX_MOD_CTRL, FORCE_MISS_FTLB); 2969 - wa_write_or(wal, VEBX_MOD_CTRL, FORCE_MISS_FTLB); 2778 + wa_mcr_write_or(wal, RENDER_MOD_CTRL, FORCE_MISS_FTLB); 2779 + wa_mcr_write_or(wal, COMP_MOD_CTRL, FORCE_MISS_FTLB); 2780 + wa_mcr_write_or(wal, VDBX_MOD_CTRL, FORCE_MISS_FTLB); 2781 + wa_mcr_write_or(wal, VEBX_MOD_CTRL, FORCE_MISS_FTLB); 2782 + } 2783 + 2784 + if (IS_DG2(i915)) { 2785 + /* 2786 + * Wa_16011620976:dg2_g11 2787 + * Wa_22015475538:dg2 2788 + */ 2789 + wa_mcr_write_or(wal, LSC_CHICKEN_BIT_0_UDW, DIS_CHAIN_2XSIMD8); 2790 + 2791 + /* Wa_18017747507:dg2 */ 2792 + wa_masked_en(wal, VFG_PREEMPTION_CHICKEN, POLYGON_TRIFAN_LINELOOP_DISABLE); 2970 2793 } 2971 2794 } 2972 2795
+7 -2
drivers/gpu/drm/i915/gt/intel_workarounds_types.h
··· 11 11 #include "i915_reg_defs.h" 12 12 13 13 struct i915_wa { 14 - i915_reg_t reg; 14 + union { 15 + i915_reg_t reg; 16 + i915_mcr_reg_t mcr_reg; 17 + }; 15 18 u32 clr; 16 19 u32 set; 17 20 u32 read; 18 - bool masked_reg; 21 + 22 + u32 masked_reg:1; 23 + u32 is_mcr:1; 19 24 }; 20 25 21 26 struct i915_wa_list {
+19 -3
drivers/gpu/drm/i915/gt/selftest_engine_cs.c
··· 39 39 return igt_flush_test(gt->i915); 40 40 } 41 41 42 + static i915_reg_t timestamp_reg(struct intel_engine_cs *engine) 43 + { 44 + struct drm_i915_private *i915 = engine->i915; 45 + 46 + if (GRAPHICS_VER(i915) == 5 || IS_G4X(i915)) 47 + return RING_TIMESTAMP_UDW(engine->mmio_base); 48 + else 49 + return RING_TIMESTAMP(engine->mmio_base); 50 + } 51 + 42 52 static int write_timestamp(struct i915_request *rq, int slot) 43 53 { 44 54 struct intel_timeline *tl = ··· 65 55 if (GRAPHICS_VER(rq->engine->i915) >= 8) 66 56 cmd++; 67 57 *cs++ = cmd; 68 - *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(rq->engine->mmio_base)); 58 + *cs++ = i915_mmio_reg_offset(timestamp_reg(rq->engine)); 69 59 *cs++ = tl->hwsp_offset + slot * sizeof(u32); 70 60 *cs++ = 0; 71 61 ··· 135 125 enum intel_engine_id id; 136 126 int err = 0; 137 127 138 - if (GRAPHICS_VER(gt->i915) < 7) /* for per-engine CS_TIMESTAMP */ 128 + if (GRAPHICS_VER(gt->i915) < 4) /* Any CS_TIMESTAMP? */ 139 129 return 0; 140 130 141 131 perf_begin(gt); ··· 144 134 struct i915_vma *batch; 145 135 u32 cycles[COUNT]; 146 136 int i; 137 + 138 + if (GRAPHICS_VER(engine->i915) < 7 && engine->id != RCS0) 139 + continue; 147 140 148 141 intel_engine_pm_get(engine); 149 142 ··· 262 249 enum intel_engine_id id; 263 250 int err = 0; 264 251 265 - if (GRAPHICS_VER(gt->i915) < 7) /* for per-engine CS_TIMESTAMP */ 252 + if (GRAPHICS_VER(gt->i915) < 4) /* Any CS_TIMESTAMP? */ 266 253 return 0; 267 254 268 255 perf_begin(gt); ··· 271 258 struct i915_vma *base, *nop; 272 259 u32 cycles[COUNT]; 273 260 int i; 261 + 262 + if (GRAPHICS_VER(engine->i915) < 7 && engine->id != RCS0) 263 + continue; 274 264 275 265 intel_engine_pm_get(engine); 276 266
+23 -27
drivers/gpu/drm/i915/gt/selftest_execlists.c
··· 85 85 break; 86 86 } while (time_before(jiffies, timeout)); 87 87 88 - flush_scheduled_work(); 89 - 90 88 if (rq->fence.error != -EIO) { 91 89 pr_err("%s: hanging request %llx:%lld not reset\n", 92 90 engine->name, ··· 3473 3475 3474 3476 struct preempt_smoke { 3475 3477 struct intel_gt *gt; 3478 + struct kthread_work work; 3476 3479 struct i915_gem_context **contexts; 3477 3480 struct intel_engine_cs *engine; 3478 3481 struct drm_i915_gem_object *batch; 3479 3482 unsigned int ncontext; 3480 3483 struct rnd_state prng; 3481 3484 unsigned long count; 3485 + int result; 3482 3486 }; 3483 3487 3484 3488 static struct i915_gem_context *smoke_context(struct preempt_smoke *smoke) ··· 3540 3540 return err; 3541 3541 } 3542 3542 3543 - static int smoke_crescendo_thread(void *arg) 3543 + static void smoke_crescendo_work(struct kthread_work *work) 3544 3544 { 3545 - struct preempt_smoke *smoke = arg; 3545 + struct preempt_smoke *smoke = container_of(work, typeof(*smoke), work); 3546 3546 IGT_TIMEOUT(end_time); 3547 3547 unsigned long count; 3548 3548 3549 3549 count = 0; 3550 3550 do { 3551 3551 struct i915_gem_context *ctx = smoke_context(smoke); 3552 - int err; 3553 3552 3554 - err = smoke_submit(smoke, 3555 - ctx, count % I915_PRIORITY_MAX, 3556 - smoke->batch); 3557 - if (err) 3558 - return err; 3553 + smoke->result = smoke_submit(smoke, ctx, 3554 + count % I915_PRIORITY_MAX, 3555 + smoke->batch); 3559 3556 3560 3557 count++; 3561 - } while (count < smoke->ncontext && !__igt_timeout(end_time, NULL)); 3558 + } while (!smoke->result && count < smoke->ncontext && 3559 + !__igt_timeout(end_time, NULL)); 3562 3560 3563 3561 smoke->count = count; 3564 - return 0; 3565 3562 } 3566 3563 3567 3564 static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags) 3568 3565 #define BATCH BIT(0) 3569 3566 { 3570 - struct task_struct *tsk[I915_NUM_ENGINES] = {}; 3567 + struct kthread_worker *worker[I915_NUM_ENGINES] = {}; 3571 3568 struct preempt_smoke *arg; 3572 3569 
struct intel_engine_cs *engine; 3573 3570 enum intel_engine_id id; ··· 3575 3578 if (!arg) 3576 3579 return -ENOMEM; 3577 3580 3581 + memset(arg, 0, I915_NUM_ENGINES * sizeof(*arg)); 3582 + 3578 3583 for_each_engine(engine, smoke->gt, id) { 3579 3584 arg[id] = *smoke; 3580 3585 arg[id].engine = engine; ··· 3584 3585 arg[id].batch = NULL; 3585 3586 arg[id].count = 0; 3586 3587 3587 - tsk[id] = kthread_run(smoke_crescendo_thread, arg, 3588 - "igt/smoke:%d", id); 3589 - if (IS_ERR(tsk[id])) { 3590 - err = PTR_ERR(tsk[id]); 3588 + worker[id] = kthread_create_worker(0, "igt/smoke:%d", id); 3589 + if (IS_ERR(worker[id])) { 3590 + err = PTR_ERR(worker[id]); 3591 3591 break; 3592 3592 } 3593 - get_task_struct(tsk[id]); 3594 - } 3595 3593 3596 - yield(); /* start all threads before we kthread_stop() */ 3594 + kthread_init_work(&arg[id].work, smoke_crescendo_work); 3595 + kthread_queue_work(worker[id], &arg[id].work); 3596 + } 3597 3597 3598 3598 count = 0; 3599 3599 for_each_engine(engine, smoke->gt, id) { 3600 - int status; 3601 - 3602 - if (IS_ERR_OR_NULL(tsk[id])) 3600 + if (IS_ERR_OR_NULL(worker[id])) 3603 3601 continue; 3604 3602 3605 - status = kthread_stop(tsk[id]); 3606 - if (status && !err) 3607 - err = status; 3603 + kthread_flush_work(&arg[id].work); 3604 + if (arg[id].result && !err) 3605 + err = arg[id].result; 3608 3606 3609 3607 count += arg[id].count; 3610 3608 3611 - put_task_struct(tsk[id]); 3609 + kthread_destroy_worker(worker[id]); 3612 3610 } 3613 3611 3614 3612 pr_info("Submitted %lu crescendo:%x requests across %d engines and %d contexts\n",
+15 -21
drivers/gpu/drm/i915/gt/selftest_gt_pm.c
··· 36 36 return 0; 37 37 } 38 38 39 + static u32 read_timestamp(struct intel_engine_cs *engine) 40 + { 41 + struct drm_i915_private *i915 = engine->i915; 42 + 43 + /* On i965 the first read tends to give a stale value */ 44 + ENGINE_READ_FW(engine, RING_TIMESTAMP); 45 + 46 + if (GRAPHICS_VER(i915) == 5 || IS_G4X(i915)) 47 + return ENGINE_READ_FW(engine, RING_TIMESTAMP_UDW); 48 + else 49 + return ENGINE_READ_FW(engine, RING_TIMESTAMP); 50 + } 51 + 39 52 static void measure_clocks(struct intel_engine_cs *engine, 40 53 u32 *out_cycles, ktime_t *out_dt) 41 54 { ··· 58 45 59 46 for (i = 0; i < 5; i++) { 60 47 local_irq_disable(); 61 - cycles[i] = -ENGINE_READ_FW(engine, RING_TIMESTAMP); 48 + cycles[i] = -read_timestamp(engine); 62 49 dt[i] = ktime_get(); 63 50 64 51 udelay(1000); 65 52 66 53 dt[i] = ktime_sub(ktime_get(), dt[i]); 67 - cycles[i] += ENGINE_READ_FW(engine, RING_TIMESTAMP); 54 + cycles[i] += read_timestamp(engine); 68 55 local_irq_enable(); 69 56 } 70 57 ··· 89 76 } 90 77 91 78 if (GRAPHICS_VER(gt->i915) < 4) /* Any CS_TIMESTAMP? */ 92 - return 0; 93 - 94 - if (GRAPHICS_VER(gt->i915) == 5) 95 - /* 96 - * XXX CS_TIMESTAMP low dword is dysfunctional? 97 - * 98 - * Ville's experiments indicate the high dword still works, 99 - * but at a correspondingly reduced frequency. 100 - */ 101 - return 0; 102 - 103 - if (GRAPHICS_VER(gt->i915) == 4) 104 - /* 105 - * XXX CS_TIMESTAMP appears gibberish 106 - * 107 - * Ville's experiments indicate that it mostly appears 'stuck' 108 - * in that we see the register report the same cycle count 109 - * for a couple of reads. 110 - */ 111 79 return 0; 112 80 113 81 intel_gt_pm_get(gt);
+30 -21
drivers/gpu/drm/i915/gt/selftest_hangcheck.c
··· 866 866 } 867 867 868 868 struct active_engine { 869 - struct task_struct *task; 869 + struct kthread_worker *worker; 870 + struct kthread_work work; 870 871 struct intel_engine_cs *engine; 871 872 unsigned long resets; 872 873 unsigned int flags; 874 + bool stop; 875 + int result; 873 876 }; 874 877 875 878 #define TEST_ACTIVE BIT(0) ··· 903 900 return err; 904 901 } 905 902 906 - static int active_engine(void *data) 903 + static void active_engine(struct kthread_work *work) 907 904 { 908 905 I915_RND_STATE(prng); 909 - struct active_engine *arg = data; 906 + struct active_engine *arg = container_of(work, typeof(*arg), work); 910 907 struct intel_engine_cs *engine = arg->engine; 911 908 struct i915_request *rq[8] = {}; 912 909 struct intel_context *ce[ARRAY_SIZE(rq)]; ··· 916 913 for (count = 0; count < ARRAY_SIZE(ce); count++) { 917 914 ce[count] = intel_context_create(engine); 918 915 if (IS_ERR(ce[count])) { 919 - err = PTR_ERR(ce[count]); 920 - pr_err("[%s] Create context #%ld failed: %d!\n", engine->name, count, err); 916 + arg->result = PTR_ERR(ce[count]); 917 + pr_err("[%s] Create context #%ld failed: %d!\n", 918 + engine->name, count, arg->result); 921 919 while (--count) 922 920 intel_context_put(ce[count]); 923 - return err; 921 + return; 924 922 } 925 923 } 926 924 927 925 count = 0; 928 - while (!kthread_should_stop()) { 926 + while (!READ_ONCE(arg->stop)) { 929 927 unsigned int idx = count++ & (ARRAY_SIZE(rq) - 1); 930 928 struct i915_request *old = rq[idx]; 931 929 struct i915_request *new; ··· 971 967 intel_context_put(ce[count]); 972 968 } 973 969 974 - return err; 970 + arg->result = err; 975 971 } 976 972 977 973 static int __igt_reset_engines(struct intel_gt *gt, ··· 1026 1022 1027 1023 memset(threads, 0, sizeof(*threads) * I915_NUM_ENGINES); 1028 1024 for_each_engine(other, gt, tmp) { 1029 - struct task_struct *tsk; 1025 + struct kthread_worker *worker; 1030 1026 1031 1027 threads[tmp].resets = 1032 1028 i915_reset_engine_count(global, 
other); ··· 1040 1036 threads[tmp].engine = other; 1041 1037 threads[tmp].flags = flags; 1042 1038 1043 - tsk = kthread_run(active_engine, &threads[tmp], 1044 - "igt/%s", other->name); 1045 - if (IS_ERR(tsk)) { 1046 - err = PTR_ERR(tsk); 1047 - pr_err("[%s] Thread spawn failed: %d!\n", engine->name, err); 1039 + worker = kthread_create_worker(0, "igt/%s", 1040 + other->name); 1041 + if (IS_ERR(worker)) { 1042 + err = PTR_ERR(worker); 1043 + pr_err("[%s] Worker create failed: %d!\n", 1044 + engine->name, err); 1048 1045 goto unwind; 1049 1046 } 1050 1047 1051 - threads[tmp].task = tsk; 1052 - get_task_struct(tsk); 1053 - } 1048 + threads[tmp].worker = worker; 1054 1049 1055 - yield(); /* start all threads before we begin */ 1050 + kthread_init_work(&threads[tmp].work, active_engine); 1051 + kthread_queue_work(threads[tmp].worker, 1052 + &threads[tmp].work); 1053 + } 1056 1054 1057 1055 st_engine_heartbeat_disable_no_pm(engine); 1058 1056 GEM_BUG_ON(test_and_set_bit(I915_RESET_ENGINE + id, ··· 1203 1197 for_each_engine(other, gt, tmp) { 1204 1198 int ret; 1205 1199 1206 - if (!threads[tmp].task) 1200 + if (!threads[tmp].worker) 1207 1201 continue; 1208 1202 1209 - ret = kthread_stop(threads[tmp].task); 1203 + WRITE_ONCE(threads[tmp].stop, true); 1204 + kthread_flush_work(&threads[tmp].work); 1205 + ret = READ_ONCE(threads[tmp].result); 1210 1206 if (ret) { 1211 1207 pr_err("kthread for other engine %s failed, err=%d\n", 1212 1208 other->name, ret); 1213 1209 if (!err) 1214 1210 err = ret; 1215 1211 } 1216 - put_task_struct(threads[tmp].task); 1212 + 1213 + kthread_destroy_worker(threads[tmp].worker); 1217 1214 1218 1215 /* GuC based resets are not logged per engine */ 1219 1216 if (!using_guc) {
+1
drivers/gpu/drm/i915/gt/selftest_migrate.c
··· 6 6 #include <linux/sort.h> 7 7 8 8 #include "gem/i915_gem_internal.h" 9 + #include "gem/i915_gem_lmem.h" 9 10 10 11 #include "selftests/i915_random.h" 11 12
+9 -3
drivers/gpu/drm/i915/gt/selftest_rps.c
··· 1107 1107 return div64_u64(1000 * 1000 * dE, dt); 1108 1108 } 1109 1109 1110 - static u64 measure_power_at(struct intel_rps *rps, int *freq) 1110 + static u64 measure_power(struct intel_rps *rps, int *freq) 1111 1111 { 1112 1112 u64 x[5]; 1113 1113 int i; 1114 1114 1115 - *freq = rps_set_check(rps, *freq); 1116 1115 for (i = 0; i < 5; i++) 1117 1116 x[i] = __measure_power(5); 1118 - *freq = (*freq + read_cagf(rps)) / 2; 1117 + 1118 + *freq = (*freq + intel_rps_read_actual_frequency(rps)) / 2; 1119 1119 1120 1120 /* A simple triangle filter for better result stability */ 1121 1121 sort(x, 5, sizeof(*x), cmp_u64, NULL); 1122 1122 return div_u64(x[1] + 2 * x[2] + x[3], 4); 1123 + } 1124 + 1125 + static u64 measure_power_at(struct intel_rps *rps, int *freq) 1126 + { 1127 + *freq = rps_set_check(rps, *freq); 1128 + return measure_power(rps, freq); 1123 1129 } 1124 1130 1125 1131 int live_rps_power(void *arg)
+165 -25
drivers/gpu/drm/i915/gt/selftest_slpc.c
··· 11 11 enum test_type { 12 12 VARY_MIN, 13 13 VARY_MAX, 14 - MAX_GRANTED 14 + MAX_GRANTED, 15 + SLPC_POWER, 15 16 }; 16 17 17 18 static int slpc_set_min_freq(struct intel_guc_slpc *slpc, u32 freq) ··· 40 39 delay_for_h2g(); 41 40 42 41 return ret; 42 + } 43 + 44 + static int slpc_set_freq(struct intel_gt *gt, u32 freq) 45 + { 46 + int err; 47 + struct intel_guc_slpc *slpc = &gt->uc.guc.slpc; 48 + 49 + err = slpc_set_max_freq(slpc, freq); 50 + if (err) { 51 + pr_err("Unable to update max freq"); 52 + return err; 53 + } 54 + 55 + err = slpc_set_min_freq(slpc, freq); 56 + if (err) { 57 + pr_err("Unable to update min freq"); 58 + return err; 59 + } 60 + 61 + return err; 62 + } 63 + 64 + static u64 measure_power_at_freq(struct intel_gt *gt, int *freq, u64 *power) 65 + { 66 + int err = 0; 67 + 68 + err = slpc_set_freq(gt, *freq); 69 + if (err) 70 + return err; 71 + *freq = intel_rps_read_actual_frequency(&gt->rps); 72 + *power = measure_power(&gt->rps, freq); 73 + 74 + return err; 43 75 } 44 76 45 77 static int vary_max_freq(struct intel_guc_slpc *slpc, struct intel_rps *rps, ··· 147 113 return err; 148 114 } 149 115 116 + static int slpc_power(struct intel_gt *gt, struct intel_engine_cs *engine) 117 + { 118 + struct intel_guc_slpc *slpc = &gt->uc.guc.slpc; 119 + struct { 120 + u64 power; 121 + int freq; 122 + } min, max; 123 + int err = 0; 124 + 125 + /* 126 + * Our fundamental assumption is that running at lower frequency 127 + * actually saves power. Let's see if our RAPL measurement supports 128 + * that theory. 
129 + */ 130 + if (!librapl_supported(gt->i915)) 131 + return 0; 132 + 133 + min.freq = slpc->min_freq; 134 + err = measure_power_at_freq(gt, &min.freq, &min.power); 135 + 136 + if (err) 137 + return err; 138 + 139 + max.freq = slpc->rp0_freq; 140 + err = measure_power_at_freq(gt, &max.freq, &max.power); 141 + 142 + if (err) 143 + return err; 144 + 145 + pr_info("%s: min:%llumW @ %uMHz, max:%llumW @ %uMHz\n", 146 + engine->name, 147 + min.power, min.freq, 148 + max.power, max.freq); 149 + 150 + if (10 * min.freq >= 9 * max.freq) { 151 + pr_notice("Could not control frequency, ran at [%uMHz, %uMhz]\n", 152 + min.freq, max.freq); 153 + } 154 + 155 + if (11 * min.power > 10 * max.power) { 156 + pr_err("%s: did not conserve power when setting lower frequency!\n", 157 + engine->name); 158 + err = -EINVAL; 159 + } 160 + 161 + /* Restore min/max frequencies */ 162 + slpc_set_max_freq(slpc, slpc->rp0_freq); 163 + slpc_set_min_freq(slpc, slpc->min_freq); 164 + 165 + return err; 166 + } 167 + 150 168 static int max_granted_freq(struct intel_guc_slpc *slpc, struct intel_rps *rps, u32 *max_act_freq) 151 169 { 152 170 struct intel_gt *gt = rps_to_gt(rps); ··· 239 153 if (!intel_uc_uses_guc_slpc(&gt->uc)) 240 154 return 0; 241 155 156 + if (slpc->min_freq == slpc->rp0_freq) { 157 + pr_err("Min/Max are fused to the same value\n"); 158 + return -EINVAL; 159 + } 160 + 242 161 if (igt_spinner_init(&spin, gt)) 243 162 return -ENOMEM; 244 163 ··· 258 167 } 259 168 260 169 /* 261 - * FIXME: With efficient frequency enabled, GuC can request 262 - * frequencies higher than the SLPC max. While this is fixed 263 - * in GuC, we level set these tests with RPn as min. 170 + * Set min frequency to RPn so that we can test the whole 171 + * range of RPn-RP0. This also turns off efficient freq 172 + * usage and makes results more predictable. 
264 173 */ 265 174 err = slpc_set_min_freq(slpc, slpc->min_freq); 266 - if (err) 175 + if (err) { 176 + pr_err("Unable to update min freq!"); 267 177 return err; 268 - 269 - if (slpc->min_freq == slpc->rp0_freq) { 270 - pr_err("Min/Max are fused to the same value\n"); 271 - return -EINVAL; 272 178 } 273 179 274 180 intel_gt_pm_wait_for_idle(gt); ··· 321 233 322 234 err = max_granted_freq(slpc, rps, &max_act_freq); 323 235 break; 236 + 237 + case SLPC_POWER: 238 + err = slpc_power(gt, engine); 239 + break; 324 240 } 325 241 326 - pr_info("Max actual frequency for %s was %d\n", 327 - engine->name, max_act_freq); 242 + if (test_type != SLPC_POWER) { 243 + pr_info("Max actual frequency for %s was %d\n", 244 + engine->name, max_act_freq); 328 245 329 - /* Actual frequency should rise above min */ 330 - if (max_act_freq <= slpc_min_freq) { 331 - pr_err("Actual freq did not rise above min\n"); 332 - pr_err("Perf Limit Reasons: 0x%x\n", 333 - intel_uncore_read(gt->uncore, GT0_PERF_LIMIT_REASONS)); 334 - err = -EINVAL; 246 + /* Actual frequency should rise above min */ 247 + if (max_act_freq <= slpc->min_freq) { 248 + pr_err("Actual freq did not rise above min\n"); 249 + pr_err("Perf Limit Reasons: 0x%x\n", 250 + intel_uncore_read(gt->uncore, GT0_PERF_LIMIT_REASONS)); 251 + err = -EINVAL; 252 + } 335 253 } 336 254 337 255 igt_spinner_end(&spin); ··· 364 270 static int live_slpc_vary_min(void *arg) 365 271 { 366 272 struct drm_i915_private *i915 = arg; 367 - struct intel_gt *gt = to_gt(i915); 273 + struct intel_gt *gt; 274 + unsigned int i; 275 + int ret; 368 276 369 - return run_test(gt, VARY_MIN); 277 + for_each_gt(gt, i915, i) { 278 + ret = run_test(gt, VARY_MIN); 279 + if (ret) 280 + return ret; 281 + } 282 + 283 + return ret; 370 284 } 371 285 372 286 static int live_slpc_vary_max(void *arg) 373 287 { 374 288 struct drm_i915_private *i915 = arg; 375 - struct intel_gt *gt = to_gt(i915); 289 + struct intel_gt *gt; 290 + unsigned int i; 291 + int ret; 376 292 377 - return 
run_test(gt, VARY_MAX); 293 + for_each_gt(gt, i915, i) { 294 + ret = run_test(gt, VARY_MAX); 295 + if (ret) 296 + return ret; 297 + } 298 + 299 + return ret; 378 300 } 379 301 380 302 /* check if pcode can grant RP0 */ 381 303 static int live_slpc_max_granted(void *arg) 382 304 { 383 305 struct drm_i915_private *i915 = arg; 384 - struct intel_gt *gt = to_gt(i915); 306 + struct intel_gt *gt; 307 + unsigned int i; 308 + int ret; 385 309 386 - return run_test(gt, MAX_GRANTED); 310 + for_each_gt(gt, i915, i) { 311 + ret = run_test(gt, MAX_GRANTED); 312 + if (ret) 313 + return ret; 314 + } 315 + 316 + return ret; 317 + } 318 + 319 + static int live_slpc_power(void *arg) 320 + { 321 + struct drm_i915_private *i915 = arg; 322 + struct intel_gt *gt; 323 + unsigned int i; 324 + int ret; 325 + 326 + for_each_gt(gt, i915, i) { 327 + ret = run_test(gt, SLPC_POWER); 328 + if (ret) 329 + return ret; 330 + } 331 + 332 + return ret; 387 333 } 388 334 389 335 int intel_slpc_live_selftests(struct drm_i915_private *i915) ··· 432 298 SUBTEST(live_slpc_vary_max), 433 299 SUBTEST(live_slpc_vary_min), 434 300 SUBTEST(live_slpc_max_granted), 301 + SUBTEST(live_slpc_power), 435 302 }; 436 303 437 - if (intel_gt_is_wedged(to_gt(i915))) 438 - return 0; 304 + struct intel_gt *gt; 305 + unsigned int i; 306 + 307 + for_each_gt(gt, i915, i) { 308 + if (intel_gt_is_wedged(gt)) 309 + return 0; 310 + } 439 311 440 312 return i915_live_subtests(tests, i915); 441 313 }
+1 -1
drivers/gpu/drm/i915/gt/selftest_workarounds.c
··· 991 991 /* Alas, we must pardon some whitelists. Mistakes already made */ 992 992 static const struct regmask pardon[] = { 993 993 { GEN9_CTX_PREEMPT_REG, 9 }, 994 - { GEN8_L3SQCREG4, 9 }, 994 + { _MMIO(0xb118), 9 }, /* GEN8_L3SQCREG4 */ 995 995 }; 996 996 997 997 return find_reg(i915, reg, pardon, ARRAY_SIZE(pardon));
+15 -10
drivers/gpu/drm/i915/gt/sysfs_engines.c
··· 144 144 const char *buf, size_t count) 145 145 { 146 146 struct intel_engine_cs *engine = kobj_to_engine(kobj); 147 - unsigned long long duration; 147 + unsigned long long duration, clamped; 148 148 int err; 149 149 150 150 /* ··· 168 168 if (err) 169 169 return err; 170 170 171 - if (duration > jiffies_to_nsecs(2)) 171 + clamped = intel_clamp_max_busywait_duration_ns(engine, duration); 172 + if (duration != clamped) 172 173 return -EINVAL; 173 174 174 175 WRITE_ONCE(engine->props.max_busywait_duration_ns, duration); ··· 204 203 const char *buf, size_t count) 205 204 { 206 205 struct intel_engine_cs *engine = kobj_to_engine(kobj); 207 - unsigned long long duration; 206 + unsigned long long duration, clamped; 208 207 int err; 209 208 210 209 /* ··· 219 218 if (err) 220 219 return err; 221 220 222 - if (duration > jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)) 221 + clamped = intel_clamp_timeslice_duration_ms(engine, duration); 222 + if (duration != clamped) 223 223 return -EINVAL; 224 224 225 225 WRITE_ONCE(engine->props.timeslice_duration_ms, duration); ··· 258 256 const char *buf, size_t count) 259 257 { 260 258 struct intel_engine_cs *engine = kobj_to_engine(kobj); 261 - unsigned long long duration; 259 + unsigned long long duration, clamped; 262 260 int err; 263 261 264 262 /* ··· 274 272 if (err) 275 273 return err; 276 274 277 - if (duration > jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)) 275 + clamped = intel_clamp_stop_timeout_ms(engine, duration); 276 + if (duration != clamped) 278 277 return -EINVAL; 279 278 280 279 WRITE_ONCE(engine->props.stop_timeout_ms, duration); ··· 309 306 const char *buf, size_t count) 310 307 { 311 308 struct intel_engine_cs *engine = kobj_to_engine(kobj); 312 - unsigned long long timeout; 309 + unsigned long long timeout, clamped; 313 310 int err; 314 311 315 312 /* ··· 325 322 if (err) 326 323 return err; 327 324 328 - if (timeout > jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)) 325 + clamped = intel_clamp_preempt_timeout_ms(engine, timeout); 
326 + if (timeout != clamped) 329 327 return -EINVAL; 330 328 331 329 WRITE_ONCE(engine->props.preempt_timeout_ms, timeout); ··· 366 362 const char *buf, size_t count) 367 363 { 368 364 struct intel_engine_cs *engine = kobj_to_engine(kobj); 369 - unsigned long long delay; 365 + unsigned long long delay, clamped; 370 366 int err; 371 367 372 368 /* ··· 383 379 if (err) 384 380 return err; 385 381 386 - if (delay >= jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)) 382 + clamped = intel_clamp_heartbeat_interval_ms(engine, delay); 383 + if (delay != clamped) 387 384 return -EINVAL; 388 385 389 386 err = intel_engine_set_heartbeat(engine, delay);
+1
drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
··· 117 117 INTEL_GUC_ACTION_ENTER_S_STATE = 0x501, 118 118 INTEL_GUC_ACTION_EXIT_S_STATE = 0x502, 119 119 INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506, 120 + INTEL_GUC_ACTION_UPDATE_SCHEDULING_POLICIES_KLV = 0x509, 120 121 INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000, 121 122 INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001, 122 123 INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002,
+9
drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
··· 128 128 SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_TWO = 2, 129 129 }; 130 130 131 + enum slpc_gucrc_mode { 132 + SLPC_GUCRC_MODE_HW = 0, 133 + SLPC_GUCRC_MODE_GUCRC_NO_RC6 = 1, 134 + SLPC_GUCRC_MODE_GUCRC_STATIC_TIMEOUT = 2, 135 + SLPC_GUCRC_MODE_GUCRC_DYNAMIC_HYSTERESIS = 3, 136 + 137 + SLPC_GUCRC_MODE_MAX, 138 + }; 139 + 131 140 enum slpc_event_id { 132 141 SLPC_EVENT_RESET = 0, 133 142 SLPC_EVENT_SHUTDOWN = 1,
+8 -1
drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h
··· 82 82 #define GUC_KLV_SELF_CFG_G2H_CTB_SIZE_LEN 1u 83 83 84 84 /* 85 + * Global scheduling policy update keys. 86 + */ 87 + enum { 88 + GUC_SCHEDULING_POLICIES_KLV_ID_RENDER_COMPUTE_YIELD = 0x1001, 89 + }; 90 + 91 + /* 85 92 * Per context scheduling policy update keys. 86 93 */ 87 - enum { 94 + enum { 88 95 GUC_CONTEXT_POLICIES_KLV_ID_EXECUTION_QUANTUM = 0x2001, 89 96 GUC_CONTEXT_POLICIES_KLV_ID_PREEMPTION_TIMEOUT = 0x2002, 90 97 GUC_CONTEXT_POLICIES_KLV_ID_SCHEDULING_PRIORITY = 0x2003,
+1
drivers/gpu/drm/i915/gt/uc/intel_guc.c
··· 441 441 err_fw: 442 442 intel_uc_fw_fini(&guc->fw); 443 443 out: 444 + intel_uc_fw_change_status(&guc->fw, INTEL_UC_FIRMWARE_INIT_FAIL); 444 445 i915_probe_error(gt->i915, "failed with %d\n", ret); 445 446 return ret; 446 447 }
+16
drivers/gpu/drm/i915/gt/uc/intel_guc.h
··· 113 113 */ 114 114 struct list_head guc_id_list; 115 115 /** 116 + * @guc_ids_in_use: Number single-lrc guc_ids in use 117 + */ 118 + unsigned int guc_ids_in_use; 119 + /** 116 120 * @destroyed_contexts: list of contexts waiting to be destroyed 117 121 * (deregistered with the GuC) 118 122 */ ··· 136 132 * @reset_fail_mask: mask of engines that failed to reset 137 133 */ 138 134 intel_engine_mask_t reset_fail_mask; 135 + /** 136 + * @sched_disable_delay_ms: schedule disable delay, in ms, for 137 + * contexts 138 + */ 139 + unsigned int sched_disable_delay_ms; 140 + /** 141 + * @sched_disable_gucid_threshold: threshold of min remaining available 142 + * guc_ids before we start bypassing the schedule disable delay 143 + */ 144 + unsigned int sched_disable_gucid_threshold; 139 145 } submission_state; 140 146 141 147 /** ··· 479 465 void intel_guc_write_barrier(struct intel_guc *guc); 480 466 481 467 void intel_guc_dump_time_info(struct intel_guc *guc, struct drm_printer *p); 468 + 469 + int intel_guc_sched_disable_gucid_threshold_max(struct intel_guc *guc); 482 470 483 471 #endif
+49 -22
drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
··· 5 5 6 6 #include <linux/bsearch.h> 7 7 8 + #include "gem/i915_gem_lmem.h" 8 9 #include "gt/intel_engine_regs.h" 9 10 #include "gt/intel_gt.h" 10 11 #include "gt/intel_gt_mcr.h" ··· 278 277 return slot; 279 278 } 280 279 281 - #define GUC_REGSET_STEERING(group, instance) ( \ 282 - FIELD_PREP(GUC_REGSET_STEERING_GROUP, (group)) | \ 283 - FIELD_PREP(GUC_REGSET_STEERING_INSTANCE, (instance)) | \ 284 - GUC_REGSET_NEEDS_STEERING \ 285 - ) 286 - 287 280 static long __must_check guc_mmio_reg_add(struct intel_gt *gt, 288 281 struct temp_regset *regset, 289 - i915_reg_t reg, u32 flags) 282 + u32 offset, u32 flags) 290 283 { 291 284 u32 count = regset->storage_used - (regset->registers - regset->storage); 292 - u32 offset = i915_mmio_reg_offset(reg); 293 285 struct guc_mmio_reg entry = { 294 286 .offset = offset, 295 287 .flags = flags, 296 288 }; 297 289 struct guc_mmio_reg *slot; 298 - u8 group, inst; 299 290 300 291 /* 301 292 * The mmio list is built using separate lists within the driver. ··· 298 305 if (bsearch(&entry, regset->registers, count, 299 306 sizeof(entry), guc_mmio_reg_cmp)) 300 307 return 0; 301 - 302 - /* 303 - * The GuC doesn't have a default steering, so we need to explicitly 304 - * steer all registers that need steering. However, we do not keep track 305 - * of all the steering ranges, only of those that have a chance of using 306 - * a non-default steering from the i915 pov. Instead of adding such 307 - * tracking, it is easier to just program the default steering for all 308 - * regs that don't need a non-default one. 309 - */ 310 - intel_gt_mcr_get_nonterminated_steering(gt, reg, &group, &inst); 311 - entry.flags |= GUC_REGSET_STEERING(group, inst); 312 308 313 309 slot = __mmio_reg_add(regset, &entry); 314 310 if (IS_ERR(slot)) ··· 316 334 317 335 #define GUC_MMIO_REG_ADD(gt, regset, reg, masked) \ 318 336 guc_mmio_reg_add(gt, \ 337 + regset, \ 338 + i915_mmio_reg_offset(reg), \ 339 + (masked) ? 
GUC_REGSET_MASKED : 0) 340 + 341 + #define GUC_REGSET_STEERING(group, instance) ( \ 342 + FIELD_PREP(GUC_REGSET_STEERING_GROUP, (group)) | \ 343 + FIELD_PREP(GUC_REGSET_STEERING_INSTANCE, (instance)) | \ 344 + GUC_REGSET_NEEDS_STEERING \ 345 + ) 346 + 347 + static long __must_check guc_mcr_reg_add(struct intel_gt *gt, 348 + struct temp_regset *regset, 349 + i915_mcr_reg_t reg, u32 flags) 350 + { 351 + u8 group, inst; 352 + 353 + /* 354 + * The GuC doesn't have a default steering, so we need to explicitly 355 + * steer all registers that need steering. However, we do not keep track 356 + * of all the steering ranges, only of those that have a chance of using 357 + * a non-default steering from the i915 pov. Instead of adding such 358 + * tracking, it is easier to just program the default steering for all 359 + * regs that don't need a non-default one. 360 + */ 361 + intel_gt_mcr_get_nonterminated_steering(gt, reg, &group, &inst); 362 + flags |= GUC_REGSET_STEERING(group, inst); 363 + 364 + return guc_mmio_reg_add(gt, regset, i915_mmio_reg_offset(reg), flags); 365 + } 366 + 367 + #define GUC_MCR_REG_ADD(gt, regset, reg, masked) \ 368 + guc_mcr_reg_add(gt, \ 319 369 regset, \ 320 370 (reg), \ 321 371 (masked) ? 
GUC_REGSET_MASKED : 0) ··· 386 372 false); 387 373 388 374 /* add in local MOCS registers */ 389 - for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++) 390 - ret |= GUC_MMIO_REG_ADD(gt, regset, GEN9_LNCFCMOCS(i), false); 375 + for (i = 0; i < LNCFCMOCS_REG_COUNT; i++) 376 + if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) 377 + ret |= GUC_MCR_REG_ADD(gt, regset, XEHP_LNCFCMOCS(i), false); 378 + else 379 + ret |= GUC_MMIO_REG_ADD(gt, regset, GEN9_LNCFCMOCS(i), false); 380 + 381 + if (GRAPHICS_VER(engine->i915) >= 12) { 382 + ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL0, false); 383 + ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL1, false); 384 + ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL2, false); 385 + ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL3, false); 386 + ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL4, false); 387 + ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL5, false); 388 + ret |= GUC_MMIO_REG_ADD(gt, regset, EU_PERF_CNTL6, false); 389 + } 391 390 392 391 return ret ? -1 : 0; 393 392 }
+95 -24
drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
··· 169 169 MAKE_REGLIST(default_global_regs, PF, GLOBAL, 0), 170 170 MAKE_REGLIST(default_rc_class_regs, PF, ENGINE_CLASS, GUC_RENDER_CLASS), 171 171 MAKE_REGLIST(xe_lpd_rc_inst_regs, PF, ENGINE_INSTANCE, GUC_RENDER_CLASS), 172 + MAKE_REGLIST(default_rc_class_regs, PF, ENGINE_CLASS, GUC_COMPUTE_CLASS), 173 + MAKE_REGLIST(xe_lpd_rc_inst_regs, PF, ENGINE_INSTANCE, GUC_COMPUTE_CLASS), 172 174 MAKE_REGLIST(empty_regs_list, PF, ENGINE_CLASS, GUC_VIDEO_CLASS), 173 175 MAKE_REGLIST(xe_lpd_vd_inst_regs, PF, ENGINE_INSTANCE, GUC_VIDEO_CLASS), 174 176 MAKE_REGLIST(empty_regs_list, PF, ENGINE_CLASS, GUC_VIDEOENHANCE_CLASS), ··· 184 182 MAKE_REGLIST(xe_lpd_global_regs, PF, GLOBAL, 0), 185 183 MAKE_REGLIST(xe_lpd_rc_class_regs, PF, ENGINE_CLASS, GUC_RENDER_CLASS), 186 184 MAKE_REGLIST(xe_lpd_rc_inst_regs, PF, ENGINE_INSTANCE, GUC_RENDER_CLASS), 185 + MAKE_REGLIST(xe_lpd_rc_class_regs, PF, ENGINE_CLASS, GUC_COMPUTE_CLASS), 186 + MAKE_REGLIST(xe_lpd_rc_inst_regs, PF, ENGINE_INSTANCE, GUC_COMPUTE_CLASS), 187 187 MAKE_REGLIST(empty_regs_list, PF, ENGINE_CLASS, GUC_VIDEO_CLASS), 188 188 MAKE_REGLIST(xe_lpd_vd_inst_regs, PF, ENGINE_INSTANCE, GUC_VIDEO_CLASS), 189 189 MAKE_REGLIST(xe_lpd_vec_class_regs, PF, ENGINE_CLASS, GUC_VIDEOENHANCE_CLASS), ··· 244 240 245 241 struct __ext_steer_reg { 246 242 const char *name; 247 - i915_reg_t reg; 243 + i915_mcr_reg_t reg; 248 244 }; 249 245 250 246 static const struct __ext_steer_reg xe_extregs[] = { 251 - {"GEN7_SAMPLER_INSTDONE", GEN7_SAMPLER_INSTDONE}, 252 - {"GEN7_ROW_INSTDONE", GEN7_ROW_INSTDONE} 247 + {"GEN8_SAMPLER_INSTDONE", GEN8_SAMPLER_INSTDONE}, 248 + {"GEN8_ROW_INSTDONE", GEN8_ROW_INSTDONE} 253 249 }; 254 250 255 251 static void __fill_ext_reg(struct __guc_mmio_reg_descr *ext, 256 252 const struct __ext_steer_reg *extlist, 257 253 int slice_id, int subslice_id) 258 254 { 259 - ext->reg = extlist->reg; 255 + ext->reg = _MMIO(i915_mmio_reg_offset(extlist->reg)); 260 256 ext->flags = FIELD_PREP(GUC_REGSET_STEERING_GROUP, slice_id); 
261 257 ext->flags |= FIELD_PREP(GUC_REGSET_STEERING_INSTANCE, subslice_id); 262 258 ext->regname = extlist->name; ··· 423 419 return default_lists; 424 420 } 425 421 422 + static const char * 423 + __stringify_type(u32 type) 424 + { 425 + switch (type) { 426 + case GUC_CAPTURE_LIST_TYPE_GLOBAL: 427 + return "Global"; 428 + case GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS: 429 + return "Class"; 430 + case GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE: 431 + return "Instance"; 432 + default: 433 + break; 434 + } 435 + 436 + return "unknown"; 437 + } 438 + 439 + static const char * 440 + __stringify_engclass(u32 class) 441 + { 442 + switch (class) { 443 + case GUC_RENDER_CLASS: 444 + return "Render"; 445 + case GUC_VIDEO_CLASS: 446 + return "Video"; 447 + case GUC_VIDEOENHANCE_CLASS: 448 + return "VideoEnhance"; 449 + case GUC_BLITTER_CLASS: 450 + return "Blitter"; 451 + case GUC_COMPUTE_CLASS: 452 + return "Compute"; 453 + default: 454 + break; 455 + } 456 + 457 + return "unknown"; 458 + } 459 + 426 460 static int 427 461 guc_capture_list_init(struct intel_guc *guc, u32 owner, u32 type, u32 classid, 428 462 struct guc_mmio_reg *ptr, u16 num_entries) ··· 524 482 return num_regs; 525 483 } 526 484 527 - int 528 - intel_guc_capture_getlistsize(struct intel_guc *guc, u32 owner, u32 type, u32 classid, 529 - size_t *size) 485 + static int 486 + guc_capture_getlistsize(struct intel_guc *guc, u32 owner, u32 type, u32 classid, 487 + size_t *size, bool is_purpose_est) 530 488 { 531 489 struct intel_guc_state_capture *gc = guc->capture; 490 + struct drm_i915_private *i915 = guc_to_gt(guc)->i915; 532 491 struct __guc_capture_ads_cache *cache = &gc->ads_cache[owner][type][classid]; 533 492 int num_regs; 534 493 535 - if (!gc->reglists) 494 + if (!gc->reglists) { 495 + drm_warn(&i915->drm, "GuC-capture: No reglist on this device\n"); 536 496 return -ENODEV; 497 + } 537 498 538 499 if (cache->is_valid) { 539 500 *size = cache->size; 540 501 return cache->status; 541 502 } 542 503 504 + if 
(!is_purpose_est && owner == GUC_CAPTURE_LIST_INDEX_PF && 505 + !guc_capture_get_one_list(gc->reglists, owner, type, classid)) { 506 + if (type == GUC_CAPTURE_LIST_TYPE_GLOBAL) 507 + drm_warn(&i915->drm, "Missing GuC-Err-Cap reglist Global!\n"); 508 + else 509 + drm_warn(&i915->drm, "Missing GuC-Err-Cap reglist %s(%u):%s(%u)!\n", 510 + __stringify_type(type), type, 511 + __stringify_engclass(classid), classid); 512 + return -ENODATA; 513 + } 514 + 543 515 num_regs = guc_cap_list_num_regs(gc, owner, type, classid); 516 + /* intentional empty lists can exist depending on hw config */ 544 517 if (!num_regs) 545 518 return -ENODATA; 546 519 547 - *size = PAGE_ALIGN((sizeof(struct guc_debug_capture_list)) + 548 - (num_regs * sizeof(struct guc_mmio_reg))); 520 + if (size) 521 + *size = PAGE_ALIGN((sizeof(struct guc_debug_capture_list)) + 522 + (num_regs * sizeof(struct guc_mmio_reg))); 549 523 550 524 return 0; 525 + } 526 + 527 + int 528 + intel_guc_capture_getlistsize(struct intel_guc *guc, u32 owner, u32 type, u32 classid, 529 + size_t *size) 530 + { 531 + return guc_capture_getlistsize(guc, owner, type, classid, size, false); 551 532 } 552 533 553 534 static void guc_capture_create_prealloc_nodes(struct intel_guc *guc); ··· 671 606 struct intel_gt *gt = guc_to_gt(guc); 672 607 struct intel_engine_cs *engine; 673 608 enum intel_engine_id id; 674 - int worst_min_size = 0, num_regs = 0; 609 + int worst_min_size = 0; 675 610 size_t tmp = 0; 676 611 677 612 if (!guc->capture) ··· 692 627 worst_min_size += sizeof(struct guc_state_capture_group_header_t) + 693 628 (3 * sizeof(struct guc_state_capture_header_t)); 694 629 695 - if (!intel_guc_capture_getlistsize(guc, 0, GUC_CAPTURE_LIST_TYPE_GLOBAL, 0, &tmp)) 696 - num_regs += tmp; 630 + if (!guc_capture_getlistsize(guc, 0, GUC_CAPTURE_LIST_TYPE_GLOBAL, 0, &tmp, true)) 631 + worst_min_size += tmp; 697 632 698 - if (!intel_guc_capture_getlistsize(guc, 0, GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS, 699 - engine->class, &tmp)) { 700 - 
num_regs += tmp; 633 + if (!guc_capture_getlistsize(guc, 0, GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS, 634 + engine->class, &tmp, true)) { 635 + worst_min_size += tmp; 701 636 } 702 - if (!intel_guc_capture_getlistsize(guc, 0, GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE, 703 - engine->class, &tmp)) { 704 - num_regs += tmp; 637 + if (!guc_capture_getlistsize(guc, 0, GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE, 638 + engine->class, &tmp, true)) { 639 + worst_min_size += tmp; 705 640 } 706 641 } 707 - 708 - worst_min_size += (num_regs * sizeof(struct guc_mmio_reg)); 709 642 710 643 return worst_min_size; 711 644 } ··· 721 658 int spare_size = min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER; 722 659 u32 buffer_size = intel_guc_log_section_size_capture(&guc->log); 723 660 661 + /* 662 + * NOTE: min_size is much smaller than the capture region allocation (DG2: <80K vs 1MB) 663 + * Additionally, it's based on space needed to fit all engines getting reset at once 664 + * within the same G2H handler task slot. This is very unlikely. However, if GuC really 665 + * does run out of space for whatever reason, we will see a separate warning message 666 + * when processing the G2H event capture-notification, search for: 667 + * INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_NOSPACE.
668 + */ 724 669 if (min_size < 0) 725 670 drm_warn(&i915->drm, "Failed to calculate GuC error state capture buffer minimum size: %d!\n", 726 671 min_size); 727 672 else if (min_size > buffer_size) 728 - drm_warn(&i915->drm, "GuC error state capture buffer is too small: %d < %d\n", 673 + drm_warn(&i915->drm, "GuC error state capture buffer maybe small: %d < %d\n", 729 674 buffer_size, min_size); 730 675 else if (spare_size > buffer_size) 731 - drm_notice(&i915->drm, "GuC error state capture buffer maybe too small: %d < %d (min = %d)\n", 732 - buffer_size, spare_size, min_size); 676 + drm_dbg(&i915->drm, "GuC error state capture buffer lacks spare size: %d < %d (min = %d)\n", 677 + buffer_size, spare_size, min_size); 733 678 } 734 679 735 680 /*
+61
drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
··· 71 71 return intel_guc_slpc_is_used(guc); 72 72 } 73 73 74 + static int guc_sched_disable_delay_ms_get(void *data, u64 *val) 75 + { 76 + struct intel_guc *guc = data; 77 + 78 + if (!intel_guc_submission_is_used(guc)) 79 + return -ENODEV; 80 + 81 + *val = (u64)guc->submission_state.sched_disable_delay_ms; 82 + 83 + return 0; 84 + } 85 + 86 + static int guc_sched_disable_delay_ms_set(void *data, u64 val) 87 + { 88 + struct intel_guc *guc = data; 89 + 90 + if (!intel_guc_submission_is_used(guc)) 91 + return -ENODEV; 92 + 93 + /* clamp to a practical limit, 1 minute is reasonable for a longest delay */ 94 + guc->submission_state.sched_disable_delay_ms = min_t(u64, val, 60000); 95 + 96 + return 0; 97 + } 98 + DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_delay_ms_fops, 99 + guc_sched_disable_delay_ms_get, 100 + guc_sched_disable_delay_ms_set, "%lld\n"); 101 + 102 + static int guc_sched_disable_gucid_threshold_get(void *data, u64 *val) 103 + { 104 + struct intel_guc *guc = data; 105 + 106 + if (!intel_guc_submission_is_used(guc)) 107 + return -ENODEV; 108 + 109 + *val = guc->submission_state.sched_disable_gucid_threshold; 110 + return 0; 111 + } 112 + 113 + static int guc_sched_disable_gucid_threshold_set(void *data, u64 val) 114 + { 115 + struct intel_guc *guc = data; 116 + 117 + if (!intel_guc_submission_is_used(guc)) 118 + return -ENODEV; 119 + 120 + if (val > intel_guc_sched_disable_gucid_threshold_max(guc)) 121 + guc->submission_state.sched_disable_gucid_threshold = 122 + intel_guc_sched_disable_gucid_threshold_max(guc); 123 + else 124 + guc->submission_state.sched_disable_gucid_threshold = val; 125 + 126 + return 0; 127 + } 128 + DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_gucid_threshold_fops, 129 + guc_sched_disable_gucid_threshold_get, 130 + guc_sched_disable_gucid_threshold_set, "%lld\n"); 131 + 74 132 void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) 75 133 { 76 134 static const struct intel_gt_debugfs_file files[] = { 77 135 { 
"guc_info", &guc_info_fops, NULL }, 78 136 { "guc_registered_contexts", &guc_registered_contexts_fops, NULL }, 79 137 { "guc_slpc_info", &guc_slpc_info_fops, &intel_eval_slpc_support}, 138 + { "guc_sched_disable_delay_ms", &guc_sched_disable_delay_ms_fops, NULL }, 139 + { "guc_sched_disable_gucid_threshold", &guc_sched_disable_gucid_threshold_fops, 140 + NULL }, 80 141 }; 81 142 82 143 if (!intel_guc_is_supported(guc))
+8 -4
drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
··· 10 10 */ 11 11 12 12 #include "gt/intel_gt.h" 13 + #include "gt/intel_gt_mcr.h" 13 14 #include "gt/intel_gt_regs.h" 14 15 #include "intel_guc_fw.h" 15 16 #include "i915_drv.h" 16 17 17 - static void guc_prepare_xfer(struct intel_uncore *uncore) 18 + static void guc_prepare_xfer(struct intel_gt *gt) 18 19 { 20 + struct intel_uncore *uncore = gt->uncore; 21 + 19 22 u32 shim_flags = GUC_ENABLE_READ_CACHE_LOGIC | 20 23 GUC_ENABLE_READ_CACHE_FOR_SRAM_DATA | 21 24 GUC_ENABLE_READ_CACHE_FOR_WOPCM_DATA | ··· 38 35 39 36 if (GRAPHICS_VER(uncore->i915) == 9) { 40 37 /* DOP Clock Gating Enable for GuC clocks */ 41 - intel_uncore_rmw(uncore, GEN7_MISCCPCTL, 42 - 0, GEN8_DOP_CLOCK_GATE_GUC_ENABLE); 38 + intel_gt_mcr_multicast_write(gt, GEN8_MISCCPCTL, 39 + GEN8_DOP_CLOCK_GATE_GUC_ENABLE | 40 + intel_gt_mcr_read_any(gt, GEN8_MISCCPCTL)); 43 41 44 42 /* allows for 5us (in 10ns units) before GT can go to RC6 */ 45 43 intel_uncore_write(uncore, GUC_ARAT_C6DIS, 0x1FF); ··· 172 168 struct intel_uncore *uncore = gt->uncore; 173 169 int ret; 174 170 175 - guc_prepare_xfer(uncore); 171 + guc_prepare_xfer(gt); 176 172 177 173 /* 178 174 * Note that GuC needs the CSS header plus uKernel code to be copied
+43
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
··· 290 290 struct guc_klv_generic_dw_t klv[GUC_CONTEXT_POLICIES_KLV_NUM_IDS]; 291 291 } __packed; 292 292 293 + /* Format of the UPDATE_SCHEDULING_POLICIES H2G data packet */ 294 + struct guc_update_scheduling_policy_header { 295 + u32 action; 296 + } __packed; 297 + 298 + /* 299 + * Can't dynamically allocate memory for the scheduling policy KLV because 300 + * it will be sent from within the reset path. Need a fixed size lump on 301 + * the stack instead :(. 302 + * 303 + * Currently, there is only one KLV defined, which has 1 word of KL + 2 words of V. 304 + */ 305 + #define MAX_SCHEDULING_POLICY_SIZE 3 306 + 307 + struct guc_update_scheduling_policy { 308 + struct guc_update_scheduling_policy_header header; 309 + u32 data[MAX_SCHEDULING_POLICY_SIZE]; 310 + } __packed; 311 + 293 312 #define GUC_POWER_UNSPECIFIED 0 294 313 #define GUC_POWER_D0 1 295 314 #define GUC_POWER_D1 2 ··· 317 298 318 299 /* Scheduling policy settings */ 319 300 301 + #define GLOBAL_SCHEDULE_POLICY_RC_YIELD_DURATION 100 /* in ms */ 302 + #define GLOBAL_SCHEDULE_POLICY_RC_YIELD_RATIO 50 /* in percent */ 303 + 320 304 #define GLOBAL_POLICY_MAX_NUM_WI 15 321 305 322 306 /* Don't reset an engine upon preemption failure */ 323 307 #define GLOBAL_POLICY_DISABLE_ENGINE_RESET BIT(0) 324 308 325 309 #define GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US 500000 310 + 311 + /* 312 + * GuC converts the timeout to clock ticks internally. Different platforms have 313 + * different GuC clocks. Thus, the maximum value before overflow is platform 314 + * dependent. Current worst case scenario is about 110s. So, the spec says to 315 + * limit to 100s to be safe.
316 + */ 317 + #define GUC_POLICY_MAX_EXEC_QUANTUM_US (100 * 1000 * 1000UL) 318 + #define GUC_POLICY_MAX_PREEMPT_TIMEOUT_US (100 * 1000 * 1000UL) 319 + 320 + static inline u32 guc_policy_max_exec_quantum_ms(void) 321 + { 322 + BUILD_BUG_ON(GUC_POLICY_MAX_EXEC_QUANTUM_US >= UINT_MAX); 323 + return GUC_POLICY_MAX_EXEC_QUANTUM_US / 1000; 324 + } 325 + 326 + static inline u32 guc_policy_max_preempt_timeout_ms(void) 327 + { 328 + BUILD_BUG_ON(GUC_POLICY_MAX_PREEMPT_TIMEOUT_US >= UINT_MAX); 329 + return GUC_POLICY_MAX_PREEMPT_TIMEOUT_US / 1000; 330 + } 326 331 327 332 struct guc_policies { 328 333 u32 submission_queue_depth[GUC_MAX_ENGINE_CLASSES];
+3 -3
drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
··· 16 16 #if defined(CONFIG_DRM_I915_DEBUG_GUC) 17 17 #define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE SZ_2M 18 18 #define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE SZ_16M 19 - #define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE SZ_4M 19 + #define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE SZ_1M 20 20 #elif defined(CONFIG_DRM_I915_DEBUG_GEM) 21 21 #define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE SZ_1M 22 22 #define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE SZ_2M 23 - #define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE SZ_4M 23 + #define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE SZ_1M 24 24 #else 25 25 #define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE SZ_8K 26 26 #define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE SZ_64K 27 - #define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE SZ_2M 27 + #define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE SZ_1M 28 28 #endif 29 29 30 30 static void guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log);
+103
drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
··· 137 137 return ret > 0 ? -EPROTO : ret; 138 138 } 139 139 140 + static int guc_action_slpc_unset_param(struct intel_guc *guc, u8 id) 141 + { 142 + u32 request[] = { 143 + GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST, 144 + SLPC_EVENT(SLPC_EVENT_PARAMETER_UNSET, 1), 145 + id, 146 + }; 147 + 148 + return intel_guc_send(guc, request, ARRAY_SIZE(request)); 149 + } 150 + 140 151 static bool slpc_is_running(struct intel_guc_slpc *slpc) 141 152 { 142 153 return slpc_get_state(slpc) == SLPC_GLOBAL_STATE_RUNNING; ··· 199 188 id, value, ERR_PTR(ret)); 200 189 201 190 return ret; 191 + } 192 + 193 + static int slpc_unset_param(struct intel_guc_slpc *slpc, u8 id) 194 + { 195 + struct intel_guc *guc = slpc_to_guc(slpc); 196 + 197 + GEM_BUG_ON(id >= SLPC_MAX_PARAM); 198 + 199 + return guc_action_slpc_unset_param(guc, id); 202 200 } 203 201 204 202 static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq) ··· 283 263 284 264 slpc->max_freq_softlimit = 0; 285 265 slpc->min_freq_softlimit = 0; 266 + slpc->min_is_rpmax = false; 286 267 287 268 slpc->boost_freq = 0; 288 269 atomic_set(&slpc->num_waiters, 0); ··· 609 588 return 0; 610 589 } 611 590 591 + static bool is_slpc_min_freq_rpmax(struct intel_guc_slpc *slpc) 592 + { 593 + struct drm_i915_private *i915 = slpc_to_i915(slpc); 594 + int slpc_min_freq; 595 + int ret; 596 + 597 + ret = intel_guc_slpc_get_min_freq(slpc, &slpc_min_freq); 598 + if (ret) { 599 + drm_err(&i915->drm, 600 + "Failed to get min freq: (%d)\n", 601 + ret); 602 + return false; 603 + } 604 + 605 + if (slpc_min_freq == SLPC_MAX_FREQ_MHZ) 606 + return true; 607 + else 608 + return false; 609 + } 610 + 611 + static void update_server_min_softlimit(struct intel_guc_slpc *slpc) 612 + { 613 + /* For server parts, SLPC min will be at RPMax. 614 + * Use min softlimit to clamp it to RP0 instead. 
615 + */ 616 + if (!slpc->min_freq_softlimit && 617 + is_slpc_min_freq_rpmax(slpc)) { 618 + slpc->min_is_rpmax = true; 619 + slpc->min_freq_softlimit = slpc->rp0_freq; 620 + (slpc_to_gt(slpc))->defaults.min_freq = slpc->min_freq_softlimit; 621 + } 622 + } 623 + 612 624 static int slpc_use_fused_rp0(struct intel_guc_slpc *slpc) 613 625 { 614 626 /* Force SLPC to used platform rp0 */ ··· 662 608 663 609 if (!slpc->boost_freq) 664 610 slpc->boost_freq = slpc->rp0_freq; 611 + } 612 + 613 + /** 614 + * intel_guc_slpc_override_gucrc_mode() - override GUCRC mode 615 + * @slpc: pointer to intel_guc_slpc. 616 + * @mode: new value of the mode. 617 + * 618 + * This function will override the GUCRC mode. 619 + * 620 + * Return: 0 on success, non-zero error code on failure. 621 + */ 622 + int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode) 623 + { 624 + int ret; 625 + struct drm_i915_private *i915 = slpc_to_i915(slpc); 626 + intel_wakeref_t wakeref; 627 + 628 + if (mode >= SLPC_GUCRC_MODE_MAX) 629 + return -EINVAL; 630 + 631 + with_intel_runtime_pm(&i915->runtime_pm, wakeref) { 632 + ret = slpc_set_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE, mode); 633 + if (ret) 634 + drm_err(&i915->drm, 635 + "Override gucrc mode %d failed %d\n", 636 + mode, ret); 637 + } 638 + 639 + return ret; 640 + } 641 + 642 + int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc) 643 + { 644 + struct drm_i915_private *i915 = slpc_to_i915(slpc); 645 + intel_wakeref_t wakeref; 646 + int ret = 0; 647 + 648 + with_intel_runtime_pm(&i915->runtime_pm, wakeref) { 649 + ret = slpc_unset_param(slpc, SLPC_PARAM_PWRGATE_RC_MODE); 650 + if (ret) 651 + drm_err(&i915->drm, 652 + "Unsetting gucrc mode failed %d\n", 653 + ret); 654 + } 655 + 656 + return ret; 665 657 } 666 658 667 659 /* ··· 746 646 intel_guc_pm_intrmsk_enable(to_gt(i915)); 747 647 748 648 slpc_get_rp_values(slpc); 649 + 650 + /* Handle the case where min=max=RPmax */ 651 + update_server_min_softlimit(slpc); 749 652 750 
653 /* Set SLPC max limit to RP0 */ 751 654 ret = slpc_use_fused_rp0(slpc);
+4
drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
··· 9 9 #include "intel_guc_submission.h" 10 10 #include "intel_guc_slpc_types.h" 11 11 12 + #define SLPC_MAX_FREQ_MHZ 4250 13 + 12 14 struct intel_gt; 13 15 struct drm_printer; 14 16 ··· 44 42 void intel_guc_pm_intrmsk_enable(struct intel_gt *gt); 45 43 void intel_guc_slpc_boost(struct intel_guc_slpc *slpc); 46 44 void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc); 45 + int intel_guc_slpc_unset_gucrc_mode(struct intel_guc_slpc *slpc); 46 + int intel_guc_slpc_override_gucrc_mode(struct intel_guc_slpc *slpc, u32 mode); 47 47 48 48 #endif
+3
drivers/gpu/drm/i915/gt/uc/intel_guc_slpc_types.h
··· 19 19 bool supported; 20 20 bool selected; 21 21 22 + /* Indicates this is a server part */ 23 + bool min_is_rpmax; 24 + 22 25 /* platform frequency limits */ 23 26 u32 min_freq; 24 27 u32 rp0_freq;
+286 -54
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
··· 6 6 #include <linux/circ_buf.h> 7 7 8 8 #include "gem/i915_gem_context.h" 9 + #include "gem/i915_gem_lmem.h" 9 10 #include "gt/gen8_engine_cs.h" 10 11 #include "gt/intel_breadcrumbs.h" 11 12 #include "gt/intel_context.h" ··· 66 65 * corresponding G2H returns indicating the scheduling disable operation has 67 66 * completed it is safe to unpin the context. While a disable is in flight it 68 67 * isn't safe to resubmit the context so a fence is used to stall all future 69 - * requests of that context until the G2H is returned. 68 + * requests of that context until the G2H is returned. Because this interaction 69 + * with the GuC takes a non-zero amount of time we delay the disabling of 70 + * scheduling after the pin count goes to zero by a configurable period of time 71 + * (see SCHED_DISABLE_DELAY_MS). The thought is this gives the user a window of 72 + * time to resubmit something on the context before doing this costly operation. 73 + * This delay is only done if the context isn't closed and the guc_id usage is 74 + * less than a threshold (see NUM_SCHED_DISABLE_GUC_IDS_THRESHOLD). 70 75 * 71 76 * Context deregistration: 72 77 * Before a context can be destroyed or if we steal its guc_id we must ··· 170 163 #define SCHED_STATE_PENDING_ENABLE BIT(5) 171 164 #define SCHED_STATE_REGISTERED BIT(6) 172 165 #define SCHED_STATE_POLICY_REQUIRED BIT(7) 173 - #define SCHED_STATE_BLOCKED_SHIFT 8 166 + #define SCHED_STATE_CLOSED BIT(8) 167 + #define SCHED_STATE_BLOCKED_SHIFT 9 174 168 #define SCHED_STATE_BLOCKED BIT(SCHED_STATE_BLOCKED_SHIFT) 175 169 #define SCHED_STATE_BLOCKED_MASK (0xfff << SCHED_STATE_BLOCKED_SHIFT) 176 170 ··· 181 173 ce->guc_state.sched_state &= SCHED_STATE_BLOCKED_MASK; 182 174 } 183 175 176 + /* 177 + * Kernel contexts can have SCHED_STATE_REGISTERED after suspend. 178 + * A context close can race with the submission path, so SCHED_STATE_CLOSED 179 + * can be set immediately before we try to register. 
180 + */ 181 + #define SCHED_STATE_VALID_INIT \ 182 + (SCHED_STATE_BLOCKED_MASK | \ 183 + SCHED_STATE_CLOSED | \ 184 + SCHED_STATE_REGISTERED) 185 + 184 186 __maybe_unused 185 187 static bool sched_state_is_init(struct intel_context *ce) 186 188 { 187 - /* Kernel contexts can have SCHED_STATE_REGISTERED after suspend. */ 188 - return !(ce->guc_state.sched_state & 189 - ~(SCHED_STATE_BLOCKED_MASK | SCHED_STATE_REGISTERED)); 189 + return !(ce->guc_state.sched_state & ~SCHED_STATE_VALID_INIT); 190 190 } 191 191 192 192 static inline bool ··· 335 319 ce->guc_state.sched_state &= ~SCHED_STATE_POLICY_REQUIRED; 336 320 } 337 321 322 + static inline bool context_close_done(struct intel_context *ce) 323 + { 324 + return ce->guc_state.sched_state & SCHED_STATE_CLOSED; 325 + } 326 + 327 + static inline void set_context_close_done(struct intel_context *ce) 328 + { 329 + lockdep_assert_held(&ce->guc_state.lock); 330 + ce->guc_state.sched_state |= SCHED_STATE_CLOSED; 331 + } 332 + 338 333 static inline u32 context_blocked(struct intel_context *ce) 339 334 { 340 335 return (ce->guc_state.sched_state & SCHED_STATE_BLOCKED_MASK) >> ··· 368 341 GEM_BUG_ON(!context_blocked(ce)); /* Underflow check */ 369 342 370 343 ce->guc_state.sched_state -= SCHED_STATE_BLOCKED; 371 - } 372 - 373 - static inline bool context_has_committed_requests(struct intel_context *ce) 374 - { 375 - return !!ce->guc_state.number_committed_requests; 376 - } 377 - 378 - static inline void incr_context_committed_requests(struct intel_context *ce) 379 - { 380 - lockdep_assert_held(&ce->guc_state.lock); 381 - ++ce->guc_state.number_committed_requests; 382 - GEM_BUG_ON(ce->guc_state.number_committed_requests < 0); 383 - } 384 - 385 - static inline void decr_context_committed_requests(struct intel_context *ce) 386 - { 387 - lockdep_assert_held(&ce->guc_state.lock); 388 - --ce->guc_state.number_committed_requests; 389 - GEM_BUG_ON(ce->guc_state.number_committed_requests < 0); 390 344 } 391 345 392 346 static struct 
intel_context * ··· 1074 1066 bool do_put = kref_get_unless_zero(&ce->ref); 1075 1067 1076 1068 xa_unlock(&guc->context_lookup); 1069 + 1070 + if (test_bit(CONTEXT_GUC_INIT, &ce->flags) && 1071 + (cancel_delayed_work(&ce->guc_state.sched_disable_delay_work))) { 1072 + /* successful cancel so jump straight to close it */ 1073 + intel_context_sched_disable_unpin(ce); 1074 + } 1077 1075 1078 1076 spin_lock(&ce->guc_state.lock); 1079 1077 ··· 2008 1994 if (unlikely(ret < 0)) 2009 1995 return ret; 2010 1996 1997 + if (!intel_context_is_parent(ce)) 1998 + ++guc->submission_state.guc_ids_in_use; 1999 + 2011 2000 ce->guc_id.id = ret; 2012 2001 return 0; 2013 2002 } ··· 2020 2003 GEM_BUG_ON(intel_context_is_child(ce)); 2021 2004 2022 2005 if (!context_guc_id_invalid(ce)) { 2023 - if (intel_context_is_parent(ce)) 2006 + if (intel_context_is_parent(ce)) { 2024 2007 bitmap_release_region(guc->submission_state.guc_ids_bitmap, 2025 2008 ce->guc_id.id, 2026 2009 order_base_2(ce->parallel.number_children 2027 2010 + 1)); 2028 - else 2011 + } else { 2012 + --guc->submission_state.guc_ids_in_use; 2029 2013 ida_simple_remove(&guc->submission_state.guc_ids, 2030 2014 ce->guc_id.id); 2015 + } 2031 2016 clr_ctx_id_mapping(guc, ce->guc_id.id); 2032 2017 set_context_guc_id_invalid(ce); 2033 2018 } ··· 2448 2429 int ret; 2449 2430 2450 2431 /* NB: For both of these, zero means disabled. */ 2432 + GEM_BUG_ON(overflows_type(engine->props.timeslice_duration_ms * 1000, 2433 + execution_quantum)); 2434 + GEM_BUG_ON(overflows_type(engine->props.preempt_timeout_ms * 1000, 2435 + preemption_timeout)); 2451 2436 execution_quantum = engine->props.timeslice_duration_ms * 1000; 2452 2437 preemption_timeout = engine->props.preempt_timeout_ms * 1000; 2453 2438 ··· 2485 2462 desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE_V69; 2486 2463 2487 2464 /* NB: For both of these, zero means disabled. 
*/ 2465 + GEM_BUG_ON(overflows_type(engine->props.timeslice_duration_ms * 1000, 2466 + desc->execution_quantum)); 2467 + GEM_BUG_ON(overflows_type(engine->props.preempt_timeout_ms * 1000, 2468 + desc->preemption_timeout)); 2488 2469 desc->execution_quantum = engine->props.timeslice_duration_ms * 1000; 2489 2470 desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000; 2490 2471 } ··· 3025 2998 } 3026 2999 } 3027 3000 3028 - static void guc_context_sched_disable(struct intel_context *ce) 3001 + static void do_sched_disable(struct intel_guc *guc, struct intel_context *ce, 3002 + unsigned long flags) 3003 + __releases(ce->guc_state.lock) 3029 3004 { 3030 - struct intel_guc *guc = ce_to_guc(ce); 3031 - unsigned long flags; 3032 3005 struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; 3033 3006 intel_wakeref_t wakeref; 3034 3007 u16 guc_id; 3035 3008 3036 - GEM_BUG_ON(intel_context_is_child(ce)); 3037 - 3038 - spin_lock_irqsave(&ce->guc_state.lock, flags); 3039 - 3040 - /* 3041 - * We have to check if the context has been disabled by another thread, 3042 - * check if submssion has been disabled to seal a race with reset and 3043 - * finally check if any more requests have been committed to the 3044 - * context ensursing that a request doesn't slip through the 3045 - * 'context_pending_disable' fence. 
3046 - */ 3047 - if (unlikely(!context_enabled(ce) || submission_disabled(guc) || 3048 - context_has_committed_requests(ce))) { 3049 - clr_context_enabled(ce); 3050 - spin_unlock_irqrestore(&ce->guc_state.lock, flags); 3051 - goto unpin; 3052 - } 3009 + lockdep_assert_held(&ce->guc_state.lock); 3053 3010 guc_id = prep_context_pending_disable(ce); 3054 3011 3055 3012 spin_unlock_irqrestore(&ce->guc_state.lock, flags); 3056 3013 3057 3014 with_intel_runtime_pm(runtime_pm, wakeref) 3058 3015 __guc_context_sched_disable(guc, ce, guc_id); 3016 + } 3059 3017 3060 - return; 3061 - unpin: 3062 - intel_context_sched_disable_unpin(ce); 3018 + static bool bypass_sched_disable(struct intel_guc *guc, 3019 + struct intel_context *ce) 3020 + { 3021 + lockdep_assert_held(&ce->guc_state.lock); 3022 + GEM_BUG_ON(intel_context_is_child(ce)); 3023 + 3024 + if (submission_disabled(guc) || context_guc_id_invalid(ce) || 3025 + !ctx_id_mapped(guc, ce->guc_id.id)) { 3026 + clr_context_enabled(ce); 3027 + return true; 3028 + } 3029 + 3030 + return !context_enabled(ce); 3031 + } 3032 + 3033 + static void __delay_sched_disable(struct work_struct *wrk) 3034 + { 3035 + struct intel_context *ce = 3036 + container_of(wrk, typeof(*ce), guc_state.sched_disable_delay_work.work); 3037 + struct intel_guc *guc = ce_to_guc(ce); 3038 + unsigned long flags; 3039 + 3040 + spin_lock_irqsave(&ce->guc_state.lock, flags); 3041 + 3042 + if (bypass_sched_disable(guc, ce)) { 3043 + spin_unlock_irqrestore(&ce->guc_state.lock, flags); 3044 + intel_context_sched_disable_unpin(ce); 3045 + } else { 3046 + do_sched_disable(guc, ce, flags); 3047 + } 3048 + } 3049 + 3050 + static bool guc_id_pressure(struct intel_guc *guc, struct intel_context *ce) 3051 + { 3052 + /* 3053 + * parent contexts are perma-pinned, if we are unpinning do schedule 3054 + * disable immediately. 
3055 + */ 3056 + if (intel_context_is_parent(ce)) 3057 + return true; 3058 + 3059 + /* 3060 + * If we are beyond the threshold for avail guc_ids, do schedule disable immediately. 3061 + */ 3062 + return guc->submission_state.guc_ids_in_use > 3063 + guc->submission_state.sched_disable_gucid_threshold; 3064 + } 3065 + 3066 + static void guc_context_sched_disable(struct intel_context *ce) 3067 + { 3068 + struct intel_guc *guc = ce_to_guc(ce); 3069 + u64 delay = guc->submission_state.sched_disable_delay_ms; 3070 + unsigned long flags; 3071 + 3072 + spin_lock_irqsave(&ce->guc_state.lock, flags); 3073 + 3074 + if (bypass_sched_disable(guc, ce)) { 3075 + spin_unlock_irqrestore(&ce->guc_state.lock, flags); 3076 + intel_context_sched_disable_unpin(ce); 3077 + } else if (!intel_context_is_closed(ce) && !guc_id_pressure(guc, ce) && 3078 + delay) { 3079 + spin_unlock_irqrestore(&ce->guc_state.lock, flags); 3080 + mod_delayed_work(system_unbound_wq, 3081 + &ce->guc_state.sched_disable_delay_work, 3082 + msecs_to_jiffies(delay)); 3083 + } else { 3084 + do_sched_disable(guc, ce, flags); 3085 + } 3086 + } 3087 + 3088 + static void guc_context_close(struct intel_context *ce) 3089 + { 3090 + unsigned long flags; 3091 + 3092 + if (test_bit(CONTEXT_GUC_INIT, &ce->flags) && 3093 + cancel_delayed_work(&ce->guc_state.sched_disable_delay_work)) 3094 + __delay_sched_disable(&ce->guc_state.sched_disable_delay_work.work); 3095 + 3096 + spin_lock_irqsave(&ce->guc_state.lock, flags); 3097 + set_context_close_done(ce); 3098 + spin_unlock_irqrestore(&ce->guc_state.lock, flags); 3063 3099 } 3064 3100 3065 3101 static inline void guc_lrc_desc_unpin(struct intel_context *ce) ··· 3161 3071 ce->guc_state.prio_count[GUC_CLIENT_PRIORITY_HIGH] || 3162 3072 ce->guc_state.prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] || 3163 3073 ce->guc_state.prio_count[GUC_CLIENT_PRIORITY_NORMAL]); 3164 - GEM_BUG_ON(ce->guc_state.number_committed_requests); 3165 3074 3166 3075 lrc_fini(ce); 3167 3076 
intel_context_fini(ce); ··· 3429 3340 3430 3341 guc_prio_fini(rq, ce); 3431 3342 3432 - decr_context_committed_requests(ce); 3433 - 3434 3343 spin_unlock_irq(&ce->guc_state.lock); 3435 3344 3436 3345 atomic_dec(&ce->guc_id.ref); ··· 3437 3350 3438 3351 static const struct intel_context_ops guc_context_ops = { 3439 3352 .alloc = guc_context_alloc, 3353 + 3354 + .close = guc_context_close, 3440 3355 3441 3356 .pre_pin = guc_context_pre_pin, 3442 3357 .pin = guc_context_pin, ··· 3522 3433 rcu_read_unlock(); 3523 3434 3524 3435 ce->guc_state.prio = map_i915_prio_to_guc_prio(prio); 3436 + 3437 + INIT_DELAYED_WORK(&ce->guc_state.sched_disable_delay_work, 3438 + __delay_sched_disable); 3439 + 3525 3440 set_bit(CONTEXT_GUC_INIT, &ce->flags); 3526 3441 } 3527 3442 ··· 3563 3470 if (unlikely(!test_bit(CONTEXT_GUC_INIT, &ce->flags))) 3564 3471 guc_context_init(ce); 3565 3472 3473 + /* 3474 + * If the context gets closed while the execbuf is ongoing, the context 3475 + * close code will race with the below code to cancel the delayed work. 3476 + * If the context close wins the race and cancels the work, it will 3477 + * immediately call the sched disable (see guc_context_close), so there 3478 + * is a chance we can get past this check while the sched_disable code 3479 + * is being executed. To make sure that code completes before we check 3480 + * the status further down, we wait for the close process to complete. 3481 + * Else, this code path could send a request down thinking that the 3482 + * context is still in a schedule-enable mode while the GuC ends up 3483 + * dropping the request completely because the disable did go from the 3484 + * context_close path right to GuC just prior. In the event the CT is 3485 + * full, we could potentially need to wait up to 1.5 seconds. 
3486 + */ 3487 + if (cancel_delayed_work_sync(&ce->guc_state.sched_disable_delay_work)) 3488 + intel_context_sched_disable_unpin(ce); 3489 + else if (intel_context_is_closed(ce)) 3490 + if (wait_for(context_close_done(ce), 1500)) 3491 + drm_warn(&guc_to_gt(guc)->i915->drm, 3492 + "timed out waiting on context sched close before realloc\n"); 3566 3493 /* 3567 3494 * Call pin_guc_id here rather than in the pinning step as with 3568 3495 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the ··· 3637 3524 3638 3525 list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences); 3639 3526 } 3640 - incr_context_committed_requests(ce); 3641 3527 spin_unlock_irqrestore(&ce->guc_state.lock, flags); 3642 3528 3643 3529 return 0; ··· 3711 3599 3712 3600 static const struct intel_context_ops virtual_guc_context_ops = { 3713 3601 .alloc = guc_virtual_context_alloc, 3602 + 3603 + .close = guc_context_close, 3714 3604 3715 3605 .pre_pin = guc_virtual_context_pre_pin, 3716 3606 .pin = guc_virtual_context_pin, ··· 3802 3688 3803 3689 static const struct intel_context_ops virtual_parent_context_ops = { 3804 3690 .alloc = guc_virtual_context_alloc, 3691 + 3692 + .close = guc_context_close, 3805 3693 3806 3694 .pre_pin = guc_context_pre_pin, 3807 3695 .pin = guc_parent_context_pin, ··· 4209 4093 4210 4094 engine->emit_bb_start = gen8_emit_bb_start; 4211 4095 if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) 4212 - engine->emit_bb_start = gen125_emit_bb_start; 4096 + engine->emit_bb_start = xehp_emit_bb_start; 4213 4097 } 4214 4098 4215 4099 static void rcs_submission_override(struct intel_engine_cs *engine) ··· 4293 4177 return 0; 4294 4178 } 4295 4179 4180 + struct scheduling_policy { 4181 + /* internal data */ 4182 + u32 max_words, num_words; 4183 + u32 count; 4184 + /* API data */ 4185 + struct guc_update_scheduling_policy h2g; 4186 + }; 4187 + 4188 + static u32 __guc_scheduling_policy_action_size(struct scheduling_policy *policy) 4189 + { 4190 + u32 *start = (void 
*)&policy->h2g; 4191 + u32 *end = policy->h2g.data + policy->num_words; 4192 + size_t delta = end - start; 4193 + 4194 + return delta; 4195 + } 4196 + 4197 + static struct scheduling_policy *__guc_scheduling_policy_start_klv(struct scheduling_policy *policy) 4198 + { 4199 + policy->h2g.header.action = INTEL_GUC_ACTION_UPDATE_SCHEDULING_POLICIES_KLV; 4200 + policy->max_words = ARRAY_SIZE(policy->h2g.data); 4201 + policy->num_words = 0; 4202 + policy->count = 0; 4203 + 4204 + return policy; 4205 + } 4206 + 4207 + static void __guc_scheduling_policy_add_klv(struct scheduling_policy *policy, 4208 + u32 action, u32 *data, u32 len) 4209 + { 4210 + u32 *klv_ptr = policy->h2g.data + policy->num_words; 4211 + 4212 + GEM_BUG_ON((policy->num_words + 1 + len) > policy->max_words); 4213 + *(klv_ptr++) = FIELD_PREP(GUC_KLV_0_KEY, action) | 4214 + FIELD_PREP(GUC_KLV_0_LEN, len); 4215 + memcpy(klv_ptr, data, sizeof(u32) * len); 4216 + policy->num_words += 1 + len; 4217 + policy->count++; 4218 + } 4219 + 4220 + static int __guc_action_set_scheduling_policies(struct intel_guc *guc, 4221 + struct scheduling_policy *policy) 4222 + { 4223 + int ret; 4224 + 4225 + ret = intel_guc_send(guc, (u32 *)&policy->h2g, 4226 + __guc_scheduling_policy_action_size(policy)); 4227 + if (ret < 0) 4228 + return ret; 4229 + 4230 + if (ret != policy->count) { 4231 + drm_warn(&guc_to_gt(guc)->i915->drm, "GuC global scheduler policy processed %d of %d KLVs!", 4232 + ret, policy->count); 4233 + if (ret > policy->count) 4234 + return -EPROTO; 4235 + } 4236 + 4237 + return 0; 4238 + } 4239 + 4240 + static int guc_init_global_schedule_policy(struct intel_guc *guc) 4241 + { 4242 + struct scheduling_policy policy; 4243 + struct intel_gt *gt = guc_to_gt(guc); 4244 + intel_wakeref_t wakeref; 4245 + int ret = 0; 4246 + 4247 + if (GET_UC_VER(guc) < MAKE_UC_VER(70, 3, 0)) 4248 + return 0; 4249 + 4250 + __guc_scheduling_policy_start_klv(&policy); 4251 + 4252 + with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref) { 
4253 + u32 yield[] = { 4254 + GLOBAL_SCHEDULE_POLICY_RC_YIELD_DURATION, 4255 + GLOBAL_SCHEDULE_POLICY_RC_YIELD_RATIO, 4256 + }; 4257 + 4258 + __guc_scheduling_policy_add_klv(&policy, 4259 + GUC_SCHEDULING_POLICIES_KLV_ID_RENDER_COMPUTE_YIELD, 4260 + yield, ARRAY_SIZE(yield)); 4261 + 4262 + ret = __guc_action_set_scheduling_policies(guc, &policy); 4263 + if (ret) 4264 + i915_probe_error(gt->i915, 4265 + "Failed to configure global scheduling policies: %pe!\n", 4266 + ERR_PTR(ret)); 4267 + } 4268 + 4269 + return ret; 4270 + } 4271 + 4296 4272 void intel_guc_submission_enable(struct intel_guc *guc) 4297 4273 { 4298 4274 struct intel_gt *gt = guc_to_gt(guc); ··· 4397 4189 4398 4190 guc_init_lrc_mapping(guc); 4399 4191 guc_init_engine_stats(guc); 4192 + guc_init_global_schedule_policy(guc); 4400 4193 } 4401 4194 4402 4195 void intel_guc_submission_disable(struct intel_guc *guc) ··· 4428 4219 return i915->params.enable_guc & ENABLE_GUC_SUBMISSION; 4429 4220 } 4430 4221 4222 + int intel_guc_sched_disable_gucid_threshold_max(struct intel_guc *guc) 4223 + { 4224 + return guc->submission_state.num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc); 4225 + } 4226 + 4227 + /* 4228 + * This default value of 33 millisecs (+1 millisec rounded up) ensures 30fps or higher 4229 + * workloads are able to enjoy the latency reduction when delaying the schedule-disable 4230 + * operation. This matches the 30fps game-render + encode (real world) workload this 4231 + * knob was tested against. 4232 + */ 4233 + #define SCHED_DISABLE_DELAY_MS 34 4234 + 4235 + /* 4236 + * A threshold of 75% is a reasonable starting point considering that real world apps 4237 + * generally don't get anywhere near this.
4238 + */ 4239 + #define NUM_SCHED_DISABLE_GUCIDS_DEFAULT_THRESHOLD(__guc) \ 4240 + (((intel_guc_sched_disable_gucid_threshold_max(guc)) * 3) / 4) 4241 + 4431 4242 void intel_guc_submission_init_early(struct intel_guc *guc) 4432 4243 { 4433 4244 xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ); ··· 4464 4235 spin_lock_init(&guc->timestamp.lock); 4465 4236 INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping); 4466 4237 4238 + guc->submission_state.sched_disable_delay_ms = SCHED_DISABLE_DELAY_MS; 4467 4239 guc->submission_state.num_guc_ids = GUC_MAX_CONTEXT_ID; 4240 + guc->submission_state.sched_disable_gucid_threshold = 4241 + NUM_SCHED_DISABLE_GUCIDS_DEFAULT_THRESHOLD(guc); 4468 4242 guc->submission_supported = __guc_submission_supported(guc); 4469 4243 guc->submission_selected = __guc_submission_selected(guc); 4470 4244 }
+241 -21
drivers/gpu/drm/i915/gt/uc/intel_huc.c
··· 10 10 #include "intel_huc.h" 11 11 #include "i915_drv.h" 12 12 13 + #include <linux/device/bus.h> 14 + #include <linux/mei_aux.h> 15 + 13 16 /** 14 17 * DOC: HuC 15 18 * ··· 45 42 * HuC-specific commands. 46 43 */ 47 44 45 + /* 46 + * MEI-GSC load is an async process. The probing of the exposed aux device 47 + * (see intel_gsc.c) usually happens a few seconds after i915 probe, depending 48 + * on when the kernel schedules it. Unless something goes terribly wrong, we're 49 + * guaranteed for this to happen during boot, so the big timeout is a safety net 50 + * that we never expect to need. 51 + * MEI-PXP + HuC load usually takes ~300ms, but if the GSC needs to be resumed 52 + * and/or reset, this can take longer. Note that the kernel might schedule 53 + * other work between the i915 init/resume and the MEI one, which can add to 54 + * the delay. 55 + */ 56 + #define GSC_INIT_TIMEOUT_MS 10000 57 + #define PXP_INIT_TIMEOUT_MS 5000 58 + 59 + static int sw_fence_dummy_notify(struct i915_sw_fence *sf, 60 + enum i915_sw_fence_notify state) 61 + { 62 + return NOTIFY_DONE; 63 + } 64 + 65 + static void __delayed_huc_load_complete(struct intel_huc *huc) 66 + { 67 + if (!i915_sw_fence_done(&huc->delayed_load.fence)) 68 + i915_sw_fence_complete(&huc->delayed_load.fence); 69 + } 70 + 71 + static void delayed_huc_load_complete(struct intel_huc *huc) 72 + { 73 + hrtimer_cancel(&huc->delayed_load.timer); 74 + __delayed_huc_load_complete(huc); 75 + } 76 + 77 + static void __gsc_init_error(struct intel_huc *huc) 78 + { 79 + huc->delayed_load.status = INTEL_HUC_DELAYED_LOAD_ERROR; 80 + __delayed_huc_load_complete(huc); 81 + } 82 + 83 + static void gsc_init_error(struct intel_huc *huc) 84 + { 85 + hrtimer_cancel(&huc->delayed_load.timer); 86 + __gsc_init_error(huc); 87 + } 88 + 89 + static void gsc_init_done(struct intel_huc *huc) 90 + { 91 + hrtimer_cancel(&huc->delayed_load.timer); 92 + 93 + /* MEI-GSC init is done, now we wait for MEI-PXP to bind */ 94 + huc->delayed_load.status 
= INTEL_HUC_WAITING_ON_PXP; 95 + if (!i915_sw_fence_done(&huc->delayed_load.fence)) 96 + hrtimer_start(&huc->delayed_load.timer, 97 + ms_to_ktime(PXP_INIT_TIMEOUT_MS), 98 + HRTIMER_MODE_REL); 99 + } 100 + 101 + static enum hrtimer_restart huc_delayed_load_timer_callback(struct hrtimer *hrtimer) 102 + { 103 + struct intel_huc *huc = container_of(hrtimer, struct intel_huc, delayed_load.timer); 104 + 105 + if (!intel_huc_is_authenticated(huc)) { 106 + if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_GSC) 107 + drm_notice(&huc_to_gt(huc)->i915->drm, 108 + "timed out waiting for MEI GSC init to load HuC\n"); 109 + else if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_PXP) 110 + drm_notice(&huc_to_gt(huc)->i915->drm, 111 + "timed out waiting for MEI PXP init to load HuC\n"); 112 + else 113 + MISSING_CASE(huc->delayed_load.status); 114 + 115 + __gsc_init_error(huc); 116 + } 117 + 118 + return HRTIMER_NORESTART; 119 + } 120 + 121 + static void huc_delayed_load_start(struct intel_huc *huc) 122 + { 123 + ktime_t delay; 124 + 125 + GEM_BUG_ON(intel_huc_is_authenticated(huc)); 126 + 127 + /* 128 + * On resume we don't have to wait for MEI-GSC to be re-probed, but we 129 + * do need to wait for MEI-PXP to reset & re-bind 130 + */ 131 + switch (huc->delayed_load.status) { 132 + case INTEL_HUC_WAITING_ON_GSC: 133 + delay = ms_to_ktime(GSC_INIT_TIMEOUT_MS); 134 + break; 135 + case INTEL_HUC_WAITING_ON_PXP: 136 + delay = ms_to_ktime(PXP_INIT_TIMEOUT_MS); 137 + break; 138 + default: 139 + gsc_init_error(huc); 140 + return; 141 + } 142 + 143 + /* 144 + * This fence is always complete unless we're waiting for the 145 + * GSC device to come up to load the HuC. We arm the fence here 146 + * and complete it when we confirm that the HuC is loaded from 147 + * the PXP bind callback. 
148 + */ 149 + GEM_BUG_ON(!i915_sw_fence_done(&huc->delayed_load.fence)); 150 + i915_sw_fence_fini(&huc->delayed_load.fence); 151 + i915_sw_fence_reinit(&huc->delayed_load.fence); 152 + i915_sw_fence_await(&huc->delayed_load.fence); 153 + i915_sw_fence_commit(&huc->delayed_load.fence); 154 + 155 + hrtimer_start(&huc->delayed_load.timer, delay, HRTIMER_MODE_REL); 156 + } 157 + 158 + static int gsc_notifier(struct notifier_block *nb, unsigned long action, void *data) 159 + { 160 + struct device *dev = data; 161 + struct intel_huc *huc = container_of(nb, struct intel_huc, delayed_load.nb); 162 + struct intel_gsc_intf *intf = &huc_to_gt(huc)->gsc.intf[0]; 163 + 164 + if (!intf->adev || &intf->adev->aux_dev.dev != dev) 165 + return 0; 166 + 167 + switch (action) { 168 + case BUS_NOTIFY_BOUND_DRIVER: /* mei driver bound to aux device */ 169 + gsc_init_done(huc); 170 + break; 171 + 172 + case BUS_NOTIFY_DRIVER_NOT_BOUND: /* mei driver fails to be bound */ 173 + case BUS_NOTIFY_UNBIND_DRIVER: /* mei driver about to be unbound */ 174 + drm_info(&huc_to_gt(huc)->i915->drm, 175 + "mei driver not bound, disabling HuC load\n"); 176 + gsc_init_error(huc); 177 + break; 178 + } 179 + 180 + return 0; 181 + } 182 + 183 + void intel_huc_register_gsc_notifier(struct intel_huc *huc, struct bus_type *bus) 184 + { 185 + int ret; 186 + 187 + if (!intel_huc_is_loaded_by_gsc(huc)) 188 + return; 189 + 190 + huc->delayed_load.nb.notifier_call = gsc_notifier; 191 + ret = bus_register_notifier(bus, &huc->delayed_load.nb); 192 + if (ret) { 193 + drm_err(&huc_to_gt(huc)->i915->drm, 194 + "failed to register GSC notifier\n"); 195 + huc->delayed_load.nb.notifier_call = NULL; 196 + gsc_init_error(huc); 197 + } 198 + } 199 + 200 + void intel_huc_unregister_gsc_notifier(struct intel_huc *huc, struct bus_type *bus) 201 + { 202 + if (!huc->delayed_load.nb.notifier_call) 203 + return; 204 + 205 + delayed_huc_load_complete(huc); 206 + 207 + bus_unregister_notifier(bus, &huc->delayed_load.nb); 208 + 
huc->delayed_load.nb.notifier_call = NULL; 209 + } 210 + 48 211 void intel_huc_init_early(struct intel_huc *huc) 49 212 { 50 213 struct drm_i915_private *i915 = huc_to_gt(huc)->i915; ··· 226 57 huc->status.mask = HUC_FW_VERIFIED; 227 58 huc->status.value = HUC_FW_VERIFIED; 228 59 } 60 + 61 + /* 62 + * Initialize fence to be complete as this is expected to be complete 63 + * unless there is a delayed HuC reload in progress. 64 + */ 65 + i915_sw_fence_init(&huc->delayed_load.fence, 66 + sw_fence_dummy_notify); 67 + i915_sw_fence_commit(&huc->delayed_load.fence); 68 + 69 + hrtimer_init(&huc->delayed_load.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); 70 + huc->delayed_load.timer.function = huc_delayed_load_timer_callback; 229 71 } 230 72 231 73 #define HUC_LOAD_MODE_STRING(x) (x ? "GSC" : "legacy") ··· 293 113 return 0; 294 114 295 115 out: 116 + intel_uc_fw_change_status(&huc->fw, INTEL_UC_FIRMWARE_INIT_FAIL); 296 117 drm_info(&i915->drm, "HuC init failed with %d\n", err); 297 118 return err; 298 119 } ··· 303 122 if (!intel_uc_fw_is_loadable(&huc->fw)) 304 123 return; 305 124 125 + delayed_huc_load_complete(huc); 126 + 127 + i915_sw_fence_fini(&huc->delayed_load.fence); 306 128 intel_uc_fw_fini(&huc->fw); 129 + } 130 + 131 + void intel_huc_suspend(struct intel_huc *huc) 132 + { 133 + if (!intel_uc_fw_is_loadable(&huc->fw)) 134 + return; 135 + 136 + /* 137 + * in the unlikely case that we're suspending before the GSC has 138 + * completed its loading sequence, just stop waiting. We'll restart 139 + * on resume. 
140 + */ 141 + delayed_huc_load_complete(huc); 142 + } 143 + 144 + int intel_huc_wait_for_auth_complete(struct intel_huc *huc) 145 + { 146 + struct intel_gt *gt = huc_to_gt(huc); 147 + int ret; 148 + 149 + ret = __intel_wait_for_register(gt->uncore, 150 + huc->status.reg, 151 + huc->status.mask, 152 + huc->status.value, 153 + 2, 50, NULL); 154 + 155 + /* mark the load process as complete even if the wait failed */ 156 + delayed_huc_load_complete(huc); 157 + 158 + if (ret) { 159 + drm_err(&gt->i915->drm, "HuC: Firmware not verified %d\n", ret); 160 + intel_uc_fw_change_status(&huc->fw, INTEL_UC_FIRMWARE_LOAD_FAIL); 161 + return ret; 162 + } 163 + 164 + intel_uc_fw_change_status(&huc->fw, INTEL_UC_FIRMWARE_RUNNING); 165 + drm_info(&gt->i915->drm, "HuC authenticated\n"); 166 + return 0; 307 167 } 308 168 309 169 /** ··· 383 161 } 384 162 385 163 /* Check authentication status, it should be done by now */ 386 - ret = __intel_wait_for_register(gt->uncore, 387 - huc->status.reg, 388 - huc->status.mask, 389 - huc->status.value, 390 - 2, 50, NULL); 391 - if (ret) { 392 - DRM_ERROR("HuC: Firmware not verified %d\n", ret); 164 + ret = intel_huc_wait_for_auth_complete(huc); 165 + if (ret) 393 166 goto fail; 394 - } 395 167 396 - intel_uc_fw_change_status(&huc->fw, INTEL_UC_FIRMWARE_RUNNING); 397 - drm_info(&gt->i915->drm, "HuC authenticated\n"); 398 168 return 0; 399 169 400 170 fail: 401 171 i915_probe_error(gt->i915, "HuC: Authentication failed %d\n", ret); 402 - intel_uc_fw_change_status(&huc->fw, INTEL_UC_FIRMWARE_LOAD_FAIL); 403 172 return ret; 404 173 } 405 174 406 - static bool huc_is_authenticated(struct intel_huc *huc) 175 + bool intel_huc_is_authenticated(struct intel_huc *huc) 407 176 { 408 177 struct intel_gt *gt = huc_to_gt(huc); 409 178 intel_wakeref_t wakeref; ··· 413 200 * This function reads status register to verify if HuC 414 201 * firmware was successfully loaded. 
415 202 * 416 - * Returns: 417 - * * -ENODEV if HuC is not present on this platform, 418 - * * -EOPNOTSUPP if HuC firmware is disabled, 419 - * * -ENOPKG if HuC firmware was not installed, 420 - * * -ENOEXEC if HuC firmware is invalid or mismatched, 421 - * * 0 if HuC firmware is not running, 422 - * * 1 if HuC firmware is authenticated and running. 203 + * The return values match what is expected for the I915_PARAM_HUC_STATUS 204 + * getparam. 423 205 */ 424 206 int intel_huc_check_status(struct intel_huc *huc) 425 207 { ··· 427 219 return -ENOPKG; 428 220 case INTEL_UC_FIRMWARE_ERROR: 429 221 return -ENOEXEC; 222 + case INTEL_UC_FIRMWARE_INIT_FAIL: 223 + return -ENOMEM; 224 + case INTEL_UC_FIRMWARE_LOAD_FAIL: 225 + return -EIO; 430 226 default: 431 227 break; 432 228 } 433 229 434 - return huc_is_authenticated(huc); 230 + return intel_huc_is_authenticated(huc); 231 + } 232 + 233 + static bool huc_has_delayed_load(struct intel_huc *huc) 234 + { 235 + return intel_huc_is_loaded_by_gsc(huc) && 236 + (huc->delayed_load.status != INTEL_HUC_DELAYED_LOAD_ERROR); 435 237 } 436 238 437 239 void intel_huc_update_auth_status(struct intel_huc *huc) ··· 449 231 if (!intel_uc_fw_is_loadable(&huc->fw)) 450 232 return; 451 233 452 - if (huc_is_authenticated(huc)) 234 + if (intel_huc_is_authenticated(huc)) 453 235 intel_uc_fw_change_status(&huc->fw, 454 236 INTEL_UC_FIRMWARE_RUNNING); 237 + else if (huc_has_delayed_load(huc)) 238 + huc_delayed_load_start(huc); 455 239 } 456 240 457 241 /**
+31
drivers/gpu/drm/i915/gt/uc/intel_huc.h
··· 7 7 #define _INTEL_HUC_H_ 8 8 9 9 #include "i915_reg_defs.h" 10 + #include "i915_sw_fence.h" 10 11 #include "intel_uc_fw.h" 11 12 #include "intel_huc_fw.h" 13 + 14 + #include <linux/notifier.h> 15 + #include <linux/hrtimer.h> 16 + 17 + struct bus_type; 18 + 19 + enum intel_huc_delayed_load_status { 20 + INTEL_HUC_WAITING_ON_GSC = 0, 21 + INTEL_HUC_WAITING_ON_PXP, 22 + INTEL_HUC_DELAYED_LOAD_ERROR, 23 + }; 12 24 13 25 struct intel_huc { 14 26 /* Generic uC firmware management */ ··· 32 20 u32 mask; 33 21 u32 value; 34 22 } status; 23 + 24 + struct { 25 + struct i915_sw_fence fence; 26 + struct hrtimer timer; 27 + struct notifier_block nb; 28 + enum intel_huc_delayed_load_status status; 29 + } delayed_load; 35 30 }; 36 31 37 32 void intel_huc_init_early(struct intel_huc *huc); 38 33 int intel_huc_init(struct intel_huc *huc); 39 34 void intel_huc_fini(struct intel_huc *huc); 35 + void intel_huc_suspend(struct intel_huc *huc); 40 36 int intel_huc_auth(struct intel_huc *huc); 37 + int intel_huc_wait_for_auth_complete(struct intel_huc *huc); 41 38 int intel_huc_check_status(struct intel_huc *huc); 42 39 void intel_huc_update_auth_status(struct intel_huc *huc); 40 + bool intel_huc_is_authenticated(struct intel_huc *huc); 41 + 42 + void intel_huc_register_gsc_notifier(struct intel_huc *huc, struct bus_type *bus); 43 + void intel_huc_unregister_gsc_notifier(struct intel_huc *huc, struct bus_type *bus); 43 44 44 45 static inline int intel_huc_sanitize(struct intel_huc *huc) 45 46 { ··· 79 54 static inline bool intel_huc_is_loaded_by_gsc(const struct intel_huc *huc) 80 55 { 81 56 return huc->fw.loaded_via_gsc; 57 + } 58 + 59 + static inline bool intel_huc_wait_required(struct intel_huc *huc) 60 + { 61 + return intel_huc_is_used(huc) && intel_huc_is_loaded_by_gsc(huc) && 62 + !intel_huc_is_authenticated(huc); 82 63 } 83 64 84 65 void intel_huc_load_status(struct intel_huc *huc, struct drm_printer *p);
+34
drivers/gpu/drm/i915/gt/uc/intel_huc_fw.c
··· 3 3 * Copyright © 2014-2019 Intel Corporation 4 4 */ 5 5 6 + #include "gt/intel_gsc.h" 6 7 #include "gt/intel_gt.h" 8 + #include "intel_huc.h" 7 9 #include "intel_huc_fw.h" 8 10 #include "i915_drv.h" 11 + #include "pxp/intel_pxp_huc.h" 12 + 13 + int intel_huc_fw_load_and_auth_via_gsc(struct intel_huc *huc) 14 + { 15 + int ret; 16 + 17 + if (!intel_huc_is_loaded_by_gsc(huc)) 18 + return -ENODEV; 19 + 20 + if (!intel_uc_fw_is_loadable(&huc->fw)) 21 + return -ENOEXEC; 22 + 23 + /* 24 + * If we abort a suspend, HuC might still be loaded when the mei 25 + * component gets re-bound and this function called again. If so, just 26 + * mark the HuC as loaded. 27 + */ 28 + if (intel_huc_is_authenticated(huc)) { 29 + intel_uc_fw_change_status(&huc->fw, INTEL_UC_FIRMWARE_RUNNING); 30 + return 0; 31 + } 32 + 33 + GEM_WARN_ON(intel_uc_fw_is_loaded(&huc->fw)); 34 + 35 + ret = intel_pxp_huc_load_and_auth(&huc_to_gt(huc)->pxp); 36 + if (ret) 37 + return ret; 38 + 39 + intel_uc_fw_change_status(&huc->fw, INTEL_UC_FIRMWARE_TRANSFERRED); 40 + 41 + return intel_huc_wait_for_auth_complete(huc); 42 + } 9 43 10 44 /** 11 45 * intel_huc_fw_upload() - load HuC uCode to device via DMA transfer
+1
drivers/gpu/drm/i915/gt/uc/intel_huc_fw.h
··· 8 8 9 9 struct intel_huc; 10 10 11 + int intel_huc_fw_load_and_auth_via_gsc(struct intel_huc *huc); 11 12 int intel_huc_fw_upload(struct intel_huc *huc); 12 13 13 14 #endif
+16 -8
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
··· 93 93 fw_def(BROXTON, 0, guc_mmp(bxt, 70, 1, 1)) \ 94 94 fw_def(SKYLAKE, 0, guc_mmp(skl, 70, 1, 1)) 95 95 96 - #define INTEL_HUC_FIRMWARE_DEFS(fw_def, huc_raw, huc_mmp) \ 96 + #define INTEL_HUC_FIRMWARE_DEFS(fw_def, huc_raw, huc_mmp, huc_gsc) \ 97 + fw_def(DG2, 0, huc_gsc(dg2)) \ 97 98 fw_def(ALDERLAKE_P, 0, huc_raw(tgl)) \ 98 99 fw_def(ALDERLAKE_P, 0, huc_mmp(tgl, 7, 9, 3)) \ 99 100 fw_def(ALDERLAKE_S, 0, huc_raw(tgl)) \ ··· 142 141 #define MAKE_HUC_FW_PATH_BLANK(prefix_) \ 143 142 __MAKE_UC_FW_PATH_BLANK(prefix_, "_huc") 144 143 144 + #define MAKE_HUC_FW_PATH_GSC(prefix_) \ 145 + __MAKE_UC_FW_PATH_BLANK(prefix_, "_huc_gsc") 146 + 145 147 #define MAKE_HUC_FW_PATH_MMP(prefix_, major_, minor_, patch_) \ 146 148 __MAKE_UC_FW_PATH_MMP(prefix_, "_huc_", major_, minor_, patch_) 147 149 ··· 157 153 MODULE_FIRMWARE(uc_); 158 154 159 155 INTEL_GUC_FIRMWARE_DEFS(INTEL_UC_MODULE_FW, MAKE_GUC_FW_PATH_MAJOR, MAKE_GUC_FW_PATH_MMP) 160 - INTEL_HUC_FIRMWARE_DEFS(INTEL_UC_MODULE_FW, MAKE_HUC_FW_PATH_BLANK, MAKE_HUC_FW_PATH_MMP) 156 + INTEL_HUC_FIRMWARE_DEFS(INTEL_UC_MODULE_FW, MAKE_HUC_FW_PATH_BLANK, MAKE_HUC_FW_PATH_MMP, MAKE_HUC_FW_PATH_GSC) 161 157 162 158 /* 163 159 * The next expansion of the table macros (in __uc_fw_auto_select below) provides ··· 172 168 u8 major; 173 169 u8 minor; 174 170 u8 patch; 171 + bool loaded_via_gsc; 175 172 }; 176 173 177 174 #define UC_FW_BLOB_BASE(major_, minor_, patch_, path_) \ ··· 181 176 .patch = patch_, \ 182 177 .path = path_, 183 178 184 - #define UC_FW_BLOB_NEW(major_, minor_, patch_, path_) \ 179 + #define UC_FW_BLOB_NEW(major_, minor_, patch_, gsc_, path_) \ 185 180 { UC_FW_BLOB_BASE(major_, minor_, patch_, path_) \ 186 - .legacy = false } 181 + .legacy = false, .loaded_via_gsc = gsc_ } 187 182 188 183 #define UC_FW_BLOB_OLD(major_, minor_, patch_, path_) \ 189 184 { UC_FW_BLOB_BASE(major_, minor_, patch_, path_) \ 190 185 .legacy = true } 191 186 192 187 #define GUC_FW_BLOB(prefix_, major_, minor_) \ 193 - UC_FW_BLOB_NEW(major_, 
minor_, 0, \ 188 + UC_FW_BLOB_NEW(major_, minor_, 0, false, \ 194 189 MAKE_GUC_FW_PATH_MAJOR(prefix_, major_, minor_)) 195 190 196 191 #define GUC_FW_BLOB_MMP(prefix_, major_, minor_, patch_) \ ··· 198 193 MAKE_GUC_FW_PATH_MMP(prefix_, major_, minor_, patch_)) 199 194 200 195 #define HUC_FW_BLOB(prefix_) \ 201 - UC_FW_BLOB_NEW(0, 0, 0, MAKE_HUC_FW_PATH_BLANK(prefix_)) 196 + UC_FW_BLOB_NEW(0, 0, 0, false, MAKE_HUC_FW_PATH_BLANK(prefix_)) 202 197 203 198 #define HUC_FW_BLOB_MMP(prefix_, major_, minor_, patch_) \ 204 199 UC_FW_BLOB_OLD(major_, minor_, patch_, \ 205 200 MAKE_HUC_FW_PATH_MMP(prefix_, major_, minor_, patch_)) 201 + 202 + #define HUC_FW_BLOB_GSC(prefix_) \ 203 + UC_FW_BLOB_NEW(0, 0, 0, true, MAKE_HUC_FW_PATH_GSC(prefix_)) 206 204 207 205 struct __packed uc_fw_platform_requirement { 208 206 enum intel_platform p; ··· 232 224 INTEL_GUC_FIRMWARE_DEFS(MAKE_FW_LIST, GUC_FW_BLOB, GUC_FW_BLOB_MMP) 233 225 }; 234 226 static const struct uc_fw_platform_requirement blobs_huc[] = { 235 - INTEL_HUC_FIRMWARE_DEFS(MAKE_FW_LIST, HUC_FW_BLOB, HUC_FW_BLOB_MMP) 227 + INTEL_HUC_FIRMWARE_DEFS(MAKE_FW_LIST, HUC_FW_BLOB, HUC_FW_BLOB_MMP, HUC_FW_BLOB_GSC) 236 228 }; 237 229 static const struct fw_blobs_by_type blobs_all[INTEL_UC_FW_NUM_TYPES] = { 238 230 [INTEL_UC_FW_TYPE_GUC] = { blobs_guc, ARRAY_SIZE(blobs_guc) }, ··· 280 272 uc_fw->file_wanted.path = blob->path; 281 273 uc_fw->file_wanted.major_ver = blob->major; 282 274 uc_fw->file_wanted.minor_ver = blob->minor; 275 + uc_fw->loaded_via_gsc = blob->loaded_via_gsc; 283 276 found = true; 284 277 break; 285 278 } ··· 913 904 out_unpin: 914 905 i915_gem_object_unpin_pages(uc_fw->obj); 915 906 out: 916 - intel_uc_fw_change_status(uc_fw, INTEL_UC_FIRMWARE_INIT_FAIL); 917 907 return err; 918 908 } 919 909
+2 -2
drivers/gpu/drm/i915/gvt/cfg_space.c
··· 354 354 memset(vgpu_cfg_space(vgpu) + INTEL_GVT_PCI_OPREGION, 0, 4); 355 355 356 356 vgpu->cfg_space.bar[INTEL_GVT_PCI_BAR_GTTMMIO].size = 357 - pci_resource_len(pdev, GTTMMADR_BAR); 357 + pci_resource_len(pdev, GEN4_GTTMMADR_BAR); 358 358 vgpu->cfg_space.bar[INTEL_GVT_PCI_BAR_APERTURE].size = 359 - pci_resource_len(pdev, GTT_APERTURE_BAR); 359 + pci_resource_len(pdev, GEN4_GMADR_BAR); 360 360 361 361 memset(vgpu_cfg_space(vgpu) + PCI_ROM_ADDRESS, 0, 4); 362 362
+2 -2
drivers/gpu/drm/i915/gvt/handlers.c
··· 734 734 _MMIO(0x770c), 735 735 _MMIO(0x83a8), 736 736 _MMIO(0xb110), 737 - GEN8_L3SQCREG4,//_MMIO(0xb118) 737 + _MMIO(0xb118), 738 738 _MMIO(0xe100), 739 739 _MMIO(0xe18c), 740 740 _MMIO(0xe48c), ··· 2257 2257 MMIO_DFH(_MMIO(0x2438), D_ALL, F_CMD_ACCESS, NULL, NULL); 2258 2258 MMIO_DFH(_MMIO(0x243c), D_ALL, F_CMD_ACCESS, NULL, NULL); 2259 2259 MMIO_DFH(_MMIO(0x7018), D_ALL, F_MODE_MASK | F_CMD_ACCESS, NULL, NULL); 2260 - MMIO_DFH(HALF_SLICE_CHICKEN3, D_ALL, F_MODE_MASK | F_CMD_ACCESS, NULL, NULL); 2260 + MMIO_DFH(HSW_HALF_SLICE_CHICKEN3, D_ALL, F_MODE_MASK | F_CMD_ACCESS, NULL, NULL); 2261 2261 MMIO_DFH(GEN7_HALF_SLICE_CHICKEN1, D_ALL, F_MODE_MASK | F_CMD_ACCESS, NULL, NULL); 2262 2262 2263 2263 /* display */
+7 -7
drivers/gpu/drm/i915/gvt/mmio_context.c
··· 106 106 {RCS0, GEN8_CS_CHICKEN1, 0xffff, true}, /* 0x2580 */ 107 107 {RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */ 108 108 {RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */ 109 - {RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */ 110 - {RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */ 109 + {RCS0, _MMIO(0xb118), 0, false}, /* GEN8_L3SQCREG4 */ 110 + {RCS0, _MMIO(0xb11c), 0, false}, /* GEN9_SCRATCH1 */ 111 111 {RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */ 112 112 {RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */ 113 - {RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */ 114 - {RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */ 115 - {RCS0, GEN9_HALF_SLICE_CHICKEN5, 0xffff, true}, /* 0xe188 */ 116 - {RCS0, GEN9_HALF_SLICE_CHICKEN7, 0xffff, true}, /* 0xe194 */ 117 - {RCS0, GEN8_ROW_CHICKEN, 0xffff, true}, /* 0xe4f0 */ 113 + {RCS0, _MMIO(0xe180), 0xffff, true}, /* HALF_SLICE_CHICKEN2 */ 114 + {RCS0, _MMIO(0xe184), 0xffff, true}, /* GEN8_HALF_SLICE_CHICKEN3 */ 115 + {RCS0, _MMIO(0xe188), 0xffff, true}, /* GEN9_HALF_SLICE_CHICKEN5 */ 116 + {RCS0, _MMIO(0xe194), 0xffff, true}, /* GEN9_HALF_SLICE_CHICKEN7 */ 117 + {RCS0, _MMIO(0xe4f0), 0xffff, true}, /* GEN8_ROW_CHICKEN */ 118 118 {RCS0, TRVATTL3PTRDW(0), 0, true}, /* 0x4de0 */ 119 119 {RCS0, TRVATTL3PTRDW(1), 0, true}, /* 0x4de4 */ 120 120 {RCS0, TRNULLDETCT, 0, true}, /* 0x4de8 */
+7 -1
drivers/gpu/drm/i915/i915_driver.c
··· 81 81 #include "i915_drm_client.h" 82 82 #include "i915_drv.h" 83 83 #include "i915_getparam.h" 84 + #include "i915_hwmon.h" 84 85 #include "i915_ioc32.h" 85 86 #include "i915_ioctl.h" 86 87 #include "i915_irq.h" ··· 765 764 for_each_gt(gt, dev_priv, i) 766 765 intel_gt_driver_register(gt); 767 766 767 + i915_hwmon_register(dev_priv); 768 + 768 769 intel_display_driver_register(dev_priv); 769 770 770 771 intel_power_domains_enable(dev_priv); ··· 798 795 799 796 for_each_gt(gt, dev_priv, i) 800 797 intel_gt_driver_unregister(gt); 798 + 799 + i915_hwmon_unregister(dev_priv); 801 800 802 801 i915_perf_unregister(dev_priv); 803 802 i915_pmu_unregister(dev_priv); ··· 1661 1656 1662 1657 intel_runtime_pm_enable_interrupts(dev_priv); 1663 1658 1664 - intel_gt_runtime_resume(to_gt(dev_priv)); 1659 + for_each_gt(gt, dev_priv, i) 1660 + intel_gt_runtime_resume(gt); 1665 1661 1666 1662 enable_rpm_wakeref_asserts(rpm); 1667 1663
+10 -20
drivers/gpu/drm/i915/i915_drv.h
··· 40 40 #include "display/intel_display_core.h" 41 41 42 42 #include "gem/i915_gem_context_types.h" 43 - #include "gem/i915_gem_lmem.h" 44 43 #include "gem/i915_gem_shrinker.h" 45 44 #include "gem/i915_gem_stolen.h" 46 45 ··· 348 349 struct intel_runtime_pm runtime_pm; 349 350 350 351 struct i915_perf perf; 352 + 353 + struct i915_hwmon *hwmon; 351 354 352 355 /* Abstract the submission mechanism (legacy ringbuffer or execlists) away */ 353 356 struct intel_gt gt0; ··· 899 898 #define HAS_RUNTIME_PM(dev_priv) (INTEL_INFO(dev_priv)->has_runtime_pm) 900 899 #define HAS_64BIT_RELOC(dev_priv) (INTEL_INFO(dev_priv)->has_64bit_reloc) 901 900 901 + #define HAS_OA_BPC_REPORTING(dev_priv) \ 902 + (INTEL_INFO(dev_priv)->has_oa_bpc_reporting) 903 + #define HAS_OA_SLICE_CONTRIB_LIMITS(dev_priv) \ 904 + (INTEL_INFO(dev_priv)->has_oa_slice_contrib_limits) 905 + 902 906 /* 903 907 * Set this flag, when platform requires 64K GTT page sizes or larger for 904 908 * device local memory access. 905 909 */ 906 910 #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages) 907 - 908 - /* 909 - * Set this flag when platform doesn't allow both 64k pages and 4k pages in 910 - * the same PT. this flag means we need to support compact PT layout for the 911 - * ppGTT when using the 64K GTT pages. 
912 - */ 913 - #define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt) 914 911 915 912 #define HAS_IPC(dev_priv) (INTEL_INFO(dev_priv)->display.has_ipc) 916 913 ··· 975 976 976 977 #define HAS_ONE_EU_PER_FUSE_BIT(i915) (INTEL_INFO(i915)->has_one_eu_per_fuse_bit) 977 978 979 + #define HAS_LMEMBAR_SMEM_STOLEN(i915) (!HAS_LMEM(i915) && \ 980 + GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)) 981 + 978 982 /* intel_device_info.c */ 979 983 static inline struct intel_device_info * 980 984 mkwrite_device_info(struct drm_i915_private *dev_priv) 981 985 { 982 986 return (struct intel_device_info *)INTEL_INFO(dev_priv); 983 - } 984 - 985 - static inline enum i915_map_type 986 - i915_coherent_map_type(struct drm_i915_private *i915, 987 - struct drm_i915_gem_object *obj, bool always_coherent) 988 - { 989 - if (i915_gem_object_is_lmem(obj)) 990 - return I915_MAP_WC; 991 - if (HAS_LLC(i915) || always_coherent) 992 - return I915_MAP_WB; 993 - else 994 - return I915_MAP_WC; 995 987 } 996 988 997 989 #endif
+32 -17
drivers/gpu/drm/i915/i915_gem.c
··· 843 843 __i915_gem_object_release_mmap_gtt(obj); 844 844 845 845 list_for_each_entry_safe(obj, on, 846 - &to_gt(i915)->lmem_userfault_list, userfault_link) 846 + &i915->runtime_pm.lmem_userfault_list, userfault_link) 847 847 i915_gem_object_runtime_pm_release_mmap_offset(obj); 848 848 849 849 /* ··· 1128 1128 1129 1129 int i915_gem_init(struct drm_i915_private *dev_priv) 1130 1130 { 1131 + struct intel_gt *gt; 1132 + unsigned int i; 1131 1133 int ret; 1132 1134 1133 1135 /* We need to fallback to 4K pages if host doesn't support huge gtt. */ ··· 1160 1158 */ 1161 1159 intel_init_clock_gating(dev_priv); 1162 1160 1163 - ret = intel_gt_init(to_gt(dev_priv)); 1164 - if (ret) 1165 - goto err_unlock; 1161 + for_each_gt(gt, dev_priv, i) { 1162 + ret = intel_gt_init(gt); 1163 + if (ret) 1164 + goto err_unlock; 1165 + } 1166 1166 1167 1167 return 0; 1168 1168 ··· 1177 1173 err_unlock: 1178 1174 i915_gem_drain_workqueue(dev_priv); 1179 1175 1180 - if (ret != -EIO) 1181 - intel_uc_cleanup_firmwares(&to_gt(dev_priv)->uc); 1176 + if (ret != -EIO) { 1177 + for_each_gt(gt, dev_priv, i) { 1178 + intel_gt_driver_remove(gt); 1179 + intel_gt_driver_release(gt); 1180 + intel_uc_cleanup_firmwares(&gt->uc); 1181 + } 1182 + } 1182 1183 1183 1184 if (ret == -EIO) { 1184 1185 /* ··· 1191 1182 * as wedged. But we only want to do this when the GPU is angry, 1192 1183 * for all other failure, such as an allocation failure, bail. 
1193 1184 */ 1194 - if (!intel_gt_is_wedged(to_gt(dev_priv))) { 1195 - i915_probe_error(dev_priv, 1196 - "Failed to initialize GPU, declaring it wedged!\n"); 1197 - intel_gt_set_wedged(to_gt(dev_priv)); 1185 + for_each_gt(gt, dev_priv, i) { 1186 + if (!intel_gt_is_wedged(gt)) { 1187 + i915_probe_error(dev_priv, 1188 + "Failed to initialize GPU, declaring it wedged!\n"); 1189 + intel_gt_set_wedged(gt); 1190 + } 1198 1191 } 1199 1192 1200 1193 /* Minimal basic recovery for KMS */ ··· 1224 1213 1225 1214 void i915_gem_driver_remove(struct drm_i915_private *dev_priv) 1226 1215 { 1227 - intel_wakeref_auto_fini(&to_gt(dev_priv)->userfault_wakeref); 1216 + struct intel_gt *gt; 1217 + unsigned int i; 1228 1218 1229 1219 i915_gem_suspend_late(dev_priv); 1230 - intel_gt_driver_remove(to_gt(dev_priv)); 1220 + for_each_gt(gt, dev_priv, i) 1221 + intel_gt_driver_remove(gt); 1231 1222 dev_priv->uabi_engines = RB_ROOT; 1232 1223 1233 1224 /* Flush any outstanding unpin_work. */ 1234 1225 i915_gem_drain_workqueue(dev_priv); 1235 - 1236 - i915_gem_drain_freed_objects(dev_priv); 1237 1226 } 1238 1227 1239 1228 void i915_gem_driver_release(struct drm_i915_private *dev_priv) 1240 1229 { 1241 - intel_gt_driver_release(to_gt(dev_priv)); 1230 + struct intel_gt *gt; 1231 + unsigned int i; 1242 1232 1243 - intel_uc_cleanup_firmwares(&to_gt(dev_priv)->uc); 1233 + for_each_gt(gt, dev_priv, i) { 1234 + intel_gt_driver_release(gt); 1235 + intel_uc_cleanup_firmwares(&gt->uc); 1236 + } 1244 1237 1245 1238 /* Flush any outstanding work, including i915_gem_context.release_work. */ 1246 1239 i915_gem_drain_workqueue(dev_priv); ··· 1274 1259 1275 1260 void i915_gem_cleanup_early(struct drm_i915_private *dev_priv) 1276 1261 { 1277 - i915_gem_drain_freed_objects(dev_priv); 1262 + i915_gem_drain_workqueue(dev_priv); 1278 1263 GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list)); 1279 1264 GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count)); 1280 1265 drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
+3
drivers/gpu/drm/i915/i915_getparam.c
··· 175 175 case I915_PARAM_PERF_REVISION: 176 176 value = i915_perf_ioctl_version(); 177 177 break; 178 + case I915_PARAM_OA_TIMESTAMP_FREQUENCY: 179 + value = i915_perf_oa_timestamp_frequency(i915); 180 + break; 178 181 default: 179 182 DRM_DEBUG("Unknown parameter %d\n", param->param); 180 183 return -EINVAL;
+10 -2
drivers/gpu/drm/i915/i915_gpu_error.c
··· 1221 1221 if (GRAPHICS_VER(i915) >= 6) { 1222 1222 ee->rc_psmi = ENGINE_READ(engine, RING_PSMI_CTL); 1223 1223 1224 - if (GRAPHICS_VER(i915) >= 12) 1224 + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) 1225 + ee->fault_reg = intel_gt_mcr_read_any(engine->gt, 1226 + XEHP_RING_FAULT_REG); 1227 + else if (GRAPHICS_VER(i915) >= 12) 1225 1228 ee->fault_reg = intel_uncore_read(engine->uncore, 1226 1229 GEN12_RING_FAULT_REG); 1227 1230 else if (GRAPHICS_VER(i915) >= 8) ··· 1823 1820 if (GRAPHICS_VER(i915) == 7) 1824 1821 gt->err_int = intel_uncore_read(uncore, GEN7_ERR_INT); 1825 1822 1826 - if (GRAPHICS_VER(i915) >= 12) { 1823 + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) { 1824 + gt->fault_data0 = intel_gt_mcr_read_any((struct intel_gt *)gt->_gt, 1825 + XEHP_FAULT_TLB_DATA0); 1826 + gt->fault_data1 = intel_gt_mcr_read_any((struct intel_gt *)gt->_gt, 1827 + XEHP_FAULT_TLB_DATA1); 1828 + } else if (GRAPHICS_VER(i915) >= 12) { 1827 1829 gt->fault_data0 = intel_uncore_read(uncore, 1828 1830 GEN12_FAULT_TLB_DATA0); 1829 1831 gt->fault_data1 = intel_uncore_read(uncore,
+732
drivers/gpu/drm/i915/i915_hwmon.c
··· 1 + // SPDX-License-Identifier: MIT 2 + /* 3 + * Copyright © 2022 Intel Corporation 4 + */ 5 + 6 + #include <linux/hwmon.h> 7 + #include <linux/hwmon-sysfs.h> 8 + #include <linux/types.h> 9 + 10 + #include "i915_drv.h" 11 + #include "i915_hwmon.h" 12 + #include "i915_reg.h" 13 + #include "intel_mchbar_regs.h" 14 + #include "intel_pcode.h" 15 + #include "gt/intel_gt.h" 16 + #include "gt/intel_gt_regs.h" 17 + 18 + /* 19 + * SF_* - scale factors for particular quantities according to hwmon spec. 20 + * - voltage - millivolts 21 + * - power - microwatts 22 + * - curr - milliamperes 23 + * - energy - microjoules 24 + * - time - milliseconds 25 + */ 26 + #define SF_VOLTAGE 1000 27 + #define SF_POWER 1000000 28 + #define SF_CURR 1000 29 + #define SF_ENERGY 1000000 30 + #define SF_TIME 1000 31 + 32 + struct hwm_reg { 33 + i915_reg_t gt_perf_status; 34 + i915_reg_t pkg_power_sku_unit; 35 + i915_reg_t pkg_power_sku; 36 + i915_reg_t pkg_rapl_limit; 37 + i915_reg_t energy_status_all; 38 + i915_reg_t energy_status_tile; 39 + }; 40 + 41 + struct hwm_energy_info { 42 + u32 reg_val_prev; 43 + long accum_energy; /* Accumulated energy for energy1_input */ 44 + }; 45 + 46 + struct hwm_drvdata { 47 + struct i915_hwmon *hwmon; 48 + struct intel_uncore *uncore; 49 + struct device *hwmon_dev; 50 + struct hwm_energy_info ei; /* Energy info for energy1_input */ 51 + char name[12]; 52 + int gt_n; 53 + }; 54 + 55 + struct i915_hwmon { 56 + struct hwm_drvdata ddat; 57 + struct hwm_drvdata ddat_gt[I915_MAX_GT]; 58 + struct mutex hwmon_lock; /* counter overflow logic and rmw */ 59 + struct hwm_reg rg; 60 + int scl_shift_power; 61 + int scl_shift_energy; 62 + int scl_shift_time; 63 + }; 64 + 65 + static void 66 + hwm_locked_with_pm_intel_uncore_rmw(struct hwm_drvdata *ddat, 67 + i915_reg_t reg, u32 clear, u32 set) 68 + { 69 + struct i915_hwmon *hwmon = ddat->hwmon; 70 + struct intel_uncore *uncore = ddat->uncore; 71 + intel_wakeref_t wakeref; 72 + 73 + mutex_lock(&hwmon->hwmon_lock); 74 + 75 
+ with_intel_runtime_pm(uncore->rpm, wakeref) 76 + intel_uncore_rmw(uncore, reg, clear, set); 77 + 78 + mutex_unlock(&hwmon->hwmon_lock); 79 + } 80 + 81 + /* 82 + * This function's return type of u64 allows for the case where the scaling 83 + * of the field taken from the 32-bit register value might cause a result to 84 + * exceed 32 bits. 85 + */ 86 + static u64 87 + hwm_field_read_and_scale(struct hwm_drvdata *ddat, i915_reg_t rgadr, 88 + u32 field_msk, int nshift, u32 scale_factor) 89 + { 90 + struct intel_uncore *uncore = ddat->uncore; 91 + intel_wakeref_t wakeref; 92 + u32 reg_value; 93 + 94 + with_intel_runtime_pm(uncore->rpm, wakeref) 95 + reg_value = intel_uncore_read(uncore, rgadr); 96 + 97 + reg_value = REG_FIELD_GET(field_msk, reg_value); 98 + 99 + return mul_u64_u32_shr(reg_value, scale_factor, nshift); 100 + } 101 + 102 + static void 103 + hwm_field_scale_and_write(struct hwm_drvdata *ddat, i915_reg_t rgadr, 104 + int nshift, unsigned int scale_factor, long lval) 105 + { 106 + u32 nval; 107 + 108 + /* Computation in 64-bits to avoid overflow. Round to nearest. */ 109 + nval = DIV_ROUND_CLOSEST_ULL((u64)lval << nshift, scale_factor); 110 + 111 + hwm_locked_with_pm_intel_uncore_rmw(ddat, rgadr, 112 + PKG_PWR_LIM_1, 113 + REG_FIELD_PREP(PKG_PWR_LIM_1, nval)); 114 + } 115 + 116 + /* 117 + * hwm_energy - Obtain energy value 118 + * 119 + * The underlying energy hardware register is 32-bits and is subject to 120 + * overflow. How long before overflow? For example, with an example 121 + * scaling bit shift of 14 bits (see register *PACKAGE_POWER_SKU_UNIT) and 122 + * a power draw of 1000 watts, the 32-bit counter will overflow in 123 + * approximately 4.36 minutes. 
124 + * 125 + * Examples: 126 + * 1 watt: (2^32 >> 14) / 1 W / (60 * 60 * 24) secs/day -> 3 days 127 + * 1000 watts: (2^32 >> 14) / 1000 W / 60 secs/min -> 4.36 minutes 128 + * 129 + * The function significantly increases overflow duration (from 4.36 130 + * minutes) by accumulating the energy register into a 'long' as allowed by 131 + * the hwmon API. Using x86_64 128 bit arithmetic (see mul_u64_u32_shr()), 132 + * a 'long' of 63 bits, SF_ENERGY of 1e6 (~20 bits) and 133 + * hwmon->scl_shift_energy of 14 bits we have 57 (63 - 20 + 14) bits before 134 + * energy1_input overflows. This at 1000 W is an overflow duration of 278 years. 135 + */ 136 + static void 137 + hwm_energy(struct hwm_drvdata *ddat, long *energy) 138 + { 139 + struct intel_uncore *uncore = ddat->uncore; 140 + struct i915_hwmon *hwmon = ddat->hwmon; 141 + struct hwm_energy_info *ei = &ddat->ei; 142 + intel_wakeref_t wakeref; 143 + i915_reg_t rgaddr; 144 + u32 reg_val; 145 + 146 + if (ddat->gt_n >= 0) 147 + rgaddr = hwmon->rg.energy_status_tile; 148 + else 149 + rgaddr = hwmon->rg.energy_status_all; 150 + 151 + mutex_lock(&hwmon->hwmon_lock); 152 + 153 + with_intel_runtime_pm(uncore->rpm, wakeref) 154 + reg_val = intel_uncore_read(uncore, rgaddr); 155 + 156 + if (reg_val >= ei->reg_val_prev) 157 + ei->accum_energy += reg_val - ei->reg_val_prev; 158 + else 159 + ei->accum_energy += UINT_MAX - ei->reg_val_prev + reg_val; 160 + ei->reg_val_prev = reg_val; 161 + 162 + *energy = mul_u64_u32_shr(ei->accum_energy, SF_ENERGY, 163 + hwmon->scl_shift_energy); 164 + mutex_unlock(&hwmon->hwmon_lock); 165 + } 166 + 167 + static ssize_t 168 + hwm_power1_max_interval_show(struct device *dev, struct device_attribute *attr, 169 + char *buf) 170 + { 171 + struct hwm_drvdata *ddat = dev_get_drvdata(dev); 172 + struct i915_hwmon *hwmon = ddat->hwmon; 173 + intel_wakeref_t wakeref; 174 + u32 r, x, y, x_w = 2; /* 2 bits */ 175 + u64 tau4, out; 176 + 177 + with_intel_runtime_pm(ddat->uncore->rpm, wakeref) 178 + r = 
intel_uncore_read(ddat->uncore, hwmon->rg.pkg_rapl_limit); 179 + 180 + x = REG_FIELD_GET(PKG_PWR_LIM_1_TIME_X, r); 181 + y = REG_FIELD_GET(PKG_PWR_LIM_1_TIME_Y, r); 182 + /* 183 + * tau = 1.x * power(2,y), x = bits(23:22), y = bits(21:17) 184 + * = (4 | x) << (y - 2) 185 + * where (y - 2) ensures a 1.x fixed point representation of 1.x 186 + * However because y can be < 2, we compute 187 + * tau4 = (4 | x) << y 188 + * but add 2 when doing the final right shift to account for units 189 + */ 190 + tau4 = ((1 << x_w) | x) << y; 191 + /* val in hwmon interface units (millisec) */ 192 + out = mul_u64_u32_shr(tau4, SF_TIME, hwmon->scl_shift_time + x_w); 193 + 194 + return sysfs_emit(buf, "%llu\n", out); 195 + } 196 + 197 + static ssize_t 198 + hwm_power1_max_interval_store(struct device *dev, 199 + struct device_attribute *attr, 200 + const char *buf, size_t count) 201 + { 202 + struct hwm_drvdata *ddat = dev_get_drvdata(dev); 203 + struct i915_hwmon *hwmon = ddat->hwmon; 204 + u32 x, y, rxy, x_w = 2; /* 2 bits */ 205 + u64 tau4, r, max_win; 206 + unsigned long val; 207 + int ret; 208 + 209 + ret = kstrtoul(buf, 0, &val); 210 + if (ret) 211 + return ret; 212 + 213 + /* 214 + * Max HW supported tau in '1.x * power(2,y)' format, x = 0, y = 0x12 215 + * The hwmon->scl_shift_time default of 0xa results in a max tau of 256 seconds 216 + */ 217 + #define PKG_MAX_WIN_DEFAULT 0x12ull 218 + 219 + /* 220 + * val must be < max in hwmon interface units. 
The steps below are 221 + * explained in i915_power1_max_interval_show() 222 + */ 223 + r = FIELD_PREP(PKG_MAX_WIN, PKG_MAX_WIN_DEFAULT); 224 + x = REG_FIELD_GET(PKG_MAX_WIN_X, r); 225 + y = REG_FIELD_GET(PKG_MAX_WIN_Y, r); 226 + tau4 = ((1 << x_w) | x) << y; 227 + max_win = mul_u64_u32_shr(tau4, SF_TIME, hwmon->scl_shift_time + x_w); 228 + 229 + if (val > max_win) 230 + return -EINVAL; 231 + 232 + /* val in hw units */ 233 + val = DIV_ROUND_CLOSEST_ULL((u64)val << hwmon->scl_shift_time, SF_TIME); 234 + /* Convert to 1.x * power(2,y) */ 235 + if (!val) 236 + return -EINVAL; 237 + y = ilog2(val); 238 + /* x = (val - (1 << y)) >> (y - 2); */ 239 + x = (val - (1ul << y)) << x_w >> y; 240 + 241 + rxy = REG_FIELD_PREP(PKG_PWR_LIM_1_TIME_X, x) | REG_FIELD_PREP(PKG_PWR_LIM_1_TIME_Y, y); 242 + 243 + hwm_locked_with_pm_intel_uncore_rmw(ddat, hwmon->rg.pkg_rapl_limit, 244 + PKG_PWR_LIM_1_TIME, rxy); 245 + return count; 246 + } 247 + 248 + static SENSOR_DEVICE_ATTR(power1_max_interval, 0664, 249 + hwm_power1_max_interval_show, 250 + hwm_power1_max_interval_store, 0); 251 + 252 + static struct attribute *hwm_attributes[] = { 253 + &sensor_dev_attr_power1_max_interval.dev_attr.attr, 254 + NULL 255 + }; 256 + 257 + static umode_t hwm_attributes_visible(struct kobject *kobj, 258 + struct attribute *attr, int index) 259 + { 260 + struct device *dev = kobj_to_dev(kobj); 261 + struct hwm_drvdata *ddat = dev_get_drvdata(dev); 262 + struct i915_hwmon *hwmon = ddat->hwmon; 263 + 264 + if (attr == &sensor_dev_attr_power1_max_interval.dev_attr.attr) 265 + return i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit) ? 
attr->mode : 0; 266 + 267 + return 0; 268 + } 269 + 270 + static const struct attribute_group hwm_attrgroup = { 271 + .attrs = hwm_attributes, 272 + .is_visible = hwm_attributes_visible, 273 + }; 274 + 275 + static const struct attribute_group *hwm_groups[] = { 276 + &hwm_attrgroup, 277 + NULL 278 + }; 279 + 280 + static const struct hwmon_channel_info *hwm_info[] = { 281 + HWMON_CHANNEL_INFO(in, HWMON_I_INPUT), 282 + HWMON_CHANNEL_INFO(power, HWMON_P_MAX | HWMON_P_RATED_MAX | HWMON_P_CRIT), 283 + HWMON_CHANNEL_INFO(energy, HWMON_E_INPUT), 284 + HWMON_CHANNEL_INFO(curr, HWMON_C_CRIT), 285 + NULL 286 + }; 287 + 288 + static const struct hwmon_channel_info *hwm_gt_info[] = { 289 + HWMON_CHANNEL_INFO(energy, HWMON_E_INPUT), 290 + NULL 291 + }; 292 + 293 + /* I1 is exposed as power_crit or as curr_crit depending on bit 31 */ 294 + static int hwm_pcode_read_i1(struct drm_i915_private *i915, u32 *uval) 295 + { 296 + return snb_pcode_read_p(&i915->uncore, PCODE_POWER_SETUP, 297 + POWER_SETUP_SUBCOMMAND_READ_I1, 0, uval); 298 + } 299 + 300 + static int hwm_pcode_write_i1(struct drm_i915_private *i915, u32 uval) 301 + { 302 + return snb_pcode_write_p(&i915->uncore, PCODE_POWER_SETUP, 303 + POWER_SETUP_SUBCOMMAND_WRITE_I1, 0, uval); 304 + } 305 + 306 + static umode_t 307 + hwm_in_is_visible(const struct hwm_drvdata *ddat, u32 attr) 308 + { 309 + struct drm_i915_private *i915 = ddat->uncore->i915; 310 + 311 + switch (attr) { 312 + case hwmon_in_input: 313 + return IS_DG1(i915) || IS_DG2(i915) ? 
0444 : 0; 314 + default: 315 + return 0; 316 + } 317 + } 318 + 319 + static int 320 + hwm_in_read(struct hwm_drvdata *ddat, u32 attr, long *val) 321 + { 322 + struct i915_hwmon *hwmon = ddat->hwmon; 323 + intel_wakeref_t wakeref; 324 + u32 reg_value; 325 + 326 + switch (attr) { 327 + case hwmon_in_input: 328 + with_intel_runtime_pm(ddat->uncore->rpm, wakeref) 329 + reg_value = intel_uncore_read(ddat->uncore, hwmon->rg.gt_perf_status); 330 + /* HW register value in units of 2.5 millivolt */ 331 + *val = DIV_ROUND_CLOSEST(REG_FIELD_GET(GEN12_VOLTAGE_MASK, reg_value) * 25, 10); 332 + return 0; 333 + default: 334 + return -EOPNOTSUPP; 335 + } 336 + } 337 + 338 + static umode_t 339 + hwm_power_is_visible(const struct hwm_drvdata *ddat, u32 attr, int chan) 340 + { 341 + struct drm_i915_private *i915 = ddat->uncore->i915; 342 + struct i915_hwmon *hwmon = ddat->hwmon; 343 + u32 uval; 344 + 345 + switch (attr) { 346 + case hwmon_power_max: 347 + return i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit) ? 0664 : 0; 348 + case hwmon_power_rated_max: 349 + return i915_mmio_reg_valid(hwmon->rg.pkg_power_sku) ? 0444 : 0; 350 + case hwmon_power_crit: 351 + return (hwm_pcode_read_i1(i915, &uval) || 352 + !(uval & POWER_SETUP_I1_WATTS)) ? 
0 : 0644; 353 + default: 354 + return 0; 355 + } 356 + } 357 + 358 + static int 359 + hwm_power_read(struct hwm_drvdata *ddat, u32 attr, int chan, long *val) 360 + { 361 + struct i915_hwmon *hwmon = ddat->hwmon; 362 + int ret; 363 + u32 uval; 364 + 365 + switch (attr) { 366 + case hwmon_power_max: 367 + *val = hwm_field_read_and_scale(ddat, 368 + hwmon->rg.pkg_rapl_limit, 369 + PKG_PWR_LIM_1, 370 + hwmon->scl_shift_power, 371 + SF_POWER); 372 + return 0; 373 + case hwmon_power_rated_max: 374 + *val = hwm_field_read_and_scale(ddat, 375 + hwmon->rg.pkg_power_sku, 376 + PKG_PKG_TDP, 377 + hwmon->scl_shift_power, 378 + SF_POWER); 379 + return 0; 380 + case hwmon_power_crit: 381 + ret = hwm_pcode_read_i1(ddat->uncore->i915, &uval); 382 + if (ret) 383 + return ret; 384 + if (!(uval & POWER_SETUP_I1_WATTS)) 385 + return -ENODEV; 386 + *val = mul_u64_u32_shr(REG_FIELD_GET(POWER_SETUP_I1_DATA_MASK, uval), 387 + SF_POWER, POWER_SETUP_I1_SHIFT); 388 + return 0; 389 + default: 390 + return -EOPNOTSUPP; 391 + } 392 + } 393 + 394 + static int 395 + hwm_power_write(struct hwm_drvdata *ddat, u32 attr, int chan, long val) 396 + { 397 + struct i915_hwmon *hwmon = ddat->hwmon; 398 + u32 uval; 399 + 400 + switch (attr) { 401 + case hwmon_power_max: 402 + hwm_field_scale_and_write(ddat, 403 + hwmon->rg.pkg_rapl_limit, 404 + hwmon->scl_shift_power, 405 + SF_POWER, val); 406 + return 0; 407 + case hwmon_power_crit: 408 + uval = DIV_ROUND_CLOSEST_ULL(val << POWER_SETUP_I1_SHIFT, SF_POWER); 409 + return hwm_pcode_write_i1(ddat->uncore->i915, uval); 410 + default: 411 + return -EOPNOTSUPP; 412 + } 413 + } 414 + 415 + static umode_t 416 + hwm_energy_is_visible(const struct hwm_drvdata *ddat, u32 attr) 417 + { 418 + struct i915_hwmon *hwmon = ddat->hwmon; 419 + i915_reg_t rgaddr; 420 + 421 + switch (attr) { 422 + case hwmon_energy_input: 423 + if (ddat->gt_n >= 0) 424 + rgaddr = hwmon->rg.energy_status_tile; 425 + else 426 + rgaddr = hwmon->rg.energy_status_all; 427 + return 
i915_mmio_reg_valid(rgaddr) ? 0444 : 0; 428 + default: 429 + return 0; 430 + } 431 + } 432 + 433 + static int 434 + hwm_energy_read(struct hwm_drvdata *ddat, u32 attr, long *val) 435 + { 436 + switch (attr) { 437 + case hwmon_energy_input: 438 + hwm_energy(ddat, val); 439 + return 0; 440 + default: 441 + return -EOPNOTSUPP; 442 + } 443 + } 444 + 445 + static umode_t 446 + hwm_curr_is_visible(const struct hwm_drvdata *ddat, u32 attr) 447 + { 448 + struct drm_i915_private *i915 = ddat->uncore->i915; 449 + u32 uval; 450 + 451 + switch (attr) { 452 + case hwmon_curr_crit: 453 + return (hwm_pcode_read_i1(i915, &uval) || 454 + (uval & POWER_SETUP_I1_WATTS)) ? 0 : 0644; 455 + default: 456 + return 0; 457 + } 458 + } 459 + 460 + static int 461 + hwm_curr_read(struct hwm_drvdata *ddat, u32 attr, long *val) 462 + { 463 + int ret; 464 + u32 uval; 465 + 466 + switch (attr) { 467 + case hwmon_curr_crit: 468 + ret = hwm_pcode_read_i1(ddat->uncore->i915, &uval); 469 + if (ret) 470 + return ret; 471 + if (uval & POWER_SETUP_I1_WATTS) 472 + return -ENODEV; 473 + *val = mul_u64_u32_shr(REG_FIELD_GET(POWER_SETUP_I1_DATA_MASK, uval), 474 + SF_CURR, POWER_SETUP_I1_SHIFT); 475 + return 0; 476 + default: 477 + return -EOPNOTSUPP; 478 + } 479 + } 480 + 481 + static int 482 + hwm_curr_write(struct hwm_drvdata *ddat, u32 attr, long val) 483 + { 484 + u32 uval; 485 + 486 + switch (attr) { 487 + case hwmon_curr_crit: 488 + uval = DIV_ROUND_CLOSEST_ULL(val << POWER_SETUP_I1_SHIFT, SF_CURR); 489 + return hwm_pcode_write_i1(ddat->uncore->i915, uval); 490 + default: 491 + return -EOPNOTSUPP; 492 + } 493 + } 494 + 495 + static umode_t 496 + hwm_is_visible(const void *drvdata, enum hwmon_sensor_types type, 497 + u32 attr, int channel) 498 + { 499 + struct hwm_drvdata *ddat = (struct hwm_drvdata *)drvdata; 500 + 501 + switch (type) { 502 + case hwmon_in: 503 + return hwm_in_is_visible(ddat, attr); 504 + case hwmon_power: 505 + return hwm_power_is_visible(ddat, attr, channel); 506 + case 
hwmon_energy: 507 + return hwm_energy_is_visible(ddat, attr); 508 + case hwmon_curr: 509 + return hwm_curr_is_visible(ddat, attr); 510 + default: 511 + return 0; 512 + } 513 + } 514 + 515 + static int 516 + hwm_read(struct device *dev, enum hwmon_sensor_types type, u32 attr, 517 + int channel, long *val) 518 + { 519 + struct hwm_drvdata *ddat = dev_get_drvdata(dev); 520 + 521 + switch (type) { 522 + case hwmon_in: 523 + return hwm_in_read(ddat, attr, val); 524 + case hwmon_power: 525 + return hwm_power_read(ddat, attr, channel, val); 526 + case hwmon_energy: 527 + return hwm_energy_read(ddat, attr, val); 528 + case hwmon_curr: 529 + return hwm_curr_read(ddat, attr, val); 530 + default: 531 + return -EOPNOTSUPP; 532 + } 533 + } 534 + 535 + static int 536 + hwm_write(struct device *dev, enum hwmon_sensor_types type, u32 attr, 537 + int channel, long val) 538 + { 539 + struct hwm_drvdata *ddat = dev_get_drvdata(dev); 540 + 541 + switch (type) { 542 + case hwmon_power: 543 + return hwm_power_write(ddat, attr, channel, val); 544 + case hwmon_curr: 545 + return hwm_curr_write(ddat, attr, val); 546 + default: 547 + return -EOPNOTSUPP; 548 + } 549 + } 550 + 551 + static const struct hwmon_ops hwm_ops = { 552 + .is_visible = hwm_is_visible, 553 + .read = hwm_read, 554 + .write = hwm_write, 555 + }; 556 + 557 + static const struct hwmon_chip_info hwm_chip_info = { 558 + .ops = &hwm_ops, 559 + .info = hwm_info, 560 + }; 561 + 562 + static umode_t 563 + hwm_gt_is_visible(const void *drvdata, enum hwmon_sensor_types type, 564 + u32 attr, int channel) 565 + { 566 + struct hwm_drvdata *ddat = (struct hwm_drvdata *)drvdata; 567 + 568 + switch (type) { 569 + case hwmon_energy: 570 + return hwm_energy_is_visible(ddat, attr); 571 + default: 572 + return 0; 573 + } 574 + } 575 + 576 + static int 577 + hwm_gt_read(struct device *dev, enum hwmon_sensor_types type, u32 attr, 578 + int channel, long *val) 579 + { 580 + struct hwm_drvdata *ddat = dev_get_drvdata(dev); 581 + 582 + switch 
(type) { 583 + case hwmon_energy: 584 + return hwm_energy_read(ddat, attr, val); 585 + default: 586 + return -EOPNOTSUPP; 587 + } 588 + } 589 + 590 + static const struct hwmon_ops hwm_gt_ops = { 591 + .is_visible = hwm_gt_is_visible, 592 + .read = hwm_gt_read, 593 + }; 594 + 595 + static const struct hwmon_chip_info hwm_gt_chip_info = { 596 + .ops = &hwm_gt_ops, 597 + .info = hwm_gt_info, 598 + }; 599 + 600 + static void 601 + hwm_get_preregistration_info(struct drm_i915_private *i915) 602 + { 603 + struct i915_hwmon *hwmon = i915->hwmon; 604 + struct intel_uncore *uncore = &i915->uncore; 605 + struct hwm_drvdata *ddat = &hwmon->ddat; 606 + intel_wakeref_t wakeref; 607 + u32 val_sku_unit = 0; 608 + struct intel_gt *gt; 609 + long energy; 610 + int i; 611 + 612 + /* Available for all Gen12+/dGfx */ 613 + hwmon->rg.gt_perf_status = GEN12_RPSTAT1; 614 + 615 + if (IS_DG1(i915) || IS_DG2(i915)) { 616 + hwmon->rg.pkg_power_sku_unit = PCU_PACKAGE_POWER_SKU_UNIT; 617 + hwmon->rg.pkg_power_sku = PCU_PACKAGE_POWER_SKU; 618 + hwmon->rg.pkg_rapl_limit = PCU_PACKAGE_RAPL_LIMIT; 619 + hwmon->rg.energy_status_all = PCU_PACKAGE_ENERGY_STATUS; 620 + hwmon->rg.energy_status_tile = INVALID_MMIO_REG; 621 + } else if (IS_XEHPSDV(i915)) { 622 + hwmon->rg.pkg_power_sku_unit = GT0_PACKAGE_POWER_SKU_UNIT; 623 + hwmon->rg.pkg_power_sku = INVALID_MMIO_REG; 624 + hwmon->rg.pkg_rapl_limit = GT0_PACKAGE_RAPL_LIMIT; 625 + hwmon->rg.energy_status_all = GT0_PLATFORM_ENERGY_STATUS; 626 + hwmon->rg.energy_status_tile = GT0_PACKAGE_ENERGY_STATUS; 627 + } else { 628 + hwmon->rg.pkg_power_sku_unit = INVALID_MMIO_REG; 629 + hwmon->rg.pkg_power_sku = INVALID_MMIO_REG; 630 + hwmon->rg.pkg_rapl_limit = INVALID_MMIO_REG; 631 + hwmon->rg.energy_status_all = INVALID_MMIO_REG; 632 + hwmon->rg.energy_status_tile = INVALID_MMIO_REG; 633 + } 634 + 635 + with_intel_runtime_pm(uncore->rpm, wakeref) { 636 + /* 637 + * The contents of register hwmon->rg.pkg_power_sku_unit do not change, 638 + * so read it once and 
store the shift values. 639 + */ 640 + if (i915_mmio_reg_valid(hwmon->rg.pkg_power_sku_unit)) 641 + val_sku_unit = intel_uncore_read(uncore, 642 + hwmon->rg.pkg_power_sku_unit); 643 + } 644 + 645 + hwmon->scl_shift_power = REG_FIELD_GET(PKG_PWR_UNIT, val_sku_unit); 646 + hwmon->scl_shift_energy = REG_FIELD_GET(PKG_ENERGY_UNIT, val_sku_unit); 647 + hwmon->scl_shift_time = REG_FIELD_GET(PKG_TIME_UNIT, val_sku_unit); 648 + 649 + /* 650 + * Initialize 'struct hwm_energy_info', i.e. set fields to the 651 + * first value of the energy register read 652 + */ 653 + if (i915_mmio_reg_valid(hwmon->rg.energy_status_all)) 654 + hwm_energy(ddat, &energy); 655 + if (i915_mmio_reg_valid(hwmon->rg.energy_status_tile)) { 656 + for_each_gt(gt, i915, i) 657 + hwm_energy(&hwmon->ddat_gt[i], &energy); 658 + } 659 + } 660 + 661 + void i915_hwmon_register(struct drm_i915_private *i915) 662 + { 663 + struct device *dev = i915->drm.dev; 664 + struct i915_hwmon *hwmon; 665 + struct device *hwmon_dev; 666 + struct hwm_drvdata *ddat; 667 + struct hwm_drvdata *ddat_gt; 668 + struct intel_gt *gt; 669 + int i; 670 + 671 + /* hwmon is available only for dGfx */ 672 + if (!IS_DGFX(i915)) 673 + return; 674 + 675 + hwmon = devm_kzalloc(dev, sizeof(*hwmon), GFP_KERNEL); 676 + if (!hwmon) 677 + return; 678 + 679 + i915->hwmon = hwmon; 680 + mutex_init(&hwmon->hwmon_lock); 681 + ddat = &hwmon->ddat; 682 + 683 + ddat->hwmon = hwmon; 684 + ddat->uncore = &i915->uncore; 685 + snprintf(ddat->name, sizeof(ddat->name), "i915"); 686 + ddat->gt_n = -1; 687 + 688 + for_each_gt(gt, i915, i) { 689 + ddat_gt = hwmon->ddat_gt + i; 690 + 691 + ddat_gt->hwmon = hwmon; 692 + ddat_gt->uncore = gt->uncore; 693 + snprintf(ddat_gt->name, sizeof(ddat_gt->name), "i915_gt%u", i); 694 + ddat_gt->gt_n = i; 695 + } 696 + 697 + hwm_get_preregistration_info(i915); 698 + 699 + /* hwmon_dev points to device hwmon<i> */ 700 + hwmon_dev = devm_hwmon_device_register_with_info(dev, ddat->name, 701 + ddat, 702 + &hwm_chip_info, 703 + 
hwm_groups); 704 + if (IS_ERR(hwmon_dev)) { 705 + i915->hwmon = NULL; 706 + return; 707 + } 708 + 709 + ddat->hwmon_dev = hwmon_dev; 710 + 711 + for_each_gt(gt, i915, i) { 712 + ddat_gt = hwmon->ddat_gt + i; 713 + /* 714 + * Create per-gt directories only if a per-gt attribute is 715 + * visible. Currently this is only energy 716 + */ 717 + if (!hwm_gt_is_visible(ddat_gt, hwmon_energy, hwmon_energy_input, 0)) 718 + continue; 719 + 720 + hwmon_dev = devm_hwmon_device_register_with_info(dev, ddat_gt->name, 721 + ddat_gt, 722 + &hwm_gt_chip_info, 723 + NULL); 724 + if (!IS_ERR(hwmon_dev)) 725 + ddat_gt->hwmon_dev = hwmon_dev; 726 + } 727 + } 728 + 729 + void i915_hwmon_unregister(struct drm_i915_private *i915) 730 + { 731 + fetch_and_zero(&i915->hwmon); 732 + }
+20
drivers/gpu/drm/i915/i915_hwmon.h
··· 1 + /* SPDX-License-Identifier: MIT */ 2 + 3 + /* 4 + * Copyright © 2022 Intel Corporation 5 + */ 6 + 7 + #ifndef __I915_HWMON_H__ 8 + #define __I915_HWMON_H__ 9 + 10 + struct drm_i915_private; 11 + 12 + #if IS_REACHABLE(CONFIG_HWMON) 13 + void i915_hwmon_register(struct drm_i915_private *i915); 14 + void i915_hwmon_unregister(struct drm_i915_private *i915); 15 + #else 16 + static inline void i915_hwmon_register(struct drm_i915_private *i915) { }; 17 + static inline void i915_hwmon_unregister(struct drm_i915_private *i915) { }; 18 + #endif 19 + 20 + #endif /* __I915_HWMON_H__ */
+4 -5
drivers/gpu/drm/i915/i915_pci.c
··· 1023 1023 .has_logical_ring_contexts = 1, \ 1024 1024 .has_logical_ring_elsq = 1, \ 1025 1025 .has_mslice_steering = 1, \ 1026 + .has_oa_bpc_reporting = 1, \ 1027 + .has_oa_slice_contrib_limits = 1, \ 1026 1028 .has_rc6 = 1, \ 1027 1029 .has_reset_engine = 1, \ 1028 1030 .has_rps = 1, \ ··· 1044 1042 PLATFORM(INTEL_XEHPSDV), 1045 1043 NO_DISPLAY, 1046 1044 .has_64k_pages = 1, 1047 - .needs_compact_pt = 1, 1048 1045 .has_media_ratio_mode = 1, 1049 1046 .__runtime.platform_engine_mask = 1050 1047 BIT(RCS0) | BIT(BCS0) | ··· 1065 1064 .has_64k_pages = 1, \ 1066 1065 .has_guc_deprivilege = 1, \ 1067 1066 .has_heci_pxp = 1, \ 1068 - .needs_compact_pt = 1, \ 1069 1067 .has_media_ratio_mode = 1, \ 1070 1068 .display.has_cdclk_squash = 1, \ 1071 1069 .__runtime.platform_engine_mask = \ ··· 1146 1146 .extra_gt_list = xelpmp_extra_gt, 1147 1147 .has_flat_ccs = 0, 1148 1148 .has_gmd_id = 1, 1149 + .has_mslice_steering = 0, 1149 1150 .has_snoop = 1, 1150 1151 .__runtime.memory_regions = REGION_SMEM | REGION_STOLEN_LMEM, 1151 1152 .__runtime.platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(CCS0), ··· 1299 1298 1300 1299 static bool intel_mmio_bar_valid(struct pci_dev *pdev, struct intel_device_info *intel_info) 1301 1300 { 1302 - int gttmmaddr_bar = intel_info->__runtime.graphics.ip.ver == 2 ? GEN2_GTTMMADR_BAR : GTTMMADR_BAR; 1303 - 1304 - return i915_pci_resource_valid(pdev, gttmmaddr_bar); 1301 + return i915_pci_resource_valid(pdev, intel_mmio_bar(intel_info->__runtime.graphics.ip.ver)); 1305 1302 } 1306 1303 1307 1304 static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+467 -117
drivers/gpu/drm/i915/i915_perf.c
··· 204 204 #include "gt/intel_gpu_commands.h" 205 205 #include "gt/intel_gt.h" 206 206 #include "gt/intel_gt_clock_utils.h" 207 + #include "gt/intel_gt_mcr.h" 207 208 #include "gt/intel_gt_regs.h" 208 209 #include "gt/intel_lrc.h" 209 210 #include "gt/intel_lrc_reg.h" 210 211 #include "gt/intel_ring.h" 212 + #include "gt/uc/intel_guc_slpc.h" 211 213 212 214 #include "i915_drv.h" 213 215 #include "i915_file_private.h" ··· 288 286 #define OAREPORT_REASON_CTX_SWITCH (1<<3) 289 287 #define OAREPORT_REASON_CLK_RATIO (1<<5) 290 288 289 + #define HAS_MI_SET_PREDICATE(i915) (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) 291 290 292 291 /* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate 293 292 * ··· 323 320 [I915_OA_FORMAT_A12] = { 0, 64 }, 324 321 [I915_OA_FORMAT_A12_B8_C8] = { 2, 128 }, 325 322 [I915_OA_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 }, 323 + [I915_OAR_FORMAT_A32u40_A4u32_B8_C8] = { 5, 256 }, 324 + [I915_OA_FORMAT_A24u40_A14u32_B8_C8] = { 5, 256 }, 326 325 }; 327 326 328 327 #define SAMPLE_OA_REPORT (1<<0) ··· 467 462 static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) 468 463 { 469 464 u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); 470 - int report_size = stream->oa_buffer.format_size; 465 + int report_size = stream->oa_buffer.format->size; 471 466 unsigned long flags; 472 467 bool pollin; 473 468 u32 hw_tail; ··· 604 599 size_t *offset, 605 600 const u8 *report) 606 601 { 607 - int report_size = stream->oa_buffer.format_size; 602 + int report_size = stream->oa_buffer.format->size; 608 603 struct drm_i915_perf_record_header header; 609 604 610 605 header.type = DRM_I915_PERF_RECORD_SAMPLE; ··· 654 649 size_t *offset) 655 650 { 656 651 struct intel_uncore *uncore = stream->uncore; 657 - int report_size = stream->oa_buffer.format_size; 652 + int report_size = stream->oa_buffer.format->size; 658 653 u8 *oa_buf_base = stream->oa_buffer.vaddr; 659 654 u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); 660 655 u32 mask = 
(OA_BUFFER_SIZE - 1); 661 656 size_t start_offset = *offset; 662 657 unsigned long flags; 663 658 u32 head, tail; 664 - u32 taken; 665 659 int ret = 0; 666 660 667 661 if (drm_WARN_ON(&uncore->i915->drm, !stream->enabled)) ··· 696 692 697 693 698 694 for (/* none */; 699 - (taken = OA_TAKEN(tail, head)); 695 + OA_TAKEN(tail, head); 700 696 head = (head + report_size) & mask) { 701 697 u8 *report = oa_buf_base + head; 702 698 u32 *report32 = (void *)report; ··· 778 774 * switches since it's not-uncommon for periodic samples to 779 775 * identify a switch before any 'context switch' report. 780 776 */ 781 - if (!stream->perf->exclusive_stream->ctx || 777 + if (!stream->ctx || 782 778 stream->specific_ctx_id == ctx_id || 783 779 stream->oa_buffer.last_ctx_id == stream->specific_ctx_id || 784 780 reason & OAREPORT_REASON_CTX_SWITCH) { ··· 787 783 * While filtering for a single context we avoid 788 784 * leaking the IDs of other contexts. 789 785 */ 790 - if (stream->perf->exclusive_stream->ctx && 786 + if (stream->ctx && 791 787 stream->specific_ctx_id != ctx_id) { 792 788 report32[2] = INVALID_CTX_ID; 793 789 } ··· 947 943 size_t *offset) 948 944 { 949 945 struct intel_uncore *uncore = stream->uncore; 950 - int report_size = stream->oa_buffer.format_size; 946 + int report_size = stream->oa_buffer.format->size; 951 947 u8 *oa_buf_base = stream->oa_buffer.vaddr; 952 948 u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); 953 949 u32 mask = (OA_BUFFER_SIZE - 1); 954 950 size_t start_offset = *offset; 955 951 unsigned long flags; 956 952 u32 head, tail; 957 - u32 taken; 958 953 int ret = 0; 959 954 960 955 if (drm_WARN_ON(&uncore->i915->drm, !stream->enabled)) ··· 987 984 988 985 989 986 for (/* none */; 990 - (taken = OA_TAKEN(tail, head)); 987 + OA_TAKEN(tail, head); 991 988 head = (head + report_size) & mask) { 992 989 u8 *report = oa_buf_base + head; 993 990 u32 *report32 = (void *)report; ··· 1236 1233 return stream->pinned_ctx; 1237 1234 } 1238 1235 1236 + 
static int 1237 + __store_reg_to_mem(struct i915_request *rq, i915_reg_t reg, u32 ggtt_offset) 1238 + { 1239 + u32 *cs, cmd; 1240 + 1241 + cmd = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT; 1242 + if (GRAPHICS_VER(rq->engine->i915) >= 8) 1243 + cmd++; 1244 + 1245 + cs = intel_ring_begin(rq, 4); 1246 + if (IS_ERR(cs)) 1247 + return PTR_ERR(cs); 1248 + 1249 + *cs++ = cmd; 1250 + *cs++ = i915_mmio_reg_offset(reg); 1251 + *cs++ = ggtt_offset; 1252 + *cs++ = 0; 1253 + 1254 + intel_ring_advance(rq, cs); 1255 + 1256 + return 0; 1257 + } 1258 + 1259 + static int 1260 + __read_reg(struct intel_context *ce, i915_reg_t reg, u32 ggtt_offset) 1261 + { 1262 + struct i915_request *rq; 1263 + int err; 1264 + 1265 + rq = i915_request_create(ce); 1266 + if (IS_ERR(rq)) 1267 + return PTR_ERR(rq); 1268 + 1269 + i915_request_get(rq); 1270 + 1271 + err = __store_reg_to_mem(rq, reg, ggtt_offset); 1272 + 1273 + i915_request_add(rq); 1274 + if (!err && i915_request_wait(rq, 0, HZ / 2) < 0) 1275 + err = -ETIME; 1276 + 1277 + i915_request_put(rq); 1278 + 1279 + return err; 1280 + } 1281 + 1282 + static int 1283 + gen12_guc_sw_ctx_id(struct intel_context *ce, u32 *ctx_id) 1284 + { 1285 + struct i915_vma *scratch; 1286 + u32 *val; 1287 + int err; 1288 + 1289 + scratch = __vm_create_scratch_for_read_pinned(&ce->engine->gt->ggtt->vm, 4); 1290 + if (IS_ERR(scratch)) 1291 + return PTR_ERR(scratch); 1292 + 1293 + err = i915_vma_sync(scratch); 1294 + if (err) 1295 + goto err_scratch; 1296 + 1297 + err = __read_reg(ce, RING_EXECLIST_STATUS_HI(ce->engine->mmio_base), 1298 + i915_ggtt_offset(scratch)); 1299 + if (err) 1300 + goto err_scratch; 1301 + 1302 + val = i915_gem_object_pin_map_unlocked(scratch->obj, I915_MAP_WB); 1303 + if (IS_ERR(val)) { 1304 + err = PTR_ERR(val); 1305 + goto err_scratch; 1306 + } 1307 + 1308 + *ctx_id = *val; 1309 + i915_gem_object_unpin_map(scratch->obj); 1310 + 1311 + err_scratch: 1312 + i915_vma_unpin_and_release(&scratch, 0); 1313 + return err; 1314 + } 1315 + 1316 + 
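`gen12_guc_sw_ctx_id()` above fetches the raw upper dword of `EXECLIST_STATUS` through a scratch page; only the SW context-id bits of that dword are meaningful, so the value is then reduced with a `((1 << width) - 1) << (shift - 32)` mask. A standalone sketch of that field masking — the width (11 bits) and shift (bit 37 of the 64-bit descriptor, i.e. bit 5 of the upper dword) are hypothetical placeholders, not the real GEN12 register layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder values for the SW context-id field; the actual
 * GEN12/XEHP/GuC widths and shifts differ per submission mode. */
#define SW_CTX_ID_WIDTH 11u
#define SW_CTX_ID_SHIFT 37u

/* Build the field mask relative to the upper dword (hence the "- 32"). */
uint32_t sw_ctx_id_mask(void)
{
	return ((1u << SW_CTX_ID_WIDTH) - 1) << (SW_CTX_ID_SHIFT - 32);
}

/* Reduce the raw EXECLIST_STATUS upper dword to just the context-id bits. */
uint32_t sw_ctx_id_field(uint32_t status_hi)
{
	return status_hi & sw_ctx_id_mask();
}
```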
/*
1317 + * For execlist mode of submission, pick an unused context id
1318 + * 0 - (NUM_CONTEXT_TAG - 1) are used by other contexts
1319 + * XXX_MAX_CONTEXT_HW_ID is used by the idle context
1320 + *
1321 + * For GuC mode of submission, read the context id from the upper dword of the
1322 + * EXECLIST_STATUS register. Note that we read this value only once and expect
1323 + * that the value stays fixed for the entire OA use case. There are cases where
1324 + * the GuC KMD implementation may deregister a context to reuse its context id, but
1325 + * we prevent that from happening to the OA context by pinning it.
1326 + */
1327 + static int gen12_get_render_context_id(struct i915_perf_stream *stream)
1328 + {
1329 + u32 ctx_id, mask;
1330 + int ret;
1331 +
1332 + if (intel_engine_uses_guc(stream->engine)) {
1333 + ret = gen12_guc_sw_ctx_id(stream->pinned_ctx, &ctx_id);
1334 + if (ret)
1335 + return ret;
1336 +
1337 + mask = ((1U << GEN12_GUC_SW_CTX_ID_WIDTH) - 1) <<
1338 + (GEN12_GUC_SW_CTX_ID_SHIFT - 32);
1339 + } else if (GRAPHICS_VER_FULL(stream->engine->i915) >= IP_VER(12, 50)) {
1340 + ctx_id = (XEHP_MAX_CONTEXT_HW_ID - 1) <<
1341 + (XEHP_SW_CTX_ID_SHIFT - 32);
1342 +
1343 + mask = ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) <<
1344 + (XEHP_SW_CTX_ID_SHIFT - 32);
1345 + } else {
1346 + ctx_id = (GEN12_MAX_CONTEXT_HW_ID - 1) <<
1347 + (GEN11_SW_CTX_ID_SHIFT - 32);
1348 +
1349 + mask = ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) <<
1350 + (GEN11_SW_CTX_ID_SHIFT - 32);
1351 + }
1352 + stream->specific_ctx_id = ctx_id & mask;
1353 + stream->specific_ctx_id_mask = mask;
1354 +
1355 + return 0;
1356 + }
1357 +
1358 + static bool oa_find_reg_in_lri(u32 *state, u32 reg, u32 *offset, u32 end)
1359 + {
1360 + u32 idx = *offset;
1361 + u32 len = min(MI_LRI_LEN(state[idx]) + idx, end);
1362 + bool found = false;
1363 +
1364 + idx++;
1365 + for (; idx < len; idx += 2) {
1366 + if (state[idx] == reg) {
1367 + found = true;
1368 + break;
1369 + }
1370 + }
1371 +
1372 + *offset = idx;
1373 + return found;
1374 + } 1375 + 1376 + static u32 oa_context_image_offset(struct intel_context *ce, u32 reg) 1377 + { 1378 + u32 offset, len = (ce->engine->context_size - PAGE_SIZE) / 4; 1379 + u32 *state = ce->lrc_reg_state; 1380 + 1381 + for (offset = 0; offset < len; ) { 1382 + if (IS_MI_LRI_CMD(state[offset])) { 1383 + /* 1384 + * We expect reg-value pairs in MI_LRI command, so 1385 + * MI_LRI_LEN() should be even, if not, issue a warning. 1386 + */ 1387 + drm_WARN_ON(&ce->engine->i915->drm, 1388 + MI_LRI_LEN(state[offset]) & 0x1); 1389 + 1390 + if (oa_find_reg_in_lri(state, reg, &offset, len)) 1391 + break; 1392 + } else { 1393 + offset++; 1394 + } 1395 + } 1396 + 1397 + return offset < len ? offset : U32_MAX; 1398 + } 1399 + 1400 + static int set_oa_ctx_ctrl_offset(struct intel_context *ce) 1401 + { 1402 + i915_reg_t reg = GEN12_OACTXCONTROL(ce->engine->mmio_base); 1403 + struct i915_perf *perf = &ce->engine->i915->perf; 1404 + u32 offset = perf->ctx_oactxctrl_offset; 1405 + 1406 + /* Do this only once. Failure is stored as offset of U32_MAX */ 1407 + if (offset) 1408 + goto exit; 1409 + 1410 + offset = oa_context_image_offset(ce, i915_mmio_reg_offset(reg)); 1411 + perf->ctx_oactxctrl_offset = offset; 1412 + 1413 + drm_dbg(&ce->engine->i915->drm, 1414 + "%s oa ctx control at 0x%08x dword offset\n", 1415 + ce->engine->name, offset); 1416 + 1417 + exit: 1418 + return offset && offset != U32_MAX ? 
0 : -ENODEV; 1419 + } 1420 + 1421 + static bool engine_supports_mi_query(struct intel_engine_cs *engine) 1422 + { 1423 + return engine->class == RENDER_CLASS; 1424 + } 1425 + 1239 1426 /** 1240 1427 * oa_get_render_ctx_id - determine and hold ctx hw id 1241 1428 * @stream: An i915-perf stream opened for OA metrics ··· 1439 1246 static int oa_get_render_ctx_id(struct i915_perf_stream *stream) 1440 1247 { 1441 1248 struct intel_context *ce; 1249 + int ret = 0; 1442 1250 1443 1251 ce = oa_pin_context(stream); 1444 1252 if (IS_ERR(ce)) 1445 1253 return PTR_ERR(ce); 1254 + 1255 + if (engine_supports_mi_query(stream->engine)) { 1256 + /* 1257 + * We are enabling perf query here. If we don't find the context 1258 + * offset here, just return an error. 1259 + */ 1260 + ret = set_oa_ctx_ctrl_offset(ce); 1261 + if (ret) { 1262 + intel_context_unpin(ce); 1263 + drm_err(&stream->perf->i915->drm, 1264 + "Enabling perf query failed for %s\n", 1265 + stream->engine->name); 1266 + return ret; 1267 + } 1268 + } 1446 1269 1447 1270 switch (GRAPHICS_VER(ce->engine->i915)) { 1448 1271 case 7: { ··· 1501 1292 1502 1293 case 11: 1503 1294 case 12: 1504 - if (GRAPHICS_VER_FULL(ce->engine->i915) >= IP_VER(12, 50)) { 1505 - stream->specific_ctx_id_mask = 1506 - ((1U << XEHP_SW_CTX_ID_WIDTH) - 1) << 1507 - (XEHP_SW_CTX_ID_SHIFT - 32); 1508 - stream->specific_ctx_id = 1509 - (XEHP_MAX_CONTEXT_HW_ID - 1) << 1510 - (XEHP_SW_CTX_ID_SHIFT - 32); 1511 - } else { 1512 - stream->specific_ctx_id_mask = 1513 - ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << (GEN11_SW_CTX_ID_SHIFT - 32); 1514 - /* 1515 - * Pick an unused context id 1516 - * 0 - BITS_PER_LONG are used by other contexts 1517 - * GEN12_MAX_CONTEXT_HW_ID (0x7ff) is used by idle context 1518 - */ 1519 - stream->specific_ctx_id = 1520 - (GEN12_MAX_CONTEXT_HW_ID - 1) << (GEN11_SW_CTX_ID_SHIFT - 32); 1521 - } 1295 + ret = gen12_get_render_context_id(stream); 1522 1296 break; 1523 1297 1524 1298 default: ··· 1515 1323 stream->specific_ctx_id, 1516 1324 
stream->specific_ctx_id_mask); 1517 1325 1518 - return 0; 1326 + return ret; 1519 1327 } 1520 1328 1521 1329 /** ··· 1567 1375 static void i915_oa_stream_destroy(struct i915_perf_stream *stream) 1568 1376 { 1569 1377 struct i915_perf *perf = stream->perf; 1378 + struct intel_gt *gt = stream->engine->gt; 1570 1379 1571 - if (WARN_ON(stream != perf->exclusive_stream)) 1380 + if (WARN_ON(stream != gt->perf.exclusive_stream)) 1572 1381 return; 1573 1382 1574 1383 /* ··· 1578 1385 * 1579 1386 * See i915_oa_init_reg_state() and lrc_configure_all_contexts() 1580 1387 */ 1581 - WRITE_ONCE(perf->exclusive_stream, NULL); 1388 + WRITE_ONCE(gt->perf.exclusive_stream, NULL); 1582 1389 perf->ops.disable_metric_set(stream); 1583 1390 1584 1391 free_oa_buffer(stream); 1392 + 1393 + /* 1394 + * Wa_16011777198:dg2: Unset the override of GUCRC mode to enable rc6. 1395 + */ 1396 + if (intel_uc_uses_guc_rc(&gt->uc) && 1397 + (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) || 1398 + IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) 1399 + drm_WARN_ON(&gt->i915->drm, 1400 + intel_guc_slpc_unset_gucrc_mode(&gt->uc.guc.slpc)); 1585 1401 1586 1402 intel_uncore_forcewake_put(stream->uncore, FORCEWAKE_ALL); 1587 1403 intel_engine_pm_put(stream->engine); ··· 1765 1563 static int alloc_oa_buffer(struct i915_perf_stream *stream) 1766 1564 { 1767 1565 struct drm_i915_private *i915 = stream->perf->i915; 1566 + struct intel_gt *gt = stream->engine->gt; 1768 1567 struct drm_i915_gem_object *bo; 1769 1568 struct i915_vma *vma; 1770 1569 int ret; ··· 1785 1582 i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC); 1786 1583 1787 1584 /* PreHSW required 512K alignment, HSW requires 16M */ 1788 - vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0); 1585 + vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL); 1789 1586 if (IS_ERR(vma)) { 1790 1587 ret = PTR_ERR(vma); 1791 1588 goto err_unref; 1792 1589 } 1590 + 1591 + /* 1592 + * PreHSW required 512K alignment. 
1593 + * HSW and onwards, align to requested size of OA buffer. 1594 + */ 1595 + ret = i915_vma_pin(vma, 0, SZ_16M, PIN_GLOBAL | PIN_HIGH); 1596 + if (ret) { 1597 + drm_err(&gt->i915->drm, "Failed to pin OA buffer %d\n", ret); 1598 + goto err_unref; 1599 + } 1600 + 1793 1601 stream->oa_buffer.vma = vma; 1794 1602 1795 1603 stream->oa_buffer.vaddr = ··· 1850 1636 static int alloc_noa_wait(struct i915_perf_stream *stream) 1851 1637 { 1852 1638 struct drm_i915_private *i915 = stream->perf->i915; 1639 + struct intel_gt *gt = stream->engine->gt; 1853 1640 struct drm_i915_gem_object *bo; 1854 1641 struct i915_vma *vma; 1855 1642 const u64 delay_ticks = 0xffffffffffffffff - ··· 1869 1654 DELTA_TARGET, 1870 1655 N_CS_GPR 1871 1656 }; 1657 + i915_reg_t mi_predicate_result = HAS_MI_SET_PREDICATE(i915) ? 1658 + MI_PREDICATE_RESULT_2_ENGINE(base) : 1659 + MI_PREDICATE_RESULT_1(RENDER_RING_BASE); 1872 1660 1873 1661 bo = i915_gem_object_create_internal(i915, 4096); 1874 1662 if (IS_ERR(bo)) { ··· 1891 1673 * multiple OA config BOs will have a jump to this address and it 1892 1674 * needs to be fixed during the lifetime of the i915/perf stream. 
1893 1675 */ 1894 - vma = i915_gem_object_ggtt_pin_ww(bo, &ww, NULL, 0, 0, PIN_HIGH); 1676 + vma = i915_vma_instance(bo, &gt->ggtt->vm, NULL); 1895 1677 if (IS_ERR(vma)) { 1896 1678 ret = PTR_ERR(vma); 1897 1679 goto out_ww; 1898 1680 } 1681 + 1682 + ret = i915_vma_pin_ww(vma, &ww, 0, 0, PIN_GLOBAL | PIN_HIGH); 1683 + if (ret) 1684 + goto out_ww; 1899 1685 1900 1686 batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB); 1901 1687 if (IS_ERR(batch)) { ··· 1913 1691 stream, cs, true /* save */, CS_GPR(i), 1914 1692 INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2); 1915 1693 cs = save_restore_register( 1916 - stream, cs, true /* save */, MI_PREDICATE_RESULT_1(RENDER_RING_BASE), 1694 + stream, cs, true /* save */, mi_predicate_result, 1917 1695 INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1); 1918 1696 1919 1697 /* First timestamp snapshot location. */ ··· 1967 1745 */ 1968 1746 *cs++ = MI_LOAD_REGISTER_REG | (3 - 2); 1969 1747 *cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE)); 1970 - *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1(RENDER_RING_BASE)); 1748 + *cs++ = i915_mmio_reg_offset(mi_predicate_result); 1749 + 1750 + if (HAS_MI_SET_PREDICATE(i915)) 1751 + *cs++ = MI_SET_PREDICATE | 1; 1971 1752 1972 1753 /* Restart from the beginning if we had timestamps roll over. */ 1973 1754 *cs++ = (GRAPHICS_VER(i915) < 8 ? 
··· 1979 1754 MI_BATCH_PREDICATE; 1980 1755 *cs++ = i915_ggtt_offset(vma) + (ts0 - batch) * 4; 1981 1756 *cs++ = 0; 1757 + 1758 + if (HAS_MI_SET_PREDICATE(i915)) 1759 + *cs++ = MI_SET_PREDICATE; 1982 1760 1983 1761 /* 1984 1762 * Now add the diff between to previous timestamps and add it to : ··· 2010 1782 */ 2011 1783 *cs++ = MI_LOAD_REGISTER_REG | (3 - 2); 2012 1784 *cs++ = i915_mmio_reg_offset(CS_GPR(JUMP_PREDICATE)); 2013 - *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1(RENDER_RING_BASE)); 1785 + *cs++ = i915_mmio_reg_offset(mi_predicate_result); 1786 + 1787 + if (HAS_MI_SET_PREDICATE(i915)) 1788 + *cs++ = MI_SET_PREDICATE | 1; 2014 1789 2015 1790 /* Predicate the jump. */ 2016 1791 *cs++ = (GRAPHICS_VER(i915) < 8 ? ··· 2023 1792 *cs++ = i915_ggtt_offset(vma) + (jump - batch) * 4; 2024 1793 *cs++ = 0; 2025 1794 1795 + if (HAS_MI_SET_PREDICATE(i915)) 1796 + *cs++ = MI_SET_PREDICATE; 1797 + 2026 1798 /* Restore registers. */ 2027 1799 for (i = 0; i < N_CS_GPR; i++) 2028 1800 cs = save_restore_register( 2029 1801 stream, cs, false /* restore */, CS_GPR(i), 2030 1802 INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2); 2031 1803 cs = save_restore_register( 2032 - stream, cs, false /* restore */, MI_PREDICATE_RESULT_1(RENDER_RING_BASE), 1804 + stream, cs, false /* restore */, mi_predicate_result, 2033 1805 INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1); 2034 1806 2035 1807 /* And return to the ring. */ ··· 2517 2283 { 2518 2284 int err; 2519 2285 struct intel_context *ce = stream->pinned_ctx; 2520 - u32 format = stream->oa_buffer.format; 2286 + u32 format = stream->oa_buffer.format->format; 2287 + u32 offset = stream->perf->ctx_oactxctrl_offset; 2521 2288 struct flex regs_context[] = { 2522 2289 { 2523 2290 GEN8_OACTXCONTROL, 2524 - stream->perf->ctx_oactxctrl_offset + 1, 2291 + offset + 1, 2525 2292 active ? 
GEN8_OA_COUNTER_RESUME : 0, 2526 2293 }, 2527 2294 }; ··· 2547 2312 }, 2548 2313 }; 2549 2314 2550 - /* Modify the context image of pinned context with regs_context*/ 2315 + /* Modify the context image of pinned context with regs_context */ 2551 2316 err = intel_context_lock_pinned(ce); 2552 2317 if (err) 2553 2318 return err; 2554 2319 2555 - err = gen8_modify_context(ce, regs_context, ARRAY_SIZE(regs_context)); 2320 + err = gen8_modify_context(ce, regs_context, 2321 + ARRAY_SIZE(regs_context)); 2556 2322 intel_context_unlock_pinned(ce); 2557 2323 if (err) 2558 2324 return err; ··· 2595 2359 { 2596 2360 struct drm_i915_private *i915 = stream->perf->i915; 2597 2361 struct intel_engine_cs *engine; 2362 + struct intel_gt *gt = stream->engine->gt; 2598 2363 struct i915_gem_context *ctx, *cn; 2599 2364 int err; 2600 2365 2601 - lockdep_assert_held(&stream->perf->lock); 2366 + lockdep_assert_held(&gt->perf.lock); 2602 2367 2603 2368 /* 2604 2369 * The OA register config is setup through the context image. 
This image ··· 2679 2442 const struct i915_oa_config *oa_config, 2680 2443 struct i915_active *active) 2681 2444 { 2445 + u32 ctx_oactxctrl = stream->perf->ctx_oactxctrl_offset; 2682 2446 /* The MMIO offsets for Flex EU registers aren't contiguous */ 2683 2447 const u32 ctx_flexeu0 = stream->perf->ctx_flexeu0_offset; 2684 2448 #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1) ··· 2690 2452 }, 2691 2453 { 2692 2454 GEN8_OACTXCONTROL, 2693 - stream->perf->ctx_oactxctrl_offset + 1, 2455 + ctx_oactxctrl + 1, 2694 2456 }, 2695 2457 { EU_PERF_CNTL0, ctx_flexeuN(0) }, 2696 2458 { EU_PERF_CNTL1, ctx_flexeuN(1) }, ··· 2778 2540 gen12_enable_metric_set(struct i915_perf_stream *stream, 2779 2541 struct i915_active *active) 2780 2542 { 2543 + struct drm_i915_private *i915 = stream->perf->i915; 2781 2544 struct intel_uncore *uncore = stream->uncore; 2782 2545 struct i915_oa_config *oa_config = stream->oa_config; 2783 2546 bool periodic = stream->periodic; 2784 2547 u32 period_exponent = stream->period_exponent; 2548 + u32 sqcnt1; 2785 2549 int ret; 2550 + 2551 + /* 2552 + * Wa_1508761755:xehpsdv, dg2 2553 + * EU NOA signals behave incorrectly if EU clock gating is enabled. 2554 + * Disable thread stall DOP gating and EU DOP gating. 2555 + */ 2556 + if (IS_XEHPSDV(i915) || IS_DG2(i915)) { 2557 + intel_gt_mcr_multicast_write(uncore->gt, GEN8_ROW_CHICKEN, 2558 + _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); 2559 + intel_uncore_write(uncore, GEN7_ROW_CHICKEN2, 2560 + _MASKED_BIT_ENABLE(GEN12_DISABLE_DOP_GATING)); 2561 + } 2786 2562 2787 2563 intel_uncore_write(uncore, GEN12_OAG_OA_DEBUG, 2788 2564 /* Disable clk ratio reports, like previous Gens. */ ··· 2813 2561 GEN12_OAG_OAGLBCTXCTRL_TIMER_ENABLE | 2814 2562 (period_exponent << GEN12_OAG_OAGLBCTXCTRL_TIMER_PERIOD_SHIFT)) 2815 2563 : 0); 2564 + 2565 + /* 2566 + * Initialize Super Queue Internal Cnt Register 2567 + * Set PMON Enable in order to collect valid metrics. 
2568 + * Enable bytes per clock reporting in OA for XEHPSDV onward.
2569 + */
2570 + sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
2571 + (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
2572 +
2573 + intel_uncore_rmw(uncore, GEN12_SQCNT1, 0, sqcnt1);
2816 2574
2817 2575 /*
2818 2576 * Update all contexts prior writing the mux configurations as we need
··· 2873 2611 static void gen12_disable_metric_set(struct i915_perf_stream *stream)
2874 2612 {
2875 2613 struct intel_uncore *uncore = stream->uncore;
2614 + struct drm_i915_private *i915 = stream->perf->i915;
2615 + u32 sqcnt1;
2616 +
2617 + /*
2618 + * Wa_1508761755:xehpsdv, dg2
2619 + * Enable thread stall DOP gating and EU DOP gating.
2620 + */
2621 + if (IS_XEHPSDV(i915) || IS_DG2(i915)) {
2622 + intel_gt_mcr_multicast_write(uncore->gt, GEN8_ROW_CHICKEN,
2623 + _MASKED_BIT_DISABLE(STALL_DOP_GATING_DISABLE));
2624 + intel_uncore_write(uncore, GEN7_ROW_CHICKEN2,
2625 + _MASKED_BIT_DISABLE(GEN12_DISABLE_DOP_GATING));
2626 + }
2876 2627
2877 2628 /* Reset all contexts' slices/subslices configurations. */
2878 2629 gen12_configure_all_contexts(stream, NULL, NULL);
··· 2896 2621
2897 2622 /* Make sure we disable noa to save power. */
2898 2623 intel_uncore_rmw(uncore, RPM_CONFIG1, GEN10_GT_NOA_ENABLE, 0);
2624 +
2625 + sqcnt1 = GEN12_SQCNT1_PMON_ENABLE |
2626 + (HAS_OA_BPC_REPORTING(i915) ? GEN12_SQCNT1_OABPC : 0);
2627 +
2628 + /* Reset PMON Enable to save power. */
2629 + intel_uncore_rmw(uncore, GEN12_SQCNT1, sqcnt1, 0);
2899 2630 }
2900 2631
2901 2632 static void gen7_oa_enable(struct i915_perf_stream *stream)
··· 2911 2630 u32 ctx_id = stream->specific_ctx_id;
2912 2631 bool periodic = stream->periodic;
2913 2632 u32 period_exponent = stream->period_exponent;
2914 - u32 report_format = stream->oa_buffer.format;
2633 + u32 report_format = stream->oa_buffer.format->format;
2915 2634
2916 2635 /*
2917 2636 * Reset buf pointers so we don't forward reports from before now.
··· 2937 2656 static void gen8_oa_enable(struct i915_perf_stream *stream)
2938 2657 {
2939 2658 struct intel_uncore *uncore = stream->uncore;
2940 - u32 report_format = stream->oa_buffer.format;
2659 + u32 report_format = stream->oa_buffer.format->format;
2941 2660
2942 2661 /*
2943 2662 * Reset buf pointers so we don't forward reports from before now.
··· 2963 2682 static void gen12_oa_enable(struct i915_perf_stream *stream)
2964 2683 {
2965 2684 struct intel_uncore *uncore = stream->uncore;
2966 - u32 report_format = stream->oa_buffer.format;
2685 + u32 report_format = stream->oa_buffer.format->format;
2967 2686
2968 2687 /*
2969 2688 * If we don't want OA reports from the OA buffer, then we don't even
··· 3119 2838 return i915_gem_user_to_context_sseu(engine->gt, drm_sseu, out_sseu);
3120 2839 }
3121 2840
2841 + /*
2842 + * OA timestamp frequency = CS timestamp frequency on most platforms. On some
2843 + * platforms the OA unit ignores the CTC_SHIFT and the two timestamps differ. In such
2844 + * cases, return the adjusted CS timestamp frequency to the user.
2845 + */ 2846 + u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915) 2847 + { 2848 + /* Wa_18013179988:dg2 */ 2849 + if (IS_DG2(i915)) { 2850 + intel_wakeref_t wakeref; 2851 + u32 reg, shift; 2852 + 2853 + with_intel_runtime_pm(to_gt(i915)->uncore->rpm, wakeref) 2854 + reg = intel_uncore_read(to_gt(i915)->uncore, RPM_CONFIG0); 2855 + 2856 + shift = REG_FIELD_GET(GEN10_RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, 2857 + reg); 2858 + 2859 + return to_gt(i915)->clock_frequency << (3 - shift); 2860 + } 2861 + 2862 + return to_gt(i915)->clock_frequency; 2863 + } 2864 + 3122 2865 /** 3123 2866 * i915_oa_stream_init - validate combined props for OA stream and init 3124 2867 * @stream: An i915 perf stream ··· 3167 2862 { 3168 2863 struct drm_i915_private *i915 = stream->perf->i915; 3169 2864 struct i915_perf *perf = stream->perf; 3170 - int format_size; 2865 + struct intel_gt *gt; 3171 2866 int ret; 3172 2867 3173 2868 if (!props->engine) { ··· 3175 2870 "OA engine not specified\n"); 3176 2871 return -EINVAL; 3177 2872 } 2873 + gt = props->engine->gt; 3178 2874 3179 2875 /* 3180 2876 * If the sysfs metrics/ directory wasn't registered for some ··· 3206 2900 * counter reports and marshal to the appropriate client 3207 2901 * we currently only allow exclusive access 3208 2902 */ 3209 - if (perf->exclusive_stream) { 2903 + if (gt->perf.exclusive_stream) { 3210 2904 drm_dbg(&stream->perf->i915->drm, 3211 2905 "OA unit already in use\n"); 3212 2906 return -EBUSY; ··· 3223 2917 3224 2918 stream->sample_size = sizeof(struct drm_i915_perf_record_header); 3225 2919 3226 - format_size = perf->oa_formats[props->oa_format].size; 3227 - 3228 - stream->sample_flags = props->sample_flags; 3229 - stream->sample_size += format_size; 3230 - 3231 - stream->oa_buffer.format_size = format_size; 3232 - if (drm_WARN_ON(&i915->drm, stream->oa_buffer.format_size == 0)) 2920 + stream->oa_buffer.format = &perf->oa_formats[props->oa_format]; 2921 + if (drm_WARN_ON(&i915->drm, 
stream->oa_buffer.format->size == 0)) 3233 2922 return -EINVAL; 3234 2923 3235 - stream->hold_preemption = props->hold_preemption; 2924 + stream->sample_flags = props->sample_flags; 2925 + stream->sample_size += stream->oa_buffer.format->size; 3236 2926 3237 - stream->oa_buffer.format = 3238 - perf->oa_formats[props->oa_format].format; 2927 + stream->hold_preemption = props->hold_preemption; 3239 2928 3240 2929 stream->periodic = props->oa_periodic; 3241 2930 if (stream->periodic) ··· 3275 2974 intel_engine_pm_get(stream->engine); 3276 2975 intel_uncore_forcewake_get(stream->uncore, FORCEWAKE_ALL); 3277 2976 2977 + /* 2978 + * Wa_16011777198:dg2: GuC resets render as part of the Wa. This causes 2979 + * OA to lose the configuration state. Prevent this by overriding GUCRC 2980 + * mode. 2981 + */ 2982 + if (intel_uc_uses_guc_rc(&gt->uc) && 2983 + (IS_DG2_GRAPHICS_STEP(gt->i915, G10, STEP_A0, STEP_C0) || 2984 + IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_B0))) { 2985 + ret = intel_guc_slpc_override_gucrc_mode(&gt->uc.guc.slpc, 2986 + SLPC_GUCRC_MODE_GUCRC_NO_RC6); 2987 + if (ret) { 2988 + drm_dbg(&stream->perf->i915->drm, 2989 + "Unable to override gucrc mode\n"); 2990 + goto err_config; 2991 + } 2992 + } 2993 + 3278 2994 ret = alloc_oa_buffer(stream); 3279 2995 if (ret) 3280 2996 goto err_oa_buf_alloc; 3281 2997 3282 2998 stream->ops = &i915_oa_stream_ops; 3283 2999 3284 - perf->sseu = props->sseu; 3285 - WRITE_ONCE(perf->exclusive_stream, stream); 3000 + stream->engine->gt->perf.sseu = props->sseu; 3001 + WRITE_ONCE(gt->perf.exclusive_stream, stream); 3286 3002 3287 3003 ret = i915_perf_stream_enable_sync(stream); 3288 3004 if (ret) { ··· 3317 2999 stream->poll_check_timer.function = oa_poll_check_timer_cb; 3318 3000 init_waitqueue_head(&stream->poll_wq); 3319 3001 spin_lock_init(&stream->oa_buffer.ptr_lock); 3002 + mutex_init(&stream->lock); 3320 3003 3321 3004 return 0; 3322 3005 3323 3006 err_enable: 3324 - WRITE_ONCE(perf->exclusive_stream, NULL); 3007 + 
WRITE_ONCE(gt->perf.exclusive_stream, NULL); 3325 3008 perf->ops.disable_metric_set(stream); 3326 3009 3327 3010 free_oa_buffer(stream); ··· 3352 3033 return; 3353 3034 3354 3035 /* perf.exclusive_stream serialised by lrc_configure_all_contexts() */ 3355 - stream = READ_ONCE(engine->i915->perf.exclusive_stream); 3036 + stream = READ_ONCE(engine->gt->perf.exclusive_stream); 3356 3037 if (stream && GRAPHICS_VER(stream->perf->i915) < 12) 3357 3038 gen8_update_reg_state_unlocked(ce, stream); 3358 3039 } ··· 3381 3062 loff_t *ppos) 3382 3063 { 3383 3064 struct i915_perf_stream *stream = file->private_data; 3384 - struct i915_perf *perf = stream->perf; 3385 3065 size_t offset = 0; 3386 3066 int ret; 3387 3067 ··· 3404 3086 if (ret) 3405 3087 return ret; 3406 3088 3407 - mutex_lock(&perf->lock); 3089 + mutex_lock(&stream->lock); 3408 3090 ret = stream->ops->read(stream, buf, count, &offset); 3409 - mutex_unlock(&perf->lock); 3091 + mutex_unlock(&stream->lock); 3410 3092 } while (!offset && !ret); 3411 3093 } else { 3412 - mutex_lock(&perf->lock); 3094 + mutex_lock(&stream->lock); 3413 3095 ret = stream->ops->read(stream, buf, count, &offset); 3414 - mutex_unlock(&perf->lock); 3096 + mutex_unlock(&stream->lock); 3415 3097 } 3416 3098 3417 3099 /* We allow the poll checking to sometimes report false positive EPOLLIN ··· 3458 3140 * &i915_perf_stream_ops->poll_wait to call poll_wait() with a wait queue that 3459 3141 * will be woken for new stream data. 3460 3142 * 3461 - * Note: The &perf->lock mutex has been taken to serialize 3462 - * with any non-file-operation driver hooks. 
3463 - * 3464 3143 * Returns: any poll events that are ready without sleeping 3465 3144 */ 3466 3145 static __poll_t i915_perf_poll_locked(struct i915_perf_stream *stream, ··· 3496 3181 static __poll_t i915_perf_poll(struct file *file, poll_table *wait) 3497 3182 { 3498 3183 struct i915_perf_stream *stream = file->private_data; 3499 - struct i915_perf *perf = stream->perf; 3500 3184 __poll_t ret; 3501 3185 3502 - mutex_lock(&perf->lock); 3186 + mutex_lock(&stream->lock); 3503 3187 ret = i915_perf_poll_locked(stream, file, wait); 3504 - mutex_unlock(&perf->lock); 3188 + mutex_unlock(&stream->lock); 3505 3189 3506 3190 return ret; 3507 3191 } ··· 3599 3285 * @cmd: the ioctl request 3600 3286 * @arg: the ioctl data 3601 3287 * 3602 - * Note: The &perf->lock mutex has been taken to serialize 3603 - * with any non-file-operation driver hooks. 3604 - * 3605 3288 * Returns: zero on success or a negative error code. Returns -EINVAL for 3606 3289 * an unknown ioctl request. 3607 3290 */ ··· 3636 3325 unsigned long arg) 3637 3326 { 3638 3327 struct i915_perf_stream *stream = file->private_data; 3639 - struct i915_perf *perf = stream->perf; 3640 3328 long ret; 3641 3329 3642 - mutex_lock(&perf->lock); 3330 + mutex_lock(&stream->lock); 3643 3331 ret = i915_perf_ioctl_locked(stream, cmd, arg); 3644 - mutex_unlock(&perf->lock); 3332 + mutex_unlock(&stream->lock); 3645 3333 3646 3334 return ret; 3647 3335 } ··· 3652 3342 * Frees all resources associated with the given i915 perf @stream, disabling 3653 3343 * any associated data capture in the process. 3654 3344 * 3655 - * Note: The &perf->lock mutex has been taken to serialize 3345 + * Note: The &gt->perf.lock mutex has been taken to serialize 3656 3346 * with any non-file-operation driver hooks. 
3657 3347 */ 3658 3348 static void i915_perf_destroy_locked(struct i915_perf_stream *stream) ··· 3684 3374 { 3685 3375 struct i915_perf_stream *stream = file->private_data; 3686 3376 struct i915_perf *perf = stream->perf; 3377 + struct intel_gt *gt = stream->engine->gt; 3687 3378 3688 - mutex_lock(&perf->lock); 3379 + /* 3380 + * Within this call, we know that the fd is being closed and we have no 3381 + * other user of stream->lock. Use the perf lock to destroy the stream 3382 + * here. 3383 + */ 3384 + mutex_lock(&gt->perf.lock); 3689 3385 i915_perf_destroy_locked(stream); 3690 - mutex_unlock(&perf->lock); 3386 + mutex_unlock(&gt->perf.lock); 3691 3387 3692 3388 /* Release the reference the perf stream kept on the driver. */ 3693 3389 drm_dev_put(&perf->i915->drm); ··· 3726 3410 * See i915_perf_ioctl_open() for interface details. 3727 3411 * 3728 3412 * Implements further stream config validation and stream initialization on 3729 - * behalf of i915_perf_open_ioctl() with the &perf->lock mutex 3413 + * behalf of i915_perf_open_ioctl() with the &gt->perf.lock mutex 3730 3414 * taken to serialize with any non-file-operation driver hooks. 3731 3415 * 3732 3416 * Note: at this point the @props have only been validated in isolation and ··· 3881 3565 3882 3566 static u64 oa_exponent_to_ns(struct i915_perf *perf, int exponent) 3883 3567 { 3884 - return intel_gt_clock_interval_to_ns(to_gt(perf->i915), 3885 - 2ULL << exponent); 3568 + u64 nom = (2ULL << exponent) * NSEC_PER_SEC; 3569 + u32 den = i915_perf_oa_timestamp_frequency(perf->i915); 3570 + 3571 + return div_u64(nom + den - 1, den); 3886 3572 } 3887 3573 3888 3574 static __always_inline bool ··· 4112 3794 * mutex to avoid an awkward lockdep with mmap_lock. 
4113 3795 * 4114 3796 * Most of the implementation details are handled by 4115 - * i915_perf_open_ioctl_locked() after taking the &perf->lock 3797 + * i915_perf_open_ioctl_locked() after taking the &gt->perf.lock 4116 3798 * mutex for serializing with any non-file-operation driver hooks. 4117 3799 * 4118 3800 * Return: A newly opened i915 Perf stream file descriptor or negative ··· 4123 3805 { 4124 3806 struct i915_perf *perf = &to_i915(dev)->perf; 4125 3807 struct drm_i915_perf_open_param *param = data; 3808 + struct intel_gt *gt; 4126 3809 struct perf_open_properties props; 4127 3810 u32 known_open_flags; 4128 3811 int ret; ··· 4150 3831 if (ret) 4151 3832 return ret; 4152 3833 4153 - mutex_lock(&perf->lock); 3834 + gt = props.engine->gt; 3835 + 3836 + mutex_lock(&gt->perf.lock); 4154 3837 ret = i915_perf_open_ioctl_locked(perf, param, &props, file); 4155 - mutex_unlock(&perf->lock); 3838 + mutex_unlock(&gt->perf.lock); 4156 3839 4157 3840 return ret; 4158 3841 } ··· 4170 3849 void i915_perf_register(struct drm_i915_private *i915) 4171 3850 { 4172 3851 struct i915_perf *perf = &i915->perf; 3852 + struct intel_gt *gt = to_gt(i915); 4173 3853 4174 3854 if (!perf->i915) 4175 3855 return; ··· 4179 3857 * i915_perf_open_ioctl(); considering that we register after 4180 3858 * being exposed to userspace. 
4181 3859 */ 4182 - mutex_lock(&perf->lock); 3860 + mutex_lock(&gt->perf.lock); 4183 3861 4184 3862 perf->metrics_kobj = 4185 3863 kobject_create_and_add("metrics", 4186 3864 &i915->drm.primary->kdev->kobj); 4187 3865 4188 - mutex_unlock(&perf->lock); 3866 + mutex_unlock(&gt->perf.lock); 4189 3867 } 4190 3868 4191 3869 /** ··· 4261 3939 {} 4262 3940 }; 4263 3941 3942 + static const struct i915_range xehp_oa_b_counters[] = { 3943 + { .start = 0xdc48, .end = 0xdc48 }, /* OAA_ENABLE_REG */ 3944 + { .start = 0xdd00, .end = 0xdd48 }, /* OAG_LCE0_0 - OAA_LENABLE_REG */ 3945 + }; 3946 + 4264 3947 static const struct i915_range gen7_oa_mux_regs[] = { 4265 3948 { .start = 0x91b8, .end = 0x91cc }, /* OA_PERFCNT[1-2], OA_PERFMATRIX */ 4266 3949 { .start = 0x9800, .end = 0x9888 }, /* MICRO_BP0_0 - NOA_WRITE */ ··· 4338 4011 static bool gen12_is_valid_b_counter_addr(struct i915_perf *perf, u32 addr) 4339 4012 { 4340 4013 return reg_in_range_table(addr, gen12_oa_b_counters); 4014 + } 4015 + 4016 + static bool xehp_is_valid_b_counter_addr(struct i915_perf *perf, u32 addr) 4017 + { 4018 + return reg_in_range_table(addr, xehp_oa_b_counters) || 4019 + reg_in_range_table(addr, gen12_oa_b_counters); 4341 4020 } 4342 4021 4343 4022 static bool gen12_is_valid_mux_addr(struct i915_perf *perf, u32 addr) ··· 4744 4411 oa_format_add(perf, I915_OA_FORMAT_C4_B8); 4745 4412 break; 4746 4413 4414 + case INTEL_DG2: 4415 + oa_format_add(perf, I915_OAR_FORMAT_A32u40_A4u32_B8_C8); 4416 + oa_format_add(perf, I915_OA_FORMAT_A24u40_A14u32_B8_C8); 4417 + break; 4418 + 4747 4419 default: 4748 4420 MISSING_CASE(platform); 4421 + } 4422 + } 4423 + 4424 + static void i915_perf_init_info(struct drm_i915_private *i915) 4425 + { 4426 + struct i915_perf *perf = &i915->perf; 4427 + 4428 + switch (GRAPHICS_VER(i915)) { 4429 + case 8: 4430 + perf->ctx_oactxctrl_offset = 0x120; 4431 + perf->ctx_flexeu0_offset = 0x2ce; 4432 + perf->gen8_valid_ctx_bit = BIT(25); 4433 + break; 4434 + case 9: 4435 + 
perf->ctx_oactxctrl_offset = 0x128; 4436 + perf->ctx_flexeu0_offset = 0x3de; 4437 + perf->gen8_valid_ctx_bit = BIT(16); 4438 + break; 4439 + case 11: 4440 + perf->ctx_oactxctrl_offset = 0x124; 4441 + perf->ctx_flexeu0_offset = 0x78e; 4442 + perf->gen8_valid_ctx_bit = BIT(16); 4443 + break; 4444 + case 12: 4445 + /* 4446 + * Calculate offset at runtime in oa_pin_context for gen12 and 4447 + * cache the value in perf->ctx_oactxctrl_offset. 4448 + */ 4449 + break; 4450 + default: 4451 + MISSING_CASE(GRAPHICS_VER(i915)); 4749 4452 } 4750 4453 } 4751 4454 ··· 4797 4428 void i915_perf_init(struct drm_i915_private *i915) 4798 4429 { 4799 4430 struct i915_perf *perf = &i915->perf; 4800 - 4801 - /* XXX const struct i915_perf_ops! */ 4802 - 4803 - /* i915_perf is not enabled for DG2 yet */ 4804 - if (IS_DG2(i915)) 4805 - return; 4806 4431 4807 4432 perf->oa_formats = oa_formats; 4808 4433 if (IS_HASWELL(i915)) { ··· 4817 4454 * execlist mode by default. 4818 4455 */ 4819 4456 perf->ops.read = gen8_oa_read; 4457 + i915_perf_init_info(i915); 4820 4458 4821 4459 if (IS_GRAPHICS_VER(i915, 8, 9)) { 4822 4460 perf->ops.is_valid_b_counter_reg = ··· 4837 4473 perf->ops.enable_metric_set = gen8_enable_metric_set; 4838 4474 perf->ops.disable_metric_set = gen8_disable_metric_set; 4839 4475 perf->ops.oa_hw_tail_read = gen8_oa_hw_tail_read; 4840 - 4841 - if (GRAPHICS_VER(i915) == 8) { 4842 - perf->ctx_oactxctrl_offset = 0x120; 4843 - perf->ctx_flexeu0_offset = 0x2ce; 4844 - 4845 - perf->gen8_valid_ctx_bit = BIT(25); 4846 - } else { 4847 - perf->ctx_oactxctrl_offset = 0x128; 4848 - perf->ctx_flexeu0_offset = 0x3de; 4849 - 4850 - perf->gen8_valid_ctx_bit = BIT(16); 4851 - } 4852 4476 } else if (GRAPHICS_VER(i915) == 11) { 4853 4477 perf->ops.is_valid_b_counter_reg = 4854 4478 gen7_is_valid_b_counter_addr; ··· 4850 4498 perf->ops.enable_metric_set = gen8_enable_metric_set; 4851 4499 perf->ops.disable_metric_set = gen11_disable_metric_set; 4852 4500 perf->ops.oa_hw_tail_read = 
gen8_oa_hw_tail_read; 4853 - 4854 - perf->ctx_oactxctrl_offset = 0x124; 4855 - perf->ctx_flexeu0_offset = 0x78e; 4856 - 4857 - perf->gen8_valid_ctx_bit = BIT(16); 4858 4501 } else if (GRAPHICS_VER(i915) == 12) { 4859 4502 perf->ops.is_valid_b_counter_reg = 4503 + HAS_OA_SLICE_CONTRIB_LIMITS(i915) ? 4504 + xehp_is_valid_b_counter_addr : 4860 4505 gen12_is_valid_b_counter_addr; 4861 4506 perf->ops.is_valid_mux_reg = 4862 4507 gen12_is_valid_mux_addr; ··· 4865 4516 perf->ops.enable_metric_set = gen12_enable_metric_set; 4866 4517 perf->ops.disable_metric_set = gen12_disable_metric_set; 4867 4518 perf->ops.oa_hw_tail_read = gen12_oa_hw_tail_read; 4868 - 4869 - perf->ctx_flexeu0_offset = 0; 4870 - perf->ctx_oactxctrl_offset = 0x144; 4871 4519 } 4872 4520 } 4873 4521 4874 4522 if (perf->ops.enable_metric_set) { 4875 - mutex_init(&perf->lock); 4523 + struct intel_gt *gt; 4524 + int i; 4525 + 4526 + for_each_gt(gt, i915, i) 4527 + mutex_init(&gt->perf.lock); 4876 4528 4877 4529 /* Choose a representative limit */ 4878 4530 oa_sample_rate_hard_limit = to_gt(i915)->clock_frequency / 2;
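The gen12 and xehp b-counter validity hooks above both delegate to reg_in_range_table(), a linear scan that stops at the zeroed sentinel entry ({}) closing each table. A minimal standalone sketch of that lookup, with demo values mirroring the OA b-counter ranges in the hunk:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_range {
	uint32_t start;
	uint32_t end;
};

/*
 * Scan a range table until the zeroed sentinel entry is reached. Every
 * table passed in must therefore end with a trailing {} entry, or the
 * loop walks past the end of the array.
 */
static bool demo_reg_in_range_table(uint32_t addr, const struct demo_range *table)
{
	while (table->start || table->end) {
		if (addr >= table->start && addr <= table->end)
			return true;
		table++;
	}
	return false;
}

/* Values copied from the xehp OA b-counter ranges shown in the diff. */
static const struct demo_range demo_b_counters[] = {
	{ .start = 0xdc48, .end = 0xdc48 },
	{ .start = 0xdd00, .end = 0xdd48 },
	{}	/* sentinel */
};
```

The OR of two table lookups in xehp_is_valid_b_counter_addr() simply layers the Xe_HP-only ranges on top of the existing gen12 set.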
+2
drivers/gpu/drm/i915/i915_perf.h
··· 57 57 kref_put(&oa_config->ref, i915_oa_config_release); 58 58 } 59 59 60 + u32 i915_perf_oa_timestamp_frequency(struct drm_i915_private *i915); 61 + 60 62 #endif /* __I915_PERF_H__ */
+5 -1
drivers/gpu/drm/i915/i915_perf_oa_regs.h
··· 97 97 #define GEN12_OAR_OACONTROL_COUNTER_FORMAT_SHIFT 1 98 98 #define GEN12_OAR_OACONTROL_COUNTER_ENABLE (1 << 0) 99 99 100 - #define GEN12_OACTXCONTROL _MMIO(0x2360) 100 + #define GEN12_OACTXCONTROL(base) _MMIO((base) + 0x360) 101 101 #define GEN12_OAR_OASTATUS _MMIO(0x2968) 102 102 103 103 /* Gen12 OAG unit */ ··· 133 133 134 134 #define GDT_CHICKEN_BITS _MMIO(0x9840) 135 135 #define GT_NOA_ENABLE 0x00000080 136 + 137 + #define GEN12_SQCNT1 _MMIO(0x8718) 138 + #define GEN12_SQCNT1_PMON_ENABLE REG_BIT(30) 139 + #define GEN12_SQCNT1_OABPC REG_BIT(29) 136 140 137 141 #endif /* __INTEL_PERF_OA_REGS__ */
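GEN12_OACTXCONTROL becoming a `(base)`-parameterized macro means one definition now serves any engine's MMIO base instead of hardcoding the render engine's 0x2360. A trivial sketch of that offset arithmetic (the DEMO_* names are made up for illustration; 0x2000 is the render engine base):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for _MMIO(): the kernel wraps the offset in i915_reg_t. */
#define DEMO_MMIO(r) ((uint32_t)(r))

/* Old form hardcoded 0x2360; the new form adds 0x360 to any engine base. */
#define DEMO_OACTXCONTROL(base) DEMO_MMIO((base) + 0x360)
```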
+26 -21
drivers/gpu/drm/i915/i915_perf_types.h
··· 146 146 */ 147 147 struct intel_engine_cs *engine; 148 148 149 + /* 150 + * Lock associated with operations on stream 151 + */ 152 + struct mutex lock; 153 + 149 154 /** 150 155 * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*` 151 156 * properties given when opening a stream, representing the contents ··· 250 245 * @oa_buffer: State of the OA buffer. 251 246 */ 252 247 struct { 248 + const struct i915_oa_format *format; 253 249 struct i915_vma *vma; 254 250 u8 *vaddr; 255 251 u32 last_ctx_id; 256 - int format; 257 - int format_size; 258 252 int size_exponent; 259 253 260 254 /** ··· 384 380 u32 (*oa_hw_tail_read)(struct i915_perf_stream *stream); 385 381 }; 386 382 383 + struct i915_perf_gt { 384 + /* 385 + * Lock associated with anything below within this structure. 386 + */ 387 + struct mutex lock; 388 + 389 + /** 390 + * @sseu: sseu configuration selected to run while perf is active, 391 + * applies to all contexts. 392 + */ 393 + struct intel_sseu sseu; 394 + 395 + /* 396 + * @exclusive_stream: The stream currently using the OA unit. This is 397 + * sometimes accessed outside a syscall associated to its file 398 + * descriptor. 399 + */ 400 + struct i915_perf_stream *exclusive_stream; 401 + }; 402 + 387 403 struct i915_perf { 388 404 struct drm_i915_private *i915; 389 405 ··· 420 396 * need to hold perf->metrics_lock to access it. 421 397 */ 422 398 struct idr metrics_idr; 423 - 424 - /* 425 - * Lock associated with anything below within this structure 426 - * except exclusive_stream. 427 - */ 428 - struct mutex lock; 429 - 430 - /* 431 - * The stream currently using the OA unit. If accessed 432 - * outside a syscall associated to its file 433 - * descriptor. 434 - */ 435 - struct i915_perf_stream *exclusive_stream; 436 - 437 - /** 438 - * @sseu: sseu configuration selected to run while perf is active, 439 - * applies to all contexts. 
440 - */ 441 - struct intel_sseu sseu; 442 399 443 400 /** 444 401 * For rate limiting any notifications of spurious
+22
drivers/gpu/drm/i915/i915_reg.h
··· 1796 1796 #define XEHPSDV_RP_STATE_CAP _MMIO(0x250014) 1797 1797 #define PVC_RP_STATE_CAP _MMIO(0x281014) 1798 1798 1799 + #define MTL_RP_STATE_CAP _MMIO(0x138000) 1800 + #define MTL_MEDIAP_STATE_CAP _MMIO(0x138020) 1801 + #define MTL_RP0_CAP_MASK REG_GENMASK(8, 0) 1802 + #define MTL_RPN_CAP_MASK REG_GENMASK(24, 16) 1803 + 1804 + #define MTL_GT_RPE_FREQUENCY _MMIO(0x13800c) 1805 + #define MTL_MPE_FREQUENCY _MMIO(0x13802c) 1806 + #define MTL_RPE_MASK REG_GENMASK(8, 0) 1807 + 1799 1808 #define GT0_PERF_LIMIT_REASONS _MMIO(0x1381a8) 1800 1809 #define GT0_PERF_LIMIT_REASONS_MASK 0xde3 1801 1810 #define PROCHOT_MASK REG_BIT(0) ··· 1815 1806 #define POWER_LIMIT_4_MASK REG_BIT(8) 1816 1807 #define POWER_LIMIT_1_MASK REG_BIT(10) 1817 1808 #define POWER_LIMIT_2_MASK REG_BIT(11) 1809 + #define GT0_PERF_LIMIT_REASONS_LOG_MASK REG_GENMASK(31, 16) 1810 + #define MTL_MEDIA_PERF_LIMIT_REASONS _MMIO(0x138030) 1818 1811 1819 1812 #define CHV_CLK_CTL1 _MMIO(0x101100) 1820 1813 #define VLV_CLK_CTL2 _MMIO(0x101104) ··· 6663 6652 #define DG1_PCODE_STATUS 0x7E 6664 6653 #define DG1_UNCORE_GET_INIT_STATUS 0x0 6665 6654 #define DG1_UNCORE_INIT_STATUS_COMPLETE 0x1 6655 + #define PCODE_POWER_SETUP 0x7C 6656 + #define POWER_SETUP_SUBCOMMAND_READ_I1 0x4 6657 + #define POWER_SETUP_SUBCOMMAND_WRITE_I1 0x5 6658 + #define POWER_SETUP_I1_WATTS REG_BIT(31) 6659 + #define POWER_SETUP_I1_SHIFT 6 /* 10.6 fixed point format */ 6660 + #define POWER_SETUP_I1_DATA_MASK REG_GENMASK(15, 0) 6666 6661 #define GEN12_PCODE_READ_SAGV_BLOCK_TIME_US 0x23 6667 6662 #define XEHP_PCODE_FREQUENCY_CONFIG 0x6e /* xehpsdv, pvc */ 6668 6663 /* XEHP_PCODE_FREQUENCY_CONFIG sub-commands (param1) */ ··· 7805 7788 _ICL_PIPE_DSS_CTL2_PB, \ 7806 7789 _ICL_PIPE_DSS_CTL2_PC) 7807 7790 7791 + #define GGC _MMIO(0x108040) 7792 + #define GMS_MASK REG_GENMASK(15, 8) 7793 + #define GGMS_MASK REG_GENMASK(7, 6) 7794 + 7808 7795 #define GEN12_GSMBASE _MMIO(0x108100) 7809 7796 #define GEN12_DSMBASE _MMIO(0x1080C0) 7797 + #define 
GEN12_BDSM_MASK REG_GENMASK64(63, 20) 7810 7798 7811 7799 #define XEHP_CLOCK_GATE_DIS _MMIO(0x101014) 7812 7800 #define SGSI_SIDECLK_DIS REG_BIT(17)
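The new MTL_RP0_CAP_MASK / MTL_RPN_CAP_MASK style definitions are consumed with the kernel's REG_FIELD_GET() helper. A simplified, 32-bit-only sketch of how those GENMASK-based helpers reduce to shift-and-mask arithmetic (DEMO_* names are illustrative stand-ins, not the kernel macros):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified forms of REG_GENMASK()/REG_FIELD_GET(), 32-bit only. */
#define DEMO_GENMASK(h, l) \
	(((~0u) >> (31 - (h))) & ((~0u) << (l)))
/* (mask & -mask) isolates the lowest set bit, i.e. the field's LSB weight. */
#define DEMO_FIELD_GET(mask, val) \
	(((val) & (mask)) / ((mask) & -(mask)))

/* Matches MTL_RP0_CAP_MASK / MTL_RPN_CAP_MASK from the diff. */
#define DEMO_RP0_CAP_MASK DEMO_GENMASK(8, 0)
#define DEMO_RPN_CAP_MASK DEMO_GENMASK(24, 16)
```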
+13 -14
drivers/gpu/drm/i915/i915_reg_defs.h
··· 104 104 105 105 #define _MMIO(r) ((const i915_reg_t){ .reg = (r) }) 106 106 107 + typedef struct { 108 + u32 reg; 109 + } i915_mcr_reg_t; 110 + 107 111 #define INVALID_MMIO_REG _MMIO(0) 108 112 109 - static __always_inline u32 i915_mmio_reg_offset(i915_reg_t reg) 110 - { 111 - return reg.reg; 112 - } 113 - 114 - static inline bool i915_mmio_reg_equal(i915_reg_t a, i915_reg_t b) 115 - { 116 - return i915_mmio_reg_offset(a) == i915_mmio_reg_offset(b); 117 - } 118 - 119 - static inline bool i915_mmio_reg_valid(i915_reg_t reg) 120 - { 121 - return !i915_mmio_reg_equal(reg, INVALID_MMIO_REG); 122 - } 113 + /* 114 + * These macros can be used on either i915_reg_t or i915_mcr_reg_t since they're 115 + * simply operations on the register's offset and don't care about the MCR vs 116 + * non-MCR nature of the register. 117 + */ 118 + #define i915_mmio_reg_offset(r) \ 119 + _Generic((r), i915_reg_t: (r).reg, i915_mcr_reg_t: (r).reg) 120 + #define i915_mmio_reg_equal(a, b) (i915_mmio_reg_offset(a) == i915_mmio_reg_offset(b)) 121 + #define i915_mmio_reg_valid(r) (!i915_mmio_reg_equal(r, INVALID_MMIO_REG)) 123 122 124 123 #define VLV_DISPLAY_BASE 0x180000 125 124
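The `_Generic` rework above lets one macro accept either register wrapper type while still rejecting anything else at compile time. A minimal self-contained sketch of the same dispatch, using demo types in place of i915_reg_t / i915_mcr_reg_t:

```c
#include <assert.h>
#include <stdint.h>

/* Two distinct wrapper types, as with i915_reg_t vs. i915_mcr_reg_t. */
typedef struct { uint32_t reg; } demo_reg_t;
typedef struct { uint32_t reg; } demo_mcr_reg_t;

/*
 * _Generic selects the branch matching the argument's static type, so the
 * same macro works on both wrappers; passing a plain integer fails to
 * compile because no association matches.
 */
#define demo_reg_offset(r) \
	_Generic((r), demo_reg_t: (r).reg, demo_mcr_reg_t: (r).reg)
```

This is why the offset/equal/valid helpers could become macros: an inline function would have had to pick one parameter type.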
+24
drivers/gpu/drm/i915/i915_request.c
··· 1621 1621 return ret; 1622 1622 } 1623 1623 1624 + static void i915_request_await_huc(struct i915_request *rq) 1625 + { 1626 + struct intel_huc *huc = &rq->context->engine->gt->uc.huc; 1627 + 1628 + /* don't stall kernel submissions! */ 1629 + if (!rcu_access_pointer(rq->context->gem_context)) 1630 + return; 1631 + 1632 + if (intel_huc_wait_required(huc)) 1633 + i915_sw_fence_await_sw_fence(&rq->submit, 1634 + &huc->delayed_load.fence, 1635 + &rq->hucq); 1636 + } 1637 + 1624 1638 static struct i915_request * 1625 1639 __i915_request_ensure_parallel_ordering(struct i915_request *rq, 1626 1640 struct intel_timeline *timeline) ··· 1715 1701 { 1716 1702 struct intel_timeline *timeline = i915_request_timeline(rq); 1717 1703 struct i915_request *prev; 1704 + 1705 + /* 1706 + * Media workloads may require HuC, so stall them until HuC loading is 1707 + * complete. Note that HuC not being loaded when a user submission 1708 + * arrives can only happen when HuC is loaded via GSC and in that case 1709 + * we still expect the window between us starting to accept submissions 1710 + * and HuC loading completion to be small (a few hundred ms). 1711 + */ 1712 + if (rq->engine->class == VIDEO_DECODE_CLASS) 1713 + i915_request_await_huc(rq); 1718 1714 1719 1715 /* 1720 1716 * Dependency tracking and request ordering along the timeline
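The HuC stall added above gates on three conditions: the request targets a video engine, it carries a GEM context (i.e. it is a user submission, not a kernel one), and the HuC still has a delayed load in flight. The decision logic, stripped of the fence plumbing, can be sketched as a pure predicate (demo names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the gating in i915_request_await_huc() / __i915_request_commit():
 * only user submissions on video engines stall, and only while HuC loading
 * (via GSC) has not yet completed.
 */
static bool demo_needs_huc_stall(bool is_video_engine, bool has_gem_context,
				 bool huc_wait_required)
{
	if (!is_video_engine)
		return false;
	if (!has_gem_context)	/* don't stall kernel submissions */
		return false;
	return huc_wait_required;
}
```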
+5
drivers/gpu/drm/i915/i915_request.h
··· 348 348 #define GUC_PRIO_FINI 0xfe 349 349 u8 guc_prio; 350 350 351 + /** 352 + * @hucq: wait queue entry used to wait on the HuC load to complete 353 + */ 354 + wait_queue_entry_t hucq; 355 + 351 356 I915_SELFTEST_DECLARE(struct { 352 357 struct list_head link; 353 358 unsigned long delay;
+20 -12
drivers/gpu/drm/i915/i915_scatterlist.h
··· 9 9 10 10 #include <linux/pfn.h> 11 11 #include <linux/scatterlist.h> 12 - #include <linux/swiotlb.h> 12 + #include <linux/dma-mapping.h> 13 + #include <xen/xen.h> 13 14 14 15 #include "i915_gem.h" 15 16 ··· 128 127 return page_sizes; 129 128 } 130 129 131 - static inline unsigned int i915_sg_segment_size(void) 130 + static inline unsigned int i915_sg_segment_size(struct device *dev) 132 131 { 133 - unsigned int size = swiotlb_max_segment(); 132 + size_t max = min_t(size_t, UINT_MAX, dma_max_mapping_size(dev)); 134 133 135 - if (size == 0) 136 - size = UINT_MAX; 137 - 138 - size = rounddown(size, PAGE_SIZE); 139 - /* swiotlb_max_segment_size can return 1 byte when it means one page. */ 140 - if (size < PAGE_SIZE) 141 - size = PAGE_SIZE; 142 - 143 - return size; 134 + /* 135 + * For Xen PV guests pages aren't contiguous in DMA (machine) address 136 + * space. The DMA API takes care of that both in dma_alloc_* (by 137 + * calling into the hypervisor to make the pages contiguous) and in 138 + * dma_map_* (by bounce buffering). But i915 ignores the 139 + * coherency aspects of the DMA API and thus can't cope with bounce 140 + * buffering actually happening, so add a hack here to force small 141 + * allocations and mappings when running in PV mode on Xen. 142 + * 143 + * Note this will still break if bounce buffering is required for other 144 + * reasons, like confidential computing hypervisors or PCIe root ports 145 + * with addressing limitations. 146 + */ 147 + if (xen_pv_domain()) 148 + max = PAGE_SIZE; 149 + return round_down(max, PAGE_SIZE); 144 150 } 145 151 146 152 bool i915_sg_trim(struct sg_table *orig_st);
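The rewritten i915_sg_segment_size() clamps the device's DMA mapping limit to whole pages. Just that arithmetic can be checked in isolation, with dma_max_mapping_size() and xen_pv_domain() replaced by plain parameters and PAGE_SIZE assumed to be 4 KiB:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

/*
 * Sketch of the new sizing logic: take the smaller of UINT_MAX and the
 * device's DMA mapping limit (passed in directly here), optionally force
 * a single page for Xen PV, then round down to a page multiple.
 */
static uint32_t demo_sg_segment_size(uint64_t dma_max, int xen_pv)
{
	uint64_t max = dma_max < UINT32_MAX ? dma_max : UINT32_MAX;

	if (xen_pv)
		max = DEMO_PAGE_SIZE;
	return (uint32_t)(max - max % DEMO_PAGE_SIZE);
}
```

Unlike the old swiotlb_max_segment() path, the result can never round down to zero, since dma_max_mapping_size() reports at least a page for any usable device.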
+2
drivers/gpu/drm/i915/i915_selftest.h
··· 92 92 T, ARRAY_SIZE(T), data) 93 93 #define i915_live_subtests(T, data) ({ \ 94 94 typecheck(struct drm_i915_private *, data); \ 95 + (data)->gt[0]->uc.guc.submission_state.sched_disable_delay_ms = 0; \ 95 96 __i915_subtests(__func__, \ 96 97 __i915_live_setup, __i915_live_teardown, \ 97 98 T, ARRAY_SIZE(T), data); \ 98 99 }) 99 100 #define intel_gt_live_subtests(T, data) ({ \ 100 101 typecheck(struct intel_gt *, data); \ 102 + (data)->uc.guc.submission_state.sched_disable_delay_ms = 0; \ 101 103 __i915_subtests(__func__, \ 102 104 __intel_gt_live_setup, __intel_gt_live_teardown, \ 103 105 T, ARRAY_SIZE(T), data); \
-15
drivers/gpu/drm/i915/i915_trace.h
··· 671 671 (u32)(__entry->val >> 32)) 672 672 ); 673 673 674 - TRACE_EVENT(intel_gpu_freq_change, 675 - TP_PROTO(u32 freq), 676 - TP_ARGS(freq), 677 - 678 - TP_STRUCT__entry( 679 - __field(u32, freq) 680 - ), 681 - 682 - TP_fast_assign( 683 - __entry->freq = freq; 684 - ), 685 - 686 - TP_printk("new_freq=%u", __entry->freq) 687 - ); 688 - 689 674 /** 690 675 * DOC: i915_ppgtt_create and i915_ppgtt_release tracepoints 691 676 *
+2 -7
drivers/gpu/drm/i915/i915_vma.c
··· 776 776 GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE)); 777 777 778 778 alignment = max(alignment, i915_vm_obj_min_alignment(vma->vm, vma->obj)); 779 - /* 780 - * for compact-pt we round up the reservation to prevent 781 - * any smaller pages being used within the same PDE 782 - */ 783 - if (NEEDS_COMPACT_PT(vma->vm->i915)) 784 - size = round_up(size, alignment); 785 779 786 780 /* If binding the object/GGTT view requires more space than the entire 787 781 * aperture has, reject it early before evicting everything in a vain ··· 814 820 * forseeable future. See also i915_ggtt_offset(). 815 821 */ 816 822 if (upper_32_bits(end - 1) && 817 - vma->page_sizes.sg > I915_GTT_PAGE_SIZE) { 823 + vma->page_sizes.sg > I915_GTT_PAGE_SIZE && 824 + !HAS_64K_PAGES(vma->vm->i915)) { 818 825 /* 819 826 * We can't mix 64K and 4K PTEs in the same page-table 820 827 * (2M block), and so to avoid the ugliness and
+2 -1
drivers/gpu/drm/i915/intel_device_info.h
··· 146 146 /* Keep has_* in alphabetical order */ \ 147 147 func(has_64bit_reloc); \ 148 148 func(has_64k_pages); \ 149 - func(needs_compact_pt); \ 150 149 func(gpu_reset_clobbers_display); \ 151 150 func(has_reset_engine); \ 152 151 func(has_3d_pipeline); \ ··· 164 165 func(has_logical_ring_elsq); \ 165 166 func(has_media_ratio_mode); \ 166 167 func(has_mslice_steering); \ 168 + func(has_oa_bpc_reporting); \ 169 + func(has_oa_slice_contrib_limits); \ 167 170 func(has_one_eu_per_fuse_bit); \ 168 171 func(has_pxp); \ 169 172 func(has_rc6); \
+1 -1
drivers/gpu/drm/i915/intel_gvt_mmio_table.c
··· 102 102 MMIO_D(_MMIO(0x2438)); 103 103 MMIO_D(_MMIO(0x243c)); 104 104 MMIO_D(_MMIO(0x7018)); 105 - MMIO_D(HALF_SLICE_CHICKEN3); 105 + MMIO_D(HSW_HALF_SLICE_CHICKEN3); 106 106 MMIO_D(GEN7_HALF_SLICE_CHICKEN1); 107 107 /* display */ 108 108 MMIO_F(_MMIO(0x60220), 0x20);
+21
drivers/gpu/drm/i915/intel_mchbar_regs.h
··· 189 189 #define DG1_QCLK_RATIO_MASK REG_GENMASK(9, 2) 190 190 #define DG1_QCLK_REFERENCE REG_BIT(10) 191 191 192 + /* 193 + * *_PACKAGE_POWER_SKU - SKU power and timing parameters. 194 + */ 195 + #define PCU_PACKAGE_POWER_SKU _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5930) 196 + #define PKG_PKG_TDP GENMASK_ULL(14, 0) 197 + #define PKG_MAX_WIN GENMASK_ULL(54, 48) 198 + #define PKG_MAX_WIN_X GENMASK_ULL(54, 53) 199 + #define PKG_MAX_WIN_Y GENMASK_ULL(52, 48) 200 + 201 + #define PCU_PACKAGE_POWER_SKU_UNIT _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5938) 202 + #define PKG_PWR_UNIT REG_GENMASK(3, 0) 203 + #define PKG_ENERGY_UNIT REG_GENMASK(12, 8) 204 + #define PKG_TIME_UNIT REG_GENMASK(19, 16) 205 + #define PCU_PACKAGE_ENERGY_STATUS _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x593c) 206 + 192 207 #define GEN6_GT_PERF_STATUS _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5948) 193 208 #define GEN6_RP_STATE_LIMITS _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5994) 194 209 #define GEN6_RP_STATE_CAP _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5998) ··· 213 198 214 199 #define GEN10_FREQ_INFO_REC _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5ef0) 215 200 #define RPE_MASK REG_GENMASK(15, 8) 201 + #define PCU_PACKAGE_RAPL_LIMIT _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x59a0) 202 + #define PKG_PWR_LIM_1 REG_GENMASK(14, 0) 203 + #define PKG_PWR_LIM_1_EN REG_BIT(15) 204 + #define PKG_PWR_LIM_1_TIME REG_GENMASK(23, 17) 205 + #define PKG_PWR_LIM_1_TIME_X REG_GENMASK(23, 22) 206 + #define PKG_PWR_LIM_1_TIME_Y REG_GENMASK(21, 17) 216 207 217 208 /* snb MCH registers for priority tuning */ 218 209 #define MCH_SSKPD _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5d10)
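PKG_PWR_LIM_1_TIME is split into X (bits 23:22) and Y (bits 21:17) because the window is stored in the usual RAPL `(1 + x/4) * 2^y` quasi-floating-point encoding (this interpretation follows Intel's RAPL convention and is an assumption here, not spelled out in the diff). A sketch of the decode, scaled to quarters of a time unit so the math stays integral:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode a RAPL-style time window: value = (1 + x/4) * 2^y time units,
 * with x in [0,3]. Returned in quarter-units to avoid floating point.
 */
static uint64_t demo_rapl_window_qtr_units(uint32_t x, uint32_t y)
{
	return (uint64_t)(4 + x) << y;
}
```

The hwmon power1_max_interval attribute described in the ABI document above is the millisecond view of this same Tau value.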
+23 -5
drivers/gpu/drm/i915/intel_pci_config.h
··· 7 7 #define __INTEL_PCI_CONFIG_H__ 8 8 9 9 /* PCI BARs */ 10 - #define GTTMMADR_BAR 0 11 - #define GEN2_GTTMMADR_BAR 1 12 - #define GFXMEM_BAR 2 13 - #define GTT_APERTURE_BAR GFXMEM_BAR 14 - #define GEN12_LMEM_BAR GFXMEM_BAR 10 + #define GEN2_GMADR_BAR 0 11 + #define GEN2_MMADR_BAR 1 /* MMIO+GTT, despite the name */ 12 + #define GEN2_IO_BAR 2 /* 85x/865 */ 13 + 14 + #define GEN3_MMADR_BAR 0 /* MMIO only */ 15 + #define GEN3_IO_BAR 1 16 + #define GEN3_GMADR_BAR 2 17 + #define GEN3_GTTADR_BAR 3 /* GTT only */ 18 + 19 + #define GEN4_GTTMMADR_BAR 0 /* MMIO+GTT */ 20 + #define GEN4_GMADR_BAR 2 21 + #define GEN4_IO_BAR 4 22 + 23 + #define GEN12_LMEM_BAR 2 24 + 25 + static inline int intel_mmio_bar(int graphics_ver) 26 + { 27 + switch (graphics_ver) { 28 + case 2: return GEN2_MMADR_BAR; 29 + case 3: return GEN3_MMADR_BAR; 30 + default: return GEN4_GTTMMADR_BAR; 31 + } 32 + } 15 33 16 34 /* BSM in include/drm/i915_drm.h */ 17 35
+76 -129
drivers/gpu/drm/i915/intel_pm.c
··· 30 30 #include "display/skl_watermark.h" 31 31 32 32 #include "gt/intel_engine_regs.h" 33 + #include "gt/intel_gt.h" 34 + #include "gt/intel_gt_mcr.h" 33 35 #include "gt/intel_gt_regs.h" 34 36 35 37 #include "i915_drv.h" ··· 60 58 * Must match Sampler, Pixel Back End, and Media. See 61 59 * WaCompressedResourceSamplerPbeMediaNewHashMode. 62 60 */ 63 - intel_uncore_write(&dev_priv->uncore, CHICKEN_PAR1_1, 64 - intel_uncore_read(&dev_priv->uncore, CHICKEN_PAR1_1) | 65 - SKL_DE_COMPRESSED_HASH_MODE); 61 + intel_uncore_rmw(&dev_priv->uncore, CHICKEN_PAR1_1, 0, SKL_DE_COMPRESSED_HASH_MODE); 66 62 } 67 63 68 64 /* See Bspec note for PSR2_CTL bit 31, Wa#828:skl,bxt,kbl,cfl */ 69 - intel_uncore_write(&dev_priv->uncore, CHICKEN_PAR1_1, 70 - intel_uncore_read(&dev_priv->uncore, CHICKEN_PAR1_1) | SKL_EDP_PSR_FIX_RDWRAP); 65 + intel_uncore_rmw(&dev_priv->uncore, CHICKEN_PAR1_1, 0, SKL_EDP_PSR_FIX_RDWRAP); 71 66 72 67 /* WaEnableChickenDCPR:skl,bxt,kbl,glk,cfl */ 73 - intel_uncore_write(&dev_priv->uncore, GEN8_CHICKEN_DCPR_1, 74 - intel_uncore_read(&dev_priv->uncore, GEN8_CHICKEN_DCPR_1) | MASK_WAKEMEM); 68 + intel_uncore_rmw(&dev_priv->uncore, GEN8_CHICKEN_DCPR_1, 0, MASK_WAKEMEM); 75 69 76 70 /* 77 71 * WaFbcWakeMemOn:skl,bxt,kbl,glk,cfl 78 72 * Display WA #0859: skl,bxt,kbl,glk,cfl 79 73 */ 80 - intel_uncore_write(&dev_priv->uncore, DISP_ARB_CTL, intel_uncore_read(&dev_priv->uncore, DISP_ARB_CTL) | 81 - DISP_FBC_MEMORY_WAKE); 74 + intel_uncore_rmw(&dev_priv->uncore, DISP_ARB_CTL, 0, DISP_FBC_MEMORY_WAKE); 82 75 } 83 76 84 77 static void bxt_init_clock_gating(struct drm_i915_private *dev_priv) ··· 81 84 gen9_init_clock_gating(dev_priv); 82 85 83 86 /* WaDisableSDEUnitClockGating:bxt */ 84 - intel_uncore_write(&dev_priv->uncore, GEN8_UCGCTL6, intel_uncore_read(&dev_priv->uncore, GEN8_UCGCTL6) | 85 - GEN8_SDEUNIT_CLOCK_GATE_DISABLE); 87 + intel_uncore_rmw(&dev_priv->uncore, GEN8_UCGCTL6, 0, GEN8_SDEUNIT_CLOCK_GATE_DISABLE); 86 88 87 89 /* 88 90 * FIXME: 89 91 * 
GEN8_HDCUNIT_CLOCK_GATE_DISABLE_HDCREQ applies on 3x6 GT SKUs only. 90 92 */ 91 - intel_uncore_write(&dev_priv->uncore, GEN8_UCGCTL6, intel_uncore_read(&dev_priv->uncore, GEN8_UCGCTL6) | 92 - GEN8_HDCUNIT_CLOCK_GATE_DISABLE_HDCREQ); 93 + intel_uncore_rmw(&dev_priv->uncore, GEN8_UCGCTL6, 0, GEN8_HDCUNIT_CLOCK_GATE_DISABLE_HDCREQ); 93 94 94 95 /* 95 96 * Wa: Backlight PWM may stop in the asserted state, causing backlight ··· 108 113 * WaFbcTurnOffFbcWatermark:bxt 109 114 * Display WA #0562: bxt 110 115 */ 111 - intel_uncore_write(&dev_priv->uncore, DISP_ARB_CTL, intel_uncore_read(&dev_priv->uncore, DISP_ARB_CTL) | 112 - DISP_FBC_WM_DIS); 116 + intel_uncore_rmw(&dev_priv->uncore, DISP_ARB_CTL, 0, DISP_FBC_WM_DIS); 113 117 114 118 /* 115 119 * WaFbcHighMemBwCorruptionAvoidance:bxt 116 120 * Display WA #0883: bxt 117 121 */ 118 - intel_uncore_write(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A), 119 - intel_uncore_read(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A)) | 120 - DPFC_DISABLE_DUMMY0); 122 + intel_uncore_rmw(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A), 0, DPFC_DISABLE_DUMMY0); 121 123 } 122 124 123 125 static void glk_init_clock_gating(struct drm_i915_private *dev_priv) ··· 4045 4053 */ 4046 4054 static void ilk_init_lp_watermarks(struct drm_i915_private *dev_priv) 4047 4055 { 4048 - intel_uncore_write(&dev_priv->uncore, WM3_LP_ILK, intel_uncore_read(&dev_priv->uncore, WM3_LP_ILK) & ~WM_LP_ENABLE); 4049 - intel_uncore_write(&dev_priv->uncore, WM2_LP_ILK, intel_uncore_read(&dev_priv->uncore, WM2_LP_ILK) & ~WM_LP_ENABLE); 4050 - intel_uncore_write(&dev_priv->uncore, WM1_LP_ILK, intel_uncore_read(&dev_priv->uncore, WM1_LP_ILK) & ~WM_LP_ENABLE); 4056 + intel_uncore_rmw(&dev_priv->uncore, WM3_LP_ILK, WM_LP_ENABLE, 0); 4057 + intel_uncore_rmw(&dev_priv->uncore, WM2_LP_ILK, WM_LP_ENABLE, 0); 4058 + intel_uncore_rmw(&dev_priv->uncore, WM1_LP_ILK, WM_LP_ENABLE, 0); 4051 4059 4052 4060 /* 4053 4061 * Don't touch WM_LP_SPRITE_ENABLE here. 
··· 4101 4109 enum pipe pipe; 4102 4110 4103 4111 for_each_pipe(dev_priv, pipe) { 4104 - intel_uncore_write(&dev_priv->uncore, DSPCNTR(pipe), 4105 - intel_uncore_read(&dev_priv->uncore, DSPCNTR(pipe)) | 4106 - DISP_TRICKLE_FEED_DISABLE); 4112 + intel_uncore_rmw(&dev_priv->uncore, DSPCNTR(pipe), 0, DISP_TRICKLE_FEED_DISABLE); 4107 4113 4108 4114 intel_uncore_rmw(&dev_priv->uncore, DSPSURF(pipe), 0, 0); 4109 4115 intel_uncore_posting_read(&dev_priv->uncore, DSPSURF(pipe)); ··· 4150 4160 */ 4151 4161 if (IS_IRONLAKE_M(dev_priv)) { 4152 4162 /* WaFbcAsynchFlipDisableFbcQueue:ilk */ 4153 - intel_uncore_write(&dev_priv->uncore, ILK_DISPLAY_CHICKEN1, 4154 - intel_uncore_read(&dev_priv->uncore, ILK_DISPLAY_CHICKEN1) | 4155 - ILK_FBCQ_DIS); 4156 - intel_uncore_write(&dev_priv->uncore, ILK_DISPLAY_CHICKEN2, 4157 - intel_uncore_read(&dev_priv->uncore, ILK_DISPLAY_CHICKEN2) | 4158 - ILK_DPARB_GATE); 4163 + intel_uncore_rmw(&dev_priv->uncore, ILK_DISPLAY_CHICKEN1, 0, ILK_FBCQ_DIS); 4164 + intel_uncore_rmw(&dev_priv->uncore, ILK_DISPLAY_CHICKEN2, 0, ILK_DPARB_GATE); 4159 4165 } 4160 4166 4161 4167 intel_uncore_write(&dev_priv->uncore, ILK_DSPCLK_GATE_D, dspclk_gate); 4162 4168 4163 - intel_uncore_write(&dev_priv->uncore, ILK_DISPLAY_CHICKEN2, 4164 - intel_uncore_read(&dev_priv->uncore, ILK_DISPLAY_CHICKEN2) | 4165 - ILK_ELPIN_409_SELECT); 4169 + intel_uncore_rmw(&dev_priv->uncore, ILK_DISPLAY_CHICKEN2, 0, ILK_ELPIN_409_SELECT); 4166 4170 4167 4171 g4x_disable_trickle_feed(dev_priv); 4168 4172 ··· 4176 4192 intel_uncore_write(&dev_priv->uncore, SOUTH_DSPCLK_GATE_D, PCH_DPLSUNIT_CLOCK_GATE_DISABLE | 4177 4193 PCH_DPLUNIT_CLOCK_GATE_DISABLE | 4178 4194 PCH_CPUNIT_CLOCK_GATE_DISABLE); 4179 - intel_uncore_write(&dev_priv->uncore, SOUTH_CHICKEN2, intel_uncore_read(&dev_priv->uncore, SOUTH_CHICKEN2) | 4180 - DPLS_EDP_PPS_FIX_DIS); 4195 + intel_uncore_rmw(&dev_priv->uncore, SOUTH_CHICKEN2, 0, DPLS_EDP_PPS_FIX_DIS); 4181 4196 /* The below fixes the weird display corruption, a few pixels 
shifted 4182 4197 * downward, on (only) LVDS of some HP laptops with IVY. 4183 4198 */ ··· 4214 4231 4215 4232 intel_uncore_write(&dev_priv->uncore, ILK_DSPCLK_GATE_D, dspclk_gate); 4216 4233 4217 - intel_uncore_write(&dev_priv->uncore, ILK_DISPLAY_CHICKEN2, 4218 - intel_uncore_read(&dev_priv->uncore, ILK_DISPLAY_CHICKEN2) | 4219 - ILK_ELPIN_409_SELECT); 4234 + intel_uncore_rmw(&dev_priv->uncore, ILK_DISPLAY_CHICKEN2, 0, ILK_ELPIN_409_SELECT); 4220 4235 4221 4236 intel_uncore_write(&dev_priv->uncore, GEN6_UCGCTL1, 4222 4237 intel_uncore_read(&dev_priv->uncore, GEN6_UCGCTL1) | ··· 4274 4293 * disabled when not needed anymore in order to save power. 4275 4294 */ 4276 4295 if (HAS_PCH_LPT_LP(dev_priv)) 4277 - intel_uncore_write(&dev_priv->uncore, SOUTH_DSPCLK_GATE_D, 4278 - intel_uncore_read(&dev_priv->uncore, SOUTH_DSPCLK_GATE_D) | 4279 - PCH_LP_PARTITION_LEVEL_DISABLE); 4296 + intel_uncore_rmw(&dev_priv->uncore, SOUTH_DSPCLK_GATE_D, 4297 + 0, PCH_LP_PARTITION_LEVEL_DISABLE); 4280 4298 4281 4299 /* WADPOClockGatingDisable:hsw */ 4282 - intel_uncore_write(&dev_priv->uncore, TRANS_CHICKEN1(PIPE_A), 4283 - intel_uncore_read(&dev_priv->uncore, TRANS_CHICKEN1(PIPE_A)) | 4284 - TRANS_CHICKEN1_DP0UNIT_GC_DISABLE); 4300 + intel_uncore_rmw(&dev_priv->uncore, TRANS_CHICKEN1(PIPE_A), 4301 + 0, TRANS_CHICKEN1_DP0UNIT_GC_DISABLE); 4285 4302 } 4286 4303 4287 4304 static void lpt_suspend_hw(struct drm_i915_private *dev_priv) ··· 4300 4321 u32 val; 4301 4322 4302 4323 /* WaTempDisableDOPClkGating:bdw */ 4303 - misccpctl = intel_uncore_rmw(&dev_priv->uncore, GEN7_MISCCPCTL, 4304 - GEN7_DOP_CLOCK_GATE_ENABLE, 0); 4324 + misccpctl = intel_gt_mcr_multicast_rmw(to_gt(dev_priv), GEN8_MISCCPCTL, 4325 + GEN8_DOP_CLOCK_GATE_ENABLE, 0); 4305 4326 4306 - val = intel_uncore_read(&dev_priv->uncore, GEN8_L3SQCREG1); 4327 + val = intel_gt_mcr_read_any(to_gt(dev_priv), GEN8_L3SQCREG1); 4307 4328 val &= ~L3_PRIO_CREDITS_MASK; 4308 4329 val |= L3_GENERAL_PRIO_CREDITS(general_prio_credits); 4309 4330 
val |= L3_HIGH_PRIO_CREDITS(high_prio_credits); 4310 - intel_uncore_write(&dev_priv->uncore, GEN8_L3SQCREG1, val); 4331 + intel_gt_mcr_multicast_write(to_gt(dev_priv), GEN8_L3SQCREG1, val); 4311 4332 4312 4333 /* 4313 4334 * Wait at least 100 clocks before re-enabling clock gating. 4314 4335 * See the definition of L3SQCREG1 in BSpec. 4315 4336 */ 4316 - intel_uncore_posting_read(&dev_priv->uncore, GEN8_L3SQCREG1); 4337 + intel_gt_mcr_read_any(to_gt(dev_priv), GEN8_L3SQCREG1); 4317 4338 udelay(1); 4318 - intel_uncore_write(&dev_priv->uncore, GEN7_MISCCPCTL, misccpctl); 4339 + intel_gt_mcr_multicast_write(to_gt(dev_priv), GEN8_MISCCPCTL, misccpctl); 4319 4340 } 4320 4341 4321 4342 static void icl_init_clock_gating(struct drm_i915_private *dev_priv) ··· 4338 4359 4339 4360 /* Wa_1409825376:tgl (pre-prod)*/ 4340 4361 if (IS_TGL_DISPLAY_STEP(dev_priv, STEP_A0, STEP_C0)) 4341 - intel_uncore_write(&dev_priv->uncore, GEN9_CLKGATE_DIS_3, intel_uncore_read(&dev_priv->uncore, GEN9_CLKGATE_DIS_3) | 4342 - TGL_VRH_GATING_DIS); 4362 + intel_uncore_rmw(&dev_priv->uncore, GEN9_CLKGATE_DIS_3, 0, TGL_VRH_GATING_DIS); 4343 4363 4344 4364 /* Wa_14013723622:tgl,rkl,dg1,adl-s */ 4345 4365 if (DISPLAY_VER(dev_priv) == 12) ··· 4363 4385 4364 4386 /* Wa_1409836686:dg1[a0] */ 4365 4387 if (IS_DG1_GRAPHICS_STEP(dev_priv, STEP_A0, STEP_B0)) 4366 - intel_uncore_write(&dev_priv->uncore, GEN9_CLKGATE_DIS_3, intel_uncore_read(&dev_priv->uncore, GEN9_CLKGATE_DIS_3) | 4367 - DPT_GATING_DIS); 4388 + intel_uncore_rmw(&dev_priv->uncore, GEN9_CLKGATE_DIS_3, 0, DPT_GATING_DIS); 4368 4389 } 4369 4390 4370 4391 static void xehpsdv_init_clock_gating(struct drm_i915_private *dev_priv) ··· 4405 4428 return; 4406 4429 4407 4430 /* Display WA #1181 WaSouthDisplayDisablePWMCGEGating: cnp */ 4408 - intel_uncore_write(&dev_priv->uncore, SOUTH_DSPCLK_GATE_D, intel_uncore_read(&dev_priv->uncore, SOUTH_DSPCLK_GATE_D) | 4409 - CNP_PWM_CGE_GATING_DISABLE); 4431 + intel_uncore_rmw(&dev_priv->uncore, 
SOUTH_DSPCLK_GATE_D, 0, CNP_PWM_CGE_GATING_DISABLE); 4410 4432 } 4411 4433 4412 4434 static void cfl_init_clock_gating(struct drm_i915_private *dev_priv) ··· 4414 4438 gen9_init_clock_gating(dev_priv); 4415 4439 4416 4440 /* WAC6entrylatency:cfl */ 4417 - intel_uncore_write(&dev_priv->uncore, FBC_LLC_READ_CTRL, intel_uncore_read(&dev_priv->uncore, FBC_LLC_READ_CTRL) | 4418 - FBC_LLC_FULLY_OPEN); 4441 + intel_uncore_rmw(&dev_priv->uncore, FBC_LLC_READ_CTRL, 0, FBC_LLC_FULLY_OPEN); 4419 4442 4420 4443 /* 4421 4444 * WaFbcTurnOffFbcWatermark:cfl 4422 4445 * Display WA #0562: cfl 4423 4446 */ 4424 - intel_uncore_write(&dev_priv->uncore, DISP_ARB_CTL, intel_uncore_read(&dev_priv->uncore, DISP_ARB_CTL) | 4425 - DISP_FBC_WM_DIS); 4447 + intel_uncore_rmw(&dev_priv->uncore, DISP_ARB_CTL, 0, DISP_FBC_WM_DIS); 4426 4448 4427 4449 /* 4428 4450 * WaFbcNukeOnHostModify:cfl 4429 4451 * Display WA #0873: cfl 4430 4452 */ 4431 - intel_uncore_write(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A), 4432 - intel_uncore_read(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A)) | 4433 - DPFC_NUKE_ON_ANY_MODIFICATION); 4453 + intel_uncore_rmw(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A), 4454 + 0, DPFC_NUKE_ON_ANY_MODIFICATION); 4434 4455 } 4435 4456 4436 4457 static void kbl_init_clock_gating(struct drm_i915_private *dev_priv) ··· 4435 4462 gen9_init_clock_gating(dev_priv); 4436 4463 4437 4464 /* WAC6entrylatency:kbl */ 4438 - intel_uncore_write(&dev_priv->uncore, FBC_LLC_READ_CTRL, intel_uncore_read(&dev_priv->uncore, FBC_LLC_READ_CTRL) | 4439 - FBC_LLC_FULLY_OPEN); 4465 + intel_uncore_rmw(&dev_priv->uncore, FBC_LLC_READ_CTRL, 0, FBC_LLC_FULLY_OPEN); 4440 4466 4441 4467 /* WaDisableSDEUnitClockGating:kbl */ 4442 4468 if (IS_KBL_GRAPHICS_STEP(dev_priv, 0, STEP_C0)) 4443 - intel_uncore_write(&dev_priv->uncore, GEN8_UCGCTL6, intel_uncore_read(&dev_priv->uncore, GEN8_UCGCTL6) | 4444 - GEN8_SDEUNIT_CLOCK_GATE_DISABLE); 4469 + intel_uncore_rmw(&dev_priv->uncore, GEN8_UCGCTL6, 4470 + 0, 
GEN8_SDEUNIT_CLOCK_GATE_DISABLE); 4445 4471 4446 4472 /* WaDisableGamClockGating:kbl */ 4447 4473 if (IS_KBL_GRAPHICS_STEP(dev_priv, 0, STEP_C0)) 4448 - intel_uncore_write(&dev_priv->uncore, GEN6_UCGCTL1, intel_uncore_read(&dev_priv->uncore, GEN6_UCGCTL1) | 4449 - GEN6_GAMUNIT_CLOCK_GATE_DISABLE); 4474 + intel_uncore_rmw(&dev_priv->uncore, GEN6_UCGCTL1, 4475 + 0, GEN6_GAMUNIT_CLOCK_GATE_DISABLE); 4450 4476 4451 4477 /* 4452 4478 * WaFbcTurnOffFbcWatermark:kbl 4453 4479 * Display WA #0562: kbl 4454 4480 */ 4455 - intel_uncore_write(&dev_priv->uncore, DISP_ARB_CTL, intel_uncore_read(&dev_priv->uncore, DISP_ARB_CTL) | 4456 - DISP_FBC_WM_DIS); 4481 + intel_uncore_rmw(&dev_priv->uncore, DISP_ARB_CTL, 0, DISP_FBC_WM_DIS); 4457 4482 4458 4483 /* 4459 4484 * WaFbcNukeOnHostModify:kbl 4460 4485 * Display WA #0873: kbl 4461 4486 */ 4462 - intel_uncore_write(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A), 4463 - intel_uncore_read(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A)) | 4464 - DPFC_NUKE_ON_ANY_MODIFICATION); 4487 + intel_uncore_rmw(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A), 4488 + 0, DPFC_NUKE_ON_ANY_MODIFICATION); 4465 4489 } 4466 4490 4467 4491 static void skl_init_clock_gating(struct drm_i915_private *dev_priv) ··· 4466 4496 gen9_init_clock_gating(dev_priv); 4467 4497 4468 4498 /* WaDisableDopClockGating:skl */ 4469 - intel_uncore_write(&dev_priv->uncore, GEN7_MISCCPCTL, intel_uncore_read(&dev_priv->uncore, GEN7_MISCCPCTL) & 4470 - ~GEN7_DOP_CLOCK_GATE_ENABLE); 4499 + intel_gt_mcr_multicast_rmw(to_gt(dev_priv), GEN8_MISCCPCTL, 4500 + GEN8_DOP_CLOCK_GATE_ENABLE, 0); 4471 4501 4472 4502 /* WAC6entrylatency:skl */ 4473 - intel_uncore_write(&dev_priv->uncore, FBC_LLC_READ_CTRL, intel_uncore_read(&dev_priv->uncore, FBC_LLC_READ_CTRL) | 4474 - FBC_LLC_FULLY_OPEN); 4503 + intel_uncore_rmw(&dev_priv->uncore, FBC_LLC_READ_CTRL, 0, FBC_LLC_FULLY_OPEN); 4475 4504 4476 4505 /* 4477 4506 * WaFbcTurnOffFbcWatermark:skl 4478 4507 * Display WA #0562: skl 4479 4508 */ 
4480 - intel_uncore_write(&dev_priv->uncore, DISP_ARB_CTL, intel_uncore_read(&dev_priv->uncore, DISP_ARB_CTL) | 4481 - DISP_FBC_WM_DIS); 4509 + intel_uncore_rmw(&dev_priv->uncore, DISP_ARB_CTL, 0, DISP_FBC_WM_DIS); 4482 4510 4483 4511 /* 4484 4512 * WaFbcNukeOnHostModify:skl 4485 4513 * Display WA #0873: skl 4486 4514 */ 4487 - intel_uncore_write(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A), 4488 - intel_uncore_read(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A)) | 4489 - DPFC_NUKE_ON_ANY_MODIFICATION); 4515 + intel_uncore_rmw(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A), 4516 + 0, DPFC_NUKE_ON_ANY_MODIFICATION); 4490 4517 4491 4518 /* 4492 4519 * WaFbcHighMemBwCorruptionAvoidance:skl 4493 4520 * Display WA #0883: skl 4494 4521 */ 4495 - intel_uncore_write(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A), 4496 - intel_uncore_read(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A)) | 4497 - DPFC_DISABLE_DUMMY0); 4522 + intel_uncore_rmw(&dev_priv->uncore, ILK_DPFC_CHICKEN(INTEL_FBC_A), 0, DPFC_DISABLE_DUMMY0); 4498 4523 } 4499 4524 4500 4525 static void bdw_init_clock_gating(struct drm_i915_private *dev_priv) ··· 4497 4532 enum pipe pipe; 4498 4533 4499 4534 /* WaFbcAsynchFlipDisableFbcQueue:hsw,bdw */ 4500 - intel_uncore_write(&dev_priv->uncore, CHICKEN_PIPESL_1(PIPE_A), 4501 - intel_uncore_read(&dev_priv->uncore, CHICKEN_PIPESL_1(PIPE_A)) | 4502 - HSW_FBCQ_DIS); 4535 + intel_uncore_rmw(&dev_priv->uncore, CHICKEN_PIPESL_1(PIPE_A), 0, HSW_FBCQ_DIS); 4503 4536 4504 4537 /* WaSwitchSolVfFArbitrationPriority:bdw */ 4505 - intel_uncore_write(&dev_priv->uncore, GAM_ECOCHK, intel_uncore_read(&dev_priv->uncore, GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); 4538 + intel_uncore_rmw(&dev_priv->uncore, GAM_ECOCHK, 0, HSW_ECOCHK_ARB_PRIO_SOL); 4506 4539 4507 4540 /* WaPsrDPAMaskVBlankInSRD:bdw */ 4508 - intel_uncore_write(&dev_priv->uncore, CHICKEN_PAR1_1, 4509 - intel_uncore_read(&dev_priv->uncore, CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD); 4541 + 
intel_uncore_rmw(&dev_priv->uncore, CHICKEN_PAR1_1, 0, DPA_MASK_VBLANK_SRD); 4510 4542 4511 4543 for_each_pipe(dev_priv, pipe) { 4512 4544 /* WaPsrDPRSUnmaskVBlankInSRD:bdw */ 4513 - intel_uncore_write(&dev_priv->uncore, CHICKEN_PIPESL_1(pipe), 4514 - intel_uncore_read(&dev_priv->uncore, CHICKEN_PIPESL_1(pipe)) | 4515 - BDW_DPRS_MASK_VBLANK_SRD); 4545 + intel_uncore_rmw(&dev_priv->uncore, CHICKEN_PIPESL_1(pipe), 4546 + 0, BDW_DPRS_MASK_VBLANK_SRD); 4516 4547 } 4517 4548 4518 4549 /* WaVSRefCountFullforceMissDisable:bdw */ 4519 4550 /* WaDSRefCountFullforceMissDisable:bdw */ 4520 - intel_uncore_write(&dev_priv->uncore, GEN7_FF_THREAD_MODE, 4521 - intel_uncore_read(&dev_priv->uncore, GEN7_FF_THREAD_MODE) & 4522 - ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME)); 4551 + intel_uncore_rmw(&dev_priv->uncore, GEN7_FF_THREAD_MODE, 4552 + GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME, 0); 4523 4553 4524 4554 intel_uncore_write(&dev_priv->uncore, RING_PSMI_CTL(RENDER_RING_BASE), 4525 4555 _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE)); 4526 4556 4527 4557 /* WaDisableSDEUnitClockGating:bdw */ 4528 - intel_uncore_write(&dev_priv->uncore, GEN8_UCGCTL6, intel_uncore_read(&dev_priv->uncore, GEN8_UCGCTL6) | 4529 - GEN8_SDEUNIT_CLOCK_GATE_DISABLE); 4558 + intel_uncore_rmw(&dev_priv->uncore, GEN8_UCGCTL6, 0, GEN8_SDEUNIT_CLOCK_GATE_DISABLE); 4530 4559 4531 4560 /* WaProgramL3SqcReg1Default:bdw */ 4532 4561 gen8_set_l3sqc_credits(dev_priv, 30, 2); 4533 4562 4534 4563 /* WaKVMNotificationOnConfigChange:bdw */ 4535 - intel_uncore_write(&dev_priv->uncore, CHICKEN_PAR2_1, intel_uncore_read(&dev_priv->uncore, CHICKEN_PAR2_1) 4536 - | KVM_CONFIG_CHANGE_NOTIFICATION_SELECT); 4564 + intel_uncore_rmw(&dev_priv->uncore, CHICKEN_PAR2_1, 4565 + 0, KVM_CONFIG_CHANGE_NOTIFICATION_SELECT); 4537 4566 4538 4567 lpt_init_clock_gating(dev_priv); 4539 4568 ··· 4536 4577 * Also see the CHICKEN2 write in bdw_init_workarounds() to disable DOP 4537 4578 * clock gating. 
4538 4579 */ 4539 - intel_uncore_write(&dev_priv->uncore, GEN6_UCGCTL1, 4540 - intel_uncore_read(&dev_priv->uncore, GEN6_UCGCTL1) | GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE); 4580 + intel_uncore_rmw(&dev_priv->uncore, GEN6_UCGCTL1, 0, GEN6_EU_TCUNIT_CLOCK_GATE_DISABLE); 4541 4581 } 4542 4582 4543 4583 static void hsw_init_clock_gating(struct drm_i915_private *dev_priv) 4544 4584 { 4545 4585 /* WaFbcAsynchFlipDisableFbcQueue:hsw,bdw */ 4546 - intel_uncore_write(&dev_priv->uncore, CHICKEN_PIPESL_1(PIPE_A), 4547 - intel_uncore_read(&dev_priv->uncore, CHICKEN_PIPESL_1(PIPE_A)) | 4548 - HSW_FBCQ_DIS); 4586 + intel_uncore_rmw(&dev_priv->uncore, CHICKEN_PIPESL_1(PIPE_A), 0, HSW_FBCQ_DIS); 4549 4587 4550 4588 /* This is required by WaCatErrorRejectionIssue:hsw */ 4551 - intel_uncore_write(&dev_priv->uncore, GEN7_SQ_CHICKEN_MBCUNIT_CONFIG, 4552 - intel_uncore_read(&dev_priv->uncore, GEN7_SQ_CHICKEN_MBCUNIT_CONFIG) | 4553 - GEN7_SQ_CHICKEN_MBCUNIT_SQINTMOB); 4589 + intel_uncore_rmw(&dev_priv->uncore, GEN7_SQ_CHICKEN_MBCUNIT_CONFIG, 4590 + 0, GEN7_SQ_CHICKEN_MBCUNIT_SQINTMOB); 4554 4591 4555 4592 /* WaSwitchSolVfFArbitrationPriority:hsw */ 4556 - intel_uncore_write(&dev_priv->uncore, GAM_ECOCHK, intel_uncore_read(&dev_priv->uncore, GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); 4593 + intel_uncore_rmw(&dev_priv->uncore, GAM_ECOCHK, 0, HSW_ECOCHK_ARB_PRIO_SOL); 4557 4594 4558 4595 lpt_init_clock_gating(dev_priv); 4559 4596 } ··· 4559 4604 intel_uncore_write(&dev_priv->uncore, ILK_DSPCLK_GATE_D, ILK_VRHUNIT_CLOCK_GATE_DISABLE); 4560 4605 4561 4606 /* WaFbcAsynchFlipDisableFbcQueue:ivb */ 4562 - intel_uncore_write(&dev_priv->uncore, ILK_DISPLAY_CHICKEN1, 4563 - intel_uncore_read(&dev_priv->uncore, ILK_DISPLAY_CHICKEN1) | 4564 - ILK_FBCQ_DIS); 4607 + intel_uncore_rmw(&dev_priv->uncore, ILK_DISPLAY_CHICKEN1, 0, ILK_FBCQ_DIS); 4565 4608 4566 4609 /* WaDisableBackToBackFlipFix:ivb */ 4567 4610 intel_uncore_write(&dev_priv->uncore, IVB_CHICKEN3, ··· 4585 4632 GEN6_RCZUNIT_CLOCK_GATE_DISABLE); 4586 
4633 4587 4634 /* This is required by WaCatErrorRejectionIssue:ivb */ 4588 - intel_uncore_write(&dev_priv->uncore, GEN7_SQ_CHICKEN_MBCUNIT_CONFIG, 4589 - intel_uncore_read(&dev_priv->uncore, GEN7_SQ_CHICKEN_MBCUNIT_CONFIG) | 4590 - GEN7_SQ_CHICKEN_MBCUNIT_SQINTMOB); 4635 + intel_uncore_rmw(&dev_priv->uncore, GEN7_SQ_CHICKEN_MBCUNIT_CONFIG, 4636 + 0, GEN7_SQ_CHICKEN_MBCUNIT_SQINTMOB); 4591 4637 4592 4638 g4x_disable_trickle_feed(dev_priv); 4593 4639 ··· 4611 4659 _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE)); 4612 4660 4613 4661 /* This is required by WaCatErrorRejectionIssue:vlv */ 4614 - intel_uncore_write(&dev_priv->uncore, GEN7_SQ_CHICKEN_MBCUNIT_CONFIG, 4615 - intel_uncore_read(&dev_priv->uncore, GEN7_SQ_CHICKEN_MBCUNIT_CONFIG) | 4616 - GEN7_SQ_CHICKEN_MBCUNIT_SQINTMOB); 4662 + intel_uncore_rmw(&dev_priv->uncore, GEN7_SQ_CHICKEN_MBCUNIT_CONFIG, 4663 + 0, GEN7_SQ_CHICKEN_MBCUNIT_SQINTMOB); 4617 4664 4618 4665 /* 4619 4666 * According to the spec, bit 13 (RCZUNIT) must be set on IVB. ··· 4624 4673 /* WaDisableL3Bank2xClockGate:vlv 4625 4674 * Disabling L3 clock gating- MMIO 940c[25] = 1 4626 4675 * Set bit 25, to disable L3_BANK_2x_CLK_GATING */ 4627 - intel_uncore_write(&dev_priv->uncore, GEN7_UCGCTL4, 4628 - intel_uncore_read(&dev_priv->uncore, GEN7_UCGCTL4) | GEN7_L3BANK2X_CLOCK_GATE_DISABLE); 4676 + intel_uncore_rmw(&dev_priv->uncore, GEN7_UCGCTL4, 0, GEN7_L3BANK2X_CLOCK_GATE_DISABLE); 4629 4677 4630 4678 /* 4631 4679 * WaDisableVLVClockGating_VBIIssue:vlv ··· 4638 4688 { 4639 4689 /* WaVSRefCountFullforceMissDisable:chv */ 4640 4690 /* WaDSRefCountFullforceMissDisable:chv */ 4641 - intel_uncore_write(&dev_priv->uncore, GEN7_FF_THREAD_MODE, 4642 - intel_uncore_read(&dev_priv->uncore, GEN7_FF_THREAD_MODE) & 4643 - ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME)); 4691 + intel_uncore_rmw(&dev_priv->uncore, GEN7_FF_THREAD_MODE, 4692 + GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME, 0); 4644 4693 4645 4694 /* WaDisableSemaphoreAndSyncFlipWait:chv */ 4646 
4695 intel_uncore_write(&dev_priv->uncore, RING_PSMI_CTL(RENDER_RING_BASE), 4647 4696 _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE)); 4648 4697 4649 4698 /* WaDisableCSUnitClockGating:chv */ 4650 - intel_uncore_write(&dev_priv->uncore, GEN6_UCGCTL1, intel_uncore_read(&dev_priv->uncore, GEN6_UCGCTL1) | 4651 - GEN6_CSUNIT_CLOCK_GATE_DISABLE); 4699 + intel_uncore_rmw(&dev_priv->uncore, GEN6_UCGCTL1, 0, GEN6_CSUNIT_CLOCK_GATE_DISABLE); 4652 4700 4653 4701 /* WaDisableSDEUnitClockGating:chv */ 4654 - intel_uncore_write(&dev_priv->uncore, GEN8_UCGCTL6, intel_uncore_read(&dev_priv->uncore, GEN8_UCGCTL6) | 4655 - GEN8_SDEUNIT_CLOCK_GATE_DISABLE); 4702 + intel_uncore_rmw(&dev_priv->uncore, GEN8_UCGCTL6, 0, GEN8_SDEUNIT_CLOCK_GATE_DISABLE); 4656 4703 4657 4704 /* 4658 4705 * WaProgramL3SqcReg1Default:chv
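Most of the clock-gating hunks above mechanically convert open-coded read/modify/write sequences into the `intel_uncore_rmw()` helper. A minimal sketch of the equivalence, with a plain `uint32_t` pointer standing in for the MMIO register (illustrative only, not the real i915 signature):

```c
#include <stdint.h>
#include <assert.h>

/* Toy equivalent of intel_uncore_rmw(uncore, reg, clear, set): read the
 * register, drop the bits in @clear, OR in the bits in @set, write the
 * result back, and return the old value. */
static uint32_t fake_uncore_rmw(uint32_t *reg, uint32_t clear, uint32_t set)
{
	uint32_t old = *reg;

	*reg = (old & ~clear) | set;
	return old;
}
```

So a conversion like `write(reg, read(reg) | DISP_FBC_WM_DIS)` becomes `rmw(reg, 0, DISP_FBC_WM_DIS)`, while bit-clearing sites such as the `GEN7_FF_THREAD_MODE` hunk map to a non-zero `clear` mask with `set == 0`.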
+5
drivers/gpu/drm/i915/intel_runtime_pm.c
··· 633 633 runtime_pm); 634 634 int count = atomic_read(&rpm->wakeref_count); 635 635 636 + intel_wakeref_auto_fini(&rpm->userfault_wakeref); 637 + 636 638 drm_WARN(&i915->drm, count, 637 639 "i915 raw-wakerefs=%d wakelocks=%d on cleanup\n", 638 640 intel_rpm_raw_wakeref_count(count), ··· 654 652 rpm->available = HAS_RUNTIME_PM(i915); 655 653 656 654 init_intel_runtime_pm_wakeref(rpm); 655 + INIT_LIST_HEAD(&rpm->lmem_userfault_list); 656 + spin_lock_init(&rpm->lmem_userfault_lock); 657 + intel_wakeref_auto_init(&rpm->userfault_wakeref, rpm); 657 658 }
+22
drivers/gpu/drm/i915/intel_runtime_pm.h
··· 53 53 bool irqs_enabled; 54 54 bool no_wakeref_tracking; 55 55 56 + /* 57 + * Protects access to the lmem userfault list. 58 + * Outside of the runtime suspend path, access to 59 + * @lmem_userfault_list always requires first grabbing the 60 + * runtime pm wakeref, to ensure we can't race against runtime suspend. 61 + * Once we have that we also need to grab @lmem_userfault_lock, 62 + * at which point we have exclusive access. 63 + * The runtime suspend path is special since it doesn't really hold any locks, 64 + * but instead has exclusive access by virtue of all other accesses requiring 65 + * holding the runtime pm wakeref. 66 + */ 67 + spinlock_t lmem_userfault_lock; 68 + 69 + /* 70 + * List of userfaulted gem objects which need their mmap 71 + * mappings released on the runtime suspend path. 72 + */ 73 + struct list_head lmem_userfault_list; 74 + 75 + /* Manual runtime pm autosuspend delay for user GGTT/lmem mmaps */ 76 + struct intel_wakeref_auto userfault_wakeref; 77 + 56 78 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_RUNTIME_PM) 57 79 /* 58 80 * To aide detection of wakeref leaks and general misuse, we
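The locking comment in the hunk above describes a two-step protocol: normal paths must hold the runtime pm wakeref before taking the spinlock and touching the list, while the runtime suspend path owns the list lock-free because it only runs once no wakerefs remain. A toy model of that rule (all `fake_*` names are made up; counters stand in for the wakeref, lock, and list):

```c
#include <assert.h>

/* Illustrative model of the lmem userfault locking protocol. */
struct fake_rpm {
	int wakeref_count;	/* stands in for the runtime pm wakeref */
	int list_len;		/* stands in for lmem_userfault_list */
};

static int fake_userfault_add(struct fake_rpm *rpm)
{
	if (!rpm->wakeref_count)
		return -1;	/* must grab the wakeref first */
	/* ...then take lmem_userfault_lock before touching the list */
	rpm->list_len++;
	return 0;
}

static int fake_runtime_suspend(struct fake_rpm *rpm)
{
	if (rpm->wakeref_count)
		return -1;	/* can't suspend while anyone holds a wakeref */
	rpm->list_len = 0;	/* exclusive access by construction, no lock */
	return 0;
}
```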
+259 -21
drivers/gpu/drm/i915/intel_uncore.c
··· 104 104 "vebox1", 105 105 "vebox2", 106 106 "vebox3", 107 + "gsc", 107 108 }; 108 109 109 110 const char * ··· 889 888 spin_unlock_irq(&uncore->lock); 890 889 } 891 890 892 - /* We give fast paths for the really cool registers */ 891 + /* 892 + * We give fast paths for the really cool registers. The second range includes 893 + * media domains (and the GSC starting from Xe_LPM+) 894 + */ 893 895 #define NEEDS_FORCE_WAKE(reg) ({ \ 894 896 u32 __reg = (reg); \ 895 - __reg < 0x40000 || __reg >= GEN11_BSD_RING_BASE; \ 897 + __reg < 0x40000 || __reg >= 0x116000; \ 896 898 }) 897 899 898 900 static int fw_range_cmp(u32 offset, const struct intel_forcewake_range *entry) ··· 1133 1129 { .start = 0x1F4510, .end = 0x1F4550 }, 1134 1130 { .start = 0x1F8030, .end = 0x1F8030 }, 1135 1131 { .start = 0x1F8510, .end = 0x1F8550 }, 1132 + }; 1133 + 1134 + static const struct i915_range mtl_shadowed_regs[] = { 1135 + { .start = 0x2030, .end = 0x2030 }, 1136 + { .start = 0x2510, .end = 0x2550 }, 1137 + { .start = 0xA008, .end = 0xA00C }, 1138 + { .start = 0xA188, .end = 0xA188 }, 1139 + { .start = 0xA278, .end = 0xA278 }, 1140 + { .start = 0xA540, .end = 0xA56C }, 1141 + { .start = 0xC050, .end = 0xC050 }, 1142 + { .start = 0xC340, .end = 0xC340 }, 1143 + { .start = 0xC4C8, .end = 0xC4C8 }, 1144 + { .start = 0xC4E0, .end = 0xC4E0 }, 1145 + { .start = 0xC600, .end = 0xC600 }, 1146 + { .start = 0xC658, .end = 0xC658 }, 1147 + { .start = 0xCFD4, .end = 0xCFDC }, 1148 + { .start = 0x22030, .end = 0x22030 }, 1149 + { .start = 0x22510, .end = 0x22550 }, 1150 + }; 1151 + 1152 + static const struct i915_range xelpmp_shadowed_regs[] = { 1153 + { .start = 0x1C0030, .end = 0x1C0030 }, 1154 + { .start = 0x1C0510, .end = 0x1C0550 }, 1155 + { .start = 0x1C8030, .end = 0x1C8030 }, 1156 + { .start = 0x1C8510, .end = 0x1C8550 }, 1157 + { .start = 0x1D0030, .end = 0x1D0030 }, 1158 + { .start = 0x1D0510, .end = 0x1D0550 }, 1159 + { .start = 0x38A008, .end = 0x38A00C }, 1160 + { .start = 0x38A188, 
.end = 0x38A188 }, 1161 + { .start = 0x38A278, .end = 0x38A278 }, 1162 + { .start = 0x38A540, .end = 0x38A56C }, 1163 + { .start = 0x38A618, .end = 0x38A618 }, 1164 + { .start = 0x38C050, .end = 0x38C050 }, 1165 + { .start = 0x38C340, .end = 0x38C340 }, 1166 + { .start = 0x38C4C8, .end = 0x38C4C8 }, 1167 + { .start = 0x38C4E0, .end = 0x38C4E4 }, 1168 + { .start = 0x38C600, .end = 0x38C600 }, 1169 + { .start = 0x38C658, .end = 0x38C658 }, 1170 + { .start = 0x38CFD4, .end = 0x38CFDC }, 1136 1171 }; 1137 1172 1138 1173 static int mmio_range_cmp(u32 key, const struct i915_range *range) ··· 1682 1639 GEN_FW_RANGE(0x12000, 0x12fff, 0), /* 1683 1640 0x12000 - 0x127ff: always on 1684 1641 0x12800 - 0x12fff: reserved */ 1685 - GEN_FW_RANGE(0x13000, 0x23fff, FORCEWAKE_GT), /* 1642 + GEN_FW_RANGE(0x13000, 0x19fff, FORCEWAKE_GT), /* 1686 1643 0x13000 - 0x135ff: gt 1687 1644 0x13600 - 0x147ff: reserved 1688 1645 0x14800 - 0x153ff: gt 1689 - 0x15400 - 0x19fff: reserved 1690 - 0x1a000 - 0x1ffff: gt 1691 - 0x20000 - 0x21fff: reserved 1692 - 0x22000 - 0x23fff: gt */ 1646 + 0x15400 - 0x19fff: reserved */ 1647 + GEN_FW_RANGE(0x1a000, 0x21fff, FORCEWAKE_RENDER), /* 1648 + 0x1a000 - 0x1ffff: render 1649 + 0x20000 - 0x21fff: reserved */ 1650 + GEN_FW_RANGE(0x22000, 0x23fff, FORCEWAKE_GT), 1693 1651 GEN_FW_RANGE(0x24000, 0x2417f, 0), /* 1694 1652 24000 - 0x2407f: always on 1695 1653 24080 - 0x2417f: reserved */ 1696 - GEN_FW_RANGE(0x24180, 0x3ffff, FORCEWAKE_GT), /* 1654 + GEN_FW_RANGE(0x24180, 0x25fff, FORCEWAKE_GT), /* 1697 1655 0x24180 - 0x241ff: gt 1698 1656 0x24200 - 0x251ff: reserved 1699 1657 0x25200 - 0x252ff: gt 1700 - 0x25300 - 0x25fff: reserved 1701 - 0x26000 - 0x27fff: gt 1702 - 0x28000 - 0x2ffff: reserved 1703 - 0x30000 - 0x3ffff: gt */ 1658 + 0x25300 - 0x25fff: reserved */ 1659 + GEN_FW_RANGE(0x26000, 0x2ffff, FORCEWAKE_RENDER), /* 1660 + 0x26000 - 0x27fff: render 1661 + 0x28000 - 0x2ffff: reserved */ 1662 + GEN_FW_RANGE(0x30000, 0x3ffff, FORCEWAKE_GT), 1704 1663 
GEN_FW_RANGE(0x40000, 0x1bffff, 0), 1705 1664 GEN_FW_RANGE(0x1c0000, 0x1c3fff, FORCEWAKE_MEDIA_VDBOX0), /* 1706 1665 0x1c0000 - 0x1c2bff: VD0 ··· 1722 1677 0x1d4000 - 0x23ffff: reserved */ 1723 1678 GEN_FW_RANGE(0x240000, 0x3dffff, 0), 1724 1679 GEN_FW_RANGE(0x3e0000, 0x3effff, FORCEWAKE_GT), 1680 + }; 1681 + 1682 + static const struct intel_forcewake_range __mtl_fw_ranges[] = { 1683 + GEN_FW_RANGE(0x0, 0xaff, 0), 1684 + GEN_FW_RANGE(0xb00, 0xbff, FORCEWAKE_GT), 1685 + GEN_FW_RANGE(0xc00, 0xfff, 0), 1686 + GEN_FW_RANGE(0x1000, 0x1fff, FORCEWAKE_GT), 1687 + GEN_FW_RANGE(0x2000, 0x26ff, FORCEWAKE_RENDER), 1688 + GEN_FW_RANGE(0x2700, 0x2fff, FORCEWAKE_GT), 1689 + GEN_FW_RANGE(0x3000, 0x3fff, FORCEWAKE_RENDER), 1690 + GEN_FW_RANGE(0x4000, 0x51ff, FORCEWAKE_GT), /* 1691 + 0x4000 - 0x48ff: render 1692 + 0x4900 - 0x51ff: reserved */ 1693 + GEN_FW_RANGE(0x5200, 0x7fff, FORCEWAKE_RENDER), /* 1694 + 0x5200 - 0x53ff: render 1695 + 0x5400 - 0x54ff: reserved 1696 + 0x5500 - 0x7fff: render */ 1697 + GEN_FW_RANGE(0x8000, 0x813f, FORCEWAKE_GT), 1698 + GEN_FW_RANGE(0x8140, 0x817f, FORCEWAKE_RENDER), /* 1699 + 0x8140 - 0x815f: render 1700 + 0x8160 - 0x817f: reserved */ 1701 + GEN_FW_RANGE(0x8180, 0x81ff, 0), 1702 + GEN_FW_RANGE(0x8200, 0x94cf, FORCEWAKE_GT), /* 1703 + 0x8200 - 0x87ff: gt 1704 + 0x8800 - 0x8dff: reserved 1705 + 0x8e00 - 0x8f7f: gt 1706 + 0x8f80 - 0x8fff: reserved 1707 + 0x9000 - 0x947f: gt 1708 + 0x9480 - 0x94cf: reserved */ 1709 + GEN_FW_RANGE(0x94d0, 0x955f, FORCEWAKE_RENDER), 1710 + GEN_FW_RANGE(0x9560, 0x967f, 0), /* 1711 + 0x9560 - 0x95ff: always on 1712 + 0x9600 - 0x967f: reserved */ 1713 + GEN_FW_RANGE(0x9680, 0x97ff, FORCEWAKE_RENDER), /* 1714 + 0x9680 - 0x96ff: render 1715 + 0x9700 - 0x97ff: reserved */ 1716 + GEN_FW_RANGE(0x9800, 0xcfff, FORCEWAKE_GT), /* 1717 + 0x9800 - 0xb4ff: gt 1718 + 0xb500 - 0xbfff: reserved 1719 + 0xc000 - 0xcfff: gt */ 1720 + GEN_FW_RANGE(0xd000, 0xd7ff, 0), /* 1721 + 0xd000 - 0xd3ff: always on 1722 + 0xd400 - 0xd7ff: reserved */ 
1723 + GEN_FW_RANGE(0xd800, 0xd87f, FORCEWAKE_RENDER), 1724 + GEN_FW_RANGE(0xd880, 0xdbff, FORCEWAKE_GT), 1725 + GEN_FW_RANGE(0xdc00, 0xdcff, FORCEWAKE_RENDER), 1726 + GEN_FW_RANGE(0xdd00, 0xde7f, FORCEWAKE_GT), /* 1727 + 0xdd00 - 0xddff: gt 1728 + 0xde00 - 0xde7f: reserved */ 1729 + GEN_FW_RANGE(0xde80, 0xe8ff, FORCEWAKE_RENDER), /* 1730 + 0xde80 - 0xdfff: render 1731 + 0xe000 - 0xe0ff: reserved 1732 + 0xe100 - 0xe8ff: render */ 1733 + GEN_FW_RANGE(0xe900, 0xe9ff, FORCEWAKE_GT), 1734 + GEN_FW_RANGE(0xea00, 0x147ff, 0), /* 1735 + 0xea00 - 0x11fff: reserved 1736 + 0x12000 - 0x127ff: always on 1737 + 0x12800 - 0x147ff: reserved */ 1738 + GEN_FW_RANGE(0x14800, 0x19fff, FORCEWAKE_GT), /* 1739 + 0x14800 - 0x153ff: gt 1740 + 0x15400 - 0x19fff: reserved */ 1741 + GEN_FW_RANGE(0x1a000, 0x21fff, FORCEWAKE_RENDER), /* 1742 + 0x1a000 - 0x1bfff: render 1743 + 0x1c000 - 0x21fff: reserved */ 1744 + GEN_FW_RANGE(0x22000, 0x23fff, FORCEWAKE_GT), 1745 + GEN_FW_RANGE(0x24000, 0x2ffff, 0), /* 1746 + 0x24000 - 0x2407f: always on 1747 + 0x24080 - 0x2ffff: reserved */ 1748 + GEN_FW_RANGE(0x30000, 0x3ffff, FORCEWAKE_GT) 1749 + }; 1750 + 1751 + /* 1752 + * Note that the register ranges here are the final offsets after 1753 + * translation of the GSI block to the 0x380000 offset. 1754 + * 1755 + * NOTE: There are a couple MCR ranges near the bottom of this table 1756 + * that need to power up either VD0 or VD2 depending on which replicated 1757 + * instance of the register we're trying to access. Our forcewake logic 1758 + * at the moment doesn't have a good way to take steering into consideration, 1759 + * and the driver doesn't even access any registers in those ranges today, 1760 + * so for now we just mark those ranges as FORCEWAKE_ALL. That will ensure 1761 + * proper operation if we do start using the ranges in the future, and we 1762 + * can determine at that time whether it's worth adding extra complexity to 1763 + * the forcewake handling to take steering into consideration. 
1764 + */ 1765 + static const struct intel_forcewake_range __xelpmp_fw_ranges[] = { 1766 + GEN_FW_RANGE(0x0, 0x115fff, 0), /* render GT range */ 1767 + GEN_FW_RANGE(0x116000, 0x11ffff, FORCEWAKE_GSC), /* 1768 + 0x116000 - 0x117fff: gsc 1769 + 0x118000 - 0x119fff: reserved 1770 + 0x11a000 - 0x11efff: gsc 1771 + 0x11f000 - 0x11ffff: reserved */ 1772 + GEN_FW_RANGE(0x120000, 0x1bffff, 0), /* non-GT range */ 1773 + GEN_FW_RANGE(0x1c0000, 0x1c7fff, FORCEWAKE_MEDIA_VDBOX0), /* 1774 + 0x1c0000 - 0x1c3dff: VD0 1775 + 0x1c3e00 - 0x1c3eff: reserved 1776 + 0x1c3f00 - 0x1c3fff: VD0 1777 + 0x1c4000 - 0x1c7fff: reserved */ 1778 + GEN_FW_RANGE(0x1c8000, 0x1cbfff, FORCEWAKE_MEDIA_VEBOX0), /* 1779 + 0x1c8000 - 0x1ca0ff: VE0 1780 + 0x1ca100 - 0x1cbfff: reserved */ 1781 + GEN_FW_RANGE(0x1cc000, 0x1cffff, FORCEWAKE_MEDIA_VDBOX0), /* 1782 + 0x1cc000 - 0x1cdfff: VD0 1783 + 0x1ce000 - 0x1cffff: reserved */ 1784 + GEN_FW_RANGE(0x1d0000, 0x1d7fff, FORCEWAKE_MEDIA_VDBOX2), /* 1785 + 0x1d0000 - 0x1d3dff: VD2 1786 + 0x1d3e00 - 0x1d3eff: reserved 1787 + 0x1d4000 - 0x1d7fff: VD2 */ 1788 + GEN_FW_RANGE(0x1d8000, 0x1da0ff, FORCEWAKE_MEDIA_VEBOX1), 1789 + GEN_FW_RANGE(0x1da100, 0x380aff, 0), /* 1790 + 0x1da100 - 0x23ffff: reserved 1791 + 0x240000 - 0x37ffff: non-GT range 1792 + 0x380000 - 0x380aff: reserved */ 1793 + GEN_FW_RANGE(0x380b00, 0x380bff, FORCEWAKE_GT), 1794 + GEN_FW_RANGE(0x380c00, 0x380fff, 0), 1795 + GEN_FW_RANGE(0x381000, 0x38817f, FORCEWAKE_GT), /* 1796 + 0x381000 - 0x381fff: gt 1797 + 0x382000 - 0x383fff: reserved 1798 + 0x384000 - 0x384aff: gt 1799 + 0x384b00 - 0x3851ff: reserved 1800 + 0x385200 - 0x3871ff: gt 1801 + 0x387200 - 0x387fff: reserved 1802 + 0x388000 - 0x38813f: gt 1803 + 0x388140 - 0x38817f: reserved */ 1804 + GEN_FW_RANGE(0x388180, 0x3882ff, 0), /* 1805 + 0x388180 - 0x3881ff: always on 1806 + 0x388200 - 0x3882ff: reserved */ 1807 + GEN_FW_RANGE(0x388300, 0x38955f, FORCEWAKE_GT), /* 1808 + 0x388300 - 0x38887f: gt 1809 + 0x388880 - 0x388fff: reserved 1810 + 0x389000 - 
0x38947f: gt 1811 + 0x389480 - 0x38955f: reserved */ 1812 + GEN_FW_RANGE(0x389560, 0x389fff, 0), /* 1813 + 0x389560 - 0x3895ff: always on 1814 + 0x389600 - 0x389fff: reserved */ 1815 + GEN_FW_RANGE(0x38a000, 0x38cfff, FORCEWAKE_GT), /* 1816 + 0x38a000 - 0x38afff: gt 1817 + 0x38b000 - 0x38bfff: reserved 1818 + 0x38c000 - 0x38cfff: gt */ 1819 + GEN_FW_RANGE(0x38d000, 0x38d11f, 0), 1820 + GEN_FW_RANGE(0x38d120, 0x391fff, FORCEWAKE_GT), /* 1821 + 0x38d120 - 0x38dfff: gt 1822 + 0x38e000 - 0x38efff: reserved 1823 + 0x38f000 - 0x38ffff: gt 1824 + 0x390000 - 0x391fff: reserved */ 1825 + GEN_FW_RANGE(0x392000, 0x392fff, 0), /* 1826 + 0x392000 - 0x3927ff: always on 1827 + 0x392800 - 0x392fff: reserved */ 1828 + GEN_FW_RANGE(0x393000, 0x3931ff, FORCEWAKE_GT), 1829 + GEN_FW_RANGE(0x393200, 0x39323f, FORCEWAKE_ALL), /* instance-based, see note above */ 1830 + GEN_FW_RANGE(0x393240, 0x3933ff, FORCEWAKE_GT), 1831 + GEN_FW_RANGE(0x393400, 0x3934ff, FORCEWAKE_ALL), /* instance-based, see note above */ 1832 + GEN_FW_RANGE(0x393500, 0x393c7f, 0), /* 1833 + 0x393500 - 0x393bff: reserved 1834 + 0x393c00 - 0x393c7f: always on */ 1835 + GEN_FW_RANGE(0x393c80, 0x393dff, FORCEWAKE_GT), 1725 1836 }; 1726 1837 1727 1838 static void ··· 2222 2021 BUILD_BUG_ON(FORCEWAKE_MEDIA_VEBOX1 != (1 << FW_DOMAIN_ID_MEDIA_VEBOX1)); 2223 2022 BUILD_BUG_ON(FORCEWAKE_MEDIA_VEBOX2 != (1 << FW_DOMAIN_ID_MEDIA_VEBOX2)); 2224 2023 BUILD_BUG_ON(FORCEWAKE_MEDIA_VEBOX3 != (1 << FW_DOMAIN_ID_MEDIA_VEBOX3)); 2024 + BUILD_BUG_ON(FORCEWAKE_GSC != (1 << FW_DOMAIN_ID_GSC)); 2225 2025 2226 2026 d->mask = BIT(domain_id); 2227 2027 ··· 2287 2085 (ret ?: (ret = __fw_domain_init((uncore__), (id__), (set__), (ack__)))) 2288 2086 2289 2087 if (GRAPHICS_VER(i915) >= 11) { 2290 - /* we'll prune the domains of missing engines later */ 2291 - intel_engine_mask_t emask = RUNTIME_INFO(i915)->platform_engine_mask; 2089 + intel_engine_mask_t emask; 2292 2090 int i; 2293 2091
2092 + emask = uncore->gt->info.engine_mask; 2093 + 2294 2094 uncore->fw_get_funcs = &uncore_get_fallback; 2295 - fw_domain_init(uncore, FW_DOMAIN_ID_RENDER, 2296 - FORCEWAKE_RENDER_GEN9, 2297 - FORCEWAKE_ACK_RENDER_GEN9); 2298 - fw_domain_init(uncore, FW_DOMAIN_ID_GT, 2299 - FORCEWAKE_GT_GEN9, 2300 - FORCEWAKE_ACK_GT_GEN9); 2095 + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)) 2096 + fw_domain_init(uncore, FW_DOMAIN_ID_GT, 2097 + FORCEWAKE_GT_GEN9, 2098 + FORCEWAKE_ACK_GT_MTL); 2099 + else 2100 + fw_domain_init(uncore, FW_DOMAIN_ID_GT, 2101 + FORCEWAKE_GT_GEN9, 2102 + FORCEWAKE_ACK_GT_GEN9); 2103 + 2104 + if (RCS_MASK(uncore->gt) || CCS_MASK(uncore->gt)) 2105 + fw_domain_init(uncore, FW_DOMAIN_ID_RENDER, 2106 + FORCEWAKE_RENDER_GEN9, 2107 + FORCEWAKE_ACK_RENDER_GEN9); 2301 2108 2302 2109 for (i = 0; i < I915_MAX_VCS; i++) { 2303 2110 if (!__HAS_ENGINE(emask, _VCS(i))) ··· 2324 2113 FORCEWAKE_MEDIA_VEBOX_GEN11(i), 2325 2114 FORCEWAKE_ACK_MEDIA_VEBOX_GEN11(i)); 2326 2115 } 2116 + 2117 + if (uncore->gt->type == GT_MEDIA) 2118 + fw_domain_init(uncore, FW_DOMAIN_ID_GSC, 2119 + FORCEWAKE_REQ_GSC, FORCEWAKE_ACK_GSC); 2327 2120 } else if (IS_GRAPHICS_VER(i915, 9, 10)) { 2328 2121 uncore->fw_get_funcs = &uncore_get_fallback; 2329 2122 fw_domain_init(uncore, FW_DOMAIN_ID_RENDER, ··· 2515 2300 } 2516 2301 } 2517 2302 2303 + static int uncore_media_forcewake_init(struct intel_uncore *uncore) 2304 + { 2305 + struct drm_i915_private *i915 = uncore->i915; 2306 + 2307 + if (MEDIA_VER(i915) >= 13) { 2308 + ASSIGN_FW_DOMAINS_TABLE(uncore, __xelpmp_fw_ranges); 2309 + ASSIGN_SHADOW_TABLE(uncore, xelpmp_shadowed_regs); 2310 + ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); 2311 + } else { 2312 + MISSING_CASE(MEDIA_VER(i915)); 2313 + return -ENODEV; 2314 + } 2315 + 2316 + return 0; 2317 + } 2318 + 2518 2319 static int uncore_forcewake_init(struct intel_uncore *uncore) 2519 2320 { 2520 2321 struct drm_i915_private *i915 = uncore->i915; ··· 2545 2314 2546 2315 ASSIGN_READ_MMIO_VFUNCS(uncore, 
fwtable); 2547 2316 2548 - if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 60)) { 2317 + if (uncore->gt->type == GT_MEDIA) 2318 + return uncore_media_forcewake_init(uncore); 2319 + 2320 + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70)) { 2321 + ASSIGN_FW_DOMAINS_TABLE(uncore, __mtl_fw_ranges); 2322 + ASSIGN_SHADOW_TABLE(uncore, mtl_shadowed_regs); 2323 + ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); 2324 + } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 60)) { 2549 2325 ASSIGN_FW_DOMAINS_TABLE(uncore, __pvc_fw_ranges); 2550 2326 ASSIGN_SHADOW_TABLE(uncore, pvc_shadowed_regs); 2551 2327 ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable);
+2
drivers/gpu/drm/i915/intel_uncore.h
··· 62 62 FW_DOMAIN_ID_MEDIA_VEBOX1, 63 63 FW_DOMAIN_ID_MEDIA_VEBOX2, 64 64 FW_DOMAIN_ID_MEDIA_VEBOX3, 65 + FW_DOMAIN_ID_GSC, 65 66 66 67 FW_DOMAIN_ID_COUNT 67 68 }; ··· 83 82 FORCEWAKE_MEDIA_VEBOX1 = BIT(FW_DOMAIN_ID_MEDIA_VEBOX1), 84 83 FORCEWAKE_MEDIA_VEBOX2 = BIT(FW_DOMAIN_ID_MEDIA_VEBOX2), 85 84 FORCEWAKE_MEDIA_VEBOX3 = BIT(FW_DOMAIN_ID_MEDIA_VEBOX3), 85 + FORCEWAKE_GSC = BIT(FW_DOMAIN_ID_GSC), 86 86 87 87 FORCEWAKE_ALL = BIT(FW_DOMAIN_ID_COUNT) - 1, 88 88 };
+21 -11
drivers/gpu/drm/i915/pxp/intel_pxp.c
··· 103 103 104 104 static void destroy_vcs_context(struct intel_pxp *pxp) 105 105 { 106 - intel_engine_destroy_pinned_context(fetch_and_zero(&pxp->ce)); 106 + if (pxp->ce) 107 + intel_engine_destroy_pinned_context(fetch_and_zero(&pxp->ce)); 107 108 } 108 109 109 - void intel_pxp_init(struct intel_pxp *pxp) 110 + static void pxp_init_full(struct intel_pxp *pxp) 110 111 { 111 112 struct intel_gt *gt = pxp_to_gt(pxp); 112 113 int ret; 113 - 114 - if (!HAS_PXP(gt->i915)) 115 - return; 116 - 117 - mutex_init(&pxp->tee_mutex); 118 114 119 115 /* 120 116 * we'll use the completion to check if there is a termination pending, ··· 120 124 init_completion(&pxp->termination); 121 125 complete_all(&pxp->termination); 122 126 123 - mutex_init(&pxp->arb_mutex); 124 - INIT_WORK(&pxp->session_work, intel_pxp_session_work); 127 + intel_pxp_session_management_init(pxp); 125 128 126 129 ret = create_vcs_context(pxp); 127 130 if (ret) ··· 138 143 destroy_vcs_context(pxp); 139 144 } 140 145 141 - void intel_pxp_fini(struct intel_pxp *pxp) 146 + void intel_pxp_init(struct intel_pxp *pxp) 142 147 { 143 - if (!intel_pxp_is_enabled(pxp)) 148 + struct intel_gt *gt = pxp_to_gt(pxp); 149 + 150 + /* we rely on the mei PXP module */ 151 + if (!IS_ENABLED(CONFIG_INTEL_MEI_PXP)) 144 152 return; 145 153 154 + /* 155 + * If HuC is loaded by GSC but PXP is disabled, we can skip the init of 156 + * the full PXP session/object management and just init the tee channel. 157 + */ 158 + if (HAS_PXP(gt->i915)) 159 + pxp_init_full(pxp); 160 + else if (intel_huc_is_loaded_by_gsc(&gt->uc.huc) && intel_uc_uses_huc(&gt->uc)) 161 + intel_pxp_tee_component_init(pxp); 162 + } 163 + 164 + void intel_pxp_fini(struct intel_pxp *pxp) 165 + { 146 166 pxp->arb_is_valid = false; 147 167 148 168 intel_pxp_tee_component_fini(pxp);
-32
drivers/gpu/drm/i915/pxp/intel_pxp.h
··· 12 12 struct intel_pxp; 13 13 struct drm_i915_gem_object; 14 14 15 - #ifdef CONFIG_DRM_I915_PXP 16 15 struct intel_gt *pxp_to_gt(const struct intel_pxp *pxp); 17 16 bool intel_pxp_is_enabled(const struct intel_pxp *pxp); 18 17 bool intel_pxp_is_active(const struct intel_pxp *pxp); ··· 31 32 bool assign); 32 33 33 34 void intel_pxp_invalidate(struct intel_pxp *pxp); 34 - #else 35 - static inline void intel_pxp_init(struct intel_pxp *pxp) 36 - { 37 - } 38 - 39 - static inline void intel_pxp_fini(struct intel_pxp *pxp) 40 - { 41 - } 42 - 43 - static inline int intel_pxp_start(struct intel_pxp *pxp) 44 - { 45 - return -ENODEV; 46 - } 47 - 48 - static inline bool intel_pxp_is_enabled(const struct intel_pxp *pxp) 49 - { 50 - return false; 51 - } 52 - 53 - static inline bool intel_pxp_is_active(const struct intel_pxp *pxp) 54 - { 55 - return false; 56 - } 57 - 58 - static inline int intel_pxp_key_check(struct intel_pxp *pxp, 59 - struct drm_i915_gem_object *obj, 60 - bool assign) 61 - { 62 - return -ENODEV; 63 - } 64 - #endif 65 35 66 36 #endif /* __INTEL_PXP_H__ */
+69
drivers/gpu/drm/i915/pxp/intel_pxp_huc.c
··· 1 + // SPDX-License-Identifier: MIT 2 + /* 3 + * Copyright(c) 2021-2022, Intel Corporation. All rights reserved. 4 + */ 5 + 6 + #include "drm/i915_drm.h" 7 + #include "i915_drv.h" 8 + 9 + #include "gem/i915_gem_region.h" 10 + #include "gt/intel_gt.h" 11 + 12 + #include "intel_pxp.h" 13 + #include "intel_pxp_huc.h" 14 + #include "intel_pxp_tee.h" 15 + #include "intel_pxp_types.h" 16 + #include "intel_pxp_tee_interface.h" 17 + 18 + int intel_pxp_huc_load_and_auth(struct intel_pxp *pxp) 19 + { 20 + struct intel_gt *gt = pxp_to_gt(pxp); 21 + struct intel_huc *huc = &gt->uc.huc; 22 + struct pxp_tee_start_huc_auth_in huc_in = {0}; 23 + struct pxp_tee_start_huc_auth_out huc_out = {0}; 24 + dma_addr_t huc_phys_addr; 25 + u8 client_id = 0; 26 + u8 fence_id = 0; 27 + int err; 28 + 29 + if (!pxp->pxp_component) 30 + return -ENODEV; 31 + 32 + huc_phys_addr = i915_gem_object_get_dma_address(huc->fw.obj, 0); 33 + 34 + /* write the PXP message into the lmem (the sg list) */ 35 + huc_in.header.api_version = PXP_TEE_43_APIVER; 36 + huc_in.header.command_id = PXP_TEE_43_START_HUC_AUTH; 37 + huc_in.header.status = 0; 38 + huc_in.header.buffer_len = sizeof(huc_in.huc_base_address); 39 + huc_in.huc_base_address = huc_phys_addr; 40 + 41 + err = intel_pxp_tee_stream_message(pxp, client_id, fence_id, 42 + &huc_in, sizeof(huc_in), 43 + &huc_out, sizeof(huc_out)); 44 + if (err < 0) { 45 + drm_err(&gt->i915->drm, 46 + "Failed to send HuC load and auth command to GSC [%d]!\n", 47 + err); 48 + return err; 49 + } 50 + 51 + /* 52 + * HuC does sometimes survive suspend/resume (it depends on how "deep" 53 + * a sleep state the device reaches) so we can end up here on resume 54 + * with HuC already loaded, in which case the GSC will return 55 + * PXP_STATUS_OP_NOT_PERMITTED. We can therefore consider the HuC 56 + * correctly transferred in this scenario; if the same error is ever 57 + * returned with HuC not loaded we'll still catch it when we check the 58 + * authentication bit later. 
59 + */ 60 + if (huc_out.header.status != PXP_STATUS_SUCCESS && 61 + huc_out.header.status != PXP_STATUS_OP_NOT_PERMITTED) { 62 + drm_err(&gt->i915->drm, 63 + "HuC load failed with GSC error = 0x%x\n", 64 + huc_out.header.status); 65 + return -EPROTO; 66 + } 67 + 68 + return 0; 69 + }
+13
drivers/gpu/drm/i915/pxp/intel_pxp_huc.h
··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright(c) 2021-2022, Intel Corporation. All rights reserved. 4 + */ 5 + 6 + #ifndef __INTEL_PXP_HUC_H__ 7 + #define __INTEL_PXP_HUC_H__ 8 + 9 + struct intel_pxp; 10 + 11 + int intel_pxp_huc_load_and_auth(struct intel_pxp *pxp); 12 + 13 + #endif /* __INTEL_PXP_HUC_H__ */
+8
drivers/gpu/drm/i915/pxp/intel_pxp_irq.h
··· 27 27 static inline void intel_pxp_irq_handler(struct intel_pxp *pxp, u16 iir) 28 28 { 29 29 } 30 + 31 + static inline void intel_pxp_irq_enable(struct intel_pxp *pxp) 32 + { 33 + } 34 + 35 + static inline void intel_pxp_irq_disable(struct intel_pxp *pxp) 36 + { 37 + } 30 38 #endif 31 39 32 40 #endif /* __INTEL_PXP_IRQ_H__ */
+7 -1
drivers/gpu/drm/i915/pxp/intel_pxp_session.c
··· 138 138 complete_all(&pxp->termination); 139 139 } 140 140 141 - void intel_pxp_session_work(struct work_struct *work) 141 + static void pxp_session_work(struct work_struct *work) 142 142 { 143 143 struct intel_pxp *pxp = container_of(work, typeof(*pxp), session_work); 144 144 struct intel_gt *gt = pxp_to_gt(pxp); ··· 172 172 pxp_terminate_complete(pxp); 173 173 174 174 intel_runtime_pm_put(gt->uncore->rpm, wakeref); 175 + } 176 + 177 + void intel_pxp_session_management_init(struct intel_pxp *pxp) 178 + { 179 + mutex_init(&pxp->arb_mutex); 180 + INIT_WORK(&pxp->session_work, pxp_session_work); 175 181 }
+8 -3
drivers/gpu/drm/i915/pxp/intel_pxp_session.h
··· 8 8 9 9 #include <linux/types.h> 10 10 11 - struct work_struct; 11 + struct intel_pxp; 12 12 13 - void intel_pxp_session_work(struct work_struct *work); 14 - 13 + #ifdef CONFIG_DRM_I915_PXP 14 + void intel_pxp_session_management_init(struct intel_pxp *pxp); 15 + #else 16 + static inline void intel_pxp_session_management_init(struct intel_pxp *pxp) 17 + { 18 + } 19 + #endif 15 20 #endif /* __INTEL_PXP_SESSION_H__ */
+134 -5
drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
··· 8 8 #include <drm/i915_pxp_tee_interface.h> 9 9 #include <drm/i915_component.h> 10 10 11 + #include "gem/i915_gem_lmem.h" 12 + 11 13 #include "i915_drv.h" 12 14 #include "intel_pxp.h" 13 15 #include "intel_pxp_session.h" 14 16 #include "intel_pxp_tee.h" 15 17 #include "intel_pxp_tee_interface.h" 18 + #include "intel_pxp_huc.h" 16 19 17 20 static inline struct intel_pxp *i915_dev_to_pxp(struct device *i915_kdev) 18 21 { ··· 72 69 return ret; 73 70 } 74 71 72 + int intel_pxp_tee_stream_message(struct intel_pxp *pxp, 73 + u8 client_id, u32 fence_id, 74 + void *msg_in, size_t msg_in_len, 75 + void *msg_out, size_t msg_out_len) 76 + { 77 + /* TODO: for bigger objects we need to use a sg of 4k pages */ 78 + const size_t max_msg_size = PAGE_SIZE; 79 + struct drm_i915_private *i915 = pxp_to_gt(pxp)->i915; 80 + struct i915_pxp_component *pxp_component = pxp->pxp_component; 81 + unsigned int offset = 0; 82 + struct scatterlist *sg; 83 + int ret; 84 + 85 + if (msg_in_len > max_msg_size || msg_out_len > max_msg_size) 86 + return -ENOSPC; 87 + 88 + mutex_lock(&pxp->tee_mutex); 89 + 90 + if (unlikely(!pxp_component || !pxp_component->ops->gsc_command)) { 91 + ret = -ENODEV; 92 + goto unlock; 93 + } 94 + 95 + GEM_BUG_ON(!pxp->stream_cmd.obj); 96 + 97 + sg = i915_gem_object_get_sg_dma(pxp->stream_cmd.obj, 0, &offset); 98 + 99 + memcpy(pxp->stream_cmd.vaddr, msg_in, msg_in_len); 100 + 101 + ret = pxp_component->ops->gsc_command(pxp_component->tee_dev, client_id, 102 + fence_id, sg, msg_in_len, sg); 103 + if (ret < 0) 104 + drm_err(&i915->drm, "Failed to send PXP TEE gsc command\n"); 105 + else 106 + memcpy(msg_out, pxp->stream_cmd.vaddr, msg_out_len); 107 + 108 + unlock: 109 + mutex_unlock(&pxp->tee_mutex); 110 + return ret; 111 + } 112 + 75 113 /** 76 114 * i915_pxp_tee_component_bind - bind function to pass the function pointers to pxp_tee 77 115 * @i915_kdev: pointer to i915 kernel device ··· 128 84 { 129 85 struct drm_i915_private *i915 = kdev_to_i915(i915_kdev); 130 86 
struct intel_pxp *pxp = i915_dev_to_pxp(i915_kdev); 87 + struct intel_uc *uc = &pxp_to_gt(pxp)->uc; 131 88 intel_wakeref_t wakeref; 89 + int ret = 0; 132 90 133 91 mutex_lock(&pxp->tee_mutex); 134 92 pxp->pxp_component = data; 135 93 pxp->pxp_component->tee_dev = tee_kdev; 136 94 mutex_unlock(&pxp->tee_mutex); 95 + 96 + if (intel_uc_uses_huc(uc) && intel_huc_is_loaded_by_gsc(&uc->huc)) { 97 + with_intel_runtime_pm(&i915->runtime_pm, wakeref) { 98 + /* load huc via pxp */ 99 + ret = intel_huc_fw_load_and_auth_via_gsc(&uc->huc); 100 + if (ret < 0) 101 + drm_err(&i915->drm, "failed to load huc via gsc %d\n", ret); 102 + } 103 + } 137 104 138 105 /* if we are suspended, the HW will be re-initialized on resume */ 139 106 wakeref = intel_runtime_pm_get_if_in_use(&i915->runtime_pm); ··· 152 97 return 0; 153 98 154 99 /* the component is required to fully start the PXP HW */ 155 - intel_pxp_init_hw(pxp); 100 + if (intel_pxp_is_enabled(pxp)) 101 + intel_pxp_init_hw(pxp); 156 102 157 103 intel_runtime_pm_put(&i915->runtime_pm, wakeref); 158 104 159 - return 0; 105 + return ret; 160 106 } 161 107 162 108 static void i915_pxp_tee_component_unbind(struct device *i915_kdev, ··· 167 111 struct intel_pxp *pxp = i915_dev_to_pxp(i915_kdev); 168 112 intel_wakeref_t wakeref; 169 113 170 - with_intel_runtime_pm_if_in_use(&i915->runtime_pm, wakeref) 171 - intel_pxp_fini_hw(pxp); 114 + if (intel_pxp_is_enabled(pxp)) 115 + with_intel_runtime_pm_if_in_use(&i915->runtime_pm, wakeref) 116 + intel_pxp_fini_hw(pxp); 172 117 173 118 mutex_lock(&pxp->tee_mutex); 174 119 pxp->pxp_component = NULL; ··· 181 124 .unbind = i915_pxp_tee_component_unbind, 182 125 }; 183 126 127 + static int alloc_streaming_command(struct intel_pxp *pxp) 128 + { 129 + struct drm_i915_private *i915 = pxp_to_gt(pxp)->i915; 130 + struct drm_i915_gem_object *obj = NULL; 131 + void *cmd; 132 + int err; 133 + 134 + pxp->stream_cmd.obj = NULL; 135 + pxp->stream_cmd.vaddr = NULL; 136 + 137 + if (!IS_DGFX(i915)) 138 + return 0; 
139 + 140 + /* allocate lmem object of one page for PXP command memory and store it */ 141 + obj = i915_gem_object_create_lmem(i915, PAGE_SIZE, I915_BO_ALLOC_CONTIGUOUS); 142 + if (IS_ERR(obj)) { 143 + drm_err(&i915->drm, "Failed to allocate pxp streaming command!\n"); 144 + return PTR_ERR(obj); 145 + } 146 + 147 + err = i915_gem_object_pin_pages_unlocked(obj); 148 + if (err) { 149 + drm_err(&i915->drm, "Failed to pin gsc message page!\n"); 150 + goto out_put; 151 + } 152 + 153 + /* map the lmem into the virtual memory pointer */ 154 + cmd = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(i915, obj, true)); 155 + if (IS_ERR(cmd)) { 156 + drm_err(&i915->drm, "Failed to map gsc message page!\n"); 157 + err = PTR_ERR(cmd); 158 + goto out_unpin; 159 + } 160 + 161 + memset(cmd, 0, obj->base.size); 162 + 163 + pxp->stream_cmd.obj = obj; 164 + pxp->stream_cmd.vaddr = cmd; 165 + 166 + return 0; 167 + 168 + out_unpin: 169 + i915_gem_object_unpin_pages(obj); 170 + out_put: 171 + i915_gem_object_put(obj); 172 + return err; 173 + } 174 + 175 + static void free_streaming_command(struct intel_pxp *pxp) 176 + { 177 + struct drm_i915_gem_object *obj = fetch_and_zero(&pxp->stream_cmd.obj); 178 + 179 + if (!obj) 180 + return; 181 + 182 + i915_gem_object_unpin_map(obj); 183 + i915_gem_object_unpin_pages(obj); 184 + i915_gem_object_put(obj); 185 + } 186 + 184 187 int intel_pxp_tee_component_init(struct intel_pxp *pxp) 185 188 { 186 189 int ret; 187 190 struct intel_gt *gt = pxp_to_gt(pxp); 188 191 struct drm_i915_private *i915 = gt->i915; 189 192 193 + mutex_init(&pxp->tee_mutex); 194 + 195 + ret = alloc_streaming_command(pxp); 196 + if (ret) 197 + return ret; 198 + 190 199 ret = component_add_typed(i915->drm.dev, &i915_pxp_tee_component_ops, 191 200 I915_COMPONENT_PXP); 192 201 if (ret < 0) { 193 202 drm_err(&i915->drm, "Failed to add PXP component (%d)\n", ret); 194 - return ret; 203 + goto out_free; 195 204 } 196 205 197 206 pxp->pxp_component_added = true; 198 207 199 
208 return 0; 209 + 210 + out_free: 211 + free_streaming_command(pxp); 212 + return ret; 200 213 } 201 214 202 215 void intel_pxp_tee_component_fini(struct intel_pxp *pxp) ··· 278 151 279 152 component_del(i915->drm.dev, &i915_pxp_tee_component_ops); 280 153 pxp->pxp_component_added = false; 154 + 155 + free_streaming_command(pxp); 281 156 } 282 157 283 158 int intel_pxp_tee_cmd_create_arb_session(struct intel_pxp *pxp,
+5
drivers/gpu/drm/i915/pxp/intel_pxp_tee.h
··· 14 14 int intel_pxp_tee_cmd_create_arb_session(struct intel_pxp *pxp, 15 15 int arb_session_id); 16 16 17 + int intel_pxp_tee_stream_message(struct intel_pxp *pxp, 18 + u8 client_id, u32 fence_id, 19 + void *msg_in, size_t msg_in_len, 20 + void *msg_out, size_t msg_out_len); 21 + 17 22 #endif /* __INTEL_PXP_TEE_H__ */
+22 -1
drivers/gpu/drm/i915/pxp/intel_pxp_tee_interface.h
··· 1 1 /* SPDX-License-Identifier: MIT */ 2 2 /* 3 - * Copyright(c) 2020, Intel Corporation. All rights reserved. 3 + * Copyright(c) 2020-2022, Intel Corporation. All rights reserved. 4 4 */ 5 5 6 6 #ifndef __INTEL_PXP_TEE_INTERFACE_H__ ··· 9 9 #include <linux/types.h> 10 10 11 11 #define PXP_TEE_APIVER 0x40002 12 + #define PXP_TEE_43_APIVER 0x00040003 12 13 #define PXP_TEE_ARB_CMDID 0x1e 13 14 #define PXP_TEE_ARB_PROTECTION_MODE 0x2 15 + #define PXP_TEE_43_START_HUC_AUTH 0x0000003A 16 + 17 + /* 18 + * there are a lot of status codes for PXP, but we only define the ones we 19 + * actually can handle in the driver. other failure codes will be printed to 20 + * error msg for debug. 21 + */ 22 + enum pxp_status { 23 + PXP_STATUS_SUCCESS = 0x0, 24 + PXP_STATUS_OP_NOT_PERMITTED = 0x4013 25 + }; 14 26 15 27 /* PXP TEE message header */ 16 28 struct pxp_tee_cmd_header { ··· 44 32 struct pxp_tee_create_arb_out { 45 33 struct pxp_tee_cmd_header header; 46 34 } __packed; 35 + 36 + struct pxp_tee_start_huc_auth_in { 37 + struct pxp_tee_cmd_header header; 38 + __le64 huc_base_address; 39 + }; 40 + 41 + struct pxp_tee_start_huc_auth_out { 42 + struct pxp_tee_cmd_header header; 43 + }; 47 44 48 45 #endif /* __INTEL_PXP_TEE_INTERFACE_H__ */
+6
drivers/gpu/drm/i915/pxp/intel_pxp_types.h
··· 53 53 /** @tee_mutex: protects the tee channel binding and messaging. */ 54 54 struct mutex tee_mutex; 55 55 56 + /** @stream_cmd: LMEM obj used to send stream PXP commands to the GSC */ 57 + struct { 58 + struct drm_i915_gem_object *obj; /* contains PXP command memory */ 59 + void *vaddr; /* virtual memory for PXP command */ 60 + } stream_cmd; 61 + 56 62 /** 57 63 * @hw_state_invalidated: if the HW perceives an attack on the integrity 58 64 * of the encryption it will invalidate the keys and expect SW to
+2 -8
drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
··· 27 27 28 28 #include "gem/i915_gem_context.h" 29 29 #include "gem/i915_gem_internal.h" 30 + #include "gem/i915_gem_lmem.h" 30 31 #include "gem/i915_gem_region.h" 31 32 #include "gem/selftests/mock_context.h" 32 33 #include "gt/intel_context.h" ··· 1114 1113 expected_node_size = expected_vma_size; 1115 1114 1116 1115 if (HAS_64K_PAGES(vm->i915) && i915_gem_object_is_lmem(obj)) { 1117 - /* 1118 - * The compact-pt should expand lmem node to 2MB for the ppGTT, 1119 - * for all other cases we should only expect 64K. 1120 - */ 1121 1116 expected_vma_size = round_up(size, I915_GTT_PAGE_SIZE_64K); 1122 - if (NEEDS_COMPACT_PT(vm->i915) && !i915_is_ggtt(vm)) 1123 - expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_2M); 1124 - else 1125 - expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_64K); 1117 + expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_64K); 1126 1118 } 1127 1119 1128 1120 if (vma->size != expected_vma_size || vma->node.size != expected_node_size) {
+11 -5
drivers/gpu/drm/i915/selftests/i915_perf.c
··· 102 102 I915_OA_FORMAT_A32u40_A4u32_B8_C8 : I915_OA_FORMAT_C4_B8, 103 103 }; 104 104 struct i915_perf_stream *stream; 105 + struct intel_gt *gt; 106 + 107 + if (!props.engine) 108 + return NULL; 109 + 110 + gt = props.engine->gt; 105 111 106 112 if (!oa_config) 107 113 return NULL; ··· 122 116 123 117 stream->perf = perf; 124 118 125 - mutex_lock(&perf->lock); 119 + mutex_lock(&gt->perf.lock); 126 120 if (i915_oa_stream_init(stream, &param, &props)) { 127 121 kfree(stream); 128 122 stream = NULL; 129 123 } 130 - mutex_unlock(&perf->lock); 124 + mutex_unlock(&gt->perf.lock); 131 125 132 126 i915_oa_config_put(oa_config); 133 127 ··· 136 130 137 131 static void stream_destroy(struct i915_perf_stream *stream) 138 132 { 139 - struct i915_perf *perf = stream->perf; 133 + struct intel_gt *gt = stream->engine->gt; 140 134 141 - mutex_lock(&perf->lock); 135 + mutex_lock(&gt->perf.lock); 142 136 i915_perf_destroy_locked(stream); 143 - mutex_unlock(&perf->lock); 137 + mutex_unlock(&gt->perf.lock); 144 138 } 145 139 146 140 static int live_sanitycheck(void *arg)
+162 -90
drivers/gpu/drm/i915/selftests/i915_request.c
··· 299 299 return intel_context_create_request(ce); 300 300 } 301 301 302 - static int __igt_breadcrumbs_smoketest(void *arg) 302 + struct smoke_thread { 303 + struct kthread_worker *worker; 304 + struct kthread_work work; 305 + struct smoketest *t; 306 + bool stop; 307 + int result; 308 + }; 309 + 310 + static void __igt_breadcrumbs_smoketest(struct kthread_work *work) 303 311 { 304 - struct smoketest *t = arg; 312 + struct smoke_thread *thread = container_of(work, typeof(*thread), work); 313 + struct smoketest *t = thread->t; 305 314 const unsigned int max_batch = min(t->ncontexts, t->max_batch) - 1; 306 315 const unsigned int total = 4 * t->ncontexts + 1; 307 316 unsigned int num_waits = 0, num_fences = 0; ··· 329 320 */ 330 321 331 322 requests = kcalloc(total, sizeof(*requests), GFP_KERNEL); 332 - if (!requests) 333 - return -ENOMEM; 323 + if (!requests) { 324 + thread->result = -ENOMEM; 325 + return; 326 + } 334 327 335 328 order = i915_random_order(total, &prng); 336 329 if (!order) { ··· 340 329 goto out_requests; 341 330 } 342 331 343 - while (!kthread_should_stop()) { 332 + while (!READ_ONCE(thread->stop)) { 344 333 struct i915_sw_fence *submit, *wait; 345 334 unsigned int n, count; 346 335 ··· 448 437 kfree(order); 449 438 out_requests: 450 439 kfree(requests); 451 - return err; 440 + thread->result = err; 452 441 } 453 442 454 443 static int mock_breadcrumbs_smoketest(void *arg) ··· 461 450 .request_alloc = __mock_request_alloc 462 451 }; 463 452 unsigned int ncpus = num_online_cpus(); 464 - struct task_struct **threads; 453 + struct smoke_thread *threads; 465 454 unsigned int n; 466 455 int ret = 0; 467 456 ··· 490 479 } 491 480 492 481 for (n = 0; n < ncpus; n++) { 493 - threads[n] = kthread_run(__igt_breadcrumbs_smoketest, 494 - &t, "igt/%d", n); 495 - if (IS_ERR(threads[n])) { 496 - ret = PTR_ERR(threads[n]); 482 + struct kthread_worker *worker; 483 + 484 + worker = kthread_create_worker(0, "igt/%d", n); 485 + if (IS_ERR(worker)) { 486 + ret = 
PTR_ERR(worker); 497 487 ncpus = n; 498 488 break; 499 489 } 500 490 501 - get_task_struct(threads[n]); 491 + threads[n].worker = worker; 492 + threads[n].t = &t; 493 + threads[n].stop = false; 494 + threads[n].result = 0; 495 + 496 + kthread_init_work(&threads[n].work, 497 + __igt_breadcrumbs_smoketest); 498 + kthread_queue_work(worker, &threads[n].work); 502 499 } 503 500 504 - yield(); /* start all threads before we begin */ 505 501 msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies)); 506 502 507 503 for (n = 0; n < ncpus; n++) { 508 504 int err; 509 505 510 - err = kthread_stop(threads[n]); 506 + WRITE_ONCE(threads[n].stop, true); 507 + kthread_flush_work(&threads[n].work); 508 + err = READ_ONCE(threads[n].result); 511 509 if (err < 0 && !ret) 512 510 ret = err; 513 511 514 - put_task_struct(threads[n]); 512 + kthread_destroy_worker(threads[n].worker); 515 513 } 516 514 pr_info("Completed %lu waits for %lu fence across %d cpus\n", 517 515 atomic_long_read(&t.num_waits), ··· 1439 1419 return err; 1440 1420 } 1441 1421 1442 - static int __live_parallel_engine1(void *arg) 1422 + struct parallel_thread { 1423 + struct kthread_worker *worker; 1424 + struct kthread_work work; 1425 + struct intel_engine_cs *engine; 1426 + int result; 1427 + }; 1428 + 1429 + static void __live_parallel_engine1(struct kthread_work *work) 1443 1430 { 1444 - struct intel_engine_cs *engine = arg; 1431 + struct parallel_thread *thread = 1432 + container_of(work, typeof(*thread), work); 1433 + struct intel_engine_cs *engine = thread->engine; 1445 1434 IGT_TIMEOUT(end_time); 1446 1435 unsigned long count; 1447 1436 int err = 0; ··· 1481 1452 intel_engine_pm_put(engine); 1482 1453 1483 1454 pr_info("%s: %lu request + sync\n", engine->name, count); 1484 - return err; 1455 + thread->result = err; 1485 1456 } 1486 1457 1487 - static int __live_parallel_engineN(void *arg) 1458 + static void __live_parallel_engineN(struct kthread_work *work) 1488 1459 { 1489 - struct intel_engine_cs *engine = 
arg; 1460 + struct parallel_thread *thread = 1461 + container_of(work, typeof(*thread), work); 1462 + struct intel_engine_cs *engine = thread->engine; 1490 1463 IGT_TIMEOUT(end_time); 1491 1464 unsigned long count; 1492 1465 int err = 0; ··· 1510 1479 intel_engine_pm_put(engine); 1511 1480 1512 1481 pr_info("%s: %lu requests\n", engine->name, count); 1513 - return err; 1482 + thread->result = err; 1514 1483 } 1515 1484 1516 1485 static bool wake_all(struct drm_i915_private *i915) ··· 1536 1505 return -ETIME; 1537 1506 } 1538 1507 1539 - static int __live_parallel_spin(void *arg) 1508 + static void __live_parallel_spin(struct kthread_work *work) 1540 1509 { 1541 - struct intel_engine_cs *engine = arg; 1510 + struct parallel_thread *thread = 1511 + container_of(work, typeof(*thread), work); 1512 + struct intel_engine_cs *engine = thread->engine; 1542 1513 struct igt_spinner spin; 1543 1514 struct i915_request *rq; 1544 1515 int err = 0; ··· 1553 1520 1554 1521 if (igt_spinner_init(&spin, engine->gt)) { 1555 1522 wake_all(engine->i915); 1556 - return -ENOMEM; 1523 + thread->result = -ENOMEM; 1524 + return; 1557 1525 } 1558 1526 1559 1527 intel_engine_pm_get(engine); ··· 1587 1553 1588 1554 out_spin: 1589 1555 igt_spinner_fini(&spin); 1590 - return err; 1556 + thread->result = err; 1591 1557 } 1592 1558 1593 1559 static int live_parallel_engines(void *arg) 1594 1560 { 1595 1561 struct drm_i915_private *i915 = arg; 1596 - static int (* const func[])(void *arg) = { 1562 + static void (* const func[])(struct kthread_work *) = { 1597 1563 __live_parallel_engine1, 1598 1564 __live_parallel_engineN, 1599 1565 __live_parallel_spin, 1600 1566 NULL, 1601 1567 }; 1602 1568 const unsigned int nengines = num_uabi_engines(i915); 1569 + struct parallel_thread *threads; 1603 1570 struct intel_engine_cs *engine; 1604 - int (* const *fn)(void *arg); 1605 - struct task_struct **tsk; 1571 + void (* const *fn)(struct kthread_work *); 1606 1572 int err = 0; 1607 1573 1608 1574 /* ··· 1610 
1576 * tests that we load up the system maximally. 1611 1577 */ 1612 1578 1613 - tsk = kcalloc(nengines, sizeof(*tsk), GFP_KERNEL); 1614 - if (!tsk) 1579 + threads = kcalloc(nengines, sizeof(*threads), GFP_KERNEL); 1580 + if (!threads) 1615 1581 return -ENOMEM; 1616 1582 1617 1583 for (fn = func; !err && *fn; fn++) { ··· 1628 1594 1629 1595 idx = 0; 1630 1596 for_each_uabi_engine(engine, i915) { 1631 - tsk[idx] = kthread_run(*fn, engine, 1632 - "igt/parallel:%s", 1633 - engine->name); 1634 - if (IS_ERR(tsk[idx])) { 1635 - err = PTR_ERR(tsk[idx]); 1597 + struct kthread_worker *worker; 1598 + 1599 + worker = kthread_create_worker(0, "igt/parallel:%s", 1600 + engine->name); 1601 + if (IS_ERR(worker)) { 1602 + err = PTR_ERR(worker); 1636 1603 break; 1637 1604 } 1638 - get_task_struct(tsk[idx++]); 1639 - } 1640 1605 1641 - yield(); /* start all threads before we kthread_stop() */ 1606 + threads[idx].worker = worker; 1607 + threads[idx].result = 0; 1608 + threads[idx].engine = engine; 1609 + 1610 + kthread_init_work(&threads[idx].work, *fn); 1611 + kthread_queue_work(worker, &threads[idx].work); 1612 + idx++; 1613 + } 1642 1614 1643 1615 idx = 0; 1644 1616 for_each_uabi_engine(engine, i915) { 1645 1617 int status; 1646 1618 1647 - if (IS_ERR(tsk[idx])) 1619 + if (!threads[idx].worker) 1648 1620 break; 1649 1621 1650 - status = kthread_stop(tsk[idx]); 1622 + kthread_flush_work(&threads[idx].work); 1623 + status = READ_ONCE(threads[idx].result); 1651 1624 if (status && !err) 1652 1625 err = status; 1653 1626 1654 - put_task_struct(tsk[idx++]); 1627 + kthread_destroy_worker(threads[idx++].worker); 1655 1628 } 1656 1629 1657 1630 if (igt_live_test_end(&t)) 1658 1631 err = -EIO; 1659 1632 } 1660 1633 1661 - kfree(tsk); 1634 + kfree(threads); 1662 1635 return err; 1663 1636 } 1664 1637 ··· 1713 1672 const unsigned int ncpus = num_online_cpus(); 1714 1673 unsigned long num_waits, num_fences; 1715 1674 struct intel_engine_cs *engine; 1716 - struct task_struct **threads; 1675 + 
struct smoke_thread *threads; 1717 1676 struct igt_live_test live; 1718 1677 intel_wakeref_t wakeref; 1719 1678 struct smoketest *smoke; ··· 1787 1746 smoke[idx].max_batch, engine->name); 1788 1747 1789 1748 for (n = 0; n < ncpus; n++) { 1790 - struct task_struct *tsk; 1749 + unsigned int i = idx * ncpus + n; 1750 + struct kthread_worker *worker; 1791 1751 1792 - tsk = kthread_run(__igt_breadcrumbs_smoketest, 1793 - &smoke[idx], "igt/%d.%d", idx, n); 1794 - if (IS_ERR(tsk)) { 1795 - ret = PTR_ERR(tsk); 1752 + worker = kthread_create_worker(0, "igt/%d.%d", idx, n); 1753 + if (IS_ERR(worker)) { 1754 + ret = PTR_ERR(worker); 1796 1755 goto out_flush; 1797 1756 } 1798 1757 1799 - get_task_struct(tsk); 1800 - threads[idx * ncpus + n] = tsk; 1758 + threads[i].worker = worker; 1759 + threads[i].t = &smoke[idx]; 1760 + 1761 + kthread_init_work(&threads[i].work, 1762 + __igt_breadcrumbs_smoketest); 1763 + kthread_queue_work(worker, &threads[i].work); 1801 1764 } 1802 1765 1803 1766 idx++; 1804 1767 } 1805 1768 1806 - yield(); /* start all threads before we begin */ 1807 1769 msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies)); 1808 1770 1809 1771 out_flush: ··· 1815 1771 num_fences = 0; 1816 1772 for_each_uabi_engine(engine, i915) { 1817 1773 for (n = 0; n < ncpus; n++) { 1818 - struct task_struct *tsk = threads[idx * ncpus + n]; 1774 + unsigned int i = idx * ncpus + n; 1819 1775 int err; 1820 1776 1821 - if (!tsk) 1777 + if (!threads[i].worker) 1822 1778 continue; 1823 1779 1824 - err = kthread_stop(tsk); 1780 + WRITE_ONCE(threads[i].stop, true); 1781 + kthread_flush_work(&threads[i].work); 1782 + err = READ_ONCE(threads[i].result); 1825 1783 if (err < 0 && !ret) 1826 1784 ret = err; 1827 1785 1828 - put_task_struct(tsk); 1786 + kthread_destroy_worker(threads[i].worker); 1829 1787 } 1830 1788 1831 1789 num_waits += atomic_long_read(&smoke[idx].num_waits); ··· 2937 2891 return err; 2938 2892 } 2939 2893 2940 - static int p_sync0(void *arg) 2894 + struct p_thread { 2895 
+ struct perf_stats p; 2896 + struct kthread_worker *worker; 2897 + struct kthread_work work; 2898 + struct intel_engine_cs *engine; 2899 + int result; 2900 + }; 2901 + 2902 + static void p_sync0(struct kthread_work *work) 2941 2903 { 2942 - struct perf_stats *p = arg; 2904 + struct p_thread *thread = container_of(work, typeof(*thread), work); 2905 + struct perf_stats *p = &thread->p; 2943 2906 struct intel_engine_cs *engine = p->engine; 2944 2907 struct intel_context *ce; 2945 2908 IGT_TIMEOUT(end_time); ··· 2957 2902 int err = 0; 2958 2903 2959 2904 ce = intel_context_create(engine); 2960 - if (IS_ERR(ce)) 2961 - return PTR_ERR(ce); 2905 + if (IS_ERR(ce)) { 2906 + thread->result = PTR_ERR(ce); 2907 + return; 2908 + } 2962 2909 2963 2910 err = intel_context_pin(ce); 2964 2911 if (err) { 2965 2912 intel_context_put(ce); 2966 - return err; 2913 + thread->result = err; 2914 + return; 2967 2915 } 2968 2916 2969 2917 if (intel_engine_supports_stats(engine)) { ··· 3016 2958 3017 2959 intel_context_unpin(ce); 3018 2960 intel_context_put(ce); 3019 - return err; 2961 + thread->result = err; 3020 2962 } 3021 2963 3022 - static int p_sync1(void *arg) 2964 + static void p_sync1(struct kthread_work *work) 3023 2965 { 3024 - struct perf_stats *p = arg; 2966 + struct p_thread *thread = container_of(work, typeof(*thread), work); 2967 + struct perf_stats *p = &thread->p; 3025 2968 struct intel_engine_cs *engine = p->engine; 3026 2969 struct i915_request *prev = NULL; 3027 2970 struct intel_context *ce; ··· 3032 2973 int err = 0; 3033 2974 3034 2975 ce = intel_context_create(engine); 3035 - if (IS_ERR(ce)) 3036 - return PTR_ERR(ce); 2976 + if (IS_ERR(ce)) { 2977 + thread->result = PTR_ERR(ce); 2978 + return; 2979 + } 3037 2980 3038 2981 err = intel_context_pin(ce); 3039 2982 if (err) { 3040 2983 intel_context_put(ce); 3041 - return err; 2984 + thread->result = err; 2985 + return; 3042 2986 } 3043 2987 3044 2988 if (intel_engine_supports_stats(engine)) { ··· 3093 3031 3094 3032 
intel_context_unpin(ce); 3095 3033 intel_context_put(ce); 3096 - return err; 3034 + thread->result = err; 3097 3035 } 3098 3036 3099 - static int p_many(void *arg) 3037 + static void p_many(struct kthread_work *work) 3100 3038 { 3101 - struct perf_stats *p = arg; 3039 + struct p_thread *thread = container_of(work, typeof(*thread), work); 3040 + struct perf_stats *p = &thread->p; 3102 3041 struct intel_engine_cs *engine = p->engine; 3103 3042 struct intel_context *ce; 3104 3043 IGT_TIMEOUT(end_time); ··· 3108 3045 bool busy; 3109 3046 3110 3047 ce = intel_context_create(engine); 3111 - if (IS_ERR(ce)) 3112 - return PTR_ERR(ce); 3048 + if (IS_ERR(ce)) { 3049 + thread->result = PTR_ERR(ce); 3050 + return; 3051 + } 3113 3052 3114 3053 err = intel_context_pin(ce); 3115 3054 if (err) { 3116 3055 intel_context_put(ce); 3117 - return err; 3056 + thread->result = err; 3057 + return; 3118 3058 } 3119 3059 3120 3060 if (intel_engine_supports_stats(engine)) { ··· 3158 3092 3159 3093 intel_context_unpin(ce); 3160 3094 intel_context_put(ce); 3161 - return err; 3095 + thread->result = err; 3162 3096 } 3163 3097 3164 3098 static int perf_parallel_engines(void *arg) 3165 3099 { 3166 3100 struct drm_i915_private *i915 = arg; 3167 - static int (* const func[])(void *arg) = { 3101 + static void (* const func[])(struct kthread_work *) = { 3168 3102 p_sync0, 3169 3103 p_sync1, 3170 3104 p_many, 3171 3105 NULL, 3172 3106 }; 3173 3107 const unsigned int nengines = num_uabi_engines(i915); 3108 + void (* const *fn)(struct kthread_work *); 3174 3109 struct intel_engine_cs *engine; 3175 - int (* const *fn)(void *arg); 3176 3110 struct pm_qos_request qos; 3177 - struct { 3178 - struct perf_stats p; 3179 - struct task_struct *tsk; 3180 - } *engines; 3111 + struct p_thread *engines; 3181 3112 int err = 0; 3182 3113 3183 3114 engines = kcalloc(nengines, sizeof(*engines), GFP_KERNEL); ··· 3197 3134 3198 3135 idx = 0; 3199 3136 for_each_uabi_engine(engine, i915) { 3137 + struct kthread_worker 
*worker; 3138 + 3200 3139 intel_engine_pm_get(engine); 3201 3140 3202 3141 memset(&engines[idx].p, 0, sizeof(engines[idx].p)); 3203 - engines[idx].p.engine = engine; 3204 3142 3205 - engines[idx].tsk = kthread_run(*fn, &engines[idx].p, 3206 - "igt:%s", engine->name); 3207 - if (IS_ERR(engines[idx].tsk)) { 3208 - err = PTR_ERR(engines[idx].tsk); 3143 + worker = kthread_create_worker(0, "igt:%s", 3144 + engine->name); 3145 + if (IS_ERR(worker)) { 3146 + err = PTR_ERR(worker); 3209 3147 intel_engine_pm_put(engine); 3210 3148 break; 3211 3149 } 3212 - get_task_struct(engines[idx++].tsk); 3213 - } 3150 + engines[idx].worker = worker; 3151 + engines[idx].result = 0; 3152 + engines[idx].p.engine = engine; 3153 + engines[idx].engine = engine; 3214 3154 3215 - yield(); /* start all threads before we kthread_stop() */ 3155 + kthread_init_work(&engines[idx].work, *fn); 3156 + kthread_queue_work(worker, &engines[idx].work); 3157 + idx++; 3158 + } 3216 3159 3217 3160 idx = 0; 3218 3161 for_each_uabi_engine(engine, i915) { 3219 3162 int status; 3220 3163 3221 - if (IS_ERR(engines[idx].tsk)) 3164 + if (!engines[idx].worker) 3222 3165 break; 3223 3166 3224 - status = kthread_stop(engines[idx].tsk); 3167 + kthread_flush_work(&engines[idx].work); 3168 + status = READ_ONCE(engines[idx].result); 3225 3169 if (status && !err) 3226 3170 err = status; 3227 3171 3228 3172 intel_engine_pm_put(engine); 3229 - put_task_struct(engines[idx++].tsk); 3173 + 3174 + kthread_destroy_worker(engines[idx].worker); 3175 + idx++; 3230 3176 } 3231 3177 3232 3178 if (igt_live_test_end(&t))
+4
drivers/gpu/drm/i915/selftests/intel_uncore.c
··· 70 70 { gen12_shadowed_regs, ARRAY_SIZE(gen12_shadowed_regs) }, 71 71 { dg2_shadowed_regs, ARRAY_SIZE(dg2_shadowed_regs) }, 72 72 { pvc_shadowed_regs, ARRAY_SIZE(pvc_shadowed_regs) }, 73 + { mtl_shadowed_regs, ARRAY_SIZE(mtl_shadowed_regs) }, 74 + { xelpmp_shadowed_regs, ARRAY_SIZE(xelpmp_shadowed_regs) }, 73 75 }; 74 76 const struct i915_range *range; 75 77 unsigned int i, j; ··· 119 117 { __gen12_fw_ranges, ARRAY_SIZE(__gen12_fw_ranges), true }, 120 118 { __xehp_fw_ranges, ARRAY_SIZE(__xehp_fw_ranges), true }, 121 119 { __pvc_fw_ranges, ARRAY_SIZE(__pvc_fw_ranges), true }, 120 + { __mtl_fw_ranges, ARRAY_SIZE(__mtl_fw_ranges), true }, 121 + { __xelpmp_fw_ranges, ARRAY_SIZE(__xelpmp_fw_ranges), true }, 122 122 }; 123 123 int err, i; 124 124
-1
drivers/gpu/drm/i915/selftests/mock_gem_device.c
··· 67 67 intel_gt_driver_remove(to_gt(i915)); 68 68 69 69 i915_gem_drain_workqueue(i915); 70 - i915_gem_drain_freed_objects(i915); 71 70 72 71 mock_fini_ggtt(to_gt(i915)->ggtt); 73 72 destroy_workqueue(i915->wq);
+144 -2
drivers/misc/mei/bus.c
··· 13 13 #include <linux/slab.h> 14 14 #include <linux/mutex.h> 15 15 #include <linux/interrupt.h> 16 + #include <linux/scatterlist.h> 16 17 #include <linux/mei_cl_bus.h> 17 18 18 19 #include "mei_dev.h" ··· 101 100 cb->internal = !!(mode & MEI_CL_IO_TX_INTERNAL); 102 101 cb->blocking = !!(mode & MEI_CL_IO_TX_BLOCKING); 103 102 memcpy(cb->buf.data, buf, length); 103 + /* hack we point data to header */ 104 + if (mode & MEI_CL_IO_SGL) { 105 + cb->ext_hdr = (struct mei_ext_hdr *)cb->buf.data; 106 + cb->buf.data = NULL; 107 + cb->buf.size = 0; 108 + } 104 109 105 110 rets = mei_cl_write(cl, cb); 111 + 112 + if (mode & MEI_CL_IO_SGL && rets == 0) 113 + rets = length; 106 114 107 115 out: 108 116 mutex_unlock(&bus->device_lock); ··· 215 205 goto free; 216 206 } 217 207 218 - r_length = min_t(size_t, length, cb->buf_idx); 219 - memcpy(buf, cb->buf.data, r_length); 208 + /* for the GSC type - copy the extended header to the buffer */ 209 + if (cb->ext_hdr && cb->ext_hdr->type == MEI_EXT_HDR_GSC) { 210 + r_length = min_t(size_t, length, cb->ext_hdr->length * sizeof(u32)); 211 + memcpy(buf, cb->ext_hdr, r_length); 212 + } else { 213 + r_length = min_t(size_t, length, cb->buf_idx); 214 + memcpy(buf, cb->buf.data, r_length); 215 + } 220 216 rets = r_length; 217 + 221 218 if (vtag) 222 219 *vtag = cb->vtag; 223 220 ··· 838 821 return err; 839 822 } 840 823 EXPORT_SYMBOL_GPL(mei_cldev_disable); 824 + 825 + /** 826 + * mei_cldev_send_gsc_command - sends a gsc command, by sending 827 + * a gsl mei message to gsc and receiving reply from gsc 828 + * 829 + * @cldev: me client device 830 + * @client_id: client id to send the command to 831 + * @fence_id: fence id to send the command to 832 + * @sg_in: scatter gather list containing addresses for rx message buffer 833 + * @total_in_len: total length of data in 'in' sg, can be less than the sum of buffers sizes 834 + * @sg_out: scatter gather list containing addresses for tx message buffer 835 + * 836 + * Return: 837 + * * written 
size in bytes 838 + * * < 0 on error 839 + */ 840 + ssize_t mei_cldev_send_gsc_command(struct mei_cl_device *cldev, 841 + u8 client_id, u32 fence_id, 842 + struct scatterlist *sg_in, 843 + size_t total_in_len, 844 + struct scatterlist *sg_out) 845 + { 846 + struct mei_cl *cl; 847 + struct mei_device *bus; 848 + ssize_t ret = 0; 849 + 850 + struct mei_ext_hdr_gsc_h2f *ext_hdr; 851 + size_t buf_sz = sizeof(struct mei_ext_hdr_gsc_h2f); 852 + int sg_out_nents, sg_in_nents; 853 + int i; 854 + struct scatterlist *sg; 855 + struct mei_ext_hdr_gsc_f2h rx_msg; 856 + unsigned int sg_len; 857 + 858 + if (!cldev || !sg_in || !sg_out) 859 + return -EINVAL; 860 + 861 + cl = cldev->cl; 862 + bus = cldev->bus; 863 + 864 + dev_dbg(bus->dev, "client_id %u, fence_id %u\n", client_id, fence_id); 865 + 866 + if (!bus->hbm_f_gsc_supported) 867 + return -EOPNOTSUPP; 868 + 869 + sg_out_nents = sg_nents(sg_out); 870 + sg_in_nents = sg_nents(sg_in); 871 + /* at least one entry in tx and rx sgls must be present */ 872 + if (sg_out_nents <= 0 || sg_in_nents <= 0) 873 + return -EINVAL; 874 + 875 + buf_sz += (sg_out_nents + sg_in_nents) * sizeof(struct mei_gsc_sgl); 876 + ext_hdr = kzalloc(buf_sz, GFP_KERNEL); 877 + if (!ext_hdr) 878 + return -ENOMEM; 879 + 880 + /* construct the GSC message */ 881 + ext_hdr->hdr.type = MEI_EXT_HDR_GSC; 882 + ext_hdr->hdr.length = buf_sz / sizeof(u32); /* length is in dw */ 883 + 884 + ext_hdr->client_id = client_id; 885 + ext_hdr->addr_type = GSC_ADDRESS_TYPE_PHYSICAL_SGL; 886 + ext_hdr->fence_id = fence_id; 887 + ext_hdr->input_address_count = sg_in_nents; 888 + ext_hdr->output_address_count = sg_out_nents; 889 + ext_hdr->reserved[0] = 0; 890 + ext_hdr->reserved[1] = 0; 891 + 892 + /* copy in-sgl to the message */ 893 + for (i = 0, sg = sg_in; i < sg_in_nents; i++, sg++) { 894 + ext_hdr->sgl[i].low = lower_32_bits(sg_dma_address(sg)); 895 + ext_hdr->sgl[i].high = upper_32_bits(sg_dma_address(sg)); 896 + sg_len = min_t(unsigned int, sg_dma_len(sg), PAGE_SIZE); 
897 + ext_hdr->sgl[i].length = (sg_len <= total_in_len) ? sg_len : total_in_len; 898 + total_in_len -= ext_hdr->sgl[i].length; 899 + } 900 + 901 + /* copy out-sgl to the message */ 902 + for (i = sg_in_nents, sg = sg_out; i < sg_in_nents + sg_out_nents; i++, sg++) { 903 + ext_hdr->sgl[i].low = lower_32_bits(sg_dma_address(sg)); 904 + ext_hdr->sgl[i].high = upper_32_bits(sg_dma_address(sg)); 905 + sg_len = min_t(unsigned int, sg_dma_len(sg), PAGE_SIZE); 906 + ext_hdr->sgl[i].length = sg_len; 907 + } 908 + 909 + /* send the message to GSC */ 910 + ret = __mei_cl_send(cl, (u8 *)ext_hdr, buf_sz, 0, MEI_CL_IO_SGL); 911 + if (ret < 0) { 912 + dev_err(bus->dev, "__mei_cl_send failed, returned %zd\n", ret); 913 + goto end; 914 + } 915 + if (ret != buf_sz) { 916 + dev_err(bus->dev, "__mei_cl_send returned %zd instead of expected %zd\n", 917 + ret, buf_sz); 918 + ret = -EIO; 919 + goto end; 920 + } 921 + 922 + /* receive the reply from GSC, note that at this point sg_in should contain the reply */ 923 + ret = __mei_cl_recv(cl, (u8 *)&rx_msg, sizeof(rx_msg), NULL, MEI_CL_IO_SGL, 0); 924 + 925 + if (ret != sizeof(rx_msg)) { 926 + dev_err(bus->dev, "__mei_cl_recv returned %zd instead of expected %zd\n", 927 + ret, sizeof(rx_msg)); 928 + if (ret >= 0) 929 + ret = -EIO; 930 + goto end; 931 + } 932 + 933 + /* check rx_msg.client_id and rx_msg.fence_id match the ones we send */ 934 + if (rx_msg.client_id != client_id || rx_msg.fence_id != fence_id) { 935 + dev_err(bus->dev, "received client_id/fence_id %u/%u instead of %u/%u sent\n", 936 + rx_msg.client_id, rx_msg.fence_id, client_id, fence_id); 937 + ret = -EFAULT; 938 + goto end; 939 + } 940 + 941 + dev_dbg(bus->dev, "gsc command: successfully written %u bytes\n", rx_msg.written); 942 + ret = rx_msg.written; 943 + 944 + end: 945 + kfree(ext_hdr); 946 + return ret; 947 + } 948 + EXPORT_SYMBOL_GPL(mei_cldev_send_gsc_command); 841 949 842 950 /** 843 951 * mei_cl_device_find - find matching entry in the driver id table
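The size arithmetic in mei_cldev_send_gsc_command() above can be sanity-checked in isolation. The sketch below is a userspace re-declaration for illustration only (the struct layouts are hand-copied assumptions mirroring drivers/misc/mei/hw.h, not kernel API); it reproduces the buf_sz computation and the per-entry clamping applied to the input scatterlist:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirrors of the on-wire layouts (illustrative assumption,
 * copied from the hw.h hunk below in this series). */
struct mei_ext_hdr { uint8_t type; uint8_t length; } __attribute__((packed));
struct mei_gsc_sgl { uint32_t low, high, length; } __attribute__((packed));
struct mei_ext_hdr_gsc_h2f {
	struct mei_ext_hdr hdr;
	uint8_t client_id;
	uint8_t addr_type;
	uint32_t fence_id;
	uint8_t input_address_count;
	uint8_t output_address_count;
	uint8_t reserved[2];
	struct mei_gsc_sgl sgl[];
} __attribute__((packed));

/* Message size for a given number of in/out sg entries, exactly as
 * mei_cldev_send_gsc_command() computes buf_sz. */
static size_t gsc_h2f_buf_sz(int sg_in_nents, int sg_out_nents)
{
	return sizeof(struct mei_ext_hdr_gsc_h2f) +
	       (size_t)(sg_in_nents + sg_out_nents) * sizeof(struct mei_gsc_sgl);
}

/* Clamp one input entry: never more than PAGE_SIZE, never more than
 * what remains of total_in_len; the caller subtracts the result. */
static uint32_t gsc_in_entry_len(uint32_t sg_dma_len, size_t page_size,
				 size_t total_in_left)
{
	uint32_t sg_len = sg_dma_len < page_size ? sg_dma_len : (uint32_t)page_size;

	return sg_len <= total_in_left ? sg_len : (uint32_t)total_in_left;
}
```

With one input and one output entry the message is 12 + 2 * 12 = 36 bytes, which the driver encodes in the TLV as 36 / 4 = 9 dwords.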
+40 -15
drivers/misc/mei/client.c
··· 322 322 323 323 list_del(&cb->list); 324 324 kfree(cb->buf.data); 325 + kfree(cb->ext_hdr); 325 326 kfree(cb); 326 327 } 327 328 ··· 402 401 cb->buf_idx = 0; 403 402 cb->fop_type = type; 404 403 cb->vtag = 0; 404 + cb->ext_hdr = NULL; 405 405 406 406 return cb; 407 407 } ··· 1742 1740 return vtag_hdr->hdr.length; 1743 1741 } 1744 1742 1743 + static inline bool mei_ext_hdr_is_gsc(struct mei_ext_hdr *ext) 1744 + { 1745 + return ext && ext->type == MEI_EXT_HDR_GSC; 1746 + } 1747 + 1748 + static inline u8 mei_ext_hdr_set_gsc(struct mei_ext_hdr *ext, struct mei_ext_hdr *gsc_hdr) 1749 + { 1750 + memcpy(ext, gsc_hdr, mei_ext_hdr_len(gsc_hdr)); 1751 + return ext->length; 1752 + } 1753 + 1745 1754 /** 1746 1755 * mei_msg_hdr_init - allocate and initialize mei message header 1747 1756 * ··· 1765 1752 size_t hdr_len; 1766 1753 struct mei_ext_meta_hdr *meta; 1767 1754 struct mei_msg_hdr *mei_hdr; 1768 - bool is_ext, is_vtag; 1755 + bool is_ext, is_hbm, is_gsc, is_vtag; 1756 + struct mei_ext_hdr *next_ext; 1769 1757 1770 1758 if (!cb) 1771 1759 return ERR_PTR(-EINVAL); 1772 1760 1773 1761 /* Extended header for vtag is attached only on the first fragment */ 1774 1762 is_vtag = (cb->vtag && cb->buf_idx == 0); 1775 - is_ext = is_vtag; 1763 + is_hbm = cb->cl->me_cl->client_id == 0; 1764 + is_gsc = ((!is_hbm) && cb->cl->dev->hbm_f_gsc_supported && mei_ext_hdr_is_gsc(cb->ext_hdr)); 1765 + is_ext = is_vtag || is_gsc; 1776 1766 1777 1767 /* Compute extended header size */ 1778 1768 hdr_len = sizeof(*mei_hdr); ··· 1786 1770 hdr_len += sizeof(*meta); 1787 1771 if (is_vtag) 1788 1772 hdr_len += sizeof(struct mei_ext_hdr_vtag); 1773 + 1774 + if (is_gsc) 1775 + hdr_len += mei_ext_hdr_len(cb->ext_hdr); 1789 1776 1790 1777 setup_hdr: 1791 1778 mei_hdr = kzalloc(hdr_len, GFP_KERNEL); ··· 1804 1785 goto out; 1805 1786 1806 1787 meta = (struct mei_ext_meta_hdr *)mei_hdr->extension; 1788 + meta->size = 0; 1789 + next_ext = (struct mei_ext_hdr *)meta->hdrs; 1807 1790 if (is_vtag) { 1808 1791 
meta->count++; 1809 - meta->size += mei_ext_hdr_set_vtag(meta->hdrs, cb->vtag); 1792 + meta->size += mei_ext_hdr_set_vtag(next_ext, cb->vtag); 1793 + next_ext = mei_ext_next(next_ext); 1810 1794 } 1795 + 1796 + if (is_gsc) { 1797 + meta->count++; 1798 + meta->size += mei_ext_hdr_set_gsc(next_ext, cb->ext_hdr); 1799 + next_ext = mei_ext_next(next_ext); 1800 + } 1801 + 1811 1802 out: 1812 1803 mei_hdr->length = hdr_len - sizeof(*mei_hdr); 1813 1804 return mei_hdr; ··· 1841 1812 struct mei_msg_hdr *mei_hdr = NULL; 1842 1813 size_t hdr_len; 1843 1814 size_t hbuf_len, dr_len; 1844 - size_t buf_len; 1815 + size_t buf_len = 0; 1845 1816 size_t data_len; 1846 1817 int hbuf_slots; 1847 1818 u32 dr_slots; 1848 1819 u32 dma_len; 1849 1820 int rets; 1850 1821 bool first_chunk; 1851 - const void *data; 1822 + const void *data = NULL; 1852 1823 1853 1824 if (WARN_ON(!cl || !cl->dev)) 1854 1825 return -ENODEV; ··· 1868 1839 return 0; 1869 1840 } 1870 1841 1871 - buf_len = buf->size - cb->buf_idx; 1872 - data = buf->data + cb->buf_idx; 1842 + if (buf->data) { 1843 + buf_len = buf->size - cb->buf_idx; 1844 + data = buf->data + cb->buf_idx; 1845 + } 1873 1846 hbuf_slots = mei_hbuf_empty_slots(dev); 1874 1847 if (hbuf_slots < 0) { 1875 1848 rets = -EOVERFLOW; ··· 1888 1857 mei_hdr = NULL; 1889 1858 goto err; 1890 1859 } 1891 - 1892 - cl_dbg(dev, cl, "Extended Header %d vtag = %d\n", 1893 - mei_hdr->extended, cb->vtag); 1894 1860 1895 1861 hdr_len = sizeof(*mei_hdr) + mei_hdr->length; 1896 1862 ··· 1917 1889 } 1918 1890 mei_hdr->length += data_len; 1919 1891 1920 - if (mei_hdr->dma_ring) 1892 + if (mei_hdr->dma_ring && buf->data) 1921 1893 mei_dma_ring_write(dev, buf->data + cb->buf_idx, buf_len); 1922 1894 rets = mei_write_message(dev, mei_hdr, hdr_len, data, data_len); 1923 1895 ··· 2011 1983 goto err; 2012 1984 } 2013 1985 2014 - cl_dbg(dev, cl, "Extended Header %d vtag = %d\n", 2015 - mei_hdr->extended, cb->vtag); 2016 - 2017 1986 hdr_len = sizeof(*mei_hdr) + mei_hdr->length; 2018 
1987 2019 1988 if (rets == 0) { ··· 2055 2030 2056 2031 mei_hdr->length += data_len; 2057 2032 2058 - if (mei_hdr->dma_ring) 2033 + if (mei_hdr->dma_ring && buf->data) 2059 2034 mei_dma_ring_write(dev, buf->data, buf_len); 2060 2035 rets = mei_write_message(dev, mei_hdr, hdr_len, data, data_len); 2061 2036
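The client.c changes above chain TLVs through mei_ext_next(), with all lengths carried in dwords. A minimal userspace sketch of that walk (struct names shortened, layouts assumed from hw.h; nothing here is kernel API) shows how a vtag TLV and a gsc TLV share one meta header:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative mirrors of the TLV layouts in drivers/misc/mei/hw.h. */
struct ext_hdr { uint8_t type; uint8_t length; /* dwords */ } __attribute__((packed));
struct ext_meta_hdr {
	uint8_t count;
	uint8_t size;		/* total TLV payload, in dwords */
	uint8_t reserved[2];
	uint8_t hdrs[];
} __attribute__((packed));

enum { EXT_HDR_NONE = 0, EXT_HDR_VTAG = 1, EXT_HDR_GSC = 2 };

static struct ext_hdr *ext_begin(struct ext_meta_hdr *meta)
{
	return (struct ext_hdr *)meta->hdrs;
}

static struct ext_hdr *ext_next(struct ext_hdr *ext)
{
	return (struct ext_hdr *)((uint8_t *)ext + ext->length * 4);
}

static int ext_last(struct ext_meta_hdr *meta, struct ext_hdr *ext)
{
	return (uint8_t *)ext >= (uint8_t *)meta + sizeof(*meta) + meta->size * 4;
}

/* Walk the TLV list and report which types were seen, as a bitmask. */
static unsigned int ext_types_seen(struct ext_meta_hdr *meta)
{
	struct ext_hdr *ext = ext_begin(meta);
	unsigned int seen = 0;

	do {
		seen |= 1u << ext->type;
		ext = ext_next(ext);
	} while (!ext_last(meta, ext));

	return seen;
}

/* Build a meta header carrying a 1-dword vtag TLV followed by a
 * 3-dword gsc TLV, then walk it. */
static unsigned int demo_walk(void)
{
	uint8_t buf[4 + 4 + 12];
	struct ext_meta_hdr *meta = (struct ext_meta_hdr *)buf;
	struct ext_hdr *ext;

	memset(buf, 0, sizeof(buf));
	meta->count = 2;
	meta->size = 4;			/* (4 + 12) bytes of TLVs = 4 dwords */
	ext = ext_begin(meta);
	ext->type = EXT_HDR_VTAG;
	ext->length = 1;
	ext = ext_next(ext);
	ext->type = EXT_HDR_GSC;
	ext->length = 3;

	return ext_types_seen(meta);
}
```

This mirrors why mei_msg_hdr_init() now tracks next_ext explicitly: with two possible TLVs, each append must advance past the previous one before writing.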
+13
drivers/misc/mei/hbm.c
··· 340 340 req.hbm_cmd = MEI_HBM_CAPABILITIES_REQ_CMD; 341 341 if (dev->hbm_f_vt_supported) 342 342 req.capability_requested[0] |= HBM_CAP_VT; 343 + 343 344 if (dev->hbm_f_cd_supported) 344 345 req.capability_requested[0] |= HBM_CAP_CD; 346 + 347 + if (dev->hbm_f_gsc_supported) 348 + req.capability_requested[0] |= HBM_CAP_GSC; 345 349 346 350 ret = mei_hbm_write_message(dev, &mei_hdr, &req); 347 351 if (ret) { ··· 1204 1200 dev->version.minor_version >= HBM_MINOR_VERSION_VT)) 1205 1201 dev->hbm_f_vt_supported = 1; 1206 1202 1203 + /* GSC support */ 1204 + if (dev->version.major_version > HBM_MAJOR_VERSION_GSC || 1205 + (dev->version.major_version == HBM_MAJOR_VERSION_GSC && 1206 + dev->version.minor_version >= HBM_MINOR_VERSION_GSC)) 1207 + dev->hbm_f_gsc_supported = 1; 1208 + 1207 1209 /* Capability message Support */ 1208 1210 dev->hbm_f_cap_supported = 0; 1209 1211 if (dev->version.major_version > HBM_MAJOR_VERSION_CAP || ··· 1376 1366 dev->hbm_f_vt_supported = 0; 1377 1367 if (!(capability_res->capability_granted[0] & HBM_CAP_CD)) 1378 1368 dev->hbm_f_cd_supported = 0; 1369 + 1370 + if (!(capability_res->capability_granted[0] & HBM_CAP_GSC)) 1371 + dev->hbm_f_gsc_supported = 0; 1379 1372 1380 1373 if (dev->hbm_f_dr_supported) { 1381 1374 if (mei_dmam_ring_alloc(dev))
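The hbm.c hunk above follows the usual capability handshake: the host sets a bit only if its HBM version is new enough, and clears it again if the firmware's response does not grant it, so the effective feature set is the intersection. A small sketch (bit values taken from the hw.h hunk in this series; the helper names are illustrative):

```c
#include <assert.h>

/* Capability bits from drivers/misc/mei/hw.h */
#define HBM_CAP_VT	(1u << 0)	/* virtual tag supported */
#define HBM_CAP_GSC	(1u << 1)	/* gsc extended header support */
#define HBM_CAP_CD	(1u << 2)	/* client dma supported */

/* GSC is advertised only from HBM version 2.2 onwards. */
static int hbm_version_has_gsc(unsigned int major, unsigned int minor)
{
	return major > 2 || (major == 2 && minor >= 2);
}

/* Whatever the firmware does not grant is cleared again. */
static unsigned int hbm_effective_caps(unsigned int requested, unsigned int granted)
{
	return requested & granted;
}
```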
+6 -1
drivers/misc/mei/hw-me.c
··· 590 590 u32 dw_cnt; 591 591 int empty_slots; 592 592 593 - if (WARN_ON(!hdr || !data || hdr_len & 0x3)) 593 + if (WARN_ON(!hdr || hdr_len & 0x3)) 594 594 return -EINVAL; 595 + 596 + if (!data && data_len) { 597 + dev_err(dev->dev, "wrong parameters null data with data_len = %zu\n", data_len); 598 + return -EINVAL; 599 + } 595 600 596 601 dev_dbg(dev->dev, MEI_HDR_FMT, MEI_HDR_PRM((struct mei_msg_hdr *)hdr)); 597 602
+84 -5
drivers/misc/mei/hw.h
··· 93 93 #define HBM_MAJOR_VERSION_VT 2 94 94 95 95 /* 96 + * MEI version with GSC support 97 + */ 98 + #define HBM_MINOR_VERSION_GSC 2 99 + #define HBM_MAJOR_VERSION_GSC 2 100 + 101 + /* 96 102 * MEI version with capabilities message support 97 103 */ 98 104 #define HBM_MINOR_VERSION_CAP 2 ··· 235 229 * 236 230 * @MEI_EXT_HDR_NONE: sentinel 237 231 * @MEI_EXT_HDR_VTAG: vtag header 232 + * @MEI_EXT_HDR_GSC: gsc header 238 233 */ 239 234 enum mei_ext_hdr_type { 240 235 MEI_EXT_HDR_NONE = 0, 241 236 MEI_EXT_HDR_VTAG = 1, 237 + MEI_EXT_HDR_GSC = 2, 242 238 }; 243 239 244 240 /** 245 241 * struct mei_ext_hdr - extend header descriptor (TLV) 246 242 * @type: enum mei_ext_hdr_type 247 243 * @length: length excluding descriptor 248 - * @ext_payload: payload of the specific extended header 249 - * @hdr: place holder for actual header 244 + * @data: the extended header payload 250 245 */ 251 246 struct mei_ext_hdr { 252 247 u8 type; ··· 286 279 * Extended header iterator functions 287 280 */ 288 281 /** 289 - * mei_ext_hdr - extended header iterator begin 282 + * mei_ext_begin - extended header iterator begin 290 283 * 291 284 * @meta: meta header of the extended header list 292 285 * 293 - * Return: 294 - * The first extended header 286 + * Return: The first extended header 295 287 */ 296 288 static inline struct mei_ext_hdr *mei_ext_begin(struct mei_ext_meta_hdr *meta) 297 289 { ··· 311 305 return (u8 *)ext >= (u8 *)meta + sizeof(*meta) + (meta->size * 4); 312 306 } 313 307 308 + struct mei_gsc_sgl { 309 + u32 low; 310 + u32 high; 311 + u32 length; 312 + } __packed; 313 + 314 + #define GSC_HECI_MSG_KERNEL 0 315 + #define GSC_HECI_MSG_USER 1 316 + 317 + #define GSC_ADDRESS_TYPE_GTT 0 318 + #define GSC_ADDRESS_TYPE_PPGTT 1 319 + #define GSC_ADDRESS_TYPE_PHYSICAL_CONTINUOUS 2 /* max of 64K */ 320 + #define GSC_ADDRESS_TYPE_PHYSICAL_SGL 3 321 + 322 + /** 323 + * struct mei_ext_hdr_gsc_h2f - extended header: gsc host to firmware interface 324 + * 325 + * @hdr: extended header 
326 + * @client_id: GSC_HECI_MSG_KERNEL or GSC_HECI_MSG_USER 327 + * @addr_type: GSC_ADDRESS_TYPE_{GTT, PPGTT, PHYSICAL_CONTINUOUS, PHYSICAL_SGL} 328 + * @fence_id: synchronization marker 329 + * @input_address_count: number of input sgl buffers 330 + * @output_address_count: number of output sgl buffers 331 + * @reserved: reserved 332 + * @sgl: sg list 333 + */ 334 + struct mei_ext_hdr_gsc_h2f { 335 + struct mei_ext_hdr hdr; 336 + u8 client_id; 337 + u8 addr_type; 338 + u32 fence_id; 339 + u8 input_address_count; 340 + u8 output_address_count; 341 + u8 reserved[2]; 342 + struct mei_gsc_sgl sgl[]; 343 + } __packed; 344 + 345 + /** 346 + * struct mei_ext_hdr_gsc_f2h - gsc firmware to host interface 347 + * 348 + * @hdr: extended header 349 + * @client_id: GSC_HECI_MSG_KERNEL or GSC_HECI_MSG_USER 350 + * @reserved: reserved 351 + * @fence_id: synchronization marker 352 + * @written: number of bytes written to firmware 353 + */ 354 + struct mei_ext_hdr_gsc_f2h { 355 + struct mei_ext_hdr hdr; 356 + u8 client_id; 357 + u8 reserved; 358 + u32 fence_id; 359 + u32 written; 360 + } __packed; 361 + 314 362 /** 315 363 * mei_ext_next - following extended header on the TLV list 316 364 * ··· 378 318 static inline struct mei_ext_hdr *mei_ext_next(struct mei_ext_hdr *ext) 379 319 { 380 320 return (struct mei_ext_hdr *)((u8 *)ext + (ext->length * 4)); 321 + } 322 + 323 + /** 324 + * mei_ext_hdr_len - get ext header length in bytes 325 + * 326 + * @ext: extend header 327 + * 328 + * Return: extend header length in bytes 329 + */ 330 + static inline u32 mei_ext_hdr_len(const struct mei_ext_hdr *ext) 331 + { 332 + if (!ext) 333 + return 0; 334 + 335 + return ext->length * sizeof(u32); 381 336 } 382 337 383 338 /** ··· 757 682 758 683 /* virtual tag supported */ 759 684 #define HBM_CAP_VT BIT(0) 685 + 686 + /* gsc extended header support */ 687 + #define HBM_CAP_GSC BIT(1) 688 + 760 689 /* client dma supported */ 761 690 #define HBM_CAP_CD BIT(2) 762 691
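The firmware-to-host TLV added to hw.h above is fixed-size, and mei_ext_hdr_len() converts the dword-encoded TLV length back to bytes. A userspace mirror (illustrative re-declaration only) confirms the layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of struct mei_ext_hdr_gsc_f2h from hw.h. */
struct ext_hdr { uint8_t type; uint8_t length; } __attribute__((packed));

struct ext_hdr_gsc_f2h {
	struct ext_hdr hdr;
	uint8_t client_id;
	uint8_t reserved;
	uint32_t fence_id;
	uint32_t written;
} __attribute__((packed));

/* Same conversion as mei_ext_hdr_len(): TLV lengths are in dwords,
 * a NULL header counts as zero bytes. */
static uint32_t ext_hdr_len(const struct ext_hdr *ext)
{
	return ext ? ext->length * (uint32_t)sizeof(uint32_t) : 0;
}

/* Fill in the length field the way the firmware would and read it back. */
static uint32_t f2h_wire_len(void)
{
	struct ext_hdr_gsc_f2h msg = { .hdr = { 2 /* MEI_EXT_HDR_GSC */,
						sizeof(msg) / 4 } };

	return ext_hdr_len(&msg.hdr);
}
```

The 12-byte struct encodes as length = 3 dwords, which round-trips back to 12 bytes.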
+40 -7
drivers/misc/mei/interrupt.c
··· 98 98 struct mei_device *dev = cl->dev; 99 99 struct mei_cl_cb *cb; 100 100 101 + struct mei_ext_hdr_vtag *vtag_hdr = NULL; 102 + struct mei_ext_hdr_gsc_f2h *gsc_f2h = NULL; 103 + 101 104 size_t buf_sz; 102 105 u32 length; 103 - int ext_len; 106 + u32 ext_len; 104 107 105 108 length = mei_hdr->length; 106 109 ext_len = 0; ··· 125 122 } 126 123 127 124 if (mei_hdr->extended) { 128 - struct mei_ext_hdr *ext; 129 - struct mei_ext_hdr_vtag *vtag_hdr = NULL; 130 - 131 - ext = mei_ext_begin(meta); 125 + struct mei_ext_hdr *ext = mei_ext_begin(meta); 132 126 do { 133 127 switch (ext->type) { 134 128 case MEI_EXT_HDR_VTAG: 135 129 vtag_hdr = (struct mei_ext_hdr_vtag *)ext; 136 130 break; 131 + case MEI_EXT_HDR_GSC: 132 + gsc_f2h = (struct mei_ext_hdr_gsc_f2h *)ext; 133 + cb->ext_hdr = kzalloc(sizeof(*gsc_f2h), GFP_KERNEL); 134 + if (!cb->ext_hdr) { 135 + cb->status = -ENOMEM; 136 + goto discard; 137 + } 138 + break; 137 139 case MEI_EXT_HDR_NONE: 138 140 fallthrough; 139 141 default: 142 + cl_err(dev, cl, "unknown extended header\n"); 140 143 cb->status = -EPROTO; 141 144 break; 142 145 } ··· 150 141 ext = mei_ext_next(ext); 151 142 } while (!mei_ext_last(meta, ext)); 152 143 153 - if (!vtag_hdr) { 154 - cl_dbg(dev, cl, "vtag not found in extended header.\n"); 144 + if (!vtag_hdr && !gsc_f2h) { 145 + cl_dbg(dev, cl, "no vtag or gsc found in extended header.\n"); 155 146 cb->status = -EPROTO; 156 147 goto discard; 157 148 } 149 + } 158 150 151 + if (vtag_hdr) { 159 152 cl_dbg(dev, cl, "vtag: %d\n", vtag_hdr->vtag); 160 153 if (cb->vtag && cb->vtag != vtag_hdr->vtag) { 161 154 cl_err(dev, cl, "mismatched tag: %d != %d\n", ··· 166 155 goto discard; 167 156 } 168 157 cb->vtag = vtag_hdr->vtag; 158 + } 159 + 160 + if (gsc_f2h) { 161 + u32 ext_hdr_len = mei_ext_hdr_len(&gsc_f2h->hdr); 162 + 163 + if (!dev->hbm_f_gsc_supported) { 164 + cl_err(dev, cl, "gsc extended header is not supported\n"); 165 + cb->status = -EPROTO; 166 + goto discard; 167 + } 168 + 169 + if (length) { 
170 + cl_err(dev, cl, "no data allowed in cb with gsc\n"); 171 + cb->status = -EPROTO; 172 + goto discard; 173 + } 174 + if (ext_hdr_len > sizeof(*gsc_f2h)) { 175 + cl_err(dev, cl, "gsc extended header is too big %u\n", ext_hdr_len); 176 + cb->status = -EPROTO; 177 + goto discard; 178 + } 179 + memcpy(cb->ext_hdr, gsc_f2h, ext_hdr_len); 169 180 } 170 181 171 182 if (!mei_cl_is_connected(cl)) {
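The interrupt.c rx path above accepts a gsc firmware-to-host TLV only under three conditions: the capability was negotiated, no inline payload accompanies it, and the TLV is no larger than the known structure. A predicate sketch of those rules (illustrative; f2h_size stands in for sizeof(struct mei_ext_hdr_gsc_f2h)):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EPROTO 71	/* the errno value the rx path reports */

/* Acceptance rules for a gsc f2h TLV, mirroring the checks in
 * drivers/misc/mei/interrupt.c (sketch, not kernel API). */
static int gsc_f2h_rx_check(int gsc_supported, uint32_t payload_len,
			    uint32_t ext_hdr_len, size_t f2h_size)
{
	if (!gsc_supported)		/* feature was never negotiated */
		return -EPROTO;
	if (payload_len)		/* no inline data allowed with gsc */
		return -EPROTO;
	if (ext_hdr_len > f2h_size)	/* TLV longer than the known layout */
		return -EPROTO;
	return 0;
}
```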
+8
drivers/misc/mei/mei_dev.h
··· 116 116 * @MEI_CL_IO_TX_INTERNAL: internal communication between driver and FW 117 117 * 118 118 * @MEI_CL_IO_RX_NONBLOCK: recv is non-blocking 119 + * 120 + * @MEI_CL_IO_SGL: send command with sgl list. 119 121 */ 120 122 enum mei_cl_io_mode { 121 123 MEI_CL_IO_TX_BLOCKING = BIT(0), 122 124 MEI_CL_IO_TX_INTERNAL = BIT(1), 123 125 124 126 MEI_CL_IO_RX_NONBLOCK = BIT(2), 127 + 128 + MEI_CL_IO_SGL = BIT(3), 125 129 }; 126 130 127 131 /* ··· 210 206 * @status: io status of the cb 211 207 * @internal: communication between driver and FW flag 212 208 * @blocking: transmission blocking mode 209 + * @ext_hdr: extended header 213 210 */ 214 211 struct mei_cl_cb { 215 212 struct list_head list; ··· 223 218 int status; 224 219 u32 internal:1; 225 220 u32 blocking:1; 221 + struct mei_ext_hdr *ext_hdr; 226 222 }; 227 223 228 224 /** ··· 500 494 * @hbm_f_vt_supported : hbm feature vtag supported 501 495 * @hbm_f_cap_supported : hbm feature capabilities message supported 502 496 * @hbm_f_cd_supported : hbm feature client dma supported 497 + * @hbm_f_gsc_supported : hbm feature gsc supported 503 498 * 504 499 * @fw_ver : FW versions 505 500 * ··· 592 585 unsigned int hbm_f_vt_supported:1; 593 586 unsigned int hbm_f_cap_supported:1; 594 587 unsigned int hbm_f_cd_supported:1; 588 + unsigned int hbm_f_gsc_supported:1; 595 589 596 590 struct mei_fw_version fw_ver[MEI_MAX_FW_VER_BLOCKS]; 597 591
+35 -3
drivers/misc/mei/pxp/mei_pxp.c
··· 77 77 return byte; 78 78 } 79 79 80 + /** 81 + * mei_pxp_gsc_command() - sends a gsc command, by sending 82 + * a sgl mei message to gsc and receiving reply from gsc 83 + * 84 + * @dev: device corresponding to the mei_cl_device 85 + * @client_id: client id to send the command to 86 + * @fence_id: fence id to send the command to 87 + * @sg_in: scatter gather list containing addresses for rx message buffer 88 + * @total_in_len: total length of data in 'in' sg, can be less than the sum of buffers sizes 89 + * @sg_out: scatter gather list containing addresses for tx message buffer 90 + * 91 + * Return: bytes sent on Success, <0 on Failure 92 + */ 93 + static ssize_t mei_pxp_gsc_command(struct device *dev, u8 client_id, u32 fence_id, 94 + struct scatterlist *sg_in, size_t total_in_len, 95 + struct scatterlist *sg_out) 96 + { 97 + struct mei_cl_device *cldev; 98 + 99 + cldev = to_mei_cl_device(dev); 100 + 101 + return mei_cldev_send_gsc_command(cldev, client_id, fence_id, sg_in, total_in_len, sg_out); 102 + } 103 + 80 104 static const struct i915_pxp_component_ops mei_pxp_ops = { 81 105 .owner = THIS_MODULE, 82 106 .send = mei_pxp_send_message, 83 107 .recv = mei_pxp_receive_message, 108 + .gsc_command = mei_pxp_gsc_command, 84 109 }; 85 110 86 111 static int mei_component_master_bind(struct device *dev) ··· 156 131 { 157 132 struct device *base = data; 158 133 134 + if (!dev) 135 + return 0; 136 + 159 137 if (!dev->driver || strcmp(dev->driver->name, "i915") || 160 138 subcomponent != I915_COMPONENT_PXP) 161 139 return 0; 162 140 163 141 base = base->parent; 164 - if (!base) 142 + if (!base) /* mei device */ 165 143 return 0; 166 144 167 - base = base->parent; 168 - dev = dev->parent; 145 + base = base->parent; /* pci device */ 146 + /* for dgfx */ 147 + if (base && dev == base) 148 + return 1; 169 149 150 + /* for pch */ 151 + dev = dev->parent; 170 152 return (base && dev && dev == base); 171 153 } 172 154
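The rewritten match callback above distinguishes dgfx (gfx and mei hang off the same PCI device) from pch (the two devices share a parent) purely by walking parent pointers. The pointer walk can be exercised with a toy stand-in for struct device (hypothetical types, illustration only):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct device (hypothetical; the real callback
 * operates on struct device parent chains). */
struct toy_dev { struct toy_dev *parent; };

/* Mirrors the tail of mei_pxp_component_match(): 'base' is the mei_cl
 * device passed as match data, 'dev' is the candidate i915 device. */
static int pxp_match_walk(struct toy_dev *dev, struct toy_dev *base)
{
	if (!dev)
		return 0;

	base = base->parent;		/* mei device */
	if (!base)
		return 0;

	base = base->parent;		/* pci device */
	if (base && dev == base)	/* dgfx: same pci device */
		return 1;

	dev = dev->parent;		/* pch: compare the parents */
	return base && dev && dev == base;
}

/* Build both topologies plus a non-matching one and walk them. */
static int demo_match(void)
{
	struct toy_dev pci = { NULL };
	struct toy_dev mei = { &pci };
	struct toy_dev cldev = { &mei };
	struct toy_dev i915_pch = { &pci };
	struct toy_dev stray = { NULL };

	return pxp_match_walk(&pci, &cldev) == 1 &&	/* dgfx */
	       pxp_match_walk(&i915_pch, &cldev) == 1 &&	/* pch */
	       pxp_match_walk(&stray, &cldev) == 0;	/* unrelated */
}
```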
+5
include/drm/i915_pxp_tee_interface.h
··· 8 8 9 9 #include <linux/mutex.h> 10 10 #include <linux/device.h> 11 + struct scatterlist; 11 12 12 13 /** 13 14 * struct i915_pxp_component_ops - ops for PXP services. ··· 24 23 25 24 int (*send)(struct device *dev, const void *message, size_t size); 26 25 int (*recv)(struct device *dev, void *buffer, size_t size); 26 + ssize_t (*gsc_command)(struct device *dev, u8 client_id, u32 fence_id, 27 + struct scatterlist *sg_in, size_t total_in_len, 28 + struct scatterlist *sg_out); 29 + 27 30 }; 28 31 29 32 /**
+6
include/linux/mei_cl_bus.h
··· 11 11 12 12 struct mei_cl_device; 13 13 struct mei_device; 14 + struct scatterlist; 14 15 15 16 typedef void (*mei_cldev_cb_t)(struct mei_cl_device *cldev); 16 17 ··· 117 116 int mei_cldev_enable(struct mei_cl_device *cldev); 118 117 int mei_cldev_disable(struct mei_cl_device *cldev); 119 118 bool mei_cldev_enabled(const struct mei_cl_device *cldev); 119 + ssize_t mei_cldev_send_gsc_command(struct mei_cl_device *cldev, 120 + u8 client_id, u32 fence_id, 121 + struct scatterlist *sg_in, 122 + size_t total_in_len, 123 + struct scatterlist *sg_out); 120 124 121 125 void *mei_cldev_dma_map(struct mei_cl_device *cldev, u8 buffer_id, size_t size); 122 126 int mei_cldev_dma_unmap(struct mei_cl_device *cldev);
+36 -26
include/uapi/drm/i915_drm.h
··· 645 645 */ 646 646 #define I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP (1ul << 5) 647 647 648 + /* 649 + * Query the status of HuC load. 650 + * 651 + * The query can fail in the following scenarios with the listed error codes: 652 + * -ENODEV if HuC is not present on this platform, 653 + * -EOPNOTSUPP if HuC firmware usage is disabled, 654 + * -ENOPKG if HuC firmware fetch failed, 655 + * -ENOEXEC if HuC firmware is invalid or mismatched, 656 + * -ENOMEM if i915 failed to prepare the FW objects for transfer to the uC, 657 + * -EIO if the FW transfer or the FW authentication failed. 658 + * 659 + * If the IOCTL is successful, the returned parameter will be set to one of the 660 + * following values: 661 + * * 0 if HuC firmware load is not complete, 662 + * * 1 if HuC firmware is authenticated and running. 663 + */ 648 664 #define I915_PARAM_HUC_STATUS 42 649 665 650 666 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to opt-out of ··· 764 748 765 749 /* Query if the kernel supports the I915_USERPTR_PROBE flag. */ 766 750 #define I915_PARAM_HAS_USERPTR_PROBE 56 751 + 752 + /* 753 + * Frequency of the timestamps in OA reports. This used to be the same as the CS 754 + * timestamp frequency, but differs on some platforms. 755 + */ 756 + #define I915_PARAM_OA_TIMESTAMP_FREQUENCY 57 767 757 768 758 /* Must be kept compact -- no holes and well documented */ 769 759 ··· 2672 2650 I915_OA_FORMAT_A12_B8_C8, 2673 2651 I915_OA_FORMAT_A32u40_A4u32_B8_C8, 2674 2652 2653 + /* DG2 */ 2654 + I915_OAR_FORMAT_A32u40_A4u32_B8_C8, 2655 + I915_OA_FORMAT_A24u40_A14u32_B8_C8, 2656 + 2675 2657 I915_OA_FORMAT_MAX /* non-ABI */ 2676 2658 }; 2677 2659 ··· 3519 3493 * 3520 3494 * The (page-aligned) allocated size for the object will be returned. 3521 3495 * 3522 - * DG2 64K min page size implications: 3496 + * On platforms like DG2/ATS the kernel will always use 64K or larger 3497 + * pages for I915_MEMORY_CLASS_DEVICE. 
The kernel also requires a 3498 + * minimum of 64K GTT alignment for such objects. 3523 3499 * 3524 - * On discrete platforms, starting from DG2, we have to contend with GTT 3525 - * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE 3526 - * objects. Specifically the hardware only supports 64K or larger GTT 3527 - * page sizes for such memory. The kernel will already ensure that all 3528 - * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page 3529 - * sizes underneath. 3530 - * 3531 - * Note that the returned size here will always reflect any required 3532 - * rounding up done by the kernel, i.e 4K will now become 64K on devices 3533 - * such as DG2. The kernel will always select the largest minimum 3534 - * page-size for the set of possible placements as the value to use when 3535 - * rounding up the @size. 3536 - * 3537 - * Special DG2 GTT address alignment requirement: 3538 - * 3539 - * The GTT alignment will also need to be at least 2M for such objects. 3540 - * 3541 - * Note that due to how the hardware implements 64K GTT page support, we 3542 - * have some further complications: 3500 + * NOTE: Previously the ABI here required a minimum GTT alignment of 2M 3501 + * on DG2/ATS, due to how the hardware implemented 64K GTT page support, 3502 + * where we had the following complications: 3543 3503 * 3544 3504 * 1) The entire PDE (which covers a 2MB virtual address range), must 3545 3505 * contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same ··· 3534 3522 * 2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM 3535 3523 * objects. 3536 3524 * 3537 - * To keep things simple for userland, we mandate that any GTT mappings 3538 - * must be aligned to and rounded up to 2MB. The kernel will internally 3539 - * pad them out to the next 2MB boundary. 
As this only wastes virtual 3540 - * address space and avoids userland having to copy any needlessly 3541 - * complicated PDE sharing scheme (coloring) and only affects DG2, this 3542 - * is deemed to be a good compromise. 3525 + * However on actual production HW this was completely changed to now 3526 + * allow setting a TLB hint at the PTE level (see PS64), which is a lot 3527 + * more flexible than the above. With this the 2M restriction was 3528 + * dropped where we now only require 64K. 3543 3529 */ 3544 3530 __u64 size; 3545 3531
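The updated uapi text above relaxes the DG2/ATS GTT requirement from 2M down to 64K while the returned object size still reflects the 64K minimum page rounding. A minimal rounding sketch (align_up is the usual power-of-two ALIGN(), not i915 API):

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K	0x1000u
#define SZ_64K	0x10000u
#define SZ_2M	0x200000u

/* Power-of-two round up, as the kernel's ALIGN() macro does. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}
```

So a 4K request on I915_MEMORY_CLASS_DEVICE comes back as 64K; under the old ABI note the GTT placement would additionally have been padded out to the next 2M boundary.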