
Merge tag 'drm-xe-next-2024-07-30' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

drm-xe-next for 6.12

UAPI Changes:
- Rename xe perf layer as xe observation layer, but was
  also made available via fixes to previous version (Ashutosh)
- Use write-back caching mode for system memory on DGFX,
  but was also made available via fixes to previous version (Thomas)
- Expose SIMD16 EU mask in topology query for userspace to know
the type of EU, as available in PVC, Lunar Lake and Battlemage
(Lucas)
- Return ENOBUFS instead of ENOMEM in vm_bind if failure is tied
to an array of binds (Matthew Brost)

Driver Changes:
- Log cleanup moving messages to debug priority (Michal Wajdeczko)
- Add timeout to fences to adhere to dma_buf rules (Matthew Brost)
- Rename old engine nomenclature to exec_queue (Matthew Brost)
- Convert multiple bind ops to 1 job (Matthew Brost)
- Add error injection for vm bind to help testing error path
(Matthew Brost)
- Fix error handling in page table to propagate correctly
to userspace (Matthew Brost)
- Re-organize and cleanup SR-IOV related registers (Michal Wajdeczko)
- Make the device write barrier compatible with VF (Michal Wajdeczko)
- New display workarounds for Battlemage (Matthew Auld)
- New media workarounds for Lunar Lake and Battlemage (Ngai-Mint Kwan)
- New graphics workarounds for Lunar Lake (Bommu Krishnaiah)
- Tracepoint updates (Matthew Brost, Nirmoy Das)
- Cleanup the header generation for OOB workarounds (Lucas De Marchi)
- Fix leaking HDCP-related object (Nirmoy Das)
- Serialize L2 flushes to avoid races (Tejas Upadhyay)
- Log pid and comm on job timeout (José Roberto de Souza)
- Simplify boilerplate code for live kunit (Michal Wajdeczko)
- Improve kunit skips for live kunit (Michal Wajdeczko)
- Fix xe_sync cleanup when handling xe_exec ioctl (Ashutosh Dixit)
- Limit fair VF LMEM provisioning (Michal Wajdeczko)
- New workaround to fence mmio writes in Lunar Lake (Tejas Upadhyay)
- Warn on writes to inaccessible registers in VF (Michal Wajdeczko)
- Fix register lookup in VF (Michal Wajdeczko)
- Add GSC support for Battlemage (Alexander Usyskin)
- Fix wedging only the GT in which timeout occurred (Matthew Brost)
- Block device suspend when wedging (Matthew Brost)
- Handle compression and migration changes for Battlemage
(Akshata Jahagirdar)
- Limit access of stolen memory for Lunar Lake (Uma Shankar)
- Fail invalid addresses during user fence creation (Matthew Brost)
- Refcount xe_file to safely and accurately store fdinfo stats
(Umesh Nerlige Ramappa)
- Cleanup and fix PM reference for TLB invalidation code
(Matthew Brost)
- Fix PM reference handling when communicating with GuC (Matthew Brost)
- Add new BO flag for 2 MiB alignment and use in VF (Michal Wajdeczko)
- Simplify MMIO setup for multi-tile platforms (Lucas De Marchi)
- Add check for uninitialized access to OOB workarounds
(Lucas De Marchi)
- New GSC and HuC firmware blobs for Lunar Lake and Battlemage
(Daniele Ceraolo Spurio)
- Unify mmio wait logic (Gustavo Sousa)
- Fix off-by-one when processing RTP rules (Lucas De Marchi)
- Future-proof migrate logic with compressed PAT flag (Matt Roper)
- Add WA kunit tests for Battlemage (Lucas De Marchi)
- Test active tracking for workarounds with kunit (Lucas De Marchi)
- Add kunit tests for RTP with no actions (Lucas De Marchi)
- Unify parse of OR rules in RTP (Lucas De Marchi)
- Add performance tuning for Battlemage (Sai Teja Pottumuttu)
- Make bit masks unsigned (Geert Uytterhoeven)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/k7xuktfav4zmtxxjr77glu2hszypvzgmzghoumh757nqfnk7kn@ccfi4ts3ytbk

+3307 -1773
+8
drivers/gpu/drm/i915/display/intel_display_wa.h
···
 #ifndef __INTEL_DISPLAY_WA_H__
 #define __INTEL_DISPLAY_WA_H__
 
+#include <linux/types.h>
+
 struct drm_i915_private;
 
 void intel_display_wa_apply(struct drm_i915_private *i915);
+
+#ifdef I915
+static inline bool intel_display_needs_wa_16023588340(struct drm_i915_private *i915) { return false; }
+#else
+bool intel_display_needs_wa_16023588340(struct drm_i915_private *i915);
+#endif
 
 #endif
+6
drivers/gpu/drm/i915/display/intel_fbc.c
···
 #include "intel_display_device.h"
 #include "intel_display_trace.h"
 #include "intel_display_types.h"
+#include "intel_display_wa.h"
 #include "intel_fbc.h"
 #include "intel_fbc_regs.h"
 #include "intel_frontbuffer.h"
···
 
 	if (!plane_state->uapi.visible) {
 		plane_state->no_fbc_reason = "plane not visible";
+		return 0;
+	}
+
+	if (intel_display_needs_wa_16023588340(i915)) {
+		plane_state->no_fbc_reason = "Wa_16023588340";
 		return 0;
 	}
 
+5 -18
drivers/gpu/drm/xe/Makefile
···
 subdir-ccflags-y += -I$(obj) -I$(src)
 
 # generated sources
+
 hostprogs := xe_gen_wa_oob
-
 generated_oob := $(obj)/generated/xe_wa_oob.c $(obj)/generated/xe_wa_oob.h
-
 quiet_cmd_wa_oob = GEN     $(notdir $(generated_oob))
       cmd_wa_oob = mkdir -p $(@D); $^ $(generated_oob)
-
 $(obj)/generated/%_wa_oob.c $(obj)/generated/%_wa_oob.h: $(obj)/xe_gen_wa_oob \
 		$(src)/xe_wa_oob.rules
 	$(call cmd,wa_oob)
-
-uses_generated_oob := \
-	$(obj)/xe_ggtt.o \
-	$(obj)/xe_gsc.o \
-	$(obj)/xe_gt.o \
-	$(obj)/xe_guc.o \
-	$(obj)/xe_guc_ads.o \
-	$(obj)/xe_guc_pc.o \
-	$(obj)/xe_migrate.o \
-	$(obj)/xe_ring_ops.o \
-	$(obj)/xe_vm.o \
-	$(obj)/xe_wa.o \
-	$(obj)/xe_ttm_stolen_mgr.o
-
-$(uses_generated_oob): $(generated_oob)
 
 # Please keep these build lists sorted!
···
 	display/xe_display.o \
 	display/xe_display_misc.o \
 	display/xe_display_rps.o \
+	display/xe_display_wa.o \
 	display/xe_dsb_buffer.o \
 	display/xe_fb_pin.o \
 	display/xe_hdcp_gsc.o \
···
 
 $(obj)/%.hdrtest: $(src)/%.h FORCE
 	$(call if_changed_dep,hdrtest)
+
+uses_generated_oob := $(addprefix $(obj)/, $(xe-y))
+$(uses_generated_oob): $(obj)/generated/xe_wa_oob.h
+5 -1
drivers/gpu/drm/xe/display/intel_fbdev_fb.c
···
 #include "xe_bo.h"
 #include "xe_gt.h"
 #include "xe_ttm_stolen_mgr.h"
+#include "xe_wa.h"
+
+#include <generated/xe_wa_oob.h>
 
 struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
 					       struct drm_fb_helper_surface_size *sizes)
···
 	size = PAGE_ALIGN(size);
 	obj = ERR_PTR(-ENODEV);
 
-	if (!IS_DGFX(xe)) {
+	if (!IS_DGFX(xe) && !XE_WA(xe_root_mmio_gt(xe), 22019338487_display)) {
 		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
 					   NULL, size,
 					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
···
 		else
 			drm_info(&xe->drm, "Allocated fbdev into stolen failed: %li\n", PTR_ERR(obj));
 	}
+
 	if (IS_ERR(obj)) {
 		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, size,
 					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+16
drivers/gpu/drm/xe/display/xe_display_wa.c
···
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include "intel_display_wa.h"
+
+#include "xe_device.h"
+#include "xe_wa.h"
+
+#include <generated/xe_wa_oob.h>
+
+bool intel_display_needs_wa_16023588340(struct drm_i915_private *i915)
+{
+	return XE_WA(xe_root_mmio_gt(i915), 16023588340);
+}
+8
drivers/gpu/drm/xe/display/xe_dsb_buffer.c
···
 #include "intel_display_types.h"
 #include "intel_dsb_buffer.h"
 #include "xe_bo.h"
+#include "xe_device.h"
+#include "xe_device_types.h"
 #include "xe_gt.h"
 
 u32 intel_dsb_buffer_ggtt_offset(struct intel_dsb_buffer *dsb_buf)
···
 
 void intel_dsb_buffer_write(struct intel_dsb_buffer *dsb_buf, u32 idx, u32 val)
 {
+	struct xe_device *xe = dsb_buf->vma->bo->tile->xe;
+
 	iosys_map_wr(&dsb_buf->vma->bo->vmap, idx * 4, u32, val);
+	xe_device_l2_flush(xe);
 }
 
 u32 intel_dsb_buffer_read(struct intel_dsb_buffer *dsb_buf, u32 idx)
···
 
 void intel_dsb_buffer_memset(struct intel_dsb_buffer *dsb_buf, u32 idx, u32 val, size_t size)
 {
+	struct xe_device *xe = dsb_buf->vma->bo->tile->xe;
+
 	WARN_ON(idx > (dsb_buf->buf_size - size) / sizeof(*dsb_buf->cmd_buf));
 
 	iosys_map_memset(&dsb_buf->vma->bo->vmap, idx * 4, val, size);
+	xe_device_l2_flush(xe);
 }
 
 bool intel_dsb_buffer_create(struct intel_crtc *crtc, struct intel_dsb_buffer *dsb_buf, size_t size)
+3
drivers/gpu/drm/xe/display/xe_fb_pin.c
···
 #include "intel_fb.h"
 #include "intel_fb_pin.h"
 #include "xe_bo.h"
+#include "xe_device.h"
 #include "xe_ggtt.h"
 #include "xe_gt.h"
 #include "xe_pm.h"
···
 	if (ret)
 		goto err_unpin;
 
+	/* Ensure DPT writes are flushed */
+	xe_device_l2_flush(xe);
 	return vma;
 
 err_unpin:
+6
drivers/gpu/drm/xe/display/xe_plane_initial.c
···
 #include "intel_frontbuffer.h"
 #include "intel_plane_initial.h"
 #include "xe_bo.h"
+#include "xe_wa.h"
+
+#include <generated/xe_wa_oob.h>
 
 static bool
 intel_reuse_initial_plane_obj(struct intel_crtc *this,
···
 			return NULL;
 		phys_base = base;
 		flags |= XE_BO_FLAG_STOLEN;
+
+		if (XE_WA(xe_root_mmio_gt(xe), 22019338487_display))
+			return NULL;
 
 		/*
 		 * If the FB is too big, just don't use it since fbdev is not very
+15
drivers/gpu/drm/xe/regs/xe_gt_regs.h
···
 #define   LE_CACHEABILITY_MASK			REG_GENMASK(1, 0)
 #define   LE_CACHEABILITY(value)		REG_FIELD_PREP(LE_CACHEABILITY_MASK, value)
 
+#define XE2_GAMREQSTRM_CTRL			XE_REG(0x4194)
+#define   CG_DIS_CNTLBUS			REG_BIT(6)
+
 #define CCS_AUX_INV				XE_REG(0x4208)
 
 #define VD0_AUX_INV				XE_REG(0x4218)
···
 
 #define VE1_AUX_INV				XE_REG(0x42b8)
 #define   AUX_INV				REG_BIT(0)
+
+#define XE2_LMEM_CFG				XE_REG(0x48b0)
 
 #define XEHP_TILE_ADDR_RANGE(_idx)		XE_REG_MCR(0x4900 + (_idx) * 4)
 #define XEHP_FLAT_CCS_BASE_ADDR			XE_REG_MCR(0x4910)
···
 
 #define FF_MODE					XE_REG_MCR(0x6210)
 #define   DIS_TE_AUTOSTRIP			REG_BIT(31)
+#define   VS_HIT_MAX_VALUE_MASK			REG_GENMASK(25, 20)
 #define   DIS_MESH_PARTIAL_AUTOSTRIP		REG_BIT(16)
 #define   DIS_MESH_AUTOSTRIP			REG_BIT(15)
 
···
 
 #define XEHPC_L3CLOS_MASK(i)			XE_REG_MCR(0xb194 + (i) * 8)
 
+#define XE2_GLOBAL_INVAL			XE_REG(0xb404)
+
+#define SCRATCH1LPFC				XE_REG(0xb474)
+#define   EN_L3_RW_CCS_CACHE_FLUSH		REG_BIT(0)
+
 #define XE2LPM_L3SQCREG5			XE_REG_MCR(0xb658)
 
 #define XE2_TDF_CTRL				XE_REG(0xb418)
···
 #define XEHP_GAMCNTRL_CTRL			XE_REG_MCR(0xcf54)
 #define   INVALIDATION_BROADCAST_MODE_DIS	REG_BIT(12)
 #define   GLOBAL_INVALIDATION_MODE		REG_BIT(2)
+
+#define LMEM_CFG				XE_REG(0xcf58)
+#define   LMEM_EN				REG_BIT(31)
+#define   LMTT_DIR_PTR				REG_GENMASK(30, 0) /* in multiples of 64KB */
 
 #define HALF_SLICE_CHICKEN5			XE_REG_MCR(0xe188, XE_REG_OPTION_MASKED)
 #define   DISABLE_SAMPLE_G_PERFORMANCE		REG_BIT(0)
+8 -4
drivers/gpu/drm/xe/regs/xe_regs.h
···
 #define GU_MISC_IRQ_OFFSET			0x444f0
 #define   GU_MISC_GSE				REG_BIT(27)
 
-#define SOFTWARE_FLAGS_SPR33			XE_REG(0x4f084)
-
 #define GU_CNTL_PROTECTED			XE_REG(0x10100C)
 #define   DRIVERINT_FLR_DIS			REG_BIT(31)
 
···
 #define   LMEM_INIT				REG_BIT(7)
 #define   DRIVERFLR				REG_BIT(31)
 
+#define XEHP_CLOCK_GATE_DIS			XE_REG(0x101014)
+#define   SGSI_SIDECLK_DIS			REG_BIT(17)
+
 #define GU_DEBUG				XE_REG(0x101018)
 #define   DRIVERFLR_STATUS			REG_BIT(31)
 
-#define XEHP_CLOCK_GATE_DIS			XE_REG(0x101014)
-#define   SGSI_SIDECLK_DIS			REG_BIT(17)
+#define VIRTUAL_CTRL_REG			XE_REG(0x10108c)
+#define   GUEST_GTT_UPDATE_EN			REG_BIT(8)
 
 #define XEHP_MTCFG_ADDR				XE_REG(0x101800)
 #define   TILE_COUNT				REG_GENMASK(15, 8)
···
 #define   GU_MISC_IRQ				REG_BIT(29)
 #define   DISPLAY_IRQ				REG_BIT(16)
 #define   GT_DW_IRQ(x)				REG_BIT(x)
+
+#define VF_CAP_REG				XE_REG(0x1901f8, XE_REG_OPTION_VF)
+#define   VF_CAP				REG_BIT(0)
 
 #define PVC_RP_STATE_CAP			XE_REG(0x281014)
-23
drivers/gpu/drm/xe/regs/xe_sriov_regs.h
···
-/* SPDX-License-Identifier: MIT */
-/*
- * Copyright © 2023 Intel Corporation
- */
-
-#ifndef _REGS_XE_SRIOV_REGS_H_
-#define _REGS_XE_SRIOV_REGS_H_
-
-#include "regs/xe_reg_defs.h"
-
-#define XE2_LMEM_CFG		XE_REG(0x48b0)
-
-#define LMEM_CFG		XE_REG(0xcf58)
-#define   LMEM_EN		REG_BIT(31)
-#define   LMTT_DIR_PTR		REG_GENMASK(30, 0) /* in multiples of 64KB */
-
-#define VIRTUAL_CTRL_REG	XE_REG(0x10108c)
-#define   GUEST_GTT_UPDATE_EN	REG_BIT(8)
-
-#define VF_CAP_REG		XE_REG(0x1901f8, XE_REG_OPTION_VF)
-#define   VF_CAP		REG_BIT(0)
-
-#endif
+1 -5
drivers/gpu/drm/xe/tests/Makefile
···
 
 # "live" kunit tests
 obj-$(CONFIG_DRM_XE_KUNIT_TEST) += xe_live_test.o
-xe_live_test-y = xe_live_test_mod.o \
-	xe_bo_test.o \
-	xe_dma_buf_test.o \
-	xe_migrate_test.o \
-	xe_mocs_test.o
+xe_live_test-y = xe_live_test_mod.o
 
 # Normal kunit tests
 obj-$(CONFIG_DRM_XE_KUNIT_TEST) += xe_test.o
+33 -12
drivers/gpu/drm/xe/tests/xe_bo.c
···
 #include <kunit/test.h>
 #include <kunit/visibility.h>
 
-#include "tests/xe_bo_test.h"
+#include "tests/xe_kunit_helpers.h"
 #include "tests/xe_pci_test.h"
 #include "tests/xe_test.h"
 
···
 
 static int ccs_test_run_device(struct xe_device *xe)
 {
-	struct kunit *test = xe_cur_kunit();
+	struct kunit *test = kunit_get_current_test();
 	struct xe_tile *tile;
 	int id;
 
 	if (!xe_device_has_flat_ccs(xe)) {
-		kunit_info(test, "Skipping non-flat-ccs device.\n");
+		kunit_skip(test, "non-flat-ccs device\n");
+		return 0;
+	}
+
+	/* For xe2+ dgfx, we don't handle ccs metadata */
+	if (GRAPHICS_VER(xe) >= 20 && IS_DGFX(xe)) {
+		kunit_skip(test, "xe2+ dgfx device\n");
 		return 0;
 	}
 
···
 	return 0;
 }
 
-void xe_ccs_migrate_kunit(struct kunit *test)
+static void xe_ccs_migrate_kunit(struct kunit *test)
 {
-	xe_call_for_each_device(ccs_test_run_device);
+	struct xe_device *xe = test->priv;
+
+	ccs_test_run_device(xe);
 }
-EXPORT_SYMBOL_IF_KUNIT(xe_ccs_migrate_kunit);
 
 static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struct kunit *test)
 {
···
 
 static int evict_test_run_device(struct xe_device *xe)
 {
-	struct kunit *test = xe_cur_kunit();
+	struct kunit *test = kunit_get_current_test();
 	struct xe_tile *tile;
 	int id;
 
 	if (!IS_DGFX(xe)) {
-		kunit_info(test, "Skipping non-discrete device %s.\n",
-			   dev_name(xe->drm.dev));
+		kunit_skip(test, "non-discrete device\n");
 		return 0;
 	}
 
···
 	return 0;
 }
 
-void xe_bo_evict_kunit(struct kunit *test)
+static void xe_bo_evict_kunit(struct kunit *test)
 {
-	xe_call_for_each_device(evict_test_run_device);
+	struct xe_device *xe = test->priv;
+
+	evict_test_run_device(xe);
 }
-EXPORT_SYMBOL_IF_KUNIT(xe_bo_evict_kunit);
+
+static struct kunit_case xe_bo_tests[] = {
+	KUNIT_CASE_PARAM(xe_ccs_migrate_kunit, xe_pci_live_device_gen_param),
+	KUNIT_CASE_PARAM(xe_bo_evict_kunit, xe_pci_live_device_gen_param),
+	{}
+};
+
+VISIBLE_IF_KUNIT
+struct kunit_suite xe_bo_test_suite = {
+	.name = "xe_bo",
+	.test_cases = xe_bo_tests,
+	.init = xe_kunit_helper_xe_device_live_test_init,
+};
+EXPORT_SYMBOL_IF_KUNIT(xe_bo_test_suite);
-21
drivers/gpu/drm/xe/tests/xe_bo_test.c
···
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright © 2022 Intel Corporation
- */
-
-#include "xe_bo_test.h"
-
-#include <kunit/test.h>
-
-static struct kunit_case xe_bo_tests[] = {
-	KUNIT_CASE(xe_ccs_migrate_kunit),
-	KUNIT_CASE(xe_bo_evict_kunit),
-	{}
-};
-
-static struct kunit_suite xe_bo_test_suite = {
-	.name = "xe_bo",
-	.test_cases = xe_bo_tests,
-};
-
-kunit_test_suite(xe_bo_test_suite);
-14
drivers/gpu/drm/xe/tests/xe_bo_test.h
···
-/* SPDX-License-Identifier: GPL-2.0 AND MIT */
-/*
- * Copyright © 2023 Intel Corporation
- */
-
-#ifndef _XE_BO_TEST_H_
-#define _XE_BO_TEST_H_
-
-struct kunit;
-
-void xe_ccs_migrate_kunit(struct kunit *test);
-void xe_bo_evict_kunit(struct kunit *test);
-
-#endif
+20 -6
drivers/gpu/drm/xe/tests/xe_dma_buf.c
···
 #include <kunit/test.h>
 #include <kunit/visibility.h>
 
-#include "tests/xe_dma_buf_test.h"
+#include "tests/xe_kunit_helpers.h"
 #include "tests/xe_pci_test.h"
 
 #include "xe_pci.h"
···
 
 static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 {
-	struct kunit *test = xe_cur_kunit();
+	struct kunit *test = kunit_get_current_test();
 	struct dma_buf_test_params *params = to_dma_buf_test_params(test->priv);
 	struct drm_gem_object *import;
 	struct dma_buf *dmabuf;
···
 static int dma_buf_run_device(struct xe_device *xe)
 {
 	const struct dma_buf_test_params *params;
-	struct kunit *test = xe_cur_kunit();
+	struct kunit *test = kunit_get_current_test();
 
 	xe_pm_runtime_get(xe);
 	for (params = test_params; params->mem_mask; ++params) {
···
 	return 0;
 }
 
-void xe_dma_buf_kunit(struct kunit *test)
+static void xe_dma_buf_kunit(struct kunit *test)
 {
-	xe_call_for_each_device(dma_buf_run_device);
+	struct xe_device *xe = test->priv;
+
+	dma_buf_run_device(xe);
 }
-EXPORT_SYMBOL_IF_KUNIT(xe_dma_buf_kunit);
+
+static struct kunit_case xe_dma_buf_tests[] = {
+	KUNIT_CASE_PARAM(xe_dma_buf_kunit, xe_pci_live_device_gen_param),
+	{}
+};
+
+VISIBLE_IF_KUNIT
+struct kunit_suite xe_dma_buf_test_suite = {
+	.name = "xe_dma_buf",
+	.test_cases = xe_dma_buf_tests,
+	.init = xe_kunit_helper_xe_device_live_test_init,
+};
+EXPORT_SYMBOL_IF_KUNIT(xe_dma_buf_test_suite);
-20
drivers/gpu/drm/xe/tests/xe_dma_buf_test.c
···
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright © 2022 Intel Corporation
- */
-
-#include "xe_dma_buf_test.h"
-
-#include <kunit/test.h>
-
-static struct kunit_case xe_dma_buf_tests[] = {
-	KUNIT_CASE(xe_dma_buf_kunit),
-	{}
-};
-
-static struct kunit_suite xe_dma_buf_test_suite = {
-	.name = "xe_dma_buf",
-	.test_cases = xe_dma_buf_tests,
-};
-
-kunit_test_suite(xe_dma_buf_test_suite);
-13
drivers/gpu/drm/xe/tests/xe_dma_buf_test.h
···
-/* SPDX-License-Identifier: GPL-2.0 AND MIT */
-/*
- * Copyright © 2023 Intel Corporation
- */
-
-#ifndef _XE_DMA_BUF_TEST_H_
-#define _XE_DMA_BUF_TEST_H_
-
-struct kunit;
-
-void xe_dma_buf_kunit(struct kunit *test);
-
-#endif
+39
drivers/gpu/drm/xe/tests/xe_kunit_helpers.c
···
 
 #include "tests/xe_kunit_helpers.h"
 #include "tests/xe_pci_test.h"
+#include "xe_device.h"
 #include "xe_device_types.h"
+#include "xe_pm.h"
 
 /**
  * xe_kunit_helper_alloc_xe_device - Allocate a &xe_device for a KUnit test.
···
 	return 0;
 }
 EXPORT_SYMBOL_IF_KUNIT(xe_kunit_helper_xe_device_test_init);
+
+KUNIT_DEFINE_ACTION_WRAPPER(put_xe_pm_runtime, xe_pm_runtime_put, struct xe_device *);
+
+/**
+ * xe_kunit_helper_xe_device_live_test_init - Prepare a &xe_device for
+ *	use in a live KUnit test.
+ * @test: the &kunit where live &xe_device will be used
+ *
+ * This function expects pointer to the &xe_device in the &test.param_value,
+ * like it is prepared by the &xe_pci_live_device_gen_param and stores that
+ * pointer as &kunit.priv to allow the test code to access it.
+ *
+ * This function makes sure that device is not wedged and then resumes it
+ * to avoid waking up the device inside the test. It uses deferred cleanup
+ * action to release a runtime_pm reference.
+ *
+ * This function can be used as custom implementation of &kunit_suite.init.
+ *
+ * This function uses KUNIT_ASSERT to detect any failures.
+ *
+ * Return: Always 0.
+ */
+int xe_kunit_helper_xe_device_live_test_init(struct kunit *test)
+{
+	struct xe_device *xe = xe_device_const_cast(test->param_value);
+
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, xe);
+	kunit_info(test, "running on %s device\n", xe->info.platform_name);
+
+	KUNIT_ASSERT_FALSE(test, xe_device_wedged(xe));
+	xe_pm_runtime_get(xe);
+	KUNIT_ASSERT_EQ(test, 0, kunit_add_action_or_reset(test, put_xe_pm_runtime, xe));
+
+	test->priv = xe;
+	return 0;
+}
+EXPORT_SYMBOL_IF_KUNIT(xe_kunit_helper_xe_device_live_test_init);
+2
drivers/gpu/drm/xe/tests/xe_kunit_helpers.h
···
 					     struct device *dev);
 int xe_kunit_helper_xe_device_test_init(struct kunit *test);
 
+int xe_kunit_helper_xe_device_live_test_init(struct kunit *test);
+
 #endif
+11
drivers/gpu/drm/xe/tests/xe_live_test_mod.c
···
  * Copyright © 2023 Intel Corporation
  */
 #include <linux/module.h>
+#include <kunit/test.h>
+
+extern struct kunit_suite xe_bo_test_suite;
+extern struct kunit_suite xe_dma_buf_test_suite;
+extern struct kunit_suite xe_migrate_test_suite;
+extern struct kunit_suite xe_mocs_test_suite;
+
+kunit_test_suite(xe_bo_test_suite);
+kunit_test_suite(xe_dma_buf_test_suite);
+kunit_test_suite(xe_migrate_test_suite);
+kunit_test_suite(xe_mocs_test_suite);
 
 MODULE_AUTHOR("Intel Corporation");
 MODULE_LICENSE("GPL");
+419 -5
drivers/gpu/drm/xe/tests/xe_migrate.c
···
 #include <kunit/test.h>
 #include <kunit/visibility.h>
 
-#include "tests/xe_migrate_test.h"
+#include "tests/xe_kunit_helpers.h"
 #include "tests/xe_pci_test.h"
 
 #include "xe_pci.h"
···
 
 static int migrate_test_run_device(struct xe_device *xe)
 {
-	struct kunit *test = xe_cur_kunit();
+	struct kunit *test = kunit_get_current_test();
 	struct xe_tile *tile;
 	int id;
 
···
 	return 0;
 }
 
-void xe_migrate_sanity_kunit(struct kunit *test)
+static void xe_migrate_sanity_kunit(struct kunit *test)
 {
-	xe_call_for_each_device(migrate_test_run_device);
+	struct xe_device *xe = test->priv;
+
+	migrate_test_run_device(xe);
 }
-EXPORT_SYMBOL_IF_KUNIT(xe_migrate_sanity_kunit);
+
+static struct dma_fence *blt_copy(struct xe_tile *tile,
+				  struct xe_bo *src_bo, struct xe_bo *dst_bo,
+				  bool copy_only_ccs, const char *str, struct kunit *test)
+{
+	struct xe_gt *gt = tile->primary_gt;
+	struct xe_migrate *m = tile->migrate;
+	struct xe_device *xe = gt_to_xe(gt);
+	struct dma_fence *fence = NULL;
+	u64 size = src_bo->size;
+	struct xe_res_cursor src_it, dst_it;
+	struct ttm_resource *src = src_bo->ttm.resource, *dst = dst_bo->ttm.resource;
+	u64 src_L0_ofs, dst_L0_ofs;
+	u32 src_L0_pt, dst_L0_pt;
+	u64 src_L0, dst_L0;
+	int err;
+	bool src_is_vram = mem_type_is_vram(src->mem_type);
+	bool dst_is_vram = mem_type_is_vram(dst->mem_type);
+
+	if (!src_is_vram)
+		xe_res_first_sg(xe_bo_sg(src_bo), 0, size, &src_it);
+	else
+		xe_res_first(src, 0, size, &src_it);
+
+	if (!dst_is_vram)
+		xe_res_first_sg(xe_bo_sg(dst_bo), 0, size, &dst_it);
+	else
+		xe_res_first(dst, 0, size, &dst_it);
+
+	while (size) {
+		u32 batch_size = 2; /* arb_clear() + MI_BATCH_BUFFER_END */
+		struct xe_sched_job *job;
+		struct xe_bb *bb;
+		u32 flush_flags = 0;
+		u32 update_idx;
+		u32 avail_pts = max_mem_transfer_per_pass(xe) / LEVEL0_PAGE_TABLE_ENCODE_SIZE;
+		u32 pte_flags;
+
+		src_L0 = xe_migrate_res_sizes(m, &src_it);
+		dst_L0 = xe_migrate_res_sizes(m, &dst_it);
+
+		src_L0 = min(src_L0, dst_L0);
+
+		pte_flags = src_is_vram ? (PTE_UPDATE_FLAG_IS_VRAM |
+					   PTE_UPDATE_FLAG_IS_COMP_PTE) : 0;
+		batch_size += pte_update_size(m, pte_flags, src, &src_it, &src_L0,
+					      &src_L0_ofs, &src_L0_pt, 0, 0,
+					      avail_pts);
+
+		pte_flags = dst_is_vram ? (PTE_UPDATE_FLAG_IS_VRAM |
+					   PTE_UPDATE_FLAG_IS_COMP_PTE) : 0;
+		batch_size += pte_update_size(m, pte_flags, dst, &dst_it, &src_L0,
+					      &dst_L0_ofs, &dst_L0_pt, 0,
+					      avail_pts, avail_pts);
+
+		/* Add copy commands size here */
+		batch_size += ((copy_only_ccs) ? 0 : EMIT_COPY_DW) +
+			((xe_device_has_flat_ccs(xe) && copy_only_ccs) ? EMIT_COPY_CCS_DW : 0);
+
+		bb = xe_bb_new(gt, batch_size, xe->info.has_usm);
+		if (IS_ERR(bb)) {
+			err = PTR_ERR(bb);
+			goto err_sync;
+		}
+
+		if (src_is_vram)
+			xe_res_next(&src_it, src_L0);
+		else
+			emit_pte(m, bb, src_L0_pt, src_is_vram, false,
+				 &src_it, src_L0, src);
+
+		if (dst_is_vram)
+			xe_res_next(&dst_it, src_L0);
+		else
+			emit_pte(m, bb, dst_L0_pt, dst_is_vram, false,
+				 &dst_it, src_L0, dst);
+
+		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
+		update_idx = bb->len;
+		if (!copy_only_ccs)
+			emit_copy(gt, bb, src_L0_ofs, dst_L0_ofs, src_L0, XE_PAGE_SIZE);
+
+		if (copy_only_ccs)
+			flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs,
+							  src_is_vram, dst_L0_ofs,
+							  dst_is_vram, src_L0, dst_L0_ofs,
+							  copy_only_ccs);
+
+		job = xe_bb_create_migration_job(m->q, bb,
+						 xe_migrate_batch_base(m, xe->info.has_usm),
+						 update_idx);
+		if (IS_ERR(job)) {
+			err = PTR_ERR(job);
+			goto err;
+		}
+
+		xe_sched_job_add_migrate_flush(job, flush_flags);
+
+		mutex_lock(&m->job_mutex);
+		xe_sched_job_arm(job);
+		dma_fence_put(fence);
+		fence = dma_fence_get(&job->drm.s_fence->finished);
+		xe_sched_job_push(job);
+
+		dma_fence_put(m->fence);
+		m->fence = dma_fence_get(fence);
+
+		mutex_unlock(&m->job_mutex);
+
+		xe_bb_free(bb, fence);
+		size -= src_L0;
+		continue;
+
+err:
+		xe_bb_free(bb, NULL);
+
+err_sync:
+		if (fence) {
+			dma_fence_wait(fence, false);
+			dma_fence_put(fence);
+		}
+		return ERR_PTR(err);
+	}
+
+	return fence;
+}
+
+static void test_migrate(struct xe_device *xe, struct xe_tile *tile,
+			 struct xe_bo *sys_bo, struct xe_bo *vram_bo, struct xe_bo *ccs_bo,
+			 struct kunit *test)
+{
+	struct dma_fence *fence;
+	u64 expected, retval;
+	long timeout;
+	long ret;
+
+	expected = 0xd0d0d0d0d0d0d0d0;
+	xe_map_memset(xe, &sys_bo->vmap, 0, 0xd0, sys_bo->size);
+
+	fence = blt_copy(tile, sys_bo, vram_bo, false, "Blit copy from sysmem to vram", test);
+	if (!sanity_fence_failed(xe, fence, "Blit copy from sysmem to vram", test)) {
+		retval = xe_map_rd(xe, &vram_bo->vmap, 0, u64);
+		if (retval == expected)
+			KUNIT_FAIL(test, "Sanity check failed: VRAM must have compressed value\n");
+	}
+	dma_fence_put(fence);
+
+	kunit_info(test, "Evict vram buffer object\n");
+	ret = xe_bo_evict(vram_bo, true);
+	if (ret) {
+		KUNIT_FAIL(test, "Failed to evict bo.\n");
+		return;
+	}
+
+	ret = xe_bo_vmap(vram_bo);
+	if (ret) {
+		KUNIT_FAIL(test, "Failed to vmap vram bo: %li\n", ret);
+		return;
+	}
+
+	retval = xe_map_rd(xe, &vram_bo->vmap, 0, u64);
+	check(retval, expected, "Clear evicted vram data first value", test);
+	retval = xe_map_rd(xe, &vram_bo->vmap, vram_bo->size - 8, u64);
+	check(retval, expected, "Clear evicted vram data last value", test);
+
+	fence = blt_copy(tile, vram_bo, ccs_bo,
+			 true, "Blit surf copy from vram to sysmem", test);
+	if (!sanity_fence_failed(xe, fence, "Clear ccs buffer data", test)) {
+		retval = xe_map_rd(xe, &ccs_bo->vmap, 0, u64);
+		check(retval, 0, "Clear ccs data first value", test);
+
+		retval = xe_map_rd(xe, &ccs_bo->vmap, ccs_bo->size - 8, u64);
+		check(retval, 0, "Clear ccs data last value", test);
+	}
+	dma_fence_put(fence);
+
+	kunit_info(test, "Restore vram buffer object\n");
+	ret = xe_bo_validate(vram_bo, NULL, false);
+	if (ret) {
+		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
+		return;
+	}
+
+	/* Sync all migration blits */
+	timeout = dma_resv_wait_timeout(vram_bo->ttm.base.resv,
+					DMA_RESV_USAGE_KERNEL,
+					true,
+					5 * HZ);
+	if (timeout <= 0) {
+		KUNIT_FAIL(test, "Failed to sync bo eviction.\n");
+		return;
+	}
+
+	ret = xe_bo_vmap(vram_bo);
+	if (ret) {
+		KUNIT_FAIL(test, "Failed to vmap vram bo: %li\n", ret);
+		return;
+	}
+
+	retval = xe_map_rd(xe, &vram_bo->vmap, 0, u64);
+	check(retval, expected, "Restored value must be equal to initial value", test);
+	retval = xe_map_rd(xe, &vram_bo->vmap, vram_bo->size - 8, u64);
+	check(retval, expected, "Restored value must be equal to initial value", test);
+
+	fence = blt_copy(tile, vram_bo, ccs_bo,
+			 true, "Blit surf copy from vram to sysmem", test);
+	if (!sanity_fence_failed(xe, fence, "Clear ccs buffer data", test)) {
+		retval = xe_map_rd(xe, &ccs_bo->vmap, 0, u64);
+		check(retval, 0, "Clear ccs data first value", test);
+		retval = xe_map_rd(xe, &ccs_bo->vmap, ccs_bo->size - 8, u64);
+		check(retval, 0, "Clear ccs data last value", test);
+	}
+	dma_fence_put(fence);
+}
+
+static void test_clear(struct xe_device *xe, struct xe_tile *tile,
+		       struct xe_bo *sys_bo, struct xe_bo *vram_bo, struct kunit *test)
+{
+	struct dma_fence *fence;
+	u64 expected, retval;
+
+	expected = 0xd0d0d0d0d0d0d0d0;
+	xe_map_memset(xe, &sys_bo->vmap, 0, 0xd0, sys_bo->size);
+
+	fence = blt_copy(tile, sys_bo, vram_bo, false, "Blit copy from sysmem to vram", test);
+	if (!sanity_fence_failed(xe, fence, "Blit copy from sysmem to vram", test)) {
+		retval = xe_map_rd(xe, &vram_bo->vmap, 0, u64);
+		if (retval == expected)
+			KUNIT_FAIL(test, "Sanity check failed: VRAM must have compressed value\n");
+	}
+	dma_fence_put(fence);
+
+	fence = blt_copy(tile, vram_bo, sys_bo, false, "Blit copy from vram to sysmem", test);
+	if (!sanity_fence_failed(xe, fence, "Blit copy from vram to sysmem", test)) {
+		retval = xe_map_rd(xe, &sys_bo->vmap, 0, u64);
+		check(retval, expected, "Decompressed value must be equal to initial value", test);
+		retval = xe_map_rd(xe, &sys_bo->vmap, sys_bo->size - 8, u64);
+		check(retval, expected, "Decompressed value must be equal to initial value", test);
+	}
+	dma_fence_put(fence);
+
+	kunit_info(test, "Clear vram buffer object\n");
+	expected = 0x0000000000000000;
+	fence = xe_migrate_clear(tile->migrate, vram_bo, vram_bo->ttm.resource);
+	if (sanity_fence_failed(xe, fence, "Clear vram_bo", test))
+		return;
+	dma_fence_put(fence);
+
+	fence = blt_copy(tile, vram_bo, sys_bo,
+			 false, "Blit copy from vram to sysmem", test);
+	if (!sanity_fence_failed(xe, fence, "Clear main buffer data", test)) {
+		retval = xe_map_rd(xe, &sys_bo->vmap, 0, u64);
+		check(retval, expected, "Clear main buffer first value", test);
+		retval = xe_map_rd(xe, &sys_bo->vmap, sys_bo->size - 8, u64);
+		check(retval, expected, "Clear main buffer last value", test);
+	}
+	dma_fence_put(fence);
+
+	fence = blt_copy(tile, vram_bo, sys_bo,
+			 true, "Blit surf copy from vram to sysmem", test);
+	if (!sanity_fence_failed(xe, fence, "Clear ccs buffer data", test)) {
+		retval = xe_map_rd(xe, &sys_bo->vmap, 0, u64);
+		check(retval, expected, "Clear ccs data first value", test);
+		retval = xe_map_rd(xe, &sys_bo->vmap, sys_bo->size - 8, u64);
+		check(retval, expected, "Clear ccs data last value", test);
+	}
+	dma_fence_put(fence);
+}
+
+static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *tile,
+				       struct kunit *test)
+{
+	struct xe_bo *sys_bo, *vram_bo = NULL, *ccs_bo = NULL;
+	unsigned int bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile);
+	long ret;
+
+	sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
+				   DRM_XE_GEM_CPU_CACHING_WC, ttm_bo_type_device,
+				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS);
+
+	if (IS_ERR(sys_bo)) {
+		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
+			   PTR_ERR(sys_bo));
+		return;
+	}
+
+	xe_bo_lock(sys_bo, false);
+	ret = xe_bo_validate(sys_bo, NULL, false);
+	if (ret) {
+		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
+		goto free_sysbo;
+	}
+
+	ret = xe_bo_vmap(sys_bo);
+	if (ret) {
+		KUNIT_FAIL(test, "Failed to vmap system bo: %li\n", ret);
+		goto free_sysbo;
+	}
+	xe_bo_unlock(sys_bo);
+
+	ccs_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M, DRM_XE_GEM_CPU_CACHING_WC,
+				   ttm_bo_type_device, bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS);
+
+	if (IS_ERR(ccs_bo)) {
+		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
+			   PTR_ERR(ccs_bo));
+		return;
+	}
+
+	xe_bo_lock(ccs_bo, false);
+	ret = xe_bo_validate(ccs_bo, NULL, false);
+	if (ret) {
+		KUNIT_FAIL(test, "Failed to validate system bo for: %li\n", ret);
+		goto free_ccsbo;
+	}
+
+	ret = xe_bo_vmap(ccs_bo);
+	if (ret) {
+		KUNIT_FAIL(test, "Failed to vmap system bo: %li\n", ret);
+		goto free_ccsbo;
+	}
+	xe_bo_unlock(ccs_bo);
+
+	vram_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M, DRM_XE_GEM_CPU_CACHING_WC,
+				    ttm_bo_type_device, bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS);
+	if (IS_ERR(vram_bo)) {
+		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
+			   PTR_ERR(vram_bo));
+		return;
+	}
+
+	xe_bo_lock(vram_bo, false);
+	ret = xe_bo_validate(vram_bo, NULL, false);
+	if (ret) {
+		KUNIT_FAIL(test, "Failed to validate vram bo for: %li\n", ret);
+		goto free_vrambo;
+	}
+
+	ret = xe_bo_vmap(vram_bo);
+	if (ret) {
+		KUNIT_FAIL(test, "Failed to vmap vram bo: %li\n", ret);
+		goto free_vrambo;
+	}
+
+	test_clear(xe, tile, sys_bo, vram_bo, test);
+	test_migrate(xe, tile, sys_bo, vram_bo, ccs_bo, test);
+	xe_bo_unlock(vram_bo);
+
+	xe_bo_lock(vram_bo, false);
+	xe_bo_vunmap(vram_bo);
+	xe_bo_unlock(vram_bo);
+
+	xe_bo_lock(ccs_bo, false);
+	xe_bo_vunmap(ccs_bo);
+	xe_bo_unlock(ccs_bo);
+
+	xe_bo_lock(sys_bo, false);
+	xe_bo_vunmap(sys_bo);
+	xe_bo_unlock(sys_bo);
+free_vrambo:
+	xe_bo_put(vram_bo);
+free_ccsbo:
+	xe_bo_put(ccs_bo);
+free_sysbo:
+	xe_bo_put(sys_bo);
+}
+
+static int validate_ccs_test_run_device(struct xe_device *xe)
+{
+	struct kunit *test = kunit_get_current_test();
+	struct xe_tile *tile;
+	int id;
+
+	if (!xe_device_has_flat_ccs(xe)) {
+		kunit_skip(test, "non-flat-ccs device\n");
+		return 0;
+	}
+
+	if (!(GRAPHICS_VER(xe) >= 20 && IS_DGFX(xe))) {
+		kunit_skip(test, "non-xe2 discrete device\n");
+		return 0;
+	}
+
+	xe_pm_runtime_get(xe);
+
+	for_each_tile(tile, xe, id)
+		validate_ccs_test_run_tile(xe, tile, test);
+
+	xe_pm_runtime_put(xe);
+
+	return 0;
+}
+
+static void xe_validate_ccs_kunit(struct kunit *test)
+{
+	struct xe_device *xe =
test->priv; 759 + 760 + validate_ccs_test_run_device(xe); 761 + } 762 + 763 + static struct kunit_case xe_migrate_tests[] = { 764 + KUNIT_CASE_PARAM(xe_migrate_sanity_kunit, xe_pci_live_device_gen_param), 765 + KUNIT_CASE_PARAM(xe_validate_ccs_kunit, xe_pci_live_device_gen_param), 766 + {} 767 + }; 768 + 769 + VISIBLE_IF_KUNIT 770 + struct kunit_suite xe_migrate_test_suite = { 771 + .name = "xe_migrate", 772 + .test_cases = xe_migrate_tests, 773 + .init = xe_kunit_helper_xe_device_live_test_init, 774 + }; 775 + EXPORT_SYMBOL_IF_KUNIT(xe_migrate_test_suite);
drivers/gpu/drm/xe/tests/xe_migrate_test.c (-20)
···
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright © 2022 Intel Corporation
- */
-
-#include "xe_migrate_test.h"
-
-#include <kunit/test.h>
-
-static struct kunit_case xe_migrate_tests[] = {
-	KUNIT_CASE(xe_migrate_sanity_kunit),
-	{}
-};
-
-static struct kunit_suite xe_migrate_test_suite = {
-	.name = "xe_migrate",
-	.test_cases = xe_migrate_tests,
-};
-
-kunit_test_suite(xe_migrate_test_suite);
drivers/gpu/drm/xe/tests/xe_migrate_test.h (-13)
···
-/* SPDX-License-Identifier: GPL-2.0 AND MIT */
-/*
- * Copyright © 2023 Intel Corporation
- */
-
-#ifndef _XE_MIGRATE_TEST_H_
-#define _XE_MIGRATE_TEST_H_
-
-struct kunit;
-
-void xe_migrate_sanity_kunit(struct kunit *test);
-
-#endif
drivers/gpu/drm/xe/tests/xe_mocs.c (+33 -11)
···
 #include <kunit/test.h>
 #include <kunit/visibility.h>

-#include "tests/xe_mocs_test.h"
+#include "tests/xe_kunit_helpers.h"
 #include "tests/xe_pci_test.h"
 #include "tests/xe_test.h"
···
 static int live_mocs_init(struct live_mocs *arg, struct xe_gt *gt)
 {
 	unsigned int flags;
-	struct kunit *test = xe_cur_kunit();
+	struct kunit *test = kunit_get_current_test();

 	memset(arg, 0, sizeof(*arg));
···
 static void read_l3cc_table(struct xe_gt *gt,
 			    const struct xe_mocs_info *info)
 {
-	struct kunit *test = xe_cur_kunit();
+	struct kunit *test = kunit_get_current_test();
 	u32 l3cc, l3cc_expected;
 	unsigned int i;
 	u32 reg_val;
···
 static void read_mocs_table(struct xe_gt *gt,
 			    const struct xe_mocs_info *info)
 {
-	struct kunit *test = xe_cur_kunit();
+	struct kunit *test = kunit_get_current_test();
 	u32 mocs, mocs_expected;
 	unsigned int i;
 	u32 reg_val;
···
 	return 0;
 }

-void xe_live_mocs_kernel_kunit(struct kunit *test)
+static void xe_live_mocs_kernel_kunit(struct kunit *test)
 {
-	xe_call_for_each_device(mocs_kernel_test_run_device);
+	struct xe_device *xe = test->priv;
+
+	if (IS_SRIOV_VF(xe))
+		kunit_skip(test, "this test is N/A for VF");
+
+	mocs_kernel_test_run_device(xe);
 }
-EXPORT_SYMBOL_IF_KUNIT(xe_live_mocs_kernel_kunit);

 static int mocs_reset_test_run_device(struct xe_device *xe)
 {
···
 	struct xe_gt *gt;
 	unsigned int flags;
 	int id;
-	struct kunit *test = xe_cur_kunit();
+	struct kunit *test = kunit_get_current_test();

 	xe_pm_runtime_get(xe);
···
 	return 0;
 }

-void xe_live_mocs_reset_kunit(struct kunit *test)
+static void xe_live_mocs_reset_kunit(struct kunit *test)
 {
-	xe_call_for_each_device(mocs_reset_test_run_device);
+	struct xe_device *xe = test->priv;
+
+	if (IS_SRIOV_VF(xe))
+		kunit_skip(test, "this test is N/A for VF");
+
+	mocs_reset_test_run_device(xe);
 }
-EXPORT_SYMBOL_IF_KUNIT(xe_live_mocs_reset_kunit);
+
+static struct kunit_case xe_mocs_tests[] = {
+	KUNIT_CASE_PARAM(xe_live_mocs_kernel_kunit, xe_pci_live_device_gen_param),
+	KUNIT_CASE_PARAM(xe_live_mocs_reset_kunit, xe_pci_live_device_gen_param),
+	{}
+};
+
+VISIBLE_IF_KUNIT
+struct kunit_suite xe_mocs_test_suite = {
+	.name = "xe_mocs",
+	.test_cases = xe_mocs_tests,
+	.init = xe_kunit_helper_xe_device_live_test_init,
+};
+EXPORT_SYMBOL_IF_KUNIT(xe_mocs_test_suite);
drivers/gpu/drm/xe/tests/xe_mocs_test.c (-21)
···
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright © 2022 Intel Corporation
- */
-
-#include "xe_mocs_test.h"
-
-#include <kunit/test.h>
-
-static struct kunit_case xe_mocs_tests[] = {
-	KUNIT_CASE(xe_live_mocs_kernel_kunit),
-	KUNIT_CASE(xe_live_mocs_reset_kunit),
-	{}
-};
-
-static struct kunit_suite xe_mocs_test_suite = {
-	.name = "xe_mocs",
-	.test_cases = xe_mocs_tests,
-};
-
-kunit_test_suite(xe_mocs_test_suite);
drivers/gpu/drm/xe/tests/xe_mocs_test.h (-14)
···
-/* SPDX-License-Identifier: GPL-2.0 AND MIT */
-/*
- * Copyright © 2023 Intel Corporation
- */
-
-#ifndef _XE_MOCS_TEST_H_
-#define _XE_MOCS_TEST_H_
-
-struct kunit;
-
-void xe_live_mocs_kernel_kunit(struct kunit *test);
-void xe_live_mocs_reset_kunit(struct kunit *test);
-
-#endif
drivers/gpu/drm/xe/tests/xe_pci.c (+30)
···
 	return 0;
 }
 EXPORT_SYMBOL_IF_KUNIT(xe_pci_fake_device_init);
+
+/**
+ * xe_pci_live_device_gen_param - Helper to iterate Xe devices as KUnit parameters
+ * @prev: the previously returned value, or NULL for the first iteration
+ * @desc: the buffer for a parameter name
+ *
+ * Iterates over the available Xe devices on the system. Uses the device name
+ * as the parameter name.
+ *
+ * To be used only as a parameter generator function in &KUNIT_CASE_PARAM.
+ *
+ * Return: pointer to the next &struct xe_device ready to be used as a parameter
+ * or NULL if there are no more Xe devices on the system.
+ */
+const void *xe_pci_live_device_gen_param(const void *prev, char *desc)
+{
+	const struct xe_device *xe = prev;
+	struct device *dev = xe ? xe->drm.dev : NULL;
+	struct device *next;
+
+	next = driver_find_next_device(&xe_pci_driver.driver, dev);
+	if (dev)
+		put_device(dev);
+	if (!next)
+		return NULL;
+
+	snprintf(desc, KUNIT_PARAM_DESC_SIZE, "%s", dev_name(next));
+	return pdev_to_xe_device(to_pci_dev(next));
+}
+EXPORT_SYMBOL_IF_KUNIT(xe_pci_live_device_gen_param);
drivers/gpu/drm/xe/tests/xe_pci_test.c (+2 -2)
···
 static void check_graphics_ip(const struct xe_graphics_desc *graphics)
 {
-	struct kunit *test = xe_cur_kunit();
+	struct kunit *test = kunit_get_current_test();
 	u64 mask = graphics->hw_engine_mask;

 	/* RCS, CCS, and BCS engines are allowed on the graphics IP */
···
 static void check_media_ip(const struct xe_media_desc *media)
 {
-	struct kunit *test = xe_cur_kunit();
+	struct kunit *test = kunit_get_current_test();
 	u64 mask = media->hw_engine_mask;

 	/* VCS, VECS and GSCCS engines are allowed on the media IP */
drivers/gpu/drm/xe/tests/xe_pci_test.h (+2)
···
 int xe_pci_fake_device_init(struct xe_device *xe);

+const void *xe_pci_live_device_gen_param(const void *prev, char *desc);
+
 #endif
drivers/gpu/drm/xe/tests/xe_rtp_test.c (+197 -22)
···
 #undef XE_REG_MCR
 #define XE_REG_MCR(...)	XE_REG(__VA_ARGS__, .mcr = 1)

-struct rtp_test_case {
+struct rtp_to_sr_test_case {
 	const char *name;
 	struct xe_reg expected_reg;
 	u32 expected_set_bits;
 	u32 expected_clr_bits;
-	unsigned long expected_count;
+	unsigned long expected_count_sr_entries;
 	unsigned int expected_sr_errors;
+	unsigned long expected_active;
 	const struct xe_rtp_entry_sr *entries;
+};
+
+struct rtp_test_case {
+	const char *name;
+	unsigned long expected_active;
+	const struct xe_rtp_entry *entries;
 };

 static bool match_yes(const struct xe_gt *gt, const struct xe_hw_engine *hwe)
···
 	return false;
 }

-static const struct rtp_test_case cases[] = {
+static const struct rtp_to_sr_test_case rtp_to_sr_cases[] = {
 	{
 		.name = "coalesce-same-reg",
 		.expected_reg = REGULAR_REG1,
 		.expected_set_bits = REG_BIT(0) | REG_BIT(1),
 		.expected_clr_bits = REG_BIT(0) | REG_BIT(1),
-		.expected_count = 1,
+		.expected_active = BIT(0) | BIT(1),
+		.expected_count_sr_entries = 1,
 		/* Different bits on the same register: create a single entry */
 		.entries = (const struct xe_rtp_entry_sr[]) {
 			{ XE_RTP_NAME("basic-1"),
···
 		.expected_reg = REGULAR_REG1,
 		.expected_set_bits = REG_BIT(0),
 		.expected_clr_bits = REG_BIT(0),
-		.expected_count = 1,
+		.expected_active = BIT(0),
+		.expected_count_sr_entries = 1,
 		/* Don't coalesce second entry since rules don't match */
 		.entries = (const struct xe_rtp_entry_sr[]) {
 			{ XE_RTP_NAME("basic-1"),
···
 		.expected_reg = REGULAR_REG1,
 		.expected_set_bits = REG_BIT(0) | REG_BIT(1) | REG_BIT(2),
 		.expected_clr_bits = REG_BIT(0) | REG_BIT(1) | REG_BIT(2),
-		.expected_count = 1,
+		.expected_active = BIT(0) | BIT(1) | BIT(2),
+		.expected_count_sr_entries = 1,
 		.entries = (const struct xe_rtp_entry_sr[]) {
 			{ XE_RTP_NAME("first"),
 			  XE_RTP_RULES(FUNC(match_yes), OR, FUNC(match_no)),
···
 	{
 		.name = "match-or-xfail",
 		.expected_reg = REGULAR_REG1,
-		.expected_count = 0,
+		.expected_count_sr_entries = 0,
 		.entries = (const struct xe_rtp_entry_sr[]) {
 			{ XE_RTP_NAME("leading-or"),
 			  XE_RTP_RULES(OR, FUNC(match_yes)),
···
 		.expected_reg = REGULAR_REG1,
 		.expected_set_bits = REG_BIT(0),
 		.expected_clr_bits = REG_BIT(0),
-		.expected_count = 1,
+		.expected_active = BIT(0),
+		.expected_count_sr_entries = 1,
 		/* Don't coalesce second entry due to one of the rules */
 		.entries = (const struct xe_rtp_entry_sr[]) {
 			{ XE_RTP_NAME("basic-1"),
···
 		.expected_reg = REGULAR_REG1,
 		.expected_set_bits = REG_BIT(0),
 		.expected_clr_bits = REG_BIT(0),
-		.expected_count = 2,
+		.expected_active = BIT(0) | BIT(1),
+		.expected_count_sr_entries = 2,
 		/* Same bits on different registers are not coalesced */
 		.entries = (const struct xe_rtp_entry_sr[]) {
 			{ XE_RTP_NAME("basic-1"),
···
 		.expected_reg = REGULAR_REG1,
 		.expected_set_bits = REG_BIT(0),
 		.expected_clr_bits = REG_BIT(1) | REG_BIT(0),
-		.expected_count = 1,
+		.expected_active = BIT(0) | BIT(1),
+		.expected_count_sr_entries = 1,
 		/* Check clr vs set actions on different bits */
 		.entries = (const struct xe_rtp_entry_sr[]) {
 			{ XE_RTP_NAME("basic-1"),
···
 		.expected_reg = REGULAR_REG1,
 		.expected_set_bits = TEMP_FIELD,
 		.expected_clr_bits = TEMP_MASK,
-		.expected_count = 1,
+		.expected_active = BIT(0),
+		.expected_count_sr_entries = 1,
 		/* Check FIELD_SET works */
 		.entries = (const struct xe_rtp_entry_sr[]) {
 			{ XE_RTP_NAME("basic-1"),
···
 		.expected_reg = REGULAR_REG1,
 		.expected_set_bits = REG_BIT(0),
 		.expected_clr_bits = REG_BIT(0),
-		.expected_count = 1,
+		.expected_active = BIT(0) | BIT(1),
+		.expected_count_sr_entries = 1,
 		.expected_sr_errors = 1,
 		.entries = (const struct xe_rtp_entry_sr[]) {
 			{ XE_RTP_NAME("basic-1"),
···
 		.expected_reg = REGULAR_REG1,
 		.expected_set_bits = REG_BIT(0),
 		.expected_clr_bits = REG_BIT(0),
-		.expected_count = 1,
+		.expected_active = BIT(0) | BIT(1),
+		.expected_count_sr_entries = 1,
 		.expected_sr_errors = 1,
 		.entries = (const struct xe_rtp_entry_sr[]) {
 			{ XE_RTP_NAME("basic-1"),
···
 		.expected_reg = REGULAR_REG1,
 		.expected_set_bits = REG_BIT(0),
 		.expected_clr_bits = REG_BIT(0),
-		.expected_count = 1,
+		.expected_active = BIT(0) | BIT(1) | BIT(2),
+		.expected_count_sr_entries = 1,
 		.expected_sr_errors = 2,
 		.entries = (const struct xe_rtp_entry_sr[]) {
 			{ XE_RTP_NAME("basic-1"),
···
 	},
 };

-static void xe_rtp_process_tests(struct kunit *test)
+static void xe_rtp_process_to_sr_tests(struct kunit *test)
 {
-	const struct rtp_test_case *param = test->param_value;
+	const struct rtp_to_sr_test_case *param = test->param_value;
 	struct xe_device *xe = test->priv;
 	struct xe_gt *gt = xe_device_get_root_tile(xe)->primary_gt;
 	struct xe_reg_sr *reg_sr = &gt->reg_sr;
 	const struct xe_reg_sr_entry *sre, *sr_entry = NULL;
 	struct xe_rtp_process_ctx ctx = XE_RTP_PROCESS_CTX_INITIALIZER(gt);
-	unsigned long idx, count = 0;
+	unsigned long idx, count_sr_entries = 0, count_rtp_entries = 0, active = 0;

-	xe_reg_sr_init(reg_sr, "xe_rtp_tests", xe);
+	xe_reg_sr_init(reg_sr, "xe_rtp_to_sr_tests", xe);
+
+	while (param->entries[count_rtp_entries].rules)
+		count_rtp_entries++;
+
+	xe_rtp_process_ctx_enable_active_tracking(&ctx, &active, count_rtp_entries);
 	xe_rtp_process_to_sr(&ctx, param->entries, reg_sr);

 	xa_for_each(&reg_sr->xa, idx, sre) {
 		if (idx == param->expected_reg.addr)
 			sr_entry = sre;

-		count++;
+		count_sr_entries++;
 	}

-	KUNIT_EXPECT_EQ(test, count, param->expected_count);
-	if (count) {
+	KUNIT_EXPECT_EQ(test, active, param->expected_active);
+
+	KUNIT_EXPECT_EQ(test, count_sr_entries, param->expected_count_sr_entries);
+	if (count_sr_entries) {
 		KUNIT_EXPECT_EQ(test, sr_entry->clr_bits, param->expected_clr_bits);
 		KUNIT_EXPECT_EQ(test, sr_entry->set_bits, param->expected_set_bits);
 		KUNIT_EXPECT_EQ(test, sr_entry->reg.raw, param->expected_reg.raw);
···
 	KUNIT_EXPECT_EQ(test, reg_sr->errors, param->expected_sr_errors);
 }

+/*
+ * Entries below follow the logic used with xe_wa_oob.rules:
+ * 1) Entries with empty name are OR'ed: all entries marked active since the
+ *    last entry with a name
+ * 2) There are no action associated with rules
+ */
+static const struct rtp_test_case rtp_cases[] = {
+	{
+		.name = "active1",
+		.expected_active = BIT(0),
+		.entries = (const struct xe_rtp_entry[]) {
+			{ XE_RTP_NAME("r1"),
+			  XE_RTP_RULES(FUNC(match_yes)),
+			},
+			{}
+		},
+	},
+	{
+		.name = "active2",
+		.expected_active = BIT(0) | BIT(1),
+		.entries = (const struct xe_rtp_entry[]) {
+			{ XE_RTP_NAME("r1"),
+			  XE_RTP_RULES(FUNC(match_yes)),
+			},
+			{ XE_RTP_NAME("r2"),
+			  XE_RTP_RULES(FUNC(match_yes)),
+			},
+			{}
+		},
+	},
+	{
+		.name = "active-inactive",
+		.expected_active = BIT(0),
+		.entries = (const struct xe_rtp_entry[]) {
+			{ XE_RTP_NAME("r1"),
+			  XE_RTP_RULES(FUNC(match_yes)),
+			},
+			{ XE_RTP_NAME("r2"),
+			  XE_RTP_RULES(FUNC(match_no)),
+			},
+			{}
+		},
+	},
+	{
+		.name = "inactive-active",
+		.expected_active = BIT(1),
+		.entries = (const struct xe_rtp_entry[]) {
+			{ XE_RTP_NAME("r1"),
+			  XE_RTP_RULES(FUNC(match_no)),
+			},
+			{ XE_RTP_NAME("r2"),
+			  XE_RTP_RULES(FUNC(match_yes)),
+			},
+			{}
+		},
+	},
+	{
+		.name = "inactive-1st_or_active-inactive",
+		.expected_active = BIT(1),
+		.entries = (const struct xe_rtp_entry[]) {
+			{ XE_RTP_NAME("r1"),
+			  XE_RTP_RULES(FUNC(match_no)),
+			},
+			{ XE_RTP_NAME("r2_or_conditions"),
+			  XE_RTP_RULES(FUNC(match_yes), OR,
+				       FUNC(match_no), OR,
+				       FUNC(match_no)) },
+			{ XE_RTP_NAME("r3"),
+			  XE_RTP_RULES(FUNC(match_no)),
+			},
+			{}
+		},
+	},
+	{
+		.name = "inactive-2nd_or_active-inactive",
+		.expected_active = BIT(1),
+		.entries = (const struct xe_rtp_entry[]) {
+			{ XE_RTP_NAME("r1"),
+			  XE_RTP_RULES(FUNC(match_no)),
+			},
+			{ XE_RTP_NAME("r2_or_conditions"),
+			  XE_RTP_RULES(FUNC(match_no), OR,
+				       FUNC(match_yes), OR,
+				       FUNC(match_no)) },
+			{ XE_RTP_NAME("r3"),
+			  XE_RTP_RULES(FUNC(match_no)),
+			},
+			{}
+		},
+	},
+	{
+		.name = "inactive-last_or_active-inactive",
+		.expected_active = BIT(1),
+		.entries = (const struct xe_rtp_entry[]) {
+			{ XE_RTP_NAME("r1"),
+			  XE_RTP_RULES(FUNC(match_no)),
+			},
+			{ XE_RTP_NAME("r2_or_conditions"),
+			  XE_RTP_RULES(FUNC(match_no), OR,
+				       FUNC(match_no), OR,
+				       FUNC(match_yes)) },
+			{ XE_RTP_NAME("r3"),
+			  XE_RTP_RULES(FUNC(match_no)),
+			},
+			{}
+		},
+	},
+	{
+		.name = "inactive-no_or_active-inactive",
+		.expected_active = 0,
+		.entries = (const struct xe_rtp_entry[]) {
+			{ XE_RTP_NAME("r1"),
+			  XE_RTP_RULES(FUNC(match_no)),
+			},
+			{ XE_RTP_NAME("r2_or_conditions"),
+			  XE_RTP_RULES(FUNC(match_no), OR,
+				       FUNC(match_no), OR,
+				       FUNC(match_no)) },
+			{ XE_RTP_NAME("r3"),
+			  XE_RTP_RULES(FUNC(match_no)),
+			},
+			{}
+		},
+	},
+};
+
+static void xe_rtp_process_tests(struct kunit *test)
+{
+	const struct rtp_test_case *param = test->param_value;
+	struct xe_device *xe = test->priv;
+	struct xe_gt *gt = xe_device_get_root_tile(xe)->primary_gt;
+	struct xe_rtp_process_ctx ctx = XE_RTP_PROCESS_CTX_INITIALIZER(gt);
+	unsigned long count_rtp_entries = 0, active = 0;
+
+	while (param->entries[count_rtp_entries].rules)
+		count_rtp_entries++;
+
+	xe_rtp_process_ctx_enable_active_tracking(&ctx, &active, count_rtp_entries);
+	xe_rtp_process(&ctx, param->entries);
+
+	KUNIT_EXPECT_EQ(test, active, param->expected_active);
+}
+
+static void rtp_to_sr_desc(const struct rtp_to_sr_test_case *t, char *desc)
+{
+	strscpy(desc, t->name, KUNIT_PARAM_DESC_SIZE);
+}
+
+KUNIT_ARRAY_PARAM(rtp_to_sr, rtp_to_sr_cases, rtp_to_sr_desc);
+
 static void rtp_desc(const struct rtp_test_case *t, char *desc)
 {
 	strscpy(desc, t->name, KUNIT_PARAM_DESC_SIZE);
 }

-KUNIT_ARRAY_PARAM(rtp, cases, rtp_desc);
+KUNIT_ARRAY_PARAM(rtp, rtp_cases, rtp_desc);

 static int xe_rtp_test_init(struct kunit *test)
 {
···
 }

 static struct kunit_case xe_rtp_tests[] = {
+	KUNIT_CASE_PARAM(xe_rtp_process_to_sr_tests, rtp_to_sr_gen_params),
 	KUNIT_CASE_PARAM(xe_rtp_process_tests, rtp_gen_params),
 	{}
 };
drivers/gpu/drm/xe/tests/xe_test.h (+3 -7)
···
 #include <linux/types.h>

 #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
-#include <linux/sched.h>
 #include <kunit/test.h>
+#include <kunit/test-bug.h>

 /*
  * Each test that provides a kunit private test structure, place a test id
···
 #define XE_TEST_DECLARE(x) x
 #define XE_TEST_ONLY(x) unlikely(x)
-#define XE_TEST_EXPORT
-#define xe_cur_kunit() current->kunit_test

 /**
  * xe_cur_kunit_priv - Obtain the struct xe_test_priv pointed to by
···
 {
 	struct xe_test_priv *priv;

-	if (!xe_cur_kunit())
+	if (!kunit_get_current_test())
 		return NULL;

-	priv = xe_cur_kunit()->priv;
+	priv = kunit_get_current_test()->priv;
 	return priv->id == id ? priv : NULL;
 }

···

 #define XE_TEST_DECLARE(x)
 #define XE_TEST_ONLY(x) 0
-#define XE_TEST_EXPORT static
-#define xe_cur_kunit() NULL
 #define xe_cur_kunit_priv(_id) NULL

 #endif
drivers/gpu/drm/xe/tests/xe_wa_test.c (+1)
···
 	GMDID_CASE(METEORLAKE, 1274, A0, 1300, A0),
 	GMDID_CASE(LUNARLAKE, 2004, A0, 2000, A0),
 	GMDID_CASE(LUNARLAKE, 2004, B0, 2000, A0),
+	GMDID_CASE(BATTLEMAGE, 2001, A0, 1301, A1),
 };

 static void platform_desc(const struct platform_test_case *t, char *desc)
drivers/gpu/drm/xe/xe_bo.c (+7 -6)
···
 	if (flags & (XE_BO_FLAG_VRAM_MASK | XE_BO_FLAG_STOLEN) &&
 	    !(flags & XE_BO_FLAG_IGNORE_MIN_PAGE_SIZE) &&
 	    ((xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K) ||
-	     (flags & XE_BO_NEEDS_64K))) {
-		aligned_size = ALIGN(size, SZ_64K);
-		if (type != ttm_bo_type_device)
-			size = ALIGN(size, SZ_64K);
-		flags |= XE_BO_FLAG_INTERNAL_64K;
-		alignment = SZ_64K >> PAGE_SHIFT;
+	     (flags & (XE_BO_FLAG_NEEDS_64K | XE_BO_FLAG_NEEDS_2M)))) {
+		size_t align = flags & XE_BO_FLAG_NEEDS_2M ? SZ_2M : SZ_64K;
+
+		aligned_size = ALIGN(size, align);
+		if (type != ttm_bo_type_device)
+			size = ALIGN(size, align);
+		flags |= XE_BO_FLAG_INTERNAL_64K;
+		alignment = align >> PAGE_SHIFT;
 	} else {
 		aligned_size = ALIGN(size, SZ_4K);
 		flags &= ~XE_BO_FLAG_INTERNAL_64K;
drivers/gpu/drm/xe/xe_bo.h (+3 -2)
···
 #define XE_BO_FLAG_PAGETABLE		BIT(12)
 #define XE_BO_FLAG_NEEDS_CPU_ACCESS	BIT(13)
 #define XE_BO_FLAG_NEEDS_UC		BIT(14)
-#define XE_BO_NEEDS_64K			BIT(15)
-#define XE_BO_FLAG_GGTT_INVALIDATE	BIT(16)
+#define XE_BO_FLAG_NEEDS_64K		BIT(15)
+#define XE_BO_FLAG_NEEDS_2M		BIT(16)
+#define XE_BO_FLAG_GGTT_INVALIDATE	BIT(17)
 /* this one is trigger internally only */
 #define XE_BO_FLAG_INTERNAL_TEST	BIT(30)
 #define XE_BO_FLAG_INTERNAL_64K		BIT(31)
drivers/gpu/drm/xe/xe_bo_types.h (+2)
···
 #endif
 	/** @freed: List node for delayed put. */
 	struct llist_node freed;
+	/** @update_index: Update index if PT BO */
+	int update_index;
 	/** @created: Whether the bo has passed initial creation */
 	bool created;
drivers/gpu/drm/xe/xe_devcoredump.c (+2 -8)
···
 	u32 adj_logical_mask = q->logical_mask;
 	u32 width_mask = (0x1 << q->width) - 1;
 	const char *process_name = "no process";
-	struct task_struct *task = NULL;

 	int i;
 	bool cookie;
···
 	ss->snapshot_time = ktime_get_real();
 	ss->boot_time = ktime_get_boottime();

-	if (q->vm && q->vm->xef) {
-		task = get_pid_task(q->vm->xef->drm->pid, PIDTYPE_PID);
-		if (task)
-			process_name = task->comm;
-	}
+	if (q->vm && q->vm->xef)
+		process_name = q->vm->xef->process_name;
 	strscpy(ss->process_name, process_name);
-	if (task)
-		put_task_struct(task);

 	ss->gt = q->gt;
 	INIT_WORK(&ss->work, xe_devcoredump_deferred_snap_work);
drivers/gpu/drm/xe/xe_device.c (+101 -10)
···
 #include "xe_vm.h"
 #include "xe_vram.h"
 #include "xe_wait_user_fence.h"
+#include "xe_wa.h"
+
+#include <generated/xe_wa_oob.h>

 static int xe_file_open(struct drm_device *dev, struct drm_file *file)
 {
···
 	struct xe_drm_client *client;
 	struct xe_file *xef;
 	int ret = -ENOMEM;
+	struct task_struct *task = NULL;

 	xef = kzalloc(sizeof(*xef), GFP_KERNEL);
 	if (!xef)
···
 	spin_unlock(&xe->clients.lock);

 	file->driver_priv = xef;
+	kref_init(&xef->refcount);
+
+	task = get_pid_task(rcu_access_pointer(file->pid), PIDTYPE_PID);
+	if (task) {
+		xef->process_name = kstrdup(task->comm, GFP_KERNEL);
+		xef->pid = task->pid;
+		put_task_struct(task);
+	}
+
 	return 0;
+}
+
+static void xe_file_destroy(struct kref *ref)
+{
+	struct xe_file *xef = container_of(ref, struct xe_file, refcount);
+	struct xe_device *xe = xef->xe;
+
+	xa_destroy(&xef->exec_queue.xa);
+	mutex_destroy(&xef->exec_queue.lock);
+	xa_destroy(&xef->vm.xa);
+	mutex_destroy(&xef->vm.lock);
+
+	spin_lock(&xe->clients.lock);
+	xe->clients.count--;
+	spin_unlock(&xe->clients.lock);
+
+	xe_drm_client_put(xef->client);
+	kfree(xef->process_name);
+	kfree(xef);
+}
+
+/**
+ * xe_file_get() - Take a reference to the xe file object
+ * @xef: Pointer to the xe file
+ *
+ * Anyone with a pointer to xef must take a reference to the xe file
+ * object using this call.
+ *
+ * Return: xe file pointer
+ */
+struct xe_file *xe_file_get(struct xe_file *xef)
+{
+	kref_get(&xef->refcount);
+	return xef;
+}
+
+/**
+ * xe_file_put() - Drop a reference to the xe file object
+ * @xef: Pointer to the xe file
+ *
+ * Used to drop reference to the xef object
+ */
+void xe_file_put(struct xe_file *xef)
+{
+	kref_put(&xef->refcount, xe_file_destroy);
 }

 static void xe_file_close(struct drm_device *dev, struct drm_file *file)
···
 	struct xe_vm *vm;
 	struct xe_exec_queue *q;
 	unsigned long idx;
+
+	xe_pm_runtime_get(xe);

 	/*
 	 * No need for exec_queue.lock here as there is no contention for it
···
 		xe_exec_queue_kill(q);
 		xe_exec_queue_put(q);
 	}
-	xa_destroy(&xef->exec_queue.xa);
-	mutex_destroy(&xef->exec_queue.lock);
 	mutex_lock(&xef->vm.lock);
 	xa_for_each(&xef->vm.xa, idx, vm)
 		xe_vm_close_and_put(vm);
 	mutex_unlock(&xef->vm.lock);
-	xa_destroy(&xef->vm.xa);
-	mutex_destroy(&xef->vm.lock);

-	spin_lock(&xe->clients.lock);
-	xe->clients.count--;
-	spin_unlock(&xe->clients.lock);
+	xe_file_put(xef);

-	xe_drm_client_put(xef->client);
-	kfree(xef);
+	xe_pm_runtime_put(xe);
 }

 static const struct drm_ioctl_desc xe_ioctls[] = {
···
 {
 }

+/**
+ * xe_device_wmb() - Device specific write memory barrier
+ * @xe: the &xe_device
+ *
+ * While wmb() is sufficient for a barrier if we use system memory, on discrete
+ * platforms with device memory we additionally need to issue a register write.
+ * Since it doesn't matter which register we write to, use the read-only VF_CAP
+ * register that is also marked as accessible by the VFs.
+ */
 void xe_device_wmb(struct xe_device *xe)
 {
 	struct xe_gt *gt = xe_root_mmio_gt(xe);

 	wmb();
 	if (IS_DGFX(xe))
-		xe_mmio_write32(gt, SOFTWARE_FLAGS_SPR33, 0);
+		xe_mmio_write32(gt, VF_CAP_REG, 0);
 }

 /**
···
 	if (!IS_DGFX(xe) || GRAPHICS_VER(xe) < 20)
 		return;

+	if (XE_WA(xe_root_mmio_gt(xe), 16023588340)) {
+		xe_device_l2_flush(xe);
+		return;
+	}
+
 	for_each_gt(gt, xe, id) {
 		if (xe_gt_is_media_type(gt))
 			continue;
···
 		xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
 	}
+}
+
+void xe_device_l2_flush(struct xe_device *xe)
+{
+	struct xe_gt *gt;
+	int err;
+
+	gt = xe_root_mmio_gt(xe);
+
+	if (!XE_WA(gt, 16023588340))
+		return;
+
+	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
+	if (err)
+		return;
+
+	spin_lock(&gt->global_invl_lock);
+	xe_mmio_write32(gt, XE2_GLOBAL_INVAL, 0x1);
+
+	if (xe_mmio_wait32(gt, XE2_GLOBAL_INVAL, 0x1, 0x0, 150, NULL, true))
+		xe_gt_err_once(gt, "Global invalidation timeout\n");
+	spin_unlock(&gt->global_invl_lock);
+
+	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
 }

 u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
drivers/gpu/drm/xe/xe_device.h (+9)
···
 	return pci_get_drvdata(pdev);
 }

+static inline struct xe_device *xe_device_const_cast(const struct xe_device *xe)
+{
+	return (struct xe_device *)xe;
+}
+
 static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
 {
 	return container_of(ttm, struct xe_device, ttm);
···
 u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address);

 void xe_device_td_flush(struct xe_device *xe);
+void xe_device_l2_flush(struct xe_device *xe);

 static inline bool xe_device_wedged(struct xe_device *xe)
 {
···
 }

 void xe_device_declare_wedged(struct xe_device *xe);
+
+struct xe_file *xe_file_get(struct xe_file *xef);
+void xe_file_put(struct xe_file *xef);

 #endif
drivers/gpu/drm/xe/xe_device_types.h (+30)

···
 #include "xe_sriov_types.h"
 #include "xe_step_types.h"

+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
+#define TEST_VM_OPS_ERROR
+#endif
+
 #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
 #include "soc/intel_pch.h"
 #include "intel_display_core.h"
···
 #define MEDIA_VERx100(xe) ((xe)->info.media_verx100)
 #define IS_DGFX(xe) ((xe)->info.is_dgfx)
 #define HAS_HECI_GSCFI(xe) ((xe)->info.has_heci_gscfi)
+#define HAS_HECI_CSCFI(xe) ((xe)->info.has_heci_cscfi)

 #define XE_VRAM_FLAGS_NEED64K	BIT(0)

···
 	u8 skip_pcode:1;
 	/** @info.has_heci_gscfi: device has heci gscfi */
 	u8 has_heci_gscfi:1;
+	/** @info.has_heci_cscfi: device has heci cscfi */
+	u8 has_heci_cscfi:1;
 	/** @info.skip_guc_pc: Skip GuC based PM feature init */
 	u8 skip_guc_pc:1;
 	/** @info.has_atomic_enable_pte_bit: Device has atomic enable PTE bit */
···
 		int mode;
 	} wedged;

+#ifdef TEST_VM_OPS_ERROR
+	/**
+	 * @vm_inject_error_position: inject errors at different places in VM
+	 * bind IOCTL based on this value
+	 */
+	u8 vm_inject_error_position;
+#endif
+
 	/* private: */

 #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
···

 	/** @client: drm client */
 	struct xe_drm_client *client;
+
+	/**
+	 * @process_name: process name for file handle, used to safely output
+	 * during error situations where xe file can outlive process
+	 */
+	char *process_name;
+
+	/**
+	 * @pid: pid for file handle, used to safely output during error
+	 * situations where xe file can outlive process
+	 */
+	pid_t pid;
+
+	/** @refcount: ref count of this xe file */
+	struct kref refcount;
 };

 #endif
drivers/gpu/drm/xe/xe_drm_client.c (+1 -4)

···

 	/* Accumulate all the exec queues from this client */
 	mutex_lock(&xef->exec_queue.lock);
-	xa_for_each(&xef->exec_queue.xa, i, q) {
+	xa_for_each(&xef->exec_queue.xa, i, q)
 		xe_exec_queue_update_run_ticks(q);
-		xef->run_ticks[q->class] += q->run_ticks - q->old_run_ticks;
-		q->old_run_ticks = q->run_ticks;
-	}
 	mutex_unlock(&xef->exec_queue.lock);

 	/* Get the total GPU cycles */
drivers/gpu/drm/xe/xe_exec_queue.c (+32 -1)

···
 {
 	if (q->vm)
 		xe_vm_put(q->vm);
+
+	if (q->xef)
+		xe_file_put(q->xef);
+
 	kfree(q);
 }

···
 		goto kill_exec_queue;

 	args->exec_queue_id = id;
+	q->xef = xe_file_get(xef);

 	return 0;

···
  */
 void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q)
 {
+	struct xe_file *xef;
 	struct xe_lrc *lrc;
 	u32 old_ts, new_ts;

···
 	if (!q->vm || !q->vm->xef)
 		return;

+	xef = q->vm->xef;
+
 	/*
 	 * Only sample the first LRC. For parallel submission, all of them are
 	 * scheduled together and we compensate that below by multiplying by
···
 	 */
 	lrc = q->lrc[0];
 	new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
-	q->run_ticks += (new_ts - old_ts) * q->width;
+	xef->run_ticks[q->class] += (new_ts - old_ts) * q->width;
 }

 void xe_exec_queue_kill(struct xe_exec_queue *q)
···

 	xe_exec_queue_last_fence_put(q, vm);
 	q->last_fence = dma_fence_get(fence);
+}
+
+/**
+ * xe_exec_queue_last_fence_test_dep - Test last fence dependency of queue
+ * @q: The exec queue
+ * @vm: The VM the engine does a bind or exec for
+ *
+ * Returns:
+ * -ETIME if there exists an unsignalled last fence dependency, zero otherwise.
+ */
+int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q, struct xe_vm *vm)
+{
+	struct dma_fence *fence;
+	int err = 0;
+
+	fence = xe_exec_queue_last_fence_get(q, vm);
+	if (fence) {
+		err = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) ?
+			0 : -ETIME;
+		dma_fence_put(fence);
+	}
+
+	return err;
 }
drivers/gpu/drm/xe/xe_exec_queue.h (+2)

···
 				struct xe_vm *vm);
 void xe_exec_queue_last_fence_set(struct xe_exec_queue *e, struct xe_vm *vm,
 				  struct dma_fence *fence);
+int xe_exec_queue_last_fence_test_dep(struct xe_exec_queue *q,
+				      struct xe_vm *vm);
 void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q);

 #endif
drivers/gpu/drm/xe/xe_exec_queue_types.h (+7 -6)

···
  * a kernel object.
  */
 struct xe_exec_queue {
+	/** @xef: Back pointer to xe file if this is user created exec queue */
+	struct xe_file *xef;
+
 	/** @gt: graphics tile this exec queue can submit to */
 	struct xe_gt *gt;
 	/**
···
 	 * Protected by @vm's resv. Unused if @vm == NULL.
 	 */
 	u64 tlb_flush_seqno;
-	/** @old_run_ticks: prior hw engine class run time in ticks for this exec queue */
-	u64 old_run_ticks;
-	/** @run_ticks: hw engine class run time in ticks for this exec queue */
-	u64 run_ticks;
 	/** @lrc: logical ring context for this exec queue */
 	struct xe_lrc *lrc[];
 };
···
 	int (*suspend)(struct xe_exec_queue *q);
 	/**
 	 * @suspend_wait: Wait for an exec queue to suspend executing, should be
-	 * call after suspend.
+	 * called after suspend. In the dma-fencing path this must return
+	 * within a reasonable amount of time. An -ETIME return indicates an
+	 * error waiting for suspend, resulting in the associated VM getting
+	 * killed.
 	 */
-	void (*suspend_wait)(struct xe_exec_queue *q);
+	int (*suspend_wait)(struct xe_exec_queue *q);
 	/**
 	 * @resume: Resume exec queue execution, exec queue must be in a suspended
 	 * state and dma fence returned from most recent suspend call must be
drivers/gpu/drm/xe/xe_execlist.c (+2 -1)

···
 	return 0;
 }

-static void execlist_exec_queue_suspend_wait(struct xe_exec_queue *q)
+static int execlist_exec_queue_suspend_wait(struct xe_exec_queue *q)

 {
 	/* NIY */
+	return 0;
 }

 static void execlist_exec_queue_resume(struct xe_exec_queue *q)
drivers/gpu/drm/xe/xe_gen_wa_oob.c (+12 -4)

···

 	if (name) {
 		fprintf(cheader, "\tXE_WA_OOB_%s = %u,\n", name, idx);
-		fprintf(csource, "{ XE_RTP_NAME(\"%s\"), XE_RTP_RULES(%s) },\n",
+
+		/* Close previous entry before starting a new one */
+		if (idx)
+			fprintf(csource, ") },\n");
+
+		fprintf(csource, "{ XE_RTP_NAME(\"%s\"),\n  XE_RTP_RULES(%s",
 			name, rules);
+		idx++;
 	} else {
-		fprintf(csource, "{ XE_RTP_NAME(NULL), XE_RTP_RULES(%s) },\n",
-			rules);
+		fprintf(csource, ", OR,\n\t%s", rules);
 	}

-	idx++;
 	lineno++;
 	if (!is_continuation)
 		prev_name = name;
 }

+/* Close last entry */
+if (idx)
+	fprintf(csource, ") },\n");
+
 fprintf(cheader, "\t_XE_WA_OOB_COUNT = %u\n", idx);
drivers/gpu/drm/xe/xe_gt.c (+54)

···

 #include <drm/drm_managed.h>
 #include <drm/xe_drm.h>
+
 #include <generated/xe_wa_oob.h>

 #include "instructions/xe_gfxpipe_commands.h"
···
 	gt->uc.guc.submission_state.enabled = false;
 }

+static void xe_gt_enable_host_l2_vram(struct xe_gt *gt)
+{
+	u32 reg;
+	int err;
+
+	if (!XE_WA(gt, 16023588340))
+		return;
+
+	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
+	if (WARN_ON(err))
+		return;
+
+	if (!xe_gt_is_media_type(gt)) {
+		xe_mmio_write32(gt, SCRATCH1LPFC, EN_L3_RW_CCS_CACHE_FLUSH);
+		reg = xe_mmio_read32(gt, XE2_GAMREQSTRM_CTRL);
+		reg |= CG_DIS_CNTLBUS;
+		xe_mmio_write32(gt, XE2_GAMREQSTRM_CTRL, reg);
+	}
+
+	xe_gt_mcr_multicast_write(gt, XEHPC_L3CLOS_MASK(3), 0x3);
+	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
+}
+
+static void xe_gt_disable_host_l2_vram(struct xe_gt *gt)
+{
+	u32 reg;
+	int err;
+
+	if (!XE_WA(gt, 16023588340))
+		return;
+
+	if (xe_gt_is_media_type(gt))
+		return;
+
+	err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
+	if (WARN_ON(err))
+		return;
+
+	reg = xe_mmio_read32(gt, XE2_GAMREQSTRM_CTRL);
+	reg &= ~CG_DIS_CNTLBUS;
+	xe_mmio_write32(gt, XE2_GAMREQSTRM_CTRL, reg);
+
+	xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
+}
+
 /**
  * xe_gt_remove() - Clean up the GT structures before driver removal
  * @gt: the GT object
···

 	for (i = 0; i < XE_ENGINE_CLASS_MAX; ++i)
 		xe_hw_fence_irq_finish(&gt->fence_irq[i]);
+
+	xe_gt_disable_host_l2_vram(gt);
 }

 static void gt_reset_worker(struct work_struct *w);
···

 	xe_force_wake_init_gt(gt, gt_to_fw(gt));
 	xe_pcode_init(gt);
+	spin_lock_init(&gt->global_invl_lock);

 	return 0;
 }
···

 	xe_gt_mcr_init_early(gt);
 	xe_pat_init(gt);
+	xe_gt_enable_host_l2_vram(gt);

 	err = xe_uc_init(&gt->uc);
 	if (err)
···
 		return vf_gt_restart(gt);

 	xe_pat_init(gt);
+
+	xe_gt_enable_host_l2_vram(gt);

 	xe_gt_mcr_set_implicit_defaults(gt);
 	xe_reg_sr_apply_mmio(&gt->reg_sr, gt);
···
 		goto err_force_wake;

 	xe_gt_idle_disable_pg(gt);
+
+	xe_gt_disable_host_l2_vram(gt);

 	XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
 	xe_gt_dbg(gt, "suspended\n");
drivers/gpu/drm/xe/xe_gt_sriov_pf.c (+1 -1)

···

 #include <drm/drm_managed.h>

-#include "regs/xe_sriov_regs.h"
+#include "regs/xe_regs.h"

 #include "xe_gt_sriov_pf.h"
 #include "xe_gt_sriov_pf_config.h"
drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c (+1)

···
 						 ALIGN(size, PAGE_SIZE),
 						 ttm_bo_type_kernel,
 						 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
+						 XE_BO_FLAG_NEEDS_2M |
 						 XE_BO_FLAG_PINNED);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
drivers/gpu/drm/xe/xe_gt_sriov_vf.c (+27 -1)

···

 	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));

-	return bsearch(&key, runtime->regs, runtime->regs_size, sizeof(key),
+	return bsearch(&key, runtime->regs, runtime->num_regs, sizeof(key),
 		       vf_runtime_reg_cmp);
 }

···

 	xe_gt_sriov_dbg_verbose(gt, "runtime[%#x] = %#x\n", addr, rr->value);
 	return rr->value;
+}
+
+/**
+ * xe_gt_sriov_vf_write32 - Handle a write to an inaccessible register.
+ * @gt: the &xe_gt
+ * @reg: the register to write
+ * @val: value to write
+ *
+ * This function is for VF use only.
+ * Currently it will trigger a WARN if running on debug build.
+ */
+void xe_gt_sriov_vf_write32(struct xe_gt *gt, struct xe_reg reg, u32 val)
+{
+	u32 addr = xe_mmio_adjusted_addr(gt, reg.addr);
+
+	xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt)));
+	xe_gt_assert(gt, !reg.vf);
+
+	/*
+	 * In the future, we may want to handle selected writes to inaccessible
+	 * registers in some custom way, but for now let's just log a warning
+	 * about such attempt, as likely we might be doing something wrong.
+	 */
+	xe_gt_WARN(gt, IS_ENABLED(CONFIG_DRM_XE_DEBUG),
+		   "VF is trying to write %#x to an inaccessible register %#x+%#x\n",
+		   val, reg.addr, addr - reg.addr);
 }

 /**
drivers/gpu/drm/xe/xe_gt_sriov_vf.h (+1)

···
 u16 xe_gt_sriov_vf_guc_ids(struct xe_gt *gt);
 u64 xe_gt_sriov_vf_lmem(struct xe_gt *gt);
 u32 xe_gt_sriov_vf_read32(struct xe_gt *gt, struct xe_reg reg);
+void xe_gt_sriov_vf_write32(struct xe_gt *gt, struct xe_reg reg, u32 val);

 void xe_gt_sriov_vf_print_config(struct xe_gt *gt, struct drm_printer *p);
 void xe_gt_sriov_vf_print_runtime(struct xe_gt *gt, struct drm_printer *p);
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c (+110 -93)

···
 #include "xe_guc.h"
 #include "xe_guc_ct.h"
 #include "xe_mmio.h"
+#include "xe_pm.h"
 #include "xe_sriov.h"
 #include "xe_trace.h"
 #include "regs/xe_guc_regs.h"
+
+#define FENCE_STACK_BIT		DMA_FENCE_FLAG_USER_BITS

 /*
  * TLB inval depends on pending commands in the CT queue and then the real
···
 	return hw_tlb_timeout + 2 * delay;
 }

+static void
+__invalidation_fence_signal(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence)
+{
+	bool stack = test_bit(FENCE_STACK_BIT, &fence->base.flags);
+
+	trace_xe_gt_tlb_invalidation_fence_signal(xe, fence);
+	xe_gt_tlb_invalidation_fence_fini(fence);
+	dma_fence_signal(&fence->base);
+	if (!stack)
+		dma_fence_put(&fence->base);
+}
+
+static void
+invalidation_fence_signal(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence)
+{
+	list_del(&fence->link);
+	__invalidation_fence_signal(xe, fence);
+}

 static void xe_gt_tlb_fence_timeout(struct work_struct *work)
 {
···
 		xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d",
 			  fence->seqno, gt->tlb_invalidation.seqno_recv);

-		list_del(&fence->link);
 		fence->base.error = -ETIME;
-		dma_fence_signal(&fence->base);
-		dma_fence_put(&fence->base);
+		invalidation_fence_signal(xe, fence);
 	}
 	if (!list_empty(&gt->tlb_invalidation.pending_fences))
 		queue_delayed_work(system_wq,
···
 	return 0;
 }

-static void
-__invalidation_fence_signal(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence)
-{
-	trace_xe_gt_tlb_invalidation_fence_signal(xe, fence);
-	dma_fence_signal(&fence->base);
-	dma_fence_put(&fence->base);
-}
-
-static void
-invalidation_fence_signal(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence)
-{
-	list_del(&fence->link);
-	__invalidation_fence_signal(xe, fence);
-}
-
 /**
  * xe_gt_tlb_invalidation_reset - Initialize GT TLB invalidation reset
  * @gt: graphics tile
···
 void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
 {
 	struct xe_gt_tlb_invalidation_fence *fence, *next;
-	struct xe_guc *guc = &gt->uc.guc;
 	int pending_seqno;

 	/*
···
 	else
 		pending_seqno = gt->tlb_invalidation.seqno - 1;
 	WRITE_ONCE(gt->tlb_invalidation.seqno_recv, pending_seqno);
-	wake_up_all(&guc->ct.wq);

 	list_for_each_entry_safe(fence, next,
 				 &gt->tlb_invalidation.pending_fences, link)
···
 	int seqno;
 	int ret;

+	xe_gt_assert(gt, fence);
+
 	/*
 	 * XXX: The seqno algorithm relies on TLB invalidation being processed
 	 * in order which they currently are, if that changes the algorithm will
···

 	mutex_lock(&guc->ct.lock);
 	seqno = gt->tlb_invalidation.seqno;
-	if (fence) {
-		fence->seqno = seqno;
-		trace_xe_gt_tlb_invalidation_fence_send(xe, fence);
-	}
+	fence->seqno = seqno;
+	trace_xe_gt_tlb_invalidation_fence_send(xe, fence);
 	action[1] = seqno;
 	ret = xe_guc_ct_send_locked(&guc->ct, action, len,
 				    G2H_LEN_DW_TLB_INVALIDATE, 1);
-	if (!ret && fence) {
+	if (!ret) {
 		spin_lock_irq(&gt->tlb_invalidation.pending_lock);
 		/*
 		 * We haven't actually published the TLB fence as per
···
 				tlb_timeout_jiffies(gt));
 		}
 		spin_unlock_irq(&gt->tlb_invalidation.pending_lock);
-	} else if (ret < 0 && fence) {
+	} else if (ret < 0) {
 		__invalidation_fence_signal(xe, fence);
 	}
 	if (!ret) {
···
 			TLB_INVALIDATION_SEQNO_MAX;
 		if (!gt->tlb_invalidation.seqno)
 			gt->tlb_invalidation.seqno = 1;
-		ret = seqno;
 	}
 	mutex_unlock(&guc->ct.lock);

···
 /**
  * xe_gt_tlb_invalidation_guc - Issue a TLB invalidation on this GT for the GuC
  * @gt: graphics tile
+ * @fence: invalidation fence which will be signaled on TLB invalidation
+ * completion
  *
  * Issue a TLB invalidation for the GuC. Completion of TLB is asynchronous and
- * caller can use seqno + xe_gt_tlb_invalidation_wait to wait for completion.
+ * caller can use the invalidation fence to wait for completion.
  *
- * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success,
- * negative error code on error.
+ * Return: 0 on success, negative error code on error
  */
-static int xe_gt_tlb_invalidation_guc(struct xe_gt *gt)
+static int xe_gt_tlb_invalidation_guc(struct xe_gt *gt,
+				      struct xe_gt_tlb_invalidation_fence *fence)
 {
 	u32 action[] = {
 		XE_GUC_ACTION_TLB_INVALIDATION,
···
 		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
 	};

-	return send_tlb_invalidation(&gt->uc.guc, NULL, action,
+	return send_tlb_invalidation(&gt->uc.guc, fence, action,
 				     ARRAY_SIZE(action));
 }

···

 	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
 	    gt->uc.guc.submission_state.enabled) {
-		int seqno;
+		struct xe_gt_tlb_invalidation_fence fence;
+		int ret;

-		seqno = xe_gt_tlb_invalidation_guc(gt);
-		if (seqno <= 0)
-			return seqno;
+		xe_gt_tlb_invalidation_fence_init(gt, &fence, true);
+		ret = xe_gt_tlb_invalidation_guc(gt, &fence);
+		if (ret < 0) {
+			xe_gt_tlb_invalidation_fence_fini(&fence);
+			return ret;
+		}

-		xe_gt_tlb_invalidation_wait(gt, seqno);
+		xe_gt_tlb_invalidation_fence_wait(&fence);
 	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
 		if (IS_SRIOV_VF(xe))
 			return 0;
···
  *
  * @gt: graphics tile
  * @fence: invalidation fence which will be signaled on TLB invalidation
- * completion, can be NULL
+ * completion
  * @start: start address
  * @end: end address
  * @asid: address space id
  *
  * Issue a range based TLB invalidation if supported, if not fallback to a full
- * TLB invalidation. Completion of TLB is asynchronous and caller can either use
- * the invalidation fence or seqno + xe_gt_tlb_invalidation_wait to wait for
- * completion.
+ * TLB invalidation. Completion of TLB is asynchronous and caller can use
+ * the invalidation fence to wait for completion.
  *
- * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success,
- * negative error code on error.
+ * Return: Negative error code on error, 0 on success
  */
 int xe_gt_tlb_invalidation_range(struct xe_gt *gt,
 				 struct xe_gt_tlb_invalidation_fence *fence,
···
 	u32 action[MAX_TLB_INVALIDATION_LEN];
 	int len = 0;

+	xe_gt_assert(gt, fence);
+
 	/* Execlists not supported */
 	if (gt_to_xe(gt)->info.force_execlist) {
-		if (fence)
-			__invalidation_fence_signal(xe, fence);
-
+		__invalidation_fence_signal(xe, fence);
 		return 0;
 	}

···
  * @vma: VMA to invalidate
  *
  * Issue a range based TLB invalidation if supported, if not fallback to a full
- * TLB invalidation. Completion of TLB is asynchronous and caller can either use
- * the invalidation fence or seqno + xe_gt_tlb_invalidation_wait to wait for
- * completion.
+ * TLB invalidation. Completion of TLB is asynchronous and caller can use
+ * the invalidation fence to wait for completion.
  *
- * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success,
- * negative error code on error.
+ * Return: Negative error code on error, 0 on success
  */
 int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 			       struct xe_gt_tlb_invalidation_fence *fence,
···
 	return xe_gt_tlb_invalidation_range(gt, fence, xe_vma_start(vma),
 					    xe_vma_end(vma),
 					    xe_vma_vm(vma)->usm.asid);
-}
-
-/**
- * xe_gt_tlb_invalidation_wait - Wait for TLB to complete
- * @gt: graphics tile
- * @seqno: seqno to wait which was returned from xe_gt_tlb_invalidation
- *
- * Wait for tlb_timeout_jiffies() for a TLB invalidation to complete.
- *
- * Return: 0 on success, -ETIME on TLB invalidation timeout
- */
-int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
-{
-	struct xe_guc *guc = &gt->uc.guc;
-	int ret;
-
-	/* Execlists not supported */
-	if (gt_to_xe(gt)->info.force_execlist)
-		return 0;
-
-	/*
-	 * XXX: See above, this algorithm only works if seqno are always in
-	 * order
-	 */
-	ret = wait_event_timeout(guc->ct.wq,
-				 tlb_invalidation_seqno_past(gt, seqno),
-				 tlb_timeout_jiffies(gt));
-	if (!ret) {
-		struct drm_printer p = xe_gt_err_printer(gt);
-
-		xe_gt_err(gt, "TLB invalidation time'd out, seqno=%d, recv=%d\n",
-			  seqno, gt->tlb_invalidation.seqno_recv);
-		xe_guc_ct_print(&guc->ct, &p, true);
-		return -ETIME;
-	}
-
-	return 0;
 }

 /**
···
 		return 0;
 	}

-	/*
-	 * wake_up_all() and wait_event_timeout() already have the correct
-	 * barriers.
-	 */
 	WRITE_ONCE(gt->tlb_invalidation.seqno_recv, msg[0]);
-	wake_up_all(&guc->ct.wq);

 	list_for_each_entry_safe(fence, next,
 				 &gt->tlb_invalidation.pending_fences, link) {
···
 	spin_unlock_irqrestore(&gt->tlb_invalidation.pending_lock, flags);

 	return 0;
+}
+
+static const char *
+invalidation_fence_get_driver_name(struct dma_fence *dma_fence)
+{
+	return "xe";
+}
+
+static const char *
+invalidation_fence_get_timeline_name(struct dma_fence *dma_fence)
+{
+	return "invalidation_fence";
+}
+
+static const struct dma_fence_ops invalidation_fence_ops = {
+	.get_driver_name = invalidation_fence_get_driver_name,
+	.get_timeline_name = invalidation_fence_get_timeline_name,
+};
+
+/**
+ * xe_gt_tlb_invalidation_fence_init - Initialize TLB invalidation fence
+ * @gt: GT
+ * @fence: TLB invalidation fence to initialize
+ * @stack: fence is stack variable
+ *
+ * Initialize TLB invalidation fence for use. xe_gt_tlb_invalidation_fence_fini
+ * must be called if fence is not signaled.
+ */
+void xe_gt_tlb_invalidation_fence_init(struct xe_gt *gt,
+				       struct xe_gt_tlb_invalidation_fence *fence,
+				       bool stack)
+{
+	xe_pm_runtime_get_noresume(gt_to_xe(gt));
+
+	spin_lock_irq(&gt->tlb_invalidation.lock);
+	dma_fence_init(&fence->base, &invalidation_fence_ops,
+		       &gt->tlb_invalidation.lock,
+		       dma_fence_context_alloc(1), 1);
+	spin_unlock_irq(&gt->tlb_invalidation.lock);
+	INIT_LIST_HEAD(&fence->link);
+	if (stack)
+		set_bit(FENCE_STACK_BIT, &fence->base.flags);
+	else
+		dma_fence_get(&fence->base);
+	fence->gt = gt;
+}
+
+/**
+ * xe_gt_tlb_invalidation_fence_fini - Finalize TLB invalidation fence
+ * @fence: TLB invalidation fence to finalize
+ *
+ * Drop PM ref which fence took during init.
+ */
+void xe_gt_tlb_invalidation_fence_fini(struct xe_gt_tlb_invalidation_fence *fence)
+{
+	xe_pm_runtime_put(gt_to_xe(fence->gt));
 }
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h (+11 -1)

···
 int xe_gt_tlb_invalidation_range(struct xe_gt *gt,
 				 struct xe_gt_tlb_invalidation_fence *fence,
 				 u64 start, u64 end, u32 asid);
-int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno);
 int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
+
+void xe_gt_tlb_invalidation_fence_init(struct xe_gt *gt,
+				       struct xe_gt_tlb_invalidation_fence *fence,
+				       bool stack);
+void xe_gt_tlb_invalidation_fence_fini(struct xe_gt_tlb_invalidation_fence *fence);
+
+static inline void
+xe_gt_tlb_invalidation_fence_wait(struct xe_gt_tlb_invalidation_fence *fence)
+{
+	dma_fence_wait(&fence->base, false);
+}

 #endif	/* _XE_GT_TLB_INVALIDATION_ */
drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h (+4)

···

 #include <linux/dma-fence.h>

+struct xe_gt;
+
 /**
  * struct xe_gt_tlb_invalidation_fence - XE GT TLB invalidation fence
  *
···
 struct xe_gt_tlb_invalidation_fence {
 	/** @base: dma fence base */
 	struct dma_fence base;
+	/** @gt: GT which fence belongs to */
+	struct xe_gt *gt;
 	/** @link: link into list of pending tlb fences */
 	struct list_head link;
 	/** @seqno: seqno of TLB invalidation to signal fence one */
drivers/gpu/drm/xe/xe_gt_topology.c (+22 -5)

···
 #include "xe_gt_topology.h"

 #include <linux/bitmap.h>
+#include <linux/compiler.h>

 #include "regs/xe_gt_regs.h"
 #include "xe_assert.h"
···
 }

 static void
-load_eu_mask(struct xe_gt *gt, xe_eu_mask_t mask)
+load_eu_mask(struct xe_gt *gt, xe_eu_mask_t mask, enum xe_gt_eu_type *eu_type)
 {
 	struct xe_device *xe = gt_to_xe(gt);
 	u32 reg_val = xe_mmio_read32(gt, XELP_EU_ENABLE);
···
 	if (GRAPHICS_VERx100(xe) < 1250)
 		reg_val = ~reg_val & XELP_EU_MASK;

-	/* On PVC, one bit = one EU */
-	if (GRAPHICS_VERx100(xe) == 1260) {
+	if (GRAPHICS_VERx100(xe) == 1260 || GRAPHICS_VER(xe) >= 20) {
+		/* SIMD16 EUs, one bit == one EU */
+		*eu_type = XE_GT_EU_TYPE_SIMD16;
 		val = reg_val;
 	} else {
-		/* All other platforms, one bit = 2 EU */
+		/* SIMD8 EUs, one bit == 2 EU */
+		*eu_type = XE_GT_EU_TYPE_SIMD8;
 		for (i = 0; i < fls(reg_val); i++)
 			if (reg_val & BIT(i))
 				val |= 0x3 << 2 * i;
···
 		      XEHP_GT_COMPUTE_DSS_ENABLE,
 		      XEHPC_GT_COMPUTE_DSS_ENABLE_EXT,
 		      XE2_GT_COMPUTE_DSS_2);
-	load_eu_mask(gt, gt->fuse_topo.eu_mask_per_dss);
+	load_eu_mask(gt, gt->fuse_topo.eu_mask_per_dss, &gt->fuse_topo.eu_type);
 	load_l3_bank_mask(gt, gt->fuse_topo.l3_bank_mask);

 	p = drm_dbg_printer(&gt_to_xe(gt)->drm, DRM_UT_DRIVER, "GT topology");

 	xe_gt_topology_dump(gt, &p);
+}
+
+static const char *eu_type_to_str(enum xe_gt_eu_type eu_type)
+{
+	switch (eu_type) {
+	case XE_GT_EU_TYPE_SIMD16:
+		return "simd16";
+	case XE_GT_EU_TYPE_SIMD8:
+		return "simd8";
+	}
+
+	return NULL;
 }

 void
···

 	drm_printf(p, "EU mask per DSS:     %*pb\n", XE_MAX_EU_FUSE_BITS,
 		   gt->fuse_topo.eu_mask_per_dss);
+	drm_printf(p, "EU type:             %s\n",
+		   eu_type_to_str(gt->fuse_topo.eu_type));

 	drm_printf(p, "L3 bank mask:        %*pb\n", XE_MAX_L3_BANK_MASK_BITS,
 		   gt->fuse_topo.l3_bank_mask);
drivers/gpu/drm/xe/xe_gt_types.h (+24 -1)

···
 	XE_GT_TYPE_MEDIA,
 };

+enum xe_gt_eu_type {
+	XE_GT_EU_TYPE_SIMD8,
+	XE_GT_EU_TYPE_SIMD16,
+};
+
 #define XE_MAX_DSS_FUSE_REGS	3
 #define XE_MAX_DSS_FUSE_BITS	(32 * XE_MAX_DSS_FUSE_REGS)
 #define XE_MAX_EU_FUSE_REGS	1
···

 		/** @fuse_topo.l3_bank_mask: L3 bank mask */
 		xe_l3_bank_mask_t l3_bank_mask;
+
+		/**
+		 * @fuse_topo.eu_type: type/width of EU stored in
+		 * fuse_topo.eu_mask_per_dss
+		 */
+		enum xe_gt_eu_type eu_type;
 	} fuse_topo;

 	/** @steering: register steering for individual HW units */
···
 	 */
 	spinlock_t mcr_lock;

+	/**
+	 * @global_invl_lock: protects the register for the duration
+	 * of a global invalidation of l2 cache
+	 */
+	spinlock_t global_invl_lock;
+
 	/** @wa_active: keep track of active workarounds */
 	struct {
 		/** @wa_active.gt: bitmap with active GT workarounds */
···
 		unsigned long *engine;
 		/** @wa_active.lrc: bitmap with active LRC workarounds */
 		unsigned long *lrc;
-		/** @wa_active.oob: bitmap with active OOB workaroudns */
+		/** @wa_active.oob: bitmap with active OOB workarounds */
 		unsigned long *oob;
+		/**
+		 * @wa_active.oob_initialized: mark oob as initialized to help
+		 * detecting misuse of XE_WA() - it can only be called on
+		 * initialization after OOB WAs have been processed
+		 */
+		bool oob_initialized;
 	} wa_active;

 	/** @user_engines: engines present in GT and available to userspace */
drivers/gpu/drm/xe/xe_guc_ct.c (+10 -1)

···
 	xe_gt_assert(ct_to_gt(ct), ct->g2h_outstanding == 0 ||
 		     state == XE_GUC_CT_STATE_STOPPED);

+	if (ct->g2h_outstanding)
+		xe_pm_runtime_put(ct_to_xe(ct));
 	ct->g2h_outstanding = 0;
 	ct->state = state;

···
 static void __g2h_reserve_space(struct xe_guc_ct *ct, u32 g2h_len, u32 num_g2h)
 {
 	xe_gt_assert(ct_to_gt(ct), g2h_len <= ct->ctbs.g2h.info.space);
+	xe_gt_assert(ct_to_gt(ct), (!g2h_len && !num_g2h) ||
+		     (g2h_len && num_g2h));

 	if (g2h_len) {
 		lockdep_assert_held(&ct->fast_lock);
+
+		if (!ct->g2h_outstanding)
+			xe_pm_runtime_get_noresume(ct_to_xe(ct));

 		ct->ctbs.g2h.info.space -= g2h_len;
 		ct->g2h_outstanding += num_g2h;
···
 	lockdep_assert_held(&ct->fast_lock);
 	xe_gt_assert(ct_to_gt(ct), ct->ctbs.g2h.info.space + g2h_len <=
 		     ct->ctbs.g2h.info.size - ct->ctbs.g2h.info.resv_space);
+	xe_gt_assert(ct_to_gt(ct), ct->g2h_outstanding);

 	ct->ctbs.g2h.info.space += g2h_len;
-	--ct->g2h_outstanding;
+	if (!--ct->g2h_outstanding)
+		xe_pm_runtime_put(ct_to_xe(ct));
 }

 static void g2h_release_space(struct xe_guc_ct *ct, u32 g2h_len)
drivers/gpu/drm/xe/xe_guc_id_mgr.c (+2 -2)

···
 	if (ret)
 		return ret;

-	xe_gt_info(idm_to_gt(idm), "using %u GUC ID%s\n",
-		   idm->total, str_plural(idm->total));
+	xe_gt_dbg(idm_to_gt(idm), "using %u GuC ID%s\n",
+		  idm->total, str_plural(idm->total));
 	return 0;
 }
drivers/gpu/drm/xe/xe_guc_submit.c (+48 -8)

···
 	struct xe_exec_queue *q = job->q;
 	struct xe_gpu_scheduler *sched = &q->guc->sched;
 	struct xe_guc *guc = exec_queue_to_guc(q);
+	const char *process_name = "no process";
 	int err = -ETIME;
+	pid_t pid = -1;
 	int i = 0;
 	bool wedged, skip_timeout_check;

···
 		goto sched_enable;
 	}

-	xe_gt_notice(guc_to_gt(guc), "Timedout job: seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx",
+	if (q->vm && q->vm->xef) {
+		process_name = q->vm->xef->process_name;
+		pid = q->vm->xef->pid;
+	}
+	xe_gt_notice(guc_to_gt(guc), "Timedout job: seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx in %s [%d]",
 		     xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
-		     q->guc->id, q->flags);
+		     q->guc->id, q->flags, process_name, pid);
+
 	trace_xe_sched_job_timedout(job);

 	if (!exec_queue_killed(q))
···
 	kfree(msg);
 }

+static void __suspend_fence_signal(struct xe_exec_queue *q)
+{
+	if (!q->guc->suspend_pending)
+		return;
+
+	WRITE_ONCE(q->guc->suspend_pending, false);
+	wake_up(&q->guc->suspend_wait);
+}
+
 static void suspend_fence_signal(struct xe_exec_queue *q)
 {
 	struct xe_guc *guc = exec_queue_to_guc(q);
···
 		  guc_read_stopped(guc));
 	xe_assert(xe, q->guc->suspend_pending);

-	q->guc->suspend_pending = false;
-	smp_wmb();
-	wake_up(&q->guc->suspend_wait);
+	__suspend_fence_signal(q);
 }

 static void __guc_exec_queue_process_msg_suspend(struct xe_sched_msg *msg)
···

 static void guc_exec_queue_process_msg(struct xe_sched_msg *msg)
 {
+	struct xe_device *xe = guc_to_xe(exec_queue_to_guc(msg->private_data));
+
 	trace_xe_sched_msg_recv(msg);

 	switch (msg->opcode) {
···
 	default:
 		XE_WARN_ON("Unknown message type");
 	}
+
+	xe_pm_runtime_put(xe);
 }

 static const struct drm_sched_backend_ops drm_sched_ops = {
···
 {
 	trace_xe_exec_queue_kill(q);
 	set_exec_queue_killed(q);
+	__suspend_fence_signal(q);
 	xe_guc_exec_queue_trigger_cleanup(q);
 }

 static void guc_exec_queue_add_msg(struct xe_exec_queue *q, struct xe_sched_msg *msg,
 				   u32 opcode)
 {
+	xe_pm_runtime_get_noresume(guc_to_xe(exec_queue_to_guc(q)));
+
 	INIT_LIST_HEAD(&msg->link);
 	msg->opcode = opcode;
 	msg->private_data = q;
···
 	return 0;
 }

-static void guc_exec_queue_suspend_wait(struct xe_exec_queue *q)
+static int guc_exec_queue_suspend_wait(struct xe_exec_queue *q)
 {
 	struct xe_guc *guc = exec_queue_to_guc(q);
+	int ret;

-	wait_event(q->guc->suspend_wait, !q->guc->suspend_pending ||
-		   guc_read_stopped(guc));
+	/*
+	 * Likely don't need to check exec_queue_killed() as we clear
+	 * suspend_pending upon kill, but to be paranoid about races in which
+	 * suspend_pending is set after kill, also check kill here.
+	 */
+	ret = wait_event_timeout(q->guc->suspend_wait,
+				 !READ_ONCE(q->guc->suspend_pending) ||
+				 exec_queue_killed(q) ||
+				 guc_read_stopped(guc),
+				 HZ * 5);
+
+	if (!ret) {
+		xe_gt_warn(guc_to_gt(guc),
+			   "Suspend fence, guc_id=%d, failed to respond",
+			   q->guc->id);
+		/* XXX: Trigger GT reset? */
+		return -ETIME;
+	}
+
+	return 0;
 }

 static void guc_exec_queue_resume(struct xe_exec_queue *q)
+25 -3
drivers/gpu/drm/xe/xe_heci_gsc.c
··· 92 92 { 93 93 struct xe_heci_gsc *heci_gsc = &xe->heci_gsc; 94 94 95 - if (!HAS_HECI_GSCFI(xe)) 95 + if (!HAS_HECI_GSCFI(xe) && !HAS_HECI_CSCFI(xe)) 96 96 return; 97 97 98 98 if (heci_gsc->adev) { ··· 177 177 const struct heci_gsc_def *def; 178 178 int ret; 179 179 180 - if (!HAS_HECI_GSCFI(xe)) 180 + if (!HAS_HECI_GSCFI(xe) && !HAS_HECI_CSCFI(xe)) 181 181 return; 182 182 183 183 heci_gsc->irq = -1; 184 184 185 - if (xe->info.platform == XE_PVC) { 185 + if (xe->info.platform == XE_BATTLEMAGE) { 186 + def = &heci_gsc_def_dg2; 187 + } else if (xe->info.platform == XE_PVC) { 186 188 def = &heci_gsc_def_pvc; 187 189 } else if (xe->info.platform == XE_DG2) { 188 190 def = &heci_gsc_def_dg2; ··· 224 222 225 223 if (!HAS_HECI_GSCFI(xe)) { 226 224 drm_warn_once(&xe->drm, "GSC irq: not supported"); 225 + return; 226 + } 227 + 228 + if (xe->heci_gsc.irq < 0) 229 + return; 230 + 231 + ret = generic_handle_irq(xe->heci_gsc.irq); 232 + if (ret) 233 + drm_err_ratelimited(&xe->drm, "error handling GSC irq: %d\n", ret); 234 + } 235 + 236 + void xe_heci_csc_irq_handler(struct xe_device *xe, u32 iir) 237 + { 238 + int ret; 239 + 240 + if ((iir & CSC_IRQ_INTF(1)) == 0) 241 + return; 242 + 243 + if (!HAS_HECI_CSCFI(xe)) { 244 + drm_warn_once(&xe->drm, "CSC irq: not supported"); 227 245 return; 228 246 } 229 247
+8 -2
drivers/gpu/drm/xe/xe_heci_gsc.h
··· 11 11 struct mei_aux_device; 12 12 13 13 /* 14 - * The HECI1 bit corresponds to bit15 and HECI2 to bit14. 14 + * GSC HECI1 bit corresponds to bit15 and HECI2 to bit14. 15 15 * The reason for this is to allow growth for more interfaces in the future. 16 16 */ 17 - #define GSC_IRQ_INTF(_x) BIT(15 - (_x)) 17 + #define GSC_IRQ_INTF(_x) BIT(15 - (_x)) 18 + 19 + /* 20 + * CSC HECI1 bit corresponds to bit9 and HECI2 to bit10. 21 + */ 22 + #define CSC_IRQ_INTF(_x) BIT(9 + (_x)) 18 23 19 24 /** 20 25 * struct xe_heci_gsc - graphics security controller for xe, HECI interface ··· 36 31 void xe_heci_gsc_init(struct xe_device *xe); 37 32 void xe_heci_gsc_fini(struct xe_device *xe); 38 33 void xe_heci_gsc_irq_handler(struct xe_device *xe, u32 iir); 34 + void xe_heci_csc_irq_handler(struct xe_device *xe, u32 iir); 39 35 40 36 #endif /* __XE_HECI_GSC_DEV_H__ */
+2
drivers/gpu/drm/xe/xe_irq.c
··· 459 459 * the primary tile. 460 460 */ 461 461 if (id == 0) { 462 + if (HAS_HECI_CSCFI(xe)) 463 + xe_heci_csc_irq_handler(xe, master_ctl); 462 464 xe_display_irq_handler(xe, master_ctl); 463 465 gu_misc_iir = gu_misc_irq_ack(xe, master_ctl); 464 466 }
+2 -2
drivers/gpu/drm/xe/xe_lmtt.c
··· 7 7 8 8 #include <drm/drm_managed.h> 9 9 10 - #include "regs/xe_sriov_regs.h" 10 + #include "regs/xe_gt_regs.h" 11 11 12 12 #include "xe_assert.h" 13 13 #include "xe_bo.h" ··· 71 71 lmtt->ops->lmtt_pte_num(level)), 72 72 ttm_bo_type_kernel, 73 73 XE_BO_FLAG_VRAM_IF_DGFX(lmtt_to_tile(lmtt)) | 74 - XE_BO_NEEDS_64K | XE_BO_FLAG_PINNED); 74 + XE_BO_FLAG_NEEDS_64K | XE_BO_FLAG_PINNED); 75 75 if (IS_ERR(bo)) { 76 76 err = PTR_ERR(bo); 77 77 goto out_free_pt;
+287 -235
drivers/gpu/drm/xe/xe_migrate.c
··· 73 73 #define NUM_PT_SLOTS 32 74 74 #define LEVEL0_PAGE_TABLE_ENCODE_SIZE SZ_2M 75 75 #define MAX_NUM_PTE 512 76 + #define IDENTITY_OFFSET 256ULL 76 77 77 78 /* 78 79 * Although MI_STORE_DATA_IMM's "length" field is 10-bits, 0x3FE is the largest ··· 85 84 #define MAX_PTE_PER_SDI 0x1FE 86 85 87 86 /** 88 - * xe_tile_migrate_engine() - Get this tile's migrate engine. 87 + * xe_tile_migrate_exec_queue() - Get this tile's migrate exec queue. 89 88 * @tile: The tile. 90 89 * 91 - * Returns the default migrate engine of this tile. 92 - * TODO: Perhaps this function is slightly misplaced, and even unneeded? 90 + * Returns the default migrate exec queue of this tile. 93 91 * 94 - * Return: The default migrate engine 92 + * Return: The default migrate exec queue 95 93 */ 96 - struct xe_exec_queue *xe_tile_migrate_engine(struct xe_tile *tile) 94 + struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile *tile) 97 95 { 98 96 return tile->migrate->q; 99 97 } ··· 121 121 return (slot + 1ULL) << xe_pt_shift(level + 1); 122 122 } 123 123 124 - static u64 xe_migrate_vram_ofs(struct xe_device *xe, u64 addr) 124 + static u64 xe_migrate_vram_ofs(struct xe_device *xe, u64 addr, bool is_comp_pte) 125 125 { 126 126 /* 127 127 * Remove the DPA to get a correct offset into identity table for the 128 128 * migrate offset 129 129 */ 130 + u64 identity_offset = IDENTITY_OFFSET; 131 + 132 + if (GRAPHICS_VER(xe) >= 20 && is_comp_pte) 133 + identity_offset += DIV_ROUND_UP_ULL(xe->mem.vram.actual_physical_size, SZ_1G); 134 + 130 135 addr -= xe->mem.vram.dpa_base; 131 - return addr + (256ULL << xe_pt_shift(2)); 136 + return addr + (identity_offset << xe_pt_shift(2)); 137 + } 138 + 139 + static void xe_migrate_program_identity(struct xe_device *xe, struct xe_vm *vm, struct xe_bo *bo, 140 + u64 map_ofs, u64 vram_offset, u16 pat_index, u64 pt_2m_ofs) 141 + { 142 + u64 pos, ofs, flags; 143 + u64 entry; 144 + /* XXX: Unclear if this should be usable_size? 
*/ 145 + u64 vram_limit = xe->mem.vram.actual_physical_size + 146 + xe->mem.vram.dpa_base; 147 + u32 level = 2; 148 + 149 + ofs = map_ofs + XE_PAGE_SIZE * level + vram_offset * 8; 150 + flags = vm->pt_ops->pte_encode_addr(xe, 0, pat_index, level, 151 + true, 0); 152 + 153 + xe_assert(xe, IS_ALIGNED(xe->mem.vram.usable_size, SZ_2M)); 154 + 155 + /* 156 + * Use 1GB pages when possible, last chunk always use 2M 157 + * pages as mixing reserved memory (stolen, WOCPM) with a single 158 + * mapping is not allowed on certain platforms. 159 + */ 160 + for (pos = xe->mem.vram.dpa_base; pos < vram_limit; 161 + pos += SZ_1G, ofs += 8) { 162 + if (pos + SZ_1G >= vram_limit) { 163 + entry = vm->pt_ops->pde_encode_bo(bo, pt_2m_ofs, 164 + pat_index); 165 + xe_map_wr(xe, &bo->vmap, ofs, u64, entry); 166 + 167 + flags = vm->pt_ops->pte_encode_addr(xe, 0, 168 + pat_index, 169 + level - 1, 170 + true, 0); 171 + 172 + for (ofs = pt_2m_ofs; pos < vram_limit; 173 + pos += SZ_2M, ofs += 8) 174 + xe_map_wr(xe, &bo->vmap, ofs, u64, pos | flags); 175 + break; /* Ensure pos == vram_limit assert correct */ 176 + } 177 + 178 + xe_map_wr(xe, &bo->vmap, ofs, u64, pos | flags); 179 + } 180 + 181 + xe_assert(xe, pos == vram_limit); 132 182 } 133 183 134 184 static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m, ··· 187 137 struct xe_device *xe = tile_to_xe(tile); 188 138 u16 pat_index = xe->pat.idx[XE_CACHE_WB]; 189 139 u8 id = tile->id; 190 - u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level, 191 - num_setup = num_level + 1; 140 + u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level; 141 + #define VRAM_IDENTITY_MAP_COUNT 2 142 + u32 num_setup = num_level + VRAM_IDENTITY_MAP_COUNT; 143 + #undef VRAM_IDENTITY_MAP_COUNT 192 144 u32 map_ofs, level, i; 193 145 struct xe_bo *bo, *batch = tile->mem.kernel_bb_pool->bo; 194 - u64 entry, pt30_ofs; 146 + u64 entry, pt29_ofs; 195 147 196 148 /* Can't bump NUM_PT_SLOTS too high */ 197 149 
BUILD_BUG_ON(NUM_PT_SLOTS > SZ_2M/XE_PAGE_SIZE); ··· 213 161 if (IS_ERR(bo)) 214 162 return PTR_ERR(bo); 215 163 216 - /* PT31 reserved for 2M identity map */ 217 - pt30_ofs = bo->size - 2 * XE_PAGE_SIZE; 218 - entry = vm->pt_ops->pde_encode_bo(bo, pt30_ofs, pat_index); 164 + /* PT30 & PT31 reserved for 2M identity map */ 165 + pt29_ofs = bo->size - 3 * XE_PAGE_SIZE; 166 + entry = vm->pt_ops->pde_encode_bo(bo, pt29_ofs, pat_index); 219 167 xe_pt_write(xe, &vm->pt_root[id]->bo->vmap, 0, entry); 220 168 221 169 map_ofs = (num_entries - num_setup) * XE_PAGE_SIZE; ··· 267 215 } else { 268 216 u64 batch_addr = xe_bo_addr(batch, 0, XE_PAGE_SIZE); 269 217 270 - m->batch_base_ofs = xe_migrate_vram_ofs(xe, batch_addr); 218 + m->batch_base_ofs = xe_migrate_vram_ofs(xe, batch_addr, false); 271 219 272 220 if (xe->info.has_usm) { 273 221 batch = tile->primary_gt->usm.bb_pool->bo; 274 222 batch_addr = xe_bo_addr(batch, 0, XE_PAGE_SIZE); 275 - m->usm_batch_base_ofs = xe_migrate_vram_ofs(xe, batch_addr); 223 + m->usm_batch_base_ofs = xe_migrate_vram_ofs(xe, batch_addr, false); 276 224 } 277 225 } 278 226 ··· 306 254 307 255 /* Identity map the entire vram at 256GiB offset */ 308 256 if (IS_DGFX(xe)) { 309 - u64 pos, ofs, flags; 310 - /* XXX: Unclear if this should be usable_size? 
*/ 311 - u64 vram_limit = xe->mem.vram.actual_physical_size + 312 - xe->mem.vram.dpa_base; 257 + u64 pt30_ofs = bo->size - 2 * XE_PAGE_SIZE; 313 258 314 - level = 2; 315 - ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8; 316 - flags = vm->pt_ops->pte_encode_addr(xe, 0, pat_index, level, 317 - true, 0); 318 - 319 - xe_assert(xe, IS_ALIGNED(xe->mem.vram.usable_size, SZ_2M)); 259 + xe_migrate_program_identity(xe, vm, bo, map_ofs, IDENTITY_OFFSET, 260 + pat_index, pt30_ofs); 261 + xe_assert(xe, xe->mem.vram.actual_physical_size <= 262 + (MAX_NUM_PTE - IDENTITY_OFFSET) * SZ_1G); 320 263 321 264 /* 322 - * Use 1GB pages when possible, last chunk always use 2M 323 - * pages as mixing reserved memory (stolen, WOCPM) with a single 324 - * mapping is not allowed on certain platforms. 265 + * Identity map the entire vram for compressed pat_index for xe2+ 266 + * if flat ccs is enabled. 325 267 */ 326 - for (pos = xe->mem.vram.dpa_base; pos < vram_limit; 327 - pos += SZ_1G, ofs += 8) { 328 - if (pos + SZ_1G >= vram_limit) { 329 - u64 pt31_ofs = bo->size - XE_PAGE_SIZE; 268 + if (GRAPHICS_VER(xe) >= 20 && xe_device_has_flat_ccs(xe)) { 269 + u16 comp_pat_index = xe->pat.idx[XE_CACHE_NONE_COMPRESSION]; 270 + u64 vram_offset = IDENTITY_OFFSET + 271 + DIV_ROUND_UP_ULL(xe->mem.vram.actual_physical_size, SZ_1G); 272 + u64 pt31_ofs = bo->size - XE_PAGE_SIZE; 330 273 331 - entry = vm->pt_ops->pde_encode_bo(bo, pt31_ofs, 332 - pat_index); 333 - xe_map_wr(xe, &bo->vmap, ofs, u64, entry); 334 - 335 - flags = vm->pt_ops->pte_encode_addr(xe, 0, 336 - pat_index, 337 - level - 1, 338 - true, 0); 339 - 340 - for (ofs = pt31_ofs; pos < vram_limit; 341 - pos += SZ_2M, ofs += 8) 342 - xe_map_wr(xe, &bo->vmap, ofs, u64, pos | flags); 343 - break; /* Ensure pos == vram_limit assert correct */ 344 - } 345 - 346 - xe_map_wr(xe, &bo->vmap, ofs, u64, pos | flags); 274 + xe_assert(xe, xe->mem.vram.actual_physical_size <= (MAX_NUM_PTE - 275 + IDENTITY_OFFSET - IDENTITY_OFFSET / 2) * SZ_1G); 276 + 
xe_migrate_program_identity(xe, vm, bo, map_ofs, vram_offset, 277 + comp_pat_index, pt31_ofs); 347 278 } 348 - 349 - xe_assert(xe, pos == vram_limit); 350 279 } 351 280 352 281 /* 353 282 * Example layout created above, with root level = 3: 354 283 * [PT0...PT7]: kernel PT's for copy/clear; 64 or 4KiB PTE's 355 284 * [PT8]: Kernel PT for VM_BIND, 4 KiB PTE's 356 - * [PT9...PT27]: Userspace PT's for VM_BIND, 4 KiB PTE's 357 - * [PT28 = PDE 0] [PT29 = PDE 1] [PT30 = PDE 2] [PT31 = 2M vram identity map] 285 + * [PT9...PT26]: Userspace PT's for VM_BIND, 4 KiB PTE's 286 + * [PT27 = PDE 0] [PT28 = PDE 1] [PT29 = PDE 2] [PT30 & PT31 = 2M vram identity map] 358 287 * 359 288 * This makes the lowest part of the VM point to the pagetables. 360 289 * Hence the lowest 2M in the vm should point to itself, with a few writes ··· 379 346 } 380 347 381 348 return logical_mask; 349 + } 350 + 351 + static bool xe_migrate_needs_ccs_emit(struct xe_device *xe) 352 + { 353 + return xe_device_has_flat_ccs(xe) && !(GRAPHICS_VER(xe) >= 20 && IS_DGFX(xe)); 382 354 } 383 355 384 356 /** ··· 459 421 return ERR_PTR(err); 460 422 461 423 if (IS_DGFX(xe)) { 462 - if (xe_device_has_flat_ccs(xe)) 424 + if (xe_migrate_needs_ccs_emit(xe)) 463 425 /* min chunk size corresponds to 4K of CCS Metadata */ 464 426 m->min_chunk_size = SZ_4K * SZ_64K / 465 427 xe_device_ccs_bytes(xe, SZ_64K); ··· 513 475 return cur->size >= size; 514 476 } 515 477 478 + #define PTE_UPDATE_FLAG_IS_VRAM BIT(0) 479 + #define PTE_UPDATE_FLAG_IS_COMP_PTE BIT(1) 480 + 516 481 static u32 pte_update_size(struct xe_migrate *m, 517 - bool is_vram, 482 + u32 flags, 518 483 struct ttm_resource *res, 519 484 struct xe_res_cursor *cur, 520 485 u64 *L0, u64 *L0_ofs, u32 *L0_pt, 521 486 u32 cmd_size, u32 pt_ofs, u32 avail_pts) 522 487 { 523 488 u32 cmds = 0; 489 + bool is_vram = PTE_UPDATE_FLAG_IS_VRAM & flags; 490 + bool is_comp_pte = PTE_UPDATE_FLAG_IS_COMP_PTE & flags; 524 491 525 492 *L0_pt = pt_ofs; 526 493 if (is_vram && 
xe_migrate_allow_identity(*L0, cur)) { 527 494 /* Offset into identity map. */ 528 495 *L0_ofs = xe_migrate_vram_ofs(tile_to_xe(m->tile), 529 - cur->start + vram_region_gpu_offset(res)); 496 + cur->start + vram_region_gpu_offset(res), 497 + is_comp_pte); 530 498 cmds += cmd_size; 531 499 } else { 532 500 /* Clip L0 to available size */ ··· 705 661 struct xe_gt *gt = m->tile->primary_gt; 706 662 u32 flush_flags = 0; 707 663 708 - if (xe_device_has_flat_ccs(gt_to_xe(gt)) && !copy_ccs && dst_is_indirect) { 664 + if (!copy_ccs && dst_is_indirect) { 709 665 /* 710 666 * If the src is already in vram, then it should already 711 667 * have been cleared by us, or has been populated by the ··· 781 737 bool copy_ccs = xe_device_has_flat_ccs(xe) && 782 738 xe_bo_needs_ccs_pages(src_bo) && xe_bo_needs_ccs_pages(dst_bo); 783 739 bool copy_system_ccs = copy_ccs && (!src_is_vram || !dst_is_vram); 740 + bool use_comp_pat = xe_device_has_flat_ccs(xe) && 741 + GRAPHICS_VER(xe) >= 20 && src_is_vram && !dst_is_vram; 784 742 785 743 /* Copying CCS between two different BOs is not supported yet. */ 786 744 if (XE_WARN_ON(copy_ccs && src_bo != dst_bo)) ··· 809 763 u32 batch_size = 2; /* arb_clear() + MI_BATCH_BUFFER_END */ 810 764 struct xe_sched_job *job; 811 765 struct xe_bb *bb; 812 - u32 flush_flags; 766 + u32 flush_flags = 0; 813 767 u32 update_idx; 814 768 u64 ccs_ofs, ccs_size; 815 769 u32 ccs_pt; 770 + u32 pte_flags; 816 771 817 772 bool usm = xe->info.has_usm; 818 773 u32 avail_pts = max_mem_transfer_per_pass(xe) / LEVEL0_PAGE_TABLE_ENCODE_SIZE; ··· 826 779 827 780 src_L0 = min(src_L0, dst_L0); 828 781 829 - batch_size += pte_update_size(m, src_is_vram, src, &src_it, &src_L0, 782 + pte_flags = src_is_vram ? PTE_UPDATE_FLAG_IS_VRAM : 0; 783 + pte_flags |= use_comp_pat ? 
PTE_UPDATE_FLAG_IS_COMP_PTE : 0; 784 + batch_size += pte_update_size(m, pte_flags, src, &src_it, &src_L0, 830 785 &src_L0_ofs, &src_L0_pt, 0, 0, 831 786 avail_pts); 832 787 833 - batch_size += pte_update_size(m, dst_is_vram, dst, &dst_it, &src_L0, 788 + pte_flags = dst_is_vram ? PTE_UPDATE_FLAG_IS_VRAM : 0; 789 + batch_size += pte_update_size(m, pte_flags, dst, &dst_it, &src_L0, 834 790 &dst_L0_ofs, &dst_L0_pt, 0, 835 791 avail_pts, avail_pts); 836 792 837 793 if (copy_system_ccs) { 838 794 ccs_size = xe_device_ccs_bytes(xe, src_L0); 839 - batch_size += pte_update_size(m, false, NULL, &ccs_it, &ccs_size, 795 + batch_size += pte_update_size(m, 0, NULL, &ccs_it, &ccs_size, 840 796 &ccs_ofs, &ccs_pt, 0, 841 797 2 * avail_pts, 842 798 avail_pts); ··· 848 798 849 799 /* Add copy commands size here */ 850 800 batch_size += ((copy_only_ccs) ? 0 : EMIT_COPY_DW) + 851 - ((xe_device_has_flat_ccs(xe) ? EMIT_COPY_CCS_DW : 0)); 801 + ((xe_migrate_needs_ccs_emit(xe) ? EMIT_COPY_CCS_DW : 0)); 852 802 853 803 bb = xe_bb_new(gt, batch_size, usm); 854 804 if (IS_ERR(bb)) { ··· 877 827 if (!copy_only_ccs) 878 828 emit_copy(gt, bb, src_L0_ofs, dst_L0_ofs, src_L0, XE_PAGE_SIZE); 879 829 880 - flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, 881 - IS_DGFX(xe) ? src_is_vram : src_is_pltt, 882 - dst_L0_ofs, 883 - IS_DGFX(xe) ? dst_is_vram : dst_is_pltt, 884 - src_L0, ccs_ofs, copy_ccs); 830 + if (xe_migrate_needs_ccs_emit(xe)) 831 + flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, 832 + IS_DGFX(xe) ? src_is_vram : src_is_pltt, 833 + dst_L0_ofs, 834 + IS_DGFX(xe) ? 
dst_is_vram : dst_is_pltt, 835 + src_L0, ccs_ofs, copy_ccs); 885 836 886 837 job = xe_bb_create_migration_job(m->q, bb, 887 838 xe_migrate_batch_base(m, usm), ··· 1073 1022 struct xe_sched_job *job; 1074 1023 struct xe_bb *bb; 1075 1024 u32 batch_size, update_idx; 1025 + u32 pte_flags; 1076 1026 1077 1027 bool usm = xe->info.has_usm; 1078 1028 u32 avail_pts = max_mem_transfer_per_pass(xe) / LEVEL0_PAGE_TABLE_ENCODE_SIZE; ··· 1081 1029 clear_L0 = xe_migrate_res_sizes(m, &src_it); 1082 1030 1083 1031 /* Calculate final sizes and batch size.. */ 1032 + pte_flags = clear_vram ? PTE_UPDATE_FLAG_IS_VRAM : 0; 1084 1033 batch_size = 2 + 1085 - pte_update_size(m, clear_vram, src, &src_it, 1034 + pte_update_size(m, pte_flags, src, &src_it, 1086 1035 &clear_L0, &clear_L0_ofs, &clear_L0_pt, 1087 1036 clear_system_ccs ? 0 : emit_clear_cmd_len(gt), 0, 1088 1037 avail_pts); 1089 1038 1090 - if (xe_device_has_flat_ccs(xe)) 1039 + if (xe_migrate_needs_ccs_emit(xe)) 1091 1040 batch_size += EMIT_COPY_CCS_DW; 1092 1041 1093 1042 /* Clear commands */ ··· 1116 1063 if (!clear_system_ccs) 1117 1064 emit_clear(gt, bb, clear_L0_ofs, clear_L0, XE_PAGE_SIZE, clear_vram); 1118 1065 1119 - if (xe_device_has_flat_ccs(xe)) { 1066 + if (xe_migrate_needs_ccs_emit(xe)) { 1120 1067 emit_copy_ccs(gt, bb, clear_L0_ofs, true, 1121 1068 m->cleared_mem_ofs, false, clear_L0); 1122 1069 flush_flags = MI_FLUSH_DW_CCS; ··· 1179 1126 } 1180 1127 1181 1128 static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs, 1129 + const struct xe_vm_pgtable_update_op *pt_op, 1182 1130 const struct xe_vm_pgtable_update *update, 1183 1131 struct xe_migrate_pt_update *pt_update) 1184 1132 { ··· 1200 1146 if (!ppgtt_ofs) 1201 1147 ppgtt_ofs = xe_migrate_vram_ofs(tile_to_xe(tile), 1202 1148 xe_bo_addr(update->pt_bo, 0, 1203 - XE_PAGE_SIZE)); 1149 + XE_PAGE_SIZE), false); 1204 1150 1205 1151 do { 1206 1152 u64 addr = ppgtt_ofs + ofs * 8; ··· 1214 1160 bb->cs[bb->len++] = MI_STORE_DATA_IMM | 
MI_SDI_NUM_QW(chunk); 1215 1161 bb->cs[bb->len++] = lower_32_bits(addr); 1216 1162 bb->cs[bb->len++] = upper_32_bits(addr); 1217 - ops->populate(pt_update, tile, NULL, bb->cs + bb->len, ofs, chunk, 1218 - update); 1163 + if (pt_op->bind) 1164 + ops->populate(pt_update, tile, NULL, bb->cs + bb->len, 1165 + ofs, chunk, update); 1166 + else 1167 + ops->clear(pt_update, tile, NULL, bb->cs + bb->len, 1168 + ofs, chunk, update); 1219 1169 1220 1170 bb->len += chunk * 2; 1221 1171 ofs += chunk; ··· 1244 1186 1245 1187 static struct dma_fence * 1246 1188 xe_migrate_update_pgtables_cpu(struct xe_migrate *m, 1247 - struct xe_vm *vm, struct xe_bo *bo, 1248 - const struct xe_vm_pgtable_update *updates, 1249 - u32 num_updates, bool wait_vm, 1250 1189 struct xe_migrate_pt_update *pt_update) 1251 1190 { 1252 1191 XE_TEST_DECLARE(struct migrate_test_params *test = 1253 1192 to_migrate_test_params 1254 1193 (xe_cur_kunit_priv(XE_TEST_LIVE_MIGRATE));) 1255 1194 const struct xe_migrate_pt_update_ops *ops = pt_update->ops; 1256 - struct dma_fence *fence; 1195 + struct xe_vm *vm = pt_update->vops->vm; 1196 + struct xe_vm_pgtable_update_ops *pt_update_ops = 1197 + &pt_update->vops->pt_update_ops[pt_update->tile_id]; 1257 1198 int err; 1258 - u32 i; 1199 + u32 i, j; 1259 1200 1260 1201 if (XE_TEST_ONLY(test && test->force_gpu)) 1261 - return ERR_PTR(-ETIME); 1262 - 1263 - if (bo && !dma_resv_test_signaled(bo->ttm.base.resv, 1264 - DMA_RESV_USAGE_KERNEL)) 1265 - return ERR_PTR(-ETIME); 1266 - 1267 - if (wait_vm && !dma_resv_test_signaled(xe_vm_resv(vm), 1268 - DMA_RESV_USAGE_BOOKKEEP)) 1269 1202 return ERR_PTR(-ETIME); 1270 1203 1271 1204 if (ops->pre_commit) { ··· 1265 1216 if (err) 1266 1217 return ERR_PTR(err); 1267 1218 } 1268 - for (i = 0; i < num_updates; i++) { 1269 - const struct xe_vm_pgtable_update *update = &updates[i]; 1270 1219 1271 - ops->populate(pt_update, m->tile, &update->pt_bo->vmap, NULL, 1272 - update->ofs, update->qwords, update); 1273 - } 1220 + for (i = 0; i < 
pt_update_ops->num_ops; ++i) { 1221 + const struct xe_vm_pgtable_update_op *pt_op = 1222 + &pt_update_ops->ops[i]; 1274 1223 1275 - if (vm) { 1276 - trace_xe_vm_cpu_bind(vm); 1277 - xe_device_wmb(vm->xe); 1278 - } 1224 + for (j = 0; j < pt_op->num_entries; j++) { 1225 + const struct xe_vm_pgtable_update *update = 1226 + &pt_op->entries[j]; 1279 1227 1280 - fence = dma_fence_get_stub(); 1281 - 1282 - return fence; 1283 - } 1284 - 1285 - static bool no_in_syncs(struct xe_vm *vm, struct xe_exec_queue *q, 1286 - struct xe_sync_entry *syncs, u32 num_syncs) 1287 - { 1288 - struct dma_fence *fence; 1289 - int i; 1290 - 1291 - for (i = 0; i < num_syncs; i++) { 1292 - fence = syncs[i].fence; 1293 - 1294 - if (fence && !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, 1295 - &fence->flags)) 1296 - return false; 1297 - } 1298 - if (q) { 1299 - fence = xe_exec_queue_last_fence_get(q, vm); 1300 - if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) { 1301 - dma_fence_put(fence); 1302 - return false; 1228 + if (pt_op->bind) 1229 + ops->populate(pt_update, m->tile, 1230 + &update->pt_bo->vmap, NULL, 1231 + update->ofs, update->qwords, 1232 + update); 1233 + else 1234 + ops->clear(pt_update, m->tile, 1235 + &update->pt_bo->vmap, NULL, 1236 + update->ofs, update->qwords, update); 1303 1237 } 1304 - dma_fence_put(fence); 1305 1238 } 1306 1239 1307 - return true; 1240 + trace_xe_vm_cpu_bind(vm); 1241 + xe_device_wmb(vm->xe); 1242 + 1243 + return dma_fence_get_stub(); 1308 1244 } 1309 1245 1310 - /** 1311 - * xe_migrate_update_pgtables() - Pipelined page-table update 1312 - * @m: The migrate context. 1313 - * @vm: The vm we'll be updating. 1314 - * @bo: The bo whose dma-resv we will await before updating, or NULL if userptr. 1315 - * @q: The exec queue to be used for the update or NULL if the default 1316 - * migration engine is to be used. 1317 - * @updates: An array of update descriptors. 1318 - * @num_updates: Number of descriptors in @updates. 
1319 - * @syncs: Array of xe_sync_entry to await before updating. Note that waits 1320 - * will block the engine timeline. 1321 - * @num_syncs: Number of entries in @syncs. 1322 - * @pt_update: Pointer to a struct xe_migrate_pt_update, which contains 1323 - * pointers to callback functions and, if subclassed, private arguments to 1324 - * those. 1325 - * 1326 - * Perform a pipelined page-table update. The update descriptors are typically 1327 - * built under the same lock critical section as a call to this function. If 1328 - * using the default engine for the updates, they will be performed in the 1329 - * order they grab the job_mutex. If different engines are used, external 1330 - * synchronization is needed for overlapping updates to maintain page-table 1331 - * consistency. Note that the meaing of "overlapping" is that the updates 1332 - * touch the same page-table, which might be a higher-level page-directory. 1333 - * If no pipelining is needed, then updates may be performed by the cpu. 1334 - * 1335 - * Return: A dma_fence that, when signaled, indicates the update completion. 
1336 - */ 1337 - struct dma_fence * 1338 - xe_migrate_update_pgtables(struct xe_migrate *m, 1339 - struct xe_vm *vm, 1340 - struct xe_bo *bo, 1341 - struct xe_exec_queue *q, 1342 - const struct xe_vm_pgtable_update *updates, 1343 - u32 num_updates, 1344 - struct xe_sync_entry *syncs, u32 num_syncs, 1345 - struct xe_migrate_pt_update *pt_update) 1246 + static struct dma_fence * 1247 + __xe_migrate_update_pgtables(struct xe_migrate *m, 1248 + struct xe_migrate_pt_update *pt_update, 1249 + struct xe_vm_pgtable_update_ops *pt_update_ops) 1346 1250 { 1347 1251 const struct xe_migrate_pt_update_ops *ops = pt_update->ops; 1348 1252 struct xe_tile *tile = m->tile; ··· 1304 1302 struct xe_sched_job *job; 1305 1303 struct dma_fence *fence; 1306 1304 struct drm_suballoc *sa_bo = NULL; 1307 - struct xe_vma *vma = pt_update->vma; 1308 1305 struct xe_bb *bb; 1309 - u32 i, batch_size, ppgtt_ofs, update_idx, page_ofs = 0; 1306 + u32 i, j, batch_size = 0, ppgtt_ofs, update_idx, page_ofs = 0; 1307 + u32 num_updates = 0, current_update = 0; 1310 1308 u64 addr; 1311 1309 int err = 0; 1312 - bool usm = !q && xe->info.has_usm; 1313 - bool first_munmap_rebind = vma && 1314 - vma->gpuva.flags & XE_VMA_FIRST_REBIND; 1315 - struct xe_exec_queue *q_override = !q ? 
m->q : q; 1316 - u16 pat_index = xe->pat.idx[XE_CACHE_WB]; 1310 + bool is_migrate = pt_update_ops->q == m->q; 1311 + bool usm = is_migrate && xe->info.has_usm; 1317 1312 1318 - /* Use the CPU if no in syncs and engine is idle */ 1319 - if (no_in_syncs(vm, q, syncs, num_syncs) && xe_exec_queue_is_idle(q_override)) { 1320 - fence = xe_migrate_update_pgtables_cpu(m, vm, bo, updates, 1321 - num_updates, 1322 - first_munmap_rebind, 1323 - pt_update); 1324 - if (!IS_ERR(fence) || fence == ERR_PTR(-EAGAIN)) 1325 - return fence; 1313 + for (i = 0; i < pt_update_ops->num_ops; ++i) { 1314 + struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i]; 1315 + struct xe_vm_pgtable_update *updates = pt_op->entries; 1316 + 1317 + num_updates += pt_op->num_entries; 1318 + for (j = 0; j < pt_op->num_entries; ++j) { 1319 + u32 num_cmds = DIV_ROUND_UP(updates[j].qwords, 1320 + MAX_PTE_PER_SDI); 1321 + 1322 + /* align noop + MI_STORE_DATA_IMM cmd prefix */ 1323 + batch_size += 4 * num_cmds + updates[j].qwords * 2; 1324 + } 1326 1325 } 1327 1326 1328 1327 /* fixed + PTE entries */ 1329 1328 if (IS_DGFX(xe)) 1330 - batch_size = 2; 1329 + batch_size += 2; 1331 1330 else 1332 - batch_size = 6 + num_updates * 2; 1331 + batch_size += 6 * (num_updates / MAX_PTE_PER_SDI + 1) + 1332 + num_updates * 2; 1333 1333 1334 - for (i = 0; i < num_updates; i++) { 1335 - u32 num_cmds = DIV_ROUND_UP(updates[i].qwords, MAX_PTE_PER_SDI); 1336 - 1337 - /* align noop + MI_STORE_DATA_IMM cmd prefix */ 1338 - batch_size += 4 * num_cmds + updates[i].qwords * 2; 1339 - } 1340 - 1341 - /* 1342 - * XXX: Create temp bo to copy from, if batch_size becomes too big? 1343 - * 1344 - * Worst case: Sum(2 * (each lower level page size) + (top level page size)) 1345 - * Should be reasonably bound.. 
1346 - */ 1347 - xe_tile_assert(tile, batch_size < SZ_128K); 1348 - 1349 - bb = xe_bb_new(gt, batch_size, !q && xe->info.has_usm); 1334 + bb = xe_bb_new(gt, batch_size, usm); 1350 1335 if (IS_ERR(bb)) 1351 1336 return ERR_CAST(bb); 1352 1337 1353 1338 /* For sysmem PTE's, need to map them in our hole.. */ 1354 1339 if (!IS_DGFX(xe)) { 1355 - ppgtt_ofs = NUM_KERNEL_PDE - 1; 1356 - if (q) { 1357 - xe_tile_assert(tile, num_updates <= NUM_VMUSA_WRITES_PER_UNIT); 1340 + u32 ptes, ofs; 1358 1341 1359 - sa_bo = drm_suballoc_new(&m->vm_update_sa, 1, 1342 + ppgtt_ofs = NUM_KERNEL_PDE - 1; 1343 + if (!is_migrate) { 1344 + u32 num_units = DIV_ROUND_UP(num_updates, 1345 + NUM_VMUSA_WRITES_PER_UNIT); 1346 + 1347 + if (num_units > m->vm_update_sa.size) { 1348 + err = -ENOBUFS; 1349 + goto err_bb; 1350 + } 1351 + sa_bo = drm_suballoc_new(&m->vm_update_sa, num_units, 1360 1352 GFP_KERNEL, true, 0); 1361 1353 if (IS_ERR(sa_bo)) { 1362 1354 err = PTR_ERR(sa_bo); ··· 1366 1370 } 1367 1371 1368 1372 /* Map our PT's to gtt */ 1369 - bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(num_updates); 1370 - bb->cs[bb->len++] = ppgtt_ofs * XE_PAGE_SIZE + page_ofs; 1371 - bb->cs[bb->len++] = 0; /* upper_32_bits */ 1373 + i = 0; 1374 + j = 0; 1375 + ptes = num_updates; 1376 + ofs = ppgtt_ofs * XE_PAGE_SIZE + page_ofs; 1377 + while (ptes) { 1378 + u32 chunk = min(MAX_PTE_PER_SDI, ptes); 1379 + u32 idx = 0; 1372 1380 1373 - for (i = 0; i < num_updates; i++) { 1374 - struct xe_bo *pt_bo = updates[i].pt_bo; 1381 + bb->cs[bb->len++] = MI_STORE_DATA_IMM | 1382 + MI_SDI_NUM_QW(chunk); 1383 + bb->cs[bb->len++] = ofs; 1384 + bb->cs[bb->len++] = 0; /* upper_32_bits */ 1375 1385 1376 - xe_tile_assert(tile, pt_bo->size == SZ_4K); 1386 + for (; i < pt_update_ops->num_ops; ++i) { 1387 + struct xe_vm_pgtable_update_op *pt_op = 1388 + &pt_update_ops->ops[i]; 1389 + struct xe_vm_pgtable_update *updates = pt_op->entries; 1377 1390 1378 - addr = vm->pt_ops->pte_encode_bo(pt_bo, 0, pat_index, 0); 1379 - 
bb->cs[bb->len++] = lower_32_bits(addr); 1380 - bb->cs[bb->len++] = upper_32_bits(addr); 1391 + for (; j < pt_op->num_entries; ++j, ++current_update, ++idx) { 1392 + struct xe_vm *vm = pt_update->vops->vm; 1393 + struct xe_bo *pt_bo = updates[j].pt_bo; 1394 + 1395 + if (idx == chunk) 1396 + goto next_cmd; 1397 + 1398 + xe_tile_assert(tile, pt_bo->size == SZ_4K); 1399 + 1400 + /* Map a PT at most once */ 1401 + if (pt_bo->update_index < 0) 1402 + pt_bo->update_index = current_update; 1403 + 1404 + addr = vm->pt_ops->pte_encode_bo(pt_bo, 0, 1405 + XE_CACHE_WB, 0); 1406 + bb->cs[bb->len++] = lower_32_bits(addr); 1407 + bb->cs[bb->len++] = upper_32_bits(addr); 1408 + } 1409 + 1410 + j = 0; 1411 + } 1412 + 1413 + next_cmd: 1414 + ptes -= chunk; 1415 + ofs += chunk * sizeof(u64); 1381 1416 } 1382 1417 1383 1418 bb->cs[bb->len++] = MI_BATCH_BUFFER_END; ··· 1416 1389 1417 1390 addr = xe_migrate_vm_addr(ppgtt_ofs, 0) + 1418 1391 (page_ofs / sizeof(u64)) * XE_PAGE_SIZE; 1419 - for (i = 0; i < num_updates; i++) 1420 - write_pgtable(tile, bb, addr + i * XE_PAGE_SIZE, 1421 - &updates[i], pt_update); 1392 + for (i = 0; i < pt_update_ops->num_ops; ++i) { 1393 + struct xe_vm_pgtable_update_op *pt_op = 1394 + &pt_update_ops->ops[i]; 1395 + struct xe_vm_pgtable_update *updates = pt_op->entries; 1396 + 1397 + for (j = 0; j < pt_op->num_entries; ++j) { 1398 + struct xe_bo *pt_bo = updates[j].pt_bo; 1399 + 1400 + write_pgtable(tile, bb, addr + 1401 + pt_bo->update_index * XE_PAGE_SIZE, 1402 + pt_op, &updates[j], pt_update); 1403 + } 1404 + } 1422 1405 } else { 1423 1406 /* phys pages, no preamble required */ 1424 1407 bb->cs[bb->len++] = MI_BATCH_BUFFER_END; 1425 1408 update_idx = bb->len; 1426 1409 1427 - for (i = 0; i < num_updates; i++) 1428 - write_pgtable(tile, bb, 0, &updates[i], pt_update); 1410 + for (i = 0; i < pt_update_ops->num_ops; ++i) { 1411 + struct xe_vm_pgtable_update_op *pt_op = 1412 + &pt_update_ops->ops[i]; 1413 + struct xe_vm_pgtable_update *updates = 
pt_op->entries; 1414 + 1415 + for (j = 0; j < pt_op->num_entries; ++j) 1416 + write_pgtable(tile, bb, 0, pt_op, &updates[j], 1417 + pt_update); 1418 + } 1429 1419 } 1430 1420 1431 - job = xe_bb_create_migration_job(q ?: m->q, bb, 1421 + job = xe_bb_create_migration_job(pt_update_ops->q, bb, 1432 1422 xe_migrate_batch_base(m, usm), 1433 1423 update_idx); 1434 1424 if (IS_ERR(job)) { ··· 1453 1409 goto err_sa; 1454 1410 } 1455 1411 1456 - /* Wait on BO move */ 1457 - if (bo) { 1458 - err = xe_sched_job_add_deps(job, bo->ttm.base.resv, 1459 - DMA_RESV_USAGE_KERNEL); 1460 - if (err) 1461 - goto err_job; 1462 - } 1463 - 1464 - /* 1465 - * Munmap style VM unbind, need to wait for all jobs to be complete / 1466 - * trigger preempts before moving forward 1467 - */ 1468 - if (first_munmap_rebind) { 1469 - err = xe_sched_job_add_deps(job, xe_vm_resv(vm), 1470 - DMA_RESV_USAGE_BOOKKEEP); 1471 - if (err) 1472 - goto err_job; 1473 - } 1474 - 1475 - err = xe_sched_job_last_fence_add_dep(job, vm); 1476 - for (i = 0; !err && i < num_syncs; i++) 1477 - err = xe_sync_entry_add_deps(&syncs[i], job); 1478 - 1479 - if (err) 1480 - goto err_job; 1481 - 1482 1412 if (ops->pre_commit) { 1483 1413 pt_update->job = job; 1484 1414 err = ops->pre_commit(pt_update); 1485 1415 if (err) 1486 1416 goto err_job; 1487 1417 } 1488 - if (!q) 1418 + if (is_migrate) 1489 1419 mutex_lock(&m->job_mutex); 1490 1420 1491 1421 xe_sched_job_arm(job); 1492 1422 fence = dma_fence_get(&job->drm.s_fence->finished); 1493 1423 xe_sched_job_push(job); 1494 1424 1495 - if (!q) 1425 + if (is_migrate) 1496 1426 mutex_unlock(&m->job_mutex); 1497 1427 1498 1428 xe_bb_free(bb, fence); ··· 1481 1463 err_bb: 1482 1464 xe_bb_free(bb, NULL); 1483 1465 return ERR_PTR(err); 1466 + } 1467 + 1468 + /** 1469 + * xe_migrate_update_pgtables() - Pipelined page-table update 1470 + * @m: The migrate context. 1471 + * @pt_update: PT update arguments 1472 + * 1473 + * Perform a pipelined page-table update. 
The update descriptors are typically 1474 + * built under the same lock critical section as a call to this function. If 1475 + * using the default engine for the updates, they will be performed in the 1476 + * order they grab the job_mutex. If different engines are used, external 1477 + * synchronization is needed for overlapping updates to maintain page-table 1478 + * consistency. Note that the meaning of "overlapping" is that the updates 1479 + * touch the same page-table, which might be a higher-level page-directory. 1480 + * If no pipelining is needed, then updates may be performed by the cpu. 1481 + * 1482 + * Return: A dma_fence that, when signaled, indicates the update completion. 1483 + */ 1484 + struct dma_fence * 1485 + xe_migrate_update_pgtables(struct xe_migrate *m, 1486 + struct xe_migrate_pt_update *pt_update) 1487 + 1488 + { 1489 + struct xe_vm_pgtable_update_ops *pt_update_ops = 1490 + &pt_update->vops->pt_update_ops[pt_update->tile_id]; 1491 + struct dma_fence *fence; 1492 + 1493 + fence = xe_migrate_update_pgtables_cpu(m, pt_update); 1494 + 1495 + /* -ETIME indicates a job is needed, anything else is legit error */ 1496 + if (!IS_ERR(fence) || PTR_ERR(fence) != -ETIME) 1497 + return fence; 1498 + 1499 + return __xe_migrate_update_pgtables(m, pt_update, pt_update_ops); 1484 1500 } 1485 1501 1486 1502 /**
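The tail of xe_migrate_update_pgtables() above uses -ETIME as a sentinel: the CPU path either completes the update synchronously or reports that a GPU job must be queued, while any other error is a real failure. As a minimal userspace sketch of that dispatch shape (try_update_cpu and update_via_job are hypothetical stand-ins, not the kernel functions):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result codes modeled on the kernel pattern: 0 on
 * success, -ETIME when the fast path cannot run and a job must be
 * queued, any other negative value is a genuine error. */
static int try_update_cpu(int contended)
{
	return contended ? -ETIME : 0;	/* CPU path only works uncontended */
}

static int update_via_job(void)
{
	return 0;			/* pretend the queued job succeeds */
}

/* Mirror of the dispatch in xe_migrate_update_pgtables(): try the
 * cheap synchronous path first, fall back to a job only on -ETIME. */
static int update_pgtables(int contended)
{
	int ret = try_update_cpu(contended);

	if (ret != -ETIME)	/* success or a real error: return as-is */
		return ret;

	return update_via_job();
}
```

Only the sentinel triggers the fallback; every other return value propagates unchanged, matching the "-ETIME indicates a job is needed" comment in the diff.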
+21 -13
drivers/gpu/drm/xe/xe_migrate.h
··· 47 47 struct xe_tile *tile, struct iosys_map *map, 48 48 void *pos, u32 ofs, u32 num_qwords, 49 49 const struct xe_vm_pgtable_update *update); 50 + /** 51 + * @clear: Clear a command buffer or page-table with ptes. 52 + * @pt_update: Embeddable callback argument. 53 + * @tile: The tile for the current operation. 54 + * @map: struct iosys_map into the memory to be populated. 55 + * @pos: If @map is NULL, map into the memory to be populated. 56 + * @ofs: qword offset into @map, unused if @map is NULL. 57 + * @num_qwords: Number of qwords to write. 58 + * @update: Information about the PTEs to be inserted. 59 + * 60 + * This interface is intended to be used as a callback into the 61 + * page-table system to populate command buffers or shared 62 + * page-tables with PTEs. 63 + */ 64 + void (*clear)(struct xe_migrate_pt_update *pt_update, 65 + struct xe_tile *tile, struct iosys_map *map, 66 + void *pos, u32 ofs, u32 num_qwords, 67 + const struct xe_vm_pgtable_update *update); 50 68 51 69 /** 52 70 * @pre_commit: Callback to be called just before arming the ··· 85 67 struct xe_migrate_pt_update { 86 68 /** @ops: Pointer to the struct xe_migrate_pt_update_ops callbacks */ 87 69 const struct xe_migrate_pt_update_ops *ops; 88 - /** @vma: The vma we're updating the pagetable for. */ 89 - struct xe_vma *vma; 70 + /** @vops: VMA operations */ 71 + struct xe_vma_ops *vops; 90 72 /** @job: The job if a GPU page-table update. 
NULL otherwise */ 91 73 struct xe_sched_job *job; 92 - /** @start: Start of update for the range fence */ 93 - u64 start; 94 - /** @last: Last of update for the range fence */ 95 - u64 last; 96 74 /** @tile_id: Tile ID of the update */ 97 75 u8 tile_id; 98 76 }; ··· 110 96 111 97 struct dma_fence * 112 98 xe_migrate_update_pgtables(struct xe_migrate *m, 113 - struct xe_vm *vm, 114 - struct xe_bo *bo, 115 - struct xe_exec_queue *q, 116 - const struct xe_vm_pgtable_update *updates, 117 - u32 num_updates, 118 - struct xe_sync_entry *syncs, u32 num_syncs, 119 99 struct xe_migrate_pt_update *pt_update); 120 100 121 101 void xe_migrate_wait(struct xe_migrate *m); 122 102 123 - struct xe_exec_queue *xe_tile_migrate_engine(struct xe_tile *tile); 103 + struct xe_exec_queue *xe_tile_migrate_exec_queue(struct xe_tile *tile); 124 104 #endif
+164 -107
drivers/gpu/drm/xe/xe_mmio.c
··· 33 33 tile->mmio.regs = NULL; 34 34 } 35 35 36 - int xe_mmio_probe_tiles(struct xe_device *xe) 36 + /* 37 + * On multi-tile devices, partition the BAR space for MMIO on each tile, 38 + * possibly accounting for register override on the number of tiles available. 39 + * Resulting memory layout is like below: 40 + * 41 + * .----------------------. <- tile_count * tile_mmio_size 42 + * | .... | 43 + * |----------------------| <- 2 * tile_mmio_size 44 + * | tile1->mmio.regs | 45 + * |----------------------| <- 1 * tile_mmio_size 46 + * | tile0->mmio.regs | 47 + * '----------------------' <- 0MB 48 + */ 49 + static void mmio_multi_tile_setup(struct xe_device *xe, size_t tile_mmio_size) 37 50 { 38 - size_t tile_mmio_size = SZ_16M, tile_mmio_ext_size = xe->info.tile_mmio_ext_size; 39 - u8 id, tile_count = xe->info.tile_count; 40 - struct xe_gt *gt = xe_root_mmio_gt(xe); 41 51 struct xe_tile *tile; 42 52 void __iomem *regs; 43 - u32 mtcfg; 53 + u8 id; 44 54 45 - if (tile_count == 1) 46 - goto add_mmio_ext; 55 + /* 56 + * Nothing to be done as tile 0 has already been set up earlier with the 57 + * entire BAR mapped - see xe_mmio_init() 58 + */ 59 + if (xe->info.tile_count == 1) 60 + return; 47 61 62 + /* Possibly override number of tiles based on configuration register */ 48 63 if (!xe->info.skip_mtcfg) { 64 + struct xe_gt *gt = xe_root_mmio_gt(xe); 65 + u8 tile_count; 66 + u32 mtcfg; 67 + 68 + /* 69 + * Although the per-tile mmio regs are not yet initialized, this 70 + * is fine as it's going to the root gt, that's guaranteed to be 71 + * initialized earlier in xe_mmio_init() 72 + */ 49 73 mtcfg = xe_mmio_read64_2x32(gt, XEHP_MTCFG_ADDR); 50 74 tile_count = REG_FIELD_GET(TILE_COUNT, mtcfg) + 1; 75 + 51 76 if (tile_count < xe->info.tile_count) { 52 77 drm_info(&xe->drm, "tile_count: %d, reduced_tile_count %d\n", 53 78 xe->info.tile_count, tile_count); 54 79 xe->info.tile_count = tile_count; 55 80 56 81 /* 57 - * FIXME: Needs some work for standalone media, but should be 
impossible 58 - * with multi-tile for now. 82 + * FIXME: Needs some work for standalone media, but 83 + * should be impossible with multi-tile for now: 84 + * multi-tile platform with standalone media doesn't 85 + * exist 59 86 */ 60 87 xe->info.gt_count = xe->info.tile_count; 61 88 } ··· 94 67 tile->mmio.regs = regs; 95 68 regs += tile_mmio_size; 96 69 } 70 + } 97 71 98 - add_mmio_ext: 99 - /* 100 - * By design, there's a contiguous multi-tile MMIO space (16MB hard coded per tile). 101 - * When supported, there could be an additional contiguous multi-tile MMIO extension 102 - * space ON TOP of it, and hence the necessity for distinguished MMIO spaces. 103 - */ 104 - if (xe->info.has_mmio_ext) { 105 - regs = xe->mmio.regs + tile_mmio_size * tile_count; 72 + /* 73 + * On top of all the multi-tile MMIO space there can be a platform-dependent 74 + * extension for each tile, resulting in a layout like below: 75 + * 76 + * .----------------------. <- ext_base + tile_count * tile_mmio_ext_size 77 + * | .... | 78 + * |----------------------| <- ext_base + 2 * tile_mmio_ext_size 79 + * | tile1->mmio_ext.regs | 80 + * |----------------------| <- ext_base + 1 * tile_mmio_ext_size 81 + * | tile0->mmio_ext.regs | 82 + * |======================| <- ext_base = tile_count * tile_mmio_size 83 + * | | 84 + * | mmio.regs | 85 + * | | 86 + * '----------------------' <- 0MB 87 + * 88 + * Set up the tile[]->mmio_ext pointers/sizes. 
89 + */ 90 + static void mmio_extension_setup(struct xe_device *xe, size_t tile_mmio_size, 91 + size_t tile_mmio_ext_size) 92 + { 93 + struct xe_tile *tile; 94 + void __iomem *regs; 95 + u8 id; 106 96 107 - for_each_tile(tile, xe, id) { 108 - tile->mmio_ext.size = tile_mmio_ext_size; 109 - tile->mmio_ext.regs = regs; 97 + if (!xe->info.has_mmio_ext) 98 + return; 110 99 111 - regs += tile_mmio_ext_size; 112 - } 100 + regs = xe->mmio.regs + tile_mmio_size * xe->info.tile_count; 101 + for_each_tile(tile, xe, id) { 102 + tile->mmio_ext.size = tile_mmio_ext_size; 103 + tile->mmio_ext.regs = regs; 104 + regs += tile_mmio_ext_size; 113 105 } 106 + } 107 + 108 + int xe_mmio_probe_tiles(struct xe_device *xe) 109 + { 110 + size_t tile_mmio_size = SZ_16M; 111 + size_t tile_mmio_ext_size = xe->info.tile_mmio_ext_size; 112 + 113 + mmio_multi_tile_setup(xe, tile_mmio_size); 114 + mmio_extension_setup(xe, tile_mmio_size, tile_mmio_ext_size); 114 115 115 116 return devm_add_action_or_reset(xe->drm.dev, tiles_fini, xe); 116 117 } ··· 176 121 return devm_add_action_or_reset(xe->drm.dev, mmio_fini, xe); 177 122 } 178 123 124 + static void mmio_flush_pending_writes(struct xe_gt *gt) 125 + { 126 + #define DUMMY_REG_OFFSET 0x130030 127 + struct xe_tile *tile = gt_to_tile(gt); 128 + int i; 129 + 130 + if (tile->xe->info.platform != XE_LUNARLAKE) 131 + return; 132 + 133 + /* 4 dummy writes */ 134 + for (i = 0; i < 4; i++) 135 + writel(0, tile->mmio.regs + DUMMY_REG_OFFSET); 136 + } 137 + 179 138 u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg) 180 139 { 181 140 struct xe_tile *tile = gt_to_tile(gt); 182 141 u32 addr = xe_mmio_adjusted_addr(gt, reg.addr); 183 142 u8 val; 143 + 144 + /* Wa_15015404425 */ 145 + mmio_flush_pending_writes(gt); 184 146 185 147 val = readb((reg.ext ? 
tile->mmio_ext.regs : tile->mmio.regs) + addr); 186 148 trace_xe_reg_rw(gt, false, addr, val, sizeof(val)); ··· 211 139 u32 addr = xe_mmio_adjusted_addr(gt, reg.addr); 212 140 u16 val; 213 141 142 + /* Wa_15015404425 */ 143 + mmio_flush_pending_writes(gt); 144 + 214 145 val = readw((reg.ext ? tile->mmio_ext.regs : tile->mmio.regs) + addr); 215 146 trace_xe_reg_rw(gt, false, addr, val, sizeof(val)); 216 147 ··· 226 151 u32 addr = xe_mmio_adjusted_addr(gt, reg.addr); 227 152 228 153 trace_xe_reg_rw(gt, true, addr, val, sizeof(val)); 229 - writel(val, (reg.ext ? tile->mmio_ext.regs : tile->mmio.regs) + addr); 154 + 155 + if (!reg.vf && IS_SRIOV_VF(gt_to_xe(gt))) 156 + xe_gt_sriov_vf_write32(gt, reg, val); 157 + else 158 + writel(val, (reg.ext ? tile->mmio_ext.regs : tile->mmio.regs) + addr); 230 159 } 231 160 232 161 u32 xe_mmio_read32(struct xe_gt *gt, struct xe_reg reg) ··· 238 159 struct xe_tile *tile = gt_to_tile(gt); 239 160 u32 addr = xe_mmio_adjusted_addr(gt, reg.addr); 240 161 u32 val; 162 + 163 + /* Wa_15015404425 */ 164 + mmio_flush_pending_writes(gt); 241 165 242 166 if (!reg.vf && IS_SRIOV_VF(gt_to_xe(gt))) 243 167 val = xe_gt_sriov_vf_read32(gt, reg); ··· 333 251 return (u64)udw << 32 | ldw; 334 252 } 335 253 254 + static int __xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask, u32 val, u32 timeout_us, 255 + u32 *out_val, bool atomic, bool expect_match) 256 + { 257 + ktime_t cur = ktime_get_raw(); 258 + const ktime_t end = ktime_add_us(cur, timeout_us); 259 + int ret = -ETIMEDOUT; 260 + s64 wait = 10; 261 + u32 read; 262 + bool check; 263 + 264 + for (;;) { 265 + read = xe_mmio_read32(gt, reg); 266 + 267 + check = (read & mask) == val; 268 + if (!expect_match) 269 + check = !check; 270 + 271 + if (check) { 272 + ret = 0; 273 + break; 274 + } 275 + 276 + cur = ktime_get_raw(); 277 + if (!ktime_before(cur, end)) 278 + break; 279 + 280 + if (ktime_after(ktime_add_us(cur, wait), end)) 281 + wait = ktime_us_delta(end, cur); 282 + 283 + if (atomic) 
284 + udelay(wait); 285 + else 286 + usleep_range(wait, wait << 1); 287 + wait <<= 1; 288 + } 289 + 290 + if (ret != 0) { 291 + read = xe_mmio_read32(gt, reg); 292 + 293 + check = (read & mask) == val; 294 + if (!expect_match) 295 + check = !check; 296 + 297 + if (check) 298 + ret = 0; 299 + } 300 + 301 + if (out_val) 302 + *out_val = read; 303 + 304 + return ret; 305 + } 306 + 336 307 /** 337 308 * xe_mmio_wait32() - Wait for a register to match the desired masked value 338 309 * @gt: MMIO target GT ··· 408 273 int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask, u32 val, u32 timeout_us, 409 274 u32 *out_val, bool atomic) 410 275 { 411 - ktime_t cur = ktime_get_raw(); 412 - const ktime_t end = ktime_add_us(cur, timeout_us); 413 - int ret = -ETIMEDOUT; 414 - s64 wait = 10; 415 - u32 read; 416 - 417 - for (;;) { 418 - read = xe_mmio_read32(gt, reg); 419 - if ((read & mask) == val) { 420 - ret = 0; 421 - break; 422 - } 423 - 424 - cur = ktime_get_raw(); 425 - if (!ktime_before(cur, end)) 426 - break; 427 - 428 - if (ktime_after(ktime_add_us(cur, wait), end)) 429 - wait = ktime_us_delta(end, cur); 430 - 431 - if (atomic) 432 - udelay(wait); 433 - else 434 - usleep_range(wait, wait << 1); 435 - wait <<= 1; 436 - } 437 - 438 - if (ret != 0) { 439 - read = xe_mmio_read32(gt, reg); 440 - if ((read & mask) == val) 441 - ret = 0; 442 - } 443 - 444 - if (out_val) 445 - *out_val = read; 446 - 447 - return ret; 276 + return __xe_mmio_wait32(gt, reg, mask, val, timeout_us, out_val, atomic, true); 448 277 } 449 278 450 279 /** ··· 416 317 * @gt: MMIO target GT 417 318 * @reg: register to read value from 418 319 * @mask: mask to be applied to the value read from the register 419 - * @val: value to match after applying the mask 420 - * @timeout_us: time out after this period of time. Wait logic tries to be 421 - * smart, applying an exponential backoff until @timeout_us is reached. 
320 + * @val: value not to be matched after applying the mask 321 + * @timeout_us: time out after this period of time 422 322 * @out_val: if not NULL, points where to store the last unmasked value 423 323 * @atomic: needs to be true if calling from an atomic context 424 324 * 425 - * This function polls for a masked value to change from a given value and 426 - * returns zero on success or -ETIMEDOUT if timed out. 427 - * 428 - * Note that @timeout_us represents the minimum amount of time to wait before 429 - * giving up. The actual time taken by this function can be a little more than 430 - * @timeout_us for different reasons, specially in non-atomic contexts. Thus, 431 - * it is possible that this function succeeds even after @timeout_us has passed. 325 + * This function works exactly like xe_mmio_wait32() with the exception that 326 + * @val is expected not to be matched. 432 327 */ 433 328 int xe_mmio_wait32_not(struct xe_gt *gt, struct xe_reg reg, u32 mask, u32 val, u32 timeout_us, 434 329 u32 *out_val, bool atomic) 435 330 { 436 - ktime_t cur = ktime_get_raw(); 437 - const ktime_t end = ktime_add_us(cur, timeout_us); 438 - int ret = -ETIMEDOUT; 439 - s64 wait = 10; 440 - u32 read; 441 - 442 - for (;;) { 443 - read = xe_mmio_read32(gt, reg); 444 - if ((read & mask) != val) { 445 - ret = 0; 446 - break; 447 - } 448 - 449 - cur = ktime_get_raw(); 450 - if (!ktime_before(cur, end)) 451 - break; 452 - 453 - if (ktime_after(ktime_add_us(cur, wait), end)) 454 - wait = ktime_us_delta(end, cur); 455 - 456 - if (atomic) 457 - udelay(wait); 458 - else 459 - usleep_range(wait, wait << 1); 460 - wait <<= 1; 461 - } 462 - 463 - if (ret != 0) { 464 - read = xe_mmio_read32(gt, reg); 465 - if ((read & mask) != val) 466 - ret = 0; 467 - } 468 - 469 - if (out_val) 470 - *out_val = read; 471 - 472 - return ret; 331 + return __xe_mmio_wait32(gt, reg, mask, val, timeout_us, out_val, atomic, false); 473 332 }
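The xe_mmio.c refactor above folds xe_mmio_wait32() and xe_mmio_wait32_not() into one __xe_mmio_wait32() parameterized by expect_match, polling with an exponentially growing delay and doing one final re-read after the deadline so a condition that became true during the last sleep is not reported as a timeout. A self-contained sketch of that loop, with a fake register in place of MMIO reads and a microsecond budget counter in place of ktime (all names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fake "register" that becomes ready after a few reads, standing in
 * for xe_mmio_read32(). */
static int reads_left = 5;

static uint32_t fake_read32(void)
{
	return --reads_left <= 0 ? 0x1 : 0x0;
}

/* Poll until (read & mask) matches (or, with expect_match == false,
 * stops matching) val, doubling the wait each iteration like the
 * kernel's __xe_mmio_wait32(). Returns 0 on success, -1 on timeout. */
static int wait32(uint32_t mask, uint32_t val, unsigned budget_us,
		  uint32_t *out_val, bool expect_match)
{
	unsigned waited = 0, wait = 10;
	uint32_t read;
	bool ok;
	int ret = -1;	/* stands in for -ETIMEDOUT */

	for (;;) {
		read = fake_read32();
		ok = ((read & mask) == val) == expect_match;
		if (ok) {
			ret = 0;
			break;
		}
		if (waited >= budget_us)
			break;
		if (wait > budget_us - waited)
			wait = budget_us - waited;	/* clamp to deadline */
		waited += wait;	/* a real version would udelay()/usleep here */
		wait <<= 1;	/* exponential backoff */
	}

	/* One last read after timeout: the condition may have become
	 * true while we slept, so don't report a spurious timeout. */
	if (ret) {
		read = fake_read32();
		ok = ((read & mask) == val) == expect_match;
		if (ok)
			ret = 0;
	}

	if (out_val)
		*out_val = read;
	return ret;
}
```

Collapsing the match/mismatch variants into `((read & mask) == val) == expect_match` is what lets the two exported wrappers become one-liners, as in the diff.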
-1
drivers/gpu/drm/xe/xe_mmio.h
··· 22 22 int xe_mmio_write32_and_verify(struct xe_gt *gt, struct xe_reg reg, u32 val, u32 mask, u32 eval); 23 23 bool xe_mmio_in_range(const struct xe_gt *gt, const struct xe_mmio_range *range, struct xe_reg reg); 24 24 25 - int xe_mmio_probe_vram(struct xe_device *xe); 26 25 u64 xe_mmio_read64_2x32(struct xe_gt *gt, struct xe_reg reg); 27 26 int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask, u32 val, u32 timeout_us, 28 27 u32 *out_val, bool atomic);
+1 -1
drivers/gpu/drm/xe/xe_oa.c
··· 641 641 u32 offset = xe_bo_ggtt_addr(lrc->bo); 642 642 643 643 do { 644 - bb->cs[bb->len++] = MI_STORE_DATA_IMM | BIT(22) /* GGTT */ | 2; 644 + bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_GGTT | MI_SDI_NUM_DW(1); 645 645 bb->cs[bb->len++] = offset + flex->offset * sizeof(u32); 646 646 bb->cs[bb->len++] = 0; 647 647 bb->cs[bb->len++] = flex->value;
+10 -1
drivers/gpu/drm/xe/xe_pat.c
··· 7 7 8 8 #include <drm/xe_drm.h> 9 9 10 + #include <generated/xe_wa_oob.h> 11 + 10 12 #include "regs/xe_reg_defs.h" 11 13 #include "xe_assert.h" 12 14 #include "xe_device.h" ··· 17 15 #include "xe_gt_mcr.h" 18 16 #include "xe_mmio.h" 19 17 #include "xe_sriov.h" 18 + #include "xe_wa.h" 20 19 21 20 #define _PAT_ATS 0x47fc 22 21 #define _PAT_INDEX(index) _PICK_EVEN_2RANGES(index, 8, \ ··· 385 382 if (GRAPHICS_VER(xe) == 20) { 386 383 xe->pat.ops = &xe2_pat_ops; 387 384 xe->pat.table = xe2_pat_table; 388 - xe->pat.n_entries = ARRAY_SIZE(xe2_pat_table); 385 + 386 + /* Wa_16023588340. XXX: Should use XE_WA */ 387 + if (GRAPHICS_VERx100(xe) == 2001) 388 + xe->pat.n_entries = 28; /* Disable CLOS3 */ 389 + else 390 + xe->pat.n_entries = ARRAY_SIZE(xe2_pat_table); 391 + 389 392 xe->pat.idx[XE_CACHE_NONE] = 3; 390 393 xe->pat.idx[XE_CACHE_WT] = 15; 391 394 xe->pat.idx[XE_CACHE_WB] = 2;
+5 -2
drivers/gpu/drm/xe/xe_pci.c
··· 59 59 60 60 u8 has_display:1; 61 61 u8 has_heci_gscfi:1; 62 + u8 has_heci_cscfi:1; 62 63 u8 has_llc:1; 63 64 u8 has_mmio_ext:1; 64 65 u8 has_sriov:1; ··· 346 345 PLATFORM(BATTLEMAGE), 347 346 .has_display = true, 348 347 .require_force_probe = true, 348 + .has_heci_cscfi = 1, 349 349 }; 350 350 351 351 #undef PLATFORM ··· 608 606 609 607 xe->info.is_dgfx = desc->is_dgfx; 610 608 xe->info.has_heci_gscfi = desc->has_heci_gscfi; 609 + xe->info.has_heci_cscfi = desc->has_heci_cscfi; 611 610 xe->info.has_llc = desc->has_llc; 612 611 xe->info.has_mmio_ext = desc->has_mmio_ext; 613 612 xe->info.has_sriov = desc->has_sriov; ··· 818 815 if (err) 819 816 return err; 820 817 821 - drm_dbg(&xe->drm, "%s %s %04x:%04x dgfx:%d gfx:%s (%d.%02d) media:%s (%d.%02d) display:%s dma_m_s:%d tc:%d gscfi:%d", 818 + drm_dbg(&xe->drm, "%s %s %04x:%04x dgfx:%d gfx:%s (%d.%02d) media:%s (%d.%02d) display:%s dma_m_s:%d tc:%d gscfi:%d cscfi:%d", 822 819 desc->platform_name, 823 820 subplatform_desc ? subplatform_desc->name : "", 824 821 xe->info.devid, xe->info.revid, ··· 831 828 xe->info.media_verx100 % 100, 832 829 str_yes_no(xe->info.enable_display), 833 830 xe->info.dma_mask_size, xe->info.tile_count, 834 - xe->info.has_heci_gscfi); 831 + xe->info.has_heci_gscfi, xe->info.has_heci_cscfi); 835 832 836 833 drm_dbg(&xe->drm, "Stepping = (G:%s, M:%s, D:%s, B:%s)\n", 837 834 xe_step_name(xe->info.step.graphics),
+8
drivers/gpu/drm/xe/xe_pm.c
··· 20 20 #include "xe_guc.h" 21 21 #include "xe_irq.h" 22 22 #include "xe_pcode.h" 23 + #include "xe_trace.h" 23 24 #include "xe_wa.h" 24 25 25 26 /** ··· 88 87 int err; 89 88 90 89 drm_dbg(&xe->drm, "Suspending device\n"); 90 + trace_xe_pm_suspend(xe, __builtin_return_address(0)); 91 91 92 92 for_each_gt(gt, xe, id) 93 93 xe_gt_suspend_prepare(gt); ··· 133 131 int err; 134 132 135 133 drm_dbg(&xe->drm, "Resuming device\n"); 134 + trace_xe_pm_resume(xe, __builtin_return_address(0)); 136 135 137 136 for_each_tile(tile, xe, id) 138 137 xe_wa_apply_tile_workarounds(tile); ··· 329 326 u8 id; 330 327 int err = 0; 331 328 329 + trace_xe_pm_runtime_suspend(xe, __builtin_return_address(0)); 332 330 /* Disable access_ongoing asserts and prevent recursive pm calls */ 333 331 xe_pm_write_callback_task(xe, current); 334 332 ··· 403 399 u8 id; 404 400 int err = 0; 405 401 402 + trace_xe_pm_runtime_resume(xe, __builtin_return_address(0)); 406 403 /* Disable access_ongoing asserts and prevent recursive pm calls */ 407 404 xe_pm_write_callback_task(xe, current); 408 405 ··· 468 463 */ 469 464 void xe_pm_runtime_get(struct xe_device *xe) 470 465 { 466 + trace_xe_pm_runtime_get(xe, __builtin_return_address(0)); 471 467 pm_runtime_get_noresume(xe->drm.dev); 472 468 473 469 if (xe_pm_read_callback_task(xe) == current) ··· 484 478 */ 485 479 void xe_pm_runtime_put(struct xe_device *xe) 486 480 { 481 + trace_xe_pm_runtime_put(xe, __builtin_return_address(0)); 487 482 if (xe_pm_read_callback_task(xe) == current) { 488 483 pm_runtime_put_noidle(xe->drm.dev); 489 484 } else { ··· 502 495 */ 503 496 int xe_pm_runtime_get_ioctl(struct xe_device *xe) 504 497 { 498 + trace_xe_pm_runtime_get_ioctl(xe, __builtin_return_address(0)); 505 499 if (WARN_ON(xe_pm_read_callback_task(xe) == current)) 506 500 return -ELOOP; 507 501
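The xe_pm.c hunks above thread __builtin_return_address(0) into every PM tracepoint so the trace records which call site took or dropped the wakeref. A minimal sketch of that caller-capture idiom (trace_record() is a made-up stand-in for the trace_xe_pm_* events; __builtin_return_address requires GCC or Clang):

```c
#include <assert.h>
#include <stddef.h>

static const void *last_caller;	/* stands in for the trace buffer */

static void trace_record(const void *caller)
{
	last_caller = caller;
}

/* Like xe_pm_runtime_get(): record the call site first, so traces show
 * which caller triggered the power transition. noinline keeps a real
 * stack frame so the return address is meaningful. */
static __attribute__((noinline)) void runtime_get(void)
{
	trace_record(__builtin_return_address(0));
	/* ... the actual pm_runtime_get_noresume() work would go here ... */
}
```

Passing the address down as an argument, rather than calling the builtin inside the tracepoint, is what makes the recorded caller the user of the PM API instead of the PM helper itself.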
+9 -3
drivers/gpu/drm/xe/xe_preempt_fence.c
··· 17 17 container_of(w, typeof(*pfence), preempt_work); 18 18 struct xe_exec_queue *q = pfence->q; 19 19 20 - if (pfence->error) 20 + if (pfence->error) { 21 21 dma_fence_set_error(&pfence->base, pfence->error); 22 - else 23 - q->ops->suspend_wait(q); 22 + } else if (!q->ops->reset_status(q)) { 23 + int err = q->ops->suspend_wait(q); 24 + 25 + if (err) 26 + dma_fence_set_error(&pfence->base, err); 27 + } else { 28 + dma_fence_set_error(&pfence->base, -ENOENT); 29 + } 24 30 25 31 dma_fence_signal(&pfence->base); 26 32 /*
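The reworked preempt_fence_work() above makes every failure path stamp an error on the fence before the unconditional signal: a pre-set error wins, a wedged queue (reset_status()) maps to -ENOENT, and otherwise a failing suspend_wait() propagates its error. That decision ladder can be modeled without kernel types (toy_fence/toy_queue are stand-ins, not the dma_fence API):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy fence: 'error' mirrors dma_fence_set_error(), 'signaled' mirrors
 * dma_fence_signal(); real dma_fence semantics are not implied. */
struct toy_fence {
	int error;
	bool signaled;
};

struct toy_queue {
	bool wedged;		/* reset_status() returned true */
	int suspend_err;	/* result of suspend_wait() */
};

/* Mirror of the worker's ladder: a pending error wins, then a wedged
 * queue maps to -ENOENT, then suspend_wait()'s error (if any), and the
 * fence is signaled in every case so waiters never hang. */
static void preempt_work(struct toy_fence *f, struct toy_queue *q,
			 int pending_error)
{
	if (pending_error)
		f->error = pending_error;
	else if (q->wedged)
		f->error = -ENOENT;
	else if (q->suspend_err)
		f->error = q->suspend_err;

	f->signaled = true;
}
```

The key property the diff restores is that an error from suspend_wait() is no longer silently dropped: the fence still signals, but with the failure attached.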
+865 -479
drivers/gpu/drm/xe/xe_pt.c
··· 9 9 #include "xe_bo.h" 10 10 #include "xe_device.h" 11 11 #include "xe_drm_client.h" 12 + #include "xe_exec_queue.h" 12 13 #include "xe_gt.h" 13 14 #include "xe_gt_tlb_invalidation.h" 14 15 #include "xe_migrate.h" 15 16 #include "xe_pt_types.h" 16 17 #include "xe_pt_walk.h" 17 18 #include "xe_res_cursor.h" 19 + #include "xe_sched_job.h" 20 + #include "xe_sync.h" 18 21 #include "xe_trace.h" 19 22 #include "xe_ttm_stolen_mgr.h" 20 23 #include "xe_vm.h" ··· 328 325 entry->pt = parent; 329 326 entry->flags = 0; 330 327 entry->qwords = 0; 328 + entry->pt_bo->update_index = -1; 331 329 332 330 if (alloc_entries) { 333 331 entry->pt_entries = kmalloc_array(XE_PDES, ··· 846 842 } 847 843 } 848 844 849 - static void xe_pt_abort_bind(struct xe_vma *vma, 850 - struct xe_vm_pgtable_update *entries, 851 - u32 num_entries) 845 + static void xe_pt_cancel_bind(struct xe_vma *vma, 846 + struct xe_vm_pgtable_update *entries, 847 + u32 num_entries) 852 848 { 853 849 u32 i, j; 854 850 855 851 for (i = 0; i < num_entries; i++) { 856 - if (!entries[i].pt_entries) 852 + struct xe_pt *pt = entries[i].pt; 853 + 854 + if (!pt) 857 855 continue; 858 856 859 - for (j = 0; j < entries[i].qwords; j++) 860 - xe_pt_destroy(entries[i].pt_entries[j].pt, xe_vma_vm(vma)->flags, NULL); 857 + if (pt->level) { 858 + for (j = 0; j < entries[i].qwords; j++) 859 + xe_pt_destroy(entries[i].pt_entries[j].pt, 860 + xe_vma_vm(vma)->flags, NULL); 861 + } 862 + 861 863 kfree(entries[i].pt_entries); 864 + entries[i].pt_entries = NULL; 865 + entries[i].qwords = 0; 862 866 } 863 867 } 864 868 ··· 876 864 877 865 lockdep_assert_held(&vm->lock); 878 866 879 - if (xe_vma_is_userptr(vma)) 880 - lockdep_assert_held_read(&vm->userptr.notifier_lock); 881 - else if (!xe_vma_is_null(vma)) 867 + if (!xe_vma_is_userptr(vma) && !xe_vma_is_null(vma)) 882 868 dma_resv_assert_held(xe_vma_bo(vma)->ttm.base.resv); 883 869 884 870 xe_vm_assert_held(vm); 885 871 } 886 872 887 - static void xe_pt_commit_bind(struct xe_vma *vma, 
888 - struct xe_vm_pgtable_update *entries, 889 - u32 num_entries, bool rebind, 890 - struct llist_head *deferred) 873 + static void xe_pt_commit(struct xe_vma *vma, 874 + struct xe_vm_pgtable_update *entries, 875 + u32 num_entries, struct llist_head *deferred) 876 + { 877 + u32 i, j; 878 + 879 + xe_pt_commit_locks_assert(vma); 880 + 881 + for (i = 0; i < num_entries; i++) { 882 + struct xe_pt *pt = entries[i].pt; 883 + 884 + if (!pt->level) 885 + continue; 886 + 887 + for (j = 0; j < entries[i].qwords; j++) { 888 + struct xe_pt *oldpte = entries[i].pt_entries[j].pt; 889 + 890 + xe_pt_destroy(oldpte, xe_vma_vm(vma)->flags, deferred); 891 + } 892 + } 893 + } 894 + 895 + static void xe_pt_abort_bind(struct xe_vma *vma, 896 + struct xe_vm_pgtable_update *entries, 897 + u32 num_entries, bool rebind) 898 + { 899 + int i, j; 900 + 901 + xe_pt_commit_locks_assert(vma); 902 + 903 + for (i = num_entries - 1; i >= 0; --i) { 904 + struct xe_pt *pt = entries[i].pt; 905 + struct xe_pt_dir *pt_dir; 906 + 907 + if (!rebind) 908 + pt->num_live -= entries[i].qwords; 909 + 910 + if (!pt->level) 911 + continue; 912 + 913 + pt_dir = as_xe_pt_dir(pt); 914 + for (j = 0; j < entries[i].qwords; j++) { 915 + u32 j_ = j + entries[i].ofs; 916 + struct xe_pt *newpte = xe_pt_entry(pt_dir, j_); 917 + struct xe_pt *oldpte = entries[i].pt_entries[j].pt; 918 + 919 + pt_dir->children[j_] = oldpte ? 
&oldpte->base : 0; 920 + xe_pt_destroy(newpte, xe_vma_vm(vma)->flags, NULL); 921 + } 922 + } 923 + } 924 + 925 + static void xe_pt_commit_prepare_bind(struct xe_vma *vma, 926 + struct xe_vm_pgtable_update *entries, 927 + u32 num_entries, bool rebind) 891 928 { 892 929 u32 i, j; 893 930 ··· 949 888 if (!rebind) 950 889 pt->num_live += entries[i].qwords; 951 890 952 - if (!pt->level) { 953 - kfree(entries[i].pt_entries); 891 + if (!pt->level) 954 892 continue; 955 - } 956 893 957 894 pt_dir = as_xe_pt_dir(pt); 958 895 for (j = 0; j < entries[i].qwords; j++) { 959 896 u32 j_ = j + entries[i].ofs; 960 897 struct xe_pt *newpte = entries[i].pt_entries[j].pt; 898 + struct xe_pt *oldpte = NULL; 961 899 962 900 if (xe_pt_entry(pt_dir, j_)) 963 - xe_pt_destroy(xe_pt_entry(pt_dir, j_), 964 - xe_vma_vm(vma)->flags, deferred); 901 + oldpte = xe_pt_entry(pt_dir, j_); 965 902 966 903 pt_dir->children[j_] = &newpte->base; 904 + entries[i].pt_entries[j].pt = oldpte; 967 905 } 968 - kfree(entries[i].pt_entries); 969 906 } 907 + } 908 + 909 + static void xe_pt_free_bind(struct xe_vm_pgtable_update *entries, 910 + u32 num_entries) 911 + { 912 + u32 i; 913 + 914 + for (i = 0; i < num_entries; i++) 915 + kfree(entries[i].pt_entries); 970 916 } 971 917 972 918 static int ··· 986 918 err = xe_pt_stage_bind(tile, vma, entries, num_entries); 987 919 if (!err) 988 920 xe_tile_assert(tile, *num_entries); 989 - else /* abort! */ 990 - xe_pt_abort_bind(vma, entries, *num_entries); 991 921 992 922 return err; 993 923 } 994 924 995 925 static void xe_vm_dbg_print_entries(struct xe_device *xe, 996 926 const struct xe_vm_pgtable_update *entries, 997 - unsigned int num_entries) 927 + unsigned int num_entries, bool bind) 998 928 #if (IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)) 999 929 { 1000 930 unsigned int i; 1001 931 1002 - vm_dbg(&xe->drm, "%u entries to update\n", num_entries); 932 + vm_dbg(&xe->drm, "%s: %u entries to update\n", bind ? 
"bind" : "unbind", 933 + num_entries); 1003 934 for (i = 0; i < num_entries; i++) { 1004 935 const struct xe_vm_pgtable_update *entry = &entries[i]; 1005 936 struct xe_pt *xe_pt = entry->pt; ··· 1019 952 {} 1020 953 #endif 1021 954 1022 - #ifdef CONFIG_DRM_XE_USERPTR_INVAL_INJECT 1023 - 1024 - static int xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma) 955 + static bool no_in_syncs(struct xe_sync_entry *syncs, u32 num_syncs) 1025 956 { 1026 - u32 divisor = uvma->userptr.divisor ? uvma->userptr.divisor : 2; 1027 - static u32 count; 957 + int i; 1028 958 1029 - if (count++ % divisor == divisor - 1) { 1030 - struct xe_vm *vm = xe_vma_vm(&uvma->vma); 959 + for (i = 0; i < num_syncs; i++) { 960 + struct dma_fence *fence = syncs[i].fence; 1031 961 1032 - uvma->userptr.divisor = divisor << 1; 1033 - spin_lock(&vm->userptr.invalidated_lock); 1034 - list_move_tail(&uvma->userptr.invalidate_link, 1035 - &vm->userptr.invalidated); 1036 - spin_unlock(&vm->userptr.invalidated_lock); 1037 - return true; 962 + if (fence && !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, 963 + &fence->flags)) 964 + return false; 1038 965 } 1039 966 1040 - return false; 967 + return true; 1041 968 } 1042 969 1043 - #else 1044 - 1045 - static bool xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma) 970 + static int job_test_add_deps(struct xe_sched_job *job, 971 + struct dma_resv *resv, 972 + enum dma_resv_usage usage) 1046 973 { 1047 - return false; 974 + if (!job) { 975 + if (!dma_resv_test_signaled(resv, usage)) 976 + return -ETIME; 977 + 978 + return 0; 979 + } 980 + 981 + return xe_sched_job_add_deps(job, resv, usage); 1048 982 } 1049 983 1050 - #endif 984 + static int vma_add_deps(struct xe_vma *vma, struct xe_sched_job *job) 985 + { 986 + struct xe_bo *bo = xe_vma_bo(vma); 1051 987 1052 - /** 1053 - * struct xe_pt_migrate_pt_update - Callback argument for pre-commit callbacks 1054 - * @base: Base we derive from. 1055 - * @bind: Whether this is a bind or an unbind operation. 
A bind operation 1056 - * makes the pre-commit callback error with -EAGAIN if it detects a 1057 - * pending invalidation. 1058 - * @locked: Whether the pre-commit callback locked the userptr notifier lock 1059 - * and it needs unlocking. 1060 - */ 1061 - struct xe_pt_migrate_pt_update { 1062 - struct xe_migrate_pt_update base; 1063 - bool bind; 1064 - bool locked; 1065 - }; 988 + xe_bo_assert_held(bo); 1066 989 1067 - /* 1068 - * This function adds the needed dependencies to a page-table update job 1069 - * to make sure racing jobs for separate bind engines don't race writing 1070 - * to the same page-table range, wreaking havoc. Initially use a single 1071 - * fence for the entire VM. An optimization would use smaller granularity. 1072 - */ 990 + if (bo && !bo->vm) 991 + return job_test_add_deps(job, bo->ttm.base.resv, 992 + DMA_RESV_USAGE_KERNEL); 993 + 994 + return 0; 995 + } 996 + 997 + static int op_add_deps(struct xe_vm *vm, struct xe_vma_op *op, 998 + struct xe_sched_job *job) 999 + { 1000 + int err = 0; 1001 + 1002 + switch (op->base.op) { 1003 + case DRM_GPUVA_OP_MAP: 1004 + if (!op->map.immediate && xe_vm_in_fault_mode(vm)) 1005 + break; 1006 + 1007 + err = vma_add_deps(op->map.vma, job); 1008 + break; 1009 + case DRM_GPUVA_OP_REMAP: 1010 + if (op->remap.prev) 1011 + err = vma_add_deps(op->remap.prev, job); 1012 + if (!err && op->remap.next) 1013 + err = vma_add_deps(op->remap.next, job); 1014 + break; 1015 + case DRM_GPUVA_OP_UNMAP: 1016 + break; 1017 + case DRM_GPUVA_OP_PREFETCH: 1018 + err = vma_add_deps(gpuva_to_vma(op->base.prefetch.va), job); 1019 + break; 1020 + default: 1021 + drm_warn(&vm->xe->drm, "NOT POSSIBLE"); 1022 + } 1023 + 1024 + return err; 1025 + } 1026 + 1073 1027 static int xe_pt_vm_dependencies(struct xe_sched_job *job, 1074 - struct xe_range_fence_tree *rftree, 1075 - u64 start, u64 last) 1028 + struct xe_vm *vm, 1029 + struct xe_vma_ops *vops, 1030 + struct xe_vm_pgtable_update_ops *pt_update_ops, 1031 + struct xe_range_fence_tree 
*rftree) 1076 1032 { 1077 1033 struct xe_range_fence *rtfence; 1078 1034 struct dma_fence *fence; 1079 - int err; 1035 + struct xe_vma_op *op; 1036 + int err = 0, i; 1080 1037 1081 - rtfence = xe_range_fence_tree_first(rftree, start, last); 1038 + xe_vm_assert_held(vm); 1039 + 1040 + if (!job && !no_in_syncs(vops->syncs, vops->num_syncs)) 1041 + return -ETIME; 1042 + 1043 + if (!job && !xe_exec_queue_is_idle(pt_update_ops->q)) 1044 + return -ETIME; 1045 + 1046 + if (pt_update_ops->wait_vm_bookkeep || pt_update_ops->wait_vm_kernel) { 1047 + err = job_test_add_deps(job, xe_vm_resv(vm), 1048 + pt_update_ops->wait_vm_bookkeep ? 1049 + DMA_RESV_USAGE_BOOKKEEP : 1050 + DMA_RESV_USAGE_KERNEL); 1051 + if (err) 1052 + return err; 1053 + } 1054 + 1055 + rtfence = xe_range_fence_tree_first(rftree, pt_update_ops->start, 1056 + pt_update_ops->last); 1082 1057 while (rtfence) { 1083 1058 fence = rtfence->fence; 1084 1059 ··· 1138 1029 return err; 1139 1030 } 1140 1031 1141 - rtfence = xe_range_fence_tree_next(rtfence, start, last); 1032 + rtfence = xe_range_fence_tree_next(rtfence, 1033 + pt_update_ops->start, 1034 + pt_update_ops->last); 1142 1035 } 1143 1036 1144 - return 0; 1037 + list_for_each_entry(op, &vops->list, link) { 1038 + err = op_add_deps(vm, op, job); 1039 + if (err) 1040 + return err; 1041 + } 1042 + 1043 + if (job) 1044 + err = xe_sched_job_last_fence_add_dep(job, vm); 1045 + else 1046 + err = xe_exec_queue_last_fence_test_dep(pt_update_ops->q, vm); 1047 + 1048 + for (i = 0; job && !err && i < vops->num_syncs; i++) 1049 + err = xe_sync_entry_add_deps(&vops->syncs[i], job); 1050 + 1051 + return err; 1145 1052 } 1146 1053 1147 1054 static int xe_pt_pre_commit(struct xe_migrate_pt_update *pt_update) 1148 1055 { 1149 - struct xe_range_fence_tree *rftree = 1150 - &xe_vma_vm(pt_update->vma)->rftree[pt_update->tile_id]; 1056 + struct xe_vma_ops *vops = pt_update->vops; 1057 + struct xe_vm *vm = vops->vm; 1058 + struct xe_range_fence_tree *rftree = 
&vm->rftree[pt_update->tile_id]; 1059 + struct xe_vm_pgtable_update_ops *pt_update_ops = 1060 + &vops->pt_update_ops[pt_update->tile_id]; 1151 1061 1152 - return xe_pt_vm_dependencies(pt_update->job, rftree, 1153 - pt_update->start, pt_update->last); 1062 + return xe_pt_vm_dependencies(pt_update->job, vm, pt_update->vops, 1063 + pt_update_ops, rftree); 1154 1064 } 1155 1065 1156 - static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update) 1066 + #ifdef CONFIG_DRM_XE_USERPTR_INVAL_INJECT 1067 + 1068 + static bool xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma) 1157 1069 { 1158 - struct xe_pt_migrate_pt_update *userptr_update = 1159 - container_of(pt_update, typeof(*userptr_update), base); 1160 - struct xe_userptr_vma *uvma = to_userptr_vma(pt_update->vma); 1161 - unsigned long notifier_seq = uvma->userptr.notifier_seq; 1162 - struct xe_vm *vm = xe_vma_vm(&uvma->vma); 1163 - int err = xe_pt_vm_dependencies(pt_update->job, 1164 - &vm->rftree[pt_update->tile_id], 1165 - pt_update->start, 1166 - pt_update->last); 1070 + u32 divisor = uvma->userptr.divisor ? uvma->userptr.divisor : 2; 1071 + static u32 count; 1167 1072 1168 - if (err) 1169 - return err; 1170 - 1171 - userptr_update->locked = false; 1172 - 1173 - /* 1174 - * Wait until nobody is running the invalidation notifier, and 1175 - * since we're exiting the loop holding the notifier lock, 1176 - * nobody can proceed invalidating either. 1177 - * 1178 - * Note that we don't update the vma->userptr.notifier_seq since 1179 - * we don't update the userptr pages. 
1180 - */ 1181 - do { 1182 - down_read(&vm->userptr.notifier_lock); 1183 - if (!mmu_interval_read_retry(&uvma->userptr.notifier, 1184 - notifier_seq)) 1185 - break; 1186 - 1187 - up_read(&vm->userptr.notifier_lock); 1188 - 1189 - if (userptr_update->bind) 1190 - return -EAGAIN; 1191 - 1192 - notifier_seq = mmu_interval_read_begin(&uvma->userptr.notifier); 1193 - } while (true); 1194 - 1195 - /* Inject errors to test_whether they are handled correctly */ 1196 - if (userptr_update->bind && xe_pt_userptr_inject_eagain(uvma)) { 1197 - up_read(&vm->userptr.notifier_lock); 1198 - return -EAGAIN; 1073 + if (count++ % divisor == divisor - 1) { 1074 + uvma->userptr.divisor = divisor << 1; 1075 + return true; 1199 1076 } 1200 1077 1201 - userptr_update->locked = true; 1078 + return false; 1079 + } 1080 + 1081 + #else 1082 + 1083 + static bool xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma) 1084 + { 1085 + return false; 1086 + } 1087 + 1088 + #endif 1089 + 1090 + static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma, 1091 + struct xe_vm_pgtable_update_ops *pt_update) 1092 + { 1093 + struct xe_userptr_vma *uvma; 1094 + unsigned long notifier_seq; 1095 + 1096 + lockdep_assert_held_read(&vm->userptr.notifier_lock); 1097 + 1098 + if (!xe_vma_is_userptr(vma)) 1099 + return 0; 1100 + 1101 + uvma = to_userptr_vma(vma); 1102 + notifier_seq = uvma->userptr.notifier_seq; 1103 + 1104 + if (uvma->userptr.initial_bind && !xe_vm_in_fault_mode(vm)) 1105 + return 0; 1106 + 1107 + if (!mmu_interval_read_retry(&uvma->userptr.notifier, 1108 + notifier_seq) && 1109 + !xe_pt_userptr_inject_eagain(uvma)) 1110 + return 0; 1111 + 1112 + if (xe_vm_in_fault_mode(vm)) { 1113 + return -EAGAIN; 1114 + } else { 1115 + spin_lock(&vm->userptr.invalidated_lock); 1116 + list_move_tail(&uvma->userptr.invalidate_link, 1117 + &vm->userptr.invalidated); 1118 + spin_unlock(&vm->userptr.invalidated_lock); 1119 + 1120 + if (xe_vm_in_preempt_fence_mode(vm)) { 1121 + struct dma_resv_iter cursor; 
1122 + struct dma_fence *fence; 1123 + long err; 1124 + 1125 + dma_resv_iter_begin(&cursor, xe_vm_resv(vm), 1126 + DMA_RESV_USAGE_BOOKKEEP); 1127 + dma_resv_for_each_fence_unlocked(&cursor, fence) 1128 + dma_fence_enable_sw_signaling(fence); 1129 + dma_resv_iter_end(&cursor); 1130 + 1131 + err = dma_resv_wait_timeout(xe_vm_resv(vm), 1132 + DMA_RESV_USAGE_BOOKKEEP, 1133 + false, MAX_SCHEDULE_TIMEOUT); 1134 + XE_WARN_ON(err <= 0); 1135 + } 1136 + } 1202 1137 1203 1138 return 0; 1204 1139 } 1205 1140 1206 - static const struct xe_migrate_pt_update_ops bind_ops = { 1207 - .populate = xe_vm_populate_pgtable, 1208 - .pre_commit = xe_pt_pre_commit, 1209 - }; 1141 + static int op_check_userptr(struct xe_vm *vm, struct xe_vma_op *op, 1142 + struct xe_vm_pgtable_update_ops *pt_update) 1143 + { 1144 + int err = 0; 1210 1145 1211 - static const struct xe_migrate_pt_update_ops userptr_bind_ops = { 1212 - .populate = xe_vm_populate_pgtable, 1213 - .pre_commit = xe_pt_userptr_pre_commit, 1214 - }; 1146 + lockdep_assert_held_read(&vm->userptr.notifier_lock); 1147 + 1148 + switch (op->base.op) { 1149 + case DRM_GPUVA_OP_MAP: 1150 + if (!op->map.immediate && xe_vm_in_fault_mode(vm)) 1151 + break; 1152 + 1153 + err = vma_check_userptr(vm, op->map.vma, pt_update); 1154 + break; 1155 + case DRM_GPUVA_OP_REMAP: 1156 + if (op->remap.prev) 1157 + err = vma_check_userptr(vm, op->remap.prev, pt_update); 1158 + if (!err && op->remap.next) 1159 + err = vma_check_userptr(vm, op->remap.next, pt_update); 1160 + break; 1161 + case DRM_GPUVA_OP_UNMAP: 1162 + break; 1163 + case DRM_GPUVA_OP_PREFETCH: 1164 + err = vma_check_userptr(vm, gpuva_to_vma(op->base.prefetch.va), 1165 + pt_update); 1166 + break; 1167 + default: 1168 + drm_warn(&vm->xe->drm, "NOT POSSIBLE"); 1169 + } 1170 + 1171 + return err; 1172 + } 1173 + 1174 + static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update) 1175 + { 1176 + struct xe_vm *vm = pt_update->vops->vm; 1177 + struct xe_vma_ops *vops = 
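The new `xe_pt_userptr_inject_eagain()` helper above uses a divisor-doubling counter so injected -EAGAIN faults become exponentially rarer as testing proceeds. A minimal standalone model of that scheme (the merged counter/divisor struct and names are illustrative, not the driver's; in the driver the count is a function-local static and the divisor lives per-userptr):

```c
#include <assert.h>
#include <stdint.h>

/* Standalone model of the divisor-doubling fault injector: fire on every
 * divisor-th call, then double the divisor so the next injected fault
 * takes twice as many calls to reach. */
struct inject_state {
	uint32_t divisor;	/* 0 means "not initialized yet" */
	uint32_t count;
};

static int should_inject(struct inject_state *s)
{
	uint32_t divisor = s->divisor ? s->divisor : 2;

	if (s->count++ % divisor == divisor - 1) {
		s->divisor = divisor << 1;	/* back off exponentially */
		return 1;
	}
	return 0;
}
```

Starting from a fresh state this fires on calls 2, 4, 8, 16, …, which keeps the -EAGAIN retry path exercised without drowning the common path.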
pt_update->vops; 1178 + struct xe_vm_pgtable_update_ops *pt_update_ops = 1179 + &vops->pt_update_ops[pt_update->tile_id]; 1180 + struct xe_vma_op *op; 1181 + int err; 1182 + 1183 + err = xe_pt_pre_commit(pt_update); 1184 + if (err) 1185 + return err; 1186 + 1187 + down_read(&vm->userptr.notifier_lock); 1188 + 1189 + list_for_each_entry(op, &vops->list, link) { 1190 + err = op_check_userptr(vm, op, pt_update_ops); 1191 + if (err) { 1192 + up_read(&vm->userptr.notifier_lock); 1193 + break; 1194 + } 1195 + } 1196 + 1197 + return err; 1198 + } 1215 1199 1216 1200 struct invalidation_fence { 1217 1201 struct xe_gt_tlb_invalidation_fence base; ··· 1315 1113 u64 start; 1316 1114 u64 end; 1317 1115 u32 asid; 1318 - }; 1319 - 1320 - static const char * 1321 - invalidation_fence_get_driver_name(struct dma_fence *dma_fence) 1322 - { 1323 - return "xe"; 1324 - } 1325 - 1326 - static const char * 1327 - invalidation_fence_get_timeline_name(struct dma_fence *dma_fence) 1328 - { 1329 - return "invalidation_fence"; 1330 - } 1331 - 1332 - static const struct dma_fence_ops invalidation_fence_ops = { 1333 - .get_driver_name = invalidation_fence_get_driver_name, 1334 - .get_timeline_name = invalidation_fence_get_timeline_name, 1335 1116 }; 1336 1117 1337 1118 static void invalidation_fence_cb(struct dma_fence *fence, ··· 1346 1161 ifence->end, ifence->asid); 1347 1162 } 1348 1163 1349 - static int invalidation_fence_init(struct xe_gt *gt, 1350 - struct invalidation_fence *ifence, 1351 - struct dma_fence *fence, 1352 - u64 start, u64 end, u32 asid) 1164 + static void invalidation_fence_init(struct xe_gt *gt, 1165 + struct invalidation_fence *ifence, 1166 + struct dma_fence *fence, 1167 + u64 start, u64 end, u32 asid) 1353 1168 { 1354 1169 int ret; 1355 1170 1356 1171 trace_xe_gt_tlb_invalidation_fence_create(gt_to_xe(gt), &ifence->base); 1357 1172 1358 - spin_lock_irq(&gt->tlb_invalidation.lock); 1359 - dma_fence_init(&ifence->base.base, &invalidation_fence_ops, 1360 - 
&gt->tlb_invalidation.lock, 1361 - dma_fence_context_alloc(1), 1); 1362 - spin_unlock_irq(&gt->tlb_invalidation.lock); 1173 + xe_gt_tlb_invalidation_fence_init(gt, &ifence->base, false); 1363 1174 1364 - INIT_LIST_HEAD(&ifence->base.link); 1365 - 1366 - dma_fence_get(&ifence->base.base); /* Ref for caller */ 1367 1175 ifence->fence = fence; 1368 1176 ifence->gt = gt; 1369 1177 ifence->start = start; ··· 1374 1196 } 1375 1197 1376 1198 xe_gt_assert(gt, !ret || ret == -ENOENT); 1377 - 1378 - return ret && ret != -ENOENT ? ret : 0; 1379 - } 1380 - 1381 - static void xe_pt_calc_rfence_interval(struct xe_vma *vma, 1382 - struct xe_pt_migrate_pt_update *update, 1383 - struct xe_vm_pgtable_update *entries, 1384 - u32 num_entries) 1385 - { 1386 - int i, level = 0; 1387 - 1388 - for (i = 0; i < num_entries; i++) { 1389 - const struct xe_vm_pgtable_update *entry = &entries[i]; 1390 - 1391 - if (entry->pt->level > level) 1392 - level = entry->pt->level; 1393 - } 1394 - 1395 - /* Greedy (non-optimal) calculation but simple */ 1396 - update->base.start = ALIGN_DOWN(xe_vma_start(vma), 1397 - 0x1ull << xe_pt_shift(level)); 1398 - update->base.last = ALIGN(xe_vma_end(vma), 1399 - 0x1ull << xe_pt_shift(level)) - 1; 1400 - } 1401 - 1402 - /** 1403 - * __xe_pt_bind_vma() - Build and connect a page-table tree for the vma 1404 - * address range. 1405 - * @tile: The tile to bind for. 1406 - * @vma: The vma to bind. 1407 - * @q: The exec_queue with which to do pipelined page-table updates. 1408 - * @syncs: Entries to sync on before binding the built tree to the live vm tree. 1409 - * @num_syncs: Number of @sync entries. 1410 - * @rebind: Whether we're rebinding this vma to the same address range without 1411 - * an unbind in-between. 1412 - * 1413 - * This function builds a page-table tree (see xe_pt_stage_bind() for more 1414 - * information on page-table building), and the xe_vm_pgtable_update entries 1415 - * abstracting the operations needed to attach it to the main vm tree. 
It 1416 - * then takes the relevant locks and updates the metadata side of the main 1417 - * vm tree and submits the operations for pipelined attachment of the 1418 - * gpu page-table to the vm main tree, (which can be done either by the 1419 - * cpu and the GPU). 1420 - * 1421 - * Return: A valid dma-fence representing the pipelined attachment operation 1422 - * on success, an error pointer on error. 1423 - */ 1424 - struct dma_fence * 1425 - __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue *q, 1426 - struct xe_sync_entry *syncs, u32 num_syncs, 1427 - bool rebind) 1428 - { 1429 - struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1]; 1430 - struct xe_pt_migrate_pt_update bind_pt_update = { 1431 - .base = { 1432 - .ops = xe_vma_is_userptr(vma) ? &userptr_bind_ops : &bind_ops, 1433 - .vma = vma, 1434 - .tile_id = tile->id, 1435 - }, 1436 - .bind = true, 1437 - }; 1438 - struct xe_vm *vm = xe_vma_vm(vma); 1439 - u32 num_entries; 1440 - struct dma_fence *fence; 1441 - struct invalidation_fence *ifence = NULL; 1442 - struct xe_range_fence *rfence; 1443 - int err; 1444 - 1445 - bind_pt_update.locked = false; 1446 - xe_bo_assert_held(xe_vma_bo(vma)); 1447 - xe_vm_assert_held(vm); 1448 - 1449 - vm_dbg(&xe_vma_vm(vma)->xe->drm, 1450 - "Preparing bind, with range [%llx...%llx) engine %p.\n", 1451 - xe_vma_start(vma), xe_vma_end(vma), q); 1452 - 1453 - err = xe_pt_prepare_bind(tile, vma, entries, &num_entries); 1454 - if (err) 1455 - goto err; 1456 - 1457 - err = dma_resv_reserve_fences(xe_vm_resv(vm), 1); 1458 - if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm) 1459 - err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1); 1460 - if (err) 1461 - goto err; 1462 - 1463 - xe_tile_assert(tile, num_entries <= ARRAY_SIZE(entries)); 1464 - 1465 - xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries); 1466 - xe_pt_calc_rfence_interval(vma, &bind_pt_update, entries, 1467 - num_entries); 1468 - 1469 - /* 1470 - * If 
rebind, we have to invalidate TLB on !LR vms to invalidate 1471 - * cached PTEs point to freed memory. on LR vms this is done 1472 - * automatically when the context is re-enabled by the rebind worker, 1473 - * or in fault mode it was invalidated on PTE zapping. 1474 - * 1475 - * If !rebind, and scratch enabled VMs, there is a chance the scratch 1476 - * PTE is already cached in the TLB so it needs to be invalidated. 1477 - * on !LR VMs this is done in the ring ops preceding a batch, but on 1478 - * non-faulting LR, in particular on user-space batch buffer chaining, 1479 - * it needs to be done here. 1480 - */ 1481 - if ((!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) { 1482 - ifence = kzalloc(sizeof(*ifence), GFP_KERNEL); 1483 - if (!ifence) 1484 - return ERR_PTR(-ENOMEM); 1485 - } else if (rebind && !xe_vm_in_lr_mode(vm)) { 1486 - /* We bump also if batch_invalidate_tlb is true */ 1487 - vm->tlb_flush_seqno++; 1488 - } 1489 - 1490 - rfence = kzalloc(sizeof(*rfence), GFP_KERNEL); 1491 - if (!rfence) { 1492 - kfree(ifence); 1493 - return ERR_PTR(-ENOMEM); 1494 - } 1495 - 1496 - fence = xe_migrate_update_pgtables(tile->migrate, 1497 - vm, xe_vma_bo(vma), q, 1498 - entries, num_entries, 1499 - syncs, num_syncs, 1500 - &bind_pt_update.base); 1501 - if (!IS_ERR(fence)) { 1502 - bool last_munmap_rebind = vma->gpuva.flags & XE_VMA_LAST_REBIND; 1503 - LLIST_HEAD(deferred); 1504 - int err; 1505 - 1506 - err = xe_range_fence_insert(&vm->rftree[tile->id], rfence, 1507 - &xe_range_fence_kfree_ops, 1508 - bind_pt_update.base.start, 1509 - bind_pt_update.base.last, fence); 1510 - if (err) 1511 - dma_fence_wait(fence, false); 1512 - 1513 - /* TLB invalidation must be done before signaling rebind */ 1514 - if (ifence) { 1515 - int err = invalidation_fence_init(tile->primary_gt, 1516 - ifence, fence, 1517 - xe_vma_start(vma), 1518 - xe_vma_end(vma), 1519 - xe_vma_vm(vma)->usm.asid); 1520 - if (err) { 1521 - dma_fence_put(fence); 1522 - kfree(ifence); 1523 - 
return ERR_PTR(err); 1524 - } 1525 - fence = &ifence->base.base; 1526 - } 1527 - 1528 - /* add shared fence now for pagetable delayed destroy */ 1529 - dma_resv_add_fence(xe_vm_resv(vm), fence, rebind || 1530 - last_munmap_rebind ? 1531 - DMA_RESV_USAGE_KERNEL : 1532 - DMA_RESV_USAGE_BOOKKEEP); 1533 - 1534 - if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm) 1535 - dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence, 1536 - DMA_RESV_USAGE_BOOKKEEP); 1537 - xe_pt_commit_bind(vma, entries, num_entries, rebind, 1538 - bind_pt_update.locked ? &deferred : NULL); 1539 - 1540 - /* This vma is live (again?) now */ 1541 - vma->tile_present |= BIT(tile->id); 1542 - 1543 - if (bind_pt_update.locked) { 1544 - to_userptr_vma(vma)->userptr.initial_bind = true; 1545 - up_read(&vm->userptr.notifier_lock); 1546 - xe_bo_put_commit(&deferred); 1547 - } 1548 - if (!rebind && last_munmap_rebind && 1549 - xe_vm_in_preempt_fence_mode(vm)) 1550 - xe_vm_queue_rebind_worker(vm); 1551 - } else { 1552 - kfree(rfence); 1553 - kfree(ifence); 1554 - if (bind_pt_update.locked) 1555 - up_read(&vm->userptr.notifier_lock); 1556 - xe_pt_abort_bind(vma, entries, num_entries); 1557 - } 1558 - 1559 - return fence; 1560 - 1561 - err: 1562 - return ERR_PTR(err); 1563 1199 } 1564 1200 1565 1201 struct xe_pt_stage_unbind_walk { ··· 1458 1466 struct xe_pt *xe_child = container_of(*child, typeof(*xe_child), base); 1459 1467 pgoff_t end_offset; 1460 1468 u64 size = 1ull << walk->shifts[--level]; 1469 + int err; 1461 1470 1462 1471 if (!IS_ALIGNED(addr, size)) 1463 1472 addr = xe_walk->modified_start; ··· 1474 1481 &end_offset)) 1475 1482 return 0; 1476 1483 1477 - (void)xe_pt_new_shared(&xe_walk->wupd, xe_child, offset, false); 1484 + err = xe_pt_new_shared(&xe_walk->wupd, xe_child, offset, true); 1485 + if (err) 1486 + return err; 1487 + 1478 1488 xe_walk->wupd.updates[level].update->qwords = end_offset - offset; 1479 1489 1480 1490 return 0; ··· 1530 1534 void *ptr, u32 qword_ofs, u32 num_qwords, 1531 
1535 const struct xe_vm_pgtable_update *update) 1532 1536 { 1533 - struct xe_vma *vma = pt_update->vma; 1534 - u64 empty = __xe_pt_empty_pte(tile, xe_vma_vm(vma), update->pt->level); 1537 + struct xe_vm *vm = pt_update->vops->vm; 1538 + u64 empty = __xe_pt_empty_pte(tile, vm, update->pt->level); 1535 1539 int i; 1536 1540 1537 1541 if (map && map->is_iomem) ··· 1545 1549 memset64(ptr, empty, num_qwords); 1546 1550 } 1547 1551 1548 - static void 1549 - xe_pt_commit_unbind(struct xe_vma *vma, 1550 - struct xe_vm_pgtable_update *entries, u32 num_entries, 1551 - struct llist_head *deferred) 1552 + static void xe_pt_abort_unbind(struct xe_vma *vma, 1553 + struct xe_vm_pgtable_update *entries, 1554 + u32 num_entries) 1552 1555 { 1553 - u32 j; 1556 + int i, j; 1554 1557 1555 1558 xe_pt_commit_locks_assert(vma); 1556 1559 1557 - for (j = 0; j < num_entries; ++j) { 1558 - struct xe_vm_pgtable_update *entry = &entries[j]; 1560 + for (i = num_entries - 1; i >= 0; --i) { 1561 + struct xe_vm_pgtable_update *entry = &entries[i]; 1559 1562 struct xe_pt *pt = entry->pt; 1563 + struct xe_pt_dir *pt_dir = as_xe_pt_dir(pt); 1564 + 1565 + pt->num_live += entry->qwords; 1566 + 1567 + if (!pt->level) 1568 + continue; 1569 + 1570 + for (j = entry->ofs; j < entry->ofs + entry->qwords; j++) 1571 + pt_dir->children[j] = 1572 + entries[i].pt_entries[j - entry->ofs].pt ? 
1573 + &entries[i].pt_entries[j - entry->ofs].pt->base : NULL; 1574 + } 1575 + } 1576 + 1577 + static void 1578 + xe_pt_commit_prepare_unbind(struct xe_vma *vma, 1579 + struct xe_vm_pgtable_update *entries, 1580 + u32 num_entries) 1581 + { 1582 + int i, j; 1583 + 1584 + xe_pt_commit_locks_assert(vma); 1585 + 1586 + for (i = 0; i < num_entries; ++i) { 1587 + struct xe_vm_pgtable_update *entry = &entries[i]; 1588 + struct xe_pt *pt = entry->pt; 1589 + struct xe_pt_dir *pt_dir; 1560 1590 1561 1591 pt->num_live -= entry->qwords; 1562 - if (pt->level) { 1563 - struct xe_pt_dir *pt_dir = as_xe_pt_dir(pt); 1564 - u32 i; 1592 + if (!pt->level) 1593 + continue; 1565 1594 1566 - for (i = entry->ofs; i < entry->ofs + entry->qwords; 1567 - i++) { 1568 - if (xe_pt_entry(pt_dir, i)) 1569 - xe_pt_destroy(xe_pt_entry(pt_dir, i), 1570 - xe_vma_vm(vma)->flags, deferred); 1571 - 1572 - pt_dir->children[i] = NULL; 1573 - } 1595 + pt_dir = as_xe_pt_dir(pt); 1596 + for (j = entry->ofs; j < entry->ofs + entry->qwords; j++) { 1597 + entry->pt_entries[j - entry->ofs].pt = 1598 + xe_pt_entry(pt_dir, j); 1599 + pt_dir->children[j] = NULL; 1574 1600 } 1575 1601 } 1576 1602 } 1577 1603 1578 - static const struct xe_migrate_pt_update_ops unbind_ops = { 1579 - .populate = xe_migrate_clear_pgtable_callback, 1580 - .pre_commit = xe_pt_pre_commit, 1581 - }; 1582 - 1583 - static const struct xe_migrate_pt_update_ops userptr_unbind_ops = { 1584 - .populate = xe_migrate_clear_pgtable_callback, 1585 - .pre_commit = xe_pt_userptr_pre_commit, 1586 - }; 1587 - 1588 - /** 1589 - * __xe_pt_unbind_vma() - Disconnect and free a page-table tree for the vma 1590 - * address range. 1591 - * @tile: The tile to unbind for. 1592 - * @vma: The vma to unbind. 1593 - * @q: The exec_queue with which to do pipelined page-table updates. 1594 - * @syncs: Entries to sync on before disconnecting the tree to be destroyed. 1595 - * @num_syncs: Number of @sync entries. 
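The `xe_pt_commit_prepare_unbind()` / `xe_pt_abort_unbind()` pair above follows a stash-and-restore discipline: prepare detaches child pointers while saving the old values into the update entries, and abort walks the entries in reverse to put back exactly what prepare tore down. A hedged sketch of that pattern (types, bounds, and names here are illustrative, not the driver's):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QWORDS 8

/* One staged update: which child slots were cleared, and what they held. */
struct pt_update_entry {
	int ofs, qwords;
	void *stash[MAX_QWORDS];	/* old children saved by prepare */
};

/* Detach children[ofs..ofs+qwords), remembering the old pointers. */
static void prepare_unbind(void **children, struct pt_update_entry *e)
{
	for (int j = e->ofs; j < e->ofs + e->qwords; j++) {
		e->stash[j - e->ofs] = children[j];
		children[j] = NULL;
	}
}

/* Undo every staged entry in reverse order, restoring saved pointers. */
static void abort_unbind(void **children, struct pt_update_entry *entries,
			 int num_entries)
{
	for (int i = num_entries - 1; i >= 0; i--)
		for (int j = entries[i].ofs;
		     j < entries[i].ofs + entries[i].qwords; j++)
			children[j] = entries[i].stash[j - entries[i].ofs];
}
```

Undoing newest-first matters when later entries overlap or depend on earlier ones: each abort step reinstates the state the corresponding prepare step observed.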
1596 - * 1597 - * This function builds a the xe_vm_pgtable_update entries abstracting the 1598 - * operations needed to detach the page-table tree to be destroyed from the 1599 - * man vm tree. 1600 - * It then takes the relevant locks and submits the operations for 1601 - * pipelined detachment of the gpu page-table from the vm main tree, 1602 - * (which can be done either by the cpu and the GPU), Finally it frees the 1603 - * detached page-table tree. 1604 - * 1605 - * Return: A valid dma-fence representing the pipelined detachment operation 1606 - * on success, an error pointer on error. 1607 - */ 1608 - struct dma_fence * 1609 - __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue *q, 1610 - struct xe_sync_entry *syncs, u32 num_syncs) 1604 + static void 1605 + xe_pt_update_ops_rfence_interval(struct xe_vm_pgtable_update_ops *pt_update_ops, 1606 + struct xe_vma *vma) 1611 1607 { 1612 - struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1]; 1613 - struct xe_pt_migrate_pt_update unbind_pt_update = { 1614 - .base = { 1615 - .ops = xe_vma_is_userptr(vma) ? 
&userptr_unbind_ops : 1616 - &unbind_ops, 1617 - .vma = vma, 1618 - .tile_id = tile->id, 1619 - }, 1620 - }; 1621 - struct xe_vm *vm = xe_vma_vm(vma); 1622 - u32 num_entries; 1623 - struct dma_fence *fence = NULL; 1624 - struct invalidation_fence *ifence; 1625 - struct xe_range_fence *rfence; 1608 + u32 current_op = pt_update_ops->current_op; 1609 + struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op]; 1610 + int i, level = 0; 1611 + u64 start, last; 1612 + 1613 + for (i = 0; i < pt_op->num_entries; i++) { 1614 + const struct xe_vm_pgtable_update *entry = &pt_op->entries[i]; 1615 + 1616 + if (entry->pt->level > level) 1617 + level = entry->pt->level; 1618 + } 1619 + 1620 + /* Greedy (non-optimal) calculation but simple */ 1621 + start = ALIGN_DOWN(xe_vma_start(vma), 0x1ull << xe_pt_shift(level)); 1622 + last = ALIGN(xe_vma_end(vma), 0x1ull << xe_pt_shift(level)) - 1; 1623 + 1624 + if (start < pt_update_ops->start) 1625 + pt_update_ops->start = start; 1626 + if (last > pt_update_ops->last) 1627 + pt_update_ops->last = last; 1628 + } 1629 + 1630 + static int vma_reserve_fences(struct xe_device *xe, struct xe_vma *vma) 1631 + { 1632 + if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm) 1633 + return dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1634 + xe->info.tile_count); 1635 + 1636 + return 0; 1637 + } 1638 + 1639 + static int bind_op_prepare(struct xe_vm *vm, struct xe_tile *tile, 1640 + struct xe_vm_pgtable_update_ops *pt_update_ops, 1641 + struct xe_vma *vma) 1642 + { 1643 + u32 current_op = pt_update_ops->current_op; 1644 + struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op]; 1626 1645 int err; 1627 1646 1628 - LLIST_HEAD(deferred); 1629 - 1630 1647 xe_bo_assert_held(xe_vma_bo(vma)); 1631 - xe_vm_assert_held(vm); 1632 1648 1633 1649 vm_dbg(&xe_vma_vm(vma)->xe->drm, 1634 - "Preparing unbind, with range [%llx...%llx) engine %p.\n", 1635 - xe_vma_start(vma), xe_vma_end(vma), q); 1650 + "Preparing bind, with range 
[%llx...%llx)\n", 1651 + xe_vma_start(vma), xe_vma_end(vma) - 1); 1636 1652 1637 - num_entries = xe_pt_stage_unbind(tile, vma, entries); 1638 - xe_tile_assert(tile, num_entries <= ARRAY_SIZE(entries)); 1653 + pt_op->vma = NULL; 1654 + pt_op->bind = true; 1655 + pt_op->rebind = BIT(tile->id) & vma->tile_present; 1639 1656 1640 - xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries); 1641 - xe_pt_calc_rfence_interval(vma, &unbind_pt_update, entries, 1642 - num_entries); 1643 - 1644 - err = dma_resv_reserve_fences(xe_vm_resv(vm), 1); 1645 - if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm) 1646 - err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1); 1657 + err = vma_reserve_fences(tile_to_xe(tile), vma); 1647 1658 if (err) 1648 - return ERR_PTR(err); 1659 + return err; 1649 1660 1650 - ifence = kzalloc(sizeof(*ifence), GFP_KERNEL); 1651 - if (!ifence) 1652 - return ERR_PTR(-ENOMEM); 1661 + err = xe_pt_prepare_bind(tile, vma, pt_op->entries, 1662 + &pt_op->num_entries); 1663 + if (!err) { 1664 + xe_tile_assert(tile, pt_op->num_entries <= 1665 + ARRAY_SIZE(pt_op->entries)); 1666 + xe_vm_dbg_print_entries(tile_to_xe(tile), pt_op->entries, 1667 + pt_op->num_entries, true); 1653 1668 1654 - rfence = kzalloc(sizeof(*rfence), GFP_KERNEL); 1655 - if (!rfence) { 1656 - kfree(ifence); 1657 - return ERR_PTR(-ENOMEM); 1669 + xe_pt_update_ops_rfence_interval(pt_update_ops, vma); 1670 + ++pt_update_ops->current_op; 1671 + pt_update_ops->needs_userptr_lock |= xe_vma_is_userptr(vma); 1672 + 1673 + /* 1674 + * If rebind, we have to invalidate TLB on !LR vms to invalidate 1675 + * cached PTEs point to freed memory. On LR vms this is done 1676 + * automatically when the context is re-enabled by the rebind worker, 1677 + * or in fault mode it was invalidated on PTE zapping. 1678 + * 1679 + * If !rebind, and scratch enabled VMs, there is a chance the scratch 1680 + * PTE is already cached in the TLB so it needs to be invalidated. 
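The greedy interval math in `xe_pt_update_ops_rfence_interval()` above rounds each VMA out to the granule of the largest page-table level it touches, then widens a single accumulated [start, last] envelope; that is why the ops struct is initialized to `start = ~0x0ull, last = 0x0ull`. A self-contained sketch of the accumulation (macro names and shift values are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_DOWN_U64(x, a)	((x) & ~((a) - 1))
#define ALIGN_UP_U64(x, a)	(((x) + (a) - 1) & ~((a) - 1))

struct rfence_interval {
	uint64_t start, last;	/* init to { ~0ull, 0 } before accumulating */
};

/* Round [vma_start, vma_end) out to the given level's granule, then grow
 * the envelope. Greedy (non-optimal) but simple, as the driver notes. */
static void interval_extend(struct rfence_interval *iv,
			    uint64_t vma_start, uint64_t vma_end,
			    unsigned int level_shift)
{
	uint64_t granule = 1ull << level_shift;
	uint64_t start = ALIGN_DOWN_U64(vma_start, granule);
	uint64_t last = ALIGN_UP_U64(vma_end, granule) - 1;

	if (start < iv->start)
		iv->start = start;
	if (last > iv->last)
		iv->last = last;
}
```

A VMA touching only 4 KiB leaves widens the envelope by at most a page, while one touching a 2 MiB directory level pulls the bounds out to that whole granule.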
1681 + * On !LR VMs this is done in the ring ops preceding a batch, but on 1682 + * non-faulting LR, in particular on user-space batch buffer chaining, 1683 + * it needs to be done here. 1684 + */ 1685 + if ((!pt_op->rebind && xe_vm_has_scratch(vm) && 1686 + xe_vm_in_preempt_fence_mode(vm))) 1687 + pt_update_ops->needs_invalidation = true; 1688 + else if (pt_op->rebind && !xe_vm_in_lr_mode(vm)) 1689 + /* We bump also if batch_invalidate_tlb is true */ 1690 + vm->tlb_flush_seqno++; 1691 + 1692 + vma->tile_staged |= BIT(tile->id); 1693 + pt_op->vma = vma; 1694 + xe_pt_commit_prepare_bind(vma, pt_op->entries, 1695 + pt_op->num_entries, pt_op->rebind); 1696 + } else { 1697 + xe_pt_cancel_bind(vma, pt_op->entries, pt_op->num_entries); 1698 + } 1699 + 1700 + return err; 1701 + } 1702 + 1703 + static int unbind_op_prepare(struct xe_tile *tile, 1704 + struct xe_vm_pgtable_update_ops *pt_update_ops, 1705 + struct xe_vma *vma) 1706 + { 1707 + u32 current_op = pt_update_ops->current_op; 1708 + struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op]; 1709 + int err; 1710 + 1711 + if (!((vma->tile_present | vma->tile_staged) & BIT(tile->id))) 1712 + return 0; 1713 + 1714 + xe_bo_assert_held(xe_vma_bo(vma)); 1715 + 1716 + vm_dbg(&xe_vma_vm(vma)->xe->drm, 1717 + "Preparing unbind, with range [%llx...%llx)\n", 1718 + xe_vma_start(vma), xe_vma_end(vma) - 1); 1719 + 1720 + /* 1721 + * Wait for invalidation to complete. Can corrupt internal page table 1722 + * state if an invalidation is running while preparing an unbind. 
1723 + */ 1724 + if (xe_vma_is_userptr(vma) && xe_vm_in_fault_mode(xe_vma_vm(vma))) 1725 + mmu_interval_read_begin(&to_userptr_vma(vma)->userptr.notifier); 1726 + 1727 + pt_op->vma = vma; 1728 + pt_op->bind = false; 1729 + pt_op->rebind = false; 1730 + 1731 + err = vma_reserve_fences(tile_to_xe(tile), vma); 1732 + if (err) 1733 + return err; 1734 + 1735 + pt_op->num_entries = xe_pt_stage_unbind(tile, vma, pt_op->entries); 1736 + 1737 + xe_vm_dbg_print_entries(tile_to_xe(tile), pt_op->entries, 1738 + pt_op->num_entries, false); 1739 + xe_pt_update_ops_rfence_interval(pt_update_ops, vma); 1740 + ++pt_update_ops->current_op; 1741 + pt_update_ops->needs_userptr_lock |= xe_vma_is_userptr(vma); 1742 + pt_update_ops->needs_invalidation = true; 1743 + 1744 + xe_pt_commit_prepare_unbind(vma, pt_op->entries, pt_op->num_entries); 1745 + 1746 + return 0; 1747 + } 1748 + 1749 + static int op_prepare(struct xe_vm *vm, 1750 + struct xe_tile *tile, 1751 + struct xe_vm_pgtable_update_ops *pt_update_ops, 1752 + struct xe_vma_op *op) 1753 + { 1754 + int err = 0; 1755 + 1756 + xe_vm_assert_held(vm); 1757 + 1758 + switch (op->base.op) { 1759 + case DRM_GPUVA_OP_MAP: 1760 + if (!op->map.immediate && xe_vm_in_fault_mode(vm)) 1761 + break; 1762 + 1763 + err = bind_op_prepare(vm, tile, pt_update_ops, op->map.vma); 1764 + pt_update_ops->wait_vm_kernel = true; 1765 + break; 1766 + case DRM_GPUVA_OP_REMAP: 1767 + err = unbind_op_prepare(tile, pt_update_ops, 1768 + gpuva_to_vma(op->base.remap.unmap->va)); 1769 + 1770 + if (!err && op->remap.prev) { 1771 + err = bind_op_prepare(vm, tile, pt_update_ops, 1772 + op->remap.prev); 1773 + pt_update_ops->wait_vm_bookkeep = true; 1774 + } 1775 + if (!err && op->remap.next) { 1776 + err = bind_op_prepare(vm, tile, pt_update_ops, 1777 + op->remap.next); 1778 + pt_update_ops->wait_vm_bookkeep = true; 1779 + } 1780 + break; 1781 + case DRM_GPUVA_OP_UNMAP: 1782 + err = unbind_op_prepare(tile, pt_update_ops, 1783 + gpuva_to_vma(op->base.unmap.va)); 1784 + 
break; 1785 + case DRM_GPUVA_OP_PREFETCH: 1786 + err = bind_op_prepare(vm, tile, pt_update_ops, 1787 + gpuva_to_vma(op->base.prefetch.va)); 1788 + pt_update_ops->wait_vm_kernel = true; 1789 + break; 1790 + default: 1791 + drm_warn(&vm->xe->drm, "NOT POSSIBLE"); 1792 + } 1793 + 1794 + return err; 1795 + } 1796 + 1797 + static void 1798 + xe_pt_update_ops_init(struct xe_vm_pgtable_update_ops *pt_update_ops) 1799 + { 1800 + init_llist_head(&pt_update_ops->deferred); 1801 + pt_update_ops->start = ~0x0ull; 1802 + pt_update_ops->last = 0x0ull; 1803 + } 1804 + 1805 + /** 1806 + * xe_pt_update_ops_prepare() - Prepare PT update operations 1807 + * @tile: Tile of PT update operations 1808 + * @vops: VMA operations 1809 + * 1810 + * Prepare PT update operations, which includes updating internal PT state, 1811 + * allocating memory for page tables, populating page tables being pruned, and 1812 + * creating PT update operations for leaf insertion / removal. 1813 + * 1814 +
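In `op_prepare()` above, a `DRM_GPUVA_OP_REMAP` expands into one unbind of the old mapping plus up to two rebinds for the surviving head (`prev`) and tail (`next`) around the remapped hole. A toy model of that decomposition, assuming half-open [start, end) ranges (names are illustrative, not the GPUVA manager's API):

```c
#include <assert.h>
#include <stdint.h>

struct range {
	uint64_t start, end;	/* half-open: [start, end) */
};

/* Split an existing mapping around a hole punched into it. Returns the
 * number of sub-ops produced: always 1 unbind, plus up to 2 re-binds. */
static int remap_decompose(struct range old, struct range hole,
			   struct range *prev, struct range *next)
{
	int num_ops = 1;	/* the unbind of the old mapping */

	if (hole.start > old.start) {	/* head of the mapping survives */
		*prev = (struct range){ old.start, hole.start };
		num_ops++;
	}
	if (hole.end < old.end) {	/* tail of the mapping survives */
		*next = (struct range){ hole.end, old.end };
		num_ops++;
	}
	return num_ops;
}
```

This mirrors why the REMAP case above first prepares the unbind and only then conditionally prepares `op->remap.prev` and `op->remap.next` binds.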
1815 + */ 1816 + int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops) 1817 + { 1818 + struct xe_vm_pgtable_update_ops *pt_update_ops = 1819 + &vops->pt_update_ops[tile->id]; 1820 + struct xe_vma_op *op; 1821 + int err; 1822 + 1823 + lockdep_assert_held(&vops->vm->lock); 1824 + xe_vm_assert_held(vops->vm); 1825 + 1826 + xe_pt_update_ops_init(pt_update_ops); 1827 + 1828 + err = dma_resv_reserve_fences(xe_vm_resv(vops->vm), 1829 + tile_to_xe(tile)->info.tile_count); 1830 + if (err) 1831 + return err; 1832 + 1833 + list_for_each_entry(op, &vops->list, link) { 1834 + err = op_prepare(vops->vm, tile, pt_update_ops, op); 1835 + 1836 + if (err) 1837 + return err; 1838 + } 1839 + 1840 + xe_tile_assert(tile, pt_update_ops->current_op <= 1841 + pt_update_ops->num_ops); 1842 + 1843 + #ifdef TEST_VM_OPS_ERROR 1844 + if (vops->inject_error && 1845 + vops->vm->xe->vm_inject_error_position == FORCE_OP_ERROR_PREPARE) 1846 + return -ENOSPC; 1847 + #endif 1848 + 1849 + return 0; 1850 + } 1851 + 1852 + static void bind_op_commit(struct xe_vm *vm, struct xe_tile *tile, 1853 + struct xe_vm_pgtable_update_ops *pt_update_ops, 1854 + struct xe_vma *vma, struct dma_fence *fence) 1855 + { 1856 + if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm) 1857 + dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence, 1858 + pt_update_ops->wait_vm_bookkeep ? 1859 + DMA_RESV_USAGE_KERNEL : 1860 + DMA_RESV_USAGE_BOOKKEEP); 1861 + vma->tile_present |= BIT(tile->id); 1862 + vma->tile_staged &= ~BIT(tile->id); 1863 + if (xe_vma_is_userptr(vma)) { 1864 + lockdep_assert_held_read(&vm->userptr.notifier_lock); 1865 + to_userptr_vma(vma)->userptr.initial_bind = true; 1658 1866 } 1659 1867 1660 1868 /* 1661 - * Even if we were already evicted and unbind to destroy, we need to 1662 - * clear again here. The eviction may have updated pagetables at a 1663 - * lower level, because it needs to be more conservative. 
···
+	 * Kick rebind worker if this bind triggers preempt fences and not in
+	 * the rebind worker
 	 */
-	fence = xe_migrate_update_pgtables(tile->migrate,
-					   vm, NULL, q ? q :
-					   vm->q[tile->id],
-					   entries, num_entries,
-					   syncs, num_syncs,
-					   &unbind_pt_update.base);
-	if (!IS_ERR(fence)) {
-		int err;
+	if (pt_update_ops->wait_vm_bookkeep &&
+	    xe_vm_in_preempt_fence_mode(vm) &&
+	    !current->mm)
+		xe_vm_queue_rebind_worker(vm);
+}

-		err = xe_range_fence_insert(&vm->rftree[tile->id], rfence,
-					    &xe_range_fence_kfree_ops,
-					    unbind_pt_update.base.start,
-					    unbind_pt_update.base.last, fence);
-		if (err)
-			dma_fence_wait(fence, false);
-
-		/* TLB invalidation must be done before signaling unbind */
-		err = invalidation_fence_init(tile->primary_gt, ifence, fence,
-					      xe_vma_start(vma),
-					      xe_vma_end(vma),
-					      xe_vma_vm(vma)->usm.asid);
-		if (err) {
-			dma_fence_put(fence);
-			kfree(ifence);
-			return ERR_PTR(err);
-		}
-		fence = &ifence->base.base;
-
-		/* add shared fence now for pagetable delayed destroy */
-		dma_resv_add_fence(xe_vm_resv(vm), fence,
+static void unbind_op_commit(struct xe_vm *vm, struct xe_tile *tile,
+			     struct xe_vm_pgtable_update_ops *pt_update_ops,
+			     struct xe_vma *vma, struct dma_fence *fence)
+{
+	if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
+		dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
+				   pt_update_ops->wait_vm_bookkeep ?
+				   DMA_RESV_USAGE_KERNEL :
 				   DMA_RESV_USAGE_BOOKKEEP);
-
-		/* This fence will be installed by caller when doing eviction */
-		if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm)
-			dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence,
-					   DMA_RESV_USAGE_BOOKKEEP);
-		xe_pt_commit_unbind(vma, entries, num_entries,
-				    unbind_pt_update.locked ? &deferred : NULL);
-		vma->tile_present &= ~BIT(tile->id);
-	} else {
-		kfree(rfence);
-		kfree(ifence);
-	}
-
-	if (!vma->tile_present)
+	vma->tile_present &= ~BIT(tile->id);
+	if (!vma->tile_present) {
 		list_del_init(&vma->combined_links.rebind);
+		if (xe_vma_is_userptr(vma)) {
+			lockdep_assert_held_read(&vm->userptr.notifier_lock);

-	if (unbind_pt_update.locked) {
-		xe_tile_assert(tile, xe_vma_is_userptr(vma));
-
-		if (!vma->tile_present) {
 			spin_lock(&vm->userptr.invalidated_lock);
 			list_del_init(&to_userptr_vma(vma)->userptr.invalidate_link);
 			spin_unlock(&vm->userptr.invalidated_lock);
 		}
-		up_read(&vm->userptr.notifier_lock);
-		xe_bo_put_commit(&deferred);
+	}
+}
+
+static void op_commit(struct xe_vm *vm,
+		      struct xe_tile *tile,
+		      struct xe_vm_pgtable_update_ops *pt_update_ops,
+		      struct xe_vma_op *op, struct dma_fence *fence)
+{
+	xe_vm_assert_held(vm);
+
+	switch (op->base.op) {
+	case DRM_GPUVA_OP_MAP:
+		if (!op->map.immediate && xe_vm_in_fault_mode(vm))
+			break;
+
+		bind_op_commit(vm, tile, pt_update_ops, op->map.vma, fence);
+		break;
+	case DRM_GPUVA_OP_REMAP:
+		unbind_op_commit(vm, tile, pt_update_ops,
+				 gpuva_to_vma(op->base.remap.unmap->va), fence);
+
+		if (op->remap.prev)
+			bind_op_commit(vm, tile, pt_update_ops, op->remap.prev,
+				       fence);
+		if (op->remap.next)
+			bind_op_commit(vm, tile, pt_update_ops, op->remap.next,
+				       fence);
+		break;
+	case DRM_GPUVA_OP_UNMAP:
+		unbind_op_commit(vm, tile, pt_update_ops,
+				 gpuva_to_vma(op->base.unmap.va), fence);
+		break;
+	case DRM_GPUVA_OP_PREFETCH:
+		bind_op_commit(vm, tile, pt_update_ops,
+			       gpuva_to_vma(op->base.prefetch.va), fence);
+		break;
+	default:
+		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
+	}
+}
+
+static const struct xe_migrate_pt_update_ops migrate_ops = {
+	.populate = xe_vm_populate_pgtable,
+	.clear = xe_migrate_clear_pgtable_callback,
+	.pre_commit = xe_pt_pre_commit,
+};
+
+static const struct xe_migrate_pt_update_ops userptr_migrate_ops = {
+	.populate = xe_vm_populate_pgtable,
+	.clear = xe_migrate_clear_pgtable_callback,
+	.pre_commit = xe_pt_userptr_pre_commit,
+};
+
+/**
+ * xe_pt_update_ops_run() - Run PT update operations
+ * @tile: Tile of PT update operations
+ * @vops: VMA operations
+ *
+ * Run PT update operations which includes committing internal PT state changes,
+ * creating job for PT update operations for leaf insertion / removal, and
+ * installing job fence in various places.
+ *
+ * Return: fence on success, negative ERR_PTR on error.
+ */
+struct dma_fence *
+xe_pt_update_ops_run(struct xe_tile *tile, struct xe_vma_ops *vops)
+{
+	struct xe_vm *vm = vops->vm;
+	struct xe_vm_pgtable_update_ops *pt_update_ops =
+		&vops->pt_update_ops[tile->id];
+	struct dma_fence *fence;
+	struct invalidation_fence *ifence = NULL;
+	struct xe_range_fence *rfence;
+	struct xe_vma_op *op;
+	int err = 0, i;
+	struct xe_migrate_pt_update update = {
+		.ops = pt_update_ops->needs_userptr_lock ?
+			&userptr_migrate_ops :
+			&migrate_ops,
+		.vops = vops,
+		.tile_id = tile->id,
+	};
+
+	lockdep_assert_held(&vm->lock);
+	xe_vm_assert_held(vm);
+
+	if (!pt_update_ops->current_op) {
+		xe_tile_assert(tile, xe_vm_in_fault_mode(vm));
+
+		return dma_fence_get_stub();
 	}

+#ifdef TEST_VM_OPS_ERROR
+	if (vops->inject_error &&
+	    vm->xe->vm_inject_error_position == FORCE_OP_ERROR_RUN)
+		return ERR_PTR(-ENOSPC);
+#endif
+
+	if (pt_update_ops->needs_invalidation) {
+		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
+		if (!ifence) {
+			err = -ENOMEM;
+			goto kill_vm_tile1;
+		}
+	}
+
+	rfence = kzalloc(sizeof(*rfence), GFP_KERNEL);
+	if (!rfence) {
+		err = -ENOMEM;
+		goto free_ifence;
+	}
+
+	fence = xe_migrate_update_pgtables(tile->migrate, &update);
+	if (IS_ERR(fence)) {
+		err = PTR_ERR(fence);
+		goto free_rfence;
+	}
+
+	/* Point of no return - VM killed if failure after this */
+	for (i = 0; i < pt_update_ops->current_op; ++i) {
+		struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
+
+		xe_pt_commit(pt_op->vma, pt_op->entries,
+			     pt_op->num_entries, &pt_update_ops->deferred);
+		pt_op->vma = NULL;	/* skip in xe_pt_update_ops_abort */
+	}
+
+	if (xe_range_fence_insert(&vm->rftree[tile->id], rfence,
+				  &xe_range_fence_kfree_ops,
+				  pt_update_ops->start,
+				  pt_update_ops->last, fence))
+		dma_fence_wait(fence, false);
+
+	/* tlb invalidation must be done before signaling rebind */
+	if (ifence) {
+		invalidation_fence_init(tile->primary_gt, ifence, fence,
+					pt_update_ops->start,
+					pt_update_ops->last, vm->usm.asid);
+		fence = &ifence->base.base;
+	}
+
+	dma_resv_add_fence(xe_vm_resv(vm), fence,
+			   pt_update_ops->wait_vm_bookkeep ?
+			   DMA_RESV_USAGE_KERNEL :
+			   DMA_RESV_USAGE_BOOKKEEP);
+
+	list_for_each_entry(op, &vops->list, link)
+		op_commit(vops->vm, tile, pt_update_ops, op, fence);
+
+	if (pt_update_ops->needs_userptr_lock)
+		up_read(&vm->userptr.notifier_lock);
+
 	return fence;
+
+free_rfence:
+	kfree(rfence);
+free_ifence:
+	kfree(ifence);
+kill_vm_tile1:
+	if (err != -EAGAIN && tile->id)
+		xe_vm_kill(vops->vm, false);
+
+	return ERR_PTR(err);
+}
+
+/**
+ * xe_pt_update_ops_fini() - Finish PT update operations
+ * @tile: Tile of PT update operations
+ * @vops: VMA operations
+ *
+ * Finish PT update operations by committing to destroy page table memory
+ */
+void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops)
+{
+	struct xe_vm_pgtable_update_ops *pt_update_ops =
+		&vops->pt_update_ops[tile->id];
+	int i;
+
+	lockdep_assert_held(&vops->vm->lock);
+	xe_vm_assert_held(vops->vm);
+
+	for (i = 0; i < pt_update_ops->current_op; ++i) {
+		struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[i];
+
+		xe_pt_free_bind(pt_op->entries, pt_op->num_entries);
+	}
+	xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
+}
+
+/**
+ * xe_pt_update_ops_abort() - Abort PT update operations
+ * @tile: Tile of PT update operations
+ * @vops: VMA operations
+ *
+ * Abort PT update operations by unwinding internal PT state
+ */
+void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops)
+{
+	struct xe_vm_pgtable_update_ops *pt_update_ops =
+		&vops->pt_update_ops[tile->id];
+	int i;
+
+	lockdep_assert_held(&vops->vm->lock);
+	xe_vm_assert_held(vops->vm);
+
+	for (i = pt_update_ops->num_ops - 1; i >= 0; --i) {
+		struct xe_vm_pgtable_update_op *pt_op =
+			&pt_update_ops->ops[i];
+
+		if (!pt_op->vma || i >= pt_update_ops->current_op)
+			continue;
+
+		if (pt_op->bind)
+			xe_pt_abort_bind(pt_op->vma, pt_op->entries,
+					 pt_op->num_entries,
+					 pt_op->rebind);
+		else
+			xe_pt_abort_unbind(pt_op->vma, pt_op->entries,
+					   pt_op->num_entries);
+	}
+
+	xe_bo_put_commit(&vops->pt_update_ops[tile->id].deferred);
 }
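The run/abort pair above follows a common staged-commit unwind pattern: operations are staged up front, the run path commits them in order and tags each committed slot (here by clearing `pt_op->vma`) so a later abort skips it, and the abort path walks the array in reverse, undoing only staged-but-uncommitted work. A minimal userspace C sketch of that pattern (the `op_set`/`ops_*` names and flags are illustrative, not the xe API):

```c
#include <assert.h>
#include <string.h>

#define MAX_OPS 4

/* One staged operation; `committed` tells ops_abort() to skip it. */
struct op { int staged; int committed; };

struct op_set {
	struct op ops[MAX_OPS];
	int num_ops;    /* entries staged by prepare */
	int current_op; /* entries committed so far by run */
};

static void ops_prepare(struct op_set *s, int n)
{
	memset(s, 0, sizeof(*s));
	s->num_ops = n;
	for (int i = 0; i < n; i++)
		s->ops[i].staged = 1;
}

static int ops_run(struct op_set *s, int fail_at)
{
	for (int i = 0; i < s->num_ops; i++) {
		if (i == fail_at)
			return -1;           /* simulated mid-run failure */
		s->ops[i].committed = 1;     /* tag: skip in ops_abort() */
		s->current_op = i + 1;
	}
	return 0;
}

static void ops_abort(struct op_set *s)
{
	/* Reverse walk: undo only staged state that was never committed. */
	for (int i = s->num_ops - 1; i >= 0; i--) {
		if (s->ops[i].committed)
			continue;
		s->ops[i].staged = 0;
	}
}
```

The reverse iteration in the abort path mirrors why `xe_pt_update_ops_abort()` counts down from `num_ops - 1`: later operations may depend on state created by earlier ones, so teardown must run in the opposite order from setup.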
+6 -8
drivers/gpu/drm/xe/xe_pt.h
···
 struct xe_tile;
 struct xe_vm;
 struct xe_vma;
+struct xe_vma_ops;

 /* Largest huge pte is currently 1GiB. May become device dependent. */
 #define MAX_HUGEPTE_LEVEL 2
···

 void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred);

-struct dma_fence *
-__xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue *q,
-		 struct xe_sync_entry *syncs, u32 num_syncs,
-		 bool rebind);
-
-struct dma_fence *
-__xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue *q,
-		   struct xe_sync_entry *syncs, u32 num_syncs);
+int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops);
+struct dma_fence *xe_pt_update_ops_run(struct xe_tile *tile,
+				       struct xe_vma_ops *vops);
+void xe_pt_update_ops_fini(struct xe_tile *tile, struct xe_vma_ops *vops);
+void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops);

 bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
+48
drivers/gpu/drm/xe/xe_pt_types.h
···
 	u32 flags;
 };

+/** struct xe_vm_pgtable_update_op - Page table update operation */
+struct xe_vm_pgtable_update_op {
+	/** @entries: entries to update for this operation */
+	struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1];
+	/** @vma: VMA for operation, operation not valid if NULL */
+	struct xe_vma *vma;
+	/** @num_entries: number of entries for this update operation */
+	u32 num_entries;
+	/** @bind: is a bind */
+	bool bind;
+	/** @rebind: is a rebind */
+	bool rebind;
+};
+
+/** struct xe_vm_pgtable_update_ops: page table update operations */
+struct xe_vm_pgtable_update_ops {
+	/** @ops: operations */
+	struct xe_vm_pgtable_update_op *ops;
+	/** @deferred: deferred list to destroy PT entries */
+	struct llist_head deferred;
+	/** @q: exec queue for PT operations */
+	struct xe_exec_queue *q;
+	/** @start: start address of ops */
+	u64 start;
+	/** @last: last address of ops */
+	u64 last;
+	/** @num_ops: number of operations */
+	u32 num_ops;
+	/** @current_op: current operations */
+	u32 current_op;
+	/** @needs_userptr_lock: Needs userptr lock */
+	bool needs_userptr_lock;
+	/** @needs_invalidation: Needs invalidation */
+	bool needs_invalidation;
+	/**
+	 * @wait_vm_bookkeep: PT operations need to wait until VM is idle
+	 * (bookkeep dma-resv slots are idle) and stage all future VM activity
+	 * behind these operations (install PT operations into VM kernel
+	 * dma-resv slot).
+	 */
+	bool wait_vm_bookkeep;
+	/**
+	 * @wait_vm_kernel: PT operations need to wait until VM kernel dma-resv
+	 * slots are idle.
+	 */
+	bool wait_vm_kernel;
+};
+
 #endif
+3 -1
drivers/gpu/drm/xe/xe_query.c
···
 	if (err)
 		return err;

-	topo.type = DRM_XE_TOPO_EU_PER_DSS;
+	topo.type = gt->fuse_topo.eu_type == XE_GT_EU_TYPE_SIMD16 ?
+		DRM_XE_TOPO_SIMD16_EU_PER_DSS :
+		DRM_XE_TOPO_EU_PER_DSS;
 	err = copy_mask(&query_ptr, &topo,
 			gt->fuse_topo.eu_mask_per_dss,
 			sizeof(gt->fuse_topo.eu_mask_per_dss));
+12 -30
drivers/gpu/drm/xe/xe_rtp.c
···
 	ctx->active_entries = active_entries;
 	ctx->n_entries = n_entries;
 }
+EXPORT_SYMBOL_IF_KUNIT(xe_rtp_process_ctx_enable_active_tracking);

 static void rtp_mark_active(struct xe_device *xe,
 			    struct xe_rtp_process_ctx *ctx,
-			    unsigned int first, unsigned int last)
+			    unsigned int idx)
 {
 	if (!ctx->active_entries)
 		return;

-	if (drm_WARN_ON(&xe->drm, last > ctx->n_entries))
+	if (drm_WARN_ON(&xe->drm, idx >= ctx->n_entries))
 		return;

-	if (first == last)
-		bitmap_set(ctx->active_entries, first, 1);
-	else
-		bitmap_set(ctx->active_entries, first, last - first + 2);
+	bitmap_set(ctx->active_entries, idx, 1);
 }

 /**
···
 	}

 	if (match)
-		rtp_mark_active(xe, ctx, entry - entries,
-				entry - entries);
 	}
 }
 EXPORT_SYMBOL_IF_KUNIT(xe_rtp_process_to_sr);
···
 * @entries: Table with RTP definitions
 *
 * Walk the table pointed by @entries (with an empty sentinel), executing the
- * rules. A few differences from xe_rtp_process_to_sr():
- *
- * 1. There is no action associated with each entry since this uses
- *    struct xe_rtp_entry. Its main use is for marking active workarounds via
- *    xe_rtp_process_ctx_enable_active_tracking().
- * 2. There is support for OR operations by having entries with no name.
+ * rules. One difference from xe_rtp_process_to_sr(): there is no action
+ * associated with each entry since this uses struct xe_rtp_entry. Its main use
+ * is for marking active workarounds via
+ * xe_rtp_process_ctx_enable_active_tracking().
 */
 void xe_rtp_process(struct xe_rtp_process_ctx *ctx,
 		    const struct xe_rtp_entry *entries)
 {
-	const struct xe_rtp_entry *entry, *first_entry;
+	const struct xe_rtp_entry *entry;
 	struct xe_hw_engine *hwe;
 	struct xe_gt *gt;
 	struct xe_device *xe;

 	rtp_get_context(ctx, &hwe, &gt, &xe);

-	first_entry = entries;
-	if (drm_WARN_ON(&xe->drm, !first_entry->name))
-		return;
-
 	for (entry = entries; entry && entry->rules; entry++) {
-		if (entry->name)
-			first_entry = entry;
-
 		if (!rule_matches(xe, gt, hwe, entry->rules, entry->n_rules))
 			continue;

-		/* Fast-forward entry, eliminating the OR'ed entries */
-		for (entry++; entry && entry->rules; entry++)
-			if (entry->name)
-				break;
-		entry--;
-
-		rtp_mark_active(xe, ctx, first_entry - entries,
-				entry - entries);
+		rtp_mark_active(xe, ctx, entry - entries);
 	}
 }
+EXPORT_SYMBOL_IF_KUNIT(xe_rtp_process);

 bool xe_rtp_match_even_instance(const struct xe_gt *gt,
 				const struct xe_hw_engine *hwe)
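The dropped range-marking branch had an off-by-one: an inclusive range [first, last] contains `last - first + 1` entries, but the removed code passed `last - first + 2` to bitmap_set(), tagging one entry too many. Marking a single index per matching entry, as the new rtp_mark_active() does, removes that arithmetic entirely. A small userspace sketch of the difference (set_bits() is a stand-in, not the kernel's bitmap API):

```c
#include <assert.h>

/* Tiny stand-in for the kernel's bitmap_set(): set n bits from start. */
static void set_bits(unsigned long *map, int start, int n)
{
	for (int i = start; i < start + n; i++)
		*map |= 1UL << i;
}
```

With first = 2, last = 4, the correct count `last - first + 1` sets bits 2..4 (0x1c), the removed `last - first + 2` also sets bit 5 (0x3c), and marking one entry at a time reproduces the correct mask.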
+2 -2
drivers/gpu/drm/xe/xe_rtp.h
···
 * XE_RTP_RULES - Helper to set multiple rules to a struct xe_rtp_entry_sr entry
 * @...: Rules
 *
-* At least one rule is needed and up to 6 are supported. Multiple rules are
+* At least one rule is needed and up to 12 are supported. Multiple rules are
 * AND'ed together, i.e. all the rules must evaluate to true for the entry to
 * be processed. See XE_RTP_MATCH_* for the possible match rules. Example:
 *
···
 * XE_RTP_ACTIONS - Helper to set multiple actions to a struct xe_rtp_entry_sr
 * @...: Actions to be taken
 *
-* At least one action is needed and up to 6 are supported. See XE_RTP_ACTION_*
+* At least one action is needed and up to 12 are supported. See XE_RTP_ACTION_*
 * for the possible actions. Example:
 *
 * .. code-block:: c
+6
drivers/gpu/drm/xe/xe_rtp_helpers.h
···
 #define XE_RTP_PASTE_4(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_3(prefix_, sep_, _XE_TUPLE_TAIL args_)
 #define XE_RTP_PASTE_5(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_4(prefix_, sep_, _XE_TUPLE_TAIL args_)
 #define XE_RTP_PASTE_6(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_5(prefix_, sep_, _XE_TUPLE_TAIL args_)
+#define XE_RTP_PASTE_7(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_6(prefix_, sep_, _XE_TUPLE_TAIL args_)
+#define XE_RTP_PASTE_8(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_7(prefix_, sep_, _XE_TUPLE_TAIL args_)
+#define XE_RTP_PASTE_9(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_8(prefix_, sep_, _XE_TUPLE_TAIL args_)
+#define XE_RTP_PASTE_10(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_9(prefix_, sep_, _XE_TUPLE_TAIL args_)
+#define XE_RTP_PASTE_11(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_10(prefix_, sep_, _XE_TUPLE_TAIL args_)
+#define XE_RTP_PASTE_12(prefix_, sep_, args_) _XE_RTP_CONCAT(prefix_, FIRST_ARG args_) __XE_RTP_PASTE_SEP_ ## sep_ XE_RTP_PASTE_11(prefix_, sep_, _XE_TUPLE_TAIL args_)

 /*
 * XE_RTP_DROP_CAST - Drop cast to convert a compound statement to a initializer
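The XE_RTP_PASTE_N helpers form a recursion ladder: each rung pastes the prefix onto its first argument and delegates the remaining tuple to the rung below, so raising the rule/action limit from 6 to 12 is just six more rungs. A self-contained sketch of the same ladder idea (deliberately simplified; these PASTE_*/NARGS macros are illustrative, not the actual xe helpers):

```c
#include <assert.h>

/*
 * Simplified paste ladder: PASTE(p, a, b, c) expands to p##a + p##b + p##c.
 * Each PASTE_N rung handles one argument and recurses into PASTE_(N-1),
 * the same shape as the XE_RTP_PASTE_N helpers; supporting more arguments
 * means adding more rungs and widening the argument-counting macro.
 */
#define _NARGS(_1, _2, _3, _4, N, ...) N
#define NARGS(...) _NARGS(__VA_ARGS__, 4, 3, 2, 1)
#define _CAT(a, b) a##b
#define CAT(a, b) _CAT(a, b)

#define PASTE_1(p, a)      p##a
#define PASTE_2(p, a, ...) p##a + PASTE_1(p, __VA_ARGS__)
#define PASTE_3(p, a, ...) p##a + PASTE_2(p, __VA_ARGS__)
#define PASTE_4(p, a, ...) p##a + PASTE_3(p, __VA_ARGS__)

/* Dispatch on argument count, like the real code's size-suffixed macros. */
#define PASTE(p, ...) CAT(PASTE_, NARGS(__VA_ARGS__))(p, __VA_ARGS__)

enum { flag_a = 1, flag_b = 2, flag_c = 4, flag_d = 8 };
```

The dispatch macro is why the limit is a hard number rather than open-ended: the preprocessor cannot loop, so every supported count needs an explicitly written rung.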
+7
drivers/gpu/drm/xe/xe_sa.c
···
 struct drm_suballoc *xe_sa_bo_new(struct xe_sa_manager *sa_manager,
 				  unsigned int size)
 {
+	/*
+	 * BB too large, return -ENOBUFS indicating the user should split
+	 * the array of binds into smaller chunks.
+	 */
+	if (size > sa_manager->base.size)
+		return ERR_PTR(-ENOBUFS);
+
 	return drm_suballoc_new(&sa_manager->base, size, GFP_KERNEL, true, 0);
 }
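This check backs the UAPI change in the tag summary: a batch-buffer request larger than the whole suballocator pool can never succeed, so it fails fast with -ENOBUFS instead of a generic allocation error, telling userspace to split its array of binds. It relies on the kernel's convention of encoding small negative errno values inside the returned pointer. A userspace re-creation of that convention (modeled on include/linux/err.h; the `sa_bo_new` stand-in and its pool size are made up):

```c
#include <assert.h>
#include <errno.h>

#define MAX_ERRNO 4095

/*
 * Kernel-style errno-in-pointer encoding: values -1..-MAX_ERRNO land in
 * the top page of the address range, which no valid pointer occupies,
 * so callers can distinguish errors from real allocations.
 */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static char pool[64]; /* stand-in for the suballocator backing store */

/* Sketch of the new size check: reject impossible requests up front. */
static void *sa_bo_new(unsigned int size)
{
	if (size > sizeof(pool))
		return ERR_PTR(-ENOBUFS);
	return pool; /* stand-in for a real suballocation */
}
```

Returning a distinct errno here is what lets the vm_bind ioctl propagate -ENOBUFS (rather than -ENOMEM) back to userspace, per the UAPI change listed in the tag.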
+1 -1
drivers/gpu/drm/xe/xe_sriov.c
···

 #include <drm/drm_managed.h>

-#include "regs/xe_sriov_regs.h"
+#include "regs/xe_regs.h"

 #include "xe_assert.h"
 #include "xe_device.h"
+8 -12
drivers/gpu/drm/xe/xe_sync.c
···
 			  u64 value)
 {
 	struct xe_user_fence *ufence;
+	u64 __user *ptr = u64_to_user_ptr(addr);
+
+	if (!access_ok(ptr, sizeof(ptr)))
+		return ERR_PTR(-EFAULT);

 	ufence = kmalloc(sizeof(*ufence), GFP_KERNEL);
 	if (!ufence)
-		return NULL;
+		return ERR_PTR(-ENOMEM);

 	ufence->xe = xe;
 	kref_init(&ufence->refcount);
-	ufence->addr = u64_to_user_ptr(addr);
+	ufence->addr = ptr;
 	ufence->value = value;
 	ufence->mm = current->mm;
 	mmgrab(ufence->mm);
···
 	} else {
 		sync->ufence = user_fence_create(xe, sync_in.addr,
 						 sync_in.timeline_value);
-		if (XE_IOCTL_DBG(xe, !sync->ufence))
-			return -ENOMEM;
+		if (XE_IOCTL_DBG(xe, IS_ERR(sync->ufence)))
+			return PTR_ERR(sync->ufence);
 	}

 	break;
···
 	sync->type = sync_in.type;
 	sync->flags = sync_in.flags;
 	sync->timeline_value = sync_in.timeline_value;
-
-	return 0;
-}
-
-int xe_sync_entry_wait(struct xe_sync_entry *sync)
-{
-	if (sync->fence)
-		dma_fence_wait(sync->fence, true);

 	return 0;
 }
-1
drivers/gpu/drm/xe/xe_sync.h
···
 		       struct xe_sync_entry *sync,
 		       struct drm_xe_sync __user *sync_user,
 		       unsigned int flags);
-int xe_sync_entry_wait(struct xe_sync_entry *sync);
 int xe_sync_entry_add_deps(struct xe_sync_entry *sync,
 			   struct xe_sched_job *job);
 void xe_sync_entry_signal(struct xe_sync_entry *sync,
+52
drivers/gpu/drm/xe/xe_trace.h
···
 		      (u32)(__entry->val >> 32))
 );

+DECLARE_EVENT_CLASS(xe_pm_runtime,
+		    TP_PROTO(struct xe_device *xe, void *caller),
+		    TP_ARGS(xe, caller),
+
+		    TP_STRUCT__entry(
+			     __string(dev, __dev_name_xe(xe))
+			     __field(void *, caller)
+			     ),
+
+		    TP_fast_assign(
+			   __assign_str(dev);
+			   __entry->caller = caller;
+			   ),
+
+		    TP_printk("dev=%s caller_function=%pS", __get_str(dev), __entry->caller)
+);
+
+DEFINE_EVENT(xe_pm_runtime, xe_pm_runtime_get,
+	     TP_PROTO(struct xe_device *xe, void *caller),
+	     TP_ARGS(xe, caller)
+);
+
+DEFINE_EVENT(xe_pm_runtime, xe_pm_runtime_put,
+	     TP_PROTO(struct xe_device *xe, void *caller),
+	     TP_ARGS(xe, caller)
+);
+
+DEFINE_EVENT(xe_pm_runtime, xe_pm_resume,
+	     TP_PROTO(struct xe_device *xe, void *caller),
+	     TP_ARGS(xe, caller)
+);
+
+DEFINE_EVENT(xe_pm_runtime, xe_pm_suspend,
+	     TP_PROTO(struct xe_device *xe, void *caller),
+	     TP_ARGS(xe, caller)
+);
+
+DEFINE_EVENT(xe_pm_runtime, xe_pm_runtime_resume,
+	     TP_PROTO(struct xe_device *xe, void *caller),
+	     TP_ARGS(xe, caller)
+);
+
+DEFINE_EVENT(xe_pm_runtime, xe_pm_runtime_suspend,
+	     TP_PROTO(struct xe_device *xe, void *caller),
+	     TP_ARGS(xe, caller)
+);
+
+DEFINE_EVENT(xe_pm_runtime, xe_pm_runtime_get_ioctl,
+	     TP_PROTO(struct xe_device *xe, void *caller),
+	     TP_ARGS(xe, caller)
+);
+
 #endif

 /* This part must be outside protection */
+5 -5
drivers/gpu/drm/xe/xe_trace_bo.h
···
 	     TP_ARGS(vma)
 );

-DEFINE_EVENT(xe_vma, xe_vma_fail,
-	     TP_PROTO(struct xe_vma *vma),
-	     TP_ARGS(vma)
-);
-
 DEFINE_EVENT(xe_vma, xe_vma_bind,
 	     TP_PROTO(struct xe_vma *vma),
 	     TP_ARGS(vma)
···
 );

 DEFINE_EVENT(xe_vm, xe_vm_rebind_worker_exit,
+	     TP_PROTO(struct xe_vm *vm),
+	     TP_ARGS(vm)
+);
+
+DEFINE_EVENT(xe_vm, xe_vm_ops_fail,
 	     TP_PROTO(struct xe_vm *vm),
 	     TP_ARGS(vm)
 );
+8
drivers/gpu/drm/xe/xe_tuning.c
···
 				 REG_FIELD_PREP(L3_PWM_TIMER_INIT_VAL_MASK, 0x7f)))
 	},

+	/* Xe2_HPG */
+
+	{ XE_RTP_NAME("Tuning: vs hit max value"),
+	  XE_RTP_RULES(GRAPHICS_VERSION(2001), ENGINE_CLASS(RENDER)),
+	  XE_RTP_ACTIONS(FIELD_SET(FF_MODE, VS_HIT_MAX_VALUE_MASK,
+				   REG_FIELD_PREP(VS_HIT_MAX_VALUE_MASK, 0x3f)))
+	},
+
 	{}
 };
+3
drivers/gpu/drm/xe/xe_uc_fw.c
···
 	fw_def(TIGERLAKE, major_ver(i915, guc, tgl, 70, 19, 2))

 #define XE_HUC_FIRMWARE_DEFS(fw_def, mmp_ver, no_ver) \
+	fw_def(BATTLEMAGE, no_ver(xe, huc, bmg)) \
+	fw_def(LUNARLAKE, no_ver(xe, huc, lnl)) \
 	fw_def(METEORLAKE, no_ver(i915, huc_gsc, mtl)) \
 	fw_def(DG1, no_ver(i915, huc, dg1)) \
 	fw_def(ALDERLAKE_P, no_ver(i915, huc, tgl)) \
···

 /* for the GSC FW we match the compatibility version and not the release one */
 #define XE_GSC_FIRMWARE_DEFS(fw_def, major_ver) \
+	fw_def(LUNARLAKE, major_ver(xe, gsc, lnl, 1, 0, 0)) \
 	fw_def(METEORLAKE, major_ver(i915, gsc, mtl, 1, 0, 0))

 #define MAKE_FW_PATH(dir__, uc__, shortname__, version__) \
+273 -426
drivers/gpu/drm/xe/xe_vm.c
···
 	if (q->lr.pfence) {
 		long timeout = dma_fence_wait(q->lr.pfence, false);

-		if (timeout < 0)
+		/* Only -ETIME on fence indicates VM needs to be killed */
+		if (timeout < 0 || q->lr.pfence->error == -ETIME)
 			return -ETIME;
+
 		dma_fence_put(q->lr.pfence);
 		q->lr.pfence = NULL;
 	}
···

 #define XE_VM_REBIND_RETRY_TIMEOUT_MS 1000

-static void xe_vm_kill(struct xe_vm *vm, bool unlocked)
+/**
+ * xe_vm_kill() - VM Kill
+ * @vm: The VM.
+ * @unlocked: Flag indicates the VM's dma-resv is not held
+ *
+ * Kill the VM by setting banned flag indicating VM is no longer available for
+ * use. If in preempt fence mode, also kill all exec queues attached to the VM.
+ */
+void xe_vm_kill(struct xe_vm *vm, bool unlocked)
 {
 	struct xe_exec_queue *q;
···
 		list_empty_careful(&vm->userptr.invalidated)) ? 0 : -EAGAIN;
 }

+static int xe_vma_ops_alloc(struct xe_vma_ops *vops, bool array_of_binds)
+{
+	int i;
+
+	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i) {
+		if (!vops->pt_update_ops[i].num_ops)
+			continue;
+
+		vops->pt_update_ops[i].ops =
+			kmalloc_array(vops->pt_update_ops[i].num_ops,
+				      sizeof(*vops->pt_update_ops[i].ops),
+				      GFP_KERNEL);
+		if (!vops->pt_update_ops[i].ops)
+			return array_of_binds ? -ENOBUFS : -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void xe_vma_ops_fini(struct xe_vma_ops *vops)
+{
+	int i;
+
+	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
+		kfree(vops->pt_update_ops[i].ops);
+}
+
+static void xe_vma_ops_incr_pt_update_ops(struct xe_vma_ops *vops, u8 tile_mask)
+{
+	int i;
+
+	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
+		if (BIT(i) & tile_mask)
+			++vops->pt_update_ops[i].num_ops;
+}
+
 static void xe_vm_populate_rebind(struct xe_vma_op *op, struct xe_vma *vma,
 				  u8 tile_mask)
 {
···

 	xe_vm_populate_rebind(op, vma, tile_mask);
 	list_add_tail(&op->link, &vops->list);
+	xe_vma_ops_incr_pt_update_ops(vops, tile_mask);

 	return 0;
 }
···
 	struct xe_vma *vma, *next;
 	struct xe_vma_ops vops;
 	struct xe_vma_op *op, *next_op;
-	int err;
+	int err, i;

 	lockdep_assert_held(&vm->lock);
 	if ((xe_vm_in_lr_mode(vm) && !rebind_worker) ||
···
 		return 0;

 	xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
+	for (i = 0; i < XE_MAX_TILES_PER_DEVICE; ++i)
+		vops.pt_update_ops[i].wait_vm_bookkeep = true;

 	xe_vm_assert_held(vm);
 	list_for_each_entry(vma, &vm->rebind_list, combined_links.rebind) {
···
 			goto free_ops;
 	}

+	err = xe_vma_ops_alloc(&vops, false);
+	if (err)
+		goto free_ops;
+
 	fence = ops_execute(vm, &vops);
 	if (IS_ERR(fence)) {
 		err = PTR_ERR(fence);
···
 		list_del(&op->link);
 		kfree(op);
 	}
+	xe_vma_ops_fini(&vops);

 	return err;
 }
···
 	struct dma_fence *fence = NULL;
 	struct xe_vma_ops vops;
 	struct xe_vma_op *op, *next_op;
+	struct xe_tile *tile;
+	u8 id;
 	int err;

 	lockdep_assert_held(&vm->lock);
 	xe_assert(vm->xe, xe_vm_in_fault_mode(vm));

 	xe_vma_ops_init(&vops, vm, NULL, NULL, 0);
+	for_each_tile(tile, vm->xe, id) {
+		vops.pt_update_ops[id].wait_vm_bookkeep = true;
+		vops.pt_update_ops[tile->id].q =
+			xe_tile_migrate_exec_queue(tile);
+	}

 	err = xe_vm_ops_add_rebind(&vops, vma, tile_mask);
 	if (err)
 		return ERR_PTR(err);

+	err = xe_vma_ops_alloc(&vops, false);
+	if (err) {
+		fence = ERR_PTR(err);
+		goto free_ops;
+	}
+
 	fence = ops_execute(vm, &vops);

+free_ops:
 	list_for_each_entry_safe(op, next_op, &vops.list, link) {
 		list_del(&op->link);
 		kfree(op);
 	}
+	xe_vma_ops_fini(&vops);

 	return fence;
 }
···
 		XE_WARN_ON(vm->pt_root[id]);

 	trace_xe_vm_free(vm);
+
+	if (vm->xef)
+		xe_file_put(vm->xef);
+
 	kfree(vm);
 }
···
 	return q ? q : vm->q[0];
 }

-static struct dma_fence *
-xe_vm_unbind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
-		 struct xe_sync_entry *syncs, u32 num_syncs,
-		 bool first_op, bool last_op)
-{
-	struct xe_vm *vm = xe_vma_vm(vma);
-	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
-	struct xe_tile *tile;
-	struct dma_fence *fence = NULL;
-	struct dma_fence **fences = NULL;
-	struct dma_fence_array *cf = NULL;
-	int cur_fence = 0;
-	int number_tiles = hweight8(vma->tile_present);
-	int err;
-	u8 id;
-
-	trace_xe_vma_unbind(vma);
-
-	if (number_tiles > 1) {
-		fences = kmalloc_array(number_tiles, sizeof(*fences),
-				       GFP_KERNEL);
-		if (!fences)
-			return ERR_PTR(-ENOMEM);
-	}
-
-	for_each_tile(tile, vm->xe, id) {
-		if (!(vma->tile_present & BIT(id)))
-			goto next;
-
-		fence = __xe_pt_unbind_vma(tile, vma, q ? q : vm->q[id],
-					   first_op ? syncs : NULL,
-					   first_op ? num_syncs : 0);
-		if (IS_ERR(fence)) {
-			err = PTR_ERR(fence);
-			goto err_fences;
-		}
-
-		if (fences)
-			fences[cur_fence++] = fence;
-
-next:
-		if (q && vm->pt_root[id] && !list_empty(&q->multi_gt_list))
-			q = list_next_entry(q, multi_gt_list);
-	}
-
-	if (fences) {
-		cf = dma_fence_array_create(number_tiles, fences,
-					    vm->composite_fence_ctx,
-					    vm->composite_fence_seqno++,
-					    false);
-		if (!cf) {
-			--vm->composite_fence_seqno;
-			err = -ENOMEM;
-			goto err_fences;
-		}
-	}
-
-	fence = cf ? &cf->base : !fence ?
-		xe_exec_queue_last_fence_get(wait_exec_queue, vm) : fence;
-
-	return fence;
-
-err_fences:
-	if (fences) {
-		while (cur_fence)
-			dma_fence_put(fences[--cur_fence]);
-		kfree(fences);
-	}
-
-	return ERR_PTR(err);
-}
-
-static struct dma_fence *
-xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
-	       struct xe_sync_entry *syncs, u32 num_syncs,
-	       u8 tile_mask, bool first_op, bool last_op)
-{
-	struct xe_tile *tile;
-	struct dma_fence *fence;
-	struct dma_fence **fences = NULL;
-	struct dma_fence_array *cf = NULL;
-	struct xe_vm *vm = xe_vma_vm(vma);
-	int cur_fence = 0;
-	int number_tiles = hweight8(tile_mask);
-	int err;
-	u8 id;
-
-	trace_xe_vma_bind(vma);
-
-	if (number_tiles > 1) {
-		fences = kmalloc_array(number_tiles, sizeof(*fences),
-				       GFP_KERNEL);
-		if (!fences)
-			return ERR_PTR(-ENOMEM);
-	}
-
-	for_each_tile(tile, vm->xe, id) {
-		if (!(tile_mask & BIT(id)))
-			goto next;
-
-		fence = __xe_pt_bind_vma(tile, vma, q ? q : vm->q[id],
-					 first_op ? syncs : NULL,
-					 first_op ? num_syncs : 0,
-					 vma->tile_present & BIT(id));
-		if (IS_ERR(fence)) {
-			err = PTR_ERR(fence);
-			goto err_fences;
-		}
-
-		if (fences)
-			fences[cur_fence++] = fence;
-
-next:
-		if (q && vm->pt_root[id] && !list_empty(&q->multi_gt_list))
-			q = list_next_entry(q, multi_gt_list);
-	}
-
-	if (fences) {
-		cf = dma_fence_array_create(number_tiles, fences,
-					    vm->composite_fence_ctx,
-					    vm->composite_fence_seqno++,
-					    false);
-		if (!cf) {
-			--vm->composite_fence_seqno;
-			err = -ENOMEM;
-			goto err_fences;
-		}
-	}
-
-	return cf ? &cf->base : fence;
-
-err_fences:
-	if (fences) {
-		while (cur_fence)
-			dma_fence_put(fences[--cur_fence]);
-		kfree(fences);
-	}
-
-	return ERR_PTR(err);
-}
-
 static struct xe_user_fence *
 find_ufence_get(struct xe_sync_entry *syncs, u32 num_syncs)
 {
···
 	}

 	return NULL;
-}
-
-static struct dma_fence *
-xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma, struct xe_exec_queue *q,
-	   struct xe_bo *bo, struct xe_sync_entry *syncs, u32 num_syncs,
-	   u8 tile_mask, bool immediate, bool first_op, bool last_op)
-{
-	struct dma_fence *fence;
-	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
-
-	xe_vm_assert_held(vm);
-	xe_bo_assert_held(bo);
-
-	if (immediate) {
-		fence = xe_vm_bind_vma(vma, q, syncs, num_syncs, tile_mask,
-				       first_op, last_op);
-		if (IS_ERR(fence))
-			return fence;
-	} else {
-		xe_assert(vm->xe, xe_vm_in_fault_mode(vm));
-
-		fence = xe_exec_queue_last_fence_get(wait_exec_queue, vm);
-	}
-
-	return fence;
-}
-
-static struct dma_fence *
-xe_vm_unbind(struct xe_vm *vm, struct xe_vma *vma,
-	     struct xe_exec_queue *q, struct xe_sync_entry *syncs,
-	     u32 num_syncs, bool first_op, bool last_op)
-{
-	struct dma_fence *fence;
-
-	xe_vm_assert_held(vm);
-	xe_bo_assert_held(xe_vma_bo(vma));
-
-	fence = xe_vm_unbind_vma(vma, q, syncs, num_syncs, first_op, last_op);
-	if (IS_ERR(fence))
-		return fence;
-
-	return fence;
 }

 #define ALL_DRM_XE_VM_CREATE_FLAGS (DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE | \
···
 	}

 	args->vm_id = id;
-	vm->xef = xef;
+	vm->xef = xe_file_get(xef);

 	/* Record BO memory for VM pagetable created against client */
 	for_each_tile(tile, xe, id)
···
 	XE_PL_VRAM0,
 	XE_PL_VRAM1,
 };
-
-static struct dma_fence *
-xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
-	       struct xe_exec_queue *q, struct xe_sync_entry *syncs,
-	       u32 num_syncs, bool first_op, bool last_op)
-{
-	struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
-
-	if (vma->tile_mask != (vma->tile_present & ~vma->tile_invalidated)) {
-		return xe_vm_bind(vm, vma, q, xe_vma_bo(vma), syncs, num_syncs,
-				  vma->tile_mask, true, first_op, last_op);
-	} else {
-		return xe_exec_queue_last_fence_get(wait_exec_queue, vm);
-	}
-}

 static void prep_vma_destroy(struct xe_vm *vm, struct xe_vma *vma,
 			     bool post_commit)
···
 	return err;
 }

-
-static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
-				   struct drm_gpuva_ops *ops,
-				   struct xe_sync_entry *syncs, u32 num_syncs,
-				   struct xe_vma_ops *vops, bool last)
+static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
+				   struct xe_vma_ops *vops)
 {
 	struct xe_device *xe = vm->xe;
-	struct xe_vma_op *last_op = NULL;
 	struct drm_gpuva_op *__op;
 	struct xe_tile *tile;
 	u8 id, tile_mask = 0;
···
 	drm_gpuva_for_each_op(__op, ops) {
 		struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 		struct xe_vma *vma;
-		bool first = list_empty(&vops->list);
 		unsigned int flags = 0;

 		INIT_LIST_HEAD(&op->link);
 		list_add_tail(&op->link, &vops->list);
-
-		if (first) {
-			op->flags |= XE_VMA_OP_FIRST;
-			op->num_syncs = num_syncs;
-			op->syncs = syncs;
-		}
-
-		op->q = q;
 		op->tile_mask = tile_mask;

 		switch (op->base.op) {
···
 				return PTR_ERR(vma);

 			op->map.vma = vma;
+			if (op->map.immediate || !xe_vm_in_fault_mode(vm))
+				xe_vma_ops_incr_pt_update_ops(vops,
+							      op->tile_mask);
 			break;
 		}
 		case DRM_GPUVA_OP_REMAP:
···
 					vm_dbg(&xe->drm, "REMAP:SKIP_PREV: addr=0x%016llx, range=0x%016llx",
 					       (ULL)op->remap.start,
 					       (ULL)op->remap.range);
+				} else {
+					xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
 				}
 			}
···
 					vm_dbg(&xe->drm, "REMAP:SKIP_NEXT: addr=0x%016llx, range=0x%016llx",
 					       (ULL)op->remap.start,
 					       (ULL)op->remap.range);
+				} else {
+					xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
 				}
 			}
+			xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
 			break;
 		}
 		case DRM_GPUVA_OP_UNMAP:
 		case DRM_GPUVA_OP_PREFETCH:
-			/* Nothing to do */
+			/* FIXME: Need to skip some prefetch ops */
+			xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
 			break;
 		default:
 			drm_warn(&vm->xe->drm, "NOT POSSIBLE");
 		}
-
-		last_op = op;

 		err = xe_vma_op_commit(vm, op);
 		if (err)
 			return err;
 	}

-	/* FIXME: Unhandled corner case */
-	XE_WARN_ON(!last_op && last && !list_empty(&vops->list));
-
-	if (!last_op)
-		return 0;
-
-	if (last) {
-		last_op->flags |= XE_VMA_OP_LAST;
-		last_op->num_syncs = num_syncs;
-		last_op->syncs = syncs;
-	}
-
 	return 0;
 }

-static struct dma_fence *op_execute(struct xe_vm *vm, struct xe_vma *vma,
-				    struct xe_vma_op *op)
-{
-	struct dma_fence *fence = NULL;
-
-	lockdep_assert_held(&vm->lock);
-
-	xe_vm_assert_held(vm);
-	xe_bo_assert_held(xe_vma_bo(vma));
-
-	switch (op->base.op) {
-	case DRM_GPUVA_OP_MAP:
-		fence = xe_vm_bind(vm, vma, op->q, xe_vma_bo(vma),
-				   op->syncs, op->num_syncs,
-				   op->tile_mask,
-				   op->map.immediate || !xe_vm_in_fault_mode(vm),
-				   op->flags & XE_VMA_OP_FIRST,
-				   op->flags & XE_VMA_OP_LAST);
-		break;
-	case DRM_GPUVA_OP_REMAP:
-	{
-		bool prev = !!op->remap.prev;
-		bool next = !!op->remap.next;
-
-		if (!op->remap.unmap_done) {
-			if (prev || next)
-				vma->gpuva.flags |= XE_VMA_FIRST_REBIND;
-			fence = xe_vm_unbind(vm, vma, op->q, op->syncs,
-					     op->num_syncs,
-					     op->flags & XE_VMA_OP_FIRST,
-					     op->flags & XE_VMA_OP_LAST &&
-					     !prev && !next);
-			if (IS_ERR(fence))
-				break;
-			op->remap.unmap_done = true;
-		}
-
-		if (prev) {
-			op->remap.prev->gpuva.flags |= XE_VMA_LAST_REBIND;
-			dma_fence_put(fence);
-			fence = xe_vm_bind(vm, op->remap.prev, op->q,
-					   xe_vma_bo(op->remap.prev), op->syncs,
-					   op->num_syncs,
-					   op->remap.prev->tile_mask, true,
-					   false,
-					   op->flags & XE_VMA_OP_LAST && !next);
-			op->remap.prev->gpuva.flags &= ~XE_VMA_LAST_REBIND;
-			if (IS_ERR(fence))
-				break;
-			op->remap.prev = NULL;
-		}
-
-		if (next) {
-			op->remap.next->gpuva.flags |= XE_VMA_LAST_REBIND;
-			dma_fence_put(fence);
-			fence = xe_vm_bind(vm, op->remap.next, op->q,
-					   xe_vma_bo(op->remap.next),
-					   op->syncs, op->num_syncs,
-					   op->remap.next->tile_mask, true,
-					   false, op->flags & XE_VMA_OP_LAST);
-			op->remap.next->gpuva.flags &= ~XE_VMA_LAST_REBIND;
-			if (IS_ERR(fence))
-				break;
-			op->remap.next = NULL;
-		}
-
-		break;
-	}
-	case DRM_GPUVA_OP_UNMAP:
-		fence = xe_vm_unbind(vm, vma, op->q, op->syncs,
-				     op->num_syncs, op->flags & XE_VMA_OP_FIRST,
-				     op->flags & XE_VMA_OP_LAST);
-		break;
-	case DRM_GPUVA_OP_PREFETCH:
-		fence = xe_vm_prefetch(vm, vma, op->q, op->syncs, op->num_syncs,
-				       op->flags & XE_VMA_OP_FIRST,
-				       op->flags & XE_VMA_OP_LAST);
-		break;
-	default:
-		drm_warn(&vm->xe->drm, "NOT POSSIBLE");
-	}
-
-	if (IS_ERR(fence))
-		trace_xe_vma_fail(vma);
-
-	return fence;
-}
-
-static struct dma_fence *
-__xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
-		    struct xe_vma_op *op)
-{
-	struct dma_fence *fence;
-	int err;
-
-retry_userptr:
-	fence = op_execute(vm, vma, op);
-	if (IS_ERR(fence) && PTR_ERR(fence) == -EAGAIN) {
-		lockdep_assert_held_write(&vm->lock);
-
-		if (op->base.op == DRM_GPUVA_OP_REMAP) {
-			if (!op->remap.unmap_done)
-				vma = gpuva_to_vma(op->base.remap.unmap->va);
-			else if (op->remap.prev)
-				vma = op->remap.prev;
-			else
-				vma = op->remap.next;
-		}
-
-		if (xe_vma_is_userptr(vma)) {
-			err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
-			if (!err)
-				goto retry_userptr;
-
-			fence = ERR_PTR(err);
-			trace_xe_vma_fail(vma);
-		}
-	}
-
-	return fence;
-}
-
-static struct dma_fence *
-xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op)
-{
-	struct dma_fence *fence = ERR_PTR(-ENOMEM);
-
-	lockdep_assert_held(&vm->lock);
-
-	switch (op->base.op) {
2443 - case DRM_GPUVA_OP_MAP: 2444 - fence = __xe_vma_op_execute(vm, op->map.vma, op); 2445 - break; 2446 - case DRM_GPUVA_OP_REMAP: 2447 - { 2448 - struct xe_vma *vma; 2449 - 2450 - if (!op->remap.unmap_done) 2451 - vma = gpuva_to_vma(op->base.remap.unmap->va); 2452 - else if (op->remap.prev) 2453 - vma = op->remap.prev; 2454 - else 2455 - vma = op->remap.next; 2456 - 2457 - fence = __xe_vma_op_execute(vm, vma, op); 2458 - break; 2459 - } 2460 - case DRM_GPUVA_OP_UNMAP: 2461 - fence = __xe_vma_op_execute(vm, gpuva_to_vma(op->base.unmap.va), 2462 - op); 2463 - break; 2464 - case DRM_GPUVA_OP_PREFETCH: 2465 - fence = __xe_vma_op_execute(vm, 2466 - gpuva_to_vma(op->base.prefetch.va), 2467 - op); 2468 - break; 2469 - default: 2470 - drm_warn(&vm->xe->drm, "NOT POSSIBLE"); 2471 - } 2472 - 2473 - return fence; 2474 2434 } 2475 2435 2476 2436 static void xe_vma_op_unwind(struct xe_vm *vm, struct xe_vma_op *op, ··· 2482 2788 return err; 2483 2789 } 2484 2790 2791 + #ifdef TEST_VM_OPS_ERROR 2792 + if (vops->inject_error && 2793 + vm->xe->vm_inject_error_position == FORCE_OP_ERROR_LOCK) 2794 + return -ENOSPC; 2795 + #endif 2796 + 2485 2797 return 0; 2798 + } 2799 + 2800 + static void op_trace(struct xe_vma_op *op) 2801 + { 2802 + switch (op->base.op) { 2803 + case DRM_GPUVA_OP_MAP: 2804 + trace_xe_vma_bind(op->map.vma); 2805 + break; 2806 + case DRM_GPUVA_OP_REMAP: 2807 + trace_xe_vma_unbind(gpuva_to_vma(op->base.remap.unmap->va)); 2808 + if (op->remap.prev) 2809 + trace_xe_vma_bind(op->remap.prev); 2810 + if (op->remap.next) 2811 + trace_xe_vma_bind(op->remap.next); 2812 + break; 2813 + case DRM_GPUVA_OP_UNMAP: 2814 + trace_xe_vma_unbind(gpuva_to_vma(op->base.unmap.va)); 2815 + break; 2816 + case DRM_GPUVA_OP_PREFETCH: 2817 + trace_xe_vma_bind(gpuva_to_vma(op->base.prefetch.va)); 2818 + break; 2819 + default: 2820 + XE_WARN_ON("NOT POSSIBLE"); 2821 + } 2822 + } 2823 + 2824 + static void trace_xe_vm_ops_execute(struct xe_vma_ops *vops) 2825 + { 2826 + struct xe_vma_op *op; 
2827 + 2828 + list_for_each_entry(op, &vops->list, link) 2829 + op_trace(op); 2830 + } 2831 + 2832 + static int vm_ops_setup_tile_args(struct xe_vm *vm, struct xe_vma_ops *vops) 2833 + { 2834 + struct xe_exec_queue *q = vops->q; 2835 + struct xe_tile *tile; 2836 + int number_tiles = 0; 2837 + u8 id; 2838 + 2839 + for_each_tile(tile, vm->xe, id) { 2840 + if (vops->pt_update_ops[id].num_ops) 2841 + ++number_tiles; 2842 + 2843 + if (vops->pt_update_ops[id].q) 2844 + continue; 2845 + 2846 + if (q) { 2847 + vops->pt_update_ops[id].q = q; 2848 + if (vm->pt_root[id] && !list_empty(&q->multi_gt_list)) 2849 + q = list_next_entry(q, multi_gt_list); 2850 + } else { 2851 + vops->pt_update_ops[id].q = vm->q[id]; 2852 + } 2853 + } 2854 + 2855 + return number_tiles; 2486 2856 } 2487 2857 2488 2858 static struct dma_fence *ops_execute(struct xe_vm *vm, 2489 2859 struct xe_vma_ops *vops) 2490 2860 { 2491 - struct xe_vma_op *op, *next; 2861 + struct xe_tile *tile; 2492 2862 struct dma_fence *fence = NULL; 2863 + struct dma_fence **fences = NULL; 2864 + struct dma_fence_array *cf = NULL; 2865 + int number_tiles = 0, current_fence = 0, err; 2866 + u8 id; 2493 2867 2494 - list_for_each_entry_safe(op, next, &vops->list, link) { 2495 - dma_fence_put(fence); 2496 - fence = xe_vma_op_execute(vm, op); 2497 - if (IS_ERR(fence)) { 2498 - drm_warn(&vm->xe->drm, "VM op(%d) failed with %ld", 2499 - op->base.op, PTR_ERR(fence)); 2500 - fence = ERR_PTR(-ENOSPC); 2501 - break; 2868 + number_tiles = vm_ops_setup_tile_args(vm, vops); 2869 + if (number_tiles == 0) 2870 + return ERR_PTR(-ENODATA); 2871 + 2872 + if (number_tiles > 1) { 2873 + fences = kmalloc_array(number_tiles, sizeof(*fences), 2874 + GFP_KERNEL); 2875 + if (!fences) { 2876 + fence = ERR_PTR(-ENOMEM); 2877 + goto err_trace; 2502 2878 } 2503 2879 } 2504 2880 2881 + for_each_tile(tile, vm->xe, id) { 2882 + if (!vops->pt_update_ops[id].num_ops) 2883 + continue; 2884 + 2885 + err = xe_pt_update_ops_prepare(tile, vops); 2886 + if (err) { 
2887 + fence = ERR_PTR(err); 2888 + goto err_out; 2889 + } 2890 + } 2891 + 2892 + trace_xe_vm_ops_execute(vops); 2893 + 2894 + for_each_tile(tile, vm->xe, id) { 2895 + if (!vops->pt_update_ops[id].num_ops) 2896 + continue; 2897 + 2898 + fence = xe_pt_update_ops_run(tile, vops); 2899 + if (IS_ERR(fence)) 2900 + goto err_out; 2901 + 2902 + if (fences) 2903 + fences[current_fence++] = fence; 2904 + } 2905 + 2906 + if (fences) { 2907 + cf = dma_fence_array_create(number_tiles, fences, 2908 + vm->composite_fence_ctx, 2909 + vm->composite_fence_seqno++, 2910 + false); 2911 + if (!cf) { 2912 + --vm->composite_fence_seqno; 2913 + fence = ERR_PTR(-ENOMEM); 2914 + goto err_out; 2915 + } 2916 + fence = &cf->base; 2917 + } 2918 + 2919 + for_each_tile(tile, vm->xe, id) { 2920 + if (!vops->pt_update_ops[id].num_ops) 2921 + continue; 2922 + 2923 + xe_pt_update_ops_fini(tile, vops); 2924 + } 2925 + 2926 + return fence; 2927 + 2928 + err_out: 2929 + for_each_tile(tile, vm->xe, id) { 2930 + if (!vops->pt_update_ops[id].num_ops) 2931 + continue; 2932 + 2933 + xe_pt_update_ops_abort(tile, vops); 2934 + } 2935 + while (current_fence) 2936 + dma_fence_put(fences[--current_fence]); 2937 + kfree(fences); 2938 + kfree(cf); 2939 + 2940 + err_trace: 2941 + trace_xe_vm_ops_fail(vm); 2505 2942 return fence; 2506 2943 } 2507 2944 ··· 2713 2888 fence = ops_execute(vm, vops); 2714 2889 if (IS_ERR(fence)) { 2715 2890 err = PTR_ERR(fence); 2716 - /* FIXME: Killing VM rather than proper error handling */ 2717 - xe_vm_kill(vm, false); 2718 2891 goto unlock; 2719 - } else { 2720 - vm_bind_ioctl_ops_fini(vm, vops, fence); 2721 2892 } 2893 + 2894 + vm_bind_ioctl_ops_fini(vm, vops, fence); 2722 2895 } 2723 2896 2724 2897 unlock: ··· 2724 2901 return err; 2725 2902 } 2726 2903 2727 - #define SUPPORTED_FLAGS \ 2904 + #define SUPPORTED_FLAGS_STUB \ 2728 2905 (DRM_XE_VM_BIND_FLAG_READONLY | \ 2729 2906 DRM_XE_VM_BIND_FLAG_IMMEDIATE | \ 2730 2907 DRM_XE_VM_BIND_FLAG_NULL | \ 2731 2908 
DRM_XE_VM_BIND_FLAG_DUMPABLE) 2909 + 2910 + #ifdef TEST_VM_OPS_ERROR 2911 + #define SUPPORTED_FLAGS (SUPPORTED_FLAGS_STUB | FORCE_OP_ERROR) 2912 + #else 2913 + #define SUPPORTED_FLAGS SUPPORTED_FLAGS_STUB 2914 + #endif 2915 + 2732 2916 #define XE_64K_PAGE_MASK 0xffffull 2733 2917 #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP) 2734 2918 ··· 2761 2931 sizeof(struct drm_xe_vm_bind_op), 2762 2932 GFP_KERNEL | __GFP_ACCOUNT); 2763 2933 if (!*bind_ops) 2764 - return -ENOMEM; 2934 + return args->num_binds > 1 ? -ENOBUFS : -ENOMEM; 2765 2935 2766 2936 err = __copy_from_user(*bind_ops, bind_user, 2767 2937 sizeof(struct drm_xe_vm_bind_op) * ··· 3080 3250 goto unwind_ops; 3081 3251 } 3082 3252 3083 - err = vm_bind_ioctl_ops_parse(vm, q, ops[i], syncs, num_syncs, 3084 - &vops, i == args->num_binds - 1); 3253 + err = vm_bind_ioctl_ops_parse(vm, ops[i], &vops); 3085 3254 if (err) 3086 3255 goto unwind_ops; 3256 + 3257 + #ifdef TEST_VM_OPS_ERROR 3258 + if (flags & FORCE_OP_ERROR) { 3259 + vops.inject_error = true; 3260 + vm->xe->vm_inject_error_position = 3261 + (vm->xe->vm_inject_error_position + 1) % 3262 + FORCE_OP_ERROR_COUNT; 3263 + } 3264 + #endif 3087 3265 } 3088 3266 3089 3267 /* Nothing to do */ ··· 3100 3262 goto unwind_ops; 3101 3263 } 3102 3264 3265 + err = xe_vma_ops_alloc(&vops, args->num_binds > 1); 3266 + if (err) 3267 + goto unwind_ops; 3268 + 3103 3269 err = vm_bind_ioctl_ops_execute(vm, &vops); 3104 3270 3105 3271 unwind_ops: 3106 3272 if (err && err != -ENODATA) 3107 3273 vm_bind_ioctl_ops_unwind(vm, ops, args->num_binds); 3274 + xe_vma_ops_fini(&vops); 3108 3275 for (i = args->num_binds - 1; i >= 0; --i) 3109 3276 if (ops[i]) 3110 3277 drm_gpuva_ops_free(&vm->gpuvm, ops[i]); ··· 3180 3337 { 3181 3338 struct xe_device *xe = xe_vma_vm(vma)->xe; 3182 3339 struct xe_tile *tile; 3340 + struct xe_gt_tlb_invalidation_fence fence[XE_MAX_TILES_PER_DEVICE]; 3183 3341 u32 tile_needs_invalidate = 0; 3184 - int seqno[XE_MAX_TILES_PER_DEVICE]; 3185 3342 u8 
id; 3186 - int ret; 3343 + int ret = 0; 3187 3344 3188 3345 xe_assert(xe, !xe_vma_is_null(vma)); 3189 3346 trace_xe_vma_invalidate(vma); ··· 3208 3365 3209 3366 for_each_tile(tile, xe, id) { 3210 3367 if (xe_pt_zap_ptes(tile, vma)) { 3211 - tile_needs_invalidate |= BIT(id); 3212 3368 xe_device_wmb(xe); 3369 + xe_gt_tlb_invalidation_fence_init(tile->primary_gt, 3370 + &fence[id], true); 3371 + 3213 3372 /* 3214 3373 * FIXME: We potentially need to invalidate multiple 3215 3374 * GTs within the tile 3216 3375 */ 3217 - seqno[id] = xe_gt_tlb_invalidation_vma(tile->primary_gt, NULL, vma); 3218 - if (seqno[id] < 0) 3219 - return seqno[id]; 3376 + ret = xe_gt_tlb_invalidation_vma(tile->primary_gt, 3377 + &fence[id], vma); 3378 + if (ret < 0) { 3379 + xe_gt_tlb_invalidation_fence_fini(&fence[id]); 3380 + goto wait; 3381 + } 3382 + 3383 + tile_needs_invalidate |= BIT(id); 3220 3384 } 3221 3385 } 3222 3386 3223 - for_each_tile(tile, xe, id) { 3224 - if (tile_needs_invalidate & BIT(id)) { 3225 - ret = xe_gt_tlb_invalidation_wait(tile->primary_gt, seqno[id]); 3226 - if (ret < 0) 3227 - return ret; 3228 - } 3229 - } 3387 + wait: 3388 + for_each_tile(tile, xe, id) 3389 + if (tile_needs_invalidate & BIT(id)) 3390 + xe_gt_tlb_invalidation_fence_wait(&fence[id]); 3230 3391 3231 3392 vma->tile_invalidated = vma->tile_mask; 3232 3393 3233 - return 0; 3394 + return ret; 3234 3395 } 3235 3396 3236 3397 struct xe_vm_snapshot {
drivers/gpu/drm/xe/xe_vm.h (+2)
···
 	return drm_gpuvm_resv(&vm->gpuvm);
 }
 
+void xe_vm_kill(struct xe_vm *vm, bool unlocked);
+
 /**
  * xe_vm_assert_held(vm) - Assert that the vm's reservation object is held.
  * @vm: The vm
drivers/gpu/drm/xe/xe_vm_types.h (+30 -25)
···
 struct xe_sync_entry;
 struct xe_user_fence;
 struct xe_vm;
+struct xe_vm_pgtable_update_op;
+
+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
+#define TEST_VM_OPS_ERROR
+#define FORCE_OP_ERROR	BIT(31)
+
+#define FORCE_OP_ERROR_LOCK	0
+#define FORCE_OP_ERROR_PREPARE	1
+#define FORCE_OP_ERROR_RUN	2
+#define FORCE_OP_ERROR_COUNT	3
+#endif
 
 #define XE_VMA_READ_ONLY	DRM_GPUVA_USERBITS
 #define XE_VMA_DESTROYED	(DRM_GPUVA_USERBITS << 1)
 #define XE_VMA_ATOMIC_PTE_BIT	(DRM_GPUVA_USERBITS << 2)
-#define XE_VMA_FIRST_REBIND	(DRM_GPUVA_USERBITS << 3)
-#define XE_VMA_LAST_REBIND	(DRM_GPUVA_USERBITS << 4)
-#define XE_VMA_PTE_4K		(DRM_GPUVA_USERBITS << 5)
-#define XE_VMA_PTE_2M		(DRM_GPUVA_USERBITS << 6)
-#define XE_VMA_PTE_1G		(DRM_GPUVA_USERBITS << 7)
-#define XE_VMA_PTE_64K		(DRM_GPUVA_USERBITS << 8)
-#define XE_VMA_PTE_COMPACT	(DRM_GPUVA_USERBITS << 9)
-#define XE_VMA_DUMPABLE		(DRM_GPUVA_USERBITS << 10)
+#define XE_VMA_PTE_4K		(DRM_GPUVA_USERBITS << 3)
+#define XE_VMA_PTE_2M		(DRM_GPUVA_USERBITS << 4)
+#define XE_VMA_PTE_1G		(DRM_GPUVA_USERBITS << 5)
+#define XE_VMA_PTE_64K		(DRM_GPUVA_USERBITS << 6)
+#define XE_VMA_PTE_COMPACT	(DRM_GPUVA_USERBITS << 7)
+#define XE_VMA_DUMPABLE		(DRM_GPUVA_USERBITS << 8)
 
 /** struct xe_userptr - User pointer */
 struct xe_userptr {
···
 	 * in write mode.
 	 */
 	u8 tile_present;
+
+	/** @tile_staged: bind is staged for this VMA */
+	u8 tile_staged;
 
 	/**
 	 * @pat_index: The pat index to use when encoding the PTEs for this vma.
···
 
 /** enum xe_vma_op_flags - flags for VMA operation */
 enum xe_vma_op_flags {
-	/** @XE_VMA_OP_FIRST: first VMA operation for a set of syncs */
-	XE_VMA_OP_FIRST = BIT(0),
-	/** @XE_VMA_OP_LAST: last VMA operation for a set of syncs */
-	XE_VMA_OP_LAST = BIT(1),
 	/** @XE_VMA_OP_COMMITTED: VMA operation committed */
-	XE_VMA_OP_COMMITTED = BIT(2),
+	XE_VMA_OP_COMMITTED = BIT(0),
 	/** @XE_VMA_OP_PREV_COMMITTED: Previous VMA operation committed */
-	XE_VMA_OP_PREV_COMMITTED = BIT(3),
+	XE_VMA_OP_PREV_COMMITTED = BIT(1),
 	/** @XE_VMA_OP_NEXT_COMMITTED: Next VMA operation committed */
-	XE_VMA_OP_NEXT_COMMITTED = BIT(4),
+	XE_VMA_OP_NEXT_COMMITTED = BIT(2),
 };
 
 /** struct xe_vma_op - VMA operation */
 struct xe_vma_op {
 	/** @base: GPUVA base operation */
 	struct drm_gpuva_op base;
-	/** @q: exec queue for this operation */
-	struct xe_exec_queue *q;
-	/**
-	 * @syncs: syncs for this operation, only used on first and last
-	 * operation
-	 */
-	struct xe_sync_entry *syncs;
-	/** @num_syncs: number of syncs */
-	u32 num_syncs;
 	/** @link: async operation link */
 	struct list_head link;
 	/** @flags: operation flags */
···
 	struct list_head list;
 	/** @vm: VM */
 	struct xe_vm *vm;
-	/** @q: exec queue these operations */
+	/** @q: exec queue for VMA operations */
 	struct xe_exec_queue *q;
 	/** @syncs: syncs these operation */
 	struct xe_sync_entry *syncs;
 	/** @num_syncs: number of syncs */
 	u32 num_syncs;
+	/** @pt_update_ops: page table update operations */
+	struct xe_vm_pgtable_update_ops pt_update_ops[XE_MAX_TILES_PER_DEVICE];
+#ifdef TEST_VM_OPS_ERROR
+	/** @inject_error: inject error to test error handling */
+	bool inject_error;
+#endif
 };
 
 #endif
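The TEST_VM_OPS_ERROR machinery added here injects a failure at a different stage (lock, prepare, run) on each forced-error bind, cycling through the positions modulo FORCE_OP_ERROR_COUNT. A tiny stand-alone model of that round-robin counter (struct injector and inject_next are hypothetical names, not driver code):

```c
/* Round-robin error-injection position, mirroring how
 * vm_inject_error_position advances modulo FORCE_OP_ERROR_COUNT. */
enum inject_pos {
	POS_LOCK,
	POS_PREPARE,
	POS_RUN,
	POS_COUNT,
};

struct injector {
	int position;	/* models xe->vm_inject_error_position */
};

/* Report the stage the current forced-error bind fails at, then advance. */
static enum inject_pos inject_next(struct injector *inj)
{
	enum inject_pos pos = inj->position;

	inj->position = (inj->position + 1) % POS_COUNT;
	return pos;
}
```

Cycling the position means repeated runs of the same test exercise every error path in turn.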
drivers/gpu/drm/xe/xe_wa.c (+15)
···
 	  XE_RTP_RULES(GRAPHICS_VERSION(2004), FUNC(xe_rtp_match_first_render_or_compute)),
 	  XE_RTP_ACTIONS(SET(TDL_TSL_CHICKEN, SLM_WMTP_RESTORE))
 	},
+	{ XE_RTP_NAME("14021402888"),
+	  XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)),
+	  XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN7, CLEAR_OPTIMIZATION_DISABLE))
+	},
 
 	/* Xe2_HPG */
···
 	{ XE_RTP_NAME("14021402888"),
 	  XE_RTP_RULES(GRAPHICS_VERSION(2001), ENGINE_CLASS(RENDER)),
 	  XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN7, CLEAR_OPTIMIZATION_DISABLE))
+	},
+
+	/* Xe2_LPM */
+
+	{ XE_RTP_NAME("16021639441"),
+	  XE_RTP_RULES(MEDIA_VERSION(2000)),
+	  XE_RTP_ACTIONS(SET(CSFE_CHICKEN1(0),
+			     GHWSP_CSB_REPORT_DIS |
+			     PPHWSP_CSB_AND_TIMESTAMP_REPORT_DIS,
+			     XE_RTP_ACTION_FLAG(ENGINE_BASE)))
 	},
 
 	/* Xe2_HPM */
···
 
 	xe_rtp_process_ctx_enable_active_tracking(&ctx, gt->wa_active.oob,
 						  ARRAY_SIZE(oob_was));
+	gt->wa_active.oob_initialized = true;
 	xe_rtp_process(&ctx, oob_was);
 }
drivers/gpu/drm/xe/xe_wa.h (+6 -1)
···
 #ifndef _XE_WA_
 #define _XE_WA_
 
+#include "xe_assert.h"
+
 struct drm_printer;
 struct xe_gt;
 struct xe_hw_engine;
···
  * @gt__: gt instance
  * @id__: XE_OOB_<id__>, as generated by build system in generated/xe_wa_oob.h
  */
-#define XE_WA(gt__, id__) test_bit(XE_WA_OOB_ ## id__, (gt__)->wa_active.oob)
+#define XE_WA(gt__, id__) ({						\
+	xe_gt_assert(gt__, (gt__)->wa_active.oob_initialized);		\
+	test_bit(XE_WA_OOB_ ## id__, (gt__)->wa_active.oob);		\
+})
 
 #endif
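The hardened XE_WA() above uses a GNU C statement expression so it can assert the workaround table was populated before yielding the bit test as the macro's value. A user-space model of the same pattern (wa_table and WA_CHECK are illustrative names, and the bitmap is a plain unsigned long rather than the kernel's bitmap API):

```c
#include <assert.h>
#include <stdbool.h>

struct wa_table {
	bool initialized;	/* models gt->wa_active.oob_initialized */
	unsigned long bits;	/* models the oob workaround bitmap */
};

/* Statement expression: runs the assert, evaluates to the bit test.
 * Catches lookups that happen before the table is built. */
#define WA_CHECK(t__, bit__) ({					\
	assert((t__)->initialized);				\
	((t__)->bits >> (bit__)) & 1ul;				\
})
```

Statement expressions are a GCC/Clang extension, which is fine here since the kernel already requires them.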
drivers/gpu/drm/xe/xe_wa_oob.rules (+2)
···
 13011645652	GRAPHICS_VERSION(2004)
 22019338487	MEDIA_VERSION(2000)
 		GRAPHICS_VERSION(2001)
+22019338487_display	PLATFORM(LUNARLAKE)
+16023588340	GRAPHICS_VERSION(2001)
include/uapi/drm/xe_drm.h (+13 -5)
···
  *   available per Dual Sub Slices (DSS). For example a query response
  *   containing the following in mask:
  *   ``EU_PER_DSS    ff ff 00 00 00 00 00 00``
- *   means each DSS has 16 EU.
+ *   means each DSS has 16 SIMD8 EUs. This type may be omitted if device
+ *   doesn't have SIMD8 EUs.
+ * - %DRM_XE_TOPO_SIMD16_EU_PER_DSS - To query the mask of SIMD16 Execution
+ *   Units (EU) available per Dual Sub Slices (DSS). For example a query
+ *   response containing the following in mask:
+ *   ``SIMD16_EU_PER_DSS    ff ff 00 00 00 00 00 00``
+ *   means each DSS has 16 SIMD16 EUs. This type may be omitted if device
+ *   doesn't have SIMD16 EUs.
 */
 struct drm_xe_query_topology_mask {
 	/** @gt_id: GT ID the mask is associated with */
···
 #define DRM_XE_TOPO_DSS_COMPUTE		2
 #define DRM_XE_TOPO_L3_BANK		3
 #define DRM_XE_TOPO_EU_PER_DSS		4
+#define DRM_XE_TOPO_SIMD16_EU_PER_DSS	5
 	/** @type: type of mask */
 	__u16 type;
···
  * b. Counter select c. Counter size and d. BC report. Also refer to the
  * oa_formats array in drivers/gpu/drm/xe/xe_oa.c.
  */
-#define DRM_XE_OA_FORMAT_MASK_FMT_TYPE		(0xff << 0)
-#define DRM_XE_OA_FORMAT_MASK_COUNTER_SEL	(0xff << 8)
-#define DRM_XE_OA_FORMAT_MASK_COUNTER_SIZE	(0xff << 16)
-#define DRM_XE_OA_FORMAT_MASK_BC_REPORT		(0xff << 24)
+#define DRM_XE_OA_FORMAT_MASK_FMT_TYPE		(0xffu << 0)
+#define DRM_XE_OA_FORMAT_MASK_COUNTER_SEL	(0xffu << 8)
+#define DRM_XE_OA_FORMAT_MASK_COUNTER_SIZE	(0xffu << 16)
+#define DRM_XE_OA_FORMAT_MASK_BC_REPORT		(0xffu << 24)
 
 /**
  * @DRM_XE_OA_PROPERTY_OA_PERIOD_EXPONENT: Requests periodic OA unit
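The 0xff-to-0xffu change in the OA format masks matters because `0xff << 24` shifts a signed int into its sign bit (undefined behavior in C), and the resulting negative value sign-extends if widened to a 64-bit type. With `0xffu` both the shift and any widening are well defined. A small sketch of extracting fields with such masks (mask names and field_get are illustrative, not part of the UAPI):

```c
#include <stdint.h>

/* Unsigned literals keep the high-byte mask well defined and
 * zero-extended on widening. */
#define FMT_TYPE_MASK	(0xffu << 0)
#define BC_REPORT_MASK	(0xffu << 24)

static uint32_t field_get(uint32_t value, uint32_t mask, unsigned int shift)
{
	return (value & mask) >> shift;
}
```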