Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

drm/amdkfd: fixed page fault when enable MES shader debugger

Initialize the process context address before setting the shader debugger.

[ 260.781212] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
[ 260.781236] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 260.781255] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040A40
[ 260.781270] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 260.781284] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
[ 260.781296] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[ 260.781308] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x4
[ 260.781320] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 260.781332] amdgpu 0000:03:00.0: amdgpu: RW: 0x1
[ 260.782017] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
[ 260.782039] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 260.782058] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040A41
[ 260.782073] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 260.782087] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[ 260.782098] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[ 260.782110] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x4
[ 260.782122] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 260.782137] amdgpu 0000:03:00.0: amdgpu: RW: 0x1
[ 260.782155] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
[ 260.782166] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10

Fixes: 438b39ac74e2 ("drm/amdkfd: pause autosuspend when creating pdd")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3849
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

authored by

Jesse.zhang@amd.com and committed by
Alex Deucher
5b231f5b a317017f

+17
+17
drivers/gpu/drm/amd/amdkfd/kfd_debug.c
··· 350 350 { 351 351 uint32_t spi_dbg_cntl = pdd->spi_dbg_override | pdd->spi_dbg_launch_mode; 352 352 uint32_t flags = pdd->process->dbg_flags; 353 + struct amdgpu_device *adev = pdd->dev->adev; 354 + int r; 353 355 354 356 if (!kfd_dbg_is_per_vmid_supported(pdd->dev)) 355 357 return 0; 358 + 359 + if (!pdd->proc_ctx_cpu_ptr) { 360 + r = amdgpu_amdkfd_alloc_gtt_mem(adev, 361 + AMDGPU_MES_PROC_CTX_SIZE, 362 + &pdd->proc_ctx_bo, 363 + &pdd->proc_ctx_gpu_addr, 364 + &pdd->proc_ctx_cpu_ptr, 365 + false); 366 + if (r) { 367 + dev_err(adev->dev, 368 + "failed to allocate process context bo\n"); 369 + return r; 370 + } 371 + memset(pdd->proc_ctx_cpu_ptr, 0, AMDGPU_MES_PROC_CTX_SIZE); 372 + } 356 373 357 374 return amdgpu_mes_set_shader_debugger(pdd->dev->adev, pdd->proc_ctx_gpu_addr, spi_dbg_cntl, 358 375 pdd->watch_points, flags, sq_trap_en);