Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu: fix double gpu_recovery for NV of SRIOV

issue:
gpu_recover() is re-entered by the mailbox interrupt
handler in mxgpu_nv.c

fix:
bypass the gpu_recover() call in the mailbox interrupt
handler as long as the timeout is not infinite: the TDR is then
triggered automatically after the timeout, so there is no need to
invoke gpu_recover() through the mailbox interrupt.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Monk Liu and committed by Alex Deucher
1512d064 198e36ba

+5 -1
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -269,7 +269,11 @@
 	}
 
 	/* Trigger recovery for world switch failure if no TDR */
-	if (amdgpu_device_should_recover_gpu(adev))
+	if (amdgpu_device_should_recover_gpu(adev)
+		&& (adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT ||
+		adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT ||
+		adev->compute_timeout == MAX_SCHEDULE_TIMEOUT ||
+		adev->video_timeout == MAX_SCHEDULE_TIMEOUT))
 		amdgpu_device_gpu_recover(adev, NULL);
 }
 
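
The condition added in this hunk can be modeled in user space as a rough sketch. The struct `timeouts` and the helpers `any_timeout_infinite()` and `mailbox_should_recover()` are hypothetical names used only for illustration and do not exist in the driver. The boolean argument stands in for the result of amdgpu_device_should_recover_gpu(adev). MAX_SCHEDULE_TIMEOUT is redefined here as LONG_MAX, matching its kernel definition, so the sketch builds outside the kernel tree.

```c
#include <limits.h>
#include <stdbool.h>

/* The kernel defines MAX_SCHEDULE_TIMEOUT as LONG_MAX; it is redefined
 * here only so this sketch compiles in user space. */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical stand-in for the four timeout fields of struct amdgpu_device. */
struct timeouts {
	long sdma_timeout;
	long gfx_timeout;
	long compute_timeout;
	long video_timeout;
};

/* True when at least one engine has an infinite timeout, i.e. no TDR
 * will ever fire for that engine on its own. */
static bool any_timeout_infinite(const struct timeouts *t)
{
	return t->sdma_timeout == MAX_SCHEDULE_TIMEOUT ||
	       t->gfx_timeout == MAX_SCHEDULE_TIMEOUT ||
	       t->compute_timeout == MAX_SCHEDULE_TIMEOUT ||
	       t->video_timeout == MAX_SCHEDULE_TIMEOUT;
}

/* Models the patched check: the mailbox path triggers recovery only when
 * recovery is allowed at all (should_recover_gpu) and some timeout is
 * infinite. Otherwise the timeout-driven TDR is left to handle it. */
static bool mailbox_should_recover(bool should_recover_gpu,
				   const struct timeouts *t)
{
	return should_recover_gpu && any_timeout_infinite(t);
}
```

With all four timeouts finite, `mailbox_should_recover()` returns false, so the mailbox path no longer races with the TDR. Setting any single field to MAX_SCHEDULE_TIMEOUT makes it return true again, provided recovery is allowed.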