Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

issues:
MEC is ruined by amdkfd_pre_reset() when it runs after the VF FLR is done

fix:
amdkfd_pre_reset() ruins MEC if it runs after the hypervisor has finished
the VF FLR. The correct sequence is to do amdkfd_pre_reset() before the
VF FLR, but a limitation blocks that sequence:
if we do pre_reset() before the VF FLR, it goes the KIQ way to do register
access and gets stuck there, because KIQ probably won't work by that time
(e.g. GFX has already hung).

So the best way right now is to simply remove the call.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Monk Liu and committed by Alex Deucher
5a7489a7 1512d064

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 --
 1 file changed, 2 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3669,8 +3669,6 @@
 	if (r)
 		return r;
 
-	amdgpu_amdkfd_pre_reset(adev);
-
 	/* Resume IP prior to SMC */
 	r = amdgpu_device_ip_reinit_early_sriov(adev);
 	if (r)
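
The ordering argument in the commit message can be illustrated with a small toy model. This is only a sketch, not amdgpu driver code: every name in it (`vf_state`, `model_vf_flr`, `model_kfd_pre_reset`, `model_reset_sriov`) is hypothetical. It assumes that TDR starts with GFX hung and KIQ unusable, that the VF FLR brings both back, and that the pre-reset step needs KIQ and disturbs MEC.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a VF during TDR. All names are hypothetical. */
struct vf_state {
	bool kiq_alive;   /* can registers be reached through the KIQ ring? */
	bool mec_intact;  /* is MEC in a state where the KIQ ring test passes? */
};

enum reset_result {
	RESET_OK,
	RESET_STUCK_IN_KIQ,
	RESET_KIQ_TEST_FAILED
};

enum pre_reset_order {
	PRE_RESET_BEFORE_FLR,
	PRE_RESET_AFTER_FLR,
	PRE_RESET_REMOVED
};

/* Stand-in for the VF FLR done by the hypervisor (assumed to restore both). */
static void model_vf_flr(struct vf_state *s)
{
	assert(s);
	s->kiq_alive = true;
	s->mec_intact = true;
}

/* Stand-in for the pre-reset step: needs KIQ and disturbs MEC (assumption). */
static bool model_kfd_pre_reset(struct vf_state *s)
{
	assert(s);
	if (!s->kiq_alive)
		return false;  /* register access through KIQ never completes */
	s->mec_intact = false;
	return true;
}

enum reset_result model_reset_sriov(enum pre_reset_order order)
{
	/* TDR is assumed to start from a hung GFX engine, so KIQ is dead. */
	struct vf_state s = { .kiq_alive = false, .mec_intact = false };

	if (order == PRE_RESET_BEFORE_FLR && !model_kfd_pre_reset(&s))
		return RESET_STUCK_IN_KIQ;

	model_vf_flr(&s);

	if (order == PRE_RESET_AFTER_FLR)
		model_kfd_pre_reset(&s);

	/* The KIQ ring test only passes if MEC is still intact here. */
	return s.mec_intact ? RESET_OK : RESET_KIQ_TEST_FAILED;
}
```

Under these assumptions, calling `model_reset_sriov()` with each of the three orderings shows why the call is dropped rather than moved: pre-reset before the FLR gets stuck in KIQ, pre-reset after the FLR fails the ring test, and only the variant without the call completes.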