Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu: Queue KFD reset workitem in VF FED

The guest recovery sequence is buggy on a fatal error when both the
FLR and KFD reset workitems are queued at the same time. In addition,
the FLR guest recovery sequence runs out of order when PF/VF
communication breaks due to a GPU fatal error.

As a temporary workaround, perform a KFD-style reset (initiate the reset
request from the guest) inside the pf2vf thread on FED.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Victor Skvortsov and committed by Alex Deucher
5434bc03 3a19a8af

+1 -1
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -602,7 +602,7 @@
 		    amdgpu_sriov_runtime(adev)) {
 			amdgpu_ras_set_fed(adev, true);
 			if (amdgpu_reset_domain_schedule(adev->reset_domain,
-							 &adev->virt.flr_work))
+							 &adev->kfd.reset_work))
 				return;
 			else
 				dev_err(adev->dev, "Failed to queue work! at %s", __func__);