Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amd: Clean up kfd node on surprise disconnect

When an eGPU is unplugged the KFD topology should also be destroyed
for that GPU. This never happens because the fini_sw callbacks never
get to run. Run them manually before calling amdgpu_device_ip_fini_early()
when a device has already been disconnected.

This location is intentionally chosen to make sure that the kfd locking
refcount doesn't get incremented unintentionally.

Cc: kent.russell@amd.com
Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6a23e7b4332c10f8b56c33a9c5431b52ecff9aab)
Cc: stable@vger.kernel.org

Authored by Mario Limonciello (AMD) and committed by Alex Deucher
28695ca0 9cb6278b

+8
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5063,6 +5063,14 @@

 	amdgpu_ttm_set_buffer_funcs_status(adev, false);

+	/*
+	 * device went through surprise hotplug; we need to destroy topology
+	 * before ip_fini_early to prevent kfd locking refcount issues by calling
+	 * amdgpu_amdkfd_suspend()
+	 */
+	if (drm_dev_is_unplugged(adev_to_drm(adev)))
+		amdgpu_amdkfd_device_fini_sw(adev);
+
 	amdgpu_device_ip_fini_early(adev);

 	amdgpu_irq_fini_hw(adev);
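
The ordering this hunk introduces can be illustrated with a small userspace mock. This is only a sketch under stated assumptions, not the driver code: struct mock_device, mock_log() and every mock_* function are hypothetical stand-ins invented for illustration, and the unplugged flag merely imitates what drm_dev_is_unplugged() would report. The mock records which teardown steps run and in what order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real device structure; not a kernel type. */
struct mock_device {
	bool unplugged;	/* imitates the result of drm_dev_is_unplugged() */
	char log[128];	/* records the order in which teardown steps ran */
};

/* Append one step name to the device log, separated by single spaces. */
static void mock_log(struct mock_device *dev, const char *step)
{
	size_t used = strlen(dev->log);

	snprintf(dev->log + used, sizeof(dev->log) - used, "%s%s",
		 used ? " " : "", step);
}

/* Mock of the KFD software teardown, where the topology is destroyed. */
static void mock_kfd_fini_sw(struct mock_device *dev)
{
	mock_log(dev, "kfd_fini_sw");
}

/* Mock of the early IP block teardown. */
static void mock_ip_fini_early(struct mock_device *dev)
{
	mock_log(dev, "ip_fini_early");
}

/* Mock of the IRQ hardware teardown. */
static void mock_irq_fini_hw(struct mock_device *dev)
{
	mock_log(dev, "irq_fini_hw");
}

/*
 * Mirrors the ordering in the hunk: on a surprise disconnect the KFD
 * software teardown runs first, then the steps that always run.
 * Returns the recorded step order.
 */
static const char *mock_teardown(struct mock_device *dev)
{
	assert(dev != NULL);
	dev->log[0] = '\0';

	if (dev->unplugged)
		mock_kfd_fini_sw(dev);

	mock_ip_fini_early(dev);
	mock_irq_fini_hw(dev);

	return dev->log;
}
```

With unplugged set to true, mock_teardown() records "kfd_fini_sw ip_fini_early irq_fini_hw". With it set to false, the KFD step is skipped here and only "ip_fini_early irq_fini_hw" is recorded, matching the commit's intent of running the extra cleanup only for devices that have already been disconnected.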