Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu: abort KIQ waits when there is a pending reset

Stop waiting for the KIQ to return when there is a reset pending.
It's quite likely that the KIQ will never respond.

Signed-off-by: Koenig Christian <Christian.Koenig@amd.com>
Suggested-by: Lazar Lijo <Lijo.Lazar@amd.com>
Tested-by: Victor Skvortsov <victor.skvortsov@amd.com>
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

authored by Victor Skvortsov and committed by Alex Deucher
19cff165 96595204

+8 -1
+2 -1
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
···
 		goto failed_kiq;
 
 	might_sleep();
-	while (r < 1 && cnt++ < MAX_KIQ_REG_TRY) {
+	while (r < 1 && cnt++ < MAX_KIQ_REG_TRY &&
+	       !amdgpu_reset_pending(adev->reset_domain)) {
 
 		msleep(MAX_KIQ_REG_BAILOUT_INTERVAL);
 		r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
+6
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
···
 	return queue_work(domain->wq, work);
 }
 
+static inline bool amdgpu_reset_pending(struct amdgpu_reset_domain *domain)
+{
+	lockdep_assert_held(&domain->sem);
+	return rwsem_is_contended(&domain->sem);
+}
+
 void amdgpu_device_lock_reset_domain(struct amdgpu_reset_domain *reset_domain);
 
 void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain);
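
Note: the control flow of the patched wait loop can be sketched in userspace C. This is an illustrative model only, not driver code. Every name below (mock_reset_domain, mock_reset_pending, mock_fence_poll, wait_with_abort, MAX_TRIES) is hypothetical. The semaphore is modelled as a "held" flag plus a waiter count, on the assumption that contention on the reset domain's rwsem means a reset handler is queued behind the current holder. The sleep between polls is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TRIES 5 /* stands in for MAX_KIQ_REG_TRY; the value is illustrative */

/* Hypothetical stand-in for struct amdgpu_reset_domain. */
struct mock_reset_domain {
	bool sem_held; /* the caller holds the semaphore */
	int waiters;   /* tasks queued behind it, e.g. a reset handler */
};

/*
 * Same shape as amdgpu_reset_pending(): the caller must hold the
 * semaphore, and contention is read as "a reset wants to run".
 */
static bool mock_reset_pending(const struct mock_reset_domain *d)
{
	assert(d->sem_held); /* userspace substitute for lockdep_assert_held() */
	return d->waiters > 0;
}

/* Hypothetical fence poll: reports completion once *polls reaches ready_after. */
static int mock_fence_poll(int *polls, int ready_after)
{
	(*polls)++;
	return *polls >= ready_after ? 1 : 0;
}

/*
 * Bounded polling loop with the early-abort condition added by the patch.
 * reset_at is the poll count at which a simulated reset handler starts
 * waiting on the semaphore (0 means never). Returns the number of polls
 * performed and sets *signaled if the fence completed.
 */
static int wait_with_abort(struct mock_reset_domain *d, int ready_after,
			   int reset_at, bool *signaled)
{
	int r = 0, cnt = 0, polls = 0;

	while (r < 1 && cnt++ < MAX_TRIES && !mock_reset_pending(d)) {
		r = mock_fence_poll(&polls, ready_after);
		if (polls == reset_at)
			d->waiters = 1; /* a reset request arrives mid-wait */
	}

	*signaled = r >= 1;
	return polls;
}
```

In this model, a fence that never signals costs MAX_TRIES polls when no reset arrives, but the loop exits on the very next iteration once a waiter appears. That is the behaviour the commit message describes: there is little point in polling the KIQ once a reset is queued.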