Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu: Increase tlb flush timeout for sriov

[Why]
While running a benchmark (Luxmark) on multiple VFs, KIQ timeout errors were observed.
This happens because all VFs perform TLB invalidation at the same time.
Although each VF has its own invalidate register set, on the hardware side
the invalidate requests are queued for execution.

[How]
For the 12-VF case, increase the timeout to 12 * 100 ms (1.2 s).
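
The arithmetic behind the new value can be sketched as follows. This is a stand-alone illustration, not driver code: the helper `serialized_timeout_usec` and both constant names are hypothetical, and the 100 ms per-VF budget is simply read off the "12 * 100 ms" figure in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: per-VF wait budget of 100 ms, taken from "12 * 100 ms". */
#define PER_VF_TIMEOUT_USEC 100000u
/* VF count named in the commit message. */
#define NUM_VFS 12u

/*
 * Hypothetical helper: if the hardware executes the VFs' invalidate
 * requests from a queue, the last VF in line may have to wait for
 * every earlier request, so the budget scales with the VF count.
 */
static uint32_t serialized_timeout_usec(uint32_t num_vfs, uint32_t per_vf_usec)
{
    /* Guard against overflow of the 32-bit product. */
    assert(num_vfs == 0 || per_vf_usec <= UINT32_MAX / num_vfs);
    return num_vfs * per_vf_usec;
}
```

Calling `serialized_timeout_usec(NUM_VFS, PER_VF_TIMEOUT_USEC)` gives 1200000, which matches the value the patch defines as `SRIOV_USEC_TIMEOUT`.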

Signed-off-by: Dusica Milinkovic <Dusica.Milinkovic@amd.com>
Acked-by: Shaoyun Liu <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Dusica Milinkovic and committed by Alex Deucher
373008bf c7dafdfa

+5 -3
+1 -1
drivers/gpu/drm/amd/amdgpu/amdgpu.h
···
 	AMDGPU_CP_KIQ_IRQ_DRIVER0 = 0,
 	AMDGPU_CP_KIQ_IRQ_LAST
 };
-
+#define SRIOV_USEC_TIMEOUT 1200000 /* wait 12 * 100ms for SRIOV */
 #define MAX_KIQ_REG_WAIT 5000 /* in usecs, 5ms */
 #define MAX_KIQ_REG_BAILOUT_INTERVAL 5 /* in msecs, 5ms */
 #define MAX_KIQ_REG_TRY 1000
+2 -1
drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
···
 	uint32_t seq;
 	uint16_t queried_pasid;
 	bool ret;
+	u32 usec_timeout = amdgpu_sriov_vf(adev) ? SRIOV_USEC_TIMEOUT : adev->usec_timeout;
 	struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq;

···

 		amdgpu_ring_commit(ring);
 		spin_unlock(&adev->gfx.kiq.ring_lock);
-		r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
+		r = amdgpu_fence_wait_polling(ring, seq, usec_timeout);
 		if (r < 1) {
 			dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
 			return -ETIME;
+2 -1
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
···
 	uint32_t seq;
 	uint16_t queried_pasid;
 	bool ret;
+	u32 usec_timeout = amdgpu_sriov_vf(adev) ? SRIOV_USEC_TIMEOUT : adev->usec_timeout;
 	struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq;

···

 		amdgpu_ring_commit(ring);
 		spin_unlock(&adev->gfx.kiq.ring_lock);
-		r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
+		r = amdgpu_fence_wait_polling(ring, seq, usec_timeout);
 		if (r < 1) {
 			dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
 			up_read(&adev->reset_domain->sem);
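
The effect of the change in both files can be modelled outside the kernel. The sketch below is a simplified user-space illustration under stated assumptions, not the driver implementation: `struct fake_dev`, `pick_timeout_usec`, and `poll_fence_model` are hypothetical names, the polling loop only imitates a wait that counts a budget down to zero, and the 100000 µs bare-metal timeout used in the usage note is an assumed example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRIOV_USEC_TIMEOUT 1200000u /* value from the patch: 12 * 100 ms */

/* Hypothetical stand-in for the fields of the device the patch reads. */
struct fake_dev {
    bool is_sriov_vf;      /* models the result of amdgpu_sriov_vf(adev) */
    uint32_t usec_timeout; /* models adev->usec_timeout */
};

/* Mirrors the ternary added by the patch: VFs get the longer budget. */
static uint32_t pick_timeout_usec(const struct fake_dev *dev)
{
    return dev->is_sriov_vf ? SRIOV_USEC_TIMEOUT : dev->usec_timeout;
}

/*
 * Simplified polling model: the "fence" signals after signal_after_usec.
 * Returns the remaining budget (> 0) on success and 0 on timeout, so a
 * caller can treat any result below 1 as a failed wait, like `r < 1`.
 */
static long poll_fence_model(long signal_after_usec, long timeout_usec,
                             long step_usec)
{
    long elapsed = 0;

    assert(step_usec > 0);
    while (elapsed < signal_after_usec && timeout_usec > 0) {
        elapsed += step_usec;      /* stands in for a short busy-wait */
        timeout_usec -= step_usec;
    }
    return timeout_usec > 0 ? timeout_usec : 0;
}
```

In this model, a fence that signals after 500000 µs times out with an assumed 100000 µs budget (`poll_fence_model(500000, 100000, 100)` returns 0) but completes with the SR-IOV budget (`poll_fence_model(500000, 1200000, 100)` returns 700000), which is the situation the patch addresses for a VF whose request sits behind other VFs' invalidations.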