Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amd/amdgpu: fix bad job hw_fence use after free in advance tdr

[Why]
In advanced TDR mode, the real bad job is resubmitted twice, and
drm_sched_resubmit_jobs_ext() calls dma_fence_put() on each resubmission,
so the bad job's hw fence is put one more time than those of the other
jobs.

[How]
Add a dma_fence_get() before resubmitting the job in
amdgpu_device_recheck_guilty_jobs(), and put the fence for normal jobs
once their hw fence has signaled.
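The balancing argument above can be checked with a small toy ledger. This is a hypothetical model, not kernel code: the names fence_ledger, model_resubmit, model_recovery and check_model are illustrative. It tracks only the net gets minus puts that the recovery path applies to one job's hw fence, not the fence's real refcount, and it assumes, per the message, that the real bad job is resubmitted a second time instead of reaching the signaled path.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy ledger, not kernel code: it records only the net number
 * of gets minus puts that the recovery path applies to one job's hw fence
 * (s_fence->parent), not the fence's absolute refcount.
 */
struct fence_ledger {
	int net;
};

static void ledger_get(struct fence_ledger *l) { l->net++; }
static void ledger_put(struct fence_ledger *l) { l->net--; }

/* Stands in for the dma_fence_put() inside drm_sched_resubmit_jobs_ext(). */
static void model_resubmit(struct fence_ledger *l) { ledger_put(l); }

/*
 * Net change seen by one job in this model.
 * is_bad:   the job times out in the recheck and is resubmitted again later.
 * with_fix: apply the dma_fence_get()/dma_fence_put() pair this commit adds.
 */
int model_recovery(bool is_bad, bool with_fix)
{
	struct fence_ledger l = { 0 };

	if (with_fix)
		ledger_get(&l);		/* added get before the recheck resubmit */
	model_resubmit(&l);		/* resubmit in the recheck step */

	if (is_bad)
		model_resubmit(&l);	/* second resubmission of the real bad job */
	else if (with_fix)
		ledger_put(&l);		/* added put once the hw fence has signaled */

	return l.net;
}

/* Checks the two claims from the commit message against the model. */
void check_model(void)
{
	/* Without the fix, the bad job sees one extra put. */
	assert(model_recovery(true, false) == model_recovery(false, false) - 1);
	/* With the fix, bad and normal jobs see the same net change. */
	assert(model_recovery(true, true) == model_recovery(false, true));
}
```

In this model, model_recovery(true, false) returns -2 while the other three combinations return -1. The added get brings the bad job in line with the normal jobs, and the matching put on the signaled path keeps the normal jobs' net count where it was before the fix.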

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Jingwen Chen and committed by Alex Deucher.
38d4e463 d9bd0541

+4
+4
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4850,6 +4850,9 @@
 
 		/* clear job's guilty and depend the folowing step to decide the real one */
 		drm_sched_reset_karma(s_job);
+		/* for the real bad job, it will be resubmitted twice, adding a dma_fence_get
+		 * to make sure fence is balanced */
+		dma_fence_get(s_job->s_fence->parent);
 		drm_sched_resubmit_jobs_ext(&ring->sched, 1);
 
 		ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, ring->sched.timeout);
@@ -4888,6 +4885,7 @@
 
 		/* got the hw fence, signal finished fence */
 		atomic_dec(ring->sched.score);
+		dma_fence_put(s_job->s_fence->parent);
 		dma_fence_get(&s_job->s_fence->finished);
 		dma_fence_signal(&s_job->s_fence->finished);
 		dma_fence_put(&s_job->s_fence->finished);