Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset

Xe can skip the reset if TDR has fired before the free job worker and can
also re-arm the timeout timer in some scenarios. Instead of manipulating
the scheduler's internals, use the new status code
DRM_GPU_SCHED_STAT_NO_HANG to inform the scheduler that the job did not
actually time out and that no reset was performed.

Note that, in the first case, there is no need to restart submission if it
hasn't been stopped.
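
The control flow described above can be sketched as a small standalone C
model. This is only an illustration, not the driver code: the enum, the
structs, the needs_rearm flag and model_timedout_job() are hypothetical
stand-ins, and the real reset path is left out entirely.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the scheduler status codes named above. */
enum model_sched_stat {
	MODEL_SCHED_STAT_RESET,		/* models DRM_GPU_SCHED_STAT_RESET */
	MODEL_SCHED_STAT_NO_HANG,	/* models DRM_GPU_SCHED_STAT_NO_HANG */
};

/* Hypothetical job state; not the real job structure. */
struct model_job {
	bool fence_signaled;	/* models DMA_FENCE_FLAG_SIGNALED_BIT being set */
	bool needs_rearm;	/* assumption: the driver decided to re-arm the timer */
};

/* Hypothetical scheduler state; tracks only whether submission is stopped. */
struct model_sched {
	bool submission_stopped;
};

static enum model_sched_stat
model_timedout_job(struct model_sched *sched, const struct model_job *job)
{
	/*
	 * First case: TDR fired before the free job worker. The job has
	 * already completed and submission was never stopped, so there is
	 * nothing to restart; just report that there was no hang.
	 */
	if (job->fence_signaled)
		return MODEL_SCHED_STAT_NO_HANG;

	/* Models xe_sched_submission_stop(). */
	sched->submission_stopped = true;

	/*
	 * Second case: re-arm the timeout. Submission was stopped above, so
	 * it has to be restarted before reporting that there was no hang.
	 */
	if (job->needs_rearm) {
		sched->submission_stopped = false; /* models xe_sched_submission_start() */
		return MODEL_SCHED_STAT_NO_HANG;
	}

	/* Real hang: the reset itself is outside the scope of this sketch. */
	return MODEL_SCHED_STAT_RESET;
}
```

In both NO_HANG paths the model never puts the job back on a pending list
itself, which mirrors the removal of the xe_sched_add_pending_job() calls
in the diff.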

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-7-5c5ba4f55039@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>

+3 -9
drivers/gpu/drm/xe/xe_guc_submit.c
···
 	 * list so job can be freed and kick scheduler ensuring free job is not
 	 * lost.
 	 */
-	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags)) {
-		xe_sched_add_pending_job(sched, job);
-		xe_sched_submission_start(sched);
-
-		return DRM_GPU_SCHED_STAT_RESET;
-	}
+	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags))
+		return DRM_GPU_SCHED_STAT_NO_HANG;
 
 	/* Kill the run_job entry point */
 	xe_sched_submission_stop(sched);
···
 	 * but there is not currently an easy way to do in DRM scheduler. With
 	 * some thought, do this in a follow up.
 	 */
-	xe_sched_add_pending_job(sched, job);
 	xe_sched_submission_start(sched);
-
-	return DRM_GPU_SCHED_STAT_RESET;
+	return DRM_GPU_SCHED_STAT_NO_HANG;
 }
 
 static void __guc_exec_queue_fini_async(struct work_struct *w)