Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/sched: Add boolean to mark if sched is ready to work v5

Problem:
A particular scheduler may become unusable (the underlying HW may
hang) after some event (e.g. GPU reset). If it's later chosen by
the get-free-sched policy, a command will fail to be submitted.

Fix:
Add a driver-specific callback to report the scheduler status so that
an rq with a bad scheduler can be avoided in favor of a working one,
or of none, in which case job init will fail.

v2: Switch from driver callback to flag in scheduler.

v3: rebase

v4: Remove the ready parameter from drm_sched_init, set it
unconditionally to true once init is done.

v5: Fix a change in v3d that was missed in v4 (Alex)

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Andrey Grodzovsky, committed by Alex Deucher
faf6e1a8 2bb42410

+17 -1
drivers/gpu/drm/scheduler/sched_entity.c (+8 -1)

···
 	int i;
 
 	for (i = 0; i < entity->num_rq_list; ++i) {
-		num_jobs = atomic_read(&entity->rq_list[i]->sched->num_jobs);
+		struct drm_gpu_scheduler *sched = entity->rq_list[i]->sched;
+
+		if (!entity->rq_list[i]->sched->ready) {
+			DRM_WARN("sched%s is not ready, skipping", sched->name);
+			continue;
+		}
+
+		num_jobs = atomic_read(&sched->num_jobs);
 		if (num_jobs < min_jobs) {
 			min_jobs = num_jobs;
 			rq = entity->rq_list[i];
drivers/gpu/drm/scheduler/sched_main.c (+6)

···
 	struct drm_gpu_scheduler *sched;
 
 	drm_sched_entity_select_rq(entity);
+	if (!entity->rq)
+		return -ENOENT;
+
 	sched = entity->rq->sched;
 
 	job->sched = sched;
···
 		return PTR_ERR(sched->thread);
 	}
 
+	sched->ready = true;
 	return 0;
 }
 EXPORT_SYMBOL(drm_sched_init);
···
 {
 	if (sched->thread)
 		kthread_stop(sched->thread);
+
+	sched->ready = false;
 }
 EXPORT_SYMBOL(drm_sched_fini);
include/drm/gpu_scheduler.h (+3)

···
  * @hang_limit: once the hangs by a job crosses this limit then it is marked
  * guilty and it will be considered for scheduling further.
  * @num_jobs: the number of jobs in queue in the scheduler
+ * @ready: marks if the underlying HW is ready to work
  *
  * One scheduler is implemented for each hardware ring.
  */
···
 	spinlock_t job_list_lock;
 	int hang_limit;
 	atomic_t num_jobs;
+	bool ready;
 };
 
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
 		   uint32_t hw_submission, unsigned hang_limit, long timeout,
 		   const char *name);
+
 void drm_sched_fini(struct drm_gpu_scheduler *sched);
 int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,