Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/xe: Fix taking invalid lock on wedge

If the device wedges early, e.g. during GuC upload, submission is not yet
enabled and its state is not even initialized. Protect the wedge call so it
does nothing in this case. This fixes the following splat:

[] xe 0000:bf:00.0: [drm] device wedged, needs recovery
[] ------------[ cut here ]------------
[] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[] WARNING: CPU: 48 PID: 312 at kernel/locking/mutex.c:564 __mutex_lock+0x8a1/0xe60
...
[] RIP: 0010:__mutex_lock+0x8a1/0xe60
[] mutex_lock_nested+0x1b/0x30
[] xe_guc_submit_wedge+0x80/0x2b0 [xe]

Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250402-warn-after-wedge-v1-1-93e971511fa5@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

Diffstat: +14 lines in total

drivers/gpu/drm/xe/xe_guc_submit.c (+9)
@@ -300,6 +300,8 @@
 
 	primelockdep(guc);
 
+	guc->submission_state.initialized = true;
+
 	return drmm_add_action_or_reset(&xe->drm, guc_submit_fini, guc);
 }
 
@@ -835,6 +833,13 @@
 	int err;
 
 	xe_gt_assert(guc_to_gt(guc), guc_to_xe(guc)->wedged.mode);
+
+	/*
+	 * If device is being wedged even before submission_state is
+	 * initialized, there's nothing to do here.
+	 */
+	if (!guc->submission_state.initialized)
+		return;
 
 	err = devm_add_action_or_reset(guc_to_xe(guc)->drm.dev,
 				       guc_submit_wedged_fini, guc);
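As a rough illustration of the pattern this hunk applies (set a flag as the last step of initialization, and have the wedge path return early while it is still false), here is a minimal userspace sketch. It is not xe code: struct submit_state, submit_init(), submit_wedge() and submit_fini() are hypothetical names, and a pthread mutex stands in for the kernel's struct mutex.

```c
/*
 * Userspace sketch of an "initialized" guard. Hypothetical names; a
 * pthread mutex stands in for the kernel's struct mutex.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct submit_state {
	bool initialized;       /* set as the very last step of submit_init() */
	pthread_mutex_t lock;   /* only valid once initialized is true */
	int wedged_queues;      /* stand-in for work done under the lock */
};

static int submit_init(struct submit_state *s)
{
	if (pthread_mutex_init(&s->lock, NULL) != 0)
		return -1;
	s->wedged_queues = 0;
	/* Publish the flag only after everything it guards is valid. */
	s->initialized = true;
	return 0;
}

/* Returns 1 if it did work, 0 if it bailed out before init. */
static int submit_wedge(struct submit_state *s)
{
	/* Wedged before init (e.g. during GuC upload): never touch the lock. */
	if (!s->initialized)
		return 0;

	pthread_mutex_lock(&s->lock);
	s->wedged_queues++;
	pthread_mutex_unlock(&s->lock);
	return 1;
}

static void submit_fini(struct submit_state *s)
{
	/* Teardown is only expected after a successful submit_init(). */
	assert(s->initialized);
	s->initialized = false;
	pthread_mutex_destroy(&s->lock);
}
```

Setting the flag last means any path that sees it as true can rely on the lock and the rest of the state it guards. The sketch assumes init and wedge do not run concurrently; it adds no memory ordering around the flag.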

drivers/gpu/drm/xe/xe_guc_types.h (+5)
@@ -89,6 +89,11 @@
 		struct mutex lock;
 		/** @submission_state.enabled: submission is enabled */
 		bool enabled;
+		/**
+		 * @submission_state.initialized: mark when submission state is
+		 * even initialized - before that not even the lock is valid
+		 */
+		bool initialized;
 		/** @submission_state.fini_wq: submit fini wait queue */
 		wait_queue_head_t fini_wq;
 	} submission_state;
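The DEBUG_LOCKS_WARN_ON(lock->magic != lock) line in the splat comes from a debug sanity check: with CONFIG_DEBUG_MUTEXES, mutex initialization stores the mutex's own address in its magic field, and the lock path warns when that self-pointer does not match. A never-initialized mutex therefore trips the check on its first lock. The sketch below is a simplified, hypothetical model of that check (struct dbg_mutex and its helpers are not kernel APIs). Unlike the kernel, which only warns, dbg_mutex_lock() here refuses the lock so the outcome is easy to test.

```c
/*
 * Simplified model of the debug-mutex "magic" check. Hypothetical code,
 * not the kernel implementation.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct dbg_mutex {
	void *magic;  /* set to the mutex's own address by dbg_mutex_init() */
	bool locked;
};

static void dbg_mutex_init(struct dbg_mutex *m)
{
	m->magic = m;  /* self-pointer: "this object was initialized in place" */
	m->locked = false;
}

/* Warns and refuses the lock if the object was never initialized in place. */
static bool dbg_mutex_lock(struct dbg_mutex *m)
{
	if (m->magic != m) {
		fprintf(stderr, "DEBUG_LOCKS_WARN_ON(lock->magic != lock)\n");
		return false;
	}
	m->locked = true;
	return true;
}

static void dbg_mutex_unlock(struct dbg_mutex *m)
{
	/* Only a successfully locked, initialized mutex may be unlocked. */
	assert(m->magic == m && m->locked);
	m->locked = false;
}
```

A self-pointer catches more than a plain flag would: zero-filled memory fails the check, and so does a mutex copied by value, because the copy's magic still points at the original.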