Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/xe: Fix fault on fd close after unbind

If userspace holds an fd open, unbinds the device, and then closes the
fd, the driver must not access the hardware. Guard the hardware access in
xe_exec_queue_update_run_ticks() with drm_dev_enter()/drm_dev_exit().
This fixes the following page fault:

<6> [IGT] xe_wedged: exiting, ret=98
<1> BUG: unable to handle page fault for address: ffffc901bc5e508c
<1> #PF: supervisor read access in kernel mode
<1> #PF: error_code(0x0000) - not-present page
...
<4> xe_lrc_update_timestamp+0x1c/0xd0 [xe]
<4> xe_exec_queue_update_run_ticks+0x50/0xb0 [xe]
<4> xe_exec_queue_fini+0x16/0xb0 [xe]
<4> __guc_exec_queue_fini_async+0xc4/0x190 [xe]
<4> guc_exec_queue_fini_async+0xa0/0xe0 [xe]
<4> guc_exec_queue_fini+0x23/0x40 [xe]
<4> xe_exec_queue_destroy+0xb3/0xf0 [xe]
<4> xe_file_close+0xd4/0x1a0 [xe]
<4> drm_file_free+0x210/0x280 [drm]
<4> drm_close_helper.isra.0+0x6d/0x80 [drm]
<4> drm_release_noglobal+0x20/0x90 [drm]

Fixes: 514447a12190 ("drm/xe: Stop accumulating LRC timestamp on job_free")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3421
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241218053122.2730195-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 4ca1fd418338d4d135428a0eb1e16e3b3ce17ee8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Authored by Lucas De Marchi and committed by Thomas Hellström
fe39b222 af12ba67

1 file changed, 9 insertions(+)
drivers/gpu/drm/xe/xe_exec_queue.c
···
 #include <linux/nospec.h>
 
 #include <drm/drm_device.h>
+#include <drm/drm_drv.h>
 #include <drm/drm_file.h>
 #include <uapi/drm/xe_drm.h>
 
···
  */
 void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q)
 {
+	struct xe_device *xe = gt_to_xe(q->gt);
 	struct xe_file *xef;
 	struct xe_lrc *lrc;
 	u32 old_ts, new_ts;
+	int idx;
 
 	/*
 	 * Jobs that are run during driver load may use an exec_queue, but are
···
 	 * for kernel specific work.
 	 */
 	if (!q->vm || !q->vm->xef)
+		return;
+
+	/* Synchronize with unbind while holding the xe file open */
+	if (!drm_dev_enter(&xe->drm, &idx))
 		return;
 
 	xef = q->vm->xef;
···
 	lrc = q->lrc[0];
 	new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
 	xef->run_ticks[q->class] += (new_ts - old_ts) * q->width;
+
+	drm_dev_exit(idx);
 }
 
 /**
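
The guard pattern used by this change can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: `fake_dev`, `fake_queue`, `fake_dev_enter()`, `fake_dev_exit()` and `fake_update_run_ticks()` are hypothetical names invented for illustration, and the model is single-threaded. The real drm_dev_enter()/drm_dev_exit() pair is built on SRCU so that unplug can wait for sections that are still in flight, and the real drm_dev_exit() takes only the index. What the sketch shows is the shape of the fix: check whether the device is still present, return early if it is not, and only then touch the (simulated) hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device that can be unbound at runtime. */
struct fake_dev {
	bool unplugged;        /* set once the device has been unbound */
	int active_sections;   /* currently open enter/exit sections */
	uint32_t hw_timestamp; /* stands in for a hardware register read */
};

/* Hypothetical stand-in for an exec queue that accumulates run ticks. */
struct fake_queue {
	struct fake_dev *dev;
	uint32_t last_ts;   /* timestamp seen at the previous update */
	uint64_t run_ticks; /* accumulated ticks */
	uint32_t width;     /* multiplier applied to each delta */
};

/* Model of the "enter" side: refuse entry once the device is gone. */
static bool fake_dev_enter(struct fake_dev *dev, int *idx)
{
	if (dev->unplugged)
		return false;
	*idx = dev->active_sections++;
	return true;
}

/* Model of the "exit" side: close a section opened by fake_dev_enter(). */
static void fake_dev_exit(struct fake_dev *dev, int idx)
{
	(void)idx; /* unused in this single-threaded model */
	assert(dev->active_sections > 0);
	dev->active_sections--;
}

/*
 * Same shape as the patched function: bail out before any hardware
 * access if the device has been unbound. Returns true if an update
 * was performed.
 */
static bool fake_update_run_ticks(struct fake_queue *q)
{
	uint32_t old_ts, new_ts;
	int idx;

	if (!fake_dev_enter(q->dev, &idx))
		return false;

	old_ts = q->last_ts;
	new_ts = q->dev->hw_timestamp; /* the "hardware" read */
	q->last_ts = new_ts;
	/* Unsigned 32-bit subtraction, so a wrapped counter still gives the delta. */
	q->run_ticks += (uint64_t)(uint32_t)(new_ts - old_ts) * q->width;

	fake_dev_exit(q->dev, idx);
	return true;
}
```

In this model, calling `fake_update_run_ticks()` after setting `unplugged` leaves `run_ticks` and `last_ts` untouched, which corresponds to the early return added before `xef = q->vm->xef;` in the diff.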