Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/xe/guc: Don't treat GuC generic CAT error as protocol error

GuC uses GUC_ID_UNKNOWN if it cannot map the CAT fault to any
context. We shouldn't treat that as a G2H protocol error that would
justify a GT reset, as it may happen due to some VF activity.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241105204557.1991-1-michal.wajdeczko@intel.com
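The reasoning above can be modeled in a few lines of user-space C. This is a simplified sketch, not the xe driver code. cat_error_handler(), dispatch_g2h(), queue_registered[], the bounds check and the counters are all hypothetical stand-ins. The sketch also assumes, as the commit message implies, that a negative return from a G2H handler is what escalates to a GT reset.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Values from the patch: GUC_ID_MAX bounds real IDs and 0xffffffff
 * marks a CAT fault GuC could not attribute to any context. */
#define GUC_ID_MAX     65535u
#define GUC_ID_UNKNOWN 0xffffffffu

/* Hypothetical stand-in for the driver's exec-queue lookup table. */
static int queue_registered[GUC_ID_MAX + 1];

/* Hypothetical counters standing in for "log an error" and "reset the GT". */
static unsigned int cat_errors_logged;
static unsigned int gt_resets;

/* Simplified model of the patched handler: 0 means the message was
 * acceptable, -EPROTO means it named a context the driver doesn't know. */
static int cat_error_handler(const uint32_t *msg)
{
	uint32_t guc_id = msg[0];

	if (guc_id == GUC_ID_UNKNOWN) {
		/* Unattributable fault: report it, but it is not a protocol error. */
		cat_errors_logged++;
		fprintf(stderr, "Memory CAT error reported by GuC!\n");
		return 0;
	}

	/* Hypothetical bounds/lookup check standing in for g2h_exec_queue_lookup(). */
	if (guc_id > GUC_ID_MAX || !queue_registered[guc_id])
		return -EPROTO;

	/* A real driver would act on the faulting queue here. */
	return 0;
}

/* Hypothetical G2H dispatcher: any handler error escalates to a GT reset. */
static void dispatch_g2h(const uint32_t *msg)
{
	if (cat_error_handler(msg) < 0)
		gt_resets++;
}
```

With the sentinel check in place, an unattributable fault is logged and acknowledged. Only an ID that is neither the sentinel nor a registered queue still counts as a protocol error.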

2 files changed, 10 insertions(+)

drivers/gpu/drm/xe/xe_guc_fwif.h (+1)
```diff
@@ -17,6 +17,7 @@
 #define G2H_LEN_DW_TLB_INVALIDATE 3
 
 #define GUC_ID_MAX 65535
+#define GUC_ID_UNKNOWN 0xffffffff
 
 #define GUC_CONTEXT_DISABLE 0
 #define GUC_CONTEXT_ENABLE 1
```
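GUC_ID_UNKNOWN is the all-ones 32-bit value, far above GUC_ID_MAX. Comparing the raw first dword of the message against it therefore cannot be confused with a real context ID. The sketch below makes this explicit with a compile-time check. classify_guc_id() and enum cat_target are hypothetical. Treating GUC_ID_MAX as an inclusive upper bound is an assumption made only for illustration.

```c
#include <stdint.h>

/* Constants as added/used in xe_guc_fwif.h by this patch. */
#define GUC_ID_MAX     65535
#define GUC_ID_UNKNOWN 0xffffffff

/* Compile-time check: the sentinel lies outside the range of valid IDs. */
_Static_assert(GUC_ID_UNKNOWN > GUC_ID_MAX, "sentinel must not be a valid guc_id");

/* Hypothetical classification of the first dword of a CAT error G2H. */
enum cat_target { CAT_UNATTRIBUTED, CAT_CONTEXT, CAT_OUT_OF_RANGE };

static enum cat_target classify_guc_id(uint32_t guc_id)
{
	/* Check the sentinel first so it never reaches the ID range test. */
	if (guc_id == GUC_ID_UNKNOWN)
		return CAT_UNATTRIBUTED;
	/* Assumption: GUC_ID_MAX is treated as inclusive here. */
	if (guc_id <= GUC_ID_MAX)
		return CAT_CONTEXT;
	return CAT_OUT_OF_RANGE;
}
```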
drivers/gpu/drm/xe/xe_guc_submit.c (+9)
```diff
@@ -2021,6 +2021,15 @@
 
 	guc_id = msg[0];
 
+	if (guc_id == GUC_ID_UNKNOWN) {
+		/*
+		 * GuC uses GUC_ID_UNKNOWN if it can not map the CAT fault to any PF/VF
+		 * context. In such case only PF will be notified about that fault.
+		 */
+		xe_gt_err_ratelimited(gt, "Memory CAT error reported by GuC!\n");
+		return 0;
+	}
+
 	q = g2h_exec_queue_lookup(guc, guc_id);
 	if (unlikely(!q))
 		return -EPROTO;
```
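The new branch reports through xe_gt_err_ratelimited() rather than a plain error print. A stream of unattributable faults, for example from repeated VF activity, therefore cannot flood the PF's log. The sketch below shows the interval/burst idea and is loosely modeled on the kernel's struct ratelimit_state. It assumes, without checking the xe macro, that xe_gt_err_ratelimited() relies on that kind of limiter. report_unknown_cat_error() and the caller-supplied tick clock are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal interval/burst limiter, loosely modeled on the kernel's
 * struct ratelimit_state. Time is a caller-supplied tick count. */
struct ratelimit {
	unsigned long interval; /* window length in ticks */
	unsigned int burst;     /* messages allowed per window */
	unsigned long begin;    /* start of the current window */
	unsigned int printed;   /* messages emitted in this window */
	unsigned int missed;    /* messages suppressed in this window */
};

/* Returns true if a message may be emitted at time 'now'. */
static bool ratelimit_ok(struct ratelimit *rs, unsigned long now)
{
	if (now - rs->begin >= rs->interval) {
		/* Window expired: summarize suppressed messages and start over. */
		if (rs->missed)
			fprintf(stderr, "%u messages suppressed\n", rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Hypothetical stand-in for the xe_gt_err_ratelimited() call in the handler.
 * Returns 1 if the message was printed, 0 if it was suppressed. */
static unsigned int report_unknown_cat_error(struct ratelimit *rs, unsigned long now)
{
	if (!ratelimit_ok(rs, now))
		return 0;
	fprintf(stderr, "Memory CAT error reported by GuC!\n");
	return 1;
}
```

Within one window only the first `burst` messages are printed. The rest are counted and summarized when the next window opens, so the handler can keep returning 0 without risking log spam.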