Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/i915: Fix fallout of fake reset along resume

commit b2209e62a450 ("drm/i915/execlists: Reset the CSB head tracking on
reset/sanitization") and commit 1288786b18f7 ("drm/i915: Move GEM sanitize
from resume_early to resume") show the conflicting requirements on the
code. We must reset the GPU before trashing live state on a fast resume
(hibernation debug, or error paths), but we must only reset our state
tracking iff the GPU is reset (or power cycled). This is tricky if we
are disabling GPU reset to simulate broken hardware: we reset our state
tracking, but the GPU is left intact and resumes from its now-stale state.

v2: Again without the assertion for forcewake, no longer required since
commit b3ee09a4de33 ("drm/i915/ringbuffer: Fix context restore upon reset")
as the contexts are reset from the CS ensuring everything is powered up.

Fixes: b2209e62a450 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization")
Fixes: 1288786b18f7 ("drm/i915: Move GEM sanitize from resume_early to resume")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180616202534.18767-1-chris@chris-wilson.co.uk

+31 -17
+2
drivers/gpu/drm/i915/i915_drv.c
···
 	else
 		intel_display_set_init_power(dev_priv, true);

+	intel_engines_sanitize(dev_priv);
+
 	enable_rpm_wakeref_asserts(dev_priv);

 out:
+5 -9
drivers/gpu/drm/i915/i915_gem.c
···

 void i915_gem_sanitize(struct drm_i915_private *i915)
 {
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
+	int err;

 	GEM_TRACE("\n");

···
 	 * it may impact the display and we are uncertain about the stability
 	 * of the reset, so this could be applied to even earlier gen.
 	 */
+	err = -ENODEV;
 	if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
-		WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
-
-	/* Reset the submission backend after resume as well as the GPU reset */
-	for_each_engine(engine, i915, id) {
-		if (engine->reset.reset)
-			engine->reset.reset(engine, NULL);
-	}
+		err = WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
+	if (!err)
+		intel_engines_sanitize(i915);

 	intel_uncore_forcewake_put(i915, FORCEWAKE_ALL);
 	intel_runtime_pm_put(i915);
+22
drivers/gpu/drm/i915/intel_engine_cs.c
···
 }

 /**
+ * intel_engines_sanitize: called after the GPU has lost power
+ * @i915: the i915 device
+ *
+ * Anytime we reset the GPU, either with an explicit GPU reset or through a
+ * PCI power cycle, the GPU loses state and we must reset our state tracking
+ * to match. Note that calling intel_engines_sanitize() if the GPU has not
+ * been reset results in much confusion!
+ */
+void intel_engines_sanitize(struct drm_i915_private *i915)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	GEM_TRACE("\n");
+
+	for_each_engine(engine, i915, id) {
+		if (engine->reset.reset)
+			engine->reset.reset(engine, NULL);
+	}
+}
+
+/**
  * intel_engines_park: called when the GT is transitioning from busy->idle
  * @i915: the i915 device
  *
-8
drivers/gpu/drm/i915/intel_ringbuffer.c
···
 	GEM_TRACE("%s seqno=%x\n", engine->name, rq ? rq->global_seqno : 0);

 	/*
-	 * RC6 must be prevented until the reset is complete and the engine
-	 * reinitialised. If it occurs in the middle of this sequence, the
-	 * state written to/loaded from the power context is ill-defined (e.g.
-	 * the PP_BASE_DIR may be lost).
-	 */
-	assert_forcewakes_active(engine->i915, FORCEWAKE_ALL);
-
-	/*
 	 * Try to restore the logical GPU state to match the continuation
 	 * of the request queue. If we skip the context/PD restore, then
 	 * the next request may try to execute assuming that its context
+2
drivers/gpu/drm/i915/intel_ringbuffer.h
···
 	return cs;
 }

+void intel_engines_sanitize(struct drm_i915_private *i915);
+
 bool intel_engine_is_idle(struct intel_engine_cs *engine);
 bool intel_engines_are_idle(struct drm_i915_private *dev_priv);

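
The rule this patch enforces in i915_gem_sanitize() is that software state tracking is reset only when the GPU reset actually happened. The following is a minimal userspace sketch of that control flow, not driver code. Everything prefixed with mock_ is hypothetical and exists only for illustration. The mock also returns err so the outcome can be checked, whereas the real function returns void, and it treats the reset result as a plain 0/errno value rather than the boolean that WARN_ON() yields.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for device state; not the real i915 structures. */
struct mock_gpu {
	int gen;            /* hardware generation */
	bool reset_enabled; /* false simulates disabled or broken GPU reset */
	int hw_head;        /* state held by the "hardware" */
	int sw_head;        /* the driver's tracking of that state */
};

/* Models a successful GPU reset: hardware state is lost, returns 0. */
static int mock_gpu_reset(struct mock_gpu *gpu)
{
	gpu->hw_head = 0;
	return 0;
}

/* Models intel_engines_sanitize(): resets the software tracking only. */
static void mock_engines_sanitize(struct mock_gpu *gpu)
{
	gpu->sw_head = 0;
}

/* Mirrors the control flow of the patched i915_gem_sanitize(). */
static int mock_gem_sanitize(struct mock_gpu *gpu)
{
	int err = -ENODEV; /* assume no reset happened until one succeeds */

	if (gpu->gen >= 5 && gpu->reset_enabled)
		err = mock_gpu_reset(gpu);
	if (!err)
		mock_engines_sanitize(gpu);

	return err;
}

/* Usage: tracking follows the hardware in both the reset and no-reset case. */
static void mock_demo(void)
{
	struct mock_gpu ok = { .gen = 9, .reset_enabled = true,
			       .hw_head = 3, .sw_head = 3 };
	struct mock_gpu off = { .gen = 9, .reset_enabled = false,
				.hw_head = 3, .sw_head = 3 };

	assert(mock_gem_sanitize(&ok) == 0);
	assert(ok.hw_head == ok.sw_head);   /* both cleared together */

	assert(mock_gem_sanitize(&off) == -ENODEV);
	assert(off.hw_head == off.sw_head); /* both left untouched */
}
```

With the pre-patch flow, where the sanitize step ran unconditionally, the second case would clear sw_head while hw_head stayed at 3, which is the mismatch the commit message describes.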