
drm/i915: Cancel outstanding work after disabling heartbeats on an engine

We only allow persistent requests to remain on the GPU past the closure
of their containing context (and process) so long as they are continuously
checked for hangs or allow other requests to preempt them, as we need to
ensure forward progress of the system. If we allow persistent contexts
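to remain on the system after the hangcheck mechanism is disabled,
the system may grind to a halt. On disabling the mechanism, we send a
final pulse along the engine that removes all executing contexts from
the engine and checks them for hangs; however, we did not prevent those
contexts from being resubmitted if they survived that final hangcheck.
Close the gap by banning a closed context whenever one of its requests
is submitted to an engine without a heartbeat, so the request is
cancelled with -EIO rather than left running unsupervised.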

Fixes: 9a40bddd47ca ("drm/i915/gt: Expose heartbeat interval via sysfs")
Testcase: igt/gem_ctx_persistence/heartbeat-stop
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200928221510.26044-1-chris@chris-wilson.co.uk
(cherry picked from commit 7a991cd3e3da9a56d5616b62d425db000a3242f2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Authored by Chris Wilson and committed by Rodrigo Vivi.
7d442ea7 3cfea8c9

2 files changed, 14 insertions(+)

--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -337,4 +337,13 @@
 	return intel_engine_has_preemption(engine);
 }
 
+static inline bool
+intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
+{
+	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
+		return false;
+
+	return READ_ONCE(engine->props.heartbeat_interval_ms);
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -542,8 +542,13 @@
 	if (i915_request_completed(request))
 		goto xfer;
 
+	if (unlikely(intel_context_is_closed(request->context) &&
+		     !intel_engine_has_heartbeat(engine)))
+		intel_context_set_banned(request->context);
+
 	if (unlikely(intel_context_is_banned(request->context)))
 		i915_request_set_error_once(request, -EIO);
+
 	if (unlikely(fatal_error(request->fence.error)))
 		__i915_request_skip(request);
 