Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

drm/i915/selftests: Use clear_and_wake_up_bit() for the per-engine reset bitlocks

Some selftests assume that nothing will attempt to grab these bitlocks
while they are held by the selftests. With GuC, for example, that is
not true because the hanging workloads may cause the GuC code to attempt
to grab them for a global reset, and that may cause it to end up
sleeping on the bit never waking up. Regardless whether that will be
the final solution for GuC, use clear_and_wake_up_bit() pending a more
thorough investigation on how this should be handled moving forward.

To be clear this needs to be a temporary solution. If we can't find
an in-kernel locking primitive to use here, we should at the very least
add lockdep annotation to these bitlocks with a thorough explanation
as to why we need to use bits.

v3:
- Use GEM_BUG_ON(test_and_set_bit()) rather than set_bit() to verify
the assumption that nothing is holding the reset locks when we
attempt to grab them. (Chris Wilson)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211105150146.834052-1-thomas.hellstrom@linux.intel.com

+13 -9
+12 -8
drivers/gpu/drm/i915/gt/selftest_hangcheck.c
··· 471 471 count = 0; 472 472 473 473 st_engine_heartbeat_disable(engine); 474 - set_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 474 + GEM_BUG_ON(test_and_set_bit(I915_RESET_ENGINE + id, 475 + &gt->reset.flags)); 475 476 do { 476 477 int i; 477 478 ··· 529 528 break; 530 529 } 531 530 } while (time_before(jiffies, end_time)); 532 - clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 531 + clear_and_wake_up_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 533 532 st_engine_heartbeat_enable(engine); 534 533 535 534 pr_info("%s(%s): %d resets\n", __func__, engine->name, count); ··· 583 582 } 584 583 585 584 st_engine_heartbeat_disable(engine); 586 - set_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 585 + GEM_BUG_ON(test_and_set_bit(I915_RESET_ENGINE + id, 586 + &gt->reset.flags)); 587 587 588 588 force_reset_timeout(engine); 589 589 err = intel_engine_reset(engine, NULL); ··· 681 679 out: 682 680 pr_info("%s(%s): %d resets\n", __func__, engine->name, count); 683 681 skip: 684 - clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 682 + clear_and_wake_up_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 685 683 st_engine_heartbeat_enable(engine); 686 684 intel_context_put(ce); 687 685 ··· 736 734 reset_engine_count = i915_reset_engine_count(global, engine); 737 735 738 736 st_engine_heartbeat_disable(engine); 739 - set_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 737 + GEM_BUG_ON(test_and_set_bit(I915_RESET_ENGINE + id, 738 + &gt->reset.flags)); 740 739 count = 0; 741 740 do { 742 741 struct i915_request *rq = NULL; ··· 827 824 if (err) 828 825 break; 829 826 } while (time_before(jiffies, end_time)); 830 - clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 827 + clear_and_wake_up_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 831 828 st_engine_heartbeat_enable(engine); 832 829 pr_info("%s: Completed %lu %s resets\n", 833 830 engine->name, count, active ? "active" : "idle"); ··· 1045 1042 yield(); /* start all threads before we begin */ 1046 1043 1047 1044 st_engine_heartbeat_disable_no_pm(engine); 1048 - set_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 1045 + GEM_BUG_ON(test_and_set_bit(I915_RESET_ENGINE + id, 1046 + &gt->reset.flags)); 1049 1047 do { 1050 1048 struct i915_request *rq = NULL; 1051 1049 struct intel_selftest_saved_policy saved; ··· 1169 1165 if (err) 1170 1166 break; 1171 1167 } while (time_before(jiffies, end_time)); 1172 - clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 1168 + clear_and_wake_up_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 1173 1169 st_engine_heartbeat_enable_no_pm(engine); 1174 1170 1175 1171 pr_info("i915_reset_engine(%s:%s): %lu resets\n",
+1 -1
drivers/gpu/drm/i915/selftests/igt_reset.c
··· 36 36 enum intel_engine_id id; 37 37 38 38 for_each_engine(engine, gt, id) 39 - clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 39 + clear_and_wake_up_bit(I915_RESET_ENGINE + id, &gt->reset.flags); 40 40 41 41 clear_bit(I915_RESET_BACKOFF, &gt->reset.flags); 42 42 wake_up_all(&gt->reset.queue);