Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis

Add a delay, configurable via debugfs (default 34ms), before disabling
scheduling of a context after its pin count goes to zero. Disabling
scheduling is a costly operation as it requires synchronizing with
the GuC, so the idea is that the delay gives the user a window to
resubmit something before we do it. The delay is only applied if
the context isn't closed and fewer than a given threshold
(default 3/4) of the guc_ids are in use.
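To make the gating concrete, here is a minimal userspace sketch (plain C, not the kernel implementation; the function name and parameters are invented for illustration) of the conditions under which the delay applies:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the decision described above: delay the
 * schedule-disable only when the context is still open, the delay knob
 * is non-zero, and single-lrc guc_id usage is at or below the threshold.
 * This mirrors the intent of guc_context_sched_disable(), but the names
 * and structure here are illustrative, not the kernel code itself.
 */
static bool should_delay_sched_disable(bool closed, bool is_parent,
				       unsigned int guc_ids_in_use,
				       unsigned int threshold,
				       unsigned int delay_ms)
{
	/* Parent (multi-LRC) contexts are perma-pinned: disable immediately. */
	if (is_parent)
		return false;
	/* Under guc_id pressure, free the guc_id as soon as possible. */
	if (guc_ids_in_use > threshold)
		return false;
	/* Otherwise delay only if still open and the knob is non-zero. */
	return !closed && delay_ms != 0;
}
```

Setting the debugfs knob to 0 disables the delay entirely, which is exactly what the selftest workaround below relies on.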

Alan Previn: Matt Brost first introduced this patch back in Oct 2021.
However, no real-world workload with a measured performance impact was
available at the time to prove the intended results. Today, this series
is being republished in response to a real-world workload that benefited
greatly from it, along with the measured performance improvement.

Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3d game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container to ALWAYS be able to render and
encode no less than 30fps with a predefined maximum render + encode
latency time. That means the totality of all 36 containers and their
workloads was not saturating the engines to their max (in order to
maintain just enough headroom to meet the min fps and max latencies
of incoming container submissions).

Problem statement: It was observed that the CPU core processing the i915
soft IRQ work was experiencing severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was confirmed
that the majority of the CPU cycles were caused by the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of these cycles were spent processing a specific G2H
IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
by GuC in response to i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from GuC.
The high CPU utilization % symptom was limiting density scaling.

Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system in totality was NOT saturated to the max, it was assumed that each
context was constantly idling between submissions. This was causing
a thrashing of unpinning contexts from GuC at one moment, followed quickly
by repinning them due to incoming workload the very next moment. These
event-pairs were being triggered across multiple contexts per container,
across all containers at the rate of > 30 times per sec per context.

Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds or ~10 million times over ~25+ mins. With this patch, the count
reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
improvement observed is ~99% for the average counts per 10 seconds.
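As a quick sanity check of the counts quoted above, the arithmetic can be reproduced with a standalone snippet (plain C; the helper names are made up for this check, not taken from the patch):

```c
#include <assert.h>

/* Total events: a per-10-second rate scaled over a run length in minutes. */
static long total_events(long per_10s, int minutes)
{
	return per_10s * (minutes * 60L / 10);
}

/* Percentage reduction in the per-10-second average count. */
static double reduction_pct(long before, long after)
{
	return 100.0 * (double)(before - after) / (double)before;
}
```

total_events(69000, 25) gives 10,350,000 (~10 million), total_events(480, 10) gives 28,800 (~28K), and reduction_pct(69000, 480) is ~99.3%, consistent with the figures above.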

Design awareness: Selftest impact.
As a temporary workaround, disable this feature for the selftests. Selftests
are very timing sensitive and any change in timing can cause failure. A
follow-up patch will fix up the selftests to account for this delay.

Design awareness: Race between guc_request_alloc and guc_context_close.
If a context close is issued while there is a request submission in
flight and a delayed schedule disable is pending, guc_context_close
and guc_request_alloc will race to cancel the delayed disable.
To close the race, make sure that guc_request_alloc waits for
guc_context_close to finish running before checking any state.
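The ordering requirement can be modeled in userspace (pthreads, not the kernel's spinlock/worker primitives; all names here are illustrative): the close path runs the disable synchronously and then publishes a close_done flag, and the allocation path that loses the cancel race must wait for that flag before trusting any scheduling state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy model of the guc_context_close / guc_request_alloc ordering. */
struct ctx_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool close_done;	/* mirrors SCHED_STATE_CLOSED being published */
	bool sched_disabled;	/* mirrors the disable having reached "GuC" */
};

/* Close path: run the disable synchronously, then mark close done. */
static void *closer(void *arg)
{
	struct ctx_model *c = arg;

	pthread_mutex_lock(&c->lock);
	c->sched_disabled = true;
	c->close_done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* Alloc path: wait for close to finish before reading any state. */
static bool request_alloc_sees_stable_state(struct ctx_model *c)
{
	bool disabled;

	pthread_mutex_lock(&c->lock);
	while (!c->close_done)	/* analogous to wait_for(context_close_done(ce), ...) */
		pthread_cond_wait(&c->cond, &c->lock);
	disabled = c->sched_disabled;
	pthread_mutex_unlock(&c->lock);
	return disabled;
}
```

Without the wait, the alloc path could observe close_done still false yet proceed, which is exactly the torn state the patch closes off.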

Design awareness: GT Reset event.
If a gt reset is triggered, add a preparation step that ensures every
context with a pending delayed-schedule-disable task is flushed of it.
Move such contexts directly into the closed state after cancelling the
worker. This is okay because the existing flow flushes all
yet-to-arrive G2Hs, dropping them anyway.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-2-alan.previn.teres.alexis@intel.com

Authored by Matthew Brost, committed by John Harrison
83321094 befb231d

+278 -31
+1 -1
drivers/gpu/drm/i915/gem/i915_gem_context.c
···
 		int err;
 
 		/* serialises with execbuf */
-		set_bit(CONTEXT_CLOSED_BIT, &ce->flags);
+		intel_context_close(ce);
 		if (!intel_context_pin_if_active(ce))
 			continue;
 
+8
drivers/gpu/drm/i915/gt/intel_context.h
···
 	return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
 }
 
+static inline void intel_context_close(struct intel_context *ce)
+{
+	set_bit(CONTEXT_CLOSED_BIT, &ce->flags);
+
+	if (ce->ops->close)
+		ce->ops->close(ce);
+}
+
 static inline bool intel_context_is_closed(const struct intel_context *ce)
 {
 	return test_bit(CONTEXT_CLOSED_BIT, &ce->flags);
+7
drivers/gpu/drm/i915/gt/intel_context_types.h
···
 	void (*revoke)(struct intel_context *ce, struct i915_request *rq,
 		       unsigned int preempt_timeout_ms);
 
+	void (*close)(struct intel_context *ce);
+
 	int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr);
 	int (*pin)(struct intel_context *ce, void *vaddr);
 	void (*unpin)(struct intel_context *ce);
···
 		 * each priority bucket
 		 */
 		u32 prio_count[GUC_CLIENT_PRIORITY_NUM];
+		/**
+		 * @sched_disable_delay_work: worker to disable scheduling on this
+		 * context
+		 */
+		struct delayed_work sched_disable_delay_work;
 	} guc_state;
 
 	struct {
+16
drivers/gpu/drm/i915/gt/uc/intel_guc.h
···
 	 */
 	struct list_head guc_id_list;
 	/**
+	 * @guc_ids_in_use: Number single-lrc guc_ids in use
+	 */
+	unsigned int guc_ids_in_use;
+	/**
 	 * @destroyed_contexts: list of contexts waiting to be destroyed
 	 * (deregistered with the GuC)
 	 */
···
 	 * @reset_fail_mask: mask of engines that failed to reset
 	 */
 	intel_engine_mask_t reset_fail_mask;
+	/**
+	 * @sched_disable_delay_ms: schedule disable delay, in ms, for
+	 * contexts
+	 */
+	unsigned int sched_disable_delay_ms;
+	/**
+	 * @sched_disable_gucid_threshold: threshold of min remaining available
+	 * guc_ids before we start bypassing the schedule disable delay
+	 */
+	unsigned int sched_disable_gucid_threshold;
 } submission_state;
···
 void intel_guc_write_barrier(struct intel_guc *guc);
 
 void intel_guc_dump_time_info(struct intel_guc *guc, struct drm_printer *p);
+
+int intel_guc_sched_disable_gucid_threshold_max(struct intel_guc *guc);
 
 #endif
+61
drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
···
 	return intel_guc_slpc_is_used(guc);
 }
 
+static int guc_sched_disable_delay_ms_get(void *data, u64 *val)
+{
+	struct intel_guc *guc = data;
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	*val = (u64)guc->submission_state.sched_disable_delay_ms;
+
+	return 0;
+}
+
+static int guc_sched_disable_delay_ms_set(void *data, u64 val)
+{
+	struct intel_guc *guc = data;
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	/* clamp to a practical limit, 1 minute is reasonable for a longest delay */
+	guc->submission_state.sched_disable_delay_ms = min_t(u64, val, 60000);
+
+	return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_delay_ms_fops,
+			guc_sched_disable_delay_ms_get,
+			guc_sched_disable_delay_ms_set, "%lld\n");
+
+static int guc_sched_disable_gucid_threshold_get(void *data, u64 *val)
+{
+	struct intel_guc *guc = data;
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	*val = guc->submission_state.sched_disable_gucid_threshold;
+	return 0;
+}
+
+static int guc_sched_disable_gucid_threshold_set(void *data, u64 val)
+{
+	struct intel_guc *guc = data;
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	if (val > intel_guc_sched_disable_gucid_threshold_max(guc))
+		guc->submission_state.sched_disable_gucid_threshold =
+			intel_guc_sched_disable_gucid_threshold_max(guc);
+	else
+		guc->submission_state.sched_disable_gucid_threshold = val;
+
+	return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_gucid_threshold_fops,
+			guc_sched_disable_gucid_threshold_get,
+			guc_sched_disable_gucid_threshold_set, "%lld\n");
+
 void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
 {
 	static const struct intel_gt_debugfs_file files[] = {
 		{ "guc_info", &guc_info_fops, NULL },
 		{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
 		{ "guc_slpc_info", &guc_slpc_info_fops, &intel_eval_slpc_support},
+		{ "guc_sched_disable_delay_ms", &guc_sched_disable_delay_ms_fops, NULL },
+		{ "guc_sched_disable_gucid_threshold", &guc_sched_disable_gucid_threshold_fops,
+		  NULL },
 	};
 
 	if (!intel_guc_is_supported(guc))
+183 -30
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
···
  * corresponding G2H returns indicating the scheduling disable operation has
  * completed it is safe to unpin the context. While a disable is in flight it
  * isn't safe to resubmit the context so a fence is used to stall all future
- * requests of that context until the G2H is returned.
+ * requests of that context until the G2H is returned. Because this interaction
+ * with the GuC takes a non-zero amount of time we delay the disabling of
+ * scheduling after the pin count goes to zero by a configurable period of time
+ * (see SCHED_DISABLE_DELAY_MS). The thought is this gives the user a window of
+ * time to resubmit something on the context before doing this costly operation.
+ * This delay is only done if the context isn't closed and the guc_id usage is
+ * less than a threshold (see NUM_SCHED_DISABLE_GUC_IDS_THRESHOLD).
  *
  * Context deregistration:
  * Before a context can be destroyed or if we steal its guc_id we must
···
 #define SCHED_STATE_PENDING_ENABLE	BIT(5)
 #define SCHED_STATE_REGISTERED		BIT(6)
 #define SCHED_STATE_POLICY_REQUIRED	BIT(7)
-#define SCHED_STATE_BLOCKED_SHIFT	8
+#define SCHED_STATE_CLOSED		BIT(8)
+#define SCHED_STATE_BLOCKED_SHIFT	9
 #define SCHED_STATE_BLOCKED		BIT(SCHED_STATE_BLOCKED_SHIFT)
 #define SCHED_STATE_BLOCKED_MASK	(0xfff << SCHED_STATE_BLOCKED_SHIFT)
···
 	ce->guc_state.sched_state &= SCHED_STATE_BLOCKED_MASK;
 }
 
+/*
+ * Kernel contexts can have SCHED_STATE_REGISTERED after suspend.
+ * A context close can race with the submission path, so SCHED_STATE_CLOSED
+ * can be set immediately before we try to register.
+ */
+#define SCHED_STATE_VALID_INIT \
+	(SCHED_STATE_BLOCKED_MASK | \
+	SCHED_STATE_CLOSED | \
+	SCHED_STATE_REGISTERED)
+
 __maybe_unused
 static bool sched_state_is_init(struct intel_context *ce)
 {
-	/* Kernel contexts can have SCHED_STATE_REGISTERED after suspend. */
-	return !(ce->guc_state.sched_state &
-		 ~(SCHED_STATE_BLOCKED_MASK | SCHED_STATE_REGISTERED));
+	return !(ce->guc_state.sched_state & ~SCHED_STATE_VALID_INIT);
 }
 
 static inline bool
···
 {
 	lockdep_assert_held(&ce->guc_state.lock);
 	ce->guc_state.sched_state &= ~SCHED_STATE_POLICY_REQUIRED;
+}
+
+static inline bool context_close_done(struct intel_context *ce)
+{
+	return ce->guc_state.sched_state & SCHED_STATE_CLOSED;
+}
+
+static inline void set_context_close_done(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state |= SCHED_STATE_CLOSED;
 }
 
 static inline u32 context_blocked(struct intel_context *ce)
···
 		bool do_put = kref_get_unless_zero(&ce->ref);
 
 		xa_unlock(&guc->context_lookup);
+
+		if (test_bit(CONTEXT_GUC_INIT, &ce->flags) &&
+		    (cancel_delayed_work(&ce->guc_state.sched_disable_delay_work))) {
+			/* successful cancel so jump straight to close it */
+			intel_context_sched_disable_unpin(ce);
+		}
 
 		spin_lock(&ce->guc_state.lock);
···
 	if (unlikely(ret < 0))
 		return ret;
 
+	if (!intel_context_is_parent(ce))
+		++guc->submission_state.guc_ids_in_use;
+
 	ce->guc_id.id = ret;
 	return 0;
 }
···
 	GEM_BUG_ON(intel_context_is_child(ce));
 
 	if (!context_guc_id_invalid(ce)) {
-		if (intel_context_is_parent(ce))
+		if (intel_context_is_parent(ce)) {
 			bitmap_release_region(guc->submission_state.guc_ids_bitmap,
 					      ce->guc_id.id,
 					      order_base_2(ce->parallel.number_children
 							   + 1));
-		else
+		} else {
+			--guc->submission_state.guc_ids_in_use;
 			ida_simple_remove(&guc->submission_state.guc_ids,
 					  ce->guc_id.id);
+		}
 		clr_ctx_id_mapping(guc, ce->guc_id.id);
 		set_context_guc_id_invalid(ce);
 	}
···
 	}
 }
 
-static void guc_context_sched_disable(struct intel_context *ce)
+static void do_sched_disable(struct intel_guc *guc, struct intel_context *ce,
+			     unsigned long flags)
+	__releases(ce->guc_state.lock)
 {
-	struct intel_guc *guc = ce_to_guc(ce);
-	unsigned long flags;
 	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
 	intel_wakeref_t wakeref;
 	u16 guc_id;
 
-	GEM_BUG_ON(intel_context_is_child(ce));
-
-	spin_lock_irqsave(&ce->guc_state.lock, flags);
-
-	/*
-	 * We have to check if the context has been disabled by another thread,
-	 * check if submssion has been disabled to seal a race with reset and
-	 * finally check if any more requests have been committed to the
-	 * context ensursing that a request doesn't slip through the
-	 * 'context_pending_disable' fence.
-	 */
-	if (unlikely(!context_enabled(ce) || submission_disabled(guc) ||
-		     context_has_committed_requests(ce))) {
-		clr_context_enabled(ce);
-		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-		goto unpin;
-	}
+	lockdep_assert_held(&ce->guc_state.lock);
 	guc_id = prep_context_pending_disable(ce);
 
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 	with_intel_runtime_pm(runtime_pm, wakeref)
 		__guc_context_sched_disable(guc, ce, guc_id);
+}
 
-	return;
-unpin:
-	intel_context_sched_disable_unpin(ce);
+static bool bypass_sched_disable(struct intel_guc *guc,
+				 struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	GEM_BUG_ON(intel_context_is_child(ce));
+
+	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
+	    !ctx_id_mapped(guc, ce->guc_id.id)) {
+		clr_context_enabled(ce);
+		return true;
+	}
+
+	return !context_enabled(ce);
+}
+
+static void __delay_sched_disable(struct work_struct *wrk)
+{
+	struct intel_context *ce =
+		container_of(wrk, typeof(*ce), guc_state.sched_disable_delay_work.work);
+	struct intel_guc *guc = ce_to_guc(ce);
+	unsigned long flags;
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+
+	if (bypass_sched_disable(guc, ce)) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		intel_context_sched_disable_unpin(ce);
+	} else {
+		do_sched_disable(guc, ce, flags);
+	}
+}
+
+static bool guc_id_pressure(struct intel_guc *guc, struct intel_context *ce)
+{
+	/*
+	 * parent contexts are perma-pinned, if we are unpinning do schedule
+	 * disable immediately.
+	 */
+	if (intel_context_is_parent(ce))
+		return true;
+
+	/*
+	 * If we are beyond the threshold for avail guc_ids, do schedule disable immediately.
+	 */
+	return guc->submission_state.guc_ids_in_use >
+		guc->submission_state.sched_disable_gucid_threshold;
+}
+
+static void guc_context_sched_disable(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	u64 delay = guc->submission_state.sched_disable_delay_ms;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+
+	if (bypass_sched_disable(guc, ce)) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		intel_context_sched_disable_unpin(ce);
+	} else if (!intel_context_is_closed(ce) && !guc_id_pressure(guc, ce) &&
+		   delay) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		mod_delayed_work(system_unbound_wq,
+				 &ce->guc_state.sched_disable_delay_work,
+				 msecs_to_jiffies(delay));
+	} else {
+		do_sched_disable(guc, ce, flags);
+	}
+}
+
+static void guc_context_close(struct intel_context *ce)
+{
+	unsigned long flags;
+
+	if (test_bit(CONTEXT_GUC_INIT, &ce->flags) &&
+	    cancel_delayed_work(&ce->guc_state.sched_disable_delay_work))
+		__delay_sched_disable(&ce->guc_state.sched_disable_delay_work.work);
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	set_context_close_done(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 }
 
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
···
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
 
+	.close = guc_context_close,
+
 	.pre_pin = guc_context_pre_pin,
 	.pin = guc_context_pin,
 	.unpin = guc_context_unpin,
···
 	rcu_read_unlock();
 
 	ce->guc_state.prio = map_i915_prio_to_guc_prio(prio);
+
+	INIT_DELAYED_WORK(&ce->guc_state.sched_disable_delay_work,
+			  __delay_sched_disable);
+
 	set_bit(CONTEXT_GUC_INIT, &ce->flags);
 }
···
 	if (unlikely(!test_bit(CONTEXT_GUC_INIT, &ce->flags)))
 		guc_context_init(ce);
 
+	/*
+	 * If the context gets closed while the execbuf is ongoing, the context
+	 * close code will race with the below code to cancel the delayed work.
+	 * If the context close wins the race and cancels the work, it will
+	 * immediately call the sched disable (see guc_context_close), so there
+	 * is a chance we can get past this check while the sched_disable code
+	 * is being executed. To make sure that code completes before we check
+	 * the status further down, we wait for the close process to complete.
+	 * Else, this code path could send a request down thinking that the
+	 * context is still in a schedule-enable mode while the GuC ends up
+	 * dropping the request completely because the disable did go from the
+	 * context_close path right to GuC just prior. In the event the CT is
+	 * full, we could potentially need to wait up to 1.5 seconds.
+	 */
+	if (cancel_delayed_work_sync(&ce->guc_state.sched_disable_delay_work))
+		intel_context_sched_disable_unpin(ce);
+	else if (intel_context_is_closed(ce))
+		if (wait_for(context_close_done(ce), 1500))
+			drm_warn(&guc_to_gt(guc)->i915->drm,
+				 "timed out waiting on context sched close before realloc\n");
 	/*
 	 * Call pin_guc_id here rather than in the pinning step as with
 	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
···
 static const struct intel_context_ops virtual_guc_context_ops = {
 	.alloc = guc_virtual_context_alloc,
 
+	.close = guc_context_close,
+
 	.pre_pin = guc_virtual_context_pre_pin,
 	.pin = guc_virtual_context_pin,
 	.unpin = guc_virtual_context_unpin,
···
 static const struct intel_context_ops virtual_parent_context_ops = {
 	.alloc = guc_virtual_context_alloc,
+
+	.close = guc_context_close,
 
 	.pre_pin = guc_context_pre_pin,
 	.pin = guc_parent_context_pin,
···
 	return i915->params.enable_guc & ENABLE_GUC_SUBMISSION;
 }
 
+int intel_guc_sched_disable_gucid_threshold_max(struct intel_guc *guc)
+{
+	return guc->submission_state.num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc);
+}
+
+/*
+ * This default value of 33 milisecs (+1 milisec round up) ensures 30fps or higher
+ * workloads are able to enjoy the latency reduction when delaying the schedule-disable
+ * operation. This matches the 30fps game-render + encode (real world) workload this
+ * knob was tested against.
+ */
+#define SCHED_DISABLE_DELAY_MS	34
+
+/*
+ * A threshold of 75% is a reasonable starting point considering that real world apps
+ * generally don't get anywhere near this.
+ */
+#define NUM_SCHED_DISABLE_GUCIDS_DEFAULT_THRESHOLD(__guc) \
+	(((intel_guc_sched_disable_gucid_threshold_max(guc)) * 3) / 4)
+
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
 	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
···
 	spin_lock_init(&guc->timestamp.lock);
 	INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping);
 
+	guc->submission_state.sched_disable_delay_ms = SCHED_DISABLE_DELAY_MS;
 	guc->submission_state.num_guc_ids = GUC_MAX_CONTEXT_ID;
+	guc->submission_state.sched_disable_gucid_threshold =
+		NUM_SCHED_DISABLE_GUCIDS_DEFAULT_THRESHOLD(guc);
 	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }
+2
drivers/gpu/drm/i915/i915_selftest.h
···
 					T, ARRAY_SIZE(T), data)
 #define i915_live_subtests(T, data) ({ \
 	typecheck(struct drm_i915_private *, data); \
+	(data)->gt[0]->uc.guc.submission_state.sched_disable_delay_ms = 0; \
 	__i915_subtests(__func__, \
 			__i915_live_setup, __i915_live_teardown, \
 			T, ARRAY_SIZE(T), data); \
 })
 #define intel_gt_live_subtests(T, data) ({ \
 	typecheck(struct intel_gt *, data); \
+	(data)->uc.guc.submission_state.sched_disable_delay_ms = 0; \
 	__i915_subtests(__func__, \
 			__intel_gt_live_setup, __intel_gt_live_teardown, \
 			T, ARRAY_SIZE(T), data); \