
drm/xe/client: Print runtime to fdinfo

Print the accumulated runtime for the client when printing fdinfo.
Each time fdinfo is queried, the driver first does 2 things:

1) loop through all the exec queues for the current client and
accumulate the runtime per engine class. CTX_TIMESTAMP, read from the
context image, is used for that.

2) read a "GPU timestamp" that can be used to tell how much GPU time
has passed and that has the same unit/refclock as the one recording the
runtime. RING_TIMESTAMP is used for that, via MMIO.

Since for all current platforms RING_TIMESTAMP follows the same
refclock, just read it once, using any first engine available.

This is exported to userspace as 2 numbers in fdinfo:

drm-cycles-<class>: <RUNTIME>
drm-total-cycles-<class>: <TIMESTAMP>

Userspace is expected to collect at least 2 samples, which allow
computing the client engine busyness as:

	           RUNTIME1 - RUNTIME0
	busyness = -------------------
	                T1 - T0

Since drm-cycles-<class> always starts at 0, it's also possible to know
if an engine was ever used by a client.
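As a sketch of the userspace side, the busyness computation from two fdinfo samples can be done as below. This is illustrative only: the helper name and the sample values are invented, and a real consumer would parse the key/value lines out of /proc/<pid>/fdinfo/<fd> first.

```python
def busyness(sample0, sample1, engine_class):
    """Fraction of GPU time the client kept <engine_class> busy
    between two fdinfo samples (dicts of key -> int)."""
    runtime0 = sample0[f"drm-cycles-{engine_class}"]
    runtime1 = sample1[f"drm-cycles-{engine_class}"]
    t0 = sample0[f"drm-total-cycles-{engine_class}"]
    t1 = sample1[f"drm-total-cycles-{engine_class}"]
    if t1 == t0:
        return 0.0
    # Both counters tick on the same refclock, so the ratio needs no
    # unit conversion.
    return (runtime1 - runtime0) / (t1 - t0)

# Two samples taken a few seconds apart (values invented):
s0 = {"drm-cycles-rcs": 28257900, "drm-total-cycles-rcs": 7655183225}
s1 = {"drm-cycles-rcs": 38000000, "drm-total-cycles-rcs": 7700000000}
print(f"rcs busyness: {100 * busyness(s0, s1, 'rcs'):.1f}%")  # rcs busyness: 21.7%
```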

It's expected that userspace will read any 2 samples every few seconds.
Given the update frequency of the counters involved and that
CTX_TIMESTAMP is 32-bits, the counter for each exec_queue can wrap
around (assuming 100% utilization) after ~200s. The wraparound is not
perceived by userspace, since the value is accumulated for all the
exec_queues in a 64-bit counter, but the measurement will not be
accurate if the samples are too far apart.
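The ~200s figure follows from the counter width and clock rate. A rough sanity check, assuming a 19.2 MHz timestamp refclock (the actual frequency is platform-dependent and read from hardware):

```python
TIMESTAMP_HZ = 19_200_000  # assumed refclock; platform-dependent

# A 32-bit CTX_TIMESTAMP at 100% utilization increments every tick,
# so it wraps once 2^32 ticks have elapsed.
wrap_seconds = 2**32 / TIMESTAMP_HZ
print(f"wraps after ~{wrap_seconds:.0f} s")  # wraps after ~224 s
```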

This could be mitigated by adding a workqueue to accumulate the counters
every so often, but that is additional complexity for something
userspace already does every few seconds in tools like gputop (from
igt), htop, nvtop, etc., none of which samples as rarely as once per
minute.

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517204310.88854-9-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

+150 -3
+19 -2
Documentation/gpu/drm-usage-stats.rst
···
 was previously read, userspace is expected to stay with that larger previous
 value until a monotonic update is seen.
 
+- drm-total-cycles-<keystr>: <uint>
+
+Engine identifier string must be the same as the one specified in the
+drm-cycles-<keystr> tag and shall contain the total number cycles for the given
+engine.
+
+This is a timestamp in GPU unspecified unit that matches the update rate
+of drm-cycles-<keystr>. For drivers that implement this interface, the engine
+utilization can be calculated entirely on the GPU clock domain, without
+considering the CPU sleep time between 2 samples.
+
+A driver may implement either this key or drm-maxfreq-<keystr>, but not both.
+
 - drm-maxfreq-<keystr>: <uint> [Hz|MHz|KHz]
 
 Engine identifier string must be the same as the one specified in the
···
 percentage utilization of the engine, whereas drm-engine-<keystr> only reflects
 time active without considering what frequency the engine is operating as a
 percentage of its maximum frequency.
+
+A driver may implement either this key or drm-total-cycles-<keystr>, but not
+both.
 
 Memory
 ^^^^^^
···
 Driver specific implementations
 -------------------------------
 
-:ref:`i915-usage-stats`
-:ref:`panfrost-usage-stats`
+* :ref:`i915-usage-stats`
+* :ref:`panfrost-usage-stats`
+* :ref:`xe-usage-stats`
+1
Documentation/gpu/xe/index.rst
···
    xe_firmware
    xe_tile
    xe_debugging
+   xe-drm-usage-stats.rst
+10
Documentation/gpu/xe/xe-drm-usage-stats.rst
+.. SPDX-License-Identifier: GPL-2.0+
+
+.. _xe-usage-stats:
+
+========================================
+Xe DRM client usage stats implementation
+========================================
+
+.. kernel-doc:: drivers/gpu/drm/xe/xe_drm_client.c
+   :doc: DRM Client usage stats
+120 -1
drivers/gpu/drm/xe/xe_drm_client.c
···
 /*
  * Copyright © 2023 Intel Corporation
  */
+#include "xe_drm_client.h"
 
 #include <drm/drm_print.h>
 #include <drm/xe_drm.h>
···
 #include "xe_bo.h"
 #include "xe_bo_types.h"
 #include "xe_device_types.h"
-#include "xe_drm_client.h"
+#include "xe_exec_queue.h"
+#include "xe_force_wake.h"
+#include "xe_gt.h"
+#include "xe_hw_engine.h"
+#include "xe_pm.h"
 #include "xe_trace.h"
+
+/**
+ * DOC: DRM Client usage stats
+ *
+ * The drm/xe driver implements the DRM client usage stats specification as
+ * documented in :ref:`drm-client-usage-stats`.
+ *
+ * Example of the output showing the implemented key value pairs and entirety of
+ * the currently possible format options:
+ *
+ * ::
+ *
+ *	pos: 0
+ *	flags: 0100002
+ *	mnt_id: 26
+ *	ino: 685
+ *	drm-driver: xe
+ *	drm-client-id: 3
+ *	drm-pdev: 0000:03:00.0
+ *	drm-total-system: 0
+ *	drm-shared-system: 0
+ *	drm-active-system: 0
+ *	drm-resident-system: 0
+ *	drm-purgeable-system: 0
+ *	drm-total-gtt: 192 KiB
+ *	drm-shared-gtt: 0
+ *	drm-active-gtt: 0
+ *	drm-resident-gtt: 192 KiB
+ *	drm-total-vram0: 23992 KiB
+ *	drm-shared-vram0: 16 MiB
+ *	drm-active-vram0: 0
+ *	drm-resident-vram0: 23992 KiB
+ *	drm-total-stolen: 0
+ *	drm-shared-stolen: 0
+ *	drm-active-stolen: 0
+ *	drm-resident-stolen: 0
+ *	drm-cycles-rcs: 28257900
+ *	drm-total-cycles-rcs: 7655183225
+ *	drm-cycles-bcs: 0
+ *	drm-total-cycles-bcs: 7655183225
+ *	drm-cycles-vcs: 0
+ *	drm-total-cycles-vcs: 7655183225
+ *	drm-engine-capacity-vcs: 2
+ *	drm-cycles-vecs: 0
+ *	drm-total-cycles-vecs: 7655183225
+ *	drm-engine-capacity-vecs: 2
+ *	drm-cycles-ccs: 0
+ *	drm-total-cycles-ccs: 7655183225
+ *	drm-engine-capacity-ccs: 4
+ *
+ * Possible `drm-cycles-` key names are: `rcs`, `ccs`, `bcs`, `vcs`, `vecs` and
+ * "other".
+ */
 
 /**
  * xe_drm_client_alloc() - Allocate drm client
···
 	}
 }
 
+static void show_runtime(struct drm_printer *p, struct drm_file *file)
+{
+	unsigned long class, i, gt_id, capacity[XE_ENGINE_CLASS_MAX] = { };
+	struct xe_file *xef = file->driver_priv;
+	struct xe_device *xe = xef->xe;
+	struct xe_gt *gt;
+	struct xe_hw_engine *hwe;
+	struct xe_exec_queue *q;
+	u64 gpu_timestamp;
+
+	xe_pm_runtime_get(xe);
+
+	/* Accumulate all the exec queues from this client */
+	mutex_lock(&xef->exec_queue.lock);
+	xa_for_each(&xef->exec_queue.xa, i, q)
+		xe_exec_queue_update_runtime(q);
+	mutex_unlock(&xef->exec_queue.lock);
+
+	/* Get the total GPU cycles */
+	for_each_gt(gt, xe, gt_id) {
+		hwe = xe_gt_any_hw_engine(gt);
+		if (!hwe)
+			continue;
+
+		xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
+		gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
+		xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
+		break;
+	}
+
+	xe_pm_runtime_put(xe);
+
+	if (unlikely(!hwe))
+		return;
+
+	for (class = 0; class < XE_ENGINE_CLASS_MAX; class++) {
+		const char *class_name;
+
+		for_each_gt(gt, xe, gt_id)
+			capacity[class] += gt->user_engines.instances_per_class[class];
+
+		/*
+		 * Engines may be fused off or not exposed to userspace. Don't
+		 * return anything if this entire class is not available
+		 */
+		if (!capacity[class])
+			continue;
+
+		class_name = xe_hw_engine_class_to_str(class);
+		drm_printf(p, "drm-cycles-%s:\t%llu\n",
+			   class_name, xef->runtime[class]);
+		drm_printf(p, "drm-total-cycles-%s:\t%llu\n",
+			   class_name, gpu_timestamp);
+
+		if (capacity[class] > 1)
+			drm_printf(p, "drm-engine-capacity-%s:\t%lu\n",
+				   class_name, capacity[class]);
+	}
+}
+
 /**
  * xe_drm_client_fdinfo() - Callback for fdinfo interface
  * @p: The drm_printer ptr
···
 void xe_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
 {
 	show_meminfo(p, file);
+	show_runtime(p, file);
 }
 #endif