Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Documentation: thermal: Document thermal throttling on Intel platforms

Add documentation for Intel thermal throttling reporting events.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
[ rjw: Subject adjustment, file name change, minor edits ]
Link: https://patch.msgid.link/20251113212104.221632-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Authored by Srinivas Pandruvada and committed by Rafael J. Wysocki
3402bc01 172880f7

+92
+1
Documentation/admin-guide/thermal/index.rst
···
    :maxdepth: 1

    intel_powerclamp
+   intel_thermal_throttle
+91
Documentation/admin-guide/thermal/intel_thermal_throttle.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+=======================================
+Intel thermal throttle events reporting
+=======================================
+
+:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
+
+Introduction
+------------
+
+Intel processors have built-in automatic and adaptive thermal monitoring
+mechanisms that force the processor to reduce its power consumption in order
+to operate within predetermined temperature limits.
+
+Refer to section "THERMAL MONITORING AND PROTECTION" in the "Intel® 64 and
+IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C, & 3D):
+System Programming Guide" for more details.
+
+In general, there are two mechanisms to control the core temperature of the
+processor. They are called "Thermal Monitor 1" (TM1) and "Thermal Monitor 2"
+(TM2).
+
+The status of the temperature sensor that triggers the thermal monitor (TM1/TM2)
+is indicated through the "thermal status flag" and "thermal status log flag" in
+MSR_IA32_THERM_STATUS for core level and MSR_IA32_PACKAGE_THERM_STATUS for
+package level.
+
+Thermal Status flag, bit 0: When set, indicates that the processor core
+temperature is currently at the trip temperature of the thermal monitor and that
+the processor power consumption is being reduced via either TM1 or TM2, depending
+on which is enabled. When clear, the flag indicates that the core temperature is
+below the thermal monitor trip temperature. This flag is read only.
+
+Thermal Status Log flag, bit 1: When set, indicates that the thermal sensor has
+tripped since the last power-up or reset or since the last time that software
+cleared this flag. This flag is a sticky bit; once set it remains set until
+cleared by software or until a power-up or reset of the processor. The default
+state is clear.
+
+It is possible that when a user reads MSR_IA32_THERM_STATUS or
+MSR_IA32_PACKAGE_THERM_STATUS, TM1/TM2 is not active. In this case, the
+"Thermal Status flag" will read "0" and the "Thermal Status Log flag" will be
+set to show any previous TM1/TM2 activation. However, because that flag stays
+set until software clears it, it cannot show how many times TM1/TM2 was
+activated.
+
+Hence, Linux provides counters of how many times the "Thermal Status flag" was
+set. It also reports how long the "Thermal Status flag" was active, in
+milliseconds. Using these counters, users can check if the performance was
+limited because of thermal events. It is recommended to read from sysfs instead
+of directly reading MSRs, as the "Thermal Status Log flag" is reset by the
+driver to implement rate control.
+
+Sysfs Interface
+---------------
+
+Thermal throttling events are presented for each CPU under
+"/sys/devices/system/cpu/cpuX/thermal_throttle/", where "X" is the CPU number.
+
+All these counters are read-only. They cannot be reset to 0, so they can
+potentially overflow after reaching the maximum 64-bit unsigned integer value.
+
+``core_throttle_count``
+    Shows the number of times "Thermal Status flag" changed from 0 to 1 for this
+    CPU since the OS booted and the thermal vector was initialized. This is a
+    64-bit counter.
+
+``package_throttle_count``
+    Shows the number of times "Thermal Status flag" changed from 0 to 1 for the
+    package containing this CPU since the OS booted and the thermal vector was
+    initialized. Package status is broadcast to all CPUs; all CPUs in the
+    package increment this count. This is a 64-bit counter.
+
+``core_throttle_max_time_ms``
+    Shows the maximum amount of time for which "Thermal Status flag" has been
+    set to 1 for this CPU at the core level since the OS booted and the thermal
+    vector was initialized.
+
+``package_throttle_max_time_ms``
+    Shows the maximum amount of time for which "Thermal Status flag" has been
+    set to 1 for the package containing this CPU since the OS booted and the
+    thermal vector was initialized.
+
+``core_throttle_total_time_ms``
+    Shows the cumulative time for which "Thermal Status flag" has been
+    set to 1 for this CPU at the core level since the OS booted and the thermal
+    vector was initialized.
+
+``package_throttle_total_time_ms``
+    Shows the cumulative time for which "Thermal Status flag" has been set
+    to 1 for the package containing this CPU since the OS booted and the
+    thermal vector was initialized.
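
The sysfs attributes described in the added document can be sampled from user space. The following is a minimal sketch, not part of the patch: the helper names ``read_throttle_counters`` and ``counter_delta`` are hypothetical, the ``root`` parameter exists only so the code can be pointed at a test directory, and the wraparound handling assumes the 64-bit unsigned width stated in the document. Attributes that are missing on a given system are skipped rather than treated as errors.

```python
from pathlib import Path

# Attribute names exactly as listed in the "Sysfs Interface" section.
ATTRS = (
    "core_throttle_count",
    "package_throttle_count",
    "core_throttle_max_time_ms",
    "package_throttle_max_time_ms",
    "core_throttle_total_time_ms",
    "package_throttle_total_time_ms",
)

# Per-CPU directories live under this path on a running system.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def read_throttle_counters(cpu, root=SYSFS_CPU_ROOT):
    """Return {attribute: int} for one CPU (hypothetical helper).

    Attributes that are absent or unreadable are left out of the result.
    """
    directory = Path(root) / f"cpu{cpu}" / "thermal_throttle"
    counters = {}
    for name in ATTRS:
        try:
            counters[name] = int((directory / name).read_text().strip())
        except (OSError, ValueError):
            continue  # attribute not available on this system
    return counters


def counter_delta(before, after, bits=64):
    """Difference between two samples of a counter, modulo 2**bits.

    The counters cannot be reset, so activity over an interval is the
    difference of two reads; the modulo covers a single wraparound
    (assumption: unsigned counters of the given width).
    """
    return (after - before) % (1 << bits)
```

Taking two snapshots with ``read_throttle_counters(0)`` some time apart and passing the two ``core_throttle_count`` (or ``*_total_time_ms``) values to ``counter_delta`` gives the throttling activity in that interval. The ``*_max_time_ms`` attributes are maxima rather than running totals, so a difference between samples is not meaningful for them.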