Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Documentation: amd-pstate: Add AMD P-State driver introduction

Introduce the AMD P-State driver design and implementation.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Authored by Huang Rui and committed by Rafael J. Wysocki
c2276088 3ad7fde1

+385
+2
Documentation/admin-guide/acpi/cppc_sysfs.rst
···
 Collaborative Processor Performance Control (CPPC)
 ==================================================
 
+.. _cppc_sysfs:
+
 CPPC
 ====
 
+382
Documentation/admin-guide/pm/amd-pstate.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+===============================================
+``amd-pstate`` CPU Performance Scaling Driver
+===============================================
+
+:Copyright: |copy| 2021 Advanced Micro Devices, Inc.
+
+:Author: Huang Rui <ray.huang@amd.com>
+
+
+Introduction
+===================
+
+``amd-pstate`` is the AMD CPU performance scaling driver that introduces a
+new CPU frequency control mechanism on modern AMD APU and CPU series in
+Linux kernel. The new mechanism is based on Collaborative Processor
+Performance Control (CPPC) which provides finer grain frequency management
+than legacy ACPI hardware P-States. Current AMD CPU/APU platforms are using
+the ACPI P-states driver to manage CPU frequency and clocks with switching
+only in 3 P-states. CPPC replaces the ACPI P-states controls, allows a
+flexible, low-latency interface for the Linux kernel to directly
+communicate the performance hints to hardware.
+
+``amd-pstate`` leverages the Linux kernel governors such as ``schedutil``,
+``ondemand``, etc. to manage the performance hints which are provided by
+CPPC hardware functionality that internally follows the hardware
+specification (for details refer to AMD64 Architecture Programmer's Manual
+Volume 2: System Programming [1]_). Currently ``amd-pstate`` supports basic
+frequency control function according to kernel governors on some of the
+Zen2 and Zen3 processors, and we will implement more AMD specific functions
+in future after we verify them on the hardware and SBIOS.
+
+
+AMD CPPC Overview
+=======================
+
+Collaborative Processor Performance Control (CPPC) interface enumerates a
+continuous, abstract, and unit-less performance value in a scale that is
+not tied to a specific performance state / frequency. This is an ACPI
+standard [2]_ which software can specify application performance goals and
+hints as a relative target to the infrastructure limits. AMD processors
+provides the low latency register model (MSR) instead of AML code
+interpreter for performance adjustments. ``amd-pstate`` will initialize a
+``struct cpufreq_driver`` instance ``amd_pstate_driver`` with the callbacks
+to manage each performance update behavior. ::
+
+ Highest Perf ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |           Max Perf ---->|                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+ Nominal Perf ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |       Desired Perf ---->|                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+  Lowest non-        |                       |                         |                       |
+  linear perf ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |        Lowest perf ---->|                       |
+                     |                       |                         |                       |
+  Lowest perf ------>+-----------------------+                         +-----------------------+
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+                     |                       |                         |                       |
+            0 ------>+-----------------------+                         +-----------------------+
+
+                                     AMD P-States Performance Scale
+
+
+.. _perf_cap:
+
+AMD CPPC Performance Capability
+--------------------------------
+
+Highest Performance (RO)
+.........................
+
+It is the absolute maximum performance an individual processor may reach,
+assuming ideal conditions. This performance level may not be sustainable
+for long durations and may only be achievable if other platform components
+are in a specific state; for example, it may require other processors be in
+an idle state. This would be equivalent to the highest frequencies
+supported by the processor.
+
+Nominal (Guaranteed) Performance (RO)
+......................................
+
+It is the maximum sustained performance level of the processor, assuming
+ideal operating conditions. In absence of an external constraint (power,
+thermal, etc.) this is the performance level the processor is expected to
+be able to maintain continuously. All cores/processors are expected to be
+able to sustain their nominal performance state simultaneously.
+
+Lowest non-linear Performance (RO)
+...................................
+
+It is the lowest performance level at which nonlinear power savings are
+achieved, for example, due to the combined effects of voltage and frequency
+scaling. Above this threshold, lower performance levels should be generally
+more energy efficient than higher performance levels. This register
+effectively conveys the most efficient performance level to ``amd-pstate``.
+
+Lowest Performance (RO)
+........................
+
+It is the absolute lowest performance level of the processor. Selecting a
+performance level lower than the lowest nonlinear performance level may
+cause an efficiency penalty but should reduce the instantaneous power
+consumption of the processor.
+
+AMD CPPC Performance Control
+------------------------------
+
+``amd-pstate`` passes performance goals through these registers. The
+register drives the behavior of the desired performance target.
+
+Minimum requested performance (RW)
+...................................
+
+``amd-pstate`` specifies the minimum allowed performance level.
+
+Maximum requested performance (RW)
+...................................
+
+``amd-pstate`` specifies a limit the maximum performance that is expected
+to be supplied by the hardware.
+
+Desired performance target (RW)
+...................................
+
+``amd-pstate`` specifies a desired target in the CPPC performance scale as
+a relative number. This can be expressed as percentage of nominal
+performance (infrastructure max). Below the nominal sustained performance
+level, desired performance expresses the average performance level of the
+processor subject to hardware. Above the nominal performance level,
+processor must provide at least nominal performance requested and go higher
+if current operating conditions allow.
+
+Energy Performance Preference (EPP) (RW)
+.........................................
+
+Provides a hint to the hardware if software wants to bias toward performance
+(0x0) or energy efficiency (0xff).
+
+
+Key Governors Support
+=======================
+
+``amd-pstate`` can be used with all the (generic) scaling governors listed
+by the ``scaling_available_governors`` policy attribute in ``sysfs``. Then,
+it is responsible for the configuration of policy objects corresponding to
+CPUs and provides the ``CPUFreq`` core (and the scaling governors attached
+to the policy objects) with accurate information on the maximum and minimum
+operating frequencies supported by the hardware. Users can check the
+``scaling_cur_freq`` information comes from the ``CPUFreq`` core.
+
+``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic
+frequency control. It is to fine tune the processor configuration on
+``amd-pstate`` to the ``schedutil`` with CPU CFS scheduler. ``amd-pstate``
+registers adjust_perf callback to implement the CPPC similar performance
+update behavior. It is initialized by ``sugov_start`` and then populate the
+CPU's update_util_data pointer to assign ``sugov_update_single_perf`` as
+the utilization update callback function in CPU scheduler. CPU scheduler
+will call ``cpufreq_update_util`` and assign the target performance
+according to the ``struct sugov_cpu`` that utilization update belongs to.
+Then ``amd-pstate`` updates the desired performance according to the CPU
+scheduler assigned.
+
+
+Processor Support
+=======================
+
+The ``amd-pstate`` initialization will fail if the _CPC in ACPI SBIOS is
+not existed at the detected processor, and it uses ``acpi_cpc_valid`` to
+check the _CPC existence. All Zen based processors support legacy ACPI
+hardware P-States function, so while the ``amd-pstate`` fails to be
+initialized, the kernel will fall back to initialize ``acpi-cpufreq``
+driver.
+
+There are two types of hardware implementations for ``amd-pstate``: one is
+`Full MSR Support <perf_cap_>`_ and another is `Shared Memory Support
+<perf_cap_>`_. It can use :c:macro:`X86_FEATURE_CPPC` feature flag (for
+details refer to Processor Programming Reference (PPR) for AMD Family
+19h Model 51h, Revision A1 Processors [3]_) to indicate the different
+types. ``amd-pstate`` is to register different ``static_call`` instances
+for different hardware implementations.
+
+Currently, some of Zen2 and Zen3 processors support ``amd-pstate``. In the
+future, it will be supported on more and more AMD processors.
+
+Full MSR Support
+-----------------
+
+Some new Zen3 processors such as Cezanne provide the MSR registers directly
+while the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is set.
+``amd-pstate`` can handle the MSR register to implement the fast switch
+function in ``CPUFreq`` that can shrink latency of frequency control on the
+interrupt context. The functions with ``pstate_xxx`` prefix represent the
+operations of MSR registers.
+
+Shared Memory Support
+----------------------
+
+If :c:macro:`X86_FEATURE_CPPC` CPU feature flag is not set, that means the
+processor supports shared memory solution. In this case, ``amd-pstate``
+uses the ``cppc_acpi`` helper methods to implement the callback functions
+that defined on ``static_call``. The functions with ``cppc_xxx`` prefix
+represent the operations of acpi cppc helpers for shared memory solution.
+
+
+AMD P-States and ACPI hardware P-States always can be supported in one
+processor. But AMD P-States has the higher priority and if it is enabled
+with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond
+to the request from AMD P-States.
+
+
+User Space Interface in ``sysfs``
+==================================
+
+``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
+control its functionality at the system level. They located in the
+``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. ::
+
+ root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd*
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq
+
+
+``amd_pstate_highest_perf / amd_pstate_max_freq``
+
+Maximum CPPC performance and CPU frequency that the driver is allowed to
+set in percent of the maximum supported CPPC performance level (the highest
+performance supported in `AMD CPPC Performance Capability <perf_cap_>`_).
+In some of ASICs, the highest CPPC performance is not the one in the _CPC
+table, so we need to expose it to sysfs. If boost is not active but
+supported, this maximum frequency will be larger than the one in
+``cpuinfo``.
+This attribute is read-only.
+
+``amd_pstate_lowest_nonlinear_freq``
+
+The lowest non-linear CPPC CPU frequency that the driver is allowed to set
+in percent of the maximum supported CPPC performance level (Please see the
+lowest non-linear performance in `AMD CPPC Performance Capability
+<perf_cap_>`_).
+This attribute is read-only.
+
+For other performance and frequency values, we can read them back from
+``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`.
+
+
+``amd-pstate`` vs ``acpi-cpufreq``
+======================================
+
+On majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI tables
+provided by the platform firmware used for CPU performance scaling, but
+only provides 3 P-states on AMD processors.
+However, on modern AMD APU and CPU series, it provides the collaborative
+processor performance control according to ACPI protocol and customize this
+for AMD platforms. That is fine-grain and continuous frequency range
+instead of the legacy hardware P-states. ``amd-pstate`` is the kernel
+module which supports the new AMD P-States mechanism on most of future AMD
+platforms. The AMD P-States mechanism will be the more performance and energy
+efficiency frequency management method on AMD processors.
+
+Kernel Module Options for ``amd-pstate``
+=========================================
+
+``shared_mem``
+Use a module param (shared_mem) to enable related processors manually with
+**amd_pstate.shared_mem=1**.
+Due to the performance issue on the processors with `Shared Memory Support
+<perf_cap_>`_, so we disable it for the moment and will enable this by default
+once we address performance issue on this solution.
+
+The way to check whether current processor is `Full MSR Support <perf_cap_>`_
+or `Shared Memory Support <perf_cap_>`_ : ::
+
+ ray@hr-test1:~$ lscpu | grep cppc
+ Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
+
+If CPU Flags have cppc, then this processor supports `Full MSR Support
+<perf_cap_>`_. Otherwise it supports `Shared Memory Support <perf_cap_>`_.
+
+
+``cpupower`` tool support for ``amd-pstate``
+===============================================
+
+``amd-pstate`` is supported on ``cpupower`` tool that can be used to dump the frequency
+information. And it is in progress to support more and more operations for new
+``amd-pstate`` module with this tool. ::
+
+ root@hr-test1:/home/ray# cpupower frequency-info
+ analyzing CPU 0:
+   driver: amd-pstate
+   CPUs which run at the same hardware frequency: 0
+   CPUs which need to have their frequency coordinated by software: 0
+   maximum transition latency: 131 us
+   hardware limits: 400 MHz - 4.68 GHz
+   available cpufreq governors: ondemand conservative powersave userspace performance schedutil
+   current policy: frequency should be within 400 MHz and 4.68 GHz.
+                   The governor "schedutil" may decide which speed to use
+                   within this range.
+   current CPU frequency: Unable to call hardware
+   current CPU frequency: 4.02 GHz (asserted by call to kernel)
+   boost state support:
+     Supported: yes
+     Active: yes
+     AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz.
+     AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz.
+     AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz.
+     AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz.
+
+
+Diagnostics and Tuning
+=======================
+
+Trace Events
+--------------
+
+There are two static trace events that can be used for ``amd-pstate``
+diagnostics. One of them is the cpu_frequency trace event generally used
+by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event
+specific to ``amd-pstate``. The following sequence of shell commands can
+be used to enable them and see their output (if the kernel is generally
+configured to support event tracing). ::
+
+ root@hr-test1:/home/ray# cd /sys/kernel/tracing/
+ root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable
+ root@hr-test1:/sys/kernel/tracing# cat trace
+ # tracer: nop
+ #
+ # entries-in-buffer/entries-written: 47827/42233061   #P:2
+ #
+ #                                _-----=> irqs-off
+ #                               / _----=> need-resched
+ #                              | / _---=> hardirq/softirq
+ #                              || / _--=> preempt-depth
+ #                              ||| /     delay
+ #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
+ #              | |         |   ||||      |         |
+          <idle>-0       [015] dN...  4995.979886: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true
+          <idle>-0       [007] d.h..  4995.979893: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
+             cat-2161    [000] d....  4995.980841: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=0 changed=false fast_switch=true
+            sshd-2125    [004] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=4 changed=false fast_switch=true
+          <idle>-0       [007] d.s..  4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
+          <idle>-0       [003] d.s..  4995.980971: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=3 changed=false fast_switch=true
+          <idle>-0       [011] d.s..  4995.980996: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=11 changed=false fast_switch=true
+
+The cpu_frequency trace event will be triggered either by the ``schedutil`` scaling
+governor (for the policies it is attached to), or by the ``CPUFreq`` core (for the
+policies with other scaling governors).
+
+
+Reference
+===========
+
+.. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming,
+       https://www.amd.com/system/files/TechDocs/24593.pdf
+
+.. [2] Advanced Configuration and Power Interface Specification,
+       https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf
+
+.. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors
+       https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip
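Reader's note (not part of the patch): the "Kernel Module Options" text in amd-pstate.rst says that a ``cppc`` entry in the CPU flags indicates Full MSR Support and that its absence indicates Shared Memory Support. A minimal Python sketch of that check follows. The function name ``cppc_interface`` and the returned labels are hypothetical, and the sketch assumes the processor is one that ``amd-pstate`` supports at all; it does not check for a valid _CPC object.

```python
def cppc_interface(flags_line: str) -> str:
    """Classify a CPU flags line per the rule stated in amd-pstate.rst.

    Accepts either an lscpu-style "Flags: ..." line or a /proc/cpuinfo-style
    "flags : ..." line. The labels returned here are illustrative only.
    """
    head, sep, tail = flags_line.partition(":")
    # If there is no "key:" prefix, treat the whole string as the flag list.
    flags = (tail if sep else head).split()
    # Match the whole token so that names merely containing "cppc" do not count.
    return "full-msr" if "cppc" in flags else "shared-memory"


if __name__ == "__main__":
    # Read the first "flags" line from /proc/cpuinfo (Linux, x86 only).
    with open("/proc/cpuinfo") as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                print(cppc_interface(line))
                break
```

Matching whole tokens rather than substrings is a deliberate choice, since ``grep cppc`` in the documented command would also match any longer flag name that contains the string.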
+1
Documentation/admin-guide/pm/working-state.rst
···
    intel_idle
    cpufreq
    intel_pstate
+   amd-pstate
    cpufreq_drivers
    intel_epb
    intel-speed-select
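Reader's note (not part of the patch): the ``cpupower frequency-info`` sample in amd-pstate.rst pairs each CPPC performance value with a frequency. The sketch below checks those pairs under the assumption that frequency scales linearly with performance through the nominal point (117 corresponds to 3.30 GHz). That linear mapping is an assumption made for illustration; the documentation in this commit does not state it, and ``perf_to_mhz`` is a hypothetical helper name.

```python
def perf_to_mhz(perf: int, nominal_perf: int, nominal_mhz: float) -> float:
    """Scale a unit-less CPPC performance value to MHz.

    Assumes a linear relation anchored at the nominal point; this is an
    illustrative assumption, not a rule taken from the patch.
    """
    if nominal_perf <= 0:
        raise ValueError("nominal_perf must be positive")
    return perf * nominal_mhz / nominal_perf


# Values copied from the cpupower sample output in amd-pstate.rst.
NOMINAL_PERF = 117
NOMINAL_MHZ = 3300.0

if __name__ == "__main__":
    for name, perf in (("highest", 166), ("lowest non-linear", 39), ("lowest", 15)):
        mhz = perf_to_mhz(perf, NOMINAL_PERF, NOMINAL_MHZ)
        print(f"{name}: perf={perf} -> {mhz:.0f} MHz")
```

Under this assumption the highest (166) and lowest non-linear (39) values land on about 4682 MHz and 1100 MHz, which agree with the 4.68 GHz and 1.10 GHz in the sample. The lowest value (15) comes out near 423 MHz rather than the reported 400 MHz, so the lowest frequency in that sample does not appear to follow the same linear rule.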