Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"The most signigicant change here is the addition of a new cpufreq
'P-state' driver for AMD processors as a better replacement for the
venerable acpi-cpufreq driver.

There are also other cpufreq updates (in the core, intel_pstate, ARM
drivers), PM core updates (mostly related to adding new macros for
declaring PM operations which should make the lives of driver
developers somewhat easier), and a bunch of assorted fixes and
cleanups.

Summary:

- Add new P-state driver for AMD processors (Huang Rui).

- Fix initialization of min and max frequency QoS requests in the
cpufreq core (Rafael Wysocki).

- Fix EPP handling on Alder Lake in intel_pstate (Srinivas
Pandruvada).

- Make intel_pstate update cpuinfo.max_freq when notified of HWP
capabilities changes and drop a redundant function call from that
driver (Rafael Wysocki).

- Improve IRQ support in the Qcom cpufreq driver (Ard Biesheuvel,
Stephen Boyd, Vladimir Zapolskiy).

- Fix double devm_remap() in the Mediatek cpufreq driver (Hector
Yuan).

- Introduce thermal pressure helpers for cpufreq CPU cooling (Lukasz
Luba).

- Make cpufreq use default_groups in kobj_type (Greg Kroah-Hartman).

- Make cpuidle use default_groups in kobj_type (Greg Kroah-Hartman).

- Fix two comments in cpuidle code (Jason Wang, Yang Li).

- Allow model-specific normal EPB value to be used in the intel_epb
sysfs attribute handling code (Srinivas Pandruvada).

- Simplify locking in pm_runtime_put_suppliers() (Rafael Wysocki).

- Add safety net to supplier device release in the runtime PM core
code (Rafael Wysocki).

- Capture device status before disabling runtime PM for it (Rafael
Wysocki).

- Add new macros for declaring PM operations to allow drivers to
avoid guarding them with CONFIG_PM #ifdefs or __maybe_unused and
update some drivers to use these macros (Paul Cercueil).

- Allow ACPI hardware signature to be honoured during restore from
hibernation (David Woodhouse).

- Update outdated operating performance points (OPP) documentation
(Tang Yizhou).

- Reduce log severity for informative message regarding frequency
transition failures in devfreq (Tzung-Bi Shih).

- Add DRAM frequency controller devfreq driver for Allwinner sunXi
SoCs (Samuel Holland).

- Add missing COMMON_CLK dependency to sun8i devfreq driver (Arnd
Bergmann).

- Add support for new layout of Psys PowerLimit Register on SPR to
the Intel RAPL power capping driver (Zhang Rui).

- Fix typo in a comment in idle_inject.c (Jason Wang).

- Remove unused function definition from the DTPM (Dynamic Thermal
Power Management) power capping framework (Daniel Lezcano).

- Reduce DTPM trace verbosity (Daniel Lezcano)"

* tag 'pm-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
x86, sched: Fix undefined reference to init_freq_invariance_cppc() build error
cpufreq: amd-pstate: Fix Kconfig dependencies for AMD P-State
cpufreq: amd-pstate: Fix struct amd_cpudata kernel-doc comment
cpuidle: use default_groups in kobj_type
x86: intel_epb: Allow model specific normal EPB value
MAINTAINERS: Add AMD P-State driver maintainer entry
Documentation: amd-pstate: Add AMD P-State driver introduction
cpufreq: amd-pstate: Add AMD P-State performance attributes
cpufreq: amd-pstate: Add AMD P-State frequencies attributes
cpufreq: amd-pstate: Add boost mode support for AMD P-State
cpufreq: amd-pstate: Add trace for AMD P-State module
cpufreq: amd-pstate: Introduce the support for the processors with shared memory solution
cpufreq: amd-pstate: Add fast switch function for AMD P-State
cpufreq: amd-pstate: Introduce a new AMD P-State driver to support future processors
ACPI: CPPC: Add CPPC enable register function
ACPI: CPPC: Check present CPUs for determining _CPC is valid
ACPI: CPPC: Implement support for SystemIO registers
x86/msr: Add AMD CPPC MSR definitions
x86/cpufeatures: Add AMD Collaborative Processor Performance Control feature flag
cpufreq: use default_groups in kobj_type
...

+2274 -218
+2
Documentation/admin-guide/acpi/cppc_sysfs.rst
··· 4 4 Collaborative Processor Performance Control (CPPC) 5 5 ================================================== 6 6 7 + .. _cppc_sysfs: 8 + 7 9 CPPC 8 10 ==== 9 11
+12 -3
Documentation/admin-guide/kernel-parameters.txt
··· 225 225 For broken nForce2 BIOS resulting in XT-PIC timer. 226 226 227 227 acpi_sleep= [HW,ACPI] Sleep options 228 - Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig, 229 - old_ordering, nonvs, sci_force_enable, nobl } 228 + Format: { s3_bios, s3_mode, s3_beep, s4_hwsig, 229 + s4_nohwsig, old_ordering, nonvs, 230 + sci_force_enable, nobl } 230 231 See Documentation/power/video.rst for information on 231 232 s3_bios and s3_mode. 232 233 s3_beep is for debugging; it makes the PC's speaker beep 233 234 as soon as the kernel's real-mode entry point is called. 235 + s4_hwsig causes the kernel to check the ACPI hardware 236 + signature during resume from hibernation, and gracefully 237 + refuse to resume if it has changed. This complies with 238 + the ACPI specification but not with reality, since 239 + Windows does not do this and many laptops do change it 240 + on docking. So the default behaviour is to allow resume 241 + and simply warn when the signature changes, unless the 242 + s4_hwsig option is enabled. 234 243 s4_nohwsig prevents ACPI hardware signature from being 235 - used during resume from hibernation. 244 + used (or even warned about) during resume. 236 245 old_ordering causes the ACPI 1.0 ordering of the _PTS 237 246 control method, with respect to putting devices into 238 247 low power states, to be enforced (the ACPI 2.0 ordering
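The stricter resume behaviour documented in the hunk above is opt-in at boot time. A hypothetical GRUB configuration fragment (the parameter value comes from the acpi_sleep= documentation above; the file path and variable name are the usual Debian-style defaults, assumed here for illustration):

```shell
# /etc/default/grub -- opt in to the strict ACPI hardware-signature
# check on resume from hibernation; regenerate the GRUB config
# (e.g. update-grub) afterwards.
GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_sleep=s4_hwsig"
```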
+382
Documentation/admin-guide/pm/amd-pstate.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + .. include:: <isonum.txt> 3 + 4 + =============================================== 5 + ``amd-pstate`` CPU Performance Scaling Driver 6 + =============================================== 7 + 8 + :Copyright: |copy| 2021 Advanced Micro Devices, Inc. 9 + 10 + :Author: Huang Rui <ray.huang@amd.com> 11 + 12 + 13 + Introduction 14 + =================== 15 + 16 + ``amd-pstate`` is the AMD CPU performance scaling driver that introduces a 17 + new CPU frequency control mechanism on modern AMD APU and CPU series in 18 + Linux kernel. The new mechanism is based on Collaborative Processor 19 + Performance Control (CPPC) which provides finer grain frequency management 20 + than legacy ACPI hardware P-States. Current AMD CPU/APU platforms are using 21 + the ACPI P-states driver to manage CPU frequency and clocks with switching 22 + only in 3 P-states. CPPC replaces the ACPI P-states controls, allows a 23 + flexible, low-latency interface for the Linux kernel to directly 24 + communicate the performance hints to hardware. 25 + 26 + ``amd-pstate`` leverages the Linux kernel governors such as ``schedutil``, 27 + ``ondemand``, etc. to manage the performance hints which are provided by 28 + CPPC hardware functionality that internally follows the hardware 29 + specification (for details refer to AMD64 Architecture Programmer's Manual 30 + Volume 2: System Programming [1]_). Currently ``amd-pstate`` supports basic 31 + frequency control function according to kernel governors on some of the 32 + Zen2 and Zen3 processors, and we will implement more AMD specific functions 33 + in future after we verify them on the hardware and SBIOS. 34 + 35 + 36 + AMD CPPC Overview 37 + ======================= 38 + 39 + Collaborative Processor Performance Control (CPPC) interface enumerates a 40 + continuous, abstract, and unit-less performance value in a scale that is 41 + not tied to a specific performance state / frequency. 
This is an ACPI 42 + standard [2]_ which software can specify application performance goals and 43 + hints as a relative target to the infrastructure limits. AMD processors 44 + provides the low latency register model (MSR) instead of AML code 45 + interpreter for performance adjustments. ``amd-pstate`` will initialize a 46 + ``struct cpufreq_driver`` instance ``amd_pstate_driver`` with the callbacks 47 + to manage each performance update behavior. :: 48 + 49 + Highest Perf ------>+-----------------------+ +-----------------------+ 50 + | | | | 51 + | | | | 52 + | | Max Perf ---->| | 53 + | | | | 54 + | | | | 55 + Nominal Perf ------>+-----------------------+ +-----------------------+ 56 + | | | | 57 + | | | | 58 + | | | | 59 + | | | | 60 + | | | | 61 + | | | | 62 + | | Desired Perf ---->| | 63 + | | | | 64 + | | | | 65 + | | | | 66 + | | | | 67 + | | | | 68 + | | | | 69 + | | | | 70 + | | | | 71 + | | | | 72 + Lowest non- | | | | 73 + linear perf ------>+-----------------------+ +-----------------------+ 74 + | | | | 75 + | | Lowest perf ---->| | 76 + | | | | 77 + Lowest perf ------>+-----------------------+ +-----------------------+ 78 + | | | | 79 + | | | | 80 + | | | | 81 + 0 ------>+-----------------------+ +-----------------------+ 82 + 83 + AMD P-States Performance Scale 84 + 85 + 86 + .. _perf_cap: 87 + 88 + AMD CPPC Performance Capability 89 + -------------------------------- 90 + 91 + Highest Performance (RO) 92 + ......................... 93 + 94 + It is the absolute maximum performance an individual processor may reach, 95 + assuming ideal conditions. This performance level may not be sustainable 96 + for long durations and may only be achievable if other platform components 97 + are in a specific state; for example, it may require other processors be in 98 + an idle state. This would be equivalent to the highest frequencies 99 + supported by the processor. 100 + 101 + Nominal (Guaranteed) Performance (RO) 102 + ...................................... 
103 + 104 + It is the maximum sustained performance level of the processor, assuming 105 + ideal operating conditions. In absence of an external constraint (power, 106 + thermal, etc.) this is the performance level the processor is expected to 107 + be able to maintain continuously. All cores/processors are expected to be 108 + able to sustain their nominal performance state simultaneously. 109 + 110 + Lowest non-linear Performance (RO) 111 + ................................... 112 + 113 + It is the lowest performance level at which nonlinear power savings are 114 + achieved, for example, due to the combined effects of voltage and frequency 115 + scaling. Above this threshold, lower performance levels should be generally 116 + more energy efficient than higher performance levels. This register 117 + effectively conveys the most efficient performance level to ``amd-pstate``. 118 + 119 + Lowest Performance (RO) 120 + ........................ 121 + 122 + It is the absolute lowest performance level of the processor. Selecting a 123 + performance level lower than the lowest nonlinear performance level may 124 + cause an efficiency penalty but should reduce the instantaneous power 125 + consumption of the processor. 126 + 127 + AMD CPPC Performance Control 128 + ------------------------------ 129 + 130 + ``amd-pstate`` passes performance goals through these registers. The 131 + register drives the behavior of the desired performance target. 132 + 133 + Minimum requested performance (RW) 134 + ................................... 135 + 136 + ``amd-pstate`` specifies the minimum allowed performance level. 137 + 138 + Maximum requested performance (RW) 139 + ................................... 140 + 141 + ``amd-pstate`` specifies a limit the maximum performance that is expected 142 + to be supplied by the hardware. 143 + 144 + Desired performance target (RW) 145 + ................................... 
146 + 147 + ``amd-pstate`` specifies a desired target in the CPPC performance scale as 148 + a relative number. This can be expressed as percentage of nominal 149 + performance (infrastructure max). Below the nominal sustained performance 150 + level, desired performance expresses the average performance level of the 151 + processor subject to hardware. Above the nominal performance level, 152 + processor must provide at least nominal performance requested and go higher 153 + if current operating conditions allow. 154 + 155 + Energy Performance Preference (EPP) (RW) 156 + ......................................... 157 + 158 + Provides a hint to the hardware if software wants to bias toward performance 159 + (0x0) or energy efficiency (0xff). 160 + 161 + 162 + Key Governors Support 163 + ======================= 164 + 165 + ``amd-pstate`` can be used with all the (generic) scaling governors listed 166 + by the ``scaling_available_governors`` policy attribute in ``sysfs``. Then, 167 + it is responsible for the configuration of policy objects corresponding to 168 + CPUs and provides the ``CPUFreq`` core (and the scaling governors attached 169 + to the policy objects) with accurate information on the maximum and minimum 170 + operating frequencies supported by the hardware. Users can check the 171 + ``scaling_cur_freq`` information comes from the ``CPUFreq`` core. 172 + 173 + ``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic 174 + frequency control. It is to fine tune the processor configuration on 175 + ``amd-pstate`` to the ``schedutil`` with CPU CFS scheduler. ``amd-pstate`` 176 + registers adjust_perf callback to implement the CPPC similar performance 177 + update behavior. It is initialized by ``sugov_start`` and then populate the 178 + CPU's update_util_data pointer to assign ``sugov_update_single_perf`` as 179 + the utilization update callback function in CPU scheduler. 
CPU scheduler 180 + will call ``cpufreq_update_util`` and assign the target performance 181 + according to the ``struct sugov_cpu`` that utilization update belongs to. 182 + Then ``amd-pstate`` updates the desired performance according to the CPU 183 + scheduler assigned. 184 + 185 + 186 + Processor Support 187 + ======================= 188 + 189 + The ``amd-pstate`` initialization will fail if the _CPC in ACPI SBIOS is 190 + not existed at the detected processor, and it uses ``acpi_cpc_valid`` to 191 + check the _CPC existence. All Zen based processors support legacy ACPI 192 + hardware P-States function, so while the ``amd-pstate`` fails to be 193 + initialized, the kernel will fall back to initialize ``acpi-cpufreq`` 194 + driver. 195 + 196 + There are two types of hardware implementations for ``amd-pstate``: one is 197 + `Full MSR Support <perf_cap_>`_ and another is `Shared Memory Support 198 + <perf_cap_>`_. It can use :c:macro:`X86_FEATURE_CPPC` feature flag (for 199 + details refer to Processor Programming Reference (PPR) for AMD Family 200 + 19h Model 51h, Revision A1 Processors [3]_) to indicate the different 201 + types. ``amd-pstate`` is to register different ``static_call`` instances 202 + for different hardware implementations. 203 + 204 + Currently, some of Zen2 and Zen3 processors support ``amd-pstate``. In the 205 + future, it will be supported on more and more AMD processors. 206 + 207 + Full MSR Support 208 + ----------------- 209 + 210 + Some new Zen3 processors such as Cezanne provide the MSR registers directly 211 + while the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is set. 212 + ``amd-pstate`` can handle the MSR register to implement the fast switch 213 + function in ``CPUFreq`` that can shrink latency of frequency control on the 214 + interrupt context. The functions with ``pstate_xxx`` prefix represent the 215 + operations of MSR registers. 
216 + 217 + Shared Memory Support 218 + ---------------------- 219 + 220 + If :c:macro:`X86_FEATURE_CPPC` CPU feature flag is not set, that means the 221 + processor supports shared memory solution. In this case, ``amd-pstate`` 222 + uses the ``cppc_acpi`` helper methods to implement the callback functions 223 + that defined on ``static_call``. The functions with ``cppc_xxx`` prefix 224 + represent the operations of acpi cppc helpers for shared memory solution. 225 + 226 + 227 + AMD P-States and ACPI hardware P-States always can be supported in one 228 + processor. But AMD P-States has the higher priority and if it is enabled 229 + with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond 230 + to the request from AMD P-States. 231 + 232 + 233 + User Space Interface in ``sysfs`` 234 + ================================== 235 + 236 + ``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to 237 + control its functionality at the system level. They located in the 238 + ``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. :: 239 + 240 + root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd* 241 + /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf 242 + /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq 243 + /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq 244 + 245 + 246 + ``amd_pstate_highest_perf / amd_pstate_max_freq`` 247 + 248 + Maximum CPPC performance and CPU frequency that the driver is allowed to 249 + set in percent of the maximum supported CPPC performance level (the highest 250 + performance supported in `AMD CPPC Performance Capability <perf_cap_>`_). 251 + In some of ASICs, the highest CPPC performance is not the one in the _CPC 252 + table, so we need to expose it to sysfs. If boost is not active but 253 + supported, this maximum frequency will be larger than the one in 254 + ``cpuinfo``. 255 + This attribute is read-only. 
256 + 257 + ``amd_pstate_lowest_nonlinear_freq`` 258 + 259 + The lowest non-linear CPPC CPU frequency that the driver is allowed to set 260 + in percent of the maximum supported CPPC performance level (Please see the 261 + lowest non-linear performance in `AMD CPPC Performance Capability 262 + <perf_cap_>`_). 263 + This attribute is read-only. 264 + 265 + For other performance and frequency values, we can read them back from 266 + ``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`. 267 + 268 + 269 + ``amd-pstate`` vs ``acpi-cpufreq`` 270 + ====================================== 271 + 272 + On majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI tables 273 + provided by the platform firmware used for CPU performance scaling, but 274 + only provides 3 P-states on AMD processors. 275 + However, on modern AMD APU and CPU series, it provides the collaborative 276 + processor performance control according to ACPI protocol and customize this 277 + for AMD platforms. That is fine-grain and continuous frequency range 278 + instead of the legacy hardware P-states. ``amd-pstate`` is the kernel 279 + module which supports the new AMD P-States mechanism on most of future AMD 280 + platforms. The AMD P-States mechanism will be the more performance and energy 281 + efficiency frequency management method on AMD processors. 282 + 283 + Kernel Module Options for ``amd-pstate`` 284 + ========================================= 285 + 286 + ``shared_mem`` 287 + Use a module param (shared_mem) to enable related processors manually with 288 + **amd_pstate.shared_mem=1**. 289 + Due to the performance issue on the processors with `Shared Memory Support 290 + <perf_cap_>`_, so we disable it for the moment and will enable this by default 291 + once we address performance issue on this solution. 
292 + 293 + The way to check whether current processor is `Full MSR Support <perf_cap_>`_ 294 + or `Shared Memory Support <perf_cap_>`_ : :: 295 + 296 + ray@hr-test1:~$ lscpu | grep cppc 297 + Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm 298 + 299 + If CPU Flags have cppc, then this processor supports `Full MSR Support 300 + <perf_cap_>`_. Otherwise it supports `Shared Memory Support <perf_cap_>`_. 301 + 302 + 303 + ``cpupower`` tool support for ``amd-pstate`` 304 + =============================================== 305 + 306 + ``amd-pstate`` is supported on ``cpupower`` tool that can be used to dump the frequency 307 + information. And it is in progress to support more and more operations for new 308 + ``amd-pstate`` module with this tool. 
:: 309 + 310 + root@hr-test1:/home/ray# cpupower frequency-info 311 + analyzing CPU 0: 312 + driver: amd-pstate 313 + CPUs which run at the same hardware frequency: 0 314 + CPUs which need to have their frequency coordinated by software: 0 315 + maximum transition latency: 131 us 316 + hardware limits: 400 MHz - 4.68 GHz 317 + available cpufreq governors: ondemand conservative powersave userspace performance schedutil 318 + current policy: frequency should be within 400 MHz and 4.68 GHz. 319 + The governor "schedutil" may decide which speed to use 320 + within this range. 321 + current CPU frequency: Unable to call hardware 322 + current CPU frequency: 4.02 GHz (asserted by call to kernel) 323 + boost state support: 324 + Supported: yes 325 + Active: yes 326 + AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz. 327 + AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz. 328 + AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz. 329 + AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz. 330 + 331 + 332 + Diagnostics and Tuning 333 + ======================= 334 + 335 + Trace Events 336 + -------------- 337 + 338 + There are two static trace events that can be used for ``amd-pstate`` 339 + diagnostics. One of them is the cpu_frequency trace event generally used 340 + by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event 341 + specific to ``amd-pstate``. The following sequence of shell commands can 342 + be used to enable them and see their output (if the kernel is generally 343 + configured to support event tracing). 
:: 344 + 345 + root@hr-test1:/home/ray# cd /sys/kernel/tracing/ 346 + root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable 347 + root@hr-test1:/sys/kernel/tracing# cat trace 348 + # tracer: nop 349 + # 350 + # entries-in-buffer/entries-written: 47827/42233061 #P:2 351 + # 352 + # _-----=> irqs-off 353 + # / _----=> need-resched 354 + # | / _---=> hardirq/softirq 355 + # || / _--=> preempt-depth 356 + # ||| / delay 357 + # TASK-PID CPU# |||| TIMESTAMP FUNCTION 358 + # | | | |||| | | 359 + <idle>-0 [015] dN... 4995.979886: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true 360 + <idle>-0 [007] d.h.. 4995.979893: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true 361 + cat-2161 [000] d.... 4995.980841: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=0 changed=false fast_switch=true 362 + sshd-2125 [004] d.s.. 4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=4 changed=false fast_switch=true 363 + <idle>-0 [007] d.s.. 4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true 364 + <idle>-0 [003] d.s.. 4995.980971: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=3 changed=false fast_switch=true 365 + <idle>-0 [011] d.s.. 4995.980996: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=11 changed=false fast_switch=true 366 + 367 + The cpu_frequency trace event will be triggered either by the ``schedutil`` scaling 368 + governor (for the policies it is attached to), or by the ``CPUFreq`` core (for the 369 + policies with other scaling governors). 370 + 371 + 372 + Reference 373 + =========== 374 + 375 + .. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming, 376 + https://www.amd.com/system/files/TechDocs/24593.pdf 377 + 378 + .. 
[2] Advanced Configuration and Power Interface Specification, 379 + https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf 380 + 381 + .. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors 382 + https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip
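As a sanity check on the performance scale this document describes, the frequency attributes scale linearly with the abstract CPPC performance numbers. Plugging in the values from the cpupower output shown earlier in this file (nominal perf 117 at 3.30 GHz) reproduces the other reported frequencies. This linear mapping is a sketch of the documented relationship, not the driver's exact code:

```python
# Values from the cpupower frequency-info output earlier in this file:
# nominal perf 117 corresponds to the 3.30 GHz nominal frequency.
nominal_perf = 117
nominal_freq_ghz = 3.30

def perf_to_freq_ghz(perf: int) -> float:
    """Map an abstract CPPC performance level to a frequency in GHz."""
    return nominal_freq_ghz * perf / nominal_perf

print(round(perf_to_freq_ghz(166), 2))  # highest perf -> 4.68 GHz
print(round(perf_to_freq_ghz(39), 2))   # lowest non-linear perf -> 1.1 GHz
```

The highest (166) and lowest non-linear (39) levels land exactly on the 4.68 GHz and 1.10 GHz figures cpupower reports; the absolute lowest frequency (400 MHz at perf 15) is reported by the platform separately and need not fall exactly on this line.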
+1
Documentation/admin-guide/pm/working-state.rst
··· 11 11 intel_idle 12 12 cpufreq 13 13 intel_pstate 14 + amd-pstate 14 15 cpufreq_drivers 15 16 intel_epb 16 17 intel-speed-select
+7 -7
Documentation/power/opp.rst
··· 48 48 OPP library provides a set of helper functions to organize and query the OPP 49 49 information. The library is located in drivers/opp/ directory and the header 50 50 is located in include/linux/pm_opp.h. OPP library can be enabled by enabling 51 - CONFIG_PM_OPP from power management menuconfig menu. OPP library depends on 52 - CONFIG_PM as certain SoCs such as Texas Instrument's OMAP framework allows to 53 - optionally boot at a certain OPP without needing cpufreq. 51 + CONFIG_PM_OPP from power management menuconfig menu. Certain SoCs such as Texas 52 + Instrument's OMAP framework allows to optionally boot at a certain OPP without 53 + needing cpufreq. 54 54 55 55 Typical usage of the OPP library is as follows:: 56 56 ··· 75 75 76 76 OPP library facilitates this concept in its implementation. The following 77 77 operational functions operate only on available opps: 78 - opp_find_freq_{ceil, floor}, dev_pm_opp_get_voltage, dev_pm_opp_get_freq, 79 - dev_pm_opp_get_opp_count 78 + dev_pm_opp_find_freq_{ceil, floor}, dev_pm_opp_get_voltage, dev_pm_opp_get_freq, 79 + dev_pm_opp_get_opp_count. 80 80 81 81 dev_pm_opp_find_freq_exact is meant to be used to find the opp pointer 82 82 which can then be used for dev_pm_opp_enable/disable functions to make an ··· 103 103 The OPP is defined using the frequency and voltage. Once added, the OPP 104 104 is assumed to be available and control of its availability can be done 105 105 with the dev_pm_opp_enable/disable functions. OPP library 106 - internally stores and manages this information in the opp struct. 106 + internally stores and manages this information in the dev_pm_opp struct. 107 107 This function may be used by SoC framework to define a optimal list 108 108 as per the demands of SoC usage environment. 109 109 ··· 247 247 5. 
OPP Data Retrieval Functions 248 248 =============================== 249 249 Since OPP library abstracts away the OPP information, a set of functions to pull 250 - information from the OPP structure is necessary. Once an OPP pointer is 250 + information from the dev_pm_opp structure is necessary. Once an OPP pointer is 251 251 retrieved using the search functions, the following functions can be used by SoC 252 252 framework to retrieve the information represented inside the OPP layer. 253 253
+10 -4
Documentation/power/runtime_pm.rst
··· 265 265 RPM_SUSPENDED, which means that each device is initially regarded by the 266 266 PM core as 'suspended', regardless of its real hardware status 267 267 268 + `enum rpm_status last_status;` 269 + - the last runtime PM status of the device captured before disabling runtime 270 + PM for it (invalid initially and when disable_depth is 0) 271 + 268 272 `unsigned int runtime_auto;` 269 273 - if set, indicates that the user space has allowed the device driver to 270 274 power manage the device at run time via the /sys/devices/.../power/control ··· 337 333 338 334 `int pm_runtime_resume(struct device *dev);` 339 335 - execute the subsystem-level resume callback for the device; returns 0 on 340 - success, 1 if the device's runtime PM status was already 'active' or 341 - error code on failure, where -EAGAIN means it may be safe to attempt to 342 - resume the device again in future, but 'power.runtime_error' should be 343 - checked additionally, and -EACCES means that 'power.disable_depth' is 336 + success, 1 if the device's runtime PM status is already 'active' (also if 337 + 'power.disable_depth' is nonzero, but the status was 'active' when it was 338 + changing from 0 to 1) or error code on failure, where -EAGAIN means it may 339 + be safe to attempt to resume the device again in future, but 340 + 'power.runtime_error' should be checked additionally, and -EACCES means 341 + that the callback could not be run, because 'power.disable_depth' was 344 342 different from 0 345 343 346 344 `int pm_runtime_resume_and_get(struct device *dev);`
+7
MAINTAINERS
··· 994 994 T: git https://gitlab.freedesktop.org/agd5f/linux.git 995 995 F: drivers/gpu/drm/amd/pm/ 996 996 997 + AMD PSTATE DRIVER 998 + M: Huang Rui <ray.huang@amd.com> 999 + L: linux-pm@vger.kernel.org 1000 + S: Supported 1001 + F: Documentation/admin-guide/pm/amd-pstate.rst 1002 + F: drivers/cpufreq/amd-pstate* 1003 + 997 1004 AMD PTDMA DRIVER 998 1005 M: Sanjay R Mehta <sanju.mehta@amd.com> 999 1006 L: dmaengine@vger.kernel.org
+1 -1
arch/arm/include/asm/topology.h
··· 23 23 24 24 /* Replace task scheduler's default thermal pressure API */ 25 25 #define arch_scale_thermal_pressure topology_get_thermal_pressure 26 - #define arch_set_thermal_pressure topology_set_thermal_pressure 26 + #define arch_update_thermal_pressure topology_update_thermal_pressure 27 27 28 28 #else 29 29
+1 -1
arch/arm64/include/asm/topology.h
··· 32 32 33 33 /* Replace task scheduler's default thermal pressure API */ 34 34 #define arch_scale_thermal_pressure topology_get_thermal_pressure 35 - #define arch_set_thermal_pressure topology_set_thermal_pressure 35 + #define arch_update_thermal_pressure topology_update_thermal_pressure 36 36 37 37 #include <asm-generic/topology.h> 38 38
+1
arch/x86/include/asm/cpufeatures.h
··· 315 315 #define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */ 316 316 #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */ 317 317 #define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */ 318 + #define X86_FEATURE_CPPC (13*32+27) /* Collaborative Processor Performance Control */ 318 319 319 320 /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */ 320 321 #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
+17
arch/x86/include/asm/msr-index.h
··· 486 486 487 487 #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f 488 488 489 + /* AMD Collaborative Processor Performance Control MSRs */ 490 + #define MSR_AMD_CPPC_CAP1 0xc00102b0 491 + #define MSR_AMD_CPPC_ENABLE 0xc00102b1 492 + #define MSR_AMD_CPPC_CAP2 0xc00102b2 493 + #define MSR_AMD_CPPC_REQ 0xc00102b3 494 + #define MSR_AMD_CPPC_STATUS 0xc00102b4 495 + 496 + #define AMD_CPPC_LOWEST_PERF(x) (((x) >> 0) & 0xff) 497 + #define AMD_CPPC_LOWNONLIN_PERF(x) (((x) >> 8) & 0xff) 498 + #define AMD_CPPC_NOMINAL_PERF(x) (((x) >> 16) & 0xff) 499 + #define AMD_CPPC_HIGHEST_PERF(x) (((x) >> 24) & 0xff) 500 + 501 + #define AMD_CPPC_MAX_PERF(x) (((x) & 0xff) << 0) 502 + #define AMD_CPPC_MIN_PERF(x) (((x) & 0xff) << 8) 503 + #define AMD_CPPC_DES_PERF(x) (((x) & 0xff) << 16) 504 + #define AMD_CPPC_ENERGY_PERF_PREF(x) (((x) & 0xff) << 24) 505 + 489 506 /* Fam 17h MSRs */ 490 507 #define MSR_F17H_IRPERF 0xc00000e9 491 508
+1 -1
arch/x86/include/asm/topology.h
··· 221 221 } 222 222 #endif 223 223 224 - #ifdef CONFIG_ACPI_CPPC_LIB 224 + #if defined(CONFIG_ACPI_CPPC_LIB) && defined(CONFIG_SMP) 225 225 void init_freq_invariance_cppc(void); 226 226 #define init_freq_invariance_cppc init_freq_invariance_cppc 227 227 #endif
+3 -1
arch/x86/kernel/acpi/sleep.c
···
 	if (strncmp(str, "s3_beep", 7) == 0)
 		acpi_realmode_flags |= 4;
 #ifdef CONFIG_HIBERNATION
+	if (strncmp(str, "s4_hwsig", 8) == 0)
+		acpi_check_s4_hw_signature(1);
 	if (strncmp(str, "s4_nohwsig", 10) == 0)
-		acpi_no_s4_hw_signature();
+		acpi_check_s4_hw_signature(0);
 #endif
 	if (strncmp(str, "nonvs", 5) == 0)
 		acpi_nvs_nosave();
+32 -13
arch/x86/kernel/cpu/intel_epb.c
···
 #include <linux/syscore_ops.h>
 #include <linux/pm.h>
 
+#include <asm/cpu_device_id.h>
 #include <asm/cpufeature.h>
 #include <asm/msr.h>
···
 #define EPB_SAVED	0x10ULL
 #define MAX_EPB		EPB_MASK
 
+enum energy_perf_value_index {
+	EPB_INDEX_PERFORMANCE,
+	EPB_INDEX_BALANCE_PERFORMANCE,
+	EPB_INDEX_NORMAL,
+	EPB_INDEX_BALANCE_POWERSAVE,
+	EPB_INDEX_POWERSAVE,
+};
+
+static u8 energ_perf_values[] = {
+	[EPB_INDEX_PERFORMANCE] = ENERGY_PERF_BIAS_PERFORMANCE,
+	[EPB_INDEX_BALANCE_PERFORMANCE] = ENERGY_PERF_BIAS_BALANCE_PERFORMANCE,
+	[EPB_INDEX_NORMAL] = ENERGY_PERF_BIAS_NORMAL,
+	[EPB_INDEX_BALANCE_POWERSAVE] = ENERGY_PERF_BIAS_BALANCE_POWERSAVE,
+	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
+};
+
 static int intel_epb_save(void)
 {
 	u64 epb;
···
 	 */
 	val = epb & EPB_MASK;
 	if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
-		val = ENERGY_PERF_BIAS_NORMAL;
+		val = energ_perf_values[EPB_INDEX_NORMAL];
 		pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
 	}
···
 };
 
 static const char * const energy_perf_strings[] = {
-	"performance",
-	"balance-performance",
-	"normal",
-	"balance-power",
-	"power"
-};
-static const u8 energ_perf_values[] = {
-	ENERGY_PERF_BIAS_PERFORMANCE,
-	ENERGY_PERF_BIAS_BALANCE_PERFORMANCE,
-	ENERGY_PERF_BIAS_NORMAL,
-	ENERGY_PERF_BIAS_BALANCE_POWERSAVE,
-	ENERGY_PERF_BIAS_POWERSAVE
+	[EPB_INDEX_PERFORMANCE] = "performance",
+	[EPB_INDEX_BALANCE_PERFORMANCE] = "balance-performance",
+	[EPB_INDEX_NORMAL] = "normal",
+	[EPB_INDEX_BALANCE_POWERSAVE] = "balance-power",
+	[EPB_INDEX_POWERSAVE] = "power",
 };
 
 static ssize_t energy_perf_bias_show(struct device *dev,
···
 	return 0;
 }
 
+static const struct x86_cpu_id intel_epb_normal[] = {
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, 7),
+	{}
+};
+
 static __init int intel_epb_init(void)
 {
+	const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
 	int ret;
 
 	if (!boot_cpu_has(X86_FEATURE_EPB))
 		return -ENODEV;
+
+	if (id)
+		energ_perf_values[EPB_INDEX_NORMAL] = id->driver_data;
 
 	ret = cpuhp_setup_state(CPUHP_AP_X86_INTEL_EPB_ONLINE,
 				"x86/intel/epb:online", intel_epb_online,
+95 -4
drivers/acpi/cppc_acpi.c
···
  */
 #define NUM_RETRIES 500ULL
 
+#define OVER_16BTS_MASK ~0xFFFFULL
+
 #define define_one_cppc_ro(_name)		\
 static struct kobj_attribute _name =		\
 __ATTR(_name, 0444, show_##_name, NULL)
···
 	struct cpc_desc *cpc_ptr;
 	int cpu;
 
-	for_each_possible_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		cpc_ptr = per_cpu(cpc_desc_ptr, cpu);
 		if (!cpc_ptr)
 			return false;
···
 				goto out_free;
 			cpc_ptr->cpc_regs[i-2].sys_mem_vaddr = addr;
 		}
+	} else if (gas_t->space_id == ACPI_ADR_SPACE_SYSTEM_IO) {
+		if (gas_t->access_width < 1 || gas_t->access_width > 3) {
+			/*
+			 * 1 = 8-bit, 2 = 16-bit, and 3 = 32-bit.
+			 * SystemIO doesn't implement 64-bit
+			 * registers.
+			 */
+			pr_debug("Invalid access width %d for SystemIO register\n",
+				 gas_t->access_width);
+			goto out_free;
+		}
+		if (gas_t->address & OVER_16BTS_MASK) {
+			/* SystemIO registers use 16-bit integer addresses */
+			pr_debug("Invalid IO port %llu for SystemIO register\n",
+				 gas_t->address);
+			goto out_free;
+		}
 	} else {
 		if (gas_t->space_id != ACPI_ADR_SPACE_FIXED_HARDWARE || !cpc_ffh_supported()) {
-			/* Support only PCC ,SYS MEM and FFH type regs */
+			/* Support only PCC, SystemMemory, SystemIO, and FFH type regs. */
 			pr_debug("Unsupported register type: %d\n", gas_t->space_id);
 			goto out_free;
 		}
···
 	}
 
 	*val = 0;
-	if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0)
+
+	if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_IO) {
+		u32 width = 8 << (reg->access_width - 1);
+		acpi_status status;
+
+		status = acpi_os_read_port((acpi_io_address)reg->address,
+					   (u32 *)val, width);
+		if (ACPI_FAILURE(status)) {
+			pr_debug("Error: Failed to read SystemIO port %llx\n",
+				 reg->address);
+			return -EFAULT;
+		}
+
+		return 0;
+	} else if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0)
 		vaddr = GET_PCC_VADDR(reg->address, pcc_ss_id);
 	else if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY)
 		vaddr = reg_res->sys_mem_vaddr;
···
 	int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
 	struct cpc_reg *reg = &reg_res->cpc_entry.reg;
 
-	if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0)
+	if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_IO) {
+		u32 width = 8 << (reg->access_width - 1);
+		acpi_status status;
+
+		status = acpi_os_write_port((acpi_io_address)reg->address,
+					    (u32)val, width);
+		if (ACPI_FAILURE(status)) {
+			pr_debug("Error: Failed to write SystemIO port %llx\n",
+				 reg->address);
+			return -EFAULT;
+		}
+
+		return 0;
+	} else if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM && pcc_ss_id >= 0)
 		vaddr = GET_PCC_VADDR(reg->address, pcc_ss_id);
 	else if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY)
 		vaddr = reg_res->sys_mem_vaddr;
···
 	return ret;
 }
 EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs);
+
+/**
+ * cppc_set_enable - Set to enable CPPC on the processor by writing the
+ * Continuous Performance Control package EnableRegister field.
+ * @cpu: CPU for which to enable CPPC register.
+ * @enable: 0 - disable, 1 - enable CPPC feature on the processor.
+ *
+ * Return: 0 for success, -ERRNO or -EIO otherwise.
+ */
+int cppc_set_enable(int cpu, bool enable)
+{
+	int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu);
+	struct cpc_register_resource *enable_reg;
+	struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
+	struct cppc_pcc_data *pcc_ss_data = NULL;
+	int ret = -EINVAL;
+
+	if (!cpc_desc) {
+		pr_debug("No CPC descriptor for CPU:%d\n", cpu);
+		return -EINVAL;
+	}
+
+	enable_reg = &cpc_desc->cpc_regs[ENABLE];
+
+	if (CPC_IN_PCC(enable_reg)) {
+
+		if (pcc_ss_id < 0)
+			return -EIO;
+
+		ret = cpc_write(cpu, enable_reg, enable);
+		if (ret)
+			return ret;
+
+		pcc_ss_data = pcc_data[pcc_ss_id];
+
+		down_write(&pcc_ss_data->pcc_lock);
+		/* after writing CPC, transfer the ownership of PCC to platform */
+		ret = send_pcc_cmd(pcc_ss_id, CMD_WRITE);
+		up_write(&pcc_ss_data->pcc_lock);
+		return ret;
+	}
+
+	return cpc_write(cpu, enable_reg, enable);
+}
+EXPORT_SYMBOL_GPL(cppc_set_enable);
 
 /**
  * cppc_set_perf - Set a CPU's performance controls.
+21 -5
drivers/acpi/sleep.c
···
 #ifdef CONFIG_HIBERNATION
 static unsigned long s4_hardware_signature;
 static struct acpi_table_facs *facs;
-static bool nosigcheck;
+static int sigcheck = -1; /* Default behaviour is just to warn */
 
-void __init acpi_no_s4_hw_signature(void)
+void __init acpi_check_s4_hw_signature(int check)
 {
-	nosigcheck = true;
+	sigcheck = check;
 }
 
 static int acpi_hibernation_begin(pm_message_t stage)
···
 	hibernation_set_ops(old_suspend_ordering ?
 			&acpi_hibernation_ops_old : &acpi_hibernation_ops);
 	sleep_states[ACPI_STATE_S4] = 1;
-	if (nosigcheck)
+	if (!sigcheck)
 		return;
 
 	acpi_get_table(ACPI_SIG_FACS, 1, (struct acpi_table_header **)&facs);
-	if (facs)
+	if (facs) {
+		/*
+		 * s4_hardware_signature is the local variable which is just
+		 * used to warn about mismatch after we're attempting to
+		 * resume (in violation of the ACPI specification.)
+		 */
 		s4_hardware_signature = facs->hardware_signature;
+
+		if (sigcheck > 0) {
+			/*
+			 * If we're actually obeying the ACPI specification
+			 * then the signature is written out as part of the
+			 * swsusp header, in order to allow the boot kernel
+			 * to gracefully decline to resume.
+			 */
+			swsusp_hardware_signature = facs->hardware_signature;
+		}
+	}
 }
 #else /* !CONFIG_HIBERNATION */
 static inline void acpi_sleep_hibernate_setup(void) {}
+38 -4
drivers/base/arch_topology.c
···
 static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
 static struct cpumask scale_freq_counters_mask;
 static bool scale_freq_invariant;
+static DEFINE_PER_CPU(u32, freq_factor) = 1;
 
 static bool supports_scale_freq_counters(const struct cpumask *cpus)
 {
···
 
 DEFINE_PER_CPU(unsigned long, thermal_pressure);
 
-void topology_set_thermal_pressure(const struct cpumask *cpus,
-				   unsigned long th_pressure)
+/**
+ * topology_update_thermal_pressure() - Update thermal pressure for CPUs
+ * @cpus        : The related CPUs for which capacity has been reduced
+ * @capped_freq : The maximum allowed frequency that CPUs can run at
+ *
+ * Update the value of thermal pressure for all @cpus in the mask. The
+ * cpumask should include all (online+offline) affected CPUs, to avoid
+ * operating on stale data when hot-plug is used for some CPUs. The
+ * @capped_freq reflects the currently allowed max CPUs frequency due to
+ * thermal capping. It might be also a boost frequency value, which is bigger
+ * than the internal 'freq_factor' max frequency. In such case the pressure
+ * value should simply be removed, since this is an indication that there is
+ * no thermal throttling. The @capped_freq must be provided in kHz.
+ */
+void topology_update_thermal_pressure(const struct cpumask *cpus,
+				      unsigned long capped_freq)
 {
+	unsigned long max_capacity, capacity, th_pressure;
+	u32 max_freq;
 	int cpu;
+
+	cpu = cpumask_first(cpus);
+	max_capacity = arch_scale_cpu_capacity(cpu);
+	max_freq = per_cpu(freq_factor, cpu);
+
+	/* Convert to MHz scale which is used in 'freq_factor' */
+	capped_freq /= 1000;
+
+	/*
+	 * Handle properly the boost frequencies, which should simply clean
+	 * the thermal pressure value.
+	 */
+	if (max_freq <= capped_freq)
+		capacity = max_capacity;
+	else
+		capacity = mult_frac(max_capacity, capped_freq, max_freq);
+
+	th_pressure = max_capacity - capacity;
 
 	for_each_cpu(cpu, cpus)
 		WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
 }
-EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
+EXPORT_SYMBOL_GPL(topology_update_thermal_pressure);
 
 static ssize_t cpu_capacity_show(struct device *dev,
 				 struct device_attribute *attr,
···
 	update_topology = 0;
 }
 
-static DEFINE_PER_CPU(u32, freq_factor) = 1;
 static u32 *raw_capacity;
 
 static int free_raw_capacity(void)
+1 -2
drivers/base/core.c
···
 	/* Ensure that all references to the link object have been dropped. */
 	device_link_synchronize_removal();
 
-	while (refcount_dec_not_one(&link->rpm_active))
-		pm_runtime_put(link->supplier);
+	pm_runtime_release_supplier(link, true);
 
 	put_device(link->consumer);
 	put_device(link->supplier);
+63 -35
drivers/base/power/runtime.c
···
 	return 0;
 }
 
+/**
+ * pm_runtime_release_supplier - Drop references to device link's supplier.
+ * @link: Target device link.
+ * @check_idle: Whether or not to check if the supplier device is idle.
+ *
+ * Drop all runtime PM references associated with @link to its supplier device
+ * and if @check_idle is set, check if that device is idle (and so it can be
+ * suspended).
+ */
+void pm_runtime_release_supplier(struct device_link *link, bool check_idle)
+{
+	struct device *supplier = link->supplier;
+
+	/*
+	 * The additional power.usage_count check is a safety net in case
+	 * the rpm_active refcount becomes saturated, in which case
+	 * refcount_dec_not_one() would return true forever, but it is not
+	 * strictly necessary.
+	 */
+	while (refcount_dec_not_one(&link->rpm_active) &&
+	       atomic_read(&supplier->power.usage_count) > 0)
+		pm_runtime_put_noidle(supplier);
+
+	if (check_idle)
+		pm_request_idle(supplier);
+}
+
 static void __rpm_put_suppliers(struct device *dev, bool try_to_suspend)
 {
 	struct device_link *link;
 
 	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
-				device_links_read_lock_held()) {
-
-		while (refcount_dec_not_one(&link->rpm_active))
-			pm_runtime_put_noidle(link->supplier);
-
-		if (try_to_suspend)
-			pm_request_idle(link->supplier);
-	}
+				device_links_read_lock_held())
+		pm_runtime_release_supplier(link, try_to_suspend);
 }
 
 static void rpm_put_suppliers(struct device *dev)
···
 	trace_rpm_resume_rcuidle(dev, rpmflags);
 
  repeat:
-	if (dev->power.runtime_error)
+	if (dev->power.runtime_error) {
 		retval = -EINVAL;
-	else if (dev->power.disable_depth == 1 && dev->power.is_suspended
-	    && dev->power.runtime_status == RPM_ACTIVE)
-		retval = 1;
-	else if (dev->power.disable_depth > 0)
-		retval = -EACCES;
+	} else if (dev->power.disable_depth > 0) {
+		if (dev->power.runtime_status == RPM_ACTIVE &&
+		    dev->power.last_status == RPM_ACTIVE)
+			retval = 1;
+		else
+			retval = -EACCES;
+	}
 	if (retval)
 		goto out;
···
 	/* Update time accounting before disabling PM-runtime. */
 	update_pm_runtime_accounting(dev);
 
-	if (!dev->power.disable_depth++)
+	if (!dev->power.disable_depth++) {
 		__pm_runtime_barrier(dev);
+		dev->power.last_status = dev->power.runtime_status;
+	}
 
  out:
 	spin_unlock_irq(&dev->power.lock);
···
 
 	spin_lock_irqsave(&dev->power.lock, flags);
 
-	if (dev->power.disable_depth > 0) {
-		dev->power.disable_depth--;
-
-		/* About to enable runtime pm, set accounting_timestamp to now */
-		if (!dev->power.disable_depth)
-			dev->power.accounting_timestamp = ktime_get_mono_fast_ns();
-	} else {
+	if (!dev->power.disable_depth) {
 		dev_warn(dev, "Unbalanced %s!\n", __func__);
+		goto out;
 	}
 
-	WARN(!dev->power.disable_depth &&
-	     dev->power.runtime_status == RPM_SUSPENDED &&
-	     !dev->power.ignore_children &&
-	     atomic_read(&dev->power.child_count) > 0,
-	     "Enabling runtime PM for inactive device (%s) with active children\n",
-	     dev_name(dev));
+	if (--dev->power.disable_depth > 0)
+		goto out;
 
+	dev->power.last_status = RPM_INVALID;
+	dev->power.accounting_timestamp = ktime_get_mono_fast_ns();
+
+	if (dev->power.runtime_status == RPM_SUSPENDED &&
+	    !dev->power.ignore_children &&
+	    atomic_read(&dev->power.child_count) > 0)
+		dev_warn(dev, "Enabling runtime PM for inactive device with active children\n");
+
+ out:
 	spin_unlock_irqrestore(&dev->power.lock, flags);
 }
 EXPORT_SYMBOL_GPL(pm_runtime_enable);
···
 void pm_runtime_init(struct device *dev)
 {
 	dev->power.runtime_status = RPM_SUSPENDED;
+	dev->power.last_status = RPM_INVALID;
 	dev->power.idle_notification = false;
 
 	dev->power.disable_depth = 1;
···
 void pm_runtime_put_suppliers(struct device *dev)
 {
 	struct device_link *link;
-	unsigned long flags;
-	bool put;
 	int idx;
 
 	idx = device_links_read_lock();
···
 	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
 				device_links_read_lock_held())
 		if (link->supplier_preactivated) {
+			bool put;
+
 			link->supplier_preactivated = false;
-			spin_lock_irqsave(&dev->power.lock, flags);
+
+			spin_lock_irq(&dev->power.lock);
+
 			put = pm_runtime_status_suspended(dev) &&
 			      refcount_dec_not_one(&link->rpm_active);
-			spin_unlock_irqrestore(&dev->power.lock, flags);
+
+			spin_unlock_irq(&dev->power.lock);
+
 			if (put)
 				pm_runtime_put(link->supplier);
 		}
···
 		return;
 
 	pm_runtime_drop_link_count(link->consumer);
-
-	while (refcount_dec_not_one(&link->rpm_active))
-		pm_runtime_put(link->supplier);
+	pm_runtime_release_supplier(link, true);
 }
 
 static bool pm_runtime_need_not_resume(struct device *dev)
+17
drivers/cpufreq/Kconfig.x86
···
 
 	  If in doubt, say N.
 
+config X86_AMD_PSTATE
+	tristate "AMD Processor P-State driver"
+	depends on X86 && ACPI
+	select ACPI_PROCESSOR
+	select ACPI_CPPC_LIB if X86_64
+	select CPU_FREQ_GOV_SCHEDUTIL if SMP
+	help
+	  This driver adds a CPUFreq driver which utilizes a fine grain
+	  processor performance frequency control range instead of legacy
+	  performance levels. _CPC needs to be present in the ACPI tables
+	  of the system.
+
+	  For details, take a look at:
+	  <file:Documentation/admin-guide/pm/amd-pstate.rst>.
+
+	  If in doubt, say N.
+
 config X86_ACPI_CPUFREQ
 	tristate "ACPI Processor P-States driver"
 	depends on ACPI_PROCESSOR
+5
drivers/cpufreq/Makefile
···
 obj-$(CONFIG_CPUFREQ_DT)		+= cpufreq-dt.o
 obj-$(CONFIG_CPUFREQ_DT_PLATDEV)	+= cpufreq-dt-platdev.o
 
+# Traces
+CFLAGS_amd-pstate-trace.o		:= -I$(src)
+amd_pstate-y				:= amd-pstate.o amd-pstate-trace.o
+
 ##################################################################################
 # x86 drivers.
 # Link order matters. K8 is preferred to ACPI because of firmware bugs in early
···
 # speedstep-* is preferred over p4-clockmod.
 
 obj-$(CONFIG_X86_ACPI_CPUFREQ)		+= acpi-cpufreq.o
+obj-$(CONFIG_X86_AMD_PSTATE)		+= amd_pstate.o
 obj-$(CONFIG_X86_POWERNOW_K8)		+= powernow-k8.o
 obj-$(CONFIG_X86_PCC_CPUFREQ)		+= pcc-cpufreq.o
 obj-$(CONFIG_X86_POWERNOW_K6)		+= powernow-k6.o
+2
drivers/cpufreq/amd-pstate-trace.c
···
+#define CREATE_TRACE_POINTS
+#include "amd-pstate-trace.h"
+77
drivers/cpufreq/amd-pstate-trace.h
···
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * amd-pstate-trace.h - AMD Processor P-state Frequency Driver Tracer
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
+ *
+ * Author: Huang Rui <ray.huang@amd.com>
+ */
+
+#if !defined(_AMD_PSTATE_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _AMD_PSTATE_TRACE_H
+
+#include <linux/cpufreq.h>
+#include <linux/tracepoint.h>
+#include <linux/trace_events.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM amd_cpu
+
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE amd-pstate-trace
+
+#define TPS(x) tracepoint_string(x)
+
+TRACE_EVENT(amd_pstate_perf,
+
+	TP_PROTO(unsigned long min_perf,
+		 unsigned long target_perf,
+		 unsigned long capacity,
+		 unsigned int cpu_id,
+		 bool changed,
+		 bool fast_switch
+		 ),
+
+	TP_ARGS(min_perf,
+		target_perf,
+		capacity,
+		cpu_id,
+		changed,
+		fast_switch
+		),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, min_perf)
+		__field(unsigned long, target_perf)
+		__field(unsigned long, capacity)
+		__field(unsigned int, cpu_id)
+		__field(bool, changed)
+		__field(bool, fast_switch)
+		),
+
+	TP_fast_assign(
+		__entry->min_perf = min_perf;
+		__entry->target_perf = target_perf;
+		__entry->capacity = capacity;
+		__entry->cpu_id = cpu_id;
+		__entry->changed = changed;
+		__entry->fast_switch = fast_switch;
+		),
+
+	TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu cpu_id=%u changed=%s fast_switch=%s",
+		  (unsigned long)__entry->min_perf,
+		  (unsigned long)__entry->target_perf,
+		  (unsigned long)__entry->capacity,
+		  (unsigned int)__entry->cpu_id,
+		  (__entry->changed) ? "true" : "false",
+		  (__entry->fast_switch) ? "true" : "false"
+		  )
+);
+
+#endif /* _AMD_PSTATE_TRACE_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+
+#include <trace/define_trace.h>
+645
drivers/cpufreq/amd-pstate.c
···
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * amd-pstate.c - AMD Processor P-state Frequency Driver
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved.
+ *
+ * Author: Huang Rui <ray.huang@amd.com>
+ *
+ * AMD P-State introduces a new CPU performance scaling design for AMD
+ * processors using the ACPI Collaborative Performance and Power Control (CPPC)
+ * feature which works with the AMD SMU firmware providing a finer grained
+ * frequency control range. It is to replace the legacy ACPI P-States control,
+ * allows a flexible, low-latency interface for the Linux kernel to directly
+ * communicate the performance hints to hardware.
+ *
+ * AMD P-State is supported on recent AMD Zen base CPU series include some of
+ * Zen2 and Zen3 processors. _CPC needs to be present in the ACPI tables of AMD
+ * P-State supported system. And there are two types of hardware implementations
+ * for AMD P-State: 1) Full MSR Solution and 2) Shared Memory Solution.
+ * X86_FEATURE_CPPC CPU feature flag is used to distinguish the different types.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/smp.h>
+#include <linux/sched.h>
+#include <linux/cpufreq.h>
+#include <linux/compiler.h>
+#include <linux/dmi.h>
+#include <linux/slab.h>
+#include <linux/acpi.h>
+#include <linux/io.h>
+#include <linux/delay.h>
+#include <linux/uaccess.h>
+#include <linux/static_call.h>
+
+#include <acpi/processor.h>
+#include <acpi/cppc_acpi.h>
+
+#include <asm/msr.h>
+#include <asm/processor.h>
+#include <asm/cpufeature.h>
+#include <asm/cpu_device_id.h>
+#include "amd-pstate-trace.h"
+
+#define AMD_PSTATE_TRANSITION_LATENCY	0x20000
+#define AMD_PSTATE_TRANSITION_DELAY	500
+
+/*
+ * TODO: We need more time to fine tune processors with shared memory solution
+ * with community together.
+ *
+ * There are some performance drops on the CPU benchmarks which reports from
+ * Suse. We are co-working with them to fine tune the shared memory solution. So
+ * we disable it by default to go acpi-cpufreq on these processors and add a
+ * module parameter to be able to enable it manually for debugging.
+ */
+static bool shared_mem = false;
+module_param(shared_mem, bool, 0444);
+MODULE_PARM_DESC(shared_mem,
+		 "enable amd-pstate on processors with shared memory solution (false = disabled (default), true = enabled)");
+
+static struct cpufreq_driver amd_pstate_driver;
+
+/**
+ * struct amd_cpudata - private CPU data for AMD P-State
+ * @cpu: CPU number
+ * @req: constraint request to apply
+ * @cppc_req_cached: cached performance request hints
+ * @highest_perf: the maximum performance an individual processor may reach,
+ *		  assuming ideal conditions
+ * @nominal_perf: the maximum sustained performance level of the processor,
+ *		  assuming ideal operating conditions
+ * @lowest_nonlinear_perf: the lowest performance level at which nonlinear power
+ *			   savings are achieved
+ * @lowest_perf: the absolute lowest performance level of the processor
+ * @max_freq: the frequency that mapped to highest_perf
+ * @min_freq: the frequency that mapped to lowest_perf
+ * @nominal_freq: the frequency that mapped to nominal_perf
+ * @lowest_nonlinear_freq: the frequency that mapped to lowest_nonlinear_perf
+ * @boost_supported: check whether the Processor or SBIOS supports boost mode
+ *
+ * The amd_cpudata is key private data for each CPU thread in AMD P-State, and
+ * represents all the attributes and goals that AMD P-State requests at runtime.
+ */
+struct amd_cpudata {
+	int	cpu;
+
+	struct	freq_qos_request req[2];
+	u64	cppc_req_cached;
+
+	u32	highest_perf;
+	u32	nominal_perf;
+	u32	lowest_nonlinear_perf;
+	u32	lowest_perf;
+
+	u32	max_freq;
+	u32	min_freq;
+	u32	nominal_freq;
+	u32	lowest_nonlinear_freq;
+
+	bool	boost_supported;
+};
+
+static inline int pstate_enable(bool enable)
+{
+	return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable);
+}
+
+static int cppc_enable(bool enable)
+{
+	int cpu, ret = 0;
+
+	for_each_present_cpu(cpu) {
+		ret = cppc_set_enable(cpu, enable);
+		if (ret)
+			return ret;
+	}
+
+	return ret;
+}
+
+DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);
+
+static inline int amd_pstate_enable(bool enable)
+{
+	return static_call(amd_pstate_enable)(enable);
+}
+
+static int pstate_init_perf(struct amd_cpudata *cpudata)
+{
+	u64 cap1;
+
+	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
+				     &cap1);
+	if (ret)
+		return ret;
+
+	/*
+	 * TODO: Introduce AMD specific power feature.
+	 *
+	 * CPPC entry doesn't indicate the highest performance in some ASICs.
+	 */
+	WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
+
+	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
+	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
+	WRITE_ONCE(cpudata->lowest_perf, AMD_CPPC_LOWEST_PERF(cap1));
+
+	return 0;
+}
+
+static int cppc_init_perf(struct amd_cpudata *cpudata)
+{
+	struct cppc_perf_caps cppc_perf;
+
+	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+	if (ret)
+		return ret;
+
+	WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf());
+
+	WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
+	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
+		   cppc_perf.lowest_nonlinear_perf);
+	WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
+
+	return 0;
+}
+
+DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
+
+static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
+{
+	return static_call(amd_pstate_init_perf)(cpudata);
+}
+
+static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
+			       u32 des_perf, u32 max_perf, bool fast_switch)
+{
+	if (fast_switch)
+		wrmsrl(MSR_AMD_CPPC_REQ, READ_ONCE(cpudata->cppc_req_cached));
+	else
+		wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
+			      READ_ONCE(cpudata->cppc_req_cached));
+}
+
+static void cppc_update_perf(struct amd_cpudata *cpudata,
+			     u32 min_perf, u32 des_perf,
+			     u32 max_perf, bool fast_switch)
+{
+	struct cppc_perf_ctrls perf_ctrls;
+
+	perf_ctrls.max_perf = max_perf;
+	perf_ctrls.min_perf = min_perf;
+	perf_ctrls.desired_perf = des_perf;
+
+	cppc_set_perf(cpudata->cpu, &perf_ctrls);
+}
+
+DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
+
+static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
+					  u32 min_perf, u32 des_perf,
+					  u32 max_perf, bool fast_switch)
+{
+	static_call(amd_pstate_update_perf)(cpudata, min_perf, des_perf,
+					    max_perf, fast_switch);
+}
+
+static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
+			      u32 des_perf, u32 max_perf, bool fast_switch)
+{
+	u64 prev = READ_ONCE(cpudata->cppc_req_cached);
+	u64 value = prev;
+
+	value &= ~AMD_CPPC_MIN_PERF(~0L);
+	value |= AMD_CPPC_MIN_PERF(min_perf);
+
+	value &= ~AMD_CPPC_DES_PERF(~0L);
+	value |= AMD_CPPC_DES_PERF(des_perf);
+
+	value &= ~AMD_CPPC_MAX_PERF(~0L);
+	value |= AMD_CPPC_MAX_PERF(max_perf);
+
+	trace_amd_pstate_perf(min_perf, des_perf, max_perf,
+			      cpudata->cpu, (value != prev), fast_switch);
+
+	if (value == prev)
+		return;
+
+	WRITE_ONCE(cpudata->cppc_req_cached, value);
+
+	amd_pstate_update_perf(cpudata, min_perf, des_perf,
+			       max_perf, fast_switch);
+}
+
+static int amd_pstate_verify(struct cpufreq_policy_data *policy)
+{
+	cpufreq_verify_within_cpu_limits(policy);
+
+	return 0;
+}
+
+static int amd_pstate_target(struct cpufreq_policy *policy,
+			     unsigned int target_freq,
+			     unsigned int relation)
+{
+	struct cpufreq_freqs freqs;
+	struct amd_cpudata *cpudata = policy->driver_data;
+	unsigned long max_perf, min_perf, des_perf, cap_perf;
+
+	if (!cpudata->max_freq)
+		return -ENODEV;
+
+	cap_perf = READ_ONCE(cpudata->highest_perf);
+	min_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+	max_perf = cap_perf;
+
+	freqs.old = policy->cur;
+	freqs.new = target_freq;
+
+	des_perf = DIV_ROUND_CLOSEST(target_freq * cap_perf,
+				     cpudata->max_freq);
+
+	cpufreq_freq_transition_begin(policy, &freqs);
+	amd_pstate_update(cpudata, min_perf, des_perf,
+			  max_perf, false);
+	cpufreq_freq_transition_end(policy, &freqs, false);
+
+	return 0;
+}
+
+static void amd_pstate_adjust_perf(unsigned int cpu,
+				   unsigned long _min_perf,
+				   unsigned long target_perf,
+				   unsigned long capacity)
+{
+	unsigned long max_perf, min_perf, des_perf,
+		      cap_perf, lowest_nonlinear_perf;
+	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+	struct amd_cpudata *cpudata = policy->driver_data;
+
+	cap_perf = READ_ONCE(cpudata->highest_perf);
+	lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+
+	des_perf = cap_perf;
+	if (target_perf < capacity)
+		des_perf = DIV_ROUND_UP(cap_perf * target_perf, capacity);
+
+	min_perf = READ_ONCE(cpudata->highest_perf);
+	if (_min_perf < capacity)
+		min_perf = DIV_ROUND_UP(cap_perf * _min_perf, capacity);
+
+	if (min_perf < lowest_nonlinear_perf)
+		min_perf = lowest_nonlinear_perf;
+
+	max_perf = cap_perf;
+	if (max_perf < min_perf)
+		max_perf = min_perf;
+
+	des_perf = clamp_t(unsigned long, des_perf, min_perf, max_perf);
+
+	amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true);
+}
+
+static int amd_get_min_freq(struct amd_cpudata *cpudata)
+{
+	struct cppc_perf_caps cppc_perf;
+
+	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+	if (ret)
+		return ret;
+
+	/* Switch to khz */
+	return cppc_perf.lowest_freq * 1000;
+}
+
+static int amd_get_max_freq(struct amd_cpudata *cpudata)
+{
+	struct cppc_perf_caps cppc_perf;
+	u32 max_perf, max_freq, nominal_freq, nominal_perf;
+	u64 boost_ratio;
+
+	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+	if (ret)
+		return ret;
+
+	nominal_freq = cppc_perf.nominal_freq;
+	nominal_perf = READ_ONCE(cpudata->nominal_perf);
+	max_perf = READ_ONCE(cpudata->highest_perf);
+
+	boost_ratio = div_u64(max_perf << SCHED_CAPACITY_SHIFT,
+			      nominal_perf);
+
+	max_freq = nominal_freq * boost_ratio >> SCHED_CAPACITY_SHIFT;
+
+	/* Switch to khz */
+	return max_freq * 1000;
+}
+
+static int amd_get_nominal_freq(struct amd_cpudata *cpudata)
+{
+	struct cppc_perf_caps cppc_perf;
+
+	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+	if (ret)
+		return ret;
+
+	/* Switch to khz */
+	return cppc_perf.nominal_freq * 1000;
+}
+
+static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata)
+{
+	struct cppc_perf_caps cppc_perf;
+	u32 lowest_nonlinear_freq, lowest_nonlinear_perf,
+	    nominal_freq, nominal_perf;
+	u64 lowest_nonlinear_ratio;
+
+	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
+	if (ret)
+		return ret;
+
+	nominal_freq = cppc_perf.nominal_freq;
+	nominal_perf = READ_ONCE(cpudata->nominal_perf);
+
+	lowest_nonlinear_perf = cppc_perf.lowest_nonlinear_perf;
+
+	lowest_nonlinear_ratio = div_u64(lowest_nonlinear_perf << SCHED_CAPACITY_SHIFT,
+					 nominal_perf);
+
+	lowest_nonlinear_freq = nominal_freq * lowest_nonlinear_ratio >> SCHED_CAPACITY_SHIFT;
+
+	/* Switch to khz */
+	return lowest_nonlinear_freq * 1000;
+}
+
+static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state)
+{
+	struct amd_cpudata *cpudata = policy->driver_data;
+	int ret;
+
+	if (!cpudata->boost_supported) {
+		pr_err("Boost mode is not supported by this processor or SBIOS\n");
+		return -EINVAL;
+	}
+
+	if (state)
+		policy->cpuinfo.max_freq = cpudata->max_freq;
+	else
+		policy->cpuinfo.max_freq = cpudata->nominal_freq;
+
+	policy->max = policy->cpuinfo.max_freq;
+
+	ret = freq_qos_update_request(&cpudata->req[1],
+				      policy->cpuinfo.max_freq);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
+{
+	u32 highest_perf, nominal_perf;
+
+	highest_perf = READ_ONCE(cpudata->highest_perf);
+	nominal_perf = READ_ONCE(cpudata->nominal_perf);
+
+	if (highest_perf <= nominal_perf)
+		return;
+
+	cpudata->boost_supported = true;
+	amd_pstate_driver.boost_enabled = true;
+}
+
+static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
+{
+	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
+	struct device *dev;
+	struct amd_cpudata *cpudata;
+
+	dev = get_cpu_device(policy->cpu);
+	if (!dev)
+		return -ENODEV;
+
+	cpudata = kzalloc(sizeof(*cpudata), GFP_KERNEL);
+	if (!cpudata)
+		return -ENOMEM;
+
+	cpudata->cpu = policy->cpu;
+
+	ret = amd_pstate_init_perf(cpudata);
+	if (ret)
+		goto free_cpudata1;
+
+	min_freq = amd_get_min_freq(cpudata);
+	max_freq = amd_get_max_freq(cpudata);
+	nominal_freq = amd_get_nominal_freq(cpudata);
+	lowest_nonlinear_freq = amd_get_lowest_nonlinear_freq(cpudata);
+
+	if (min_freq < 0 || max_freq < 0 || min_freq > max_freq) {
+		dev_err(dev, "min_freq(%d) or max_freq(%d) value is incorrect\n",
+			min_freq, max_freq);
+		ret = -EINVAL;
+		goto free_cpudata1;
+	}
+
+	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
+	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
+
+	policy->min = min_freq;
+	policy->max = max_freq;
+
+	policy->cpuinfo.min_freq = min_freq;
+	policy->cpuinfo.max_freq = max_freq;
+
+	/* It will be updated by governor */
+	policy->cur = policy->cpuinfo.min_freq;
+
+	if (boot_cpu_has(X86_FEATURE_CPPC))
+		policy->fast_switch_possible = true;
+
+	ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
+				   FREQ_QOS_MIN, policy->cpuinfo.min_freq);
+	if (ret < 0) {
dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret); 472 + goto free_cpudata1; 473 + } 474 + 475 + ret = freq_qos_add_request(&policy->constraints, &cpudata->req[1], 476 + FREQ_QOS_MAX, policy->cpuinfo.max_freq); 477 + if (ret < 0) { 478 + dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret); 479 + goto free_cpudata2; 480 + } 481 + 482 + /* Initial processor data capability frequencies */ 483 + cpudata->max_freq = max_freq; 484 + cpudata->min_freq = min_freq; 485 + cpudata->nominal_freq = nominal_freq; 486 + cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq; 487 + 488 + policy->driver_data = cpudata; 489 + 490 + amd_pstate_boost_init(cpudata); 491 + 492 + return 0; 493 + 494 + free_cpudata2: 495 + freq_qos_remove_request(&cpudata->req[0]); 496 + free_cpudata1: 497 + kfree(cpudata); 498 + return ret; 499 + } 500 + 501 + static int amd_pstate_cpu_exit(struct cpufreq_policy *policy) 502 + { 503 + struct amd_cpudata *cpudata; 504 + 505 + cpudata = policy->driver_data; 506 + 507 + freq_qos_remove_request(&cpudata->req[1]); 508 + freq_qos_remove_request(&cpudata->req[0]); 509 + kfree(cpudata); 510 + 511 + return 0; 512 + } 513 + 514 + /* Sysfs attributes */ 515 + 516 + /* 517 + * This frequency is to indicate the maximum hardware frequency. 518 + * If boost is not active but supported, the frequency will be larger than the 519 + * one in cpuinfo. 
520 + */ 521 + static ssize_t show_amd_pstate_max_freq(struct cpufreq_policy *policy, 522 + char *buf) 523 + { 524 + int max_freq; 525 + struct amd_cpudata *cpudata; 526 + 527 + cpudata = policy->driver_data; 528 + 529 + max_freq = amd_get_max_freq(cpudata); 530 + if (max_freq < 0) 531 + return max_freq; 532 + 533 + return sprintf(&buf[0], "%u\n", max_freq); 534 + } 535 + 536 + static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *policy, 537 + char *buf) 538 + { 539 + int freq; 540 + struct amd_cpudata *cpudata; 541 + 542 + cpudata = policy->driver_data; 543 + 544 + freq = amd_get_lowest_nonlinear_freq(cpudata); 545 + if (freq < 0) 546 + return freq; 547 + 548 + return sprintf(&buf[0], "%u\n", freq); 549 + } 550 + 551 + /* 552 + * In some of ASICs, the highest_perf is not the one in the _CPC table, so we 553 + * need to expose it to sysfs. 554 + */ 555 + static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy, 556 + char *buf) 557 + { 558 + u32 perf; 559 + struct amd_cpudata *cpudata = policy->driver_data; 560 + 561 + perf = READ_ONCE(cpudata->highest_perf); 562 + 563 + return sprintf(&buf[0], "%u\n", perf); 564 + } 565 + 566 + cpufreq_freq_attr_ro(amd_pstate_max_freq); 567 + cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq); 568 + 569 + cpufreq_freq_attr_ro(amd_pstate_highest_perf); 570 + 571 + static struct freq_attr *amd_pstate_attr[] = { 572 + &amd_pstate_max_freq, 573 + &amd_pstate_lowest_nonlinear_freq, 574 + &amd_pstate_highest_perf, 575 + NULL, 576 + }; 577 + 578 + static struct cpufreq_driver amd_pstate_driver = { 579 + .flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS, 580 + .verify = amd_pstate_verify, 581 + .target = amd_pstate_target, 582 + .init = amd_pstate_cpu_init, 583 + .exit = amd_pstate_cpu_exit, 584 + .set_boost = amd_pstate_set_boost, 585 + .name = "amd-pstate", 586 + .attr = amd_pstate_attr, 587 + }; 588 + 589 + static int __init amd_pstate_init(void) 590 + { 591 + int ret; 592 + 593 + if 
(boot_cpu_data.x86_vendor != X86_VENDOR_AMD) 594 + return -ENODEV; 595 + 596 + if (!acpi_cpc_valid()) { 597 + pr_debug("the _CPC object is not present in SBIOS\n"); 598 + return -ENODEV; 599 + } 600 + 601 + /* don't keep reloading if cpufreq_driver exists */ 602 + if (cpufreq_get_current_driver()) 603 + return -EEXIST; 604 + 605 + /* capability check */ 606 + if (boot_cpu_has(X86_FEATURE_CPPC)) { 607 + pr_debug("AMD CPPC MSR based functionality is supported\n"); 608 + amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf; 609 + } else if (shared_mem) { 610 + static_call_update(amd_pstate_enable, cppc_enable); 611 + static_call_update(amd_pstate_init_perf, cppc_init_perf); 612 + static_call_update(amd_pstate_update_perf, cppc_update_perf); 613 + } else { 614 + pr_info("This processor supports shared memory solution, you can enable it with amd_pstate.shared_mem=1\n"); 615 + return -ENODEV; 616 + } 617 + 618 + /* enable amd pstate feature */ 619 + ret = amd_pstate_enable(true); 620 + if (ret) { 621 + pr_err("failed to enable amd-pstate with return %d\n", ret); 622 + return ret; 623 + } 624 + 625 + ret = cpufreq_register_driver(&amd_pstate_driver); 626 + if (ret) 627 + pr_err("failed to register amd_pstate_driver with return %d\n", 628 + ret); 629 + 630 + return ret; 631 + } 632 + 633 + static void __exit amd_pstate_exit(void) 634 + { 635 + cpufreq_unregister_driver(&amd_pstate_driver); 636 + 637 + amd_pstate_enable(false); 638 + } 639 + 640 + module_init(amd_pstate_init); 641 + module_exit(amd_pstate_exit); 642 + 643 + MODULE_AUTHOR("Huang Rui <ray.huang@amd.com>"); 644 + MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver"); 645 + MODULE_LICENSE("GPL");
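The core of the new driver's `->target()` path is a linear mapping from a requested frequency to a CPPC abstract performance level: `des_perf = DIV_ROUND_CLOSEST(target_freq * highest_perf, max_freq)`. Below is a minimal userspace sketch of that mapping; `DIV_ROUND_CLOSEST` is reimplemented locally, and the `highest_perf = 255` / 3.6 GHz values in the comments are hypothetical, not taken from any real part.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel-style round-to-nearest divide, reimplemented for this sketch. */
#define DIV_ROUND_CLOSEST(x, d) (((x) + (d) / 2) / (d))

/*
 * Mirrors the amd_pstate_target() computation: scale the requested kHz
 * value into the abstract CPPC performance range [0, highest_perf].
 */
static uint32_t freq_to_des_perf(uint64_t target_khz, uint64_t max_khz,
				 uint64_t highest_perf)
{
	return (uint32_t)DIV_ROUND_CLOSEST(target_khz * highest_perf, max_khz);
}
```

Note that `amd_pstate_update()` above only issues the (potentially slow) register write when the newly packed min/des/max value actually differs from the cached `cppc_req_cached` copy, so repeated requests for the same target are free.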
+5 -4
drivers/cpufreq/cpufreq.c
··· 924 924 cpufreq_freq_attr_rw(scaling_governor); 925 925 cpufreq_freq_attr_rw(scaling_setspeed); 926 926 927 - static struct attribute *default_attrs[] = { 927 + static struct attribute *cpufreq_attrs[] = { 928 928 &cpuinfo_min_freq.attr, 929 929 &cpuinfo_max_freq.attr, 930 930 &cpuinfo_transition_latency.attr, ··· 938 938 &scaling_setspeed.attr, 939 939 NULL 940 940 }; 941 + ATTRIBUTE_GROUPS(cpufreq); 941 942 942 943 #define to_policy(k) container_of(k, struct cpufreq_policy, kobj) 943 944 #define to_attr(a) container_of(a, struct freq_attr, attr) ··· 1001 1000 1002 1001 static struct kobj_type ktype_cpufreq = { 1003 1002 .sysfs_ops = &sysfs_ops, 1004 - .default_attrs = default_attrs, 1003 + .default_groups = cpufreq_groups, 1005 1004 .release = cpufreq_sysfs_release, 1006 1005 }; 1007 1006 ··· 1404 1403 1405 1404 ret = freq_qos_add_request(&policy->constraints, 1406 1405 policy->min_freq_req, FREQ_QOS_MIN, 1407 - policy->min); 1406 + FREQ_QOS_MIN_DEFAULT_VALUE); 1408 1407 if (ret < 0) { 1409 1408 /* 1410 1409 * So we don't call freq_qos_remove_request() for an ··· 1424 1423 1425 1424 ret = freq_qos_add_request(&policy->constraints, 1426 1425 policy->max_freq_req, FREQ_QOS_MAX, 1427 - policy->max); 1426 + FREQ_QOS_MAX_DEFAULT_VALUE); 1428 1427 if (ret < 0) { 1429 1428 policy->max_freq_req = NULL; 1430 1429 goto out_destroy_policy;
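The rename from `default_attrs` to `cpufreq_attrs` in this hunk is forced by `ATTRIBUTE_GROUPS()`: the macro derives `<name>_group` and `<name>_groups` identifiers from an array that must be called `<name>_attrs`. A simplified userspace model of that derivation (the real `<linux/sysfs.h>` structs carry more fields, e.g. `is_visible`, which are omitted here):

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel's sysfs structures. */
struct attribute { const char *name; };
struct attribute_group { const struct attribute *const *attrs; };

/* Simplified version of the kernel's ATTRIBUTE_GROUPS() helper. */
#define ATTRIBUTE_GROUPS(_name)						\
static const struct attribute_group _name##_group = {			\
	.attrs = _name##_attrs,						\
};									\
static const struct attribute_group *const _name##_groups[] = {		\
	&_name##_group,							\
	NULL,								\
}

static const struct attribute cpuinfo_min_freq = { "cpuinfo_min_freq" };

/* Must be named <name>_attrs for the macro to find it. */
static const struct attribute *const cpufreq_attrs[] = {
	&cpuinfo_min_freq,
	NULL,
};
ATTRIBUTE_GROUPS(cpufreq);	/* emits cpufreq_group and cpufreq_groups */
```

The NULL-terminated `cpufreq_groups` array is what `kobj_type.default_groups` expects, which is the whole point of this tree-wide migration away from `default_attrs`.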
+3 -2
drivers/cpufreq/cpufreq_conservative.c
··· 257 257 gov_attr_rw(down_threshold); 258 258 gov_attr_rw(freq_step); 259 259 260 - static struct attribute *cs_attributes[] = { 260 + static struct attribute *cs_attrs[] = { 261 261 &sampling_rate.attr, 262 262 &sampling_down_factor.attr, 263 263 &up_threshold.attr, ··· 266 266 &freq_step.attr, 267 267 NULL 268 268 }; 269 + ATTRIBUTE_GROUPS(cs); 269 270 270 271 /************************** sysfs end ************************/ 271 272 ··· 316 315 317 316 static struct dbs_governor cs_governor = { 318 317 .gov = CPUFREQ_DBS_GOVERNOR_INITIALIZER("conservative"), 319 - .kobj_type = { .default_attrs = cs_attributes }, 318 + .kobj_type = { .default_groups = cs_groups }, 320 319 .gov_dbs_update = cs_dbs_update, 321 320 .alloc = cs_alloc, 322 321 .free = cs_free,
+3 -2
drivers/cpufreq/cpufreq_ondemand.c
··· 328 328 gov_attr_rw(ignore_nice_load); 329 329 gov_attr_rw(powersave_bias); 330 330 331 - static struct attribute *od_attributes[] = { 331 + static struct attribute *od_attrs[] = { 332 332 &sampling_rate.attr, 333 333 &up_threshold.attr, 334 334 &sampling_down_factor.attr, ··· 337 337 &io_is_busy.attr, 338 338 NULL 339 339 }; 340 + ATTRIBUTE_GROUPS(od); 340 341 341 342 /************************** sysfs end ************************/ 342 343 ··· 402 401 403 402 static struct dbs_governor od_dbs_gov = { 404 403 .gov = CPUFREQ_DBS_GOVERNOR_INITIALIZER("ondemand"), 405 - .kobj_type = { .default_attrs = od_attributes }, 404 + .kobj_type = { .default_groups = od_groups }, 406 405 .gov_dbs_update = od_dbs_update, 407 406 .alloc = od_alloc, 408 407 .free = od_free,
+81 -40
drivers/cpufreq/intel_pstate.c
··· 664 664 * 3 balance_power 665 665 * 4 power 666 666 */ 667 + 668 + enum energy_perf_value_index { 669 + EPP_INDEX_DEFAULT = 0, 670 + EPP_INDEX_PERFORMANCE, 671 + EPP_INDEX_BALANCE_PERFORMANCE, 672 + EPP_INDEX_BALANCE_POWERSAVE, 673 + EPP_INDEX_POWERSAVE, 674 + }; 675 + 667 676 static const char * const energy_perf_strings[] = { 668 - "default", 669 - "performance", 670 - "balance_performance", 671 - "balance_power", 672 - "power", 677 + [EPP_INDEX_DEFAULT] = "default", 678 + [EPP_INDEX_PERFORMANCE] = "performance", 679 + [EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance", 680 + [EPP_INDEX_BALANCE_POWERSAVE] = "balance_power", 681 + [EPP_INDEX_POWERSAVE] = "power", 673 682 NULL 674 683 }; 675 - static const unsigned int epp_values[] = { 676 - HWP_EPP_PERFORMANCE, 677 - HWP_EPP_BALANCE_PERFORMANCE, 678 - HWP_EPP_BALANCE_POWERSAVE, 679 - HWP_EPP_POWERSAVE 684 + static unsigned int epp_values[] = { 685 + [EPP_INDEX_DEFAULT] = 0, /* Unused index */ 686 + [EPP_INDEX_PERFORMANCE] = HWP_EPP_PERFORMANCE, 687 + [EPP_INDEX_BALANCE_PERFORMANCE] = HWP_EPP_BALANCE_PERFORMANCE, 688 + [EPP_INDEX_BALANCE_POWERSAVE] = HWP_EPP_BALANCE_POWERSAVE, 689 + [EPP_INDEX_POWERSAVE] = HWP_EPP_POWERSAVE, 680 690 }; 681 691 682 692 static int intel_pstate_get_energy_pref_index(struct cpudata *cpu_data, int *raw_epp) ··· 700 690 return epp; 701 691 702 692 if (boot_cpu_has(X86_FEATURE_HWP_EPP)) { 703 - if (epp == HWP_EPP_PERFORMANCE) 704 - return 1; 705 - if (epp == HWP_EPP_BALANCE_PERFORMANCE) 706 - return 2; 707 - if (epp == HWP_EPP_BALANCE_POWERSAVE) 708 - return 3; 709 - if (epp == HWP_EPP_POWERSAVE) 710 - return 4; 693 + if (epp == epp_values[EPP_INDEX_PERFORMANCE]) 694 + return EPP_INDEX_PERFORMANCE; 695 + if (epp == epp_values[EPP_INDEX_BALANCE_PERFORMANCE]) 696 + return EPP_INDEX_BALANCE_PERFORMANCE; 697 + if (epp == epp_values[EPP_INDEX_BALANCE_POWERSAVE]) 698 + return EPP_INDEX_BALANCE_POWERSAVE; 699 + if (epp == epp_values[EPP_INDEX_POWERSAVE]) 700 + return 
EPP_INDEX_POWERSAVE; 711 701 *raw_epp = epp; 712 702 return 0; 713 703 } else if (boot_cpu_has(X86_FEATURE_EPB)) { ··· 767 757 if (use_raw) 768 758 epp = raw_epp; 769 759 else if (epp == -EINVAL) 770 - epp = epp_values[pref_index - 1]; 760 + epp = epp_values[pref_index]; 771 761 772 762 /* 773 763 * To avoid confusion, refuse to set EPP to any values different ··· 853 843 * upfront. 854 844 */ 855 845 if (!raw) 856 - epp = ret ? epp_values[ret - 1] : cpu->epp_default; 846 + epp = ret ? epp_values[ret] : cpu->epp_default; 857 847 858 848 if (cpu->epp_cached != epp) { 859 849 int err; ··· 1134 1124 cpufreq_update_policy(cpu); 1135 1125 } 1136 1126 1127 + static void __intel_pstate_update_max_freq(struct cpudata *cpudata, 1128 + struct cpufreq_policy *policy) 1129 + { 1130 + policy->cpuinfo.max_freq = global.turbo_disabled_mf ? 1131 + cpudata->pstate.max_freq : cpudata->pstate.turbo_freq; 1132 + refresh_frequency_limits(policy); 1133 + } 1134 + 1137 1135 static void intel_pstate_update_max_freq(unsigned int cpu) 1138 1136 { 1139 1137 struct cpufreq_policy *policy = cpufreq_cpu_acquire(cpu); 1140 - struct cpudata *cpudata; 1141 1138 1142 1139 if (!policy) 1143 1140 return; 1144 1141 1145 - cpudata = all_cpu_data[cpu]; 1146 - policy->cpuinfo.max_freq = global.turbo_disabled_mf ? 
1147 - cpudata->pstate.max_freq : cpudata->pstate.turbo_freq; 1148 - 1149 - refresh_frequency_limits(policy); 1142 + __intel_pstate_update_max_freq(all_cpu_data[cpu], policy); 1150 1143 1151 1144 cpufreq_cpu_release(policy); 1152 1145 } ··· 1597 1584 { 1598 1585 struct cpudata *cpudata = 1599 1586 container_of(to_delayed_work(work), struct cpudata, hwp_notify_work); 1587 + struct cpufreq_policy *policy = cpufreq_cpu_acquire(cpudata->cpu); 1600 1588 1601 - cpufreq_update_policy(cpudata->cpu); 1589 + if (policy) { 1590 + intel_pstate_get_hwp_cap(cpudata); 1591 + __intel_pstate_update_max_freq(cpudata, policy); 1592 + 1593 + cpufreq_cpu_release(policy); 1594 + } 1595 + 1602 1596 wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_STATUS, 0); 1603 1597 } 1604 1598 ··· 1699 1679 wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00); 1700 1680 1701 1681 wrmsrl_on_cpu(cpudata->cpu, MSR_PM_ENABLE, 0x1); 1702 - if (cpudata->epp_default == -EINVAL) 1703 - cpudata->epp_default = intel_pstate_get_epp(cpudata, 0); 1704 1682 1705 1683 intel_pstate_enable_hwp_interrupt(cpudata); 1684 + 1685 + if (cpudata->epp_default >= 0) 1686 + return; 1687 + 1688 + if (epp_values[EPP_INDEX_BALANCE_PERFORMANCE] == HWP_EPP_BALANCE_PERFORMANCE) { 1689 + cpudata->epp_default = intel_pstate_get_epp(cpudata, 0); 1690 + } else { 1691 + cpudata->epp_default = epp_values[EPP_INDEX_BALANCE_PERFORMANCE]; 1692 + intel_pstate_set_epp(cpudata, cpudata->epp_default); 1693 + } 1706 1694 } 1707 1695 1708 1696 static int atom_get_min_pstate(void) ··· 2514 2486 * HWP needs some special consideration, because HWP_REQUEST uses 2515 2487 * abstract values to represent performance rather than pure ratios. 
2516 2488 */ 2517 - if (hwp_active) { 2518 - intel_pstate_get_hwp_cap(cpu); 2489 + if (hwp_active && cpu->pstate.scaling != perf_ctl_scaling) { 2490 + int scaling = cpu->pstate.scaling; 2491 + int freq; 2519 2492 2520 - if (cpu->pstate.scaling != perf_ctl_scaling) { 2521 - int scaling = cpu->pstate.scaling; 2522 - int freq; 2523 - 2524 - freq = max_policy_perf * perf_ctl_scaling; 2525 - max_policy_perf = DIV_ROUND_UP(freq, scaling); 2526 - freq = min_policy_perf * perf_ctl_scaling; 2527 - min_policy_perf = DIV_ROUND_UP(freq, scaling); 2528 - } 2493 + freq = max_policy_perf * perf_ctl_scaling; 2494 + max_policy_perf = DIV_ROUND_UP(freq, scaling); 2495 + freq = min_policy_perf * perf_ctl_scaling; 2496 + min_policy_perf = DIV_ROUND_UP(freq, scaling); 2529 2497 } 2530 2498 2531 2499 pr_debug("cpu:%d min_policy_perf:%d max_policy_perf:%d\n", ··· 3373 3349 return !!(value & 0x1); 3374 3350 } 3375 3351 3352 + static const struct x86_cpu_id intel_epp_balance_perf[] = { 3353 + /* 3354 + * Set EPP value as 102, this is the max suggested EPP 3355 + * which can result in one core turbo frequency for 3356 + * AlderLake Mobile CPUs. 3357 + */ 3358 + X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, 102), 3359 + {} 3360 + }; 3361 + 3376 3362 static int __init intel_pstate_init(void) 3377 3363 { 3378 3364 static struct cpudata **_all_cpu_data; ··· 3471 3437 intel_pstate_request_control_from_smm(); 3472 3438 3473 3439 intel_pstate_sysfs_expose_params(); 3440 + 3441 + if (hwp_active) { 3442 + const struct x86_cpu_id *id = x86_match_cpu(intel_epp_balance_perf); 3443 + 3444 + if (id) 3445 + epp_values[EPP_INDEX_BALANCE_PERFORMANCE] = id->driver_data; 3446 + } 3474 3447 3475 3448 mutex_lock(&intel_pstate_driver_lock); 3476 3449 rc = intel_pstate_register_driver(default_driver);
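The EPP rework above replaces magic indices 1..4 with an enum that indexes both the string table and the value table, which is what lets the `epp_values[pref_index - 1]` off-by-one adjustments become plain `epp_values[pref_index]`. A standalone sketch of the resulting table shape (the EPP constants 0x00/0x80/0xC0/0xFF are the standard HWP defaults; the dummy slot at index 0 exists only to keep the two tables aligned):

```c
#include <assert.h>
#include <stddef.h>

enum energy_perf_value_index {
	EPP_INDEX_DEFAULT = 0,
	EPP_INDEX_PERFORMANCE,
	EPP_INDEX_BALANCE_PERFORMANCE,
	EPP_INDEX_BALANCE_POWERSAVE,
	EPP_INDEX_POWERSAVE,
};

static const char *const energy_perf_strings[] = {
	[EPP_INDEX_DEFAULT] = "default",
	[EPP_INDEX_PERFORMANCE] = "performance",
	[EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance",
	[EPP_INDEX_BALANCE_POWERSAVE] = "balance_power",
	[EPP_INDEX_POWERSAVE] = "power",
	NULL
};

/* Non-const so a model-specific quirk can patch one entry at init time. */
static unsigned int epp_values[] = {
	[EPP_INDEX_DEFAULT] = 0,	/* unused, keeps indices aligned */
	[EPP_INDEX_PERFORMANCE] = 0x00,
	[EPP_INDEX_BALANCE_PERFORMANCE] = 0x80,
	[EPP_INDEX_BALANCE_POWERSAVE] = 0xC0,
	[EPP_INDEX_POWERSAVE] = 0xFF,
};
```

Dropping `const` from `epp_values` is deliberate: the Alder Lake quirk in this same diff overwrites `epp_values[EPP_INDEX_BALANCE_PERFORMANCE]` with 102 at driver init, and the `epp_default` logic then detects that the table no longer holds the hardware default.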
+30 -3
drivers/cpufreq/mediatek-cpufreq-hw.c
··· 36 36 struct mtk_cpufreq_data { 37 37 struct cpufreq_frequency_table *table; 38 38 void __iomem *reg_bases[REG_ARRAY_SIZE]; 39 + struct resource *res; 40 + void __iomem *base; 39 41 int nr_opp; 40 42 }; 41 43 ··· 158 156 { 159 157 struct mtk_cpufreq_data *data; 160 158 struct device *dev = &pdev->dev; 159 + struct resource *res; 161 160 void __iomem *base; 162 161 int ret, i; 163 162 int index; ··· 173 170 if (index < 0) 174 171 return index; 175 172 176 - base = devm_platform_ioremap_resource(pdev, index); 177 - if (IS_ERR(base)) 178 - return PTR_ERR(base); 173 + res = platform_get_resource(pdev, IORESOURCE_MEM, index); 174 + if (!res) { 175 + dev_err(dev, "failed to get mem resource %d\n", index); 176 + return -ENODEV; 177 + } 178 + 179 + if (!request_mem_region(res->start, resource_size(res), res->name)) { 180 + dev_err(dev, "failed to request resource %pR\n", res); 181 + return -EBUSY; 182 + } 183 + 184 + base = ioremap(res->start, resource_size(res)); 185 + if (!base) { 186 + dev_err(dev, "failed to map resource %pR\n", res); 187 + ret = -ENOMEM; 188 + goto release_region; 189 + } 190 + 191 + data->base = base; 192 + data->res = res; 179 193 180 194 for (i = REG_FREQ_LUT_TABLE; i < REG_ARRAY_SIZE; i++) 181 195 data->reg_bases[i] = base + offsets[i]; ··· 207 187 policy->driver_data = data; 208 188 209 189 return 0; 190 + release_region: 191 + release_mem_region(res->start, resource_size(res)); 192 + return ret; 210 193 } 211 194 212 195 static int mtk_cpufreq_hw_cpu_init(struct cpufreq_policy *policy) ··· 256 233 static int mtk_cpufreq_hw_cpu_exit(struct cpufreq_policy *policy) 257 234 { 258 235 struct mtk_cpufreq_data *data = policy->driver_data; 236 + struct resource *res = data->res; 237 + void __iomem *base = data->base; 259 238 260 239 /* HW should be in paused state now */ 261 240 writel_relaxed(0x0, data->reg_bases[REG_FREQ_ENABLE]); 241 + iounmap(base); 242 + release_mem_region(res->start, resource_size(res)); 262 243 263 244 return 0; 264 245 }
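The Mediatek fix trades the `devm_`-managed mapping (released only at device detach, hence the double-remap across CPU init/exit cycles) for a manual `request_mem_region()`/`ioremap()` pair that `cpu_exit` can undo. The error path uses the kernel's usual goto-unwind idiom: release in reverse order of acquisition. A userspace sketch of that control flow with the resource calls stubbed out as counters (all names here are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Stub counters standing in for request_mem_region()/ioremap(). */
static int regions_held;
static int mappings_held;

static bool request_region_stub(bool ok) { if (ok) regions_held++; return ok; }
static void release_region_stub(void)   { regions_held--; }
static bool ioremap_stub(bool ok)       { if (ok) mappings_held++; return ok; }

/* Shape of the reworked probe path: acquire, map, unwind on failure. */
static int probe_stub(bool region_ok, bool map_ok)
{
	int ret;

	if (!request_region_stub(region_ok))
		return -16;			/* -EBUSY */

	if (!ioremap_stub(map_ok)) {
		ret = -12;			/* -ENOMEM */
		goto release_region;
	}

	return 0;

release_region:
	release_region_stub();			/* reverse of acquisition */
	return ret;
}
```

The matching `cpu_exit` path in the diff performs the same two releases (`iounmap()`, then `release_mem_region()`) unconditionally, which is why the probe path must never leave a half-acquired state behind.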
+19 -20
drivers/cpufreq/qcom-cpufreq-hw.c
··· 46 46 */ 47 47 struct mutex throttle_lock; 48 48 int throttle_irq; 49 + char irq_name[15]; 49 50 bool cancel_throttle; 50 51 struct delayed_work throttle_work; 51 52 struct cpufreq_policy *policy; ··· 276 275 277 276 static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data) 278 277 { 279 - unsigned long max_capacity, capacity, freq_hz, throttled_freq; 280 278 struct cpufreq_policy *policy = data->policy; 281 279 int cpu = cpumask_first(policy->cpus); 282 280 struct device *dev = get_cpu_device(cpu); 281 + unsigned long freq_hz, throttled_freq; 283 282 struct dev_pm_opp *opp; 284 283 unsigned int freq; 285 284 ··· 296 295 297 296 throttled_freq = freq_hz / HZ_PER_KHZ; 298 297 299 - /* Update thermal pressure */ 300 - 301 - max_capacity = arch_scale_cpu_capacity(cpu); 302 - capacity = mult_frac(max_capacity, throttled_freq, policy->cpuinfo.max_freq); 303 - 304 - /* Don't pass boost capacity to scheduler */ 305 - if (capacity > max_capacity) 306 - capacity = max_capacity; 307 - 308 - arch_set_thermal_pressure(policy->cpus, max_capacity - capacity); 298 + /* Update thermal pressure (the boost frequencies are accepted) */ 299 + arch_update_thermal_pressure(policy->related_cpus, throttled_freq); 309 300 310 301 /* 311 302 * In the unlikely case policy is unregistered do not enable ··· 335 342 336 343 /* Disable interrupt and enable polling */ 337 344 disable_irq_nosync(c_data->throttle_irq); 338 - qcom_lmh_dcvs_notify(c_data); 345 + schedule_delayed_work(&c_data->throttle_work, 0); 339 346 340 - return 0; 347 + return IRQ_HANDLED; 341 348 } 342 349 343 350 static const struct qcom_cpufreq_soc_data qcom_soc_data = { ··· 368 375 { 369 376 struct qcom_cpufreq_data *data = policy->driver_data; 370 377 struct platform_device *pdev = cpufreq_get_driver_data(); 371 - char irq_name[15]; 372 378 int ret; 373 379 374 380 /* 375 381 * Look for LMh interrupt. If no interrupt line is specified / 376 382 * if there is an error, allow cpufreq to be enabled as usual. 
377 383 */ 378 - data->throttle_irq = platform_get_irq(pdev, index); 379 - if (data->throttle_irq <= 0) 380 - return data->throttle_irq == -EPROBE_DEFER ? -EPROBE_DEFER : 0; 384 + data->throttle_irq = platform_get_irq_optional(pdev, index); 385 + if (data->throttle_irq == -ENXIO) 386 + return 0; 387 + if (data->throttle_irq < 0) 388 + return data->throttle_irq; 381 389 382 390 data->cancel_throttle = false; 383 391 data->policy = policy; ··· 386 392 mutex_init(&data->throttle_lock); 387 393 INIT_DEFERRABLE_WORK(&data->throttle_work, qcom_lmh_dcvs_poll); 388 394 389 - snprintf(irq_name, sizeof(irq_name), "dcvsh-irq-%u", policy->cpu); 395 + snprintf(data->irq_name, sizeof(data->irq_name), "dcvsh-irq-%u", policy->cpu); 390 396 ret = request_threaded_irq(data->throttle_irq, NULL, qcom_lmh_dcvs_handle_irq, 391 - IRQF_ONESHOT, irq_name, data); 397 + IRQF_ONESHOT, data->irq_name, data); 392 398 if (ret) { 393 - dev_err(&pdev->dev, "Error registering %s: %d\n", irq_name, ret); 399 + dev_err(&pdev->dev, "Error registering %s: %d\n", data->irq_name, ret); 394 400 return 0; 395 401 } 402 + 403 + ret = irq_set_affinity_hint(data->throttle_irq, policy->cpus); 404 + if (ret) 405 + dev_err(&pdev->dev, "Failed to set CPU affinity of %s[%d]\n", 406 + data->irq_name, data->throttle_irq); 396 407 397 408 return 0; 398 409 }
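The qcom hunk deletes open-coded thermal-pressure arithmetic in favor of the new `arch_update_thermal_pressure()` helper, which takes the throttled frequency directly and accepts boost frequencies. For reference, this is roughly what the removed computation did (`mult_frac()` simplified to plain 64-bit math; the capacity/frequency values in the tests are made up):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old-style thermal pressure: scale max capacity by the throttled/max
 * frequency ratio, clamp, and report the capacity lost to throttling.
 */
static unsigned long old_thermal_pressure(unsigned long max_capacity,
					  unsigned long throttled_khz,
					  unsigned long max_khz)
{
	unsigned long capacity =
		(uint64_t)max_capacity * throttled_khz / max_khz;

	/* Boost frequencies could push capacity past max; clamp them. */
	if (capacity > max_capacity)
		capacity = max_capacity;

	return max_capacity - capacity;
}
```

Because the new helper handles the boost case itself, the explicit clamp (the old "Don't pass boost capacity to scheduler" comment) disappears from the driver, and the diff can also switch from `policy->cpus` to `policy->related_cpus` so offline CPUs get the update too.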
+1 -1
drivers/cpuidle/governors/menu.c
··· 34 34 * 1) Energy break even point 35 35 * 2) Performance impact 36 36 * 3) Latency tolerance (from pmqos infrastructure) 37 - * These these three factors are treated independently. 37 + * These three factors are treated independently. 38 38 * 39 39 * Energy break even point 40 40 * -----------------------
+5 -3
drivers/cpuidle/sysfs.c
··· 335 335 &attr_default_status.attr, 336 336 NULL 337 337 }; 338 + ATTRIBUTE_GROUPS(cpuidle_state_default); 338 339 339 340 struct cpuidle_state_kobj { 340 341 struct cpuidle_state *state; ··· 449 448 450 449 static struct kobj_type ktype_state_cpuidle = { 451 450 .sysfs_ops = &cpuidle_state_sysfs_ops, 452 - .default_attrs = cpuidle_state_default_attrs, 451 + .default_groups = cpuidle_state_default_groups, 453 452 .release = cpuidle_state_sysfs_release, 454 453 }; 455 454 ··· 506 505 } 507 506 508 507 /** 509 - * cpuidle_remove_driver_sysfs - removes the cpuidle states sysfs attributes 508 + * cpuidle_remove_state_sysfs - removes the cpuidle states sysfs attributes 510 509 * @device: the target device 511 510 */ 512 511 static void cpuidle_remove_state_sysfs(struct cpuidle_device *device) ··· 592 591 &attr_driver_name.attr, 593 592 NULL 594 593 }; 594 + ATTRIBUTE_GROUPS(cpuidle_driver_default); 595 595 596 596 static struct kobj_type ktype_driver_cpuidle = { 597 597 .sysfs_ops = &cpuidle_driver_sysfs_ops, 598 - .default_attrs = cpuidle_driver_default_attrs, 598 + .default_groups = cpuidle_driver_default_groups, 599 599 .release = cpuidle_driver_sysfs_release, 600 600 }; 601 601
+9
drivers/devfreq/Kconfig
··· 132 132 It sets the frequency for the memory controller and reads the usage counts 133 133 from hardware. 134 134 135 + config ARM_SUN8I_A33_MBUS_DEVFREQ 136 + tristate "sun8i/sun50i MBUS DEVFREQ Driver" 137 + depends on ARCH_SUNXI || COMPILE_TEST 138 + depends on COMMON_CLK 139 + select DEVFREQ_GOV_SIMPLE_ONDEMAND 140 + help 141 + This adds the DEVFREQ driver for the MBUS controller in some 142 + Allwinner sun8i (A33 through H3) and sun50i (A64 and H5) SoCs. 143 + 135 144 source "drivers/devfreq/event/Kconfig" 136 145 137 146 endif # PM_DEVFREQ
+1
drivers/devfreq/Makefile
··· 12 12 obj-$(CONFIG_ARM_IMX_BUS_DEVFREQ) += imx-bus.o 13 13 obj-$(CONFIG_ARM_IMX8M_DDRC_DEVFREQ) += imx8m-ddrc.o 14 14 obj-$(CONFIG_ARM_RK3399_DMC_DEVFREQ) += rk3399_dmc.o 15 + obj-$(CONFIG_ARM_SUN8I_A33_MBUS_DEVFREQ) += sun8i-a33-mbus.o 15 16 obj-$(CONFIG_ARM_TEGRA_DEVFREQ) += tegra30-devfreq.o 16 17 17 18 # DEVFREQ Event Drivers
+2 -2
drivers/devfreq/devfreq.c
··· 382 382 devfreq_notify_transition(devfreq, &freqs, DEVFREQ_POSTCHANGE); 383 383 384 384 if (devfreq_update_status(devfreq, new_freq)) 385 - dev_err(&devfreq->dev, 386 - "Couldn't update frequency transition information.\n"); 385 + dev_warn(&devfreq->dev, 386 + "Couldn't update frequency transition information.\n"); 387 387 388 388 devfreq->previous_freq = new_freq; 389 389
+511
drivers/devfreq/sun8i-a33-mbus.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + // 3 + // Copyright (C) 2020-2021 Samuel Holland <samuel@sholland.org> 4 + // 5 + 6 + #include <linux/clk.h> 7 + #include <linux/devfreq.h> 8 + #include <linux/err.h> 9 + #include <linux/io.h> 10 + #include <linux/iopoll.h> 11 + #include <linux/module.h> 12 + #include <linux/of.h> 13 + #include <linux/platform_device.h> 14 + #include <linux/property.h> 15 + 16 + #define MBUS_CR 0x0000 17 + #define MBUS_CR_GET_DRAM_TYPE(x) (((x) >> 16) & 0x7) 18 + #define MBUS_CR_DRAM_TYPE_DDR2 2 19 + #define MBUS_CR_DRAM_TYPE_DDR3 3 20 + #define MBUS_CR_DRAM_TYPE_DDR4 4 21 + #define MBUS_CR_DRAM_TYPE_LPDDR2 6 22 + #define MBUS_CR_DRAM_TYPE_LPDDR3 7 23 + 24 + #define MBUS_TMR 0x000c 25 + #define MBUS_TMR_PERIOD(x) ((x) - 1) 26 + 27 + #define MBUS_PMU_CFG 0x009c 28 + #define MBUS_PMU_CFG_PERIOD(x) (((x) - 1) << 16) 29 + #define MBUS_PMU_CFG_UNIT (0x3 << 1) 30 + #define MBUS_PMU_CFG_UNIT_B (0x0 << 1) 31 + #define MBUS_PMU_CFG_UNIT_KB (0x1 << 1) 32 + #define MBUS_PMU_CFG_UNIT_MB (0x2 << 1) 33 + #define MBUS_PMU_CFG_ENABLE (0x1 << 0) 34 + 35 + #define MBUS_PMU_BWCR(n) (0x00a0 + (0x04 * (n))) 36 + 37 + #define MBUS_TOTAL_BWCR MBUS_PMU_BWCR(5) 38 + #define MBUS_TOTAL_BWCR_H616 MBUS_PMU_BWCR(13) 39 + 40 + #define MBUS_MDFSCR 0x0100 41 + #define MBUS_MDFSCR_BUFFER_TIMING (0x1 << 15) 42 + #define MBUS_MDFSCR_PAD_HOLD (0x1 << 13) 43 + #define MBUS_MDFSCR_BYPASS (0x1 << 4) 44 + #define MBUS_MDFSCR_MODE (0x1 << 1) 45 + #define MBUS_MDFSCR_MODE_DFS (0x0 << 1) 46 + #define MBUS_MDFSCR_MODE_CFS (0x1 << 1) 47 + #define MBUS_MDFSCR_START (0x1 << 0) 48 + 49 + #define MBUS_MDFSMRMR 0x0108 50 + 51 + #define DRAM_PWRCTL 0x0004 52 + #define DRAM_PWRCTL_SELFREF_EN (0x1 << 0) 53 + 54 + #define DRAM_RFSHTMG 0x0090 55 + #define DRAM_RFSHTMG_TREFI(x) ((x) << 16) 56 + #define DRAM_RFSHTMG_TRFC(x) ((x) << 0) 57 + 58 + #define DRAM_VTFCR 0x00b8 59 + #define DRAM_VTFCR_VTF_ENABLE (0x3 << 8) 60 + 61 + #define DRAM_ODTMAP 0x0120 62 + 63 + #define DRAM_DX_MAX 4 64 
+
65 + #define DRAM_DXnGCR0(n)                 (0x0344 + 0x80 * (n))
66 + #define DRAM_DXnGCR0_DXODT              (0x3 << 4)
67 + #define DRAM_DXnGCR0_DXODT_DYNAMIC      (0x0 << 4)
68 + #define DRAM_DXnGCR0_DXODT_ENABLED      (0x1 << 4)
69 + #define DRAM_DXnGCR0_DXODT_DISABLED     (0x2 << 4)
70 + #define DRAM_DXnGCR0_DXEN               (0x1 << 0)
71 +
72 + struct sun8i_a33_mbus_variant {
73 +         u32 min_dram_divider;
74 +         u32 max_dram_divider;
75 +         u32 odt_freq_mhz;
76 + };
77 +
78 + struct sun8i_a33_mbus {
79 +         const struct sun8i_a33_mbus_variant *variant;
80 +         void __iomem *reg_dram;
81 +         void __iomem *reg_mbus;
82 +         struct clk *clk_bus;
83 +         struct clk *clk_dram;
84 +         struct clk *clk_mbus;
85 +         struct devfreq *devfreq_dram;
86 +         struct devfreq_simple_ondemand_data gov_data;
87 +         struct devfreq_dev_profile profile;
88 +         u32 data_width;
89 +         u32 nominal_bw;
90 +         u32 odtmap;
91 +         u32 tREFI_ns;
92 +         u32 tRFC_ns;
93 +         unsigned long freq_table[];
94 + };
95 +
96 + /*
97 +  * The unit for this value is (MBUS clock cycles / MBUS_TMR_PERIOD). When
98 +  * MBUS_TMR_PERIOD is programmed to match the MBUS clock frequency in MHz, as
99 +  * it is during DRAM init and during probe, the resulting unit is microseconds.
100 +  */
101 + static int pmu_period = 50000;
102 + module_param(pmu_period, int, 0644);
103 + MODULE_PARM_DESC(pmu_period, "Bandwidth measurement period (microseconds)");
104 +
105 + static u32 sun8i_a33_mbus_get_peak_bw(struct sun8i_a33_mbus *priv)
106 + {
107 +         /* Returns the peak transfer (in KiB) during any single PMU period. */
108 +         return readl_relaxed(priv->reg_mbus + MBUS_TOTAL_BWCR);
109 + }
110 +
111 + static void sun8i_a33_mbus_restart_pmu_counters(struct sun8i_a33_mbus *priv)
112 + {
113 +         u32 pmu_cfg = MBUS_PMU_CFG_PERIOD(pmu_period) | MBUS_PMU_CFG_UNIT_KB;
114 +
115 +         /* All PMU counters are cleared on a disable->enable transition. */
116 +         writel_relaxed(pmu_cfg,
117 +                        priv->reg_mbus + MBUS_PMU_CFG);
118 +         writel_relaxed(pmu_cfg | MBUS_PMU_CFG_ENABLE,
119 +                        priv->reg_mbus + MBUS_PMU_CFG);
120 +
121 + }
122 +
123 + static void sun8i_a33_mbus_update_nominal_bw(struct sun8i_a33_mbus *priv,
124 +                                              u32 ddr_freq_mhz)
125 + {
126 +         /*
127 +          * Nominal bandwidth (KiB per PMU period):
128 +          *
129 +          *   DDR transfers   microseconds        KiB
130 +          *   ------------- * ------------ * --------
131 +          *    microsecond     PMU period    transfer
132 +          */
133 +         priv->nominal_bw = ddr_freq_mhz * pmu_period * priv->data_width / 1024;
134 + }
135 +
136 + static int sun8i_a33_mbus_set_dram_freq(struct sun8i_a33_mbus *priv,
137 +                                         unsigned long freq)
138 + {
139 +         u32 ddr_freq_mhz = freq / USEC_PER_SEC;         /* DDR */
140 +         u32 dram_freq_mhz = ddr_freq_mhz / 2;           /* SDR */
141 +         u32 mctl_freq_mhz = dram_freq_mhz / 2;          /* HDR */
142 +         u32 dxodt, mdfscr, pwrctl, vtfcr;
143 +         u32 i, tREFI_32ck, tRFC_ck;
144 +         int ret;
145 +
146 +         /* The rate change is not effective until the MDFS process runs. */
147 +         ret = clk_set_rate(priv->clk_dram, freq);
148 +         if (ret)
149 +                 return ret;
150 +
151 +         /* Disable automatic self-refresh and VTF before starting MDFS. */
152 +         pwrctl = readl_relaxed(priv->reg_dram + DRAM_PWRCTL) &
153 +                  ~DRAM_PWRCTL_SELFREF_EN;
154 +         writel_relaxed(pwrctl, priv->reg_dram + DRAM_PWRCTL);
155 +         vtfcr = readl_relaxed(priv->reg_dram + DRAM_VTFCR);
156 +         writel_relaxed(vtfcr & ~DRAM_VTFCR_VTF_ENABLE,
157 +                        priv->reg_dram + DRAM_VTFCR);
158 +
159 +         /* Set up MDFS and enable double buffering for timing registers. */
160 +         mdfscr = MBUS_MDFSCR_MODE_DFS |
161 +                  MBUS_MDFSCR_BYPASS |
162 +                  MBUS_MDFSCR_PAD_HOLD |
163 +                  MBUS_MDFSCR_BUFFER_TIMING;
164 +         writel(mdfscr, priv->reg_mbus + MBUS_MDFSCR);
165 +
166 +         /* Update the buffered copy of RFSHTMG. */
167 +         tREFI_32ck = priv->tREFI_ns * mctl_freq_mhz / 1000 / 32;
168 +         tRFC_ck = DIV_ROUND_UP(priv->tRFC_ns * mctl_freq_mhz, 1000);
169 +         writel(DRAM_RFSHTMG_TREFI(tREFI_32ck) | DRAM_RFSHTMG_TRFC(tRFC_ck),
170 +                priv->reg_dram + DRAM_RFSHTMG);
171 +
172 +         /* Enable ODT if needed, or disable it to save power. */
173 +         if (priv->odtmap && dram_freq_mhz > priv->variant->odt_freq_mhz) {
174 +                 dxodt = DRAM_DXnGCR0_DXODT_DYNAMIC;
175 +                 writel(priv->odtmap, priv->reg_dram + DRAM_ODTMAP);
176 +         } else {
177 +                 dxodt = DRAM_DXnGCR0_DXODT_DISABLED;
178 +                 writel(0, priv->reg_dram + DRAM_ODTMAP);
179 +         }
180 +         for (i = 0; i < DRAM_DX_MAX; ++i) {
181 +                 void __iomem *reg = priv->reg_dram + DRAM_DXnGCR0(i);
182 +
183 +                 writel((readl(reg) & ~DRAM_DXnGCR0_DXODT) | dxodt, reg);
184 +         }
185 +
186 +         dev_dbg(priv->devfreq_dram->dev.parent,
187 +                 "Setting DRAM to %u MHz, tREFI=%u, tRFC=%u, ODT=%s\n",
188 +                 dram_freq_mhz, tREFI_32ck, tRFC_ck,
189 +                 dxodt == DRAM_DXnGCR0_DXODT_DYNAMIC ? "dynamic" : "disabled");
190 +
191 +         /* Trigger hardware MDFS. */
192 +         writel(mdfscr | MBUS_MDFSCR_START, priv->reg_mbus + MBUS_MDFSCR);
193 +         ret = readl_poll_timeout_atomic(priv->reg_mbus + MBUS_MDFSCR, mdfscr,
194 +                                         !(mdfscr & MBUS_MDFSCR_START), 10, 1000);
195 +         if (ret)
196 +                 return ret;
197 +
198 +         /* Disable double buffering. */
199 +         writel(0, priv->reg_mbus + MBUS_MDFSCR);
200 +
201 +         /* Restore VTF configuration. */
202 +         writel_relaxed(vtfcr, priv->reg_dram + DRAM_VTFCR);
203 +
204 +         /* Enable automatic self-refresh at the lowest frequency only. */
205 +         if (freq == priv->freq_table[0])
206 +                 pwrctl |= DRAM_PWRCTL_SELFREF_EN;
207 +         writel_relaxed(pwrctl, priv->reg_dram + DRAM_PWRCTL);
208 +
209 +         sun8i_a33_mbus_restart_pmu_counters(priv);
210 +         sun8i_a33_mbus_update_nominal_bw(priv, ddr_freq_mhz);
211 +
212 +         return 0;
213 + }
214 +
215 + static int sun8i_a33_mbus_set_dram_target(struct device *dev,
216 +                                           unsigned long *freq, u32 flags)
217 + {
218 +         struct sun8i_a33_mbus *priv = dev_get_drvdata(dev);
219 +         struct devfreq *devfreq = priv->devfreq_dram;
220 +         struct dev_pm_opp *opp;
221 +         int ret;
222 +
223 +         opp = devfreq_recommended_opp(dev, freq, flags);
224 +         if (IS_ERR(opp))
225 +                 return PTR_ERR(opp);
226 +
227 +         dev_pm_opp_put(opp);
228 +
229 +         if (*freq == devfreq->previous_freq)
230 +                 return 0;
231 +
232 +         ret = sun8i_a33_mbus_set_dram_freq(priv, *freq);
233 +         if (ret) {
234 +                 dev_warn(dev, "failed to set DRAM frequency: %d\n", ret);
235 +                 *freq = devfreq->previous_freq;
236 +         }
237 +
238 +         return ret;
239 + }
240 +
241 + static int sun8i_a33_mbus_get_dram_status(struct device *dev,
242 +                                           struct devfreq_dev_status *stat)
243 + {
244 +         struct sun8i_a33_mbus *priv = dev_get_drvdata(dev);
245 +
246 +         stat->busy_time = sun8i_a33_mbus_get_peak_bw(priv);
247 +         stat->total_time = priv->nominal_bw;
248 +         stat->current_frequency = priv->devfreq_dram->previous_freq;
249 +
250 +         sun8i_a33_mbus_restart_pmu_counters(priv);
251 +
252 +         dev_dbg(dev, "Using %lu/%lu (%lu%%) at %lu MHz\n",
253 +                 stat->busy_time, stat->total_time,
254 +                 DIV_ROUND_CLOSEST(stat->busy_time * 100, stat->total_time),
255 +                 stat->current_frequency / USEC_PER_SEC);
256 +
257 +         return 0;
258 + }
259 +
260 + static int sun8i_a33_mbus_hw_init(struct device *dev,
261 +                                   struct sun8i_a33_mbus *priv,
262 +                                   unsigned long ddr_freq)
263 + {
264 +         u32 i, mbus_cr, mbus_freq_mhz;
265 +
266 +         /* Choose tREFI and tRFC to match the configured DRAM type. */
267 +         mbus_cr = readl_relaxed(priv->reg_mbus + MBUS_CR);
268 +         switch (MBUS_CR_GET_DRAM_TYPE(mbus_cr)) {
269 +         case MBUS_CR_DRAM_TYPE_DDR2:
270 +         case MBUS_CR_DRAM_TYPE_DDR3:
271 +         case MBUS_CR_DRAM_TYPE_DDR4:
272 +                 priv->tREFI_ns = 7800;
273 +                 priv->tRFC_ns = 350;
274 +                 break;
275 +         case MBUS_CR_DRAM_TYPE_LPDDR2:
276 +         case MBUS_CR_DRAM_TYPE_LPDDR3:
277 +                 priv->tREFI_ns = 3900;
278 +                 priv->tRFC_ns = 210;
279 +                 break;
280 +         default:
281 +                 return -EINVAL;
282 +         }
283 +
284 +         /* Save ODTMAP so it can be restored when raising the frequency. */
285 +         priv->odtmap = readl_relaxed(priv->reg_dram + DRAM_ODTMAP);
286 +
287 +         /* Compute the DRAM data bus width by counting enabled DATx8 blocks. */
288 +         for (i = 0; i < DRAM_DX_MAX; ++i) {
289 +                 void __iomem *reg = priv->reg_dram + DRAM_DXnGCR0(i);
290 +
291 +                 if (!(readl_relaxed(reg) & DRAM_DXnGCR0_DXEN))
292 +                         break;
293 +         }
294 +         priv->data_width = i;
295 +
296 +         dev_dbg(dev, "Detected %u-bit %sDDRx with%s ODT\n",
297 +                 priv->data_width * 8,
298 +                 MBUS_CR_GET_DRAM_TYPE(mbus_cr) > 4 ? "LP" : "",
299 +                 priv->odtmap ? "" : "out");
300 +
301 +         /* Program MBUS_TMR such that the PMU period unit is microseconds. */
302 +         mbus_freq_mhz = clk_get_rate(priv->clk_mbus) / USEC_PER_SEC;
303 +         writel_relaxed(MBUS_TMR_PERIOD(mbus_freq_mhz),
304 +                        priv->reg_mbus + MBUS_TMR);
305 +
306 +         /* "Master Ready Mask Register" bits must be set or MDFS will block. */
307 +         writel_relaxed(0xffffffff, priv->reg_mbus + MBUS_MDFSMRMR);
308 +
309 +         sun8i_a33_mbus_restart_pmu_counters(priv);
310 +         sun8i_a33_mbus_update_nominal_bw(priv, ddr_freq / USEC_PER_SEC);
311 +
312 +         return 0;
313 + }
314 +
315 + static int __maybe_unused sun8i_a33_mbus_suspend(struct device *dev)
316 + {
317 +         struct sun8i_a33_mbus *priv = dev_get_drvdata(dev);
318 +
319 +         clk_disable_unprepare(priv->clk_bus);
320 +
321 +         return 0;
322 + }
323 +
324 + static int __maybe_unused sun8i_a33_mbus_resume(struct device *dev)
325 + {
326 +         struct sun8i_a33_mbus *priv = dev_get_drvdata(dev);
327 +
328 +         return clk_prepare_enable(priv->clk_bus);
329 + }
330 +
331 + static int sun8i_a33_mbus_probe(struct platform_device *pdev)
332 + {
333 +         const struct sun8i_a33_mbus_variant *variant;
334 +         struct device *dev = &pdev->dev;
335 +         struct sun8i_a33_mbus *priv;
336 +         unsigned long base_freq;
337 +         unsigned int max_state;
338 +         const char *err;
339 +         int i, ret;
340 +
341 +         variant = device_get_match_data(dev);
342 +         if (!variant)
343 +                 return -EINVAL;
344 +
345 +         max_state = variant->max_dram_divider - variant->min_dram_divider + 1;
346 +
347 +         priv = devm_kzalloc(dev, struct_size(priv, freq_table, max_state), GFP_KERNEL);
348 +         if (!priv)
349 +                 return -ENOMEM;
350 +
351 +         platform_set_drvdata(pdev, priv);
352 +
353 +         priv->variant = variant;
354 +
355 +         priv->reg_dram = devm_platform_ioremap_resource_byname(pdev, "dram");
356 +         if (IS_ERR(priv->reg_dram))
357 +                 return PTR_ERR(priv->reg_dram);
358 +
359 +         priv->reg_mbus = devm_platform_ioremap_resource_byname(pdev, "mbus");
360 +         if (IS_ERR(priv->reg_mbus))
361 +                 return PTR_ERR(priv->reg_mbus);
362 +
363 +         priv->clk_bus = devm_clk_get(dev, "bus");
364 +         if (IS_ERR(priv->clk_bus))
365 +                 return dev_err_probe(dev, PTR_ERR(priv->clk_bus),
366 +                                      "failed to get bus clock\n");
367 +
368 +         priv->clk_dram = devm_clk_get(dev, "dram");
369 +         if (IS_ERR(priv->clk_dram))
370 +                 return dev_err_probe(dev, PTR_ERR(priv->clk_dram),
371 +                                      "failed to get dram clock\n");
372 +
373 +         priv->clk_mbus = devm_clk_get(dev, "mbus");
374 +         if (IS_ERR(priv->clk_mbus))
375 +                 return dev_err_probe(dev, PTR_ERR(priv->clk_mbus),
376 +                                      "failed to get mbus clock\n");
377 +
378 +         ret = clk_prepare_enable(priv->clk_bus);
379 +         if (ret)
380 +                 return dev_err_probe(dev, ret,
381 +                                      "failed to enable bus clock\n");
382 +
383 +         /* Lock the DRAM clock rate to keep priv->nominal_bw in sync. */
384 +         ret = clk_rate_exclusive_get(priv->clk_dram);
385 +         if (ret) {
386 +                 err = "failed to lock dram clock rate\n";
387 +                 goto err_disable_bus;
388 +         }
389 +
390 +         /* Lock the MBUS clock rate to keep MBUS_TMR_PERIOD in sync. */
391 +         ret = clk_rate_exclusive_get(priv->clk_mbus);
392 +         if (ret) {
393 +                 err = "failed to lock mbus clock rate\n";
394 +                 goto err_unlock_dram;
395 +         }
396 +
397 +         priv->gov_data.upthreshold = 10;
398 +         priv->gov_data.downdifferential = 5;
399 +
400 +         priv->profile.initial_freq = clk_get_rate(priv->clk_dram);
401 +         priv->profile.polling_ms = 1000;
402 +         priv->profile.target = sun8i_a33_mbus_set_dram_target;
403 +         priv->profile.get_dev_status = sun8i_a33_mbus_get_dram_status;
404 +         priv->profile.freq_table = priv->freq_table;
405 +         priv->profile.max_state = max_state;
406 +
407 +         ret = devm_pm_opp_set_clkname(dev, "dram");
408 +         if (ret) {
409 +                 err = "failed to add OPP table\n";
410 +                 goto err_unlock_mbus;
411 +         }
412 +
413 +         base_freq = clk_get_rate(clk_get_parent(priv->clk_dram));
414 +         for (i = 0; i < max_state; ++i) {
415 +                 unsigned int div = variant->max_dram_divider - i;
416 +
417 +                 priv->freq_table[i] = base_freq / div;
418 +
419 +                 ret = dev_pm_opp_add(dev, priv->freq_table[i], 0);
420 +                 if (ret) {
421 +                         err = "failed to add OPPs\n";
422 +                         goto err_remove_opps;
423 +                 }
424 +         }
425 +
426 +         ret = sun8i_a33_mbus_hw_init(dev, priv, priv->profile.initial_freq);
427 +         if (ret) {
428 +                 err = "failed to init hardware\n";
429 +                 goto err_remove_opps;
430 +         }
431 +
432 +         priv->devfreq_dram = devfreq_add_device(dev, &priv->profile,
433 +                                                 DEVFREQ_GOV_SIMPLE_ONDEMAND,
434 +                                                 &priv->gov_data);
435 +         if (IS_ERR(priv->devfreq_dram)) {
436 +                 ret = PTR_ERR(priv->devfreq_dram);
437 +                 err = "failed to add devfreq device\n";
438 +                 goto err_remove_opps;
439 +         }
440 +
441 +         /*
442 +          * This must be set manually after registering the devfreq device,
443 +          * because there is no way to select a dynamic OPP as the suspend OPP.
444 +          */
445 +         priv->devfreq_dram->suspend_freq = priv->freq_table[0];
446 +
447 +         return 0;
448 +
449 + err_remove_opps:
450 +         dev_pm_opp_remove_all_dynamic(dev);
451 + err_unlock_mbus:
452 +         clk_rate_exclusive_put(priv->clk_mbus);
453 + err_unlock_dram:
454 +         clk_rate_exclusive_put(priv->clk_dram);
455 + err_disable_bus:
456 +         clk_disable_unprepare(priv->clk_bus);
457 +
458 +         return dev_err_probe(dev, ret, err);
459 + }
460 +
461 + static int sun8i_a33_mbus_remove(struct platform_device *pdev)
462 + {
463 +         struct sun8i_a33_mbus *priv = platform_get_drvdata(pdev);
464 +         unsigned long initial_freq = priv->profile.initial_freq;
465 +         struct device *dev = &pdev->dev;
466 +         int ret;
467 +
468 +         devfreq_remove_device(priv->devfreq_dram);
469 +
470 +         ret = sun8i_a33_mbus_set_dram_freq(priv, initial_freq);
471 +         if (ret)
472 +                 dev_warn(dev, "failed to restore DRAM frequency: %d\n", ret);
473 +
474 +         dev_pm_opp_remove_all_dynamic(dev);
475 +         clk_rate_exclusive_put(priv->clk_mbus);
476 +         clk_rate_exclusive_put(priv->clk_dram);
477 +         clk_disable_unprepare(priv->clk_bus);
478 +
479 +         return 0;
480 + }
481 +
482 + static const struct sun8i_a33_mbus_variant sun50i_a64_mbus = {
483 +         .min_dram_divider = 1,
484 +         .max_dram_divider = 4,
485 +         .odt_freq_mhz = 400,
486 + };
487 +
488 + static const struct of_device_id sun8i_a33_mbus_of_match[] = {
489 +         { .compatible = "allwinner,sun50i-a64-mbus", .data = &sun50i_a64_mbus },
490 +         { .compatible = "allwinner,sun50i-h5-mbus", .data = &sun50i_a64_mbus },
491 +         { },
492 + };
493 + MODULE_DEVICE_TABLE(of, sun8i_a33_mbus_of_match);
494 +
495 + static SIMPLE_DEV_PM_OPS(sun8i_a33_mbus_pm_ops,
496 +                          sun8i_a33_mbus_suspend, sun8i_a33_mbus_resume);
497 +
498 + static struct platform_driver sun8i_a33_mbus_driver = {
499 +         .probe = sun8i_a33_mbus_probe,
500 +         .remove = sun8i_a33_mbus_remove,
501 +         .driver = {
502 +                 .name = "sun8i-a33-mbus",
503 +                 .of_match_table = sun8i_a33_mbus_of_match,
504 +                 .pm = pm_ptr(&sun8i_a33_mbus_pm_ops),
505 +         },
506 + };
507 + module_platform_driver(sun8i_a33_mbus_driver);
508 +
509 + MODULE_AUTHOR("Samuel Holland <samuel@sholland.org>");
510 + MODULE_DESCRIPTION("Allwinner sun8i/sun50i MBUS DEVFREQ Driver");
511 + MODULE_LICENSE("GPL v2");
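The nominal-bandwidth comment in the new driver (lines 126-133) boils down to one multiplication. A small userspace sketch of that arithmetic, with illustrative values (the frequency and bus width below are made up, not read from real hardware):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of sun8i_a33_mbus_update_nominal_bw(): KiB per PMU period =
 * (DDR transfers / microsecond) * (microseconds / period) *
 * (bytes / transfer), scaled from bytes to KiB.
 */
static uint32_t nominal_bw_kib(uint32_t ddr_freq_mhz, uint32_t pmu_period_us,
			       uint32_t data_width_bytes)
{
	return ddr_freq_mhz * pmu_period_us * data_width_bytes / 1024;
}
```

With the default 50000 us period, a hypothetical 600 MHz DDR rate and a 4-byte bus, this yields 117187 KiB per period, which the governor then compares against the PMU's measured peak.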
+4 -4
drivers/mmc/host/jz4740_mmc.c
···
1103 1103         return 0;
1104 1104 }
1105 1105
1106      - static int __maybe_unused jz4740_mmc_suspend(struct device *dev)
     1106 + static int jz4740_mmc_suspend(struct device *dev)
1107 1107 {
1108 1108         return pinctrl_pm_select_sleep_state(dev);
1109 1109 }
1110 1110
1111      - static int __maybe_unused jz4740_mmc_resume(struct device *dev)
     1111 + static int jz4740_mmc_resume(struct device *dev)
1112 1112 {
1113 1113         return pinctrl_select_default_state(dev);
1114 1114 }
1115 1115
1116      - static SIMPLE_DEV_PM_OPS(jz4740_mmc_pm_ops, jz4740_mmc_suspend,
     1116 + DEFINE_SIMPLE_DEV_PM_OPS(jz4740_mmc_pm_ops, jz4740_mmc_suspend,
1117 1117         jz4740_mmc_resume);
1118 1118
1119 1119 static struct platform_driver jz4740_mmc_driver = {
···
1123 1123                 .name = "jz4740-mmc",
1124 1124                 .probe_type = PROBE_PREFER_ASYNCHRONOUS,
1125 1125                 .of_match_table = of_match_ptr(jz4740_mmc_of_match),
1126      -                 .pm = pm_ptr(&jz4740_mmc_pm_ops),
     1126 +                 .pm = pm_sleep_ptr(&jz4740_mmc_pm_ops),
1127 1127         },
1128 1128 };
1129 1129
+2 -4
drivers/mmc/host/mxcmmc.c
···
1183 1183         return 0;
1184 1184 }
1185 1185
1186      - #ifdef CONFIG_PM_SLEEP
1187 1186 static int mxcmci_suspend(struct device *dev)
1188 1187 {
1189 1188         struct mmc_host *mmc = dev_get_drvdata(dev);
···
1209 1210
1210 1211         return ret;
1211 1212 }
1212      - #endif
1213 1213
1214      - static SIMPLE_DEV_PM_OPS(mxcmci_pm_ops, mxcmci_suspend, mxcmci_resume);
     1214 + DEFINE_SIMPLE_DEV_PM_OPS(mxcmci_pm_ops, mxcmci_suspend, mxcmci_resume);
1215 1215
1216 1216 static struct platform_driver mxcmci_driver = {
1217 1217         .probe = mxcmci_probe,
···
1218 1220         .driver = {
1219 1221                 .name = DRIVER_NAME,
1220 1222                 .probe_type = PROBE_PREFER_ASYNCHRONOUS,
1221      -                 .pm = &mxcmci_pm_ops,
     1223 +                 .pm = pm_sleep_ptr(&mxcmci_pm_ops),
1222 1224                 .of_match_table = mxcmci_of_match,
1223 1225         }
1224 1226 };
+3 -1
drivers/net/ethernet/realtek/r8169_main.c
···
5460 5460         .probe = rtl_init_one,
5461 5461         .remove = rtl_remove_one,
5462 5462         .shutdown = rtl_shutdown,
5463      -         .driver.pm = pm_ptr(&rtl8169_pm_ops),
     5463 + #ifdef CONFIG_PM
     5464 +         .driver.pm = &rtl8169_pm_ops,
     5465 + #endif
5464 5466 };
5465 5467
5466 5468 module_pci_driver(rtl8169_pci_driver);
+3 -3
drivers/powercap/dtpm.c
···
382 382 {
383 383         powercap_unregister_zone(pct, &dtpm->zone);
384 384
385      -         pr_info("Unregistered dtpm node '%s'\n", dtpm->zone.name);
     385 +         pr_debug("Unregistered dtpm node '%s'\n", dtpm->zone.name);
386 386 }
387 387
388 388 /**
···
453 453                 dtpm->power_limit = dtpm->power_max;
454 454         }
455 455
456      -         pr_info("Registered dtpm node '%s' / %llu-%llu uW, \n",
457      -                 dtpm->zone.name, dtpm->power_min, dtpm->power_max);
     456 +         pr_debug("Registered dtpm node '%s' / %llu-%llu uW, \n",
     457 +                  dtpm->zone.name, dtpm->power_min, dtpm->power_max);
458 458
459 459         mutex_unlock(&dtpm_lock);
460 460
+1 -1
drivers/powercap/idle_inject.c
···
12 12  *
13 13  * All of the kthreads used for idle injection are created at init time.
14 14  *
15    - * Next, the users of the the idle injection framework provide a cpumask via
   15 + * Next, the users of the idle injection framework provide a cpumask via
16 16  * its register function. The kthreads will be synchronized with respect to
17 17  * this cpumask.
18 18  *
+59 -2
drivers/powercap/intel_rapl_common.c
···
61 61  #define PERF_STATUS_THROTTLE_TIME_MASK 0xffffffff
62 62  #define PP_POLICY_MASK         0x1F
63 63
   64 + /*
   65 +  * SPR has different layout for Psys Domain PowerLimit registers.
   66 +  * There are 17 bits of PL1 and PL2 instead of 15 bits.
   67 +  * The Enable bits and TimeWindow bits are also shifted as a result.
   68 +  */
   69 + #define PSYS_POWER_LIMIT1_MASK       0x1FFFF
   70 + #define PSYS_POWER_LIMIT1_ENABLE     BIT(17)
   71 +
   72 + #define PSYS_POWER_LIMIT2_MASK       (0x1FFFFULL<<32)
   73 + #define PSYS_POWER_LIMIT2_ENABLE     BIT_ULL(49)
   74 +
   75 + #define PSYS_TIME_WINDOW1_MASK       (0x7FULL<<19)
   76 + #define PSYS_TIME_WINDOW2_MASK       (0x7FULL<<51)
   77 +
64 78  /* Non HW constants */
65 79  #define RAPL_PRIMITIVE_DERIVED       BIT(1)  /* not from raw data */
66 80  #define RAPL_PRIMITIVE_DUMMY         BIT(2)
···
111 97          bool to_raw);
112 98          unsigned int dram_domain_energy_unit;
113 99          unsigned int psys_domain_energy_unit;
   100 +         bool spr_psys_bits;
114 101  };
115 102  static struct rapl_defaults *rapl_defaults;
116 103
···
684 669                             RAPL_DOMAIN_REG_PERF, TIME_UNIT, 0),
685 670         PRIMITIVE_INFO_INIT(PRIORITY_LEVEL, PP_POLICY_MASK, 0,
686 671                             RAPL_DOMAIN_REG_POLICY, ARBITRARY_UNIT, 0),
    672 +         PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT1, PSYS_POWER_LIMIT1_MASK, 0,
    673 +                             RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0),
    674 +         PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT2, PSYS_POWER_LIMIT2_MASK, 32,
    675 +                             RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0),
    676 +         PRIMITIVE_INFO_INIT(PSYS_PL1_ENABLE, PSYS_POWER_LIMIT1_ENABLE, 17,
    677 +                             RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
    678 +         PRIMITIVE_INFO_INIT(PSYS_PL2_ENABLE, PSYS_POWER_LIMIT2_ENABLE, 49,
    679 +                             RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
    680 +         PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW1, PSYS_TIME_WINDOW1_MASK, 19,
    681 +                             RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0),
    682 +         PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW2, PSYS_TIME_WINDOW2_MASK, 51,
    683 +                             RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0),
687 684         /* non-hardware */
688 685         PRIMITIVE_INFO_INIT(AVERAGE_POWER, 0, 0, 0, POWER_UNIT,
689 686                             RAPL_PRIMITIVE_DERIVED),
690 687         {NULL, 0, 0, 0},
691 688  };
    689 +
    690 + static enum rapl_primitives
    691 + prim_fixups(struct rapl_domain *rd, enum rapl_primitives prim)
    692 + {
    693 +         if (!rapl_defaults->spr_psys_bits)
    694 +                 return prim;
    695 +
    696 +         if (rd->id != RAPL_DOMAIN_PLATFORM)
    697 +                 return prim;
    698 +
    699 +         switch (prim) {
    700 +         case POWER_LIMIT1:
    701 +                 return PSYS_POWER_LIMIT1;
    702 +         case POWER_LIMIT2:
    703 +                 return PSYS_POWER_LIMIT2;
    704 +         case PL1_ENABLE:
    705 +                 return PSYS_PL1_ENABLE;
    706 +         case PL2_ENABLE:
    707 +                 return PSYS_PL2_ENABLE;
    708 +         case TIME_WINDOW1:
    709 +                 return PSYS_TIME_WINDOW1;
    710 +         case TIME_WINDOW2:
    711 +                 return PSYS_TIME_WINDOW2;
    712 +         default:
    713 +                 return prim;
    714 +         }
    715 + }
692 716
693 717  /* Read primitive data based on its related struct rapl_primitive_info.
694 718   * if xlate flag is set, return translated data based on data units, i.e.
···
746 692                             enum rapl_primitives prim, bool xlate, u64 *data)
747 693  {
748 694         u64 value;
749      -         struct rapl_primitive_info *rp = &rpi[prim];
    695 +         enum rapl_primitives prim_fixed = prim_fixups(rd, prim);
    696 +         struct rapl_primitive_info *rp = &rpi[prim_fixed];
750 697         struct reg_action ra;
751 698         int cpu;
752 699
···
793 738                               enum rapl_primitives prim,
794 739                               unsigned long long value)
795 740  {
796      -         struct rapl_primitive_info *rp = &rpi[prim];
    741 +         enum rapl_primitives prim_fixed = prim_fixups(rd, prim);
    742 +         struct rapl_primitive_info *rp = &rpi[prim_fixed];
797 743         int cpu;
798 744         u64 bits;
799 745         struct reg_action ra;
···
1037 981         .compute_time_window = rapl_compute_time_window_core,
1038 982         .dram_domain_energy_unit = 15300,
1039 983         .psys_domain_energy_unit = 1000000000,
     984 +         .spr_psys_bits = true,
1040 985  };
1041 986
1042 987  static const struct rapl_defaults rapl_defaults_byt = {
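The comment at lines 64-68 describes the widened SPR Psys fields: 17 bits of PL1 starting at bit 0, with the enable bit pushed up to bit 17. A userspace sketch of decoding those two fields from a raw 64-bit register value, using the same masks the patch defines (the register value in the test is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as PSYS_POWER_LIMIT1_MASK / PSYS_POWER_LIMIT1_ENABLE above. */
#define PSYS_POWER_LIMIT1_MASK		0x1FFFFULL
#define PSYS_POWER_LIMIT1_ENABLE	(1ULL << 17)

/* Extract the raw 17-bit PL1 field (still in hardware power units). */
static uint64_t psys_pl1(uint64_t reg)
{
	return reg & PSYS_POWER_LIMIT1_MASK;
}

/* Test the PL1 enable bit, which sits just above the 17-bit field. */
static int psys_pl1_enabled(uint64_t reg)
{
	return !!(reg & PSYS_POWER_LIMIT1_ENABLE);
}
```

This is the decode that `prim_fixups()` makes the generic read/write paths perform for the platform domain by swapping in the PSYS_* primitives.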
+1 -5
drivers/thermal/cpufreq_cooling.c
···
462 462         struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
463 463         struct cpumask *cpus;
464 464         unsigned int frequency;
465      -         unsigned long max_capacity, capacity;
466 465         int ret;
467 466
468 467         /* Request state should be less than max_level */
···
478 479         if (ret >= 0) {
479 480                 cpufreq_cdev->cpufreq_state = state;
480 481                 cpus = cpufreq_cdev->policy->related_cpus;
481      -                 max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
482      -                 capacity = frequency * max_capacity;
483      -                 capacity /= cpufreq_cdev->policy->cpuinfo.max_freq;
484      -                 arch_set_thermal_pressure(cpus, max_capacity - capacity);
     482 +                 arch_update_thermal_pressure(cpus, frequency);
485 483                 ret = 0;
486 484         }
487 485
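The four removed lines computed the thermal pressure by hand: scale the CPU's maximum capacity by the capped/maximum frequency ratio, then report the shortfall. That arithmetic now lives behind `arch_update_thermal_pressure()`, which takes only the capped frequency. A standalone sketch of the removed computation (the capacity and frequency values in the test are illustrative):

```c
#include <assert.h>

/*
 * What cpufreq_cooling.c used to do inline:
 *   capacity = capped_freq * max_capacity / max_freq;
 *   pressure = max_capacity - capacity;
 */
static unsigned long thermal_pressure(unsigned long max_capacity,
				      unsigned long capped_freq,
				      unsigned long max_freq)
{
	return max_capacity - capped_freq * max_capacity / max_freq;
}
```

A CPU of capacity 1024 capped to 1500 MHz out of 2000 MHz loses a quarter of its capacity, i.e. a pressure of 256; an uncapped CPU reports zero pressure.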
+5
include/acpi/cppc_acpi.h
···
138 138  extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf);
139 139  extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
140 140  extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
    141 + extern int cppc_set_enable(int cpu, bool enable);
141 142  extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
142 143  extern bool acpi_cpc_valid(void);
143 144  extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data);
···
160 159         return -ENOTSUPP;
161 160  }
162 161  static inline int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
    162 + {
    163 +         return -ENOTSUPP;
    164 + }
    165 + static inline int cppc_set_enable(int cpu, bool enable)
163 166  {
164 167         return -ENOTSUPP;
165 168  }
+1 -1
include/linux/acpi.h
···
506 506  int acpi_resources_are_enforced(void);
507 507
508 508  #ifdef CONFIG_HIBERNATION
509      - void __init acpi_no_s4_hw_signature(void);
    509 + void __init acpi_check_s4_hw_signature(int check);
510 510  #endif
511 511
512 512  #ifdef CONFIG_PM_SLEEP
+2 -2
include/linux/arch_topology.h
···
56 56         return per_cpu(thermal_pressure, cpu);
57 57  }
58 58
59    - void topology_set_thermal_pressure(const struct cpumask *cpus,
60    -                                    unsigned long th_pressure);
   59 + void topology_update_thermal_pressure(const struct cpumask *cpus,
   60 +                                       unsigned long capped_freq);
61 61
62 62  struct cpu_topology {
63 63         int thread_id;
-2
include/linux/dtpm.h
···
70 70
71 71  int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent);
72 72
73    - int dtpm_register_cpu(struct dtpm *parent);
74    -
75 73  #endif
+6
include/linux/intel_rapl.h
···
58 58         THROTTLED_TIME,
59 59         PRIORITY_LEVEL,
60 60
   61 +         PSYS_POWER_LIMIT1,
   62 +         PSYS_POWER_LIMIT2,
   63 +         PSYS_PL1_ENABLE,
   64 +         PSYS_PL2_ENABLE,
   65 +         PSYS_TIME_WINDOW1,
   66 +         PSYS_TIME_WINDOW2,
61 67         /* below are not raw primitive data */
62 68         AVERAGE_POWER,
63 69         NR_RAPL_PRIMITIVES,
+53 -29
include/linux/pm.h
···
300 300         int (*runtime_idle)(struct device *dev);
301 301  };
302 302
    303 + #define SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
    304 +         .suspend = pm_sleep_ptr(suspend_fn), \
    305 +         .resume = pm_sleep_ptr(resume_fn), \
    306 +         .freeze = pm_sleep_ptr(suspend_fn), \
    307 +         .thaw = pm_sleep_ptr(resume_fn), \
    308 +         .poweroff = pm_sleep_ptr(suspend_fn), \
    309 +         .restore = pm_sleep_ptr(resume_fn),
    310 +
    311 + #define LATE_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
    312 +         .suspend_late = pm_sleep_ptr(suspend_fn), \
    313 +         .resume_early = pm_sleep_ptr(resume_fn), \
    314 +         .freeze_late = pm_sleep_ptr(suspend_fn), \
    315 +         .thaw_early = pm_sleep_ptr(resume_fn), \
    316 +         .poweroff_late = pm_sleep_ptr(suspend_fn), \
    317 +         .restore_early = pm_sleep_ptr(resume_fn),
    318 +
    319 + #define NOIRQ_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
    320 +         .suspend_noirq = pm_sleep_ptr(suspend_fn), \
    321 +         .resume_noirq = pm_sleep_ptr(resume_fn), \
    322 +         .freeze_noirq = pm_sleep_ptr(suspend_fn), \
    323 +         .thaw_noirq = pm_sleep_ptr(resume_fn), \
    324 +         .poweroff_noirq = pm_sleep_ptr(suspend_fn), \
    325 +         .restore_noirq = pm_sleep_ptr(resume_fn),
    326 +
    327 + #define RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
    328 +         .runtime_suspend = suspend_fn, \
    329 +         .runtime_resume = resume_fn, \
    330 +         .runtime_idle = idle_fn,
    331 +
303 332  #ifdef CONFIG_PM_SLEEP
304 333  #define SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
305      -         .suspend = suspend_fn, \
306      -         .resume = resume_fn, \
307      -         .freeze = suspend_fn, \
308      -         .thaw = resume_fn, \
309      -         .poweroff = suspend_fn, \
310      -         .restore = resume_fn,
    334 +         SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
311 335  #else
312 336  #define SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
313 337  #endif
314 338
315 339  #ifdef CONFIG_PM_SLEEP
316 340  #define SET_LATE_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
317      -         .suspend_late = suspend_fn, \
318      -         .resume_early = resume_fn, \
319      -         .freeze_late = suspend_fn, \
320      -         .thaw_early = resume_fn, \
321      -         .poweroff_late = suspend_fn, \
322      -         .restore_early = resume_fn,
    341 +         LATE_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
323 342  #else
324 343  #define SET_LATE_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
325 344  #endif
326 345
327 346  #ifdef CONFIG_PM_SLEEP
328 347  #define SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
329      -         .suspend_noirq = suspend_fn, \
330      -         .resume_noirq = resume_fn, \
331      -         .freeze_noirq = suspend_fn, \
332      -         .thaw_noirq = resume_fn, \
333      -         .poweroff_noirq = suspend_fn, \
334      -         .restore_noirq = resume_fn,
    348 +         NOIRQ_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
335 349  #else
336 350  #define SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
337 351  #endif
338 352
339 353  #ifdef CONFIG_PM
340 354  #define SET_RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
341      -         .runtime_suspend = suspend_fn, \
342      -         .runtime_resume = resume_fn, \
343      -         .runtime_idle = idle_fn,
    355 +         RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn)
344 356  #else
345 357  #define SET_RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn)
346 358  #endif
···
361 349   * Use this if you want to use the same suspend and resume callbacks for suspend
362 350   * to RAM and hibernation.
363 351   */
364      - #define SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \
365      - const struct dev_pm_ops __maybe_unused name = { \
366      -         SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
    352 + #define DEFINE_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \
    353 + static const struct dev_pm_ops name = { \
    354 +         SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
367 355  }
368 356
369 357  /*
···
379 367   * .resume_early(), to the same routines as .runtime_suspend() and
380 368   * .runtime_resume(), respectively (and analogously for hibernation).
381 369   */
    370 + #define DEFINE_UNIVERSAL_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \
    371 + static const struct dev_pm_ops name = { \
    372 +         SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
    373 +         RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
    374 + }
    375 +
    376 + /* Deprecated. Use DEFINE_SIMPLE_DEV_PM_OPS() instead. */
    377 + #define SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \
    378 + const struct dev_pm_ops __maybe_unused name = { \
    379 +         SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
    380 + }
    381 +
    382 + /* Deprecated. Use DEFINE_UNIVERSAL_DEV_PM_OPS() instead. */
382 383  #define UNIVERSAL_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \
383 384  const struct dev_pm_ops __maybe_unused name = { \
384 385         SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
385 386         SET_RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
386 387  }
387 388
388      - #ifdef CONFIG_PM
389      - #define pm_ptr(_ptr) (_ptr)
390      - #else
391      - #define pm_ptr(_ptr) NULL
392      - #endif
    389 + #define pm_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM), (_ptr))
    390 + #define pm_sleep_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM_SLEEP), (_ptr))
393 391
394 392  /*
395 393   * PM_EVENT_ messages
···
521 499   */
522 500
523 501  enum rpm_status {
    502 +         RPM_INVALID = -1,
524 503         RPM_ACTIVE = 0,
525 504         RPM_RESUMING,
526 505         RPM_SUSPENDED,
···
635 612         unsigned int links_count;
636 613         enum rpm_request request;
637 614         enum rpm_status runtime_status;
    615 +         enum rpm_status last_status;
638 616         int runtime_error;
639 617         int autosuspend_delay;
640 618         u64 last_busy;
+3
include/linux/pm_runtime.h
···
58 58  extern void pm_runtime_put_suppliers(struct device *dev);
59 59  extern void pm_runtime_new_link(struct device *dev);
60 60  extern void pm_runtime_drop_link(struct device_link *link);
   61 + extern void pm_runtime_release_supplier(struct device_link *link, bool check_idle);
61 62
62 63  extern int devm_pm_runtime_enable(struct device *dev);
63 64
···
284 283  static inline void pm_runtime_put_suppliers(struct device *dev) {}
285 284  static inline void pm_runtime_new_link(struct device *dev) {}
286 285  static inline void pm_runtime_drop_link(struct device_link *link) {}
    286 + static inline void pm_runtime_release_supplier(struct device_link *link,
    287 +                                                bool check_idle) {}
287 288
288 289  #endif /* !CONFIG_PM */
289 290
+3 -3
include/linux/sched/topology.h
···
266 266  }
267 267  #endif
268 268
269    - #ifndef arch_set_thermal_pressure
   269 + #ifndef arch_update_thermal_pressure
270 270  static __always_inline
271    - void arch_set_thermal_pressure(const struct cpumask *cpus,
272    -                                unsigned long th_pressure)
   271 + void arch_update_thermal_pressure(const struct cpumask *cpus,
   272 +                                   unsigned long capped_frequency)
273 273  { }
274 274  #endif
275 275
+1
include/linux/suspend.h
···
446 446  extern asmlinkage int swsusp_arch_suspend(void);
447 447  extern asmlinkage int swsusp_arch_resume(void);
448 448
    449 + extern u32 swsusp_hardware_signature;
449 450  extern void hibernation_set_ops(const struct platform_hibernation_ops *ops);
450 451  extern int hibernate(void);
451 452  extern bool system_entering_hibernation(void);
+1 -1
init/Kconfig
···
550 550           i.e. put less load on throttled CPUs than on non/less throttled ones.
551 551
552 552           This requires the architecture to implement
553    -           arch_set_thermal_pressure() and arch_scale_thermal_pressure().
   553 +           arch_update_thermal_pressure() and arch_scale_thermal_pressure().
554 554
555 555  config BSD_PROCESS_ACCT
556 556         bool "BSD Process Accounting"
+1
kernel/power/power.h
···
170 170  #define SF_PLATFORM_MODE        1
171 171  #define SF_NOCOMPRESS_MODE      2
172 172  #define SF_CRC32_MODE           4
    173 + #define SF_HW_SIG               8
173 174
174 175  /* kernel/power/hibernate.c */
175 176  extern int swsusp_check(void);
+14 -2
kernel/power/swap.c
···
36 36
37 37  #define HIBERNATE_SIG        "S1SUSPEND"
38 38
   39 + u32 swsusp_hardware_signature;
   40 +
39 41  /*
40 42   * When reading an {un,}compressed image, we may restore pages in place,
41 43   * in which case some architectures need these pages cleaning before they
···
106 104
107 105  struct swsusp_header {
108 106         char reserved[PAGE_SIZE - 20 - sizeof(sector_t) - sizeof(int) -
109      -                       sizeof(u32)];
    107 +                       sizeof(u32) - sizeof(u32)];
    108 +         u32 hw_sig;
110 109         u32 crc32;
111 110         sector_t image;
112 111         unsigned int flags;     /* Flags to pass to the "boot" kernel */
···
315 312  /*
316 313   * Saving part
317 314   */
318      -
319 315  static int mark_swapfiles(struct swap_map_handle *handle, unsigned int flags)
320 316  {
321 317         int error;
···
326 324         memcpy(swsusp_header->orig_sig,swsusp_header->sig, 10);
327 325         memcpy(swsusp_header->sig, HIBERNATE_SIG, 10);
328 326         swsusp_header->image = handle->first_sector;
    327 +         if (swsusp_hardware_signature) {
    328 +                 swsusp_header->hw_sig = swsusp_hardware_signature;
    329 +                 flags |= SF_HW_SIG;
    330 +         }
329 331         swsusp_header->flags = flags;
330 332         if (flags & SF_CRC32_MODE)
331 333                 swsusp_header->crc32 = handle->crc32;
···
1541 1535                                 swsusp_resume_block,
1542 1536                                 swsusp_header, NULL);
1543 1537         } else {
     1538 +                 error = -EINVAL;
     1539 +         }
     1540 +         if (!error && swsusp_header->flags & SF_HW_SIG &&
     1541 +             swsusp_header->hw_sig != swsusp_hardware_signature) {
     1542 +                 pr_info("Suspend image hardware signature mismatch (%08x now %08x); aborting resume.\n",
     1543 +                         swsusp_header->hw_sig, swsusp_hardware_signature);
1544 1544                 error = -EINVAL;
1545 1545         }
1546 1546
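The resume-time check added to swap.c refuses an image whose stored hardware signature no longer matches the one the current boot computed, but only when the image was written with SF_HW_SIG set (so images from older kernels still resume). A standalone sketch of that predicate, with invented signature values in the test:

```c
#include <assert.h>
#include <stdint.h>

#define SF_HW_SIG	8	/* matches the new flag in kernel/power/power.h */

/*
 * Resume is allowed when the image carries no hardware signature, or when
 * the stored signature equals the one computed during this boot.
 */
static int hw_sig_ok(unsigned int flags, uint32_t stored_sig,
		     uint32_t current_sig)
{
	return !(flags & SF_HW_SIG) || stored_sig == current_sig;
}
```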