
Merge tag 'pm-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"These are PM-runtime framework changes to use ktime instead of jiffies
for accounting, new PM core flag to mark devices that don't need any
form of power management, cpuidle updates including driver API
documentation and a new governor, cpufreq updates including a new
driver for Armada 8K, thermal cleanups and more, some energy-aware
scheduling (EAS) enabling changes, new chips support in the intel_idle
and RAPL drivers and assorted cleanups in some other places.

Specifics:

- Update the PM-runtime framework to use ktime instead of jiffies for
accounting (Thara Gopinath, Vincent Guittot)

- Optimize the autosuspend code in the PM-runtime framework somewhat
(Ladislav Michl)

- Add a PM core flag to mark devices that don't need any form of
power management (Sudeep Holla)

- Introduce driver API documentation for cpuidle and add a new
cpuidle governor for tickless systems (Rafael Wysocki)

- Add Jacobsville support to the intel_idle driver (Zhang Rui)

- Clean up a cpuidle core header file and the cpuidle-dt and ACPI
processor-idle drivers (Yangtao Li, Joseph Lo, Yazen Ghannam)

- Add new cpufreq driver for Armada 8K (Gregory Clement)

- Fix and clean up cpufreq core (Rafael Wysocki, Viresh Kumar, Amit
Kucheria)

- Add support for light-weight tear-down and bring-up of CPUs to the
cpufreq core and use it in the cpufreq-dt driver (Viresh Kumar)

- Fix cpu_cooling Kconfig dependencies, add support for CPU cooling
auto-registration to the cpufreq core and use it in multiple
cpufreq drivers (Amit Kucheria)

- Fix some minor issues and do some cleanups in the davinci,
e_powersaver, ap806, s5pv210, qcom and kryo cpufreq drivers
(Bartosz Golaszewski, Gustavo Silva, Julia Lawall, Paweł Chmiel,
Taniya Das, Viresh Kumar)

- Add a Hisilicon CPPC quirk to the cppc_cpufreq driver (Xiongfeng
Wang)

- Clean up the intel_pstate and acpi-cpufreq drivers (Erwan Velu,
Rafael Wysocki)

- Clean up multiple cpufreq drivers (Yangtao Li)

- Update cpufreq-related MAINTAINERS entries (Baruch Siach, Lukas
Bulwahn)

- Add support for exposing the Energy Model via debugfs and make
multiple cpufreq drivers register an Energy Model to support
energy-aware scheduling (Quentin Perret, Dietmar Eggemann, Matthias
Kaehlcke)

- Add Ice Lake mobile and Jacobsville support to the Intel RAPL
power-capping driver (Gayatri Kammela, Zhang Rui)

- Add a power estimation helper to the operating performance points
(OPP) framework and clean up a core function in it (Quentin Perret,
Viresh Kumar)

- Make minor improvements in the generic power domains (genpd), OPP
and system suspend frameworks and in the PM core (Aditya Pakki,
Douglas Anderson, Greg Kroah-Hartman, Rafael Wysocki, Yangtao Li)"

* tag 'pm-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (80 commits)
cpufreq: kryo: Release OPP tables on module removal
cpufreq: ap806: add missing of_node_put after of_device_is_available
cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies
cpufreq: Pass updated policy to driver ->setpolicy() callback
cpufreq: Fix two debug messages in cpufreq_set_policy()
cpufreq: Reorder and simplify cpufreq_update_policy()
cpufreq: Add kerneldoc comments for two core functions
PM / core: Add support to skip power management in device/driver model
cpufreq: intel_pstate: Rework iowait boosting to be less aggressive
cpufreq: intel_pstate: Eliminate intel_pstate_get_base_pstate()
cpufreq: intel_pstate: Avoid redundant initialization of local vars
powercap/intel_rapl: add Ice Lake mobile
ACPI / processor: Set P_LVL{2,3} idle state descriptions
cpufreq / cppc: Work around for Hisilicon CPPC cpufreq
ACPI / CPPC: Add a helper to get desired performance
cpufreq: davinci: move configuration to include/linux/platform_data
cpufreq: speedstep: convert BUG() to BUG_ON()
cpufreq: powernv: fix missing check of return value in init_powernv_pstates()
cpufreq: longhaul: remove unneeded semicolon
cpufreq: pcc-cpufreq: remove unneeded semicolon
...

+1923 -560
+96 -8
Documentation/admin-guide/pm/cpuidle.rst
···
 and that is the primary reason for having more than one governor in the
 ``CPUIdle`` subsystem.
 
-There are two ``CPUIdle`` governors available, ``menu`` and ``ladder``.  Which
-of them is used depends on the configuration of the kernel and in particular on
-whether or not the scheduler tick can be `stopped by the idle
-loop <idle-cpus-and-tick_>`_.  It is possible to change the governor at run time
-if the ``cpuidle_sysfs_switch`` command line parameter has been passed to the
-kernel, but that is not safe in general, so it should not be done on production
-systems (that may change in the future, though).  The name of the ``CPUIdle``
-governor currently used by the kernel can be read from the
+There are three ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_
+and ``ladder``.  Which of them is used by default depends on the configuration
+of the kernel and in particular on whether or not the scheduler tick can be
+`stopped by the idle loop <idle-cpus-and-tick_>`_.  It is possible to change the
+governor at run time if the ``cpuidle_sysfs_switch`` command line parameter has
+been passed to the kernel, but that is not safe in general, so it should not be
+done on production systems (that may change in the future, though).  The name of
+the ``CPUIdle`` governor currently used by the kernel can be read from the
 :file:`current_governor_ro` (or :file:`current_governor` if
 ``cpuidle_sysfs_switch`` is present in the kernel command line) file under
 :file:`/sys/devices/system/cpu/cpuidle/` in ``sysfs``.
···
 ``CPUIdle`` governor on it will be ``ladder``.
 
 
+.. _menu-gov:
+
 The ``menu`` Governor
 =====================
 
···
 the real time until the closest timer event and if it really is greater than
 that time, the governor may need to select a shallower state with a suitable
 target residency.
+
+
+.. _teo-gov:
+
+The Timer Events Oriented (TEO) Governor
+========================================
+
+The timer events oriented (TEO) governor is an alternative ``CPUIdle`` governor
+for tickless systems.  It follows the same basic strategy as the ``menu`` `one
+<menu-gov_>`_: it always tries to find the deepest idle state suitable for the
+given conditions.  However, it applies a different approach to that problem.
+
+First, it does not use sleep length correction factors, but instead it attempts
+to correlate the observed idle duration values with the available idle states
+and use that information to pick up the idle state that is most likely to
+"match" the upcoming CPU idle interval.  Second, it does not take the tasks
+that were running on the given CPU in the past and are waiting on some I/O
+operations to complete now into account at all (there is no guarantee that they
+will run on the same CPU when they become runnable again) and the pattern
+detection code in it avoids taking timer wakeups into account.  It also only
+uses idle duration values less than the current time till the closest timer
+(with the scheduler tick excluded) for that purpose.
+
+Like in the ``menu`` governor `case <menu-gov_>`_, the first step is to obtain
+the *sleep length*, which is the time until the closest timer event with the
+assumption that the scheduler tick will be stopped (that also is the upper bound
+on the time until the next CPU wakeup).  That value is then used to preselect an
+idle state on the basis of three metrics maintained for each idle state provided
+by the ``CPUIdle`` driver: ``hits``, ``misses`` and ``early_hits``.
+
+The ``hits`` and ``misses`` metrics measure the likelihood that a given idle
+state will "match" the observed (post-wakeup) idle duration if it "matches" the
+sleep length.  They both are subject to decay (after a CPU wakeup) every time
+the target residency of the idle state corresponding to them is less than or
+equal to the sleep length and the target residency of the next idle state is
+greater than the sleep length (that is, when the idle state corresponding to
+them "matches" the sleep length).  The ``hits`` metric is increased if the
+former condition is satisfied and the target residency of the given idle state
+is less than or equal to the observed idle duration and the target residency of
+the next idle state is greater than the observed idle duration at the same time
+(that is, it is increased when the given idle state "matches" both the sleep
+length and the observed idle duration).  In turn, the ``misses`` metric is
+increased when the given idle state "matches" the sleep length only and the
+observed idle duration is too short for its target residency.
+
+The ``early_hits`` metric measures the likelihood that a given idle state will
+"match" the observed (post-wakeup) idle duration if it does not "match" the
+sleep length.  It is subject to decay on every CPU wakeup and it is increased
+when the idle state corresponding to it "matches" the observed (post-wakeup)
+idle duration and the target residency of the next idle state is less than or
+equal to the sleep length (i.e. the idle state "matching" the sleep length is
+deeper than the given one).
+
+The governor walks the list of idle states provided by the ``CPUIdle`` driver
+and finds the last (deepest) one with the target residency less than or equal
+to the sleep length.  Then, the ``hits`` and ``misses`` metrics of that idle
+state are compared with each other and it is preselected if the ``hits`` one is
+greater (which means that that idle state is likely to "match" the observed idle
+duration after CPU wakeup).  If the ``misses`` one is greater, the governor
+preselects the shallower idle state with the maximum ``early_hits`` metric
+(or if there are multiple shallower idle states with equal ``early_hits``
+metric which also is the maximum, the shallowest of them will be preselected).
+[If there is a wakeup latency constraint coming from the `PM QoS framework
+<cpu-pm-qos_>`_ which is hit before reaching the deepest idle state with the
+target residency within the sleep length, the deepest idle state with the exit
+latency within the constraint is preselected without consulting the ``hits``,
+``misses`` and ``early_hits`` metrics.]
+
+Next, the governor takes several idle duration values observed most recently
+into consideration and if at least a half of them are greater than or equal to
+the target residency of the preselected idle state, that idle state becomes the
+final candidate to ask for.  Otherwise, the average of the most recent idle
+duration values below the target residency of the preselected idle state is
+computed and the governor walks the idle states shallower than the preselected
+one and finds the deepest of them with the target residency within that average.
+That idle state is then taken as the final candidate to ask for.
+
+Still, at this point the governor may need to refine the idle state selection if
+it has not decided to `stop the scheduler tick <idle-cpus-and-tick_>`_.  That
+generally happens if the target residency of the idle state selected so far is
+less than the tick period and the tick has not been stopped already (in a
+previous iteration of the idle loop).  Then, like in the ``menu`` governor
+`case <menu-gov_>`_, the sleep length used in the previous computations may not
+reflect the real time until the closest timer event and if it really is greater
+than that time, a shallower state with a suitable target residency may need to
+be selected.
 
 
 .. _idle-states-representation:
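The ``hits``/``misses``/``early_hits`` bookkeeping described above can be modeled in a few lines of plain user-space C. This is a simplified, illustrative sketch of the selection logic only, not the kernel implementation (the real code lives in ``drivers/cpuidle/governors/teo.c`` and differs in detail; the decay shift and pulse values below are made up for the example):

```c
#include <assert.h>

/* Simplified model of the per-state TEO metrics (illustrative only). */
struct state {
	unsigned int target_residency;	/* microseconds */
	unsigned int hits, misses, early_hits;
};

#define DECAY_SHIFT	3	/* hypothetical decay factor */
#define PULSE		1024	/* hypothetical increment */

/*
 * Update the metrics after a wakeup, given the sleep length that was
 * predicted and the idle duration that was actually observed.
 */
static void update(struct state *s, int n, unsigned int sleep_len,
		   unsigned int observed)
{
	for (int i = 0; i < n; i++) {
		int matches_sleep = s[i].target_residency <= sleep_len &&
			(i == n - 1 || s[i + 1].target_residency > sleep_len);
		int matches_obs = s[i].target_residency <= observed &&
			(i == n - 1 || s[i + 1].target_residency > observed);

		if (matches_sleep) {
			/* Decay both counters, then bump the one that applies. */
			s[i].hits -= s[i].hits >> DECAY_SHIFT;
			s[i].misses -= s[i].misses >> DECAY_SHIFT;
			if (matches_obs)
				s[i].hits += PULSE;
			else if (observed < s[i].target_residency)
				s[i].misses += PULSE;
		}
		/* early_hits decays on every wakeup. */
		s[i].early_hits -= s[i].early_hits >> DECAY_SHIFT;
		if (matches_obs && !matches_sleep &&
		    i < n - 1 && s[i + 1].target_residency <= sleep_len)
			s[i].early_hits += PULSE;
	}
}

/*
 * Preselect: deepest state with target residency within the sleep length;
 * if misses dominate hits for it, fall back to the shallower state with
 * the maximum early_hits (ties go to the shallowest).
 */
static int preselect(const struct state *s, int n, unsigned int sleep_len)
{
	int idx = 0;

	for (int i = 0; i < n; i++)
		if (s[i].target_residency <= sleep_len)
			idx = i;
	if (s[idx].misses > s[idx].hits) {
		int best = 0;

		for (int i = 0; i < idx; i++)
			if (s[i].early_hits > s[best].early_hits)
				best = i;
		idx = best;
	}
	return idx;
}
```

With three states (target residencies 0, 100 and 1000 us) and a sleep length of 1500 us, the deepest state is preselected at first; after a few wakeups whose observed idle duration is only 50 us, ``misses`` of the deepest state dominates and the model falls back to the shallow state with the largest ``early_hits``.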
-37
Documentation/cpuidle/driver.txt
···
-
-
-Supporting multiple CPU idle levels in kernel
-
-cpuidle drivers
-
-
-
-
-cpuidle driver hooks into the cpuidle infrastructure and handles the
-architecture/platform dependent part of CPU idle states. Driver
-provides the platform idle state detection capability and also
-has mechanisms in place to support actual entry-exit into CPU idle states.
-
-cpuidle driver initializes the cpuidle_device structure for each CPU device
-and registers with cpuidle using cpuidle_register_device.
-
-If all the idle states are the same, the wrapper function cpuidle_register
-could be used instead.
-
-It can also support the dynamic changes (like battery <-> AC), by using
-cpuidle_pause_and_lock, cpuidle_disable_device and cpuidle_enable_device,
-cpuidle_resume_and_unlock.
-
-Interfaces:
-extern int cpuidle_register(struct cpuidle_driver *drv,
-                            const struct cpumask *const coupled_cpus);
-extern int cpuidle_unregister(struct cpuidle_driver *drv);
-extern int cpuidle_register_driver(struct cpuidle_driver *drv);
-extern void cpuidle_unregister_driver(struct cpuidle_driver *drv);
-extern int cpuidle_register_device(struct cpuidle_device *dev);
-extern void cpuidle_unregister_device(struct cpuidle_device *dev);
-
-extern void cpuidle_pause_and_lock(void);
-extern void cpuidle_resume_and_unlock(void);
-extern int cpuidle_enable_device(struct cpuidle_device *dev);
-extern void cpuidle_disable_device(struct cpuidle_device *dev);
-28
Documentation/cpuidle/governor.txt
···
-
-
-
-Supporting multiple CPU idle levels in kernel
-
-cpuidle governors
-
-
-
-
-cpuidle governor is policy routine that decides what idle state to enter at
-any given time. cpuidle core uses different callbacks to the governor.
-
-* enable() to enable governor for a particular device
-* disable() to disable governor for a particular device
-* select() to select an idle state to enter
-* reflect() called after returning from the idle state, which can be used
-  by the governor for some record keeping.
-
-More than one governor can be registered at the same time and
-users can switch between drivers using /sysfs interface (when enabled).
-More than one governor part is supported for developers to easily experiment
-with different governors. By default, most optimal governor based on your
-kernel configuration and platform will be selected by cpuidle.
-
-Interfaces:
-extern int cpuidle_register_governor(struct cpuidle_governor *gov);
-struct cpuidle_governor
+282
Documentation/driver-api/pm/cpuidle.rst
···
+.. |struct cpuidle_governor| replace:: :c:type:`struct cpuidle_governor <cpuidle_governor>`
+.. |struct cpuidle_device| replace:: :c:type:`struct cpuidle_device <cpuidle_device>`
+.. |struct cpuidle_driver| replace:: :c:type:`struct cpuidle_driver <cpuidle_driver>`
+.. |struct cpuidle_state| replace:: :c:type:`struct cpuidle_state <cpuidle_state>`
+
+========================
+CPU Idle Time Management
+========================
+
+::
+
+ Copyright (c) 2019 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+
+
+CPU Idle Time Management Subsystem
+==================================
+
+Every time one of the logical CPUs in the system (the entities that appear to
+fetch and execute instructions: hardware threads, if present, or processor
+cores) is idle after an interrupt or equivalent wakeup event, which means that
+there are no tasks to run on it except for the special "idle" task associated
+with it, there is an opportunity to save energy for the processor that it
+belongs to.  That can be done by making the idle logical CPU stop fetching
+instructions from memory and putting some of the processor's functional units
+depended on by it into an idle state in which they will draw less power.
+
+However, there may be multiple different idle states that can be used in such a
+situation in principle, so it may be necessary to find the most suitable one
+(from the kernel perspective) and ask the processor to use (or "enter") that
+particular idle state.  That is the role of the CPU idle time management
+subsystem in the kernel, called ``CPUIdle``.
+
+The design of ``CPUIdle`` is modular and based on the code duplication avoidance
+principle, so the generic code that in principle need not depend on the hardware
+or platform design details in it is separate from the code that interacts with
+the hardware.  It generally is divided into three categories of functional
+units: *governors* responsible for selecting idle states to ask the processor
+to enter, *drivers* that pass the governors' decisions on to the hardware and
+the *core* providing a common framework for them.
+
+
+CPU Idle Time Governors
+=======================
+
+A CPU idle time (``CPUIdle``) governor is a bundle of policy code invoked when
+one of the logical CPUs in the system turns out to be idle.  Its role is to
+select an idle state to ask the processor to enter in order to save some energy.
+
+``CPUIdle`` governors are generic and each of them can be used on any hardware
+platform that the Linux kernel can run on.  For this reason, data structures
+operated on by them cannot depend on any hardware architecture or platform
+design details as well.
+
+The governor itself is represented by a |struct cpuidle_governor| object
+containing four callback pointers, :c:member:`enable`, :c:member:`disable`,
+:c:member:`select`, :c:member:`reflect`, a :c:member:`rating` field described
+below, and a name (string) used for identifying it.
+
+For the governor to be available at all, that object needs to be registered
+with the ``CPUIdle`` core by calling :c:func:`cpuidle_register_governor()` with
+a pointer to it passed as the argument.  If successful, that causes the core to
+add the governor to the global list of available governors and, if it is the
+only one in the list (that is, the list was empty before) or the value of its
+:c:member:`rating` field is greater than the value of that field for the
+governor currently in use, or the name of the new governor was passed to the
+kernel as the value of the ``cpuidle.governor=`` command line parameter, the new
+governor will be used from that point on (there can be only one ``CPUIdle``
+governor in use at a time).  Also, if ``cpuidle_sysfs_switch`` is passed to the
+kernel in the command line, user space can choose the ``CPUIdle`` governor to
+use at run time via ``sysfs``.
+
+Once registered, ``CPUIdle`` governors cannot be unregistered, so it is not
+practical to put them into loadable kernel modules.
+
+The interface between ``CPUIdle`` governors and the core consists of four
+callbacks:
+
+:c:member:`enable`
+    ::
+
+      int (*enable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);
+
+    The role of this callback is to prepare the governor for handling the
+    (logical) CPU represented by the |struct cpuidle_device| object pointed
+    to by the ``dev`` argument.  The |struct cpuidle_driver| object pointed
+    to by the ``drv`` argument represents the ``CPUIdle`` driver to be used
+    with that CPU (among other things, it should contain the list of
+    |struct cpuidle_state| objects representing idle states that the
+    processor holding the given CPU can be asked to enter).
+
+    It may fail, in which case it is expected to return a negative error
+    code, and that causes the kernel to run the architecture-specific
+    default code for idle CPUs on the CPU in question instead of ``CPUIdle``
+    until the ``->enable()`` governor callback is invoked for that CPU
+    again.
+
+:c:member:`disable`
+    ::
+
+      void (*disable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);
+
+    Called to make the governor stop handling the (logical) CPU represented
+    by the |struct cpuidle_device| object pointed to by the ``dev``
+    argument.
+
+    It is expected to reverse any changes made by the ``->enable()``
+    callback when it was last invoked for the target CPU, free all memory
+    allocated by that callback and so on.
+
+:c:member:`select`
+    ::
+
+      int (*select) (struct cpuidle_driver *drv, struct cpuidle_device *dev,
+                     bool *stop_tick);
+
+    Called to select an idle state for the processor holding the (logical)
+    CPU represented by the |struct cpuidle_device| object pointed to by the
+    ``dev`` argument.
+
+    The list of idle states to take into consideration is represented by the
+    :c:member:`states` array of |struct cpuidle_state| objects held by the
+    |struct cpuidle_driver| object pointed to by the ``drv`` argument (which
+    represents the ``CPUIdle`` driver to be used with the CPU at hand).  The
+    value returned by this callback is interpreted as an index into that
+    array (unless it is a negative error code).
+
+    The ``stop_tick`` argument is used to indicate whether or not to stop
+    the scheduler tick before asking the processor to enter the selected
+    idle state.  When the ``bool`` variable pointed to by it (which is set
+    to ``true`` before invoking this callback) is cleared to ``false``, the
+    processor will be asked to enter the selected idle state without
+    stopping the scheduler tick on the given CPU (if the tick has been
+    stopped on that CPU already, however, it will not be restarted before
+    asking the processor to enter the idle state).
+
+    This callback is mandatory (i.e. the :c:member:`select` callback pointer
+    in |struct cpuidle_governor| must not be ``NULL`` for the registration
+    of the governor to succeed).
+
+:c:member:`reflect`
+    ::
+
+      void (*reflect) (struct cpuidle_device *dev, int index);
+
+    Called to allow the governor to evaluate the accuracy of the idle state
+    selection made by the ``->select()`` callback (when it was invoked last
+    time) and possibly use the result of that to improve the accuracy of
+    idle state selections in the future.
+
+In addition, ``CPUIdle`` governors are required to take power management
+quality of service (PM QoS) constraints on the processor wakeup latency into
+account when selecting idle states.  In order to obtain the current effective
+PM QoS wakeup latency constraint for a given CPU, a ``CPUIdle`` governor is
+expected to pass the number of the CPU to
+:c:func:`cpuidle_governor_latency_req()`.  Then, the governor's ``->select()``
+callback must not return the index of an idle state whose
+:c:member:`exit_latency` value is greater than the number returned by that
+function.
+
+
+CPU Idle Time Management Drivers
+================================
+
+CPU idle time management (``CPUIdle``) drivers provide an interface between the
+other parts of ``CPUIdle`` and the hardware.
+
+First of all, a ``CPUIdle`` driver has to populate the :c:member:`states` array
+of |struct cpuidle_state| objects included in the |struct cpuidle_driver| object
+representing it.  Going forward this array will represent the list of available
+idle states that the processor hardware can be asked to enter shared by all of
+the logical CPUs handled by the given driver.
+
+The entries in the :c:member:`states` array are expected to be sorted by the
+value of the :c:member:`target_residency` field in |struct cpuidle_state| in
+the ascending order (that is, index 0 should correspond to the idle state with
+the minimum value of :c:member:`target_residency`).  [Since the
+:c:member:`target_residency` value is expected to reflect the "depth" of the
+idle state represented by the |struct cpuidle_state| object holding it, this
+sorting order should be the same as the ascending sorting order by the idle
+state "depth".]
+
+Three fields in |struct cpuidle_state| are used by the existing ``CPUIdle``
+governors for computations related to idle state selection:
+
+:c:member:`target_residency`
+    Minimum time to spend in this idle state including the time needed to
+    enter it (which may be substantial) to save more energy than could
+    be saved by staying in a shallower idle state for the same amount of
+    time, in microseconds.
+
+:c:member:`exit_latency`
+    Maximum time it will take a CPU asking the processor to enter this idle
+    state to start executing the first instruction after a wakeup from it,
+    in microseconds.
+
+:c:member:`flags`
+    Flags representing idle state properties.  Currently, governors only use
+    the ``CPUIDLE_FLAG_POLLING`` flag which is set if the given object
+    does not represent a real idle state, but an interface to a software
+    "loop" that can be used in order to avoid asking the processor to enter
+    any idle state at all.  [There are other flags used by the ``CPUIdle``
+    core in special situations.]
+
+The :c:member:`enter` callback pointer in |struct cpuidle_state|, which must not
+be ``NULL``, points to the routine to execute in order to ask the processor to
+enter this particular idle state:
+
+::
+
+  void (*enter) (struct cpuidle_device *dev, struct cpuidle_driver *drv,
+                 int index);
+
+The first two arguments of it point to the |struct cpuidle_device| object
+representing the logical CPU running this callback and the
+|struct cpuidle_driver| object representing the driver itself, respectively,
+and the last one is an index of the |struct cpuidle_state| entry in the driver's
+:c:member:`states` array representing the idle state to ask the processor to
+enter.
+
+The analogous ``->enter_s2idle()`` callback in |struct cpuidle_state| is used
+only for implementing the suspend-to-idle system-wide power management feature.
+The difference between it and ``->enter()`` is that it must not re-enable
+interrupts at any point (even temporarily) or attempt to change the states of
+clock event devices, which the ``->enter()`` callback may do sometimes.
+
+Once the :c:member:`states` array has been populated, the number of valid
+entries in it has to be stored in the :c:member:`state_count` field of the
+|struct cpuidle_driver| object representing the driver.  Moreover, if any
+entries in the :c:member:`states` array represent "coupled" idle states (that
+is, idle states that can only be asked for if multiple related logical CPUs are
+idle), the :c:member:`safe_state_index` field in |struct cpuidle_driver| needs
+to be the index of an idle state that is not "coupled" (that is, one that can be
+asked for if only one logical CPU is idle).
+
+In addition to that, if the given ``CPUIdle`` driver is only going to handle a
+subset of logical CPUs in the system, the :c:member:`cpumask` field in its
+|struct cpuidle_driver| object must point to the set (mask) of CPUs that will be
+handled by it.
+
+A ``CPUIdle`` driver can only be used after it has been registered.  If there
+are no "coupled" idle state entries in the driver's :c:member:`states` array,
+that can be accomplished by passing the driver's |struct cpuidle_driver| object
+to :c:func:`cpuidle_register_driver()`.  Otherwise, :c:func:`cpuidle_register()`
+should be used for this purpose.
+
+However, it also is necessary to register |struct cpuidle_device| objects for
+all of the logical CPUs to be handled by the given ``CPUIdle`` driver with the
+help of :c:func:`cpuidle_register_device()` after the driver has been registered
+and :c:func:`cpuidle_register_driver()`, unlike :c:func:`cpuidle_register()`,
+does not do that automatically.  For this reason, the drivers that use
+:c:func:`cpuidle_register_driver()` to register themselves must also take care
+of registering the |struct cpuidle_device| objects as needed, so it is generally
+recommended to use :c:func:`cpuidle_register()` for ``CPUIdle`` driver
+registration in all cases.
+
+The registration of a |struct cpuidle_device| object causes the ``CPUIdle``
+``sysfs`` interface to be created and the governor's ``->enable()`` callback to
+be invoked for the logical CPU represented by it, so it must take place after
+registering the driver that will handle the CPU in question.
+
+``CPUIdle`` drivers and |struct cpuidle_device| objects can be unregistered
+when they are not necessary any more which allows some resources associated with
+them to be released.  Due to dependencies between them, all of the
+|struct cpuidle_device| objects representing CPUs handled by the given
+``CPUIdle`` driver must be unregistered, with the help of
+:c:func:`cpuidle_unregister_device()`, before calling
+:c:func:`cpuidle_unregister_driver()` to unregister the driver.  Alternatively,
+:c:func:`cpuidle_unregister()` can be called to unregister a ``CPUIdle`` driver
+along with all of the |struct cpuidle_device| objects representing CPUs handled
+by it.
+
+``CPUIdle`` drivers can respond to runtime system configuration changes that
+lead to modifications of the list of available processor idle states (which can
+happen, for example, when the system's power source is switched from AC to
+battery or the other way around).  Upon a notification of such a change,
+a ``CPUIdle`` driver is expected to call :c:func:`cpuidle_pause_and_lock()` to
+turn ``CPUIdle`` off temporarily and then :c:func:`cpuidle_disable_device()` for
+all of the |struct cpuidle_device| objects representing CPUs affected by that
+change.  Next, it can update its :c:member:`states` array in accordance with
+the new configuration of the system, call :c:func:`cpuidle_enable_device()` for
+all of the relevant |struct cpuidle_device| objects and invoke
+:c:func:`cpuidle_resume_and_unlock()` to allow ``CPUIdle`` to be used again.
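The governor registration rules described in the new document (mandatory ``->select()``, highest ``rating`` wins) can be sketched as a small user-space model. The names below mirror ``struct cpuidle_governor`` but this is illustrative only, not kernel code; ``model_register_governor()`` and ``pick_shallowest()`` are hypothetical helpers invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of the governor/core contract (illustrative only). */
struct model_governor {
	const char *name;
	unsigned int rating;
	int  (*enable)(void *drv, void *dev);
	void (*disable)(void *drv, void *dev);
	int  (*select)(void *drv, void *dev, int *stop_tick);
	void (*reflect)(void *dev, int index);
};

static struct model_governor *cur_gov;	/* the one governor in use */

/*
 * Models the registration rules: ->select() is mandatory, and a newly
 * registered governor takes over only if no governor is in use yet or
 * its rating is greater than the current governor's.
 */
static int model_register_governor(struct model_governor *gov)
{
	if (!gov || !gov->select)
		return -1;	/* the kernel rejects such a registration */
	if (!cur_gov || gov->rating > cur_gov->rating)
		cur_gov = gov;
	return 0;
}

/*
 * A trivial ->select() implementation: always pick state 0 and ask the
 * core not to stop the scheduler tick.
 */
static int pick_shallowest(void *drv, void *dev, int *stop_tick)
{
	(void)drv; (void)dev;
	*stop_tick = 0;
	return 0;	/* index into the driver's states[] array */
}
```

Registering a second governor with a lower rating succeeds but does not change the governor in use, which matches the behavior the document describes.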
+4 -3
Documentation/driver-api/pm/index.rst
···
-=======================
-Device Power Management
-=======================
+===============================
+CPU and Device Power Management
+===============================
 
 .. toctree::
 
+   cpuidle
    devices
    notifiers
    types
+8 -6
MAINTAINERS
···
 F: arch/arm/mach-mvebu/
 F: arch/arm64/boot/dts/marvell/armada*
 F: drivers/cpufreq/armada-37xx-cpufreq.c
+F: drivers/cpufreq/armada-8k-cpufreq.c
 F: drivers/cpufreq/mvebu-cpufreq.c
 F: drivers/irqchip/irq-armada-370-xp.c
 F: drivers/irqchip/irq-mvebu-*
···
 L: linux-pm@vger.kernel.org
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
-T: git git://git.linaro.org/people/vireshk/linux.git (For ARM Updates)
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git (For ARM Updates)
 B: https://bugzilla.kernel.org
 F: Documentation/admin-guide/pm/cpufreq.rst
 F: Documentation/admin-guide/pm/intel_pstate.rst
···
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
 B: https://bugzilla.kernel.org
 F: Documentation/admin-guide/pm/cpuidle.rst
+F: Documentation/driver-api/pm/cpuidle.rst
 F: drivers/cpuidle/*
 F: include/linux/cpuidle.h
···
 F: drivers/media/platform/qcom/camss/
 
 QUALCOMM CPUFREQ DRIVER MSM8996/APQ8096
-M: Ilia Lin <ilia.lin@gmail.com>
-L: linux-pm@vger.kernel.org
-S: Maintained
-F: Documentation/devicetree/bindings/opp/kryo-cpufreq.txt
-F: drivers/cpufreq/qcom-cpufreq-kryo.c
+M: Ilia Lin <ilia.lin@kernel.org>
+L: linux-pm@vger.kernel.org
+S: Maintained
+F: Documentation/devicetree/bindings/opp/kryo-cpufreq.txt
+F: drivers/cpufreq/qcom-cpufreq-kryo.c
 
 QUALCOMM EMAC GIGABIT ETHERNET DRIVER
 M: Timur Tabi <timur@kernel.org>
+1 -1
arch/arm/mach-davinci/da850.c
··· 22 22 #include <linux/mfd/da8xx-cfgchip.h> 23 23 #include <linux/platform_data/clk-da8xx-cfgchip.h> 24 24 #include <linux/platform_data/clk-davinci-pll.h> 25 + #include <linux/platform_data/davinci-cpufreq.h> 25 26 #include <linux/platform_data/gpio-davinci.h> 26 27 #include <linux/platform_device.h> 27 28 #include <linux/regmap.h> ··· 31 30 #include <asm/mach/map.h> 32 31 33 32 #include <mach/common.h> 34 - #include <mach/cpufreq.h> 35 33 #include <mach/cputype.h> 36 34 #include <mach/da8xx.h> 37 35 #include <mach/pm.h>
-26
arch/arm/mach-davinci/include/mach/cpufreq.h
··· 1 - /* 2 - * TI DaVinci CPUFreq platform support. 3 - * 4 - * Copyright (C) 2009 Texas Instruments, Inc. http://www.ti.com/ 5 - * 6 - * This program is free software; you can redistribute it and/or 7 - * modify it under the terms of the GNU General Public License as 8 - * published by the Free Software Foundation version 2. 9 - * 10 - * This program is distributed "as is" WITHOUT ANY WARRANTY of any 11 - * kind, whether express or implied; without even the implied warranty 12 - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13 - * GNU General Public License for more details. 14 - */ 15 - #ifndef _MACH_DAVINCI_CPUFREQ_H 16 - #define _MACH_DAVINCI_CPUFREQ_H 17 - 18 - #include <linux/cpufreq.h> 19 - 20 - struct davinci_cpufreq_config { 21 - struct cpufreq_frequency_table *freq_table; 22 - int (*set_voltage) (unsigned int index); 23 - int (*init) (void); 24 - }; 25 - 26 - #endif
+42
drivers/acpi/cppc_acpi.c
··· 1051 1051 } 1052 1052 1053 1053 /** 1054 + * cppc_get_desired_perf - Get the value of desired performance register. 1055 + * @cpunum: CPU from which to get desired performance. 1056 + * @desired_perf: address of a variable to store the returned desired performance 1057 + * 1058 + * Return: 0 for success, -EIO otherwise. 1059 + */ 1060 + int cppc_get_desired_perf(int cpunum, u64 *desired_perf) 1061 + { 1062 + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum); 1063 + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum); 1064 + struct cpc_register_resource *desired_reg; 1065 + struct cppc_pcc_data *pcc_ss_data = NULL; 1066 + 1067 + desired_reg = &cpc_desc->cpc_regs[DESIRED_PERF]; 1068 + 1069 + if (CPC_IN_PCC(desired_reg)) { 1070 + int ret = 0; 1071 + 1072 + if (pcc_ss_id < 0) 1073 + return -EIO; 1074 + 1075 + pcc_ss_data = pcc_data[pcc_ss_id]; 1076 + 1077 + down_write(&pcc_ss_data->pcc_lock); 1078 + 1079 + if (send_pcc_cmd(pcc_ss_id, CMD_READ) >= 0) 1080 + cpc_read(cpunum, desired_reg, desired_perf); 1081 + else 1082 + ret = -EIO; 1083 + 1084 + up_write(&pcc_ss_data->pcc_lock); 1085 + 1086 + return ret; 1087 + } 1088 + 1089 + cpc_read(cpunum, desired_reg, desired_perf); 1090 + 1091 + return 0; 1092 + } 1093 + EXPORT_SYMBOL_GPL(cppc_get_desired_perf); 1094 + 1095 + /** 1054 1096 * cppc_get_perf_caps - Get a CPUs performance capabilities. 1055 1097 * @cpunum: CPU from which to get capabilities info. 1056 1098 * @perf_caps: ptr to cppc_perf_caps. See cppc_acpi.h
+7
drivers/acpi/processor_idle.c
··· 282 282 pr->power.states[ACPI_STATE_C2].address, 283 283 pr->power.states[ACPI_STATE_C3].address)); 284 284 285 + snprintf(pr->power.states[ACPI_STATE_C2].desc, 286 + ACPI_CX_DESC_LEN, "ACPI P_LVL2 IOPORT 0x%x", 287 + pr->power.states[ACPI_STATE_C2].address); 288 + snprintf(pr->power.states[ACPI_STATE_C3].desc, 289 + ACPI_CX_DESC_LEN, "ACPI P_LVL3 IOPORT 0x%x", 290 + pr->power.states[ACPI_STATE_C3].address); 291 + 285 292 return 0; 286 293 } 287 294
+1
drivers/base/cpu.c
··· 427 427 dev->parent = parent; 428 428 dev->groups = groups; 429 429 dev->release = device_create_release; 430 + device_set_pm_not_required(dev); 430 431 dev_set_drvdata(dev, drvdata); 431 432 432 433 retval = kobject_set_name_vargs(&dev->kobj, fmt, args);
+9 -4
drivers/base/power/clock_ops.c
··· 65 65 if (IS_ERR(ce->clk)) { 66 66 ce->status = PCE_STATUS_ERROR; 67 67 } else { 68 - clk_prepare(ce->clk); 69 - ce->status = PCE_STATUS_ACQUIRED; 70 - dev_dbg(dev, "Clock %pC con_id %s managed by runtime PM.\n", 71 - ce->clk, ce->con_id); 68 + if (clk_prepare(ce->clk)) { 69 + ce->status = PCE_STATUS_ERROR; 70 + dev_err(dev, "clk_prepare() failed\n"); 71 + } else { 72 + ce->status = PCE_STATUS_ACQUIRED; 73 + dev_dbg(dev, 74 + "Clock %pC con_id %s managed by runtime PM.\n", 75 + ce->clk, ce->con_id); 76 + } 72 77 } 73 78 } 74 79
+1 -1
drivers/base/power/common.c
··· 160 160 * For a detailed function description, see dev_pm_domain_attach_by_id(). 161 161 */ 162 162 struct device *dev_pm_domain_attach_by_name(struct device *dev, 163 - char *name) 163 + const char *name) 164 164 { 165 165 if (dev->pm_domain) 166 166 return ERR_PTR(-EEXIST);
+3 -10
drivers/base/power/domain.c
··· 2483 2483 * power-domain-names DT property. For further description see 2484 2484 * genpd_dev_pm_attach_by_id(). 2485 2485 */ 2486 - struct device *genpd_dev_pm_attach_by_name(struct device *dev, char *name) 2486 + struct device *genpd_dev_pm_attach_by_name(struct device *dev, const char *name) 2487 2487 { 2488 2488 int index; 2489 2489 ··· 2948 2948 2949 2949 genpd_debugfs_dir = debugfs_create_dir("pm_genpd", NULL); 2950 2950 2951 - if (!genpd_debugfs_dir) 2952 - return -ENOMEM; 2953 - 2954 - d = debugfs_create_file("pm_genpd_summary", S_IRUGO, 2955 - genpd_debugfs_dir, NULL, &summary_fops); 2956 - if (!d) 2957 - return -ENOMEM; 2951 + debugfs_create_file("pm_genpd_summary", S_IRUGO, genpd_debugfs_dir, 2952 + NULL, &summary_fops); 2958 2953 2959 2954 list_for_each_entry(genpd, &gpd_list, gpd_list_node) { 2960 2955 d = debugfs_create_dir(genpd->name, genpd_debugfs_dir); 2961 - if (!d) 2962 - return -ENOMEM; 2963 2956 2964 2957 debugfs_create_file("current_state", 0444, 2965 2958 d, genpd, &status_fops);
+10 -1
drivers/base/power/main.c
··· 124 124 */ 125 125 void device_pm_add(struct device *dev) 126 126 { 127 + /* Skip PM setup/initialization. */ 128 + if (device_pm_not_required(dev)) 129 + return; 130 + 127 131 pr_debug("PM: Adding info for %s:%s\n", 128 132 dev->bus ? dev->bus->name : "No Bus", dev_name(dev)); 129 133 device_pm_check_callbacks(dev); ··· 146 142 */ 147 143 void device_pm_remove(struct device *dev) 148 144 { 145 + if (device_pm_not_required(dev)) 146 + return; 147 + 149 148 pr_debug("PM: Removing info for %s:%s\n", 150 149 dev->bus ? dev->bus->name : "No Bus", dev_name(dev)); 151 150 complete_all(&dev->power.completion); ··· 1748 1741 if (dev->power.direct_complete) { 1749 1742 if (pm_runtime_status_suspended(dev)) { 1750 1743 pm_runtime_disable(dev); 1751 - if (pm_runtime_status_suspended(dev)) 1744 + if (pm_runtime_status_suspended(dev)) { 1745 + pm_dev_dbg(dev, state, "direct-complete "); 1752 1746 goto Complete; 1747 + } 1753 1748 1754 1749 pm_runtime_enable(dev); 1755 1750 }
+52 -22
drivers/base/power/runtime.c
··· 66 66 */ 67 67 void update_pm_runtime_accounting(struct device *dev) 68 68 { 69 - unsigned long now = jiffies; 70 - unsigned long delta; 71 - 72 - delta = now - dev->power.accounting_timestamp; 73 - 74 - dev->power.accounting_timestamp = now; 69 + u64 now, last, delta; 75 70 76 71 if (dev->power.disable_depth > 0) 77 72 return; 78 73 74 + last = dev->power.accounting_timestamp; 75 + 76 + now = ktime_get_mono_fast_ns(); 77 + dev->power.accounting_timestamp = now; 78 + 79 + /* 80 + * Because ktime_get_mono_fast_ns() is not monotonic during 81 + * timekeeping updates, ensure that 'now' is after the last saved 82 + * timestamp. 83 + */ 84 + if (now < last) 85 + return; 86 + 87 + delta = now - last; 88 + 79 89 if (dev->power.runtime_status == RPM_SUSPENDED) 80 - dev->power.suspended_jiffies += delta; 90 + dev->power.suspended_time += delta; 81 91 else 82 - dev->power.active_jiffies += delta; 92 + dev->power.active_time += delta; 83 93 } 84 94 85 95 static void __update_runtime_status(struct device *dev, enum rpm_status status) ··· 97 87 update_pm_runtime_accounting(dev); 98 88 dev->power.runtime_status = status; 99 89 } 90 + 91 + u64 pm_runtime_suspended_time(struct device *dev) 92 + { 93 + u64 time; 94 + unsigned long flags; 95 + 96 + spin_lock_irqsave(&dev->power.lock, flags); 97 + 98 + update_pm_runtime_accounting(dev); 99 + time = dev->power.suspended_time; 100 + 101 + spin_unlock_irqrestore(&dev->power.lock, flags); 102 + 103 + return time; 104 + } 105 + EXPORT_SYMBOL_GPL(pm_runtime_suspended_time); 100 106 101 107 /** 102 108 * pm_runtime_deactivate_timer - Deactivate given device's suspend timer. 
··· 155 129 u64 pm_runtime_autosuspend_expiration(struct device *dev) 156 130 { 157 131 int autosuspend_delay; 158 - u64 last_busy, expires = 0; 159 - u64 now = ktime_get_mono_fast_ns(); 132 + u64 expires; 160 133 161 134 if (!dev->power.use_autosuspend) 162 - goto out; 135 + return 0; 163 136 164 137 autosuspend_delay = READ_ONCE(dev->power.autosuspend_delay); 165 138 if (autosuspend_delay < 0) 166 - goto out; 139 + return 0; 167 140 168 - last_busy = READ_ONCE(dev->power.last_busy); 141 + expires = READ_ONCE(dev->power.last_busy); 142 + expires += (u64)autosuspend_delay * NSEC_PER_MSEC; 143 + if (expires > ktime_get_mono_fast_ns()) 144 + return expires; /* Expires in the future */ 169 145 170 - expires = last_busy + (u64)autosuspend_delay * NSEC_PER_MSEC; 171 - if (expires <= now) 172 - expires = 0; /* Already expired. */ 173 - 174 - out: 175 - return expires; 146 + return 0; 176 147 } 177 148 EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration); 178 149 ··· 1299 1276 pm_runtime_put_noidle(dev); 1300 1277 } 1301 1278 1279 + /* Update time accounting before disabling PM-runtime. 
*/ 1280 + update_pm_runtime_accounting(dev); 1281 + 1302 1282 if (!dev->power.disable_depth++) 1303 1283 __pm_runtime_barrier(dev); 1304 1284 ··· 1320 1294 1321 1295 spin_lock_irqsave(&dev->power.lock, flags); 1322 1296 1323 - if (dev->power.disable_depth > 0) 1297 + if (dev->power.disable_depth > 0) { 1324 1298 dev->power.disable_depth--; 1325 - else 1299 + 1300 + /* About to enable runtime pm, set accounting_timestamp to now */ 1301 + if (!dev->power.disable_depth) 1302 + dev->power.accounting_timestamp = ktime_get_mono_fast_ns(); 1303 + } else { 1326 1304 dev_warn(dev, "Unbalanced %s!\n", __func__); 1305 + } 1327 1306 1328 1307 WARN(!dev->power.disable_depth && 1329 1308 dev->power.runtime_status == RPM_SUSPENDED && ··· 1525 1494 dev->power.request_pending = false; 1526 1495 dev->power.request = RPM_REQ_NONE; 1527 1496 dev->power.deferred_resume = false; 1528 - dev->power.accounting_timestamp = jiffies; 1529 1497 INIT_WORK(&dev->power.work, pm_runtime_work); 1530 1498 1531 1499 dev->power.timer_expires = 0;
+14 -3
drivers/base/power/sysfs.c
··· 125 125 struct device_attribute *attr, char *buf) 126 126 { 127 127 int ret; 128 + u64 tmp; 128 129 spin_lock_irq(&dev->power.lock); 129 130 update_pm_runtime_accounting(dev); 130 - ret = sprintf(buf, "%i\n", jiffies_to_msecs(dev->power.active_jiffies)); 131 + tmp = dev->power.active_time; 132 + do_div(tmp, NSEC_PER_MSEC); 133 + ret = sprintf(buf, "%llu\n", tmp); 131 134 spin_unlock_irq(&dev->power.lock); 132 135 return ret; 133 136 } ··· 141 138 struct device_attribute *attr, char *buf) 142 139 { 143 140 int ret; 141 + u64 tmp; 144 142 spin_lock_irq(&dev->power.lock); 145 143 update_pm_runtime_accounting(dev); 146 - ret = sprintf(buf, "%i\n", 147 - jiffies_to_msecs(dev->power.suspended_jiffies)); 144 + tmp = dev->power.suspended_time; 145 + do_div(tmp, NSEC_PER_MSEC); 146 + ret = sprintf(buf, "%llu\n", tmp); 148 147 spin_unlock_irq(&dev->power.lock); 149 148 return ret; 150 149 } ··· 653 648 { 654 649 int rc; 655 650 651 + /* No need to create PM sysfs if explicitly disabled. */ 652 + if (device_pm_not_required(dev)) 653 + return 0; 654 + 656 655 rc = sysfs_create_group(&dev->kobj, &pm_attr_group); 657 656 if (rc) 658 657 return rc; ··· 736 727 737 728 void dpm_sysfs_remove(struct device *dev) 738 729 { 730 + if (device_pm_not_required(dev)) 731 + return; 739 732 sysfs_unmerge_group(&dev->kobj, &pm_qos_latency_tolerance_attr_group); 740 733 dev_pm_qos_constraints_destroy(dev); 741 734 rpm_sysfs_remove(dev);
+1 -1
drivers/base/power/wakeup.c
··· 783 783 EXPORT_SYMBOL_GPL(pm_wakeup_ws_event); 784 784 785 785 /** 786 - * pm_wakeup_event - Notify the PM core of a wakeup event. 786 + * pm_wakeup_dev_event - Notify the PM core of a wakeup event. 787 787 * @dev: Device the wakeup event is related to. 788 788 * @msec: Anticipated event processing time (in milliseconds). 789 789 * @hard: If set, abort suspends in progress and wake up from suspend-to-idle.
-3
drivers/cpufreq/Kconfig
··· 207 207 config CPUFREQ_DT 208 208 tristate "Generic DT based cpufreq driver" 209 209 depends on HAVE_CLK && OF 210 - # if CPU_THERMAL is on and THERMAL=m, CPUFREQ_DT cannot be =y: 211 - depends on !CPU_THERMAL || THERMAL 212 210 select CPUFREQ_DT_PLATDEV 213 211 select PM_OPP 214 212 help ··· 325 327 config QORIQ_CPUFREQ 326 328 tristate "CPU frequency scaling driver for Freescale QorIQ SoCs" 327 329 depends on OF && COMMON_CLK && (PPC_E500MC || ARM || ARM64) 328 - depends on !CPU_THERMAL || THERMAL 329 330 select CLK_QORIQ 330 331 help 331 332 This adds the CPUFreq driver support for Freescale QorIQ SoCs
+11 -5
drivers/cpufreq/Kconfig.arm
··· 25 25 This adds the CPUFreq driver support for Marvell Armada 37xx SoCs. 26 26 The Armada 37xx PMU supports 4 frequency and VDD levels. 27 27 28 + config ARM_ARMADA_8K_CPUFREQ 29 + tristate "Armada 8K CPUFreq driver" 30 + depends on ARCH_MVEBU && CPUFREQ_DT 31 + help 32 + This enables the CPUFreq driver support for Marvell 33 + Armada 8K SoCs. 34 + The Armada 8K SoC has the AP806 which supports scaling 35 + to any full integer divider. 36 + 37 + If in doubt, say N. 38 + 28 39 # big LITTLE core layer and glue drivers 29 40 config ARM_BIG_LITTLE_CPUFREQ 30 41 tristate "Generic ARM big LITTLE CPUfreq driver" 31 42 depends on ARM_CPU_TOPOLOGY && HAVE_CLK 32 - # if CPU_THERMAL is on and THERMAL=m, ARM_BIT_LITTLE_CPUFREQ cannot be =y 33 - depends on !CPU_THERMAL || THERMAL 34 43 select PM_OPP 35 44 help 36 45 This enables the Generic CPUfreq driver for ARM big.LITTLE platforms. ··· 47 38 config ARM_SCPI_CPUFREQ 48 39 tristate "SCPI based CPUfreq driver" 49 40 depends on ARM_SCPI_PROTOCOL && COMMON_CLK_SCPI 50 - depends on !CPU_THERMAL || THERMAL 51 41 help 52 42 This adds the CPUfreq driver support for ARM platforms using SCPI 53 43 protocol for CPU power management. ··· 101 93 config ARM_MEDIATEK_CPUFREQ 102 94 tristate "CPU Frequency scaling support for MediaTek SoCs" 103 95 depends on ARCH_MEDIATEK && REGULATOR 104 - depends on !CPU_THERMAL || THERMAL 105 96 select PM_OPP 106 97 help 107 98 This adds the CPUFreq driver support for MediaTek SoCs. ··· 240 233 config ARM_SCMI_CPUFREQ 241 234 tristate "SCMI based CPUfreq driver" 242 235 depends on ARM_SCMI_PROTOCOL || COMPILE_TEST 243 - depends on !CPU_THERMAL || THERMAL 244 236 select PM_OPP 245 237 help 246 238 This adds the CPUfreq driver support for ARM platforms using SCMI
+1
drivers/cpufreq/Makefile
··· 50 50 obj-$(CONFIG_ARM_BIG_LITTLE_CPUFREQ) += arm_big_little.o 51 51 52 52 obj-$(CONFIG_ARM_ARMADA_37XX_CPUFREQ) += armada-37xx-cpufreq.o 53 + obj-$(CONFIG_ARM_ARMADA_8K_CPUFREQ) += armada-8k-cpufreq.o 53 54 obj-$(CONFIG_ARM_BRCMSTB_AVS_CPUFREQ) += brcmstb-avs-cpufreq.o 54 55 obj-$(CONFIG_ACPI_CPPC_CPUFREQ) += cppc_cpufreq.o 55 56 obj-$(CONFIG_ARCH_DAVINCI) += davinci-cpufreq.o
+3 -1
drivers/cpufreq/acpi-cpufreq.c
··· 916 916 { 917 917 int ret; 918 918 919 - if (!(boot_cpu_has(X86_FEATURE_CPB) || boot_cpu_has(X86_FEATURE_IDA))) 919 + if (!(boot_cpu_has(X86_FEATURE_CPB) || boot_cpu_has(X86_FEATURE_IDA))) { 920 + pr_debug("Boost capabilities not present in the processor\n"); 920 921 return; 922 + } 921 923 922 924 acpi_cpufreq_driver.set_boost = set_boost; 923 925 acpi_cpufreq_driver.boost_enabled = boost_state(0);
+2
drivers/cpufreq/arm_big_little.c
··· 487 487 policy->cpuinfo.transition_latency = 488 488 arm_bL_ops->get_transition_latency(cpu_dev); 489 489 490 + dev_pm_opp_of_register_em(policy->cpus); 491 + 490 492 if (is_bL_switching_enabled()) 491 493 per_cpu(cpu_last_req_freq, policy->cpu) = clk_get_cpu_rate(policy->cpu); 492 494
+206
drivers/cpufreq/armada-8k-cpufreq.c
··· 1 + // SPDX-License-Identifier: GPL-2.0+ 2 + /* 3 + * CPUFreq support for Armada 8K 4 + * 5 + * Copyright (C) 2018 Marvell 6 + * 7 + * Omri Itach <omrii@marvell.com> 8 + * Gregory Clement <gregory.clement@bootlin.com> 9 + */ 10 + 11 + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 12 + 13 + #include <linux/clk.h> 14 + #include <linux/cpu.h> 15 + #include <linux/err.h> 16 + #include <linux/init.h> 17 + #include <linux/kernel.h> 18 + #include <linux/module.h> 19 + #include <linux/of.h> 20 + #include <linux/platform_device.h> 21 + #include <linux/pm_opp.h> 22 + #include <linux/slab.h> 23 + 24 + /* 25 + * Setup the opps list with the divider for the max frequency, that 26 + * will be filled at runtime. 27 + */ 28 + static const int opps_div[] __initconst = {1, 2, 3, 4}; 29 + 30 + static struct platform_device *armada_8k_pdev; 31 + 32 + struct freq_table { 33 + struct device *cpu_dev; 34 + unsigned int freq[ARRAY_SIZE(opps_div)]; 35 + }; 36 + 37 + /* If the CPUs share the same clock, then they are in the same cluster. */ 38 + static void __init armada_8k_get_sharing_cpus(struct clk *cur_clk, 39 + struct cpumask *cpumask) 40 + { 41 + int cpu; 42 + 43 + for_each_possible_cpu(cpu) { 44 + struct device *cpu_dev; 45 + struct clk *clk; 46 + 47 + cpu_dev = get_cpu_device(cpu); 48 + if (!cpu_dev) { 49 + pr_warn("Failed to get cpu%d device\n", cpu); 50 + continue; 51 + } 52 + 53 + clk = clk_get(cpu_dev, 0); 54 + if (IS_ERR(clk)) { 55 + pr_warn("Cannot get clock for CPU %d\n", cpu); 56 + } else { 57 + if (clk_is_match(clk, cur_clk)) 58 + cpumask_set_cpu(cpu, cpumask); 59 + 60 + clk_put(clk); 61 + } 62 + } 63 + } 64 + 65 + static int __init armada_8k_add_opp(struct clk *clk, struct device *cpu_dev, 66 + struct freq_table *freq_tables, 67 + int opps_index) 68 + { 69 + unsigned int cur_frequency; 70 + unsigned int freq; 71 + int i, ret; 72 + 73 + /* Get nominal (current) CPU frequency. 
*/ 74 + cur_frequency = clk_get_rate(clk); 75 + if (!cur_frequency) { 76 + dev_err(cpu_dev, "Failed to get clock rate for this CPU\n"); 77 + return -EINVAL; 78 + } 79 + 80 + freq_tables[opps_index].cpu_dev = cpu_dev; 81 + 82 + for (i = 0; i < ARRAY_SIZE(opps_div); i++) { 83 + freq = cur_frequency / opps_div[i]; 84 + 85 + ret = dev_pm_opp_add(cpu_dev, freq, 0); 86 + if (ret) 87 + return ret; 88 + 89 + freq_tables[opps_index].freq[i] = freq; 90 + } 91 + 92 + return 0; 93 + } 94 + 95 + static void armada_8k_cpufreq_free_table(struct freq_table *freq_tables) 96 + { 97 + int opps_index, nb_cpus = num_possible_cpus(); 98 + 99 + for (opps_index = 0; opps_index < nb_cpus; opps_index++) { 100 + int i; 101 + 102 + /* If cpu_dev is NULL then we reached the end of the array */ 103 + if (!freq_tables[opps_index].cpu_dev) 104 + break; 105 + 106 + for (i = 0; i < ARRAY_SIZE(opps_div); i++) { 107 + /* 108 + * A 0 Hz frequency is not valid; it means the 109 + * entry was never initialized, so there are no 110 + * more OPPs to free. 111 + */ 112 + if (freq_tables[opps_index].freq[i] == 0) 113 + break; 114 + 115 + dev_pm_opp_remove(freq_tables[opps_index].cpu_dev, 116 + freq_tables[opps_index].freq[i]); 117 + } 118 + } 119 + 120 + kfree(freq_tables); 121 + } 122 + 123 + static int __init armada_8k_cpufreq_init(void) 124 + { 125 + int ret = 0, opps_index = 0, cpu, nb_cpus; 126 + struct freq_table *freq_tables; 127 + struct device_node *node; 128 + struct cpumask cpus; 129 + 130 + node = of_find_compatible_node(NULL, NULL, "marvell,ap806-cpu-clock"); 131 + if (!node || !of_device_is_available(node)) { 132 + of_node_put(node); 133 + return -ENODEV; 134 + } 135 + 136 + nb_cpus = num_possible_cpus(); 137 + freq_tables = kcalloc(nb_cpus, sizeof(*freq_tables), GFP_KERNEL); 138 + cpumask_copy(&cpus, cpu_possible_mask); 139 + 140 + /* 141 + * For each CPU, this loop registers the operating points 142 + * supported (which are the nominal CPU frequency and full integer 143 + * divisions of it). 
144 + */ 145 + for_each_cpu(cpu, &cpus) { 146 + struct cpumask shared_cpus; 147 + struct device *cpu_dev; 148 + struct clk *clk; 149 + 150 + cpu_dev = get_cpu_device(cpu); 151 + 152 + if (!cpu_dev) { 153 + pr_err("Cannot get CPU %d\n", cpu); 154 + continue; 155 + } 156 + 157 + clk = clk_get(cpu_dev, 0); 158 + 159 + if (IS_ERR(clk)) { 160 + pr_err("Cannot get clock for CPU %d\n", cpu); 161 + ret = PTR_ERR(clk); 162 + goto remove_opp; 163 + } 164 + 165 + ret = armada_8k_add_opp(clk, cpu_dev, freq_tables, opps_index); 166 + if (ret) { 167 + clk_put(clk); 168 + goto remove_opp; 169 + } 170 + 171 + opps_index++; 172 + cpumask_clear(&shared_cpus); 173 + armada_8k_get_sharing_cpus(clk, &shared_cpus); 174 + dev_pm_opp_set_sharing_cpus(cpu_dev, &shared_cpus); 175 + cpumask_andnot(&cpus, &cpus, &shared_cpus); 176 + clk_put(clk); 177 + } 178 + 179 + armada_8k_pdev = platform_device_register_simple("cpufreq-dt", -1, 180 + NULL, 0); 181 + ret = PTR_ERR_OR_ZERO(armada_8k_pdev); 182 + if (ret) 183 + goto remove_opp; 184 + 185 + platform_set_drvdata(armada_8k_pdev, freq_tables); 186 + 187 + return 0; 188 + 189 + remove_opp: 190 + armada_8k_cpufreq_free_table(freq_tables); 191 + return ret; 192 + } 193 + module_init(armada_8k_cpufreq_init); 194 + 195 + static void __exit armada_8k_cpufreq_exit(void) 196 + { 197 + struct freq_table *freq_tables = platform_get_drvdata(armada_8k_pdev); 198 + 199 + platform_device_unregister(armada_8k_pdev); 200 + armada_8k_cpufreq_free_table(freq_tables); 201 + } 202 + module_exit(armada_8k_cpufreq_exit); 203 + 204 + MODULE_AUTHOR("Gregory Clement <gregory.clement@bootlin.com>"); 205 + MODULE_DESCRIPTION("Armada 8K cpufreq driver"); 206 + MODULE_LICENSE("GPL");
+65
drivers/cpufreq/cppc_cpufreq.c
··· 42 42 */ 43 43 static struct cppc_cpudata **all_cpu_data; 44 44 45 + struct cppc_workaround_oem_info { 46 + char oem_id[ACPI_OEM_ID_SIZE + 1]; 47 + char oem_table_id[ACPI_OEM_TABLE_ID_SIZE + 1]; 48 + u32 oem_revision; 49 + }; 50 + 51 + static bool apply_hisi_workaround; 52 + 53 + static struct cppc_workaround_oem_info wa_info[] = { 54 + { 55 + .oem_id = "HISI ", 56 + .oem_table_id = "HIP07 ", 57 + .oem_revision = 0, 58 + }, { 59 + .oem_id = "HISI ", 60 + .oem_table_id = "HIP08 ", 61 + .oem_revision = 0, 62 + } 63 + }; 64 + 65 + static unsigned int cppc_cpufreq_perf_to_khz(struct cppc_cpudata *cpu, 66 + unsigned int perf); 67 + 68 + /* 69 + * The HISI platform does not support the delivered performance counter 70 + * or the reference performance counter. It calculates the performance 71 + * using a platform specific mechanism. We reuse the desired performance 72 + * register to store the real performance calculated by the platform. 73 + */ 74 + static unsigned int hisi_cppc_cpufreq_get_rate(unsigned int cpunum) 75 + { 76 + struct cppc_cpudata *cpudata = all_cpu_data[cpunum]; 77 + u64 desired_perf; 78 + int ret; 79 + 80 + ret = cppc_get_desired_perf(cpunum, &desired_perf); 81 + if (ret < 0) 82 + return -EIO; 83 + 84 + return cppc_cpufreq_perf_to_khz(cpudata, desired_perf); 85 + } 86 + 87 + static void cppc_check_hisi_workaround(void) 88 + { 89 + struct acpi_table_header *tbl; 90 + acpi_status status = AE_OK; 91 + int i; 92 + 93 + status = acpi_get_table(ACPI_SIG_PCCT, 0, &tbl); 94 + if (ACPI_FAILURE(status) || !tbl) 95 + return; 96 + 97 + for (i = 0; i < ARRAY_SIZE(wa_info); i++) { 98 + if (!memcmp(wa_info[i].oem_id, tbl->oem_id, ACPI_OEM_ID_SIZE) && 99 + !memcmp(wa_info[i].oem_table_id, tbl->oem_table_id, ACPI_OEM_TABLE_ID_SIZE) && 100 + wa_info[i].oem_revision == tbl->oem_revision) 101 + apply_hisi_workaround = true; 102 + } 103 + } 104 + 45 105 /* Callback function used to retrieve the max frequency from DMI */ 46 106 static void cppc_find_dmi_mhz(const struct 
dmi_header *dm, void *private) 47 107 { ··· 394 334 struct cppc_cpudata *cpu = all_cpu_data[cpunum]; 395 335 int ret; 396 336 337 + if (apply_hisi_workaround) 338 + return hisi_cppc_cpufreq_get_rate(cpunum); 339 + 397 340 ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0); 398 341 if (ret) 399 342 return ret; ··· 448 385 pr_debug("Error parsing PSD data. Aborting cpufreq registration.\n"); 449 386 goto out; 450 387 } 388 + 389 + cppc_check_hisi_workaround(); 451 390 452 391 ret = cpufreq_register_driver(&cppc_cpufreq_driver); 453 392 if (ret)
+21 -12
drivers/cpufreq/cpufreq-dt.c
··· 13 13 14 14 #include <linux/clk.h> 15 15 #include <linux/cpu.h> 16 - #include <linux/cpu_cooling.h> 17 16 #include <linux/cpufreq.h> 18 17 #include <linux/cpumask.h> 19 18 #include <linux/err.h> ··· 29 30 struct private_data { 30 31 struct opp_table *opp_table; 31 32 struct device *cpu_dev; 32 - struct thermal_cooling_device *cdev; 33 33 const char *reg_name; 34 34 bool have_static_opps; 35 35 }; ··· 278 280 policy->cpuinfo.transition_latency = transition_latency; 279 281 policy->dvfs_possible_from_any_cpu = true; 280 282 283 + dev_pm_opp_of_register_em(policy->cpus); 284 + 281 285 return 0; 282 286 283 287 out_free_cpufreq_table: ··· 297 297 return ret; 298 298 } 299 299 300 + static int cpufreq_online(struct cpufreq_policy *policy) 301 + { 302 + /* We did light-weight tear down earlier, nothing to do here */ 303 + return 0; 304 + } 305 + 306 + static int cpufreq_offline(struct cpufreq_policy *policy) 307 + { 308 + /* 309 + * Preserve policy->driver_data and don't free resources on light-weight 310 + * tear down. 
311 + */ 312 + return 0; 313 + } 314 + 300 315 static int cpufreq_exit(struct cpufreq_policy *policy) 301 316 { 302 317 struct private_data *priv = policy->driver_data; 303 318 304 - cpufreq_cooling_unregister(priv->cdev); 305 319 dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table); 306 320 if (priv->have_static_opps) 307 321 dev_pm_opp_of_cpumask_remove_table(policy->related_cpus); ··· 328 314 return 0; 329 315 } 330 316 331 - static void cpufreq_ready(struct cpufreq_policy *policy) 332 - { 333 - struct private_data *priv = policy->driver_data; 334 - 335 - priv->cdev = of_cpufreq_cooling_register(policy); 336 - } 337 - 338 317 static struct cpufreq_driver dt_cpufreq_driver = { 339 - .flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK, 318 + .flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK | 319 + CPUFREQ_IS_COOLING_DEV, 340 320 .verify = cpufreq_generic_frequency_table_verify, 341 321 .target_index = set_target, 342 322 .get = cpufreq_generic_get, 343 323 .init = cpufreq_init, 344 324 .exit = cpufreq_exit, 345 - .ready = cpufreq_ready, 325 + .online = cpufreq_online, 326 + .offline = cpufreq_offline, 346 327 .name = "cpufreq-dt", 347 328 .attr = cpufreq_dt_attr, 348 329 .suspend = cpufreq_generic_suspend,
+86 -48
drivers/cpufreq/cpufreq.c
··· 19 19 20 20 #include <linux/cpu.h> 21 21 #include <linux/cpufreq.h> 22 + #include <linux/cpu_cooling.h> 22 23 #include <linux/delay.h> 23 24 #include <linux/device.h> 24 25 #include <linux/init.h> ··· 546 545 * SYSFS INTERFACE * 547 546 *********************************************************************/ 548 547 static ssize_t show_boost(struct kobject *kobj, 549 - struct attribute *attr, char *buf) 548 + struct kobj_attribute *attr, char *buf) 550 549 { 551 550 return sprintf(buf, "%d\n", cpufreq_driver->boost_enabled); 552 551 } 553 552 554 - static ssize_t store_boost(struct kobject *kobj, struct attribute *attr, 555 - const char *buf, size_t count) 553 + static ssize_t store_boost(struct kobject *kobj, struct kobj_attribute *attr, 554 + const char *buf, size_t count) 556 555 { 557 556 int ret, enable; 558 557 ··· 1201 1200 return -ENOMEM; 1202 1201 } 1203 1202 1204 - cpumask_copy(policy->cpus, cpumask_of(cpu)); 1203 + if (!new_policy && cpufreq_driver->online) { 1204 + ret = cpufreq_driver->online(policy); 1205 + if (ret) { 1206 + pr_debug("%s: %d: initialization failed\n", __func__, 1207 + __LINE__); 1208 + goto out_exit_policy; 1209 + } 1205 1210 1206 - /* call driver. From then on the cpufreq must be able 1207 - * to accept all calls to ->verify and ->setpolicy for this CPU 1208 - */ 1209 - ret = cpufreq_driver->init(policy); 1210 - if (ret) { 1211 - pr_debug("initialization failed\n"); 1212 - goto out_free_policy; 1213 - } 1211 + /* Recover policy->cpus using related_cpus */ 1212 + cpumask_copy(policy->cpus, policy->related_cpus); 1213 + } else { 1214 + cpumask_copy(policy->cpus, cpumask_of(cpu)); 1214 1215 1215 - ret = cpufreq_table_validate_and_sort(policy); 1216 - if (ret) 1217 - goto out_exit_policy; 1216 + /* 1217 + * Call driver. From then on the cpufreq must be able 1218 + * to accept all calls to ->verify and ->setpolicy for this CPU. 
 	 */
+	ret = cpufreq_driver->init(policy);
+	if (ret) {
+		pr_debug("%s: %d: initialization failed\n", __func__,
+			 __LINE__);
+		goto out_free_policy;
+	}
 
-	down_write(&policy->rwsem);
+	ret = cpufreq_table_validate_and_sort(policy);
+	if (ret)
+		goto out_exit_policy;
 
-	if (new_policy) {
 		/* related_cpus should at least include policy->cpus. */
 		cpumask_copy(policy->related_cpus, policy->cpus);
 	}
 
+	down_write(&policy->rwsem);
 	/*
 	 * affected cpus must always be the one, which are online. We aren't
 	 * managing offline cpus here.
···
 	if (ret) {
 		pr_err("%s: Failed to initialize policy for cpu: %d (%d)\n",
 		       __func__, cpu, ret);
-		/* cpufreq_policy_free() will notify based on this */
-		new_policy = false;
 		goto out_destroy_policy;
 	}
 
···
 	/* Callback for handling stuff after policy is ready */
 	if (cpufreq_driver->ready)
 		cpufreq_driver->ready(policy);
+
+	if (IS_ENABLED(CONFIG_CPU_THERMAL) &&
+	    cpufreq_driver->flags & CPUFREQ_IS_COOLING_DEV)
+		policy->cdev = of_cpufreq_cooling_register(policy);
 
 	pr_debug("initialization complete\n");
 
···
 		goto unlock;
 	}
 
+	if (IS_ENABLED(CONFIG_CPU_THERMAL) &&
+	    cpufreq_driver->flags & CPUFREQ_IS_COOLING_DEV) {
+		cpufreq_cooling_unregister(policy->cdev);
+		policy->cdev = NULL;
+	}
+
 	if (cpufreq_driver->stop_cpu)
 		cpufreq_driver->stop_cpu(policy);
 
···
 	cpufreq_exit_governor(policy);
 
 	/*
-	 * Perform the ->exit() even during light-weight tear-down,
-	 * since this is a core component, and is essential for the
-	 * subsequent light-weight ->init() to succeed.
+	 * Perform the ->offline() during light-weight tear-down, as
+	 * that allows fast recovery when the CPU comes back.
 	 */
-	if (cpufreq_driver->exit) {
+	if (cpufreq_driver->offline) {
+		cpufreq_driver->offline(policy);
+	} else if (cpufreq_driver->exit) {
 		cpufreq_driver->exit(policy);
 		policy->freq_table = NULL;
 	}
···
 	cpumask_clear_cpu(cpu, policy->real_cpus);
 	remove_cpu_dev_symlink(policy, dev);
 
-	if (cpumask_empty(policy->real_cpus))
+	if (cpumask_empty(policy->real_cpus)) {
+		/* We did light-weight exit earlier, do full tear down now */
+		if (cpufreq_driver->offline)
+			cpufreq_driver->exit(policy);
+
 		cpufreq_policy_free(policy);
+	}
 }
 
 /**
···
 }
 EXPORT_SYMBOL(cpufreq_get_policy);
 
-/*
- * policy : current policy.
- * new_policy: policy to be set.
+/**
+ * cpufreq_set_policy - Modify cpufreq policy parameters.
+ * @policy: Policy object to modify.
+ * @new_policy: New policy data.
+ *
+ * Pass @new_policy to the cpufreq driver's ->verify() callback, run the
+ * installed policy notifiers for it with the CPUFREQ_ADJUST value, pass it to
+ * the driver's ->verify() callback again and run the notifiers for it again
+ * with the CPUFREQ_NOTIFY value.  Next, copy the min and max parameters
+ * of @new_policy to @policy and either invoke the driver's ->setpolicy()
+ * callback (if present) or carry out a governor update for @policy.  That is,
+ * run the current governor's ->limits() callback (if the governor field in
+ * @new_policy points to the same object as the one in @policy) or replace the
+ * governor for @policy with the new one stored in @new_policy.
+ *
+ * The cpuinfo part of @policy is not updated by this function.
  */
 static int cpufreq_set_policy(struct cpufreq_policy *policy,
-				struct cpufreq_policy *new_policy)
+			      struct cpufreq_policy *new_policy)
 {
 	struct cpufreq_governor *old_gov;
 	int ret;
···
 	if (cpufreq_driver->setpolicy) {
 		policy->policy = new_policy->policy;
 		pr_debug("setting range\n");
-		return cpufreq_driver->setpolicy(new_policy);
+		return cpufreq_driver->setpolicy(policy);
 	}
 
 	if (new_policy->governor == policy->governor) {
-		pr_debug("cpufreq: governor limits update\n");
+		pr_debug("governor limits update\n");
 		cpufreq_governor_limits(policy);
 		return 0;
 	}
···
 	if (!ret) {
 		ret = cpufreq_start_governor(policy);
 		if (!ret) {
-			pr_debug("cpufreq: governor change\n");
+			pr_debug("governor change\n");
 			sched_cpufreq_governor_change(policy, old_gov);
 			return 0;
 		}
···
 }
 
 /**
- * cpufreq_update_policy - re-evaluate an existing cpufreq policy
- * @cpu: CPU which shall be re-evaluated
+ * cpufreq_update_policy - Re-evaluate an existing cpufreq policy.
+ * @cpu: CPU to re-evaluate the policy for.
  *
- * Useful for policy notifiers which have different necessities
- * at different times.
+ * Update the current frequency for the cpufreq policy of @cpu and use
+ * cpufreq_set_policy() to re-apply the min and max limits saved in the
+ * user_policy sub-structure of that policy, which triggers the evaluation
+ * of policy notifiers and the cpufreq driver's ->verify() callback for the
+ * policy in question, among other things.
  */
 void cpufreq_update_policy(unsigned int cpu)
 {
···
 	if (policy_is_inactive(policy))
 		goto unlock;
 
-	pr_debug("updating policy for CPU %u\n", cpu);
-	memcpy(&new_policy, policy, sizeof(*policy));
-	new_policy.min = policy->user_policy.min;
-	new_policy.max = policy->user_policy.max;
-
 	/*
 	 * BIOS might change freq behind our back
 	 * -> ask driver for current freq and notify governors about a change
 	 */
-	if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
-		if (cpufreq_suspended)
-			goto unlock;
+	if (cpufreq_driver->get && !cpufreq_driver->setpolicy &&
+	    (cpufreq_suspended || WARN_ON(!cpufreq_update_current_freq(policy))))
+		goto unlock;
 
-		new_policy.cur = cpufreq_update_current_freq(policy);
-		if (WARN_ON(!new_policy.cur))
-			goto unlock;
-	}
+	pr_debug("updating policy for CPU %u\n", cpu);
+	memcpy(&new_policy, policy, sizeof(*policy));
+	new_policy.min = policy->user_policy.min;
+	new_policy.max = policy->user_policy.max;
 
 	cpufreq_set_policy(policy, &new_policy);
 
···
 		    driver_data->target) ||
 	    (driver_data->setpolicy && (driver_data->target_index ||
 		    driver_data->target)) ||
-	    (!!driver_data->get_intermediate != !!driver_data->target_intermediate))
+	    (!driver_data->get_intermediate != !driver_data->target_intermediate) ||
+	    (!driver_data->online != !driver_data->offline))
 		return -EINVAL;
 
 	pr_debug("trying to register driver %s\n", driver_data->name);
+10 -6
drivers/cpufreq/cpufreq_stats.c
···
 {
 	unsigned long long cur_time = get_jiffies_64();
 
-	spin_lock(&cpufreq_stats_lock);
 	stats->time_in_state[stats->last_index] += cur_time - stats->last_time;
 	stats->last_time = cur_time;
-	spin_unlock(&cpufreq_stats_lock);
 }
 
 static void cpufreq_stats_clear_table(struct cpufreq_stats *stats)
 {
 	unsigned int count = stats->max_state;
 
+	spin_lock(&cpufreq_stats_lock);
 	memset(stats->time_in_state, 0, count * sizeof(u64));
 	memset(stats->trans_table, 0, count * count * sizeof(int));
 	stats->last_time = get_jiffies_64();
 	stats->total_trans = 0;
+	spin_unlock(&cpufreq_stats_lock);
 }
 
 static ssize_t show_total_trans(struct cpufreq_policy *policy, char *buf)
 {
 	return sprintf(buf, "%d\n", policy->stats->total_trans);
 }
+cpufreq_freq_attr_ro(total_trans);
 
 static ssize_t show_time_in_state(struct cpufreq_policy *policy, char *buf)
 {
···
 	if (policy->fast_switch_enabled)
 		return 0;
 
+	spin_lock(&cpufreq_stats_lock);
 	cpufreq_stats_update(stats);
+	spin_unlock(&cpufreq_stats_lock);
+
 	for (i = 0; i < stats->state_num; i++) {
 		len += sprintf(buf + len, "%u %llu\n", stats->freq_table[i],
 			(unsigned long long)
···
 	}
 	return len;
 }
+cpufreq_freq_attr_ro(time_in_state);
 
 static ssize_t store_reset(struct cpufreq_policy *policy, const char *buf,
 			   size_t count)
···
 	cpufreq_stats_clear_table(policy->stats);
 	return count;
 }
+cpufreq_freq_attr_wo(reset);
 
 static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf)
 {
···
 	return len;
 }
 cpufreq_freq_attr_ro(trans_table);
-
-cpufreq_freq_attr_ro(total_trans);
-cpufreq_freq_attr_ro(time_in_state);
-cpufreq_freq_attr_wo(reset);
 
 static struct attribute *default_attrs[] = {
 	&total_trans.attr,
···
 	if (old_index == -1 || new_index == -1 || old_index == new_index)
 		return;
 
+	spin_lock(&cpufreq_stats_lock);
 	cpufreq_stats_update(stats);
 
 	stats->last_index = new_index;
 	stats->trans_table[old_index * stats->max_state + new_index]++;
 	stats->total_trans++;
+	spin_unlock(&cpufreq_stats_lock);
 }
+1 -4
drivers/cpufreq/davinci-cpufreq.c
···
 #include <linux/init.h>
 #include <linux/err.h>
 #include <linux/clk.h>
+#include <linux/platform_data/davinci-cpufreq.h>
 #include <linux/platform_device.h>
 #include <linux/export.h>
-
-#include <mach/hardware.h>
-#include <mach/cpufreq.h>
-#include <mach/common.h>
 
 struct davinci_cpufreq {
 	struct device *dev;
+2 -3
drivers/cpufreq/e_powersaver.c
···
 	states = 2;
 
 	/* Allocate private data and frequency table for current cpu */
-	centaur = kzalloc(sizeof(*centaur)
-		    + (states + 1) * sizeof(struct cpufreq_frequency_table),
-		    GFP_KERNEL);
+	centaur = kzalloc(struct_size(centaur, freq_table, states + 1),
+			  GFP_KERNEL);
 	if (!centaur)
 		return -ENOMEM;
 	eps_cpu[0] = centaur;
+3 -22
drivers/cpufreq/imx6q-cpufreq.c
···
 #include <linux/clk.h>
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
-#include <linux/cpu_cooling.h>
 #include <linux/err.h>
 #include <linux/module.h>
 #include <linux/nvmem-consumer.h>
···
 };
 
 static struct device *cpu_dev;
-static struct thermal_cooling_device *cdev;
 static bool free_opp;
 static struct cpufreq_frequency_table *freq_table;
 static unsigned int max_freq;
···
 	return 0;
 }
 
-static void imx6q_cpufreq_ready(struct cpufreq_policy *policy)
-{
-	cdev = of_cpufreq_cooling_register(policy);
-
-	if (!cdev)
-		dev_err(cpu_dev,
-			"running cpufreq without cooling device: %ld\n",
-			PTR_ERR(cdev));
-}
-
 static int imx6q_cpufreq_init(struct cpufreq_policy *policy)
 {
 	int ret;
···
 	policy->clk = clks[ARM].clk;
 	ret = cpufreq_generic_init(policy, freq_table, transition_latency);
 	policy->suspend_freq = max_freq;
+	dev_pm_opp_of_register_em(policy->cpus);
 
 	return ret;
 }
 
-static int imx6q_cpufreq_exit(struct cpufreq_policy *policy)
-{
-	cpufreq_cooling_unregister(cdev);
-
-	return 0;
-}
-
 static struct cpufreq_driver imx6q_cpufreq_driver = {
-	.flags = CPUFREQ_NEED_INITIAL_FREQ_CHECK,
+	.flags = CPUFREQ_NEED_INITIAL_FREQ_CHECK |
+		 CPUFREQ_IS_COOLING_DEV,
 	.verify = cpufreq_generic_frequency_table_verify,
 	.target_index = imx6q_set_target,
 	.get = cpufreq_generic_get,
 	.init = imx6q_cpufreq_init,
-	.exit = imx6q_cpufreq_exit,
 	.name = "imx6q-cpufreq",
-	.ready = imx6q_cpufreq_ready,
 	.attr = cpufreq_generic_attr,
 	.suspend = cpufreq_generic_suspend,
 };
+56 -49
drivers/cpufreq/intel_pstate.c
···
 #define int_tofp(X) ((int64_t)(X) << FRAC_BITS)
 #define fp_toint(X) ((X) >> FRAC_BITS)
 
+#define ONE_EIGHTH_FP ((int64_t)1 << (FRAC_BITS - 3))
+
 #define EXT_BITS 6
 #define EXT_FRAC_BITS (EXT_BITS + FRAC_BITS)
 #define fp_ext_toint(X) ((X) >> EXT_FRAC_BITS)
···
 /************************** sysfs begin ************************/
 #define show_one(file_name, object)					\
 	static ssize_t show_##file_name					\
-	(struct kobject *kobj, struct attribute *attr, char *buf)	\
+	(struct kobject *kobj, struct kobj_attribute *attr, char *buf)	\
 	{								\
 		return sprintf(buf, "%u\n", global.object);		\
 	}
···
 static int intel_pstate_update_status(const char *buf, size_t size);
 
 static ssize_t show_status(struct kobject *kobj,
-			   struct attribute *attr, char *buf)
+			   struct kobj_attribute *attr, char *buf)
 {
 	ssize_t ret;
···
 	return ret;
 }
 
-static ssize_t store_status(struct kobject *a, struct attribute *b,
+static ssize_t store_status(struct kobject *a, struct kobj_attribute *b,
 			    const char *buf, size_t count)
 {
 	char *p = memchr(buf, '\n', count);
···
 }
 
 static ssize_t show_turbo_pct(struct kobject *kobj,
-			      struct attribute *attr, char *buf)
+			      struct kobj_attribute *attr, char *buf)
 {
 	struct cpudata *cpu;
 	int total, no_turbo, turbo_pct;
···
 }
 
 static ssize_t show_num_pstates(struct kobject *kobj,
-				struct attribute *attr, char *buf)
+				struct kobj_attribute *attr, char *buf)
 {
 	struct cpudata *cpu;
 	int total;
···
 }
 
 static ssize_t show_no_turbo(struct kobject *kobj,
-			     struct attribute *attr, char *buf)
+			     struct kobj_attribute *attr, char *buf)
 {
 	ssize_t ret;
···
 	return ret;
 }
 
-static ssize_t store_no_turbo(struct kobject *a, struct attribute *b,
+static ssize_t store_no_turbo(struct kobject *a, struct kobj_attribute *b,
 			      const char *buf, size_t count)
 {
 	unsigned int input;
···
 	return count;
 }
 
-static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b,
+static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
 				  const char *buf, size_t count)
 {
 	unsigned int input;
···
 	return count;
 }
 
-static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b,
+static ssize_t store_min_perf_pct(struct kobject *a, struct kobj_attribute *b,
 				  const char *buf, size_t count)
 {
 	unsigned int input;
···
 }
 
 static ssize_t show_hwp_dynamic_boost(struct kobject *kobj,
-				      struct attribute *attr, char *buf)
+				      struct kobj_attribute *attr, char *buf)
 {
 	return sprintf(buf, "%u\n", hwp_boost);
 }
 
-static ssize_t store_hwp_dynamic_boost(struct kobject *a, struct attribute *b,
+static ssize_t store_hwp_dynamic_boost(struct kobject *a,
+				       struct kobj_attribute *b,
 				       const char *buf, size_t count)
 {
 	unsigned int input;
···
 	return ret;
 }
 
-static int intel_pstate_get_base_pstate(struct cpudata *cpu)
-{
-	return global.no_turbo || global.turbo_disabled ?
-			cpu->pstate.max_pstate : cpu->pstate.turbo_pstate;
-}
-
 static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate)
 {
 	trace_cpu_frequency(pstate * cpu->pstate.scaling, cpu->cpu);
···
 
 static void intel_pstate_max_within_limits(struct cpudata *cpu)
 {
-	int pstate;
+	int pstate = max(cpu->pstate.min_pstate, cpu->max_perf_ratio);
 
 	update_turbo_state();
-	pstate = intel_pstate_get_base_pstate(cpu);
-	pstate = max(cpu->pstate.min_pstate, cpu->max_perf_ratio);
 	intel_pstate_set_pstate(cpu, pstate);
 }
 
···
 static inline int32_t get_target_pstate(struct cpudata *cpu)
 {
 	struct sample *sample = &cpu->sample;
-	int32_t busy_frac, boost;
+	int32_t busy_frac;
 	int target, avg_pstate;
 
 	busy_frac = div_fp(sample->mperf << cpu->aperf_mperf_shift,
 			   sample->tsc);
 
-	boost = cpu->iowait_boost;
-	cpu->iowait_boost >>= 1;
-
-	if (busy_frac < boost)
-		busy_frac = boost;
+	if (busy_frac < cpu->iowait_boost)
+		busy_frac = cpu->iowait_boost;
 
 	sample->busy_scaled = busy_frac * 100;
···
 
 static int intel_pstate_prepare_request(struct cpudata *cpu, int pstate)
 {
-	int max_pstate = intel_pstate_get_base_pstate(cpu);
-	int min_pstate;
+	int min_pstate = max(cpu->pstate.min_pstate, cpu->min_perf_ratio);
+	int max_pstate = max(min_pstate, cpu->max_perf_ratio);
 
-	min_pstate = max(cpu->pstate.min_pstate, cpu->min_perf_ratio);
-	max_pstate = max(min_pstate, cpu->max_perf_ratio);
 	return clamp_t(int, pstate, min_pstate, max_pstate);
 }
 
···
 	if (smp_processor_id() != cpu->cpu)
 		return;
 
+	delta_ns = time - cpu->last_update;
 	if (flags & SCHED_CPUFREQ_IOWAIT) {
-		cpu->iowait_boost = int_tofp(1);
-		cpu->last_update = time;
-		/*
-		 * The last time the busy was 100% so P-state was max anyway
-		 * so avoid overhead of computation.
-		 */
-		if (fp_toint(cpu->sample.busy_scaled) == 100)
-			return;
-
-		goto set_pstate;
+		/* Start over if the CPU may have been idle. */
+		if (delta_ns > TICK_NSEC) {
+			cpu->iowait_boost = ONE_EIGHTH_FP;
+		} else if (cpu->iowait_boost) {
+			cpu->iowait_boost <<= 1;
+			if (cpu->iowait_boost > int_tofp(1))
+				cpu->iowait_boost = int_tofp(1);
+		} else {
+			cpu->iowait_boost = ONE_EIGHTH_FP;
+		}
 	} else if (cpu->iowait_boost) {
 		/* Clear iowait_boost if the CPU may have been idle. */
-		delta_ns = time - cpu->last_update;
 		if (delta_ns > TICK_NSEC)
 			cpu->iowait_boost = 0;
+		else
+			cpu->iowait_boost >>= 1;
 	}
 	cpu->last_update = time;
 	delta_ns = time - cpu->sample.time;
 	if ((s64)delta_ns < INTEL_PSTATE_SAMPLING_INTERVAL)
 		return;
 
-set_pstate:
 	if (intel_pstate_sample(cpu, time))
 		intel_pstate_adjust_pstate(cpu);
 }
···
 	if (hwp_active) {
 		intel_pstate_get_hwp_max(cpu->cpu, &turbo_max, &max_state);
 	} else {
-		max_state = intel_pstate_get_base_pstate(cpu);
+		max_state = global.no_turbo || global.turbo_disabled ?
+			cpu->pstate.max_pstate : cpu->pstate.turbo_pstate;
 		turbo_max = cpu->pstate.turbo_pstate;
 	}
 
···
 		kfree(pss);
 	}
 
+	pr_debug("ACPI _PSS not found\n");
 	return true;
 }
 
···
 
 	status = acpi_get_handle(NULL, "\\_SB", &handle);
 	if (ACPI_FAILURE(status))
-		return true;
+		goto not_found;
 
-	return !acpi_has_method(handle, "PCCH");
+	if (acpi_has_method(handle, "PCCH"))
+		return false;
+
+not_found:
+	pr_debug("ACPI PCCH not found\n");
+	return true;
 }
 
 static bool __init intel_pstate_has_acpi_ppc(void)
···
 		if (acpi_has_method(pr->handle, "_PPC"))
 			return true;
 	}
+	pr_debug("ACPI _PPC not found\n");
 	return false;
 }
 
···
 	id = x86_match_cpu(intel_pstate_cpu_oob_ids);
 	if (id) {
 		rdmsrl(MSR_MISC_PWR_MGMT, misc_pwr);
-		if ( misc_pwr & (1 << 8))
+		if (misc_pwr & (1 << 8)) {
+			pr_debug("Bit 8 in the MISC_PWR_MGMT MSR set\n");
 			return true;
+		}
 	}
 
 	idx = acpi_match_platform_list(plat_info);
···
 		}
 	} else {
 		id = x86_match_cpu(intel_pstate_cpu_ids);
-		if (!id)
+		if (!id) {
+			pr_info("CPU ID not supported\n");
 			return -ENODEV;
+		}
 
 		copy_cpu_funcs((struct pstate_funcs *)id->driver_data);
 	}
 
-	if (intel_pstate_msrs_not_valid())
+	if (intel_pstate_msrs_not_valid()) {
+		pr_info("Invalid MSRs\n");
 		return -ENODEV;
+	}
 
 hwp_cpu_matched:
 	/*
 	 * The Intel pstate driver will be ignored if the platform
 	 * firmware has its own power management modes.
 	 */
-	if (intel_pstate_platform_pwr_mgmt_exists())
+	if (intel_pstate_platform_pwr_mgmt_exists()) {
+		pr_info("P-states controlled by the platform\n");
 		return -ENODEV;
+	}
 
 	if (!hwp_active && hwp_only)
 		return -ENOTSUPP;
+1 -1
drivers/cpufreq/longhaul.c
···
 	case TYPE_POWERSAVER:
 		pr_cont("Powersaver supported\n");
 		break;
-	};
+	}
 
 	/* Doesn't hurt */
 	longhaul_setup_southbridge();
+4 -12
drivers/cpufreq/mediatek-cpufreq.c
···
 
 #include <linux/clk.h>
 #include <linux/cpu.h>
-#include <linux/cpu_cooling.h>
 #include <linux/cpufreq.h>
 #include <linux/cpumask.h>
 #include <linux/module.h>
···
 	struct regulator *sram_reg;
 	struct clk *cpu_clk;
 	struct clk *inter_clk;
-	struct thermal_cooling_device *cdev;
 	struct list_head list_head;
 	int intermediate_voltage;
 	bool need_voltage_tracking;
···
 
 #define DYNAMIC_POWER "dynamic-power-coefficient"
 
-static void mtk_cpufreq_ready(struct cpufreq_policy *policy)
-{
-	struct mtk_cpu_dvfs_info *info = policy->driver_data;
-
-	info->cdev = of_cpufreq_cooling_register(policy);
-}
-
 static int mtk_cpu_dvfs_info_init(struct mtk_cpu_dvfs_info *info, int cpu)
 {
 	struct device *cpu_dev;
···
 	policy->driver_data = info;
 	policy->clk = info->cpu_clk;
 
+	dev_pm_opp_of_register_em(policy->cpus);
+
 	return 0;
 }
 
···
 {
 	struct mtk_cpu_dvfs_info *info = policy->driver_data;
 
-	cpufreq_cooling_unregister(info->cdev);
 	dev_pm_opp_free_cpufreq_table(info->cpu_dev, &policy->freq_table);
 
 	return 0;
···
 
 static struct cpufreq_driver mtk_cpufreq_driver = {
 	.flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
-		 CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
+		 CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
+		 CPUFREQ_IS_COOLING_DEV,
 	.verify = cpufreq_generic_frequency_table_verify,
 	.target_index = mtk_cpufreq_set_target,
 	.get = cpufreq_generic_get,
 	.init = mtk_cpufreq_init,
 	.exit = mtk_cpufreq_exit,
-	.ready = mtk_cpufreq_ready,
 	.name = "mtk-cpufreq",
 	.attr = cpufreq_generic_attr,
 };
+3 -1
drivers/cpufreq/omap-cpufreq.c
···
 
 	/* FIXME: what's the actual transition time? */
 	result = cpufreq_generic_init(policy, freq_table, 300 * 1000);
-	if (!result)
+	if (!result) {
+		dev_pm_opp_of_register_em(policy->cpus);
 		return 0;
+	}
 
 	freq_table_free();
 fail:
+1 -1
drivers/cpufreq/pcc-cpufreq.c
···
 	if (!pccp || pccp->type != ACPI_TYPE_PACKAGE) {
 		ret = -ENODEV;
 		goto out_free;
-	};
+	}
 
 	offset = &(pccp->package.elements[0]);
 	if (!offset || offset->type != ACPI_TYPE_INTEGER) {
+7 -3
drivers/cpufreq/powernv-cpufreq.c
···
 	u32 len_ids, len_freqs;
 	u32 pstate_min, pstate_max, pstate_nominal;
 	u32 pstate_turbo, pstate_ultra_turbo;
+	int rc = -ENODEV;
 
 	power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
 	if (!power_mgt) {
···
 		powernv_freqs[i].frequency = freq * 1000; /* kHz */
 		powernv_freqs[i].driver_data = id & 0xFF;
 
-		revmap_data = (struct pstate_idx_revmap_data *)
-			      kmalloc(sizeof(*revmap_data), GFP_KERNEL);
+		revmap_data = kmalloc(sizeof(*revmap_data), GFP_KERNEL);
+		if (!revmap_data) {
+			rc = -ENOMEM;
+			goto out;
+		}
 
 		revmap_data->pstate_id = id & 0xFF;
 		revmap_data->cpufreq_table_idx = i;
···
 	return 0;
 out:
 	of_node_put(power_mgt);
-	return -ENODEV;
+	return rc;
 }
 
 /* Returns the CPU frequency corresponding to the pstate_id. */
+42 -11
drivers/cpufreq/qcom-cpufreq-hw.c
···
 #include <linux/module.h>
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
+#include <linux/pm_opp.h>
 #include <linux/slab.h>
 
 #define LUT_MAX_ENTRIES			40U
 #define LUT_SRC				GENMASK(31, 30)
 #define LUT_L_VAL			GENMASK(7, 0)
 #define LUT_CORE_COUNT			GENMASK(18, 16)
+#define LUT_VOLT			GENMASK(11, 0)
 #define LUT_ROW_SIZE			32
 #define CLK_HW_DIV			2
 
 /* Register offsets */
 #define REG_ENABLE			0x0
-#define REG_LUT_TABLE			0x110
+#define REG_FREQ_LUT			0x110
+#define REG_VOLT_LUT			0x114
 #define REG_PERF_STATE			0x920
 
 static unsigned long cpu_hw_rate, xo_rate;
···
 	return policy->freq_table[index].frequency;
 }
 
-static int qcom_cpufreq_hw_read_lut(struct device *dev,
+static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
 				    struct cpufreq_policy *policy,
 				    void __iomem *base)
 {
 	u32 data, src, lval, i, core_count, prev_cc = 0, prev_freq = 0, freq;
+	u32 volt;
 	unsigned int max_cores = cpumask_weight(policy->cpus);
 	struct cpufreq_frequency_table	*table;
 
···
 		return -ENOMEM;
 
 	for (i = 0; i < LUT_MAX_ENTRIES; i++) {
-		data = readl_relaxed(base + REG_LUT_TABLE + i * LUT_ROW_SIZE);
+		data = readl_relaxed(base + REG_FREQ_LUT +
+				      i * LUT_ROW_SIZE);
 		src = FIELD_GET(LUT_SRC, data);
 		lval = FIELD_GET(LUT_L_VAL, data);
 		core_count = FIELD_GET(LUT_CORE_COUNT, data);
+
+		data = readl_relaxed(base + REG_VOLT_LUT +
+				      i * LUT_ROW_SIZE);
+		volt = FIELD_GET(LUT_VOLT, data) * 1000;
 
 		if (src)
 			freq = xo_rate * lval / 1000;
 		else
 			freq = cpu_hw_rate / 1000;
 
-		/* Ignore boosts in the middle of the table */
-		if (core_count != max_cores) {
-			table[i].frequency = CPUFREQ_ENTRY_INVALID;
-		} else {
+		if (freq != prev_freq && core_count == max_cores) {
 			table[i].frequency = freq;
-			dev_dbg(dev, "index=%d freq=%d, core_count %d\n", i,
+			dev_pm_opp_add(cpu_dev, freq * 1000, volt);
+			dev_dbg(cpu_dev, "index=%d freq=%d, core_count %d\n", i,
 				freq, core_count);
+		} else {
+			table[i].frequency = CPUFREQ_ENTRY_INVALID;
 		}
 
 		/*
···
 			if (prev_cc != max_cores) {
 				prev->frequency = prev_freq;
 				prev->flags = CPUFREQ_BOOST_FREQ;
+				dev_pm_opp_add(cpu_dev, prev_freq * 1000, volt);
 			}
 
 			break;
···
 
 	table[i].frequency = CPUFREQ_TABLE_END;
 	policy->freq_table = table;
+	dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus);
 
 	return 0;
 }
···
 	struct device *dev = &global_pdev->dev;
 	struct of_phandle_args args;
 	struct device_node *cpu_np;
+	struct device *cpu_dev;
 	struct resource *res;
 	void __iomem *base;
 	int ret, index;
+
+	cpu_dev = get_cpu_device(policy->cpu);
+	if (!cpu_dev) {
+		pr_err("%s: failed to get cpu%d device\n", __func__,
+		       policy->cpu);
+		return -ENODEV;
+	}
 
 	cpu_np = of_cpu_device_node_get(policy->cpu);
 	if (!cpu_np)
···
 
 	policy->driver_data = base + REG_PERF_STATE;
 
-	ret = qcom_cpufreq_hw_read_lut(dev, policy, base);
+	ret = qcom_cpufreq_hw_read_lut(cpu_dev, policy, base);
 	if (ret) {
 		dev_err(dev, "Domain-%d failed to read LUT\n", index);
 		goto error;
 	}
+
+	ret = dev_pm_opp_get_opp_count(cpu_dev);
+	if (ret <= 0) {
+		dev_err(cpu_dev, "Failed to add OPPs\n");
+		ret = -ENODEV;
+		goto error;
+	}
+
+	dev_pm_opp_of_register_em(policy->cpus);
 
 	policy->fast_switch_possible = true;
 
···
 
 static int qcom_cpufreq_hw_cpu_exit(struct cpufreq_policy *policy)
 {
+	struct device *cpu_dev = get_cpu_device(policy->cpu);
 	void __iomem *base = policy->driver_data - REG_PERF_STATE;
 
+	dev_pm_opp_remove_all_dynamic(cpu_dev);
 	kfree(policy->freq_table);
 	devm_iounmap(&global_pdev->dev, base);
 
···
 
 static struct cpufreq_driver cpufreq_qcom_hw_driver = {
 	.flags		= CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
-			  CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
+			  CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
+			  CPUFREQ_IS_COOLING_DEV,
 	.verify		= cpufreq_generic_frequency_table_verify,
 	.target_index	= qcom_cpufreq_hw_target_index,
 	.get		= qcom_cpufreq_hw_get,
···
 {
 	return platform_driver_register(&qcom_cpufreq_hw_driver);
 }
-subsys_initcall(qcom_cpufreq_hw_init);
+device_initcall(qcom_cpufreq_hw_init);
 
 static void __exit qcom_cpufreq_hw_exit(void)
 {
+19 -3
drivers/cpufreq/qcom-cpufreq-kryo.c
···
 	NUM_OF_MSM8996_VERSIONS,
 };
 
-struct platform_device *cpufreq_dt_pdev, *kryo_cpufreq_pdev;
+static struct platform_device *cpufreq_dt_pdev, *kryo_cpufreq_pdev;
 
 static enum _msm8996_version qcom_cpufreq_kryo_get_msm_id(void)
 {
···
 
 static int qcom_cpufreq_kryo_probe(struct platform_device *pdev)
 {
-	struct opp_table *opp_tables[NR_CPUS] = {0};
+	struct opp_table **opp_tables;
 	enum _msm8996_version msm8996_version;
 	struct nvmem_cell *speedbin_nvmem;
 	struct device_node *np;
···
 	}
 	kfree(speedbin);
 
+	opp_tables = kcalloc(num_possible_cpus(), sizeof(*opp_tables), GFP_KERNEL);
+	if (!opp_tables)
+		return -ENOMEM;
+
 	for_each_possible_cpu(cpu) {
 		cpu_dev = get_cpu_device(cpu);
 		if (NULL == cpu_dev) {
···
 
 	cpufreq_dt_pdev = platform_device_register_simple("cpufreq-dt", -1,
 							  NULL, 0);
-	if (!IS_ERR(cpufreq_dt_pdev))
+	if (!IS_ERR(cpufreq_dt_pdev)) {
+		platform_set_drvdata(pdev, opp_tables);
 		return 0;
+	}
 
 	ret = PTR_ERR(cpufreq_dt_pdev);
 	dev_err(cpu_dev, "Failed to register platform device\n");
···
 			break;
 		dev_pm_opp_put_supported_hw(opp_tables[cpu]);
 	}
+	kfree(opp_tables);
 
 	return ret;
 }
 
 static int qcom_cpufreq_kryo_remove(struct platform_device *pdev)
 {
+	struct opp_table **opp_tables = platform_get_drvdata(pdev);
+	unsigned int cpu;
+
 	platform_device_unregister(cpufreq_dt_pdev);
+
+	for_each_possible_cpu(cpu)
+		dev_pm_opp_put_supported_hw(opp_tables[cpu]);
+
+	kfree(opp_tables);
+
 	return 0;
 }
 
+2 -13
drivers/cpufreq/qoriq-cpufreq.c
···
 #include <linux/clk.h>
 #include <linux/clk-provider.h>
 #include <linux/cpufreq.h>
-#include <linux/cpu_cooling.h>
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
···
 struct cpu_data {
 	struct clk **pclk;
 	struct cpufreq_frequency_table *table;
-	struct thermal_cooling_device *cdev;
 };
 
 /*
···
 {
 	struct cpu_data *data = policy->driver_data;
 
-	cpufreq_cooling_unregister(data->cdev);
 	kfree(data->pclk);
 	kfree(data->table);
 	kfree(data);
···
 	return clk_set_parent(policy->clk, parent);
 }
 
-
-static void qoriq_cpufreq_ready(struct cpufreq_policy *policy)
-{
-	struct cpu_data *cpud = policy->driver_data;
-
-	cpud->cdev = of_cpufreq_cooling_register(policy);
-}
-
 static struct cpufreq_driver qoriq_cpufreq_driver = {
 	.name		= "qoriq_cpufreq",
-	.flags		= CPUFREQ_CONST_LOOPS,
+	.flags		= CPUFREQ_CONST_LOOPS |
+			  CPUFREQ_IS_COOLING_DEV,
 	.init		= qoriq_cpufreq_cpu_init,
 	.exit		= qoriq_cpufreq_cpu_exit,
 	.verify		= cpufreq_generic_frequency_table_verify,
 	.target_index	= qoriq_cpufreq_target,
 	.get		= cpufreq_generic_get,
-	.ready		= qoriq_cpufreq_ready,
 	.attr		= cpufreq_generic_attr,
 };
 
+48 -19
drivers/cpufreq/s5pv210-cpufreq.c
···
 static int s5pv210_cpufreq_probe(struct platform_device *pdev)
 {
 	struct device_node *np;
-	int id;
+	int id, result = 0;
 
 	/*
 	 * HACK: This is a temporary workaround to get access to clock
···
 	 * this whole driver as soon as S5PV210 gets migrated to use
 	 * cpufreq-dt driver.
 	 */
+	arm_regulator = regulator_get(NULL, "vddarm");
+	if (IS_ERR(arm_regulator)) {
+		if (PTR_ERR(arm_regulator) == -EPROBE_DEFER)
+			pr_debug("vddarm regulator not ready, defer\n");
+		else
+			pr_err("failed to get regulator vddarm\n");
+		return PTR_ERR(arm_regulator);
+	}
+
+	int_regulator = regulator_get(NULL, "vddint");
+	if (IS_ERR(int_regulator)) {
+		if (PTR_ERR(int_regulator) == -EPROBE_DEFER)
+			pr_debug("vddint regulator not ready, defer\n");
+		else
+			pr_err("failed to get regulator vddint\n");
+		result = PTR_ERR(int_regulator);
+		goto err_int_regulator;
+	}
+
 	np = of_find_compatible_node(NULL, NULL, "samsung,s5pv210-clock");
 	if (!np) {
 		pr_err("%s: failed to find clock controller DT node\n",
 		       __func__);
-		return -ENODEV;
+		result = -ENODEV;
+		goto err_clock;
 	}
 
 	clk_base = of_iomap(np, 0);
 	of_node_put(np);
 	if (!clk_base) {
 		pr_err("%s: failed to map clock registers\n", __func__);
-		return -EFAULT;
+		result = -EFAULT;
+		goto err_clock;
 	}
 
 	for_each_compatible_node(np, NULL, "samsung,s5pv210-dmc") {
···
 			pr_err("%s: failed to get alias of dmc node '%pOFn'\n",
 			       __func__, np);
 			of_node_put(np);
-			return id;
+			result = id;
+			goto err_clk_base;
 		}
 
 		dmc_base[id] = of_iomap(np, 0);
···
 			pr_err("%s: failed to map dmc%d registers\n",
 			       __func__, id);
 			of_node_put(np);
-			return -EFAULT;
+			result = -EFAULT;
+			goto err_dmc;
 		}
 	}
 
 	for (id = 0; id < ARRAY_SIZE(dmc_base); ++id) {
 		if (!dmc_base[id]) {
 			pr_err("%s: failed to find dmc%d node\n", __func__, id);
-			return -ENODEV;
+			result = -ENODEV;
+			goto err_dmc;
 		}
-	}
-
-	arm_regulator = regulator_get(NULL, "vddarm");
-	if (IS_ERR(arm_regulator)) {
-		pr_err("failed to get regulator vddarm\n");
-		return PTR_ERR(arm_regulator);
-	}
-
-	int_regulator = regulator_get(NULL, "vddint");
-	if (IS_ERR(int_regulator)) {
-		pr_err("failed to get regulator vddint\n");
-		regulator_put(arm_regulator);
-		return PTR_ERR(int_regulator);
 	}
 
 	register_reboot_notifier(&s5pv210_cpufreq_reboot_notifier);
 
 	return cpufreq_register_driver(&s5pv210_driver);
+
+err_dmc:
+	for (id = 0; id < ARRAY_SIZE(dmc_base); ++id)
+		if (dmc_base[id]) {
+			iounmap(dmc_base[id]);
+			dmc_base[id] = NULL;
+		}
+
+err_clk_base:
+	iounmap(clk_base);
+
+err_clock:
+	regulator_put(int_regulator);
+
+err_int_regulator:
+	regulator_put(arm_regulator);
+
+	return result;
 }
 
 static struct platform_driver s5pv210_cpufreq_platdrv = {
+38 -15
drivers/cpufreq/scmi-cpufreq.c
··· 11 11 #include <linux/cpu.h> 12 12 #include <linux/cpufreq.h> 13 13 #include <linux/cpumask.h> 14 - #include <linux/cpu_cooling.h> 14 + #include <linux/energy_model.h> 15 15 #include <linux/export.h> 16 16 #include <linux/module.h> 17 17 #include <linux/pm_opp.h> ··· 22 22 struct scmi_data { 23 23 int domain_id; 24 24 struct device *cpu_dev; 25 - struct thermal_cooling_device *cdev; 26 25 }; 27 26 28 27 static const struct scmi_handle *handle; ··· 102 103 return 0; 103 104 } 104 105 106 + static int __maybe_unused 107 + scmi_get_cpu_power(unsigned long *power, unsigned long *KHz, int cpu) 108 + { 109 + struct device *cpu_dev = get_cpu_device(cpu); 110 + unsigned long Hz; 111 + int ret, domain; 112 + 113 + if (!cpu_dev) { 114 + pr_err("failed to get cpu%d device\n", cpu); 115 + return -ENODEV; 116 + } 117 + 118 + domain = handle->perf_ops->device_domain_id(cpu_dev); 119 + if (domain < 0) 120 + return domain; 121 + 122 + /* Get the power cost of the performance domain. */ 123 + Hz = *KHz * 1000; 124 + ret = handle->perf_ops->est_power_get(handle, domain, &Hz, power); 125 + if (ret) 126 + return ret; 127 + 128 + /* The EM framework specifies the frequency in KHz. 
*/ 129 + *KHz = Hz / 1000; 130 + 131 + return 0; 132 + } 133 + 105 134 static int scmi_cpufreq_init(struct cpufreq_policy *policy) 106 135 { 107 - int ret; 136 + int ret, nr_opp; 108 137 unsigned int latency; 109 138 struct device *cpu_dev; 110 139 struct scmi_data *priv; 111 140 struct cpufreq_frequency_table *freq_table; 141 + struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power); 112 142 113 143 cpu_dev = get_cpu_device(policy->cpu); 114 144 if (!cpu_dev) { ··· 164 136 return ret; 165 137 } 166 138 167 - ret = dev_pm_opp_get_opp_count(cpu_dev); 168 - if (ret <= 0) { 139 + nr_opp = dev_pm_opp_get_opp_count(cpu_dev); 140 + if (nr_opp <= 0) { 169 141 dev_dbg(cpu_dev, "OPP table is not ready, deferring probe\n"); 170 142 ret = -EPROBE_DEFER; 171 143 goto out_free_opp; ··· 199 171 policy->cpuinfo.transition_latency = latency; 200 172 201 173 policy->fast_switch_possible = true; 174 + 175 + em_register_perf_domain(policy->cpus, nr_opp, &em_cb); 176 + 202 177 return 0; 203 178 204 179 out_free_priv: ··· 216 185 { 217 186 struct scmi_data *priv = policy->driver_data; 218 187 219 - cpufreq_cooling_unregister(priv->cdev); 220 188 dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table); 221 189 dev_pm_opp_remove_all_dynamic(priv->cpu_dev); 222 190 kfree(priv); ··· 223 193 return 0; 224 194 } 225 195 226 - static void scmi_cpufreq_ready(struct cpufreq_policy *policy) 227 - { 228 - struct scmi_data *priv = policy->driver_data; 229 - 230 - priv->cdev = of_cpufreq_cooling_register(policy); 231 - } 232 - 233 196 static struct cpufreq_driver scmi_cpufreq_driver = { 234 197 .name = "scmi", 235 198 .flags = CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY | 236 - CPUFREQ_NEED_INITIAL_FREQ_CHECK, 199 + CPUFREQ_NEED_INITIAL_FREQ_CHECK | 200 + CPUFREQ_IS_COOLING_DEV, 237 201 .verify = cpufreq_generic_frequency_table_verify, 238 202 .attr = cpufreq_generic_attr, 239 203 .target_index = scmi_cpufreq_set_target, ··· 235 211 .get = scmi_cpufreq_get_rate, 236 212 .init 
= scmi_cpufreq_init, 237 213 .exit = scmi_cpufreq_exit, 238 - .ready = scmi_cpufreq_ready, 239 214 }; 240 215 241 216 static int scmi_cpufreq_probe(struct scmi_device *sdev)
+5 -12
drivers/cpufreq/scpi-cpufreq.c
··· 22 22 #include <linux/cpu.h> 23 23 #include <linux/cpufreq.h> 24 24 #include <linux/cpumask.h> 25 - #include <linux/cpu_cooling.h> 26 25 #include <linux/export.h> 27 26 #include <linux/module.h> 28 27 #include <linux/of_platform.h> ··· 33 34 struct scpi_data { 34 35 struct clk *clk; 35 36 struct device *cpu_dev; 36 - struct thermal_cooling_device *cdev; 37 37 }; 38 38 39 39 static struct scpi_ops *scpi_ops; ··· 168 170 policy->cpuinfo.transition_latency = latency; 169 171 170 172 policy->fast_switch_possible = false; 173 + 174 + dev_pm_opp_of_register_em(policy->cpus); 175 + 171 176 return 0; 172 177 173 178 out_free_cpufreq_table: ··· 187 186 { 188 187 struct scpi_data *priv = policy->driver_data; 189 188 190 - cpufreq_cooling_unregister(priv->cdev); 191 189 clk_put(priv->clk); 192 190 dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table); 193 191 kfree(priv); ··· 195 195 return 0; 196 196 } 197 197 198 - static void scpi_cpufreq_ready(struct cpufreq_policy *policy) 199 - { 200 - struct scpi_data *priv = policy->driver_data; 201 - 202 - priv->cdev = of_cpufreq_cooling_register(policy); 203 - } 204 - 205 198 static struct cpufreq_driver scpi_cpufreq_driver = { 206 199 .name = "scpi-cpufreq", 207 200 .flags = CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY | 208 - CPUFREQ_NEED_INITIAL_FREQ_CHECK, 201 + CPUFREQ_NEED_INITIAL_FREQ_CHECK | 202 + CPUFREQ_IS_COOLING_DEV, 209 203 .verify = cpufreq_generic_frequency_table_verify, 210 204 .attr = cpufreq_generic_attr, 211 205 .get = scpi_cpufreq_get_rate, 212 206 .init = scpi_cpufreq_init, 213 207 .exit = scpi_cpufreq_exit, 214 - .ready = scpi_cpufreq_ready, 215 208 .target_index = scpi_cpufreq_set_target, 216 209 }; 217 210
+1 -2
drivers/cpufreq/speedstep-ich.c
··· 243 243 unsigned int speed; 244 244 245 245 /* You're supposed to ensure CPU is online. */ 246 - if (smp_call_function_single(cpu, get_freq_data, &speed, 1) != 0) 247 - BUG(); 246 + BUG_ON(smp_call_function_single(cpu, get_freq_data, &speed, 1)); 248 247 249 248 pr_debug("detected %u kHz as current frequency\n", speed); 250 249 return speed;
+2
drivers/cpufreq/tegra124-cpufreq.c
··· 118 118 119 119 platform_set_drvdata(pdev, priv); 120 120 121 + of_node_put(np); 122 + 121 123 return 0; 122 124 123 125 out_put_pllp_clk:
+10 -1
drivers/cpuidle/Kconfig
··· 4 4 bool "CPU idle PM support" 5 5 default y if ACPI || PPC_PSERIES 6 6 select CPU_IDLE_GOV_LADDER if (!NO_HZ && !NO_HZ_IDLE) 7 - select CPU_IDLE_GOV_MENU if (NO_HZ || NO_HZ_IDLE) 7 + select CPU_IDLE_GOV_MENU if (NO_HZ || NO_HZ_IDLE) && !CPU_IDLE_GOV_TEO 8 8 help 9 9 CPU idle is a generic framework for supporting software-controlled 10 10 idle processor power management. It includes modular cross-platform ··· 22 22 23 23 config CPU_IDLE_GOV_MENU 24 24 bool "Menu governor (for tickless system)" 25 + 26 + config CPU_IDLE_GOV_TEO 27 + bool "Timer events oriented (TEO) governor (for tickless systems)" 28 + help 29 + This governor implements a simplified idle state selection method 30 + focused on timer events and does not do any interactivity boosting. 31 + 32 + Some workloads benefit from using it and it generally should be safe 33 + to use. Say Y here if you are not happy with the alternatives. 25 34 26 35 config DT_IDLE_STATES 27 36 bool
+9 -6
drivers/cpuidle/dt_idle_states.c
··· 22 22 #include "dt_idle_states.h" 23 23 24 24 static int init_state_node(struct cpuidle_state *idle_state, 25 - const struct of_device_id *matches, 25 + const struct of_device_id *match_id, 26 26 struct device_node *state_node) 27 27 { 28 28 int err; 29 - const struct of_device_id *match_id; 30 29 const char *desc; 31 30 32 - match_id = of_match_node(matches, state_node); 33 - if (!match_id) 34 - return -ENODEV; 35 31 /* 36 32 * CPUidle drivers are expected to initialize the const void *data 37 33 * pointer of the passed in struct of_device_id array to the idle ··· 156 160 { 157 161 struct cpuidle_state *idle_state; 158 162 struct device_node *state_node, *cpu_node; 163 + const struct of_device_id *match_id; 159 164 int i, err = 0; 160 165 const cpumask_t *cpumask; 161 166 unsigned int state_idx = start_idx; ··· 177 180 if (!state_node) 178 181 break; 179 182 183 + match_id = of_match_node(matches, state_node); 184 + if (!match_id) { 185 + err = -ENODEV; 186 + break; 187 + } 188 + 180 189 if (!of_device_is_available(state_node)) { 181 190 of_node_put(state_node); 182 191 continue; ··· 201 198 } 202 199 203 200 idle_state = &drv->states[state_idx++]; 204 - err = init_state_node(idle_state, matches, state_node); 201 + err = init_state_node(idle_state, match_id, state_node); 205 202 if (err) { 206 203 pr_err("Parsing idle state node %pOF failed with err %d\n", 207 204 state_node, err);
+1
drivers/cpuidle/governors/Makefile
··· 4 4 5 5 obj-$(CONFIG_CPU_IDLE_GOV_LADDER) += ladder.o 6 6 obj-$(CONFIG_CPU_IDLE_GOV_MENU) += menu.o 7 + obj-$(CONFIG_CPU_IDLE_GOV_TEO) += teo.o
+444
drivers/cpuidle/governors/teo.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Timer events oriented CPU idle governor 4 + * 5 + * Copyright (C) 2018 Intel Corporation 6 + * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com> 7 + * 8 + * The idea of this governor is based on the observation that on many systems 9 + * timer events are two or more orders of magnitude more frequent than any 10 + * other interrupts, so they are likely to be the most significant source of CPU 11 + * wakeups from idle states. Moreover, information about what happened in the 12 + * (relatively recent) past can be used to estimate whether or not the deepest 13 + * idle state with target residency within the time to the closest timer is 14 + * likely to be suitable for the upcoming idle time of the CPU and, if not, then 15 + * which of the shallower idle states to choose. 16 + * 17 + * Of course, non-timer wakeup sources are more important in some use cases and 18 + * they can be covered by taking a few most recent idle time intervals of the 19 + * CPU into account. However, even in that case it is not necessary to consider 20 + * idle duration values greater than the time till the closest timer, as the 21 + * patterns that they may belong to produce average values close enough to 22 + * the time till the closest timer (sleep length) anyway. 23 + * 24 + * Thus this governor estimates whether or not the upcoming idle time of the CPU 25 + * is likely to be significantly shorter than the sleep length and selects an 26 + * idle state for it in accordance with that, as follows: 27 + * 28 + * - Find an idle state on the basis of the sleep length and state statistics 29 + * collected over time: 30 + * 31 + * o Find the deepest idle state whose target residency is less than or equal 32 + * to the sleep length. 33 + * 34 + * o Select it if it matched both the sleep length and the observed idle 35 + * duration in the past more often than it matched the sleep length alone 36 + * (i.e. 
the observed idle duration was significantly shorter than the sleep 37 + * length matched by it). 38 + * 39 + * o Otherwise, select the shallower state with the greatest matched "early" 40 + * wakeups metric. 41 + * 42 + * - If the majority of the most recent idle duration values are below the 43 + * target residency of the idle state selected so far, use those values to 44 + * compute the new expected idle duration and find an idle state matching it 45 + * (which has to be shallower than the one selected so far). 46 + */ 47 + 48 + #include <linux/cpuidle.h> 49 + #include <linux/jiffies.h> 50 + #include <linux/kernel.h> 51 + #include <linux/sched/clock.h> 52 + #include <linux/tick.h> 53 + 54 + /* 55 + * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value 56 + * is used for decreasing metrics on a regular basis. 57 + */ 58 + #define PULSE 1024 59 + #define DECAY_SHIFT 3 60 + 61 + /* 62 + * Number of the most recent idle duration values to take into consideration for 63 + * the detection of wakeup patterns. 64 + */ 65 + #define INTERVALS 8 66 + 67 + /** 68 + * struct teo_idle_state - Idle state data used by the TEO cpuidle governor. 69 + * @early_hits: "Early" CPU wakeups "matching" this state. 70 + * @hits: "On time" CPU wakeups "matching" this state. 71 + * @misses: CPU wakeups "missing" this state. 72 + * 73 + * A CPU wakeup is "matched" by a given idle state if the idle duration measured 74 + * after the wakeup is between the target residency of that state and the target 75 + * residency of the next one (or if this is the deepest available idle state, it 76 + * "matches" a CPU wakeup when the measured idle duration is at least equal to 77 + * its target residency). 
78 + * 79 + * Also, from the TEO governor perspective, a CPU wakeup from idle is "early" if 80 + * it occurs significantly earlier than the closest expected timer event (that 81 + * is, early enough to match an idle state shallower than the one matching the 82 + * time till the closest timer event). Otherwise, the wakeup is "on time", or 83 + * it is a "hit". 84 + * 85 + * A "miss" occurs when the given state doesn't match the wakeup, but it matches 86 + * the time till the closest timer event used for idle state selection. 87 + */ 88 + struct teo_idle_state { 89 + unsigned int early_hits; 90 + unsigned int hits; 91 + unsigned int misses; 92 + }; 93 + 94 + /** 95 + * struct teo_cpu - CPU data used by the TEO cpuidle governor. 96 + * @time_span_ns: Time between idle state selection and post-wakeup update. 97 + * @sleep_length_ns: Time till the closest timer event (at the selection time). 98 + * @states: Idle states data corresponding to this CPU. 99 + * @last_state: Idle state entered by the CPU last time. 100 + * @interval_idx: Index of the most recent saved idle interval. 101 + * @intervals: Saved idle duration values. 102 + */ 103 + struct teo_cpu { 104 + u64 time_span_ns; 105 + u64 sleep_length_ns; 106 + struct teo_idle_state states[CPUIDLE_STATE_MAX]; 107 + int last_state; 108 + int interval_idx; 109 + unsigned int intervals[INTERVALS]; 110 + }; 111 + 112 + static DEFINE_PER_CPU(struct teo_cpu, teo_cpus); 113 + 114 + /** 115 + * teo_update - Update CPU data after wakeup. 116 + * @drv: cpuidle driver containing state data. 117 + * @dev: Target CPU. 
118 + */ 119 + static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) 120 + { 121 + struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); 122 + unsigned int sleep_length_us = ktime_to_us(cpu_data->sleep_length_ns); 123 + int i, idx_hit = -1, idx_timer = -1; 124 + unsigned int measured_us; 125 + 126 + if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) { 127 + /* 128 + * One of the safety nets has triggered or this was a timer 129 + * wakeup (or equivalent). 130 + */ 131 + measured_us = sleep_length_us; 132 + } else { 133 + unsigned int lat = drv->states[cpu_data->last_state].exit_latency; 134 + 135 + measured_us = ktime_to_us(cpu_data->time_span_ns); 136 + /* 137 + * The delay between the wakeup and the first instruction 138 + * executed by the CPU is not likely to be worst-case every 139 + * time, so take 1/2 of the exit latency as a very rough 140 + * approximation of the average of it. 141 + */ 142 + if (measured_us >= lat) 143 + measured_us -= lat / 2; 144 + else 145 + measured_us /= 2; 146 + } 147 + 148 + /* 149 + * Decay the "early hits" metric for all of the states and find the 150 + * states matching the sleep length and the measured idle duration. 151 + */ 152 + for (i = 0; i < drv->state_count; i++) { 153 + unsigned int early_hits = cpu_data->states[i].early_hits; 154 + 155 + cpu_data->states[i].early_hits -= early_hits >> DECAY_SHIFT; 156 + 157 + if (drv->states[i].target_residency <= sleep_length_us) { 158 + idx_timer = i; 159 + if (drv->states[i].target_residency <= measured_us) 160 + idx_hit = i; 161 + } 162 + } 163 + 164 + /* 165 + * Update the "hits" and "misses" data for the state matching the sleep 166 + * length. If it matches the measured idle duration too, this is a hit, 167 + * so increase the "hits" metric for it then. Otherwise, this is a 168 + * miss, so increase the "misses" metric for it. 
In the latter case 169 + * also increase the "early hits" metric for the state that actually 170 + * matches the measured idle duration. 171 + */ 172 + if (idx_timer >= 0) { 173 + unsigned int hits = cpu_data->states[idx_timer].hits; 174 + unsigned int misses = cpu_data->states[idx_timer].misses; 175 + 176 + hits -= hits >> DECAY_SHIFT; 177 + misses -= misses >> DECAY_SHIFT; 178 + 179 + if (idx_timer > idx_hit) { 180 + misses += PULSE; 181 + if (idx_hit >= 0) 182 + cpu_data->states[idx_hit].early_hits += PULSE; 183 + } else { 184 + hits += PULSE; 185 + } 186 + 187 + cpu_data->states[idx_timer].misses = misses; 188 + cpu_data->states[idx_timer].hits = hits; 189 + } 190 + 191 + /* 192 + * If the total time span between idle state selection and the "reflect" 193 + * callback is greater than or equal to the sleep length determined at 194 + * the idle state selection time, the wakeup is likely to be due to a 195 + * timer event. 196 + */ 197 + if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) 198 + measured_us = UINT_MAX; 199 + 200 + /* 201 + * Save idle duration values corresponding to non-timer wakeups for 202 + * pattern detection. 203 + */ 204 + cpu_data->intervals[cpu_data->interval_idx++] = measured_us; 205 + if (cpu_data->interval_idx > INTERVALS) 206 + cpu_data->interval_idx = 0; 207 + } 208 + 209 + /** 210 + * teo_find_shallower_state - Find shallower idle state matching given duration. 211 + * @drv: cpuidle driver containing state data. 212 + * @dev: Target CPU. 213 + * @state_idx: Index of the capping idle state. 214 + * @duration_us: Idle duration value to match. 
215 + */ 216 + static int teo_find_shallower_state(struct cpuidle_driver *drv, 217 + struct cpuidle_device *dev, int state_idx, 218 + unsigned int duration_us) 219 + { 220 + int i; 221 + 222 + for (i = state_idx - 1; i >= 0; i--) { 223 + if (drv->states[i].disabled || dev->states_usage[i].disable) 224 + continue; 225 + 226 + state_idx = i; 227 + if (drv->states[i].target_residency <= duration_us) 228 + break; 229 + } 230 + return state_idx; 231 + } 232 + 233 + /** 234 + * teo_select - Selects the next idle state to enter. 235 + * @drv: cpuidle driver containing state data. 236 + * @dev: Target CPU. 237 + * @stop_tick: Indication on whether or not to stop the scheduler tick. 238 + */ 239 + static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, 240 + bool *stop_tick) 241 + { 242 + struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); 243 + int latency_req = cpuidle_governor_latency_req(dev->cpu); 244 + unsigned int duration_us, count; 245 + int max_early_idx, idx, i; 246 + ktime_t delta_tick; 247 + 248 + if (cpu_data->last_state >= 0) { 249 + teo_update(drv, dev); 250 + cpu_data->last_state = -1; 251 + } 252 + 253 + cpu_data->time_span_ns = local_clock(); 254 + 255 + cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick); 256 + duration_us = ktime_to_us(cpu_data->sleep_length_ns); 257 + 258 + count = 0; 259 + max_early_idx = -1; 260 + idx = -1; 261 + 262 + for (i = 0; i < drv->state_count; i++) { 263 + struct cpuidle_state *s = &drv->states[i]; 264 + struct cpuidle_state_usage *su = &dev->states_usage[i]; 265 + 266 + if (s->disabled || su->disable) { 267 + /* 268 + * If the "early hits" metric of a disabled state is 269 + * greater than the current maximum, it should be taken 270 + * into account, because it would be a mistake to select 271 + * a deeper state with lower "early hits" metric. 
The 272 + * index cannot be changed to point to it, however, so 273 + * just increase the max count alone and let the index 274 + * still point to a shallower idle state. 275 + */ 276 + if (max_early_idx >= 0 && 277 + count < cpu_data->states[i].early_hits) 278 + count = cpu_data->states[i].early_hits; 279 + 280 + continue; 281 + } 282 + 283 + if (idx < 0) 284 + idx = i; /* first enabled state */ 285 + 286 + if (s->target_residency > duration_us) 287 + break; 288 + 289 + if (s->exit_latency > latency_req) { 290 + /* 291 + * If we break out of the loop for latency reasons, use 292 + * the target residency of the selected state as the 293 + * expected idle duration to avoid stopping the tick 294 + * as long as that target residency is low enough. 295 + */ 296 + duration_us = drv->states[idx].target_residency; 297 + goto refine; 298 + } 299 + 300 + idx = i; 301 + 302 + if (count < cpu_data->states[i].early_hits && 303 + !(tick_nohz_tick_stopped() && 304 + drv->states[i].target_residency < TICK_USEC)) { 305 + count = cpu_data->states[i].early_hits; 306 + max_early_idx = i; 307 + } 308 + } 309 + 310 + /* 311 + * If the "hits" metric of the idle state matching the sleep length is 312 + * greater than its "misses" metric, that is the one to use. Otherwise, 313 + * it is more likely that one of the shallower states will match the 314 + * idle duration observed after wakeup, so take the one with the maximum 315 + * "early hits" metric, but if that cannot be determined, just use the 316 + * state selected so far. 317 + */ 318 + if (cpu_data->states[idx].hits <= cpu_data->states[idx].misses && 319 + max_early_idx >= 0) { 320 + idx = max_early_idx; 321 + duration_us = drv->states[idx].target_residency; 322 + } 323 + 324 + refine: 325 + if (idx < 0) { 326 + idx = 0; /* No states enabled. Must use 0. 
*/ 327 + } else if (idx > 0) { 328 + u64 sum = 0; 329 + 330 + count = 0; 331 + 332 + /* 333 + * Count and sum the most recent idle duration values less than 334 + * the target residency of the state selected so far, find the 335 + * max. 336 + */ 337 + for (i = 0; i < INTERVALS; i++) { 338 + unsigned int val = cpu_data->intervals[i]; 339 + 340 + if (val >= drv->states[idx].target_residency) 341 + continue; 342 + 343 + count++; 344 + sum += val; 345 + } 346 + 347 + /* 348 + * Give up unless the majority of the most recent idle duration 349 + * values are in the interesting range. 350 + */ 351 + if (count > INTERVALS / 2) { 352 + unsigned int avg_us = div64_u64(sum, count); 353 + 354 + /* 355 + * Avoid spending too much time in an idle state that 356 + * would be too shallow. 357 + */ 358 + if (!(tick_nohz_tick_stopped() && avg_us < TICK_USEC)) { 359 + idx = teo_find_shallower_state(drv, dev, idx, avg_us); 360 + duration_us = avg_us; 361 + } 362 + } 363 + } 364 + 365 + /* 366 + * Don't stop the tick if the selected state is a polling one or if the 367 + * expected idle duration is shorter than the tick period length. 368 + */ 369 + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || 370 + duration_us < TICK_USEC) && !tick_nohz_tick_stopped()) { 371 + unsigned int delta_tick_us = ktime_to_us(delta_tick); 372 + 373 + *stop_tick = false; 374 + 375 + /* 376 + * The tick is not going to be stopped, so if the target 377 + * residency of the state to be returned is not within the time 378 + * till the closest timer including the tick, try to correct 379 + * that. 380 + */ 381 + if (idx > 0 && drv->states[idx].target_residency > delta_tick_us) 382 + idx = teo_find_shallower_state(drv, dev, idx, delta_tick_us); 383 + } 384 + 385 + return idx; 386 + } 387 + 388 + /** 389 + * teo_reflect - Note that governor data for the CPU need to be updated. 390 + * @dev: Target CPU. 391 + * @state: Entered state. 
392 + */ 393 + static void teo_reflect(struct cpuidle_device *dev, int state) 394 + { 395 + struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); 396 + 397 + cpu_data->last_state = state; 398 + /* 399 + * If the wakeup was not "natural", but triggered by one of the safety 400 + * nets, assume that the CPU might have been idle for the entire sleep 401 + * length time. 402 + */ 403 + if (dev->poll_time_limit || 404 + (tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) { 405 + dev->poll_time_limit = false; 406 + cpu_data->time_span_ns = cpu_data->sleep_length_ns; 407 + } else { 408 + cpu_data->time_span_ns = local_clock() - cpu_data->time_span_ns; 409 + } 410 + } 411 + 412 + /** 413 + * teo_enable_device - Initialize the governor's data for the target CPU. 414 + * @drv: cpuidle driver (not used). 415 + * @dev: Target CPU. 416 + */ 417 + static int teo_enable_device(struct cpuidle_driver *drv, 418 + struct cpuidle_device *dev) 419 + { 420 + struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); 421 + int i; 422 + 423 + memset(cpu_data, 0, sizeof(*cpu_data)); 424 + 425 + for (i = 0; i < INTERVALS; i++) 426 + cpu_data->intervals[i] = UINT_MAX; 427 + 428 + return 0; 429 + } 430 + 431 + static struct cpuidle_governor teo_governor = { 432 + .name = "teo", 433 + .rating = 19, 434 + .enable = teo_enable_device, 435 + .select = teo_select, 436 + .reflect = teo_reflect, 437 + }; 438 + 439 + static int __init teo_governor_init(void) 440 + { 441 + return cpuidle_register_governor(&teo_governor); 442 + } 443 + 444 + postcore_initcall(teo_governor_init);
+6 -10
drivers/gpu/drm/i915/i915_pmu.c
··· 5 5 */ 6 6 7 7 #include <linux/irq.h> 8 + #include <linux/pm_runtime.h> 8 9 #include "i915_pmu.h" 9 10 #include "intel_ringbuffer.h" 10 11 #include "i915_drv.h" ··· 479 478 * counter value. 480 479 */ 481 480 spin_lock_irqsave(&i915->pmu.lock, flags); 482 - spin_lock(&kdev->power.lock); 483 481 484 482 /* 485 483 * After the above branch intel_runtime_pm_get_if_in_use failed ··· 491 491 * suspended and if not we cannot do better than report the last 492 492 * known RC6 value. 493 493 */ 494 - if (kdev->power.runtime_status == RPM_SUSPENDED) { 494 + if (pm_runtime_status_suspended(kdev)) { 495 + val = pm_runtime_suspended_time(kdev); 496 + 495 497 if (!i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur) 496 - i915->pmu.suspended_jiffies_last = 497 - kdev->power.suspended_jiffies; 498 + i915->pmu.suspended_time_last = val; 498 499 499 - val = kdev->power.suspended_jiffies - 500 - i915->pmu.suspended_jiffies_last; 501 - val += jiffies - kdev->power.accounting_timestamp; 502 - 503 - val = jiffies_to_nsecs(val); 500 + val -= i915->pmu.suspended_time_last; 504 501 val += i915->pmu.sample[__I915_SAMPLE_RC6].cur; 505 502 506 503 i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur = val; ··· 507 510 val = i915->pmu.sample[__I915_SAMPLE_RC6].cur; 508 511 } 509 512 510 - spin_unlock(&kdev->power.lock); 511 513 spin_unlock_irqrestore(&i915->pmu.lock, flags); 512 514 } 513 515
+2 -2
drivers/gpu/drm/i915/i915_pmu.h
··· 97 97 */ 98 98 struct i915_pmu_sample sample[__I915_NUM_PMU_SAMPLERS]; 99 99 /** 100 - * @suspended_jiffies_last: Cached suspend time from PM core. 100 + * @suspended_time_last: Cached suspend time from PM core. 101 101 */ 102 - unsigned long suspended_jiffies_last; 102 + u64 suspended_time_last; 103 103 /** 104 104 * @i915_attr: Memory block holding device attributes. 105 105 */
+1
drivers/idle/intel_idle.c
··· 1103 1103 INTEL_CPU_FAM6(ATOM_GOLDMONT, idle_cpu_bxt), 1104 1104 INTEL_CPU_FAM6(ATOM_GOLDMONT_PLUS, idle_cpu_bxt), 1105 1105 INTEL_CPU_FAM6(ATOM_GOLDMONT_X, idle_cpu_dnv), 1106 + INTEL_CPU_FAM6(ATOM_TREMONT_X, idle_cpu_dnv), 1106 1107 {} 1107 1108 }; 1108 1109
+7 -15
drivers/opp/core.c
··· 551 551 return ret; 552 552 } 553 553 554 - static inline int 555 - _generic_set_opp_clk_only(struct device *dev, struct clk *clk, 556 - unsigned long old_freq, unsigned long freq) 554 + static inline int _generic_set_opp_clk_only(struct device *dev, struct clk *clk, 555 + unsigned long freq) 557 556 { 558 557 int ret; 559 558 ··· 589 590 } 590 591 591 592 /* Change frequency */ 592 - ret = _generic_set_opp_clk_only(dev, opp_table->clk, old_freq, freq); 593 + ret = _generic_set_opp_clk_only(dev, opp_table->clk, freq); 593 594 if (ret) 594 595 goto restore_voltage; 595 596 ··· 603 604 return 0; 604 605 605 606 restore_freq: 606 - if (_generic_set_opp_clk_only(dev, opp_table->clk, freq, old_freq)) 607 + if (_generic_set_opp_clk_only(dev, opp_table->clk, old_freq)) 607 608 dev_err(dev, "%s: failed to restore old-freq (%lu Hz)\n", 608 609 __func__, old_freq); 609 610 restore_voltage: ··· 776 777 opp->supplies); 777 778 } else { 778 779 /* Only frequency scaling */ 779 - ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq); 780 + ret = _generic_set_opp_clk_only(dev, clk, freq); 780 781 } 781 782 782 783 /* Scaling down? 
Configure required OPPs after frequency */ ··· 810 811 struct opp_table *opp_table) 811 812 { 812 813 struct opp_device *opp_dev; 813 - int ret; 814 814 815 815 opp_dev = kzalloc(sizeof(*opp_dev), GFP_KERNEL); 816 816 if (!opp_dev) ··· 821 823 list_add(&opp_dev->node, &opp_table->dev_list); 822 824 823 825 /* Create debugfs entries for the opp_table */ 824 - ret = opp_debug_register(opp_dev, opp_table); 825 - if (ret) 826 - dev_err(dev, "%s: Failed to register opp debugfs (%d)\n", 827 - __func__, ret); 826 + opp_debug_register(opp_dev, opp_table); 828 827 829 828 return opp_dev; 830 829 } ··· 1242 1247 new_opp->opp_table = opp_table; 1243 1248 kref_init(&new_opp->kref); 1244 1249 1245 - ret = opp_debug_create_one(new_opp, opp_table); 1246 - if (ret) 1247 - dev_err(dev, "%s: Failed to register opp to debugfs (%d)\n", 1248 - __func__, ret); 1250 + opp_debug_create_one(new_opp, opp_table); 1249 1251 1250 1252 if (!_opp_supported_by_regulators(new_opp, opp_table)) { 1251 1253 new_opp->available = false;
+29 -81
drivers/opp/debugfs.c
··· 35 35 debugfs_remove_recursive(opp->dentry); 36 36 } 37 37 38 - static bool opp_debug_create_supplies(struct dev_pm_opp *opp, 38 + static void opp_debug_create_supplies(struct dev_pm_opp *opp, 39 39 struct opp_table *opp_table, 40 40 struct dentry *pdentry) 41 41 { ··· 50 50 /* Create per-opp directory */ 51 51 d = debugfs_create_dir(name, pdentry); 52 52 53 - if (!d) 54 - return false; 53 + debugfs_create_ulong("u_volt_target", S_IRUGO, d, 54 + &opp->supplies[i].u_volt); 55 55 56 - if (!debugfs_create_ulong("u_volt_target", S_IRUGO, d, 57 - &opp->supplies[i].u_volt)) 58 - return false; 56 + debugfs_create_ulong("u_volt_min", S_IRUGO, d, 57 + &opp->supplies[i].u_volt_min); 59 58 60 - if (!debugfs_create_ulong("u_volt_min", S_IRUGO, d, 61 - &opp->supplies[i].u_volt_min)) 62 - return false; 59 + debugfs_create_ulong("u_volt_max", S_IRUGO, d, 60 + &opp->supplies[i].u_volt_max); 63 61 64 - if (!debugfs_create_ulong("u_volt_max", S_IRUGO, d, 65 - &opp->supplies[i].u_volt_max)) 66 - return false; 67 - 68 - if (!debugfs_create_ulong("u_amp", S_IRUGO, d, 69 - &opp->supplies[i].u_amp)) 70 - return false; 62 + debugfs_create_ulong("u_amp", S_IRUGO, d, 63 + &opp->supplies[i].u_amp); 71 64 } 72 - 73 - return true; 74 65 } 75 66 76 - int opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table) 67 + void opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table) 77 68 { 78 69 struct dentry *pdentry = opp_table->dentry; 79 70 struct dentry *d; ··· 86 95 87 96 /* Create per-opp directory */ 88 97 d = debugfs_create_dir(name, pdentry); 89 - if (!d) 90 - return -ENOMEM; 91 98 92 - if (!debugfs_create_bool("available", S_IRUGO, d, &opp->available)) 93 - return -ENOMEM; 99 + debugfs_create_bool("available", S_IRUGO, d, &opp->available); 100 + debugfs_create_bool("dynamic", S_IRUGO, d, &opp->dynamic); 101 + debugfs_create_bool("turbo", S_IRUGO, d, &opp->turbo); 102 + debugfs_create_bool("suspend", S_IRUGO, d, &opp->suspend); 103 + 
 	debugfs_create_u32("performance_state", S_IRUGO, d, &opp->pstate);
+	debugfs_create_ulong("rate_hz", S_IRUGO, d, &opp->rate);
+	debugfs_create_ulong("clock_latency_ns", S_IRUGO, d,
+			     &opp->clock_latency_ns);
 
-	if (!debugfs_create_bool("dynamic", S_IRUGO, d, &opp->dynamic))
-		return -ENOMEM;
-
-	if (!debugfs_create_bool("turbo", S_IRUGO, d, &opp->turbo))
-		return -ENOMEM;
-
-	if (!debugfs_create_bool("suspend", S_IRUGO, d, &opp->suspend))
-		return -ENOMEM;
-
-	if (!debugfs_create_u32("performance_state", S_IRUGO, d, &opp->pstate))
-		return -ENOMEM;
-
-	if (!debugfs_create_ulong("rate_hz", S_IRUGO, d, &opp->rate))
-		return -ENOMEM;
-
-	if (!opp_debug_create_supplies(opp, opp_table, d))
-		return -ENOMEM;
-
-	if (!debugfs_create_ulong("clock_latency_ns", S_IRUGO, d,
-				  &opp->clock_latency_ns))
-		return -ENOMEM;
+	opp_debug_create_supplies(opp, opp_table, d);
 
 	opp->dentry = d;
-	return 0;
 }
 
-static int opp_list_debug_create_dir(struct opp_device *opp_dev,
-				     struct opp_table *opp_table)
+static void opp_list_debug_create_dir(struct opp_device *opp_dev,
+				      struct opp_table *opp_table)
 {
 	const struct device *dev = opp_dev->dev;
 	struct dentry *d;
···
 	/* Create device specific directory */
 	d = debugfs_create_dir(opp_table->dentry_name, rootdir);
-	if (!d) {
-		dev_err(dev, "%s: Failed to create debugfs dir\n", __func__);
-		return -ENOMEM;
-	}
 
 	opp_dev->dentry = d;
 	opp_table->dentry = d;
-
-	return 0;
 }
 
-static int opp_list_debug_create_link(struct opp_device *opp_dev,
-				      struct opp_table *opp_table)
+static void opp_list_debug_create_link(struct opp_device *opp_dev,
+				       struct opp_table *opp_table)
 {
-	const struct device *dev = opp_dev->dev;
 	char name[NAME_MAX];
-	struct dentry *d;
 
 	opp_set_dev_name(opp_dev->dev, name);
 
 	/* Create device specific directory link */
-	d = debugfs_create_symlink(name, rootdir, opp_table->dentry_name);
-	if (!d) {
-		dev_err(dev, "%s: Failed to create link\n", __func__);
-		return -ENOMEM;
-	}
-
-	opp_dev->dentry = d;
-
-	return 0;
+	opp_dev->dentry = debugfs_create_symlink(name, rootdir,
+						 opp_table->dentry_name);
 }
 
 /**
···
 * Dynamically adds device specific directory in debugfs 'opp' directory. If the
 * device-opp is shared with other devices, then links will be created for all
 * devices except the first.
- *
- * Return: 0 on success, otherwise negative error.
 */
-int opp_debug_register(struct opp_device *opp_dev, struct opp_table *opp_table)
+void opp_debug_register(struct opp_device *opp_dev, struct opp_table *opp_table)
 {
-	if (!rootdir) {
-		pr_debug("%s: Uninitialized rootdir\n", __func__);
-		return -EINVAL;
-	}
-
 	if (opp_table->dentry)
-		return opp_list_debug_create_link(opp_dev, opp_table);
-
-	return opp_list_debug_create_dir(opp_dev, opp_table);
+		opp_list_debug_create_link(opp_dev, opp_table);
+	else
+		opp_list_debug_create_dir(opp_dev, opp_table);
 }
 
 static void opp_migrate_dentry(struct opp_device *opp_dev,
···
 {
 	/* Create /sys/kernel/debug/opp directory */
 	rootdir = debugfs_create_dir("opp", NULL);
-	if (!rootdir) {
-		pr_err("%s: Failed to create root directory\n", __func__);
-		return -ENOMEM;
-	}
 
 	return 0;
 }
+99
drivers/opp/of.c
···
 #include <linux/pm_domain.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/energy_model.h>
 
 #include "opp.h"
 
···
 	return of_node_get(opp->np);
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
+
+/*
+ * Callback function provided to the Energy Model framework upon registration.
+ * This computes the power estimated by @CPU at @kHz if it is the frequency
+ * of an existing OPP, or at the frequency of the first OPP above @kHz otherwise
+ * (see dev_pm_opp_find_freq_ceil()). This function updates @kHz to the ceiled
+ * frequency and @mW to the associated power. The power is estimated as
+ * P = C * V^2 * f with C being the CPU's capacitance and V and f respectively
+ * the voltage and frequency of the OPP.
+ *
+ * Returns -ENODEV if the CPU device cannot be found, -EINVAL if the power
+ * calculation failed because of missing parameters, 0 otherwise.
+ */
+static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
+					 int cpu)
+{
+	struct device *cpu_dev;
+	struct dev_pm_opp *opp;
+	struct device_node *np;
+	unsigned long mV, Hz;
+	u32 cap;
+	u64 tmp;
+	int ret;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev)
+		return -ENODEV;
+
+	np = of_node_get(cpu_dev->of_node);
+	if (!np)
+		return -EINVAL;
+
+	ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap);
+	of_node_put(np);
+	if (ret)
+		return -EINVAL;
+
+	Hz = *kHz * 1000;
+	opp = dev_pm_opp_find_freq_ceil(cpu_dev, &Hz);
+	if (IS_ERR(opp))
+		return -EINVAL;
+
+	mV = dev_pm_opp_get_voltage(opp) / 1000;
+	dev_pm_opp_put(opp);
+	if (!mV)
+		return -EINVAL;
+
+	tmp = (u64)cap * mV * mV * (Hz / 1000000);
+	do_div(tmp, 1000000000);
+
+	*mW = (unsigned long)tmp;
+	*kHz = Hz / 1000;
+
+	return 0;
+}
+
+/**
+ * dev_pm_opp_of_register_em() - Attempt to register an Energy Model
+ * @cpus : CPUs for which an Energy Model has to be registered
+ *
+ * This checks whether the "dynamic-power-coefficient" devicetree property has
+ * been specified, and tries to register an Energy Model with it if it has.
+ */
+void dev_pm_opp_of_register_em(struct cpumask *cpus)
+{
+	struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power);
+	int ret, nr_opp, cpu = cpumask_first(cpus);
+	struct device *cpu_dev;
+	struct device_node *np;
+	u32 cap;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev)
+		return;
+
+	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
+	if (nr_opp <= 0)
+		return;
+
+	np = of_node_get(cpu_dev->of_node);
+	if (!np)
+		return;
+
+	/*
+	 * Register an EM only if the 'dynamic-power-coefficient' property is
+	 * set in devicetree. It is assumed the voltage values are known if that
+	 * property is set since it is useless otherwise. If voltages are not
+	 * known, just let the EM registration fail with an error to alert the
+	 * user about the inconsistent configuration.
+	 */
+	ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap);
+	of_node_put(np);
+	if (ret || !cap)
+		return;
+
+	em_register_perf_domain(cpus, nr_opp, &em_cb);
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em);
+7 -8
drivers/opp/opp.h
···
 #ifdef CONFIG_DEBUG_FS
 void opp_debug_remove_one(struct dev_pm_opp *opp);
-int opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table);
-int opp_debug_register(struct opp_device *opp_dev, struct opp_table *opp_table);
+void opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table);
+void opp_debug_register(struct opp_device *opp_dev, struct opp_table *opp_table);
 void opp_debug_unregister(struct opp_device *opp_dev, struct opp_table *opp_table);
 #else
 static inline void opp_debug_remove_one(struct dev_pm_opp *opp) {}
 
-static inline int opp_debug_create_one(struct dev_pm_opp *opp,
-				       struct opp_table *opp_table)
-{ return 0; }
-static inline int opp_debug_register(struct opp_device *opp_dev,
-				     struct opp_table *opp_table)
-{ return 0; }
+static inline void opp_debug_create_one(struct dev_pm_opp *opp,
+					struct opp_table *opp_table) { }
+
+static inline void opp_debug_register(struct opp_device *opp_dev,
+				      struct opp_table *opp_table) { }
 
 static inline void opp_debug_unregister(struct opp_device *opp_dev,
 					struct opp_table *opp_table)
+2
drivers/powercap/intel_rapl.c
···
 	INTEL_CPU_FAM6(KABYLAKE_MOBILE, rapl_defaults_core),
 	INTEL_CPU_FAM6(KABYLAKE_DESKTOP, rapl_defaults_core),
 	INTEL_CPU_FAM6(CANNONLAKE_MOBILE, rapl_defaults_core),
+	INTEL_CPU_FAM6(ICELAKE_MOBILE, rapl_defaults_core),
 
 	INTEL_CPU_FAM6(ATOM_SILVERMONT, rapl_defaults_byt),
 	INTEL_CPU_FAM6(ATOM_AIRMONT, rapl_defaults_cht),
···
 	INTEL_CPU_FAM6(ATOM_GOLDMONT, rapl_defaults_core),
 	INTEL_CPU_FAM6(ATOM_GOLDMONT_PLUS, rapl_defaults_core),
 	INTEL_CPU_FAM6(ATOM_GOLDMONT_X, rapl_defaults_core),
+	INTEL_CPU_FAM6(ATOM_TREMONT_X, rapl_defaults_core),
 
 	INTEL_CPU_FAM6(XEON_PHI_KNL, rapl_defaults_hsw_server),
 	INTEL_CPU_FAM6(XEON_PHI_KNM, rapl_defaults_hsw_server),
+1
drivers/thermal/Kconfig
···
 	bool "generic cpu cooling support"
 	depends on CPU_FREQ
 	depends on THERMAL_OF
+	depends on THERMAL=y
 	help
 	  This implements the generic cpu cooling mechanism through frequency
 	  reduction. An ACPI version of this already exists
+1
include/acpi/cppc_acpi.h
···
 	cpumask_var_t shared_cpu_map;
 };
 
+extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
 extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
 extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
 extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
+26 -22
include/linux/cpufreq.h
···
 	/* For cpufreq driver's internal use */
 	void *driver_data;
+
+	/* Pointer to the cooling device if used for thermal mitigation */
+	struct thermal_cooling_device *cdev;
 };
 
 /* Only for ACPI */
···
 static struct freq_attr _name =			\
 __ATTR(_name, 0200, NULL, store_##_name)
 
-struct global_attr {
-	struct attribute attr;
-	ssize_t (*show)(struct kobject *kobj,
-			struct attribute *attr, char *buf);
-	ssize_t (*store)(struct kobject *a, struct attribute *b,
-			 const char *c, size_t count);
-};
-
 #define define_one_global_ro(_name)		\
-static struct global_attr _name =		\
+static struct kobj_attribute _name =		\
 __ATTR(_name, 0444, show_##_name, NULL)
 
 #define define_one_global_rw(_name)		\
-static struct global_attr _name =		\
+static struct kobj_attribute _name =		\
 __ATTR(_name, 0644, show_##_name, store_##_name)
 
···
 	/* optional */
 	int (*bios_limit)(int cpu, unsigned int *limit);
 
+	int (*online)(struct cpufreq_policy *policy);
+	int (*offline)(struct cpufreq_policy *policy);
 	int (*exit)(struct cpufreq_policy *policy);
 	void (*stop_cpu)(struct cpufreq_policy *policy);
 	int (*suspend)(struct cpufreq_policy *policy);
···
 };
 
 /* flags */
-#define CPUFREQ_STICKY		(1 << 0)	/* driver isn't removed even if
-						   all ->init() calls failed */
-#define CPUFREQ_CONST_LOOPS	(1 << 1)	/* loops_per_jiffy or other
-						   kernel "constants" aren't
-						   affected by frequency
-						   transitions */
-#define CPUFREQ_PM_NO_WARN	(1 << 2)	/* don't warn on suspend/resume
-						   speed mismatches */
+
+/* driver isn't removed even if all ->init() calls failed */
+#define CPUFREQ_STICKY				BIT(0)
+
+/* loops_per_jiffy or other kernel "constants" aren't affected by frequency transitions */
+#define CPUFREQ_CONST_LOOPS			BIT(1)
+
+/* don't warn on suspend/resume speed mismatches */
+#define CPUFREQ_PM_NO_WARN			BIT(2)
 
 /*
  * This should be set by platforms having multiple clock-domains, i.e.
···
  * be created in cpu/cpu<num>/cpufreq/ directory and so they can use the same
  * governor with different tunables for different clusters.
  */
-#define CPUFREQ_HAVE_GOVERNOR_PER_POLICY (1 << 3)
+#define CPUFREQ_HAVE_GOVERNOR_PER_POLICY	BIT(3)
 
 /*
  * Driver will do POSTCHANGE notifications from outside of their ->target()
  * routine and so must set cpufreq_driver->flags with this flag, so that core
  * can handle them specially.
  */
-#define CPUFREQ_ASYNC_NOTIFICATION	(1 << 4)
+#define CPUFREQ_ASYNC_NOTIFICATION		BIT(4)
 
 /*
  * Set by drivers which want cpufreq core to check if CPU is running at a
···
  * from the table. And if that fails, we will stop further boot process by
  * issuing a BUG_ON().
  */
-#define CPUFREQ_NEED_INITIAL_FREQ_CHECK	(1 << 5)
+#define CPUFREQ_NEED_INITIAL_FREQ_CHECK		BIT(5)
 
 /*
  * Set by drivers to disallow use of governors with "dynamic_switching" flag
  * set.
  */
-#define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING (1 << 6)
+#define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING	BIT(6)
+
+/*
+ * Set by drivers that want the core to automatically register the cpufreq
+ * driver as a thermal cooling device.
+ */
+#define CPUFREQ_IS_COOLING_DEV			BIT(7)
 
 int cpufreq_register_driver(struct cpufreq_driver *driver_data);
 int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
+3 -5
include/linux/cpuidle.h
···
 /* Idle State Flags */
 #define CPUIDLE_FLAG_NONE	(0x00)
-#define CPUIDLE_FLAG_POLLING	(0x01) /* polling state */
-#define CPUIDLE_FLAG_COUPLED	(0x02) /* state applies to multiple cpus */
-#define CPUIDLE_FLAG_TIMER_STOP (0x04) /* timer is stopped on this state */
-
-#define CPUIDLE_DRIVER_FLAGS_MASK (0xFFFF0000)
+#define CPUIDLE_FLAG_POLLING	BIT(0) /* polling state */
+#define CPUIDLE_FLAG_COUPLED	BIT(1) /* state applies to multiple cpus */
+#define CPUIDLE_FLAG_TIMER_STOP BIT(2) /* timer is stopped on this state */
 
 struct cpuidle_device_kobj;
 struct cpuidle_state_kobj;
+10
include/linux/device.h
···
 	return !!dev->power.async_suspend;
 }
 
+static inline bool device_pm_not_required(struct device *dev)
+{
+	return dev->power.no_pm;
+}
+
+static inline void device_set_pm_not_required(struct device *dev)
+{
+	dev->power.no_pm = true;
+}
+
 static inline void dev_pm_syscore_device(struct device *dev, bool val)
 {
 #ifdef CONFIG_PM_SLEEP
+19
include/linux/platform_data/davinci-cpufreq.h
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * TI DaVinci CPUFreq platform support.
+ *
+ * Copyright (C) 2009 Texas Instruments, Inc. http://www.ti.com/
+ */
+
+#ifndef _MACH_DAVINCI_CPUFREQ_H
+#define _MACH_DAVINCI_CPUFREQ_H
+
+#include <linux/cpufreq.h>
+
+struct davinci_cpufreq_config {
+	struct cpufreq_frequency_table *freq_table;
+	int (*set_voltage)(unsigned int index);
+	int (*init)(void);
+};
+
+#endif /* _MACH_DAVINCI_CPUFREQ_H */
+4 -3
include/linux/pm.h
···
 	bool is_suspended:1;	/* Ditto */
 	bool is_noirq_suspended:1;
 	bool is_late_suspended:1;
+	bool no_pm:1;
 	bool early_init:1;	/* Owned by the PM core */
 	bool direct_complete:1;	/* Owned by the PM core */
 	u32 driver_flags;
···
 	int runtime_error;
 	int autosuspend_delay;
 	u64 last_busy;
-	unsigned long active_jiffies;
-	unsigned long suspended_jiffies;
-	unsigned long accounting_timestamp;
+	u64 active_time;
+	u64 suspended_time;
+	u64 accounting_timestamp;
 #endif
 	struct pm_subsys_data *subsys_data;	/* Owned by the subsystem. */
 	void (*set_latency_tolerance)(struct device *, s32);
+4 -4
include/linux/pm_domain.h
···
 struct device *genpd_dev_pm_attach_by_id(struct device *dev,
 					 unsigned int index);
 struct device *genpd_dev_pm_attach_by_name(struct device *dev,
-					   char *name);
+					   const char *name);
 #else /* !CONFIG_PM_GENERIC_DOMAINS_OF */
 static inline int of_genpd_add_provider_simple(struct device_node *np,
 					       struct generic_pm_domain *genpd)
···
 }
 
 static inline struct device *genpd_dev_pm_attach_by_name(struct device *dev,
-							 char *name)
+							 const char *name)
 {
 	return NULL;
 }
···
 struct device *dev_pm_domain_attach_by_id(struct device *dev,
 					  unsigned int index);
 struct device *dev_pm_domain_attach_by_name(struct device *dev,
-					    char *name);
+					    const char *name);
 void dev_pm_domain_detach(struct device *dev, bool power_off);
 void dev_pm_domain_set(struct device *dev, struct dev_pm_domain *pd);
 #else
···
 	return NULL;
 }
 static inline struct device *dev_pm_domain_attach_by_name(struct device *dev,
-							  char *name)
+							  const char *name)
 {
 	return NULL;
 }
+6
include/linux/pm_opp.h
···
 struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev);
 struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp);
 int of_get_required_opp_performance_state(struct device_node *np, int index);
+void dev_pm_opp_of_register_em(struct cpumask *cpus);
 #else
 static inline int dev_pm_opp_of_add_table(struct device *dev)
 {
···
 {
 	return NULL;
 }
+
+static inline void dev_pm_opp_of_register_em(struct cpumask *cpus)
+{
+}
+
 static inline int of_get_required_opp_performance_state(struct device_node *np, int index)
 {
 	return -ENOTSUPP;
+2
include/linux/pm_runtime.h
···
 	return dev->power.irq_safe;
 }
 
+extern u64 pm_runtime_suspended_time(struct device *dev);
+
 #else /* !CONFIG_PM */
 
 static inline bool queue_pm_work(struct work_struct *work) { return false; }
+57
kernel/power/energy_model.c
···
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
+#include <linux/debugfs.h>
 #include <linux/energy_model.h>
 #include <linux/sched/topology.h>
 #include <linux/slab.h>
···
 */
 static DEFINE_MUTEX(em_pd_mutex);
 
+#ifdef CONFIG_DEBUG_FS
+static struct dentry *rootdir;
+
+static void em_debug_create_cs(struct em_cap_state *cs, struct dentry *pd)
+{
+	struct dentry *d;
+	char name[24];
+
+	snprintf(name, sizeof(name), "cs:%lu", cs->frequency);
+
+	/* Create per-cs directory */
+	d = debugfs_create_dir(name, pd);
+	debugfs_create_ulong("frequency", 0444, d, &cs->frequency);
+	debugfs_create_ulong("power", 0444, d, &cs->power);
+	debugfs_create_ulong("cost", 0444, d, &cs->cost);
+}
+
+static int em_debug_cpus_show(struct seq_file *s, void *unused)
+{
+	seq_printf(s, "%*pbl\n", cpumask_pr_args(to_cpumask(s->private)));
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
+
+static void em_debug_create_pd(struct em_perf_domain *pd, int cpu)
+{
+	struct dentry *d;
+	char name[8];
+	int i;
+
+	snprintf(name, sizeof(name), "pd%d", cpu);
+
+	/* Create the directory of the performance domain */
+	d = debugfs_create_dir(name, rootdir);
+
+	debugfs_create_file("cpus", 0444, d, pd->cpus, &em_debug_cpus_fops);
+
+	/* Create a sub-directory for each capacity state */
+	for (i = 0; i < pd->nr_cap_states; i++)
+		em_debug_create_cs(&pd->table[i], d);
+}
+
+static int __init em_debug_init(void)
+{
+	/* Create /sys/kernel/debug/energy_model directory */
+	rootdir = debugfs_create_dir("energy_model", NULL);
+
+	return 0;
+}
+core_initcall(em_debug_init);
+#else /* CONFIG_DEBUG_FS */
+static void em_debug_create_pd(struct em_perf_domain *pd, int cpu) {}
+#endif
 static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
 					   struct em_data_callback *cb)
 {
···
 	pd->table = table;
 	pd->nr_cap_states = nr_states;
 	cpumask_copy(to_cpumask(pd->cpus), span);
+
+	em_debug_create_pd(pd, cpu);
 
 	return pd;
+2 -6
kernel/power/qos.c
···
 	qos->pm_qos_power_miscdev.name = qos->name;
 	qos->pm_qos_power_miscdev.fops = &pm_qos_power_fops;
 
-	if (d) {
-		(void)debugfs_create_file(qos->name, S_IRUGO, d,
-					  (void *)qos, &pm_qos_debug_fops);
-	}
+	debugfs_create_file(qos->name, S_IRUGO, d, (void *)qos,
+			    &pm_qos_debug_fops);
 
 	return misc_register(&qos->pm_qos_power_miscdev);
 }
···
 	BUILD_BUG_ON(ARRAY_SIZE(pm_qos_array) != PM_QOS_NUM_CLASSES);
 
 	d = debugfs_create_dir("pm_qos", NULL);
-	if (IS_ERR_OR_NULL(d))
-		d = NULL;
 
 	for (i = PM_QOS_CPU_DMA_LATENCY; i < PM_QOS_NUM_CLASSES; i++) {
 		ret = register_pm_qos_misc(pm_qos_array[i], d);