
Merge tag 'pm-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"These are PM-runtime framework changes to use ktime instead of jiffies
for accounting, new PM core flag to mark devices that don't need any
form of power management, cpuidle updates including driver API
documentation and a new governor, cpufreq updates including a new
driver for Armada 8K, thermal cleanups and more, some energy-aware
scheduling (EAS) enabling changes, new chips support in the intel_idle
and RAPL drivers and assorted cleanups in some other places.

Specifics:

- Update the PM-runtime framework to use ktime instead of jiffies for
accounting (Thara Gopinath, Vincent Guittot)

- Optimize the autosuspend code in the PM-runtime framework somewhat
(Ladislav Michl)

- Add a PM core flag to mark devices that don't need any form of
power management (Sudeep Holla)

- Introduce driver API documentation for cpuidle and add a new
cpuidle governor for tickless systems (Rafael Wysocki)

- Add Jacobsville support to the intel_idle driver (Zhang Rui)

- Clean up a cpuidle core header file and the cpuidle-dt and ACPI
processor-idle drivers (Yangtao Li, Joseph Lo, Yazen Ghannam)

- Add new cpufreq driver for Armada 8K (Gregory Clement)

- Fix and clean up cpufreq core (Rafael Wysocki, Viresh Kumar, Amit
Kucheria)

- Add support for light-weight tear-down and bring-up of CPUs to the
cpufreq core and use it in the cpufreq-dt driver (Viresh Kumar)

- Fix cpu_cooling Kconfig dependencies, add support for CPU cooling
auto-registration to the cpufreq core and use it in multiple
cpufreq drivers (Amit Kucheria)

- Fix some minor issues and do some cleanups in the davinci,
e_powersaver, ap806, s5pv210, qcom and kryo cpufreq drivers
(Bartosz Golaszewski, Gustavo Silva, Julia Lawall, Paweł Chmiel,
Taniya Das, Viresh Kumar)

- Add a Hisilicon CPPC quirk to the cppc_cpufreq driver (Xiongfeng
Wang)

- Clean up the intel_pstate and acpi-cpufreq drivers (Erwan Velu,
Rafael Wysocki)

- Clean up multiple cpufreq drivers (Yangtao Li)

- Update cpufreq-related MAINTAINERS entries (Baruch Siach, Lukas
Bulwahn)

- Add support for exposing the Energy Model via debugfs and make
multiple cpufreq drivers register an Energy Model to support
energy-aware scheduling (Quentin Perret, Dietmar Eggemann, Matthias
Kaehlcke)

- Add Ice Lake mobile and Jacobsville support to the Intel RAPL
power-capping driver (Gayatri Kammela, Zhang Rui)

- Add a power estimation helper to the operating performance points
(OPP) framework and clean up a core function in it (Quentin Perret,
Viresh Kumar)

- Make minor improvements in the generic power domains (genpd), OPP
and system suspend frameworks and in the PM core (Aditya Pakki,
Douglas Anderson, Greg Kroah-Hartman, Rafael Wysocki, Yangtao Li)"

* tag 'pm-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (80 commits)
cpufreq: kryo: Release OPP tables on module removal
cpufreq: ap806: add missing of_node_put after of_device_is_available
cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies
cpufreq: Pass updated policy to driver ->setpolicy() callback
cpufreq: Fix two debug messages in cpufreq_set_policy()
cpufreq: Reorder and simplify cpufreq_update_policy()
cpufreq: Add kerneldoc comments for two core functions
PM / core: Add support to skip power management in device/driver model
cpufreq: intel_pstate: Rework iowait boosting to be less aggressive
cpufreq: intel_pstate: Eliminate intel_pstate_get_base_pstate()
cpufreq: intel_pstate: Avoid redundant initialization of local vars
powercap/intel_rapl: add Ice Lake mobile
ACPI / processor: Set P_LVL{2,3} idle state descriptions
cpufreq / cppc: Work around for Hisilicon CPPC cpufreq
ACPI / CPPC: Add a helper to get desired performance
cpufreq: davinci: move configuration to include/linux/platform_data
cpufreq: speedstep: convert BUG() to BUG_ON()
cpufreq: powernv: fix missing check of return value in init_powernv_pstates()
cpufreq: longhaul: remove unneeded semicolon
cpufreq: pcc-cpufreq: remove unneeded semicolon
...

+1923 -560
+96 -8
Documentation/admin-guide/pm/cpuidle.rst
···
 and that is the primary reason for having more than one governor in the
 ``CPUIdle`` subsystem.
 
-There are two ``CPUIdle`` governors available, ``menu`` and ``ladder``.  Which
-of them is used depends on the configuration of the kernel and in particular on
-whether or not the scheduler tick can be `stopped by the idle
-loop <idle-cpus-and-tick_>`_.  It is possible to change the governor at run time
-if the ``cpuidle_sysfs_switch`` command line parameter has been passed to the
-kernel, but that is not safe in general, so it should not be done on production
-systems (that may change in the future, though).  The name of the ``CPUIdle``
-governor currently used by the kernel can be read from the
+There are three ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_
+and ``ladder``.  Which of them is used by default depends on the configuration
+of the kernel and in particular on whether or not the scheduler tick can be
+`stopped by the idle loop <idle-cpus-and-tick_>`_.  It is possible to change the
+governor at run time if the ``cpuidle_sysfs_switch`` command line parameter has
+been passed to the kernel, but that is not safe in general, so it should not be
+done on production systems (that may change in the future, though).  The name of
+the ``CPUIdle`` governor currently used by the kernel can be read from the
 :file:`current_governor_ro` (or :file:`current_governor` if
 ``cpuidle_sysfs_switch`` is present in the kernel command line) file under
 :file:`/sys/devices/system/cpu/cpuidle/` in ``sysfs``.
···
 ``CPUIdle`` governor on it will be ``ladder``.
 
 
+.. _menu-gov:
+
 The ``menu`` Governor
 =====================
 
···
 the real time until the closest timer event and if it really is greater than
 that time, the governor may need to select a shallower state with a suitable
 target residency.
+
+
+.. _teo-gov:
+
+The Timer Events Oriented (TEO) Governor
+========================================
+
+The timer events oriented (TEO) governor is an alternative ``CPUIdle`` governor
+for tickless systems.  It follows the same basic strategy as the ``menu`` `one
+<menu-gov_>`_: it always tries to find the deepest idle state suitable for the
+given conditions.  However, it applies a different approach to that problem.
+
+First, it does not use sleep length correction factors, but instead it attempts
+to correlate the observed idle duration values with the available idle states
+and use that information to pick up the idle state that is most likely to
+"match" the upcoming CPU idle interval.  Second, it does not take the tasks
+that were running on the given CPU in the past and are waiting on some I/O
+operations to complete now into account at all (there is no guarantee that they
+will run on the same CPU when they become runnable again) and the pattern
+detection code in it avoids taking timer wakeups into account.  It also only
+uses idle duration values less than the current time till the closest timer
+(with the scheduler tick excluded) for that purpose.
+
+Like in the ``menu`` governor `case <menu-gov_>`_, the first step is to obtain
+the *sleep length*, which is the time until the closest timer event with the
+assumption that the scheduler tick will be stopped (that also is the upper bound
+on the time until the next CPU wakeup).  That value is then used to preselect an
+idle state on the basis of three metrics maintained for each idle state provided
+by the ``CPUIdle`` driver: ``hits``, ``misses`` and ``early_hits``.
+
+The ``hits`` and ``misses`` metrics measure the likelihood that a given idle
+state will "match" the observed (post-wakeup) idle duration if it "matches" the
+sleep length.  They both are subject to decay (after a CPU wakeup) every time
+the target residency of the idle state corresponding to them is less than or
+equal to the sleep length and the target residency of the next idle state is
+greater than the sleep length (that is, when the idle state corresponding to
+them "matches" the sleep length).  The ``hits`` metric is increased if the
+former condition is satisfied and the target residency of the given idle state
+is less than or equal to the observed idle duration and the target residency of
+the next idle state is greater than the observed idle duration at the same time
+(that is, it is increased when the given idle state "matches" both the sleep
+length and the observed idle duration).  In turn, the ``misses`` metric is
+increased when the given idle state "matches" the sleep length only and the
+observed idle duration is too short for its target residency.
+
+The ``early_hits`` metric measures the likelihood that a given idle state will
+"match" the observed (post-wakeup) idle duration if it does not "match" the
+sleep length.  It is subject to decay on every CPU wakeup and it is increased
+when the idle state corresponding to it "matches" the observed (post-wakeup)
+idle duration and the target residency of the next idle state is less than or
+equal to the sleep length (i.e. the idle state "matching" the sleep length is
+deeper than the given one).
+
+The governor walks the list of idle states provided by the ``CPUIdle`` driver
+and finds the last (deepest) one with the target residency less than or equal
+to the sleep length.  Then, the ``hits`` and ``misses`` metrics of that idle
+state are compared with each other and it is preselected if the ``hits`` one is
+greater (which means that that idle state is likely to "match" the observed idle
+duration after CPU wakeup).  If the ``misses`` one is greater, the governor
+preselects the shallower idle state with the maximum ``early_hits`` metric
+(or if there are multiple shallower idle states with equal ``early_hits``
+metric which also is the maximum, the shallowest of them will be preselected).
+[If there is a wakeup latency constraint coming from the `PM QoS framework
+<cpu-pm-qos_>`_ which is hit before reaching the deepest idle state with the
+target residency within the sleep length, the deepest idle state with the exit
+latency within the constraint is preselected without consulting the ``hits``,
+``misses`` and ``early_hits`` metrics.]
+
+Next, the governor takes several idle duration values observed most recently
+into consideration and if at least a half of them are greater than or equal to
+the target residency of the preselected idle state, that idle state becomes the
+final candidate to ask for.  Otherwise, the average of the most recent idle
+duration values below the target residency of the preselected idle state is
+computed and the governor walks the idle states shallower than the preselected
+one and finds the deepest of them with the target residency within that average.
+That idle state is then taken as the final candidate to ask for.
+
+Still, at this point the governor may need to refine the idle state selection if
+it has not decided to `stop the scheduler tick <idle-cpus-and-tick_>`_.  That
+generally happens if the target residency of the idle state selected so far is
+less than the tick period and the tick has not been stopped already (in a
+previous iteration of the idle loop).  Then, like in the ``menu`` governor
+`case <menu-gov_>`_, the sleep length used in the previous computations may not
+reflect the real time until the closest timer event and if it really is greater
+than that time, a shallower state with a suitable target residency may need to
+be selected.
 
 
 .. _idle-states-representation:
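The ``hits``/``misses``/``early_hits`` bookkeeping described above can be modeled in a few lines of plain user-space C. This is a simplified, illustrative sketch of the selection logic only, not the kernel implementation (the real code lives in ``drivers/cpuidle/governors/teo.c`` and differs in detail; the decay shift and pulse values below are made up for the example):

```c
#include <assert.h>

/* Simplified model of the per-state TEO metrics (illustrative only). */
struct state {
	unsigned int target_residency;	/* microseconds */
	unsigned int hits, misses, early_hits;
};

#define DECAY_SHIFT	3	/* hypothetical decay factor */
#define PULSE		1024	/* hypothetical increment */

/*
 * Update the metrics after a wakeup, given the sleep length that was
 * predicted and the idle duration that was actually observed.
 */
static void update(struct state *s, int n, unsigned int sleep_len,
		   unsigned int observed)
{
	for (int i = 0; i < n; i++) {
		int matches_sleep = s[i].target_residency <= sleep_len &&
			(i == n - 1 || s[i + 1].target_residency > sleep_len);
		int matches_obs = s[i].target_residency <= observed &&
			(i == n - 1 || s[i + 1].target_residency > observed);

		if (matches_sleep) {
			/* Decay both counters, then bump the one that applies. */
			s[i].hits -= s[i].hits >> DECAY_SHIFT;
			s[i].misses -= s[i].misses >> DECAY_SHIFT;
			if (matches_obs)
				s[i].hits += PULSE;
			else if (observed < s[i].target_residency)
				s[i].misses += PULSE;
		}
		/* early_hits decays on every wakeup. */
		s[i].early_hits -= s[i].early_hits >> DECAY_SHIFT;
		if (matches_obs && !matches_sleep &&
		    i < n - 1 && s[i + 1].target_residency <= sleep_len)
			s[i].early_hits += PULSE;
	}
}

/*
 * Preselect: deepest state with target residency within the sleep length;
 * if misses dominate hits for it, fall back to the shallower state with
 * the maximum early_hits (ties go to the shallowest).
 */
static int preselect(const struct state *s, int n, unsigned int sleep_len)
{
	int idx = 0;

	for (int i = 0; i < n; i++)
		if (s[i].target_residency <= sleep_len)
			idx = i;
	if (s[idx].misses > s[idx].hits) {
		int best = 0;

		for (int i = 0; i < idx; i++)
			if (s[i].early_hits > s[best].early_hits)
				best = i;
		idx = best;
	}
	return idx;
}
```

With three states (target residencies 0, 100 and 1000 us) and a sleep length of 1500 us, the deepest state is preselected at first; after a few wakeups whose observed idle duration is only 50 us, ``misses`` of the deepest state dominates and the model falls back to the shallow state with the largest ``early_hits``.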
-37
Documentation/cpuidle/driver.txt
···
-
-
-Supporting multiple CPU idle levels in kernel
-
-cpuidle drivers
-
-
-
-
-cpuidle driver hooks into the cpuidle infrastructure and handles the
-architecture/platform dependent part of CPU idle states. Driver
-provides the platform idle state detection capability and also
-has mechanisms in place to support actual entry-exit into CPU idle states.
-
-cpuidle driver initializes the cpuidle_device structure for each CPU device
-and registers with cpuidle using cpuidle_register_device.
-
-If all the idle states are the same, the wrapper function cpuidle_register
-could be used instead.
-
-It can also support the dynamic changes (like battery <-> AC), by using
-cpuidle_pause_and_lock, cpuidle_disable_device and cpuidle_enable_device,
-cpuidle_resume_and_unlock.
-
-Interfaces:
-extern int cpuidle_register(struct cpuidle_driver *drv,
-                            const struct cpumask *const coupled_cpus);
-extern int cpuidle_unregister(struct cpuidle_driver *drv);
-extern int cpuidle_register_driver(struct cpuidle_driver *drv);
-extern void cpuidle_unregister_driver(struct cpuidle_driver *drv);
-extern int cpuidle_register_device(struct cpuidle_device *dev);
-extern void cpuidle_unregister_device(struct cpuidle_device *dev);
-
-extern void cpuidle_pause_and_lock(void);
-extern void cpuidle_resume_and_unlock(void);
-extern int cpuidle_enable_device(struct cpuidle_device *dev);
-extern void cpuidle_disable_device(struct cpuidle_device *dev);
-28
Documentation/cpuidle/governor.txt
···
-
-
-
-Supporting multiple CPU idle levels in kernel
-
-cpuidle governors
-
-
-
-
-cpuidle governor is policy routine that decides what idle state to enter at
-any given time. cpuidle core uses different callbacks to the governor.
-
-* enable() to enable governor for a particular device
-* disable() to disable governor for a particular device
-* select() to select an idle state to enter
-* reflect() called after returning from the idle state, which can be used
-  by the governor for some record keeping.
-
-More than one governor can be registered at the same time and
-users can switch between drivers using /sysfs interface (when enabled).
-More than one governor part is supported for developers to easily experiment
-with different governors. By default, most optimal governor based on your
-kernel configuration and platform will be selected by cpuidle.
-
-Interfaces:
-extern int cpuidle_register_governor(struct cpuidle_governor *gov);
-struct cpuidle_governor
+282
Documentation/driver-api/pm/cpuidle.rst
···
+.. |struct cpuidle_governor| replace:: :c:type:`struct cpuidle_governor <cpuidle_governor>`
+.. |struct cpuidle_device| replace:: :c:type:`struct cpuidle_device <cpuidle_device>`
+.. |struct cpuidle_driver| replace:: :c:type:`struct cpuidle_driver <cpuidle_driver>`
+.. |struct cpuidle_state| replace:: :c:type:`struct cpuidle_state <cpuidle_state>`
+
+========================
+CPU Idle Time Management
+========================
+
+::
+
+ Copyright (c) 2019 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+
+
+CPU Idle Time Management Subsystem
+==================================
+
+Every time one of the logical CPUs in the system (the entities that appear to
+fetch and execute instructions: hardware threads, if present, or processor
+cores) is idle after an interrupt or equivalent wakeup event, which means that
+there are no tasks to run on it except for the special "idle" task associated
+with it, there is an opportunity to save energy for the processor that it
+belongs to.  That can be done by making the idle logical CPU stop fetching
+instructions from memory and putting some of the processor's functional units
+depended on by it into an idle state in which they will draw less power.
+
+However, there may be multiple different idle states that can be used in such a
+situation in principle, so it may be necessary to find the most suitable one
+(from the kernel perspective) and ask the processor to use (or "enter") that
+particular idle state.  That is the role of the CPU idle time management
+subsystem in the kernel, called ``CPUIdle``.
+
+The design of ``CPUIdle`` is modular and based on the code duplication avoidance
+principle, so the generic code that in principle need not depend on the hardware
+or platform design details in it is separate from the code that interacts with
+the hardware.  It generally is divided into three categories of functional
+units: *governors* responsible for selecting idle states to ask the processor
+to enter, *drivers* that pass the governors' decisions on to the hardware and
+the *core* providing a common framework for them.
+
+
+CPU Idle Time Governors
+=======================
+
+A CPU idle time (``CPUIdle``) governor is a bundle of policy code invoked when
+one of the logical CPUs in the system turns out to be idle.  Its role is to
+select an idle state to ask the processor to enter in order to save some energy.
+
+``CPUIdle`` governors are generic and each of them can be used on any hardware
+platform that the Linux kernel can run on.  For this reason, data structures
+operated on by them cannot depend on any hardware architecture or platform
+design details as well.
+
+The governor itself is represented by a |struct cpuidle_governor| object
+containing four callback pointers, :c:member:`enable`, :c:member:`disable`,
+:c:member:`select`, :c:member:`reflect`, a :c:member:`rating` field described
+below, and a name (string) used for identifying it.
+
+For the governor to be available at all, that object needs to be registered
+with the ``CPUIdle`` core by calling :c:func:`cpuidle_register_governor()` with
+a pointer to it passed as the argument.  If successful, that causes the core to
+add the governor to the global list of available governors and, if it is the
+only one in the list (that is, the list was empty before) or the value of its
+:c:member:`rating` field is greater than the value of that field for the
+governor currently in use, or the name of the new governor was passed to the
+kernel as the value of the ``cpuidle.governor=`` command line parameter, the new
+governor will be used from that point on (there can be only one ``CPUIdle``
+governor in use at a time).  Also, if ``cpuidle_sysfs_switch`` is passed to the
+kernel in the command line, user space can choose the ``CPUIdle`` governor to
+use at run time via ``sysfs``.
+
+Once registered, ``CPUIdle`` governors cannot be unregistered, so it is not
+practical to put them into loadable kernel modules.
+
+The interface between ``CPUIdle`` governors and the core consists of four
+callbacks:
+
+:c:member:`enable`
+    ::
+
+      int (*enable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);
+
+    The role of this callback is to prepare the governor for handling the
+    (logical) CPU represented by the |struct cpuidle_device| object pointed
+    to by the ``dev`` argument.  The |struct cpuidle_driver| object pointed
+    to by the ``drv`` argument represents the ``CPUIdle`` driver to be used
+    with that CPU (among other things, it should contain the list of
+    |struct cpuidle_state| objects representing idle states that the
+    processor holding the given CPU can be asked to enter).
+
+    It may fail, in which case it is expected to return a negative error
+    code, and that causes the kernel to run the architecture-specific
+    default code for idle CPUs on the CPU in question instead of ``CPUIdle``
+    until the ``->enable()`` governor callback is invoked for that CPU
+    again.
+
+:c:member:`disable`
+    ::
+
+      void (*disable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);
+
+    Called to make the governor stop handling the (logical) CPU represented
+    by the |struct cpuidle_device| object pointed to by the ``dev``
+    argument.
+
+    It is expected to reverse any changes made by the ``->enable()``
+    callback when it was last invoked for the target CPU, free all memory
+    allocated by that callback and so on.
+
+:c:member:`select`
+    ::
+
+      int (*select) (struct cpuidle_driver *drv, struct cpuidle_device *dev,
+                     bool *stop_tick);
+
+    Called to select an idle state for the processor holding the (logical)
+    CPU represented by the |struct cpuidle_device| object pointed to by the
+    ``dev`` argument.
+
+    The list of idle states to take into consideration is represented by the
+    :c:member:`states` array of |struct cpuidle_state| objects held by the
+    |struct cpuidle_driver| object pointed to by the ``drv`` argument (which
+    represents the ``CPUIdle`` driver to be used with the CPU at hand).  The
+    value returned by this callback is interpreted as an index into that
+    array (unless it is a negative error code).
+
+    The ``stop_tick`` argument is used to indicate whether or not to stop
+    the scheduler tick before asking the processor to enter the selected
+    idle state.  When the ``bool`` variable pointed to by it (which is set
+    to ``true`` before invoking this callback) is cleared to ``false``, the
+    processor will be asked to enter the selected idle state without
+    stopping the scheduler tick on the given CPU (if the tick has been
+    stopped on that CPU already, however, it will not be restarted before
+    asking the processor to enter the idle state).
+
+    This callback is mandatory (i.e. the :c:member:`select` callback pointer
+    in |struct cpuidle_governor| must not be ``NULL`` for the registration
+    of the governor to succeed).
+
+:c:member:`reflect`
+    ::
+
+      void (*reflect) (struct cpuidle_device *dev, int index);
+
+    Called to allow the governor to evaluate the accuracy of the idle state
+    selection made by the ``->select()`` callback (when it was invoked last
+    time) and possibly use the result of that to improve the accuracy of
+    idle state selections in the future.
+
+In addition, ``CPUIdle`` governors are required to take power management
+quality of service (PM QoS) constraints on the processor wakeup latency into
+account when selecting idle states.  In order to obtain the current effective
+PM QoS wakeup latency constraint for a given CPU, a ``CPUIdle`` governor is
+expected to pass the number of the CPU to
+:c:func:`cpuidle_governor_latency_req()`.  Then, the governor's ``->select()``
+callback must not return the index of an idle state whose
+:c:member:`exit_latency` value is greater than the number returned by that
+function.
+
+
+CPU Idle Time Management Drivers
+================================
+
+CPU idle time management (``CPUIdle``) drivers provide an interface between the
+other parts of ``CPUIdle`` and the hardware.
+
+First of all, a ``CPUIdle`` driver has to populate the :c:member:`states` array
+of |struct cpuidle_state| objects included in the |struct cpuidle_driver| object
+representing it.  Going forward this array will represent the list of available
+idle states that the processor hardware can be asked to enter shared by all of
+the logical CPUs handled by the given driver.
+
+The entries in the :c:member:`states` array are expected to be sorted by the
+value of the :c:member:`target_residency` field in |struct cpuidle_state| in
+the ascending order (that is, index 0 should correspond to the idle state with
+the minimum value of :c:member:`target_residency`).  [Since the
+:c:member:`target_residency` value is expected to reflect the "depth" of the
+idle state represented by the |struct cpuidle_state| object holding it, this
+sorting order should be the same as the ascending sorting order by the idle
+state "depth".]
+
+Three fields in |struct cpuidle_state| are used by the existing ``CPUIdle``
+governors for computations related to idle state selection:
+
+:c:member:`target_residency`
+    Minimum time to spend in this idle state including the time needed to
+    enter it (which may be substantial) to save more energy than could
+    be saved by staying in a shallower idle state for the same amount of
+    time, in microseconds.
+
+:c:member:`exit_latency`
+    Maximum time it will take a CPU asking the processor to enter this idle
+    state to start executing the first instruction after a wakeup from it,
+    in microseconds.
+
+:c:member:`flags`
+    Flags representing idle state properties.  Currently, governors only use
+    the ``CPUIDLE_FLAG_POLLING`` flag which is set if the given object
+    does not represent a real idle state, but an interface to a software
+    "loop" that can be used in order to avoid asking the processor to enter
+    any idle state at all.  [There are other flags used by the ``CPUIdle``
+    core in special situations.]
+
+The :c:member:`enter` callback pointer in |struct cpuidle_state|, which must not
+be ``NULL``, points to the routine to execute in order to ask the processor to
+enter this particular idle state:
+
+::
+
+  void (*enter) (struct cpuidle_device *dev, struct cpuidle_driver *drv,
+                 int index);
+
+The first two arguments of it point to the |struct cpuidle_device| object
+representing the logical CPU running this callback and the
+|struct cpuidle_driver| object representing the driver itself, respectively,
+and the last one is an index of the |struct cpuidle_state| entry in the driver's
+:c:member:`states` array representing the idle state to ask the processor to
+enter.
+
+The analogous ``->enter_s2idle()`` callback in |struct cpuidle_state| is used
+only for implementing the suspend-to-idle system-wide power management feature.
+The difference between it and ``->enter()`` is that it must not re-enable
+interrupts at any point (even temporarily) or attempt to change the states of
+clock event devices, which the ``->enter()`` callback may do sometimes.
+
+Once the :c:member:`states` array has been populated, the number of valid
+entries in it has to be stored in the :c:member:`state_count` field of the
+|struct cpuidle_driver| object representing the driver.  Moreover, if any
+entries in the :c:member:`states` array represent "coupled" idle states (that
+is, idle states that can only be asked for if multiple related logical CPUs are
+idle), the :c:member:`safe_state_index` field in |struct cpuidle_driver| needs
+to be the index of an idle state that is not "coupled" (that is, one that can be
+asked for if only one logical CPU is idle).
+
+In addition to that, if the given ``CPUIdle`` driver is only going to handle a
+subset of logical CPUs in the system, the :c:member:`cpumask` field in its
+|struct cpuidle_driver| object must point to the set (mask) of CPUs that will be
+handled by it.
+
+A ``CPUIdle`` driver can only be used after it has been registered.  If there
+are no "coupled" idle state entries in the driver's :c:member:`states` array,
+that can be accomplished by passing the driver's |struct cpuidle_driver| object
+to :c:func:`cpuidle_register_driver()`.  Otherwise, :c:func:`cpuidle_register()`
+should be used for this purpose.
+
+However, it also is necessary to register |struct cpuidle_device| objects for
+all of the logical CPUs to be handled by the given ``CPUIdle`` driver with the
+help of :c:func:`cpuidle_register_device()` after the driver has been registered
+and :c:func:`cpuidle_register_driver()`, unlike :c:func:`cpuidle_register()`,
+does not do that automatically.  For this reason, the drivers that use
+:c:func:`cpuidle_register_driver()` to register themselves must also take care
+of registering the |struct cpuidle_device| objects as needed, so it is generally
+recommended to use :c:func:`cpuidle_register()` for ``CPUIdle`` driver
+registration in all cases.
+
+The registration of a |struct cpuidle_device| object causes the ``CPUIdle``
+``sysfs`` interface to be created and the governor's ``->enable()`` callback to
+be invoked for the logical CPU represented by it, so it must take place after
+registering the driver that will handle the CPU in question.
+
+``CPUIdle`` drivers and |struct cpuidle_device| objects can be unregistered
+when they are not necessary any more which allows some resources associated with
+them to be released.  Due to dependencies between them, all of the
+|struct cpuidle_device| objects representing CPUs handled by the given
+``CPUIdle`` driver must be unregistered, with the help of
+:c:func:`cpuidle_unregister_device()`, before calling
+:c:func:`cpuidle_unregister_driver()` to unregister the driver.  Alternatively,
+:c:func:`cpuidle_unregister()` can be called to unregister a ``CPUIdle`` driver
+along with all of the |struct cpuidle_device| objects representing CPUs handled
+by it.
+
+``CPUIdle`` drivers can respond to runtime system configuration changes that
+lead to modifications of the list of available processor idle states (which can
+happen, for example, when the system's power source is switched from AC to
+battery or the other way around).  Upon a notification of such a change,
+a ``CPUIdle`` driver is expected to call :c:func:`cpuidle_pause_and_lock()` to
+turn ``CPUIdle`` off temporarily and then :c:func:`cpuidle_disable_device()` for
+all of the |struct cpuidle_device| objects representing CPUs affected by that
+change.  Next, it can update its :c:member:`states` array in accordance with
+the new configuration of the system, call :c:func:`cpuidle_enable_device()` for
+all of the relevant |struct cpuidle_device| objects and invoke
+:c:func:`cpuidle_resume_and_unlock()` to allow ``CPUIdle`` to be used again.
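The governor registration rules described in the new document (mandatory ``->select()``, highest ``rating`` wins) can be sketched as a small user-space model. The names below mirror ``struct cpuidle_governor`` but this is illustrative only, not kernel code; ``model_register_governor()`` and ``pick_shallowest()`` are hypothetical helpers invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of the governor/core contract (illustrative only). */
struct model_governor {
	const char *name;
	unsigned int rating;
	int  (*enable)(void *drv, void *dev);
	void (*disable)(void *drv, void *dev);
	int  (*select)(void *drv, void *dev, int *stop_tick);
	void (*reflect)(void *dev, int index);
};

static struct model_governor *cur_gov;	/* the one governor in use */

/*
 * Models the registration rules: ->select() is mandatory, and a newly
 * registered governor takes over only if no governor is in use yet or
 * its rating is greater than the current governor's.
 */
static int model_register_governor(struct model_governor *gov)
{
	if (!gov || !gov->select)
		return -1;	/* the kernel rejects such a registration */
	if (!cur_gov || gov->rating > cur_gov->rating)
		cur_gov = gov;
	return 0;
}

/*
 * A trivial ->select() implementation: always pick state 0 and ask the
 * core not to stop the scheduler tick.
 */
static int pick_shallowest(void *drv, void *dev, int *stop_tick)
{
	(void)drv; (void)dev;
	*stop_tick = 0;
	return 0;	/* index into the driver's states[] array */
}
```

Registering a second governor with a lower rating succeeds but does not change the governor in use, which matches the behavior the document describes.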
+4 -3
Documentation/driver-api/pm/index.rst
···
-=======================
-Device Power Management
-=======================
+===============================
+CPU and Device Power Management
+===============================
 
 .. toctree::
 
+   cpuidle
    devices
    notifiers
    types
+8 -6
MAINTAINERS
···
 F: arch/arm/mach-mvebu/
 F: arch/arm64/boot/dts/marvell/armada*
 F: drivers/cpufreq/armada-37xx-cpufreq.c
+F: drivers/cpufreq/armada-8k-cpufreq.c
 F: drivers/cpufreq/mvebu-cpufreq.c
 F: drivers/irqchip/irq-armada-370-xp.c
 F: drivers/irqchip/irq-mvebu-*
···
 L: linux-pm@vger.kernel.org
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
-T: git git://git.linaro.org/people/vireshk/linux.git (For ARM Updates)
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git (For ARM Updates)
 B: https://bugzilla.kernel.org
 F: Documentation/admin-guide/pm/cpufreq.rst
 F: Documentation/admin-guide/pm/intel_pstate.rst
···
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
 B: https://bugzilla.kernel.org
 F: Documentation/admin-guide/pm/cpuidle.rst
+F: Documentation/driver-api/pm/cpuidle.rst
 F: drivers/cpuidle/*
 F: include/linux/cpuidle.h
···
 F: drivers/media/platform/qcom/camss/
 
 QUALCOMM CPUFREQ DRIVER MSM8996/APQ8096
-M: Ilia Lin <ilia.lin@gmail.com>
-L: linux-pm@vger.kernel.org
-S: Maintained
-F: Documentation/devicetree/bindings/opp/kryo-cpufreq.txt
-F: drivers/cpufreq/qcom-cpufreq-kryo.c
+M: Ilia Lin <ilia.lin@kernel.org>
+L: linux-pm@vger.kernel.org
+S: Maintained
+F: Documentation/devicetree/bindings/opp/kryo-cpufreq.txt
+F: drivers/cpufreq/qcom-cpufreq-kryo.c
 
 QUALCOMM EMAC GIGABIT ETHERNET DRIVER
 M: Timur Tabi <timur@kernel.org>
+1 -1
arch/arm/mach-davinci/da850.c
··· 22 22 #include <linux/mfd/da8xx-cfgchip.h> 23 23 #include <linux/platform_data/clk-da8xx-cfgchip.h> 24 24 #include <linux/platform_data/clk-davinci-pll.h> 25 + #include <linux/platform_data/davinci-cpufreq.h> 25 26 #include <linux/platform_data/gpio-davinci.h> 26 27 #include <linux/platform_device.h> 27 28 #include <linux/regmap.h> ··· 31 30 #include <asm/mach/map.h> 32 31 33 32 #include <mach/common.h> 34 - #include <mach/cpufreq.h> 35 33 #include <mach/cputype.h> 36 34 #include <mach/da8xx.h> 37 35 #include <mach/pm.h>
-26
arch/arm/mach-davinci/include/mach/cpufreq.h
··· 1 - /* 2 - * TI DaVinci CPUFreq platform support. 3 - * 4 - * Copyright (C) 2009 Texas Instruments, Inc. http://www.ti.com/ 5 - * 6 - * This program is free software; you can redistribute it and/or 7 - * modify it under the terms of the GNU General Public License as 8 - * published by the Free Software Foundation version 2. 9 - * 10 - * This program is distributed "as is" WITHOUT ANY WARRANTY of any 11 - * kind, whether express or implied; without even the implied warranty 12 - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 13 - * GNU General Public License for more details. 14 - */ 15 - #ifndef _MACH_DAVINCI_CPUFREQ_H 16 - #define _MACH_DAVINCI_CPUFREQ_H 17 - 18 - #include <linux/cpufreq.h> 19 - 20 - struct davinci_cpufreq_config { 21 - struct cpufreq_frequency_table *freq_table; 22 - int (*set_voltage) (unsigned int index); 23 - int (*init) (void); 24 - }; 25 - 26 - #endif
+42
drivers/acpi/cppc_acpi.c
··· 1051 1051 } 1052 1052 1053 1053 /** 1054 + * cppc_get_desired_perf - Get the value of desired performance register. 1055 + * @cpunum: CPU from which to get desired performance. 1056 + * @desired_perf: address of a variable to store the returned desired performance 1057 + * 1058 + * Return: 0 for success, -EIO otherwise. 1059 + */ 1060 + int cppc_get_desired_perf(int cpunum, u64 *desired_perf) 1061 + { 1062 + struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpunum); 1063 + int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpunum); 1064 + struct cpc_register_resource *desired_reg; 1065 + struct cppc_pcc_data *pcc_ss_data = NULL; 1066 + 1067 + desired_reg = &cpc_desc->cpc_regs[DESIRED_PERF]; 1068 + 1069 + if (CPC_IN_PCC(desired_reg)) { 1070 + int ret = 0; 1071 + 1072 + if (pcc_ss_id < 0) 1073 + return -EIO; 1074 + 1075 + pcc_ss_data = pcc_data[pcc_ss_id]; 1076 + 1077 + down_write(&pcc_ss_data->pcc_lock); 1078 + 1079 + if (send_pcc_cmd(pcc_ss_id, CMD_READ) >= 0) 1080 + cpc_read(cpunum, desired_reg, desired_perf); 1081 + else 1082 + ret = -EIO; 1083 + 1084 + up_write(&pcc_ss_data->pcc_lock); 1085 + 1086 + return ret; 1087 + } 1088 + 1089 + cpc_read(cpunum, desired_reg, desired_perf); 1090 + 1091 + return 0; 1092 + } 1093 + EXPORT_SYMBOL_GPL(cppc_get_desired_perf); 1094 + 1095 + /** 1054 1096 * cppc_get_perf_caps - Get a CPUs performance capabilities. 1055 1097 * @cpunum: CPU from which to get capabilities info. 1056 1098 * @perf_caps: ptr to cppc_perf_caps. See cppc_acpi.h
+7
drivers/acpi/processor_idle.c
··· 282 282 pr->power.states[ACPI_STATE_C2].address, 283 283 pr->power.states[ACPI_STATE_C3].address)); 284 284 285 + snprintf(pr->power.states[ACPI_STATE_C2].desc, 286 + ACPI_CX_DESC_LEN, "ACPI P_LVL2 IOPORT 0x%x", 287 + pr->power.states[ACPI_STATE_C2].address); 288 + snprintf(pr->power.states[ACPI_STATE_C3].desc, 289 + ACPI_CX_DESC_LEN, "ACPI P_LVL3 IOPORT 0x%x", 290 + pr->power.states[ACPI_STATE_C3].address); 291 + 285 292 return 0; 286 293 } 287 294
+1
drivers/base/cpu.c
··· 427 427 dev->parent = parent; 428 428 dev->groups = groups; 429 429 dev->release = device_create_release; 430 + device_set_pm_not_required(dev); 430 431 dev_set_drvdata(dev, drvdata); 431 432 432 433 retval = kobject_set_name_vargs(&dev->kobj, fmt, args);
+9 -4
drivers/base/power/clock_ops.c
··· 65 65 if (IS_ERR(ce->clk)) { 66 66 ce->status = PCE_STATUS_ERROR; 67 67 } else { 68 - clk_prepare(ce->clk); 69 - ce->status = PCE_STATUS_ACQUIRED; 70 - dev_dbg(dev, "Clock %pC con_id %s managed by runtime PM.\n", 71 - ce->clk, ce->con_id); 68 + if (clk_prepare(ce->clk)) { 69 + ce->status = PCE_STATUS_ERROR; 70 + dev_err(dev, "clk_prepare() failed\n"); 71 + } else { 72 + ce->status = PCE_STATUS_ACQUIRED; 73 + dev_dbg(dev, 74 + "Clock %pC con_id %s managed by runtime PM.\n", 75 + ce->clk, ce->con_id); 76 + } 72 77 } 73 78 } 74 79
+1 -1
drivers/base/power/common.c
··· 160 160 * For a detailed function description, see dev_pm_domain_attach_by_id(). 161 161 */ 162 162 struct device *dev_pm_domain_attach_by_name(struct device *dev, 163 - char *name) 163 + const char *name) 164 164 { 165 165 if (dev->pm_domain) 166 166 return ERR_PTR(-EEXIST);
+3 -10
drivers/base/power/domain.c
··· 2483 2483 * power-domain-names DT property. For further description see 2484 2484 * genpd_dev_pm_attach_by_id(). 2485 2485 */ 2486 - struct device *genpd_dev_pm_attach_by_name(struct device *dev, char *name) 2486 + struct device *genpd_dev_pm_attach_by_name(struct device *dev, const char *name) 2487 2487 { 2488 2488 int index; 2489 2489 ··· 2948 2948 2949 2949 genpd_debugfs_dir = debugfs_create_dir("pm_genpd", NULL); 2950 2950 2951 - if (!genpd_debugfs_dir) 2952 - return -ENOMEM; 2953 - 2954 - d = debugfs_create_file("pm_genpd_summary", S_IRUGO, 2955 - genpd_debugfs_dir, NULL, &summary_fops); 2956 - if (!d) 2957 - return -ENOMEM; 2951 + debugfs_create_file("pm_genpd_summary", S_IRUGO, genpd_debugfs_dir, 2952 + NULL, &summary_fops); 2958 2953 2959 2954 list_for_each_entry(genpd, &gpd_list, gpd_list_node) { 2960 2955 d = debugfs_create_dir(genpd->name, genpd_debugfs_dir); 2961 - if (!d) 2962 - return -ENOMEM; 2963 2956 2964 2957 debugfs_create_file("current_state", 0444, 2965 2958 d, genpd, &status_fops);
+10 -1
drivers/base/power/main.c
··· 124 124 */ 125 125 void device_pm_add(struct device *dev) 126 126 { 127 + /* Skip PM setup/initialization. */ 128 + if (device_pm_not_required(dev)) 129 + return; 130 + 127 131 pr_debug("PM: Adding info for %s:%s\n", 128 132 dev->bus ? dev->bus->name : "No Bus", dev_name(dev)); 129 133 device_pm_check_callbacks(dev); ··· 146 142 */ 147 143 void device_pm_remove(struct device *dev) 148 144 { 145 + if (device_pm_not_required(dev)) 146 + return; 147 + 149 148 pr_debug("PM: Removing info for %s:%s\n", 150 149 dev->bus ? dev->bus->name : "No Bus", dev_name(dev)); 151 150 complete_all(&dev->power.completion); ··· 1748 1741 if (dev->power.direct_complete) { 1749 1742 if (pm_runtime_status_suspended(dev)) { 1750 1743 pm_runtime_disable(dev); 1751 - if (pm_runtime_status_suspended(dev)) 1744 + if (pm_runtime_status_suspended(dev)) { 1745 + pm_dev_dbg(dev, state, "direct-complete "); 1752 1746 goto Complete; 1747 + } 1753 1748 1754 1749 pm_runtime_enable(dev); 1755 1750 }
+52 -22
drivers/base/power/runtime.c
··· 66 66 */ 67 67 void update_pm_runtime_accounting(struct device *dev) 68 68 { 69 - unsigned long now = jiffies; 70 - unsigned long delta; 71 - 72 - delta = now - dev->power.accounting_timestamp; 73 - 74 - dev->power.accounting_timestamp = now; 69 + u64 now, last, delta; 75 70 76 71 if (dev->power.disable_depth > 0) 77 72 return; 78 73 74 + last = dev->power.accounting_timestamp; 75 + 76 + now = ktime_get_mono_fast_ns(); 77 + dev->power.accounting_timestamp = now; 78 + 79 + /* 80 + * Because ktime_get_mono_fast_ns() is not monotonic during 81 + * timekeeping updates, ensure that 'now' is after the last saved 82 + * timestamp. 83 + */ 84 + if (now < last) 85 + return; 86 + 87 + delta = now - last; 88 + 79 89 if (dev->power.runtime_status == RPM_SUSPENDED) 80 - dev->power.suspended_jiffies += delta; 90 + dev->power.suspended_time += delta; 81 91 else 82 - dev->power.active_jiffies += delta; 92 + dev->power.active_time += delta; 83 93 } 84 94 85 95 static void __update_runtime_status(struct device *dev, enum rpm_status status) ··· 97 87 update_pm_runtime_accounting(dev); 98 88 dev->power.runtime_status = status; 99 89 } 90 + 91 + u64 pm_runtime_suspended_time(struct device *dev) 92 + { 93 + u64 time; 94 + unsigned long flags; 95 + 96 + spin_lock_irqsave(&dev->power.lock, flags); 97 + 98 + update_pm_runtime_accounting(dev); 99 + time = dev->power.suspended_time; 100 + 101 + spin_unlock_irqrestore(&dev->power.lock, flags); 102 + 103 + return time; 104 + } 105 + EXPORT_SYMBOL_GPL(pm_runtime_suspended_time); 100 106 101 107 /** 102 108 * pm_runtime_deactivate_timer - Deactivate given device's suspend timer. 
··· 155 129 u64 pm_runtime_autosuspend_expiration(struct device *dev) 156 130 { 157 131 int autosuspend_delay; 158 - u64 last_busy, expires = 0; 159 - u64 now = ktime_get_mono_fast_ns(); 132 + u64 expires; 160 133 161 134 if (!dev->power.use_autosuspend) 162 - goto out; 135 + return 0; 163 136 164 137 autosuspend_delay = READ_ONCE(dev->power.autosuspend_delay); 165 138 if (autosuspend_delay < 0) 166 - goto out; 139 + return 0; 167 140 168 - last_busy = READ_ONCE(dev->power.last_busy); 141 + expires = READ_ONCE(dev->power.last_busy); 142 + expires += (u64)autosuspend_delay * NSEC_PER_MSEC; 143 + if (expires > ktime_get_mono_fast_ns()) 144 + return expires; /* Expires in the future */ 169 145 170 - expires = last_busy + (u64)autosuspend_delay * NSEC_PER_MSEC; 171 - if (expires <= now) 172 - expires = 0; /* Already expired. */ 173 - 174 - out: 175 - return expires; 146 + return 0; 176 147 } 177 148 EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration); 178 149 ··· 1299 1276 pm_runtime_put_noidle(dev); 1300 1277 } 1301 1278 1279 + /* Update time accounting before disabling PM-runtime. 
*/ 1280 + update_pm_runtime_accounting(dev); 1281 + 1302 1282 if (!dev->power.disable_depth++) 1303 1283 __pm_runtime_barrier(dev); 1304 1284 ··· 1320 1294 1321 1295 spin_lock_irqsave(&dev->power.lock, flags); 1322 1296 1323 - if (dev->power.disable_depth > 0) 1297 + if (dev->power.disable_depth > 0) { 1324 1298 dev->power.disable_depth--; 1325 - else 1299 + 1300 + /* About to enable runtime pm, set accounting_timestamp to now */ 1301 + if (!dev->power.disable_depth) 1302 + dev->power.accounting_timestamp = ktime_get_mono_fast_ns(); 1303 + } else { 1326 1304 dev_warn(dev, "Unbalanced %s!\n", __func__); 1305 + } 1327 1306 1328 1307 WARN(!dev->power.disable_depth && 1329 1308 dev->power.runtime_status == RPM_SUSPENDED && ··· 1525 1494 dev->power.request_pending = false; 1526 1495 dev->power.request = RPM_REQ_NONE; 1527 1496 dev->power.deferred_resume = false; 1528 - dev->power.accounting_timestamp = jiffies; 1529 1497 INIT_WORK(&dev->power.work, pm_runtime_work); 1530 1498 1531 1499 dev->power.timer_expires = 0;
+14 -3
drivers/base/power/sysfs.c
··· 125 125 struct device_attribute *attr, char *buf) 126 126 { 127 127 int ret; 128 + u64 tmp; 128 129 spin_lock_irq(&dev->power.lock); 129 130 update_pm_runtime_accounting(dev); 130 - ret = sprintf(buf, "%i\n", jiffies_to_msecs(dev->power.active_jiffies)); 131 + tmp = dev->power.active_time; 132 + do_div(tmp, NSEC_PER_MSEC); 133 + ret = sprintf(buf, "%llu\n", tmp); 131 134 spin_unlock_irq(&dev->power.lock); 132 135 return ret; 133 136 } ··· 141 138 struct device_attribute *attr, char *buf) 142 139 { 143 140 int ret; 141 + u64 tmp; 144 142 spin_lock_irq(&dev->power.lock); 145 143 update_pm_runtime_accounting(dev); 146 - ret = sprintf(buf, "%i\n", 147 - jiffies_to_msecs(dev->power.suspended_jiffies)); 144 + tmp = dev->power.suspended_time; 145 + do_div(tmp, NSEC_PER_MSEC); 146 + ret = sprintf(buf, "%llu\n", tmp); 148 147 spin_unlock_irq(&dev->power.lock); 149 148 return ret; 150 149 } ··· 653 648 { 654 649 int rc; 655 650 651 + /* No need to create PM sysfs if explicitly disabled. */ 652 + if (device_pm_not_required(dev)) 653 + return 0; 654 + 656 655 rc = sysfs_create_group(&dev->kobj, &pm_attr_group); 657 656 if (rc) 658 657 return rc; ··· 736 727 737 728 void dpm_sysfs_remove(struct device *dev) 738 729 { 730 + if (device_pm_not_required(dev)) 731 + return; 739 732 sysfs_unmerge_group(&dev->kobj, &pm_qos_latency_tolerance_attr_group); 740 733 dev_pm_qos_constraints_destroy(dev); 741 734 rpm_sysfs_remove(dev);
+1 -1
drivers/base/power/wakeup.c
··· 783 783 EXPORT_SYMBOL_GPL(pm_wakeup_ws_event); 784 784 785 785 /** 786 - * pm_wakeup_event - Notify the PM core of a wakeup event. 786 + * pm_wakeup_dev_event - Notify the PM core of a wakeup event. 787 787 * @dev: Device the wakeup event is related to. 788 788 * @msec: Anticipated event processing time (in milliseconds). 789 789 * @hard: If set, abort suspends in progress and wake up from suspend-to-idle.
-3
drivers/cpufreq/Kconfig
··· 207 207 config CPUFREQ_DT 208 208 tristate "Generic DT based cpufreq driver" 209 209 depends on HAVE_CLK && OF 210 - # if CPU_THERMAL is on and THERMAL=m, CPUFREQ_DT cannot be =y: 211 - depends on !CPU_THERMAL || THERMAL 212 210 select CPUFREQ_DT_PLATDEV 213 211 select PM_OPP 214 212 help ··· 325 327 config QORIQ_CPUFREQ 326 328 tristate "CPU frequency scaling driver for Freescale QorIQ SoCs" 327 329 depends on OF && COMMON_CLK && (PPC_E500MC || ARM || ARM64) 328 - depends on !CPU_THERMAL || THERMAL 329 330 select CLK_QORIQ 330 331 help 331 332 This adds the CPUFreq driver support for Freescale QorIQ SoCs
+11 -5
drivers/cpufreq/Kconfig.arm
··· 25 25 This adds the CPUFreq driver support for Marvell Armada 37xx SoCs. 26 26 The Armada 37xx PMU supports 4 frequency and VDD levels. 27 27 28 + config ARM_ARMADA_8K_CPUFREQ 29 + tristate "Armada 8K CPUFreq driver" 30 + depends on ARCH_MVEBU && CPUFREQ_DT 31 + help 32 + This enables the CPUFreq driver support for Marvell 33 + Armada 8K SoCs. 34 + The Armada 8K SoC has the AP806 which supports scaling 35 + to any full integer divider. 36 + 37 + If in doubt, say N. 38 + 28 39 # big LITTLE core layer and glue drivers 29 40 config ARM_BIG_LITTLE_CPUFREQ 30 41 tristate "Generic ARM big LITTLE CPUfreq driver" 31 42 depends on ARM_CPU_TOPOLOGY && HAVE_CLK 32 - # if CPU_THERMAL is on and THERMAL=m, ARM_BIT_LITTLE_CPUFREQ cannot be =y 33 - depends on !CPU_THERMAL || THERMAL 34 43 select PM_OPP 35 44 help 36 45 This enables the Generic CPUfreq driver for ARM big.LITTLE platforms. ··· 47 38 config ARM_SCPI_CPUFREQ 48 39 tristate "SCPI based CPUfreq driver" 49 40 depends on ARM_SCPI_PROTOCOL && COMMON_CLK_SCPI 50 - depends on !CPU_THERMAL || THERMAL 51 41 help 52 42 This adds the CPUfreq driver support for ARM platforms using SCPI 53 43 protocol for CPU power management. ··· 101 93 config ARM_MEDIATEK_CPUFREQ 102 94 tristate "CPU Frequency scaling support for MediaTek SoCs" 103 95 depends on ARCH_MEDIATEK && REGULATOR 104 - depends on !CPU_THERMAL || THERMAL 105 96 select PM_OPP 106 97 help 107 98 This adds the CPUFreq driver support for MediaTek SoCs. ··· 240 233 config ARM_SCMI_CPUFREQ 241 234 tristate "SCMI based CPUfreq driver" 242 235 depends on ARM_SCMI_PROTOCOL || COMPILE_TEST 243 - depends on !CPU_THERMAL || THERMAL 244 236 select PM_OPP 245 237 help 246 238 This adds the CPUfreq driver support for ARM platforms using SCMI
+1
drivers/cpufreq/Makefile
··· 50 50 obj-$(CONFIG_ARM_BIG_LITTLE_CPUFREQ) += arm_big_little.o 51 51 52 52 obj-$(CONFIG_ARM_ARMADA_37XX_CPUFREQ) += armada-37xx-cpufreq.o 53 + obj-$(CONFIG_ARM_ARMADA_8K_CPUFREQ) += armada-8k-cpufreq.o 53 54 obj-$(CONFIG_ARM_BRCMSTB_AVS_CPUFREQ) += brcmstb-avs-cpufreq.o 54 55 obj-$(CONFIG_ACPI_CPPC_CPUFREQ) += cppc_cpufreq.o 55 56 obj-$(CONFIG_ARCH_DAVINCI) += davinci-cpufreq.o
+3 -1
drivers/cpufreq/acpi-cpufreq.c
··· 916 916 { 917 917 int ret; 918 918 919 - if (!(boot_cpu_has(X86_FEATURE_CPB) || boot_cpu_has(X86_FEATURE_IDA))) 919 + if (!(boot_cpu_has(X86_FEATURE_CPB) || boot_cpu_has(X86_FEATURE_IDA))) { 920 + pr_debug("Boost capabilities not present in the processor\n"); 920 921 return; 922 + } 921 923 922 924 acpi_cpufreq_driver.set_boost = set_boost; 923 925 acpi_cpufreq_driver.boost_enabled = boost_state(0);
+2
drivers/cpufreq/arm_big_little.c
··· 487 487 policy->cpuinfo.transition_latency = 488 488 arm_bL_ops->get_transition_latency(cpu_dev); 489 489 490 + dev_pm_opp_of_register_em(policy->cpus); 491 + 490 492 if (is_bL_switching_enabled()) 491 493 per_cpu(cpu_last_req_freq, policy->cpu) = clk_get_cpu_rate(policy->cpu); 492 494
+206
drivers/cpufreq/armada-8k-cpufreq.c
··· 1 + // SPDX-License-Identifier: GPL-2.0+ 2 + /* 3 + * CPUFreq support for Armada 8K 4 + * 5 + * Copyright (C) 2018 Marvell 6 + * 7 + * Omri Itach <omrii@marvell.com> 8 + * Gregory Clement <gregory.clement@bootlin.com> 9 + */ 10 + 11 + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 12 + 13 + #include <linux/clk.h> 14 + #include <linux/cpu.h> 15 + #include <linux/err.h> 16 + #include <linux/init.h> 17 + #include <linux/kernel.h> 18 + #include <linux/module.h> 19 + #include <linux/of.h> 20 + #include <linux/platform_device.h> 21 + #include <linux/pm_opp.h> 22 + #include <linux/slab.h> 23 + 24 + /* 25 + * Setup the opps list with the divider for the max frequency, that 26 + * will be filled at runtime. 27 + */ 28 + static const int opps_div[] __initconst = {1, 2, 3, 4}; 29 + 30 + static struct platform_device *armada_8k_pdev; 31 + 32 + struct freq_table { 33 + struct device *cpu_dev; 34 + unsigned int freq[ARRAY_SIZE(opps_div)]; 35 + }; 36 + 37 + /* If the CPUs share the same clock, then they are in the same cluster. */ 38 + static void __init armada_8k_get_sharing_cpus(struct clk *cur_clk, 39 + struct cpumask *cpumask) 40 + { 41 + int cpu; 42 + 43 + for_each_possible_cpu(cpu) { 44 + struct device *cpu_dev; 45 + struct clk *clk; 46 + 47 + cpu_dev = get_cpu_device(cpu); 48 + if (!cpu_dev) { 49 + pr_warn("Failed to get cpu%d device\n", cpu); 50 + continue; 51 + } 52 + 53 + clk = clk_get(cpu_dev, 0); 54 + if (IS_ERR(clk)) { 55 + pr_warn("Cannot get clock for CPU %d\n", cpu); 56 + } else { 57 + if (clk_is_match(clk, cur_clk)) 58 + cpumask_set_cpu(cpu, cpumask); 59 + 60 + clk_put(clk); 61 + } 62 + } 63 + } 64 + 65 + static int __init armada_8k_add_opp(struct clk *clk, struct device *cpu_dev, 66 + struct freq_table *freq_tables, 67 + int opps_index) 68 + { 69 + unsigned int cur_frequency; 70 + unsigned int freq; 71 + int i, ret; 72 + 73 + /* Get nominal (current) CPU frequency. 
*/ 74 + cur_frequency = clk_get_rate(clk); 75 + if (!cur_frequency) { 76 + dev_err(cpu_dev, "Failed to get clock rate for this CPU\n"); 77 + return -EINVAL; 78 + } 79 + 80 + freq_tables[opps_index].cpu_dev = cpu_dev; 81 + 82 + for (i = 0; i < ARRAY_SIZE(opps_div); i++) { 83 + freq = cur_frequency / opps_div[i]; 84 + 85 + ret = dev_pm_opp_add(cpu_dev, freq, 0); 86 + if (ret) 87 + return ret; 88 + 89 + freq_tables[opps_index].freq[i] = freq; 90 + } 91 + 92 + return 0; 93 + } 94 + 95 + static void armada_8k_cpufreq_free_table(struct freq_table *freq_tables) 96 + { 97 + int opps_index, nb_cpus = num_possible_cpus(); 98 + 99 + for (opps_index = 0; opps_index < nb_cpus; opps_index++) { 100 + int i; 101 + 102 + /* If cpu_dev is NULL then we reached the end of the array */ 103 + if (!freq_tables[opps_index].cpu_dev) 104 + break; 105 + 106 + for (i = 0; i < ARRAY_SIZE(opps_div); i++) { 107 + /* 108 + * A 0 Hz frequency is not valid; it means the 109 + * entry was never initialized, so there are no 110 + * more OPPs to free. 111 + */ 112 + if (freq_tables[opps_index].freq[i] == 0) 113 + break; 114 + 115 + dev_pm_opp_remove(freq_tables[opps_index].cpu_dev, 116 + freq_tables[opps_index].freq[i]); 117 + } 118 + } 119 + 120 + kfree(freq_tables); 121 + } 122 + 123 + static int __init armada_8k_cpufreq_init(void) 124 + { 125 + int ret = 0, opps_index = 0, cpu, nb_cpus; 126 + struct freq_table *freq_tables; 127 + struct device_node *node; 128 + struct cpumask cpus; 129 + 130 + node = of_find_compatible_node(NULL, NULL, "marvell,ap806-cpu-clock"); 131 + if (!node || !of_device_is_available(node)) { 132 + of_node_put(node); 133 + return -ENODEV; 134 + } 135 + 136 + nb_cpus = num_possible_cpus(); 137 + freq_tables = kcalloc(nb_cpus, sizeof(*freq_tables), GFP_KERNEL); 138 + cpumask_copy(&cpus, cpu_possible_mask); 139 + 140 + /* 141 + * For each CPU, this loop registers the operating points 142 + * supported (which are the nominal CPU frequency and full integer 143 + * divisions of it). 
144 + */ 145 + for_each_cpu(cpu, &cpus) { 146 + struct cpumask shared_cpus; 147 + struct device *cpu_dev; 148 + struct clk *clk; 149 + 150 + cpu_dev = get_cpu_device(cpu); 151 + 152 + if (!cpu_dev) { 153 + pr_err("Cannot get CPU %d\n", cpu); 154 + continue; 155 + } 156 + 157 + clk = clk_get(cpu_dev, 0); 158 + 159 + if (IS_ERR(clk)) { 160 + pr_err("Cannot get clock for CPU %d\n", cpu); 161 + ret = PTR_ERR(clk); 162 + goto remove_opp; 163 + } 164 + 165 + ret = armada_8k_add_opp(clk, cpu_dev, freq_tables, opps_index); 166 + if (ret) { 167 + clk_put(clk); 168 + goto remove_opp; 169 + } 170 + 171 + opps_index++; 172 + cpumask_clear(&shared_cpus); 173 + armada_8k_get_sharing_cpus(clk, &shared_cpus); 174 + dev_pm_opp_set_sharing_cpus(cpu_dev, &shared_cpus); 175 + cpumask_andnot(&cpus, &cpus, &shared_cpus); 176 + clk_put(clk); 177 + } 178 + 179 + armada_8k_pdev = platform_device_register_simple("cpufreq-dt", -1, 180 + NULL, 0); 181 + ret = PTR_ERR_OR_ZERO(armada_8k_pdev); 182 + if (ret) 183 + goto remove_opp; 184 + 185 + platform_set_drvdata(armada_8k_pdev, freq_tables); 186 + 187 + return 0; 188 + 189 + remove_opp: 190 + armada_8k_cpufreq_free_table(freq_tables); 191 + return ret; 192 + } 193 + module_init(armada_8k_cpufreq_init); 194 + 195 + static void __exit armada_8k_cpufreq_exit(void) 196 + { 197 + struct freq_table *freq_tables = platform_get_drvdata(armada_8k_pdev); 198 + 199 + platform_device_unregister(armada_8k_pdev); 200 + armada_8k_cpufreq_free_table(freq_tables); 201 + } 202 + module_exit(armada_8k_cpufreq_exit); 203 + 204 + MODULE_AUTHOR("Gregory Clement <gregory.clement@bootlin.com>"); 205 + MODULE_DESCRIPTION("Armada 8K cpufreq driver"); 206 + MODULE_LICENSE("GPL");
+65
drivers/cpufreq/cppc_cpufreq.c
··· 42 42 */ 43 43 static struct cppc_cpudata **all_cpu_data; 44 44 45 + struct cppc_workaround_oem_info { 46 + char oem_id[ACPI_OEM_ID_SIZE + 1]; 47 + char oem_table_id[ACPI_OEM_TABLE_ID_SIZE + 1]; 48 + u32 oem_revision; 49 + }; 50 + 51 + static bool apply_hisi_workaround; 52 + 53 + static struct cppc_workaround_oem_info wa_info[] = { 54 + { 55 + .oem_id = "HISI ", 56 + .oem_table_id = "HIP07 ", 57 + .oem_revision = 0, 58 + }, { 59 + .oem_id = "HISI ", 60 + .oem_table_id = "HIP08 ", 61 + .oem_revision = 0, 62 + } 63 + }; 64 + 65 + static unsigned int cppc_cpufreq_perf_to_khz(struct cppc_cpudata *cpu, 66 + unsigned int perf); 67 + 68 + /* 69 + * The HISI platform does not support the delivered performance counter 70 + * or the reference performance counter. It calculates the performance 71 + * using a platform specific mechanism. We reuse the desired performance 72 + * register to store the real performance calculated by the platform. 73 + */ 74 + static unsigned int hisi_cppc_cpufreq_get_rate(unsigned int cpunum) 75 + { 76 + struct cppc_cpudata *cpudata = all_cpu_data[cpunum]; 77 + u64 desired_perf; 78 + int ret; 79 + 80 + ret = cppc_get_desired_perf(cpunum, &desired_perf); 81 + if (ret < 0) 82 + return -EIO; 83 + 84 + return cppc_cpufreq_perf_to_khz(cpudata, desired_perf); 85 + } 86 + 87 + static void cppc_check_hisi_workaround(void) 88 + { 89 + struct acpi_table_header *tbl; 90 + acpi_status status = AE_OK; 91 + int i; 92 + 93 + status = acpi_get_table(ACPI_SIG_PCCT, 0, &tbl); 94 + if (ACPI_FAILURE(status) || !tbl) 95 + return; 96 + 97 + for (i = 0; i < ARRAY_SIZE(wa_info); i++) { 98 + if (!memcmp(wa_info[i].oem_id, tbl->oem_id, ACPI_OEM_ID_SIZE) && 99 + !memcmp(wa_info[i].oem_table_id, tbl->oem_table_id, ACPI_OEM_TABLE_ID_SIZE) && 100 + wa_info[i].oem_revision == tbl->oem_revision) 101 + apply_hisi_workaround = true; 102 + } 103 + } 104 + 45 105 /* Callback function used to retrieve the max frequency from DMI */ 46 106 static void cppc_find_dmi_mhz(const struct 
dmi_header *dm, void *private) 47 107 { ··· 394 334 struct cppc_cpudata *cpu = all_cpu_data[cpunum]; 395 335 int ret; 396 336 337 + if (apply_hisi_workaround) 338 + return hisi_cppc_cpufreq_get_rate(cpunum); 339 + 397 340 ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0); 398 341 if (ret) 399 342 return ret; ··· 448 385 pr_debug("Error parsing PSD data. Aborting cpufreq registration.\n"); 449 386 goto out; 450 387 } 388 + 389 + cppc_check_hisi_workaround(); 451 390 452 391 ret = cpufreq_register_driver(&cppc_cpufreq_driver); 453 392 if (ret)
+21 -12
drivers/cpufreq/cpufreq-dt.c
··· 13 13 14 14 #include <linux/clk.h> 15 15 #include <linux/cpu.h> 16 - #include <linux/cpu_cooling.h> 17 16 #include <linux/cpufreq.h> 18 17 #include <linux/cpumask.h> 19 18 #include <linux/err.h> ··· 29 30 struct private_data { 30 31 struct opp_table *opp_table; 31 32 struct device *cpu_dev; 32 - struct thermal_cooling_device *cdev; 33 33 const char *reg_name; 34 34 bool have_static_opps; 35 35 }; ··· 278 280 policy->cpuinfo.transition_latency = transition_latency; 279 281 policy->dvfs_possible_from_any_cpu = true; 280 282 283 + dev_pm_opp_of_register_em(policy->cpus); 284 + 281 285 return 0; 282 286 283 287 out_free_cpufreq_table: ··· 297 297 return ret; 298 298 } 299 299 300 + static int cpufreq_online(struct cpufreq_policy *policy) 301 + { 302 + /* We did light-weight tear down earlier, nothing to do here */ 303 + return 0; 304 + } 305 + 306 + static int cpufreq_offline(struct cpufreq_policy *policy) 307 + { 308 + /* 309 + * Preserve policy->driver_data and don't free resources on light-weight 310 + * tear down. 
311 + */ 312 + return 0; 313 + } 314 + 300 315 static int cpufreq_exit(struct cpufreq_policy *policy) 301 316 { 302 317 struct private_data *priv = policy->driver_data; 303 318 304 - cpufreq_cooling_unregister(priv->cdev); 305 319 dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table); 306 320 if (priv->have_static_opps) 307 321 dev_pm_opp_of_cpumask_remove_table(policy->related_cpus); ··· 328 314 return 0; 329 315 } 330 316 331 - static void cpufreq_ready(struct cpufreq_policy *policy) 332 - { 333 - struct private_data *priv = policy->driver_data; 334 - 335 - priv->cdev = of_cpufreq_cooling_register(policy); 336 - } 337 - 338 317 static struct cpufreq_driver dt_cpufreq_driver = { 339 - .flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK, 318 + .flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK | 319 + CPUFREQ_IS_COOLING_DEV, 340 320 .verify = cpufreq_generic_frequency_table_verify, 341 321 .target_index = set_target, 342 322 .get = cpufreq_generic_get, 343 323 .init = cpufreq_init, 344 324 .exit = cpufreq_exit, 345 - .ready = cpufreq_ready, 325 + .online = cpufreq_online, 326 + .offline = cpufreq_offline, 346 327 .name = "cpufreq-dt", 347 328 .attr = cpufreq_dt_attr, 348 329 .suspend = cpufreq_generic_suspend,
+86 -48
drivers/cpufreq/cpufreq.c
··· 19 19 20 20 #include <linux/cpu.h> 21 21 #include <linux/cpufreq.h> 22 + #include <linux/cpu_cooling.h> 22 23 #include <linux/delay.h> 23 24 #include <linux/device.h> 24 25 #include <linux/init.h> ··· 546 545 * SYSFS INTERFACE * 547 546 *********************************************************************/ 548 547 static ssize_t show_boost(struct kobject *kobj, 549 - struct attribute *attr, char *buf) 548 + struct kobj_attribute *attr, char *buf) 550 549 { 551 550 return sprintf(buf, "%d\n", cpufreq_driver->boost_enabled); 552 551 } 553 552 554 - static ssize_t store_boost(struct kobject *kobj, struct attribute *attr, 555 - const char *buf, size_t count) 553 + static ssize_t store_boost(struct kobject *kobj, struct kobj_attribute *attr, 554 + const char *buf, size_t count) 556 555 { 557 556 int ret, enable; 558 557 ··· 1201 1200 return -ENOMEM; 1202 1201 } 1203 1202 1204 - cpumask_copy(policy->cpus, cpumask_of(cpu)); 1203 + if (!new_policy && cpufreq_driver->online) { 1204 + ret = cpufreq_driver->online(policy); 1205 + if (ret) { 1206 + pr_debug("%s: %d: initialization failed\n", __func__, 1207 + __LINE__); 1208 + goto out_exit_policy; 1209 + } 1205 1210 1206 - /* call driver. From then on the cpufreq must be able 1207 - * to accept all calls to ->verify and ->setpolicy for this CPU 1208 - */ 1209 - ret = cpufreq_driver->init(policy); 1210 - if (ret) { 1211 - pr_debug("initialization failed\n"); 1212 - goto out_free_policy; 1213 - } 1211 + /* Recover policy->cpus using related_cpus */ 1212 + cpumask_copy(policy->cpus, policy->related_cpus); 1213 + } else { 1214 + cpumask_copy(policy->cpus, cpumask_of(cpu)); 1214 1215 1215 - ret = cpufreq_table_validate_and_sort(policy); 1216 - if (ret) 1217 - goto out_exit_policy; 1216 + /* 1217 + * Call driver. From then on the cpufreq must be able 1218 + * to accept all calls to ->verify and ->setpolicy for this CPU. 
 	 */
+	ret = cpufreq_driver->init(policy);
+	if (ret) {
+		pr_debug("%s: %d: initialization failed\n", __func__,
+			 __LINE__);
+		goto out_free_policy;
+	}
 
-	down_write(&policy->rwsem);
+	ret = cpufreq_table_validate_and_sort(policy);
+	if (ret)
+		goto out_exit_policy;
 
-	if (new_policy) {
 		/* related_cpus should at least include policy->cpus. */
 		cpumask_copy(policy->related_cpus, policy->cpus);
 	}
 
+	down_write(&policy->rwsem);
 	/*
 	 * affected cpus must always be the one, which are online. We aren't
 	 * managing offline cpus here.
···
 	if (ret) {
 		pr_err("%s: Failed to initialize policy for cpu: %d (%d)\n",
 		       __func__, cpu, ret);
-		/* cpufreq_policy_free() will notify based on this */
-		new_policy = false;
 		goto out_destroy_policy;
 	}
 
···
 	/* Callback for handling stuff after policy is ready */
 	if (cpufreq_driver->ready)
 		cpufreq_driver->ready(policy);
+
+	if (IS_ENABLED(CONFIG_CPU_THERMAL) &&
+	    cpufreq_driver->flags & CPUFREQ_IS_COOLING_DEV)
+		policy->cdev = of_cpufreq_cooling_register(policy);
 
 	pr_debug("initialization complete\n");
 
···
 		goto unlock;
 	}
 
+	if (IS_ENABLED(CONFIG_CPU_THERMAL) &&
+	    cpufreq_driver->flags & CPUFREQ_IS_COOLING_DEV) {
+		cpufreq_cooling_unregister(policy->cdev);
+		policy->cdev = NULL;
+	}
+
 	if (cpufreq_driver->stop_cpu)
 		cpufreq_driver->stop_cpu(policy);
 
···
 	cpufreq_exit_governor(policy);
 
 	/*
-	 * Perform the ->exit() even during light-weight tear-down,
-	 * since this is a core component, and is essential for the
-	 * subsequent light-weight ->init() to succeed.
+	 * Perform the ->offline() during light-weight tear-down, as
+	 * that allows fast recovery when the CPU comes back.
 	 */
-	if (cpufreq_driver->exit) {
+	if (cpufreq_driver->offline) {
+		cpufreq_driver->offline(policy);
+	} else if (cpufreq_driver->exit) {
 		cpufreq_driver->exit(policy);
 		policy->freq_table = NULL;
 	}
···
 	cpumask_clear_cpu(cpu, policy->real_cpus);
 	remove_cpu_dev_symlink(policy, dev);
 
-	if (cpumask_empty(policy->real_cpus))
+	if (cpumask_empty(policy->real_cpus)) {
+		/* We did light-weight exit earlier, do full tear down now */
+		if (cpufreq_driver->offline)
+			cpufreq_driver->exit(policy);
+
 		cpufreq_policy_free(policy);
+	}
 }
 
 /**
···
 }
 EXPORT_SYMBOL(cpufreq_get_policy);
 
-/*
- * policy : current policy.
- * new_policy: policy to be set.
+/**
+ * cpufreq_set_policy - Modify cpufreq policy parameters.
+ * @policy: Policy object to modify.
+ * @new_policy: New policy data.
+ *
+ * Pass @new_policy to the cpufreq driver's ->verify() callback, run the
+ * installed policy notifiers for it with the CPUFREQ_ADJUST value, pass it to
+ * the driver's ->verify() callback again and run the notifiers for it again
+ * with the CPUFREQ_NOTIFY value.  Next, copy the min and max parameters
+ * of @new_policy to @policy and either invoke the driver's ->setpolicy()
+ * callback (if present) or carry out a governor update for @policy.  That is,
+ * run the current governor's ->limits() callback (if the governor field in
+ * @new_policy points to the same object as the one in @policy) or replace the
+ * governor for @policy with the new one stored in @new_policy.
+ *
+ * The cpuinfo part of @policy is not updated by this function.
  */
 static int cpufreq_set_policy(struct cpufreq_policy *policy,
-				struct cpufreq_policy *new_policy)
+			      struct cpufreq_policy *new_policy)
 {
 	struct cpufreq_governor *old_gov;
 	int ret;
···
 	if (cpufreq_driver->setpolicy) {
 		policy->policy = new_policy->policy;
 		pr_debug("setting range\n");
-		return cpufreq_driver->setpolicy(new_policy);
+		return cpufreq_driver->setpolicy(policy);
 	}
 
 	if (new_policy->governor == policy->governor) {
-		pr_debug("cpufreq: governor limits update\n");
+		pr_debug("governor limits update\n");
 		cpufreq_governor_limits(policy);
 		return 0;
 	}
···
 	if (!ret) {
 		ret = cpufreq_start_governor(policy);
 		if (!ret) {
-			pr_debug("cpufreq: governor change\n");
+			pr_debug("governor change\n");
 			sched_cpufreq_governor_change(policy, old_gov);
 			return 0;
 		}
···
 }
 
 /**
- * cpufreq_update_policy - re-evaluate an existing cpufreq policy
- * @cpu: CPU which shall be re-evaluated
+ * cpufreq_update_policy - Re-evaluate an existing cpufreq policy.
+ * @cpu: CPU to re-evaluate the policy for.
  *
- * Useful for policy notifiers which have different necessities
- * at different times.
+ * Update the current frequency for the cpufreq policy of @cpu and use
+ * cpufreq_set_policy() to re-apply the min and max limits saved in the
+ * user_policy sub-structure of that policy, which triggers the evaluation
+ * of policy notifiers and the cpufreq driver's ->verify() callback for the
+ * policy in question, among other things.
  */
 void cpufreq_update_policy(unsigned int cpu)
 {
···
 	if (policy_is_inactive(policy))
 		goto unlock;
 
-	pr_debug("updating policy for CPU %u\n", cpu);
-	memcpy(&new_policy, policy, sizeof(*policy));
-	new_policy.min = policy->user_policy.min;
-	new_policy.max = policy->user_policy.max;
-
 	/*
 	 * BIOS might change freq behind our back
 	 * -> ask driver for current freq and notify governors about a change
 	 */
-	if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
-		if (cpufreq_suspended)
-			goto unlock;
+	if (cpufreq_driver->get && !cpufreq_driver->setpolicy &&
+	    (cpufreq_suspended || WARN_ON(!cpufreq_update_current_freq(policy))))
+		goto unlock;
 
-		new_policy.cur = cpufreq_update_current_freq(policy);
-		if (WARN_ON(!new_policy.cur))
-			goto unlock;
-	}
+	pr_debug("updating policy for CPU %u\n", cpu);
+	memcpy(&new_policy, policy, sizeof(*policy));
+	new_policy.min = policy->user_policy.min;
+	new_policy.max = policy->user_policy.max;
 
 	cpufreq_set_policy(policy, &new_policy);
 
···
 		    driver_data->target) ||
 	    (driver_data->setpolicy && (driver_data->target_index ||
 		    driver_data->target)) ||
-	    (!!driver_data->get_intermediate != !!driver_data->target_intermediate))
+	    (!driver_data->get_intermediate != !driver_data->target_intermediate) ||
+	    (!driver_data->online != !driver_data->offline))
 		return -EINVAL;
 
 	pr_debug("trying to register driver %s\n", driver_data->name);
+10 -6
drivers/cpufreq/cpufreq_stats.c
···
 {
 	unsigned long long cur_time = get_jiffies_64();
 
-	spin_lock(&cpufreq_stats_lock);
 	stats->time_in_state[stats->last_index] += cur_time - stats->last_time;
 	stats->last_time = cur_time;
-	spin_unlock(&cpufreq_stats_lock);
 }
 
 static void cpufreq_stats_clear_table(struct cpufreq_stats *stats)
 {
 	unsigned int count = stats->max_state;
 
+	spin_lock(&cpufreq_stats_lock);
 	memset(stats->time_in_state, 0, count * sizeof(u64));
 	memset(stats->trans_table, 0, count * count * sizeof(int));
 	stats->last_time = get_jiffies_64();
 	stats->total_trans = 0;
+	spin_unlock(&cpufreq_stats_lock);
 }
 
 static ssize_t show_total_trans(struct cpufreq_policy *policy, char *buf)
 {
 	return sprintf(buf, "%d\n", policy->stats->total_trans);
 }
+cpufreq_freq_attr_ro(total_trans);
 
 static ssize_t show_time_in_state(struct cpufreq_policy *policy, char *buf)
 {
···
 	if (policy->fast_switch_enabled)
 		return 0;
 
+	spin_lock(&cpufreq_stats_lock);
 	cpufreq_stats_update(stats);
+	spin_unlock(&cpufreq_stats_lock);
+
 	for (i = 0; i < stats->state_num; i++) {
 		len += sprintf(buf + len, "%u %llu\n", stats->freq_table[i],
 			(unsigned long long)
···
 	}
 	return len;
 }
+cpufreq_freq_attr_ro(time_in_state);
 
 static ssize_t store_reset(struct cpufreq_policy *policy, const char *buf,
 			   size_t count)
···
 	cpufreq_stats_clear_table(policy->stats);
 	return count;
 }
+cpufreq_freq_attr_wo(reset);
 
 static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf)
 {
···
 	return len;
 }
 cpufreq_freq_attr_ro(trans_table);
-
-cpufreq_freq_attr_ro(total_trans);
-cpufreq_freq_attr_ro(time_in_state);
-cpufreq_freq_attr_wo(reset);
 
 static struct attribute *default_attrs[] = {
 	&total_trans.attr,
···
 	if (old_index == -1 || new_index == -1 || old_index == new_index)
 		return;
 
+	spin_lock(&cpufreq_stats_lock);
 	cpufreq_stats_update(stats);
 
 	stats->last_index = new_index;
 	stats->trans_table[old_index * stats->max_state + new_index]++;
 	stats->total_trans++;
+	spin_unlock(&cpufreq_stats_lock);
 }
+1 -4
drivers/cpufreq/davinci-cpufreq.c
···
 #include <linux/init.h>
 #include <linux/err.h>
 #include <linux/clk.h>
+#include <linux/platform_data/davinci-cpufreq.h>
 #include <linux/platform_device.h>
 #include <linux/export.h>
-
-#include <mach/hardware.h>
-#include <mach/cpufreq.h>
-#include <mach/common.h>
 
 struct davinci_cpufreq {
 	struct device *dev;
+2 -3
drivers/cpufreq/e_powersaver.c
···
 	states = 2;
 
 	/* Allocate private data and frequency table for current cpu */
-	centaur = kzalloc(sizeof(*centaur)
-		    + (states + 1) * sizeof(struct cpufreq_frequency_table),
-		    GFP_KERNEL);
+	centaur = kzalloc(struct_size(centaur, freq_table, states + 1),
+			  GFP_KERNEL);
 	if (!centaur)
 		return -ENOMEM;
 	eps_cpu[0] = centaur;
+3 -22
drivers/cpufreq/imx6q-cpufreq.c
···
 #include <linux/clk.h>
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
-#include <linux/cpu_cooling.h>
 #include <linux/err.h>
 #include <linux/module.h>
 #include <linux/nvmem-consumer.h>
···
 };
 
 static struct device *cpu_dev;
-static struct thermal_cooling_device *cdev;
 static bool free_opp;
 static struct cpufreq_frequency_table *freq_table;
 static unsigned int max_freq;
···
 	return 0;
 }
 
-static void imx6q_cpufreq_ready(struct cpufreq_policy *policy)
-{
-	cdev = of_cpufreq_cooling_register(policy);
-
-	if (!cdev)
-		dev_err(cpu_dev,
-			"running cpufreq without cooling device: %ld\n",
-			PTR_ERR(cdev));
-}
-
 static int imx6q_cpufreq_init(struct cpufreq_policy *policy)
 {
 	int ret;
···
 	policy->clk = clks[ARM].clk;
 	ret = cpufreq_generic_init(policy, freq_table, transition_latency);
 	policy->suspend_freq = max_freq;
+	dev_pm_opp_of_register_em(policy->cpus);
 
 	return ret;
 }
 
-static int imx6q_cpufreq_exit(struct cpufreq_policy *policy)
-{
-	cpufreq_cooling_unregister(cdev);
-
-	return 0;
-}
-
 static struct cpufreq_driver imx6q_cpufreq_driver = {
-	.flags = CPUFREQ_NEED_INITIAL_FREQ_CHECK,
+	.flags = CPUFREQ_NEED_INITIAL_FREQ_CHECK |
+		 CPUFREQ_IS_COOLING_DEV,
 	.verify = cpufreq_generic_frequency_table_verify,
 	.target_index = imx6q_set_target,
 	.get = cpufreq_generic_get,
 	.init = imx6q_cpufreq_init,
-	.exit = imx6q_cpufreq_exit,
 	.name = "imx6q-cpufreq",
-	.ready = imx6q_cpufreq_ready,
 	.attr = cpufreq_generic_attr,
 	.suspend = cpufreq_generic_suspend,
 };
+56 -49
drivers/cpufreq/intel_pstate.c
···
 #define int_tofp(X) ((int64_t)(X) << FRAC_BITS)
 #define fp_toint(X) ((X) >> FRAC_BITS)
 
+#define ONE_EIGHTH_FP ((int64_t)1 << (FRAC_BITS - 3))
+
 #define EXT_BITS 6
 #define EXT_FRAC_BITS (EXT_BITS + FRAC_BITS)
 #define fp_ext_toint(X) ((X) >> EXT_FRAC_BITS)
···
 /************************** sysfs begin ************************/
 #define show_one(file_name, object)					\
 	static ssize_t show_##file_name					\
-	(struct kobject *kobj, struct attribute *attr, char *buf)	\
+	(struct kobject *kobj, struct kobj_attribute *attr, char *buf)	\
 	{								\
 		return sprintf(buf, "%u\n", global.object);		\
 	}
···
 static int intel_pstate_update_status(const char *buf, size_t size);
 
 static ssize_t show_status(struct kobject *kobj,
-			   struct attribute *attr, char *buf)
+			   struct kobj_attribute *attr, char *buf)
 {
 	ssize_t ret;
···
 	return ret;
 }
 
-static ssize_t store_status(struct kobject *a, struct attribute *b,
+static ssize_t store_status(struct kobject *a, struct kobj_attribute *b,
 			    const char *buf, size_t count)
 {
 	char *p = memchr(buf, '\n', count);
···
 }
 
 static ssize_t show_turbo_pct(struct kobject *kobj,
-			      struct attribute *attr, char *buf)
+			      struct kobj_attribute *attr, char *buf)
 {
 	struct cpudata *cpu;
 	int total, no_turbo, turbo_pct;
···
 }
 
 static ssize_t show_num_pstates(struct kobject *kobj,
-				struct attribute *attr, char *buf)
+				struct kobj_attribute *attr, char *buf)
 {
 	struct cpudata *cpu;
 	int total;
···
 }
 
 static ssize_t show_no_turbo(struct kobject *kobj,
-			     struct attribute *attr, char *buf)
+			     struct kobj_attribute *attr, char *buf)
 {
 	ssize_t ret;
···
 	return ret;
 }
 
-static ssize_t store_no_turbo(struct kobject *a, struct attribute *b,
+static ssize_t store_no_turbo(struct kobject *a, struct kobj_attribute *b,
 			      const char *buf, size_t count)
 {
 	unsigned int input;
···
 	return count;
 }
 
-static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b,
+static ssize_t store_max_perf_pct(struct kobject *a, struct kobj_attribute *b,
 				  const char *buf, size_t count)
 {
 	unsigned int input;
···
 	return count;
 }
 
-static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b,
+static ssize_t store_min_perf_pct(struct kobject *a, struct kobj_attribute *b,
 				  const char *buf, size_t count)
 {
 	unsigned int input;
···
 }
 
 static ssize_t show_hwp_dynamic_boost(struct kobject *kobj,
-				      struct attribute *attr, char *buf)
+				      struct kobj_attribute *attr, char *buf)
 {
 	return sprintf(buf, "%u\n", hwp_boost);
 }
 
-static ssize_t store_hwp_dynamic_boost(struct kobject *a, struct attribute *b,
+static ssize_t store_hwp_dynamic_boost(struct kobject *a,
+				       struct kobj_attribute *b,
 				       const char *buf, size_t count)
 {
 	unsigned int input;
···
 	return ret;
 }
 
-static int intel_pstate_get_base_pstate(struct cpudata *cpu)
-{
-	return global.no_turbo || global.turbo_disabled ?
-			cpu->pstate.max_pstate : cpu->pstate.turbo_pstate;
-}
-
 static void intel_pstate_set_pstate(struct cpudata *cpu, int pstate)
 {
 	trace_cpu_frequency(pstate * cpu->pstate.scaling, cpu->cpu);
···
 
 static void intel_pstate_max_within_limits(struct cpudata *cpu)
 {
-	int pstate;
+	int pstate = max(cpu->pstate.min_pstate, cpu->max_perf_ratio);
 
 	update_turbo_state();
-	pstate = intel_pstate_get_base_pstate(cpu);
-	pstate = max(cpu->pstate.min_pstate, cpu->max_perf_ratio);
 	intel_pstate_set_pstate(cpu, pstate);
 }
 
···
 static inline int32_t get_target_pstate(struct cpudata *cpu)
 {
 	struct sample *sample = &cpu->sample;
-	int32_t busy_frac, boost;
+	int32_t busy_frac;
 	int target, avg_pstate;
 
 	busy_frac = div_fp(sample->mperf << cpu->aperf_mperf_shift,
 			   sample->tsc);
 
-	boost = cpu->iowait_boost;
-	cpu->iowait_boost >>= 1;
-
-	if (busy_frac < boost)
-		busy_frac = boost;
+	if (busy_frac < cpu->iowait_boost)
+		busy_frac = cpu->iowait_boost;
 
 	sample->busy_scaled = busy_frac * 100;
···
 
 static int intel_pstate_prepare_request(struct cpudata *cpu, int pstate)
 {
-	int max_pstate = intel_pstate_get_base_pstate(cpu);
-	int min_pstate;
+	int min_pstate = max(cpu->pstate.min_pstate, cpu->min_perf_ratio);
+	int max_pstate = max(min_pstate, cpu->max_perf_ratio);
 
-	min_pstate = max(cpu->pstate.min_pstate, cpu->min_perf_ratio);
-	max_pstate = max(min_pstate, cpu->max_perf_ratio);
 	return clamp_t(int, pstate, min_pstate, max_pstate);
 }
 
···
 	if (smp_processor_id() != cpu->cpu)
 		return;
 
+	delta_ns = time - cpu->last_update;
 	if (flags & SCHED_CPUFREQ_IOWAIT) {
-		cpu->iowait_boost = int_tofp(1);
-		cpu->last_update = time;
-		/*
-		 * The last time the busy was 100% so P-state was max anyway
-		 * so avoid overhead of computation.
-		 */
-		if (fp_toint(cpu->sample.busy_scaled) == 100)
-			return;
-
-		goto set_pstate;
+		/* Start over if the CPU may have been idle. */
+		if (delta_ns > TICK_NSEC) {
+			cpu->iowait_boost = ONE_EIGHTH_FP;
+		} else if (cpu->iowait_boost) {
+			cpu->iowait_boost <<= 1;
+			if (cpu->iowait_boost > int_tofp(1))
+				cpu->iowait_boost = int_tofp(1);
+		} else {
+			cpu->iowait_boost = ONE_EIGHTH_FP;
+		}
 	} else if (cpu->iowait_boost) {
 		/* Clear iowait_boost if the CPU may have been idle. */
-		delta_ns = time - cpu->last_update;
 		if (delta_ns > TICK_NSEC)
 			cpu->iowait_boost = 0;
+		else
+			cpu->iowait_boost >>= 1;
 	}
 	cpu->last_update = time;
 	delta_ns = time - cpu->sample.time;
 	if ((s64)delta_ns < INTEL_PSTATE_SAMPLING_INTERVAL)
 		return;
 
-set_pstate:
 	if (intel_pstate_sample(cpu, time))
 		intel_pstate_adjust_pstate(cpu);
 }
···
 	if (hwp_active) {
 		intel_pstate_get_hwp_max(cpu->cpu, &turbo_max, &max_state);
 	} else {
-		max_state = intel_pstate_get_base_pstate(cpu);
+		max_state = global.no_turbo || global.turbo_disabled ?
+			cpu->pstate.max_pstate : cpu->pstate.turbo_pstate;
 		turbo_max = cpu->pstate.turbo_pstate;
 	}
 
···
 		kfree(pss);
 	}
 
+	pr_debug("ACPI _PSS not found\n");
 	return true;
 }
 
···
 
 	status = acpi_get_handle(NULL, "\\_SB", &handle);
 	if (ACPI_FAILURE(status))
-		return true;
+		goto not_found;
 
-	return !acpi_has_method(handle, "PCCH");
+	if (acpi_has_method(handle, "PCCH"))
+		return false;
+
+not_found:
+	pr_debug("ACPI PCCH not found\n");
+	return true;
 }
 
 static bool __init intel_pstate_has_acpi_ppc(void)
···
 		if (acpi_has_method(pr->handle, "_PPC"))
 			return true;
 	}
+	pr_debug("ACPI _PPC not found\n");
 	return false;
 }
 
···
 	id = x86_match_cpu(intel_pstate_cpu_oob_ids);
 	if (id) {
 		rdmsrl(MSR_MISC_PWR_MGMT, misc_pwr);
-		if ( misc_pwr & (1 << 8))
+		if (misc_pwr & (1 << 8)) {
+			pr_debug("Bit 8 in the MISC_PWR_MGMT MSR set\n");
 			return true;
+		}
 	}
 
 	idx = acpi_match_platform_list(plat_info);
···
 		}
 	} else {
 		id = x86_match_cpu(intel_pstate_cpu_ids);
-		if (!id)
+		if (!id) {
+			pr_info("CPU ID not supported\n");
 			return -ENODEV;
+		}
 
 		copy_cpu_funcs((struct pstate_funcs *)id->driver_data);
 	}
 
-	if (intel_pstate_msrs_not_valid())
+	if (intel_pstate_msrs_not_valid()) {
+		pr_info("Invalid MSRs\n");
 		return -ENODEV;
+	}
 
 hwp_cpu_matched:
 	/*
 	 * The Intel pstate driver will be ignored if the platform
 	 * firmware has its own power management modes.
 	 */
-	if (intel_pstate_platform_pwr_mgmt_exists())
+	if (intel_pstate_platform_pwr_mgmt_exists()) {
+		pr_info("P-states controlled by the platform\n");
 		return -ENODEV;
+	}
 
 	if (!hwp_active && hwp_only)
 		return -ENOTSUPP;
+1 -1
drivers/cpufreq/longhaul.c
···
 	case TYPE_POWERSAVER:
 		pr_cont("Powersaver supported\n");
 		break;
-	};
+	}
 
 	/* Doesn't hurt */
 	longhaul_setup_southbridge();
+4 -12
drivers/cpufreq/mediatek-cpufreq.c
···
 
 #include <linux/clk.h>
 #include <linux/cpu.h>
-#include <linux/cpu_cooling.h>
 #include <linux/cpufreq.h>
 #include <linux/cpumask.h>
 #include <linux/module.h>
···
 	struct regulator *sram_reg;
 	struct clk *cpu_clk;
 	struct clk *inter_clk;
-	struct thermal_cooling_device *cdev;
 	struct list_head list_head;
 	int intermediate_voltage;
 	bool need_voltage_tracking;
···
 
 #define DYNAMIC_POWER "dynamic-power-coefficient"
 
-static void mtk_cpufreq_ready(struct cpufreq_policy *policy)
-{
-	struct mtk_cpu_dvfs_info *info = policy->driver_data;
-
-	info->cdev = of_cpufreq_cooling_register(policy);
-}
-
 static int mtk_cpu_dvfs_info_init(struct mtk_cpu_dvfs_info *info, int cpu)
 {
 	struct device *cpu_dev;
···
 	policy->driver_data = info;
 	policy->clk = info->cpu_clk;
 
+	dev_pm_opp_of_register_em(policy->cpus);
+
 	return 0;
 }
 
···
 {
 	struct mtk_cpu_dvfs_info *info = policy->driver_data;
 
-	cpufreq_cooling_unregister(info->cdev);
 	dev_pm_opp_free_cpufreq_table(info->cpu_dev, &policy->freq_table);
 
 	return 0;
···
 
 static struct cpufreq_driver mtk_cpufreq_driver = {
 	.flags = CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
-		 CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
+		 CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
+		 CPUFREQ_IS_COOLING_DEV,
 	.verify = cpufreq_generic_frequency_table_verify,
 	.target_index = mtk_cpufreq_set_target,
 	.get = cpufreq_generic_get,
 	.init = mtk_cpufreq_init,
 	.exit = mtk_cpufreq_exit,
-	.ready = mtk_cpufreq_ready,
 	.name = "mtk-cpufreq",
 	.attr = cpufreq_generic_attr,
 };
+3 -1
drivers/cpufreq/omap-cpufreq.c
···
 
 	/* FIXME: what's the actual transition time? */
 	result = cpufreq_generic_init(policy, freq_table, 300 * 1000);
-	if (!result)
+	if (!result) {
+		dev_pm_opp_of_register_em(policy->cpus);
 		return 0;
+	}
 
 	freq_table_free();
 fail:
+1 -1
drivers/cpufreq/pcc-cpufreq.c
···
 	if (!pccp || pccp->type != ACPI_TYPE_PACKAGE) {
 		ret = -ENODEV;
 		goto out_free;
-	};
+	}
 
 	offset = &(pccp->package.elements[0]);
 	if (!offset || offset->type != ACPI_TYPE_INTEGER) {
+7 -3
drivers/cpufreq/powernv-cpufreq.c
···
 	u32 len_ids, len_freqs;
 	u32 pstate_min, pstate_max, pstate_nominal;
 	u32 pstate_turbo, pstate_ultra_turbo;
+	int rc = -ENODEV;
 
 	power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
 	if (!power_mgt) {
···
 		powernv_freqs[i].frequency = freq * 1000; /* kHz */
 		powernv_freqs[i].driver_data = id & 0xFF;
 
-		revmap_data = (struct pstate_idx_revmap_data *)
-			      kmalloc(sizeof(*revmap_data), GFP_KERNEL);
+		revmap_data = kmalloc(sizeof(*revmap_data), GFP_KERNEL);
+		if (!revmap_data) {
+			rc = -ENOMEM;
+			goto out;
+		}
 
 		revmap_data->pstate_id = id & 0xFF;
 		revmap_data->cpufreq_table_idx = i;
···
 	return 0;
 out:
 	of_node_put(power_mgt);
-	return -ENODEV;
+	return rc;
 }
 
 /* Returns the CPU frequency corresponding to the pstate_id. */
+42 -11
drivers/cpufreq/qcom-cpufreq-hw.c
···
 #include <linux/module.h>
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
+#include <linux/pm_opp.h>
 #include <linux/slab.h>
 
 #define LUT_MAX_ENTRIES			40U
 #define LUT_SRC				GENMASK(31, 30)
 #define LUT_L_VAL			GENMASK(7, 0)
 #define LUT_CORE_COUNT			GENMASK(18, 16)
+#define LUT_VOLT			GENMASK(11, 0)
 #define LUT_ROW_SIZE			32
 #define CLK_HW_DIV			2
 
 /* Register offsets */
 #define REG_ENABLE			0x0
-#define REG_LUT_TABLE			0x110
+#define REG_FREQ_LUT			0x110
+#define REG_VOLT_LUT			0x114
 #define REG_PERF_STATE			0x920
 
 static unsigned long cpu_hw_rate, xo_rate;
···
 	return policy->freq_table[index].frequency;
 }
 
-static int qcom_cpufreq_hw_read_lut(struct device *dev,
+static int qcom_cpufreq_hw_read_lut(struct device *cpu_dev,
 				    struct cpufreq_policy *policy,
 				    void __iomem *base)
 {
 	u32 data, src, lval, i, core_count, prev_cc = 0, prev_freq = 0, freq;
+	u32 volt;
 	unsigned int max_cores = cpumask_weight(policy->cpus);
 	struct cpufreq_frequency_table	*table;
 
···
 		return -ENOMEM;
 
 	for (i = 0; i < LUT_MAX_ENTRIES; i++) {
-		data = readl_relaxed(base + REG_LUT_TABLE + i * LUT_ROW_SIZE);
+		data = readl_relaxed(base + REG_FREQ_LUT +
+				      i * LUT_ROW_SIZE);
 		src = FIELD_GET(LUT_SRC, data);
 		lval = FIELD_GET(LUT_L_VAL, data);
 		core_count = FIELD_GET(LUT_CORE_COUNT, data);
+
+		data = readl_relaxed(base + REG_VOLT_LUT +
+				      i * LUT_ROW_SIZE);
+		volt = FIELD_GET(LUT_VOLT, data) * 1000;
 
 		if (src)
 			freq = xo_rate * lval / 1000;
 		else
 			freq = cpu_hw_rate / 1000;
 
-		/* Ignore boosts in the middle of the table */
-		if (core_count != max_cores) {
-			table[i].frequency = CPUFREQ_ENTRY_INVALID;
-		} else {
+		if (freq != prev_freq && core_count == max_cores) {
 			table[i].frequency = freq;
-			dev_dbg(dev, "index=%d freq=%d, core_count %d\n", i,
+			dev_pm_opp_add(cpu_dev, freq * 1000, volt);
+			dev_dbg(cpu_dev, "index=%d freq=%d, core_count %d\n", i,
 				freq, core_count);
+		} else {
+			table[i].frequency = CPUFREQ_ENTRY_INVALID;
 		}
 
 		/*
···
 			if (prev_cc != max_cores) {
 				prev->frequency = prev_freq;
 				prev->flags = CPUFREQ_BOOST_FREQ;
+				dev_pm_opp_add(cpu_dev, prev_freq * 1000, volt);
 			}
 
 			break;
···
 
 	table[i].frequency = CPUFREQ_TABLE_END;
 	policy->freq_table = table;
+	dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus);
 
 	return 0;
 }
···
 	struct device *dev = &global_pdev->dev;
 	struct of_phandle_args args;
 	struct device_node *cpu_np;
+	struct device *cpu_dev;
 	struct resource *res;
 	void __iomem *base;
 	int ret, index;
+
+	cpu_dev = get_cpu_device(policy->cpu);
+	if (!cpu_dev) {
+		pr_err("%s: failed to get cpu%d device\n", __func__,
+		       policy->cpu);
+		return -ENODEV;
+	}
 
 	cpu_np = of_cpu_device_node_get(policy->cpu);
 	if (!cpu_np)
···
 
 	policy->driver_data = base + REG_PERF_STATE;
 
-	ret = qcom_cpufreq_hw_read_lut(dev, policy, base);
+	ret = qcom_cpufreq_hw_read_lut(cpu_dev, policy, base);
 	if (ret) {
 		dev_err(dev, "Domain-%d failed to read LUT\n", index);
 		goto error;
 	}
+
+	ret = dev_pm_opp_get_opp_count(cpu_dev);
+	if (ret <= 0) {
+		dev_err(cpu_dev, "Failed to add OPPs\n");
+		ret = -ENODEV;
+		goto error;
+	}
+
+	dev_pm_opp_of_register_em(policy->cpus);
 
 	policy->fast_switch_possible = true;
 
···
 
 static int qcom_cpufreq_hw_cpu_exit(struct cpufreq_policy *policy)
 {
+	struct device *cpu_dev = get_cpu_device(policy->cpu);
 	void __iomem *base = policy->driver_data - REG_PERF_STATE;
 
+	dev_pm_opp_remove_all_dynamic(cpu_dev);
 	kfree(policy->freq_table);
 	devm_iounmap(&global_pdev->dev, base);
 
···
 
 static struct cpufreq_driver cpufreq_qcom_hw_driver = {
 	.flags		= CPUFREQ_STICKY | CPUFREQ_NEED_INITIAL_FREQ_CHECK |
-			  CPUFREQ_HAVE_GOVERNOR_PER_POLICY,
+			  CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
+			  CPUFREQ_IS_COOLING_DEV,
 	.verify		= cpufreq_generic_frequency_table_verify,
 	.target_index	= qcom_cpufreq_hw_target_index,
 	.get		= qcom_cpufreq_hw_get,
···
 {
 	return platform_driver_register(&qcom_cpufreq_hw_driver);
 }
-subsys_initcall(qcom_cpufreq_hw_init);
+device_initcall(qcom_cpufreq_hw_init);
 
 static void __exit qcom_cpufreq_hw_exit(void)
 {
+19 -3
drivers/cpufreq/qcom-cpufreq-kryo.c
···
 	NUM_OF_MSM8996_VERSIONS,
 };
 
-struct platform_device *cpufreq_dt_pdev, *kryo_cpufreq_pdev;
+static struct platform_device *cpufreq_dt_pdev, *kryo_cpufreq_pdev;
 
 static enum _msm8996_version qcom_cpufreq_kryo_get_msm_id(void)
 {
···
 
 static int qcom_cpufreq_kryo_probe(struct platform_device *pdev)
 {
-	struct opp_table *opp_tables[NR_CPUS] = {0};
+	struct opp_table **opp_tables;
 	enum _msm8996_version msm8996_version;
 	struct nvmem_cell *speedbin_nvmem;
 	struct device_node *np;
···
 	}
 	kfree(speedbin);
 
+	opp_tables = kcalloc(num_possible_cpus(), sizeof(*opp_tables), GFP_KERNEL);
+	if (!opp_tables)
+		return -ENOMEM;
+
 	for_each_possible_cpu(cpu) {
 		cpu_dev = get_cpu_device(cpu);
 		if (NULL == cpu_dev) {
···
 
 	cpufreq_dt_pdev = platform_device_register_simple("cpufreq-dt", -1,
 							  NULL, 0);
-	if (!IS_ERR(cpufreq_dt_pdev))
+	if (!IS_ERR(cpufreq_dt_pdev)) {
+		platform_set_drvdata(pdev, opp_tables);
 		return 0;
+	}
 
 	ret = PTR_ERR(cpufreq_dt_pdev);
 	dev_err(cpu_dev, "Failed to register platform device\n");
···
 			break;
 		dev_pm_opp_put_supported_hw(opp_tables[cpu]);
 	}
+	kfree(opp_tables);
 
 	return ret;
 }
 
 static int qcom_cpufreq_kryo_remove(struct platform_device *pdev)
 {
+	struct opp_table **opp_tables = platform_get_drvdata(pdev);
+	unsigned int cpu;
+
 	platform_device_unregister(cpufreq_dt_pdev);
+
+	for_each_possible_cpu(cpu)
+		dev_pm_opp_put_supported_hw(opp_tables[cpu]);
+
+	kfree(opp_tables);
+
 	return 0;
 }
 
+2 -13
drivers/cpufreq/qoriq-cpufreq.c
···
 #include <linux/clk.h>
 #include <linux/clk-provider.h>
 #include <linux/cpufreq.h>
-#include <linux/cpu_cooling.h>
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
···
 struct cpu_data {
 	struct clk **pclk;
 	struct cpufreq_frequency_table *table;
-	struct thermal_cooling_device *cdev;
 };
 
 /*
···
 {
 	struct cpu_data *data = policy->driver_data;
 
-	cpufreq_cooling_unregister(data->cdev);
 	kfree(data->pclk);
 	kfree(data->table);
 	kfree(data);
···
 	return clk_set_parent(policy->clk, parent);
 }
 
-
-static void qoriq_cpufreq_ready(struct cpufreq_policy *policy)
-{
-	struct cpu_data *cpud = policy->driver_data;
-
-	cpud->cdev = of_cpufreq_cooling_register(policy);
-}
-
 static struct cpufreq_driver qoriq_cpufreq_driver = {
 	.name		= "qoriq_cpufreq",
-	.flags		= CPUFREQ_CONST_LOOPS,
+	.flags		= CPUFREQ_CONST_LOOPS |
+			  CPUFREQ_IS_COOLING_DEV,
 	.init		= qoriq_cpufreq_cpu_init,
 	.exit		= qoriq_cpufreq_cpu_exit,
 	.verify		= cpufreq_generic_frequency_table_verify,
 	.target_index	= qoriq_cpufreq_target,
 	.get		= cpufreq_generic_get,
-	.ready		= qoriq_cpufreq_ready,
 	.attr		= cpufreq_generic_attr,
 };
 
+48 -19
drivers/cpufreq/s5pv210-cpufreq.c
···
 static int s5pv210_cpufreq_probe(struct platform_device *pdev)
 {
 	struct device_node *np;
-	int id;
+	int id, result = 0;
 
 	/*
 	 * HACK: This is a temporary workaround to get access to clock
···
 	 * this whole driver as soon as S5PV210 gets migrated to use
 	 * cpufreq-dt driver.
 	 */
+	arm_regulator = regulator_get(NULL, "vddarm");
+	if (IS_ERR(arm_regulator)) {
+		if (PTR_ERR(arm_regulator) == -EPROBE_DEFER)
+			pr_debug("vddarm regulator not ready, defer\n");
+		else
+			pr_err("failed to get regulator vddarm\n");
+		return PTR_ERR(arm_regulator);
+	}
+
+	int_regulator = regulator_get(NULL, "vddint");
+	if (IS_ERR(int_regulator)) {
+		if (PTR_ERR(int_regulator) == -EPROBE_DEFER)
+			pr_debug("vddint regulator not ready, defer\n");
+		else
+			pr_err("failed to get regulator vddint\n");
+		result = PTR_ERR(int_regulator);
+		goto err_int_regulator;
+	}
+
 	np = of_find_compatible_node(NULL, NULL, "samsung,s5pv210-clock");
 	if (!np) {
 		pr_err("%s: failed to find clock controller DT node\n",
 		       __func__);
-		return -ENODEV;
+		result = -ENODEV;
+		goto err_clock;
 	}
 
 	clk_base = of_iomap(np, 0);
 	of_node_put(np);
 	if (!clk_base) {
 		pr_err("%s: failed to map clock registers\n", __func__);
-		return -EFAULT;
+		result = -EFAULT;
+		goto err_clock;
 	}
 
 	for_each_compatible_node(np, NULL, "samsung,s5pv210-dmc") {
···
 			pr_err("%s: failed to get alias of dmc node '%pOFn'\n",
 			       __func__, np);
 			of_node_put(np);
-			return id;
+			result = id;
+			goto err_clk_base;
 		}
 
 		dmc_base[id] = of_iomap(np, 0);
···
 			pr_err("%s: failed to map dmc%d registers\n",
 			       __func__, id);
 			of_node_put(np);
-			return -EFAULT;
+			result = -EFAULT;
+			goto err_dmc;
 		}
 	}
 
 	for (id = 0; id < ARRAY_SIZE(dmc_base); ++id) {
 		if (!dmc_base[id]) {
 			pr_err("%s: failed to find dmc%d node\n", __func__, id);
-			return -ENODEV;
+			result = -ENODEV;
+			goto err_dmc;
 		}
-	}
-
-	arm_regulator = regulator_get(NULL, "vddarm");
-	if (IS_ERR(arm_regulator)) {
-		pr_err("failed to get regulator vddarm\n");
-		return PTR_ERR(arm_regulator);
-	}
-
-	int_regulator = regulator_get(NULL, "vddint");
-	if (IS_ERR(int_regulator)) {
-		pr_err("failed to get regulator vddint\n");
-		regulator_put(arm_regulator);
-		return PTR_ERR(int_regulator);
 	}
 
 	register_reboot_notifier(&s5pv210_cpufreq_reboot_notifier);
 
 	return cpufreq_register_driver(&s5pv210_driver);
+
+err_dmc:
+	for (id = 0; id < ARRAY_SIZE(dmc_base); ++id)
+		if (dmc_base[id]) {
+			iounmap(dmc_base[id]);
+			dmc_base[id] = NULL;
+		}
+
+err_clk_base:
+	iounmap(clk_base);
+
+err_clock:
+	regulator_put(int_regulator);
+
+err_int_regulator:
+	regulator_put(arm_regulator);
+
+	return result;
 }
 
 static struct platform_driver s5pv210_cpufreq_platdrv = {
+38 -15
drivers/cpufreq/scmi-cpufreq.c
··· 11 11 #include <linux/cpu.h> 12 12 #include <linux/cpufreq.h> 13 13 #include <linux/cpumask.h> 14 - #include <linux/cpu_cooling.h> 14 + #include <linux/energy_model.h> 15 15 #include <linux/export.h> 16 16 #include <linux/module.h> 17 17 #include <linux/pm_opp.h> ··· 22 22 struct scmi_data { 23 23 int domain_id; 24 24 struct device *cpu_dev; 25 - struct thermal_cooling_device *cdev; 26 25 }; 27 26 28 27 static const struct scmi_handle *handle; ··· 102 103 return 0; 103 104 } 104 105 106 + static int __maybe_unused 107 + scmi_get_cpu_power(unsigned long *power, unsigned long *KHz, int cpu) 108 + { 109 + struct device *cpu_dev = get_cpu_device(cpu); 110 + unsigned long Hz; 111 + int ret, domain; 112 + 113 + if (!cpu_dev) { 114 + pr_err("failed to get cpu%d device\n", cpu); 115 + return -ENODEV; 116 + } 117 + 118 + domain = handle->perf_ops->device_domain_id(cpu_dev); 119 + if (domain < 0) 120 + return domain; 121 + 122 + /* Get the power cost of the performance domain. */ 123 + Hz = *KHz * 1000; 124 + ret = handle->perf_ops->est_power_get(handle, domain, &Hz, power); 125 + if (ret) 126 + return ret; 127 + 128 + /* The EM framework specifies the frequency in KHz. 
*/ 129 + *KHz = Hz / 1000; 130 + 131 + return 0; 132 + } 133 + 105 134 static int scmi_cpufreq_init(struct cpufreq_policy *policy) 106 135 { 107 - int ret; 136 + int ret, nr_opp; 108 137 unsigned int latency; 109 138 struct device *cpu_dev; 110 139 struct scmi_data *priv; 111 140 struct cpufreq_frequency_table *freq_table; 141 + struct em_data_callback em_cb = EM_DATA_CB(scmi_get_cpu_power); 112 142 113 143 cpu_dev = get_cpu_device(policy->cpu); 114 144 if (!cpu_dev) { ··· 164 136 return ret; 165 137 } 166 138 167 - ret = dev_pm_opp_get_opp_count(cpu_dev); 168 - if (ret <= 0) { 139 + nr_opp = dev_pm_opp_get_opp_count(cpu_dev); 140 + if (nr_opp <= 0) { 169 141 dev_dbg(cpu_dev, "OPP table is not ready, deferring probe\n"); 170 142 ret = -EPROBE_DEFER; 171 143 goto out_free_opp; ··· 199 171 policy->cpuinfo.transition_latency = latency; 200 172 201 173 policy->fast_switch_possible = true; 174 + 175 + em_register_perf_domain(policy->cpus, nr_opp, &em_cb); 176 + 202 177 return 0; 203 178 204 179 out_free_priv: ··· 216 185 { 217 186 struct scmi_data *priv = policy->driver_data; 218 187 219 - cpufreq_cooling_unregister(priv->cdev); 220 188 dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table); 221 189 dev_pm_opp_remove_all_dynamic(priv->cpu_dev); 222 190 kfree(priv); ··· 223 193 return 0; 224 194 } 225 195 226 - static void scmi_cpufreq_ready(struct cpufreq_policy *policy) 227 - { 228 - struct scmi_data *priv = policy->driver_data; 229 - 230 - priv->cdev = of_cpufreq_cooling_register(policy); 231 - } 232 - 233 196 static struct cpufreq_driver scmi_cpufreq_driver = { 234 197 .name = "scmi", 235 198 .flags = CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY | 236 - CPUFREQ_NEED_INITIAL_FREQ_CHECK, 199 + CPUFREQ_NEED_INITIAL_FREQ_CHECK | 200 + CPUFREQ_IS_COOLING_DEV, 237 201 .verify = cpufreq_generic_frequency_table_verify, 238 202 .attr = cpufreq_generic_attr, 239 203 .target_index = scmi_cpufreq_set_target, ··· 235 211 .get = scmi_cpufreq_get_rate, 236 212 .init 
= scmi_cpufreq_init, 237 213 .exit = scmi_cpufreq_exit, 238 - .ready = scmi_cpufreq_ready, 239 214 }; 240 215 241 216 static int scmi_cpufreq_probe(struct scmi_device *sdev)
+5 -12
drivers/cpufreq/scpi-cpufreq.c
··· 22 22 #include <linux/cpu.h> 23 23 #include <linux/cpufreq.h> 24 24 #include <linux/cpumask.h> 25 - #include <linux/cpu_cooling.h> 26 25 #include <linux/export.h> 27 26 #include <linux/module.h> 28 27 #include <linux/of_platform.h> ··· 33 34 struct scpi_data { 34 35 struct clk *clk; 35 36 struct device *cpu_dev; 36 - struct thermal_cooling_device *cdev; 37 37 }; 38 38 39 39 static struct scpi_ops *scpi_ops; ··· 168 170 policy->cpuinfo.transition_latency = latency; 169 171 170 172 policy->fast_switch_possible = false; 173 + 174 + dev_pm_opp_of_register_em(policy->cpus); 175 + 171 176 return 0; 172 177 173 178 out_free_cpufreq_table: ··· 187 186 { 188 187 struct scpi_data *priv = policy->driver_data; 189 188 190 - cpufreq_cooling_unregister(priv->cdev); 191 189 clk_put(priv->clk); 192 190 dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table); 193 191 kfree(priv); ··· 195 195 return 0; 196 196 } 197 197 198 - static void scpi_cpufreq_ready(struct cpufreq_policy *policy) 199 - { 200 - struct scpi_data *priv = policy->driver_data; 201 - 202 - priv->cdev = of_cpufreq_cooling_register(policy); 203 - } 204 - 205 198 static struct cpufreq_driver scpi_cpufreq_driver = { 206 199 .name = "scpi-cpufreq", 207 200 .flags = CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY | 208 - CPUFREQ_NEED_INITIAL_FREQ_CHECK, 201 + CPUFREQ_NEED_INITIAL_FREQ_CHECK | 202 + CPUFREQ_IS_COOLING_DEV, 209 203 .verify = cpufreq_generic_frequency_table_verify, 210 204 .attr = cpufreq_generic_attr, 211 205 .get = scpi_cpufreq_get_rate, 212 206 .init = scpi_cpufreq_init, 213 207 .exit = scpi_cpufreq_exit, 214 - .ready = scpi_cpufreq_ready, 215 208 .target_index = scpi_cpufreq_set_target, 216 209 }; 217 210
+1 -2
drivers/cpufreq/speedstep-ich.c
··· 243 243 unsigned int speed; 244 244 245 245 /* You're supposed to ensure CPU is online. */ 246 - if (smp_call_function_single(cpu, get_freq_data, &speed, 1) != 0) 247 - BUG(); 246 + BUG_ON(smp_call_function_single(cpu, get_freq_data, &speed, 1)); 248 247 249 248 pr_debug("detected %u kHz as current frequency\n", speed); 250 249 return speed;
+2
drivers/cpufreq/tegra124-cpufreq.c
··· 118 118 119 119 platform_set_drvdata(pdev, priv); 120 120 121 + of_node_put(np); 122 + 121 123 return 0; 122 124 123 125 out_put_pllp_clk:
+10 -1
drivers/cpuidle/Kconfig
··· 4 4 bool "CPU idle PM support" 5 5 default y if ACPI || PPC_PSERIES 6 6 select CPU_IDLE_GOV_LADDER if (!NO_HZ && !NO_HZ_IDLE) 7 - select CPU_IDLE_GOV_MENU if (NO_HZ || NO_HZ_IDLE) 7 + select CPU_IDLE_GOV_MENU if (NO_HZ || NO_HZ_IDLE) && !CPU_IDLE_GOV_TEO 8 8 help 9 9 CPU idle is a generic framework for supporting software-controlled 10 10 idle processor power management. It includes modular cross-platform ··· 22 22 23 23 config CPU_IDLE_GOV_MENU 24 24 bool "Menu governor (for tickless system)" 25 + 26 + config CPU_IDLE_GOV_TEO 27 + bool "Timer events oriented (TEO) governor (for tickless systems)" 28 + help 29 + This governor implements a simplified idle state selection method 30 + focused on timer events and does not do any interactivity boosting. 31 + 32 + Some workloads benefit from using it and it generally should be safe 33 + to use. Say Y here if you are not happy with the alternatives. 25 34 26 35 config DT_IDLE_STATES 27 36 bool
+9 -6
drivers/cpuidle/dt_idle_states.c
··· 22 22 #include "dt_idle_states.h" 23 23 24 24 static int init_state_node(struct cpuidle_state *idle_state, 25 - const struct of_device_id *matches, 25 + const struct of_device_id *match_id, 26 26 struct device_node *state_node) 27 27 { 28 28 int err; 29 - const struct of_device_id *match_id; 30 29 const char *desc; 31 30 32 - match_id = of_match_node(matches, state_node); 33 - if (!match_id) 34 - return -ENODEV; 35 31 /* 36 32 * CPUidle drivers are expected to initialize the const void *data 37 33 * pointer of the passed in struct of_device_id array to the idle ··· 156 160 { 157 161 struct cpuidle_state *idle_state; 158 162 struct device_node *state_node, *cpu_node; 163 + const struct of_device_id *match_id; 159 164 int i, err = 0; 160 165 const cpumask_t *cpumask; 161 166 unsigned int state_idx = start_idx; ··· 177 180 if (!state_node) 178 181 break; 179 182 183 + match_id = of_match_node(matches, state_node); 184 + if (!match_id) { 185 + err = -ENODEV; 186 + break; 187 + } 188 + 180 189 if (!of_device_is_available(state_node)) { 181 190 of_node_put(state_node); 182 191 continue; ··· 201 198 } 202 199 203 200 idle_state = &drv->states[state_idx++]; 204 - err = init_state_node(idle_state, matches, state_node); 201 + err = init_state_node(idle_state, match_id, state_node); 205 202 if (err) { 206 203 pr_err("Parsing idle state node %pOF failed with err %d\n", 207 204 state_node, err);
+1
drivers/cpuidle/governors/Makefile
··· 4 4 5 5 obj-$(CONFIG_CPU_IDLE_GOV_LADDER) += ladder.o 6 6 obj-$(CONFIG_CPU_IDLE_GOV_MENU) += menu.o 7 + obj-$(CONFIG_CPU_IDLE_GOV_TEO) += teo.o
+444
drivers/cpuidle/governors/teo.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Timer events oriented CPU idle governor 4 + * 5 + * Copyright (C) 2018 Intel Corporation 6 + * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com> 7 + * 8 + * The idea of this governor is based on the observation that on many systems 9 + * timer events are two or more orders of magnitude more frequent than any 10 + * other interrupts, so they are likely to be the most significant source of CPU 11 + * wakeups from idle states. Moreover, information about what happened in the 12 + * (relatively recent) past can be used to estimate whether or not the deepest 13 + * idle state with target residency within the time to the closest timer is 14 + * likely to be suitable for the upcoming idle time of the CPU and, if not, then 15 + * which of the shallower idle states to choose. 16 + * 17 + * Of course, non-timer wakeup sources are more important in some use cases and 18 + * they can be covered by taking a few most recent idle time intervals of the 19 + * CPU into account. However, even in that case it is not necessary to consider 20 + * idle duration values greater than the time till the closest timer, as the 21 + * patterns that they may belong to produce average values close enough to 22 + * the time till the closest timer (sleep length) anyway. 23 + * 24 + * Thus this governor estimates whether or not the upcoming idle time of the CPU 25 + * is likely to be significantly shorter than the sleep length and selects an 26 + * idle state for it in accordance with that, as follows: 27 + * 28 + * - Find an idle state on the basis of the sleep length and state statistics 29 + * collected over time: 30 + * 31 + * o Find the deepest idle state whose target residency is less than or equal 32 + * to the sleep length. 33 + * 34 + * o Select it if it matched both the sleep length and the observed idle 35 + * duration in the past more often than it matched the sleep length alone 36 + * (i.e. 
the observed idle duration was significantly shorter than the sleep 37 + * length matched by it). 38 + * 39 + * o Otherwise, select the shallower state with the greatest matched "early" 40 + * wakeups metric. 41 + * 42 + * - If the majority of the most recent idle duration values are below the 43 + * target residency of the idle state selected so far, use those values to 44 + * compute the new expected idle duration and find an idle state matching it 45 + * (which has to be shallower than the one selected so far). 46 + */ 47 + 48 + #include <linux/cpuidle.h> 49 + #include <linux/jiffies.h> 50 + #include <linux/kernel.h> 51 + #include <linux/sched/clock.h> 52 + #include <linux/tick.h> 53 + 54 + /* 55 + * The PULSE value is added to metrics when they grow and the DECAY_SHIFT value 56 + * is used for decreasing metrics on a regular basis. 57 + */ 58 + #define PULSE 1024 59 + #define DECAY_SHIFT 3 60 + 61 + /* 62 + * Number of the most recent idle duration values to take into consideration for 63 + * the detection of wakeup patterns. 64 + */ 65 + #define INTERVALS 8 66 + 67 + /** 68 + * struct teo_idle_state - Idle state data used by the TEO cpuidle governor. 69 + * @early_hits: "Early" CPU wakeups "matching" this state. 70 + * @hits: "On time" CPU wakeups "matching" this state. 71 + * @misses: CPU wakeups "missing" this state. 72 + * 73 + * A CPU wakeup is "matched" by a given idle state if the idle duration measured 74 + * after the wakeup is between the target residency of that state and the target 75 + * residency of the next one (or if this is the deepest available idle state, it 76 + * "matches" a CPU wakeup when the measured idle duration is at least equal to 77 + * its target residency). 
78 + * 79 + * Also, from the TEO governor perspective, a CPU wakeup from idle is "early" if 80 + * it occurs significantly earlier than the closest expected timer event (that 81 + * is, early enough to match an idle state shallower than the one matching the 82 + * time till the closest timer event). Otherwise, the wakeup is "on time", or 83 + * it is a "hit". 84 + * 85 + * A "miss" occurs when the given state doesn't match the wakeup, but it matches 86 + * the time till the closest timer event used for idle state selection. 87 + */ 88 + struct teo_idle_state { 89 + unsigned int early_hits; 90 + unsigned int hits; 91 + unsigned int misses; 92 + }; 93 + 94 + /** 95 + * struct teo_cpu - CPU data used by the TEO cpuidle governor. 96 + * @time_span_ns: Time between idle state selection and post-wakeup update. 97 + * @sleep_length_ns: Time till the closest timer event (at the selection time). 98 + * @states: Idle states data corresponding to this CPU. 99 + * @last_state: Idle state entered by the CPU last time. 100 + * @interval_idx: Index of the most recent saved idle interval. 101 + * @intervals: Saved idle duration values. 102 + */ 103 + struct teo_cpu { 104 + u64 time_span_ns; 105 + u64 sleep_length_ns; 106 + struct teo_idle_state states[CPUIDLE_STATE_MAX]; 107 + int last_state; 108 + int interval_idx; 109 + unsigned int intervals[INTERVALS]; 110 + }; 111 + 112 + static DEFINE_PER_CPU(struct teo_cpu, teo_cpus); 113 + 114 + /** 115 + * teo_update - Update CPU data after wakeup. 116 + * @drv: cpuidle driver containing state data. 117 + * @dev: Target CPU. 
118 + */ 119 + static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) 120 + { 121 + struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); 122 + unsigned int sleep_length_us = ktime_to_us(cpu_data->sleep_length_ns); 123 + int i, idx_hit = -1, idx_timer = -1; 124 + unsigned int measured_us; 125 + 126 + if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) { 127 + /* 128 + * One of the safety nets has triggered or this was a timer 129 + * wakeup (or equivalent). 130 + */ 131 + measured_us = sleep_length_us; 132 + } else { 133 + unsigned int lat = drv->states[cpu_data->last_state].exit_latency; 134 + 135 + measured_us = ktime_to_us(cpu_data->time_span_ns); 136 + /* 137 + * The delay between the wakeup and the first instruction 138 + * executed by the CPU is not likely to be worst-case every 139 + * time, so take 1/2 of the exit latency as a very rough 140 + * approximation of the average of it. 141 + */ 142 + if (measured_us >= lat) 143 + measured_us -= lat / 2; 144 + else 145 + measured_us /= 2; 146 + } 147 + 148 + /* 149 + * Decay the "early hits" metric for all of the states and find the 150 + * states matching the sleep length and the measured idle duration. 151 + */ 152 + for (i = 0; i < drv->state_count; i++) { 153 + unsigned int early_hits = cpu_data->states[i].early_hits; 154 + 155 + cpu_data->states[i].early_hits -= early_hits >> DECAY_SHIFT; 156 + 157 + if (drv->states[i].target_residency <= sleep_length_us) { 158 + idx_timer = i; 159 + if (drv->states[i].target_residency <= measured_us) 160 + idx_hit = i; 161 + } 162 + } 163 + 164 + /* 165 + * Update the "hits" and "misses" data for the state matching the sleep 166 + * length. If it matches the measured idle duration too, this is a hit, 167 + * so increase the "hits" metric for it then. Otherwise, this is a 168 + * miss, so increase the "misses" metric for it. 
In the latter case 169 + * also increase the "early hits" metric for the state that actually 170 + * matches the measured idle duration. 171 + */ 172 + if (idx_timer >= 0) { 173 + unsigned int hits = cpu_data->states[idx_timer].hits; 174 + unsigned int misses = cpu_data->states[idx_timer].misses; 175 + 176 + hits -= hits >> DECAY_SHIFT; 177 + misses -= misses >> DECAY_SHIFT; 178 + 179 + if (idx_timer > idx_hit) { 180 + misses += PULSE; 181 + if (idx_hit >= 0) 182 + cpu_data->states[idx_hit].early_hits += PULSE; 183 + } else { 184 + hits += PULSE; 185 + } 186 + 187 + cpu_data->states[idx_timer].misses = misses; 188 + cpu_data->states[idx_timer].hits = hits; 189 + } 190 + 191 + /* 192 + * If the total time span between idle state selection and the "reflect" 193 + * callback is greater than or equal to the sleep length determined at 194 + * the idle state selection time, the wakeup is likely to be due to a 195 + * timer event. 196 + */ 197 + if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) 198 + measured_us = UINT_MAX; 199 + 200 + /* 201 + * Save idle duration values corresponding to non-timer wakeups for 202 + * pattern detection. 203 + */ 204 + cpu_data->intervals[cpu_data->interval_idx++] = measured_us; 205 + if (cpu_data->interval_idx > INTERVALS) 206 + cpu_data->interval_idx = 0; 207 + } 208 + 209 + /** 210 + * teo_find_shallower_state - Find shallower idle state matching given duration. 211 + * @drv: cpuidle driver containing state data. 212 + * @dev: Target CPU. 213 + * @state_idx: Index of the capping idle state. 214 + * @duration_us: Idle duration value to match. 
215 + */ 216 + static int teo_find_shallower_state(struct cpuidle_driver *drv, 217 + struct cpuidle_device *dev, int state_idx, 218 + unsigned int duration_us) 219 + { 220 + int i; 221 + 222 + for (i = state_idx - 1; i >= 0; i--) { 223 + if (drv->states[i].disabled || dev->states_usage[i].disable) 224 + continue; 225 + 226 + state_idx = i; 227 + if (drv->states[i].target_residency <= duration_us) 228 + break; 229 + } 230 + return state_idx; 231 + } 232 + 233 + /** 234 + * teo_select - Selects the next idle state to enter. 235 + * @drv: cpuidle driver containing state data. 236 + * @dev: Target CPU. 237 + * @stop_tick: Indication on whether or not to stop the scheduler tick. 238 + */ 239 + static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, 240 + bool *stop_tick) 241 + { 242 + struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); 243 + int latency_req = cpuidle_governor_latency_req(dev->cpu); 244 + unsigned int duration_us, count; 245 + int max_early_idx, idx, i; 246 + ktime_t delta_tick; 247 + 248 + if (cpu_data->last_state >= 0) { 249 + teo_update(drv, dev); 250 + cpu_data->last_state = -1; 251 + } 252 + 253 + cpu_data->time_span_ns = local_clock(); 254 + 255 + cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick); 256 + duration_us = ktime_to_us(cpu_data->sleep_length_ns); 257 + 258 + count = 0; 259 + max_early_idx = -1; 260 + idx = -1; 261 + 262 + for (i = 0; i < drv->state_count; i++) { 263 + struct cpuidle_state *s = &drv->states[i]; 264 + struct cpuidle_state_usage *su = &dev->states_usage[i]; 265 + 266 + if (s->disabled || su->disable) { 267 + /* 268 + * If the "early hits" metric of a disabled state is 269 + * greater than the current maximum, it should be taken 270 + * into account, because it would be a mistake to select 271 + * a deeper state with lower "early hits" metric. 
The 272 + * index cannot be changed to point to it, however, so 273 + * just increase the max count alone and let the index 274 + * still point to a shallower idle state. 275 + */ 276 + if (max_early_idx >= 0 && 277 + count < cpu_data->states[i].early_hits) 278 + count = cpu_data->states[i].early_hits; 279 + 280 + continue; 281 + } 282 + 283 + if (idx < 0) 284 + idx = i; /* first enabled state */ 285 + 286 + if (s->target_residency > duration_us) 287 + break; 288 + 289 + if (s->exit_latency > latency_req) { 290 + /* 291 + * If we break out of the loop for latency reasons, use 292 + * the target residency of the selected state as the 293 + * expected idle duration to avoid stopping the tick 294 + * as long as that target residency is low enough. 295 + */ 296 + duration_us = drv->states[idx].target_residency; 297 + goto refine; 298 + } 299 + 300 + idx = i; 301 + 302 + if (count < cpu_data->states[i].early_hits && 303 + !(tick_nohz_tick_stopped() && 304 + drv->states[i].target_residency < TICK_USEC)) { 305 + count = cpu_data->states[i].early_hits; 306 + max_early_idx = i; 307 + } 308 + } 309 + 310 + /* 311 + * If the "hits" metric of the idle state matching the sleep length is 312 + * greater than its "misses" metric, that is the one to use. Otherwise, 313 + * it is more likely that one of the shallower states will match the 314 + * idle duration observed after wakeup, so take the one with the maximum 315 + * "early hits" metric, but if that cannot be determined, just use the 316 + * state selected so far. 317 + */ 318 + if (cpu_data->states[idx].hits <= cpu_data->states[idx].misses && 319 + max_early_idx >= 0) { 320 + idx = max_early_idx; 321 + duration_us = drv->states[idx].target_residency; 322 + } 323 + 324 + refine: 325 + if (idx < 0) { 326 + idx = 0; /* No states enabled. Must use 0. 
*/ 327 + } else if (idx > 0) { 328 + u64 sum = 0; 329 + 330 + count = 0; 331 + 332 + /* 333 + * Count and sum the most recent idle duration values less than 334 + * the target residency of the state selected so far, find the 335 + * max. 336 + */ 337 + for (i = 0; i < INTERVALS; i++) { 338 + unsigned int val = cpu_data->intervals[i]; 339 + 340 + if (val >= drv->states[idx].target_residency) 341 + continue; 342 + 343 + count++; 344 + sum += val; 345 + } 346 + 347 + /* 348 + * Give up unless the majority of the most recent idle duration 349 + * values are in the interesting range. 350 + */ 351 + if (count > INTERVALS / 2) { 352 + unsigned int avg_us = div64_u64(sum, count); 353 + 354 + /* 355 + * Avoid spending too much time in an idle state that 356 + * would be too shallow. 357 + */ 358 + if (!(tick_nohz_tick_stopped() && avg_us < TICK_USEC)) { 359 + idx = teo_find_shallower_state(drv, dev, idx, avg_us); 360 + duration_us = avg_us; 361 + } 362 + } 363 + } 364 + 365 + /* 366 + * Don't stop the tick if the selected state is a polling one or if the 367 + * expected idle duration is shorter than the tick period length. 368 + */ 369 + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || 370 + duration_us < TICK_USEC) && !tick_nohz_tick_stopped()) { 371 + unsigned int delta_tick_us = ktime_to_us(delta_tick); 372 + 373 + *stop_tick = false; 374 + 375 + /* 376 + * The tick is not going to be stopped, so if the target 377 + * residency of the state to be returned is not within the time 378 + * till the closest timer including the tick, try to correct 379 + * that. 380 + */ 381 + if (idx > 0 && drv->states[idx].target_residency > delta_tick_us) 382 + idx = teo_find_shallower_state(drv, dev, idx, delta_tick_us); 383 + } 384 + 385 + return idx; 386 + } 387 + 388 + /** 389 + * teo_reflect - Note that governor data for the CPU need to be updated. 390 + * @dev: Target CPU. 391 + * @state: Entered state. 
392 + */ 393 + static void teo_reflect(struct cpuidle_device *dev, int state) 394 + { 395 + struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); 396 + 397 + cpu_data->last_state = state; 398 + /* 399 + * If the wakeup was not "natural", but triggered by one of the safety 400 + * nets, assume that the CPU might have been idle for the entire sleep 401 + * length time. 402 + */ 403 + if (dev->poll_time_limit || 404 + (tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) { 405 + dev->poll_time_limit = false; 406 + cpu_data->time_span_ns = cpu_data->sleep_length_ns; 407 + } else { 408 + cpu_data->time_span_ns = local_clock() - cpu_data->time_span_ns; 409 + } 410 + } 411 + 412 + /** 413 + * teo_enable_device - Initialize the governor's data for the target CPU. 414 + * @drv: cpuidle driver (not used). 415 + * @dev: Target CPU. 416 + */ 417 + static int teo_enable_device(struct cpuidle_driver *drv, 418 + struct cpuidle_device *dev) 419 + { 420 + struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); 421 + int i; 422 + 423 + memset(cpu_data, 0, sizeof(*cpu_data)); 424 + 425 + for (i = 0; i < INTERVALS; i++) 426 + cpu_data->intervals[i] = UINT_MAX; 427 + 428 + return 0; 429 + } 430 + 431 + static struct cpuidle_governor teo_governor = { 432 + .name = "teo", 433 + .rating = 19, 434 + .enable = teo_enable_device, 435 + .select = teo_select, 436 + .reflect = teo_reflect, 437 + }; 438 + 439 + static int __init teo_governor_init(void) 440 + { 441 + return cpuidle_register_governor(&teo_governor); 442 + } 443 + 444 + postcore_initcall(teo_governor_init);
+6 -10
drivers/gpu/drm/i915/i915_pmu.c
··· 5 5 */ 6 6 7 7 #include <linux/irq.h> 8 + #include <linux/pm_runtime.h> 8 9 #include "i915_pmu.h" 9 10 #include "intel_ringbuffer.h" 10 11 #include "i915_drv.h" ··· 479 478 * counter value. 480 479 */ 481 480 spin_lock_irqsave(&i915->pmu.lock, flags); 482 - spin_lock(&kdev->power.lock); 483 481 484 482 /* 485 483 * After the above branch intel_runtime_pm_get_if_in_use failed ··· 491 491 * suspended and if not we cannot do better than report the last 492 492 * known RC6 value. 493 493 */ 494 - if (kdev->power.runtime_status == RPM_SUSPENDED) { 494 + if (pm_runtime_status_suspended(kdev)) { 495 + val = pm_runtime_suspended_time(kdev); 496 + 495 497 if (!i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur) 496 - i915->pmu.suspended_jiffies_last = 497 - kdev->power.suspended_jiffies; 498 + i915->pmu.suspended_time_last = val; 498 499 499 - val = kdev->power.suspended_jiffies - 500 - i915->pmu.suspended_jiffies_last; 501 - val += jiffies - kdev->power.accounting_timestamp; 502 - 503 - val = jiffies_to_nsecs(val); 500 + val -= i915->pmu.suspended_time_last; 504 501 val += i915->pmu.sample[__I915_SAMPLE_RC6].cur; 505 502 506 503 i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur = val; ··· 507 510 val = i915->pmu.sample[__I915_SAMPLE_RC6].cur; 508 511 } 509 512 510 - spin_unlock(&kdev->power.lock); 511 513 spin_unlock_irqrestore(&i915->pmu.lock, flags); 512 514 } 513 515
+2 -2
drivers/gpu/drm/i915/i915_pmu.h
··· 97 97 */ 98 98 struct i915_pmu_sample sample[__I915_NUM_PMU_SAMPLERS]; 99 99 /** 100 - * @suspended_jiffies_last: Cached suspend time from PM core. 100 + * @suspended_time_last: Cached suspend time from PM core. 101 101 */ 102 - unsigned long suspended_jiffies_last; 102 + u64 suspended_time_last; 103 103 /** 104 104 * @i915_attr: Memory block holding device attributes. 105 105 */
+1
drivers/idle/intel_idle.c
··· 1103 1103 INTEL_CPU_FAM6(ATOM_GOLDMONT, idle_cpu_bxt), 1104 1104 INTEL_CPU_FAM6(ATOM_GOLDMONT_PLUS, idle_cpu_bxt), 1105 1105 INTEL_CPU_FAM6(ATOM_GOLDMONT_X, idle_cpu_dnv), 1106 + INTEL_CPU_FAM6(ATOM_TREMONT_X, idle_cpu_dnv), 1106 1107 {} 1107 1108 }; 1108 1109
+7 -15
drivers/opp/core.c
··· 551 551 return ret; 552 552 } 553 553 554 - static inline int 555 - _generic_set_opp_clk_only(struct device *dev, struct clk *clk, 556 - unsigned long old_freq, unsigned long freq) 554 + static inline int _generic_set_opp_clk_only(struct device *dev, struct clk *clk, 555 + unsigned long freq) 557 556 { 558 557 int ret; 559 558 ··· 589 590 } 590 591 591 592 /* Change frequency */ 592 - ret = _generic_set_opp_clk_only(dev, opp_table->clk, old_freq, freq); 593 + ret = _generic_set_opp_clk_only(dev, opp_table->clk, freq); 593 594 if (ret) 594 595 goto restore_voltage; 595 596 ··· 603 604 return 0; 604 605 605 606 restore_freq: 606 - if (_generic_set_opp_clk_only(dev, opp_table->clk, freq, old_freq)) 607 + if (_generic_set_opp_clk_only(dev, opp_table->clk, old_freq)) 607 608 dev_err(dev, "%s: failed to restore old-freq (%lu Hz)\n", 608 609 __func__, old_freq); 609 610 restore_voltage: ··· 776 777 opp->supplies); 777 778 } else { 778 779 /* Only frequency scaling */ 779 - ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq); 780 + ret = _generic_set_opp_clk_only(dev, clk, freq); 780 781 } 781 782 782 783 /* Scaling down? 
Configure required OPPs after frequency */ ··· 810 811 struct opp_table *opp_table) 811 812 { 812 813 struct opp_device *opp_dev; 813 - int ret; 814 814 815 815 opp_dev = kzalloc(sizeof(*opp_dev), GFP_KERNEL); 816 816 if (!opp_dev) ··· 821 823 list_add(&opp_dev->node, &opp_table->dev_list); 822 824 823 825 /* Create debugfs entries for the opp_table */ 824 - ret = opp_debug_register(opp_dev, opp_table); 825 - if (ret) 826 - dev_err(dev, "%s: Failed to register opp debugfs (%d)\n", 827 - __func__, ret); 826 + opp_debug_register(opp_dev, opp_table); 828 827 829 828 return opp_dev; 830 829 } ··· 1242 1247 new_opp->opp_table = opp_table; 1243 1248 kref_init(&new_opp->kref); 1244 1249 1245 - ret = opp_debug_create_one(new_opp, opp_table); 1246 - if (ret) 1247 - dev_err(dev, "%s: Failed to register opp to debugfs (%d)\n", 1248 - __func__, ret); 1250 + opp_debug_create_one(new_opp, opp_table); 1249 1251 1250 1252 if (!_opp_supported_by_regulators(new_opp, opp_table)) { 1251 1253 new_opp->available = false;
+29 -81
drivers/opp/debugfs.c
··· 35 35 debugfs_remove_recursive(opp->dentry); 36 36 } 37 37 38 - static bool opp_debug_create_supplies(struct dev_pm_opp *opp, 38 + static void opp_debug_create_supplies(struct dev_pm_opp *opp, 39 39 struct opp_table *opp_table, 40 40 struct dentry *pdentry) 41 41 { ··· 50 50 /* Create per-opp directory */ 51 51 d = debugfs_create_dir(name, pdentry); 52 52 53 - if (!d) 54 - return false; 53 + debugfs_create_ulong("u_volt_target", S_IRUGO, d, 54 + &opp->supplies[i].u_volt); 55 55 56 - if (!debugfs_create_ulong("u_volt_target", S_IRUGO, d, 57 - &opp->supplies[i].u_volt)) 58 - return false; 56 + debugfs_create_ulong("u_volt_min", S_IRUGO, d, 57 + &opp->supplies[i].u_volt_min); 59 58 60 - if (!debugfs_create_ulong("u_volt_min", S_IRUGO, d, 61 - &opp->supplies[i].u_volt_min)) 62 - return false; 59 + debugfs_create_ulong("u_volt_max", S_IRUGO, d, 60 + &opp->supplies[i].u_volt_max); 63 61 64 - if (!debugfs_create_ulong("u_volt_max", S_IRUGO, d, 65 - &opp->supplies[i].u_volt_max)) 66 - return false; 67 - 68 - if (!debugfs_create_ulong("u_amp", S_IRUGO, d, 69 - &opp->supplies[i].u_amp)) 70 - return false; 62 + debugfs_create_ulong("u_amp", S_IRUGO, d, 63 + &opp->supplies[i].u_amp); 71 64 } 72 - 73 - return true; 74 65 } 75 66 76 - int opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table) 67 + void opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table) 77 68 { 78 69 struct dentry *pdentry = opp_table->dentry; 79 70 struct dentry *d; ··· 86 95 87 96 /* Create per-opp directory */ 88 97 d = debugfs_create_dir(name, pdentry); 89 - if (!d) 90 - return -ENOMEM; 91 98 92 - if (!debugfs_create_bool("available", S_IRUGO, d, &opp->available)) 93 - return -ENOMEM; 99 + debugfs_create_bool("available", S_IRUGO, d, &opp->available); 100 + debugfs_create_bool("dynamic", S_IRUGO, d, &opp->dynamic); 101 + debugfs_create_bool("turbo", S_IRUGO, d, &opp->turbo); 102 + debugfs_create_bool("suspend", S_IRUGO, d, &opp->suspend); 103 + 
 	debugfs_create_u32("performance_state", S_IRUGO, d, &opp->pstate);
+	debugfs_create_ulong("rate_hz", S_IRUGO, d, &opp->rate);
+	debugfs_create_ulong("clock_latency_ns", S_IRUGO, d,
+			     &opp->clock_latency_ns);
 
-	if (!debugfs_create_bool("dynamic", S_IRUGO, d, &opp->dynamic))
-		return -ENOMEM;
-
-	if (!debugfs_create_bool("turbo", S_IRUGO, d, &opp->turbo))
-		return -ENOMEM;
-
-	if (!debugfs_create_bool("suspend", S_IRUGO, d, &opp->suspend))
-		return -ENOMEM;
-
-	if (!debugfs_create_u32("performance_state", S_IRUGO, d, &opp->pstate))
-		return -ENOMEM;
-
-	if (!debugfs_create_ulong("rate_hz", S_IRUGO, d, &opp->rate))
-		return -ENOMEM;
-
-	if (!opp_debug_create_supplies(opp, opp_table, d))
-		return -ENOMEM;
-
-	if (!debugfs_create_ulong("clock_latency_ns", S_IRUGO, d,
-				  &opp->clock_latency_ns))
-		return -ENOMEM;
+	opp_debug_create_supplies(opp, opp_table, d);
 
 	opp->dentry = d;
-	return 0;
 }
 
-static int opp_list_debug_create_dir(struct opp_device *opp_dev,
-				     struct opp_table *opp_table)
+static void opp_list_debug_create_dir(struct opp_device *opp_dev,
+				      struct opp_table *opp_table)
 {
 	const struct device *dev = opp_dev->dev;
 	struct dentry *d;
···
 	/* Create device specific directory */
 	d = debugfs_create_dir(opp_table->dentry_name, rootdir);
-	if (!d) {
-		dev_err(dev, "%s: Failed to create debugfs dir\n", __func__);
-		return -ENOMEM;
-	}
 
 	opp_dev->dentry = d;
 	opp_table->dentry = d;
-
-	return 0;
 }
 
-static int opp_list_debug_create_link(struct opp_device *opp_dev,
-				      struct opp_table *opp_table)
+static void opp_list_debug_create_link(struct opp_device *opp_dev,
+				       struct opp_table *opp_table)
 {
-	const struct device *dev = opp_dev->dev;
 	char name[NAME_MAX];
-	struct dentry *d;
 
 	opp_set_dev_name(opp_dev->dev, name);
 
 	/* Create device specific directory link */
-	d = debugfs_create_symlink(name, rootdir, opp_table->dentry_name);
-	if (!d) {
-		dev_err(dev, "%s: Failed to create link\n", __func__);
-		return -ENOMEM;
-	}
-
-	opp_dev->dentry = d;
-
-	return 0;
+	opp_dev->dentry = debugfs_create_symlink(name, rootdir,
+						 opp_table->dentry_name);
 }
 
 /**
···
 * Dynamically adds device specific directory in debugfs 'opp' directory. If the
 * device-opp is shared with other devices, then links will be created for all
 * devices except the first.
- *
- * Return: 0 on success, otherwise negative error.
 */
-int opp_debug_register(struct opp_device *opp_dev, struct opp_table *opp_table)
+void opp_debug_register(struct opp_device *opp_dev, struct opp_table *opp_table)
 {
-	if (!rootdir) {
-		pr_debug("%s: Uninitialized rootdir\n", __func__);
-		return -EINVAL;
-	}
-
 	if (opp_table->dentry)
-		return opp_list_debug_create_link(opp_dev, opp_table);
-
-	return opp_list_debug_create_dir(opp_dev, opp_table);
+		opp_list_debug_create_link(opp_dev, opp_table);
+	else
+		opp_list_debug_create_dir(opp_dev, opp_table);
 }
 
 static void opp_migrate_dentry(struct opp_device *opp_dev,
···
 {
 	/* Create /sys/kernel/debug/opp directory */
 	rootdir = debugfs_create_dir("opp", NULL);
-	if (!rootdir) {
-		pr_err("%s: Failed to create root directory\n", __func__);
-		return -ENOMEM;
-	}
 
 	return 0;
 }
+99
drivers/opp/of.c
···
 #include <linux/pm_domain.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/energy_model.h>
 
 #include "opp.h"
 
···
 	return of_node_get(opp->np);
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
+
+/*
+ * Callback function provided to the Energy Model framework upon registration.
+ * This computes the power estimated by @CPU at @kHz if it is the frequency
+ * of an existing OPP, or at the frequency of the first OPP above @kHz otherwise
+ * (see dev_pm_opp_find_freq_ceil()). This function updates @kHz to the ceiled
+ * frequency and @mW to the associated power. The power is estimated as
+ * P = C * V^2 * f with C being the CPU's capacitance and V and f respectively
+ * the voltage and frequency of the OPP.
+ *
+ * Returns -ENODEV if the CPU device cannot be found, -EINVAL if the power
+ * calculation failed because of missing parameters, 0 otherwise.
+ */
+static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
+					 int cpu)
+{
+	struct device *cpu_dev;
+	struct dev_pm_opp *opp;
+	struct device_node *np;
+	unsigned long mV, Hz;
+	u32 cap;
+	u64 tmp;
+	int ret;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev)
+		return -ENODEV;
+
+	np = of_node_get(cpu_dev->of_node);
+	if (!np)
+		return -EINVAL;
+
+	ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap);
+	of_node_put(np);
+	if (ret)
+		return -EINVAL;
+
+	Hz = *kHz * 1000;
+	opp = dev_pm_opp_find_freq_ceil(cpu_dev, &Hz);
+	if (IS_ERR(opp))
+		return -EINVAL;
+
+	mV = dev_pm_opp_get_voltage(opp) / 1000;
+	dev_pm_opp_put(opp);
+	if (!mV)
+		return -EINVAL;
+
+	tmp = (u64)cap * mV * mV * (Hz / 1000000);
+	do_div(tmp, 1000000000);
+
+	*mW = (unsigned long)tmp;
+	*kHz = Hz / 1000;
+
+	return 0;
+}
+
+/**
+ * dev_pm_opp_of_register_em() - Attempt to register an Energy Model
+ * @cpus : CPUs for which an Energy Model has to be registered
+ *
+ * This checks whether the "dynamic-power-coefficient" devicetree property has
+ * been specified, and tries to register an Energy Model with it if it has.
+ */
+void dev_pm_opp_of_register_em(struct cpumask *cpus)
+{
+	struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power);
+	int ret, nr_opp, cpu = cpumask_first(cpus);
+	struct device *cpu_dev;
+	struct device_node *np;
+	u32 cap;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev)
+		return;
+
+	nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
+	if (nr_opp <= 0)
+		return;
+
+	np = of_node_get(cpu_dev->of_node);
+	if (!np)
+		return;
+
+	/*
+	 * Register an EM only if the 'dynamic-power-coefficient' property is
+	 * set in devicetree. It is assumed the voltage values are known if that
+	 * property is set since it is useless otherwise. If voltages are not
+	 * known, just let the EM registration fail with an error to alert the
+	 * user about the inconsistent configuration.
+	 */
+	ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap);
+	of_node_put(np);
+	if (ret || !cap)
+		return;
+
+	em_register_perf_domain(cpus, nr_opp, &em_cb);
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em);
+7 -8
drivers/opp/opp.h
···
 #ifdef CONFIG_DEBUG_FS
 void opp_debug_remove_one(struct dev_pm_opp *opp);
-int opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table);
-int opp_debug_register(struct opp_device *opp_dev, struct opp_table *opp_table);
+void opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table);
+void opp_debug_register(struct opp_device *opp_dev, struct opp_table *opp_table);
 void opp_debug_unregister(struct opp_device *opp_dev, struct opp_table *opp_table);
 #else
 static inline void opp_debug_remove_one(struct dev_pm_opp *opp) {}
 
-static inline int opp_debug_create_one(struct dev_pm_opp *opp,
-				       struct opp_table *opp_table)
-{ return 0; }
-static inline int opp_debug_register(struct opp_device *opp_dev,
-				     struct opp_table *opp_table)
-{ return 0; }
+static inline void opp_debug_create_one(struct dev_pm_opp *opp,
+					struct opp_table *opp_table) { }
+
+static inline void opp_debug_register(struct opp_device *opp_dev,
+				      struct opp_table *opp_table) { }
 
 static inline void opp_debug_unregister(struct opp_device *opp_dev,
 					struct opp_table *opp_table)
+2
drivers/powercap/intel_rapl.c
···
 	INTEL_CPU_FAM6(KABYLAKE_MOBILE, rapl_defaults_core),
 	INTEL_CPU_FAM6(KABYLAKE_DESKTOP, rapl_defaults_core),
 	INTEL_CPU_FAM6(CANNONLAKE_MOBILE, rapl_defaults_core),
+	INTEL_CPU_FAM6(ICELAKE_MOBILE, rapl_defaults_core),
 
 	INTEL_CPU_FAM6(ATOM_SILVERMONT, rapl_defaults_byt),
 	INTEL_CPU_FAM6(ATOM_AIRMONT, rapl_defaults_cht),
···
 	INTEL_CPU_FAM6(ATOM_GOLDMONT, rapl_defaults_core),
 	INTEL_CPU_FAM6(ATOM_GOLDMONT_PLUS, rapl_defaults_core),
 	INTEL_CPU_FAM6(ATOM_GOLDMONT_X, rapl_defaults_core),
+	INTEL_CPU_FAM6(ATOM_TREMONT_X, rapl_defaults_core),
 
 	INTEL_CPU_FAM6(XEON_PHI_KNL, rapl_defaults_hsw_server),
 	INTEL_CPU_FAM6(XEON_PHI_KNM, rapl_defaults_hsw_server),
+1
drivers/thermal/Kconfig
···
 	bool "generic cpu cooling support"
 	depends on CPU_FREQ
 	depends on THERMAL_OF
+	depends on THERMAL=y
 	help
 	  This implements the generic cpu cooling mechanism through frequency
 	  reduction. An ACPI version of this already exists
+1
include/acpi/cppc_acpi.h
···
 	cpumask_var_t shared_cpu_map;
 };
 
+extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
 extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
 extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
 extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
+26 -22
include/linux/cpufreq.h
···
 	/* For cpufreq driver's internal use */
 	void *driver_data;
+
+	/* Pointer to the cooling device if used for thermal mitigation */
+	struct thermal_cooling_device *cdev;
 };
 
 /* Only for ACPI */
···
 static struct freq_attr _name =			\
 __ATTR(_name, 0200, NULL, store_##_name)
 
-struct global_attr {
-	struct attribute attr;
-	ssize_t (*show)(struct kobject *kobj,
-			struct attribute *attr, char *buf);
-	ssize_t (*store)(struct kobject *a, struct attribute *b,
-			 const char *c, size_t count);
-};
-
 #define define_one_global_ro(_name)		\
-static struct global_attr _name =		\
+static struct kobj_attribute _name =		\
 __ATTR(_name, 0444, show_##_name, NULL)
 
 #define define_one_global_rw(_name)		\
-static struct global_attr _name =		\
+static struct kobj_attribute _name =		\
 __ATTR(_name, 0644, show_##_name, store_##_name)
 
···
 	/* optional */
 	int (*bios_limit)(int cpu, unsigned int *limit);
 
+	int (*online)(struct cpufreq_policy *policy);
+	int (*offline)(struct cpufreq_policy *policy);
 	int (*exit)(struct cpufreq_policy *policy);
 	void (*stop_cpu)(struct cpufreq_policy *policy);
 	int (*suspend)(struct cpufreq_policy *policy);
···
 };
 
 /* flags */
-#define CPUFREQ_STICKY		(1 << 0)	/* driver isn't removed even if
-						   all ->init() calls failed */
-#define CPUFREQ_CONST_LOOPS	(1 << 1)	/* loops_per_jiffy or other
-						   kernel "constants" aren't
-						   affected by frequency
-						   transitions */
-#define CPUFREQ_PM_NO_WARN	(1 << 2)	/* don't warn on suspend/resume
-						   speed mismatches */
+
+/* driver isn't removed even if all ->init() calls failed */
+#define CPUFREQ_STICKY				BIT(0)
+
+/* loops_per_jiffy or other kernel "constants" aren't affected by frequency transitions */
+#define CPUFREQ_CONST_LOOPS			BIT(1)
+
+/* don't warn on suspend/resume speed mismatches */
+#define CPUFREQ_PM_NO_WARN			BIT(2)
 
 /*
  * This should be set by platforms having multiple clock-domains, i.e.
···
  * be created in cpu/cpu<num>/cpufreq/ directory and so they can use the same
  * governor with different tunables for different clusters.
  */
-#define CPUFREQ_HAVE_GOVERNOR_PER_POLICY (1 << 3)
+#define CPUFREQ_HAVE_GOVERNOR_PER_POLICY	BIT(3)
 
 /*
  * Driver will do POSTCHANGE notifications from outside of their ->target()
  * routine and so must set cpufreq_driver->flags with this flag, so that core
  * can handle them specially.
  */
-#define CPUFREQ_ASYNC_NOTIFICATION	(1 << 4)
+#define CPUFREQ_ASYNC_NOTIFICATION		BIT(4)
 
 /*
  * Set by drivers which want cpufreq core to check if CPU is running at a
···
  * from the table. And if that fails, we will stop further boot process by
  * issuing a BUG_ON().
  */
-#define CPUFREQ_NEED_INITIAL_FREQ_CHECK	(1 << 5)
+#define CPUFREQ_NEED_INITIAL_FREQ_CHECK		BIT(5)
 
 /*
  * Set by drivers to disallow use of governors with "dynamic_switching" flag
  * set.
  */
-#define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING (1 << 6)
+#define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING	BIT(6)
+
+/*
+ * Set by drivers that want the core to automatically register the cpufreq
+ * driver as a thermal cooling device.
+ */
+#define CPUFREQ_IS_COOLING_DEV			BIT(7)
 
 int cpufreq_register_driver(struct cpufreq_driver *driver_data);
 int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
+3 -5
include/linux/cpuidle.h
···
 /* Idle State Flags */
 #define CPUIDLE_FLAG_NONE	(0x00)
-#define CPUIDLE_FLAG_POLLING	(0x01) /* polling state */
-#define CPUIDLE_FLAG_COUPLED	(0x02) /* state applies to multiple cpus */
-#define CPUIDLE_FLAG_TIMER_STOP (0x04) /* timer is stopped on this state */
-
-#define CPUIDLE_DRIVER_FLAGS_MASK (0xFFFF0000)
+#define CPUIDLE_FLAG_POLLING	BIT(0) /* polling state */
+#define CPUIDLE_FLAG_COUPLED	BIT(1) /* state applies to multiple cpus */
+#define CPUIDLE_FLAG_TIMER_STOP BIT(2) /* timer is stopped on this state */
 
 struct cpuidle_device_kobj;
 struct cpuidle_state_kobj;
+10
include/linux/device.h
···
 	return !!dev->power.async_suspend;
 }
 
+static inline bool device_pm_not_required(struct device *dev)
+{
+	return dev->power.no_pm;
+}
+
+static inline void device_set_pm_not_required(struct device *dev)
+{
+	dev->power.no_pm = true;
+}
+
 static inline void dev_pm_syscore_device(struct device *dev, bool val)
 {
 #ifdef CONFIG_PM_SLEEP
+19
include/linux/platform_data/davinci-cpufreq.h
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * TI DaVinci CPUFreq platform support.
+ *
+ * Copyright (C) 2009 Texas Instruments, Inc. http://www.ti.com/
+ */
+
+#ifndef _MACH_DAVINCI_CPUFREQ_H
+#define _MACH_DAVINCI_CPUFREQ_H
+
+#include <linux/cpufreq.h>
+
+struct davinci_cpufreq_config {
+	struct cpufreq_frequency_table *freq_table;
+	int (*set_voltage)(unsigned int index);
+	int (*init)(void);
+};
+
+#endif /* _MACH_DAVINCI_CPUFREQ_H */
+4 -3
include/linux/pm.h
···
 	bool is_suspended:1;	/* Ditto */
 	bool is_noirq_suspended:1;
 	bool is_late_suspended:1;
+	bool no_pm:1;
 	bool early_init:1;	/* Owned by the PM core */
 	bool direct_complete:1;	/* Owned by the PM core */
 	u32 driver_flags;
···
 	int runtime_error;
 	int autosuspend_delay;
 	u64 last_busy;
-	unsigned long active_jiffies;
-	unsigned long suspended_jiffies;
-	unsigned long accounting_timestamp;
+	u64 active_time;
+	u64 suspended_time;
+	u64 accounting_timestamp;
 #endif
 	struct pm_subsys_data *subsys_data;	/* Owned by the subsystem. */
 	void (*set_latency_tolerance)(struct device *, s32);
+4 -4
include/linux/pm_domain.h
···
 struct device *genpd_dev_pm_attach_by_id(struct device *dev,
 					 unsigned int index);
 struct device *genpd_dev_pm_attach_by_name(struct device *dev,
-					   char *name);
+					   const char *name);
 #else /* !CONFIG_PM_GENERIC_DOMAINS_OF */
 static inline int of_genpd_add_provider_simple(struct device_node *np,
 					       struct generic_pm_domain *genpd)
···
 }
 
 static inline struct device *genpd_dev_pm_attach_by_name(struct device *dev,
-							 char *name)
+							 const char *name)
 {
 	return NULL;
 }
···
 struct device *dev_pm_domain_attach_by_id(struct device *dev,
 					  unsigned int index);
 struct device *dev_pm_domain_attach_by_name(struct device *dev,
-					    char *name);
+					    const char *name);
 void dev_pm_domain_detach(struct device *dev, bool power_off);
 void dev_pm_domain_set(struct device *dev, struct dev_pm_domain *pd);
 #else
···
 	return NULL;
 }
 static inline struct device *dev_pm_domain_attach_by_name(struct device *dev,
-							  char *name)
+							  const char *name)
 {
 	return NULL;
 }
+6
include/linux/pm_opp.h
···
 struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev);
 struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp);
 int of_get_required_opp_performance_state(struct device_node *np, int index);
+void dev_pm_opp_of_register_em(struct cpumask *cpus);
 #else
 static inline int dev_pm_opp_of_add_table(struct device *dev)
 {
···
 {
 	return NULL;
 }
+
+static inline void dev_pm_opp_of_register_em(struct cpumask *cpus)
+{
+}
+
 static inline int of_get_required_opp_performance_state(struct device_node *np, int index)
 {
 	return -ENOTSUPP;
+2
include/linux/pm_runtime.h
···
 	return dev->power.irq_safe;
 }
 
+extern u64 pm_runtime_suspended_time(struct device *dev);
+
 #else /* !CONFIG_PM */
 
 static inline bool queue_pm_work(struct work_struct *work) { return false; }
+57
kernel/power/energy_model.c
···
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
+#include <linux/debugfs.h>
 #include <linux/energy_model.h>
 #include <linux/sched/topology.h>
 #include <linux/slab.h>
···
 */
 static DEFINE_MUTEX(em_pd_mutex);
 
+#ifdef CONFIG_DEBUG_FS
+static struct dentry *rootdir;
+
+static void em_debug_create_cs(struct em_cap_state *cs, struct dentry *pd)
+{
+	struct dentry *d;
+	char name[24];
+
+	snprintf(name, sizeof(name), "cs:%lu", cs->frequency);
+
+	/* Create per-cs directory */
+	d = debugfs_create_dir(name, pd);
+	debugfs_create_ulong("frequency", 0444, d, &cs->frequency);
+	debugfs_create_ulong("power", 0444, d, &cs->power);
+	debugfs_create_ulong("cost", 0444, d, &cs->cost);
+}
+
+static int em_debug_cpus_show(struct seq_file *s, void *unused)
+{
+	seq_printf(s, "%*pbl\n", cpumask_pr_args(to_cpumask(s->private)));
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
+
+static void em_debug_create_pd(struct em_perf_domain *pd, int cpu)
+{
+	struct dentry *d;
+	char name[8];
+	int i;
+
+	snprintf(name, sizeof(name), "pd%d", cpu);
+
+	/* Create the directory of the performance domain */
+	d = debugfs_create_dir(name, rootdir);
+
+	debugfs_create_file("cpus", 0444, d, pd->cpus, &em_debug_cpus_fops);
+
+	/* Create a sub-directory for each capacity state */
+	for (i = 0; i < pd->nr_cap_states; i++)
+		em_debug_create_cs(&pd->table[i], d);
+}
+
+static int __init em_debug_init(void)
+{
+	/* Create /sys/kernel/debug/energy_model directory */
+	rootdir = debugfs_create_dir("energy_model", NULL);
+
+	return 0;
+}
+core_initcall(em_debug_init);
+#else /* CONFIG_DEBUG_FS */
+static void em_debug_create_pd(struct em_perf_domain *pd, int cpu) {}
+#endif
 static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
 					   struct em_data_callback *cb)
 {
···
 	pd->table = table;
 	pd->nr_cap_states = nr_states;
 	cpumask_copy(to_cpumask(pd->cpus), span);
+
+	em_debug_create_pd(pd, cpu);
 
 	return pd;
+2 -6
kernel/power/qos.c
···
 	qos->pm_qos_power_miscdev.name = qos->name;
 	qos->pm_qos_power_miscdev.fops = &pm_qos_power_fops;
 
-	if (d) {
-		(void)debugfs_create_file(qos->name, S_IRUGO, d,
-					  (void *)qos, &pm_qos_debug_fops);
-	}
+	debugfs_create_file(qos->name, S_IRUGO, d, (void *)qos,
+			    &pm_qos_debug_fops);
 
 	return misc_register(&qos->pm_qos_power_miscdev);
 }
···
 	BUILD_BUG_ON(ARRAY_SIZE(pm_qos_array) != PM_QOS_NUM_CLASSES);
 
 	d = debugfs_create_dir("pm_qos", NULL);
-	if (IS_ERR_OR_NULL(d))
-		d = NULL;
 
 	for (i = PM_QOS_CPU_DMA_LATENCY; i < PM_QOS_NUM_CLASSES; i++) {
 		ret = register_pm_qos_misc(pm_qos_array[i], d);