Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"These rework the system-wide PM driver flags, make runtime switching
of cpuidle governors easier, improve the user space hibernation
interface code, add intel-speed-select interface documentation, add
more debug messages to the ACPI code handling suspend to idle, update
the cpufreq core and drivers, fix a minor issue in the cpuidle core
and update two cpuidle drivers, improve the PM-runtime framework,
update the Intel RAPL power capping driver, update devfreq core and
drivers, and clean up the cpupower utility.

Specifics:

- Rework the system-wide PM driver flags to make them easier to
understand and use and update their documentation (Rafael Wysocki,
Alan Stern).

- Allow cpuidle governors to be switched at run time regardless of
the kernel configuration and update the related documentation
accordingly (Hanjun Guo).

- Improve the resume device handling in the user space hibernation
interface code (Domenico Andreoli).

- Document the intel-speed-select sysfs interface (Srinivas
Pandruvada).

- Make the ACPI code handling suspend to idle print more debug
messages to help diagnose issues with it (Rafael Wysocki).

- Fix a helper routine in the cpufreq core and correct a typo in the
struct cpufreq_driver kerneldoc comment (Rafael Wysocki, Wang
Wenhu).

- Update cpufreq drivers:

- Make the intel_pstate driver start in the passive mode by
default on systems without HWP (Rafael Wysocki).

- Add i.MX7ULP support to the imx-cpufreq-dt driver and add
i.MX7ULP to the cpufreq-dt-platdev blacklist (Peng Fan).

- Convert the qoriq cpufreq driver to a platform one, make the
platform code create a suitable device object for it and add
platform dependencies to it (Mian Yousaf Kaukab, Geert
Uytterhoeven).

- Fix wrong compatible binding in the qcom driver (Ansuel Smith).

- Build the omap driver by default for ARCH_OMAP2PLUS (Anders
Roxell).

- Add r8a7742 SoC support to the dt cpufreq driver (Lad
Prabhakar).

- Update cpuidle core and drivers:

- Fix three reference count leaks in error code paths in the
cpuidle core (Qiushi Wu).

- Convert Qualcomm SPM to a generic cpuidle driver (Stephan
Gerhold).

- Fix up the execution order when entering a domain idle state in
the PSCI driver (Ulf Hansson).

- Fix a reference counting issue related to clock management and
clean up two oddities in the PM-runtime framework (Rafael Wysocki,
Andy Shevchenko).

- Add ElkhartLake support to the Intel RAPL power capping driver and
remove an unused local MSR definition from it (Jacob Pan, Sumeet
Pawnikar).

- Update devfreq core and drivers:

- Replace strncpy() with strscpy() in the devfreq core and use
lockdep asserts instead of manual checks for a locked mutex in
it (Dmitry Osipenko, Krzysztof Kozlowski).

- Add a generic imx bus scaling driver and make it register an
interconnect device (Leonard Crestez, Gustavo A. R. Silva).

- Make the cpufreq notifier in the tegra30 driver take boosting
into account and delete an unuseful error message from that
driver (Dmitry Osipenko, Markus Elfring).

- Remove unneeded semicolon from the cpupower code (Zou Wei)"

* tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (51 commits)
cpuidle: Fix three reference count leaks
PM: runtime: Replace pm_runtime_callbacks_present()
PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex
PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR
PM / devfreq: Replace strncpy with strscpy
PM / devfreq: imx: Register interconnect device
PM / devfreq: Add generic imx bus scaling driver
PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe()
PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting
PM: hibernate: Restrict writes to the resume device
PM: runtime: clk: Fix clk_pm_runtime_get() error path
cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver
ACPI: EC: PM: s2idle: Extend GPE dispatching debug message
ACPI: PM: s2idle: Print type of wakeup debug messages
powercap: RAPL: remove unused local MSR define
PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend()
Documentation: admin-guide: pm: Document intel-speed-select
PM: hibernate: Split off snapshot dev option
PM: hibernate: Incorporate concurrency handling
Documentation: ABI: make current_governer_ro as a candidate for removal
...

+1830 -727
+9
Documentation/ABI/obsolete/sysfs-cpuidle
···
1 + What: /sys/devices/system/cpu/cpuidle/current_governor_ro
2 + Date: April, 2020
3 + Contact: linux-pm@vger.kernel.org
4 + Description:
5 + current_governor_ro shows the cpuidle governor currently in use, but is read only.
6 + With the update that the cpuidle governor can be changed at runtime by default,
7 + both current_governor and current_governor_ro co-exist under
8 + /sys/devices/system/cpu/cpuidle/; since they duplicate each other, make
9 + current_governor_ro obsolete.
+9 -15
Documentation/ABI/testing/sysfs-devices-system-cpu
···
106 106 See Documentation/admin-guide/cputopology.rst for more information.
107 107
108 108
109 - What: /sys/devices/system/cpu/cpuidle/current_driver
110 - /sys/devices/system/cpu/cpuidle/current_governer_ro
111 - /sys/devices/system/cpu/cpuidle/available_governors
109 + What: /sys/devices/system/cpu/cpuidle/available_governors
110 + /sys/devices/system/cpu/cpuidle/current_driver
112 111 /sys/devices/system/cpu/cpuidle/current_governor
112 + /sys/devices/system/cpu/cpuidle/current_governer_ro
113 113 Date: September 2007
114 114 Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
115 115 Description: Discover cpuidle policy and mechanism
···
119 119 consumption during idle.
120 120
121 121 Idle policy (governor) is differentiated from idle mechanism
122 - (driver)
123 -
124 - current_driver: (RO) displays current idle mechanism
125 -
126 - current_governor_ro: (RO) displays current idle policy
127 -
128 - With the cpuidle_sysfs_switch boot option enabled (meant for
129 - developer testing), the following three attributes are visible
130 - instead:
131 -
132 - current_driver: same as described above
122 + (driver).
133 123
134 124 available_governors: (RO) displays a space separated list of
135 - available governors
125 + available governors.
126 +
127 + current_driver: (RO) displays current idle mechanism.
136 128
137 129 current_governor: (RW) displays current idle policy. Users can
138 130 switch the governor at runtime by writing to this file.
131 +
132 + current_governor_ro: (RO) displays current idle policy.
139 133
140 134 See Documentation/admin-guide/pm/cpuidle.rst and
141 135 Documentation/driver-api/pm/cpuidle.rst for more information.
+9 -11
Documentation/admin-guide/pm/cpuidle.rst
···
159 159 and that is the primary reason for having more than one governor in the
160 160 ``CPUIdle`` subsystem.
161 161
162 - There are three ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_
163 - and ``ladder``. Which of them is used by default depends on the configuration
164 - of the kernel and in particular on whether or not the scheduler tick can be
165 - `stopped by the idle loop <idle-cpus-and-tick_>`_. It is possible to change the
166 - governor at run time if the ``cpuidle_sysfs_switch`` command line parameter has
167 - been passed to the kernel, but that is not safe in general, so it should not be
168 - done on production systems (that may change in the future, though). The name of
169 - the ``CPUIdle`` governor currently used by the kernel can be read from the
170 - :file:`current_governor_ro` (or :file:`current_governor` if
171 - ``cpuidle_sysfs_switch`` is present in the kernel command line) file under
172 - :file:`/sys/devices/system/cpu/cpuidle/` in ``sysfs``.
162 + There are four ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_,
163 + ``ladder`` and ``haltpoll``. Which of them is used by default depends on the
164 + configuration of the kernel and in particular on whether or not the scheduler
165 + tick can be `stopped by the idle loop <idle-cpus-and-tick_>`_. The available
166 + governors can be read from the :file:`available_governors` file, and the
167 + governor can be changed at runtime. The name of the ``CPUIdle`` governor
168 + currently used by the kernel can be read from the :file:`current_governor_ro`
169 + or :file:`current_governor` file under :file:`/sys/devices/system/cpu/cpuidle/`
170 + in ``sysfs``.
173 171
174 172 Which ``CPUIdle`` driver is used, on the other hand, usually depends on the
175 173 platform the kernel is running on, but there are platforms with more than one
+917
Documentation/admin-guide/pm/intel-speed-select.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ============================================================ 4 + Intel(R) Speed Select Technology User Guide 5 + ============================================================ 6 + 7 + The Intel(R) Speed Select Technology (Intel(R) SST) provides a powerful new 8 + collection of features that give more granular control over CPU performance. 9 + With Intel(R) SST, one server can be configured for power and performance for a 10 + variety of diverse workload requirements. 11 + 12 + Refer to the links below for an overview of the technology: 13 + 14 + - https://www.intel.com/content/www/us/en/architecture-and-technology/speed-select-technology-article.html 15 + - https://builders.intel.com/docs/networkbuilders/intel-speed-select-technology-base-frequency-enhancing-performance.pdf 16 + 17 + These capabilities are further enhanced in some of the newer generations of 18 + server platforms where these features can be enumerated and controlled 19 + dynamically without pre-configuring via BIOS setup options. This dynamic 20 + configuration is done via mailbox commands to the hardware. One way to enumerate 21 + and configure these features is by using the Intel Speed Select utility. 22 + 23 + This document explains how to use the Intel Speed Select tool to enumerate and 24 + control Intel(R) SST features. This document gives example commands and explains 25 + how these commands change the power and performance profile of the system under 26 + test. Using this tool as an example, customers can replicate the messaging 27 + implemented in the tool in their production software. 28 + 29 + intel-speed-select configuration tool 30 + ====================================== 31 + 32 + Most Linux distribution packages may include the "intel-speed-select" tool. If not, 33 + it can be built by downloading the Linux kernel tree from kernel.org. Once 34 + downloaded, the tool can be built without building the full kernel. 
35 +
36 + From the kernel tree, run the following commands::
37 +
38 + # cd tools/power/x86/intel-speed-select/
39 + # make
40 + # make install
41 +
42 + Getting Help
43 + ------------
44 +
45 + To get help with the tool, execute the command below::
46 +
47 + # intel-speed-select --help
48 +
49 + The top-level help describes arguments and features. Notice that there is a
50 + multi-level help structure in the tool. For example, to get help for the feature "perf-profile"::
51 +
52 + # intel-speed-select perf-profile --help
53 +
54 + To get help on a command, another level of help is provided. For example, for the "info" command::
55 +
56 + # intel-speed-select perf-profile info --help
57 +
58 + Summary of platform capability
59 + ------------------------------
60 + To check the current platform and driver capabilities, execute::
61 +
62 + # intel-speed-select --info
63 +
64 + For example on a test system::
65 +
66 + # intel-speed-select --info
67 + Intel(R) Speed Select Technology
68 + Executing on CPU model: X
69 + Platform: API version : 1
70 + Platform: Driver version : 1
71 + Platform: mbox supported : 1
72 + Platform: mmio supported : 1
73 + Intel(R) SST-PP (feature perf-profile) is supported
74 + TDP level change control is unlocked, max level: 4
75 + Intel(R) SST-TF (feature turbo-freq) is supported
76 + Intel(R) SST-BF (feature base-freq) is not supported
77 + Intel(R) SST-CP (feature core-power) is supported
78 +
79 + Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP)
80 + ------------------------------------------------------------------------
81 +
82 + This feature allows configuration of a server dynamically based on workload
83 + performance requirements. This helps users during deployment as they do not have
84 + to choose a specific server configuration statically. This Intel(R) Speed Select
85 + Technology - Performance Profile (Intel(R) SST-PP) feature introduces a mechanism
86 + that allows multiple optimized performance profiles per system. Each profile
87 + defines a set of CPUs that need to be online, with the rest offline, to sustain a
88 + guaranteed base frequency. Once the user issues a command to use a specific
89 + performance profile and meets the CPU online/offline requirement, the user can
90 + expect a change in the base frequency dynamically. This feature is called
91 + "perf-profile" when using the Intel Speed Select tool.
92 +
93 + Number of performance levels
94 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
95 +
96 + There can be multiple performance profiles on a system. To get the number of
97 + profiles, execute the command below::
98 +
99 + # intel-speed-select perf-profile get-config-levels
100 + Intel(R) Speed Select Technology
101 + Executing on CPU model: X
102 + package-0
103 + die-0
104 + cpu-0
105 + get-config-levels:4
106 + package-1
107 + die-0
108 + cpu-14
109 + get-config-levels:4
110 +
111 + On this system under test, there are 4 performance profiles in addition to the
112 + base performance profile (which is performance level 0).
113 +
114 + Lock/Unlock status
115 + ~~~~~~~~~~~~~~~~~~
116 +
117 + Even if there are multiple performance profiles, it is possible that they
118 + are locked. If they are locked, users cannot issue a command to change the
119 + performance state. There may be a BIOS setup option to unlock them; check
120 + with your system vendor.
121 +
122 + To check if the system is locked, execute the following command::
123 +
124 + # intel-speed-select perf-profile get-lock-status
125 + Intel(R) Speed Select Technology
126 + Executing on CPU model: X
127 + package-0
128 + die-0
129 + cpu-0
130 + get-lock-status:0
131 + package-1
132 + die-0
133 + cpu-14
134 + get-lock-status:0
135 +
136 + In this case, lock status is 0, which means that the system is unlocked.
137 + 138 + Properties of a performance level 139 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 140 + 141 + To get properties of a specific performance level (For example for the level 0, below), execute the command below:: 142 + 143 + # intel-speed-select perf-profile info -l 0 144 + Intel(R) Speed Select Technology 145 + Executing on CPU model: X 146 + package-0 147 + die-0 148 + cpu-0 149 + perf-profile-level-0 150 + cpu-count:28 151 + enable-cpu-mask:000003ff,f0003fff 152 + enable-cpu-list:0,1,2,3,4,5,6,7,8,9,10,11,12,13,28,29,30,31,32,33,34,35,36,37,38,39,40,41 153 + thermal-design-power-ratio:26 154 + base-frequency(MHz):2600 155 + speed-select-turbo-freq:disabled 156 + speed-select-base-freq:disabled 157 + ... 158 + ... 159 + 160 + Here -l option is used to specify a performance level. 161 + 162 + If the option -l is omitted, then this command will print information about all 163 + the performance levels. The above command is printing properties of the 164 + performance level 0. 165 + 166 + For this performance profile, the list of CPUs displayed by the 167 + "enable-cpu-mask/enable-cpu-list" at the max can be "online." When that 168 + condition is met, then base frequency of 2600 MHz can be maintained. To 169 + understand more, execute "intel-speed-select perf-profile info" for performance 170 + level 4:: 171 + 172 + # intel-speed-select perf-profile info -l 4 173 + Intel(R) Speed Select Technology 174 + Executing on CPU model: X 175 + package-0 176 + die-0 177 + cpu-0 178 + perf-profile-level-4 179 + cpu-count:28 180 + enable-cpu-mask:000000fa,f0000faf 181 + enable-cpu-list:0,1,2,3,5,7,8,9,10,11,28,29,30,31,33,35,36,37,38,39 182 + thermal-design-power-ratio:28 183 + base-frequency(MHz):2800 184 + speed-select-turbo-freq:disabled 185 + speed-select-base-freq:unsupported 186 + ... 187 + ... 188 + 189 + There are fewer CPUs in the "enable-cpu-mask/enable-cpu-list". 
Consequently, if 190 + the user only keeps these CPUs online and the rest "offline," then the base 191 + frequency is increased to 2.8 GHz compared to 2.6 GHz at performance level 0. 192 + 193 + Get current performance level 194 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 195 + 196 + To get the current performance level, execute:: 197 + 198 + # intel-speed-select perf-profile get-config-current-level 199 + Intel(R) Speed Select Technology 200 + Executing on CPU model: X 201 + package-0 202 + die-0 203 + cpu-0 204 + get-config-current_level:0 205 + 206 + First verify that the base_frequency displayed by the cpufreq sysfs is correct:: 207 + 208 + # cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency 209 + 2600000 210 + 211 + This matches the base-frequency (MHz) field value displayed from the 212 + "perf-profile info" command for performance level 0(cpufreq frequency is in 213 + KHz). 214 + 215 + To check if the average frequency is equal to the base frequency for a 100% busy 216 + workload, disable turbo:: 217 + 218 + # echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo 219 + 220 + Then runs a busy workload on all CPUs, for example:: 221 + 222 + #stress -c 64 223 + 224 + To verify the base frequency, run turbostat:: 225 + 226 + #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1 227 + 228 + Package Core CPU Bzy_MHz 229 + - - 2600 230 + 0 0 0 2600 231 + 0 1 1 2600 232 + 0 2 2 2600 233 + 0 3 3 2600 234 + 0 4 4 2600 235 + . . . . 236 + 237 + 238 + Changing performance level 239 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 240 + 241 + To the change the performance level to 4, execute:: 242 + 243 + # intel-speed-select -d perf-profile set-config-level -l 4 -o 244 + Intel(R) Speed Select Technology 245 + Executing on CPU model: X 246 + package-0 247 + die-0 248 + cpu-0 249 + perf-profile 250 + set_tdp_level:success 251 + 252 + In the command above, "-o" is optional. 
If it is specified, then it will also 253 + offline CPUs which are not present in the enable_cpu_mask for this performance 254 + level. 255 + 256 + Now if the base_frequency is checked:: 257 + 258 + #cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency 259 + 2800000 260 + 261 + Which shows that the base frequency now increased from 2600 MHz at performance 262 + level 0 to 2800 MHz at performance level 4. As a result, any workload, which can 263 + use fewer CPUs, can see a boost of 200 MHz compared to performance level 0. 264 + 265 + Check presence of other Intel(R) SST features 266 + --------------------------------------------- 267 + 268 + Each of the performance profiles also specifies weather there is support of 269 + other two Intel(R) SST features (Intel(R) Speed Select Technology - Base Frequency 270 + (Intel(R) SST-BF) and Intel(R) Speed Select Technology - Turbo Frequency (Intel 271 + SST-TF)). 272 + 273 + For example, from the output of "perf-profile info" above, for level 0 and level 274 + 4: 275 + 276 + For level 0:: 277 + speed-select-turbo-freq:disabled 278 + speed-select-base-freq:disabled 279 + 280 + For level 4:: 281 + speed-select-turbo-freq:disabled 282 + speed-select-base-freq:unsupported 283 + 284 + Given these results, the "speed-select-base-freq" (Intel(R) SST-BF) in level 4 285 + changed from "disabled" to "unsupported" compared to performance level 0. 286 + 287 + This means that at performance level 4, the "speed-select-base-freq" feature is 288 + not supported. However, at performance level 0, this feature is "supported", but 289 + currently "disabled", meaning the user has not activated this feature. Whereas 290 + "speed-select-turbo-freq" (Intel(R) SST-TF) is supported at both performance 291 + levels, but currently not activated by the user. 292 + 293 + The Intel(R) SST-BF and the Intel(R) SST-TF features are built on a foundation 294 + technology called Intel(R) Speed Select Technology - Core Power (Intel(R) SST-CP). 
295 + The platform firmware enables this feature when Intel(R) SST-BF or Intel(R) SST-TF 296 + is supported on a platform. 297 + 298 + Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP) 299 + --------------------------------------------------------------- 300 + 301 + Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP) is an interface that 302 + allows users to define per core priority. This defines a mechanism to distribute 303 + power among cores when there is a power constrained scenario. This defines a 304 + class of service (CLOS) configuration. 305 + 306 + The user can configure up to 4 class of service configurations. Each CLOS group 307 + configuration allows definitions of parameters, which affects how the frequency 308 + can be limited and power is distributed. Each CPU core can be tied to a class of 309 + service and hence an associated priority. The granularity is at core level not 310 + at per CPU level. 311 + 312 + Enable CLOS based prioritization 313 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 314 + 315 + To use CLOS based prioritization feature, firmware must be informed to enable 316 + and use a priority type. There is a default per platform priority type, which 317 + can be changed with optional command line parameter. 318 + 319 + To enable and check the options, execute:: 320 + 321 + # intel-speed-select core-power enable --help 322 + Intel(R) Speed Select Technology 323 + Executing on CPU model: X 324 + Enable core-power for a package/die 325 + Clos Enable: Specify priority type with [--priority|-p] 326 + 0: Proportional, 1: Ordered 327 + 328 + There are two types of priority types: 329 + 330 + - Ordered 331 + 332 + Priority for ordered throttling is defined based on the index of the assigned 333 + CLOS group. Where CLOS0 gets highest priority (throttled last). 334 + 335 + Priority order is: 336 + CLOS0 > CLOS1 > CLOS2 > CLOS3. 
337 + 338 + - Proportional 339 + 340 + When proportional priority is used, there is an additional parameter called 341 + frequency_weight, which can be specified per CLOS group. The goal of 342 + proportional priority is to provide each core with the requested min., then 343 + distribute all remaining (excess/deficit) budgets in proportion to a defined 344 + weight. This proportional priority can be configured using "core-power config" 345 + command. 346 + 347 + To enable with the platform default priority type, execute:: 348 + 349 + # intel-speed-select core-power enable 350 + Intel(R) Speed Select Technology 351 + Executing on CPU model: X 352 + package-0 353 + die-0 354 + cpu-0 355 + core-power 356 + enable:success 357 + package-1 358 + die-0 359 + cpu-6 360 + core-power 361 + enable:success 362 + 363 + The scope of this enable is per package or die scoped when a package contains 364 + multiple dies. To check if CLOS is enabled and get priority type, "core-power 365 + info" command can be used. For example to check the status of core-power feature 366 + on CPU 0, execute:: 367 + 368 + # intel-speed-select -c 0 core-power info 369 + Intel(R) Speed Select Technology 370 + Executing on CPU model: X 371 + package-0 372 + die-0 373 + cpu-0 374 + core-power 375 + support-status:supported 376 + enable-status:enabled 377 + clos-enable-status:enabled 378 + priority-type:proportional 379 + package-1 380 + die-0 381 + cpu-24 382 + core-power 383 + support-status:supported 384 + enable-status:enabled 385 + clos-enable-status:enabled 386 + priority-type:proportional 387 + 388 + Configuring CLOS groups 389 + ~~~~~~~~~~~~~~~~~~~~~~~ 390 + 391 + Each CLOS group has its own attributes including min, max, freq_weight and 392 + desired. These parameters can be configured with "core-power config" command. 393 + Defaults will be used if user skips setting a parameter except clos id, which is 394 + mandatory. 
To check core-power config options, execute:: 395 + 396 + # intel-speed-select core-power config --help 397 + Intel(R) Speed Select Technology 398 + Executing on CPU model: X 399 + Set core-power configuration for one of the four clos ids 400 + Specify targeted clos id with [--clos|-c] 401 + Specify clos Proportional Priority [--weight|-w] 402 + Specify clos min in MHz with [--min|-n] 403 + Specify clos max in MHz with [--max|-m] 404 + 405 + For example:: 406 + 407 + # intel-speed-select core-power config -c 0 408 + Intel(R) Speed Select Technology 409 + Executing on CPU model: X 410 + clos epp is not specified, default: 0 411 + clos frequency weight is not specified, default: 0 412 + clos min is not specified, default: 0 MHz 413 + clos max is not specified, default: 25500 MHz 414 + clos desired is not specified, default: 0 415 + package-0 416 + die-0 417 + cpu-0 418 + core-power 419 + config:success 420 + package-1 421 + die-0 422 + cpu-6 423 + core-power 424 + config:success 425 + 426 + The user has the option to change defaults. For example, the user can change the 427 + "min" and set the base frequency to always get guaranteed base frequency. 428 + 429 + Get the current CLOS configuration 430 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 431 + 432 + To check the current configuration, "core-power get-config" can be used. 
For 433 + example, to get the configuration of CLOS 0:: 434 + 435 + # intel-speed-select core-power get-config -c 0 436 + Intel(R) Speed Select Technology 437 + Executing on CPU model: X 438 + package-0 439 + die-0 440 + cpu-0 441 + core-power 442 + clos:0 443 + epp:0 444 + clos-proportional-priority:0 445 + clos-min:0 MHz 446 + clos-max:Max Turbo frequency 447 + clos-desired:0 MHz 448 + package-1 449 + die-0 450 + cpu-24 451 + core-power 452 + clos:0 453 + epp:0 454 + clos-proportional-priority:0 455 + clos-min:0 MHz 456 + clos-max:Max Turbo frequency 457 + clos-desired:0 MHz 458 + 459 + Associating a CPU with a CLOS group 460 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 461 + 462 + To associate a CPU to a CLOS group "core-power assoc" command can be used:: 463 + 464 + # intel-speed-select core-power assoc --help 465 + Intel(R) Speed Select Technology 466 + Executing on CPU model: X 467 + Associate a clos id to a CPU 468 + Specify targeted clos id with [--clos|-c] 469 + 470 + 471 + For example to associate CPU 10 to CLOS group 3, execute:: 472 + 473 + # intel-speed-select -c 10 core-power assoc -c 3 474 + Intel(R) Speed Select Technology 475 + Executing on CPU model: X 476 + package-0 477 + die-0 478 + cpu-10 479 + core-power 480 + assoc:success 481 + 482 + Once a CPU is associated, its sibling CPUs are also associated to a CLOS group. 483 + Once associated, avoid changing Linux "cpufreq" subsystem scaling frequency 484 + limits. 485 + 486 + To check the existing association for a CPU, "core-power get-assoc" command can 487 + be used. For example, to get association of CPU 10, execute:: 488 + 489 + # intel-speed-select -c 10 core-power get-assoc 490 + Intel(R) Speed Select Technology 491 + Executing on CPU model: X 492 + package-1 493 + die-0 494 + cpu-10 495 + get-assoc 496 + clos:3 497 + 498 + This shows that CPU 10 is part of a CLOS group 3. 
499 + 500 + 501 + Disable CLOS based prioritization 502 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 503 + 504 + To disable, execute:: 505 + 506 + # intel-speed-select core-power disable 507 + 508 + Some features like Intel(R) SST-TF can only be enabled when CLOS based prioritization 509 + is enabled. For this reason, disabling while Intel(R) SST-TF is enabled can cause 510 + Intel(R) SST-TF to fail. This will cause the "disable" command to display an error 511 + if Intel(R) SST-TF is already enabled. In turn, to disable, the Intel(R) SST-TF 512 + feature must be disabled first. 513 + 514 + Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF) 515 + ------------------------------------------------------------------- 516 + 517 + The Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF) feature lets 518 + the user control base frequency. If some critical workload threads demand 519 + constant high guaranteed performance, then this feature can be used to execute 520 + the thread at higher base frequency on specific sets of CPUs (high priority 521 + CPUs) at the cost of lower base frequency (low priority CPUs) on other CPUs. 522 + This feature does not require offline of the low priority CPUs. 523 + 524 + The support of Intel(R) SST-BF depends on the Intel(R) Speed Select Technology - 525 + Performance Profile (Intel(R) SST-PP) performance level configuration. It is 526 + possible that only certain performance levels support Intel(R) SST-BF. It is also 527 + possible that only base performance level (level = 0) has support of Intel 528 + SST-BF. Consequently, first select the desired performance level to enable this 529 + feature. 530 + 531 + In the system under test here, Intel(R) SST-BF is supported at the base 532 + performance level 0, but currently disabled. 
For example, for level 0::
533 +
534 + # intel-speed-select -c 0 perf-profile info -l 0
535 + Intel(R) Speed Select Technology
536 + Executing on CPU model: X
537 + package-0
538 + die-0
539 + cpu-0
540 + perf-profile-level-0
541 + ...
542 +
543 + speed-select-base-freq:disabled
544 + ...
545 +
546 + Before enabling Intel(R) SST-BF and measuring its impact on workload
547 + performance, execute some workload and measure its performance to get a
548 + baseline to compare against.
549 +
550 + Here the user wants more guaranteed performance. For this reason, it is likely
551 + that turbo is disabled. To disable turbo, execute::
552 +
553 + # echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
554 +
555 + Based on the output of "intel-speed-select perf-profile info -l 0" above, the
556 + guaranteed base frequency is 2600 MHz.
557 +
558 +
559 + Measure baseline performance for comparison
560 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
561 +
562 + To compare, pick a multi-threaded workload where each thread can be scheduled on
563 + separate CPUs. The "hackbench pipe" test is a good example of how to improve
564 + performance using Intel(R) SST-BF.
565 +
566 + Below, the workload is measuring average scheduler wakeup latency, so a lower
567 + number means better performance::
568 +
569 + # taskset -c 3,4 perf bench -r 100 sched pipe
570 + # Running 'sched/pipe' benchmark:
571 + # Executed 1000000 pipe operations between two processes
572 + Total time: 6.102 [sec]
573 + 6.102445 usecs/op
574 + 163868 ops/sec
575 +
576 + While running the above test, if we take turbostat output, it will show us that
577 + 2 of the CPUs are busy and reaching max. frequency (which would be the base
578 + frequency as the turbo is disabled).
The turbostat output:: 579 + 580 + #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1 581 + Package Core CPU Bzy_MHz 582 + 0 0 0 1000 583 + 0 1 1 1005 584 + 0 2 2 1000 585 + 0 3 3 2600 586 + 0 4 4 2600 587 + 0 5 5 1000 588 + 0 6 6 1000 589 + 0 7 7 1005 590 + 0 8 8 1005 591 + 0 9 9 1000 592 + 0 10 10 1000 593 + 0 11 11 995 594 + 0 12 12 1000 595 + 0 13 13 1000 596 + 597 + From the above turbostat output, both CPU 3 and 4 are very busy and reaching 598 + full guaranteed frequency of 2600 MHz. 599 + 600 + Intel(R) SST-BF Capabilities 601 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 602 + 603 + To get capabilities of Intel(R) SST-BF for the current performance level 0, 604 + execute:: 605 + 606 + # intel-speed-select base-freq info -l 0 607 + Intel(R) Speed Select Technology 608 + Executing on CPU model: X 609 + package-0 610 + die-0 611 + cpu-0 612 + speed-select-base-freq 613 + high-priority-base-frequency(MHz):3000 614 + high-priority-cpu-mask:00000216,00002160 615 + high-priority-cpu-list:5,6,8,13,33,34,36,41 616 + low-priority-base-frequency(MHz):2400 617 + tjunction-temperature(C):125 618 + thermal-design-power(W):205 619 + 620 + The above capabilities show that there are some CPUs on this system that can 621 + offer base frequency of 3000 MHz compared to the standard base frequency at this 622 + performance levels. Nevertheless, these CPUs are fixed, and they are presented 623 + via high-priority-cpu-list/high-priority-cpu-mask. But if this Intel(R) SST-BF 624 + feature is selected, the low priorities CPUs (which are not in 625 + high-priority-cpu-list) can only offer up to 2400 MHz. As a result, if this 626 + clipping of low priority CPUs is acceptable, then the user can enable Intel 627 + SST-BF feature particularly for the above "sched pipe" workload since only two 628 + CPUs are used, they can be scheduled on high priority CPUs and can get boost of 629 + 400 MHz. 
630 + 631 + Enable Intel(R) SST-BF 632 + ~~~~~~~~~~~~~~~~~~~~~~ 633 + 634 + To enable Intel(R) SST-BF feature, execute:: 635 + 636 + # intel-speed-select base-freq enable -a 637 + Intel(R) Speed Select Technology 638 + Executing on CPU model: X 639 + package-0 640 + die-0 641 + cpu-0 642 + base-freq 643 + enable:success 644 + package-1 645 + die-0 646 + cpu-14 647 + base-freq 648 + enable:success 649 + 650 + In this case, -a option is optional. This not only enables Intel(R) SST-BF, but it 651 + also adjusts the priority of cores using Intel(R) Speed Select Technology Core 652 + Power (Intel(R) SST-CP) features. This option sets the minimum performance of each 653 + Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP) class to 654 + maximum performance so that the hardware will give maximum performance possible 655 + for each CPU. 656 + 657 + If -a option is not used, then the following steps are required before enabling 658 + Intel(R) SST-BF: 659 + 660 + - Discover Intel(R) SST-BF and note low and high priority base frequency 661 + - Note the high prioity CPU list 662 + - Enable CLOS using core-power feature set 663 + - Configure CLOS parameters. Use CLOS.min to set to minimum performance 664 + - Subscribe desired CPUs to CLOS groups 665 + 666 + With this configuration, if the same workload is executed by pinning the 667 + workload to high priority CPUs (CPU 5 and 6 in this case):: 668 + 669 + #taskset -c 5,6 perf bench -r 100 sched pipe 670 + # Running 'sched/pipe' benchmark: 671 + # Executed 1000000 pipe operations between two processes 672 + Total time: 5.627 [sec] 673 + 5.627922 usecs/op 674 + 177685 ops/sec 675 + 676 + This way, by enabling Intel(R) SST-BF, the performance of this benchmark is 677 + improved (latency reduced) by 7.79%. From the turbostat output, it can be 678 + observed that the high priority CPUs reached 3000 MHz compared to 2600 MHz. 
679 + The turbostat output:: 680 + 681 + #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1 682 + Package Core CPU Bzy_MHz 683 + 0 0 0 2151 684 + 0 1 1 2166 685 + 0 2 2 2175 686 + 0 3 3 2175 687 + 0 4 4 2175 688 + 0 5 5 3000 689 + 0 6 6 3000 690 + 0 7 7 2180 691 + 0 8 8 2662 692 + 0 9 9 2176 693 + 0 10 10 2175 694 + 0 11 11 2176 695 + 0 12 12 2176 696 + 0 13 13 2661 697 + 698 + Disable Intel(R) SST-BF 699 + ~~~~~~~~~~~~~~~~~~~~~~~ 700 + 701 + To disable the Intel(R) SST-BF feature, execute:: 702 + 703 + # intel-speed-select base-freq disable -a 704 + 705 + 706 + Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF) 707 + -------------------------------------------------------------------- 708 + 709 + This feature enables the ability to set different "All core turbo ratio limits" 710 + to cores based on the priority. By using this feature, some cores can be 711 + configured to get higher turbo frequency by designating them as high priority at 712 + the cost of lower or no turbo frequency on the low priority cores. 713 + 714 + For this reason, this feature is only useful when system is busy utilizing all 715 + CPUs, but the user wants some configurable option to get high performance on 716 + some CPUs. 717 + 718 + The support of Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF) 719 + depends on the Intel(R) Speed Select Technology - Performance Profile (Intel 720 + SST-PP) performance level configuration. It is possible that only a certain 721 + performance level supports Intel(R) SST-TF. It is also possible that only the base 722 + performance level (level = 0) has the support of Intel(R) SST-TF. Hence, first 723 + select the desired performance level to enable this feature. 
724 + 725 + In the system under test here, Intel(R) SST-TF is supported at the base 726 + performance level 0, but currently disabled:: 727 + 728 + # intel-speed-select -c 0 perf-profile info -l 0 729 + Intel(R) Speed Select Technology 730 + package-0 731 + die-0 732 + cpu-0 733 + perf-profile-level-0 734 + ... 735 + ... 736 + speed-select-turbo-freq:disabled 737 + ... 738 + ... 739 + 740 + 741 + To check if performance can be improved using Intel(R) SST-TF feature, get the turbo 742 + frequency properties with Intel(R) SST-TF enabled and compare to the base turbo 743 + capability of this system. 744 + 745 + Get Base turbo capability 746 + ~~~~~~~~~~~~~~~~~~~~~~~~~ 747 + 748 + To get the base turbo capability of performance level 0, execute:: 749 + 750 + # intel-speed-select perf-profile info -l 0 751 + Intel(R) Speed Select Technology 752 + Executing on CPU model: X 753 + package-0 754 + die-0 755 + cpu-0 756 + perf-profile-level-0 757 + ... 758 + ... 759 + turbo-ratio-limits-sse 760 + bucket-0 761 + core-count:2 762 + max-turbo-frequency(MHz):3200 763 + bucket-1 764 + core-count:4 765 + max-turbo-frequency(MHz):3100 766 + bucket-2 767 + core-count:6 768 + max-turbo-frequency(MHz):3100 769 + bucket-3 770 + core-count:8 771 + max-turbo-frequency(MHz):3100 772 + bucket-4 773 + core-count:10 774 + max-turbo-frequency(MHz):3100 775 + bucket-5 776 + core-count:12 777 + max-turbo-frequency(MHz):3100 778 + bucket-6 779 + core-count:14 780 + max-turbo-frequency(MHz):3100 781 + bucket-7 782 + core-count:16 783 + max-turbo-frequency(MHz):3100 784 + 785 + Based on the data above, when all the CPUS are busy, the max. frequency of 3100 786 + MHz can be achieved. If there is some busy workload on cpu 0 - 11 (e.g. 
stress) 787 + and on CPU 12 and 13, execute "hackbench pipe" workload:: 788 + 789 + # taskset -c 12,13 perf bench -r 100 sched pipe 790 + # Running 'sched/pipe' benchmark: 791 + # Executed 1000000 pipe operations between two processes 792 + Total time: 5.705 [sec] 793 + 5.705488 usecs/op 794 + 175269 ops/sec 795 + 796 + The turbostat output:: 797 + 798 + #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1 799 + Package Core CPU Bzy_MHz 800 + 0 0 0 3000 801 + 0 1 1 3000 802 + 0 2 2 3000 803 + 0 3 3 3000 804 + 0 4 4 3000 805 + 0 5 5 3100 806 + 0 6 6 3100 807 + 0 7 7 3000 808 + 0 8 8 3100 809 + 0 9 9 3000 810 + 0 10 10 3000 811 + 0 11 11 3000 812 + 0 12 12 3100 813 + 0 13 13 3100 814 + 815 + Based on turbostat output, the performance is limited by frequency cap of 3100 816 + MHz. To check if the hackbench performance can be improved for CPU 12 and CPU 817 + 13, first check the capability of the Intel(R) SST-TF feature for this performance 818 + level. 819 + 820 + Get Intel(R) SST-TF Capability 821 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 822 + 823 + To get the capability, the "turbo-freq info" command can be used:: 824 + 825 + # intel-speed-select turbo-freq info -l 0 826 + Intel(R) Speed Select Technology 827 + Executing on CPU model: X 828 + package-0 829 + die-0 830 + cpu-0 831 + speed-select-turbo-freq 832 + bucket-0 833 + high-priority-cores-count:2 834 + high-priority-max-frequency(MHz):3200 835 + high-priority-max-avx2-frequency(MHz):3200 836 + high-priority-max-avx512-frequency(MHz):3100 837 + bucket-1 838 + high-priority-cores-count:4 839 + high-priority-max-frequency(MHz):3100 840 + high-priority-max-avx2-frequency(MHz):3000 841 + high-priority-max-avx512-frequency(MHz):2900 842 + bucket-2 843 + high-priority-cores-count:6 844 + high-priority-max-frequency(MHz):3100 845 + high-priority-max-avx2-frequency(MHz):3000 846 + high-priority-max-avx512-frequency(MHz):2900 847 + speed-select-turbo-freq-clip-frequencies 848 + low-priority-max-frequency(MHz):2600 849 + 
low-priority-max-avx2-frequency(MHz):2400 850 + low-priority-max-avx512-frequency(MHz):2100 851 + 852 + Based on the output above, there is an Intel(R) SST-TF bucket for which there are 853 + two high priority cores. If only two high priority cores are set, then max. 854 + turbo frequency on those cores can be increased to 3200 MHz. This is 100 MHz 855 + more than the base turbo capability for all cores. 856 + 857 + In turn, for the hackbench workload, two CPUs can be set as high priority and 858 + rest as low priority. One side effect is that once enabled, the low priority 859 + cores will be clipped to a lower frequency of 2600 MHz. 860 + 861 + Enable Intel(R) SST-TF 862 + ~~~~~~~~~~~~~~~~~~~~~~ 863 + 864 + To enable Intel(R) SST-TF, execute:: 865 + 866 + # intel-speed-select -c 12,13 turbo-freq enable -a 867 + Intel(R) Speed Select Technology 868 + Executing on CPU model: X 869 + package-0 870 + die-0 871 + cpu-12 872 + turbo-freq 873 + enable:success 874 + package-0 875 + die-0 876 + cpu-13 877 + turbo-freq 878 + enable:success 879 + package--1 880 + die-0 881 + cpu-63 882 + turbo-freq --auto 883 + enable:success 884 + 885 + In this case, the option "-a" is optional. If set, it enables Intel(R) SST-TF 886 + feature and also sets the CPUs to high and and low priority using Intel Speed 887 + Select Technology Core Power (Intel(R) SST-CP) features. The CPU numbers passed 888 + with "-c" arguments are marked as high priority, including its siblings. 
889 + 890 + If -a option is not used, then the following steps are required before enabling 891 + Intel(R) SST-TF: 892 + 893 + - Discover Intel(R) SST-TF and note buckets of high priority cores and maximum frequency 894 + 895 + - Enable CLOS using core-power feature set - Configure CLOS parameters 896 + 897 + - Subscribe desired CPUs to CLOS groups making sure that high priority cores are set to the maximum frequency 898 + 899 + If the same hackbench workload is executed, schedule hackbench threads on high 900 + priority CPUs:: 901 + 902 + #taskset -c 12,13 perf bench -r 100 sched pipe 903 + # Running 'sched/pipe' benchmark: 904 + # Executed 1000000 pipe operations between two processes 905 + Total time: 5.510 [sec] 906 + 5.510165 usecs/op 907 + 180826 ops/sec 908 + 909 + This improved performance by around 3.3% improvement on a busy system. Here the 910 + turbostat output will show that the CPU 12 and CPU 13 are getting 100 MHz boost. 911 + The turbostat output:: 912 + 913 + #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1 914 + Package Core CPU Bzy_MHz 915 + ... 916 + 0 12 12 3200 917 + 0 13 13 3200
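As an aside on the arithmetic behind the percentages quoted in the added
documentation: the improvement is computed from the usecs/op latency figures
(lower is better) as (old - new) / old * 100. The sketch below (not part of the
patch) reproduces that calculation with the numbers from the perf bench runs
above; the results land close to the rounded 7.79% and "around 3.3%" figures
quoted in the text.

```shell
#!/bin/sh
# Latency figures (usecs/op) taken from the "perf bench sched pipe" runs above.
baseline=6.102445   # CPUs 3,4 at the 2600 MHz guaranteed base frequency
sst_bf=5.627922     # CPUs 5,6 boosted to 3000 MHz with Intel(R) SST-BF
busy_base=5.705488  # CPUs 12,13 capped at 3100 MHz on a busy system
sst_tf=5.510165     # CPUs 12,13 boosted to 3200 MHz with Intel(R) SST-TF

# Improvement = (old - new) / old * 100, since lower latency is better.
awk -v o="$baseline" -v n="$sst_bf" \
    'BEGIN { printf "SST-BF improvement: %.2f%%\n", (o - n) / o * 100 }'
awk -v o="$busy_base" -v n="$sst_tf" \
    'BEGIN { printf "SST-TF improvement: %.2f%%\n", (o - n) / o * 100 }'
```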
+19 -13
Documentation/admin-guide/pm/intel_pstate.rst
···
 Active Mode
 -----------
 
-This is the default operation mode of ``intel_pstate``. If it works in this
-mode, the ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq``
-policies contains the string "intel_pstate".
+This is the default operation mode of ``intel_pstate`` for processors with
+hardware-managed P-states (HWP) support. If it works in this mode, the
+``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
+contains the string "intel_pstate".
 
 In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
 provides its own scaling algorithms for P-state selection. Those algorithms
···
 Active Mode Without HWP
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-This is the default operation mode for processors that do not support the HWP
-feature. It also is used by default with the ``intel_pstate=no_hwp`` argument
-in the kernel command line. However, in this mode ``intel_pstate`` may refuse
-to work with the given processor if it does not recognize it. [Note that
-``intel_pstate`` will never refuse to work with any processor with the HWP
-feature enabled.]
+This operation mode is optional for processors that do not support the HWP
+feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
+the command line. The active mode is used in those cases if the
+``intel_pstate=active`` argument is passed to the kernel in the command line.
+In this mode ``intel_pstate`` may refuse to work with processors that are not
+recognized by it. [Note that ``intel_pstate`` will never refuse to work with
+any processor with the HWP feature enabled.]
 
 In this mode ``intel_pstate`` registers utilization update callbacks with the
 CPU scheduler in order to run a P-state selection algorithm, either
···
 Passive Mode
 ------------
 
-This mode is used if the ``intel_pstate=passive`` argument is passed to the
-kernel in the command line (it implies the ``intel_pstate=no_hwp`` setting too).
-Like in the active mode without HWP support, in this mode ``intel_pstate`` may
-refuse to work with the given processor if it does not recognize it.
+This is the default operation mode of ``intel_pstate`` for processors without
+hardware-managed P-states (HWP) support. It is always used if the
+``intel_pstate=passive`` argument is passed to the kernel in the command line
+regardless of whether or not the given processor supports HWP. [Note that the
+``intel_pstate=no_hwp`` setting implies ``intel_pstate=passive`` if it is used
+without ``intel_pstate=active``.] Like in the active mode without HWP support,
+in this mode ``intel_pstate`` may refuse to work with processors that are not
+recognized by it.
 
 If the driver works in this mode, the ``scaling_driver`` policy attribute in
 ``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
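The mode distinction above can be observed from user space: the quoted
documentation says the ``scaling_driver`` attribute reads "intel_pstate" in the
active mode and "intel_cpufreq" in the passive mode. The sketch below (an
illustration, not part of the patch) checks which mode is in use, guarding
against systems where the cpufreq sysfs attribute is absent.

```shell
#!/bin/sh
# Report the intel_pstate operation mode based on cpu0's scaling_driver.
f=/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
if [ -r "$f" ]; then
    case "$(cat "$f")" in
        intel_pstate)  echo "intel_pstate in active mode" ;;
        intel_cpufreq) echo "intel_pstate in passive mode" ;;
        *)             echo "another cpufreq driver: $(cat "$f")" ;;
    esac
else
    echo "cpufreq sysfs attribute not present"
fi
```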
+1
Documentation/admin-guide/pm/working-state.rst
··· 13 13 intel_pstate 14 14 cpufreq_drivers 15 15 intel_epb 16 + intel-speed-select
+2 -3
Documentation/driver-api/pm/cpuidle.rst
···
 governor currently in use, or the name of the new governor was passed to the
 kernel as the value of the ``cpuidle.governor=`` command line parameter, the new
 governor will be used from that point on (there can be only one ``CPUIdle``
-governor in use at a time). Also, if ``cpuidle_sysfs_switch`` is passed to the
-kernel in the command line, user space can choose the ``CPUIdle`` governor to
-use at run time via ``sysfs``.
+governor in use at a time). Also, user space can choose the ``CPUIdle``
+governor to use at run time via ``sysfs``.
 
 Once registered, ``CPUIdle`` governors cannot be unregistered, so it is not
 practical to put them into loadable kernel modules.
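The run-time governor switch that this hunk documents is driven through the
cpuidle sysfs directory. A minimal sketch (assuming a kernel new enough to
expose a writable ``current_governor`` attribute, and root privileges for the
write) might look like this; the guard keeps it harmless on systems without
cpuidle sysfs support.

```shell
#!/bin/sh
# Inspect, and optionally switch, the CPUIdle governor via sysfs.
d=/sys/devices/system/cpu/cpuidle
if [ -d "$d" ]; then
    cat "$d/available_governors" 2>/dev/null  # e.g. "ladder menu teo"
    cat "$d/current_governor" 2>/dev/null     # governor currently in use
    # To switch (as root), write one of the available names:
    # echo menu > "$d/current_governor"
else
    echo "cpuidle sysfs interface not available"
fi
```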
+125 -70
Documentation/driver-api/pm/devices.rst
···
 	PM core will skip the ``suspend``, ``suspend_late`` and
 	``suspend_noirq`` phases as well as all of the corresponding phases of
 	the subsequent device resume for all of these devices. In that case,
-	the ``->complete`` callback will be invoked directly after the
+	the ``->complete`` callback will be the next one invoked after the
 	``->prepare`` callback and is entirely responsible for putting the
 	device into a consistent state as appropriate.
···
 	runtime PM disabled.
 
 	This feature also can be controlled by device drivers by using the
-	``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
-	management flags. [Typically, they are set at the time the driver is
-	probed against the device in question by passing them to the
+	``DPM_FLAG_NO_DIRECT_COMPLETE`` and ``DPM_FLAG_SMART_PREPARE`` driver
+	power management flags. [Typically, they are set at the time the driver
+	is probed against the device in question by passing them to the
 	:c:func:`dev_pm_set_driver_flags` helper function.] If the first of
 	these flags is set, the PM core will not apply the direct-complete
 	procedure described above to the given device and, consequently, to any
···
 	``->suspend`` methods provided by subsystems (bus types and PM domains
 	in particular) must follow an additional rule regarding what can be done
 	to the devices before their drivers' ``->suspend`` methods are called.
-	Namely, they can only resume the devices from runtime suspend by
-	calling :c:func:`pm_runtime_resume` for them, if that is necessary, and
+	Namely, they may resume the devices from runtime suspend by
+	calling :c:func:`pm_runtime_resume` for them, if that is necessary, but
 	they must not update the state of the devices in any other way at that
 	time (in case the drivers need to resume the devices from runtime
-	suspend in their ``->suspend`` methods).
+	suspend in their ``->suspend`` methods). In fact, the PM core prevents
+	subsystems or drivers from putting devices into runtime suspend at
+	these times by calling :c:func:`pm_runtime_get_noresume` before issuing
+	the ``->prepare`` callback (and calling :c:func:`pm_runtime_put` after
+	issuing the ``->complete`` callback).
 
 3.	For a number of devices it is convenient to split suspend into the
 	"quiesce device" and "save device state" phases, in which cases
···
 	Note, however, that new children may be registered below the device as
 	soon as the ``->resume`` callbacks occur; it's not necessary to wait
-	until the ``complete`` phase with that.
+	until the ``complete`` phase runs.
 
 	Moreover, if the preceding ``->prepare`` callback returned a positive
 	number, the device may have been left in runtime suspend throughout the
-	whole system suspend and resume (the ``suspend``, ``suspend_late``,
-	``suspend_noirq`` phases of system suspend and the ``resume_noirq``,
-	``resume_early``, ``resume`` phases of system resume may have been
-	skipped for it). In that case, the ``->complete`` callback is entirely
+	whole system suspend and resume (its ``->suspend``, ``->suspend_late``,
+	``->suspend_noirq``, ``->resume_noirq``, ``->resume_early``, and
+	``->resume`` callbacks may have been skipped). In that case, the
+	``->complete`` callback is entirely
 	responsible for putting the device into a consistent state after system
 	suspend if necessary. [For example, it may need to queue up a runtime
 	resume request for the device for this purpose.] To check if that is
 	the case, the ``->complete`` callback can consult the device's
-	``power.direct_complete`` flag. Namely, if that flag is set when the
-	``->complete`` callback is being run, it has been called directly after
-	the preceding ``->prepare`` and special actions may be required
-	to make the device work correctly afterward.
+	``power.direct_complete`` flag. If that flag is set when the
+	``->complete`` callback is being run then the direct-complete mechanism
+	was used, and special actions may be required to make the device work
+	correctly afterward.
 
 At the end of these phases, drivers should be as functional as they were before
 suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are
···
 The ``->poweroff``, ``->poweroff_late`` and ``->poweroff_noirq`` callbacks
 should do essentially the same things as the ``->suspend``, ``->suspend_late``
-and ``->suspend_noirq`` callbacks, respectively. The only notable difference is
+and ``->suspend_noirq`` callbacks, respectively. A notable difference is
 that they need not store the device register values, because the registers
 should already have been stored during the ``freeze``, ``freeze_late`` or
-``freeze_noirq`` phases.
+``freeze_noirq`` phases. Also, on many machines the firmware will power-down
+the entire system, so it is not necessary for the callback to put the device in
+a low-power state.
 
 
 Leaving Hibernation
···
 If it is necessary to resume a device from runtime suspend during a system-wide
 transition into a sleep state, that can be done by calling
-:c:func:`pm_runtime_resume` for it from the ``->suspend`` callback (or its
-couterpart for transitions related to hibernation) of either the device's driver
-or a subsystem responsible for it (for example, a bus type or a PM domain).
-That is guaranteed to work by the requirement that subsystems must not change
-the state of devices (possibly except for resuming them from runtime suspend)
+:c:func:`pm_runtime_resume` from the ``->suspend`` callback (or the ``->freeze``
+or ``->poweroff`` callback for transitions related to hibernation) of either the
+device's driver or its subsystem (for example, a bus type or a PM domain).
+However, subsystems must not otherwise change the runtime status of devices
 from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
 invoking device drivers' ``->suspend`` callbacks (or equivalent).
 
+.. _smart_suspend_flag:
+
+The ``DPM_FLAG_SMART_SUSPEND`` Driver Flag
+------------------------------------------
+
 Some bus types and PM domains have a policy to resume all devices from runtime
 suspend upfront in their ``->suspend`` callbacks, but that may not be really
-necessary if the driver of the device can cope with runtime-suspended devices.
-The driver can indicate that by setting ``DPM_FLAG_SMART_SUSPEND`` in
-:c:member:`power.driver_flags` at the probe time, by passing it to the
-:c:func:`dev_pm_set_driver_flags` helper. That also may cause middle-layer code
+necessary if the device's driver can cope with runtime-suspended devices.
+The driver can indicate this by setting ``DPM_FLAG_SMART_SUSPEND`` in
+:c:member:`power.driver_flags` at probe time, with the assistance of the
+:c:func:`dev_pm_set_driver_flags` helper routine.
+
+Setting that flag causes the PM core and middle-layer code
 (bus types, PM domains etc.) to skip the ``->suspend_late`` and
 ``->suspend_noirq`` callbacks provided by the driver if the device remains in
-runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
-suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
-has been disabled for it, under the assumption that its state should not change
-after that point until the system-wide transition is over (the PM core itself
-does that for devices whose "noirq", "late" and "early" system-wide PM callbacks
-are executed directly by it). If that happens, the driver's system-wide resume
-callbacks, if present, may still be invoked during the subsequent system-wide
-resume transition and the device's runtime power management status may be set
-to "active" before enabling runtime PM for it, so the driver must be prepared to
-cope with the invocation of its system-wide resume callbacks back-to-back with
-its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` and
-so on) and the final state of the device must reflect the "active" runtime PM
-status in that case.
+runtime suspend throughout those phases of the system-wide suspend (and
+similarly for the "freeze" and "poweroff" parts of system hibernation).
+[Otherwise the same driver callback might be executed twice in a row for the
+same device, which would not be valid in general.] If the middle-layer
+system-wide PM callbacks are present for the device then they are responsible
+for skipping these driver callbacks; if not then the PM core skips them. The
+subsystem callback routines can determine whether they need to skip the driver
+callbacks by testing the return value from the :c:func:`dev_pm_skip_suspend`
+helper function.
+
+In addition, with ``DPM_FLAG_SMART_SUSPEND`` set, the driver's ``->thaw_noirq``
+and ``->thaw_early`` callbacks are skipped in hibernation if the device remained
+in runtime suspend throughout the preceding "freeze" transition. Again, if the
+middle-layer callbacks are present for the device, they are responsible for
+doing this, otherwise the PM core takes care of it.
+
+
+The ``DPM_FLAG_MAY_SKIP_RESUME`` Driver Flag
+--------------------------------------------
 
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`.
 [Refer to that document for more information regarding this particular issue as
 well as for information on the device runtime power management framework in
-general.]
-
-However, it often is desirable to leave devices in suspend after system
-transitions to the working state, especially if those devices had been in
+general.] However, it often is desirable to leave devices in suspend after
+system transitions to the working state, especially if those devices had been in
 runtime suspend before the preceding system-wide suspend (or analogous)
-transition. Device drivers can use the ``DPM_FLAG_LEAVE_SUSPENDED`` flag to
-indicate to the PM core (and middle-layer code) that they prefer the specific
-devices handled by them to be left suspended and they have no problems with
-skipping their system-wide resume callbacks for this reason. Whether or not the
-devices will actually be left in suspend may depend on their state before the
-given system suspend-resume cycle and on the type of the system transition under
-way. In particular, devices are not left suspended if that transition is a
-restore from hibernation, as device states are not guaranteed to be reflected
-by the information stored in the hibernation image in that case.
+transition.
 
-The middle-layer code involved in the handling of the device is expected to
-indicate to the PM core if the device may be left in suspend by setting its
-:c:member:`power.may_skip_resume` status bit which is checked by the PM core
-during the "noirq" phase of the preceding system-wide suspend (or analogous)
-transition. The middle layer is then responsible for handling the device as
-appropriate in its "noirq" resume callback, which is executed regardless of
-whether or not the device is left suspended, but the other resume callbacks
-(except for ``->complete``) will be skipped automatically by the PM core if the
-device really can be left in suspend.
+To that end, device drivers can use the ``DPM_FLAG_MAY_SKIP_RESUME`` flag to
+indicate to the PM core and middle-layer code that they allow their "noirq" and
+"early" resume callbacks to be skipped if the device can be left in suspend
+after system-wide PM transitions to the working state. Whether or not that is
+the case generally depends on the state of the device before the given system
+suspend-resume cycle and on the type of the system transition under way.
+In particular, the "thaw" and "restore" transitions related to hibernation are
+not affected by ``DPM_FLAG_MAY_SKIP_RESUME`` at all. [All callbacks are
+issued during the "restore" transition regardless of the flag settings,
+and whether or not any driver callbacks are skipped during the "thaw"
+transition depends on whether or not the ``DPM_FLAG_SMART_SUSPEND`` flag is set
+(see `above <smart_suspend_flag_>`_). In addition, a device is not allowed to
+remain in runtime suspend if any of its children will be returned to full
+power.]
 
-For devices whose "noirq", "late" and "early" driver callbacks are invoked
-directly by the PM core, all of the system-wide resume callbacks are skipped if
-``DPM_FLAG_LEAVE_SUSPENDED`` is set and the device is in runtime suspend during
-the ``suspend_noirq`` (or analogous) phase or the transition under way is a
-proper system suspend (rather than anything related to hibernation) and the
-device's wakeup settings are suitable for runtime PM (that is, it cannot
-generate wakeup signals at all or it is allowed to wake up the system from
-sleep).
+The ``DPM_FLAG_MAY_SKIP_RESUME`` flag is taken into account in combination with
+the :c:member:`power.may_skip_resume` status bit set by the PM core during the
+"suspend" phase of suspend-type transitions. If the driver or the middle layer
+has a reason to prevent the driver's "noirq" and "early" resume callbacks from
+being skipped during the subsequent system resume transition, it should clear
+:c:member:`power.may_skip_resume` in its ``->suspend``, ``->suspend_late`` or
+``->suspend_noirq`` callback. [Note that the drivers setting
+``DPM_FLAG_SMART_SUSPEND`` need to clear :c:member:`power.may_skip_resume` in
+their ``->suspend`` callback in case the other two are skipped.]
+
+Setting the :c:member:`power.may_skip_resume` status bit along with the
+``DPM_FLAG_MAY_SKIP_RESUME`` flag is necessary, but generally not sufficient,
+for the driver's "noirq" and "early" resume callbacks to be skipped. Whether or
+not they should be skipped can be determined by evaluating the
+:c:func:`dev_pm_skip_resume` helper function.
+
+If that function returns ``true``, the driver's "noirq" and "early" resume
+callbacks should be skipped and the device's runtime PM status will be set to
+"suspended" by the PM core. Otherwise, if the device was runtime-suspended
+during the preceding system-wide suspend transition and its
+``DPM_FLAG_SMART_SUSPEND`` is set, its runtime PM status will be set to
+"active" by the PM core. [Hence, the drivers that do not set
+``DPM_FLAG_SMART_SUSPEND`` should not expect the runtime PM status of their
+devices to be changed from "suspended" to "active" by the PM core during
+system-wide resume-type transitions.]
+
+If the ``DPM_FLAG_MAY_SKIP_RESUME`` flag is not set for a device, but
+``DPM_FLAG_SMART_SUSPEND`` is set and the driver's "late" and "noirq" suspend
+callbacks are skipped, its system-wide "noirq" and "early" resume callbacks, if
+present, are invoked as usual and the device's runtime PM status is set to
+"active" by the PM core before enabling runtime PM for it. In that case, the
+driver must be prepared to cope with the invocation of its system-wide resume
+callbacks back-to-back with its ``->runtime_suspend`` one (without the
+intervening ``->runtime_resume`` and system-wide suspend callbacks) and the
+final state of the device must reflect the "active" runtime PM status in that
+case. [Note that this is not a problem at all if the driver's
+``->suspend_late`` callback pointer points to the same function as its
+``->runtime_suspend`` one and its ``->resume_early`` callback pointer points to
+the same function as the ``->runtime_resume`` one, while none of the other
+system-wide suspend-resume callbacks of the driver are present, for example.]
+
+Likewise, if ``DPM_FLAG_MAY_SKIP_RESUME`` is set for a device, its driver's
+system-wide "noirq" and "early" resume callbacks may be skipped while its "late"
+and "noirq" suspend callbacks may have been executed (in principle, regardless
+of whether or not ``DPM_FLAG_SMART_SUSPEND`` is set). In that case, the driver
+needs to be able to cope with the invocation of its ``->runtime_resume``
+callback back-to-back with its "late" and "noirq" suspend ones. [For instance,
+that is not a concern if the driver sets both ``DPM_FLAG_SMART_SUSPEND`` and
+``DPM_FLAG_MAY_SKIP_RESUME`` and uses the same pair of suspend/resume callback
+functions for runtime PM and system-wide suspend/resume.]
+26 -28
Documentation/power/pci.rst
··· 1004 1004 time with the help of the dev_pm_set_driver_flags() function and they should not 1005 1005 be updated directly afterwards. 1006 1006 1007 - The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete 1008 - mechanism allowing device suspend/resume callbacks to be skipped if the device 1009 - is in runtime suspend when the system suspend starts. That also affects all of 1010 - the ancestors of the device, so this flag should only be used if absolutely 1011 - necessary. 1007 + The DPM_FLAG_NO_DIRECT_COMPLETE flag prevents the PM core from using the 1008 + direct-complete mechanism allowing device suspend/resume callbacks to be skipped 1009 + if the device is in runtime suspend when the system suspend starts. That also 1010 + affects all of the ancestors of the device, so this flag should only be used if 1011 + absolutely necessary. 1012 1012 1013 - The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a 1014 - positive value from pci_pm_prepare() if the ->prepare callback provided by the 1013 + The DPM_FLAG_SMART_PREPARE flag causes the PCI bus type to return a positive 1014 + value from pci_pm_prepare() only if the ->prepare callback provided by the 1015 1015 driver of the device returns a positive value. That allows the driver to opt 1016 - out from using the direct-complete mechanism dynamically. 1016 + out from using the direct-complete mechanism dynamically (whereas setting 1017 + DPM_FLAG_NO_DIRECT_COMPLETE means permanent opt-out). 1017 1018 1018 1019 The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's 1019 1020 perspective the device can be safely left in runtime suspend during system 1020 1021 suspend. That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff() 1021 - to skip resuming the device from runtime suspend unless there are PCI-specific 1022 - reasons for doing that. 
Also, it causes pci_pm_suspend_late/noirq(), 1023 - pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq() to return early 1024 - if the device remains in runtime suspend in the beginning of the "late" phase 1025 - of the system-wide transition under way. Moreover, if the device is in 1026 - runtime suspend in pci_pm_resume_noirq() or pci_pm_restore_noirq(), its runtime 1027 - power management status will be changed to "active" (as it is going to be put 1028 - into D0 going forward), but if it is in runtime suspend in pci_pm_thaw_noirq(), 1029 - the function will set the power.direct_complete flag for it (to make the PM core 1030 - skip the subsequent "thaw" callbacks for it) and return. 1022 + to avoid resuming the device from runtime suspend unless there are PCI-specific 1023 + reasons for doing that. Also, it causes pci_pm_suspend_late/noirq() and 1024 + pci_pm_poweroff_late/noirq() to return early if the device remains in runtime 1025 + suspend during the "late" phase of the system-wide transition under way. 1026 + Moreover, if the device is in runtime suspend in pci_pm_resume_noirq() or 1027 + pci_pm_restore_noirq(), its runtime PM status will be changed to "active" (as it 1028 + is going to be put into D0 going forward). 1031 1029 1032 - Setting the DPM_FLAG_LEAVE_SUSPENDED flag means that the driver prefers the 1033 - device to be left in suspend after system-wide transitions to the working state. 
1034 - This flag is checked by the PM core, but the PCI bus type informs the PM core 1035 - which devices may be left in suspend from its perspective (that happens during 1036 - the "noirq" phase of system-wide suspend and analogous transitions) and next it 1037 - uses the dev_pm_may_skip_resume() helper to decide whether or not to return from 1038 - pci_pm_resume_noirq() early, as the PM core will skip the remaining resume 1039 - callbacks for the device during the transition under way and will set its 1040 - runtime PM status to "suspended" if dev_pm_may_skip_resume() returns "true" for 1041 - it. 1030 + Setting the DPM_FLAG_MAY_SKIP_RESUME flag means that the driver allows its 1031 + "noirq" and "early" resume callbacks to be skipped if the device can be left 1032 + in suspend after a system-wide transition into the working state. This flag is 1033 + taken into consideration by the PM core along with the power.may_skip_resume 1034 + status bit of the device which is set by pci_pm_suspend_noirq() in certain 1035 + situations. If the PM core determines that the driver's "noirq" and "early" 1036 + resume callbacks should be skipped, the dev_pm_skip_resume() helper function 1037 + will return "true" and that will cause pci_pm_resume_noirq() and 1038 + pci_pm_resume_early() to return upfront without touching the device and 1039 + executing the driver callbacks. 1042 1040 1043 1041 3.2. Device Runtime Power Management 1044 1042 ------------------------------------
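The decision flow the reworked pci.rst text describes can be modeled outside the kernel. This is a simplified userspace sketch under stated assumptions: the flag names mirror the kernel's, but `struct dev_model` and the three helpers are stand-ins for `dev_pm_skip_suspend()`, the `must_resume` computation in `__device_suspend_noirq()`, and `dev_pm_skip_resume()` on a plain system resume (not THAW or RESTORE).

```c
#include <stdbool.h>

#define DPM_FLAG_SMART_SUSPEND   (1u << 0)
#define DPM_FLAG_MAY_SKIP_RESUME (1u << 1)

struct dev_model {
	unsigned int driver_flags;
	bool runtime_suspended;	/* PM-runtime status when system suspend starts */
	bool may_skip_resume;	/* set by the bus type during the "noirq" phase */
	int usage_count;	/* PM-runtime usage counter */
	bool must_resume;
};

/* Mirrors dev_pm_skip_suspend(): smart-suspend flag plus runtime-suspended status. */
static bool skip_suspend(const struct dev_model *d)
{
	return (d->driver_flags & DPM_FLAG_SMART_SUSPEND) && d->runtime_suspended;
}

/* Mirrors the must_resume update in the "noirq" suspend phase: devices in use
 * right before suspend, or not opted in on both sides, must be resumed. */
static void noirq_suspend(struct dev_model *d)
{
	if (d->usage_count > 1 ||
	    !((d->driver_flags & DPM_FLAG_MAY_SKIP_RESUME) && d->may_skip_resume))
		d->must_resume = true;
}

/* Mirrors dev_pm_skip_resume() for an ordinary resume transition. */
static bool skip_resume(const struct dev_model *d)
{
	return !d->must_resume;
}
```

A device with both flags set, a quiescent usage counter, and `may_skip_resume` granted by the bus type ends up skipping its "noirq" and "early" resume callbacks; raising the usage counter forces a full resume.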
+1
MAINTAINERS
··· 2237 2237 F: drivers/*/qcom/ 2238 2238 F: drivers/bluetooth/btqcomsmd.c 2239 2239 F: drivers/clocksource/timer-qcom.c 2240 + F: drivers/cpuidle/cpuidle-qcom-spm.c 2240 2241 F: drivers/extcon/extcon-qcom* 2241 2242 F: drivers/i2c/busses/i2c-qcom-geni.c 2242 2243 F: drivers/i2c/busses/i2c-qup.c
+7 -7
drivers/acpi/acpi_lpss.c
··· 1041 1041 { 1042 1042 int ret; 1043 1043 1044 - if (dev_pm_smart_suspend_and_suspended(dev)) 1044 + if (dev_pm_skip_suspend(dev)) 1045 1045 return 0; 1046 1046 1047 1047 ret = pm_generic_suspend_late(dev); ··· 1093 1093 if (pdata->dev_desc->resume_from_noirq) 1094 1094 return 0; 1095 1095 1096 + if (dev_pm_skip_resume(dev)) 1097 + return 0; 1098 + 1096 1099 return acpi_lpss_do_resume_early(dev); 1097 1100 } 1098 1101 ··· 1105 1102 int ret; 1106 1103 1107 1104 /* Follow acpi_subsys_resume_noirq(). */ 1108 - if (dev_pm_may_skip_resume(dev)) 1105 + if (dev_pm_skip_resume(dev)) 1109 1106 return 0; 1110 - 1111 - if (dev_pm_smart_suspend_and_suspended(dev)) 1112 - pm_runtime_set_active(dev); 1113 1107 1114 1108 ret = pm_generic_resume_noirq(dev); 1115 1109 if (ret) ··· 1169 1169 { 1170 1170 struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev)); 1171 1171 1172 - if (dev_pm_smart_suspend_and_suspended(dev)) 1172 + if (dev_pm_skip_suspend(dev)) 1173 1173 return 0; 1174 1174 1175 1175 if (pdata->dev_desc->resume_from_noirq) ··· 1182 1182 { 1183 1183 struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev)); 1184 1184 1185 - if (dev_pm_smart_suspend_and_suspended(dev)) 1185 + if (dev_pm_skip_suspend(dev)) 1186 1186 return 0; 1187 1187 1188 1188 if (pdata->dev_desc->resume_from_noirq) {
+1 -1
drivers/acpi/acpi_tad.c
··· 624 624 */ 625 625 device_init_wakeup(dev, true); 626 626 dev_pm_set_driver_flags(dev, DPM_FLAG_SMART_SUSPEND | 627 - DPM_FLAG_LEAVE_SUSPENDED); 627 + DPM_FLAG_MAY_SKIP_RESUME); 628 628 /* 629 629 * The platform bus type layer tells the ACPI PM domain powers up the 630 630 * device, so set the runtime PM status of it to "active".
+13 -18
drivers/acpi/device_pm.c
··· 1084 1084 { 1085 1085 int ret; 1086 1086 1087 - if (dev_pm_smart_suspend_and_suspended(dev)) 1087 + if (dev_pm_skip_suspend(dev)) 1088 1088 return 0; 1089 1089 1090 1090 ret = pm_generic_suspend_late(dev); ··· 1100 1100 { 1101 1101 int ret; 1102 1102 1103 - if (dev_pm_smart_suspend_and_suspended(dev)) { 1104 - dev->power.may_skip_resume = true; 1103 + if (dev_pm_skip_suspend(dev)) 1105 1104 return 0; 1106 - } 1107 1105 1108 1106 ret = pm_generic_suspend_noirq(dev); 1109 1107 if (ret) ··· 1114 1116 * acpi_subsys_complete() to take care of fixing up the device's state 1115 1117 * anyway, if need be. 1116 1118 */ 1117 - dev->power.may_skip_resume = device_may_wakeup(dev) || 1118 - !device_can_wakeup(dev); 1119 + if (device_can_wakeup(dev) && !device_may_wakeup(dev)) 1120 + dev->power.may_skip_resume = false; 1119 1121 1120 1122 return 0; 1121 1123 } ··· 1127 1129 */ 1128 1130 static int acpi_subsys_resume_noirq(struct device *dev) 1129 1131 { 1130 - if (dev_pm_may_skip_resume(dev)) 1132 + if (dev_pm_skip_resume(dev)) 1131 1133 return 0; 1132 - 1133 - /* 1134 - * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend 1135 - * during system suspend, so update their runtime PM status to "active" 1136 - * as they will be put into D0 going forward. 1137 - */ 1138 - if (dev_pm_smart_suspend_and_suspended(dev)) 1139 - pm_runtime_set_active(dev); 1140 1134 1141 1135 return pm_generic_resume_noirq(dev); 1142 1136 } ··· 1143 1153 */ 1144 1154 static int acpi_subsys_resume_early(struct device *dev) 1145 1155 { 1146 - int ret = acpi_dev_resume(dev); 1156 + int ret; 1157 + 1158 + if (dev_pm_skip_resume(dev)) 1159 + return 0; 1160 + 1161 + ret = acpi_dev_resume(dev); 1147 1162 return ret ? 
ret : pm_generic_resume_early(dev); 1148 1163 } 1149 1164 ··· 1213 1218 { 1214 1219 int ret; 1215 1220 1216 - if (dev_pm_smart_suspend_and_suspended(dev)) 1221 + if (dev_pm_skip_suspend(dev)) 1217 1222 return 0; 1218 1223 1219 1224 ret = pm_generic_poweroff_late(dev); ··· 1229 1234 */ 1230 1235 static int acpi_subsys_poweroff_noirq(struct device *dev) 1231 1236 { 1232 - if (dev_pm_smart_suspend_and_suspended(dev)) 1237 + if (dev_pm_skip_suspend(dev)) 1233 1238 return 0; 1234 1239 1235 1240 return pm_generic_poweroff_noirq(dev);
+1 -1
drivers/acpi/ec.c
··· 2017 2017 */ 2018 2018 ret = acpi_dispatch_gpe(NULL, first_ec->gpe); 2019 2019 if (ret == ACPI_INTERRUPT_HANDLED) { 2020 - pm_pr_dbg("EC GPE dispatched\n"); 2020 + pm_pr_dbg("ACPI EC GPE dispatched\n"); 2021 2021 2022 2022 /* Flush the event and query workqueues. */ 2023 2023 acpi_ec_flush_work();
+15 -5
drivers/acpi/sleep.c
··· 992 992 * wakeup is pending anyway and the SCI is not the source of 993 993 * it). 994 994 */ 995 - if (irqd_is_wakeup_armed(irq_get_irq_data(acpi_sci_irq))) 995 + if (irqd_is_wakeup_armed(irq_get_irq_data(acpi_sci_irq))) { 996 + pm_pr_dbg("Wakeup unrelated to ACPI SCI\n"); 996 997 return true; 998 + } 997 999 998 1000 /* 999 1001 * If the status bit of any enabled fixed event is set, the 1000 1002 * wakeup is regarded as valid. 1001 1003 */ 1002 - if (acpi_any_fixed_event_status_set()) 1004 + if (acpi_any_fixed_event_status_set()) { 1005 + pm_pr_dbg("ACPI fixed event wakeup\n"); 1003 1006 return true; 1007 + } 1004 1008 1005 1009 /* Check wakeups from drivers sharing the SCI. */ 1006 - if (acpi_check_wakeup_handlers()) 1010 + if (acpi_check_wakeup_handlers()) { 1011 + pm_pr_dbg("ACPI custom handler wakeup\n"); 1007 1012 return true; 1013 + } 1008 1014 1009 1015 /* Check non-EC GPE wakeups and dispatch the EC GPE. */ 1010 - if (acpi_ec_dispatch_gpe()) 1016 + if (acpi_ec_dispatch_gpe()) { 1017 + pm_pr_dbg("ACPI non-EC GPE wakeup\n"); 1011 1018 return true; 1019 + } 1012 1020 1013 1021 /* 1014 1022 * Cancel the SCI wakeup and process all pending events in case ··· 1035 1027 * are pending here, they must be resulting from the processing 1036 1028 * of EC events above or coming from somewhere else. 1037 1029 */ 1038 - if (pm_wakeup_pending()) 1030 + if (pm_wakeup_pending()) { 1031 + pm_pr_dbg("Wakeup after ACPI Notify sync\n"); 1039 1032 return true; 1033 + } 1040 1034 1041 1035 rearm_wake_irq(acpi_sci_irq); 1042 1036 }
+113 -239
drivers/base/power/main.c
··· 562 562 /*------------------------- Resume routines -------------------------*/ 563 563 564 564 /** 565 - * suspend_event - Return a "suspend" message for given "resume" one. 566 - * @resume_msg: PM message representing a system-wide resume transition. 567 - */ 568 - static pm_message_t suspend_event(pm_message_t resume_msg) 569 - { 570 - switch (resume_msg.event) { 571 - case PM_EVENT_RESUME: 572 - return PMSG_SUSPEND; 573 - case PM_EVENT_THAW: 574 - case PM_EVENT_RESTORE: 575 - return PMSG_FREEZE; 576 - case PM_EVENT_RECOVER: 577 - return PMSG_HIBERNATE; 578 - } 579 - return PMSG_ON; 580 - } 581 - 582 - /** 583 - * dev_pm_may_skip_resume - System-wide device resume optimization check. 565 + * dev_pm_skip_resume - System-wide device resume optimization check. 584 566 * @dev: Target device. 585 567 * 586 - * Checks whether or not the device may be left in suspend after a system-wide 587 - * transition to the working state. 568 + * Return: 569 + * - %false if the transition under way is RESTORE. 570 + * - Return value of dev_pm_skip_suspend() if the transition under way is THAW. 571 + * - The logical negation of %power.must_resume otherwise (that is, when the 572 + * transition under way is RESUME). 
588 573 */ 589 - bool dev_pm_may_skip_resume(struct device *dev) 574 + bool dev_pm_skip_resume(struct device *dev) 590 575 { 591 - return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE; 576 + if (pm_transition.event == PM_EVENT_RESTORE) 577 + return false; 578 + 579 + if (pm_transition.event == PM_EVENT_THAW) 580 + return dev_pm_skip_suspend(dev); 581 + 582 + return !dev->power.must_resume; 592 583 } 593 - 594 - static pm_callback_t dpm_subsys_resume_noirq_cb(struct device *dev, 595 - pm_message_t state, 596 - const char **info_p) 597 - { 598 - pm_callback_t callback; 599 - const char *info; 600 - 601 - if (dev->pm_domain) { 602 - info = "noirq power domain "; 603 - callback = pm_noirq_op(&dev->pm_domain->ops, state); 604 - } else if (dev->type && dev->type->pm) { 605 - info = "noirq type "; 606 - callback = pm_noirq_op(dev->type->pm, state); 607 - } else if (dev->class && dev->class->pm) { 608 - info = "noirq class "; 609 - callback = pm_noirq_op(dev->class->pm, state); 610 - } else if (dev->bus && dev->bus->pm) { 611 - info = "noirq bus "; 612 - callback = pm_noirq_op(dev->bus->pm, state); 613 - } else { 614 - return NULL; 615 - } 616 - 617 - if (info_p) 618 - *info_p = info; 619 - 620 - return callback; 621 - } 622 - 623 - static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev, 624 - pm_message_t state, 625 - const char **info_p); 626 - 627 - static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev, 628 - pm_message_t state, 629 - const char **info_p); 630 584 631 585 /** 632 586 * device_resume_noirq - Execute a "noirq resume" callback for given device. 
··· 593 639 */ 594 640 static int device_resume_noirq(struct device *dev, pm_message_t state, bool async) 595 641 { 596 - pm_callback_t callback; 597 - const char *info; 642 + pm_callback_t callback = NULL; 643 + const char *info = NULL; 598 644 bool skip_resume; 599 645 int error = 0; 600 646 ··· 610 656 if (!dpm_wait_for_superior(dev, async)) 611 657 goto Out; 612 658 613 - skip_resume = dev_pm_may_skip_resume(dev); 659 + skip_resume = dev_pm_skip_resume(dev); 660 + /* 661 + * If the driver callback is skipped below or by the middle layer 662 + * callback and device_resume_early() also skips the driver callback for 663 + * this device later, it needs to appear as "suspended" to PM-runtime, 664 + * so change its status accordingly. 665 + * 666 + * Otherwise, the device is going to be resumed, so set its PM-runtime 667 + * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set 668 + * to avoid confusing drivers that don't use it. 669 + */ 670 + if (skip_resume) 671 + pm_runtime_set_suspended(dev); 672 + else if (dev_pm_skip_suspend(dev)) 673 + pm_runtime_set_active(dev); 614 674 615 - callback = dpm_subsys_resume_noirq_cb(dev, state, &info); 675 + if (dev->pm_domain) { 676 + info = "noirq power domain "; 677 + callback = pm_noirq_op(&dev->pm_domain->ops, state); 678 + } else if (dev->type && dev->type->pm) { 679 + info = "noirq type "; 680 + callback = pm_noirq_op(dev->type->pm, state); 681 + } else if (dev->class && dev->class->pm) { 682 + info = "noirq class "; 683 + callback = pm_noirq_op(dev->class->pm, state); 684 + } else if (dev->bus && dev->bus->pm) { 685 + info = "noirq bus "; 686 + callback = pm_noirq_op(dev->bus->pm, state); 687 + } 616 688 if (callback) 617 689 goto Run; 618 690 619 691 if (skip_resume) 620 692 goto Skip; 621 - 622 - if (dev_pm_smart_suspend_and_suspended(dev)) { 623 - pm_message_t suspend_msg = suspend_event(state); 624 - 625 - /* 626 - * If "freeze" callbacks have been skipped during a transition 627 - * related to 
hibernation, the subsequent "thaw" callbacks must 628 - * be skipped too or bad things may happen. Otherwise, resume 629 - * callbacks are going to be run for the device, so its runtime 630 - * PM status must be changed to reflect the new state after the 631 - * transition under way. 632 - */ 633 - if (!dpm_subsys_suspend_late_cb(dev, suspend_msg, NULL) && 634 - !dpm_subsys_suspend_noirq_cb(dev, suspend_msg, NULL)) { 635 - if (state.event == PM_EVENT_THAW) { 636 - skip_resume = true; 637 - goto Skip; 638 - } else { 639 - pm_runtime_set_active(dev); 640 - } 641 - } 642 - } 643 693 644 694 if (dev->driver && dev->driver->pm) { 645 695 info = "noirq driver "; ··· 655 697 656 698 Skip: 657 699 dev->power.is_noirq_suspended = false; 658 - 659 - if (skip_resume) { 660 - /* Make the next phases of resume skip the device. */ 661 - dev->power.is_late_suspended = false; 662 - dev->power.is_suspended = false; 663 - /* 664 - * The device is going to be left in suspend, but it might not 665 - * have been in runtime suspend before the system suspended, so 666 - * its runtime PM status needs to be updated to avoid confusing 667 - * the runtime PM framework when runtime PM is enabled for the 668 - * device again. 
669 - */ 670 - pm_runtime_set_suspended(dev); 671 - } 672 700 673 701 Out: 674 702 complete_all(&dev->power.completion); ··· 754 810 cpuidle_resume(); 755 811 } 756 812 757 - static pm_callback_t dpm_subsys_resume_early_cb(struct device *dev, 758 - pm_message_t state, 759 - const char **info_p) 760 - { 761 - pm_callback_t callback; 762 - const char *info; 763 - 764 - if (dev->pm_domain) { 765 - info = "early power domain "; 766 - callback = pm_late_early_op(&dev->pm_domain->ops, state); 767 - } else if (dev->type && dev->type->pm) { 768 - info = "early type "; 769 - callback = pm_late_early_op(dev->type->pm, state); 770 - } else if (dev->class && dev->class->pm) { 771 - info = "early class "; 772 - callback = pm_late_early_op(dev->class->pm, state); 773 - } else if (dev->bus && dev->bus->pm) { 774 - info = "early bus "; 775 - callback = pm_late_early_op(dev->bus->pm, state); 776 - } else { 777 - return NULL; 778 - } 779 - 780 - if (info_p) 781 - *info_p = info; 782 - 783 - return callback; 784 - } 785 - 786 813 /** 787 814 * device_resume_early - Execute an "early resume" callback for given device. 788 815 * @dev: Device to handle. 
··· 764 849 */ 765 850 static int device_resume_early(struct device *dev, pm_message_t state, bool async) 766 851 { 767 - pm_callback_t callback; 768 - const char *info; 852 + pm_callback_t callback = NULL; 853 + const char *info = NULL; 769 854 int error = 0; 770 855 771 856 TRACE_DEVICE(dev); ··· 780 865 if (!dpm_wait_for_superior(dev, async)) 781 866 goto Out; 782 867 783 - callback = dpm_subsys_resume_early_cb(dev, state, &info); 868 + if (dev->pm_domain) { 869 + info = "early power domain "; 870 + callback = pm_late_early_op(&dev->pm_domain->ops, state); 871 + } else if (dev->type && dev->type->pm) { 872 + info = "early type "; 873 + callback = pm_late_early_op(dev->type->pm, state); 874 + } else if (dev->class && dev->class->pm) { 875 + info = "early class "; 876 + callback = pm_late_early_op(dev->class->pm, state); 877 + } else if (dev->bus && dev->bus->pm) { 878 + info = "early bus "; 879 + callback = pm_late_early_op(dev->bus->pm, state); 880 + } 881 + if (callback) 882 + goto Run; 784 883 785 - if (!callback && dev->driver && dev->driver->pm) { 884 + if (dev_pm_skip_resume(dev)) 885 + goto Skip; 886 + 887 + if (dev->driver && dev->driver->pm) { 786 888 info = "early driver "; 787 889 callback = pm_late_early_op(dev->driver->pm, state); 788 890 } 789 891 892 + Run: 790 893 error = dpm_run_callback(callback, dev, state, info); 894 + 895 + Skip: 791 896 dev->power.is_late_suspended = false; 792 897 793 - Out: 898 + Out: 794 899 TRACE_RESUME(error); 795 900 796 901 pm_runtime_enable(dev); ··· 1180 1245 device_links_read_unlock(idx); 1181 1246 } 1182 1247 1183 - static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev, 1184 - pm_message_t state, 1185 - const char **info_p) 1186 - { 1187 - pm_callback_t callback; 1188 - const char *info; 1189 - 1190 - if (dev->pm_domain) { 1191 - info = "noirq power domain "; 1192 - callback = pm_noirq_op(&dev->pm_domain->ops, state); 1193 - } else if (dev->type && dev->type->pm) { 1194 - info = "noirq type "; 1195 - 
callback = pm_noirq_op(dev->type->pm, state); 1196 - } else if (dev->class && dev->class->pm) { 1197 - info = "noirq class "; 1198 - callback = pm_noirq_op(dev->class->pm, state); 1199 - } else if (dev->bus && dev->bus->pm) { 1200 - info = "noirq bus "; 1201 - callback = pm_noirq_op(dev->bus->pm, state); 1202 - } else { 1203 - return NULL; 1204 - } 1205 - 1206 - if (info_p) 1207 - *info_p = info; 1208 - 1209 - return callback; 1210 - } 1211 - 1212 - static bool device_must_resume(struct device *dev, pm_message_t state, 1213 - bool no_subsys_suspend_noirq) 1214 - { 1215 - pm_message_t resume_msg = resume_event(state); 1216 - 1217 - /* 1218 - * If all of the device driver's "noirq", "late" and "early" callbacks 1219 - * are invoked directly by the core, the decision to allow the device to 1220 - * stay in suspend can be based on its current runtime PM status and its 1221 - * wakeup settings. 1222 - */ 1223 - if (no_subsys_suspend_noirq && 1224 - !dpm_subsys_suspend_late_cb(dev, state, NULL) && 1225 - !dpm_subsys_resume_early_cb(dev, resume_msg, NULL) && 1226 - !dpm_subsys_resume_noirq_cb(dev, resume_msg, NULL)) 1227 - return !pm_runtime_status_suspended(dev) && 1228 - (resume_msg.event != PM_EVENT_RESUME || 1229 - (device_can_wakeup(dev) && !device_may_wakeup(dev))); 1230 - 1231 - /* 1232 - * The only safe strategy here is to require that if the device may not 1233 - * be left in suspend, resume callbacks must be invoked for it. 1234 - */ 1235 - return !dev->power.may_skip_resume; 1236 - } 1237 - 1238 1248 /** 1239 1249 * __device_suspend_noirq - Execute a "noirq suspend" callback for given device. 1240 1250 * @dev: Device to handle. 
··· 1191 1311 */ 1192 1312 static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool async) 1193 1313 { 1194 - pm_callback_t callback; 1195 - const char *info; 1196 - bool no_subsys_cb = false; 1314 + pm_callback_t callback = NULL; 1315 + const char *info = NULL; 1197 1316 int error = 0; 1198 1317 1199 1318 TRACE_DEVICE(dev); ··· 1206 1327 if (dev->power.syscore || dev->power.direct_complete) 1207 1328 goto Complete; 1208 1329 1209 - callback = dpm_subsys_suspend_noirq_cb(dev, state, &info); 1330 + if (dev->pm_domain) { 1331 + info = "noirq power domain "; 1332 + callback = pm_noirq_op(&dev->pm_domain->ops, state); 1333 + } else if (dev->type && dev->type->pm) { 1334 + info = "noirq type "; 1335 + callback = pm_noirq_op(dev->type->pm, state); 1336 + } else if (dev->class && dev->class->pm) { 1337 + info = "noirq class "; 1338 + callback = pm_noirq_op(dev->class->pm, state); 1339 + } else if (dev->bus && dev->bus->pm) { 1340 + info = "noirq bus "; 1341 + callback = pm_noirq_op(dev->bus->pm, state); 1342 + } 1210 1343 if (callback) 1211 1344 goto Run; 1212 1345 1213 - no_subsys_cb = !dpm_subsys_suspend_late_cb(dev, state, NULL); 1214 - 1215 - if (dev_pm_smart_suspend_and_suspended(dev) && no_subsys_cb) 1346 + if (dev_pm_skip_suspend(dev)) 1216 1347 goto Skip; 1217 1348 1218 1349 if (dev->driver && dev->driver->pm) { ··· 1240 1351 Skip: 1241 1352 dev->power.is_noirq_suspended = true; 1242 1353 1243 - if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) { 1244 - dev->power.must_resume = dev->power.must_resume || 1245 - atomic_read(&dev->power.usage_count) > 1 || 1246 - device_must_resume(dev, state, no_subsys_cb); 1247 - } else { 1354 + /* 1355 + * Skipping the resume of devices that were in use right before the 1356 + * system suspend (as indicated by their PM-runtime usage counters) 1357 + * would be suboptimal. Also resume them if doing that is not allowed 1358 + * to be skipped. 
1359 + */ 1360 + if (atomic_read(&dev->power.usage_count) > 1 || 1361 + !(dev_pm_test_driver_flags(dev, DPM_FLAG_MAY_SKIP_RESUME) && 1362 + dev->power.may_skip_resume)) 1248 1363 dev->power.must_resume = true; 1249 - } 1250 1364 1251 1365 if (dev->power.must_resume) 1252 1366 dpm_superior_set_must_resume(dev); ··· 1366 1474 spin_unlock_irq(&parent->power.lock); 1367 1475 } 1368 1476 1369 - static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev, 1370 - pm_message_t state, 1371 - const char **info_p) 1372 - { 1373 - pm_callback_t callback; 1374 - const char *info; 1375 - 1376 - if (dev->pm_domain) { 1377 - info = "late power domain "; 1378 - callback = pm_late_early_op(&dev->pm_domain->ops, state); 1379 - } else if (dev->type && dev->type->pm) { 1380 - info = "late type "; 1381 - callback = pm_late_early_op(dev->type->pm, state); 1382 - } else if (dev->class && dev->class->pm) { 1383 - info = "late class "; 1384 - callback = pm_late_early_op(dev->class->pm, state); 1385 - } else if (dev->bus && dev->bus->pm) { 1386 - info = "late bus "; 1387 - callback = pm_late_early_op(dev->bus->pm, state); 1388 - } else { 1389 - return NULL; 1390 - } 1391 - 1392 - if (info_p) 1393 - *info_p = info; 1394 - 1395 - return callback; 1396 - } 1397 - 1398 1477 /** 1399 1478 * __device_suspend_late - Execute a "late suspend" callback for given device. 1400 1479 * @dev: Device to handle. 
··· 1376 1513 */ 1377 1514 static int __device_suspend_late(struct device *dev, pm_message_t state, bool async) 1378 1515 { 1379 - pm_callback_t callback; 1380 - const char *info; 1516 + pm_callback_t callback = NULL; 1517 + const char *info = NULL; 1381 1518 int error = 0; 1382 1519 1383 1520 TRACE_DEVICE(dev); ··· 1398 1535 if (dev->power.syscore || dev->power.direct_complete) 1399 1536 goto Complete; 1400 1537 1401 - callback = dpm_subsys_suspend_late_cb(dev, state, &info); 1538 + if (dev->pm_domain) { 1539 + info = "late power domain "; 1540 + callback = pm_late_early_op(&dev->pm_domain->ops, state); 1541 + } else if (dev->type && dev->type->pm) { 1542 + info = "late type "; 1543 + callback = pm_late_early_op(dev->type->pm, state); 1544 + } else if (dev->class && dev->class->pm) { 1545 + info = "late class "; 1546 + callback = pm_late_early_op(dev->class->pm, state); 1547 + } else if (dev->bus && dev->bus->pm) { 1548 + info = "late bus "; 1549 + callback = pm_late_early_op(dev->bus->pm, state); 1550 + } 1402 1551 if (callback) 1403 1552 goto Run; 1404 1553 1405 - if (dev_pm_smart_suspend_and_suspended(dev) && 1406 - !dpm_subsys_suspend_noirq_cb(dev, state, NULL)) 1554 + if (dev_pm_skip_suspend(dev)) 1407 1555 goto Skip; 1408 1556 1409 1557 if (dev->driver && dev->driver->pm) { ··· 1640 1766 dev->power.direct_complete = false; 1641 1767 } 1642 1768 1643 - dev->power.may_skip_resume = false; 1769 + dev->power.may_skip_resume = true; 1644 1770 dev->power.must_resume = false; 1645 1771 1646 1772 dpm_watchdog_set(&wd, dev); ··· 1844 1970 spin_lock_irq(&dev->power.lock); 1845 1971 dev->power.direct_complete = state.event == PM_EVENT_SUSPEND && 1846 1972 (ret > 0 || dev->power.no_pm_callbacks) && 1847 - !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP); 1973 + !dev_pm_test_driver_flags(dev, DPM_FLAG_NO_DIRECT_COMPLETE); 1848 1974 spin_unlock_irq(&dev->power.lock); 1849 1975 return 0; 1850 1976 } ··· 2002 2128 spin_unlock_irq(&dev->power.lock); 2003 2129 } 2004 2130 
2005 - bool dev_pm_smart_suspend_and_suspended(struct device *dev) 2131 + bool dev_pm_skip_suspend(struct device *dev) 2006 2132 { 2007 2133 return dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) && 2008 2134 pm_runtime_status_suspended(dev);
+2 -4
drivers/base/power/runtime.c
··· 523 523 524 524 repeat: 525 525 retval = rpm_check_suspend_allowed(dev); 526 - 527 526 if (retval < 0) 528 - ; /* Conditions are wrong. */ 527 + goto out; /* Conditions are wrong. */ 529 528 530 529 /* Synchronous suspends are not allowed in the RPM_RESUMING state. */ 531 - else if (dev->power.runtime_status == RPM_RESUMING && 532 - !(rpmflags & RPM_ASYNC)) 530 + if (dev->power.runtime_status == RPM_RESUMING && !(rpmflags & RPM_ASYNC)) 533 531 retval = -EAGAIN; 534 532 if (retval) 535 533 goto out;
+2 -2
drivers/base/power/sysfs.c
··· 666 666 if (rc) 667 667 return rc; 668 668 669 - if (pm_runtime_callbacks_present(dev)) { 669 + if (!pm_runtime_has_no_callbacks(dev)) { 670 670 rc = sysfs_merge_group(&dev->kobj, &pm_runtime_attr_group); 671 671 if (rc) 672 672 goto err_out; ··· 709 709 if (rc) 710 710 return rc; 711 711 712 - if (pm_runtime_callbacks_present(dev)) { 712 + if (!pm_runtime_has_no_callbacks(dev)) { 713 713 rc = sysfs_group_change_owner( 714 714 &dev->kobj, &pm_runtime_attr_group, kuid, kgid); 715 715 if (rc)
+27 -3
drivers/clk/clk-qoriq.c
··· 95 95 }; 96 96 97 97 static struct clockgen clockgen; 98 + static bool add_cpufreq_dev __initdata; 98 99 99 100 static void cg_out(struct clockgen *cg, u32 val, u32 __iomem *reg) 100 101 { ··· 1020 1019 } 1021 1020 } 1022 1021 1023 - static void __init clockgen_init(struct device_node *np); 1022 + static void __init _clockgen_init(struct device_node *np, bool legacy); 1024 1023 1025 1024 /* 1026 1025 * Legacy nodes may get probed before the parent clockgen node. ··· 1031 1030 static void __init legacy_init_clockgen(struct device_node *np) 1032 1031 { 1033 1032 if (!clockgen.node) 1034 - clockgen_init(of_get_parent(np)); 1033 + _clockgen_init(of_get_parent(np), true); 1035 1034 } 1036 1035 1037 1036 /* Legacy node */ ··· 1448 1447 } 1449 1448 #endif 1450 1449 1451 - static void __init clockgen_init(struct device_node *np) 1450 + static void __init _clockgen_init(struct device_node *np, bool legacy) 1452 1451 { 1453 1452 int i, ret; 1454 1453 bool is_old_ls1021a = false; ··· 1517 1516 __func__, np, ret); 1518 1517 } 1519 1518 1519 + /* Don't create cpufreq device for legacy clockgen blocks */ 1520 + add_cpufreq_dev = !legacy; 1521 + 1520 1522 return; 1521 1523 err: 1522 1524 iounmap(clockgen.regs); 1523 1525 clockgen.regs = NULL; 1524 1526 } 1527 + 1528 + static void __init clockgen_init(struct device_node *np) 1529 + { 1530 + _clockgen_init(np, false); 1531 + } 1532 + 1533 + static int __init clockgen_cpufreq_init(void) 1534 + { 1535 + struct platform_device *pdev; 1536 + 1537 + if (add_cpufreq_dev) { 1538 + pdev = platform_device_register_simple("qoriq-cpufreq", -1, 1539 + NULL, 0); 1540 + if (IS_ERR(pdev)) 1541 + pr_err("Couldn't register qoriq-cpufreq err=%ld\n", 1542 + PTR_ERR(pdev)); 1543 + } 1544 + return 0; 1545 + } 1546 + device_initcall(clockgen_cpufreq_init); 1525 1547 1526 1548 CLK_OF_DECLARE(qoriq_clockgen_1, "fsl,qoriq-clockgen-1.0", clockgen_init); 1527 1549 CLK_OF_DECLARE(qoriq_clockgen_2, "fsl,qoriq-clockgen-2.0", clockgen_init);
+5 -1
drivers/clk/clk.c
··· 114 114 return 0; 115 115 116 116 ret = pm_runtime_get_sync(core->dev); 117 - return ret < 0 ? ret : 0; 117 + if (ret < 0) { 118 + pm_runtime_put_noidle(core->dev); 119 + return ret; 120 + } 121 + return 0; 118 122 } 119 123 120 124 static void clk_pm_runtime_put(struct clk_core *core)
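The clk.c fix above exists because `pm_runtime_get_sync()` increments the usage counter even when the resume it triggers fails, so an error return must still be balanced with a put. A small userspace sketch of that invariant, with `fake_*` stand-ins rather than the real PM-runtime calls:

```c
/* Stand-in device: a usage counter plus a simulated resume result. */
struct rt_dev {
	int usage_count;
	int resume_err;		/* <0 simulates a failed pm_runtime_resume() */
};

static int fake_get_sync(struct rt_dev *d)
{
	d->usage_count++;	/* the counter goes up unconditionally */
	return d->resume_err;
}

static void fake_put_noidle(struct rt_dev *d)
{
	d->usage_count--;
}

/* The fixed pattern from clk_pm_runtime_get(): drop the reference on error
 * so the counter does not leak across failed get attempts. */
static int runtime_get(struct rt_dev *d)
{
	int ret = fake_get_sync(d);

	if (ret < 0) {
		fake_put_noidle(d);
		return ret;
	}
	return 0;
}
```

After a failed `runtime_get()` the counter is back to its prior value; the pre-fix `return ret < 0 ? ret : 0;` left it elevated forever.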
+2 -1
drivers/cpufreq/Kconfig
··· 323 323 324 324 config QORIQ_CPUFREQ 325 325 tristate "CPU frequency scaling driver for Freescale QorIQ SoCs" 326 - depends on OF && COMMON_CLK && (PPC_E500MC || ARM || ARM64) 326 + depends on OF && COMMON_CLK 327 + depends on PPC_E500MC || SOC_LS1021A || ARCH_LAYERSCAPE || COMPILE_TEST 327 328 select CLK_QORIQ 328 329 help 329 330 This adds the CPUFreq driver support for Freescale QorIQ SoCs
+1
drivers/cpufreq/Kconfig.arm
··· 317 317 config ARM_TI_CPUFREQ 318 318 bool "Texas Instruments CPUFreq support" 319 319 depends on ARCH_OMAP2PLUS 320 + default ARCH_OMAP2PLUS 320 321 help 321 322 This driver enables valid OPPs on the running platform based on 322 323 values contained within the SoC in use. Enable this in order to
+2
drivers/cpufreq/cpufreq-dt-platdev.c
··· 53 53 { .compatible = "renesas,r7s72100", }, 54 54 { .compatible = "renesas,r8a73a4", }, 55 55 { .compatible = "renesas,r8a7740", }, 56 + { .compatible = "renesas,r8a7742", }, 56 57 { .compatible = "renesas,r8a7743", }, 57 58 { .compatible = "renesas,r8a7744", }, 58 59 { .compatible = "renesas,r8a7745", }, ··· 106 105 { .compatible = "calxeda,highbank", }, 107 106 { .compatible = "calxeda,ecx-2000", }, 108 107 108 + { .compatible = "fsl,imx7ulp", }, 109 109 { .compatible = "fsl,imx7d", }, 110 110 { .compatible = "fsl,imx8mq", }, 111 111 { .compatible = "fsl,imx8mm", },
+6 -5
drivers/cpufreq/cpufreq.c
··· 2535 2535 static int cpufreq_boost_set_sw(int state) 2536 2536 { 2537 2537 struct cpufreq_policy *policy; 2538 - int ret = -EINVAL; 2539 2538 2540 2539 for_each_active_policy(policy) { 2540 + int ret; 2541 + 2541 2542 if (!policy->freq_table) 2542 - continue; 2543 + return -ENXIO; 2543 2544 2544 2545 ret = cpufreq_frequency_table_cpuinfo(policy, 2545 2546 policy->freq_table); 2546 2547 if (ret) { 2547 2548 pr_err("%s: Policy frequency update failed\n", 2548 2549 __func__); 2549 - break; 2550 + return ret; 2550 2551 } 2551 2552 2552 2553 ret = freq_qos_update_request(policy->max_freq_req, policy->max); 2553 2554 if (ret < 0) 2554 - break; 2555 + return ret; 2555 2556 } 2556 2557 2557 - return ret; 2558 + return 0; 2558 2559 } 2559 2560 2560 2561 int cpufreq_boost_trigger_state(int state)
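The cpufreq.c hunk above replaces an accumulated `ret` (which started at `-EINVAL` and so misreported the empty-list case) with fail-fast returns per policy. A sketch of the new control flow, using stand-in types and an `ENXIO_ERR` constant instead of the real cpufreq API:

```c
#define ENXIO_ERR (-6)	/* stand-in for -ENXIO */

struct policy_model {
	int has_freq_table;
	int update_err;		/* simulated table/QoS update result */
};

/* Fail fast: a policy without a frequency table is now an error rather than
 * being silently skipped, and the first update failure is propagated. */
static int boost_set_sw(const struct policy_model *p, int n)
{
	for (int i = 0; i < n; i++) {
		if (!p[i].has_freq_table)
			return ENXIO_ERR;
		if (p[i].update_err)
			return p[i].update_err;
	}
	return 0;	/* no active policies, or all of them updated cleanly */
}
```

Note the behavioral change: with zero active policies the function now returns 0, where the old code fell through with its initial `-EINVAL`.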
+82 -2
drivers/cpufreq/imx-cpufreq-dt.c
··· 3 3 * Copyright 2019 NXP 4 4 */ 5 5 6 + #include <linux/clk.h> 6 7 #include <linux/cpu.h> 8 + #include <linux/cpufreq.h> 7 9 #include <linux/err.h> 8 10 #include <linux/init.h> 9 11 #include <linux/kernel.h> ··· 14 12 #include <linux/of.h> 15 13 #include <linux/platform_device.h> 16 14 #include <linux/pm_opp.h> 15 + #include <linux/regulator/consumer.h> 17 16 #include <linux/slab.h> 17 + 18 + #include "cpufreq-dt.h" 18 19 19 20 #define OCOTP_CFG3_SPEED_GRADE_SHIFT 8 20 21 #define OCOTP_CFG3_SPEED_GRADE_MASK (0x3 << 8) ··· 27 22 #define IMX8MP_OCOTP_CFG3_MKT_SEGMENT_SHIFT 5 28 23 #define IMX8MP_OCOTP_CFG3_MKT_SEGMENT_MASK (0x3 << 5) 29 24 25 + #define IMX7ULP_MAX_RUN_FREQ 528000 26 + 30 27 /* cpufreq-dt device registered by imx-cpufreq-dt */ 31 28 static struct platform_device *cpufreq_dt_pdev; 32 29 static struct opp_table *cpufreq_opp_table; 30 + static struct device *cpu_dev; 31 + 32 + enum IMX7ULP_CPUFREQ_CLKS { 33 + ARM, 34 + CORE, 35 + SCS_SEL, 36 + HSRUN_CORE, 37 + HSRUN_SCS_SEL, 38 + FIRC, 39 + }; 40 + 41 + static struct clk_bulk_data imx7ulp_clks[] = { 42 + { .id = "arm" }, 43 + { .id = "core" }, 44 + { .id = "scs_sel" }, 45 + { .id = "hsrun_core" }, 46 + { .id = "hsrun_scs_sel" }, 47 + { .id = "firc" }, 48 + }; 49 + 50 + static unsigned int imx7ulp_get_intermediate(struct cpufreq_policy *policy, 51 + unsigned int index) 52 + { 53 + return clk_get_rate(imx7ulp_clks[FIRC].clk); 54 + } 55 + 56 + static int imx7ulp_target_intermediate(struct cpufreq_policy *policy, 57 + unsigned int index) 58 + { 59 + unsigned int newfreq = policy->freq_table[index].frequency; 60 + 61 + clk_set_parent(imx7ulp_clks[SCS_SEL].clk, imx7ulp_clks[FIRC].clk); 62 + clk_set_parent(imx7ulp_clks[HSRUN_SCS_SEL].clk, imx7ulp_clks[FIRC].clk); 63 + 64 + if (newfreq > IMX7ULP_MAX_RUN_FREQ) 65 + clk_set_parent(imx7ulp_clks[ARM].clk, 66 + imx7ulp_clks[HSRUN_CORE].clk); 67 + else 68 + clk_set_parent(imx7ulp_clks[ARM].clk, imx7ulp_clks[CORE].clk); 69 + 70 + return 0; 71 + } 72 + 73 + static 
struct cpufreq_dt_platform_data imx7ulp_data = { 74 + .target_intermediate = imx7ulp_target_intermediate, 75 + .get_intermediate = imx7ulp_get_intermediate, 76 + }; 33 77 34 78 static int imx_cpufreq_dt_probe(struct platform_device *pdev) 35 79 { 36 - struct device *cpu_dev = get_cpu_device(0); 80 + struct platform_device *dt_pdev; 37 81 u32 cell_value, supported_hw[2]; 38 82 int speed_grade, mkt_segment; 39 83 int ret; 40 84 85 + cpu_dev = get_cpu_device(0); 86 + 41 87 if (!of_find_property(cpu_dev->of_node, "cpu-supply", NULL)) 42 88 return -ENODEV; 89 + 90 + if (of_machine_is_compatible("fsl,imx7ulp")) { 91 + ret = clk_bulk_get(cpu_dev, ARRAY_SIZE(imx7ulp_clks), 92 + imx7ulp_clks); 93 + if (ret) 94 + return ret; 95 + 96 + dt_pdev = platform_device_register_data(NULL, "cpufreq-dt", 97 + -1, &imx7ulp_data, 98 + sizeof(imx7ulp_data)); 99 + if (IS_ERR(dt_pdev)) { 100 + clk_bulk_put(ARRAY_SIZE(imx7ulp_clks), imx7ulp_clks); 101 + ret = PTR_ERR(dt_pdev); 102 + dev_err(&pdev->dev, "Failed to register cpufreq-dt: %d\n", ret); 103 + return ret; 104 + } 105 + 106 + cpufreq_dt_pdev = dt_pdev; 107 + 108 + return 0; 109 + } 43 110 44 111 ret = nvmem_cell_read_u32(cpu_dev, "speed_grade", &cell_value); 45 112 if (ret) ··· 175 98 static int imx_cpufreq_dt_remove(struct platform_device *pdev) 176 99 { 177 100 platform_device_unregister(cpufreq_dt_pdev); 178 - dev_pm_opp_put_supported_hw(cpufreq_opp_table); 101 + if (!of_machine_is_compatible("fsl,imx7ulp")) 102 + dev_pm_opp_put_supported_hw(cpufreq_opp_table); 103 + else 104 + clk_bulk_put(ARRAY_SIZE(imx7ulp_clks), imx7ulp_clks); 179 105 180 106 return 0; 181 107 }
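The i.MX7ULP hunk above parks the SCS muxes on FIRC during a transition and then picks the final ARM parent based on the requested rate. A minimal standalone sketch of that parent-selection decision, with the 528000 kHz limit taken from `IMX7ULP_MAX_RUN_FREQ` in the diff (the helper name here is illustrative, not kernel API):

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch of the decision made by imx7ulp_target_intermediate() above:
 * frequencies above the RUN-mode limit require the HSRUN core clock as
 * the ARM parent; anything at or below it stays on the normal core clock.
 */
#define IMX7ULP_MAX_RUN_FREQ 528000 /* kHz */

static const char *imx7ulp_pick_arm_parent(unsigned int new_khz)
{
	return new_khz > IMX7ULP_MAX_RUN_FREQ ? "hsrun_core" : "core";
}
```
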
+2 -1
drivers/cpufreq/intel_pstate.c
··· 2771 2771 pr_info("Invalid MSRs\n"); 2772 2772 return -ENODEV; 2773 2773 } 2774 + /* Without HWP start in the passive mode. */ 2775 + default_driver = &intel_cpufreq; 2774 2776 2775 2777 hwp_cpu_matched: 2776 2778 /* ··· 2818 2816 if (!strcmp(str, "disable")) { 2819 2817 no_load = 1; 2820 2818 } else if (!strcmp(str, "passive")) { 2821 - pr_info("Passive mode enabled\n"); 2822 2819 default_driver = &intel_cpufreq; 2823 2820 no_hwp = 1; 2824 2821 }
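The intel_pstate change above makes the passive mode (the `intel_cpufreq` frontend) the default on systems without HWP, while an explicit `intel_pstate=` command-line choice still wins. A simplified userspace model of that selection, with a hypothetical helper name (the real logic lives in `intel_pstate_init()` and `intel_pstate_setup()`):

```c
#include <assert.h>
#include <string.h>

/*
 * Model of the new default-driver selection: the command line overrides
 * everything; otherwise HWP systems default to active mode (intel_pstate)
 * and non-HWP systems now default to passive mode (intel_cpufreq).
 */
static const char *pick_default_driver(int hwp_available, const char *cmdline)
{
	if (cmdline && strcmp(cmdline, "passive") == 0)
		return "intel_cpufreq";
	if (cmdline && strcmp(cmdline, "active") == 0)
		return "intel_pstate";
	return hwp_available ? "intel_pstate" : "intel_cpufreq";
}
```
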
+1 -1
drivers/cpufreq/qcom-cpufreq-nvmem.c
··· 277 277 if (!np) 278 278 return -ENOENT; 279 279 280 - ret = of_device_is_compatible(np, "operating-points-v2-qcom-cpu"); 280 + ret = of_device_is_compatible(np, "operating-points-v2-kryo-cpu"); 281 281 if (!ret) { 282 282 of_node_put(np); 283 283 return -ENOENT;
+30 -48
drivers/cpufreq/qoriq-cpufreq.c
··· 18 18 #include <linux/of.h> 19 19 #include <linux/slab.h> 20 20 #include <linux/smp.h> 21 + #include <linux/platform_device.h> 21 22 22 23 /** 23 24 * struct cpu_data ··· 29 28 struct clk **pclk; 30 29 struct cpufreq_frequency_table *table; 31 30 }; 32 - 33 - /* 34 - * Don't use cpufreq on this SoC -- used when the SoC would have otherwise 35 - * matched a more generic compatible. 36 - */ 37 - #define SOC_BLACKLIST 1 38 31 39 32 /** 40 33 * struct soc_data - SoC specific data ··· 259 264 .attr = cpufreq_generic_attr, 260 265 }; 261 266 262 - static const struct soc_data blacklist = { 263 - .flags = SOC_BLACKLIST, 264 - }; 265 - 266 - static const struct of_device_id node_matches[] __initconst = { 267 + static const struct of_device_id qoriq_cpufreq_blacklist[] = { 267 268 /* e6500 cannot use cpufreq due to erratum A-008083 */ 268 - { .compatible = "fsl,b4420-clockgen", &blacklist }, 269 - { .compatible = "fsl,b4860-clockgen", &blacklist }, 270 - { .compatible = "fsl,t2080-clockgen", &blacklist }, 271 - { .compatible = "fsl,t4240-clockgen", &blacklist }, 272 - 273 - { .compatible = "fsl,ls1012a-clockgen", }, 274 - { .compatible = "fsl,ls1021a-clockgen", }, 275 - { .compatible = "fsl,ls1028a-clockgen", }, 276 - { .compatible = "fsl,ls1043a-clockgen", }, 277 - { .compatible = "fsl,ls1046a-clockgen", }, 278 - { .compatible = "fsl,ls1088a-clockgen", }, 279 - { .compatible = "fsl,ls2080a-clockgen", }, 280 - { .compatible = "fsl,lx2160a-clockgen", }, 281 - { .compatible = "fsl,p4080-clockgen", }, 282 - { .compatible = "fsl,qoriq-clockgen-1.0", }, 283 - { .compatible = "fsl,qoriq-clockgen-2.0", }, 269 + { .compatible = "fsl,b4420-clockgen", }, 270 + { .compatible = "fsl,b4860-clockgen", }, 271 + { .compatible = "fsl,t2080-clockgen", }, 272 + { .compatible = "fsl,t4240-clockgen", }, 284 273 {} 285 274 }; 286 275 287 - static int __init qoriq_cpufreq_init(void) 276 + static int qoriq_cpufreq_probe(struct platform_device *pdev) 288 277 { 289 278 int ret; 290 - struct 
device_node *np; 291 - const struct of_device_id *match; 292 - const struct soc_data *data; 279 + struct device_node *np; 293 280 294 - np = of_find_matching_node(NULL, node_matches); 295 - if (!np) 281 + np = of_find_matching_node(NULL, qoriq_cpufreq_blacklist); 282 + if (np) { 283 + dev_info(&pdev->dev, "Disabling due to erratum A-008083"); 296 284 return -ENODEV; 297 - 298 - match = of_match_node(node_matches, np); 299 - data = match->data; 300 - 301 - of_node_put(np); 302 - 303 - if (data && data->flags & SOC_BLACKLIST) 304 - return -ENODEV; 285 + } 305 286 306 287 ret = cpufreq_register_driver(&qoriq_cpufreq_driver); 307 - if (!ret) 308 - pr_info("Freescale QorIQ CPU frequency scaling driver\n"); 288 + if (ret) 289 + return ret; 309 290 310 - return ret; 291 + dev_info(&pdev->dev, "Freescale QorIQ CPU frequency scaling driver\n"); 292 + return 0; 311 293 } 312 - module_init(qoriq_cpufreq_init); 313 294 314 - static void __exit qoriq_cpufreq_exit(void) 295 + static int qoriq_cpufreq_remove(struct platform_device *pdev) 315 296 { 316 297 cpufreq_unregister_driver(&qoriq_cpufreq_driver); 317 - } 318 - module_exit(qoriq_cpufreq_exit); 319 298 299 + return 0; 300 + } 301 + 302 + static struct platform_driver qoriq_cpufreq_platform_driver = { 303 + .driver = { 304 + .name = "qoriq-cpufreq", 305 + }, 306 + .probe = qoriq_cpufreq_probe, 307 + .remove = qoriq_cpufreq_remove, 308 + }; 309 + module_platform_driver(qoriq_cpufreq_platform_driver); 310 + 311 + MODULE_ALIAS("platform:qoriq-cpufreq"); 320 312 MODULE_LICENSE("GPL"); 321 313 MODULE_AUTHOR("Tang Yuantian <Yuantian.Tang@freescale.com>"); 322 314 MODULE_DESCRIPTION("cpufreq driver for Freescale QorIQ series SoCs");
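After the platform-driver conversion above, the qoriq probe no longer walks a whitelist; it only rejects e6500-based SoCs hit by erratum A-008083. A standalone sketch of that blacklist check, mirroring the `qoriq_cpufreq_blacklist` table from the diff (the lookup helper is illustrative, not the kernel's `of_find_matching_node()`):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Compatible strings for SoCs that cannot use cpufreq (erratum A-008083). */
static const char *const qoriq_blacklist[] = {
	"fsl,b4420-clockgen",
	"fsl,b4860-clockgen",
	"fsl,t2080-clockgen",
	"fsl,t4240-clockgen",
	NULL,
};

/* Return -ENODEV for blacklisted compatibles, 0 otherwise. */
static int qoriq_check_compatible(const char *compat)
{
	for (const char *const *p = qoriq_blacklist; *p; p++)
		if (strcmp(*p, compat) == 0)
			return -ENODEV;
	return 0;
}
```
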
+13
drivers/cpuidle/Kconfig.arm
··· 94 94 select ARM_CPU_SUSPEND 95 95 help 96 96 Select this to enable cpuidle for NVIDIA Tegra20/30/114/124 SoCs. 97 + 98 + config ARM_QCOM_SPM_CPUIDLE 99 + bool "CPU Idle Driver for Qualcomm Subsystem Power Manager (SPM)" 100 + depends on (ARCH_QCOM || COMPILE_TEST) && !ARM64 101 + select ARM_CPU_SUSPEND 102 + select CPU_IDLE_MULTIPLE_DRIVERS 103 + select DT_IDLE_STATES 104 + select QCOM_SCM 105 + help 106 + Select this to enable cpuidle for Qualcomm processors. 107 + The Subsystem Power Manager (SPM) controls low power modes for the 108 + CPU and L2 cores. It interfaces with various system drivers to put 109 + the cores in low power modes.
+1
drivers/cpuidle/Makefile
··· 25 25 cpuidle_psci-y := cpuidle-psci.o 26 26 cpuidle_psci-$(CONFIG_PM_GENERIC_DOMAINS_OF) += cpuidle-psci-domain.o 27 27 obj-$(CONFIG_ARM_TEGRA_CPUIDLE) += cpuidle-tegra.o 28 + obj-$(CONFIG_ARM_QCOM_SPM_CPUIDLE) += cpuidle-qcom-spm.o 28 29 29 30 ############################################################################### 30 31 # MIPS drivers
+7 -1
drivers/cpuidle/cpuidle-psci.c
··· 58 58 u32 state; 59 59 int ret; 60 60 61 + ret = cpu_pm_enter(); 62 + if (ret) 63 + return -1; 64 + 61 65 /* Do runtime PM to manage a hierarchical CPU topology. */ 62 66 pm_runtime_put_sync_suspend(pd_dev); 63 67 ··· 69 65 if (!state) 70 66 state = states[idx]; 71 67 72 - ret = psci_enter_state(idx, state); 68 + ret = psci_cpu_suspend_enter(state) ? -1 : idx; 73 69 74 70 pm_runtime_get_sync(pd_dev); 71 + 72 + cpu_pm_exit(); 75 73 76 74 /* Clear the domain state to start fresh when back from idle. */ 77 75 psci_set_domain_state(0);
+22 -53
drivers/cpuidle/sysfs.c
··· 18 18 19 19 #include "cpuidle.h" 20 20 21 - static unsigned int sysfs_switch; 22 - static int __init cpuidle_sysfs_setup(char *unused) 23 - { 24 - sysfs_switch = 1; 25 - return 1; 26 - } 27 - __setup("cpuidle_sysfs_switch", cpuidle_sysfs_setup); 28 - 29 21 static ssize_t show_available_governors(struct device *dev, 30 22 struct device_attribute *attr, 31 23 char *buf) ··· 27 35 28 36 mutex_lock(&cpuidle_lock); 29 37 list_for_each_entry(tmp, &cpuidle_governors, governor_list) { 30 - if (i >= (ssize_t) ((PAGE_SIZE/sizeof(char)) - 31 - CPUIDLE_NAME_LEN - 2)) 38 + if (i >= (ssize_t) (PAGE_SIZE - (CPUIDLE_NAME_LEN + 2))) 32 39 goto out; 33 - i += scnprintf(&buf[i], CPUIDLE_NAME_LEN, "%s ", tmp->name); 40 + 41 + i += scnprintf(&buf[i], CPUIDLE_NAME_LEN + 1, "%s ", tmp->name); 34 42 } 35 43 36 44 out: ··· 77 85 struct device_attribute *attr, 78 86 const char *buf, size_t count) 79 87 { 80 - char gov_name[CPUIDLE_NAME_LEN]; 81 - int ret = -EINVAL; 82 - size_t len = count; 88 + char gov_name[CPUIDLE_NAME_LEN + 1]; 89 + int ret; 83 90 struct cpuidle_governor *gov; 84 91 85 - if (!len || len >= sizeof(gov_name)) 92 + ret = sscanf(buf, "%" __stringify(CPUIDLE_NAME_LEN) "s", gov_name); 93 + if (ret != 1) 86 94 return -EINVAL; 87 95 88 - memcpy(gov_name, buf, len); 89 - gov_name[len] = '\0'; 90 - if (gov_name[len - 1] == '\n') 91 - gov_name[--len] = '\0'; 92 - 93 96 mutex_lock(&cpuidle_lock); 94 - 97 + ret = -EINVAL; 95 98 list_for_each_entry(gov, &cpuidle_governors, governor_list) { 96 - if (strlen(gov->name) == len && !strcmp(gov->name, gov_name)) { 99 + if (!strncmp(gov->name, gov_name, CPUIDLE_NAME_LEN)) { 97 100 ret = cpuidle_switch_governor(gov); 98 101 break; 99 102 } 100 103 } 101 - 102 104 mutex_unlock(&cpuidle_lock); 103 105 104 - if (ret) 105 - return ret; 106 - else 107 - return count; 106 + return ret ? 
ret : count; 108 107 } 109 108 109 + static DEVICE_ATTR(available_governors, 0444, show_available_governors, NULL); 110 110 static DEVICE_ATTR(current_driver, 0444, show_current_driver, NULL); 111 + static DEVICE_ATTR(current_governor, 0644, show_current_governor, 112 + store_current_governor); 111 113 static DEVICE_ATTR(current_governor_ro, 0444, show_current_governor, NULL); 112 114 113 - static struct attribute *cpuidle_default_attrs[] = { 115 + static struct attribute *cpuidle_attrs[] = { 116 + &dev_attr_available_governors.attr, 114 117 &dev_attr_current_driver.attr, 118 + &dev_attr_current_governor.attr, 115 119 &dev_attr_current_governor_ro.attr, 116 120 NULL 117 121 }; 118 122 119 - static DEVICE_ATTR(available_governors, 0444, show_available_governors, NULL); 120 - static DEVICE_ATTR(current_governor, 0644, show_current_governor, 121 - store_current_governor); 122 - 123 - static struct attribute *cpuidle_switch_attrs[] = { 124 - &dev_attr_available_governors.attr, 125 - &dev_attr_current_driver.attr, 126 - &dev_attr_current_governor.attr, 127 - NULL 128 - }; 129 - 130 123 static struct attribute_group cpuidle_attr_group = { 131 - .attrs = cpuidle_default_attrs, 124 + .attrs = cpuidle_attrs, 132 125 .name = "cpuidle", 133 126 }; 134 127 ··· 123 146 */ 124 147 int cpuidle_add_interface(struct device *dev) 125 148 { 126 - if (sysfs_switch) 127 - cpuidle_attr_group.attrs = cpuidle_switch_attrs; 128 - 129 149 return sysfs_create_group(&dev->kobj, &cpuidle_attr_group); 130 150 } 131 151 ··· 140 166 ssize_t (*show)(struct cpuidle_device *, char *); 141 167 ssize_t (*store)(struct cpuidle_device *, const char *, size_t count); 142 168 }; 143 - 144 - #define define_one_ro(_name, show) \ 145 - static struct cpuidle_attr attr_##_name = __ATTR(_name, 0444, show, NULL) 146 - #define define_one_rw(_name, show, store) \ 147 - static struct cpuidle_attr attr_##_name = __ATTR(_name, 0644, show, store) 148 169 149 170 #define attr_to_cpuidleattr(a) container_of(a, struct 
cpuidle_attr, attr) 150 171 ··· 400 431 #define attr_to_stateattr(a) container_of(a, struct cpuidle_state_attr, attr) 401 432 402 433 static ssize_t cpuidle_state_show(struct kobject *kobj, struct attribute *attr, 403 - char * buf) 434 + char *buf) 404 435 { 405 436 int ret = -EIO; 406 437 struct cpuidle_state *state = kobj_to_state(kobj); 407 438 struct cpuidle_state_usage *state_usage = kobj_to_state_usage(kobj); 408 - struct cpuidle_state_attr * cattr = attr_to_stateattr(attr); 439 + struct cpuidle_state_attr *cattr = attr_to_stateattr(attr); 409 440 410 441 if (cattr->show) 411 442 ret = cattr->show(state, state_usage, buf); ··· 484 515 ret = kobject_init_and_add(&kobj->kobj, &ktype_state_cpuidle, 485 516 &kdev->kobj, "state%d", i); 486 517 if (ret) { 487 - kfree(kobj); 518 + kobject_put(&kobj->kobj); 488 519 goto error_state; 489 520 } 490 521 cpuidle_add_s2idle_attr_group(kobj); ··· 615 646 ret = kobject_init_and_add(&kdrv->kobj, &ktype_driver_cpuidle, 616 647 &kdev->kobj, "driver"); 617 648 if (ret) { 618 - kfree(kdrv); 649 + kobject_put(&kdrv->kobj); 619 650 return ret; 620 651 } 621 652 ··· 709 740 error = kobject_init_and_add(&kdev->kobj, &ktype_cpuidle, &cpu_dev->kobj, 710 741 "cpuidle"); 711 742 if (error) { 712 - kfree(kdev); 743 + kobject_put(&kdev->kobj); 713 744 return error; 714 745 } 715 746
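The sysfs rework above replaces hand-rolled length/newline handling in `store_current_governor()` with a single `sscanf()` whose field width is stringified from the name-length limit, which both bounds the copy and drops the trailing newline sysfs writes usually carry. A standalone model of that parsing (with `CPUIDLE_NAME_LEN` set to the kernel's value of 16 and `__stringify()` re-created as a two-level macro):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16
#define __stringify_1(x) #x
#define __stringify(x) __stringify_1(x)

/* Parse a governor name; returns 0 on success, -1 on empty input. */
static int parse_gov_name(const char *buf, char *gov_name /* NAME_LEN + 1 */)
{
	/* The format expands to "%16s": at most 16 chars, stops at whitespace. */
	if (sscanf(buf, "%" __stringify(NAME_LEN) "s", gov_name) != 1)
		return -1; /* -EINVAL in the kernel */
	return 0;
}

/* Convenience: does buf parse to exactly the expected name? */
static int gov_equals(const char *buf, const char *want)
{
	char gov_name[NAME_LEN + 1];

	if (parse_gov_name(buf, gov_name))
		return 0;
	return strcmp(gov_name, want) == 0;
}
```
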
+8
drivers/devfreq/Kconfig
··· 91 91 and adjusts the operating frequencies and voltages with OPP support. 92 92 This does not yet operate with optimal voltages. 93 93 94 + config ARM_IMX_BUS_DEVFREQ 95 + tristate "i.MX Generic Bus DEVFREQ Driver" 96 + depends on ARCH_MXC || COMPILE_TEST 97 + select DEVFREQ_GOV_USERSPACE 98 + help 99 + This adds the generic DEVFREQ driver for i.MX interconnects. It 100 + allows adjusting NIC/NOC frequency. 101 + 94 102 config ARM_IMX8M_DDRC_DEVFREQ 95 103 tristate "i.MX8M DDRC DEVFREQ Driver" 96 104 depends on (ARCH_MXC && HAVE_ARM_SMCCC) || \
+1
drivers/devfreq/Makefile
··· 9 9 10 10 # DEVFREQ Drivers 11 11 obj-$(CONFIG_ARM_EXYNOS_BUS_DEVFREQ) += exynos-bus.o 12 + obj-$(CONFIG_ARM_IMX_BUS_DEVFREQ) += imx-bus.o 12 13 obj-$(CONFIG_ARM_IMX8M_DDRC_DEVFREQ) += imx8m-ddrc.o 13 14 obj-$(CONFIG_ARM_RK3399_DMC_DEVFREQ) += rk3399_dmc.o 14 15 obj-$(CONFIG_ARM_TEGRA_DEVFREQ) += tegra30-devfreq.o
+8 -11
drivers/devfreq/devfreq.c
··· 60 60 { 61 61 struct devfreq *tmp_devfreq; 62 62 63 + lockdep_assert_held(&devfreq_list_lock); 64 + 63 65 if (IS_ERR_OR_NULL(dev)) { 64 66 pr_err("DEVFREQ: %s: Invalid parameters\n", __func__); 65 67 return ERR_PTR(-EINVAL); 66 68 } 67 - WARN(!mutex_is_locked(&devfreq_list_lock), 68 - "devfreq_list_lock must be locked."); 69 69 70 70 list_for_each_entry(tmp_devfreq, &devfreq_list, node) { 71 71 if (tmp_devfreq->dev.parent == dev) ··· 258 258 { 259 259 struct devfreq_governor *tmp_governor; 260 260 261 + lockdep_assert_held(&devfreq_list_lock); 262 + 261 263 if (IS_ERR_OR_NULL(name)) { 262 264 pr_err("DEVFREQ: %s: Invalid parameters\n", __func__); 263 265 return ERR_PTR(-EINVAL); 264 266 } 265 - WARN(!mutex_is_locked(&devfreq_list_lock), 266 - "devfreq_list_lock must be locked."); 267 267 268 268 list_for_each_entry(tmp_governor, &devfreq_governor_list, node) { 269 269 if (!strncmp(tmp_governor->name, name, DEVFREQ_NAME_LEN)) ··· 289 289 struct devfreq_governor *governor; 290 290 int err = 0; 291 291 292 + lockdep_assert_held(&devfreq_list_lock); 293 + 292 294 if (IS_ERR_OR_NULL(name)) { 293 295 pr_err("DEVFREQ: %s: Invalid parameters\n", __func__); 294 296 return ERR_PTR(-EINVAL); 295 297 } 296 - WARN(!mutex_is_locked(&devfreq_list_lock), 297 - "devfreq_list_lock must be locked."); 298 298 299 299 governor = find_devfreq_governor(name); 300 300 if (IS_ERR(governor)) { ··· 392 392 int err = 0; 393 393 u32 flags = 0; 394 394 395 - if (!mutex_is_locked(&devfreq->lock)) { 396 - WARN(true, "devfreq->lock must be locked by the caller.\n"); 397 - return -EINVAL; 398 - } 395 + lockdep_assert_held(&devfreq->lock); 399 396 400 397 if (!devfreq->governor) 401 398 return -EINVAL; ··· 765 768 devfreq->dev.release = devfreq_dev_release; 766 769 INIT_LIST_HEAD(&devfreq->node); 767 770 devfreq->profile = profile; 768 - strncpy(devfreq->governor_name, governor_name, DEVFREQ_NAME_LEN); 771 + strscpy(devfreq->governor_name, governor_name, DEVFREQ_NAME_LEN); 769 772 
devfreq->previous_freq = profile->initial_freq; 770 773 devfreq->last_status.current_frequency = profile->initial_freq; 771 774 devfreq->data = data;
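One small change in the devfreq hunk above swaps `strncpy()` for `strscpy()` when copying `governor_name`: `strncpy()` leaves the buffer unterminated when the source fills it, while `strscpy()` always terminates and reports truncation. A simplified userspace model of that semantic difference (the kernel's `strscpy()` returns `-E2BIG` on truncation, modeled as -1 here):

```c
#include <assert.h>
#include <string.h>

/* Simplified model of the kernel's strscpy(): always NUL-terminates. */
static long strscpy_model(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -1;

	len = strlen(src);
	if (len >= size) {
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1; /* truncated */
	}
	memcpy(dst, src, len + 1);
	return (long)len;
}

/* A 13-char source into an 8-byte buffer: truncated but terminated. */
static int truncation_demo(void)
{
	char buf[8];

	if (strscpy_model(buf, "governor_name", sizeof(buf)) != -1)
		return 0;
	return strcmp(buf, "governo") == 0;
}

/* A short source fits and the copied length is returned. */
static int fit_demo(void)
{
	char buf[8];

	return strscpy_model(buf, "menu", sizeof(buf)) == 4 &&
	       strcmp(buf, "menu") == 0;
}
```
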
+179
drivers/devfreq/imx-bus.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Copyright 2019 NXP 4 + */ 5 + 6 + #include <linux/clk.h> 7 + #include <linux/devfreq.h> 8 + #include <linux/device.h> 9 + #include <linux/module.h> 10 + #include <linux/of_device.h> 11 + #include <linux/pm_opp.h> 12 + #include <linux/platform_device.h> 13 + #include <linux/slab.h> 14 + 15 + struct imx_bus { 16 + struct devfreq_dev_profile profile; 17 + struct devfreq *devfreq; 18 + struct clk *clk; 19 + struct platform_device *icc_pdev; 20 + }; 21 + 22 + static int imx_bus_target(struct device *dev, 23 + unsigned long *freq, u32 flags) 24 + { 25 + struct dev_pm_opp *new_opp; 26 + int ret; 27 + 28 + new_opp = devfreq_recommended_opp(dev, freq, flags); 29 + if (IS_ERR(new_opp)) { 30 + ret = PTR_ERR(new_opp); 31 + dev_err(dev, "failed to get recommended opp: %d\n", ret); 32 + return ret; 33 + } 34 + dev_pm_opp_put(new_opp); 35 + 36 + return dev_pm_opp_set_rate(dev, *freq); 37 + } 38 + 39 + static int imx_bus_get_cur_freq(struct device *dev, unsigned long *freq) 40 + { 41 + struct imx_bus *priv = dev_get_drvdata(dev); 42 + 43 + *freq = clk_get_rate(priv->clk); 44 + 45 + return 0; 46 + } 47 + 48 + static int imx_bus_get_dev_status(struct device *dev, 49 + struct devfreq_dev_status *stat) 50 + { 51 + struct imx_bus *priv = dev_get_drvdata(dev); 52 + 53 + stat->busy_time = 0; 54 + stat->total_time = 0; 55 + stat->current_frequency = clk_get_rate(priv->clk); 56 + 57 + return 0; 58 + } 59 + 60 + static void imx_bus_exit(struct device *dev) 61 + { 62 + struct imx_bus *priv = dev_get_drvdata(dev); 63 + 64 + dev_pm_opp_of_remove_table(dev); 65 + platform_device_unregister(priv->icc_pdev); 66 + } 67 + 68 + /* imx_bus_init_icc() - register matching icc provider if required */ 69 + static int imx_bus_init_icc(struct device *dev) 70 + { 71 + struct imx_bus *priv = dev_get_drvdata(dev); 72 + const char *icc_driver_name; 73 + 74 + if (!of_get_property(dev->of_node, "#interconnect-cells", 0)) 75 + return 0; 76 + if 
(!IS_ENABLED(CONFIG_INTERCONNECT_IMX)) { 77 + dev_warn(dev, "imx interconnect drivers disabled\n"); 78 + return 0; 79 + } 80 + 81 + icc_driver_name = of_device_get_match_data(dev); 82 + if (!icc_driver_name) { 83 + dev_err(dev, "unknown interconnect driver\n"); 84 + return 0; 85 + } 86 + 87 + priv->icc_pdev = platform_device_register_data( 88 + dev, icc_driver_name, -1, NULL, 0); 89 + if (IS_ERR(priv->icc_pdev)) { 90 + dev_err(dev, "failed to register icc provider %s: %ld\n", 91 + icc_driver_name, PTR_ERR(priv->icc_pdev)); 92 + return PTR_ERR(priv->icc_pdev); 93 + } 94 + 95 + return 0; 96 + } 97 + 98 + static int imx_bus_probe(struct platform_device *pdev) 99 + { 100 + struct device *dev = &pdev->dev; 101 + struct imx_bus *priv; 102 + const char *gov = DEVFREQ_GOV_USERSPACE; 103 + int ret; 104 + 105 + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); 106 + if (!priv) 107 + return -ENOMEM; 108 + 109 + /* 110 + * Fetch the clock to adjust but don't explicitly enable. 111 + * 112 + * For imx bus clock clk_set_rate is safe no matter if the clock is on 113 + * or off and some peripheral side-buses might be off unless enabled by 114 + * drivers for devices on those specific buses. 115 + * 116 + * Rate adjustment on a disabled bus clock just takes effect later. 
117 + */ 118 + priv->clk = devm_clk_get(dev, NULL); 119 + if (IS_ERR(priv->clk)) { 120 + ret = PTR_ERR(priv->clk); 121 + dev_err(dev, "failed to fetch clk: %d\n", ret); 122 + return ret; 123 + } 124 + platform_set_drvdata(pdev, priv); 125 + 126 + ret = dev_pm_opp_of_add_table(dev); 127 + if (ret < 0) { 128 + dev_err(dev, "failed to get OPP table\n"); 129 + return ret; 130 + } 131 + 132 + priv->profile.polling_ms = 1000; 133 + priv->profile.target = imx_bus_target; 134 + priv->profile.get_dev_status = imx_bus_get_dev_status; 135 + priv->profile.exit = imx_bus_exit; 136 + priv->profile.get_cur_freq = imx_bus_get_cur_freq; 137 + priv->profile.initial_freq = clk_get_rate(priv->clk); 138 + 139 + priv->devfreq = devm_devfreq_add_device(dev, &priv->profile, 140 + gov, NULL); 141 + if (IS_ERR(priv->devfreq)) { 142 + ret = PTR_ERR(priv->devfreq); 143 + dev_err(dev, "failed to add devfreq device: %d\n", ret); 144 + goto err; 145 + } 146 + 147 + ret = imx_bus_init_icc(dev); 148 + if (ret) 149 + goto err; 150 + 151 + return 0; 152 + 153 + err: 154 + dev_pm_opp_of_remove_table(dev); 155 + return ret; 156 + } 157 + 158 + static const struct of_device_id imx_bus_of_match[] = { 159 + { .compatible = "fsl,imx8mq-noc", .data = "imx8mq-interconnect", }, 160 + { .compatible = "fsl,imx8mm-noc", .data = "imx8mm-interconnect", }, 161 + { .compatible = "fsl,imx8mn-noc", .data = "imx8mn-interconnect", }, 162 + { .compatible = "fsl,imx8m-noc", }, 163 + { .compatible = "fsl,imx8m-nic", }, 164 + { /* sentinel */ }, 165 + }; 166 + MODULE_DEVICE_TABLE(of, imx_bus_of_match); 167 + 168 + static struct platform_driver imx_bus_platdrv = { 169 + .probe = imx_bus_probe, 170 + .driver = { 171 + .name = "imx-bus-devfreq", 172 + .of_match_table = of_match_ptr(imx_bus_of_match), 173 + }, 174 + }; 175 + module_platform_driver(imx_bus_platdrv); 176 + 177 + MODULE_DESCRIPTION("Generic i.MX bus frequency scaling driver"); 178 + MODULE_AUTHOR("Leonard Crestez <leonard.crestez@nxp.com>"); 179 + 
MODULE_LICENSE("GPL v2");
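The new imx-bus driver's `imx_bus_target()` delegates frequency selection to `devfreq_recommended_opp()`. As I understand the devfreq core, the default policy picks the lowest OPP at or above the request (a ceiling), and `DEVFREQ_FLAG_LEAST_UPPER_BOUND` flips that to the highest OPP at or below it (a floor). A standalone sketch of that lookup over a sorted table; the table values are illustrative, not real i.MX8M NoC operating points:

```c
#include <assert.h>

#define FLAG_LEAST_UPPER_BOUND 0x1

/* Illustrative sorted OPP frequencies (MHz). */
static const unsigned long demo_table[4] = { 100, 200, 400, 800 };

/* Pick an OPP for a requested frequency, ceiling by default, floor on flag. */
static unsigned long recommended_freq(const unsigned long *table, int n,
				      unsigned long req, unsigned int flags)
{
	if (flags & FLAG_LEAST_UPPER_BOUND) {
		for (int i = n - 1; i >= 0; i--)
			if (table[i] <= req)
				return table[i];
		return table[0];
	}
	for (int i = 0; i < n; i++)
		if (table[i] >= req)
			return table[i];
	return table[n - 1];
}
```
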
+3 -4
drivers/devfreq/tegra30-devfreq.c
··· 420 420 421 421 static_cpu_emc_freq = actmon_cpu_to_emc_rate(tegra, cpu_freq); 422 422 423 - if (dev_freq >= static_cpu_emc_freq) 423 + if (dev_freq + actmon_dev->boost_freq >= static_cpu_emc_freq) 424 424 return 0; 425 425 426 426 return static_cpu_emc_freq; ··· 807 807 } 808 808 809 809 err = platform_get_irq(pdev, 0); 810 - if (err < 0) { 811 - dev_err(&pdev->dev, "Failed to get IRQ: %d\n", err); 810 + if (err < 0) 812 811 return err; 813 - } 812 + 814 813 tegra->irq = err; 815 814 816 815 irq_set_status_flags(tegra->irq, IRQ_NOAUTOEN);
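The one-line tegra30-devfreq fix above adds the device's boost into the comparison so the CPU-activity floor for the EMC rate is only imposed when the boosted device frequency is still below it. A minimal sketch of that corrected check (the helper name is illustrative; in the driver this lives in the static CPU-to-EMC rate path):

```c
#include <assert.h>

/*
 * Return the EMC floor derived from CPU activity, or 0 when the device's
 * own frequency plus its interrupt-driven boost already meets or exceeds
 * it, so the floor would needlessly override a higher boosted rate.
 */
static unsigned long cpu_emc_floor(unsigned long dev_freq,
				   unsigned long boost_freq,
				   unsigned long static_cpu_emc_freq)
{
	if (dev_freq + boost_freq >= static_cpu_emc_freq)
		return 0;
	return static_cpu_emc_freq;
}
```
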
+1 -1
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
··· 191 191 } 192 192 193 193 if (adev->runpm) { 194 - dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NEVER_SKIP); 194 + dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NO_DIRECT_COMPLETE); 195 195 pm_runtime_use_autosuspend(dev->dev); 196 196 pm_runtime_set_autosuspend_delay(dev->dev, 5000); 197 197 pm_runtime_set_active(dev->dev);
+1 -1
drivers/gpu/drm/i915/intel_runtime_pm.c
··· 549 549 * becaue the HDA driver may require us to enable the audio power 550 550 * domain during system suspend. 551 551 */ 552 - dev_pm_set_driver_flags(kdev, DPM_FLAG_NEVER_SKIP); 552 + dev_pm_set_driver_flags(kdev, DPM_FLAG_NO_DIRECT_COMPLETE); 553 553 554 554 pm_runtime_set_autosuspend_delay(kdev, 10000); /* 10s */ 555 555 pm_runtime_mark_last_busy(kdev);
+1 -1
drivers/gpu/drm/radeon/radeon_kms.c
··· 158 158 } 159 159 160 160 if (radeon_is_px(dev)) { 161 - dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NEVER_SKIP); 161 + dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NO_DIRECT_COMPLETE); 162 162 pm_runtime_use_autosuspend(dev->dev); 163 163 pm_runtime_set_autosuspend_delay(dev->dev, 5000); 164 164 pm_runtime_set_active(dev->dev);
+2 -2
drivers/i2c/busses/i2c-designware-platdrv.c
··· 357 357 if (dev->flags & ACCESS_NO_IRQ_SUSPEND) { 358 358 dev_pm_set_driver_flags(&pdev->dev, 359 359 DPM_FLAG_SMART_PREPARE | 360 - DPM_FLAG_LEAVE_SUSPENDED); 360 + DPM_FLAG_MAY_SKIP_RESUME); 361 361 } else { 362 362 dev_pm_set_driver_flags(&pdev->dev, 363 363 DPM_FLAG_SMART_PREPARE | 364 364 DPM_FLAG_SMART_SUSPEND | 365 - DPM_FLAG_LEAVE_SUSPENDED); 365 + DPM_FLAG_MAY_SKIP_RESUME); 366 366 } 367 367 368 368 /* The code below assumes runtime PM to be disabled. */
+1 -1
drivers/misc/mei/pci-me.c
··· 241 241 * MEI requires to resume from runtime suspend mode 242 242 * in order to perform link reset flow upon system suspend. 243 243 */ 244 - dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP); 244 + dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE); 245 245 246 246 /* 247 247 * ME maps runtime suspend/resume to D0i states,
+1 -1
drivers/misc/mei/pci-txe.c
··· 128 128 * MEI requires to resume from runtime suspend mode 129 129 * in order to perform link reset flow upon system suspend. 130 130 */ 131 - dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP); 131 + dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE); 132 132 133 133 /* 134 134 * TXE maps runtime suspend/resume to own power gating states,
+1 -1
drivers/net/ethernet/intel/e1000e/netdev.c
··· 7549 7549 7550 7550 e1000_print_device_info(adapter); 7551 7551 7552 - dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP); 7552 + dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE); 7553 7553 7554 7554 if (pci_dev_run_wake(pdev) && hw->mac.type < e1000_pch_cnp) 7555 7555 pm_runtime_put_noidle(&pdev->dev);
+1 -1
drivers/net/ethernet/intel/igb/igb_main.c
··· 3445 3445 } 3446 3446 } 3447 3447 3448 - dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP); 3448 + dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE); 3449 3449 3450 3450 pm_runtime_put_noidle(&pdev->dev); 3451 3451 return 0;
+1 -1
drivers/net/ethernet/intel/igc/igc_main.c
··· 4825 4825 pcie_print_link_status(pdev); 4826 4826 netdev_info(netdev, "MAC: %pM\n", netdev->dev_addr); 4827 4827 4828 - dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP); 4828 + dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE); 4829 4829 4830 4830 pm_runtime_put_noidle(&pdev->dev); 4831 4831
+1 -1
drivers/pci/hotplug/pciehp_core.c
··· 275 275 * If the port is already runtime suspended we can keep it that 276 276 * way. 277 277 */ 278 - if (dev_pm_smart_suspend_and_suspended(&dev->port->dev)) 278 + if (dev_pm_skip_suspend(&dev->port->dev)) 279 279 return 0; 280 280 281 281 pciehp_disable_interrupt(dev);
+17 -17
drivers/pci/pci-driver.c
··· 776 776 777 777 static int pci_pm_suspend_late(struct device *dev) 778 778 { 779 - if (dev_pm_smart_suspend_and_suspended(dev)) 779 + if (dev_pm_skip_suspend(dev)) 780 780 return 0; 781 781 782 782 pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev)); ··· 789 789 struct pci_dev *pci_dev = to_pci_dev(dev); 790 790 const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 791 791 792 - if (dev_pm_smart_suspend_and_suspended(dev)) { 793 - dev->power.may_skip_resume = true; 792 + if (dev_pm_skip_suspend(dev)) 794 793 return 0; 795 - } 796 794 797 795 if (pci_has_legacy_pm_support(pci_dev)) 798 796 return pci_legacy_suspend_late(dev, PMSG_SUSPEND); ··· 878 880 * pci_pm_complete() to take care of fixing up the device's state 879 881 * anyway, if need be. 880 882 */ 881 - dev->power.may_skip_resume = device_may_wakeup(dev) || 882 - !device_can_wakeup(dev); 883 + if (device_can_wakeup(dev) && !device_may_wakeup(dev)) 884 + dev->power.may_skip_resume = false; 883 885 884 886 return 0; 885 887 } ··· 891 893 pci_power_t prev_state = pci_dev->current_state; 892 894 bool skip_bus_pm = pci_dev->skip_bus_pm; 893 895 894 - if (dev_pm_may_skip_resume(dev)) 896 + if (dev_pm_skip_resume(dev)) 895 897 return 0; 896 - 897 - /* 898 - * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend 899 - * during system suspend, so update their runtime PM status to "active" 900 - * as they are going to be put into D0 shortly. 
901 - */ 902 - if (dev_pm_smart_suspend_and_suspended(dev)) 903 - pm_runtime_set_active(dev); 904 898 905 899 /* 906 900 * In the suspend-to-idle case, devices left in D0 during suspend will ··· 916 926 return pm->resume_noirq(dev); 917 927 918 928 return 0; 929 + } 930 + 931 + static int pci_pm_resume_early(struct device *dev) 932 + { 933 + if (dev_pm_skip_resume(dev)) 934 + return 0; 935 + 936 + return pm_generic_resume_early(dev); 919 937 } 920 938 921 939 static int pci_pm_resume(struct device *dev) ··· 959 961 #define pci_pm_suspend_late NULL 960 962 #define pci_pm_suspend_noirq NULL 961 963 #define pci_pm_resume NULL 964 + #define pci_pm_resume_early NULL 962 965 #define pci_pm_resume_noirq NULL 963 966 964 967 #endif /* !CONFIG_SUSPEND */ ··· 1126 1127 1127 1128 static int pci_pm_poweroff_late(struct device *dev) 1128 1129 { 1129 - if (dev_pm_smart_suspend_and_suspended(dev)) 1130 + if (dev_pm_skip_suspend(dev)) 1130 1131 return 0; 1131 1132 1132 1133 pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev)); ··· 1139 1140 struct pci_dev *pci_dev = to_pci_dev(dev); 1140 1141 const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; 1141 1142 1142 - if (dev_pm_smart_suspend_and_suspended(dev)) 1143 + if (dev_pm_skip_suspend(dev)) 1143 1144 return 0; 1144 1145 1145 1146 if (pci_has_legacy_pm_support(pci_dev)) ··· 1357 1358 .suspend = pci_pm_suspend, 1358 1359 .suspend_late = pci_pm_suspend_late, 1359 1360 .resume = pci_pm_resume, 1361 + .resume_early = pci_pm_resume_early, 1360 1362 .freeze = pci_pm_freeze, 1361 1363 .thaw = pci_pm_thaw, 1362 1364 .poweroff = pci_pm_poweroff,
+1 -1
drivers/pci/pcie/portdrv_pci.c
··· 115 115 116 116 pci_save_state(dev); 117 117 118 - dev_pm_set_driver_flags(&dev->dev, DPM_FLAG_NEVER_SKIP | 118 + dev_pm_set_driver_flags(&dev->dev, DPM_FLAG_NO_DIRECT_COMPLETE | 119 119 DPM_FLAG_SMART_SUSPEND); 120 120 121 121 if (pci_bridge_d3_possible(dev)) {
+1 -3
drivers/powercap/intel_rapl_common.c
··· 26 26 #include <asm/cpu_device_id.h> 27 27 #include <asm/intel-family.h> 28 28 29 - /* Local defines */ 30 - #define MSR_PLATFORM_POWER_LIMIT 0x0000065C 31 - 32 29 /* bitmasks for RAPL MSRs, used by primitive access functions */ 33 30 #define ENERGY_STATUS_MASK 0xffffffff 34 31 ··· 986 989 X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, &rapl_defaults_core), 987 990 X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT_PLUS, &rapl_defaults_core), 988 991 X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT_D, &rapl_defaults_core), 992 + X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT, &rapl_defaults_core), 989 993 X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D, &rapl_defaults_core), 990 994 X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_L, &rapl_defaults_core), 991 995
-10
drivers/soc/qcom/Kconfig
··· 80 80 tristate 81 81 select QCOM_QMI_HELPERS 82 82 83 - config QCOM_PM 84 - bool "Qualcomm Power Management" 85 - depends on ARCH_QCOM && !ARM64 86 - select ARM_CPU_SUSPEND 87 - select QCOM_SCM 88 - help 89 - QCOM Platform specific power driver to manage cores and L2 low power 90 - modes. It interface with various system drivers to put the cores in 91 - low power modes. 92 - 93 83 config QCOM_QMI_HELPERS 94 84 tristate 95 85 depends on NET
-1
drivers/soc/qcom/Makefile
··· 8 8 obj-$(CONFIG_QCOM_MDT_LOADER) += mdt_loader.o 9 9 obj-$(CONFIG_QCOM_OCMEM) += ocmem.o 10 10 obj-$(CONFIG_QCOM_PDR_HELPERS) += pdr_interface.o 11 - obj-$(CONFIG_QCOM_PM) += spm.o 12 11 obj-$(CONFIG_QCOM_QMI_HELPERS) += qmi_helpers.o 13 12 qmi_helpers-y += qmi_encdec.o qmi_interface.o 14 13 obj-$(CONFIG_QCOM_RMTFS_MEM) += rmtfs_mem.o
+52 -86
drivers/soc/qcom/spm.c → drivers/cpuidle/cpuidle-qcom-spm.c (renamed)
···
19 19 #include <linux/cpu_pm.h>
20 20 #include <linux/qcom_scm.h>
21 21
22 - #include <asm/cpuidle.h>
23 22 #include <asm/proc-fns.h>
24 23 #include <asm/suspend.h>
24 +
25 + #include "dt_idle_states.h"
25 26
26 27 #define MAX_PMIC_DATA 2
27 28 #define MAX_SEQ_DATA 64
···
63 62 };
64 63
65 64 struct spm_driver_data {
65 + struct cpuidle_driver cpuidle_driver;
66 66 void __iomem *reg_base;
67 67 const struct spm_reg_data *reg_data;
68 68 };
···
108 106 .start_index[PM_SLEEP_MODE_STBY] = 0,
109 107 .start_index[PM_SLEEP_MODE_SPC] = 2,
110 108 };
111 -
112 - static DEFINE_PER_CPU(struct spm_driver_data *, cpu_spm_drv);
113 -
114 - typedef int (*idle_fn)(void);
115 - static DEFINE_PER_CPU(idle_fn*, qcom_idle_ops);
116 109
117 110 static inline void spm_register_write(struct spm_driver_data *drv,
118 111 enum spm_reg reg, u32 val)
···
169 172 return -1;
170 173 }
171 174
172 - static int qcom_cpu_spc(void)
175 + static int qcom_cpu_spc(struct spm_driver_data *drv)
173 176 {
174 177 int ret;
175 - struct spm_driver_data *drv = __this_cpu_read(cpu_spm_drv);
176 178
177 179 spm_set_low_power_mode(drv, PM_SLEEP_MODE_SPC);
178 180 ret = cpu_suspend(0, qcom_pm_collapse);
···
186 190 return ret;
187 191 }
188 192
189 - static int qcom_idle_enter(unsigned long index)
193 + static int spm_enter_idle_state(struct cpuidle_device *dev,
194 + struct cpuidle_driver *drv, int idx)
190 195 {
191 - return __this_cpu_read(qcom_idle_ops)[index]();
196 + struct spm_driver_data *data = container_of(drv, struct spm_driver_data,
197 + cpuidle_driver);
198 +
199 + return CPU_PM_CPU_IDLE_ENTER_PARAM(qcom_cpu_spc, idx, data);
192 200 }
193 201
194 - static const struct of_device_id qcom_idle_state_match[] __initconst = {
195 - { .compatible = "qcom,idle-state-spc", .data = qcom_cpu_spc },
202 + static struct cpuidle_driver qcom_spm_idle_driver = {
203 + .name = "qcom_spm",
204 + .owner = THIS_MODULE,
205 + .states[0] = {
206 + .enter = spm_enter_idle_state,
207 + .exit_latency = 1,
208 + .target_residency = 1,
209 + .power_usage = UINT_MAX,
210 + .name = "WFI",
211 + .desc = "ARM WFI",
212 + }
213 + };
214 +
215 + static const struct of_device_id qcom_idle_state_match[] = {
216 + { .compatible = "qcom,idle-state-spc", .data = spm_enter_idle_state },
196 217 { },
197 218 };
198 219
199 - static int __init qcom_cpuidle_init(struct device_node *cpu_node, int cpu)
220 + static int spm_cpuidle_init(struct cpuidle_driver *drv, int cpu)
200 221 {
201 - const struct of_device_id *match_id;
202 - struct device_node *state_node;
203 - int i;
204 - int state_count = 1;
205 - idle_fn idle_fns[CPUIDLE_STATE_MAX];
206 - idle_fn *fns;
207 - cpumask_t mask;
208 - bool use_scm_power_down = false;
222 + int ret;
209 223
210 - if (!qcom_scm_is_available())
211 - return -EPROBE_DEFER;
224 + memcpy(drv, &qcom_spm_idle_driver, sizeof(*drv));
225 + drv->cpumask = (struct cpumask *)cpumask_of(cpu);
212 226
213 - for (i = 0; ; i++) {
214 - state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
215 - if (!state_node)
216 - break;
227 + /* Parse idle states from device tree */
228 + ret = dt_init_idle_driver(drv, qcom_idle_state_match, 1);
229 + if (ret <= 0)
230 + return ret ? : -ENODEV;
217 231
218 - if (!of_device_is_available(state_node))
219 - continue;
220 -
221 - if (i == CPUIDLE_STATE_MAX) {
222 - pr_warn("%s: cpuidle states reached max possible\n",
223 - __func__);
224 - break;
225 - }
226 -
227 - match_id = of_match_node(qcom_idle_state_match, state_node);
228 - if (!match_id)
229 - return -ENODEV;
230 -
231 - idle_fns[state_count] = match_id->data;
232 -
233 - /* Check if any of the states allow power down */
234 - if (match_id->data == qcom_cpu_spc)
235 - use_scm_power_down = true;
236 -
237 - state_count++;
238 - }
239 -
240 - if (state_count == 1)
241 - goto check_spm;
242 -
243 - fns = devm_kcalloc(get_cpu_device(cpu), state_count, sizeof(*fns),
244 - GFP_KERNEL);
245 - if (!fns)
246 - return -ENOMEM;
247 -
248 - for (i = 1; i < state_count; i++)
249 - fns[i] = idle_fns[i];
250 -
251 - if (use_scm_power_down) {
252 - /* We have atleast one power down mode */
253 - cpumask_clear(&mask);
254 - cpumask_set_cpu(cpu, &mask);
255 - qcom_scm_set_warm_boot_addr(cpu_resume_arm, &mask);
256 - }
257 -
258 - per_cpu(qcom_idle_ops, cpu) = fns;
259 -
260 - /*
261 - * SPM probe for the cpu should have happened by now, if the
262 - * SPM device does not exist, return -ENXIO to indicate that the
263 - * cpu does not support idle states.
264 - */
265 - check_spm:
266 - return per_cpu(cpu_spm_drv, cpu) ? 0 : -ENXIO;
232 + /* We have atleast one power down mode */
233 + return qcom_scm_set_warm_boot_addr(cpu_resume_arm, drv->cpumask);
267 234 }
268 -
269 - static const struct cpuidle_ops qcom_cpuidle_ops __initconst = {
270 - .suspend = qcom_idle_enter,
271 - .init = qcom_cpuidle_init,
272 - };
273 -
274 - CPUIDLE_METHOD_OF_DECLARE(qcom_idle_v1, "qcom,kpss-acc-v1", &qcom_cpuidle_ops);
275 - CPUIDLE_METHOD_OF_DECLARE(qcom_idle_v2, "qcom,kpss-acc-v2", &qcom_cpuidle_ops);
276 235
277 236 static struct spm_driver_data *spm_get_drv(struct platform_device *pdev,
278 237 int *spm_cpu)
···
274 323 struct resource *res;
275 324 const struct of_device_id *match_id;
276 325 void __iomem *addr;
277 - int cpu;
326 + int cpu, ret;
327 +
328 + if (!qcom_scm_is_available())
329 + return -EPROBE_DEFER;
278 330
279 331 drv = spm_get_drv(pdev, &cpu);
280 332 if (!drv)
281 333 return -EINVAL;
334 + platform_set_drvdata(pdev, drv);
282 335
283 336 res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
284 337 drv->reg_base = devm_ioremap_resource(&pdev->dev, res);
···
294 339 return -ENODEV;
295 340
296 341 drv->reg_data = match_id->data;
342 +
343 + ret = spm_cpuidle_init(&drv->cpuidle_driver, cpu);
344 + if (ret)
345 + return ret;
297 346
298 347 /* Write the SPM sequences first.. */
299 348 addr = drv->reg_base + drv->reg_data->reg_offset[SPM_REG_SEQ_ENTRY];
···
321 362 /* Set up Standby as the default low power mode */
322 363 spm_set_low_power_mode(drv, PM_SLEEP_MODE_STBY);
323 364
324 - per_cpu(cpu_spm_drv, cpu) = drv;
365 + return cpuidle_register(&drv->cpuidle_driver, NULL);
366 + }
325 367
368 + static int spm_dev_remove(struct platform_device *pdev)
369 + {
370 + struct spm_driver_data *drv = platform_get_drvdata(pdev);
371 +
372 + cpuidle_unregister(&drv->cpuidle_driver);
326 373 return 0;
327 374 }
328 375
329 376 static struct platform_driver spm_driver = {
330 377 .probe = spm_dev_probe,
378 + .remove = spm_dev_remove,
331 379 .driver = {
332 380 .name = "saw",
333 381 .of_match_table = spm_match_table,
+1 -2
fs/block_dev.c
···
2022 2022 if (bdev_read_only(I_BDEV(bd_inode)))
2023 2023 return -EPERM;
2024 2024
2025 - /* uswsusp needs write permission to the swap */
2026 - if (IS_SWAPFILE(bd_inode) && !hibernation_available())
2025 + if (IS_SWAPFILE(bd_inode) && !is_hibernate_resume_dev(bd_inode))
2027 2026 return -ETXTBSY;
2028 2027
2029 2028 if (!iov_iter_count(from))
+1 -1
include/linux/cpufreq.h
···
330 330 *
331 331 * get_intermediate should return a stable intermediate frequency
332 332 * platform wants to switch to and target_intermediate() should set CPU
333 - * to to that frequency, before jumping to the frequency corresponding
333 + * to that frequency, before jumping to the frequency corresponding
334 334 * to 'index'. Core will take care of sending notifications and driver
335 335 * doesn't have to handle them in target_intermediate() or
336 336 * target_index().
+9 -23
include/linux/pm.h
···
544 544 * These flags can be set by device drivers at the probe time. They need not be
545 545 * cleared by the drivers as the driver core will take care of that.
546 546 *
547 - * NEVER_SKIP: Do not skip all system suspend/resume callbacks for the device.
548 - * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
549 - * SMART_SUSPEND: No need to resume the device from runtime suspend.
550 - * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
547 + * NO_DIRECT_COMPLETE: Do not apply direct-complete optimization to the device.
548 + * SMART_PREPARE: Take the driver ->prepare callback return value into account.
549 + * SMART_SUSPEND: Avoid resuming the device from runtime suspend.
550 + * MAY_SKIP_RESUME: Allow driver "noirq" and "early" callbacks to be skipped.
551 551 *
552 - * Setting SMART_PREPARE instructs bus types and PM domains which may want
553 - * system suspend/resume callbacks to be skipped for the device to return 0 from
554 - * their ->prepare callbacks if the driver's ->prepare callback returns 0 (in
555 - * other words, the system suspend/resume callbacks can only be skipped for the
556 - * device if its driver doesn't object against that). This flag has no effect
557 - * if NEVER_SKIP is set.
558 - *
559 - * Setting SMART_SUSPEND instructs bus types and PM domains which may want to
560 - * runtime resume the device upfront during system suspend that doing so is not
561 - * necessary from the driver's perspective. It also may cause them to skip
562 - * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
563 - * the driver if they decide to leave the device in runtime suspend.
564 - *
565 - * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
566 - * driver prefers the device to be left in suspend after system resume.
552 + * See Documentation/driver-api/pm/devices.rst for details.
567 553 */
568 - #define DPM_FLAG_NEVER_SKIP BIT(0)
554 + #define DPM_FLAG_NO_DIRECT_COMPLETE BIT(0)
569 555 #define DPM_FLAG_SMART_PREPARE BIT(1)
570 556 #define DPM_FLAG_SMART_SUSPEND BIT(2)
571 - #define DPM_FLAG_LEAVE_SUSPENDED BIT(3)
557 + #define DPM_FLAG_MAY_SKIP_RESUME BIT(3)
572 558
573 559 struct dev_pm_info {
574 560 pm_message_t power_state;
···
744 758 extern int pm_generic_poweroff(struct device *dev);
745 759 extern void pm_generic_complete(struct device *dev);
746 760
747 - extern bool dev_pm_may_skip_resume(struct device *dev);
748 - extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
761 + extern bool dev_pm_skip_resume(struct device *dev);
762 + extern bool dev_pm_skip_suspend(struct device *dev);
749 763
750 764 #else /* !CONFIG_PM_SLEEP */
751 765
+2 -2
include/linux/pm_runtime.h
···
102 102 return !dev->power.disable_depth;
103 103 }
104 104
105 - static inline bool pm_runtime_callbacks_present(struct device *dev)
105 + static inline bool pm_runtime_has_no_callbacks(struct device *dev)
106 106 {
107 - return !dev->power.no_callbacks;
107 + return dev->power.no_callbacks;
108 108 }
109 109
110 110 static inline void pm_runtime_mark_last_busy(struct device *dev)
+6
include/linux/suspend.h
···
466 466 static inline bool hibernation_available(void) { return false; }
467 467 #endif /* CONFIG_HIBERNATION */
468 468
469 + #ifdef CONFIG_HIBERNATION_SNAPSHOT_DEV
470 + int is_hibernate_resume_dev(const struct inode *);
471 + #else
472 + static inline int is_hibernate_resume_dev(const struct inode *i) { return 0; }
473 + #endif
474 +
469 475 /* Hibernation and suspend events */
470 476 #define PM_HIBERNATION_PREPARE 0x0001 /* Going to hibernate */
471 477 #define PM_POST_HIBERNATION 0x0002 /* Hibernation finished */
+12
kernel/power/Kconfig
···
80 80
81 81 For more information take a look at <file:Documentation/power/swsusp.rst>.
82 82
83 + config HIBERNATION_SNAPSHOT_DEV
84 + bool "Userspace snapshot device"
85 + depends on HIBERNATION
86 + default y
87 + ---help---
88 + Device used by the uswsusp tools.
89 +
90 + Say N if no snapshotting from userspace is needed, this also
91 + reduces the attack surface of the kernel.
92 +
93 + If in doubt, say Y.
94 +
83 95 config PM_STD_PARTITION
84 96 string "Default resume partition"
85 97 depends on HIBERNATION
+2 -1
kernel/power/Makefile
···
10 10 obj-$(CONFIG_FREEZER) += process.o
11 11 obj-$(CONFIG_SUSPEND) += suspend.o
12 12 obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
13 - obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o
13 + obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o
14 + obj-$(CONFIG_HIBERNATION_SNAPSHOT_DEV) += user.o
14 15 obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
15 16 obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o
16 17
+16 -4
kernel/power/hibernate.c
···
67 67
68 68 static const struct platform_hibernation_ops *hibernation_ops;
69 69
70 + static atomic_t hibernate_atomic = ATOMIC_INIT(1);
71 +
72 + bool hibernate_acquire(void)
73 + {
74 + return atomic_add_unless(&hibernate_atomic, -1, 0);
75 + }
76 +
77 + void hibernate_release(void)
78 + {
79 + atomic_inc(&hibernate_atomic);
80 + }
81 +
70 82 bool hibernation_available(void)
71 83 {
72 84 return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION);
···
716 704
717 705 lock_system_sleep();
718 706 /* The snapshot device should not be opened while we're running */
719 - if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
707 + if (!hibernate_acquire()) {
720 708 error = -EBUSY;
721 709 goto Unlock;
722 710 }
···
787 775 Exit:
788 776 __pm_notifier_call_chain(PM_POST_HIBERNATION, nr_calls, NULL);
789 777 pm_restore_console();
790 - atomic_inc(&snapshot_device_available);
778 + hibernate_release();
791 779 Unlock:
792 780 unlock_system_sleep();
793 781 pr_info("hibernation exit\n");
···
892 880 goto Unlock;
893 881
894 882 /* The snapshot device should not be opened while we're running */
895 - if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
883 + if (!hibernate_acquire()) {
896 884 error = -EBUSY;
897 885 swsusp_close(FMODE_READ);
898 886 goto Unlock;
···
923 911 __pm_notifier_call_chain(PM_POST_RESTORE, nr_calls, NULL);
924 912 pm_restore_console();
925 913 pr_info("resume failed (%d)\n", error);
926 - atomic_inc(&snapshot_device_available);
914 + hibernate_release();
927 915 /* For success case, the suspend path will release the lock */
928 916 Unlock:
929 917 mutex_unlock(&system_transition_mutex);
+2 -2
kernel/power/power.h
···
154 154 extern void snapshot_write_finalize(struct snapshot_handle *handle);
155 155 extern int snapshot_image_loaded(struct snapshot_handle *handle);
156 156
157 - /* If unset, the snapshot device cannot be open. */
158 - extern atomic_t snapshot_device_available;
157 + extern bool hibernate_acquire(void);
158 + extern void hibernate_release(void);
159 159
160 160 extern sector_t alloc_swapdev_block(int swap);
161 161 extern void free_all_swap_pages(int swap);
+16 -6
kernel/power/user.c
···
35 35 bool ready;
36 36 bool platform_support;
37 37 bool free_bitmaps;
38 + struct inode *bd_inode;
38 39 } snapshot_state;
39 40
40 - atomic_t snapshot_device_available = ATOMIC_INIT(1);
41 + int is_hibernate_resume_dev(const struct inode *bd_inode)
42 + {
43 + return hibernation_available() && snapshot_state.bd_inode == bd_inode;
44 + }
41 45
42 46 static int snapshot_open(struct inode *inode, struct file *filp)
43 47 {
···
53 49
54 50 lock_system_sleep();
55 51
56 - if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
52 + if (!hibernate_acquire()) {
57 53 error = -EBUSY;
58 54 goto Unlock;
59 55 }
60 56
61 57 if ((filp->f_flags & O_ACCMODE) == O_RDWR) {
62 - atomic_inc(&snapshot_device_available);
58 + hibernate_release();
63 59 error = -ENOSYS;
64 60 goto Unlock;
65 61 }
···
96 92 __pm_notifier_call_chain(PM_POST_RESTORE, nr_calls, NULL);
97 93 }
98 94 if (error)
99 - atomic_inc(&snapshot_device_available);
95 + hibernate_release();
100 96
101 97 data->frozen = false;
102 98 data->ready = false;
103 99 data->platform_support = false;
100 + data->bd_inode = NULL;
104 101
105 102 Unlock:
106 103 unlock_system_sleep();
···
117 112
118 113 swsusp_free();
119 114 data = filp->private_data;
115 + data->bd_inode = NULL;
120 116 free_all_swap_pages(data->swap);
121 117 if (data->frozen) {
122 118 pm_restore_gfp_mask();
···
128 122 }
129 123 pm_notifier_call_chain(data->mode == O_RDONLY ?
130 124 PM_POST_HIBERNATION : PM_POST_RESTORE);
131 - atomic_inc(&snapshot_device_available);
125 + hibernate_release();
132 126
133 127 unlock_system_sleep();
134 128
···
210 204 static int snapshot_set_swap_area(struct snapshot_data *data,
211 205 void __user *argp)
212 206 {
207 + struct block_device *bdev;
213 208 sector_t offset;
214 209 dev_t swdev;
215 210
···
241 234 data->swap = -1;
242 235 return -EINVAL;
243 236 }
244 - data->swap = swap_type_of(swdev, offset, NULL);
237 + data->swap = swap_type_of(swdev, offset, &bdev);
245 238 if (data->swap < 0)
246 239 return -ENODEV;
240 +
241 + data->bd_inode = bdev->bd_inode;
242 + bdput(bdev);
247 243 return 0;
248 244 }
249 245
+1 -1
tools/power/cpupower/utils/cpupower-info.c
···
62 62 default:
63 63 print_wrong_arg_exit();
64 64 }
65 - };
65 + }
66 66
67 67 if (!params.params)
68 68 params.params = 0x7;
+1 -1
tools/power/cpupower/utils/cpupower-set.c
···
72 72 default:
73 73 print_wrong_arg_exit();
74 74 }
75 - };
75 + }
76 76
77 77 if (!params.params)
78 78 print_wrong_arg_exit();
+1 -1
tools/power/cpupower/utils/idle_monitor/amd_fam14h_idle.c
···
117 117 break;
118 118 default:
119 119 return -1;
120 - };
120 + }
121 121 return 0;
122 122 }
123 123
+3 -3
tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c
···
53 53 dprint("CPU %d - State: %d - Val: %llu\n",
54 54 cpu, state, previous_count[cpu][state]);
55 55 }
56 - };
56 + }
57 57 return 0;
58 58 }
59 59
···
72 72 dprint("CPU %d - State: %d - Val: %llu\n",
73 73 cpu, state, previous_count[cpu][state]);
74 74 }
75 - };
75 + }
76 76 return 0;
77 77 }
78 78
···
172 172 cpuidle_cstates[num].id = num;
173 173 cpuidle_cstates[num].get_count_percent =
174 174 cpuidle_get_count_percent;
175 - };
175 + }
176 176
177 177 /* Free this at program termination */
178 178 previous_count = malloc(sizeof(long long *) * cpu_count);
+1 -1
tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c
···
79 79 break;
80 80 default:
81 81 return -1;
82 - };
82 + }
83 83 if (read_msr(cpu, msr, val))
84 84 return -1;
85 85 return 0;
+1 -1
tools/power/cpupower/utils/idle_monitor/nhm_idle.c
···
91 91 break;
92 92 default:
93 93 return -1;
94 - };
94 + }
95 95 if (read_msr(cpu, msr, val))
96 96 return -1;
97 97
+1 -1
tools/power/cpupower/utils/idle_monitor/snb_idle.c
···
77 77 break;
78 78 default:
79 79 return -1;
80 - };
80 + }
81 81 if (read_msr(cpu, msr, val))
82 82 return -1;
83 83 return 0;