
Merge tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"These are dominated by cpufreq updates which in turn are dominated by
updates related to boost support in the core and drivers and
amd-pstate driver optimizations.

Apart from the above, there are some cpuidle updates including a
rework of the most recent idle intervals handling in the venerable
menu governor that leads to significant improvements in some
performance benchmarks, as the governor is now more likely to predict
a shorter idle duration in some cases, and there are updates of the
core device power management code, mostly related to system suspend
and resume, that should help to avoid potential issues arising when
the drivers of devices depending on one another want to use different
optimizations.

There is also a usual collection of assorted fixes and cleanups,
including removal of some unused code.

Specifics:

- Manage sysfs attributes and boost frequencies efficiently from
cpufreq core to reduce boilerplate code in drivers (Viresh Kumar)

- Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
Dhananjay Ugwekar, Imran Shaik, zuoqian)

- Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
Bai)

- cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski)

- Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng)

- Optimize the amd-pstate driver to avoid cases where call paths end
up calling the same writes multiple times and needlessly caching
variables through code reorganization, locking overhaul and tracing
adjustments (Mario Limonciello, Dhananjay Ugwekar)

- Make it possible to avoid enabling capacity-aware scheduling (CAS)
in the intel_pstate driver and relocate a check for out-of-band
(OOB) platform handling in it to make it detect OOB before checking
HWP availability (Rafael Wysocki)

- Fix dbs_update() to avoid inadvertent conversions of negative
integer values to unsigned int, which cause CPU frequency selection
to be inaccurate in some cases when the "conservative" cpufreq
governor is in use (Jie Zhan)

- Update the handling of the most recent idle intervals in the menu
cpuidle governor to prevent useful information from being discarded
by it in some cases and improve the prediction accuracy (Rafael
Wysocki)

- Make it possible to tell the intel_idle driver to ignore its
built-in table of idle states for the given processor, clean up the
handling of auto-demotion disabling on Baytrail and Cherrytrail
chips in it, and update its MAINTAINERS entry (David Arcari, Artem
Bityutskiy, Rafael Wysocki)

- Make some cpuidle drivers use for_each_present_cpu() instead of
for_each_possible_cpu() during initialization to avoid issues
occurring when nosmp or maxcpus=0 are used (Jacky Bai)

- Clean up the Energy Model handling code somewhat (Rafael Wysocki)

- Use kfree_rcu() to simplify the handling of runtime Energy Model
updates (Li RongQing)

- Add an entry for the Energy Model framework to MAINTAINERS as
properly maintained (Lukasz Luba)

- Address RCU-related sparse warnings in the Energy Model code
(Rafael Wysocki)

- Remove ENERGY_MODEL dependency on SMP and allow it to be selected
when DEVFREQ is set without CPUFREQ so it can be used on a wider
range of systems (Jeson Gao)

- Unify error handling during runtime suspend and runtime resume in
the core to help drivers to implement more consistent runtime PM
error handling (Rafael Wysocki)

- Drop a redundant check from pm_runtime_force_resume() and rearrange
documentation related to __pm_runtime_disable() (Rafael Wysocki)

- Rework the handling of the "smart suspend" driver flag in the PM
core to avoid issues that may occur when drivers using it depend on
some other drivers and clean up the related PM core code (Rafael
Wysocki, Colin Ian King)

- Fix the handling of devices with the power.direct_complete flag set
if device_suspend() returns an error for at least one device to
avoid situations in which some of them may not be resumed (Rafael
Wysocki)

- Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
possible deadlock that may occur if the "compressor" hibernation
module parameter is accessed during the registration of a new
ieee80211 device (Lizhi Xu)

- Suppress sleeping parent warning in device_pm_add() in the case
when new children are added under a device with the
power.direct_complete set after it has been processed by
device_resume() (Xu Yang)

- Remove needless return in three void functions related to system
wakeup (Zijun Hu)

- Replace deprecated kmap_atomic() with kmap_local_page() in the
hibernation core code (David Reaver)

- Remove unused helper functions related to system sleep (David Alan
Gilbert)

- Clean up s2idle_enter() so it does not lock and unlock CPU offline
in vain and update comments in it (Ulf Hansson)

- Clean up broken white space in dpm_wait_for_children() (Geert
Uytterhoeven)

- Update the cpupower utility to fix lib versioning in it and memory
leaks in error legs, remove hard-coded values, and implement CPU
physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
Khan, Yiwei Lin, Zhongqiu Han)"

* tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits)
PM: sleep: Fix bit masking operation
dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
cpufreq: Init cpufreq only for present CPUs
PM: sleep: Fix handling devices with direct_complete set on errors
cpuidle: Init cpuidle only for present CPUs
PM: clk: Remove unused pm_clk_remove()
PM: sleep: core: Fix indentation in dpm_wait_for_children()
PM: s2idle: Extend comment in s2idle_enter()
PM: s2idle: Drop redundant locks when entering s2idle
PM: sleep: Remove unused pm_generic_ wrappers
cpufreq: tegra186: Share policy per cluster
cpupower: Make lib versioning scheme more obvious and fix version link
PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL
PM: EM: Address RCU-related sparse warnings
cpupower: Implement CPU physical core querying
pm: cpupower: remove hard-coded topology depth values
pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology
...

+1145 -1192
+3
Documentation/admin-guide/kernel-parameters.txt
···
 	per_cpu_perf_limits
 		Allow per-logical-CPU P-State performance control limits using
 		cpufreq sysfs interface
+	no_cas
+		Do not enable capacity-aware scheduling (CAS) on
+		hybrid systems
 
 	intremap=	[X86-64,Intel-IOMMU,EARLY]
 		on	enable Interrupt Remapping (default)
+16 -11
Documentation/admin-guide/pm/cpuidle.rst
···
 and variance of them.  If the variance is small (smaller than 400 square
 milliseconds) or it is small relative to the average (the average is greater
 than 6 times the standard deviation), the average is regarded as the "typical
-interval" value.  Otherwise, the longest of the saved observed idle duration
+interval" value.  Otherwise, either the longest or the shortest (depending on
+which one is farther from the average) of the saved observed idle duration
 values is discarded and the computation is repeated for the remaining ones.
+
 Again, if the variance of them is small (in the above sense), the average is
 taken as the "typical interval" value and so on, until either the "typical
-interval" is determined or too many data points are disregarded, in which case
-the "typical interval" is assumed to equal "infinity" (the maximum unsigned
-integer value).
+interval" is determined or too many data points are disregarded.  In the latter
+case, if the size of the set of data points still under consideration is
+sufficiently large, the next idle duration is not likely to be above the largest
+idle duration value still in that set, so that value is taken as the predicted
+next idle duration.  Finally, if the set of data points still under
+consideration is too small, no prediction is made.
 
-If the "typical interval" computed this way is long enough, the governor obtains
-the time until the closest timer event with the assumption that the scheduler
-tick will be stopped.  That time, referred to as the *sleep length* in what follows,
-is the upper bound on the time before the next CPU wakeup.  It is used to determine
-the sleep length range, which in turn is needed to get the sleep length correction
-factor.
+If the preliminary prediction of the next idle duration computed this way is
+long enough, the governor obtains the time until the closest timer event with
+the assumption that the scheduler tick will be stopped.  That time, referred to
+as the *sleep length* in what follows, is the upper bound on the time before the
+next CPU wakeup.  It is used to determine the sleep length range, which in turn
+is needed to get the sleep length correction factor.
 
 The ``menu`` governor maintains an array containing several correction factor
 values that correspond to different sleep length ranges organized so that each
···
 The sleep length is multiplied by the correction factor for the range that it
 falls into to obtain an approximation of the predicted idle duration that is
 compared to the "typical interval" determined previously and the minimum of
-the two is taken as the idle duration prediction.
+the two is taken as the final idle duration prediction.
 
 If the "typical interval" value is small, which means that the CPU is likely
 to be woken up soon enough, the sleep length computation is skipped as it may
+13 -5
Documentation/admin-guide/pm/intel_idle.rst
···
 Documentation/admin-guide/pm/cpuidle.rst).
 Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail.
 
-The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle``
-if the kernel has been configured with ACPI support) can be set to make the
-driver ignore the system's ACPI tables entirely or use them for all of the
-recognized processor models, respectively (they both are unset by default and
-``use_acpi`` has no effect if ``no_acpi`` is set).
+The ``no_acpi``, ``use_acpi`` and ``no_native`` module parameters are
+recognized by ``intel_idle`` if the kernel has been configured with ACPI
+support.  In the case that ACPI is not configured these flags have no impact
+on functionality.
+
+``no_acpi`` - Do not use ACPI at all.  Only native mode is available, no
+ACPI mode.
+
+``use_acpi`` - No-op in ACPI mode, the driver will consult ACPI tables for
+C-states on/off status in native mode.
+
+``no_native`` - Work only in ACPI mode, no native mode available (ignore
+all custom tables).
 
 The value of the ``states_off`` module parameter (0 by default) represents a
 list of idle states to be disabled by default in the form of a bitmask.
+3
Documentation/admin-guide/pm/intel_pstate.rst
···
 	Use per-logical-CPU P-State limits (see `Coordination of P-state
 	Limits`_ for details).
 
+``no_cas``
+	Do not enable capacity-aware scheduling (CAS) which is enabled by
+	default on hybrid systems.
 
 Diagnostics and Tuning
 ======================
+31 -4
Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml
···
       - description: v2 of CPUFREQ HW (EPSS)
         items:
           - enum:
+              - qcom,qcs8300-cpufreq-epss
               - qcom,qdu1000-cpufreq-epss
               - qcom,sa8255p-cpufreq-epss
               - qcom,sa8775p-cpufreq-epss
···
             enum:
               - qcom,qcm2290-cpufreq-hw
               - qcom,sar2130p-cpufreq-epss
+              - qcom,sdx75-cpufreq-epss
     then:
       properties:
         reg:
-          minItems: 1
           maxItems: 1
 
         reg-names:
-          minItems: 1
           maxItems: 1
 
         interrupts:
-          minItems: 1
           maxItems: 1
 
         interrupt-names:
-          minItems: 1
+          maxItems: 1
 
   - if:
       properties:
···
             enum:
               - qcom,qdu1000-cpufreq-epss
               - qcom,sa8255p-cpufreq-epss
+              - qcom,sa8775p-cpufreq-epss
               - qcom,sc7180-cpufreq-hw
               - qcom,sc8180x-cpufreq-hw
               - qcom,sc8280xp-cpufreq-epss
···
 
         interrupt-names:
           minItems: 2
+          maxItems: 2
 
   - if:
       properties:
         compatible:
           contains:
             enum:
+              - qcom,qcs8300-cpufreq-epss
               - qcom,sc7280-cpufreq-epss
               - qcom,sm8250-cpufreq-epss
               - qcom,sm8350-cpufreq-epss
···
 
         interrupt-names:
           minItems: 3
+          maxItems: 3
 
   - if:
       properties:
···
 
         interrupt-names:
           minItems: 2
+          maxItems: 2
 
+  - if:
+      properties:
+        compatible:
+          contains:
+            enum:
+              - qcom,sm8650-cpufreq-epss
+    then:
+      properties:
+        reg:
+          minItems: 4
+          maxItems: 4
+
+        reg-names:
+          minItems: 4
+          maxItems: 4
+
+        interrupts:
+          minItems: 4
+          maxItems: 4
+
+        interrupt-names:
+          minItems: 4
+          maxItems: 4
 
 examples:
   - |
+14 -3
MAINTAINERS
···
 S:	Maintained
 F:	drivers/media/rc/ene_ir.*
 
+ENERGY MODEL
+M:	Lukasz Luba <lukasz.luba@arm.com>
+M:	"Rafael J. Wysocki" <rafael@kernel.org>
+L:	linux-pm@vger.kernel.org
+S:	Maintained
+F:	kernel/power/energy_model.c
+F:	include/linux/energy_model.h
+F:	Documentation/power/energy-model.rst
+
 EPAPR HYPERVISOR BYTE CHANNEL DEVICE DRIVER
 M:	Laurentiu Tudor <laurentiu.tudor@nxp.com>
 L:	linuxppc-dev@lists.ozlabs.org
···
 F:	drivers/crypto/intel/iaa/*
 
 INTEL IDLE DRIVER
-M:	Jacob Pan <jacob.jun.pan@linux.intel.com>
-M:	Len Brown <lenb@kernel.org>
+M:	Rafael J. Wysocki <rafael@kernel.org>
+M:	Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
+M:	Artem Bityutskiy <dedekind1@gmail.com>
+R:	Len Brown <lenb@kernel.org>
 L:	linux-pm@vger.kernel.org
 S:	Supported
 B:	https://bugzilla.kernel.org
-T:	git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
 F:	drivers/idle/intel_idle.c
 
 INTEL IDXD DRIVER
+10 -8
arch/x86/include/asm/msr-index.h
···
 #define MSR_AMD_CPPC_REQ		0xc00102b3
 #define MSR_AMD_CPPC_STATUS		0xc00102b4
 
-#define AMD_CPPC_LOWEST_PERF(x)		(((x) >> 0) & 0xff)
-#define AMD_CPPC_LOWNONLIN_PERF(x)	(((x) >> 8) & 0xff)
-#define AMD_CPPC_NOMINAL_PERF(x)	(((x) >> 16) & 0xff)
-#define AMD_CPPC_HIGHEST_PERF(x)	(((x) >> 24) & 0xff)
+/* Masks for use with MSR_AMD_CPPC_CAP1 */
+#define AMD_CPPC_LOWEST_PERF_MASK	GENMASK(7, 0)
+#define AMD_CPPC_LOWNONLIN_PERF_MASK	GENMASK(15, 8)
+#define AMD_CPPC_NOMINAL_PERF_MASK	GENMASK(23, 16)
+#define AMD_CPPC_HIGHEST_PERF_MASK	GENMASK(31, 24)
 
-#define AMD_CPPC_MAX_PERF(x)		(((x) & 0xff) << 0)
-#define AMD_CPPC_MIN_PERF(x)		(((x) & 0xff) << 8)
-#define AMD_CPPC_DES_PERF(x)		(((x) & 0xff) << 16)
-#define AMD_CPPC_ENERGY_PERF_PREF(x)	(((x) & 0xff) << 24)
+/* Masks for use with MSR_AMD_CPPC_REQ */
+#define AMD_CPPC_MAX_PERF_MASK		GENMASK(7, 0)
+#define AMD_CPPC_MIN_PERF_MASK		GENMASK(15, 8)
+#define AMD_CPPC_DES_PERF_MASK		GENMASK(23, 16)
+#define AMD_CPPC_EPP_PERF_MASK		GENMASK(31, 24)
 
 /* AMD Performance Counter Global Status and Control MSRs */
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS	0xc0000300
+1 -1
arch/x86/kernel/acpi/cppc.c
···
 		if (ret)
 			goto out;
 
-		val = AMD_CPPC_HIGHEST_PERF(val);
+		val = FIELD_GET(AMD_CPPC_HIGHEST_PERF_MASK, val);
 	} else {
 		ret = cppc_get_highest_perf(cpu, &val);
 		if (ret)
+2 -2
drivers/acpi/device_pm.c
···
  */
 int acpi_subsys_suspend(struct device *dev)
 {
-	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
+	if (!dev_pm_smart_suspend(dev) ||
 	    acpi_dev_needs_resume(dev, ACPI_COMPANION(dev)))
 		pm_runtime_resume(dev);
 
···
  */
 int acpi_subsys_poweroff(struct device *dev)
 {
-	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
+	if (!dev_pm_smart_suspend(dev) ||
 	    acpi_dev_needs_resume(dev, ACPI_COMPANION(dev)))
 		pm_runtime_resume(dev);
 
-73
drivers/base/power/clock_ops.c
···
 }
 EXPORT_SYMBOL_GPL(pm_clk_add_clk);
 
-
-/**
- * of_pm_clk_add_clk - Start using a device clock for power management.
- * @dev: Device whose clock is going to be used for power management.
- * @name: Name of clock that is going to be used for power management.
- *
- * Add the clock described in the 'clocks' device-tree node that matches
- * with the 'name' provided, to the list of clocks used for the power
- * management of @dev.  On success, returns 0.  Returns a negative error
- * code if the clock is not found or cannot be added.
- */
-int of_pm_clk_add_clk(struct device *dev, const char *name)
-{
-	struct clk *clk;
-	int ret;
-
-	if (!dev || !dev->of_node || !name)
-		return -EINVAL;
-
-	clk = of_clk_get_by_name(dev->of_node, name);
-	if (IS_ERR(clk))
-		return PTR_ERR(clk);
-
-	ret = pm_clk_add_clk(dev, clk);
-	if (ret) {
-		clk_put(clk);
-		return ret;
-	}
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(of_pm_clk_add_clk);
-
 /**
  * of_pm_clk_add_clks - Start using device clock(s) for power management.
  * @dev: Device whose clock(s) is going to be used for power management.
···
 	kfree(ce->con_id);
 	kfree(ce);
 }
-
-/**
- * pm_clk_remove - Stop using a device clock for power management.
- * @dev: Device whose clock should not be used for PM any more.
- * @con_id: Connection ID of the clock.
- *
- * Remove the clock represented by @con_id from the list of clocks used for
- * the power management of @dev.
- */
-void pm_clk_remove(struct device *dev, const char *con_id)
-{
-	struct pm_subsys_data *psd = dev_to_psd(dev);
-	struct pm_clock_entry *ce;
-
-	if (!psd)
-		return;
-
-	pm_clk_list_lock(psd);
-
-	list_for_each_entry(ce, &psd->clock_list, node) {
-		if (!con_id && !ce->con_id)
-			goto remove;
-		else if (!con_id || !ce->con_id)
-			continue;
-		else if (!strcmp(con_id, ce->con_id))
-			goto remove;
-	}
-
-	pm_clk_list_unlock(psd);
-	return;
-
- remove:
-	list_del(&ce->node);
-	if (ce->enabled_when_prepared)
-		psd->clock_op_might_sleep--;
-	pm_clk_list_unlock(psd);
-
-	__pm_clk_remove(ce);
-}
-EXPORT_SYMBOL_GPL(pm_clk_remove);
 
 /**
  * pm_clk_remove_clk - Stop using a device clock for power management.
-24
drivers/base/power/generic_ops.c
···
 EXPORT_SYMBOL_GPL(pm_generic_freeze_noirq);
 
 /**
- * pm_generic_freeze_late - Generic freeze_late callback for subsystems.
- * @dev: Device to freeze.
- */
-int pm_generic_freeze_late(struct device *dev)
-{
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-
-	return pm && pm->freeze_late ? pm->freeze_late(dev) : 0;
-}
-EXPORT_SYMBOL_GPL(pm_generic_freeze_late);
-
-/**
  * pm_generic_freeze - Generic freeze callback for subsystems.
  * @dev: Device to freeze.
  */
···
 	return pm && pm->thaw_noirq ? pm->thaw_noirq(dev) : 0;
 }
 EXPORT_SYMBOL_GPL(pm_generic_thaw_noirq);
-
-/**
- * pm_generic_thaw_early - Generic thaw_early callback for subsystems.
- * @dev: Device to thaw.
- */
-int pm_generic_thaw_early(struct device *dev)
-{
-	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-
-	return pm && pm->thaw_early ? pm->thaw_early(dev) : 0;
-}
-EXPORT_SYMBOL_GPL(pm_generic_thaw_early);
 
 /**
  * pm_generic_thaw - Generic thaw callback for subsystems.
+120 -45
drivers/base/power/main.c
···
 
 static void dpm_wait_for_children(struct device *dev, bool async)
 {
-       device_for_each_child(dev, &async, dpm_wait_fn);
+	device_for_each_child(dev, &async, dpm_wait_fn);
 }
 
 static void dpm_wait_for_suppliers(struct device *dev, bool async)
···
 
 static bool dpm_async_fn(struct device *dev, async_func_t func)
 {
-	reinit_completion(&dev->power.completion);
+	if (!is_async(dev))
+		return false;
 
-	if (is_async(dev)) {
-		dev->power.async_in_progress = true;
+	dev->power.work_in_progress = true;
 
-		get_device(dev);
+	get_device(dev);
 
-		if (async_schedule_dev_nocall(func, dev))
-			return true;
+	if (async_schedule_dev_nocall(func, dev))
+		return true;
 
-		put_device(dev);
-	}
+	put_device(dev);
+
 	/*
-	 * Because async_schedule_dev_nocall() above has returned false or it
-	 * has not been called at all, func() is not running and it is safe to
-	 * update the async_in_progress flag without extra synchronization.
+	 * async_schedule_dev_nocall() above has returned false, so func() is
+	 * not running and it is safe to update power.work_in_progress without
+	 * extra synchronization.
 	 */
-	dev->power.async_in_progress = false;
+	dev->power.work_in_progress = false;
+
 	return false;
+}
+
+static void dpm_clear_async_state(struct device *dev)
+{
+	reinit_completion(&dev->power.completion);
+	dev->power.work_in_progress = false;
 }
 
 /**
···
 	 * so change its status accordingly.
 	 *
 	 * Otherwise, the device is going to be resumed, so set its PM-runtime
-	 * status to "active" unless its power.set_active flag is clear, in
+	 * status to "active" unless its power.smart_suspend flag is clear, in
 	 * which case it is not necessary to update its PM-runtime status.
 	 */
-	if (skip_resume) {
+	if (skip_resume)
 		pm_runtime_set_suspended(dev);
-	} else if (dev->power.set_active) {
+	else if (dev_pm_smart_suspend(dev))
 		pm_runtime_set_active(dev);
-		dev->power.set_active = false;
-	}
 
 	if (dev->pm_domain) {
 		info = "noirq power domain ";
···
 	 * Trigger the resume of "async" devices upfront so they don't have to
 	 * wait for the "non-async" ones they don't depend on.
 	 */
-	list_for_each_entry(dev, &dpm_noirq_list, power.entry)
+	list_for_each_entry(dev, &dpm_noirq_list, power.entry) {
+		dpm_clear_async_state(dev);
 		dpm_async_fn(dev, async_resume_noirq);
+	}
 
 	while (!list_empty(&dpm_noirq_list)) {
 		dev = to_device(dpm_noirq_list.next);
 		list_move_tail(&dev->power.entry, &dpm_late_early_list);
 
-		if (!dev->power.async_in_progress) {
+		if (!dev->power.work_in_progress) {
 			get_device(dev);
 
 			mutex_unlock(&dpm_list_mtx);
···
 	 * Trigger the resume of "async" devices upfront so they don't have to
 	 * wait for the "non-async" ones they don't depend on.
 	 */
-	list_for_each_entry(dev, &dpm_late_early_list, power.entry)
+	list_for_each_entry(dev, &dpm_late_early_list, power.entry) {
+		dpm_clear_async_state(dev);
 		dpm_async_fn(dev, async_resume_early);
+	}
 
 	while (!list_empty(&dpm_late_early_list)) {
 		dev = to_device(dpm_late_early_list.next);
 		list_move_tail(&dev->power.entry, &dpm_suspended_list);
 
-		if (!dev->power.async_in_progress) {
+		if (!dev->power.work_in_progress) {
 			get_device(dev);
 
 			mutex_unlock(&dpm_list_mtx);
···
 	if (dev->power.syscore)
 		goto Complete;
 
+	if (!dev->power.is_suspended)
+		goto Complete;
+
 	if (dev->power.direct_complete) {
+		/*
+		 * Allow new children to be added under the device after this
+		 * point if it has no PM callbacks.
+		 */
+		if (dev->power.no_pm_callbacks)
+			dev->power.is_prepared = false;
+
 		/* Match the pm_runtime_disable() in device_suspend(). */
 		pm_runtime_enable(dev);
 		goto Complete;
···
 	 * a resumed device, even if the device hasn't been completed yet.
 	 */
 	dev->power.is_prepared = false;
-
-	if (!dev->power.is_suspended)
-		goto Unlock;
 
 	if (dev->pm_domain) {
 		info = "power domain ";
···
 	error = dpm_run_callback(callback, dev, state, info);
 	dev->power.is_suspended = false;
 
- Unlock:
 	device_unlock(dev);
 	dpm_watchdog_clear(&wd);
···
 	 * Trigger the resume of "async" devices upfront so they don't have to
 	 * wait for the "non-async" ones they don't depend on.
 	 */
-	list_for_each_entry(dev, &dpm_suspended_list, power.entry)
+	list_for_each_entry(dev, &dpm_suspended_list, power.entry) {
+		dpm_clear_async_state(dev);
 		dpm_async_fn(dev, async_resume);
+	}
 
 	while (!list_empty(&dpm_suspended_list)) {
 		dev = to_device(dpm_suspended_list.next);
 		list_move_tail(&dev->power.entry, &dpm_prepared_list);
 
-		if (!dev->power.async_in_progress) {
+		if (!dev->power.work_in_progress) {
 			get_device(dev);
 
 			mutex_unlock(&dpm_list_mtx);
···
 	device_unlock(dev);
 
 out:
+	/* If enabling runtime PM for the device is blocked, unblock it. */
+	pm_runtime_unblock(dev);
 	pm_runtime_put(dev);
 }
···
 	dev->power.is_noirq_suspended = true;
 
 	/*
-	 * Skipping the resume of devices that were in use right before the
-	 * system suspend (as indicated by their PM-runtime usage counters)
-	 * would be suboptimal.  Also resume them if doing that is not allowed
-	 * to be skipped.
+	 * Devices must be resumed unless they are explicitly allowed to be left
+	 * in suspend, but even in that case skipping the resume of devices that
+	 * were in use right before the system suspend (as indicated by their
+	 * runtime PM usage counters and child counters) would be suboptimal.
 	 */
-	if (atomic_read(&dev->power.usage_count) > 1 ||
-	    !(dev_pm_test_driver_flags(dev, DPM_FLAG_MAY_SKIP_RESUME) &&
-	      dev->power.may_skip_resume))
+	if (!(dev_pm_test_driver_flags(dev, DPM_FLAG_MAY_SKIP_RESUME) &&
+	      dev->power.may_skip_resume) || !pm_runtime_need_not_resume(dev))
 		dev->power.must_resume = true;
 
-	if (dev->power.must_resume) {
-		if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND)) {
-			dev->power.set_active = true;
-			if (dev->parent && !dev->parent->power.ignore_children)
-				dev->parent->power.set_active = true;
-		}
+	if (dev->power.must_resume)
 		dpm_superior_set_must_resume(dev);
-	}
 
 Complete:
 	complete_all(&dev->power.completion);
···
 
 		list_move(&dev->power.entry, &dpm_noirq_list);
 
+		dpm_clear_async_state(dev);
 		if (dpm_async_fn(dev, async_suspend_noirq))
 			continue;
 
···
 	TRACE_DEVICE(dev);
 	TRACE_SUSPEND(0);
 
+	/*
+	 * Disable runtime PM for the device without checking if there is a
+	 * pending resume request for it.
+	 */
 	__pm_runtime_disable(dev, false);
 
 	dpm_wait_for_subordinate(dev, async);
···
 
 		list_move(&dev->power.entry, &dpm_late_early_list);
 
+		dpm_clear_async_state(dev);
 		if (dpm_async_fn(dev, async_suspend_late))
 			continue;
 
···
 		pm_runtime_disable(dev);
 		if (pm_runtime_status_suspended(dev)) {
 			pm_dev_dbg(dev, state, "direct-complete ");
+			dev->power.is_suspended = true;
 			goto Complete;
 		}
 
···
 
 		list_move(&dev->power.entry, &dpm_suspended_list);
 
+		dpm_clear_async_state(dev);
 		if (dpm_async_fn(dev, async_suspend))
 			continue;
 
···
 	return error;
 }
 
+static bool device_prepare_smart_suspend(struct device *dev)
+{
+	struct device_link *link;
+	bool ret = true;
+	int idx;
+
+	/*
+	 * The "smart suspend" feature is enabled for devices whose drivers ask
+	 * for it and for devices without PM callbacks.
+	 *
+	 * However, if "smart suspend" is not enabled for the device's parent
+	 * or any of its suppliers that take runtime PM into account, it cannot
+	 * be enabled for the device either.
+	 */
+	if (!dev->power.no_pm_callbacks &&
+	    !dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
+		return false;
+
+	if (dev->parent && !dev_pm_smart_suspend(dev->parent) &&
+	    !dev->parent->power.ignore_children && !pm_runtime_blocked(dev->parent))
+		return false;
+
+	idx = device_links_read_lock();
+
+	list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
+		if (!(link->flags & DL_FLAG_PM_RUNTIME))
+			continue;
+
+		if (!dev_pm_smart_suspend(link->supplier) &&
+		    !pm_runtime_blocked(link->supplier)) {
+			ret = false;
+			break;
+		}
+	}
+
+	device_links_read_unlock(idx);
+
+	return ret;
+}
+
 /**
  * device_prepare - Prepare a device for system power transition.
  * @dev: Device to handle.
···
 static int device_prepare(struct device *dev, pm_message_t state)
 {
 	int (*callback)(struct device *) = NULL;
+	bool smart_suspend;
 	int ret = 0;
 
 	/*
···
 	 * it again during the complete phase.
 	 */
 	pm_runtime_get_noresume(dev);
+	/*
+	 * If runtime PM is disabled for the device at this point and it has
+	 * never been enabled so far, it should not be enabled until this system
+	 * suspend-resume cycle is complete, so prepare to trigger a warning on
+	 * subsequent attempts to enable it.
+	 */
+	smart_suspend = !pm_runtime_block_if_disabled(dev);
 
 	if (dev->power.syscore)
 		return 0;
···
 		pm_runtime_put(dev);
 		return ret;
 	}
+	/* Do not enable "smart suspend" for devices with disabled runtime PM. */
+	if (smart_suspend)
+		smart_suspend = device_prepare_smart_suspend(dev);
+
+	spin_lock_irq(&dev->power.lock);
+
+	dev->power.smart_suspend = smart_suspend;
 	/*
 	 * A positive return value from ->prepare() means "this device appears
 	 * to be runtime-suspended and its state is fine, so if it really is
···
 	 * will do the same thing with all of its descendants".  This only
 	 * applies to suspend transitions, however.
 	 */
-	spin_lock_irq(&dev->power.lock);
 	dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
 		(ret > 0 || dev->power.no_pm_callbacks) &&
 		!dev_pm_test_driver_flags(dev, DPM_FLAG_NO_DIRECT_COMPLETE);
+
 	spin_unlock_irq(&dev->power.lock);
+
 	return 0;
 }
···
 
 bool dev_pm_skip_suspend(struct device *dev)
 {
-	return dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
-	       pm_runtime_status_suspended(dev);
+	return dev_pm_smart_suspend(dev) && pm_runtime_status_suspended(dev);
 }
+55 -32
drivers/base/power/runtime.c
··· 448 448 retval = __rpm_callback(cb, dev); 449 449 } 450 450 451 - dev->power.runtime_error = retval; 452 - return retval != -EACCES ? retval : -EIO; 451 + /* 452 + * Since -EACCES means that runtime PM is disabled for the given device, 453 + * it should not be returned by runtime PM callbacks. If it is returned 454 + * nevertheless, assume it to be a transient error and convert it to 455 + * -EAGAIN. 456 + */ 457 + if (retval == -EACCES) 458 + retval = -EAGAIN; 459 + 460 + if (retval != -EAGAIN && retval != -EBUSY) 461 + dev->power.runtime_error = retval; 462 + 463 + return retval; 453 464 } 454 465 455 466 /** ··· 736 725 dev->power.deferred_resume = false; 737 726 wake_up_all(&dev->power.wait_queue); 738 727 739 - if (retval == -EAGAIN || retval == -EBUSY) { 740 - dev->power.runtime_error = 0; 728 + /* 729 + * On transient errors, if the callback routine failed an autosuspend, 730 + * and if the last_busy time has been updated so that there is a new 731 + * autosuspend expiration time, automatically reschedule another 732 + * autosuspend. 733 + */ 734 + if (!dev->power.runtime_error && (rpmflags & RPM_AUTO) && 735 + pm_runtime_autosuspend_expiration(dev) != 0) 736 + goto repeat; 741 737 742 - /* 743 - * If the callback routine failed an autosuspend, and 744 - * if the last_busy time has been updated so that there 745 - * is a new autosuspend expiration time, automatically 746 - * reschedule another autosuspend. 747 - */ 748 - if ((rpmflags & RPM_AUTO) && 749 - pm_runtime_autosuspend_expiration(dev) != 0) 750 - goto repeat; 751 - } else { 752 - pm_runtime_cancel_pending(dev); 753 - } 738 + pm_runtime_cancel_pending(dev); 739 + 754 740 goto out; 755 741 } 756 742 ··· 1468 1460 } 1469 1461 EXPORT_SYMBOL_GPL(pm_runtime_barrier); 1470 1462 1471 - /** 1472 - * __pm_runtime_disable - Disable runtime PM of a device. 1473 - * @dev: Device to handle. 1474 - * @check_resume: If set, check if there's a resume request for the device. 
1475 - * 1476 - * Increment power.disable_depth for the device and if it was zero previously, 1477 - * cancel all pending runtime PM requests for the device and wait for all 1478 - * operations in progress to complete. The device can be either active or 1479 - * suspended after its runtime PM has been disabled. 1480 - * 1481 - * If @check_resume is set and there's a resume request pending when 1482 - * __pm_runtime_disable() is called and power.disable_depth is zero, the 1483 - * function will wake up the device before disabling its runtime PM. 1484 - */ 1463 + bool pm_runtime_block_if_disabled(struct device *dev) 1464 + { 1465 + bool ret; 1466 + 1467 + spin_lock_irq(&dev->power.lock); 1468 + 1469 + ret = !pm_runtime_enabled(dev); 1470 + if (ret && dev->power.last_status == RPM_INVALID) 1471 + dev->power.last_status = RPM_BLOCKED; 1472 + 1473 + spin_unlock_irq(&dev->power.lock); 1474 + 1475 + return ret; 1476 + } 1477 + 1478 + void pm_runtime_unblock(struct device *dev) 1479 + { 1480 + spin_lock_irq(&dev->power.lock); 1481 + 1482 + if (dev->power.last_status == RPM_BLOCKED) 1483 + dev->power.last_status = RPM_INVALID; 1484 + 1485 + spin_unlock_irq(&dev->power.lock); 1486 + } 1487 + 1485 1488 void __pm_runtime_disable(struct device *dev, bool check_resume) 1486 1489 { 1487 1490 spin_lock_irq(&dev->power.lock); ··· 1551 1532 if (--dev->power.disable_depth > 0) 1552 1533 goto out; 1553 1534 1535 + if (dev->power.last_status == RPM_BLOCKED) { 1536 + dev_warn(dev, "Attempt to enable runtime PM when it is blocked\n"); 1537 + dump_stack(); 1538 + } 1554 1539 dev->power.last_status = RPM_INVALID; 1555 1540 dev->power.accounting_timestamp = ktime_get_mono_fast_ns(); 1556 1541 ··· 1897 1874 pm_request_idle(link->supplier); 1898 1875 } 1899 1876 1900 - static bool pm_runtime_need_not_resume(struct device *dev) 1877 + bool pm_runtime_need_not_resume(struct device *dev) 1901 1878 { 1902 1879 return atomic_read(&dev->power.usage_count) <= 1 && 1903 1880 
(atomic_read(&dev->power.child_count) == 0 || ··· 1982 1959 int (*callback)(struct device *); 1983 1960 int ret = 0; 1984 1961 1985 - if (!pm_runtime_status_suspended(dev) || !dev->power.needs_force_resume) 1962 + if (!dev->power.needs_force_resume) 1986 1963 goto out; 1987 1964 1988 1965 /*
+1 -1
drivers/cpufreq/Kconfig.arm
··· 254 254 255 255 config ARM_TEGRA194_CPUFREQ 256 256 tristate "Tegra194 CPUFreq support" 257 - depends on ARCH_TEGRA_194_SOC || (64BIT && COMPILE_TEST) 257 + depends on ARCH_TEGRA_194_SOC || ARCH_TEGRA_234_SOC || (64BIT && COMPILE_TEST) 258 258 depends on TEGRA_BPMP 259 259 default y 260 260 help
+3 -1
drivers/cpufreq/acpi-cpufreq.c
··· 909 909 if (perf->states[0].core_frequency * 1000 != freq_table[0].frequency) 910 910 pr_warn(FW_WARN "P-state 0 is not max freq\n"); 911 911 912 + if (acpi_cpufreq_driver.set_boost) 913 + policy->boost_supported = true; 914 + 912 915 return result; 913 916 914 917 err_unreg: ··· 952 949 } 953 950 954 951 static struct freq_attr *acpi_cpufreq_attr[] = { 955 - &cpufreq_freq_attr_scaling_available_freqs, 956 952 &freqdomain_cpus, 957 953 #ifdef CONFIG_X86_ACPI_CPUFREQ_CPB 958 954 &cpb,
+31 -26
drivers/cpufreq/amd-pstate-trace.h
··· 24 24 25 25 TRACE_EVENT(amd_pstate_perf, 26 26 27 - TP_PROTO(unsigned long min_perf, 28 - unsigned long target_perf, 29 - unsigned long capacity, 27 + TP_PROTO(u8 min_perf, 28 + u8 target_perf, 29 + u8 capacity, 30 30 u64 freq, 31 31 u64 mperf, 32 32 u64 aperf, ··· 47 47 ), 48 48 49 49 TP_STRUCT__entry( 50 - __field(unsigned long, min_perf) 51 - __field(unsigned long, target_perf) 52 - __field(unsigned long, capacity) 50 + __field(u8, min_perf) 51 + __field(u8, target_perf) 52 + __field(u8, capacity) 53 53 __field(unsigned long long, freq) 54 54 __field(unsigned long long, mperf) 55 55 __field(unsigned long long, aperf) ··· 70 70 __entry->fast_switch = fast_switch; 71 71 ), 72 72 73 - TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu freq=%llu mperf=%llu aperf=%llu tsc=%llu cpu_id=%u fast_switch=%s", 74 - (unsigned long)__entry->min_perf, 75 - (unsigned long)__entry->target_perf, 76 - (unsigned long)__entry->capacity, 73 + TP_printk("amd_min_perf=%hhu amd_des_perf=%hhu amd_max_perf=%hhu freq=%llu mperf=%llu aperf=%llu tsc=%llu cpu_id=%u fast_switch=%s", 74 + (u8)__entry->min_perf, 75 + (u8)__entry->target_perf, 76 + (u8)__entry->capacity, 77 77 (unsigned long long)__entry->freq, 78 78 (unsigned long long)__entry->mperf, 79 79 (unsigned long long)__entry->aperf, ··· 86 86 TRACE_EVENT(amd_pstate_epp_perf, 87 87 88 88 TP_PROTO(unsigned int cpu_id, 89 - unsigned int highest_perf, 90 - unsigned int epp, 91 - unsigned int min_perf, 92 - unsigned int max_perf, 93 - bool boost 89 + u8 highest_perf, 90 + u8 epp, 91 + u8 min_perf, 92 + u8 max_perf, 93 + bool boost, 94 + bool changed 94 95 ), 95 96 96 97 TP_ARGS(cpu_id, ··· 99 98 epp, 100 99 min_perf, 101 100 max_perf, 102 - boost), 101 + boost, 102 + changed), 103 103 104 104 TP_STRUCT__entry( 105 105 __field(unsigned int, cpu_id) 106 - __field(unsigned int, highest_perf) 107 - __field(unsigned int, epp) 108 - __field(unsigned int, min_perf) 109 - __field(unsigned int, max_perf) 106 + __field(u8, highest_perf) 107 + __field(u8, epp) 108 + __field(u8, min_perf) 109 + __field(u8, max_perf) 110 110 __field(bool, boost) 111 + __field(bool, changed) 111 112 ), 112 113 113 114 TP_fast_assign( ··· 119 116 __entry->min_perf = min_perf; 120 117 __entry->max_perf = max_perf; 121 118 __entry->boost = boost; 119 + __entry->changed = changed; 122 120 ), 123 121 124 - TP_printk("cpu%u: [%u<->%u]/%u, epp=%u, boost=%u", 122 + TP_printk("cpu%u: [%hhu<->%hhu]/%hhu, epp=%hhu, boost=%u, changed=%u", 125 123 (unsigned int)__entry->cpu_id, 126 - (unsigned int)__entry->min_perf, 127 - (unsigned int)__entry->max_perf, 128 - (unsigned int)__entry->highest_perf, 129 - (unsigned int)__entry->epp, 130 - (bool)__entry->boost 124 + (u8)__entry->min_perf, 125 + (u8)__entry->max_perf, 126 + (u8)__entry->highest_perf, 127 + (u8)__entry->epp, 128 + (bool)__entry->boost, 129 + (bool)__entry->changed 131 130 ) 132 131 ); 133 132
+91 -126
drivers/cpufreq/amd-pstate-ut.c
··· 22 22 23 23 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 24 24 25 + #include <linux/bitfield.h> 25 26 #include <linux/kernel.h> 26 27 #include <linux/module.h> 27 28 #include <linux/moduleparam.h> 28 29 #include <linux/fs.h> 30 + #include <linux/cleanup.h> 29 31 30 32 #include <acpi/cppc_acpi.h> 31 33 32 34 #include "amd-pstate.h" 33 35 34 - /* 35 - * Abbreviations: 36 - * amd_pstate_ut: used as a shortform for AMD P-State unit test. 37 - * It helps to keep variable names smaller, simpler 38 - */ 39 - enum amd_pstate_ut_result { 40 - AMD_PSTATE_UT_RESULT_PASS, 41 - AMD_PSTATE_UT_RESULT_FAIL, 42 - }; 43 36 44 37 struct amd_pstate_ut_struct { 45 38 const char *name; 46 - void (*func)(u32 index); 47 - enum amd_pstate_ut_result result; 39 + int (*func)(u32 index); 48 40 }; 49 41 50 42 /* 51 43 * Kernel module for testing the AMD P-State unit test 52 44 */ 53 - static void amd_pstate_ut_acpi_cpc_valid(u32 index); 54 - static void amd_pstate_ut_check_enabled(u32 index); 55 - static void amd_pstate_ut_check_perf(u32 index); 56 - static void amd_pstate_ut_check_freq(u32 index); 57 - static void amd_pstate_ut_check_driver(u32 index); 45 + static int amd_pstate_ut_acpi_cpc_valid(u32 index); 46 + static int amd_pstate_ut_check_enabled(u32 index); 47 + static int amd_pstate_ut_check_perf(u32 index); 48 + static int amd_pstate_ut_check_freq(u32 index); 49 + static int amd_pstate_ut_check_driver(u32 index); 58 50 59 51 static struct amd_pstate_ut_struct amd_pstate_ut_cases[] = { 60 52 {"amd_pstate_ut_acpi_cpc_valid", amd_pstate_ut_acpi_cpc_valid }, ··· 69 77 /* 70 78 * check the _CPC object is present in SBIOS. 
71 79 */ 72 - static void amd_pstate_ut_acpi_cpc_valid(u32 index) 80 + static int amd_pstate_ut_acpi_cpc_valid(u32 index) 73 81 { 74 - if (acpi_cpc_valid()) 75 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_PASS; 76 - else { 77 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_FAIL; 82 + if (!acpi_cpc_valid()) { 78 83 pr_err("%s the _CPC object is not present in SBIOS!\n", __func__); 84 + return -EINVAL; 79 85 } 80 - } 81 86 82 - static void amd_pstate_ut_pstate_enable(u32 index) 83 - { 84 - int ret = 0; 85 - u64 cppc_enable = 0; 86 - 87 - ret = rdmsrl_safe(MSR_AMD_CPPC_ENABLE, &cppc_enable); 88 - if (ret) { 89 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_FAIL; 90 - pr_err("%s rdmsrl_safe MSR_AMD_CPPC_ENABLE ret=%d error!\n", __func__, ret); 91 - return; 92 - } 93 - if (cppc_enable) 94 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_PASS; 95 - else { 96 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_FAIL; 97 - pr_err("%s amd pstate must be enabled!\n", __func__); 98 - } 87 + return 0; 99 88 } 100 89 101 90 /* 102 91 * check if amd pstate is enabled 103 92 */ 104 - static void amd_pstate_ut_check_enabled(u32 index) 93 + static int amd_pstate_ut_check_enabled(u32 index) 105 94 { 95 + u64 cppc_enable = 0; 96 + int ret; 97 + 106 98 if (get_shared_mem()) 107 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_PASS; 108 - else 109 - amd_pstate_ut_pstate_enable(index); 99 + return 0; 100 + 101 + ret = rdmsrl_safe(MSR_AMD_CPPC_ENABLE, &cppc_enable); 102 + if (ret) { 103 + pr_err("%s rdmsrl_safe MSR_AMD_CPPC_ENABLE ret=%d error!\n", __func__, ret); 104 + return ret; 105 + } 106 + 107 + if (!cppc_enable) { 108 + pr_err("%s amd pstate must be enabled!\n", __func__); 109 + return -EINVAL; 110 + } 111 + 112 + return 0; 110 113 } 111 114 112 115 /* 113 116 * check if performance values are reasonable. 
114 117 * highest_perf >= nominal_perf > lowest_nonlinear_perf > lowest_perf > 0 115 118 */ 116 - static void amd_pstate_ut_check_perf(u32 index) 119 + static int amd_pstate_ut_check_perf(u32 index) 117 120 { 118 121 int cpu = 0, ret = 0; 119 122 u32 highest_perf = 0, nominal_perf = 0, lowest_nonlinear_perf = 0, lowest_perf = 0; 120 123 u64 cap1 = 0; 121 124 struct cppc_perf_caps cppc_perf; 122 - struct cpufreq_policy *policy = NULL; 123 - struct amd_cpudata *cpudata = NULL; 125 + union perf_cached cur_perf; 124 126 125 - for_each_possible_cpu(cpu) { 127 + for_each_online_cpu(cpu) { 128 + struct cpufreq_policy *policy __free(put_cpufreq_policy) = NULL; 129 + struct amd_cpudata *cpudata; 130 + 126 131 policy = cpufreq_cpu_get(cpu); 127 132 if (!policy) 128 - break; 133 + continue; 129 134 cpudata = policy->driver_data; 130 135 131 136 if (get_shared_mem()) { 132 137 ret = cppc_get_perf_caps(cpu, &cppc_perf); 133 138 if (ret) { 134 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_FAIL; 135 139 pr_err("%s cppc_get_perf_caps ret=%d error!\n", __func__, ret); 136 - goto skip_test; 140 + return ret; 137 141 } 138 142 139 143 highest_perf = cppc_perf.highest_perf; ··· 139 151 } else { 140 152 ret = rdmsrl_safe_on_cpu(cpu, MSR_AMD_CPPC_CAP1, &cap1); 141 153 if (ret) { 142 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_FAIL; 143 154 pr_err("%s read CPPC_CAP1 ret=%d error!\n", __func__, ret); 144 - goto skip_test; 155 + return ret; 145 156 } 146 157 147 - highest_perf = AMD_CPPC_HIGHEST_PERF(cap1); 148 - nominal_perf = AMD_CPPC_NOMINAL_PERF(cap1); 149 - lowest_nonlinear_perf = AMD_CPPC_LOWNONLIN_PERF(cap1); 150 - lowest_perf = AMD_CPPC_LOWEST_PERF(cap1); 158 + highest_perf = FIELD_GET(AMD_CPPC_HIGHEST_PERF_MASK, cap1); 159 + nominal_perf = FIELD_GET(AMD_CPPC_NOMINAL_PERF_MASK, cap1); 160 + lowest_nonlinear_perf = FIELD_GET(AMD_CPPC_LOWNONLIN_PERF_MASK, cap1); 161 + lowest_perf = FIELD_GET(AMD_CPPC_LOWEST_PERF_MASK, cap1); 151 162 } 152 163 153 - if 
(highest_perf != READ_ONCE(cpudata->highest_perf) && !cpudata->hw_prefcore) { 164 + cur_perf = READ_ONCE(cpudata->perf); 165 + if (highest_perf != cur_perf.highest_perf && !cpudata->hw_prefcore) { 154 166 pr_err("%s cpu%d highest=%d %d highest perf doesn't match\n", 155 - __func__, cpu, highest_perf, cpudata->highest_perf); 156 - goto skip_test; 167 + __func__, cpu, highest_perf, cur_perf.highest_perf); 168 + return -EINVAL; 157 169 } 158 - if ((nominal_perf != READ_ONCE(cpudata->nominal_perf)) || 159 - (lowest_nonlinear_perf != READ_ONCE(cpudata->lowest_nonlinear_perf)) || 160 - (lowest_perf != READ_ONCE(cpudata->lowest_perf))) { 161 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_FAIL; 170 + if (nominal_perf != cur_perf.nominal_perf || 171 + (lowest_nonlinear_perf != cur_perf.lowest_nonlinear_perf) || 172 + (lowest_perf != cur_perf.lowest_perf)) { 162 173 pr_err("%s cpu%d nominal=%d %d lowest_nonlinear=%d %d lowest=%d %d, they should be equal!\n", 163 - __func__, cpu, nominal_perf, cpudata->nominal_perf, 164 - lowest_nonlinear_perf, cpudata->lowest_nonlinear_perf, 165 - lowest_perf, cpudata->lowest_perf); 166 - goto skip_test; 174 + __func__, cpu, nominal_perf, cur_perf.nominal_perf, 175 + lowest_nonlinear_perf, cur_perf.lowest_nonlinear_perf, 176 + lowest_perf, cur_perf.lowest_perf); 177 + return -EINVAL; 167 178 } 168 179 169 180 if (!((highest_perf >= nominal_perf) && 170 181 (nominal_perf > lowest_nonlinear_perf) && 171 - (lowest_nonlinear_perf > lowest_perf) && 182 + (lowest_nonlinear_perf >= lowest_perf) && 172 183 (lowest_perf > 0))) { 173 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_FAIL; 174 184 pr_err("%s cpu%d highest=%d >= nominal=%d > lowest_nonlinear=%d > lowest=%d > 0, the formula is incorrect!\n", 175 185 __func__, cpu, highest_perf, nominal_perf, 176 186 lowest_nonlinear_perf, lowest_perf); 177 - goto skip_test; 187 + return -EINVAL; 178 188 } 179 - cpufreq_cpu_put(policy); 180 189 } 181 190 182 - 
amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_PASS; 183 - return; 184 - skip_test: 185 - cpufreq_cpu_put(policy); 191 + return 0; 186 192 } 187 193 188 194 /* ··· 184 202 * max_freq >= nominal_freq > lowest_nonlinear_freq > min_freq > 0 185 203 * check max freq when set support boost mode. 186 204 */ 187 - static void amd_pstate_ut_check_freq(u32 index) 205 + static int amd_pstate_ut_check_freq(u32 index) 188 206 { 189 207 int cpu = 0; 190 - struct cpufreq_policy *policy = NULL; 191 - struct amd_cpudata *cpudata = NULL; 192 208 193 - for_each_possible_cpu(cpu) { 209 + for_each_online_cpu(cpu) { 210 + struct cpufreq_policy *policy __free(put_cpufreq_policy) = NULL; 211 + struct amd_cpudata *cpudata; 212 + 194 213 policy = cpufreq_cpu_get(cpu); 195 214 if (!policy) 196 - break; 215 + continue; 197 216 cpudata = policy->driver_data; 198 217 199 - if (!((cpudata->max_freq >= cpudata->nominal_freq) && 218 + if (!((policy->cpuinfo.max_freq >= cpudata->nominal_freq) && 200 219 (cpudata->nominal_freq > cpudata->lowest_nonlinear_freq) && 201 - (cpudata->lowest_nonlinear_freq > cpudata->min_freq) && 202 - (cpudata->min_freq > 0))) { 203 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_FAIL; 220 + (cpudata->lowest_nonlinear_freq >= policy->cpuinfo.min_freq) && 221 + (policy->cpuinfo.min_freq > 0))) { 204 222 pr_err("%s cpu%d max=%d >= nominal=%d > lowest_nonlinear=%d > min=%d > 0, the formula is incorrect!\n", 205 - __func__, cpu, cpudata->max_freq, cpudata->nominal_freq, 206 - cpudata->lowest_nonlinear_freq, cpudata->min_freq); 207 - goto skip_test; 223 + __func__, cpu, policy->cpuinfo.max_freq, cpudata->nominal_freq, 224 + cpudata->lowest_nonlinear_freq, policy->cpuinfo.min_freq); 225 + return -EINVAL; 208 226 } 209 227 210 228 if (cpudata->lowest_nonlinear_freq != policy->min) { 211 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_FAIL; 212 229 pr_err("%s cpu%d cpudata_lowest_nonlinear_freq=%d policy_min=%d, they should be equal!\n", 213 230 
__func__, cpu, cpudata->lowest_nonlinear_freq, policy->min); 214 - goto skip_test; 231 + return -EINVAL; 215 232 } 216 233 217 234 if (cpudata->boost_supported) { 218 - if ((policy->max == cpudata->max_freq) || 219 - (policy->max == cpudata->nominal_freq)) 220 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_PASS; 221 - else { 222 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_FAIL; 235 + if ((policy->max != policy->cpuinfo.max_freq) && 236 + (policy->max != cpudata->nominal_freq)) { 223 237 pr_err("%s cpu%d policy_max=%d should be equal cpu_max=%d or cpu_nominal=%d !\n", 224 - __func__, cpu, policy->max, cpudata->max_freq, 238 + __func__, cpu, policy->max, policy->cpuinfo.max_freq, 225 239 cpudata->nominal_freq); 226 - goto skip_test; 240 + return -EINVAL; 227 241 } 228 242 } else { 229 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_FAIL; 230 243 pr_err("%s cpu%d must support boost!\n", __func__, cpu); 231 - goto skip_test; 244 + return -EINVAL; 232 245 } 233 - cpufreq_cpu_put(policy); 234 246 } 235 247 236 - amd_pstate_ut_cases[index].result = AMD_PSTATE_UT_RESULT_PASS; 237 - return; 238 - skip_test: 239 - cpufreq_cpu_put(policy); 248 + return 0; 240 249 } 241 250 242 251 static int amd_pstate_set_mode(enum amd_pstate_mode mode) ··· 239 266 return amd_pstate_update_status(mode_str, strlen(mode_str)); 240 267 } 241 268 242 - static void amd_pstate_ut_check_driver(u32 index) 269 + static int amd_pstate_ut_check_driver(u32 index) 243 270 { 244 271 enum amd_pstate_mode mode1, mode2 = AMD_PSTATE_DISABLE; 245 - int ret; 246 272 247 273 for (mode1 = AMD_PSTATE_DISABLE; mode1 < AMD_PSTATE_MAX; mode1++) { 248 - ret = amd_pstate_set_mode(mode1); 274 + int ret = amd_pstate_set_mode(mode1); 249 275 if (ret) 250 - goto out; 276 + return ret; 251 277 for (mode2 = AMD_PSTATE_DISABLE; mode2 < AMD_PSTATE_MAX; mode2++) { 252 278 if (mode1 == mode2) 253 279 continue; 254 280 ret = amd_pstate_set_mode(mode2); 255 - if (ret) 256 - goto out; 281 + if 
(ret) { 282 + pr_err("%s: failed to update status for %s->%s\n", __func__, 283 + amd_pstate_get_mode_string(mode1), 284 + amd_pstate_get_mode_string(mode2)); 285 + return ret; 286 + } 257 287 } 258 288 } 259 - out: 260 - if (ret) 261 - pr_warn("%s: failed to update status for %s->%s: %d\n", __func__, 262 - amd_pstate_get_mode_string(mode1), 263 - amd_pstate_get_mode_string(mode2), ret); 264 289 265 - amd_pstate_ut_cases[index].result = ret ? 266 - AMD_PSTATE_UT_RESULT_FAIL : 267 - AMD_PSTATE_UT_RESULT_PASS; 290 + return 0; 268 291 } 269 292 270 293 static int __init amd_pstate_ut_init(void) ··· 268 299 u32 i = 0, arr_size = ARRAY_SIZE(amd_pstate_ut_cases); 269 300 270 301 for (i = 0; i < arr_size; i++) { 271 - amd_pstate_ut_cases[i].func(i); 272 - switch (amd_pstate_ut_cases[i].result) { 273 - case AMD_PSTATE_UT_RESULT_PASS: 302 + int ret = amd_pstate_ut_cases[i].func(i); 303 + 304 + if (ret) 305 + pr_err("%-4d %-20s\t fail: %d!\n", i+1, amd_pstate_ut_cases[i].name, ret); 306 + else 274 307 pr_info("%-4d %-20s\t success!\n", i+1, amd_pstate_ut_cases[i].name); 275 - break; 276 - case AMD_PSTATE_UT_RESULT_FAIL: 277 - default: 278 - pr_info("%-4d %-20s\t fail!\n", i+1, amd_pstate_ut_cases[i].name); 279 - break; 280 - } 281 308 } 282 309 283 310 return 0;
+291 -379
drivers/cpufreq/amd-pstate.c
··· 85 85 static struct cpufreq_driver amd_pstate_driver; 86 86 static struct cpufreq_driver amd_pstate_epp_driver; 87 87 static int cppc_state = AMD_PSTATE_UNDEFINED; 88 - static bool cppc_enabled; 89 88 static bool amd_pstate_prefcore = true; 90 89 static struct quirk_entry *quirks; 91 - 92 - #define AMD_CPPC_MAX_PERF_MASK GENMASK(7, 0) 93 - #define AMD_CPPC_MIN_PERF_MASK GENMASK(15, 8) 94 - #define AMD_CPPC_DES_PERF_MASK GENMASK(23, 16) 95 - #define AMD_CPPC_EPP_PERF_MASK GENMASK(31, 24) 96 90 97 91 /* 98 92 * AMD Energy Preference Performance (EPP) ··· 136 142 .lowest_freq = 550, 137 143 }; 138 144 145 + static inline u8 freq_to_perf(union perf_cached perf, u32 nominal_freq, unsigned int freq_val) 146 + { 147 + u32 perf_val = DIV_ROUND_UP_ULL((u64)freq_val * perf.nominal_perf, nominal_freq); 148 + 149 + return (u8)clamp(perf_val, perf.lowest_perf, perf.highest_perf); 150 + } 151 + 152 + static inline u32 perf_to_freq(union perf_cached perf, u32 nominal_freq, u8 perf_val) 153 + { 154 + return DIV_ROUND_UP_ULL((u64)nominal_freq * perf_val, 155 + perf.nominal_perf); 156 + } 157 + 139 158 static int __init dmi_matched_7k62_bios_bug(const struct dmi_system_id *dmi) 140 159 { 141 160 /** ··· 190 183 return -EINVAL; 191 184 } 192 185 193 - static DEFINE_MUTEX(amd_pstate_limits_lock); 194 186 static DEFINE_MUTEX(amd_pstate_driver_lock); 195 187 196 - static s16 msr_get_epp(struct amd_cpudata *cpudata) 188 + static u8 msr_get_epp(struct amd_cpudata *cpudata) 197 189 { 198 190 u64 value; 199 191 int ret; ··· 213 207 return static_call(amd_pstate_get_epp)(cpudata); 214 208 } 215 209 216 - static s16 shmem_get_epp(struct amd_cpudata *cpudata) 210 + static u8 shmem_get_epp(struct amd_cpudata *cpudata) 217 211 { 218 212 u64 epp; 219 213 int ret; ··· 224 218 return ret; 225 219 } 226 220 227 - return (s16)(epp & 0xff); 221 + return FIELD_GET(AMD_CPPC_EPP_PERF_MASK, epp); 228 222 } 229 223 230 - static int msr_update_perf(struct amd_cpudata *cpudata, u32 min_perf, 231 - u32 
des_perf, u32 max_perf, u32 epp, bool fast_switch) 224 + static int msr_update_perf(struct cpufreq_policy *policy, u8 min_perf, 225 + u8 des_perf, u8 max_perf, u8 epp, bool fast_switch) 232 226 { 227 + struct amd_cpudata *cpudata = policy->driver_data; 233 228 u64 value, prev; 234 229 235 230 value = prev = READ_ONCE(cpudata->cppc_req_cached); ··· 241 234 value |= FIELD_PREP(AMD_CPPC_DES_PERF_MASK, des_perf); 242 235 value |= FIELD_PREP(AMD_CPPC_MIN_PERF_MASK, min_perf); 243 236 value |= FIELD_PREP(AMD_CPPC_EPP_PERF_MASK, epp); 237 + 238 + if (trace_amd_pstate_epp_perf_enabled()) { 239 + union perf_cached perf = READ_ONCE(cpudata->perf); 240 + 241 + trace_amd_pstate_epp_perf(cpudata->cpu, 242 + perf.highest_perf, 243 + epp, 244 + min_perf, 245 + max_perf, 246 + policy->boost_enabled, 247 + value != prev); 248 + } 244 249 245 250 if (value == prev) 246 251 return 0; ··· 268 249 } 269 250 270 251 WRITE_ONCE(cpudata->cppc_req_cached, value); 271 - WRITE_ONCE(cpudata->epp_cached, epp); 272 252 273 253 return 0; 274 254 } 275 255 276 256 DEFINE_STATIC_CALL(amd_pstate_update_perf, msr_update_perf); 277 257 278 - static inline int amd_pstate_update_perf(struct amd_cpudata *cpudata, 279 - u32 min_perf, u32 des_perf, 280 - u32 max_perf, u32 epp, 258 + static inline int amd_pstate_update_perf(struct cpufreq_policy *policy, 259 + u8 min_perf, u8 des_perf, 260 + u8 max_perf, u8 epp, 281 261 bool fast_switch) 282 262 { 283 - return static_call(amd_pstate_update_perf)(cpudata, min_perf, des_perf, 263 + return static_call(amd_pstate_update_perf)(policy, min_perf, des_perf, 284 264 max_perf, epp, fast_switch); 285 265 } 286 266 287 - static int msr_set_epp(struct amd_cpudata *cpudata, u32 epp) 267 + static int msr_set_epp(struct cpufreq_policy *policy, u8 epp) 288 268 { 269 + struct amd_cpudata *cpudata = policy->driver_data; 289 270 u64 value, prev; 290 271 int ret; 291 272 292 273 value = prev = READ_ONCE(cpudata->cppc_req_cached); 293 274 value &= ~AMD_CPPC_EPP_PERF_MASK; 294 
275 value |= FIELD_PREP(AMD_CPPC_EPP_PERF_MASK, epp); 276 + 277 + if (trace_amd_pstate_epp_perf_enabled()) { 278 + union perf_cached perf = cpudata->perf; 279 + 280 + trace_amd_pstate_epp_perf(cpudata->cpu, perf.highest_perf, 281 + epp, 282 + FIELD_GET(AMD_CPPC_MIN_PERF_MASK, 283 + cpudata->cppc_req_cached), 284 + FIELD_GET(AMD_CPPC_MAX_PERF_MASK, 285 + cpudata->cppc_req_cached), 286 + policy->boost_enabled, 287 + value != prev); 288 + } 295 289 296 290 if (value == prev) 297 291 return 0; ··· 316 284 } 317 285 318 286 /* update both so that msr_update_perf() can effectively check */ 319 - WRITE_ONCE(cpudata->epp_cached, epp); 320 287 WRITE_ONCE(cpudata->cppc_req_cached, value); 321 288 322 289 return ret; ··· 323 292 324 293 DEFINE_STATIC_CALL(amd_pstate_set_epp, msr_set_epp); 325 294 326 - static inline int amd_pstate_set_epp(struct amd_cpudata *cpudata, u32 epp) 295 + static inline int amd_pstate_set_epp(struct cpufreq_policy *policy, u8 epp) 327 296 { 328 - return static_call(amd_pstate_set_epp)(cpudata, epp); 297 + return static_call(amd_pstate_set_epp)(policy, epp); 329 298 } 330 299 331 - static int shmem_set_epp(struct amd_cpudata *cpudata, u32 epp) 300 + static int shmem_set_epp(struct cpufreq_policy *policy, u8 epp) 332 301 { 333 - int ret; 302 + struct amd_cpudata *cpudata = policy->driver_data; 334 303 struct cppc_perf_ctrls perf_ctrls; 304 + u8 epp_cached; 305 + u64 value; 306 + int ret; 335 307 336 - if (epp == cpudata->epp_cached) 308 + 309 + epp_cached = FIELD_GET(AMD_CPPC_EPP_PERF_MASK, cpudata->cppc_req_cached); 310 + if (trace_amd_pstate_epp_perf_enabled()) { 311 + union perf_cached perf = cpudata->perf; 312 + 313 + trace_amd_pstate_epp_perf(cpudata->cpu, perf.highest_perf, 314 + epp, 315 + FIELD_GET(AMD_CPPC_MIN_PERF_MASK, 316 + cpudata->cppc_req_cached), 317 + FIELD_GET(AMD_CPPC_MAX_PERF_MASK, 318 + cpudata->cppc_req_cached), 319 + policy->boost_enabled, 320 + epp != epp_cached); 321 + } 322 + 323 + if (epp == epp_cached) 337 324 return 0; 338 
325 339 326 perf_ctrls.energy_perf = epp; ··· 360 311 pr_debug("failed to set energy perf value (%d)\n", ret); 361 312 return ret; 362 313 } 363 - WRITE_ONCE(cpudata->epp_cached, epp); 314 + 315 + value = READ_ONCE(cpudata->cppc_req_cached); 316 + value &= ~AMD_CPPC_EPP_PERF_MASK; 317 + value |= FIELD_PREP(AMD_CPPC_EPP_PERF_MASK, epp); 318 + WRITE_ONCE(cpudata->cppc_req_cached, value); 364 319 365 320 return ret; 366 321 } 367 322 368 - static int amd_pstate_set_energy_pref_index(struct cpufreq_policy *policy, 369 - int pref_index) 323 + static inline int msr_cppc_enable(struct cpufreq_policy *policy) 370 324 { 371 - struct amd_cpudata *cpudata = policy->driver_data; 372 - int epp; 373 - 374 - if (!pref_index) 375 - epp = cpudata->epp_default; 376 - else 377 - epp = epp_values[pref_index]; 378 - 379 - if (epp > 0 && cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) { 380 - pr_debug("EPP cannot be set under performance policy\n"); 381 - return -EBUSY; 382 - } 383 - 384 - if (trace_amd_pstate_epp_perf_enabled()) { 385 - trace_amd_pstate_epp_perf(cpudata->cpu, cpudata->highest_perf, 386 - epp, 387 - FIELD_GET(AMD_CPPC_MIN_PERF_MASK, cpudata->cppc_req_cached), 388 - FIELD_GET(AMD_CPPC_MAX_PERF_MASK, cpudata->cppc_req_cached), 389 - policy->boost_enabled); 390 - } 391 - 392 - return amd_pstate_set_epp(cpudata, epp); 325 + return wrmsrl_safe_on_cpu(policy->cpu, MSR_AMD_CPPC_ENABLE, 1); 393 326 } 394 327 395 - static inline int msr_cppc_enable(bool enable) 328 + static int shmem_cppc_enable(struct cpufreq_policy *policy) 396 329 { 397 - int ret, cpu; 398 - unsigned long logical_proc_id_mask = 0; 399 - 400 - /* 401 - * MSR_AMD_CPPC_ENABLE is write-once, once set it cannot be cleared. 
- */
-	if (!enable)
-		return 0;
-
-	if (enable == cppc_enabled)
-		return 0;
-
-	for_each_present_cpu(cpu) {
-		unsigned long logical_id = topology_logical_package_id(cpu);
-
-		if (test_bit(logical_id, &logical_proc_id_mask))
-			continue;
-
-		set_bit(logical_id, &logical_proc_id_mask);
-
-		ret = wrmsrl_safe_on_cpu(cpu, MSR_AMD_CPPC_ENABLE,
-					 enable);
-		if (ret)
-			return ret;
-	}
-
-	cppc_enabled = enable;
-	return 0;
-}
-
-static int shmem_cppc_enable(bool enable)
-{
-	int cpu, ret = 0;
-	struct cppc_perf_ctrls perf_ctrls;
-
-	if (enable == cppc_enabled)
-		return 0;
-
-	for_each_present_cpu(cpu) {
-		ret = cppc_set_enable(cpu, enable);
-		if (ret)
-			return ret;
-
-		/* Enable autonomous mode for EPP */
-		if (cppc_state == AMD_PSTATE_ACTIVE) {
-			/* Set desired perf as zero to allow EPP firmware control */
-			perf_ctrls.desired_perf = 0;
-			ret = cppc_set_perf(cpu, &perf_ctrls);
-			if (ret)
-				return ret;
-		}
-	}
-
-	cppc_enabled = enable;
-	return ret;
+	return cppc_set_enable(policy->cpu, 1);
 }
 
 DEFINE_STATIC_CALL(amd_pstate_cppc_enable, msr_cppc_enable);
 
-static inline int amd_pstate_cppc_enable(bool enable)
+static inline int amd_pstate_cppc_enable(struct cpufreq_policy *policy)
 {
-	return static_call(amd_pstate_cppc_enable)(enable);
+	return static_call(amd_pstate_cppc_enable)(policy);
 }
 
 static int msr_init_perf(struct amd_cpudata *cpudata)
 {
+	union perf_cached perf = READ_ONCE(cpudata->perf);
 	u64 cap1, numerator;
 
 	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
···
 	if (ret)
 		return ret;
 
-	WRITE_ONCE(cpudata->highest_perf, numerator);
-	WRITE_ONCE(cpudata->max_limit_perf, numerator);
-	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
-	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
-	WRITE_ONCE(cpudata->lowest_perf, AMD_CPPC_LOWEST_PERF(cap1));
-	WRITE_ONCE(cpudata->prefcore_ranking, AMD_CPPC_HIGHEST_PERF(cap1));
-	WRITE_ONCE(cpudata->min_limit_perf, AMD_CPPC_LOWEST_PERF(cap1));
+	perf.highest_perf = numerator;
+	perf.max_limit_perf = numerator;
+	perf.min_limit_perf = FIELD_GET(AMD_CPPC_LOWEST_PERF_MASK, cap1);
+	perf.nominal_perf = FIELD_GET(AMD_CPPC_NOMINAL_PERF_MASK, cap1);
+	perf.lowest_nonlinear_perf = FIELD_GET(AMD_CPPC_LOWNONLIN_PERF_MASK, cap1);
+	perf.lowest_perf = FIELD_GET(AMD_CPPC_LOWEST_PERF_MASK, cap1);
+	WRITE_ONCE(cpudata->perf, perf);
+	WRITE_ONCE(cpudata->prefcore_ranking, FIELD_GET(AMD_CPPC_HIGHEST_PERF_MASK, cap1));
+
 	return 0;
 }
 
 static int shmem_init_perf(struct amd_cpudata *cpudata)
 {
 	struct cppc_perf_caps cppc_perf;
+	union perf_cached perf = READ_ONCE(cpudata->perf);
 	u64 numerator;
 
 	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
···
 	if (ret)
 		return ret;
 
-	WRITE_ONCE(cpudata->highest_perf, numerator);
-	WRITE_ONCE(cpudata->max_limit_perf, numerator);
-	WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
-	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
-		   cppc_perf.lowest_nonlinear_perf);
-	WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
+	perf.highest_perf = numerator;
+	perf.max_limit_perf = numerator;
+	perf.min_limit_perf = cppc_perf.lowest_perf;
+	perf.nominal_perf = cppc_perf.nominal_perf;
+	perf.lowest_nonlinear_perf = cppc_perf.lowest_nonlinear_perf;
+	perf.lowest_perf = cppc_perf.lowest_perf;
+	WRITE_ONCE(cpudata->perf, perf);
 	WRITE_ONCE(cpudata->prefcore_ranking, cppc_perf.highest_perf);
-	WRITE_ONCE(cpudata->min_limit_perf, cppc_perf.lowest_perf);
 
 	if (cppc_state == AMD_PSTATE_ACTIVE)
 		return 0;
···
 	return static_call(amd_pstate_init_perf)(cpudata);
 }
 
-static int shmem_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
-			     u32 des_perf, u32 max_perf, u32 epp, bool fast_switch)
+static int shmem_update_perf(struct cpufreq_policy *policy, u8 min_perf,
+			     u8 des_perf, u8 max_perf, u8 epp, bool fast_switch)
 {
+	struct amd_cpudata *cpudata = policy->driver_data;
 	struct cppc_perf_ctrls perf_ctrls;
+	u64 value, prev;
+	int ret;
 
 	if (cppc_state == AMD_PSTATE_ACTIVE) {
-		int ret = shmem_set_epp(cpudata, epp);
+		int ret = shmem_set_epp(policy, epp);
 
 		if (ret)
 			return ret;
 	}
 
+	value = prev = READ_ONCE(cpudata->cppc_req_cached);
+
+	value &= ~(AMD_CPPC_MAX_PERF_MASK | AMD_CPPC_MIN_PERF_MASK |
+		   AMD_CPPC_DES_PERF_MASK | AMD_CPPC_EPP_PERF_MASK);
+	value |= FIELD_PREP(AMD_CPPC_MAX_PERF_MASK, max_perf);
+	value |= FIELD_PREP(AMD_CPPC_DES_PERF_MASK, des_perf);
+	value |= FIELD_PREP(AMD_CPPC_MIN_PERF_MASK, min_perf);
+	value |= FIELD_PREP(AMD_CPPC_EPP_PERF_MASK, epp);
+
+	if (trace_amd_pstate_epp_perf_enabled()) {
+		union perf_cached perf = READ_ONCE(cpudata->perf);
+
+		trace_amd_pstate_epp_perf(cpudata->cpu,
+					  perf.highest_perf,
+					  epp,
+					  min_perf,
+					  max_perf,
+					  policy->boost_enabled,
+					  value != prev);
+	}
+
+	if (value == prev)
+		return 0;
+
 	perf_ctrls.max_perf = max_perf;
 	perf_ctrls.min_perf = min_perf;
 	perf_ctrls.desired_perf = des_perf;
 
-	return cppc_set_perf(cpudata->cpu, &perf_ctrls);
+	ret = cppc_set_perf(cpudata->cpu, &perf_ctrls);
+	if (ret)
+		return ret;
+
+	WRITE_ONCE(cpudata->cppc_req_cached, value);
+
+	return 0;
 }
 
 static inline bool amd_pstate_sample(struct amd_cpudata *cpudata)
···
 	return true;
 }
 
-static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
-			      u32 des_perf, u32 max_perf, bool fast_switch, int gov_flags)
+static void amd_pstate_update(struct amd_cpudata *cpudata, u8 min_perf,
+			      u8 des_perf, u8 max_perf, bool fast_switch, int gov_flags)
 {
-	unsigned long max_freq;
-	struct cpufreq_policy *policy = cpufreq_cpu_get(cpudata->cpu);
-	u32 nominal_perf = READ_ONCE(cpudata->nominal_perf);
+	struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpudata->cpu);
+	union perf_cached perf = READ_ONCE(cpudata->perf);
 
-	des_perf = clamp_t(unsigned long, des_perf, min_perf, max_perf);
+	if (!policy)
+		return;
 
-	max_freq = READ_ONCE(cpudata->max_limit_freq);
-	policy->cur = div_u64(des_perf * max_freq, max_perf);
+	des_perf = clamp_t(u8, des_perf, min_perf, max_perf);
+
+	policy->cur = perf_to_freq(perf, cpudata->nominal_freq, des_perf);
 
 	if ((cppc_state == AMD_PSTATE_GUIDED) && (gov_flags & CPUFREQ_GOV_DYNAMIC_SWITCHING)) {
 		min_perf = des_perf;
···
 
 	/* limit the max perf when core performance boost feature is disabled */
 	if (!cpudata->boost_supported)
-		max_perf = min_t(unsigned long, nominal_perf, max_perf);
+		max_perf = min_t(u8, perf.nominal_perf, max_perf);
 
 	if (trace_amd_pstate_perf_enabled() && amd_pstate_sample(cpudata)) {
 		trace_amd_pstate_perf(min_perf, des_perf, max_perf, cpudata->freq,
···
 			cpudata->cpu, fast_switch);
 	}
 
-	amd_pstate_update_perf(cpudata, min_perf, des_perf, max_perf, 0, fast_switch);
-
-	cpufreq_cpu_put(policy);
+	amd_pstate_update_perf(policy, min_perf, des_perf, max_perf, 0, fast_switch);
 }
 
 static int amd_pstate_verify(struct cpufreq_policy_data *policy_data)
···
 	 * amd-pstate qos_requests.
 	 */
 	if (policy_data->min == FREQ_QOS_MIN_DEFAULT_VALUE) {
-		struct cpufreq_policy *policy = cpufreq_cpu_get(policy_data->cpu);
+		struct cpufreq_policy *policy __free(put_cpufreq_policy) =
+			cpufreq_cpu_get(policy_data->cpu);
 		struct amd_cpudata *cpudata;
 
 		if (!policy)
···
 
 		cpudata = policy->driver_data;
 		policy_data->min = cpudata->lowest_nonlinear_freq;
-		cpufreq_cpu_put(policy);
 	}
 
 	cpufreq_verify_within_cpu_limits(policy_data);
-	pr_debug("policy_max =%d, policy_min=%d\n", policy_data->max, policy_data->min);
 
 	return 0;
 }
 
-static int amd_pstate_update_min_max_limit(struct cpufreq_policy *policy)
+static void amd_pstate_update_min_max_limit(struct cpufreq_policy *policy)
 {
-	u32 max_limit_perf, min_limit_perf, max_perf, max_freq;
 	struct amd_cpudata *cpudata = policy->driver_data;
+	union perf_cached perf = READ_ONCE(cpudata->perf);
 
-	max_perf = READ_ONCE(cpudata->highest_perf);
-	max_freq = READ_ONCE(cpudata->max_freq);
-	max_limit_perf = div_u64(policy->max * max_perf, max_freq);
-	min_limit_perf = div_u64(policy->min * max_perf, max_freq);
+	perf.max_limit_perf = freq_to_perf(perf, cpudata->nominal_freq, policy->max);
+	perf.min_limit_perf = freq_to_perf(perf, cpudata->nominal_freq, policy->min);
 
 	if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
-		min_limit_perf = min(cpudata->nominal_perf, max_limit_perf);
+		perf.min_limit_perf = min(perf.nominal_perf, perf.max_limit_perf);
 
-	WRITE_ONCE(cpudata->max_limit_perf, max_limit_perf);
-	WRITE_ONCE(cpudata->min_limit_perf, min_limit_perf);
 	WRITE_ONCE(cpudata->max_limit_freq, policy->max);
 	WRITE_ONCE(cpudata->min_limit_freq, policy->min);
-
-	return 0;
+	WRITE_ONCE(cpudata->perf, perf);
 }
 
 static int amd_pstate_update_freq(struct cpufreq_policy *policy,
 				  unsigned int target_freq, bool fast_switch)
 {
 	struct cpufreq_freqs freqs;
-	struct amd_cpudata *cpudata = policy->driver_data;
-	unsigned long max_perf, min_perf, des_perf, cap_perf;
+	struct amd_cpudata *cpudata;
+	union perf_cached perf;
+	u8 des_perf;
 
-	if (!cpudata->max_freq)
-		return -ENODEV;
+	cpudata = policy->driver_data;
 
 	if (policy->min != cpudata->min_limit_freq || policy->max != cpudata->max_limit_freq)
 		amd_pstate_update_min_max_limit(policy);
 
-	cap_perf = READ_ONCE(cpudata->highest_perf);
-	min_perf = READ_ONCE(cpudata->lowest_perf);
-	max_perf = cap_perf;
+	perf = READ_ONCE(cpudata->perf);
 
 	freqs.old = policy->cur;
 	freqs.new = target_freq;
 
-	des_perf = DIV_ROUND_CLOSEST(target_freq * cap_perf,
-				     cpudata->max_freq);
+	des_perf = freq_to_perf(perf, cpudata->nominal_freq, target_freq);
 
 	WARN_ON(fast_switch && !policy->fast_switch_enabled);
 	/*
···
 	if (!fast_switch)
 		cpufreq_freq_transition_begin(policy, &freqs);
 
-	amd_pstate_update(cpudata, min_perf, des_perf,
-			  max_perf, fast_switch, policy->governor->flags);
+	amd_pstate_update(cpudata, perf.min_limit_perf, des_perf,
+			  perf.max_limit_perf, fast_switch,
+			  policy->governor->flags);
 
 	if (!fast_switch)
 		cpufreq_freq_transition_end(policy, &freqs, false);
···
 		unsigned long target_perf,
 		unsigned long capacity)
 {
-	unsigned long max_perf, min_perf, des_perf,
-		      cap_perf, lowest_nonlinear_perf;
-	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+	u8 max_perf, min_perf, des_perf, cap_perf;
+	struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu);
 	struct amd_cpudata *cpudata;
+	union perf_cached perf;
 
 	if (!policy)
 		return;
···
 	if (policy->min != cpudata->min_limit_freq || policy->max != cpudata->max_limit_freq)
 		amd_pstate_update_min_max_limit(policy);
 
-
-	cap_perf = READ_ONCE(cpudata->highest_perf);
-	lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
+	perf = READ_ONCE(cpudata->perf);
+	cap_perf = perf.highest_perf;
 
 	des_perf = cap_perf;
 	if (target_perf < capacity)
 		des_perf = DIV_ROUND_UP(cap_perf * target_perf, capacity);
 
-	min_perf = READ_ONCE(cpudata->lowest_perf);
 	if (_min_perf < capacity)
 		min_perf = DIV_ROUND_UP(cap_perf * _min_perf, capacity);
+	else
+		min_perf = cap_perf;
 
-	if (min_perf < lowest_nonlinear_perf)
-		min_perf = lowest_nonlinear_perf;
+	if (min_perf < perf.min_limit_perf)
+		min_perf = perf.min_limit_perf;
 
-	max_perf = cpudata->max_limit_perf;
+	max_perf = perf.max_limit_perf;
 	if (max_perf < min_perf)
 		max_perf = min_perf;
 
-	des_perf = clamp_t(unsigned long, des_perf, min_perf, max_perf);
-
 	amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true,
 			  policy->governor->flags);
-	cpufreq_cpu_put(policy);
 }
 
 static int amd_pstate_cpu_boost_update(struct cpufreq_policy *policy, bool on)
 {
 	struct amd_cpudata *cpudata = policy->driver_data;
+	union perf_cached perf = READ_ONCE(cpudata->perf);
 	u32 nominal_freq, max_freq;
 	int ret = 0;
 
 	nominal_freq = READ_ONCE(cpudata->nominal_freq);
-	max_freq = READ_ONCE(cpudata->max_freq);
+	max_freq = perf_to_freq(perf, cpudata->nominal_freq, perf.highest_perf);
 
 	if (on)
 		policy->cpuinfo.max_freq = max_freq;
···
 		pr_err("Boost mode is not supported by this processor or SBIOS\n");
 		return -EOPNOTSUPP;
 	}
-	guard(mutex)(&amd_pstate_driver_lock);
 
 	ret = amd_pstate_cpu_boost_update(policy, state);
 	refresh_frequency_limits(policy);
···
 
 static void amd_pstate_update_limits(unsigned int cpu)
 {
-	struct cpufreq_policy *policy = NULL;
+	struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu);
 	struct amd_cpudata *cpudata;
 	u32 prev_high = 0, cur_high = 0;
-	int ret;
 	bool highest_perf_changed = false;
 
 	if (!amd_pstate_prefcore)
 		return;
 
-	policy = cpufreq_cpu_get(cpu);
 	if (!policy)
 		return;
 
-	cpudata = policy->driver_data;
-
-	guard(mutex)(&amd_pstate_driver_lock);
-
-	ret = amd_get_highest_perf(cpu, &cur_high);
-	if (ret) {
-		cpufreq_cpu_put(policy);
+	if (amd_get_highest_perf(cpu, &cur_high))
 		return;
-	}
+
+	cpudata = policy->driver_data;
 
 	prev_high = READ_ONCE(cpudata->prefcore_ranking);
 	highest_perf_changed = (prev_high != cur_high);
···
 		if (cur_high < CPPC_MAX_PERF)
 			sched_set_itmt_core_prio((int)cur_high, cpu);
 	}
-	cpufreq_cpu_put(policy);
-
-	if (!highest_perf_changed)
-		cpufreq_update_policy(cpu);
-
 }
 
 /*
···
 }
 
 /*
- * amd_pstate_init_freq: Initialize the max_freq, min_freq,
- *                       nominal_freq and lowest_nonlinear_freq for
- *                       the @cpudata object.
+ * amd_pstate_init_freq: Initialize the nominal_freq and lowest_nonlinear_freq
+ *                       for the @cpudata object.
  *
- * Requires: highest_perf, lowest_perf, nominal_perf and
- *           lowest_nonlinear_perf members of @cpudata to be
- *           initialized.
+ * Requires: all perf members of @cpudata to be initialized.
  *
  * Returns 0 on success, non-zero value on failure.
  */
 static int amd_pstate_init_freq(struct amd_cpudata *cpudata)
 {
-	int ret;
-	u32 min_freq, max_freq;
-	u32 highest_perf, nominal_perf, nominal_freq;
-	u32 lowest_nonlinear_perf, lowest_nonlinear_freq;
+	u32 min_freq, max_freq, nominal_freq, lowest_nonlinear_freq;
 	struct cppc_perf_caps cppc_perf;
+	union perf_cached perf;
+	int ret;
 
 	ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
 	if (ret)
 		return ret;
-
-	if (quirks && quirks->lowest_freq)
-		min_freq = quirks->lowest_freq;
-	else
-		min_freq = cppc_perf.lowest_freq;
+	perf = READ_ONCE(cpudata->perf);
 
 	if (quirks && quirks->nominal_freq)
 		nominal_freq = quirks->nominal_freq;
 	else
 		nominal_freq = cppc_perf.nominal_freq;
+	nominal_freq *= 1000;
 
-	highest_perf = READ_ONCE(cpudata->highest_perf);
-	nominal_perf = READ_ONCE(cpudata->nominal_perf);
-	max_freq = div_u64((u64)highest_perf * nominal_freq, nominal_perf);
+	if (quirks && quirks->lowest_freq) {
+		min_freq = quirks->lowest_freq;
+		perf.lowest_perf = freq_to_perf(perf, nominal_freq, min_freq);
+		WRITE_ONCE(cpudata->perf, perf);
+	} else
+		min_freq = cppc_perf.lowest_freq;
 
-	lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf);
-	lowest_nonlinear_freq = div_u64((u64)nominal_freq * lowest_nonlinear_perf, nominal_perf);
-	WRITE_ONCE(cpudata->min_freq, min_freq * 1000);
-	WRITE_ONCE(cpudata->lowest_nonlinear_freq, lowest_nonlinear_freq * 1000);
-	WRITE_ONCE(cpudata->nominal_freq, nominal_freq * 1000);
-	WRITE_ONCE(cpudata->max_freq, max_freq * 1000);
+	min_freq *= 1000;
+
+	WRITE_ONCE(cpudata->nominal_freq, nominal_freq);
+
+	max_freq = perf_to_freq(perf, nominal_freq, perf.highest_perf);
+	lowest_nonlinear_freq = perf_to_freq(perf, nominal_freq, perf.lowest_nonlinear_perf);
+	WRITE_ONCE(cpudata->lowest_nonlinear_freq, lowest_nonlinear_freq);
 
 	/**
 	 * Below values need to be initialized correctly, otherwise driver will fail to load
···
 
 static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 {
-	int min_freq, max_freq, ret;
-	struct device *dev;
 	struct amd_cpudata *cpudata;
+	union perf_cached perf;
+	struct device *dev;
+	int ret;
 
 	/*
 	 * Resetting PERF_CTL_MSR will put the CPU in P0 frequency,
···
 	if (ret)
 		goto free_cpudata1;
 
-	min_freq = READ_ONCE(cpudata->min_freq);
-	max_freq = READ_ONCE(cpudata->max_freq);
-
 	policy->cpuinfo.transition_latency = amd_pstate_get_transition_latency(policy->cpu);
 	policy->transition_delay_us = amd_pstate_get_transition_delay_us(policy->cpu);
 
-	policy->min = min_freq;
-	policy->max = max_freq;
+	perf = READ_ONCE(cpudata->perf);
 
-	policy->cpuinfo.min_freq = min_freq;
-	policy->cpuinfo.max_freq = max_freq;
+	policy->cpuinfo.min_freq = policy->min = perf_to_freq(perf,
+							      cpudata->nominal_freq,
+							      perf.lowest_perf);
+	policy->cpuinfo.max_freq = policy->max = perf_to_freq(perf,
+							      cpudata->nominal_freq,
+							      perf.highest_perf);
 
-	policy->boost_enabled = READ_ONCE(cpudata->boost_supported);
+	ret = amd_pstate_cppc_enable(policy);
+	if (ret)
+		goto free_cpudata1;
+
+	policy->boost_supported = READ_ONCE(cpudata->boost_supported);
 
 	/* It will be updated by governor */
 	policy->cur = policy->cpuinfo.min_freq;
···
 		goto free_cpudata2;
 	}
 
-	cpudata->max_limit_freq = max_freq;
-	cpudata->min_limit_freq = min_freq;
-
 	policy->driver_data = cpudata;
 
 	if (!current_pstate_driver->adjust_perf)
···
 free_cpudata2:
 	freq_qos_remove_request(&cpudata->req[0]);
 free_cpudata1:
+	pr_warn("Failed to initialize CPU %d: %d\n", policy->cpu, ret);
 	kfree(cpudata);
 	return ret;
 }
···
 	kfree(cpudata);
 }
 
-static int amd_pstate_cpu_resume(struct cpufreq_policy *policy)
-{
-	int ret;
-
-	ret = amd_pstate_cppc_enable(true);
-	if (ret)
-		pr_err("failed to enable amd-pstate during resume, return %d\n", ret);
-
-	return ret;
-}
-
-static int amd_pstate_cpu_suspend(struct cpufreq_policy *policy)
-{
-	int ret;
-
-	ret = amd_pstate_cppc_enable(false);
-	if (ret)
-		pr_err("failed to disable amd-pstate during suspend, return %d\n", ret);
-
-	return ret;
-}
-
 /* Sysfs attributes */
 
 /*
···
 static ssize_t show_amd_pstate_max_freq(struct cpufreq_policy *policy,
 					char *buf)
 {
-	int max_freq;
-	struct amd_cpudata *cpudata = policy->driver_data;
+	struct amd_cpudata *cpudata;
+	union perf_cached perf;
 
-	max_freq = READ_ONCE(cpudata->max_freq);
-	if (max_freq < 0)
-		return max_freq;
+	cpudata = policy->driver_data;
+	perf = READ_ONCE(cpudata->perf);
 
-	return sysfs_emit(buf, "%u\n", max_freq);
+	return sysfs_emit(buf, "%u\n",
+			  perf_to_freq(perf, cpudata->nominal_freq, perf.highest_perf));
 }
 
 static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *policy,
 						     char *buf)
 {
-	int freq;
-	struct amd_cpudata *cpudata = policy->driver_data;
+	struct amd_cpudata *cpudata;
+	union perf_cached perf;
 
-	freq = READ_ONCE(cpudata->lowest_nonlinear_freq);
-	if (freq < 0)
-		return freq;
+	cpudata = policy->driver_data;
+	perf = READ_ONCE(cpudata->perf);
 
-	return sysfs_emit(buf, "%u\n", freq);
+	return sysfs_emit(buf, "%u\n",
+			  perf_to_freq(perf, cpudata->nominal_freq, perf.lowest_nonlinear_perf));
 }
 
 /*
···
 static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
 					    char *buf)
 {
-	u32 perf;
-	struct amd_cpudata *cpudata = policy->driver_data;
+	struct amd_cpudata *cpudata;
 
-	perf = READ_ONCE(cpudata->highest_perf);
+	cpudata = policy->driver_data;
 
-	return sysfs_emit(buf, "%u\n", perf);
+	return sysfs_emit(buf, "%u\n", cpudata->perf.highest_perf);
 }
 
 static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy,
 						char *buf)
 {
-	u32 perf;
+	u8 perf;
 	struct amd_cpudata *cpudata = policy->driver_data;
 
 	perf = READ_ONCE(cpudata->prefcore_ranking);
···
 static ssize_t store_energy_performance_preference(
 		struct cpufreq_policy *policy, const char *buf, size_t count)
 {
+	struct amd_cpudata *cpudata = policy->driver_data;
 	char str_preference[21];
 	ssize_t ret;
+	u8 epp;
 
 	ret = sscanf(buf, "%20s", str_preference);
 	if (ret != 1)
···
 	if (ret < 0)
 		return -EINVAL;
 
-	guard(mutex)(&amd_pstate_limits_lock);
+	if (!ret)
+		epp = cpudata->epp_default;
+	else
+		epp = epp_values[ret];
 
-	ret = amd_pstate_set_energy_pref_index(policy, ret);
+	if (epp > 0 && policy->policy == CPUFREQ_POLICY_PERFORMANCE) {
+		pr_debug("EPP cannot be set under performance policy\n");
+		return -EBUSY;
+	}
+
+	ret = amd_pstate_set_epp(policy, epp);
 
 	return ret ? ret : count;
 }
···
 		struct cpufreq_policy *policy, char *buf)
 {
 	struct amd_cpudata *cpudata = policy->driver_data;
-	int preference;
+	u8 preference, epp;
 
-	switch (cpudata->epp_cached) {
+	epp = FIELD_GET(AMD_CPPC_EPP_PERF_MASK, cpudata->cppc_req_cached);
+
+	switch (epp) {
 	case AMD_CPPC_EPP_PERFORMANCE:
 		preference = EPP_INDEX_PERFORMANCE;
 		break;
···
 
 static void amd_pstate_driver_cleanup(void)
 {
-	amd_pstate_cppc_enable(false);
 	cppc_state = AMD_PSTATE_DISABLE;
 	current_pstate_driver = NULL;
 }
···
 		return ret;
 
 	cppc_state = mode;
-
-	ret = amd_pstate_cppc_enable(true);
-	if (ret) {
-		pr_err("failed to enable cppc during amd-pstate driver registration, return %d\n",
-		       ret);
-		amd_pstate_driver_cleanup();
-		return ret;
-	}
 
 	/* at least one CPU supports CPB */
 	current_pstate_driver->boost_enabled = cpu_feature_enabled(X86_FEATURE_CPB);
···
 	if (mode_idx < 0 || mode_idx >= AMD_PSTATE_MAX)
 		return -EINVAL;
 
-	if (mode_state_machine[cppc_state][mode_idx])
+	if (mode_state_machine[cppc_state][mode_idx]) {
+		guard(mutex)(&amd_pstate_driver_lock);
 		return mode_state_machine[cppc_state][mode_idx](mode_idx);
+	}
 
 	return 0;
 }
···
 	char *p = memchr(buf, '\n', count);
 	int ret;
 
-	guard(mutex)(&amd_pstate_driver_lock);
 	ret = amd_pstate_update_status(buf, p ? p - buf : count);
 
 	return ret < 0 ? ret : count;
···
 
 static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy)
 {
-	int min_freq, max_freq, ret;
 	struct amd_cpudata *cpudata;
+	union perf_cached perf;
 	struct device *dev;
 	u64 value;
+	int ret;
 
 	/*
 	 * Resetting PERF_CTL_MSR will put the CPU in P0 frequency,
···
 	if (ret)
 		goto free_cpudata1;
 
-	min_freq = READ_ONCE(cpudata->min_freq);
-	max_freq = READ_ONCE(cpudata->max_freq);
+	perf = READ_ONCE(cpudata->perf);
 
-	policy->cpuinfo.min_freq = min_freq;
-	policy->cpuinfo.max_freq = max_freq;
+	policy->cpuinfo.min_freq = policy->min = perf_to_freq(perf,
+							      cpudata->nominal_freq,
+							      perf.lowest_perf);
+	policy->cpuinfo.max_freq = policy->max = perf_to_freq(perf,
+							      cpudata->nominal_freq,
+							      perf.highest_perf);
+	policy->driver_data = cpudata;
+
+	ret = amd_pstate_cppc_enable(policy);
+	if (ret)
+		goto free_cpudata1;
+
 	/* It will be updated by governor */
 	policy->cur = policy->cpuinfo.min_freq;
 
-	policy->driver_data = cpudata;
 
-	policy->min = policy->cpuinfo.min_freq;
-	policy->max = policy->cpuinfo.max_freq;
-
-	policy->boost_enabled = READ_ONCE(cpudata->boost_supported);
+	policy->boost_supported = READ_ONCE(cpudata->boost_supported);
 
 	/*
 	 * Set the policy to provide a valid fallback value in case
···
 	if (ret)
 		return ret;
 	WRITE_ONCE(cpudata->cppc_req_cached, value);
-
-	ret = rdmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1, &value);
-	if (ret)
-		return ret;
-	WRITE_ONCE(cpudata->cppc_cap1_cached, value);
 	}
-	ret = amd_pstate_set_epp(cpudata, cpudata->epp_default);
+	ret = amd_pstate_set_epp(policy, cpudata->epp_default);
 	if (ret)
 		return ret;
 
···
 	return 0;
 
 free_cpudata1:
+	pr_warn("Failed to initialize CPU %d: %d\n", policy->cpu, ret);
 	kfree(cpudata);
 	return ret;
 }
···
 
 static int amd_pstate_epp_update_limit(struct cpufreq_policy *policy)
 {
 	struct amd_cpudata *cpudata = policy->driver_data;
-	u32 epp;
+	union perf_cached perf;
+	u8 epp;
 
-	amd_pstate_update_min_max_limit(policy);
+	if (policy->min != cpudata->min_limit_freq || policy->max != cpudata->max_limit_freq)
+		amd_pstate_update_min_max_limit(policy);
 
 	if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
 		epp = 0;
 	else
-		epp = READ_ONCE(cpudata->epp_cached);
+		epp = FIELD_GET(AMD_CPPC_EPP_PERF_MASK, cpudata->cppc_req_cached);
 
-	if (trace_amd_pstate_epp_perf_enabled()) {
-		trace_amd_pstate_epp_perf(cpudata->cpu, cpudata->highest_perf, epp,
-					  cpudata->min_limit_perf,
-					  cpudata->max_limit_perf,
-					  policy->boost_enabled);
-	}
+	perf = READ_ONCE(cpudata->perf);
 
-	return amd_pstate_update_perf(cpudata, cpudata->min_limit_perf, 0U,
-				      cpudata->max_limit_perf, epp, false);
+	return amd_pstate_update_perf(policy, perf.min_limit_perf, 0U,
+				      perf.max_limit_perf, epp, false);
 }
 
 static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy)
···
 
 	if (!policy->cpuinfo.max_freq)
 		return -ENODEV;
-
-	pr_debug("set_policy: cpuinfo.max %u policy->max %u\n",
-		 policy->cpuinfo.max_freq, policy->max);
 
 	cpudata->policy = policy->policy;
···
 	return 0;
 }
 
-static int amd_pstate_epp_reenable(struct cpufreq_policy *policy)
-{
-	struct amd_cpudata *cpudata = policy->driver_data;
-	u64 max_perf;
-	int ret;
-
-	ret = amd_pstate_cppc_enable(true);
-	if (ret)
-		pr_err("failed to enable amd pstate during resume, return %d\n", ret);
-
-	max_perf = READ_ONCE(cpudata->highest_perf);
-
-	if (trace_amd_pstate_epp_perf_enabled()) {
-		trace_amd_pstate_epp_perf(cpudata->cpu, cpudata->highest_perf,
-					  cpudata->epp_cached,
-					  FIELD_GET(AMD_CPPC_MIN_PERF_MASK, cpudata->cppc_req_cached),
-					  max_perf, policy->boost_enabled);
-	}
-
-	return amd_pstate_update_perf(cpudata, 0, 0, max_perf, cpudata->epp_cached, false);
-}
-
 static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy)
 {
-	struct amd_cpudata *cpudata = policy->driver_data;
-	int ret;
+	pr_debug("AMD CPU Core %d going online\n", policy->cpu);
 
-	pr_debug("AMD CPU Core %d going online\n", cpudata->cpu);
-
-	ret = amd_pstate_epp_reenable(policy);
-	if (ret)
-		return ret;
-	cpudata->suspended = false;
-
-	return 0;
+	return amd_pstate_cppc_enable(policy);
 }
 
 static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
 {
-	struct amd_cpudata *cpudata = policy->driver_data;
-	int min_perf;
-
-	if (cpudata->suspended)
-		return 0;
-
-	min_perf = READ_ONCE(cpudata->lowest_perf);
-
-	guard(mutex)(&amd_pstate_limits_lock);
-
-	if (trace_amd_pstate_epp_perf_enabled()) {
-		trace_amd_pstate_epp_perf(cpudata->cpu, cpudata->highest_perf,
-					  AMD_CPPC_EPP_BALANCE_POWERSAVE,
-					  min_perf, min_perf, policy->boost_enabled);
-	}
-
-	return amd_pstate_update_perf(cpudata, min_perf, 0, min_perf,
-				      AMD_CPPC_EPP_BALANCE_POWERSAVE, false);
+	return 0;
 }
 
 static int amd_pstate_epp_suspend(struct cpufreq_policy *policy)
 {
 	struct amd_cpudata *cpudata = policy->driver_data;
-	int ret;
 
-	/* avoid suspending when EPP is not enabled */
-	if (cppc_state != AMD_PSTATE_ACTIVE)
-		return 0;
+	/* invalidate to ensure it's rewritten during resume */
+	cpudata->cppc_req_cached = 0;
 
 	/* set this flag to avoid setting core offline*/
 	cpudata->suspended = true;
-
-	/* disable CPPC in lowlevel firmware */
-	ret = amd_pstate_cppc_enable(false);
-	if (ret)
-		pr_err("failed to suspend, return %d\n", ret);
 
 	return 0;
 }
···
 	struct amd_cpudata *cpudata = policy->driver_data;
 
 	if (cpudata->suspended) {
-		guard(mutex)(&amd_pstate_limits_lock);
+		int ret;
 
 		/* enable amd pstate from suspend state*/
-		amd_pstate_epp_reenable(policy);
+		ret = amd_pstate_epp_update_limit(policy);
+		if (ret)
+			return ret;
 
 		cpudata->suspended = false;
 	}
···
 	.fast_switch = amd_pstate_fast_switch,
 	.init = amd_pstate_cpu_init,
 	.exit = amd_pstate_cpu_exit,
-	.suspend = amd_pstate_cpu_suspend,
-	.resume = amd_pstate_cpu_resume,
 	.set_boost = amd_pstate_set_boost,
 	.update_limits = amd_pstate_update_limits,
 	.name = "amd-pstate",
···
 
 global_attr_free:
 	cpufreq_unregister_driver(current_pstate_driver);
-	amd_pstate_cppc_enable(false);
 	return ret;
 }
 device_initcall(amd_pstate_init);
+36 -29
drivers/cpufreq/amd-pstate.h
···
 /*********************************************************************
  *                        AMD P-state INTERFACE                      *
  *********************************************************************/
+
+/**
+ * union perf_cached - A union to cache performance-related data.
+ * @highest_perf: the maximum performance an individual processor may reach,
+ *		  assuming ideal conditions
+ *		  For platforms that support the preferred core feature, the highest_perf value maybe
+ *		  configured to any value in the range 166-255 by the firmware (because the preferred
+ *		  core ranking is encoded in the highest_perf value). To maintain consistency across
+ *		  all platforms, we split the highest_perf and preferred core ranking values into
+ *		  cpudata->perf.highest_perf and cpudata->prefcore_ranking.
+ * @nominal_perf: the maximum sustained performance level of the processor,
+ *		  assuming ideal operating conditions
+ * @lowest_nonlinear_perf: the lowest performance level at which nonlinear power
+ *			   savings are achieved
+ * @lowest_perf: the absolute lowest performance level of the processor
+ * @min_limit_perf: Cached value of the performance corresponding to policy->min
+ * @max_limit_perf: Cached value of the performance corresponding to policy->max
+ */
+union perf_cached {
+	struct {
+		u8	highest_perf;
+		u8	nominal_perf;
+		u8	lowest_nonlinear_perf;
+		u8	lowest_perf;
+		u8	min_limit_perf;
+		u8	max_limit_perf;
+	};
+	u64	val;
+};
+
 /**
  * struct amd_aperf_mperf
  * @aperf: actual performance frequency clock count
···
 * @cpu: CPU number
 * @req: constraint request to apply
 * @cppc_req_cached: cached performance request hints
- * @highest_perf: the maximum performance an individual processor may reach,
- *		  assuming ideal conditions
- *		  For platforms that do not support the preferred core feature, the
- *		  highest_pef may be configured with 166 or 255, to avoid max frequency
- *		  calculated wrongly. we take the fixed value as the highest_perf.
- * @nominal_perf: the maximum sustained performance level of the processor,
- *		  assuming ideal operating conditions
- * @lowest_nonlinear_perf: the lowest performance level at which nonlinear power
- *			   savings are achieved
- * @lowest_perf: the absolute lowest performance level of the processor
+ * @perf: cached performance-related data
 * @prefcore_ranking: the preferred core ranking, the higher value indicates a higher
 *		  priority.
- * @min_limit_perf: Cached value of the performance corresponding to policy->min
- * @max_limit_perf: Cached value of the performance corresponding to policy->max
 * @min_limit_freq: Cached value of policy->min (in khz)
 * @max_limit_freq: Cached value of policy->max (in khz)
- * @max_freq: the frequency (in khz) that mapped to highest_perf
- * @min_freq: the frequency (in khz) that mapped to lowest_perf
 * @nominal_freq: the frequency (in khz) that mapped to nominal_perf
 * @lowest_nonlinear_freq: the frequency (in khz) that mapped to lowest_nonlinear_perf
 * @cur: Difference of Aperf/Mperf/tsc count between last and current sample
···
 *	  AMD P-State driver supports preferred core featue.
 * @epp_cached: Cached CPPC energy-performance preference value
 * @policy: Cpufreq policy value
- * @cppc_cap1_cached Cached MSR_AMD_CPPC_CAP1 register value
 *
 * The amd_cpudata is key private data for each CPU thread in AMD P-State, and
 * represents all the attributes and goals that AMD P-State requests at runtime.
···
 	struct freq_qos_request req[2];
 	u64	cppc_req_cached;
 
-	u32	highest_perf;
-	u32	nominal_perf;
-	u32	lowest_nonlinear_perf;
-	u32	lowest_perf;
-	u32	prefcore_ranking;
-	u32	min_limit_perf;
-	u32	max_limit_perf;
-	u32	min_limit_freq;
-	u32	max_limit_freq;
+	union perf_cached perf;
 
-	u32	max_freq;
-	u32	min_freq;
+	u8	prefcore_ranking;
+	u32	min_limit_freq;
+	u32	max_limit_freq;
 	u32	nominal_freq;
 	u32	lowest_nonlinear_freq;
···
 	bool	hw_prefcore;
 
 	/* EPP feature related attributes*/
-	s16	epp_cached;
 	u32	policy;
-	u64	cppc_cap1_cached;
 	bool	suspended;
-	s16	epp_default;
+	u8	epp_default;
 };
 
 /*
+1 -17
drivers/cpufreq/apple-soc-cpufreq.c
··· 229 229 return 0; 230 230 } 231 231 232 - static struct freq_attr *apple_soc_cpufreq_hw_attr[] = { 233 - &cpufreq_freq_attr_scaling_available_freqs, 234 - NULL, /* Filled in below if boost is enabled */ 235 - NULL, 236 - }; 237 - 238 232 static int apple_soc_cpufreq_init(struct cpufreq_policy *policy) 239 233 { 240 234 int ret, i; ··· 310 316 policy->fast_switch_possible = true; 311 317 policy->suspend_freq = freq_table[0].frequency; 312 318 313 - if (policy_has_boost_freq(policy)) { 314 - ret = cpufreq_enable_boost_support(); 315 - if (ret) { 316 - dev_warn(cpu_dev, "failed to enable boost: %d\n", ret); 317 - } else { 318 - apple_soc_cpufreq_hw_attr[1] = &cpufreq_freq_attr_scaling_boost_freqs; 319 - apple_soc_cpufreq_driver.boost_enabled = true; 320 - } 321 - } 322 - 323 319 return 0; 324 320 325 321 out_free_cpufreq_table: ··· 344 360 .target_index = apple_soc_cpufreq_set_target, 345 361 .fast_switch = apple_soc_cpufreq_fast_switch, 346 362 .register_em = cpufreq_register_em_with_opp, 347 - .attr = apple_soc_cpufreq_hw_attr, 363 + .set_boost = cpufreq_boost_set_sw, 348 364 .suspend = cpufreq_generic_suspend, 349 365 }; 350 366
+1 -5
drivers/cpufreq/armada-37xx-cpufreq.c
··· 102 102 }; 103 103 104 104 static struct armada_37xx_dvfs armada_37xx_dvfs[] = { 105 - /* 106 - * The cpufreq scaling for 1.2 GHz variant of the SOC is currently 107 - * unstable because we do not know how to configure it properly. 108 - */ 109 - /* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */ 105 + {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, 110 106 {.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} }, 111 107 {.cpu_freq_max = 800*1000*1000, .divider = {1, 2, 3, 4} }, 112 108 {.cpu_freq_max = 600*1000*1000, .divider = {2, 4, 5, 6} },
+1 -1
drivers/cpufreq/armada-8k-cpufreq.c
··· 47 47 { 48 48 int cpu; 49 49 50 - for_each_possible_cpu(cpu) { 50 + for_each_present_cpu(cpu) { 51 51 struct device *cpu_dev; 52 52 struct clk *clk; 53 53
-1
drivers/cpufreq/bmips-cpufreq.c
··· 150 150 .get = bmips_cpufreq_get, 151 151 .init = bmips_cpufreq_init, 152 152 .exit = bmips_cpufreq_exit, 153 - .attr = cpufreq_generic_attr, 154 153 .name = BMIPS_CPUFREQ_PREFIX, 155 154 }; 156 155
-1
drivers/cpufreq/brcmstb-avs-cpufreq.c
··· 720 720 cpufreq_freq_attr_ro(brcm_avs_frequency); 721 721 722 722 static struct freq_attr *brcm_avs_cpufreq_attr[] = { 723 - &cpufreq_freq_attr_scaling_available_freqs, 724 723 &brcm_avs_pstate, 725 724 &brcm_avs_mode, 726 725 &brcm_avs_pmap,
+1 -8
drivers/cpufreq/cppc_cpufreq.c
··· 34 34 */ 35 35 static LIST_HEAD(cpu_data_list); 36 36 37 - static bool boost_supported; 38 - 39 37 static struct cpufreq_driver cppc_cpufreq_driver; 40 38 41 39 #ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE ··· 651 653 * is supported. 652 654 */ 653 655 if (caps->highest_perf > caps->nominal_perf) 654 - boost_supported = true; 656 + policy->boost_supported = true; 655 657 656 658 /* Set policy->cur to max now. The governors will adjust later. */ 657 659 policy->cur = cppc_perf_to_khz(caps, caps->highest_perf); ··· 788 790 struct cppc_cpudata *cpu_data = policy->driver_data; 789 791 struct cppc_perf_caps *caps = &cpu_data->perf_caps; 790 792 int ret; 791 - 792 - if (!boost_supported) { 793 - pr_err("BOOST not supported by CPU or firmware\n"); 794 - return -EINVAL; 795 - } 796 793 797 794 if (state) 798 795 policy->max = cppc_perf_to_khz(caps, caps->highest_perf);
+2 -22
drivers/cpufreq/cpufreq-dt.c
··· 36 36 37 37 static LIST_HEAD(priv_list); 38 38 39 - static struct freq_attr *cpufreq_dt_attr[] = { 40 - &cpufreq_freq_attr_scaling_available_freqs, 41 - NULL, /* Extra space for boost-attr if required */ 42 - NULL, 43 - }; 44 - 45 39 static struct private_data *cpufreq_dt_find_data(int cpu) 46 40 { 47 41 struct private_data *priv; ··· 114 120 policy->cpuinfo.transition_latency = transition_latency; 115 121 policy->dvfs_possible_from_any_cpu = true; 116 122 117 - /* Support turbo/boost mode */ 118 - if (policy_has_boost_freq(policy)) { 119 - /* This gets disabled by core on driver unregister */ 120 - ret = cpufreq_enable_boost_support(); 121 - if (ret) 122 - goto out_clk_put; 123 - cpufreq_dt_attr[1] = &cpufreq_freq_attr_scaling_boost_freqs; 124 - } 125 - 126 123 return 0; 127 - 128 - out_clk_put: 129 - clk_put(cpu_clk); 130 - 131 - return ret; 132 124 } 133 125 134 126 static int cpufreq_online(struct cpufreq_policy *policy) ··· 149 169 .offline = cpufreq_offline, 150 170 .register_em = cpufreq_register_em_with_opp, 151 171 .name = "cpufreq-dt", 152 - .attr = cpufreq_dt_attr, 172 + .set_boost = cpufreq_boost_set_sw, 153 173 .suspend = cpufreq_generic_suspend, 154 174 }; 155 175 ··· 283 303 int ret, cpu; 284 304 285 305 /* Request resources early so we can return in case of -EPROBE_DEFER */ 286 - for_each_possible_cpu(cpu) { 306 + for_each_present_cpu(cpu) { 287 307 ret = dt_cpufreq_early_init(&pdev->dev, cpu); 288 308 if (ret) 289 309 goto err;
+27 -19
drivers/cpufreq/cpufreq.c
··· 88 88 struct cpufreq_governor *new_gov, 89 89 unsigned int new_pol); 90 90 static bool cpufreq_boost_supported(void); 91 + static int cpufreq_boost_trigger_state(int state); 91 92 92 93 /* 93 94 * Two notifier lists: the "policy" list is involved in the ··· 632 631 if (!cpufreq_driver->boost_enabled) 633 632 return -EINVAL; 634 633 634 + if (!policy->boost_supported) 635 + return -EINVAL; 636 + 635 637 if (policy->boost_enabled == enable) 636 638 return count; 637 639 ··· 1084 1080 { 1085 1081 struct freq_attr **drv_attr; 1086 1082 int ret = 0; 1083 + 1084 + /* Attributes that need freq_table */ 1085 + if (policy->freq_table) { 1086 + ret = sysfs_create_file(&policy->kobj, 1087 + &cpufreq_freq_attr_scaling_available_freqs.attr); 1088 + if (ret) 1089 + return ret; 1090 + 1091 + if (cpufreq_boost_supported()) { 1092 + ret = sysfs_create_file(&policy->kobj, 1093 + &cpufreq_freq_attr_scaling_boost_freqs.attr); 1094 + if (ret) 1095 + return ret; 1096 + } 1097 + } 1087 1098 1088 1099 /* set up files for this cpu device */ 1089 1100 drv_attr = cpufreq_driver->attr; ··· 1618 1599 policy->cdev = of_cpufreq_cooling_register(policy); 1619 1600 1620 1601 /* Let the per-policy boost flag mirror the cpufreq_driver boost during init */ 1621 - if (cpufreq_driver->set_boost && 1602 + if (cpufreq_driver->set_boost && policy->boost_supported && 1622 1603 policy->boost_enabled != cpufreq_boost_enabled()) { 1623 1604 policy->boost_enabled = cpufreq_boost_enabled(); 1624 1605 ret = cpufreq_driver->set_boost(policy, policy->boost_enabled); 1625 1606 if (ret) { 1626 1607 /* If the set_boost fails, the online operation is not affected */ 1627 1608 pr_info("%s: CPU%d: Cannot %s BOOST\n", __func__, policy->cpu, 1628 - policy->boost_enabled ? 
"enable" : "disable"); 1609 + str_enable_disable(policy->boost_enabled)); 1629 1610 policy->boost_enabled = !policy->boost_enabled; 1630 1611 } 1631 1612 } ··· 2819 2800 /********************************************************************* 2820 2801 * BOOST * 2821 2802 *********************************************************************/ 2822 - static int cpufreq_boost_set_sw(struct cpufreq_policy *policy, int state) 2803 + int cpufreq_boost_set_sw(struct cpufreq_policy *policy, int state) 2823 2804 { 2824 2805 int ret; 2825 2806 ··· 2838 2819 2839 2820 return 0; 2840 2821 } 2822 + EXPORT_SYMBOL_GPL(cpufreq_boost_set_sw); 2841 2823 2842 - int cpufreq_boost_trigger_state(int state) 2824 + static int cpufreq_boost_trigger_state(int state) 2843 2825 { 2844 2826 struct cpufreq_policy *policy; 2845 2827 unsigned long flags; ··· 2855 2835 2856 2836 cpus_read_lock(); 2857 2837 for_each_active_policy(policy) { 2838 + if (!policy->boost_supported) 2839 + continue; 2840 + 2858 2841 policy->boost_enabled = state; 2859 2842 ret = cpufreq_driver->set_boost(policy, state); 2860 2843 if (ret) { ··· 2904 2881 if (cpufreq_boost_supported()) 2905 2882 sysfs_remove_file(cpufreq_global_kobject, &boost.attr); 2906 2883 } 2907 - 2908 - int cpufreq_enable_boost_support(void) 2909 - { 2910 - if (!cpufreq_driver) 2911 - return -EINVAL; 2912 - 2913 - if (cpufreq_boost_supported()) 2914 - return 0; 2915 - 2916 - cpufreq_driver->set_boost = cpufreq_boost_set_sw; 2917 - 2918 - /* This will get removed on driver unregister */ 2919 - return create_boost_sysfs_file(); 2920 - } 2921 - EXPORT_SYMBOL_GPL(cpufreq_enable_boost_support); 2922 2884 2923 2885 bool cpufreq_boost_enabled(void) 2924 2886 {
+23 -22
drivers/cpufreq/cpufreq_governor.c
··· 145 145 time_elapsed = update_time - j_cdbs->prev_update_time; 146 146 j_cdbs->prev_update_time = update_time; 147 147 148 - idle_time = cur_idle_time - j_cdbs->prev_cpu_idle; 148 + /* 149 + * cur_idle_time could be smaller than j_cdbs->prev_cpu_idle if 150 + * it's obtained from get_cpu_idle_time_jiffy() when NOHZ is 151 + * off, where idle_time is calculated by the difference between 152 + * time elapsed in jiffies and "busy time" obtained from CPU 153 + * statistics. If a CPU is 100% busy, the time elapsed and busy 154 + * time should grow with the same amount in two consecutive 155 + * samples, but in practice there could be a tiny difference, 156 + * making the accumulated idle time decrease sometimes. Hence, 157 + * in this case, idle_time should be regarded as 0 in order to 158 + * make the further process correct. 159 + */ 160 + if (cur_idle_time > j_cdbs->prev_cpu_idle) 161 + idle_time = cur_idle_time - j_cdbs->prev_cpu_idle; 162 + else 163 + idle_time = 0; 164 + 149 165 j_cdbs->prev_cpu_idle = cur_idle_time; 150 166 151 167 if (ignore_nice) { ··· 178 162 * calls, so the previous load value can be used then. 179 163 */ 180 164 load = j_cdbs->prev_load; 181 - } else if (unlikely((int)idle_time > 2 * sampling_rate && 165 + } else if (unlikely(idle_time > 2 * sampling_rate && 182 166 j_cdbs->prev_load)) { 183 167 /* 184 168 * If the CPU had gone completely idle and a task has ··· 205 189 load = j_cdbs->prev_load; 206 190 j_cdbs->prev_load = 0; 207 191 } else { 208 - if (time_elapsed >= idle_time) { 192 + if (time_elapsed > idle_time) 209 193 load = 100 * (time_elapsed - idle_time) / time_elapsed; 210 - } else { 211 - /* 212 - * That can happen if idle_time is returned by 213 - * get_cpu_idle_time_jiffy(). In that case 214 - * idle_time is roughly equal to the difference 215 - * between time_elapsed and "busy time" obtained 216 - * from CPU statistics. 
Then, the "busy time" 217 - * can end up being greater than time_elapsed 218 - * (for example, if jiffies_64 and the CPU 219 - * statistics are updated by different CPUs), 220 - * so idle_time may in fact be negative. That 221 - * means, though, that the CPU was busy all 222 - * the time (on the rough average) during the 223 - * last sampling interval and 100 can be 224 - * returned as the load. 225 - */ 226 - load = (int)idle_time < 0 ? 100 : 0; 227 - } 194 + else 195 + load = 0; 196 + 228 197 j_cdbs->prev_load = load; 229 198 } 230 199 231 - if (unlikely((int)idle_time > 2 * sampling_rate)) { 200 + if (unlikely(idle_time > 2 * sampling_rate)) { 232 201 unsigned int periods = idle_time / sampling_rate; 233 202 234 203 if (periods < idle_periods)
-1
drivers/cpufreq/davinci-cpufreq.c
··· 101 101 .get = cpufreq_generic_get, 102 102 .init = davinci_cpu_init, 103 103 .name = "davinci", 104 - .attr = cpufreq_generic_attr, 105 104 }; 106 105 107 106 static int __init davinci_cpufreq_probe(struct platform_device *pdev)
-1
drivers/cpufreq/e_powersaver.c
··· 376 376 .exit = eps_cpu_exit, 377 377 .get = eps_get, 378 378 .name = "e_powersaver", 379 - .attr = cpufreq_generic_attr, 380 379 }; 381 380 382 381
-1
drivers/cpufreq/elanfreq.c
··· 194 194 .target_index = elanfreq_target, 195 195 .init = elanfreq_cpu_init, 196 196 .name = "elanfreq", 197 - .attr = cpufreq_generic_attr, 198 197 }; 199 198 200 199 static const struct x86_cpu_id elan_id[] = {
+5 -10
drivers/cpufreq/freq_table.c
··· 14 14 * FREQUENCY TABLE HELPERS * 15 15 *********************************************************************/ 16 16 17 - bool policy_has_boost_freq(struct cpufreq_policy *policy) 17 + static bool policy_has_boost_freq(struct cpufreq_policy *policy) 18 18 { 19 19 struct cpufreq_frequency_table *pos, *table = policy->freq_table; 20 20 ··· 27 27 28 28 return false; 29 29 } 30 - EXPORT_SYMBOL_GPL(policy_has_boost_freq); 31 30 32 31 int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy, 33 32 struct cpufreq_frequency_table *table) ··· 275 276 return show_available_freqs(policy, buf, false); 276 277 } 277 278 cpufreq_attr_available_freq(scaling_available); 278 - EXPORT_SYMBOL_GPL(cpufreq_freq_attr_scaling_available_freqs); 279 279 280 280 /* 281 281 * scaling_boost_frequencies_show - show available boost frequencies for ··· 286 288 return show_available_freqs(policy, buf, true); 287 289 } 288 290 cpufreq_attr_available_freq(scaling_boost); 289 - EXPORT_SYMBOL_GPL(cpufreq_freq_attr_scaling_boost_freqs); 290 - 291 - struct freq_attr *cpufreq_generic_attr[] = { 292 - &cpufreq_freq_attr_scaling_available_freqs, 293 - NULL, 294 - }; 295 - EXPORT_SYMBOL_GPL(cpufreq_generic_attr); 296 291 297 292 static int set_freq_table_sorted(struct cpufreq_policy *policy) 298 293 { ··· 357 366 ret = cpufreq_frequency_table_cpuinfo(policy, policy->freq_table); 358 367 if (ret) 359 368 return ret; 369 + 370 + /* Drivers may have set this field already */ 371 + if (policy_has_boost_freq(policy)) 372 + policy->boost_supported = true; 360 373 361 374 return set_freq_table_sorted(policy); 362 375 }
-1
drivers/cpufreq/imx6q-cpufreq.c
··· 207 207 .init = imx6q_cpufreq_init, 208 208 .register_em = cpufreq_register_em_with_opp, 209 209 .name = "imx6q-cpufreq", 210 - .attr = cpufreq_generic_attr, 211 210 .suspend = cpufreq_generic_suspend, 212 211 }; 213 212
+18 -9
drivers/cpufreq/intel_pstate.c
··· 936 936 NULL, 937 937 }; 938 938 939 + static bool no_cas __ro_after_init; 940 + 939 941 static struct cpudata *hybrid_max_perf_cpu __read_mostly; 940 942 /* 941 943 * Protects hybrid_max_perf_cpu, the capacity_perf fields in struct cpudata, ··· 1043 1041 1044 1042 static void hybrid_init_cpu_capacity_scaling(bool refresh) 1045 1043 { 1044 + /* Bail out if enabling capacity-aware scheduling is prohibited. */ 1045 + if (no_cas) 1046 + return; 1047 + 1046 1048 /* 1047 1049 * If hybrid_max_perf_cpu is set at this point, the hybrid CPU capacity 1048 1050 * scaling has been enabled already and the driver is just changing the ··· 3686 3680 if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) 3687 3681 return -ENODEV; 3688 3682 3683 + /* 3684 + * The Intel pstate driver will be ignored if the platform 3685 + * firmware has its own power management modes. 3686 + */ 3687 + if (intel_pstate_platform_pwr_mgmt_exists()) { 3688 + pr_info("P-states controlled by the platform\n"); 3689 + return -ENODEV; 3690 + } 3691 + 3689 3692 id = x86_match_cpu(hwp_support_ids); 3690 3693 if (id) { 3691 3694 hwp_forced = intel_pstate_hwp_is_enabled(); ··· 3750 3735 default_driver = &intel_cpufreq; 3751 3736 3752 3737 hwp_cpu_matched: 3753 - /* 3754 - * The Intel pstate driver will be ignored if the platform 3755 - * firmware has its own power management modes. 3756 - */ 3757 - if (intel_pstate_platform_pwr_mgmt_exists()) { 3758 - pr_info("P-states controlled by the platform\n"); 3759 - return -ENODEV; 3760 - } 3761 - 3762 3738 if (!hwp_active && hwp_only) 3763 3739 return -ENOTSUPP; 3764 3740 ··· 3832 3826 3833 3827 if (!strcmp(str, "no_hwp")) 3834 3828 no_hwp = 1; 3829 + 3830 + if (!strcmp(str, "no_cas")) 3831 + no_cas = true; 3835 3832 3836 3833 if (!strcmp(str, "force")) 3837 3834 force_load = 1;
-1
drivers/cpufreq/kirkwood-cpufreq.c
··· 96 96 .target_index = kirkwood_cpufreq_target, 97 97 .init = kirkwood_cpufreq_cpu_init, 98 98 .name = "kirkwood-cpufreq", 99 - .attr = cpufreq_generic_attr, 100 99 }; 101 100 102 101 static int kirkwood_cpufreq_probe(struct platform_device *pdev)
-1
drivers/cpufreq/longhaul.c
··· 906 906 .get = longhaul_get, 907 907 .init = longhaul_cpu_init, 908 908 .name = "longhaul", 909 - .attr = cpufreq_generic_attr, 910 909 }; 911 910 912 911 static const struct x86_cpu_id longhaul_id[] = {
-1
drivers/cpufreq/loongson2_cpufreq.c
··· 91 91 .verify = cpufreq_generic_frequency_table_verify, 92 92 .target_index = loongson2_cpufreq_target, 93 93 .get = cpufreq_generic_get, 94 - .attr = cpufreq_generic_attr, 95 94 }; 96 95 97 96 static const struct platform_device_id platform_device_ids[] = {
+1 -10
drivers/cpufreq/loongson3_cpufreq.c
··· 299 299 per_cpu(freq_data, i) = per_cpu(freq_data, cpu); 300 300 } 301 301 302 - if (policy_has_boost_freq(policy)) { 303 - ret = cpufreq_enable_boost_support(); 304 - if (ret < 0) { 305 - pr_warn("cpufreq: Failed to enable boost: %d\n", ret); 306 - return ret; 307 - } 308 - loongson3_cpufreq_driver.boost_enabled = true; 309 - } 310 - 311 302 return 0; 312 303 } 313 304 ··· 328 337 .offline = loongson3_cpufreq_cpu_offline, 329 338 .get = loongson3_cpufreq_get, 330 339 .target_index = loongson3_cpufreq_target, 331 - .attr = cpufreq_generic_attr, 332 340 .verify = cpufreq_generic_frequency_table_verify, 341 + .set_boost = cpufreq_boost_set_sw, 333 342 .suspend = cpufreq_generic_suspend, 334 343 }; 335 344
+1 -2
drivers/cpufreq/mediatek-cpufreq-hw.c
··· 293 293 .register_em = mtk_cpufreq_register_em, 294 294 .fast_switch = mtk_cpufreq_hw_fast_switch, 295 295 .name = "mtk-cpufreq-hw", 296 - .attr = cpufreq_generic_attr, 297 296 }; 298 297 299 298 static int mtk_cpufreq_hw_driver_probe(struct platform_device *pdev) ··· 303 304 struct regulator *cpu_reg; 304 305 305 306 /* Make sure that all CPU supplies are available before proceeding. */ 306 - for_each_possible_cpu(cpu) { 307 + for_each_present_cpu(cpu) { 307 308 cpu_dev = get_cpu_device(cpu); 308 309 if (!cpu_dev) 309 310 return dev_err_probe(&pdev->dev, -EPROBE_DEFER,
+1 -2
drivers/cpufreq/mediatek-cpufreq.c
··· 618 618 .exit = mtk_cpufreq_exit, 619 619 .register_em = cpufreq_register_em_with_opp, 620 620 .name = "mtk-cpufreq", 621 - .attr = cpufreq_generic_attr, 622 621 }; 623 622 624 623 static int mtk_cpufreq_probe(struct platform_device *pdev) ··· 631 632 return dev_err_probe(&pdev->dev, -ENODEV, 632 633 "failed to get mtk cpufreq platform data\n"); 633 634 634 - for_each_possible_cpu(cpu) { 635 + for_each_present_cpu(cpu) { 635 636 info = mtk_cpu_dvfs_info_lookup(cpu); 636 637 if (info) 637 638 continue;
+1 -1
drivers/cpufreq/mvebu-cpufreq.c
··· 56 56 * it), and registers the clock notifier that will take care 57 57 * of doing the PMSU part of a frequency transition. 58 58 */ 59 - for_each_possible_cpu(cpu) { 59 + for_each_present_cpu(cpu) { 60 60 struct device *cpu_dev; 61 61 struct clk *clk; 62 62 int ret;
-1
drivers/cpufreq/omap-cpufreq.c
··· 147 147 .exit = omap_cpu_exit, 148 148 .register_em = cpufreq_register_em_with_opp, 149 149 .name = "omap", 150 - .attr = cpufreq_generic_attr, 151 150 }; 152 151 153 152 static int omap_cpufreq_probe(struct platform_device *pdev)
-1
drivers/cpufreq/p4-clockmod.c
··· 227 227 .init = cpufreq_p4_cpu_init, 228 228 .get = cpufreq_p4_get, 229 229 .name = "p4-clockmod", 230 - .attr = cpufreq_generic_attr, 231 230 }; 232 231 233 232 static const struct x86_cpu_id cpufreq_p4_id[] = {
-1
drivers/cpufreq/pasemi-cpufreq.c
··· 245 245 .exit = pas_cpufreq_cpu_exit, 246 246 .verify = cpufreq_generic_frequency_table_verify, 247 247 .target_index = pas_cpufreq_target, 248 - .attr = cpufreq_generic_attr, 249 248 }; 250 249 251 250 /*
-1
drivers/cpufreq/pmac32-cpufreq.c
··· 439 439 .suspend = pmac_cpufreq_suspend, 440 440 .resume = pmac_cpufreq_resume, 441 441 .flags = CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING, 442 - .attr = cpufreq_generic_attr, 443 442 .name = "powermac", 444 443 }; 445 444
-1
drivers/cpufreq/pmac64-cpufreq.c
··· 332 332 .verify = cpufreq_generic_frequency_table_verify, 333 333 .target_index = g5_cpufreq_target, 334 334 .get = g5_cpufreq_get_speed, 335 - .attr = cpufreq_generic_attr, 336 335 }; 337 336 338 337
-1
drivers/cpufreq/powernow-k6.c
··· 253 253 .exit = powernow_k6_cpu_exit, 254 254 .get = powernow_k6_get, 255 255 .name = "powernow-k6", 256 - .attr = cpufreq_generic_attr, 257 256 }; 258 257 259 258 static const struct x86_cpu_id powernow_k6_ids[] = {
-1
drivers/cpufreq/powernow-k7.c
··· 667 667 .init = powernow_cpu_init, 668 668 .exit = powernow_cpu_exit, 669 669 .name = "powernow-k7", 670 - .attr = cpufreq_generic_attr, 671 670 }; 672 671 673 672 static int __init powernow_init(void)
-1
drivers/cpufreq/powernow-k8.c
··· 1143 1143 .exit = powernowk8_cpu_exit, 1144 1144 .get = powernowk8_get, 1145 1145 .name = "powernow-k8", 1146 - .attr = cpufreq_generic_attr, 1147 1146 }; 1148 1147 1149 1148 static void __request_acpi_cpufreq(void)
+1 -10
drivers/cpufreq/powernv-cpufreq.c
··· 386 386 static struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq = 387 387 __ATTR_RO(cpuinfo_nominal_freq); 388 388 389 - #define SCALING_BOOST_FREQS_ATTR_INDEX 2 390 - 391 389 static struct freq_attr *powernv_cpu_freq_attr[] = { 392 - &cpufreq_freq_attr_scaling_available_freqs, 393 390 &cpufreq_freq_attr_cpuinfo_nominal_freq, 394 - &cpufreq_freq_attr_scaling_boost_freqs, 395 391 NULL, 396 392 }; 397 393 ··· 1124 1128 goto out; 1125 1129 1126 1130 if (powernv_pstate_info.wof_enabled) 1127 - powernv_cpufreq_driver.boost_enabled = true; 1128 - else 1129 - powernv_cpu_freq_attr[SCALING_BOOST_FREQS_ATTR_INDEX] = NULL; 1131 + powernv_cpufreq_driver.set_boost = cpufreq_boost_set_sw; 1130 1132 1131 1133 rc = cpufreq_register_driver(&powernv_cpufreq_driver); 1132 1134 if (rc) { 1133 1135 pr_info("Failed to register the cpufreq driver (%d)\n", rc); 1134 1136 goto cleanup; 1135 1137 } 1136 - 1137 - if (powernv_pstate_info.wof_enabled) 1138 - cpufreq_enable_boost_support(); 1139 1138 1140 1139 register_reboot_notifier(&powernv_cpufreq_reboot_nb); 1141 1140 opal_message_notifier_register(OPAL_MSG_OCC, &powernv_cpufreq_opal_nb);
+2 -14
drivers/cpufreq/qcom-cpufreq-hw.c
··· 306 306 struct of_phandle_args args; 307 307 int cpu, ret; 308 308 309 - for_each_possible_cpu(cpu) { 309 + for_each_present_cpu(cpu) { 310 310 cpu_np = of_cpu_device_node_get(cpu); 311 311 if (!cpu_np) 312 312 continue; ··· 566 566 return -ENODEV; 567 567 } 568 568 569 - if (policy_has_boost_freq(policy)) { 570 - ret = cpufreq_enable_boost_support(); 571 - if (ret) 572 - dev_warn(cpu_dev, "failed to enable boost: %d\n", ret); 573 - } 574 - 575 569 return qcom_cpufreq_hw_lmh_init(policy, index); 576 570 } 577 571 ··· 589 595 enable_irq(data->throttle_irq); 590 596 } 591 597 592 - static struct freq_attr *qcom_cpufreq_hw_attr[] = { 593 - &cpufreq_freq_attr_scaling_available_freqs, 594 - &cpufreq_freq_attr_scaling_boost_freqs, 595 - NULL 596 - }; 597 - 598 598 static struct cpufreq_driver cpufreq_qcom_hw_driver = { 599 599 .flags = CPUFREQ_NEED_INITIAL_FREQ_CHECK | 600 600 CPUFREQ_HAVE_GOVERNOR_PER_POLICY | ··· 603 615 .register_em = cpufreq_register_em_with_opp, 604 616 .fast_switch = qcom_cpufreq_hw_fast_switch, 605 617 .name = "qcom-cpufreq-hw", 606 - .attr = qcom_cpufreq_hw_attr, 607 618 .ready = qcom_cpufreq_ready, 619 + .set_boost = cpufreq_boost_set_sw, 608 620 }; 609 621 610 622 static unsigned long qcom_cpufreq_hw_recalc_rate(struct clk_hw *hw, unsigned long parent_rate)
+4 -4
drivers/cpufreq/qcom-cpufreq-nvmem.c
··· 489 489 nvmem_cell_put(speedbin_nvmem); 490 490 } 491 491 492 - for_each_possible_cpu(cpu) { 492 + for_each_present_cpu(cpu) { 493 493 struct dev_pm_opp_config config = { 494 494 .supported_hw = NULL, 495 495 }; ··· 543 543 dev_err(cpu_dev, "Failed to register platform device\n"); 544 544 545 545 free_opp: 546 - for_each_possible_cpu(cpu) { 546 + for_each_present_cpu(cpu) { 547 547 dev_pm_domain_detach_list(drv->cpus[cpu].pd_list); 548 548 dev_pm_opp_clear_config(drv->cpus[cpu].opp_token); 549 549 } ··· 557 557 558 558 platform_device_unregister(cpufreq_dt_pdev); 559 559 560 - for_each_possible_cpu(cpu) { 560 + for_each_present_cpu(cpu) { 561 561 dev_pm_domain_detach_list(drv->cpus[cpu].pd_list); 562 562 dev_pm_opp_clear_config(drv->cpus[cpu].opp_token); 563 563 } ··· 568 568 struct qcom_cpufreq_drv *drv = dev_get_drvdata(dev); 569 569 unsigned int cpu; 570 570 571 - for_each_possible_cpu(cpu) 571 + for_each_present_cpu(cpu) 572 572 qcom_cpufreq_suspend_pd_devs(drv, cpu); 573 573 574 574 return 0;
-1
drivers/cpufreq/qoriq-cpufreq.c
··· 254 254 .verify = cpufreq_generic_frequency_table_verify, 255 255 .target_index = qoriq_cpufreq_target, 256 256 .get = cpufreq_generic_get, 257 - .attr = cpufreq_generic_attr, 258 257 }; 259 258 260 259 static const struct of_device_id qoriq_cpufreq_blacklist[] = {
-1
drivers/cpufreq/sc520_freq.c
··· 92 92 .target_index = sc520_freq_target, 93 93 .init = sc520_freq_cpu_init, 94 94 .name = "sc520_freq", 95 - .attr = cpufreq_generic_attr, 96 95 }; 97 96 98 97 static const struct x86_cpu_id sc520_ids[] = {
+2 -19
drivers/cpufreq/scmi-cpufreq.c
··· 104 104 int cpu, tdomain; 105 105 struct device *tcpu_dev; 106 106 107 - for_each_possible_cpu(cpu) { 107 + for_each_present_cpu(cpu) { 108 108 if (cpu == cpu_dev->id) 109 109 continue; 110 110 ··· 170 170 171 171 return rate_limit; 172 172 } 173 - 174 - static struct freq_attr *scmi_cpufreq_hw_attr[] = { 175 - &cpufreq_freq_attr_scaling_available_freqs, 176 - NULL, 177 - NULL, 178 - }; 179 173 180 174 static int scmi_limit_notify_cb(struct notifier_block *nb, unsigned long event, void *data) 181 175 { ··· 297 303 policy->transition_delay_us = 298 304 scmi_get_rate_limit(domain, policy->fast_switch_possible); 299 305 300 - if (policy_has_boost_freq(policy)) { 301 - ret = cpufreq_enable_boost_support(); 302 - if (ret) { 303 - dev_warn(cpu_dev, "failed to enable boost: %d\n", ret); 304 - goto out_free_table; 305 - } else { 306 - scmi_cpufreq_hw_attr[1] = &cpufreq_freq_attr_scaling_boost_freqs; 307 - scmi_cpufreq_driver.boost_enabled = true; 308 - } 309 - } 310 - 311 306 ret = freq_qos_add_request(&policy->constraints, &priv->limits_freq_req, FREQ_QOS_MAX, 312 307 FREQ_QOS_MAX_DEFAULT_VALUE); 313 308 if (ret < 0) { ··· 378 395 CPUFREQ_NEED_INITIAL_FREQ_CHECK | 379 396 CPUFREQ_IS_COOLING_DEV, 380 397 .verify = cpufreq_generic_frequency_table_verify, 381 - .attr = scmi_cpufreq_hw_attr, 382 398 .target_index = scmi_cpufreq_set_target, 383 399 .fast_switch = scmi_cpufreq_fast_switch, 384 400 .get = scmi_cpufreq_get_rate, 385 401 .init = scmi_cpufreq_init, 386 402 .exit = scmi_cpufreq_exit, 387 403 .register_em = scmi_cpufreq_register_em, 404 + .set_boost = cpufreq_boost_set_sw, 388 405 }; 389 406 390 407 static int scmi_cpufreq_probe(struct scmi_device *sdev)
+4 -4
drivers/cpufreq/scpi-cpufreq.c
··· 39 39 static int 40 40 scpi_cpufreq_set_target(struct cpufreq_policy *policy, unsigned int index) 41 41 { 42 - u64 rate = policy->freq_table[index].frequency * 1000; 42 + unsigned long freq_khz = policy->freq_table[index].frequency; 43 43 struct scpi_data *priv = policy->driver_data; 44 + unsigned long rate = freq_khz * 1000; 44 45 int ret; 45 46 46 47 ret = clk_set_rate(priv->clk, rate); ··· 49 48 if (ret) 50 49 return ret; 51 50 52 - if (clk_get_rate(priv->clk) != rate) 51 + if (clk_get_rate(priv->clk) / 1000 != freq_khz) 53 52 return -EIO; 54 53 55 54 return 0; ··· 65 64 if (domain < 0) 66 65 return domain; 67 66 68 - for_each_possible_cpu(cpu) { 67 + for_each_present_cpu(cpu) { 69 68 if (cpu == cpu_dev->id) 70 69 continue; 71 70 ··· 184 183 CPUFREQ_NEED_INITIAL_FREQ_CHECK | 185 184 CPUFREQ_IS_COOLING_DEV, 186 185 .verify = cpufreq_generic_frequency_table_verify, 187 - .attr = cpufreq_generic_attr, 188 186 .get = scpi_cpufreq_get_rate, 189 187 .init = scpi_cpufreq_init, 190 188 .exit = scpi_cpufreq_exit,
-1
drivers/cpufreq/sh-cpufreq.c
··· 151 151 .verify = sh_cpufreq_verify, 152 152 .init = sh_cpufreq_cpu_init, 153 153 .exit = sh_cpufreq_cpu_exit, 154 - .attr = cpufreq_generic_attr, 155 154 }; 156 155 157 156 static int __init sh_cpufreq_module_init(void)
-1
drivers/cpufreq/spear-cpufreq.c
··· 165 165 .target_index = spear_cpufreq_target, 166 166 .get = cpufreq_generic_get, 167 167 .init = spear_cpufreq_init, 168 - .attr = cpufreq_generic_attr, 169 168 }; 170 169 171 170 static int spear_cpufreq_probe(struct platform_device *pdev)
-1
drivers/cpufreq/speedstep-centrino.c
··· 507 507 .verify = cpufreq_generic_frequency_table_verify, 508 508 .target_index = centrino_target, 509 509 .get = get_cur_freq, 510 - .attr = cpufreq_generic_attr, 511 510 }; 512 511 513 512 /*
-1
drivers/cpufreq/speedstep-ich.c
··· 315 315 .target_index = speedstep_target, 316 316 .init = speedstep_cpu_init, 317 317 .get = speedstep_get, 318 - .attr = cpufreq_generic_attr, 319 318 }; 320 319 321 320 static const struct x86_cpu_id ss_smi_ids[] = {
-1
drivers/cpufreq/speedstep-smi.c
··· 295 295 .init = speedstep_cpu_init, 296 296 .get = speedstep_get, 297 297 .resume = speedstep_resume, 298 - .attr = cpufreq_generic_attr, 299 298 }; 300 299 301 300 static const struct x86_cpu_id ss_smi_ids[] = {
+3 -3
drivers/cpufreq/sun50i-cpufreq-nvmem.c
··· 262 262 snprintf(name, sizeof(name), "speed%d", speed); 263 263 config.prop_name = name; 264 264 265 - for_each_possible_cpu(cpu) { 265 + for_each_present_cpu(cpu) { 266 266 struct device *cpu_dev = get_cpu_device(cpu); 267 267 268 268 if (!cpu_dev) { ··· 288 288 pr_err("Failed to register platform device\n"); 289 289 290 290 free_opp: 291 - for_each_possible_cpu(cpu) 291 + for_each_present_cpu(cpu) 292 292 dev_pm_opp_clear_config(opp_tokens[cpu]); 293 293 kfree(opp_tokens); 294 294 ··· 302 302 303 303 platform_device_unregister(cpufreq_dt_pdev); 304 304 305 - for_each_possible_cpu(cpu) 305 + for_each_present_cpu(cpu) 306 306 dev_pm_opp_clear_config(opp_tokens[cpu]); 307 307 308 308 kfree(opp_tokens);
+7 -1
drivers/cpufreq/tegra186-cpufreq.c
··· 73 73 { 74 74 struct tegra186_cpufreq_data *data = cpufreq_get_driver_data(); 75 75 unsigned int cluster = data->cpus[policy->cpu].bpmp_cluster_id; 76 + u32 cpu; 76 77 77 78 policy->freq_table = data->clusters[cluster].table; 78 79 policy->cpuinfo.transition_latency = 300 * 1000; 79 80 policy->driver_data = NULL; 81 + 82 + /* set same policy for all cpus in a cluster */ 83 + for (cpu = 0; cpu < ARRAY_SIZE(tegra186_cpus); cpu++) { 84 + if (data->cpus[cpu].bpmp_cluster_id == cluster) 85 + cpumask_set_cpu(cpu, policy->cpus); 86 + } 80 87 81 88 return 0; 82 89 } ··· 130 123 .verify = cpufreq_generic_frequency_table_verify, 131 124 .target_index = tegra186_cpufreq_set_target, 132 125 .init = tegra186_cpufreq_init, 133 - .attr = cpufreq_generic_attr, 134 126 }; 135 127 136 128 static struct cpufreq_frequency_table *init_vhint_table(
-1
drivers/cpufreq/tegra194-cpufreq.c
··· 589 589 .exit = tegra194_cpufreq_exit, 590 590 .online = tegra194_cpufreq_online, 591 591 .offline = tegra194_cpufreq_offline, 592 - .attr = cpufreq_generic_attr, 593 592 }; 594 593 595 594 static struct tegra_cpufreq_ops tegra194_cpufreq_ops = {
-1
drivers/cpufreq/vexpress-spc-cpufreq.c
··· 471 471 .init = ve_spc_cpufreq_init, 472 472 .exit = ve_spc_cpufreq_exit, 473 473 .register_em = cpufreq_register_em_with_opp, 474 - .attr = cpufreq_generic_attr, 475 474 }; 476 475 477 476 #ifdef CONFIG_BL_SWITCHER
+1 -2
drivers/cpufreq/virtual-cpufreq.c
···
 	cur_perf_domain = readl_relaxed(base + policy->cpu *
 					PER_CPU_OFFSET + REG_PERF_DOMAIN_OFFSET);
 
-	for_each_possible_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		cpu_dev = get_cpu_device(cpu);
 		if (!cpu_dev)
 			continue;
···
 	.verify = virt_cpufreq_verify_policy,
 	.target = virt_cpufreq_target,
 	.fast_switch = virt_cpufreq_fast_switch,
-	.attr = cpufreq_generic_attr,
 };
 
 static int virt_cpufreq_driver_probe(struct platform_device *pdev)
drivers/cpuidle/cpuidle-arm.c
···
 /*
  * arm_idle_init - Initializes arm cpuidle driver
  *
- * Initializes arm cpuidle driver for all CPUs, if any CPU fails
- * to register cpuidle driver then rollback to cancel all CPUs
- * registration.
+ * Initializes arm cpuidle driver for all present CPUs, if any
+ * CPU fails to register cpuidle driver then rollback to cancel
+ * all CPUs registration.
  */
 static int __init arm_idle_init(void)
 {
···
 	struct cpuidle_driver *drv;
 	struct cpuidle_device *dev;
 
-	for_each_possible_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		ret = arm_idle_init_cpu(cpu);
 		if (ret)
 			goto out_fail;
drivers/cpuidle/cpuidle-big_little.c
···
 	if (!cpumask)
 		return -ENOMEM;
 
-	for_each_possible_cpu(cpu)
+	for_each_present_cpu(cpu)
 		if (smp_cpuid_part(cpu) == part_id)
 			cpumask_set_cpu(cpu, cpumask);
 
drivers/cpuidle/cpuidle-psci.c
···
 /*
  * psci_idle_probe - Initializes PSCI cpuidle driver
  *
- * Initializes PSCI cpuidle driver for all CPUs, if any CPU fails
+ * Initializes PSCI cpuidle driver for all present CPUs, if any CPU fails
  * to register cpuidle driver then rollback to cancel all CPUs
  * registration.
  */
···
 	struct cpuidle_driver *drv;
 	struct cpuidle_device *dev;
 
-	for_each_possible_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		ret = psci_idle_init_cpu(&pdev->dev, cpu);
 		if (ret)
 			goto out_fail;
drivers/cpuidle/cpuidle-qcom-spm.c
···
 	if (ret)
 		return dev_err_probe(&pdev->dev, ret, "set warm boot addr failed");
 
-	for_each_possible_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		ret = spm_cpuidle_register(&pdev->dev, cpu);
 		if (ret && ret != -ENODEV) {
 			dev_err(&pdev->dev,
drivers/cpuidle/cpuidle-riscv-sbi.c
···
 		return ret;
 	}
 
-	/* Initialize CPU idle driver for each CPU */
-	for_each_possible_cpu(cpu) {
+	/* Initialize CPU idle driver for each present CPU */
+	for_each_present_cpu(cpu) {
 		ret = sbi_cpuidle_init_cpu(&pdev->dev, cpu);
 		if (ret) {
 			pr_debug("HART%ld: idle driver init failed\n",
drivers/cpuidle/governors/menu.c
···
  * the C state is required to actually break even on this cost. CPUIDLE
  * provides us this duration in the "target_residency" field. So all that we
  * need is a good prediction of how long we'll be idle. Like the traditional
- * menu governor, we start with the actual known "next timer event" time.
+ * menu governor, we take the actual known "next timer event" time.
  *
  * Since there are other source of wakeups (interrupts for example) than
  * the next timer event, this estimation is rather optimistic. To get a
···
  * duration always was 50% of the next timer tick, the correction factor will
  * be 0.5.
  *
- * menu uses a running average for this correction factor, however it uses a
- * set of factors, not just a single factor. This stems from the realization
- * that the ratio is dependent on the order of magnitude of the expected
- * duration; if we expect 500 milliseconds of idle time the likelihood of
- * getting an interrupt very early is much higher than if we expect 50 micro
- * seconds of idle time. A second independent factor that has big impact on
- * the actual factor is if there is (disk) IO outstanding or not.
- * (as a special twist, we consider every sleep longer than 50 milliseconds
- * as perfect; there are no power gains for sleeping longer than this)
- *
- * For these two reasons we keep an array of 12 independent factors, that gets
- * indexed based on the magnitude of the expected duration as well as the
- * "is IO outstanding" property.
+ * menu uses a running average for this correction factor, but it uses a set of
+ * factors, not just a single factor. This stems from the realization that the
+ * ratio is dependent on the order of magnitude of the expected duration; if we
+ * expect 500 milliseconds of idle time the likelihood of getting an interrupt
+ * very early is much higher than if we expect 50 micro seconds of idle time.
+ * For this reason, menu keeps an array of 6 independent factors, that gets
+ * indexed based on the magnitude of the expected duration.
  *
  * Repeatable-interval-detector
  * ----------------------------
  * There are some cases where "next timer" is a completely unusable predictor:
  * Those cases where the interval is fixed, for example due to hardware
- * interrupt mitigation, but also due to fixed transfer rate devices such as
- * mice.
+ * interrupt mitigation, but also due to fixed transfer rate devices like mice.
  * For this, we use a different predictor: We track the duration of the last 8
- * intervals and if the stand deviation of these 8 intervals is below a
- * threshold value, we use the average of these intervals as prediction.
- *
+ * intervals and use them to estimate the duration of the next one.
  */
 
 struct menu_device {
···
  */
 static unsigned int get_typical_interval(struct menu_device *data)
 {
-	int i, divisor;
-	unsigned int min, max, thresh, avg;
-	uint64_t sum, variance;
-
-	thresh = INT_MAX; /* Discard outliers above this value */
+	s64 value, min_thresh = -1, max_thresh = UINT_MAX;
+	unsigned int max, min, divisor;
+	u64 avg, variance, avg_sq;
+	int i;
 
 again:
-
-	/* First calculate the average of past intervals */
-	min = UINT_MAX;
+	/* Compute the average and variance of past intervals. */
 	max = 0;
-	sum = 0;
+	min = UINT_MAX;
+	avg = 0;
+	variance = 0;
 	divisor = 0;
 	for (i = 0; i < INTERVALS; i++) {
-		unsigned int value = data->intervals[i];
-		if (value <= thresh) {
-			sum += value;
-			divisor++;
-			if (value > max)
-				max = value;
+		value = data->intervals[i];
+		/*
+		 * Discard the samples outside the interval between the min and
+		 * max thresholds.
+		 */
+		if (value <= min_thresh || value >= max_thresh)
+			continue;
 
-			if (value < min)
-				min = value;
-		}
+		divisor++;
+
+		avg += value;
+		variance += value * value;
+
+		if (value > max)
+			max = value;
+
+		if (value < min)
+			min = value;
 	}
 
 	if (!max)
 		return UINT_MAX;
 
-	if (divisor == INTERVALS)
-		avg = sum >> INTERVAL_SHIFT;
-	else
-		avg = div_u64(sum, divisor);
-
-	/* Then try to determine variance */
-	variance = 0;
-	for (i = 0; i < INTERVALS; i++) {
-		unsigned int value = data->intervals[i];
-		if (value <= thresh) {
-			int64_t diff = (int64_t)value - avg;
-			variance += diff * diff;
-		}
-	}
-	if (divisor == INTERVALS)
+	if (divisor == INTERVALS) {
+		avg >>= INTERVAL_SHIFT;
 		variance >>= INTERVAL_SHIFT;
-	else
+	} else {
+		do_div(avg, divisor);
 		do_div(variance, divisor);
+	}
+
+	avg_sq = avg * avg;
+	variance -= avg_sq;
 
 	/*
 	 * The typical interval is obtained when standard deviation is
···
 	 * Use this result only if there is no timer to wake us up sooner.
 	 */
 	if (likely(variance <= U64_MAX/36)) {
-		if ((((u64)avg*avg > variance*36) && (divisor * 4 >= INTERVALS * 3))
-							|| variance <= 400) {
+		if ((avg_sq > variance * 36 && divisor * 4 >= INTERVALS * 3) ||
+		    variance <= 400)
 			return avg;
-		}
 	}
 
 	/*
-	 * If we have outliers to the upside in our distribution, discard
-	 * those by setting the threshold to exclude these outliers, then
+	 * If there are outliers, discard them by setting thresholds to exclude
+	 * data points at a large enough distance from the average, then
 	 * calculate the average and standard deviation again. Once we get
-	 * down to the bottom 3/4 of our samples, stop excluding samples.
+	 * down to the last 3/4 of our samples, stop excluding samples.
 	 *
 	 * This can deal with workloads that have long pauses interspersed
 	 * with sporadic activity with a bunch of short pauses.
 	 */
-	if ((divisor * 4) <= INTERVALS * 3)
-		return UINT_MAX;
+	if (divisor * 4 <= INTERVALS * 3) {
+		/*
+		 * If there are sufficiently many data points still under
+		 * consideration after the outliers have been eliminated,
+		 * returning without a prediction would be a mistake because it
+		 * is likely that the next interval will not exceed the current
+		 * maximum, so return the latter in that case.
+		 */
+		if (divisor >= INTERVALS / 2)
+			return max;
 
-	thresh = max - 1;
+		return UINT_MAX;
+	}
+
+	/* Update the thresholds for the next round. */
+	if (avg - min > max - avg)
+		min_thresh = min;
+	else
+		max_thresh = max;
+
 	goto again;
 }
 
drivers/idle/intel_idle.c
···
 	 * Indicate which enable bits to clear here.
 	 */
 	unsigned long auto_demotion_disable_flags;
-	bool byt_auto_demotion_disable_flag;
 	bool disable_promotion_to_c1e;
 	bool use_acpi;
 };
···
 static const struct idle_cpu idle_cpu_byt __initconst = {
 	.state_table = byt_cstates,
 	.disable_promotion_to_c1e = true,
-	.byt_auto_demotion_disable_flag = true,
 };
 
 static const struct idle_cpu idle_cpu_cht __initconst = {
 	.state_table = cht_cstates,
 	.disable_promotion_to_c1e = true,
-	.byt_auto_demotion_disable_flag = true,
 };
 
 static const struct idle_cpu idle_cpu_ivb __initconst = {
···
 module_param_named(use_acpi, force_use_acpi, bool, 0444);
 MODULE_PARM_DESC(use_acpi, "Use ACPI _CST for building the idle states list");
 
+static bool no_native __read_mostly; /* No effect if no_acpi is set. */
+module_param_named(no_native, no_native, bool, 0444);
+MODULE_PARM_DESC(no_native, "Ignore cpu specific (native) idle states in lieu of ACPI idle states");
+
 static struct acpi_processor_power acpi_state_table __initdata;
 
 /**
···
 	}
 	return true;
 }
+
+static inline bool ignore_native(void)
+{
+	return no_native && !no_acpi;
+}
 #else /* !CONFIG_ACPI_PROCESSOR_CSTATE */
 #define force_use_acpi	(false)
 
···
 {
 	return false;
 }
+static inline bool ignore_native(void) { return false; }
 #endif /* !CONFIG_ACPI_PROCESSOR_CSTATE */
 
 /**
···
 	}
 }
 
+/**
+ * byt_cht_auto_demotion_disable - Disable Bay/Cherry Trail auto-demotion.
+ */
+static void __init byt_cht_auto_demotion_disable(void)
+{
+	wrmsrl(MSR_CC6_DEMOTION_POLICY_CONFIG, 0);
+	wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
+}
+
 static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
 {
 	unsigned int mwait_cstate = (MWAIT_HINT2CSTATE(mwait_hint) + 1) &
···
 	case INTEL_ATOM_GRACEMONT:
 		adl_idle_state_table_update();
 		break;
+	case INTEL_ATOM_SILVERMONT:
+	case INTEL_ATOM_AIRMONT:
+		byt_cht_auto_demotion_disable();
+		break;
 	}
 
 	for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
···
 			state->flags |= CPUIDLE_FLAG_TIMER_STOP;
 
 		drv->state_count++;
-	}
-
-	if (icpu->byt_auto_demotion_disable_flag) {
-		wrmsrl(MSR_CC6_DEMOTION_POLICY_CONFIG, 0);
-		wrmsrl(MSR_MC6_DEMOTION_POLICY_CONFIG, 0);
 	}
 }
 
···
 	pr_debug("MWAIT substates: 0x%x\n", mwait_substates);
 
 	icpu = (const struct idle_cpu *)id->driver_data;
+	if (icpu && ignore_native()) {
+		pr_debug("ignoring native CPU idle states\n");
+		icpu = NULL;
+	}
 	if (icpu) {
 		if (icpu->state_table)
 			cpuidle_state_table = icpu->state_table;
drivers/mfd/intel-lpss.c
···
 
 static int resume_lpss_device(struct device *dev, void *data)
 {
-	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
+	if (!dev_pm_smart_suspend(dev))
 		pm_runtime_resume(dev);
 
 	return 0;
drivers/pci/pci-driver.c
···
 	 * suspend callbacks can cope with runtime-suspended devices, it is
 	 * better to resume the device from runtime suspend here.
 	 */
-	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
-	    pci_dev_need_resume(pci_dev)) {
+	if (!dev_pm_smart_suspend(dev) || pci_dev_need_resume(pci_dev)) {
 		pm_runtime_resume(dev);
 		pci_dev->state_saved = false;
 	} else {
···
 	}
 
 	/* The reason to do that is the same as in pci_pm_suspend(). */
-	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
-	    pci_dev_need_resume(pci_dev)) {
+	if (!dev_pm_smart_suspend(dev) || pci_dev_need_resume(pci_dev)) {
 		pm_runtime_resume(dev);
 		pci_dev->state_saved = false;
 	} else {
drivers/powercap/Kconfig
···
 
 config DTPM_CPU
 	bool "Add CPU power capping based on the energy model"
-	depends on DTPM && ENERGY_MODEL
+	depends on DTPM && ENERGY_MODEL && SMP
 	help
 	  This enables support for CPU power limitation based on
 	  energy model.
include/linux/cpufreq.h
···
 	/* Per policy boost enabled flag. */
 	bool boost_enabled;
 
+	/* Per policy boost supported flag. */
+	bool boost_supported;
+
 	/* Cached frequency lookup from cpufreq_driver_resolve_freq. */
 	unsigned int cached_target_freq;
 	unsigned int cached_resolved_idx;
···
 }
 static inline void cpufreq_cpu_put(struct cpufreq_policy *policy) { }
 #endif
+
+/* Scope based cleanup macro for cpufreq_policy kobject reference counting */
+DEFINE_FREE(put_cpufreq_policy, struct cpufreq_policy *, if (_T) cpufreq_cpu_put(_T))
 
 static inline bool policy_is_inactive(struct cpufreq_policy *policy)
 {
···
 ssize_t cpufreq_show_cpus(const struct cpumask *mask, char *buf);
 
 #ifdef CONFIG_CPU_FREQ
-int cpufreq_boost_trigger_state(int state);
 bool cpufreq_boost_enabled(void);
-int cpufreq_enable_boost_support(void);
-bool policy_has_boost_freq(struct cpufreq_policy *policy);
+int cpufreq_boost_set_sw(struct cpufreq_policy *policy, int state);
 
 /* Find lowest freq at or above target in a table in ascending order */
 static inline int cpufreq_table_find_index_al(struct cpufreq_policy *policy,
···
 	return 0;
 }
 #else
-static inline int cpufreq_boost_trigger_state(int state)
-{
-	return 0;
-}
 static inline bool cpufreq_boost_enabled(void)
 {
 	return false;
 }
 
-static inline int cpufreq_enable_boost_support(void)
+static inline int cpufreq_boost_set_sw(struct cpufreq_policy *policy, int state)
 {
-	return -EINVAL;
-}
-
-static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
-{
-	return false;
+	return -EOPNOTSUPP;
 }
 
 static inline int
···
 /* the following are really really optional */
 extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs;
 extern struct freq_attr cpufreq_freq_attr_scaling_boost_freqs;
-extern struct freq_attr *cpufreq_generic_attr[];
 int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy);
 
 unsigned int cpufreq_generic_get(unsigned int cpu);
include/linux/device.h
···
 	return !!(dev->power.driver_flags & flags);
 }
 
+static inline bool dev_pm_smart_suspend(struct device *dev)
+{
+#ifdef CONFIG_PM_SLEEP
+	return dev->power.smart_suspend;
+#else
+	return false;
+#endif
+}
+
 static inline void device_lock(struct device *dev)
 {
 	mutex_lock(&dev->mutex);
include/linux/energy_model.h
···
 struct em_perf_domain *em_cpu_get(int cpu);
 struct em_perf_domain *em_pd_get(struct device *dev);
 int em_dev_update_perf_domain(struct device *dev,
-			      struct em_perf_table __rcu *new_table);
+			      struct em_perf_table *new_table);
 int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
-				struct em_data_callback *cb, cpumask_t *span,
-				bool microwatts);
+				const struct em_data_callback *cb,
+				const cpumask_t *cpus, bool microwatts);
 void em_dev_unregister_perf_domain(struct device *dev);
-struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd);
-void em_table_free(struct em_perf_table __rcu *table);
+struct em_perf_table *em_table_alloc(struct em_perf_domain *pd);
+void em_table_free(struct em_perf_table *table);
 int em_dev_compute_costs(struct device *dev, struct em_perf_state *table,
 			 int nr_states);
 int em_dev_update_chip_binning(struct device *dev);
···
 
 static inline
 int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
-				struct em_data_callback *cb, cpumask_t *span,
-				bool microwatts)
+				const struct em_data_callback *cb,
+				const cpumask_t *cpus, bool microwatts)
 {
 	return -EINVAL;
 }
···
 	return 0;
 }
 static inline
-struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd)
+struct em_perf_table *em_table_alloc(struct em_perf_domain *pd)
 {
 	return NULL;
 }
-static inline void em_table_free(struct em_perf_table __rcu *table) {}
+static inline void em_table_free(struct em_perf_table *table) {}
 static inline
 int em_dev_update_perf_domain(struct device *dev,
-			      struct em_perf_table __rcu *new_table)
+			      struct em_perf_table *new_table)
 {
 	return -EINVAL;
 }
include/linux/pm.h
···
 	RPM_RESUMING,
 	RPM_SUSPENDED,
 	RPM_SUSPENDING,
+	RPM_BLOCKED,
 };
 
 /*
···
 	bool			wakeup_path:1;
 	bool			syscore:1;
 	bool			no_pm_callbacks:1;	/* Owned by the PM core */
-	bool			async_in_progress:1;	/* Owned by the PM core */
+	bool			work_in_progress:1;	/* Owned by the PM core */
+	bool			smart_suspend:1;	/* Owned by the PM core */
 	bool			must_resume:1;		/* Owned by the PM core */
-	bool			set_active:1;		/* Owned by the PM core */
 	bool			may_skip_resume:1;	/* Set by subsystems */
 #else
 	bool			should_wakeup:1;
···
 extern int pm_generic_resume_noirq(struct device *dev);
 extern int pm_generic_resume(struct device *dev);
 extern int pm_generic_freeze_noirq(struct device *dev);
-extern int pm_generic_freeze_late(struct device *dev);
 extern int pm_generic_freeze(struct device *dev);
 extern int pm_generic_thaw_noirq(struct device *dev);
-extern int pm_generic_thaw_early(struct device *dev);
 extern int pm_generic_thaw(struct device *dev);
 extern int pm_generic_restore_noirq(struct device *dev);
 extern int pm_generic_restore_early(struct device *dev);
···
 #define pm_generic_resume_noirq		NULL
 #define pm_generic_resume		NULL
 #define pm_generic_freeze_noirq		NULL
-#define pm_generic_freeze_late		NULL
 #define pm_generic_freeze		NULL
 #define pm_generic_thaw_noirq		NULL
-#define pm_generic_thaw_early		NULL
 #define pm_generic_thaw			NULL
 #define pm_generic_restore_noirq	NULL
 #define pm_generic_restore_early	NULL
include/linux/pm_clock.h
···
 extern void pm_clk_destroy(struct device *dev);
 extern int pm_clk_add(struct device *dev, const char *con_id);
 extern int pm_clk_add_clk(struct device *dev, struct clk *clk);
-extern int of_pm_clk_add_clk(struct device *dev, const char *name);
 extern int of_pm_clk_add_clks(struct device *dev);
-extern void pm_clk_remove(struct device *dev, const char *con_id);
 extern void pm_clk_remove_clk(struct device *dev, struct clk *clk);
 extern int pm_clk_suspend(struct device *dev);
 extern int pm_clk_resume(struct device *dev);
···
 static inline int of_pm_clk_add_clks(struct device *dev)
 {
 	return -EINVAL;
-}
-static inline void pm_clk_remove(struct device *dev, const char *con_id)
-{
 }
 #define pm_clk_suspend	NULL
 #define pm_clk_resume	NULL
include/linux/pm_runtime.h
···
 
 extern int pm_generic_runtime_suspend(struct device *dev);
 extern int pm_generic_runtime_resume(struct device *dev);
+extern bool pm_runtime_need_not_resume(struct device *dev);
 extern int pm_runtime_force_suspend(struct device *dev);
 extern int pm_runtime_force_resume(struct device *dev);
 
···
 extern int pm_schedule_suspend(struct device *dev, unsigned int delay);
 extern int __pm_runtime_set_status(struct device *dev, unsigned int status);
 extern int pm_runtime_barrier(struct device *dev);
+extern bool pm_runtime_block_if_disabled(struct device *dev);
+extern void pm_runtime_unblock(struct device *dev);
 extern void pm_runtime_enable(struct device *dev);
 extern void __pm_runtime_disable(struct device *dev, bool check_resume);
 extern void pm_runtime_allow(struct device *dev);
···
 }
 
 /**
+ * pm_runtime_blocked - Check if runtime PM enabling is blocked.
+ * @dev: Target device.
+ *
+ * Do not call this function outside system suspend/resume code paths.
+ */
+static inline bool pm_runtime_blocked(struct device *dev)
+{
+	return dev->power.last_status == RPM_BLOCKED;
+}
+
+/**
  * pm_runtime_has_no_callbacks - Check if runtime PM callbacks may be present.
  * @dev: Target device.
  *
···
 
 static inline int pm_generic_runtime_suspend(struct device *dev) { return 0; }
 static inline int pm_generic_runtime_resume(struct device *dev) { return 0; }
+static inline bool pm_runtime_need_not_resume(struct device *dev) { return true; }
 static inline int pm_runtime_force_suspend(struct device *dev) { return 0; }
 static inline int pm_runtime_force_resume(struct device *dev) { return 0; }
 
···
 static inline int __pm_runtime_set_status(struct device *dev,
 					  unsigned int status) { return 0; }
 static inline int pm_runtime_barrier(struct device *dev) { return 0; }
+static inline bool pm_runtime_block_if_disabled(struct device *dev) { return true; }
+static inline void pm_runtime_unblock(struct device *dev) {}
 static inline void pm_runtime_enable(struct device *dev) {}
 static inline void __pm_runtime_disable(struct device *dev, bool c) {}
+static inline bool pm_runtime_blocked(struct device *dev) { return true; }
 static inline void pm_runtime_allow(struct device *dev) {}
 static inline void pm_runtime_forbid(struct device *dev) {}
 
···
  * pm_runtime_disable - Disable runtime PM for a device.
  * @dev: Target device.
  *
- * Prevent the runtime PM framework from working with @dev (by incrementing its
- * "blocking" counter).
+ * Prevent the runtime PM framework from working with @dev by incrementing its
+ * "disable" counter.
  *
- * For each invocation of this function for @dev there must be a matching
- * pm_runtime_enable() call in order for runtime PM to be enabled for it.
+ * If the counter is zero when this function runs and there is a pending runtime
+ * resume request for @dev, it will be resumed.  If the counter is still zero at
+ * that point, all of the pending runtime PM requests for @dev will be canceled
+ * and all runtime PM operations in progress involving it will be waited for to
+ * complete.
+ *
+ * For each invocation of this function for @dev, there must be a matching
+ * pm_runtime_enable() call, so that runtime PM is eventually enabled for it
+ * again.
  */
 static inline void pm_runtime_disable(struct device *dev)
 {
include/linux/pm_wakeup.h
···
 
 static inline void __pm_wakeup_event(struct wakeup_source *ws, unsigned int msec)
 {
-	return pm_wakeup_ws_event(ws, msec, false);
+	pm_wakeup_ws_event(ws, msec, false);
 }
 
 static inline void pm_wakeup_event(struct device *dev, unsigned int msec)
 {
-	return pm_wakeup_dev_event(dev, msec, false);
+	pm_wakeup_dev_event(dev, msec, false);
 }
 
 static inline void pm_wakeup_hard_event(struct device *dev)
 {
-	return pm_wakeup_dev_event(dev, 0, true);
+	pm_wakeup_dev_event(dev, 0, true);
 }
 
 /**
kernel/power/Kconfig
···
 
 config ENERGY_MODEL
 	bool "Energy Model for devices with DVFS (CPUs, GPUs, etc)"
-	depends on SMP
-	depends on CPU_FREQ
+	depends on CPU_FREQ || PM_DEVFREQ
 	help
 	  Several subsystems (thermal and/or the task scheduler for example)
 	  can leverage information about the energy consumed by devices to
kernel/power/energy_model.c
···
 static void em_debug_remove_pd(struct device *dev) {}
 #endif
 
-static void em_destroy_table_rcu(struct rcu_head *rp)
-{
-	struct em_perf_table __rcu *table;
-
-	table = container_of(rp, struct em_perf_table, rcu);
-	kfree(table);
-}
-
 static void em_release_table_kref(struct kref *kref)
 {
-	struct em_perf_table __rcu *table;
-
 	/* It was the last owner of this table so we can free */
-	table = container_of(kref, struct em_perf_table, kref);
-
-	call_rcu(&table->rcu, em_destroy_table_rcu);
+	kfree_rcu(container_of(kref, struct em_perf_table, kref), rcu);
 }
 
 /**
···
  *
  * No return values.
  */
-void em_table_free(struct em_perf_table __rcu *table)
+void em_table_free(struct em_perf_table *table)
 {
 	kref_put(&table->kref, em_release_table_kref);
 }
···
  * has a user.
  * Returns allocated table or NULL.
  */
-struct em_perf_table __rcu *em_table_alloc(struct em_perf_domain *pd)
+struct em_perf_table *em_table_alloc(struct em_perf_domain *pd)
 {
-	struct em_perf_table __rcu *table;
+	struct em_perf_table *table;
 	int table_size;
 
 	table_size = sizeof(struct em_perf_state) * pd->nr_perf_states;
···
 }
 
 static int em_compute_costs(struct device *dev, struct em_perf_state *table,
-			    struct em_data_callback *cb, int nr_states,
+			    const struct em_data_callback *cb, int nr_states,
 			    unsigned long flags)
 {
 	unsigned long prev_cost = ULONG_MAX;
···
  * Return 0 on success or an error code on failure.
  */
 int em_dev_update_perf_domain(struct device *dev,
-			      struct em_perf_table __rcu *new_table)
+			      struct em_perf_table *new_table)
 {
-	struct em_perf_table __rcu *old_table;
+	struct em_perf_table *old_table;
 	struct em_perf_domain *pd;
 
 	if (!dev)
···
 
 	kref_get(&new_table->kref);
 
-	old_table = pd->em_table;
+	old_table = rcu_dereference_protected(pd->em_table,
+					      lockdep_is_held(&em_pd_mutex));
 	rcu_assign_pointer(pd->em_table, new_table);
 
 	em_cpufreq_update_efficiencies(dev, new_table->state);
···
 
 static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 				struct em_perf_state *table,
-				struct em_data_callback *cb,
+				const struct em_data_callback *cb,
 				unsigned long flags)
 {
 	unsigned long power, freq, prev_freq = 0;
···
 }
 
 static int em_create_pd(struct device *dev, int nr_states,
-			struct em_data_callback *cb, cpumask_t *cpus,
+			const struct em_data_callback *cb,
+			const cpumask_t *cpus,
 			unsigned long flags)
 {
-	struct em_perf_table __rcu *em_table;
+	struct em_perf_table *em_table;
 	struct em_perf_domain *pd;
 	struct device *cpu_dev;
 	int cpu, ret, num_cpus;
···
  * Return 0 on success
  */
 int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
-				struct em_data_callback *cb, cpumask_t *cpus,
-				bool microwatts)
+				const struct em_data_callback *cb,
+				const cpumask_t *cpus, bool microwatts)
 {
+	struct em_perf_table *em_table;
 	unsigned long cap, prev_cap = 0;
 	unsigned long flags = 0;
 	int cpu, ret;
···
 	dev->em_pd->min_perf_state = 0;
 	dev->em_pd->max_perf_state = nr_states - 1;
 
-	em_cpufreq_update_efficiencies(dev, dev->em_pd->em_table->state);
+	em_table = rcu_dereference_protected(dev->em_pd->em_table,
+					     lockdep_is_held(&em_pd_mutex));
+	em_cpufreq_update_efficiencies(dev, em_table->state);
 
 	em_debug_create_pd(dev);
 	dev_info(dev, "EM: created perf domain\n");
···
 	mutex_lock(&em_pd_mutex);
 	em_debug_remove_pd(dev);
 
-	em_table_free(dev->em_pd->em_table);
+	em_table_free(rcu_dereference_protected(dev->em_pd->em_table,
+						lockdep_is_held(&em_pd_mutex)));
 
 	kfree(dev->em_pd);
 	dev->em_pd = NULL;
···
 }
 EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain);
 
-static struct em_perf_table __rcu *em_table_dup(struct em_perf_domain *pd)
+static struct em_perf_table *em_table_dup(struct em_perf_domain *pd)
 {
-	struct em_perf_table __rcu *em_table;
+	struct em_perf_table *em_table;
 	struct em_perf_state *ps, *new_ps;
 	int ps_size;
 
···
 }
 
 static int em_recalc_and_update(struct device *dev, struct em_perf_domain *pd,
-				struct em_perf_table __rcu *em_table)
+				struct em_perf_table *em_table)
 {
 	int ret;
 
···
  * are correctly calculated.
  */
 static void em_adjust_new_capacity(struct device *dev,
-				   struct em_perf_domain *pd,
-				   u64 max_cap)
+				   struct em_perf_domain *pd)
 {
-	struct em_perf_table __rcu *em_table;
+	struct em_perf_table *em_table;
 
 	em_table = em_table_dup(pd);
 	if (!em_table) {
···
 		}
 		cpufreq_cpu_put(policy);
 
-		pd = em_cpu_get(cpu);
+		dev = get_cpu_device(cpu);
+		pd = em_pd_get(dev);
 		if (!pd || em_is_artificial(pd))
 			continue;
 
···
 		pr_debug("updating cpu%d cpu_cap=%lu old capacity=%lu\n",
 			 cpu, cpu_capacity, em_max_perf);
 
-		dev = get_cpu_device(cpu);
-		em_adjust_new_capacity(dev, pd, cpu_capacity);
+		em_adjust_new_capacity(dev, pd);
 	}
 
 	free_cpumask_var(cpu_done_mask);
···
  */
 int em_dev_update_chip_binning(struct device *dev)
 {
-	struct em_perf_table __rcu *em_table;
+	struct em_perf_table *em_table;
 	struct em_perf_domain *pd;
 	int i, ret;
 
kernel/power/hibernate.c
··· 1446 1446 static int hibernate_compressor_param_set(const char *compressor, 1447 1447 const struct kernel_param *kp) 1448 1448 { 1449 - unsigned int sleep_flags; 1450 1449 int index, ret; 1451 1450 1452 - sleep_flags = lock_system_sleep(); 1451 + if (!mutex_trylock(&system_transition_mutex)) 1452 + return -EBUSY; 1453 1453 1454 1454 index = sysfs_match_string(comp_alg_enabled, compressor); 1455 1455 if (index >= 0) { ··· 1461 1461 ret = index; 1462 1462 } 1463 1463 1464 - unlock_system_sleep(sleep_flags); 1464 + mutex_unlock(&system_transition_mutex); 1465 1465 1466 1466 if (ret) 1467 1467 pr_debug("Cannot set specified compressor %s\n",
kernel/power/snapshot.c (+8 -8)
···
 	 */
 	void *kaddr;

-	kaddr = kmap_atomic(page);
+	kaddr = kmap_local_page(page);
 	copy_page(buffer, kaddr);
-	kunmap_atomic(kaddr);
+	kunmap_local(kaddr);
 	handle->buffer = buffer;
 } else {
 	handle->buffer = page_address(page);
···
 	if (last_highmem_page) {
 		void *dst;

-		dst = kmap_atomic(last_highmem_page);
+		dst = kmap_local_page(last_highmem_page);
 		copy_page(dst, buffer);
-		kunmap_atomic(dst);
+		kunmap_local(dst);
 		last_highmem_page = NULL;
 	}
 }
···
 {
 	void *kaddr1, *kaddr2;

-	kaddr1 = kmap_atomic(p1);
-	kaddr2 = kmap_atomic(p2);
+	kaddr1 = kmap_local_page(p1);
+	kaddr2 = kmap_local_page(p2);
 	copy_page(buf, kaddr1);
 	copy_page(kaddr1, kaddr2);
 	copy_page(kaddr2, buf);
-	kunmap_atomic(kaddr2);
-	kunmap_atomic(kaddr1);
+	kunmap_local(kaddr2);
+	kunmap_local(kaddr1);
 }

 /**
kernel/power/suspend.c (+10 -4)
···
 {
 	trace_suspend_resume(TPS("machine_suspend"), PM_SUSPEND_TO_IDLE, true);

+	/*
+	 * The correctness of the code below depends on the number of online
+	 * CPUs being stable, but CPUs cannot be taken offline or put online
+	 * while it is running.
+	 *
+	 * The s2idle_lock must be acquired before the pending wakeup check to
+	 * prevent pm_system_wakeup() from running as a whole between that check
+	 * and the subsequent s2idle_state update in which case a wakeup event
+	 * would get lost.
+	 */
 	raw_spin_lock_irq(&s2idle_lock);
 	if (pm_wakeup_pending())
 		goto out;

 	s2idle_state = S2IDLE_STATE_ENTER;
 	raw_spin_unlock_irq(&s2idle_lock);
-
-	cpus_read_lock();

 	/* Push all the CPUs into the idle loop. */
 	wake_up_all_idle_cpus();
···
 	 * consistent system state.
 	 */
 	wake_up_all_idle_cpus();
-
-	cpus_read_unlock();

 	raw_spin_lock_irq(&s2idle_lock);
tools/power/cpupower/Makefile (+11 -8)
···
 # and _should_ modify the PACKAGE_BUGREPORT definition

 VERSION:=	$(shell ./utils/version-gen.sh)
-LIB_MAJ=	0.0.1
-LIB_MIN=	1
+LIB_FIX=	1
+LIB_MIN=	0
+LIB_MAJ=	1
+LIB_VER=	$(LIB_MAJ).$(LIB_MIN).$(LIB_FIX)
+

 PACKAGE =	cpupower
 PACKAGE_BUGREPORT =	linux-pm@vger.kernel.org
···
 	$(ECHO) "  CC      " $@
 	$(QUIET) $(CC) $(CFLAGS) -fPIC -o $@ -c lib/$*.c

-$(OUTPUT)libcpupower.so.$(LIB_MAJ): $(LIB_OBJS)
+$(OUTPUT)libcpupower.so.$(LIB_VER): $(LIB_OBJS)
 	$(ECHO) "  LD      " $@
 	$(QUIET) $(CC) -shared $(CFLAGS) $(LDFLAGS) -o $@ \
-		-Wl,-soname,libcpupower.so.$(LIB_MIN) $(LIB_OBJS)
+		-Wl,-soname,libcpupower.so.$(LIB_MAJ) $(LIB_OBJS)
 	@ln -sf $(@F) $(OUTPUT)libcpupower.so
-	@ln -sf $(@F) $(OUTPUT)libcpupower.so.$(LIB_MIN)
+	@ln -sf $(@F) $(OUTPUT)libcpupower.so.$(LIB_MAJ)

-libcpupower: $(OUTPUT)libcpupower.so.$(LIB_MAJ)
+libcpupower: $(OUTPUT)libcpupower.so.$(LIB_VER)

 # Let all .o files depend on its .c file and all headers
 # Might be worth to put this into utils/Makefile at some point of time
···
 	$(ECHO) "  CC      " $@
 	$(QUIET) $(CC) $(CFLAGS) -I./lib -I ./utils -o $@ -c $*.c

-$(OUTPUT)cpupower: $(UTIL_OBJS) $(OUTPUT)libcpupower.so.$(LIB_MAJ)
+$(OUTPUT)cpupower: $(UTIL_OBJS) $(OUTPUT)libcpupower.so.$(LIB_VER)
 	$(ECHO) "  CC      " $@
 ifeq ($(strip $(STATIC)),true)
 	$(QUIET) $(CC) $(CFLAGS) $(LDFLAGS) $(UTIL_OBJS) -lrt -lpci -L$(OUTPUT) -o $@
···
 	done;
 endif

-compile-bench: $(OUTPUT)libcpupower.so.$(LIB_MAJ)
+compile-bench: $(OUTPUT)libcpupower.so.$(LIB_VER)
 	@V=$(V) confdir=$(confdir) $(MAKE) -C bench O=$(OUTPUT)

 # we compile into subdirectories. if the target directory is not the
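The Makefile hunk above untangles the library versioning: the built file now carries the full `MAJ.MIN.FIX` version while the SONAME (what the dynamic linker records in dependents) carries only the major version, so minor/fix releases stay binary compatible without relinking consumers. A sketch of the resulting file layout, using a hypothetical `libfoo` in a scratch directory in place of libcpupower:

```shell
# Recreate the symlink layout the install rules produce.
mkdir -p /tmp/soname-demo && cd /tmp/soname-demo

LIB_MAJ=1 LIB_MIN=0 LIB_FIX=1
LIB_VER="$LIB_MAJ.$LIB_MIN.$LIB_FIX"

# Real shared object carries the full version: libfoo.so.1.0.1
touch "libfoo.so.$LIB_VER"
# Dev link used at link time (-lfoo):
ln -sf "libfoo.so.$LIB_VER" libfoo.so
# Runtime link matching the -Wl,-soname,libfoo.so.$LIB_MAJ embedded name:
ln -sf "libfoo.so.$LIB_VER" "libfoo.so.$LIB_MAJ"

ls -l libfoo.so*
```

Bumping `LIB_FIX` or `LIB_MIN` then only replaces the target of the two symlinks; existing binaries keep resolving `libfoo.so.1`.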
tools/power/cpupower/bench/parse.c (+4)
···
 struct config *prepare_default_config()
 {
 	struct config *config = malloc(sizeof(struct config));
+	if (!config) {
+		perror("malloc");
+		return NULL;
+	}

 	dprintf("loading defaults\n");
tools/power/cpupower/lib/cpupower.c (+40 -8)
···
 #include <stdio.h>
 #include <errno.h>
 #include <stdlib.h>
+#include <string.h>

 #include "cpupower.h"
 #include "cpupower_intern.h"
···
 	return 0;
 }

+static int __compare_core_cpu_list(const void *t1, const void *t2)
+{
+	struct cpuid_core_info *top1 = (struct cpuid_core_info *)t1;
+	struct cpuid_core_info *top2 = (struct cpuid_core_info *)t2;
+
+	return strcmp(top1->core_cpu_list, top2->core_cpu_list);
+}
+
 /*
  * Returns amount of cpus, negative on error, cpu_top must be
  * passed to cpu_topology_release to free resources
  *
- * Array is sorted after ->pkg, ->core, then ->cpu
+ * Array is sorted after ->cpu_smt_list ->pkg, ->core
  */
 int get_cpu_topology(struct cpupower_topology *cpu_top)
 {
 	int cpu, last_pkg, cpus = sysconf(_SC_NPROCESSORS_CONF);
+	char path[SYSFS_PATH_MAX];
+	char *last_cpu_list;

 	cpu_top->core_info = malloc(sizeof(struct cpuid_core_info) * cpus);
 	if (cpu_top->core_info == NULL)
···
 			cpu_top->core_info[cpu].core = -1;
 			continue;
 		}
+		if (cpu_top->core_info[cpu].core == -1) {
+			strncpy(cpu_top->core_info[cpu].core_cpu_list, "-1", CPULIST_BUFFER);
+			continue;
+		}
+		snprintf(path, sizeof(path), PATH_TO_CPU "cpu%u/topology/%s",
+			 cpu, "core_cpus_list");
+		if (cpupower_read_sysfs(
+				path,
+				cpu_top->core_info[cpu].core_cpu_list,
+				CPULIST_BUFFER) < 1) {
+			printf("Warning CPU%u has a 0 size core_cpus_list string", cpu);
+		}
+	}
+
+	/* Count the number of distinct cpu lists to get the physical core
+	 * count.
+	 */
+	qsort(cpu_top->core_info, cpus, sizeof(struct cpuid_core_info),
+	      __compare_core_cpu_list);
+
+	last_cpu_list = cpu_top->core_info[0].core_cpu_list;
+	cpu_top->cores = 1;
+	for (cpu = 1; cpu < cpus; cpu++) {
+		if (strcmp(cpu_top->core_info[cpu].core_cpu_list, last_cpu_list) != 0 &&
+		    cpu_top->core_info[cpu].pkg != -1) {
+			last_cpu_list = cpu_top->core_info[cpu].core_cpu_list;
+			cpu_top->cores++;
+		}
 	}

 	qsort(cpu_top->core_info, cpus, sizeof(struct cpuid_core_info),
···
 	if (!(cpu_top->core_info[0].pkg == -1))
 		cpu_top->pkgs++;

-	/* Intel's cores count is not consecutively numbered, there may
-	 * be a core_id of 3, but none of 2. Assume there always is 0
-	 * Get amount of cores by counting duplicates in a package
-	for (cpu = 0; cpu_top->core_info[cpu].pkg = 0 && cpu < cpus; cpu++) {
-		if (cpu_top->core_info[cpu].core == 0)
-			cpu_top->cores++;
-	*/
 	return cpus;
 }
tools/power/cpupower/lib/cpupower.h (+3)
···
 #ifndef __CPUPOWER_CPUPOWER_H__
 #define __CPUPOWER_CPUPOWER_H__

+#define CPULIST_BUFFER 5
+
 struct cpupower_topology {
 	/* Amount of CPU cores, packages and threads per core in the system */
 	unsigned int cores;
···
 	int pkg;
 	int core;
 	int cpu;
+	char core_cpu_list[CPULIST_BUFFER];

 	/* flags */
 	unsigned int is_online:1;
tools/power/cpupower/utils/idle_monitor/cpupower-monitor.c (+36 -12)
···
  */

+#include <errno.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <stdlib.h>
···
 	return 0;
 }

-#define MAX_COL_WIDTH 6
+#define MAX_COL_WIDTH		6
+#define TOPOLOGY_DEPTH_PKG	3
+#define TOPOLOGY_DEPTH_CORE	2
+#define TOPOLOGY_DEPTH_CPU	1
+
 void print_header(int topology_depth)
 {
 	int unsigned mon;
···
 	}
 	printf("\n");

-	if (topology_depth > 2)
+	switch (topology_depth) {
+	case TOPOLOGY_DEPTH_PKG:
 		printf(" PKG|");
-	if (topology_depth > 1)
+		break;
+	case TOPOLOGY_DEPTH_CORE:
 		printf("CORE|");
-	if (topology_depth > 0)
+		break;
+	case TOPOLOGY_DEPTH_CPU:
 		printf(" CPU|");
+		break;
+	default:
+		return;
+	}

 	for (mon = 0; mon < avail_monitors; mon++) {
 		if (mon != 0)
···
 	    cpu_top.core_info[cpu].pkg == -1)
 		return;

-	if (topology_depth > 2)
+	switch (topology_depth) {
+	case TOPOLOGY_DEPTH_PKG:
 		printf("%4d|", cpu_top.core_info[cpu].pkg);
-	if (topology_depth > 1)
+		break;
+	case TOPOLOGY_DEPTH_CORE:
 		printf("%4d|", cpu_top.core_info[cpu].core);
-	if (topology_depth > 0)
+		break;
+	case TOPOLOGY_DEPTH_CPU:
 		printf("%4d|", cpu_top.core_info[cpu].cpu);
+		break;
+	default:
+		return;
+	}

 	for (mon = 0; mon < avail_monitors; mon++) {
 		if (mon != 0)
···

 	if (!child_pid) {
 		/* child */
-		execvp(argv[0], argv);
+		if (execvp(argv[0], argv) == -1) {
+			printf("Invalid monitor command %s\n", argv[0]);
+			exit(errno);
+		}
 	} else {
 		/* parent */
 		if (child_pid == -1) {
···

 	if (avail_monitors == 0) {
 		printf(_("No HW Cstate monitors found\n"));
+		cpu_topology_release(cpu_top);
 		return 1;
 	}

 	if (mode == list) {
 		list_monitors();
+		cpu_topology_release(cpu_top);
 		exit(EXIT_SUCCESS);
 	}
···
 	/* ToDo: Topology parsing needs fixing first to do
 	   this more generically */
 	if (cpu_top.pkgs > 1)
-		print_header(3);
+		print_header(TOPOLOGY_DEPTH_PKG);
 	else
-		print_header(1);
+		print_header(TOPOLOGY_DEPTH_CPU);

 	for (cpu = 0; cpu < cpu_count; cpu++) {
 		if (cpu_top.pkgs > 1)
-			print_results(3, cpu);
+			print_results(TOPOLOGY_DEPTH_PKG, cpu);
 		else
-			print_results(1, cpu);
+			print_results(TOPOLOGY_DEPTH_CPU, cpu);
 	}

 	for (num = 0; num < avail_monitors; num++) {