Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"These are mostly fixes and cleanups all over the code and a new piece
of documentation for Intel uncore frequency scaling.

Functionality-wise, the intel_idle driver will support Sapphire Rapids
Xeons natively now (with some extra facilities for controlling
C-states more precisely on those systems), virtual guests will take
the ACPI S4 hardware signature into account by default, the
intel_pstate driver will take the default EPP value from the firmware,
cpupower utility will support the AMD P-state driver added in the
previous cycle, and there is a new tracer utility for that driver.

Specifics:

- Allow device_pm_check_callbacks() to be called from interrupt
context without issues (Dmitry Baryshkov).

- Modify devm_pm_runtime_enable() to automatically handle
pm_runtime_dont_use_autosuspend() at driver exit time (Douglas
Anderson).

- Make the schedutil cpufreq governor use to_gov_attr_set() instead
of open coding it (Kevin Hao).

- Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the
cpufreq longhaul driver (Rafael Wysocki).

- Unify show() and store() naming in cpufreq and make it use
__ATTR_XX (Lianjie Zhang).

- Make the intel_pstate driver use the EPP value set by the firmware
by default (Srinivas Pandruvada).

- Re-order the init checks in the powernow-k8 cpufreq driver (Mario
Limonciello).

- Make the ACPI processor idle driver check for architectural support
for LPI to avoid using it on x86 by mistake (Mario Limonciello).

- Add Sapphire Rapids Xeon support to the intel_idle driver (Artem
Bityutskiy).

- Add 'preferred_cstates' module argument to the intel_idle driver to
work around C1 and C1E handling issue on Sapphire Rapids (Artem
Bityutskiy).

- Add core C6 optimization on Sapphire Rapids to the intel_idle
driver (Artem Bityutskiy).

- Optimize the haltpoll cpuidle driver a bit (Li RongQing).

- Remove leftover text from intel_idle() kerneldoc comment and fix up
white space in intel_idle (Rafael Wysocki).

- Fix load_image_and_restore() error path (Ye Bin).

- Fix typos in comments in the system wakeup handling code (Tom Rix).

- Clean up non-kernel-doc comments in hibernation code (Jiapeng
Chong).

- Fix __setup handler error handling in system-wide suspend and
hibernation core code (Randy Dunlap).

- Add device name to suspend_report_result() (Youngjin Jang).

- Make virtual guests honour ACPI S4 hardware signature by default
(David Woodhouse).

- Block power off of a parent PM domain unless child is in deepest
state (Ulf Hansson).

- Use dev_err_probe() to simplify error handling for generic PM
domains (Ahmad Fatoum).

- Fix sleep-in-atomic bug caused by genpd_debug_remove() (Shawn Guo).

- Document Intel uncore frequency scaling (Srinivas Pandruvada).

- Add DTPM hierarchy description (Daniel Lezcano).

- Change the locking scheme in DTPM (Daniel Lezcano).

- Fix dtpm_cpu cleanup at exit time and missing virtual DTPM pointer
release (Daniel Lezcano).

- Make dtpm_node_callback[] static (kernel test robot).

- Fix spelling mistake "initialze" -> "initialize" in
dtpm_create_hierarchy() (Colin Ian King).

- Add tracer tool for the amd-pstate driver (Jinzhou Su).

- Fix PC6 displaying in turbostat on some systems (Artem Bityutskiy).

- Add AMD P-State support to the cpupower utility (Huang Rui)"

* tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (58 commits)
cpufreq: powernow-k8: Re-order the init checks
cpuidle: intel_idle: Drop redundant backslash at line end
cpuidle: intel_idle: Update intel_idle() kerneldoc comment
PM: hibernate: Honour ACPI hardware signature by default for virtual guests
cpufreq: intel_pstate: Use firmware default EPP
cpufreq: unify show() and store() naming and use __ATTR_XX
PM: core: keep irq flags in device_pm_check_callbacks()
cpuidle: haltpoll: Call cpuidle_poll_state_init() later
Documentation: amd-pstate: add tracer tool introduction
tools/power/x86/amd_pstate_tracer: Add tracer tool for AMD P-state
tools/power/x86/intel_pstate_tracer: make tracer as a module
cpufreq: amd-pstate: Add more tracepoint for AMD P-State module
PM: sleep: Add device name to suspend_report_result()
turbostat: fix PC6 displaying on some systems
intel_idle: add core C6 optimization for SPR
intel_idle: add 'preferred_cstates' module argument
intel_idle: add SPR support
PM: runtime: Have devm_pm_runtime_enable() handle pm_runtime_dont_use_autosuspend()
ACPI: processor idle: Check for architectural support for LPI
cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
...

+1878 -408
+26
Documentation/admin-guide/pm/amd-pstate.rst
··· 369 369 policies with other scaling governors). 370 370 371 371 372 + Tracer Tool 373 + ------------- 374 + 375 + ``amd_pstate_tracer.py`` can record and parse ``amd-pstate`` trace log, then 376 + generate performance plots. This utility can be used to debug and tune the 377 + performance of ``amd-pstate`` driver. The tracer tool needs to import intel 378 + pstate tracer. 379 + 380 + Tracer tool located in ``linux/tools/power/x86/amd_pstate_tracer``. It can be 381 + used in two ways. If trace file is available, then directly parse the file 382 + with command :: 383 + 384 + ./amd_pstate_trace.py [-c cpus] -t <trace_file> -n <test_name> 385 + 386 + Or generate trace file with root privilege, then parse and plot with command :: 387 + 388 + sudo ./amd_pstate_trace.py [-c cpus] -n <test_name> -i <interval> [-m kbytes] 389 + 390 + The test result can be found in ``results/test_name``. Following is the example 391 + about part of the output. :: 392 + 393 + common_cpu common_secs common_usecs min_perf des_perf max_perf freq mperf apef tsc load duration_ms sample_num elapsed_time common_comm 394 + CPU_005 712 116384 39 49 166 0.7565 9645075 2214891 38431470 25.1 11.646 469 2.496 kworker/5:0-40 395 + CPU_006 712 116408 39 49 166 0.6769 8950227 1839034 37192089 24.06 11.272 470 2.496 kworker/6:0-1264 396 + 397 + 372 398 Reference 373 399 =========== 374 400
+60
Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + .. include:: <isonum.txt> 3 + 4 + ============================== 5 + Intel Uncore Frequency Scaling 6 + ============================== 7 + 8 + :Copyright: |copy| 2022 Intel Corporation 9 + 10 + :Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> 11 + 12 + Introduction 13 + ------------ 14 + 15 + The uncore can consume significant amount of power in Intel's Xeon servers based 16 + on the workload characteristics. To optimize the total power and improve overall 17 + performance, SoCs have internal algorithms for scaling uncore frequency. These 18 + algorithms monitor workload usage of uncore and set a desirable frequency. 19 + 20 + It is possible that users have different expectations of uncore performance and 21 + want to have control over it. The objective is similar to allowing users to set 22 + the scaling min/max frequencies via cpufreq sysfs to improve CPU performance. 23 + Users may have some latency sensitive workloads where they do not want any 24 + change to uncore frequency. Also, users may have workloads which require 25 + different core and uncore performance at distinct phases and they may want to 26 + use both cpufreq and the uncore scaling interface to distribute power and 27 + improve overall performance. 28 + 29 + Sysfs Interface 30 + --------------- 31 + 32 + To control uncore frequency, a sysfs interface is provided in the directory: 33 + `/sys/devices/system/cpu/intel_uncore_frequency/`. 34 + 35 + There is one directory for each package and die combination as the scope of 36 + uncore scaling control is per die in multiple die/package SoCs or per 37 + package for single die per package SoCs. The name represents the 38 + scope of control. For example: 'package_00_die_00' is for package id 0 and 39 + die 0. 40 + 41 + Each package_*_die_* contains the following attributes: 42 + 43 + ``initial_max_freq_khz`` 44 + Out of reset, this attribute represent the maximum possible frequency. 
45 + This is a read-only attribute. If users adjust max_freq_khz, 46 + they can always go back to maximum using the value from this attribute. 47 + 48 + ``initial_min_freq_khz`` 49 + Out of reset, this attribute represent the minimum possible frequency. 50 + This is a read-only attribute. If users adjust min_freq_khz, 51 + they can always go back to minimum using the value from this attribute. 52 + 53 + ``max_freq_khz`` 54 + This attribute is used to set the maximum uncore frequency. 55 + 56 + ``min_freq_khz`` 57 + This attribute is used to set the minimum uncore frequency. 58 + 59 + ``current_freq_khz`` 60 + This attribute is used to get the current uncore frequency.
+1
Documentation/admin-guide/pm/working-state.rst
··· 15 15 cpufreq_drivers 16 16 intel_epb 17 17 intel-speed-select 18 + intel_uncore_frequency_scaling
+1
MAINTAINERS
··· 1002 1002 S: Supported 1003 1003 F: Documentation/admin-guide/pm/amd-pstate.rst 1004 1004 F: drivers/cpufreq/amd-pstate* 1005 + F: tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py 1005 1006 1006 1007 AMD PTDMA DRIVER 1007 1008 M: Sanjay R Mehta <sanju.mehta@amd.com>
+3 -3
arch/arm64/kernel/cpuidle.c
··· 54 54 struct acpi_lpi_state *lpi; 55 55 struct acpi_processor *pr = per_cpu(processors, cpu); 56 56 57 + if (unlikely(!pr || !pr->flags.has_lpi)) 58 + return -EINVAL; 59 + 57 60 /* 58 61 * If the PSCI cpu_suspend function hook has not been initialized 59 62 * idle states must not be enabled, so bail out 60 63 */ 61 64 if (!psci_ops.cpu_suspend) 62 65 return -EOPNOTSUPP; 63 - 64 - if (unlikely(!pr || !pr->flags.has_lpi)) 65 - return -EINVAL; 66 66 67 67 count = pr->power.count - 1; 68 68 if (count <= 0)
+21 -2
arch/x86/kernel/acpi/sleep.c
··· 15 15 #include <asm/desc.h> 16 16 #include <asm/cacheflush.h> 17 17 #include <asm/realmode.h> 18 + #include <asm/hypervisor.h> 18 19 19 20 #include <linux/ftrace.h> 20 21 #include "../../realmode/rm/wakeup.h" ··· 141 140 acpi_realmode_flags |= 4; 142 141 #ifdef CONFIG_HIBERNATION 143 142 if (strncmp(str, "s4_hwsig", 8) == 0) 144 - acpi_check_s4_hw_signature(1); 143 + acpi_check_s4_hw_signature = 1; 145 144 if (strncmp(str, "s4_nohwsig", 10) == 0) 146 - acpi_check_s4_hw_signature(0); 145 + acpi_check_s4_hw_signature = 0; 147 146 #endif 148 147 if (strncmp(str, "nonvs", 5) == 0) 149 148 acpi_nvs_nosave(); ··· 161 160 } 162 161 163 162 __setup("acpi_sleep=", acpi_sleep_setup); 163 + 164 + #if defined(CONFIG_HIBERNATION) && defined(CONFIG_HYPERVISOR_GUEST) 165 + static int __init init_s4_sigcheck(void) 166 + { 167 + /* 168 + * If running on a hypervisor, honour the ACPI specification 169 + * by default and trigger a clean reboot when the hardware 170 + * signature in FACS is changed after hibernation. 171 + */ 172 + if (acpi_check_s4_hw_signature == -1 && 173 + !hypervisor_is_type(X86_HYPER_NATIVE)) 174 + acpi_check_s4_hw_signature = 1; 175 + 176 + return 0; 177 + } 178 + /* This must happen before acpi_init() which is a subsys initcall */ 179 + arch_initcall(init_s4_sigcheck); 180 + #endif
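The hunk above leaves `acpi_check_s4_hw_signature` as a tristate: -1 (unset, warn only), 0 (ignore), 1 (enforce), and the new arch_initcall only flips the unset case for non-native (hypervisor) boots. A userspace sketch of that decision, with a hypothetical helper name, assuming the tristate semantics described in the commit:

```c
#include <assert.h>
#include <stdbool.h>

/* Tristate mirroring acpi_check_s4_hw_signature:
 * -1 = unset (warn only), 0 = ignore, 1 = enforce on mismatch. */
static int s4_sig_default(int current_val, bool is_native)
{
	/* Hypothetical helper: apply the arch_initcall policy from the
	 * hunk above - virtual guests get enforcement by default, but an
	 * explicit acpi_sleep=s4_hwsig/s4_nohwsig choice is never
	 * overridden. */
	if (current_val == -1 && !is_native)
		return 1;
	return current_val;
}
```

The guard on `-1` is what keeps the command-line options authoritative.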
+10 -5
drivers/acpi/processor_idle.c
··· 1080 1080 return 0; 1081 1081 } 1082 1082 1083 + int __weak acpi_processor_ffh_lpi_probe(unsigned int cpu) 1084 + { 1085 + return -EOPNOTSUPP; 1086 + } 1087 + 1083 1088 static int acpi_processor_get_lpi_info(struct acpi_processor *pr) 1084 1089 { 1085 1090 int ret, i; ··· 1092 1087 acpi_handle handle = pr->handle, pr_ahandle; 1093 1088 struct acpi_device *d = NULL; 1094 1089 struct acpi_lpi_states_array info[2], *tmp, *prev, *curr; 1090 + 1091 + /* make sure our architecture has support */ 1092 + ret = acpi_processor_ffh_lpi_probe(pr->id); 1093 + if (ret == -EOPNOTSUPP) 1094 + return ret; 1095 1095 1096 1096 if (!osc_pc_lpi_support_confirmed) 1097 1097 return -EOPNOTSUPP; ··· 1147 1137 pr->flags.power = 1; 1148 1138 1149 1139 return 0; 1150 - } 1151 - 1152 - int __weak acpi_processor_ffh_lpi_probe(unsigned int cpu) 1153 - { 1154 - return -ENODEV; 1155 1140 } 1156 1141 1157 1142 int __weak acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
+3 -8
drivers/acpi/sleep.c
··· 871 871 #ifdef CONFIG_HIBERNATION 872 872 static unsigned long s4_hardware_signature; 873 873 static struct acpi_table_facs *facs; 874 - static int sigcheck = -1; /* Default behaviour is just to warn */ 875 - 876 - void __init acpi_check_s4_hw_signature(int check) 877 - { 878 - sigcheck = check; 879 - } 874 + int acpi_check_s4_hw_signature = -1; /* Default behaviour is just to warn */ 880 875 881 876 static int acpi_hibernation_begin(pm_message_t stage) 882 877 { ··· 996 1001 hibernation_set_ops(old_suspend_ordering ? 997 1002 &acpi_hibernation_ops_old : &acpi_hibernation_ops); 998 1003 sleep_states[ACPI_STATE_S4] = 1; 999 - if (!sigcheck) 1004 + if (!acpi_check_s4_hw_signature) 1000 1005 return; 1001 1006 1002 1007 acpi_get_table(ACPI_SIG_FACS, 1, (struct acpi_table_header **)&facs); ··· 1008 1013 */ 1009 1014 s4_hardware_signature = facs->hardware_signature; 1010 1015 1011 - if (sigcheck > 0) { 1016 + if (acpi_check_s4_hw_signature > 0) { 1012 1017 /* 1013 1018 * If we're actually obeying the ACPI specification 1014 1019 * then the signature is written out as part of the
+26 -16
drivers/base/power/domain.c
··· 636 636 atomic_read(&genpd->sd_count) > 0) 637 637 return -EBUSY; 638 638 639 + /* 640 + * The children must be in their deepest (powered-off) states to allow 641 + * the parent to be powered off. Note that, there's no need for 642 + * additional locking, as powering on a child, requires the parent's 643 + * lock to be acquired first. 644 + */ 645 + list_for_each_entry(link, &genpd->parent_links, parent_node) { 646 + struct generic_pm_domain *child = link->child; 647 + if (child->state_idx < child->state_count - 1) 648 + return -EBUSY; 649 + } 650 + 639 651 list_for_each_entry(pdd, &genpd->dev_list, list_node) { 640 652 enum pm_qos_flags_status stat; 641 653 ··· 1084 1072 if (genpd->suspended_count != genpd->device_count 1085 1073 || atomic_read(&genpd->sd_count) > 0) 1086 1074 return; 1075 + 1076 + /* Check that the children are in their deepest (powered-off) state. */ 1077 + list_for_each_entry(link, &genpd->parent_links, parent_node) { 1078 + struct generic_pm_domain *child = link->child; 1079 + if (child->state_idx < child->state_count - 1) 1080 + return; 1081 + } 1087 1082 1088 1083 /* Choose the deepest state when suspending */ 1089 1084 genpd->state_idx = genpd->state_count - 1; ··· 2077 2058 kfree(link); 2078 2059 } 2079 2060 2080 - genpd_debug_remove(genpd); 2081 2061 list_del(&genpd->gpd_list_node); 2082 2062 genpd_unlock(genpd); 2063 + genpd_debug_remove(genpd); 2083 2064 cancel_work_sync(&genpd->power_off_work); 2084 2065 if (genpd_is_cpu_domain(genpd)) 2085 2066 free_cpumask_var(genpd->cpus); ··· 2267 2248 /* Parse genpd OPP table */ 2268 2249 if (genpd->set_performance_state) { 2269 2250 ret = dev_pm_opp_of_add_table(&genpd->dev); 2270 - if (ret) { 2271 - if (ret != -EPROBE_DEFER) 2272 - dev_err(&genpd->dev, "Failed to add OPP table: %d\n", 2273 - ret); 2274 - return ret; 2275 - } 2251 + if (ret) 2252 + return dev_err_probe(&genpd->dev, ret, "Failed to add OPP table\n"); 2276 2253 2277 2254 /* 2278 2255 * Save table for faster processing while setting performance ··· 2327 2312 if (genpd->set_performance_state) { 2328 2313 ret = dev_pm_opp_of_add_table_indexed(&genpd->dev, i); 2329 2314 if (ret) { 2330 - if (ret != -EPROBE_DEFER) 2331 - dev_err(&genpd->dev, "Failed to add OPP table for index %d: %d\n", 2332 - i, ret); 2315 + dev_err_probe(&genpd->dev, ret, 2316 + "Failed to add OPP table for index %d\n", i); 2333 2317 goto error; 2334 2318 } 2335 2319 ··· 2686 2672 ret = genpd_add_device(pd, dev, base_dev); 2687 2673 mutex_unlock(&gpd_list_lock); 2688 2674 2689 - if (ret < 0) { 2690 - if (ret != -EPROBE_DEFER) 2691 - dev_err(dev, "failed to add to PM domain %s: %d", 2692 - pd->name, ret); 2693 - return ret; 2694 - } 2675 + if (ret < 0) 2676 + return dev_err_probe(dev, ret, "failed to add to PM domain %s\n", pd->name); 2695 2677 2696 2678 dev->pm_domain->detach = genpd_dev_pm_detach; 2697 2679 dev->pm_domain->sync = genpd_dev_pm_sync;
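The new guard in genpd_power_off() walks the child domains and refuses to power off the parent while any child is not in its deepest state. A self-contained sketch of that check, with the list walk flattened to an array (field names match the hunk; everything else is illustrative):

```c
#include <assert.h>
#include <errno.h>

/* Minimal stand-ins for the genpd fields the new check relies on. */
struct child_domain {
	int state_idx;   /* currently selected low-power state */
	int state_count; /* deepest state is state_count - 1 */
};

/* Mirrors the list_for_each_entry() guard added above: block power-off
 * of the parent unless every child is already in its deepest state. */
static int parent_can_power_off(const struct child_domain *children, int n)
{
	for (int i = 0; i < n; i++)
		if (children[i].state_idx < children[i].state_count - 1)
			return -EBUSY;
	return 0;
}
```

As the comment in the hunk notes, no extra locking is needed because powering on a child requires taking the parent's lock first.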
+9 -7
drivers/base/power/main.c
··· 485 485 trace_device_pm_callback_start(dev, info, state.event); 486 486 error = cb(dev); 487 487 trace_device_pm_callback_end(dev, error); 488 - suspend_report_result(cb, error); 488 + suspend_report_result(dev, cb, error); 489 489 490 490 initcall_debug_report(dev, calltime, cb, error); 491 491 ··· 1568 1568 trace_device_pm_callback_start(dev, info, state.event); 1569 1569 error = cb(dev, state); 1570 1570 trace_device_pm_callback_end(dev, error); 1571 - suspend_report_result(cb, error); 1571 + suspend_report_result(dev, cb, error); 1572 1572 1573 1573 initcall_debug_report(dev, calltime, cb, error); 1574 1574 ··· 1855 1855 device_unlock(dev); 1856 1856 1857 1857 if (ret < 0) { 1858 - suspend_report_result(callback, ret); 1858 + suspend_report_result(dev, callback, ret); 1859 1859 pm_runtime_put(dev); 1860 1860 return ret; 1861 1861 } ··· 1960 1960 } 1961 1961 EXPORT_SYMBOL_GPL(dpm_suspend_start); 1962 1962 1963 - void __suspend_report_result(const char *function, void *fn, int ret) 1963 + void __suspend_report_result(const char *function, struct device *dev, void *fn, int ret) 1964 1964 { 1965 1965 if (ret) 1966 - pr_err("%s(): %pS returns %d\n", function, fn, ret); 1966 + dev_err(dev, "%s(): %pS returns %d\n", function, fn, ret); 1967 1967 } 1968 1968 EXPORT_SYMBOL_GPL(__suspend_report_result); 1969 1969 ··· 2018 2018 2019 2019 void device_pm_check_callbacks(struct device *dev) 2020 2020 { 2021 - spin_lock_irq(&dev->power.lock); 2021 + unsigned long flags; 2022 + 2023 + spin_lock_irqsave(&dev->power.lock, flags); 2022 2024 dev->power.no_pm_callbacks = 2023 2025 (!dev->bus || (pm_ops_is_empty(dev->bus->pm) && 2024 2026 !dev->bus->suspend && !dev->bus->resume)) && ··· 2029 2027 (!dev->pm_domain || pm_ops_is_empty(&dev->pm_domain->ops)) && 2030 2028 (!dev->driver || (pm_ops_is_empty(dev->driver->pm) && 2031 2029 !dev->driver->suspend && !dev->driver->resume)); 2032 - spin_unlock_irq(&dev->power.lock); 2030 + spin_unlock_irqrestore(&dev->power.lock, flags); 2033 2031 } 2034 2032 } 2035 2033 bool dev_pm_skip_suspend(struct device *dev)
+5
drivers/base/power/runtime.c
··· 1476 1476 1477 1477 static void pm_runtime_disable_action(void *data) 1478 1478 { 1479 + pm_runtime_dont_use_autosuspend(data); 1479 1480 pm_runtime_disable(data); 1480 1481 } 1481 1482 1482 1483 /** 1483 1484 * devm_pm_runtime_enable - devres-enabled version of pm_runtime_enable. 1485 + * 1486 + * NOTE: this will also handle calling pm_runtime_dont_use_autosuspend() for 1487 + * you at driver exit time if needed. 1488 + * 1484 1489 * @dev: Device to handle. 1485 1490 */ 1486 1491 int devm_pm_runtime_enable(struct device *dev)
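The change makes the devres teardown action undo autosuspend before disabling runtime PM, so drivers that call pm_runtime_use_autosuspend() after devm_pm_runtime_enable() no longer need their own cleanup. A sketch of that ordering with the two kernel calls replaced by recording stubs (the `sim_` names are stand-ins, not real API):

```c
#include <assert.h>
#include <string.h>

/* Records teardown order so it can be checked (illustrative only). */
static char order[64];

static void sim_dont_use_autosuspend(void) { strcat(order, "autosuspend;"); }
static void sim_runtime_disable(void)      { strcat(order, "disable;"); }

/* Mirrors pm_runtime_disable_action() after the change above: undo
 * autosuspend first, then disable runtime PM. */
static const char *run_disable_action(void)
{
	order[0] = '\0';
	sim_dont_use_autosuspend();
	sim_runtime_disable();
	return order;
}
```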
+1 -1
drivers/base/power/wakeirq.c
··· 289 289 * 290 290 * Enables wakeirq conditionally. We need to enable wake-up interrupt 291 291 * lazily on the first rpm_suspend(). This is needed as the consumer device 292 - * starts in RPM_SUSPENDED state, and the the first pm_runtime_get() would 292 + * starts in RPM_SUSPENDED state, and the first pm_runtime_get() would 293 293 * otherwise try to disable already disabled wakeirq. The wake-up interrupt 294 294 * starts disabled with IRQ_NOAUTOEN set. 295 295 *
+2 -2
drivers/base/power/wakeup.c
··· 587 587 * @ws: Wakeup source to handle. 588 588 * 589 589 * Update the @ws' statistics and, if @ws has just been activated, notify the PM 590 - * core of the event by incrementing the counter of of wakeup events being 590 + * core of the event by incrementing the counter of the wakeup events being 591 591 * processed. 592 592 */ 593 593 static void wakeup_source_activate(struct wakeup_source *ws) ··· 733 733 734 734 /* 735 735 * Increment the counter of registered wakeup events and decrement the 736 - * couter of wakeup events in progress simultaneously. 736 + * counter of wakeup events in progress simultaneously. 737 737 */ 738 738 cec = atomic_add_return(MAX_IN_PROGRESS, &combined_event_count); 739 739 trace_wakeup_source_deactivate(ws->name, cec);
+21 -1
drivers/cpufreq/amd-pstate-trace.h
··· 27 27 TP_PROTO(unsigned long min_perf, 28 28 unsigned long target_perf, 29 29 unsigned long capacity, 30 + u64 freq, 31 + u64 mperf, 32 + u64 aperf, 33 + u64 tsc, 30 34 unsigned int cpu_id, 31 35 bool changed, 32 36 bool fast_switch ··· 39 35 TP_ARGS(min_perf, 40 36 target_perf, 41 37 capacity, 38 + freq, 39 + mperf, 40 + aperf, 41 + tsc, 42 42 cpu_id, 43 43 changed, 44 44 fast_switch ··· 52 44 __field(unsigned long, min_perf) 53 45 __field(unsigned long, target_perf) 54 46 __field(unsigned long, capacity) 47 + __field(unsigned long long, freq) 48 + __field(unsigned long long, mperf) 49 + __field(unsigned long long, aperf) 50 + __field(unsigned long long, tsc) 55 51 __field(unsigned int, cpu_id) 56 52 __field(bool, changed) 57 53 __field(bool, fast_switch) ··· 65 53 __entry->min_perf = min_perf; 66 54 __entry->target_perf = target_perf; 67 55 __entry->capacity = capacity; 56 + __entry->freq = freq; 57 + __entry->mperf = mperf; 58 + __entry->aperf = aperf; 59 + __entry->tsc = tsc; 68 60 __entry->cpu_id = cpu_id; 69 61 __entry->changed = changed; 70 62 __entry->fast_switch = fast_switch; 71 63 ), 72 64 73 - TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu cpu_id=%u changed=%s fast_switch=%s", 65 + TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu freq=%llu mperf=%llu aperf=%llu tsc=%llu cpu_id=%u changed=%s fast_switch=%s", 74 66 (unsigned long)__entry->min_perf, 75 67 (unsigned long)__entry->target_perf, 76 68 (unsigned long)__entry->capacity, 69 + (unsigned long long)__entry->freq, 70 + (unsigned long long)__entry->mperf, 71 + (unsigned long long)__entry->aperf, 72 + (unsigned long long)__entry->tsc, 77 73 (unsigned int)__entry->cpu_id, 78 74 (__entry->changed) ? "true" : "false", 79 75 (__entry->fast_switch) ? "true" : "false"
+57 -2
drivers/cpufreq/amd-pstate.c
··· 66 66 static struct cpufreq_driver amd_pstate_driver; 67 67 68 68 /** 69 + * struct amd_aperf_mperf 70 + * @aperf: actual performance frequency clock count 71 + * @mperf: maximum performance frequency clock count 72 + * @tsc: time stamp counter 73 + */ 74 + struct amd_aperf_mperf { 75 + u64 aperf; 76 + u64 mperf; 77 + u64 tsc; 78 + }; 79 + 80 + /** 69 81 * struct amd_cpudata - private CPU data for AMD P-State 70 82 * @cpu: CPU number 71 83 * @req: constraint request to apply ··· 93 81 * @min_freq: the frequency that mapped to lowest_perf 94 82 * @nominal_freq: the frequency that mapped to nominal_perf 95 83 * @lowest_nonlinear_freq: the frequency that mapped to lowest_nonlinear_perf 84 + * @cur: Difference of Aperf/Mperf/tsc count between last and current sample 85 + * @prev: Last Aperf/Mperf/tsc count value read from register 86 + * @freq: current cpu frequency value 96 87 * @boost_supported: check whether the Processor or SBIOS supports boost mode 97 88 * 98 89 * The amd_cpudata is key private data for each CPU thread in AMD P-State, and ··· 117 102 u32 nominal_freq; 118 103 u32 lowest_nonlinear_freq; 119 104 105 + struct amd_aperf_mperf cur; 106 + struct amd_aperf_mperf prev; 107 + 108 + u64 freq; 120 109 bool boost_supported; 121 110 }; 122 111 ··· 230 211 max_perf, fast_switch); 231 212 } 232 213 214 + static inline bool amd_pstate_sample(struct amd_cpudata *cpudata) 215 + { 216 + u64 aperf, mperf, tsc; 217 + unsigned long flags; 218 + 219 + local_irq_save(flags); 220 + rdmsrl(MSR_IA32_APERF, aperf); 221 + rdmsrl(MSR_IA32_MPERF, mperf); 222 + tsc = rdtsc(); 223 + 224 + if (cpudata->prev.mperf == mperf || cpudata->prev.tsc == tsc) { 225 + local_irq_restore(flags); 226 + return false; 227 + } 228 + 229 + local_irq_restore(flags); 230 + 231 + cpudata->cur.aperf = aperf; 232 + cpudata->cur.mperf = mperf; 233 + cpudata->cur.tsc = tsc; 234 + cpudata->cur.aperf -= cpudata->prev.aperf; 235 + cpudata->cur.mperf -= cpudata->prev.mperf; 236 + cpudata->cur.tsc -= cpudata->prev.tsc; 237 + 238 + cpudata->prev.aperf = aperf; 239 + cpudata->prev.mperf = mperf; 240 + cpudata->prev.tsc = tsc; 241 + 242 + cpudata->freq = div64_u64((cpudata->cur.aperf * cpu_khz), cpudata->cur.mperf); 243 + 244 + return true; 245 + } 246 + 233 247 static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf, 234 248 u32 des_perf, u32 max_perf, bool fast_switch) 235 249 { ··· 278 226 value &= ~AMD_CPPC_MAX_PERF(~0L); 279 227 value |= AMD_CPPC_MAX_PERF(max_perf); 280 228 281 - trace_amd_pstate_perf(min_perf, des_perf, max_perf, 282 - cpudata->cpu, (value != prev), fast_switch); 229 + if (trace_amd_pstate_perf_enabled() && amd_pstate_sample(cpudata)) { 230 + trace_amd_pstate_perf(min_perf, des_perf, max_perf, cpudata->freq, 231 + cpudata->cur.mperf, cpudata->cur.aperf, cpudata->cur.tsc, 232 + cpudata->cpu, (value != prev), fast_switch); 233 + } 283 234 284 235 if (value == prev) 285 236 return;
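The frequency reported by the new tracepoint comes from the classic APERF/MPERF ratio: effective kHz = delta_aperf * cpu_khz / delta_mperf, as computed by amd_pstate_sample() above with div64_u64(). A plain-C restatement of that arithmetic (the sample values in the test are made up):

```c
#include <assert.h>
#include <stdint.h>

/* freq = delta_aperf * cpu_khz / delta_mperf, the same computation
 * amd_pstate_sample() performs on the per-CPU counter deltas. */
static uint64_t effective_freq_khz(uint64_t aperf_delta,
				   uint64_t mperf_delta,
				   uint64_t cpu_khz)
{
	if (mperf_delta == 0)
		return 0; /* the kernel skips samples with no mperf movement */
	return aperf_delta * cpu_khz / mperf_delta;
}
```

For example, if APERF advanced twice as far as MPERF on a 2.4 GHz part, the effective frequency is 4.8 GHz (boost).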
+5 -5
drivers/cpufreq/cpufreq_conservative.c
··· 146 146 147 147 /************************** sysfs interface ************************/ 148 148 149 - static ssize_t store_sampling_down_factor(struct gov_attr_set *attr_set, 149 + static ssize_t sampling_down_factor_store(struct gov_attr_set *attr_set, 150 150 const char *buf, size_t count) 151 151 { 152 152 struct dbs_data *dbs_data = to_dbs_data(attr_set); ··· 161 161 return count; 162 162 } 163 163 164 - static ssize_t store_up_threshold(struct gov_attr_set *attr_set, 164 + static ssize_t up_threshold_store(struct gov_attr_set *attr_set, 165 165 const char *buf, size_t count) 166 166 { 167 167 struct dbs_data *dbs_data = to_dbs_data(attr_set); ··· 177 177 return count; 178 178 } 179 179 180 - static ssize_t store_down_threshold(struct gov_attr_set *attr_set, 180 + static ssize_t down_threshold_store(struct gov_attr_set *attr_set, 181 181 const char *buf, size_t count) 182 182 { 183 183 struct dbs_data *dbs_data = to_dbs_data(attr_set); ··· 195 195 return count; 196 196 } 197 197 198 - static ssize_t store_ignore_nice_load(struct gov_attr_set *attr_set, 198 + static ssize_t ignore_nice_load_store(struct gov_attr_set *attr_set, 199 199 const char *buf, size_t count) 200 200 { 201 201 struct dbs_data *dbs_data = to_dbs_data(attr_set); ··· 220 220 return count; 221 221 } 222 222 223 - static ssize_t store_freq_step(struct gov_attr_set *attr_set, const char *buf, 223 + static ssize_t freq_step_store(struct gov_attr_set *attr_set, const char *buf, 224 224 size_t count) 225 225 { 226 226 struct dbs_data *dbs_data = to_dbs_data(attr_set);
+3 -3
drivers/cpufreq/cpufreq_governor.c
··· 27 27 28 28 /* Common sysfs tunables */ 29 29 /* 30 - * store_sampling_rate - update sampling rate effective immediately if needed. 30 + * sampling_rate_store - update sampling rate effective immediately if needed. 31 31 * 32 32 * If new rate is smaller than the old, simply updating 33 33 * dbs.sampling_rate might not be appropriate. For example, if the ··· 41 41 * This must be called with dbs_data->mutex held, otherwise traversing 42 42 * policy_dbs_list isn't safe. 43 43 */ 44 - ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf, 44 + ssize_t sampling_rate_store(struct gov_attr_set *attr_set, const char *buf, 45 45 size_t count) 46 46 { 47 47 struct dbs_data *dbs_data = to_dbs_data(attr_set); ··· 80 80 81 81 return count; 82 82 } 83 - EXPORT_SYMBOL_GPL(store_sampling_rate); 83 + EXPORT_SYMBOL_GPL(sampling_rate_store); 84 84 85 85 /** 86 86 * gov_update_cpu_data - Update CPU load data.
+5 -7
drivers/cpufreq/cpufreq_governor.h
··· 51 51 } 52 52 53 53 #define gov_show_one(_gov, file_name) \ 54 - static ssize_t show_##file_name \ 54 + static ssize_t file_name##_show \ 55 55 (struct gov_attr_set *attr_set, char *buf) \ 56 56 { \ 57 57 struct dbs_data *dbs_data = to_dbs_data(attr_set); \ ··· 60 60 } 61 61 62 62 #define gov_show_one_common(file_name) \ 63 - static ssize_t show_##file_name \ 63 + static ssize_t file_name##_show \ 64 64 (struct gov_attr_set *attr_set, char *buf) \ 65 65 { \ 66 66 struct dbs_data *dbs_data = to_dbs_data(attr_set); \ ··· 68 68 } 69 69 70 70 #define gov_attr_ro(_name) \ 71 - static struct governor_attr _name = \ 72 - __ATTR(_name, 0444, show_##_name, NULL) 71 + static struct governor_attr _name = __ATTR_RO(_name) 73 72 74 73 #define gov_attr_rw(_name) \ 75 - static struct governor_attr _name = \ 76 - __ATTR(_name, 0644, show_##_name, store_##_name) 74 + static struct governor_attr _name = __ATTR_RW(_name) 77 75 78 76 /* Common to all CPUs of a policy */ 79 77 struct policy_dbs_info { ··· 174 176 (struct cpufreq_policy *, unsigned int, unsigned int), 175 177 unsigned int powersave_bias); 176 178 void od_unregister_powersave_bias_handler(void); 177 - ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf, 179 + ssize_t sampling_rate_store(struct gov_attr_set *attr_set, const char *buf, 178 180 size_t count); 179 181 void gov_update_cpu_data(struct dbs_data *dbs_data); 180 182 #endif /* _CPUFREQ_GOVERNOR_H */
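The rename from show_<name>/store_<name> to <name>_show/<name>_store is what lets the macros collapse to __ATTR_RO()/__ATTR_RW(), which build the handler names by token pasting. A simplified stand-in (not the real sysfs types) showing why the naming convention is load-bearing:

```c
#include <assert.h>
#include <string.h>

/* Toy attribute type; the real one is struct governor_attr. */
struct attr {
	const char *name;
	int (*show)(void);
	int (*store)(int);
};

/* Simplified stand-in for __ATTR_RW(): paste the attribute name to
 * find <name>_show and <name>_store, so the handlers must follow the
 * naming convention introduced by the hunk above. */
#define MY_ATTR_RW(_name) { #_name, _name##_show, _name##_store }

static int sampling_rate_show(void)   { return 42; }
static int sampling_rate_store(int v) { return v;  }

static struct attr sampling_rate = MY_ATTR_RW(sampling_rate);
```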
-5
drivers/cpufreq/cpufreq_governor_attr_set.c
··· 8 8 9 9 #include "cpufreq_governor.h" 10 10 11 - static inline struct gov_attr_set *to_gov_attr_set(struct kobject *kobj) 12 - { 13 - return container_of(kobj, struct gov_attr_set, kobj); 14 - } 15 - 16 11 static inline struct governor_attr *to_gov_attr(struct attribute *attr) 17 12 { 18 13 return container_of(attr, struct governor_attr, attr);
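The deleted local helper was an open-coded copy of the to_gov_attr_set() now shared from the header; all it does is recover the enclosing structure from a pointer to its embedded kobject via container_of(). A self-contained userspace equivalent (struct fields trimmed to what the example needs):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace container_of(): step back from a member pointer to the
 * struct that embeds it, using the member's compile-time offset. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct kobject { int dummy; };
struct gov_attr_set {
	int usage_count;
	struct kobject kobj; /* embedded, as in the real struct */
};

static struct gov_attr_set *to_gov_attr_set(struct kobject *kobj)
{
	return container_of(kobj, struct gov_attr_set, kobj);
}
```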
+5 -5
drivers/cpufreq/cpufreq_ondemand.c
··· 202 202 /************************** sysfs interface ************************/ 203 203 static struct dbs_governor od_dbs_gov; 204 204 205 - static ssize_t store_io_is_busy(struct gov_attr_set *attr_set, const char *buf, 205 + static ssize_t io_is_busy_store(struct gov_attr_set *attr_set, const char *buf, 206 206 size_t count) 207 207 { 208 208 struct dbs_data *dbs_data = to_dbs_data(attr_set); ··· 220 220 return count; 221 221 } 222 222 223 - static ssize_t store_up_threshold(struct gov_attr_set *attr_set, 223 + static ssize_t up_threshold_store(struct gov_attr_set *attr_set, 224 224 const char *buf, size_t count) 225 225 { 226 226 struct dbs_data *dbs_data = to_dbs_data(attr_set); ··· 237 237 return count; 238 238 } 239 239 240 - static ssize_t store_sampling_down_factor(struct gov_attr_set *attr_set, 240 + static ssize_t sampling_down_factor_store(struct gov_attr_set *attr_set, 241 241 const char *buf, size_t count) 242 242 { 243 243 struct dbs_data *dbs_data = to_dbs_data(attr_set); ··· 265 265 return count; 266 266 } 267 267 268 - static ssize_t store_ignore_nice_load(struct gov_attr_set *attr_set, 268 + static ssize_t ignore_nice_load_store(struct gov_attr_set *attr_set, 269 269 const char *buf, size_t count) 270 270 { 271 271 struct dbs_data *dbs_data = to_dbs_data(attr_set); ··· 290 290 return count; 291 291 } 292 292 293 - static ssize_t store_powersave_bias(struct gov_attr_set *attr_set, 293 + static ssize_t powersave_bias_store(struct gov_attr_set *attr_set, 294 294 const char *buf, size_t count) 295 295 { 296 296 struct dbs_data *dbs_data = to_dbs_data(attr_set);
+32 -6
drivers/cpufreq/intel_pstate.c
··· 1692 1692 } 1693 1693 } 1694 1694 1695 + static void intel_pstate_update_epp_defaults(struct cpudata *cpudata) 1696 + { 1697 + cpudata->epp_default = intel_pstate_get_epp(cpudata, 0); 1698 + 1699 + /* 1700 + * If this CPU gen doesn't call for change in balance_perf 1701 + * EPP return. 1702 + */ 1703 + if (epp_values[EPP_INDEX_BALANCE_PERFORMANCE] == HWP_EPP_BALANCE_PERFORMANCE) 1704 + return; 1705 + 1706 + /* 1707 + * If powerup EPP is something other than chipset default 0x80 and 1708 + * - is more performance oriented than 0x80 (default balance_perf EPP) 1709 + * - But less performance oriented than performance EPP 1710 + * then use this as new balance_perf EPP. 1711 + */ 1712 + if (cpudata->epp_default < HWP_EPP_BALANCE_PERFORMANCE && 1713 + cpudata->epp_default > HWP_EPP_PERFORMANCE) { 1714 + epp_values[EPP_INDEX_BALANCE_PERFORMANCE] = cpudata->epp_default; 1715 + return; 1716 + } 1717 + 1718 + /* 1719 + * Use hard coded value per gen to update the balance_perf 1720 + * and default EPP. 1721 + */ 1722 + cpudata->epp_default = epp_values[EPP_INDEX_BALANCE_PERFORMANCE]; 1723 + intel_pstate_set_epp(cpudata, cpudata->epp_default); 1724 + } 1725 + 1695 1726 static void intel_pstate_hwp_enable(struct cpudata *cpudata) 1696 1727 { 1697 1728 /* First disable HWP notification interrupt till we activate again */ ··· 1736 1705 if (cpudata->epp_default >= 0) 1737 1706 return; 1738 1707 1739 - if (epp_values[EPP_INDEX_BALANCE_PERFORMANCE] == HWP_EPP_BALANCE_PERFORMANCE) { 1740 - cpudata->epp_default = intel_pstate_get_epp(cpudata, 0); 1741 - } else { 1742 - cpudata->epp_default = epp_values[EPP_INDEX_BALANCE_PERFORMANCE]; 1743 - intel_pstate_set_epp(cpudata, cpudata->epp_default); 1744 - } 1708 + intel_pstate_update_epp_defaults(cpudata); 1745 1709 } 1746 1710 1747 1711 static int atom_get_min_pstate(void)
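The decision in intel_pstate_update_epp_defaults() above condenses to: if the firmware's power-up EPP lies strictly between performance (0x00) and the chipset-default balance_performance (0x80), adopt it as the new balance_performance default; otherwise fall back to the per-generation value. A sketch of just that branch logic, with the per-gen table entry reduced to a parameter (a simplification of the kernel code, not a copy):

```c
#include <assert.h>

#define HWP_EPP_PERFORMANCE         0x00
#define HWP_EPP_BALANCE_PERFORMANCE 0x80

/* fw_epp: EPP read from firmware at power-up.
 * gen_epp: stands in for epp_values[EPP_INDEX_BALANCE_PERFORMANCE]. */
static int pick_balance_perf_epp(int fw_epp, int gen_epp)
{
	if (gen_epp == HWP_EPP_BALANCE_PERFORMANCE)
		return fw_epp; /* no per-gen override: keep firmware value */
	/* Firmware EPP more performance-oriented than 0x80 but less than
	 * full performance: adopt it as the new balance_perf default. */
	if (fw_epp < HWP_EPP_BALANCE_PERFORMANCE && fw_epp > HWP_EPP_PERFORMANCE)
		return fw_epp;
	return gen_epp; /* otherwise use the hard-coded per-gen value */
}
```

Lower EPP values mean more performance bias, which is why the in-range test is a pair of strict inequalities.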
+2 -2
drivers/cpufreq/longhaul.c
··· 668 668 u32 nesting_level, 669 669 void *context, void **return_value) 670 670 { 671 - struct acpi_device *d; 671 + struct acpi_device *d = acpi_fetch_acpi_dev(obj_handle); 672 672 673 - if (acpi_bus_get_device(obj_handle, &d)) 673 + if (!d) 674 674 return 0; 675 675 676 676 *return_value = acpi_driver_data(d);
+3 -3
drivers/cpufreq/powernow-k8.c
··· 1172 1172 unsigned int i, supported_cpus = 0; 1173 1173 int ret; 1174 1174 1175 + if (!x86_match_cpu(powernow_k8_ids)) 1176 + return -ENODEV; 1177 + 1175 1178 if (boot_cpu_has(X86_FEATURE_HW_PSTATE)) { 1176 1179 __request_acpi_cpufreq(); 1177 1180 return -ENODEV; 1178 1181 } 1179 - 1180 - if (!x86_match_cpu(powernow_k8_ids)) 1181 - return -ENODEV; 1182 1182 1183 1183 cpus_read_lock(); 1184 1184 for_each_online_cpu(i) {
+2 -2
drivers/cpuidle/cpuidle-haltpoll.c
··· 108 108 if (boot_option_idle_override != IDLE_NO_OVERRIDE) 109 109 return -ENODEV; 110 110 111 - cpuidle_poll_state_init(drv); 112 - 113 111 if (!kvm_para_available() || !haltpoll_want()) 114 112 return -ENODEV; 113 + 114 + cpuidle_poll_state_init(drv); 115 115 116 116 ret = cpuidle_register_driver(drv); 117 117 if (ret < 0)
+108 -3
drivers/idle/intel_idle.c
··· 64 64 /* intel_idle.max_cstate=0 disables driver */ 65 65 static int max_cstate = CPUIDLE_STATE_MAX - 1; 66 66 static unsigned int disabled_states_mask; 67 + static unsigned int preferred_states_mask; 67 68 68 69 static struct cpuidle_device __percpu *intel_idle_cpuidle_devices; 69 70 ··· 121 120 * 122 121 * If the local APIC timer is not known to be reliable in the target idle state, 123 122 * enable one-shot tick broadcasting for the target CPU before executing MWAIT. 124 - * 125 - * Optionally call leave_mm() for the target CPU upfront to avoid wakeups due to 126 - * flushing user TLBs. 127 123 * 128 124 * Must be called under local_irq_disable(). 129 125 */ ··· 759 761 .enter = NULL } 760 762 }; 761 763 764 + /* 765 + * On Sapphire Rapids Xeon C1 has to be disabled if C1E is enabled, and vice 766 + * versa. On SPR C1E is enabled only if "C1E promotion" bit is set in 767 + * MSR_IA32_POWER_CTL. But in this case there effectively no C1, because C1 768 + * requests are promoted to C1E. If the "C1E promotion" bit is cleared, then 769 + * both C1 and C1E requests end up with C1, so there is effectively no C1E. 770 + * 771 + * By default we enable C1 and disable C1E by marking it with 772 + * 'CPUIDLE_FLAG_UNUSABLE'. 
773 + */ 774 + static struct cpuidle_state spr_cstates[] __initdata = { 775 + { 776 + .name = "C1", 777 + .desc = "MWAIT 0x00", 778 + .flags = MWAIT2flg(0x00), 779 + .exit_latency = 1, 780 + .target_residency = 1, 781 + .enter = &intel_idle, 782 + .enter_s2idle = intel_idle_s2idle, }, 783 + { 784 + .name = "C1E", 785 + .desc = "MWAIT 0x01", 786 + .flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE | 787 + CPUIDLE_FLAG_UNUSABLE, 788 + .exit_latency = 2, 789 + .target_residency = 4, 790 + .enter = &intel_idle, 791 + .enter_s2idle = intel_idle_s2idle, }, 792 + { 793 + .name = "C6", 794 + .desc = "MWAIT 0x20", 795 + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED, 796 + .exit_latency = 290, 797 + .target_residency = 800, 798 + .enter = &intel_idle, 799 + .enter_s2idle = intel_idle_s2idle, }, 800 + { 801 + .enter = NULL } 802 + }; 803 + 762 804 static struct cpuidle_state atom_cstates[] __initdata = { 763 805 { 764 806 .name = "C1E", ··· 1142 1104 .use_acpi = true, 1143 1105 }; 1144 1106 1107 + static const struct idle_cpu idle_cpu_spr __initconst = { 1108 + .state_table = spr_cstates, 1109 + .disable_promotion_to_c1e = true, 1110 + .use_acpi = true, 1111 + }; 1112 + 1145 1113 static const struct idle_cpu idle_cpu_avn __initconst = { 1146 1114 .state_table = avn_cstates, 1147 1115 .disable_promotion_to_c1e = true, ··· 1210 1166 X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, &idle_cpu_skx), 1211 1167 X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, &idle_cpu_icx), 1212 1168 X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, &idle_cpu_icx), 1169 + X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &idle_cpu_spr), 1213 1170 X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &idle_cpu_knl), 1214 1171 X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &idle_cpu_knl), 1215 1172 X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, &idle_cpu_bxt), ··· 1398 1353 static inline bool intel_idle_off_by_default(u32 mwait_hint) { return false; } 1399 1354 #endif /* !CONFIG_ACPI_PROCESSOR_CSTATE */ 1400 1355 1356 + static void 
c1e_promotion_enable(void); 1357 + 1401 1358 /** 1402 1359 * ivt_idle_state_table_update - Tune the idle states table for Ivy Town. 1403 1360 * ··· 1570 1523 } 1571 1524 } 1572 1525 1526 + /** 1527 + * spr_idle_state_table_update - Adjust Sapphire Rapids idle states table. 1528 + */ 1529 + static void __init spr_idle_state_table_update(void) 1530 + { 1531 + unsigned long long msr; 1532 + 1533 + /* Check if user prefers C1E over C1. */ 1534 + if (preferred_states_mask & BIT(2)) { 1535 + if (preferred_states_mask & BIT(1)) 1536 + /* Both can't be enabled, stick to the defaults. */ 1537 + return; 1538 + 1539 + spr_cstates[0].flags |= CPUIDLE_FLAG_UNUSABLE; 1540 + spr_cstates[1].flags &= ~CPUIDLE_FLAG_UNUSABLE; 1541 + 1542 + /* Enable C1E using the "C1E promotion" bit. */ 1543 + c1e_promotion_enable(); 1544 + disable_promotion_to_c1e = false; 1545 + } 1546 + 1547 + /* 1548 + * By default, the C6 state assumes the worst-case scenario of package 1549 + * C6. However, if PC6 is disabled, we update the numbers to match 1550 + * core C6. 1551 + */ 1552 + rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr); 1553 + 1554 + /* Limit value 2 and above allow for PC6. 
*/ 1555 + if ((msr & 0x7) < 2) { 1556 + spr_cstates[2].exit_latency = 190; 1557 + spr_cstates[2].target_residency = 600; 1558 + } 1559 + } 1560 + 1573 1561 static bool __init intel_idle_verify_cstate(unsigned int mwait_hint) 1574 1562 { 1575 1563 unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1; ··· 1638 1556 break; 1639 1557 case INTEL_FAM6_SKYLAKE_X: 1640 1558 skx_idle_state_table_update(); 1559 + break; 1560 + case INTEL_FAM6_SAPPHIRERAPIDS_X: 1561 + spr_idle_state_table_update(); 1641 1562 break; 1642 1563 } 1643 1564 ··· 1712 1627 rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits); 1713 1628 msr_bits &= ~auto_demotion_disable_flags; 1714 1629 wrmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits); 1630 + } 1631 + 1632 + static void c1e_promotion_enable(void) 1633 + { 1634 + unsigned long long msr_bits; 1635 + 1636 + rdmsrl(MSR_IA32_POWER_CTL, msr_bits); 1637 + msr_bits |= 0x2; 1638 + wrmsrl(MSR_IA32_POWER_CTL, msr_bits); 1715 1639 } 1716 1640 1717 1641 static void c1e_promotion_disable(void) ··· 1892 1798 */ 1893 1799 module_param_named(states_off, disabled_states_mask, uint, 0444); 1894 1800 MODULE_PARM_DESC(states_off, "Mask of disabled idle states"); 1801 + /* 1802 + * Some platforms come with mutually exclusive C-states, so that if one is 1803 + * enabled, the other C-states must not be used. Example: C1 and C1E on 1804 + * Sapphire Rapids platform. This parameter allows for selecting the 1805 + * preferred C-states among the groups of mutually exclusive C-states - the 1806 + * selected C-states will be registered, the other C-states from the mutually 1807 + * exclusive group won't be registered. If the platform has no mutually 1808 + * exclusive C-states, this parameter has no effect. 1809 + */ 1810 + module_param_named(preferred_cstates, preferred_states_mask, uint, 0444); 1811 + MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states");
+7 -7
drivers/pci/pci-driver.c
··· 596 596 int error; 597 597 598 598 error = drv->suspend(pci_dev, state); 599 - suspend_report_result(drv->suspend, error); 599 + suspend_report_result(dev, drv->suspend, error); 600 600 if (error) 601 601 return error; 602 602 ··· 775 775 int error; 776 776 777 777 error = pm->suspend(dev); 778 - suspend_report_result(pm->suspend, error); 778 + suspend_report_result(dev, pm->suspend, error); 779 779 if (error) 780 780 return error; 781 781 ··· 821 821 int error; 822 822 823 823 error = pm->suspend_noirq(dev); 824 - suspend_report_result(pm->suspend_noirq, error); 824 + suspend_report_result(dev, pm->suspend_noirq, error); 825 825 if (error) 826 826 return error; 827 827 ··· 1010 1010 int error; 1011 1011 1012 1012 error = pm->freeze(dev); 1013 - suspend_report_result(pm->freeze, error); 1013 + suspend_report_result(dev, pm->freeze, error); 1014 1014 if (error) 1015 1015 return error; 1016 1016 } ··· 1030 1030 int error; 1031 1031 1032 1032 error = pm->freeze_noirq(dev); 1033 - suspend_report_result(pm->freeze_noirq, error); 1033 + suspend_report_result(dev, pm->freeze_noirq, error); 1034 1034 if (error) 1035 1035 return error; 1036 1036 } ··· 1116 1116 int error; 1117 1117 1118 1118 error = pm->poweroff(dev); 1119 - suspend_report_result(pm->poweroff, error); 1119 + suspend_report_result(dev, pm->poweroff, error); 1120 1120 if (error) 1121 1121 return error; 1122 1122 } ··· 1154 1154 int error; 1155 1155 1156 1156 error = pm->poweroff_noirq(dev); 1157 - suspend_report_result(pm->poweroff_noirq, error); 1157 + suspend_report_result(dev, pm->poweroff_noirq, error); 1158 1158 if (error) 1159 1159 return error; 1160 1160 }
+1 -1
drivers/pnp/driver.c
··· 171 171 172 172 if (pnp_drv->driver.pm && pnp_drv->driver.pm->suspend) { 173 173 error = pnp_drv->driver.pm->suspend(dev); 174 - suspend_report_result(pnp_drv->driver.pm->suspend, error); 174 + suspend_report_result(dev, pnp_drv->driver.pm->suspend, error); 175 175 if (error) 176 176 return error; 177 177 }
+8
drivers/powercap/Kconfig
··· 46 46 47 47 config DTPM 48 48 bool "Power capping for Dynamic Thermal Power Management (EXPERIMENTAL)" 49 + depends on OF 49 50 help 50 51 This enables support for the power capping for the dynamic 51 52 thermal power management userspace engine. ··· 56 55 depends on DTPM && ENERGY_MODEL 57 56 help 58 57 This enables support for CPU power limitation based on 58 + energy model. 59 + 60 + config DTPM_DEVFREQ 61 + bool "Add device power capping based on the energy model" 62 + depends on DTPM && ENERGY_MODEL 63 + help 64 + This enables support for device power limitation based on 59 65 energy model. 60 66 endif
+1
drivers/powercap/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0-only 2 2 obj-$(CONFIG_DTPM) += dtpm.o 3 3 obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o 4 + obj-$(CONFIG_DTPM_DEVFREQ) += dtpm_devfreq.o 4 5 obj-$(CONFIG_POWERCAP) += powercap_sys.o 5 6 obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o 6 7 obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
+264 -77
drivers/powercap/dtpm.c
··· 23 23 #include <linux/powercap.h> 24 24 #include <linux/slab.h> 25 25 #include <linux/mutex.h> 26 + #include <linux/of.h> 27 + 28 + #include "dtpm_subsys.h" 26 29 27 30 #define DTPM_POWER_LIMIT_FLAG 0 28 31 ··· 51 48 { 52 49 struct dtpm *dtpm = to_dtpm(pcz); 53 50 54 - mutex_lock(&dtpm_lock); 55 51 *max_power_uw = dtpm->power_max - dtpm->power_min; 56 - mutex_unlock(&dtpm_lock); 57 52 58 53 return 0; 59 54 } ··· 81 80 82 81 static int get_power_uw(struct powercap_zone *pcz, u64 *power_uw) 83 82 { 84 - struct dtpm *dtpm = to_dtpm(pcz); 85 - int ret; 86 - 87 - mutex_lock(&dtpm_lock); 88 - ret = __get_power_uw(dtpm, power_uw); 89 - mutex_unlock(&dtpm_lock); 90 - 91 - return ret; 83 + return __get_power_uw(to_dtpm(pcz), power_uw); 92 84 } 93 85 94 86 static void __dtpm_rebalance_weight(struct dtpm *dtpm) ··· 124 130 } 125 131 } 126 132 127 - static int __dtpm_update_power(struct dtpm *dtpm) 133 + /** 134 + * dtpm_update_power - Update the power on the dtpm 135 + * @dtpm: a pointer to a dtpm structure to update 136 + * 137 + * Function to update the power values of the dtpm node specified in 138 + * parameter. These new values will be propagated to the tree. 139 + * 140 + * Return: zero on success, -EINVAL if the values are inconsistent 141 + */ 142 + int dtpm_update_power(struct dtpm *dtpm) 128 143 { 129 144 int ret; 130 145 ··· 156 153 } 157 154 158 155 /** 159 - * dtpm_update_power - Update the power on the dtpm 160 - * @dtpm: a pointer to a dtpm structure to update 161 - * 162 - * Function to update the power values of the dtpm node specified in 163 - * parameter. These new values will be propagated to the tree. 
164 - * 165 - * Return: zero on success, -EINVAL if the values are inconsistent 166 - */ 167 - int dtpm_update_power(struct dtpm *dtpm) 168 - { 169 - int ret; 170 - 171 - mutex_lock(&dtpm_lock); 172 - ret = __dtpm_update_power(dtpm); 173 - mutex_unlock(&dtpm_lock); 174 - 175 - return ret; 176 - } 177 - 178 - /** 179 156 * dtpm_release_zone - Cleanup when the node is released 180 157 * @pcz: a pointer to a powercap_zone structure 181 158 * ··· 171 188 struct dtpm *dtpm = to_dtpm(pcz); 172 189 struct dtpm *parent = dtpm->parent; 173 190 174 - mutex_lock(&dtpm_lock); 175 - 176 - if (!list_empty(&dtpm->children)) { 177 - mutex_unlock(&dtpm_lock); 191 + if (!list_empty(&dtpm->children)) 178 192 return -EBUSY; 179 - } 180 193 181 194 if (parent) 182 195 list_del(&dtpm->sibling); 183 196 184 197 __dtpm_sub_power(dtpm); 185 198 186 - mutex_unlock(&dtpm_lock); 187 - 188 199 if (dtpm->ops) 189 200 dtpm->ops->release(dtpm); 201 + else 202 + kfree(dtpm); 190 203 191 - if (root == dtpm) 192 - root = NULL; 193 - 194 - kfree(dtpm); 195 - 196 - return 0; 197 - } 198 - 199 - static int __get_power_limit_uw(struct dtpm *dtpm, int cid, u64 *power_limit) 200 - { 201 - *power_limit = dtpm->power_limit; 202 204 return 0; 203 205 } 204 206 205 207 static int get_power_limit_uw(struct powercap_zone *pcz, 206 208 int cid, u64 *power_limit) 207 209 { 208 - struct dtpm *dtpm = to_dtpm(pcz); 209 - int ret; 210 - 211 - mutex_lock(&dtpm_lock); 212 - ret = __get_power_limit_uw(dtpm, cid, power_limit); 213 - mutex_unlock(&dtpm_lock); 214 - 215 - return ret; 210 + *power_limit = to_dtpm(pcz)->power_limit; 211 + 212 + return 0; 216 213 } 217 214 218 215 /* ··· 252 289 253 290 ret = __set_power_limit_uw(child, cid, power); 254 291 if (!ret) 255 - ret = __get_power_limit_uw(child, cid, &power); 292 + ret = get_power_limit_uw(&child->zone, cid, &power); 256 293 257 294 if (ret) 258 295 break; ··· 270 307 struct dtpm *dtpm = to_dtpm(pcz); 271 308 int ret; 272 309 273 - mutex_lock(&dtpm_lock); 274 - 275 
310 /* 276 311 * Don't allow values outside of the power range previously 277 312 * set when initializing the power numbers. ··· 281 320 pr_debug("%s: power limit: %llu uW, power max: %llu uW\n", 282 321 dtpm->zone.name, dtpm->power_limit, dtpm->power_max); 283 322 284 - mutex_unlock(&dtpm_lock); 285 - 286 323 return ret; 287 324 } 288 325 ··· 291 332 292 333 static int get_max_power_uw(struct powercap_zone *pcz, int id, u64 *max_power) 293 334 { 294 - struct dtpm *dtpm = to_dtpm(pcz); 295 - 296 - mutex_lock(&dtpm_lock); 297 - *max_power = dtpm->power_max; 298 - mutex_unlock(&dtpm_lock); 335 + *max_power = to_dtpm(pcz)->power_max; 299 336 300 337 return 0; 301 338 } ··· 394 439 if (IS_ERR(pcz)) 395 440 return PTR_ERR(pcz); 396 441 397 - mutex_lock(&dtpm_lock); 398 - 399 442 if (parent) { 400 443 list_add_tail(&dtpm->sibling, &parent->children); 401 444 dtpm->parent = parent; ··· 409 456 pr_debug("Registered dtpm node '%s' / %llu-%llu uW, \n", 410 457 dtpm->zone.name, dtpm->power_min, dtpm->power_max); 411 458 412 - mutex_unlock(&dtpm_lock); 413 - 414 459 return 0; 415 460 } 416 461 417 - static int __init init_dtpm(void) 462 + static struct dtpm *dtpm_setup_virtual(const struct dtpm_node *hierarchy, 463 + struct dtpm *parent) 418 464 { 419 - pct = powercap_register_control_type(NULL, "dtpm", NULL); 420 - if (IS_ERR(pct)) { 421 - pr_err("Failed to register control type\n"); 422 - return PTR_ERR(pct); 465 + struct dtpm *dtpm; 466 + int ret; 467 + 468 + dtpm = kzalloc(sizeof(*dtpm), GFP_KERNEL); 469 + if (!dtpm) 470 + return ERR_PTR(-ENOMEM); 471 + dtpm_init(dtpm, NULL); 472 + 473 + ret = dtpm_register(hierarchy->name, dtpm, parent); 474 + if (ret) { 475 + pr_err("Failed to register dtpm node '%s': %d\n", 476 + hierarchy->name, ret); 477 + kfree(dtpm); 478 + return ERR_PTR(ret); 479 + } 480 + 481 + return dtpm; 482 + } 483 + 484 + static struct dtpm *dtpm_setup_dt(const struct dtpm_node *hierarchy, 485 + struct dtpm *parent) 486 + { 487 + struct device_node *np; 488 + 
int i, ret; 489 + 490 + np = of_find_node_by_path(hierarchy->name); 491 + if (!np) { 492 + pr_err("Failed to find '%s'\n", hierarchy->name); 493 + return ERR_PTR(-ENXIO); 494 + } 495 + 496 + for (i = 0; i < ARRAY_SIZE(dtpm_subsys); i++) { 497 + 498 + if (!dtpm_subsys[i]->setup) 499 + continue; 500 + 501 + ret = dtpm_subsys[i]->setup(parent, np); 502 + if (ret) { 503 + pr_err("Failed to setup '%s': %d\n", dtpm_subsys[i]->name, ret); 504 + of_node_put(np); 505 + return ERR_PTR(ret); 506 + } 507 + } 508 + 509 + of_node_put(np); 510 + 511 + /* 512 + * By returning a NULL pointer, we let know the caller there 513 + * is no child for us as we are a leaf of the tree 514 + */ 515 + return NULL; 516 + } 517 + 518 + typedef struct dtpm * (*dtpm_node_callback_t)(const struct dtpm_node *, struct dtpm *); 519 + 520 + static dtpm_node_callback_t dtpm_node_callback[] = { 521 + [DTPM_NODE_VIRTUAL] = dtpm_setup_virtual, 522 + [DTPM_NODE_DT] = dtpm_setup_dt, 523 + }; 524 + 525 + static int dtpm_for_each_child(const struct dtpm_node *hierarchy, 526 + const struct dtpm_node *it, struct dtpm *parent) 527 + { 528 + struct dtpm *dtpm; 529 + int i, ret; 530 + 531 + for (i = 0; hierarchy[i].name; i++) { 532 + 533 + if (hierarchy[i].parent != it) 534 + continue; 535 + 536 + dtpm = dtpm_node_callback[hierarchy[i].type](&hierarchy[i], parent); 537 + 538 + /* 539 + * A NULL pointer means there is no children, hence we 540 + * continue without going deeper in the recursivity. 541 + */ 542 + if (!dtpm) 543 + continue; 544 + 545 + /* 546 + * There are multiple reasons why the callback could 547 + * fail. The generic glue is abstracting the backend 548 + * and therefore it is not possible to report back or 549 + * take a decision based on the error. 
In any case, 550 + * if this call fails, it is not critical in the 551 + * hierarchy creation, we can assume the underlying 552 + * service is not found, so we continue without this 553 + * branch in the tree but with a warning to log the 554 + * information the node was not created. 555 + */ 556 + if (IS_ERR(dtpm)) { 557 + pr_warn("Failed to create '%s' in the hierarchy\n", 558 + hierarchy[i].name); 559 + continue; 560 + } 561 + 562 + ret = dtpm_for_each_child(hierarchy, &hierarchy[i], dtpm); 563 + if (ret) 564 + return ret; 423 565 } 424 566 425 567 return 0; 426 568 } 427 - late_initcall(init_dtpm); 569 + 570 + /** 571 + * dtpm_create_hierarchy - Create the dtpm hierarchy 572 + * @hierarchy: An array of struct dtpm_node describing the hierarchy 573 + * 574 + * The function is called by the platform specific code with the 575 + * description of the different node in the hierarchy. It creates the 576 + * tree in the sysfs filesystem under the powercap dtpm entry. 577 + * 578 + * The expected tree has the format: 579 + * 580 + * struct dtpm_node hierarchy[] = { 581 + * [0] { .name = "topmost", type = DTPM_NODE_VIRTUAL }, 582 + * [1] { .name = "package", .type = DTPM_NODE_VIRTUAL, .parent = &hierarchy[0] }, 583 + * [2] { .name = "/cpus/cpu0", .type = DTPM_NODE_DT, .parent = &hierarchy[1] }, 584 + * [3] { .name = "/cpus/cpu1", .type = DTPM_NODE_DT, .parent = &hierarchy[1] }, 585 + * [4] { .name = "/cpus/cpu2", .type = DTPM_NODE_DT, .parent = &hierarchy[1] }, 586 + * [5] { .name = "/cpus/cpu3", .type = DTPM_NODE_DT, .parent = &hierarchy[1] }, 587 + * [6] { } 588 + * }; 589 + * 590 + * The last element is always an empty one and marks the end of the 591 + * array. 592 + * 593 + * Return: zero on success, a negative value in case of error. Errors 594 + * are reported back from the underlying functions. 
595 + */ 596 + int dtpm_create_hierarchy(struct of_device_id *dtpm_match_table) 597 + { 598 + const struct of_device_id *match; 599 + const struct dtpm_node *hierarchy; 600 + struct device_node *np; 601 + int i, ret; 602 + 603 + mutex_lock(&dtpm_lock); 604 + 605 + if (pct) { 606 + ret = -EBUSY; 607 + goto out_unlock; 608 + } 609 + 610 + pct = powercap_register_control_type(NULL, "dtpm", NULL); 611 + if (IS_ERR(pct)) { 612 + pr_err("Failed to register control type\n"); 613 + ret = PTR_ERR(pct); 614 + goto out_pct; 615 + } 616 + 617 + ret = -ENODEV; 618 + np = of_find_node_by_path("/"); 619 + if (!np) 620 + goto out_err; 621 + 622 + match = of_match_node(dtpm_match_table, np); 623 + 624 + of_node_put(np); 625 + 626 + if (!match) 627 + goto out_err; 628 + 629 + hierarchy = match->data; 630 + if (!hierarchy) { 631 + ret = -EFAULT; 632 + goto out_err; 633 + } 634 + 635 + ret = dtpm_for_each_child(hierarchy, NULL, NULL); 636 + if (ret) 637 + goto out_err; 638 + 639 + for (i = 0; i < ARRAY_SIZE(dtpm_subsys); i++) { 640 + 641 + if (!dtpm_subsys[i]->init) 642 + continue; 643 + 644 + ret = dtpm_subsys[i]->init(); 645 + if (ret) 646 + pr_info("Failed to initialize '%s': %d", 647 + dtpm_subsys[i]->name, ret); 648 + } 649 + 650 + mutex_unlock(&dtpm_lock); 651 + 652 + return 0; 653 + 654 + out_err: 655 + powercap_unregister_control_type(pct); 656 + out_pct: 657 + pct = NULL; 658 + out_unlock: 659 + mutex_unlock(&dtpm_lock); 660 + 661 + return ret; 662 + } 663 + EXPORT_SYMBOL_GPL(dtpm_create_hierarchy); 664 + 665 + static void __dtpm_destroy_hierarchy(struct dtpm *dtpm) 666 + { 667 + struct dtpm *child, *aux; 668 + 669 + list_for_each_entry_safe(child, aux, &dtpm->children, sibling) 670 + __dtpm_destroy_hierarchy(child); 671 + 672 + /* 673 + * At this point, we know all children were removed from the 674 + * recursive call before 675 + */ 676 + dtpm_unregister(dtpm); 677 + } 678 + 679 + void dtpm_destroy_hierarchy(void) 680 + { 681 + int i; 682 + 683 + mutex_lock(&dtpm_lock); 684 
+ 685 + if (!pct) 686 + goto out_unlock; 687 + 688 + __dtpm_destroy_hierarchy(root); 689 + 690 + 691 + for (i = 0; i < ARRAY_SIZE(dtpm_subsys); i++) { 692 + 693 + if (!dtpm_subsys[i]->exit) 694 + continue; 695 + 696 + dtpm_subsys[i]->exit(); 697 + } 698 + 699 + powercap_unregister_control_type(pct); 700 + 701 + pct = NULL; 702 + 703 + root = NULL; 704 + 705 + out_unlock: 706 + mutex_unlock(&dtpm_lock); 707 + } 708 + EXPORT_SYMBOL_GPL(dtpm_destroy_hierarchy);
+48 -7
drivers/powercap/dtpm_cpu.c
··· 21 21 #include <linux/cpuhotplug.h> 22 22 #include <linux/dtpm.h> 23 23 #include <linux/energy_model.h> 24 + #include <linux/of.h> 24 25 #include <linux/pm_qos.h> 25 26 #include <linux/slab.h> 26 27 #include <linux/units.h> ··· 151 150 static void pd_release(struct dtpm *dtpm) 152 151 { 153 152 struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm); 153 + struct cpufreq_policy *policy; 154 154 155 155 if (freq_qos_request_active(&dtpm_cpu->qos_req)) 156 156 freq_qos_remove_request(&dtpm_cpu->qos_req); 157 157 158 + policy = cpufreq_cpu_get(dtpm_cpu->cpu); 159 + if (policy) { 160 + for_each_cpu(dtpm_cpu->cpu, policy->related_cpus) 161 + per_cpu(dtpm_per_cpu, dtpm_cpu->cpu) = NULL; 162 + } 163 + 158 164 kfree(dtpm_cpu); 159 165 } 160 166 ··· 186 178 static int cpuhp_dtpm_cpu_online(unsigned int cpu) 187 179 { 188 180 struct dtpm_cpu *dtpm_cpu; 181 + 182 + dtpm_cpu = per_cpu(dtpm_per_cpu, cpu); 183 + if (dtpm_cpu) 184 + return dtpm_update_power(&dtpm_cpu->dtpm); 185 + 186 + return 0; 187 + } 188 + 189 + static int __dtpm_cpu_setup(int cpu, struct dtpm *parent) 190 + { 191 + struct dtpm_cpu *dtpm_cpu; 189 192 struct cpufreq_policy *policy; 190 193 struct em_perf_domain *pd; 191 194 char name[CPUFREQ_NAME_LEN]; 192 195 int ret = -ENOMEM; 196 + 197 + dtpm_cpu = per_cpu(dtpm_per_cpu, cpu); 198 + if (dtpm_cpu) 199 + return 0; 193 200 194 201 policy = cpufreq_cpu_get(cpu); 195 202 if (!policy) ··· 213 190 pd = em_cpu_get(cpu); 214 191 if (!pd) 215 192 return -EINVAL; 216 - 217 - dtpm_cpu = per_cpu(dtpm_per_cpu, cpu); 218 - if (dtpm_cpu) 219 - return dtpm_update_power(&dtpm_cpu->dtpm); 220 193 221 194 dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL); 222 195 if (!dtpm_cpu) ··· 226 207 227 208 snprintf(name, sizeof(name), "cpu%d-cpufreq", dtpm_cpu->cpu); 228 209 229 - ret = dtpm_register(name, &dtpm_cpu->dtpm, NULL); 210 + ret = dtpm_register(name, &dtpm_cpu->dtpm, parent); 230 211 if (ret) 231 212 goto out_kfree_dtpm_cpu; 232 213 ··· 250 231 return ret; 251 232 } 252 233 253 - 
static int __init dtpm_cpu_init(void) 234 + static int dtpm_cpu_setup(struct dtpm *dtpm, struct device_node *np) 235 + { 236 + int cpu; 237 + 238 + cpu = of_cpu_node_to_id(np); 239 + if (cpu < 0) 240 + return 0; 241 + 242 + return __dtpm_cpu_setup(cpu, dtpm); 243 + } 244 + 245 + static int dtpm_cpu_init(void) 254 246 { 255 247 int ret; 256 248 ··· 299 269 return 0; 300 270 } 301 271 302 - DTPM_DECLARE(dtpm_cpu, dtpm_cpu_init); 272 + static void dtpm_cpu_exit(void) 273 + { 274 + cpuhp_remove_state_nocalls(CPUHP_AP_ONLINE_DYN); 275 + cpuhp_remove_state_nocalls(CPUHP_AP_DTPM_CPU_DEAD); 276 + } 277 + 278 + struct dtpm_subsys_ops dtpm_cpu_ops = { 279 + .name = KBUILD_MODNAME, 280 + .init = dtpm_cpu_init, 281 + .exit = dtpm_cpu_exit, 282 + .setup = dtpm_cpu_setup, 283 + };
+203
drivers/powercap/dtpm_devfreq.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright 2021 Linaro Limited 4 + * 5 + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> 6 + * 7 + * The devfreq device combined with the energy model and the load can 8 + * give an estimation of the power consumption as well as limiting the 9 + * power. 10 + * 11 + */ 12 + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 13 + 14 + #include <linux/cpumask.h> 15 + #include <linux/devfreq.h> 16 + #include <linux/dtpm.h> 17 + #include <linux/energy_model.h> 18 + #include <linux/of.h> 19 + #include <linux/pm_qos.h> 20 + #include <linux/slab.h> 21 + #include <linux/units.h> 22 + 23 + struct dtpm_devfreq { 24 + struct dtpm dtpm; 25 + struct dev_pm_qos_request qos_req; 26 + struct devfreq *devfreq; 27 + }; 28 + 29 + static struct dtpm_devfreq *to_dtpm_devfreq(struct dtpm *dtpm) 30 + { 31 + return container_of(dtpm, struct dtpm_devfreq, dtpm); 32 + } 33 + 34 + static int update_pd_power_uw(struct dtpm *dtpm) 35 + { 36 + struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm); 37 + struct devfreq *devfreq = dtpm_devfreq->devfreq; 38 + struct device *dev = devfreq->dev.parent; 39 + struct em_perf_domain *pd = em_pd_get(dev); 40 + 41 + dtpm->power_min = pd->table[0].power; 42 + dtpm->power_min *= MICROWATT_PER_MILLIWATT; 43 + 44 + dtpm->power_max = pd->table[pd->nr_perf_states - 1].power; 45 + dtpm->power_max *= MICROWATT_PER_MILLIWATT; 46 + 47 + return 0; 48 + } 49 + 50 + static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit) 51 + { 52 + struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm); 53 + struct devfreq *devfreq = dtpm_devfreq->devfreq; 54 + struct device *dev = devfreq->dev.parent; 55 + struct em_perf_domain *pd = em_pd_get(dev); 56 + unsigned long freq; 57 + u64 power; 58 + int i; 59 + 60 + for (i = 0; i < pd->nr_perf_states; i++) { 61 + 62 + power = pd->table[i].power * MICROWATT_PER_MILLIWATT; 63 + if (power > power_limit) 64 + break; 65 + } 66 + 67 + freq = pd->table[i - 1].frequency; 
68 + 69 + dev_pm_qos_update_request(&dtpm_devfreq->qos_req, freq); 70 + 71 + power_limit = pd->table[i - 1].power * MICROWATT_PER_MILLIWATT; 72 + 73 + return power_limit; 74 + } 75 + 76 + static void _normalize_load(struct devfreq_dev_status *status) 77 + { 78 + if (status->total_time > 0xfffff) { 79 + status->total_time >>= 10; 80 + status->busy_time >>= 10; 81 + } 82 + 83 + status->busy_time <<= 10; 84 + status->busy_time /= status->total_time ? : 1; 85 + 86 + status->busy_time = status->busy_time ? : 1; 87 + status->total_time = 1024; 88 + } 89 + 90 + static u64 get_pd_power_uw(struct dtpm *dtpm) 91 + { 92 + struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm); 93 + struct devfreq *devfreq = dtpm_devfreq->devfreq; 94 + struct device *dev = devfreq->dev.parent; 95 + struct em_perf_domain *pd = em_pd_get(dev); 96 + struct devfreq_dev_status status; 97 + unsigned long freq; 98 + u64 power; 99 + int i; 100 + 101 + mutex_lock(&devfreq->lock); 102 + status = devfreq->last_status; 103 + mutex_unlock(&devfreq->lock); 104 + 105 + freq = DIV_ROUND_UP(status.current_frequency, HZ_PER_KHZ); 106 + _normalize_load(&status); 107 + 108 + for (i = 0; i < pd->nr_perf_states; i++) { 109 + 110 + if (pd->table[i].frequency < freq) 111 + continue; 112 + 113 + power = pd->table[i].power * MICROWATT_PER_MILLIWATT; 114 + power *= status.busy_time; 115 + power >>= 10; 116 + 117 + return power; 118 + } 119 + 120 + return 0; 121 + } 122 + 123 + static void pd_release(struct dtpm *dtpm) 124 + { 125 + struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm); 126 + 127 + if (dev_pm_qos_request_active(&dtpm_devfreq->qos_req)) 128 + dev_pm_qos_remove_request(&dtpm_devfreq->qos_req); 129 + 130 + kfree(dtpm_devfreq); 131 + } 132 + 133 + static struct dtpm_ops dtpm_ops = { 134 + .set_power_uw = set_pd_power_limit, 135 + .get_power_uw = get_pd_power_uw, 136 + .update_power_uw = update_pd_power_uw, 137 + .release = pd_release, 138 + }; 139 + 140 + static int __dtpm_devfreq_setup(struct devfreq 
*devfreq, struct dtpm *parent) 141 + { 142 + struct device *dev = devfreq->dev.parent; 143 + struct dtpm_devfreq *dtpm_devfreq; 144 + struct em_perf_domain *pd; 145 + int ret = -ENOMEM; 146 + 147 + pd = em_pd_get(dev); 148 + if (!pd) { 149 + ret = dev_pm_opp_of_register_em(dev, NULL); 150 + if (ret) { 151 + pr_err("No energy model available for '%s'\n", dev_name(dev)); 152 + return -EINVAL; 153 + } 154 + } 155 + 156 + dtpm_devfreq = kzalloc(sizeof(*dtpm_devfreq), GFP_KERNEL); 157 + if (!dtpm_devfreq) 158 + return -ENOMEM; 159 + 160 + dtpm_init(&dtpm_devfreq->dtpm, &dtpm_ops); 161 + 162 + dtpm_devfreq->devfreq = devfreq; 163 + 164 + ret = dtpm_register(dev_name(dev), &dtpm_devfreq->dtpm, parent); 165 + if (ret) { 166 + pr_err("Failed to register '%s': %d\n", dev_name(dev), ret); 167 + kfree(dtpm_devfreq); 168 + return ret; 169 + } 170 + 171 + ret = dev_pm_qos_add_request(dev, &dtpm_devfreq->qos_req, 172 + DEV_PM_QOS_MAX_FREQUENCY, 173 + PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE); 174 + if (ret) { 175 + pr_err("Failed to add QoS request: %d\n", ret); 176 + goto out_dtpm_unregister; 177 + } 178 + 179 + dtpm_update_power(&dtpm_devfreq->dtpm); 180 + 181 + return 0; 182 + 183 + out_dtpm_unregister: 184 + dtpm_unregister(&dtpm_devfreq->dtpm); 185 + 186 + return ret; 187 + } 188 + 189 + static int dtpm_devfreq_setup(struct dtpm *dtpm, struct device_node *np) 190 + { 191 + struct devfreq *devfreq; 192 + 193 + devfreq = devfreq_get_devfreq_by_node(np); 194 + if (IS_ERR(devfreq)) 195 + return 0; 196 + 197 + return __dtpm_devfreq_setup(devfreq, dtpm); 198 + } 199 + 200 + struct dtpm_subsys_ops dtpm_devfreq_ops = { 201 + .name = KBUILD_MODNAME, 202 + .setup = dtpm_devfreq_setup, 203 + };
+22
drivers/powercap/dtpm_subsys.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * Copyright (C) 2022 Linaro Ltd 4 + * 5 + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> 6 + */ 7 + #ifndef ___DTPM_SUBSYS_H__ 8 + #define ___DTPM_SUBSYS_H__ 9 + 10 + extern struct dtpm_subsys_ops dtpm_cpu_ops; 11 + extern struct dtpm_subsys_ops dtpm_devfreq_ops; 12 + 13 + struct dtpm_subsys_ops *dtpm_subsys[] = { 14 + #ifdef CONFIG_DTPM_CPU 15 + &dtpm_cpu_ops, 16 + #endif 17 + #ifdef CONFIG_DTPM_DEVFREQ 18 + &dtpm_devfreq_ops, 19 + #endif 20 + }; 21 + 22 + #endif
+8
drivers/soc/rockchip/Kconfig
··· 34 34 35 35 If unsure, say N. 36 36 37 + config ROCKCHIP_DTPM 38 + tristate "Rockchip DTPM hierarchy" 39 + depends on DTPM && m 40 + help 41 + Describe the hierarchy for the Dynamic Thermal Power 42 + Management tree on this platform. That will create all the 43 + power capping capable devices. 44 + 37 45 endif
+1
drivers/soc/rockchip/Makefile
··· 5 5 obj-$(CONFIG_ROCKCHIP_GRF) += grf.o 6 6 obj-$(CONFIG_ROCKCHIP_IODOMAIN) += io-domain.o 7 7 obj-$(CONFIG_ROCKCHIP_PM_DOMAINS) += pm_domains.o 8 + obj-$(CONFIG_ROCKCHIP_DTPM) += dtpm.o
+65
drivers/soc/rockchip/dtpm.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright 2021 Linaro Limited 4 + * 5 + * Author: Daniel Lezcano <daniel.lezcano@linaro.org> 6 + * 7 + * DTPM hierarchy description 8 + */ 9 + #include <linux/dtpm.h> 10 + #include <linux/module.h> 11 + #include <linux/of.h> 12 + #include <linux/platform_device.h> 13 + 14 + static struct dtpm_node __initdata rk3399_hierarchy[] = { 15 + [0]{ .name = "rk3399", 16 + .type = DTPM_NODE_VIRTUAL }, 17 + [1]{ .name = "package", 18 + .type = DTPM_NODE_VIRTUAL, 19 + .parent = &rk3399_hierarchy[0] }, 20 + [2]{ .name = "/cpus/cpu@0", 21 + .type = DTPM_NODE_DT, 22 + .parent = &rk3399_hierarchy[1] }, 23 + [3]{ .name = "/cpus/cpu@1", 24 + .type = DTPM_NODE_DT, 25 + .parent = &rk3399_hierarchy[1] }, 26 + [4]{ .name = "/cpus/cpu@2", 27 + .type = DTPM_NODE_DT, 28 + .parent = &rk3399_hierarchy[1] }, 29 + [5]{ .name = "/cpus/cpu@3", 30 + .type = DTPM_NODE_DT, 31 + .parent = &rk3399_hierarchy[1] }, 32 + [6]{ .name = "/cpus/cpu@100", 33 + .type = DTPM_NODE_DT, 34 + .parent = &rk3399_hierarchy[1] }, 35 + [7]{ .name = "/cpus/cpu@101", 36 + .type = DTPM_NODE_DT, 37 + .parent = &rk3399_hierarchy[1] }, 38 + [8]{ .name = "/gpu@ff9a0000", 39 + .type = DTPM_NODE_DT, 40 + .parent = &rk3399_hierarchy[1] }, 41 + [9]{ /* sentinel */ } 42 + }; 43 + 44 + static struct of_device_id __initdata rockchip_dtpm_match_table[] = { 45 + { .compatible = "rockchip,rk3399", .data = rk3399_hierarchy }, 46 + {}, 47 + }; 48 + 49 + static int __init rockchip_dtpm_init(void) 50 + { 51 + return dtpm_create_hierarchy(rockchip_dtpm_match_table); 52 + } 53 + module_init(rockchip_dtpm_init); 54 + 55 + static void __exit rockchip_dtpm_exit(void) 56 + { 57 + return dtpm_destroy_hierarchy(); 58 + } 59 + module_exit(rockchip_dtpm_exit); 60 + 61 + MODULE_SOFTDEP("pre: panfrost cpufreq-dt"); 62 + MODULE_DESCRIPTION("Rockchip DTPM driver"); 63 + MODULE_LICENSE("GPL"); 64 + MODULE_ALIAS("platform:dtpm"); 65 + MODULE_AUTHOR("Daniel Lezcano <daniel.lezcano@kernel.org");
+2 -2
drivers/usb/core/hcd-pci.c
···
446 446         HCD_WAKEUP_PENDING(hcd->shared_hcd))
447 447         return -EBUSY;
448 448     retval = hcd->driver->pci_suspend(hcd, do_wakeup);
449 -     suspend_report_result(hcd->driver->pci_suspend, retval);
449 +     suspend_report_result(dev, hcd->driver->pci_suspend, retval);
450 450
451 451     /* Check again in case wakeup raced with pci_suspend */
452 452     if ((retval == 0 && do_wakeup && HCD_WAKEUP_PENDING(hcd)) ||
···
556 556         dev_dbg(dev, "--> PCI %s\n",
557 557                 pci_power_name(pci_dev->current_state));
558 558     } else {
559 -         suspend_report_result(pci_prepare_to_sleep, retval);
559 +         suspend_report_result(dev, pci_prepare_to_sleep, retval);
560 560         return retval;
561 561     }
562 562
-11
include/asm-generic/vmlinux.lds.h
···
321 321 #define THERMAL_TABLE(name)
322 322 #endif
323 323
324 - #ifdef CONFIG_DTPM
325 - #define DTPM_TABLE() \
326 -     . = ALIGN(8); \
327 -     __dtpm_table = .; \
328 -     KEEP(*(__dtpm_table)) \
329 -     __dtpm_table_end = .;
330 - #else
331 - #define DTPM_TABLE()
332 - #endif
333 -
334 324 #define KERNEL_DTB() \
335 325     STRUCT_ALIGN(); \
336 326     __dtb_start = .;
···
713 723     ACPI_PROBE_TABLE(irqchip) \
714 724     ACPI_PROBE_TABLE(timer) \
715 725     THERMAL_TABLE(governor) \
716 -     DTPM_TABLE() \
717 726     EARLYCON_TABLE() \
718 727     LSM_TABLE() \
719 728     EARLY_LSM_TABLE()
+1 -1
include/linux/acpi.h
···
526 526 int acpi_resources_are_enforced(void);
527 527
528 528 #ifdef CONFIG_HIBERNATION
529 - void __init acpi_check_s4_hw_signature(int check);
529 + extern int acpi_check_s4_hw_signature;
530 530 #endif
531 531
532 532 #ifdef CONFIG_PM_SLEEP
+5
include/linux/cpufreq.h
···
661 661 /* sysfs ops for cpufreq governors */
662 662 extern const struct sysfs_ops governor_sysfs_ops;
663 663
664 + static inline struct gov_attr_set *to_gov_attr_set(struct kobject *kobj)
665 + {
666 +     return container_of(kobj, struct gov_attr_set, kobj);
667 + }
668 +
664 669 void gov_attr_set_init(struct gov_attr_set *attr_set, struct list_head *list_node);
665 670 void gov_attr_set_get(struct gov_attr_set *attr_set, struct list_head *list_node);
666 671 unsigned int gov_attr_set_put(struct gov_attr_set *attr_set, struct list_head *list_node);
+18 -18
include/linux/dtpm.h
···
32 32     void (*release)(struct dtpm *);
33 33 };
34 34
35 - typedef int (*dtpm_init_t)(void);
35 + struct device_node;
36 36
37 - struct dtpm_descr {
38 -     dtpm_init_t init;
37 + struct dtpm_subsys_ops {
38 +     const char *name;
39 +     int (*init)(void);
40 +     void (*exit)(void);
41 +     int (*setup)(struct dtpm *, struct device_node *);
39 42 };
40 43
41 - /* Init section thermal table */
42 - extern struct dtpm_descr __dtpm_table[];
43 - extern struct dtpm_descr __dtpm_table_end[];
44 + enum DTPM_NODE_TYPE {
45 +     DTPM_NODE_VIRTUAL = 0,
46 +     DTPM_NODE_DT,
47 + };
44 48
45 - #define DTPM_TABLE_ENTRY(name, __init) \
46 -     static struct dtpm_descr __dtpm_table_entry_##name \
47 -         __used __section("__dtpm_table") = { \
48 -         .init = __init, \
49 -     }
50 -
51 - #define DTPM_DECLARE(name, init) DTPM_TABLE_ENTRY(name, init)
52 -
53 - #define for_each_dtpm_table(__dtpm) \
54 -     for (__dtpm = __dtpm_table; \
55 -          __dtpm < __dtpm_table_end; \
56 -          __dtpm++)
49 + struct dtpm_node {
50 +     enum DTPM_NODE_TYPE type;
51 +     const char *name;
52 +     struct dtpm_node *parent;
53 + };
57 54
58 55 static inline struct dtpm *to_dtpm(struct powercap_zone *zone)
59 56 {
···
67 70
68 71 int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent);
69 72
73 + int dtpm_create_hierarchy(struct of_device_id *dtpm_match_table);
74 +
75 + void dtpm_destroy_hierarchy(void);
70 76 #endif
+4 -4
include/linux/pm.h
···
770 770 extern int dpm_suspend(pm_message_t state);
771 771 extern int dpm_prepare(pm_message_t state);
772 772
773 - extern void __suspend_report_result(const char *function, void *fn, int ret);
773 + extern void __suspend_report_result(const char *function, struct device *dev, void *fn, int ret);
774 774
775 - #define suspend_report_result(fn, ret) \
775 + #define suspend_report_result(dev, fn, ret) \
776 776 do { \
777 -     __suspend_report_result(__func__, fn, ret); \
777 +     __suspend_report_result(__func__, dev, fn, ret); \
778 778 } while (0)
779 779
780 780 extern int device_pm_wait_for_dev(struct device *sub, struct device *dev);
···
814 814     return 0;
815 815 }
816 816
817 - #define suspend_report_result(fn, ret) do {} while (0)
817 + #define suspend_report_result(dev, fn, ret) do {} while (0)
818 818
819 819 static inline int device_pm_wait_for_dev(struct device *a, struct device *b)
820 820 {
+4
include/linux/pm_runtime.h
···
567 567  * Allow the runtime PM autosuspend mechanism to be used for @dev whenever
568 568  * requested (or "autosuspend" will be handled as direct runtime-suspend for
569 569  * it).
570 +  *
571 +  * NOTE: It's important to undo this with pm_runtime_dont_use_autosuspend()
572 +  * at driver exit time unless your driver initially enabled pm_runtime
573 +  * with devm_pm_runtime_enable() (which handles it for you).
570 574  */
571 575 static inline void pm_runtime_use_autosuspend(struct device *dev)
572 576 {
+4 -2
kernel/power/hibernate.c
···
689 689
690 690     lock_device_hotplug();
691 691     error = create_basic_memory_bitmaps();
692 -     if (error)
692 +     if (error) {
693 +         swsusp_close(FMODE_READ | FMODE_EXCL);
693 694         goto Unlock;
695 +     }
694 696
695 697     error = swsusp_read(&flags);
696 698     swsusp_close(FMODE_READ | FMODE_EXCL);
···
1330 1328     int rc = kstrtouint(str, 0, &resume_delay);
1331 1329
1332 1330     if (rc)
1333 -         return rc;
1331 +         pr_warn("resumedelay: bad option string '%s'\n", str);
1334 1332     return 1;
1335 1333 }
1336 1334
+4 -4
kernel/power/suspend_test.c
···
157 157     value++;
158 158     suspend_type = strsep(&value, ",");
159 159     if (!suspend_type)
160 -         return 0;
160 +         return 1;
161 161
162 162     repeat = strsep(&value, ",");
163 163     if (repeat) {
164 164         if (kstrtou32(repeat, 0, &test_repeat_count_max))
165 -             return 0;
165 +             return 1;
166 166     }
167 167
168 168     for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++)
169 169         if (!strcmp(pm_labels[i], suspend_type)) {
170 170             test_state_label = pm_labels[i];
171 -             return 0;
171 +             return 1;
172 172         }
173 173
174 174     printk(warn_bad_state, suspend_type);
175 -     return 0;
175 +     return 1;
176 176 }
177 177 __setup("test_suspend", setup_test_suspend);
178 178
+4 -4
kernel/power/swap.c
···
89 89     struct swap_map_page_list *next;
90 90 };
91 91
92 - /**
92 + /*
93 93  * The swap_map_handle structure is used for handling swap in
94 94  * a file-alike way
95 95  */
···
117 117
118 118 static struct swsusp_header *swsusp_header;
119 119
120 - /**
120 + /*
121 121  * The following functions are used for tracing the allocated
122 122  * swap pages, so that they can be freed in case of an error.
123 123  */
···
171 171     return 0;
172 172 }
173 173
174 - /**
174 + /*
175 175  * alloc_swapdev_block - allocate a swap page and register that it has
176 176  * been allocated, so that it can be freed in case of an error.
177 177  */
···
190 190     return 0;
191 191 }
192 192
193 - /**
193 + /*
194 194  * free_all_swap_pages - free swap pages allocated for saving image data.
195 195  * It also frees the extents used to register which swap entries had been
196 196  * allocated.
+1 -1
kernel/sched/cpufreq_schedutil.c
···
539 539
540 540 static void sugov_tunables_free(struct kobject *kobj)
541 541 {
542 -     struct gov_attr_set *attr_set = container_of(kobj, struct gov_attr_set, kobj);
542 +     struct gov_attr_set *attr_set = to_gov_attr_set(kobj);
543 543
544 544     kfree(to_sugov_tunables(attr_set));
545 545 }
+3 -3
tools/power/cpupower/Makefile
···
143 143     utils/helpers/bitmask.h \
144 144     utils/idle_monitor/idle_monitors.h utils/idle_monitor/idle_monitors.def
145 145
146 - LIB_HEADERS = lib/cpufreq.h lib/cpupower.h lib/cpuidle.h
147 - LIB_SRC = lib/cpufreq.c lib/cpupower.c lib/cpuidle.c
148 - LIB_OBJS = lib/cpufreq.o lib/cpupower.o lib/cpuidle.o
146 + LIB_HEADERS = lib/cpufreq.h lib/cpupower.h lib/cpuidle.h lib/acpi_cppc.h
147 + LIB_SRC = lib/cpufreq.c lib/cpupower.c lib/cpuidle.c lib/acpi_cppc.c
148 + LIB_OBJS = lib/cpufreq.o lib/cpupower.o lib/cpuidle.o lib/acpi_cppc.o
149 149 LIB_OBJS := $(addprefix $(OUTPUT),$(LIB_OBJS))
150 150
151 151 override CFLAGS += -pipe
tools/power/cpupower/ToDo tools/power/cpupower/TODO
+59
tools/power/cpupower/lib/acpi_cppc.c
···
1 + // SPDX-License-Identifier: GPL-2.0-only
2 +
3 + #include <stdio.h>
4 + #include <errno.h>
5 + #include <stdlib.h>
6 + #include <string.h>
7 + #include <sys/types.h>
8 + #include <sys/stat.h>
9 + #include <fcntl.h>
10 + #include <unistd.h>
11 +
12 + #include "cpupower_intern.h"
13 + #include "acpi_cppc.h"
14 +
15 + /* ACPI CPPC sysfs access ***********************************************/
16 +
17 + static int acpi_cppc_read_file(unsigned int cpu, const char *fname,
18 +                                char *buf, size_t buflen)
19 + {
20 +     char path[SYSFS_PATH_MAX];
21 +
22 +     snprintf(path, sizeof(path), PATH_TO_CPU "cpu%u/acpi_cppc/%s",
23 +              cpu, fname);
24 +     return cpupower_read_sysfs(path, buf, buflen);
25 + }
26 +
27 + static const char * const acpi_cppc_value_files[] = {
28 +     [HIGHEST_PERF] = "highest_perf",
29 +     [LOWEST_PERF] = "lowest_perf",
30 +     [NOMINAL_PERF] = "nominal_perf",
31 +     [LOWEST_NONLINEAR_PERF] = "lowest_nonlinear_perf",
32 +     [LOWEST_FREQ] = "lowest_freq",
33 +     [NOMINAL_FREQ] = "nominal_freq",
34 +     [REFERENCE_PERF] = "reference_perf",
35 +     [WRAPAROUND_TIME] = "wraparound_time"
36 + };
37 +
38 + unsigned long acpi_cppc_get_data(unsigned int cpu, enum acpi_cppc_value which)
39 + {
40 +     unsigned long long value;
41 +     unsigned int len;
42 +     char linebuf[MAX_LINE_LEN];
43 +     char *endp;
44 +
45 +     if (which >= MAX_CPPC_VALUE_FILES)
46 +         return 0;
47 +
48 +     len = acpi_cppc_read_file(cpu, acpi_cppc_value_files[which],
49 +                               linebuf, sizeof(linebuf));
50 +     if (len == 0)
51 +         return 0;
52 +
53 +     value = strtoull(linebuf, &endp, 0);
54 +
55 +     if (endp == linebuf || errno == ERANGE)
56 +         return 0;
57 +
58 +     return value;
59 + }
+21
tools/power/cpupower/lib/acpi_cppc.h
···
1 + /* SPDX-License-Identifier: GPL-2.0-only */
2 +
3 + #ifndef __ACPI_CPPC_H__
4 + #define __ACPI_CPPC_H__
5 +
6 + enum acpi_cppc_value {
7 +     HIGHEST_PERF,
8 +     LOWEST_PERF,
9 +     NOMINAL_PERF,
10 +     LOWEST_NONLINEAR_PERF,
11 +     LOWEST_FREQ,
12 +     NOMINAL_FREQ,
13 +     REFERENCE_PERF,
14 +     WRAPAROUND_TIME,
15 +     MAX_CPPC_VALUE_FILES
16 + };
17 +
18 + unsigned long acpi_cppc_get_data(unsigned int cpu,
19 +                                  enum acpi_cppc_value which);
20 +
21 + #endif /* __ACPI_CPPC_H__ */
+16 -7
tools/power/cpupower/lib/cpufreq.c
···
83 83     [STATS_NUM_TRANSITIONS] = "stats/total_trans"
84 84 };
85 85
86 -
87 - static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu,
88 -                                                  enum cpufreq_value which)
86 + unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu,
87 +                                                  const char **table,
88 +                                                  unsigned int index,
89 +                                                  unsigned int size)
89 90 {
90 91     unsigned long value;
91 92     unsigned int len;
92 93     char linebuf[MAX_LINE_LEN];
93 94     char *endp;
94 95
95 -     if (which >= MAX_CPUFREQ_VALUE_READ_FILES)
96 +     if (!table || index >= size || !table[index])
96 97         return 0;
97 98
98 -     len = sysfs_cpufreq_read_file(cpu, cpufreq_value_files[which],
99 -                                   linebuf, sizeof(linebuf));
99 +     len = sysfs_cpufreq_read_file(cpu, table[index], linebuf,
100 +                                   sizeof(linebuf));
100 101
101 102     if (len == 0)
102 103         return 0;
···
108 107         return 0;
109 108
110 109     return value;
110 + }
111 +
112 + static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu,
113 +                                                  enum cpufreq_value which)
114 + {
115 +     return cpufreq_get_sysfs_value_from_table(cpu, cpufreq_value_files,
116 +                                               which,
117 +                                               MAX_CPUFREQ_VALUE_READ_FILES);
111 118 }
112 119
113 120 /* read access to files which contain one string */
···
133 124
134 125
135 126 static char *sysfs_cpufreq_get_one_string(unsigned int cpu,
136 -                                           enum cpufreq_string which)
127 +                                           enum cpufreq_string which)
137 128 {
138 129     char linebuf[MAX_LINE_LEN];
139 130     char *result;
+12
tools/power/cpupower/lib/cpufreq.h
···
203 203 int cpufreq_set_frequency(unsigned int cpu,
204 204                           unsigned long target_frequency);
205 205
206 + /*
207 +  * get the sysfs value from a specific table
208 +  *
209 +  * Read the value of the sysfs file named in the given table entry.
210 +  * Only works if the cpufreq driver provides the corresponding sysfs
211 +  * interface.
212 +  */
213 +
214 + unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu,
215 +                                                  const char **table,
216 +                                                  unsigned int index,
217 +                                                  unsigned int size);
218 +
206 219 #ifdef __cplusplus
207 220 }
208 221 #endif
+3
tools/power/cpupower/man/cpupower-frequency-info.1
···
53 53 \fB\-n\fR \fB\-\-no-rounding\fR
54 54 Output frequencies and latencies without rounding off values.
55 55 .TP
56 + \fB\-c\fR \fB\-\-perf\fR
57 + Show the performance and frequency capabilities reported by ACPI CPPC, read from the hardware (only available on hardware with CPPC support).
58 + .TP
56 59 .SH "REMARKS"
57 60 .LP
58 61 By default only values of core zero are displayed. How to display settings of
+1 -1
tools/power/cpupower/man/cpupower-idle-set.1
···
4 4 cpupower\-idle\-set \- Utility to set cpu idle state specific kernel options
5 5 .SH "SYNTAX"
6 6 .LP
7 - cpupower [ \-c cpulist ] idle\-info [\fIoptions\fP]
7 + cpupower [ \-c cpulist ] idle\-set [\fIoptions\fP]
8 8 .SH "DESCRIPTION"
9 9 .LP
10 10 The cpupower idle\-set subcommand allows to set cpu idle, also called cpu
+35 -52
tools/power/cpupower/utils/cpufreq-info.c
···
84 84 }
85 85
86 86 static int no_rounding;
87 - static void print_speed(unsigned long speed)
88 - {
89 -     unsigned long tmp;
90 -
91 -     if (no_rounding) {
92 -         if (speed > 1000000)
93 -             printf("%u.%06u GHz", ((unsigned int) speed/1000000),
94 -                    ((unsigned int) speed%1000000));
95 -         else if (speed > 1000)
96 -             printf("%u.%03u MHz", ((unsigned int) speed/1000),
97 -                    (unsigned int) (speed%1000));
98 -         else
99 -             printf("%lu kHz", speed);
100 -     } else {
101 -         if (speed > 1000000) {
102 -             tmp = speed%10000;
103 -             if (tmp >= 5000)
104 -                 speed += 10000;
105 -             printf("%u.%02u GHz", ((unsigned int) speed/1000000),
106 -                    ((unsigned int) (speed%1000000)/10000));
107 -         } else if (speed > 100000) {
108 -             tmp = speed%1000;
109 -             if (tmp >= 500)
110 -                 speed += 1000;
111 -             printf("%u MHz", ((unsigned int) speed/1000));
112 -         } else if (speed > 1000) {
113 -             tmp = speed%100;
114 -             if (tmp >= 50)
115 -                 speed += 100;
116 -             printf("%u.%01u MHz", ((unsigned int) speed/1000),
117 -                    ((unsigned int) (speed%1000)/100));
118 -         }
119 -     }
120 -
121 -     return;
122 - }
123 -
124 87 static void print_duration(unsigned long duration)
125 88 {
126 89     unsigned long tmp;
···
146 183     printf(_(" Supported: %s\n"), support ? _("yes") : _("no"));
147 184     printf(_(" Active: %s\n"), active ? _("yes") : _("no"));
148 185
149 -     if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
150 -          cpupower_cpu_info.family >= 0x10) ||
151 -          cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
186 +     if (cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
187 +         cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
188 +         return 0;
189 +     } else if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
190 +                 cpupower_cpu_info.family >= 0x10) ||
191 +                cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
152 192         ret = decode_pstates(cpu, b_states, pstates, &pstate_no);
153 193         if (ret)
154 194             return ret;
···
220 254     if (freqs) {
221 255         printf(_(" boost frequency steps: "));
222 256         while (freqs->next) {
223 -             print_speed(freqs->frequency);
257 +             print_speed(freqs->frequency, no_rounding);
224 258             printf(", ");
225 259             freqs = freqs->next;
226 260         }
227 -         print_speed(freqs->frequency);
261 +         print_speed(freqs->frequency, no_rounding);
228 262         printf("\n");
229 263         cpufreq_put_available_frequencies(freqs);
230 264     }
···
243 277         return -EINVAL;
244 278     }
245 279     if (human) {
246 -         print_speed(freq);
280 +         print_speed(freq, no_rounding);
247 281     } else
248 282         printf("%lu", freq);
249 283     printf(_(" (asserted by call to kernel)\n"));
···
262 296         return -EINVAL;
263 297     }
264 298     if (human) {
265 -         print_speed(freq);
299 +         print_speed(freq, no_rounding);
266 300     } else
267 301         printf("%lu", freq);
268 302     printf(_(" (asserted by call to hardware)\n"));
···
282 316
283 317     if (human) {
284 318         printf(_(" hardware limits: "));
285 -         print_speed(min);
319 +         print_speed(min, no_rounding);
286 320         printf(" - ");
287 -         print_speed(max);
321 +         print_speed(max, no_rounding);
288 322         printf("\n");
289 323     } else {
290 324         printf("%lu %lu\n", min, max);
···
316 350         return -EINVAL;
317 351     }
318 352     printf(_(" current policy: frequency should be within "));
319 353     print_speed(policy->min, no_rounding);
320 354     printf(_(" and "));
321 355 +   print_speed(policy->max, no_rounding);
322 356
323 357     printf(".\n              ");
324 358     printf(_("The governor \"%s\" may decide which speed to use\n"
···
402 436     struct cpufreq_stats *stats = cpufreq_get_stats(cpu, &total_time);
403 437     while (stats) {
404 438         if (human) {
405 -             print_speed(stats->frequency);
439 +             print_speed(stats->frequency, no_rounding);
406 440             printf(":%.2f%%",
407 441                    (100.0 * stats->time_in_state) / total_time);
408 442         } else
···
438 472     return 0;
439 473 }
440 474
475 + /* --performance / -c */
476 +
477 + static int get_perf_cap(unsigned int cpu)
478 + {
479 +     if (cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
480 +         cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE)
481 +         amd_pstate_show_perf_and_freq(cpu, no_rounding);
482 +
483 +     return 0;
484 + }
485 +
441 486 static void debug_output_one(unsigned int cpu)
442 487 {
443 488     struct cpufreq_available_frequencies *freqs;
···
463 486     if (freqs) {
464 487         printf(_(" available frequency steps:  "));
465 488         while (freqs->next) {
466 -             print_speed(freqs->frequency);
489 +             print_speed(freqs->frequency, no_rounding);
467 490             printf(", ");
468 491             freqs = freqs->next;
469 492         }
470 -         print_speed(freqs->frequency);
493 +         print_speed(freqs->frequency, no_rounding);
471 494         printf("\n");
472 495         cpufreq_put_available_frequencies(freqs);
473 496     }
···
477 500     if (get_freq_hardware(cpu, 1) < 0)
478 501         get_freq_kernel(cpu, 1);
479 502     get_boost_mode(cpu);
503 +     get_perf_cap(cpu);
480 504 }
481 505
482 506 static struct option info_opts[] = {
···
496 518     {"proc", no_argument, NULL, 'o'},
497 519     {"human", no_argument, NULL, 'm'},
498 520     {"no-rounding", no_argument, NULL, 'n'},
521 +     {"performance", no_argument, NULL, 'c'},
499 522     { },
500 523 };
···
510 531     int output_param = 0;
511 532
512 533     do {
513 -         ret = getopt_long(argc, argv, "oefwldpgrasmybn", info_opts,
534 +         ret = getopt_long(argc, argv, "oefwldpgrasmybnc", info_opts,
514 535                           NULL);
515 536         switch (ret) {
516 537         case '?':
···
533 554         case 'e':
534 555         case 's':
535 556         case 'y':
557 +         case 'c':
536 558             if (output_param) {
537 559                 output_param = -1;
538 560                 cont = 0;
···
639 659         break;
640 660     case 'y':
641 661         ret = get_latency(cpu, human);
662 +         break;
663 +     case 'c':
664 +         ret = get_perf_cap(cpu);
642 665         break;
643 666     }
644 667     if (ret)
+77
tools/power/cpupower/utils/helpers/amd.c
···
8 8 #include <pci/pci.h>
9 9
10 10 #include "helpers/helpers.h"
11 + #include "cpufreq.h"
12 + #include "acpi_cppc.h"
11 13
14 + /* ACPI P-States Helper Functions for AMD Processors ***************/
12 15 #define MSR_AMD_PSTATE_STATUS 0xc0010063
13 16 #define MSR_AMD_PSTATE 0xc0010064
14 17 #define MSR_AMD_PSTATE_LIMIT 0xc0010061
···
149 146     pci_cleanup(pci_acc);
150 147     return 0;
151 148 }
149 +
150 + /* ACPI P-States Helper Functions for AMD Processors ***************/
151 +
152 + /* AMD P-State Helper Functions ************************************/
153 + enum amd_pstate_value {
154 +     AMD_PSTATE_HIGHEST_PERF,
155 +     AMD_PSTATE_MAX_FREQ,
156 +     AMD_PSTATE_LOWEST_NONLINEAR_FREQ,
157 +     MAX_AMD_PSTATE_VALUE_READ_FILES,
158 + };
159 +
160 + static const char *amd_pstate_value_files[MAX_AMD_PSTATE_VALUE_READ_FILES] = {
161 +     [AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
162 +     [AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq",
163 +     [AMD_PSTATE_LOWEST_NONLINEAR_FREQ] = "amd_pstate_lowest_nonlinear_freq",
164 + };
165 +
166 + static unsigned long amd_pstate_get_data(unsigned int cpu,
167 +                                          enum amd_pstate_value value)
168 + {
169 +     return cpufreq_get_sysfs_value_from_table(cpu,
170 +                                               amd_pstate_value_files,
171 +                                               value,
172 +                                               MAX_AMD_PSTATE_VALUE_READ_FILES);
173 + }
174 +
175 + void amd_pstate_boost_init(unsigned int cpu, int *support, int *active)
176 + {
177 +     unsigned long highest_perf, nominal_perf, cpuinfo_min,
178 +                   cpuinfo_max, amd_pstate_max;
179 +
180 +     highest_perf = amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF);
181 +     nominal_perf = acpi_cppc_get_data(cpu, NOMINAL_PERF);
182 +
183 +     *support = highest_perf > nominal_perf ? 1 : 0;
184 +     if (!(*support))
185 +         return;
186 +
187 +     cpufreq_get_hardware_limits(cpu, &cpuinfo_min, &cpuinfo_max);
188 +     amd_pstate_max = amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ);
189 +
190 +     *active = cpuinfo_max == amd_pstate_max ? 1 : 0;
191 + }
192 +
193 + void amd_pstate_show_perf_and_freq(unsigned int cpu, int no_rounding)
194 + {
195 +     printf(_(" AMD PSTATE Highest Performance: %lu. Maximum Frequency: "),
196 +            amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF));
197 +     /*
198 +      * If boost isn't active, the cpuinfo_max doesn't indicate real max
199 +      * frequency. So we read it back from amd-pstate sysfs entry.
200 +      */
201 +     print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ), no_rounding);
202 +     printf(".\n");
203 +
204 +     printf(_(" AMD PSTATE Nominal Performance: %lu. Nominal Frequency: "),
205 +            acpi_cppc_get_data(cpu, NOMINAL_PERF));
206 +     print_speed(acpi_cppc_get_data(cpu, NOMINAL_FREQ) * 1000,
207 +                 no_rounding);
208 +     printf(".\n");
209 +
210 +     printf(_(" AMD PSTATE Lowest Non-linear Performance: %lu. Lowest Non-linear Frequency: "),
211 +            acpi_cppc_get_data(cpu, LOWEST_NONLINEAR_PERF));
212 +     print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_LOWEST_NONLINEAR_FREQ),
213 +                 no_rounding);
214 +     printf(".\n");
215 +
216 +     printf(_(" AMD PSTATE Lowest Performance: %lu. Lowest Frequency: "),
217 +            acpi_cppc_get_data(cpu, LOWEST_PERF));
218 +     print_speed(acpi_cppc_get_data(cpu, LOWEST_FREQ) * 1000, no_rounding);
219 +     printf(".\n");
220 + }
221 +
222 + /* AMD P-State Helper Functions ************************************/
152 223 #endif /* defined(__i386__) || defined(__x86_64__) */
+13
tools/power/cpupower/utils/helpers/cpuid.c
···
149 149     if (ext_cpuid_level >= 0x80000008 &&
150 150         cpuid_ebx(0x80000008) & (1 << 4))
151 151         cpu_info->caps |= CPUPOWER_CAP_AMD_RDPRU;
152 +
153 +     if (cpupower_amd_pstate_enabled()) {
154 +         cpu_info->caps |= CPUPOWER_CAP_AMD_PSTATE;
155 +
156 +         /*
157 +          * If AMD P-State is enabled, the firmware will treat
158 +          * AMD P-State function as high priority.
159 +          */
160 +         cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB;
161 +         cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB_MSR;
162 +         cpu_info->caps &= ~CPUPOWER_CAP_AMD_HW_PSTATE;
163 +         cpu_info->caps &= ~CPUPOWER_CAP_AMD_PSTATEDEF;
164 +     }
152 165 }
153 166
154 167 if (cpu_info->vendor == X86_VENDOR_INTEL) {
+22
tools/power/cpupower/utils/helpers/helpers.h
···
11 11
12 12 #include <libintl.h>
13 13 #include <locale.h>
14 + #include <stdbool.h>
14 15
15 16 #include "helpers/bitmask.h"
16 17 #include <cpupower.h>
···
74 73 #define CPUPOWER_CAP_AMD_HW_PSTATE 0x00000100
75 74 #define CPUPOWER_CAP_AMD_PSTATEDEF 0x00000200
76 75 #define CPUPOWER_CAP_AMD_CPB_MSR 0x00000400
76 + #define CPUPOWER_CAP_AMD_PSTATE 0x00000800
77 77
78 78 #define CPUPOWER_AMD_CPBDIS 0x02000000
79 79
···
137 135
138 136 extern int cpufreq_has_boost_support(unsigned int cpu, int *support,
139 137                                      int *active, int * states);
138 +
139 + /* AMD P-State stuff **************************/
140 + bool cpupower_amd_pstate_enabled(void);
141 + void amd_pstate_boost_init(unsigned int cpu,
142 +                            int *support, int *active);
143 + void amd_pstate_show_perf_and_freq(unsigned int cpu,
144 +                                    int no_rounding);
145 +
146 + /* AMD P-State stuff **************************/
147 +
140 148 /*
141 149  * CPUID functions returning a single datum
142 150  */
···
179 167                                             int *active, int * states)
180 168 { return -1; }
181 169
170 + static inline bool cpupower_amd_pstate_enabled(void)
171 + { return false; }
172 + static inline void amd_pstate_boost_init(unsigned int cpu, int *support,
173 +                                          int *active)
174 + {}
175 + static inline void amd_pstate_show_perf_and_freq(unsigned int cpu,
176 +                                                  int no_rounding)
177 + {}
178 +
182 179 /* cpuid and cpuinfo helpers **************************/
183 180
184 181 static inline unsigned int cpuid_eax(unsigned int op) { return 0; };
···
205 184 void get_cpustate(void);
206 185 void print_online_cpus(void);
207 186 void print_offline_cpus(void);
187 + void print_speed(unsigned long speed, int no_rounding);
208 188
209 189 #endif /* __CPUPOWERUTILS_HELPERS__ */
+60
tools/power/cpupower/utils/helpers/misc.c
···
3 3 #include <stdio.h>
4 4 #include <errno.h>
5 5 #include <stdlib.h>
6 + #include <string.h>
6 7
7 8 #include "helpers/helpers.h"
8 9 #include "helpers/sysfs.h"
10 + #include "cpufreq.h"
9 11
10 12 #if defined(__i386__) || defined(__x86_64__)
···
41 39         if (ret)
42 40             return ret;
43 41     }
42 +     } else if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
43 +         amd_pstate_boost_init(cpu, support, active);
44 44     } else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA)
45 45         *support = *active = 1;
46 46     return 0;
···
85 81         return -1;
86 82
87 83     return 0;
84 + }
85 +
86 + bool cpupower_amd_pstate_enabled(void)
87 + {
88 +     char *driver = cpufreq_get_driver(0);
89 +     bool ret = false;
90 +
91 +     if (!driver)
92 +         return ret;
93 +
94 +     if (!strcmp(driver, "amd-pstate"))
95 +         ret = true;
96 +
97 +     cpufreq_put_driver(driver);
98 +
99 +     return ret;
88 100 }
89 101
90 102 #endif /* #if defined(__i386__) || defined(__x86_64__) */
···
162 142         bitmask_displaylist(offline_cpus_str, str_len, offline_cpus);
163 143         printf(_("Following CPUs are offline:\n%s\n"), offline_cpus_str);
164 144         printf(_("cpupower set operation was not performed on them\n"));
145 +     }
146 + }
147 +
148 + /*
149 +  * print_speed
150 +  *
151 +  * Print the exact CPU frequency with appropriate unit
152 +  */
153 + void print_speed(unsigned long speed, int no_rounding)
154 + {
155 +     unsigned long tmp;
156 +
157 +     if (no_rounding) {
158 +         if (speed > 1000000)
159 +             printf("%u.%06u GHz", ((unsigned int)speed / 1000000),
160 +                    ((unsigned int)speed % 1000000));
161 +         else if (speed > 1000)
162 +             printf("%u.%03u MHz", ((unsigned int)speed / 1000),
163 +                    (unsigned int)(speed % 1000));
164 +         else
165 +             printf("%lu kHz", speed);
166 +     } else {
167 +         if (speed > 1000000) {
168 +             tmp = speed % 10000;
169 +             if (tmp >= 5000)
170 +                 speed += 10000;
171 +             printf("%u.%02u GHz", ((unsigned int)speed / 1000000),
172 +                    ((unsigned int)(speed % 1000000) / 10000));
173 +         } else if (speed > 100000) {
174 +             tmp = speed % 1000;
175 +             if (tmp >= 500)
176 +                 speed += 1000;
177 +             printf("%u MHz", ((unsigned int)speed / 1000));
178 +         } else if (speed > 1000) {
179 +             tmp = speed % 100;
180 +             if (tmp >= 50)
181 +                 speed += 100;
182 +             printf("%u.%01u MHz", ((unsigned int)speed / 1000),
183 +                    ((unsigned int)(speed % 1000) / 100));
184 +         }
165 185     }
166 186 }
+354
tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py
···
1 + #!/usr/bin/env python3
2 + # SPDX-License-Identifier: GPL-2.0-only
3 + # -*- coding: utf-8 -*-
4 + #
5 + """ This utility can be used to debug and tune the performance of the
6 + AMD P-State driver. It imports intel_pstate_tracer to analyze AMD P-State
7 + trace events.
8 +
9 + Prerequisites:
10 +     Python version 2.7.x or higher
11 +     gnuplot 5.0 or higher
12 +     gnuplot-py 1.8 or higher
13 +     (Most of the distributions have these required packages. They may be called
14 +     gnuplot-py, python-gnuplot or python3-gnuplot, gnuplot-nox, ...)
15 +
16 +     Kernel config for Linux trace is enabled
17 +
18 +     see print_help(): for Usage and Output details
19 +
20 + """
21 + from __future__ import print_function
22 + from datetime import datetime
23 + import subprocess
24 + import os
25 + import time
26 + import re
27 + import signal
28 + import sys
29 + import getopt
30 + import Gnuplot
31 + from numpy import *
32 + from decimal import *
33 + sys.path.append('../intel_pstate_tracer')
34 + #import intel_pstate_tracer
35 + import intel_pstate_tracer as ipt
36 +
37 + __license__ = "GPL version 2"
38 +
39 + MAX_CPUS = 256
40 + # Define the csv file columns
41 + C_COMM = 15
42 + C_ELAPSED = 14
43 + C_SAMPLE = 13
44 + C_DURATION = 12
45 + C_LOAD = 11
46 + C_TSC = 10
47 + C_APERF = 9
48 + C_MPERF = 8
49 + C_FREQ = 7
50 + C_MAX_PERF = 6
51 + C_DES_PERF = 5
52 + C_MIN_PERF = 4
53 + C_USEC = 3
54 + C_SEC = 2
55 + C_CPU = 1
56 +
57 + global sample_num, last_sec_cpu, last_usec_cpu, start_time, test_name, trace_file
58 +
59 + getcontext().prec = 11
60 +
61 + sample_num = 0
62 + last_sec_cpu = [0] * MAX_CPUS
63 + last_usec_cpu = [0] * MAX_CPUS
64 +
65 + def plot_per_cpu_freq(cpu_index):
66 +     """ Plot per cpu frequency """
67 +
68 +     file_name = 'cpu{:0>3}.csv'.format(cpu_index)
69 +     if os.path.exists(file_name):
70 +         output_png = "cpu%03d_frequency.png" % cpu_index
71 +         g_plot = ipt.common_gnuplot_settings()
72 +         g_plot('set output "' + output_png + '"')
73 +         g_plot('set yrange [0:7]')
74 +         g_plot('set ytics 0, 1')
75 +         g_plot('set ylabel "CPU Frequency (GHz)"')
76 +         g_plot('set title "{} : frequency : CPU {:0>3} : {:%F %H:%M}"'.format(test_name, cpu_index, datetime.now()))
77 +         g_plot('set ylabel "CPU frequency"')
78 +         g_plot('set key off')
79 +         ipt.set_4_plot_linestyles(g_plot)
80 +         g_plot('plot "' + file_name + '" using {:d}:{:d} with linespoints linestyle 1 axis x1y1'.format(C_ELAPSED, C_FREQ))
81 +
82 + def plot_per_cpu_des_perf(cpu_index):
83 +     """ Plot per cpu desired perf """
84 +
85 +     file_name = 'cpu{:0>3}.csv'.format(cpu_index)
86 +     if os.path.exists(file_name):
87 +         output_png = "cpu%03d_des_perf.png" % cpu_index
88 +         g_plot = ipt.common_gnuplot_settings()
89 +         g_plot('set output "' + output_png + '"')
90 +         g_plot('set yrange [0:255]')
91 +         g_plot('set ylabel "des perf"')
92 +         g_plot('set title "{} : cpu des perf : CPU {:0>3} : {:%F %H:%M}"'.format(test_name, cpu_index, datetime.now()))
93 +         g_plot('set key off')
94 +         ipt.set_4_plot_linestyles(g_plot)
95 +         g_plot('plot "' + file_name + '" using {:d}:{:d} with linespoints linestyle 1 axis x1y1'.format(C_ELAPSED, C_DES_PERF))
96 +
97 + def plot_per_cpu_load(cpu_index):
98 +     """ Plot per cpu load """
99 +
100 +     file_name = 'cpu{:0>3}.csv'.format(cpu_index)
101 +     if os.path.exists(file_name):
102 +         output_png = "cpu%03d_load.png" % cpu_index
103 +         g_plot = ipt.common_gnuplot_settings()
104 +         g_plot('set output "' + output_png + '"')
105 +         g_plot('set yrange [0:100]')
106 +         g_plot('set ytics 0, 10')
107 +         g_plot('set ylabel "CPU load (percent)"')
108 +         g_plot('set title "{} : cpu load : CPU {:0>3} : {:%F %H:%M}"'.format(test_name, cpu_index, datetime.now()))
109 +         g_plot('set key off')
110 +         ipt.set_4_plot_linestyles(g_plot)
111 +         g_plot('plot "' + file_name + '" using {:d}:{:d} with linespoints linestyle 1 axis x1y1'.format(C_ELAPSED, C_LOAD))
112 +
113 + def plot_all_cpu_frequency():
114 +     """ Plot all cpu frequencies """
115 +
116 +     output_png = 'all_cpu_frequencies.png'
117 +     g_plot = ipt.common_gnuplot_settings()
118 +     g_plot('set output "' + output_png + '"')
119 +     g_plot('set ylabel "CPU Frequency (GHz)"')
120 +     g_plot('set title "{} : cpu frequencies : {:%F %H:%M}"'.format(test_name, datetime.now()))
121 +
122 +     title_list = subprocess.check_output('ls cpu???.csv | sed -e \'s/.csv//\'',shell=True).decode('utf-8').replace('\n', ' ')
123 +     plot_str = "plot for [i in title_list] i.'.csv' using {:d}:{:d} pt 7 ps 1 title i".format(C_ELAPSED, C_FREQ)
124 +     g_plot('title_list = "{}"'.format(title_list))
125 +     g_plot(plot_str)
126 +
127 + def plot_all_cpu_des_perf():
128 +     """ Plot all cpu desired perf """
129 +
130 +     output_png = 'all_cpu_des_perf.png'
131 +     g_plot = ipt.common_gnuplot_settings()
132 +     g_plot('set output "' + output_png + '"')
133 +     g_plot('set ylabel "des perf"')
134 +     g_plot('set title "{} : cpu des perf : {:%F %H:%M}"'.format(test_name, datetime.now()))
135 +
136 +     title_list = subprocess.check_output('ls cpu???.csv | sed -e \'s/.csv//\'',shell=True).decode('utf-8').replace('\n', ' ')
137 +     plot_str = "plot for [i in title_list] i.'.csv' using {:d}:{:d} pt 255 ps 1 title i".format(C_ELAPSED, C_DES_PERF)
138 +     g_plot('title_list = "{}"'.format(title_list))
139 +     g_plot(plot_str)
140 +
141 + def plot_all_cpu_load():
142 +     """ Plot all cpu load """
143 +
144 +     output_png = 'all_cpu_load.png'
145 +     g_plot = ipt.common_gnuplot_settings()
146 +     g_plot('set output "' + output_png + '"')
147 +     g_plot('set yrange [0:100]')
148 +     g_plot('set ylabel "CPU load (percent)"')
149 +     g_plot('set title "{} : cpu load : {:%F %H:%M}"'.format(test_name, datetime.now()))
150 +
151 +     title_list = subprocess.check_output('ls cpu???.csv | sed -e \'s/.csv//\'',shell=True).decode('utf-8').replace('\n', ' ')
152 +     plot_str = "plot for [i in title_list] i.'.csv' using {:d}:{:d} pt 255 ps 1 title i".format(C_ELAPSED, C_LOAD)
153 +     g_plot('title_list = "{}"'.format(title_list))
154 +     g_plot(plot_str)
155 +
156 + def store_csv(cpu_int, time_pre_dec, time_post_dec, min_perf, des_perf, max_perf, freq_ghz, mperf, aperf, tsc, common_comm, load, duration_ms, sample_num, elapsed_time, cpu_mask):
157 +     """ Store master csv file information """
158 +
159 +     global graph_data_present
160 +
161 +     if cpu_mask[cpu_int] == 0:
162 +         return
163 +
164 +     try:
165 +         f_handle = open('cpu.csv', 'a')
166 +         string_buffer = "CPU_%03u, %05u, %06u, %u, %u, %u, %.4f, %u, %u, %u, %.2f, %.3f, %u, %.3f, %s\n" % (cpu_int, int(time_pre_dec), int(time_post_dec), int(min_perf), int(des_perf), int(max_perf), freq_ghz, int(mperf), int(aperf), int(tsc), load, duration_ms, sample_num, elapsed_time, common_comm)
167 +         f_handle.write(string_buffer)
168 +         f_handle.close()
169 +     except:
170 +         print('IO error cpu.csv')
171 +         return
172 +
173 +     graph_data_present = True;
174 +
175 +
176 + def cleanup_data_files():
177 +     """ clean up existing data files """
178 +
179 +     if os.path.exists('cpu.csv'):
180 +         os.remove('cpu.csv')
181 +     f_handle = open('cpu.csv', 'a')
182 +     f_handle.write('common_cpu, common_secs, common_usecs, min_perf, des_perf, max_perf, freq, mperf, aperf, tsc, load, duration_ms, sample_num, elapsed_time, common_comm')
183 +     f_handle.write('\n')
184 +     f_handle.close()
185 +
186 + def read_trace_data(file_name, cpu_mask):
187 +     """ Read and parse trace data """
188 +
189 +     global current_max_cpu
190 +     global sample_num, last_sec_cpu, last_usec_cpu, start_time
191 +
192 +     try:
193 +         data = open(file_name, 'r').read()
194 +     except:
195 +         print('Error opening ', file_name)
196 +         sys.exit(2)
197 +
198 +     for line in data.splitlines():
199 +         search_obj = \
200 +             re.search(r'(^(.*?)\[)((\d+)[^\]])(.*?)(\d+)([.])(\d+)(.*?amd_min_perf=)(\d+)(.*?amd_des_perf=)(\d+)(.*?amd_max_perf=)(\d+)(.*?freq=)(\d+)(.*?mperf=)(\d+)(.*?aperf=)(\d+)(.*?tsc=)(\d+)'
201 +             , line)
202 +
203 +         if search_obj:
204 +             cpu = search_obj.group(3)
205 +             cpu_int = int(cpu)
206 +             cpu = str(cpu_int)
207 +
208 +             time_pre_dec = search_obj.group(6)
209 +             time_post_dec = search_obj.group(8)
210 +             min_perf = search_obj.group(10)
211 +             des_perf = search_obj.group(12)
212 +             max_perf = search_obj.group(14)
213 +             freq = search_obj.group(16)
214 +             mperf = search_obj.group(18)
215 +             aperf = search_obj.group(20)
216 +             tsc = search_obj.group(22)
217 +
218 +             common_comm = search_obj.group(2).replace(' ', '')
219 +
220 +             if sample_num == 0 :
221 +                 start_time = Decimal(time_pre_dec) + Decimal(time_post_dec) / Decimal(1000000)
222 +             sample_num += 1
223 +
224 +             if last_sec_cpu[cpu_int] == 0 :
225 +                 last_sec_cpu[cpu_int] = time_pre_dec
226 +                 last_usec_cpu[cpu_int] = time_post_dec
227 +             else :
228 +                 duration_us = (int(time_pre_dec) - int(last_sec_cpu[cpu_int])) * 1000000 + (int(time_post_dec) - int(last_usec_cpu[cpu_int]))
229 +                 duration_ms = Decimal(duration_us) / Decimal(1000)
230 +                 last_sec_cpu[cpu_int] = time_pre_dec
231 +                 last_usec_cpu[cpu_int] = time_post_dec
232 +                 elapsed_time = Decimal(time_pre_dec) + Decimal(time_post_dec) / Decimal(1000000) - start_time
233 +                 load = Decimal(int(mperf)*100)/ Decimal(tsc)
234 +                 freq_ghz = Decimal(freq)/Decimal(1000000)
235 +                 store_csv(cpu_int, time_pre_dec, time_post_dec, min_perf, des_perf, max_perf, freq_ghz, mperf, aperf, tsc, common_comm, load, duration_ms, sample_num, elapsed_time, cpu_mask)
236 +
237 +             if cpu_int > current_max_cpu:
238 +                 current_max_cpu = cpu_int
239 + # Now separate the main overall csv file into per CPU csv files.
240 + ipt.split_csv(current_max_cpu, cpu_mask) 241 + 242 + 243 + def signal_handler(signal, frame): 244 + print(' SIGINT: Forcing cleanup before exit.') 245 + if interval: 246 + ipt.disable_trace(trace_file) 247 + ipt.clear_trace_file() 248 + ipt.free_trace_buffer() 249 + sys.exit(0) 250 + 251 + trace_file = "/sys/kernel/debug/tracing/events/amd_cpu/enable" 252 + signal.signal(signal.SIGINT, signal_handler) 253 + 254 + interval = "" 255 + file_name = "" 256 + cpu_list = "" 257 + test_name = "" 258 + memory = "10240" 259 + graph_data_present = False; 260 + 261 + valid1 = False 262 + valid2 = False 263 + 264 + cpu_mask = zeros((MAX_CPUS,), dtype=int) 265 + 266 + 267 + try: 268 + opts, args = getopt.getopt(sys.argv[1:],"ht:i:c:n:m:",["help","trace_file=","interval=","cpu=","name=","memory="]) 269 + except getopt.GetoptError: 270 + ipt.print_help('amd_pstate') 271 + sys.exit(2) 272 + for opt, arg in opts: 273 + if opt == '-h': 274 + print() 275 + sys.exit() 276 + elif opt in ("-t", "--trace_file"): 277 + valid1 = True 278 + location = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__))) 279 + file_name = os.path.join(location, arg) 280 + elif opt in ("-i", "--interval"): 281 + valid1 = True 282 + interval = arg 283 + elif opt in ("-c", "--cpu"): 284 + cpu_list = arg 285 + elif opt in ("-n", "--name"): 286 + valid2 = True 287 + test_name = arg 288 + elif opt in ("-m", "--memory"): 289 + memory = arg 290 + 291 + if not (valid1 and valid2): 292 + ipt.print_help('amd_pstate') 293 + sys.exit() 294 + 295 + if cpu_list: 296 + for p in re.split("[,]", cpu_list): 297 + if int(p) < MAX_CPUS : 298 + cpu_mask[int(p)] = 1 299 + else: 300 + for i in range (0, MAX_CPUS): 301 + cpu_mask[i] = 1 302 + 303 + if not os.path.exists('results'): 304 + os.mkdir('results') 305 + ipt.fix_ownership('results') 306 + 307 + os.chdir('results') 308 + if os.path.exists(test_name): 309 + print('The test name directory already exists. Please provide a unique test name. 
Test re-run not supported, yet.') 310 + sys.exit() 311 + os.mkdir(test_name) 312 + ipt.fix_ownership(test_name) 313 + os.chdir(test_name) 314 + 315 + cur_version = sys.version_info 316 + print('python version (should be >= 2.7):') 317 + print(cur_version) 318 + 319 + cleanup_data_files() 320 + 321 + if interval: 322 + file_name = "/sys/kernel/debug/tracing/trace" 323 + ipt.clear_trace_file() 324 + ipt.set_trace_buffer_size(memory) 325 + ipt.enable_trace(trace_file) 326 + time.sleep(int(interval)) 327 + ipt.disable_trace(trace_file) 328 + 329 + current_max_cpu = 0 330 + 331 + read_trace_data(file_name, cpu_mask) 332 + 333 + if interval: 334 + ipt.clear_trace_file() 335 + ipt.free_trace_buffer() 336 + 337 + if graph_data_present == False: 338 + print('No valid data to plot') 339 + sys.exit(2) 340 + 341 + for cpu_no in range(0, current_max_cpu + 1): 342 + plot_per_cpu_freq(cpu_no) 343 + plot_per_cpu_des_perf(cpu_no) 344 + plot_per_cpu_load(cpu_no) 345 + 346 + plot_all_cpu_des_perf() 347 + plot_all_cpu_frequency() 348 + plot_all_cpu_load() 349 + 350 + for root, dirs, files in os.walk('.'): 351 + for f in files: 352 + ipt.fix_ownership(f) 353 + 354 + os.chdir('../../')
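The heart of the new amd_pstate tracer is the `re.search` pattern in `read_trace_data()`, which pulls the CPU number, timestamp, and the `amd_min_perf=`/`amd_des_perf=`/`amd_max_perf=`/`freq=`/`mperf=`/`aperf=`/`tsc=` fields out of each ftrace line. A minimal sketch of how those capture groups map to fields — the sample trace line below is hypothetical, shaped only to match what the regex expects:

```python
import re

# Regex copied from the amd_pstate tracer's read_trace_data() above.
TRACE_RE = re.compile(
    r'(^(.*?)\[)((\d+)[^\]])(.*?)(\d+)([.])(\d+)(.*?amd_min_perf=)(\d+)'
    r'(.*?amd_des_perf=)(\d+)(.*?amd_max_perf=)(\d+)(.*?freq=)(\d+)'
    r'(.*?mperf=)(\d+)(.*?aperf=)(\d+)(.*?tsc=)(\d+)')

def parse_trace_line(line):
    """Return a dict of the fields store_csv() consumes, or None on no match."""
    m = TRACE_RE.search(line)
    if not m:
        return None
    return {
        'cpu': int(m.group(3)),          # "[003]" -> 3
        'time_pre_dec': m.group(6),      # seconds part of the timestamp
        'time_post_dec': m.group(8),     # microseconds part
        'min_perf': m.group(10),
        'des_perf': m.group(12),
        'max_perf': m.group(14),
        'freq': m.group(16),
        'mperf': m.group(18),
        'aperf': m.group(20),
        'tsc': m.group(22),
    }

# Hypothetical line, shaped like an ftrace amd_pstate event record.
sample = ('kworker/3:1-162   [003] d.h.  1234.567890: amd_pstate_perf: '
          'amd_min_perf=85 amd_des_perf=120 amd_max_perf=166 '
          'freq=1600000 mperf=9999 aperf=8888 tsc=77777')
fields = parse_trace_line(sample)
```

Note the capture groups are 1-indexed and interleaved with throwaway groups (the `.*?field=` prefixes), which is why the script reads groups 3, 6, 8, 10, 12, and so on.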
+115 -117
tools/power/x86/intel_pstate_tracer/intel_pstate_tracer.py
···
 C_SEC = 2
 C_CPU = 1
 
-global sample_num, last_sec_cpu, last_usec_cpu, start_time, testname
+global sample_num, last_sec_cpu, last_usec_cpu, start_time, testname, trace_file
 
 # 11 digits covers uptime to 115 days
 getcontext().prec = 11
···
 last_sec_cpu = [0] * MAX_CPUS
 last_usec_cpu = [0] * MAX_CPUS
 
-def print_help():
-    print('intel_pstate_tracer.py:')
+def print_help(driver_name):
+    print('%s_tracer.py:'%driver_name)
     print(' Usage:')
     print(' If the trace file is available, then to simply parse and plot, use (sudo not required):')
-    print(' ./intel_pstate_tracer.py [-c cpus] -t <trace_file> -n <test_name>')
+    print(' ./%s_tracer.py [-c cpus] -t <trace_file> -n <test_name>'%driver_name)
     print(' Or')
-    print(' ./intel_pstate_tracer.py [--cpu cpus] ---trace_file <trace_file> --name <test_name>')
+    print(' ./%s_tracer.py [--cpu cpus] ---trace_file <trace_file> --name <test_name>'%driver_name)
     print(' To generate trace file, parse and plot, use (sudo required):')
-    print(' sudo ./intel_pstate_tracer.py [-c cpus] -i <interval> -n <test_name> -m <kbytes>')
+    print(' sudo ./%s_tracer.py [-c cpus] -i <interval> -n <test_name> -m <kbytes>'%driver_name)
     print(' Or')
-    print(' sudo ./intel_pstate_tracer.py [--cpu cpus] --interval <interval> --name <test_name> --memory <kbytes>')
+    print(' sudo ./%s_tracer.py [--cpu cpus] --interval <interval> --name <test_name> --memory <kbytes>'%driver_name)
     print(' Optional argument:')
     print(' cpus: comma separated list of CPUs')
     print(' kbytes: Kilo bytes of memory per CPU to allocate to the trace buffer. Default: 10240')
···
 g_plot('set style line 3 linetype 1 linecolor rgb "purple" pointtype -1')
 g_plot('set style line 4 linetype 1 linecolor rgb "blue" pointtype -1')
 
-def store_csv(cpu_int, time_pre_dec, time_post_dec, core_busy, scaled, _from, _to, mperf, aperf, tsc, freq_ghz, io_boost, common_comm, load, duration_ms, sample_num, elapsed_time, tsc_ghz):
+def store_csv(cpu_int, time_pre_dec, time_post_dec, core_busy, scaled, _from, _to, mperf, aperf, tsc, freq_ghz, io_boost, common_comm, load, duration_ms, sample_num, elapsed_time, tsc_ghz, cpu_mask):
     """ Store master csv file information """
 
     global graph_data_present
···
 
     graph_data_present = True;
 
-def split_csv():
+def split_csv(current_max_cpu, cpu_mask):
     """ seperate the all csv file into per CPU csv files. """
-
-    global current_max_cpu
 
     if os.path.exists('cpu.csv'):
         for index in range(0, current_max_cpu + 1):
···
         print('IO error clearing trace file ')
         sys.exit(2)
 
-def enable_trace():
+def enable_trace(trace_file):
     """ Enable trace """
 
     try:
-        open('/sys/kernel/debug/tracing/events/power/pstate_sample/enable'
-        , 'w').write("1")
+        open(trace_file,'w').write("1")
     except:
         print('IO error enabling trace ')
         sys.exit(2)
 
-def disable_trace():
+def disable_trace(trace_file):
     """ Disable trace """
 
     try:
-        open('/sys/kernel/debug/tracing/events/power/pstate_sample/enable'
-        , 'w').write("0")
+        open(trace_file, 'w').write("0")
     except:
         print('IO error disabling trace ')
         sys.exit(2)
 
-def set_trace_buffer_size():
+def set_trace_buffer_size(memory):
     """ Set trace buffer size """
 
     try:
···
         print('IO error freeing trace buffer ')
         sys.exit(2)
 
-def read_trace_data(filename):
+def read_trace_data(filename, cpu_mask):
     """ Read and parse trace data """
 
     global current_max_cpu
···
             tsc_ghz = Decimal(0)
             if duration_ms != Decimal(0) :
                 tsc_ghz = Decimal(tsc)/duration_ms/Decimal(1000000)
-            store_csv(cpu_int, time_pre_dec, time_post_dec, core_busy, scaled, _from, _to, mperf, aperf, tsc, freq_ghz, io_boost, common_comm, load, duration_ms, sample_num, elapsed_time, tsc_ghz)
+            store_csv(cpu_int, time_pre_dec, time_post_dec, core_busy, scaled, _from, _to, mperf, aperf, tsc, freq_ghz, io_boost, common_comm, load, duration_ms, sample_num, elapsed_time, tsc_ghz, cpu_mask)
 
             if cpu_int > current_max_cpu:
                 current_max_cpu = cpu_int
     # End of for each trace line loop
     # Now seperate the main overall csv file into per CPU csv files.
-    split_csv()
+    split_csv(current_max_cpu, cpu_mask)
 
 def signal_handler(signal, frame):
     print(' SIGINT: Forcing cleanup before exit.')
     if interval:
-        disable_trace()
+        disable_trace(trace_file)
         clear_trace_file()
         # Free the memory
         free_trace_buffer()
     sys.exit(0)
 
-signal.signal(signal.SIGINT, signal_handler)
+if __name__ == "__main__":
+    trace_file = "/sys/kernel/debug/tracing/events/power/pstate_sample/enable"
+    signal.signal(signal.SIGINT, signal_handler)
 
-interval = ""
-filename = ""
-cpu_list = ""
-testname = ""
-memory = "10240"
-graph_data_present = False;
+    interval = ""
+    filename = ""
+    cpu_list = ""
+    testname = ""
+    memory = "10240"
+    graph_data_present = False;
 
-valid1 = False
-valid2 = False
+    valid1 = False
+    valid2 = False
 
-cpu_mask = zeros((MAX_CPUS,), dtype=int)
+    cpu_mask = zeros((MAX_CPUS,), dtype=int)
 
-try:
-    opts, args = getopt.getopt(sys.argv[1:],"ht:i:c:n:m:",["help","trace_file=","interval=","cpu=","name=","memory="])
-except getopt.GetoptError:
-    print_help()
-    sys.exit(2)
-for opt, arg in opts:
-    if opt == '-h':
-        print()
-        sys.exit()
-    elif opt in ("-t", "--trace_file"):
-        valid1 = True
-        location = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
-        filename = os.path.join(location, arg)
-    elif opt in ("-i", "--interval"):
-        valid1 = True
-        interval = arg
-    elif opt in ("-c", "--cpu"):
-        cpu_list = arg
-    elif opt in ("-n", "--name"):
-        valid2 = True
-        testname = arg
-    elif opt in ("-m", "--memory"):
-        memory = arg
+    try:
+        opts, args = getopt.getopt(sys.argv[1:],"ht:i:c:n:m:",["help","trace_file=","interval=","cpu=","name=","memory="])
+    except getopt.GetoptError:
+        print_help('intel_pstate')
+        sys.exit(2)
+    for opt, arg in opts:
+        if opt == '-h':
+            print_help('intel_pstate')
+            sys.exit()
+        elif opt in ("-t", "--trace_file"):
+            valid1 = True
+            location = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
+            filename = os.path.join(location, arg)
+        elif opt in ("-i", "--interval"):
+            valid1 = True
+            interval = arg
+        elif opt in ("-c", "--cpu"):
+            cpu_list = arg
+        elif opt in ("-n", "--name"):
+            valid2 = True
+            testname = arg
+        elif opt in ("-m", "--memory"):
+            memory = arg
 
-if not (valid1 and valid2):
-    print_help()
-    sys.exit()
+    if not (valid1 and valid2):
+        print_help('intel_pstate')
+        sys.exit()
 
-if cpu_list:
-    for p in re.split("[,]", cpu_list):
-        if int(p) < MAX_CPUS :
-            cpu_mask[int(p)] = 1
-else:
-    for i in range (0, MAX_CPUS):
-        cpu_mask[i] = 1
+    if cpu_list:
+        for p in re.split("[,]", cpu_list):
+            if int(p) < MAX_CPUS :
+                cpu_mask[int(p)] = 1
+    else:
+        for i in range (0, MAX_CPUS):
+            cpu_mask[i] = 1
 
-if not os.path.exists('results'):
-    os.mkdir('results')
-    # The regular user needs to own the directory, not root.
-    fix_ownership('results')
+    if not os.path.exists('results'):
+        os.mkdir('results')
+        # The regular user needs to own the directory, not root.
+        fix_ownership('results')
 
-os.chdir('results')
-if os.path.exists(testname):
-    print('The test name directory already exists. Please provide a unique test name. Test re-run not supported, yet.')
-    sys.exit()
-os.mkdir(testname)
-# The regular user needs to own the directory, not root.
-fix_ownership(testname)
-os.chdir(testname)
+    os.chdir('results')
+    if os.path.exists(testname):
+        print('The test name directory already exists. Please provide a unique test name. Test re-run not supported, yet.')
+        sys.exit()
+    os.mkdir(testname)
+    # The regular user needs to own the directory, not root.
+    fix_ownership(testname)
+    os.chdir(testname)
 
-# Temporary (or perhaps not)
-cur_version = sys.version_info
-print('python version (should be >= 2.7):')
-print(cur_version)
+    # Temporary (or perhaps not)
+    cur_version = sys.version_info
+    print('python version (should be >= 2.7):')
+    print(cur_version)
 
-# Left as "cleanup" for potential future re-run ability.
-cleanup_data_files()
+    # Left as "cleanup" for potential future re-run ability.
+    cleanup_data_files()
 
-if interval:
-    filename = "/sys/kernel/debug/tracing/trace"
-    clear_trace_file()
-    set_trace_buffer_size()
-    enable_trace()
-    print('Sleeping for ', interval, 'seconds')
-    time.sleep(int(interval))
-    disable_trace()
+    if interval:
+        filename = "/sys/kernel/debug/tracing/trace"
+        clear_trace_file()
+        set_trace_buffer_size(memory)
+        enable_trace(trace_file)
+        print('Sleeping for ', interval, 'seconds')
+        time.sleep(int(interval))
+        disable_trace(trace_file)
 
-current_max_cpu = 0
+    current_max_cpu = 0
 
-read_trace_data(filename)
+    read_trace_data(filename, cpu_mask)
 
-if interval:
-    clear_trace_file()
-    # Free the memory
-    free_trace_buffer()
+    if interval:
+        clear_trace_file()
+        # Free the memory
+        free_trace_buffer()
 
-if graph_data_present == False:
-    print('No valid data to plot')
-    sys.exit(2)
+    if graph_data_present == False:
+        print('No valid data to plot')
+        sys.exit(2)
 
-for cpu_no in range(0, current_max_cpu + 1):
-    plot_perf_busy_with_sample(cpu_no)
-    plot_perf_busy(cpu_no)
-    plot_durations(cpu_no)
-    plot_loads(cpu_no)
+    for cpu_no in range(0, current_max_cpu + 1):
+        plot_perf_busy_with_sample(cpu_no)
+        plot_perf_busy(cpu_no)
+        plot_durations(cpu_no)
+        plot_loads(cpu_no)
 
-plot_pstate_cpu_with_sample()
-plot_pstate_cpu()
-plot_load_cpu()
-plot_frequency_cpu()
-plot_duration_cpu()
-plot_scaled_cpu()
-plot_boost_cpu()
-plot_ghz_cpu()
+    plot_pstate_cpu_with_sample()
+    plot_pstate_cpu()
+    plot_load_cpu()
+    plot_frequency_cpu()
+    plot_duration_cpu()
+    plot_scaled_cpu()
+    plot_boost_cpu()
+    plot_ghz_cpu()
 
-# It is preferrable, but not necessary, that the regular user owns the files, not root.
-for root, dirs, files in os.walk('.'):
-    for f in files:
-        fix_ownership(f)
-
-os.chdir('../../')
+    # It is preferrable, but not necessary, that the regular user owns the files, not root.
+    for root, dirs, files in os.walk('.'):
+        for f in files:
+            fix_ownership(f)
 
+    os.chdir('../../')
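The intel_pstate_tracer.py changes above all follow one refactoring pattern: module-level script code moves under an `if __name__ == "__main__":` guard, and hard-coded values (the trace enable file, the buffer size) become function parameters, so the new amd_pstate tracer can `import intel_pstate_tracer as ipt` and reuse its helpers without triggering the command-line driver. A minimal, self-contained sketch of that pattern (names and the temp-file demo are illustrative, not the kernel script itself):

```python
import os
import tempfile

def enable_trace(trace_file):
    """ Enable trace by writing '1' to the given control file (path is now a parameter, not hard-coded). """
    with open(trace_file, 'w') as f:
        f.write("1")

def disable_trace(trace_file):
    """ Disable trace by writing '0' to the given control file. """
    with open(trace_file, 'w') as f:
        f.write("0")

if __name__ == "__main__":
    # Script-only driver code: runs when executed directly, never on import,
    # so another tracer can import these helpers safely. A real run would
    # pass the debugfs enable file for the event; a temp file stands in here
    # so the sketch is runnable anywhere without root.
    demo = os.path.join(tempfile.mkdtemp(), 'enable')
    enable_trace(demo)
    disable_trace(demo)
```

This is why the diff re-indents the whole option-parsing and plotting tail by one level and moves `trace_file` into the guarded block: on import, only the function definitions execute.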
+1 -1
tools/power/x86/turbostat/turbostat.c
···
 };
 
 int icx_pkg_cstate_limits[16] =
-    { PCL__0, PCL__2, PCL__6, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLUNL, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV,
+    { PCL__0, PCL__2, PCL__6, PCL__6, PCLRSV, PCLRSV, PCLRSV, PCLUNL, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV,
     PCLRSV, PCLRSV
 };